I would like to run some parallel for loops:

for ii = 1:rows
    parfor k = 1:ksize
        mags(:,k,ii)=expm(A*step)*(mags(:,k,ii-1)+Ainvb)-Ainvb;
    end
end

The mags(:,k,ii-1) is where parfor is failing, but the data value ii-1 is just *** in the mags matrix with the proper indices. Is there any way to get around this?

Unfortunately, I have not had much luck understanding the MATLAB help.

Thanks

The immediate problem is that the MAGS variable is sliced into using two different expressions based on ii. From the online doc:

# Fixed Index Listing - Within the first-level parenthesis or braces, the list of indices is the same for all occurrences of a given variable.

http://www.yqcomputer.com/

There is one way around this language restriction.

for ii = 1:rows
    mags1=mags; % note matlab does not actually make a full copy here
    parfor k = 1:ksize
        mags(:,k,ii) = expm(A*step)*(mags1(:,k,ii-1)+Ainvb)-Ainvb;
    end
end

However, allow me to make a suggestion along a different line. The above code pieces together the 3D MAGS matrix one sheet at a time. This requires moving the newly constructed sheet from the workers at the end of the PARFOR loop and then BACK to the workers when the PARFOR is re-entered to start working on the next sheet. I believe you can reorder the PARFOR/FOR loops so that MAGS is constructed by columns. The code snippet below approximates your original code, but the PARFOR loop is only entered once, hence reducing the amount of data transmitted between the workers and the client.

% the dimensions of a single sheet
NROWS = 40;
NCOLS = 40;

% put the parfor on the outside
% imagine each parfor loop iterate is stacking column vectors
% on top of one another
parfor idx = 1 : NCOLS
    numSheets = 200; % how tall is the MAGS 3D matrix

    % initialize the column being worked on at the "bottom"
    col = zeros(NROWS, 1, numSheets);

    % now loop from the "bottom" to the "top"
    for sheet = 2 : numSheets
        % now each column is calculated as the column below it modified in some way
        col( :, 1, sheet ) = col( :, 1, sheet-1) + ones( NROWS, 1 );
    end

    % slice the output, put the columns next to each other
    mags( :, idx, : ) = col;
end

HTH

Jeremy

My problem is very similar!

MATLAB has a problem with:

parfor (k=1:length(Rindices))
    a(Rindices(k))=a(Rindices(k))+1;
end

saying "Legal indices for 'FCGR' are restricted in PARFOR loops. See Documentation"

But examining the documentation, my indices don't violate any of those four rules. They are strictly increasing by one: k = 1, 2, 3, ..., length(Rindices)

Also, changing the variable like this:

parfor (k=1:length(Rindices))
    i=k;
    a(Rindices(i))=a(Rindices(i))+1;
end

gives the same error message.

There's no reason why this should be an error. Each iteration of the for loop is COMPLETELY independent of the others, so they should be able to be performed on different machines.

Is there a way around this, or does the loop HAVE to be done serially?

"Juliette Salexa" < XXXX@XXXXX.COM > writes:

The problem here is that "a" is not a "sliced" output variable. I agree that your loop iterations are independent, but basically PARFOR cannot figure out by inspecting the code which elements of "a" will be assigned by which worker. You could re-write this in a couple of ways:

parfor k=1:length(Rindices)
    xx(k) = a(Rindices(k)) + 1;
end
a(Rindices) = xx;

In the code above, "a" will become a "broadcast" variable: because it's not "sliced", each worker will be sent the full value of "a" (this is inefficient). To avoid that, I'd do:

xx = a(Rindices);
parfor k=1:length(Rindices)
    xx(k) = xx(k)+1;
end
a(Rindices) = xx;

In that case, "xx" is a sliced input and output variable, so each worker only ever gets sent the values that it needs to operate on.

(For more about "variable classification", see

< http://www.yqcomputer.com/ #bq_of7_-1>)
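
As a minimal sketch of the second rewrite (not part of the original reply; the values of a and Rindices are made up for illustration), the sliced version can be checked against a plain serial loop. One assumption worth stating loudly: the two agree only when Rindices contains no repeated values, because a(Rindices) = xx writes each distinct element once, whereas the serial loop increments a repeated index once per occurrence.

```matlab
% Hypothetical data for illustration only.
a = (1:10)';
Rindices = [2 5 9];          % assumed to contain no repeated indices

% Serial reference: the loop from the question, run with a plain for.
aRef = a;
for k = 1:numel(Rindices)
    aRef(Rindices(k)) = aRef(Rindices(k)) + 1;
end

% Sliced rewrite: xx is indexed only by the loop variable k.
xx = a(Rindices);
parfor k = 1:numel(Rindices)
    xx(k) = xx(k) + 1;
end
a(Rindices) = xx;

sameResult = isequal(a, aRef);   % true when Rindices has no repeats
```

If no pool of workers is open, the parfor above simply runs its iterations on the client, so the check works either way.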

Cheers,

Edric.

*snip*

Since the PARFOR code is using Rindices(k) as linear indices into a, they must contain positive integer values, and so you could do this without the PARFOR loop. Try replacing that loop with a call to ACCUMARRAY.

Rindices = randi(10, 100, 1); % or if you're using a version that doesn't contain RANDI
% Rindices = ceil(10*rand(100, 1));
a = accumarray(Rindices, 1)

Alternately, if the maximum value in Rindices is large but only a small number of the values between 1 and that maximum value are represented in Rindices, use SPARSE to create a as a sparse matrix.

Rindices = randi(1e6, 1000, 1); % or
% Rindices = ceil(1e6*rand(1000, 1));
a = sparse(Rindices, 1, 1)
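
A small sketch of how ACCUMARRAY could stand in for the loop when a already holds values. The data is hypothetical, and the sketch assumes a is a column vector and Rindices is a column of positive integers no larger than numel(a). Unlike an indexed assignment, this counts every occurrence of a repeated index, which matches what the serial loop does.

```matlab
% Hypothetical data: a is a column vector of running counts.
a = zeros(10, 1);
Rindices = [3; 7; 3; 10; 1; 3];   % note the repeated index 3

% Serial reference loop (the loop from the question, run with a plain for).
aLoop = a;
for k = 1:length(Rindices)
    aLoop(Rindices(k)) = aLoop(Rindices(k)) + 1;
end

% Vectorized version: count occurrences of each index, then add to a.
% The third argument fixes the output size so that it matches a.
aVec = a + accumarray(Rindices, 1, size(a));

sameResult = isequal(aLoop, aVec);
```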

--

Steve Lord

XXXX@XXXXX.COM

I have written Jeremy back with this problem, but I thought I would make everyone else aware as well. Declaring mags1 like he suggests does allow the code to run. However, the act of passing the relatively large mags matrix (it is usually 6*401*256 or bigger) in and out of the parfor loop as mags apparently adds a lot of extra overhead. I have a 2 processor machine, and the parfor version with mags1 is slightly slower than the normal version. Does this surprise you, Steven?

I cannot run my loop with k on the outside and ii on the inside. The variables related to ii depend on the ii-1 value, so while I can loop across k, I cannot loop across ii.

Thanks for all the help everyone.

I did not receive an email from you, Matt. Let me try to respond here.

I do not see which variables are dependent on the entire ii-1 sheet of MAGS. The reason I thought it was safe to switch the order of the loops was because the indexing into MAGS on the right hand side of the assignment only needed to fetch data from the kth column of the ii-1 sheet. Do the values of A, step, or Ainvb depend on the full ii-1 sheet?

Another trick you can try to reduce the overhead of passing MAGS into the PARFOR is to precalculate ii-1, save that value to another variable, and then use that variable to index into MAGS1. This should allow the language frontend to recognize mags1 as a sliced input, rather than a broadcast variable.

...
iiMinus1 = ii-1;
parfor k = 1:ksize
    mags(:,k,ii) = expm(A*step)*(mags1(:,k,iiMinus1)+Ainvb)-Ainvb;
end

HTH a little more

Matthew,

Your original code is as follows ...

for ii = 1:rows
    parfor k = 1:ksize
        mags(:,k,ii)=expm(A*step)*(mags(:,k,ii-1)+Ainvb)-Ainvb;
    end
end

Your code is an example of the following code (which BTW won't work, as you've already correctly pointed out).

bigData = rand(nROWS, nCOLS, nSHEETS);
for k = 2:nSHEETS
    parfor j = 1:nCOLS
        bigData(:, j, k) = someFunction( bigData(:, j, k-1) );
    end
end

Notice that k starts at 2 since we are going to index by k-1.

The following should work efficiently ...

bigData = rand(nROWS, nCOLS, nSHEETS);
for k = 2:nSHEETS
    thisSheet = zeros(nROWS, nCOLS);
    prevSheet = bigData(:, :, k-1);
    parfor j = 1:nCOLS
        thisSheet(:, j) = someFunction( prevSheet(:, j) );
    end
    bigData(:, :, k) = thisSheet;
end

What I've done is to pull out the two sheets (2-D matrices from the big 3D data) that you are wanting to work on in the parfor. By doing this, you'll see that thisSheet and prevSheet are both correctly sliced variables (one an input and one an output variable). Thus only prevSheet (sliced) needs to be sent to each lab in the parfor and only thisSheet (again sliced) will be returned.

Once a sheet has been completely returned we assign it back into the larger 3D data matrix and continue. In fact, with a little more tweaking you can see that each iteration round the outer for loop reuses thisSheet as its prevSheet, and you could make use of that fact by ordering the loops differently (as Jeremy suggested).

Basically this algorithm takes an initial value and computes some series from it. So you could rewrite as follows:

outData = zeros(nROWS, nCOLS, nSHEETS);
inData = rand(nROWS, nCOLS);
parfor j = 1:nCOLS
    temp = zeros(nROWS, nSHEETS);
    temp(:, 1) = inData(:, j);
    for k = 2:nSHEETS
        temp(:, k) = someFunction( temp(:, k-1) );
    end
    outData(:, j, :) = temp;
end

Here I've explicitly defined the input data inData as the initial ROWS and COLS with no SHEETS (as that is what we are going to compute). Then we can parfor over the column computation as before, and grab the initial input data into a temporary variable that we will use to undertake the series computation. Next we compute the series in the for loop and then assign the results to the return output data.

This correctly slices the inData and the outData so there is minimal variable transfer.
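
To make the loop reordering concrete, here is a serial sketch (plain for loops only, so it runs without a pool) that computes the same recurrence in both orders and compares the results. Everything in it is an assumption for illustration: the sizes, the values of A, b, step and y0, and the names magsA, magsB and E are made up. It also hoists expm(A*step) out of the loops, which is only valid if A and step stay fixed across iterations, as they appear to in this thread.

```matlab
% Hypothetical small problem; all sizes and values are made up.
nROWS = 6; nCOLS = 5; nSHEETS = 4;
A     = -eye(nROWS) + 0.1*diag(ones(nROWS-1, 1), 1);
b     = (1:nROWS)';
Ainvb = A\b;
step  = 0.5;
E     = expm(A*step);        % computed once, since A and step are fixed here
y0    = reshape(1:nROWS*nCOLS, nROWS, nCOLS) / 10;   % initial sheet

% Order 1: sheet by sheet (outer loop over sheets, inner over columns).
magsA = zeros(nROWS, nCOLS, nSHEETS);
magsA(:, :, 1) = y0;
for ii = 2:nSHEETS
    for k = 1:nCOLS
        magsA(:, k, ii) = E*(magsA(:, k, ii-1) + Ainvb) - Ainvb;
    end
end

% Order 2: column by column. The outer loop is the one that could
% become a parfor, because each column depends only on itself.
magsB = zeros(nROWS, nCOLS, nSHEETS);
for k = 1:nCOLS
    temp = zeros(nROWS, nSHEETS);
    temp(:, 1) = y0(:, k);
    for ii = 2:nSHEETS
        temp(:, ii) = E*(temp(:, ii-1) + Ainvb) - Ainvb;
    end
    magsB(:, k, :) = reshape(temp, nROWS, 1, nSHEETS);
end

% Largest elementwise difference between the two orderings.
maxDiff = max(abs(magsA(:) - magsB(:)));
```

If the two orderings agree on real data as well, that suggests the dependence on sheet ii-1 is confined to the same column k, which is the condition the reordering relies on.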

Hope this helps

Regards

Jos

First, let me say thanks to those who have taken an interest in this. I have not worked through Jos's latest suggestion, but Jeremy, I have tried your iiMinus1 suggestion to make the indexing work in parfor.

for ii = 1:rows
    mags1=mags;
    if ii>1
        iiMinus1=ii-1;
    else
        iiMinus1=1;
    end
    parfor k = 1:ksize
        % math happens
        if ii==1
            mags(:,k,ii)=expm(A*0.0)*(y0+Ainvb)-Ainvb;
        else
            mags(:,k,ii)=expm(A*step)*(mags1(:,k,iiMinus1)+Ainvb)-Ainvb;
        end
    end
end

I put in the if/else for iiMinus1 to handle an error I was getting on the first iteration of the loop. As soon as I put the iiMinus1 in, mags1 was no longer underlined with the warning "variable 1 is indexed but not sliced in a parfor loop. This might result in unnecessary communications overhead." I took this as a very good sign. However, when I run it, the parfor version of my program is still slower than the normal version.

This makes me believe it is the overhead of moving mags1, so I guess Jos's suggestion about passing a sheet is the next thing to try.

Thanks, Matt

Hi Jos,

I incorporated your suggestion (I think).

mags=rand(6,401,205);
for ii = 2:rows
    mags1=zeros(6,401);
    mags2=mags(:,:,ii-1);
    parfor k = 1:ksize
        mags1(:,k)=expm(A*step)*(mags2(:,k,1)+Ainvb)-Ainvb;
    end
    mags(:,:,ii)=mags1;
end

So it looks like I have removed a lot of transferring of the mags matrix in and out of the parfor loop, but the parfor version is still a little slower than the regular for loop. I don't see where my overhead is.

From a computation perspective, A is a 6x6 matrix and Ainvb=A\b where b is a column vector. Is there something about this calculation that would make parfor not useful? I can't imagine it would. It is just a little matrix algebra.

I would really like to make this calculation go twice as fast. It is taking about 10 hours total right now (I actually am forced to call the parfor loop about 3000 times.)

Thanks, Matt

Hi Matt,

Sorry I haven't got back to you; I've been out of the office this last week. I'm not sure why you aren't seeing a speed up, so I tried out my final example with your algorithm. Below is what I used: it is identical to my final example, and defines the sizes you specified in your last posting. In addition, it passes the broadcast variables directly into the function (someFunction) in which I use the matrix exponential. (Note this is all one file in which someFunction is defined as an internal function to be called. For this to work in matlabpool the file forMatt.m needs to be on the path of the client and the workers. This will definitely be the case for a local matlabpool but may not be if you run on a cluster where the remote workers are on different machines.)

function outData = forMatt
% Define the broadcast variables
A = rand(6);
b = rand(6, 1);
Ainvb = A\b;
step = 1;

% Define the problem size
nROWS = 6;
nCOLS = 401;
nSHEETS = 205;

% Code almost copied from my previous example - only change
% is to include broadcast variables in call to someFunction
outData = zeros(nROWS, nCOLS, nSHEETS);
inData = rand(nROWS, nCOLS);
parfor j = 1:nCOLS
    temp = zeros(nROWS, nSHEETS);
    temp(:, 1) = inData(:, j);
    for k = 2:nSHEETS
        temp(:, k) = someFunction( temp(:, k-1), A, step, Ainvb );
    end
    outData(:, j, :) = temp;
end

% Actual computation that we want to do.
function out = someFunction( in, A, step, Ainvb )
out = expm(A*step)*(in + Ainvb) - Ainvb;

Some results on a 64-bit linux quad-core Intel Core-duo @ 2.4GHz with 6GB RAM

1. Change the outer 'parfor' to a 'for' and run locally - best serial time

Elapsed time is 26.021504 seconds.

2. Replace with 'parfor' BUT no open matlabpool - should be the same as above since there are no extra resources to speed up the computation.

Elapsed time is 25.841782 seconds.

3. Now open a local matlabpool of size 4 on the local machine - parallel speed up should occur with the 4 workers and the parfor

Elapsed time is 7.385429 seconds.

This gives me a parallel speed up of ~3.5 (best we could expect is 4). Since all 4 matlab processes are using the same memory bandwidth (quad-core machine) this looks pretty good to me.
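
The speed-up figure can be reproduced from the timings quoted above; the short sketch below only restates that arithmetic (the variable names are made up), dividing the serial time by the parallel time and then by the number of workers to get an efficiency. With these numbers the speed-up comes out at about 3.5, or roughly 88% of the ideal factor of 4.

```matlab
tSerial   = 26.021504;   % seconds, plain 'for' loop (result 1)
tParallel = 7.385429;    % seconds, parfor with a pool of 4 workers (result 3)
nWorkers  = 4;

speedup    = tSerial / tParallel;   % how many times faster the parfor run was
efficiency = speedup / nWorkers;    % fraction of the ideal speed-up achieved

fprintf('speedup = %.2f, efficiency = %.0f%%\n', speedup, 100*efficiency);
```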

I hope that the example above is the same as you are attempting to use. Let us know if this doesn't work for you.

Best regards

Jos

Jos Martin is a Matlab God! Thanks for the help. In all of this, I have neglected to open my pools. With them open to 2 (my # of CPU's) I am doubling my speed. Thanks to all on this thread for your help. It has been immensely educational.
