Hi,
I am not a sophisticated programmer and need some advice on how (if possible) to speed up my plugin on multiprocessor machines.

Context: I just got a quad core machine.

The main routine in my plugin goes through an image pixel by pixel and transforms each pixel. The result for each pixel is independent of the other pixels. So is there a way I can split the image into pieces and use a separate processor on each piece?

Jon
Jon,
Here is the approach I've taken. My code now runs 4 times as fast on my 4-processor Mac:

    int nThreads = Runtime.getRuntime().availableProcessors();
    ...
    Step1Thread[] s1t = new Step1Thread[nThreads];
    for (int thread = 0; thread < nThreads; thread++) {
        s1t[thread] = new Step1Thread(thread, nThreads, w, h, d, s, data);
        s1t[thread].start();
    }
    try {
        for (int thread = 0; thread < nThreads; thread++) {
            s1t[thread].join();
        }
    } catch (InterruptedException ie) {
        IJ.error("A thread was interrupted in step 1.");
    }

Here Step1Thread is a class whose run method does part of the work. In this case, it operates on a subset of the slices in a stack, but there are many ways to break up a problem. Each thread figures out which part of the work to do from its thread number and nThreads. The join call makes the main code wait until every thread is done.

Here is the example Step1Thread. I removed some of the variables and constructor arguments to simplify it. In practice, the run method needs more parameters describing the particular task to do its job.

    class Step1Thread extends Thread {
        int thread, nThreads, w, h, d;
        float[][] s;
        byte[][] data;

        public Step1Thread(int thread, int nThreads, int w, int h, int d,
                           float[][] s, byte[][] data) {
            this.thread = thread;
            this.nThreads = nThreads;
            this.w = w;
            this.h = h;
            this.d = d;
            this.data = data;
            this.s = s;
        }

        public void run() {
            float[] sk;
            byte[] dk;
            // Thread number "thread" handles slices thread, thread+nThreads, ...
            for (int k = thread; k < d; k += nThreads) {
                sk = s[k];    // Slice k of output
                dk = data[k]; // Slice k of input
                for (int j = 0; j < h; j++) {
                    for (int i = 0; i < w; i++) { // loop over x within row j
                        sk[i + w*j] = something computed from dk;
                    }
                }
            }
        }
    }

Bob

On Jan 16, 2008, at 10:48 AM, Jon Harman wrote:

> Hi,
>
> I am not a sophisticated programmer and need some advice on how (if
> possible) to speed up my plugin on multiprocessor machines.
>
> Context: I just got a quad core machine.
>
> The main routine in my plugin goes through an image pixel by pixel
> and transforms each pixel. The result for each pixel is
> independent of the other pixels. So is there a way I can split the
> image into pieces and use a separate processor on each piece?
>
> Jon

Robert Dougherty, Ph.D.
President, OptiNav, Inc.
10900 NE 8th St, Suite 900
Bellevue, WA 98004
Tel. (425)990-5912
FAX (425)467-1119
www.optinav.com
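For the single-image case in the original question, the same start/join pattern might be applied by giving each thread a contiguous band of rows rather than a set of slices. The following is only a minimal sketch under stated assumptions, not code from the reply above: the class name ParallelInvertSketch, the method invert, and the per-pixel transform (255 minus the pixel value) are all hypothetical placeholders, and the image is assumed to be 8-bit and stored row by row in a 1D array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of ImageJ or of the original post.
class ParallelInvertSketch {

    // Transforms an 8-bit image stored row by row (index = x + w*y),
    // giving each of nThreads (>= 1) threads one contiguous band of rows.
    static byte[] invert(final byte[] in, final int w, final int h, int nThreads) {
        final byte[] out = new byte[w * h];
        List<Thread> workers = new ArrayList<Thread>();
        for (int t = 0; t < nThreads; t++) {
            // Thread t handles rows [y0, y1); band heights differ by at most one row.
            final int y0 = (int) ((long) h * t / nThreads);
            final int y1 = (int) ((long) h * (t + 1) / nThreads);
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    for (int y = y0; y < y1; y++) {
                        for (int x = 0; x < w; x++) {
                            int i = x + w * y;
                            // Placeholder transform (assumption): invert the pixel.
                            out[i] = (byte) (255 - (in[i] & 0xff));
                        }
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        // Wait for every band to finish before returning the result.
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers", e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int w = 640, h = 480;
        byte[] pixels = new byte[w * h]; // all zeros, stands in for real image data
        int nThreads = Runtime.getRuntime().availableProcessors();
        byte[] result = invert(pixels, w, h, nThreads);
        System.out.println("First output pixel: " + (result[0] & 0xff));
    }
}
```

Because each thread writes only to its own rows of the output array and reads the input without modifying it, no locking is needed; this relies on each pixel's result being independent of the others, as stated in the question. Whether the speedup approaches the number of processors will depend on how expensive the per-pixel work is.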
You may also try Parallel Colt:
http://piotr.wendykier.googlepages.com/parallelcolt

Examples can be found in the source code of the Parallel Spectral Deconvolution plugin, available at:
http://piotr.wendykier.googlepages.com/deconvolution

Piotr

On Jan 16, 2008 2:32 PM, Robert Dougherty wrote:
> Jon,
>
> Here is the approach I've taken. [...]
Greetings list,
Macro query (MacOS X, 1.38): Is there any way to remove the labels from text copied from the Results or Summary windows before it is written out (e.g., to the log window)? A nested loop using replace(string, old, new) would work but would be slow.

--
Mark

Mark J. Chopping, Ph.D.
Associate Professor, Earth & Environmental Studies
Montclair State University, Montclair, NJ 07043
NASA EOS/MISR ST/LCLUC ST/N. American Carbon Program
Tel: (973) 655-7384 Fax: (973) 655-4072
http://csam.montclair.edu/~chopping/wood
In reply to this post by Robert Dougherty
Thanks!
Your example is very clear. I had no problem implementing it in my situation, and it works like a charm.

Jon