Re: Questions regarding multithreaded processing

Posted by Olivier Burri on Jul 14, 2016; 7:09am
URL: http://imagej.273.s1.nabble.com/Questions-regarding-multithreaded-processing-tp5016878p5016881.html

Hi Kurt and George,

Since this is getting more technical and closer to a plugin development question, would you mind posting it on http://forum.imagej.net?
Long technical email threads like this one tend to get muddy, especially when we try to share code snippets or want to comment on a particular part.

Plus you get a bunch of the ImageJ Devs who hang out there all the time.

And finally, it looks pretty :)

A quick remark though. Seeing as we do not know HOW you implemented the parallel processing, it will be difficult to help.

Some notes: if you 'simply' create a bunch of threads that each fetch the next pixel index in a loop from a shared counter (an atomic integer, for example), it is not going to be faster. Accessing a single pixel is extremely fast, so what will slow you down is each thread waiting to get its next pixel index.

This is why, in most examples, you first split the task by the number of cores and assign each thread a predefined number of pixels (a block) to process. That way each thread can just go and access its own pixels without worrying about what the other threads are doing. With that approach you can expect a speed increase that scales pretty much linearly with the number of available cores.
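
To make the block idea concrete, here is a minimal sketch in plain Java. It is not taken from George's plugin (which we have not seen) and does not use the ImageJ API: the class name BlockParallelSketch, the method processInBlocks and the placeholder operation processPixel are all made up for illustration, and the pixels are assumed to sit in a simple float[] array. Note that Runtime.getRuntime().availableProcessors() reports logical processors, so on a hyperthreaded machine it can be twice the number of physical cores.

```java
public class BlockParallelSketch {

    // Hypothetical per-pixel operation; a stand-in for the real analysis.
    static float processPixel(float value) {
        return 2f * value;
    }

    // Splits the pixel array into one contiguous block per thread and
    // processes each block in place. No thread touches another thread's
    // block, so no shared counter or locking is needed.
    // Assumes nThreads >= 1.
    static void processInBlocks(final float[] pixels, final int nThreads) {
        final int n = pixels.length;
        final int blockSize = (n + nThreads - 1) / nThreads; // ceiling division
        final Thread[] threads = new Thread[nThreads];

        for (int t = 0; t < nThreads; t++) {
            // Clamp to n so that surplus threads get an empty block.
            final int start = Math.min(n, t * blockSize);
            final int end = Math.min(n, start + blockSize);
            threads[t] = new Thread(() -> {
                for (int i = start; i < end; i++) {
                    pixels[i] = processPixel(pixels[i]);
                }
            });
            threads[t].start();
        }

        // Wait for every worker to finish before returning.
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException("Interrupted while waiting for workers", e);
            }
        }
    }

    public static void main(String[] args) {
        float[] pixels = new float[512 * 512];
        java.util.Arrays.fill(pixels, 1f);

        // Logical processors, which may exceed the physical core count.
        int nThreads = Runtime.getRuntime().availableProcessors();

        long t0 = System.nanoTime();
        processInBlocks(pixels, nThreads);
        long ms = (System.nanoTime() - t0) / 1_000_000;

        System.out.println(nThreads + " threads, " + ms + " ms, first pixel = " + pixels[0]);
    }
}
```

With a per-pixel operation this cheap, the cost of starting the threads will likely dominate, so a timing from this toy only becomes meaningful once processPixel does real work (such as analyzing a whole time series per pixel).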

Best

Oli

> -----Original Message-----
> From: ImageJ Interest Group [mailto:[hidden email]] On Behalf Of Kurt
> Thorn
> Sent: jeudi, 14 juillet 2016 01:26
> To: [hidden email]
> Subject: Re: Questions regarding multithreaded processing
>
> I don't know enough about multithreading to say anything intelligent, but I did
> recently see a post suggesting that in parallel processing with matlab, using
> more instances than physical cores may not produce much speed improvement:
> http://undocumentedmatlab.com/blog/a-few-parfor-tips
>
> Kurt
>
> On 7/13/2016 2:55 PM, George Patterson wrote:
> > Dear all,
> > I’ve assembled a plugin to analyze a time series on a pixel-by-pixel basis.
> > It works fine but is slow.
> > There are likely still plenty of optimizations that can be done to
> > improve the speed and thanks to Albert Cardona and Stephan Preibisch
> > sharing code and tutorials
> > (http://albert.rierol.net/imagej_programming_tutorials.html),
> > I even have a version that runs multi-threaded.
> > When run on multi-core machines the speed is improved, but I’m not
> > sure what sort of improvement I should expect.  Moreover, the machines
> > I expected to be the fastest are not.  This is likely stemming from my
> > misunderstanding of parallel processing and Java programming in
> > general so I’m hoping some of you with more experience can provide some
> > feedback.
> > I list below some observations and questions along with test runs on
> > the same data set using the same plugin on a few different machines.
> > Thanks for any suggestions.
> > George
> >
> >
> > Since the processor speeds differ, I realize the time each machine
> > takes to complete the analysis will differ.  I’m more interested in the
> > improvement from multiple threads on an individual machine.
> > In running these tests, I altered the code to use a different number
> > of threads in each run.
> > Is setting the number of threads in the code and determining the time
> > to finish the analysis a valid approach to testing improvement?
> >
> > Machine 5 is producing some odd behavior which I’ll discuss and ask
> > for suggestions below.
> >
> > For machines 1-4, the speed improves with the number of threads up to
> > about half the number of available processors.
> > Do the improvements with the number of threads listed below seem
> > reasonable?
> > Is the improvement up to only about half the number of available
> > processors due to “hyperthreading”?  My limited (and probably wrong)
> > understanding is that hyperthreading makes a single core appear to be
> > two which share resources and thus a machine with 2 cores will return
> > 4 when queried for number of cpus.  Yes, I know that is too
> > simplistic, but it’s the best I can do.
> > Could it simply be that my code is not written properly to take
> > advantage of hyperthreading?  Could anyone point me to a source and/or
> > example code explaining how I could change it to take advantage of
> > hyperthreading if this is the problem?
> >
> > Number of threads used are shown in parentheses where applicable.
> > 1. MacBook Pro 2.66 GHz Intel Core i7
> > number of processors: 1
> > Number of cores: 2
> > non-threaded plugin version ~59 sec
> > threaded (1) ~51 sec
> > threaded (2) ~36 sec
> > threaded (3) ~34 sec
> > threaded (4) ~35 sec
> >
> > 2. Mac Pro 2 x 2.26 GHz Quad-Core Intel Xeon
> > number of processors: 2
> > Number of cores: 8
> > non-threaded plugin version ~60 sec
> > threaded (1) ~59 sec
> > threaded (2) ~28.9 sec
> > threaded (4) ~15.6 sec
> > threaded (6) ~13.2 sec
> > threaded (8) ~11.3 sec
> > threaded (10) ~11.1 sec
> > threaded (12) ~11.1 sec
> > threaded (16) ~11.5 sec
> >
> > 3. Windows 7 DELL 3.2 GHz Intel Core i5
> > number of cpus shown in resource monitor: 4
> > non-threaded plugin version ~45.3 sec
> > threaded (1) ~48.3 sec
> > threaded (2) ~21.7 sec
> > threaded (3) ~20.4 sec
> > threaded (4) ~21.8 sec
> >
> > 4. Windows 7 Xi MTower 2P64 Workstation 2 x 2.1 GHz AMD Opteron 6272
> > number of cpus shown in resource monitor: 32
> > non-threaded plugin version ~162 sec
> > threaded (1) ~158 sec
> > threaded (2) ~85.1 sec
> > threaded (4) ~46 sec
> > threaded (8) ~22.9 sec
> > threaded (10) ~18.6 sec
> > threaded (12) ~16.4 sec
> > threaded (16) ~15.8 sec
> > threaded (20) ~15.7 sec
> > threaded (24) ~15.9 sec
> > threaded (32) ~16 sec
> >
> > For machines 1-4, the cpu usage can be observed in the Activity
> > Monitor (Mac) or Resource Monitor (Windows) and during the execution of the
> > plugin all of the cpus were active.  For machine 5 shown below, only
> > 22 of the 64 show activity.  And it is not always the same 22.  From
> > the example runs below you can see it really isn’t performing very
> > well considering the number of available cores.  I originally thought
> > this machine should be the best, but it barely outperforms my laptop.
> > This is probably a question for another forum, but I am wondering if
> > anyone else has encountered anything similar.
> >
> > 5. Windows Server 2012 Xi MTower 2P64 Workstation 4 x 2.4 GHz AMD Opteron 6378
> > number of cpus shown in resource monitor: 64
> > non-threaded plugin version ~140 sec
> > threaded (1) ~137 sec
> > threaded (4) ~60.3 sec
> > threaded (8) ~29.3 sec
> > threaded (12) ~22.9 sec
> > threaded (16) ~23.8 sec
> > threaded (24) ~24.1 sec
> > threaded (32) ~24.5 sec
> > threaded (40) ~24.8 sec
> > threaded (48) ~23.8 sec
> > threaded (64) ~24.8 sec
> >
> > --
> > View this message in context:
> > http://imagej.1557.x6.nabble.com/Questions-regarding-multithreaded-pro
> > cessing-tp5016878.html Sent from the ImageJ mailing list archive at
> > Nabble.com.
> >
> > --
> > ImageJ mailing list: http://imagej.nih.gov/ij/list.html
> >
>
>
> --
> Kurt Thorn
> Associate Professor
> Director, Nikon Imaging Center
> http://thornlab.ucsf.edu/
> http://nic.ucsf.edu/blog/
>