Posted by George Patterson on Jul 13, 2016; 9:55pm
URL: http://imagej.273.s1.nabble.com/Questions-regarding-multithreaded-processing-tp5016878.html
Dear all,
I’ve assembled a plugin to analyze a time series on a pixel-by-pixel basis. It works fine but is slow.
There are likely still plenty of optimizations that could improve the speed, and thanks to Albert Cardona and Stephan Preibisch sharing code and tutorials (http://albert.rierol.net/imagej_programming_tutorials.html), I even have a version that runs multithreaded.
On multi-core machines the speed improves, but I’m not sure what sort of improvement I should expect. Moreover, the machines I expected to be the fastest are not. This likely stems from my misunderstanding of parallel processing and of Java programming in general, so I’m hoping some of you with more experience can provide some feedback.
Below I list some observations and questions, along with test runs of the same plugin on the same data set on a few different machines.
Thanks for any suggestions.
George
Since the processor speeds differ, I realize each machine will take a different amount of time to complete the analysis. I’m more interested in the improvement from multiple threads on an individual machine.
In running these tests, I altered the code to use a different number of threads in each run.
Is setting the number of threads in the code and measuring the time to finish the analysis a valid way to test the improvement?
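For reference, a minimal sketch of that kind of measurement in plain Java, assuming the analysis can be wrapped in a call that takes the thread count (the `IntConsumer` workload here is a hypothetical stand-in, not the plugin). It uses `System.nanoTime()` for wall-clock intervals, does one untimed warm-up run so JIT compilation is not charged to the first measurement, and keeps the best of a few repeats per thread count.

```java
import java.util.function.IntConsumer;

class ThreadTiming {

    // Wall-clock seconds for one call of work.accept(nThreads); nanoTime is a monotonic interval clock.
    static double timeRun(IntConsumer work, int nThreads) {
        long start = System.nanoTime();
        work.accept(nThreads);
        return (System.nanoTime() - start) / 1e9;
    }

    // One untimed warm-up run, then the best (minimum) of `repeats` timings for each thread count.
    static double[] benchmark(IntConsumer work, int[] threadCounts, int repeats) {
        work.accept(threadCounts[0]); // warm-up, result discarded
        double[] best = new double[threadCounts.length];
        for (int i = 0; i < threadCounts.length; i++) {
            best[i] = Double.MAX_VALUE;
            for (int r = 0; r < repeats; r++) {
                best[i] = Math.min(best[i], timeRun(work, threadCounts[i]));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload: a fixed amount of arithmetic split evenly across n threads.
        IntConsumer work = n -> {
            Thread[] threads = new Thread[n];
            for (int k = 0; k < n; k++) {
                threads[k] = new Thread(() -> {
                    double acc = 0;
                    for (int j = 0; j < 20_000_000 / n; j++) acc += Math.sqrt(j);
                    if (acc < 0) System.out.println(acc); // keeps the loop from being optimized away
                });
                threads[k].start();
            }
            for (Thread th : threads) {
                try {
                    th.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        int[] counts = {1, 2, 4};
        double[] t = benchmark(work, counts, 3);
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("threads=%d  best=%.3f s%n", counts[i], t[i]);
        }
    }
}
```

Taking the minimum of several runs reduces the influence of other processes on the machine; the absolute numbers will of course depend on the real workload.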
Machine 5 shows some odd behavior, which I discuss below along with a request for suggestions.
For machines 1-4, the speed improves with the number of threads up to about half the number of available processors.
Do the improvements with the number of threads listed below seem reasonable?
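One common yardstick for “what improvement should I expect” is Amdahl’s law. Here T(n) is the run time with n threads, S(n) the speedup, E(n) the parallel efficiency, and p the fraction of the single-threaded run time that can run in parallel. The value p = 0.95 in the worked line is an assumed number for illustration only, not something measured from this plugin, and the bound assumes each thread has a full core to itself.

```latex
S(n) = \frac{T(1)}{T(n)}, \qquad E(n) = \frac{S(n)}{n}, \qquad
S(n) \le \frac{1}{(1 - p) + p/n}

% Worked example with an assumed p = 0.95:
S(8) \le \frac{1}{0.05 + 0.95/8} = \frac{1}{0.16875} \approx 5.93,
\qquad \lim_{n \to \infty} S(n) \le \frac{1}{1 - p} = 20
```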
Is the fact that the improvement stops at about half the number of available processors due to “hyperthreading”? My limited (and probably wrong) understanding is that hyperthreading makes a single core appear as two logical processors that share its resources, so a machine with 2 cores will report 4 when queried for the number of CPUs. Yes, I know that is too simplistic, but it’s the best I can do.
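As a small illustration of what Java itself sees, the standard call `Runtime.getRuntime().availableProcessors()` returns the number of processors available to the JVM, which on a hyperthreaded CPU normally counts logical processors (two per core) rather than physical cores. The standard library has no call for the physical core count, so that has to be checked in the OS tools (Activity Monitor, Resource Monitor) or the CPU specifications. The class name below is hypothetical.

```java
class CpuCount {

    // Logical processors the JVM may use; the Javadoc guarantees this is never less than 1.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        // Compare this number with the physical core count reported by the OS or the CPU spec sheet.
        System.out.println("Logical processors visible to the JVM: " + logicalProcessors());
    }
}
```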
Could it simply be that my code is not written properly to take advantage of hyperthreading? If this is the problem, could anyone point me to a source and/or example code explaining how I could change it?
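For comparison, here is a minimal sketch of one common way to split a pixel-by-pixel analysis across threads: a `Thread[]` array whose workers claim blocks of pixels from a shared `AtomicInteger`, in the spirit of the tutorials linked above but not copied from them. It assumes the time series is held as a `float[frames][pixels]` array, and `analyzePixel` (here just the mean over time) is a hypothetical stand-in for the real per-pixel computation. Whether the second logical processor on each core helps depends mostly on the workload; this code does nothing hyperthreading-specific.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class PixelParallel {

    // Hypothetical per-pixel analysis (stand-in for the real one): mean intensity over time.
    static float analyzePixel(float[][] series, int pixel) {
        double sum = 0;
        for (float[] frame : series) sum += frame[pixel];
        return (float) (sum / series.length);
    }

    // Analyzes every pixel with nThreads workers that claim blocks of pixels from a shared counter.
    static float[] run(float[][] series, int nThreads) {
        int nPixels = series[0].length;
        float[] result = new float[nPixels];      // each index is written by exactly one thread, so no locking
        int block = 1024;                          // pixels claimed per counter update; limits contention
        AtomicInteger next = new AtomicInteger(0);
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int start = next.getAndAdd(block); start < nPixels; start = next.getAndAdd(block)) {
                    int end = Math.min(start + block, nPixels);
                    for (int p = start; p < end; p++) result[p] = analyzePixel(series, p);
                }
            });
            threads[i].start();
        }
        for (Thread th : threads) {
            try {
                th.join(); // join also guarantees the worker's writes to result are visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int frames = 50, pixels = 256 * 256;
        float[][] series = new float[frames][pixels];
        for (int f = 0; f < frames; f++) Arrays.fill(series[f], f); // every pixel ramps 0..49 over time
        float[] out = run(series, Runtime.getRuntime().availableProcessors());
        System.out.println(out[0] + " " + out[pixels - 1]); // prints "24.5 24.5"
    }
}
```

Claiming work in blocks rather than one pixel at a time keeps threads busy until the end (no thread finishes early with an idle fixed share) while avoiding an atomic update per pixel.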
The number of threads used is shown in parentheses where applicable.
1. MacBook Pro 2.66 GHz Intel Core i7
Number of processors: 1
Number of cores: 2
non-threaded plugin version ~59 sec
threaded (1) ~51 sec
threaded (2) ~36 sec
threaded (3) ~34 sec
threaded (4) ~35 sec
2. Mac Pro 2 x 2.26 GHz Quad-Core Intel Xeon
Number of processors: 2
Number of cores: 8
non-threaded plugin version ~60 sec
threaded (1) ~59 sec
threaded (2) ~28.9 sec
threaded (4) ~15.6 sec
threaded (6) ~13.2 sec
threaded (8) ~11.3 sec
threaded (10) ~11.1 sec
threaded (12) ~11.1 sec
threaded (16) ~11.5 sec
3. Windows 7 DELL 3.2 GHz Intel Core i5
Number of CPUs shown in Resource Monitor: 4
non-threaded plugin version ~45.3 sec
threaded (1) ~48.3 sec
threaded (2) ~21.7 sec
threaded (3) ~20.4 sec
threaded (4) ~21.8 sec
4. Windows 7 Xi MTower 2P64 Workstation 2 x 2.1 GHz AMD Opteron 6272
Number of CPUs shown in Resource Monitor: 32
non-threaded plugin version ~162 sec
threaded (1) ~158 sec
threaded (2) ~85.1 sec
threaded (4) ~46 sec
threaded (8) ~22.9 sec
threaded (10) ~18.6 sec
threaded (12) ~16.4 sec
threaded (16) ~15.8 sec
threaded (20) ~15.7 sec
threaded (24) ~15.9 sec
threaded (32) ~16 sec
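To put the timings above on a common scale, a short sketch that computes speedup S(n) = T(1)/T(n) and efficiency E(n) = S(n)/n from machine 4’s numbers. It uses the threaded (1) run (158 sec) as T(1) rather than the non-threaded version; that choice is an assumption, and the class name is hypothetical.

```java
class Speedup {

    // Speedup relative to the single-thread time.
    static double speedup(double t1, double tn) {
        return t1 / tn;
    }

    // Fraction of ideal linear speedup achieved with n threads.
    static double efficiency(double t1, double tn, int n) {
        return speedup(t1, tn) / n;
    }

    public static void main(String[] args) {
        // Machine 4 timings from the list above (threaded version, seconds).
        int[] threads = {1, 2, 4, 8, 10, 12, 16, 20, 24, 32};
        double[] secs = {158, 85.1, 46, 22.9, 18.6, 16.4, 15.8, 15.7, 15.9, 16};
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("n=%2d  S=%5.2f  E=%4.2f%n", threads[i],
                    speedup(secs[0], secs[i]), efficiency(secs[0], secs[i], threads[i]));
        }
    }
}
```

For example, 8 threads give S ≈ 6.90 (E ≈ 0.86) and 16 threads give S ≈ 10; beyond 16 threads the speedup stays essentially flat.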
For machines 1-4, CPU usage can be observed in Activity Monitor (Mac) or Resource Monitor (Windows), and all of the CPUs were active while the plugin ran. For machine 5, shown below, only 22 of the 64 show activity, and it is not always the same 22. As the example runs below show, it performs poorly considering the number of available cores. I originally thought this machine would be the best, but it barely outperforms my laptop. This is probably a question for another forum, but I am wondering whether anyone else has encountered anything similar.
5. Windows Server 2012 Xi MTower 2P64 Workstation 4 x 2.4 GHz AMD Opteron 6378
Number of CPUs shown in Resource Monitor: 64
non-threaded plugin version ~140 sec
threaded (1) ~137 sec
threaded (4) ~60.3 sec
threaded (8) ~29.3 sec
threaded (12) ~22.9 sec
threaded (16) ~23.8 sec
threaded (24) ~24.1 sec
threaded (32) ~24.5 sec
threaded (40) ~24.8 sec
threaded (48) ~23.8 sec
threaded (64) ~24.8 sec