http://imagej.273.s1.nabble.com/Questions-regarding-multithreaded-processing-tp5016878p5016887.html
Thank you all for your feedback. Below I'll try to respond to the parts I
can answer.
>...if we try to share code snippets or want to comment on a particular part.
In the future, I’ll direct plugin development questions to that forum. I
... what improvements to expect with multi-threaded processing.
>A quick remark though. Seeing as we do not know HOW you implemented the
>parallel processing, it will be difficult to help.
>...going to be faster. Accessing a single pixel is extremely fast and what
>will slow you down is having each thread waiting to get its pixel index.
...problem. The atomic integer approach is what I used initially. To be
... much as it should based on the responses by Oli and Micheal. Based on
... blocks of the image to different threads. This seemed to improve the speed
modestly, by 5-10%. Thanks for the suggestion. I’ll take this approach for
any future developments.
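The block approach quoted above (give each thread a contiguous range of pixels up front, instead of having every thread ask a shared AtomicInteger for its next pixel index) can be sketched in plain Java as follows. This is a minimal sketch, not the plugin's code: the class `BlockPartition` and the per-pixel function `processPixel` are hypothetical placeholders for the real time-series analysis, and the image is assumed to be a flat `float[]` pixel array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockPartition {

    // Hypothetical per-pixel work; stands in for the real time-series analysis.
    static float processPixel(float value) {
        return (float) Math.sqrt(Math.abs(value));
    }

    // Splits the pixel array into one contiguous block per thread.
    static float[] processInBlocks(float[] pixels, int nThreads) throws Exception {
        float[] out = new float[pixels.length];
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            int blockSize = (pixels.length + nThreads - 1) / nThreads; // ceiling division
            for (int t = 0; t < nThreads; t++) {
                final int start = t * blockSize;
                final int end = Math.min(start + blockSize, pixels.length);
                if (start >= end) {
                    break; // more threads than pixels: nothing left to hand out
                }
                futures.add(pool.submit(() -> {
                    for (int i = start; i < end; i++) {
                        out[i] = processPixel(pixels[i]); // each thread touches only its own range
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for completion and rethrows any worker exception
            }
        } finally {
            pool.shutdown();
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        float[] img = new float[1_000_000];
        for (int i = 0; i < img.length; i++) {
            img[i] = i % 255;
        }
        float[] result = processInBlocks(img, Runtime.getRuntime().availableProcessors());
        System.out.println(result[16]);
    }
}
```

Because each thread writes only to its own index range, no synchronization is needed inside the loop; waiting on each `Future` is what makes the results visible to the calling thread. If the per-pixel cost varies a lot, several smaller blocks per thread usually balance the load better than exactly one.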
>...thousands or millions of Objects).
Micheal, thanks for sharing the list of potential problems. I’ll work my
way through them as well as I can. The number of objects created is the
first one I started checking. A new CurveFitter is created for every pixel, so
... collection, I guess. I still haven’t found a way around generating this
many CurveFitters.
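A common way around creating a new fitter for every pixel is to create one fitter per thread and reuse it for every pixel in that thread's block, with all scratch arrays allocated once. The sketch below only illustrates that allocation pattern: it does not use ImageJ's `CurveFitter`, it swaps in a closed-form straight-line least-squares fit, and the class `PerThreadFitter` and the `stack[frame][pixel]` layout are assumptions made for illustration.

```java
/**
 * Hypothetical reusable fitter (not ij.measure.CurveFitter): fits
 * y = a + b*t to one pixel's time course by ordinary least squares.
 * The time-dependent sums are computed once, and the scratch buffer is
 * reused for every pixel, so fitting a pixel allocates no new objects.
 * Because of the shared scratch buffer, use one instance per thread.
 */
class PerThreadFitter {
    private final double[] t;       // time points, read-only after construction
    private final double[] series;  // scratch buffer, overwritten for every pixel
    private final double sumT;
    private final double sumTT;

    PerThreadFitter(double[] timePoints) {
        t = timePoints.clone();
        series = new double[t.length];
        double s = 0, ss = 0;
        for (double v : t) {
            s += v;
            ss += v * v;
        }
        sumT = s;
        sumTT = ss;
    }

    /** Returns the fitted slope b for pixel p of a stack stored as stack[frame][pixel]. */
    double fitSlope(float[][] stack, int p) {
        int n = t.length;
        for (int f = 0; f < n; f++) {
            series[f] = stack[f][p]; // copy this pixel's time course into the reused buffer
        }
        double sumY = 0, sumTY = 0;
        for (int f = 0; f < n; f++) {
            sumY += series[f];
            sumTY += t[f] * series[f];
        }
        return (n * sumTY - sumT * sumY) / (n * sumTT - sumT * sumT);
    }

    public static void main(String[] args) {
        double[] t = {0, 1, 2, 3};
        // 4 frames x 2 pixels: pixel 0 rises by 3 per frame, pixel 1 is constant
        float[][] stack = {{2, 1}, {5, 1}, {8, 1}, {11, 1}};
        PerThreadFitter fitter = new PerThreadFitter(t); // created once, reused below
        for (int p = 0; p < 2; p++) {
            System.out.println("pixel " + p + " slope = " + fitter.fitSlope(stack, p));
        }
    }
}
```

The copy into `series` is not strictly needed for a linear fit; it is there because a general-purpose fitter typically expects the y values as an array, and reusing one array per thread avoids allocating a new one per pixel.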
...only to about half the number of CPUs. Possible? Probable? No way? Since
... probably know well, I’ll take Oli's advice and ask this on the imagej.net forum.
>We made the same kind of tests and experience as you did. We also tested
>...respective computers. We also tested different processes, from a simple
>Gaussian blur to more complex macros.
Laurent, thanks for sharing your experiences. Our issues with different
...). Maybe we should start a new query on just this ...
Thanks again for the feedback.
> Dear George,
>
> We made the same kind of tests and experience as you did. We also tested
> numerous machines with a variable number of cores declared in the ImageJ
> Option Menu, in combination with different amounts of RAM, without being
> able to draw really clear conclusions about why it is fast or slow on the
> respective computers. We also tested different processes, from a simple
> Gaussian blur to more complex macros.
>
> In a nutshell:
> We also observed awful performance on our Microsoft Windows Server 2012 / 32 CPUs
> / 512GB RAM machine, irrespective of the combination of CPUs and RAM we
> declare in ImageJ. Clearly, giving more than 16 CPUs to ImageJ does not
> improve overall speed; sometimes it even decreases it. Note that this very
> same machine is really fast when using Matlab and the parallel processing
> toolbox.
> Until recently, the fastest computers we could find to run ImageJ were my
> iMac, which runs Windows 7 (:-)) (specs: i7-4771 CPU 3.5GHz, 32GB RAM),
> and the HIVE (hexacore machine) sold by the company Acquifer (no commercial
> interest). Until then, we thought the speed of the individual CPUs was the key,
> rather than their number, but we were really surprised recently when we tested the
> new virtual machines (VMs) our IT department set up for us to do some
> remote processing of very big microscopy datasets (24 cores, 128 to 256 GB
> RAM for each VM). Although the CPUs on the physical servers are not that
> fast (2.5 GHz, but is this really a good measure of computation speed? I am
> not sure...), we measured that our VMs were the fastest machines we tested
> so far. So we actually no longer have any theory about ImageJ and speed. It is
> also not clear to us whether having Windows 7 or Windows Server 2012
> makes a difference.
> Finally, I should mention that when you use complex processes, for example
> Stitching, the speed of the individual CPUs is also important, as we had
> the impression that the reading/loading of the file uses only one core.
> Here again, we could see a beautiful correlation between CPU speed (GHz
> specs) and processing speed.
>
> Current solution:
> If we really need to be very fast,
> 1. we write an ImageJ macro in Python and launch multiple threads in
> parallel, but we observed that the whole was not "thread safe", i.e. we see
> "collisions" between the different processes.
> 2. we write a Python program to launch multiple ImageJ instances in
> headless mode and pass the macro to them this way.
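Laurent's two workarounds differ mainly in whether the parallel workers share state. The Java sketch below (hypothetical class `SharedStateDemo`, unrelated to the actual macros, which are not shown) contrasts threads that all update one unsynchronized counter, which can silently lose updates, with workers that each keep a private result that is combined only at the end, similar to separate ImageJ instances, which share no memory. That shared mutable state caused the observed "collisions" is an assumption, not something the message establishes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedStateDemo {
    // Shared and unsynchronized: concurrent "++" is a read-modify-write, so updates can be lost.
    static int unsafeTotal = 0;

    static int countUnsafe(int nThreads, int perThread) throws InterruptedException {
        unsafeTotal = 0;
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    unsafeTotal++; // data race: result may be below nThreads * perThread
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return unsafeTotal;
    }

    // Each worker keeps a private tally; results are combined only after all workers finish.
    static long countIsolated(int nThreads, int perThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int i = 0; i < nThreads; i++) {
                Callable<Long> task = () -> {
                    long local = 0; // thread-confined, so no collisions
                    for (int k = 0; k < perThread; k++) {
                        local++;
                    }
                    return local;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared counter: " + countUnsafe(8, 1_000_000) + " (expected 8000000)");
        System.out.println("isolated tallies: " + countIsolated(8, 1_000_000));
    }
}
```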
>
> I would also be delighted to understand what makes ImageJ go fast or slow
> on a computer, that would help us to purchase the right machines from the
> beginning.
>
> Very best regards,
>
> Laurent.
>
> ___________________________
> Laurent Gelman, PhD
> Friedrich Miescher Institut
> Head, Facility for Advanced Imaging and Microscopy
> Light microscopy
> WRO 1066.2.16
> Maulbeerstrasse 66
> CH-4058 Basel
> +41 (0)61 696 35 13
> +41 (0)79 618 73 69
> www.fmi.ch
> www.microscopynetwork.unibas.ch/
>
>
> -----Original Message-----
> From: George Patterson [mailto:[hidden email]]
> Sent: Wednesday, 13 July 2016 23:55
> Subject: Questions regarding multithreaded processing
>
> Dear all,
> I’ve assembled a plugin to analyze a time series on a pixel-by-pixel basis.
> It works fine but is slow.
> There are likely still plenty of optimizations that can be done to improve
> the speed, and thanks to Albert Cardona and Stephan Preibisch sharing code
> and tutorials (http://albert.rierol.net/imagej_programming_tutorials.html),
> I even have a version that runs multi-threaded.
> When run on multi-core machines the speed is improved, but I’m not sure
> what sort of improvement I should expect. Moreover, the machines I
> expected to be the fastest are not. This is likely stemming from my
> misunderstanding of parallel processing and Java programming in general so
> I’m hoping some of you with more experience can provide some feedback.
> I list below some observations and questions along with test runs on the
> same data set using the same plugin on a few different machines.
> Thanks for any suggestions.
> George
>
>
> Since the processing speeds differ, I realize the time each machine needs
> to complete the analysis will differ. I’m more interested in the improvement
> from multiple threads on an individual machine.
> In running these tests, I altered the code to use a different number of
> threads in each run.
> Is setting the number of threads in the code and determining the time to
> finish the analysis a valid approach to testing improvement?
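Setting the thread count in code and timing each run, as described above, is a reasonable way to measure scaling, with two caveats worth building in: the JVM's JIT compiler typically makes the first run(s) slower, so untimed warm-up runs help, and repeating each measurement and taking the median reduces noise. A minimal sketch, where `ThreadScaling` and `dummyWork` are hypothetical stand-ins for the plugin and its workload:

```java
import java.util.Arrays;
import java.util.function.IntConsumer;

class ThreadScaling {
    /** Runs work(nThreads) `warmups` times untimed, then `reps` timed runs; returns the median in seconds. */
    static double medianSeconds(IntConsumer work, int nThreads, int warmups, int reps) {
        for (int i = 0; i < warmups; i++) {
            work.accept(nThreads); // lets the JIT compile the hot code before timing
        }
        double[] secs = new double[reps];
        for (int i = 0; i < reps; i++) {
            long start = System.nanoTime(); // intended for measuring elapsed time
            work.accept(nThreads);
            secs[i] = (System.nanoTime() - start) / 1e9;
        }
        Arrays.sort(secs);
        return secs[reps / 2]; // median (upper median when reps is even)
    }

    /** Hypothetical CPU-bound workload split evenly across nThreads plain threads. */
    static void dummyWork(int nThreads) {
        Thread[] threads = new Thread[nThreads];
        final int share = 5_000_000 / nThreads;
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                double acc = 0;
                for (int i = 0; i < share; i++) {
                    acc += Math.sin(i);
                }
                if (acc == 42) {
                    System.out.println(); // uses acc so the JIT is less likely to drop the loop
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        double t1 = medianSeconds(ThreadScaling::dummyWork, 1, 2, 5);
        for (int n = 1; n <= Runtime.getRuntime().availableProcessors(); n *= 2) {
            double tn = medianSeconds(ThreadScaling::dummyWork, n, 2, 5);
            System.out.printf("%2d threads: %.3f s, speedup %.2f%n", n, tn, t1 / tn);
        }
    }
}
```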
>
> Machine 5 is producing some odd behavior which I’ll discuss and ask for
> suggestions below.
>
> For machines 1-4, the speed improves with the number of threads up to
> about half the number of available processors.
> Do the improvements with the number of threads listed below seem
> reasonable?
> Is the improvement up to only about half the number of available
> processors due to “hyperthreading”? My limited (and probably wrong)
> understanding is that hyperthreading makes a single core appear to be two
> that share resources, so a machine with 2 cores will return 4 when
> queried for the number of CPUs. Yes, I know that is too simplistic, but it’s
> the best I can do.
> Could it simply be that my code is not written properly to take advantage
> of hyperthreading? Could anyone point me to a source and/or example code
> explaining how I could change it to take advantage of hyperthreading if
> this is the problem?
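On the hyperthreading question: the JVM reports only logical processors through `Runtime.getRuntime().availableProcessors()`, which on a machine with hyperthreading is typically twice the number of physical cores, and the standard Java library has no call that returns the physical core count. A small helper for choosing which thread counts to benchmark might look like this; the class name and the powers-of-two schedule are illustrative choices, not anything ImageJ prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ThreadCounts {
    /** Logical processors visible to the JVM (hyperthreads count separately). */
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Powers of two below `logical`, plus `logical` itself, e.g. 12 -> [1, 2, 4, 8, 12]. */
    static int[] candidateThreadCounts(int logical) {
        List<Integer> counts = new ArrayList<>();
        for (int n = 1; n < logical; n *= 2) {
            counts.add(n);
        }
        counts.add(Math.max(1, logical));
        return counts.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int logical = logicalProcessors();
        System.out.println("logical processors: " + logical);
        System.out.println("thread counts to try: " + Arrays.toString(candidateThreadCounts(logical)));
    }
}
```

If timings flatten out at about half of the logical count, that is consistent with, though not proof of, pairs of logical processors sharing execution resources, so extra threads beyond that point mostly compete with each other.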
>
> Number of threads used are shown in parentheses where applicable.
> 1. MacBook Pro 2.66 GHz Intel Core i7
> number of processors: 1
> Number of cores: 2
> non-threaded plugin version ~59 sec
> threaded (1) ~51 sec
> threaded (2) ~36 sec
> threaded (3) ~34 sec
> threaded (4) ~35 sec
>
> 2. Mac Pro 2 x 2.26 GHz Quad-Core Intel Xeon
> number of processors: 2
> Number of cores: 8
> non-threaded plugin version ~60 sec
> threaded (1) ~59 sec
> threaded (2) ~28.9 sec
> threaded (4) ~15.6 sec
> threaded (6) ~13.2 sec
> threaded (8) ~11.3 sec
> threaded (10) ~11.1 sec
> threaded (12) ~11.1 sec
> threaded (16) ~11.5 sec
>
> 3. Windows 7 DELL 3.2 GHz Intel Core i5
> number of cpus shown in resource monitor: 4
> non-threaded plugin version ~45.3 sec
> threaded (1) ~48.3 sec
> threaded (2) ~21.7 sec
> threaded (3) ~20.4 sec
> threaded (4) ~21.8 sec
>
> 4. Windows 7 Xi MTower 2P64 Workstation 2 x 2.1 GHz AMD Opteron 6272
> number of cpus shown in resource monitor: 32
> non-threaded plugin version ~162 sec
> threaded (1) ~158 sec
> threaded (2) ~85.1 sec
> threaded (4) ~46 sec
> threaded (8) ~22.9 sec
> threaded (10) ~18.6 sec
> threaded (12) ~16.4 sec
> threaded (16) ~15.8 sec
> threaded (20) ~15.7 sec
> threaded (24) ~15.9 sec
> threaded (32) ~16 sec
>
> For machines 1-4, the cpu usage can be observed in the Activity Monitor
> (Mac) or Resource Monitor (Windows) and during the execution of the plugin
> all of the cpus were active. For machine 5 shown below, only 22 of the 64
> show activity. And it is not always the same 22. From the example runs
> below you can see it really isn’t performing very well considering the
> number of available cores. I originally thought this machine should be the
> best, but it barely outperforms my laptop. This is probably a question for
> another forum, but I am wondering if anyone else has encountered anything
> similar.
>
> 5. Windows Server 2012 Xi MTower 2P64 Workstation 4 x 2.4 GHz AMD Opteron 6378
> number of cpus shown in resource monitor: 64
> non-threaded plugin version ~140 sec
> threaded (1) ~137 sec
> threaded (4) ~60.3 sec
> threaded (8) ~29.3 sec
> threaded (12) ~22.9 sec
> threaded (16) ~23.8 sec
> threaded (24) ~24.1 sec
> threaded (32) ~24.5 sec
> threaded (40) ~24.8 sec
> threaded (48) ~23.8 sec
> threaded (64) ~24.8 sec
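To judge whether the timings above are "reasonable", it can help to convert them into speedup S(n) = T(1)/T(n), parallel efficiency E(n) = S(n)/n, and the Karp-Flatt experimentally determined serial fraction e(n) = (1/S(n) - 1/n)/(1 - 1/n). Roughly constant e(n) points to a fixed serial portion of the work (the Amdahl's law picture), while e(n) growing with n points to overhead that increases with the thread count, such as contention or memory bandwidth. The sketch below applies these formulas to the machine 2 and machine 5 timings reported above, using each machine's threaded (1) run as T(1); the class name is hypothetical.

```java
class ScalingMetrics {
    static double speedup(double t1, double tn) {
        return t1 / tn;
    }

    static double efficiency(double t1, double tn, int n) {
        return speedup(t1, tn) / n;
    }

    /** Karp-Flatt serial fraction; only defined for n > 1. */
    static double karpFlatt(double t1, double tn, int n) {
        if (n <= 1) {
            throw new IllegalArgumentException("n must be > 1");
        }
        double s = speedup(t1, tn);
        return (1.0 / s - 1.0 / n) / (1.0 - 1.0 / n);
    }

    static void report(String name, double t1, int[] threads, double[] secs) {
        System.out.println(name);
        for (int i = 0; i < threads.length; i++) {
            int n = threads[i];
            System.out.printf("  %2d threads: speedup %.2f, efficiency %.2f, serial fraction %.3f%n",
                    n, speedup(t1, secs[i]), efficiency(t1, secs[i], n), karpFlatt(t1, secs[i], n));
        }
    }

    public static void main(String[] args) {
        // Timings copied from the message above (seconds); T(1) is the threaded (1) run.
        report("Machine 2 (Mac Pro, 8 cores)", 59.0,
                new int[] {2, 4, 6, 8, 16}, new double[] {28.9, 15.6, 13.2, 11.3, 11.5});
        report("Machine 5 (64 logical CPUs)", 137.0,
                new int[] {4, 8, 12, 16, 64}, new double[] {60.3, 29.3, 22.9, 23.8, 24.8});
    }
}
```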
>
> --
> View this message in context:
> http://imagej.1557.x6.nabble.com/Questions-regarding-multithreaded-processing-tp5016878.html
> Sent from the ImageJ mailing list archive at Nabble.com.
>
> --
> ImageJ mailing list: http://imagej.nih.gov/ij/list.html