http://imagej.273.s1.nabble.com/Questions-regarding-multithreaded-processing-tp5016878p5016889.html
CurveFitter. You would not reach me on the developers' mailing list [...]
[...] I think I might eventually need it).
[...] indeed. It does not use the Minimizer (and thus, only one Thread) for
[...]
- "Power (linear regression)" 'a*x^b'.
[...] e.g. if the Minimizer does not find two consistent solutions [...]
of three tries.
[...] regression class that can be reused without creating a new object.
[...] linear regression with only two parameters (e.g. polynomial, [...] Jama).
[...] objects, so having one per pixel really induces a lot of garbage collection.
[...] Minimizer reusable (e.g. with a CurveFitter.setData(double[] xData,
double[] yData) method, which clears the previous result and settings).
> Hi all,
>
> Thank you all for your feedback. Below I'll try to respond to the parts I
> can answer.
>
>
>> Seeing as this bit is a bit more technical and closer to a plugin
>> development question, would you mind posting it on
>> http://forum.imagej.net ?
>> Long technical email threads like this one tend to get muddy, especially
>> if we try to share code snippets or want to comment on a particular part.
>
> In the future, I’ll direct plugin development questions to that forum. I
> didn’t bother sharing code at this point since I just wanted to know
> what improvements to expect with multi-threaded processing.
>
>
>> A quick remark though. Seeing as we do not know HOW you implemented the
>> parallel processing, it will be difficult to help.
>
>> Some notes: If you 'simply' make a bunch of threads where each accesses a
>> given pixel in a loop through an atomic integer for example, it is not
>> going to be faster. Accessing a single pixel is extremely fast and what
>> will slow you down is having each thread waiting to get its pixel index.
>
> As I mentioned, I didn’t know what to expect, so I wasn’t sure I had a
> problem. The atomic integer approach is what I used initially. To be
> clear, the speed does improve with more threads, it just doesn’t improve as
> much as it should based on the responses by Oli and Michael. Following
> their suggestions, I changed the code to assign different blocks of the
> image to different threads. This improved the speed modestly, by 5-10%.
> Thanks for the suggestion. I’ll take this approach for any future
> developments.
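For reference, the block-per-thread scheme can be sketched roughly as below. This is a minimal illustration, not the actual plugin code; `processPixel` and all other names are stand-ins for the real per-pixel fit. Each thread loops over its own contiguous range of rows, so no thread ever waits on a shared pixel-index counter.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockThreading {
    // Stand-in for the real per-pixel work (e.g. the curve fit).
    static double processPixel(float v) {
        return Math.sqrt(v * v + 1.0);
    }

    // Give each thread its own contiguous block of rows, so threads
    // never contend for a shared atomic pixel-index counter.
    static double[] processBlocks(float[] pixels, int width, int height, int nThreads) {
        double[] out = new double[pixels.length];
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            final int yStart = t * height / nThreads;      // block boundaries
            final int yEnd = (t + 1) * height / nThreads;
            Thread thread = new Thread(() -> {
                for (int y = yStart; y < yEnd; y++)
                    for (int x = 0; x < width; x++) {
                        int i = y * width + x;
                        out[i] = processPixel(pixels[i]);
                    }
            });
            threads.add(thread);
            thread.start();
        }
        try {
            for (Thread th : threads) th.join();           // wait for all blocks
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }
}
```

Because each block is independent and writes to disjoint parts of the output array, no synchronization is needed inside the loop.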
>
>
>> IMHO the most important point for efficient parallelization (and efficient
>> Java code anyhow) is avoiding creating lots of objects that need garbage
>> collection (don't care about dozens, but definitely avoid hundreds of
>> thousands or millions of Objects).
>
> Michael, thanks for sharing the list of potential problems. I’ll work my
> way through them as well as I can. The number of objects created is the
> first thing I started checking. A new CurveFitter is created for every
> pixel, so for a 256x256x200 stack >65,000 are created and subjected to
> garbage collection, I guess. I still haven’t found a way around generating
> this many CurveFitters.
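A common way around per-pixel allocation is one reusable fit object per thread, refilled with new data for each pixel. CurveFitter takes its data in the constructor and has no setData() (the fragment at the top proposes adding one), so the sketch below uses a hypothetical hand-rolled least-squares line fitter, not an ImageJ class, purely to illustrate the reuse pattern for fits that reduce to linear regression:

```java
// A minimal reusable least-squares line fitter (y = a + b*x).
// One instance per thread, reused across all pixels of that thread's
// block, so no per-pixel allocation and no garbage-collection pressure.
public class ReusableLineFit {
    private double a, b;   // intercept and slope of the most recent fit

    // Refit on new data; overwrites the previous result, similar in
    // spirit to the CurveFitter.setData() method suggested above.
    public void fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i];
        }
        b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        a = (sy - b * sx) / n;
    }

    public double intercept() { return a; }
    public double slope()     { return b; }
}
```

Exponential and power fits can be routed through the same object by log-transforming the data first, which is essentially what the "(linear regression)" fit types do.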
>
> This led me to look more closely at the CurveFitter documentation and I
> found https://imagej.nih.gov/ij/docs/curve-fitter.html where it
> indicates “Two threads with two independent minimization runs, with
> repetition until two identical results are found, to avoid local minima or
> keeping the result of a stuck simplex.” Does this mean that for each
> thread that generates a new CurveFitter, the CurveFitter generates a second
> thread on its own? If so, then my plugin is generating twice as many
> threads as I think, which might explain why my speed improvement is
> observed only up to about half the number of cpus. Possible? Probable? No
> way? Since this is maybe getting into some technical bits which the plugin
> developers probably know well, I’ll take Oli's advice and ask this on the
> imagej.net forum.
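One way to test the extra-threads hypothesis empirically is to count live JVM threads while the plugin is running. This is only a rough diagnostic (`Thread.activeCount()` returns an estimate for the current thread group), but if helper threads from the minimizer double the count, it shows up clearly:

```java
public class ThreadCountProbe {
    // Estimate of live threads in the current thread group; calling
    // this from inside the plugin's processing loop reveals whether
    // helper threads roughly double the expected worker count.
    static int liveThreads() {
        return Thread.activeCount();
    }

    public static void main(String[] args) {
        System.out.println("Live threads: " + liveThreads());
    }
}
```

Comparing the value printed during a run with the number of worker threads the plugin itself started answers the question directly.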
>
>
>> We made the same kind of tests and experience as you did. We also tested
>> numerous machines with a variable number of cores declared in the ImageJ
>> Option Menu, in combination with different amounts of RAM, without being
>> able to draw really clear conclusions about why it is fast or slow on the
>> respective computers. We also tested different processes, from a simple
>> Gaussian blur to more complex macros.
>
> Laurent, thanks for sharing your experiences. Our issues with different
> machines might be better answered on another forum (maybe
> http://forum.imagej.net ). Maybe we should start a new query on just this
> topic?
>
>
> Thanks again for the feedback.
>
> George
>
>
>
> On Fri, Jul 15, 2016 at 3:49 AM, Gelman, Laurent <[hidden email]>
> wrote:
>
>> Dear George,
>>
>> We made the same kind of tests and experience as you did. We also tested
>> numerous machines with a variable number of cores declared in the ImageJ
>> Option Menu, in combination with different amounts of RAM, without being
>> able to draw really clear conclusions about why it is fast or slow on the
>> respective computers. We also tested different processes, from a simple
>> Gaussian blur to more complex macros.
>>
>> In a nutshell:
>> We also observed awful performance on our Microsoft Server 2012 / 32 CPUs
>> / 512GB RAM machine, irrespective of the combination of CPUs and RAM we
>> declare in ImageJ. Certainly, giving more than 16 CPUs to ImageJ does not
>> improve overall speed; sometimes it even decreases it. Note that this very
>> same machine is really fast when using Matlab and the parallel processing
>> toolbox.
>> Until recently, the fastest computers we could find to run ImageJ were my
>> iMac, which runs Windows 7 (:-)) (specs: i7-4771 CPU 3.5GHz, 32GB RAM),
>> and the HIVE (hexacore machine) sold by the company Acquifer (no commercial
>> interest). Until then, we thought the speed of individual CPUs was the key,
>> rather than their number, but we got really surprised lately when we tested the
>> new virtual machines (VMs) our IT department set up for us to do some
>> remote processing of very big microscopy datasets (24 cores, 128 to 256 GB
>> RAM for each VM). Although the CPUs on the physical servers are not that
>> fast (2.5 GHz, but is this really a good measure of computation speed? I am
>> not sure...), we measured that our VMs were the fastest machines we tested
>> so far. So we have actually no theory anymore about ImageJ and speed. It is
>> not clear to us either, whether having Windows 7 or Windows server 2012
>> makes a difference.
>> Finally, I should mention that when you use complex processes, for example
>> Stitching, the speed of the individual CPUs is also important, as we had
>> the impression that the reading/loading of the file uses only one core.
>> There again, we could see a beautiful correlation between CPU speed (GHz
>> specs) and the process.
>>
>> Current solution:
>> If we really need to be very fast,
>> 1. we write an ImageJ script in Python and launch multiple threads in
>> parallel, but we observed that the whole was not "thread safe", i.e. we see
>> "collisions" between the different processes.
>> 2. we write a Python program to launch multiple ImageJ instances in
>> headless mode and pass the macro this way.
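The multiple-headless-instances approach can also be sketched from Java with ProcessBuilder. This is a hedged outline: the executable and macro paths are placeholders, and `--headless -macro` are the ImageJ launcher flags as documented; separate JVM processes avoid the thread-safety collisions seen with in-process parallelism, since the OS scheduler spreads them over the cores.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HeadlessLauncher {
    // Build the command line for one headless ImageJ instance.
    static List<String> command(String imagejExe, String macroPath, String input) {
        List<String> cmd = new ArrayList<>();
        cmd.add(imagejExe);     // e.g. path to the ImageJ launcher (placeholder)
        cmd.add("--headless");  // run without a GUI
        cmd.add("-macro");
        cmd.add(macroPath);     // the macro to run (placeholder)
        cmd.add(input);         // macro argument, e.g. an input file
        return cmd;
    }

    // Start one instance per input and wait for all of them to finish.
    static void launchAll(String imagejExe, String macroPath, List<String> inputs)
            throws IOException, InterruptedException {
        List<Process> procs = new ArrayList<>();
        for (String input : inputs) {
            ProcessBuilder pb = new ProcessBuilder(command(imagejExe, macroPath, input));
            pb.inheritIO();               // forward each instance's output
            procs.add(pb.start());
        }
        for (Process p : procs)
            p.waitFor();
    }
}
```

Starting all instances at once is the simplest version; in practice one would cap the number of concurrent processes at the core count.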
>>
>> I would also be delighted to understand what makes ImageJ go fast or slow
>> on a computer, that would help us to purchase the right machines from the
>> beginning.
>>
>> Very best regards,
>>
>> Laurent.
>>
>> ___________________________
>> Laurent Gelman, PhD
>> Friedrich Miescher Institut
>> Head, Facility for Advanced Imaging and Microscopy
>> Light microscopy
>> WRO 1066.2.16
>> Maulbeerstrasse 66
>> CH-4058 Basel
>> +41 (0)61 696 35 13
>> +41 (0)79 618 73 69
>> www.fmi.ch
>> www.microscopynetwork.unibas.ch/
>>
>>
>> -----Original Message-----
>> From: George Patterson [mailto:[hidden email]]
>> Sent: mercredi 13 juillet 2016 23:55
>> Subject: Questions regarding multithreaded processing
>>
>> Dear all,
>> I’ve assembled a plugin to analyze a time series on a pixel-by-pixel basis.
>> It works fine but is slow.
>> There are likely still plenty of optimizations that can be done to improve
>> the speed, and thanks to Albert Cardona and Stephan Preibisch sharing code
>> and tutorials (http://albert.rierol.net/imagej_programming_tutorials.html),
>> I even have a version that runs multi-threaded.
>> When run on multi-core machines the speed is improved, but I’m not sure
>> what sort of improvement I should expect. Moreover, the machines I
>> expected to be the fastest are not. This is likely stemming from my
>> misunderstanding of parallel processing and Java programming in general so
>> I’m hoping some of you with more experience can provide some feedback.
>> I list below some observations and questions along with test runs on the
>> same data set using the same plugin on a few different machines.
>> Thanks for any suggestions.
>> George
>>
>>
>> Since the processing speeds differ, I realize the time for each machine
>> to complete the analysis will differ. I’m more interested in the improvement
>> from multiple threads on an individual machine.
>> In running these tests, I altered the code to use a different number of
>> threads in each run.
>> Is setting the number of threads in the code and determining the time to
>> finish the analysis a valid approach to testing improvement?
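Timing runs with a varying thread count is a reasonable approach, provided the JVM is warmed up first (so the JIT has compiled the hot code) and each measurement is repeated. A generic harness sketch, with the workload a stand-in for the real plugin:

```java
public class SpeedupTest {
    // Wall-clock time of one run of a task, in milliseconds.
    static long timeMillis(Runnable task) {
        long t0 = System.nanoTime();
        task.run();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in workload; the real test would run the plugin with a
        // given thread count.
        Runnable work = () -> {
            double s = 0;
            for (int i = 1; i < 2_000_000; i++) s += Math.log(i);
        };
        work.run();                        // warm-up: JIT compilation, class loading
        for (int rep = 0; rep < 3; rep++)  // repeat; compare medians, not single runs
            System.out.println(timeMillis(work) + " ms");
    }
}
```

The numbers reported below were presumably single runs; repeating each configuration a few times and taking the median smooths out JIT and garbage-collection noise.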
>>
>> Machine 5 is producing some odd behavior which I’ll discuss and ask for
>> suggestions below.
>>
>> For machines 1-4, the speed improves with the number of threads up to
>> about half the number of available processors.
>> Do the improvements with the number of threads listed below seem
>> reasonable?
>> Is the improvement up to only about half the number of available
>> processors due to “hyperthreading”? My limited (and probably wrong)
>> understanding is that hyperthreading makes a single core appear to be two
>> which share resources and thus a machine with 2 cores will return 4 when
>> queried for number of cpus. Yes, I know that is too simplistic, but it’s
>> the best I can do.
>> Could it simply be that my code is not written properly to take advantage
>> of hyperthreading? Could anyone point me to a source and/or example code
>> explaining how I could change it to take advantage of hyperthreading if
>> this is the problem?
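On the hyperthreading point: Java reports logical processors, not physical cores, so on a hyper-threaded machine the count is typically twice the number of physical cores. That is consistent with the scaling flattening at about half the reported CPU count. A quick check:

```java
public class CpuCount {
    public static void main(String[] args) {
        // availableProcessors() returns the number of *logical*
        // processors; with hyperthreading this is usually 2x the
        // physical core count.
        int logical = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical processors: " + logical);
    }
}
```

Since two hyperthreads share one core's execution resources, compute-bound work like curve fitting often gains little beyond one thread per physical core; no code change reliably fixes that.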
>>
>> Number of threads used are shown in parentheses where applicable.
>> 1. MacBook Pro 2.66 GHz Intel Core i7
>> number of processors: 1
>> Number of cores: 2
>> non-threaded plugin version ~59 sec
>> threaded (1) ~51 sec
>> threaded (2) ~36 sec
>> threaded (3) ~34 sec
>> threaded (4) ~35 sec
>>
>> 2. Mac Pro 2 x 2.26 GHz Quad-Core Intel Xeon
>> number of processors: 2
>> Number of cores: 8
>> non-threaded plugin version ~60 sec
>> threaded (1) ~59 sec
>> threaded (2) ~28.9 sec
>> threaded (4) ~15.6 sec
>> threaded (6) ~13.2 sec
>> threaded (8) ~11.3 sec
>> threaded (10) ~11.1 sec
>> threaded (12) ~11.1 sec
>> threaded (16) ~11.5 sec
>>
>> 3. Windows 7 DELL 3.2 GHz Intel Core i5
>> number of cpus shown in resource monitor: 4
>> non-threaded plugin version ~45.3 sec
>> threaded (1) ~48.3 sec
>> threaded (2) ~21.7 sec
>> threaded (3) ~20.4 sec
>> threaded (4) ~21.8 sec
>>
>> 4. Windows 7 Xi MTower 2P64 Workstation 2 x 2.1 GHz AMD Opteron 6272
>> number of cpus shown in resource monitor: 32
>> non-threaded plugin version ~162 sec
>> threaded (1) ~158 sec
>> threaded (2) ~85.1 sec
>> threaded (4) ~46 sec
>> threaded (8) ~22.9 sec
>> threaded (10) ~18.6 sec
>> threaded (12) ~16.4 sec
>> threaded (16) ~15.8 sec
>> threaded (20) ~15.7 sec
>> threaded (24) ~15.9 sec
>> threaded (32) ~16 sec
>>
>> For machines 1-4, the cpu usage can be observed in the Activity Monitor
>> (Mac) or Resource Monitor (Windows) and during the execution of the plugin
>> all of the cpus were active. For machine 5 shown below, only 22 of the 64
>> show activity. And it is not always the same 22. From the example runs
>> below you can see it really isn’t performing very well considering the
>> number of available cores. I originally thought this machine should be the
>> best, but it barely outperforms my laptop. This is probably a question for
>> another forum, but I am wondering if anyone else has encountered anything
>> similar.
>>
>> 5. Windows Server 2012 Xi MTower 2P64 Workstation 4 x 2.4 GHz AMD Opteron 6378
>> number of cpus shown in resource monitor: 64
>> non-threaded plugin version ~140 sec
>> threaded (1) ~137 sec
>> threaded (4) ~60.3 sec
>> threaded (8) ~29.3 sec
>> threaded (12) ~22.9 sec
>> threaded (16) ~23.8 sec
>> threaded (24) ~24.1 sec
>> threaded (32) ~24.5 sec
>> threaded (40) ~24.8 sec
>> threaded (48) ~23.8 sec
>> threaded (64) ~24.8 sec
>>
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://imagej.1557.x6.nabble.com/Questions-regarding-multithreaded-processing-tp5016878.html
>> Sent from the ImageJ mailing list archive at Nabble.com.
>>
>> --
>> ImageJ mailing list: http://imagej.nih.gov/ij/list.html
>