Arrays and thread safety

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Arrays and thread safety

Michael Doube
Hi all

I recently wrote a multithreaded plugin
(http://doube.org/plugins.html#purify) to find the biggest particle in a
stack and delete all others.  It works by running a particle joining
method on separate chunks (sets of slices), where each chunk is
processed in a separate thread. The chunks are then stitched back
together; the particle relabelling method replaceLabels() is run
multithreaded.  However, it seems that sometimes stitching fails - I can
run with the same conditions on the same stack and get different
results, so I guess that the threads are clobbering each other.

I have 2 questions:

1) is there any difference in thread safety to use 1D work arrays with
(x,y,z) indexing like

  workArray[index] where index = z * width * height + y * width + x

and each thread given a discrete range of z values to work on

or to use 2D arrays with (x,y,z) indexing like

  workArray[z][index] where index = y * width + x

and each thread given a z to work on

(The plugin uses 1D arrays at the moment)


2) the multithreaded version of replaceLabels() runs within a nested for
loop that iterates over voxels.  A single thread checks label values and
each time a particle label needs to be replaced, nThreads instances of
replaceLabels() run, replacing labels in discrete slices indexed in a 1D
work array.  Is it possible that one thread, having finished its job of
replaceLabels(), iterates through the for loop while the other thread(s)
is(are) still running replaceLabels()?

Any comments appreciated,

Mike
Reply | Threaded
Open this post in threaded view
|

Re: Arrays and thread safety

Thomas Boudier
Hi Michael,

I have used a 3d version for multi-threading and it seems ok, of course
it is not has fast as 1D ;-) . I've noticed that it works much faster
when each thread works on its own separate piece of image.

Thomas


Michael Doube a écrit :

> Hi all
>
> I recently wrote a multithreaded plugin
> (http://doube.org/plugins.html#purify) to find the biggest particle in a
> stack and delete all others.  It works by running a particle joining
> method on separate chunks (sets of slices), where each chunk is
> processed in a separate thread. The chunks are then stitched back
> together; the particle relabelling method replaceLabels() is run
> multithreaded.  However, it seems that sometimes stitching fails - I can
> run with the same conditions on the same stack and get different
> results, so I guess that the threads are clobbering each other.
>
> I have 2 questions:
>
> 1) is there any difference in thread safety to use 1D work arrays with
> (x,y,z) indexing like
>
>  workArray[index] where index = z * width * height + y * width + x
>
> and each thread given a discrete range of z values to work on
>
> or to use 2D arrays with (x,y,z) indexing like
>
>  workArray[z][index] where index = y * width + x
>
> and each thread given a z to work on
>
> (The plugin uses 1D arrays at the moment)
>
>
> 2) the multithreaded version of replaceLabels() runs within a nested for
> loop that iterates over voxels.  A single thread checks label values and
> each time a particle label needs to be replaced, nThreads instances of
> replaceLabels() run, replacing labels in discrete slices indexed in a 1D
> work array.  Is it possible that one thread, having finished its job of
> replaceLabels(), iterates through the for loop while the other thread(s)
> is(are) still running replaceLabels()?
>
> Any comments appreciated,
>
> Mike
>
>

--
   /**********************************************************/
      Thomas Boudier, MCU Université Pierre et Marie Curie,
      IFR 83. Bat B 7ème étage, porte 709, Jussieu.
      Tel : 01 44 27 20 11  Fax : 01 44 27 22 91
/*******************************************************/
Reply | Threaded
Open this post in threaded view
|

Re: Arrays and thread safety

Michael Doube
Hi Thomas

> I've noticed that it works much faster
> when each thread works on its own separate piece of image.

Do you mean when each thread is given a 1D array from a 2D array (e.g.
workArray[z][]), or when each thread is given a separate range in a 1D
workArray[]?

Mike


> Thomas
>
>
> Michael Doube a écrit :
>> Hi all
>>
>> I recently wrote a multithreaded plugin
>> (http://doube.org/plugins.html#purify) to find the biggest particle in a
>> stack and delete all others.  It works by running a particle joining
>> method on separate chunks (sets of slices), where each chunk is
>> processed in a separate thread. The chunks are then stitched back
>> together; the particle relabelling method replaceLabels() is run
>> multithreaded.  However, it seems that sometimes stitching fails - I can
>> run with the same conditions on the same stack and get different
>> results, so I guess that the threads are clobbering each other.
>>
>> I have 2 questions:
>>
>> 1) is there any difference in thread safety to use 1D work arrays with
>> (x,y,z) indexing like
>>
>>  workArray[index] where index = z * width * height + y * width + x
>>
>> and each thread given a discrete range of z values to work on
>>
>> or to use 2D arrays with (x,y,z) indexing like
>>
>>  workArray[z][index] where index = y * width + x
>>
>> and each thread given a z to work on
>>
>> (The plugin uses 1D arrays at the moment)
>>
>>
>> 2) the multithreaded version of replaceLabels() runs within a nested for
>> loop that iterates over voxels.  A single thread checks label values and
>> each time a particle label needs to be replaced, nThreads instances of
>> replaceLabels() run, replacing labels in discrete slices indexed in a 1D
>> work array.  Is it possible that one thread, having finished its job of
>> replaceLabels(), iterates through the for loop while the other thread(s)
>> is(are) still running replaceLabels()?
>>
>> Any comments appreciated,
>>
>> Mike
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Arrays and thread safety

Thomas Boudier
Hi Michael,

I mean when each thread work on separate arrays, I split my images in 4
parts (when I have 4 cpus) with overlap and process each part with a
thread and then recombine them.

Thomas


Michael Doube a écrit :

> Hi Thomas
>
>> I've noticed that it works much faster when each thread works on its
>> own separate piece of image.
>
> Do you mean when each thread is given a 1D array from a 2D array (e.g.
> workArray[z][]), or when each thread is given a separate range in a 1D
> workArray[]?
>
> Mike
>
>
>> Thomas
>>
>>
>> Michael Doube a écrit :
>>> Hi all
>>>
>>> I recently wrote a multithreaded plugin
>>> (http://doube.org/plugins.html#purify) to find the biggest particle
>>> in a stack and delete all others.  It works by running a particle
>>> joining method on separate chunks (sets of slices), where each chunk
>>> is processed in a separate thread. The chunks are then stitched back
>>> together; the particle relabelling method replaceLabels() is run
>>> multithreaded.  However, it seems that sometimes stitching fails - I
>>> can run with the same conditions on the same stack and get different
>>> results, so I guess that the threads are clobbering each other.
>>>
>>> I have 2 questions:
>>>
>>> 1) is there any difference in thread safety to use 1D work arrays
>>> with (x,y,z) indexing like
>>>
>>>  workArray[index] where index = z * width * height + y * width + x
>>>
>>> and each thread given a discrete range of z values to work on
>>>
>>> or to use 2D arrays with (x,y,z) indexing like
>>>
>>>  workArray[z][index] where index = y * width + x
>>>
>>> and each thread given a z to work on
>>>
>>> (The plugin uses 1D arrays at the moment)
>>>
>>>
>>> 2) the multithreaded version of replaceLabels() runs within a nested
>>> for loop that iterates over voxels.  A single thread checks label
>>> values and each time a particle label needs to be replaced, nThreads
>>> instances of replaceLabels() run, replacing labels in discrete slices
>>> indexed in a 1D work array.  Is it possible that one thread, having
>>> finished its job of replaceLabels(), iterates through the for loop
>>> while the other thread(s) is(are) still running replaceLabels()?
>>>
>>> Any comments appreciated,
>>>
>>> Mike
>>>
>>>
>
>

--
   /**********************************************************/
      Thomas Boudier, MCU Université Pierre et Marie Curie,
      IFR 83. Bat B 7ème étage, porte 709, Jussieu.
      Tel : 01 44 27 20 11  Fax : 01 44 27 22 91
/*******************************************************/
Reply | Threaded
Open this post in threaded view
|

Antwort: Re: Arrays and thread safety

Joachim Wesner
Interesting "Thread" :-)

However, IMHO there should be no difference sd Michael suggests if you have
truely different arrays or only disjoint parts in the same array!?

In the end, it´s all about "memory adresses". (Cash trashing could be a
point which might be in favour of each of both version, depending of the
size of the image and the cashes)

Yet, depending on the "hotspot optimization" of the Java virtual machine,
could there be a difference witrh regard to (not fully optimized away)
bounds checking
or does the VM even check for possible access conflicts? In the latter
case, when it knows it´s clearly different "object" it could avoid that
check.
However, I would assum that is the responsibilty of the programmer and
finally the meory hardware to soemhow serialze the access. If you write
non-thread-safe code, the VM can´t fix it in the general case.

Mit freundlichen Grüßen / Best regards

Joachim Wesner
____________________________________________

Leica Microsystems CMS GmbH | GmbH mit Sitz in Wetzlar | Amtsgericht
Wetzlar  HRB 2432
Geschäftsführer:  Dr. Stefan Traeger | Dr. Wolf-Otto Reuter | Dr. David Roy
Martyr | Colin Davis


                                                                           
             Thomas Boudier                                                
             <thomas.boudier@S                                            
             NV.JUSSIEU.FR>                                             An
             Gesendet von:              [hidden email]                
             ImageJ Interest                                         Kopie
             Group                                                        
             <[hidden email].                                       Thema
             GOV>                       Re: Arrays and thread safety      
                                                                           
                                                                           
             07.05.2009 17:07                                              
                                                                           
                                                                           
              Bitte antworten                                              
                    an                                                    
              ImageJ Interest                                              
                   Group                                                  
             <[hidden email].                                            
                   GOV>                                                    
                                                                           
                                                                           




Hi Michael,

I mean when each thread work on separate arrays, I split my images in 4
parts (when I have 4 cpus) with overlap and process each part with a
thread and then recombine them.

Thomas


Michael Doube a écrit :

> Hi Thomas
>
>> I've noticed that it works much faster when each thread works on its
>> own separate piece of image.
>
> Do you mean when each thread is given a 1D array from a 2D array (e.g.
> workArray[z][]), or when each thread is given a separate range in a 1D
> workArray[]?
>
> Mike
>
>
>> Thomas
>>
>>
>> Michael Doube a écrit :
>>> Hi all
>>>
>>> I recently wrote a multithreaded plugin
>>> (http://doube.org/plugins.html#purify) to find the biggest particle
>>> in a stack and delete all others.  It works by running a particle
>>> joining method on separate chunks (sets of slices), where each chunk
>>> is processed in a separate thread. The chunks are then stitched back
>>> together; the particle relabelling method replaceLabels() is run
>>> multithreaded.  However, it seems that sometimes stitching fails - I
>>> can run with the same conditions on the same stack and get different
>>> results, so I guess that the threads are clobbering each other.
>>>
>>> I have 2 questions:
>>>
>>> 1) is there any difference in thread safety to use 1D work arrays
>>> with (x,y,z) indexing like
>>>
>>>  workArray[index] where index = z * width * height + y * width + x
>>>
>>> and each thread given a discrete range of z values to work on
>>>
>>> or to use 2D arrays with (x,y,z) indexing like
>>>
>>>  workArray[z][index] where index = y * width + x
>>>
>>> and each thread given a z to work on
>>>
>>> (The plugin uses 1D arrays at the moment)
>>>
>>>
>>> 2) the multithreaded version of replaceLabels() runs within a nested
>>> for loop that iterates over voxels.  A single thread checks label
>>> values and each time a particle label needs to be replaced, nThreads
>>> instances of replaceLabels() run, replacing labels in discrete slices
>>> indexed in a 1D work array.  Is it possible that one thread, having
>>> finished its job of replaceLabels(), iterates through the for loop
>>> while the other thread(s) is(are) still running replaceLabels()?
>>>
>>> Any comments appreciated,
>>>
>>> Mike
>>>
>>>
>
>

--
   /**********************************************************/
      Thomas Boudier, MCU Université Pierre et Marie Curie,
      IFR 83. Bat B 7ème étage, porte 709, Jussieu.
      Tel : 01 44 27 20 11  Fax : 01 44 27 22 91
/*******************************************************/



______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email 
______________________________________________________________________
Reply | Threaded
Open this post in threaded view
|

Re: Arrays and thread safety

Michael Schmid
Hi Mike,

in multithreaded code, reading is *never* a problem as long as the  
data are unchanged - as far as I see, your workarray is read-only  
while you have multiple threads. If you do nothing than reading, you  
could also use the original data of the stack.

When writing, make sure that threads have their own range of data,  
and that no thread writes to the data that are *read or written* by  
another thread.

I don't understand all of your code, but you may have a problem like  
the following:
Assume, e.g., running code like your replaceLabel in multiple threads:
        for (int i = start; i < end; i++) {
            if (tagArray[i] == m) {
                tagArray[i] = n;
            }
        }
Results will be unpredictable if the ranges from 'start' to 'end'  
overlap and one thread would change tags from m=3 to n=4 and the  
other m=4 to n=5.

A side-remark: all your
   return particleLabels;
statements look strange. particleLabels  is an array. If a method  
writes to an array, its contents are modified, whether you return it  
or not. By returning it, you return the reference to the array, which  
makes sense only if you replace the array by a new one (but that  
would not be multithread-complient at all).

Hope this helps,

Michael
________________________________________________________________


> Interesting "Thread" :-)
>
> However, IMHO there should be no difference sd Michael suggests if  
> you have
> truely different arrays or only disjoint parts in the same array!?
>
> In the end, it´s all about "memory adresses". (Cash trashing could  
> be a
> point which might be in favour of each of both version, depending  
> of the
> size of the image and the cashes)
>
> Yet, depending on the "hotspot optimization" of the Java virtual  
> machine,
> could there be a difference witrh regard to (not fully optimized away)
> bounds checking
> or does the VM even check for possible access conflicts? In the latter
> case, when it knows it´s clearly different "object" it could avoid  
> that
> check.
> However, I would assum that is the responsibilty of the programmer and
> finally the meory hardware to soemhow serialze the access. If you  
> write
> non-thread-safe code, the VM can´t fix it in the general case.
>
> Mit freundlichen Grüßen / Best regards
>
> Joachim Wesner
> ____________________________________________
>
> Hi Michael,
>
> I mean when each thread work on separate arrays, I split my images  
> in 4
> parts (when I have 4 cpus) with overlap and process each part with a
> thread and then recombine them.
>
> Thomas
>
>
> Michael Doube a écrit :
>> Hi Thomas
>>
>>> I've noticed that it works much faster when each thread works on its
>>> own separate piece of image.
>>
>> Do you mean when each thread is given a 1D array from a 2D array  
>> (e.g.
>> workArray[z][]), or when each thread is given a separate range in  
>> a 1D
>> workArray[]?
>>
>> Mike
>>
>>
>>> Thomas
>>>
>>>
>>> Michael Doube a écrit :
>>>> Hi all
>>>>
>>>> I recently wrote a multithreaded plugin
>>>> (http://doube.org/plugins.html#purify) to find the biggest particle
>>>> in a stack and delete all others.  It works by running a particle
>>>> joining method on separate chunks (sets of slices), where each  
>>>> chunk
>>>> is processed in a separate thread. The chunks are then stitched  
>>>> back
>>>> together; the particle relabelling method replaceLabels() is run
>>>> multithreaded.  However, it seems that sometimes stitching fails  
>>>> - I
>>>> can run with the same conditions on the same stack and get  
>>>> different
>>>> results, so I guess that the threads are clobbering each other.
>>>>
>>>> I have 2 questions:
>>>>
>>>> 1) is there any difference in thread safety to use 1D work arrays
>>>> with (x,y,z) indexing like
>>>>
>>>>  workArray[index] where index = z * width * height + y * width + x
>>>>
>>>> and each thread given a discrete range of z values to work on
>>>>
>>>> or to use 2D arrays with (x,y,z) indexing like
>>>>
>>>>  workArray[z][index] where index = y * width + x
>>>>
>>>> and each thread given a z to work on
>>>>
>>>> (The plugin uses 1D arrays at the moment)
>>>>
>>>>
>>>> 2) the multithreaded version of replaceLabels() runs within a  
>>>> nested
>>>> for loop that iterates over voxels.  A single thread checks label
>>>> values and each time a particle label needs to be replaced,  
>>>> nThreads
>>>> instances of replaceLabels() run, replacing labels in discrete  
>>>> slices
>>>> indexed in a 1D work array.  Is it possible that one thread, having
>>>> finished its job of replaceLabels(), iterates through the for loop
>>>> while the other thread(s) is(are) still running replaceLabels()?
>>>>
>>>> Any comments appreciated,
>>>>
>>>> Mike
>>>>
>>>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Arrays and thread safety

Albert Cardona
In reply to this post by Thomas Boudier
Thomas Boudier wrote:
> Hi Michael,
>
> I mean when each thread work on separate arrays, I split my images in
> 4 parts (when I have 4 cpus) with overlap and process each part with a
> thread and then recombine them.
>
> Thomas



I second that, makes big difference.
It's all about resource contention when accessing/modifying variables:
the best is to have none at all.

Albert

--
Albert Cardona
http://albert.rierol.net
Reply | Threaded
Open this post in threaded view
|

Re: Arrays and thread safety

Michael Doube
In reply to this post by Michael Schmid
> I don't understand all of your code, but you may have a problem like  
> the following:
> Assume, e.g., running code like your replaceLabel in multiple threads:
> for (int i = start; i < end; i++) {
>    if (tagArray[i] == m) {
> tagArray[i] = n;
>    }
> }
> Results will be unpredictable if the ranges from 'start' to 'end'  
> overlap and one thread would change tags from m=3 to n=4 and the  
> other m=4 to n=5.

Exactly; ranges shouldn't overlap as each thread is given a discrete
range of array indices.  Array values should be read and written only
once by only one thread each time a new ReplaceLabelThread is created.

> A side-remark: all your
>    return particleLabels;
> statements look strange. particleLabels  is an array. If a method  
> writes to an array, its contents are modified, whether you return it  
> or not. By returning it, you return the reference to the array, which  
> makes sense only if you replace the array by a new one (but that  
> would not be multithread-complient at all).

OK, thanks for the tip, I shall make the affected methods return void.

>
> Hope this helps,

very much,

Thanks

Mike
Reply | Threaded
Open this post in threaded view
|

Re: Arrays and thread safety

Michael Doube
In reply to this post by Albert Cardona
Albert Cardona wrote:

> Thomas Boudier wrote:
>> Hi Michael,
>>
>> I mean when each thread work on separate arrays, I split my images in
>> 4 parts (when I have 4 cpus) with overlap and process each part with a
>> thread and then recombine them.
>>
>> Thomas
>
>
>
> I second that, makes big difference.
> It's all about resource contention when accessing/modifying variables:
> the best is to have none at all.
>
> Albert

So is it enough for each thread to have access to separate arrays within
a 2D array (e.g. different values of z for workArray[z][])?  Or does
each thread need its own copy of the elements as a completely separate
entity?

Mike
Reply | Threaded
Open this post in threaded view
|

Re: Arrays and thread safety

Albert Cardona
Michael Doube wrote:
> So is it enough for each thread to have access to separate arrays
> within a 2D array (e.g. different values of z for workArray[z][])?  Or
> does each thread need its own copy of the elements as a completely
> separate entity?



Test it. Separate entity entirely works best for me.
A subrange of an array doesn't prevent resource access contention.

Albert

--
Albert Cardona
http://albert.rierol.net
Reply | Threaded
Open this post in threaded view
|

Re: Arrays and thread safety

Michael Doube
Albert Cardona wrote:
> Michael Doube wrote:
>> So is it enough for each thread to have access to separate arrays
>> within a 2D array (e.g. different values of z for workArray[z][])?  Or
>> does each thread need its own copy of the elements as a completely
>> separate entity?

> Test it. Separate entity entirely works best for me.
> A subrange of an array doesn't prevent resource access contention.

Hmmm. 2D arrays offer no improvement over 1D arrays - I get the same
unpredictable error.  There might be a bug in my indexing or some
contention going on.

I tested with and without multithreading of that particular method and
found negligible speed increase, so it looks I've got something horribly
  wrong or the method is fundamentally unsuited to multithreading.  For
now I've taken it out.

Thanks everyone for your suggestions,

Mike