Hi all
I recently wrote a multithreaded plugin (http://doube.org/plugins.html#purify) to find the biggest particle in a stack and delete all others. It works by running a particle-joining method on separate chunks (sets of slices), where each chunk is processed in a separate thread. The chunks are then stitched back together, and the particle-relabelling method replaceLabels() is run multithreaded. However, it seems that stitching sometimes fails - I can run with the same conditions on the same stack and get different results, so I guess that the threads are clobbering each other.

I have 2 questions:

1) Is there any difference in thread safety between using 1D work arrays with (x,y,z) indexing like

    workArray[index] where index = z * width * height + y * width + x

with each thread given a discrete range of z values to work on, and using 2D arrays with (x,y,z) indexing like

    workArray[z][index] where index = y * width + x

with each thread given a z to work on? (The plugin uses 1D arrays at the moment.)

2) The multithreaded version of replaceLabels() runs within a nested for loop that iterates over voxels. A single thread checks label values and, each time a particle label needs to be replaced, nThreads instances of replaceLabels() run, replacing labels in discrete slices indexed in a 1D work array. Is it possible that one thread, having finished its job of replaceLabels(), iterates through the for loop while the other thread(s) is (are) still running replaceLabels()?

Any comments appreciated,

Mike
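As a minimal Java sketch of the two layouts (the array names, dimensions and the fill value here are illustrative only, not taken from the Purify plugin), each thread would stay inside its own z range in either case:

    // Sketch: the same z range addressed through a flat 1D array and through
    // a 2D array of slices; a thread only ever touches its own slices.
    public class LayoutDemo {
        public static void main(String[] args) {
            int width = 4, height = 3, depth = 2;
            int zStart = 0, zEnd = 1;                      // this thread's slices

            // 1D layout: index = z * width * height + y * width + x
            byte[] work1D = new byte[width * height * depth];
            for (int z = zStart; z < zEnd; z++)
                for (int y = 0; y < height; y++)
                    for (int x = 0; x < width; x++)
                        work1D[z * width * height + y * width + x] = 1;

            // 2D layout: one array per slice, index = y * width + x
            byte[][] work2D = new byte[depth][width * height];
            for (int z = zStart; z < zEnd; z++)
                for (int y = 0; y < height; y++)
                    for (int x = 0; x < width; x++)
                        work2D[z][y * width + x] = 1;
        }
    }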
Hi Michael,
I have used a 3D version for multithreading and it seems OK; of course it is not as fast as 1D ;-). I've noticed that it works much faster when each thread works on its own separate piece of the image.

Thomas
Hi Thomas
> I've noticed that it works much faster when each thread works on its
> own separate piece of the image.

Do you mean when each thread is given a 1D array from a 2D array (e.g. workArray[z][]), or when each thread is given a separate range in a 1D workArray[]?

Mike
Hi Michael,
I mean when each thread works on separate arrays: I split my images into 4 parts (when I have 4 CPUs) with overlap, process each part with a thread and then recombine them.

Thomas
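A rough sketch of that pattern in Java might look like the following; the class name, chunk sizes and per-chunk work are made up for illustration, and the overlap Thomas mentions is left out for brevity:

    // Hypothetical sketch: split a stack into per-thread chunks of slices,
    // process each chunk in its own thread, then recombine after join().
    public class ChunkDemo {
        static void process(byte[][] chunk) {
            for (byte[] slice : chunk)                 // stand-in for the real work
                for (int i = 0; i < slice.length; i++)
                    slice[i] = (byte) (slice[i] + 1);
        }

        public static void main(String[] args) throws InterruptedException {
            final int depth = 64, sliceSize = 256 * 256;
            final byte[][] stack = new byte[depth][sliceSize];
            int nThreads = Runtime.getRuntime().availableProcessors();
            Thread[] threads = new Thread[nThreads];
            int chunkSize = (depth + nThreads - 1) / nThreads;
            for (int t = 0; t < nThreads; t++) {
                final int zStart = Math.min(t * chunkSize, depth);
                final int zEnd = Math.min(zStart + chunkSize, depth);
                threads[t] = new Thread(new Runnable() {
                    public void run() {
                        // this thread only ever sees slices [zStart, zEnd)
                        byte[][] chunk = new byte[zEnd - zStart][];
                        for (int z = zStart; z < zEnd; z++)
                            chunk[z - zStart] = stack[z];
                        process(chunk);
                    }
                });
                threads[t].start();
            }
            for (Thread t : threads)
                t.join();   // wait for every chunk before stitching/recombining
        }
    }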
Interesting "Thread" :-)
However, IMHO there should be no difference, as Michael suggests, whether you have truly different arrays or only disjoint parts of the same array!? In the end, it's all about memory addresses. (Cache thrashing could be a point in favour of either version, depending on the size of the image and the caches.)

Yet, depending on the HotSpot optimization of the Java virtual machine, could there be a difference with regard to (not fully optimized away) bounds checking, or does the VM even check for possible access conflicts? In the latter case, when it knows they are clearly different objects, it could avoid that check. However, I would assume it is the responsibility of the programmer, and finally the memory hardware, to somehow serialize the access. If you write non-thread-safe code, the VM can't fix it in the general case.

Best regards,

Joachim Wesner
Hi Mike,
in multithreaded code, reading is *never* a problem as long as the data are unchanged - as far as I can see, your work array is read-only while you have multiple threads. If you do nothing other than reading, you could also use the original data of the stack.

When writing, make sure that threads have their own range of data, and that no thread writes to data that are *read or written* by another thread.

I don't understand all of your code, but you may have a problem like the following. Assume, e.g., running code like your replaceLabel in multiple threads:

    for (int i = start; i < end; i++) {
        if (tagArray[i] == m) {
            tagArray[i] = n;
        }
    }

Results will be unpredictable if the ranges from 'start' to 'end' overlap and one thread would change tags from m=3 to n=4 and the other from m=4 to n=5.

A side remark: all your

    return particleLabels;

statements look strange. particleLabels is an array. If a method writes to an array, its contents are modified, whether you return it or not. By returning it, you return the reference to the array, which makes sense only if you replace the array with a new one (but that would not be multithread-compliant at all).

Hope this helps,

Michael
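To illustrate, here is a hypothetical sketch of relabelling over non-overlapping ranges, with the caller waiting for every worker via join() before continuing; the class and variable names are invented, and the real ReplaceLabelThread in Purify may look quite different:

    // Hypothetical sketch: replace label m with n over non-overlapping ranges,
    // one thread per range, then join() before the caller moves on.
    public class ReplaceLabelDemo {
        public static void main(String[] args) throws InterruptedException {
            final int[] tagArray = new int[256 * 256 * 64];
            final int m = 3, n = 4;                   // replace label m with n
            int nThreads = Runtime.getRuntime().availableProcessors();
            Thread[] threads = new Thread[nThreads];
            int chunk = (tagArray.length + nThreads - 1) / nThreads;
            for (int t = 0; t < nThreads; t++) {
                final int start = t * chunk;
                final int end = Math.min(start + chunk, tagArray.length);
                threads[t] = new Thread(new Runnable() {
                    public void run() {
                        // [start, end) never overlaps another thread's range
                        for (int i = start; i < end; i++)
                            if (tagArray[i] == m)
                                tagArray[i] = n;
                    }
                });
                threads[t].start();
            }
            for (Thread t : threads)
                t.join();   // the caller does not advance until all ranges are done
        }
    }

In this sketch the join() calls are what keep the spawning loop from advancing while any worker is still running, which is the guarantee that question 2 in the first post asks about.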
In reply to this post by Thomas Boudier
Thomas Boudier wrote:
> I mean when each thread works on separate arrays: I split my images into
> 4 parts (when I have 4 CPUs) with overlap, process each part with a
> thread and then recombine them.

I second that, it makes a big difference. It's all about resource contention when accessing/modifying variables: the best is to have none at all.

Albert

--
Albert Cardona
http://albert.rierol.net
In reply to this post by Michael Schmid
> I don't understand all of your code, but you may have a problem like
> the following. Assume, e.g., running code like your replaceLabel in
> multiple threads:
>
>     for (int i = start; i < end; i++) {
>         if (tagArray[i] == m) {
>             tagArray[i] = n;
>         }
>     }
>
> Results will be unpredictable if the ranges from 'start' to 'end'
> overlap and one thread would change tags from m=3 to n=4 and the
> other from m=4 to n=5.

Exactly; the ranges shouldn't overlap, as each thread is given a discrete range of array indices. Array values should be read and written only once, by only one thread, each time a new ReplaceLabelThread is created.

> A side remark: all your
>
>     return particleLabels;
>
> statements look strange. particleLabels is an array. If a method
> writes to an array, its contents are modified, whether you return it
> or not. By returning it, you return the reference to the array, which
> makes sense only if you replace the array with a new one (but that
> would not be multithread-compliant at all).

OK, thanks for the tip, I shall make the affected methods return void.

> Hope this helps,

Very much, thanks

Mike
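For reference, a minimal sketch of that change, with made-up method and variable names: the method mutates the array it receives and returns nothing, so callers keep working with the same reference.

    // Hypothetical sketch: the caller's array is modified in place, so there
    // is nothing useful to return.
    public class VoidRelabelDemo {
        static void replaceLabels(int[] particleLabels, int start, int end,
                int m, int n) {
            for (int i = start; i < end; i++)
                if (particleLabels[i] == m)
                    particleLabels[i] = n;
        }

        public static void main(String[] args) {
            int[] labels = {1, 3, 3, 2, 3};
            replaceLabels(labels, 0, labels.length, 3, 4);
            System.out.println(java.util.Arrays.toString(labels)); // [1, 4, 4, 2, 4]
        }
    }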
In reply to this post by Albert Cardona
Albert Cardona wrote:
> I second that, it makes a big difference.
> It's all about resource contention when accessing/modifying variables:
> the best is to have none at all.

So is it enough for each thread to have access to separate arrays within a 2D array (e.g. different values of z for workArray[z][])? Or does each thread need its own copy of the elements as a completely separate entity?

Mike
Michael Doube wrote:
> So is it enough for each thread to have access to separate arrays
> within a 2D array (e.g. different values of z for workArray[z][])? Or
> does each thread need its own copy of the elements as a completely
> separate entity?

Test it. A separate entity entirely works best for me. A subrange of an array doesn't prevent resource access contention.

Albert

--
Albert Cardona
http://albert.rierol.net
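A hypothetical sketch of the "separate entity" approach - each thread clones its own slices, works on the copies and publishes them back; the sizes and the per-pixel work are purely illustrative:

    // Hypothetical sketch: per-thread copies of whole slices, recombined only
    // after every worker has been joined.
    public class SeparateCopyDemo {
        public static void main(String[] args) throws InterruptedException {
            final int depth = 8, sliceSize = 16;
            final byte[][] stack = new byte[depth][sliceSize];
            Thread[] threads = new Thread[2];
            for (int t = 0; t < threads.length; t++) {
                final int zStart = t * depth / 2;
                final int zEnd = (t + 1) * depth / 2;
                threads[t] = new Thread(new Runnable() {
                    public void run() {
                        for (int z = zStart; z < zEnd; z++) {
                            byte[] copy = stack[z].clone();     // private copy, nothing shared
                            for (int i = 0; i < copy.length; i++)
                                copy[i] = (byte) (copy[i] + 1); // stand-in for real work
                            stack[z] = copy;                    // publish the finished slice
                        }
                    }
                });
                threads[t].start();
            }
            for (Thread t : threads)
                t.join();   // recombine only after all copies are done
        }
    }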
Albert Cardona wrote:
> Test it. A separate entity entirely works best for me.
> A subrange of an array doesn't prevent resource access contention.

Hmmm. 2D arrays offer no improvement over 1D arrays - I get the same unpredictable error. There might be a bug in my indexing or some contention going on. I tested with and without multithreading of that particular method and found a negligible speed increase, so it looks like I've got something horribly wrong or the method is fundamentally unsuited to multithreading. For now I've taken it out.

Thanks everyone for your suggestions,

Mike