Error Bars question and suggestion


Error Bars question and suggestion

vischer
Hello statistics people,

I would like to know the most common way to use error bars in a plot. In our lab, we subdivide a scatter plot into a number of bins and then display the 95% confidence interval of each bin as an error bar.
 
The macro below simulates such a plot, and shows bin#2 as a green box with its statistics in the Log window. For calculating an error bar, we use the equation:

        e = 2* stDev/sqrt(n-1),

where the factor 2 (two standard errors) corresponds to ~95% confidence and "n" is the number of dots in the bin.

We are not interested in whether the curve follows a polynomial or other fit, but in whether two curves differ significantly, at least over part of their range.


Question 1: is this the most common way to use error bars?
Question 2: is it better to replace the factor 2 in the equation above with the critical value of Student's t-distribution, which would result in larger error bars for very small counts n? (Michael Schmid made me aware of this.)
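
To make Question 2 concrete, here is a minimal Python sketch (not ImageJ code) that compares the factor-2 error bar with one based on Student's t. The table `T_CRIT_95` holds standard two-sided 95% critical values for a few degrees of freedom, and the helper names are hypothetical. It assumes that `stdev` is the sample standard deviation (n - 1 in the denominator), in which case the standard error of the mean is usually written stdev/sqrt(n).

```python
import math

# Two-sided 95% critical values of Student's t (standard table values),
# keyed by degrees of freedom df = n - 1.
T_CRIT_95 = {1: 12.706, 2: 4.303, 4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def half_width_factor2(stdev, n):
    """Error bar as used in the post: e = 2 * stDev / sqrt(n - 1)."""
    return 2.0 * stdev / math.sqrt(n - 1)


def half_width_t(stdev, n):
    """95% half-width based on Student's t: t(n - 1) * stdev / sqrt(n).

    Assumption: stdev is the sample standard deviation (n - 1 denominator).
    Only the n values covered by T_CRIT_95 are supported in this sketch.
    """
    return T_CRIT_95[n - 1] * stdev / math.sqrt(n)


if __name__ == "__main__":
    print("n  factor-2  Student-t")
    for n in (2, 3, 5, 10, 20, 30):
        print(n, round(half_width_factor2(1.0, n), 3), round(half_width_t(1.0, n), 3))
```

With these table values, the t-based bar is more than four times longer than the factor-2 bar for n = 2, while for n = 30 the two differ by less than 1%.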

This discussion started when I suggested extending ImageJ's Plot commands with something like:
Plot.add("error bars", xValues, yValues, options)
where the options define start, stop, number of bins and confidence.


Best regards,
Norbert






macro "Simulated Error Bars"{
        close("Error Bars");
        random("seed", 1);  // fixed seed, so the output below is reproducible

        // simulate a noisy scatter plot: y = 1 + sqrt(x) + noise
        nPoints = 500;
        xValues = newArray(nPoints);
        yValues = newArray(nPoints);
        for(jj = 0; jj < nPoints; jj++){
                x = random;
                y = 1 + sqrt(x) + (random - 0.5) * sqrt(random);
                xValues[jj] = x;
                yValues[jj] = y;
        }

        // binning parameters
        nBins = 10;
        start = 0;
        stop = 1;
        binWidth = (stop - start)/nBins;
        eX = newArray(nBins);    // bin centers
        eY = newArray(nBins);    // bin means
        eBar = newArray(nBins);  // error bar half-lengths

        for(bin = 0; bin < nBins; bin++){
                left = start + bin * binWidth;
                right = left + binWidth;
                // collect the y values of all points with left <= x < right
                binContent = newArray(nPoints);
                count = 0;
                for(jj = 0; jj < nPoints; jj++){
                        x = xValues[jj];
                        y = yValues[jj];
                        if(x >= left && x < right){
                                binContent[count++] = y;
                        }
                }
                binContent = Array.trim(binContent, count);
                Array.getStatistics(binContent, min, max, mean, stdDev);
                eX[bin] = (left + right)/2;
                eY[bin] = mean;
                // needs count > 1; true here with 500 points in 10 bins
                eBar[bin] = 2 * stdDev/sqrt(count-1);
                if(bin == 2){
                        // report the statistics of the green bin in the Log window
                        print("\\Clear");
                        print("Green Bin:");
                        print("left=" , left);
                        print("right=" , right);
                        print("mean=" , mean);
                        print("stDev=" , stdDev);
                        print("count=" , count);
                        print("error=" , eBar[bin]);
                }
        }

        // gray scatter plot with the binned means and error bars in red
        Plot.create("Error Bars", "Age", "Y");
        Plot.setLimits(0, 1, 0, 3);
        Plot.setColor("gray");
        Plot.add("+", xValues, yValues);
        Plot.setColor("red");
        Plot.setLineWidth(2);
        Plot.add("line", eX, eY);
        Plot.add("error bars", eBar);
        Plot.show();
        Plot.freeze();

        // highlight bin #2 (0.2 <= x < 0.3) by recoloring its white background
        greenBinX = newArray(0.2, 0.3, 0.3, 0.2);
        greenBinY = newArray(0, 0, 3, 3);
        toUnscaled(greenBinX, greenBinY);
        makeSelection("polygon", greenBinX, greenBinY);
        changeValues(0xffffff, 0xffffff, 0xaaffaa);
        run("Select None");
}



Output:
=======
Green Bin:
left= 0.2
right= 0.3
mean= 1.5106
stDev= 0.2031
count= 40
error= 0.06506

--
ImageJ mailing list: http://imagej.nih.gov/ij/list.html

Re: Error Bars question and suggestion

Rex Kerr
The Kolmogorov-Smirnov test is typically used to tell whether two
distributions differ significantly.  It's based on comparing the largest
observed deviation between the two curves with the largest deviation
expected by chance.  Note that for small sample sizes, an exact estimate
needs to be used instead of the closed-form approximation that is usually
quoted.
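
As a rough illustration of the "largest observed deviation", the two-sample Kolmogorov-Smirnov statistic can be sketched in a few lines of Python. The function name `ks_statistic` is hypothetical, and the sketch returns only the statistic D, the largest gap between the two empirical cumulative distributions. Turning D into a p-value requires the exact or asymptotic distribution mentioned above, which is not shown here.

```python
def ks_statistic(a, b):
    """Largest absolute difference between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        # i/len(a) and j/len(b) are the two empirical CDFs evaluated at x.
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


if __name__ == "__main__":
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

D is 0 for identical samples and 1 when the two samples do not overlap at all.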

Alternatively, almost anything can be turned into a statistical test by
choosing any measure of difference you can imagine, and then using Monte
Carlo sampling to estimate the chance that it will happen randomly (by
mixing the data from the two conditions together and then sampling randomly
at the sample sizes you observed).
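
The Monte Carlo idea described above can be sketched as a permutation test. This is a minimal Python illustration under assumptions: the function names, the default of 10000 resamples, and the choice of the absolute difference of means as the "measure of difference" are all hypothetical, and any other measure can be passed in as `statistic`.

```python
import random


def mean_difference(a, b):
    """Example measure of difference: absolute difference of the two means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(a, b, statistic=mean_difference,
                        n_resamples=10000, seed=1):
    """Estimate how often a random split of the pooled data gives a
    difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)  # mix the two conditions together
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        # Re-split at the originally observed sample sizes.
        if statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction, so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    a = [1.2, 1.4, 1.1, 1.6, 1.3]
    b = [1.9, 2.1, 1.8, 2.2, 2.0]
    print(permutation_p_value(a, b))
```

Applied per bin, `a` and `b` would be the y values of the two curves that fall into the same bin.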

Always be suspicious of tests that involve a small number of points: you
might get a result that is significant by some valid statistical measure,
but that doesn't mean it's true or can be reproduced.  Rank tests are less
likely than parametric tests to give false results because of small sample
sizes (as the parametric form being wrong or mis-estimated with a small
number of points is one of the biggest causes of failure of statistical
testing).

  --Rex


On Mon, Sep 4, 2017 at 3:40 PM, Norbert Vischer <[hidden email]>
wrote:

> Hello statistics people,
> [...]


Re: Error Bars question and suggestion

vischer
Hi Rex,

thanks for the detailed answer. You mainly focussed on pitfalls such as a small number of points per bin. My focus is more on the positive case: ImageJ supports error bars, and I am interested in cases where they are applied correctly.

I still think they are mainly intended for converting a scatter plot into a line plot and showing the confidence per bin.
Without further mathematics, an author can publish two plots and state that they are likely different where the error bars do not overlap. I would not use the Kolmogorov-Smirnov test, because it would neither take the per-bin confidence into account nor show in which range the curves differ most.
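
The overlap rule described above can be written down as a small Python sketch. The function name and the argument layout (one mean and one error bar half-length per bin, for each of the two curves) are assumptions for illustration. It only encodes the visual criterion and is not a formal significance test.

```python
def non_overlapping_bins(mean_a, err_a, mean_b, err_b):
    """Return the indices of bins whose intervals mean +/- err do not overlap."""
    result = []
    for k, (ma, ea, mb, eb) in enumerate(zip(mean_a, err_a, mean_b, err_b)):
        # Two intervals are disjoint when the means are further apart
        # than the sum of the two half-lengths.
        if abs(ma - mb) > ea + eb:
            result.append(k)
    return result


if __name__ == "__main__":
    curve_a = [1.0, 1.5, 2.0]
    curve_b = [1.05, 1.9, 2.5]
    err = [0.1, 0.1, 0.1]
    print(non_overlapping_bins(curve_a, err, curve_b, err))
```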

So I come back to my suggestion to extend ImageJ's plot commands:

- with a single Plot.add() statement, the user could convert a scatter plot into a line plot and play with error bars

- for error bars, we use the simple equation e = k*stDev/sqrt(n-1), where k sets the confidence level (typically k = 2 for ~95%) and n is the number of dots in the bin. It assumes a Gaussian distribution per bin and is OK if n is not very small.

- we assume that users will create their own error bars (if any) in difficult cases

Possible syntax:
short: Plot.add("error bars", xValues, yValues, "nBins=10");
long: Plot.add("error bars", xValues, yValues, "nBins=10 start=0 stop=1 stdevs=2");
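
What such a command would compute can be sketched in Python. This is only an illustration of the proposal, not existing ImageJ functionality: the function name `binned_error_bars` is hypothetical, its keyword arguments mirror the suggested options nBins, start, stop and stdevs, and skipping bins with fewer than two points is an assumption of this sketch.

```python
import math
import statistics


def binned_error_bars(x_values, y_values, n_bins=10, start=0.0, stop=1.0,
                      stdevs=2.0):
    """Return (centers, means, errors), one entry per usable bin.

    errors follows the equation from the post: stdevs * stDev / sqrt(n - 1).
    """
    width = (stop - start) / n_bins
    bins = [[] for _ in range(n_bins)]
    for x, y in zip(x_values, y_values):
        k = math.floor((x - start) / width)
        if 0 <= k < n_bins:  # points outside [start, stop) are ignored
            bins[k].append(y)

    centers, means, errors = [], [], []
    for k, ys in enumerate(bins):
        if len(ys) < 2:
            continue  # standard deviation undefined (assumption: skip bin)
        centers.append(start + (k + 0.5) * width)
        means.append(statistics.mean(ys))
        errors.append(stdevs * statistics.stdev(ys) / math.sqrt(len(ys) - 1))
    return centers, means, errors


if __name__ == "__main__":
    xs = [0.05, 0.06, 0.15, 0.16]
    ys = [1.0, 3.0, 2.0, 2.0]
    print(binned_error_bars(xs, ys, n_bins=2, start=0.0, stop=0.2))
```

The three returned lists correspond to what the macro passes to Plot.add("line", ...) and Plot.add("error bars", ...).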

regards, Norbert



> On 5. Sep 2017, at 5:08, Rex Kerr <[hidden email]> wrote:
>
> [...]
