Curve Fitting issue/question

Rainer M. Engel
Hello everyone,

I'm working with curve fitting to smooth data, but primarily to fill
invalid positions in a curve.

I tend to use the Gaussian equation, which has worked well so far. But
sometimes I get undesired results, like the one from the attached
macro, where a curve point of -995468 is calculated.

Is this an error/bug?

If this is correct, what would you suggest to avoid/remove such deflections?

Just out of curiosity: is there an equation or function that
interpolates only the missing positions of a curve, keeping each given
point?

Any help/hint is much appreciated.

Kind regards,
Rainer


--
Rainer M. Engel, Dipl. Digital Artist

endime|ENGEL DIGITAL MEDIA
Pichelsdorferstr. 143
D-13595 Berlin


--
ImageJ mailing list: http://imagej.nih.gov/ij/list.html

Attachment: IJ_CurveFitting-eq-12.ijm (5K)

Re: Curve Fitting issue/question

Michael Schmid
Hi Rainer,

 > I tend to use the Gaussian equation, which has worked well so far. But
 > sometimes I get undesired results, like the one from the attached
 > macro, where a curve point of -995468 is calculated.

After analyzing your case, I am quite confident that this is not a bug.
The reason for the "strange" fit result is the following:
Your data set is essentially a flat curve with a few outliers at
significantly more negative values than the rest. There are many
equivalent solutions for a best fit, all of them very narrow
Gaussians that are essentially constant everywhere except at one
outlier point, which the Gaussian fits very well.
With just one outlier point defining the Gaussian, the curve fitter can
put the peak of the Gaussian essentially anywhere in between the
surrounding data points; the peak does not necessarily have to be
exactly at the outlier point. If the peak position is a bit off the
outlier point, that point will be somewhere on the slope of the
Gaussian. If the point happens to be rather far from the peak of the
Gaussian, the Gaussian has to be very high to accurately fit the outlier
point.

With your data set, the outlier is at 23, and the CurveFitter happens to
put the peak of the Gaussian at 23.65, which is still far from the
adjacent points. I have attached a high-resolution plot. It shows that
the outlier point at 23 lies exactly on the fitted curve, and it makes no
difference for the adjacent points whether one shifts the Gaussian a bit
to the left or right (but this would make a huge difference for the
Gaussian's height).
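
To put rough numbers on this effect, here is a minimal Python sketch. It
assumes the Gaussian has the form y = a + h*exp(-(x-c)^2/(2*d^2)) and uses
purely hypothetical values for the outlier depth and the width d (the
message above gives only the outlier position 23 and the peak position
23.65). It only illustrates how the height h needed to pass through the
outlier grows as the peak moves away from it.

```python
import math

def required_height(outlier_depth, offset, width):
    """Height (peak minus baseline) a Gaussian needs so that a point
    lying `offset` away from its center still reaches `outlier_depth`
    relative to the baseline."""
    attenuation = math.exp(-offset ** 2 / (2.0 * width ** 2))
    return outlier_depth / attenuation

# Hypothetical numbers: an outlier 20 units below the baseline and a
# narrow Gaussian (d = 0.15); neither value is taken from the data set.
depth = -20.0
width = 0.15
for offset in (0.0, 0.3, 0.65):  # 0.65 = 23.65 - 23
    print(offset, required_height(depth, offset, width))
```

With the peak exactly at the outlier the height equals the outlier depth;
with the peak 0.65 away and such a small width, the height has to be larger
by several orders of magnitude.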

By the way, there is also another local solution to your fitting
problem, which is a rather broad peak at c=29.5 with a width of d=14.5,
but that one is worse than fitting one of the outliers by putting a very
sharp Gaussian there (sum of squared residuals = 1763.3 vs. 1559.8,
correlation coefficient 0.043 vs. 0.117).

To avoid such problems, one could think of doing a fit with a 'penalty
function' for such high values, e.g., something in the sense of maximum
entropy methods, but this is not implemented in ImageJ and it would be
rather difficult to do so.
   https://en.wikipedia.org/wiki/Principle_of_maximum_entropy

By the way, the fitting process is based on minimizing the least-squares
deviation with the Nelder-Mead method, which is rather robust even for
badly conditioned minimization problems:
   https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method
In addition, for more robust fitting, it eliminates up to two parameters
by linear regression (in case of the Gaussian, the baseline and height
of the Gaussian, so only two fit parameters remain to be determined by
the Nelder-Mead method).
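
The parameter elimination can be sketched as follows. This is not
ImageJ's actual code, only an illustration under the assumption that the
Gaussian is written as y = a + h*g(x) with g(x) = exp(-(x-c)^2/(2*d^2)):
for any fixed center c and width d, the best baseline a and height h
follow from ordinary linear regression of y on g(x), so an iterative
minimizer only has to search over c and d. The function names are
hypothetical.

```python
import math

def unit_gaussian(x, center, width):
    """Gaussian with height 1 and baseline 0."""
    return math.exp(-(x - center) ** 2 / (2.0 * width ** 2))

def solve_baseline_and_height(xs, ys, center, width):
    """Least-squares baseline a and height h for fixed center and width,
    obtained by linear regression of y on g(x)."""
    g = [unit_gaussian(x, center, width) for x in xs]
    n = len(xs)
    mean_g = sum(g) / n
    mean_y = sum(ys) / n
    sgg = sum((gi - mean_g) ** 2 for gi in g)
    if sgg == 0.0:
        # g is constant at all data points: the height is undetermined,
        # so fall back to a flat line at the mean.
        return mean_y, 0.0
    sgy = sum((gi - mean_g) * (yi - mean_y) for gi, yi in zip(g, ys))
    h = sgy / sgg
    a = mean_y - h * mean_g
    return a, h

def sum_squared_residuals(xs, ys, center, width):
    """Cost that an outer minimizer would see: depends only on c and d."""
    a, h = solve_baseline_and_height(xs, ys, center, width)
    return sum((a + h * unit_gaussian(x, center, width) - y) ** 2
               for x, y in zip(xs, ys))
```

An outer search (Nelder-Mead or any other minimizer) would then call
sum_squared_residuals with trial values of center and width only.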
---

In your case, maybe you could detect whether the r-squared of the
Gaussian fit is too low (say, below 0.4) or the width ('d' parameter)
of the Gaussian is too small, and if so, use a different fit function
such as a fourth-order polynomial. It won't give you a much better fit,
but it can't run into the problem of a Gaussian trying to catch a
single outlier point.
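
A minimal sketch of that decision in Python, assuming the r-squared and
the width d of the Gaussian fit are already available. The 0.4 threshold
is the one suggested above; the width criterion (d smaller than the
spacing between data points) and all names are assumptions made for
illustration.

```python
def choose_fit(r_squared, width, x_spacing,
               min_r_squared=0.4, min_width_factor=1.0):
    """Return 'gaussian' if the Gaussian fit looks trustworthy,
    otherwise 'poly4' (fall back to a fourth-order polynomial)."""
    too_poor = r_squared < min_r_squared
    # Assumed criterion: a Gaussian narrower than the point spacing is
    # probably chasing a single outlier.
    too_narrow = abs(width) < min_width_factor * x_spacing
    if too_poor or too_narrow:
        return "poly4"
    return "gaussian"
```

For example, choose_fit(0.9, 5.0, 1.0) keeps the Gaussian, while a fit
with width 0.2 on data spaced 1.0 apart is rejected regardless of its
r-squared.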

---

The second of your questions is easier to answer:
 > Just out of curiosity: is there an equation or function that
 > interpolates only the missing positions of a curve, keeping each given
 > point?

You can do linear interpolation
   https://en.wikipedia.org/wiki/Linear_interpolation
cubic interpolation, splines, etc.
   https://en.wikipedia.org/wiki/Spline_(mathematics)
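
As an illustration of the linear case, here is a minimal Python sketch
that fills only the missing positions and leaves every given point
untouched. It assumes equidistant points, that missing positions are
marked as NaN, and that gaps at the start or end should simply repeat the
nearest valid value; the function name is hypothetical.

```python
import math

def fill_missing(values):
    """Return a copy of `values` in which NaN entries are linearly
    interpolated between the nearest valid neighbors. Valid entries
    are kept unchanged; leading/trailing gaps copy the nearest value."""
    result = list(values)
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not valid:
        return result  # nothing to interpolate from
    for i in range(len(result)):
        if not math.isnan(values[i]):
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            result[i] = values[right]
        elif right is None:
            result[i] = values[left]
        else:
            t = (i - left) / (right - left)
            result[i] = values[left] + t * (values[right] - values[left])
    return result

nan = float("nan")
print(fill_missing([1.0, nan, 3.0, nan, nan, 6.0]))
```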

In ImageJ, if you have equidistant points, simply create an image with a
height of one pixel and the data as pixel values; then you can use the
macro function
   getPixel(x, 0)
to get the interpolated value (linear by default; call
setOption("bicubic", true) for cubic interpolation).

ImageJ uses splines for 'rounding' segmented line ROIs, but I am not
aware of a macro interface for its built-in SplineFitter. One could
probably access it via JavaScript.


Michael
________________________________________________________________


On 24/06/2017 16:37, Rainer M. Engel wrote:

> [...]

Attachment: GaussianFit.png (12K)

Re: Curve Fitting issue/question

Rainer M. Engel
Dear Michael,

thank you very much for your exceedingly informative reply. I do not
have a strong math background, but I could understand most of it.

I tested your suggestions for both questions. The resulting data array
is checked against the min/max of the source data, so if there is a
peak like the one you described, I switch to another equation.
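
A minimal Python sketch of such a range check, assuming the source array
holds only the valid data points and that an optional tolerance is
acceptable; the function name and the tolerance parameter are
hypothetical.

```python
def fit_within_source_range(fitted, source, tolerance=0.0):
    """True if every fitted value lies within [min(source), max(source)],
    widened on both sides by `tolerance`."""
    lo = min(source) - tolerance
    hi = max(source) + tolerance
    return all(lo <= v <= hi for v in fitted)

# If this returns False (for example because the fit contains a value
# like -995468), switch to another fit equation.
```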

The interpolation via image is also useful. Maybe I'll incorporate that
method too.

Thanks again.

Rainer



On 26.06.2017 at 12:01, Michael Schmid wrote:

> [...]

