Need advice about Principal Component Analysis (PCA)

Jim Cant
Hi,

I have a few questions (below) about Principal Component Analysis (PCA)
that I hope someone can help me with.  I ask because I'm trying two
PCA packages* and they give different results.  My fear is that I
know just enough to be dangerous.

I haven't been able to find answers either online or in any of the
linear algebra books in our library.

Thanks for your help; it is greatly appreciated.

Jim Cant


I apologize for the length of the questions; I opted for clarity rather than brevity.

1.  Under what conditions are two sets of eigenvectors and associated
    eigenvalues considered equal?

    My hunch is that
        1. If all corresponding eigenvectors are the same scalar
           multiple of each other,
        AND
        2. If the ratio of corresponding eigenvalues from each set is
           the same, i.e. the eigenvalues are scalar multiples of each other,
        THEN
        the results are equivalent.

    1b.  What if #1 is relaxed so that each pair of corresponding
    eigenvectors are scalar multiples, but the multiplier differs
    for each pair?

    1c.  What if the multiplier is the same for all pairs but sometimes
    differs in sign?

2.  When calculating the covariance matrix, does one use the
    deviation of each observation from the mean of all
    observations for that feature, or from the mean of all observations
    over all features?  From what I read, the first is the correct
    approach, but these two packages seem to differ.

3.  Does the order of the calculated eigenvectors have any
    significance?  It seems they are often returned sorted by
    eigenvalue.  I ask because in my data, each feature is an image
    taken at a particular time interval after a perturbation, giving
    the data an inherent ordering.  I'm concerned that if I consider
    the data after sorting, it may be difficult to 'attribute' an
    eigenvector to a particular underlying cause (if the sort order
    changes).

4.  Can anyone point me to some data together with the results of
    eigenvalue analysis for that data?  This would help a lot in testing.
    Even better, is there a way to programmatically generate test data
    where the eigenvectors/values are known?

5.  Are there any other packages to do PCA that you'd recommend?


*  The two packages are
       JAMA from NIST (http://math.nist.gov/javanumerics/jama/)
   and
       BIJ, Bio-medical Imaging in Java (http://bij.isi.uu.nl/)

   I can only get BIJ to agree with JAMA if the raw data has
   only 2 features and the mean of the observations is 0 (before
   analysis).  Looking at the BIJ source code, it appears that when
   calculating the covariance matrix, the deviations are taken with
   respect to the mean of all observations.  (The calculation of the
   mean itself also appears suspect.)

Re: Need advice about Principal Component Analysis (PCA)

Michael Schmid
Hi Jim,

Here are answers to some of your questions:

Eigenvectors - you can multiply each of them individually by
any nonzero scalar you want. If A is a matrix, V an eigenvector and e
the eigenvalue,
   A V = e V
then A (cV) = e (cV)
where c is any nonzero constant.

Eigenvalues - these cannot be rescaled; for a given matrix A
they are fixed.
   http://en.wikipedia.org/wiki/Eigenvalue%2C_eigenvector_and_eigenspace
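To make this concrete for question 1, here is a minimal Java sketch (not code from JAMA or BIJ; class and method names are hypothetical) that compares two eigen-decompositions under the rule above: eigenvalues must match, while each pair of corresponding eigenvectors only has to be parallel, with its own nonzero multiplier, sign included. It assumes both results list the eigenpairs in the same order and that no eigenvalue is repeated; for a repeated eigenvalue any basis of its eigenspace is valid, so a vector-by-vector comparison would be too strict. One caveat worth checking: if one package divides the covariance sum by n and the other by n-1, every eigenvalue differs by the same factor n/(n-1) even though the eigenvectors agree.

```java
/**
 * Sketch: are two eigen-decompositions the same result? Eigenvalues must agree;
 * each pair of corresponding eigenvectors may differ by its own nonzero factor.
 */
final class EigenEquivalence {

    /** Cosine of the angle between u and v; +1 or -1 means parallel. */
    static double cosine(double[] u, double[] v) {
        double dot = 0, nu = 0, nv = 0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i] * v[i];
            nu += u[i] * u[i];
            nv += v[i] * v[i];
        }
        return dot / Math.sqrt(nu * nv);
    }

    /**
     * True if valsA[k] matches valsB[k] (within tol, scaled by |valsA[k]| when
     * that exceeds 1) and vecsA[k] is parallel to vecsB[k] for every k,
     * whatever the multiplier or its sign. Assumes the same eigenpair order
     * and no repeated eigenvalues.
     */
    static boolean equivalent(double[] valsA, double[][] vecsA,
                              double[] valsB, double[][] vecsB, double tol) {
        if (valsA.length != valsB.length) return false;
        for (int k = 0; k < valsA.length; k++) {
            double scale = Math.max(1.0, Math.abs(valsA[k]));
            if (Math.abs(valsA[k] - valsB[k]) > tol * scale) return false;
            if (1.0 - Math.abs(cosine(vecsA[k], vecsB[k])) > tol) return false;
        }
        return true;
    }
}
```

Under this rule, cases 1b (a different multiplier per pair) and 1c (sign flips) both count as equivalent, while eigenvalues that are only proportional do not.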

PCA packages usually find the largest eigenvalue first, then the
others in sequence of decreasing value. Different results for
different algorithms may arise from different normalization or
pre-processing of the data.

Calculations should be done after subtracting the mean (i.e.,
the average intensity of each channel).
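A minimal Java sketch of the two centering choices from question 2, assuming data[i][j] holds observation i of feature j and that the covariance sum is divided by n-1 (class and method names are hypothetical). perFeature subtracts each feature's own mean, as recommended above; grandMean subtracts a single mean taken over all observations of all features, which is what Jim suspected BIJ of doing.

```java
import java.util.Arrays;

/** Sketch of the two centering choices in question 2; data[i][j] = observation i, feature j. */
final class Covariance {

    /** Mean of each feature (column) over all observations. */
    static double[] featureMeans(double[][] data) {
        int n = data.length, p = data[0].length;
        double[] mean = new double[p];
        for (double[] row : data)
            for (int j = 0; j < p; j++) mean[j] += row[j];
        for (int j = 0; j < p; j++) mean[j] /= n;
        return mean;
    }

    /** Covariance of data after subtracting mean[j] from feature j, divided by n - 1. */
    static double[][] covariance(double[][] data, double[] mean) {
        int n = data.length, p = data[0].length;
        double[][] cov = new double[p][p];
        for (double[] row : data)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++)
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]);
        for (int j = 0; j < p; j++)
            for (int k = 0; k < p; k++) cov[j][k] /= (n - 1);
        return cov;
    }

    /** Usual choice: each feature is centered on its own mean. */
    static double[][] perFeature(double[][] data) {
        return covariance(data, featureMeans(data));
    }

    /** Alternative: one grand mean over all observations of all features. */
    static double[][] grandMean(double[][] data) {
        double[] m = featureMeans(data);
        double g = 0;
        for (double v : m) g += v;
        g /= m.length; // columns have equal length, so this is the mean of every value
        double[] same = new double[m.length];
        Arrays.fill(same, g);
        return covariance(data, same);
    }
}
```

For data {{1, 10}, {2, 20}, {3, 30}} the per-feature version gives variances 1 and 100, while the grand-mean version gives 122.5 for the first feature. The two coincide only when every feature already has the same mean (for example 0), which would fit the observation that the packages agree only for zero-mean data.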

For RGB images, you can see whether principal component analysis
will work well by rotating the 3D histogram: it works well when all
data lie essentially in one plane (2 principal components) or along
a line (1 principal component).
   http://rsb.info.nih.gov/ij/plugins/color-inspector.html
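Along the lines of this RGB example, and as one possible answer to question 4, test data with known eigenvectors can be generated programmatically: place 3-channel points exactly in a plane spanned by two chosen orthonormal directions u and w, on a symmetric grid so that the coefficients along u and w are uncorrelated. For the 15-point grid below (coefficients s = -2..2 along u, t = -1..1 along w) the covariance matrix is (30/14) u u^T + (10/14) w w^T, so the expected eigenvalues are 30/14, 10/14 and 0, with eigenvectors u, w and their cross product. This is a self-contained sketch with hypothetical names and an arbitrary offset of 100; the eigen solver is a plain cyclic Jacobi iteration for symmetric matrices, chosen only because it is short, and is not necessarily what JAMA or BIJ use.

```java
import java.util.Arrays;

/** Sketch: synthetic 3-channel data with known principal components, plus a small eigen solver. */
final class SyntheticPca {

    /** 15 points 100 + s*u + t*w for s in -2..2, t in -1..1 (u, w assumed orthonormal). */
    static double[][] planeData(double[] u, double[] w) {
        double[][] pts = new double[15][3];
        int i = 0;
        for (int s = -2; s <= 2; s++) {
            for (int t = -1; t <= 1; t++, i++) {
                for (int k = 0; k < 3; k++) {
                    pts[i][k] = 100 + s * u[k] + t * w[k]; // 100 = arbitrary offset (the mean)
                }
            }
        }
        return pts;
    }

    /** Covariance with each feature centered on its own mean, divided by n - 1. */
    static double[][] covariance(double[][] x) {
        int n = x.length, p = x[0].length;
        double[] mean = new double[p];
        for (double[] row : x) for (int j = 0; j < p; j++) mean[j] += row[j] / n;
        double[][] c = new double[p][p];
        for (double[] row : x)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++)
                    c[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);
        return c;
    }

    /** Cyclic Jacobi for symmetric s; v must be n x n and receives eigenvector j in column j. */
    static double[] symmetricEigen(double[][] s, double[][] v) {
        int n = s.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = s[i].clone();
            Arrays.fill(v[i], 0.0);
            v[i][i] = 1.0;
        }
        for (int sweep = 0; sweep < 100; sweep++) {
            double off = 0;
            for (int p = 0; p < n; p++)
                for (int q = p + 1; q < n; q++) off += a[p][q] * a[p][q];
            if (off < 1e-24) break; // off-diagonal part is numerically zero
            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) {
                    if (a[p][q] == 0.0) continue;
                    // rotation angle chosen so that the new a[p][q] is zero
                    double theta = (a[q][q] - a[p][p]) / (2 * a[p][q]);
                    double t = theta == 0.0 ? 1.0
                            : Math.signum(theta) / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
                    double c = 1 / Math.sqrt(t * t + 1), sn = t * c;
                    for (int k = 0; k < n; k++) { // columns p, q: A := A J
                        double akp = a[k][p], akq = a[k][q];
                        a[k][p] = c * akp - sn * akq;
                        a[k][q] = sn * akp + c * akq;
                    }
                    for (int k = 0; k < n; k++) { // rows p, q: A := J^T A
                        double apk = a[p][k], aqk = a[q][k];
                        a[p][k] = c * apk - sn * aqk;
                        a[q][k] = sn * apk + c * aqk;
                    }
                    for (int k = 0; k < n; k++) { // accumulate V := V J
                        double vkp = v[k][p], vkq = v[k][q];
                        v[k][p] = c * vkp - sn * vkq;
                        v[k][q] = sn * vkp + c * vkq;
                    }
                }
            }
        }
        double[] eig = new double[n];
        for (int i = 0; i < n; i++) eig[i] = a[i][i];
        return eig;
    }
}
```

Feeding the same synthetic points to each PCA package and checking for 30/14, 10/14 and about 0 gives a test case with a known answer. Adding a little noise makes the data more realistic, but the expected values then hold only approximately.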

Michael
________________________________________________________________

In reply to Jim Cant's message of 26 Jun 2007, 18:26.

Re: Need advice about Principal Component Analysis (PCA)

Noel BONNET
In reply to this post by Jim Cant
Hello,

I agree with the comments given by Michael previously.

Some additional answers:
1c. PCA (or any variant of Multivariate Statistical Analysis) provides
eigenvectors (eigen-images when the data set is composed of a series of
images) and scores (the weights of the different eigen-images in the
original images). What is conserved is the product of the two signs (the
sign of an eigen-image and the sign of the associated score). So the sign
of each eigen-image appears more or less at random (depending on the
algorithm used for computing the eigenvectors), but the contribution of
each eigen-image to the original images is not random at all!
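A small Java sketch of this point, with hypothetical names: the score of an observation on a unit eigenvector e is its mean-subtracted values projected onto e, and that component's contribution to the observation is score times e. Flipping the sign of e, as a different solver might, flips the score as well, so the contribution does not change.

```java
/** Sketch: an eigenvector's sign is arbitrary, but score * eigenvector is not. */
final class SignAmbiguity {

    /** Score of observation x on unit eigenvector e, after subtracting the mean. */
    static double score(double[] x, double[] mean, double[] e) {
        double s = 0;
        for (int k = 0; k < x.length; k++) s += (x[k] - mean[k]) * e[k];
        return s;
    }

    /** Contribution of component e to x: score(x, mean, e) * e. */
    static double[] contribution(double[] x, double[] mean, double[] e) {
        double s = score(x, mean, e);
        double[] c = new double[e.length];
        for (int k = 0; k < e.length; k++) c[k] = s * e[k];
        return c;
    }

    /** The same eigenvector with the opposite sign. */
    static double[] flip(double[] e) {
        double[] f = new double[e.length];
        for (int k = 0; k < e.length; k++) f[k] = -e[k];
        return f;
    }
}
```

For x = {3, 1}, mean = {1, 1} and e = {0.6, 0.8}, the score is 1.2 with e and -1.2 with the flipped vector, and the contribution is {0.72, 0.96} in both cases.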

2. There are many variants of PCA:
- "raw" PCA (no centering, no normalization)
- PCA with centering
- PCA with normalization
- PCA with centering and normalization
- Correspondence Analysis (double normalization, on images and on pixels)
There are also two ways of performing PCA:
- either the objects are the images and the features are the intensities
associated with the different pixels,
- or the objects are the pixels and the features are the values of these
pixels in the different images.
If you want to perform centering, you have to subtract the mean values of
the OBJECTS (which is equivalent to Michael's answer if your objects
are the images).
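The first four variants in this list differ only in how the data matrix is preprocessed before its cross-product matrix is diagonalized. A minimal Java sketch, assuming rows are objects and columns are features, and taking "normalization" to mean dividing each feature by its standard deviation (the plugin may define it differently; Correspondence Analysis is not covered). Swapping the roles of objects and features, the second choice above, amounts to transposing the matrix first. All names are hypothetical.

```java
/** Sketch of PCA preprocessing variants; rows are objects, columns are features. */
final class PcaVariants {

    enum Prep { RAW, CENTER, NORMALIZE, CENTER_AND_NORMALIZE }

    /** Returns a preprocessed copy of x; the input is not modified. */
    static double[][] preprocess(double[][] x, Prep prep) {
        int n = x.length, p = x[0].length;
        boolean center = prep == Prep.CENTER || prep == Prep.CENTER_AND_NORMALIZE;
        boolean normalize = prep == Prep.NORMALIZE || prep == Prep.CENTER_AND_NORMALIZE;
        double[][] y = new double[n][];
        for (int i = 0; i < n; i++) y[i] = x[i].clone();
        for (int j = 0; j < p; j++) {
            double mean = 0;
            for (int i = 0; i < n; i++) mean += x[i][j];
            mean /= n;
            double ss = 0;
            for (int i = 0; i < n; i++) ss += (x[i][j] - mean) * (x[i][j] - mean);
            double sd = Math.sqrt(ss / (n - 1)); // standard deviation of feature j
            for (int i = 0; i < n; i++) {
                if (center) y[i][j] -= mean;
                if (normalize && sd > 0) y[i][j] /= sd;
            }
        }
        return y;
    }

    /** Matrix to diagonalize: Y^T Y / (n - 1); for centered data this is the covariance. */
    static double[][] crossProduct(double[][] y) {
        int n = y.length, p = y[0].length;
        double[][] m = new double[p][p];
        for (double[] row : y)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++) m[j][k] += row[j] * row[k] / (n - 1);
        return m;
    }

    /** Swap objects and features (images <-> pixels). */
    static double[][] transpose(double[][] x) {
        double[][] t = new double[x[0].length][x.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x[0].length; j++) t[j][i] = x[i][j];
        return t;
    }
}
```

With CENTER the result is the ordinary covariance matrix; with CENTER_AND_NORMALIZE it is the correlation matrix, so features measured on very different scales weigh equally.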

3. The eigenvectors are ordered by decreasing eigenvalue, because a large
eigenvalue indicates an important component (assuming that information
is represented by large variance).
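A minimal Java sketch of this ordering, with hypothetical names. A general-purpose eigen solver need not return eigenpairs in this order, so the sketch sorts them explicitly and moves each eigenvector together with its eigenvalue. Regarding question 3: sorting permutes whole eigenvectors only; the entries inside each eigenvector stay in the original feature order (here, time after the perturbation), so each component can still be read as a profile over time whatever its rank. The fraction of total variance per component is one common way to express its importance.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch: order eigenpairs by decreasing eigenvalue; vectors[k] pairs with eigenvalues[k]. */
final class EigenOrder {

    /** Indices that put eigenvalues in decreasing order (ties keep their input order). */
    static int[] decreasingOrder(double[] eigenvalues) {
        Integer[] idx = new Integer[eigenvalues.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble((Integer k) -> eigenvalues[k]).reversed());
        int[] order = new int[idx.length];
        for (int i = 0; i < idx.length; i++) order[i] = idx[i];
        return order;
    }

    /** Reorders whole eigenvectors; entries inside each vector keep the feature order. */
    static double[][] reorder(double[][] vectors, int[] order) {
        double[][] out = new double[order.length][];
        for (int i = 0; i < order.length; i++) out[i] = vectors[order[i]].clone();
        return out;
    }

    /** Share of the total variance carried by each component, following order. */
    static double[] explainedFraction(double[] eigenvalues, int[] order) {
        double total = 0;
        for (double v : eigenvalues) total += v;
        double[] frac = new double[order.length];
        for (int i = 0; i < order.length; i++) frac[i] = eigenvalues[order[i]] / total;
        return frac;
    }
}
```

For eigenvalues {0.5, 3.0, 1.5} the order is {1, 2, 0} and the explained fractions are 0.6, 0.3 and 0.1.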

4. Give me your email address and I will send you data with results.

5. I recommend my own plugin for ImageJ (of course!), available at:
http://www.univ-reims.fr/INSERM514/ImageJ

It includes the following variants: "raw" PCA, PCA with centering,
Correspondence Analysis.

I hope it helps.


Noel ([hidden email])

Re: Need advice about Principal Component Analysis (PCA)

dksamuel
Dear all,

As far as PCA is concerned, there are two good PowerPoint presentations at
http://www.ggebiplot.com/. The help file and demo programs there help in
understanding these terms better (although they do not use image data).

With regards,
Samuel

In reply to Noel BONNET's message of 6/28/07.
