Hi,
I have a few questions (below) about Principal Component Analysis (PCA) which I am hoping someone will help me with. I ask because I'm trying two PCA packages* and they give different results. My fear is that I know just enough to be dangerous.

I haven't been able to find answers either online or in any of the linear algebra books in our library.

Thanks for your help; it is greatly appreciated.

Jim Cant

I apologize for the length of the questions; I opted for clarity rather than brevity.

1. Under what conditions are two sets of eigenvectors and associated eigenvalues considered equal?

   My hunch is that
     1. if all corresponding eigenvectors are the same scalar multiple of each other
     AND
     2. if the ratio of corresponding eigenvalues from each set is the same, i.e. they are scalar multiples of each other
     THEN
     the results are equivalent.

1b. What if #1 is relaxed to say that each pair of corresponding eigenvectors are scalar multiples, but the multiplier differs for each pair?

1c. What if the multiplier is the same for all pairs but sometimes differs in sign?

2. When calculating the covariance matrix, does one use the deviation of each observation from the mean of all observations for that feature, or from the mean of all observations over all features? From what I read, the first is the correct approach, but these two packages seem to differ.

3. Does the order of the calculated eigenvectors have any significance? It seems they are often returned sorted by eigenvalue. I ask because in my data each feature is an image taken at a particular time interval after a perturbation, giving the data an inherent ordering. I'm concerned that if I consider the data after sorting, it may be difficult to 'attribute' an eigenvector to a particular underlying cause (if the sort order changes).

4. Can anyone point me to some data together with the results of an eigenvalue analysis of that data? This would help a lot in testing. Even better, is there a way to programmatically generate test data where the eigenvectors/values are known?

5. Are there any other packages to do PCA that you'd recommend?

* The two packages are
  JAMA from NIST (http://math.nist.gov/javanumerics/jama/)
  and
  BIJ, Bio-medical Imaging in Java (http://bij.isi.uu.nl/)

I can only get BIJ to agree with JAMA if the raw data has only 2 features and the mean of the observations is 0 (before analysis). Looking at the BIJ source code, it appears that when calculating the covariance matrix, the deviations are taken with respect to the mean of all observations. (Also, the calculation of the mean itself appears suspect.)
Hi Jim,
Answers to some of your questions:

Eigenvectors: you can multiply each of them individually by any nonzero scalar you want. If A is a matrix, V an eigenvector and e the corresponding eigenvalue, so that
  A V = e V
then
  A (cV) = e (cV)
where c is any nonzero constant.

Eigenvalues: these cannot be multiplied by any value.
http://en.wikipedia.org/wiki/Eigenvalue%2C_eigenvector_and_eigenspace

PCA packages usually find the largest eigenvalue first, then the others in order of decreasing value.

Different results from different algorithms may arise from different normalization or pre-processing of the data. Calculations should be done after subtracting the mean (i.e., the average intensity of each channel).

For RGB images, PCA works well in cases where, on rotating the 3D histogram, all the data are seen to lie essentially in one plane (2 principal components) or along a line (1 principal component).
http://rsb.info.nih.gov/ij/plugins/color-inspector.html

Michael
________________________________________________________________
On 26 Jun 2007, at 18:26, Jim Cant wrote:
> [original post quoted in full]
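The points about eigenvector scaling and mean subtraction can be checked with a small Python sketch (standard library only; it is not taken from JAMA or BIJ). The helper names `matvec`, `covariance` and `unit` are made up for illustration, and the divisor n - 1 in the covariance is an assumption, since some packages divide by n instead.

```python
import math

def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def covariance(data, grand_mean=False):
    """Covariance matrix of data (rows = observations, columns = features).

    grand_mean=False: subtract each feature's own mean (the usual choice).
    grand_mean=True:  subtract one mean taken over all values, for comparison.
    Assumption: the divisor is n - 1.
    """
    n, p = len(data), len(data[0])
    if grand_mean:
        overall = sum(sum(row) for row in data) / (n * p)
        means = [overall] * p
    else:
        means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def unit(v):
    """Rescale v to unit length with its largest-magnitude entry positive.

    Two eigenvectors that differ only by a nonzero scalar (including a
    sign flip) map to the same result, so they can be compared directly.
    """
    norm = math.sqrt(sum(x * x for x in v))
    pivot = max(v, key=abs)
    sign = -1.0 if pivot < 0 else 1.0
    return [sign * x / norm for x in v]

# A symmetric matrix with eigenvector (1, 1) and eigenvalue 3.
A = [[2.0, 1.0], [1.0, 2.0]]
v = [1.0, 1.0]
cv = [-2.5 * x for x in v]            # rescaled, sign-flipped eigenvector

print(matvec(A, cv), [3.0 * x for x in cv])   # A(cV) and e(cV)
print(unit(v), unit(cv))                       # same direction after normalizing

# The two centering choices on data whose feature means differ.
data = [[1, 10], [2, 12], [3, 14]]
print(covariance(data))
print(covariance(data, grand_mean=True))
```

When comparing the output of two packages along these lines, the eigenvalues would be compared as they are, and the eigenvectors only after passing each through something like `unit`. Note that the two centering choices coincide whenever every feature already has the same mean (for example zero), which would be consistent with the packages agreeing only on zero-mean data, if BIJ does use the overall mean as the reading of its source suggests.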
In reply to this post by Jim Cant
Hello,
I agree with the comments given by Michael previously.

Some additional answers:

1c. PCA (or any Multivariate Statistical Analysis variant) provides eigenvectors (eigen-images when the data set is composed of a series of images) and scores (the weights of the different eigen-images in the original images). What is conserved is the product of the two signs (the sign of an eigen-image and the sign of the associated score). So the sign of an eigen-image taken alone appears more or less at random (depending on the algorithm used for computing the eigenvectors), but the contribution of each eigen-image to the original images is not random at all!

2. There are many variants of PCA:
- "raw" PCA (no centering, no normalization)
- PCA with centering
- PCA with normalization
- PCA with centering and normalization
- Correspondence Analysis (double normalization, on images and on pixels)
There are also two ways of performing PCA:
- either the objects are the images and the features are the intensities associated with the different pixels
- or the objects are the pixels and the features are the values of these pixels in the different images.
If you want to perform centering, you have to subtract the mean values of the OBJECTS (which is equivalent to Michael's answer if your objects are the images).

3. The order of the eigenvectors is always fixed by the decreasing order of the eigenvalues, because a large eigenvalue means high significance (considering that information is represented by a large variance).

4. Give me your email address and I will send you data with results.

5. I recommend my own plugin for ImageJ (of course!), available at:
http://www.univ-reims.fr/INSERM514/ImageJ
It includes the following variants: "raw" PCA, PCA with centering, Correspondence Analysis.

I hope it helps.

Noel ([hidden email])
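On question 4 (test data with known eigenvectors/values) and the sign remark in 1c, one possible approach is sketched below in plain Python. It is an illustration under stated assumptions, not the data offered above: the function names `known_pca_data`, `sample_covariance` and `reconstruct` are hypothetical, the data has only 2 features, and the covariance divisor is assumed to be n - 1. The idea is to choose the eigenvectors (a rotation by `theta`) and the eigenvalues first, then build four observations whose scores are exactly uncorrelated with the chosen variances.

```python
import math

def known_pca_data(theta, lam1, lam2):
    """Build 4 two-feature observations with a known eigen-decomposition.

    The sample covariance (divisor n - 1 = 3, an assumption) has
    eigenvectors q1, q2 at angle theta and eigenvalues lam1, lam2.
    Returns (data, scores, (q1, q2)).
    """
    q1 = [math.cos(theta), math.sin(theta)]
    q2 = [-math.sin(theta), math.cos(theta)]
    # Scores (+-a, +-b): zero mean, zero cross term, variances lam1 and lam2.
    a = math.sqrt(3.0 * lam1 / 4.0)
    b = math.sqrt(3.0 * lam2 / 4.0)
    scores = [(a, b), (a, -b), (-a, b), (-a, -b)]
    data = [reconstruct(s, (q1, q2)) for s in scores]
    return data, scores, (q1, q2)

def reconstruct(score, axes):
    """Observation = sum over components of score * eigenvector."""
    dim = len(axes[0])
    return [sum(s * q[k] for s, q in zip(score, axes)) for k in range(dim)]

def sample_covariance(data):
    """Covariance with per-feature centering and divisor n - 1."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

data, scores, (q1, q2) = known_pca_data(theta=0.3, lam1=5.0, lam2=2.0)
C = sample_covariance(data)

# C q1 should equal lam1 * q1 up to rounding error.
Cq1 = [sum(C[i][j] * q1[j] for j in range(2)) for i in range(2)]
print(Cq1, [5.0 * x for x in q1])

# Flipping the sign of an eigenvector together with its score
# leaves the reconstructed observation unchanged.
s1, s2 = scores[0]
flipped = reconstruct((-s1, s2), ([-x for x in q1], q2))
print(flipped, data[0])
```

Feeding `data` to a PCA package should then return eigenvalues close to 5 and 2 and eigenvectors proportional to `q1` and `q2`, with the sign of each eigenvector left undetermined as described in 1c.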
Dear all
As far as PCA is concerned, there are two good presentations (ppts) at http://www.ggebiplot.com/. The help file and demo programs there also help in understanding these terms better (but they do not use image data).

With regards,
Samuel

On 6/28/07, Noel BONNET <[hidden email]> wrote:
> [previous message quoted in full]