April 03, 2012

Principal Components Analysis (Advanced Data Analysis from an Elementary Point of View)

Principal components is the simplest, oldest and most robust of dimensionality-reduction techniques. It works by finding the line (plane, hyperplane) which passes closest, on average, to all of the data points. This is equivalent to maximizing the variance of the projection of the data on to the line/plane/hyperplane. Actually finding those principal components reduces to finding eigenvalues and eigenvectors of the sample covariance matrix. Why PCA is a data-analytic technique, and not a form of statistical inference. An example with cars. PCA with words: "latent semantic analysis"; an example with real newspaper articles. Visualization with PCA and multidimensional scaling. Cautions about PCA; the perils of reification; illustration with genetic maps.

Reading: Notes, chapter 18; pca.R, pca-examples.Rdata, and cars-fixed04.dat

Advanced Data Analysis from an Elementary Point of View

Posted by crshalizi at April 03, 2012 09:10 | permanent link

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems