Statistical & Financial Consulting by Stanford PhD


Principal Component Analysis (PCA) is a form of data compression. Suppose we have information stored in P correlated random variables. Because the variables are correlated, they contain less information than P uncorrelated ones. In the degenerate case where one variable is an exact linear combination of the others, the amount of stored information corresponds to at most P-1 uncorrelated variables.
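The degenerate case can be checked numerically. The following is a minimal sketch in Python with NumPy, using simulated data; the variable names, the coefficients 2.0 and -0.5, and the 1e-10 threshold are illustrative assumptions rather than anything prescribed by PCA itself. It builds P = 3 variables, one of which is an exact linear combination of the other two, and counts how many eigenvalues of the covariance matrix are materially above zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent variables plus a third that is an exact
# linear combination of them (the degenerate case).
n_obs = 1000
x1 = rng.normal(size=n_obs)
x2 = rng.normal(size=n_obs)
x3 = 2.0 * x1 - 0.5 * x2
X = np.column_stack([x1, x2, x3])  # n_obs-by-P data matrix, P = 3

# P-by-P sample covariance matrix (columns are variables).
V = np.cov(X, rowvar=False)

# Eigenvalues of a symmetric matrix, in ascending order.
eigvals = np.linalg.eigvalsh(V)

# Count eigenvalues that are not numerically zero
# (1e-10 is an assumed tolerance for rounding error).
n_informative = int(np.sum(eigvals > 1e-10))
print(n_informative)
```

Under these assumptions the count is P-1 = 2: the third variable adds no direction of variation that the first two do not already span.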

PCA delivers the optimal way of approximating the P variables with linear combinations of Q uncorrelated factors, where Q < P. Optimality here means minimizing the total variance of the discrepancies between the variables and their approximations, that is, the sum of the P discrepancy variances. The uncorrelated factors are called principal components. In effect, we "compress" the P original variables into Q uncorrelated factors: knowing the values of those factors allows us to approximate the values of the original variables for any observation.
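The optimality claim can be illustrated with a small experiment. The sketch below (Python with NumPy, simulated data, all names hypothetical) compares the total discrepancy variance left by the Q leading eigenvectors of the covariance matrix with the variance left by 200 randomly drawn Q-dimensional orthonormal bases. It is an illustration under these assumptions, not a proof.

```python
import numpy as np

rng = np.random.default_rng(3)
P, Q, n_obs = 6, 2, 1000

# Hypothetical correlated data: independent noise mixed by a random matrix.
mixing = rng.normal(size=(P, P))
X = rng.normal(size=(n_obs, P)) @ mixing.T
X = X - X.mean(axis=0)  # center each variable


def residual_variance(data, basis):
    """Sum of the variances of the discrepancies after approximating
    each row of `data` by its projection onto the orthonormal columns
    of `basis`."""
    residual = data - data @ basis @ basis.T
    return np.var(residual, axis=0, ddof=1).sum()


# Q eigenvectors of the covariance matrix with the largest eigenvalues.
V = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(V)   # ascending eigenvalues
pca_basis = eigvecs[:, ::-1][:, :Q]    # reorder to descending, keep Q

pca_error = residual_variance(X, pca_basis)

# Compare against random Q-dimensional orthonormal bases.
random_errors = []
for _ in range(200):
    random_basis, r = np.linalg.qr(rng.normal(size=(P, Q)))
    random_errors.append(residual_variance(X, random_basis))
best_random_error = min(random_errors)

print(pca_error <= best_random_error)
```

No random basis should leave less total variance than the PCA basis, since the PCA basis attains the minimum over all Q-dimensional linear approximations.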

The principal components are calculated from a singular value decomposition (SVD) of the covariance matrix V of the original variables. Because V is symmetric and positive semi-definite, its SVD coincides with its eigendecomposition and has the form:

V = U * D * U',

where U is an orthogonal matrix and D is a diagonal matrix whose entries are non-negative and sorted in non-increasing order along the diagonal. The columns of U are the eigenvectors of V, and the diagonal of D contains the corresponding eigenvalues.
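The decomposition above can be computed as in the following sketch, written in Python with NumPy on simulated data (the data-generating step and all variable names are assumptions made for illustration). It uses `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order, so the result is reordered to match the decreasing convention used here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 500 observations of P = 4 correlated variables.
mixing = rng.normal(size=(4, 4))
data = rng.normal(size=(500, 4)) @ mixing.T

# Sample covariance matrix V (columns are variables).
V = np.cov(data, rowvar=False)

# Eigendecomposition of the symmetric matrix V.
eigvals, eigvecs = np.linalg.eigh(V)

# Reorder so that eigenvalues decrease along the diagonal of D.
order = np.argsort(eigvals)[::-1]
D = np.diag(eigvals[order])
U = eigvecs[:, order]

# Sanity checks: V = U * D * U' and U is orthogonal.
reconstruction_ok = np.allclose(U @ D @ U.T, V)
orthogonal_ok = np.allclose(U.T @ U, np.eye(4))

# For a covariance matrix, the singular values equal the eigenvalues.
singular_values = np.linalg.svd(V, compute_uv=False)
svd_matches = np.allclose(singular_values, np.diag(D))

print(reconstruction_ok, orthogonal_ok, svd_matches)
```

Eigenvectors are determined only up to sign, so the columns of U may differ in sign from those produced by another routine without affecting the decomposition.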

Let X be a P-by-1 vector containing the values of the original random variables. Then the P-by-1 vector of principal components is calculated as

PC = (PC_{1}, PC_{2}, ... , PC_{P}) = U' * X

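The first Q coordinates of PC are the Q factors used to compress the original variables. The principal components are thus the result of an orthogonal transformation of the original variables. Since the covariance matrix of PC is U' * V * U = D, the components are uncorrelated and the variance of PC_{k} is the k-th diagonal element of D.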
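The whole procedure, from covariance matrix to compressed approximation, can be sketched as follows in Python with NumPy. The simulated data (two strong latent drivers plus small noise), the choice P = 5 and Q = 2, and all variable names are assumptions made for illustration. Observations are stored as rows, so the formula PC = U' * X for a single column vector becomes `X @ U` for the whole data matrix, and the variables are centered before transformation so that the approximation applies to deviations from the means.

```python
import numpy as np

rng = np.random.default_rng(2)
P, Q, n_obs = 5, 2, 2000

# Hypothetical data: two strong latent drivers plus small noise.
loadings = rng.normal(size=(P, 2))
latent = rng.normal(size=(n_obs, 2))
X = latent @ loadings.T + 0.1 * rng.normal(size=(n_obs, P))
X = X - X.mean(axis=0)  # center each variable

# Eigendecomposition of the covariance matrix, eigenvalues descending.
V = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(V)
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], eigvecs[:, order]

# Principal components: each row of PC is (U' * x)' for one observation x.
PC = X @ U

# Keep the first Q components and rebuild an approximation of X from them.
factors = PC[:, :Q]
X_approx = factors @ U[:, :Q].T

# The components are uncorrelated, with variances given by the eigenvalues.
pc_cov = np.cov(PC, rowvar=False)

# Total variance of the discrepancies and share of variance retained.
residual_var = np.var(X - X_approx, axis=0, ddof=1).sum()
explained_share = eigvals[:Q].sum() / eigvals.sum()

print(np.allclose(pc_cov, np.diag(eigvals)))
print(np.isclose(residual_var, eigvals[Q:].sum()))
```

The total discrepancy variance equals the sum of the discarded eigenvalues, which is one common way of choosing Q: keep enough components for the retained share of variance to reach a chosen threshold.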

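The first principal component PC_{1} has the largest variance among all linear combinations of the original variables whose coefficient vectors have unit length, and that variance equals the largest eigenvalue of V.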

PCA is closely related to factor analysis. Factor analysis typically imposes more constraints on the underlying factors and on the structure of the random shocks. As a result, factor analysis solves for the eigenvectors of a slightly different matrix.

