Statistical & Financial Consulting by Stanford PhD

Linear Discriminant Analysis (LDA) is a classification technique developed by Ronald Fisher. In a classification setting we need to solve the following problem. We observe N objects. For each object we know the values of variables X1, ..., Xp. We also know that the objects are split into classes 1, 2, ..., K, and for each object we know its class membership. We need to develop a statistical method that allows us to identify the class membership of a new object for which only the values of X1, ..., Xp are known.

Linear discriminant analysis assumes that, within each class i, the features X1, ..., Xp have a joint normal distribution. The random vector (X1, ..., Xp) has mean mi = (mi1, ..., mip) within class i, while its covariance matrix is assumed to be the same in every class. For different classes i and j, the means mi and mj are different, which gives us the ability to distinguish between the classes.

The parameters of the joint normal distributions are estimated via the method of maximum likelihood (MML). Because of the normality assumption, MML leads to very simple formulas; for example, the estimated mean of each class is just the sample mean of the observations in that class. Treating the parameter estimates as the true values, we can use Bayes' rule to derive a formula for the probability that object A belongs to class i, given that its features X1, ..., Xp have values x1, ..., xp respectively. We then classify A as a member of class j if j is the likeliest class.
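The estimation and classification steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `fit_lda`, `posterior` and `classify` are hypothetical, class prior probabilities are assumed to be estimated by class frequencies, and a single pooled covariance matrix is assumed to be shared by all classes.

```python
import numpy as np

def fit_lda(X, y):
    """Maximum likelihood estimates: class priors, class means, pooled covariance."""
    classes, idx = np.unique(y, return_inverse=True)
    n = X.shape[0]
    # Prior of each class: its relative frequency in the training sample (assumption).
    priors = np.bincount(idx) / n
    # Mean vector of each class: the within-class sample mean.
    means = np.array([X[idx == k].mean(axis=0) for k in range(len(classes))])
    # Pooled covariance: center every object on its own class mean;
    # the maximum likelihood estimate divides by n rather than n - K.
    centered = X - means[idx]
    cov = centered.T @ centered / n
    return classes, priors, means, cov

def posterior(x, priors, means, cov):
    """Probability of each class given features x, via Bayes' rule."""
    diffs = x - means                                  # shape (K, p)
    solved = np.linalg.solve(cov, diffs.T).T           # cov^{-1} (x - m_i) for each i
    # Log of prior times normal density; the normalizing constant is the same
    # for every class (shared covariance), so it is dropped.
    log_post = np.log(priors) - 0.5 * np.sum(diffs * solved, axis=1)
    log_post -= log_post.max()                         # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def classify(x, classes, priors, means, cov):
    """Assign x to the likeliest class."""
    return classes[np.argmax(posterior(x, priors, means, cov))]
```

For instance, after `params = fit_lda(X, y)` on a training sample, `classify(x_new, *params)` returns the predicted class label of a new object and `posterior(x_new, *params[1:])` returns the estimated class probabilities. The sketch assumes the pooled covariance estimate is non-singular.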

It can be shown that this approach implies linear boundaries between the classes, a consequence of all classes sharing the same covariance matrix. In other words, in the p-dimensional Euclidean space there are hyperplanes that separate the "territory" of one class from the "territories" of the others. If object A has features (X1, ..., Xp) which belong to the "territory" of class j, then the probability that A is a member of j is larger than the probability that A is a member of class i, for any other i. So we classify A to be a member of class j. The linearity of the boundaries is why we call the method "linear discriminant analysis".
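The linearity can be seen from a short derivation, sketched here under the assumption that all classes share one covariance matrix. The symbols \pi_i (prior probability of class i) and \Sigma (the common covariance matrix) are notation introduced for this sketch; m_i is the class mean from the text, and x = (x1, ..., xp) is the observed feature vector.

```latex
\begin{align*}
\log \frac{P(\text{class } j \mid x)}{P(\text{class } i \mid x)}
  &= \log\frac{\pi_j}{\pi_i}
     - \tfrac{1}{2}\,(x - m_j)^\top \Sigma^{-1} (x - m_j)
     + \tfrac{1}{2}\,(x - m_i)^\top \Sigma^{-1} (x - m_i) \\
  &= \log\frac{\pi_j}{\pi_i}
     - \tfrac{1}{2}\,(m_j + m_i)^\top \Sigma^{-1} (m_j - m_i)
     + x^\top \Sigma^{-1} (m_j - m_i).
\end{align*}
```

The quadratic term x^\top \Sigma^{-1} x appears in both class densities and cancels, so the log-ratio is a linear function of x. The boundary between classes i and j is the set of points where this expression equals zero, which is a hyperplane.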

LDA is one of the simplest classification techniques, along with logistic regression and k-nearest neighbors. Despite their simplicity, these techniques compete successfully with more complex approaches in many situations.

