Statistical & Financial Consulting by Stanford PhD

Path Analysis focuses on identifying causal relationships within a group of random variables. "Causal" is meant in the following sense: variable X causes variable Z if

i) a change in variable X precedes a change in variable Z,
ii) the distribution of the change in Z depends on the change in X.

The definition implies that causation is possible in one direction only: there is no pair of variables that cause each other. Causal relationships are best visualized with so-called "directed graphs". In a directed graph random variables are represented with nodes. An arrow is drawn from node X to node Z if variable X causes variable Z. Variable X is called the "parent" while variable Z is called the "child".
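As a rough illustration of the directed-graph representation, the sketch below stores a small path diagram as a mapping from each child to its parents. The three variables and their arrows are hypothetical, and the helper names `children_of` and `causal_order` are introduced here only for illustration. The sketch also assumes the graph has no cycles at all, which is slightly stronger than ruling out pairs of variables that cause each other.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical path diagram: each key is a child, each value is the set of its parents.
parents = {
    "X": set(),
    "Y": {"X"},
    "Z": {"X", "Y"},
}

def children_of(node, parents):
    """Return the variables that `node` causes directly."""
    return sorted(child for child, ps in parents.items() if node in ps)

def causal_order(parents):
    """Return the variables ordered so that every parent precedes its children.

    Raises ValueError if the arrows form a cycle.
    """
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        raise ValueError(f"graph contains a cycle: {err.args[1]}") from err

order = causal_order(parents)
print(order)                      # parents always appear before their children
print(children_of("X", parents))  # direct effects of X
```

Storing parents rather than children is a design choice: the conditional distribution of each variable is specified given its parents, so this layout matches how such a model would typically be estimated.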

All suspected causal links are collected in a so-called "full model". They are only suspects: some of those links may not exist. For that reason we specify an alternative model, which contains only a subset of the causal links. It is called the "reduced model". The reduced model is a particular case of the full model.

The preferred estimation method for both models is the Method of Maximum Likelihood (MML). This preference is explained by several properties of MML:

1) Even multi-layer directed graphs imply simple conditional distributions of one variable given its parents. This leads to a simple multiplicative formula for the likelihood function of the data.

2) MML does not require closed-form solutions for any function of the parameters. Once the likelihood function has been derived, its maximum can be found either analytically or with the help of numerical methods.

3) Often the full model is assumed to be linear and the conditional distributions are assumed to be normal. Suppose the normality assumption is wrong. Then the calculated likelihood function is incorrect. Nonetheless, maximizing it still produces consistent estimates of the parameters, provided that the dependence of the first and second moments of the variables on the parameters has been specified correctly. The resulting approach is sometimes referred to as the Method of Quasi-maximum Likelihood.
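The multiplicative formula in property 1 and the linear normal case in property 3 can be written out as follows. This is a sketch under assumptions the text does not state explicitly: n independent observations, p variables, errors that are independent across equations, and a separate parameter block for each variable. The symbols (pa(j) for the parents of variable j, and the coefficients alpha, beta, sigma) are notation introduced here.

```latex
% Assumed notation: x_{ij} is observation i of variable j,
% pa(j) is the set of parents of variable j in the directed graph.
L(\theta) = \prod_{i=1}^{n} \prod_{j=1}^{p}
  f_j\bigl(x_{ij} \mid x_{i,\mathrm{pa}(j)};\, \theta_j\bigr)

% Linear model with normal errors:
X_j = \alpha_j + \sum_{k \in \mathrm{pa}(j)} \beta_{jk} X_k + \varepsilon_j,
\qquad \varepsilon_j \sim N(0, \sigma_j^2)

% Resulting log-likelihood:
\log L(\theta) = \sum_{j=1}^{p} \sum_{i=1}^{n}
  \left[ -\tfrac{1}{2} \log\bigl(2\pi\sigma_j^2\bigr)
  - \frac{\bigl(x_{ij} - \alpha_j
      - \sum_{k \in \mathrm{pa}(j)} \beta_{jk} x_{ik}\bigr)^2}{2\sigma_j^2} \right]
```

Under these assumptions the log-likelihood is a sum of one term per variable, so each variable's parameters can be maximized separately. For the intercepts and path coefficients this reduces to a least-squares regression of the variable on its parents.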

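After estimation, the full and reduced models can be compared using the likelihood ratio test. The test statistic is twice the difference between the maximized log-likelihoods of the full and reduced models. It is never negative because the reduced model is a particular case of the full model. If the reduced model is true, the statistic tends to take small values: under standard regularity conditions it is approximately chi-squared distributed, with degrees of freedom equal to the number of causal links removed. If the statistic is especially large, the reduced model is rejected in favor of the full model. Otherwise the reduced model is retained, and we conclude that the data do not support the causal links it omits.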
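The comparison of a full and a reduced model can be sketched in a few lines of standard-library Python. Everything specific here is an assumption made for illustration: the three variables, the simulated data and its coefficients, the linear normal form of each equation, and the helper names `ols_rss`, `max_loglik` and `model_loglik`. The reduced model drops a single link (X to Z), so the chi-squared reference distribution has one degree of freedom, for which the tail probability can be computed with `math.erfc`.

```python
import math
import random

def ols_rss(y, predictors):
    """Residual sum of squares from regressing y on an intercept plus predictors."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    k = len(cols)
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(cols[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(b * col[t] for b, col in zip(beta, cols)) for t in range(n)]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))

def max_loglik(y, predictors):
    """Maximized normal log-likelihood of one variable given its parents."""
    n = len(y)
    sigma2 = ols_rss(y, predictors) / n  # ML estimate of the error variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def model_loglik(data, parents):
    """Sum the per-variable log-likelihoods (the log of the multiplicative formula)."""
    return sum(max_loglik(data[child], [data[p] for p in ps])
               for child, ps in parents.items())

# Simulated data in which X really does affect Z directly (hypothetical coefficients).
rng = random.Random(0)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
z = [0.8 * xi + 0.5 * yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]
data = {"X": x, "Y": y, "Z": z}

full = {"X": [], "Y": ["X"], "Z": ["X", "Y"]}   # all suspected links
reduced = {"X": [], "Y": ["X"], "Z": ["Y"]}     # link X -> Z removed

stat = 2 * (model_loglik(data, full) - model_loglik(data, reduced))
stat = max(stat, 0.0)                       # guard against rounding below zero
p_value = math.erfc(math.sqrt(stat / 2))    # chi-squared tail, 1 degree of freedom

print(f"LR statistic = {stat:.2f}, p-value = {p_value:.3g}")
print("reject reduced model" if p_value < 0.05 else "retain reduced model")
```

For reduced models that remove more than one link, the tail probability would need a general chi-squared distribution function from a statistics library rather than the one-degree-of-freedom shortcut used here.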

Path analysis can be viewed as a special case of structural equation modeling (SEM).

