Statistical & Financial Consulting by Stanford PhD


Multiple Linear Regression (MLR) is a method used to model the linear relationship between a dependent variable and one or more independent variables. The dependent variable is also called the response variable, while the independent variables are called the predictors. The regression model is the following:
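In one conventional notation (the symbols are assumptions chosen for illustration and may differ from those used elsewhere on this page), with response $Y$, predictors $X_1, \dots, X_p$, regression coefficients $\beta_0, \dots, \beta_p$ and a random error term $\varepsilon$, the model can be written as:

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon
```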

Here

In the most general set-up, predictors

To estimate the regression model we observe several realizations of random vector (

1] If

2] If

3] If

The data do not have to be jointly normal (Gaussian) for the estimation to work. In fact the following is true: if the residuals are normal conditional on predictors (
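As a minimal sketch of the estimation step, the snippet below fits the model by ordinary least squares, which is only one of the possible estimators and is assumed here for illustration. The function name `fit_ols`, the simulated data and the "true" coefficients are all hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Return ordinary least squares estimates, intercept first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    # lstsq minimizes the sum of squared residuals ||y - design @ beta||^2.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Simulated realizations (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

beta_hat = fit_ols(X, y)
print(beta_hat)  # estimates of the intercept and the two slopes
```

With low noise and 200 observations, the estimates should land close to the coefficients used in the simulation.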

After the estimation is done, it is time for diagnostics. The estimates of regression coefficients

1] Whether the true value of coefficient

2] Whether true values of several coefficients

3] Whether predictors

4] Outliers and high-leverage points are diagnosed using Cook's distance, the leverage statistic and other relevant metrics. If flawed data points are detected, they are dropped from the data set.
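The diagnostics above can be sketched roughly as follows, assuming an ordinary least squares fit. The code computes t-statistics for the individual coefficients, the leverage of each observation (the diagonal of the hat matrix) and Cook's distance. The function name `ols_diagnostics` and the simulated data with one planted bad point are hypothetical.

```python
import numpy as np

def ols_diagnostics(X, y):
    """t-statistics, leverage and Cook's distance for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]                      # number of coefficients
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)         # unbiased residual variance
    std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
    t_stats = beta / std_err                 # tests of "coefficient equals 0"
    # Leverage h_ii: diagonal of the hat matrix D (D'D)^{-1} D'.
    leverage = np.einsum("ij,jk,ik->i", design, xtx_inv, design)
    # Cook's distance: e_i^2 / (k * s^2) * h_ii / (1 - h_ii)^2.
    cooks = resid**2 / (k * sigma2) * leverage / (1.0 - leverage) ** 2
    return t_stats, leverage, cooks

# Hypothetical data: a clean linear trend plus one planted flawed point.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)
x[0], y[0] = 6.0, -10.0                      # far out in x and far off the line

t_stats, leverage, cooks = ols_diagnostics(x, y)
suspect = int(np.argmax(cooks))              # index of the most influential point
```

A point that is extreme in the predictor and far from the fitted line combines high leverage with a large residual, so it dominates Cook's distance. Whether to drop it remains a judgment call about the data.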

As the next stage, we experiment with adding new predictors and dropping some of the old ones to see what models result. In the end, we want a collection of candidate models in which each predictor is significant. We then choose the best model using one of the standard model selection methods. The chosen model can be used for prediction on a completely new data set.
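One way to sketch the selection step is to compare candidate models by the Akaike information criterion (AIC). AIC is assumed here purely as an example of a standard model selection method, and cross-validation or BIC could be used instead. The helper `gaussian_aic`, the candidate sets and the simulated data are hypothetical.

```python
import numpy as np

def gaussian_aic(X, y, columns):
    """AIC (up to an additive constant) of an OLS fit on the given columns."""
    n = len(y)
    design = np.column_stack([np.ones(n), X[:, list(columns)]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    # Gaussian log-likelihood term plus a penalty of 2 per coefficient.
    return n * np.log(rss / n) + 2 * design.shape[1]

# Hypothetical data: the response depends on predictors 0 and 1 but not on 2.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = 0.5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)

# Candidate models, each given as a tuple of predictor column indices.
candidates = [(0,), (0, 1), (0, 1, 2)]
aic = {cols: gaussian_aic(X, y, cols) for cols in candidates}
best = min(aic, key=aic.get)                 # smallest AIC wins
```

The penalty term discourages keeping predictors that barely reduce the residual sum of squares. The model that omits a truly relevant predictor should score clearly worse than the one that includes it.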

Note that multiple linear regression is called "linear" because the response depends linearly on the regression coefficients. The relationship between the response and the predictors does not have to be linear. Even if the true relationship is the following:

we can denote

The new notation does not change the estimates of coefficients
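To illustrate the relabeling idea, the sketch below assumes a quadratic relationship (an example chosen here, not necessarily the one intended above): the response is nonlinear in `x` but linear in the coefficients, so renaming `x` and `x**2` as two new predictors `z1` and `z2` turns it into an ordinary multiple linear regression. The data are noise-free and hypothetical.

```python
import numpy as np

# Hypothetical nonlinear relationship: y = 1 + 2*x + 3*x^2 (no noise).
x = np.linspace(-2.0, 2.0, 21)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Relabel the transformed terms as new predictors.
z1, z2 = x, x**2

# In terms of z1 and z2 the model is linear, so least squares applies directly.
design = np.column_stack([np.ones_like(x), z1, z2])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta)  # intercept, coefficient of z1, coefficient of z2
```

Because the data are noise-free, the fit recovers the coefficients used to generate `y` up to floating-point error.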

