Statistical Consulting in Washington, Philadelphia, Pittsburgh, Baltimore

Statistical & Financial Consulting by Stanford PhD

CLASSIFICATION

In a classification setting we need to solve the following problem. We observe N objects. For each object we know the values of the variables X₁, ..., X_p and its class membership, where the classes are labeled 1, 2, ..., K. We need to develop a statistical method that identifies the class membership of a new object for which only the values of X₁, ..., X_p are known.
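To make the setup concrete, here is a minimal sketch of one of the simplest methods, k-nearest neighbor, written in plain Python. The toy data (N = 6 objects, p = 2 variables, K = 2 classes), the function name `knn_classify` and the choice of Euclidean distance are illustrative assumptions, not part of the problem statement.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, new_x, k=3):
    """Predict the class of new_x by majority vote among its k nearest training objects."""
    # Euclidean distance from the new object to every training object (an assumed metric)
    distances = [(math.dist(x, new_x), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    # Count the class labels of the k closest training objects
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training sample: values of (X1, X2) and the known class of each object
train_X = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.0), (4.0, 4.2), (4.5, 3.9), (3.8, 4.4)]
train_y = [1, 1, 1, 2, 2, 2]

# A new object for which only X1 and X2 are known
print(knn_classify(train_X, train_y, (1.1, 1.0), k=3))
```

The new object sits among the class 1 points, so the vote of its three nearest neighbors assigns it to class 1.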

There are many approaches to classification. The approaches themselves can be grouped according to the following criteria.

1] Parametric (logistic regression, linear discriminant analysis, naive Bayes) versus nonparametric (k-nearest neighbor, classification trees / CART, boosted classification trees / MART, random forests, support vector machines, neural networks, genetic algorithms).

2] Simple (logistic regression, linear discriminant analysis, naive Bayes, k-nearest neighbor) versus complex (classification trees / CART, boosted classification trees / MART, random forests, support vector machines, neural networks, genetic algorithms).

3] Robust (logistic regression, k-nearest neighbor, classification trees / CART, boosted classification trees / MART, random forests) versus non-robust but statistically efficient (linear discriminant analysis, naive Bayes, support vector machines, neural networks, genetic algorithms).
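The parametric versus nonparametric distinction in criterion 1 can be illustrated with a small sketch of Gaussian naive Bayes. Once fitted, the model is summarized entirely by a fixed set of parameters (one prior, p means and p variances per class), and the training data can be discarded. A nonparametric method such as k-nearest neighbor, by contrast, has to keep the whole training sample. The function names, the Gaussian assumption for each variable and the toy data below are illustrative assumptions.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior, per-variable means and per-variable variances of each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    params = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Maximum-likelihood variances, floored to avoid division by zero
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(zip(*rows), means)]
        params[label] = {"prior": n / len(X), "means": means, "variances": variances}
    return params

def predict_gaussian_nb(params, x):
    """Return the class with the largest (unnormalized) log posterior probability."""
    def log_posterior(p):
        score = math.log(p["prior"])
        for v, m, s2 in zip(x, p["means"], p["variances"]):
            # Log density of a normal distribution with mean m and variance s2
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return score
    return max(params, key=lambda label: log_posterior(params[label]))

# Hypothetical training sample with p = 2 variables and K = 2 classes
X = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.0), (4.0, 4.2), (4.5, 3.9), (3.8, 4.4)]
y = [1, 1, 1, 2, 2, 2]

model = fit_gaussian_nb(X, y)   # everything the classifier needs from now on
print(predict_gaussian_nb(model, (1.1, 1.0)))
```

Here `model` holds only ten numbers (two priors, four means and four variances), regardless of how large N is.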

All major approaches are defined within the frequentist framework (no prior distribution on the parameters) but also allow for Bayesian modifications (a prior distribution is placed on the parameters). Simple approaches like logistic regression, linear discriminant analysis or k-nearest neighbor should not be looked down upon. Because they have few degrees of freedom, they are less prone to overfitting than the complex approaches (for k-nearest neighbor this holds as long as k is not too small), and they serve as decent competition to advanced techniques in many settings.
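As one illustration of how a frequentist method acquires a Bayesian modification, consider logistic regression with two classes coded as y_i ∈ {0, 1}. The notation below (β₀, β, τ²) and the choice of an independent zero-mean Gaussian prior are assumptions made for this sketch. Other priors are equally possible, and a fully Bayesian analysis would work with the whole posterior distribution rather than its mode.

```latex
% Class probability for object i with predictor vector x_i
p_i(\beta_0, \beta) = \frac{1}{1 + \exp\{-(\beta_0 + \beta^{\top} x_i)\}}

% Frequentist fit: maximize the log-likelihood
\ell(\beta_0, \beta) = \sum_{i=1}^{N} \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr],
\qquad
(\hat\beta_0, \hat\beta) = \arg\max_{\beta_0, \beta} \; \ell(\beta_0, \beta)

% Bayesian modification: prior \beta_j \sim N(0, \tau^2) independently, j = 1, \dots, p;
% the posterior mode maximizes the log-likelihood plus the log prior
(\hat\beta_0, \hat\beta)_{\mathrm{MAP}} = \arg\max_{\beta_0, \beta}
\Bigl\{ \ell(\beta_0, \beta) - \frac{1}{2\tau^2} \sum_{j=1}^{p} \beta_j^2 \Bigr\}
```

Under this particular prior the posterior mode coincides with ridge-penalized logistic regression. A small τ² shrinks the coefficients strongly toward zero, and τ² → ∞ recovers the frequentist estimate.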
