Statistical & Financial Consulting by Stanford PhD
Home Page
BAYESIAN INFORMATION CRITERION

Schwarz's Bayesian Information Criterion (BIC) is a model selection tool. If a model is estimated on a particular data set (training set), BIC score gives an estimate of the model performance on a new, fresh data set (testing set). BIC is given by the formula: 

                                BIC = -2 * loglikelihood + d * log(N),

where N is the sample size of the training set and d is the total number of parameters. The lower BIC score signals a better model.

To use BIC for model selection, we simply chose the model giving smallest BIC over the whole set of candidates. BIC attempts to mitigate the risk of over-fitting by introducing the penalty term d * log(N), which grows with the number of parameters. This allows to filter out unnecessarily complicated models, which have too many parameters to be estimated accurately on a given data set of size N. BIC has preference for simpler models compared to Akaike Information Criterion (AIC).


BAYESIAN INFORMATION CRITERION REFERENCES

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6(2), pp. 461-464.

Bishop, C. (1995). Neural Networks for Pattern Recognition. Clarendon Press, Oxford.

Cover, T. & Thomas, J. (1991). Elements of Information Theory. Wiley, New York.

Breiman, L., Friedman, J., Olshen, R. & Stone, C. (1984). Classication and Regression Trees. Wadsworth.

Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press.


BACK TO THE
STATISTICAL ANALYSES DIRECTORY


IMPORTANT LINKS ON THIS SITE