Statistical & Financial Consulting by Stanford PhD

Home Page

Random Forest is a machine learning method for regression and classification. It combines many decision trees to produce an overall verdict. As such, random forest qualifies as an *ensemble method*. In the regression setting regression trees are used, while in the classification setting classification trees are used.

Random forest utilizes the idea of bagging. It says: create many bootstrap samples of the data, grow a tree on each of them and average the results. There is one nuance: executing bagging in a straightforward manner, with no tweaks, may lead to high correlation among the trees. For example, imagine a large data set where classification is to be performed. The data set contains one strong binary predictor, one medium-effect binary predictor and one weak binary predictor. Any two bootstrap samples may easily overlap by more than 70% of the data and, in that case, they are likely to generate exactly the same tree. There will be no difference whatsoever since we always classify each terminal node to the most prevalent class, and the law of large numbers will lead to the same value in each tree. The example is idealized, of course. In practice we would deal with many more variables.

Averaging highly correlated trees may result in high variance of the average and, therefore, high overall predictive error. To mitigate the issue, random forests go one step beyond bagging. They *de-correlate* the trees by considering only a subset of the variables at each split of a node. Typically, the number of candidate predictors
*m*
is much smaller than the total number of variables
*J*. It can be viewed as a parameter to be fine-tuned via cross-validation or other procedures. The starting values recommended by several references are
*m = Int(J / 3)*
for regression and
*m = Int(J ^{ 0.5})*
for classification. As Hastie et. al. (2008) point out, the suggested starting values are not always the optimum since the optimal

Let

1. For i = 1, ...,

1.1 Draw a sample with replacement of size

1.2 Using the generated sample, grow a tree by repeating the following steps until

the minimum node size has been reached.

1.2.1 Randomly select

1.2.2

decrease in the root mean square error, mean absolute error or a similar metric.

decrease in the misclassification rate, Gini index or a similar metric..

1.2.3 Split the node into two daughter nodes.

2.

A) the vote based on the average class probabilities implied by the grown trees, or

B) the

of the grown trees) .

Note that random forests allow for deep trees if the minimum node size is set small. This is in contrast with stand-alone decision trees, which are first grown to the fullest and then pruned to a smaller version according to a certain regularization criterion.

Random forest allows for 1) asymptotically unbiased estimation of predictive error, 2) testing potential explanatory power of a particular variable, 3) comparing explanatory power of different variables. First, we take each data point and predict the dependent variable using the random forest based on bootstrap samples not containing the given data point. We record the so-called

Second, we assess the importance of each variable present in the estimated random forest. For each tree containing the given variable, we collect the corresponding out-of-bag observations (observations not included in the bootstrap sample used to grow the tree). We randomly permute the variable on the out-of-bag observations and calculate the resulting increase in the predictive error of the tree in question. The predictive error is aggregated over all the trees containing the variable. This is the out-of-bag estimate of the variable importance. We can now test whether it is bigger than 0 via bootstrap. We can also compare the estimated importance of two different variables. We can test if the two variables differ in importance via bootstrap.

Alongside gradient boosting, random forests have three important properties.

- They draw their robustness from the robustness of the underlying decision trees. If a certain predictor has nothing to do with the dependent variable it is unlikely to be utilized substantially in the estimated model.
- The granular nature of random forests allows them to fit highly non-linear and discontinuous surfaces.
- Random forests require relatively little fine-tuning. There are few specifications to vary and even those are optimized in a straightforward manner.

Breiman, L. (2001). Random Forests. Machine Learning, 45 (1): 5–32.

Breiman, L., & Ghahramani, Z. (2004). Consistency for a Simple Model of Random Forests. Department of Statistics, University of California Berkeley. Technical Report.

Hastie, T., Tibshirani, R., & Friedman, J. H. (2008). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.

Efron, B., & Hastie, T. (2017). Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. Cambridge University Press.

Zhu, R., Zeng, D., & Kosorok, M. R. (2015). Reinforcement Learning Trees. Journal of the American Statistical Association, 110 (512): pp. 1770–1784.

Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63: 3–42.

Burkov, A. (2019). The Hundred-Page Machine Learning Book. Amazon Australia Services, Inc.

- Adele Cutler's Random Forests Page
- Random Forests Tutorial (focus on implementation in R)
- Data Mining Resources, Dept. of Computer Science, Purdue University Data Mining Resources

- Australian Institute for Machine Learning, University of Adelaide

- Detailed description of the services offered in the areas of statistical and financial consulting: home page, types of service, experience, case studies, payment options and statistics tutoring
- Directory of financial topics