How Does a Random Forest Work?

Thiruthuvaraj Rajasekhar
3 min read · Mar 21, 2021


Random Forests: the name can be split into two parts, Random and Forests. A forest, in common terms, is a collection of trees. In a similar way, a Random Forest is a collection of decision tree classifiers, each built on a training set chosen at random. Each tree votes on an instance, and the class receiving the most votes is chosen as the label for that instance.

Before we go into the details of how a forest is created and the various ways of selecting random features and samples, let us look at some concepts needed to judge the goodness of a Random Forest.

  1. Margin: the margin measures the extent to which the average number of votes for the right class of an example (X, y) exceeds the average number of votes for any other class (formalized just after this list).
  2. Strength and Correlation: a Random Forest should have trees that have very little dependence on one another and high individual prediction strength.
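
These definitions follow Breiman's formulation of Random Forests, so the margin and strength can be written as below; this is the standard formulation rather than something spelled out in the original text.

```latex
% Margin of the ensemble of classifiers h_1, ..., h_K on an example (X, Y):
% the average vote for the true class minus the largest average vote for any other class.
\mathrm{mg}(X, Y) \;=\; \operatorname{av}_k \mathbf{1}\{h_k(X) = Y\}
                  \;-\; \max_{j \neq Y} \operatorname{av}_k \mathbf{1}\{h_k(X) = j\}

% Strength of the forest: the expected margin over the distribution of (X, Y).
s \;=\; \mathbb{E}_{X,Y}\big[\mathrm{mg}(X, Y)\big]
```

A larger margin means more confidence in the classification; in Breiman's analysis, the generalization error bound improves as the strength increases and as the correlation between trees decreases.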

Why are random features used?

  1. It is relatively robust to outliers and noise.
  2. It is useful in estimating errors, strength and correlation.
  3. It is faster to process.

How is the strength of a Random Forest determined?

Out-of-Bag (OOB) estimates are used to determine the strength of the classifier and to monitor its error.

The process of OOB estimation is as follows (a code sketch follows the list):

  1. Each new training set is drawn with replacement from the original training set. Given a training set T, construct a bootstrap training set Tk and a classifier h(X, Tk). These classifiers vote to form the bagged predictor.
  2. A tree is grown on the new dataset using random features.
  3. For each (y, X) in the original training set, aggregate the votes only over those classifiers whose Tk does not contain (y, X).
  4. The result is called the OOB classifier, and the OOB estimate of the generalization error is the error rate of the OOB classifier on the training set.
  5. In a bootstrap training set, about one third of the instances are left out, so the OOB estimate for each instance is based on combining only about one third as many classifiers as the bagged predictor.
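
The following is a minimal sketch of this procedure, assuming scikit-learn's DecisionTreeClassifier as the base learner and the Iris dataset purely for illustration; the variable names and parameters are my own assumptions, not code from the original article.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
n_samples = len(X)
n_classes = len(np.unique(y))
n_trees = 100
rng = np.random.default_rng(42)

# vote_counts[i, c] = number of OOB votes for class c on training example i
# (Iris labels are 0, 1, 2, so they index the vote columns directly).
vote_counts = np.zeros((n_samples, n_classes), dtype=int)

for _ in range(n_trees):
    # 1. Draw a bootstrap training set Tk with replacement from the original set.
    idx = rng.integers(0, n_samples, n_samples)
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[idx] = False  # examples not in Tk (roughly one third of the data)

    # 2. Grow a tree on Tk, using a random subset of features at each split.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])

    # 3. Aggregate votes only over trees whose Tk does not contain the example.
    oob_idx = np.flatnonzero(oob_mask)
    if oob_idx.size:
        preds = tree.predict(X[oob_idx])
        vote_counts[oob_idx, preds] += 1

# 4. The OOB estimate of the generalization error is the error rate of the
#    majority-vote OOB classifier on the training set.
voted = vote_counts.sum(axis=1) > 0
oob_pred = vote_counts[voted].argmax(axis=1)
oob_error = np.mean(oob_pred != y[voted])
print(f"OOB error estimate: {oob_error:.3f}")
```

In practice, scikit-learn's RandomForestClassifier computes the same estimate automatically when oob_score=True is passed (shown further below).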

How are random features chosen?

  1. If there are only a few input features, M, choosing F as a fraction of M might lead to an increase in strength but also an increase in correlation. So one more approach is to take linear combinations of L randomly chosen features.
  2. L features are chosen and added together with coefficients that are uniform random numbers in [-1, 1].
  3. F such linear combinations are generated, then a search is made over them for the best split.

This method is called Forest-RC; a small sketch of the construction follows below.
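
Here is a minimal sketch of how such random linear combinations could be built as candidate split features; the function name, parameters, and data are illustrative assumptions, not code from Breiman's Forest-RC or the original article.

```python
import numpy as np

def random_linear_combinations(X, L=3, F=5, seed=None):
    """Build F candidate features, each a linear combination of L randomly
    chosen input features with coefficients drawn uniformly from [-1, 1]."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    candidates = np.empty((n_samples, F))
    for f in range(F):
        cols = rng.choice(n_features, size=L, replace=False)   # pick L input features
        coefs = rng.uniform(-1.0, 1.0, size=L)                  # coefficients in [-1, 1]
        candidates[:, f] = X[:, cols] @ coefs
    return candidates

# At a node, the best split would then be searched over these F candidate features.
X = np.random.rand(10, 6)
print(random_linear_combinations(X, L=3, F=5, seed=0).shape)   # (10, 5)
```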

If categorical variables are present, the variables are dummy-coded before forming the F linear combinations of features.

Using the above procedures, a bagging estimator is trained on the bootstrap samples Tk with randomly chosen features; aggregating many such individually high-variance trees produces an ensemble with low variance at little cost in bias.
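
Putting the pieces together, here is a minimal sketch using scikit-learn's RandomForestClassifier; the library choice, dataset, and parameter values are my own assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# bootstrap=True draws each tree's Tk with replacement, max_features controls the
# size of the random feature subset at each split, and oob_score=True reports the
# out-of-bag estimate discussed above.
forest = RandomForestClassifier(
    n_estimators=200,
    bootstrap=True,
    max_features="sqrt",
    oob_score=True,
    random_state=0,
).fit(X, y)
print(f"OOB score: {forest.oob_score_:.3f}")
```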

As Random Forests are built on subsets of features and subsets of the training data, it is critical to understand which variables, and which interactions between them, provide the predictive accuracy; in short, Feature Importance.

How is Feature Importance calculated in Random Forests?

  1. Suppose there are M variables. After each tree is created, the values of the mth variable in the OOB examples are randomly permuted, and the permuted OOB data is run down the corresponding tree.
  2. The classification given for each xn is saved. This is repeated for m = 1, 2, …, M. The prediction with the mth variable permuted is compared with the true class of xn to get the misclassification rate.
  3. These estimates are obtained using a single run of the forest with 1000 trees and no separate test set. A sketch of this permutation procedure follows below.
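
Below is a minimal sketch of permutation importance, evaluated on a held-out validation split rather than strictly on OOB examples, which is a simplification of the per-tree procedure above; the dataset, split, and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=1000, random_state=0)
forest.fit(X_train, y_train)

rng = np.random.default_rng(0)
baseline = forest.score(X_val, y_val)             # accuracy before any permutation
importances = []
for m in range(X_val.shape[1]):
    X_perm = X_val.copy()
    X_perm[:, m] = rng.permutation(X_perm[:, m])  # permute the m-th variable only
    importances.append(baseline - forest.score(X_perm, y_val))  # drop in accuracy

print(dict(zip(load_iris().feature_names, np.round(importances, 3))))
```

scikit-learn also exposes this idea directly through sklearn.inspection.permutation_importance.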

I tried to explain the inner workings of Random Forests in a concise way. Please encourage me if the article seems useful! 😊

