Random forests have surged in popularity in recent years. They are nonlinear, tree-based models that often produce accurate results. However, a random forest operates largely as a black box, and its workings can be difficult to interpret and fully understand. A recent article from the Pivotal Engineering Journal offers an in-depth look at the basics of random forests. Starting from the decision tree, the building block of a random forest, it walks through how random forests work with the help of illustrative charts, helping readers gain a more thorough understanding of the model. The article is based on a GitHub project by Ando Saabas, and the code used to create the charts in this article is also available on GitHub.
- Ando Saabas’ project: github.com/andosa/tree…
- The code to create the chart: github.com/gregtam/int…
How decision trees work
A decision tree can be seen as a set of if-then rules: each path from the root node to a leaf node constructs one rule. The features tested at the internal nodes along the path form the conditions of the rule, and the value at the leaf node is its conclusion. A decision tree can therefore be regarded as a collection of conditions (internal nodes) and the conclusions (leaves) reached when those conditions are satisfied.
A decision tree works by greedily and iteratively splitting the data into subsets. A regression tree chooses each split to minimize the MSE (mean squared error) or MAE (mean absolute error) within the resulting subsets, while a classification tree splits the data to minimize the entropy or Gini impurity of the resulting subsets.
The resulting classifier partitions the feature space into distinct subsets, and the prediction for an observation depends on the subset that observation belongs to.
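As a minimal illustration of these if-then rules, the sketch below fits a shallow regression tree to synthetic data and prints the rules it learns. The dataset and feature names here are placeholders, not the abalone data used later in the article.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic regression data; stands in for any numeric dataset.
X, y = make_regression(n_samples=200, n_features=3, random_state=0)

# Each greedy split minimizes the squared error (MSE) of the resulting
# subsets; criterion="absolute_error" would minimize MAE instead.
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, y)

# Every root-to-leaf path prints as one if-then rule.
print(export_text(tree, feature_names=["f0", "f1", "f2"]))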
Figure 1: Iterative process of a decision tree
Decision tree contributions
As an example, we use the abalone dataset (archive.ics.uci.edu/ml/datasets…). We will predict the number of rings in an abalone's shell from variables such as its shell weight, length, and diameter. For the demonstration, we build a very shallow decision tree by limiting the maximum depth of the tree to 3.
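A sketch of how this shallow tree could be built. The UCI data-file URL and column names below are assumptions based on the standard abalone data file, not details taken from the original article.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumed location and column order of the UCI abalone data file.
cols = ["sex", "length", "diameter", "height", "whole_weight",
        "shucked_weight", "viscera_weight", "shell_weight", "rings"]
url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "abalone/abalone.data")
abalone = pd.read_csv(url, names=cols)

X = abalone.drop(columns=["sex", "rings"])  # numeric predictors only
y = abalone["rings"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limiting the maximum depth to 3 gives the very shallow tree of Figure 2.
dt_reg = DecisionTreeRegressor(max_depth=3, random_state=0)
dt_reg.fit(X_train, y_train)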
Figure 2: Decision tree paths for predicting different ring counts
To predict the number of rings for an abalone, the decision tree descends until it reaches a leaf node. Each step splits the current subset into two, and for a given split, the contribution of the variable that determined the split is defined as the change in the mean number of rings.
For example, take an abalone with a shell weight of 0.02 and a length of 0.220. It lands in the leftmost leaf node, with a predicted ring count of 4.4731. The contribution of shell weight to the predicted ring count is:
(7.587-9.958) + (5.701-7.587) = -4.257
The contribution of length is:
(4.473-5.701) = -1.228
Both contributions are negative, suggesting that for this particular abalone, the shell weight and length values decrease the predicted number of rings.
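The same arithmetic, spelled out in code (the node means are read off Figure 2):
# Mean ring counts along the leftmost path of Figure 2.
root_mean = 9.958       # mean over all training abalones (the bias)
after_split_1 = 7.587   # after the first shell-weight split
after_split_2 = 5.701   # after the second shell-weight split
leaf_mean = 4.473       # after the length split (the prediction)

# Each feature's contribution is the total change in the mean caused
# by the splits on that feature.
shell_weight_contrib = (after_split_1 - root_mean) + (after_split_2 - after_split_1)
length_contrib = leaf_mean - after_split_2

print(round(shell_weight_contrib, 3))  # -4.257
print(round(length_contrib, 3))        # -1.228
# Check: bias + contributions = prediction, 9.958 - 4.257 - 1.228 = 4.473.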
We can get these contributions by running the following code.
from treeinterpreter import treeinterpreter as ti

dt_reg_pred, dt_reg_bias, dt_reg_contrib = ti.predict(dt_reg, X_test)
The variable dt_reg is the fitted scikit-learn regressor, and X_test is a Pandas DataFrame or NumPy array containing the features for which we want predictions and contributions. The contributions variable dt_reg_contrib is a two-dimensional NumPy array of shape (n_obs, n_features), where n_obs is the number of observations and n_features is the number of features.
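A quick sanity check of this decomposition: treeinterpreter guarantees that each prediction equals the bias (the training-set mean) plus the sum of that observation's feature contributions.
import numpy as np

# prediction = bias + sum of per-feature contributions, row by row.
assert np.allclose(
    np.ravel(dt_reg_pred),
    dt_reg_bias + dt_reg_contrib.sum(axis=1),
)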
We can chart these contributions for a given abalone to see which features have the greatest impact on its predicted value. In the chart below, we can see the negative impact that this particular abalone's shell weight and length values have on its predicted number of rings.
Figure 3: An example contribution chart (decision tree)
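A sketch of how a chart like Figure 3 could be drawn from the treeinterpreter output; obs_idx is an arbitrary row of X_test chosen for illustration.
import matplotlib.pyplot as plt
import pandas as pd

obs_idx = 0  # any single observation
contribs = pd.Series(dt_reg_contrib[obs_idx], index=X_test.columns)

# Horizontal bars, one per feature, sorted by contribution size.
contribs.sort_values().plot.barh()
plt.xlabel("Contribution to predicted ring count")
plt.tight_layout()
plt.show()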
We can use violin plots to compare this particular abalone's contributions with the distribution of contributions across all abalones; this superimposes a kernel density estimate on the chart. In the image below, we can see that this particular abalone's shell weight contribution is unusually low compared with other abalones; in fact, many abalones have a positive shell weight contribution.
Figure 4: Violin plot of contributions for one observation (decision tree). The basic concepts and usage of violin plots are covered in the appendix at the end of this article.
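A sketch of a Figure 4-style chart: one violin per feature over all test observations, with this abalone's contributions overlaid as points. As above, obs_idx is an illustrative choice.
import matplotlib.pyplot as plt
import numpy as np

obs_idx = 0
n_features = dt_reg_contrib.shape[1]
positions = np.arange(1, n_features + 1)

fig, ax = plt.subplots()
# One violin (kernel density estimate) per feature's contributions.
ax.violinplot([dt_reg_contrib[:, j] for j in range(n_features)])
# Overlay the single observation being explained.
ax.scatter(positions, dt_reg_contrib[obs_idx], color="red", zorder=3)
ax.set_xticks(positions)
ax.set_xticklabels(X_test.columns, rotation=45, ha="right")
ax.set_ylabel("Contribution to predicted ring count")
plt.tight_layout()
plt.show()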
The graph above, while informative, still does not give us a complete picture of how a particular variable affects an abalone's number of rings. We can therefore plot the contribution of a given feature against its value. If we plot shell weight values against their contributions, we can see that increasing shell weight corresponds to increasing contribution.
Figure 5: Contribution and shell weight (decision tree)
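A sketch of a Figure 5-style scatter plot, assuming the shell_weight column name from the loading sketch above.
import matplotlib.pyplot as plt

j = list(X_test.columns).index("shell_weight")
plt.scatter(X_test["shell_weight"], dt_reg_contrib[:, j], alpha=0.3)
plt.xlabel("Shell weight")
plt.ylabel("Contribution to predicted ring count")
plt.show()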
On the other hand, the relationship between shucked weight (the weight of the abalone's meat) and its contribution is nonlinear and non-monotonic: low shucked weights contribute nothing, high shucked weights contribute negatively, and in between, the contribution is positive.
Figure 6: Contribution and shucked weight (decision tree)
Extending to random forests
This process of determining feature contributions extends naturally to random forests: we form a forest of many decision trees and take, for each variable, the average of its contributions across all the trees.
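A sketch of the random forest version; treeinterpreter accepts a fitted RandomForestRegressor through the same predict call and handles the averaging over trees itself.
from sklearn.ensemble import RandomForestRegressor
from treeinterpreter import treeinterpreter as ti

rf_reg = RandomForestRegressor(n_estimators=100, random_state=0)
rf_reg.fit(X_train, y_train)

# Contributions are averaged over all trees in the forest; the array
# again has shape (n_obs, n_features).
rf_reg_pred, rf_reg_bias, rf_reg_contrib = ti.predict(rf_reg, X_test)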
Figure 7: Violin plot of contributions for one observation (random forest)
Because of the randomness inherent in a random forest, the contributions for a given shell weight value now vary from tree to tree. But as the smoothed black trend line in the chart below shows, the increasing trend remains: just as in the decision tree, the contribution grows as shell weight increases.
Figure 8: Contribution and shell weight (random forest)
Similarly, we may see complex trends that are not monotonic. The contribution of diameter appears to dip at about 0.45 and to peak at about 0.3 and 0.6. Otherwise, the relationship between diameter and the number of rings is generally increasing.
Figure 9: Contribution and diameter (random forest)
Classification
We have seen how the feature contributions of a regression tree derive from the mean ring count and how that mean changes across successive splits. We can extend this to binomial or multinomial classification by examining, in each subset, the proportion of observations belonging to a particular class. A feature's contribution is then the total change in that proportion caused by splits on that feature.
This is easier to understand through an example. Suppose our goal is now to predict sex: whether an abalone is female, male, or juvenile.
Figure 10: Decision tree path for multiple categories
Each node holds three values: the proportions of females, males, and juveniles in that subset. An abalone with a viscera weight of 0.1 and a shell weight of 0.1 lands in the leftmost leaf node (with class proportions 0.082, 0.171, and 0.747). The same contribution logic that applies to regression trees applies here as well.
If this particular abalone is juvenile, then the contribution of viscera weight is:
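A sketch of the multi-class setup, reusing the abalone DataFrame loaded in the earlier sketch (the choice of predictors is an assumption). With a classifier, treeinterpreter returns one contribution per feature per class.
from sklearn.tree import DecisionTreeClassifier
from treeinterpreter import treeinterpreter as ti

X_cls = abalone.drop(columns=["sex", "rings"])
y_cls = abalone["sex"]  # F (female), M (male), or I (juvenile)

dt_clf = DecisionTreeClassifier(max_depth=3, random_state=0)
dt_clf.fit(X_cls, y_cls)

# Contributions now have shape (n_obs, n_features, n_classes): one
# decomposition of the predicted probability for every class.
clf_pred, clf_bias, clf_contrib = ti.predict(dt_clf, X_cls)
print(clf_contrib.shape)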
(0.59-0.315) = 0.275
The contribution of shell weight is:
(0.747-0.59) = 0.157
We can draw a contribution chart for each class. Below we show the chart for the juvenile class.
Figure 11: Violin plot of contributions for one observation, juvenile class (multi-class decision tree)
As before, we can also chart contributions against feature values for each class. The contribution of shell weight toward an abalone being female increases as shell weight increases, while its contribution toward an abalone being juvenile decreases. For males, the contribution of shell weight first increases and then decreases once shell weight exceeds 0.5.
Figure 12: Contribution and shell weight for each category (random forest)
Conclusion
In this article, we showed that you can gain a deeper understanding of decision trees and random forests by examining their prediction paths. This is especially useful for random forests, which are highly parallelizable and often high-performing machine learning models. To meet the business needs of Pivotal's customers, we need to provide models that are not only highly predictive but also interpretable; that is, we do not want to hand over a black box, no matter how effective it is. This is an important requirement when working with government and financial clients, as our models need to pass compliance checks.
Appendix: Violin plot basics
A violin plot is a way of visualizing numeric data. It is very similar to a boxplot, but it also shows the probability density of the distribution. Let's start with the boxplot:
The boxplot above shows the following for this sample:
- The minimum is 5
- The maximum is 10
- The mean is 8
- The lower quartile (Q1) is 7: the 25th percentile of the sample's values in ascending order
- The median (Q2) is 8.5: the 50th percentile of the sample's values in ascending order
- The upper quartile (Q3) is 9: the 75th percentile of the sample's values in ascending order
- The interquartile range is 2 (IQR = Q3 - Q1)
These are the basic elements of a boxplot. A boxplot shows only summary statistics such as the mean or median and the interquartile range; a violin plot shows the complete distribution of the data.
A violin plot also summarizes the statistics shown in a boxplot:
- The white dot represents the median
- The gray rectangle represents the interquartile range between Q1 and Q3
- The gray line represents a 95% confidence interval
The gray curves on either side are kernel density estimates and show the shape of the data distribution: where the curve is wide, the sample takes values in that range with high probability; where it is narrow, values in that range are unlikely.
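A minimal sketch contrasting the two plot types on a synthetic sample:
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=8, scale=1, size=500)

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.boxplot(sample)                       # summary statistics only
ax1.set_title("Boxplot")
ax2.violinplot(sample, showmedians=True)  # adds the density estimate
ax2.set_title("Violin plot")
plt.show()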
Original article: engineering.pivotal.io/post/interp…
Source: Pivotal
Compiled by: Heart of the Machine
Contributor: Panda