Hi! Earlier I introduced a method for visualizing decision trees that I already thought was amazing. Recently I found one that is even more amazing and more realistic.
Draw a whole random forest directly? No problem.
Here is a minimalist primer on the use of pybaobabdt.
Install GraphViz
1. pybaobabdt relies on Graphviz, so first download the installer from:
www.graphviz.org/download/
2. Double-click the MSI file and keep clicking Next (the default installation path is C:\Program Files (x86)\Graphviz2.38\). After installation completes, a shortcut is created in the Windows Start menu.
3. Configure the environment variable: Computer → Properties → Advanced system settings → Advanced → Environment Variables → System variables → Path, then append the Graphviz bin directory (e.g. C:\Program Files (x86)\Graphviz2.38\bin).
4. Verify: open the Windows command line, type dot -version and press Enter. If Graphviz version information is displayed, the installation and configuration succeeded.
Install PyGraphviz and PyBaobabdt
Installing PyGraphviz directly with pip is very likely to fail; it is recommended to download the matching .whl file and install it locally.
Then install pybaobabdt itself:
pip install pybaobabdt
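As a quick sanity check (a minimal sketch, assuming both installs succeeded), the imports below should run without errors:

import pygraphviz
import pybaobabdt

print(pygraphviz.__version__)  # prints the installed PyGraphviz version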
Pybaobabdt usage
pybaobabdt is also ridiculously simple to use. The core command is a single pybaobabdt.drawTree.
import pybaobabdt
import pandas as pd
from scipy.io import arff
from sklearn.tree import DecisionTreeClassifier

# Load the ARFF dataset and split it into features and target
data = arff.loadarff('vehicle.arff')
df = pd.DataFrame(data[0])
y = list(df['class'])
features = list(df.columns)
features.remove('class')
X = df.loc[:, features]

# Train a single decision tree and draw it
clf = DecisionTreeClassifier().fit(X, y)
ax = pybaobabdt.drawTree(clf, size=10, dpi=72, features=features, colormap='Spectral')
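Since drawTree returns a Matplotlib axis, you can persist the rendering with ordinary Matplotlib calls. A minimal sketch (the file name and savefig options are illustrative choices, not part of pybaobabdt's API):

# Grab the parent figure of the returned axis and save it to disk
ax.get_figure().savefig('tree.png', format='png', dpi=300, transparent=True)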
How should you read this picture?
Different colors correspond to different classes (targets), and each fork is labeled with its splitting condition, so the partitioning logic is clear at a glance. The depth of the tree is also plainly visible.
The width of a branch is not decoration: it represents the number (proportion) of samples, so the more samples fall under that split condition, the thicker the trunk.
If you find the lowest branches too thin and fragile, consider the risk of overfitting, for example by adjusting the minimum sample size per leaf.
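A minimal sketch of that idea (the min_samples_leaf and max_depth values are illustrative assumptions, not tuned recommendations): retrain with constraints and redraw to compare.

# Keep leaves from becoming arbitrarily small, then redraw the tree.
# min_samples_leaf=10 and max_depth=5 are illustrative values only.
clf_pruned = DecisionTreeClassifier(min_samples_leaf=10, max_depth=5).fit(X, y)
pybaobabdt.drawTree(clf_pruned, size=10, dpi=72, features=features, colormap='Spectral')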
Drawing a random forest
import pybaobabdt
import pandas as pd
from scipy.io import arff
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

# Load the ARFF dataset and split it into features and target
data = arff.loadarff('vehicle.arff')
df = pd.DataFrame(data[0])
y = list(df['class'])
features = list(df.columns)
features.remove('class')
X = df.loc[:, features]

# Train a random forest of 20 trees
clf = RandomForestClassifier(n_estimators=20, n_jobs=-1, random_state=0)
clf.fit(X, y)

# Draw every tree of the forest into a 5 x 4 grid of subplots
size = (15, 15)
plt.rcParams['figure.figsize'] = size
fig = plt.figure(figsize=size, dpi=300)
for idx, tree in enumerate(clf.estimators_):
    ax1 = fig.add_subplot(5, 4, idx + 1)
    pybaobabdt.drawTree(tree, model=clf, size=15, dpi=300, features=features, ax=ax1)
fig.savefig('random-forest.png', format='png', dpi=300, transparent=True)
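One design note: the 5 × 4 subplot grid is hard-coded to match n_estimators=20. A hedged variant of the loop (continuing from the block above) computes the grid from the forest size, so it works for any number of trees:

import math

cols = 4  # number of columns is an arbitrary layout choice
rows = math.ceil(len(clf.estimators_) / cols)  # enough rows to fit every tree
for idx, tree in enumerate(clf.estimators_):
    ax1 = fig.add_subplot(rows, cols, idx + 1)
    pybaobabdt.drawTree(tree, model=clf, size=15, dpi=300, features=features, ax=ax1)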
Got the hang of it? Isn't it cool? Go and try it! If you found this helpful, a like, save, or share would be much appreciated. Thank you~
Mp.weixin.qq.com/s/uIazCL9Sj…