This question can be analyzed along several dimensions. First, though, a clarification: a decision tree is just a decision tree; the performance gains of random forests and XGBoost come mainly from ensemble learning. So let's broaden the question and compare three families:
A single decision tree, such as the widely used C4.5
Ensemble tree learning algorithms built on decision tree base models, such as random forests, gradient boosting, and XGBoost
Neural networks, including networks of various depths and structures
In my opinion, single decision trees are now of limited use and have largely been replaced by ensemble tree models. Ensemble tree models and neural networks, however, serve different application scenarios, and neither replaces the other. Some general guidelines, offered for reference only:
If absolute interpretability is not required, avoid single decision trees and use an ensemble tree model
Among ensemble tree models, XGBoost is preferred
On small and medium datasets, prefer ensemble tree models; on large datasets, neural networks are recommended
For projects that require model interpretation, prefer tree models
On short-timeline projects, prefer ensemble tree models if data quality is low (many missing values, noise, etc.)
With limited hardware or limited machine learning expertise, prefer tree models
For highly unstructured data, especially speech, images, and language, prefer neural networks (usually given a large amount of data)
To use an imperfect metaphor: the ensemble tree model is like Python and the neural network is like C++. The former is simple, effective, and easy to use; the latter is more expensive and complicated but has great potential on serious projects. C++ can do almost anything if you are willing to learn about data structures, memory allocation, and treacherous pointers. But if all you need is a simple web crawler, Python can do it in ten lines.
Single decision tree vs. ensemble learning
The decision tree was proposed by Morgan and Sonquist in 1963 [5]; it performs classification and regression through a tree-like structure. The decision tree model is generally considered to be:
Easy to use and explain [6]; a single decision tree is easy to visualize and to extract rules from
Capable of automatic feature selection [3], via impurity reduction during node splitting and via pruning
Limited in predictive power, not comparable to strong supervised learning models [6]
Low in stability and high in variance; small data perturbations can easily cause large changes in a decision tree's performance [1, 6]
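The ease-of-use and rule-extraction points above can be sketched in a few lines with scikit-learn. Note that scikit-learn implements CART rather than C4.5, which is close enough to illustrate the idea:

```python
# Sketch: a single decision tree is easy to fit, visualize, and turn into
# if/else rules. scikit-learn implements CART, not C4.5.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()

# max_depth keeps the tree small enough for a human to read
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# export_text prints the learned decision rules directly
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)

# Impurity-based feature selection happens automatically:
# features never used for splitting get zero importance.
print(tree.feature_importances_)
```

The printed rules can be handed to a domain expert as-is, which is exactly the interpretability advantage discussed above.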
The random forest was proposed by Breiman [10]. It uses ensemble learning to reduce the high variance of a single decision tree, thereby improving overall predictive ability. The Gradient Boosting Machine (GBM) [9] and XGBoost [8] were proposed in 2001 and 2014 respectively. Since the two are similar, they are discussed together. Both:
Use serial learning to continuously improve performance and reduce bias, unlike the parallel learning of random forests
Are very fast at prediction and have low storage requirements [3]
Use boosting, a learning method that can be regarded as L1 regularization against overfitting, so the model does not overfit easily [3]; the Python library scikit-learn likewise notes that it is not prone to overfitting [2] and that using more base learners is not a problem
Comparing GBM and XGBoost directly: their classification performance is similar, but XGBoost adds a regularization term that further reduces overfitting; XGBoost is also faster [4] and tends to be more suitable for larger datasets
Across many practical applications and studies, random forests, GBM, and XGBoost are significantly superior to a plain single decision tree, so from the standpoint of predictive performance the single decision tree could be retired.
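A quick sanity check of this claim, assuming scikit-learn's standard implementations and one of its built-in datasets, is to compare a single tree against a bagged and a boosted ensemble by cross-validation:

```python
# Compare a single tree against bagged (random forest) and boosted (GBM)
# ensembles by 5-fold cross-validation on a built-in dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in scores.items():
    print(f"{name}: {s:.3f}")
```

On this dataset both ensembles typically beat the single tree by several points of accuracy, consistent with the claim above (exact numbers vary with the data and seed).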
The biggest moat of the single decision tree is that it can easily be visualized and even have its classification rules extracted, which ensemble learning struggles to match [3, 4]. Since interpretability often matters in industry, decision trees retain a foothold there. The premise, however, is that you must first confirm that the decision tree performs well (for example, by checking cross-validation results) before visualizing it and extracting rules; otherwise you may extract invalid or even wrong rules.
The author of random forests explored this direction, but the result is nowhere near as intuitive as a decision tree. XGBoost supports visualizing its individual trees. At some point, visualizing the more reliable and stable individual trees inside an XGBoost model may eliminate the need for single decision trees altogether.
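A sketch of that idea: pull one base learner out of a boosted ensemble and inspect it like an ordinary decision tree. This uses scikit-learn's GradientBoostingClassifier for self-containedness; `xgboost.plot_tree` offers the equivalent for XGBoost:

```python
# Extract and print a single base tree from a boosted ensemble.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import export_text

data = load_iris()
gbm = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
gbm.fit(data.data, data.target)

# estimators_ is an (n_estimators, n_classes) array of small regression
# trees; each one is readable on its own.
first_tree = gbm.estimators_[0, 0]
rules = export_text(first_tree, feature_names=list(data.feature_names))
print(rules)
```

Each base tree is shallow by design, so any one of them is at least as readable as a pruned standalone decision tree, even though the full ensemble is not.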
Ensemble tree models vs. neural networks
The neural network is by now a very familiar algorithm. Its greatest strength is learning feature representations from complex data, and it is believed to be able to approximate any function (given a suitable number of nodes) [3]. Deep learning, so popular now, is simply the name for neural networks of large depth. Neural networks differ from ensemble tree models in the following respects:
Data volume: neural networks often need a great deal of data, while tree models have a clear advantage on small datasets. How small is too small? It depends partly on the number of features, but in general neural networks do not perform well with only dozens or hundreds of examples.
Feature engineering: neural networks require more rigorous data preparation, while tree models generally do not need the following steps: (i) missing value imputation; (ii) conversion of data types to numeric; (iii) data scaling, i.e. normalizing features of different ranges to [0, 1] or projecting them onto a normal distribution; (iv) extensive parameter adjustment, such as weight initialization and choosing an appropriate learning rate.
Tuning difficulty: ensemble tree models are far easier to tune than neural networks. Most ensemble tree models need only: (i) the number of base learners; (ii) the number of features to consider at each split; (iii) the maximum depth; and so on. Neural network tuning, by contrast, is notoriously painful; the gap with tree models here is very large.
Model interpretation: ensemble tree models generally offer higher interpretability; for example, feature importance is generated automatically. The features of a neural network can be analyzed to some extent, but not intuitively. In earlier years the wrapper method was applied to neural networks, adding or removing one feature at a time to rank features, which is far less convenient and intuitive than the embedded feature selection of ensemble tree models.
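The embedded feature selection mentioned above comes for free after fitting: tree ensembles expose impurity-based importances with no wrapper loop required, sketched here with a random forest:

```python
# Rank features by the importances a tree ensemble computes during fitting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(data.data, data.target)

# Importances are normalized to sum to 1; sort descending and show the top 5
ranked = sorted(zip(rf.feature_importances_, data.feature_names), reverse=True)
for imp, name in ranked[:5]:
    print(f"{name}: {imp:.3f}")
```

A wrapper approach would instead retrain the model once per candidate feature, which is exactly the inconvenience the text describes for neural networks.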
Predictive ability: on medium and large datasets, performance is fairly close, setting aside the difference in tuning difficulty. As data volume grows, the potential of neural networks grows ever larger.
Project timeline: since neural networks demand more time at every stage, the total time required is often much longer than for a tree ensemble, not to mention the need for good hardware support such as GPUs.
Generally speaking, ensemble tree models beat neural networks when data is scarce and features are many. As data volume increases, the performance of the two approaches converges; as it rises further still, the advantages of neural networks gradually emerge. This matches what many respondents have noted: as data volume grows, the demand for model capacity rises and the risk of overfitting falls, so the advantages of neural networks finally come into play while those of ensemble learning diminish.
Conclusion
To sum up, for most projects I suggest using an ensemble tree model, with XGBoost as the first choice.
So for now, the single decision tree has been replaced, while the ensemble decision tree remains very important and has only grown more so. In the short term, there is no prospect of the ensemble tree model being replaced ʕ•ᴥ•ʔ
* Also recommended: an interesting, if somewhat dated, article [7] that compares various algorithms and gives the author's own analysis.
Original: https://weibo.com/ttarticle/p/show?id=2309404213436883035081#_0
Via: Global Artificial Intelligence