• XGBoost Algorithm: Long May She Reign!
  • Originally written by Vishal Morde
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: lsvih
  • Proofread by Ultrasteve

Long live the XGBoost algorithm!

The new queen of machine learning algorithms will take over the world…

(This article was written with Venkat Anurag Setty)

I remember my first job 15 years ago. At the time, I had just finished my graduate coursework and joined a global investment bank as an analyst. On my first day on the job, I kept anxiously reviewing what I had learned and wondering whether I was really qualified for the role. Sensing my anxiety, my boss smiled and said,

“Don’t worry! You just need to understand regression models!”

I thought it over and relaxed: I knew both linear regression and logistic regression well. The boss was right. During my tenure, I specialized in building regression-based statistical models, and I was hardly alone, because back then regression models were the undisputed queen of predictive analytics. Fifteen years later, the era of regression models is over and the old queen has abdicated. The new queen has a snappy name: XGBoost, or Extreme Gradient Boosting.


What is XGBoost?

XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. For prediction problems involving unstructured data (such as images and text), artificial neural networks tend to outperform all other algorithms and frameworks. For small-to-medium structured, tabular data, however, decision-tree-based algorithms are currently considered best in class. The chart below shows the evolution of tree-based algorithms in recent years:

The XGBoost algorithm was developed as a research project at the University of Washington. Tianqi Chen and Carlos Guestrin presented their paper at SIGKDD 2016, and it quickly captured the attention of the machine learning community. Since its introduction, XGBoost has not only been credited with winning numerous Kaggle competitions but has also powered several cutting-edge industry applications. As a result, a strong community of data scientists contributes to XGBoost, which currently has more than 350 contributors and over 3,600 commits on GitHub. The algorithm also stands out in several ways:

  1. Wide range of applications: it can be used to solve regression, classification, ranking, and user-defined prediction problems.
  2. Portability: it runs smoothly on Windows, Linux, and OS X.
  3. Languages: it supports all major programming languages, including C++, Python, R, Java, Scala, and Julia (a minimal Python example follows this list).
  4. Cloud integration: it supports AWS, Azure, and Yarn clusters, and works well with the Flink and Spark ecosystems.
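
As a quick illustration of the Python API, here is a minimal sketch that trains an XGBoost classifier on a toy scikit-learn dataset. It assumes the xgboost and scikit-learn packages are installed, and the parameter values are illustrative defaults rather than tuned settings.

```python
# A minimal sketch of XGBoost's scikit-learn-style Python API.
# Assumes the `xgboost` and `scikit-learn` packages are installed;
# parameter values are illustrative, not tuned.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = XGBClassifier(
    n_estimators=100,   # number of boosted trees
    max_depth=3,        # depth of each tree
    learning_rate=0.1,  # shrinkage applied to each tree's contribution
)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print(f"Test accuracy: {accuracy_score(y_test, preds):.3f}")
```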

How to intuitively understand XGBoost?

Decision trees, in their simplest form, are easy-to-visualize and fairly interpretable algorithms, but building intuition for the next generation of tree-based algorithms can be harder. The following analogy helps in understanding how these algorithms evolved.

Imagine you are an HR employee interviewing several top candidates. Each step in the evolution of tree-based algorithms can be seen as a version of the interview process.

  1. Decision Tree: Every HR interviewer has a set of criteria, such as education level, years of experience, and interview performance. A decision tree is like an HR interviewer who screens candidates based on these criteria.

  2. Bagging: Now imagine that instead of a single interviewer there is an interview panel where each interviewer has a vote. Bagging (bootstrap aggregating) combines all of the interviewers' inputs into a final decision through a democratic voting process.

  3. Random Forest: This is a bagging-based algorithm whose key twist is that each tree uses only a random subset of features. In other words, each interviewer tests the candidate only on a randomly selected set of criteria (e.g., a technical interview probes programming skills, while a behavioral interview evaluates non-technical skills).

  4. Boosting: This is an alternative approach in which each interviewer adjusts their evaluation criteria based on the previous interviewer's feedback, "boosting" the efficiency of the interview by deploying a more dynamic evaluation process.

  5. Gradient Boosting: A special case of boosting in which errors are minimized by a gradient descent algorithm, much as consulting firms use increasingly demanding case interviews to weed out less qualified candidates (see the code sketch after this list).

  6. XGBoost: Think of XGBoost as gradient boosting taken to the extreme (hence the full name, Extreme Gradient Boosting). It combines software and hardware optimization techniques to achieve superior results with fewer computing resources in less time.
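
To make the boosting analogy concrete, the sketch below builds a tiny gradient-boosted regression ensemble by hand: each new shallow tree is fit to the residual errors of the ensemble so far, which for squared-error loss is equivalent to following the negative gradient. This is a simplified teaching illustration, not how XGBoost is implemented internally.

```python
# A simplified, from-scratch illustration of gradient boosting with squared-error loss.
# This is a teaching sketch, not XGBoost's actual implementation.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 50

# Start from a constant prediction (the mean), then repeatedly fit a
# shallow tree to the residuals (the negative gradient of squared error).
prediction = np.full_like(y, y.mean())
trees = []
for _ in range(n_rounds):
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("Training MSE:", np.mean((y - prediction) ** 2))
```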


Why does XGBoost work so well?

Both XGBoost and Gradient Boosting Machines (GBMs) are ensemble tree methods that boost many weak learners (typically CARTs) using a gradient descent architecture. XGBoost, however, improves on the GBM framework through system optimization and algorithm enhancements.

System optimization:

  1. Parallelization: XGBoost approaches the inherently sequential process of tree building with a parallelized implementation. This is possible because of the interchangeable nature of the loops used to build the base learners: the outer loop enumerates the leaf nodes of a tree, while the inner loop computes the features. The nesting of these loops limits parallelization, because the outer loop cannot start before the more computationally expensive inner loop finishes. XGBoost therefore swaps the loop order: it initializes with a global scan of all instances and sorts them using parallel threads. This switch offsets the parallelization overhead and improves algorithmic performance.

  2. Tree pruning: Within the GBM framework, the criterion for stopping tree splitting is greedy in nature and depends on the loss at the split point. XGBoost instead grows trees up to the specified max_depth parameter and then prunes them backward. This "depth-first" approach significantly improves computational performance.

  3. Hardware optimization: The algorithm is designed to make efficient use of hardware resources. It allocates an internal buffer in each thread's cache to store gradient statistics. In addition, "out-of-core" computing optimizes available disk space when handling large data sets that do not fit into memory (see the parameter sketch after this list).
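
The sketch below shows how some of these system-level features surface in the Python API. The parameter names (nthread, max_depth, tree_method) are real XGBoost parameters, but the values and the toy data are assumptions chosen only for illustration.

```python
# Illustrative settings that touch the system-level optimizations described above.
# Parameter names are real XGBoost parameters; values are assumptions for this sketch.
import numpy as np
import xgboost as xgb

X = np.random.rand(10_000, 20)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "nthread": 4,           # parallel threads used for tree construction
    "max_depth": 6,         # trees grow to this depth, then are pruned backward
    "tree_method": "hist",  # histogram-based split finding (cache/memory friendly)
}
booster = xgb.train(params, dtrain, num_boost_round=50)
# For data that does not fit in memory, XGBoost also offers external-memory
# ("out-of-core") training, which is not shown in this sketch.
```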

Algorithm enhancements:

  1. Regularization: XGBoost penalizes overly complex models through both LASSO (L1) and Ridge (L2) regularization to prevent overfitting.

  2. Sparsity awareness: XGBoost automatically "learns" how to treat missing values based on the training loss, so it naturally accepts sparse features and handles different sparsity patterns in the data more efficiently.

  3. Weighted quantile sketch: XGBoost employs the distributed weighted quantile sketch algorithm to efficiently find optimal split points among weighted data sets.

  4. Cross-validation: The algorithm comes with a built-in cross-validation method at each iteration, removing the need to explicitly program this search or to specify the exact number of boosting iterations required for a single run (see the sketch after this list).
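
Here is a sketch of how these algorithmic features appear in practice: L1/L2 regularization via the alpha and lambda parameters, missing values passed directly as NaN, and the built-in xgb.cv helper for cross-validated boosting rounds. The parameter values and synthetic data are assumptions for illustration, not recommendations.

```python
# Sketch of XGBoost's algorithmic features: regularization, sparsity handling,
# and built-in cross-validation. Values are illustrative, not recommendations.
import numpy as np
import xgboost as xgb

X = np.random.rand(5_000, 10)
X[np.random.rand(*X.shape) < 0.1] = np.nan   # sparse/missing entries are handled natively
y = (np.nan_to_num(X[:, 0]) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)             # NaNs are treated as missing by default

params = {
    "objective": "binary:logistic",
    "alpha": 0.1,    # L1 (LASSO) regularization on leaf weights
    "lambda": 1.0,   # L2 (Ridge) regularization on leaf weights
    "max_depth": 4,
    "eta": 0.1,
}

# Built-in cross-validation: evaluates each boosting round across folds and
# can stop early, so the number of rounds need not be fixed in advance.
cv_results = xgb.cv(
    params, dtrain,
    num_boost_round=200, nfold=5,
    early_stopping_rounds=10, metrics="auc",
)
print(cv_results.tail())
```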


Where is the proof?

We used scikit-learn's make_classification function to create a random sample of 1 million data points with 20 features (2 informative and 2 redundant), and used it to test several algorithms: logistic regression, random forest, standard gradient boosting, and XGBoost.
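
The code below is a scaled-down sketch of that benchmark (fewer samples so it runs in a reasonable time; the original used one million points), comparing training time and AUC for the four models. Exact numbers will differ from the figure and depend on your hardware.

```python
# A scaled-down sketch of the benchmark described above.
# Fewer samples than the original experiment so it finishes quickly.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=100_000, n_features=20,
                           n_informative=2, n_redundant=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100),
    "XGBoost": XGBClassifier(n_estimators=100),
}

for name, model in models.items():
    start = time.time()
    model.fit(X_train, y_train)
    elapsed = time.time() - start
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name:20s}  AUC={auc:.3f}  train time={elapsed:.1f}s")
```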

As shown in the figure above, the XGBoost model achieves the best combination of predictive performance and processing time compared with the other algorithms. Similar results hold for other rigorous benchmarks. Unsurprisingly, XGBoost has been widely adopted in recent data science competitions.

“When in doubt, just use XGBoost,” says Owen Zhang, winner of the Avito Context Ad Click Prediction competition on Kaggle.


Can we use XGBoost in any situation?

In machine learning (as in life), there is no free lunch. As a data scientist, you have to test every applicable algorithm on the data at hand to find the one that works best. Choosing the right algorithm is not enough, though; you must also configure its hyperparameters correctly for your data set. In addition, other factors, such as computational complexity, interpretability, and ease of implementation, matter when choosing the winning algorithm. This is the point where machine learning starts moving from science toward art, and that is where the magic happens!
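
As one way to approach that hyperparameter search, here is a hedged sketch using scikit-learn's RandomizedSearchCV with an XGBoost classifier. The search space and the synthetic data are assumptions for illustration, not a recommended grid.

```python
# A sketch of hyperparameter tuning for XGBoost with scikit-learn's RandomizedSearchCV.
# The search space is illustrative; tune it for your own data set.
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=42)

param_distributions = {
    "max_depth": [3, 4, 5, 6],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "n_estimators": [100, 200, 400],
    "subsample": [0.7, 0.85, 1.0],
    "colsample_bytree": [0.7, 0.85, 1.0],
}

search = RandomizedSearchCV(
    XGBClassifier(),
    param_distributions=param_distributions,
    n_iter=20, scoring="roc_auc", cv=3, random_state=42,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV AUC:", round(search.best_score_, 3))
```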


What does the future hold?

Machine learning is a very active research area, and several promising variants of XGBoost have already appeared. Microsoft Research recently released the LightGBM gradient boosting framework, which shows great potential. CatBoost, developed by Yandex, has delivered impressive benchmarking results. It is only a matter of time before some framework beats XGBoost in predictive performance, flexibility, interpretability, and practicality. But until such a strong challenger arrives, XGBoost will continue to reign over the machine learning world!


Leave a comment below. Thanks to Venkat Anurag Setty for co-writing this article.

If you find any mistakes in this translation or other areas that could be improved, you are welcome to submit a revision PR to the Nuggets Translation Project and earn the corresponding reward points. The permanent link at the beginning of this article is the Markdown link to this article on GitHub.


The Nuggets Translation Project is a community that translates high-quality technical articles from around the Internet, sourced from English articles shared on Juejin (Nuggets). The content covers Android, iOS, front-end, back-end, blockchain, product, design, artificial intelligence, and other fields. For more high-quality translations, please follow the Nuggets Translation Project, its official Weibo account, and its Zhihu column.