TensorFlow Decision Forests (TF-DF) is an open source library of Keras-based decision forest models. It aims to bring some of the most advanced decision forest algorithms (e.g., random forests, gradient boosted decision trees (GBDT), LambdaMART) into TensorFlow in an easy-to-use way. Decision forests have long been among the state-of-the-art methods for modeling tabular data, and they perform very well in many machine learning applications, such as learning to rank.
Classification and decision forests
What is classification?
- A tabular data set
- It contains samples (rows) and attributes (columns)
- Some attributes are categorical, others are numerical
Classification: use a model to predict a categorical attribute from the other attributes.
Why is classification important?
- It lets us estimate values that are difficult or costly to obtain directly
What is a model?
Model: select (or train) the model that best matches the available observations (called "labeled examples").
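As a minimal sketch of what "selecting the model that best matches the labeled examples" can mean, the snippet below scores a few candidate models by accuracy and keeps the best one. The toy data, the attribute names (`age`, `country`) and the helper names are all hypothetical and chosen only for illustration.

```python
# Hypothetical labeled examples: (attributes, label). The label is the
# categorical attribute we want to predict from the other attributes.
labeled_examples = [
    ({"age": 22, "country": "FR"}, "no"),
    ({"age": 25, "country": "US"}, "no"),
    ({"age": 37, "country": "US"}, "yes"),
    ({"age": 45, "country": "FR"}, "yes"),
    ({"age": 52, "country": "US"}, "yes"),
]


def make_threshold_model(threshold):
    """Return a candidate model that predicts "yes" when age >= threshold."""
    def model(row):
        return "yes" if row["age"] >= threshold else "no"
    return model


def accuracy(model, examples):
    """Fraction of labeled examples that the model predicts correctly."""
    correct = sum(1 for row, label in examples if model(row) == label)
    return correct / len(examples)


# "Training" here is just picking the candidate with the highest accuracy.
candidate_thresholds = [20, 30, 40, 50]
best_threshold = max(
    candidate_thresholds,
    key=lambda t: accuracy(make_threshold_model(t), labeled_examples),
)
best_model = make_threshold_model(best_threshold)
```

Real training procedures search a far larger space of models, but the principle is the same: candidates are compared by how well they fit the labeled examples.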
The decision tree
- A common model
- A set of questions organized hierarchically in a tree structure (the decision nodes, highlighted in green)
- Leaf nodes (shown in yellow) contain the predicted results
- Typically, each question tests a single attribute (axis-aligned) and the answer is binary (binary tree)
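The structure described above can be sketched as a small data type: decision nodes hold an axis-aligned question with a binary answer, and leaves hold a prediction. The class names, attribute names, and thresholds below are hypothetical and serve only as an illustration, not as the representation TF-DF uses internally.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: stores the predicted result."""
    prediction: str


@dataclass
class DecisionNode:
    """Decision node: asks "is row[attribute] >= threshold?"."""
    attribute: str
    threshold: float
    if_true: "Tree"   # subtree followed when the answer is yes
    if_false: "Tree"  # subtree followed when the answer is no


Tree = Union[Leaf, DecisionNode]


def predict(tree, row):
    """Walk from the root to a leaf, answering one question per node."""
    while isinstance(tree, DecisionNode):
        if row[tree.attribute] >= tree.threshold:
            tree = tree.if_true
        else:
            tree = tree.if_false
    return tree.prediction


# A hand-built tree with made-up thresholds.
example_tree = DecisionNode(
    attribute="flipper_length_mm",
    threshold=206.5,
    if_true=Leaf("Gentoo"),
    if_false=DecisionNode(
        attribute="bill_length_mm",
        threshold=43.5,
        if_true=Leaf("Chinstrap"),
        if_false=Leaf("Adelie"),
    ),
)

species = predict(example_tree, {"flipper_length_mm": 190.0, "bill_length_mm": 39.0})
```

Because every question tests one attribute against one threshold, prediction only needs a handful of comparisons along a single root-to-leaf path.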
Decision tree learning
A greedy strategy grows the tree one question at a time, choosing each question to maximize a local scoring function (e.g., information gain or the reduction in mean squared error).
Applying this step recursively to each branch yields a complete decision tree.
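The greedy procedure can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the algorithm TF-DF implements: it handles numerical attributes only, uses entropy-based information gain as the score, tries midpoints between observed values as thresholds, and stops at a hypothetical `max_depth`. The toy data is made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, attribute, threshold) with the highest information gain, or None."""
    parent_entropy = entropy(labels)
    best = None
    for attribute in rows[0]:
        values = sorted({row[attribute] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            below = [y for row, y in zip(rows, labels) if row[attribute] < threshold]
            above = [y for row, y in zip(rows, labels) if row[attribute] >= threshold]
            children_entropy = (
                len(below) * entropy(below) + len(above) * entropy(above)
            ) / len(labels)
            gain = parent_entropy - children_entropy
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, attribute, threshold)
    return best


def grow_tree(rows, labels, max_depth=3):
    """Greedily pick the best question, then recurse on each branch."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        # Leaf: predict the most frequent label among the remaining examples.
        return {"prediction": Counter(labels).most_common(1)[0][0]}
    _, attribute, threshold = split
    below = [i for i, row in enumerate(rows) if row[attribute] < threshold]
    above = [i for i, row in enumerate(rows) if row[attribute] >= threshold]
    return {
        "attribute": attribute,
        "threshold": threshold,
        "below": grow_tree([rows[i] for i in below], [labels[i] for i in below], max_depth - 1),
        "at_or_above": grow_tree([rows[i] for i in above], [labels[i] for i in above], max_depth - 1),
    }


def predict(tree, row):
    """Follow the questions from the root until a leaf is reached."""
    while "prediction" not in tree:
        branch = "at_or_above" if row[tree["attribute"]] >= tree["threshold"] else "below"
        tree = tree[branch]
    return tree["prediction"]


# Hypothetical toy data with two numerical attributes and three classes.
rows = [
    {"x": 1.0, "y": 1.0}, {"x": 2.0, "y": 5.0}, {"x": 3.0, "y": 1.5},
    {"x": 6.0, "y": 4.0}, {"x": 7.0, "y": 0.5}, {"x": 8.0, "y": 4.5},
    {"x": 9.0, "y": 1.0},
]
labels = ["a", "a", "a", "c", "b", "c", "b"]
tree = grow_tree(rows, labels)
```

Each choice is only locally optimal: the procedure never revisits an earlier question, which is what makes it fast and also why a single tree can be far from the best possible one.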
Decision forest
- Combines (e.g., sums or averages) the predictions of multiple decision trees
- Often consists of hundreds or thousands of decision trees
- Predictions tend to be more accurate (but slower to compute) than those of a single decision tree
- Different algorithms can be used to train the trees together (e.g., random forest, gradient boosted trees, AdaBoost)
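How a forest combines its trees can be sketched as follows. Two common aggregation schemes are shown: majority voting, as in a random forest classifier, and summing tree outputs on top of an initial value, as in gradient boosting for regression. The trees are hypothetical hand-written functions rather than trained models, and the attribute names are made up.

```python
from collections import Counter


def forest_predict(trees, row):
    """Random-forest style: every tree votes and the most common label wins."""
    votes = Counter(tree(row) for tree in trees)
    return votes.most_common(1)[0][0]


def boosted_predict(regression_trees, row, initial=0.0):
    """Gradient-boosting style: tree outputs are summed onto an initial value."""
    return initial + sum(tree(row) for tree in regression_trees)


# Three hypothetical classification trees, each mapping a row to a label.
def tree_a(row):
    return "yes" if row["age"] >= 30 else "no"


def tree_b(row):
    return "yes" if row["income"] >= 50_000 else "no"


def tree_c(row):
    return "yes" if row["age"] >= 40 and row["income"] >= 30_000 else "no"


# Two hypothetical regression trees that each output a small correction.
def correction_1(row):
    return 10.0 if row["age"] >= 30 else -10.0


def correction_2(row):
    return 2.5 if row["income"] >= 50_000 else -2.5


row = {"age": 35, "income": 60_000}
label = forest_predict([tree_a, tree_b, tree_c], row)
value = boosted_predict([correction_1, correction_2], row, initial=100.0)
```

Since every tree must be evaluated for each prediction, inference cost grows with the number of trees, which is the accuracy-for-speed trade-off noted above.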
TensorFlow Decision Forests library
- Provides a collection of decision forest algorithms in TensorFlow
- Easy to use
- Works with the rest of the TensorFlow toolkit
- Supports advanced setups, such as combining decision forests with neural networks
A typical TF-DF workflow covers the core training code, model visualization, a summary that displays various information about the model, and use with TensorFlow's other tools.
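The original code listings are not reproduced here, so the following is only a sketch of what such a workflow might look like with the public TF-DF Keras API. The function name `train_and_evaluate`, the CSV paths, and the label column name are hypothetical, and the label column is assumed to hold integer class indices.

```python
def train_and_evaluate(train_csv, test_csv, label):
    """Train a random forest on one CSV file and evaluate it on another."""
    # Imported here so that this file can be loaded without TF-DF installed.
    import pandas as pd
    import tensorflow_decision_forests as tfdf

    # Load the tabular data (rows are examples, columns are attributes).
    train_df = pd.read_csv(train_csv)
    test_df = pd.read_csv(test_csv)

    # Convert the pandas dataframes into TensorFlow datasets.
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label=label)
    test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label=label)

    # Train with default hyperparameters.
    model = tfdf.keras.RandomForestModel()
    model.fit(train_ds)

    # Print information about the trained model.
    model.summary()

    # Evaluate through the standard Keras interface.
    model.compile(metrics=["accuracy"])
    evaluation = model.evaluate(test_ds, return_dict=True)
    return model, evaluation
```

A call might look like `train_and_evaluate("train.csv", "test.csv", label="my_label")`, with both paths and the column name standing in for real ones. In a notebook, `tfdf.model_plotter.plot_model_in_colab(model, tree_idx=0)` can render one of the trees, and because the model is a Keras model it can be saved with `model.save(...)` like any other.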
When to use a decision forest?
- When working with tabular data
- Simplicity: little hyperparameter tuning is needed
- Interpretability
- Speed: both training and inference are fast