First, let's get a general sense of how machine learning models work and how they are used. If you've already done some statistical modeling or machine learning, this may feel too basic, but don't worry: we'll be building some powerful models soon.
Decision tree model
This micro-course will have you build models as you work through the following scenario.
Your cousin has made millions of dollars speculating on real estate. Because you are very interested in data science, he offers to go into business with you: he will provide the money, and you will build models that predict how much various houses are worth.
As a professional in data mining, you ask your cousin how he has predicted real estate values in the past. He says it is just intuition. But further questioning reveals that he has identified price patterns from houses he has seen before, and that he uses those patterns to value the houses he is now considering.
Machine learning works the same way. We'll start with a model called a decision tree. There are certainly fancier models that give more accurate predictions, but decision trees are easy to understand, and they are the basic building block of some of the best models in data science.
For simplicity, we'll start with the simplest possible decision tree.
This decision tree divides all houses into just two categories. The predicted price for any house under consideration is the historical average price of the houses in the same category.
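To make this concrete, here is a minimal sketch (not from the original text) of what such a two-category tree's prediction rule looks like. The split on the number of bedrooms and the two average prices are hypothetical numbers chosen purely for illustration.

```python
def predict_price(num_bedrooms):
    """Simplest possible decision tree: one split, two leaves.

    Houses fall into two categories by bedroom count, and the prediction
    is the historical average price of that category.
    (The threshold and both averages are made-up values.)
    """
    if num_bedrooms > 2:
        return 188_000   # average price of houses with more than 2 bedrooms
    else:
        return 178_000   # average price of houses with 2 or fewer bedrooms


print(predict_price(3))  # -> 188000
```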
We use the data to decide how to split the houses into two groups, and then again to determine the predicted price in each group. This step of capturing patterns from data is called fitting or training the model. The data used to fit the model is called the training data.
The details of how the model is fit, such as how to split up the data, are complex enough that we'll save them for later. Once the model has been fit, you can apply it to new data to predict the prices of other houses.
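As a rough illustration of what "fitting" means for such a tree, the sketch below computes the two group averages from a tiny, made-up training set and then uses them to predict the price of a new house. The split value, feature, and prices are assumptions, not the course's actual data.

```python
# Hypothetical training data: (number of bedrooms, sale price)
training_data = [
    (1, 150_000),
    (2, 180_000),
    (3, 210_000),
    (4, 250_000),
]

# "Fitting" the simplest tree: choose a split and record each group's average price.
split_value = 2  # more than 2 bedrooms vs. 2 or fewer
small = [price for beds, price in training_data if beds <= split_value]
large = [price for beds, price in training_data if beds > split_value]
leaf_values = {
    "small": sum(small) / len(small),
    "large": sum(large) / len(large),
}


def predict(num_bedrooms):
    """Predict a price by returning the average price of the matching group."""
    group = "large" if num_bedrooms > split_value else "small"
    return leaf_values[group]


print(predict(3))  # average price of the "more than 2 bedrooms" group
```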
Improved decision tree model
Which of the following two decision trees is more likely to result from fitting the training data?
Clearly, the first tree (on the left) makes more sense, because it captures the reality that houses with more bedrooms usually sell for more than houses with fewer bedrooms. But this model's biggest shortcoming is that it fails to capture most of the factors that affect a house's price, such as the number of bathrooms, whether the house is new or a resale, and its location.
You can capture more of the factors that influence house prices with a tree that has more splits. Trees with more splits are called deeper trees.
To predict a house's price, you trace down the decision tree, always choosing the branch that matches that house's characteristics. The prediction sits at the very bottom of the tree; the points at the bottom where predictions are made are called leaf nodes.
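Deeper trees are usually built with a library rather than by hand. The sketch below uses scikit-learn's DecisionTreeRegressor as one common choice (the feature names and numbers are made up for illustration): max_depth limits how many levels of splits the tree can grow, and predict traces each new house down to a single leaf.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data: a few houses described by several features.
X = pd.DataFrame({
    "bedrooms":   [1, 2, 3, 3, 4, 4],
    "bathrooms":  [1, 1, 2, 1, 2, 3],
    "year_built": [1960, 1975, 1990, 2001, 2010, 2018],
})
y = [150_000, 180_000, 210_000, 205_000, 250_000, 320_000]

# A deeper tree: max_depth=3 allows up to three levels of splits,
# so the model can combine bedrooms, bathrooms, and the house's age.
model = DecisionTreeRegressor(max_depth=3, random_state=0)
model.fit(X, y)

# Prediction traces a new house down the tree to one leaf and returns
# the average training price of the houses that ended up in that leaf.
new_house = pd.DataFrame({"bedrooms": [3], "bathrooms": [2], "year_built": [1995]})
print(model.predict(new_house))
```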
The splits and the leaf values will be determined by the data, so it's time to take a look at the data we'll be working with.
Machine Learning series (2): blog.csdn.net/fwj_ntu/art…
Welcome to follow the WeChat official account "Data Analyst Notes", and let's grow and improve together!