Linear regression algorithm (II)

This section covers the metrics used to evaluate a linear regression algorithm: MSE, RMSE, MAE, and R Squared.

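Mean squared error (MSE)

The natural starting point is the sum of the squared errors between the predictions and the true values. One problem: that sum depends on m, the number of samples, so results measured on data sets of different sizes cannot be compared. The fix is easy: divide the sum by m, which gives the mean squared error.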
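Written out, the step from the sum of squared errors to its mean looks like this. Here $m$ is the number of samples; $y^{(i)}$ for the true value and $\hat{y}^{(i)}$ for the prediction on sample $i$ are notation assumed for this sketch.

```latex
\text{SSE} = \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2
\qquad\Longrightarrow\qquad
\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2
```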

Root mean squared error (RMSE)

Another problem with MSE is its units. If the target in the data set is measured in units of ten thousand yuan, MSE is in (ten thousand yuan) squared, which is clearly a different unit. This is the same reason we have the standard deviation alongside the variance: it brings the measure back to the units of the data.

So the solution is the same as going from variance to standard deviation: take the square root of MSE to get RMSE.
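In symbols, assuming $m$ samples with true values $y^{(i)}$ and predictions $\hat{y}^{(i)}$ (notation chosen for this sketch):

```latex
\text{RMSE} = \sqrt{\text{MSE}}
            = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2}
```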

Mean absolute error (MAE)

There is also a more direct measure: take the absolute value of each error instead of squaring it, then average over the m samples.
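As a formula, again assuming $m$ samples with true values $y^{(i)}$ and predictions $\hat{y}^{(i)}$ (notation chosen for this sketch):

```latex
\text{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| y^{(i)} - \hat{y}^{(i)} \right|
```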

RMSE vs MAE

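RMSE and MAE have the same units, namely the units of y in the data. They differ in how they weight the errors: RMSE squares each error before averaging, so large errors are amplified, and on the same predictions RMSE is never smaller than MAE.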
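To make the comparison concrete, here is a minimal NumPy sketch of the three metrics. The function names and the four-sample arrays are made up for illustration; they are not taken from the original article.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared errors."""
    return np.mean((y_true - y_pred) ** 2)

def root_mean_squared_error(y_true, y_pred):
    """Square root of MSE, in the same units as y."""
    return np.sqrt(mean_squared_error(y_true, y_pred))

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute errors, in the same units as y."""
    return np.mean(np.abs(y_true - y_pred))

# Made-up values, for illustration only
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_even = np.array([12.0, 18.0, 32.0, 38.0])     # every prediction is off by 2
y_outlier = np.array([10.0, 20.0, 30.0, 48.0])  # a single prediction is off by 8

for name, y_pred in [("even errors", y_even), ("one large error", y_outlier)]:
    print(name,
          "MAE =", mean_absolute_error(y_true, y_pred),
          "RMSE =", root_mean_squared_error(y_true, y_pred))
```

Both sets of predictions have an MAE of 2, but RMSE is 2 for the evenly spread errors and 4 when the same total error sits in one sample.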

In practice

Let's practice on the real Boston housing price data set.

Step 1: Import the data
Step 2: Extract the number-of-rooms feature
Step 3: Draw a scatter plot
Step 4: Remove distracting data points
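Steps 1 to 4 might look roughly like the sketch below. `load_boston` was removed from scikit-learn in version 1.2, so this sketch swaps in a small synthetic stand-in whose column 5 plays the role of the room-count feature (RM in the Boston data). The synthetic values, the cap at 50, and the reading of "distracting data points" as samples sitting on that cap are all assumptions made for illustration, not the author's data or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: "import" the data. Synthetic stand-in with 13 feature columns,
# the same width as the Boston feature matrix (an assumption for illustration).
X = rng.normal(size=(200, 13))
X[:, 5] = rng.uniform(4.0, 9.0, size=200)   # column 5 plays the role of RM
y = 9.0 * X[:, 5] - 35.0 + rng.normal(scale=5.0, size=200)
y = np.clip(y, None, 50.0)                  # mimic a target that is capped at 50

# Step 2: keep only the room-count column as the single feature
x = X[:, 5]

# Step 3: scatter plot (requires matplotlib, so it is left commented out)
# import matplotlib.pyplot as plt
# plt.scatter(x, y)
# plt.show()

# Step 4: drop the samples that sit on the cap (assumed meaning of "distracting")
mask = y < 50.0
x, y = x[mask], y[mask]
print(x.shape, y.shape)
```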


from sklearn.model_selection import train_test_split

# Shuffle the samples, then split them into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, shuffle=True)

After the split, x_train and y_train hold the training set, and x_test and y_test hold the test set.

Step 5: Fit the regression algorithm and predict

Step 6: Evaluate the algorithm
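Steps 5 and 6 could be sketched with scikit-learn as follows. The one-feature data here is synthetic and merely stands in for the room-count feature and the prices; `random_state=0` and the variable names `reg` and `y_predict` are choices made for this sketch, not the author's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic one-feature data (assumption: stands in for rooms vs. price)
rng = np.random.default_rng(0)
x = rng.uniform(4.0, 9.0, size=200)
y = 9.0 * x - 35.0 + rng.normal(scale=5.0, size=200)

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Step 5: fit on the training set, then predict on the test set.
# scikit-learn expects a 2-D feature matrix, hence the reshape.
reg = LinearRegression()
reg.fit(x_train.reshape(-1, 1), y_train)
y_predict = reg.predict(x_test.reshape(-1, 1))

# Step 6: measure the error on the test set
mse = mean_squared_error(y_test, y_predict)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_predict)
print("MSE:", mse, "RMSE:", rmse, "MAE:", mae)
```

The metrics are computed on the test set only, so they describe how the fitted line behaves on data it has not seen.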

R Squared

The metrics above share a problem: they have no fixed scale. Classification accuracy always lies between 0 and 1, where 0 is the worst and 1 is the best, so it is easy to judge an algorithm and to compare two of them. RMSE and MAE are different. For example, suppose algorithm 1 uses the size of the house as its feature and algorithm 2 uses the distance of the house from the city center. RMSE or MAE gives each algorithm an error in the units of y, but neither number says on its own how good the fit is, and errors measured on targets with different units or scales (a price versus a distance, say) cannot be compared at all, because they are not the same thing.

Therefore, we need to introduce a new metric: R Squared.
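For reference, R Squared is commonly defined as one minus the ratio of the model's squared error to the squared error of a baseline that always predicts the mean of y. The sketch below follows that common definition; the function name and sample values are made up for illustration.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """R Squared: 1 - (residual sum of squares) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # error of the model
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # error of the mean baseline
    return 1 - ss_res / ss_tot

# Made-up values, for illustration only
y_true = np.array([10.0, 20.0, 30.0, 40.0])
perfect = r2_score(y_true, y_true)
baseline = r2_score(y_true, np.full_like(y_true, y_true.mean()))
print(perfect, baseline)
```

Perfect predictions score 1 and the mean baseline scores 0, which gives regression a scale that does not depend on the units of y.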