
Today we're going to talk about linear regression models. As we all learned in school, the idea of linear regression is to solve for an equation that describes the relationship between the dependent variable Y and the independent variable X. When there is only one X it is called simple (unary) linear regression, and when there are multiple X's (X1, X2, X3, …) it is called multiple linear regression.
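Written out, the two cases look like this (standard textbook form, added here just for reference):

```latex
% Simple (unary) linear regression: a single independent variable x
y = \beta_0 + \beta_1 x

% Multiple linear regression: several independent variables x_1, x_2, \dots, x_n
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
```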

The sklearn library in Python provides linear_model.LinearRegression to implement linear regression. Let's walk through a simple example of how the process works:

```python
from sklearn.linear_model import LinearRegression

# Training data: each row of X is one sample with two features
X = [[1, 1], [2, 1], [3, 1]]
Y = [3, 4, 5]

test = LinearRegression()
test.fit(X, Y)            # fit the regression of Y on X

print(test.coef_)         # regression coefficients, one per feature
print(test.intercept_)    # intercept of the fitted equation
```

We’ve already done a regression on X and Y using the fit function.
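As a quick follow-up (not in the original post), once fit has run we can also call predict on new inputs; the extra points below are made up for illustration:

```python
# Predict Y for new inputs the model has not seen (made-up points)
new_X = [[4, 1], [5, 1]]
print(test.predict(new_X))   # the toy data follow y = x1 + 2, so this prints roughly [6. 7.]
```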

Here, coef_ stores the regression coefficients and intercept_ stores the intercept; together these two values determine the fitted equation. Of course, this model can also run into problems such as over-fitting and under-fitting. Let's talk about under-fitting first. Under-fitting means the model does not capture the characteristics of the data well and cannot fit the data properly. For example, if the data actually follow a parabola but you fit them with a straight-line equation, that is under-fitting. The usual remedies are to add extra feature terms or to lower the value of the regularization parameter, for example as sketched below.
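As an illustration of the "add extra feature terms" remedy (my sketch, not code from the post), scikit-learn's PolynomialFeatures can be combined with LinearRegression so that a linear model can follow a parabola; the data below are made up:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up parabola-shaped data that a plain straight line would under-fit
x = np.arange(-3, 4).reshape(-1, 1)   # 7 samples, one feature
y = x.ravel() ** 2 + 1                # y = x^2 + 1

# Adding polynomial feature terms lets the linear model follow the curve
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict([[4]]))           # close to 17 = 4^2 + 1
```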

Over-fitting is the opposite: for example, fitting a parabola to data that really follow a straight line. The model over-learns the details and noise in the training data, so it performs poorly on new data. In that case we usually re-clean the data or increase the amount of training data, and we can check for the problem as in the sketch below.
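One common way to notice over-fitting (a general technique, added here as an illustration) is to hold out part of the data and compare the training score with the test score; a large gap suggests the model has learned noise. The data below are synthetic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up noisy data that roughly follow a straight line
rng = np.random.RandomState(0)
X_data = rng.uniform(0, 10, size=(30, 1))
y_data = 2 * X_data.ravel() + 1 + rng.normal(scale=1.0, size=30)

# Hold out 20% of the samples to act as "new" data
X_train, X_test, y_train, y_test = train_test_split(
    X_data, y_data, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("train R^2:", model.score(X_train, y_train))
print("test  R^2:", model.score(X_test, y_test))
# A test score far below the training score is a sign of over-fitting
```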

We can also use matplotlib to draw the graph. First import matplotlib:

```python
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties  # lets us set a custom font for labels
```

Then create the canvas (figure) and set the title, X-axis, and Y-axis:

```python
plt.figure(figsize=(8, 6))   # figsize is a (width, height) tuple in inches
```
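Putting those pieces together, a minimal sketch of the full plot might look like this, reusing X, Y, and the fitted model test from the earlier example (the title and labels are just placeholders):

```python
import matplotlib.pyplot as plt

# Reuse the toy X, Y, and fitted model `test` from the example above
x1 = [row[0] for row in X]                 # first feature of each sample
plt.figure(figsize=(8, 6))
plt.scatter(x1, Y, label="data points")    # raw data
plt.plot(x1, test.predict(X), "r-", label="fitted line")  # regression line
plt.title("Linear regression example")     # placeholder title
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()
```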

Because the data above were made up for demonstration, they have some issues and the plot does not show anything meaningful. But when we apply this to real problems, it is easy to see the spread of the data and the outliers. And of course, when we build models we don't have to chase an overly complicated model; we just need one that fits the application.