Concept

Linear regression is a widely used statistical method that uses regression analysis from mathematical statistics to determine the quantitative relationship between two or more interdependent variables. The model is expressed as y = w^T x + e, where the error e follows a normal distribution with mean 0. When the regression includes only one independent variable and one dependent variable, and the relationship between them can be approximated by a straight line, it is called simple (unary) linear regression. If the regression includes two or more independent variables and the relationship between the dependent variable and the independent variables is linear, it is called multiple linear regression.

Example

The figure below shows simple linear regression: it fits a straight line so that the known data points fall on or around the line as closely as possible. Expressed as a formula:
f(x) = wx + b
Here w is the coefficient and b is the intercept. When this is extended to n inputs, i.e., the final result y is jointly affected by multiple x's, then:
f(x) = w_1 x_1 + w_2 x_2 + … + w_n x_n + b
Suppose the dataset has m pieces of data, each of which corresponds to n x's and one y. Then the whole of x can be represented by an m×n matrix, and the whole of y by an m×1 matrix. A single piece of data x can be represented as a 1×n row vector, and its y is a single value.
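These shapes can be sketched with NumPy; the sizes and values below are made-up placeholders, not from any real dataset:

```python
import numpy as np

# Assumed toy sizes: m = 4 samples, each with n = 3 features.
m, n = 4, 3
X = np.arange(m * n, dtype=float).reshape(m, n)  # the whole x: an m x n matrix
y = np.ones((m, 1))                              # the whole y: an m x 1 matrix

x_single = X[0:1, :]   # one piece of data: a 1 x n row vector
y_single = y[0, 0]     # its y: a single value
```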

For this single piece of data (taking n = 5 as an example), we can write:
f(x) = w_1 x_1 + w_2 x_2 + … + w_4 x_4 + w_5 x_5 + b

Let b = w_0 x_0, where w_0 = b and x_0 = 1. Then:

f(x) = w_0 x_0 + w_1 x_1 + w_2 x_2 + … + w_4 x_4 + w_5 x_5
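A minimal sketch of this trick, using two features with made-up values: prepending x_0 = 1 to the data and w_0 = b to the weights leaves the prediction unchanged.

```python
import numpy as np

# Original form: f(x) = w1*x1 + w2*x2 + b
x = np.array([2.0, 3.0])
w = np.array([0.5, -1.0])
b = 4.0
f_with_b = w @ x + b

# Absorbed form: f(x) = w0*x0 + w1*x1 + w2*x2, with w0 = b and x0 = 1
x_aug = np.concatenate(([1.0], x))
w_aug = np.concatenate(([b], w))
f_absorbed = w_aug @ x_aug

# Both forms give the same prediction.
```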

Generalizing from a single piece of data to the whole dataset, we get f(x) = W X^T. But the predicted value f(x) computed from W and X differs from the actual y; taking their squared error as the cost:

cost = \sum_{i=1}^{m} (W x_i^T - y_i)^2

So we minimize the cost and solve for W. The expression for the optimal W (the key result!) is the normal equation:

W^T = (X^T X)^{-1} X^T y
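The normal-equation solution W^T = (X^T X)^{-1} X^T y can be sketched as follows. The dataset here is synthetic (an assumption for illustration), generated without noise, so the recovered W matches the true weights up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3                                   # m samples, n columns (x_0 = 1 included)
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n - 1))])
true_w = np.array([[4.0], [0.5], [-1.0]])      # true weights as an n x 1 column
y = X @ true_w                                 # noiseless targets

# Normal equation: W^T = (X^T X)^{-1} X^T y
# (np.linalg.lstsq is numerically preferable in practice; inv is used here
# only to mirror the formula.)
W = np.linalg.inv(X.T @ X) @ X.T @ y

cost = np.sum((X @ W - y) ** 2)                # the squared-error cost, ~0 here
```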

This gives you the model fitted to the given dataset; then you input a new x (its dimension increased by 1, because x_0 = 1) and obtain the new predicted value y.
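Prediction with a new x then looks like this sketch (the weights are assumed values, not fitted from real data); note the x_0 = 1 prepended to raise the dimension by 1:

```python
import numpy as np

W = np.array([4.0, 0.5, -1.0])          # assumed learned weights [w0, w1, w2]
x_new = np.array([2.0, 3.0])            # a new raw input (x1, x2)
x_aug = np.concatenate(([1.0], x_new))  # prepend x_0 = 1
y_pred = float(W @ x_aug)               # f(x) = w0 + w1*x1 + w2*x2
```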

Note: For a single piece of data, I set both x and W as 1×n row vectors, e.g. x = [1, 2, 3] and W = [1, 2, 3]. They could instead be set as column vectors; then all of X and all of y change accordingly, and f(x) changes to f(x) = X^T W or f(x) = W^T X, because for a single piece of data x and W become 3×1 vectors: transposing one of them into a 1×3 vector and multiplying it by the other 3×1 vector yields a single number. This also resets how the cost is written when solving for W. Refer to another vector representation below for details.
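The equivalence described in this note can be checked directly (the values are arbitrary):

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])   # x as a 3 x 1 column vector
W = np.array([[1.0], [2.0], [3.0]])   # W as a 3 x 1 column vector

pred_xt_w = (x.T @ W).item()   # f(x) = x^T W: a 1x3 times a 3x1 -> one number
pred_wt_x = (W.T @ x).item()   # f(x) = W^T x: the same number
```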