
This article was first published on the WeChat official account “Beauty of Algorithms and Programming”.

1. Concept of multiple linear regression

In regression analysis, a model with two or more independent variables is called multiple regression. Changes in social and economic phenomena are usually influenced by several factors at once. Household consumption expenditure, for example, depends not only on household disposable income but also on household wealth, the price level, interest rates on deposits at financial institutions, and other factors. In such cases a multiple regression analysis is appropriate, and a regression that includes two or more independent variables is called multiple linear regression. Simple (one-variable) linear regression explains the change in the dependent variable through a single major influencing factor, but in practice a phenomenon is usually associated with many factors. Predicting or estimating the dependent variable from the optimal combination of several independent variables is therefore more effective and more realistic than using a single independent variable, which makes multiple linear regression more practical than simple linear regression.

Multiple linear regression is similar to simple linear regression: the model parameters can be estimated by the least squares method, and the model and its parameters must then be subjected to statistical tests.

Choosing appropriate independent variables is one of the preconditions for correct multiple regression prediction. The selection of independent variables for a multiple regression model can be guided by the correlation matrix of the variables.

2. Normality assumption in multiple linear regression

Multiple linear regression assumes that the error terms follow a Gaussian (normal) distribution.

The probability density function of the normal distribution is:

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
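As a minimal sketch, the density formula above can be evaluated directly; the function name and defaults here are illustrative, not from the original article:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of the normal distribution N(mu, sigma^2)."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Density of the standard normal at its mean: 1 / sqrt(2*pi) ≈ 0.3989
print(round(normal_pdf(0.0), 4))  # -> 0.3989
```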

3. Multiple linear regression model

The multiple linear regression model is:

y = b0 + b1·x1 + b2·x2 + … + bk·xk + e

where e is a random error term.

Here b0 is the constant term and b1, b2, …, bk are the regression coefficients. b1 is the effect on y of a one-unit increase in x1 when x2, …, xk are held fixed, that is, the partial regression coefficient of x1 with respect to y. Likewise, b2 is the effect on y of a one-unit increase in x2 when the other independent variables are held fixed, the partial regression coefficient of x2 with respect to y, and so on. If two independent variables x1 and x2 are both linearly related to the dependent variable y, the binary linear regression model can be written as:

y = b0 + b1·x1 + b2·x2 + e

The least squares method is used to estimate the parameters. Taking the binary regression model as an example, the normal equations for the regression parameters are:

Σy = n·b0 + b1·Σx1 + b2·Σx2
Σx1y = b0·Σx1 + b1·Σx1² + b2·Σx1x2
Σx2y = b0·Σx2 + b1·Σx1x2 + b2·Σx2²

Solving these equations gives the values of b0, b1 and b2. The parameters can also be obtained by the following matrix method:

That is, with Y the vector of observations of the dependent variable, X the design matrix (whose first column is all ones), and B the vector of parameters:

B = (XᵀX)⁻¹XᵀY
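A minimal numpy sketch of the matrix solution, on made-up data constructed so the exact coefficients are known in advance:

```python
import numpy as np

# Hypothetical sample: y depends on two independent variables x1 and x2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2  # constructed so the exact fit is b = (1, 2, 3)

# Design matrix with a leading column of ones for the constant term b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Normal-equation solution B = (X^T X)^{-1} X^T Y; np.linalg.solve is
# numerically safer than forming the inverse explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(b, 6))  # -> [1. 2. 3.]
```

In practice `np.linalg.lstsq(X, y, rcond=None)` is preferred over forming XᵀX, but the version above mirrors the formula in the text.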

Maximum likelihood estimation and the least squares method:

Assume the errors are independent and normally distributed with mean 0 and variance σ². Each observation then satisfies yᵢ = xᵢᵀB + eᵢ, so the likelihood of the sample is:

L(B, σ²) = ∏ᵢ (1 / (σ√(2π))) · exp(−(yᵢ − xᵢᵀB)² / (2σ²))

Taking logarithms gives the log-likelihood:

ln L = −n·ln(σ√(2π)) − (1 / (2σ²)) · Σᵢ (yᵢ − xᵢᵀB)²

The first term does not depend on B, so maximizing the log-likelihood is the same as minimizing Σᵢ (yᵢ − xᵢᵀB)², the sum of squared residuals. Under the Gaussian assumption, maximum likelihood estimation therefore yields exactly the least squares solution.
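The equivalence of maximum likelihood and least squares under Gaussian errors can be checked numerically; the data and noise level below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 50)
x2 = rng.uniform(0, 10, 50)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0, 0.5, 50)  # Gaussian noise

X = np.column_stack([np.ones_like(x1), x1, x2])
b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimate

def neg_log_likelihood(b, sigma=0.5):
    """Negative Gaussian log-likelihood of the residuals."""
    resid = y - X @ b
    n = len(y)
    return n * np.log(sigma * np.sqrt(2 * np.pi)) + np.sum(resid ** 2) / (2 * sigma ** 2)

# The negative log-likelihood is a quadratic in b minimized exactly at the
# least-squares solution, so any perturbation of b_ls raises it.
nll_ls = neg_log_likelihood(b_ls)
nll_perturbed = neg_log_likelihood(b_ls + np.array([0.1, 0.0, 0.0]))
print(nll_ls < nll_perturbed)  # -> True
```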

4. Detection and evaluation of multiple regression model

The multiple regression model is treated the same way as the simple linear regression model: after the least squares estimates of the parameters are obtained, the model must be tested and evaluated before it can be applied. The following steps are required:

1) Measuring the goodness of fit

As with the coefficient of determination R² in simple linear regression, multiple linear regression has a multiple coefficient of determination R²: the proportion of the total variation in the dependent variable that is explained by the regression equation (the regression sum of squares). The larger R² is, the closer the sample points lie to the fitted regression surface, and the closer the fit between all the independent variables and the dependent variable.

The calculation formula is:

R² = SSR / SST = 1 − SSE / SST

where SST is the total sum of squares, SSR the regression sum of squares, and SSE the residual sum of squares.
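A minimal numpy sketch of the R² computation, using hypothetical data for a two-variable regression:

```python
import numpy as np

# Hypothetical observations for a two-variable regression.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([3.0, 1.0, 5.0, 2.0, 6.0, 4.0])
y = np.array([10.1, 7.9, 16.2, 12.0, 20.1, 16.0])

X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
sse = np.sum((y - y_hat) ** 2)     # residual (error) sum of squares
r_squared = 1.0 - sse / sst        # R^2 = SSR / SST = 1 - SSE / SST
print(round(r_squared, 4))
```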

2) Standard error of the estimate

The standard error of the estimate measures the deviation between the actual values of the dependent variable y and the estimates obtained from the regression equation. The smaller it is, the better the fit of the regression equation. Its formula is:

S = √(SSE / (n − k − 1))

where n is the sample size and k is the number of independent variables in the multiple linear regression equation.
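The standard error of the estimate can be sketched in numpy as follows, again on made-up data (n observations, k = 2 independent variables):

```python
import numpy as np

# Hypothetical observations for a two-variable regression.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([3.0, 1.0, 5.0, 2.0, 6.0, 4.0])
y = np.array([10.1, 7.9, 16.2, 12.0, 20.1, 16.0])

X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

n, k = len(y), 2                # sample size and number of regressors
sse = np.sum((y - X @ b) ** 2)  # residual sum of squares
s = np.sqrt(sse / (n - k - 1))  # standard error of the estimate
print(round(s, 4))
```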

3) Significance test of the regression equation

The significance test of the regression equation tests whether the equation as a whole is significant, that is, whether the linear relationship between the set of independent variables and the dependent variable is close.

The F test is commonly used; the F statistic is computed as:

F = (SSR / k) / (SSE / (n − k − 1))

For a given significance level α and degrees of freedom (k, n − k − 1), look up the F distribution table to obtain the critical value F_α. If F > F_α, the regression equation is significant and the regression effect is significant; if F < F_α, the regression equation is not significant and the regression effect is not significant.
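A sketch of the F test using scipy in place of the printed F table (`scipy.stats.f.ppf` returns the critical value), on the same kind of hypothetical data as above:

```python
import numpy as np
from scipy import stats

# Hypothetical observations for a two-variable regression.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([3.0, 1.0, 5.0, 2.0, 6.0, 4.0])
y = np.array([10.1, 7.9, 16.2, 12.0, 20.1, 16.0])

X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

n, k = len(y), 2
ssr = np.sum((y_hat - y.mean()) ** 2)     # regression sum of squares
sse = np.sum((y - y_hat) ** 2)            # residual sum of squares
f_stat = (ssr / k) / (sse / (n - k - 1))  # F statistic

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, k, n - k - 1)  # critical value F_alpha
print(f_stat > f_crit)  # -> True: the regression effect is significant here
```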

5. Application of multiple linear regression

(1) Determining whether there is a correlation between several specific variables and, if so, finding an appropriate mathematical expression for it;

(2) Predicting or controlling the value of one variable from the values of one or several other variables, together with the accuracy such prediction or control can achieve;

(3) Performing factor analysis: for example, among the many variables (factors) that jointly affect one variable, identifying which factors are important, which are secondary, and how these factors are related.

In real life, multiple linear regression can be used to analyze many things, such as the factors that influence residents’ savings, housing prices, or medical expenses, the factors that influence hypertension in the elderly, or the quality of life of AIDS patients.
