Author: alg – flody
Editor: Emily
Over the first three days, this series covered the least squares method for the linear regression algorithm, from the hypothesis to the principle, with a detailed analysis of the two solution approaches: the direct solution and gradient descent. Today, we hand-write Python code to implement the linear regression algorithm.
01 Data Preprocessing
After getting a data set, a fairly long preprocessing stage is usually required. Do not dismiss this step as unrelated to model building and model solving; it is in fact very important preparation for the work that follows. Since the focus of this post is not on preprocessing methods, the preprocessing process is not described in detail here; instead we concentrate on this simulated example of the two solutions of the least squares method for linear regression.
After obtaining the data set, the first 10 rows of the preprocessed data are shown below. The first column is the floor area of the house, the second column is the service life (age) of the house, and the third column is the label value of the house. These values have already been preprocessed.
area          service life     label
[[0.35291809, 0.16468428, 0.35774628],
 [0.55106013, 0.10981663, 0.25468008],
 [0.65439632, 0.71406955, 0.1061582 ],
 [0.19790689, 0.61536205, 0.43122894],
 [0.00171825, 0.66827656, 0.44198075],
 [0.2739687 , 1.16342739, 0.01195186],
 [0.11592071, 0.18320789, 0.29397728],
 [0.02707248, 0.53269863, 0.21784183],
 [0.7321352 , 0.27868019, 0.42643361],
 [0.76680149, 0.89838545, 0.06411818]]
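For the code that follows, assume these preprocessed rows live in a NumPy array (hypothetically named data here; it is not a name used in the original post). The features and labels can then be split out as in this minimal sketch:

import numpy as np

# Hypothetical array holding the preprocessed rows shown above (in practice, the full data set).
data = np.array([[0.35291809, 0.16468428, 0.35774628],
                 [0.55106013, 0.10981663, 0.25468008],
                 [0.65439632, 0.71406955, 0.1061582 ]])   # remaining rows omitted here

# The first two columns are the features (area, service life); the last column is the label.
X = data[:, :2]
y = data[:, 2]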
Below, the linear regression is solved in two ways: with the direct method and with gradient descent.
Let’s start with the libraries we use:
# import the numpy library
import numpy as np
# import pyplot for plotting
import matplotlib.pyplot as plt
# import the time module for timing
import time
In this simulation, the first 100 rows of the data set are used for the iterative calculation, that is, m = 100.
To link up with the theory section of the previous post, the offset (bias) term is combined with the 2 features. The code for combining them is as follows:
# offset column b, shape = (100,)
b = np.array([1])
b = b.repeat(100)
# combine the offset with the 2 feature columns, shape = (100, 3)
X = np.column_stack((b, X))
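As a quick sanity check (not in the original post), the shapes after stacking can be verified:

print(b.shape)   # (100,)
print(X.shape)   # (100, 3): each row is [1, area, service life]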
02 Solving the Parameters Directly
We know that when the linear regression model is established, because the model is linear and the error term follows a Gaussian distribution, a closed-form formula for the weight parameters can be obtained directly via maximum likelihood estimation. For the theoretical derivation, please refer to the earlier post on the direct solution.
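For reference, the closed-form result implemented below is the normal equation, where X is the design matrix including the offset column and y is the vector of labels:

theta = (Xᵀ X)⁻¹ Xᵀ y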
# numpy.linalg provides the matrix inverse
from numpy import linalg as la
xtx = X.transpose().dot(X)
xtx = la.inv(xtx)
# compute the parameters directly with the normal equation
theta = xtx.dot(X.transpose()).dot(y)
This solution is very simple: we just apply the formula directly, and the computed weight parameters are:
array([0.29348374, 0.10224818, 0.19596799])
That is, the offset is about 0.293, the weight of the first feature is about 0.102, and the weight of the second feature is about 0.196.
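As a quick cross-check (a sketch that is not part of the original post), NumPy's built-in least-squares solver should return essentially the same parameters:

# cross-check the normal-equation result with NumPy's least-squares solver
theta_check, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(theta_check)   # expected to be close to [0.2935, 0.1022, 0.1960]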
Next, let us solve it with gradient descent, which is the focus of this post. This procedure is shared with other machine learning algorithms, such as logistic regression, so it is well worth studying.
03 Solving the Parameters by Gradient Descent
For a detailed introduction to gradient descent, please refer to the earlier post on solving the weight parameters with gradient descent. Next, we discuss how to turn the theory into code.
First, let us list the steps of gradient descent: build the linear regression model, compute the cost function, and then derive the gradient (taking the partial derivatives is the important step). Next, set a learning rate and iterate on the parameters; when the difference between the cost of the previous step and the current cost is less than a threshold, the computation stops. The components are as follows:
- model: the linear regression model
- cost: the cost function
- gradient: the gradient formula
- theta update: the parameter update formula
- stop strategy: the iteration stop strategy (stop when the decrease of the cost function is below a threshold)
The concrete implementation of each of the five components above is written out below.
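For reference (in the notation of the theory posts, with m samples, learning rate sigma, and x_i the i-th row of X), the quantities implemented below are:

J(theta)      = (1/(2m)) * sum_i (y_i - theta·x_i)^2            (cost)
dJ/dtheta_j   = -(1/m) * sum_i (y_i - theta·x_i) * x_ij         (gradient)
theta_j       := theta_j - sigma * dJ/dtheta_j                  (parameter update)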
# model: linear regression prediction X·theta
def model(theta, X):
    theta = np.array(theta)
    return X.dot(theta)
# cost: squared-error cost, sum of (y - X·theta)^2 over all samples, divided by 2m
def cost(m, theta, X, y):
    # print(theta)
    ele = y - model(theta, X)
    item = ele ** 2
    item_sum = np.sum(item)
    return item_sum / 2 / m
# gradient: partial derivative of the cost with respect to each theta_j
def gradient(m, theta, X, y, cols):
    grad_theta = []
    for j in range(cols):
        grad = (y - model(theta, X)).dot(X[:, j])
        grad_sum = np.sum(grad)
        grad_theta.append(-grad_sum / m)
    return np.array(grad_theta)
# theta update: theta := theta - sigma * gradient
def theta_update(grad_theta, theta, sigma):
    return theta - sigma * grad_theta
# stop strategy: stop when the decrease of the cost function is below the threshold
def stop_stratege(cost, cost_update, threshold):
    return cost - cost_update < threshold
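A possible variation (a sketch, not from the original post): since the cost can also increase when the learning rate is too large, a slightly more defensive criterion compares the absolute change instead:

# alternative stop strategy (sketch): stop when the absolute change in cost is below the threshold
def stop_stratege_abs(cost, cost_update, threshold):
    return abs(cost - cost_update) < threshold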
# OLS algorithm: solve the least squares problem by gradient descent
def OLS(X, y, threshold):
    start = time.perf_counter()
    # number of samples
    m = 100
    # initial values of the weight parameters (offset + 2 features)
    theta = [0, 0, 0]
    # iteration counter
    iters = 0
    # record of the cost function values
    cost_record = []
    # learning rate
    sigma = 0.0001
    cost_val = cost(m, theta, X, y)
    cost_record.append(cost_val)
    while True:
        grad = gradient(m, theta, X, y, 3)
        # parameter update
        theta = theta_update(grad, theta, sigma)
        cost_update = cost(m, theta, X, y)
        if stop_stratege(cost_val, cost_update, threshold):
            break
        iters = iters + 1
        cost_val = cost_update
        cost_record.append(cost_val)
    end = time.perf_counter()
    print("OLS convergence duration: %f s" % (end - start))
    return cost_record, iters, theta
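A minimal sketch of how the run and the cost curve can be reproduced; the threshold value here is an assumption, not a value given in the original post:

# run gradient descent and plot the recorded cost value of each iteration
cost_record, iters, theta = OLS(X, y, threshold=1e-10)
plt.plot(range(len(cost_record)), cost_record)
plt.xlabel("iteration")
plt.ylabel("cost")
plt.show()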
The results show that OLS gradient descent reached convergence after about 7.46 seconds (OLS convergence duration: 7.456927 s) and more than 30,000 iterations. The value of the cost function at each iteration is shown in the figure below:
At convergence, the weight parameters obtained are:
array([0.29921652, 0.09754371, 0.1867609 ])
As we can see, the weight parameters obtained by gradient descent are very close to those from the direct method; the remaining difference comes from stopping the iteration before it has fully converged.
04 Summary
The above is the code implementation of the two solutions of the least squares method. So far, we have covered the most basic linear regression algorithm, OLS, from theory and hypothesis all the way to the code implementation. Now take a look at the distant sea and the lofty mountains, and relax!
However, when faced with data sets where two or more columns are strongly correlated, is OLS still competent? If not, what are the reasons?
Please see tomorrow’s post on the defects of the OLS algorithm and the reasons behind them.
Welcome to the Algorithm Channel!
We exchange ideas with a focus on analysis and process, including but not limited to: classical algorithms, machine learning, deep learning, LeetCode solutions, Kaggle practice, and English salons; experts are regularly invited to contribute posts. Looking forward to seeing you!