
The target

Using only Python, its numerical library NumPy, and the plotting library Matplotlib, we will build a linear model from start to finish: reading the data set, preparing the data, choosing a function set, defining the objective (loss) function, and finally updating the parameters with gradient descent to minimize the loss step by step. The aim is to give you an overview of the whole process of implementing a model. Along the way, the key or difficult points are explained where necessary; if anything in the article is inaccurate or not detailed enough, feel free to leave a comment to discuss.

A small request

  • Familiarity with basic calculus, such as taking derivatives
  • The ability to write Python code
  • A little knowledge of NumPy and Matplotlib
%matplotlib inline
# import libs
import numpy as np
import matplotlib.pyplot as plt

Getting a data set

As usual, we first take a look at the data set. It is quite simple: only 97 records, each with two columns. The first column is used as the sample attribute (feature), and the second column as the sample's label. If you need the data set, leave me a comment and I will send it to you.
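Before parsing anything, it can help to peek at the raw file; here is a minimal sketch (assuming the file sits at data/dataset.txt, as set below):

# print the first few raw lines to confirm the "x,y" one-record-per-line format
with open('data/dataset.txt') as f:
    for _, line in zip(range(3), f):
        print(line.rstrip())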

datafile = 'data/dataset.txt'

To read the data we use np.loadtxt; its delimiter parameter specifies the field separator, and here the fields in each row are separated by commas.

# load dataset: comma-separated, first column is x, second is y (label)
cols = np.loadtxt(datafile, delimiter=',', usecols=(0, 1), unpack=True)

Because unpack=True transposes the result, the data comes back with one attribute per row and one record per column; we therefore use the variable cols to receive it, meaning the data is read column-wise.
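If the effect of unpack=True is still unclear, here is a tiny sketch using an in-memory file (io.StringIO stands in for the real data file, and the values are made up purely for illustration): each column of the text becomes one row of the returned array.

import io
import numpy as np

demo = io.StringIO("1.0,2.0\n3.0,4.0\n5.0,6.0")
cols_demo = np.loadtxt(demo, delimiter=',', unpack=True)
print(cols_demo.shape)  # (2, 3): one row per column of the file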

cols.shape  # (2, 97)
(2, 97)

Next we take the first row of cols (that is, the attribute values from the first column of each sample) as the sample matrix X, and the second row as the labels y.

X = np.transpose(np.array(cols[:-1]))
Copy the code
X.shape  # (97, 1)
# shape: [[x1], [x2], ..., [xn]]
(97, 1)
y = np.transpose(np.array(cols[-1:]))
y.shape # (97, 1)
(97, 1)
# Sample size
m = y.size
m
97

Next we use np.insert(X, 0, 1, axis=1) to add a bias column. X currently has the form [[x1], [x2], ..., [xn]]; inserting the value 1 at position 0 along axis 1 prepends a 1 to every record, so X becomes [[1, x1], [1, x2], ..., [1, xn]].

X = np.insert(X, 0, 1, axis=1)
X.shape  # (97, 2)
(97, 2)
X[:2]
array([[1.    , 6.1101],
       [1.    , 5.5277]])
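An equivalent way to build the bias column (just a sketch, using a hypothetical X_raw that stands for the original (97, 1) feature column) is to stack a column of ones in front with np.hstack:

# hypothetical 2-sample feature column, standing in for the original (97, 1) X
X_raw = np.array([[6.1101], [5.5277]])
X_with_bias = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
print(X_with_bias)  # each row becomes [1, x], same result as np.insert above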

Let's check the shapes of X and y, and then use Matplotlib to plot the sample points.

X.shape, y.shape
((97, 2), (97, 1))
plt.figure(figsize=(10, 6))
plt.plot(X[:,1], y[:,0], 'rx', markersize=10)
plt.grid(True)
plt.ylabel('y')
plt.xlabel('x')
Text(0.5, 0, 'x')

Find a set of functions

What we want to find is a set of functions of the form $y = mx + c$, where the output is a linear combination of $x$, $m$, and $c$. We write this more formally as $h_{\theta}(x) = \Theta^T x = \theta_1 x + \theta_0$, where $\theta_1$ is the slope of the linear equation and $\theta_0$ is the intercept, and the parameter vector is $\Theta = \begin{bmatrix}\theta_0 \\ \theta_1\end{bmatrix}$.


$$h_{\theta}(x) = \theta^T x = \theta_0 + \theta_1 x$$

iterations = 1500
alpha = 0.01
# h(x) = \theta_0 + \theta_1 x
def hypothesis_fxn(theta, X):
    # X is (97, 2), theta is (2, 1) -> output shape (97, 1)
    return np.dot(X, theta)
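A quick sanity check of hypothesis_fxn (a sketch; theta_zero is a throwaway name used only here): with all-zero parameters every prediction should be 0 and the output shape should be (97, 1).

theta_zero = np.zeros((X.shape[1], 1))
preds = hypothesis_fxn(theta_zero, X)
print(preds.shape)         # (97, 1)
print(np.all(preds == 0))  # True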

Cost function (loss function)

The cost function measures how much error the parameters of the current function produce. In it, $h_{\theta}(x^{(i)})$ is the model's output for the i-th sample and $y^{(i)}$ is the true value, so the function computes the difference between the predicted value and the true value. Since this is a linear example, $(h_{\theta}(x^{(i)}) - y^{(i)})$ can be positive or negative; squaring the errors before summing them cancels out the sign, and we can also take the average. Regression has a number of cost function variants, such as mean squared error (MSE) and mean absolute error (MAE). Here we use

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2$$

def cost_fxn(theta, X, y):
    h = hypothesis_fxn(theta, X)  # predictions, shape (m, 1)
    d = h - y                     # prediction errors

    c = 1 / (2 * m)
    loss = c * np.sum(d ** 2)
    return loss
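For comparison, here is a minimal sketch of the mean absolute error (MAE) variant mentioned above; mae_cost_fxn is a name introduced only for illustration, and the rest of the article keeps using the squared-error cost.

def mae_cost_fxn(theta, X, y):
    # mean absolute error: average of |prediction - truth|, no squaring
    h = hypothesis_fxn(theta, X)
    return np.sum(np.abs(h - y)) / m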

Initial parameter value

We initialize the parameters the model will learn (the weight and the bias) to zero, and then compute the value of the loss function before any learning has happened.

# theta has shape (2, 1)
initial_theta = np.zeros((X.shape[1], 1))
initial_theta.shape
(2, 1)
cost_fxn(initial_theta, X, y)
32.072733877455676

Start training

We start from a simple parameter value (usually 0), compute the gradient, and update the parameters using the gradient at the current parameter values. To minimize the cost function, we take the derivative of the cost function with respect to the parameters. The derivative of $J(\theta)$ with respect to $\theta$ gives the direction in which to adjust the parameters, while the size of the step is controlled by $\alpha$, the step size of the gradient update, also known as the learning rate.


$$\theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
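As a side note, this update can also be written in one vectorized step instead of a loop over j; here is a minimal sketch of the same rule (it is not the code used below, just an equivalent form, and gradient_step_vectorized is a name introduced only for illustration):

def gradient_step_vectorized(theta, X, y):
    # grad_j = (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i), computed for all j at once
    grad = (1.0 / m) * X.T.dot(hypothesis_fxn(theta, X) - y)
    return theta - alpha * grad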

Hyperparameters

  • Number of iterations (iterations)
  • Learning rate (alpha)
def gradient_descent(X, theta=np.zeros((2, 1))):
    # record the cost and the parameters at every iteration so we can plot them later
    costs = []
    theta_history = []
    for i in range(iterations):
        # loss for the current parameters
        c = cost_fxn(theta, X, y)
        costs.append(c)
        theta_history.append(list(theta[:, 0]))

        # update every theta_j using the gradient evaluated at the current theta
        temp_theta = theta.copy()
        for j in range(len(temp_theta)):
            temp_theta[j] = theta[j] - (alpha / m) * np.sum(
                (hypothesis_fxn(theta, X) - y) * np.array(X[:, j]).reshape(m, 1))
        theta = temp_theta
    return theta, theta_history, costs
costs and theta_history store the loss value and the parameter values at every iteration, so that we can plot them later and understand the training process by looking at the curves.
initial_theta = np.zeros((X.shape[1], 1))
initial_theta
array([[0.],
       [0.]])
list(initial_theta[:,0])
[0.0, 0.0]
theta, theta_history, jvec = gradient_descent(X, initial_theta)
jvec = np.array(jvec).reshape(-1, 1)
def plot_convergence(jvec):
    plt.figure(figsize=(10, 6))
    plt.plot(range(len(jvec)), jvec, 'co')

    plt.grid(True)
    plt.title('Convergence of Cost Function')
    plt.xlabel("Iteration number")
    plt.ylabel("cost function")
    dummy = plt.xlim([-0.05 * iterations, 1.05 * iterations])
plot_convergence(jvec)
# dummy = plt.ylim([4, 7])

thetas = np.array(theta_history)
plt.plot(thetas)
plt.grid(True)
plt.legend([r"$\Theta_0$", r"$\Theta_1$"])
plt.ylabel(r"$\Theta$")
Text(0, 0.5, '$\\Theta$')

To predict

Once the model is trained, we can draw the line corresponding to the learned parameters and see how well it fits the sample points.

def prediction(theta, x):
    return theta[0] + theta[1] * x
pred = prediction(theta,X[:,1])
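The same prediction can also be obtained through hypothesis_fxn by prepending the bias 1 to the input; here is a small sketch using a purely illustrative input value of 7.0:

x_new = np.array([[1.0, 7.0]])       # [bias, x] for an illustrative input of 7.0
print(hypothesis_fxn(theta, x_new))  # should match prediction(theta, 7.0)
print(prediction(theta, 7.0))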
plt.plot(X[:,1], pred)
plt.plot(X[:,1], y[:,0], 'rx', markersize=10, label='Training Data')
[<matplotlib.lines.Line2D at 0x124e36a90>]