Since the first lecture of this series went out, the response from readers has been very encouraging. Some have asked for more detail in the mathematical derivations; I can only promise to do my best, because typesetting formulas takes a great deal of time. In this lecture we study the logistic regression model, and we continue the same style as before: derive the formulas by hand, then implement them in pure Python.


Logistic regression has little to do with "logic" as such; the name is simply a transliteration. What should it be called, then? A more accurate name would be log-odds regression. It is a generalization of linear regression, which is why statisticians also place it among the generalized linear models. As we know, linear regression handles machine learning tasks whose labels are continuous values, so can a linear model be used for classification at all? The answer, of course, is yes.


The sigmoid function

While the dependent variable y of linear regression is a continuous value, the dependent variable of logistic regression is a binary 0/1 value. We therefore need a mapping that transforms the real-valued output of the linear model into a value between 0 and 1. This is where the familiar sigmoid function comes in:
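For a real-valued input $z$, it is defined as:

$$\sigma(z)=\frac{1}{1+e^{-z}}$$

It squashes any real number into the open interval $(0, 1)$, which can be read directly as a probability.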


Its function graph is as follows:
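If you want to reproduce the curve yourself, here is a minimal sketch (it only assumes numpy and matplotlib, both of which are used later in this post):

import numpy as np
import matplotlib.pyplot as plt

# Evaluate the sigmoid on an evenly spaced grid and plot the S-shaped curve
z = np.linspace(-10, 10, 200)
plt.plot(z, 1 / (1 + np.exp(-z)))
plt.xlabel('z')
plt.ylabel('sigmoid(z)')
plt.show()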


Besides its elegant shape, the sigmoid function has another nice property: its derivative can be expressed in terms of the function itself, which will make it much easier to compute the gradient of the cross-entropy loss later on.
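In symbols:

$$\sigma'(z)=\sigma(z)\,\bigl(1-\sigma(z)\bigr)$$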


Mathematical derivation of logistic regression model

Applying the sigmoid function to the output of a linear model gives the basic form of the logistic regression model:
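Using a weight vector $W$ and bias $b$ (the same notation as in the code below):

$$\hat{y}=\sigma\left(W^{T}x+b\right)=\frac{1}{1+e^{-(W^{T}x+b)}}$$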


Rearranging the above equation slightly:
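This gives the log-odds (logit) form, which is linear in $x$:

$$\ln\frac{\hat{y}}{1-\hat{y}}=W^{T}x+b$$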


If we view $\hat{y}$ as the class posterior probability $p(y=1 \mid x)$, the equation can be rewritten as:
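That is, the log-odds of the positive class is modeled as a linear function of the input:

$$\ln\frac{p(y=1\mid x)}{p(y=0\mid x)}=W^{T}x+b$$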


From this we obtain:
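Solving for the two class probabilities:

$$p(y=1\mid x)=\frac{e^{W^{T}x+b}}{1+e^{W^{T}x+b}}=\hat{y},\qquad p(y=0\mid x)=\frac{1}{1+e^{W^{T}x+b}}=1-\hat{y}$$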


Combining the two cases above into one expression, the likelihood of a single sample can be written as:
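Since $y$ only takes the values 0 and 1, both cases collapse into:

$$p(y\mid x)=\hat{y}^{\,y}\,(1-\hat{y})^{\,1-y}$$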


Written in (negative) logarithmic form over the training set, this becomes what we know as the cross-entropy loss function; this is precisely where the cross-entropy loss comes from:
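For $N$ training samples, the averaged negative log-likelihood is:

$$L(W,b)=-\frac{1}{N}\sum_{i=1}^{N}\left[y_{i}\ln\hat{y}_{i}+(1-y_{i})\ln(1-\hat{y}_{i})\right]$$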


Minimizing this loss is essentially what statisticians call maximum likelihood estimation. From the formula above, the partial derivatives with respect to $W$ and $b$ can be obtained:
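Using the sigmoid derivative identity from earlier, the gradients simplify to:

$$\frac{\partial L}{\partial W}=\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_{i}-y_{i}\right)x_{i},\qquad \frac{\partial L}{\partial b}=\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_{i}-y_{i}\right)$$

In matrix form the first expression is $X^{T}(\hat{y}-y)/N$, which is exactly what the code below computes as dW.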

By updating the weights along the gradients of $W$ and $b$, we drive the loss function down to its minimum and, at the same time, obtain the maximum likelihood estimate of the parameters: different roads, same destination.
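With a learning rate $\alpha$, each gradient descent step is simply:

$$W \leftarrow W-\alpha\,\frac{\partial L}{\partial W},\qquad b \leftarrow b-\alpha\,\frac{\partial L}{\partial b}$$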


A Python implementation of logistic regression

Just as we did in the previous lecture on the linear regression model, we should think through the pieces before writing any code. A complete logistic regression implementation needs: a sigmoid function, the model body, parameter initialization, gradient-descent training to update the parameters, prediction on test data, and a visual presentation of the result.


Start by defining a sigmoid function:

import numpy as np
def sigmoid(x):
    z = 1 / (1 + np.exp(-x))    
    return z


Define the model parameter initialization function:

def initialize_params(dims):
    W = np.zeros((dims, 1))
    b = 0
    return W, b


Define the main body of the logistic regression model, including the model formula, the loss function, and the parameter gradients:

def logistic(X, y, W, b):
    num_train = X.shape[0]
    num_feature = X.shape[1]

    # Model output and average cross-entropy loss
    a = sigmoid(np.dot(X, W) + b)
    cost = -1/num_train * np.sum(y*np.log(a) + (1-y)*np.log(1-a))

    # Gradients of the loss with respect to W and b
    dW = np.dot(X.T, (a-y))/num_train
    db = np.sum(a-y)/num_train
    cost = np.squeeze(cost)

    return a, cost, dW, db


Define the training process, which updates the parameters by gradient descent:

def logistic_train(X, y, learning_rate, epochs):    
    # Initialize model parameters
    W, b = initialize_params(X.shape[1])  
    cost_list = []  

    # Iterative training
    for i in range(epochs):
        # Compute the current model output, loss, and parameter gradients
        a, cost, dW, db = logistic(X, y, W, b)
        # Update parameters
        W = W - learning_rate * dW
        b = b - learning_rate * db

        # Record and print the loss every 100 epochs
        if i % 100 == 0:
            cost_list.append(cost)
            print('epoch %d cost %f' % (i, cost))

    # save parameters
    params = {            
        'W': W,            
        'b': b
    }        
    # Save gradient
    grads = {            
        'dW': dW,            
        'db': db
    }           
    return cost_list, params, grads


Define the prediction function for the test data:

def predict(X, params):
    y_prediction = sigmoid(np.dot(X, params['W']) + params['b'])
    for i in range(len(y_prediction)):
        # Threshold the predicted probability at 0.5
        if y_prediction[i] > 0.5:
            y_prediction[i] = 1
        else:
            y_prediction[i] = 0
    return y_prediction


Use sklearn to generate a simulated binary classification dataset for training and testing the model:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

X, labels = make_classification(n_samples=100, n_features=2, n_redundant=0,
                                n_informative=2, random_state=1, n_clusters_per_class=2)
rng = np.random.RandomState(2)
X += 2 * rng.uniform(size=X.shape)

unique_labels = set(labels)
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
for k, col in zip(unique_labels, colors):
    x_k = X[labels == k]
    plt.plot(x_k[:, 0], x_k[:, 1], 'o', markerfacecolor=col, markeredgecolor="k",
             markersize=14)
plt.title('data by make_classification()')
plt.show()


Data distribution is shown as follows:


Split the data into a simple training set and test set:

labels = labels.reshape((-1, 1))   # column vector, matching the shape of the model output
offset = int(X.shape[0] * 0.9)
X_train, y_train = X[:offset], labels[:offset]
X_test, y_test = X[offset:], labels[offset:]

print('X_train=', X_train.shape)
print('X_test=', X_test.shape)
print('y_train=', y_train.shape)
print('y_test=', y_test.shape)


Train the model on the training set:

cost_list, params, grads = logistic_train(X_train, y_train, 0.01, 1000)


The iterative process is as follows:


Make predictions on the test set:

y_prediction = predict(X_test, params)
print(y_prediction)


The predicted results are as follows:


Define a classification accuracy function to evaluate the accuracy on the training and test sets:

def accuracy(y_test, y_pred):
    correct_count = 0
    for i in range(len(y_test)):
        # Count positions where the prediction matches the label
        if y_test[i] == y_pred[i]:
            correct_count += 1

    accuracy_score = correct_count / len(y_test)
    return accuracy_score
    
# Compute predictions on the training set and print training accuracy
y_train_pred = predict(X_train, params)
accuracy_score_train = accuracy(y_train, y_train_pred)
print(accuracy_score_train)


Check the accuracy on the test set:

accuracy_score_test = accuracy(y_test, y_prediction)
print(accuracy_score_test)


Without cross-validation, the test-set accuracy is partly a matter of chance. Finally, we define a plotting function that draws the model's decision boundary, so the training result can be inspected visually:
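For reference, the decision boundary we draw is the set of points where the predicted probability equals 0.5, i.e. where the linear score is zero; solving for the second feature gives the line computed inside the plotting code:

$$W_{1}x_{1}+W_{2}x_{2}+b=0 \;\Longrightarrow\; x_{2}=\frac{-b-W_{1}x_{1}}{W_{2}}$$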

def plot_logistic(X_train, y_train, params):
    n = X_train.shape[0]
    xcord1 = []
    ycord1 = []
    xcord2 = []
    ycord2 = []    
    for i in range(n):        
        if y_train[i] == 1:
            xcord1.append(X_train[i][0])
            ycord1.append(X_train[i][1])        
        else:
            xcord2.append(X_train[i][0])
            ycord2.append(X_train[i][1])
        
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=32, c='red')
    ax.scatter(xcord2, ycord2, s=32, c='green')
    # Decision boundary: W[0]*x1 + W[1]*x2 + b = 0
    x = np.arange(-1.5, 3, 0.1)
    y = (-params['b'] - params['W'][0] * x) / params['W'][1]
    ax.plot(x, y)
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()

plot_logistic(X_train, y_train, params)


Packaging the components into a logistic regression class

Encapsulate the above implementation with a Python class:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

class logistic_regression():    
    def __init__(self):        
        pass

    def sigmoid(self, x):
        z = 1 / (1 + np.exp(-x))        
        return z    
        
    def initialize_params(self, dims):
        W = np.zeros((dims, 1))
        b = 0
        return W, b    
    
    def logistic(self, X, y, W, b):
        num_train = X.shape[0]
        num_feature = X.shape[1]

        a = self.sigmoid(np.dot(X, W) + b)
        cost = -1 / num_train * np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

        dW = np.dot(X.T, (a - y)) / num_train
        db = np.sum(a - y) / num_train
        cost = np.squeeze(cost)        
        return a, cost, dW, db    
        
    def logistic_train(self, X, y, learning_rate, epochs):
        W, b = self.initialize_params(X.shape[1])
        cost_list = []        
        for i in range(epochs):
            a, cost, dW, db = self.logistic(X, y, W, b)
            W = W - learning_rate * dW
            b = b - learning_rate * db            
            if i % 100 == 0:
                cost_list.append(cost)            
            if i % 100 == 0:
                print('epoch %d cost %f' % (i, cost))

        params = {
            'W': W, 
            'b': b
        }
        grads = {            
            'dW': dW,            
            'db': db
        }        
        
        return cost_list, params, grads    
        
    def predict(self, X, params):
        y_prediction = self.sigmoid(np.dot(X, params['W']) + params['b'])
        for i in range(len(y_prediction)):
            # Threshold the predicted probability at 0.5
            if y_prediction[i] > 0.5:
                y_prediction[i] = 1
            else:
                y_prediction[i] = 0

        return y_prediction
            
    def accuracy(self, y_test, y_pred):
        correct_count = 0
        for i in range(len(y_test)):
            # Count positions where the prediction matches the label
            if y_test[i] == y_pred[i]:
                correct_count += 1

        accuracy_score = correct_count / len(y_test)
        return accuracy_score

    def create_data(self):
        X, labels = make_classification(n_samples=100, n_features=2, n_redundant=0,
                                        n_informative=2, random_state=1,
                                        n_clusters_per_class=2)
        labels = labels.reshape((-1, 1))
        offset = int(X.shape[0] * 0.9)
        X_train, y_train = X[:offset], labels[:offset]
        X_test, y_test = X[offset:], labels[offset:]
        return X_train, y_train, X_test, y_test
        
    def plot_logistic(self, X_train, y_train, params):
        n = X_train.shape[0]
        xcord1 = []
        ycord1 = []
        xcord2 = []
        ycord2 = []        
        for i in range(n):            
            if y_train[i] == 1:
                xcord1.append(X_train[i][0])
                ycord1.append(X_train[i][1])            
            else:
                xcord2.append(X_train[i][0])
                ycord2.append(X_train[i][1])
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.scatter(xcord1, ycord1, s=32, c='red')
        ax.scatter(xcord2, ycord2, s=32, c='green')
        # Decision boundary: W[0]*x1 + W[1]*x2 + b = 0
        x = np.arange(-1.5, 3, 0.1)
        y = (-params['b'] - params['W'][0] * x) / params['W'][1]
        ax.plot(x, y)
        plt.xlabel('X1')
        plt.ylabel('X2')
        plt.show()

            
if __name__ == "__main__":
    model = logistic_regression()
    X_train, y_train, X_test, y_test = model.create_data()
    print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
    cost_list, params, grads = model.logistic_train(X_train, y_train, 0.01, 1000)
    print(params)
    y_train_pred = model.predict(X_train, params)
    accuracy_score_train = model.accuracy(y_train, y_train_pred)
    print('train accuracy is:', accuracy_score_train)
    y_test_pred = model.predict(X_test, params)
    accuracy_score_test = model.accuracy(y_test, y_test_pred)
    print('test accuracy is:', accuracy_score_test)
    model.plot_logistic(X_train, y_train, params)


Well, that wraps up this introduction to logistic regression; how well it classifies on real problems is something you will need to try for yourself. Logistic regression is also closely related to the perceptron, to neural networks, and to deep learning. As a fundamental machine learning model, it is worth firmly grasping both its mathematical derivation and a from-scratch implementation before reaching for sklearn's LogisticRegression module; doing so fills in the gaps and pays off broadly.


References:

Zhou Zhihua. Machine Learning. Tsinghua University Press, 2016.