Logistic regression is a generalized linear model: it constructs a regression function and uses it, via machine learning, for classification or prediction.

The principle

The last article briefly introduced linear regression, and the principle of logistic regression is similar.

  1. Prediction function (h): the classification function used to predict the class of the input data. A critical step is deciding the "rough form" of this function, for example whether it is linear or non-linear. This article follows the corresponding chapter of Machine Learning in Action; here is a sample of the data set.

# Two features (x1, x2) and a class label
-0.017612   14.053064   0
-1.395634    4.662541   1
-0.752157    6.538620   0
-1.322371    7.152853   0
 0.423363   11.054677   0
 0.406704    7.067335   1

  1. Cost function (loss function): measures the deviation between the output h predicted by the function and the training label y, for example (h - y) or some other form. Summing or averaging this cost over all training data gives the J function, which represents the overall deviation between the predicted and actual values on the training set.

  2. Clearly, the smaller J is, the more accurate the prediction function h is, so we need to find the parameter values that minimize J. Gradient descent is a common way to do this; its generic update rule is sketched just below.
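For reference, the generic update that gradient descent applies to drive J toward its minimum is the standard rule (the original formula images are not reproduced here):

w := w - alpha * dJ/dw

where alpha is the step size (learning rate) and dJ/dw is the gradient of J with respect to the parameters.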

The specific process

Constructing prediction function

Although it is called regression, logistic regression is actually a classifier and is used for binary classification problems. The sigmoid function is given directly here.
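The standard sigmoid, implemented by the sigmoid helper in the code below, is:

g(z) = 1 / (1 + e^(-z))

It maps any real z into (0, 1), so its output can be read as the probability of class 1 and thresholded at 0.5.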

Next, determine the classification boundary. As mentioned above, this data set requires a linear boundary. Different data requires different boundaries.

With the classification function determined, call its input z; then:
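For this two-feature data set with a linear boundary, a standard way to write z and the resulting prediction (matching the code below) is:

z = w0*x0 + w1*x1 + w2*x2 = w^T * x
h(x) = g(z) = 1 / (1 + e^(-w^T * x))

where x0 = 1 is the constant term added in init_data.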

The cost function

Based on the above, the prediction function is:
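The original formula image is not reproduced here; restating the prediction function and using a standard cross-entropy cost consistent with the gradient update in the code gives:

h_w(x) = g(w^T * x)
J(w) = -(1/m) * sum_i [ y_i * log(h_w(x_i)) + (1 - y_i) * log(1 - h_w(x_i)) ]

Maximizing the corresponding log-likelihood is equivalent to minimizing J.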

Logistic regression summary

Given the above cost function, we can use gradient ascent on the log-likelihood (equivalently, gradient descent on J) to find the optimal weights. See the link above for the derivation.

To sum up, the gradient update formula is as follows:
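In vectorized form, and matching what the grad_descent code below computes, the gradient-ascent update on the log-likelihood is:

h = g(X * w)
w := w + alpha * X^T * (y - h)

where X is the m-by-n feature matrix, y the label vector, and alpha the step size.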

Here is the Python code implementation:

# sigmoid function and data initialization
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def init_data():
    data = np.loadtxt('data.csv')
    dataMatIn = data[:, 0:-1]   # feature columns
    classLabels = data[:, -1]   # label column
    dataMatIn = np.insert(dataMatIn, 0, 1, axis=1)  # prepend constant term x0 = 1 to the features
    return dataMatIn, classLabels
def grad_descent(dataMatIn, classLabels):
    dataMatrix = np.mat(dataMatIn)  # (m, n)
    labelMat = np.mat(classLabels).transpose()
    m, n = np.shape(dataMatrix)
    weights = np.ones((n, 1))  # initial regression coefficients, shape (n, 1)
    alpha = 0.001  # step size
    maxCycle = 500  # maximum number of iterations

    for i in range(maxCycle):
        h = sigmoid(dataMatrix * weights)  # sigmoid of all samples at once
        weights = weights + alpha * dataMatrix.transpose() * (labelMat - h)  # gradient ascent update
    return weights
# Compute the result
if __name__ == '__main__':
    dataMatIn, classLabels = init_data()
    r = grad_descent(dataMatIn, classLabels)
    print(r)

The output is as follows:

[[ 4.12414349]
 [ 0.48007329]
 [-0.6168482 ]]

Here w is the vector of regression coefficients: w0 = 4.1241, w1 = 0.4800, w2 = -0.6168. Plugging them into the linear boundary assumed earlier, 0 = w0*x0 + w1*x1 + w2*x2, determines the decision boundary, which can be solved for x2 = (-w0 - w1*x1) / w2.
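As a quick sanity check on these numbers: at x1 = 0 the boundary sits at x2 = -w0/w2 ≈ 4.124 / 0.617 ≈ 6.69, and its slope is -w1/w2 ≈ 0.78, which should match the line drawn by the plotting code below.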

Plot the decision boundary:

import matplotlib.pyplot as plt

def plotBestFit(weights):
    dataMatIn, classLabels = init_data()
    n = np.shape(dataMatIn)[0]
    xcord1 = []
    ycord1 = []
    xcord2 = []
    ycord2 = []
    for i in range(n):
        if classLabels[i] == 1:
            xcord1.append(dataMatIn[i][1])
            ycord1.append(dataMatIn[i][2])
        else:
            xcord2.append(dataMatIn[i][1])
            ycord2.append(dataMatIn[i][2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = np.arange(-1, 2, 0.1)
    y = (-weights[0, 0] - weights[1, 0] * x) / weights[2, 0]  # weights is a matrix, hence the [i, 0] indexing
    ax.plot(x, y)
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()

The resulting plot shows the two classes and the fitted decision boundary.

Improving the algorithm

Stochastic gradient ascent

In the algorithm above, each iteration performs a matrix multiplication over all m samples and n features, so the time complexity is maxCycles * m * n, which becomes expensive when the data set is large. Here we try stochastic gradient ascent instead. Its idea is to update the regression coefficients using only one sample point at a time, which greatly reduces the computational cost. The algorithm is as follows:

def stoc_grad_ascent(dataMatIn, classLabels):
    m, n = np.shape(dataMatIn)
    alpha = 0.01
    weights = np.ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatIn[i] * weights))  # scalar value, not a matrix operation
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatIn[i]
    return weights

Test:
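A minimal way to run this test (a sketch assuming the init_data and plotBestFit helpers defined above) is:

if __name__ == '__main__':
    dataMatIn, classLabels = init_data()
    weights = stoc_grad_ascent(dataMatIn, classLabels)
    print(weights)
    plotBestFit(np.mat(weights).transpose())  # reshape to a column matrix for the plotting helper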

Improvement of stochastic gradient ascent

def stoc_grad_ascent_one(dataMatIn, classLabels, numIter=150):
    m, n = np.shape(dataMatIn)
    weights = np.ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            alpha = 4 / (1 + i + j) + 0.01  # alpha shrinks with iterations but never reaches 0, so later updates still matter
            randIndex = int(np.random.uniform(0, len(dataIndex)))
            sampleIndex = dataIndex[randIndex]  # pick a random, not-yet-used sample for this update
            h = sigmoid(sum(dataMatIn[sampleIndex] * weights))  # scalar value
            error = classLabels[sampleIndex] - h
            weights = weights + alpha * error * dataMatIn[sampleIndex]
            del(dataIndex[randIndex])
    return weights

Plotting how the regression coefficients fluctuate in the three cases above shows that the third method converges fastest. To judge an algorithm's quality here, look at whether the coefficients converge and settle to stable values; the faster the convergence, the better.

Conclusion

Gradient ascent and gradient descent are essentially the same thing: to maximize a function instead of minimizing it, you simply flip the sign of the update. The gradient tells you how far to move in each direction, for example some distance along x and some along y, given by the partial derivatives with respect to x and y.
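Written out, the two updates differ only in sign (standard forms, stated for completeness):

gradient ascent  (maximize f):  w := w + alpha * grad f(w)
gradient descent (minimize f):  w := w - alpha * grad f(w)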

For the complete code, see github: Logistic Regression

Reference articles: Logistic Regression for Machine Learning; Implementing Machine Learning in Python notes: Logistic Regression; Basic Algorithms for Machine Learning series.