In my previous articles I covered linear regression, so this time I would like to share my understanding of logistic regression with you.
1. What is the classification problem?
It is actually pretty straightforward: if you have an apple and an orange in your hand, the classification problem is how to write an algorithm that tells the computer which one is the apple and which one is the orange.
The output of a classification problem is a discrete value, such as 1 for apple and 0 for orange. The output of the linear regression we learned before, however, is continuous, such as a predicted housing price, which cannot be expressed with 0 and 1.
So remember one thing: classification problems output discrete values, linear regression problems output continuous values.
2. What is logistic regression?
The logistic regression we are going to learn today solves a classification problem. You may have doubts about the name "logistic regression": since it solves a classification problem, why is it called regression? Why not call it "logistic classification"?
It feels a little weird to me too, but the masters who named it were born before us. If we had been born earlier and invented this algorithm, we might well have called it logistic classification.
Since the name can't be changed, we just have to accept it and remember that it is a classification algorithm.
3. The hypothesis function of logistic regression
Remember the hypothesis function of linear regression, the model used for prediction? There we used a polynomial, but for the classification problem we have to change the model. Why?
Very simple: from the definitions above, a linear regression problem outputs a continuous value (a housing price), while logistic regression outputs a discrete value (0 or 1). Since the model's output is different, we need to choose a function whose output can be turned into a discrete value:
$$h_\theta(x) = g(\theta^T x)$$

where $x$ is the feature vector and $\theta$ is the vector of parameters to be learned.
But in machine learning classification problems, the model usually first determines the probability of 0 or 1 rather than outputting 0 or 1 directly from the instance data. For example, the model predicts that the probability of apple is 90% and the probability of orange is 10% (the probabilities sum to 1); the model then concludes the fruit is most likely an apple, so it outputs 1 to indicate that the currently recognized fruit is an apple.
Based on this probabilistic behavior, logistic regression takes as its hypothesis a commonly used logistic function, the Sigmoid function:

$$g(z) = \frac{1}{1 + e^{-z}}$$
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```
Using this function as the hypothesis of logistic regression, the model takes the input features $x$ and parameters $\theta$ and outputs $h_\theta(x) = P(y = 1 \mid x; \theta)$. For example, an output of $h_\theta(x) = 0.9$ means there is a 90% chance the fruit is an apple and a 10% chance it is an orange.
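To make this concrete, here is a minimal sketch (the parameter and feature values below are made up for illustration) of how the hypothesis turns an input into a probability and then into a 0/1 label:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical learned parameters and one feature vector
# (with a leading 1 for the intercept term)
theta = np.array([0.5, 1.2, -0.7])
x = np.array([1.0, 2.0, 1.0])

probability = sigmoid(theta @ x)    # h_theta(x) = P(y = 1 | x; theta)
label = 1 if probability >= 0.5 else 0
print(probability, label)           # ~0.90 -> predict 1 (apple)
```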
4. Classification boundary of logistic regression
In classification problems there is the concept of a Decision Boundary. Since we ultimately want a function that separates the data into classes, in the coordinate system this appears as the curve that divides the data into two categories, for example apples on one side and oranges on the other.
The purpose of understanding the classification boundary is to understand how the logistic regression hypothesis works. Below is a small example showing how the classification boundary is derived; it is easy to follow.
We assume:
- When $h_\theta(x) \ge 0.5$, predict $y = 1$: apple
- When $h_\theta(x) < 0.5$, predict $y = 0$: orange

It can be seen from the Sigmoid function's graph:

- When $z > 0$: $g(z) > 0.5$
- When $z = 0$: $g(z) = 0.5$
- When $z < 0$: $g(z) < 0.5$

And because $z = \theta^T x$ (pay attention here), the above assumptions can be rewritten as:

- When $\theta^T x \ge 0$, predict $y = 1$: apple
- When $\theta^T x = 0$: this is the classification boundary!
- When $\theta^T x < 0$, predict $y = 0$: orange
To illustrate this intuitively, here is a picture:
This diagram shows it in detail: the red line in the middle is the classification boundary, and the two sides are where $\theta^T x$ is greater than 0 and less than 0, respectively. In practical applications, we often merge the parameters and features and represent the boundary compactly as $\theta^T x = 0$.
In this example the classification boundary is a straight line, but the boundary can also be nonlinear, such as a circle:
These are two simple examples of classification boundaries. In practice, more complex polynomials can be used to fit very complex classification boundaries.
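To make the idea concrete, here is a small sketch (the parameter values are made up) showing that classifying a point only requires checking the sign of $\theta^T x$, whether the boundary is a line or a circle:

```python
import numpy as np

# Linear boundary: -3 + x1 + x2 = 0, i.e. the line x1 + x2 = 3
theta_line = np.array([-3.0, 1.0, 1.0])

# Circular boundary: -1 + x1^2 + x2^2 = 0, i.e. the unit circle
theta_circle = np.array([-1.0, 1.0, 1.0])

def classify(theta, features):
    # Predict 1 when theta^T x >= 0, otherwise 0
    return int(theta @ features >= 0)

point = np.array([2.0, 2.0])
print(classify(theta_line, np.array([1.0, *point])))       # 1: above the line
print(classify(theta_circle, np.array([1.0, *point**2])))  # 1: outside the circle
```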
5. Cost function and gradient descent
Like linear regression, logistic regression needs a cost function to find the optimal model parameters $\theta$. However, the cost function cannot be the sum of squared model errors: if the logistic hypothesis is plugged into the squared-error cost, the cost function is not convex, it develops many local optima, and that badly affects gradient-descent optimization.
Due to this shortcoming, we need to redefine the cost function of logistic regression:
The function is as follows:

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$$
This function is not very intuitive, so let's look at its curves:

- When $y = 1$: the cost is 0 if $h_\theta(x) = 1$, and approaches infinity as $h_\theta(x) \to 0$;
- When $y = 0$: the cost is 0 if $h_\theta(x) = 0$, and approaches infinity as $h_\theta(x) \to 1$;
This cost function can be summed up in one sentence: the greater the difference between the model's prediction and the actual value, the greater the cost.
However, the cost function above is written piecewise, which is inconvenient for the gradient-descent calculation. Can the two cases be merged into one expression? They can:

$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$

When $y = 1$ the second term is 0, and when $y = 0$ the first term is 0, which matches the piecewise definition above. Adding the sample superscripts and substituting into the overall cost function $J(\theta)$ gives:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$
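As a quick sanity check (with made-up prediction values), the merged expression really does match the piecewise one:

```python
import numpy as np

def piecewise_cost(h, y):
    return -np.log(h) if y == 1 else -np.log(1 - h)

def merged_cost(h, y):
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

# The two forms agree for both labels and any predicted probability h
for h, y in [(0.9, 1), (0.9, 0), (0.1, 1), (0.1, 0)]:
    assert np.isclose(piecewise_cost(h, y), merged_cost(h, y))
```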
With the cost function in hand, we can use the gradient descent method we learned before to iteratively find the parameters that minimize it!
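For intuition, here is a minimal batch gradient descent sketch for $J(\theta)$; the training data is randomly generated just for illustration, and the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up training data: 100 samples, an intercept column plus 2 features
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
y = (X[:, 1] + X[:, 2] > 0).astype(float)   # a linearly separable toy label

theta = np.zeros(3)
alpha = 0.1   # learning rate
for _ in range(1000):
    gradient = X.T @ (sigmoid(X @ theta) - y) / len(X)   # dJ/dtheta
    theta -= alpha * gradient
print(theta)
```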
6. Logistic regression in practice
Finally, let's use the logistic regression techniques above to classify two classes of data!
6.1 Data Preparation
The data to be classified is visualized as follows. There are only two categories, so a straight-line decision boundary will do:
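The original data file isn't reproduced here, but a typical preparation step looks like the sketch below; the file name and column names are assumptions, not the article's actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical file: two feature columns and a 0/1 label column
data = pd.read_csv('data.csv', names=['feature1', 'feature2', 'label'])

X = data[['feature1', 'feature2']].to_numpy()
X = np.hstack([np.ones((len(X), 1)), X])   # prepend the intercept column
y = data['label'].to_numpy()
theta = np.zeros(X.shape[1])               # initial parameters for training
```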
6.2 Hypothesis Function
```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```
6.3 Cost Function
The vectorized implementation is used here:
```python
# Logistic regression cost function
def cost_function(theta, X, y):
    # Vectorized implementation
    return np.mean(-y * np.log(sigmoid(X @ theta)) - (1 - y) * np.log(1 - sigmoid(X @ theta)))
```
6.4 Gradient descent
The principle of gradient descent was introduced in the previous article:
```python
# Gradient calculation
# Returns the gradient as a one-dimensional array
def gradient(theta, X, y):
    return (1 / len(X)) * X.T @ (sigmoid(X @ theta) - y)
```
6.5 Training Parameters
Here, an off-the-shelf optimization routine is used to minimize the cost function (loss function):
```python
import scipy.optimize as opt

# Train the logistic regression parameters with opt.minimize.
# Newton-CG belongs to the Newton family of methods; it uses the second-derivative
# matrix of the loss function (the Hessian matrix) to optimize the loss iteratively.
res = opt.minimize(fun=cost_function, x0=theta, args=(X, y), method='Newton-CG', jac=gradient)
```
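When the optimizer finishes, the learned parameters are read from SciPy's result object (`res.x` holds the solution):

```python
final_theta = res.x           # the optimized parameter vector, used for prediction below
print(res.message, res.fun)   # convergence message and final cost
```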
6.6 Prediction on the training set
```python
# Calculate the predicted y values on the training set
y_predict = predict(X, final_theta)

# Print the classification report
print(classification_report(y, y_predict))
```
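The `predict` helper isn't shown in the snippet above; a minimal version, assuming the usual 0.5 threshold and reusing the `sigmoid` defined earlier (`classification_report` comes from scikit-learn), could look like this:

```python
from sklearn.metrics import classification_report

def predict(X, theta):
    # Threshold the predicted probability at 0.5 to get 0/1 labels
    return (sigmoid(X @ theta) >= 0.5).astype(int)
```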
The F1-scores in the report are 0.86 and 0.91 respectively, indicating that the classification results are quite good:
6.7 Displaying classification boundaries
We use a plotting library to draw the predicted classification boundary, and we can see that it separates the two categories of data well:
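The exact plotting code isn't shown in this excerpt; a sketch of how the straight-line boundary $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$ might be drawn with matplotlib (reusing `X`, `y`, and `final_theta` from the sketches above) is:

```python
import matplotlib.pyplot as plt
import numpy as np

# Solve theta0 + theta1*x1 + theta2*x2 = 0 for x2 to draw the boundary line
x1 = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
x2 = -(final_theta[0] + final_theta[1] * x1) / final_theta[2]

plt.scatter(X[y == 0, 1], X[y == 0, 2], label='class 0')
plt.scatter(X[y == 1, 1], X[y == 1, 2], label='class 1')
plt.plot(x1, x2, 'r', label='decision boundary')
plt.legend()
plt.show()
```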
OK! Today DLonng shared the principles of logistic regression and some hands-on coding with you. Practice it and master it as soon as possible! The fully annotated code is in my repository:
Github.com/DLonng/AI-N…
See you next time 🙂