The first problem
Linear regression approach
So we can do it by linear regression first
Let’s set up some data, train it, get a line
And then we can make a judgment
Can also be solved
But there are problems
When we retrain a model with this data
He’s going to have a problem, the complexity of the sample increases, and this method might not be good
So we need to use a new method, which is logistic regression
Logistic regression
So what do we do with logistic regression
We need to prepare a logistic regression equation
As the sample
explain
Since we are in this classification problem, we reduce the classification problem to a probability that he is in a certain category, then we judge him to be 1 when the probability is greater than 0.5, and 0 when the probability is less than 0.5.
How to use it
And we’re going to use these three numbers to figure out what we saw before, the drain of the pool
When x is equal to minus 20, 1 plus e to the 20 is very, very big, so 1 over 1 plus e to the 20 is very close to 0 and the rest of the data, if you take it in, you'll see that it fits the reality.Copy the code
That pool water is also a simple problem, and then look at the complex problem how to do
A complex problem
Let’s look at this case
So all of these are simple cases, so what do we do for a little bit more complicated cases
Let’s look at this problem
We can see that this classification, graphically, can be separated by a straight line
If we look at this line, we can see that the red area is above the line, and the blue area is below it
We can take the line, figure it out, and plug it in to the exponent of e
Like this,
And just to do that, let’s take this line
g((x,y)) = y -x - 1
If you plug the data into the logistic regression equation, you can figure out that if this point is above the line then e to the power will be positive, and if this point is below the line e to the power will be negative.
Is not found with the previous problem of water release
Now look at this situation
Let’s do the same thing. Let’s create a line that separates the two types of data
Let’s analyze that this is a circle with center 0,0 and radius 2
So g of (x,y) is equal to x squared plus y squared minus 4
Again, plug this g of x into the original logistic regression equation e to the exponent
And you get the logistic regression equation for the classification
Unified the
Let’s call that line that distinguishes the value or the classification of the point the decision boundary of this model. Okay
To summarize
Solve logistic regression model
Let me remind you of linear regression
First of all, in linear regression, we find a loss function.
Then, with the aim of finding the minimum point of the loss function, we adjust the loss function, give a step size, and perform gradient descent to know the convergence of the function.
Logistic regression
So we do the same thing with logistic regression, we find a loss function
But for our logistic regression, the actual numbers are 0 and 1. We would expect that if the prediction doesn’t match the reality, the loss function should be very large. And similarly, if they’re right, we want this value to be infinitely close to 0
Loss function solution
So let’s sort out our loss function
So with that function, we’re going to apply gradient descent
And for this task, let’s do it, so we’re essentially doing gradient descent for the boundary function. According to the shape of different boundary lines, it is determined to be a function of several times and several elements. Then, the data of a random point is given and adjusted according to the direction and step diameter of the gradient. The data is stored in a temporary data, and then the next point is found and compared with the temporary data until the two data are equal. I know the solution to this regression model.