Function model

In the previous section we discussed linear regression. This time we will discuss logistic regression, a member of the generalized linear model family. Let's first recall the functional model of linear regression:

h_θ(x) = θ^T x = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n

This function represents a straight line, a plane, or a hyperplane, but it has a fatal disadvantage: it does not fit well when the task is classification. Take a dichotomous (binary) example. If the labels in a training set take only two values, it is difficult to separate the classes with the linear function directly, because a line or plane cannot fit such a stepped surface well. We would still like to build on the linear function and make the fit as good as possible, so we transform the equation above:

h_θ(x) = g(θ^T x)

On the whole, the logistic regression model maps x from the entire range of real numbers onto a bounded set of values, which is what lets us classify x: every time you take an x, logistic regression assigns it to some class y.
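As a minimal sketch of this mapping (Python/NumPy; the parameter vector theta and the sample x below are made-up illustration values, not taken from the text):

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    # Logistic regression: squash the linear score theta^T x into (0, 1),
    # then threshold the probability at 0.5 to pick a class.
    probability = sigmoid(np.dot(theta, x))
    return 1 if probability >= 0.5 else 0

theta = np.array([0.5, -1.0, 2.0])  # hypothetical learned parameters
x = np.array([1.0, 0.2, 0.4])       # one sample (with bias term x0 = 1)
print(predict(theta, x))            # prints 1: sigmoid(1.1) ≈ 0.75 >= 0.5
```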

Boundary decision function

This function name is just my own habitual term; here we briefly introduce several common choices. Take binary classification as an example. The simplest rule is: when θ^T x ≥ 0, output y = 1; otherwise output y = 0. You might immediately think of a piecewise function (called a threshold function):

g(z) = 1, if z ≥ 0;  g(z) = 0, if z < 0

Is this function good enough for classification? In terms of simplicity and accuracy, it certainly is. However, since the function is not differentiable at zero and its gradient is zero everywhere else, it causes a lot of trouble for gradient-based optimization later on, so we usually don't use this method. Instead, we directly introduce the Sigmoid function:

g(z) = 1 / (1 + e^(−z))

The Sigmoid function is usually the default choice for logistic regression, neural networks, and so on. Although it works well as a default, it also has disadvantages. Consider its graph:

Obviously, when the magnitude of the input is too large, the Sigmoid output is still confined near 0 or 1. At that point, if we take the derivative of the Sigmoid function, we find it is close to zero: this is the saturation problem ("ease of saturation"). So we should try to keep the initial values in a reasonable range; the common method is scaling, which we'll talk about later.
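The saturation is easy to see numerically. A small sketch (Python/NumPy; the input values 0 and 20 are arbitrary illustration points):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: g'(z) = g(z) * (1 - g(z)).
    s = sigmoid(z)
    return s * (1.0 - s)

# Near zero the gradient is at its maximum (0.25);
# for large |z| it vanishes, so gradient updates barely move.
print(sigmoid_grad(0.0))   # prints 0.25
print(sigmoid_grad(20.0))  # ~2e-9: the saturated regime
```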

Moreover, the Sigmoid function has the disadvantage that its outputs are not zero-centered (the expected output is nonzero), which is a very bad property in neural networks, as we will also explain later.

Of the Sigmoid function's weaknesses, the first (saturation) is obviously easy to mitigate, but the second (non-zero-centered output) is more fatal. Therefore, the hyperbolic tangent (tanh) is also commonly used:

tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z))

Its image:

In fact, the tanh function is a variant of the Sigmoid function:

tanh(z) = 2g(2z) − 1

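The relationship between tanh and the sigmoid is easy to verify numerically. A quick sketch (Python/NumPy, with g as the sigmoid; the grid of test points is arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# tanh(z) = 2 * sigmoid(2z) - 1 holds for every real z.
z = np.linspace(-5.0, 5.0, 101)
assert np.allclose(np.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0)
```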
However, in order to speed up computation, we usually stick with the Sigmoid function, so our logistic function model is:

h_θ(x) = g(θ^T x) = 1 / (1 + e^(−θ^T x))

Cost function

Here we give the cost function without derivation. The selection and use of cost functions will be explained in great detail in the next section.

J(θ) = −(1/m) Σ_{i=1}^{m} [ y^(i) log(h_θ(x^(i))) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]

We don't introduce the concept of regularization at this stage. You may recognize this formula: it is in fact the cross-entropy. In the next section, we will derive this cost function and explain in detail the range of problems for which each cost function works best.
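A minimal sketch of this cross-entropy cost (Python/NumPy; the toy dataset and parameter values below are hypothetical, chosen only to exercise the formula):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) ),
    # where h = sigmoid(X @ theta) is the predicted probability.
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

# Toy data: a bias column plus one feature, labels 0/1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.array([-1.5, 1.0])
print(cross_entropy_cost(theta, X, y))  # ~0.338 for this toy setup
```

The cost is small when the model assigns high probability to the true label of each sample, and grows without bound as a confident prediction turns out wrong.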

My Juejin: WarrenRyan

My Jianshu: WarrenRyan

Welcome to my blog, blog.tity.online, for the earliest updates.

My GitHub: StevenEco