Foreword

Logistic regression draws on calculus, linear algebra, probability theory, and optimization. This article tries to explain logistic regression in the simplest, most accessible way possible, with as few formula derivations and as many visual examples as it can manage. If mathematical formulas give you an allergic reaction, proceed at your own risk.

Logistic regression principle and derivation

Although logistic regression has the word "regression" in its name, it is a classification algorithm. As shown in the figure, suppose two classes of data (red points and green points) are distributed as below. To separate them, we can draw a straight line z = W0*x0 + W1*x1 + W2*x2 (with x0 fixed to 1). When a new sample (x1, x2) needs to be predicted, we plug it into this line function: if the value is greater than 0, it is a green sample (positive); otherwise it is a red sample (negative). In higher-dimensional space we need a hyperplane (a line in two dimensions, a plane in three dimensions, an (n-1)-dimensional hyperplane in n dimensions) to separate the sample data. Finding that hyperplane really means finding its parameters W, which looks very much like regression; hence the name logistic regression.
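
As a minimal sketch of this decision rule (the weight values below are made up purely for illustration, not trained values):

```python
# Hypothetical weights, invented for illustration only (not trained values)
W = [4.0, 0.5, -0.6]  # W0 (bias, paired with x0 = 1), W1, W2

def classify(x1, x2):
    # z = W0*x0 + W1*x1 + W2*x2, with x0 fixed to 1
    z = W[0] * 1.0 + W[1] * x1 + W[2] * x2
    return 'green (positive)' if z > 0 else 'red (negative)'

print(classify(1.0, 2.0))   # z = 3.3  -> green (positive)
print(classify(-2.0, 9.0))  # z = -2.4 -> red (negative)
```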

The sigmoid function

Of course, we do not use the value of z directly; we need to map z into the interval [0, 1], where the mapped value is the probability that the new sample is a positive sample. The sigmoid function performs this mapping:

σ(z) = 1 / (1 + e^(-z))

As the figure shows, σ(z) is greater than 0.5 when z > 0 and less than 0.5 when z < 0. Through the sigmoid function, logistic regression is essentially a discriminative model based on conditional probability.
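
A tiny sketch to confirm the behavior just described, evaluating σ(z) at a few points:

```python
import math

def sigmoid(z):
    # Maps any real z into (0, 1); crosses 0.5 exactly at z = 0
    return 1.0 / (1.0 + math.exp(-z))

for z in (-6, -1, 0, 1, 6):
    print(z, round(sigmoid(z), 4))
# -6 0.0025 | -1 0.2689 | 0 0.5 | 1 0.7311 | 6 0.9975
```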

The objective function

What we actually need now is to solve for W. But how do we judge whether a given W is good? Look at the figure below: the line in the second plot separates the samples best. In other words, we want the sample points to be as far from the line as possible, so that newly arriving samples are also classified well. How do we express and compute this as an objective function?

We substitute z = θᵀx (writing θ for the weight vector W) into the sigmoid formula:

h(x) = σ(θᵀx) = 1 / (1 + e^(-θᵀx))

From conditional probability we can write

P(y = 1 | x; θ) = h(x)
P(y = 0 | x; θ) = 1 - h(x)

and the two cases can be combined into a single formula:

P(y | x; θ) = h(x)^y * (1 - h(x))^(1-y)

Taking the likelihood of all samples, L(θ) = Π_i P(y_i | x_i; θ), and then its logarithm gives the objective function

l(θ) = Σ_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]

and we need to maximize this objective function to solve for θ.
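
As a minimal sketch of this objective (the two samples below are made-up toy data, not the article's dataset), the log-likelihood follows directly from the combined probability formula:

```python
import numpy as np

def log_likelihood(theta, X, y):
    # l(theta) = sum_i [ y_i*log h(x_i) + (1 - y_i)*log(1 - h(x_i)) ]
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy data: two samples with an intercept column x0 = 1 (made-up numbers)
X = np.array([[1.0, 2.0, 3.0],
              [1.0, -1.0, -2.0]])
y = np.array([1.0, 0.0])
print(log_likelihood(np.zeros(3), X, y))  # 2*log(0.5) = -1.3863...
```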

Gradient ascent method

Before introducing gradient ascent, let's revisit a middle-school exercise: find the value of x at which the function f(x) = -x² reaches its maximum.

Solution: take the derivative, f'(x) = -2x, set it to 0, and get x = 0, where the maximum value 0 is attained. When the function is more complicated, however, solving the derivative equation for the extremum becomes difficult, so the gradient ascent method is used instead: we approach the extremum step by step through iteration, moving along the direction of the derivative (gradient):

x := x + α * f'(x)

where α is the step size.

Using gradient ascent to compute the x value at which the function is maximized:

```python
def f(x_old):
    # Derivative of f(x) = -x^2
    return -2 * x_old

def cal():
    x_old = 0
    x_new = -6
    eps = 0.01            # step size
    precision = 0.00001   # stop when successive steps barely move
    while abs(x_new - x_old) > precision:
        x_old = x_new
        x_new = x_old + eps * f(x_old)
    return x_new

print(cal())  # -0.0004892181072978443
```

Objective function solution

Here, we take the partial derivative of the objective function and get the iterative formula:

∂l(θ)/∂θ_j = Σ_i (y_i - h(x_i)) * x_ij,   so   θ_j := θ_j + α Σ_i (y_i - h(x_i)) * x_ij

or, in matrix form, θ := θ + α Xᵀ(y - h).
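
As a sanity check on this derivation (using made-up toy data, not the article's dataset), the analytic gradient Xᵀ(y - h) can be compared with a finite-difference approximation of the log-likelihood:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    h = sigmoid(X.dot(theta))
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy data, invented for illustration only
X = np.array([[1.0, 0.5, 1.5], [1.0, -1.0, 0.3], [1.0, 2.0, -0.8]])
y = np.array([1.0, 0.0, 1.0])
theta = np.array([0.1, -0.2, 0.3])

analytic = X.T.dot(y - sigmoid(X.dot(theta)))  # X^T (y - h)
eps = 1e-6
numeric = np.array([
    (log_likelihood(theta + eps * e, X, y) - log_likelihood(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(analytic, numeric))  # True: the derivative matches the formula
```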

Logistic regression practice

Data

Read in data and plot it:

```python
def loadDataSet():
    dataMat = []
    labelMat = []
    fr = open('data/Logistic/testSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        # First column is the constant x0 = 1.0, then the two features
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        # Third column is the 0/1 class label
        labelMat.append(int(lineArr[2]))
    return dataMat, labelMat
```
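
The plot in the original article is an image; a minimal matplotlib sketch to reproduce it, assuming the testSet.txt layout above (two feature columns plus a 0/1 label) and the loadDataSet function just defined, might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

dataMat, labelMat = loadDataSet()  # defined above; requires the data file
dataArr = np.array(dataMat)
labels = np.array(labelMat)

# Column 0 is the constant 1.0; columns 1 and 2 are the two features
plt.scatter(dataArr[labels == 1, 1], dataArr[labels == 1, 2], c='green', label='positive')
plt.scatter(dataArr[labels == 0, 1], dataArr[labels == 0, 2], c='red', label='negative')
plt.xlabel('x1'); plt.ylabel('x2'); plt.legend()
plt.show()
```
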
Training algorithm

Using the gradient ascent iteration formula, compute W:

```python
import numpy as np

def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))

def gradAscent(dataMatIn, labelMatIn):
    dataMatrix = np.mat(dataMatIn)             # m x n sample matrix
    labelMat = np.mat(labelMatIn).transpose()  # m x 1 label vector
    m, n = np.shape(dataMatrix)
    alpha = 0.001      # step size
    maxCycles = 500    # number of iterations
    weights = np.ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)      # predicted probabilities
        error = labelMat - h                   # y - h
        # theta := theta + alpha * X^T (y - h)
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights
```
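
A quick usage sketch chaining the two functions above (the exact weight values depend on the dataset):

```python
dataMat, labelMat = loadDataSet()
weights = gradAscent(dataMat, labelMat)
print(weights)  # a 3x1 matrix: [W0, W1, W2]
```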

Plot the computed weights as a decision boundary to view the classification result:
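
On the decision boundary we have W0 + W1*x1 + W2*x2 = 0, i.e. x2 = -(W0 + W1*x1) / W2. A minimal plotting sketch using the functions defined above (the plotting details here are my own, not the article's original code):

```python
import numpy as np
import matplotlib.pyplot as plt

dataMat, labelMat = loadDataSet()
weights = np.asarray(gradAscent(dataMat, labelMat)).flatten()
dataArr = np.array(dataMat)
labels = np.array(labelMat)

plt.scatter(dataArr[labels == 1, 1], dataArr[labels == 1, 2], c='green', label='positive')
plt.scatter(dataArr[labels == 0, 1], dataArr[labels == 0, 2], c='red', label='negative')

# On the boundary: W0 + W1*x1 + W2*x2 = 0  =>  x2 = -(W0 + W1*x1) / W2
x1 = np.arange(dataArr[:, 1].min(), dataArr[:, 1].max(), 0.1)
x2 = -(weights[0] + weights[1] * x1) / weights[2]
plt.plot(x1, x2, label='decision boundary')
plt.xlabel('x1'); plt.ylabel('x2'); plt.legend()
plt.show()
```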

Advantages and disadvantages of the algorithm

  • Advantages: easy to understand and computationally inexpensive
  • Disadvantages: prone to underfitting, so classification accuracy can be low