Preface:

While studying support vector machines, I watched Hu Haoji's course from Zhejiang University on the Chinese MOOC platform. I found the lectures in Chinese clearer and easier to follow, and afterwards I read the relevant part of Zhou Zhihua's Machine Learning (the "watermelon book"). Watching the video first and then reading the book made it much easier to see where the formulas come from; perhaps for beginners, video teaching in one's native language is simply easier to understand. At first, reading the watermelon book gave me a real headache: it is full of mathematical formulas with hardly any derivation steps.

1. Support Vector Machine

Definition: linearly separable and linearly non-separable



If there is a straight line that separates the two classes of data in the figure, the data are linearly separable.



If no straight line can separate the two classes of data, the data are linearly non-separable.

This idea extends to three-dimensional and even higher-dimensional feature spaces. In three dimensions, a plane separates the data. In four or more dimensions the separator can no longer be drawn, because humans cannot perceive such spaces directly, but it can still separate the data. These separating "planes" are called hyperplanes.
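In any number of dimensions the separating boundary can be written in the same compact form; as a small sketch in the notation used later in this post (ω is the normal vector, b the displacement term):

```latex
% A hyperplane in a d-dimensional feature space:
\[
  \omega^{\top} x + b = 0
\]
% d = 2: a straight line;  d = 3: a plane;  d >= 4: a hyperplane.
```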

Two-dimensional linearly separable case:

Mathematical Definition:

Vector definition:
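The formula images for these two definitions are not reproduced here; the sketch below gives the standard definitions they most likely correspond to (the labelling convention y_i ∈ {+1, −1} is an assumption on my part):

```latex
% Vector form of a 2-D sample and the weight (normal) vector:
\[
  x_i = (x_{i1}, x_{i2})^{\top}, \qquad \omega = (\omega_1, \omega_2)^{\top}
\]
% A training set \{(x_i, y_i)\}_{i=1}^{N}, y_i \in \{+1, -1\}, is linearly
% separable if there exist \omega and b such that for every sample:
\[
  y_i = +1 \;\Rightarrow\; \omega^{\top} x_i + b > 0, \qquad
  y_i = -1 \;\Rightarrow\; \omega^{\top} x_i + b < 0
\]
% Equivalently: y_i (\omega^{\top} x_i + b) > 0 for all i.
```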

2. Support vector machines

Problem Description:

For the linearly separable case, how do we find the optimal classification hyperplane? From the figure it can be seen that the middle hyperplane is optimal: it has the best tolerance when dividing the data. Because the training set is limited, new data outside it may fall close to the boundary, and a line drawn too close to one class can then produce wrong classification results. The line in the second figure is affected the least and is less likely to be pushed into incorrect results by new data.

The optimal classification line sought by the support vector machine should meet the following requirements:
2.1 The line separates the two categories.
2.2 The line maximizes the interval (margin).
2.3 The line lies in the middle of the interval and is equidistant from all support vectors.

The same criteria apply to linearly separable multidimensional feature spaces, except that the straight line becomes a hyperplane.
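To make requirements 2.1–2.3 concrete, here is a small sketch (my own toy example, not from the original post): given a hypothetical candidate line (ω, b) and a toy 2-D data set, it checks that the line separates the two classes and measures the margin as twice the distance from the closest point to the line.

```python
import numpy as np

# Toy 2-D data: two classes labelled +1 and -1 (hypothetical example data).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],   # class +1
              [0.0, 0.0], [1.0, 0.5], [0.5, 1.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# A candidate separating line w^T x + b = 0 (chosen by hand for illustration).
w = np.array([1.0, 1.0])
b = -3.0

# Signed distance of each point to the line: (w^T x + b) / ||w||.
signed_dist = (X @ w + b) / np.linalg.norm(w)

# Requirement 2.1: every point lies on the correct side of the line.
separates = np.all(y * signed_dist > 0)

# Requirements 2.2 / 2.3: the margin is twice the distance of the closest
# point (a support vector) to the line.
margin = 2 * np.min(np.abs(signed_dist))

print(f"separates both classes: {separates}, margin: {margin:.3f}")
```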

3. Derivation of the mathematical formulation of the linearly separable problem:

It took me a long time to understand the formulas in this part; I watched many videos and compared them with books and other materials. The further I went, the more I realized how little math I actually knew. First, let's look at the formula for the distance from a point to a plane, given below. You can think of this formula as the area of a parallelogram divided by its base; the resulting height is the distance from the point to the plane.
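The formula itself (the standard point-to-hyperplane distance, written in the notation of this post):

```latex
% Distance from a point x_0 to the hyperplane \omega^{\top} x + b = 0:
\[
  d = \frac{\lvert \omega^{\top} x_0 + b \rvert}{\lVert \omega \rVert}
\]
% "Area divided by base": the numerator plays the role of the parallelogram's
% area and \lVert \omega \rVert is the base, so the quotient is the height d.
```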

Then we derive the formulation of the linearly separable problem. Let's start from two known facts: Fact 1:

Fact 2:
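The images carrying these two facts are not reproduced here; judging from the derivation that follows, they are most likely the two standard facts below (a hedged reconstruction, not a quotation from the original lecture):

```latex
% Fact 1 (scaling invariance): for any a != 0 the hyperplane is unchanged:
\[
  \omega^{\top} x + b = 0
  \;\Longleftrightarrow\;
  (a\omega)^{\top} x + (ab) = 0
\]
% Fact 2 (point-to-hyperplane distance), as in the previous section:
\[
  d(x_0) = \frac{\lvert \omega^{\top} x_0 + b \rvert}{\lVert \omega \rVert}
\]
```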

Any hyperplane in the sample space can therefore be described by ω^T x + b = 0, where ω is the normal vector, which determines the direction of the hyperplane, and b is the displacement term, which determines the distance between the hyperplane and the origin. The distance from any point x in the sample space to this hyperplane is then |ω^T x + b| / ||ω||.

From Facts 1 and 2 we can derive:

To find the hyperplane with the largest margin, we look for the parameters ω and b that make the distance d as large as possible. Since d is proportional to 1/||ω|| (by Fact 1 we can rescale ω and b so that the support vectors satisfy |ω^T x + b| = 1), maximizing d is the same as finding the minimum of ||ω||.

So the optimization problem is stated as: minimize (1/2)||ω||² (this has the same minimizer as minimizing ||ω||; the squared form is used only because it makes the derivation more convenient).
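Written out in full, together with the constraint that every sample must be classified correctly with functional margin at least 1, this is the standard hard-margin form that the derivation arrives at:

```latex
\[
  \min_{\omega,\, b} \;\; \frac{1}{2}\lVert \omega \rVert^{2}
  \qquad \text{s.t.} \qquad
  y_i\,(\omega^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, N
\]
```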

This is a quadratic programming problem, so the problem of finding ω and b becomes a quadratic programming problem. A quadratic program either has no solution or has a single minimum; and since a convex optimization problem must have a unique minimum, a quadratic program that is a convex optimization problem must have an optimal solution. So as long as a problem can be transformed into a quadratic program that is also a convex optimization problem, we can solve it with existing convex optimization tools.
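As an example of "using tools that already exist" (my addition, not from the original post): the sketch below fits a linear SVM with scikit-learn, which solves this convex quadratic program internally; a very large C approximates the hard-margin problem.

```python
import numpy as np
from sklearn.svm import SVC

# Same toy linearly separable data as before (hypothetical example data).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [0.0, 0.0], [1.0, 0.5], [0.5, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A linear kernel with a very large C approximates the hard-margin problem
# min (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # learned normal vector omega
b = clf.intercept_[0]   # learned displacement term b
margin = 2.0 / np.linalg.norm(w)

print("omega:", w, "b:", b)
print("margin 2/||omega||:", margin)
print("support vectors:\n", clf.support_vectors_)
```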

Remark:

1. Support vector machines also involve the kernel function and the dual problem; I have not studied those sections yet, so they are not written here. 2. Next week I will study the kernel function and the dual problem, as well as the method of Lagrange multipliers, which is needed when deriving the formulas of the dual problem.