A support vector machine, generally abbreviated as SVM, is a binary classification model. Its basic form is the linear classifier with the largest margin in the feature space. Its learning strategy is to maximize that margin, which can ultimately be turned into solving a convex quadratic programming problem. If the data are not linearly separable, a kernel function has to be used.

To understand SVM, we must first be clear about one concept: the linear classifier. We are given a number of data points that belong to two different classes, and we need to find a linear classifier that separates the data into those two classes. Let x = (x1, x2, x3, x4, ...) denote a data point and let y denote its class (y is either 1 or -1, representing the two different classes). The learning goal of a linear classifier is to find a hyperplane in the n-dimensional data space that separates the two classes. Such a hyperplane can be written as w^T x + b = 0.
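As a minimal sketch of what such a linear classifier computes, assume the hyperplane is described by a weight vector w and an offset b. The function names and the numbers below are illustrative assumptions. The predicted class is read off from the side of the hyperplane a point falls on:

```python
def decision_value(w, b, x):
    """Return w . x + b for a point x (plain Python sequences)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Label a point +1 or -1 depending on its side of the hyperplane."""
    # Assumption: points lying exactly on the hyperplane are labeled +1.
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical hyperplane in two dimensions: x1 + x2 - 3 = 0
w, b = [1.0, 1.0], -3.0
print(predict(w, b, [3.0, 2.0]))  # decision value 3 + 2 - 3 = 2, positive side
print(predict(w, b, [0.5, 1.0]))  # decision value 0.5 + 1 - 3 = -1.5, negative side
```

Learning the classifier then amounts to choosing w and b. This sketch only shows how a given choice is applied to a point.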

Readers may wonder why the class labels are taken to be 1 or -1. In fact, this 1/-1 labeling convention originated in logistic regression.

The purpose of logistic regression is to learn a 0/1 classification model from features. The model takes a linear combination of the features as its independent variable. Since that variable ranges from negative infinity to positive infinity, the logistic function (also called the sigmoid function), g(z) = 1 / (1 + e^(-z)), is used to map it into the interval (0, 1).
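The mapping can be sketched in a few lines of Python. The helper `to_pm1` and its 0.5 threshold are assumptions added for illustration. They show one way a 0/1-style output can be re-expressed with the 1/-1 labels used for SVM.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)) for a real input z."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def to_pm1(p, threshold=0.5):
    """Hypothetical helper: turn a value in (0, 1) into a +1 / -1 label."""
    return 1 if p >= threshold else -1


for z in (-5.0, 0.0, 5.0):
    p = sigmoid(z)
    print(z, p, to_pm1(p))
```

Because sigmoid(z) is at least 0.5 exactly when z is non-negative, thresholding at 0.5 is the same as checking the sign of z, which is how the 1/-1 labels connect to the side of a hyperplane.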

2/ An example of linear classification

Here is a simple example. As shown in the figure below, we have a two-dimensional plane containing two different kinds of data, represented by circles and crosses. Since the data are linearly separable, the two kinds can be separated by a straight line, which plays the role of the hyperplane: all points on one side of it correspond to y = -1 and all points on the other side correspond to y = 1.
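The same situation can be reproduced with a small, made-up data set. The coordinates, the choice of labeling circles as +1 and crosses as -1, and the line x1 + x2 - 4 = 0 are all assumptions for illustration, not values taken from the figure.

```python
# Hypothetical 2-D toy data: circles are labeled +1, crosses are labeled -1.
circles = [(3.0, 3.0), (4.0, 2.5), (3.5, 4.0)]
crosses = [(1.0, 1.0), (0.5, 2.0), (1.5, 0.5)]
points = [(p, 1) for p in circles] + [(p, -1) for p in crosses]


def f(x, w=(1.0, 1.0), b=-4.0):
    """Value of the assumed separating line x1 + x2 - 4 = 0 at point x."""
    return w[0] * x[0] + w[1] * x[1] + b


def separates(labeled_points):
    """True if every point lies strictly on the side matching its label."""
    return all(y * f(x) > 0 for x, y in labeled_points)


print(separates(points))
```

A point placed on the wrong side of the line, for example a circle at (1.0, 1.0), would make `separates` return False.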

3/ Functional margin and geometric margin

Separating hyperplane function: represented by a function f(x), which for a linear classifier is f(x) = w^T x + b. If f(x) = 0, the data point lies on the hyperplane; if f(x) > 0, the data point corresponds to y = 1; if f(x) < 0, the data point corresponds to y = -1.

Functional margin: the magnitude of f(x) indicates how far a data point is from the separating hyperplane, and checking whether the sign of f(x) agrees with the sign of the class label y tells us whether the point is classified correctly. Both are captured by the product y * f(x), which can be used to adjust the position of the separating hyperplane. Dividing the functional margin by the norm of w gives the geometric margin, which is the actual distance from the point to the hyperplane.

4/ Definition of the maximum margin classifier

When classifying a data point, the larger the margin from the point to the separating hyperplane, the more reliable the classification. Therefore, to make the classification as reliable as possible, the chosen separating hyperplane should maximize this margin, which is half of the Gap shown in the figure below.
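The two margins and the maximum margin criterion can be illustrated as follows. This is a sketch that assumes f(x) = w . x + b and uses a made-up data set with two hand-picked candidate hyperplanes. It only compares the candidates and does not solve the quadratic programming problem.

```python
import math


def functional_margin(w, b, x, y):
    """y * (w . x + b): positive exactly when x is classified correctly."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||: signed distance to the hyperplane."""
    return functional_margin(w, b, x, y) / math.sqrt(sum(wi * wi for wi in w))


def dataset_margin(w, b, data):
    """Smallest geometric margin over all (x, y) pairs in data."""
    return min(geometric_margin(w, b, x, y) for x, y in data)


# Hypothetical toy data and two candidate hyperplanes that both separate it.
data = [((3.0, 3.0), 1), ((4.0, 2.0), 1), ((1.0, 1.0), -1), ((0.0, 2.0), -1)]
candidates = {"A": ((1.0, 1.0), -4.0), "B": ((1.0, 0.0), -2.5)}

for name, (w, b) in candidates.items():
    print(name, round(dataset_margin(w, b, data), 3))

# The maximum margin criterion prefers the candidate with the larger margin.
best = max(candidates, key=lambda k: dataset_margin(*candidates[k], data))
print("larger margin:", best)
```

Rescaling w and b by the same positive factor changes the functional margin but leaves the geometric margin unchanged, which is why the geometric margin is the quantity worth maximizing.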