Dear friends, these days I have been revisiting the classic Support Vector Machine (SVM) and Support Vector Regression (SVR), and found that I still know far too little; I need to strengthen my fundamentals.

References on the principles of SVM:

http://en.wikipedia.org/wiki/Support_vector_machine

http://zh.wikipedia.org/wiki/%E6%94%AF%E6%8C%81%E5%90%91%E9%87%8F%E6%9C%BA

There are several methods for handling K-class, i.e., multi-class, problems with SVM (excerpted from a rather rough book, Support Vector Machine Theory and Application Analysis by Fang Ruiming):

(1) One-against-all: design k SVM classifiers, each separating one class from all the others;

(2) All-against-all / one-against-one: design k(k-1)/2 SVM classifiers, one for each pair of classes;

(3) Error-correcting output codes (ECOC): roughly, an approach similar to Hamming codes is used to correct errors in the classifier outputs. I have not studied it carefully, but this method may be useful to me in the future. P.S. Hamming coding: encode the data with extra redundant bits and process the received word so that errors caused by channel noise can be detected and corrected via the Hamming matrix; (2005) A study on Error Correcting Output codes.pdf touches on it briefly.

(4) One-shot classification: solve the multi-class problem with a single optimization. A pair (w_i, b_i) is designed for each class i, and training requires that w_i^T x + b_i for the true class be larger than w_j^T x + b_j for every other class j. The objective is to minimize the sum of the norms of all w_i; slack variables (one per sample per class, i.e., number of samples times number of classes) can also be introduced.

LIBSVM uses method (2) and decides by voting; if two classes receive the same number of votes, it simply picks the one with the smaller index -_-b. A small sketch of this voting scheme follows.
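Here is a minimal sketch of the one-against-one voting idea in Python. I use scikit-learn's SVC as the binary trainer purely for illustration (the names ovo_train and ovo_predict are my own, not LIBSVM's API), and ties are broken by the smaller class label, as LIBSVM does.

from itertools import combinations
import numpy as np
from sklearn.svm import SVC

def ovo_train(X, y):
    # one binary SVM per pair of classes: k(k-1)/2 classifiers in total
    classes = np.unique(y)
    models = {}
    for a, b in combinations(classes, 2):
        mask = (y == a) | (y == b)
        models[(a, b)] = SVC(kernel="rbf").fit(X[mask], y[mask])
    return classes, models

def ovo_predict(classes, models, x):
    # every pairwise classifier casts one vote; ties go to the smaller class label
    votes = {c: 0 for c in classes}
    for model in models.values():
        votes[model.predict(x.reshape(1, -1))[0]] += 1
    return min(votes, key=lambda c: (-votes[c], c))

Usage would be something like: classes, models = ovo_train(X_train, y_train), then label = ovo_predict(classes, models, x_new).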

Main ideas of SVR:

(1) Regression is basically curve fitting: a function is used to fit the relationship between x and y. For SVR, x is a vector, y is a scalar, and the fitted function has the form y = w^T g(x) + b, where g(x) is the feature-space vector induced by the kernel function.

(2) SVR considers the estimate correct, and charges no loss, as long as the estimated y lies within a fixed band of half-width epsilon on either side of the actual y;

(3) The SVR optimization objective is to minimize ||w||, so that the slope of the fitted y-x curve is as small as possible, i.e., the function is as flat as possible; this is said to improve the robustness of the estimate.

(4) The rest follows naturally, just as in SVM: a soft margin controlled by a small positive penalty constant can be used, and the problem is solved through its dual formulation;

(5) One difference: the epsilon that controls the width of the band is hard to set in advance, so the nu-SVR variant adds a term C * nu * epsilon to the objective being minimized, where epsilon becomes a variable and nu is a pre-specified positive number.
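As a concrete illustration of points (2)-(5), the two formulations can be compared side by side. This is only a sketch, assuming scikit-learn is available (my choice, not something from the references) and using made-up data.

import numpy as np
from sklearn.svm import SVR, NuSVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

# epsilon-SVR: the half-width epsilon of the zero-loss tube is fixed in advance
eps_svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)

# nu-SVR: epsilon is optimized as a variable, and nu (a pre-given positive number
# bounding the fraction of errors / of support vectors) is fixed instead
nu_svr = NuSVR(kernel="rbf", C=1.0, nu=0.5).fit(X, y)

print(eps_svr.predict(X[:5]))
print(nu_svr.predict(X[:5]))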

Incremental learning for SVM can be approached in several ways:

(1) KKT-condition-based method: from the new training samples, pick out those that violate the KKT (Karush-Kuhn-Tucker) conditions of the already-trained classifier, combine them with the original support vectors to form a new training set, retrain, and repeat the same process (a rough sketch follows after this list).

(2) Batch-SVM: train on the original support vectors plus the new training samples;

(3) Incremental learning methods proper: these are more complicated and require more iterations.
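A rough sketch of idea (1), for the binary case only. I use scikit-learn's SVC as a stand-in trainer, and "violates the KKT conditions" is approximated by the functional margin y * f(x) falling below 1 for a new sample; this simplification and all the names here are mine, not a faithful implementation of any published incremental algorithm.

import numpy as np
from sklearn.svm import SVC

def incremental_update(model, X_old, y_old, X_new, y_new):
    # labels are assumed to be +1 / -1
    sv = model.support_                          # indices of the old support vectors
    margin = y_new * model.decision_function(X_new)
    viol = margin < 1                            # new samples violating the KKT conditions
    X_inc = np.vstack([X_old[sv], X_new[viol]])
    y_inc = np.concatenate([y_old[sv], y_new[viol]])
    return SVC(kernel="rbf", C=1.0).fit(X_inc, y_inc)

Repeating this whenever a new batch arrives only approximates the full batch solution; it is not an exact equivalent.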

Some other things about SVM:

(1) Removing the non-support vectors from the training data (including, in the soft-margin case, the correctly classified samples lying outside the margin band) leaves the optimization result unchanged, because they correspond to inactive constraints in the original optimization problem, which has a global optimum (a small numerical check follows after this list);

(2) Hard-margin SVM and 2-norm soft-margin SVM (L2-SVM) have unique solutions, while 1-norm soft-margin SVM (L1-SVM) may not.
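Point (1) is easy to check numerically. A small sketch, again assuming scikit-learn and a toy dataset of my own choosing: train once on all the data, retrain only on the support vectors, and compare the two decision functions.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

full = SVC(kernel="linear", C=10.0).fit(X, y)
sv_only = SVC(kernel="linear", C=10.0).fit(X[full.support_], y[full.support_])

# the two decision functions should agree up to the solver's numerical tolerance
print(np.allclose(full.decision_function(X), sv_only.decision_function(X), atol=1e-3))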

The following are my study notes; your comments are welcome.

[Study-note images omitted; the original post attached pictures via the Jianshu app.]