In the previous SVM article, we derived the maximum-margin objective and its loss function step by step, walking through the principles of SVM on the whiteboard, and introduced the hard-margin formulation together with its formula derivation. In this section, I will introduce the kernel function in SVM in detail.
All along, we have only known that the kernel function lets SVM achieve nonlinear separability by working in a high-dimensional space. So under what circumstances was the kernel function proposed? And what kinds of kernel functions are there?
This article explains the SVM kernel function from two perspectives:
- Nonlinearity leads to a high-dimensional transformation (the model perspective);
- The dual representation brings in the inner product (the optimization perspective).
From linearly separable to linearly inseparable
The following table summarizes how the perceptron (PLA) and SVM evolve from the linearly separable case to the nonlinear case.
| Linearly separable | Slightly noisy (a few errors) | Strictly nonlinear |
|---|---|---|
| PLA | Pocket Algorithm | $\phi(x)$ + PLA |
| Hard-Margin SVM | Soft-Margin SVM | $\phi(x)$ + Hard-Margin SVM |
So when the data are linearly inseparable, what can we do to make the model separate them? As mentioned above, there are two angles to think from.
1. Nonlinearity brings a high-dimensional transformation, introducing the mapping $\phi(x)$
We know that data are more likely to be linearly separable in a high-dimensional space than in a low-dimensional one. This is a provable result (essentially Cover's theorem); here we only need the conclusion.
So we can look for a function that maps the features in the input space into a higher-dimensional space.
For example, suppose a point $x = (x_1, x_2)$ in the input space is two-dimensional. We map it to three dimensions through a function $\phi(x)$; one common mapping from two-dimensional to three-dimensional space is expressed as:

$$\phi(x) = \left(x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2\right)$$
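As a minimal sketch (assuming the quadratic map $\phi(x_1, x_2) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ above; the dataset is my own toy example), the following Python snippet shows how points that no straight line can separate in 2-D become separable by a plane after the mapping to 3-D:

```python
import numpy as np

def phi(x):
    """Quadratic feature map from the 2-D input space to a 3-D feature space."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

# A tiny "ring" dataset: class +1 lies near the origin, class -1 lies farther out.
# No straight line in 2-D separates the inner points from the outer ones.
X = np.array([[0.5, 0.0], [0.0, 0.5], [-0.5, 0.0], [0.0, -0.5],   # inner points (+1)
              [2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]])  # outer points (-1)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

Z = np.array([phi(x) for x in X])  # mapped to 3-D

# In 3-D the plane z1 + z3 = 1 (i.e. x1^2 + x2^2 = 1) separates the two classes.
scores = 1.0 - (Z[:, 0] + Z[:, 2])
print(np.sign(scores))  # matches y: the data are linearly separable after the map
```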
2. The dual representation brings the inner product, introducing the kernel function
From another point of view, recall the SVM loss function we derived earlier. In the dual problem of hard-margin SVM, the final optimization depends on the data only through inner products $x_i^T x_j$, and in the solution only the support vectors contribute.
After the high-dimensional mapping, we therefore simply replace the inner product $x_i^T x_j$ with the inner product $\phi(x_i)^T \phi(x_j)$.
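For reference, a standard form of the hard-margin dual problem (the notation here is mine and may differ slightly from the earlier derivation) makes this explicit; the data enter only through $x_i^T x_j$, and the kernelized version swaps in $\phi(x_i)^T \phi(x_j)$:

$$
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j \, y_i y_j \, x_i^{T} x_j \\
\text{s.t.}\quad & \sum_{i=1}^{m} \alpha_i y_i = 0, \qquad \alpha_i \ge 0, \quad i = 1, \dots, m
\end{aligned}
$$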
In practice, however, $\phi(x)$ may not live in three dimensions or only slightly higher; it may be infinite-dimensional, so computing $\phi(x_i)^T \phi(x_j)$ directly can be extremely hard.
Looking at it another way, all we care about is the value of the inner product $\phi(x_i)^T \phi(x_j)$; we do not care about $\phi(x)$ itself. Is there a way to obtain that inner product directly? The answer is yes.
We can introduce the kernel function: $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$.
With such a kernel, we can compute the needed inner product directly in the input space, instead of computing it in the high-dimensional feature space.
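A quick numerical check (a sketch, assuming the degree-2 polynomial kernel $K(x, z) = (x^T z)^2$, which corresponds exactly to the 2-D-to-3-D map used above):

```python
import numpy as np

def phi(x):
    """Explicit 3-D feature map corresponding to K(x, z) = (x.z)^2 in 2-D."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def kernel(x, z):
    """Degree-2 polynomial kernel, computed entirely in the 2-D input space."""
    return np.dot(x, z) ** 2

rng = np.random.default_rng(0)
x, z = rng.normal(size=2), rng.normal(size=2)

print(kernel(x, z))            # kernel value, computed in the input space
print(np.dot(phi(x), phi(z)))  # same value, up to floating-point error
```

The two numbers agree even though the kernel never builds the high-dimensional representation; this is the whole point of the kernel trick.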
We can sum up three things about the kernel.
- When the data are not linearly separable, we can map the features in the input space into a higher-dimensional space where they become linearly separable.
- In that higher-dimensional space, computing $\phi(x_i)^T \phi(x_j)$ directly is very difficult, because $\phi(x)$ may have infinitely many dimensions.
- Therefore, we introduce the kernel function, which turns the inner product that would have to be computed in the high-dimensional space into a computation carried out in the input space with the same effect, greatly reducing the amount of computation.
Conditions for a function to be a kernel
Theorem: Let $\mathcal{X}$ be the input space and $K(\cdot,\cdot)$ a symmetric function defined on $\mathcal{X} \times \mathcal{X}$. Then $K$ is a kernel function if and only if, for any data set $D = \{x_1, x_2, \dots, x_m\}$, the kernel matrix (Gram matrix) $\mathbf{K}$ with entries $\mathbf{K}_{ij} = K(x_i, x_j)$ is always positive semidefinite.
The theorem shows that a symmetric function can be used as a kernel function as long as its corresponding kernel matrix is positive semidefinite. In fact, for any positive semidefinite kernel matrix, a corresponding mapping $\phi$ can always be found. In other words, any kernel function implicitly defines a feature space called a reproducing kernel Hilbert space (RKHS).
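A minimal numerical illustration of the theorem (a sketch, using the Gaussian/RBF kernel $K(x, z) = \exp(-\gamma \lVert x - z\rVert^2)$ with an arbitrary $\gamma$ of my choosing): the kernel matrix built from any set of points should have no negative eigenvalues.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel evaluated on two input vectors."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 3))  # 20 arbitrary points in 3-D

# Build the 20x20 kernel (Gram) matrix K_ij = K(x_i, x_j).
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

eigvals = np.linalg.eigvalsh(K)  # K is symmetric, so eigvalsh applies
print(eigvals.min())             # >= 0 up to rounding: K is positive semidefinite
```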
Common kernel functions
As described above, the choice of kernel function is very important to the performance of a nonlinear support vector machine. However, we usually do not know the form of the feature mapping, so we cannot directly pick the most suitable kernel for the problem at hand. For this reason, "the choice of kernel function" is regarded as the biggest source of uncertainty in support vector machines. The commonly used kernel functions are the following:
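The table of common kernels appears to be missing from the extracted text; as a reference (these are standard textbook forms and parameterizations, not reproduced from the original article):

$$
\begin{aligned}
\text{Linear:}\quad & K(x_i, x_j) = x_i^{T} x_j \\
\text{Polynomial:}\quad & K(x_i, x_j) = \left(x_i^{T} x_j\right)^{d}, \quad d \ge 1 \\
\text{Gaussian (RBF):}\quad & K(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j\rVert^2}{2\sigma^2}\right), \quad \sigma > 0 \\
\text{Laplacian:}\quad & K(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j\rVert}{\sigma}\right), \quad \sigma > 0 \\
\text{Sigmoid:}\quad & K(x_i, x_j) = \tanh\!\left(\beta\, x_i^{T} x_j + \theta\right), \quad \beta > 0,\ \theta < 0
\end{aligned}
$$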
In addition, new kernel functions can be obtained by combining existing ones (a small numerical check follows this list). For example:
- If $K_1$ and $K_2$ are kernel functions, then for any positive numbers $\gamma_1$ and $\gamma_2$, the linear combination $\gamma_1 K_1 + \gamma_2 K_2$ is also a kernel function.
- If $K_1$ and $K_2$ are kernel functions, then their direct product $K_1 \otimes K_2(x, z) = K_1(x, z)\,K_2(x, z)$ is also a kernel function.
- If $K_1$ is a kernel function, then for any function $g(x)$, $K(x, z) = g(x)\,K_1(x, z)\,g(z)$ is also a kernel function.
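These closure rules can be sanity-checked numerically (a sketch; the base kernels, coefficients, and test points are my own choices): combining two valid Gram matrices by a positive linear combination or an element-wise product should again give a positive semidefinite matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))

# Two base kernels evaluated on the same points: linear and Gaussian (RBF).
K1 = X @ X.T
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K2 = np.exp(-0.5 * sq_dists)

K_sum = 2.0 * K1 + 3.0 * K2   # positive linear combination
K_prod = K1 * K2              # element-wise (direct) product

for name, K in [("linear combination", K_sum), ("product", K_prod)]:
    print(name, np.linalg.eigvalsh(K).min())  # both minima are >= 0, up to rounding
```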
For the nonlinear case, the SVM approach is to choose a kernel function $K(\cdot,\cdot)$ and, by implicitly mapping the data into a higher-dimensional space, solve the problem that the data are not linearly separable in the original space. Thanks to the excellent properties of the kernel function, this nonlinear extension adds very little computational cost, which is quite remarkable.
This is, of course, thanks to the kernel trick: any method, not just SVM, whose computation can be written in terms of inner products between data points can use the kernel trick to obtain a nonlinear extension.
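As an end-to-end sketch (assuming scikit-learn is available; the dataset and parameters are my own choices, not from this article), a kernelized SVM handles data that a linear SVM cannot:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0, gamma="scale")
    clf.fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))

# Typically the linear kernel scores near chance (~0.5) while the RBF kernel
# scores near 1.0, because the kernel implicitly works in a richer feature space.
```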