
1. Linear support vector machines

The idea of a support vector machine is: given a set of training samples $D$, find a separating hyperplane in the sample space, as shown in the figure below:

But some data cannot be separated so easily by a hyperplane, for example the following ring data:
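Ring data like this can be generated with scikit-learn's make_circles; here is a minimal sketch (the factor and noise values are illustrative choices):

>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_circles
>>> # two concentric rings: class 1 inside, class 0 outside,
>>> # so no straight line can separate them
>>> X, y = make_circles(n_samples=100, factor=.1, noise=.1)
>>> plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
>>> plt.title("Ring data: not linearly separable in 2D")
>>> plt.show()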

In practice there is still plenty of nonlinear data, and solving a nonlinear problem directly is far more complex than solving a linear one; the resource cost grows sharply. That is why we use the kernel trick.

2. The kernel function

If the original space is finite-dimensional, that is, the number of attributes is finite, then there must exist a high-dimensional feature space in which the samples are separable.¹

Simply put, the idea is to map the data into a higher-dimensional space in which the mapped data become linearly separable. This may be hard to grasp from text alone, but a picture makes it more intuitive.

For the ring data above, the distribution in 3D space after applying the kernel mapping is shown in the figure below:

The RBF (Gaussian) kernel values are computed with respect to the origin (0, 0) using scikit-learn; the code is attached at the end.

It is not hard to see that the points in this three-dimensional space can easily be separated by a plane, so we no longer have to tackle the nonlinear problem directly.
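As a rough reconstruction of that 3D figure, here is a minimal sketch, assuming the third coordinate is the RBF value with respect to the origin, as the caption above describes:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from sklearn.datasets import make_circles
>>> from sklearn.metrics import pairwise
>>> X, y = make_circles(100, factor=.1, noise=.1)
>>> # lift each 2D point to 3D, using its RBF similarity to (0, 0) as height
>>> z = pairwise.rbf_kernel(X, np.array([[0.0, 0.0]])).ravel()
>>> ax = plt.axes(projection="3d")
>>> ax.scatter(X[:, 0], X[:, 1], z, c=y, cmap="coolwarm")
>>> plt.show()

The inner ring gets lifted high (it is close to the origin) while the outer ring stays low, so a horizontal plane separates the two classes.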

2.1. Radial Basis Function

A radial basis function is a scalar function that is symmetric along the radial direction. It is usually defined as a monotone function of the Euclidean distance between an arbitrary point $x$ in space and some center $x_c$: the farther $x$ is from the center, the smaller the function value.²

The RBF kernel usually refers to the Gaussian kernel, which takes the form:

$$\kappa(\boldsymbol{x}_i, \boldsymbol{x}_j) = \exp\left(-\frac{\left\|\boldsymbol{x}_i - \boldsymbol{x}_j\right\|^2}{2\sigma^2}\right)$$

where $\sigma > 0$ is the bandwidth of the Gaussian kernel and $\boldsymbol{x}_i$ is the $i$-th sample.
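To make the formula concrete, here is a minimal sketch that evaluates the Gaussian kernel by hand with NumPy (σ = 1 is an illustrative choice); the value decreases monotonically as the two points move apart:

>>> import numpy as np
>>> def gaussian_kernel(x_i, x_j, sigma=1.0):
...     # kappa(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))
...     return np.exp(-np.linalg.norm(x_i - x_j) ** 2 / (2 * sigma ** 2))
...
>>> x_c = np.array([0.0, 0.0])
>>> for r in (0.0, 1.0, 2.0):
...     print(r, round(float(gaussian_kernel(np.array([r, 0.0]), x_c)), 4))
...
0.0 1.0
1.0 0.6065
2.0 0.1353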

2.2. Computing the kernel function

The kernel function can be computed with sklearn.metrics.pairwise.rbf_kernel. (The full code can be downloaded from my GitHub; if it helps you, I hope you'll give it a star.)

>>> import numpy as np
>>> from sklearn.datasets import make_circles
>>> from sklearn.metrics import pairwise

>>> # draw circles data
>>> X, y = make_circles(100, factor=.1, noise=.1)
>>> # calculate the rbf (gaussian) kernel between X and the origin (0, 0)
>>> K = pairwise.rbf_kernel(X, np.array([[0.0, 0.0]]))
>>> print(K)
[[0.58581766]
 [0.74202632]
 ...
 [0.63660304]
 [0.98952965]]

With this transformation, we can use an SVM to find a plane that separates the two classes:
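A minimal sketch of that final step, reusing the ring data from above with scikit-learn's SVC and its RBF kernel (hyperparameters left at their defaults):

>>> from sklearn.datasets import make_circles
>>> from sklearn.svm import SVC
>>> X, y = make_circles(100, factor=.1, noise=.1)
>>> # with kernel='rbf', the SVM separates the two rings that
>>> # no straight line could split in the original 2D space
>>> clf = SVC(kernel="rbf").fit(X, y)
>>> print(clf.score(X, y))  # should be at or near 1.0 on this well-separated data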


  1. *Machine Learning*, Chapter 6, Support Vector Machines, Section 6.3, Kernel Functions. ↩
  2. Baidu Baike, "Gaussian kernel". ↩