5 Kernel Methods

Kernel methods are widely used in machine learning. They are flexible techniques that can be used to extend algorithms such as SVMs to define nonlinear decision boundaries. Similar extensions apply to other algorithms that depend only on inner products between sample points, many of which will be examined in later chapters. The main idea behind these methods is to implicitly define an inner product in a higher-dimensional space via so-called kernels or kernel functions, provided they satisfy certain symmetry and positive definiteness conditions. Replacing the original inner product in the input space with a positive definite kernel immediately extends algorithms such as SVMs to linear separation in this higher-dimensional space or, equivalently, to nonlinear separation in the input space. In this chapter, we give the main definitions and key properties of positive definite symmetric kernels, including the proof that they define inner products in Hilbert spaces, as well as their closure properties. We then extend SVMs using these kernels and present some theoretical results, including a general margin-based learning guarantee for kernel-based hypothesis sets. We also introduce negative definite symmetric kernels and show how they are related to the construction of positive definite kernels, in particular from distances or metrics. Finally, we illustrate the design of kernels for nonvectorial discrete structures by introducing rational kernels, a general family of kernels for sequences. We describe an efficient algorithm for computing these kernels and illustrate it with several examples.


5.1 Introduction



Figure 5.1 Non-linearly separable case. The classification task consists of distinguishing solid squares from solid circles. (a) No hyperplane can separate the two populations. (b) A nonlinear mapping can be used instead.

In the previous chapter, we presented SVMs, a linear classification algorithm that is both effective in applications and benefits from a strong theoretical justification. In practice, however, linear separation is often not possible. Figure 5.1a shows an example where any hyperplane crosses both populations. However, a more complex function can be used to separate the two sets, as shown in Figure 5.1b. One way to define such nonlinear decision boundaries is to use a nonlinear mapping Φ from the input space X to a higher-dimensional space ℍ, where linear separation is possible.
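To make this idea concrete, here is a minimal hypothetical sketch (NumPy, not from the original text) in the spirit of figure 5.1: points labeled by whether they lie inside the unit circle cannot be separated by any hyperplane in the input plane, yet after the nonlinear mapping Φ(x) = (x₁², x₂²) they are separated by the line z₁ + z₂ = 1.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 1, 1, -1)  # inside vs. outside the unit circle

# Nonlinear mapping Phi: R^2 -> R^2, Phi(x) = (x1^2, x2^2).
Z = X ** 2

# In the feature space, the hyperplane z1 + z2 = 1 separates the two classes exactly.
pred = np.where(Z[:, 0] + Z[:, 1] < 1, 1, -1)
print(np.all(pred == y))  # True: linear separation is possible after the mapping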

In practice, the dimension of ℍ can be very large. For example, in the case of document classification, one may wish to use features based on sequences of three consecutive words, that is, trigrams. Thus, with a vocabulary of just 100,000 words, the dimension of the feature space ℍ reaches 10^15. On the positive side, the margin bounds presented in Section 4.4 show that, remarkably, the generalization ability of SVMs and other large-margin classification algorithms does not depend on the dimension of the feature space, but only on the margin ρ and the number of training samples m. Thus, with a favorable margin ρ, such algorithms can succeed even in very high-dimensional spaces. However, determining the hyperplane solution requires multiple inner product computations in the high-dimensional space, which can become very costly.
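To spell out the arithmetic behind this count, if each coordinate of the feature vector indexes one possible trigram, i.e., one ordered sequence of three words drawn from a vocabulary V with |V| = 10^5, then

\dim(\mathbb{H}) = |V|^3 = (10^5)^3 = 10^{15}.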

One solution to this problem is to use kernel methods, which are based on kernels or kernel functions.

Definition 5.1 Kernels

A function K: X × X → ℝ is called a kernel over X. The idea is to define a kernel K such that for any two points x, x′ ∈ X, K(x, x′) is equal to an inner product of the vectors Φ(x) and Φ(x′):¹


\forall x, x' \in X, \quad K(x, x') = \langle \Phi(x), \Phi(x') \rangle, \qquad (5.1)

1. To distinguish this inner product from the inner product of the input space, it will typically be denoted by ⟨·, ·⟩.

for some mapping Φ: X → ℍ to a Hilbert space ℍ called the feature space. Since an inner product is a measure of the similarity of two vectors, K is often interpreted as a similarity measure between elements of the input space X. An important advantage of such a kernel K is efficiency: K is often significantly more efficient to compute than Φ and an inner product in ℍ. We will see several common examples where the computation of K(x, x′) can be achieved in O(N), while that of ⟨Φ(x), Φ(x′)⟩ typically requires O(dim(ℍ)) work, with dim(ℍ) ≫ N. Furthermore, in some cases the dimension of ℍ is infinite. Perhaps an even more important benefit of such a kernel K is flexibility: there is no need to explicitly define or compute the mapping Φ. The kernel K can be chosen arbitrarily, as long as the existence of Φ is guaranteed, that is, as long as K satisfies Mercer's condition (see Theorem 5.1).
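As an illustrative sketch of this efficiency argument and of equation (5.1) (hypothetical NumPy code, not from the text; the degree-2 polynomial kernel and the particular feature map below are my own choices), K(x, x′) = (x · x′ + c)² can be evaluated in O(N), whereas the equivalent explicit feature map Φ has on the order of N² coordinates:

import numpy as np

def poly2_kernel(x, xp, c=1.0):
    # O(N): a single inner product in the input space, then a square.
    return (np.dot(x, xp) + c) ** 2

def poly2_features(x, c=1.0):
    # Explicit feature map of dimension O(N^2): all monomials of degree at most 2,
    # weighted so that <Phi(x), Phi(x')> = (x . x' + c)^2.
    N = len(x)
    feats = [c]
    feats += [np.sqrt(2 * c) * x[i] for i in range(N)]
    feats += [x[i] ** 2 for i in range(N)]
    feats += [np.sqrt(2) * x[i] * x[j] for i in range(N) for j in range(i + 1, N)]
    return np.array(feats)

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(5), rng.standard_normal(5)
assert np.isclose(poly2_kernel(x, xp), poly2_features(x) @ poly2_features(xp))  # eq. (5.1)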

Theorem 5.1 Mercer's condition

Let X ⊂ ℝ^N be a compact set and let K: X × X → ℝ be a continuous and symmetric function. Then, K admits a uniformly convergent expansion of the form


K(x, x') = \sum_{n=0}^{\infty} a_n \phi_n(x) \phi_n(x'),

with a_n > 0, if and only if for any square integrable function c (c ∈ L₂(X)), the following condition holds:


\int\int_{X \times X} c(x) c(x') K(x, x') \, dx \, dx' \ge 0.

This condition is important to ensure the convexity of the optimization problem for algorithms such as SVMs, and thus their convergence guarantees. A condition equivalent to Mercer's condition, under the assumptions of the theorem, is that the kernel be positive definite symmetric (PDS). This property is in fact more general since it does not require any assumption about X. In the next section, we give the definition of this property, present several commonly used examples of PDS kernels, then show that PDS kernels induce an inner product in a Hilbert space, and prove several general closure properties of PDS kernels.
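As a rough numerical illustration of the integral condition above (a hypothetical sketch; the Gaussian kernel, the grid quadrature on X = [0, 1], and the particular test function c are my own choices), one can check that the discretized Mercer integral is non-negative:

import numpy as np

# Grid quadrature approximation of the Mercer integral on X = [0, 1].
xs = np.linspace(0.0, 1.0, 200)
dx = xs[1] - xs[0]
K = np.exp(-((xs[:, None] - xs[None, :]) ** 2) / 0.1)  # Gaussian kernel values K(x, x')
c = np.sin(6 * np.pi * xs) - 0.3                        # an arbitrary sign-changing c in L2([0, 1])
integral = (c @ K @ c) * dx * dx                        # approximates the double integral of c(x) c(x') K(x, x')
print(integral >= -1e-12)  # non-negative up to round-off, as Mercer's condition requires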

5.2 Positive definite symmetric kernels

5.2.1 Definitions

Definition 5.2 Positive definite symmetric kernel

A kernel K: X × X → ℝ is said to be positive definite symmetric (PDS) if for any {x_1, …, x_m} ⊆ X, the matrix K = [K(x_i, x_j)]_{ij} ∈ ℝ^{m×m} is symmetric positive semidefinite (SPSD).

K is symmetric positive semidefinite if it is symmetric and one of the following two equivalent conditions holds:

  • the eigenvalues of K are non-negative;
  • for any column vector c = (c_1, …, c_m)^⊤ ∈ ℝ^{m×1},

c^\top \mathbf{K} c = \sum_{i,j=1}^{m} c_i c_j K(x_i, x_j) \ge 0. \qquad (5.2)

For a sample S = (x_1, …, x_m), the matrix K = [K(x_i, x_j)]_{ij} ∈ ℝ^{m×m} is called the kernel matrix or the Gram matrix associated with K and the sample S.
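The sketch below (a hypothetical NumPy example, not part of the text) builds the Gram matrix of a random sample for the Gaussian kernel exp(−‖x − x′‖²/2), a standard PDS kernel, and checks the two equivalent SPSD conditions of definition 5.2 numerically:

import numpy as np

rng = np.random.default_rng(0)
m, N = 20, 3
S = rng.standard_normal((m, N))  # sample S = (x_1, ..., x_m) with x_i in R^N

# Kernel (Gram) matrix K_ij = K(x_i, x_j) for the Gaussian kernel.
sq_dists = np.sum((S[:, None, :] - S[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)

# Condition 1: the eigenvalues of K are non-negative (up to round-off).
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))

# Condition 2 (eq. 5.2): c^T K c >= 0 for any column vector c.
for _ in range(1000):
    c = rng.standard_normal(m)
    assert c @ K @ c >= -1e-10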

Let us insist on the terminology: the kernel matrix associated with a positive definite kernel is positive semidefinite; this is the correct mathematical terminology. Nevertheless, the reader should be aware that in the context of machine learning, some authors use the term positive definite kernel to imply a positive definite kernel matrix, or use new terms such as positive semidefinite kernel. The following are some standard examples of PDS kernels commonly used in applications.

Example 5.1 Polynomial kernel