Hello and welcome to your Tuesday machine learning session. Today's article continues our coverage of the SVM model.

Maybe you are tired of seeing the SVM model and think I have run out of new tricks and keep rehashing the same old thing. In fact, there really is not much new left, and this is the last article on the topic. We start our deep learning journey next week. I'm sure many of you have been looking forward to that day for a long time, and so have I, because most of the topics in this series only come up in interviews, and I haven't had an interview in a long time. So let's gather our excitement and finish the last bits of SVM.

Although only one point is left, today's content is very important and is arguably the core of SVM interview questions. An interviewer may not ask you about the dual problem or Lagrange multipliers (he may not be able to answer them himself), but the kernel function we discuss today is almost certain to come up. On the surface it looks like the most important part of SVM. In fact, when I had just changed careers and was preparing for interviews, the kernel function was the only thing I knew about the SVM model. I think you can see what that implies.

What exactly is a kernel

First, let's introduce the concept of the kernel function. You may be curious: we have already derived the SVM model in full, so where does a kernel function come from? In fact, the kernel function is one of the most exciting parts of SVM and is very important to it, because it is what established SVM's standing in the field. It can be said to be the most distinctive feature of the SVM model.

Before we get to the kernel, let's look at a problem that is famous in the history of machine learning: the XOR problem. In binary arithmetic there is an operation called exclusive or (XOR), which returns 0 if its two inputs are the same and 1 otherwise. If our data were distributed like XOR, it would look something like this:


Looking at the diagram above, the problem is that no single straight line can complete this classification. A line divides the plane into only two regions, but the diagram clearly has four.

Now suppose we map the data to a higher dimension. The figure above is two-dimensional; if we map it into three dimensions, we can separate the samples with a plane. In other words, a mapping function takes the samples from n dimensions to n+1 or more dimensions, so that data that was linearly inseparable becomes linearly separable, and we can solve problems that could not be solved before.
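To make the idea concrete, here is a minimal sketch of lifting the four XOR points into three dimensions. The mapping `lift` (adding the product of the two coordinates as a third feature) and the plane given by `w` and `b` are hand-picked for illustration and are assumptions, not something from the original derivation.

```python
import numpy as np

# The four XOR points, labeled +1 when the inputs differ and -1 otherwise.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

def lift(X):
    """Hypothetical mapping (x1, x2) -> (x1, x2, x1 * x2)."""
    return np.column_stack([X, X[:, 0] * X[:, 1]])

# A hand-picked plane w . z + b = 0 in the lifted 3-D space (an assumption).
w = np.array([1.0, 1.0, -2.0])
b = -0.5

Z = lift(X)                      # shape (4, 3)
scores = Z @ w + b               # signed score of each point relative to the plane
predictions = np.sign(scores).astype(int)
print(predictions, (predictions == y).all())
```

In the original two dimensions no such `w` and `b` exist: a separating line would need `b < 0`, `w1 + b > 0`, `w2 + b > 0`, and `w1 + w2 + b < 0`, and adding the two middle inequalities contradicts the other two.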


So what is a kernel? It is a family of functions whose input is a sample x and whose output is that sample mapped to a higher dimension. Most functions that can do this can be regarded as kernel functions (this is not exact, just a simplification for ease of understanding). Some rather exotic functions are also kernels, but they are of little practical value to us and are rarely used. Only a handful of kernel functions are in common use.

Method of use

Now that we know what a kernel is, how does it work?

This is not difficult. In mathematics the hard part is often representation: describing and expressing a problem clearly is probably the hardest step, and once it is expressed, solving it is usually much easier. So let's start by expressing the problem, using the letter $\phi$ to denote the kernel. As I said, the input to the kernel is a sample $x$, so the mapped sample is $\phi(x)$.

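Remember the formula we arrived at by the end of the last derivation? Let's write it out as a quick review.

$$\max_{\alpha} \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C$$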

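All we have to do is plug in the kernel, that's all. Replacing each sample $x$ with its mapped version $\phi(x)$, we get:

$$\max_{\alpha} \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j) \quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C$$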

There is a small problem here. The function $\phi$ maps $x$ to a higher dimension. Say $x$ itself has 10 dimensions and $\phi$ maps it to 1,000 dimensions. The linear inseparability may well be solved, but another problem appears: the computational cost goes up. Computing $x_i^T x_j$ used to take 10 multiplications, but after the mapping, $\phi(x_i)^T \phi(x_j)$ takes 1,000. That doesn't fit our wish to get something for nothing, so we put a restriction on kernel functions: only mappings that come without this extra cost are called kernel functions.

So let's write down the condition that has to be satisfied; it is really simple. Call a kernel function that satisfies the condition $K$. Then $K$ should satisfy:

$$K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$$

That is, evaluating $K$ on $x_i$ and $x_j$ gives the same result as taking the dot product of the mapped samples, so the mapping is achieved without increasing the computational cost. There is a formal mathematical definition of a kernel, which I have left out here: it is too complicated to be of practical use, and interviewers are unlikely to ask for it. You only need to know its properties. Since the same few kernel functions are used over and over, let's simply memorize them.
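A small numerical check can make this property tangible. The sketch below assumes one specific mapping chosen for illustration, `phi`, which sends a 10-dimensional sample to all 100 pairwise products of its coordinates, and pairs it with the function `K(x, z) = (x . z)^2`. Both names are hypothetical and not from the original article.

```python
import numpy as np

def phi(x):
    """Explicit mapping: all pairwise products x_i * x_j (n -> n * n dimensions)."""
    return np.outer(x, x).ravel()

def K(x, z):
    """Gives the same value as phi(x) . phi(z) without ever building phi."""
    return (x @ z) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=10)
z = rng.normal(size=10)

explicit = phi(x) @ phi(z)   # dot product in 100 dimensions
shortcut = K(x, z)           # dot product in 10 dimensions, then one squaring
print(phi(x).shape, np.isclose(explicit, shortcut))
```

The two values agree because the sum over all pairs of `x_i * x_j * z_i * z_j` factors into `(x . z)^2`, so the higher-dimensional dot product is obtained at the price of the original one.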

Let’s take a look at the common kernel functions. There are about four kinds:

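  1. Linear kernel, which in effect means using no kernel at all. Written out: $K(x_i, x_j) = x_i^T x_j$
  2. Polynomial kernel, which is equivalent to a polynomial transformation: $K(x_i, x_j) = (x_i^T x_j + b)^d$, where $b$ and $d$ are parameters that we set
  3. Gaussian kernel, which is used a lot: $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$
  4. Sigmoid kernel, whose formula is: $K(x_i, x_j) = \tanh(\beta x_i^T x_j + \theta)$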
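The four kernels can be sketched in a few lines of NumPy. This assumes the usual textbook forms: `x . z` for the linear kernel, `(x . z + b)^d` for the polynomial kernel, `exp(-||x - z||^2 / (2 * sigma^2))` for the Gaussian kernel, and `tanh(beta * x . z + theta)` for the sigmoid kernel. The parameter names `sigma`, `beta`, and `theta` and all default values are illustrative assumptions.

```python
import numpy as np

def linear_kernel(x, z):
    # Plain dot product: equivalent to using no kernel at all.
    return x @ z

def polynomial_kernel(x, z, b=1.0, d=3):
    # b (offset) and d (degree) are parameters chosen by the user.
    return (x @ z + b) ** d

def gaussian_kernel(x, z, sigma=1.0):
    # Depends only on the squared distance between the two samples.
    diff = x - z
    return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, z, beta=1.0, theta=0.0):
    # beta scales the dot product, theta shifts it (assumed parameter names).
    return np.tanh(beta * (x @ z) + theta)

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
for kernel in (linear_kernel, polynomial_kernel, gaussian_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, z))
```

Each function takes two samples and returns a single number, which is all the SVM needs from a kernel.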

Using the kernel is very simple: we use the function $K(x_i, x_j)$ to replace the original $x_i^T x_j$, and this has no effect on the derivation of the SVM model. That is why, when we derived the SMO algorithm in the last article, we wrote $x_i^T x_j$ as $K_{ij}$ during the optimization: it was paving the way for this later explanation of kernel functions.
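The substitution can be sketched in code by keeping every pairwise value in a matrix and letting the objective see only that matrix. This assumes the standard SVM dual objective, the sum of `alpha_i` minus half the sum of `alpha_i * alpha_j * y_i * y_j * K[i, j]`. The helper names `gram_matrix` and `dual_objective` are hypothetical, and the `alpha` values are arbitrary illustrative numbers, not a solution of the optimization.

```python
import numpy as np

def gram_matrix(X, kernel):
    """K[i, j] = kernel(x_i, x_j) for every pair of samples."""
    n = X.shape[0]
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def dual_objective(alpha, y, K):
    """sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j K[i, j]."""
    v = alpha * y
    return alpha.sum() - 0.5 * (v @ K @ v)

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])
alpha = np.array([0.5, 0.5, 0.5, 0.5])   # arbitrary values, not an optimum

def linear(a, b):
    return a @ b

def gaussian(a, b):
    # Gaussian kernel with sigma = 1 (an assumed setting).
    return np.exp(-np.sum((a - b) ** 2) / 2.0)

# Swapping the kernel changes only the matrix K; dual_objective is untouched.
value_linear = dual_objective(alpha, y, gram_matrix(X, linear))
value_gaussian = dual_objective(alpha, y, gram_matrix(X, gaussian))
print(value_linear, value_gaussian)
```

Because the objective only ever reads `K[i, j]`, any optimizer written against that matrix works unchanged for every kernel.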

Personally, I feel that compared with hard margin, soft margin, the dual problem, and the derivation of the SMO algorithm, the principle of the kernel function is the simplest. You can understand the kernel function even if you don't understand the principles of the SVM model at all. So this article should not have been too demanding to read.

That's all for this article. If you enjoyed it, I'd appreciate your support (follow, share, like). Thanks again for reading.
