With a support vector machine (SVM) that uses a kernel function, how do we make predictions once we know the parameters, and how do we train the hypothesis function?


First, we use the training samples as landmarks:

For each x, compute its feature f, which is a vector:

If the following inequality holds, we predict y = 1, i.e., a positive sample:
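The two steps above can be sketched in code. This is a minimal illustration assuming a Gaussian (RBF) similarity for the features; the landmark positions, `theta`, and `sigma2` values here are made up for the example.

```python
import numpy as np

def gaussian_features(x, landmarks, sigma2):
    """f_i = exp(-||x - l^(i)||^2 / (2 * sigma^2)) for each landmark l^(i)."""
    diffs = landmarks - x                    # shape (m, n)
    sq_dists = np.sum(diffs ** 2, axis=1)    # ||x - l^(i)||^2 for each landmark
    return np.exp(-sq_dists / (2 * sigma2))  # shape (m,)

# The training samples double as the landmarks, as described above.
landmarks = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])

# Hypothetical learned parameters: theta_0 plus one weight per landmark.
theta = np.array([0.5, -2.0, 1.0, 1.0])

x = np.array([0.1, 0.1])
f = gaussian_features(x, landmarks, sigma2=1.0)
score = theta[0] + theta[1:] @ f   # theta^T f, with the intercept term
y_pred = 1 if score >= 0 else 0    # predict y = 1 when theta^T f >= 0
```

Note how each feature measures similarity to one landmark: points near a landmark get a feature value close to 1, distant points a value close to 0.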


But how do we choose the parameters? Minimizing the following cost function gives us the parameter vector θ:
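The original image of the cost function is missing here; reconstructed from the standard formulation in Ng's course (with features f in place of x), it reads:

```latex
\min_\theta \; C \sum_{i=1}^{m} \Big[ y^{(i)} \, \mathrm{cost}_1\big(\theta^T f^{(i)}\big)
  + \big(1 - y^{(i)}\big) \, \mathrm{cost}_0\big(\theta^T f^{(i)}\big) \Big]
  + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2
```

Here cost₁ and cost₀ are the hinge-style losses for positive and negative samples, m is the number of samples, and n the number of features.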


There are still two coefficients left to choose: C and σ^2. C equals 1/λ, and we saw earlier how changing λ pushes the model toward high bias or high variance. C can therefore be selected by considering the effect of 1/λ on bias and variance: a large C (small λ) gives low bias but high variance, while a small C (large λ) gives high bias but low variance.


Now let's observe the influence of C on the SVM with actual data. When C is very small, the decision boundary for the following data looks like this:

If we instead set C to 100, the decision boundary becomes:

Now do you have an intuition for C?
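This intuition can be reproduced on toy data. The sketch below (not the figures from the notes) uses scikit-learn's `SVC`, whose `C` parameter plays exactly this role: with a large C (small λ) the SVM tries hard to classify every training point, including an outlier, while a small C tolerates misclassifying it. The data, `gamma`, and C values are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated blobs...
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
# ...plus one outlier labeled 1 sitting inside class 0's region.
X = np.vstack([X, [[0.0, 0.2]]])
y = np.append(y, 1)

acc = {}
for C in (0.01, 100.0):
    clf = SVC(C=C, kernel="rbf", gamma=0.5).fit(X, y)
    acc[C] = clf.score(X, y)  # training accuracy

# The large-C model should fit the training set at least as well as
# the heavily regularized small-C model.
```

Plotting the two decision boundaries (e.g. over a mesh grid) would show the small-C boundary staying smooth and ignoring the outlier, and the large-C boundary bending toward it.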


When σ^2 is large, the graph is:

Feature f changes smoothly, so the model shows high bias and low variance. When σ^2 is very small, the graph is:

Feature f changes sharply, so the model shows high variance and low bias.
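A quick numeric check makes this concrete. For a point one unit away from a landmark, the Gaussian feature f = exp(-d^2 / (2σ^2)) is still large when σ^2 is big (a smooth, slowly decaying feature) and nearly zero when σ^2 is small (a spiky one); the σ^2 values below are just illustrative.

```python
import numpy as np

def similarity(d, sigma2):
    """Gaussian similarity f = exp(-d^2 / (2 * sigma^2)) at distance d."""
    return np.exp(-d ** 2 / (2 * sigma2))

d = 1.0  # one unit away from the landmark
f_large = similarity(d, sigma2=10.0)  # decays slowly: smooth feature
f_small = similarity(d, sigma2=0.1)   # decays sharply: spiky feature
```

In scikit-learn's RBF kernel the same knob appears as `gamma`, which corresponds to 1/(2σ^2), so a large σ^2 means a small `gamma` and vice versa.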


With that, the complete support vector machine algorithm using a kernel function is done.



How to choose between logistic regression and SVM


If the number of features is much larger than the number of samples, use logistic regression or a linear kernel (an SVM without a kernel function).


If the number of features is small and the number of samples is intermediate, use a Gaussian kernel.


If the number of features is small and the number of samples is very large, use logistic regression or a linear kernel (an SVM without a kernel function).
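The three rules of thumb above can be encoded as a small helper. The function name and the numeric thresholds here are illustrative choices, not values from the notes.

```python
def choose_model(n_features, m_samples):
    """Rule-of-thumb model choice given feature count n and sample count m."""
    if n_features >= m_samples:
        # Many features relative to samples: keep the model linear.
        return "logistic regression or linear-kernel SVM"
    if m_samples <= 10_000:
        # Few features, intermediate number of samples.
        return "SVM with a Gaussian kernel"
    # Few features, very many samples: a Gaussian kernel is too slow.
    return "logistic regression or linear-kernel SVM"

choice = choose_model(n_features=100, m_samples=5_000)
```

The last rule exists because training a kernel SVM scales poorly with the number of samples, while logistic regression and linear SVMs remain tractable.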





P.S. This article is based on my study notes from Ng's machine learning course. If you want to learn machine learning together, you can follow the WeChat public account "SuperFeng". Looking forward to meeting you.