Author | Soner Yıldırım, Compiled by | VK, Source | Towards Data Science

Support vector machine (SVM) is a widely used supervised machine learning algorithm. It is mainly used for classification tasks, but is also suitable for regression tasks.

In this article, we will delve into two important hyperparameters of support vector machines, C and gamma, and explain their effects visually. I assume you have a basic understanding of the algorithm, so we can focus on these hyperparameters.

Support vector machines use a decision boundary to separate data points belonging to different classes. When determining the decision boundary, a soft-margin SVM (soft margin means allowing some data points to be misclassified) tries to solve an optimization problem with the following objectives:

  • Maximize the distance between the decision boundary and the classes (i.e., the support vectors)

  • Maximize the number of correctly classified points in the training set

Obviously, there is a trade-off between these two objectives, and it is controlled by C, which adds a penalty for each misclassified data point.

If C is small, the penalty for misclassified points is low, so a decision boundary with a large margin is chosen at the expense of more misclassifications.

When C is large, the SVM tries to minimize the number of misclassified examples, because the high penalty leads to a decision boundary with a smaller margin. The penalty is not the same for every misclassified example; it is proportional to the distance from the decision boundary.
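This trade-off is visible in the standard soft-margin objective (a well-known formulation, included here for reference; it is not spelled out in the original article), where the slack variable ξᵢ measures how far point i falls on the wrong side of its margin:

minimize (1/2)‖w‖² + C Σᵢ ξᵢ   subject to   yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ,   ξᵢ ≥ 0

A small C lets the slack terms grow cheaply (wide margin, more misclassifications), while a large C pushes them toward zero (narrow margin, fewer misclassifications). This is also why the penalty is proportional to the distance from the boundary: ξᵢ is exactly that distance, measured in margin units.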

This will become clearer with the following examples. Let’s start by importing the libraries and creating a synthetic dataset.

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           random_state=42)

plt.figure(figsize=(10, 6))
plt.title("Synthetic Binary Classification Dataset", fontsize=18)
plt.scatter(X[:,0], X[:,1], c=y, cmap='cool')

We will first train a linear SVM, which only requires tuning C. Then we will implement an SVM with an RBF kernel and also tune the gamma parameter.

To plot the decision boundaries, we will use the plot_svc_decision_function from the SVM chapter of Jake VanderPlas’s Python Data Science Handbook: jakevdp.github.io/PythonDataS…
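The original article only links to that helper; for completeness, a version along the lines of the handbook’s plot_svc_decision_function is sketched below (adapted from the handbook chapter, so details may differ slightly from the exact original):

def plot_svc_decision_function(model, ax=None, plot_support=True):
    """Plot the decision function and margins for a 2D SVC."""
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()

    # Evaluate the decision function on a grid covering the current axes
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)

    # Draw the decision boundary (level 0) and the margins (levels -1 and 1)
    ax.contour(X, Y, P, colors='k', levels=[-1, 0, 1],
               alpha=0.5, linestyles=['--', '-', '--'])

    # Circle the support vectors
    if plot_support:
        ax.scatter(model.support_vectors_[:, 0],
                   model.support_vectors_[:, 1],
                   s=300, linewidth=1, facecolors='none', edgecolors='k')
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)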

We can now create two linear SVM classifiers with different C values.

clf = SVC(C=0.1, kernel='linear').fit(X, y)

plt.figure(figsize=(10, 6))
plt.title("Linear kernel with C = 0.1", fontsize=18)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='cool')
plot_svc_decision_function(clf)

Simply change the C value to 100 to generate the following plot.
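For reference, this is the same code as above with only C changed:

clf = SVC(C=100, kernel='linear').fit(X, y)

plt.figure(figsize=(10, 6))
plt.title("Linear kernel with C = 100", fontsize=18)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='cool')
plot_svc_decision_function(clf)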

As we increase C, the margin gets smaller. Therefore, models with a low C value are more generalized. The difference becomes more pronounced as the dataset grows.

With a linear kernel, the hyperparameters only affect the model to a limited extent. With nonlinear kernels, their influence is much more pronounced.

Gamma is a hyperparameter used with nonlinear support vector machines. One of the most commonly used nonlinear kernels is the radial basis function (RBF). The gamma parameter of the RBF kernel controls the distance of influence of a single training point.

A lower gamma indicates a larger radius of similarity, which results in more points being grouped together. In the case of a high gamma, the points must be very close to each other to be considered the same group (or class). As a result, models with very large gamma values tend to overfit.
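To make this concrete: the RBF kernel computes the similarity between two points as K(x, x′) = exp(−γ‖x − x′‖²) (the standard definition, added here for reference). A larger gamma makes the similarity decay faster with distance, so each training point’s radius of influence shrinks; a smaller gamma makes the decay slower, so each point influences a wider region.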

Let’s draw predictions for three support vector machines with different gamma values.

clf = SVC(C=1, kernel='rbf', gamma=0.01).fit(X, y)
y_pred = clf.predict(X)

plt.figure(figsize=(10, 6))
plt.title("Predictions of RBF kernel with C=1 and Gamma=0.01", fontsize=18)
plt.scatter(X[:, 0], X[:, 1], c=y_pred, s=50, cmap='cool')
plot_svc_decision_function(clf)

Simply change gamma to generate the following plot.
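For example, for gamma=1 (and analogously for gamma=5):

clf = SVC(C=1, kernel='rbf', gamma=1).fit(X, y)
y_pred = clf.predict(X)

plt.figure(figsize=(10, 6))
plt.title("Predictions of RBF kernel with C=1 and Gamma=1", fontsize=18)
plt.scatter(X[:, 0], X[:, 1], c=y_pred, s=50, cmap='cool')
plot_svc_decision_function(clf)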

As the gamma value increases, the model becomes overfitted. Data points need to be very close to each other to be grouped together because the radius of similarity decreases as the gamma value increases.

At gamma values of 0.01, 1, and 5, the training accuracy of the RBF kernel is 0.89, 0.92, and 0.93, respectively. These values show that the model fits the training set more and more closely as gamma increases.
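These numbers can be checked with a quick loop (training accuracy only; the exact values assume the same dataset and random_state as above):

for gamma in [0.01, 1, 5]:
    clf = SVC(C=1, kernel='rbf', gamma=gamma).fit(X, y)
    print(f"gamma={gamma}: training accuracy = {clf.score(X, y):.2f}")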

Gamma and C parameters

For a linear kernel, we only need to optimize the C parameter. However, if we want to use an RBF kernel, the C and gamma parameters need to be optimized simultaneously. If gamma is large, the effect of C becomes negligible. If gamma is small, C affects the model just as it affects a linear model. Typical value ranges for C and gamma are given below, although optimal values may vary depending on the application:

0.0001 < gamma < 10

0.1 < C < 100
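One common way to tune both parameters together is a cross-validated grid search. Here is a minimal sketch using scikit-learn’s GridSearchCV, with illustrative grid values drawn from the ranges above (the specific values are my choice, not from the original article):

from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.0001, 0.001, 0.01, 0.1, 1, 10],
}

# 5-fold cross-validated search over all C/gamma combinations
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)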

References

jakevdp.github.io/PythonDataS…

Original article: towardsdatascience.com/svm-hyperpa…
