This is the 33rd article in our machine learning series, and we are continuing our discussion of the SVM model.
In the last article, we walked through the derivation of the SVM model for the linearly separable case, and ended with a quadratic objective subject to inequality constraints:
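The formula itself did not survive extraction; based on the surrounding discussion, it was presumably the standard hard-margin primal problem:

```latex
\min_{\omega, b} \ \frac{1}{2}\|\omega\|^2 \quad \text{s.t.} \quad y_i(\omega^T x_i + b) \ge 1, \quad i = 1, \dots, m
```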
If you want to see how that works, you can refer to my last article:
Machine Learning | Getting into SVM (1): Deriving the Principle and the Model
One question was actually left open in the last article: since what we want is the smallest $\|\omega\|$, why do we not minimize $\|\omega\|$ directly, and instead minimize $\|\omega\|^2$? The reason lies in how the problem is solved: we are going to transform it into a convex quadratic programming (QP) problem. QP is an important field in computational science, and a fairly difficult one, since it demands deep expertise in both computer science and mathematics.
Personally, I feel that QP is not closely tied to everyday machine learning and engineering work; at present it mainly appears in the derivation of the SVM model, so we only need to understand enough of it to follow the SVM formulas.
Solving process
In fact, there are dedicated QP solver packages that can find the extremum of this problem directly, and I have used them. But that is not the only approach, and it has one major drawback: it cannot incorporate the kernel function. Therefore, we generally do not solve SVM by calling a QP solver.
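To make this concrete, here is a minimal sketch of solving the primal problem directly with a general-purpose constrained optimizer. It uses SciPy's SLSQP method as a stand-in for a dedicated QP package, and the data is made up purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: two linearly separable classes.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
d = X.shape[1]

# Pack the variables as z = [w_1, ..., w_d, b].
def objective(z):
    w = z[:d]
    return 0.5 * np.dot(w, w)  # (1/2) ||w||^2

# One inequality constraint per sample: y_i (w . x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq",
     "fun": lambda z, xi=xi, yi=yi: yi * (np.dot(z[:d], xi) + z[d]) - 1.0}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.zeros(d + 1), constraints=constraints)
w, b = res.x[:d], res.x[d]
print("w =", w, "b =", b)
```

Note that this works only in the original feature space; precisely because $\omega$ appears explicitly, there is no place to plug in a kernel, which is the drawback mentioned above.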
If we look at the original formulation, it is natural to feel that the inequality constraints are a nuisance. We would like to eliminate them, and we can: the Lagrange multiplier method transforms a constrained optimization objective into an unconstrained optimization function.
Let’s take a look at the process of using the Lagrange multiplier method. We are given an inequality-constrained problem:
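The formula here was lost in extraction; judging from the discussion that follows (the constraints $g_i(x) \le 0$ and the multipliers $\alpha$ and $\beta$), it was presumably the standard form:

```latex
\min_{x} \ f(x) \quad \text{s.t.} \quad g_i(x) \le 0 \ (i = 1, \dots, k), \qquad h_j(x) = 0 \ (j = 1, \dots, l)
```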
For this seemingly complicated system, we introduce the generalized Lagrange function and rewrite it as follows:
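The generalized Lagrange function, reconstructed from the surrounding definitions, reads:

```latex
L(x, \alpha, \beta) = f(x) + \sum_{i=1}^{k} \alpha_i g_i(x) + \sum_{j=1}^{l} \beta_j h_j(x), \qquad \alpha_i \ge 0
```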
This looks a little simpler than the system above, but what does it buy us? A short analysis shows that $L \le f(x)$ whenever $x$ is feasible: since $\alpha_i \ge 0$ and $g_i(x) \le 0$, each term $\alpha_i g_i(x) \le 0$, and the equality terms $\beta_j h_j(x)$ vanish because $h_j(x) = 0$. Adding these terms to $f(x)$ therefore gives $L \le f(x)$, with equality when every $\alpha_i g_i(x) = 0$, for instance when $\alpha_i = 0$. So for feasible $x$ we have $\max_{\alpha, \beta: \alpha_i \ge 0} L(x, \alpha, \beta) = f(x)$.
And since we are looking for the minimum of $f(x)$, our ultimate goal becomes $\min_{x} \max_{\alpha, \beta: \alpha_i \ge 0} L(x, \alpha, \beta)$.
The dual problem
Next comes the famous dual problem. The dual problem is essentially a swap of the two operators in the min-max expression: we change $\min \max L$ into $\max \min L$. The catch is that the order of min and max clearly matters and cannot be exchanged arbitrarily, so we need to show that this swap is valid.
For convenience, we write the inner part of the original problem as $\theta_P(x) = \max_{\alpha, \beta: \alpha_i \ge 0} L(x, \alpha, \beta)$, so that the original problem is $\min_x \theta_P(x)$. Then we define a function of $\alpha$ and $\beta$: $\theta_D(\alpha, \beta) = \min_{x} L(x, \alpha, \beta)$. The right-hand side is the minimization of the Lagrange function over $x$; once that minimization is carried out, the result depends only on $\alpha$ and $\beta$, so $\theta_D$ is a function of $\alpha$ and $\beta$.
We then take the maximum of this function over $\alpha$ and $\beta$:
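The lost formula here was presumably the definition of the dual problem:

```latex
\max_{\alpha, \beta : \alpha_i \ge 0} \theta_D(\alpha, \beta) = \max_{\alpha, \beta : \alpha_i \ge 0} \min_{x} L(x, \alpha, \beta)
```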
This should look familiar: it is very similar to the original problem we just derived, except that the min and max appear in the opposite order. Why list both? Not for fun, of course, but because we want to obtain the following inequality:
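The inequality in question is presumably weak duality:

```latex
\max_{\alpha, \beta : \alpha_i \ge 0} \min_{x} L(x, \alpha, \beta) \;\le\; \min_{x} \max_{\alpha, \beta : \alpha_i \ge 0} L(x, \alpha, \beta)
```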
Using the properties of the Lagrange function, we can prove this quite easily:
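A sketch of that proof, reconstructed from the standard weak-duality argument: for any $x$, $\alpha$, $\beta$ with $\alpha_i \ge 0$,

```latex
\theta_D(\alpha, \beta) = \min_{x'} L(x', \alpha, \beta) \;\le\; L(x, \alpha, \beta) \;\le\; \max_{\alpha', \beta' : \alpha'_i \ge 0} L(x, \alpha', \beta') = \theta_P(x)
```

Since the left side does not depend on $x$ and the right side does not depend on $\alpha, \beta$, taking the max over the left and the min over the right yields $\max_{\alpha, \beta : \alpha_i \ge 0} \theta_D(\alpha, \beta) \le \min_x \theta_P(x)$.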
If we want to replace the original problem with the dual problem, we need the inequality above to hold with equality. Mathematicians have rigorously proved when that is the case: it suffices to satisfy the Karush-Kuhn-Tucker (KKT) conditions. The derivation of the KKT conditions is itself quite involved; we do not need to go into it, only to know that it exists.
There are several KKT conditions; they may look numerous, but they are not complicated:
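The list appears to have been lost; the standard KKT conditions for this problem are:

```latex
\begin{aligned}
&\nabla_x L(x^*, \alpha^*, \beta^*) = 0 && \text{(stationarity)} \\
&g_i(x^*) \le 0, \quad h_j(x^*) = 0 && \text{(primal feasibility)} \\
&\alpha_i^* \ge 0 && \text{(dual feasibility)} \\
&\alpha_i^* \, g_i(x^*) = 0 && \text{(complementary slackness)}
\end{aligned}
```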
Let’s now use the KKT conditions to solve for the minimum of $\theta_P(x)$. This part is essentially high-school calculus. First we write out the Lagrangian for our original SVM problem:
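For the SVM primal above, with constraints $g_i(\omega, b) = 1 - y_i(\omega^T x_i + b) \le 0$, the Lagrangian reads:

```latex
L(\omega, b, \alpha) = \frac{1}{2}\|\omega\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i(\omega^T x_i + b) - 1 \right]
```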
Then we take the partial derivatives with respect to $\omega$ and $b$ and set them to zero:
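Reconstructing the lost derivatives from the Lagrangian above:

```latex
\begin{aligned}
\frac{\partial L}{\partial \omega} &= \omega - \sum_{i=1}^{m} \alpha_i y_i x_i = 0
  &&\Rightarrow\quad \omega = \sum_{i=1}^{m} \alpha_i y_i x_i \\
\frac{\partial L}{\partial b} &= -\sum_{i=1}^{m} \alpha_i y_i = 0
  &&\Rightarrow\quad \sum_{i=1}^{m} \alpha_i y_i = 0
\end{aligned}
```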
Setting the derivatives to zero gives us the relationship between $\omega$ and $\alpha$, which means that if we can determine $\alpha$, we can also determine $\omega$. We also find that $b$ no longer appears in the resulting expression, which means we have eliminated $b$. Substituting the $\omega$ obtained above back into the Lagrangian, and using the constraint on $\sum_i \alpha_i y_i$ to cancel the $b$ term, we obtain:
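The resulting expression is the well-known SVM dual problem:

```latex
\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i^T x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{m} \alpha_i y_i = 0
```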
Looking at this formula, $x$ and $y$ are fixed values determined by the samples, and the only variable is $\alpha$. What we want is the maximum of the expression above over $\alpha$; the optimal $\alpha$ then yields the corresponding $\omega$ and $b$.
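To make the structure of the dual concrete, here is a small NumPy sketch that evaluates the dual objective and recovers $\omega$ from a given $\alpha$. The data and the $\alpha$ values are made up purely for illustration (they are chosen to satisfy $\sum_i \alpha_i y_i = 0$), not the result of an actual optimization:

```python
import numpy as np

# Hypothetical toy data, just to illustrate the formulas.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0])

def dual_objective(alpha, X, y):
    # sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j <x_i, x_j>
    U = y[:, None] * X          # row i is y_i * x_i
    G = U @ U.T                 # G[i, j] = y_i y_j x_i . x_j
    return alpha.sum() - 0.5 * alpha @ G @ alpha

def recover_w(alpha, X, y):
    # w = sum_i alpha_i y_i x_i
    return (alpha * y) @ X

# Made-up multipliers satisfying the constraint sum_i alpha_i y_i = 0.
alpha = np.array([0.5, 0.0, 0.5])
print("dual objective:", dual_objective(alpha, X, y))
print("w =", recover_w(alpha, X, y))
```

Note that the data enters the objective only through the inner products $x_i^T x_j$ (the Gram matrix `G`), which is exactly the hook where a kernel function can later be plugged in.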
So how do we solve for $\alpha$? We will leave that to the next article.
This is the end of today’s article. If you liked it, please send a wave of support my way (follow, share, like).
Original link; please give it a follow.