Optimization problems are at the core of machine learning, and convex functions play an important role in optimization.

By NVS Yashwanth, compiled by McGL

When you first start learning machine learning, one of the most interesting topics is the optimization algorithm, specifically the gradient descent optimization algorithm: a first-order iterative algorithm for minimizing a cost function.

The intuition behind gradient descent is to converge to a solution, which may be a local minimum in its neighborhood or, at best, the global minimum. All seems fine until you start to question convergence. A good understanding of convexity helps you prove the intuition behind gradient descent. So let's talk about it.
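Concretely, at each step gradient descent moves the parameters θ a small amount in the direction of steepest descent of the cost function J, scaled by a learning rate α:

θ ← θ − α∇J(θ)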

Convex Sets

In simple terms, a convex set is a shape in which the line segment connecting any two points of the set lies entirely inside the set.

Take a look at the following example.

Understanding convex sets

It is clear that any line segment connecting two points of the circle or the square (the left and middle shapes) stays inside the shape. These are examples of convex sets. On the other hand, for the shape on the far right, part of the line segment lies outside the shape, so it is not a convex set. A convex set C can be expressed as follows.

Convex set conditions
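In other words, for any two points x₁ and x₂ in C, every point on the line segment between them also belongs to C:

θx₁ + (1 − θ)x₂ ∈ C   for all x₁, x₂ ∈ C and all θ ∈ [0, 1]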

Epigraph

Look at the graph of f below.

The epigraph of a function f is the set of all points lying on or above its graph.

Epigraph of a function
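Formally, the epigraph collects all pairs (x, t) where t is at least f(x):

epi f = {(x, t) : f(x) ≤ t}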

Convex Function

Ok, so now that you know what a convex set and an epigraph are, we can talk about convex functions.

A convex function and its epigraph

A function f is called convex if its epigraph is a convex set (as shown in the green figure at the bottom left).

This means that every line segment drawn between two points on the graph of the function lies on or above the graph. Pause for a minute and check this yourself.
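In symbols, f is convex if for any two points x and y and any θ ∈ [0, 1]:

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)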

Understanding convex functions

Conversely, f is not convex if there are two points x, y such that the line segment connecting (x, f(x)) and (y, f(y)) dips below the curve of f. In that case the epigraph is no longer convex (shown in the red figure on the right side of the figure above).

Equivalently, not every line segment drawn between two points of the epigraph stays inside the epigraph; part of it falls below the graph. You can verify this by picking a point near the bend of the curve.

Checking Convexity

In neural networks, most cost functions are non-convex, so we need a way to test whether a function is convex.

A twice-differentiable function f is convex if its second derivative is greater than or equal to 0 everywhere.

The condition for convex functions: f″(x) ≥ 0 for all x

Examples of convex functions: y = eˣ and y = x². Both functions are twice differentiable. If −f(x) is a convex function, then f is called concave.

The condition for concave functions: f″(x) ≤ 0 for all x

Example of a concave function: y = −eˣ. This function is twice differentiable.
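For a quick numerical check of these conditions, one minimal sketch (the helper second_derivative below is only illustrative) approximates f″ with central finite differences on a grid:

import numpy as np

def second_derivative(f, x, h=1e-4):
    # Central finite-difference approximation of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

x = np.linspace(-1, 2, 100)
print(np.all(second_derivative(np.exp, x) >= 0))                # True: e^x is convex
print(np.all(second_derivative(np.square, x) >= 0))             # True: x^2 is convex
print(np.all(second_derivative(lambda t: -np.exp(t), x) <= 0))  # True: -e^x is concave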

Let us examine convexity by plotting the exponential function eˣ.

Code to plot a convex and a concave function:

import numpy as np
import matplotlib.pyplot as plt

# Grid of x values for plotting
x = np.linspace(-1, 2, 100)
# Convex function y1
y1 = np.exp(x)
plt.figure(1)
plt.plot(x, y1)
plt.xlabel('$x$')
plt.ylabel('$e^x$')
plt.show()

# Concave function y2
y2 = -np.exp(x)
plt.figure(2)
plt.plot(x, y2)
plt.xlabel('$x$')
plt.ylabel('$-e^x$')
plt.show()
Code output:

Convex and concave

Convexity in Gradient Descent Optimization

As mentioned above, gradient descent is a first-order iterative optimization algorithm used to minimize a cost function. To understand how convexity plays a key role in gradient descent, let's take a convex and a non-convex cost function as examples. For the linear regression model, we define the cost function as the mean squared error (MSE), which measures the average squared difference between the actual and predicted values. Our goal is to minimize this cost function to improve the accuracy of the model. MSE is a convex function of the model parameters (for linear regression it is a convex quadratic in the parameters). This means that every local minimum is also a global minimum. Therefore, gradient descent will converge to the global minimum.

MSE equation: MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the actual value and ŷᵢ is the predicted value.
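To make this concrete, here is a minimal sketch of gradient descent on the MSE of a one-parameter model y ≈ w·x, using synthetic data and an illustrative learning rate; because the loss is convex in w, every starting point ends up at the same minimum:

import numpy as np

# Synthetic data for a one-parameter model y = w * x (true w = 3)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3 * x + 0.1 * rng.normal(size=100)

def mse_gradient(w):
    # Derivative of MSE(w) = mean((y - w * x)^2) with respect to w
    return -2 * np.mean((y - w * x) * x)

for w_start in (-10.0, 0.0, 10.0):
    w = w_start
    for _ in range(200):
        w -= 0.1 * mse_gradient(w)   # gradient descent step
    print(f"start at {w_start:6.1f} -> w = {w:.3f}")   # every run lands near w = 3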

Now let's consider a non-convex cost function; here we take an arbitrary non-convex function, as shown in the figure below.

Gradient descent method for nonconvex functions

You can see that gradient descent may stop at a local minimum instead of converging to the global minimum, because the gradient at that point is zero (the slope is zero) and it is a minimum within its neighborhood. One way to mitigate this problem is to use momentum.
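A minimal sketch of the momentum update, assuming a generic gradient function grad_f and illustrative hyperparameters lr and beta: the velocity v accumulates past gradients, which can carry the iterate across a shallow bump that would trap plain gradient descent.

def gradient_descent_momentum(grad_f, w0, lr=0.01, beta=0.9, steps=1000):
    # Momentum: the velocity v is an exponentially decaying sum of past
    # gradients, and the parameter moves along v instead of the raw gradient.
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad_f(w)
        w = w + v
    return w

# Toy non-convex function f(w) = w^4 - 3*w^2 + w with a local and a global minimum
grad = lambda w: 4 * w**3 - 6 * w + 1
print(gradient_descent_momentum(grad, w0=2.0))  # ends near the global minimum at w ≈ -1.3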

Conclusion

Convex functions play an important role in optimization problems, and optimization is at the heart of machine learning models. Convexity is therefore very important, as I hope is clear to you by the end of this article. Thank you for reading.

Source: towardsdatascience.com/understand-…