By NVS Yashwanth, compiled by McGL
When you first start learning machine learning, one of the most interesting topics is optimization, in particular gradient descent: a first-order iterative optimization algorithm used to minimize a cost function.
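As a minimal illustration (my own sketch, not from the original article), the first-order update x ← x − η·f′(x) applied to the simple cost function f(x) = x² looks like this; the step size and iteration count here are arbitrary choices:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a function."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # first-order update: x <- x - lr * f'(x)
    return x

# Minimize f(x) = x**2, whose gradient is f'(x) = 2*x.
x_min = gradient_descent(lambda x: 2 * x, x0=5.0)
print(x_min)  # approaches the global minimum at x = 0
```

Because x² is convex, any starting point converges to the same (global) minimum; the rest of the article explains why convexity matters for this.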
Convex Sets
Any line segment connecting two points of the circle or the square (the left and middle shapes in the figure) lies entirely inside the shape; these are examples of convex sets. The shape on the far right, on the other hand, leaves part of such a line segment outside the shape, so it is not a convex set. Formally, a set C is convex if for any two points x, y ∈ C and any θ ∈ [0, 1], the point θx + (1 − θ)y also belongs to C.
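As a quick numeric illustration of this definition (a sketch I am adding, not part of the original), we can check that convex combinations of points in the unit disk never leave the disk:

```python
import numpy as np

rng = np.random.default_rng(0)

def in_unit_disk(p):
    """Membership test for the set C = {p : ||p|| <= 1}."""
    return np.linalg.norm(p) <= 1.0

# Sample pairs of points inside the disk and check that every
# convex combination theta*x + (1-theta)*y stays inside it.
for _ in range(1000):
    x = rng.uniform(-1, 1, size=2)
    y = rng.uniform(-1, 1, size=2)
    if not (in_unit_disk(x) and in_unit_disk(y)):
        continue  # rejection sampling: keep only points in the disk
    theta = rng.uniform(0, 1)
    z = theta * x + (1 - theta) * y  # a point on the segment between x and y
    assert in_unit_disk(z)           # never fails: the disk is convex

print("all convex combinations stayed inside the disk")
```

Running the same check on a non-convex set (say, an annulus) would trip the assertion for some pair of points.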
Epigraph
The epigraph of a function f is the set of points lying on or above its graph: epi f = {(x, t) : t ≥ f(x)}.
Convex Function
A function f is called convex if its epigraph is a convex set (shown in green in the bottom-left figure). Equivalently, for any x, y and any θ ∈ [0, 1], f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y): the line segment connecting the points (x, f(x)) and (y, f(y)) never dips below the graph of f. Conversely, f is not convex if for some pair of points part of that segment lies below the curve, which destroys the convexity of the epigraph (shown in red on the right of the figure).
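This chord condition can be tested numerically. The sketch below (my own addition; the sample functions and tolerance are illustrative choices) samples points along the segment between two inputs and compares the function values with the chord:

```python
import numpy as np

def chord_above_graph(f, x, y, num=101):
    """Check f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) along the segment."""
    t = np.linspace(0.0, 1.0, num)
    curve = f(t * x + (1 - t) * y)     # function values along the segment
    chord = t * f(x) + (1 - t) * f(y)  # the straight line between the endpoints
    return bool(np.all(curve <= chord + 1e-12))

print(chord_above_graph(np.square, -1.0, 2.0))  # x**2: the chord stays above
print(chord_above_graph(np.sin, 0.0, np.pi))    # sin on [0, pi]: the chord dips below
```

This checks convexity only between the two chosen points, so a `True` result on one pair is evidence, not a proof, of global convexity.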
Checking Convexity
Examples of convex functions: y = eˣ and y = x². Both are twice differentiable, and their second derivatives (eˣ and 2, respectively) are positive everywhere, which confirms convexity. If −f(x) is a convex function, then f is called concave.
Example of a concave function: y = −eˣ. It is twice differentiable, and its second derivative −eˣ is negative everywhere.
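The second-derivative test can also be approximated numerically with a central finite difference (an illustrative sketch of my own; the step size h is an arbitrary choice):

```python
import numpy as np

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

xs = np.linspace(-1.0, 2.0, 50)

# f'' >= 0 on the whole grid is consistent with convexity.
print(np.all(second_derivative(np.exp, xs) >= 0))                 # e**x: convex
print(np.all(second_derivative(np.square, xs) >= 0))              # x**2: convex
print(np.all(second_derivative(lambda x: -np.exp(x), xs) <= 0))   # -e**x: concave
```

A grid check like this can only suggest convexity on the sampled interval; the analytic second derivative settles it for all x.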
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 2, 100)

# Convex function y1 = e^x
y1 = np.exp(x)
plt.figure(1)
plt.plot(x, y1)
plt.xlabel('$x$')
plt.ylabel('$e^x$')
plt.show()

# Concave function y2 = -e^x
y2 = -np.exp(x)
plt.figure(2)
plt.plot(x, y2)
plt.xlabel('$x$')
plt.ylabel('$-e^x$')
plt.show()
Convexity in gradient descent optimization
Now let’s consider what happens when the cost function is non-convex; take an arbitrary non-convex function such as the one shown in the figure below.
You can see that gradient descent can stop at a local minimum instead of converging to the global minimum: the gradient at that point is zero (the slope is zero) and the point is a minimum within its neighborhood, so the updates stall. One way to mitigate this is momentum, which accumulates past gradients into a velocity term so that the iterate can roll through shallow local minima.
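The contrast can be demonstrated on a small toy problem. The sketch below is my own construction, not from the original article: the non-convex function, starting point, and hyperparameters are all hand-picked for illustration. Plain gradient descent from x₀ = 2 settles in the local minimum near x ≈ 0.693, while the heavy-ball momentum variant builds up enough velocity to carry the iterate past the barrier at x = 0 into the global minimum near x ≈ −1.443:

```python
def f(x):
    """A non-convex function with a local minimum (~0.693) and a global one (~-1.443)."""
    return x**4 + x**3 - 2 * x**2

def grad(x):
    return 4 * x**3 + 3 * x**2 - 4 * x

def gd(x0, lr=0.01, steps=500):
    """Plain gradient descent: stalls wherever the gradient vanishes."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def gd_momentum(x0, lr=0.01, beta=0.9, steps=500):
    """Heavy-ball momentum: velocity accumulates exponentially decayed gradients."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)  # update velocity with the current gradient
        x += v                       # move by the velocity, not the raw gradient
    return x

x_plain = gd(2.0)
x_mom = gd_momentum(2.0)
# Plain GD stops in the local minimum; momentum reaches the global one.
print(round(x_plain, 3), round(x_mom, 3))
```

Momentum is not a guarantee: with a different β or learning rate the iterate can also overshoot the global minimum, which is why these hyperparameters are tuned in practice.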
Conclusion
Source: towardsdatascience.com/understand-…