Machine Learning: juejin.cn/post/684490…

Classification problem

One way to attempt classification is to use linear regression and map all predictions greater than 0.5 to 1 and all predictions less than 0.5 to 0. However, this approach does not work well, because classification is not actually a linear function.
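To see both the idea and its fragility, here is a minimal Python sketch; the toy dataset is our own invention, not from the post:

```python
import numpy as np

# Hypothetical toy data: 1-D feature (e.g. tumor size) vs. label 0/1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit ordinary linear regression, then threshold its output at 0.5
slope, intercept = np.polyfit(x, y, deg=1)
pred = (slope * x + intercept >= 0.5).astype(int)
print(pred)  # [0 0 0 1 1 1] -- matches y on this easy data

# One extreme (but obviously positive) example tilts the fitted line,
# shifting the 0.5 crossing point and breaking a borderline prediction.
x2 = np.append(x, 30.0)
y2 = np.append(y, 1)
slope2, intercept2 = np.polyfit(x2, y2, deg=1)
pred2 = (slope2 * x2 + intercept2 >= 0.5).astype(int)
print(pred2)  # the example at x = 4 now falls below 0.5 and flips to 0
```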

The classification problem is just like the regression problem, except that the values we now want to predict take on only a small number of discrete values. For now, we will focus on the binary classification problem, in which y can take on only two values, 0 and 1. (Most of what we say here also generalizes to the multiclass case.) For example, if we are trying to build a spam classifier for email, then x^(i) may be some features of an email message, and y may be 1 if the message is spam and 0 otherwise. Hence, y∈{0,1}. 0 is also called the negative class and 1 the positive class, and they are sometimes denoted by the symbols “-” and “+”. Given x^(i), the corresponding y^(i) is also called the label of the training example.

Hypothesis representation

We could ignore the fact that y is a discrete value, and use our old linear regression algorithm to try to predict y given x. However, as shown above, this method performs very poorly.

Intuitively, it also makes no sense for hθ(x) to take values larger than 1 or smaller than 0 when we know that y∈{0,1}. To fix this, let’s change the form of our hypothesis hθ(x) so that it satisfies 0 ≤ hθ(x) ≤ 1.

Our new form uses the “sigmoid function”, also called the “logistic function”:

hθ(x) = g(θ^T x)
z = θ^T x
g(z) = 1 / (1 + e^(−z))

hθ(x) gives us the probability that the output is 1. For example, hθ(x) = 0.7 means there is a 70% probability that our output is 1. Formally, hθ(x) = P(y = 1 | x; θ), and the probability that the prediction is 0 is just the complement of the probability that it is 1 (for example, if the probability that it is 1 is 70%, then the probability that it is 0 is 30%).
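As a concrete sketch, this is one way to write the hypothesis in Python; the names sigmoid and hypothesis and the parameter values here are our own assumptions, not from the post:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): estimated probability that y = 1 for input x."""
    return sigmoid(theta @ x)

theta = np.array([-1.0, 0.3])   # hypothetical parameters
x = np.array([1.0, 6.0])        # x0 = 1 (intercept term), x1 = feature value
print(hypothesis(theta, x))     # ~0.69: a ~69% chance y = 1, so ~31% chance y = 0
```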

The decision boundary

To get a discrete 0 or 1 classification, we can translate the output of the hypothesis function as follows:

hθ(x) ≥ 0.5 → y = 1
hθ(x) < 0.5 → y = 0

The logistic function g outputs 0.5 or greater exactly when its input is 0 or greater:

g(z) ≥ 0.5 when z ≥ 0

So, since the input to g here is θ^T x:

hθ(x) = g(θ^T x) ≥ 0.5 when θ^T x ≥ 0

The decision boundary is the line that separates the region where y = 1 from the region where y = 0. It is created by our hypothesis function.
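A minimal sketch of this thresholding in Python (the predict name and the sample parameters are assumptions, not from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Return the discrete class 0/1 for each row of X by thresholding at 0.5.

    Thresholding h_theta(x) >= 0.5 is equivalent to checking theta^T x >= 0,
    so the sigmoid call could be skipped; it is kept here to mirror the text.
    """
    return (sigmoid(X @ theta) >= 0.5).astype(int)

theta = np.array([-1.0, 0.3])
X = np.array([[1.0, 2.0],    # theta^T x = -0.4  -> y = 0
              [1.0, 6.0]])   # theta^T x =  0.8  -> y = 1
print(predict(theta, X))     # [0 1]
```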

Example: suppose θ = [5, −1, 0]^T, so the hypothesis predicts y = 1 whenever 5 − x1 ≥ 0, that is, whenever x1 ≤ 5.

In this case, our decision boundary is a straight vertical line placed on the graph where x1 = 5; everything to the left of it denotes y = 1, and everything to the right denotes y = 0.
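Checking this boundary numerically with a small sketch (the sample points are our own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters from the example above: theta = [5, -1, 0]
theta = np.array([5.0, -1.0, 0.0])

# Points on either side of the boundary x1 = 5 (x2 has zero weight here)
points = np.array([
    [1.0, 2.0, 7.0],   # x1 = 2 -> left of the line  -> y = 1
    [1.0, 8.0, -3.0],  # x1 = 8 -> right of the line -> y = 0
])

z = points @ theta                       # 5 - x1, the input to the sigmoid
print(z)                                 # [ 3. -3.]
print((sigmoid(z) >= 0.5).astype(int))   # [1 0]
```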

Example:

Similarly, the input to the sigmoid function g(z) (i.e., θ^T x) does not need to be linear; it could be a function that describes a circle (e.g., z = θ0 + θ1·x1² + θ2·x2²) or any other shape that fits our data.
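As a sketch of such a non-linear boundary, assuming hypothetical parameters θ = [−1, 1, 1] so that the decision boundary is the unit circle x1² + x2² = 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# z = -1 + x1^2 + x2^2, so the boundary is the circle x1^2 + x2^2 = 1:
# points inside the circle give z < 0 (y = 0), points outside give z > 0 (y = 1).
theta = np.array([-1.0, 1.0, 1.0])

def predict_circle(x1, x2):
    features = np.array([1.0, x1**2, x2**2])  # non-linear features of the input
    return int(sigmoid(features @ theta) >= 0.5)

print(predict_circle(0.2, 0.3))  # 0: inside the circle, z = -0.87
print(predict_circle(2.0, 1.0))  # 1: outside the circle, z = 4.0
```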