1. Regression algorithm

In most machine learning courses, regression is the first algorithm introduced, for two reasons. First, regression algorithms are relatively simple and allow a smooth transition from statistics to machine learning. Second, regression is the foundation of several powerful algorithms; without understanding regression, you cannot learn them. Regression has two important subclasses: linear regression and logistic regression.

Linear regression addresses the housing-price problem we discussed earlier: how do I fit a line that best matches all of my data? The least squares method is generally used. Its idea is to assume that the fitted line represents the true values of the data, while the observed data carry errors. To minimize the effect of these errors, we solve for the line that minimizes the sum of the squared errors. The least squares method thus transforms the fitting problem into the problem of finding the extremum of a function. In mathematics we usually find an extremum by setting the derivative to zero, but this approach is not always suitable for computers: the equation may have no convenient solution, or solving it may require too much computation.
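The least squares step can be sketched in a few lines of Python; the housing numbers below are invented purely for illustration.

```python
import numpy as np

# Hypothetical housing data: area (square meters) vs. price (in units of 10k).
x = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
y = np.array([150.0, 230.0, 290.0, 350.0, 440.0])

# Least squares: choose slope w and intercept b minimizing sum((w*x + b - y)^2).
# Stacking [x, 1] lets numpy solve the normal equations for us.
A = np.vstack([x, np.ones_like(x)]).T
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

# Sum of squared errors at the optimum: no other line can do better.
residual = np.sum((w * x + b - y) ** 2)
```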

Computer science has a dedicated discipline called numerical computation, devoted to improving the accuracy and efficiency of machine calculations. The famous gradient descent and Newton's method, for example, are classical numerical algorithms well suited to finding extrema of functions. Gradient descent is one of the simplest and most effective methods for solving regression models. Strictly speaking, since both the neural networks and the recommendation algorithms discussed below contain a linear-regression component, gradient descent is also used in implementing those algorithms.
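Here is a minimal gradient-descent sketch for the same line-fitting problem: instead of solving for the extremum directly, we repeatedly step against the gradient of the squared-error loss. The learning rate and scaling are hand-tuned for this toy data.

```python
import numpy as np

# Same toy housing data, scaled down so the step size is stable.
x = np.array([50.0, 80.0, 100.0, 120.0, 150.0]) / 100.0
y = np.array([150.0, 230.0, 290.0, 350.0, 440.0]) / 100.0

# Loss: L(w, b) = mean((w*x + b - y)^2). Step against its gradient.
w, b = 0.0, 0.0
lr = 0.1  # learning rate, chosen by hand for this example
for _ in range(2000):
    err = w * x + b - y
    w -= lr * 2 * np.mean(err * x)  # dL/dw
    b -= lr * 2 * np.mean(err)      # dL/db
```

After enough iterations, (w, b) is very close to the least squares solution.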

Logistic regression is an algorithm very similar to linear regression, but the type of problem it deals with is essentially different. Linear regression handles numerical problems, where the predicted result is a number such as a house price. Logistic regression is a classification algorithm: it predicts discrete categories, such as whether an email is spam, or whether a user will click on an ad.

In terms of implementation, logistic regression simply applies a sigmoid function to the result of a linear regression, converting the numerical result into a probability between 0 and 1. (The sigmoid function is not especially intuitive; you just need to know that the larger its input, the closer the output is to 1, and the smaller its input, the closer the output is to 0.) We can then make predictions based on this probability: for example, if the probability is greater than 0.5, the email is classified as spam, the tumor as malignant, and so on. Intuitively, logistic regression draws a classification line, as shown in Figure 1.
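The sigmoid-plus-threshold step can be sketched as follows; the score `z` stands in for the output of an already-trained linear model, whose weights are not shown.

```python
import math

def sigmoid(z):
    """Squash any real number into (0, 1): large z -> near 1, very negative z -> near 0."""
    return 1.0 / (1.0 + math.exp(-z))

# Suppose a linear model has already produced a score z for an email.
z = 2.0
p = sigmoid(z)                            # probability the email is spam
label = "spam" if p > 0.5 else "not spam" # threshold the probability at 0.5
```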

Figure 1 An intuitive interpretation of logistic regression

Suppose we have data from a group of tumor patients whose tumors are either benign (blue dots) or malignant (red dots). The red or blue color of a tumor is the “label” of that data point. Each data point also includes two “features”: the patient’s age and the size of the tumor. Mapping these two features and the labels onto this two-dimensional space produces the data in the figure above.

When a new green dot appears, is the tumor malignant or benign? We train a logistic regression model on the red and blue dots, which produces the classification line in the figure. The green dot falls on the right side of the classification line, so its label should be red, meaning a malignant tumor.

The classification line drawn by logistic regression is basically linear (there are logistic regression variants that draw nonlinear classification lines, but such models are very inefficient on large amounts of data). This means that when the boundary between two categories is not linear, logistic regression lacks the expressive power to capture it. The following two algorithms are among the most powerful and important in machine learning, and both can fit nonlinear classification boundaries.

2. Neural networks

Neural network (also known as artificial neural network, ANN) algorithms were very popular in machine learning in the 1980s but declined in the mid-1990s. Now, riding the momentum of “deep learning”, neural networks have returned as one of the most powerful machine learning algorithms.

Neural networks were born from the study of how the brain works. Early biologists used neural networks to model the brain; machine learning researchers then used them in experiments and found they performed quite well on visual and speech recognition. After the birth of the BP (backpropagation) algorithm, a numerical method that accelerates neural network training, the field entered a boom. One of the inventors of the BP algorithm is Geoffrey Hinton (middle in Figure 1), the machine learning pioneer introduced earlier.

Specifically, what is the learning mechanism of neural networks? Simply put: decomposition and integration. In the famous Hubel-Wiesel experiment, researchers studied the visual processing mechanism of cats in this way (Figure 2).

Figure 2 The Hubel-Wiesel experiment and the mechanism of brain vision

A square, for example, is broken down into four polylines that pass to the next level of visual processing. Four neurons each handle one polyline. Each polyline is further decomposed into two straight lines, and each straight line into black and white faces. A complex image thus becomes a large number of details entering the neurons; the neurons process them and integrate the results, finally concluding that you are looking at a square. This is how visual recognition works in the brain, and it is also how neural networks work.

Let’s look at the logical architecture of a simple neural network. The network has an input layer, hidden layers, and an output layer. The input layer receives signals, the hidden layers decompose and process the data, and the final results are integrated in the output layer. Each circle in a layer represents a processing unit, which can be thought of as simulating a neuron. Several processing units form a layer, and several layers form a network: a “neural network” (Figure 3).

Figure 3 Logical architecture of a neural network

In a neural network, each processing unit is in fact a logistic regression model: it receives the outputs of the layer above and passes its own prediction as input to the next layer. Through this process, a neural network can perform very complex nonlinear classification.
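A single forward pass through a tiny network makes this concrete. The weights below are random placeholders (a real network would learn them by training); the point is only that every unit computes a sigmoid of a weighted sum, i.e., a logistic regression over the previous layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One forward pass through a tiny 2-3-1 network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)  # hidden layer: 3 units, 2 inputs
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # output layer: 1 unit, 3 inputs

x = np.array([0.5, -1.2])           # an input sample with two features
hidden = sigmoid(W1 @ x + b1)       # each hidden unit: sigmoid of a weighted sum
output = sigmoid(W2 @ hidden + b2)  # the output unit integrates the hidden results
```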

The following figure illustrates a well-known application of neural networks in image recognition: a program called LeNet, a neural network with multiple hidden layers. LeNet can recognize a variety of handwritten digits with high accuracy and good robustness (see Figure 4).

Figure 4 LeNet effect display

The input image is shown in the lower-right square, and the computer’s output appears after the word “answer” in red above it. The three vertical image columns on the left show the outputs of the three hidden layers. As the layers get deeper, they process lower-level details; layer 3, for example, deals mainly with line details. LeNet was invented by machine learning guru Yann LeCun (right in Figure 1).

In the 1990s, the development of neural networks hit a bottleneck: despite the acceleration brought by the BP algorithm, training a neural network remained very difficult. In the late 1990s, the support vector machine (SVM) algorithm therefore took over the neural network’s position.

3. SVM (Support Vector Machine)

The support vector machine (SVM) is a classical algorithm born in the field of statistical learning that went on to shine in machine learning.

In a sense, the support vector machine is an enhancement of logistic regression: by imposing stricter optimization conditions, SVM can obtain a better classification boundary than logistic regression. Without the kernel trick, however, the support vector machine is at best a better linear classifier.

By combining with a Gaussian kernel, however, support vector machines can express very complex classification boundaries and achieve excellent classification results. A “kernel” is a special kind of function whose most characteristic property is that it can map a lower-dimensional space to a higher-dimensional one.

For example, see Figure 5:


Figure 5 Support vector machine legend

How do we draw a circular classification boundary in two dimensions? This is difficult in two dimensions, but with a kernel you can map the two-dimensional space to three dimensions and then use a linear plane to achieve the same effect. In other words, a nonlinear classification boundary in a two-dimensional plane can be equivalent to a linear classification boundary in three-dimensional space: we achieve nonlinear partitioning in the plane by performing a simple linear cut in three-dimensional space (see Figure 6).
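A standard textbook example of such a map (not the Gaussian kernel itself, but the simpler polynomial one) is phi(x, y) = (x², y², √2·xy). Under this map a circle x² + y² = r² becomes the plane z₁ + z₂ = r², so a linear cut in 3-D separates points inside the circle from points outside it.

```python
import numpy as np

def phi(p):
    """Map a 2-D point to 3-D so that circles become planes."""
    x, y = p
    return np.array([x * x, y * y, np.sqrt(2) * x * y])

def linear_side(z, r2=1.0):
    """A linear cut in 3-D: which side of the plane z1 + z2 = r2 are we on?"""
    return "inside" if z[0] + z[1] < r2 else "outside"

inside = phi((0.5, 0.5))   # x^2 + y^2 = 0.5, inside the unit circle
outside = phi((1.5, 0.0))  # x^2 + y^2 = 2.25, outside it
```

This map also has the key kernel property mentioned in the text: the dot product in 3-D can be computed cheaply in 2-D, since phi(a)·phi(b) = (a·b)².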

Figure 6 Cutting in three-dimensional space

Support vector machines are machine learning algorithms with a heavy mathematical flavor (as opposed to neural networks, which have a biological flavor). A core step in the theory proves that mapping data from a lower dimension to a higher one does not increase the computational complexity of the final calculation. The support vector machine can therefore maintain computational efficiency while obtaining very good classification results. For these reasons, SVM occupied the core position in machine learning from the late 1990s on, largely replacing neural networks. Only now, with the resurgence of neural networks through deep learning, is the balance shifting again.

4. Clustering algorithm

A notable feature of the previous algorithms is that the training data contain labels, and the trained model can predict labels for other, unknown data. In the following algorithms, the training data contain no labels, and the purpose of the algorithm is to infer labels for the data through training. This class of algorithms has a general name: unsupervised algorithms (the label-driven algorithms above are supervised algorithms). The most typical unsupervised algorithm is the clustering algorithm.

Take a two-dimensional dataset, that is, data with two features. How can a clustering algorithm assign different kinds of labels to these points? In simple terms, a clustering algorithm computes distances within the population and divides the data into multiple groups according to those distances.

The most typical representative of clustering algorithms is the k-means algorithm.
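The k-means loop can be sketched in a few lines: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The initialization below is deterministic for simplicity; real implementations use random restarts or k-means++.

```python
import numpy as np

def kmeans(data, k, iters=20):
    # Deterministic initialization: evenly spaced points from the dataset.
    centroids = data[np.linspace(0, len(data) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Distance from every point to every centroid, then nearest assignment.
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated synthetic blobs of 2-D points.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
                  rng.normal(5.0, 0.1, (20, 2))])
labels, centroids = kmeans(data, 2)
```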

5. Dimensionality reduction algorithm

Dimensionality reduction is another kind of unsupervised learning algorithm; its main feature is reducing data from a high dimension to a low one. Here, dimension refers to the number of features in the data. For example, housing data might include length, width, area, and number of rooms, i.e., four-dimensional data. Notice that length and width overlap with the information in area, since area = length × width. A dimensionality reduction algorithm can remove this redundant information and reduce the features to two, area and number of rooms, turning four-dimensional data into two-dimensional data. Reducing data from higher to lower dimensions not only aids presentation but also speeds up computation.

In that example, the dimensions removed were visible to the naked eye, and no information was lost because the information was redundant. When the redundancy is not visible, or when there are no strictly redundant features, dimensionality reduction still works but loses some information. However, it can be proved mathematically that dimensionality reduction algorithms preserve the maximum possible amount of information when compressing from a higher dimension to a lower one, so using them still brings many benefits.

The main function of dimensionality reduction is to compress data and improve the efficiency of other machine learning algorithms; data with thousands of features can be compressed into just a few. Another benefit is data visualization: for example, five-dimensional data compressed to two dimensions can be viewed on a two-dimensional plane. The main representative of dimensionality reduction is the PCA algorithm (principal component analysis).
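A minimal PCA sketch on the four-dimensional housing-style data from the text (length, width, area, rooms) looks like this; the numbers are synthetic, and the implementation uses the standard center-then-SVD recipe rather than a library.

```python
import numpy as np

# Synthetic 4-D housing-style data: length, width, area, rooms.
rng = np.random.default_rng(0)
length = rng.uniform(8, 15, 100)
width = rng.uniform(5, 10, 100)
rooms = rng.integers(2, 6, 100).astype(float)
X = np.column_stack([length, width, length * width, rooms])  # area = length * width

Xc = X - X.mean(axis=0)                  # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T                       # project onto the top-2 principal directions
explained = S[:2] ** 2 / np.sum(S ** 2)  # fraction of variance each direction keeps
```

Because the area column largely duplicates length and width, two components retain most of the variance here.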

6. Recommendation algorithm

Recommendation algorithms are very popular in industry today and are widely used in e-commerce, for example at Amazon, Tmall, and Jingdong. Their main feature is automatically recommending to users the things they are most interested in, so as to increase purchase rates and revenue. There are two main categories of recommendation algorithms:

One kind is based on item content: it recommends to the user items whose content is similar to items the user has bought. The premise is that every item carries several tags, so items similar to the user’s purchases can be found. The benefit of such recommendations is high relevance; the drawback is that every item must be tagged, which is a large amount of work.

The other kind is based on user similarity: it recommends to the target user the things bought by other users with the same interests. For example, if user A has bought items B and C, and the algorithm finds that another user similar to A has bought items D and E, it recommends items D and E to user A.

Both types of recommendation have their own advantages and disadvantages, and typical e-commerce applications mix the two. The most famous recommendation algorithm is collaborative filtering.
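The user-similarity idea can be sketched as a toy user-based collaborative filter. The item names B through E mirror the example in the text; the purchase matrix and the use of cosine similarity are illustrative choices, not a production design.

```python
import numpy as np

# Rows are users, columns are items B..E; 1 means the user bought the item.
items = ["B", "C", "D", "E"]
ratings = np.array([
    [1, 1, 0, 0],  # user A bought items B and C
    [1, 1, 1, 1],  # a similar user also bought D and E
    [0, 0, 1, 0],  # a dissimilar user
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two purchase vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

target = 0  # recommend for user A
others = [j for j in range(len(ratings)) if j != target]
sims = [cosine(ratings[target], ratings[j]) for j in others]
best = others[int(np.argmax(sims))]  # the most similar other user

# Recommend items the similar user bought that the target has not.
recs = [items[i] for i in range(len(items))
        if ratings[best, i] == 1 and ratings[target, i] == 0]
```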

7. Other

In addition to the above, machine learning includes other algorithms such as Gaussian discriminant analysis, naive Bayes, and decision trees. But the six listed above are the most widely used, influential, and representative. One feature of the machine learning field is precisely this abundance of algorithms and its rapid development.



To summarize: according to whether the training data have labels, the algorithms above can be divided into supervised and unsupervised learning algorithms. Recommendation algorithms are special, however, and belong to a separate category, being neither supervised nor unsupervised learning.

Supervised learning algorithms: linear regression, logistic regression, neural networks, SVM

Unsupervised learning algorithms: clustering algorithms, dimensionality reduction algorithms

Special algorithm: recommendation algorithm

Beyond these, a few other algorithm names crop up frequently in machine learning. They are not machine learning algorithms per se but were designed to solve subproblems within them; you can think of them as sub-algorithms that greatly improve the training process. Among them: gradient descent, used mainly in linear regression, logistic regression, neural networks, and recommendation algorithms; Newton’s method, used mainly in linear regression; the BP algorithm, used mainly in neural networks; and the SMO algorithm, used mainly in SVM.