This article gives a common-sense understanding of the classic ML algorithms: no complex theoretical derivations, just diagrams and a few small code sketches showing what each algorithm is and how it is applied. The examples are mainly classification problems.

For each algorithm I watched several videos and picked out the clearest and most interesting ones as the basis for this overview. There will be time for a more in-depth analysis of individual algorithms later.

Today's algorithms are as follows:

  1. Decision tree
  2. Random forest
  3. Logistic regression
  4. SVM
  5. Naive Bayes
  6. K-nearest neighbors
  7. K-means
  8. AdaBoost
  9. Neural networks
  10. Markov chains

1. Decision tree

A decision tree classifies by features: each node asks a question, the answer splits the data into two branches, and each branch keeps asking further questions. The questions are learned from the existing data; when new data comes in, it is routed down the tree by those same questions until it lands in the appropriate leaf.

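A minimal sketch of the idea, assuming scikit-learn and its built-in iris dataset (both purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Learn the questions (splits) from the existing data.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# New data is routed down the tree by those questions to a leaf.
print(tree.predict([[5.1, 3.5, 1.4, 0.2]]))
```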

2. Random forest

Select data randomly from the source data to form several subsets

The matrix S is the source data, with rows 1 to N. A and B are the features, and the last column C is the category.

Generate M submatrices randomly from S

These M subsets train M decision trees. Putting new data into the M trees yields M classification results; count the votes for each category and take the most-voted category as the final prediction.

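A minimal scikit-learn sketch of the voting idea (again on the illustrative iris data):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# M trees (here 100), each trained on a random subset of the rows.
forest = RandomForestClassifier(n_estimators=100).fit(X, y)

# Each tree votes; the most-voted category is the final prediction.
print(forest.predict([[5.1, 3.5, 1.4, 0.2]]))
```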

3. Logistic regression

When the prediction target is a probability, the output must lie between 0 and 1. A simple linear model cannot guarantee this: once the input moves outside a certain range, the output exceeds the [0, 1] interval.

So it would be good to have a model with this shape (an S-shaped curve squashed between 0 and 1).

So how do you get this model?

The model has to satisfy two conditions: its output must be greater than or equal to 0, and less than or equal to 1. For the first condition you can pick any function that is never negative, such as the absolute value, the square, or the exponential function; here the exponential is chosen. For the second condition, divide that quantity by itself plus 1: since the numerator is smaller than the denominator, the result must be less than 1.

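Written out, choosing the exponential as the non-negative function gives the familiar sigmoid:

```latex
p = \frac{e^{y}}{e^{y} + 1} = \frac{1}{1 + e^{-y}},
\qquad e^{y} > 0 \;\Rightarrow\; 0 < p < 1
```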

And if you do another transformation, you get a Logistic regression model

The coefficients can then be estimated from the source data.

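A minimal sketch of estimating those coefficients with scikit-learn; the one-feature dataset (hours studied vs. pass/fail) is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> passed (1) or failed (0).
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)
print(model.coef_, model.intercept_)       # the fitted coefficients
print(model.predict_proba([[2.2]])[0, 1])  # P(pass) for a new input
```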

And you end up with the logistic curve.

4. SVM

SVM stands for Support Vector Machine.

To separate two classes, SVM looks for a hyperplane; the optimal hyperplane is the one that maximizes the margin between the two classes. The margin is the distance between the hyperplane and the points nearest to it. In the figure below, Z2 > Z1, so the green hyperplane is the better one.

Express the hyperplane as a linear equation: every point in one class gives a value greater than or equal to 1, and every point in the other class gives a value less than or equal to -1.

The distance from a point to the hyperplane is calculated by the standard point-to-plane formula shown in the figure: the absolute value of the equation evaluated at the point, divided by the norm of the weight vector.

Therefore the total margin comes out as 2 divided by the norm of w. The goal is to maximize the margin, so the norm of w in the denominator must be minimized, which turns this into an optimization problem.

For example, given three points, find the optimal hyperplane, defining the weight vector as (2,3) - (1,1).

This gives the weight vector (a, 2a). Substitute the two points into the equation: (2,3) must give the value 1 and (1,1) must give the value -1. Solving these yields a and the intercept w0, and hence the expression of the hyperplane.

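Working that substitution out explicitly:

```latex
w = (a, 2a):\quad
\begin{cases}
(2,3):\; 2a + 3 \cdot 2a + w_0 = 8a + w_0 = 1\\
(1,1):\; 1a + 1 \cdot 2a + w_0 = 3a + w_0 = -1
\end{cases}
\;\Rightarrow\; a = \tfrac{2}{5},\; w_0 = -\tfrac{11}{5}
```

so the hyperplane is (2/5) x1 + (4/5) x2 - 11/5 = 0, i.e. x1 + 2 x2 = 5.5.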

After solving for a and substituting it into (a, 2a), you get the weight vector; the points that lie exactly on the margin are the support vectors.

Substituting a and w0 into the hyperplane equation gives the support vector machine's decision boundary.

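This worked example can be checked with scikit-learn; a hard margin is approximated with a very large C:

```python
from sklearn.svm import SVC

# The two support-vector points from the example above.
X = [[1, 1], [2, 3]]
y = [-1, 1]

svm = SVC(kernel="linear", C=1e6).fit(X, y)
print(svm.coef_, svm.intercept_)  # roughly [[0.4, 0.8]] and [-2.2]
```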
5. Naive Bayes

Take an example of an application in NLP

Given a paragraph of text, return its sentiment classification: is the attitude of the paragraph positive or negative?

To solve this problem, just look at some of the words

The text is then represented only by those few words and their counts.

The original question is: given this sentence, which category does it belong to? Bayes' rule turns it into a simpler, more easily solved question.

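The rule in question, applied to this setting:

```latex
P(\text{class} \mid \text{text}) =
\frac{P(\text{text} \mid \text{class})\, P(\text{class})}{P(\text{text})}
```

The "naive" part is the assumption that the words are independent, so P(text | class) is just the product of the per-word probabilities.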

The question becomes: what is the probability of this sentence given this category? And of course, don't forget the other two probabilities in the formula, the prior of each category and the probability of the text itself.

For example, the probability of the word "love" occurring in positive texts is 0.1, and in negative texts it is 0.001.

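A minimal sentiment sketch with scikit-learn; the tiny labeled texts are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labeled texts: 1 = positive, 0 = negative.
texts = ["I love this movie", "great and fun", "I hate it", "awful boring film"]
labels = [1, 1, 0, 0]

# Represent each text by its word counts, as described above.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(["what a great movie"])))
```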

6. K-nearest neighbors

Given a new data point, look at the k points closest to it; whichever category holds the majority among those k neighbors is the category the new point belongs to.

To distinguish cats from dogs using the claw and sound features, the circles and triangles are the known categories; so which category does the star belong to?

When k = 3, the three points connected by the lines are the nearest neighbors. Circles are the majority among them, so the star is classified as a cat.

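A sketch of the cat/dog example; the (claws, sound) feature values are made up to mimic the figure:

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical features: circles (cats) and triangles (dogs).
X = [[1, 1], [1, 2], [2, 1], [4, 4], [4, 5], [5, 4]]
y = ["cat", "cat", "cat", "dog", "dog", "dog"]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2]]))  # the "star": majority vote of 3 neighbors
```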

7. K-means

Suppose we want to divide a set of data into three categories; the pink values are high and the yellow values are low. First initialize: choose the simplest values 3, 2, 1 as the initial center of each category. Then calculate the distance between every remaining data point and the three initial centers, and put each point into the category of its nearest center.

After grouping, calculate the mean of each category and use it as the center for the next round.

After a few rounds, once the group assignments stop changing, you can stop.

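A minimal 1-D sketch of exactly these steps; the data values are invented, and 3, 2, 1 are the initial centers as above:

```python
# Toy 1-D data; three clusters expected around low/mid/high values.
data = [0.9, 1.0, 1.2, 1.9, 2.1, 2.2, 3.0, 3.1, 3.3]
centers = [3.0, 2.0, 1.0]  # the simplest possible initialization

for _ in range(10):
    # Assignment step: each point joins its nearest center's category.
    clusters = [[] for _ in centers]
    for x in data:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        clusters[nearest].append(x)
    # Update step: each center moves to the mean of its category.
    new_centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    if new_centers == centers:  # assignments stopped changing
        break
    centers = new_centers

print(centers)
```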

8. AdaBoost

AdaBoost is one of the boosting methods.

Boosting combines several classifiers that are individually weak into one classifier with good accuracy.

In the figure below, neither the left nor the right decision tree is very good individually, but feeding the same data to both and weighing the two results together increases the credibility of the prediction.

An AdaBoost example: handwriting recognition. Many features can be captured from the drawing board, such as the direction of the starting stroke, the distance between the start point and the end point, and so on.

During training, a weight is learned for each feature. For example, the beginnings of a 2 and a 3 are written very similarly, so this feature contributes little to telling them apart and its weight will be small.

This alpha angle, by contrast, is strongly discriminative, so its feature weight will be large. The final prediction is a weighted combination of the results from all these features.

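A minimal scikit-learn sketch; the toy data stands in for the handwriting features, and by default each weak learner is a one-split decision stump:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy data standing in for the handwriting features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Each round adds a weak learner and re-weights the misclassified
# examples; the final prediction is the weighted vote of all learners.
model = AdaBoostClassifier(n_estimators=50).fit(X, y)
print(model.score(X, y))
```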

9. Neural networks

Neural networks are suitable when an input can fall into at least two categories.

A neural network consists of several layers of neurons and the connections between them. The first layer is the input layer and the last is the output layer.

Both the hidden layers and the output layer have their own classifiers.

The input is fed into the network and activated; the calculated scores are passed to the next layer, which in turn activates the layer behind it. Finally, the score on each output-layer node represents how strongly the input belongs to each class. In the example below, the classification result is class 1.

The same input is transmitted to different nodes, but each produces a different result because each node has its own weights and bias.

This is also called forward propagation.

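A forward-propagation sketch in plain NumPy; every weight, bias, and input value here is a made-up toy number:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2])            # input layer (2 features)
W1 = np.array([[0.1, 0.4],           # hidden layer: 3 neurons,
               [-0.3, 0.8],          # each with its own weights
               [0.7, -0.2]])
b1 = np.array([0.0, 0.1, -0.1])      # ...and its own bias
W2 = np.array([[0.5, -0.6, 0.2],     # output layer: 2 classes
               [-0.4, 0.9, 0.3]])
b2 = np.array([0.05, -0.05])

h = sigmoid(W1 @ x + b1)             # activate the hidden layer
scores = sigmoid(W2 @ h + b2)        # scores for each class
print("predicted class:", scores.argmax())
```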

10. Markov chains

Markov chains are composed of states and transitions.

Take the sentence "the quick brown fox jumps over the lazy dog" and derive a Markov chain from it.

The steps: first make each word a state, then calculate the probability of transitioning between the states.

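A minimal sketch that builds the chain from the sentence, using only the standard library:

```python
from collections import Counter, defaultdict

text = "the quick brown fox jumps over the lazy dog"
words = text.split()

# Count transitions between consecutive words (states).
counts = defaultdict(Counter)
for current_word, next_word in zip(words, words[1:]):
    counts[current_word][next_word] += 1

# Normalize the counts into transition probabilities.
chain = {state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
         for state, nexts in counts.items()}
print(chain["the"])  # {'quick': 0.5, 'lazy': 0.5}
```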

These are the probabilities calculated from a single sentence. If you run the statistics over a large amount of text, you get a much larger state-transition matrix: for example, all the words that can follow "the", each with its corresponding probability.

In everyday life, the suggestion candidates offered by keyboard input methods work on the same principle; the models are just more advanced.