This article gives a common-sense understanding of machine learning algorithms: no heavy theoretical derivation, just intuition and diagrams (plus a few small code sketches), so you know what these algorithms are and how they are applied. The examples are mainly classification problems.
For each algorithm I watched several videos and picked out the clearest and most interesting ones for this introduction. A more in-depth analysis of individual algorithms can come later.
Today’s algorithms are as follows:
- Decision tree
- Random forest
- Logistic regression
- SVM
- Naive Bayes
- K-nearest neighbors
- K-means
- AdaBoost
- Neural networks
- Markov chains
1. Decision tree
A decision tree classifies data according to its features. Each node asks a question about one feature and splits the data in two based on the answer, and the branches then ask further questions. These questions are learned from existing data; when new data comes in, it is routed down the tree by those questions into the appropriate leaf.
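As a minimal sketch of this idea in code (using scikit-learn; the toy features and labels below are made up purely for illustration):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: each row is [feature_1, feature_2]; the label happens
# to follow feature_1, so a shallow tree can learn the split.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=2)
tree.fit(X, y)                 # learn the split questions from existing data
print(tree.predict([[1, 1]]))  # route a new point down the tree -> [1]
```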
2. Random forest
Select data randomly from the source data to form several subsets
The S matrix is the source data, holding rows 1 through N; A and B are feature columns, and the last column, C, is the class label.
Generate M submatrices randomly from S
These M subsets train M decision trees. Put new data into the M trees and you get M classification results; count how often each class is predicted and take the most frequent class as the final prediction.
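A minimal sketch of this voting scheme, using scikit-learn's RandomForestClassifier on synthetic data (the generated dataset stands in for the S matrix above):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for the S matrix (rows 1..N, features, label).
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# M = 10 trees, each trained on a random bootstrap subset of the rows.
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X, y)

# Each tree votes; the forest reports the majority class.
print(forest.predict(X[:1]))
```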
3. Logistic regression
When the prediction target is a probability, the output must lie between 0 and 1. A simple linear model cannot guarantee this: outside a certain input range, its output leaves that interval.
So it would be good to have a model with this shape
So how do you get this model?
The model has to satisfy two conditions: the output is at least 0, and at most 1. To get "at least 0", choose something that is always non-negative, such as an absolute value, a square, or an exponential; the exponential is used here. To also get "at most 1", divide: the numerator is the value itself and the denominator is that value plus 1, so the ratio is always less than 1.
Doing one more transformation (rewriting e^z / (e^z + 1) as 1 / (1 + e^-z)) gives the logistic regression model.
Fit the coefficients from the source data, and you end up with the familiar S-shaped logistic curve.
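A minimal sketch of the resulting model (the weight w and bias b below are made-up numbers, not fit from any data):

```python
import numpy as np

def sigmoid(z):
    # e^z / (e^z + 1), rewritten as 1 / (1 + e^-z): always in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# The underlying linear model is z = w*x + b.
w, b = 2.0, -1.0
for x in [-2.0, 0.0, 2.0]:
    print(x, sigmoid(w * x + b))  # outputs rise from near 0 toward 1
```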
4. SVM
support vector machine
To separate the two classes, we want a hyperplane; the optimal hyperplane is the one that maximizes the margin between the two classes. The margin is the distance from the hyperplane to its nearest points. As the figure shows, Z2 > Z1, so the green hyperplane is the better one.
Represent the hyperplane as a linear equation: for one class, every point above the line evaluates to at least 1; for the other class, every point evaluates to at most -1.
The distance from a point to the hyperplane is computed with the formula in the figure: distance = |w·x + b| / ||w||.
The total margin therefore comes out as 2 / ||w||. The goal is to maximize the margin, so the denominator ||w|| must be minimized, and this becomes an optimization problem.
For example, given three points, find the optimal hyperplane. Take the direction of the weight vector from the difference of two points: w = (2,3) - (1,1).
So the weight vector has the form (a, 2a). Substitute the two points into the hyperplane equation: (2,3) must give the value 1 and (1,1) the value -1. Solving yields a and the intercept w0, and hence the expression of the hyperplane.
Solving for a and substituting it back into (a, 2a) gives the weight vector; the points (2,3) and (1,1), which sit exactly on the margin, are the support vectors.
Plugging a and w0 into the hyperplane equation gives the trained support vector machine.
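The same arithmetic as a small script; it just solves the two linear equations above, with no SVM library involved:

```python
import numpy as np

# Unknowns a and w0, with w = (a, 2a).
# Constraint for (2,3): 2a + 3*2a + w0 = 1   ->  8a + w0 =  1
# Constraint for (1,1): 1a + 1*2a + w0 = -1  ->  3a + w0 = -1
A = np.array([[8.0, 1.0],
              [3.0, 1.0]])
b = np.array([1.0, -1.0])
a, w0 = np.linalg.solve(A, b)

print("w =", (a, 2 * a), "w0 =", w0)
# w = (0.4, 0.8), w0 = -2.2, i.e. the hyperplane x1 + 2*x2 - 5.5 = 0
```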
5. Naive Bayes
Take an example application from NLP:
Given a piece of text, return its sentiment classification: is the attitude of the text positive or negative?
To solve this problem, you only need to look at some of the words.
The text is then represented by just a few words and their counts.
The original question, "given this text, which category does it belong to?", is turned by Bayes' rule into a simpler, more tractable question.
The question becomes: what is the probability of this text given each category? And remember the other two probabilities in Bayes' formula, the class prior P(class) and the evidence P(text).
For example, the probability of the word "love" appearing in positive text is 0.1, while in negative text it is 0.001.
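A minimal sketch of this classifier; the per-class word probabilities below (including the 0.1 and 0.001 for "love") are made-up illustration values, not learned from a corpus:

```python
import math

# Made-up per-class word probabilities for illustration only.
p_word = {
    "positive": {"love": 0.1,   "awesome": 0.05,  "bad": 0.001},
    "negative": {"love": 0.001, "awesome": 0.001, "bad": 0.2},
}
p_class = {"positive": 0.5, "negative": 0.5}

def classify(words):
    # Naive Bayes: score(c) = log P(c) + sum of log P(word | c),
    # with a tiny fallback probability for unseen words.
    scores = {}
    for c in p_class:
        scores[c] = math.log(p_class[c]) + sum(
            math.log(p_word[c].get(w, 1e-6)) for w in words
        )
    return max(scores, key=scores.get)

print(classify(["love", "awesome"]))  # -> positive
```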
6. K-nearest neighbors
k nearest neighbours
When a new data point arrives, find the k points closest to it; whichever category most of those k neighbors belong to is the category assigned to the new point.
To distinguish cats from dogs by claw and sound features, the known animals are plotted as circles and triangles. So which category does the star belong to?
When k = 3, the three points joined by lines are the nearest neighbors. Circles are in the majority there, so the star is classified as a cat.
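A minimal sketch of that vote, with made-up claw/sound feature values:

```python
from collections import Counter

def knn_classify(point, data, k=3):
    # data: list of ((x, y), label); vote among the k nearest points.
    by_distance = sorted(
        data,
        key=lambda item: (item[0][0] - point[0]) ** 2
                       + (item[0][1] - point[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up (claw, sound) feature values for known cats and dogs.
animals = [((1.0, 1.1), "cat"), ((1.2, 0.9), "cat"), ((1.1, 1.3), "cat"),
           ((3.0, 3.2), "dog"), ((3.1, 2.8), "dog")]
print(knn_classify((1.0, 1.0), animals))  # -> cat
```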
7. K-means
Suppose we want to divide a set of data into three clusters, where pink values are high and yellow values are low. First initialize: take the simplest choice, 3, 2 and 1, as the initial value of each cluster. Then compute the distance from every remaining point to the three initial values, and assign each point to the cluster of the nearest one.
After grouping the points, compute the average of each cluster and use it as the cluster's center for the next round.
After a few rounds, once the groups stop changing, you can stop.
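A minimal sketch of those rounds in one dimension, starting from the initial values 3, 2, 1 (the data values here are made up):

```python
def kmeans_1d(values, centers, rounds=10):
    # Lloyd's algorithm in one dimension, starting from the given centers.
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to the cluster of the nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # New centers: the average of each cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # groups stopped changing
            break
        centers = new_centers
    return centers, clusters

values = [1.0, 1.1, 1.9, 2.2, 2.9, 3.3, 3.1]
print(kmeans_1d(values, [3.0, 2.0, 1.0]))
```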
8. AdaBoost
AdaBoost is one of the boosting methods.
Boosting combines several classifiers that individually classify poorly into a single classifier that performs well.
In the figure below, the left and right decision trees are not very good on their own, but if you feed the same data into both and consider the two results together, the prediction becomes more credible.
An AdaBoost example: handwriting recognition. The drawing board can capture many features, such as the direction of the starting stroke, the distance between the start and end points, and so on.
During training, each feature is given a weight. For example, the opening strokes of 2 and 3 are very similar, so this feature contributes little to telling them apart and its weight is small.
This alpha angle, on the other hand, is highly discriminative, so that feature's weight is large. The final prediction weighs the results of all these features together.
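A minimal sketch using scikit-learn's AdaBoostClassifier on synthetic data (its default weak learners are depth-1 decision trees, standing in for the weak per-feature classifiers described above):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic two-class data standing in for the handwriting features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# AdaBoost combines many weak learners, weighting each one by how
# much it improves the overall classification.
boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X, y)
print(boost.score(X, y))  # training accuracy of the combined classifier
```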
9. Neural networks
Neural networks are suitable when an input may fall into at least two categories.
An NN consists of several layers of neurons and the connections between them. The first layer is the input layer, and the last is the output layer.
Both the hidden layers and the output layer have their own classifiers.
The input is fed into the network and activates the first layer; the computed scores pass on to activate the next layer, and so on. At the end, the scores on the output layer's nodes represent the scores for each class. In the example below, the classification result is class 1.
The same input travels to different nodes yet produces different results, because each node has its own weights and bias.
This is also called forward propagation.
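A minimal sketch of forward propagation with NumPy; the layer sizes, weights, and biases are random stand-ins, not a trained network:

```python
import numpy as np

def forward(x, layers):
    # One forward pass: each layer computes sigmoid(W @ x + b)
    # and hands its activations to the next layer.
    for W, b in layers:
        x = 1.0 / (1.0 + np.exp(-(W @ x + b)))
    return x

# Made-up weights and biases: 2 inputs -> 3 hidden units -> 2 output classes.
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(3, 2)), rng.normal(size=3)),  # input  -> hidden
    (rng.normal(size=(2, 3)), rng.normal(size=2)),  # hidden -> output
]
scores = forward(np.array([0.5, -1.0]), layers)
print(scores, "-> class", scores.argmax())  # output scores per class
```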
10. Markov chains
Markov chains are composed of states and the transitions between them.
Take the sentence "The quick brown fox jumps over the lazy dog" and build a Markov chain from it.
Step one: make each word a state, then calculate the probability of transitioning between the states.
These are the probabilities computed from a single sentence. When you gather statistics over a large body of text, you get a larger state-transition matrix, for example all the words that can follow "the" and their corresponding probabilities.
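A minimal sketch that counts the transition probabilities for that sentence (standard library only):

```python
from collections import Counter, defaultdict

sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()

# Count transitions: for each word (state), which word follows it?
transitions = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    transitions[current][nxt] += 1

# Convert counts to probabilities per state.
for state, counts in transitions.items():
    total = sum(counts.values())
    probs = {w: c / total for w, c in counts.items()}
    print(state, "->", probs)
# 'the' -> {'quick': 0.5, 'lazy': 0.5}; with more text the matrix grows.
```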
The suggestion list in a keyboard input method works on the same principle; the model is just more advanced.