This article gives a common-sense overview of ML algorithms: no complex theoretical derivation, just a diagram for each, so you can see what these algorithms are and how they are applied. The examples are mainly classification problems.
For each algorithm I watched several videos and picked out the clearest and most interesting explanations to summarize here.
We'll do more in-depth analyses of individual algorithms in later articles.
Today's algorithms are:
Decision tree, random forest, logistic regression, SVM, naive Bayes, k-nearest neighbors, k-means, AdaBoost, neural networks, Markov chains
1. The decision tree
A decision tree classifies according to a set of features. Each node asks a question that splits the data into two branches, and each branch then asks further questions. The questions are learned from the existing data; when new data comes in, it is routed down the tree by those questions until it lands in the appropriate leaf.
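As a tiny sketch of this idea (the questions, thresholds, and fruit labels below are invented for illustration, not learned from any real data):

```python
# A hand-built toy decision tree. Each internal node asks one
# question about a feature; the leaves hold the predicted class.
# In a real decision tree these questions are learned from data.
def classify(sample):
    if sample["weight"] > 150:           # first question at the root
        if sample["texture"] == "rough": # second question on one branch
            return "orange"
        return "apple"
    return "cherry"                      # leaf reached without more questions

print(classify({"weight": 170, "texture": "rough"}))  # orange
```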
2. Random forest
Select data randomly from the source data to form several subsets
The S matrix is the source data, with rows 1 to N. A, B, and C are the features, and the last column is the category
Generate M submatrices randomly from S
These M subsets are used to train M decision trees
Put the new data into the M trees to get M classification results. Count which category receives the most votes and take it as the final prediction
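A minimal sketch of the voting step, with three stub "trees" standing in for real trained trees (their rules are made up; a real random forest would train each tree on a random subset of rows and features):

```python
from collections import Counter

# Three stand-in "trees"; each maps an input to a class label
def tree_a(x): return "cat" if x[0] > 0.5 else "dog"
def tree_b(x): return "cat" if x[1] > 0.5 else "dog"
def tree_c(x): return "dog"  # a deliberately weak tree

def forest_predict(trees, x):
    # Put x into every tree, then take the majority vote
    votes = Counter(t(x) for t in trees)
    return votes.most_common(1)[0][0]

print(forest_predict([tree_a, tree_b, tree_c], (0.9, 0.8)))  # cat
```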
3. Logistic regression
When the prediction target is a probability, the output must be greater than or equal to 0 and less than or equal to 1. A simple linear model cannot guarantee this: once the input goes outside a certain range, the output exceeds the [0, 1] interval.
So it would be good to have a model with this shape
So how do you get this model?
This model has to satisfy two conditions: its output is greater than or equal to 0, and less than or equal to 1
To make a model greater than or equal to 0, you can choose the absolute value, the square, or the exponential function; all of these are non-negative, and the exponential is used here
To make it less than or equal to 1, divide: with the value itself as the numerator and the value plus 1 as the denominator, the ratio must be less than 1
Applying one more transformation gives the logistic regression model
The coefficients can be calculated from the source data
And you end up with the logistic curve
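The construction above can be written out directly. Here the coefficients of the linear part are made up for illustration; in practice they would be fitted from the source data:

```python
import math

# exp(x) is always > 0, and exp(x) / (exp(x) + 1) is always < 1,
# which gives exactly the S-shaped logistic (sigmoid) curve
def sigmoid(x):
    return math.exp(x) / (math.exp(x) + 1.0)

# A logistic model is the sigmoid applied to a linear combination;
# w and b below are invented, not learned
def predict_prob(x, w=1.5, b=-0.5):
    return sigmoid(w * x + b)

print(sigmoid(0))  # 0.5, the midpoint of the curve
```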
4. SVM
support vector machine
To separate the two classes we want a hyperplane, and the optimal hyperplane is the one that maximizes the margin between the two classes. The margin is the distance from the hyperplane to its nearest points; in the figure below, Z2 > Z1, so the green hyperplane is better
Represent this hyperplane as a linear equation: points of one class above the line all give values greater than or equal to 1, and points of the other class give values less than or equal to -1
The distance from a point to the plane is calculated by the formula in the figure
The expression for the total margin follows. The goal is to maximize the margin, so the denominator needs to be minimized, which turns this into an optimization problem
For example, given three points, find the optimal hyperplane and define the weight vector as w = (2,3) - (1,1)
The weight vector is (a, 2a). Substituting the two points into the hyperplane equation, (2,3) gives the value 1 and (1,1) gives the value -1, so a and the intercept w0 can be solved, which yields the expression for the hyperplane
Substituting the solved a back into (a, 2a) gives the support vectors
Substituting a and w0 into the hyperplane equation gives the support vector machine
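Working through the arithmetic of this example (a quick check of the two equations, not a general SVM solver):

```python
# From the example above, w = (a, 2a) and the two constraints are:
#   point (2,3):  2a + 3*(2a) + w0 =  1   ->  8a + w0 =  1
#   point (1,1):  1a + 1*(2a) + w0 = -1   ->  3a + w0 = -1
# Subtracting the equations gives 5a = 2, so a = 0.4 and w0 = -2.2
a = 2 / 5
w = (a, 2 * a)        # weight vector (0.4, 0.8)
w0 = -1 - 3 * a       # intercept -2.2

def decision(x):
    # signed value of the hyperplane equation at point x
    return w[0] * x[0] + w[1] * x[1] + w0

# Both points sit exactly on the margins, so they are support vectors
print(round(decision((2, 3)), 6), round(decision((1, 1)), 6))
```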
5. Naive Bayes
Take an example of an application in NLP
Given a paragraph of text, return its sentiment classification: is the attitude of the paragraph positive or negative?
To solve this problem, you only need to look at some of the words
The text is represented by just those words and their counts
The original question is: given a piece of text, which category does it fall into?
Bayes' rule makes this a relatively easy problem to solve
The question becomes: what is the probability of this text appearing in each category? And of course, don't forget the other two probabilities in the formula
For example, the probability of the word "love" occurring in positive texts is 0.1, and in negative texts 0.001
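A toy version of this scorer: the 0.1 and 0.001 probabilities for "love" echo the text, while the other word probabilities and the equal class priors are made up. Word probabilities multiply because naive Bayes assumes words are independent given the class:

```python
# Made-up per-class word probabilities (only "love" comes from the text)
p_word = {
    "love":  {"pos": 0.1,   "neg": 0.001},
    "awful": {"pos": 0.002, "neg": 0.05},
}
p_class = {"pos": 0.5, "neg": 0.5}  # assumed equal priors

def classify_sentiment(words):
    scores = {}
    for c in p_class:
        score = p_class[c]          # start from the class prior
        for w in words:
            if w in p_word:         # ignore words we have no table entry for
                score *= p_word[w][c]
        scores[c] = score
    return max(scores, key=scores.get)  # highest-scoring class wins

print(classify_sentiment(["i", "love", "it"]))  # pos
```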
6. K nearest neighbor
k nearest neighbours
Given a new data point, look at the k points closest to it; whichever category holds the majority among those k points is the category the new point belongs to
To distinguish cats from dogs by the claws and sound features, circles and triangles are the known categories, so what category does the star represent?
When k = 3, the points connected by the three lines are the nearest three points. Circles are the majority, so the star belongs to the cat category
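A minimal sketch of this vote, with invented (claws, sound) feature pairs standing in for the figure:

```python
import math
from collections import Counter

# Labeled points: ((claws, sound), category); values are made up
points = [((1.0, 1.0), "cat"), ((1.2, 0.9), "cat"), ((1.1, 1.3), "cat"),
          ((3.0, 3.0), "dog"), ((3.2, 2.8), "dog")]

def knn(x, k=3):
    # Sort the labeled points by distance to x, then vote among the k nearest
    by_dist = sorted(points, key=lambda p: math.dist(p[0], x))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

print(knn((1.5, 1.2)))  # cat
```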
7. K-means
Suppose you want to divide a set of data into three categories; pink values are high and yellow values are low
First, initialize: here the simplest values 3, 2, 1 are chosen as the initial centers of the three classes
For the rest of the data, compute the distance to each of the three initial values, and assign each point to the class of the nearest one
After the points are grouped, compute the average of each group and use it as the center point for the new round
After a few rounds, once the groups stop changing, you can stop
8. Adaboost
AdaBoost is one of the boosting methods
Boosting combines several classifiers that individually perform poorly into one classifier that performs well
In the figure below, the left and right decision trees are individually not very good, but if you put the same data into both and consider the two results together, the credibility increases
An AdaBoost example: handwriting recognition. The drawing board can capture many features, such as the direction of the starting stroke, the distance between the start and end points, and so on
During training, the weight of each feature is learned. For example, the starting strokes of 2 and 3 are very similar, so this feature contributes little to telling them apart and its weight will be small
This alpha angle, however, is strongly discriminative, so this feature's weight will be large. The final prediction is a combined consideration of the results from all these features
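A miniature of that weighted vote: the decision stumps and alpha weights below are hand-picked for illustration, not learned with the real AdaBoost update rule:

```python
# Two weak classifiers ("stumps"), each only looking at one feature
def stump1(x): return 1 if x[0] > 0 else -1
def stump2(x): return 1 if x[1] > 0 else -1

# Alpha weights: a more reliable stump gets a larger say in the vote
ensemble = [(0.9, stump1), (0.4, stump2)]

def predict(x):
    # Weighted sum of the weak classifiers' votes; the sign decides
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score > 0 else -1

print(predict((1, -1)))  # 1, because stump1 outweighs stump2
```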
9. Neural networks
Neural networks are suitable when an input may fall into at least two categories
An NN consists of several layers of neurons and the connections between them
The first layer is the input layer, and the last layer is the output layer
Both the hidden layers and the output layer have their own classifiers
The input is fed into the network and activated; the computed scores are passed on to activate the next layer of neurons. Finally, the scores on the output layer's nodes represent the scores for each class. In the example below, the classification result is class 1
The same input is transmitted to different nodes with different results, because each node has different weights and biases
This is also called forward propagation
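A minimal forward pass with one hidden layer; all weights and biases here are invented, and in a real network they would be learned:

```python
import math

def sigmoid(z):
    # A common activation function squashing scores into (0, 1)
    return 1 / (1 + math.exp(-z))

def layer(inputs, weights, biases):
    # Each node: weighted sum of its inputs, plus bias, then activation
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [1.0, 0.5]                                               # input layer
hidden = layer(x, [[0.4, 0.6], [0.9, -0.2]], [0.1, 0.0])     # hidden layer
output = layer(hidden, [[1.2, -0.8], [-1.2, 0.8]], [0.0, 0.0])  # output layer

# The output node with the highest score gives the predicted class
print("class", output.index(max(output)) + 1)
```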
10. Markov
Markov chains are composed of states and transitions
Take the sentence "The quick brown fox jumps over the lazy dog" and build its Markov chain
First, make each word a state, then calculate the probability of transitioning between the states
These are the probabilities calculated from one sentence. When you use a large amount of text for the statistics, you get a larger state transition matrix, for example all the words that can follow "the" and their corresponding probabilities
The suggestion candidates in a keyboard input method work on the same principle, just with a more advanced model.
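The counting step above can be sketched directly from the example sentence:

```python
from collections import defaultdict

# States are words; transition probabilities come from counting
# which word follows which in the text
text = "the quick brown fox jumps over the lazy dog"
words = text.split()

counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(words, words[1:]):   # every adjacent word pair
    counts[a][b] += 1

# Normalize each row of counts into probabilities
transitions = {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
               for a, nxt in counts.items()}

print(transitions["the"])  # {'quick': 0.5, 'lazy': 0.5}
```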