Series article: Machine Learning Concepts in Plain Language

I. Categories of machine learning

Machine learning can be divided according to the kind of experience the model learns from, that is, according to the label information available in the training data: supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning.

1.1 Supervised Learning

Supervised learning is the most widely used and most mature branch of machine learning. It learns from labeled data samples (X, Y) how to map the features X to the correct label Y. The process resembles a student who, given a question (features X), learns by checking against the answer (label Y). Guided by the corrections that the label Y provides, the algorithm repeatedly adjusts the model's parameters until the learning goal is reached.

Models commonly used in supervised learning include linear regression, naive Bayes, K-nearest neighbors, logistic regression, support vector machines, neural networks, decision trees and ensemble learning (such as LightGBM). Depending on the application, supervised models are further divided by whether the predicted value Y takes one of a finite set of discrete values (classification) or a value from a continuous range (regression).

Classification model

A classification model handles tasks in which the prediction takes one of a finite set of values. The following example uses a logistic regression classifier to predict whether it will rain from temperature, humidity and wind speed.

  • Introduction to Logistic regression

Although logistic regression has “regression” in its name, it is actually a generalized linear model for classification, and it is widely used in practice because it is simple and efficient.

The structure of a logistic regression model can be viewed as a two-layer neural network (see Figure 4.5). The model takes the input x, and the neuron's activation function f (the sigmoid function) nonlinearly transforms the weighted input into an output between 0 and 1. The final decision function is Y = sigmoid(wx + b), where the model parameter w holds the weights (w1, w2, w3, …) of the individual features (x1, x2, x3, …), the model parameter b is the bias term, and Y is the prediction (in the range 0 to 1).
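As a minimal sketch of the decision function Y = sigmoid(wx + b), the following NumPy snippet computes the prediction for a single sample. The weights, bias and feature values are made-up illustration values, not parameters learned from the weather data set, and the 0.5 threshold is a common convention assumed here.

```python
import numpy as np

def sigmoid(z):
    """Map any real value to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """Logistic regression decision function Y = sigmoid(w.x + b)."""
    return sigmoid(np.dot(x, w) + b)

# Hypothetical values, for illustration only
w = np.array([0.5, -1.0, 2.0])   # one weight per feature (w1, w2, w3)
b = 0.1                          # bias term
x = np.array([1.0, 2.0, 0.5])    # one sample with three features (x1, x2, x3)

y_prob = predict_proba(x, w, b)  # a value between 0 and 1
y_label = int(y_prob >= 0.5)     # assumed threshold of 0.5 to obtain a class
print(y_prob, y_label)
```

Here wx + b = -0.4, so the output falls slightly below 0.5 and the sample is assigned to class 0.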

The learning objective of the model is to minimize the cross-entropy loss function. Gradient descent is commonly used to iteratively search for the minimum of the loss function and thereby obtain better model parameters.
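To make the training objective concrete, here is a small sketch of batch gradient descent on the cross-entropy loss. The toy data, learning rate and iteration count are assumptions chosen for illustration; a library such as scikit-learn uses more sophisticated solvers internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy loss."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def train_logistic(x, y, lr=0.1, n_iter=1000):
    """Batch gradient descent for logistic regression."""
    w = np.zeros(x.shape[1])
    b = 0.0
    losses = []
    for _ in range(n_iter):
        y_prob = sigmoid(x @ w + b)
        losses.append(cross_entropy(y, y_prob))
        error = y_prob - y                 # gradient of the loss w.r.t. (wx + b)
        w -= lr * (x.T @ error) / len(y)   # step against the gradient for w
        b -= lr * error.mean()             # step against the gradient for b
    return w, b, losses

# Toy data (hypothetical): the label is 1 when the single feature is positive
x = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

w, b, losses = train_logistic(x, y)
print(losses[0], losses[-1])  # the loss before and after training
```

With all parameters starting at zero, every prediction is 0.5 and the initial loss is ln 2; each iteration then moves w and b in the direction that lowers the loss.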

  • Code sample

The weather data set used in the example is a simple record of weather observations, including outdoor temperature, humidity, wind speed and whether it rained. In the classification task, whether it rained is used as the label and the other columns as features (see Figure 4.6).

    import pandas as pd  # Import the pandas library
    from sklearn.linear_model import LogisticRegression  # Logistic regression model

    weather_df = pd.read_csv('./data/weather.csv')  # Load the weather data set
    weather_df.head(10)  # Display the first 10 rows of data

    x = weather_df.drop('If Rain', axis=1)  # Features: every column except the label
    y = weather_df['If Rain']  # Label: whether it rained
    lr = LogisticRegression()
    lr.fit(x, y)  # Train the model
    print(lr.predict(x[0:10]))  # Predict the first 10 samples and output the results

The predictions of the trained model for the first 10 samples are: [1 1 1 1 1 10 1 1]. Compared with the actual labels of the first 10 samples, [1 1 1 10 10 1], the prediction accuracy is not high. Later sections describe how to evaluate a model's predictions and how to improve them further.
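Comparing predictions with actual labels can be expressed as a single number, the accuracy. The sketch below uses hypothetical label arrays chosen for illustration; they are not the output of the weather example.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the actual label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Hypothetical labels for illustration, not the article's actual output
y_true = [1, 1, 1, 1, 0, 1, 0, 1, 0, 1]
y_pred = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1]
print(accuracy(y_true, y_pred))
```

In this made-up case two of the ten predictions are wrong, both of them rainless samples predicted as rain, which is the kind of error a single accuracy figure can hide.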

Regression model

A regression model handles tasks in which the prediction takes a value from a continuous range. The following code example uses a linear regression model to predict outdoor humidity from temperature, wind speed and rain.

  • Introduction to Linear Regression

The linear regression model assumes a linear relationship between y and x; for an input x, its decision function is y = wx + b. The learning objective of the model is to minimize the mean squared error loss function, and the least squares method is commonly used to solve for the optimal model parameters.
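As a rough sketch of the least squares solution, the snippet below fits y = wx + b with NumPy's `np.linalg.lstsq`. The toy data is generated from y = 2x + 1 purely for illustration, and the helper names are hypothetical.

```python
import numpy as np

def fit_linear_least_squares(x, y):
    """Find w and b that minimize the squared error of y = x.w + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([x, np.ones(len(x))])  # extra column of ones for b
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[:-1], params[-1]

def mean_squared_error(y_true, y_pred):
    """Mean squared error loss."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Toy data (hypothetical) generated from y = 2x + 1
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

w, b = fit_linear_least_squares(x, y)
y_pred = x @ w + b
print(w, b, mean_squared_error(y, y_pred))
```

Because the toy data lies exactly on a line, the fitted parameters recover the generating slope and intercept and the mean squared error is essentially zero; real data would leave a nonzero residual.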

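  • Code sample

    from sklearn.linear_model import LinearRegression  # Linear regression model

    # weather_df is the DataFrame loaded in the classification example
    x = weather_df.drop('Humidity', axis=1)  # Features: every column except humidity
    y = weather_df['Humidity']  # Label: outdoor humidity
    linear = LinearRegression()
    linear.fit(x, y)  # Train the model
    print("Predictions for the first 10 samples:", linear.predict(x[0:10]))
    # Output begins: [0.42053525 0.32811401 0.31466161 0.3238797 0.29984453 0.29880059 ...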

1.2 Unsupervised learning

Unsupervised learning is also widely used in machine learning. It learns the inherent structure of the data from unlabeled data (X). The process resembles a student who is given no reference answers (Y) and must work out and summarize the key points purely by studying the questions themselves. Depending on the application, unsupervised learning can be divided into clustering, dimensionality reduction and association analysis. In the following example, iris samples of different varieties are grouped by Kmeans clustering.

  • Introduction to Kmeans clustering

Kmeans clustering is a common unsupervised learning method. It initializes K cluster centers and then iteratively updates the cluster assignments so as to minimize the distance between each sample and the center of the cluster it belongs to. The algorithm steps are as follows:

  1. Initialization: randomly select K samples as the initial cluster centers (the value of K can be chosen from prior knowledge or by validation).

  2. Compute the distance between each sample in the data set and each of the K cluster centers, and assign the sample to the cluster whose center is nearest.

  3. Recompute the center position of each cluster.

  4. Repeat steps 2 and 3 until a termination condition is met (for example, a maximum number of iterations is reached, or the cluster centers no longer move).
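The four steps can be sketched directly in NumPy. This is a simplified illustration under stated assumptions (Euclidean distance, a fixed random seed, a tiny made-up data set), not the implementation scikit-learn uses.

```python
import numpy as np

def kmeans(x, k, n_iter=100, seed=0):
    """A minimal K-means following the four steps described above."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k distinct samples as the initial cluster centers
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each sample to the nearest cluster center
        distances = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recompute each center as the mean of its assigned samples
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy data (hypothetical): two well-separated groups of points
x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centers = kmeans(x, k=2)
print(labels)
```

Which group receives cluster index 0 depends on the random initialization, so only the grouping itself is meaningful, not the index values.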

  • Code sample

    from sklearn.datasets import load_iris  # Iris data set
    from sklearn.cluster import KMeans  # KMeans model
    import matplotlib.pyplot as plt

    iris_df = load_iris()  # Iris data set containing three varieties
    x = iris_df.data
    k = 3  # Number of clusters; the data set is known to contain three varieties, so K is set to 3
    model = KMeans(n_clusters=k)
    model.fit(x)  # Train the model
    print("Clustering results of the first 10 samples:", model.predict(x[0:10]))
    # The model predicts the first 10 samples and outputs the clustering results: [1 1 1 1 1 1 1 1 1]

    # Show the clustering of the samples with a scatter plot
    x_axis = iris_df.data[:, 0]  # Sepal length (cm) as the x-axis
    y_axis = iris_df.data[:, 1]  # Sepal width (cm) as the y-axis
    plt.scatter(x_axis, y_axis, c=model.predict(x))  # Color each point by its cluster
    plt.xlabel('Sepal length (cm)')
    plt.ylabel('Sepal width (cm)')
    plt.title('Iris KMeans Scatter')
    plt.show()

1.3 Semi-supervised learning

Semi-supervised learning lies between traditional supervised learning and unsupervised learning (see Figure 4.8). Its idea is that, when only a small number of labeled samples are available, unlabeled samples are brought into model training under certain assumptions so that the overall underlying distribution of the data is captured more fully. This mitigates both the blindness of traditional unsupervised learning and the poor results that supervised learning produces when training samples are insufficient. Depending on the application, semi-supervised learning can be divided into clustering, classification and regression methods. The following example groups the members of a club using a graph-based semi-supervised algorithm, the label propagation algorithm.

  • Introduction to the label propagation algorithm

The label propagation algorithm (LPA) is a graph-based semi-supervised classification algorithm. Its basic idea is to predict the labels of unlabeled nodes from the labels of labeled nodes in a graph built from all the samples.

  1. First, a graph is built from the relationships between samples (these can be objective relationships between samples, or relationships computed with a similarity function).

  2. The available label information (if any) is then added to the graph, and each unlabeled node is initialized with a random unique label.

  3. The label of each node is set to the label that occurs most frequently among its neighboring nodes. This update is repeated until the labels no longer change, at which point the algorithm has converged.
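The three steps can be sketched in plain Python on a small hand-made graph. Several details here are assumptions made to keep the sketch deterministic: the known labels are kept fixed during propagation, nodes are updated in sorted order, and ties are broken by keeping the current label when possible (implementations usually break ties at random). The graph and the labels 'A' and 'B' are hypothetical.

```python
from collections import Counter

def label_propagation(adjacency, known_labels, max_iter=100):
    """Propagate labels over a graph given as {node: [neighbors]}."""
    # Step 2: keep the known labels and give every other node a unique label
    labels = {node: known_labels.get(node, f"unlabeled-{node}")
              for node in adjacency}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adjacency):
            # Assumption: known labels stay fixed; isolated nodes are skipped
            if node in known_labels or not adjacency[node]:
                continue
            # Step 3: adopt the most frequent label among the neighbors
            counts = Counter(labels[nb] for nb in adjacency[node])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            # Assumed tie-break: keep the current label if it is tied for first
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # converged: no label changed in a full pass
            break
    return labels

# Step 1: a hypothetical graph of two tightly connected groups joined by edge 3-4
adjacency = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
known_labels = {0: 'A', 1: 'A', 6: 'B', 7: 'B'}
labels = label_propagation(adjacency, known_labels)
print(labels)
```

In this example nodes 2 and 3 take label 'A' and nodes 4 and 5 take label 'B', because each of them has more neighbors on its own side of the connecting edge.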

  • Code sample

The data set used in the example, the karate club, is a widely used social network in which the nodes represent the members of a karate club and the edges represent the relationships between members.

    import networkx as nx  # Import the NetworkX network library
    import matplotlib.pyplot as plt
    from networkx.algorithms import community  # Community detection algorithms

    G = nx.karate_club_graph()  # Karate club data set
    lpa = community.label_propagation_communities(G)  # Run the label propagation algorithm
    community_index = {n: i for i, com in enumerate(lpa) for n in com}  # Map each node to its community index
    node_color = [community_index[n] for n in G]  # Color each node by its community

    pos = nx.spring_layout(G)  # Node layout
    nx.draw_networkx_labels(G, pos)  # Draw the node labels
    nx.draw(G, pos, node_color=node_color)  # Draw the network
    plt.title('Karate_club network LPA')
    plt.show()

1.4 Reinforcement Learning

To some extent, reinforcement learning can be regarded as supervised learning with delayed label information (see Figure 4.9). It refers to the learning process in which an agent takes an action in an environment, the environment turns that action into a reward and a new state representation, and both are fed back to the agent. This book gives only a brief introduction to reinforcement learning; interested readers can explore it further.
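The agent and environment loop can be sketched with a toy example. Everything below is an assumption for illustration: the environment is a hypothetical five-cell corridor with a reward only at the right end, and the learning rule is tabular Q-learning, which the article itself does not cover and is used here only as one possible way for the agent to learn from delayed rewards.

```python
import random

N_STATES = 5         # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = [-1, +1]   # move left or move right

def env_step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # the reward arrives only at the goal
    return next_state, reward, done

def train(n_episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Agent: learn action values from the rewards the environment feeds back."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(n_episodes):
        state = 0
        for _ in range(100):                    # cap on steps per episode
            a = rng.randrange(2)                # explore with random actions
            next_state, reward, done = env_step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted future value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q

q = train()
# Greedy action for each non-terminal cell after training
policy = [ACTIONS[row.index(max(row))] for row in q[:-1]]
print(policy)
```

Although a reward is given only on the final step, the value estimates propagate backward through the corridor, so the learned greedy action in every cell is to move right toward the goal.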


This article was published on the public account Algorithm Advanced; the GitHub project source code can be reached through the "Read the original" link on the public account.