This is the second day of my participation in the Gwen Challenge.

With the basic environment already set up, today we are going to look at some core concepts: the main types of machine learning and the terms used to describe data.

Types of machine learning


First, let’s look at the mainstream types of machine learning: supervised learning, unsupervised learning, reinforcement learning, and deep learning.

Supervised learning

Supervised learning means providing labeled data, that is, input data together with the expected output for each input. The algorithm trains the model repeatedly against the labeled expected outputs, so that the model's predictions come as close to them as possible.
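
To make this concrete, here is a minimal sketch of learning from labeled data. The numbers are made up for illustration, and ordinary least squares on a straight line is used only because it is about the simplest learner possible; it is not a method this series has introduced yet.

```python
# Labeled data: every input comes with its expected output.
# The values are invented for illustration (they follow y = 2x + 1).
inputs = [1.0, 2.0, 3.0, 4.0]
labels = [3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training" here means finding the line that best matches the labels.
slope, intercept = fit_line(inputs, labels)


def predict(x):
    """Use the trained model on an input it has not seen."""
    return slope * x + intercept


print(slope, intercept, predict(5.0))
```

The model never sees the rule behind the data; it only sees pairs of input and expected output, and recovers the relationship from them.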

Unsupervised learning

Unsupervised learning means the data provided is unlabeled, so the machine has to explore the unlabeled data and discover potential connections in it on its own.
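
As a rough illustration, the sketch below groups unlabeled numbers into two clusters with a tiny k-means style loop. The ages are invented, and the function name `two_means` and its fixed starting centers are assumptions made for this example only.

```python
def two_means(values, iterations=10):
    """Group 1-D values into two clusters with a tiny k-means loop."""
    # Deterministic starting centers: the smallest and largest value.
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearer of the two centers.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of the values assigned to it.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return clusters


# Unlabeled data: nobody tells the algorithm which group a value belongs to.
ages = [17, 19, 21, 58, 60, 63]
group_a, group_b = two_means(ages)
print(group_a, group_b)
```

No labels are given, yet the two groups emerge from the structure of the data itself.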

Reinforcement learning

Reinforcement learning is a style of learning built on an incentive mechanism: if the machine acts correctly it receives a positive incentive (a reward), and if it acts incorrectly it receives a negative incentive (a penalty). The machine's goal in such a scenario is to maximize the total incentive it obtains.
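
The incentive idea can be sketched with a toy example. The environment below is hypothetical: it has two actions, one of which is always rewarded with +1 and the other always penalized with -1. The agent tries both a couple of times, keeps a running estimate of how good each action is, and then sticks with the better one. Real reinforcement learning problems are far richer than this.

```python
def reward_for(action):
    """Hypothetical environment: 'right' is correct, anything else is not."""
    return 1.0 if action == "right" else -1.0


actions = ["left", "right"]
value = {action: 0.0 for action in actions}  # estimated worth of each action
learning_rate = 0.5
total_reward = 0.0

for step in range(20):
    if step < 4:
        # Explore: try each action in turn during the first few steps.
        action = actions[step % 2]
    else:
        # Exploit: choose the action with the highest estimated value.
        action = max(actions, key=value.get)
    reward = reward_for(action)
    total_reward += reward
    # Nudge the estimate for this action toward the reward just received.
    value[action] += learning_rate * (reward - value[action])

best_action = max(actions, key=value.get)
print(best_action, total_reward, value)
```

The positive and negative incentives are the only feedback the agent gets, and they are enough for it to settle on the action that yields the larger total reward.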

Deep learning

Deep learning is a family of algorithms derived from neural networks. It uses an artificial neural network as its framework to perform representation learning on data.
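
To give a feel for what "a neural network as the framework" means, here is a minimal sketch of a forward pass through a tiny two-layer network. The weights are hand-picked for illustration rather than learned, and training is left out entirely; the `dense` helper is a name invented for this example.

```python
def relu(x):
    """A common activation function: negative values become zero."""
    return max(0.0, x)


def identity(x):
    return x


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sums followed by an activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not learned) parameters, purely for illustration.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, -1.0]
output_weights = [[2.0, 1.0]]
output_biases = [0.5]

x = [3.0, 1.0]
# The hidden layer turns the raw input into an intermediate representation.
hidden = dense(x, hidden_weights, hidden_biases, relu)
# The output layer turns that representation into the final result.
output = dense(hidden, output_weights, output_biases, identity)
print(hidden, output)
```

The values in `hidden` are the intermediate representation of the input; in real deep learning the weights that produce it are learned from data instead of being written by hand.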

Data and data sets


Machine learning requires data sets, so let’s take a look at the following table:

No.  Country        Gender  Age  Income
1    China          male    24   3500
2    China          female  44   12500
3    United States  male    28   25000
4    Japan          male    34   18000
5    China          male         17500

In the data above, the entire table is called a dataset, each row is called a sample, each column is called a feature, and a specific value in a column is called an attribute value. A data table may also contain blank cells. For example, the age in row 5 is blank; this is called missing data.
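
These terms map directly onto code. The sketch below stores the table as a list of dictionaries using only the Python standard library; the key names (`no`, `country`, and so on) and the use of `None` for the blank cell are choices made for this example.

```python
# The dataset: one dictionary per sample (row). None marks the blank age.
dataset = [
    {"no": 1, "country": "China", "gender": "male", "age": 24, "income": 3500},
    {"no": 2, "country": "China", "gender": "female", "age": 44, "income": 12500},
    {"no": 3, "country": "United States", "gender": "male", "age": 28, "income": 25000},
    {"no": 4, "country": "Japan", "gender": "male", "age": 34, "income": 18000},
    {"no": 5, "country": "China", "gender": "male", "age": None, "income": 17500},
]

# A sample is one row of the table.
first_sample = dataset[0]

# A feature is one column of the table.
age_feature = [row["age"] for row in dataset]

# An attribute value is a single cell.
first_age = first_sample["age"]

# Missing data: every (row number, feature name) whose cell is blank.
missing = [
    (row["no"], name)
    for row in dataset
    for name, cell in row.items()
    if cell is None
]
print(missing)
```

Finding the missing cells like this is usually the first step before deciding how to handle them.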

From the data table above, we often want to infer a person's income from their country, gender, and age, so we can split the table into two tables:

No.  Country        Gender  Age
1    China          male    24
2    China          female  44
3    United States  male    28
4    Japan          male    34
5    China          male

No.  Income
1    3500
2    12500
3    25000
4    18000
5    17500

We expect to be able to infer the second table from the first. The data in the first table are called the independent variables, and the data in the second table are called the dependent variable.
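
In code, this split is just a matter of picking columns. The sketch below is one way to do it with plain Python lists; the names `X` and `y` are a common convention for independent and dependent variables, not something required by the data.

```python
# Each row: (country, gender, age, income). None marks the blank age.
rows = [
    ("China", "male", 24, 3500),
    ("China", "female", 44, 12500),
    ("United States", "male", 28, 25000),
    ("Japan", "male", 34, 18000),
    ("China", "male", None, 17500),
]

# Independent variables: everything except the last column.
X = [list(row[:-1]) for row in rows]

# Dependent variable: the last column (income).
y = [row[-1] for row in rows]

print(X)
print(y)
```

Row i of `X` and element i of `y` still belong to the same sample, so the two must always be kept in the same order.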

In practice, we also need to divide the data into two parts: one for training the model and the other for testing whether the model we have trained is accurate. So we can divide the data as follows:

No.  Country        Gender  Age
1    China          male    24
2    China          female  44
3    United States  male    28

No.  Country        Gender  Age
4    Japan          male    34
5    China          male

The first table, which we use to train the model, is called the training set, and the second table is called the test set.
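
The same split can be sketched in a few lines. The helper `split_train_test` is a hypothetical name for this example, and it simply cuts the data at a fixed position to mirror the tables above. In practice the rows are usually shuffled before splitting, which is left out here to keep the result easy to follow.

```python
features = [
    ["China", "male", 24],
    ["China", "female", 44],
    ["United States", "male", 28],
    ["Japan", "male", 34],
    ["China", "male", None],  # the blank age from the table
]
incomes = [3500, 12500, 25000, 18000, 17500]


def split_train_test(rows, train_size):
    """Return the first train_size rows and the remaining rows."""
    return rows[:train_size], rows[train_size:]


# Split features and incomes at the same position so samples stay aligned.
X_train, X_test = split_train_test(features, 3)
y_train, y_test = split_train_test(incomes, 3)

print(len(X_train), len(X_test))
```

The model only ever sees `X_train` and `y_train` during training; `X_test` and `y_test` are held back to check how well it does on data it has not seen.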

Next, we’ll talk about data preprocessing, another necessary step before machine learning.