This is the second day of my participation in Gwen Challenge
We have already set up the basic environment, so today we will look at some related concepts: the main types of machine learning and the basic terminology for data.
Types of machine learning
First, let’s look at the mainstream types of machine learning: supervised learning, unsupervised learning, reinforcement learning, and deep learning.
Supervised learning
Supervised learning means providing labeled data: each input comes with its expected output. The algorithm repeatedly adjusts the model against the labeled outputs until the model's predictions are close to the expected ones.
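As a minimal sketch of the idea, the snippet below fits a straight line to a handful of labeled pairs using ordinary least squares, which is only one of many supervised methods. The numbers and the names `inputs`, `expected_outputs`, `fit_line`, and `predict` are invented for illustration.

```python
# Labeled data: each input x comes with its expected output y.
# The numbers are invented for illustration (here y = 2x + 1 exactly).
inputs = [1.0, 2.0, 3.0, 4.0]
expected_outputs = [3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn the model parameters from the labeled data.
slope, intercept = fit_line(inputs, expected_outputs)


def predict(x):
    """Use the trained model on an input it has not seen before."""
    return slope * x + intercept


print(predict(5.0))
```

The model never sees the rule behind the data; it only sees inputs paired with expected outputs, and that pairing is what makes the learning supervised.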
Unsupervised learning
Unsupervised learning means the data provided is unlabeled, so the machine has to explore the data and discover potential patterns and relationships on its own.
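To make this concrete, here is a rough sketch that groups unlabeled numbers into two clusters with a tiny k-means loop (k = 2). Clustering is just one kind of unsupervised learning, and the data and the helper name `two_means` are assumptions made for this example.

```python
# Unlabeled data: only values, no expected outputs. Numbers are invented.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]


def two_means(points, iterations=10):
    """Split 1-D points into two groups with a minimal k-means (k = 2)."""
    # Simple, deterministic starting centers: the smallest and largest point.
    centers = [min(points), max(points)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            # Assign each point to the nearest center.
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


centers, groups = two_means(values)
print(centers)
print(groups)
```

Nobody told the algorithm which numbers belong together; the two groups emerge from the distances between the values alone.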
Reinforcement learning
Reinforcement learning is a style of learning driven by an incentive mechanism: if the machine acts correctly it receives a positive incentive (a reward), and if it acts incorrectly it receives a negative one. The goal is to find the behavior that maximizes the total incentive in a given scenario.
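The incentive loop can be sketched with a toy example. The environment below is hypothetical: there are two actions, and the rule that "right" is the correct one is invented for this sketch. The agent uses an epsilon-greedy strategy, which is one simple approach among many, to learn which action earns the positive incentive.

```python
import random

ACTIONS = ["left", "right"]
CORRECT_ACTION = "right"  # hypothetical environment rule, invented for this sketch


def reward_for(action):
    """Positive incentive for the correct action, negative otherwise."""
    return 1.0 if action == CORRECT_ACTION else -1.0


def run(steps=200, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    estimates = {a: 0.0 for a in ACTIONS}  # estimated reward per action
    counts = {a: 0 for a in ACTIONS}       # how often each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)              # explore: try something random
        else:
            action = max(ACTIONS, key=estimates.get)  # exploit: best action so far
        reward = reward_for(action)
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, total_reward


estimates, total_reward = run()
best_action = max(ACTIONS, key=estimates.get)
print(best_action, estimates)
```

The agent is never told which action is correct. It only observes the incentives its own actions produce and shifts toward the action with the higher estimate.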
Deep learning
Deep learning is a family of algorithms derived from neural networks. It uses artificial neural networks as the framework for representation learning on data.
Data and data sets
Machine learning requires data sets, so let’s take a look at the following table:
Serial number | Country | Gender | Age | Income |
---|---|---|---|---|
1 | China | male | 24 | 3500 |
2 | China | female | 44 | 12500 |
3 | The United States | male | 28 | 25000 |
4 | Japan | male | 34 | 18000 |
5 | China | male | – | 17500 |
In the data above, we call the entire table a dataset, each row a sample, each column a feature, and a specific value in a column an attribute value. The table may also contain blank cells. For example, the age in row 5 is blank, which is called missing data.
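These terms map naturally onto code. The sketch below stores the table as a plain Python list of dictionaries; the key names (`serial`, `country`, and so on) are chosen for this example, and `None` stands in for the blank age.

```python
# The dataset: the whole table, one dictionary per row.
dataset = [
    {"serial": 1, "country": "China", "gender": "male", "age": 24, "income": 3500},
    {"serial": 2, "country": "China", "gender": "female", "age": 44, "income": 12500},
    {"serial": 3, "country": "The United States", "gender": "male", "age": 28, "income": 25000},
    {"serial": 4, "country": "Japan", "gender": "male", "age": 34, "income": 18000},
    {"serial": 5, "country": "China", "gender": "male", "age": None, "income": 17500},
]

# A sample: one row of the table.
sample = dataset[0]

# A feature: one column of the table, across all samples.
age_feature = [row["age"] for row in dataset]

# An attribute value: one specific value in a column.
attribute_value = dataset[1]["income"]

# Missing data: rows in which at least one value is blank (None here).
rows_with_missing = [
    row["serial"] for row in dataset if any(v is None for v in row.values())
]

print(sample)
print(age_feature)
print(attribute_value)
print(rows_with_missing)
```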
With this table, we often want to infer people's income from their country, gender, and age, so we can divide the table into two tables:
Serial number | Country | Gender | Age |
---|---|---|---|
1 | China | male | 24 |
2 | China | female | 44 |
3 | The United States | male | 28 |
4 | Japan | male | 34 |
5 | China | male | – |
Serial number | Income |
---|---|
1 | 3500 |
2 | 12500 |
3 | 25000 |
4 | 18000 |
5 | 17500 |
We expect to be able to infer the second table from the first. The data in the first table are called independent variables, and the data in the second table are called dependent variables.
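In code, this division amounts to separating the input columns from the column we want to predict. The following is a small sketch in plain Python; the key names and the `FEATURE_KEYS` and `TARGET_KEY` constants are assumptions made for the example.

```python
dataset = [
    {"serial": 1, "country": "China", "gender": "male", "age": 24, "income": 3500},
    {"serial": 2, "country": "China", "gender": "female", "age": 44, "income": 12500},
    {"serial": 3, "country": "The United States", "gender": "male", "age": 28, "income": 25000},
    {"serial": 4, "country": "Japan", "gender": "male", "age": 34, "income": 18000},
    {"serial": 5, "country": "China", "gender": "male", "age": None, "income": 17500},
]

FEATURE_KEYS = ["country", "gender", "age"]  # independent variables
TARGET_KEY = "income"                        # dependent variable

# First table: the independent variables for each sample.
independent = [{key: row[key] for key in FEATURE_KEYS} for row in dataset]

# Second table: the dependent variable for each sample, in the same order.
dependent = [row[TARGET_KEY] for row in dataset]

print(independent[0])
print(dependent)
```

Both lists keep the original row order, so `independent[i]` and `dependent[i]` always describe the same person.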
In practice, we also need to divide the data into two parts: one for training the model and the other for testing whether the trained model is accurate. So we can divide the data into the following two parts:
Serial number | Country | Gender | Age |
---|---|---|---|
1 | China | male | 24 |
2 | China | female | 44 |
3 | The United States | male | 28 |
Serial number | Country | Gender | Age |
---|---|---|---|
4 | Japan | male | 34 |
5 | China | male | – |
The first table, which we use to train the model, is called the training set, and the second table is called the test set.
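The split shown in the tables can be sketched as a simple slice. The helper `split_dataset` is a hypothetical name for this example. It keeps the original row order to match the tables, whereas real projects usually shuffle the rows before splitting.

```python
rows = [
    {"serial": 1, "country": "China", "gender": "male", "age": 24},
    {"serial": 2, "country": "China", "gender": "female", "age": 44},
    {"serial": 3, "country": "The United States", "gender": "male", "age": 28},
    {"serial": 4, "country": "Japan", "gender": "male", "age": 34},
    {"serial": 5, "country": "China", "gender": "male", "age": None},
]


def split_dataset(samples, train_count):
    """Return (training set, test set): the first train_count rows, then the rest."""
    return samples[:train_count], samples[train_count:]


training_set, test_set = split_dataset(rows, train_count=3)

print([row["serial"] for row in training_set])
print([row["serial"] for row in test_set])
```

The test set is kept away from the model during training, so it can show how the model behaves on data it has never seen.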
Next, we’ll talk about data preprocessing, which is another necessary step before machine learning.