The previous article, Machine Learning, introduced the basics of machine learning and briefly explained what machine learning is. This article continues with more of the fundamentals. Machine learning relies on large amounts of data; data is its foundation. Let’s start by looking at some concepts related to data.
Understanding data in machine learning
Let’s take the iris dataset as an example to understand the basic concepts of data.
In the figure, the length and width of the petals and sepals are all features of the data.
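A minimal sketch of how such data is usually laid out in code: each sample is a row of feature values, and the labels are kept in a separate list in the same order. The four rows are taken from the classic iris dataset; the variable names are our own choice.

```python
# Feature names for the iris dataset (all measurements in cm)
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]

# Each row is one sample; each column is one feature
X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]

# The label (species) of each sample, in the same order as the rows of X
y = ["setosa", "setosa", "versicolor", "virginica"]

n_samples, n_features = len(X), len(X[0])
print(n_samples, n_features)  # 4 4
for row, label in zip(X, y):
    print(dict(zip(feature_names, row)), "->", label)
```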
Each sample is a point in the space formed by its features, called the feature space. The essence of classification is to partition the feature space: for example, the red region corresponds to the iris species plotted in red, and the blue region to the species plotted in blue. The example in the figure shows data in a two-dimensional space, and the same holds in higher-dimensional spaces. When it is inconvenient to reason about a problem in a high-dimensional space, we usually analyze it in a low-dimensional space and then generalize the results from low to high dimensions.
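Partitioning the feature space can be sketched with a linear boundary w·x + b = 0: a point is assigned to one class or the other depending on which side of the boundary it falls. The weights below are hypothetical, chosen only to show that the same function works unchanged in two dimensions and in the four dimensions of the iris features.

```python
def side_of_boundary(x, w, b):
    """Return 1 if point x lies on the positive side of the hyperplane w·x + b = 0, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

# 2-D: a hypothetical line x1 + x2 - 5 = 0 cuts the plane into two regions
w2, b2 = [1.0, 1.0], -5.0
print(side_of_boundary([1.0, 2.0], w2, b2))  # 0
print(side_of_boundary([4.0, 3.0], w2, b2))  # 1

# 4-D: the same function, with a hypothetical boundary "petal length = 2.5 cm"
w4, b4 = [0.0, 0.0, 1.0, 0.0], -2.5
print(side_of_boundary([5.1, 3.5, 1.4, 0.2], w4, b4))  # 0
print(side_of_boundary([7.0, 3.2, 4.7, 1.4], w4, b4))  # 1
```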
In the example above, the iris features have very clear meanings, but in machine learning features can be very abstract. In image recognition, for example, each pixel is a feature: a 28×28 image has 784 features, and a color image has even more (three color channels give 28×28×3 = 2352).
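The pixel-as-feature idea can be sketched by flattening a 2-D image into one long feature vector. The image here is a hypothetical all-black 28×28 grayscale grid with a single bright pixel.

```python
# A hypothetical 28x28 grayscale image: one brightness value (0-255) per pixel
height, width = 28, 28
image = [[0] * width for _ in range(height)]
image[14][14] = 255  # light up one pixel so the example is not all zeros

# Flattening turns the 2-D grid of pixels into a single feature vector
features = [pixel for row in image for pixel in row]
print(len(features))  # 784

# A color image stores three values (R, G, B) per pixel
channels = 3
print(height * width * channels)  # 2352
```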
Machine learning gives the machine as much information as possible and asks it to find the relationship between that information and the result we ultimately want. The features we give the machine largely determine the accuracy and reliability of the result the algorithm finally computes, and there is a dedicated area of study for this: feature engineering. Deep learning, for example, can be understood as an algorithm that performs feature engineering for us automatically.
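As a small, hypothetical illustration of hand-made feature engineering, we can derive new features from the raw iris measurements, such as petal area and the petal length-to-width ratio. Whether these derived features actually help depends on the data and the algorithm; they are only examples.

```python
def add_petal_features(sample):
    """Append two hand-crafted features to [sepal_len, sepal_wid, petal_len, petal_wid]."""
    sepal_len, sepal_wid, petal_len, petal_wid = sample
    petal_area = petal_len * petal_wid   # rough size of the petal
    petal_ratio = petal_len / petal_wid  # shape: long and thin vs. short and wide
    return sample + [petal_area, petal_ratio]

print(add_petal_features([7.0, 3.2, 4.7, 1.4]))
```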
Machine learning tasks
The basic tasks of supervised learning fall into two categories: classification tasks and regression tasks.
Classification task
Binary classification
Common binary classification problems include:
- Determining whether an email is spam
- Determining whether a patient’s tumor is benign or malignant
- Predicting whether a stock will rise or fall
- Determining whether a credit card user is a risk
Multi-class classification
Common multi-class classification problems include:
- Digit recognition
- Image recognition
- Assigning a risk rating to credit card customers
Many complex problems can also be turned into multi-class classification problems, such as playing 2048 or Go, or driving a self-driving car, where each step is a choice among a fixed set of moves or actions. Note, however, that being able to solve a problem this way does not mean it is the best way.
Some algorithms only support binary classification. However, a multi-class task can be converted into several binary tasks, and some algorithms handle multiple classes natively.
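One common way to convert a multi-class task into binary tasks is one-vs-rest: train one binary model per class ("this class" vs. "all other classes") and predict the class whose model is most confident. The sketch below assumes a deliberately simple, hypothetical binary scorer built from class centroids; any binary classifier could be plugged in instead.

```python
import math

def train_binary(X, y_bin):
    """Toy binary model (hypothetical): the centroids of the positive and negative samples."""
    def centroid(points):
        return [sum(col) / len(points) for col in zip(*points)]
    pos = [x for x, t in zip(X, y_bin) if t == 1]
    neg = [x for x, t in zip(X, y_bin) if t == 0]
    return centroid(pos), centroid(neg)

def binary_score(model, x):
    """Higher score means x looks more like the positive class."""
    c_pos, c_neg = model
    return math.dist(x, c_neg) - math.dist(x, c_pos)

def train_one_vs_rest(X, y):
    """Train one binary model per class: this class (1) vs. all other classes (0)."""
    return {c: train_binary(X, [1 if t == c else 0 for t in y]) for c in sorted(set(y))}

def predict_one_vs_rest(models, x):
    """Pick the class whose binary model gives the highest score."""
    return max(models, key=lambda c: binary_score(models[c], x))

# Made-up 2-D data with three classes
X = [[1, 1], [1, 2], [5, 5], [6, 5], [9, 1], [9, 2]]
y = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(X, y)
print(predict_one_vs_rest(models, [1.5, 1.5]))  # a
print(predict_one_vs_rest(models, [5.5, 5.0]))  # b
print(predict_one_vs_rest(models, [8.5, 1.5]))  # c
```

Three classes need three binary models here; the cost of one-vs-rest grows linearly with the number of classes.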
Multi-label task
For example, a single picture can belong to several categories at once, such as a photo that contains both a person and a car.
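A common way to represent multi-label targets is a 0/1 indicator vector with one position per possible label, so one sample can have several 1s at once. The label vocabulary below is hypothetical.

```python
LABELS = ["person", "car", "tree", "dog"]  # hypothetical label vocabulary

def to_indicator(label_set):
    """Encode a set of labels as a 0/1 vector (1 = label present)."""
    return [1 if name in label_set else 0 for name in LABELS]

def from_indicator(vector):
    """Decode a 0/1 vector back into the set of labels it marks."""
    return {name for name, bit in zip(LABELS, vector) if bit == 1}

v = to_indicator({"person", "car"})
print(v)  # [1, 1, 0, 0]
print(from_indicator(v))
```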
Regression task
A classification task ends with a category as its result, whereas a regression task produces a continuous numerical value rather than a category, such as a house price.
Examples:
- House prices
- Sales forecasts in market analysis
- Student grades
- Stock prices
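To make the contrast with classification concrete, here is a minimal ordinary least-squares fit of a straight line, price = a * area + b, on a few made-up house records. The data and units are invented for illustration; note that the prediction is a number, not a category.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    a = cov / var
    b = y_mean - a * x_mean
    return a, b

# Made-up data: floor area (m^2) and sale price (in thousands)
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

a, b = fit_line(areas, prices)
print(a, b)        # 3.0 0.0
print(a * 90 + b)  # 270.0: a continuous prediction for a 90 m^2 home
```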
Some algorithms can only solve regression problems; some can only solve classification problems; and some can solve both.
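k-nearest neighbors (kNN) is a classic example of an algorithm that handles both kinds of task: it finds the k training samples closest to a query, then takes a majority vote for classification or an average for regression. A minimal sketch on made-up one-feature data:

```python
import math
from collections import Counter

def k_nearest(X_train, x, k):
    """Indices of the k training samples closest to x (Euclidean distance)."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return order[:k]

def knn_classify(X_train, y_train, x, k=3):
    """Classification: majority vote among the labels of the k nearest samples."""
    votes = Counter(y_train[i] for i in k_nearest(X_train, x, k))
    return votes.most_common(1)[0][0]

def knn_regress(X_train, y_train, x, k=3):
    """Regression: average of the target values of the k nearest samples."""
    idx = k_nearest(X_train, x, k)
    return sum(y_train[i] for i in idx) / k

# Made-up data: the same points carry a class label and a numeric target
X_train = [[1], [2], [3], [10], [11], [12]]
y_class = ["small"] * 3 + ["large"] * 3
y_value = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_classify(X_train, y_class, [2.5]))  # small
print(knn_regress(X_train, y_value, [2.5]))   # 2.0
```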
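In some cases, a regression task can be simplified into a classification task. For example, the steering wheel angle in self-driving can be divided into a fixed set of angle ranges, and the model then predicts a range instead of an exact value.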
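The discretization step can be sketched as follows. The angle range (-30° to 30°) and the 10° bin width are hypothetical choices; a classifier would predict the bin index, and the bin center serves as the final angle.

```python
import math

def angle_to_class(angle, min_angle=-30.0, max_angle=30.0, bin_width=10.0):
    """Map a continuous steering angle (degrees) to a discrete class index."""
    n_bins = int((max_angle - min_angle) // bin_width)
    index = math.floor((angle - min_angle) / bin_width)
    return min(max(index, 0), n_bins - 1)  # clamp out-of-range angles into the edge bins

def class_to_angle(index, min_angle=-30.0, bin_width=10.0):
    """Turn a predicted class back into an angle: the center of its bin."""
    return min_angle + (index + 0.5) * bin_width

print(angle_to_class(5.0))  # 3
print(class_to_angle(3))    # 5.0
```

The trade-off is resolution: every angle inside a bin is reported as the same value.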
Classification and regression, as described above, categorize machine learning by the kind of problem it solves, not by the algorithms themselves. We also mentioned supervised learning above: what exactly is it? And if we look at machine learning algorithms themselves, what categories can they be divided into?
Categories of machine learning
Machine learning can be divided into supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning.
Supervised learning
Supervised learning means that the training data we give the machine has labels, or "answers". For example, the iris dataset and the housing price data mentioned above both come with labels.
There are many examples of supervised learning in real life, such as:
- Images that already carry label information
- A bank that has accumulated customer information along with each customer’s credit card credit history
- A hospital that has accumulated patient information along with whether each patient was eventually diagnosed with the disease
- A housing market that has accumulated basic information about homes along with the price each home finally sold for
Many machine learning algorithms are supervised learning algorithms, such as:
- k-nearest neighbors (kNN)
- Linear regression and polynomial regression
- Logistic regression
- Support vector machines (SVM)
- Decision trees and random forests
Unsupervised learning
The training data given to the machine has no labels or answers.
Unsupervised learning is mainly used to group unlabeled data, for example through cluster analysis.
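k-means is one widely used clustering algorithm. The sketch below is a plain version with a naive initialization (the first k points), run on made-up unlabeled points; practical implementations use smarter initialization and multiple restarts.

```python
import math

def kmeans(points, k, n_iter=100):
    """Plain k-means: alternate assigning points to the nearest centroid and moving centroids."""
    centroids = [list(p) for p in points[:k]]  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids

# Made-up unlabeled points: coordinates only, no answers
points = [[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1]
```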
Semi-supervised learning
Some of the training data given to the machine has labels or answers, and some does not.
This is very common in practice, for example when labels are missing for various reasons.
A typical approach is to process the data with unsupervised learning first, and then use supervised learning to train a model and make predictions.
Reinforcement learning
The machine (the agent) takes actions based on its environment and learns from the results of those actions. As the following figure shows, the final behavior is determined by the environment. Common examples include Go, self-driving cars and robots.
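The act-observe-learn loop can be sketched with a multi-armed bandit, the simplest, stateless special case of reinforcement learning (full reinforcement learning also involves states and delayed rewards). The rewards below are made up and deterministic; the agent uses epsilon-greedy exploration with optimistic initial estimates so every action gets tried early.

```python
import random

def run_bandit(true_rewards, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: act, observe the reward, update its estimate of each action."""
    rng = random.Random(seed)
    n_arms = len(true_rewards)
    estimates = [1.0] * n_arms  # optimistic start so every action gets tried early
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit the best estimate
        reward = true_rewards[arm]  # the environment's response to the action
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental average
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.8, 0.5])
best = max(range(3), key=lambda a: estimates[a])
print(best, counts)
```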
Supervised learning and semi-supervised learning are the basis of reinforcement learning
Other categories of machine learning
- Online learning and batch learning (offline learning)
- Parametric learning and nonparametric learning
Batch learning
The batch learning process is shown below
Advantage: simplicity
Problem: how to adapt to a changing environment? Solution: periodically retrain on a new batch of data
Disadvantage: each retraining pass processes the whole batch again, which takes a huge amount of computation; in some cases the environment changes so quickly that periodic batch retraining is not even possible.
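A minimal sketch of batch learning with periodic retraining, using a hypothetical one-feature model y = w * x fitted by least squares through the origin. Each retraining pass rereads every sample collected so far, which is why the cost grows with the data.

```python
def fit_batch(xs, ys):
    """Least-squares slope for y = w * x, computed from the full dataset."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

history_x, history_y = [], []
incoming_batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0], [6.5])]  # made-up data arriving over time

for new_x, new_y in incoming_batches:
    history_x += new_x
    history_y += new_y
    w = fit_batch(history_x, history_y)  # retrain from scratch on all data seen so far
    print(len(history_x), w)
```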
Online learning
The online learning process is shown below
Advantage: responds promptly to changes in the environment
Problem: what if new data brings harmful changes? Solution: strengthen monitoring of the incoming data
Other: online learning also suits settings where the volume of data is so large that batch learning is impossible
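For contrast, here is an online version of the same hypothetical y = w * x model: it keeps two running sums and updates them with each new sample, so the work per sample is constant and the raw sample can be thrown away. This model happens to allow an exact incremental update; many online learners instead take a small gradient step per sample.

```python
class OnlineSlope:
    """Online least-squares fit of y = w * x: O(1) work per new sample."""

    def __init__(self):
        self.sxy = 0.0  # running sum of x * y
        self.sxx = 0.0  # running sum of x * x

    def update(self, x, y):
        """Absorb one new sample; the raw sample is not stored."""
        self.sxy += x * y
        self.sxx += x * x

    @property
    def w(self):
        return self.sxy / self.sxx if self.sxx else 0.0

model = OnlineSlope()
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.5)]:  # made-up stream of samples
    model.update(x, y)
    print(model.w)
```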
Parametric learning
Once the parameters have been learned, the raw data is no longer needed.
Nonparametric learning
- Makes few assumptions about the model
- Nonparametric does not mean there are no parameters
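The difference shows up in what each model keeps after training. In this hypothetical sketch, a parametric line y = a * x + b stores only two numbers, while a nonparametric 1-nearest-neighbor regressor must keep the whole training set to make predictions.

```python
class LineModel:
    """Parametric: after fitting, only (a, b) are stored."""

    def fit(self, xs, ys):
        n = len(xs)
        x_mean, y_mean = sum(xs) / n, sum(ys) / n
        cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        var = sum((x - x_mean) ** 2 for x in xs)
        a = cov / var
        self.params = (a, y_mean - a * x_mean)
        return self

    def predict(self, x):
        a, b = self.params
        return a * x + b

class NearestNeighborModel:
    """Nonparametric: the training data itself is the model."""

    def fit(self, xs, ys):
        self.data = list(zip(xs, ys))
        return self

    def predict(self, x):
        _, y = min(self.data, key=lambda pair: abs(pair[0] - x))
        return y

xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]  # made-up data
line = LineModel().fit(xs, ys)
nn = NearestNeighborModel().fit(xs, ys)
print(len(line.params), len(nn.data))  # 2 4
print(line.predict(2.5), nn.predict(2.6))
```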
Reflections on machine learning
Machine learning focuses on problems whose answers are uncertain. Classical algorithms usually produce a standard, deterministic and unique answer; machine learning instead gives probabilistic answers that are meaningful in a statistical sense but not certain. Faced with such an answer, we need to ask how reliable it is, how far we can trust it, and what the machine has actually learned.
In fact, if there is enough data and its quality is good enough, the choice of algorithm matters less; there is even the view that data is the algorithm:
- The data itself is important
- Data-driven work: data cleaning, processing, feature engineering, etc.
Of course, there are opposing views as well, such as "the algorithm is king".
How to choose a machine learning algorithm?
As mentioned above, machine learning mainly solves two kinds of problems: classification and regression. When choosing an algorithm, Occam’s razor applies: other things being equal, simpler is better.
No free lunch theorem
- It can be proven rigorously that, averaged over all possible problems, any two algorithms have the same expected performance.
- When it comes to a particular problem, one algorithm might be better.
- No algorithm is absolutely better than another algorithm.
- It doesn’t make sense to talk about which algorithm is good apart from the specific problem.
- For a specific problem, try several algorithms and compare them experimentally.
That concludes this general overview of machine learning. I hope it has given you a clearer understanding of the field; the following articles will continue with specific machine learning algorithms.
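A comparative experiment can be sketched like this: run several candidate algorithms on the same held-out test data and compare a metric such as accuracy. The toy data is made up, with one class-"a" outlier at (4, 4). On this particular data the simpler nearest-centroid rule beats 1-nearest-neighbor; on other data the ranking could reverse. A real comparison would use a proper train/test split or cross-validation on real data.

```python
import math

def nearest_centroid(X_train, y_train, x):
    """Predict the class whose mean point is closest to x."""
    centroids = {}
    for label in set(y_train):
        members = [p for p, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return min(centroids, key=lambda c: math.dist(centroids[c], x))

def one_nearest_neighbor(X_train, y_train, x):
    """Predict the label of the single closest training sample."""
    i = min(range(len(X_train)), key=lambda j: math.dist(X_train[j], x))
    return y_train[i]

def accuracy(predict, X_train, y_train, X_test, y_test):
    """Fraction of test samples the given algorithm labels correctly."""
    hits = sum(predict(X_train, y_train, x) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)

# Made-up toy data: class "a" contains one outlier at (4, 4)
X_train = [[0, 0], [0, 1], [1, 0], [4, 4], [5, 5], [5, 6], [6, 5]]
y_train = ["a", "a", "a", "a", "b", "b", "b"]
X_test = [[1, 1], [4.2, 4.2]]
y_test = ["a", "b"]

results = {
    "nearest centroid": accuracy(nearest_centroid, X_train, y_train, X_test, y_test),
    "1-nearest neighbor": accuracy(one_nearest_neighbor, X_train, y_train, X_test, y_test),
}
print(results)
```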