This article contains about 2,149 words and should take roughly 4 minutes to read.



Image credit: Unsplash, by Franck V.

There are three main types of tasks in machine learning: supervised, semi-supervised, and unsupervised.

The main difference between the three types lies in the availability of labeled data, that is, data for which the correct output for a given input is known in advance.

The goal of supervised learning is to learn a function that, given sample inputs and their known output values, fits the relationship between input and output as well as possible.

Semi-supervised learning aims to label unlabeled data points using what is learned from a small number of labeled data points.

Unsupervised learning has no labeled output values at all, so its goal is to infer the internal structure of a set of data samples.

Supervised learning

Supervised learning models learn a mapping from inputs to outputs.

Supervised learning is usually framed as a classification task, where we map an input to a discrete output label, or as a regression task, where we map an input to a continuous output. Common supervised learning algorithms include logistic regression, naive Bayes, artificial neural networks, support vector machines, and random forests. In both regression and classification, the goal is to find relationships or structure in the input data that allow the model to produce the correct output.
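To make this concrete, here is a minimal sketch of a supervised classification task, assuming scikit-learn is available; the dataset is synthetic and purely illustrative.

```python
# Minimal supervised-classification sketch (assumes scikit-learn; synthetic data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Labeled data: every input X[i] comes with a known output y[i].
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit the input-to-output mapping on the training split.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Check how well the learned mapping generalizes to held-out data.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```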

Note that the “correct” output is determined entirely by the training data, so even though we assume the training data reflects the real world, that does not guarantee the labels are accurate. Noisy or incorrectly labeled data can significantly reduce the effectiveness of the model.

Model complexity

Model complexity refers to the complexity of the function you are trying to learn, similar to the degree of a polynomial. The appropriate level of model complexity usually depends on the nature of the training data.

If there is a small amount of data, or if the data is not evenly distributed across different scenarios, a low-complexity model should be used. This is because using a highly complex model over a small number of data points can lead to overfitting.

Overfitting occurs when the learned function matches the training data very well but fails to generalize to other data. In other words, the model effectively memorizes the training data without capturing the real trends and structure that produced the outputs. Imagine fitting a curve to two points: in theory you could use a function of any degree, but in practice a linear function suffices, and anything more complex only adds unnecessary flexibility.
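As a rough illustration of this idea (a sketch using NumPy on a made-up, roughly linear dataset, not taken from the original article), a polynomial whose degree matches the number of points reproduces the training data exactly but extrapolates badly, while a simple linear fit captures the trend:

```python
# Overfitting sketch (assumes NumPy; data is made up and roughly linear).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.shape)

linear_fit = np.polyfit(x, y, deg=1)   # low-complexity model
wiggly_fit = np.polyfit(x, y, deg=7)   # degree matches the number of points

x_new = 1.5  # a point outside the training range
print("linear prediction:", np.polyval(linear_fit, x_new))    # close to 2*1.5 + 1 = 4
print("degree-7 prediction:", np.polyval(wiggly_fit, x_new))  # typically far off
```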

Bias-variance trade-off

The bias-variance trade-off also relates to model generalization. In any model there is a balance between bias (the constant component of error) and variance (how much the error changes across different training sets). For example, a high-bias, low-variance model might consistently err around 20%, while a low-bias, high-variance model might err anywhere from 5% to 50% depending on the training data.
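One standard way to formalize this trade-off (the notation below is the usual textbook convention, not from the original article) is the decomposition of the expected squared prediction error of a learned function f-hat approximating a true function f, with irreducible noise variance sigma squared:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
  + \sigma^2
```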

Note that bias and variance typically move in opposite directions: increasing bias usually decreases variance, and vice versa. When building a model, the specific problem and the nature of the data should guide where along the bias-variance trade-off you choose to sit.

In general, increasing bias (and decreasing variance) produces a model with relatively stable baseline performance, which may be critical in some tasks. In addition, the variance of the model should be matched to the size and complexity of the training data so that the model generalizes well: small, simple data sets are usually better learned with low-variance models, while large, complex data sets often call for higher-variance models.

Semi-supervised learning

Semi-supervised models learn from both labeled and unlabeled data points.

Semi-supervised learning sits between supervised and unsupervised learning. Semi-supervised models are designed to use a small amount of labeled training data together with a large amount of unlabeled training data. They are usually applied when labeling data is expensive or when there is a continuous stream of incoming data.

For example, suppose the goal is to detect inappropriate messages on a social network. There are far too many messages to label each one manually, and doing so would be too costly. Instead, a small portion of the messages can be labeled by hand, and semi-supervised techniques can use this small labeled set to help make sense of the rest.

Common semi-supervised learning methods include transductive support vector machines and graph-based methods such as label propagation.
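As a concrete sketch (assuming scikit-learn, whose semi-supervised estimators mark unlabeled points with -1; the data is synthetic), label propagation can infer labels for the points that were never labeled:

```python
# Semi-supervised label propagation sketch (assumes scikit-learn; synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelPropagation

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Pretend labeling is expensive: keep labels for only ~10% of the points,
# and mark the rest as unlabeled with -1 (scikit-learn's convention).
rng = np.random.default_rng(0)
y_partial = y.copy()
unlabeled = rng.random(len(y)) > 0.10
y_partial[unlabeled] = -1

# Propagate the few known labels through a neighborhood graph over all points.
model = LabelPropagation(kernel="knn", n_neighbors=10)
model.fit(X, y_partial)

# Compare inferred labels with the hidden ground truth (illustration only).
accuracy = (model.transduction_[unlabeled] == y[unlabeled]).mean()
print("accuracy on originally unlabeled points:", accuracy)
```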

Assumptions

Semi-supervised approaches must make some assumptions about the data in order to draw conclusions about unlabeled points from a small set of labeled ones. These assumptions generally fall into three categories.

1. Smoothness assumption – points that are close to each other are more likely to share the same label.

2. Cluster assumption – the data naturally forms discrete clusters, and points in the same cluster tend to share the same label.

3. Manifold assumption – the data lies approximately on a manifold of much lower dimension than the input space. This assumption is particularly useful when an unobservable or hard-to-observe system with only a few parameters produces high-dimensional observable outputs.

Unsupervised learning

Unsupervised models discover the internal structure of data.

The most common tasks in unsupervised learning are clustering, representation learning, and density estimation. In all of these, we want to understand the internal structure of the data without being given explicit labels. Common algorithms include k-means clustering, principal component analysis, and autoencoders. Since no labels are provided, there is no single standard way to compare the performance of most unsupervised learning methods.
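For instance, here is a minimal k-means sketch (assuming scikit-learn; the blob data is synthetic), where the model is given only inputs and must discover the groups itself:

```python
# Unsupervised clustering sketch with k-means (assumes scikit-learn; synthetic data).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Unlabeled data drawn from a few natural groups; the true labels are discarded.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Ask k-means to find 3 clusters using only the structure of X.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```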

Exploratory Data Analysis (EDA)

Unsupervised learning is useful in exploratory analysis because it can automatically identify structure in data. For example, if an analyst is trying to segment consumers, unsupervised clustering methods are a good starting point. In situations where it is impossible or impractical for a person to spot trends in the data by hand, unsupervised learning can provide an initial basis for forming and testing hypotheses.

Dimensionality reduction

Dimensionality reduction means representing the data with fewer features, and it can be done with unsupervised methods. In representation learning, we want to understand the relationships between the original features so that the data can be expressed in terms of the latent features that underlie them. This sparse latent structure usually uses far fewer features than the original representation, which makes the data denser and removes redundancy. In other cases, dimensionality reduction can be used to convert data from one modality to another; for example, a variable-length sequence can be converted to a fixed-length representation using a recurrent autoencoder.
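As a simple sketch of this (assuming scikit-learn and its bundled digits dataset), PCA can compress 64-dimensional digit images down to a handful of latent features:

```python
# Dimensionality-reduction sketch with PCA (assumes scikit-learn's digits dataset).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 1797 samples, 64 pixel features each

# Represent each image with 2 latent features instead of the original 64.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print("original shape:", X.shape)
print("reduced shape:", X_reduced.shape)
print("variance explained:", pca.explained_variance_ratio_.sum())
```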
