"One day, artificial intelligence will look at us the way we look at the fossils of earlier life on the plains of Africa. In the eyes of AI, humans are just upright apes using crude language and rudimentary tools, doomed to extinction from birth." -- Ex Machina
Machine learning is a popular subfield of artificial intelligence that covers a wide range of domains. One reason for its popularity is the comprehensive toolbox of algorithms, techniques, and methodologies it offers. This toolbox has been developed and refined over many years, and new tools are continuously being researched. To make the best use of the machine learning toolkit, we need to understand the following ways of categorizing machine learning algorithms.
Based on the amount of human supervision involved, the categories are as follows.
- Supervised learning. This category relies heavily on human supervision. Supervised learning algorithms learn a mapping from training inputs to their corresponding outputs and apply that mapping to data never seen before. Classification and regression are the two main types of supervised learning tasks.
- Unsupervised learning. These algorithms attempt to learn the underlying structures, patterns, and relationships inherent in the input data without any associated outputs or labels (that is, without human supervision). Clustering, dimensionality reduction, and association rule mining are the main types of unsupervised learning algorithms.
- Semi-supervised learning. These algorithms are a hybrid of supervised and unsupervised approaches. They use a small amount of labeled training data together with a larger amount of unlabeled data, so they must combine supervised and unsupervised methods creatively to solve specific problems.
- Reinforcement learning. This kind of algorithm differs somewhat from supervised and unsupervised learning. The central entity of a reinforcement learning algorithm is an agent that interacts with an environment during training to maximize a reward. The agent learns iteratively, adjusting its strategy based on the rewards or penalties it receives from interacting with the environment.
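The agent-environment reward loop described above can be illustrated with a minimal epsilon-greedy multi-armed bandit sketch. The two arms, their payout rates, and the exploration rate here are all illustrative assumptions, not anything from the text:

```python
import random

random.seed(42)

# Hypothetical environment: two slot-machine arms with unknown payout rates.
TRUE_PAYOUT = [0.3, 0.7]  # probability each arm yields a reward of 1

def pull(arm):
    """Environment step: return a stochastic reward for the chosen arm."""
    return 1 if random.random() < TRUE_PAYOUT[arm] else 0

# Agent state: estimated value and pull count per arm.
values, counts = [0.0, 0.0], [0, 0]
epsilon = 0.1  # exploration rate (illustrative)

for _ in range(2000):
    # Explore with probability epsilon, otherwise exploit the best estimate.
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = values.index(max(values))
    reward = pull(arm)
    counts[arm] += 1
    # Incremental mean update of the arm's estimated reward.
    values[arm] += (reward - values[arm]) / counts[arm]

print(values)  # the estimates approach the true payout rates
```

The agent never sees the true payout rates; it adjusts its strategy purely from the rewards it receives, which is the defining trait of reinforcement learning.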
Based on data availability, the classification is as follows.
- Batch learning. Also known as offline learning, these algorithms are used when all the required training data is available up front; the model is trained and fine-tuned before being deployed to a production environment or the real world.
- Online learning. As the name suggests, learning does not stop as long as data keeps arriving. Data is fed into the system in small batches, and each new round of training uses the newly arrived batch.
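Online learning can be sketched as a stochastic-gradient update that consumes one mini-batch at a time and never revisits old data. The linear model, learning rate, and synthetic data stream below are all assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stream: y = 3*x + 1 plus noise, arriving in small batches.
true_w, true_b = 3.0, 1.0
w, b = 0.0, 0.0  # model parameters, updated as data arrives
lr = 0.05        # learning rate (illustrative)

for step in range(300):
    x = rng.uniform(-1, 1, size=8)  # next mini-batch of inputs
    y = true_w * x + true_b + rng.normal(0, 0.1, size=8)
    pred = w * x + b
    err = pred - y
    # Gradient step on squared error; old batches are never seen again.
    w -= lr * (err * x).mean()
    b -= lr * err.mean()

print(round(w, 2), round(b, 2))  # converges near 3.0 and 1.0
```

Because each update uses only the current batch, the model can keep learning indefinitely as data streams in, which is exactly the contrast with batch learning drawn above.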
The classification schemes discussed above give us an abstract understanding of how machine learning algorithms are organized, understood, and utilized. The most common way to categorize them is into supervised and unsupervised learning algorithms. Let’s discuss these two categories in more detail, as doing so will prepare us for the more advanced topics covered later.
1.2.1 Supervised learning
Supervised learning algorithms are a class of algorithms that use data samples (also known as training samples) and their corresponding outputs (or labels) to infer a mapping function between the two. The inferred mapping function, or learned function, is the output of this training process. The learned function’s performance is then tested on how correctly it maps new, never-before-seen data points (that is, input elements).
Several key concepts in supervised learning algorithms are described below.
- Training dataset. The training samples and corresponding outputs used during training are called the training data. Formally, a training dataset is a set of two-element tuples, each consisting of an input element (usually a vector) and its corresponding output element or signal.
- Test dataset. A never-before-seen dataset used to evaluate the performance of the learned function. It is likewise a set of two-element tuples containing input data points and corresponding output signals. Data points from this set are not used during the training phase (the data is also further divided to produce a validation set, which we will discuss in detail in a subsequent section).
- Learned function. This is the output of the training phase, also known as the inferred function or model. It is inferred from the training instances (input data points and their corresponding outputs) in the training dataset. The mappings learned by an ideal model or learned function also generalize to never-before-seen data.
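The relationship between the training and test datasets described above can be sketched in a few lines of NumPy. The 80/20 split ratio and the synthetic data are assumptions for illustration, not a rule from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic dataset: 100 (input vector, output label) pairs.
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)

# Shuffle indices, then hold out 20% as the never-seen test set.
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
train_idx, test_idx = idx[:cut], idx[cut:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(X_train), len(X_test))  # 80 20
```

The key property is that the two index sets are disjoint: the test tuples never participate in training, so performance on them measures generalization.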
Many supervised learning algorithms are available. Based on usage requirements, they can be divided into classification models and regression models.
1. Classification model
In the simplest terms, classification algorithms help us answer objective questions or make yes-or-no predictions. For example, they are useful in scenarios such as “Is it going to rain today?” or “Could this tumor be cancerous?” and so on.
Formally, the key goal of a classification algorithm is to predict output labels that are inherently categorical, based on the input data points. The output labels are categorical in nature; that is, each belongs to a discrete class or category.
Logistic regression, Support Vector Machines (SVM), neural networks, random forests, K-Nearest Neighbours (KNN), and decision trees are among the popular classification algorithms.
Suppose we have a real-world use case of evaluating different car models. For simplicity, assume the model is expected to predict, for each car model, whether it is acceptable or unacceptable, based on multiple input training samples. The attributes of each input training sample include purchase price, number of doors, capacity (in number of people), and safety rating.
Apart from the class label, which indicates whether each data point is acceptable, the remaining attributes describe the car itself. Figure 1.3 depicts this binary classification problem. The classification algorithm takes the training samples as input to produce a supervised model, which is then used to predict the evaluation label for a new data point.
Figure 1.3
In classification problems, since the output labels are discrete classes, the task is called a binary classification problem if there are only two possible output classes, and a multiclass classification problem otherwise. For example, predicting whether it will rain tomorrow is a binary classification problem (the output is yes or no), while predicting a digit from a scanned handwritten image is a multiclass classification problem with 10 labels (possible output labels range from 0 to 9).
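As an illustrative sketch of binary classification (not the car-evaluation model from the text), here is a minimal nearest-centroid classifier in plain NumPy; the two synthetic classes and their centers are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Two synthetic classes of 2D points centered at (-2, -2) and (2, 2).
X0 = rng.normal(loc=-2, scale=0.5, size=(50, 2))
X1 = rng.normal(loc=2, scale=0.5, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# "Training": compute one centroid per class from the labeled data.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(points):
    # Assign each point the label of its nearest class centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

accuracy = (predict(X) == y).mean()
print(accuracy)
```

The same structure extends to multiclass problems simply by computing one centroid per class, mirroring the binary/multiclass distinction above.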
2. Regression models
These supervised learning algorithms help us answer quantitative questions like “how much?” Formally, the key goal of regression models is to estimate values. In this type of problem, the output labels are continuous values (rather than the discrete outputs of classification problems).
In regression problems, the input data points are called independent or explanatory variables, and the output is called the dependent variable. Regression models are also trained using training data samples consisting of input (or independent variable) data points and output (or dependent variable) signals. Linear regression, multiple regression, regression tree and other algorithms are supervised regression algorithms.
Regression models can be further classified based on their models of the relationship between dependent and independent variables.
Simple linear regression models suit problems involving a single independent variable and a single dependent variable. Ordinary Least Squares (OLS) regression is a popular linear regression model. Multiple regression, or multivariate regression, addresses problems with a single dependent variable where each observation is a vector of multiple explanatory variables.
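OLS has a closed-form solution that can be sketched on synthetic data; the true coefficients (slope 2, intercept 5) and the noise level below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 2*x + 5 plus Gaussian noise.
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 5.0 + rng.normal(0, 0.5, size=200)

# Design matrix with an intercept column; lstsq solves the least-squares
# problem, equivalent to the normal equations beta = (X^T X)^-1 X^T y.
X = np.column_stack([x, np.ones_like(x)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

slope, intercept = beta
print(round(slope, 2), round(intercept, 2))  # close to 2.0 and 5.0
```

Using `lstsq` rather than explicitly inverting the matrix is the numerically stable way to solve the same OLS problem.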
Polynomial regression models are a special form of multiple regression in which the dependent variable is modeled using the nth powers of the independent variable. Polynomial regression models are also called nonlinear regression models because they can fit or map nonlinear relationships between the dependent and independent variables.
Figure 1.4 shows an example of linear regression.
Figure 1.4
To understand the different types of regression, consider a real-world use case of estimating the distance a car travels (units omitted) based on its speed. In this problem, based on the available training data, we can model the distance as either a linear function or a polynomial function of the car’s speed (units omitted). Remember, the main goal is to minimize error without overfitting the training data itself.
Figure 1.4 above depicts a linear fit, while Figure 1.5 depicts a polynomial fit on the same dataset.
Figure 1.5
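A linear and a polynomial fit on the same data can be compared numerically with `np.polyfit`. The quadratic speed/distance relationship in this sketch is a synthetic assumption, not the book's actual dataset:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic speed/distance data with a quadratic relationship plus noise.
speed = np.linspace(5, 25, 40)
dist = 0.05 * speed**2 + 1.5 * speed + rng.normal(0, 2, size=40)

# Fit a straight line (degree 1) and a polynomial (degree 2) to the same data.
lin_coef = np.polyfit(speed, dist, deg=1)
poly_coef = np.polyfit(speed, dist, deg=2)

def mse(coef):
    # Mean squared error of the fitted curve on the training points.
    pred = np.polyval(coef, speed)
    return ((pred - dist) ** 2).mean()

print(mse(lin_coef), mse(poly_coef))  # the quadratic fit has lower error
```

A higher-degree fit always reduces training error, which is why the goal stated above is to minimize error without overfitting: the right degree is chosen by performance on held-out data, not the training set.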
1.2.2 Unsupervised learning
As the name implies, unsupervised learning algorithms learn or infer concepts without supervision. Supervised learning algorithms infer mapping functions based on the training data set composed of input data points and output signals, while the task of unsupervised learning algorithms is to find patterns and relationships in the training data set without any output signals. Such algorithms use input data sets to detect patterns, mine rules, or group/cluster data points to extract meaningful insights from the original input data set.
Unsupervised learning algorithms come in handy when we don’t have a training set with corresponding output signals or labels. In many real-world scenarios, datasets are available without output signals, and labeling them manually is impractical. Unsupervised learning algorithms help fill these gaps.
Similar to supervised learning algorithms, unsupervised learning algorithms can also be classified for ease of understanding and learning. Here are the different classes of unsupervised learning algorithms.
1. Clustering
Clustering is the unsupervised counterpart of the classification problem. Clustering algorithms let us group data points into different groups or categories without any output labels in the input or training dataset. They try to find patterns and relationships in the input dataset, grouping data points by some similarity measure based on their inherent characteristics.
A useful real-world example of clustering is grouping news articles. Hundreds of news stories are produced every day, each on a topic such as politics, sports, or entertainment. Clustering offers an unsupervised way to group these articles, as shown in Figure 1.6.
There are several ways to perform the clustering process, the most popular of which include the following.
- Centroid-based methods. For example, the popular K-means and K-medoids algorithms.
- Agglomerative and divisive hierarchical clustering methods. For example, the popular Ward and affinity propagation algorithms.
- Distribution-based methods. For example, Gaussian mixture models.
- Density-based approach. For example, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), etc.
Figure 1.6
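The centroid-based approach listed above can be sketched as a minimal K-means (Lloyd's algorithm) in plain NumPy; the two synthetic blobs and k=2 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic clusters of 2D points, with no labels provided.
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.4, size=(60, 2)),
    rng.normal(loc=[4, 4], scale=0.4, size=(60, 2)),
])

k = 2
centers = X[rng.choice(len(X), size=k, replace=False)]  # random init

for _ in range(10):  # a few Lloyd iterations suffice here
    # Assignment step: each point joins its nearest center's cluster.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each center moves to the mean of its cluster
    # (keeping the old center if a cluster ever becomes empty).
    centers = np.stack([
        X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
        for c in range(k)
    ])

print(np.round(centers, 1))  # centers near (0, 0) and (4, 4), in some order
```

Note that the algorithm recovers the two groups purely from the similarity (distance) structure of the inputs; no output labels are ever used.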
2. Dimensionality reduction
Data and machine learning are best friends, but more and bigger data brings plenty of problems. A large number of attributes, or a bloated feature space, is a common one. A large feature space creates problems not only for data analysis and visualization but also for training, memory, and storage. This phenomenon is known as the curse of dimensionality. Because unsupervised methods can help us extract insights and patterns from unlabeled training datasets, they are useful for reducing dimensionality.
In other words, the unsupervised approach helps us reduce the feature space by selecting a representative set of features from a complete list of available features, as shown in Figure 1.7.
Figure 1.7
Principal Component Analysis (PCA), nearest-neighbor analysis, and discriminant analysis are commonly used dimensionality reduction techniques.
Figure 1.7 is a famous illustration of how PCA-based dimensionality reduction works. The left side of the figure shows a dataset shaped like a Swiss roll in 3D space, and the right side shows the result of using PCA to transform the data into 2D space.
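PCA's core operation, projecting centered data onto its top principal components, can be sketched via SVD. The synthetic 3D data below (which mostly varies along two directions) is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic 3D data whose variance lies mostly in a 2D subspace.
latent = rng.normal(size=(200, 2))
mix = np.array([[1.0, 0.0, 0.5],
                [0.0, 1.0, 0.5]])
X = latent @ mix + rng.normal(0, 0.05, size=(200, 3))

# Center the data, then take the top-2 right singular vectors as components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_2d = Xc @ Vt[:2].T  # project the 3D points into a 2D subspace

# Fraction of total variance retained by the top 2 components.
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
print(X_2d.shape, round(explained, 3))
```

The `explained` ratio is how PCA quantifies that the reduced representation keeps most of the dataset's structure, which is exactly what Figure 1.7 depicts geometrically.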
3. Association rule mining
These unsupervised machine learning algorithms help us understand and extract patterns from transactional datasets. Powering what is known as Market Basket Analysis (MBA), they help us identify interesting relationships between items in transactions.
Using association rule mining, we can answer questions like “Which items are purchased together in a particular store?” or “Do people who buy wine also buy cheese?” FP-growth, ECLAT, and Apriori are some widely used association rule mining algorithms.
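The core computation behind these algorithms, counting itemset support and rule confidence, can be sketched in plain Python. The tiny transaction list below is made up for illustration:

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions from a small store.
transactions = [
    {"wine", "cheese", "bread"},
    {"wine", "cheese"},
    {"bread", "milk"},
    {"wine", "cheese", "milk"},
    {"bread", "butter"},
]

# Support of an itemset = fraction of transactions containing all its items.
def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Count every item pair's co-occurrences (the heart of market basket analysis).
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

# Confidence of the rule {wine} -> {cheese}: P(cheese | wine).
conf = support({"wine", "cheese"}) / support({"wine"})
print(pair_counts.most_common(1), round(conf, 2))
```

Algorithms like Apriori and FP-growth compute exactly these support counts, but use pruning and compact data structures to avoid enumerating every itemset on large datasets.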
4. Anomaly detection
Anomaly detection, also known as outlier detection, is the task of identifying rare events or observations based on historical data. Anomalies, or outliers, are usually characterized by their infrequency or by sudden bursts over a short period of time.
For this type of task, we provide the algorithm with a historical dataset so that it can identify and learn the normal behavior of the data in an unsupervised manner. Once learning is complete, the algorithm helps us identify patterns that deviate from the learned behavior.
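A minimal unsupervised sketch of this idea: learn the "normal" center and spread from the historical data itself, then flag points that deviate sharply. The sensor-style data, injected spikes, and threshold below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(9)

# Historical data: mostly normal readings, plus a few injected spikes.
normal = rng.normal(loc=50, scale=2, size=500)
readings = np.concatenate([normal, [80.0, 15.0, 95.0]])

# "Training": learn normal behavior with robust statistics
# (median and MAD resist distortion by the outliers themselves).
median = np.median(readings)
mad = np.median(np.abs(readings - median))
robust_z = 0.6745 * (readings - median) / mad  # ~standard normal for inliers

anomalies = readings[np.abs(robust_z) > 3.5]  # a common MAD-based cutoff
print(np.sort(anomalies))  # the injected spikes stand out
```

No labels mark which readings are anomalous; the algorithm infers normal behavior from the bulk of the data and flags deviations, which is the unsupervised pattern described above.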
This article is excerpted from Transfer Learning with Python.
This book is designed to help Python practitioners become familiar with and apply these techniques in their respective fields. It is roughly divided into the following three parts:
- Foundation of deep learning;
- Transfer learning essentials;
- Transfer learning case studies.
The book first introduces the core concepts of machine learning and deep learning. It then presents several important deep learning architectures, such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Capsule Networks. Next, it introduces the concept of transfer learning and current state-of-the-art pretrained networks such as VGG, Inception, and ResNet, and shows how to leverage these systems to improve the performance of deep learning models. Finally, it presents a number of real-world case studies and problems in areas such as computer vision, audio analysis, and natural language processing.
After reading this book, readers will be able to implement deep learning and transfer learning in their own systems.