Warning: image-heavy post (lots of GIFs ahead)!
When it comes to machine learning, people are often bewildered by the sheer variety of algorithms and methods. There are indeed many approaches, but they become much easier to navigate once you have the right map. Here I recommend SAS's Li Hui's blog post on how to choose among the various approaches to machine learning.
In addition, SciKit-Learn provides a clear road map for choosing an algorithm:
In fact, the basic algorithms of machine learning are quite simple. Let's take a look at some of them and how they work, using two-dimensional data and interactive graphics. (Hats off to Bret Victor, whose Inventing on Principle has deeply influenced me.)
All the code and demos can be found in this Collection on my CodePen.
First, the two largest branches of machine learning are supervised learning and unsupervised learning. To put it simply, learning from labeled data is supervised learning, while learning from unlabeled data is unsupervised learning. In terms of broad categories, clustering and dimensionality reduction belong to unsupervised learning, while regression and classification belong to supervised learning.
Unsupervised learning
If your data isn’t labeled, you can either pay someone to label your data, or use unsupervised learning.
First, consider whether you need to reduce the dimensionality of your data.
Dimension reduction
Dimensionality reduction, as the name implies, transforms high-dimensional data into low-dimensional data. Common dimensionality reduction methods include PCA, LDA, SVD, and so on.
Principal component analysis (PCA)
The classic method of dimensionality reduction is principal component analysis (PCA): find the main components of the data and discard the unimportant ones.
Here, we randomly place 8 data points with the mouse, then draw a white straight line representing the principal component. This line is the single dimension onto which the two-dimensional data is reduced, and each blue line is the projection of a data point onto that new dimension, i.e., a perpendicular dropped to the white line. Mathematically, PCA can be thought of as finding the white line that minimizes the sum of the squared lengths of the blue projection segments.
See the Pen ML Explained PCA by gangtao (@gangtao) on CodePen.
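To make the idea concrete, here is a minimal sketch of 2-D PCA in plain JavaScript (the same language as the demos, though this code is mine, not the demo's): it computes the 2×2 covariance matrix of the points and extracts the principal axis in closed form. The sample points are made up for illustration.

```js
// Minimal 2-D PCA sketch: find the principal component of a point
// cloud via the 2x2 covariance matrix.
function principalComponent(points) {
  const n = points.length;
  const mx = points.reduce((s, p) => s + p[0], 0) / n;
  const my = points.reduce((s, p) => s + p[1], 0) / n;
  // Covariance matrix entries: [[sxx, sxy], [sxy, syy]]
  let sxx = 0, sxy = 0, syy = 0;
  for (const [x, y] of points) {
    sxx += (x - mx) * (x - mx);
    sxy += (x - mx) * (y - my);
    syy += (y - my) * (y - my);
  }
  sxx /= n; sxy /= n; syy /= n;
  // Largest eigenvalue of the 2x2 symmetric matrix, in closed form.
  const trace = sxx + syy;
  const det = sxx * syy - sxy * sxy;
  const lambda = trace / 2 + Math.sqrt((trace * trace) / 4 - det);
  // An eigenvector for lambda; handle the axis-aligned edge case.
  const dir = sxy !== 0 ? [lambda - syy, sxy]
                        : (sxx >= syy ? [1, 0] : [0, 1]);
  const len = Math.hypot(dir[0], dir[1]);
  return { mean: [mx, my], axis: [dir[0] / len, dir[1] / len] };
}

const pts = [[1, 2], [2, 3], [3, 3], [4, 5], [5, 4], [6, 7], [7, 6], [8, 8]];
console.log(principalComponent(pts)); // mean and unit principal axis
```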
For more examples of PCA, please refer to:
- D3 bl.ocks.org/hardbyte/40…
- setosa.io/ev/principa…
Clustering
Since data in an unsupervised learning setting is not labeled, the main analysis we can do besides dimensionality reduction is to group data points with similar characteristics, which is called clustering.
Hierarchical clustering (Hierarchical Cluster)
Hierarchical clustering builds clusters with a hierarchical, tree-like structure.
As the demo below shows, the hierarchical clustering algorithm is very simple:
- At the beginning, every point is a cluster by itself
- Find the two closest clusters (at the start, just two points) and merge them into one cluster
- The distance between two clusters is defined as the distance between the two nearest points, one from each cluster
- Repeat the second step until all points are merged into a single cluster (see the code sketch after the demo)
See the Pen ML Explained Hierarchical Clustering by gangtao (@gangtao) on CodePen.
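For readers who prefer code to animation, here is a minimal sketch of the steps above in plain JavaScript — single-linkage agglomerative clustering, assuming 2-D points and a target number of clusters `k` to stop at (the classic algorithm runs until a single cluster remains; the demo's own code may differ).

```js
// Single-linkage agglomerative clustering: repeatedly merge the two
// closest clusters until only k clusters remain.
function singleLinkage(points, k) {
  // Start with every point in its own cluster.
  let clusters = points.map(p => [p]);
  const dist = (a, b) => Math.hypot(a[0] - b[0], a[1] - b[1]);
  // Distance between clusters = distance of the closest pair of points.
  const clusterDist = (c1, c2) => {
    let best = Infinity;
    for (const p of c1) for (const q of c2) best = Math.min(best, dist(p, q));
    return best;
  };
  while (clusters.length > k) {
    let bi = 0, bj = 1, best = Infinity;
    for (let i = 0; i < clusters.length; i++)
      for (let j = i + 1; j < clusters.length; j++) {
        const d = clusterDist(clusters[i], clusters[j]);
        if (d < best) { best = d; bi = i; bj = j; }
      }
    // Merge the two closest clusters.
    clusters[bi] = clusters[bi].concat(clusters[bj]);
    clusters.splice(bj, 1);
  }
  return clusters;
}

const pts = [[1, 1], [1.5, 1], [5, 5], [5, 5.5], [9, 1]];
console.log(singleLinkage(pts, 2));
```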
KMeans
The K-means algorithm is the most common clustering algorithm. It works as follows:
- Randomly pick K central seed points (here K = 3).
- Calculate the distance from each of the K seed points to every point in the figure. If point P is closest to seed point S, then P belongs to S's cluster.
- Next, move each center point to the center of its cluster.
- Repeat steps 2 and 3 until the center points no longer move; the algorithm has then converged and all clusters are found. (A code sketch of these steps follows.)
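Here is a minimal sketch of steps 1–4 in plain JavaScript; the initialization strategy and the toy data are my own choices, not the demo's.

```js
// Minimal K-means: assign points to nearest centers, move centers to
// cluster means, repeat until assignments stop changing.
function kmeans(points, k, maxIter = 100) {
  const dist2 = (a, b) => (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2;
  // 1) random initial centers, sampled from the data
  let centers = [...points].sort(() => Math.random() - 0.5).slice(0, k);
  let labels = new Array(points.length).fill(0);
  for (let iter = 0; iter < maxIter; iter++) {
    // 2) assign each point to its nearest center
    let changed = false;
    points.forEach((p, i) => {
      let best = 0;
      for (let c = 1; c < k; c++)
        if (dist2(p, centers[c]) < dist2(p, centers[best])) best = c;
      if (best !== labels[i]) { labels[i] = best; changed = true; }
    });
    // 4) stop once no assignment changes: the algorithm has converged
    if (!changed && iter > 0) break;
    // 3) move each center to the mean of its cluster
    centers = centers.map((_, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return centers[c]; // keep an empty cluster's center
      return [
        members.reduce((s, p) => s + p[0], 0) / members.length,
        members.reduce((s, p) => s + p[1], 0) / members.length,
      ];
    });
  }
  return { centers, labels };
}

const pts = [[1, 1], [1.2, 0.8], [5, 5], [5.2, 5.1], [9, 1], [8.8, 1.2]];
console.log(kmeans(pts, 3));
```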
The KMeans algorithm has several problems:
- How do I determine the value of K? In the example above, I chose K = 3 because I knew the data should be divided into three clusters. In practical applications, however, we often don't know how many clusters to divide into.
- Since the initial locations of the center points are random, the data may be classified incorrectly; you can try different data sets in my Codepen.
- As shown in the figure below, if the data has a special spatial distribution, the KMeans algorithm cannot classify it effectively. The dots in the middle are classified as orange and blue respectively, when they should both be blue.
See the Pen ML Explained KMeans by gangtao (@gangtao) on CodePen.
DBSCAN
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm.
The DBSCAN algorithm is based on the fact that a cluster can be uniquely determined by any core object in it.
The algorithm's clustering process is as follows. Scan the whole data set, find any core point, and expand from it: starting from the core point, find all density-connected data points. Iterate over all core points in its neighborhood (border points cannot be expanded), looking for points density-connected to them, until no more points can be added. The border nodes of the cluster are all non-core data points. After that, rescan the data set (excluding any points in previously found clusters), find a core point that has not yet been clustered, and repeat the expansion steps, until the data set contains no new core points. Any data point not contained in any cluster is an outlier. (A code sketch follows the parameter list below.)
See the Pen ML Explained DBSCAN by gangtao (@gangtao) on CodePen.
As shown in the figure above, DBSCAN can effectively cluster data sets that KMeans cannot classify correctly, and you don't need to know the K value in advance.
Of course, DBSCAN still depends on two parameters, and how they are chosen is a key factor in the quality of the clustering:
- One parameter is the radius (Eps), which represents the range of the circular neighborhood centered at a given point P;
- The other parameter is the minimum number of points (MinPts) required in the neighborhood centered on a point P. Point P is called a core point if the neighborhood of radius Eps centered on P contains at least MinPts points.
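Here is a compact sketch of this process in plain JavaScript, parameterized by the same `eps` and `minPts` discussed above; the queue-based expansion is one standard way to implement it, not necessarily the demo's.

```js
// Compact DBSCAN sketch. Label 0 = unvisited, -1 = noise/outlier,
// positive integers = cluster ids. Neighborhoods include the point itself.
function dbscan(points, eps, minPts) {
  const labels = new Array(points.length).fill(0);
  const dist = (a, b) => Math.hypot(a[0] - b[0], a[1] - b[1]);
  const neighbors = i =>
    points.map((_, j) => j).filter(j => dist(points[i], points[j]) <= eps);
  let cluster = 0;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== 0) continue;               // already processed
    const seeds = neighbors(i);
    if (seeds.length < minPts) { labels[i] = -1; continue; } // noise for now
    cluster++;
    labels[i] = cluster;
    // Expand the cluster from this core point.
    const queue = [...seeds];
    while (queue.length > 0) {
      const j = queue.shift();
      if (labels[j] === -1) labels[j] = cluster;  // former noise becomes a border point
      if (labels[j] !== 0) continue;              // border points are not expanded
      labels[j] = cluster;
      const jn = neighbors(j);
      if (jn.length >= minPts) queue.push(...jn); // j is also a core point
    }
  }
  return labels;
}

const pts = [[1, 1], [1.1, 1], [1, 1.2], [5, 5], [5.1, 5], [5, 5.2], [9, 9]];
console.log(dbscan(pts, 0.5, 3)); // -> [1, 1, 1, 2, 2, 2, -1]
```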
Supervised learning
Data in supervised learning must be labeled; that is, we predict new data from existing results. If the thing we're predicting is numerical, we call it regression; if it is a category or discrete value, we call it classification.
In fact, regression and classification are essentially similar, so many algorithms can be used for both classification and regression.
Regression
Linear regression
Linear regression is the most classical regression algorithm.
In statistics, linear regression is a regression analysis that models the relationship between one or more independent variables and a dependent variable using a least-squares function called a linear regression equation. This function is a linear combination of one or more model parameters, called regression coefficients. The case with one independent variable is called simple regression; the case with more than one is called multiple regression.
See the Pen ML Explained Linear Regression by gangtao (@gangtao) on CodePen.
Linear regression, as shown above, is all about finding the line that minimizes the total prediction error over all points, i.e., the sum of the (squared) lengths of the blue segments. This diagram looks a lot like the PCA demo in our first example; look carefully and spot the difference.
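As a sketch, here is simple (one-variable) linear regression in closed form, again in plain JavaScript. This also answers the PCA comparison: regression minimizes vertical distances from points to the line, while PCA minimizes perpendicular distances. The data is illustrative.

```js
// Closed-form least squares for y = a*x + b, minimizing the sum of
// squared vertical errors (the blue segments in the demo above).
function linearRegression(points) {
  const n = points.length;
  const mx = points.reduce((s, p) => s + p[0], 0) / n;
  const my = points.reduce((s, p) => s + p[1], 0) / n;
  let num = 0, den = 0;
  for (const [x, y] of points) {
    num += (x - mx) * (y - my);
    den += (x - mx) * (x - mx);
  }
  const a = num / den;     // slope
  const b = my - a * mx;   // intercept
  return { a, b, predict: x => a * x + b };
}

const fit = linearRegression([[1, 2], [2, 2.8], [3, 4.1], [4, 4.9]]);
console.log(fit.a, fit.b, fit.predict(5));
```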
If high accuracy is required, recommended regression algorithms include random forests, neural networks, and gradient boosting trees.
If speed is the priority, decision trees and linear regression are recommended.
Classification
Support vector machine (SVM)
If high classification accuracy is required, the algorithms to consider include kernel SVM, random forests, neural networks, and gradient boosting trees.
Given a set of training instances, each marked as belonging to one of two categories, the SVM training algorithm builds a model that assigns new instances to one of those categories, making it a non-probabilistic binary linear classifier. The SVM model represents instances as points in space, mapped so that the instances of the two classes are separated by as wide a gap as possible. New instances are then mapped into the same space, and their category is predicted based on which side of the gap they fall on.
See the Pen ML Explained SVM by gangtao (@gangtao) on CodePen.
As shown in the figure above, the SVM algorithm finds the straight line in space that best separates the two sets of data, making the margin — the distance from the line to the nearest points of each class — as large as possible.
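Real SVM solvers use quadratic programming methods such as SMO; as a rough illustration only, here is a linear SVM trained by stochastic subgradient descent on the hinge loss (Pegasos-style) in plain JavaScript. The hyperparameters and data are made up, and the bias update is a common heuristic rather than part of the textbook algorithm.

```js
// Linear SVM via stochastic subgradient descent on the hinge loss.
// Labels must be +1 or -1. For separable 2-D data this finds a line
// close to the maximum-margin separator.
function linearSVM(points, labels, lambda = 0.01, epochs = 200) {
  let w = [0, 0], b = 0;
  for (let t = 1; t <= epochs * points.length; t++) {
    const i = Math.floor(Math.random() * points.length);
    const eta = 1 / (lambda * t); // decaying learning rate
    const margin = labels[i] * (w[0] * points[i][0] + w[1] * points[i][1] + b);
    // Regularization shrinks w every step...
    w = [w[0] * (1 - eta * lambda), w[1] * (1 - eta * lambda)];
    // ...and points violating the margin push the boundary away from them.
    if (margin < 1) {
      w[0] += eta * labels[i] * points[i][0];
      w[1] += eta * labels[i] * points[i][1];
      b += eta * labels[i];
    }
  }
  return { w, b, predict: p => Math.sign(w[0] * p[0] + w[1] * p[1] + b) };
}

const X = [[1, 2], [2, 1], [2, 3], [6, 5], [7, 7], [5, 6]];
const y = [-1, -1, -1, 1, 1, 1];
const svm = linearSVM(X, y);
console.log(svm.predict([1.5, 2]), svm.predict([6, 6])); // expected: -1 1
```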
See the Pen ML Explained SVM Kernels by gangtao (@gangtao) on CodePen.
The figure above illustrates the different classification effects of different kernel methods.
Decision tree
If the classification results are required to be explicable, consider decision trees or logistic regression.
A decision tree is a tree structure (which can be binary or non-binary).
Each of its non-leaf nodes represents a test on a feature attribute, each branch represents an outcome of that test over a range of values, and each leaf node stores a category.
To classify an item with a decision tree, start at the root node, test the item's corresponding feature attribute, follow the branch that matches its value, and repeat until a leaf node is reached; the category stored at that leaf is the decision result.
Decision trees can be used for regression or classification, as shown below for an example of classification.
See the Pen ML Explained Decision Tree by gangtao (@gangtao) on CodePen.
As shown in the figure above, the decision tree divides the space into different regions.
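Here is a minimal sketch of that prediction walk in plain JavaScript. The tree itself is hand-built for illustration; real libraries learn the splits from data (e.g., by information gain or Gini impurity).

```js
// A hand-built decision tree for 2-D points. Prediction walks from the
// root to a leaf: test a feature at each internal node, follow a
// branch, and return the class stored at the leaf.
const tree = {
  feature: 0, threshold: 4,          // test: x < 4 ?
  left: { label: 'blue' },           // leaf
  right: {
    feature: 1, threshold: 3,        // test: y < 3 ?
    left: { label: 'orange' },
    right: { label: 'blue' },
  },
};

function predict(node, point) {
  if (node.label !== undefined) return node.label; // reached a leaf
  return point[node.feature] < node.threshold
    ? predict(node.left, point)
    : predict(node.right, point);
}

console.log(predict(tree, [2, 5])); // -> 'blue'
console.log(predict(tree, [6, 1])); // -> 'orange'
```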
Logistic regression
Logistic regression, despite its name, is a classification algorithm. Like SVM, it is a binary classifier, but its mathematical model predicts the probability of the label being 1 or 0, which is why I say regression and classification are essentially the same thing.
See the Pen ML Explained Logistic Regression by gangtao (@gangtao) on CodePen.
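A minimal sketch in plain JavaScript: logistic regression fitted by per-sample gradient descent on the log loss. The learning rate, epoch count, and data are illustrative choices.

```js
// Logistic regression: the model outputs the probability of class 1,
// which is the "regression" hiding inside this classifier.
function logisticRegression(points, labels, lr = 0.1, epochs = 2000) {
  const sigmoid = z => 1 / (1 + Math.exp(-z));
  let w = [0, 0], b = 0;
  for (let e = 0; e < epochs; e++) {
    points.forEach((p, i) => {
      const prob = sigmoid(w[0] * p[0] + w[1] * p[1] + b);
      const err = prob - labels[i]; // gradient of log loss w.r.t. the logit
      w[0] -= lr * err * p[0];
      w[1] -= lr * err * p[1];
      b -= lr * err;
    });
  }
  return { w, b, prob: p => sigmoid(w[0] * p[0] + w[1] * p[1] + b) };
}

const X = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]];
const y = [0, 0, 0, 1, 1, 1];
const model = logisticRegression(X, y);
console.log(model.prob([1.5, 1.5]).toFixed(3), model.prob([5.5, 5.5]).toFixed(3));
```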
On the difference between logistic regression and linear SVM classification, you can read:
- www.zhihu.com/question/26…
- blog.jobbole.com/98635/
Naive Bayes
The naive Bayes method is a good choice when the data volume is quite large.
In 2015, I gave a talk on the Bayes method to my colleagues at the company, but unfortunately the speaker deck is blocked. If you are interested, you can find a way to access it yourself.
See the Pen ML Explained Naive Bayes by gangtao (@gangtao) on CodePen.
As shown in the figure above, you can think about the effect of the green dot on the overall classification results.
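As a sketch, here is a Gaussian naive Bayes classifier in plain JavaScript: each class is modeled with independent per-feature normal distributions, and Bayes' rule picks the class with the highest posterior. (The Gaussian variant is my choice for 2-D points; other variants use multinomial or Bernoulli likelihoods.)

```js
// Gaussian naive Bayes: per-class priors plus independent per-feature
// normal likelihoods; classify by the largest log posterior.
function gaussianNB(points, labels) {
  const classes = [...new Set(labels)];
  const stats = classes.map(c => {
    const members = points.filter((_, i) => labels[i] === c);
    const mean = [0, 1].map(d =>
      members.reduce((s, p) => s + p[d], 0) / members.length);
    const variance = [0, 1].map(d =>
      members.reduce((s, p) => s + (p[d] - mean[d]) ** 2, 0) / members.length + 1e-9);
    return { c, prior: members.length / points.length, mean, variance };
  });
  const logGauss = (x, m, v) =>
    -0.5 * Math.log(2 * Math.PI * v) - ((x - m) ** 2) / (2 * v);
  return p => {
    // argmax over classes of log prior + sum of per-feature log likelihoods
    let best = null, bestScore = -Infinity;
    for (const s of stats) {
      const score = Math.log(s.prior)
        + logGauss(p[0], s.mean[0], s.variance[0])
        + logGauss(p[1], s.mean[1], s.variance[1]);
      if (score > bestScore) { bestScore = score; best = s.c; }
    }
    return best;
  };
}

const X = [[1, 1], [1.5, 1], [1, 1.5], [5, 5], [5.5, 5], [5, 5.5]];
const y = ['a', 'a', 'a', 'b', 'b', 'b'];
const classify = gaussianNB(X, y);
console.log(classify([1.2, 1.2]), classify([5.2, 5.2])); // -> 'a' 'b'
```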
KNN
KNN classification is probably the simplest of all machine learning algorithms.
See the Pen ML Explained KNN by gangtao (@gangtao) on CodePen.
As shown in the figure above, with K = 3: move the mouse to any point, find the K points closest to it, and let those K points vote — the majority class wins. It's that simple.
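It is so simple that it fits in a few lines. Here is a minimal KNN sketch in plain JavaScript; the data and K are illustrative.

```js
// KNN: find the K nearest neighbors of the query point and take a
// majority vote over their labels.
function knn(points, labels, query, k = 3) {
  const dist = (a, b) => Math.hypot(a[0] - b[0], a[1] - b[1]);
  const nearest = points
    .map((p, i) => ({ d: dist(p, query), label: labels[i] }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k);
  // Majority vote among the k nearest neighbors.
  const votes = {};
  for (const n of nearest) votes[n.label] = (votes[n.label] || 0) + 1;
  return Object.keys(votes).reduce((a, b) => (votes[a] >= votes[b] ? a : b));
}

const X = [[1, 1], [1.5, 1.2], [2, 1], [6, 6], [6.5, 6], [7, 7]];
const y = ['red', 'red', 'red', 'green', 'green', 'green'];
console.log(knn(X, y, [1.8, 1.1])); // -> 'red'
console.log(knn(X, y, [6.2, 6.3])); // -> 'green'
```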
Conclusion
This article uses two-dimensional interactive diagrams to explain the basic algorithms of machine learning, hoping to deepen your understanding of its various methods. All the code can be found in the references. You're welcome to discuss with me.
Reference:
- Code and demo animation
- My Codepen Collection contains all the demo code
- My Github contains all the demo animations
- Javascript-based machine learning libraries and demonstrations
- A JavaScript-based machine learning library used in some of the demos in this article.
- Another JavaScript-based machine learning library; not as feature-rich or active as the first, but with good demos.
- A nice demo: three regression methods and one clustering method
- If you want to build your own machine learning algorithms, you can use some basic math libraries
- Numeric Javascript, a JavaScript-based library for numerical computation and analysis, providing linear algebra, complex number computation, and more.
- This and the previous one can be considered as a JavaScript counterpart to Python’s Numpy/Scipy/Sympy.
- Victorjs 2D vector library
- Recommend some road maps for machine learning
- ml-cheatsheet.readthedocs.io/en/latest/
- 10 machine learning algorithms www.gitbook.com/book/wizard…
- blogs.sas.com/content/sub…
- Scikit-learn.org/stable/tuto…
- Tools
- Convert MOV files to GIF online: convertio.co/zh/mov-gif/ or cloudconvert.com/mov-to-gif
- GIF editing tool ezgif.com