Author: Zhu Jinxing, a big data scientist

As one of the hottest terms of recent years, artificial intelligence has been a constant topic in technology circles. As intelligent hardware has iterated, smart home products have gradually entered millions of households, and AI technologies such as speech recognition and image recognition have advanced in leaps. How should we view the nature of artificial intelligence, and how did it come to develop so rapidly? This article introduces several concepts frequently mentioned in the field of artificial intelligence, along with a brief history of AI's development, from a technical perspective.

I. Concepts related to artificial intelligence

1. Artificial Intelligence (AI): the application of machine learning and deep learning in practice to make machines intelligent, thinking the way humans do. Artificial intelligence is perhaps better understood as an industry, referring broadly to the production of more intelligent software and hardware; its principal method is machine learning.


2. Data mining: the non-trivial process of extracting valid, novel, potentially useful, credible, and ultimately understandable patterns from large amounts of data.

Data mining draws on statistics, machine learning, databases, and other technologies to solve problems. It is not merely statistical analysis but an extension of statistical methodology; many mining algorithms originate in statistics.

3. Machine learning: the study of how computers can simulate or implement human learning in order to acquire new knowledge or skills; in other words, the study of computer algorithms that improve automatically through experience.


Machine learning was built on the development of data mining technology and was once just a new branch and subdivision of that field, but big data technology has gradually made it prominent and mainstream. It is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications run through every area of AI.


4. Deep learning: a newer field within machine learning research, in contrast to shallow learning. Its motivation is to build and simulate the neural networks of the human brain for analysis and learning; it mimics the brain's mechanisms for interpreting data such as images, sound, and text. The concept of deep learning originates from research on artificial neural networks. Deep learning discovers distributed feature representations of data by combining low-level features into more abstract high-level representations of attribute categories or features.

Machines trained with deep learning can now match humans at recognizing images, from cats to the signatures of cancer cells in blood and tumors in MRI scans. In areas such as Google's AlphaGo learning Go, AI has already surpassed what humans can do today.

To aid understanding, the relationship between the four concepts above is shown in the following figure. Note that the figure shows only a rough dependency: data mining and artificial intelligence, in particular, do not fully contain one another.


II. The development history of artificial intelligence


(Image from the Internet)

The figure makes clear that deep learning went through two troughs before its rise in 2006, and these troughs divide the development of neural networks into several distinct stages, described below.

1. First-generation neural networks (1958~1969)

The idea of neural networks originated with the MP artificial neuron model of 1943, which aimed to simulate the response process of a human neuron with a computer. The model simplified the neuron into three steps: linear weighting of the input signals, summation, and nonlinear activation (thresholding). As shown below:


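To make these three steps concrete, here is a minimal Python sketch of an MP-style neuron. The weights and threshold are illustrative values chosen for this example, not part of the original 1943 formulation.

```python
import numpy as np

def mp_neuron(x, w, threshold):
    """MP-style neuron: linear weighting, summation, then a hard threshold."""
    s = np.dot(w, x)                    # linear weighting and summation
    return 1 if s >= threshold else 0   # nonlinear activation (threshold method)

# Illustrative example: with these weights and threshold, the neuron acts as logical AND
print(mp_neuron(np.array([1, 1]), np.array([1.0, 1.0]), threshold=2.0))  # -> 1
print(mp_neuron(np.array([1, 0]), np.array([1.0, 1.0]), threshold=2.0))  # -> 0
```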
In 1958, Rosenblatt invented the perceptron algorithm. The algorithm uses the MP model to perform binary classification on multidimensional input data and can automatically learn and update its weights from training samples by gradient descent. In 1962, the method was proved to converge, and its theoretical and practical results triggered the first wave of neural networks.
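The following is a minimal sketch of the perceptron learning rule on a toy linearly separable dataset (labels in {-1, +1}); the data and learning rate are illustrative, not from any historical source.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Rosenblatt-style perceptron: update weights only on misclassified samples."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified (or on the boundary)
                w += lr * yi * xi              # nudge the boundary toward xi
                b += lr * yi
    return w, b

# Linearly separable toy data
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # -> [ 1.  1. -1. -1.]
```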

2. Second-generation neural networks (1986~1998)

Hinton, a towering figure in modern deep learning, was the first to break the nonlinearity curse: in 1986 he and his colleagues proposed the backpropagation (BP) algorithm suitable for multi-layer perceptrons (MLPs) and adopted the sigmoid function for nonlinear mapping, effectively solving the problem of nonlinear classification and learning. This approach triggered the second wave of neural networks.
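As an illustration of the idea (not the original 1986 code), the following minimal NumPy MLP uses sigmoid activations and backpropagation to learn XOR, a problem no single-layer perceptron can solve. The architecture, learning rate, and iteration count are arbitrary choices for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR is not linearly separable

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)       # hidden layer
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)       # output layer
lr = 1.0

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)                         # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)              # backprop of squared error
    d_h = (d_out @ W2.T) * h * (1 - h)               # using sigmoid'(z) = s * (1 - s)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # typically close to [0, 1, 1, 0] for this seed
```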

In 1989, Robert Hecht-Nielsen proved a universal approximation theorem for MLPs: any continuous function f on a closed interval can be approximated to arbitrary accuracy by a BP network with a single hidden layer. This result greatly encouraged neural network researchers.
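In one-dimensional form, the result can be stated roughly as follows, where σ is a sigmoidal activation (a common textbook paraphrase, not Hecht-Nielsen's exact formulation):

```latex
\forall f \in C([a, b]),\ \forall \varepsilon > 0,\ \exists N \in \mathbb{N},\ \exists \{v_i, w_i, b_i\}_{i=1}^{N}:
\quad
\sup_{x \in [a, b]} \left| f(x) - \sum_{i=1}^{N} v_i \,\sigma(w_i x + b_i) \right| < \varepsilon
```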

In the same year, LeCun invented the convolutional neural network LeNet and applied it to digit recognition with good results, although it did not attract enough attention at the time.

It is worth emphasizing that because no major new methods appeared after 1989, and because neural networks (NNs) still lacked rigorous mathematical theory to support them, enthusiasm for neural networks gradually cooled.

In 1997, Hochreiter and Schmidhuber invented the LSTM model. Although the model has outstanding properties for sequence modeling, it did not attract enough attention because neural networks were then in decline.

3. The spring of statistical modeling (1986~2006)

In 1986, the decision tree method was proposed, and improved decision tree methods such as ID3, ID4, and CART soon appeared in succession.

In 1995, the statistician Vapnik proposed the linear SVM. The method has two notable characteristics: it rests on complete mathematical theory (statistical learning theory and convex optimization) and it matches intuition (maximizing the margin). Most importantly, it achieved the best results of its time on linear classification problems.
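The "maximum margin" intuition corresponds to the standard hard-margin optimization problem (a textbook formulation, added here for reference): maximizing the margin 2/‖w‖ is equivalent to minimizing ‖w‖², subject to every training point lying on the correct side with margin at least 1.

```latex
\min_{w,\, b}\ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{s.t.} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```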

In 1997, AdaBoost was proposed as a practical embodiment of PAC (Probably Approximately Correct) learning theory, and it gave rise to the family of ensemble methods. The method builds a strong classifier by combining a series of weak classifiers.
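A short scikit-learn sketch of this weak-to-strong idea, comparing a single decision stump with a boosted ensemble of stumps; the synthetic dataset is a placeholder, and the `estimator` keyword assumes scikit-learn ≥ 1.2.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)  # a weak learner
strong = AdaBoostClassifier(estimator=stump, n_estimators=100, random_state=0)

print("single stump :", stump.fit(X_tr, y_tr).score(X_te, y_te))
print("boosted model:", strong.fit(X_tr, y_tr).score(X_te, y_te))  # noticeably higher
```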

In 2000, the kernel SVM was proposed. By using a kernel function to cleverly map a problem that is linearly inseparable in the original space into one that is linearly separable in a high-dimensional space, the kernelized SVM successfully solved nonlinear classification, and its classification performance was excellent. This more or less put an end to the neural network era of the time.
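A brief scikit-learn illustration of the kernel trick: on concentric-circle data that no straight line can separate, an RBF-kernel SVM succeeds where a linear SVM fails. The dataset and parameters are illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: linearly inseparable in the original 2-D space
X, y = make_circles(n_samples=500, factor=0.3, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, clf.score(X_te, y_te))
# linear accuracy sits near chance; rbf is close to 1.0
```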

In 2001, the random forest was proposed, another representative ensemble method. Its theory is solid, it suppresses overfitting better than AdaBoost, and it performs very well in practice.

Also in 2001, a new unifying framework, the graphical model, was proposed. It attempts to unify machine learning's patchwork of methods, such as naive Bayes, SVM, and hidden Markov models, providing a single descriptive framework for the various learning methods.

4. Rapid development period (2006~2012)

2006 was the inaugural year of deep learning (DL). That year, Hinton proposed a solution to the vanishing-gradient problem in deep network training: initialize the weights with unsupervised pre-training, then fine-tune with supervised training. The main idea is to first learn the structure of the training data through self-supervised learning (an autoencoder), and then fine-tune on that structure with supervised training. However, the paper did not attract much attention at the time because there was no especially convincing experimental verification.
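The following is a simplified PyTorch sketch of the two-stage recipe. Hinton's original work used greedy layer-wise pre-training of a deep network; this single-layer autoencoder with random stand-in data and placeholder layer sizes only illustrates the flow.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())

X = torch.rand(256, 784)  # stand-in for unlabeled training data
opt = torch.optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), lr=0.1)
for _ in range(100):      # stage 1: unsupervised pre-training (reconstruct the input)
    loss = nn.functional.mse_loss(decoder(encoder(X)), X)
    opt.zero_grad(); loss.backward(); opt.step()

classifier = nn.Sequential(encoder, nn.Linear(128, 10))  # reuse the pre-trained encoder
y = torch.randint(0, 10, (256,))  # stand-in for labels
opt = torch.optim.SGD(classifier.parameters(), lr=0.1)
for _ in range(100):      # stage 2: supervised fine-tuning
    loss = nn.functional.cross_entropy(classifier(X), y)
    opt.zero_grad(); loss.backward(); opt.step()
```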

In 2011, the ReLU activation function was proposed, which effectively suppresses the vanishing-gradient problem.
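A quick numerical comparison of the two derivatives shows why: the sigmoid gradient is at most 0.25 and decays toward zero for large |z| (and such factors multiply across layers), while the ReLU gradient stays at 1 for any positive input.

```python
import numpy as np

z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
s = 1 / (1 + np.exp(-z))
print("sigmoid'(z):", np.round(s * (1 - s), 4))  # peaks at 0.25, near 0 for |z| = 6
print("relu'(z)   :", (z > 0).astype(float))     # 0 or 1, never shrinks for z > 0
```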

Microsoft made a major breakthrough in 2011 when it applied DL to speech recognition for the first time.

5. Outbreak period (2012~present)

In 2012, to demonstrate the potential of deep learning, Hinton's research group entered the ImageNet image recognition competition for the first time. AlexNet, the CNN they built, took first place, far outperforming the second-place entry (an SVM-based method) in classification accuracy. Thanks to this competition, CNNs attracted the attention of many researchers.

AlexNet's innovations (a schematic sketch follows this list):

(1) It adopted the ReLU activation function for the first time, greatly increasing convergence speed and essentially eliminating the vanishing-gradient problem;

(2) Because ReLU suppresses the vanishing-gradient problem so well, AlexNet abandoned "pre-training + fine-tuning" in favor of fully supervised training. As a result, purely supervised learning became DL's mainstream training method;

(3) It extended the LeNet-5 structure, adding Dropout layers to reduce overfitting and LRN layers to improve generalization;

(4) It used GPUs to accelerate computation for the first time.
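The sketch below shows these ingredients in schematic form: an AlexNet-flavored toy network in PyTorch with ReLU activations and Dropout, not the exact 2012 architecture (which had more layers, LRN, and a two-GPU split).

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5),       # the regularizer AlexNet popularized
    nn.LazyLinear(1000),   # 1000 ImageNet classes
)

x = torch.randn(1, 3, 224, 224)  # a dummy ImageNet-sized input
print(net(x).shape)              # -> torch.Size([1, 1000])
```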

Conclusion: As one of the most influential technologies of the 21st century, artificial intelligence not only beats us at Go and data mining, but also challenges us in image recognition, speech recognition, and other fields. Today, artificial intelligence is also merging and co-evolving with the Internet of Things, quantum computing, cloud computing, and many other technologies, developing at a speed beyond our imagination. And all of this has happened and evolved in just a few decades…