AI will be as disruptive as electricity. – Andrew Ng (Wu Enda)
Like steam engines in the age of steam, generators in the age of electricity, and computers and the Internet in the age of information, artificial intelligence (AI) is enabling various industries and pushing humanity into the age of intelligence.
This article introduces artificial intelligence and its main schools of thought, systematically reviews its development history and landmark achievements with an emphasis on the underlying algorithmic ideas, and lays out the ups and downs of more than 60 years in a clear thread, so as to look ahead at the future trends of artificial intelligence (AI).
I. Introduction to artificial intelligence
1.1 Research objectives of artificial intelligence
Artificial intelligence (AI) research aims to explore the essence of intelligence and to extend human intelligence: to build intelligent agents that can listen (speech recognition, machine translation, etc.), see (image recognition, character recognition, etc.), speak (speech synthesis, human-machine dialogue, etc.), think (game playing, expert systems, etc.), learn (knowledge representation, machine learning, etc.) and act (robotics, self-driving cars, etc.). A classic definition of AI is: “a system's ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation.”
1.2 Schools of artificial intelligence
Over the course of AI's development, researchers of different eras and academic backgrounds have held different views on what intelligence is and how to realize it, giving rise to different schools of thought. The most influential schools and their representative methods are as follows:
Among them, symbolism and connectionism are the two main factions:
- Symbolism, also known as logicism or the computer school, holds that cognition is the derivation and computation of meaningful symbolic expressions, treats learning as reverse deduction, and advocates building artificial intelligence systems on explicit axioms and logical systems. For example, a decision tree model might take weather-related features as input to forecast the weather, as in the sketch below.
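A minimal sketch of this rule-based style, assuming a tiny hand-made table of weather features; the feature names, values, and labels below are purely illustrative:

```python
# Symbolism sketch: a decision tree yields explicit if-then rules (illustrative data only).
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical weather features per day: [humidity %, pressure hPa, wind speed m/s]
X = [[85, 1002, 6], [90, 998, 8], [60, 1015, 3], [55, 1020, 2], [75, 1008, 5], [40, 1022, 1]]
y = ["rain", "rain", "sunny", "sunny", "rain", "sunny"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["humidity", "pressure", "wind"]))  # the learned rules
print(tree.predict([[80, 1005, 7]]))  # forecast for a new day's features
```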
- Connectionism, also known as the bionics school, believes in reverse-engineering the brain: it advocates studying human cognition through mathematical models and realizing artificial intelligence through the connection mechanisms of neurons. For example, a neural network model might take radar image data as input to predict the weather, as in the sketch below.
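A minimal connectionist counterpart, assuming synthetic “radar images” flattened into pixel vectors; the data and the network size are illustrative only:

```python
# Connectionism sketch: a small neural network learns from raw pixel inputs (synthetic data).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 8 * 8))              # 200 fake 8x8 radar images, flattened
y = (X.mean(axis=1) > 0.5).astype(int)    # toy label: 1 = "rain", 0 = "no rain"

net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)
print(net.predict(X[:3]), y[:3])          # compare a few predictions with the toy labels
```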
II. History of artificial intelligence
From the beginning, artificial intelligence (AI) has been exploring a road full of unknowns, with twists and turns. We can roughly divide the development process into five stages:
- Initial development period: 1943-1960s
- Reflection on the development period: 1970s
- Application development period: 1980s
- Stable development period: 1990s to 2010
- Flourishing period: from 2011 to present
2.1 Initial development period: 1943-1960s
After the concept of artificial intelligence was put forward, both symbolism and connectionism (neural networks) developed, and a series of remarkable research achievements followed, such as machine theorem proving, the checkers program, and human-machine dialogue, setting off the first climax of AI development.
- In 1943, American neuroscientist Warren McCulloch and logician Walter Pitts put forward the mathematical model of the neuron, one of the cornerstones of modern artificial intelligence.
- The idea of machines producing intelligence first came to light in 1950, when Alan Mathison Turing proposed the Turing Test, which tests whether a machine can display intelligence that is indistinguishable from a human's.
- In 1950, Claude Shannon proposed computer game playing in his paper “Programming a Computer for Playing Chess”.
- In 1956, the term artificial intelligence (AI) was formally used at the Dartmouth College Summer Symposium on Artificial Intelligence, the first artificial intelligence seminar in human history, marking the birth of the discipline of artificial intelligence.
- In 1957, Frank Rosenblatt simulated the Perceptron, a neural network model he invented, on an IBM 704 computer.
The perceptron can be regarded as a feedforward artificial neural network in its simplest form. It is a binary linear classification model: its input is an instance's feature vector (x1, x2, …), the neuron's activation function f is sign, and its output is the instance's class (+1 or -1). The goal of the model is to separate positive and negative instances with a hyperplane.
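A minimal sketch of the perceptron learning rule described above, on a small linearly separable toy set; the data and learning rate are illustrative:

```python
# Perceptron sketch: sign activation, +1/-1 labels, hyperplane updated on mistakes only.
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=100):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified instance
                w += lr * yi * xi               # move the hyperplane toward it
                b += lr * yi
    return w, b

X = np.array([[2.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.0, 2.0]])
y = np.array([1, 1, -1, -1])                    # linearly separable toy data
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))                       # should reproduce y
```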
- Logistic Regression was proposed by David Cox in 1958.
LR is a linear classification model similar in structure to the perceptron. The main differences are that the neuron's activation function f is the sigmoid, and that the model's goal is to maximize the likelihood of correct classification.
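A minimal logistic-regression sketch in the same toy setting, training the sigmoid model by gradient ascent on the log-likelihood; the data and hyperparameters are illustrative:

```python
# Logistic regression sketch: sigmoid outputs, weights fitted by maximum likelihood.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                  # predicted probability of class 1
        w += lr * X.T @ (y - p) / len(y)        # gradient of the log-likelihood
        b += lr * np.mean(y - p)
    return w, b

X = np.array([[2.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.0, 2.0]])
y = np.array([1, 1, 0, 0])                      # labels are 0/1 here rather than +1/-1
w, b = train_logistic_regression(X, y)
print((sigmoid(X @ w + b) > 0.5).astype(int))   # should reproduce y
```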
- In 1959, Arthur Samuel gave machine learning a clear definition: the field of study that gives computers the ability to learn without being explicitly programmed.
- In 1961, Leonard Merrick Uhr and Charles M. Vossler published the pattern recognition paper “A Pattern Recognition Program That Generates, Evaluates and Adjusts Its Own Operators”, which describes an attempt to design a pattern recognition program using machine learning or self-organizing processes.
- In 1965, I. J. Good published an article on the possible future threat of artificial intelligence to humanity, a precursor of the “AI threat theory”. He argued that machine superintelligence and an inevitable intelligence explosion would eventually be beyond human control. Later, the dire predictions about artificial intelligence made by the famous scientist Stephen Hawking, the inventor Elon Musk and others echoed Good's warning of half a century earlier.
- In 1966, Joseph Weizenbaum, a scientist at MIT, published “ELIZA — A Computer Program for the Study of Natural Language Communication between Man and Machine” in Communications of the ACM, describing how the ELIZA program enabled humans and computers to hold a natural language conversation to a certain extent. ELIZA worked by decomposing the input according to keyword-matching rules and then generating responses from the reassembly rules associated with each decomposition rule.
- In 1967, Thomas Cover and Peter Hart proposed the nearest neighbor algorithm, the basis of k-nearest neighbors (KNN).
The core idea of KNN is: given a training data set and a new input instance Xu, find the K training instances closest to Xu and assign Xu the class that occurs most often among those K instances.
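A minimal KNN sketch following this description, assuming Euclidean distance as the metric; the toy data are illustrative:

```python
# KNN sketch: classify a new instance Xu by the majority class of its K nearest neighbors.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    distances = np.linalg.norm(X_train - x_new, axis=1)    # distance to every training point
    nearest = np.argsort(distances)[:k]                    # indices of the K closest instances
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.0], [4.2, 3.9]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # -> "A"
```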
- In 1968, Edward Feigenbaum proposed DENDRAL, the first expert system, and gave a preliminary definition of the knowledge base, giving birth to the second wave of artificial intelligence. The system contained very rich chemical knowledge and could help chemists infer molecular structure from mass spectrometry data.
Expert systems are an important branch of AI and, together with natural language understanding and robotics, one of its three major research directions. An expert system uses a computer model of human expert reasoning to handle complex real-world problems that would otherwise require an expert, reaching the same conclusions the expert would. It can be regarded as the combination of a “knowledge base” and an “inference engine”.
- In 1969, Marvin Minsky, a representative of symbolism, raised the linear inseparability problem of XOR in his book Perceptrons: a single-layer perceptron cannot separate XOR data. Solving it requires introducing a nonlinear multi-layer network (an MLP with at least two layers), but at the time there was no effective algorithm for training multi-layer networks. These arguments dealt a heavy blow to neural network research, which entered a low tide lasting about ten years.
2.2 Reflection on the development period: 1970s
The breakthroughs of the early period greatly raised expectations for artificial intelligence, and researchers began to attempt more challenging tasks. However, the lack of computing power and theory caused these unrealistic goals to fall through, and the development of artificial intelligence fell to a low point.
- In 1974, Paul Werbos, in his Harvard University doctoral thesis, first proposed training artificial neural networks through error backpropagation (BP), but the idea attracted little attention at the time.
The basic idea of the BP algorithm is not to adjust the weights with the error itself (as the perceptron does) but with the derivative of the error, the gradient. The error gradient is propagated backwards through the network to update the model weights, reducing the learning error, fitting the learning objective, and realizing the network's role as a “universal function approximator”.
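A minimal sketch of gradient-based backpropagation for a one-hidden-layer sigmoid network trained on XOR; the architecture, learning rate, and iteration count are illustrative:

```python
# Backpropagation sketch: propagate the error gradient backwards to update both weight layers.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)          # XOR: not linearly separable

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                 # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)      # gradient of squared error at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)       # gradient propagated back to the hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())                  # typically approaches [0, 1, 1, 0]
```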
- In 1975, Marvin Minsky put forward the frame theory of knowledge representation for artificial intelligence in his paper “A Framework for Representing Knowledge”.
- In 1976, Randall Davis built and maintained large-scale knowledge bases and proposed that using an integrated object-oriented model can improve the integrity of knowledge base (KB) development, maintenance, and use.
- In 1976, Edward H. Shortliffe of Stanford University and colleagues completed MYCIN, the first medical expert system, used for the diagnosis, treatment, and consultation of blood infections.
- In 1976, Douglas Lenat of Stanford University published his work on discovery in mathematics as heuristic search, describing a program named AM that, guided by a large number of heuristic rules, developed new mathematical concepts and eventually rediscovered hundreds of common concepts and theorems.
- In 1977, the logic-based machine learning systems of Hayes-Roth and others made great progress, but they could only learn a single concept and were not put into practical use.
- In 1979, a computer program built by Hans Berliner defeated the world backgammon champion. (Later, behavior-based robotics, promoted by researchers such as Rodney Brooks and Sutton, developed rapidly as an important branch of artificial intelligence; the self-learning backgammon program developed by Gerald Tesauro and others laid the groundwork for reinforcement learning.)
2.3 Application development period: 1980s
Artificial intelligence entered a new upsurge of application development. Expert systems, which simulate the knowledge and experience of human experts to solve problems in specific domains, realized a major breakthrough for AI from theoretical research to practical application, and from discussion of general reasoning strategies to the use of specialized domain knowledge. Machine learning (especially neural networks) explored different learning strategies and methods and began to recover in a large number of practical applications.
- In 1980, the first International Workshop on Machine Learning was held at Carnegie Mellon University (CMU) in the United States, marking the rise of machine learning research worldwide.
- In 1980, Drew McDermott and Jon Doyle proposed non-monotonic logic, and later robotic systems.
- In 1980, Carnegie Mellon University developed an expert system called XCON for Digital Equipment Corporation, which saved the company an estimated $40 million a year and was a huge success.
- In 1981, R. P. Paul published the first robotics textbook, Robot Manipulators: Mathematics, Programming, and Control, marking the coming of age of robotics.
- In 1982, David Marr's representative work Vision was published, proposing the concept of computer vision and constructing a systematic theory of vision, which has also exerted a profound influence on cognitive science.
- In 1982, John Hopfield invented the Hopfield network, a prototype of the earliest recurrent neural networks. The Hopfield network is a single-layer feedback neural network (neural network structures can be divided into feedforward networks, feedback networks, and graph networks) with feedback connections from output to input. It has been widely applied in machine learning, associative memory, pattern recognition, optimization, and parallel implementations in VLSI and optical devices.
- In 1983, Terrence Sejnowski, Geoffrey Hinton and others invented the Boltzmann machine, also known as the stochastic Hopfield network. It is essentially an unsupervised model used to reconstruct input data and extract data features for predictive analysis.
- In 1985, the Bayesian network was proposed by Judea Pearl, who is known for championing probabilistic approaches to artificial intelligence and developing Bayesian networks, and who is also credited with developing a theory of causal and counterfactual reasoning based on structural models.
A Bayesian network is a model for handling uncertainty that simulates causal relationships in human reasoning; the common naive Bayes classifier, for example, is the most basic application of Bayesian networks.
The topological structure of a Bayesian network is a directed acyclic graph (DAG). The random variables of the system under study are placed in a directed graph according to whether they are conditionally independent, describing the conditional dependencies among them: random variables are drawn as circles and conditional dependencies as arrows, and together they form the Bayesian network. The joint probability of all the variables is obtained by multiplying their local conditional probability distributions. For example, if b depends on a (a → b), c depends on a and b, and a depends on nothing, then P(a, b, c) = P(a) · P(b | a) · P(c | a, b).
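A minimal sketch of this factorization for three binary variables; the probability tables below are made up purely for illustration:

```python
# Bayesian network sketch: joint probability P(a, b, c) = P(a) * P(b|a) * P(c|a, b).
P_a = {1: 0.3, 0: 0.7}
P_b_given_a = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.1, (0, 0): 0.9}        # key: (a, b)
P_c_given_ab = {(1, 1, 1): 0.9, (1, 1, 0): 0.1, (1, 0, 1): 0.5, (1, 0, 0): 0.5,
                (0, 1, 1): 0.4, (0, 1, 0): 0.6, (0, 0, 1): 0.05, (0, 0, 0): 0.95}  # key: (a, b, c)

def joint(a, b, c):
    return P_a[a] * P_b_given_a[(a, b)] * P_c_given_ab[(a, b, c)]

print(joint(1, 1, 1))   # P(a=1, b=1, c=1) = 0.3 * 0.8 * 0.9
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))  # sums to 1.0
```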
- In 1986, Rodney Brooks published the paper “A Robust Layered Control System for a Mobile Robot”, marking the creation of behavior-based robotics, and the robotics community began to pay attention to practical engineering topics.
- In 1986, Geoffrey Hinton and others proposed combining the multi-layer perceptron (MLP) with backpropagation (BP) training (the method, essentially chain-rule gradient computation, still faced many computational challenges at the time). This solved the problem that single-layer perceptrons cannot perform nonlinear classification and opened a new wave of enthusiasm for neural networks.
- In 1986, Ross Quinlan proposed ID3 decision tree algorithm.
A decision tree model can be regarded as a combination of if-then rules; unlike black-box neural network models, it offers good interpretability.
The core idea of ID3 is to build a decision tree with a top-down greedy strategy: at each step a feature is selected and split according to information gain (the reduction in uncertainty about the data D after introducing the information of attribute A; the greater the information gain, the stronger A's ability to distinguish D), and the tree is constructed recursively.
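A minimal sketch of the information-gain computation that ID3 uses to choose splitting attributes, on toy data with a single candidate attribute:

```python
# ID3 sketch: information gain = entropy before the split - weighted entropy after the split.
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute_values, labels):
    h_before = entropy(labels)                      # uncertainty of D before the split
    h_after = 0.0
    for v in set(attribute_values):
        subset = [l for a, l in zip(attribute_values, labels) if a == v]
        h_after += len(subset) / len(labels) * entropy(subset)
    return h_before - h_after                       # reduction in uncertainty

outlook = ["sunny", "sunny", "rain", "rain", "overcast", "overcast"]
play    = ["no",    "no",    "yes",  "yes",  "yes",      "yes"]
print(information_gain(outlook, play))              # larger gain = better splitting attribute
```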
- In 1989, George Cybenko proved the universal approximation theorem. Simply put, a multilayer feedforward network can approximate any function, and its expressive power is comparable to that of a Turing machine. This fundamentally dispelled Minsky's doubts about the expressive power of neural networks.
The universal approximation theorem can be regarded as a basic theory of neural networks: a feedforward neural network can approximate any Borel-measurable function from one finite-dimensional space to another with arbitrary precision, provided it has a linear output layer, at least one hidden layer with a “squashing” activation function (such as the sigmoid), and a sufficient number of hidden units.
- In 1989, Yann LeCun (the “father of CNNs”) combined the backpropagation algorithm with weight-sharing convolutional layers to invent the convolutional neural network (CNN), and for the first time successfully applied it to the handwritten character recognition system of the United States Postal Service.
A convolutional neural network is usually composed of an input layer, convolutional layers, pooling layers, and fully connected layers. Convolutional layers extract local features from the image, pooling layers greatly reduce the number of parameters (dimensionality reduction), and fully connected layers, much like a traditional neural network, produce the desired output.
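A minimal CNN sketch in PyTorch showing the convolution → pooling → fully connected pattern; the layer sizes and input shape are illustrative:

```python
# CNN sketch: convolutional layers extract local features, pooling downsamples,
# and a fully connected layer outputs class scores.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolution: local feature extraction
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling: reduce spatial size / parameters
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # fully connected: 10 class scores
)

x = torch.randn(4, 1, 28, 28)                    # a batch of 4 fake 28x28 grayscale images
print(model(x).shape)                            # torch.Size([4, 10])
```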
2.4 Stable development period: 1990s to 2010
Thanks to the rapid development of Internet technology, innovative research on artificial intelligence accelerated, AI technologies became more practical, and great progress was made in many AI-related fields. In the early 2000s, the focus of AI research shifted from knowledge-based systems to machine learning, as expert-system projects had to encode too many explicit rules, which reduced efficiency and increased costs.
- In 1995, Corinna Cortes and Vladimir Vapnik proposed the classic support vector machine (SVM), which shows many unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems and can be extended to other machine learning problems such as function fitting.
A support vector machine (SVM) can be regarded as an improvement on the perceptron. It is a generalized linear classifier based on the VC-dimension theory of statistical learning and the principle of structural risk minimization. The main differences from the perceptron are: (1) the perceptron merely looks for some hyperplane that separates the samples correctly (there are infinitely many), whereas the SVM looks for the single hyperplane that not only separates the samples correctly but also keeps every sample as far from it as possible (the unique maximum-margin hyperplane), giving the SVM stronger generalization ability; (2) for linearly inseparable problems, instead of adding nonlinear hidden layers as the perceptron does, the SVM uses kernel functions to perform, in effect, a nonlinear transformation of the feature space in which the data become linearly separable.
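A minimal SVM sketch using scikit-learn's SVC with an RBF kernel on ring-shaped, linearly inseparable toy data:

```python
# SVM sketch: a maximum-margin classifier whose kernel handles the nonlinear boundary.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)   # classes separated by a circle, not a line

clf = SVC(kernel="rbf", C=1.0).fit(X, y)            # kernel trick: implicit nonlinear mapping
print(clf.score(X, y))                              # training accuracy
print(len(clf.support_))                            # number of support vectors found
```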
- In 1995, Freund and Schapire proposed the AdaBoost (Adaptive Boosting) algorithm. AdaBoost uses the Boosting ensemble method, a serial combination of weak learners, to achieve better generalization performance. Another important ensemble method is Bagging, a parallel combination represented by the random forest. In terms of bias-variance decomposition, Boosting mainly reduces bias while Bagging mainly reduces variance.
The basic idea of the AdaBoost iterative algorithm is to train a different classifier in each round by adjusting the weight of every training sample (misclassified samples receive higher weights). Finally, each classifier is weighted by its accuracy, and the classifiers are combined into a strong classifier.
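A minimal AdaBoost sketch with scikit-learn, whose default weak learner is a depth-1 decision stump; the dataset is synthetic:

```python
# AdaBoost sketch: weak learners trained serially on reweighted samples, then combined.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(boost.score(X, y))              # accuracy of the combined strong classifier
print(boost.estimator_weights_[:5])   # more accurate weak learners get larger weights
```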
- In 1997, IBM's Deep Blue supercomputer beat the world chess champion Garry Kasparov. Deep Blue achieved chess intelligence essentially by brute force: it generates all possible moves, searches as deeply as possible, and continually evaluates positions to try to find the best move.
- In 1997, Sepp Hochreiter and Jürgen Schmidhuber proposed the long short-term memory network (LSTM).
LSTM is a recurrent neural network (RNN) with a more elaborate structure that introduces a forget gate, an input gate, and an output gate: the input gate determines how much of the current input is written into the cell state, the forget gate determines how much of the previous cell state is kept at the current step, and the output gate controls how much of the current cell state flows into the output. This design alleviates the vanishing gradient problem when training on long sequences.
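A minimal sketch of one LSTM time step in NumPy that makes the three gates explicit; the dimensions and the packed parameter layout are illustrative:

```python
# LSTM cell sketch: forget, input, and output gates control the flow through the cell state.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    z = W @ x + U @ h_prev + b                 # packed pre-activations, shape (4 * hidden,)
    f, i, g, o = np.split(z, 4)
    f = sigmoid(f)                             # forget gate: how much of c_prev to keep
    i = sigmoid(i)                             # input gate: how much new information to write
    o = sigmoid(o)                             # output gate: how much of the cell state to emit
    c = f * c_prev + i * np.tanh(g)            # new cell state
    h = o * np.tanh(c)                         # new hidden state / output
    return h, c

hidden, inputs = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, inputs))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden), W, U, b)
print(h.shape, c.shape)                        # (4,) (4,)
```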
- In 1998, Tim Berners-Lee of the World Wide Web Consortium (W3C) put forward the concept of the Semantic Web. Its core idea is to turn the whole Internet into a universal medium of information exchange based on semantic links by adding machine-understandable metadata to documents on the World Wide Web (such as HTML pages), in other words, to build an intelligent web that enables barrier-free communication between people and computers.
- In 2001, the conditional random field (CRF) was first proposed by John Lafferty and colleagues.
A CRF is a discriminative probabilistic graphical model: given the conditional random field P(Y | X) and an input sequence X, it computes the output sequence Y* with the maximum conditional probability. It performs particularly well in many natural language processing tasks such as word segmentation and named entity recognition.
- In 2001, Leo Breiman proposed the random forest.
Random forest is an ensemble learning method that uses Bagging to combine diverse weak learners (decision trees): it builds multiple well-fitted yet different models and combines their decisions to optimize generalization performance. Diversity reduces dependence on noise in particular features and lowers variance (overfitting), while combining decisions can cancel out the biases of individual learners.
The basic idea of the random forest algorithm is: for each weak learner (decision tree), construct its training set by sampling with replacement and randomly select a subset of features, so that N different weak learners are trained on diverse samples and feature spaces; finally combine the N predictions (classes or regression values), taking the majority class or the average as the final result.
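A minimal random-forest sketch with scikit-learn that makes the bootstrap sampling and random feature subsets explicit; the dataset is synthetic:

```python
# Random forest sketch: many diverse, bootstrapped trees whose votes are combined.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,      # N weak learners (decision trees)
    bootstrap=True,        # resample the training set with replacement for each tree
    max_features="sqrt",   # random feature subset considered at each split
    random_state=0,
).fit(X, y)
print(forest.score(X, y))  # majority vote of the N trees
```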
- In 2003, David Blei, Andrew Ng and Michael I. Jordan proposed latent Dirichlet allocation (LDA).
LDA is an unsupervised method for inferring the topic distribution of documents: the topic of each document in a corpus is given as a probability distribution, which can then be used for topic clustering or text classification.
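A minimal LDA sketch with scikit-learn on a four-document toy corpus; the corpus and topic count are illustrative:

```python
# LDA sketch: infer a per-document distribution over latent topics from word counts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the striker scored a goal in the football match",
    "the team won the match after a late goal",
    "the stock market fell as investors sold shares",
    "share prices rose on the stock exchange today",
]
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts).round(2))   # each row: the document's distribution over 2 topics
```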
- Starting in 2003, Google published its three foundational big data papers, providing solutions to the core problems of big data storage and distributed processing: distributed storage of unstructured files (GFS), distributed computing (MapReduce), and structured data storage (BigTable), laying the theoretical foundation of modern big data technology.
- In 2005, Boston Dynamics introduced BigDog, a dynamically balanced quadruped robot dog versatile enough to adapt to more complex terrain.
- In 2006, Geoffrey Hinton and his student Ruslan Salakhutdinov formally proposed the concept of deep learning, starting the wave of deep learning in academia and industry. 2006 is also known as the first year of deep learning, and Hinton is accordingly called the father of deep learning.
The concept of deep learning originates from research on artificial neural networks. Its essence is to use network structures with multiple hidden layers and large amounts of vector computation to learn higher-order representations of the information inside the data.
- In 2010, Sinno Jialin Pan and Qiang Yang published “A Survey on Transfer Learning”.
Generally speaking, transfer learning uses existing knowledge (such as trained network weights) to learn new knowledge adapted to a specific target task; the core is to find the similarity between the existing knowledge and the new knowledge.
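A minimal transfer-learning sketch in PyTorch: reuse a pretrained backbone, freeze it, and retrain only a new head. The backbone choice and the 5-class head are illustrative; older torchvision versions use `pretrained=True` instead of `weights=`:

```python
# Transfer learning sketch: existing knowledge (pretrained weights) reused for a new task.
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights="IMAGENET1K_V1")   # weights learned on ImageNet
for param in backbone.parameters():
    param.requires_grad = False                       # freeze the transferred layers

backbone.fc = nn.Linear(backbone.fc.in_features, 5)   # new head for a 5-class target task
# Only backbone.fc.parameters() would now be trained on the (usually small) target dataset.
```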
2.5 Flourishing Period: from 2011 to present
With the development of big data, cloud computing, the Internet, the Internet of Things and other information technologies, ubiquitous sensing data and computing platforms such as graphics processors have driven the rapid development of AI technologies represented by deep neural networks, greatly narrowing the gap between science and application. AI technologies such as image classification, speech recognition, knowledge question answering, human-machine game playing, and autonomous driving have achieved major breakthroughs and ushered in a new explosive upsurge.
- In 2011, IBM Watson won the Jeopardy! quiz show. Watson is a computer question-answering system that combines natural language processing, knowledge representation, automated reasoning, and machine learning.
- In 2012, Hinton and his student Alex Krizhevsky won the ImageNet competition with the AlexNet neural network model, the first time a model had performed so well on the ImageNet dataset, sparking a new wave of neural network research.
AlexNet is a classic CNN model with significant improvements on the data, algorithm, and computing power fronts: it innovatively applied data augmentation, ReLU, Dropout, and LRN, and used GPUs to accelerate network training.
- In 2012, Google officially released the Google Knowledge Graph, a knowledge base Google assembles from many sources of information. The Knowledge Graph overlays relationships on ordinary string search, helping users find the information they need more quickly while taking a step toward knowledge-based search and improving the quality of Google Search.
A knowledge graph is a structured semantic knowledge base, a representative method of symbolism, used to describe concepts in the physical world and their relationships in symbolic form. Its basic units are RDF triples (entity-relation-entity); entities are connected to one another through relations, forming a networked knowledge structure.
- In 2013, Diederik (Durk) Kingma and Max Welling proposed the variational autoencoder (VAE) in the ICLR paper “Auto-Encoding Variational Bayes”.
The basic idea of the VAE is to use an encoder network to map real samples to an idealized data distribution, and then pass that distribution to a decoder network to construct generated samples; training aims to make the generated samples sufficiently close to the real ones.
- In 2013, Tomas Mikolov of Google proposed the classic Word2Vec model in “Efficient Estimation of Word Representations in Vector Space” to learn distributed word representations; thanks to its simplicity and efficiency, it attracted great attention from industry and academia.
The basic idea of Word2Vec is to learn the relationship between each word and its neighboring words so as to represent words as low-dimensional, dense vectors. Such distributed representations capture the semantic information of words: intuitively, semantically similar words end up close to each other in the vector space.
The Word2Vec network is a shallow neural network (input layer → linear fully connected hidden layer → output layer). Depending on the training objective it comes in two variants: the CBOW model (use a word's neighboring words as input to predict the word itself) and the Skip-Gram model (use a word as input to predict its neighboring words).
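A minimal Word2Vec sketch with gensim in skip-gram mode; the corpus is illustrative, and older gensim versions call the dimension parameter `size` rather than `vector_size`:

```python
# Word2Vec sketch: learn dense word vectors from word/neighbor co-occurrence (skip-gram).
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "mat"],
]
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1)  # sg=1 -> skip-gram
print(model.wv["king"].shape)                 # a dense 16-dimensional word vector
print(model.wv.most_similar("cat", topn=2))   # nearby words in the embedding space
```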
- In 2014, the chat program “Eugene Goostman” “passed” the Turing Test for the first time at the Turing Test 2014 event held at the Royal Society.
- In 2014, Ian Goodfellow, Yoshua Bengio and others proposed the generative adversarial network (GAN), hailed as one of the coolest neural network ideas of recent years.
A GAN is built on the idea of a two-player adversarial game and consists of two parts, a generator network (G) and a discriminator network (D). The generator learns a mapping G: Z→X (input noise Z, output faked data X), while the discriminator judges whether its input comes from real data or from the generator. Through this adversarial training, the generative and discriminative abilities of both models improve.
- In 2015, LeCun, Bengio, and Hinton (who would share the 2018 Turing Award) published “Deep Learning”, a joint review of deep learning, to mark the 60th anniversary of the concept of artificial intelligence.
The “Deep Learning” review points out that deep learning is a feature learning method that transforms raw data into higher-level, more abstract representations through the composition of simple but nonlinear models, strengthening the ability to discriminate among inputs. With enough such transformations, very complex functions can be learned.
- In 2015, the residual network (ResNet) proposed by Kaiming He and colleagues at Microsoft Research won the image classification and object detection tasks of the ImageNet Large Scale Visual Recognition Challenge.
The main contribution of the residual network is identifying the “degradation” problem caused by non-identity transformations in very deep networks and introducing “shortcut connections” to address it, which also alleviates the vanishing-gradient problem that comes with increasing depth.
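A minimal sketch of a residual block, omitting batch normalization for brevity; the output is F(x) + x through the shortcut connection:

```python
# Residual block sketch: the shortcut "+ x" lets gradients bypass the stacked convolutions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))  # F(x): the residual mapping
        return self.relu(out + x)                   # shortcut connection

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)                   # shape preserved: [1, 16, 32, 32]
```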
- In 2015, Google open-sourced its TensorFlow framework, a symbolic mathematics system based on dataflow programming that is widely used to implement all kinds of machine learning algorithms; its predecessor was DistBelief, Google's neural network algorithm library.
- In 2015, Elon Musk and others co-founded OpenAI, a non-profit research organization whose mission is to ensure that artificial general intelligence, highly autonomous systems that outperform humans at most economically valuable tasks, benefits all of humanity. It has released popular products such as OpenAI Gym and GPT.
- In 2016, Google introduced federated learning, which trains algorithms across multiple decentralized edge devices or servers holding local data samples without exchanging those samples.
The three most important privacy-preserving technologies in federated learning are differential privacy, homomorphic encryption, and private set intersection. Federated learning enables multiple participants to build a common, powerful machine learning model without sharing data, addressing key issues such as data privacy, data security, data access rights, and access to heterogeneous data.
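A minimal sketch of the federated-averaging idea, using plain linear-regression gradient descent as a stand-in for each client's local training; all names and numbers are illustrative:

```python
# Federated averaging sketch: clients train locally; only weights are averaged by the server.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=10):
    w = weights.copy()
    for _ in range(epochs):                       # local training on private data
        w -= lr * X.T @ (X @ w - y) / len(y)      # least-squares gradient step
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):                                # 3 clients, each holding private data
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=50)))

global_w = np.zeros(2)
for _ in range(20):                               # communication rounds
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)          # server averages the returned weights
print(global_w)                                   # approaches [2, -1] without pooling raw data
```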
- In 2016, AlphaGo beat the world champion and professional nine-dan Go player Lee Sedol by an aggregate score of 4-1.
AlphaGo is a Go-playing artificial intelligence program. Its main working principle is deep learning, and it consists of four main components: a policy network, which predicts and samples the next move given the current position; a fast rollout policy, with the same goal as the policy network but roughly 1000 times faster at a modest sacrifice in move quality; a value network, which estimates the win rate of the current position; and Monte Carlo tree search, which ties these together to estimate the win rate of each candidate move.
AlphaGo Zero, released in 2017, built on previous versions and trained itself purely through reinforcement learning. Given only the rules of the game and no human game records, it learned entirely through its own play and exploration and formed its own judgments; as the self-play games accumulated, the neural network was gradually adjusted to improve its probability of winning. What is more, as training went on, AlphaGo Zero independently discovered principles of the game and came up with new strategies, bringing new insights to the ancient game of Go.
- In 2017, Sophia, a humanoid robot developed by Hanson Robotics of Hong Kong, China, became the first robot in history to be granted citizenship. Sophia looks like a human female, with rubber skin, and can display more than 62 natural facial expressions; the algorithms in its “brain” can understand language, recognize faces, and interact with people.
- In 2018, Google published “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, introducing the BERT (Bidirectional Encoder Representations from Transformers) model, which achieved state-of-the-art results on 11 NLP tasks.
BERT is a pre-trained language representation model that learns dynamic word representations from large corpora with unsupervised learning. It is based on the attention-driven Transformer model, which, compared with RNNs, is more efficient and captures longer-range dependencies. In addition, instead of the traditional unidirectional language model or a shallow concatenation of two unidirectional language models, it is pre-trained with a new masked language model (MLM) objective, producing deep bidirectional language representations.
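A minimal usage sketch of a pretrained BERT with its masked-language-model head via the Hugging Face transformers library; the example sentence is illustrative and the model is downloaded on first use:

```python
# Masked language model sketch: BERT predicts the word hidden behind [MASK].
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))  # top predictions and scores
```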
- In 2019, IBM announced the launch of Q System One, the world's first integrated general-purpose approximate quantum computing system designed for scientific and commercial use.
- In 2019, a team from Insilico Medicine and the University of Toronto achieved a major experimental breakthrough: using deep learning and generative-model techniques, they discovered several drug candidates, demonstrating the effectiveness of AI molecular-discovery strategies and largely addressing the difficult, time-consuming problem of identifying candidate molecules in traditional drug development.
- In 2020, Google and Facebook proposed the SimCLR and MoCo unsupervised learning algorithms respectively, both of which learn image representations from unlabeled data. The framework behind both is contrastive learning, whose core training signal is the “distinguishability” between images.
- In 2020, OpenAI released the text-generation AI GPT-3, a natural language deep learning model with 175 billion parameters, more than 100 times its predecessor GPT-2. Pre-trained on nearly half a trillion words, it achieves state-of-the-art performance on several NLP benchmarks (question answering, translation, article writing).
- In 2020, Neuralink, Musk's brain-computer interface (BCI) company, hosted a live demonstration showing the brain activity of experimental pigs implanted with Neuralink devices.
- In 2020, Google DeepMind's AlphaFold2 system effectively solved a landmark problem in protein structure prediction. It far outperformed the other entrants at the international protein structure prediction competition (CASP), predicting the three-dimensional structure of proteins with an accuracy comparable to experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance, or X-ray crystallography.
- In 2020, Pan Jianwei and colleagues at the University of Science and Technology of China built Jiuzhang, a 76-photon quantum computing prototype. It takes only 200 seconds to solve the “Gaussian boson sampling” problem, a task that would take the world's fastest supercomputer 600 million years.
- In 2021, OpenAI proposed two neural networks connecting text and images: DALL·E and CLIP. DALL·E generates images directly from text, while CLIP matches images with text categories.
- In late March 2021, the AI research collective EleutherAI released its open-source text AI model GPT-Neo; unlike GPT-3, it is open source and free.
- In 2021, researchers at Stanford University developed a brain-computer interface (BCI) for typing that decodes the imagined handwriting movements of a paralyzed patient from neural activity in the motor cortex and converts them into text in real time with a recurrent neural network (RNN) decoder. The results were published in Nature on May 13, 2021 under the title “High-performance brain-to-text communication via handwriting”.
III. Future trends of AI
Artificial intelligence has three elements: data, computing power and algorithms. Data is the raw material of knowledge, while computing power and algorithms provide the “computational intelligence” to learn knowledge and achieve specific goals. The 60-plus years of AI's technological development can be attributed to the development of algorithms, computing power, and data. What, then, will the trends of AI development be in the foreseeable future?
3.1 Data level
Data is the basic element for mapping the real world into the virtual world, and as data volume grows exponentially, the territory of the virtual world keeps expanding. Unlike open-source AI algorithms, key data is often not open, and data privacy and private data domains are a growing trend. Data is to AI applications what traffic is to the Internet: a moat. Only with core data can one have key AI capabilities.
3.2 Computing power level
Reason is nothing but reckoning — Thomas Hobbes
Computing is key to AI, and the wave of deep learning since the 2010s owes much to advances in computing power.
- Quantum computing development
At a time when progress in computing chips under Moore's Law is slowing, and slowing gains in computing power would limit future AI technologies, quantum computing offers a new leap in capability. A quantum computer's computing power grows exponentially with its number of qubits, so its growth rate will far exceed the growth of data volume, providing a strong hardware foundation for artificial intelligence in the era of data explosion.
- Edge computing development
As a complement to and optimization of cloud computing, edge computing is moving part of AI from the cloud to the edge, into ever smaller Internet of Things devices. Because these IoT devices tend to be very small, lightweight machine learning (TinyML) is favored for its power consumption, latency, and accuracy trade-offs.
- Development of brain-like computing
Brain-like computing systems built around brain-inspired computing chips are gradually showing advantages in handling certain intelligent problems and in low-power intelligent computing. The design of brain-inspired chips will draw inspiration from the design methodology and development history of existing processors and realize complete hardware functionality based on the theory of computational completeness combined with application requirements. Meanwhile, basic software for brain-like computing will integrate existing brain-inspired programming languages and frameworks, allowing brain-like computing systems to evolve gradually from “special purpose” to “general purpose”.
- Artificial intelligence computing centers become key infrastructure of the intelligent era
The AI Computing Center is based on the latest AI theory and adopts the leading AI computing architecture. It is a “four-in-one” comprehensive platform integrating public computing power services, data open sharing, intelligent ecological construction and industrial innovation gathering. It can provide computing power, data and algorithms and other AI full-stack capabilities. It is a new computing power infrastructure for the rapid development and application of artificial intelligence. In the future, with the continuous development of intelligent society, artificial intelligence computing centers will become the key information infrastructure to promote the deep integration of digital economy and traditional industries, accelerate industrial transformation and upgrading, and promote high-quality economic development.
3.3 Algorithm level
- Machine learning automation (AutoML) development
Automated machine learning (AutoML) addresses the core questions of which machine learning algorithm to use on a given data set, whether and how to preprocess its features, and how to set all the hyperparameters. As machine learning makes great strides in many application areas, the demand for machine learning systems keeps growing, along with the hope that machine learning applications can be built and used automatically. AutoML and MLOps will significantly reduce manual training and deployment work, allowing engineers to focus on the core solution.
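A toy stand-in for the hyperparameter-search part of AutoML, using plain grid search with cross-validation; real AutoML systems also automate algorithm selection and feature preprocessing, usually with smarter search strategies:

```python
# AutoML-style sketch: automatically choose hyperparameters by cross-validated search.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5, None]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```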
- Evolution towards distributed privacy protection
At present, many countries and regions have introduced data regulations, such as HIPAA (the US Health Insurance Portability and Accountability Act) and GDPR (the EU General Data Protection Regulation), which strictly restrict the exchange of private data among institutions. Distributed privacy-preserving machine learning (federated learning) protects the input data of model training through encryption, distributed storage, and other methods; it is a feasible way to break down data silos and carry out joint multi-institution training and modeling.
- Data and mechanism fusion
The development of AI models follows the law of simplicity and elegance. Data-driven modeling summarizes patterns from data and pursues effectiveness in practical applications; mechanism-driven modeling derives from basic physical laws and pursues concise, elegant expression.
A good, mainstream model is usually one that both highly summarizes the patterns in the data and fits the underlying mechanism; it is “elegant” because it gets to the heart of the problem. Like good scientific theories, such models tend to be concise, which helps with both the speed of convergence and generalization.
- Structural development of neural network model
The evolution of neural networks has proceeded along the direction of modularization plus layering, continually combining multiple modules that each undertake relatively simple tasks.
A neural network detects basic features with its lower-level modules and higher-order features at higher levels. Both multilayer feedforward networks and convolutional neural networks embody this modularity (Hinton's “capsule” networks of recent years are a further development of it). Because the problems we deal with (images, speech, text) often have a natural modular structure, a learning network whose modularity matches the inherent modularity of the problem achieves better results.
Layering is not merely the topological stacking of networks; it also requires upgrades to the learning algorithm, since simply adding layers can make the gradients of a BP-trained network vanish.
- Integrated development of multi-school methods
Combining methods from multiple schools can make their strengths and weaknesses complementary. For example: (1) combining Bayesian methods with neural networks, as in the deep Gaussian processes of Neil Lawrence's group, which replace neural network layers with simple probability distributions; (2) combining symbolism, ensemble learning, and neural networks, as in Zhou Zhihua's deep forest; (3) combining symbolism with neural networks by integrating knowledge bases (knowledge graphs) into neural networks, as in GNNs and knowledge graph representation learning; (4) combining neural networks with reinforcement learning, as in Google's AlphaGo based on DNNs plus reinforcement learning, which brought AI's performance on complex tasks close to that of humans.
- Development based on large-scale unsupervised pre-training
If intelligence is a cake, the bulk of the cake is unsupervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning (RL). – Yann LeCun
Supervised learning requires sufficient labeled data, but manually labeling large amounts of data is time-consuming and laborious, and in some fields (such as medicine) sufficient labels are almost impossible to obtain. Exploiting the large amounts of unlabeled data available in practice through large-scale unsupervised pre-training is therefore a research hotspot; for example, the emergence of GPT-3 has stimulated continued exploration of large-scale self-supervised pre-training methods. In the future, cross-modal self-supervised pre-training models based on large-scale image, speech, video and other multi-modal data will develop further, and the cognitive and reasoning abilities of models will keep improving.
- Development based on causal learning methods
Most current AI models focus on correlations among data features, but correlation is not equivalent to the more fundamental causal relationship; this can bias prediction results, weaken robustness against adversarial attacks, and leave models lacking interpretability. In addition, such models rely on the independent and identically distributed (i.i.d.) assumption, which in many real situations does not hold: when the test data come from a different distribution than the training data, statistical learning models often perform poorly. This is exactly what causal inference studies: how to learn a causal model that works under different distributions and contains causal mechanisms, and how to use that causal model for interventional or counterfactual reasoning.
- Development of explainable AI (XAI)
Explainable AI is likely to become central to machine learning in the future: as models become more complex, identifying simple, interpretable rules becomes more difficult. Explainable AI (XAI) means that AI operates transparently, making it easy for humans to supervise and accept it, thereby ensuring the fairness, security, and privacy of algorithms.
Afterword
As data, computing power and algorithms continue to make breakthroughs, artificial intelligence may enter a perpetual spring. This article looks at AI trends mainly from a technological perspective, which is inevitably one-sided. Although technology is a primary productive force with its own laws of development, it cannot be ignored that technology serves the demand of the market; technology combined with stable market demand is what actually steers technological development.
This article was originally published on “Advanced Algorithms” and can also be read on the author's GitHub blog.