Summary: It has been 65 years since the Dartmouth workshop, and AI, especially since the recent rise of deep learning, has enjoyed unprecedented prosperity. In the last two years, however, the AI boom in China seems to have receded, with challenges in both theoretical breakthroughs and practical applications. There are many voices of criticism and doubt from the outside world, and even some AI practitioners are frustrated. This article is the author’s personal review of the development of AI: where we now stand in its history, and where its future lies.

Author: Jin Rong | Source: Alibaba Technology official account

Key points:

1. The era of AI has just begun. AI is still in its infancy, much like electricity when Faraday had only just discovered electromagnetic induction.

2. AI research in the form of deep learning has made many impressive advances in recent years, but some of those advances also owe something to luck.

3. Having hit a bottleneck, deep learning has three possible breakthrough directions: a fundamental understanding of deep learning, self-supervised and few-shot learning, and the organic integration of knowledge and data.

4. The biggest opportunity for AI at present: AI for Science.

Introduction

It has been 65 years since the Dartmouth workshop, and especially in recent years, with the rise of deep learning, AI has flourished like never before. However, in the last two years the AI boom in China seems to have receded, with challenges in both theoretical breakthroughs and practical applications. There are many voices of criticism and doubt from the outside world, and even some AI practitioners are frustrated.

I have been fortunate to be an AI researcher since the 1990s, when I started my PhD at Carnegie Mellon University, and have witnessed the ups and downs of the field. In this article, I will attempt to review the development of AI from a personal perspective, examine where we are now in history, and explore where the future of AI really lies.

Jin Rong

AI’s historical stage: the manual workshop

While some call this the third or even fourth wave of AI and optimistically declare that the era of AI has arrived, I am a little more cautious: AI certainly has great potential, but judged by what we are capable of right now, it is still in its infancy, a technology rather than a science. This is not just a problem for Chinese AI but for AI worldwide.

The rapid development of deep learning in recent years has greatly changed the face of the AI industry, making AI a technology the public uses in daily life. There are even some striking AI applications that lead people to mistakenly believe science-fiction movies will soon become reality. The reality is that technological development accumulates over a long period. At present, AI is only in its infancy, and the era of AI has just begun.

If we compare the age of AI to the age of electricity, today’s AI technology is at Faraday’s stage. Faraday’s discovery of electromagnetic induction, which led to the first prototype of an alternating-current generator, was by no means a small achievement. But Faraday and his fellow pioneers, building new devices by hand through long observation and trial and error, were only the beginning of the electric age. The truly great development of that age owed much to the theory of electromagnetic fields: Maxwell turned practical experience into scientific theory, proposing and proving the epoch-making Maxwell’s equations.

The electrical revolution would not have happened if our understanding of electromagnetism had stayed at Faraday’s level. If wind, rain, thunder, or even changes in temperature could cause power outages, how could electricity have become a universal product and a piece of social infrastructure? And how could the variety of electrical, electronic, and communications products that completely changed our way of life ever have appeared?

This is exactly the problem with AI today: it is limited to particular scenarios and particular data. Once out of the laboratory, AI models are often invalidated by the interference and challenges of the real world; their robustness is insufficient. And once the scene changes, the algorithm must be deeply customized and adapted, which is time-consuming and laborious, hard to scale, and limited in generalization ability.

That is because today’s AI is largely empirical. AI engineers are like Faraday: they can build AI products, but they do not know why those products work and do not understand the underlying principles.

So why hasn’t AI become a science yet?

The answer is that technology moves much more slowly than we think. Looking back over the past 30 years since the 1990s, we see rapid progress in AI application engineering, while breakthroughs on core technologies and core problems have been relatively limited. Some technologies that seem to have emerged only recently in fact already existed.

Take autonomous driving as an example. The ALVINN project at Carnegie Mellon University in the United States began using neural networks for autonomous driving in the late 1980s; in 1995 its vehicle successfully traveled nearly 3,000 miles across the United States from coast to coast in seven days. In board games, TD-Gammon, developed at IBM in 1992, was similar in spirit to AlphaZero, learning through self-play and reinforcement to reach the level of a backgammon master.

However, limited by data and computing power, these studies remained isolated points rather than reaching scale, and they certainly did not arouse widespread public discussion. Today, thanks to commercial ubiquity, increased computing power, easy access to data, and lower barriers to application, AI is within everyone’s reach.

But the core idea has not fundamentally changed. We are still trying to describe the world by using finite samples to approximate a function: there is an input and there is an output, and we model AI’s learning process as function approximation, including the entire algorithmic and training pipeline, such as gradient descent and backpropagation.
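As a toy illustration of this function-approximation view (a minimal sketch with made-up data, not any particular production setup): given only a finite sample of input-output pairs from an unknown function, gradient descent adjusts a parameterized model to approximate it.

```python
import numpy as np

# Toy illustration of learning as function approximation: we only see
# finite samples (x, y) of an unknown target function, and fit a
# parameterized model (here a cubic polynomial) by gradient descent.
rng = np.random.default_rng(0)

x = rng.uniform(-2, 2, size=50)   # finite sample of inputs
y = np.sin(x)                     # outputs of the unknown target function

X = np.stack([x**k for k in range(4)], axis=1)  # features 1, x, x^2, x^3
w = np.zeros(4)                                 # model parameters

lr = 0.01
for _ in range(5000):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
    w -= lr * grad

mse = np.mean((X @ w - y) ** 2)
print(f"training MSE: {mse:.4f}")
```

The entire "learning" process here is nothing more than iteratively reducing the approximation error on the finite sample, which is the core loop shared by models from the 1990s and by today's deep networks.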

Likewise, the core problems have not been effectively solved. The core questions being asked in the 1990s remain unanswered, and they are all closely related to neural networks and deep learning. For example, when optimizing a non-convex function, the solution obtained is likely to be a local rather than a global optimum; training may fail to converge; and limited data brings the problem of insufficient generalization. Could this framing be biasing us and causing us to overlook other possibilities?
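The local-optimum problem can be seen concretely in one dimension (the function below is an arbitrary made-up example): gradient descent started from different points settles into different minima, only one of which is global.

```python
# Gradient descent on a simple non-convex function f(x) = x^4 - 4x^2 + x.
# It has two basins of attraction; depending on the starting point,
# descent ends in the global minimum (near x = -1.47) or a worse local
# minimum (near x = 1.35).
def f(x):
    return x**4 - 4 * x**2 + x

def grad(x):
    return 4 * x**3 - 8 * x + 1

def descend(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_left = descend(-2.0)   # lands in the global minimum
x_right = descend(2.0)   # lands in the inferior local minimum
print(f(x_left), f(x_right))
```

In deep networks the loss surface is vastly higher-dimensional, but the underlying difficulty, that gradient descent only guarantees a local stationary point, is exactly this one.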

Deep learning: hitting a bottleneck after great prosperity

Needless to say, AI research as represented by deep learning has made amazing progress in recent years. For example, work on training complex networks has produced two particularly successful architectures, the CNN and the Transformer. Building on deep learning, AI researchers have made rapid advances in fields such as speech, language, and vision, solving many practical problems and creating great social value.

But looking back at the development of deep learning, AI practitioners have been very lucky.

The first stroke of luck is stochastic gradient descent (SGD), which greatly propelled the development of deep learning. SGD is in fact a very simple method with serious limitations; in optimization it is considered a slowly converging method, yet in deep networks it performs surprisingly well. Why is it so good? So far, researchers have no complete answer. Other, less conspicuous strokes of good luck include residual networks, knowledge distillation, Batch Normalization, Warmup, Label Smoothing, Gradient Clipping, and Layer Scaling. Some of these, remarkably, generalize well enough to be used across many scenarios.
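A stripped-down sketch of what SGD does (on synthetic least-squares data, not a deep network): each update uses the gradient of a single randomly drawn sample, so the steps are cheap but noisy, yet the iterates still converge to the minimizer.

```python
import numpy as np

# Stochastic gradient descent on a least-squares problem: each step uses
# only one randomly sampled data point, giving a cheap but noisy gradient.
rng = np.random.default_rng(1)

n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true                  # labels generated by a known linear model

w = np.zeros(d)
lr = 0.01
for _ in range(5000):
    i = rng.integers(n)                # draw one sample at random
    g = 2 * (X[i] @ w - y[i]) * X[i]   # gradient of that sample's squared error
    w -= lr * g

print(np.linalg.norm(w - w_true))      # near zero: SGD found the minimizer
```

On this convex toy problem the convergence is well understood; the puzzle the text refers to is why the same noisy procedure works so well on highly non-convex deep networks.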

Moreover, machine learning researchers have always been wary of overfitting. When a model has too many parameters, a curve that fits every training point perfectly is usually a warning sign, yet in deep learning this no longer seems to be a problem. Many researchers have discussed why, but there is no clear answer yet. Even more surprisingly, if we give the data completely random labels, the network still fits them perfectly, driving the training error to zero. By standard theory, this means the model has enough capacity to explain any outcome. Think about it: is a model that can explain anything really reliable, a cure-all?

(Understanding Deep Learning Requires Rethinking Generalization. ICLR, 2017.)
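The random-label phenomenon can be reproduced in miniature (this sketch uses a plain overparameterized linear model, not the deep networks of the paper): with more parameters than training points, even pure noise labels are fit exactly.

```python
import numpy as np

# Miniature random-label experiment: an overparameterized linear model
# (200 parameters, 50 training points) fits completely random labels
# with essentially zero training error.
rng = np.random.default_rng(0)

n, d = 50, 200
X = rng.normal(size=(n, d))          # random features
y = rng.choice([-1.0, 1.0], size=n)  # random labels carrying no signal

# Least squares returns an interpolating (minimum-norm) solution
w, *_ = np.linalg.lstsq(X, y, rcond=None)
train_err = np.mean((X @ w - y) ** 2)
print(f"training error: {train_err:.2e}")
```

Zero training error on pure noise is exactly the situation classical generalization theory warns against, which is why deep learning's good test performance in the same regime remains so puzzling.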

At this point, let us step back and review the development of machine learning as a whole, the better to understand deep learning today.

Machine learning developed in several waves, beginning with rule-based learning in the 1980s and 1990s. From the 1990s to the 2000s the focus was mainly neural networks: it was discovered that they could do some impressive things, but they left many fundamental questions unanswered. So after the 2000s a group of researchers set out to solve those basic problems, the most famous result being the Support Vector Machine (SVM). Researchers with mathematical backgrounds concentrated on understanding the process of machine learning and its most basic mathematical questions: how to better approximate a function, how to guarantee fast convergence, and how to guarantee generalization.

In that era, researchers placed great emphasis on understanding: good results were expected to come from a deep understanding of the problem. They cared greatly about whether a method had a sound theoretical foundation, because analyzing an algorithm well required a deep grasp of functional analysis and optimization theory, and then of generalization theory. One had to be strong in all of these to have a say in machine learning; otherwise one could not even follow the papers. And a researcher who wanted to build a large-scale experimental system, especially a distributed one, needed rich engineering experience, because little was available off the shelf: there was plenty of theory, but most engineering had to be implemented by hand.

But in the era of deep learning, excellent frameworks appeared that made life easier for all researchers and lowered the bar, which is a genuinely great thing and has driven the industry forward very fast. Today, if you have a good idea, you can do deep learning with a few dozen, or even a dozen, lines of code. Thousands of people are experimenting with new projects and testing new ideas, and the results are often truly amazing.

But we may need to recognize that deep learning has now hit a serious bottleneck. The good fortune that helped it succeed, and its inexplicable black-box nature, have become hindrances to its further development.

Three possible directions for next generation AI

What is the future of AI? What will the next generation of AI look like? It is hard to give a definite answer today, but I believe there are at least three directions worth exploring for breakthroughs.

The first is to seek a fundamental understanding of deep learning.

Only in this way can AI become a science. Specifically, it should include breakthroughs on the following key issues:

  • A more complete characterization of DNN-based function spaces;
  • An understanding of SGD (or, more generally, of first-order optimization algorithms);
  • A rethinking of the foundations of generalization theory.

The second direction is the organic integration of knowledge and data.

Humans make many decisions using knowledge as well as data. If AI could organically integrate structured knowledge as a core component, it would be bound to make breakthrough progress. Researchers have long worked on knowledge graphs, but the organic combination of knowledge and data still lacks a usable framework. There have been innovative attempts, such as Markov Logic, which combines first-order logic with probabilistic graphical models to form some interesting structures.

The third important direction is self-supervised learning and few-shot learning.

I put this third on my list, but it is the direction most worth focusing on right now if we want to bridge the gap between AI and human intelligence.

Today we often hear that AI can outperform humans at certain skills, such as speech recognition and image recognition. Recently, DAMO Academy’s AliceMind outscored humans in visual question answering for the first time, but this does not mean AI is smarter than humans. Google published a very insightful paper, On the Measure of Intelligence, in 2019. Its core point is that real intelligence is not merely superb skill; more important is whether one can learn quickly, adapt quickly, and apply what is learned quickly.

By this measure, AI today is far inferior to humans. It may surpass human accuracy in some domains, but its range of applicability is very narrow. The fundamental reason is that humans, especially intelligent ones, can achieve results quickly at very low learning cost, and this is one of the main differences I see between AI and humans right now.

A very simple fact shows that AI is not as intelligent as humans. Take translation as an example: a good translation model today needs on the order of 100 million training sentence pairs. If a book runs to a few hundred thousand words, that means the AI in effect reads tens of thousands of books. It is hard to imagine a person needing to read tens of thousands of books to learn a language.
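A back-of-the-envelope version of that comparison (the average sentence length and book length below are illustrative assumptions, not figures from the article):

```python
# Rough scale check: how many books' worth of text is 100 million
# parallel sentences? Sentence and book lengths are assumed values.
sentence_pairs = 100_000_000   # ~1e8 training sentence pairs
words_per_sentence = 20        # assumed average sentence length
words_per_book = 200_000       # a book of "a few hundred thousand words"

total_words = sentence_pairs * words_per_sentence
books_equivalent = total_words // words_per_book
print(books_equivalent)        # on the order of ten thousand books
```

Even with conservative assumptions, the model's training diet is several orders of magnitude beyond what any human reads to acquire a language.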

Another interesting contrast is between the structure of neural networks and the human brain. Current AI is obsessed with depth: neural networks often have dozens or even hundreds of layers, while the human visual pathway has only about four layers, yet is extremely efficient. The human brain is also very low-power, consuming only about 20 watts, whereas today’s GPUs are basically hundreds of watts, an order of magnitude more. Training the famous GPT-3 once has been estimated to have the carbon footprint of a 747 flying three round trips between the east and west coasts of the United States. In terms of information coding, the brain encodes in temporal sequences of spikes, while AI represents information as tensors and vectors.

It may be said that AI development need not follow the direction of human intelligence. I agree there is some truth in this, but when AI is stuck and there is no other benchmark to compare against, it may be instructive to look at human intelligence. For example, compared with the brain, are today’s deep neural networks really the most sensible direction? Is today’s information coding the most sensible scheme? These are the foundations of today’s AI, but are they good foundations?

It should be said that large models, as represented by GPT-3, may also be a breakthrough direction for deep learning, realizing self-supervised learning to a certain extent. A large model is a bit like the old approach of memorizing everything one sees, so that when a new scene appears, little new data is needed. But is this the best solution? We do not know yet. Taking translation again as an example, it is hard to imagine that one needs to carry so much around to master a foreign language. Large models now start at tens of billions, even hundreds of billions, of parameters, and no human carries that much data.

So maybe we need to keep exploring.

Opportunities for AI: AI for Science

At this point, some people may be disappointed: since we have not yet solved these three problems and AI is not yet a science, what is the value of AI?

Technologies, such as the Internet, can be hugely valuable in their own right, reshaping how we work and live. The big opportunity for AI as a technology is AI for Science. AlphaFold has already shown us this: AI helped crack a protein folding puzzle that had plagued biology for half a century.

We should learn from AlphaFold, but there is no need to worship it. What AlphaFold demonstrates is that DeepMind is very good at choosing problems: it picks questions that today have enough foundations and data to be solvable, and then assembles the best team in the world to solve them.

We have the potential to produce results even more important than AlphaFold, because the natural sciences are full of important open problems. AI has even greater opportunities in discovering new materials, determining crystal structures, even proving or discovering theorems. AI could upend traditional research methods and even rewrite history.

For example, some physicists are now asking: can AI rediscover the laws of physics? For hundreds of years, the discovery of physical laws depended on genius: Einstein discovered the special and general theories of relativity; Heisenberg, Schrödinger, and others pioneered quantum mechanics. Without these geniuses, many fields would have been delayed by decades or even centuries. But today, with more data and ever more complex scientific laws, can we rely on AI, rather than on one or two geniuses, to derive the laws of physics?

Take quantum mechanics, for example. At its heart is the Schrödinger equation, derived by a genius physicist. But now physicists have used the vast amounts of data they have collected to let AI derive such laws automatically, and have even discovered an alternative way of writing the Schrödinger equation. This is truly remarkable and could change the future of physics, even of humanity.

The AI EARTH project we are advancing brings AI into the field of meteorology. Weather forecasting, now hundreds of years old, is a very large and complex scientific problem: it requires supercomputers to perform enormous calculations, consumes vast resources, and is still not particularly accurate. Can we use AI to make weather forecasting both efficient and accurate? If so, it will be very exciting. Of course, this is bound to be a very difficult process, requiring time and determination.

AI practitioners: more genuine interest, less utilitarianism

The current state of AI is a test for all of us AI researchers. Neither fundamental theoretical breakthroughs in AI nor AI’s solutions to scientific problems can be achieved overnight; researchers must be both smart and determined. If you are not smart enough, you cannot seize opportunities in an uncertain future; if you are not determined enough, you are likely to be scared off.

The deep learning boom of recent years has brought a flood of talent and capital into China’s AI field, rapidly advancing the industry, but it has also bred some unrealistic expectations. After DeepMind built AlphaGo, some people in China followed and imitated it, but the contribution to core fundamental innovation was relatively limited.

Since AI is not yet a science, we will be exploring things no one has done before, and we will probably fail. This means we must have genuine interest, letting interest and curiosity drive us forward through countless failures. We have seen DeepMind produce AlphaGo and AlphaFold, but there are likely many more failed projects that no one ever hears about.

When it comes to being driven by interest, researchers abroad deserve our study. Some top scientists, Turing Award winners among them, still do research on the front line every day, deriving theory by hand. I remember that when I was studying at CMU there were several Turing Award winners at the school, and they regularly attended seminars. I knew one of them, Manuel Blum, who won the Turing Award for his work in cryptography; at one seminar I found him sitting on the classroom steps because there was no seat left. He did not mind where he sat: he came whenever he was interested, and squeezed in wherever he could. I also had the good fortune to meet Thomas Sargent, the Nobel laureate in economics. Already a highly successful economist, he began studying general relativity at 60 and deep learning at 70, and at 76 he was still discussing the progress of deep learning with us younger researchers. Perhaps this is what true love of research looks like.

Here at home, we need not sell ourselves short: China is a world leader in AI engineering. Acknowledging that AI is still in its infancy is not a negation of practitioners’ efforts, but a reminder that we must commit to long-term work rather than rush for quick results. Without Faraday and the other pioneers, the theory could never have been summarized and humanity could never have entered the electric age.

In the same way, big innovation in AI depends on vision and on day-by-day work: trying new ideas and accumulating small breakthroughs, until some brilliant minds connect the dots and propose a theory. Then AI will make its big breakthroughs and finally become a science.

We have already stepped halfway through the gate of the AI era, an era destined to be even more brilliant and exciting than the electric age. But all of this depends on the unswerving efforts of every researcher.

This article is original content from Alibaba Cloud and may not be reproduced without permission.