Introduction
Explaining artificial intelligence to friends outside the IT industry is always difficult, and they often hold big misconceptions about it. I am not a famous expert, but I have spent some time studying and researching related topics, so this article will try to discuss artificial intelligence from a higher level, along with its current limitations.
I will use NLP (natural language processing), my area of expertise, as the main example. What follows are my personal views; corrections are welcome.
The current nature of artificial intelligence
Artificial intelligence today usually refers to deep learning, because the rise of big data in recent years has let deep learning play a big role. But deep learning has not made any major theoretical breakthrough; it is still an old idea, and its essence is still probability theory and statistics.
Take Microsoft XiaoIce and Apple's Siri as examples. Their underlying technology is a dialogue system plus speech recognition. When we say a sentence to XiaoIce, what does XiaoIce actually do?
XiaoIce receives the sentence and first performs speech recognition: using a previously trained probability model, it computes which text the audio most likely corresponds to. It then performs dialogue prediction, that is, it computes which text has the highest probability of appearing next. XiaoIce itself does not understand what I said; its answer is simply produced by calculating probabilities.
Training a commercially viable chatbot requires a lot of engineering skill, but the essence is still this probability model. We can crawl large amounts of corpus data from Weibo and forums, data that ordinary humans left on the net. After complex preprocessing we get training data, and training roughly means discovering statistical relationships between the texts in that data.
For example, if the phrase "follow HackPython" appears many times in a 100 TB corpus, the model will remember that the word "follow" is most likely to be followed by "HackPython", but the model does not understand the sentence. It does not know that "HackPython" is a public account, or that "follow" is a promotional phrase. All it knows is probability.
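This "next text with the highest probability" idea can be sketched as a tiny bigram frequency model. The corpus below is invented for illustration; real systems use far larger data and far more elaborate models, but the principle of answering by counting is the same:

```python
from collections import Counter, defaultdict

# Toy corpus, standing in for text crawled from forums and Weibo.
corpus = [
    "please follow HackPython",
    "follow HackPython for more articles",
    "follow HackPython today",
    "follow the traffic rules",
]

# Count bigram frequencies: for each word, which words follow it and how often.
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word, purely by counting."""
    followers = bigrams[word]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("follow"))  # -> 'HackPython'
```

The model "answers" with "HackPython" only because that pairing is frequent in the data; it has no idea what either word means.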
In the field of NLP, whether it is the LSTM, the GRU, the attention mechanism, or the currently popular knowledge graph, they are all probability games played on text; they differ only in how they use the information in the training data.
At their core, all of today's famous models use matrix operations to fit the training data to some probability distribution. For complex models this distribution is usually high-dimensional, which brings in various mathematical tools such as measure theory and manifolds, but the essential idea is still to describe the features of the training data through a probability distribution. Once we have that distribution, it can also describe similar data, which is how so-called "recognition" is achieved.
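Stripped of the deep learning machinery, the idea of "fit a distribution to training data, then use it to judge new data" can be shown with the simplest possible case: estimating a one-dimensional Gaussian. This is a deliberately minimal sketch with made-up numbers, not how deep models work internally:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training data": samples we assume come from some unknown distribution.
train = rng.normal(loc=5.0, scale=2.0, size=10_000)

# "Training" here is just estimating the distribution's parameters.
mu, sigma = train.mean(), train.std()

def log_likelihood(x):
    """Score how well a point fits the learned Gaussian."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Data similar to the training data scores much higher than dissimilar data:
# this scoring is all that "recognition" amounts to in this toy setting.
print(log_likelihood(5.0) > log_likelihood(50.0))  # -> True
```

A deep network does the same thing at enormous scale: its parameters implicitly encode a high-dimensional distribution, and "recognizing" an input means the input scores well under it.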
After obtaining the probability distribution, questions still haunt us, such as: why does training produce exactly this distribution?
This remains a black box. We do not know why the data, after passing through this model, yields such a probability distribution. In other words, the model is unexplainable.
This unexplainability hinders the application of deep learning in some fields, such as finance. The model says an investment is highly likely to make money, but you cannot rigorously prove the reliability of that conclusion; you can only explain it intuitively. That makes people hesitate, because after all, they may lose a lot of money.
Similarly, autonomous driving, which relies on image recognition, is plagued by unexplainable models. Although image recognition models trained on large amounts of data achieve high accuracy, problems remain, as research on "adversarial attacks" shows. When we change a small amount of data in an image, the recognition model may fail to recognize it or recognize it incorrectly. Since we cannot explain, at the level of the data, the probability distribution the model has learned, it is not clear under what circumstances it will fail.
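The flavor of such attacks can be shown on a hand-made linear classifier. The weights and input below are invented for illustration; real attacks (such as the fast gradient sign method) apply the same "nudge each input component against the model" idea to deep networks:

```python
import numpy as np

# A hypothetical, already-"trained" linear classifier: class 1 if w.x + b > 0.
w = np.array([2.0, -3.0, 1.0])
b = 0.5

def predict(x):
    return 1 if w @ x + b > 0 else 0

x = np.array([0.5, 0.2, 0.1])   # an input the model classifies as class 1
assert predict(x) == 1

# Adversarial nudge: shift every component slightly in the direction
# that pushes the score toward the other class (the sign of each weight).
eps = 0.2
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))  # -> 1 0
```

Each component of the input moved by only 0.2, yet the predicted class flips. For an image model the analogous change can be imperceptible to a human, which is exactly what makes the failure modes hard to anticipate.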
The difference between recognition and understanding
From the previous discussion, we can see that current deep learning does "recognition", such as speech recognition and image recognition, but not "understanding". This is intuitively visible in the field of NLP, where even the most intelligent dialogue systems still perform poorly.
Through large amounts of data we have made great progress on "recognition", and the results can be very useful, such as face recognition for security and license plate recognition for traffic systems. But the work on "understanding" has probably only just begun.
Artificial intelligence is not as smart as we think. No matter how big the company, or how advanced the framework it releases, with great features such as automatic hyperparameter tuning and automated learning, in essence an advanced framework just lets us obtain, faster and better, a probability distribution that describes the data's features, and that distribution does not achieve "understanding".
Let’s think about it. How do humans “understand”?
When we understand something, it usually means we know the abstract concept behind it. Take the word "apple": when we see it, relevant information about apples comes to mind. That information does not come from the word itself but from our life experience, which gives us the corresponding background knowledge, and that background knowledge is what helps us understand the word "apple".
How do we acquire this background knowledge? That is still being studied.
This has led many scholars to question whether the current research approach to deep learning is correct. Here is a famous example that shows the huge difference between the way we train models and the way creatures actually learn in nature.
Crows are abundant on the streets of Japan, and researchers found that these crows steal nuts to eat, but they cannot crack the shells themselves. So how do they get the flesh? They watch the traffic lights from a pole above the intersection. When the light is red, they place the nuts on the road and fly away. When the light turns green, cars pass and crush the nut shells. The next time the light is red, the crows fly down and leisurely eat the flesh.
This kind of learning is common among living creatures: a few simple observations are enough to acquire a skill. Deep learning works completely differently. It uses a huge amount of data and repeated trial and error: each trial produces a loss, and the model is updated with gradient descent and backpropagation. The result is a probability distribution describing the features of the data. Crows do not have that many lives, or that much time, for trial and error.
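The trial-and-error loop described above can be sketched in a few lines. The task below (learning y = 3x + 2 from noisy samples) is invented for illustration, and the "backpropagation" is just the chain rule written by hand for a one-layer model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented task: recover y = 3x + 2 from noisy observations.
x = rng.uniform(-1, 1, size=200)
y = 3 * x + 2 + rng.normal(scale=0.1, size=200)

w, b = 0.0, 0.0   # the model starts out knowing nothing
lr = 0.1          # learning rate

for step in range(500):          # hundreds of repeated "trials"
    pred = w * x + b
    err = pred - y               # the error of this trial
    loss = (err ** 2).mean()     # the loss to be driven down
    # Backpropagation: gradients of the loss w.r.t. each parameter.
    grad_w = 2 * (err * x).mean()
    grad_b = 2 * err.mean()
    # Gradient descent: nudge the parameters to reduce the loss.
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 1), round(b, 1))  # close to 3 and 2
```

Even this trivial model needs hundreds of iterations over hundreds of samples; the crow needed a handful of observations and zero crushed crows.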
Crows, so to speak, "understand" the pattern of the traffic lights from a small number of observations, in a way that models produced by deep learning cannot.
Faced with this gap between how natural creatures learn and how mainstream deep learning learns, it is hard not to be puzzled.
Moravec's paradox
In 1980 Hans Moravec observed that it is relatively easy to make a computer play chess at an adult level, but difficult or impossible to give a computer the perception and mobility of a one-year-old. This is known as Moravec's paradox, and it still holds today. In short, "hard problems are easy, and easy problems are hard" is the current state of artificial intelligence.
For computers, high-level human intelligence such as logical reasoning and mathematical calculation requires relatively little computing power, while low-level intelligence such as perception and movement requires huge computing resources. For humans it is exactly the opposite: perception and movement take little conscious effort, while logical derivation and mathematical calculation demand long, deliberate thought.
How do we make AI "understand"? How do we give artificial intelligence basic background knowledge about, and responses to, the real world?
Only when these two problems are solved do we need to start worrying about whether humans will be replaced by artificial intelligence.
However, the ability to describe data features that deep learning provides has already had a severe impact on some professions, such as factory work: robotic arms with image recognition can replace most of what those workers do.
Closing thoughts
If you are worried about how a technology will affect you, the best thing to do is to learn what it is and how it works, rather than read media articles that only muddy the view. Once you know how something works, the fear of the unknown dissipates and you can adjust accordingly.
Although this article discusses these problems mainly from the perspective of NLP, the same phenomena exist in other fields, such as image processing and reinforcement learning.
Finally, thanks for reading.