Author | Larry Hardesty
Compiled by | Ziqi Zhang
That’s right! Artificial intelligence is hot, and neural networks are hot, but do you really understand them? How does a neural network actually work? Until now, no one could say.
A neural network is like a black box: we can use it, but we cannot see inside it. Recently, however, researchers at the Massachusetts Institute of Technology (MIT) seem to have found a way toward an answer.
What methods did they use? How reliable are their answers? What questions remain about the approach? Can their method truly unlock the black box?
This article was compiled by AI Tech Camp from “How Neural Networks Think,” published on the MIT News website. Enjoy!
But what is a neural network, in essence? Its behavior is determined by parameters learned from data, yet we have rarely concerned ourselves with how those parameters actually drive the network’s work.
At the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), researchers at MIT’s Computer Science and Artificial Intelligence Laboratory presented a new idea. By studying neural networks that perform NLP tasks, in which computers interpret freeform text written in everyday language (as opposed to a structured language such as a database query language), they devised a general way to probe what such networks are doing.
The technique works for any system that takes text as input and produces strings as output, such as a typical automatic translation system. Because the analysis is derived only from varying the inputs and observing the effects on the outputs, it can be used directly on an online natural-language-processing service without any access to the underlying software.
In fact, the technique can be applied to any text-processing system whose internals are a black box, precisely because it does not care about those internals. And in the researchers’ experiments, the machine-translation networks they probed often behaved much as human intuition would predict.
Stop arguing and do the experiment!
Although the technique targets natural language processing, its core idea is somewhat similar to interpretation methods used for neural networks on computer-vision tasks.
For example, consider an object-detection system. Roughly speaking, it divides an image into parts, passes those parts to a recognizer, and classifies the image according to the features it finds.
Splitting the input this way is easy for object detection, but it is very hard to apply to natural language.
Tommi Jaakkola, a professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, points out that this kind of splitting risks destroying the semantic meaning of a sentence.
You cannot just apply a simple randomization and call it a method; you have to be able to predict a relatively complex object, such as a whole sentence, and it is not obvious what an explanation even means in that setting.
Interestingly, the paper’s lead author, David Alvarez-Melis, also a graduate student in MIT’s Department of Electrical Engineering and Computer Science, used a neural network to generate test sentences, and then used those sentences to probe the black-box neural network under study.
How exactly did they test it?
First, they trained a network that compresses and decompresses everyday sentences, to serve as a go-between for the black-box network. Compression packs a sentence into a compact numerical representation so the information is easy to pass around; decompression expands that packed representation back into a sentence in its original form. During training, the system adjusts the encoder and decoder jointly, according to how closely the decoder’s output matches the encoder’s input.
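The compress-and-decompress setup described above is essentially an autoencoder. As a hedged illustration only, here is a toy linear autoencoder: it packs short fixed-length “sentences” into a small dense vector and trains the encoder and decoder jointly to minimize reconstruction error. The vocabulary, sentences, and dimensions are invented, and this is far simpler than the paper’s actual sequence model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["she", "gasped", "squealed", "in", "surprise", "horror"]
V, L, H = len(vocab), 3, 4          # vocab size, sentence length, code size

def one_hot(sentence):
    # Concatenate one one-hot vector per word position.
    x = np.zeros(L * V)
    for i, w in enumerate(sentence):
        x[i * V + vocab.index(w)] = 1.0
    return x

sentences = [["she", "gasped", "surprise"],
             ["she", "squealed", "surprise"],
             ["she", "gasped", "horror"]]
X = np.stack([one_hot(s) for s in sentences])   # shape (3, L*V)

We = rng.normal(0.0, 0.1, (L * V, H))           # encoder weights
Wd = rng.normal(0.0, 0.1, (H, L * V))           # decoder weights

for _ in range(5000):                           # joint gradient descent
    Z = X @ We                                  # compress: sentence -> code
    Xhat = Z @ Wd                               # decompress: code -> scores
    err = Xhat - X
    grad_Wd = Z.T @ err / len(X)
    grad_We = X.T @ (err @ Wd.T) / len(X)
    Wd -= 0.1 * grad_Wd
    We -= 0.1 * grad_We

def decode(x):
    # Pick the highest-scoring word at each position.
    return [vocab[int(np.argmax(x[i * V:(i + 1) * V]))] for i in range(L)]
```

Because the code vector has fewer dimensions than the one-hot input, the network is forced to learn a compact numerical representation, which is the “compression” the text describes.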
A neural network’s output is, at heart, probabilistic. Why does that matter?
Consider an object-detection system shown an image of someone feeding a dog. The network might judge that the animal in the image has a 70% probability of being a dog and a 25% probability of being a cat, so the system reports that we are feeding a dog rather than a cat.
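Percentages like these typically come from a softmax over the classifier’s raw class scores. A minimal sketch, with made-up scores (a real detector would compute them from image features):

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability, then normalize.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

labels = ["dog", "cat", "other"]
logits = np.array([2.0, 1.0, -1.0])        # hypothetical scores for one image
probs = softmax(logits)                    # probabilities summing to 1
prediction = labels[int(np.argmax(probs))] # highest-probability class
```

With these scores, the dog class ends up with roughly a 0.7 probability, mirroring the example in the text.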
Similarly, in Jaakkola and Alvarez-Melis’s sentence-compression network, the decoder offers alternative words for each word of the decoded sentence, along with the probability of each alternative.
Intuitively, the network favors words that tend to co-occur, which raises decoding accuracy; and words assigned similar probabilities generally turn out, through clustering, to be semantically related.
For example, if the encoded sentence is “She gasped in surprise,” alternatives such as “She squealed in surprise” and “She gasped in horror” are automatically assigned fairly high decoding probabilities, while sentences such as “She swam in surprise” and “She gasped in coffee” are assigned low ones.
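One way to picture this step is sampling whole sentences from per-position word probabilities. The distributions below are invented to mirror the example above; the real system derives them from a trained decoder:

```python
import random

# Hypothetical per-position word distributions; words and probabilities
# are made up for illustration only.
position_dists = [
    {"She": 1.0},
    {"gasped": 0.6, "squealed": 0.3, "swam": 0.1},
    {"in": 1.0},
    {"surprise": 0.7, "horror": 0.25, "coffee": 0.05},
]

def sample_variant(dists, rng):
    # Draw one word per position, weighted by its decoding probability.
    return " ".join(rng.choices(list(d), weights=list(d.values()))[0]
                    for d in dists)

rng = random.Random(0)
variants = {sample_variant(position_dists, rng) for _ in range(50)}
# High-probability variants like "She squealed in surprise" dominate the
# sample; low-probability ones like "She swam in coffee" are rare.
```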
Then, for any sentence, the system produces a list of closely related sentences, and Jaakkola and Alvarez-Melis feed that generated list into the black-box network. The result is a long list of input-output pairs, which the researchers can analyze to determine which changes to the input cause which internal changes, and hence which outputs.
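The probing loop itself can be sketched in a few lines. The `black_box` function here is a toy stand-in invented for illustration; in the paper, the probes go to real systems such as translators:

```python
def black_box(sentence):
    # Toy stand-in for an opaque text-processing system: it tags which
    # emotion word it sees in the input.
    emotion = "surprise" if "surprise" in sentence else "other"
    return f"{sentence.upper()} [{emotion}]"

variants = ["She gasped in surprise",
            "She squealed in surprise",
            "She gasped in horror"]

# Collect (input, output) pairs by querying the black box on each variant.
pairs = [(v, black_box(v)) for v in variants]

# Analysis step: which input edits changed a key property of the output?
flipped = [v for v, out in pairs if "[surprise]" not in out]
```

Comparing which input changes do or do not alter the output is exactly the kind of analysis the input-output list enables.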
Now, put it to the test
The researchers applied the technique to three different natural-language-processing systems: one that infers the pronunciation of written words, a general-purpose machine-translation system, and a simple human-computer dialogue system that tries to provide reasonable answers to arbitrary questions.
Analysis of the translation system’s output showed strong dependencies between individual words in the input and output sequences, much as one would expect. More interestingly, when translating text, the system sometimes diverged sharply from human expectations in how it assigned gender.
For example, “dancer” is a gender-neutral word in English, but French distinguishes “danseur” (a male dancer) from “danseuse” (a female dancer).
When the system translates “The dancer is charming” into French, it usually produces “la danseuse est charmante.” The system has learned that “danseuse” is the more likely word to appear alongside “charming,” ignoring the fact that the original sentence says nothing about the dancer’s gender.
The analysis also shows that an adjective in a sentence can influence how other words in that sentence are translated.
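That cross-word effect can be quantified by substituting one input word at a time and counting how many output tokens change. The toy “translator” below is invented to mimic the reported behavior (the adjective nudging the translation toward feminine forms); it is not the paper’s method, only a sketch of the idea:

```python
# Tiny invented lexicon; gendered entries map a gender key to a form.
LEX = {
    "the":      {"m": "le",       "f": "la"},
    "dancer":   {"m": "danseur",  "f": "danseuse"},
    "is":       "est",
    "charming": {"m": "charmant", "f": "charmante"},
    "tall":     {"m": "grand",    "f": "grande"},
}

def toy_translate(words):
    # Invented bias: "charming" pushes the whole sentence feminine.
    gender = "f" if "charming" in words else "m"
    out = []
    for w in words:
        entry = LEX[w]
        out.append(entry[gender] if isinstance(entry, dict) else entry)
    return out

def dependency(words, position, substitute):
    # Substitute one input word and report the fraction of output
    # tokens that change as a result.
    base = toy_translate(words)
    perturbed = list(words)
    perturbed[position] = substitute
    alt = toy_translate(perturbed)
    return sum(b != a for b, a in zip(base, alt)) / len(base)

sentence = ["the", "dancer", "is", "charming"]
score = dependency(sentence, 3, "tall")   # swap only the adjective
```

In this toy, swapping the adjective changes three of the four output words, showing how a single input word can dominate the rest of the translation.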
The human-computer dialogue system was trained and tested on pairs of lines from Hollywood movie dialogue. Although the researchers used a large training set, the network itself was kept so small that it could not take full advantage of the training data.
“The system we built was genuinely flawed,” Alvarez-Melis explains. “If you design a black-box model and it doesn’t work as expected, can you go straight to the problem and solve it? For now, the better path is to fix the system and improve its performance by understanding what caused the errors.”
In this case, the researchers’ analysis showed that the dialogue system typically keyed on a few words in the input, which it used to select among fixed response templates. For example, to questions beginning with “who” or “what,” it often responded with “I don’t know what you’re talking about.”
Original article: http://news.mit.edu/2017/how-neural-networks-think-0908