The gods are fighting again: Should we continue to do deep learning?

The quarreling has started again.

I’m talking about Gary Marcus, a professor of psychology and neuroscience at New York University, who recently posted a paper on arXiv listing ten limitations of deep learning [1]. The paper has sparked a lot of discussion about whether deep learning actually solves the problems it is credited with solving.

Thomas Dietterich, a renowned machine learning expert and former president of the AAAI, quickly fired off ten tweets refuting Marcus’s list of “ten sins,” calling Marcus’s criticism “very disappointing.”



When Marcus saw this, he responded angrily:

Dietterich is confusing concepts and misreading my point. Just wait, I’ll write a (technical) response next!



Then Yann LeCun weighed in with a comment that amounted to a slap in the face for Marcus: Tom is actually right, you know.

Why pay attention to this controversy? Deep learning is arguably the key driver of the latest wave of artificial intelligence. We all know it works, but we also know that its theoretical foundations are questionable.

As discussed in the “alchemy of deep learning” debate at NIPS 2017, many researchers would abandon deep learning if some other approach could match the performance of deep learning models while being more interpretable. No such approach exists yet, however.

So, should we continue to do deep learning?

“Debaters” on both sides, and LeCun, an active defender of deep learning



Marcus, a cognitive psychologist, has long been a high-profile participant in AI debates. Geometric Intelligence, the machine learning company he founded, was acquired by Uber. Marcus describes himself as an “AI contrarian”: someone who opposes or criticizes popular views. In this case, the “popular view” is, of course, deep learning.

Marcus’s research and theory focus on the intersection of biology and psychology. He takes a nativist stance on human language acquisition, arguing that AI researchers need to explore what it would mean to build innate mechanisms into machine learning. Marcus has also challenged connectionism, disputing the view that the mind is just a largely undifferentiated array of neurons.



On the other side, Thomas Dietterich, a machine learning pioneer, is the founding president of the International Machine Learning Society (IMLS). Dietterich is not a blind supporter of deep learning: in the “alchemy” debate that made waves at NIPS 2017, he actually sided with Ali Rahimi, who compared deep learning to alchemy. He told Xinzhiyuan:

I agree with Rahimi that we need more research, both experimental and theoretical, focused on improving our understanding of how and why deep networks work.

Deep CNNs have shown remarkable performance on image and speech tasks, which naturally leads to a lot of experimental work as people try to apply them to many other problems. At the same time, this has attracted many people with no formal training in machine learning, including students and software engineers.

In many cases one can indeed solve a problem with deep networks, but only after excessive trial and error and tuning. This is a mark of the field’s immaturity, both as a science and as an engineering discipline.

Dietterich said that despite this lack of scientific maturity, it is important to keep experimenting with different network structures and new tasks, or, as Rahimi put it, to keep doing “alchemy.” But it is also important to run experiments that help us understand why our methods work, because that will help us improve them. “It is also critical to understand the weaknesses of these systems, as practical applications need to be more robust than current pure deep learning solutions,” he said.



As for Yann LeCun, little introduction is needed. One of the three giants of deep learning, LeCun is an active participant in all kinds of AI debates, and was recently described by some in the industry as “too defensive” and “insecure” for his aggressive defense of deep learning at NIPS.

In fact, LeCun had argued with Marcus many times before, and perhaps that’s why he didn’t bother to say much this time…

Against this backdrop, let’s take a look at the pros and cons.

Top 10 Limitations of Deep Learning?

Xinzhiyuan has compiled the ten limitations listed in Section 3 of Marcus’s paper, “The limitations of deep learning,” along with Dietterich’s points. For more information, see the link at the end of this article.

Thomas Dietterich:

1. Gary Marcus’s article is disappointing. He barely mentions deep learning’s achievements (such as natural language translation) and minimizes others (for example, calling ImageNet’s 1,000 categories “very limited”).

2. DL learns not only representations but also mappings. Deep machine translation reads the source sentence, represents it in memory, and then generates the output sentence. This works better than any of the old symbolic (GOFAI) methods ever did.

3. Marcus complains that DL cannot extrapolate, but nothing can extrapolate. What looks like extrapolation from X to Y is really interpolation in a representation that makes X and Y look the same. If anything, this is more true of logical reasoning than of connectionist approaches. (A toy illustration of the extrapolation failure itself appears after this list.)

4. DL can learn such representations better than any previous learning method. But I think these are just the first steps toward learning the higher-level abstractions that Marcus is after.

5. I am excited about recent work on disentangled representations, particularly beta-VAE (ICLR 2017) and Achille and Soatto’s theory relating compression to disentanglement (arXiv:1706.01350).

6. I am also interested in meta-learning methods that build on low-level deep learning representations. Perhaps they will be able to learn higher-level abstractions?

7. I believe that learning the right abstractions will address the key issues Marcus raises: data hunger, vulnerability to adversarial examples, inability to extrapolate, and lack of transparency.

8. DL is essentially a new way of programming (“differentiable programming”), and the field is trying to formulate reusable building blocks in this style: convolution, pooling, LSTMs, GANs, VAEs, memory units, routing units, and so on.

9. But no one thinks we have a complete set. No one knows the limits of differentiable programming. But the field continues to advance rapidly, and so does theoretical understanding.

10. We certainly need more theory and better engineering, and there are already plenty of promising research ideas.
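The extrapolation question in point 3 is easiest to see with a toy experiment. The sketch below is a minimal illustration, not from either author: a small network is fit on y = x^2 using only inputs in [-1, 1], then asked about x = 3. Inside the training range it does fine; far outside it, the tanh units saturate and the prediction typically lands nowhere near the true value of 9. The exact numbers depend on the random seed and the architecture, both of which are arbitrary choices here.

```python
# Minimal extrapolation sketch (illustrative only): fit y = x^2 on [-1, 1],
# then query a point far outside the training range.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(500, 1))
y_train = (x_train ** 2).ravel()

net = MLPRegressor(hidden_layer_sizes=(64, 64), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(x_train, y_train)

print("x = 0.5 ->", net.predict([[0.5]])[0], "(true 0.25)")   # interpolation: close
print("x = 1.0 ->", net.predict([[1.0]])[0], "(true 1.00)")   # edge of data: close
print("x = 3.0 ->", net.predict([[3.0]])[0], "(true 9.00)")   # extrapolation: far off
```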

Gary Marcus:

1. Deep learning is data hungry and currently lacks a mechanism for learning abstractions through explicit, verbal definition. Humans can learn an abstract relationship in just a few trials, from an explicit definition or by implicit means; a deep learning system, by contrast, must see thousands or millions of training examples before it works at its best.

2. Deep learning is shallow and has limited capacity for transfer. DeepMind used deep reinforcement learning to play the Atari game Breakout, but the system does not know what a tunnel or a wall is; all it has learned is which action to take in which particular situation. Deep learning currently has little ability to transfer what it has learned to new settings.

3. Deep learning has no natural way to deal with hierarchical structure. Most current deep learning language models treat sentences as flat sequences of words. When it encounters an unfamiliar sentence structure, a recurrent neural network (RNN) cannot systematically represent the recursive structure of the sentence. The correlations between features that deep learning acquires are flat, with no hierarchical relationships.

4. Deep learning currently struggles with open-ended inference. A system cannot grasp the subtle difference between “John promised Mary to leave” and “John promised to leave Mary”; the machine cannot work out who is leaving whom or what is likely to happen next.

5. Deep learning is not transparent enough; the “black box” nature of neural networks has been a focus of discussion for the past few years. This opacity is a serious problem when deep learning is applied to areas such as financial trading and medical diagnosis.

6. Deep learning has not been well integrated with prior knowledge, partly because the knowledge represented in a deep learning system consists mainly of (largely opaque) correlations between features rather than abstractions such as quantified statements (e.g., “all men are mortal”). The problems that suit deep learning best are classification problems, while problems involving commonsense reasoning remain largely outside its scope.

7. Deep learning so far cannot distinguish causation from correlation. Roughly speaking, deep learning learns complex correlations between input and output features, but it has no built-in representation of causality. For example, a deep learning system can easily learn that, across a population of children, height and vocabulary size are correlated, but it cannot tell what drives that correlation: children grow taller as they learn more words, yet this does not mean that growing taller makes them learn more words, nor that learning new words makes them grow. Causality is a central element of some other approaches to artificial intelligence (Pearl, 2000), but little work in deep learning has tried to address the problem. (A short simulation after this list makes the height-and-vocabulary example concrete.)

8. Deep learning assumes a largely stable world, which the real world is not. Deep learning works well in highly stable settings, such as board games like Go with fixed rules, but less well in ever-changing systems such as politics and economics.

9. For now, deep learning works well as an approximation, but its answers cannot be fully trusted. As mentioned earlier, deep learning systems perform quite well most of the time within a particular domain, but they are easily fooled. Many studies have exposed this vulnerability, for example systems misidentifying stop signs as speed-limit signs and turtles as rifles. Szegedy et al. (2013) were probably the first to show that deep learning systems can be deceived in this way, but four years on, despite much active research, no robust solution has emerged.

10. Currently, deep learning is hard to use as a basis for engineering. All of the above points to the fact that it is difficult to do robust engineering with deep learning. As Google’s Peter Norvig (2016) has pointed out, machine learning still lacks the incrementality, transparency, and debuggability of classical programming, and faces challenges in achieving robustness.
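Marcus’s seventh point, the height-and-vocabulary example, can be made concrete with a few lines of simulation. The numbers below are made up for illustration and are not from Marcus’s paper: a child’s age drives both height and vocabulary size, so the two variables are strongly correlated across the population even though neither causes the other, and the correlation largely vanishes once age is held fixed. A learner that only models input-output correlations has no way to tell this situation apart from a genuinely causal one.

```python
# Confounding sketch (hypothetical numbers): age drives both height and
# vocabulary, producing a strong but non-causal correlation between them.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(2, 10, size=n)                      # years
height = 80 + 6 * age + rng.normal(0, 3, size=n)      # cm, grows with age
vocab = 200 * age + rng.normal(0, 150, size=n)        # known words, grows with age

# Across all children, height and vocabulary look tightly linked.
print("corr(height, vocab), all ages:", round(np.corrcoef(height, vocab)[0, 1], 2))

# Holding the confounder (age) roughly fixed, the link mostly disappears.
band = (age > 5.9) & (age < 6.1)
print("corr(height, vocab), age ~ 6 :", round(np.corrcoef(height[band], vocab[band])[0, 1], 2))
```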

The original post was published on January 05, 2018

Author: Wen Qiang

This article is from Xinzhiyuan, a partner of the cloud community. For more information, follow the WeChat public account “AI_era”.
