Ian Goodfellow, the father of GANs, recently answered questions on Quora. During the Q&A, Goodfellow not only talked about what Google Brain is doing right now, but also detailed the various problems GANs are having and where they are headed. Goodfellow, one of the authors of Deep Learning, also spoke about the future of deep learning. In addition, Goodfellow, a former student of “machine learning guru” Yoshua Bengio, offered advice on picking courses for college students and independent researchers.




Author | Ian Goodfellow

Source | AI Tech Camp (rgznai100)

Translated by | Peng Shuo, March


Here are Goodfellow’s answers:


Current problems and future of GANs



Q: What is Google Brain doing in 2017?

The Google Brain team is quite large, and each researcher has a lot of freedom to decide what to work on and how to implement their ideas, so it’s hard to summarize everything we do.

You can check out the research we do on our website: Research at Google

We do basic research to improve machine learning algorithms; we develop the computer systems that power machine learning (such as TensorFlow); and we use machine learning to solve problems in areas such as healthcare, robotics, and music and art creation.

The basic research team that I work on is focused on adversarial machine learning.

Q: GANs can generate beautiful images of celebrity faces and bedrooms; why can’t they do as well on thousands of other categories, such as those in ImageNet?

We don’t really know. We used to think it was caused by training with a loss function that resembles the Jensen-Shannon divergence. For a generative model, if the model is large enough, many loss functions can exactly recover the training distribution.

So the question is: what happens when the model isn’t big enough? Does the model generate only realistic samples, but give up on producing samples from some of the classes in the training set? Or does the model try to cover all the classes, but sometimes blend classes together into blurry, unrealistic samples? According to the Jensen-Shannon divergence view, it should be the former.

For a long time we believed this was what caused mode collapse. Later, it became possible to use several different loss functions in generative adversarial networks; for example, the KL divergence between data and model tends to prefer the latter behavior. These loss functions generally should not cause mode collapse, yet the problem still occurs.
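For reference, the two divergences being contrasted here have standard definitions (this aside is mine, not part of the original answer):

```latex
% Jensen-Shannon divergence: symmetric; relatively tolerant of a model
% that drops some modes of the data distribution ("the former" behavior).
\mathrm{JS}(p_{\mathrm{data}} \,\|\, p_{\mathrm{model}})
  = \tfrac{1}{2}\,\mathrm{KL}(p_{\mathrm{data}} \,\|\, m)
  + \tfrac{1}{2}\,\mathrm{KL}(p_{\mathrm{model}} \,\|\, m),
\qquad m = \tfrac{1}{2}\,(p_{\mathrm{data}} + p_{\mathrm{model}})

% KL from data to model: blows up when the model assigns near-zero
% probability to real data, so minimizing it prefers covering every
% mode, even at the cost of blurry samples ("the latter" behavior).
\mathrm{KL}(p_{\mathrm{data}} \,\|\, p_{\mathrm{model}})
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}
    \!\left[\log \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{model}}(x)}\right]
```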

After that, I thought mode collapse was due to non-convergence of the learning algorithm. That is, a Nash equilibrium of a generative adversarial network would cover all the modes, but the learning algorithm may fail to find that Nash equilibrium. I thought it was necessary to design a more reliable learning algorithm that could find the Nash equilibrium. In the months since, theoretical work has given several reasons to believe that existing algorithms should be able to find a Nash equilibrium: [1706.04156] Gradient descent GAN optimization is locally stable (https://arxiv.org/abs/1706.04156); [1706.08500] GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (https://arxiv.org/abs/1706.08500)

It is important to qualify these theoretical results: our practice with generative adversarial networks does not exactly match the theoretical assumptions, so the results do not say outright that generative adversarial networks must reach a Nash equilibrium. Still, taken together, they do reduce my confidence that mode collapse is caused by non-convergence.

Recently I have been thinking that mode collapse might come from the architecture of the neural networks we use. If I train an autoencoder to reconstruct ImageNet images, then sample codes randomly from a Gaussian distribution and decode them, all the resulting images are semantically similar. I expected to get all sorts of garbage images that didn’t look like the training data at all. Instead, the decoder showed something like mode collapse, even though no GAN training had taken place at all.
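As a concrete version of that experiment, here is a minimal sketch in TensorFlow/Keras. It uses MNIST as a lightweight stand-in for ImageNet and an arbitrary 32-dimensional code; both choices are my assumptions for illustration, since the original setup isn’t specified:

```python
import numpy as np
import tensorflow as tf

# Train a small autoencoder to reconstruct images (MNIST here as a
# lightweight stand-in for the ImageNet experiment described above).
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

code_dim = 32  # assumed code size, chosen arbitrarily
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(code_dim),
])
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(code_dim,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=5, batch_size=128)

# The key step: ignore the encoder and decode random Gaussian codes.
# The surprising observation is that the decoded images come out
# semantically similar, i.e. something like mode collapse with no
# GAN training involved.
z = np.random.normal(size=(16, code_dim)).astype("float32")
samples = decoder.predict(z)  # 16 decoded images, shape (16, 784)
```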

This points to the architecture of the generator network: it may find it easier to express functions that exhibit mode collapse than functions that produce a wide variety of samples. I wonder if we are seeing something like power iteration, where repeatedly multiplying a vector by a matrix makes it converge to the matrix’s leading eigenvector. Obviously, generator networks do not behave exactly like power iteration, because each layer has different parameters, but I wonder if we are overlooking some related effect.
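The power iteration analogy is easy to see numerically; here is a tiny NumPy demonstration (my illustration, not Goodfellow’s):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
A = A @ A.T  # make it symmetric so the eigenvectors are real

v = rng.normal(size=8)
for _ in range(50):
    v = A @ v                # repeatedly multiply by the same matrix...
    v /= np.linalg.norm(v)   # ...and the vector collapses onto one direction

# Compare with the true leading eigenvector: almost any starting vector
# ends up in the same place, a loose analogy for many latent codes
# mapping to very similar outputs.
w, V = np.linalg.eigh(A)
lead = V[:, np.argmax(np.abs(w))]
print(abs(v @ lead))  # ~1.0
```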

Q: Under what circumstances can GANs generate high-quality images from any ImageNet category (not just celebrities or bedrooms)?

Perhaps the best GAN for generating many ImageNet categories is the auxiliary classifier GAN ([1610.09585] Conditional Image Synthesis With Auxiliary Classifier GANs – https://arxiv.org/abs/1610.09585).

It is important to remember that its training process uses class labels to divide the data into groups, so it behaves more like several GANs that each generate one kind of image than like one giant GAN, which helps it avoid mode collapse.

Q: Is adversarial training effective against adversarial examples in general?

Generally speaking, no.

In general, if you run adversarial training with adversarial examples created by optimization algorithm X, the model will adapt to algorithm X but will still be fooled by adversarial examples generated by algorithm Y. Alexey and I observed this effect across a variety of algorithms.
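As an illustration, here is a minimal sketch of this kind of single-algorithm adversarial training, using the fast gradient sign method (FGSM) as “algorithm X.” The model, optimizer, and epsilon are placeholders I chose; a model trained this way typically resists FGSM but not attacks produced by a different optimizer:

```python
import tensorflow as tf

def fgsm(model, x, y, eps=0.1):
    # One concrete "algorithm X": the fast gradient sign method.
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(
            y, model(x), from_logits=True)
    grad = tape.gradient(loss, x)
    return x + eps * tf.sign(grad)  # step in the gradient-sign direction

def adversarial_train_step(model, opt, x, y):
    # Train on a mix of clean and FGSM examples
    # (assumes a Keras classifier that outputs logits).
    x_adv = fgsm(model, x, y)
    x_all = tf.concat([x, x_adv], axis=0)
    y_all = tf.concat([y, y], axis=0)
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                y_all, model(x_all), from_logits=True))
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```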

I’ve also heard of people using very powerful optimization algorithms and finding that, in some cases, the model simply fails to fit the adversarial examples during training. Nicholas Carlini told me this happened when he ran his Adam-based attack for thousands of iterations.

Recently, Aleksander Madry found that adversarial training against gradient ascent with random restarts produced a model that he could not break, on both MNIST and CIFAR-10, though I think significant shortcomings remain.

If you want to try breaking Madry’s model, see MadryLab/mnist_challenge

Q: Can GANs be used to create malware that sabotages detection software? How can system security experts defend against this?

All tools can be used for good or evil. A hammer can be used to build a house, but unfortunately a hammer can also be used as a weapon. There will be people who use GANs to do bad things; for example: [1702.05983] Generating Adversarial Malware Examples for Black-box Attacks Based on GAN (https://arxiv.org/abs/1702.05983)

Q: How is adversarial learning different from reinforcement learning? Can they work together, like CNN+RL (convolutional neural networks + reinforcement learning)?

In traditional machine learning, there are two concepts, a cost and parameters, and the training algorithm modifies the parameters to reduce the cost. In adversarial machine learning, there is more than one “player,” each with its own cost and its own parameters. Each player can modify only its own parameters, but the costs depend on all the players. It is possible to combine adversarial methods with reinforcement learning; for example, you can train RL agents on adversarial examples to make them more robust: [1705.06452] Delving into adversarial attacks on deep policies (https://arxiv.org/abs/1705.06452)
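A toy example of this multi-player structure (my illustration, not from the original answer): each player updates only its own parameter, yet each cost depends on both players, and simultaneous gradient play can orbit the Nash equilibrium rather than converge to it, which connects to the non-convergence discussion above.

```python
import numpy as np

# Player 1 controls x and wants to minimize V(x, y) = x * y;
# player 2 controls y and wants to maximize it.
# The Nash equilibrium is (0, 0).
x, y, lr = 1.0, 1.0, 0.1
for _ in range(200):
    dx = y                           # dV/dx: player 1 takes a descent step
    dy = x                           # dV/dy: player 2 takes an ascent step
    x, y = x - lr * dx, y + lr * dy  # simultaneous updates

print(x, y)  # the iterates spiral around (0, 0) rather than settling there
```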

Q: What are some new and interesting areas of adversarial machine learning research?

Defense against adversarial examples is a very hot topic. If you want to research it, check out our Kaggle contest: NIPS 2017: Targeted Adversarial Attack

  • Aleksander Madry et al. found that adversarially training a model on examples generated by an iterative attack with random restarts gives it very good defenses on MNIST and CIFAR.

  • Another hot topic: how adversarial examples transfer between different models, and how to make adversarial examples fool models deployed in the real world ([1602.02697] Practical Black-Box Attacks against Machine Learning), including examples that remain adversarial when viewed through a camera at different distances and angles (Robust Adversarial Examples).

  • A lot of my personal work is focused on making GANs more stable, reliable, and easy to use.

  • There is a lot of interest in how GANs can be used for text.

  • Adversarial techniques for hiding information (steganography): Generating Steganographic Images via Adversarial Training

  • Semi-supervised learning with adversarial methods: [1605.07725] Adversarial Training Methods for Semi-Supervised Text Classification; [1705.09783] Good Semi-supervised Learning That Requires a Bad GAN

Of course, there are many other studies going on, so I won’t list all of them here.

Q: What are your favorite methods for hyperparameter optimization?

Random search: run 25 jobs simultaneously with random hyperparameters, pick the best 2-3 of them, then tighten the random distribution around those and spend more time in that region. Then launch a new set of 25 jobs.
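A minimal sketch of this procedure; the search space and the scoring function below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng()

def sample_hparams(log_lr_range=(-5.0, -1.0)):
    # Draw one random configuration from a (hypothetical) search space.
    return {
        "log_lr": rng.uniform(*log_lr_range),
        "batch_size": int(rng.choice([32, 64, 128, 256])),
    }

def run_job(hp):
    # Stand-in for "train a model, return validation accuracy".
    # A fake objective peaked at log_lr = -3 keeps the sketch runnable.
    return -(hp["log_lr"] + 3.0) ** 2 + rng.normal(scale=0.1)

# Round 1: 25 random jobs; keep the best 2-3.
jobs = [sample_hparams() for _ in range(25)]
scores = [run_job(hp) for hp in jobs]
best = sorted(zip(scores, jobs), key=lambda t: t[0], reverse=True)[:3]

# Round 2: tighten the distribution around the best jobs and repeat.
center = np.mean([hp["log_lr"] for _, hp in best])
jobs = [sample_hparams(log_lr_range=(center - 0.5, center + 0.5))
        for _ in range(25)]
```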

Almost every year I try out the latest popular hyperparameter optimizer to see if it is any better than random search. So far, I haven’t found one that really beats the random-search procedure described above. I realize that others have had different experiences; hyperparameter optimizers may well work better in settings different from mine.


The future of deep learning

Q: Which direction will deep learning go?

There is still a lot of work to do in deep learning. We need to expand our work in many directions, not just one:

  • Better reinforcement learning, and integration of deep learning with reinforcement learning: reinforcement learning algorithms that can reliably learn how to control robots.

  • Better generative models: algorithms that can reliably learn to generate images, speech, and text that humans cannot distinguish from the real thing.

  • Learning to learn, and deep learning everywhere: algorithms that can redesign their own architectures, tune their own hyperparameters, and so on. Today, human experts are still needed to run “learning to learn” algorithms, but in the future they will be easier to deploy, and all sorts of companies that don’t specialize in AI will be able to use deep learning.

  • Security for machine learning, and machine learning for security: more cyber attacks will use machine learning to create more autonomous malware, more deceptive exploits, and so on. More cyber defenses will use machine learning to respond faster than a human could, to detect stealthier intrusions, and so on. Offensive and defensive machine learning algorithms will each try to trick the other.

  • Dynamic routing of activations will let us build much larger models that may use less computation per example than today’s models. Overall, though, massive computation will remain key to AI: whenever we build something that uses less computation per example, we will want to run thousands of copies of it in parallel for learning to learn.

  • Semi-supervised learning and one-shot learning will reduce the amount of data required to train many kinds of models and make AI available in more areas.

  • Future research will focus on building ultra-robust models that almost never make mistakes and can be used in safety-critical applications.

  • Deep learning will continue to make its way into popular culture, and we’ll see artists and pop culture creators using it to accomplish things we never thought of. I think Alexei Efros’s lab and projects like CycleGAN are just the beginning of this trend.

Q: What areas of machine learning will replace deep learning in the future?

The definition of deep learning is very broad, and I’m not sure it will ever be replaced.

Deep learning means learning several steps of processing, not just one. In this sense, deep algorithms, as opposed to shallow ones, will only become more common in the future.

Deep learning was very popular between 2006 and 2011, but back then it usually meant stacking unsupervised learning algorithms to define complex features for supervised learning.

Since 2012, deep learning has generally meant optimizing all the parameters of a deep, differentiable computation graph by backpropagation.
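As a tiny illustration of that definition (my own toy example, not from the answer): a two-step differentiable computation graph whose parameters are all differentiated by the chain rule, with one gradient checked numerically.

```python
import numpy as np

# Forward pass through a two-step (deep) graph: h = tanh(W1 x), yhat = W2 h.
rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4)

h = np.tanh(W1 @ x)
yhat = W2 @ h
loss = 0.5 * (yhat - y) ** 2

# Backpropagation: apply the chain rule step by step, back to front.
d_yhat = yhat - y                       # dL/dyhat
d_W2 = d_yhat * h                       # dL/dW2
d_h = d_yhat * W2                       # chain rule into the hidden layer
d_W1 = np.outer(d_h * (1 - h ** 2), x)  # dL/dW1, using tanh' = 1 - tanh^2

# Numerical gradient check on one entry of W1.
eps = 1e-5
W1p = W1.copy()
W1p[0, 0] += eps
loss_p = 0.5 * (W2 @ np.tanh(W1p @ x) - y) ** 2
print(d_W1[0, 0], (loss_p - loss) / eps)  # the two numbers should agree
```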

Soon we’ll have algorithms that are more Bayesian (rather than based on a point estimate of the best parameters), that use more non-differentiable operations, and so on. If we stop to ask whether those algorithms are “deep,” we may well still think they are.

I don’t think we’ll pay much attention to the distinction between “deep learning” and other learning algorithms in the future. Deep learning has simply become accepted.



Q: What changes do you most expect to see in AI over the next five to ten years?

Different people have different expectations. Personally, I most want to see:

  • A powerful defense against adversarial examples that works on realistic datasets like ImageNet.

  • A theory of adversarial examples, comparable to the theory we now have for ordinary supervised learning (no-free-lunch theorems, VC theory, etc.).

  • Generative models that make a real practical difference.

  • Sample-efficient learning algorithms that can learn from a small number of labeled examples, the way humans do.



Advice for college students and independent researchers

Q: How should a college sophomore learn about AI?

If you’re ambitious, try your hand at our adversarial example contest: NIPS 2017: Non-Targeted Adversarial Attack.

  • Take courses in linear algebra and probability theory.

  • Take a class that teaches you how to write fast code for existing hardware. Such classes are often hardware design courses rather than programming courses. If you can’t find one, ask your academic advisor for help.

  • Take a class that teaches you how to write high-performance, real-time, multithreaded code. Sometimes this topic is folded into another course, such as operating system development or game development.

  • Read Deep Learning.

  • Choose a simple deep learning project that interests you. If applying deep learning to a hobby or class project doesn’t appeal to you, do something generic, like building an SVHN classifier from scratch in TensorFlow (see the sketch after this list). Whenever questions come up about what you’ve read about deep learning, try running experiments on your project’s dataset to resolve them.

  • If your college offers courses in machine learning, computer vision, natural language processing, or robotics, be sure to take them.

  • Apply for an internship at Google Brain.
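For the SVHN starter project mentioned in the list above, a minimal sketch might look like the following. The architecture and hyperparameters are arbitrary choices of mine, and it assumes the tensorflow_datasets package for loading SVHN:

```python
import tensorflow as tf
import tensorflow_datasets as tfds  # assumed available; provides SVHN

# Load the cropped-digits version of SVHN: 32x32 color images, 10 classes.
ds = tfds.load("svhn_cropped", split="train", as_supervised=True)
ds = ds.map(lambda img, label: (tf.cast(img, tf.float32) / 255.0, label))
ds = ds.shuffle(10_000).batch(128)

# A small convolutional classifier, just enough to get started.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),  # one logit per digit class
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(ds, epochs=3)
```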

Q: Would you encourage people in other fields (like mechanical engineering) to learn ML (machine learning)?

Yes, of course.

Geoffrey Hinton, one of my heroes, got his PhD in experimental psychology.

In mechanical engineering you learn many of the mathematical tools used in machine learning, so you wouldn’t be starting from scratch.

In short, we often find that when a person brings a new idea from another field, it revolutionizes the field they move into.

Q: How can people without any technical background learn ML?

It’s important to know the basics: linear algebra, probability theory, and Python programming.

But you don’t need to know all of linear algebra (since leaving school, I have never used a QR decomposition), all of probability theory (deep learning makes little use of combinatorics, for example), or all of Python (many of the more obscure language features are actually banned at many companies).

To get started, learn just enough linear algebra, probability theory, and Python. With Python and NumPy, you can implement logistic regression yourself.
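For instance, a from-scratch logistic regression might look like this (the toy data is invented for illustration):

```python
import numpy as np

# Toy two-feature dataset with a linearly separable rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(np.float64)

# Fit by gradient descent on the cross-entropy loss.
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)         # dLoss/dw
    grad_b = np.mean(p - y)                 # dLoss/db
    w -= lr * grad_w
    b -= lr * grad_b

print("training accuracy:", np.mean((p > 0.5) == y))
```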

I think that if you can read chapters 1-5 of Deep Learning, they cover everything you need for such a project except the Python programming itself. I can’t be sure, because it’s hard for me to imagine the perspective of someone without any technical background. Obviously it takes a lot of patience and dedication to absorb that much knowledge from scratch, but in the book we tried to spell things out in enough detail to make it possible.

Q: Which 10 books would you most recommend?

1. First of all, I would recommend Deep Learning, the book I co-authored with Yoshua and Aaron.

2. Pattern Recognition and Machine Learning by Chris Bishop

3. Probabilistic Graphical Models by Daphne Koller and Nir Friedman

4. AI: A Modern Approach by Stuart Russell and Peter Norvig

5. Introduction to Algorithms by Thomas H. Cormen

6. Cracking the Coding Interview by Gayle McDowell

7. Difficult Conversations: How to Discuss What Matters Most by Douglas Stone et al. (The social side of an AI career is extremely important too; networking matters!)

8. Elements of the Theory of Functions and Functional Analysis by A.N. Kolmogorov and S.V. Fomin

9. If you’re interested in generative modeling or computer vision: Natural Image Statistics

10. Linear Algebra by Georgi E. Shilov

Q: How can independent learners or researchers compete in deep learning with big companies and institutions like Google, Facebook, or OpenAI?

At Google Brain, we often think about how to choose projects that are different and really worth working on. During my time at OpenAI, the company also made topic selection an important part of its strategy.

The worldwide focus on AI means that AI research has shifted from an optimization problem to a game-theoretic one. In the past, researchers could simply choose whatever topic they thought was good to study. Now it’s important to anticipate what other researchers will work on and choose a topic that gives you a unique advantage.

