1 Introduction

Meta Learning, or Learning to Learn, has become another important research branch after Reinforcement Learning (hereafter simply called Meta Learning). Theoretical research in artificial intelligence has developed along the following path:

Artificial Intelligence → Machine Learning → Deep Learning → Deep Reinforcement Learning → Deep Meta Learning

Why this trend?

The answer lies in the development of artificial intelligence itself. In the Machine Learning era, performance on more complex classification problems was poor. The emergence of Deep Learning essentially solved the problem of one-to-one mapping such as image classification (one input, one output), hence landmark achievements like AlexNet. But what if the output also affects the next input? In other words, sequential decision-making problems cannot be solved by Deep Learning alone. At this point Reinforcement Learning enters: Deep Learning + Reinforcement Learning = Deep Reinforcement Learning. With deep reinforcement learning, sequential decision making began to pay off, hence milestone results like AlphaGo. However, a new problem arises: deep reinforcement learning depends on an enormous amount of training and requires a precise reward signal. For many real-world problems, such as robot learning, there is no good reward and no unlimited training budget. What is needed is the ability to learn quickly. The key to fast learning is what humans excel at: making full use of previous knowledge and experience to guide the learning of new tasks. Therefore, Meta Learning has become a new direction to conquer.

At the same time, the failure of DeepMind's existing deep reinforcement learning algorithms on StarCraft II shows that current deep reinforcement learning struggles with overly complex action spaces, especially on problems that require genuine strategic and tactical thinking. This touches one of the core issues of general artificial intelligence: getting AI to think and reason on its own. AlphaGo, in my opinion, does its thinking in the process of feeding board features through the neural network. However, Go's action space is very limited, a matter of a few hundred choices. That is a far cry from StarCraft II's almost unlimited choices (screen resolution times mouse and keyboard inputs, roughly 1920 × 1080 × 10 ≈ 20,000,000 choices per frame). Yet humans handle such a huge choice space without difficulty. The key is that humans drastically narrow the range of choices through strategy and tactics (for example, "the current goal is to build workers and mine"). Therefore, enabling artificial intelligence to learn to think and construct tactics is critical. This problem is even harder than fast learning, but since Meta Learning has the ability to learn to learn, it may also be able to learn to think. Meta Learning therefore remains one of the potential solutions to such hard Learning-to-Think problems.

From the above analysis, we can draw the following conclusion:

Meta Learning is the key to achieving general artificial intelligence!

Before this article, the column had already published two articles related to Meta Learning:

  1. Learning to Learn: Give AI core values that enable fast Learning
  2. Robot revolution and Learning to Learn

We used the name "Learning to Learn" in the hope that more friends could get familiar with the concept. From this article onward, we will use the name "Meta Learning" directly (honestly, just because it looks more professional and cool 😀).

This article will not reintroduce the concept of Meta Learning; the two posts listed above already explain it. Instead, it shares some of the most cutting-edge research progress in Meta Learning, a field that can fairly be described as a stage where a hundred schools of thought contend.

2 Meta Learning research: a hundred flowers blooming

Why call Meta Learning research a contention of a hundred schools of thought? Because each research line takes a completely different approach; there is a genuine variety of methods and experiments on display, a veritable explosion of ideas.

A collection of Meta Learning papers:

songrotek/Meta-Learning-Papers

This section analyzes the developments of roughly the last one or two years: first a classification, then a brief analysis of each line of work.

2.1 Methods based on Memory

Basic idea: learning from past experience can be achieved by adding external memory to a neural network.

Representative articles:

[1] Santoro, Adam, Bartunov, Sergey, Botvinick, Matthew, Wierstra, Daan, and Lillicrap, Timothy. Meta-Learning with Memory-Augmented Neural Networks. In Proceedings of The 33rd International Conference on Machine Learning, pp. 1842–1850, 2016.

[2] Munkhdalai T, Yu H. Meta Networks. arXiv preprint arXiv:1703.00837, 2017.

Meta-learning with memory-augmented neural networks

From the architecture we can see that the network also takes the previous step's label y as input, and an external memory is added to store the previous input x. This lets the connection between the label y and the stored x be established when the next step is backpropagated, so that later inputs can retrieve related samples from the external memory for comparison, achieving better predictions.
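
To make the input trick concrete, here is a minimal sketch (my own illustrative code, not the paper's architecture) of how an episode is fed to such a network: each step pairs the current sample x_t with the previous label y_{t-1}, so the controller must use its memory to bind samples to the labels that arrive one step later.

```python
import torch

# Build the offset (x_t, y_{t-1}) inputs used by Santoro et al.:
# the label for x_t only arrives at step t+1, so the network must
# stash x_t in external memory and bind it to y_t when it shows up.
def make_offset_inputs(xs, ys, num_classes):
    """xs: (T, D) episode of samples; ys: (T,) integer labels."""
    T, D = xs.shape
    y_onehot = torch.nn.functional.one_hot(ys, num_classes).float()
    # Shift labels by one step: step t receives y_{t-1}; step 0 gets zeros.
    y_prev = torch.cat([torch.zeros(1, num_classes), y_onehot[:-1]], dim=0)
    return torch.cat([xs, y_prev], dim=1)        # (T, D + num_classes)

# Toy usage: a 10-step episode, 5 classes, 16-dim features.
xs = torch.randn(10, 16)
ys = torch.randint(0, 5, (10,))
inputs = make_offset_inputs(xs, ys, num_classes=5)
print(inputs.shape)  # torch.Size([10, 21]) -- fed to an LSTM/NTM controller
```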

2.2 Methods based on predicted gradients

Basic idea: the objective of Meta Learning is fast learning, and the key is for the neural network's gradient descent to be accurate and quick. So, if we can make a neural network use previous tasks to learn how to predict gradients, then as long as the gradient predictions are accurate on a new task, won't it learn faster?

Representative articles:

[1] Andrychowicz, Marcin, Denil, Misha, Gomez, Sergio, Hoffman, Matthew W, Pfau, David, Schaul, Tom, and de Freitas, Nando. Learning to Learn by Gradient Descent by Gradient Descent. In Advances in Neural Information Processing Systems, pp. 3981–3989, 2016.

The idea of this paper is very clear: train a general neural network (an LSTM) to predict parameter updates, using the regression of quadratic functions as training tasks. The neural-network optimizer obtained this way outperforms Adam and RMSProp, noticeably speeding up training.
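
Here is a minimal sketch in the spirit of the paper: a small coordinate-wise LSTM receives each parameter's gradient and outputs its update, meta-trained on random quadratic regression tasks. The original's two-layer LSTM and gradient preprocessing are omitted; all sizes and learning rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Coordinate-wise learned optimizer: one shared LSTM cell applied to
# every parameter coordinate, mapping its gradient to its update.
opt_net = nn.LSTMCell(input_size=1, hidden_size=20)
out_head = nn.Linear(20, 1)
meta_opt = torch.optim.Adam(
    list(opt_net.parameters()) + list(out_head.parameters()), lr=1e-3)

for meta_step in range(100):                      # meta-training loop
    W, b = torch.randn(10, 10), torch.randn(10)   # random quadratic task
    theta = torch.randn(10, requires_grad=True)   # optimizee parameters
    h, c = torch.zeros(10, 20), torch.zeros(10, 20)
    total_loss = 0.0
    for t in range(20):                           # unrolled inner optimization
        loss = ((W @ theta - b) ** 2).sum()       # loss ||W theta - b||^2
        grad, = torch.autograd.grad(loss, theta, create_graph=True)
        h, c = opt_net(grad.unsqueeze(1), (h, c))
        theta = theta + out_head(h).squeeze(1)    # LSTM-predicted update
        total_loss = total_loss + loss            # meta-loss: sum along trajectory
    meta_opt.zero_grad()
    total_loss.backward()
    meta_opt.step()
```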

2.3 Methods using the Attention mechanism

Basic idea: attention can be improved by using past experience. For example, when we look at a sexy picture, we naturally focus our attention on the key points. So, is it possible to train an attention model on previous tasks so that, facing a new task, it can directly attend to the most important parts?

Representative articles:

[1] Vinyals, Oriol, Blundell, Charles, Lillicrap, Tim, Wierstra, Daan, et al. Matching Networks for One Shot Learning. In Advances in Neural Information Processing Systems, pp. 3630–3638, 2016.

This paper constructs an attention mechanism in which the final label is obtained as an attention-weighted superposition of the support-set labels: ŷ = Σᵢ a(x̂, xᵢ) yᵢ, where the attention a(x̂, xᵢ) is given by a softmax over the similarity between the embeddings f(x̂) and g(xᵢ). The basic aim is to use existing tasks to train a good attention model.
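
A minimal sketch of this readout (the embeddings f and g are stubbed out as identity maps here; in the paper they are learned networks):

```python
import torch
import torch.nn.functional as F

# Matching Networks readout: the prediction is an attention-weighted
# sum of the support-set labels, with attention given by a softmax
# over cosine similarities in embedding space.
def matching_predict(query, support_x, support_y, num_classes):
    """query: (D,); support_x: (k, D); support_y: (k,) int labels."""
    sims = F.cosine_similarity(query.unsqueeze(0), support_x, dim=1)  # (k,)
    attn = F.softmax(sims, dim=0)                                     # a(x̂, x_i)
    y_onehot = F.one_hot(support_y, num_classes).float()              # (k, C)
    return attn @ y_onehot                                            # ŷ = Σ_i a_i y_i

# Toy usage: 5-way 1-shot with 8-dim embeddings.
support_x = torch.randn(5, 8)
support_y = torch.arange(5)
query = support_x[2] + 0.1 * torch.randn(8)
print(matching_predict(query, support_x, support_y, num_classes=5))
```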

2.4 Methods borrowing the LSTM structure

Basic idea: the internal update of an LSTM cell is very similar to a gradient-descent update. So, can we use the LSTM structure to train a neural-network update mechanism that takes the current network parameters as input and directly outputs the new, updated parameters? It is a very clever idea.

Representative articles:

[1] Ravi, Sachin and Larochelle, Hugo. Optimization as a model for few-shot learning. In International Conference on Learning Representations (ICLR), 2017.

The core idea of the paper is the analogy between the two updates: the gradient-descent step θ_t = θ_{t−1} − α_t ∇L has exactly the form of the LSTM cell-state update c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t, with forget gate f_t = 1, cell state c_{t−1} = θ_{t−1}, input gate i_t = α_t, and candidate c̃_t = −∇L. How to exploit this link between the LSTM update and gradient descent is well worth pondering.
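
A toy numeric check of the analogy (my own illustration, not the paper's meta-learner): with forget gate 1, input gate α, and candidate −∇L, the LSTM cell-state update reproduces an SGD step exactly.

```python
import torch

# Verify that the LSTM cell update c_t = f_t*c_{t-1} + i_t*c̃_t
# equals SGD when f_t = 1, i_t = alpha, c̃_t = -grad.
theta = torch.tensor([2.0, -1.0])     # plays the role of cell state c_{t-1}
grad = torch.tensor([0.4, -0.2])      # gradient of some loss at theta
alpha = 0.1                           # learning rate

f_t, i_t, c_tilde = 1.0, alpha, -grad
lstm_style = f_t * theta + i_t * c_tilde      # c_t = f_t*c_{t-1} + i_t*c̃_t
sgd_style = theta - alpha * grad              # θ_t = θ_{t-1} - α∇L
print(torch.allclose(lstm_style, sgd_style))  # True
```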

2.5 RL-oriented Meta Learning methods

Basic idea: Meta Learning can be used for supervised learning, but what about reinforcement learning? Can it be achieved by adding some external inputs, such as the reward and the previous action?

Representative articles:

[1] Wang J X, Kurth-Nelson Z, Tirumala D, et al. Learning to Reinforcement Learn. arXiv preprint arXiv:1611.05763, 2016.

[2] Duan Y, Schulman J, Chen X, Bartlett P, Sutskever I, Abbeel P. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning. Technical report, UC Berkeley and OpenAI, 2016.

The idea of the two papers is the same: the reward and the previous action are added to the network's input, forcing the recurrent network to learn and carry task-level information in its state.
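
A minimal sketch of this input scheme with illustrative sizes of my own choosing (the actual agents in both papers are full recurrent policy-gradient learners):

```python
import torch
import torch.nn as nn

# RL^2-style recurrent policy: the input is the observation concatenated
# with the previous action (one-hot), the previous reward, and a done
# flag, so task-level structure accumulates in the RNN state.
obs_dim, num_actions, hidden = 12, 4, 64
policy_rnn = nn.GRUCell(obs_dim + num_actions + 2, hidden)
action_head = nn.Linear(hidden, num_actions)

def policy_step(obs, prev_action, prev_reward, done, h):
    a_onehot = nn.functional.one_hot(prev_action, num_actions).float()
    extras = torch.stack([prev_reward, done]).unsqueeze(0)       # (1, 2)
    x = torch.cat([obs.unsqueeze(0), a_onehot.unsqueeze(0), extras], dim=1)
    h = policy_rnn(x, h)
    return action_head(h), h

# Toy usage for one step:
h = torch.zeros(1, hidden)
logits, h = policy_step(torch.randn(obs_dim), torch.tensor(1),
                        torch.tensor(0.5), torch.tensor(0.0), h)
print(logits.shape)  # torch.Size([1, 4])
```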

2.6 Methods that train a good base model for both supervised and reinforcement learning

Basic idea: the previous methods are each limited to supervised learning or to reinforcement learning; can we make something more general? And can we learn a base model that adapts to new tasks better than ordinary fine-tuning (Finetune)?

Representative articles:

[1] Finn, C., Abbeel, P., & Levine, S. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. arXiv preprint arXiv:1703.03400.

The basic idea of this paper is to train on multiple tasks at the same time, combine the gradient directions obtained from learning the different tasks into a joint update, and thereby learn a common optimal base model.
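
A minimal sketch of the MAML loop on a toy sine-regression task family (the sizes, learning rates, and the tiny functional MLP are my own illustrative choices; the functional form lets us evaluate the network with task-adapted parameters while the outer loop updates the shared initialization):

```python
import torch

# Tiny functional MLP so we can forward with "adapted" parameters.
def forward(params, x):
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1 + b1) @ w2 + b2

def sample_task():
    amp = torch.rand(1) * 4 + 0.1                 # random sine amplitude
    def data(n=10):
        x = torch.rand(n, 1) * 10 - 5
        return x, amp * torch.sin(x)
    return data

params = [(torch.randn(1, 40) * 0.5).requires_grad_(),
          torch.zeros(40, requires_grad=True),
          (torch.randn(40, 1) * 0.5).requires_grad_(),
          torch.zeros(1, requires_grad=True)]
meta_opt = torch.optim.Adam(params, lr=1e-3)
inner_lr = 0.01

for meta_step in range(100):
    meta_loss = 0.0
    for _ in range(4):                            # batch of tasks
        data = sample_task()
        xs, ys = data()                           # support set
        loss = ((forward(params, xs) - ys) ** 2).mean()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        adapted = [p - inner_lr * g for p, g in zip(params, grads)]
        xq, yq = data()                           # query set, same task
        meta_loss = meta_loss + ((forward(adapted, xq) - yq) ** 2).mean()
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```

The inner torch.autograd.grad call uses create_graph=True so that the outer update differentiates through the adaptation step; that second-order term is the heart of the method.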

2.7 Methods borrowing from WaveNet

Basic idea: WaveNet makes use of all its previous data at every step, so can we copy WaveNet's approach to achieve Meta Learning, that is, directly exploit past data?

Representative articles:

[1] Mishra N, Rohaninejad M, Chen X, et al. Meta-Learning with Temporal Convolutions. arXiv preprint arXiv:1707.03141, 2017.

It directly uses the previous historical data; the idea is very simple, yet the effect is extremely good: at the time of writing it is the state of the art on the Omniglot and Mini-ImageNet image-recognition benchmarks.
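
A minimal sketch of the causal temporal-convolution building block (my own illustration; the full model, later known as SNAIL, interleaves such blocks with soft attention): the whole episode of (sample, previous-label) pairs is treated as a sequence, and left-padded dilated convolutions let the prediction at step t draw directly on all earlier history.

```python
import torch
import torch.nn as nn

# Causal dilated 1D convolution: left-pad by (kernel_size-1)*dilation
# so step t only sees steps <= t, WaveNet-style.
class CausalConv1d(nn.Module):
    def __init__(self, ch_in, ch_out, dilation):
        super().__init__()
        self.pad = 1 * dilation                  # (kernel_size - 1) == 1
        self.conv = nn.Conv1d(ch_in, ch_out, kernel_size=2, dilation=dilation)

    def forward(self, x):                        # x: (B, C, T)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

seq = torch.randn(1, 21, 10)                     # (batch, feature+label, T)
h = CausalConv1d(21, 32, dilation=1)(seq)
h = CausalConv1d(32, 32, dilation=2)(h)          # dilations grow the receptive field
print(h.shape)                                   # torch.Size([1, 32, 10])
```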

2.8 Methods for predicting Loss

Basic idea: to learn faster still, besides a better gradient, a better loss would also speed learning up. So, is it possible to build a model that uses previous tasks to learn how to predict the Loss?

Representative articles:

[1] Sung F, Zhang L, Xiang T, Hospedales T, et al. Learning to Learn: Meta-Critic Networks for Sample Efficient Learning. arXiv preprint arXiv:1706.09529, 2017.

This paper constructs a Meta-Critic Network (consisting of a Meta Value Network and a Task-Actor Encoder) to learn to predict the Loss of an Actor Network. For Reinforcement Learning, this Loss is the Q value.
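
A structural sketch of my reading of this interface, with toy sizes (not the paper's exact networks): an encoder compresses a rollout into a task code z, and the meta value network predicts a Q value from (state, action, z) to serve as the actor's learning signal.

```python
import torch
import torch.nn as nn

# Task-actor encoder: a GRU summarizes a trajectory of
# (state, action, reward) into a task code z.
state_dim, action_dim, z_dim = 8, 2, 16
task_encoder = nn.GRU(state_dim + action_dim + 1, z_dim, batch_first=True)

# Meta value network: predicts a Q value from (state, action, z).
meta_value = nn.Sequential(nn.Linear(state_dim + action_dim + z_dim, 64),
                           nn.ReLU(), nn.Linear(64, 1))

traj = torch.randn(1, 30, state_dim + action_dim + 1)  # one rollout
_, z = task_encoder(traj)                              # task code z: (1, 1, z_dim)
s, a = torch.randn(1, state_dim), torch.randn(1, action_dim)
q = meta_value(torch.cat([s, a, z.squeeze(0)], dim=1)) # predicted Q -> actor loss
print(q.shape)                                         # torch.Size([1, 1])
```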

See also the earlier column article: Learning to Learn: Give AI core values that enable fast Learning.

Here’s New York University’s Kyunghyun Cho:

It’s kind of a new way of thinking.

3 Summary

From the above analysis, we can see that Meta Learning is on the rise: all kinds of inventive ideas keep emerging, yet the real killer algorithm has not appeared. The future development is well worth looking forward to! I also hope that more friends will devote themselves to research on Meta Learning.