- Differentiable Plasticity: A New Method for Learning to Learn
- Originally written by Uber Engineering
- Translated by: The Gold Project
- Permalink to this article: github.com/xitu/gold-m…
- Translator: luochen
- Proofreader: SergeyChang xxholly32
The neural networks that underlie Uber’s machine learning systems have proven very successful at solving complex problems, including image recognition, language understanding, and game playing. However, these networks are usually trained to a stopping point by gradient descent, with connections adjusted gradually according to the network’s performance over many trials. Once training is complete, the network is fixed and its connections no longer change; so, apart from later retraining (which again requires many samples), the network effectively stops learning the moment training ends.
By contrast, the plasticity shown by biological brains — the ability for connections between neurons to change autonomously and continuously throughout life — enables animals to learn quickly and efficiently from ongoing experiences. The level of plasticity in different regions and connections in the brain is the result of fine-tuning over millions of years of evolution to enable effective learning throughout an animal’s life. The resulting capacity for continuous learning allows animals to adapt to changing or unpredictable environments with very little additional data. We can quickly remember scenes we have never seen before, or gain new knowledge from a few experiments in a completely unfamiliar situation.
To provide similar abilities to our AI agents, Uber AI Labs has developed a new approach called differentiable plasticity, which lets us train the behavior of plastic connections through gradient descent so that they can help previously trained networks adapt to conditions encountered after training. Although such plastic neural networks have long been evolved in the field of evolutionary computation, to our knowledge the work presented here is the first to show that plastic networks can also be optimized by gradient descent. Because recent breakthroughs in artificial intelligence (including image recognition, machine translation, and game playing) rest on gradient-based methods, making plastic networks trainable by gradient descent may greatly extend the power of both approaches.
How differentiable plasticity works
In our approach, each connection receives both an initial weight and a coefficient that determines how plastic the connection is. More precisely, the activation y_j of neuron j is computed as follows:
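Following the formulation in the accompanying paper (where σ is a nonlinearity such as tanh and η is a learned parameter that controls how quickly the plastic trace changes), the two equations take this form:

$$y_j(t) = \sigma\Big(\sum_{i \in \mathrm{inputs}} \big[\, w_{i,j} + \alpha_{i,j}\, H_{i,j}(t) \,\big]\; y_i(t-1)\Big)$$

$$H_{i,j}(t+1) = \eta\, y_i(t-1)\, y_j(t) + (1-\eta)\, H_{i,j}(t)$$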
The first equation is the typical activation function of a neural network unit, except that each connection’s weight has a fixed component (green) and a plastic component (red). The H_{i,j} term of the plastic component is automatically updated as a function of ongoing inputs and outputs (as shown in the second equation; other formulations are possible and are discussed in the paper).
During initial training, gradient descent tunes the structural parameters w_{i,j} and α_{i,j}, which determine the magnitudes of the fixed and plastic components. As a result, after initial training the agent can keep learning automatically from ongoing experience, because the plastic component of each connection is sufficiently shaped by neural activity to store information, reminiscent of some forms of learning in animals, including humans.
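As a concrete illustration, here is a minimal sketch of a plastic recurrent layer in PyTorch. It is only an illustration of the rule above, not the authors’ released code: the fixed weights w, the plasticity coefficients alpha, and the Hebbian learning rate eta are ordinary parameters trained by gradient descent, while the Hebbian trace hebb is shaped by neural activity within each episode and reset between episodes.

```python
import torch
import torch.nn as nn

class PlasticLayer(nn.Module):
    """Recurrent layer whose effective weight on each connection is
    w[i, j] + alpha[i, j] * hebb[i, j], with hebb updated by Hebbian activity."""

    def __init__(self, size):
        super().__init__()
        self.size = size
        self.w = nn.Parameter(0.01 * torch.randn(size, size))      # fixed component, trained by gradient descent
        self.alpha = nn.Parameter(0.01 * torch.randn(size, size))  # plasticity coefficients, trained by gradient descent
        self.eta = nn.Parameter(torch.tensor(0.01))                # learning rate of the plastic trace, also trained

    def forward(self, y_prev, hebb):
        # y_prev: (batch, size) previous activations; hebb: (batch, size, size) plastic trace
        w_eff = self.w + self.alpha * hebb                          # fixed + plastic component of every connection
        y = torch.tanh(torch.bmm(y_prev.unsqueeze(1), w_eff).squeeze(1))
        # Hebbian trace: running average of outer products of pre- and post-synaptic activity
        hebb = self.eta * torch.bmm(y_prev.unsqueeze(2), y.unsqueeze(1)) + (1.0 - self.eta) * hebb
        return y, hebb

    def initial_state(self, batch_size):
        # Activations and plastic traces both start at zero at the beginning of an episode
        return (torch.zeros(batch_size, self.size),
                torch.zeros(batch_size, self.size, self.size))
```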
Demonstrating differentiable plasticity
To demonstrate the potential of differentiable plasticity, we applied it to challenging tasks that require rapid learning from unpredictable stimuli.
In the image reconstruction task (Figure 1), the network memorizes a set of natural images it has never seen before; one of these images is then shown with half of it erased, and the network must reconstruct the missing half from memory. We show that differentiable plasticity can efficiently train large networks, with millions of parameters, to solve this task. Importantly, traditional networks with non-plastic connections (including state-of-the-art recurrent architectures such as LSTMs) cannot solve this task and take considerably longer to learn greatly simplified versions of it.
Figure 1: Image completion task (each row is a separate episode). After being shown three images, the network is given a partially erased image and must reconstruct the missing half from memory. Non-plastic networks (including LSTMs) cannot solve this task. Source images are from the CIFAR-10 dataset.
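As a rough sketch of the training regime (a toy memorize-and-complete episode built on the PlasticLayer above, not the actual image-completion experiment): within each episode the plastic traces do the fast storage, and only the reconstruction error at the end of the episode is backpropagated, through the entire episode, into w, alpha, and eta.

```python
import torch

torch.manual_seed(0)
layer = PlasticLayer(size=64)                       # the sketch layer defined above; 64 units chosen arbitrarily
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-4)

for episode in range(1000):                         # outer loop: ordinary gradient descent across episodes
    # Toy episode: a random pattern is presented, then must be recalled from a half-erased cue.
    pattern = torch.rand(8, 64) * 2 - 1             # batch of 8 random "memories" in [-1, 1]
    cue = pattern.clone()
    cue[:, 32:] = 0                                 # second half erased, loosely mimicking the completion task
    y, hebb = layer.initial_state(batch_size=8)
    for _ in range(5):                              # presentation phase: the plastic trace stores the pattern
        y, hebb = layer(pattern, hebb)
    for _ in range(5):                              # recall phase: the network is driven only by the degraded cue
        y, hebb = layer(cue, hebb)
    loss = ((y - pattern) ** 2).mean()              # reconstruction error at the end of the episode
    optimizer.zero_grad()
    loss.backward()                                 # backpropagate through the whole episode, Hebbian updates included
    optimizer.step()
```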
We also trained plastic networks to solve the Omniglot task (a standard “learning to learn” benchmark), which requires learning to recognize a set of unfamiliar handwritten symbols from only a few examples of each, drawn by different people. The approach can also be applied to reinforcement learning problems: plastic networks outperform non-plastic ones on a maze exploration task, in which the agent must discover, remember, and repeatedly return to a reward location within a maze (Figure 2). In this way, the simple idea of adding plasticity coefficients to neural networks offers a genuinely novel approach, and sometimes the best-performing one, to a wide range of problems that require continual learning from ongoing experience.
Figure 2: Maze exploration task. The agent (yellow square) is rewarded for reaching the reward location (green square) as many times as possible; the reward moves to a random location each time the agent finds it. On its first attempts at exploring the maze (left), the agent’s behavior is essentially random. After 300,000 training episodes (right), the agent has learned to remember the reward location and to find its way back to it.
Looking forward
Differentiable plasticity offers a new, biologically inspired approach to the classic problem of learning to learn, or meta-learning. It is also remarkably flexible: simply by applying gradient descent to the underlying building blocks (plastic connections), it can be used in a variety of powerful ways, as demonstrated by the different tasks above.
In addition, it opens the door to several new avenues of research. For example, can existing complex network architectures, such as LSTMs, be improved by adding plastic connections? What if the plasticity of connections were itself under the control of the network, similar to the way neuromodulators affect plasticity in biological brains? Might plasticity provide a more efficient form of memory than recurrence alone (note that recurrent networks store incoming information in neural activity, whereas plastic networks store it in the much larger number of connections)?
We intend to investigate these and other exciting questions in our future work on differentiable plasticity, and we hope others will join us in this exploration. To encourage research into this new approach, we have published the code for the experiments above on GitHub, along with a paper describing our method and results.
To receive future Uber AI Labs blog posts, sign up for our mailing list, or subscribe to the Uber AI Labs YouTube channel. If you are interested in joining Uber AI Labs, please submit your application at Uber.ai.
Subscribe to keep up with the latest innovations in Uber engineering.
The Gold Project is a community for translating high-quality technical articles from around the internet, covering Android, iOS, front end, back end, blockchain, product, design, artificial intelligence, and other fields. For more high-quality translations, please follow The Gold Project, its official Weibo account, and its Zhihu column.