Overfitting is a problem for many AI models. As a step toward general-purpose AI, OpenAI has released a training environment called CoinRun, whose game design offers a way to measure how well agents transfer skills learned in the past to new environments.

Building a single model that works across different tasks is a major challenge for current deep reinforcement learning algorithms. Although a trained agent can solve complex tasks, it struggles to transfer that experience when placed in a new environment. Reinforcement learning agents in particular tend to suffer from overfitting: the model fits its training data closely, its performance on other test data drops sharply, and it fails to learn general skills.

CoinRun is a platform game modeled on Sonic the Hedgehog, designed to be easy for existing algorithms to train in while providing a large amount of quantifiable training data. The goal of every CoinRun level is to collect the coin at the end of a course filled with obstacles; if the agent collides with an obstacle, it dies immediately. Collecting the coin is the only source of reward, and the reward is fixed. A level ends when the agent dies, collects the coin, or takes 1,000 steps.
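
To make these mechanics concrete, below is a minimal sketch of the episode loop just described, assuming a Gym-style interface; the environment name and constructor arguments are illustrative rather than the exact coinrun package API.

```python
# Minimal sketch of a CoinRun episode, assuming a Gym-style interface.
# The environment id and keyword arguments below are illustrative.
import gym

env = gym.make("procgen:procgen-coinrun-v0", num_levels=500)  # illustrative
obs = env.reset()
total_reward = 0.0
for step in range(1000):                    # episodes are capped at 1,000 steps
    action = env.action_space.sample()      # random policy, for illustration only
    obs, reward, done, info = env.step(action)
    total_reward += reward                  # nonzero only when the coin is collected
    if done:                                # death, coin collected, or timeout
        break
print("episode return:", total_reward)
```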

To assess how well the models generalize, OpenAI trained nine agents to play CoinRun, all using a common three-layer convolutional architecture known as Nature-CNN. Eight agents were trained on fixed sets of between 100 and 16,000 levels, while the ninth was trained on an unrestricted set of CoinRun levels, so that agent never sees the same level twice: every level is a new environment for it. All agents were trained with the Proximal Policy Optimization (PPO) algorithm. An agent trained on a fixed set of levels plays each level tens of thousands of times, whereas the agent without a fixed training set plays each level only once.
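
The Nature-CNN named above is the well-known three-layer convolutional network from the 2015 DQN paper. The sketch below reproduces its standard layer sizes in PyTorch, with a policy and value head as PPO requires; the 64x64 RGB input shape and the head sizes are assumptions, not details confirmed in the article.

```python
# Sketch of the three-layer Nature-CNN with policy/value heads for PPO.
# Conv layer sizes follow the standard Nature-CNN; the 64x64x3 input
# shape for CoinRun frames is an assumption.
import torch
import torch.nn as nn

class NatureCNN(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():                      # infer flattened feature size
            n_flat = self.features(torch.zeros(1, 3, 64, 64)).shape[1]
        self.fc = nn.Sequential(nn.Linear(n_flat, 512), nn.ReLU())
        self.policy = nn.Linear(512, n_actions)    # action logits for PPO
        self.value = nn.Linear(512, 1)             # state-value estimate

    def forward(self, x):
        h = self.fc(self.features(x / 255.0))      # normalize pixel inputs
        return self.policy(h), self.value(h)
```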

OpenAI then collected the trained agents' results. With fewer than 4,000 training levels the agents overfit severely, and even at 16,000 levels the problem persisted. OpenAI next trained agents on a fixed set of 500 CoinRun levels and found that several regularization techniques improved generalization, including dropout, L2 regularization, data augmentation, and environment stochasticity.
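
As an illustration of two of these techniques, the sketch below applies dropout inside a network and L2 regularization as optimizer weight decay; the dropout rate, weight-decay coefficient, and action count are illustrative values, not OpenAI's reported settings.

```python
# Dropout and L2 regularization, two of the techniques named above.
# All hyperparameter values here are illustrative.
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(1024, 512), nn.ReLU(),
    nn.Dropout(p=0.1),            # dropout: randomly zeroes activations in training
    nn.Linear(512, 7),            # 7 is an assumed action count
)

# L2 regularization is applied as weight decay in the optimizer.
optimizer = torch.optim.Adam(head.parameters(), lr=5e-4, weight_decay=1e-4)
```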

In addition, OpenAI developed two other environments to study overfitting: CoinRun-Platforms, a variant of CoinRun, and RandomMazes, a simple maze-navigation environment. In these experiments the researchers used the IMPALA-CNN architecture combined with a long short-term memory (LSTM) model. In CoinRun-Platforms, the agent must collect coins scattered randomly across each level within a 1,000-step time limit, so it has to explore actively.
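
For reference, the sketch below outlines the IMPALA-CNN building block as described in the IMPALA paper: a convolution and max-pool followed by two residual blocks, stacked at channel depths 16, 32, and 32, with a recurrent layer such as an LSTM placed on top of the flattened features. Exact sizes here are assumptions.

```python
# Sketch of the IMPALA-CNN building block (per the IMPALA paper).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv0 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        h = self.conv0(torch.relu(x))
        h = self.conv1(torch.relu(h))
        return x + h                              # skip connection

class ImpalaBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)  # halves resolution
        self.res0 = ResidualBlock(out_ch)
        self.res1 = ResidualBlock(out_ch)

    def forward(self, x):
        return self.res1(self.res0(self.pool(self.conv(x))))

# Full trunk per the IMPALA paper: ImpalaBlock(3, 16) -> ImpalaBlock(16, 32)
# -> ImpalaBlock(32, 32) -> ReLU -> Flatten -> Linear -> LSTM on top.
```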

According to OpenAI, these results open up further research directions for building general reinforcement learning models. With the CoinRun environment, overfitting can be quantified precisely, and with this metric researchers can more accurately evaluate architectures and algorithms, for example by studying the relationship between environment complexity and the number of training levels, whether recurrent architectures are suited to general-purpose AI, and which combinations of regularization methods are most effective.
