Preface

Machine learning has advanced so rapidly in recent years that new algorithms appear constantly. Deep neural networks in particular have achieved remarkable results in computer vision, natural language processing, time series prediction, and other fields. This wave has drawn many people into deep learning, and many of them have gone on to achieve results of their own.

Reinforcement learning, a discipline inspired by behaviorist theory in psychology, draws on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, operations research, and other fields. This difficulty and high barrier to entry have slowed its development. Weiqi (Go), one of the most complex games humans play, has 19 horizontal and 19 vertical lines for a total of 361 points, and its state space is on the order of 10^170 (note: the total number of atoms in the observable universe is about 10^80, so even if every atom in the universe were used for storage, it could not hold all the possible positions of Weiqi).
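The scale of that state space can be sanity-checked with a little arithmetic. As a rough sketch: each of the 361 points is empty, black, or white, so 3^361 is an upper bound on the number of board configurations. The bound ignores the rules of the game, so the number of legal positions is smaller. The variable names below are illustrative only.

```python
POINTS = 19 * 19          # 361 intersections on a 19x19 board
STATES_PER_POINT = 3      # empty, black, or white

# Upper bound on board configurations; illegal positions are not filtered out.
upper_bound = STATES_PER_POINT ** POINTS

# Order of magnitude of the bound, i.e. floor(log10(upper_bound)).
exponent = len(str(upper_bound)) - 1

print(POINTS)     # 361
print(exponent)   # 172
```

The bound is therefore roughly 10^172, far beyond anything that could be enumerated or stored.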

Master (a version of AlphaGo) began appearing on the Yicheng and Tencent Fox (Yehu) Weiqi online Go servers in December 2016, won 60 consecutive games, and shook the Go world with its unprecedented strength.

The conquest of Go demonstrates the power of reinforcement learning. David Silver, who leads the AlphaGo team, is well known for his forward-looking view of the ultimate goal of artificial intelligence:

Artificial intelligence = DL (Deep Learning) + RL (Reinforcement Learning) = DRL

Deep reinforcement learning truly took off on the back of progress in deep learning: better neural networks, better training methods, and more computing power. David Silver opened up a new research direction, deep reinforcement learning (DRL), by using neural networks to approximate the value function, and he also proved results for deterministic policies.

Reinforcement learning

What is reinforcement learning? How does it relate to the machine learning algorithms (SVM, Bayes, decision trees) and deep learning models (CNN, RNN, LSTM, GAN) that we have already studied? Nearly every beginner asks these questions. In fact, reinforcement learning resembles human learning (a child learning to walk, as shown in the picture): it learns by trial and error and uses the rewards it receives for its actions as the basis for improving its behavior.

It fundamentally departs from the familiar workflow of preparing data, choosing a model, training, and testing. Instead, it approaches problems in terms of policies, value functions, and models. To express problems mathematically in a general way, it relies on the Markov decision process (MDP), the standard formalization of sequential decision problems. Dynamic programming, Monte Carlo methods, and temporal-difference learning are the main methods for finding the optimal policy of an MDP; from a control perspective, they teach an agent how to balance exploration and exploitation over a finite set of states. Building on these, policy gradients and neural networks are widely used to approximate policies and value functions.
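To make these ideas concrete, here is a minimal sketch of tabular Q-learning, one temporal-difference control method, on a tiny Markov decision process. Everything specific in it is an assumption made for illustration: the five-state chain, the reward of 1 at the right-hand end, and the values of GAMMA, ALPHA, and EPSILON. The epsilon-greedy rule is one simple way to trade exploration against exploitation.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor
ALPHA = 0.5           # learning rate
EPSILON = 0.1         # exploration probability


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, rng):
    """Epsilon-greedy: explore at random, otherwise exploit (ties broken at random)."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, max_steps=200, seed=0):
    """Learn a table of action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # TD target bootstraps from the best action value in the next state.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy policy for the non-terminal states 0..3.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

In this toy problem the learned greedy policy should be to move right in every state, since that is the shortest path to the only reward.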

Humans normally learn in the real environment, but reinforcement learning is not yet well suited to tasks that involve high complexity, logical reasoning, or emotion analysis, so a simulation environment is an essential foundation for it. It is fair to say that reinforcement learning owes its success to games, because games involve only strategic decisions and do not require complex logical reasoning (in Go, for example, the agent estimates a probability for each candidate move).

Many simulation environments are available today, such as Gym from OpenAI and the dm_control suite from Google DeepMind. Gym includes many small games on which reinforcement learning algorithms can be trained directly. They are grouped into Atari, Box2D, Classic Control, MuJoCo, Robotics, and Toy Text, and each group contains many mini-games, such as CartPole-v1. With reinforcement learning algorithms, you can make a car learn to climb a hill on its own without human intervention, make a clumsy multi-jointed robot run (without any human knowledge), play through a brick-breaking game, have a robot arm pick up objects and move them within a fixed area, teach a game character to score at skiing, and play a number of text-based games.
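These environments share one interaction loop: reset the environment, choose an action, step, collect the reward, and repeat until the episode ends. The sketch below shows that loop without requiring any library. It assumes the Gymnasium-style convention in which `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`; older Gym releases return a four-element tuple from `step()` instead. `CoinFlipEnv` and `run_episode` are hypothetical names, and `CoinFlipEnv` is only a stand-in, not a real Gym environment.

```python
import random


class CoinFlipEnv:
    """Hypothetical stand-in environment: guess a coin flip for 10 steps."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._t = 0

    def reset(self):
        self._t = 0
        return 0, {}                      # (observation, info)

    def step(self, action):
        coin = self._rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self._t += 1
        terminated = False                # no natural terminal state in this toy task
        truncated = self._t >= 10         # time limit reached
        return coin, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode and return (total_reward, number_of_steps)."""
    observation, _info = env.reset()
    total_reward, steps = 0.0, 0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        steps += 1
        done = terminated or truncated
    return total_reward, steps


# A fixed policy that always guesses 1, standing in for a learned policy.
total, steps = run_episode(CoinFlipEnv(), policy=lambda obs: 1)
print(steps)
```

With Gymnasium installed, the same `run_episode` function should also accept an environment created with `gymnasium.make("CartPole-v1")`, provided the policy returns a valid action for that environment.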

But as the saying goes, "to do a good job, one must first sharpen one's tools." Installing a simulation environment of your own is a cool thing: you no longer need to watch other people's games on YouTube, and can instead run all kinds of strange experiments on your own machine and pursue your own ideas. The catch is that these suites provide only specific, ready-made environments for verifying, improving, and developing algorithms. That already meets most people's basic needs, but it is not enough for students who want to work in their own chosen field or area of interest. They need to build a custom environment that is truly their own, and designing a reward that solves a real problem is the only way to feel truly fulfilled.
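As a rough illustration of what a custom environment involves, the sketch below defines a toy problem with its own state, actions, and reward. It is an assumption-laden example, not a recipe: `ThermostatEnv`, its parameters, and the reward (the negative distance from a target temperature) are all hypothetical, and the method signatures simply imitate the Gymnasium-style `reset()` and `step()` convention without importing the library.

```python
class ThermostatEnv:
    """Hypothetical custom environment: keep a room near a target temperature."""

    ACTIONS = (-1, 0, 1)  # cool, idle, heat (degrees changed per step)

    def __init__(self, target=21, start=15, horizon=20):
        self.target = target
        self.start = start
        self.horizon = horizon
        self.temperature = start
        self.t = 0

    def reset(self):
        self.temperature = self.start
        self.t = 0
        return self.temperature, {}       # (observation, info)

    def step(self, action_index):
        self.temperature += self.ACTIONS[action_index]
        self.t += 1
        # Custom reward: penalize the distance from the target temperature.
        reward = -abs(self.temperature - self.target)
        terminated = False                # the task has no natural end state
        truncated = self.t >= self.horizon
        return self.temperature, reward, terminated, truncated, {}


env = ThermostatEnv()
obs, _ = env.reset()
total_reward = 0.0
done = False
while not done:
    # Hand-written controller standing in for a learned policy.
    action_index = 2 if obs < env.target else (0 if obs > env.target else 1)
    obs, reward, terminated, truncated, _ = env.step(action_index)
    total_reward += reward
    done = terminated or truncated

print(obs, total_reward)  # 21 -15.0
```

The reward function is where the real problem enters: changing it changes what the agent is asked to achieve. With Gymnasium itself, a custom environment would typically subclass `gymnasium.Env` and also declare its `action_space` and `observation_space`.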