Contents
- 1. Deep reinforcement learning and the DQN algorithm
- 2. Board and card games and Seven Ghosts 523
- 3. Using DQN to solve Seven Ghosts 523
- 3.1 Step 1: Network structure
- 3.2 Step 2: Feature engineering
- 3.3 Step 3: Reward function
- 4. Experimental results and analysis
- 5. Conclusion
Deep reinforcement learning is the mainstream approach in academia for developing game AI. In this article, we use the DQN algorithm to explore board and card game AI.
1. Deep reinforcement learning and the DQN algorithm
Machine learning can be divided into three parts: representation, objectives, and optimization. Deep Learning (DL) belongs to the representation category: it automatically extracts features of things and expresses them as data a computer can understand. Reinforcement Learning (RL) belongs to the objectives category: by setting a reward function and exploring automatically, it trains a model toward a desired goal. Deep Reinforcement Learning (DRL) combines the two, using the powerful feature extraction and representation ability of Deep Learning together with the goal-driven ability of Reinforcement Learning to move toward more general artificial intelligence.
Deep Q Network (DQN) is a combination of Deep Learning and the traditional reinforcement learning algorithm Q-learning, and it is one of the early representative algorithms of deep reinforcement learning. Q-learning is essentially a state-action-value version of the Temporal Difference (TD) algorithm: it relies on the Markov property and uses only the information from the next step. Q-learning lets the system explore under the guidance of a policy and updates the state-action value at every step of the exploration. The update formula (Formula 1) is shown below.

$$Q(S, A) \leftarrow Q(S, A) + \alpha \left[ R + \gamma \max_{A'} Q(S', A') - Q(S, A) \right]$$

Here S is the current state, A is the action taken now, S' is the next state, A' ranges over the actions available in the next state, R is the reward obtained by the system, α is the learning rate, and γ is the discount factor. In the deep learning era, combining deep learning with Q-learning, plus the replay memory and double-network (target network) tricks, gave birth to DQN.
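To make the update rule concrete, here is a minimal tabular Q-learning sketch in Python; the state/action representation and the hyperparameter values are illustrative assumptions, not part of RoomAI or our actual model.

```python
from collections import defaultdict

# Tabular Q-learning update:
# Q(S, A) <- Q(S, A) + alpha * (R + gamma * max_{A'} Q(S', A') - Q(S, A))
Q = defaultdict(float)       # Q-table keyed by (state, action)
alpha, gamma = 0.1, 0.95     # learning rate and discount factor (illustrative values)

def q_update(state, action, reward, next_state, next_actions):
    """Apply one Q-learning update for a transition (S, A, R, S')."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
```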
2. Board and card games and Seven Ghosts 523
Board and card games collectively refer to board games and card games. The two are different and each has its own distinct characteristics. Board games are played with all information visible; the technical term is perfect information games. Typical board games include Chinese chess (Xiangqi), chess, and Weiqi (Go). Card games hide information from the players; technically they are imperfect information games. Typical card games include Dou Dizhu, Seven Ghosts 523, bridge, and Texas Hold'em.
The game we use this time is Seven Ghosts 523. Different parts of China play different versions of Seven Ghosts 523; the version we implemented in RoomAI, an imperfect information game AI environment, follows the hometown rules of a colleague: several players share one or more decks of poker cards; at the start of each game every player is randomly dealt five cards; the ranks from strongest to weakest are 7, Big Joker, Little Joker, 5, 2, 3, A, K, Q, J, 10, 9, 8, 6, 4; the game has two stages, a preparation stage and a card-playing stage; in the preparation stage, a player discards a number of cards and draws the same number of replacements; in the card-playing stage, the first player to empty their hand wins.
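As a small illustration, the sketch below encodes this ranking as a comparison helper; the rank names and the function are our own illustration, not RoomAI identifiers.

```python
# Rank order in this version of Seven Ghosts 523, from strongest to weakest.
RANK_ORDER = ["7", "Big Joker", "Little Joker", "5", "2", "3",
              "A", "K", "Q", "J", "10", "9", "8", "6", "4"]
RANK_STRENGTH = {rank: len(RANK_ORDER) - i for i, rank in enumerate(RANK_ORDER)}

def beats(rank_a, rank_b):
    """Return True if rank_a outranks rank_b, e.g. beats("7", "A") is True."""
    return RANK_STRENGTH[rank_a] > RANK_STRENGTH[rank_b]
```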
RoomAI is an imperfect information game AI environment. In RoomAI, players receive information from the game environment, the current player chooses a suitable action, and the game environment advances the game logic according to that action; this repeats until the winner is determined. The whole process is shown below.
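A minimal sketch of this loop, assuming placeholder names for the environment and player interfaces (they are not the actual RoomAI API), might look like this:

```python
# Illustrative sketch of the RoomAI interaction loop described above. The class
# and method names (env.init, env.is_over, env.forward, ...) are placeholders of
# our own, not the actual RoomAI API.
def play_one_game(env, players):
    infos = env.init()                                 # environment deals cards, returns per-player info
    while not env.is_over():
        turn = env.current_player()                    # index of the player whose turn it is
        action = players[turn].choose(infos[turn])     # current player picks an appropriate action
        infos = env.forward(action)                    # environment advances the game logic
    return env.scores()                                # final scores once the winner is determined
```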
In recent years, rapid progress has been made in board game AI. In 1997, chess was first conquered by the computer system Deep Blue. In March 2016, Go, the pinnacle of board games, was defeated by the computer system AlphaGo. Less than a year later, in January 2017, the computer system Libratus conquered Texas Hold'em. Now that even Go has been conquered, perfect information board games hold little remaining academic value, but imperfect information card games still have problems worth exploring. Although Libratus surpassed humans at Texas Hold'em, that was only two players playing head to head. With multiple players we could no longer use CFR and would need to design a different algorithmic framework; in the words of Professor Tuomas Sandholm himself, that is a "totally different story". Furthermore, some card games not only allow multiple players but also allow or require cooperation between certain players. In Dou Dizhu, for example, three players take part, with two peasants playing against one landlord. In bridge and Junqi (military chess), two players form a team and play against another team. These situations have not yet been well studied.
3. Using DQN to solve Seven Ghosts 523
We developed a DQN algorithm for the three-player Seven Ghosts 523 game provided by the imperfect information game AI environment RoomAI. For simplicity, we only developed our AI for three-player Seven Ghosts 523, and the resulting model does not apply to games with any other number of players. When implementing DQN on RoomAI, there are two things to note. 1) The data received by the DQN algorithm is (S, A, R, S'), where S' is the state following S. However, to support imperfect information games, after the current player takes an action, that player does not know what information the next player will face. So S' can only be the information the current player faces the next time it is their turn; the reactions of the other players in between are treated as part of the environment. 2) The data received by the DQN algorithm is (S, A, R, S'). When the action space is fixed, the model can be updated according to Formula 1. In RoomAI, however, the current player's set of legal actions is not fixed, so we need to receive (S, A, R, S', A'), where A' is the set of actions available to the current player, provided by RoomAI's API; a small sketch of this follows this paragraph. There are three steps to implementing a Seven Ghosts 523 AI.
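The sketch below shows one way to compute the DQN update target when the set of legal next actions A' varies from state to state; the q_net callable, the feature shapes, and the γ value are our own assumptions, not part of RoomAI.

```python
import torch

def dqn_target(q_net, reward, done, next_state_feat, next_action_feats, gamma=0.95):
    """Compute R + gamma * max over the legal next actions A' of Q(S', A').

    next_action_feats holds one action-feature tensor per legal action in S',
    because the action space in RoomAI is not fixed. q_net is any callable that
    maps (state feature, action feature) to a scalar Q value.
    """
    if done:
        return torch.tensor(float(reward))
    with torch.no_grad():
        q_values = torch.stack([q_net(next_state_feat, a_feat)
                                for a_feat in next_action_feats])
    return reward + gamma * q_values.max()
```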
3.1 Step 1: Network structure
The first step in writing a card game AI is to determine the network structure. The DQN we implemented works as follows: the current state S and a candidate action A are each converted into a vector by a CNN, the two vectors are concatenated, and finally a fully connected network (DNN) produces a single real value Q(s, a).
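A rough PyTorch sketch of this structure is shown below; the channel counts, kernel sizes, and hidden width are our own assumptions rather than the exact configuration we used.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Sketch of the structure described above: one CNN embeds the 8 x 15 x 5 state
    image, another CNN embeds the 2 x 15 x 5 action image, the two vectors are
    concatenated, and a fully connected head outputs a single Q(s, a).
    The channel counts and hidden width are illustrative choices."""
    def __init__(self):
        super().__init__()
        self.state_cnn = nn.Sequential(
            nn.Conv2d(8, 32, kernel_size=3, padding=1), nn.ReLU(), nn.Flatten())
        self.action_cnn = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(), nn.Flatten())
        self.head = nn.Sequential(
            nn.Linear((32 + 16) * 15 * 5, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, state, action):        # state: (B, 8, 15, 5), action: (B, 2, 15, 5)
        s = self.state_cnn(state)             # state image  -> vector
        a = self.action_cnn(action)           # action image -> vector
        return self.head(torch.cat([s, a], dim=1)).squeeze(-1)   # (B,) of Q(s, a)
```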
3.2 Step 2: Feature engineering
The second step is feature engineering. We need to extract features for both states and actions. The state feature is an image with 8 layers, each of size 15 × 5: every row corresponds to a rank and every column to a suit, with the four normal suits plus a virtual suit for the jokers. Of the 8 layers, 4 are used in the preparation phase and the other 4 in the card-playing phase. The four layers are the current hand, the cards this player has played so far, the cards the previous player has played so far, and the cards the next player has played so far. The action feature is an image with 2 layers, each of size 15 × 5, where one layer is used in the preparation stage and the other in the card-playing stage, and each layer is filled in with the cards involved in the action.
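For illustration, the sketch below builds one such 15 × 5 plane from a list of cards; the rank/suit index convention is our own assumption, not RoomAI's.

```python
import numpy as np

NUM_RANKS, NUM_SUITS = 15, 5   # 15 ranks; 4 real suits plus a virtual suit for the jokers

def cards_to_plane(cards):
    """Encode a set of cards as one 15 x 5 plane.

    cards is a list of (rank_index, suit_index) pairs; the index convention is
    our own. Counts are accumulated so duplicates from multiple decks survive."""
    plane = np.zeros((NUM_RANKS, NUM_SUITS), dtype=np.float32)
    for rank_idx, suit_idx in cards:
        plane[rank_idx, suit_idx] += 1.0
    return plane

# The state feature stacks 8 such planes (4 for the preparation phase, 4 for the
# card-playing phase); the action feature stacks 2 planes.
```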
3.3 Step 3: Reward function
The third step is to determine the reward function. We use a very simple one: while the game is not over, the reward equals -1; when the game ends, the reward is the score assigned by the game. RoomAI scores a three-player game of Seven Ghosts 523 as 2 points for the single winner and -1 for each of the two losers.
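A minimal sketch of this reward, assuming a game_over flag and the per-player scores returned by the game, could look like this:

```python
def step_reward(game_over, scores=None, player_id=None):
    """Minimal sketch of the reward described above: -1 on every non-terminal
    step; at the end of the game, the score assigned by RoomAI (2 for the sole
    winner, -1 for each of the two losers). Argument names are our own."""
    if not game_over:
        return -1.0
    return float(scores[player_id])
```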
We have put the relevant code into RoomAI’s Model Zoo for those who are interested.
4. Experimental results and analysis
For the experiments, we set up two simple baseline methods: 1) a simple rule-based player and 2) a random player. The rule-based player is stronger than the random player: in our experiments, simple rules beat random play 91.3% of the time in two-player matches. During training, all three seats are our model, trained simultaneously. During evaluation, one seat is our model and the other two seats are one of the baseline methods. The figure below shows the experimental results, where the x-axis is the number of iterations (one iteration corresponds to running 10 rounds) and the y-axis is the win rate of our model. If three equally capable models played at random, each would win 33.33% of the time, so a win rate above 33.33% means our model is better than the corresponding baseline method.
The results above show that the DQN approach can learn an AI that plays better than both random play and simple rules. But the resulting AI is not very strong, and after all neither baseline is strong either. We only used the simplest DQN, did not tune the network structure, and did not select features carefully; more work would be needed to push the AI further. The purpose of this work was to verify that the RoomAI framework can support AI development. We do not plan to continue along this line; instead we will turn to other algorithmic frameworks for board and card game AI.
Overall conclusion: DQN can learn a reasonably good AI, but the features, network structure, and training objective all need further optimization to obtain a stronger AI.
5. Conclusion
Deep reinforcement learning is the mainstream approach in academia for developing game AI. In this article, we used the DQN algorithm to explore board and card game AI: we developed a DQN-based AI for the Seven Ghosts 523 game provided by the imperfect information game environment RoomAI. Experiments show that DQN achieves a reasonable level of play. The code for this article can be found on GitHub; stars are welcome.