Preface

The goal is to train the bird to stay alive for as long as possible, passing as many pipe obstacles as it can to maximize the game score. Reinforcement learning is appropriate here because it is difficult to program a complete list of predetermined behaviors.

We plan to design a reinforcement learning system in which the Q function is approximated by a convolutional neural network (a deep Q network). Its input is the raw pixels of a game frame, and its output is a value estimate of future rewards, which guides the agent's choice of action. Training the deep Q network means taking an image of the running game and outputting which action to perform from the set of available actions. This resembles a classification problem, but unlike common classification problems, the model cannot be trained with labeled data. Instead, it is trained with reinforcement learning: by playing the game and evaluating the action taken in each state according to the observed rewards.
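The value the network is trained toward is the standard Q-learning (Bellman) target. A minimal sketch of that target computation, where the discount factor `gamma`, the survival reward, and the per-action Q-values are illustrative numbers, not values from this article:

```python
import numpy as np

gamma = 0.99  # discount factor (illustrative value)

# hypothetical Q-values predicted for the next state, one per action
# (e.g. "flap" and "do nothing")
q_next = np.array([0.5, 1.2])
reward = 0.1  # hypothetical survival reward for the current step

# Bellman target: observed reward plus discounted best future value
target = reward + gamma * np.max(q_next)
print(round(target, 3))  # 1.288
```

The network's prediction for the action actually taken is then regressed toward this target.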

1. Reinforcement learning system

2. Preprocessing of frame data

1. Capture frame images

Get every frame rendered in the PyGame-based game window:

```python
image_data = pygame.surfarray.array3d(pygame.display.get_surface())
```

Here pygame.display.get_surface() returns the Surface object currently being displayed; a Surface is the object PyGame uses to represent an image.

pygame.surfarray.array3d() copies the pixels' RGB integer color values into a three-dimensional array.
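A small headless sketch of this capture step, assuming pygame is installed; the 288×512 window size is the usual size of Flappy Bird clones, an assumption rather than something stated here. Note that array3d returns the array as (width, height, 3):

```python
import os
os.environ.setdefault("SDL_VIDEODRIVER", "dummy")  # run without a real display

import pygame

pygame.init()
screen = pygame.display.set_mode((288, 512))  # assumed window size
screen.fill((255, 0, 0))  # fill the frame with pure red for the demo

# copy the current frame's RGB values into a 3D array
image_data = pygame.surfarray.array3d(pygame.display.get_surface())
print(image_data.shape)  # (288, 512, 3): width x height x RGB

pygame.quit()
```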

2. Preprocess the initial state

The initial state is the first captured frame, but a state consists of 4 consecutive frames. Here we simply repeat the first frame 4 times to form the initial state of the game:

```python
def get_init_state(self, x_rgb):
    x_gray = cv2.cvtColor(x_rgb, cv2.COLOR_BGR2GRAY)  # convert the RGB image to grayscale
    ret, x_t = cv2.threshold(x_gray, 1, 255, cv2.THRESH_BINARY)  # binarize the grayscale image
    state = np.stack((x_t, x_t, x_t, x_t), axis=2)  # stack the frame 4 times so the dimensions are [80, 80, 4]
    return state
```

Here np.stack((x_t, x_t, x_t, x_t), axis=2) stacks the frame with itself, making the image dimensions [80, 80, 4]: the arguments (x_t, x_t, x_t, x_t) are the images to be stacked, and axis=2 means the stacking happens along the third dimension.
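The effect of the axis argument can be checked with plain NumPy (a standalone sketch using a blank 80×80 frame in place of a real preprocessed game frame):

```python
import numpy as np

x_t = np.zeros((80, 80), dtype=np.uint8)  # stand-in for one preprocessed 80x80 frame

# stack 4 copies along the third dimension, as get_init_state does
state = np.stack((x_t, x_t, x_t, x_t), axis=2)
print(state.shape)  # (80, 80, 4)

# for comparison, axis=0 would stack along the first dimension instead
print(np.stack((x_t, x_t, x_t, x_t), axis=0).shape)  # (4, 80, 80)
```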

3. Preprocess the next state

Similarly, the function that builds the next state of the game is as follows:

```python
def get_next_state(self, state, x_rgb):
    x_gray = cv2.cvtColor(x_rgb, cv2.COLOR_BGR2GRAY)  # convert the RGB image to grayscale
    ret, x_t = cv2.threshold(x_gray, 1, 255, cv2.THRESH_BINARY)  # binarize the grayscale image
    x_t = np.reshape(x_t, (80, 80, 1))  # change the dimensions to (80, 80, 1)
    next_state = np.append(x_t, state[:, :, :3], axis=2)  # prepend the new frame and drop the oldest
    return next_state
```
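The sliding-window behavior of np.append(x_t, state[:, :, :3], axis=2) can be demonstrated with NumPy alone, using constant-valued frames so the ordering is visible (the frame values 0–3 and 9 are purely illustrative):

```python
import numpy as np

# a state whose four frames are filled with 0, 1, 2, 3 respectively
state = np.stack([np.full((80, 80), i, dtype=np.uint8) for i in range(4)], axis=2)

# the newest preprocessed frame, filled with 9, already reshaped to (80, 80, 1)
x_t = np.full((80, 80, 1), 9, dtype=np.uint8)

# prepend the newest frame; state[:, :, :3] keeps frames 0-2 and drops frame 3
next_state = np.append(x_t, state[:, :, :3], axis=2)
print(next_state[0, 0, :])  # [9 0 1 2]
```

So the newest frame always sits at channel index 0 and the oldest surviving frame at index 3.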