Translation | Peng Shuo, Jiang Yi, reason_W
Editing | reason_W
DeepMind has open-sourced its StarCraft II AI platform; OpenAI's system has beaten Dota 2's top players... More and more tech giants are entering the game-AI space and opening up their interfaces and datasets. The complexity of the training data, the constantly changing combat environment, and the need for multi-agent cooperation have led games like StarCraft to be called a key to general intelligence, heralding AI's approach to human-level thinking in increasingly realistic, chaotic environments.
So how does a complete beginner get started with game AI? How does a game AI hook into the game to determine character state, perform actions, and plan strategy?
In this issue, AI Tech Base Camp (WeChat ID: rgznai100) has selected a game-AI series by a foreign blogger to walk you through building a game AI step by step.
Let's dig in
Based on deep learning and other machine learning techniques, we will build a game AI for Path of Exile (PoE). The tutorial is divided into five parts; the original posts are listed below.
- Part One: Overview
- Part Two: Calibrating the Projection Matrix for Path of Exile
- Part Three: Movement and Navigation
- Part Four: Real-Time Screen Capture and Plumbing
- Part Five: Real-Time Obstacle and Enemy Detection with CNNs in TensorFlow
(This article covers all 5 parts and 20 sections in one piece, 8,300+ words, about 17 minutes of reading time.)
The goal of our project is to create a game AI, driven by visual input, that can successfully navigate the game map and defend itself. Along the way, you can also enjoy the fun of building a game AI while playing the game.
Part One: Overview
This part consists of two sections. Original post link:
A Deep Learning Based AI for Path of Exile: A Series
https://nicholastsmith.wordpress.com/2017/07/08/a-deep-learning-based-ai-for-path-of-exile-a-series/
PoE is an action game similar to Diablo, Titan Quest, and other Diablo-style RPGs, as shown in Figure 1 below:
Figure 1: Screenshot of the game
Players interact with the game mainly through the mouse: moving the character, fighting monsters, and opening chests. Keyboard hotkeys can also be bound to special attacks, flasks, and other menu shortcuts.
1. Top-level design
The idea behind this AI is to use a convolutional neural network (CNN) to classify images from the game and build an internal representation of the game world. This internal representation then guides the character through the world. The following flowchart illustrates the basic design of the game AI:
Figure 2: Flow chart of artificial intelligence logic
The main loop of the AI program grabs a static image (screenshot) from the game and passes it to a CNN. The CNN predicts what is going on in that image. These predictions are then passed to the internal world map, which is updated accordingly. Next, the game AI decides on a series of actions based on the current state of the internal map. Finally, these actions are translated into mouse and keyboard inputs and sent to the game, and the cycle repeats. Sounds easy, right? That's right.
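The sketch below restates this loop as code, purely to illustrate the control flow. The objects (sv, cnn, wmap, bot) are placeholders for the components built in later parts, not the author's actual classes.

```python
# Illustrative sketch of the main AI loop described above; all names are
# placeholder assumptions for components built in Parts 2 through 5.
def MainLoop(sv, cnn, wmap, bot):
    while bot.Running():
        im = sv.GetScreen()          # 1. grab the latest game screenshot
        preds = cnn.Classify(im)     # 2. the CNN predicts what is in it
        wmap.Update(preds)           # 3. update the internal world map
        for act in bot.Plan(wmap):   # 4. plan actions from the map state
            act.Send()               # 5. translate to mouse/keyboard input
```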
We chose Python (3.6) as the programming language for this project. The main libraries used are:
- scikit-learn
- TensorFlow
- PyUserInput
- win32gui
- scikit-image
In the following parts, we'll look at how to break these tasks down further and implement them step by step.
Disclaimer
The PoE logo and art are the property of the game's developer, Grinding Gear Games (GGG). The author has no affiliation with GGG, and the ideas and opinions expressed here do not represent GGG's. The purpose of this article is to explore artificial intelligence and deep learning without infringing copyright or terms of service.
Part Two: Calibrating the Projection Matrix for Path of Exile
This part consists of five sections. Original post link:
Calibrating a Projection Matrix for Path of Exile
https://nicholastsmith.wordpress.com/2017/07/09/calibrating-a-projection-matrix-for-path-of-exile/
In this section, we'll explore how static images of the game can be used to update an internal representation of the game world.
1. The challenge of visual input
One of the difficulties in driving a game AI from visual input is that the graphical data is 2D, while the game world is 3D. Most likely, the game engine maintains its own internal 3D representation of the world and uses a projection to render the game in 2D on the screen. Extracting data from that internal representation via reverse engineering could be useful, but we won't explore it here, since our ultimate goal is to build a game AI from visual input.
To model the world accurately, our approximation of the game's projection matrix should be as close to the true one as possible. This matrix determines the 2D screen coordinates corresponding to given 3D world coordinates and (with some assumptions) vice versa. Figure 3 illustrates the basic concept of projection mapping. The rectangle on the left represents the screen, while the axes on the right represent world coordinates. The gray lines (the projection mapping) map the blue points from world coordinates to positions on the screen.
Figure 3: Projection mapping
The process of approximating the projection matrix given a 2D image is called camera calibration.
2. Camera calibration
Camera calibration is performed using an image that contains an object of known 3D dimensions. The mapping from 3D coordinates to 2D coordinates defines an optimization problem whose solution is the transformation matrix. This idea is expressed in Equation 1.
Equation 1: The projection transformation, AW = P
In the above equation, A is the projection matrix, W is the world point (3D) coordinate matrix, and P is the projection point (2D) coordinate matrix.
To calibrate PoE's camera, that is, to determine A in the equation above, we use several boxes of fixed size. The camera calibration process is shown in the screenshot below.
Figure 4: Camera calibration
Note that the approximate corners of the boxes are marked with points and labeled with their corresponding world coordinates. This process is tedious and must be done manually. We take the lower-right corner of the box in the middle of the picture as the origin (0, 0, 0) and assume each box is a unit cube; the spacing between boxes is also one unit. Note also that this calibration is for a screen resolution of 800×600. At other resolutions, pixel sizes change, and the camera must be recalibrated.
An abbreviated set of the resulting data points is shown in Table 1.
Table 1: Data mapping (world point coordinates and the corresponding projected screen coordinates)
Next, we fit the transformation matrix A that projects the 3D points onto the 2D points.
3. Performing the fit
We use TensorFlow to perform a nonlinear fit. Note that it is more common to treat the calibration problem as a homogeneous least-squares problem; however, the Adam optimizer seemed to give better results for this particular image.
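The code in the original post appears as an image; the following is a minimal sketch of the same idea in TensorFlow 1.x. Because the labeled points of Table 1 are abbreviated above, the sketch synthesizes stand-in data from a known matrix so it runs end to end; the variable shapes and learning rate are assumptions.

```python
import numpy as np
import tensorflow as tf

# Stand-in data: in practice, W (4 x n homogeneous world points) and
# P (3 x n homogeneous screen points) come from the labeled box corners.
rng = np.random.RandomState(0)
Atrue = rng.rand(3, 4)
W = np.vstack([rng.rand(3, 12), np.ones((1, 12))])
P = Atrue.dot(W)
P = P / P[-1:, :]

A = tf.Variable(tf.random_normal([3, 4], dtype=tf.float64))
WT = tf.constant(W)
PT = tf.constant(P)
proj = tf.matmul(A, WT)
proj = proj / proj[-1:, :]                  # normalize the homogeneous row
loss = tf.reduce_sum(tf.square(proj - PT))  # squared reprojection error
opt = tf.train.AdamOptimizer(1e-2).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(8192):
        sess.run(opt)
    AM = sess.run(A)                        # the fitted projection matrix
print(AM)
```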
The projection matrix produced by running the code above is shown in Equation 2. The exact values may differ slightly due to random initialization.
Equation 2: the resulting projection matrix
Using the formula given in Equation 3, the camera position C can be recovered in world coordinates. Note that Q is a 3×3 matrix and m is a 3×1 vector.
Equation 3: Camera position recovery, C = -Q⁻¹m
The following code performs this recovery on the fitted projection matrix; the recovered camera position is stored in the variable CP.
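A minimal sketch of that step, assuming the fitted 3×4 matrix AM is partitioned as [Q | m] in the sense of Equation 3 (the variable name CP follows the text):

```python
# Partition AM = [Q | m] and recover the camera position CP = -Q^{-1} m.
Q = AM[:, 0:3]
m = AM[:, 3]
CP = -np.linalg.solve(Q, m)
print(CP)
```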
4. Results
The recovered camera position is (5.322, -4.899, 10.526). Looking back at the initial screenshot, this value agrees with intuition. The world-space coordinates take the height, width, and depth of a box as the unit of length, so the camera sits roughly 5 box lengths along the positive x axis, 4 box lengths along the negative y axis, and 10 box lengths up the z axis. Using this projection matrix, we can project points onto the original image. The following figure shows a grid of points in the XY plane projected onto the original image.
Figure 5: Grid points in the XY plane projected onto the original image
The projection above looks reasonable. With a well-calibrated projection matrix, a matrix of 3D points (one point per row) can be projected onto the screen using the following function.
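A minimal sketch of such a function under the conventions above; the function name is illustrative:

```python
# Project an n x 3 matrix of 3D world points (one point per row) through
# the fitted 3 x 4 matrix AM, returning n x 2 pixel coordinates.
def ProjectPoints(W, AM):
    WH = np.vstack([W.T, np.ones((1, W.shape[0]))])  # 4 x n homogeneous
    P = AM.dot(WH)                                   # 3 x n projected
    return (P[0:2] / P[2]).T                         # normalize, transpose
```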
5. Assuming planar movement
If we assume the character moves only in the XY plane, the character's 3D position can be recovered from its pixel coordinates: given the character's pixel coordinates, set z = 0 and solve the projection equations for x and y. The code to do this is as follows.
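A sketch of the inverse mapping under the z = 0 assumption: writing out the projection equations and eliminating the homogeneous coordinate leaves two linear equations in x and y.

```python
# Recover the world point (x, y, 0) that projects to pixel (px, py),
# assuming the character stands in the z = 0 plane.
def Get3DPoint(px, py, AM):
    M = np.array([
        [AM[0, 0] - px * AM[2, 0], AM[0, 1] - px * AM[2, 1]],
        [AM[1, 0] - py * AM[2, 0], AM[1, 1] - py * AM[2, 1]]])
    b = np.array([px * AM[2, 3] - AM[0, 3],
                  py * AM[2, 3] - AM[1, 3]])
    x, y = np.linalg.solve(M, b)
    return np.array([x, y, 0.0])
```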
In the two functions above, the transpose computations on the projection matrix are the main cost. With these two functions in place, we can use the following code to compute the grid points of the XY plane visible on an 800×600 screen. This function will later be key to tracking the player's position on a flat, single-level plane.
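A sketch of computing those grid points with the helpers above; the 40-pixel sampling step is an arbitrary choice for illustration:

```python
# Map a coarse grid of screen pixels on an 800x600 window back to world
# points in the XY plane.
def GetGridPoints(AM, w=800, h=600, step=40):
    return np.array([Get3DPoint(px, py, AM)
                     for px in range(0, w, step)
                     for py in range(0, h, step)])
```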
In PoE, the camera moves with the player (the camera angle is fixed). To track the moving camera and player, world points are translated back to their original positions before being projected. In practice, this is done by multiplying the projection matrix by a translation matrix to obtain the final projection matrix. Equation 4 shows a translation matrix that translates a set of points by the vector (x, y, z); a code sketch follows the equation.
Equation 4: A translation matrix
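A sketch of composing the translation with the projection, using the standard homogeneous translation matrix (the offset sits in the last column under the column-vector convention used in these sketches):

```python
# Homogeneous translation matrix for an offset (x, y, z).
def TranslationMatrix(x, y, z):
    T = np.eye(4)
    T[0:3, 3] = [x, y, z]
    return T

# For a player/camera offset of (dx, dy, dz), shift world points back to
# the origin before projecting:
# AMfinal = AM.dot(TranslationMatrix(-dx, -dy, -dz))
```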
We can use matplotlib (https://matplotlib.org/) to build an animation in the XY plane that simulates character movement in the world. In the animation below, the camera moves linearly between several randomly generated points.
Figure 6: Camera translation
With the code above, on-screen distances can be handled much more precisely. For simplicity, we assume the player always moves in the XY plane. At some elevations, however, this is not a reliable assumption, and this part may need to be revisited depending on the AI's performance.
Part Three: Movement and Navigation
This part corresponds to the third post in the series: PoE AI Part 3: Movement and Navigation
Original post link:
https://nicholastsmith.wordpress.com/2017/07/18/poe-ai-part-3-movement-and-navigation/
In this section, we will explore techniques for moving the game character across a flat surface of uniform height.
1. The movement map
In PoE, the player typically moves the character by clicking a location; the character then walks to where the mouse clicked. Figure 7 shows an example of moving the character with a mouse click.
Figure 7: Character movement
To handle navigation, the AI maintains a data structure representing a map of the world, implemented as a mapping from coordinates to location types. For example, at a given moment, the AI might have the data shown in Table 2 in its internal map.
Table 2: Internal map (world point coordinates and their location types)
The map records the locations visited and their types, which indicate whether the player can move to a given position: each location is either "open" or "blocked". With such a map, breadth-first search can find the shortest path from one location to another, as sketched below.
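A minimal sketch of that search, assuming the internal map is a dict from integer (x, y, z) grid points to "open"/"blocked" labels:

```python
from collections import deque

# Breadth-first search over the internal map; returns the shortest list of
# grid points from start to goal, or None if no known path exists.
def ShortestPath(wmap, start, goal):
    q = deque([start])
    prev = {start: None}
    while q:
        cur = q.popleft()
        if cur == goal:                  # walk back through the prev links
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        x, y, z = cur
        for nxt in ((x+1,y,z), (x-1,y,z), (x,y+1,z), (x,y-1,z)):
            if nxt not in prev and wmap.get(nxt) == "open":
                prev[nxt] = cur
                q.append(nxt)
    return None
```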
2. Mapping between dimensions
Now, suppose the player is at position (0, 0, 0) and wants to move to (1, 1, 0). Where should the mouse click on the screen? Recall from the previous part that the calibrated projection matrix lets us approximate the player's position in 3D coordinates. The same matrix transforms the point (1, 1, 0) to determine its position on the screen, and that is where the mouse should click.
In practice, I found that this way of moving is quite inaccurate when the AI assigns the character a target point, especially when the click lands on an obstacle. In that case, the character usually stops somewhere near the clicked position. The picture below shows an example.
Figure 8: Moving towards an obstacle
The picture shows the result of clicking the mouse on top of an obstacle. Note that while the player moves toward the mouse click, he stops at the obstacle.
3. Lightning Warp
Unfortunately, situations like Figure 8 cause the AI's internal map to drift out of sync with reality. To solve this problem, I decided to move using the Lightning Warp skill. Figure 9 shows the effect of three teleports.
Figure 9: Lightning Warp
For character movement, the advantage of Lightning Warp is that a move has only two easily distinguished outcomes: either the player teleports to the specified position, or the player does not move at all. This keeps the player's position in the AI's internal map in sync with the player's actual position. So, to move to position X, the AI first projects X onto the screen, moves the mouse to that position, and presses the appropriate hotkey to trigger Lightning Warp, as sketched below.
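A minimal sketch of that move step, using PyUserInput from the library list in Part One and the ProjectPoints helper from Part Two. The hotkey 'w' and the window-offset arguments are illustrative assumptions, not the author's exact bindings.

```python
from pymouse import PyMouse
from pykeyboard import PyKeyboard
import numpy as np

mouse, keyboard = PyMouse(), PyKeyboard()

# Teleport toward the 3D world point wp via Lightning Warp.
def MoveTo(wp, AM, winLeft=0, winTop=0):
    px, py = ProjectPoints(np.array([wp]), AM)[0]  # project onto screen
    # Convert window coordinates to absolute screen coordinates
    mouse.move(int(px) + winLeft, int(py) + winTop)
    keyboard.tap_key('w')   # hotkey assumed to be bound to Lightning Warp
```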
4. Motion detection
Now, the only remaining challenge is to verify that the teleport actually executed: if the click lands on an obstacle, the character does not teleport. To predict this accurately, we build a binary classifier that takes a portion of the screen as input and predicts whether a teleport is currently taking place. The program first extracts the 70×70 rectangle around the character from the screen as the model's input.
To build the model, we manually construct a dataset from static images of the game. Figure 10 shows samples from the dataset.
Figure 10: Lightning Warp classifier data
The code to perform the prediction is sketched below. It assumes the file lwtrain.csv contains lines of the form: filename, Y/N. On each line, filename is the path to an image file like those above, Y indicates the image shows a teleport in progress, and N indicates it does not.
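The post shows this code as an image; below is a minimal sketch of the same idea using scikit-learn and scikit-image from the library list in Part One. The classifier type and its parameters are assumptions; the original's exact model may differ.

```python
import csv
import numpy as np
from skimage.io import imread
from sklearn.neural_network import MLPClassifier

# Load lwtrain.csv: each row is "filename, Y/N" as described above.
def LoadLWData(fn='lwtrain.csv'):
    X, Y = [], []
    with open(fn) as f:
        for fname, label in csv.reader(f):
            X.append(imread(fname.strip()).ravel())  # flatten 70x70 patch
            Y.append(1 if label.strip() == 'Y' else 0)
    return np.array(X), np.array(Y)

X, Y = LoadLWData()
lwc = MLPClassifier(hidden_layer_sizes=(64,)).fit(X, Y)

# Predict whether a teleport is in progress in the 70x70 region around
# the character.
def DetectLW(patch):
    return bool(lwc.predict(patch.ravel().reshape(1, -1))[0])
```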
So, after pressing the hotkey, the AI calls the DetectLW function repeatedly to check whether the move succeeded. On success, the character's position on the map is updated. If no teleport is detected within a certain amount of time, the move is assumed to have failed, and the player's position on the map is left unchanged.
Part Four: Real-Time Screen Capture and Plumbing
PoE AI Part 4: Real-time Screen Capture and Plumbing
Original post link:
https://nicholastsmith.wordpress.com/2017/08/10/poe-ai-part-4-real-time-screen-capture-and-plumbing/
As discussed in the first part of this series, the AI program takes screenshots of the game and uses them to make predictions and update its internal state. In this section, we'll explore how to capture those screenshots.
Figure 11: AI logic flowchart
1. Available libraries
There are many Python libraries that can capture screenshots, such as pyscreenshot (https://pypi.python.org/pypi/pyscreenshot) and ImageGrab from PIL (http://pillow.readthedocs.io/en/3.1.x/reference/ImageGrab.html). Here is a simple program to test capture performance:
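A sketch of such a timing test using PIL's ImageGrab; the loop count is arbitrary:

```python
import time
import numpy as np
from PIL import ImageGrab

n = 32
t0 = time.time()
for _ in range(n):
    im = np.array(ImageGrab.grab())   # full-desktop screenshot
t1 = time.time()
print('%.2f FPS' % (n / (t1 - t0)))
```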
Unfortunately, the results are not good: expect such a program to process only about five or six frames per second. Also, in the code above, the screen capture runs on the main thread, so the entire program waits on the capture and cannot process anything or interact with the game in the meantime. A further problem is that we only want to capture the game window (in windowed mode), not the rest of the desktop.
2. Using the Windows API
We can mitigate these problems and improve performance by calling several Windows API functions, as sketched below.
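The original code appears as an image; the following is a minimal sketch reconstructing the approach described below with pywin32. The border sizes are assumptions, and error handling is omitted.

```python
import numpy as np
import win32con, win32gui, win32ui

class ScreenViewer:
    def __init__(self):
        self.hwnd = None
        # Left, right, top, and bottom window border sizes (assumed values)
        self.bl, self.br, self.bt, self.bb = 8, 8, 31, 8

    def GetHWND(self, wname):
        # Get a handle to the game window by its title
        self.hwnd = win32gui.FindWindow(None, wname)
        if self.hwnd == 0:
            return False
        # Position of the game window on the screen
        self.l, self.t, self.r, self.b = win32gui.GetWindowRect(self.hwnd)
        return True

    def GetScreenImg(self):
        # Capture the window contents into a numpy array
        w, h = self.r - self.l, self.b - self.t
        wDC = win32gui.GetWindowDC(self.hwnd)
        dcObj = win32ui.CreateDCFromHandle(wDC)
        cDC = dcObj.CreateCompatibleDC()
        bmp = win32ui.CreateBitmap()
        bmp.CreateCompatibleBitmap(dcObj, w, h)
        cDC.SelectObject(bmp)
        cDC.BitBlt((0, 0), (w, h), dcObj, (0, 0), win32con.SRCCOPY)
        im = np.frombuffer(bmp.GetBitmapBits(True), dtype=np.uint8)
        im = im.reshape(h, w, 4)
        win32gui.DeleteObject(bmp.GetHandle())
        cDC.DeleteDC()
        dcObj.DeleteDC()
        win32gui.ReleaseDC(self.hwnd, wDC)
        # Drop the window borders, then trim so that the height and width
        # are multiples of 7 and 9, respectively
        im = im[self.bt:h - self.bb, self.bl:w - self.br]
        im = im[:im.shape[0] - im.shape[0] % 7, :im.shape[1] - im.shape[1] % 9]
        # BGRA -> RGB: reverse the channel order and drop the alpha channel
        return im[:, :, -2::-1]
```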
In the GetHWND function above, we use win32gui.FindWindow(None, wname) to get a handle to the game window. Here, wname should be "Path of Exile", i.e., win32gui.FindWindow(None, "Path of Exile").
Given the window handle, win32gui.GetWindowRect(self.hwnd) returns the position of the game window on the screen. These values are needed to convert mouse positions within the game window (size 800×600) into absolute positions on the screen (typically something like 1920×1080).
The GetScreenImg function captures the game screen image and stores it in a numpy array. Three things are worth noting in this code. First, the game window has a border that is not useful to the AI and can be discarded; the variables self.bl, self.br, self.bt, and self.bb store the sizes of the left, right, top, and bottom borders, respectively. Second, the edges of the image are trimmed so that the height and width are multiples of 7 and 9, respectively; the reason for this will be covered in the next part of this series. Third, the bitmap data returned by the Windows API is organized as four 8-bit integers per pixel in BGRA order (blue, green, red, and alpha channels), while most Python image libraries expect three channels in RGB order. The last line of GetScreenImg reverses the channel order and discards the alpha channel, which is not used here.
3. Using parallelism
Since images must be captured from the game continuously, we run the capture code in its own thread and provide an interface that lets other threads read images asynchronously and safely, so the latest frame is always available. This can be done with the Thread and Lock objects from the threading library, as sketched below.
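A minimal sketch of that interface, continuing the ScreenViewer class from the previous sketch; the method names (ScreenUpdateT, GetScreen) mirror those mentioned in the text, but the details are reconstructed, not verbatim.

```python
import threading

class ScreenViewer:
    # (GetHWND and GetScreenImg as in the previous sketch)

    def Start(self):
        # Launch the capture loop on a background thread
        self.mut = threading.Lock()
        self.its = True                 # capture-loop flag
        self.i0 = None                  # most recent frame
        self.t = threading.Thread(target=self.ScreenUpdateT)
        self.t.start()

    def ScreenUpdateT(self):
        # Continuously capture the screen in the background
        while self.its:
            im = self.GetScreenImg()
            with self.mut:
                self.i0 = im

    def GetScreen(self):
        # Thread-safe access to the most recent frame
        with self.mut:
            return self.i0

    def Stop(self):
        self.its = False
        self.t.join()
```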
4. Results
To time the new code, the measurement should happen inside the ScreenUpdateT function. A quick-and-dirty way to add timing is sketched below:
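A sketch of that timing variant (illustrative only):

```python
import time

# Quick-and-dirty timing inside ScreenUpdateT: print the average FPS
# every 100 captured frames.
def ScreenUpdateT(self):
    n, t0 = 0, time.time()
    while self.its:
        im = self.GetScreenImg()
        with self.mut:
            self.i0 = im
        n += 1
        if n % 100 == 0:
            print('%.2f FPS' % (n / (time.time() - t0)))
```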
This is an order of magnitude faster: the theoretical maximum processing speed of the AI is now around 64 FPS. The main AI program accesses the screen image through a data member of type ScreenViewer, similar to the code below.
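A sketch of that usage, assuming the ScreenViewer class above:

```python
sv = ScreenViewer()
if sv.GetHWND('Path of Exile'):
    sv.Start()
    im = sv.GetScreen()   # latest frame as an RGB numpy array
    # ... pass im to the CNNs and update the internal map ...
    sv.Stop()
```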
The final part of this series will show how to use convolutional neural networks (CNNs) to process the screen images and update the AI's state. (Note: almost done...)
Part Five: Real-Time Obstacle and Enemy Detection with CNNs in TensorFlow
This part corresponds to the fifth post in the series: PoE AI Part 5: Real-Time Obstacle and Enemy Detection Using CNNs in TensorFlow
Original post link:
https://nicholastsmith.wordpress.com/2017/08/25/poe-ai-part-5-real-time-obstacle-and-enemy-detection-using-cnns-in-tensorflow/
As discussed in the first part of this series, the AI program takes screenshots of the game and uses them to make predictions and update its internal state. In this section, we discuss how to take the visual input from the game graphics and classify and identify that information. The source code is already up on my GitHub (https://github.com/nicholastoddsmith/poeai). Enjoy!
1. Classification system architecture
Figure 12: AI logic flowchart
Recall from Part 3 that the movement map maintains a dictionary from 3D points to labels. For example, at a given moment, the bot might have the data shown in Table 3 in its internal map.
Table 3: Internal map (world point coordinates and their labels)
Recall from Part 2 that the projection-map class allows any pixel on the screen to be mapped to 3D coordinates (assuming the player is always on the XY plane; the 3D coordinates are then quantized to some arbitrary precision, making the AI's world map an evenly spaced grid of points).
What we need, therefore, is a way to determine whether a given pixel on the screen is part of an obstacle, an enemy, an item, and so on. This task is essentially object detection. However, real-time object detection is a hard, computationally expensive problem. Here we present a simplified solution that strikes a good balance between performance and accuracy.
To simplify the detection task, the game screen is divided into rectangular regions of equal size. For an 800×600 screen, we chose a grid of m = 7 rows and n = 9 columns. Twelve, four, and four pixels are trimmed from the bottom, left, and right edges of the image so that the resulting dimensions (792 and 588) are divisible by 9 and 7, respectively. Each rectangle in the screen grid is thus 88 pixels wide and 84 pixels high. Figure 13 shows a game screen segmented with this scheme.
Figure 13: Game screen blocks
A convolutional neural network (CNN) handles the classification task of judging whether a screen cell contains an obstacle or is open. An obstacle means something occupies the cell so the player cannot stand there (a boulder, for example). Examples of open and closed cells are shown in Figure 14.
Figure 14: Image cell labels
A second CNN is used for the task of identifying items and enemies. Given a cell on the screen, this CNN classifies it as containing an enemy, an item, or nothing.
To target only living enemies, a third CNN acts as a binary classifier that determines whether movement has occurred: given a cell on the screen, it decides whether there is movement in that cell. Only cells that contain movement are passed to the second CNN, which then predicts whether they contain items or enemies. Items pass the movement check because the item-highlight key is toggled between consecutive screenshots.
The image data for motion detection is obtained by capturing two frames in quick succession and keeping only the regions of the image that differ significantly. This is done with the numpy.where function (16 is an arbitrarily chosen threshold), as sketched below.
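A sketch of that differencing step; im1 and im2 are two RGB frames captured in quick succession:

```python
import numpy as np

# Keep pixels of the newer frame where the frames differ significantly;
# 16 is the arbitrarily chosen threshold mentioned above.
def GetMotionImage(im1, im2):
    d = np.abs(im1.astype(np.int16) - im2.astype(np.int16)).sum(axis=2)
    return np.where(d[:, :, None] > 16, im2, 0).astype(np.uint8)
```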
In summary, the screenshots captured from the game are fed to each of the three CNNs. The first CNN detects obstacles in the screen cells, and the 3D grid points within each cell are labeled accordingly in the movement map. The internal map retains the predictions for each cell and reports the most frequently predicted category when a cell is queried. The second and third CNNs are used together to detect enemies and items.
2. The dataset
The training dataset is built manually from ScreenViewer screenshots. Currently, it covers only the "Dried Lake" level in Act 4 of the game. The dataset consists of more than 14,000 files in 11 folders and is 164MB in size. A screenshot of the dataset is shown in Figure 15.
Figure 15: Training data set
In the dataset, the images in the Closed folder are cells containing obstacles. The first CNN uses the folders Closed, Open, and Enemy. The second CNN uses the folders Open, Enemy, and Item. The third CNN uses the folders Move and NoMove.
3. Training
The AI uses a fairly modest CNN architecture: two convolution-and-pooling sequences followed by three fully connected layers. The architecture is shown in Figure 16 below, and a code sketch follows the figure.
Figure 16: CNN architecture
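A minimal sketch of this architecture in TensorFlow 1.x; the filter counts, kernel sizes, and layer widths are assumptions, since the post shows the exact values only in the figure:

```python
import tensorflow as tf

# Two convolution+pooling blocks followed by three fully connected layers.
# X: [batch, 84, 88, 3] image cells; nc: number of output classes.
def BuildCNN(X, nc):
    c1 = tf.layers.conv2d(X, 16, 5, activation=tf.nn.relu)
    p1 = tf.layers.max_pooling2d(c1, 2, 2)
    c2 = tf.layers.conv2d(p1, 32, 5, activation=tf.nn.relu)
    p2 = tf.layers.max_pooling2d(c2, 2, 2)
    f = tf.layers.flatten(p2)
    d1 = tf.layers.dense(f, 256, activation=tf.nn.relu)
    d2 = tf.layers.dense(d1, 128, activation=tf.nn.relu)
    return tf.layers.dense(d2, nc)   # class logits
```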
Cross-validation accuracy over the entire dataset reaches roughly 90% with about 20 to 30 epochs of training. An epoch is performed by randomly drawing batches of size 32 from the training data until the appropriate number of samples has been drawn. Training takes about 5 to 10 minutes on an NVIDIA GTX 970.
4. Using parallelism for better performance
To improve the AI's performance, the CNN detections are executed in parallel. This speedup is possible because numpy and TensorFlow code avoids the global interpreter lock (GIL) problems of ordinary Python code. The startup code for the enemy-classification thread is sketched below.
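The original startup code appears as an image; below is a sketch following the pattern described in the text. The names (Bot, ECP, EnemyDetectT) mirror the surrounding description, but the structure is reconstructed, not verbatim.

```python
import threading

class Bot:
    def StartEnemyDetect(self, sv):
        # sv: the ScreenViewer from Part Four
        self.sv = sv
        self.ecpMut = threading.Lock()
        self.ECP = []                      # shared enemy cell positions
        t = threading.Thread(target=self.EnemyDetectT)
        t.daemon = True
        t.start()

    def EnemyDetectT(self):
        while True:
            im = self.sv.GetScreen()           # latest frame (thread-safe)
            ecp = self.ClassifyEnemyCells(im)  # run the 2nd/3rd CNNs
            with self.ecpMut:
                self.ECP = ecp

    def ClassifyEnemyCells(self, im):
        # Placeholder: split im into cells and apply the movement and
        # enemy/item CNNs from Section 1 of this part.
        return []
```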
Figure 17: Thread logical organization
The classification is thus performed in parallel, and the data members containing the predictions are supplied to the main thread in a thread-safe manner using mutexes. Figure 17 illustrates the logical organization of the threads and mutexes. In the figure, ECP and PCT are data members of the Bot class that contain the positions of enemy cells and the predicted cell types, respectively.
5. Results
The following video, just over six minutes long, summarizes the project, including a four-minute segment showing the AI playing Path of Exile (PoE).
Figure 18: PoE AI demo video
Video address (YouTube; may require a VPN in China): https://youtu.be/UrrZOswJaow
More of the author's latest game-AI videos can be found on his YouTube page:
https://www.youtube.com/channel/UCdkASWTlm-9EuAdZmbhkxgQ
After reading such a careful, detailed tutorial, are you tempted to build one yourself?
Nicholas T. Smith is a graduate of the University of California with a master's degree in computer science, specializing in AI and machine learning software development.
Original link:
https://nicholastsmith.wordpress.com/2017/08/25/poe-ai-part-5-real-time-obstacle-and-enemy-detection-using-cnns-in-tensorflow/
GitHub address: https://github.com/nicholastoddsmith/poeai