An Overview of Image Recognition in Games
Computer vision (CV) has been quite successful in the real world: face recognition, license plate recognition, fingerprint comparison, electronic image stabilization, and pedestrian and vehicle tracking are all part of daily life. What about other areas, such as mobile games, which people play constantly? Game scene images still differ from real-world images. Some game scenes are relatively complex: special effects cause interference, game characters do not follow the rules of real people, and artistic fonts are neither fixed the way license-plate fonts are nor set against a uniform background color. Other elements are relatively simple, such as a fixed icon at a fixed position in the game; for these, traditional image detection methods can achieve good results. In this article, we take a look at common approaches to game scene recognition.
1. Process
Game scene recognition can be divided into two modules, GameClient and CVServer. The GameClient module is responsible for obtaining real-time images from mobile phones or PCs and sending them to CVServer. CVServer processes the received game images and returns the results to GameClient, which does further processing as required and then feeds the results back to the game side. The process is shown in Figure 1.
Figure 1. Main process of game scene recognition
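As a rough illustration of this loop, the sketch below sends one captured frame from the client side to a CVServer over HTTP. The endpoint URL, the capture_screen helper, and the JSON reply format are all hypothetical placeholders; the actual GameClient/CVServer protocol is not specified in this article.

```python
import cv2
import requests

def send_frame_to_cvserver(frame, url="http://cvserver:8080/recognize"):
    """Encode one captured game frame and post it to CVServer.

    `url` and the JSON reply format are hypothetical; the real
    GameClient/CVServer protocol is not described in the text.
    """
    ok, buf = cv2.imencode(".png", frame)
    if not ok:
        raise ValueError("frame could not be encoded")
    resp = requests.post(url, data=buf.tobytes(),
                         headers={"Content-Type": "application/octet-stream"})
    return resp.json()  # e.g. {"state": "...", "objects": [...]}
```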
2. Application Examples
The previous section covered the main process of scene recognition in games. This section analyzes concrete applications of image recognition in games.
2.1 Determination of game state
Each game UI is called a game state; a game can be thought of as consisting of many different UIs. We first build a sample library of these UIs; then, for each game image obtained in real time, the current game state can be determined by comparing the current image against the sample images. There are many methods to decide whether two images are similar. Here we take feature-point matching as an example; the main steps are as follows:
Step 1: Extract feature points from the sample image and the test image
Figure 2. Extraction of feature points
Step 2: Feature point matching
Figure 3. Feature point matching
Step 3: Match screening
Figure 4. Match screening based on the ratio test
ORB feature point matching is a relatively mature technology. In the collected test data sets, image size or UI rendering position can vary greatly because of differences in phone resolution, screen notches, or rendering, and the commonly used template matching struggles to adapt to this. A matching scheme based on feature points does not suffer from this problem: feature points generally refer to corners or salient points in an image and have no strong dependence on the position or size of elements in the image, so they are more widely applicable. The ORB feature combines the FAST feature-point detector with the BRIEF feature descriptor, improving and optimizing both; ORB feature points have rotation invariance and scale invariance. Feature-point extraction, description, matching, and screening are introduced below.
2.1.1 Feature Point Extraction: FAST
The basic idea of FAST is that if a pixel P differs greatly in gray value from enough of the 16 pixels on the circle around it (numbered 1 to 16), P may be a corner point. The original FAST feature points have no scale invariance; the ORB implementation in OpenCV achieves scale invariance by building an image pyramid and then detecting corners on each pyramid layer. The original FAST also has no orientation, and the ORB paper proposed a gray-scale centroid method to solve this problem. For any feature point P, the moments of P's neighborhood pixels are defined as

$$m_{pq} = \sum_{x,y} x^p y^q \, I(x, y), \qquad p, q \in \{0, 1\}$$

where $I(x, y)$ is the gray value at point $(x, y)$. The centroid of the neighborhood is

$$C = \left( \frac{m_{10}}{m_{00}},\ \frac{m_{01}}{m_{00}} \right)$$

and the angle of the vector from the feature point P to the centroid C is taken as the orientation of the FAST feature point:

$$\theta = \operatorname{atan2}(m_{01}, m_{10})$$
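As a quick illustration of the gray-scale centroid method, here is a minimal sketch assuming a square patch (OpenCV's actual ORB implementation restricts the sum to a circular region):

```python
import numpy as np

def orientation(patch):
    """Keypoint orientation from the gray-scale centroid of its patch.

    `patch` is a square grayscale neighborhood centered on the keypoint.
    Simplified sketch: ORB sums over a circular region, we use the square.
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs = xs - w // 2  # coordinates relative to the keypoint P
    ys = ys - h // 2
    m10 = float((xs * patch).sum())
    m01 = float((ys * patch).sum())
    return np.arctan2(m01, m10)  # theta = atan2(m01, m10)
```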
Figure 5. Schematic diagram of FAST feature points (image from the paper: Faster and Better: A Machine Learning Approach to Corner Detection)
2.1.2 Feature Description: BRIEF
The core idea of BRIEF is to select N point pairs around the keypoint P according to some sampling pattern, compare the intensities within each pair, and concatenate the N comparison results into a binary string of length N that serves as the keypoint's descriptor. When computing BRIEF descriptors, ORB establishes a local coordinate system with the keypoint P as the origin, the line from P to the centroid Q of its neighborhood as the X axis, and the perpendicular direction as the Y axis. Because this coordinate system rotates with the patch, the point pairs extracted for the same feature point are consistent under different rotation angles, which solves the rotation-consistency problem.
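A toy sketch of the BRIEF binary test, under simplifying assumptions (real BRIEF smooths the patch first, and ORB replaces random sampling with a learned pattern and rotates the pairs by the keypoint orientation):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256  # descriptor length in bits, as in ORB
# Random point pairs inside a 31x31 patch; ORB actually uses a learned pattern.
pairs = rng.integers(-15, 16, size=(N, 4))

def brief(patch):
    """Binary descriptor: bit i is 1 if intensity at p1 < intensity at p2.

    `patch` must be a grayscale neighborhood of at least 31x31 pixels
    centered on the keypoint.
    """
    cy, cx = patch.shape[0] // 2, patch.shape[1] // 2
    bits = [int(patch[cy + y1, cx + x1] < patch[cy + y2, cx + x2])
            for x1, y1, x2, y2 in pairs]
    return np.packbits(bits)  # 256 bits -> 32 bytes
```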
2.1.3 Feature Point Matching: Hamming Distance
The Hamming distance between two binary strings of equal length is the number of positions at which the corresponding bits differ. ORB uses the Hamming distance to measure the distance between two descriptors.
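For two 256-bit ORB descriptors stored as 32 bytes each, the Hamming distance is a XOR followed by a bit count, for example:

```python
import numpy as np

# Two 32-byte (256-bit) ORB-style descriptors.
a = np.random.randint(0, 256, size=32, dtype=np.uint8)
b = np.random.randint(0, 256, size=32, dtype=np.uint8)
# XOR marks the differing bits; counting them gives the Hamming distance.
distance = int(np.count_nonzero(np.unpackbits(a ^ b)))
```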
2.1.4 Feature Point Screening: Ratio Test
The ratio test eliminates ambiguous matches: for each descriptor, the distance to its nearest neighbor is divided by the distance to its second-nearest neighbor, and matches whose ratio is close to 1 (nearest and second-nearest neighbors are almost equally good) are discarded. A threshold parameter controls how large the ratio may be. As shown in the figure below, a ratio threshold of around 0.75 best separates correct matches from incorrect ones.
Figure 6. PDF of the ratio of nearest-neighbor to second-nearest-neighbor distance. The solid line is the PDF of the ratio for correct matches; the dotted line is the PDF for incorrect matches. Image from the paper: D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, 2004
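Putting sections 2.1.1 through 2.1.4 together, the sketch below is a minimal OpenCV version of the whole pipeline. The matches_state helper, the 0.75 ratio, and the minimum-match threshold are assumptions to be tuned on real game data:

```python
import cv2

def matches_state(sample, frame, ratio=0.75, min_good=20):
    """Decide whether `frame` is in the game state represented by `sample`.

    sample, frame: grayscale images. `ratio` (0.75, after Lowe) and
    `min_good` are assumed thresholds to be tuned on real game data.
    """
    orb = cv2.ORB_create(nfeatures=500)
    kp1, des1 = orb.detectAndCompute(sample, None)  # Step 1: extract feature points
    kp2, des2 = orb.detectAndCompute(frame, None)
    if des1 is None or des2 is None:
        return False
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)            # Step 2: Hamming-distance matching
    pairs = bf.knnMatch(des1, des2, k=2)
    good = []
    for pair in pairs:                              # Step 3: ratio-test screening
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return len(good) >= min_good
```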
2.2 Scene Coverage
The method based on feature-point matching can also be applied to scene coverage. First, template images of the core scenes are loaded. While the AI plays, it collects a large number of game screenshots, which form a test data set. Each test image is then matched, using the feature-point algorithm above, against each core-scene template image, and the match results are filtered to decide which test images correspond to which core scenes. From the matched images and the number of core scenes matched, the scene coverage achieved during the AI's run can be estimated.
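A sketch of the coverage estimate, reusing the hypothetical matches_state helper from section 2.1:

```python
def scene_coverage(core_scenes, screenshots):
    """Estimate the fraction of core scenes visited during an AI run.

    core_scenes: dict mapping scene name -> template image.
    screenshots: list of images collected while the AI plays.
    Reuses the matches_state() sketch from section 2.1.
    """
    covered = {name for name, template in core_scenes.items()
               if any(matches_state(template, shot) for shot in screenshots)}
    return len(covered) / len(core_scenes)
```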
2.3 Recognition of numbers in the game
There are many digit images in games, such as level numbers, scores, and countdown timers. We can recognize these digits with a CNN-based method. Classification based on CNNs was proposed long ago: the LeNet network dates from 1998 and consists of two convolution layers, two pooling layers, two fully connected layers, and a final softmax layer. The input is a digit image; the output is the digit's category.
Figure 7. The LeNet network
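A minimal sketch of a LeNet-style digit classifier in Keras (layer sizes follow the classic LeNet-5; the framework choice is an assumption, any deep learning library would do):

```python
from tensorflow import keras
from tensorflow.keras import layers

# LeNet-style classifier: two conv layers, two pooling layers,
# two fully connected layers, and a softmax over the 10 digit classes.
model = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(6, kernel_size=5, activation="tanh"),
    layers.AveragePooling2D(pool_size=2),
    layers.Conv2D(16, kernel_size=5, activation="tanh"),
    layers.AveragePooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```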
We can first segment the full number image into individual digits, then classify each digit image with the LeNet network; the predicted category is the recognized digit. Finally, the digits are reassembled in order to produce the complete recognition result.
Figure 8. Digit recognition process
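The segment-then-classify step might look like the following sketch, which assumes light digits on a dark background and reuses the hypothetical LeNet-style model above; real game HUDs usually need per-game preprocessing:

```python
import cv2
import numpy as np

def read_number(img, model):
    """Segment a multi-digit image into single digits and classify each.

    Assumes light digits on a dark background; `model` is the LeNet-style
    classifier sketched above, taking 32x32 grayscale inputs.
    """
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = sorted((cv2.boundingRect(c) for c in contours),
                   key=lambda b: b[0])  # order digits left to right
    digits = []
    for x, y, w, h in boxes:
        crop = cv2.resize(binary[y:y + h, x:x + w], (32, 32))
        crop = crop.astype(np.float32)[None, :, :, None] / 255.0
        digits.append(str(int(model.predict(crop, verbose=0).argmax())))
    return "".join(digits)  # reassemble the full number
```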
With deeper network structures, stronger convolution operators, and the historical opportunity brought by GPUs and big data, CNNs have seen explosive development in recent years. CNNs are used not only for classification but also for object detection: instead of outputting only the object's category, the last layer outputs the object's position in the image together with the category at that position. We can adopt YOLOv3, a good compromise between speed and accuracy, and, based on the image characteristics of the project, optimize the network further for speed in two directions: reducing the number of layers and reducing the number of feature maps.
Figure 9. The process of number recognition and recombination
2.4 Recognition of fixed icons at fixed positions
Template matching has many applications; here we give examples of recognizing fixed buttons, recognizing prompt information, and detecting a stuck state. In the game's main screen, buttons such as the hero's skills, equipment, and operation buttons are generally at fixed positions. We extract the button icon in its available state as a template; if the template is detected in the game interface captured in real time, the button is currently available. Once the game AI knows the state of these buttons, it can take the corresponding actions, such as releasing skills or buying equipment. Game prompts are handled similarly. Some prompt messages appear at fixed positions in the game interface, such as route indications, the game-over state (success/failure), and the game-running state (start), as shown in Figure 11. We first record the location of each prompt and collect its icon template. While the game runs, we match the collected icon template against the recorded location in real time; a match indicates that the prompt is present. If the game-success icon is matched, the in-game AI strategy should be rewarded, and vice versa.
Figure 10. Recognition of fixed buttons
Figure 11. Recognition of game prompt icons
The idea of template matching is to find the region of an image that best matches a given template image. The process is shown in Figure 12.
Figure 12. Template matching process
The procedure for template matching is as follows:
- Step 1: Starting from the upper-left corner of the source image, slide a window from left to right and top to bottom with a step size of 1, computing the similarity between the template image and each window sub-image in turn.
- Step 2: Store the similarity results in a result matrix.
- Step 3: Find the best match in the result matrix. If larger values mean greater similarity, the brightest point in the result matrix is the best match.
OpenCV provides the interface function cv2.matchTemplate(image, templ, method) for template matching, where method selects the matching method.
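For example, fixed-button detection can be sketched with normalized cross-correlation; the 0.8 confidence threshold is an assumption to tune per game:

```python
import cv2

def find_button(frame, template, threshold=0.8):
    """Locate a fixed icon in a frame; returns its top-left corner or None.

    `threshold` is an assumed confidence cutoff to tune per game.
    """
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)  # brightest point = best match
    return max_loc if max_val >= threshold else None
```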
2.5 Object filtering based on pixel features
The pixels in the detection area are filtered according to the value range of each color channel, yielding the positions of targets that match the color characteristics.
The color features of health bars in games are quite distinctive: a red health bar has relatively large R-channel values, a green one large G-channel values, and a blue one large B-channel values. We extract the color characteristics of the health bar and filter its pixels accordingly; the filtered pixels make up the bar, and computing its connected region gives the bar's length and hence the health percentage. By filtering health-bar pixels we can learn the positions of friendly units (green or blue bars) and enemy units (red bars) on the current main interface, as well as their health percentages. Based on these attributes, the game AI can adopt different strategies such as escaping, pressing the attack, or forming up.
Figure 13. Health-bar percentage calculation process
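A sketch of the health-percentage computation for a red bar; the BGR thresholds and the full bar width are illustrative assumptions that would be measured offline:

```python
import cv2
import numpy as np

FULL_BAR_WIDTH = 120  # assumed width of a full health bar, measured offline

def health_percent(roi):
    """Estimate health from the filled width of a red bar in `roi` (BGR).

    The channel thresholds below are illustrative assumptions.
    """
    mask = cv2.inRange(roi, (0, 0, 150), (90, 90, 255))  # keep red-ish pixels
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n < 2:  # only background found, no bar pixels
        return 0.0
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return stats[largest, cv2.CC_STAT_WIDTH] / FULL_BAR_WIDTH
```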
In MOBA games, the minimap usually shows both our towers and the enemy's towers. We extract the tower color range, e.g. R(0, 90), G(90, 190), B(110, 200). Filtering the minimap for pixels whose channel values fall within this range tells us where our (or the enemy's) towers are, and the number of matching pixels reflects the towers' health. If our hero appears on the minimap with a green circle around the hero's head, we can likewise extract the circle's pixel value range, R(80, 140), G(170, 210), B(70, 110), filter the pixels of each channel to locate the hero, and then carry out pathfinding or strategy selection.
Figure 14. Application of minimap pixel filtering in MOBA games
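The same per-channel filtering applies directly to the minimap. For example, locating the hero via the green circle, with the channel ranges taken from the text (reordered to BGR for OpenCV):

```python
import cv2
import numpy as np

def locate_hero(minimap):
    """Find the hero on the minimap via the green circle around its head.

    Channel ranges from the text: R(80, 140), G(170, 210), B(70, 110),
    reordered to BGR for OpenCV. Returns (x, y) or None.
    """
    mask = cv2.inRange(minimap, (70, 170, 80), (110, 210, 140))
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.mean()), int(ys.mean())  # centroid of the green circle
```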
2.6 Other applications
There are many other applications of image recognition in games, such as pedestrian detection in game scenes, hero detection, screen-glitch detection, invisible-wall detection, model-clipping detection, deduplication, and so on.
3. Summary
This article introduced applications of image recognition in games, such as determining the game state, computing scene coverage, recognizing in-game numbers, and recognizing fixed icons at fixed positions. We hope readers have gained a better understanding of how image recognition can be used in games.
“UQM User Quality Management” is a professional monitoring and analysis platform for game client performance (lag, overheating, memory/CPU, network, etc.) and anomalies (crashes, force closes, ANRs, errors). With the help of deep quality big-data analysis, it provides a full range of quality monitoring, data analysis, and business insight services for the game business.