This article is published with the authorization of its author, Anjie. The algorithm ideas and code in it are for academic exchange only.

Original: https://zhuanlan.zhihu.com/p/32636329

Recently, the WeChat mini-game Jump Jump (跳一跳) has taken the whole country by storm; from kids to grown-up kids, everyone seems to be busy playing it. As AI programmers, we naturally wondered: could we use artificial intelligence (AI) and computer vision (CV) to play this game for us?

So we developed the WeChat Auto-Jump algorithm to redefine the correct way to play Jump Jump. Our algorithm not only far exceeds human level, it also beats every algorithm we currently know of in both speed and accuracy. It is fair to call it the state of the art in the field of Jump Jump.

For installation and setup of Auto-Jump, please refer to the GitHub repository: https://github.com/Prinsphield/Wechat_AutoJump

Let’s describe our algorithm in detail.

The first step of the algorithm is to capture a screenshot of the phone's screen and to control the phone's touch input. Our GitHub repository details the configuration steps for both Android and iOS phones.

All you need to do is connect the phone to your computer and follow the instructions to complete the configuration. Once you have the screenshots, the rest is a fairly simple vision problem: find the position of the little man and the center of the platform it should jump to next.
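
As a rough illustration of the capture-and-press loop, here is a minimal Python sketch for Android driven through adb. It assumes the usual approach where the press duration is roughly proportional to the jump distance; the swipe coordinates and the press-time coefficient below are illustrative values that must be tuned per device, and the actual setup is the one documented in the repository.

    import os
    import time

    def get_screenshot(path='screen.png'):
        # Pull a screenshot from the connected Android device over adb.
        os.system('adb exec-out screencap -p > {}'.format(path))
        return path

    def jump(src, dst, press_coeff=1.35):
        # Press time grows roughly linearly with the jump distance;
        # press_coeff is an assumed value that must be tuned per device.
        distance = ((src[0] - dst[0]) ** 2 + (src[1] - dst[1]) ** 2) ** 0.5
        press_time = max(int(distance * press_coeff), 200)
        # A long press is simulated by swiping in place for press_time ms.
        os.system('adb shell input swipe 500 1600 500 1600 {}'.format(press_time))
        time.sleep(1.5)  # wait for the jump animation to settle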

As shown in the figure, the green dot marks the current position of the little man and the red dot marks the target position.

Multiscale Search

There are many ways to solve this problem, but to climb the leaderboard quickly I started with a multi-scale template search. I grabbed a random screenshot and cropped out the little man as a template, like this.

I also noticed that the little man appears at slightly different sizes depending on where it stands on the screen, so I designed a multi-scale search: match the template at several scales and keep the detection with the highest confidence score.

Example code for the multi-scale search:
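
(The original listing is available in the GitHub repository. The following is a minimal OpenCV-based sketch of the idea; the scale range and the matching metric here are illustrative choices, not necessarily those used in the repository.)

    import cv2
    import numpy as np

    def multi_scale_search(screen, template, scales=np.linspace(0.8, 1.2, 9)):
        # Match the little-man template against the screenshot at several
        # scales and keep the detection with the highest correlation score.
        best = None
        for s in scales:
            tmpl = cv2.resize(template, None, fx=s, fy=s)
            h, w = tmpl.shape[:2]
            if h >= screen.shape[0] or w >= screen.shape[1]:
                continue
            res = cv2.matchTemplate(screen, tmpl, cv2.TM_CCOEFF_NORMED)
            _, score, _, (x, y) = cv2.minMaxLoc(res)
            if best is None or score > best[0]:
                best = (score, x, y, w, h)
        score, x, y, w, h = best
        # (x, y) is the top-left corner of the best match in the screenshot.
        return (x, y, x + w, y + h), score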

Let's give it a try. The results are quite good; I would say it is both fast and accurate. Note, however, that the bottom center of the detection box is not the little man's actual standing point: the true position is slightly above it.

The target platforms can be found the same way, but that would require collecting templates for many different platform types: round, square, convenience store, manhole cover, prism, and so on. With that many templates, each matched at multiple scales, the search becomes slow, so we need to speed things up.

First, notice that the target is always above the little man, so once the little man is found we can discard everything below it and shrink the search space. But that is not enough; we need to dig deeper into the game's logic. The little man and the target platform are placed roughly symmetrically about the center of the screen, which gives us an excellent way to narrow the search further. Suppose the screen resolution is (1280, 720) and the bottom of the little man is at (h1, w1); then the centrally symmetric point is at (1280 - h1, 720 - w1). Searching at multiple scales inside a 300×300 square centered on that point is again both fast and accurate.

The blue box is the 300×300 search region, the red box is the detected platform, and the center of that rectangle is the target point.
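
A small sketch of this symmetry trick, assuming a 1280×720 screenshot indexed as (row, column) as in the text (the 150-pixel half-width simply gives the 300×300 window described above):

    def target_search_region(piece_pos, img_shape=(1280, 720), half=150):
        # The little man and the target platform are roughly symmetric about
        # the screen center, so mirror the little man's bottom point to get
        # a rough guess, then search a 300x300 window around that guess.
        h1, w1 = piece_pos                 # (row, column) of the little man's bottom
        H, W = img_shape
        ch, cw = H - h1, W - w1            # centrally symmetric point
        top, bottom = max(0, ch - half), min(H, ch + half)
        left, right = max(0, cw - half), min(W, cw + half)
        return top, bottom, left, right    # bounds of the reduced search area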

Fast Search

Playing the game well requires careful observation. If the little man landed exactly at the center of a platform on the previous jump, a small white dot appears at the center of the next target platform, as in the picture above. Even more attentive readers will notice that the RGB value of this white dot is (245, 245, 245), which suggests a very simple and efficient shortcut: search for the white dot directly. The dot forms a connected region, and the number of pixels with value (245, 245, 245) is stable, roughly between 280 and 310, so we can use it to locate the target directly. This only works when the previous jump hit the center, but that is fine: the check is very cheap, so we try it first on every jump and fall back to the multi-scale search when no white dot is found.
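
A simplified sketch of this check follows. It only counts exactly matching pixels and takes their centroid; in practice one would also restrict the search to the region above the little man and verify that the pixels form a single connected region, as the text describes.

    import numpy as np

    def fast_search(img):
        # Look for the white marker dot with RGB value (245, 245, 245) that
        # appears on the next platform after a dead-center landing.
        mask = np.all(img == 245, axis=-1)
        count = int(mask.sum())
        if 280 <= count <= 310:        # observed size range of the dot
            ys, xs = np.nonzero(mask)
            # The centroid of the dot is taken as the target platform center.
            return int(ys.mean()), int(xs.mean())
        return None                    # no dot found: fall back to multi-scale search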

At this point our method already works remarkably well, basically a perpetual motion machine. Below is the state after playing on my phone for about an hour and a half, with 859 jumps. Our method was still computing the little man's position and the target position correctly, but I chose to pull the plug because the phone had become unbearably laggy.

Here’s a sample video:

Is that the end of it? Then what would separate us from amateurs? Now for the serious academic part; non-combatants, please evacuate!

CNN Coarse-to-Fine Model

Due to the limitations of the screen-capture scheme, fast search cannot be used on iOS devices (the screenshots obtained through WebDriverAgent are compressed and the pixel values are damaged, so the original values are no longer available; the reason is unclear, and suggestions for improvement are welcome). Also, to be compatible with devices of different resolutions, we use a convolutional neural network to build a faster and more robust target-detection model. The following four parts introduce our algorithm: data collection and preprocessing, the Coarse model, the Fine model, and the Cascade.

Data acquisition and preprocessing

Using our already very accurate multi-scale search and fast search, we collected data from 7 runs, about 3000 screenshots in total, each annotated with the target position. For each image we applied two different preprocessing pipelines, used to train the Coarse model and the Fine model respectively. The two preprocessing methods are described below.

Coarse model data preprocessing

In each screenshot, the only region that matters for the current decision is the middle of the screen, where the little man and the target sit; the top and bottom of each screenshot are irrelevant. So from each 1280×720 image we cut away a 320×720 strip at the top and another at the bottom, keeping only the central 640×720 region as training data.

As observed in the game, whenever the little man lands at the center of a platform, a white dot appears at the center of the next platform.

Since fast search generates a large amount of training data containing this white dot, and we do not want it to interfere with training, we remove the dot from every image by filling its area with the solid background color surrounding it.
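
A sketch of what these two steps (center crop plus white-dot removal) might look like; the exact fill strategy used in the repository may differ from the nearby-pixel fill assumed here.

    import numpy as np

    def preprocess_coarse(img):
        # Keep only the central 640x720 strip of the 1280x720 screenshot;
        # the top and bottom of the screen carry no useful information.
        crop = img[320:960].copy()
        # Erase the white marker dot so the network cannot rely on it:
        # overwrite the dot with a nearby background pixel.
        mask = np.all(crop == 245, axis=-1)
        if mask.any():
            ys, xs = np.nonzero(mask)
            fill = crop[max(int(ys.min()) - 5, 0), int(xs.min())]
            crop[mask] = fill
        return crop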

Fine model data preprocessing

To further improve accuracy, we built a separate dataset for the Fine model: for each image in the training set, we crop a 320×320 patch around the target point as training data.

To prevent the network from learning a trivial solution (the target always sitting at the patch center), we jitter each crop by a random offset of up to 50 pixels. The white dot is removed from the Fine-model data as well.
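
A sketch of the jittered cropping, assuming (row, column) coordinates and the 50-pixel offset mentioned above; the label is expressed relative to the patch so it can be regressed directly.

    import numpy as np

    def preprocess_fine(img, target, size=320, max_offset=50):
        # Crop a 320x320 patch around the labeled target point, jittered by
        # up to 50 pixels so the answer is not always at the patch center.
        ty, tx = target
        dy, dx = np.random.randint(-max_offset, max_offset + 1, size=2)
        half = size // 2
        top = int(np.clip(ty + dy - half, 0, img.shape[0] - size))
        left = int(np.clip(tx + dx - half, 0, img.shape[1] - size))
        patch = img[top:top + size, left:left + size]
        # The regression label becomes the target position inside the patch.
        label = (ty - top, tx - left)
        return patch, label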

The Coarse model

We treat this as a regression problem: the Coarse model uses a convolutional neural network to regress the target location.
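
The article does not spell out the network architecture (the actual model is in the GitHub repository). The Keras sketch below only illustrates the idea of framing detection as coordinate regression with an L2 loss; every layer choice here is an assumption for illustration.

    import tensorflow as tf

    def build_coarse_model(input_shape=(640, 720, 3)):
        # A small CNN that maps the cropped screenshot to the (row, column)
        # coordinates of the target platform center.
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(16, 5, strides=2, activation='relu',
                                   input_shape=input_shape),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Conv2D(32, 3, strides=2, activation='relu'),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Conv2D(64, 3, strides=2, activation='relu'),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(2),      # regression output: (row, column)
        ])

    model = build_coarse_model()
    model.compile(optimizer='adam', loss='mse')   # plain L2 regression loss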

After about ten hours of training, the Coarse model reached an accuracy of 6 pixels on the test set, and about 10 pixels in actual play. Inference takes about 0.4 seconds on the test machine (MacBook Pro Retina, 15-inch, Mid 2015, 2.2 GHz Intel Core i7). This model can easily score over 1000, which is already far beyond human level and most automatic algorithms, and is good enough for everyday entertainment. But you would be wrong to think we would stop there.

Fine model

The Fine model is similar in structure to the Coarse model, with a slightly larger number of parameters; it acts as a refinement step on top of the Coarse model's output.

After about ten hours of training, the Fine model reached 0.5-pixel accuracy on the test set, and about 1 pixel in actual play. Inference takes about 0.2 seconds on the test machine.

Cascade


Cascading the two models, the overall accuracy is about 1 pixel and the total inference time is about 0.6 seconds.
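
A sketch of how the two stages might be chained at inference time, reusing the hypothetical preprocess_coarse helper sketched above; the coordinate bookkeeping and patch size follow the numbers given in the text, while the model interfaces are assumptions.

    import numpy as np

    def cascade_predict(img, coarse_model, fine_model, size=320):
        # Stage 1: the Coarse model predicts a rough target location on the
        # center-cropped 640x720 image (roughly 10 px error in practice).
        crop = preprocess_coarse(img).astype(np.float32)
        cy, cx = coarse_model.predict(crop[None])[0]
        # Stage 2: cut a 320x320 patch around the rough guess and let the
        # Fine model refine it down to roughly 1 px.
        half = size // 2
        top = int(np.clip(cy - half, 0, crop.shape[0] - size))
        left = int(np.clip(cx - half, 0, crop.shape[1] - size))
        patch = crop[top:top + size, left:left + size]
        fy, fx = fine_model.predict(patch[None])[0]
        # Map back to full-screenshot coordinates; the coarse crop removed
        # the top 320 rows of the original 1280x720 image.
        return int(top + fy) + 320, int(left + fx)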

Conclusion

For this problem, we used AI and CV techniques to build a complete solution for both iOS and Android devices; users with a little technical background can configure and run it successfully. We proposed three algorithms, multi-scale search, fast search, and the CNN coarse-to-fine model, which work together to locate the target quickly and accurately and to perform the jump. By tuning the jump parameters for their own devices, users can get close to a "perpetual motion machine". At this point we feel safe declaring our work on this problem finished: game over!

Friendly reminder: playing in moderation is good for the brain, while game addiction harms your health. The fun of these technical tricks lies in the technology itself, not in the leaderboard. Please treat the game ranking and the techniques proposed in this article rationally, and let games add fun to your life rather than take it over.

Disclaimer: The algorithm and open-source code proposed in this article are released under the MIT license. Any consequences of using this algorithm for commercial purposes are borne entirely by the user.

For readers interested in deep learning, we recommend the excellent slide deck "Understand Deep Learning in One Day" by Professor Li Hongyi of Taiwan; reply "deep learning" to this official account to download it.
