Basic concept
Last month, I started trying to understand machine learning at the level of principles. I chose an online e-book, Neural Networks and Deep Learning, as my textbook. The book proved to be very good: a PM with no neural network or machine learning background quickly grasped the basic principles.
Attached are my reading notes for the first chapter of this book. For a PM, finishing the first chapter is enough: the basic concepts and methodology are explained clearly there. After reading it, I felt as if I had opened the door to a new world.
Action triggers
Last weekend, Twitch launched ClipMine, which identifies the heroes and segments players are playing in Overwatch and Hearthstone from live game video, letting viewers pick the heroes they want to watch across multiple streamers. It suddenly struck me that the same demand exists in the domestic live-streaming industry: if the thousands of Honor of Kings streams could be tagged with the hero currently being played, the audience could focus on the streams featuring the heroes they want to watch.
My hands were itching, and it was the weekend, so I had to do something.
Engineering analysis
There are several places where the hero being played in Honor of Kings can be identified:
1. The hero-selection screen before the game starts
2. The loading screen while resources load after the game starts
3. The hero model itself in the middle of the screen during the game
4. The skill icons in the lower-right corner of the screen during the game
The analysis is as follows:
- Screens 1 and 2 occupy a very small fraction of total stream time, and if a streamer starts broadcasting mid-game, they never appear at all.
- Moreover, the streamer's position within screens 1 and 2 is not fixed; adding player-position detection would increase the engineering complexity by more than a little.
- Since the hero is always near the center of the screen, 3 is actually good training material. But heroes strike different poses and face different directions, and, most importantly in this game, a single hero can have several different skins. Together these conditions demand more training material and make the learning problem harder.
- In screen 4, a hero's skill icons do not change over long periods, their position is stable, and they are on screen for a high proportion of the whole broadcast. The only big variation is the cooldown countdown shown after a skill is cast. All things considered, 4 is the most suitable training material.
Once this is settled, the project steps fall into place (a sketch of the resulting pipeline follows the list):
1. Collect in-game screenshots of the 60+ heroes, no fewer than 1,000 per hero.
2. Crop the lower-right corner of each image from step 1 as training material for machine learning.
3. Run the machine learning code to train a model that recognizes the different heroes' skill icons.
4. Extract frames from the live stream to be identified, crop the skill area in the lower-right corner, and use the model from step 3 to recognize the hero's skills, completing hero recognition for the stream.
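To make the plan concrete, here is a minimal sketch of the pipeline in Python. The crop ratio and the `classify` callback are my own placeholders: the real numbers depend on the stream resolution and UI layout, and `classify` stands in for the trained model from step 3.

```python
import cv2

# Assumed start of the skill-area crop, as fractions of (height, width).
# The real values depend on the stream resolution and the game's UI layout.
SKILL_REGION = (0.70, 0.72)

def crop_skill_area(frame):
    """Cut the lower-right skill icons out of a full game frame."""
    h, w = frame.shape[:2]
    return frame[int(h * SKILL_REGION[0]):, int(w * SKILL_REGION[1]):]

def identify_hero(frame, classify):
    """`classify` is the model from step 3: a function from image to hero name."""
    return classify(crop_skill_area(frame))
```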
The process is clear, but most of the work comes later. Let's look first at how to obtain the training material.
Collecting material
As anyone who does machine learning knows, writing code isn't the hardest part; gathering good training material is. How do you quickly get 1,000 images for each of 60+ heroes? Beyond the difficulty of finding more than 60,000 pictures, would I also need to label which picture shows which hero? Doing all that by hand would be nearly impossible for one person.
The trick this time was to search Youku directly for "Honor of Kings" plus a hero's name, which turns up plenty of gameplay videos recorded by players. So I asked colleagues on the team to help, and we soon collected a battle video for every hero, one each. Software like Adapter can then convert each video into thousands of images, one frame at a time. (I didn't discover Adapter at first, so initially I read the frames out of the videos with OpenCV.)
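For reference, a minimal sketch of the OpenCV route. It assumes one video per hero, named after that hero; `every_n` is my own sampling choice to avoid near-duplicate frames:

```python
import cv2

def extract_frames(video_path, out_dir, hero_name, every_n=30):
    """Save every `every_n`-th frame of a gameplay video as a JPEG."""
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:                # end of the video
            break
        if index % every_n == 0:  # sample sparsely: adjacent frames look alike
            cv2.imwrite('%s/%s_%06d.jpg' % (out_dir, hero_name, saved), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

Because each video contains exactly one hero, every extracted frame inherits its label from the file name.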
The only problem is that videos on Youku often contain not just the battle itself but also special effects, text overlays, transitions, and so on. At first I didn't notice and contaminated a small batch of training material; later I solved the problem by finding a few clean videos.
In this way, it was easy to get nearly 100,000 images, each already labeled with its hero.
The next step is simple: use OpenCV to crop the pictures. At first I cropped to the region covering all three skill icons, but because that region is large, it includes a lot of irrelevant image content, and the training results were unsatisfactory. During later tuning I realized that every hero's skills are unique, so there was no need to recognize all of them; I re-cropped all the material down to just the second skill icon, which greatly improved recognition accuracy.
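A minimal sketch of the cropping step; the pixel coordinates below are hypothetical placeholders, since the real ones depend on the recording resolution and where the second skill icon sits in the UI:

```python
import cv2

# Hypothetical position of the second skill icon (top-left corner and side
# length in pixels); measure these once against your source resolution.
X, Y, SIZE = 1060, 600, 90

def crop_second_skill(src_path, dst_path):
    img = cv2.imread(src_path)
    icon = img[Y:Y + SIZE, X:X + SIZE]  # numpy slicing: rows (y) first, then columns (x)
    cv2.imwrite(dst_path, icon)
```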
Technical implementation
I had long heard that Google's TensorFlow is easy to use and performs well. As a Google product it naturally has first-class Python support, which suits me, since I write Python all the time. So I decided to read the TensorFlow documentation.
I have to say I spent a lot of the weekend just installing TensorFlow and OpenCV inside virtualenv. None of the tutorials on the web ran perfectly on a Mac, but thanks to Google and Stack Overflow, both components eventually compiled on mine.
Soon I found the image-recognition article in the official TensorFlow tutorials. I ran the demo and it worked fine, so I started studying how to train my own model.
Since I was following the official tutorial, I used the Inception V3 network architecture directly. With nearly 100,000 images, on a MacBook Pro without a GPU, the first training run took almost 10 hours.
The most time-consuming part, however, is computing each image's bottleneck values, which do not change between training runs. The tutorial's script therefore saves each image's bottleneck after computing it the first time, so subsequent runs only compute bottlenecks for new images instead of recomputing everything. After this optimization, efficiency soared.
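The idea, in a simplified form of what the tutorial's retraining script does (the names here are illustrative):

```python
import os
import numpy as np

def get_or_create_bottleneck(sess, image_path, bottleneck_dir,
                             jpeg_data_tensor, bottleneck_tensor):
    """Return one image's bottleneck values, computing them at most once."""
    cache_path = os.path.join(bottleneck_dir,
                              os.path.basename(image_path) + '.txt')
    if os.path.exists(cache_path):
        return np.loadtxt(cache_path)      # cache hit: skip the forward pass
    with open(image_path, 'rb') as f:
        image_data = f.read()
    values = sess.run(bottleneck_tensor,   # one pass through Inception V3
                      feed_dict={jpeg_data_tensor: image_data})
    values = np.squeeze(values)
    np.savetxt(cache_path, values)         # persist so later runs reuse it
    return values
```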
By then it was 3 a.m. on Sunday, so I set the model training and went to bed.
Performance tuning
When I woke up at noon on Sunday, the first training run had finished. I hurried to test the new model every way I could, and the accuracy was beyond what I had imagined. However, as mentioned in the "Collecting material" section, the first version tried to recognize all three skills at once, so some heroes were not recognized as well as others.
After re-cropping all the material to single-skill screenshots, I ran the training again; it finished on Sunday night. At that point, across every test picture I could find, there was not a single failure.
At this point it took about five seconds to recognize a single image: not unacceptable, but it could be faster. Without an Nvidia graphics card, the remedy is to compile TensorFlow locally so it can use CPU instructions like SSE and AVX to speed up processing. How to compile TensorFlow on a Mac can be found here.
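For reference, recognition follows the tutorial's label_image pattern. A minimal sketch, assuming TensorFlow 1.x and the default file and tensor names produced by the retraining script (`output_graph.pb`, `output_labels.txt`, `final_result`):

```python
import time
import tensorflow as tf

labels = [line.strip() for line in open('output_labels.txt')]

# Load the retrained graph exported by the tutorial's retraining script.
with tf.gfile.FastGFile('output_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    final_tensor = sess.graph.get_tensor_by_name('final_result:0')
    image_data = open('test_frame.jpg', 'rb').read()
    start = time.time()
    scores = sess.run(final_tensor,
                      {'DecodeJpeg/contents:0': image_data})[0]
    print('hero: %s (%.1f%%) in %.1fs' % (
        labels[scores.argmax()], 100 * scores.max(), time.time() - start))
```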
Additional features
Hero recognition already worked perfectly on my machine, but I wanted more colleagues to experience the feature. When I got home on Tuesday night, I remembered the WeChat robot I had written with ItChat, so I immediately combined the robot with the hero-recognition code: send a game picture to the robot on WeChat, and it immediately replies with who the hero in the picture is.
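A minimal sketch of the bot wiring; `recognize_hero` is a hypothetical stand-in for the recognition code above:

```python
import itchat
from itchat.content import PICTURE

@itchat.msg_register(PICTURE)
def reply_with_hero(msg):
    msg.download(msg.fileName)            # save the incoming screenshot locally
    hero = recognize_hero(msg.fileName)   # hypothetical: the recognition code above
    return 'The hero in this picture is: %s' % hero

itchat.auto_login(hotReload=True)         # scan the QR code once; the session is cached
itchat.run()
```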
Conclusion
The actual learning and development time ran from 8 p.m. Saturday to 3 a.m. Sunday. During those hours, I experienced the charm and the implementation details of machine learning at the level of principles and code, and truly glimpsed the entrance to another world and the endless possibilities ahead.
The biggest shock was thinking about the future: when product managers who understand machine learning design product logic, many things previously thought impossible become part of that logic. Building on this new understanding, and using the new tools, will differentiate product managers in some areas in the future.
Learning is a survival skill; a PM should never stop learning.