On October 19, 2017, Google DeepMind unveiled AlphaGo Zero, a new generation of Go-playing artificial intelligence. The day it was released, my WeChat Moments, Weibo and other social feeds were flooded with the news. Why is AlphaGo Zero so influential that it blew up the AI world? And how is it different from previous generations of AlphaGo?
Note: this is not an in-depth technical article, and no math is involved! This post is just a quick look at AlphaGo Zero.
Mathematics is only a tool for reaching the goal; understanding is the bridge to it. So this article uses easy-to-understand animations to illustrate complex machine learning concepts. For more material like this, visit: Don’t bother with Python
AlphaGo caused a stir
In 2016, the first version of AlphaGo was published in Nature, one of the most prestigious scientific journals. Anyone who manages to publish a paper in Nature, haha, is pretty much set for life. Now, just over a year later, Google DeepMind has published an improved version, AlphaGo Zero, in Nature again: the same Go AI has made it into Nature twice. You have to marvel at their strength! To understand AlphaGo Zero, we first need to understand how AlphaGo beat humans. AlphaGo's victories over Fan Hui, the European champion, then Lee Se-dol, and more recently Ke Jie, the world number one, suggest that humanity's best Go players have been surpassed. One by one, these masters said that AlphaGo played moves they never expected, that AlphaGo had overcome the biological limits of human beings, that AlphaGo never feels tired, and so on. Indeed, these are great advantages machines have over humans. So how did AlphaGo outmaneuver humans? Simple: it plans ahead.
Monte Carlo tree search
It can use a tree structure to try out different strategies, where each branch is one possible continuation of the game. But Go has more possible continuations than there are stars in the sky, and no computer can try all of them at every step.
So it uses something called Monte Carlo Tree Search to explore the unknown.
This is exactly what happened with chess AI: Deep Blue, the chess program developed by IBM, used this kind of tree search to beat humans back in 1997. But in the twenty years that followed, the same tree-search approach made little further progress, or Go would have been conquered long ago. What was the problem?
The search-tree structure used in chess worked because chess has far fewer possible positions than Go, so an exhaustive, breadth-first computer search was perfectly feasible. In Go, the same recipe does not apply, so the DeepMind team abandoned exhaustive breadth in favour of depth: a search that is less computationally demanding and analyses a limited number of situations more accurately. But tree search alone is not enough.
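The search loop described above can be sketched on a toy game. This is not Go: it is a trivial made-up game (players alternately take 1 or 2 stones, and whoever takes the last stone wins), used only to show the classic selection / expansion / simulation / backpropagation cycle of Monte Carlo Tree Search. All names and parameters here are my own illustration, not AlphaGo's actual code:

```python
import math
import random

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left on the table
        self.player = player      # player to move (1 or -1)
        self.parent = parent
        self.move = move          # the move that led here
        self.children = []
        self.visits = 0
        self.wins = 0.0           # wins for the player who just moved into this node

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2) if m <= self.stones and m not in tried]

def uct_select(node, c=1.4):
    # Balance exploitation (win rate) and exploration (rarely visited branches).
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits +
                              c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, player):
    # Play random moves to the end of the game; return the winner.
    while stones > 0:
        stones -= random.choice([m for m in (1, 2) if m <= stones])
        if stones == 0:
            return player         # this player took the last stone and wins
        player = -player
    return -player

def mcts(stones, player, iterations=2000):
    root = Node(stones, player)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down through fully expanded nodes.
        while not node.untried_moves() and node.children:
            node = uct_select(node)
        # 2. Expansion: try one move not explored yet.
        moves = node.untried_moves()
        if moves:
            m = random.choice(moves)
            node = Node(node.stones - m, -node.player, parent=node, move=m)
            node.parent.children.append(node)
        # 3. Simulation: random playout from this position.
        if node.stones == 0:
            winner = -node.player  # the player who just moved took the last stone
        else:
            winner = rollout(node.stones, node.player)
        # 4. Backpropagation: update statistics along the path back to the root.
        while node:
            node.visits += 1
            if winner == -node.player:
                node.wins += 1
            node = node.parent
    # Recommend the most-visited move.
    return max(root.children, key=lambda ch: ch.visits).move
```

With 5 stones, the winning strategy is to take 2 and leave a multiple of 3 for the opponent, and the search finds this without being told any strategy at all, which is the whole appeal of the method.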
The neural network
So we add neural networks, a structure that has been developing very rapidly, to assess the current board state and to make decisions.
A simple neural network has three parts. It takes in information from the outside world, such as a board position; it processes that information through the thousands of nodes inside the network; and finally it outputs its conclusion, which can be the next move to play or an evaluation of the current position. AlphaGo uses two neural networks: one proposes the next move given the current state, and the other evaluates whether the current state is favourable to our side. The neural networks guide the tree search toward good moves, those good moves in turn become training data for the networks, and in this back-and-forth loop of reinforcement learning the networks keep improving their play. This is the main reason why AlphaGo can beat humans.
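As a rough sketch of the two roles just described, here are two tiny fully connected networks with random, untrained weights. The board size, layer shapes and function names are all placeholders of my own, nothing like AlphaGo's real deep convolutional networks; the point is only the input/output shape of each role:

```python
import numpy as np

BOARD = 9 * 9  # a small 9x9 board as a stand-in; real Go is 19x19

rng = np.random.default_rng(0)
W_policy = rng.normal(scale=0.1, size=(BOARD, BOARD))  # untrained weights
W_value = rng.normal(scale=0.1, size=(BOARD, 1))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def policy_net(board):
    """Board position in -> a probability for each possible next move out."""
    return softmax(board @ W_policy)

def value_net(board):
    """Board position in -> one number in (-1, 1): how good this is for us."""
    return np.tanh(board @ W_value)[0]

board = np.zeros(BOARD)    # empty board: 0 = empty, 1 = ours, -1 = opponent's
probs = policy_net(board)  # a distribution over all 81 points
v = value_net(board)       # a single evaluation score
```

During the search, the first network tells the tree which branches are worth exploring, and the second lets it judge a position without playing the game all the way to the end.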
AlphaGo Zero
But why was the new AlphaGo Zero proposed, and why did it cause such a stir? It is clearly better than its predecessors, and the first reason is that it learned nothing from human games of Go.
When humans learn Go, studying the excellent game records left by earlier masters is essential, so previous versions of AlphaGo inherited the same idea: learn the principles of the game from humans. With a good teacher, progress is much easier than without one.
Playing against an AlphaGo trained on human games, you might still sense a human hand behind it. AlphaGo Zero, by contrast, is a completely self-taught fellow, and playing against it feels distinctly machine-like. On the other hand, this AlphaGo has broken free of thousands of years of human Go thinking, explored territory that humans never imagined, and discovered entirely new ways to play.
On a technical level, AlphaGo Zero no longer uses two separate neural networks; it fuses them into one, which uses resources more efficiently and learns better.
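A toy sketch of what "fusing two networks into one" means: one shared body whose features are computed once, feeding two output heads, one for the move probabilities and one for the position value. The sizes and names below are hypothetical; the real AlphaGo Zero uses a deep residual network:

```python
import numpy as np

BOARD = 9 * 9   # small stand-in board
HIDDEN = 64     # hypothetical width of the shared body

rng = np.random.default_rng(1)
W_shared = rng.normal(scale=0.1, size=(BOARD, HIDDEN))  # untrained weights
W_policy = rng.normal(scale=0.1, size=(HIDDEN, BOARD))
W_value = rng.normal(scale=0.1, size=(HIDDEN, 1))

def dual_head(board):
    """One shared body, two heads: (move probabilities, position value)."""
    h = np.tanh(board @ W_shared)        # shared features, computed only once
    logits = h @ W_policy
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # policy head: distribution over moves
    value = np.tanh(h @ W_value)[0]      # value head: one score in (-1, 1)
    return probs, value

probs, value = dual_head(np.zeros(BOARD))
```

Because the expensive shared body runs once per position instead of twice, the same hardware evaluates more positions, and both heads train on the same features.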
And instead of relying only on GPUs, it runs on Google's TPUs, chips built specifically for machine learning. The amount of hardware keeps shrinking while the learning results keep improving: in just 40 days of training without any teacher, AlphaGo Zero surpassed all of its predecessors. At this point, I believe AlphaGo Zero has become truly unbeatable in the world of Go. Finally, as David Silver, the father of AlphaGo, has said, the creation of a self-taught AlphaGo does not just mean that AI can beat humans at Go. More importantly, it means we can use AI like this to write many new chapters of human history.
Finally, here is the AlphaGo Zero Nature paper: https://www.nature.com/nature/journal/v550/n7676/full/nature24270.html