Machine Heart, by Tony Peng.
Last year, OpenAI’s 1V1 AI beat the world’s top player Dendi, and OpenAI CTO Greg Brockman promised that the next year they would be back with a 5V5 bot. Today they fulfilled that promise with OpenAI Five, which aims to challenge the world’s top Dota 2 players. After 51 minutes, however, OpenAI suffered a complete defeat.
According to reports, many TI8 teams signed up to compete against the AI, and today OpenAI met its first opponent: paiN, a Brazilian team that was also the first to be eliminated from TI8, but still, undeniably, one of the 18 strongest teams in the world. In previous public matches, OpenAI beat Dendi in a 1V1 and defeated a roughly 6,000-MMR team of ex-pro players and commentators in a 5V5.
Ahead of the game, OpenAI’s bots were widely predicted to win, following the example of AlphaGo. The truth was not so simple: although the OpenAI agents hold advantages in reaction speed and mechanical execution, they remain inferior to human teams in overall strategy and coordination.
A bad start at TI8
Today’s man-versus-machine battle was a single game, with the following lineups:
- OpenAI (Radiant): Gyrocopter, Lich, Death Prophet, Crystal Maiden, Tidehunter
- paiN (Dire): Lion, Necrophos, Witch Doctor, Sniper, Axe
OpenAI gave itself a 97% chance to win at the start of the match, but it was OpenAI that got off to a bad start: paiN ran straight into the Radiant jungle, where four players besieged the lone Tidehunter and drew first blood. OpenAI also showed the “not so smart” side of AI, repeatedly planting wards under its own tower.
paiN opened well, with OpenAI Five trailing by 1,000 gold at 7:30. OpenAI gradually evened the kill score at 7-7 by the 10th minute, and the game heated up. Notably, the AI did not focus on kills; it focused only on pushing towers. The AI then seized its chances in two fights in the bottom lane, and by 17 minutes its economy had overtaken the human players’.
At around 21 minutes the AI took Roshan for the first time in any public match, and Gyrocopter claimed the Aegis. The Aegis was wasted, however, when Gyrocopter was caught in the Dire jungle at 25 minutes; the AI made no attempt at a rescue and simply abandoned its teammate. It took Roshan again at 32 minutes, but, remarkably, OpenAI is “selfish”: whichever hero lands the killing blow keeps the Aegis, even a support. OpenAI then traded two for four in a fight in the bottom lane.
The AI wards very differently from humans. Up to three Sentry Wards were spotted at the mouth of the Roshan pit, and the AI also placed three wards inside its own base, drawing ridicule from the commentators. After hitting max level, the AI’s Death Prophet kept farming the jungle, very strong!
paiN’s players were behind on kills and held little advantage in mechanical play, but they gradually regained their rhythm and gained the upper hand in the bottom lane. By 35 minutes, with the human players at the high ground, OpenAI’s predicted win rate had dropped to 67 percent.
Of course, the humans were not about to hand the advantage back. By 37 minutes paiN was up 9,000 gold. At 40 minutes OpenAI killed two human heroes and took Roshan for a third time, but by then the human pros seemed to have figured out the AI’s playbook.
By the 49th minute, the AI put its own chance of winning at just 20%, and the game was effectively decided.
In the end, paiN wiped out OpenAI’s heroes and destroyed the Ancient. The first man-versus-machine match of TI8 ended in a human victory.
The AI’s play today showed three huge problems:
First, it made no mid-game ganks and never concentrated its advantage to push towers. Between 20 and 35 minutes there was a window in which paiN’s Sniper and Axe had not yet come online; that was a prime opportunity for the AI. But beyond warding everywhere and loitering around the Roshan pit, it never organized a decent gank or tower push. By the time the enemy’s economy and BKBs (Black King Bars) arrived, the game had turned one-sided.
Second, it has no concept of roles and does not allocate resources sensibly. Dota traditionally assigns positions 1 through 5: position 1 is the carry, positions 4 and 5 are supports. For years resources have been funneled to position 1, and paiN duly funneled resources to Sniper and Axe. The AI side, by contrast, follows an everyone-is-equal principle, with “tactical arrangements” that let Tidehunter and Lich take the Aegis.
Finally, itemization is a big problem. The AI does not seem to understand which items are appropriate, and it wastes a great deal of gold on wards.
Jonathan Raiman, a research scientist on the OpenAI Five team, told Machine Heart that the team wasn’t particularly disappointed. “Before the game, most of us thought we had about a 30 to 40 percent chance of winning. There were bright spots too; for example, the AI took Roshan several times. That’s something we should go back and study.”
Raiman revealed that the competition environment had been changed so that couriers could be killed, and the model had to adapt to the new setup, which affected a number of behaviors such as item purchasing. The team is also rethinking how to weight rewards in the future. OpenAI has a teamwork system (more on this later) in which all rewards flow from winning the game, but for now this reduces the AI’s incentive to build an economy by farming in the early game.
This was only OpenAI’s first game at TI8, with two more to play. But perhaps the story of how OpenAI Five got here, from the first public results in June to the benchmark matches where it crushed human teams, hints at why it took a beating today.
After AlphaGo, someone had to pick up the baton
Let’s go back to 2016…
Studying AI through games has long been a hot topic in machine learning. First, games are designed to entertain and challenge, and that complexity and fun make them ideal testbeds for AI. Second, games provide rich opportunities for human-computer interaction. Third, because games are so popular, they naturally generate plenty of data for training.
Over the past few years, game research has driven major breakthroughs in machine learning. In 2015, Google’s DeepMind published a study in the scientific journal Nature: they developed a deep reinforcement learning agent (the Deep Q-Network, or DQN) that approached or exceeded human-level performance across a series of Atari 2600 games.
The next year, DeepMind’s AlphaGo, based on Monte Carlo tree search and reinforcement learning, beat South Korean Go master Lee Sedol 4-1. Within another year, AlphaGo had evolved into AlphaZero, which surpassed humans in three board games (chess, shogi, and Go), relying not on human knowledge but purely on self-play.
A board game had set off an AI boom around the world, but any boom cools sooner or later. The world needed new stimulation to sustain its curiosity and enthusiasm for AI, and practitioners needed new challenges to probe AI’s boundaries.
Go may have been broken, but there is still plenty of room for researchers in a wide range of gaming worlds: card games, first-person games, the Atari series, racing games, strategy games, sandbox games… DeepMind and Facebook worked on StarCraft, considered one of the most difficult games in the video game world to crack, and DeepMind’s performance so far has been dismal. That led them to open source the machine learning platform for StarCraft II with Blizzard last year.
Against this backdrop, OpenAI’s Dota AI project carried high hopes.
On November 5, 2016, OpenAI decided to develop AI agents that could learn to play Dota 2. The team was led by OpenAI CTO Greg Brockman.
At the outset, OpenAI didn’t know which game to study, but the criteria were clear: complex enough, popular enough, with a rich API to work with, and able to run on Linux. They went through the games on Twitch, the US livestreaming platform, and settled on Dota 2.
Dota, or Defense of the Ancients, began as a multiplayer online battle map built on the Warcraft series. As the title suggests, the victory condition is to destroy the enemy’s Ancient.
The first Dota map, version 6.01, was released in 2005, and IceFrog, the core programmer behind Dota, maintained and updated the maps for many years. In 2013, IceFrog teamed up with game developer Valve to release Dota 2, a fully standalone title that became a true competitive game.
Dota 2 meets all OpenAI requirements:
First, it’s complex. Dota 2 offers 115 heroes with anywhere from 1 to 10 abilities each (Invoker, I’m looking at you), hundreds of items, over 20 towers, and dozens of NPCs. Two five-player sides, the Dire and the Radiant, fight across three lanes, giving rise to tactics and roles including laning, jungling, ganking, team fights, and warding.
On its official blog, OpenAI compared Dota 2’s numbers to those of board games: Dota 2 averages about 1,000 possible valid actions per tick, versus 35 for chess and 250 for Go. Through the bot API of Valve (the company that runs Dota 2), OpenAI represents the game state as about 20,000 numbers, covering all the information a human can access in the game. Chess can be represented with about 70 values, Go with about 400.
Second, Dota 2 is popular. Tens of millions of players around the world play it. The numbers don’t match League of Legends, or today’s PUBG and Fortnite, but its relatively long history (Dota dates to 2005) and its Warcraft lineage give it considerable cachet and reputation.
Moreover, Dota 2 has professional esports. Every August, the world’s top players travel to North America for The International, Valve’s Dota 2 world championship. Last year’s TI7 had a prize pool of more than $20 million.
OpenAI didn’t set out to beat the top human players; simply using cutting-edge machine learning to build a bot that could play Dota intelligently would have been a breakthrough. Unexpectedly, the road led much further.
We may fail
In early 2017, OpenAI built what it believed was one of the best rules-based scripted bots, thanks largely to Rafal Jozefowicz, a former researcher on the project who is now an SVP at hedge fund D.E. Shaw Group. Rafal had never played Dota, but he watched game replays every day and talked with other team members about how Dota 2 heroes use abilities, push towers, and buy items.
The researchers wrote in every rule they could think of, and the scripted bot did beat some amateurs, but not better players.
OpenAI decided to go a step further: rip out the hard-coded parts and replace them with machine learning, using reinforcement learning so the bot could learn from scratch. They quickly found this was impossible in a 5V5 environment any time soon. It was too difficult.
Instead, the researchers started with a small game called Kiting and then expanded the environment.
Kiting is a Dota technique, usually seen in lane duels: you attack an enemy unit, then move so it can’t hit back, whittling down its health while taking none yourself. OpenAI built a mini-game around it: on a circular island, a learned bot faces a scripted bot, and wins by killing the enemy unit without getting hit.
Sounds simple, right? In practice it was anything but: OpenAI’s bots could not beat human players at Kiting. The bots always trained along the same trajectories, while humans rarely follow a fixed trajectory, and the results stayed disappointing.
“We’re probably going to fail,” OpenAI concluded at the time, and with the project far behind schedule less than six months in, many researchers were frustrated. OpenAI decided to see it through wherever it led; even just publishing the research would have value.
Then came the turning point. The researchers began randomizing the training environment: the hero would sometimes walk faster, sometimes slower, and sometimes stumble as if malfunctioning. The randomness made the bot’s reinforcement-learned policy network far more robust. On March 1, 2017, OpenAI’s trained Drow Ranger could finally kill the scripted bot at Kiting.
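The environment-randomization trick can be sketched in a few lines. This is a minimal illustration with hypothetical parameter names, not OpenAI’s actual configuration: each training episode samples slightly different physics, so a policy cannot succeed by memorizing one trajectory.

```python
import random

def sample_episode_params(base_speed=300.0, seed=None):
    """Sample randomized environment parameters for one training episode.

    Perturbing properties like movement speed forces the learned policy
    to generalize instead of memorizing one fixed trajectory.
    (Hypothetical parameter names, not OpenAI's real config.)
    """
    rng = random.Random(seed)
    return {
        # The hero sometimes walks faster, sometimes slower.
        "move_speed": base_speed * rng.uniform(0.8, 1.2),
        # Occasionally drop an action entirely, simulating a "stumble".
        "action_drop_prob": rng.uniform(0.0, 0.1),
        # Randomize starting health so the bot sees many situations.
        "start_health_frac": rng.uniform(0.5, 1.0),
    }

params = sample_episode_params(seed=42)
```

In a real training loop, parameters like these would be fed into the environment each time it resets.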
They applied the same approach from Kiting to Dota 2’s 1V1 mode, and it worked there too. The bots could now last-hit, block creeps, and use various abilities. This gave OpenAI real confidence: with the same algorithm and more computing power, a 5V5 AI might one day be possible.
Jonas Schneider recalls that as late as April or May 2017 he could still beat the AI easily, but as OpenAI threw more compute at training, the bot improved dramatically. In early June it beat players rated around 1,500. Two months later, SumaiL, a 1V1 virtuoso and member of the 2015 The International championship team, also lost to OpenAI.
Along the way, William “Blitz” Lee, a well-known Korean-American commentator, helped OpenAI a great deal after the team reached out to him for guidance. Not everyone in the Dota community liked what OpenAI was doing (some thought the scientists were pulling a trick), but Blitz was fascinated by the work from the start. According to OpenAI researchers, after playing a 1V1 against the bot, Blitz said:
“This will change the way Dota players approach 1V1 from now on.”
The rest is history: in a 1V1 Dota 2 exhibition at TI7 last year, OpenAI’s bot defeated Danylo “Dendi” Ishutin, who has earned $730,000 in career prize money. The bot beat Dendi about 10 minutes into the first game; Dendi conceded the second and declined to play a third.
OpenAI blew up: from a star research lab within machine-learning circles, it became the focus of worldwide attention and debate. That one exhibition overshadowed all of last year’s official TI7 matches. Most people were excited, surprised, even incredulous; others were skeptical and unconvinced.
The 1V1 victory answered many questions for OpenAI, the most important being: does reinforcement learning work in such a complex, long-horizon strategic environment?
No one doubts that an AI can learn individual skills (denying a creep, casting an ability); those are simple. But combining all the skills, movement, and laning in a complex environment to beat the world’s top players at 1V1 was, without doubt, a major breakthrough.
What many people don’t know, though, is that a human has since won a 1V1 series against OpenAI. On September 7 last year, Germany’s Dominik “Black” Reitmeier staged a last-gasp comeback to win 2-1; it was the first time a human had beaten the fully trained AI, and Black was visibly thrilled.
OpenAI is not AlphaGo, or at least it is not invincible.
After the match, OpenAI CTO Brockman announced more exciting news at TI7: “The next step is 5V5. See you next year at TI!”
Addressing the three core issues of 5V5
The promise was out, but OpenAI wasn’t sure it could replicate the 1V1 success at 5V5. Before actually training the bot, the research team did a great deal of preparatory work:
Time is money. OpenAI ultimately used 128,000 CPU cores and 256 GPUs, letting the AI play thousands of games against itself every day (unlike humans, the AI faces no limit on playtime);
On top of Kubernetes, they built Rapid, a training system dedicated to reinforcement learning, which rapidly replicates trained results and data across machines in a distributed system and then updates the training parameters;
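The heart of such a distributed setup can be caricatured in a few lines: gradient estimates from many rollout workers are averaged, applied, and the fresh parameters are broadcast back to every worker. This is a toy sketch of the synchronization idea only, not Rapid’s actual design.

```python
def rapid_step(params, worker_grads, lr=0.01):
    """One synchronization step: average the workers' gradient estimates,
    apply a gradient-descent update, and return the new parameters that
    every rollout worker would then receive."""
    n = len(worker_grads)
    avg_grad = [sum(g[i] for g in worker_grads) / n
                for i in range(len(params))]
    return [p - lr * g for p, g in zip(params, avg_grad)]

params = [1.0, -2.0]                  # toy policy parameters
grads = [[0.5, 0.5], [1.5, -0.5]]     # two workers' gradient estimates
params = rapid_step(params, grads)    # average gradient is [1.0, 0.0]
```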
They used Gym as the training environment. Gym is OpenAI’s own reinforcement-learning toolkit, and it supplies the environment interface and supporting code that OpenAI Five requires.
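Gym-style environments all share the same reset()/step() loop. The toy environment below mimics that interface for a “last-hit the creep” drill; it is purely illustrative and is not OpenAI’s actual Dota 2 environment.

```python
import random

class ToyLastHitEnv:
    """A toy environment with the Gym-style reset()/step() interface.

    The agent must 'last-hit' a creep: attack exactly when its health
    would drop to zero. Purely illustrative, not OpenAI's Dota 2 env.
    """
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.creep_hp = self.rng.randint(5, 10)
        return self.creep_hp  # the observation

    def step(self, action):
        # action: 0 = wait (allied creeps chip 1 hp), 1 = attack (3 damage)
        if action == 1:
            reward = 1.0 if self.creep_hp <= 3 else -0.1  # last-hit vs waste
            self.creep_hp -= 3
        else:
            reward = 0.0
            self.creep_hp -= 1
        done = self.creep_hp <= 0
        return self.creep_hp, reward, done, {}

env = ToyLastHitEnv(seed=1)
obs = env.reset()
total, done = 0.0, False
while not done:
    action = 1 if obs <= 3 else 0           # trivial scripted policy
    obs, reward, done, _ = env.step(action)
    total += reward
# The scripted policy always waits until the killing blow, so total == 1.0
```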
After deployment, OpenAI needs to address three core issues: long-term operations, rewards, and team collaboration.
To train each hero, OpenAI uses two machine learning techniques: LSTM and Proximal Policy Optimization.
It’s easy to see why an LSTM is used: playing Dota 2 requires memory, since each action an enemy hero takes now affects what happens later. The LSTM (long short-term memory network) is a type of recurrent neural network (RNN) better suited than an ordinary RNN to processing and predicting events separated by long intervals and delays in a time series. An LSTM has a component called the cell, which can judge whether incoming information is useful and needs to be remembered.
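The gating just described can be shown with a single scalar LSTM cell written out in pure Python. The weights here are fixed placeholders (real networks learn them), and OpenAI Five’s actual layer has 1024 units, not one.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One step of a 1-unit LSTM cell (scalar toy version).

    Gates decide what to forget, what new input to store, and what to
    expose as output: the 'cell' memory described in the text.
    `w` holds the weights; real networks learn them, here they're fixed.
    """
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate
    c = f * c_prev + i * g        # updated cell memory
    h = o * math.tanh(c)          # new hidden state / output
    return h, c

weights = {k: 0.5 for k in
           ("wf", "uf", "bf", "wi", "ui", "bi",
            "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = 0.0, 0.0
for x in (1.0, -1.0, 0.5):        # a tiny observation sequence
    h, c = lstm_cell_step(x, h, c, weights)
```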
Each bot’s neural network is a single-layer, 1024-unit LSTM that watches the game state and acts accordingly. An interactive demo on OpenAI’s blog shows how each bot issues its commands, as observed through the Dota 2 API.
For example, in the lower-right corner of the demo, Viper (the Netherdrake) casts its second ability. To do so, the network must make four choices: the action type (move, attack, cast an ability, use an item), the target, where to aim, and when to act. OpenAI ultimately represents the Dota 2 world as a list of 20,000 numerical values.
Each bot’s self-learning relies on Proximal Policy Optimization (PPO), a reinforcement learning algorithm OpenAI proposed in 2017 that has been shown to reach better results with less data and fewer parameters than generic policy-gradient methods. Both OpenAI Five and the earlier 1V1 bot learn by playing against themselves, starting from random parameters, with no human data and no bootstrapping scripts.
To avoid “strategy collapse,” each agent trains against its current self for 80% of games and against past versions of itself for the other 20%.
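The 80/20 opponent mix is simple to implement; a minimal sketch:

```python
import random

def pick_opponent(current_policy, past_policies, rng, p_self=0.8):
    """Choose a training opponent: 80% of the time the latest policy
    (pure self-play), 20% a snapshot of a past policy, so the agent
    cannot 'collapse' into only beating its own current habits."""
    if not past_policies or rng.random() < p_self:
        return current_policy
    return rng.choice(past_policies)

rng = random.Random(0)
past = ["snapshot_v1", "snapshot_v2", "snapshot_v3"]
picks = [pick_opponent("latest", past, rng) for _ in range(10000)]
frac_self = picks.count("latest") / len(picks)  # close to 0.8
```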
The reward system has two aspects. The first is the weight each in-game event carries toward the final outcome: for example, a deny is weighted 0.2 and a last hit 0.16; a tower is weighted 1.0, but the two towers in front of the Ancient are weighted only 0.75, the same as a tier-1 tower, and dying carries a negative weight.
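A shaped reward of this kind is essentially a weighted sum over game events. The sketch below uses the weights quoted above; the death penalty is a placeholder value, since the article only says it is negative.

```python
# Per-event reward weights, following the figures quoted in the article
# (an illustrative subset; the real shaped reward has many more terms,
# and -1.0 for death is a placeholder for "some negative weight").
REWARD_WEIGHTS = {
    "deny": 0.2,
    "last_hit": 0.16,
    "tower": 1.0,
    "ancient_tower": 0.75,  # each of the two towers guarding the Ancient
    "death": -1.0,
}

def shaped_reward(events):
    """Sum the weighted contributions of the events a hero caused."""
    return sum(REWARD_WEIGHTS[e] for e in events)

r = shaped_reward(["last_hit", "last_hit", "deny", "tower"])
# 0.16 + 0.16 + 0.2 + 1.0 = 1.52
```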
The second: each neural network is trained to maximize an exponentially decayed sum of future rewards, governed by a discount factor γ (gamma). This parameter determines whether the bot focuses on long-term or short-term rewards. If γ is too small, the bot cares only about immediate gains such as gold; if too large, it fixates on distant rewards, which makes early training ineffective.
According to OpenAI’s official blog, they raised γ from 0.998 (a reward half-life of 46 seconds) to 0.9997 (a half-life of five minutes). For comparison, the longest half-life in OpenAI’s PPO paper was 0.5 seconds, the longest in DeepMind’s Rainbow paper was 4.4 seconds, and Google Brain’s “Observe and Look Further” paper used a 46-second half-life.
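The relationship between γ and the half-life can be checked directly. Assuming one action every 4 frames at 30 fps (about 0.133 s, the cadence reported for OpenAI Five), solving gamma**n = 0.5 for n reproduces the figures above:

```python
import math

def half_life_seconds(gamma, seconds_per_step=4 / 30):
    """Seconds until a future reward is discounted by half.

    Assumes one action every 4 frames at 30 fps (~0.133 s), the cadence
    reported for OpenAI Five; solves gamma**n == 0.5 for n.
    """
    steps = math.log(0.5) / math.log(gamma)
    return steps * seconds_per_step

hl_old = half_life_seconds(0.998)    # ~46 seconds
hl_new = half_life_seconds(0.9997)   # ~5 minutes
```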
How five neural networks cooperate in team fights is another point of curiosity, and it too rests on the reward system. OpenAI developed a hyperparameter called Team Spirit, ranging from 0 to 1: the smaller the value, the more “selfish” each network is; the larger, the more it weighs the team’s overall interests. In the end, OpenAI found that matches were won with Team Spirit set to 1.
Early in training, the researchers kept the value low so each AI would think mostly about its own rewards while learning to lane, last-hit, and accumulate gold and experience. Once each network had learned the basic strategies and mechanics, the researchers slowly raised the value.
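Team Spirit can be read as a blending coefficient between a hero’s own reward and the team average. A minimal sketch (the exact formula OpenAI uses is an assumption here):

```python
def blended_rewards(raw_rewards, team_spirit):
    """Blend each hero's own reward with the team average.

    team_spirit = 0: purely selfish heroes; team_spirit = 1: every hero
    optimizes the team's shared outcome. (Assumed formula, for illustration.)
    """
    team_avg = sum(raw_rewards) / len(raw_rewards)
    return [(1.0 - team_spirit) * r + team_spirit * team_avg
            for r in raw_rewards]

per_hero = [2.0, 0.0, 0.0, 1.0, 1.0]      # raw rewards this tick
selfish = blended_rewards(per_hero, 0.0)  # unchanged per-hero rewards
shared = blended_rewards(per_hero, 1.0)   # everyone gets the mean, 0.8
```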
Because training starts from random parameters with no injected human experience, the AI has no concept of positions 1 through 5, draws no distinction between support and carry, and learns itemization entirely from scratch.
In the earliest games, the heroes wander the map aimlessly. After a few hours of training, concepts such as laning, farming, and mid-game fighting emerge. Over several days, the agents consistently adopt basic human strategies: stealing resources from opponents, pushing towers to grow, and rotating heroes around the map for lane advantages. With further training, they begin to learn advanced strategies, such as all five heroes pushing a tower together.
“It only took AI two days to beat me”
Jonathan Raiman, who studied at MIT, joined OpenAI last October. He already knew many of the OpenAI researchers, and after he joined, the team began holding five-player matches on Monday nights, which gradually became an OpenAI tradition.
On a Monday in May (May 15th, to be exact), the AI scored its first win over an in-house OpenAI team (around 2,500 MMR) in a restricted Dota environment.
“I think we humans lasted about 40 minutes,” said Raiman, who watched from the sidelines. “After that, the games got shorter and shorter. I was super excited! I thought we had a 50/50 chance against the pros.”
In fact, the AI had already beaten humans a week earlier. But there was a problem with that victory: examining the code afterward, the researchers found a bug in the code running the neural network. The AI had never used the LSTM’s memory during the game, yet it won anyway. Until then, no one had noticed anything wrong.
“A lot of the work in machine learning is still engineering and fixing bugs in the system,” said Susan Zhang, a research scientist at OpenAI. “For example, the AI avoided reaching level 25 for a long time, because it had discovered a huge negative reward at level 25; so at level 24 it would simply stop going out to gain experience.”
Raiman also played against the AI. The first time, his team won; after two more days of training, he was no match. “For players at my level, there’s only a 24-to-48-hour window in which we can still compete. At first we could hold out for 40 minutes, then 20, then 10, and then we just stayed in the base.”
By June 6, OpenAI could beat teams rated 4,000-6,000, but still lost to professional teams queuing around 5,500. In those matches, the researchers noticed several interesting things:
- OpenAI Five repeatedly sacrificed its own safe lane (the Dire’s bottom lane, the Radiant’s top lane) and sent three heroes to the off-lane to pressure the enemy’s safe lane, forcing the fight toward the side the opponent finds harder to defend. This strategy has emerged in the professional scene in recent years and become popular.
- Its early-to-mid-game transitions were faster than its opponents’: the AI ganked proactively when a human player was out of position, and pushed towers before the humans could organize a response.
- The AI funneled early gold and experience to its support heroes (who normally don’t get resources first), raising their damage output to build a bigger advantage, win team fights, and convert the opponent’s mistakes into a quick win.
Nearly a year later, OpenAI announced the progress of the OpenAI Five project for the first time, and released the OpenAI Five Project report.
As more details came out, headlines like “OpenAI defeats human Dota 2 players after training on 180 years of games per day” and “OpenAI breaks Dota 2” quickly swept the world. Microsoft founder Bill Gates tweeted: “AI bots just beat humans at the video game Dota 2. That’s a big deal, because their victory required teamwork and collaboration – a huge milestone in advancing artificial intelligence.”
People began to believe that Dota 2, like Go, was about to be broken by AI.
Only half a Dota
OpenAI’s first-phase results were impressive, but they left many Dota fans unsatisfied for one reason: too many restrictions. In the June matches, both sides were limited to the same five heroes, with no warding, no fog of war, no Roshan, no invisibility, no scanning, and so on. Is that still Dota?
It’s not that OpenAI wanted the restrictions; there was simply too much for the AI to learn and too little time.
For example, OpenAI strictly limited the hero pool, and if you look closely, most picks are entry-level heroes: Crystal Maiden, Shadow Fiend, Lich, Witch Doctor, and so on. Accordingly, one of the most common comments on forums and Weibo was “does OpenAI dare to play Invoker?” and the like.
The AI can play Invoker, but it takes a lot of training time. In this, the AI is much like a person: you start with entry-level heroes and graduate to advanced ones as you improve (I still can’t play Invoker); the harder the hero, the longer it takes to learn.
Because all training parameters start out random, the AI can only learn to use abilities through constant repetition; it doesn’t truly understand them. Some abilities are straightforward, like Crystal Maiden’s ultimate, which always deals damage once cast. Others are more complex, like Alchemist’s second ability, Unstable Concoction, a double-edged sword: brewed for up to 5 seconds, it stuns and damages enemy heroes, but held past about 5.5 seconds it blows up on the caster.
That’s a headache for the AI: throw it or hold it? For a long time, the AI decided that Alchemist’s abilities simply weren’t useful. Humans are quite different; no one refuses to use Alchemist’s second ability just because it can hurt you.
The same goes for Roshan. Killing Roshan grants the Aegis of the Immortal, and by the third Roshan you can also collect Cheese, which restores thousands of health; but the attempt carries a painful cost, and a careless team can die in the pit. So for a long time, the AI simply chose not to fight Roshan.
To solve this, the researchers randomized Roshan’s health during training: when Roshan spawned with only 100 health, the AI would discover it could kill him. Trained this way, today’s AI checks Roshan’s health every time it passes the pit.
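The Roshan curriculum is a simple form of randomization too: sometimes Roshan spawns nearly dead, so the kill (and the reward that follows) gets discovered cheaply. The numbers below are illustrative, not the game’s actual values.

```python
import random

def sample_roshan_hp(rng, max_hp=5500):
    """Randomize Roshan's health for one training rollout (illustrative
    numbers, not the game's real values). Near the low end, killing
    Roshan is almost free, so the reward for doing it gets discovered."""
    return rng.randint(100, max_hp)

def should_attempt_roshan(team_strength, roshan_hp):
    """Toy decision rule of the kind training can produce implicitly:
    attack when the expected cost is low relative to team strength."""
    return roshan_hp < team_strength * 1000

rng = random.Random(7)
hp = sample_roshan_hp(rng)
```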
In today’s match, OpenAI’s heroes repeatedly checked on Roshan as a result of this training.
Warding posed a rather interesting “challenge.” For a long time, the AI kept planting wards inside its own base, seemingly for no reason. The researchers were baffled: why ward the base?! It turned out that when enemy units pushed to the high ground (the tier-3 towers), the AI would plant its wards to free up inventory slots for other items.
To this day, the AI still wards in strange places: under towers, in its own base, sometimes several at once.
Illusions remain a limitation, because OpenAI hasn’t figured out how to make the heroes control them. Raiman says they tried giving heroes illusion-creating items, but the bots used them only when defending high ground or towers, since illusions can soak up some damage; that makes such heroes hard to fit into the lineup.
So from June to August, OpenAI chipped away at these issues. At the same time, it announced the next step: on August 5, it would invite (former) professional players ranked above 99.95 percent of the world’s players to benchmark the AI bots.
“Even if we don’t do well at TI in the end, achieving the benchmark will have made it all worth it,” Zhang said.
Benchmark test: a bloodbath
About a mile from OpenAI’s office, in San Francisco’s Mission District, sits Folsom Street Foundry, a well-regarded local bar. The venue is large, holding 300 to 400 people, and hosts concerts, parties, and other events.
Folsom Street Foundry is where OpenAI Five would play 5V5 in public for the first time against top human players. At noon on Sunday, August 5, the bar was packed. The barstools and bar had been cleared for rows of seats, five desktop computers stood at center stage beside a professional casters’ desk, and dozens of OpenAI researchers, including co-founder Ilya Sutskever and CTO Brockman, turned out to witness the historic moment.
That day, OpenAI Five played four matches: an exhibition against audience members and three benchmark games against the top players. A win would mean the project had hit its milestone.
The matches also lifted a number of restrictions: fog of war, warding, Roshan, and adversarial hero drafting were all enabled, and the hero pool grew from 5 to 18.
Before the games, MoonMeander, an active pro on the human team and then ranked 104th in the world, threw down the gauntlet on Twitter: “never lost to a bot, never will this time.” He was joined on stage by OpenAI’s old friends Blitz, Capitalist, Fogged, and Merlini, all wearing uniforms bearing the word “human.”
How good are these five? Their team name, “99.95th-percentile,” means they are better than 99.95 percent of the world’s players: roughly the top 15,000 in the world, equivalent to Divine 5, above 6,000 MMR on the old ladder.
Even so, the crowd gave the human players little chance. Of at least 10 people interviewed at the event, more than three-quarters thought the AI would win. “I’m rooting for the humans emotionally, but I don’t think they have a chance,” one audience member said.
And so it turned out.
Generally, even a one-sided Dota game lasts a good 30 minutes. The three matches the AI won, however, finished in 13 minutes (that one against the audience team), 21 minutes, and 25 minutes.
In the first match, the humans played the Dire: Earthshaker, Necrophos, Crystal Maiden, Razor, Shadow Fiend. OpenAI Five took the Radiant: Lich, Gyrocopter, Sniper, Death Prophet, Lion.
The human players seemed to struggle to adapt, and it wasn’t until the fifth minute that Blitz’s Shadow Fiend landed a kill. OpenAI played aggressively, shifting quickly from a 2-1-2 lane split to 3-1-1, then pushing the bottom tower four-strong by the 10th minute. This stretch is normally still the laning phase, and the humans never organized a proper defense. By minute 13, the AI led 22-4 in kills.
For the next ten minutes there were few human highlights beyond a double kill by Shadow Fiend, and OpenAI kept the humans suppressed. In the 21st minute, OpenAI broke through two lanes, then wiped the human team on the high ground without losing a hero, and the humans called GG (good game) with the kill score at 8-39.
In the second match, the humans played the Radiant: Earthshaker, Shadow Fiend, Witch Doctor, Death Prophet, Riki. OpenAI took the Dire: Sniper, Gyrocopter, Crystal Maiden, Lion, Lich.
When Blitz picked Shadow Fiend again, OpenAI’s predicted win rate jumped from 56% to 72%. The humans were clearly in better shape this game, and aside from conceding first blood they kept the score tight. But after losing a few team fights, by 20 minutes OpenAI was concentrating its advantage to push towers; it broke all three lanes in one sweep, spawned mega creeps, and the humans called GG at 12-41.
The game produced plenty of curiosities: OpenAI’s Crystal Maiden built a Hand of Midas, an item usually reserved for junglers or late-game carries; the AI learned to hold its attacks at times, though no one knew why; it loves both Observer and Sentry Wards, and it learned to use smoke; and after breaking two lanes, where humans would normally go straight for the last two towers in front of the Ancient, the AI chose to retreat and start dismantling the third lane from the tier-1 tower…
Since the humans were powerless against the AI and OpenAI had gotten what it wanted, the third match became entertainment: the live audience and Twitch viewers drafted for OpenAI Five. The crowd picked four melee heroes (Slark, Sand King, Axe, and Sven) plus a Queen of Pain of little use to the lineup; the humans took Death Prophet, Necrophos, Lion, Lich, and Gyrocopter. OpenAI Five started with a 2.9% predicted win rate, which eventually fell below 1%. The AI nonetheless stayed tenacious, and the kill score stood level at 15-15 in the 15th minute.
Though the match ended as a romp for the human players, 48-22 after 35 minutes, the entertainment produced data worth studying for OpenAI’s researchers. When suppressed by the humans, the AI seemed at a loss, unable to play from behind: Slark ran all over the map, Sven and Axe hit towers mindlessly, and when the humans pushed the high ground, not one of the AI’s five heroes was there to defend it.
“The bot plays with this confident knowledge of where everyone is. It knows exactly how much damage three or four heroes in a lane can put out, and it pounces the moment you’re out of position. It just knows. I’ve never played against anything like it; it’s amazing.”
After the three matches, CTO Brockman tweeted: “OpenAI’s AI system is ready to take on top pros next month at TI8!”
Behind the victory, hidden trouble
What OpenAI did not expect was that the third, just-for-fun game foreshadowed today’s failure.
In fact, the researchers were under even more pressure after the benchmark. The human players in the benchmark were around 6,500 MMR, while TI professionals sit above 9,000. Dramatically improving the AI’s strength in just three weeks would be very difficult.
Raiman also revealed that the third game went so badly that fixing its problems became OpenAI’s top priority.
Zhang felt the time was too short. “We’re trying to do something impressive at TI, and there’s certainly some pressure, but it’s mostly a matter of time. You need to give the experiments time to run, time to train, and at the end you’ll have something really cool. We just don’t have that much time right now!”
“Another problem is that the longer the game goes on, the worse it is for the AI, because there are so many factors and variables to consider.”
Everything said in the two weeks before TI came true: the AI had no answer when playing from behind.
Still, OpenAI’s ability to produce complex collaboration and long-horizon play in an imperfect-information environment is a huge breakthrough. OpenAI did not invent a groundbreaking new algorithm; instead, it combined existing cutting-edge algorithms with large models and massive compute to let agents start from nothing and, through self-play, develop sensible patterns of behavior. The same method could be applied to other games, and to robotics.
TI8 will not be the OpenAI Five’s last stop. One final event is planned for October or November, or even early next year. By then, OpenAI hopes to open up the full hero pool, lift all restrictions, and let the AI and human players contest a truly complete game of Dota 2.
For now, OpenAI’s Dota journey is far from over.
It will be interesting to see how artificial intelligence will perform in the second game tomorrow.