Not clever enough | Yue Paihuai, reporting from Aofeisi | Produced by QbitAI (WeChat official account: QbitAI)

Hold your tongue.

Give yourself a few minutes to play this game and see whether you can beat it. Just a few minutes, and don’t be hard on yourself: clearing it takes 20 minutes on average.

The game address: https://high-level-4.herokuapp.com/experiment

We recommend opening it on a PC; it cannot be played on a mobile phone. If you’re not at your computer, check out the GIFs of our demo:

What the hell is this? It’s kind of freaking hard.

In a way, you’re experiencing what the AI feels like playing a game.

In recent years, AI has been praised for teaching itself to play Atari games such as Breakout, matching or surpassing the average human player.

On the other hand, skeptics note that AI takes far too long to learn a game that humans can pick up at a glance. Why is there such a gap in learning efficiency between machines and humans?

A group of scientists at the University of California, Berkeley, is working on that very question.

The difference may be that humans come into games with a lot of prior knowledge from the real world, which greatly improves decision-making efficiency.

What do you mean?

Based on the game we started with, let’s follow the Berkeley researchers and take a look.

Quantitative analysis

Here we go.

Take a look at the game in its original form.

Even without playing, you can see that the player should dodge the pink monsters and the spiked roadblocks, then jump and climb ladders to get the key in the top right corner and open the door in the top left.

If you want to try it, the address is: https://dry-anchorage-61733.herokuapp.com/experiment

Experiments have shown that humans are quick to pick up the game. The average completion time was 1.8 minutes, the average death count was 3.3, and 3,011 paths were explored.

All right, let’s make it harder.

Remove the semantics

The shape of a ladder is its semantics: humans see that shape and associate it with climbing. How important is semantics? The experiment was simple. The researchers stripped the visual details from every object in the game and replaced each one with a uniformly colored block.

The whole game looks like this.

The average time to complete the game increased to 4.3 minutes, the number of deaths increased to 11.1, and 7,205 game paths were explored.

Demo address here: https://boiling-retreat-38802.herokuapp.com/experiment
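The masking step described above can be pictured with a minimal sketch. It assumes a level stored as a mapping from grid cells to object names, which is our own illustration rather than the researchers’ implementation. Only the orange key and the blue door come from the study; the other colors and the names `MASK_COLORS` and `mask_semantics` are hypothetical.

```python
# Hypothetical palette: the article only states that the key became an
# orange square and the door a blue square. The other colors are made up.
MASK_COLORS = {
    "key": "orange",
    "door": "blue",
    "ladder": "green",
    "monster": "purple",
    "spikes": "yellow",
}


def mask_semantics(level, colors=MASK_COLORS):
    """Replace every object with a uniformly colored block.

    `level` maps (column, row) grid cells to object names. The result keeps
    the same cells but stores only a color, so shape cues are gone.
    """
    return {cell: colors[name] for cell, name in level.items()}


# Toy level: door in the top left, key in the top right, as in the article.
level = {(0, 0): "door", (9, 0): "key", (4, 3): "ladder", (6, 5): "monster"}
masked = mask_semantics(level)
```

Every object stays where it was, but nothing in `masked` says which block is a key and which is a monster. That is the information the players lost.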

Do you think the other players are just weak? Actually, you’re missing the point. Because of how this article is structured, you saw the normal version first. Had you not known the original design, you would feel how much harder the game becomes once its semantic information is hidden.

In the original game, both the key and the door are recognizable, and it is natural for humans to get the key before opening the door. In the masked version, the player has no way of knowing this.

Only 42 of the 120 participants obtained the “key” (orange square) before reaching the “door” (blue square). It also took them longer to reach the “door” after acquiring the “key” than in the original game.

This result suggests that in the absence of semantics, humans are unable to infer reward structures, significantly increasing exploration time.

To further quantify the importance of semantics, the next experiment changed the semantics rather than simply masking them.

The pink monsters and spiked roadblocks were replaced with gold coins and ice cream, which carry positive meanings. The ladders, key and door were replaced with fire, spiked roadblocks and pink monsters respectively, which carry negative ones.

As a result, the participants took even longer to finish: on average 6.1 minutes, 13.7 deaths and 9,400 paths explored. This shows that reversed semantics are more misleading than masked ones.

Obscure the objects

Although none of the objects convey any meaning after the masking above, they still stand out sharply from the background. Humans can easily infer that these eye-catching objects are subgoals and act far more efficiently than a random search would.

To test this, the game was made more confusing still. Every space on the platforms was filled with a differently colored block, most of which did nothing. The blocks representing the actual objects stayed in the same positions as before. Of course, if you’re playing this version for the first time, it is going to be a little confusing.

The game screen looks like this.

Demo address: https://high-level-1.herokuapp.com/experiment

The results: the average time for a human player to complete the game roughly quadrupled to 7.7 minutes, the number of deaths reached 20.2, and 12,232 game paths were explored. The time between finding the key and opening the door increased further.
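The distractor trick in this experiment can be sketched in a few lines. This is a rough illustration under our own assumptions, not the study’s code: the function name `add_distractors`, the grid representation and the palette are all hypothetical.

```python
import random


def add_distractors(objects, platform_cells, palette, seed=0):
    """Fill every free platform cell with a randomly colored, useless block.

    `objects` maps grid cells to the colors of the real (masked) objects;
    those entries are left untouched and stay in their original positions.
    """
    rng = random.Random(seed)  # seeded so the layout is reproducible
    filled = dict(objects)
    for cell in platform_cells:
        if cell not in filled:
            filled[cell] = rng.choice(palette)
    return filled


# Toy example: one platform of ten cells holding two real objects.
objects = {(0, 0): "blue", (9, 0): "orange"}
platform = [(x, 0) for x in range(10)]
palette = ["red", "green", "gray"]
filled = add_distractors(objects, platform, palette)
```

Once every cell holds some block, the real objects no longer stand out from the background, so the player cannot pick subgoals by eye.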

Mask the affordances

As the twists and turns so far have shown, inferring a game’s underlying reward structure is no easy task. But the overall layout still works in the human player’s favor. It is still obvious where the platforms are and how they connect to each other, because the black background gives them away.

What can be done about that?

One way to mask these affordances is to fill the blank areas with random textures similar to those used to render the ladders and platforms. Note that in this experiment the semantics of the various objects remain clearly visible.

Demo address: https://fierce-sierra-47669.herokuapp.com/experiment

Results: the average time to complete the game was 4.7 minutes, the number of deaths was 10.7, and 7,031 game paths were explored. This is not much different from the result of masking the semantics, so it can be argued that affordances, the visual cues that show what can be stood on or climbed, are as important as semantics.

Once the human player realizes that it is possible to stand on or climb certain textures, it is easy to identify other platforms and ladders by similarity. Humans assume that things that look the same have the same properties.

Let’s go ahead and increase the difficulty.

Every platform and ladder has a different texture this time. Humans can no longer extrapolate from similarity.

Demo address: https://high-level-3.herokuapp.com/experiment

This time, humans averaged 7.6 minutes, 14.8 deaths and 11,715 paths explored. The results show that visual similarity is the second most important type of knowledge for humans in games.

Change the interaction

All of the experiments so far concern vision. In this game, the player also has to know how to interact with different objects. For example, you can jump over a pink monster, and you can press ⬆️ to climb a ladder. Agents controlled by deep reinforcement learning have no such prior knowledge and must learn how to interact with objects little by little.

To test the importance of this prior knowledge, a new version of the game was created. We won’t go into what’s changed, but you can try it out for yourself.

Demo address: https://calm-ocean-56541.herokuapp.com/experiment

As you can see in the GIF above, you can’t climb a ladder by simply pressing the up button. You have to alternately press the left and right buttons repeatedly while holding down the up button.

Compared with the original game, this small change raised the averages to 3.6 minutes of play, 6 deaths and 5,942 paths explored.
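The altered ladder rule can be written as a small state machine. This is only a sketch of the rule as described here, assuming key presses arrive as one set of held keys per frame. The function name `ladder_steps`, the key names and the choice to reset when ⬆️ is released are our own assumptions, not details from the study.

```python
def ladder_steps(frames):
    """Count climbing steps under the modified ladder rule.

    `frames` is a sequence of sets of keys held during each frame. A step is
    earned only while "up" is held and the horizontal key differs from the
    previous one, so the player must alternate "left" and "right".
    """
    steps = 0
    last = None  # last horizontal key that earned a step
    for keys in frames:
        if "up" not in keys:
            last = None  # assumption: releasing "up" resets the alternation
            continue
        horizontal = keys & {"left", "right"}
        if len(horizontal) != 1:
            continue  # neither or both horizontal keys: no progress
        (key,) = horizontal
        if key != last:
            steps += 1
            last = key
    return steps
```

Under this rule, holding ⬆️ alone for any number of frames earns zero steps, which is exactly the situation a player who relies on the usual convention runs into first.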

The ultimate challenge

Finally, combining all of the changes above produces the absurdly difficult game mentioned at the beginning.

The average time for humans to complete the mini-game increased to 20 minutes, the number of deaths reached 40, and the number of game paths explored increased nine-fold.

The game is so difficult that the rate of players giving up is very high.
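To compare the versions side by side, the averages quoted in this article can be divided by the original game’s figures. The short script below does only that arithmetic. The condition labels are our own shorthand, and the path count for the combined version is left as `None` because the article gives it only as a multiple.

```python
# Averages quoted in the article: (minutes, deaths, paths explored).
# None marks a value the article does not give as a number.
RESULTS = {
    "original": (1.8, 3.3, 3011),
    "semantics masked": (4.3, 11.1, 7205),
    "semantics reversed": (6.1, 13.7, 9400),
    "objects among distractors": (7.7, 20.2, 12232),
    "affordances masked": (4.7, 10.7, 7031),
    "similarity removed": (7.6, 14.8, 11715),
    "interaction changed": (3.6, 6.0, 5942),
    "all combined": (20.0, 40.0, None),
}


def slowdown(results, baseline="original"):
    """Express each condition as a multiple of the baseline condition."""
    base = results[baseline]
    return {
        name: tuple(None if v is None else v / b for v, b in zip(values, base))
        for name, values in results.items()
    }


if __name__ == "__main__":
    for name, (minutes, deaths, paths) in slowdown(RESULTS).items():
        shown = "n/a" if paths is None else f"{paths:.1f}x"
        print(f"{name:26s} time {minutes:.1f}x  deaths {deaths:.1f}x  paths {shown}")
```

By this measure the distractor version takes about 4.3 times as long as the original, and the combined version about 11 times as long.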

But that’s not the hardest part. In fact, the Berkeley researchers came up with an even harder version, in which the direction of gravity was flipped and the keys were randomly remapped.

Even researchers who have played it countless times have a hard time completing this version.

Also, they didn’t release a demo address for this version. (If any readers track it down, please leave us a message.)

There is, however, a version with gravity rotated by 90° that you can take on.

Address: https://tranquil-earth-53211.herokuapp.com/experiment

Discussion

The experimental results of this paper show that even strong reinforcement learning algorithms have a lot to learn from human cognition if they are to handle sparse-reward tasks as efficiently as humans do. Prior knowledge about objects is a huge help to humans in the sparse-reward task of playing games.

However, strong priors are not always a good thing, because they sometimes limit the scope of exploration. Future research should also look at environments where exploration benefits from fewer constraints.

In addition, humans’ prior knowledge about games goes far beyond the “objects” discussed in this paper. When playing, people also assume that the game has a goal and that the right button usually means moving forward. These priors are worth exploring too.
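The effect of a prior such as “right means forward” on a sparse-reward task can be shown with a toy example. This is our own illustration, not an experiment from the paper: a one-dimensional corridor whose only reward sits at the far right end, with one agent that trusts the prior and one that explores at random. All names and parameters are hypothetical.

```python
import random


def steps_with_prior(length):
    """Agent that assumes 'right means forward' and never explores."""
    position, steps = 0, 0
    while position < length - 1:
        position += 1
        steps += 1
    return steps


def steps_without_prior(length, seed=0, max_steps=100_000):
    """Agent that moves left or right at random until it finds the goal.

    Returns the number of steps taken, or None if `max_steps` runs out.
    """
    rng = random.Random(seed)
    position, steps = 0, 0
    while position < length - 1:
        if steps >= max_steps:
            return None
        # The left wall blocks movement below cell 0.
        position = max(0, position + rng.choice((-1, 1)))
        steps += 1
    return steps
```

In a corridor of ten cells the agent with the prior always needs exactly nine steps, while the random agent can never do better than that and usually does much worse. The same prior would hurt if the goal were on the left, which is the sense in which strong priors can restrict exploration.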

The researchers finally ranked the classification and importance of prior knowledge based on the experiment. As shown below:

The paper, “Investigating Human Priors for Playing Video Games,” was published at the ICLR 2018 Workshop.

First author Rachit Dubey is a PhD student in Berkeley’s Computational Cognitive Science Lab. The other authors are also from Berkeley: Pulkit Agrawal and Deepak Pathak, PhD students in the Department of Computer Science, and the advisors of the first and third authors, Tom Griffiths, director of the Computational Cognitive Science Lab, and Alexei Efros.

Video

If you’re interested in the research paper, you can also check out this video explaining it.

– End –