DeepMind, a subsidiary of Alphabet, has developed an artificial intelligence system that builds 3D scenes from 2D images. DeepMind's researchers suggest that people do not perceive a visual scene with their eyes alone; they also rely on prior knowledge and reasoning. For example, on seeing three legs of a table in a room, a person will infer that a fourth leg of the same shape and colour is hidden from view, and even without seeing every part of a house, can still imagine or sketch its layout.

In other words, the system, called the Generative Query Network (GQN), learns autonomously. GQN consists of a representation network and a generation network: the former takes the agent's observations as input and produces a description of the scene, while the latter predicts what the scene looks like from viewpoints that have not been observed.

The representation network must describe the scene as accurately as possible, including the positions of objects, their colours, and the layout of the room. Because the generation network learns during training which objects appear in the environment, along with their functions, relationships, and regularities, the representation network can describe the scene in a highly compressed, abstract way and leave the generation network to fill in the details.
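The division of labour described above can be sketched in a few lines of toy code. This is a hypothetical illustration, not DeepMind's actual architecture: random fixed weights stand in for learned parameters, a representation network encodes each (image, viewpoint) observation into a vector, the vectors are summed into one compressed scene representation, and a generation network renders a prediction for an unseen query viewpoint.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM = 16 * 16 * 3   # flattened toy image (assumed size)
VIEW_DIM = 7            # viewpoint parameters, e.g. position + orientation
REP_DIM = 32            # size of the compressed scene representation

# Random fixed weights stand in for parameters that would be learned.
W_f = rng.standard_normal((IMG_DIM + VIEW_DIM, REP_DIM)) * 0.01
W_g = rng.standard_normal((REP_DIM + VIEW_DIM, IMG_DIM)) * 0.01

def represent(image, viewpoint):
    """Representation network: one observation -> representation vector."""
    x = np.concatenate([image, viewpoint])
    return np.tanh(x @ W_f)

def generate(r, query_viewpoint):
    """Generation network: scene representation + query view -> image."""
    x = np.concatenate([r, query_viewpoint])
    return np.tanh(x @ W_g)

# Several observations of the same scene are aggregated by summation,
# so the representation does not depend on the order of observations.
observations = [(rng.random(IMG_DIM), rng.random(VIEW_DIM)) for _ in range(3)]
r = sum(represent(img, v) for img, v in observations)

# Predict the scene's appearance from a viewpoint never observed.
predicted = generate(r, rng.random(VIEW_DIM))
print(predicted.shape)  # one flattened image of IMG_DIM values
```

The key design point the sketch captures is the bottleneck: the generator only ever sees the small vector `r`, so any detail it renders correctly must come from regularities absorbed during training rather than from the raw input images.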

In DeepMind's experiments, the researchers built 3D worlds with randomly arranged objects, colours, shapes, textures, and light sources. After training in these environments, the system could imagine previously unseen scenes from a fresh perspective and render them with correct lighting and shapes. The generation network could also draw a complete 3D building-block configuration from a floor plan of blocks observed by the representation network, or navigate a maze with obstructed views and reconstruct the correct 3D scene from limited observations.

The researchers said that, compared with traditional computer-vision techniques, the method still has limitations and has so far been trained only on synthetic scenes. But as new data and more capable hardware become available, the GQN framework could be applied to real scenes and higher-resolution images. DeepMind will also explore GQN in further scene-understanding applications, for example querying across space and time to learn invariants of physics and motion, or applying it to virtual and augmented reality.

Source: Huizhong Industrial Science Station http://hertzhon.com.tw/