Overview

Face recognition has its pitfalls: algorithms that work on real-world faces often fail on animated ones. Disney's technology team is developing algorithms to help animators search the studio's archives. The team built on PyTorch and saw significant productivity gains.

This article is compiled and published by the PyTorch developer community

When it comes to animation, we have to mention Disney, a business empire established in 1923. Disney started with animation and has led the development of animated film worldwide ever since.

Behind every animated film are the efforts of hundreds of people. Disney's journey into digital animation began with the release of Toy Story, the first computer-animated feature film. With the development of CGI and AI technologies, the production and archiving methods of Disney animated films have also changed dramatically. The worldwide hit Zootopia, for example, took five years to complete.

Disney has also recruited an army of computer scientists who are using cutting-edge technology to change the way content is created and lighten the burden on those behind the movies.

How does a century-old film giant manage digital content?

Walt Disney Animation Studios employs more than 800 people from 25 different countries, including artists, directors, writers, producers and technical teams.

Making a film involves a complex process: inspiration, story outline, script drafting, art design, character design, voice recording, animation, special effects, editing, post-production, and more.

As of March 2021, Walt Disney Animation Studios, which specializes in producing animated films, had produced and released 59 feature-length films with hundreds of characters. Historical character data is frequently reused in sequels, Easter eggs, and reference designs.

Animators who need a particular character, scene, or object for a sequel or a reference design must search through vast archives of content. To do so, they often spend hours watching videos and sifting through footage by eye.

To address this issue, Disney has been working since 2016 on an AI project called "Content Genome" to create a Disney digital content archive that helps animators quickly and accurately identify faces (whether of people or objects) in their animations.

Training a face recognition algorithm for animation

The first step in digitizing content libraries is to detect and tag content from past works, making it easier for creators and users to search.

Face recognition technology is relatively mature, but can the same methods be applied to faces in animation?

Content Genome’s technology team tested it and found it only worked in certain situations.

They took two animated works, "Elena of Avalor" and "The Lion Guard," and manually annotated samples, drawing bounding boxes around faces in hundreds of frames. With this manually annotated data set, the team verified that a face recognition pipeline based on HOG + SVM performed poorly on animated faces (especially human-like and animal faces).

Manually marking faces in an animated image

The team's analysis confirmed that methods like HOG + SVM are robust to small changes in color, brightness, or texture, but the models could only match animated characters with human proportions (i.e., two eyes, a nose, and a mouth).

In addition, since the backgrounds of animated content often contain flat areas with little detail, the Faster R-CNN model mistakenly identified anything that stood out against a simple background as an animated face.

In Cars, the abstract faces of the two racing protagonists cannot be detected by traditional facial recognition technology

So the team decided they needed a technology that could learn more abstract concepts of faces.

The team chose to train their models with PyTorch. With PyTorch, the team says, they could access state-of-the-art pre-trained models, meet their training needs, and make the archiving process more efficient.

During training, the team found that their data set had enough positive samples but not enough negative samples to train the model. They decided to augment the initial data set with additional images that contained no animated faces but had an animation-like style.

To do this technically, they extended Torchvision's Faster R-CNN implementation to allow negative samples (images without annotations) to be loaded during training.

This is also a new feature the team contributed to Torchvision 0.6, with guidance from Torchvision's core developers. Adding negative samples to the data set greatly reduced false positives at inference time, yielding excellent results.
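The article does not show the team's code, but the idea of feeding unannotated negative samples to torchvision's Faster R-CNN can be sketched as follows. A negative image is passed with empty "boxes" and "labels" tensors; the dataset class and sample data here are hypothetical illustrations, not Disney's implementation.

```python
import torch
from torch.utils.data import Dataset

class FaceDataset(Dataset):
    """Hypothetical dataset mixing annotated face frames with
    negative frames (no faces) in the target format torchvision's
    detection models expect."""

    def __init__(self, samples):
        # samples: list of (image_tensor, list_of_boxes) pairs;
        # an empty box list marks a negative sample.
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, boxes = self.samples[idx]
        if boxes:
            target = {
                "boxes": torch.tensor(boxes, dtype=torch.float32),
                "labels": torch.ones(len(boxes), dtype=torch.int64),
            }
        else:
            # Since Torchvision 0.6, an image with zero boxes is a
            # valid negative sample: pass empty (0, 4) tensors.
            target = {
                "boxes": torch.zeros((0, 4), dtype=torch.float32),
                "labels": torch.zeros((0,), dtype=torch.int64),
            }
        return image, target

# Tiny demo: one positive frame, one negative frame.
positive = (torch.rand(3, 64, 64), [[4.0, 4.0, 30.0, 30.0]])
negative = (torch.rand(3, 64, 64), [])
ds = FaceDataset([positive, negative])
_, neg_target = ds[1]
print(neg_target["boxes"].shape)  # torch.Size([0, 4])
```

During training, such negative targets contribute only background examples to the region proposal and classification heads, which is what suppresses false positives on flat backgrounds.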

Video processing with PyTorch is 10 times more efficient

After implementing facial recognition for animated images, the team's next goal was to speed up the video analysis process; PyTorch also made it possible to parallelize and accelerate the other tasks.

According to the team, reading and decoding the video was also time-consuming, so they used a custom PyTorch IterableDataset, in conjunction with PyTorch's DataLoader, to read different parts of the video in parallel across CPUs.

The video's I-frames are extracted and divided into chunks, and each CPU worker reads a different chunk
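A minimal sketch of this pattern, assuming a hypothetical `decode_frame` helper in place of real video decoding: an IterableDataset splits the frame range into chunks, and each DataLoader worker keeps only its own share of the chunks via `get_worker_info`.

```python
import torch
from torch.utils.data import IterableDataset, DataLoader, get_worker_info

class VideoChunkDataset(IterableDataset):
    """Sketch: split a video's frame range into chunks so that each
    DataLoader worker decodes a different part in parallel."""

    def __init__(self, num_frames, chunk_size):
        self.num_frames = num_frames
        self.chunk_size = chunk_size

    def __iter__(self):
        chunks = [
            (start, min(start + self.chunk_size, self.num_frames))
            for start in range(0, self.num_frames, self.chunk_size)
        ]
        info = get_worker_info()
        if info is not None:
            # Each worker keeps every num_workers-th chunk.
            chunks = chunks[info.id :: info.num_workers]
        for start, end in chunks:
            for idx in range(start, end):
                yield self.decode_frame(idx)

    def decode_frame(self, idx):
        # Placeholder for real decoding (e.g. via ffmpeg/PyAV);
        # here each "frame" is just a tensor filled with its index.
        return torch.full((3, 4, 4), float(idx))

loader = DataLoader(VideoChunkDataset(num_frames=10, chunk_size=4),
                    batch_size=5, num_workers=0)
batches = list(loader)
print(len(batches))  # 2
```

With `num_workers > 0`, the chunk slicing above is what prevents every worker from decoding the same frames twice.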

Reading the video this way is fast, but the team wanted to perform all the computation in a single read pass. So they implemented most of the pipeline in PyTorch with GPU execution in mind: each frame is sent to the GPU only once, and all algorithms are then applied to each batch, minimizing communication between CPU and GPU.
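The "send once, compute everything" idea can be sketched like this; the per-frame operations (grayscale conversion, a mean statistic, a crude gradient) are illustrative stand-ins for the team's actual algorithms.

```python
import torch

def process_batch(frames, device):
    """Sketch: copy each batch of frames to the device once, then run
    every per-frame algorithm on that resident batch (hypothetical ops)."""
    batch = frames.to(device)                 # single CPU->GPU copy
    gray = batch.mean(dim=1)                  # e.g. grayscale conversion
    means = gray.mean(dim=(1, 2))             # per-frame brightness statistic
    edges = gray[:, :, 1:] - gray[:, :, :-1]  # crude horizontal gradient
    # Only the small results travel back to the CPU.
    return means.cpu(), edges.abs().mean(dim=(1, 2)).cpu()

device = "cuda" if torch.cuda.is_available() else "cpu"
frames = torch.rand(8, 3, 16, 16)
means, edge_strength = process_batch(frames, device)
print(means.shape, edge_strength.shape)
```

The point of the design is that the large tensor crosses the CPU-GPU boundary once per batch, while only small per-frame statistics are copied back.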

The team also used PyTorch to implement more traditional algorithms, such as a shot detector that does not use neural networks and instead performs operations such as color space conversions, histograms, and singular value decomposition (SVD). PyTorch let the team move computation to GPUs at minimal cost and easily reuse intermediate results shared between multiple algorithms.
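As an illustration of such a traditional algorithm in pure PyTorch, here is a toy histogram-based shot-boundary detector. The bin count and threshold are arbitrary assumptions, not the team's parameters; because everything is tensor arithmetic, moving `frames` to a GPU moves the whole computation there.

```python
import torch

def shot_boundaries(frames, threshold=0.5):
    """Toy shot detector: flag a cut where the color histograms of
    consecutive frames differ sharply. frames: (N, 3, H, W) in [0, 1]."""
    hists = []
    for frame in frames:
        # 16-bin histogram per channel, normalized over the frame.
        h = torch.stack([torch.histc(c, bins=16, min=0.0, max=1.0)
                         for c in frame])
        hists.append(h / h.sum())
    hists = torch.stack(hists)                       # (N, 3, 16)
    # L1 distance between consecutive frames' histograms.
    dist = (hists[1:] - hists[:-1]).abs().sum(dim=(1, 2))
    return (dist > threshold).nonzero().flatten() + 1

# Two dark frames followed by two bright frames -> one cut at index 2.
frames = torch.cat([torch.zeros(2, 3, 8, 8) + 0.1,
                    torch.zeros(2, 3, 8, 8) + 0.9])
print(shot_boundaries(frames).tolist())  # [2]
```

The normalized histograms computed here are exactly the kind of intermediate result that several downstream algorithms can share without recomputation.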

By using PyTorch, the team moved the CPU portions of the pipeline onto the GPU and used DataLoader to speed up video reading, taking full advantage of the hardware and ultimately cutting processing time by a factor of 10.

The team's developers concluded that PyTorch's core components, such as IterableDataset, DataLoader, and Torchvision, allowed them to improve data loading and algorithm efficiency in production. From inference, to model training resources, to complete pipeline-optimization toolsets, the team is increasingly turning to PyTorch.