NVIDIA announces DatasetGAN, an AI training dataset generator
Researchers at Nvidia have created DatasetGAN, a system that generates annotated synthetic images for building datasets that train AI vision models. DatasetGAN can be trained with as few as 16 human-annotated images and performs as well as fully supervised systems that require 100 times as many annotated images.
The system and experiments are described in a paper to be presented at the upcoming Computer Vision and Pattern Recognition conference (CVPR 2021). DatasetGAN uses Nvidia’s StyleGAN technology to generate realistic images. A human annotator labels the parts of the objects in a small set of these images in detail, and this data is used to train an interpreter that produces feature labels from StyleGAN’s latent space. The result is a system that can generate a practically unlimited number of image-annotation pairs, which can then serve as a training dataset for any computer vision (CV) system.
A generative adversarial network (GAN) consists of two deep-learning models: a generator, which learns to create realistic data, and a discriminator, which learns to distinguish real data from the generator’s output. After training, the generator is often used on its own simply to produce data. Nvidia already uses GANs in a number of applications, including its Maxine platform for reducing video-conferencing bandwidth. In 2019, Nvidia developed a GAN called StyleGAN that generates realistic images of faces and powers the popular website This Person Does Not Exist. Last year, Nvidia developed a StyleGAN variant that takes camera, texture, background, and other parameters as input to produce controllable image renderings.
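To make the two-model setup concrete, here is a minimal sketch of one adversarial training step in PyTorch; `generator`, `discriminator`, and the optimizers are assumed placeholders, not Nvidia’s actual models.

```python
import torch
import torch.nn as nn

# Minimal sketch of one GAN training step. `generator`, `discriminator`,
# and `real_batch` are assumed placeholders, not Nvidia's actual models.
def gan_step(generator, discriminator, real_batch, g_opt, d_opt, latent_dim=512):
    bce = nn.BCEWithLogitsLoss()
    batch = real_batch.size(0)

    # Discriminator update: label real images 1, generated images 0.
    z = torch.randn(batch, latent_dim)
    fake = generator(z).detach()  # detach so only the discriminator updates
    d_loss = bce(discriminator(real_batch), torch.ones(batch, 1)) + \
             bce(discriminator(fake), torch.zeros(batch, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: try to make the discriminator output 1 on fakes.
    z = torch.randn(batch, latent_dim)
    g_loss = bce(discriminator(generator(z)), torch.ones(batch, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

Repeating this step over many batches is what drives the generator toward producing data the discriminator can no longer tell apart from the real thing.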
Although GANs can produce an effectively infinite number of unique, high-quality images, most CV training algorithms also require that each image be annotated with information about the objects it contains. ImageNet, one of the most popular CV datasets, famously employed tens of thousands of workers on Amazon’s Mechanical Turk to tag its images. Although those workers could label images at a rate of about five per minute, the images were simple pictures of a single object. More complex vision tasks, such as those required by self-driving cars, need complex scene images with _semantic segmentation_, where every pixel is labeled as part of an object. According to the Nvidia researchers, “tagging a complex scene with 50 objects can take 30 to 90 minutes.”
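To make that annotation burden concrete: a semantic-segmentation label is simply a class ID assigned to every pixel of the image. The toy NumPy example below (the class names are hypothetical) shows the idea at a 4x4 scale.

```python
import numpy as np

# A semantic-segmentation label is a per-pixel class map the same size as
# the image. Toy 4x4 example with hypothetical classes:
# 0 = background, 1 = road, 2 = car.
mask = np.array([
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 2, 2, 1],
    [1, 2, 2, 1],
])
# A human annotator must assign one of these class IDs to every pixel,
# which is why labeling a 50-object scene can take 30 to 90 minutes.
```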
Nvidia’s insight behind DatasetGAN is that the _latent space_ the generator takes as input must contain semantic information about the generated image, so it can be used to create annotation maps for the image. The team built a training dataset for their system by first generating several images and saving the latent vectors associated with them. The generated images were annotated by human workers, and the latent vectors were paired with these annotations for training. This dataset was then used to train an ensemble of multi-layer perceptron (MLP) classifiers as _style interpreters_. The classifiers’ input consists of the feature vectors the GAN produces when generating each pixel, while the output is a label for that pixel; for example, when the GAN generates an image of a face, the interpreter outputs labels representing parts of the face, such as the cheeks, nose, or ears.
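A minimal sketch of one such style interpreter in PyTorch, assuming the per-pixel feature vectors have already been extracted from StyleGAN’s intermediate layers; the layer sizes, `FEATURE_DIM`, and class count here are illustrative, not the paper’s exact values, and the paper trains an ensemble of these MLPs rather than a single one.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 34      # illustrative: number of part labels (e.g., face parts)
FEATURE_DIM = 5056    # assumed: length of the concatenated per-pixel feature vector

# One "style interpreter": an MLP mapping a pixel's StyleGAN feature
# vector to logits over part labels.
interpreter = nn.Sequential(
    nn.Linear(FEATURE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, NUM_CLASSES),
)

def train_interpreter(features, labels, epochs=100):
    """features: (num_pixels, FEATURE_DIM) gathered from the annotated GAN images;
    labels: (num_pixels,) integer part IDs from the human annotations."""
    opt = torch.optim.Adam(interpreter.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(interpreter(features), labels)
        loss.backward()
        opt.step()
```

Because each annotated image supplies a label for every one of its pixels, even 16 annotated images yield a large number of (feature vector, label) training pairs for the interpreter.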
The researchers trained the interpreters on generated images labeled by experienced human annotators. The images covered bedrooms, cars, faces, birds, and cats, with 16 to 40 examples per category. They then used the complete DatasetGAN system to generate image datasets, which in turn were used to train standard CV models. Using several common CV benchmarks, such as Celeb-A and Stanford Cars, the team compared the performance of models trained on the generated datasets with baseline models trained using current state-of-the-art transfer-learning and semi-supervised techniques. Given the same number of annotated images, Nvidia’s models performed “significantly” better than the baselines on all benchmarks.
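Once the interpreters are trained, dataset generation reduces to sampling latents and labeling every rendered pixel. The sketch below assumes a hypothetical `stylegan` callable that returns per-pixel features alongside the image; the real pipeline’s interfaces may differ.

```python
import torch

# Sketch of the dataset-generation loop once the interpreter is trained:
# sample a latent, render an image, and label every pixel. `stylegan` is
# an assumed helper returning (image, per-pixel features), not the real API.
def generate_annotated_pair(stylegan, interpreter, latent_dim=512):
    z = torch.randn(1, latent_dim)
    image, features = stylegan(z)   # image: (1, 3, H, W); features: (H*W, FEATURE_DIM)
    with torch.no_grad():
        labels = interpreter(features).argmax(dim=-1)  # one part label per pixel
    return image, labels.reshape(image.shape[-2:])     # mask shaped (H, W)

# Repeating this loop yields an effectively unlimited synthetic training
# set of (image, segmentation-mask) pairs for downstream CV models.
```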
Using synthetic data to train AI is an active research topic because it reduces the cost and labor of creating datasets. A common technique for training mobile robots and autonomous vehicles is to use virtual environments, and even video games, as data sources. In 2015, researchers at the University of Massachusetts Lowell used crowdsourced CAD models to train image classifiers. In 2017, Apple developed a system that used a GAN to improve the quality of synthetic images for CV training, but that technique did not produce pixel-level semantic labels.
While Nvidia has open-sourced StyleGAN, the code for DatasetGAN has yet to be released. In a Twitter discussion about the work, co-author Huan Ling noted that the team is working hard on the release and hopes to meet the deadline for this year’s NeurIPS conference.
Original link: www.infoq.com/news/2021/0…