Advancing instance-level recognition research

Instance-level recognition (ILR) is the computer vision task of recognizing a specific instance of an object, not just the category it belongs to. For example, instead of labeling an image as "post-Impressionist painting," we are interested in labels such as "Starry Night Over the Rhône by Vincent van Gogh," or "Arc de Triomphe, Paris, France" rather than simply "arch." Instance-level recognition problems arise in many domains, such as landmarks, artwork, products, and logos, and have applications in visual search, personal photo organization, shopping, and more. Over the past several years, Google has contributed to ILR research through the Google Landmarks Dataset, Google Landmarks Dataset v2 (GLDv2), and new models such as DELF and Detect-to-Retrieve.

Today, we highlight some results from the instance-level recognition workshop at ECCV '20. The workshop brought together experts and enthusiasts in the field for many fruitful discussions, including our ECCV '20 paper "Deep Local and Global Features" (DELG), a state-of-the-art image feature model for instance-level recognition, and an open-source codebase supporting DELG and other related ILR techniques. It also introduced two new landmark challenges based on GLDv2 (covering recognition and retrieval tasks) and future ILR challenges that extend to other domains: artwork recognition and product retrieval. The long-term goal of the workshop and challenges is to advance the field of ILR and push the state of the art by unifying research workstreams from different domains, which so far have mostly been addressed as separate problems.

DELG: Deep Local and Global Features

Effective image representations are a key component of solving instance-level recognition problems. Generally, two types of representation are needed: global and local image features. A global feature summarizes the entire content of an image, leading to a compact representation, but discards information about the spatial arrangement of visual elements that may be characteristic of a particular instance. Local features, on the other hand, comprise descriptors and geometric information for specific image regions; they are especially useful for matching images depicting the same object.
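To make the distinction concrete, here is a minimal sketch contrasting the two kinds of representation using a generic ImageNet-style backbone. This is not the DELG implementation; the backbone, pooling choice, and shapes are illustrative assumptions only.

```python
# Minimal sketch (not the DELG implementation): contrasting global and local
# features on top of a generic backbone's activations.
import tensorflow as tf

backbone = tf.keras.applications.ResNet50(include_top=False, weights=None)

image = tf.random.uniform([1, 224, 224, 3])       # placeholder input image
feature_map = backbone(image)                     # [1, H, W, C] dense activations

# Global feature: pool the whole map into one compact vector describing the image.
global_descriptor = tf.reduce_mean(feature_map, axis=[1, 2])        # [1, C]
global_descriptor = tf.math.l2_normalize(global_descriptor, axis=-1)

# Local features: keep one descriptor per spatial location, together with its
# position, so two images of the same object can be matched region by region.
_, h, w, c = feature_map.shape
local_descriptors = tf.reshape(feature_map, [h * w, c])             # [H*W, C]
ys, xs = tf.meshgrid(tf.range(h), tf.range(w), indexing='ij')
local_positions = tf.stack([tf.reshape(ys, [-1]), tf.reshape(xs, [-1])], axis=1)

print(global_descriptor.shape, local_descriptors.shape, local_positions.shape)
```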

Currently, most systems that rely on both types of features compute each of them with a separate model, leading to redundant computation and reduced overall efficiency. To address this, we proposed DELG, a unified model for local and global image features.

The DELG model uses a fully convolutional neural network with two distinct heads: one for global features and one for local features. Global features are obtained by pooling feature maps from deep network layers, which in effect summarize the salient content of the input image and make the model more robust to subtle changes in the input. The local feature branch uses intermediate feature maps to detect salient image regions with the help of an attention module, and generates descriptors that represent the associated local content in a discriminative way.
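The sketch below illustrates this two-head idea in a simplified form. It is an assumption-laden approximation, not the official model: the choice of intermediate layer, the generalized-mean (GeM) pooling power, and the local descriptor dimension are placeholders, and the real DELG adds components (e.g., descriptor whitening and an autoencoder for local features) omitted here.

```python
# Simplified sketch of a two-branch global/local feature model, loosely
# following the DELG description. Layer names, dimensions, and GeM power
# are assumptions; the official implementation differs.
import tensorflow as tf

class TwoHeadModel(tf.keras.Model):
    def __init__(self, local_dim=128, gem_power=3.0):
        super().__init__()
        resnet = tf.keras.applications.ResNet50(include_top=False, weights=None)
        # Intermediate map (conv4-level) feeds the local branch; the deepest
        # map (conv5-level) feeds the global branch.
        self.backbone = tf.keras.Model(
            inputs=resnet.input,
            outputs=[resnet.get_layer('conv4_block6_out').output, resnet.output])
        self.gem_power = gem_power
        self.attention = tf.keras.layers.Conv2D(1, 1, activation='softplus')
        self.local_proj = tf.keras.layers.Conv2D(local_dim, 1)  # compact local descriptors

    def call(self, images):
        mid_map, deep_map = self.backbone(images)

        # Global branch: GeM pooling of the deep map -> one L2-normalized vector.
        pooled = tf.reduce_mean(
            tf.pow(tf.nn.relu(deep_map) + 1e-6, self.gem_power), axis=[1, 2])
        global_feature = tf.math.l2_normalize(
            tf.pow(pooled, 1.0 / self.gem_power), axis=-1)

        # Local branch: attention scores select salient regions of the
        # intermediate map; a 1x1 projection yields per-location descriptors.
        scores = self.attention(mid_map)              # [B, H, W, 1]
        local_features = self.local_proj(mid_map)     # [B, H, W, local_dim]
        return global_feature, local_features, scores

model = TwoHeadModel()
g, l, s = model(tf.random.uniform([1, 321, 321, 3]))
print(g.shape, l.shape, s.shape)
```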

This novel design allows for efficient inference, since global and local features can be extracted within a single model. We demonstrated for the first time that such a unified model can be trained end-to-end and deliver state-of-the-art results for instance-level recognition tasks. Compared with previous global features, this method improves mean average precision by up to 7.5% over other approaches; for the local feature re-ranking stage, DELG-based results are up to 7% better than previous work. Overall, DELG achieved 61.2% mean average precision on the GLDv2 recognition task, outperforming all but two of the methods from the 2019 challenge. Note that all of the top methods in that challenge used complex model ensembles, while our result uses only a single model.
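For reference, the metric quoted above can be computed as in the short sketch below. This shows the standard retrieval definition of mean average precision, not the exact GLDv2 evaluation protocol (the recognition track uses a variant with per-prediction confidences).

```python
# Standard mean average precision (mAP) for retrieval: average, over queries,
# of the precision at each rank where a relevant item appears.
import numpy as np

def average_precision(ranked_relevance):
    """ranked_relevance: 0/1 flags for one query's ranked results."""
    ranked = np.asarray(ranked_relevance, dtype=float)
    if ranked.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(ranked) / (np.arange(len(ranked)) + 1)
    return float((precision_at_k * ranked).sum() / ranked.sum())

# mAP is the mean of per-query average precision.
queries = [[1, 0, 1, 0], [0, 1, 1, 1]]
print(np.mean([average_precision(q) for q in queries]))
```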

TensorFlow 2 open-source library

To facilitate reproducibility of research, we also released a revamped open-source codebase that includes DELG and other techniques relevant to instance-level recognition, such as DELF and Detect-to-Retrieve. Our code uses the latest TensorFlow 2 releases and provides reference implementations for model training and inference, as well as image retrieval and matching functionality. We invite the community to use and contribute to this codebase, to build a solid foundation for research in the ILR field.
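As a rough illustration of the retrieval-and-matching pipeline such a codebase supports, the sketch below ranks a small index by global-descriptor similarity and then re-ranks the shortlist using local-feature matches. It deliberately avoids the library's own APIs: the descriptors are random stand-ins (in practice they would come from a DELG/DELF model), and mutual nearest neighbours are used as a crude stand-in for the geometric verification performed in real systems.

```python
# Illustrative two-stage retrieval sketch: global ranking, then local re-ranking.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stage 1: global retrieval via cosine similarity of L2-normalized descriptors.
query_global = l2norm(rng.normal(size=(2048,)))
index_globals = l2norm(rng.normal(size=(1000, 2048)))
scores = index_globals @ query_global
candidates = np.argsort(-scores)[:10]              # shortlist for re-ranking

# Stage 2: re-rank the shortlist by counting mutual nearest-neighbour matches
# between query and candidate local descriptors (a proxy for spatial verification).
def match_count(desc_a, desc_b):
    _, ab = cKDTree(desc_b).query(desc_a)           # nearest in b for each a
    _, ba = cKDTree(desc_a).query(desc_b)           # nearest in a for each b
    return int(np.sum(ba[ab] == np.arange(len(desc_a))))

query_locals = rng.normal(size=(300, 128))
candidate_locals = {i: rng.normal(size=(300, 128)) for i in candidates}
reranked = sorted(candidates, key=lambda i: -match_count(query_locals, candidate_locals[i]))
print(reranked[:3])
```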

New challenges for instance-level recognition

Focusing on the landmarks domain, the Google Landmarks Dataset v2 (GLDv2) is the largest instance-level recognition dataset available, with 5 million images spanning 200,000 categories. By training landmark retrieval models on this dataset, we have demonstrated improvements of up to 6% in mean average precision compared with models trained on earlier datasets. We also recently launched a new browser interface for intuitively exploring the GLDv2 dataset.

This year, we also launched two new challenges in the landmarks domain, one focused on recognition and the other on retrieval. These competitions use newly collected test sets and a new evaluation setup: instead of uploading a CSV file with pre-computed predictions, participants must submit models and code that run on Kaggle servers to compute the predictions, which are then scored and ranked. The compute restrictions of this environment put the focus on efficient and practical solutions.

The challenges attracted over 1,200 teams, a 3x increase over last year, and participants achieved significant improvements over our strong DELG baselines. On the recognition task, the highest-scoring submission achieved a 43% relative improvement in mean average precision, and on the retrieval task the winning team achieved a 59% relative improvement. The latter results were obtained through a combination of more effective neural networks, pooling methods, and training protocols (see the Kaggle competition site for more details).

In addition to the landmark recognition and retrieval challenges, our academic and industry partners discussed their progress in developing benchmarks and competitions in other domains. A large-scale research benchmark for artwork recognition is under construction, leveraging The Met's open-access image collection along with a new test set of guest photos exhibiting various photometric and geometric variations. Similarly, a new large-scale product retrieval competition will capture a variety of challenging aspects, including a very large number of products, a long-tailed class distribution, and variations in object appearance and context.
