Recently, ICCV (International Conference on Computer Vision), the top biennial academic conference in the field of computer vision, concluded successfully. At this conference, Meituan had two papers accepted, one of which won the Best Paper nomination award at the HTCV workshop, and took second place in two well-known challenges, covering face technology, human body technology, model optimization, low-power computing and other fields. Together with the Beijing Academy of Artificial Intelligence, the University of Barcelona, the Institute of Computing Technology of the Chinese Academy of Sciences and Peking University, Meituan also jointly held the LargeFineFoodAI workshop (Large-scale Fine-grained Food Analysis) for the first time. The workshop attracted many participants from different time zones to take part and discuss actively, promoting the application of computer vision in food analysis on the international stage and helping everyone eat better and live better.
ICCV is recognized as the highest-level of the three top conferences in computer vision, and its paper acceptance rate is very low. This year, ICCV received 6,236 valid submissions and accepted 1,617 papers, an acceptance rate of only 25.9 percent. Chinese scholars contributed nearly half of the accepted papers, 45.7 percent, almost double the share of the second-ranked United States and nearly 13 times that of the third-ranked United Kingdom.
Meituan held a large-scale workshop on fine-grained food analysis, where experts gathered to discuss how artificial intelligence can promote healthy eating
The workshop consisted of three parts: invited expert talks, challenge reports and paper presentations. At the workshop, experts and scholars presented insightful analyses and new problem definitions in the field of intelligent food analysis, and jointly discussed the development direction and applications of computer vision for food, promoting the integration of computer vision with food science, nutrition, health and other cross-disciplinary fields.
Ramesh Jain, a professor at the University of California, Irvine, and founding director of the Institute for Future Health, spoke about the importance of personalized food models: in the design of such models, personal preference is balanced against what the body needs, so that the most balanced food can be recommended to each user. Professor Kiyoharu Aizawa of the Department of Information and Communication Engineering at the University of Tokyo introduced a new FoodLog tool, FoodLog Athl, which can be used for diet-related health care and dietary assessment services; the tool supports food image recognition, nutritional diet evaluation, measurement of the nutritional value of food and other functions. Professor Petia Radeva of the Faculty of Mathematics and Computer Science at the University of Barcelona discussed the necessity of uncertainty estimation and demonstrated methods for uncertainty modeling in food image recognition. In addition, papers submitted by Carnegie Mellon University and Purdue University were selected for presentation at the workshop.
Meituan held two food challenges to promote academic exchanges
At the same time, Meituan also organized the first “Large-scale Food Image Recognition and Retrieval” challenge, which attracted many strong teams from home and abroad. A total of 143 teams participated, from universities including Tsinghua University, the University of Science and Technology of China, Nanjing University of Science and Technology, the University of Barcelona and Nanyang Technological University in Singapore, as well as companies including Alibaba, Shenlan Technology, OPPO and JOYY.
As a leading life-service platform in China, Meituan was among the first to propose fine-grained analysis of food images with computer vision algorithms, in order to quickly respond to and meet the large, diverse needs of merchants and users for auditing, managing, browsing and evaluating food images online. The datasets of both tracks are derived from Meituan’s own food image dataset “Food2K”, which contains 1,500 categories and approximately 800,000 images. Each image was captured by a different person, with different equipment and in a different environment, making it rare image data that can fairly evaluate the robustness and effectiveness of algorithms. Compared with other mainstream food image recognition datasets, “Food2K” is fully manually annotated, with the noise ratio kept within 1%, its distribution is consistent with real-world scenarios, and it is built on a unified food taxonomy covering 2,000 kinds of food under 12 major categories of Chinese and Western cuisine (pizza, for example, is subdivided into categories such as shrimp pizza and durian pizza).
Compared with general image recognition and retrieval, fine-grained food recognition and retrieval is more difficult, because many different kinds of food look very similar, while the same kind of food can look very different depending on how it is cooked. In addition, lighting, shooting angle and background all affect the accuracy of algorithms, and even professionals find it hard to identify dishes quickly and accurately. In the end, based on the competition results and technical solutions, the teams from JOYY, Nanjing University of Science and Technology and OPPO took the top three places in the recognition track, and the teams from Shenlan Technology, the University of Science and Technology of China and OPPO took the top three places in the retrieval track.
Meituan’s report card at ICCV 2021: two papers accepted by the main conference, workshop papers accepted in the low-power and ReID fields, and second place in challenge competitions
At this ICCV conference, Meituan had two papers accepted. They are:
Title: “Trash to Treasure: Harvesting OOD Data with Cross-modal Matching for Open-set Semi-supervised Learning”
Abstract: For the open-set semi-supervised learning scenario, this paper proposes a general open-set semi-supervised image classification training framework. By designing a cross-modal matching mechanism compatible with the image classification objective, it filters out-of-distribution (OOD) samples from the unlabeled data before the subsequent classification task, while at the same time applying self-supervised learning to make full use of all unlabeled data (including the OOD samples), thereby enhancing the feature extractor’s ability to understand the high-level semantics of images.
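To make the idea above more concrete, here is a minimal, hypothetical PyTorch sketch, not the paper’s actual implementation and with the cross-modal matching simplified to a single auxiliary matching head: the head scores how well an unlabeled image fits the known label space, suspected OOD samples are excluded from the pseudo-label classification loss, and a self-supervised rotation task is applied to every unlabeled image so that OOD data still improves the encoder. All module and function names are illustrative.

```python
# Hypothetical sketch of open-set semi-supervised training with an OOD filter.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OpenSetSSLModel(nn.Module):
    def __init__(self, num_classes=10, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(            # toy CNN encoder for illustration
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)  # closed-set classification head
        self.matcher = nn.Linear(feat_dim, 1)               # in-distribution matching score
        self.rotation_head = nn.Linear(feat_dim, 4)         # self-supervised rotation head

    def forward(self, x):
        f = self.encoder(x)
        return (self.classifier(f),
                torch.sigmoid(self.matcher(f)).squeeze(1),
                self.rotation_head(f))

def unlabeled_losses(model, x_unlabeled, ood_threshold=0.5):
    """Pseudo-label only samples the matcher deems in-distribution,
    but run the rotation task on every unlabeled image (ID + OOD)."""
    logits, id_score, _ = model(x_unlabeled)
    keep = (id_score > ood_threshold).float()    # mask out suspected OOD samples
    pseudo = logits.argmax(dim=1)
    cls_loss = (F.cross_entropy(logits, pseudo, reduction="none") * keep).mean()

    # Self-supervised rotation prediction on all unlabeled data.
    k = torch.randint(0, 4, (x_unlabeled.size(0),))
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(x_unlabeled, k)])
    _, _, rot_logits = model(rotated)
    rot_loss = F.cross_entropy(rot_logits, k)
    return cls_loss, rot_loss

if __name__ == "__main__":
    model = OpenSetSSLModel()
    x = torch.randn(8, 3, 32, 32)                # fake unlabeled batch
    print(unlabeled_losses(model, x))
```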
Title: “Learn to Cluster Faces via Pairwise Classification”
Abstract: This paper proposes a fast face clustering method based on pairwise classification, which addresses the memory-dependence and efficiency problems of inference on large-scale data. Meanwhile, in order to reduce the offset in cluster-center estimation caused by outliers in the clustering task, a rank-weighted density method is proposed to guide the selection of pairs in the prediction stage: a monotonically decreasing function of the rank within the k-nearest neighbors is used to weight the similarity between samples, so that cluster centers can be estimated more accurately and clustering accuracy is further improved. The method achieves SOTA performance on public datasets such as MS1M and IJB-B.
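As a rough illustration of the rank-weighted density idea, the NumPy sketch below weights each sample’s similarity to its k nearest neighbors by a decreasing function of the neighbor’s rank and sums the result into a density score; the specific weighting function (1/(rank+1)) and the value of k are illustrative assumptions rather than the paper’s exact choices.

```python
# Illustrative rank-weighted density estimate for picking cluster-center candidates.
import numpy as np

def rank_weighted_density(features, k=10):
    """For each L2-normalized feature, weight the cosine similarity to each of
    its k nearest neighbors by a monotonically decreasing function of the
    neighbor's rank, then sum the weighted similarities as a density score.
    High-density samples are better candidates for cluster centers."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats.T                       # pairwise cosine similarity
    np.fill_diagonal(sims, -np.inf)              # exclude self-similarity
    # Indices of the k most similar neighbors, ordered from most to least similar.
    nn_idx = np.argsort(-sims, axis=1)[:, :k]
    nn_sims = np.take_along_axis(sims, nn_idx, axis=1)
    weights = 1.0 / (np.arange(k) + 1.0)         # rank 0 -> 1, rank 1 -> 1/2, ...
    return (nn_sims * weights).sum(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 128)).astype(np.float32)   # fake face embeddings
    density = rank_weighted_density(x, k=10)
    print("highest-density sample index:", int(density.argmax()))
```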
Runner-up in the 5th LPCV (Low-Power Computer Vision) international competition and the VIPriors Person Re-Identification Challenge
In addition, Meituan also won the Best Paper nomination award at the HTCV workshop for a paper in the ReID field:
Title: “Transformer Meets Part Model: Adaptive Part Division for Person Re-Identification”
Abstract: Methods based on local part partitioning have become the mainstream in person re-identification. There are two main implementations: one divides the pedestrian image into several fixed regions, but misaligned pedestrian images lead to performance degradation; the other introduces an additional pose estimation or human parsing model, which requires more computation and annotated data. Inspired by the recent Vision Transformer, this paper proposes an adaptive part-division method that automatically extracts different important local features without extra annotation and with minimal extra computation. The method reaches an internationally leading level on the four most mainstream datasets: Market-1501, CUHK03, DukeMTMC-ReID and MSMT17.
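As a rough, hypothetical illustration of adaptive part division on top of a Vision Transformer (the paper’s actual design may differ), the sketch below softly assigns ViT patch tokens to a few learnable part queries and pools each group into a part-level feature; the module name, the number of parts and the pooling scheme are all assumptions made for illustration.

```python
# Hypothetical adaptive part pooling over Vision Transformer patch tokens.
import torch
import torch.nn as nn

class AdaptivePartPooling(nn.Module):
    def __init__(self, dim=768, num_parts=4):
        super().__init__()
        # One learnable query per part; patch tokens are softly assigned to parts.
        self.part_queries = nn.Parameter(torch.randn(num_parts, dim) * 0.02)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, dim), e.g. ViT output without the CLS token.
        attn = torch.einsum("pd,bnd->bpn", self.part_queries, patch_tokens)
        attn = attn.softmax(dim=-1)              # soft assignment of patches to parts
        part_feats = torch.einsum("bpn,bnd->bpd", attn, patch_tokens)
        return part_feats                        # (batch, num_parts, dim) part-level features

if __name__ == "__main__":
    tokens = torch.randn(2, 128, 768)            # fake ViT patch tokens
    pooled = AdaptivePartPooling()(tokens)
    print(pooled.shape)                          # torch.Size([2, 4, 768])
```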
Following the AI trend and contributing to the construction of Digital China
With the core goal of “helping people eat better and live a better life”, Meituan AI is committed to exploring cutting-edge artificial intelligence technology driven by the needs of real business scenarios and quickly applying it to real-life service scenarios. Meituan’s Visual Intelligence Department is committed to building world-class core computer vision capabilities and platform services. Its current technology layout covers image processing, optical character recognition, video analysis, human body/face recognition, visual perception for autonomous vehicles and other fields. While accumulating internationally and domestically leading technical progress, it balances method innovation with the transformation of research results, deeply empowering business scenarios such as smart retail, unmanned delivery, intelligent transportation, and logistics and warehousing.
The 14th Five-Year Plan calls for unswervingly building a Digital China and accelerating digital development. In today’s digital wave, artificial intelligence has become a new point of competition and a key link in China’s economic development. Meituan is making full use of its artificial intelligence capabilities, continuously exploring more application scenarios and spaces, enabling more users to enjoy the dividends brought by science and technology, and contributing to the construction of Digital China.