Editor’s note:

Yuan You, a senior algorithm expert at Xianyu, recently took part in the live event [CoderPark] hosted by MobTech Group. Joining MobTech experts and well-known algorithm KOLs from the industry online, he shared how image technology is applied to hundreds of millions of user-shot pictures.

First, the background

Xianyu is an open marketplace for goods and content, and users upload millions of pictures every day. Some of them belong to low-quality listings: duplicate pictures, pictures that describe the product unclearly, and pictures that do not match the listing text. Others are joke, meme and similar vulgar content, and some fall into illegal or gray areas such as pornography and black-market activity. If these low-quality products and contents flow into the daily product display, they hurt users' transaction efficiency, lower the reputation and value of the product in the market, and even increase the risk of regulatory action and suspension. Typical problems include:

  • Repeated picture content: Some sellers on Xianyu create several listings of the same product with different descriptions and pictures to increase exposure. The text may be completely different, but the product pictures look basically the same, as shown in Figure 1:

Figure 1. Picture of the same goods

  • Inconsistent picture and text: the content of some pictures does not match the description of the product being sold. When these products are ranked together with consistent ones, the overall search experience and effectiveness suffer, as shown in Figure 2;

Figure 2. Differences between product picture and text description

  • Picture content quality: not every picture uploaded by users is suitable as a product display picture, for example product packaging, partial views of a product, non-product pictures, invoices, and product specification sheets, as shown in Figure 3.

Figure 3. Pictures unsuitable for product display

  • Violations: To attract buyers' attention, some sellers use non-compliant pictures, such as attractive women, sexy pictures and funny pictures, as the main picture of the product. This seriously harms Xianyu's brand value and its fair, healthy trading environment, as shown in Figure 4.

Figure 4. Products that use a portrait of an attractive woman as the main picture

Internet companies already apply the relevant technologies at scale to solve practical problems. Leading companies such as Alibaba, Baidu and Tencent have their own visual algorithm teams, which publish at the top conferences every year. These teams not only explore cutting-edge techniques but also turn them into products: image features power Pailitao and Baidu's image recognition, image detection is used directly in autonomous driving and industrial quality inspection, and image recognition is widely used in content review, short video, advertising and other businesses. This article describes how visual technology is used to solve some of the problems with Xianyu products. For example, pictures whose content is not a product, or is pornographic or otherwise non-compliant, can be handled with image classification, image features and similar methods. The following parts are covered:

  1. Building a large-scale image classification model to learn the image distribution of Xianyu products;

  2. Learning image comparison features based on the classification model;

  3. Combining image classification and image features to solve practical problems.

Second, build a large-scale image classification model

The image classification model is the foundation of visual models; detection, segmentation and other vision tasks all depend on a basic image classification model. Building one for the Xianyu scenario has several difficulties:

  1. Most of the pictures are uploaded by users, and the image quality is low, which increases the difficulty of recognition.

  2. The picture content is not limited to the commodity category itself, but covers many other categories unrelated to commodities, making it difficult to define the overall category.

  3. The titles of Xianyu products are written by users themselves, so structured information is uneven and there is a lot of colloquial noise.

  4. Products of the same type contain many noisy pictures, so they cannot be used for training directly.

  5. Data annotation is expensive, and most of the data cannot be covered effectively in a short time.

We do not need to identify the specific product name of every category; it is enough that the categories can be told apart. For the categories that need focused recognition, we mine samples using features from the trained model. The overall process, shown in Figure 5, consists of basic image feature learning, cluster sample construction, and classification model training:

Figure 5. Semi-automatic image classification recognition

Basic image feature learning

The basic image model mainly learns the overall distribution of the data. While keeping sample coverage as high as possible, we mine easy samples so that the model can be cold-started. First, based on the results shown for online query requests, the products clicked under high-frequency queries are collected as the candidate set. Because there are accidental clicks and traffic-bait samples that attract abnormally many clicks, products with very low or very high click-through rates are filtered out. Queries with similar semantics are also de-duplicated. The result of these steps is the base data set for image classification. A ResNet101 model was trained on it, and softmax loss was compared with ArcFace [5] loss. Softmax performed better than ArcFace, probably because the samples were not pure and ArcFace had difficulty converging to a good solution.
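The filtering step above can be sketched as follows. This is a minimal illustration, not the production pipeline: the `ClickedItem` record, the click-rate band (`low`, `high`), the `min_impressions` cutoff and the `canonical` query map are all assumptions introduced here, since the article does not give the actual thresholds or the query de-duplication method.

```python
from dataclasses import dataclass


@dataclass
class ClickedItem:
    # Hypothetical record of one product shown under one query.
    item_id: str
    query: str
    impressions: int
    clicks: int


def filter_by_ctr(items, low=0.02, high=0.60, min_impressions=100):
    """Keep items whose click-through rate lies inside [low, high]."""
    kept = []
    for it in items:
        if it.impressions < min_impressions:
            continue  # too little traffic to trust the click rate
        ctr = it.clicks / it.impressions
        if low <= ctr <= high:  # drops likely misclicks and traffic bait
            kept.append(it)
    return kept


def dedupe_queries(items, canonical):
    """Group item ids under a canonical query so near-synonyms share a label."""
    by_label = {}
    for it in items:
        label = canonical.get(it.query, it.query)
        by_label.setdefault(label, []).append(it.item_id)
    return by_label
```

Each canonical query then acts as one coarse class label for cold-start training.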

1. Cluster sample construction

With the basic model in place, the next step is to collect samples from Xianyu listings. First, we collect product samples under each Xianyu category, which keeps the semantic concept within a certain range. Then, according to how often the head word of the title occurs, the samples under a category are divided into subcategories, so that the samples in each subcategory have a clear semantic meaning. The pictures, however, still vary a lot. As shown in Figure 8.1, the samples under the subcategory “James” contain several visual categories and cannot be used for training directly. We therefore use the basic model above to extract image features for the new samples and purify each subcategory: within each subcategory we cluster the images, merging the samples whose features are closest in cosine distance into new classes, and drop the classes with too few samples. This yields the training samples for image classification in the Xianyu scenario. With different distance thresholds we obtained three class sets of 4.6K, 7.4K and 12K classes. In manual evaluation the 7.4K set worked best, while the other two were either too coarse or too fine.
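A small sketch of the purification step, under stated assumptions: the article only says that samples are clustered by cosine distance with a threshold and that small classes are dropped, so the greedy seed-based clustering, the `threshold` value and `min_size` below are illustrative choices, not the method actually used.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_by_cosine(features, threshold=0.8, min_size=2):
    """Greedy clustering of one subcategory.

    Each sample joins the most similar existing cluster seed if the
    similarity reaches `threshold`, otherwise it starts a new cluster.
    Clusters with fewer than `min_size` samples are dropped.
    """
    seeds = []    # one representative feature per cluster
    members = []  # sample indices per cluster
    for idx, feat in enumerate(features):
        best, best_sim = -1, threshold
        for c, seed in enumerate(seeds):
            sim = cosine(feat, seed)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best < 0:
            seeds.append(feat)
            members.append([idx])
        else:
            members[best].append(idx)
    return [m for m in members if len(m) >= min_size]
```

Raising `threshold` produces more and finer classes, lowering it produces fewer and coarser ones, which is the trade-off behind the 4.6K, 7.4K and 12K class sets.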

2. Training of classification model

Starting from the parameters of the previous classification model, we train on the new class labels with a batch size of 256 and a center crop of 224, adding random crop, mirror and cutout preprocessing. We use a cosine learning rate schedule with warm restarts after a certain number of epochs, so that the model can keep improving its accuracy in the later stage of training. The final top-1 accuracy on the validation set is 74%, which basically meets the application requirements.
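For reference, a cosine schedule with warm restarts can be written in a few lines. The base learning rate, minimum learning rate and cycle length below are placeholder values; the article does not state the ones used in training.

```python
import math


def cosine_lr_with_restarts(epoch, base_lr=0.1, min_lr=0.0, cycle_len=10):
    """Learning rate at `epoch` for cosine decay restarted every `cycle_len` epochs."""
    pos = epoch % cycle_len  # position inside the current cycle
    cos_term = 0.5 * (1.0 + math.cos(math.pi * pos / cycle_len))
    return min_lr + (base_lr - min_lr) * cos_term
```

The rate decays smoothly from `base_lr` towards `min_lr` within a cycle and jumps back to `base_lr` at the start of the next one.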

To recognize specific categories such as bills, text images, buildings and people, we mine custom samples for these categories. A retrieval system over the product database is built from features extracted with the trained model, and the samples that need special handling are used as queries for nearest-neighbor search. Results that pass a similarity threshold are then used as queries for another round of retrieval, which further expands the sample set, as shown in Figure 6. Finally, these categories are merged with the original categories for training.

Figure 6. Sample mining for specific categories
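The mining loop in Figure 6 can be sketched roughly as below. A brute-force scan over a dictionary stands in for the retrieval system, and the `threshold` and `rounds` values are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_samples(seed_ids, features, threshold=0.9, rounds=2):
    """Expand a seed set by repeated nearest-neighbor search.

    `features` maps item id -> feature vector. In each round, every item
    whose similarity to any query reaches `threshold` is added, and the
    newly added items become the queries of the next round.
    """
    selected = set(seed_ids)
    frontier = set(seed_ids)
    for _ in range(rounds):
        found = set()
        for item_id, feat in features.items():
            if item_id in selected:
                continue
            if any(cosine(feat, features[q]) >= threshold for q in frontier):
                found.add(item_id)
        if not found:
            break
        selected |= found
        frontier = found  # only new hits are searched again
    return selected
```

A strict threshold keeps the expansion from drifting away from the target category as the rounds accumulate.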

Third, learn image comparison features based on the classification model

The comparison feature is mainly used to judge whether two products are the same item, so that products with repeated pictures can be filtered out or scattered. Since every product in the system already has its own ID, we adopt the DeepID [1][2][3] approach and train same-item features with the product ID as the class. Training directly on the pictures of each product has two problems, however: 1. each ID class contains several pictures that vary a lot, so not all of them can be used directly; 2. each ID class has very few samples, so direct training is hard to converge.

For problem 1, we start from the prior assumption that most of the pictures a user uploads are related to the main intent of the product. Borrowing the clustering idea from the previous section, we cluster the pictures of the same product and take the cluster with the most samples as the candidate set. If no cluster clearly stands out, that is, the cluster sizes differ little, the product is considered unsuitable for training and is discarded.
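As a minimal sketch of this selection rule: given one cluster label per picture of a product, keep the largest cluster only if it clearly dominates. The `min_ratio` criterion is an assumption; the article does not say how "not large" is measured.

```python
from collections import Counter


def pick_dominant_cluster(labels, min_ratio=1.5):
    """Return the label of the largest cluster, or None if none dominates.

    `labels` holds one cluster label per picture of a single product.
    The largest cluster must be at least `min_ratio` times the size of
    the runner-up, otherwise the product is discarded (None).
    """
    counts = Counter(labels).most_common(2)
    if not counts:
        return None
    if len(counts) == 1:
        return counts[0][0]
    (top, top_n), (_, second_n) = counts
    return top if top_n >= min_ratio * second_n else None
```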

For problem 2, to mine as many samples of the same item as possible for each class, we take products with transactions under the same online query, together with products with high click-through rates, as the candidate set, and cluster within each product to keep the samples pure. In experiments we found that samples under brand queries or category queries, such as “Anta”, “Huawei mobile phone” and “electric car”, are very diverse. Even when the clicks are dense, the products may belong to different SKUs, so the queries need to be restricted. We try to use only queries that are unambiguous at SKU granularity, such as “Huawei P40 Pro”, “Infinometer Projector H3” and “Calf electric car G2”.

With these samples we can train the DeepID model. We use the ArcFace loss [5] that is common in face recognition. The backbone is the previous classification model, initialized with its parameters. After choosing a suitable margin and scale, the model is trained to saturation; the margin is then enlarged, a suitable scale is chosen again, and training continues. After three such iterations the model shows no overfitting. The final same-item recognition accuracy is 95%, and product recall within a SKU is 79%.
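To make the role of the margin and scale concrete, here is a simplified sketch of the ArcFace logit for one sample. It takes the cosine between the normalized embedding and each class center as input. The default `margin` and `scale` are the values commonly quoted for the ArcFace paper, not the ones used in this work, and the handling of angles beyond π that real implementations include is omitted.

```python
import math


def arcface_logits(cosines, target, margin=0.5, scale=64.0):
    """Add an angular margin to the target class, then scale all logits."""
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            c = math.cos(math.acos(c) + margin)  # cos(theta + m)
        logits.append(scale * c)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]
```

A larger margin lowers the target logit and so raises the loss for the same embedding, which is why the margin is increased step by step only after the model has converged at the previous setting.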

The same-item feature can also be used for identical-image recognition: the feature is used for recall, and SIFT features are then used for the final geometric verification. This makes it possible to recognize the same image under rotation, partial cropping and occlusion.

Fourth, combine image classification and image features to solve relevance and diversity problems

1. Xianyu search relevance problem

User-submitted product pictures are very diverse. To improve the search experience, the retrieval results are re-ranked with a relevance clustering method based on the main picture. Take the query “Sharp Shark” as an example. Figure 7.1 shows the control results: the queue consists mostly of portable-tool products, but it also contains bad cases such as packaging pictures (positions 1 and 6) and an ambiguous picture (position 3). We use the product image classification model to predict and cluster the multiple pictures of each product, which gives the top-3 main categories of each item. The most frequent top category across the whole queue is then taken as the confident category of the queue, and the queue is re-ranked by that confidence. As Figure 7.2 shows, the top products after re-ranking are all related to portable tools.

Another case is the query “James”, shown in Figure 8.1. Using image features, the main category, sneakers, is moved to the front, as shown in Figure 8.2, and other less relevant categories are demoted.

Bad cases also occur online. Some queries, such as “Huawei” and “Apple”, cover multiple categories; whether such queries should be re-ranked has to be tuned according to user feedback.

Figure 7.1. Top-6 results for query “Sharp Shark”, control group

Figure 7.2. Top-6 results for query “Sharp Shark”, image-feature re-ranking group

Figure 8.1. Top-6 results for query “James”, control group

Figure 8.2. Top-6 results for query “James”, image-feature re-ranking group
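The re-ranking idea can be sketched as follows. The input format and the sort key are assumptions: the article says the queue is re-ranked "by confidence" without giving the scoring, so this example simply ranks an item by where the queue's dominant category appears among its own top-3 categories.

```python
from collections import Counter


def rerank_by_queue_category(queue):
    """Re-rank a result queue by agreement with its dominant category.

    `queue` is a list of (item_id, top3_categories) in the original
    ranking order. The most frequent top-1 category is taken as the
    queue's confident category. Items are then ordered by the position
    of that category in their own top-3 (absent goes last); ties keep
    their original order because the sort is stable.
    """
    votes = Counter(cats[0] for _, cats in queue if cats)
    if not votes:
        return [item_id for item_id, _ in queue]
    queue_cat = votes.most_common(1)[0][0]

    def key(entry):
        _, cats = entry
        return cats.index(queue_cat) if queue_cat in cats else 3

    return [item_id for item_id, _ in sorted(queue, key=key)]
```

For a query that legitimately spans several categories, such as a brand name, a single dominant category is a poor summary, which matches the bad cases mentioned above.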

2. Xianyu feed diversity problem

Diversity is an important measure of recommendation quality; results that are too concentrated hurt the user experience. Because Xianyu products are defined by users themselves, diversity is hard to solve along a single dimension and has to be handled jointly across category, text description and product picture. The picture side of the diversity problem is similar to the search consistency problem above. As shown in Figure 9.1 below, listings of the same product, “Huawei Mate Xs”, may be filed under different user-selected categories, so the user-selected category cannot be used directly for diversification. From the picture dimension, however, these products share the same element, namely the picture of the product packaging. The image classification model can therefore be used to predict the category of each product picture and scatter the results.

Figure 9.1. Products that look alike but were listed under different categories

First, pictures in the person and text categories are filtered out and left unprocessed, because such pictures look alike while the products behind them differ a lot semantically. For the remaining products, we predict the category of the main picture and aggregate products by the top-1 category with the highest confidence, with the aim of gathering products of the same kind into one candidate set. In practice, the top-1 categories of some identical products differ and the shared category may only appear in the top 3, but scattering directly on the top 3 affects too many products and creates many bad cases. We therefore iterate twice: after aggregating by top-1, the top-3 categories are voted on, and in the second pass the categories that do not co-occur but receive more than half of the votes are selected, as shown in Figure 9.2. This is an effective complement for cases that text and category signals fail to de-duplicate.

Figure 9.2 Image diversity de-duplication logic

After launch, all metrics improved.

3. De-duplication of pictures of the same product

To increase exposure, some Xianyu sellers create multiple listings whose descriptions differ but whose pictures are visually almost the same, and some sellers reuse the very same original product picture. If these products appear on the same search results page, the user experience suffers and transaction efficiency drops. Here the image comparison features described earlier are used to build an image search engine, and the common product quantization method is used to build an index over a database of 120 million products. The overall process is shown in Figure 10.1.

New products each day are a small share of the full inventory, so real-time exact de-duplication is not required and we use an offline scheme. First, each day's new products are added to the database incrementally. While the index is built, the current state of each product, such as whether it has been taken down or is still valid, is synchronized so that invalid products are filtered out. The offline de-duplication search engine is then built with product quantization. Every day the new products are run through this engine to find identical products in the database, and the results are written to the online KV store. Because new links are created between products already in the database and the new products, the same-product lists of the existing products in the KV store must also be updated according to the new matches. Online, after a request completes recall, the same-product list of each recalled product ID is looked up in the KV store in real time. Finally, during pagination the scatter logic breaks up products identical to those that already appeared on previous pages, which completes the de-duplication.

Figure 10.1. De-duplication scheme for online products
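The KV update and the online scatter step might look like the sketch below. An in-memory dictionary of sets stands in for the KV store, and both function names are hypothetical. Merging the duplicate lists transitively, and also skipping duplicates inside the current page, are assumptions that go slightly beyond what the article states.

```python
def link_duplicates(kv, new_id, matched_ids):
    """Record that `new_id` duplicates every id in `matched_ids`.

    `kv` maps item id -> set of ids known to be the same product. Lists
    of existing items are updated too, so the links stay symmetric.
    """
    group = {new_id, *matched_ids}
    for mid in matched_ids:
        group |= kv.get(mid, set())  # pull in duplicates already known
    for gid in group:
        kv[gid] = group - {gid}


def scatter_page(recalled_ids, kv, already_shown, page_size):
    """Fill one page, skipping items whose duplicate was already shown."""
    page, seen = [], set(already_shown)
    for item_id in recalled_ids:
        if len(page) == page_size:
            break
        if kv.get(item_id, set()) & seen:
            continue  # an identical product appeared earlier
        page.append(item_id)
        seen.add(item_id)
    return page
```

Because the expensive matching runs offline, the online path is reduced to one KV lookup per recalled product.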

4. Filtering of non-compliant goods

The non-compliant products on Xianyu mainly use main pictures of attractive women, sexy pictures, funny pictures, body parts and so on, while the products actually sold have nothing to do with these pictures; sellers use them mainly to attract attention and cheat for traffic. Commonly used content-moderation models and OCR can filter out most pornographic, political, violent and terrorist content. What remains are non-compliant products such as those in Figure 11.2 and joke or meme content such as that in Figure 11.3, which disturb the normal marketplace and cannot be filtered out directly by moderation models.

Figure 11.1. Identification process for non-compliant products

To address these problems, we designed a scheme for identifying non-compliant products, shown in Figure 11.1. Because online business strategies are involved, it is not described in detail here.

  • First, identify main pictures that are unrelated to the product. A general classifier tags all pictures of a product, and products whose pictures are semantically inconsistent are filtered out. This produces some bad cases, as shown in Figure 11.4, where a user's buyer-show photo belongs to a normal product. So after filtering we still need to check whether the product itself appears in the picture; if it does, the product is passed;
  • Joke and meme pictures are generally widely circulated pictures. They may be edited, but the subject stays the same. Such pictures can be handled by building a library of non-compliant content and using the identical-image recognition branch, as shown in Figure 11.1.

Figure 11.2. Examples of non-compliant products

Figure 11.3. Examples of non-product joke and meme pictures

Figure 11.4. Example of a normal product

Sixth, summary

This article has mainly described how visual techniques, namely classification and feature learning, are applied to the products that users actually publish. The diversity of Xianyu's user-defined products and content brings many challenges in review, governance and structuring, and it is very hard to solve all problems with a single modality or a single kind of content. In practice, a variety of schemes and techniques are combined. For category recognition, both text and image content are needed to improve accuracy, and users also need to be guided to help complete the product structure. For sample purification, we can rely not only on annotation but also on full and reasonable use of user feedback. In addition, the retrieval system, the data processing flow and synchronization management are key to realizing the value of the final model; a few models alone cannot directly solve online problems. As for governing violations, the saying goes, “when virtue rises one foot, vice rises ten”: some non-compliant users always exploit loopholes in the platform for their own profit. The battle of wits with them is a long-term process that requires constant iteration and improvement of the technology.

Thanks to the teams we worked with, the Xianyu structuring team, the Xianyu architecture team, DAMO Academy, the review team and other partner teams, for their manpower and technical support.

References:

[1] Yi Sun, Xiaogang Wang, Xiaoou Tang. Deep Learning Face Representation from Predicting 10,000 Classes. CVPR 2014

[2] Yi Sun, Yuheng Chen, Xiaogang Wang, Xiaoou Tang. DeepID2: Deep Learning Face Representation by Joint Identification-Verification. NIPS 2014

[3] Yi Sun, Xiaogang Wang, Xiaoou Tang. Deeply Learned Face Representations Are Sparse, Selective, and Robust. CVPR 2015

[4] Yi Sun, Ding Liang, Xiaogang Wang, Xiaoou Tang. DeepID3: Face Recognition with Very Deep Neural Networks. arXiv 2015

[5] Jiankang Deng, Jia Guo, Niannan Xue, Stefanos Zafeiriou. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. CVPR 2019
