(Welcome to the "I Love Computer Vision" public account, a channel of substance and depth.)

Face recognition is an area of computer vision that has advanced rapidly in recent years. Thanks to the strong fitting capacity of deep learning models and the construction of large annotated datasets, million-scale labeled face recognition datasets now exist.

However, it is becoming ever harder to keep growing these datasets. Even with manual annotation, noise is inevitably introduced as a dataset scales up. How to exploit cheap, unannotated face image data has therefore become a pressing problem.

The paper "Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition", by researchers from SenseTime, The Chinese University of Hong Kong, and Nanyang Technological University, was accepted to ECCV 2018. It presents a method that generates sample-label pairs from unlabeled face images and uses them in supervised model training, offering a new way to enlarge training sets and improve face recognition accuracy at low cost.

It is worth noting that the problem this paper addresses is closely tied to real face recognition application scenarios. The setup assumes that a small amount of labeled data has already been built, and that unlabeled face images collected from uncontrolled environments, with no identity overlap with the labeled database, are available; the goal is to assign labels to these unlabeled images and add them to the training set.

Author Information:

Algorithm Idea

The core idea of the algorithm is to find, within the unlabeled data, pseudo-positive pairs of face images that come from the same person, and add them to the training set to enlarge it.

A very simple idea is to extract features, cluster them, and take the cluster assignments as pseudo-labels. However, ordinary clustering struggles to provide high-quality, reliable labels. An intuitive example: profile (side-view) images of two different people can be more similar to each other than the frontal and profile views of the same person.
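The naive baseline just described can be sketched as single-link clustering over pairwise feature similarity, with connected-component ids as pseudo-labels. This is a minimal illustration, not the paper's method; the cosine metric, the threshold, and the toy 2-D "embeddings" are all assumptions for demonstration.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def naive_pseudo_labels(embeddings, threshold):
    """Greedy single-link clustering: link any pair whose similarity
    exceeds `threshold`, then use the component id as a pseudo-label."""
    n = len(embeddings)
    labels = [-1] * n
    cur = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = cur
        stack = [i]          # flood-fill over the similarity graph
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and cosine(embeddings[u], embeddings[v]) > threshold:
                    labels[v] = cur
                    stack.append(v)
        cur += 1
    return labels

# Toy features: two nearby vectors and one distant one.
emb = [[1.0, 0.0], [0.95, 0.1], [0.0, 1.0]]
pseudo = naive_pseudo_labels(emb, threshold=0.9)
```

As the article notes, such threshold-based clustering is brittle: a single cross-identity pair above the threshold merges two people into one pseudo-class.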

How, then, can reliable pseudo-positive pairs from the same person be constructed? See the figure below:

The authors propose a model called Consensus-Driven Propagation (CDP), with three important roles: the base model, the committee models, and the mediator model.

The base model and the committee models are deep classifiers trained on the labeled data. The paper trains multiple models with different network architectures, uses each to extract features from the unlabeled face images, and then builds a k-NN graph of the unlabeled samples from each model's features. These k-NN graphs give each model's preliminary view of which face images belong to the same person.
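The per-model k-NN graph construction can be sketched as follows. This is a simplified illustration, assuming cosine similarity and brute-force neighbor search over toy 2-D vectors; in practice each model's deep embeddings and an approximate nearest-neighbor index would be used.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knn_graph(embeddings, k):
    """Map each sample index to the indices of its k most similar samples."""
    graph = {}
    for i, ei in enumerate(embeddings):
        sims = sorted(((cosine(ei, ej), j)
                       for j, ej in enumerate(embeddings) if j != i),
                      reverse=True)
        graph[i] = [j for _, j in sims[:k]]
    return graph

# Toy "embeddings": two loose groups in 2-D.
emb = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
graph = knn_graph(emb, k=1)
```

Running this once per committee member (each with its own embeddings) yields one graph per model; disagreements between those graphs are exactly the signal the mediator exploits.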

The authors tried many deep architectures:

The mediator model then decides, for each pair of face images connected in the k-NN graphs, whether the two samples come from the same person, based on the connection relations and various diversity features derived from the k-NN graphs. A multilayer perceptron (MLP) serves as the mediator model.

Naturally, the mediator model is trained with positive and negative pairs built from the labeled data; this is exactly where the name Consensus-Driven Propagation comes from. The relations among unlabeled face images of the same person resemble the relations among labeled images of the same person, as reflected in their k-NN graph nodes.
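Assembling the mediator's input for one candidate pair can be sketched as below. The two feature groups shown ("relationship": whether each committee member links the pair in its own k-NN graph; "affinity": each member's similarity for the pair) follow the paper's idea, but this exact feature layout is a simplified assumption, and the paper additionally uses local-structure statistics of the neighborhoods.

```python
def mediator_features(pair, committee_graphs, committee_sims):
    """Build the mediator MLP's input vector for a candidate pair (i, j).

    committee_graphs: one k-NN graph per committee member, {i: [neighbors]}.
    committee_sims:   one dict per member, {(i, j): similarity} with i < j.
    """
    i, j = pair
    # "Relationship": 1.0 if this member links the pair in its k-NN graph.
    relation = [1.0 if (j in g.get(i, []) or i in g.get(j, [])) else 0.0
                for g in committee_graphs]
    # "Affinity": this member's similarity score for the pair.
    affinity = [sims[(min(i, j), max(i, j))] for sims in committee_sims]
    return relation + affinity

# Toy committee of two members voting on pair (0, 1):
graphs = [{0: [1], 1: [0]},   # member 1 links the pair
          {0: [2], 1: [3]}]   # member 2 does not
sims = [{(0, 1): 0.8}, {(0, 1): 0.4}]
feats = mediator_features((0, 1), graphs, sims)
```

On the labeled set, each such vector gets a ground-truth same-person/different-person label, and the MLP is trained on those pairs; at inference it scores pairs from the unlabeled data.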

Extracted sample diagram:

Examples of pseudo-positive sample images constructed:

The red boxes mark abnormal samples rejected by the mediator model.

After the pseudo-labels are constructed, they are added to the training set. When the base model is then retrained, however, a loss different from the one used for the labeled data is applied to the pseudo-labeled samples.
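One common way to treat pseudo-labeled samples differently, sketched here as an assumption since the article only says a different loss is used, is per-sample weighting of the cross-entropy, down-weighting pseudo-labels relative to true labels:

```python
import math

def softmax(logits):
    # Numerically stable softmax over one logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_ce(logits_batch, labels, weights):
    """Mean cross-entropy with a per-sample weight
    (e.g. 1.0 for labeled samples, lower for pseudo-labeled ones)."""
    total = 0.0
    for logits, y, w in zip(logits_batch, labels, weights):
        p = softmax(logits)
        total += -w * math.log(p[y])
    return total / len(labels)

# One labeled sample (weight 1.0) and one pseudo-labeled sample (weight 0.5,
# an illustrative value, not from the paper).
loss = weighted_ce([[2.0, 0.0], [0.5, 1.5]], [0, 1], [1.0, 0.5])
```

The weight for a pseudo-labeled sample could, for instance, reflect the mediator's confidence in the pair, so dubious pseudo-labels contribute less gradient.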

Experimental Results

Experiments were carried out on the MegaFace and IJB-A face datasets. The labeled training data was divided into 11 parts; only 1/11 was used as labeled data, the amount of unlabeled data was gradually increased, and the final accuracy was compared with the result of training on all labels.

The figure below shows the network architectures used in the experiments, the accuracy each achieves on the two datasets, and the ensemble accuracy.

The figure below shows that with the addition of unlabeled data, the accuracy of the model continues to improve.

On the MegaFace dataset, accuracy was 61.78% with no unlabeled data (i.e., only 1/11 of the training data), 78.18% with the remaining 10/11 added as unlabeled data, and 78.52% for the fully supervised method (using all true annotations). Adding the pseudo-labels to the training set thus improves accuracy greatly (by 16.4 percentage points), reaching performance comparable to full supervision.

Interestingly, the proposed method even beats the fully supervised method on IJB-A, which in theory should not happen; the authors explain that the IJB-A database itself contains a great deal of label noise.

Conclusion

The method proposed in this paper is very valuable for expanding datasets at low cost. It applies not only to face recognition but could be tried in almost any recognition task. Given the IJB-A results, it could even serve as a data-cleaning method.

Paper:

https://arxiv.org/abs/1809.01407

Code:

https://github.com/XiaohangZhan/cdp/