In a CVPR 2018 paper [1], the authors used a GAN to augment the training data with camera-style-transferred images, addressing the change in photo style across different cameras. They also applied label smoothing regularization to counter noise in the generated images and the resulting over-fitting. Experiments show that the proposed method effectively improves ReID performance; it is also the first work to apply GAN-based camera style transfer to ReID.

Overview of the paper:

Pain points

ReID is a cross-camera retrieval task, so changes in camera style are inevitable. To address this pain point, the paper uses a GAN to transfer images between camera styles and adds the transferred images to the training data. Recovering from the style change forces the network to focus on the person itself rather than the camera; at the same time, the extra data acts as a regularizer against CNN over-fitting, ultimately improving performance on the ReID task.

The authors attribute the noise mainly to two causes: 1) CycleGAN cannot perfectly model the style transfer process, so errors arise during image generation; 2) due to occlusion and detection errors, the real data already contains noisy samples, and converting these into pseudo-data may produce even more noise. Therefore, the authors apply label smoothing regularization (LSR) to mitigate this problem and improve the task.

At the same time, GAN training is unsupervised, which makes it a good data-augmentation scheme for the ReID field, where annotation is expensive: no manual or algorithmic labeling is required.

The style transfer results on the Market1501 dataset are shown below:

Model

A schematic diagram of the CycleGAN structure is as follows:

Suppose we want to convert a zebra image from domain A into the horse style of domain B. The zebra image passes through generator G to produce a fake horse image; this fake image, together with a real horse image from domain B, is fed to discriminator D_B, whose adversarial loss judges whether each image is real or fake. Up to this point the network is still an ordinary GAN.

The fake domain-B horse image is then passed through a second generator F, which restores it to the zebra style of domain A. A cycle-consistency (reconstruction) loss between the restored image and the real domain-A zebra image (CycleGAN uses the L1 norm here) ensures that the restored image stays similar to the original. CycleGAN therefore has two generators and two discriminators: the generators produce the style-transferred image and the restored image, while discriminators D_B and D_A judge real versus generated images in domains B and A, respectively.

The total CycleGAN loss given in the paper is:

$$\mathcal{L}(G, F, D_A, D_B) = \mathcal{L}_{GAN}(G, D_B, A, B) + \mathcal{L}_{GAN}(F, D_A, B, A) + \lambda\, \mathcal{L}_{cyc}(G, F)$$

Here G is the A→B mapping and F is the B→A mapping; $D_A$ and $D_B$ are the real-versus-fake discriminators for domains A and B; $\mathcal{L}_{GAN}(G, D_B, A, B)$ and $\mathcal{L}_{GAN}(F, D_A, B, A)$ are the adversarial losses between the B-domain discriminator and the A→B generator, and between the A-domain discriminator and the B→A generator, respectively. $\mathcal{L}_{cyc}$ is the cycle-consistency loss, which requires that a real image mapped from A to B and back (and likewise from B to A and back) remains similar to the original, i.e. the original image can be restored. The parameter λ balances the weights of the different losses.
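To make the loss composition concrete, here is a minimal sketch using toy linear stand-ins for the two generators and two discriminators. The shapes, the λ value, and the linear "networks" are all illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the four networks: in the real model G and F are
# convolutional generators and D_A, D_B are convolutional
# discriminators; here they are simple linear maps so that the loss
# structure is easy to follow.
W_G = rng.normal(size=(4, 4))   # generator G: domain A -> domain B
W_F = rng.normal(size=(4, 4))   # generator F: domain B -> domain A
w_DA = rng.normal(size=4)       # discriminator for domain A
w_DB = rng.normal(size=4)       # discriminator for domain B

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def D(w, x):
    """Discriminator score in (0, 1): probability that x is real."""
    return sigmoid(w @ x)

def cyclegan_loss(a, b, lam=10.0):
    fake_b = W_G @ a            # G(a): A-style sample moved to domain B
    fake_a = W_F @ b            # F(b): B-style sample moved to domain A
    rec_a = W_F @ fake_b        # F(G(a)): should reconstruct a
    rec_b = W_G @ fake_a        # G(F(b)): should reconstruct b

    # Adversarial losses (generator side): each generator tries to
    # make the opposite-domain discriminator output "real".
    l_gan_G = -np.log(D(w_DB, fake_b) + 1e-12)
    l_gan_F = -np.log(D(w_DA, fake_a) + 1e-12)

    # Cycle-consistency loss: L1 distance between each input and its
    # round-trip reconstruction, weighted by lambda.
    l_cyc = np.abs(rec_a - a).sum() + np.abs(rec_b - b).sum()

    return l_gan_G + l_gan_F + lam * l_cyc

a = rng.normal(size=4)          # a "zebra" sample from domain A
b = rng.normal(size=4)          # a "horse" sample from domain B
print(cyclegan_loss(a, b))      # scalar total loss
```

In training, the discriminators are updated with the opposite objective (score real samples high, generated samples low), alternating with the generator updates.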

The model flow is as follows:

The green boxes are real images and the blue boxes are generated style-transfer images. The GAN is the classic CycleGAN, and each generated image directly inherits the label of its source image before style transfer.

Both the generated and the real images are fed to a CNN for training. The final loss applies ordinary cross-entropy to real images and label smoothing regularization (LSR) to style-transferred images. LSR places less than full trust in the ground-truth label and redistributes a small weight over the other classes. The reassigned label distribution for each style-transferred image can be expressed as:

$$q_{LSR}(k) = (1 - \varepsilon)\,\delta(k, y) + \frac{\varepsilon}{K}$$

Here ε is a constant between 0 and 1, K is the number of classes, and δ(k, y) equals 1 when k = y and 0 otherwise. When ε = 0, LSR degenerates to the ordinary one-hot label distribution q(k) = δ(k, y).
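A minimal sketch of the reassigned distribution (the class count and ε values here are illustrative):

```python
import numpy as np

def lsr_distribution(y, K, eps=0.1):
    """Smoothed label distribution: q_LSR(k) = (1 - eps) * delta(k, y) + eps / K.

    y:   ground-truth class index
    K:   number of classes
    eps: smoothing constant in [0, 1]
    """
    q = np.full(K, eps / K)     # every class gets eps / K
    q[y] += 1.0 - eps           # the true class keeps the remaining mass
    return q

print(lsr_distribution(2, 5, eps=0.1))
# eps = 0 recovers the plain one-hot label
print(lsr_distribution(2, 5, eps=0.0))
```

Note that the distribution always sums to 1, since the ε mass taken from the true label is spread evenly over all K classes.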

According to LSR, the cross-entropy loss is redefined as:

$$L_{LSR} = -(1 - \varepsilon)\log p(y) - \frac{\varepsilon}{K}\sum_{k=1}^{K}\log p(k)$$

where p(k) is the predicted probability of class k.

LSR is not needed for real images (which are, after all, real), so ε = 0 for them and the loss reduces to the ordinary cross-entropy $L_{CE} = -\log p(y)$; for style-transferred images the authors set ε = 0.1.
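The two cases can be sketched with one function, where ε selects between the plain cross-entropy (real images) and the LSR loss (style-transferred images); the toy probability vector below is an assumption for illustration:

```python
import numpy as np

def lsr_cross_entropy(p, y, eps=0.1):
    """LSR loss: -(1 - eps) * log p(y) - (eps / K) * sum_k log p(k).

    p:   predicted class probabilities (sums to 1)
    y:   ground-truth class index
    eps: 0 for real images (plain cross-entropy); the paper uses
         0.1 for style-transferred images.
    """
    K = len(p)
    return -(1 - eps) * np.log(p[y]) - (eps / K) * np.log(p).sum()

p = np.array([0.1, 0.7, 0.1, 0.1])   # toy softmax output
# Real image: eps = 0 gives the ordinary cross-entropy -log p(y).
assert np.isclose(lsr_cross_entropy(p, 1, eps=0.0), -np.log(0.7))
# Style-transferred image: eps = 0.1 spreads a little loss over all classes.
print(lsr_cross_entropy(p, 1, eps=0.1))
```

When the prediction concentrates on the true class, the extra (ε / K) · Σ log p(k) term penalizes over-confidence, which is exactly the distrust applied to the generated samples.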

The authors note that [2] uses LSRO, which distributes the label evenly over all classes for generated samples; this paper, by contrast, largely trusts the transferred labels and assigns only a small proportion of distrust to the unreliable generated data.

The comparison between images generated by the method in this paper and those generated by DCGAN is shown in Figure 5:

It can be seen that the images generated by the method in this paper are more realistic and closer to the real data.

Experiments

The t-SNE visualization is shown in Figure 4 below:

Each color denotes one ID; circles are real samples and triangles are fake (generated) samples. The fake samples usually cluster together with their real counterparts, which supports their utility for data augmentation. However, some fake samples remain outliers that may be misclassified (red box), and these are what LSR is meant to handle.

The effect of the training ratio between real and fake samples on the results:

Ablation study comparing the baseline against CamStyle and LSR, with different numbers of cameras on each dataset:

CamStyle+LSR brings a general improvement.

Ablation comparison of the losses used on real and fake data:

The best result comes from using cross-entropy loss for real data and LSR loss for fake data.

Ablation on the number of cameras used for style transfer:

Unsurprisingly, the more cameras involved in style transfer, the better the result.

Compatibility tests with other data augmentation methods:

Here RE is Random Erasing and RF+RC is Random Flip + Random Crop, both commonly used data augmentation methods. As the table shows, CamStyle is fully compatible with other data augmentation methods.

SOTA comparison on Market1501:

SOTA comparison on Duke:

References

[1] Zhong Z, Zheng L, Zheng Z, et al. Camera style adaptation for person re-identification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 5157-5166.

[2] Zheng Z, Zheng L, Yang Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017.