Core idea
Unlike the common single-label classification model trained with a cross-entropy loss, this paper trains a multi-label classification model. In this multi-label setting, every image is initially assigned its own separate class; through subsequent iterations, an image belongs not only to its own class but also to the classes of the other images showing the same person. To make multi-label training efficient, the paper proposes the memory-based multi-label classification loss (MMCL). The core idea of this loss is that the cosine similarity between two images sharing a label should be pushed toward 1, while that between images with different labels should be pushed toward -1.
An overview of the model
For each input image, a feature is extracted and stored in a memory dictionary keyed by the image's index. At the same time, each image carries a single-class label vector in one-hot style: the position corresponding to the image's own index is +1 and all other positions are -1. Feeding this vector and the memory dictionary into the MPLP module produces the corresponding multi-class label, which has the same form as the single-class label: the indices of predicted classes are +1 and the remaining positions are -1. For every class predicted as +1 in the multi-class label, the corresponding image feature is retrieved from memory and its cosine similarity with the input image feature is computed as a score. Finally, the multi-class label and the scores are combined to compute the MMCL loss used for optimization.
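As a concrete illustration, here is a minimal NumPy sketch of this data flow for one image, assuming $n$ images and $d$-dimensional L2-normalized features; `mplp` and `mmcl_loss` are placeholders for the modules sketched later in this post.

```python
import numpy as np

# Hypothetical sizes: n images in the dataset, d-dimensional features.
n, d = 1000, 128
rng = np.random.default_rng(0)

# Memory dictionary: one L2-normalized feature per image, keyed by index.
memory = rng.normal(size=(n, d))
memory /= np.linalg.norm(memory, axis=1, keepdims=True)

def single_class_label(i, n):
    """Initial label of image i: +1 at its own index, -1 everywhere else."""
    y = -np.ones(n)
    y[i] = 1.0
    return y

# One training example: its feature, its single-class label, and its
# cosine scores against every class classifier stored in memory.
i = 0
f_i = memory[i]              # stand-in for the backbone's extracted feature
y_i = single_class_label(i, n)
scores = memory @ f_i        # cosine similarities (rows are unit-norm)
# y_hat, _ = mplp(i, memory)       # multi-class label via MPLP (sketched below)
# loss = mmcl_loss(scores, y_hat)  # MMCL loss (sketched below)
```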
The main content
Given an unlabeled image dataset $X = \{x_1, x_2, \dots, x_n\}$, the goal is to train a ReID model on this dataset. For any query image $q$, the trained ReID model should extract a feature that retrieves the images $g$ belonging to the same person from a gallery set $G$. To this end, the ReID model must make the feature of $q$ more similar to that of $g$ than to any other image in $G$. The final optimization objective is:
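A plausible form of this objective, writing $f(\cdot)$ for the feature extractor and $\mathrm{sim}(\cdot,\cdot)$ for cosine similarity (a reconstruction; see [1] for the exact formula):

$$\mathrm{sim}\big(f(q), f(g)\big) > \mathrm{sim}\big(f(q), f(k)\big), \quad \forall\, k \in G,\ k \neq g.$$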
For each image $x_i$, the corresponding initialized single-class label is a binary vector $l_i$ of length $n$, with $l_i[i] = 1$ and -1 at every other position. Because an image can belong to multiple classes, the single-class label has to be turned into a multi-class label. However, with the large number of images in the dataset, training an ordinary multi-label classifier is difficult. A more efficient solution is to use the image feature $f_i$ of class $i$ as the classifier of that class. The classification score of any image $x_j$ can then be computed as follows:
$$c_j[i] = f_i^{T} f_j,$$ where $c_j$ denotes the multi-label classification score vector of $x_j$.
Memory
A memory bank $M$ of size $n \times d$ is used to store the image features, where $M[i] = f_i$.
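In code, the memory bank is just a feature matrix, and the classification scores of the previous section reduce to a matrix-vector product; a small sketch with toy sizes:

```python
import numpy as np

# Toy memory bank M of shape (n, d) with L2-normalized rows, so M[i] = f_i.
rng = np.random.default_rng(1)
n, d = 6, 4
M = rng.normal(size=(n, d))
M /= np.linalg.norm(M, axis=1, keepdims=True)

# Classification score of image x_j on class i: c_j[i] = f_i^T f_j.
# With unit-norm features this is a cosine similarity in [-1, 1].
j = 2
c_j = M @ M[j]
print(c_j.shape, c_j[j])  # (6,) and ~1.0: an image scores highest on its own class
```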
MPLP (Memory-based Positive Label Prediction)
The MPLP module takes a single-class label and the memory bank as input and produces the corresponding multi-class label: $y_i$ is the input single-class label and $\hat{y}_i$ is the output multi-class label.
Given the initialized binary single-class label $y_i$ of image $x_i$, the job of MPLP is to find the other classes that may belong to $x_i$. MPLP first computes a ranking list $R_i$ based on the similarity between $x_i$ and the other features, where $s_{i,j}$ denotes the similarity score between $x_i$ and $x_j$.
$R_i$ can be used to obtain a candidate set of trusted labels for $x_i$, for example by selecting the first few entries of the ranking list. However, due to appearance ambiguity, viewpoint, and background variations, the ranking list is not fully reliable. The paper therefore proposes the following two strategies to address this:
- Given a lower bound on the confidence score, the trusted set is selected as $P_i = R_i[1:k_i]$, where $R_i[k_i]$ is the last label whose confidence is above the lower bound. Consequently, $k_i$ can differ from image to image.
- Label filtering by the cycle-consistency constraint rests on the assumption that if two images belong to the same class, their neighbor sets should also be similar. Under this assumption, hard negative labels in $P_i$ can be filtered out. MPLP traverses the labels in $P_i$ from front to back: for a label $j$ in $P_i$, MPLP computes the top-$k_i$ nearest labels of $j$; if label $i$ appears among them, label $j$ is treated as a positive sample of $x_i$, otherwise as a hard negative label. The traversal stops at the first hard negative label, yielding the positive label set $P_i^*$ (see the sketch after this list).
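A minimal NumPy sketch of MPLP under the description above; the confidence lower bound `t` and the reuse of $k_i$ as the neighborhood size for the cycle-consistency check are hypothetical choices, not necessarily the paper's exact settings.

```python
import numpy as np

def mplp(i, memory, t=0.6):
    """Predict the multi-class label of image i from the memory bank.

    t is a hypothetical confidence lower bound; the cycle-consistency
    check reuses k_i as the neighborhood size, which is an assumption."""
    n = memory.shape[0]
    sim = memory @ memory[i]            # s_{i,j} for all j
    rank = np.argsort(-sim)             # ranking list R_i, most similar first
    k_i = int(np.sum(sim >= t))         # labels whose confidence exceeds t
    candidates = rank[:k_i]             # trusted set P_i = R_i[1:k_i]

    positives = []
    for j in candidates:
        # Cycle consistency: i must appear among j's top-k_i nearest labels.
        top_j = np.argsort(-(memory @ memory[j]))[:k_i]
        if i in top_j:
            positives.append(int(j))
        else:
            break                       # stop at the first hard negative

    y_hat = -np.ones(n)                 # multi-class label: +1 on positives
    y_hat[positives] = 1.0
    return y_hat, positives
```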
MMCL
The traditional multi-label classification loss (MCL)
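A standard sigmoid-based form of this loss, consistent with the discussion below (a reconstruction; see [1] for the exact notation):

$$L_{MCL} = -\frac{1}{n}\sum_{j=1}^{n}\Big[\mathbb{1}\big[y_i[j]=1\big]\log \sigma\big(c_i[j]\big) + \mathbb{1}\big[y_i[j]=-1\big]\log\big(1-\sigma(c_i[j])\big)\Big],$$

where $\sigma$ is the sigmoid function.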
Since $M[j]^T$ and $f_i$ are L2-normalized, the classification score is confined to $[-1, 1]$. This restricts $l(j \mid x_i)$ to a narrow region of the sigmoid function, so even a correct classification cannot drive the loss to 0. The problem can be solved by introducing a coefficient $\tau$ and updating the loss function as follows:
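A plausible form of the rescaled prediction (a reconstruction following the text above):

$$l(j \mid x_i) = \sigma\!\left(\frac{c_i[j]}{\tau}\right) = \frac{1}{1 + e^{-c_i[j]/\tau}},$$

so that a sufficiently confident score can saturate the sigmoid and push the loss to 0.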
The resulting MCL loss is denoted $L_{MCL\text{-}\tau}$; its gradient is computed as follows:
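For the sigmoid cross-entropy form sketched above, the per-score gradient works out to (a reconstruction):

$$\frac{\partial L_{MCL\text{-}\tau}}{\partial c_i[j]} = \frac{1}{\tau}\Big(\sigma\big(c_i[j]/\tau\big) - \mathbb{1}\big[y_i[j]=1\big]\Big),$$

which saturates, and hence vanishes, once $\sigma(c_i[j]/\tau)$ approaches 0 or 1.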
Plotting this gradient against the classification score shows that the upgraded MCL loss still suffers from vanishing gradients once the score is greater than 0.25 or less than -0.25. Another problem is that, because the task involves many classes, the positive and negative classes are imbalanced. To address these issues, the authors further propose the MMCL loss.
Memory-based Multi-label Classification Loss (MMCL)
First, to deal with the limited range of the scores, the sigmoid-based loss is replaced with a squared-error loss computed directly on the classification scores, pushing positive scores toward 1 and negative scores toward -1.
Second, to balance the positive and negative classes, MMCL introduces hard negative class mining. For $x_i$, the negative classes are $R_i \setminus P_i^*$. The negative classes are sorted by their classification scores, and the top $r\%$ are picked as hard negatives, denoted $N_i$, with $|N_i| = (n - |P_i^*|) \cdot r\%$.
Thus, the new loss function can be obtained:
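A reconstruction of this loss consistent with the two modifications above, with $\delta$ a coefficient weighting the positive term (see [1] for the exact formula):

$$L_{MMCL}(x_i) = \frac{\delta}{|P_i^*|}\sum_{k \in P_i^*}\big(c_i[k]-1\big)^2 + \frac{1}{|N_i|}\sum_{k \in N_i}\big(c_i[k]+1\big)^2.$$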
The gradient formula is as follows:
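Differentiating the squared-error form above gives (a reconstruction):

$$\frac{\partial L_{MMCL}}{\partial c_i[k]} =
\begin{cases}
\dfrac{2\delta}{|P_i^*|}\big(c_i[k]-1\big), & k \in P_i^* \\[6pt]
\dfrac{2}{|N_i|}\big(c_i[k]+1\big), & k \in N_i.
\end{cases}$$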
The gradient is linear in the score and only reaches 0 when the score hits its target of 1 or -1, so the vanishing-gradient problem is resolved.
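Putting the two ingredients together, a NumPy sketch of the loss (the values of `r` and `delta` are hypothetical, not the paper's settings):

```python
import numpy as np

def mmcl_loss(scores, y_hat, r=0.01, delta=5.0):
    """Squared-error loss pulling positive-class scores toward 1 and the
    hardest r fraction of negative scores toward -1 (hard negative mining)."""
    pos = np.where(y_hat == 1)[0]
    neg = np.where(y_hat == -1)[0]
    # Hard negative mining: keep only the negatives with the highest scores.
    n_hard = max(1, int(len(neg) * r))
    hard_neg = neg[np.argsort(-scores[neg])[:n_hard]]
    pos_term = np.mean((scores[pos] - 1.0) ** 2)
    neg_term = np.mean((scores[hard_neg] + 1.0) ** 2)
    return delta * pos_term + neg_term
```

With the earlier sketches, one training step would compute `scores = M @ f_i`, `y_hat, _ = mplp(i, M)`, and then `mmcl_loss(scores, y_hat)`.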
Memory Update
The memory bank $M$ is updated after each training iteration in a momentum-like fashion:
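A common form of such a momentum update, with update rate $\alpha$, followed by re-normalization (the exact rule and the value of $\alpha$ are in [1]):

$$M[i] \leftarrow \alpha\, M[i] + (1-\alpha)\, f_i, \qquad M[i] \leftarrow \frac{M[i]}{\|M[i]\|_2}.$$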
The experiment
See the original paper.
Reference
[1] Dongkai Wang, Shiliang Zhang. Unsupervised Person Re-identification via Multi-label Classification. CVPR 2020.