- Article reposted from the WeChat official account “Alchemy of Machine Learning”
- Author: Brother Lian Dan (reposted with permission)
- Contact: WeChat CYX645016617
- Paper title: “Learning From Synthetic Data: Addressing Domain Shift for Segmentation”
“Foreword”: I haven't updated this official account in a long time, because I fell into a trap. Thinking I had already read enough papers, I tried to solve a domain adaptation problem using only the style transfer and GAN knowledge I already had. Flailing around did not pay off the way a lucky punch sometimes does; it just wore me out. So I found this well-designed DA framework and decided to study its principles seriously. After the holiday I plan to apply them in combination.
0 Overview
Earlier work achieves this kind of domain adaptation with adversarial models or with superpixel information. This paper instead uses a generative adversarial network (GAN) to bring the feature spaces of the two domains closer together.
The paper presents a domain adaptation algorithm for semantic segmentation. It focuses in particular on the case where the target domain has no labels.
Traditional DA methods minimize some distance function that measures the discrepancy between the source and target distributions. Two common measures are:
- Maximum Mean Discrepancy (MMD)
- Distance metrics learned adversarially with a DCNN
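The first measure can be made concrete with a minimal pure-Python sketch of the (biased) squared MMD estimate. Several parts of it are illustrative assumptions and are not taken from the paper: the Gaussian kernel, the bandwidth `sigma` and the toy 2-D "feature vectors".

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(source: List[Sequence[float]], target: List[Sequence[float]], sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two sets of feature vectors."""

    def mean_kernel(a: List[Sequence[float]], b: List[Sequence[float]]) -> float:
        return sum(gaussian_kernel(x, y, sigma) for x in a for y in b) / (len(a) * len(b))

    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


src = [[0.0, 0.0], [0.1, -0.2], [-0.1, 0.1]]
tgt_near = [[0.05, 0.0], [0.0, -0.1], [-0.05, 0.15]]
tgt_far = [[3.0, 3.0], [3.1, 2.8], [2.9, 3.1]]

print(mmd2(src, src))                            # 0.0: identical samples
print(mmd2(src, tgt_near) < mmd2(src, tgt_far))  # True: farther shift, larger MMD
```

A DA method built on MMD would add a term like `mmd2` between source and target features to its training objective. The adversarial alternative replaces this fixed kernel distance with a learned discriminator.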
The main contribution of this paper is an algorithm that uses a generative model to align the source and target distributions in feature space.
1 Method
Judging from the framework figure in the paper, the idea is fairly easy to follow:
- My first guess is that the domain transfer is modeled on the way GANs are used for style transfer;
- There are four networks in total. F is the feature extraction network, C is the segmentation network, G reconstructs the original image from the features extracted by F, and D is a classification network. Unlike in a standard GAN, D classifies its input into four classes: real source, real target, fake source and fake target. This is similar to merging the two discriminators of CycleGAN into one.
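The four-way output of D can be pinned down with a small label table. This is a sketch only. The index order of the classes and the helper `d_label` are hypothetical choices for illustration and are not taken from the paper's code.

```python
from enum import IntEnum


class DLabel(IntEnum):
    """Hypothetical index order for D's four output classes."""
    SRC_REAL = 0
    SRC_FAKE = 1
    TGT_REAL = 2
    TGT_FAKE = 3


def d_label(domain: str, is_real: bool) -> DLabel:
    """Map (domain, real/fake) to one of D's four classes."""
    if domain == "src":
        return DLabel.SRC_REAL if is_real else DLabel.SRC_FAKE
    if domain == "tgt":
        return DLabel.TGT_REAL if is_real else DLabel.TGT_FAKE
    raise ValueError(f"unknown domain: {domain!r}")


# The class D is trained to predict for each kind of input it sees
D_TARGETS = {
    "X^s (real source image)": d_label("src", True),
    "G(F(X^s)) (reconstructed source)": d_label("src", False),
    "X^t (real target image)": d_label("tgt", True),
    "G(F(X^t)) (reconstructed target)": d_label("tgt", False),
}

for name, label in D_TARGETS.items():
    print(f"{name:35s} -> {label.name}")
```

With a single four-class head, one D handles both domains. Two separate real/fake discriminators would be needed otherwise.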
2 Details
The input image is denoted $X$, images from the source domain $X^s$, and images from the target domain $X^t$.
- Base network. The architecture is similar to a pre-trained VGG16 and is split into two parts: the feature extraction part, called the F network, and the pixel-wise segmentation part, called the C network.
- The G network reconstructs the original image from the embedding produced by F. The D network not only decides whether an image is real or fake, but also performs a segmentation task, like the C network. This segmentation task is applied only to the source domain, because the target domain has no labels.
Now assume a source image and its labels $\{X^s, Y^s\}$ are ready:
- First, F extracts the feature representation $F(X^s)$;
- C produces the segmentation prediction $\hat{Y}^s = C(F(X^s))$;
- G reconstructs the image $\hat{X}^s = G(F(X^s))$.
Following recent successful work, the authors do not explicitly concatenate a random variable to the input of G. Instead, they use dropout layers in the generator as its source of noise.
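The data flow can be made concrete with a toy pure-Python sketch. Here `f_net`, `c_net` and `g_net` are hypothetical stand-ins (simple arithmetic on a flattened "image", not convolutional networks). G's only source of randomness is an inverted-dropout layer, which mirrors the design choice described above. The dropout rate `p=0.5` is an assumption.

```python
import random
from typing import List, Tuple


def dropout(values: List[float], p: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


# Hypothetical toy stand-ins for the real convolutional networks.
def f_net(x: List[float]) -> List[float]:
    return [2.0 * v for v in x]                  # "feature extraction"


def c_net(feat: List[float]) -> List[int]:
    return [1 if f > 1.0 else 0 for f in feat]   # per-pixel class prediction


def g_net(feat: List[float], rng: random.Random, p: float = 0.5) -> List[float]:
    # Dropout stays active and is G's only noise source (no concatenated noise vector)
    return [f / 2.0 for f in dropout(feat, p, rng)]


def forward_source(x_s: List[float], rng: random.Random) -> Tuple[List[int], List[float]]:
    """Return (Y_hat^s, X_hat^s) = (C(F(X^s)), G(F(X^s)))."""
    feat = f_net(x_s)
    return c_net(feat), g_net(feat, rng)


x_s = [0.2, 0.9, 0.4, 0.7]
y_hat, x_hat = forward_source(x_s, random.Random(0))
print(y_hat)  # [0, 1, 0, 1]
print(x_hat)  # each pixel is either dropped (0.0) or rescaled to 2 * x_s[i]
```

Because the noise comes from dropout, two calls with different random states give different reconstructions of the same features. A concatenated noise vector would normally play that role.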
3 Losses
The authors propose several adversarial losses:
- Within-domain losses:
  - a discriminator loss that distinguishes src-real from src-fake;
  - a discriminator loss that distinguishes tgt-real from tgt-fake;
  - a generator loss that makes the discriminator classify a fake source as src-real (and, likewise, a fake target as tgt-real);
- Cross-domain losses:
  - an F loss that makes a fake source be classified as real target;
  - an F loss that makes a fake target be classified as real source.
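The six adversarial terms can all be written as four-way cross-entropies on D's output. The sketch below makes three assumptions:
- D returns one vector of four logits per image (the pixel-wise version averages this over pixels).
- The class order is hypothetical.
- Minimizing each cross-entropy stands in for maximizing the corresponding log-likelihood.

The dictionary keys are illustrative names only.

```python
import math
from typing import Dict, List

# Hypothetical class order of D's four-way output
SRC_REAL, SRC_FAKE, TGT_REAL, TGT_FAKE = 0, 1, 2, 3


def ce(logits: List[float], label: int) -> float:
    """Cross-entropy -log softmax(logits)[label], computed with log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def adversarial_losses(d_xs: List[float], d_fake_s: List[float],
                       d_xt: List[float], d_fake_t: List[float]) -> Dict[str, float]:
    """Arguments are D's logits for X^s, G(F(X^s)), X^t and G(F(X^t))."""
    return {
        # Within-domain, used to update D: separate real from fake in each domain
        "adv_D_s": ce(d_xs, SRC_REAL) + ce(d_fake_s, SRC_FAKE),
        "adv_D_t": ce(d_xt, TGT_REAL) + ce(d_fake_t, TGT_FAKE),
        # Within-domain, used to update G: reconstructions should look real
        "adv_G_s": ce(d_fake_s, SRC_REAL),
        "adv_G_t": ce(d_fake_t, TGT_REAL),
        # Cross-domain, used to update F: reconstructions should look like the other domain
        "adv_F_s": ce(d_fake_s, TGT_REAL),
        "adv_F_t": ce(d_fake_t, SRC_REAL),
    }


uniform = [0.0, 0.0, 0.0, 0.0]
losses = adversarial_losses(uniform, uniform, uniform, uniform)
print(round(losses["adv_G_s"], 4))  # 1.3863 = log 4: an undecided D
```

G and F read the same D outputs for the reconstructions; only the target class differs. G aims for "real, same domain" and F aims for "real, other domain".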
In addition to the adversarial losses above, there are the following segmentation and reconstruction losses:
- $L_{seg}$: the standard pixel-wise cross-entropy loss of the segmentation network C;
- $L_{aux}$: D also outputs a segmentation map, trained with a pixel-wise cross-entropy loss (source domain only);
- $L_{rec}$: the L1 loss between the original image and the reconstructed image.
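These three losses are easy to state in code. The minimal sketch below treats images as flattened lists of pixels, and all values in it are made-up toy data. $L_{seg}$ and $L_{aux}$ share the same pixel-wise cross-entropy and differ only in whose logits they score (C's or D's segmentation head). Both need $Y^s$, which is why $L_{aux}$ is source-only.

```python
import math
from typing import List, Sequence


def pixel_ce(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean pixel-wise cross-entropy; logits[i] holds the class scores of pixel i."""
    total = 0.0
    for z, y in zip(logits, labels):
        m = max(z)
        log_z = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_z - z[y]
    return total / len(labels)


def l1_rec(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean absolute (L1) error between an image and its reconstruction."""
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


# Toy source sample: 3 pixels, 3 classes (flattened image, hypothetical values)
y_s = [0, 2, 1]
c_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0], [0.0, 1.0, 0.0]]  # output of C
d_seg_logits = [[0.0, 0.0, 0.0]] * 3                           # D's segmentation head

loss_seg = pixel_ce(c_logits, y_s)      # L_seg
loss_aux = pixel_ce(d_seg_logits, y_s)  # L_aux (source only: needs Y^s)
loss_rec = l1_rec([0.2, 0.5, 0.8], [0.1, 0.5, 0.9])  # L_rec
print(round(loss_aux, 4))  # 1.0986 = log 3 for an uninformative head
```

$L_{rec}$ needs no labels, so it can be computed for both source and target images ($L^s_{rec}$ and $L^t_{rec}$).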
4 Training Process
In each iteration, a random triplet $\{X^s, Y^s, X^t\}$ is fed into the model. The network then updates its parameters in the following order:
- Update D first:
  - for the source input, use $L_{aux}$ and $L^s_{adv,D}$;
  - for the target input, use $L^t_{adv,D}$.
- Then update G:
  - the two losses for fooling the discriminator, $L^s_{adv,G}$ and $L^t_{adv,G}$;
  - the reconstruction losses $L^s_{rec}$ and $L^t_{rec}$.
- Finally update F, which the paper treats as the most critical step:
  - $L^s_{adv,F}$ makes D mistake a fake source for a real target, and $L^t_{adv,F}$ makes D mistake a fake target for a real source;
  - together with the segmentation losses, F's objective is $L_{seg} + \alpha L_{aux} + \beta\,(L^s_{adv,F} + L^t_{adv,F})$, where $\alpha$ and $\beta$ are weighting coefficients;
  - like the G-D min-max game, F and D compete with each other. The difference is that G versus D is about real versus fake, while F versus D is about source domain versus target domain.
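The update order can be summarized as a per-network loss schedule. This is a framework-agnostic sketch under several assumptions:
- F is trained on $L_{seg} + \alpha L_{aux} + \beta(L^s_{adv,F} + L^t_{adv,F})$.
- `compute_losses` and `step` are hypothetical callbacks. In a real implementation, `compute_losses` would run the forward pass and `step` would backpropagate the objective into one network's parameters only. Here they just pass numbers around.
- The loss names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

Losses = Dict[str, float]
Triplet = Tuple[object, object, object]  # (X^s, Y^s, X^t)


def make_schedule(alpha: float, beta: float) -> List[Tuple[str, Dict[str, float]]]:
    """Which losses (and weights) drive each network's update, in update order."""
    return [
        ("D", {"aux": 1.0, "adv_D_s": 1.0, "adv_D_t": 1.0}),
        ("G", {"adv_G_s": 1.0, "adv_G_t": 1.0, "rec_s": 1.0, "rec_t": 1.0}),
        ("F", {"seg": 1.0, "aux": alpha, "adv_F_s": beta, "adv_F_t": beta}),
    ]


def train_step(triplet: Triplet,
               compute_losses: Callable[[Triplet], Losses],
               step: Callable[[str, float], None],
               alpha: float, beta: float) -> None:
    """One iteration: update D, then G, then F, recomputing losses before each update."""
    for net, weights in make_schedule(alpha, beta):
        losses = compute_losses(triplet)  # forward pass with the current parameters
        objective = sum(w * losses[name] for name, w in weights.items())
        step(net, objective)              # update only `net` with this objective


ALL_LOSSES = ("seg", "aux", "adv_D_s", "adv_D_t", "adv_G_s", "adv_G_t",
              "rec_s", "rec_t", "adv_F_s", "adv_F_t")
calls: List[Tuple[str, float]] = []
train_step(
    triplet=("X^s", "Y^s", "X^t"),
    compute_losses=lambda t: {name: 1.0 for name in ALL_LOSSES},  # dummy losses
    step=lambda net, obj: calls.append((net, obj)),                # just record
    alpha=0.5, beta=0.25,
)
print(calls)  # [('D', 3.0), ('G', 4.0), ('F', 2.0)]
```

Losses are recomputed before each update. G therefore plays against the freshly updated D, and F plays against both.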
5 Discriminator D
Note that D here is not the D of a traditional GAN: its output is no longer a single scalar representing the probability that the image is fake or real.
A recent GAN paper proposed the patch discriminator (which I had just read), where D outputs a 2-D map and each value is the probability that the corresponding patch is fake or real. This greatly improves the quality of the images reconstructed by G. Here D goes one step further and is a pixel-wise discriminator: each pixel is classified into one of four categories: fake-src, real-src, fake-tgt and real-tgt.
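A pixel-wise four-class output can be sketched with nested lists standing in for a tensor. The sketch assumes D returns an H×W map of four logits per pixel, in the hypothetical order src-real, src-fake, tgt-real, tgt-fake. The adversarial loss then averages the per-pixel cross-entropy against the target class, so every pixel gets its own real/fake and source/target verdict.

```python
import math
from typing import List

# D's output: an H x W map, each entry holding 4 logits in the (hypothetical) order
# src-real, src-fake, tgt-real, tgt-fake
LogitMap = List[List[List[float]]]


def log_softmax(z: List[float]) -> List[float]:
    """Numerically stable log-softmax of one pixel's logits."""
    m = max(z)
    log_z = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_z for v in z]


def pixelwise_adv_loss(d_out: LogitMap, label: int) -> float:
    """Mean over all pixels of -log p(label | pixel)."""
    per_pixel = [-log_softmax(px)[label] for row in d_out for px in row]
    return sum(per_pixel) / len(per_pixel)


def pixelwise_argmax(d_out: LogitMap) -> List[List[int]]:
    """D's predicted class for each pixel (an H x W map of ints in 0..3)."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in d_out]


# 2 x 2 toy output: top row looks like real source, bottom row like fake target
d_out = [
    [[3.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 4.0], [0.0, 0.0, 0.0, 1.0]],
]
print(pixelwise_argmax(d_out))  # [[0, 0], [3, 3]]
```

A scalar discriminator would collapse this map into one number. The pixel-wise version instead gives G and F a separate gradient signal at every location.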
The design also borrows from the Auxiliary Classifier GAN (ACGAN). ACGAN showed that adding an auxiliary classification task to the discriminator helps train a more stable G on large-scale real image data. That is why D also carries a segmentation loss, $L_{aux}$.
6 Summary
Each component provides critical information. Without further ado: when I'm back in the lab after the holidays, I'm going to start combining these methods to solve my problem.