Abstract: This article briefly reviews the workflow, characteristics, and limitations of traditional transfer (domain adaptation) algorithms, and then introduces several algorithms that address transfer when access to the source-domain data is restricted, specifically ADDA (CVPR 2017), FADA (ICLR 2020), and SHOT (ICML 2020).

This article introduces transfer algorithms for a special setting: transfer under privacy protection. It first briefly reviews the workflow, characteristics, and limitations of traditional transfer algorithms, and then introduces several algorithms that address transfer when access to the source-domain data is restricted, specifically ADDA (CVPR 2017), FADA (ICLR 2020), and SHOT (ICML 2020).

Traditional transfer algorithms: UDDA

First, the traditional transfer algorithms discussed here mainly refer to Deep Domain Adaptation, and more specifically Unsupervised Deep Domain Adaptation (UDDA). Because UDDA is the most common and most widely studied setting, far more work has been done on it than on other transfer settings.

We are given a Target Domain that has only unlabeled data, so a model cannot be trained on it with supervision; the target domain is usually a new site, scene, or dataset. To build a model without labeled data in the target domain, knowledge from a Source Domain can be used. The source domain usually refers to an existing site, scene, or dataset, and the knowledge can be a model trained on the source domain, the source domain's raw data, the source domain's features, and so on.

With the help of a labeled source domain, a model can be built even without labeled data in the target domain. The key difficulty in making such a model effective on target-domain data is the difference in data distribution between the source domain and the target domain, known as Domain Shift. How to align the source-domain and target-domain data is the main problem UDDA solves.

UDDA generally falls into the following three frameworks:

First, the source-domain and target-domain data (the cylinders) are passed through a feature extractor (Encoder) to obtain features (the rectangles); various methods then operate on the source-domain and target-domain features to align them. Note that UDDA usually assumes the source and target domains share the same categories, e.g. the handwritten digits 0-9, while the handwriting styles of the two domains differ.
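
To make this framework concrete, here is a minimal PyTorch sketch of the encoder/classifier structure that the alignment methods below operate on. The architecture, input size (1×28×28 digits), and feature dimension are illustrative assumptions, not taken from any of the cited papers.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Feature extractor; one copy per domain (or shared across domains)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 5), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 12x12
            nn.Conv2d(32, 64, 5), nn.ReLU(), nn.MaxPool2d(2),  # 12x12 -> 4x4
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)          # (batch, feat_dim) feature vectors

class Classifier(nn.Module):
    """Label predictor that operates on the extracted features."""
    def __init__(self, feat_dim=256, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):
        return self.fc(feats)       # (batch, num_classes) logits
```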

The ways of operating on source-domain and target-domain features fall into three categories:

  • Alignment based on statistics: various statistics are used to align the distributions of source-domain and target-domain features, such as the MMD loss and the CORAL loss (a minimal CORAL-style sketch appears after this list).

  • Alignment based on adversarial training: a Domain Classifier is built as the discriminator, whose goal is to distinguish source-domain features from target-domain features as well as possible. A Gradient Reversal Layer (GRL) is used; the GRL forces the feature extractor to extract domain-invariant features.

  • Alignment based on reconstruction: the features of the source domain and the target domain are fed through the same generative network to reconstruct the corresponding data. The assumption is that only samples with similar feature distributions can be reconstructed by the same network, which aligns the source-domain and target-domain features.
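
As a concrete example of the first category, here is a minimal sketch of a CORAL-style loss that matches the second-order statistics (covariances) of a source feature batch and a target feature batch. The function name and the batch shapes are assumptions for illustration, not the authors' released code.

```python
import torch

def coral_loss(source_feats, target_feats):
    """CORAL-style loss: align the covariances of two feature batches.

    source_feats, target_feats: tensors of shape (batch, feat_dim).
    """
    d = source_feats.size(1)

    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)     # center the features
        return (x.t() @ x) / (x.size(0) - 1)    # unbiased covariance estimate

    cs = covariance(source_feats)
    ct = covariance(target_feats)
    # Squared Frobenius norm of the covariance gap, scaled as in Deep CORAL
    return ((cs - ct) ** 2).sum() / (4 * d * d)
```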

For the specific algorithms of the UDDA approaches above, please refer to the earlier article:

Zhuanlan.zhihu.com/p/205433863…

This article only highlights a few characteristics of UDDA:

  • Source-domain data is available: UDDA assumes that the source-domain data exists and is accessible;

  • Source-domain and target-domain data can be mixed: UDDA usually assumes that source-domain and target-domain data can be processed together, i.e. they can reside on the same device;

  • Training and prediction are transductive: the target-domain data must be trained together with the source-domain data so that the feature extractor learns domain-invariant features and the source-domain model transfers to the target domain. Consequently, when a new batch of target-domain data arrives, the source-domain model cannot be used for prediction directly.

Generally speaking, the traditional UDDA approach assumes that the source-domain data is available, that source-domain and target-domain data can be mixed, and that the training process is transductive. However, in some scenarios the source-domain data is unavailable or cannot be transferred out. How can transfer be performed in that case?

First, it is important to distinguish two situations: source-domain data that cannot be transmitted, and source-domain data that cannot be obtained. The former assumes the source-domain data exists but cannot be placed on the same device as the target-domain data; the latter means the source-domain data is not available at all.

ADDA

ADDA is a CVPR 2017 work, from the paper Adversarial Discriminative Domain Adaptation.

The training flow of ADDA is as follows:

First, in the Pre-training stage, the labeled source-domain data is used for training with a cross-entropy loss:

$$\min_{M_s,\,C}\ \mathcal{L}_{cls} = -\,\mathbb{E}_{(x_s, y_s)\sim(X_s, Y_s)} \sum_{k=1}^{K} \mathbb{1}[k = y_s]\,\log C(M_s(x_s))_k$$

where $M_s$ is the source-domain feature extractor and $C$ is the source-domain classifier.

Then comes the Adversarial Adaptation stage. The source-domain feature extractor is copied to the target domain as the target-domain feature extractor $M_t$, and the classifier $C$ is transferred to the target domain and kept fixed. $M_t$ is then fine-tuned on target-domain data: if and only if the features $M_t$ extracts from target-domain data resemble the features $M_s$ extracts from source-domain data can the source-domain classifier adapt well to the target domain. The purpose of the formulas below is therefore mainly to make the distributions of $M_t(X_t)$ and $M_s(X_s)$ match.

A straightforward way to achieve this is adversarial training. The first step is to train the discriminator to distinguish source-domain features from target-domain features, where $D$ denotes the domain discriminator:

$$\min_{D}\ \mathcal{L}_{adv_D} = -\,\mathbb{E}_{x_s \sim X_s}\big[\log D(M_s(x_s))\big] - \mathbb{E}_{x_t \sim X_t}\big[\log\big(1 - D(M_t(x_t))\big)\big]$$

The second step is to train $M_t$ so that the discriminator can no longer separate the two:

$$\min_{M_t}\ \mathcal{L}_{adv_M} = -\,\mathbb{E}_{x_t \sim X_t}\big[\log D(M_t(x_t))\big]$$

Repeat the above two steps until convergence.
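
The two alternating steps can be sketched in PyTorch as follows. This is a minimal sketch under assumed names, not the authors' released code: `M_s` is the pre-trained source encoder (e.g. the `Encoder` from the earlier sketch), `D` is a small discriminator defined here, and `src_imgs_loader` / `tgt_imgs_loader` are assumed to yield image batches only.

```python
import copy
import torch
import torch.nn as nn

# M_t starts as an exact copy of M_s and is the only network fine-tuned here.
M_t = copy.deepcopy(M_s)
D = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 1))

bce = nn.BCEWithLogitsLoss()
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)
opt_Mt = torch.optim.Adam(M_t.parameters(), lr=1e-4)

for x_s, x_t in zip(src_imgs_loader, tgt_imgs_loader):
    # Step 1: train D to separate source features (label 1) from target features (label 0).
    with torch.no_grad():
        f_s = M_s(x_s)          # source encoder stays frozen after pre-training
        f_t = M_t(x_t)
    logits = torch.cat([D(f_s), D(f_t)])
    labels = torch.cat([torch.ones(len(f_s), 1), torch.zeros(len(f_t), 1)])
    loss_D = bce(logits, labels)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Step 2: train M_t with inverted labels so its features fool D,
    # which pulls target features toward the source feature distribution.
    loss_M = bce(D(M_t(x_t)), torch.ones(len(x_t), 1))
    opt_Mt.zero_grad(); loss_M.backward(); opt_Mt.step()
```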

As can be seen, in the above process the source-domain feature extractor $M_s$ is used only in the source-domain pre-training phase and is then copied to the target domain, where the target-domain feature extractor $M_t$ is fine-tuned. In other words, after the model trained on the source domain (feature extractor plus classifier) is transferred to the target domain, the target domain only fine-tunes the feature extractor so that its extracted features align with the source-domain features, while the source-domain classifier is still used for classification.

Why can this approach be extended to privacy protection? As shown above, the source-domain data is used only in the pre-training phase, and the subsequent alignment process needs only the source-domain features $M_s(x_s)$ rather than the raw data $x_s$; the latter would require access to the source-domain raw data.

In general, ADDA allows the feature extractors of the source and target domains to differ (their parameters are decoupled), and only the features of the source domain are used during training. If the source-domain data and the target-domain data are not on the same device, then assuming the features of the source-domain data may be sent out, this scheme can achieve privacy protection.

FADA

Xingchao Peng, also from Kate Saenko's CVL group (see the author notes at the end of this article), extended ADDA to a multi-domain version and proposed FADA, from the paper Federated Adversarial Domain Adaptation, ICLR 2020.

FADA proposes a new scenario: multi-domain transfer under federated learning. Suppose there are many source domains, the data of each source domain resides on a separate device, and the raw data cannot leave its device. How can their models be reused for the target domain in this case? In short, how do you align features under the constraint that the data cannot be sent out?

The paper assumes, consistent with ADDA, that the features of each domain may be sent out. Suppose there are $N$ source domains, and each source domain $i$ has trained a feature extractor $G_i$ and a classifier $F_i$. The target-domain feature extractor $G_t$ and classifier $F_t$ are first obtained by the weighted-averaging method of federated learning:

$$G_t = \sum_{i=1}^{N} \alpha_i G_i, \qquad F_t = \sum_{i=1}^{N} \alpha_i F_i$$

where $\alpha_i$ measures the contribution of each source domain to the target domain and generally needs to satisfy $\sum_{i=1}^{N} \alpha_i = 1$. FADA proposes a Dynamic Attention method, not detailed here, which mainly measures how much the discriminability of the target-domain features improves after the current source-domain model is fused into the target domain. In the simplest case one can take $\alpha_i = 1/N$. In short, since the target domain has no labels, $G_t$ and $F_t$ cannot be trained directly and are instead obtained as a weighted average of the source-domain models.
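
The weighted-average initialization can be sketched as follows, assuming each source model's parameters are available as a PyTorch state_dict; the weights are placeholders for either Dynamic Attention or the uniform $1/N$ choice, and the function name is hypothetical.

```python
import torch

def weighted_average_state_dicts(state_dicts, alphas):
    """Federated-style aggregation: target params = sum_i alpha_i * params_i.

    Note: integer buffers (e.g. BatchNorm's num_batches_tracked) would need
    special handling; this sketch assumes purely floating-point parameters.
    """
    assert abs(sum(alphas) - 1.0) < 1e-6, "weights are expected to sum to 1"
    keys = state_dicts[0].keys()
    return {k: sum(a * sd[k] for a, sd in zip(alphas, state_dicts)) for k in keys}

# Example: three source encoders G_1, G_2, G_3 with uniform weights 1/N.
# G_t.load_state_dict(weighted_average_state_dicts(
#     [G.state_dict() for G in (G_1, G_2, G_3)], alphas=[1/3, 1/3, 1/3]))
```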

Next, FADA uses each domain's feature extractor to extract features in that domain, i.e. $G_i(X_i)$ and $G_t(X_t)$. Then, assuming these features can be transferred to the same device, a Domain Identifier (DI) can be trained on that device. Note that the discriminator here is not the same as in ADDA: because multiple domains are involved, it performs an $(N+1)$-way classification.

The loss function for training the domain discriminator is as follows:

$$\min_{DI}\ \mathcal{L}_{DI} = -\sum_{i=1}^{N} \mathbb{E}_{x_i \sim X_i}\big[\log\, [DI(G_i(x_i))]_i\big] \;-\; \mathbb{E}_{x_t \sim X_t}\big[\log\, [DI(G_t(x_t))]_{N+1}\big]$$

where $[\cdot]_i$ denotes the $i$-th entry of the output vector; that is, the domain discriminator is trained so that data from the $i$-th source domain is predicted as the $i$-th class, and target-domain samples are predicted as the $(N+1)$-th class.

After the domain discriminator is trained, $DI$ is sent to the device where each source domain resides, and each source domain then trains its own feature extractor $G_i$ to confuse $DI$, pushing it toward domain-invariant features.
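
A minimal sketch of this exchange is given below. The $(N+1)$-way class convention follows the description above, and the confusion objective shown (a simple minimax flip of the DI loss) is an illustrative assumption rather than the exact FADA objective; feature batches are assumed to have already been sent to the coordinating device for DI training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N = 3                        # number of source domains (assumed)
DI = nn.Linear(256, N + 1)   # domain identifier: N source classes + 1 target class

def di_loss(feature_batches):
    """feature_batches: list of N+1 tensors; index i holds features of domain i,
    with the last entry holding target-domain features. Train DI to predict the index."""
    losses = []
    for domain_idx, feats in enumerate(feature_batches):
        labels = torch.full((feats.size(0),), domain_idx, dtype=torch.long)
        losses.append(F.cross_entropy(DI(feats), labels))
    return sum(losses)

def confusion_loss(G_i, x_i, domain_idx):
    """On the i-th source device: update G_i to *increase* DI's error on its own
    domain label (a simple minimax choice used here for illustration)."""
    feats = G_i(x_i)
    labels = torch.full((feats.size(0),), domain_idx, dtype=torch.long)
    return -F.cross_entropy(DI(feats), labels)
```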

The overall framework of FADA is shown below. The framework integrates several additional techniques, including feature disentanglement (Feature Disentangle), which are not covered here.

In general, FADA sends the features of multiple source domains and the target domain to a designated device, trains a domain discriminator on that device, and then sends the domain discriminator back to each source domain as an adversarial term that pushes the corresponding feature extractor to extract domain-invariant features. FADA can be regarded as a multi-domain extension of ADDA.

SHOT

SHOT is an interesting piece of work, from the paper Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation, ICML 2020.

Whereas ADDA and FADA assume that the source-domain data cannot be sent off its device, SHOT assumes that the source-domain data cannot be obtained at all, i.e. the source-domain data is missing or no longer exists.

So how do you transfer when only the source-domain model and a pile of unlabeled target-domain data are available? SHOT solves exactly this problem. First, SHOT stands for Source Hypothesis Transfer, where the "hypothesis" refers to the classifier of the source-domain model. One thing SHOT and ADDA share is that both freeze the source-domain classifier and fine-tune the (copied) source-domain feature extractor. Whereas ADDA fine-tunes the target-domain feature extractor with an adversarial loss (assuming access to features of the source-domain data), SHOT performs self-supervised training with pseudo labels.

First, SHOT trains the source-domain model with supervision; the model can be denoted as $f_s = h_s \circ g_s$, where $g_s$ and $h_s$ are the source-domain feature extractor and classifier, respectively. During training, label smoothing is used to make the trained model more transferable and better at generalizing.
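
For reference, here is a minimal sketch of label-smoothed source training. It uses the built-in `label_smoothing` argument of recent PyTorch versions rather than the paper's own implementation; the smoothing value 0.1, the optimizer settings, and the names `g_s`, `h_s`, `source_loader` are assumptions.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # smoothed cross entropy

# g_s, h_s: source feature extractor and classifier (assumed defined elsewhere);
# source_loader yields (image, label) pairs from the labeled source domain.
optimizer = torch.optim.SGD(list(g_s.parameters()) + list(h_s.parameters()),
                            lr=0.01, momentum=0.9)
for x_s, y_s in source_loader:
    logits = h_s(g_s(x_s))
    loss = criterion(logits, y_s)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```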

Then the source-domain model is copied to the target domain, $g_t = g_s$ and $h_t = h_s$; the classifier $h_t$ is kept fixed, and only $g_t$ is fine-tuned.

SHOT first employs a common Information Maximization (IM) loss, which minimizes the entropy of the classification probabilities of each target-domain sample while maximizing the entropy of the average prediction over all samples. Suppose the prediction for a target-domain sample $x_t$ is $p_t = \delta(h_t(g_t(x_t)))$, where $\delta$ is the Softmax function, and let $\bar{p} = \mathbb{E}_{x_t}[p_t]$ be the average prediction probability over target-domain samples (in practice, the average over a batch). The IM loss is then:

$$\mathcal{L}_{IM} = -\,\mathbb{E}_{x_t \sim X_t} \sum_{k=1}^{K} [p_t]_k \log [p_t]_k \;+\; \sum_{k=1}^{K} \bar{p}_k \log \bar{p}_k$$
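
Under the definitions above, a minimal batch-wise sketch of the IM loss could look like the following; the function name and the epsilon for numerical stability are assumptions.

```python
import torch
import torch.nn.functional as F

def im_loss(logits, eps=1e-6):
    """Information Maximization loss for a batch of target-domain logits.

    Term 1: mean per-sample entropy, pushed down (confident predictions).
    Term 2: negative entropy of the batch-mean prediction, pushed down,
            i.e. the entropy of the mean prediction is pushed up (class diversity).
    """
    p = F.softmax(logits, dim=1)                       # per-sample class probabilities
    ent = -(p * torch.log(p + eps)).sum(dim=1).mean()  # average per-sample entropy
    p_bar = p.mean(dim=0)                              # mean prediction over the batch
    div = (p_bar * torch.log(p_bar + eps)).sum()       # negative entropy of the mean
    return ent + div
```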

This loss alone cannot train the target-domain feature extractor properly, so the following pseudo-label technique is also needed.

The pseudo-label technique is very intuitive: use the current model to predict labels for the unlabeled samples, keep the labels of the samples with the highest prediction confidence, and then continue training the model on this pseudo-labeled data.

For example, target-domain samples can be ranked by the model's maximum predicted probability $\max_k [p_t]_k$, and the most confident portion is given its predicted class as a pseudo label $\hat{y}_t$. Using pseudo labels directly for training easily accumulates errors, so the pseudo labels should be made as accurate as possible, and a label-refining process can be used. Specifically:

$$c_k^{(0)} = \frac{\sum_{x_t \in X_t} \big[\delta(h_t(g_t(x_t)))\big]_k\, g_t(x_t)}{\sum_{x_t \in X_t} \big[\delta(h_t(g_t(x_t)))\big]_k}$$

$$\hat{y}_t = \arg\min_k D_f\big(g_t(x_t), c_k^{(0)}\big)$$

$$c_k^{(1)} = \frac{\sum_{x_t \in X_t} \mathbb{1}\big[\hat{y}_t = k\big]\, g_t(x_t)}{\sum_{x_t \in X_t} \mathbb{1}\big[\hat{y}_t = k\big]}$$

$$\hat{y}_t = \arg\min_k D_f\big(g_t(x_t), c_k^{(1)}\big)$$

where $c_k$ is the class center of the $k$-th class and $D_f$ is a distance function. These formulas can be viewed as a few steps of K-Means: the first builds soft-weighted class centers from the model's output probabilities and each sample's feature vector, the second assigns each sample a label by its distance to each class center, the third updates the class centers with hard (one-hot) weights, and the fourth assigns labels again based on distance. This iteration can be repeated many times, but in practice the pseudo labels after two rounds are already quite accurate.
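
The two-round centroid/label iteration can be sketched as follows. Cosine distance is assumed for $D_f$, the feature and logit tensors are assumed to be precomputed over the whole target set, and the function names are hypothetical.

```python
import torch
import torch.nn.functional as F

def cosine_assign(feats, centroids):
    """Assign each feature to the centroid with the smallest cosine distance."""
    sim = F.normalize(feats, dim=1) @ F.normalize(centroids, dim=1).t()
    return sim.argmax(dim=1)

def refine_pseudo_labels(feats, logits, rounds=2, eps=1e-8):
    """feats: (n, d) target features g_t(x); logits: (n, K) outputs h_t(g_t(x)).

    Round 0 builds soft-weighted class centroids and assigns labels by nearest
    centroid; later rounds rebuild centroids from the hard labels and re-assign
    (a few K-Means-like steps)."""
    probs = F.softmax(logits, dim=1)                                    # (n, K) soft weights
    # Soft-weighted centroids: c_k = sum_i p_ik * f_i / sum_i p_ik
    centroids = (probs.t() @ feats) / (probs.sum(dim=0, keepdim=True).t() + eps)
    labels = cosine_assign(feats, centroids)
    for _ in range(rounds - 1):
        one_hot = F.one_hot(labels, num_classes=logits.size(1)).float()
        centroids = (one_hot.t() @ feats) / (one_hot.sum(dim=0, keepdim=True).t() + eps)
        labels = cosine_assign(feats, centroids)
    return labels
```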

The above is the label-refining process; its main idea is to use the relationships among target-domain samples (the clustering result) to further adjust the pseudo labels, rather than relying only on the model's predictions.

After the pseudo labels are assigned, the model can be trained with a cross-entropy loss; combined with the IM loss, this raises the model's performance to a high level.

Conclusion

To summarize, traditional UDDA and the ADDA, FADA, and SHOT methods on which this article focuses can be distinguished using the following figure:

Adversarial Discriminative Domain Adaptation is the work of Eric Tzeng from the University of California, Berkeley, whose works include DDC and ADDA. Algorithms and Theory for Multiple-Source Adaptation (NeurIPS 2018) is one of Judy Hoffman's representative publications on multi-source domain adaptation. Kate Saenko leads the Computer Vision and Learning Group (CVL) at Boston University; Kuniaki Saito and others are, or have been, members of this group.

Representative works from CVL include (a personal assessment; I have read or studied the following articles to varying degrees while learning DA):

  • Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko: Federated Adversarial Domain Adaptation. ICLR 2020

  • Xingchao Peng, Yichen Li, Kate Saenko: Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation. ECCV (6) 2020: 756-774.

  • Shuhan Tan, Xingchao Peng, Kate Saenko: Generalized Domain Adaptation with Covariate and Label Shift ALignment. CoRR abs/1910.10320 (2019)

  • Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko: Domain Agnostic Learning with Disentangled Representations. ICML 2019: 5102-5112

  • Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, Bo Wang: Moment Matching for Multi-Source Domain Adaptation. ICCV 2019: 1406-1415

  • Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko: Semi-Supervised Domain Adaptation via Minimax Entropy. ICCV 2019: 8049-8057

  • Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko: Adversarial Dropout Regularization. ICLR (Poster) 2018

  • Xingchao Peng, Ben Usman, Neela Kaushik, Dequan Wang, Judy Hoffman, Kate Saenko: VisDA: A Synthetic-to-Real Benchmark for Visual Domain Adaptation. CVPR Workshops 2018: 2021-2026

  • Eric Tzeng, Judy Hoffman, Kate Saenko, Trevor Darrell: Adversarial Discriminative Domain Adaptation. CVPR 2017: 2962-2971.

  • Baochen Sun, Kate Saenko: Deep CORAL: Correlation Alignment for Deep Domain Adaptation. ECCV Workshops (3) 2016: 443-450

  • Baochen Sun, Jiashi Feng, Kate Saenko: Return of Frustratingly Easy Domain Adaptation. AAAI 2016: 2058-2065

  • Eric Tzeng, Judy Hoffman, Trevor Darrell, Kate Saenko: Simultaneous Deep Transfer Across Domains and Tasks. ICCV 2015: 4068-4076.

References

  • Eric Tzeng, Judy Hoffman, Kate Saenko, Trevor Darrell: Adversarial Discriminative Domain Adaptation. CVPR 2017: 2962-2971.

  • Xingchao Peng, Zijun Huang, Yizhe Zhu, Kate Saenko: Federated Adversarial Domain Adaptation. ICLR 2020

  • Jian Liang, Dapeng Hu, Jiashi Feng: Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation. CoRR abs/2002.08546 (2020)

This article is shared from the Huawei Cloud community post "[Technical insights] Transfer algorithms under privacy protection"; original author: "quite suddenly".
