Adapted from the arXiv paper by Daniel Cohen et al.; compiled by Heart of the Machine.

The ACM SIGIR International Conference on Research and Development in Information Retrieval (SIGIR 2018) was recently held in Ann Arbor, Michigan, USA. The conference has announced its awards: the Best Paper Award went to "Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems" from Universidad Autónoma de Madrid, and the Best Short Paper Award went to "Cross-domain Regularization for Neural Ranking Models Using Adversarial Learning," a collaboration between Microsoft and the University of Massachusetts Amherst. This article gives a brief introduction to the best short paper.


1. Introduction

Recently, several neural ranking models have been proposed in the research community. These models estimate the relevance of a document to a query by considering the raw query-document text [14], patterns of exact query-term matches in the document [5], or a combination of the two [10]. They typically learn to distinguish the input feature distributions of relevant query-document pairs from those of less relevant pairs by observing a large number of relevant and non-relevant samples during training. Unlike traditional learning to rank (LTR) models, which rely on hand-crafted features, these deep neural models learn higher-level representations for the target task directly from the data. Their ability to learn features from training data is a powerful property, giving them the potential to discover new relationships that hand-crafted features fail to capture.

However, as discussed by Mitra and Craswell [9], the ability to learn new features may come at the cost of poor generalization and performance on domains not covered during training. For example, the model might observe that a certain pair of phrases, such as "Theresa May" and "Prime Minister", co-occur more frequently in the training corpus than other phrase pairs. Likewise, based on the relative frequency of phrases in training queries, the model may infer that learning a good representation of "Theresa May" is more important than learning one for "John Major". Although such correlations and distributional regularities are important for achieving optimal performance within a single domain, a model that is expected to perform well on unseen domains must learn representations that are robust across domains. In contrast, traditional retrieval models (e.g., BM25 [12]) and LTR models generally show strong robustness in cross-domain settings.

The goal of this work is to train a deep neural ranking model that learns useful representations from the data without "overfitting" to the distribution of the training domains. Recently, adversarial learning has been shown to be an effective cross-domain regularizer for classification tasks [3, 17]. In this paper, the authors propose a similar strategy to make a neural ranking model learn representations that are more robust across domains. They train the neural ranking model on a small set of domains and evaluate it on held-out domains. During training, the ranking model is paired with an adversarial discriminator that tries to predict the domain of each training sample from the representations learned by the ranking model. As backpropagation passes into the layers of the ranking model, the gradient from the adversarial discriminator is reversed. This provides negative feedback to the ranking model, discouraging it from learning representations that are meaningful only for a particular domain. Experiments show that adversarial training yields consistent improvements in ranking performance on held-out domains, sometimes achieving up to a 30% improvement in precision@1.


Cross-domain regularization using adversarial learning

The motivation for the adversarial discriminator is to push the neural model toward domain-independent features that remain useful for estimating relevance. A traditional neural ranking model is trained only to optimize relevance estimation, with no constraint on the nature of the features it learns internally. The authors instead use an adversarial agent that nudges the model parameters away from domain-specific regions of the manifold, so that the features learned by the ranking model become domain-independent. This cross-domain regularization via domain confusion [17] can be represented by the following joint loss function:

L(q, doc_R, doc_NR, d_true; θ_rel, θ_D) = L_rel(q, doc_R, doc_NR; θ_rel) + λ · L_adv(q, doc_R, doc_NR, d_true; θ_rel, θ_D)    (1)

where L_rel is the relevance-based loss and L_adv is the adversarial discriminator loss; q, doc_R, and doc_NR are the query, a relevant document, and a non-relevant document, respectively; and θ_rel and θ_D are the parameters of the relevance model and the adversarial model, respectively. λ determines how strongly the confusion loss influences the optimization and is treated as a hyperparameter during training. The ranking model is trained on a set of training domains D_train = {d_1, ..., d_k} and evaluated on held-out domains D_test = {d_{k+1}, ..., d_n}.
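To make the joint objective concrete, below is a minimal PyTorch-style sketch of equation (1), assuming a pairwise hinge loss for L_rel and a cross-entropy domain loss for L_adv. The `ranker` and `discriminator` callables, the margin, and the λ value are illustrative stand-ins, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def joint_loss(ranker, discriminator, query, doc_rel, doc_nrel, domain_label,
               lambd=0.1, margin=1.0):
    """Sketch of equation (1): relevance loss plus lambda-weighted domain-confusion loss."""
    score_rel, hidden_rel = ranker(query, doc_rel)    # relevance score + hidden representation
    score_nrel, _ = ranker(query, doc_nrel)

    # L_rel: pairwise hinge loss over the relevant / non-relevant pair.
    l_rel = F.relu(margin - score_rel + score_nrel).mean()

    # L_adv: cross-entropy of the discriminator's domain prediction,
    # computed from the representation learned by the ranking model.
    l_adv = F.cross_entropy(discriminator(hidden_rel), domain_label)

    # Equation (1): the confusion loss is weighted by lambda.
    return l_rel + lambd * l_adv
```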

The discriminator is a classifier that inspects the output of a hidden layer of the ranking model and tries to predict the domain d_true ∈ D_train of the training sample. The discriminator is trained with a standard cross-entropy loss.
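As a rough illustration, such a discriminator can be a small feed-forward classifier over the hidden representation; the layer sizes below are our own choices and are not taken from the paper.

```python
import torch.nn as nn

class DomainDiscriminator(nn.Module):
    """Small MLP that predicts the training domain from a hidden representation (a sketch)."""

    def __init__(self, hidden_dim: int, num_domains: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 128),  # illustrative layer size
            nn.ReLU(),
            nn.Linear(128, num_domains),
        )

    def forward(self, hidden_repr):
        # Unnormalized domain logits, to be fed into a cross-entropy loss.
        return self.net(hidden_repr)
```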

Gradient updates are performed by backpropagation through all subsequent layers, including those belonging to the ranking model. However, the authors insert a gradient reversal layer (Ganin et al. [3]) between the ranking model and the discriminator, which transforms the standard gradient of L_adv into its additive inverse as it flows back into the ranking model. As a result, θ_rel is pushed to maximize the domain-discrimination loss while θ_D still learns to discriminate between domains. Although it is not optimized directly, this can be viewed as flipping the sign of the L_adv term in equation (1) for the ranking model's parameters.
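A gradient reversal layer is commonly implemented as an autograd function that acts as the identity in the forward pass and negates (and optionally scales) the gradient in the backward pass. Below is a minimal PyTorch-style sketch of that idea, following the standard DANN-style implementation rather than the paper's exact code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, lambda-scaled gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the sign of the incoming gradient before it reaches the ranking model.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```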

Passage retrieval models. The authors evaluate the adversarial learning approach on a passage retrieval task. They use the neural ranking model proposed by Tan et al. [16] (referred to as CosSim below) and the Duet model [10] as baselines. Since the paper focuses on domain-independent text representations, only the distributed sub-network of the Duet model is considered, similar to Zamani et al. [20].

The CosSim model is an LSTM-based interaction architecture. The authors train it following [16], using a hinge loss with a margin of 0.2. The Duet-distributed model is trained by maximizing the log likelihood of the correct passage, following [10]. Similar to [11], the authors adapt the Duet hyperparameters to the passage retrieval task: the size of the Hadamard product output after the max-pooled representation is significantly reduced, the query length is extended from 8 tokens to 20 tokens, and the maximum document length is reduced from the initial 1000 tokens to 300 tokens.
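For reference, the adjusted settings described above can be collected into a small configuration sketch; the dictionary and its key names are hypothetical, and the article does not state the reduced Hadamard output size.

```python
# Hypothetical summary of the Duet-distributed adjustments described above.
duet_distributed_config = {
    "max_query_len": 20,          # extended from 8 tokens
    "max_doc_len": 300,           # reduced from 1000 tokens
    "hadamard_output_dim": None,  # significantly reduced; exact value not stated in this article
}
```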

Unlike previous work using adversarial methods [3, 6, 17], ranking requires modeling the interaction between the query and the document. In this setup, the discriminator inspects the joint query-document representation learned by the neural ranking model, as shown in Figure 1a. For deeper architectures such as the Duet-distributed model, the authors allow the discriminator to inspect additional layers of the ranking model, as shown in Figure 1b.
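Putting these pieces together, a single training step might look roughly like the sketch below, reusing the grad_reverse and DomainDiscriminator helpers sketched earlier. The ranker is assumed to return both a relevance score and the hidden representation that the discriminator inspects (as in Figure 1), and all hyperparameter values are illustrative.

```python
import torch
import torch.nn.functional as F

def train_step(ranker, discriminator, optimizer, batch, lambd=0.1, margin=0.2):
    """One adversarially regularized update; `ranker`/`discriminator` are assumed modules."""
    query, doc_rel, doc_nrel, domain_label = batch

    score_rel, hidden_rel = ranker(query, doc_rel)
    score_nrel, _ = ranker(query, doc_nrel)

    # Relevance objective: pairwise hinge loss (margin value is illustrative).
    l_rel = F.relu(margin - score_rel + score_nrel).mean()

    # Adversarial objective: the gradient reversal layer negates (and scales by lambda)
    # the gradient flowing back into the ranker, pushing it toward domain-confused
    # features, while the discriminator itself still learns to identify the domain.
    domain_logits = discriminator(grad_reverse(hidden_rel, lambd))
    l_adv = F.cross_entropy(domain_logits, domain_label)

    # Lambda is applied inside the reversal layer here, which for the ranker's update
    # plays the same weighting role as the lambda * L_adv term in equation (1).
    loss = l_rel + l_adv
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```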

Figure 1: Cross-domain regularization using an adversarial discriminator for the two baseline models (CosSim and Duet-distributed). The discriminator inspects the representations learned by the ranking model and provides a negative feedback signal against any representation that helps discriminate between domains.


5. Results and discussion

Table 1: Model performance across L4 topics, where the metrics under each topic set are for a model trained on the other two sets. All* refers to the entire L4 collection (with the target topic removed). † indicates a significant improvement over the non-adversarial model (p < 0.05, Wilcoxon test).

Table 2: Performance across sets, where the metrics under each set are for a model trained on the other two sets. † indicates a significant improvement over the non-adversarial model (p < 0.05, Wilcoxon test).


Cross Domain Regularization for Neural Ranking Models Using Adversarial Learning

Paper link: arxiv.org/abs/1805.03…

Abstract: Unlike traditional learning to rank models that rely on hand-crafted features, neural representation learning models learn higher-level features for the ranking task by training on large datasets. However, this ability to learn new features directly from the data may come at a cost. Without any special supervision, these models can learn relationships that hold only in the domains from which the training data was sampled and that generalize poorly to domains not observed during training. We study the effectiveness of adversarial learning as a cross-domain regularizer for the ranking task. We train our neural ranking model on a small number of domains together with an adversarial discriminator that provides a negative feedback signal, discouraging the model from learning domain-specific representations. Our experiments show that the model consistently performs better on held-out domains when trained with the adversarial discriminator, sometimes achieving improvements of up to 30% in precision@1.