Preface

In today’s Internet, recommendation systems play a crucial role in content distribution, and improving their recommendation quality is the core goal of every recommendation-algorithm engineer. In iQiyi’s overseas recommendation business, we introduced the TensorFlow Ranking (TFR) framework, then researched and improved on top of it, which significantly improved recommendation quality. This article shares our practice and application of the TFR framework in the overseas recommendation business.

01

Algorithm iteration: from traditional CTR estimation to LTR

For a long time, the research focus of the CTR-estimation methods widely used in the ranking stage of recommendation systems has been how to estimate a user’s click probability for an item more accurately. In this class of algorithms, each item in a group of items exposed to the user at the same time is treated as an independent example: user features, context features, and the item’s own features are combined into one training sample, and the user’s feedback on that item (click, no click, play time, etc.) serves as its label. This seemingly reasonable problem abstraction does not accurately represent the recommendation scenario.

Strictly speaking, the essence of the ranking problem (especially in waterfall-flow layouts) is not to estimate the probability that a user clicks a single item, but to determine, when a group of items is exposed simultaneously, which items in the group the user is more likely to click.

Learning-To-Rank (LTR) algorithms were developed to solve this problem. LTR organizes training data in a pairwise or listwise fashion: for a group of items exposed to the user at the same time, two (pairwise) or more (listwise) items, together with the user and context features, form one training example. Accordingly, LTR evaluates models with NDCG, ARP, MAP, and other metrics that reflect the influence of item order.
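To make the order-sensitive metrics concrete, here is a minimal NDCG@k computation in plain Python (a simplified, unweighted form; TFR’s metrics.py implements the production version):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the predicted order divided by DCG of the ideal order."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Ranking the only clicked item first gives NDCG 1.0; pushing it to position 2 drops the score to 1/log2(3) ≈ 0.63. This sensitivity to position within a list is exactly what a per-item CTR objective lacks.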

At the same time, because of how LTR organizes its training data, these algorithms find it easier to achieve good results with relatively few users. The same data organization also makes effective negative sampling very convenient.
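As a toy illustration of this data organization (the field names here are made up, not our production schema), per-item impression logs can be grouped into one listwise training example per request:

```python
from collections import defaultdict

def to_listwise(rows):
    """Group per-item log rows into one listwise example per request."""
    groups = defaultdict(lambda: {"items": [], "labels": []})
    for row in rows:
        groups[row["request_id"]]["items"].append(row["item_id"])
        groups[row["request_id"]]["labels"].append(row["clicked"])
    return dict(groups)

logs = [
    {"request_id": "r1", "item_id": "a", "clicked": 1},
    {"request_id": "r1", "item_id": "b", "clicked": 0},
    {"request_id": "r2", "item_id": "c", "clicked": 0},
    {"request_id": "r2", "item_id": "d", "clicked": 1},
]
examples = to_listwise(logs)
```

Each value in `examples` is one training example; the unclicked items exposed in the same request serve directly as in-list negatives, which is why negative sampling becomes easy under this organization.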

(For another interesting study on sampling and model-evaluation metrics in recommendation, see the 2020 KDD Best Paper, On Sampled Metrics for Item Recommendation.)

02

TensorFlow Ranking (TFR)

TensorFlow Ranking (TFR) is the official LTR framework from the TensorFlow team. It aims to develop and integrate LTR-related techniques on top of TensorFlow, making it more convenient for developers to build LTR algorithms.

In actual use, we quickly felt the benefits of the TFR framework. It abstracts the classes at the different levels of training, provides the loss functions needed for pairwise and listwise training, integrates ARP, NDCG, and other model metrics, and combines all of this with TensorFlow’s high-level Estimator API, so development can proceed quickly and easily without wrestling with implementation details.
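As a rough illustration of what those loss functions compute, here are simplified, unweighted plain-Python forms of a pairwise and a listwise (softmax) ranking loss; TFR’s losses.py implements the real, tensorized versions:

```python
import math

def pairwise_logistic_loss(scores, labels):
    """Sum of log(1 + exp(-(s_i - s_j))) over pairs where item i outranks j."""
    return sum(
        math.log1p(math.exp(-(si - sj)))
        for si, yi in zip(scores, labels)
        for sj, yj in zip(scores, labels)
        if yi > yj)

def softmax_listwise_loss(scores, labels):
    """Cross-entropy between the labels and a softmax over the list's scores."""
    m = max(scores)  # shift for numerical stability
    log_z = math.log(sum(math.exp(s - m) for s in scores))
    return -sum(y * ((s - m) - log_z) for s, y in zip(scores, labels))
```

Both losses depend only on score differences within one exposed list, which is the behavioral difference from a per-item CTR loss: scoring the clicked item above its list-mates lowers the loss, regardless of the absolute score values.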

As shown in the figure above, the blue blocks are the algorithm-model modules that we must design and develop ourselves in the model_fn parameter when using a TensorFlow Estimator. Inside this model_fn, we design the model structure (the Scoring Function), use the logits computed by the model together with the labels to compute the loss and metrics, and finally use an optimizer to optimize the model. Below the red curve is the same end-to-end process using the TFR framework.

As the figure shows, the TFR framework mainly does two things:

1. It splits the computation of the Scoring Function, loss, and metrics out of the original model_fn, then replaces the loss and metrics we implemented ourselves with the LTR-specific losses and metrics provided by the TFR framework.

2. To train with TFR’s LTR-specific losses and metrics, the training data must be organized in listwise form. Since the Scoring Function from the original model_fn is still used, the framework transforms the model’s input data through its feature-conversion functions in the data-input stage, so that the Scoring Function can compute logits on listwise-organized data.

Therefore, in the TFR framework, the complete pipeline from data to model training is: training data → user-defined feature_columns → transform_fn feature transformation → Scoring Function computation → ranking_head’s loss_fn computes the loss → ranking_head’s eval_metric_fns compute the evaluation metrics → the optimizer performs the optimization.

From a usage perspective, the TFR framework does just these two things and doesn’t seem complicated. From a framework-development perspective, however, multiple LTR-specific losses and metrics are implemented in losses.py and metrics.py; data.py implements a tool for reading and parsing tfrecords files whose data is organized in listwise form; feature.py provides a feature-processing tool compatible with TensorFlow’s feature-column transformations; and finally head.py and model.py wrap and abstract these functions layer by layer, combining them cleanly with the TensorFlow Estimator.

More specifically, in terms of code organization, the TFR framework is implemented mainly as follows:

First: TFR wraps the Estimator’s model_fn as a whole through tfr.model.make_groupwise_ranking_fn. In ordinary TF-based development, a complete model function, including loss and metrics, must be defined in the Estimator’s model_fn parameter; with TFR this is unnecessary, because make_groupwise_ranking_fn returns the model_fn that the Estimator receives.

Second: the first parameter of tfr.model.make_groupwise_ranking_fn, group_score_fn, is where the model structure (the Scoring Function) we design and develop is plugged in. As mentioned before, this model only needs to output a logit.

Third: the third argument of tfr.model.make_groupwise_ranking_fn, transform_fn, corresponds to the feature-conversion functions in feature.py, which are compatible with TensorFlow’s feature columns. They convert the listwise-organized Dataset (read in by the tool in data.py) so that the Scoring Function receives data in the correct format.

Fourth: the fourth argument of tfr.model.make_groupwise_ranking_fn, ranking_head, corresponds to a call to tfr.head.create_ranking_head, whose three parameters define the loss, metrics, and optimizer. The loss and metrics are chosen as needed from TFR’s losses.py and metrics.py respectively, while the optimizer is still a standard TensorFlow optimizer.

That is the overall architecture of the TFR framework. In fact, its overall design and code implementation are quite elegant and clever.

03

Problems and practices encountered

The careful design and implementation of the TFR framework solved 80 to 90 percent of the problems in building our TensorFlow-based LTR algorithms. But as a framework whose first release came only in 2019, TFR still leaves some room for improvement.

The 0.1.x releases of the TFR framework support TensorFlow 1.x, while 0.2.x and 0.3.x support only TensorFlow 2.x. TensorFlow 2.x still has some uncertainties (for example, weights in custom model layers created with the Keras feature API could not receive gradient updates).

(github.com/tensorflow/… is the TensorFlow 1.x version.) We currently use TensorFlow 1.x, so the content, problems, and solutions presented in this article are based on TensorFlow 1.14 and the corresponding latest 0.1.x release, TFR 0.1.6.

We started trying the TFR framework in mid-2019, when the latest version was 0.1.3. In use, we found that sparse/embedding features were not supported. Sparse features are an indispensable part of recommendation features (most features may well be sparse), so at first we had to give up on the framework. Fortunately, this problem was soon resolved in the 0.1.4 release.

We officially adopted the TFR framework with version 0.1.4. However, as of the latest 0.1.6 release, there are still two features we need that are not supported in TFR 0.1.x:

· Regularization cannot be applied during training

As mentioned above, the TFR framework encapsulates the Estimator’s model_fn through make_groupwise_ranking_fn.

The Scoring Function we design and develop ourselves defines the network and its input and output nodes, and in the end only needs to output a logit. This differs from developing a model_fn under a plain TensorFlow Estimator, where the model must not only output a logit but also define how to compute the loss and how to optimize it. The TFR framework already encapsulates and integrates those steps, so the score_fn does not need them. This creates a problem: in the original model design, we could directly take the regularized parameters out of the network and include them in the loss computation, but with the TFR framework the model design and forward computation happen in our own model function while the loss is computed inside the framework (the loss_fn of ranking_head), so regularization terms cannot be added.

This issue has been raised before, but without a good solution: (github.com/tensorflow/…

When we use more complex networks, regularization is an essential part of optimization. Without regularization terms, training inevitably falls into serious over-fitting, as shown in the figure below:

To keep using the rest of the TFR framework’s functionality, we dug into its source code to try to solve the regularization problem. As analyzed above, the reason TFR cannot apply regularization is that the Scoring Function is designed and developed by us, while the loss computation is encapsulated by the framework. The core of the solution is therefore to take the parameters to be regularized out of our own model and pass them to the loss-computation part of the TFR framework. In the framework, the logits computed by our model are combined with the labels and the loss function defined in TFR through the create_estimator_spec method of ranking_head, which completes the optimization process. In TFR 0.1.5, create_estimator_spec already supports passing in regularization_losses; however, the _GroupwiseRankingModel object that wraps our Scoring Function does not expose our model’s regularization terms, so the argument cannot be used directly.

In theory, we can enable regularization in TFR by rewriting the compute_logits method of the _GroupwiseRankingModel class in the TFR source code. For the code details, see “How to deal with regularization in the TensorFlow Ranking framework.” With the regularization term added, over-fitting during training of the same model is no longer as serious as in the figure above:
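Stripped of the TensorFlow specifics, the shape of the fix is simply a contract change: the scoring step must surface its regularization terms so the framework-owned loss step can add them. A toy plain-Python sketch of that contract (everything here is illustrative, not the TFR API; the real terms travel via the regularization_losses argument of create_estimator_spec, available since TFR 0.1.5):

```python
def scoring_function(weights, features, l2_scale=1e-4):
    """Toy linear scorer that also reports its own L2 penalty."""
    logit = sum(w * f for w, f in zip(weights, features))
    l2_penalty = l2_scale * sum(w * w for w in weights)
    return logit, l2_penalty

def framework_loss(ranking_loss, regularization_losses):
    """What the framework's loss step does once it receives the terms:
    add them to the ranking loss before optimization."""
    return ranking_loss + sum(regularization_losses)
```

The patched compute_logits plays the role of `scoring_function` here: it collects the penalty terms registered by the network and hands them over, instead of letting them be silently dropped.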

· Feature input does not support SequenceFeatures

As described earlier, in the TFR framework the model’s input features are context_features and example_features, corresponding respectively to features shared across a request (context features, user features, etc.) and features unique to each item. The listwise-organized data (usually a Dataset produced from tfrecords files by the tool in data.py) must be converted by TFR’s feature-conversion function (_transform_fn) before entering our Scoring Function.

However, the current feature-conversion function (_transform_fn) only supports classic feature columns such as numeric_column and categorical_column; it does not support sequence_categorical_column. To let TFR support SequenceFeatures, we need to adjust the transform_fn feature-transformation step.

In transform_fn, the functions tfr.feature.encode_listwise_features and tfr.feature.encode_pointwise_features, both defined in feature.py, generate the dense tensors fed to the model from user-defined feature columns in listwise or pointwise mode. Both call encode_features to actually apply the feature columns and produce those dense tensors, and encode_features only supports converting numeric_column and categorical_column, not sequence_categorical_column. This analysis shows that the feature transformation is completed in feature.py. The core idea for making TFR support SequenceFeatures is therefore to modify the transformation-related functions in feature.py so that they can also convert features of type sequence_categorical_column.

The sequence_categorical_column features we use are all in context_features, so we take them out of context_features, add a separate step to handle the sequence_categorical_column transformation after the classic feature transformation, and then merge the results back into context_features; context_features and example_features are still what enters the model. For the code details, see “Let the TensorFlow Ranking framework support SequenceFeatures.”
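Reduced to plain Python, the control flow of that patch looks like the following (the transform callables stand in for TF feature-column ops; the real code lives in feature.py):

```python
def transform_context_features(raw, is_sequence_col,
                               classic_transform, sequence_transform):
    """Transform classic columns via the original path, sequence columns
    via a dedicated step, then merge everything back together."""
    context_features = {
        name: classic_transform(value)
        for name, value in raw.items() if not is_sequence_col(name)}
    # New step: sequence_categorical_column features are handled
    # separately, then merged back into context_features.
    context_features.update({
        name: sequence_transform(value)
        for name, value in raw.items() if is_sequence_col(name)})
    return context_features
```

Keeping the sequence branch as a separate, additive step is what lets the patch stay small and leave the original classic-column path untouched.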

Both solutions involve modifying the TFR framework’s source code, which can easily cause stability and compatibility issues and unexpected bugs. To keep the code as stable and reliable as possible, we follow two main code-organization principles:

**First,** minimize the scope of code changes: make all changes in as few functions as possible, without touching other modules of the TFR framework.

**Second,** stay fully compatible with projects not affected by the two issues above: for projects that do not use sequence feature columns or do not use regularization (which should be rare), ensure that the original logic and computed results remain unchanged.

04

Experiment: comparing the LTR model with the original model

We also ran an online experiment to measure how much improvement an LTR ranking model trained with the TFR framework brings compared with an original model of the same network structure. We selected one business scenario and split out three traffic groups, served by the following models:

**· BaseB:** no ranking service; each recall channel is assigned a priority, and the system returns recommendation results according to that priority.

**· Ranking:** a ranking algorithm developed with TensorFlow’s native Estimator.

**· TfrRankingB:** an LTR ranking algorithm developed on the TFR framework.

Among the three, TfrRankingB’s model structure is completely consistent with Ranking’s: the same Scoring Function is adopted, and the training data sets are identical. However, since TfrRankingB is an LTR model trained with the TFR framework, there are several differences in model optimization, as follows:

The differences above lie in the training method and evaluation metrics, which is exactly what adopting the TFR framework brings us; the training data and the model itself, including the regularization terms, are exactly the same as Ranking’s. After training, the two models were tested together with BaseB for four full days in a live online traffic environment, where day_1 and day_2 were weekdays and day_3 and day_4 were rest days. The online experiment examined CTR (click-through rate), UCTR (user click-through rate), and LPLAY (long-play rate), with the following results:

For business confidentiality, we do not show the specific values on the axes, but the conclusions are clear:

1. TfrRankingB is significantly better than Ranking and BaseB on the CTR and UCTR metrics.

2. On the LPLAY metric, TfrRankingB is also superior to Ranking and BaseB.

Conclusion

With the TFR framework, it is very easy to develop LTR models on TensorFlow or to adapt existing models into LTR models. At the same time, the framework’s module design and code logic are very clever; often-preached norms such as high cohesion and low coupling really do land in the code. In future work, we will gradually upgrade our existing TensorFlow 1.x code to 2.x and observe the TFR framework’s support for TensorFlow 2.x.

References

1. Burges C J C. From ranknet to lambdarank to lambdamart: An overview[J]. Learning, 2010, 11(23-581): 81.

2. Liu T Y. Learning to rank for information retrieval[M]. Springer Science & Business Media, 2011.

3. Pasumarthi R K, Bruch S, Wang X, et al. Tf-ranking: Scalable tensorflow library for learning-to-rank[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019: 2970-2978.

4. Krichene W, Rendle S. On Sampled Metrics for Item Recommendation[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020: 1748-1757.

5. github.com/tensorflow/…
