This article introduces the mechanisms behind the network-wide traffic distribution scenarios on the Taobao product details page.
The product details page is one of the highest-traffic modules in the Taobao mobile app. It serves detailed information for billions of products and is an essential step in users' decision-making. The page must not only satisfy users' need to fully understand the current product, but also retain traffic diverted from other sources, ultimately activating both the platform's internal traffic and the external traffic of the whole ecosystem as much as possible. The details page is also the link between many scenes and the platform itself: a user's behavior trajectory on the platform alternates between various scenes and the details page, where further decisions (purchase, add-to-cart, etc.) are made. The details page therefore has to satisfy both the user's demand to "learn more" and the platform's demand for it to act as middleware that starts and redirects user journeys.
Traffic within the details page has two notable features:
- It is large, and often sits on the user's purchase-decision path;
- It carries a great deal of traffic diverted from external sources.
Because of these two characteristics, and because the product design aims to improve platform stickiness and keep user journeys as smooth as possible, we built several network-wide distribution scenarios inside the details page and explored algorithms tailored to their characteristics.
Background
As information explodes, users have limited access to massive information, and even less access to effective information. If social media gives a voice to the voiceless, then the recommendation system can be regarded as the voice of massive information, and the producer of the information that platform users are exposed to. We therefore have a responsibility to ensure the quality of the recommended content, which is a great challenge for a recommendation system. Current recommendation systems deeply mine user behavior to perform personalized demand mining and real-time interest capture, helping users quickly and accurately locate what they need in massive information and thereby complete intelligent services.
The distribution recommendation on the details page shoulders three responsibilities: serving merchants, improving the user experience, and maintaining the platform's distribution efficiency. These three demands, with different priorities, must all be taken into account to build a good traffic distribution site. Our solution is to place a network-wide distribution module before the same-store product recommendation, which largely protects merchants' interests while letting users quickly find "guess you like" products among massive candidates on a focused page. The biggest difference between details-page recommendation and public-domain recommendation is that each details page is an information field derived from its main product, and the recommended content is strongly constrained by it. Most existing studies lack exploration of scenarios with such prior information: they emphasize only the user's individual interests and directly ignore this important, directly relevant prior. We observed that on recommendation pages entered from a single item/topic, users' clicks are highly homogeneous with the main item (the item/topic that woke up the recommendation page). In these scenarios the user has already conveyed a focused, clear intent to the model through the main product, so the recommended results should not be generalized arbitrarily. At the same time, over-aggregation reduces distribution efficiency and tires users while browsing. The recommended content in these scenarios should therefore follow a strategy of "clear intent, moderate divergence".
Of course, with the benefit of the main-product information, we can adapt the recommendation strategy to the scenario when tuning the model, creating a clearer and more explainable user experience than other scenarios offer. That is the original intention of this article. If you want to know more about such "recommend by product" scenarios, this article will walk you through the problem of "reinforcing and extending the user's immediate interest", our model solutions, and the online engineering practice.
Scenario introduction
The network-wide traffic distribution scenarios mainly include the information feed at the bottom of the details page ("good goods next door"), the horizontal swipe on the main image (new), and the purchase shell layer (new). These scenarios break the private domain's lock-in and greatly improve the efficiency of network-wide distribution from the private domain. To protect merchants' interests, each of these scenarios is split into two parts: a same-store recommendation module and a cross-store recommendation module.
Technical exploration
Algorithmic problem definition: immediate interest reinforcement
Entering the details page is an action the user actively initiates, so the user has a strong interest focus on the page's main product. The main-product information helps us quickly locate the user's immediate interest, which is crucial for the recommendation algorithm. Many methods treat the last position of the behavior sequence as the immediate interest, or mine the immediate interest with a model, but these all infer over uncertain events, whereas the details page naturally contains strong intent information in the form of the main product. Our work therefore models and reinforces this information at different stages of the recommendation stack, so that the details-page distribution scenarios can exploit their own characteristics and meet users' immediate needs as well as possible.
Recall
Background
With the spread of deep learning across many fields and the rise of vector retrieval technology, a series of deep recall techniques built on similar ideas have emerged. In 2016, YouTube proposed DNN-based recall for recommendation systems, combining user historical behavior and user profile information, which greatly improved the personalization and richness of the match set. Our work builds on our colleagues' recall work SDM (Sequential Deep Match), a deep recall model based on user behavior sequence modeling that continues this line of thought. SDM models the dynamic changes of user interests well, synthesizing long-term and short-term behaviors to represent the user along different dimensions; it expresses users and items as low-dimensional vectors and completes deep recall with large-scale vector retrieval. Online, SDM improved the IPV metric by 2.80% over the BASE (a merge of multiple I2I recalls). On this basis, to fit the characteristics of the details-page distribution scenario, we enrich and explore the main product's information, using it as the immediate interest to improve the structure of the recall model.
Model: CIDM (Current Intention Reinforce Deep Match)
To enable SDM to capture the main-product information and let it interact with user behaviors, we designed the following model structure, in which the trigger is the main product on the details page. We represent and reinforce it from several aspects:
- Trigger-layer: inspired by paper 1, we model the main product explicitly. In addition to SDM's modeling of the user's long- and short-term preferences, we introduce an immediate-preference layer that fuses the main-product features with the long- and short-term preferences as the final user representation;
- Trigger-attention: we change the self-attention in the original model to target attention, with the trigger acting as the target;
- Trigger-LSTM: drawing on the modeling idea in paper 2, we introduce the trigger information into the LSTM structure and add a trigger gate, so that the LSTM tends to remember more content related to the main product;
- Trigger-filter-sequence: experiments showed that supplementing the original sequence with sequences filtered by the main product's leaf category and primary category brings gains, so cate-filter-seq and cat1-filter-seq are added to the data source.
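To make the trigger-attention point above concrete, here is a minimal sketch of target attention where the trigger (main product) embedding serves as the query over the behavior sequence. This is an illustrative stand-in, not the production implementation; shapes and the scaled-dot-product scoring are our assumptions.

```python
import numpy as np

def trigger_attention(seq_emb: np.ndarray, trigger_emb: np.ndarray) -> np.ndarray:
    """Target attention over a behavior sequence, with the main-product
    (trigger) embedding as the query.

    seq_emb:     (T, d) embeddings of the user's behavior sequence
    trigger_emb: (d,)   embedding of the main product on the page
    Returns a (d,) pooled user representation biased toward the trigger.
    """
    # scaled dot-product scores between each behavior and the trigger
    scores = seq_emb @ trigger_emb / np.sqrt(seq_emb.shape[-1])  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                     # softmax
    return weights @ seq_emb                                     # (d,)
```

Compared with self-attention, the pooled vector here up-weights exactly those historical behaviors that are most similar to the current main product, which is the "immediate interest reinforcement" effect the model aims for.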
The first two points are relatively straightforward, so we will not repeat them here; we elaborate on the third and fourth innovations in detail.
Paper 2 demonstrates that adding a time gate better captures users' short- and long-term interests. Based on this conclusion, we designed a trigger gate to introduce the trigger's influence into the sequence features captured by the model. We tried several structural variants and compared two approaches (see figure):
- Feed the trigger information as an additional input to the memory gate; after the sigmoid, the gate output is multiplied with the candidate update as before.
- In parallel with the memory gate, add a new instant interest gate whose inputs are the cell input and the current main product, with the same structure as the memory gate.
In this way, the main-product information is retained more fully.
The first method simply modifies the existing memory gate, while the second adds a separate, parallel instant interest gate.
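As an illustration, the second variant (a parallel instant interest gate) can be sketched as follows. This is a hedged reconstruction of the idea, not the production formulation: the exact wiring of the trigger gate into the cell state, the weight shapes, and the random initialization are all our assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TriggerLSTMCell:
    """LSTM cell with an extra 'instant interest' gate driven by the
    trigger (main-product) embedding -- a sketch of the second variant.
    Weights are randomly initialized purely for illustration."""

    def __init__(self, d_in: int, d_hid: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        shape = (d_hid, d_in + d_hid)
        self.Wf, self.Wi, self.Wc, self.Wo = (
            rng.normal(scale=0.1, size=shape) for _ in range(4)
        )
        # instant interest gate: sees the cell input and the trigger embedding
        self.Wg = rng.normal(scale=0.1, size=(d_hid, d_in + d_in))

    def step(self, x, trig, h, c):
        z = np.concatenate([x, h])
        f = sigmoid(self.Wf @ z)                           # forget gate
        i = sigmoid(self.Wi @ z)                           # memory (input) gate
        g = sigmoid(self.Wg @ np.concatenate([x, trig]))   # trigger gate
        c_tilde = np.tanh(self.Wc @ z)                     # candidate update
        # the trigger gate adds a parallel path that keeps trigger-relevant content
        c = f * c + i * c_tilde + g * c_tilde
        h = sigmoid(self.Wo @ z) * np.tanh(c)
        return h, c
```

The first variant would instead fold the trigger into the memory gate itself (e.g. making `i` a function of both `z` and `trig`), rather than adding the parallel `g` path.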
In these two experiments, the offline HR metric improved by +1.07% and +1.37% respectively, and the best version improved the online IPV metric by +1.1%.
Our experiments showed that using the sequences filtered by the main product's leaf category and primary category as a supplement to the original sequence, and as model input, improves prediction accuracy. This indicates that the main product's structural information yields clear benefits and imposes a positive constraint on the sequence samples: items in the original sequence that are weakly related to the current main product are filtered out, which amounts to data denoising. Following this line of thought, and since autoencoders are mainly used for data denoising and dimensionality reduction, we considered processing the sequence with an autoencoder-based structure. Moreover, because our denoising is directional (excluding content irrelevant to the main product), we use a variational autoencoder (VAE), borrowing the main-product information to constrain the sequence's expression in the latent space, so that the hidden layer better abstracts the characteristics of the sequence data.
A variational autoencoder is a model with a dual structure (an encoder and a decoder) trained jointly. It borrows the idea of variational inference and allows customization in the latent-variable space, which suits our interest-modeling needs. Given a data sample $x$, we want to maximize its log-likelihood $\log p(x)$. The true posterior $p(z|x)$ is intractable, so VAEs use a custom distribution $q_\phi(z|x)$ to approximate it, with the KL divergence measuring how close the two distributions are. The overall optimization objective (as a loss to minimize) can be written as:

$$\mathcal{L} = D_{KL}\big(q_\phi(z|x)\,\|\,p(z)\big) - \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big]$$

See article 5 for the detailed derivation. The first term pushes the assumed posterior $q_\phi(z|x)$ as close as possible to the prior $p(z)$; the second term is the reconstruction loss, which keeps the overall autoencoding structure stable. The prior $p(z)$ is ours to define, and we want it to incorporate the main-product information, so we assume $p(z) = \mathcal{N}(e_{trigger}, \sigma_{batch}^2)$: the main product's representation serves as the mean of the Gaussian, and the second moment of the sampled batch is substituted as its variance. The model's optimization objective therefore becomes:

$$\mathcal{L} = D_{KL}\big(q_\phi(z|x)\,\|\,\mathcal{N}(e_{trigger}, \sigma_{batch}^2)\big) - \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big]$$
Inspired by papers 3 and 4, we design the network in the following form: the main product's feature vector is fed into the variational autoencoder network to shape mu and sigma, regulating the expression of the sequence features in the latent space. The learned latent sequence variable seq_hid then serves as the user's strong-intent sequence expression, combined with trigger_emb and the long- and short-term preferences.
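To make the trigger-conditioned prior concrete, here is a small sketch of the KL term between the encoder's diagonal Gaussian posterior and the prior $\mathcal{N}(e_{trigger}, \sigma_{batch}^2)$, using the closed form for the KL divergence between diagonal Gaussians. Function and argument names are illustrative assumptions.

```python
import numpy as np

def kl_trigger_prior(mu_q, logvar_q, trigger_emb, batch_var):
    """KL( q(z|x) || N(trigger_emb, batch_var) ) for diagonal Gaussians.

    mu_q, logvar_q: encoder outputs for the sequence, shape (d,)
    trigger_emb:    main-product embedding used as the prior mean, (d,)
    batch_var:      per-dimension second moment of the batch, used as
                    the prior variance, (d,)
    """
    var_q = np.exp(logvar_q)
    # closed-form KL between two diagonal Gaussians, summed over dimensions
    kl = 0.5 * (np.log(batch_var) - logvar_q
                + (var_q + (mu_q - trigger_emb) ** 2) / batch_var
                - 1.0)
    return kl.sum()
```

The `(mu_q - trigger_emb)**2` term is what pulls the latent sequence representation toward the main product, which is exactly the directional-denoising constraint described above.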
This experiment improved the offline HR metric by +2.23%; it was not tested online.
Effect
Compared with the SDM model, the CIDM model improves IPV by 4.69%.
Fine ranking
Background
The fine-ranking model is explored and developed on top of DIN (Deep Interest Network). Our idea is to integrate more main-product information into the sequence modeling. In fact, sequence-information mining and main-product reinforcement externalize the two needs of our scenario: reinforcing the main-product information captures the user's immediate intent and meets their immediate, focused needs, while sequence mining extends the current intent, diverging it to a degree so that the recommended results are not overly concentrated and do not cause experience fatigue. The two must be balanced, so that the model can identify when and how much to "converge" and "diverge". On this basis we: 1. mine more semantic information from the main product; 2. strengthen the guidance and influence of the main-product information on sequence feature extraction.
DTIN (Deep Trigger-based Interest Network)
First, we mine more semantic information from the main product. We align the relevant features of the main product (trigger) with those of the item to be scored (candidate), then feed these features directly into the wide side of the model, improving the model's sensitivity to the main product's representation.
Second, since DIN's motivation is to introduce an attention mechanism that captures users' interest points more accurately, and the trigger reflects the user's interest even more strongly than the item to be scored, we designed a double-attention structure to reinforce this information. As shown in the figure, we first concatenate the trigger and candidate item features and feed them into the first attention structure to learn the first-layer weight vector; this score fuses trigger and candidate information and can be regarded as user-interest extraction at the intersection of the main product and the item to be scored. Then we use only the main-product information as the query in the second attention structure to learn the second weight vector, which can be regarded as extended interest capture based on the immediate interest alone. The two weight vectors are then multiplied element-wise to form the sequence weight vector. This part of the model design went through many exploratory experiments; we present only the best-performing version here, and welcome discussion if you are interested.
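The double-attention pooling just described can be sketched as follows. This is an assumption-laden illustration: in particular, the joint query is approximated here as the sum of the trigger and candidate embeddings rather than a learned projection of their concatenation, and the product of weights is used without renormalization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dtin_double_attention(seq_emb, trigger_emb, cand_emb):
    """Double-attention pooling sketch: one attention scores the sequence
    against trigger+candidate jointly, the other against the trigger
    alone; the two weight vectors are multiplied element-wise.

    seq_emb: (T, d); trigger_emb, cand_emb: (d,)
    """
    d = seq_emb.shape[-1]
    joint_q = trigger_emb + cand_emb                     # stand-in for a learned projection of [trigger; candidate]
    w1 = softmax(seq_emb @ joint_q / np.sqrt(d))         # interest at the trigger/candidate intersection
    w2 = softmax(seq_emb @ trigger_emb / np.sqrt(d))     # extended interest from the trigger alone
    w = w1 * w2                                          # combined sequence weights
    return w @ seq_emb                                   # (d,) pooled representation
```

Multiplying the two weight vectors means a behavior must be relevant both to the trigger/candidate intersection and to the trigger alone to contribute strongly, which is how the structure balances "convergence" and "divergence".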
Effect
Compared with the DIN model, DTIN improves IPV by 9.34%; in offline experiments, AUC increases by 4.6% and GAUC by 5.8%.
Coarse ranking
Motivation
The coarse-ranking (pre-ranking) model is designed to solve a problem specific to industrial recommendation systems: when the recall set is large, the fine-ranking model is too complex to score it efficiently, so the coarse-ranking model came into being. The details-page distribution scenario must recall items from hundreds of millions of products across the network, and the recall phase combines multiple methods (I2I, vector recall, etc.). This makes the recall set large, and the overlap of multiple recall channels puts the matching features on different scales, which places great pressure on the downstream fine-ranking model. We therefore built a coarse-ranking module that bridges recall and fine ranking. Its goal is to screen the recall results: it must balance efficiency and accuracy while remaining compatible with recall methods of different scales. Based on the characteristics of our scenario, real-time intent modeling from the main product is also carried out at this initial screening stage.
Tri-Tower (Tri-Tower Pre-Ranking Framework)
Because of the coarse-ranking model's efficiency requirements, we cannot build an overly complex structure. On top of the two-tower coarse-ranking model, we add a main-product tower, the Trigger Tower, to reinforce the immediate interest. Its features are consistent with the item tower's, and its top-layer output is crossed with the item tower's logits and added to the input of the sigmoid function as a complement to the original two-tower model. The model structure is as follows:
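One plausible reading of this combination, sketched below: the trigger-item cross contributes a second logit that is summed with the usual user-item logit before the sigmoid. This is a sketch under that assumption, not the exact production scoring function.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tri_tower_score(user_vec, item_vec, trigger_vec):
    """Tri-tower pre-ranking score sketch: the trigger tower's output is
    crossed with the item tower (dot product) and added to the standard
    two-tower user-item logit before the sigmoid."""
    logit = user_vec @ item_vec + trigger_vec @ item_vec
    return sigmoid(logit)
```

Because each tower still produces an independent vector, item and trigger vectors can be precomputed and the score reduces to two dot products per candidate, which keeps the coarse-ranking stage cheap.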
Trigger Net and Item Net use lightweight statistical features on the item side, while User Net adds large-scale ID features on top of the deep-match features, keeping the coarse-ranking model light and its serving fast. The final tri-tower coarse-ranking model improves the IPV metric by 3.96% compared with having no coarse ranking.
Conclusion
Overall, the optimization idea for the details-page distribution scenarios is unified: mine the main-product information and use it in the model to strengthen the relevance of the user's historical behavior. We add a gate (instant interest reinforcement) to the traditional interest-mining network, preserving the intents that are clear and most relevant to the moment, so that the recommended results converge somewhat. At the same time, the user's multiple interests are not completely erased in the model: the divergence of the results is modulated by the dynamic weights of the attention network, which keeps the recommendations personalized and moderately divergent.
We have now covered the three main links of the "reinforcing and extending the user's immediate interest" topic in the private-domain distribution scenario: recall, coarse ranking, and fine ranking, all of which brought gains. Of course, the process was also accompanied by many failures, in both model optimization and engineering practice, which gave us valuable experience. Beyond the three main models, we also optimized the strategy and other pipeline models for this problem, which we will not describe here. If you are interested in the details or in further optimization directions, please feel free to contact us.
References
1. Tang, Jiaxi, et al. "Towards Neural Mixture Recommender for Long Range Dependent User Sequences." The World Wide Web Conference, 2019.
2. Zhu, Yu, et al. "What to Do Next: Modeling User Behaviors by Time-LSTM." IJCAI, 2017.
3. Liang, Dawen, et al. "Variational Autoencoders for Collaborative Filtering." Proceedings of the 2018 World Wide Web Conference, 2018.
4. Li, Xiaopeng, and James She. "Collaborative Variational Autoencoder for Recommender Systems." Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017.
5. Zhao, Shengjia, Jiaming Song, and Stefano Ermon. "Towards Deeper Understanding of Variational Autoencoding Models." arXiv preprint arXiv:1702.08658, 2017.