Recommender system study notes series links:

  • Recommender System Study Notes: Content-Based Recall Strategies
  • Recommender System Study Notes: Collaborative-Filtering-Based Recall Strategies
  • Recommender System Study Notes: Multi-Path Recall and Embedding Recall Strategies


1. Multi-path recall

1.1 Overview

The so-called "multi-path recall strategy" uses several different strategies, features, or simple models, each recalling a portion of the candidate set, and then mixes these candidate sets together for the downstream ranking model to use.

Why do we need multi-path recall? At the recall layer, computing speed and recall rate are two mutually conflicting metrics. To raise computing speed, the recall strategy must be simplified, which leaves the recall rate unsatisfactory; conversely, raising the recall rate requires a more complex recall strategy, which inevitably lowers computing speed. Weighing the two, the industry generally adopts the "multi-path recall strategy", which stacks several simple recall strategies on top of one another.

In multi-path recall, the individual strategies are independent of one another, so they can generally run concurrently in separate threads. In a news recommendation system, for example, we can recall articles by category, author, popularity, and so on, which makes the recalled results fit the actual requirements better; meanwhile, running these recall strategies in parallel threads makes the whole stage more efficient. A minimal sketch follows.
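Below is a minimal sketch of this parallel fan-out. The single-path recall functions and article IDs are hypothetical placeholders for a news feed, not the author's code; the merge step simply de-duplicates before handing off to ranking.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical single-path recall functions for a news feed; each returns
# a small ranked list of article IDs (placeholders stand in for real logic).
def recall_by_hotness(user_id, k):
    return ["hot_1", "hot_2", "hot_3"][:k]

def recall_by_author(user_id, k):
    return ["auth_1", "hot_1", "auth_2"][:k]

def recall_by_category(user_id, k):
    return ["cat_1", "cat_2", "auth_1"][:k]

def multi_path_recall(user_id, paths, k_per_path):
    # The paths are mutually independent, so each runs in its own thread;
    # K may differ per path (a hyperparameter tuned offline and via A/B tests).
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        futures = [pool.submit(fn, user_id, k) for fn, k in zip(paths, k_per_path)]
        per_path = [f.result() for f in futures]
    # De-duplicate while preserving order; the mixed set goes to the ranker.
    merged, seen = [], set()
    for items in per_path:
        for item in items:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

print(multi_path_recall("user_42",
                        [recall_by_hotness, recall_by_author, recall_by_category],
                        k_per_path=[3, 2, 3]))
```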

1.2 A few more words

Note that the characteristics of the relevant business should be fully considered when choosing recall strategies; that is, pick strategies strongly tied to the business. For news recall, for example, candidates include "hot news", "news type", "news category", "author recall", and so on.

As shown in Figure 2, each recall path pulls its top-K candidates. K is a hyperparameter of each path and may differ across paths; its size generally has to be settled by offline evaluation plus online A/B testing.

Although the industry generally adopts the multi-path recall strategy, it still has some unavoidable defects: everything from the choice of strategies to the tuning of each candidate-set size must be done by hand, information is split across the separate strategies, and the combined influence of different strategies on the same item is not modeled. These defects do have a good remedy: the embedding-based recall described later in this article.

1.3 Fusion and ranking strategies

After each recall strategy has produced its candidate set, how do we merge these results?

For example, suppose the item IDs (with weights) returned by three recall strategies are:

Recall strategy      Recalled item list (item-ID: weight)
Recall Strategy 1    A: 0.9, B: 0.8, C: 0.7
Recall Strategy 2    B: 0.6, C: 0.5, D: 0.4
Recall Strategy 3    C: 0.3, D: 0.2, E: 0.1

For the above, we have the following fusion strategies:

  1. Ordered display: rank the strategies by priority, e.g. "Recall Strategy 1" > "Recall Strategy 2" > "Recall Strategy 3", and simply display A, B, C, D, E in that order.

  2. Averaging: the denominator is the number of strategies whose item list contains the item, and the numerator is the sum of its weights. For example: B: (0.8 + 0.6)/2; C: (0.7 + 0.5 + 0.3)/3.

  3. Weighted averaging: we assign each strategy a weight ourselves. For example, giving the three strategies weights 0.4, 0.3, and 0.2, the fused weight of B is (0.4 * 0.8 + 0.3 * 0.6 + 0.2 * 0)/(0.4 + 0.3 + 0.2). The problem with this method is that the strategy weights are hand-set and imprecise, which motivates the dynamic weighting method (a code sketch of methods 2-4 follows this list).

  4. Dynamic weighting: use the CTR of each recall strategy (say X, Y, Z), refreshed daily, as its dynamic weight. How is each strategy's CTR computed? Again by example:

    • Display log, with recall sources: X, Y, Z, X, Y, Z (each impression is tagged with the strategy that recalled it)
    • Click log, with recall source: one click on X
    • Then each recall's CTR = clicks / impressions, e.g. X: 1/2

    The drawback of this approach is that it only considers the click-through rate, missing the bigger picture.

  5. Machine-learned weights: use a machine learning method to set the weights. For example, train a logistic regression (LR) classifier offline to learn the weight of each recall path in advance, then apply the weighted fusion. By taking more features and context into account, this method is more accurate.
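Here is a small sketch of fusion methods 2 and 3 on the table above; the function name fuse and the data layout are assumptions for illustration. Passing daily-refreshed per-strategy CTRs as the weights gives the dynamic weighting of method 4.

```python
from collections import defaultdict

# Recall results from the table above: one {item-ID: weight} dict per strategy.
recalls = [
    {"A": 0.9, "B": 0.8, "C": 0.7},  # Recall Strategy 1
    {"B": 0.6, "C": 0.5, "D": 0.4},  # Recall Strategy 2
    {"C": 0.3, "D": 0.2, "E": 0.1},  # Recall Strategy 3
]

def fuse(recalls, strategy_weights=None):
    """Method 2 (averaging) when strategy_weights is None, method 3
    (weighted average) otherwise. For method 4 (dynamic weighting), pass
    daily-refreshed per-strategy CTRs as strategy_weights."""
    if strategy_weights is None:
        per_item = defaultdict(list)
        for items in recalls:
            for item, w in items.items():
                per_item[item].append(w)
        # Denominator = number of strategies whose list contains the item.
        fused = {i: sum(ws) / len(ws) for i, ws in per_item.items()}
    else:
        numer = defaultdict(float)
        denom = sum(strategy_weights)
        for sw, items in zip(strategy_weights, recalls):
            for item, w in items.items():
                numer[item] += sw * w  # absent items contribute 0, as above
        fused = {i: s / denom for i, s in numer.items()}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

print(fuse(recalls))                   # averaging: B -> (0.8 + 0.6)/2 = 0.7
print(fuse(recalls, [0.4, 0.3, 0.2]))  # weighted:  B -> (0.4*0.8 + 0.3*0.6)/0.9
```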

Across the fusion methods above, cost rises step by step and so does effectiveness. You can choose according to cost, or use an A/B test to decide which method to adopt.

2. Embedding-based recall

2.1 Overview

The multi-path recall strategy and its shortcomings were covered above. Embedding-based recall arose precisely to make up for those shortcomings, and in practice its effectiveness and speed are not inferior to multi-path recall. In fact, the "interest tag", "interest topic", and "hotness" information used in multi-path recall can be folded into the final embedding vector as side information, which in effect combines embedding-based recall and multi-path recall.

In embedding recall, the similarity between embeddings serves as the single unified ranking criterion, so we can truncate the recalled candidate set at any size we like.

2.2 Common methods of Embedding recall

Before introducing embedding recall methods, let us define some common shorthand such as I2I, U2I, U2U2I, U2I2I, and U2TAG2I, as shown in the figure below: "2" denotes an edge in the graph, while "U" (user) and "I" (item) denote nodes.

  • I2I: compute item-item similarity, for similar-item, related-item, and association recommendations;
  • U2I: recommend items to a user directly, based on the results of matrix factorization or collaborative filtering;
  • U2U2I: user-based collaborative filtering; first find similar users, then recommend the items those similar users like (a toy sketch follows this list);
  • U2I2I: item-based collaborative filtering; first collect the items a user likes, then recommend items similar to those;
  • U2TAG2I: tag-based generalization recommendation; first aggregate the tag vector the user favors, then match it against all items. The tags are typically the item's tags, categories, keywords, and the like.
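As a toy illustration of the U2U2I pattern, here is a minimal user-based CF recall; the interaction matrix and all numbers are made up for the example.

```python
import numpy as np

# Toy user-item interaction matrix (rows: users, cols: items); 1 = clicked.
R = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
], dtype=float)

def u2u2i(R, user, n_neighbors=2, k=2):
    # U2U: cosine similarity between the target user and every other user.
    norms = np.linalg.norm(R, axis=1) * np.linalg.norm(R[user]) + 1e-9
    sims = (R @ R[user]) / norms
    sims[user] = -1.0                          # exclude the user themself
    neighbors = np.argsort(-sims)[:n_neighbors]
    # 2I: score items by similarity-weighted neighbor clicks,
    # pushing items the user has already seen to the bottom.
    scores = sims[neighbors] @ R[neighbors]
    scores[R[user] > 0] = -1.0
    return np.argsort(-scores)[:k]

print(u2u2i(R, user=0))  # unseen items ranked by similar users' clicks
```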

The following subsections mainly introduce the U2I and I2I methods.

2.2.1 U2I recall scheme

Preliminary: the U2I recall algorithms include user2vec, personalized word2vec, crosstag, and personalized DSSM. user2vec computes the similarity between the user's tag vector and the article's tag vector for recall; personalized DSSM computes the similarity between the user's DSSM vector and the article's DSSM vector; crosstag is effectively multiple user2vecs: user tags are grouped by category, K tags are taken per category to form M groups of tags, and each group is recalled separately before merging. A minimal sketch of user2vec follows.
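Below is a minimal sketch of the user2vec idea, assuming pre-trained tag vectors have already been aggregated into a single user vector and one vector per article (all names and dimensions are made up); in production the ranking over the full corpus would go through an ANN index rather than a brute-force sort.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Hypothetical pre-trained vectors: the user's aggregated tag vector and
# one tag vector per article.
rng = np.random.default_rng(0)
user_vec = rng.random(64)
article_vecs = {f"doc_{i}": rng.random(64) for i in range(1000)}

# user2vec U2I recall: rank articles by similarity to the user vector.
# (crosstag would repeat this once per tag-category group and merge.)
scored = sorted(article_vecs.items(),
                key=lambda kv: cosine(user_vec, kv[1]), reverse=True)
top_k = [doc_id for doc_id, _ in scored[:100]]
print(top_k[:5])
```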

Advanced: user2vec was an early-stage attempt at recall: simple, brute-force, and quick to show results, but heavy on storage, since a recommendation list is stored for every user. Early in the product's life, when DAU was low, the problem was not obvious; as DAU kept growing, the storage problem became more and more serious and forced us to find ways to change the status quo. One of the remedies we adopted was to convert individual recommendations into group recommendations.

Our exploration of group recall went as follows:

We have tried cluster recall, group portrait recall, LSTM, DSSM, BNB, incremental clustering, and dynamic rule clustering.

Cluster recall uses a clustering algorithm (such as MiniBatch-KMeans) to group the tag vectors of all users into a number of clusters (e.g. 500, chosen by the elbow method), then stores the cluster labels, the cluster centers, and each user's cluster assignment (a user may belong to one or several clusters). Once a user's cluster is known, there are two options. One is to do real-time CF within the cluster based on the real-time click log, i.e. news clicked by cluster members is pushed to the other members of the same cluster. The other is to periodically compute, offline, the similarity between each cluster center and the candidate news, and write the results into each cluster's candidate set. In our experiments, in-cluster real-time CF worked better. A sketch of the offline variant follows.
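A rough sketch of the offline variant of cluster recall, using scikit-learn's MiniBatchKMeans; the vector dimensions, counts, and random data are placeholders, not production values.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
user_vecs = rng.random((10000, 64)).astype(np.float32)   # user tag vectors
news_vecs = rng.random((5000, 64)).astype(np.float32)    # candidate news vectors

# Cluster all users; the cluster count (e.g. 500) is chosen by the elbow method.
km = MiniBatchKMeans(n_clusters=500, batch_size=1024, n_init=3).fit(user_vecs)
user_cluster = km.labels_                 # each user's cluster assignment
centers = km.cluster_centers_             # one center vector per cluster

# Offline variant: score candidate news against every cluster center and
# store a per-cluster candidate list; users are then served their cluster's list.
scores = centers @ news_vecs.T            # (n_clusters, n_news) similarities
cluster_candidates = np.argsort(-scores, axis=1)[:, :100]
print(cluster_candidates[user_cluster[0]][:5])  # first user's cluster candidates
```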

Group-portrait recall first assigns users to groups, then pulls all user portraits within a group and fuses them into a single group portrait, effectively synthesizing the whole group into one person; personalized recall, of the kind applied to a single user portrait, is then applied to the group portrait.

LSTM clustering is similar to cluster recall, except that the user vector is produced by feeding the BERT vectors (tag2vec vectors also work) of the m articles the user most recently clicked into an LSTM. The remaining steps mirror cluster recall. This algorithm brings some improvement, but computation is slow and it is difficult to roll out widely.

DSSM clustering feeds the user portrait into a DSSM to obtain a 64-dimensional user vector, and the article portrait into the DSSM to obtain a 64-dimensional article vector. The remaining steps mirror cluster recall. This algorithm improved results significantly and has been deployed in production. A minimal two-tower sketch follows.
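Here is a minimal two-tower (DSSM-style) sketch in PyTorch; the feature dimensions, layer sizes, and the in-batch softmax training objective are illustrative assumptions, not the exact production setup described above.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """One side of the two-tower (DSSM-style) model: features -> 64-d vector."""
    def __init__(self, in_dim, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

    def forward(self, x):
        v = self.net(x)
        return v / v.norm(dim=-1, keepdim=True)  # unit vectors, cosine scoring

user_tower = Tower(in_dim=128)   # user-portrait feature width (assumed)
item_tower = Tower(in_dim=96)    # article-portrait feature width (assumed)

u = user_tower(torch.randn(32, 128))   # a batch of user-portrait features
i = item_tower(torch.randn(32, 96))    # a batch of article-portrait features

# In-batch softmax: each user's clicked article is the positive, the rest of
# the batch serve as negatives (one common training choice for two towers).
logits = u @ i.T
loss = nn.functional.cross_entropy(logits, torch.arange(32))
loss.backward()
```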

BNB clustering fuses the embeddings of several article features (tag, topic, category) into a single vector, which then plays the role of the article vector. The remaining steps mirror cluster recall; the algorithm brings some improvement, though not a very significant one.

Incremental clustering and dynamic rule clustering will be covered in more detail when the opportunity arises…

2.2.2 I2I recall scheme

With just a fastText + Faiss recall pipeline, several recall paths can be built, such as item2vec, media2vec, tag2vec, loc2vec, and title2vec.

Tag2vec uses word vectors for recall; for instance, tag vectors can be used to represent the article vector. If an article has four tags (keywords such as "Jiang Fan; divorce; Zhang Dayi; …"), our experience is to take the first 3 tags and add their vectors with equal weights, which worked best. Of course that is not the only way: embedding vectors can be combined in many ways, such as equal-weight addition, weighted addition, averaging, and taking the maximum.

Once the article vectors are obtained, the typical item2item computation follows: Faiss is used to find the similar articles of each article, e.g. 1000 candidates are retrieved per article, then a cut is made by similarity; weighting by other article features such as popularity, CTR, and freshness then yields the simplest tag2vec recall path. A sketch of this pipeline follows.
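A sketch of this tag2vec + Faiss pipeline, assuming hypothetical fastText-style tag vectors and per-article tag lists; the 0.85 similarity cut-off is illustrative, not a recommended value.

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 100
rng = np.random.default_rng(0)

# Hypothetical fastText-style tag vectors and per-article tag lists.
tags = [f"tag_{i}" for i in range(50)]
tag_vec = {t: rng.random(dim).astype("float32") for t in tags}
article_tags = {f"doc_{i}": list(rng.choice(tags, 3, replace=False))
                for i in range(1000)}

# Article vector = equal-weight sum of its first 3 tag vectors (as above).
ids = list(article_tags)
mat = np.stack([sum(tag_vec[t] for t in ts[:3]) for ts in article_tags.values()])
faiss.normalize_L2(mat)                 # so inner product equals cosine

index = faiss.IndexFlatIP(dim)          # exact inner-product index
index.add(mat)
sims, nbrs = index.search(mat[:1], 1000)  # 1000 candidates for the first article

# Truncate by similarity; downstream, the surviving candidates can be
# re-weighted by popularity, CTR, and freshness.
similar = [(ids[j], float(s)) for j, s in zip(nbrs[0], sims[0]) if s > 0.85]
print(similar[:5])
```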

The other recall paths follow the same routine; only the way the embedding vectors are trained differs slightly. tag2vec trains vectors for Chinese words; item2vec trains vectors for article IDs (aid); media2vec trains vectors for article author IDs (mid); loc2vec trains vectors for region names; title2vec is the article title vector trained with an LSTM; doc2vec is the vector of the article body (or abstract) computed with BERT; entity2vec is derived via TransE from a knowledge graph we built ourselves.

3. Some thoughts

The recommendation system is strongly tied to the business of the enterprise, especially the recall stage, so any recall strategy that meets the business requirements is a reasonable one. Discussing recall divorced from the business therefore feels like armchair theorizing. Even so, these recall strategies offer real guidance for practical work: borrow the ideas, combine them with the actual business, and good results will surely follow.

Finally: given the limits of my knowledge and experience, this article is bound to have shortcomings; criticism is sincerely welcome!
