This is the fourth day of my participation in the November Writing Challenge. See the event details: The Last Writing Challenge of 2021

In a recommendation system, one important processing step applied to the final recalled candidate pool is scattering: spreading out items of the same type in the candidate list. For example, if the short videos recommended in the TikTok feed were not scattered to some degree, users would likely see videos of the same type piled up one after another, with no sense of variety.

Scattering is an important piece of data-processing logic in a recommendation system and the main way it avoids clustering similar items together. This article introduces the simplest scattering algorithm: single-dimension polling.

Single-dimension polling groups data along one dimension. For example, in a short video recommendation service, the dimension can be the author: videos by the same author must not appear next to each other.

Below are the rules I defined to make sure that, in the video list I return, works by the same author never appear next to each other.

Short video scatter rules

The basic rule

  • Each video belongs to exactly one author
  • Videos by the same author must not be adjacent
  • The number of videos returned per request is configurable
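One way to make the three basic rules concrete is to express them as a check that any returned list must pass. The Python sketch below is only an illustration. The names check_scatter_rules, author_of (a map from video ID to author ID) and size are hypothetical and are not part of the original design.

```python
from typing import Dict, List

def check_scatter_rules(result: List[int], author_of: Dict[int, str], size: int) -> List[str]:
    """Return the basic-rule violations found in `result` (an empty list means OK)."""
    problems: List[str] = []
    # Rule 3: at most `size` videos are returned per request.
    if len(result) > size:
        problems.append(f"{len(result)} videos returned, limit is {size}")
    # Rule 2: two neighboring videos must not share an author.
    for prev, cur in zip(result, result[1:]):
        # Rule 1: every video maps to exactly one author via author_of.
        if author_of[prev] == author_of[cur]:
            problems.append(f"videos {prev} and {cur} are both by {author_of[cur]}")
    return problems
```

For example, with authors {1: "A", 2: "A", 3: "B"}, the list [1, 3, 2] passes. The list [1, 2, 3] is reported because videos 1 and 2 share author A.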

Video cursor rules

  • The cursor is a video ID, and by default paging starts from the beginning. On each request, the cursor lets the service return only videos the user has not seen yet. This fixes a problem with paged queries: when new videos arrive, older ones are pushed onto the next page, so the user would see them twice.
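To show why a cursor avoids duplicates, here is a minimal Python sketch that compares offset paging with cursor paging over an in-memory list. It makes two assumptions: the feed is ordered newest first, and the cursor is the ID of the last video the client received. The functions page_by_offset and page_by_cursor are hypothetical names.

```python
from typing import List, Optional

def page_by_offset(feed: List[int], page: int, size: int) -> List[int]:
    # Offset paging: a page is a fixed slice, so prepending new videos
    # shifts older ones onto the next page.
    start = page * size
    return feed[start:start + size]

def page_by_cursor(feed: List[int], cursor: Optional[int], size: int) -> List[int]:
    # Cursor paging: resume right after the last video ID the client saw.
    # cursor=None means "start from the beginning" (the default).
    # Assumption: the cursor video is still in the feed (list.index raises ValueError otherwise).
    start = 0 if cursor is None else feed.index(cursor) + 1
    return feed[start:start + size]

if __name__ == "__main__":
    feed = [10, 9, 8, 7, 6, 5]                        # hypothetical IDs, newest first
    first_page = page_by_cursor(feed, None, 3)
    print(first_page)                                 # [10, 9, 8]
    feed = [11] + feed                                # a new video arrives before page 2 is requested
    print(page_by_offset(feed, 1, 3))                 # [8, 7, 6]: video 8 is shown again
    print(page_by_cursor(feed, first_page[-1], 3))    # [7, 6, 5]: no repeats
```

In a database-backed service, the same idea is usually written as a range condition on the ID rather than a list lookup. That only works if IDs are ordered the same way as the feed.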

Fixed bit video insertion rules

  • A flag controls whether a fixed video is displayed
  • The fixed video can be placed at any index in the list, not necessarily at the top

Basic rule diagram

  • First, query the initial video list.

  • Look up the relationship between each video and its author.

  • Initialize the result list. Its size is controlled by the size parameter.

  • When adding the first video, add its video ID to the result list and its author ID to the author's temporary list.

  • When adding the second video, if the author's temporary list already contains the same author, do not add it. Save its video ID to the to-be-added queue instead.

  • For the third video, if the author's temporary list does not contain its author, add the video to the result list and record its author in the author's temporary list.

  • After one full pass, clear the author's temporary list and use the to-be-added queue as the new video list. Repeat the logic above, adding videos to the result list until the to-be-added queue is empty. Then exit the recursion and return the final result list.
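The steps above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation, which the article says will come later. The names scatter and author_of are hypothetical. The passes use a loop instead of recursion, and the sketch also stops once the result list holds size videos.

```python
from typing import Dict, List

def scatter(videos: List[int], author_of: Dict[int, str], size: int) -> List[int]:
    """Single-dimension polling by author, one pass per round."""
    result: List[int] = []
    pending = list(videos)                 # video list for the current pass
    while pending and len(result) < size:
        seen_authors = set()               # author's temporary list, cleared every pass
        to_be_added: List[int] = []        # queue to be added (input to the next pass)
        for vid in pending:
            author = author_of[vid]
            if author in seen_authors or len(result) >= size:
                to_be_added.append(vid)    # author already used this pass, or list full
            else:
                result.append(vid)
                seen_authors.add(author)
        pending = to_be_added
    return result

if __name__ == "__main__":
    author_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "A"}
    print(scatter([1, 2, 3, 4, 5, 6], author_of, size=6))  # [1, 3, 5, 2, 4, 6]
```

Each pass takes at most one video per author, so within a pass no two neighbors share an author. Because the temporary list is cleared between passes, the last video of one pass and the first video of the next can still collide. For example, with videos [3, 1, 2, 4] by authors B, A, A, B, the result is [3, 1, 2, 4], where 1 and 2 are both by A. Carrying the previous video's author into the next pass would be one way to close that gap.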

Fixed video rule illustration

  • Place the fixed video's ID at its designated index in the initial result list.

  • Put the fixed video's author into the author's temporary list.

  • The add-video logic gains a check for whether the current index already holds a video. Combined with the rule above that videos by the same author must not be adjacent, the video placed after 16 is 3, and its author B is recorded in the author's temporary list.

  • In the first pass, only videos 3 and 5 go into the result list. The rest go into the to-be-added queue.

  • Keep applying the basic rules, pass after pass, until the result list is full. Then stop adding.
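The fixed-position variant can be sketched in Python in the same hedged spirit. The names scatter_with_fixed, fixed (a map from index to video ID) and show_fixed are hypothetical. Two choices go beyond the description above and are assumptions. First, a candidate is compared with the videos actually placed on either side of its slot, whether fixed or not. Second, if a whole pass places nothing, the leftovers are left out rather than placed next to a video by the same author.

```python
from typing import Dict, List, Optional

def scatter_with_fixed(videos: List[int], author_of: Dict[int, str], size: int,
                       fixed: Dict[int, int], show_fixed: bool = True) -> List[int]:
    slots: List[Optional[int]] = [None] * size   # initial result list
    seen = set()                                 # author's temporary list
    if show_fixed:                               # flag controlling fixed videos
        for index, vid in fixed.items():
            if 0 <= index < size:
                slots[index] = vid               # pin the fixed video at its index
                seen.add(author_of[vid])         # its author is blocked in the first pass
    placed_ids = {v for v in slots if v is not None}
    pending = [v for v in videos if v not in placed_ids]

    def next_free(pos: int) -> int:
        # Skip indexes that already hold a video (e.g. a fixed one).
        while pos < size and slots[pos] is not None:
            pos += 1
        return pos

    def clashes(pos: int, author: str) -> bool:
        # Assumption: check the real neighbors on both sides of the slot.
        for n in (pos - 1, pos + 1):
            if 0 <= n < size and slots[n] is not None and author_of[slots[n]] == author:
                return True
        return False

    pos = next_free(0)
    first_pass = True
    while pending and pos < size:
        if not first_pass:
            seen = set()                         # clear the temporary list each pass
        to_be_added: List[int] = []
        placed = 0
        for vid in pending:
            author = author_of[vid]
            if pos >= size or author in seen or clashes(pos, author):
                to_be_added.append(vid)
                continue
            slots[pos] = vid
            seen.add(author)
            placed += 1
            pos = next_free(pos + 1)
        if placed == 0 and not first_pass:
            break                                # assumption: leftovers cannot be placed safely
        first_pass = False
        pending = to_be_added
    return [v for v in slots if v is not None]   # drop slots left empty
```

Filling slots greedily in list order keeps the logic simple, but it can leave videos out even when a valid arrangement exists. A production version might retry those videos on the next page.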

Summary

This article uses diagrams to illustrate a simple custom scattering rule that makes the returned video list more varied. The concrete code implementation will follow in a later article.
