This is the fourth day of my participation in the November Gwen Challenge.
In a recommendation system, after the candidate pool has been recalled, there is one more important processing step: scattering the different types of items in the candidate list. For example, if the short videos recommended in the TikTok feed are not scattered to some extent, videos of the same type are likely to pile up one after another, and the feed loses any sense of variety.
Scattering is an important piece of data-processing logic in a recommendation system and the main way it avoids clusters of similar items. This article introduces the simplest polling-based scatter algorithm: single-dimension polling.
Single-dimension polling scatters data along one dimension. For example, in a short-video recommendation service, videos by the same author must not appear next to each other.
Below are some rules I defined to make sure that the returned video list does not show works by the same author next to each other.
Short video scatter rules
Basic rules
- Each video corresponds to one author
- Videos by the same author are not contiguous
- The number of videos displayed at a time can be controlled
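The three basic rules can be expressed as a small checker, which is handy for testing any scatter implementation. This is only a sketch: the function name `satisfies_basic_rules` and the `author_of` mapping (video ID to author ID) are assumptions made for illustration, not part of the original design.

```python
from typing import Dict, List


def satisfies_basic_rules(result: List[int], author_of: Dict[int, str], size: int) -> bool:
    """Return True if `result` obeys the three basic rules (hypothetical helper)."""
    # The number of videos displayed at a time is capped by `size`.
    if len(result) > size:
        return False
    # Each video must correspond to an author.
    if any(video_id not in author_of for video_id in result):
        return False
    # Videos by the same author must not sit next to each other.
    return all(author_of[a] != author_of[b] for a, b in zip(result, result[1:]))
```

For example, with `author_of = {1: "A", 2: "B", 3: "A"}`, the list `[1, 2, 3]` passes, while `[1, 3, 2]` fails because videos 1 and 3 share an author.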
Video cursor rules
- The cursor is a video ID and starts from the beginning by default. Each request uses the cursor to fetch only videos that have not been seen yet. This solves a problem with page-number queries: when new videos arrive, older videos are pushed to the next page and show up again as duplicates.
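The cursor rule might look like the following sketch. It rests on assumptions that the text does not state: video IDs increase over time, the feed is served newest first, and a cursor of `0` means "start from the newest video". The name `fetch_page` is hypothetical.

```python
from typing import List, Tuple


def fetch_page(video_ids: List[int], cursor: int = 0, size: int = 3) -> Tuple[List[int], int]:
    """Return up to `size` unseen video IDs and the cursor for the next call."""
    newest_first = sorted(video_ids, reverse=True)
    if cursor:
        # Only videos older than the last one already returned are still unseen.
        newest_first = [v for v in newest_first if v < cursor]
    page = newest_first[:size]
    # The next cursor is the last (oldest) video ID handed out; keep the old one if the page is empty.
    next_cursor = page[-1] if page else cursor
    return page, next_cursor


store = [1, 2, 3, 4, 5]
first_page, cursor = fetch_page(store)           # [5, 4, 3], cursor 3
store.append(6)                                  # a new video arrives between requests
second_page, cursor = fetch_page(store, cursor)  # [2, 1], no repeat of video 3
```

With page-number paging over the same data, the second page would be `[3, 2, 1]` once video 6 arrives, so video 3 would be shown twice. Filtering by the cursor avoids that.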
Fixed-position video insertion rules
- A flag controls whether fixed videos are displayed
- A fixed video can be placed anywhere in the list, not only at the top
Basic rule diagram
- First, query the Video list, as shown below
- Look up the relationship between each video and its author, and initialize the Result list. The size of the list is controlled by the size parameter
- When the first video is added, its video ID is put in the Result list and its author ID is added to the Author's temporary list
- When the second video is added, if the Author's temporary list already contains the same author, the video is not added; its video ID is saved to the Queue to be added instead
- When the third video is added, if the Author's temporary list does not contain the same author, the video is added in the same way as the first one
- After one full pass, clear the Author's temporary list and take the Queue to be added as the new video list. Repeat the logic above, adding videos to the Result list, until the Queue to be added is empty. Then exit the recursion and return the final Result list.
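The passes described above can be sketched as follows. This is a minimal illustration rather than the author's implementation: the names `scatter_by_author` and `author_of` are assumptions, a `while` loop stands in for the recursion, and a Python `set` plays the role of the Author's temporary list.

```python
from typing import Dict, List


def scatter_by_author(video_ids: List[int], author_of: Dict[int, str], size: int) -> List[int]:
    """Single-dimension polling: take at most one video per author in each pass."""
    result: List[int] = []
    pending = list(video_ids)
    while pending and len(result) < size:
        seen_authors = set()    # Author's temporary list, empty at the start of each pass
        queue: List[int] = []   # Queue to be added, used as the video list of the next pass
        for video_id in pending:
            if len(result) >= size:
                break           # Result list is full
            author = author_of[video_id]
            if author in seen_authors:
                queue.append(video_id)   # same author already used in this pass
            else:
                result.append(video_id)
                seen_authors.add(author)
        pending = queue
    return result


# Hypothetical data: videos 1, 2 and 4 belong to author A.
authors = {1: "A", 2: "A", 3: "B", 4: "A", 5: "C"}
scattered = scatter_by_author([1, 2, 3, 4, 5], authors, size=4)  # [1, 3, 5, 2]
```

Each pass always accepts its first video, so the loop terminates. Note that the temporary list is cleared between passes, so the last video of one pass and the first video of the next can still share an author. With `size=5` the sample data yields `[1, 3, 5, 2, 4]`, where videos 2 and 4 are both by author A. The procedure as described does not address that boundary case.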
Fixed video rule illustration
- A fixed video is placed first: its video ID is put at the corresponding index position of the initial Result list, and its author information is put in the Author's temporary list
- The add-video logic gains a check for whether the current index already holds a video. Combined with the rule above that videos by the same author are not contiguous, the video after 16 will be 3, and author B is recorded in the Author's temporary list, as shown in the following figure
- On the first pass, only videos 3 and 5 go into the Result list. The rest go into the Queue to be added. The result is as follows
- Continue looping with the basic rules until the Result list is full, then stop adding. The final result is as shown
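The fixed-video variant can be sketched in the same style. Everything here is an assumption for illustration: the function name `scatter_with_fixed`, the `fixed` mapping from list index to video ID, and the sample data, which is made up and is not the data from the figures. Passing an empty `fixed` mapping corresponds to turning the fixed-video flag off.

```python
from typing import Dict, List, Optional


def scatter_with_fixed(video_ids: List[int], author_of: Dict[int, str],
                       size: int, fixed: Dict[int, int]) -> List[int]:
    """Scatter by author while keeping fixed videos at their index positions."""
    slots: List[Optional[int]] = [None] * size
    seen_authors = set()  # Author's temporary list
    # Place fixed videos first and record their authors.
    for index, video_id in fixed.items():
        if 0 <= index < size:
            slots[index] = video_id
            seen_authors.add(author_of[video_id])

    fixed_ids = set(fixed.values())
    pending = [v for v in video_ids if v not in fixed_ids]
    position = 0  # next index to fill
    while pending and position < size:
        queue: List[int] = []  # Queue to be added
        for video_id in pending:
            # Skip indexes that already hold a video.
            while position < size and slots[position] is not None:
                position += 1
            if position >= size:
                break  # Result list is full
            author = author_of[video_id]
            if author in seen_authors:
                queue.append(video_id)
            else:
                slots[position] = video_id
                seen_authors.add(author)
                position += 1
        pending = queue
        seen_authors = set()  # cleared after every pass
    # Drop unfilled slots if there were not enough videos.
    return [v for v in slots if v is not None]


# Hypothetical data: fixed video 16 (author B) sits at index 0.
authors = {1: "B", 2: "B", 3: "A", 4: "A", 5: "C", 16: "B"}
feed = scatter_with_fixed([1, 2, 3, 4, 5], authors, size=4, fixed={0: 16})  # [16, 3, 5, 1]
```

In this sample, the first pass skips videos 1 and 2 because author B is already recorded for the fixed video, so only 3 and 5 are added. The second pass then fills the last slot with video 1. Following the text, the fixed video's author is recorded only before the first pass.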
Summary
This article uses diagrams to illustrate a simple custom scatter rule that gives the returned video list more variety. The specific code implementation will be given later.