Understanding popularity
Popularity
Popularity of content, also known as heat, is most commonly used by recommending the hottest items on a list to users (Weibo hot searches, TopN products)
Popularity-based recommendation is any recommendation model built around a popularity calculation (not only TopN)
It addresses the cold start problem => the algorithm recommends items according to popularity, that is, whatever content appeals to users is the content recommended to users
Popularity is a way to measure how popular an item is. Whether it improves the recommendation results requires case-by-case analysis
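A minimal sketch of TopN popularity-based recommendation, assuming popularity is simply the number of interactions per item. The function name `top_n_popular`, the `exclude` parameter, and the sample data are illustrative and do not come from the source:

```python
from collections import Counter

def top_n_popular(interactions, n=3, exclude=()):
    """Return the n most popular item ids.

    interactions: iterable of (user_id, item_id) pairs.
    Popularity is assumed here to be the raw interaction count per item.
    """
    counts = Counter(item for _, item in interactions)
    excluded = set(exclude)
    # Sort by count (descending), then by item id so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked if item not in excluded][:n]

# Made-up interaction log: A has 3 interactions, B has 2, C has 1.
interactions = [
    ("u1", "A"), ("u2", "A"), ("u3", "A"),
    ("u1", "B"), ("u2", "B"),
    ("u3", "C"),
]
print(top_n_popular(interactions, n=2))  # ['A', 'B']
```

Other popularity measures (time-decayed counts, click-through rate, and so on) could replace the raw count without changing the overall shape of the function.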
Popularity analysis in MovieLens
Ratings of 5 and 1 make up a small share of all ratings, but they are the most valuable to the system
(On YouTube, ratings of 5 and 1 are likewise the most valuable)
Only a small percentage of items have high popularity
Ratings of low-popularity items vary greatly (less popular => users judge by their own preferences)
Ratings of high-popularity items vary little (more popular => the group’s preference has a large influence)
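One way to check the relationship between popularity and rating spread is to compute, per item, the number of ratings and their standard deviation. The sketch below assumes ratings arrive as (user, item, rating) tuples. The helper `item_stats` and the toy data are made up for illustration and do not reproduce the MovieLens figures:

```python
from collections import defaultdict
from statistics import mean, pstdev

def item_stats(ratings):
    """Map item_id -> (rating count, mean rating, population std dev).

    ratings: iterable of (user_id, item_id, rating) tuples.
    """
    by_item = defaultdict(list)
    for _, item, rating in ratings:
        by_item[item].append(rating)
    return {
        item: (len(rs), mean(rs), pstdev(rs))
        for item, rs in by_item.items()
    }

# Toy data (made up): "hit" is rated by many users, "niche" by few.
ratings = [
    ("u1", "hit", 4), ("u2", "hit", 4), ("u3", "hit", 5), ("u4", "hit", 4),
    ("u1", "niche", 1), ("u2", "niche", 5),
]
for item, (count, avg, spread) in sorted(item_stats(ratings).items()):
    print(item, count, round(avg, 2), round(spread, 2))
```

On a real data set, the rating count would serve as the popularity axis and the standard deviation as the spread to compare across popularity bands.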
Understanding popularity trends from data sets:
MovieLens dataset
Movies with high, medium, and low ratings show similar trends over time
Ratings tend to rise first and then fall over time
=> How ratings change over time is very important
Items with high popularity fluctuate less in rating, whereas items with low popularity fluctuate more
=> Users’ herd mentality
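A rating trend over time can be estimated by bucketing ratings by item age and averaging each bucket. The sketch below assumes that "time" means the time elapsed since an item's first rating and that buckets are 30 days wide. Both choices, the function `mean_rating_by_age`, and the toy data are assumptions for illustration only:

```python
from collections import defaultdict
from statistics import mean

DAY = 24 * 3600

def mean_rating_by_age(ratings, bucket_seconds=30 * DAY):
    """Average rating per age bucket, where age is time since the item's first rating.

    ratings: iterable of (item_id, rating, unix_timestamp) tuples.
    Returns {bucket_index: mean rating}; bucket 0 is each item's first period.
    """
    ratings = list(ratings)
    first_seen = {}
    for item, _, ts in ratings:
        first_seen[item] = min(ts, first_seen.get(item, ts))
    buckets = defaultdict(list)
    for item, rating, ts in ratings:
        age = (ts - first_seen[item]) // bucket_seconds
        buckets[age].append(rating)
    return {age: mean(rs) for age, rs in sorted(buckets.items())}

# Toy data (made up) shaped to rise and then fall.
ratings = [
    ("m1", 3, 0), ("m1", 5, 40 * DAY), ("m1", 4, 70 * DAY),
    ("m2", 3, 10 * DAY), ("m2", 5, 45 * DAY),
]
print(mean_rating_by_age(ratings))  # {0: 3, 1: 5, 2: 4}
```

Running the same aggregation separately for high-, medium-, and low-rated movies would allow the three trend curves to be compared.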
Popularity-based recommendations (cold start, personalized recommendations)
Cold start problem
• Use non-personalized recommendations when user behavior information is insufficient
• The essence of the algorithm: recommend to users whatever appeals to users
• Recommended items also need to be representative and discriminative, i.e. not too generic or suited to all ages => otherwise they cannot distinguish between users’ interests
• Diversity: users’ interests can take many forms, so the recommendations should match that diversity => provide a high-coverage set of startup items (items that cover mainstream user interests)
• MovieLens, an open data set
• Data sets crowdsourced from crowdsourcing sites do not exhibit the common biases
• Matthew effect => not evident in the crowdsourced MovieLens data
• The long tail => universal. Users’ niche needs add up to something very important
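The idea of a high-coverage set of startup items can be sketched with a greedy maximum-coverage heuristic: repeatedly pick the item that adds the most users not yet covered. This heuristic is a stand-in chosen for illustration, not a method named in the source, and `greedy_startup_items` and the toy data are hypothetical:

```python
def greedy_startup_items(item_to_users, k):
    """Greedily pick up to k items that together cover the most distinct users.

    item_to_users: dict mapping item_id -> set of user_ids who liked the item.
    Returns (chosen item ids in pick order, set of covered user ids).
    """
    covered = set()
    chosen = []
    candidates = dict(item_to_users)
    for _ in range(k):
        # Pick the item adding the most not-yet-covered users (ties: smallest id).
        best = max(
            sorted(candidates),
            key=lambda item: len(candidates[item] - covered),
            default=None,
        )
        if best is None or not (candidates[best] - covered):
            break  # nothing left that would cover a new user
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen, covered

# Toy data (made up): the two most popular items appeal to the same users.
item_to_users = {
    "blockbuster": {"u1", "u2", "u3", "u4"},
    "sequel": {"u1", "u2", "u3"},
    "indie": {"u5", "u6"},
    "documentary": {"u6"},
}
chosen, covered = greedy_startup_items(item_to_users, k=2)
print(chosen, len(covered))  # ['blockbuster', 'indie'] 6
```

A plain top-2 by popularity would return "blockbuster" and "sequel" and reach only four of the six users, which is the lack of differentiation the bullets above warn about.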
How to use popularity
Popularity addresses the problem of insufficient user information when new users first register: without that information we cannot provide a good recommendation service, so we recommend what is hot according to popularity trends. Solving cold start and data sparsity is the biggest direction of work in recommendation systems. On the one hand, the approach starts from the characteristics of the recommendation system itself.
1. For new users, non-personalized recommendation (popularity-based recommendation) is adopted. 2
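The rule for new users can be sketched as a simple fallback: if a user has too little history, serve the popularity list, otherwise call a personalized model. The threshold `min_events`, the function `recommend`, and the stub `personalized` model below are all hypothetical:

```python
def recommend(user_id, history, personalized, popular_items, n=3, min_events=5):
    """Use the popularity list for new users, a personalized model otherwise.

    history: dict user_id -> list of item ids the user has interacted with.
    personalized: callable (user_id, n) -> list of item ids (hypothetical model).
    popular_items: item ids sorted from most to least popular.
    min_events: assumed threshold below which a user counts as "new".
    """
    seen = history.get(user_id, [])
    if len(seen) < min_events:
        # Cold start: not enough behavior data, fall back to popularity.
        return [item for item in popular_items if item not in seen][:n]
    return personalized(user_id, n)

# Toy setup (made up): one established user, one new user.
history = {"old": ["A", "B", "C", "D", "E"], "new": ["A"]}
popular_items = ["A", "B", "C", "D"]

def personalized(user_id, n):
    # Stand-in for a real personalized model.
    return ["X", "Y", "Z"][:n]

print(recommend("new", history, personalized, popular_items))  # ['B', 'C', 'D']
print(recommend("old", history, personalized, popular_items))  # ['X', 'Y', 'Z']
```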
On the other hand, we need to take into account the characteristics of each website: different websites have different goals and are intended to solve different problems.
· E-commerce websites, such as the flash-sale site Vipshop, aim to create best-selling products
· Dating websites, such as Jiayuan.com, aim to get more users active
Architecture of the recommendation system
• Online part: the recall phase narrows about 1 million items => about 1,000; if there are still too many items, a coarse-ranking step can follow. The ranking phase then uses a relatively complex model to rank the small number of remaining items. Business policies (removing items the user has already read, adding commercial ads) are also applied before the results are presented to users.
• Nearline part: collect user behavior feedback in real time, select training instances, extract features => update the online recommendation model
• Offline part: organize the offline training data => periodically update the recommendation model
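The online part can be sketched as a chain of stages: recall, optional coarse ranking, ranking, and a business policy filter. The function `online_pipeline`, its parameters, and the toy scoring functions are assumptions for illustration. Ad insertion and the nearline and offline model updates are left out:

```python
import heapq

def online_pipeline(user, catalog, recall, coarse_score, fine_score,
                    read_items, coarse_size=100, n=10):
    """Recall -> coarse ranking -> ranking -> business policy, for one request."""
    # 1. Recall: cheaply narrow the full catalog to a candidate set.
    candidates = recall(user, catalog)
    # 2. Coarse ranking: a cheap score trims the set if it is still too large.
    if len(candidates) > coarse_size:
        candidates = heapq.nlargest(
            coarse_size, candidates, key=lambda item: coarse_score(user, item)
        )
    # 3. Ranking: a more expensive model orders the few remaining items.
    ranked = sorted(candidates, key=lambda item: fine_score(user, item), reverse=True)
    # 4. Business policy: drop items the user has already read.
    return [item for item in ranked if item not in read_items][:n]

# Toy stages (made up): items are integers, scores are simple arithmetic.
catalog = list(range(1000))
result = online_pipeline(
    user="u1",
    catalog=catalog,
    recall=lambda user, items: [i for i in items if i % 10 == 0],
    coarse_score=lambda user, item: item,
    fine_score=lambda user, item: -abs(item - 972),
    read_items={980},
    coarse_size=5,
    n=3,
)
print(result)  # [970, 960, 990]
```

Keeping each stage as a plain callable means the ranking model can be swapped when the nearline or offline part publishes an update, without touching the request flow.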