Popularity-based recommendation suggests the most popular items to every user, so all users receive almost the same list. It is not personalized and has low coverage, which makes it poor at surfacing long-tail items. It is, however, a workable fallback for the cold-start case.
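To make the low-coverage point concrete, here is a minimal sketch. It assumes coverage means the fraction of catalog items that appear in at least one user's recommendation list. The `coverage` helper and the toy data are hypothetical, for illustration only.

```python
def coverage(recommendations, catalog):
    """Fraction of catalog items that appear in at least one user's list."""
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    return len(recommended & set(catalog)) / len(catalog)


# Hypothetical toy data: 10 movies, 3 users.
catalog = [f"m{i}" for i in range(10)]
# Popularity-based: every user gets the same top 2.
popular = {"u1": ["m0", "m1"], "u2": ["m0", "m1"], "u3": ["m0", "m1"]}
# A personalized recommender would typically spread over more of the catalog.
personalized = {"u1": ["m0", "m4"], "u2": ["m2", "m7"], "u3": ["m5", "m9"]}

print(coverage(popular, catalog))       # 0.2
print(coverage(personalized, catalog))  # 0.6
```

With identical lists for everyone, coverage cannot exceed the list length divided by the catalog size, no matter how many users there are.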
Taking movie rating data as an example, the steps are as follows:
- (1) Calculate the weighted rating of each movie;
- (2) Recommend the top-ranked movies that the user has not yet seen.
In step (1), the weighted-rating formula used by the IMDb website is adopted:
Weighted rating = [v/(v+m)] * R + [m/(v+m)] * C
Here v is the number of ratings the movie has received, m is the minimum number of ratings required (taken between the 85th and 95th percentile of the per-movie rating counts; a movie's rating is considered credible only when its number of ratings exceeds this value), R is the average rating of the movie, and C is the mean of the average ratings of all movies.
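As a quick worked example of the formula, the sketch below computes the weighted rating for a movie with few ratings and one with many. The helper names `weighted_rating` and `vote_threshold`, the nearest-rank percentile rule, and all numbers are assumptions made for illustration.

```python
def weighted_rating(v, R, m, C):
    """Weighted rating = [v/(v+m)] * R + [m/(v+m)] * C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


def vote_threshold(counts, percentile=90):
    """Pick m as roughly the given percentile of per-movie rating counts."""
    ordered = sorted(counts)
    # Simple nearest-rank style lookup (an assumption, other rules exist).
    index = min(len(ordered) * percentile // 100, len(ordered) - 1)
    return ordered[index]


# Hypothetical rating counts for ten movies.
counts = [3, 5, 8, 12, 20, 40, 75, 150, 400, 1000]
m = vote_threshold(counts)  # 1000 for this toy list
C = 3.5                     # assumed mean of all movies' average ratings

few_votes = weighted_rating(v=10, R=5.0, m=m, C=C)      # pulled toward C
many_votes = weighted_rating(v=5000, R=4.5, m=m, C=C)   # stays close to R
print(round(few_votes, 2), round(many_votes, 2))
```

A movie with a perfect average but only 10 ratings ends up near the global mean, while a movie with 5000 ratings keeps most of its own average. That is the behavior the threshold m is meant to produce.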
Code implementation:
import os
import random
from operator import itemgetter
from utils.movielen_read import loadfile

# Directory that holds the MovieLens data, relative to this file.
base_path = os.path.dirname(os.path.abspath(__file__)) + "/../../data/"

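def load_data(train_rate=1):
    """Split ratings.dat into per-movie training ratings and per-user test items."""
    train_items_ratings = {}
    test_data = {}
    for line in loadfile(base_path + "ml-1m/ratings.dat"):
        # ml-1m line format: UserID::MovieID::Rating::Timestamp
        arr = line.split("::")
        if random.random() < train_rate:
            # Keep every rating in a list: a set would collapse repeated
            # scores and distort both the mean and the rating count.
            train_items_ratings.setdefault(arr[1], [])
            train_items_ratings[arr[1]].append(arr[2])
        else:
            test_data.setdefault(arr[0], set())
            test_data[arr[0]].add(arr[1])
    return train_items_ratings, test_data
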
def calculate_weigh(items_ratings):
    """Compute the weighted rating of every item and rank items by it."""
    items_weigh = {}
    # item -> [average rating, number of ratings]
    items_means = {}
    sum_rating = 0
    sum_rating_times = 0
    for item, values in items_ratings.items():
        total = 0  # avoid shadowing the built-in sum()
        for value in values:
            total += int(value)
        mean_rating = total / len(values)
        sum_rating += mean_rating
        sum_rating_times += len(values)
        items_means[item] = [mean_rating, len(values)]
    # C: mean of the per-item average ratings
    all_mean_rating = sum_rating / len(items_ratings)
    # m: this code uses the mean number of ratings per item as the threshold
    all_mean_rating_time = sum_rating_times / len(items_ratings)
    for item, values in items_means.items():
        # Weighted rating = [v/(v+m)] * R + [m/(v+m)] * C
        weigh = (values[1] / (values[1] + all_mean_rating_time)) * values[0] + \
                (all_mean_rating_time / (values[1] + all_mean_rating_time)) * all_mean_rating
        items_weigh[item] = weigh
    # Highest weighted rating first
    items_weigh = sorted(items_weigh.items(), key=itemgetter(1), reverse=True)
    print(items_weigh)
    return items_weigh
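
The listing above covers step (1) only. Step (2) could be sketched as follows, assuming the ranking is a list of (item, weight) pairs sorted by weight in descending order, which is the shape the listing prints. The `recommend` function and the sample data are hypothetical and not part of the original code.

```python
from itertools import islice


def recommend(ranked_items, seen, n=10):
    """Return the n highest-weighted items the user has not rated yet."""
    unseen = ((item, weight) for item, weight in ranked_items if item not in seen)
    return list(islice(unseen, n))


# Hypothetical ranking and viewing history, for illustration only.
ranked = [("318", 4.5), ("858", 4.4), ("50", 4.3), ("527", 4.2)]
seen_by_user = {"858", "527"}

top = recommend(ranked, seen_by_user, n=2)
print(top)  # [('318', 4.5), ('50', 4.3)]
```

Because the ranking is shared by all users, the only per-user work is filtering out already-seen items, which is why the resulting lists differ so little from one user to the next.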