Machine Learning 033 - Building a Movie Recommendation System
(Python libraries and versions used in this article: Python 3.6, NumPy 1.14, scikit-learn 0.19, Matplotlib 2.2)
The most critical internal component of a movie recommendation system is the recommendation engine. Like the engine of a car, it supplies the power: it performs the data computation that drives the whole system. In essence, a recommendation engine is a model that predicts what a user will be interested in. Recommendation engines differ with the requirements of each project; this article covers how to build one specifically for a movie recommendation system.
Recommendation engines matter wherever a catalog is too large to browse. An e-commerce website, for example, often has a huge catalog of products, and users are unlikely to find everything relevant on their own. A recommendation engine fills that gap by selecting products a user may be interested in and showing them on that user's pages. A familiar case from online shopping: when you open the page of a laptop, the site suggests a mouse, a keyboard, and other accessories. A recommendation engine is what produces that list.
1. Find similar users in the data set
A very important task for recommendation engines is to find similar users, so that recommendations generated for one user can be pushed to other users who are similar to them.
The following code finds the users most similar to a particular user, using the Pearson correlation coefficient function (pearson_score) from the previous article. The idea is as follows: first check that the user exists in the dataset, then compute the correlation coefficient between this user and every other user and store the results in an array, then sort the array in descending order of similarity and take the first K rows. These are the K users most similar to the given user.
import numpy as np

def find_similar_users(dataset, user, user_num=3):
    # Verify that the user is in the dataset first
    if user not in dataset:
        raise TypeError('User {} not in dataset!'.format(user))
    # For every other user, calculate the similarity to user (Pearson correlation here).
    # The results are stored in the two-dimensional array scores so that
    # similar users can be found by sorting.
    scores = np.array([[other_user, pearson_score(dataset, user, other_user)]
                       for other_user in dataset if other_user != user])
    # argsort gives the row indices in ascending order; [::-1] reverses them.
    # The array holds strings, so the score column is converted to float first.
    scores_sorted = np.argsort(scores[:, 1].astype(float))[::-1]
    # Keep the user_num most similar users
    top_users = scores_sorted[:user_num]
    return scores[top_users]  # Rows of [user name, similarity score]
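The function above depends on pearson_score, which is defined in the previous article and is not reproduced here. For readers who want to run the code in isolation, the following is a minimal sketch of such a function, assuming the same two-level layout (user name, then movie name and rating). The name pearson_score_sketch and the toy data are made up for illustration, and the version in the previous article may differ in detail.

```python
import numpy as np

def pearson_score_sketch(dataset, user1, user2):
    """Pearson correlation over the movies both users have rated.

    Hypothetical stand-in for the pearson_score function of the previous article.
    """
    common = [movie for movie in dataset[user1] if movie in dataset[user2]]
    if len(common) < 2:
        return 0.0  # too few shared ratings to measure a correlation
    x = np.array([dataset[user1][movie] for movie in common], dtype=float)
    y = np.array([dataset[user2][movie] for movie in common], dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    if denom == 0:
        return 0.0  # one of the users gave the same rating to every shared movie
    return float(np.sum(dx * dy) / denom)

# Invented toy data: B rates in the same direction as A, C in the opposite direction
toy = {
    'A': {'m1': 1, 'm2': 2, 'm3': 3},
    'B': {'m1': 2, 'm2': 4, 'm3': 6},
    'C': {'m1': 3, 'm2': 2, 'm3': 1},
}
print(pearson_score_sketch(toy, 'A', 'B'))  # perfectly correlated
print(pearson_score_sketch(toy, 'A', 'C'))  # perfectly anti-correlated
```

Returning 0 when there is not enough shared data is a design choice of this sketch: such users then count as neither similar nor dissimilar.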
After importing movie_ratings data, the four users most similar to John Carson are calculated as follows:
# Use movie data to find similar users
import json
# Raw string so that the backslashes in the Windows path are not treated as escapes
with open(r"E:\PyProjects\DataSet\FireAI\movie_ratings.json", 'r') as file:
    dataset = json.loads(file.read())
user = 'John Carson'
similar_users = find_similar_users(dataset, user, 4)
print('Users similar to {}---->>>'.format(user))
print('User\t\t\tSimilarity Score\n')
for item in similar_users:
    print('{}\t\t{}'.format(item[0], round(float(item[1]), 3)))
-------------------- Output --------------------
Users similar to John Carson---->>>
User			Similarity Score

Michael Henry		0.991
Alex Roberts		0.747
Melissa Jones		0.594
Jillian Hobart		0.567
------------------------------------------------
2. Build a movie recommendation engine
Suppose we have ratings from multiple users for a set of movies. How do we build a movie recommendation engine that recommends other relevant movies to users who have already seen some of them?
The rating data is stored in the movie_ratings.json file, which has the user name as the first level and, under each user, the movie names and that user's ratings as the second level. The logic for recommending movies to user A is as follows. First find the users who are similar to user A. Then collect the movies that those similar users have rated but user A has not. These are movies user A has not seen but similar users have, and the recommendations are chosen from this set. To choose among them we need a selection criterion: a recommendation score for each movie, obtained by multiplying the similarity by the movie's rating. The reasoning is that the more similar a user is to user A, the more that user's ratings should count, and the higher that user rates a movie, the better the movie is likely to be and the more strongly it should be recommended to user A.
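To make the data layout and the score concrete, here is a small worked example. It is only a sketch: the user names, movie titles, ratings, and similarity values are all invented, and the similarities are hard-coded where the real code would call pearson_score.

```python
# Toy data in the same two-level layout as movie_ratings.json:
# user name -> {movie title -> rating}. Every value here is invented.
ratings = {
    'User A': {'Movie 1': 4.0},
    'User B': {'Movie 1': 4.5, 'Movie 2': 5.0},
    'User C': {'Movie 1': 3.0, 'Movie 2': 2.0, 'Movie 3': 4.0},
}

# Assumed similarity of each other user to User A
# (in the article this would come from pearson_score).
similarity = {'User B': 0.9, 'User C': 0.4}

# Candidate movies: rated by a similar user but not by User A.
# Each candidate gets the score similarity * rating.
candidates = []
for other_user, sim in similarity.items():
    for movie, rating in ratings[other_user].items():
        if movie not in ratings['User A']:
            candidates.append((other_user, movie, round(sim * rating, 2)))

for other_user, movie, score in candidates:
    print(other_user, movie, score)
```

Movie 1 never appears among the candidates because User A has already rated it. Movie 2 gets a much higher score from User B (0.9 * 5.0) than from User C (0.4 * 2.0), which is the effect the selection criterion is meant to produce.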
Based on the above logic, the code is as follows:
# Build a movie recommendation engine
def get_recommendations(dataset, user):
    # Verify that the user is in the dataset first
    if user not in dataset:
        raise TypeError('User {} not in dataset!'.format(user))
    total_scores = {}     # key: movie name, value: movie rating times similarity
    similarity_sums = {}  # key: movie name, value: similarity
    for other_user in dataset:
        if other_user == user:
            continue  # Make sure it is another user, not the user themselves
        similarity_score = pearson_score(dataset, user, other_user)
        # print('other user: ', other_user, 'similarity: ', similarity_score)
        if similarity_score <= 0:
            continue  # Ignore users whose similarity is too low
        # Find the movies that other_user has rated but user has not.
        # These are movies that a similar user has seen and user has not,
        # so the recommended movies must come from this list.
        user_not_rating_movies = []
        for movie in dataset[other_user]:  # Movies rated by other_user
            # if movie not in dataset[user] or dataset[user][movie]==0:
            if movie not in dataset[user]:
                # user has not rated this movie
                user_not_rating_movies.append(movie)
        # print(user_not_rating_movies)
        # Calculate the recommendation score of the movies user has not rated:
        # the movie's rating multiplied by the similarity
        for movie in user_not_rating_movies:
            recommend_score = dataset[other_user][movie] * similarity_score
            # Note: update() replaces any value already stored for this movie
            total_scores.update({movie: recommend_score})
            similarity_sums.update({movie: similarity_score})
        # print('other user: ', other_user, 'total_scores: ', total_scores)
    # If no movie was collected, user has already rated every candidate movie,
    # so there is nothing to recommend
    if len(total_scores) == 0:
        return [[0, 'No Recommendations']]
    # Calculate the recommendation score for each movie
    movie_ranks = np.array([[rec_score / similarity_sums[movie], movie]
                            for movie, rec_score in total_scores.items()])
    # Sort by the first column in descending order.
    # The array holds strings, so the scores are converted to float first.
    movie_ranks_desc = movie_ranks[np.argsort(movie_ranks[:, 0].astype(float))[::-1]]
    # print(movie_ranks_desc)
    return movie_ranks_desc
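One detail of the code above is worth noting: update replaces whatever was stored for a movie, so when several similar users have rated the same movie, only the last one processed contributes, and the final division simply gives back that user's rating. If a similarity-weighted average over all similar users is wanted instead, the two dictionaries can be accumulated. The following is a sketch of that variant, not the article's method. The name recommend_weighted, the similarity parameter, and the toy data are assumptions made for illustration, and the variant may produce scores that differ from the output shown in this article.

```python
def recommend_weighted(dataset, user, similarity):
    """Rank the movies user has not rated by a similarity-weighted average rating.

    `similarity` is any function (dataset, user, other_user) -> float,
    for example a Pearson score. Hypothetical variant for illustration.
    """
    total_scores = {}     # movie -> sum of rating * similarity
    similarity_sums = {}  # movie -> sum of similarity
    for other_user in dataset:
        if other_user == user:
            continue
        sim = similarity(dataset, user, other_user)
        if sim <= 0:
            continue  # ignore dissimilar users, as in the article's code
        for movie, rating in dataset[other_user].items():
            if movie in dataset[user]:
                continue  # user has already rated this movie
            # Accumulate instead of overwriting
            total_scores[movie] = total_scores.get(movie, 0.0) + rating * sim
            similarity_sums[movie] = similarity_sums.get(movie, 0.0) + sim
    ranked = [(total_scores[m] / similarity_sums[m], m) for m in total_scores]
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)

# Invented toy data and fixed similarities, so the example runs on its own
toy = {
    'A': {'m1': 4.0},
    'B': {'m1': 4.5, 'm2': 5.0},
    'C': {'m1': 3.0, 'm2': 2.0, 'm3': 4.0},
}
fixed_sim = {'B': 0.75, 'C': 0.25}
ranked = recommend_weighted(toy, 'A', lambda d, u, other: fixed_sim[other])
print(ranked)
```

Here m2 is rated by both B and C, so its score is (5.0 * 0.75 + 2.0 * 0.25) / (0.75 + 0.25) = 4.25, while m3 is rated only by C and keeps C's rating of 4.0. Returning a list of (score, movie) tuples also keeps the scores as floats, which avoids sorting numbers that have been converted to strings.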
Finally, compute the recommended movies for a couple of users:
import json
# Raw string so that the backslashes in the Windows path are not treated as escapes
with open(r"E:\PyProjects\DataSet\FireAI\movie_ratings.json", 'r') as file:
    dataset = json.loads(file.read())
user = 'John Carson'
movie_ranks = get_recommendations(dataset, user)
print('Recommended movies to {}---->>>'.format(user))
for idx, recommend in enumerate(movie_ranks):
    print('{}: {}-->recommend score: {}'.format(idx, recommend[1], recommend[0]))
user = 'Michael Henry'
movie_ranks = get_recommendations(dataset, user)
print('Recommended movies to {}---->>>'.format(user))
for idx, recommend in enumerate(movie_ranks):
    print('{}: {}-->recommend score: {}'.format(idx, recommend[1], recommend[0]))
-------------------- Output --------------------
Recommended movies to John Carson---->>>
0: No Recommendations-->recommend score: 0
Recommended movies to Michael Henry---->>>
0: Jerry Maguire-->recommend score: 3.0
1: Inception-->recommend score: 3.0
2: Anger Management-->recommend score: 2.0
------------------------------------------------
######################## Summary ########################
1. Building a movie recommendation system generally takes three steps: first find the similar users, then find the movies that those similar users have seen but user A has not, and finally compute a recommendation score for each of these movies. The higher the score, the more the movie is worth recommending.
2. The key is to understand the recommendation logic and to design the recommendation algorithm, both of which may vary with the actual application scenario.
#########################################################
Note: The code for this part has been uploaded to (my GitHub); feel free to download it.
References:
1. Classic Examples of Python Machine Learning, by Prateek Joshi, translated by Tao Junjie and Chen Xiaoli