By Pulkit Sharma; translated by Shen Libin and Fu Yushuai
Introduction
Everyone in today’s society is faced with a variety of choices. For example, if I’m aimlessly looking for a book to read, a lot of possibilities open up about how I search. As a result, I may waste a lot of time surfing the Internet and searching various websites in the hope of finding valuable books. This is when I might be looking for recommendations.
If there’s a website or app that recommends new books to me based on what I’ve already read, that would definitely help. At this time, I will have the following pleasant experience: log on to the website, and I can see the 10 books that meet my interests. I don’t have to waste time searching the website.
This is what recommendation engines do, and their power is now being harnessed by most businesses. From Amazon to Netflix, Google to Google Reading, recommendation engines are one of the most widespread applications of machine learning technology.
In this article, you’ll look at the various recommendation engine algorithms and the basic framework for building them in Python. We’ll also discuss the mathematics behind how these algorithms work, and finally use matrix factorization techniques to create our own recommendation engine.
1. What is a recommendation engine?
Until now, people have tended to buy products recommended by friends or people they trust. This is what people tend to do when they have any questions about a product. But with the advent of the digital age, the circle has expanded to include online sites that use some kind of recommendation engine.
A recommendation engine uses different algorithms to filter the data and recommend the most relevant items to the user. It first stores data on customers’ past behavior and then recommends items they might buy based on that data.
If a brand new user visits an e-commerce site, the site does not have any historical data about that user. So how does a website recommend products to users in such a scenario? One possible approach is to recommend the best selling item to the customer, which is in high demand. Another possible approach is to recommend products that will bring the most money to the site.
If we can recommend some products to users based on their needs and interests, this can have a positive impact on the user experience and ultimately achieve the effect of multiple visits. As a result, today’s enterprises build smart and intelligent recommendation engines by studying data about users’ past behavior.
Now that we have an intuitive view of recommendation engines, let’s take a look at how they work.
2. How does a recommendation engine work?
Before diving into this topic, let’s first consider how to recommend products to users:
- We can recommend the most popular products to every user.
- We can divide users into segments based on their preferences (user features) and recommend products according to the segment they belong to.
Both approaches have drawbacks. In the first approach, the most popular items for each user are the same, so users see the same recommendations. In the second approach, as the number of users increases, so do the characteristics of the users. Therefore, dividing users into multiple categories can be a very difficult task.
The main problem here is that we can’t customize recommendations for users’ specific interests. It’s like Amazon suggesting you buy a laptop simply because it’s bought by a majority of shoppers. Fortunately, Amazon (or any other big company) doesn’t use this method to recommend products. They use some personalized methods to help them recommend products more accurately.
Let’s now look at how the recommendation engine works through the following steps.
2.1 Data Collection
Collecting data is the first and most critical step in building a recommendation engine. Data can be collected in two ways: explicitly and implicitly. Explicit data is information that users provide intentionally, such as movie ratings, while implicit data is information gathered from data streams rather than provided voluntarily, such as search history, clicks and order history.
In the image above, Netflix is explicitly collecting data in the form of user ratings for different movies.
In the figure above, you can see Amazon's record of a customer's order history, which is an example of implicit data collection.
2.2 Data Storage
The quality of a model's recommendations depends on how much data it has: in a movie recommendation system, for example, the more ratings users give to a movie, the better the recommendations for other users become. The type of data also has an important influence on the kind of storage used, which can be a standard SQL database, a NoSQL database or some type of object storage.
2.3 Data Filtering
After the data is collected and stored, it must be filtered to extract the relevant information needed for the final recommendation.
There are various algorithms that can help simplify the filtering process. In the next section, we’ll look at each algorithm in detail.
2.3.1 Content-based Filtering
The algorithm recommends products similar to what users have liked in the past.
For example, if a user likes Inception, the algorithm will recommend movies in the same genre. But how do algorithms understand the types of movies to choose and recommend?
Take Netflix: they store all the information related to each user in vector form. This vector, known as the profile vector, contains the user's past behavior, i.e. the movies they liked or disliked and the ratings they gave. All the information related to the movies is stored in another vector, called the item vector, which contains the details of each movie, such as genre, cast, director and so on.
The content-based filtering algorithm finds the cosine of the angle between the profile vector and the item vector, i.e. the cosine similarity. Assuming that A is the profile vector and B is the item vector, the similarity between them can be calculated as follows:
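In its standard form, the cosine similarity is the dot product of the two vectors divided by the product of their magnitudes:

sim(A, B) = cos(θ) = (A · B) / (||A|| × ||B||)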
The cosine value lies between -1 and 1. Movies are ranked in descending order of similarity, and one of the following two approaches is then used to make recommendations:
- Top-n approach: recommend the n most relevant movies (n can be decided by the business).
- Rating-scale approach: set a threshold and recommend all movies that score above it.
Other methods that can be used to calculate similarity are:
- Euclidean distance: if entities are plotted in n-dimensional space, similar entities lie close to each other, so we can compute the distance between entities and recommend content to users based on that distance (the standard formula is given after this list).
- Pearson correlation: it tells us how strongly two entities are correlated; the higher the correlation, the more similar they are (the standard formula is also given after this list).
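In their standard forms (with p and q two points in n-dimensional space, and x and y two rating vectors with means x̄ and ȳ), the two measures are:

Euclidean distance: d(p, q) = sqrt((p1 - q1)² + (p2 - q2)² + ... + (pn - qn)²)

Pearson correlation: corr(x, y) = Σ(xi - x̄)(yi - ȳ) / sqrt(Σ(xi - x̄)² × Σ(yi - ȳ)²)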
One major drawback of this algorithm is that it is limited to recommending entities of the same type. It will never recommend products that users have not bought or liked in the past. Therefore, if the user only watched or liked action movies in the past, the system will only recommend action movies. Obviously, this method of building recommendation engine has poor generalization performance.
We need to find an algorithm that can improve this type of recommendation system by not only making recommendations based on content, but also taking advantage of user behavior information.
2.3.2 Collaborative filtering
Let's understand this approach with an example. If user A likes three movies, say Interstellar, Inception and Destination, and user B likes Inception, Destination and The Prestige, then they have similar interests. We can say with reasonable confidence that A should like The Prestige and B should like Interstellar. Collaborative filtering algorithms use this kind of "user behavior" to recommend movies. This is one of the most commonly used algorithms in industry because it does not rely on any additional information. There are many types of collaborative filtering techniques, which we discuss in detail below.
User-user collaborative filtering
The algorithm first finds a similarity score between users, and based on this similarity score, it picks out the most similar users and recommends products that those similar users have previously liked or bought.
In the case of our earlier movie example, the algorithm finds the similarity between each pair of users based on the ratings they previously gave to different movies. The prediction of item i for user u is calculated as the weighted sum of the ratings given to item i by other users, with the weights being those users' similarity to user u. The prediction P(u,i) is calculated as follows:
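In standard notation, with the sum taken over the other users v who have rated item i, this is:

P(u,i) = Σv ( R(v,i) × S(u,v) ) / Σv S(u,v)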
The symbols of the formula are as follows:
- P(u,i) is the predicted rating of item i for user u
- R(v,i) is the rating given to movie i by user v
- S(u,v) is the similarity score between users u and v
Now we have each user's ratings collected in a profile vector, and based on these vectors we have to predict the ratings that other users would give. The next steps are as follows:
- For the prediction, we need the similarity between users u and v. This is where the Pearson correlation can be used.
- First, we find the items rated by both users and, based on those ratings, calculate the correlation between the users.
- Predictions can then be calculated using the similarity values: the algorithm first computes the similarity between each pair of users and then derives the predictions from those similarities, so highly correlated users contribute the most.
- Recommendations are made based on these predictions. To understand this, let's look at an example:
User-movie rating matrix:
Above is a user-movie rating matrix. To understand the formula more concretely, let's find the similarity between users (A, C) and (B, C) in the table. Users A and C have both rated movies x2 and x4, while users B and C have both rated movies x2, x4 and x5.
The correlation between users A and C is greater than the correlation between users B and C. Since users A and C are more similar, the movies liked by user A will be recommended to user C, and vice versa.
This algorithm is very time consuming because it involves computing the similarity for every pair of users and then a prediction for every missing rating. One way to deal with this is to consider only a subset of users (neighbors) when making predictions instead of all of them, i.e. instead of using every similarity value we select only a few, in one of the following ways:
- Select a similarity threshold and keep all users above that value
- Randomly select users
- Rank neighbors in descending order of similarity and pick the top n users
- Use a clustering algorithm to select neighbors
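As an illustration of the third strategy (top-n neighbors), here is a minimal sketch that assumes we already have a user-user similarity matrix like the one built later in this article; the small matrix below is hypothetical:

import numpy as np

# Hypothetical similarity matrix: user_similarity[u, v] is the similarity between users u and v
user_similarity = np.array([[1.0, 0.8, 0.1],
                            [0.8, 1.0, 0.3],
                            [0.1, 0.3, 1.0]])

def top_n_neighbors(similarity, user, n=2):
    # Sort the other users by similarity to `user`, most similar first
    order = np.argsort(similarity[user])[::-1]
    # Drop the user itself and keep the n most similar users
    return [int(v) for v in order if v != user][:n]

print(top_n_neighbors(user_similarity, user=0, n=2))  # [1, 2]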
This algorithm works well when the number of users is small, but it is not effective with a large number of users, because computing the similarity between all user pairs takes a lot of time. This leads to item-item collaborative filtering, which is very effective when the number of users far exceeds the number of items being recommended.
Item-item collaborative filtering
In this algorithm, we calculate the similarity between each pair of items.
So in our case, we find the similarity between each pair of movies and, based on that, recommend movies similar to the ones the user has liked in the past. The algorithm works in much the same way as user-user collaborative filtering, with one small change: instead of taking the weighted sum of ratings from "neighboring users", it takes the weighted sum of the user's own ratings of "neighboring items". The prediction formula, and the similarity between items, are computed as follows:
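In standard notation, with N ranging over the items similar to item i that user u has rated, the prediction is a similarity-weighted average of the user's own ratings, and the similarity between two items is typically the cosine of the angle between the rating vectors of the users who rated both:

P(u,i) = ΣN ( S(i,N) × R(u,N) ) / ΣN |S(i,N)|

sim(i, j) = cos(θ) = (r_i · r_j) / (||r_i|| × ||r_j||), where r_i and r_j are the rating vectors of items i and j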
Now we have similarities, ratings and predictions for each pair of movies, and based on those predictions, we can recommend similar movies. Let’s look at an example:
The movie rating mean here is the average of all the ratings given to a particular movie (compare it with the table we saw in user-user filtering). Instead of finding the user-user similarity as before, we find the item-item similarity.
To do this, we first need to find the users who rated both items and calculate the similarity between the items based on those ratings. Let's find the similarity between movies (x1, x4) and between movies (x1, x5). From the table above, users A and B have rated both movies x1 and x4, and users A and B have also rated both movies x1 and x5.
Movies x1 and x4 are more similar than movies x1 and x5, so based on these similarity values, if any user searches for movie x1, movie x4 will be recommended to them, and vice versa. Before taking these concepts any further, there is a question we must answer: what happens if a new user or a new movie is added to the dataset? This is called a cold start, and it comes in two types:
- User cold start
- Product cold start
User cold start means that a new user has just been added to the database. Since there is no history for this user, the system does not know their preferences, which makes it difficult to recommend products to them. So how do we solve this problem? One basic approach is to apply a popularity-based strategy and recommend the most popular products, which can be determined by recent trends. Once we know what the user likes, recommending products becomes much easier.
A product cold start, on the other hand, means that a new product has just been launched or added to the system. User behavior is what determines the value of any product: the more interactions a product receives, the easier it is for our model to recommend it to the right users. Content-based filtering can help here: the system initially recommends the new product based on its content, and eventually the interactions users have with it allow collaborative recommendations as well.
Now let’s use a case study in Python to solidify our understanding of these concepts. This is an interesting example, so get your computer up and get started.
3. A Python case study based on the MovieLens dataset
We will use this MovieLens dataset to build a model and recommend movies to the end user. That data has been collected by the GroupLens research project at the University of Minnesota. The dataset can be downloaded here:
This data set contains:
- 100,000 ratings (on a scale of 1-5) of 1,682 movies by 943 users
- Demographic information about the users (age, gender, occupation, etc.)
First, we’ll import the standard library and read the data into Python:
import pandas as pd
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

# Pass in column names for each CSV, as the column names are not given in the files.
# You can check the column names in the readme file.

# Reading users file:
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=u_cols, encoding='latin-1')

# Reading ratings file:
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep='\t', names=r_cols, encoding='latin-1')

# Reading items file:
i_cols = ['movie id', 'movie title', 'release date', 'video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
          'Animation', 'Children\'s', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
          'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']
items = pd.read_csv('ml-100k/u.item', sep='|', names=i_cols, encoding='latin-1')
After loading the dataset, we should look at the contents of each file (users, ratings, movies):
- Users
print(users.shape)
users.head()
Thus, we can see that there are 943 users in the data set, and each user has five characteristics, namely user ID, age, gender, occupation, and zip code. Now let’s look at the rating file.
- Ratings
print(ratings.shape)
ratings.head()
We have 100K movie ratings for different users and movie combinations. Now finally check the movie file.
- Movies
print(items.shape)
items.head()
The dataset contains attributes for 1682 movies, with 24 columns, the last 19 of which specify the specific movie genre. These are binary columns, that is, a value of 1 indicates that the movie belongs to that type, otherwise 0.
GroupLens has already split the dataset into train and test sets, where the test data contains 10 ratings per user, i.e. 9,430 rows in total. Next, we read both files into our Python environment.
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings_train = pd.read_csv('ml-100k/ua.base', sep='\t', names=r_cols, encoding='latin-1')
ratings_test = pd.read_csv('ml-100k/ua.test', sep='\t', names=r_cols, encoding='latin-1')
ratings_train.shape, ratings_test.shape
Now it’s finally time to build our recommendation engine!
4. Building a collaborative filtering model from scratch
We will recommend movies based on user-user similarity and movie-movie similarity. To do this, we first need to count the number of unique users and movies.
n_users = ratings.user_id.unique().shape[0]
n_items = ratings.movie_id.unique().shape[0]
Now we will create a user movie matrix that can be used to calculate the similarity between the user and the movie.
data_matrix = np.zeros((n_users, n_items))
for line in ratings.itertuples():
    data_matrix[line[1] - 1, line[2] - 1] = line[3]
Now let's calculate the similarity. We can use the pairwise_distances function from scikit-learn to calculate the cosine similarity.
from sklearn.metrics.pairwise import pairwise_distances
user_similarity = pairwise_distances(data_matrix, metric='cosine')
item_similarity = pairwise_distances(data_matrix.T, metric='cosine')
This gives us the user-user and item-item cosine distances in array form (note that pairwise_distances returns the cosine distance, i.e. 1 minus the cosine similarity). The next step is to make predictions based on these values. Let's define a function to do that.
def predict(ratings, similarity, type='user'):
    if type == 'user':
        mean_user_rating = ratings.mean(axis=1)
        # We use np.newaxis so that mean_user_rating has the same format as ratings
        ratings_diff = (ratings - mean_user_rating[:, np.newaxis])
        pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
    elif type == 'item':
        pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])
    return pred
Finally, we will make predictions based on user similarity and movie similarity.
user_prediction = predict(data_matrix, user_similarity, type='user')
item_prediction = predict(data_matrix, item_similarity, type='item')
As it turns out, we also have a library that automatically generates all of these recommendations. Now let’s learn how to create a recommendation engine using Turicreate in Python. To familiarize yourself with Turicreate and install it on your computer, see here:
https://github.com/apple/turicreate/blob/master/README.md
5. Building a simple popularity model and a collaborative filtering model with Turicreate
After installing the Turicreate library, we first import it and then read the training and test data sets in our environment.
import turicreate
train_data = turicreate.SFrame(ratings_train)
test_data = turicreate.SFrame(ratings_test)
We have user behavior as well as user and movie attributes, so we can build both content-based and collaborative filtering models. We'll start with a simple popularity model and then build a collaborative filtering model.
First, we build a model that recommends the most popular movies to every user, so that all users receive the same recommendations. This can be done with the popularity_recommender module in Turicreate.
popularity_model = turicreate.popularity_recommender.create(train_data, user_id='user_id', item_id='movie_id', target='rating')
The arguments we use are:
- train_data: the SFrame containing the training data we need
- user_id: the column containing the ID of each user
- item_id: the column containing each item to be recommended (the movie ID)
- target: the column containing the rating given by the user
It’s time to predict! We will recommend the top 5 movies for the top 5 users in our data set.
popularity_recomm = popularity_model.recommend(users=[1, 2, 3, 4, 5], k=5)
popularity_recomm.print_rows(num_rows=25)
Note that the recommendations are the same for all users: 1467, 1201, 1189, 1122, 814, and in the same order! This makes sense, because the mean rating of each of these recommended movies is 5, the maximum, meaning that every user who watched them gave them the highest rating. So our popularity-based system works as expected.
Having built the popularity model, we will now build a collaborative filtering model. Let's train an item-similarity model and produce the top 5 recommendations for the first 5 users.
#Training the model
item_sim_model = turicreate.item_similarity_recommender.create(train_data, user_id='user_id', item_id='movie_id', target='rating', similarity_type='cosine')

#Making recommendations
item_sim_recomm = item_sim_model.recommend(users=[1, 2, 3, 4, 5], k=5)
item_sim_recomm.print_rows(num_rows=25)
Here we can see that the recommendations (movie IDs) are different for each user. Different users get different recommendation sets, which means personalization is present.
In this model we do not have a rating for every user-movie pair, so we need a way to predict all of these missing ratings. To do that, we have to find a set of features that determine how a user rates a movie; these are called latent features. We need a way to extract the most important latent features from the existing ones. The next section introduces matrix factorization, a technique that uses low-dimensional dense matrices to help us extract those important latent features.
6. Introduction to matrix factorization
Let's understand matrix factorization through an example. Consider a user-movie rating matrix (ratings 1-5) given by different users for different movies.
The user IDs here are unique IDs for the different users, and each movie is also assigned a unique ID. A value of 0.0 indicates that the user has not rated that particular movie (1 is the lowest rating a user can give). We want to predict these missing ratings, using matrix factorization to find the latent features that determine how users rate movies. We decompose the original matrix into components such that the product of those components approximately reproduces the original matrix.
Let's say we want to find k latent features. We can then divide our rating matrix R (M x N) into P (M x K) and Q (N x K), such that P x Q^T (where Q^T is the transpose of the Q matrix) approximates the R matrix:
where:
- M is the total number of users
- N is the total number of movies
- K is the number of latent features
- R is the M x N user-movie rating matrix
- P is the M x K user-feature matrix, representing the association between users and features
- Q is the N x K movie-feature matrix, representing the association between movies and features
- Σ is the K x K diagonal feature-weight matrix, representing the importance of each feature
Matrix factorization selects the latent features and removes the noise from the data. How? It drops the features that do not help determine how a user rates a movie. To obtain the rating r(u,i) of user u for movie i across all k latent features, we take the dot product of the user's feature vector p(u) and the movie's feature vector q(i), i.e. we multiply the corresponding feature values and add them up:
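In standard notation, this predicted rating is:

R(u,i) = Σk P(u,k) × Q(k,i) = p(u) · q(i)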
That is how matrix factorization gives us predicted ratings for the movies a user has not rated. But how do we add new data to our user-movie rating matrix? That is, if a new user joins and rates some movies, how do we add this data to the existing matrix?
Adding a new user is simpler than it might appear. If a new user enters the system, the diagonal weight matrix Σ and the movie-feature matrix Q do not change; the only change is in the user-feature matrix P, which we can obtain with some matrix algebra.
We have:
R = P Σ Q^T
Multiplying both sides by the matrix Q, and using the fact that the columns of Q are orthonormal (Q^T Q = I), we get:
R Q = P Σ Q^T Q
Now we have:
R Q = P Σ
Simplifying further, we obtain the P matrix:
P = R Q Σ^(-1)
This is the updated user-feature matrix. Similarly, if a new movie is added to the system, we can follow similar steps to obtain the updated movie-feature matrix Q.
One thing to be aware of: even though we have decomposed the R matrix into P and Q, how do we decide which P and Q best approximate R? We can use a gradient descent algorithm, with the goal of minimizing the squared error between the actual rating and the rating estimated from P and Q. The squared error is defined as follows:
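In standard notation, the squared error for a single known rating is:

e(u,i)² = ( r(u,i) - r̂(u,i) )² = ( r(u,i) - Σk p(u,k) q(k,i) )²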
- e(u,i) is the error
- r(u,i) is the actual rating given to movie i by user u
- r̂(u,i) is the predicted rating of movie i for user u
Our goal is to choose the values of p and q that minimize this error, so we need to update p and q iteratively until we reach the optimal matrices. We now define the update rules for p(u,k) and q(k,i); in gradient descent, the update rule is obtained from the gradient of the error to be minimized. Once we have the gradients, we can apply the update rules for p(u,k) and q(k,i):
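For this squared error (ignoring regularization), the standard gradients and the resulting update rules are:

∂e(u,i)² / ∂p(u,k) = -2 e(u,i) q(k,i)
∂e(u,i)² / ∂q(k,i) = -2 e(u,i) p(u,k)

p(u,k) ← p(u,k) + 2α e(u,i) q(k,i)
q(k,i) ← q(k,i) + 2α e(u,i) p(u,k)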
Alpha is the learning rate, which determines the size of each update. Repeat the above update process until the error is minimized, and finally we can get the optimal P and Q matrix, which can be used to predict the score. Let’s take a quick review of how this algorithm works, and then we’ll build a recommendation engine to predict ratings for unrated movies.
Here’s how the matrix factorization prediction score works:
# for f = 1, 2, ..., k:
#     for each known rating r(u,i) in R:
#         predict r(u,i)
#         update p(u,k) and q(k,i)
Based on each latent feature, all the missing entries in the R matrix are filled with the predicted r(u,i) values, and then p(u,k) and q(k,i) are updated by gradient descent to obtain their optimal values.
Now that you understand the inner workings of this algorithm, let’s take an example of how to decompose a matrix into its constituent parts.
Use a 2×3 matrix, as shown below:
Here we have 2 users and corresponding ratings for 3 movies. Now, we break this matrix into subparts as follows:
The eigenvectors of AA^T give us the P matrix, the eigenvectors of A^T A give us the Q matrix, and Σ contains the square roots of the eigenvalues of AA^T (or A^T A).
Calculating the eigenvalues of AA^T:
The eigenvalues of AA^T are 25 and 9. Similarly, we can calculate the eigenvalues of A^T A; these values are 25, 9 and 0. Now we can calculate the eigenvectors corresponding to AA^T and A^T A.
For the eigenvalue λ = 25, we have:
Row-reducing, this simplifies to:
The unit vector in the kernel of this matrix is:
Similarly, for λ = 9 we have:
Row-reducing, this simplifies to:
The unit vector in the kernel of this matrix is:
For the last eigenvector, we can find a unit vector that is perpendicular to q1 and q2. So,
The 2x3 matrix Σ contains, on its diagonal, the square roots of the AA^T (or A^T A) eigenvalues 25 and 9, i.e. 5 and 3:
Finally, we can calculate the 2x2 matrix P using the formula σ_i p_i = A q_i, i.e. p_i = (1/σ_i) A q_i:
So the decomposition of the matrix A is as follows:
Now that we have the P and Q matrices, and we can use gradient descent to get an optimized version of them, let’s use matrix decomposition to build the recommendation engine.
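To check a decomposition like this numerically, we can use NumPy's SVD routine. The matrix below is a hypothetical 2x3 example (not necessarily the one used above), chosen so that its singular values also happen to be 5 and 3:

import numpy as np

# A hypothetical 2x3 matrix, used only to illustrate the decomposition;
# the eigenvalues of A·A^T are 25 and 9, so the singular values are 5 and 3
A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])

# Full SVD: A = P · Sigma · Q^T
P, sigma, Qt = np.linalg.svd(A)

print(sigma)  # [5. 3.]

# Rebuild the 2x3 Sigma matrix and verify that the factors reproduce A
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, sigma)
print(np.allclose(A, P @ Sigma @ Qt))  # True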
7. Building a recommendation engine using matrix factorization
We start by defining a class that predicts the ratings a user would give to all the movies he or she has not rated.
class MF():

    # Initializing the user-movie rating matrix, no. of latent features, alpha and beta
    def __init__(self, R, K, alpha, beta, iterations):
        self.R = R
        self.num_users, self.num_items = R.shape
        self.K = K
        self.alpha = alpha
        self.beta = beta
        self.iterations = iterations

    # Initializing user-feature and movie-feature matrices
    def train(self):
        self.P = np.random.normal(scale=1./self.K, size=(self.num_users, self.K))
        self.Q = np.random.normal(scale=1./self.K, size=(self.num_items, self.K))

        # Initializing the bias terms
        self.b_u = np.zeros(self.num_users)
        self.b_i = np.zeros(self.num_items)
        self.b = np.mean(self.R[np.where(self.R != 0)])

        # List of training samples (only the observed ratings)
        self.samples = [
            (i, j, self.R[i, j])
            for i in range(self.num_users)
            for j in range(self.num_items)
            if self.R[i, j] > 0
        ]

        # Stochastic gradient descent for the given number of iterations
        training_process = []
        for i in range(self.iterations):
            np.random.shuffle(self.samples)
            self.sgd()
            mse = self.mse()
            training_process.append((i, mse))
            if (i + 1) % 20 == 0:
                print("Iteration: %d ; error = %.4f" % (i + 1, mse))

        return training_process

    # Computing the total error (square root of the sum of squared errors over known ratings)
    def mse(self):
        xs, ys = self.R.nonzero()
        predicted = self.full_matrix()
        error = 0
        for x, y in zip(xs, ys):
            error += pow(self.R[x, y] - predicted[x, y], 2)
        return np.sqrt(error)

    # Stochastic gradient descent to get optimized P and Q matrices
    def sgd(self):
        for i, j, r in self.samples:
            prediction = self.get_rating(i, j)
            e = (r - prediction)

            self.b_u[i] += self.alpha * (e - self.beta * self.b_u[i])
            self.b_i[j] += self.alpha * (e - self.beta * self.b_i[j])

            self.P[i, :] += self.alpha * (e * self.Q[j, :] - self.beta * self.P[i, :])
            self.Q[j, :] += self.alpha * (e * self.P[i, :] - self.beta * self.Q[j, :])

    # Rating for user i and movie j
    def get_rating(self, i, j):
        prediction = self.b + self.b_u[i] + self.b_i[j] + self.P[i, :].dot(self.Q[j, :].T)
        return prediction

    # Full user-movie rating matrix
    def full_matrix(self):
        return self.b + self.b_u[:, np.newaxis] + self.b_i[np.newaxis, :] + self.P.dot(self.Q.T)
Now we have a class that can predict the ratings. Its inputs are:
- R: the user-movie rating matrix
- K: the number of latent features
- alpha: the learning rate for stochastic gradient descent
- beta: the regularization parameter
- iterations: the number of iterations of stochastic gradient descent to perform
We must convert the user-movie ratings into matrix form, which we can do with the pivot function in pandas.
R = np.array(ratings.pivot(index='user_id', columns='movie_id', values='rating').fillna(0))
fillna(0) fills all the missing values with 0. Now that we have the R matrix, we can choose the number of latent features, which must be less than or equal to the number of original features.
Now let's predict all the missing ratings. The parameters are initialized as K = 20, alpha = 0.001, beta = 0.01 and iterations = 100.
mf = MF(R, K=20, alpha=0.001, beta=0.01, iterations=100)
training_process = mf.train()
print()
print("P x Q:")
print(mf.full_matrix())
print()
The following gives us the error values after every 20 iterations, and finally the complete user-movie score matrix, with the output looking like this:
Now that we have created our recommendation engine, let’s focus on how to evaluate the performance of the recommendation engine in the next section.
8. Evaluation metrics for recommendation engines
To evaluate the performance of a recommendation engine, we can use the following metrics.
8.1 Recall
- Recall measures what proportion of the movies the user actually likes are recommended to them.
- The formula is: Recall = TP / (TP + FN)
- Here TP is the number of recommended movies that the user likes, and TP + FN is the total number of movies the user likes.
- If a user likes 5 movies and the recommendation engine shows 3 of them, the recall is 0.6.
- The higher the recall, the better the recommendations.
8.2 Precision
- Precision measures what proportion of the recommended movies the user actually likes.
- The formula is: Precision = TP / (TP + FP)
- Here TP is the number of recommended movies the user likes, and TP + FP is the total number of movies recommended to the user.
- If we recommend 5 movies to a user and they like 4 of them, the precision is 0.8.
- The higher the precision, the better the recommendations.
- But consider this: if we simply recommended every movie, the list would certainly include the ones the user likes, giving us 100% recall! But think about the precision: if we recommend 1,000 movies and the user likes only 10 of them, the precision is just 1%, which is very low. So the goal should be to maximize both precision and recall.
8.3 Root Mean Square Error (RMSE)
It measures the error in the predicted ratings:
- RMSE = sqrt( mean( (Predicted - Actual)² ) )
- Here Predicted is the rating predicted by the model and Actual is the original rating.
- If a user gives a movie a 5 and we predict a 4, the RMSE is 1.
- The smaller the RMSE, the better the recommendations.
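As a quick sketch (using scikit-learn's mean_squared_error and the data_matrix and user_prediction arrays built earlier in this article), the RMSE over the known ratings could be computed like this:

from sklearn.metrics import mean_squared_error
import numpy as np

# Consider only the entries where an actual rating exists (non-zero in data_matrix)
mask = data_matrix > 0
rmse = np.sqrt(mean_squared_error(data_matrix[mask], user_prediction[mask]))
print('RMSE of the user-based predictions: %.4f' % rmse)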
The metrics above tell us how accurate the model's recommendations are, but they do not account for the order in which items are recommended, i.e. which product is recommended first, which second, and so on. We also need metrics that consider the order of the recommended products. Let's look at some ranking metrics:
8.4 MRR (Mean Reciprocal Rank)
- It evaluates the ranked list of recommendations.
- Suppose we have recommended three movies A, B and C to a user, in that order, but the user only likes movie C. Since movie C sits at rank 3, its reciprocal rank is 1/3.
- The larger the MRR, the better the recommendations.
8.5 MAP at K (Mean Average Precision at cutoff K)
- Precision and recall do not care about the order of the recommendations.
- Precision at cutoff k is the precision calculated by considering only the subset of recommendations from rank 1 to k.
- Suppose we made three recommendations with relevance [0, 1, 1], where 0 means the recommendation was wrong and 1 means it was right. The precision at k is then [0, 1/2, 2/3], and the average precision is (1/3) × (0 + 1/2 + 2/3) ≈ 0.39.
- The greater the average precision, the better the recommendations.
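A minimal sketch of this calculation in Python, for the hypothetical relevance list [0, 1, 1] used above:

def precision_at_k(relevance, k):
    # Fraction of the first k recommendations that are relevant
    return sum(relevance[:k]) / k

def average_precision(relevance):
    # Mean of precision@k over all cutoffs, as in the example above
    cutoffs = range(1, len(relevance) + 1)
    return sum(precision_at_k(relevance, k) for k in cutoffs) / len(relevance)

relevance = [0, 1, 1]
print([precision_at_k(relevance, k) for k in (1, 2, 3)])  # [0.0, 0.5, 0.666...]
print(average_precision(relevance))                        # 0.3888... ≈ 0.39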
8.6 NDCG (Normalized Discounted Cumulative Gain)
- The main difference between MAP and NDCG is that MAP treats items as either relevant or not, while NDCG uses graded relevance scores.
- To understand it, suppose that out of 10 movies, A to J, we should recommend the first five (A, B, C, D and E) and should not recommend the other five (F, G, H, I and J). If the final recommendation list is [A, B, C, D], the NDCG will be 1, because all the recommended items are relevant to the user.
- The higher the NDCG, the better the recommendations.
9. What else can you try?
So far, we’ve looked at what a recommendation engine is and its different types and how they work. Both content-based filtering and collaborative filtering algorithms have their own advantages and disadvantages.
In some domains, generating a useful description of the items can be very difficult. A content-based model will not select an item if the user's previous behavior provides no evidence for it, so additional techniques are needed for the system to recommend beyond what the user has already shown interest in.
The collaborative filtering model does not have these drawbacks. Because it does not need a description of the recommended item, the system can handle any kind of item, and it can recommend products the user has shown no previous interest in. However, collaborative filtering cannot recommend a new item until users have rated it, and even once users start rating it, it takes a while to accumulate enough ratings to make accurate recommendations.
A system that combines content-based filtering and collaborative filtering can potentially draw on both the representation of the content and the similarities among users. One way to combine them is to take a weighted average of the content-based and collaborative filtering recommendations.
The various ways of doing this are:
- Combining item scores:
We combine the ratings obtained from the two methods; the simplest way is to average them.
Suppose one method suggests a rating of 4 for a movie while the other suggests a 5 for the same movie. The final recommendation then uses the average of the two ratings, 4.5.
We can also assign different weights to the different methods.
- Combining item rankings:
Suppose collaborative filtering recommends five movies A, B, C, D and E in that order, while content-based filtering recommends the same five movies in the order B, D, A, C, E.
Averaging each movie's rank across the two lists gives: B = (2 + 1)/2 = 1.5, A = (1 + 3)/2 = 2, D = (4 + 2)/2 = 3, C = (3 + 4)/2 = 3.5, E = (5 + 5)/2 = 5.
A hybrid recommendation engine combines the rankings in this way and makes the final recommendation based on the overall ranking, so the final order of recommendations is B, A, D, C, E (a small sketch of this step follows).
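A minimal sketch of this rank-combination step, using the hypothetical rankings from the example above:

# Rankings from the example above: collaborative filtering and content-based filtering
cf_ranking = ['A', 'B', 'C', 'D', 'E']
cb_ranking = ['B', 'D', 'A', 'C', 'E']

def combine_rankings(*rankings):
    movies = rankings[0]
    # Average each movie's (1-based) rank across the individual rankings
    avg_rank = {m: sum(r.index(m) + 1 for r in rankings) / len(rankings) for m in movies}
    # Sort by average rank, best (lowest) first
    return sorted(movies, key=lambda m: avg_rank[m])

print(combine_rankings(cf_ranking, cb_ranking))  # ['B', 'A', 'D', 'C', 'E']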
In this way, two or more approaches can be combined to build a hybrid recommendation engine and improve their overall recommendation accuracy and efficiency.
Endnotes
This article covers all aspects of recommendation engines and is a great help if you want to start learning about them. We discussed not only basic recommendation techniques, but also how to implement some of the more advanced technologies in today’s industry.
We also related each technique to real-world issues. If you want to learn how to build a recommendation engine, I recommend studying the techniques discussed in this tutorial and implementing them in your own models.
Do you find this article useful? Share your thoughts in the comments below!