Code implementation
```python
import numpy as np
import pandas as pd
```
We will study the MovieLens data set and build models to recommend movies to users. The data was collected by the GroupLens Research project at the University of Minnesota and can be downloaded from the GroupLens website. The data set includes 100,000 ratings (1-5) of 1682 movies from 943 users, along with user information such as age, gender, and occupation.
Import data set
First, we’ll import our standard libraries and read the data set into Python. The movie metadata file contains attributes for the 1682 movies. It has 24 columns, and the last 19 columns encode the movie’s genres. These are binary columns: a value of 1 indicates that the movie belongs to that genre, and 0 that it does not.
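A sketch of loading that file; the genre order follows the data set’s README, and the column names are our own choices:

```python
# Sketch of loading the movie metadata (u.item); genre order per the
# data set's README, column names chosen by us
i_cols = ['movie_id', 'title', 'release_date', 'video_release_date', 'imdb_url',
          'unknown', 'Action', 'Adventure', 'Animation', "Children's", 'Comedy',
          'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
          'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']
items = pd.read_csv('data/ml-100k/u.item', sep='|', names=i_cols, encoding='latin-1')
items.shape  # (1682, 24)
```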
GroupLens has already divided the data set into two parts: a training set and a test set. The test set contains 10 ratings per user, 9,430 rows in total. We will read both files into our Python environment.
```python
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings_train = pd.read_csv('data/ml-100k/ua.base', sep='\t', names=r_cols, encoding='latin-1')
ratings_train.head()
```
|   | user_id | movie_id | rating | unix_timestamp |
|---|---------|----------|--------|----------------|
| 0 | 1 | 1 | 5 | 874965758 |
| 1 | 1 | 2 | 3 | 876893171 |
| 2 | 1 | 3 | 4 | 878542960 |
| 3 | 1 | 4 | 3 | 876893119 |
| 4 | 1 | 5 | 3 | 889751712 |
```python
ratings_test = pd.read_csv('data/ml-100k/ua.test', sep='\t', names=r_cols, encoding='latin-1')
ratings_test.shape, ratings_train.shape
```

```
((9430, 4), (90570, 4))
```

```python
ratings_test.head()
```
|   | user_id | movie_id | rating | unix_timestamp |
|---|---------|----------|--------|----------------|
| 0 | 1 | 20 | 4 | 887431883 |
| 1 | 1 | 33 | 4 | 878542699 |
| 2 | 1 | 61 | 4 | 878542420 |
| 3 | 1 | 117 | 3 | 874965739 |
| 4 | 1 | 155 | 2 | 878542201 |
Create a collaborative filtering model
Generating the rating matrix
Collaborative filtering computations are based on a rating matrix, so we use the DataFrame from above to generate a 2-dimensional matrix in which the rows are users, the columns are movies, and each entry is a user’s rating of a movie. Entries without a rating are filled with zeroes.
```python
rating_df = ratings_train.pivot(index='user_id', columns='movie_id', values='rating').fillna(0)
data_matrix = rating_df.to_numpy()
data_matrix.shape
```

```
(943, 1680)
```

Note that the matrix has 1680 columns rather than 1682: movies that received no ratings in the training split do not appear in the pivot.
Calculating user-user and item-item similarity
Next we compute cosine similarity with the pairwise_distances function provided by scikit-learn. Note that pairwise_distances with metric='cosine' actually returns the cosine distance, which is 1 minus the similarity, so we subtract the result from 1.
```python
from sklearn.metrics.pairwise import pairwise_distances

# pairwise_distances returns cosine distance, so subtract from 1 for similarity
user_similarity = 1 - pairwise_distances(data_matrix, metric='cosine')
item_similarity = 1 - pairwise_distances(data_matrix.T, metric='cosine')
```
Implementing the prediction method
This gives us item-item and user-user similarity in array form. The next step is to make predictions based on these similarities. Let’s define a function to do this.
```python
def predict(ratings, similarity, type='user'):
    if type == 'user':
        mean_user_rating = ratings.mean(axis=1)
        # We use np.newaxis so that mean_user_rating has the same shape as ratings
        ratings_diff = ratings - mean_user_rating[:, np.newaxis]
        pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
    elif type == 'item':
        pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])
    return pred
```
```python
user_prediction = predict(data_matrix, user_similarity, type='user')
item_prediction = predict(data_matrix, item_similarity, type='item')
```
```python
user_prediction
```

The result is a 943 × 1680 array of predicted ratings: one row per user, one column per movie.
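One way to sanity-check these predictions is to compute the RMSE against the held-out test ratings. A sketch, assuming the `rating_df` pivot built earlier (test pairs whose movie is missing from the training pivot are skipped):

```python
from math import sqrt

# Map user/movie ids to row/column positions in the prediction matrix
user_pos = {u: k for k, u in enumerate(rating_df.index)}
item_pos = {m: k for k, m in enumerate(rating_df.columns)}

squared_error, n = 0.0, 0
for row in ratings_test.itertuples():
    if row.user_id in user_pos and row.movie_id in item_pos:
        pred = user_prediction[user_pos[row.user_id], item_pos[row.movie_id]]
        squared_error += (pred - row.rating) ** 2
        n += 1
print('Test RMSE: %.4f' % sqrt(squared_error / n))
```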
Matrix Factorization (MF)
Here’s an example to help understand matrix factorization. Consider a user-movie matrix in which different users give ratings to different movies. Factoring itself is familiar to all of us; most of us first met it in junior high school, and the whole point of factoring is to simplify things.
Here the rating matrix is a sparse matrix, meaning most of its entries are zero, because each user typically watches only a small fraction of the movies. Matrix factorization fills in those missing ratings by predicting values for the zero positions. It does so by finding latent features, which live in a shared latent space: both users and items are expressed in terms of these features, which makes it convenient to compute similarity between them, and from these latent features we can infer how a user would rate a movie. Multiplying the factor matrices back together must approximately reconstruct the original matrix.
In the rating matrix, rows are users (by user id) and columns are items (by item id). Matrix factorization decomposes the user-item rating matrix into two matrices, and the product of those two matrices gives back a full matrix. The advantage is that every user-item intersection now holds a value, i.e. a predicted rating for every pair.
In general, the latent feature dimension $K$ is much smaller than the user dimension $M$ or the item dimension $N$. We can factor our rating matrix $R_{M \times N}$ into $P_{M \times K}$ and $Q_{N \times K}$, so that $P \times Q^T$, where $Q^T$ is the transpose of the $Q$ matrix, is close to the $R$ matrix.

- $M$: the number of users
- $N$: the number of items (movies)
- $K$: the number of latent features
- $R_{M \times N}$: the user-movie rating matrix
- $P_{M \times K}$: the user feature matrix, representing the latent features of each user
- $Q_{N \times K}$: the item feature matrix, representing the latent features of each item
- $\Sigma_{K \times K}$: a diagonal feature weight matrix, representing the weight of each latent feature
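To make the notation concrete, here is a small numpy sketch (the toy ratings are made up) that factors a rating matrix into $P$, $\Sigma$ and $Q$ with a truncated SVD and rebuilds the low-rank approximation:

```python
import numpy as np

# Toy rating matrix (made-up values): 5 users x 4 movies, 0 = no rating
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)

K = 2  # number of latent features
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = U[:, :K]            # user feature matrix, M x K
Sigma = np.diag(s[:K])  # diagonal feature weight matrix, K x K
Q = Vt[:K, :].T         # item feature matrix, N x K

R_hat = P @ Sigma @ Q.T  # rank-K approximation; every cell now has a value
```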
Matrix factorization can also eliminate noise in the data, dropping features that have little to do with how users rate movies. A user’s rating of an item (a scalar) is decomposed into a user vector $p_u$ and an item vector $q_i$, and the dot product of the two vectors gives the user’s predicted rating for that item.
When a new user enters the system and rates some movies, how do we fold the new behavior data into the matrices to update the existing factorization? When a new user rates an item, the diagonal matrix $\Sigma$ and the item feature matrix $Q$ do not change; the only thing that needs updating is the user feature matrix $P$. Starting from $R = P \Sigma Q^T$:

- Multiply both sides of the equation on the right by the matrix $Q$: $RQ = P \Sigma Q^T Q$
- Since the $Q$ matrix is orthogonal, $Q^T Q = I$, so we have $RQ = P \Sigma$
- Therefore $P = R Q \Sigma^{-1}$
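Continuing the SVD sketch above, a new user’s ratings can be folded in without refactorizing the whole matrix (the new ratings are hypothetical):

```python
# New user's ratings over the same 4 movies (hypothetical values)
r_new = np.array([[4, 0, 0, 5]], dtype=float)

# P_new = R_new Q Sigma^{-1}, from the derivation above
p_new = r_new @ Q @ np.linalg.inv(Sigma)  # 1 x K latent vector for the new user
```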
So whenever $R$ gains new rows, this formula gives the updated $P$ matrix while $Q$ stays fixed; in the same way, the $Q$ matrix can be updated while $P$ is held fixed. Alternatively, to decompose $R$ into $P$ and $Q$ directly, we need the product of $P$ and $Q$ to be as close to the $R$ matrix as possible, which can be done with a gradient descent algorithm. The objective function minimizes the squared error between the actual ratings and the ratings estimated from $P$ and $Q$.
- $e_{ui}$ is the error, where $u$ is the user’s index and $i$ is the item’s index
- $r_{ui}$ is user $u$’s actual rating of item $i$
- $\hat{r}_{ui}$ is user $u$’s rating of item $i$ as predicted by matrix factorization
With the loss function defined, we can update $q_{ik}$ and $p_{uk}$ with gradient descent. Here $\alpha$ is the learning rate, which determines the step size of each parameter update. The updates are repeated until the error converges, which constitutes the training process.
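Written out, a standard formulation that matches the SGD steps in the class below (with global bias $b$, user and item biases $b_u$, $b_i$, and regularization weight $\beta$):

$$\hat{r}_{ui} = b + b_u + b_i + \sum_{k=1}^{K} p_{uk}\, q_{ik}, \qquad e_{ui} = r_{ui} - \hat{r}_{ui}$$

$$\min \sum_{(u,i):\, r_{ui} > 0} e_{ui}^{2} + \beta \left( \lVert p_u \rVert^{2} + \lVert q_i \rVert^{2} + b_u^{2} + b_i^{2} \right)$$

$$p_{uk} \leftarrow p_{uk} + \alpha \left( e_{ui}\, q_{ik} - \beta\, p_{uk} \right), \qquad q_{ik} \leftarrow q_{ik} + \alpha \left( e_{ui}\, p_{uk} - \beta\, q_{ik} \right)$$

The bias terms get the analogous updates $b_u \leftarrow b_u + \alpha (e_{ui} - \beta\, b_u)$ and $b_i \leftarrow b_i + \alpha (e_{ui} - \beta\, b_i)$.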
```python
class MF():
    # Initialize with the user-movie rating matrix, the number of latent
    # features K, and the alpha and beta hyperparameters
    def __init__(self, R, K, alpha, beta, iterations):
        self.R = R
        self.num_users, self.num_items = R.shape
        self.K = K
        self.alpha = alpha
        self.beta = beta
        self.iterations = iterations

    # Initialize the user-feature and movie-feature matrices, then train
    def train(self):
        self.P = np.random.normal(scale=1./self.K, size=(self.num_users, self.K))
        self.Q = np.random.normal(scale=1./self.K, size=(self.num_items, self.K))

        # Initialize the bias terms
        self.b_u = np.zeros(self.num_users)
        self.b_i = np.zeros(self.num_items)
        self.b = np.mean(self.R[np.where(self.R != 0)])

        # List of training samples
        self.samples = [
            (i, j, self.R[i, j])
            for i in range(self.num_users)
            for j in range(self.num_items)
            if self.R[i, j] > 0
        ]

        # Stochastic gradient descent for the specified number of iterations
        training_process = []
        for i in range(self.iterations):
            np.random.shuffle(self.samples)
            self.sgd()
            mse = self.mse()
            training_process.append((i, mse))
            if (i+1) % 20 == 0:
                print("Iteration: %d ; error = %.4f" % (i+1, mse))
        return training_process

    # Compute the root of the total squared error over all observed ratings
    def mse(self):
        xs, ys = self.R.nonzero()
        predicted = self.full_matrix()
        error = 0
        for x, y in zip(xs, ys):
            error += pow(self.R[x, y] - predicted[x, y], 2)
        return np.sqrt(error)

    # Stochastic gradient descent to optimize the P and Q matrices
    def sgd(self):
        for i, j, r in self.samples:
            prediction = self.get_rating(i, j)
            e = (r - prediction)
            self.b_u[i] += self.alpha * (e - self.beta * self.b_u[i])
            self.b_i[j] += self.alpha * (e - self.beta * self.b_i[j])
            self.P[i, :] += self.alpha * (e * self.Q[j, :] - self.beta * self.P[i, :])
            self.Q[j, :] += self.alpha * (e * self.P[i, :] - self.beta * self.Q[j, :])

    # Get user i's predicted rating for movie j
    def get_rating(self, i, j):
        prediction = self.b + self.b_u[i] + self.b_i[j] + self.P[i, :].dot(self.Q[j, :].T)
        return prediction

    # Fill in the full user-movie rating matrix
    def full_matrix(self):
        return self.b + self.b_u[:, np.newaxis] + self.b_i[np.newaxis, :] + self.P.dot(self.Q.T)
```
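A minimal training sketch, assuming the `data_matrix` built earlier; the hyperparameter values here are illustrative, not prescribed values:

```python
# Hypothetical hyperparameters; K, alpha, beta and iterations all need tuning
mf = MF(data_matrix, K=20, alpha=0.001, beta=0.01, iterations=20)
training_process = mf.train()
predicted_ratings = mf.full_matrix()  # dense 943 x 1680 matrix of predictions
```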
Evaluating the recommendation engine
So far we have focused on defining models, building them, and the algorithms behind them, but you may know less about how to evaluate a model: which metrics exist, what they represent, and how to use them. For any machine learning or deep learning task, it is important to set a measurable target up front. To train a good model, you first need to know what makes a model good, and that requires metrics that genuinely reflect the model’s quality. That is what this section focuses on.
| Actual \ Predicted | T | F |
|---|---|---|
| T | TP | FN |
| F | FP | TN |
- TP (True Positive): the actual label is T and the prediction is T
- FN (False Negative): the actual label is T but the prediction is F (a miss)
- FP (False Positive): the actual label is F but the prediction is T (a false alarm)
- TN (True Negative): the actual label is F and the prediction is F
Accuracy
Accuracy is the proportion of correctly classified samples among all samples. From the confusion matrix, the number of correct predictions is the sum of TP and TN:

$$Acc = \frac{N_{correct}}{N_{total}} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision and recall are a little trickier, but with a bit of attention and some practice it is not hard to understand these two metrics and use them to evaluate models.
Precision
Precision is the proportion of the samples we predict to be positive that are actually positive:

$$Precision = \frac{TP}{TP + FP}$$
Recall
Also called the recall ratio, recall is the proportion of actual positive samples that are predicted to be positive:

$$Recall = \frac{TP}{TP + FN}$$

Recall reflects the model’s ability to recognize positive samples: the higher the recall, the better the model is at identifying them.
Suppose there are 20 tasks, of which 10 are completed on time and 10 are overdue; treat “completed on time” as the positive class, and consider two predictions.

- First prediction
|   | Predicted on time | Predicted overdue |
|---|---|---|
| Actually on time | 2 | 8 |
| Actually overdue | 0 | 10 |
| Metric | Value |
|---|---|
| Accuracy | (2 + 10) / 20 = 0.6 |
| Precision | 2 / (2 + 0) = 1 |
| Recall | 2 / (2 + 8) = 0.2 |
- The second prediction
|   | Predicted on time | Predicted overdue |
|---|---|---|
| Actually on time | 10 | 0 |
| Actually overdue | 10 | 0 |
| Metric | Value |
|---|---|
| Accuracy | 10 / 20 = 0.5 |
| Precision | 10 / (10 + 10) = 0.5 |
| Recall | 10 / (10 + 0) = 1 |
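These numbers are easy to verify in code; a small sketch using scikit-learn’s metric helpers, with the label arrays constructed from the tables above (1 = on time, 0 = overdue):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1] * 10 + [0] * 10  # 10 tasks on time, 10 overdue

# First prediction: 2 of the on-time tasks predicted on time, all others overdue
y_pred1 = [1, 1] + [0] * 8 + [0] * 10
print(accuracy_score(y_true, y_pred1),   # 0.6
      precision_score(y_true, y_pred1),  # 1.0
      recall_score(y_true, y_pred1))     # 0.2

# Second prediction: every task predicted on time
y_pred2 = [1] * 20
print(accuracy_score(y_true, y_pred2),   # 0.5
      precision_score(y_true, y_pred2),  # 0.5
      recall_score(y_true, y_pred2))     # 1.0
```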
What we see is that the recall is 100% while the precision is only 50%: the second model simply predicts every task as on time, so a high recall alone does not mean the model is good. Precision and recall must be weighed together.