Recommendation system based on matrix decomposition algorithm
Recommendation system
The recommendation system can recommend different things to users according to their preferences.
Recommended system type:
- Set the recommended content manually
- Rank items by sales, exposure, etc., and recommend them to users
- According to different algorithms, integrate data of different dimensions to recommend items intelligently
Simple recommendation system model
Set:
U stands for all users
P is the collection of all items
R is the user’s preference for the item
Model(R) = U * P
Algorithm core:
Through the user’s rating of different items, to predict the user’s preference for other items. The attributes of users and items, such as age, gender, education background, job, price, category and appearance, are not considered here.
Through the user to the item score, can establish a recommendation value matrix, then can calculate the matrix to predict the user’s preference, that is, matrix decomposition algorithm!
Matrix decomposition:
The recommended value matrix R is decomposed into matrix U and matrix P, so that the product of U and P results in elements in the new matrix R* that are very close to the values of known elements in R, so the values in R* corresponding to unknown elements in R are the predicted values.
Recommended value matrix:
A brief history of time | Wanli thirty years | Daqin empire | A dream of red mansions | Mathematics a brief history of | |
---|---|---|---|---|---|
Xiao Ming | 1 | 4 | 1 | ||
wang | 2 | 2 | 4 | ||
Xiao li | 4 | 1 | 4 | ||
Xiao zhang, | 5 | 1 | 4 |
Key issues of recommendation value matrix:
- Initial value acquisition, data collection
- Predict unknown data from known data in the recommendation value matrix
- Establish an evaluation system to verify the effectiveness of the recommendation system
To collect data
Generally, we can adopt the way of web crawler, for example, to score data, we can crawl the data on Douban reading, and we can also collect user information by burying points on websites that we can control.
Predicting unknown data
Key challenges:
-
When the number of users and items are relatively large, usually recommend the matrix is a sparse matrix (in the matrix, the number of elements if a value of 0 is far more than the number of non-zero elements and non-zero element distribution without law, says that the matrix of the sparse matrix), shows that most users may not express preferences for most of the items.
-
Cold start is a problem that every recommendation system needs to face.
Example of matrix decomposition:
That is:
By comparing the leftmost element matrix with the rightmost prediction matrix, the element value in the missing value position of the original matrix is the predicted value.
And you can also get
That is, the preference data of the item at position ij can be represented by the portrait vector of the ith user and the portrait vector of the JTH item.
The graphical representation is as follows:
The mathematical meaning of K is the rank of matrix decomposition, and the business meaning is the k factors that affect the user’s rating of items. At present, we cannot directly know the value of K, and generally adopt the way of cross verification to find the optimal value of K during model training.
We can use the sum variance as the loss function
This is done by using the {}, calculate the “and variance” to minimize it, that is, the closer the predicted value is to the true value. That gives us the values of U and P that we need.
The gradient of the loss function
Separate extraction error
The derivative of the error L on U and P can be obtained
Now that we know the gradient (derivative) of the loss function, we can use gradient descent to solve for U and P.
Gradient descent method
The optimal solution can be obtained by randomly selecting a starting point and continuing training in the direction of negative gradient until the gradient of the loss function is closer and closer to zero.
Introduce regularization
In order to prevent over-fitting, regularization parameters are added to the loss function
Lambda > 0
In this way, when U and P are both guaranteed to be relatively small, and the value of U or P changes dramatically, the dot product of U and P will not change much.
The final loss function is:
+
The gradient of the final loss function is:
The gradient descent method is used to find the optimal solution
The gradient descent rate γ (learning rate) and k value were set, and U and P were randomly initialized. The training was repeated until the error was satisfied.
Evaluation and recommendation system
- Basically, we train the model through the training set, test the model through the test set, and if the model’s performance on the test set meets our expectations, the model can be deployed online. The average absolute deviation is generally used to verify the predicted value of the model
N: The total number of recommended values in the test set
: Recommended value of item P by the real user U
: The predicted user u’s recommended value for item P
- Online A/B testing
The project of actual combat
The data set format is as follows:
1 1119 9.000000
1 167 8.000000
1 6265 8.000000
1 1440 9.000000
1 1427 9.000000
1 5404 8.000000
1 259 7.000000
1 4156 8.000000
2 419 9.000000
2 415 10.000000
2 2834 9.000000
2 228 10.000000
2 107 10.000000
2 440 9.000000
2 44 10.000000
2 455 10.000000
Copy the code
The first column is user ID, the second column is item ID, and the third column is the corresponding score (1-10)
The overall code is based on the Surprise library and can be installed first
pip install scikit-surprise
Copy the code
Let’s import the related libraries and datasets
import numpy as np
import surprise
from surprise import BaselineOnly
from surprise import Dataset
from surprise import Reader
from surprise import accuracy
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split
reader = Reader(line_format='user item rating', sep='\t', rating_scale=(1.10))
data = Dataset.load_from_file('book_ratings.dat.txt', reader=reader)
# Randomly divide the data into training and test data sets
trainset, testset = train_test_split(data, test_size=25.)
Copy the code
According to the formula, define the algorithm function
class MatrixFactorization(surprise.AlgoBase):
def __init__(self, lr, n_epochs, n_factors, lmd):
self.lr = lr # Learning rate of gradient descent
self.n_epochs = n_epochs # Number of iterations of gradient descent method
self.n_factors = n_factors # The rank of the decomposed matrix, which is the hidden factor that affects the user's rating
self.lmd = lmd # regularize parameters
def fit(self, trainset):
print("Fitting data...")
Initialize u and p matrices randomly
u = np.random.normal(0.1., (trainset.n_users, self.n_factors)) # mean 0, variance 0.1, (number of rows, number of columns)
p = np.random.normal(0.1., (trainset.n_items, self.n_factors))
Gradient descent
for _ in range(self.n_epochs):
print("Round:", _)
for i, j, r_ij in trainset.all_ratings():
# Here is just applying the formula above
# u_old[i] = u[i]
err = r_ij - np.dot(u[i], p[j])
u[i] -= -self.lr * err * p[j] + self.lr * self.lmd * u[i]
p[j] -= -self.lr * err * u[i] + self.lr * self.lmd * p[j]
self.u, self.p = u, p
self.trainset = trainset
print("End fitting!")
def estimate(self, i, j):
if self.trainset.knows_user(i) and self.trainset.knows_item(j):
return np.dot(self.u[i], self.p[j])
else:
return self.trainset.global_mean # return average value
Copy the code
Finally, train, predict and evaluate
Algo = MatrixFactorization(0.005, 60, 3, 0.2) Algo.fit (trainset) Predictions = Algo.test (testset) accuracy. Mae (Predictions)Copy the code
We can adjust the learning rate, the number of iterations, the number of hidden factors and regularization parameters to train different models and evaluate the results to obtain satisfactory models.