Image from Overview of Recommender Algorithms

The last article, User Behavior Analysis of Recommendation Systems, introduced some basic recommendation algorithms. This article uses the Surprise library to look at these algorithms in more depth.

NormalPredictor

The first prediction method assumes that the rating data come from a normal distribution, which leads to the following estimation problem.

Question: given a set of samples $x_1, x_2, \dots, x_n$ known to come from the Gaussian distribution $N(\mu, \sigma^2)$, estimate the parameters $\mu$ and $\sigma$.

Probability density function of the Gaussian distribution:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Substituting each sample value $x_i$ gives the likelihood:

$$L(x) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$$

Taking the log of $L(x)$ and simplifying:

$$\log L(x) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$

Finally, setting the derivatives with respect to $\mu$ and $\sigma$ to zero yields the maximum-likelihood estimates:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2$$

Surprise's NormalPredictor implements exactly this maximum-likelihood fit and then draws each prediction from the fitted distribution.
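A minimal usage sketch (assuming the built-in ml-100k dataset, which Surprise downloads on first use):

```python
from surprise import Dataset, NormalPredictor
from surprise.model_selection import cross_validate

# Load the built-in MovieLens 100k dataset
data = Dataset.load_builtin("ml-100k")

# NormalPredictor fits mu and sigma on the train set, then draws
# each predicted rating from N(mu, sigma^2)
algo = NormalPredictor()
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```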

BaselineOnly

The second algorithm comes from the paper Factor in the Neighbors: Scalable and Accurate Collaborative Filtering, which pointed out some problems with traditional CF methods:

  • Different items tend to receive systematically higher or lower ratings

  • Different users rate on systematically different scales

  • Ratings change over time

To capture these effects, the following baseline model was proposed:

$$\hat{r}_{ui} = \mu + b_u + b_i$$

where $\mu$ is the global average rating, $b_u$ is the user's bias, and $b_i$ is the item's bias. Fitting the biases is equivalent to solving the following minimization problem:

$$\min_{b_*}\sum_{(u,i)\in K}\left(r_{ui} - \mu - b_u - b_i\right)^2 + \lambda\left(\sum_u b_u^2 + \sum_i b_i^2\right)$$

Decoupling the two biases, as the paper does, gives the following estimates of $b_i$ and $b_u$:

$$b_i = \frac{\sum_{u\in R(i)}(r_{ui}-\mu)}{\lambda_2 + |R(i)|}, \qquad b_u = \frac{\sum_{i\in R(u)}(r_{ui}-\mu-b_i)}{\lambda_3 + |R(u)|}$$

This is what Surprise's BaselineOnly implements; with the ALS option it alternates the two updates above.
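A minimal sketch of running it (again assuming ml-100k; the bsl_options values shown are Surprise's documented ALS defaults):

```python
from surprise import BaselineOnly, Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("ml-100k")

# ALS baseline estimation: alternately update b_u and b_i
bsl_options = {"method": "als", "n_epochs": 10, "reg_u": 15, "reg_i": 10}
algo = BaselineOnly(bsl_options=bsl_options)
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```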

Running the code, you will find better results than with NormalPredictor.

KNNBasic

KNNBasic is the basic neighborhood CF algorithm; it can be user-based or item-based.

It computes either user–user similarity or item–item similarity, then predicts a rating as the similarity-weighted average of the neighbors' ratings:

$$\hat{r}_{ui} = \frac{\sum_{v\in N_i^k(u)}\text{sim}(u,v)\cdot r_{vi}}{\sum_{v\in N_i^k(u)}\text{sim}(u,v)}$$
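A usage sketch (the similarity measure and neighborhood size here are illustrative choices; cosine similarity, user-based, and k = 40 match Surprise's defaults):

```python
from surprise import Dataset, KNNBasic
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("ml-100k")

# user_based=False would switch to item-item similarity instead
sim_options = {"name": "cosine", "user_based": True}
algo = KNNBasic(k=40, sim_options=sim_options)
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```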

You can see that this result is slightly worse than BaselineOnly's.

KNNWithMeans

KNNWithMeans is based on the observation that some users (or items) rate systematically high or low, so the weighted average is computed after removing each neighbor's mean rating:

$$\hat{r}_{ui} = \mu_u + \frac{\sum_{v\in N_i^k(u)}\text{sim}(u,v)\,(r_{vi}-\mu_v)}{\sum_{v\in N_i^k(u)}\text{sim}(u,v)}$$
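Relative to the KNNBasic sketch above, only the class name changes:

```python
from surprise import Dataset, KNNWithMeans
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("ml-100k")

# Deviations from each user's mean are aggregated instead of raw ratings
algo = KNNWithMeans(k=40, sim_options={"name": "cosine", "user_based": True})
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```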

The mean-centered calculation gives a slightly better result than KNNBasic.

KNNBaseline

Building on KNNWithMeans, the baseline value replaces the mean, giving the following prediction formula:

$$\hat{r}_{ui} = b_{ui} + \frac{\sum_{v\in N_i^k(u)}\text{sim}(u,v)\,(r_{vi}-b_{vi})}{\sum_{v\in N_i^k(u)}\text{sim}(u,v)}$$

where $b_{ui} = \mu + b_u + b_i$ is the baseline estimate from BaselineOnly.
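A sketch (Surprise's documentation recommends the pearson_baseline similarity with this model; that choice is an assumption here, not from the original article):

```python
from surprise import Dataset, KNNBaseline
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("ml-100k")

# pearson_baseline computes similarities on baseline-centered ratings
sim_options = {"name": "pearson_baseline", "user_based": True}
algo = KNNBaseline(k=40, sim_options=sim_options)
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```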

It can be seen that the KNNBaseline method gives the best result so far.

SVD

With this algorithm we move on to matrix factorization, starting with the basic SVD method.

First, define the estimate:

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u$$

Then define the loss:

$$\min_{p,q,b}\sum_{(u,i)\in K}\left(r_{ui} - \hat{r}_{ui}\right)^2 + \lambda\left(b_u^2 + b_i^2 + \lVert q_i\rVert^2 + \lVert p_u\rVert^2\right)$$

Finally, the SGD updates:

$$b_u \leftarrow b_u + \gamma(e_{ui} - \lambda b_u)$$
$$b_i \leftarrow b_i + \gamma(e_{ui} - \lambda b_i)$$
$$p_u \leftarrow p_u + \gamma(e_{ui}\,q_i - \lambda p_u)$$
$$q_i \leftarrow q_i + \gamma(e_{ui}\,p_u - \lambda q_i)$$

where $e_{ui} = r_{ui} - \hat{r}_{ui}$ is the residual.

Surprise's SVD class implements these updates.
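A minimal sketch (the hyperparameter values shown are Surprise's documented defaults, mirroring $\gamma$ and $\lambda$ above):

```python
from surprise import SVD, Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("ml-100k")

# n_factors is the latent dimension; lr_all is gamma, reg_all is lambda
algo = SVD(n_factors=100, n_epochs=20, lr_all=0.005, reg_all=0.02)
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```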

SVD++

To improve prediction accuracy, SVD++ takes into account not only explicit feedback but also implicit feedback:

  • Explicit feedback covers behaviors in which users directly express their preference for items; the main forms are ratings and like/dislike;

  • Implicit feedback refers to behaviors that do not directly reflect users' preferences; the most representative one is page browsing;

See the previous article on user behavior analysis of recommendation systems for more details.

Taking implicit feedback into account is especially useful for users who browse constantly but rarely rate.

At this point, modeling gives the following formula:

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + |R(u)|^{-\frac{1}{2}}\sum_{j\in R(u)} y_j\right)$$

where $R(u)$ is the set of items on which user $u$ has implicit behavior, and $y_j$ is the implicit-feedback vector of item $j$. If the user has multiple kinds of implicit behavior, we can add another implicit vector in the same way:

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + |N^{(1)}(u)|^{-\frac{1}{2}}\sum_{j\in N^{(1)}(u)} y_j^{(1)} + |N^{(2)}(u)|^{-\frac{1}{2}}\sum_{j\in N^{(2)}(u)} y_j^{(2)}\right)$$

where $N^{(1)}(u)$ and $N^{(2)}(u)$ are the item sets of the two kinds of implicit behavior.

Parameter update method:

$$b_u \leftarrow b_u + \gamma(e_{ui} - \lambda b_u)$$
$$b_i \leftarrow b_i + \gamma(e_{ui} - \lambda b_i)$$
$$p_u \leftarrow p_u + \gamma(e_{ui}\,q_i - \lambda p_u)$$
$$q_i \leftarrow q_i + \gamma\left(e_{ui}\left(p_u + |R(u)|^{-\frac{1}{2}}\sum_{j\in R(u)} y_j\right) - \lambda q_i\right)$$
$$y_j \leftarrow y_j + \gamma\left(e_{ui}\,|R(u)|^{-\frac{1}{2}}\,q_i - \lambda y_j\right) \quad \forall j\in R(u)$$

Surprise provides this as SVDpp; note that in Surprise the implicit feedback is simply the set of items a user has rated.
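A minimal sketch (hyperparameters are Surprise's documented defaults for SVDpp):

```python
from surprise import SVDpp, Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("ml-100k")

# In Surprise, R(u) is the set of items user u has rated;
# each rated item j contributes an implicit factor vector y_j
algo = SVDpp(n_factors=20, n_epochs=20, lr_all=0.007, reg_all=0.02)
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```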


NMF

Non-negative matrix factorization (NMF), like SVD, factorizes the rating matrix, but constrains the factors to be non-negative; it needs little extra explanation, so let's go straight to the code:
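A minimal sketch with Surprise's NMF (15 factors and 50 epochs are the documented defaults):

```python
from surprise import NMF, Dataset
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("ml-100k")

# Factors stay non-negative thanks to NMF's multiplicative update rule
algo = NMF(n_factors=15, n_epochs=50)
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)
```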

Conclusion

This article introduced two categories of recommendation algorithms through the methods in Surprise: latent factor models and neighborhood-based algorithms. The next article will introduce some recommendation algorithms based on neural networks. Stay tuned.