A recommendation example with User-CF

In school life, freshmen often ask senior students in the same major questions such as "What books should I buy?" or "How should I schedule my sleep?" The senior students then make recommendations based on their own major and preferences, which is a concrete, real-world case of personalized recommendation. There is an interesting detail here. Younger students generally do not put these questions to senior students in other majors, nor to seniors who have just graduated. Why is this? From their point of view, the knowledge structure and interests of the students one year ahead of them in the same major are the most similar to their own, so those students' recommendations are the most reliable.

Therefore, in a personalized recommendation system, when user A needs a personalized recommendation, the system can first find other users whose interests are similar to A's, and then recommend to A the items that those users like but that A has never heard of. This approach is called the user-based collaborative filtering algorithm (User-CF).

The detailed process

  1. Similarity calculation: find the set of users whose interests are similar to the target user's.
  2. Construct the recommendation matrix: find the items that users in this set like and that the target user has never heard of, and recommend them to the target user.

1. Similarity calculation

The goal of similarity calculation is to find the similarity between the target user and other users with known preferences. Here, collaborative filtering (CF) mainly uses the similarity of user behavior. Given user u and user v, let N(u) be the set of items on which user u has given positive feedback, and let N(v) be the set of items on which user v has given positive feedback.
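As a minimal sketch of how the sets N(u) and N(v) might be built, the snippet below groups a positive-feedback log by user. The log format, the user and item names, and the helper `build_item_sets` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_item_sets(feedback):
    """Map each user to the set of items they gave positive feedback on."""
    liked = defaultdict(set)
    for user, item in feedback:
        liked[user].add(item)
    return dict(liked)


# Hypothetical positive-feedback log as (user, item) pairs.
feedback = [
    ("u", "apple"), ("u", "banana"),
    ("v", "banana"), ("v", "cherry"),
]

N = build_item_sets(feedback)
print(N["u"])  # the set N(u)
print(N["v"])  # the set N(v)
```

Storing each user's items as a set makes the intersection and union operations needed by set-based similarity measures cheap to express.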

The similarity calculation has several details worth noting. Suppose each user attribute is represented by a Boolean value (0 or 1), for example:

Here 0 means no and 1 means yes. The similarity of u and v can then be calculated simply with the Jaccard formula, which is defined as the size of the intersection of A and B divided by the size of their union and takes values in the range [0, 1]. The closer the value is to 1, the more similar A and B are. In particular, when both sets A and B are empty, J(A, B) is defined as 1. The formula is as follows:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
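The Jaccard definition described above can be sketched in a few lines of Python. The function name `jaccard` and the example sets are illustrative assumptions; each set holds the attributes whose Boolean value is 1 for that user.

```python
def jaccard(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty: defined as 1 by convention.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical attribute sets for two users.
likes_u = {"apple", "banana"}
likes_v = {"banana", "cherry"}
print(jaccard(likes_u, likes_v))
```

In this example the two users share one attribute out of three distinct ones, so the similarity is one third.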

Substituting the values above into the formula gives the following result:

Of course, this approach has a flaw: the Jaccard formula cannot express how strongly u and v prefer each attribute. In this case, cosine similarity can be used instead.

Change the values in the table below to indicate specific preferences:

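Then plug the values into the cosine similarity formula, where u and v are the two users' preference vectors:

$$\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$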
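A minimal sketch of cosine similarity over two preference vectors follows. The function name, the example vectors, and the choice to return 0 when either vector is all zeros are assumptions made for illustration, not values from the original tables.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length preference vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumption: a user with no preferences is similar to nobody.
        return 0.0
    return dot / (norm_x * norm_y)


# Hypothetical preference scores for the same three attributes.
prefs_u = [3, 0, 1]
prefs_v = [2, 1, 0]
print(cosine_similarity(prefs_u, prefs_v))
```

Unlike the Jaccard formula, the result here changes when the strength of a preference changes, not only when an attribute flips between 0 and 1.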

Other similarity measures include the Tanimoto coefficient and the Euclidean distance. When the Euclidean distance is used to represent similarity, it is generally converted with a formula of the following form, so that the smaller the distance, the greater the similarity:

$$sim(x, y) = \frac{1}{1 + d(x, y)}$$
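The sketch below turns a Euclidean distance into a similarity score. It assumes the common conversion 1 / (1 + d), which is one of several possible choices; the function names and sample vectors are illustrative.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def euclidean_similarity(x, y):
    """Assumed conversion: 1 / (1 + d), so distance 0 maps to similarity 1."""
    return 1.0 / (1.0 + euclidean_distance(x, y))


print(euclidean_similarity([1, 2], [1, 2]))  # identical vectors
print(euclidean_similarity([0, 0], [3, 4]))  # distance 5
```

With this conversion the score always lies in (0, 1] and shrinks as the distance between the two users grows.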

The Tanimoto coefficient, also known as the extended Jaccard coefficient, is an extension of cosine similarity. It is also used to calculate the similarity of document data:

$$T(x, y) = \frac{x \cdot y}{\lVert x \rVert^2 + \lVert y \rVert^2 - x \cdot y}$$
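As a rough illustration, the Tanimoto coefficient for two vectors can be computed as the dot product divided by the sum of the squared norms minus the dot product. The function name and the handling of two all-zero vectors (returning 1, mirroring the Jaccard convention for empty sets) are assumptions.

```python
def tanimoto(x, y):
    """Tanimoto coefficient: x.y / (|x|^2 + |y|^2 - x.y)."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    if denom == 0:
        # Only possible when both vectors are all zeros.
        return 1.0
    return dot / denom


# On 0/1 vectors the value matches the Jaccard similarity of the
# corresponding sets: one shared attribute out of three gives 1/3.
print(tanimoto([1, 1, 0], [0, 1, 1]))
```

For Boolean vectors the dot product counts the shared attributes and the denominator counts the attributes in the union, which is why the measure reduces to the Jaccard formula in that case.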

2. Construct the recommendation matrix

The similarity calculation above yields the set of users whose interests are similar to the target user's. Suppose user U needs a recommendation for a type of fruit. The similarity between user U and the other users is calculated, and the recommendation matrix is constructed as follows:

Given the recommendation matrix for user U, the User-CF algorithm recommends to the user the items liked by the K users whose interests are most similar to the user's. The following formula measures user u's interest in item i in the User-CF algorithm:

$$p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv} \, r_{vi}$$

Here S(u, K) contains the K users whose interests are closest to those of user u, N(i) is the set of users who have interacted with item i, w_uv is the interest similarity between user u and user v, and r_vi represents user v's interest in item i. Assuming K = 2, the matrix sorted by similarity is as follows:

Then, user U’s interest in peach is calculated as follows:

If only one fruit is recommended to user U, the final result of the User-CF algorithm is pineapple.
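The scoring step can be sketched as follows: take the K most similar users, add up similarity times interest for each item the target user has not seen, and return the highest-scoring items. The function `recommend`, the user names, and every number below are made up for illustration and are not the values from the original example.

```python
def recommend(target, similarity, ratings, k=2, n=1):
    """Rank unseen items by the sum of w_uv * r_vi over the K nearest users."""
    # S(u, K): the K users most similar to the target user.
    neighbours = sorted(similarity, key=similarity.get, reverse=True)[:k]
    seen = set(ratings.get(target, {}))
    scores = {}
    for v in neighbours:
        for item, r_vi in ratings.get(v, {}).items():
            if item in seen:
                continue  # only recommend items the target has not seen
            scores[item] = scores.get(item, 0.0) + similarity[v] * r_vi
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Hypothetical similarities w_uv between target user "U" and three others.
similarity = {"A": 0.9, "B": 0.6, "C": 0.2}
# Hypothetical interest scores r_vi of each user in each fruit.
ratings = {
    "U": {"apple": 1.0},
    "A": {"apple": 1.0, "pineapple": 0.8, "peach": 0.3},
    "B": {"pineapple": 0.5, "peach": 0.6},
    "C": {"peach": 1.0},
}

top = recommend("U", similarity, ratings, k=2, n=1)
print(top)
```

With K = 2 only users A and B contribute, so user C's strong interest in peach is ignored. In this made-up data pineapple scores about 1.02 and peach about 0.63, so pineapple is returned.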

Scope of application

The User-CF algorithm assumes that a user and other similar users share interests and preferences and therefore like similar things; in other words, birds of a feather flock together.

This algorithm suits scenarios where there are relatively few users and their individual interests are not strongly differentiated. A user's new behavior during the recommendation process does not necessarily change the recommendation results. If there are too many users, the cost of computing the user similarity matrix becomes too high. The algorithm also cannot solve the cold-start problem for new users, although new items can be recommended quickly.
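To give a rough sense of why the similarity matrix becomes expensive, the sketch below counts the user pairs that a brute-force approach would have to compare. It assumes every pair of users is compared exactly once; the helper name `pair_count` is illustrative.

```python
import math


def pair_count(num_users):
    """Number of distinct user pairs, n * (n - 1) / 2."""
    return math.comb(num_users, 2)


for n in (1_000, 100_000):
    print(n, pair_count(n))
```

Under this assumption, 1,000 users need 499,500 comparisons while 100,000 users need 4,999,950,000, so the work grows roughly with the square of the number of users.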