Recently I started working with recommender systems. In this post, I'd like to share what I've learned so far.

User-based collaborative filtering

User-based collaborative filtering is the oldest algorithm in recommender systems. It was proposed in 1992 for a mail filtering system and was later applied to news filtering. Put simply, user-based collaborative filtering works in two steps: first find the users whose interests are similar to the target user's, then recommend to the target user the items those similar users like. Finding similar users means computing the similarity between users, and this post focuses on one way to do that: cosine similarity.
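To make the two steps concrete, here is a minimal Python sketch of user-based collaborative filtering. It assumes implicit feedback (a user either is or is not interested in an item), represents each user as a set of items, and reuses the data from the example later in this post with items written in lowercase. The function names `cosine_sets` and `recommend`, the neighbourhood size `k`, and the rule of scoring an item by summing the similarities of the neighbours who like it are my own illustrative choices; this is one common scoring scheme, not the only one.

```python
import math
from collections import defaultdict


def cosine_sets(items_u, items_v):
    """Cosine similarity of two users given as sets of liked items.

    Equivalent to the cosine of their 0/1 interest vectors: shared items
    divided by the square root of the product of the set sizes.
    """
    if not items_u or not items_v:
        return 0.0
    return len(items_u & items_v) / math.sqrt(len(items_u) * len(items_v))


def recommend(user, interests, k=3):
    """Score unseen items for `user` using its k most similar users.

    interests: dict mapping user -> set of items the user is interested in.
    Returns a dict item -> score (items the user already has are excluded).
    """
    # Step 1: similarity between the target user and every other user.
    sims = [(cosine_sets(interests[user], interests[v]), v)
            for v in interests if v != user]
    # Keep the k most similar users; ties are broken by user name.
    neighbours = sorted(sims, key=lambda t: (-t[0], t[1]))[:k]
    # Step 2: each neighbour votes for its items, weighted by similarity.
    scores = defaultdict(float)
    for sim, v in neighbours:
        for item in interests[v] - interests[user]:
            scores[item] += sim
    return dict(scores)


interests = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}
print(recommend("A", interests, k=3))
```

For user A, items c and e each get a score of 1/√6 + 1/3 (about 0.74): c is liked by neighbours B and D, and e by neighbours C and D.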

How cosine similarity works

Cosine similarity uses the cosine of the angle between two vectors in a vector space to measure how different two individuals are. The closer the value is to 1, the closer the angle is to 0°, and the more similar the two vectors are.
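A small Python sketch of the idea, assuming plain lists of numbers of equal length with no zero vectors (the function name `cosine_similarity` is my own). It shows that vectors pointing the same way score 1 regardless of their length, perpendicular vectors score 0, and a 45° angle lands in between.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between vectors x and y (same length, non-zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Same direction (angle 0 degrees): cosine is 1, the most similar.
print(cosine_similarity([3, 4], [6, 8]))  # 1.0
# Perpendicular (angle 90 degrees): cosine is 0.
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
# Angle of 45 degrees: about 0.7071.
print(cosine_similarity([1, 0], [1, 1]))
```

Note that [3, 4] and [6, 8] differ in length but still score 1: cosine similarity only looks at direction, not magnitude.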

Cosine similarity formula

The formula below is taken from a book. Rather than typing the math out online, I'm using a photo of my handwritten version instead.
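Because the handwritten photo is not reproduced here, the standard forms are written out below for reference; the notation may differ from the book's. For two n-dimensional vectors A and B with components A_i and B_i, the first line is the vector form. When each user is a 0/1 interest vector, the dot product counts the items two users share and each squared length counts one user's items, so with N(u) denoting the set of items user u is interested in (my notation), the similarity w_uv between users u and v reduces to the second line.

```latex
\cos\theta = \frac{A \cdot B}{|A|\,|B|}
           = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}

w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}
```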

Memorizing the formula directly may be confusing, so here is a plainer version.

The plain version should already make the idea clear, but now let's derive it step by step.

Deriving the formula

As shown in the figure, A and B are vectors with angle θ between them.

Draw the auxiliary line C, as shown in the figure, connecting the tips of A and B so that the three form a triangle.

Now the problem is to find cos θ, which is straightforward if you remember the law of cosines.
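For reference, with |A|, |B| and |C| as the lengths of the three sides and θ the angle between A and B (the angle opposite side C), the law of cosines and its rearrangement for cos θ are:

```latex
|C|^{2} = |A|^{2} + |B|^{2} - 2\,|A|\,|B|\cos\theta
\quad\Longrightarrow\quad
\cos\theta = \frac{|A|^{2} + |B|^{2} - |C|^{2}}{2\,|A|\,|B|}
```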

Next, place this model in a two-dimensional coordinate system.

The three sides of the resulting triangle then have the following lengths:
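Since the figure with the side lengths is not reproduced, here they are written out, assuming both vectors start at the origin with A = (x_1, y_1) and B = (x_2, y_2) (the coordinate names are my own) and C joining their tips:

```latex
|A| = \sqrt{x_1^{2} + y_1^{2}}, \qquad
|B| = \sqrt{x_2^{2} + y_2^{2}}, \qquad
|C| = \sqrt{(x_1 - x_2)^{2} + (y_1 - y_2)^{2}}
```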

Substituting the lengths of A, B and C into the law of cosines gives the following:
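Written out with the same assumed coordinates, A = (x_1, y_1) and B = (x_2, y_2), the substitution goes as follows. Expanding the squares in |C|², the x_1², x_2², y_1² and y_2² terms cancel and only the cross terms survive:

```latex
\cos\theta
= \frac{(x_1^{2} + y_1^{2}) + (x_2^{2} + y_2^{2}) - \left[(x_1 - x_2)^{2} + (y_1 - y_2)^{2}\right]}
       {2\sqrt{x_1^{2} + y_1^{2}}\,\sqrt{x_2^{2} + y_2^{2}}}
= \frac{2x_1x_2 + 2y_1y_2}{2\sqrt{x_1^{2} + y_1^{2}}\,\sqrt{x_2^{2} + y_2^{2}}}
= \frac{x_1x_2 + y_1y_2}{\sqrt{x_1^{2} + y_1^{2}}\,\sqrt{x_2^{2} + y_2^{2}}}
```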

This derivation is done in two dimensions; extending the same steps to n dimensions yields the plain version of the formula given above.
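The extension to n dimensions can be checked numerically. The Python sketch below (function names are my own) computes cos θ twice for random vectors: once from the three side lengths via the law of cosines, exactly as in the 2-D derivation but with the sums running over n coordinates, and once from the simplified dot-product form. The two agree up to floating-point rounding.

```python
import math
import random


def cos_law_of_cosines(a, b):
    """cos(theta) from the side lengths |A|, |B|, |C|, as in the 2-D derivation."""
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    # Side C joins the tips of A and B, so its length is |A - B|.
    len_c = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return (len_a ** 2 + len_b ** 2 - len_c ** 2) / (2 * len_a * len_b)


def cos_dot_product(a, b):
    """cos(theta) from the simplified form: dot product over product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (2, 3, 10):
    a = [rng.uniform(-1, 1) for _ in range(n)]
    b = [rng.uniform(-1, 1) for _ in range(n)]
    diff = abs(cos_law_of_cosines(a, b) - cos_dot_product(a, b))
    print(f"n={n}: law of cosines vs dot product differ by {diff:.1e}")
```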

A worked example

Suppose user A is interested in items a, b and d; user B in items a and c; user C in items b and e; and user D in items c, d and e. Converting each user's interests into a vector and plugging the vectors into the formula gives the similarity between each pair of users. The calculation is shown in the figure.
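Since the figure with the calculation is not reproduced, here is a small Python sketch that redoes it. It assumes the items are labelled a to e and that each user's vector has a 1 for every item they are interested in and 0 otherwise; building the vectors this way is my assumption about how the conversion was done.

```python
import math
from itertools import combinations

items = ["a", "b", "c", "d", "e"]
interests = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}


def to_vector(liked):
    """0/1 interest vector over `items`: 1 if the user likes the item."""
    return [1 if item in liked else 0 for item in items]


def cosine_similarity(x, y):
    """Cosine of the angle between vectors x and y (same length, non-zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


vectors = {user: to_vector(liked) for user, liked in interests.items()}
# Similarity for every pair of users.
for u, v in combinations(sorted(vectors), 2):
    print(u, v, round(cosine_similarity(vectors[u], vectors[v]), 4))
```

Mathematically, A is equally similar to B and to C (1/√6, about 0.408) and slightly less similar to D (1/3). B and C share no items, so their similarity is 0, while the pairs B and D, and C and D, each come out at 1/√6.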

That's all I have to share about cosine similarity for now. Comments are welcome, and please point out any mistakes in the article so I can deepen my understanding. I wish you bug-free code. Thank you!