Train of thought
Suppose we have user A’s preference list SA[1,2,3] and user B’s preference list SB[3,4,5]. By using len(SA^SB)/len(SA), we can easily get A representation of the similarity between B and a. although it is not necessarily precise, it is better than simplicity.
127.0.0.1:6379> sinter a:like b:like 1"3"Len of SA to the SB is equal to 1 and len of SA is equal to 3 so the similarity is 1/3Copy the code
For A certain user A, use A to calculate the similarity of other users C[2,3,6],D[1,2,3,4],E[1,2,3,4,5,6,7]. By sorting the list, the similarity priority list of user A [D,E,C,B] can be obtained.
So the order is D, E, C, B, rightCopy the code
In the difference set operation, the difference set of B relative to A is the rest of B except the union set. In this scenario, it is generally speaking to recommend items that A may like from similar friends. Sdiff key1 key2, find the difference between key1 and key2
127.0.0.1:6379> sdiff d:like a:like
1) "4"
127.0.0.1:6379> sdiff e:like a:like
1) "4"
2) "5"
3) "6"
4) "Seven"127.0.0.1:6379> sdiff C :like a:like 1)"6"
127.0.0.1:6379> sdiff b:like a:like
1) "4"
2) "5"
(integer2)Copy the code
At this point, you take all of the differences and you get user A’s list of recommendations
Simple implementation
I made a simple implementation version with Golang, and I put the code here
The interesting points encountered in the implementation process are, respectively, the calculation logic of similarity, how to get the complete set of differences between a user and all his friends, and how to recommend items to the user
Computing logic of similarity
Len (SA^SB)/len(SA) = len(SA^SB)/len(SA) = len(SA^SB
- There is little overlap between elements A and B, such as [1,3,5,7,9] and [2,3,6,8,10].
- The elements of A and B partially overlap, for example [1,3,5,7,9] and [2,3,5,7,10]
- The elements of A and B coincide completely, and A small number of elements without A exist in B, such as [1,2,3] and [1,2,3,4].
- The elements of A and B coincide perfectly and there are A large number of elements in B that do not exist in A, such as [1,2,3] and [1,2,3,4,5,6,7,8].
So if you just take len of SA to the SB over len of SA, then the fourth case would be considered very similar, which is not the right way to do it, so you subtract the proportion of B that’s out of intersection, So it becomes len(SA^SB)/len(SA) – len(SB-SA^SB)/len(SB).
So, if you want to say that len(SA^SB)/len(SA) is equal to the degree of A, and -len(sb-sa ^SB)/len(SB) is equal to the degree of A minus B, then len(SA^SB)/len(SB), Also, the intersection can represent the degree value of B. If both the degree value representing A and the degree value representing B are used, intuition will be more accurate
So the formula becomes len(SA^SB)/len(SA)-len(sb-sa ^SB)/len(SB)+len(SA^SB)/len(SB). Into a plen (SA) ^ SB/len (SA) – qlen (SB – SA ^ SB)/len (SB) + r * len (SA) ^ SB/len (SB.)
However, to obtain a very accurate similarity calculation algorithm is really not easy, the choice of parameter value is troublesome, but lack of a little necessary mathematical knowledge to derive this value (every time HERE I envy graduate students), so I randomly tested, and finally determined a method
if float64(len(s))/float64(likeLen) >= 1 {
similarity = 0.66 * float64(len(s))/float64(likeLen) - 0.38 *float64(friendLikeLen-len(s))/float64(friendLikeLen) + 0.45 * float64(len(s))/float64(friendLikeLen)
}else{
similarity = float64(len(s))/float64(likeLen) - 0.3 *float64(friendLikeLen-len(s))/float64(friendLikeLen) + 0.35 * float64(len(s))/float64(friendLikeLen)
}
Copy the code
LikeLen is the length of the user’s preference list, friendLikeLen is the length of the friend’s preference list,len(s) is the length of the intersection, the code general idea is that if B covers A far, p=0.66,q=0.38,r=0.45, if B does not cover A, = 0.3, p = 1, q, r = 0.35
Gets the complete set of differences between a user and all of his friends
After sorting the friends according to similarity, traverse the friend list and calculate the difference set for each friend and user. How should the difference set obtained get a complete set recommended to users
The first thing that comes to mind is the brute force method. Every time we get a difference set, we use the redis sadd command. The pseudo-code is as follows
for friend := friends {
list := redis.sdiff(friend, user)
redis.add(user:recommend, list)
}
Copy the code
This ensures order and is easy to implement, but the downside is that users need to operate Redis as many times as they have friends, which is not a good option
Therefore, based on this, I came up with a method that first stores all the difference sets into a slice, and finally only needs to execute sadd once when the for loop exits. The pseudocode is as follows
for friend := friends {
list := redis.sdiff(friend, user)
recommendList.append(list)
}
redis.sadd(recommendList)
Copy the code
After careful consideration, if a user has a large number of friends and a large list of friends’ preferences, each friend’s preferences are almost the same, then in the case of controlling memory is not full, the recommendation list of users stored in Redis will appear in extreme cases, and there are not many options.
For example, [1,2,3….,1,2,3], n 123s, and you end up with a list of user recommendations of 1,2,3, so you still have to get a set in your code
[]int = map[int]struct = []int
m := make(map[string] *struct{},len(simFriends))
indexes := make([]string.0.len(simFriends))
for _, friend := range simFriends {
redisClient.Append("sdiff", fmt.Sprintf(userLikeKey, friend.ID), fmt.Sprintf(userLikeKey, userID))
r := redisClient.GetReply()
iferr := r.Err; err ! =nil {
panic(err)
}
values, err := r.List()
iferr ! =nil {
panic(err)
}
for _, val := range values {
ifm[val] ! =nil {
continue
}
m[val] = &struct{}{}
indexes = append(indexes, val)
if len(indexes) >= 500 {
break
}
}
}
redisClient.Append("lpush", fmt.Sprintf(userRecommendKey,userID), indexes)
Copy the code
A space for time approach ensures the order of recommendations, and to some extent also ensures the number of recommendations. As you may notice carefully, sadd is not used, for the reason in the next section
Recommend items to users
There are two kinds of recommendation scenarios that come to my mind. One is the one-by-one recommendation similar to netease Cloud Private FM, and the other is the recommendation similar to Taobao and Jingdong, which guess how many you like
When using the data structure of Redis set, it can be recommended one by one, but it is either srandmember command, random members, not sorted by similarity, or Spop key 1. Once recommended, it will not be recommended twice, which may not be very friendly
Then I realized that I had already done the set operation in my code. I could store it in Redis even if it was not a set. I could use the list structure, whether it was Lrange, Lindex or extreme rPOP
func getUserRecommend(userID,pageSize,page int) []string{
redisClient.Append("lrange", fmt.Sprintf(userRecommendKey,userID), pageSize*(page - 1),pageSize*page - 1)
r := redisClient.GetReply()
ifr.Err ! =nil {
log.Println(r.Err)
return nil
}
recommendList,err := r.List()
iferr ! =nil {
log.Println(err)
return nil
}
return recommendList
}
Copy the code
conclusion
Mainly added to the redis data structure applicable scenarios thinking. In addition, the more I contact with interest recommendations, the more interested I am in postgraduate research topics, the more I have the desire to study part-time, and the more I am in awe of the gods in mathematics and computer fields.
Level is limited, welcome to clap brick