The original article is reprinted from liu Yue’s Technology blog v3u.cn/a_id_136
In 2020, personalized recommendation is booming. YouTube, Netflix, and even Pornhub, the streaming media giants that dominate the Internet, all rely on recommendation systems to attract traffic and turn it into revenue. E-commerce platforms such as Amazon and Shopify also profit from accurate recommendations. Accurate recommendation shows that streaming media and products are not only a means of distributing content but also a form of communication.
So how do we build our own recommendation system in Python? Here we use collaborative filtering, a memory-based algorithm. It is easy to implement, and its recommendations are highly interpretable. We use user-based collaborative filtering, which focuses on the similarity between users: find the items that similar users like, predict the target user's scores for those items, and recommend the items with the highest predicted scores. For example, Teacher Li and Teacher Yan have similar taste in movies. When a new movie is released and Teacher Li likes it, the system can recommend that movie to Teacher Yan.
Put simply, items serve as the link between users: for two users with high similarity, we find the items that one has bought and the other has not, and recommend them in order.
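The idea of predicting a target user's scores from similar users can be sketched as a similarity-weighted average. This is one common formulation, not necessarily the one this article implements later. The neighbour names, similarity values, ratings, and the function `predict_scores` are all hypothetical and chosen only for illustration.

```python
# Hypothetical similarity of each neighbour to the target user (higher = more similar).
neighbour_similarity = {"neighbour_1": 0.75, "neighbour_2": 0.25}

# Hypothetical ratings the neighbours gave to items the target user has not seen.
neighbour_ratings = {
    "neighbour_1": {"Movie A": 5.0, "Movie B": 3.0},
    "neighbour_2": {"Movie A": 1.0},
}


def predict_scores(similarity, ratings):
    """Return (item, predicted score) pairs, highest score first.

    Each prediction is the average of the neighbours' ratings for that item,
    weighted by how similar each neighbour is to the target user.
    """
    weighted_sum = {}
    weight_total = {}
    for user, items in ratings.items():
        weight = similarity[user]
        for item, rating in items.items():
            weighted_sum[item] = weighted_sum.get(item, 0.0) + weight * rating
            weight_total[item] = weight_total.get(item, 0.0) + weight
    scores = {
        item: weighted_sum[item] / weight_total[item]
        for item in weighted_sum
        if weight_total[item] > 0
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


print(predict_scores(neighbour_similarity, neighbour_ratings))
# [('Movie A', 4.0), ('Movie B', 3.0)]
```

Movie A is rated by both neighbours, so the more similar neighbour's rating of 5.0 dominates the less similar neighbour's 1.0.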
Let's say we run an online mobile phone store and have some purchase and rating records:
phone.txt

    1, Huawei P30, 2.0
    1, Samsung S10, 5.0
    1, Mi9, 2.6
    2, Huawei P30, 1.0
    2, Vivo, 5.0
    2, HTC, 4.6
    3, Meizu, 2.0
    3, iPhone, 5.0
    3, Pixel2, 2.6
User 1 bought three phones: Huawei, Samsung, and Xiaomi. User 2 bought Huawei, Vivo, and HTC. Users 1 and 2 both bought the Huawei phone, so we consider them to have some similarity. User 3 bought completely different phones, so user 3 serves as a check on the recommendation system: based on the purchase histories, user 3's phones should in theory not be recommended to users 1 and 2, and the phones bought by users 1 and 2 should not be recommended to user 3.
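The overlap described above can be checked with plain set intersection. This is a small sketch with the purchase records written inline. The names `purchases` and `shared_phones` are illustrative and do not appear in the article's code.

```python
# Phones bought by each user, taken from phone.txt (ratings omitted here).
purchases = {
    "1": {"Huawei P30", "Samsung S10", "Mi9"},
    "2": {"Huawei P30", "Vivo", "HTC"},
    "3": {"Meizu", "iPhone", "Pixel2"},
}


def shared_phones(user_a, user_b):
    """Return the set of phones that both users have bought."""
    return purchases[user_a] & purchases[user_b]


print(shared_phones("1", "2"))  # {'Huawei P30'}
print(shared_phones("1", "3"))  # set()
print(shared_phones("2", "3"))  # set()
```

Only users 1 and 2 share a phone, which is why user 3 can act as a control.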
The first step is to read the data and format it into a dictionary for easy parsing:
    content = []
    with open('./phone.txt') as fp:
        content = fp.readlines()

    # Write the user, phone, and rating into the dictionary `data`
    data = {}
    for line in content:
        # Skip blank lines
        if not line.strip():
            continue
        # Split on commas and trim whitespace around each field
        line = [field.strip() for field in line.split(',')]
        # If the user is not in the dictionary yet, create an entry keyed by user ID
        if line[0] not in data:
            data[line[0]] = {line[1]: line[2]}
        # Otherwise add the phone and rating to the user's existing entry
        else:
            data[line[0]][line[1]] = line[2]
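To make the resulting structure concrete, here is a self-contained sketch that parses the same records from an in-memory string instead of a file. The helper name `load_ratings` is hypothetical, and `io.StringIO` only stands in for the opened `phone.txt`.

```python
import io

# Inline stand-in for phone.txt so the sketch runs without the file.
sample = io.StringIO(
    "1, Huawei P30, 2.0\n"
    "1, Samsung S10, 5.0\n"
    "1, Mi9, 2.6\n"
    "2, Huawei P30, 1.0\n"
    "2, Vivo, 5.0\n"
    "2, HTC, 4.6\n"
    "3, Meizu, 2.0\n"
    "3, iPhone, 5.0\n"
    "3, Pixel2, 2.6\n"
)


def load_ratings(fp):
    """Build {user_id: {phone: rating}} from comma-separated lines."""
    ratings = {}
    for line in fp:
        if not line.strip():
            continue  # ignore blank lines
        user_id, phone, rating = (field.strip() for field in line.split(","))
        # setdefault creates the inner dictionary the first time a user appears
        ratings.setdefault(user_id, {})[phone] = rating
    return ratings


data = load_ratings(sample)
print(data["1"])
# {'Huawei P30': '2.0', 'Samsung S10': '5.0', 'Mi9': '2.6'}
```

Ratings stay as strings here, as in the article's code, so they must be converted with `float()` before any arithmetic.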
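The second step is to calculate the similarity between two users. We use the Euclidean distance d between their ratings of the phones they have both bought, and convert it to a score 1/(1+d) in the range (0, 1]: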
    from math import sqrt

    def Euclid(user1, user2):
        # Take out the phones and ratings of the two users
        user1_data = data[user1]
        user2_data = data[user2]
        distance = 0
        shared = 0
        # Find the phones both users have purchased and sum the squared rating differences
        for key in user1_data.keys():
            if key in user2_data.keys():
                shared += 1
                # Note: the greater the distance, the less similar the two users are
                distance += pow(float(user1_data[key]) - float(user2_data[key]), 2)
        # Without any phone in common there is nothing to compare, so treat the pair as dissimilar
        if shared == 0:
            return 0
        # The larger the return value, the greater the similarity
        return 1 / (1 + sqrt(distance))

The third step is to calculate how similar the current user is to every other user and rank the results, because among tens of thousands of users we only need the one who is most similar to the current user.

    # Calculate how similar a user is to all other users
    def top_simliar(userID):
        res = []
        for userid in data.keys():
            # Skip comparing the user with themselves
            if not userid == userID:
                simliar = Euclid(userID, userid)
                res.append((userid, simliar))
        # Most similar user first
        res.sort(key=lambda val: val[1], reverse=True)
        return res
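To make the scores concrete, the following sketch computes the similarity of user 1 to the other two users with the ratings written inline. It assumes that a pair of users with no phone in common gets a similarity of 0.0. The function name `similarity` is illustrative.

```python
from math import sqrt

# Ratings from phone.txt, kept as strings as in the loading step.
data = {
    "1": {"Huawei P30": "2.0", "Samsung S10": "5.0", "Mi9": "2.6"},
    "2": {"Huawei P30": "1.0", "Vivo": "5.0", "HTC": "4.6"},
    "3": {"Meizu": "2.0", "iPhone": "5.0", "Pixel2": "2.6"},
}


def similarity(user1, user2):
    """Return 1 / (1 + Euclidean distance) over the phones both users rated."""
    shared = set(data[user1]) & set(data[user2])
    if not shared:
        return 0.0  # assumption: no shared phones means no evidence of similarity
    distance = sqrt(
        sum((float(data[user1][p]) - float(data[user2][p])) ** 2 for p in shared)
    )
    return 1 / (1 + distance)


# Rank the other users by similarity to user 1, most similar first.
ranked = sorted(
    ((other, similarity("1", other)) for other in data if other != "1"),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
# [('2', 0.5), ('3', 0.0)]
```

Users 1 and 2 share only the Huawei P30, rated 2.0 and 1.0, so the distance is 1 and the similarity is 1/(1+1) = 0.5.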
Finally, generate the recommendations:

    def recommend(user):
        # The most similar user
        top_sim_user = top_simliar(user)[0][0]
        # Purchase records of the most similar user
        items = data[top_sim_user]
        recommendations = []
        # Select the phones the target user has not purchased and add them to the list
        for item in items.keys():
            if item not in data[user].keys():
                recommendations.append((item, items[item]))
        # Sort by rating, highest first (compare as numbers, not as strings)
        recommendations.sort(key=lambda val: float(val[1]), reverse=True)
        return recommendations
Finally, run a test:

    print(recommend('1'))

The output is:

    [('Vivo', '5.0'), ('HTC', '4.6')]
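The Vivo and HTC phones are recommended to user 1 in descending order of rating, which matches our basic logic: they come from user 2, the only user who shares a purchase with user 1, and none of user 3's phones appear.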
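The role of user 3 as a control can be turned into an automated check. This is a compact end-to-end sketch, not the article's exact code. The names `similarity` and `recommend_checked` are hypothetical, and the guard that returns an empty list when no user is similar is an added assumption.

```python
from math import sqrt

# Ratings from phone.txt, written inline so the check runs on its own.
data = {
    "1": {"Huawei P30": "2.0", "Samsung S10": "5.0", "Mi9": "2.6"},
    "2": {"Huawei P30": "1.0", "Vivo": "5.0", "HTC": "4.6"},
    "3": {"Meizu": "2.0", "iPhone": "5.0", "Pixel2": "2.6"},
}


def similarity(user1, user2):
    """1 / (1 + Euclidean distance) over shared phones; 0.0 if nothing is shared."""
    shared = set(data[user1]) & set(data[user2])
    if not shared:
        return 0.0
    distance = sqrt(
        sum((float(data[user1][p]) - float(data[user2][p])) ** 2 for p in shared)
    )
    return 1 / (1 + distance)


def recommend_checked(user):
    """Recommend the most similar user's phones; return [] if nobody is similar."""
    others = [u for u in data if u != user]
    best = max(others, key=lambda u: similarity(user, u))
    if similarity(user, best) == 0.0:  # assumed guard, not in the article's code
        return []
    picks = [(p, r) for p, r in data[best].items() if p not in data[user]]
    return sorted(picks, key=lambda pair: float(pair[1]), reverse=True)


# User 3's phones must never be recommended to users 1 and 2.
user3_phones = set(data["3"])
for user in ("1", "2"):
    recommended = {phone for phone, _ in recommend_checked(user)}
    assert not (recommended & user3_phones)

print(recommend_checked("1"))  # [('Vivo', '5.0'), ('HTC', '4.6')]
print(recommend_checked("3"))  # []
```

With the guard in place, user 3 receives no recommendations, because user 3 shares no purchases with anyone.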