This article is reprinted from Liu Yue’s technology blog: v3u.cn/a_id_136

In 2020, personalized recommendation is booming. YouTube, Netflix, and even Pornhub, the streaming giants that dominate the Internet, all rely on recommendation systems to attract traffic and turn it into revenue. E-commerce platforms such as Amazon and Shopify also use accurate recommendations to drive sales. Accurate recommendation shows that streaming media and products are not only a means of distributing content but also a form of communication.

So how do we build our own recommendation system in Python? Here we use collaborative filtering, a memory-based algorithm that is easy to implement and whose results are highly interpretable. Specifically, we use user-based collaborative filtering, which focuses on the similarity between users: find the items that similar users like, predict the target user's score for each of those items, and recommend the highest-scoring ones. For example, Teacher Li and Teacher Yan have similar taste in movies. When a new movie is released and Teacher Li likes it, it can be recommended to Teacher Yan.

Put simply, purchased items serve as the link between users: for two users with high similarity, we look for the items one has bought and the other has not, and recommend them in ranked order.

Suppose we run an online mobile phone store with some purchase and rating records:

phone.txt

1,Huawei P30,2.0
1,Samsung S10,5.0
1,Mi9,2.6
2,Huawei P30,1.0
2,Vivo,5.0
2,HTC,4.6
3,Meizu,2.0
3,iPhone,5.0
3,Pixel2,2.6

User 1 bought three phones: the Huawei P30, the Samsung S10, and the Mi9. User 2 bought the Huawei P30, the Vivo, and the HTC. Users 1 and 2 both bought the Huawei phone, so we consider them similar to some degree. User 3 bought completely different phones, so user 3 serves as a sanity check on the recommendation system: based on the purchase histories, user 3's phones should not be recommended to users 1 and 2, and the phones bought by users 1 and 2 should not be recommended to user 3.

The first step is to read the data and format it into a dictionary for easy parsing:

# Read every record from the data file
with open('./phone.txt') as fp:
    content = fp.readlines()

# Write the user, phone, and rating into the dictionary `data`
data = {}
for line in content:
    # Skip blank lines such as a trailing newline at the end of the file
    if not line.strip():
        continue
    # Split on commas and trim stray spaces around each field
    line = [field.strip() for field in line.split(',')]
    # If the user is not in the dictionary yet, create an entry keyed by the user ID
    if line[0] not in data:
        data[line[0]] = {line[1]: line[2]}
    # Otherwise add the phone and its rating to the user's existing entry
    else:
        data[line[0]][line[1]] = line[2]
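To see the shape of the resulting dictionary without needing the file on disk, here is a minimal sketch that parses the same nine records from an in-memory string. The `SAMPLE` constant and the `parse_ratings` helper are hypothetical names introduced only for this illustration, and the comma-only separator is an assumption about the file format.

```python
# Hypothetical in-memory copy of phone.txt: user ID, phone, rating
SAMPLE = """\
1,Huawei P30,2.0
1,Samsung S10,5.0
1,Mi9,2.6
2,Huawei P30,1.0
2,Vivo,5.0
2,HTC,4.6
3,Meizu,2.0
3,iPhone,5.0
3,Pixel2,2.6
"""


def parse_ratings(text):
    """Build {user_id: {phone: rating}} from comma-separated records."""
    ratings = {}
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        user_id, phone, rating = [field.strip() for field in raw.split(',')]
        # Ratings stay as strings here, mirroring the article's dictionary
        ratings.setdefault(user_id, {})[phone] = rating
    return ratings


ratings = parse_ratings(SAMPLE)
print(ratings['1'])
# {'Huawei P30': '2.0', 'Samsung S10': '5.0', 'Mi9': '2.6'}
```

Each user ID maps to a nested dictionary of phone to rating, which makes the "has this user bought this phone" check a simple key lookup.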

The second step is to calculate the similarity between two users, using the Euclidean distance over the phones both of them have rated:

from math import sqrt, inf

def Euclid(user1, user2):
    # Look up the phones and ratings of the two users
    user1_data = data[user1]
    user2_data = data[user2]
    # Find the phones both users have purchased
    shared = [key for key in user1_data if key in user2_data]
    # Users with no phone in common cannot be compared, so treat them as infinitely far apart
    if not shared:
        return inf
    # Sum the squared rating differences over the shared phones
    distance = 0
    for key in shared:
        distance += pow(float(user1_data[key]) - float(user2_data[key]), 2)
    # Return the Euclidean distance: the smaller the value, the more similar the two users
    return sqrt(distance)
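As a worked example of the calculation, users 1 and 2 share exactly one phone, the Huawei P30, which they rated 2.0 and 1.0, so their Euclidean distance is sqrt((2.0 - 1.0)^2) = 1.0. The sketch below is self-contained and does not rely on the article's `data` dictionary. `euclidean_distance` and `to_similarity` are hypothetical helper names, and mapping a distance d to 1 / (1 + d) is one common convention for turning a distance into a score where larger means more similar.

```python
from math import sqrt, inf


def euclidean_distance(ratings_a, ratings_b):
    """Distance over the items both users rated; inf if they share none."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return inf
    return sqrt(sum((ratings_a[k] - ratings_b[k]) ** 2 for k in shared))


def to_similarity(distance):
    """Map a distance in [0, inf] to a score in [0, 1]; larger means more similar."""
    return 1 / (1 + distance)


user1 = {'Huawei P30': 2.0, 'Samsung S10': 5.0, 'Mi9': 2.6}
user2 = {'Huawei P30': 1.0, 'Vivo': 5.0, 'HTC': 4.6}
user3 = {'Meizu': 2.0, 'iPhone': 5.0, 'Pixel2': 2.6}

d12 = euclidean_distance(user1, user2)
d13 = euclidean_distance(user1, user3)
print(d12, to_similarity(d12))  # 1.0 0.5
print(d13, to_similarity(d13))  # inf 0.0
```

Handling the "no shared items" case explicitly matters: without it the summed distance would be 0, which would make two users with nothing in common look identical.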

The third step is to compare the current user with every other user. There may be tens of thousands of users, but we only need the one who is most similar to the current user, so we sort the results and put the best match first.

# Calculate how similar a user is to every other user
def top_simliar(userID):
    res = []
    for userid in data.keys():
        # Skip the comparison of the user with themselves
        if userid != userID:
            simliar = Euclid(userID, userid)
            res.append((userid, simliar))
    # Sort ascending by the value returned by Euclid so the best match comes first
    res.sort(key=lambda val: val[1])
    return res
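When there are many users, only the few closest ones matter, so sorting the whole list is more work than needed. The sketch below uses `heapq.nsmallest` from the standard library to keep just the k nearest users. The names `nearest_users` and `distance`, and the inline `ratings` dictionary, are hypothetical and exist only for this illustration.

```python
import heapq
from math import sqrt, inf


def distance(ratings_a, ratings_b):
    """Euclidean distance over shared items; inf if nothing is shared."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return inf
    return sqrt(sum((ratings_a[k] - ratings_b[k]) ** 2 for k in shared))


def nearest_users(ratings, target, k=1):
    """Return the k (user_id, distance) pairs closest to the target user."""
    candidates = (
        (other, distance(ratings[target], ratings[other]))
        for other in ratings
        if other != target  # never compare a user with themselves
    )
    return heapq.nsmallest(k, candidates, key=lambda pair: pair[1])


ratings = {
    '1': {'Huawei P30': 2.0, 'Samsung S10': 5.0, 'Mi9': 2.6},
    '2': {'Huawei P30': 1.0, 'Vivo': 5.0, 'HTC': 4.6},
    '3': {'Meizu': 2.0, 'iPhone': 5.0, 'Pixel2': 2.6},
}

print(nearest_users(ratings, '1', k=1))  # [('2', 1.0)]
```

For a small k this avoids sorting every candidate, and the generator keeps only one candidate pair in memory at a time.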

Finally, generate the recommendations:

def recommend(user):
    # The most similar user is the first entry in the sorted list
    top_sim_user = top_simliar(user)[0][0]
    # Purchase records of the most similar user
    items = data[top_sim_user]
    recommendations = []
    # Select the phones that the target user has not purchased and add them to the list
    for item in items.keys():
        if item not in data[user].keys():
            recommendations.append((item, items[item]))
    # Sort by rating, highest first; ratings are stored as strings, so compare them as numbers
    recommendations.sort(key=lambda val: float(val[1]), reverse=True)

    return recommendations

Finally, run a test:

print(recommend('1'))

[('Vivo', '5.0'), ('HTC', '4.6')]

The Vivo and HTC phones are recommended to user 1 in descending order of rating, which matches the logic we expected.
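The introduction mentions predicting the target user's score for each item. One way to do that, sketched here under stated assumptions, is a similarity-weighted average over all other users rather than copying the ratings of a single nearest neighbour. The `similarity` and `predict_scores` functions and the inline `ratings` dictionary are hypothetical names for this illustration, and the 1 / (1 + distance) similarity is an assumed convention, with 0.0 for users who share nothing.

```python
from math import sqrt


def similarity(ratings_a, ratings_b):
    """1 / (1 + Euclidean distance) over shared items; 0.0 if none are shared."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return 0.0
    dist = sqrt(sum((ratings_a[k] - ratings_b[k]) ** 2 for k in shared))
    return 1 / (1 + dist)


def predict_scores(ratings, target):
    """Similarity-weighted average rating for each item the target has not rated."""
    totals, weights = {}, {}
    for other, items in ratings.items():
        if other == target:
            continue
        sim = similarity(ratings[target], items)
        if sim == 0.0:
            continue  # users with nothing in common contribute nothing
        for item, rating in items.items():
            if item not in ratings[target]:
                totals[item] = totals.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    scores = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ratings = {
    '1': {'Huawei P30': 2.0, 'Samsung S10': 5.0, 'Mi9': 2.6},
    '2': {'Huawei P30': 1.0, 'Vivo': 5.0, 'HTC': 4.6},
    '3': {'Meizu': 2.0, 'iPhone': 5.0, 'Pixel2': 2.6},
}

print(predict_scores(ratings, '1'))
print(predict_scores(ratings, '3'))
```

With the sample data this ranks Vivo ahead of HTC for user 1 and returns an empty list for user 3, which is the behaviour the sanity-check user is meant to confirm: nothing crosses between user 3 and the other two.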
