Content-Based Recommendation Algorithm | Writing Challenge

This is the 8th day of my participation in the Writing Challenge.

Introduction

The content-based recommendation method is very direct: it recommends items based on their descriptive content, which essentially means directly analyzing and computing the features or attributes of the items and users themselves.

For example, if we know that movie A is a comedy, and we also happen to know that a user likes comedies, we can recommend movie A to that user based on this information.
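To make the idea concrete, here is a minimal sketch that assumes both movies and a user's preferences are represented as plain sets of tags. The helper `recommend_by_tags` and the toy data are hypothetical, chosen only to mirror the comedy example:

```python
def recommend_by_tags(user_likes, movies):
    """Return titles of movies that share at least one tag with the user's liked tags.

    user_likes: set of tags the user is known to like (hypothetical input format)
    movies: dict mapping movie title -> set of tags (hypothetical input format)
    """
    return [title for title, tags in movies.items() if tags & user_likes]


# Toy data: movie A is a comedy, movie B is not
movies = {"Movie A": {"comedy"}, "Movie B": {"horror"}}
print(recommend_by_tags({"comedy"}, movies))  # ['Movie A']
```

Set intersection is the simplest possible matching rule; real systems usually weight tags rather than treating any overlap as a match.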

Implementation steps for content-based recommendation

Portrait construction. As the name suggests, a portrait describes the features of an item or a user; building one essentially means attaching labels (tags) to that user or item.
Item portrait: for example, what labels could describe the movie Wolf Warrior 2?

We could attach a series of labels such as “Action”, “Wu Jing”, “Wu Gang”, “Zhang Han”, “Mainland film”, “domestic”, “patriotic” and “military”.

User portrait: for example, suppose we know a user's movie-watching history: “Wolf Warrior”, “Wolf Warrior 2”, “The Founding of a Party”, “The Founding of an Army”, “The Founding of a Republic”, “Operation Red Sea”, “Fast & Furious 1-8”, and so on. From this history we can infer the user's interest features and express them as labels such as “patriotic”, “war”, “racing”, “action”, “military”, “Wu Jing” and “Han Sanping”.
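One simple way to turn a viewing history into a user portrait is sketched below. It assumes that each watched movie already has an item portrait stored as a set of tags, and it approximates the user's interest in a tag by how many watched movies carry it. The tag assignments and the name `build_user_portrait` are hypothetical illustrations, not real metadata:

```python
from collections import Counter

# Hypothetical item portraits: movie title -> set of tags
item_portraits = {
    "Wolf Warrior": {"action", "military", "Wu Jing"},
    "Wolf Warrior 2": {"action", "military", "patriotic", "Wu Jing"},
    "Operation Red Sea": {"action", "military", "war"},
    "Fast & Furious": {"action", "racing"},
}


def build_user_portrait(history, item_portraits, top_k=3):
    """Count how many watched movies carry each tag and keep the top_k tags with their counts."""
    counts = Counter()
    for title in history:
        # Movies without a known portrait contribute no tags
        counts.update(item_portraits.get(title, set()))
    return counts.most_common(top_k)


history = ["Wolf Warrior", "Wolf Warrior 2", "Operation Red Sea", "Fast & Furious"]
print(build_user_portrait(history, item_portraits))
# [('action', 4), ('military', 3), ('Wu Jing', 2)]
```

Plain counting is only one choice; weighting each movie by the user's rating or by how recently it was watched could be substituted without changing the overall shape.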

Question: Where do the labels come from?

PGC (professionally generated content) item portraits, available at cold start:
- Attributes inherent to the item (available as soon as the item is created): movie title, director, actors, genre, etc.
- Attributes the service provider attaches to the item: for example, short-video topics or Weibo topics drafted by the platform.
- Other channels: for example, data collected by web crawlers.
UGC (user-generated content) item portraits, which are subject to the cold start problem:
- Item attributes that users provide while using the service, such as user comments and Weibo topics created by users.

Item portraits built from PGC content can solve the item cold start problem, because this content exists before any user has interacted with the item.
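The sketch below illustrates this point, assuming an item portrait is simply a set of labels: PGC attributes are always present, while UGC tags are merged in only once users have supplied some. The function name `build_item_portrait` and the sample labels are hypothetical:

```python
def build_item_portrait(pgc_attributes, ugc_tags=None):
    """Merge PGC attributes (known at creation time) with optional UGC tags into one label set."""
    portrait = set(pgc_attributes)
    if ugc_tags:
        # UGC tags only exist after users have interacted with the item
        portrait |= set(ugc_tags)
    return portrait


# A brand-new movie: no user tags yet, but the PGC portrait is already usable
new_movie = build_item_portrait(["action", "Wu Jing", "Mainland film"])
# Later, once users have tagged it
tagged_movie = build_item_portrait(["action", "Wu Jing"], ["patriotic", "military"])
```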

Algorithm flow of content-based recommendation:

- Build item portraits from PGC/UGC content.
- Build user portraits from users' behavior records.
- Based on each user's portrait, find the top-N best-matching items and recommend them.
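A minimal sketch of these three steps, assuming item portraits are tag sets and the user portrait maps each tag to a weight (for example, tag counts from the user's history). Scoring an item by summing the user's weights over its tags is just one simple matching rule chosen for illustration; cosine similarity between TF-IDF vectors is a common alternative. The name `recommend_top_n` and the toy movies are hypothetical:

```python
import heapq


def recommend_top_n(user_portrait, item_portraits, watched, n=2):
    """Score each unwatched item by the sum of the user's weights for its tags; return the n best titles."""
    scores = {
        title: sum(user_portrait.get(tag, 0) for tag in tags)
        for title, tags in item_portraits.items()
        if title not in watched  # do not recommend what the user has already seen
    }
    # nlargest keeps the n highest-scoring titles, best first
    return heapq.nlargest(n, scores, key=scores.get)


user_portrait = {"action": 4, "military": 3, "racing": 1}
item_portraits = {
    "Movie A": {"action", "military"},
    "Movie B": {"racing"},
    "Movie C": {"comedy"},
    "Movie D": {"action"},
}
print(recommend_top_n(user_portrait, item_portraits, watched={"Movie D"}))
# ['Movie A', 'Movie B']
```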

Handling item cold start:

- Build item portraits from PGC content.
- Compute the pairwise similarity between items from their portraits.
- For each item, generate its top-N most similar items for related recommendations, answering questions such as “Which items are similar to this item?” or “Which articles are similar to this article?”
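A minimal sketch of item-to-item similarity, assuming each item portrait is a set of tags and using Jaccard similarity (shared tags divided by all distinct tags). The text above does not name a similarity measure, so Jaccard is a stand-in here; cosine similarity over weighted tag vectors would slot in the same way. `jaccard`, `similar_items` and the toy items are hypothetical:

```python
def jaccard(a, b):
    """Jaccard similarity of two tag sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(item_portraits, n=2):
    """For every item, return the n other items with the most similar portraits."""
    result = {}
    for item, tags in item_portraits.items():
        scored = [
            (other, jaccard(tags, other_tags))
            for other, other_tags in item_portraits.items()
            if other != item
        ]
        # Stable sort: ties keep the original dictionary order
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[item] = [other for other, _ in scored[:n]]
    return result


item_portraits = {
    "A": {"action", "military", "Wu Jing"},
    "B": {"action", "military", "patriotic"},
    "C": {"action", "racing"},
    "D": {"comedy"},
}
print(similar_items(item_portraits)["A"])  # ['B', 'C']
```

Comparing every pair is quadratic in the number of items, so for a large catalog these lists would typically be precomputed offline.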

Example: movie recommendation

import pandas as pd
import numpy as np

- Use each movie's tags in tags.csv as the movie's candidate keywords.
- Compute the TF-IDF value of each movie's tags and select the top-N keywords as the movie's portrait labels.
- Use each movie's genre words directly as portrait labels as well.
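The TF-IDF step can be sketched in plain Python as follows. This assumes a common TF-IDF variant (term frequency = tag count divided by the number of tags on the movie; IDF = log of the number of movies divided by the number of movies carrying the tag); the original does not specify which variant it uses. The toy tags and genres, and the names `tfidf_top_n` and `build_movie_portraits`, are hypothetical:

```python
import math
from collections import Counter


def tfidf_top_n(movie_tags, n=2):
    """movie_tags: movieId -> list of raw tags. Return movieId -> its n tags with the highest TF-IDF."""
    num_movies = len(movie_tags)
    # Document frequency: the number of movies each tag appears on
    doc_freq = Counter()
    for tags in movie_tags.values():
        doc_freq.update(set(tags))

    top = {}
    for movie_id, tags in movie_tags.items():
        counts = Counter(tags)
        scores = {
            tag: (count / len(tags)) * math.log(num_movies / doc_freq[tag])
            for tag, count in counts.items()
        }
        top[movie_id] = sorted(scores, key=scores.get, reverse=True)[:n]
    return top


def build_movie_portraits(movie_tags, movie_genres, n=2):
    """Portrait = top-n TF-IDF keywords plus the movie's genre words."""
    keywords = tfidf_top_n(movie_tags, n)
    return {mid: keywords.get(mid, []) + movie_genres.get(mid, []) for mid in movie_genres}


movie_tags = {
    1: ["funny", "funny", "pixar"],
    2: ["funny", "romance"],
    3: ["pixar", "sequel", "sequel"],
}
movie_genres = {1: ["Comedy", "Animation"], 2: ["Comedy", "Romance"], 3: ["Animation"]}
print(build_movie_portraits(movie_tags, movie_genres, n=1))
# {1: ['funny', 'Comedy', 'Animation'], 2: ['romance', 'Comedy', 'Romance'], 3: ['sequel', 'Animation']}
```

A tag that appears on every movie gets an IDF of log(1) = 0, so it never outranks a more distinctive tag.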

def get_movie_dataset():
    # all-tags.csv is taken from the full ml-latest dataset and is used
    # to supplement the tags available in ml-latest-small
    _tags = pd.read_csv("datasets/ml-latest-small/all-tags.csv", usecols=range(1, 3)).dropna()
    # Collect all tags of a movie into one list: movieId -> [tag, tag, ...]
    tags = _tags.groupby("movieId").agg(list)

    # Load the movie list dataset
    movies = pd.read_csv("datasets/ml-latest-small/movies.csv", index_col="movieId")
    # Split the genre string "A|B|C" into a list of genre words
    movies["genres"] = movies["genres"].apply(lambda x: x.split("|"))

    # Match the corresponding tag data to each movie; movies without tags get NaN
    movies_index = set(movies.index) & set(tags.index)
    new_tags = tags.loc[list(movies_index)]
    ret = movies.join(new_tags)

    # Build a dataset with movie Id, title, genres and tags.
    # Genre words are prepended to the tags; a movie without tag data
    # (NaN after the join) gets an empty tag list instead.
    movie_dataset = pd.DataFrame(
        map(
            lambda x: (x[0], x[1], x[2], x[2] + x[3]) if isinstance(x[3], list) else (x[0], x[1], x[2], []),
            ret.itertuples(),
        ),
        columns=["movieId", "title", "genres", "tags"],
    )
    movie_dataset.set_index("movieId", inplace=True)
    return movie_dataset


movie_dataset = get_movie_dataset()
print(movie_dataset)
