This is the 8th day of my participation in the Gwen Challenge.
Introduction
Content-based recommendation is a very direct method. It makes recommendations from the content descriptions of items, which in essence means directly analyzing and computing over the features or attributes of the items and the users themselves.
For example, if we know that movie A is a comedy, and we also know that a user likes to watch comedies, we can recommend movie A to that user based on this known information.
Implementation steps for content-based recommendation
- Profile construction. As the name suggests, a profile (or portrait) describes the features of an item or a user. In essence, it means labeling a user or an item.
  - Item profile: for example, which tags fit the movie Wolf Warrior 2?
    "Action", "Wu Jing", "Wu Gang", "Zhang Han", "Mainland film", "domestic", "patriotic", "military", and a series of other labels can be attached.
  - User profile: for example, suppose a user's viewing history is known: "Wolf Warrior 1", "Wolf Warrior 2", "The Founding of a Party", "The Founding of an Army", "The Founding of a Republic", "Operation Red Sea", "Fast & Furious 1-8", and so on. From this history we can infer the user's interest features, such as the labels "patriotic", "war", "racing", "action", "military", "Wu Jing", and "Han Sanping".
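As a rough illustration of deriving a user profile from viewing history, the sketch below counts how often each label appears across the items a user has watched and keeps the most frequent ones. The `item_profiles` data, the `build_user_profile` helper, and the label choices are hypothetical and only serve the example; a real system would likely weight labels by ratings or recency as well.

```python
from collections import Counter

# Hypothetical item profiles: each movie maps to a list of labels.
item_profiles = {
    "Wolf Warrior 1": ["action", "military", "patriotic", "Wu Jing"],
    "Wolf Warrior 2": ["action", "military", "patriotic", "Wu Jing"],
    "Operation Red Sea": ["action", "military", "war"],
    "Fast & Furious 7": ["action", "racing"],
}


def build_user_profile(history, item_profiles, top_n=3):
    """Count the labels of every watched item and keep the top_n most frequent."""
    counts = Counter()
    for title in history:
        # Items without a known profile contribute no labels.
        counts.update(item_profiles.get(title, []))
    return counts.most_common(top_n)


history = ["Wolf Warrior 1", "Wolf Warrior 2", "Operation Red Sea", "Fast & Furious 7"]
user_profile = build_user_profile(history, item_profiles)
print(user_profile)
```

The resulting list of (label, count) pairs can be treated as a weighted user profile: the more often a label occurs in the history, the stronger the inferred interest.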
Question: Where do the labels come from?
- PGC (professionally generated content): item profiles, which address cold start
  - Attributes inherent to the item (available as soon as the item is created): movie title, director, actors, genre, etc.
  - Attributes set by the service provider (attached to the item by the platform): for example short-video topics and Weibo topics (drafted by the platform)
  - Other channels: for example web crawlers
- UGC (user-generated content): subject to the cold start problem
  - Attributes supplied by users while they use the service: for example user comments and Weibo topics (drafted by users)
Item profiles built from PGC content can solve the item cold start problem, because these attributes are available before any user has interacted with the item.
Content-based recommendation algorithm flow:
- Build item profiles from PGC/UGC content
- Generate user profiles from user behavior records
- Using the user's profile, find the top-N best-matching items and recommend them
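The matching step in this flow could be sketched as follows. This is a minimal example under stated assumptions: the user profile is a dictionary of label weights, each item profile is a list of labels, and an item's score is simply the sum of the user's weights for the labels it carries. The `recommend` function, the sample data, and the scoring rule are all hypothetical.

```python
def recommend(user_profile, item_profiles, watched, top_n=2):
    """Score each unseen item by the summed weight of the labels it shares with the user."""
    scores = {}
    for title, labels in item_profiles.items():
        if title in watched:
            continue  # do not recommend what the user has already seen
        score = sum(user_profile.get(label, 0) for label in set(labels))
        if score > 0:
            scores[title] = score
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical weighted user profile and candidate items.
user_profile = {"action": 4, "military": 3, "patriotic": 2}
item_profiles = {
    "Wolf Warrior 2": ["action", "military", "patriotic"],
    "Operation Red Sea": ["action", "military", "war"],
    "Fast & Furious 8": ["action", "racing"],
    "Some Romantic Comedy": ["comedy", "romance"],
}

recommendations = recommend(user_profile, item_profiles, watched={"Wolf Warrior 2"})
print(recommendations)
```

Items that share no label with the user profile are dropped rather than ranked last, so the list may contain fewer than `top_n` entries.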
Handling item cold start:
- Build item profiles from PGC content
- Compute the pairwise similarity between items from their profiles
- For each item, generate its top-N most similar items for related recommendations: for example, which products are similar to this product? Which articles are similar to this article?
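One possible way to realize these steps is shown below. The text does not say which similarity measure is used, so this sketch assumes Jaccard similarity over label sets (intersection size divided by union size); cosine similarity over TF-IDF vectors would be another common choice. The `jaccard` and `similar_items` helpers and the sample profiles are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two label collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(item_profiles, top_n=2):
    """For every item, return its top_n most similar other items."""
    result = {}
    for title, labels in item_profiles.items():
        scored = [
            (other, jaccard(labels, other_labels))
            for other, other_labels in item_profiles.items()
            if other != title
        ]
        # Most similar first; ties are broken alphabetically.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        result[title] = scored[:top_n]
    return result


# Hypothetical item profiles built from PGC attributes.
item_profiles = {
    "Wolf Warrior 2": ["action", "military", "patriotic"],
    "Operation Red Sea": ["action", "military", "war"],
    "Fast & Furious 8": ["action", "racing"],
    "Some Comedy": ["comedy"],
}

similar = similar_items(item_profiles)
print(similar["Wolf Warrior 2"])
```

Because the similarity depends only on item attributes, a brand-new item can receive related recommendations before any user has interacted with it.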
Example: movie recommendation
```python
import pandas as pd
import numpy as np
```
- Use each movie's tags from tags.csv as the movie's candidate keywords
- Use TF-IDF to compute a TF-IDF value for each of a movie's tags, and select the top-N keywords as the movie's profile labels
- Also use the movie's genre terms directly as profile labels
```python
def get_movie_dataset():
    # Load the tags for all movies.
    # all-tags.csv comes from the ml-latest dataset and is used here to
    # supplement the tag data shipped with ml-latest-small.
    # Keep only the movieId and tag columns (assumes the MovieLens layout:
    # userId, movieId, tag, timestamp).
    _tags = pd.read_csv(
        "datasets/ml-latest-small/all-tags.csv", usecols=range(1, 3)
    ).dropna()
    tags = _tags.groupby("movieId").agg(list)

    # Load the movie list dataset.
    movies = pd.read_csv("datasets/ml-latest-small/movies.csv", index_col="movieId")
    # Split the genre string into a list of genre terms.
    movies["genres"] = movies["genres"].apply(lambda x: x.split("|"))

    # Match each movie with its tag data; movies without tags get NaN.
    movies_index = set(movies.index) & set(tags.index)
    new_tags = tags.loc[list(movies_index)]
    ret = movies.join(new_tags)

    # Build the movie dataset with four fields: movie ID, title, genres, tags.
    # If a movie has tags, its tag list is genres + tags; otherwise it is empty.
    movie_dataset = pd.DataFrame(
        map(
            lambda x: (x[0], x[1], x[2], x[2] + x[3])
            if isinstance(x[3], list)  # NaN (no tags) is a float, not a list
            else (x[0], x[1], x[2], []),
            ret.itertuples(),
        ),
        columns=["movieId", "title", "genres", "tags"],
    )
    movie_dataset.set_index("movieId", inplace=True)
    return movie_dataset


movie_dataset = get_movie_dataset()
print(movie_dataset)
```
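With a tag list available for each movie, the TF-IDF step listed above can be sketched in plain Python. This is only an illustration under assumptions: it uses the unsmoothed form tf = count / total tags and idf = ln(N / document frequency), whereas library implementations often apply smoothing or normalization and will give different numbers. The `tfidf_top_n` function and the sample `tag_lists` are hypothetical.

```python
import math
from collections import Counter


def tfidf_top_n(tag_lists, top_n=2):
    """tag_lists: {movie_id: [tag, ...]} -> {movie_id: [(tag, score), ...]}."""
    n_docs = len(tag_lists)
    # Document frequency: the number of movies in which each tag appears.
    doc_freq = Counter()
    for tags in tag_lists.values():
        doc_freq.update(set(tags))

    profiles = {}
    for movie_id, tags in tag_lists.items():
        counts = Counter(tags)
        total = len(tags)
        scores = {
            tag: (count / total) * math.log(n_docs / doc_freq[tag])
            for tag, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        profiles[movie_id] = ranked[:top_n]
    return profiles


# Hypothetical tag lists keyed by movie ID.
tag_lists = {
    1: ["pixar", "pixar", "animation", "fun"],
    2: ["animation", "fun", "adventure"],
    3: ["fun", "war"],
}

profiles = tfidf_top_n(tag_lists)
print(profiles[1])
```

A tag that appears in every movie ("fun" here) gets an IDF of zero, so it never outranks tags that are specific to a movie, which is the property that makes TF-IDF useful for picking profile labels.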