“This is the 10th day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”
As machine learning has matured, recommender systems have increasingly adopted machine learning techniques. There are many ways to apply machine learning to recommendation. Model-based CF algorithms can be roughly classified as follows:
- Based on classification, regression, or clustering algorithms
- Based on matrix factorization
- Based on neural networks
- Based on graph models
Collaborative filtering recommendation based on K-nearest neighbors
Collaborative filtering based on K-nearest neighbors is essentially MemoryBased CF, except that neighbor selection is restricted to the K most similar neighbors.
Here we start directly from the MemoryBased CF code implementation and modify it as follows:
class CollaborativeFiltering(object):

    based = None

    def __init__(self, k=40, rules=None, use_cache=False, standard=None):
        '''
        :param k: number of nearest neighbors used for prediction
        :param rules: filter rules: "unhot", "rated", ["unhot","rated"], or None
        :param use_cache: whether to enable caching
        :param standard: rating standardization method
        '''
        self.k = k  # number of nearest neighbors to keep
        self.rules = rules
        self.use_cache = use_cache
        self.standard = standard
Then, everywhere the code selects nearest neighbors, sort by similarity and keep only the K most similar ones:
similar_users = self.similar[uid].drop([uid]).dropna().sort_values(ascending=False)[:self.k]
similar_items = self.similar[iid].drop([iid]).dropna().sort_values(ascending=False)[:self.k]
However, because our raw data set is small, the KNN method here will perform worse than pure MemoryBased CF.
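The neighbor selection above can be tried on a toy similarity matrix. This is only a minimal sketch: the matrix values, the user IDs, and the helper name `k_nearest` are hypothetical and not part of the original implementation. Only the `drop` / `dropna` / `sort_values` / `[:k]` chain mirrors the lines shown earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical symmetric user-user similarity matrix.
# NaN means the similarity could not be computed for that pair.
users = ["u1", "u2", "u3", "u4", "u5"]
similar = pd.DataFrame(
    [
        [1.0, 0.9, 0.2, np.nan, 0.6],
        [0.9, 1.0, 0.4, 0.1, 0.3],
        [0.2, 0.4, 1.0, 0.8, 0.5],
        [np.nan, 0.1, 0.8, 1.0, 0.7],
        [0.6, 0.3, 0.5, 0.7, 1.0],
    ],
    index=users,
    columns=users,
)


def k_nearest(similar, key, k):
    """Return the k most similar neighbors of `key`, most similar first."""
    return (
        similar[key]
        .drop([key])                    # an entry is not its own neighbor
        .dropna()                       # skip pairs with no similarity value
        .sort_values(ascending=False)   # most similar first
        [:k]                            # keep at most k neighbors
    )


neighbors = k_nearest(similar, "u1", k=2)
print(neighbors)
```

With `k=2`, user `u1` keeps only `u2` and `u5`; the weaker neighbor `u3` is cut off, and `u4` is dropped because its similarity is missing. If fewer than K valid neighbors exist, the slice simply returns all of them.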
Collaborative filtering recommendation based on a regression model
If we treat ratings as continuous rather than discrete values, we can use linear regression to predict how a target user will rate an item. One such strategy is called Baseline.
Baseline prediction
The Baseline design rests on the following assumptions:
- Some users generally rate higher than others, and some generally rate lower. For example, some users are easy-going and generous with praise, while others are more demanding and never give more than 3 out of 5.
- Some items are generally rated higher than others, and some are generally rated lower. For example, the standing of some items is settled as soon as they are produced: some are popular, while others are disliked.
The amount by which a user's or an item's ratings generally sit above or below the average is called the bias.
Baseline goals:
- Find the bias $b_u$ by which each user generally rates higher or lower than other users
- Find the bias $b_i$ by which each item is generally rated higher or lower than other items
- Our goal is to find the optimal $b_u$ and $b_i$
The steps to predict a rating with the Baseline algorithm are as follows:
- Calculate the average rating of all movies, $\mu$ (i.e. the global average rating)
- Calculate $b_u$, the offset of each user's ratings from the average rating $\mu$
- Calculate $b_i$, the offset of the ratings each movie receives from the average rating $\mu$
- Predict the user's rating of the movie:
$$\hat{r}_{ui} = b_{ui} = \mu + b_u + b_i$$
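The steps above can be sketched with pandas on a toy data set. This is a simplified illustration, not the article's implementation: the column names (`userId`, `movieId`, `rating`), the ratings themselves, and the `predict` helper are all assumptions. It also assumes the simplest possible estimate of each bias, namely the plain mean deviation from $\mu$, rather than biases fitted by optimization.

```python
import pandas as pd

# Hypothetical toy ratings; column names are assumptions for this sketch.
ratings = pd.DataFrame(
    {
        "userId": ["u1", "u1", "u2", "u2", "u3", "u3"],
        "movieId": ["m1", "m2", "m1", "m3", "m2", "m3"],
        "rating": [5.0, 4.0, 3.0, 2.0, 4.0, 3.0],
    }
)

# Step 1: global average rating (mu).
mu = ratings["rating"].mean()

# Deviation of every single rating from the global average.
deviation = ratings["rating"] - mu

# Step 2: user bias b_u, estimated here as the user's mean deviation.
b_u = deviation.groupby(ratings["userId"]).mean()

# Step 3: item bias b_i, estimated here as the item's mean deviation.
b_i = deviation.groupby(ratings["movieId"]).mean()


def predict(uid, iid):
    """Step 4: predicted rating = mu + b_u + b_i.

    Unknown users or items fall back to a bias of 0 (an assumption).
    """
    return mu + b_u.get(uid, 0.0) + b_i.get(iid, 0.0)


print(predict("u1", "m3"))  # u1 has not rated m3 in the toy data
```

Here $\mu = 3.5$, user `u1` rates 1.0 above average, and movie `m3` is rated 1.0 below average, so the two biases cancel and the prediction for that pair is 3.5.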