Recommendation based on a collaborative filtering algorithm
(The data used in this experiment are real, desensitized e-commerce data, provided for learning only and not for commercial use.)
A classic example in data mining is diapers and beer. The two products seem unrelated, yet when a supermarket placed them next to each other, sales of both rose sharply. Seemingly unrelated products often share some hidden relationship, and discovering it can boost sales, but such relationships are frequently hard to find through empirical analysis alone. This is where collaborative filtering, a common data mining algorithm, comes in: it helps us mine the relationships between people and goods.
Collaborative filtering is an algorithm that mines associations from behavior data, in the spirit of association rules. Take shopping behavior as an example. Suppose there are two users, A and B, and three products, X, Y, and Z. If both A and B bought X and Y, we can assume that A and B have similar shopping tastes. If A has also bought Z while B has not, we can recommend Z to B as well. This is a typical user-based case, in which the similarity between users serves as the basis for the association.
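The user-based rule above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm PAI uses: it assumes Jaccard similarity between users' purchase sets and a hypothetical `min_similarity` threshold of 0.5. Users are labeled "A" and "B" and products "X", "Y", "Z" so the two kinds of labels stay distinct.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity of two users = shared items / all items either user bought."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_user_based(purchases: dict[str, set[str]], target: str,
                         min_similarity: float = 0.5) -> set[str]:
    """Recommend items bought by similar users that `target` has not bought.

    min_similarity is a hypothetical threshold chosen for illustration.
    """
    mine = purchases[target]
    recs: set[str] = set()
    for other, theirs in purchases.items():
        if other == target:
            continue
        if jaccard(mine, theirs) >= min_similarity:
            recs |= theirs - mine  # only items the target does not own yet
    return recs


purchases = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
}
print(recommend_user_based(purchases, "B"))  # {'Z'}
```

A and B share two of three distinct products (similarity 2/3), so Z, which only A bought, is recommended to B; A gets nothing new because B owns nothing A lacks.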
1. Business scenario description
From users' shopping behavior data before July, we can derive the relationships between goods, then recommend goods for users' purchases after July and evaluate the results. For example, if user A bought product X before July and product X is strongly correlated with product Y, we recommend product Y to user A after July and check whether the recommendation is a hit.
This experiment uses PAI-Studio as its platform: a collaborative-filtering-based recommendation system can be built quickly just by dragging and dropping components. The data and the complete business workflow for this experiment are built into a template on the PAI home page and work out of the box:
2. Dataset description
Data source: the data come from a Tianchi competition. They are split by time into two parts: purchase behavior data before July and data after July. The fields are as follows:
Field name | Meaning | Type | Description |
---|---|---|---|
user_id | User ID | string | ID of the shopping user |
item_id | Item ID | string | ID of the item involved in the behavior |
active_type | Shopping behavior | string | 0 = click, 1 = purchase, 2 = add to favorites, 3 = add to cart |
active_date | Shopping time | string | Time of the shopping behavior |
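To work with these records outside PAI-Studio, one might load them into typed records and split them at July. This is a sketch under assumptions: it treats `active_date` as an ISO `YYYY-MM-DD` string and assumes all records fall within a single year, while the actual format in the competition files may differ. The class `Behavior`, the function `split_by_july`, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Behavior codes taken from the active_type column in the table above.
CLICK, BUY, FAVORITE, CART = "0", "1", "2", "3"


@dataclass(frozen=True)
class Behavior:
    user_id: str
    item_id: str
    active_type: str
    active_date: str  # ASSUMPTION: "YYYY-MM-DD"; check the real format first


def split_by_july(records: list[Behavior]) -> tuple[list[Behavior], list[Behavior]]:
    """Split records into (before July, July onward), assuming a single year."""
    before: list[Behavior] = []
    after: list[Behavior] = []
    for r in records:
        month = datetime.strptime(r.active_date, "%Y-%m-%d").month
        (before if month < 7 else after).append(r)
    return before, after


# Made-up sample records, not real competition data.
records = [
    Behavior("u1", "i1", BUY, "2014-04-15"),
    Behavior("u1", "i2", CLICK, "2014-05-02"),
    Behavior("u2", "i1", BUY, "2014-07-20"),
]
before, after = split_by_july(records)
print(len(before), len(after))  # 2 1
```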
Data screenshot:
3. Data exploration process
Experimental flow chart:
1. Collaborative filtering recommendation process
The first input data source is the shopping behavior data before July. An SQL script extracts users' purchase records and feeds them into the collaborative filtering component. This simplifies the process, because purchase behavior is the most valuable signal for this analysis. In the collaborative filtering component's settings, TopN is set to 1, meaning that for each item the component returns the single most similar item and its weight. In this way, the purchase records reveal which items are most likely to be bought by the same user. The settings are shown below:
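The SQL step only needs to keep purchase rows (`active_type = '1'`). The sketch below reproduces that filter with Python's built-in `sqlite3` module so it runs anywhere. The table name `behavior_before_july` and the sample rows are hypothetical, and the actual SQL script component in PAI-Studio runs against the platform's own tables rather than SQLite.

```python
import sqlite3

# In-memory stand-in for the pre-July behavior table (hypothetical name).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE behavior_before_july "
    "(user_id TEXT, item_id TEXT, active_type TEXT, active_date TEXT)"
)
conn.executemany(
    "INSERT INTO behavior_before_july VALUES (?, ?, ?, ?)",
    [
        ("u1", "i1", "1", "2014-04-15"),  # purchase, kept
        ("u1", "i2", "0", "2014-05-02"),  # click, dropped
        ("u2", "i1", "1", "2014-06-11"),  # purchase, kept
        ("u2", "i3", "3", "2014-06-12"),  # add to cart, dropped
    ],
)

# Keep only purchases as input to collaborative filtering.
purchases = conn.execute(
    "SELECT user_id, item_id FROM behavior_before_july "
    "WHERE active_type = '1' ORDER BY user_id, item_id"
).fetchall()
conn.close()
print(purchases)  # [('u1', 'i1'), ('u2', 'i1')]
```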
The collaborative filtering output describes how related the goods are: ItemID is the target item, and in the similarity field the value to the left of the colon is the item most related to the target, while the value to the right is its similarity score:
For example, in the first row of the figure above, the similarity between item 1000 and item 15584 is 0.2747133918. The higher the similarity, the more likely the two items are to be bought by the same user.
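To make the `item:score` output concrete, here is a small item-based sketch that keeps, for each item, its TopN most similar items (TopN = 1 as in this experiment). It assumes cosine similarity over the sets of users who bought each item, which may differ from the measure the PAI component actually uses, so its scores would not reproduce the figure. The function name and sample data are hypothetical.

```python
import math
from collections import defaultdict


def item_similarity_top_n(pairs: list[tuple[str, str]], top_n: int = 1) -> dict[str, str]:
    """pairs: (user_id, item_id) purchase records.

    Returns {item_id: "neighbor:score,..."} with the top_n most similar items.
    """
    buyers: dict[str, set[str]] = defaultdict(set)
    for user, item in pairs:
        buyers[item].add(user)

    result: dict[str, str] = {}
    for item, users_i in buyers.items():
        scored = []
        for other, users_j in buyers.items():
            if other == item:
                continue
            overlap = len(users_i & users_j)
            if overlap:
                # Cosine similarity of binary "who bought it" vectors.
                score = overlap / math.sqrt(len(users_i) * len(users_j))
                scored.append((score, other))
        scored.sort(key=lambda s: (-s[0], s[1]))  # best first, ties by item ID
        if scored:
            result[item] = ",".join(f"{o}:{s:.4f}" for s, o in scored[:top_n])
    return result


pairs = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"),
         ("u2", "i2"), ("u3", "i1"), ("u3", "i3")]
print(item_similarity_top_n(pairs))
# {'i1': 'i2:0.8165', 'i2': 'i1:0.8165', 'i3': 'i1:0.5774'}
```

The double loop is quadratic in the number of items, which is fine for illustration but not for a catalog of this size; a scalable version would only score item pairs that were actually co-purchased.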
2. Recommendation
The steps above generate, for each item, a list of strongly correlated goods. Recommendations then follow a relatively simple rule: if user A bought item X before July and item X is strongly correlated with item Y, we recommend item Y to user A after July and check whether the recommendation is a hit. This step is implemented as shown in the following figure:
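The recommendation rule can be sketched as a lookup from each pre-July purchase to that item's most similar item. The sketch assumes the similarity field has the `item:score` form described earlier and, like the simple rule in the text, does not filter out items the user already owns. The function names and sample data are hypothetical.

```python
def parse_neighbor(similarity_field: str) -> str:
    """'i2:0.8165' -> 'i2': the item to the left of the colon."""
    return similarity_field.split(":", 1)[0]


def recommend_item_based(pre_july: list[tuple[str, str]],
                         similarity: dict[str, str]) -> set[tuple[str, str]]:
    """For every (user, item) bought before July, recommend item's top-1 neighbor."""
    recs: set[tuple[str, str]] = set()  # a set deduplicates repeated recommendations
    for user, item in pre_july:
        field = similarity.get(item)
        if field:
            recs.add((user, parse_neighbor(field)))
    return recs


pre_july = [("u1", "i1"), ("u2", "i1"), ("u2", "i3")]
similarity = {"i1": "i2:0.8165", "i3": "i1:0.5774"}
print(sorted(recommend_item_based(pre_july, similarity)))
# [('u1', 'i2'), ('u2', 'i1'), ('u2', 'i2')]
```

Note that u2 is recommended i1 even though u2 already bought it, which is exactly the kind of weakness discussed in the analysis of the results.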
3. Result statistics
The statistics components are shown above. The full-table statistics on the left describe the recommendation list generated from shopping behavior before July: 18,065 entries in total after deduplication. The statistics component on the right shows a total of 90 hits, which against 18,065 recommendations is a hit rate of roughly 0.5% (90 / 18,065 ≈ 0.50%).
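Hit counting can be sketched as a set intersection between the deduplicated recommendation pairs and the (user, item) pairs observed after July, with the hit rate taken as hits divided by the number of recommendations. That definition of hit rate is an assumption about how the statistics components were configured; the function name and data below are made up.

```python
def hit_rate(recommendations: set[tuple[str, str]],
             post_july: list[tuple[str, str]]) -> tuple[int, int, float]:
    """Return (hits, total recommendations, hit rate)."""
    actual = set(post_july)  # deduplicate observed (user, item) pairs
    hits = len(recommendations & actual)
    total = len(recommendations)
    return hits, total, (hits / total if total else 0.0)


recs = {("u1", "i2"), ("u2", "i2"), ("u2", "i1")}
post_july = [("u1", "i2"), ("u1", "i2"), ("u3", "i9")]
hits, total, rate = hit_rate(recs, post_july)
print(hits, total, f"{rate:.1%}")  # 1 3 33.3%
```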
As the statistics above show, the recommendations in this experiment are only mediocre, for the following reasons:
1) First, this article only shows how to apply collaborative filtering recommendation to a business scenario, and many key aspects of recommending from shopping behavior are not handled, such as time series. The timeliness of shopping behavior needs attention, because recommendations that span several months tend to perform poorly. Second, we ignored the attributes of the recommended goods: this article considers only how related the goods are, not whether they are high-frequency or low-frequency purchases. For example, if user A bought a mobile phone last month, A is unlikely to buy another one next month, because a mobile phone is a low-frequency consumer good.
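One common way to address the timeliness problem, not used in this experiment, is to down-weight older behavior before computing similarities. The sketch below applies exponential decay with a hypothetical 30-day half-life; both the function `decay_weight` and the half-life value are illustrative assumptions that would need tuning on real data.

```python
from datetime import date


def decay_weight(event_day: date, reference_day: date,
                 half_life_days: float = 30.0) -> float:
    """Exponential time decay: behavior half_life_days old counts half as much."""
    age_days = (reference_day - event_day).days
    if age_days < 0:
        raise ValueError("event_day must not be after reference_day")
    return 0.5 ** (age_days / half_life_days)


# Weight pre-July purchases relative to a July 1 cutoff (the year is made up).
cutoff = date(2014, 7, 1)
for day in (date(2014, 6, 30), date(2014, 5, 31), date(2014, 1, 1)):
    print(day, round(decay_weight(day, cutoff), 3))
```

A purchase from the day before the cutoff keeps almost full weight, while one from January contributes very little, so recent co-purchases would dominate the item similarities.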
2) Recommendation based on association rules works best as a supplement to the final recommendation results, or as the most basic layer of a recommendation system. Real gains in accuracy come from training machine learning models; for specific methods, see the other articles in this recommendation series.
For more technical articles, follow the WeChat account of the Alibaba Yunqi community: YunqiInsight