This article is about 3,200 words and takes about 10 minutes to read
The first technical article of 2020 kicks off a new series: Recommender Systems. This first article briefly introduces the definition and applications of recommendation systems, covering the following topics:
- What is a recommendation system
- Do you really need a recommendation system
- Problem patterns of recommendation systems
- Problems with recommendation systems
- Applications of recommendation systems
What is a recommendation system
Definition from Wikipedia:
A recommender system is a subclass of information filtering system that seeks to predict the rating or preference a user would give to an item.
This question can be answered further from three angles:
1. What recommendation systems can do: identify in advance the connections that will form between users and items.
Connection is meant very broadly here. Anything that creates a relationship counts as a connection, including a user's behavior toward an item, or links between some attributes of the user and some attributes of the item.
This rests on the observation that things in general tend to become connected.
2. What recommendation systems need: existing connections, from which they predict future connections.
3. How recommendation systems do it: by predicting users' ratings and preferences. Concretely, recommendations come either from machines or from people, known respectively as personalized recommendation and editorial recommendation.
In short, a recommendation system is an algorithm that, in today's era of information overload, helps users filter out large amounts of irrelevant information and find the information or items they are interested in; it can also surface long-tail items. Of course, relying too heavily on a recommendation system may leave you receiving only the same kind of information or items from a single area. This is another problem recommendation systems face, known as exploration versus exploitation.
How does a recommendation system work? When we are not sure which movie to watch, we usually decide in one of several ways:
- Ask friends. Not necessarily face to face: we might post on WeChat Moments or tweet, using a social product to ask the question. In recommendation systems this is called social recommendation, that is, letting friends recommend;
- Decide based on an actor or director, perhaps using a search engine to check whether our favorite actor or director has a movie in theaters or one we haven't seen yet. This approach is called content-based filtering;
- Open Douban and browse the movie charts to see which highly rated movies are worth watching, or look at what users with similar historical interests have watched and pick one that interests us. This approach is called collaborative filtering: making recommendations based on similar users or similar items.
These are only three approaches, and recommendation systems use others as well, but in essence they all rely on connections between users and items, using existing connections to predict future ones.
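To make collaborative filtering concrete, here is a minimal user-based sketch. It assumes binary "watched" data and cosine similarity between users; the movie titles and the `recommend_user_based` helper are hypothetical, not any particular product's implementation.

```python
from math import sqrt

# Hypothetical viewing history: 1 means the user has watched the movie.
watched = {
    "alice": {"Inception": 1, "Interstellar": 1, "Memento": 1},
    "bob": {"Inception": 1, "Interstellar": 1, "Tenet": 1},
    "carol": {"Titanic": 1, "The Notebook": 1},
}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {item: value} dicts."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend_user_based(target, data, n=2):
    """Score movies the target hasn't seen, weighting each other user by similarity."""
    scores = {}
    for other, items in data.items():
        if other == target:
            continue
        sim = cosine(data[target], items)
        if sim <= 0:
            continue  # dissimilar users contribute nothing
        for item, value in items.items():
            if item not in data[target]:
                scores[item] = scores.get(item, 0.0) + sim * value
    # Highest score first; ties broken by title so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

print(recommend_user_based("alice", watched))  # ['Tenet']
```

Alice shares two movies with Bob and none with Carol, so only Bob's unseen movie, Tenet, receives a score.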
Do you really need a recommendation system
Consider this in two ways:
- The product's goal. If a product aims to create as many connections as possible, it will eventually need a recommendation system. Tool-type products, by contrast, do not need one;
- The product's existing connections. When a product has few items, few enough to handle manually, users cannot create many connections. The bottleneck is then the number of items, and a recommendation system is not a good fit. Conversely, if there are many items but users create few connections, users are not coming back; the task is to find the reasons for churn, not to build a recommendation system.
Here’s a simple formula to determine if you need a recommendation system:
The numerator is the increase in the number of connections, and the denominator combines the increase in active users and the increase in active items.
The indicator can be read as follows:
- If the growth in connections depends mainly on the growth in active users and items, the value is very small, indicating that a recommendation system is not a good fit;
- If the growth in connections has little to do with newly active users and items, connections tend to grow on their own, and the product is a good candidate for a recommendation system.
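The indicator can be computed in a few lines. This is a minimal sketch that assumes the denominator is the product of the two increments (new active users times new active items); the function name and the sample numbers are hypothetical.

```python
def connection_growth_ratio(new_connections, new_active_users, new_active_items):
    """Connections gained per unit of (new active users x new active items).

    Assumption: the denominator is the product of the two increments.
    If no new users or items became active, any growth in connections is
    entirely "spontaneous", so the ratio is treated as infinite.
    """
    denominator = new_active_users * new_active_items
    if denominator == 0:
        return float("inf") if new_connections > 0 else 0.0
    return new_connections / denominator

# Hypothetical numbers for two products over the same period.
driven_by_growth = connection_growth_ratio(1_000, 100, 50)  # 0.2
spontaneous = connection_growth_ratio(50_000, 10, 5)  # 1000.0
print(driven_by_growth, spontaneous)
```

A small value matches the first case above and a large one the second; no cutoff is given, so any threshold would have to be chosen per product.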
Finally, whether a recommendation system is needed is, at the tactical level, a question of return on investment: it requires building a team, buying computing resources, accumulating data, and spending time on optimization. If it is a strategic decision, though, there is nothing to discuss.
Problem patterns of recommendation systems
As introduced above, the goal of a recommendation system is to predict connections between users and items. Depending on which kind of connection it targets, its prediction problems fall into two categories:
- Rating prediction
- Behavior prediction
Ratings and behaviors are two kinds of feedback that reflect how users respond to recommendations. Ratings are explicit feedback that directly express a user's preference for the recommended items, while behaviors are mostly implicit feedback, such as a user merely glancing at the recommended items or, on e-commerce sites, adding items to the cart or saving them.
Rating prediction
Rating prediction estimates in advance how a user will rate an item, such as a movie score from 1 to 5 or the number of stars a product will receive.
A simple approach: build a model that predicts ratings from the items the user has rated in the past.
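The choice of model is left open here. As one hypothetical starting point, a widely used baseline predicts a rating as the global mean plus a user bias and an item bias. The sketch below estimates those biases with simple averages and no regularization; the data and helper names are made up for illustration.

```python
from collections import defaultdict

def fit_baseline(ratings):
    """Fit r_hat(u, i) = mu + b_u + b_i from (user, item, rating) triples.

    Item biases are averaged first, user biases on what is left over.
    No regularization, so biases from very few ratings can be noisy.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    item_dev = defaultdict(list)
    for _, item, r in ratings:
        item_dev[item].append(r - mu)
    b_item = {i: sum(d) / len(d) for i, d in item_dev.items()}

    user_dev = defaultdict(list)
    for user, item, r in ratings:
        user_dev[user].append(r - mu - b_item[item])
    b_user = {u: sum(d) / len(d) for u, d in user_dev.items()}

    def predict(user, item):
        # Unseen users or items get a zero bias, i.e. fall back toward the global mean.
        return mu + b_user.get(user, 0.0) + b_item.get(item, 0.0)

    return predict

history = [("alice", "m1", 5), ("alice", "m2", 3), ("bob", "m1", 4), ("bob", "m3", 2)]
predict = fit_baseline(history)
print(predict("alice", "m3"))  # 2.25
print(predict("carol", "m1"))  # 4.5 (new user: global mean + item bias)
```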
Root mean squared error (RMSE) is usually used as the loss function to measure how good the predictions are:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(r_i - \hat{r}_i\right)^2}$$
where $n$ is the total number of samples, $r_i$ is the user's actual rating, and $\hat{r}_i$ is the rating predicted by the model. Their difference is the model's error on that sample; because the error is squared, RMSE ignores its sign and cares only about its magnitude, penalizing large errors more heavily.
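The RMSE formula translates directly into code. A minimal sketch with made-up ratings on a 1 to 5 scale:

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between true ratings and model predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Errors are 0.5, 0, 1, -1: squared 0.25 + 0 + 1 + 1 = 2.25, mean 0.5625.
print(rmse([4, 3, 5, 1], [3.5, 3, 4, 2]))  # 0.75
```

Predicting 4 or 2 for a true rating of 3 costs the same, which is the sign-insensitivity described above.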
Rating prediction is mainly used in review products such as Douban and IMDb, but rating-based recommendation has the following problems:
- Data is hard to collect;
- Data quality is not guaranteed, and ratings are easy to fake;
- Rating distributions are unstable: overall scores vary considerably between periods, an individual's rating standard shifts over time, and standards differ widely from person to person.
Behavior prediction
Behavior prediction uses implicit feedback data to predict the probability that implicit feedback occurs. There are several reasons why behavior prediction matters more:
- Implicit feedback data is denser than explicit feedback, whereas rating data is generally sparse;
- Implicit feedback better reflects what users actually think;
- Implicit feedback is often more closely tied to the model's objective function and easier to link to the metrics of A/B tests. For example, CTR estimation focuses on the implicit feedback of clicks.
There are many ways to predict behavior. The two most common are:
- Directly predicting the probability that a behavior occurs, usually called CTR (click-through rate) estimation; in practice the target can also be saving or purchasing an item;
- Predicting the relative order of items.
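The two patterns can be sketched together. The code below assumes a pointwise logistic regression as the CTR model (a common baseline, chosen here for illustration rather than prescribed by the article) with a single made-up feature, then measures how well the resulting scores order items using AUC, the fraction of (clicked, not clicked) pairs ranked correctly. All function names and data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_ctr_model(samples, epochs=500, lr=0.1):
    """Pointwise CTR model: logistic regression fit by batch gradient descent.

    samples: list of (features, clicked), features a list of floats, clicked 0 or 1.
    Returns (weights, bias).
    """
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in samples:
            # Gradient of the log loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_ctr(model, x):
    """Estimated probability that the user clicks an item with features x."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def pairwise_auc(labels, scores):
    """Share of (clicked, not clicked) pairs where the clicked item scores higher; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one clicked and one non-clicked sample")
    hits = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return hits / (len(pos) * len(neg))

# Made-up log: one feature, e.g. how well an item matches the user's profile.
log = [([0.9], 1), ([0.8], 1), ([0.6], 1), ([0.4], 0), ([0.2], 0), ([0.1], 0)]
model = train_ctr_model(log)
scores = [predict_ctr(model, x) for x, _ in log]
print(pairwise_auc([y for _, y in log], scores))  # 1.0 on this separable toy log
```

The CTR model answers "how likely is a click", while AUC only asks whether the scores put clicked items ahead of non-clicked ones, which is the relative-order view.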
Problems with recommendation systems
Some problems still have no good general solutions and are easily overlooked.
1. The cold-start problem
Recommendation systems are data-hungry applications; their demand for data will never be fully satisfied.
Cold start problems can be divided into:
- New users or inactive users;
- New items, or items that have rarely been shown (long-tail items);
- A brand-new system with no users and no user behavior, only item data.
The usual solution is to find ways to bring in data and to actively learn from whatever data already exists (for example, with semi-supervised learning), such as user registration information and item descriptions.
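As an illustration of using registration and item-description data, the sketch below ranks items for a brand-new user by the overlap between interests declared at sign-up and tags taken from item descriptions, falling back to popularity. The field names (`interests`, `tags`, `popularity`) and the catalog are hypothetical, and this is a simple rule-based fallback rather than the semi-supervised learning mentioned above.

```python
def cold_start_recommend(user_profile, items, n=3):
    """Recommend for a user with no behavior history.

    user_profile: dict with an optional "interests" set (e.g. from a sign-up form).
    items: dict item_id -> {"tags": set of tags from the item description,
                            "popularity": a non-negative number}.
    Items sharing more tags with the declared interests rank first; popularity
    breaks ties, so a user with no declared interests gets the most popular items.
    """
    interests = user_profile.get("interests", set())
    ranked = sorted(
        items,
        key=lambda item_id: (
            -len(interests & items[item_id]["tags"]),
            -items[item_id]["popularity"],
            item_id,
        ),
    )
    return ranked[:n]

catalog = {
    "laptop": {"tags": {"digital", "office"}, "popularity": 120},
    "keyboard": {"tags": {"digital"}, "popularity": 300},
    "yoga_mat": {"tags": {"fitness"}, "popularity": 500},
    "jacket": {"tags": {"clothing"}, "popularity": 80},
}
print(cold_start_recommend({"interests": {"digital"}}, catalog, n=2))  # ['keyboard', 'laptop']
print(cold_start_recommend({}, catalog, n=2))  # ['yoga_mat', 'keyboard']
```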
2. The exploration-exploitation problem
This problem is also known as the EE (Explore & Exploit) problem:
- Exploration: discovering users' unknown interests by recommending items that are unrelated or dissimilar to their known interests, including long-tail items;
- Exploitation: using users' known interests to recommend similar items.
In practice, the usual approach is to recommend mostly items the user is known to be interested in, plus a small share of new items from other areas. For example, if a user is known to love digital products, most recommendations are digital products such as computers and keyboards, with a few items from other categories, such as fitness products or clothing.
The key is the proportion, which should differ from user to user: some users simply like exploring novel items, while others only want items that match their interests.
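The per-user proportion can be sketched as an exploration share: most slots go to items ranked by known interest, and a user-specific fraction goes to items sampled from other fields. This is a fixed-share variant of epsilon-greedy under assumed inputs; `build_slate`, its parameters, and the item lists are hypothetical.

```python
import random

def build_slate(interest_ranked, explore_pool, size, explore_ratio, rng=None):
    """Fill `size` slots: mostly known-interest items, plus a per-user share of exploration.

    interest_ranked: candidates ordered by predicted interest (exploitation).
    explore_pool: items from other fields or the long tail (exploration).
    explore_ratio: fraction of slots reserved for exploration for this user.
    """
    rng = rng or random.Random()
    n_explore = min(round(size * explore_ratio), len(explore_pool))
    slate = list(interest_ranked[: size - n_explore])
    # Insert each exploratory item at a random position so it is not always last,
    # while the exploitation items keep their relative order.
    for item in rng.sample(explore_pool, n_explore):
        slate.insert(rng.randrange(len(slate) + 1), item)
    return slate

gadgets = ["laptop", "keyboard", "mouse", "monitor", "tablet",
           "headphones", "camera", "phone", "charger", "speaker"]
other_fields = ["yoga_mat", "running_shoes", "jacket", "dumbbells"]

rng = random.Random(7)
print(build_slate(gadgets, other_fields, 10, 0.1, rng))  # 9 gadgets + 1 other item
print(build_slate(gadgets, other_fields, 10, 0.4, rng))  # 6 gadgets + 4 other items
```

A cautious user might get `explore_ratio=0.1` and a novelty seeker `0.4`; how to set that value per user is left open here.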
3. Security issues
Recommendation systems also have security problems and can be attacked. The impacts of an attack include:
- Unreliable recommendations that hurt the user experience and, ultimately, the brand image;
- Unreliable dirty data gets collected, and its effects persist in the product and are hard to remove completely;
- Loss of the product's commercial interests, which is a direct economic loss.
Applications of recommendation systems
Recommendation systems have a particularly wide range of applications, including e-commerce, movies and video, music, social networking, reading, location-based services, personalized mail and advertising, and more.
- E-commerce: In China, Taobao and JD.com both run personalized recommendation systems that recommend similar products based on users' browsing, clicks, purchases, favorites, and add-to-cart actions.
- Movies and videos: for example Douban, iQiyi, and other video sites. Douban infers users' interests from their ratings and then recommends movies liked by users who watched the same movie (user-based collaborative filtering) or other similar movies (item-based collaborative filtering).
- Music: the most representative is NetEase Cloud Music, whose recommendation algorithm is indeed better than that of other Chinese music products;
- Social networks: Weibo recommends along multiple dimensions, such as hot topics, city, or field (entertainment, science and technology, sports, etc.);
- Reading: mainly news portals and apps, of which Toutiao is the best.
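Item-based collaborative filtering, the second variant mentioned for Douban, can be sketched in a few lines: two items are similar when the same users interact with both, and unseen items are scored by their similarity to what the user already has. The binary data, cosine similarity, and helper names are assumptions for illustration, not Douban's implementation.

```python
from collections import defaultdict
from math import sqrt

def item_similarities(user_items):
    """Cosine similarity between items based on which users interacted with them.

    user_items: dict user -> set of items. With binary data, the cosine
    similarity of items a and b is |U_a & U_b| / sqrt(|U_a| * |U_b|).
    """
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims = defaultdict(dict)
    for a in item_users:
        for b in item_users:
            if a == b:
                continue
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                sims[a][b] = overlap / sqrt(len(item_users[a]) * len(item_users[b]))
    return sims

def recommend_item_based(user, user_items, n=2):
    """Score items the user has not seen by summing their similarity to items the user has."""
    sims = item_similarities(user_items)
    seen = user_items[user]
    scores = defaultdict(float)
    for item in seen:
        for other, sim in sims[item].items():
            if other not in seen:
                scores[other] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

history = {"u1": {"A", "B"}, "u2": {"A", "B", "C"}, "u3": {"B", "C"}, "u4": {"D"}}
print(recommend_item_based("u1", history))  # ['C']
```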
In fact, as the number of users and items in a product grows, it is worth considering a personalized recommendation system to give users a personalized experience.
References
- “Recommendation Systems in Practice”, Chapter 1
- Geek Time column “Recommendation System 36 Styles”
You are welcome to follow my WeChat official account, “The Growth of Algorithmic Ape”, so we can communicate, learn, and progress together!