Preface
Many of you have probably heard lofty terms such as big data, AI recommendation, and "a thousand faces for a thousand people" (personalization). You have also seen many apps recommend products to you through features such as "Guess You Like" and "Featured Recommendations".
Some of you may have looked these topics up online and come away confused, especially when you ran into the algorithms and their headache-inducing formulas. Today I will introduce the overall architecture of precise recommendation and the principles behind its core algorithms, and try to make them as easy to understand as possible.
Note: this article assumes some basic knowledge of Java and Elasticsearch.
Recommendation architecture
The following is the architecture diagram of a typical recommendation system.
The flow in the diagram can be viewed from two perspectives.
The user request path
1) The client sends a request that carries the core identifier, the UserId.
A platform often has many recommendation slots, such as [Featured Picks] below the shopping cart, [Guess You Like] on the product details page, and [Popular Recommendations] on the product list. The client therefore usually also passes a "scene" parameter, and each scene corresponds to different model data.
2) The backend API calls the recommendation service.
3) Every precise recommendation goes through three stages: recall, ranking, and business re-ranking.
What do these three stages mean? Here is a quick illustration.
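As a rough sketch of how the three stages fit together, here is a minimal, hypothetical Java pipeline: recall merges candidates from several sources, ranking orders them by a model score, and re-ranking applies business rules. The candidate sources, the scores, and the business rule (skip items the user already owns, keep at most one item per category) are assumptions for illustration, not the author's implementation.

```java
import java.util.*;
import java.util.stream.*;

// A candidate item with a category and a ranking score (hypothetical fields).
record Item(String id, String category, double score) {}

class ThreeStagePipeline {

    // Stage 1: recall. Merge candidates from several cheap sources, deduplicating by id.
    static List<Item> recall(List<List<Item>> sources) {
        Map<String, Item> unique = new LinkedHashMap<>();
        for (List<Item> source : sources) {
            for (Item item : source) {
                unique.putIfAbsent(item.id(), item);
            }
        }
        return new ArrayList<>(unique.values());
    }

    // Stage 2: ranking. Sort candidates by score, highest first.
    // A real system would compute the score with a trained model; here it is given.
    static List<Item> rank(List<Item> candidates) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble(Item::score).reversed())
                .collect(Collectors.toList());
    }

    // Stage 3: business re-ranking. Hypothetical rules: drop items the user already owns,
    // keep at most one item per category, and cut the list to the requested size.
    static List<String> rerank(List<Item> ranked, Set<String> owned, int size) {
        Set<String> seenCategories = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (Item item : ranked) {
            if (result.size() == size) break;
            if (owned.contains(item.id())) continue;
            if (seenCategories.add(item.category())) {
                result.add(item.id());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> byHistory = List.of(new Item("a", "it", 0.9), new Item("b", "it", 0.8));
        List<Item> byPopularity = List.of(new Item("c", "history", 0.7),
                new Item("a", "it", 0.9), new Item("d", "management", 0.6));
        List<Item> candidates = recall(List.of(byHistory, byPopularity));
        System.out.println(rerank(rank(candidates), Set.of("c"), 3)); // [a, d]
    }
}
```

Recall is deliberately cheap and broad, ranking is where the expensive model runs on a small candidate set, and re-ranking is where product rules that the model knows nothing about get applied.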
With these steps we can deliver personalized recommendations, so that every user sees something different. The core of the whole process lies in the recall and ranking algorithms. Next, let's look at the backend data analysis path.
The data analysis path
Any analysis needs raw material. What is that material? It is the "big data" you have heard so much about in recent years. Put simply, big data means a large volume of data with many dimensions, and it is this data that we analyze.
In the recommendation architecture diagram above:
- We collect user behavior logs through event tracking (instrumentation) in the client and store them on the big data platform.
- We collect business data and user preference behavior data, such as favorites, likes, and comments, and store it on the big data platform.
- We analyze the data on the big data platform with algorithms to train a model.
- We use the trained model to generate recommendation data.
- We save the generated recommendation data to storage such as MySQL or Redis.
To keep user requests fast, the recommendation data is computed and stored in the database ahead of time, which ensures a good user experience.
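A minimal sketch of this offline/online split, assuming a key-value store: an in-memory map stands in for Redis, and the key layout `rec:{scene}:{userId}` and the fallback to a popular list for users without precomputed results are hypothetical choices, not taken from the article.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

class PrecomputedRecommendations {
    // Stand-in for Redis/MySQL: key -> ordered list of item ids.
    private final Map<String, List<String>> store = new ConcurrentHashMap<>();
    // Hypothetical fallback for users with no precomputed results (e.g. new users).
    private final List<String> popular;

    PrecomputedRecommendations(List<String> popular) {
        this.popular = List.copyOf(popular);
    }

    // Hypothetical key layout: one entry per (scene, user) pair.
    static String key(String scene, String userId) {
        return "rec:" + scene + ":" + userId;
    }

    // Offline job: write the model's output ahead of time.
    void save(String scene, String userId, List<String> items) {
        store.put(key(scene, userId), List.copyOf(items));
    }

    // Online request: a single lookup, no model evaluation on the request path.
    List<String> recommend(String scene, String userId) {
        return store.getOrDefault(key(scene, userId), popular);
    }

    public static void main(String[] args) {
        PrecomputedRecommendations recs = new PrecomputedRecommendations(List.of("p1", "p2"));
        recs.save("guess_you_like", "u42", List.of("b7", "b3"));
        System.out.println(recs.recommend("guess_you_like", "u42")); // [b7, b3]
        System.out.println(recs.recommend("cart_picks", "u42"));     // [p1, p2]
    }
}
```

Keying by scene as well as user matches the idea that each scene has its own model data.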
Algorithms and models
What is an algorithm? What is a model? Let me use a first-grade math problem as an example.
Problem: find the pattern and fill in the missing values
1, 3, 5, 7, 9, 11, 13, ?, ?
Can you see the answer? We are not concerned with the final answer here; instead, let's analyze how we arrive at it.
Let's break the problem down. We already know one set of data:
1, 3, 5, 7, 9, 11, 13
This data is what we already know.
Now we need to work out the next two numbers from what we already know.
In a recommendation system, this corresponds to knowing users' behavior data and using it to predict which products to recommend to them.
Algorithm
We notice that each term is 2 greater than the previous one, so x2 = x1 + 2 (where x1 is any term and x2 is the term right after it). In mathematics, such a sequence is called an arithmetic sequence. This rule is a simple algorithm; you can also view it as an algorithm formula.
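Written with x_n for the n-th term (a notation introduced here for convenience), the rule and its closed form are:

```latex
x_{n+1} = x_n + 2,\quad x_1 = 1
\;\Longrightarrow\; x_n = 1 + 2(n - 1) = 2n - 1,
\qquad x_8 = 15,\; x_9 = 17
```

So the two missing values are 15 and 17.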
Training the model
A recommendation system also has the concept of a model. What is a model? Let's stay with the problem above and think a little deeper: why do we know that x2 = x1 + 2?
Because we notice a difference of 2 between 1 and 3, a difference of 2 between 3 and 5, another between 5 and 7, and so on up to 11 and 13. From this we conclude that the sequence follows the pattern x2 = x1 + 2.
Training a model in a recommendation system works the same way. We first take part of the collected data, such as 1, 3, 5, and 7, look for a pattern in it, and obtain something like x2 = x1 + 2.
We then use this formula to derive the rest of the known data: from it we get 9, 11, and 13. If the derived values match our data, we can say the algorithm works.
In this terminology, the first part of the data is called training data and the rest is called test data:
1, 3, 5, and 7 are the training data; 9, 11, and 13 are the test data.
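The train/test idea can be sketched in Java for the sequence example. The "model" here is a single learned parameter, the common difference, estimated as the average gap in the training data; this is an illustrative stand-in, not how real recommendation models are trained.

```java
class SequenceModel {
    final int first;   // first value of the training data
    final double step; // learned common difference

    private SequenceModel(int first, double step) {
        this.first = first;
        this.step = step;
    }

    // "Training": estimate the difference between consecutive terms from the training data.
    static SequenceModel train(int[] training) {
        double totalGap = 0;
        for (int i = 1; i < training.length; i++) {
            totalGap += training[i] - training[i - 1];
        }
        return new SequenceModel(training[0], totalGap / (training.length - 1));
    }

    // Predict the value at a 0-based position in the sequence.
    double predict(int index) {
        return first + step * index;
    }

    // "Testing": compare predictions with the held-out values that follow the training data.
    boolean test(int[] testData, int startIndex) {
        for (int i = 0; i < testData.length; i++) {
            if (Math.abs(predict(startIndex + i) - testData[i]) > 1e-9) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] training = {1, 3, 5, 7};
        int[] testData = {9, 11, 13};
        SequenceModel model = SequenceModel.train(training);
        System.out.println("step = " + model.step);                                   // step = 2.0
        System.out.println("passes test: " + model.test(testData, training.length)); // passes test: true
        System.out.println("next two: " + model.predict(7) + ", " + model.predict(8)); // next two: 15.0, 17.0
    }
}
```

The test data is never used during training; it only checks whether the learned rule generalizes.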
In a recommendation system, this whole process can be understood as training the model. Real scenarios have many more data dimensions and are far less simple than our example, so we need algorithms such as collaborative filtering, LFM (latent factor model), ALS (alternating least squares), and LR (logistic regression).
To summarize
Algorithm:
An algorithm is the line of thinking used to solve a problem, expressed as a formula.
Model: can be understood as a program.
A model is a piece of program produced by applying an algorithm to data analysis.
It takes data as input and uses the algorithm as its body; when executed, it returns concrete recommendation data.
Therefore, the volume of data and the number of dimensions directly affect the model's accuracy.
Now let's introduce the algorithms commonly used in recommendation systems.
Traditional recommendation algorithms
Let's take an example. A book platform needs to build a recommendation system, and we have the following known data:
The figure above lists book titles and the users who read them: a value of 1 means the user has read the book, and an empty cell means they have not. How do we make recommendations based on this data? Let's look at the traditional approaches.
User-based Collaborative Filtering (UserCF)
In essence, it starts from the user's perspective.
First, find other users who have read the same books as the target user, then recommend other books those users like. The technical term for this idea is UserCF.
In the example above, Zhang San and Li Si have both read Thinking in Java, so the system considers them to have something in common.
The system therefore recommends Sun Tzu's The Art of War, which Li Si has read, to Zhang San.
It recommends Everyone Is a Product Manager, which Zhang San has read, to Li Si.
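A minimal UserCF sketch in Java. The reading data is hypothetical, shaped to match the example (Zhang San and Li Si share Thinking in Java; Wang Wu is an invented third user with no overlap). The article does not specify a similarity measure; this sketch assumes Jaccard similarity over the sets of books read, a common choice for read/not-read data, and scores each unread book by summing the similarities of the neighbors who read it.

```java
import java.util.*;

class UserCF {
    // Jaccard similarity: |A ∩ B| / |A ∪ B| over the sets of books two users have read.
    static double similarity(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    // Score each unread book by the summed similarity of the other users who read it.
    static List<String> recommend(Map<String, Set<String>> reads, String user, int topN) {
        Set<String> mine = reads.get(user);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : reads.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double sim = similarity(mine, other.getValue());
            if (sim == 0.0) continue; // no shared books: not a neighbor
            for (String book : other.getValue()) {
                if (!mine.contains(book)) {
                    scores.merge(book, sim, Double::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x)); // higher score first
            return byScore != 0 ? byScore : x.compareTo(y);             // tie-break by title
        });
        return ranked.subList(0, Math.min(topN, ranked.size()));
    }

    public static void main(String[] args) {
        // Hypothetical reading records (a 1 in the article's table means "read").
        Map<String, Set<String>> reads = Map.of(
                "Zhang San", Set.of("Thinking in Java", "Everyone Is a Product Manager"),
                "Li Si", Set.of("Thinking in Java", "The Art of War"),
                "Wang Wu", Set.of("Game Development"));
        System.out.println(recommend(reads, "Zhang San", 3)); // [The Art of War]
        System.out.println(recommend(reads, "Li Si", 3));     // [Everyone Is a Product Manager]
    }
}
```

Wang Wu shares no books with either user, so his Game Development is never recommended to them.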
Item-based Collaborative Filtering (ItemCF)
In essence, it starts from the item's perspective.
The idea is to recommend books that are similar to the ones the user has already read.
It starts from the books themselves: Zhang San has read Thinking in Java, which is an IT book, so the system can recommend other IT books such as Big Front-end Self-cultivation or Game Development to him. The technical term for this idea is ItemCF.
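The example above explains item similarity by topic (both are IT books). Classic ItemCF instead derives item similarity from user behavior: two books are similar if many of the same users read both. Here is a minimal sketch of that co-occurrence version, using cosine similarity on the sets of readers; the reading data (including the users Wang Wu and Zhao Liu) is hypothetical.

```java
import java.util.*;

class ItemCF {
    // Invert user -> books into book -> readers.
    static Map<String, Set<String>> readersByBook(Map<String, Set<String>> reads) {
        Map<String, Set<String>> readers = new HashMap<>();
        reads.forEach((user, books) -> {
            for (String book : books) {
                readers.computeIfAbsent(book, k -> new HashSet<>()).add(user);
            }
        });
        return readers;
    }

    // Cosine similarity: |readers(i) ∩ readers(j)| / sqrt(|readers(i)| * |readers(j)|).
    static double similarity(Set<String> readersI, Set<String> readersJ) {
        Set<String> common = new HashSet<>(readersI);
        common.retainAll(readersJ);
        return common.size() / Math.sqrt((double) readersI.size() * readersJ.size());
    }

    // Score each unread book by its summed similarity to the books the user has read.
    static List<String> recommend(Map<String, Set<String>> reads, String user, int topN) {
        Map<String, Set<String>> readers = readersByBook(reads);
        Set<String> mine = reads.get(user);
        Map<String, Double> scores = new HashMap<>();
        for (String candidate : readers.keySet()) {
            if (mine.contains(candidate)) continue;
            double score = 0.0;
            for (String read : mine) {
                score += similarity(readers.get(read), readers.get(candidate));
            }
            if (score > 0.0) scores.put(candidate, score);
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x)); // higher score first
            return byScore != 0 ? byScore : x.compareTo(y);             // tie-break by title
        });
        return ranked.subList(0, Math.min(topN, ranked.size()));
    }

    public static void main(String[] args) {
        // Hypothetical reading records.
        Map<String, Set<String>> reads = Map.of(
                "Zhang San", Set.of("Thinking in Java"),
                "Li Si", Set.of("Thinking in Java", "Big Front-end Self-cultivation"),
                "Wang Wu", Set.of("Thinking in Java", "Big Front-end Self-cultivation", "Game Development"),
                "Zhao Liu", Set.of("The Art of War"));
        System.out.println(recommend(reads, "Zhang San", 5));
        // [Big Front-end Self-cultivation, Game Development]
    }
}
```

In practice the item-similarity matrix is usually computed offline in a batch job, which is why how often items change matters when choosing between ItemCF and UserCF.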
UserCF vs. ItemCF
From the principles of the two algorithms, UserCF's recommendations reflect what is popular within a small group of users whose interests resemble the target user's, while ItemCF's recommendations focus on preserving the user's historical interests. In other words, UserCF is more social, reflecting how popular items are within the user's small interest group, while ItemCF is more personalized, reflecting the continuity of the user's own interests.
UserCF application scenarios
1) On news websites, users' interests are not very fine-grained, and most users like to read trending news. Even personalization is coarse-grained: some users like sports news, others like social news. UserCF can recommend the news that a group of users with similar interests are reading today.
2) There is also a technical reason why UserCF suits news recommendation. News items are updated very quickly, with new content appearing all the time. ItemCF needs to maintain an item-similarity table, and if items change quickly the table must be updated just as quickly, which is technically hard. In most systems the item-similarity table can only be updated about once a day, which is unacceptable for news. UserCF needs only a user-similarity table. It does have to update that table when new users join, but on news websites items are added much faster than new users join, and new users can simply be shown the most popular news. So UserCF clearly has more advantages than disadvantages here.
ItemCF application scenarios
1) On book, e-commerce, and movie websites such as Amazon, Douban, and Netflix, ItemCF can play to its strengths. First, users' interests on these sites are relatively stable and long-lasting. Most users do not need popularity to judge an item's quality; they can judge it themselves from their knowledge of a familiar domain. The task of personalized recommendation on these sites is therefore to help users discover items related to their areas of interest. In addition, items on these sites are not updated very quickly, so updating the item-similarity matrix once a day is acceptable and loses little.
Conclusion
The sections above introduced the UserCF and ItemCF collaborative filtering algorithms, which were the commonly used recommendation algorithms in the past. In recent years, however, another collaborative approach has emerged: LFM (latent factor model). Its core idea is to connect users' interests and items through latent factors.
For example, user A is interested in detective novels, popular science books, and some computer technology books, while user B is mainly interested in mathematics and machine learning.
To recommend books to A and B:
For UserCF, we first find other users who have read the same books as them (users with similar interests), and then recommend other books those users like.
For ItemCF, we recommend books similar to those they have already read. For example, user B has read many books on data mining, so we can recommend books on machine learning or pattern recognition to him.
These approaches lack an explicit link between users' interests and items: user A and user B may be similar to some degree without having identical interests.
For example, user A is interested in detective fiction and computer technology, while user B is interested in detective fiction and economics. UserCF would then probably recommend an economics book to user A, even though A has shown no interest in economics.
So what's the solution? We add an explicit relationship between users' interests and items. First classify the items (books) by interest; then, for a given user, find the categories they are interested in and pick items from those categories that they might like.
This interest-based approach needs to solve roughly three problems:
(1) How do we classify items?
(2) How do we determine which categories of items a user is interested in, and to what extent?
(3) For a given category, which of its items should be recommended to the user, and how do we determine each item's weight within the category?
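To make the three questions concrete, here is a minimal Java sketch of interest-based recommendation in which every answer is supplied by hand: items are assigned to categories with weights (questions 1 and 3), and the user has a weight per category (question 2). The category names, weights, and item ids are hypothetical. An item's score for a user is the sum, over categories, of the user's interest weight times the item's weight in that category.

```java
import java.util.*;

class InterestBasedRecommender {
    // score(user, item) = sum over categories c of userInterest[c] * itemWeight[c]
    static double score(Map<String, Double> userInterest, Map<String, Double> itemWeights) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : itemWeights.entrySet()) {
            total += userInterest.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return total;
    }

    // Rank items with a positive score, highest first.
    static List<String> recommend(Map<String, Double> userInterest,
                                  Map<String, Map<String, Double>> items, int topN) {
        Map<String, Double> scores = new HashMap<>();
        items.forEach((id, weights) -> {
            double s = score(userInterest, weights);
            if (s > 0.0) scores.put(id, s);
        });
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x)); // higher score first
            return byScore != 0 ? byScore : x.compareTo(y);             // tie-break by id
        });
        return ranked.subList(0, Math.min(topN, ranked.size()));
    }

    public static void main(String[] args) {
        // Hypothetical hand-assigned weights for user A (detective fiction, computer technology).
        Map<String, Double> userA = Map.of("detective", 0.6, "computer", 0.4);
        Map<String, Map<String, Double>> items = Map.of(
                "detective-1", Map.of("detective", 1.0),
                "java-1", Map.of("computer", 0.9),
                "economics-1", Map.of("economics", 1.0));
        System.out.println(recommend(userA, items, 5)); // [detective-1, java-1]
    }
}
```

Unlike the UserCF outcome in the detective/economics example, the economics book scores zero for user A because A has no weight on that category. Here the categories and weights are hand-picked; deriving them from behavior data is the job LFM takes on.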
These are the problems LFM tries to solve. We will cover it in the next article. Thanks!
Follow the public account "ARCHITECTURE Notes of A Wind" for more original articles.