This is the official account of First Recommendation, the intelligent recommendation product from Fourth Paradigm. The account focuses on computer science, especially cutting-edge research in artificial intelligence. We aim to share AI knowledge with the public and to promote understanding of AI from a professional perspective. We also hope to provide an open platform for discussion, communication and learning for people working with AI, so that everyone can benefit from the value AI creates as soon as possible.
Even without user data, we can still build efficient recommendation systems that show users more quality content and engage them.
TL;DR version:
The first step is to build a content-based recommendation system that recommends similar items to users without relying on data from other users. Its features (that is, the mathematical representations of different aspects of a content item that the recommendation algorithm relies on) come from the content item itself, not from user behavior. For written text, semantic techniques can be used to extract text features.
Using that system as a baseline model, we can introduce other features, such as metadata extracted from the text, to optimize it as far as possible. Even without a definite user identity, personalized recommendations are still possible by using proxies for a user account. And assuming that users browse several items on each visit, a localized session-based recommendation system can be built on real-time trends within the session.
Full version:
“How do you build a recommendation system without user data?” We’ve come across this question many times, and today I’m going to try to answer it.
This article walks through the basics of how recommendation systems work, using some industry jargon where it matters. Where technical issues come up, I will explain the specific technical context.
In general, there are three possible ways to build a recommendation system without user data. I’ve listed them below in increasing order of complexity, assuming you have the data each one needs. Each approach comes closer than the previous one to what real user data, such as unique identifiers and profile information, would give you, even though you don’t actually have that data at hand.
Build a content-based recommendation system
First, we can build a standard content-based recommendation system whose features are tags or other content metadata. We can score items with a TF-IDF model, in which each tag is treated as a word in a precomputed dictionary (the dictionary is just a data structure: the collection of all words in the corpus).
In particular, the dictionary helps us build what are called “feature vectors”, assuming we use all tags and other features to build it. We then compare content items by their feature vectors to produce recommendations. At this stage, a basic content-based recommendation system is complete. In my experience, its recommendations are quite good. All it does at this point is recommend items that are similar to previously viewed items, where “similar” means that a recommended item carries similar tags and features to a previously viewed one.
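The idea above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the catalog, the function names and the smoothed IDF formula are all assumptions made for the example, and similarity between feature vectors is measured with cosine similarity, which the original text does not specify.

```python
import math
from collections import Counter

# Hypothetical catalog: item id -> list of tags (illustrative data only).
ITEMS = {
    "a": ["python", "tutorial", "beginner"],
    "b": ["python", "tutorial", "advanced"],
    "c": ["cooking", "recipe", "beginner"],
    "d": ["cooking", "recipe", "dessert"],
}

def tfidf_vectors(items):
    """Return {item_id: {tag: tf-idf weight}} over the tag dictionary."""
    n = len(items)
    # Document frequency: the number of items each tag appears in.
    df = Counter(tag for tags in items.values() for tag in set(tags))
    vectors = {}
    for item_id, tags in items.items():
        vectors[item_id] = {
            # Smoothed IDF (an assumed variant) keeps every weight positive.
            tag: (count / len(tags)) * (math.log((1 + n) / (1 + df[tag])) + 1)
            for tag, count in Counter(tags).items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(item_id, items, k=2):
    """Return the k items whose feature vectors are closest to item_id's."""
    vectors = tfidf_vectors(items)
    scores = {
        other: cosine(vectors[item_id], vectors[other])
        for other in items
        if other != item_id
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, `recommend("a", ITEMS, k=1)` returns the item that shares the most informative tags with item `"a"`. No user data is involved at any point; only the items' own tags are used.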
If we want a more accurate recommendation system, the first thing to do is iterate on this basic system and optimize it from there. The following sections introduce further methods.
Optimize content-based recommendation systems
The steps above rely on a single dictionary containing the existing tags and other features. The next step toward better accuracy is to build two or more dictionaries, one for each category of metadata. For each dictionary we compute a TF-IDF-based score per content item, and the recommendation system ranks items by a weighted combination of these scores. The parameters (such as the weight of each score) can be tuned against the results of subjective evaluation, keeping whichever weights give the best recommendations.
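A minimal sketch of the weighted combination might look like the following. The two metadata categories (`tags` and `authors`), the sample items and the weights are hypothetical, chosen only to illustrate the mechanics; real weights would come from the evaluation described above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical items with two metadata categories (illustrative data only).
ITEMS = {
    "a": {"tags": ["python", "tutorial"], "authors": ["alice"]},
    "b": {"tags": ["python", "reference"], "authors": ["bob"]},
    "c": {"tags": ["cooking", "tutorial"], "authors": ["alice"]},
}
# Assumed weights; in practice these are tuned against evaluation results.
WEIGHTS = {"tags": 0.7, "authors": 0.3}

def tfidf(docs):
    """Build one TF-IDF vector per item from a single metadata category."""
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    return {
        key: {
            t: c / len(terms) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in Counter(terms).items()
        }
        for key, terms in docs.items()
    }

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def combined_scores(item_id, items, weights):
    """Weighted sum of per-category similarities to item_id."""
    scores = defaultdict(float)
    for field, weight in weights.items():
        # One dictionary (and one set of vectors) per metadata category.
        vectors = tfidf({k: meta[field] for k, meta in items.items()})
        for other in items:
            if other != item_id:
                scores[other] += weight * cosine(vectors[item_id], vectors[other])
    return dict(scores)
```

With these sample items, `"b"` and `"c"` are equally similar to `"a"` on tags alone, so the author weight is what separates them. Changing `WEIGHTS` and comparing the resulting rankings is the tuning loop in miniature.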
If a certain type of metadata cannot be weighted with TF-IDF, for example because its values are not related to one another in a way TF-IDF can use, I recommend subdividing that data into categories. The subdivision gives us another set of labels (each category has its own label). As long as the number of features doesn’t balloon along the way, this doesn’t make the overall job any harder.
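One way to read this step is as bucketing: a field that is not a bag of words, such as a numeric value, is mapped to a small set of category labels that can then be treated like any other tag. The sketch below assumes a hypothetical numeric `words` field and hand-picked cut points; neither comes from the original text.

```python
from bisect import bisect_right

def bucket_label(field, value, boundaries):
    """Return a label naming the bucket that `value` falls into.

    `boundaries` is a sorted list of cut points; a value equal to a cut
    point goes into the bucket that starts at it.
    """
    i = bisect_right(boundaries, value)
    low = boundaries[i - 1] if i > 0 else "min"
    high = boundaries[i] if i < len(boundaries) else "max"
    return f"{field}:{low}-{high}"

def add_bucket_labels(items, field, boundaries):
    """Return {item_id: tags plus one bucket label for `field`}."""
    return {
        item_id: meta["tags"] + [bucket_label(field, meta[field], boundaries)]
        for item_id, meta in items.items()
    }
```

Two cut points yield three labels, so the dictionary grows by at most three entries for this field, which keeps the feature count under control.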
Filtering can then be introduced to refine the recommendations further, for example by requiring a specific tag. Filtering is not part of the core algorithm, but if we want users to be able to customize the recommendation criteria, it provides the supporting structure around the algorithm.
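As a rough sketch, filtering can be a post-processing step applied to an already ranked list. The function name and the `require`/`exclude` parameters below are hypothetical; the point is only that the ranking algorithm itself stays untouched.

```python
def filter_recommendations(ranked_ids, item_tags, require=None, exclude=None):
    """Keep ranked items that carry every required tag and no excluded tag.

    The input order is preserved, so the ranking itself is not changed.
    """
    require = set(require or [])
    exclude = set(exclude or [])
    kept = []
    for item_id in ranked_ids:
        tags = set(item_tags.get(item_id, []))
        if require <= tags and not (exclude & tags):
            kept.append(item_id)
    return kept
```

A user-facing control such as "only show tutorials" would then translate into a call like `filter_recommendations(ranked, item_tags, require=["tutorial"])`.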
Build a recommendation system using user proxies
The next step in improving accuracy is to look for data that can serve as a proxy for the user. Although we have no user accounts, we may have IP addresses, browser information, user sessions and similar signals.
With these we can construct an abstract user. Such a user account cannot be verified, but it amounts to a rudimentary form of fingerprinting. Once we can identify an “abstract” user, we can generate personalized recommendations for that user, in particular with any of several collaborative filtering techniques. This doesn’t seem complicated to me: many open source projects (such as high-level Python packages) are available. The key point is that we can construct user accounts from the proxy information we already have.
In addition, we need click interaction data. We need to know which items have already been clicked on; otherwise there is no way to tune recommendations to user preferences. Once you have click data and abstract user accounts, you can build a personalized recommendation system keyed on IP address and browser information. It’s not truly personalized recommendation, but it’s not far off.
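The following sketch shows one possible shape for this, under several assumptions: the abstract user id is a hash of IP address and browser string, the click log is a hypothetical list of tuples, and the collaborative filtering step is a simple item-based variant over binary click data. A production system would more likely use an existing library.

```python
import hashlib
import math
from collections import defaultdict

def abstract_user_id(ip, user_agent):
    """Derive a pseudo user id from proxy signals (not a verified identity)."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()[:16]

# Hypothetical click log: (ip, browser string, clicked item id).
CLICKS = [
    ("10.0.0.1", "Firefox", "a"), ("10.0.0.1", "Firefox", "b"),
    ("10.0.0.2", "Chrome", "a"), ("10.0.0.2", "Chrome", "b"),
    ("10.0.0.2", "Chrome", "c"),
    ("10.0.0.3", "Safari", "c"), ("10.0.0.3", "Safari", "d"),
]

def build_profiles(clicks):
    """Group clicked items by abstract user id."""
    profiles = defaultdict(set)
    for ip, user_agent, item in clicks:
        profiles[abstract_user_id(ip, user_agent)].add(item)
    return profiles

def recommend(user_id, profiles, k=3):
    """Item-based collaborative filtering on binary click data."""
    # Invert the profiles: item -> set of abstract users who clicked it.
    clicked_by = defaultdict(set)
    for uid, items in profiles.items():
        for item in items:
            clicked_by[item].add(uid)
    seen = profiles.get(user_id, set())
    scores = defaultdict(float)
    for candidate, users in clicked_by.items():
        if candidate in seen:
            continue
        for item in seen:
            overlap = len(users & clicked_by[item])
            # Cosine similarity between the two items' sets of users.
            scores[candidate] += overlap / math.sqrt(len(users) * len(clicked_by[item]))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [c for c in ranked if scores[c] > 0][:k]
```

Because the id is derived from IP address and browser string, two people behind the same network and browser are merged into one abstract user, and one person on two devices is split into two. That limitation is why the result is close to, but not quite, personalized recommendation.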
Build a session-based recommendation system
The final approach is to build a session-based recommendation system. It is similar to the previous approach, but this time we focus on data within a single session. Even if we can’t get user information, we can usually get session data. A user session then acts as a highly localized “user account”.
There are many session-based recommendation systems. Some are built on recurrent neural networks (RNNs) and achieve high accuracy, such as those in the research by Hidasi and Karatzoglou. The results of these systems are quite satisfactory.
Session-based recommendation systems assume that the user intends to stay on the site for a while. If the user does stay and clicks on enough items, the recommendations improve and the recommended content becomes more appealing to that user.
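To make the session idea concrete without the machinery of an RNN, the sketch below uses a much simpler technique: counting which items were clicked together in past sessions and scoring candidates against the current session's clicks. It is not the RNN approach mentioned above, and the session data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical past sessions: the items clicked during one visit each.
PAST_SESSIONS = [
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "c", "d"],
    ["d", "e"],
]

def cooccurrence_counts(sessions):
    """Count how often each pair of items appears in the same session."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for x, y in combinations(sorted(set(session)), 2):
            counts[x][y] += 1
            counts[y][x] += 1
    return counts

def recommend_for_session(current_clicks, counts, k=2):
    """Score unseen items by their co-occurrence with the session's clicks."""
    seen = set(current_clicks)
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts.get(item, {}).items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]
```

Each additional click adds evidence to the scores, which mirrors the observation that recommendations improve the longer a user stays and the more they click.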
What Are the Three Ways to Build a Recommender System When You Don’t Have Audience Data?
The above content is from Quora, compiled and published by Fourth Paradigm.
Related reading:
Want to learn about recommendation systems? Look here! (2) — Neural network method
Want to learn about recommendation systems? Look here! (1) — Collaborative filtering and singular value decomposition
How to automate the launch, operation and maintenance of an intelligent recommendation system?
Getting started with recommendation systems, a list of knowledge you shouldn’t miss
For more information, please search for and follow the First Recommendation WeChat public account (ID: DSFSXJ).