Recommendation system
Definition
To recommend items of interest to a user, based on historical data.
- Historical data: behavior data, user and item attribute data, context
- How do we target users' interests?
The basic logic of a recommendation system: use historical data to analyze the user's interests.
How to determine users’ interests
Intuitively, directly assume what the user's interest is:
- Items the target user has interacted with, or has clearly expressed a preference for, are taken as his interest points, and similar items are recommended to him based on those points.
How do we get the similarity between items?
- ItemCF: if many users have bought both itemA and itemB, then itemA and itemB are considered similar (a sketch follows this list)
- Content similarity: use NLP to compare item content, such as titles and descriptions
- Association rule mining
In recommendation-system terms, this is finding the I2I (item-to-item) path.
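As a minimal illustration of ItemCF, here is a sketch on hypothetical toy data: items that co-occur in the same users' behavior sets are counted as related, then normalized. The `user_items` data and the cosine-style normalization are assumptions made for the demo.

```python
from collections import defaultdict
from itertools import combinations
import math

# Hypothetical toy behavior data: user -> set of items the user interacted with
user_items = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"B", "C"},
}

co_count = defaultdict(lambda: defaultdict(int))  # co_count[i][j]: users who have both i and j
item_count = defaultdict(int)                     # item_count[i]: users who have item i

for items in user_items.values():
    for i in items:
        item_count[i] += 1
    for i, j in combinations(items, 2):
        co_count[i][j] += 1
        co_count[j][i] += 1

def similarity(i, j):
    # Cosine-style normalization penalizes globally popular items
    return co_count[i][j] / math.sqrt(item_count[i] * item_count[j])

print(similarity("A", "B"))  # A and B co-occur for two of the three users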
But this approach is not entirely sound: having a behavior does not necessarily mean the user really likes the item; in daily life, not everything we do is interest-driven.
Approximating the user's interest
Do not assume the user's interests directly; instead, work by similarity and analogy: find a group of similar users, and assume that what the group likes, the target user will also like. Birds of a feather flock together.
How to find a group of similar users?
- Clustering: cluster users according to their attribute information
- UserCF: users with the same behaviors are considered similar (a sketch follows below)
In recommendation-system terms, this is the U2U (user-to-user) path.
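A minimal UserCF sketch: find the most similar users by comparing behavior sets, then recommend what they liked that the target has not seen. The toy data and the choice of Jaccard similarity are assumptions for the demo, not prescribed by these notes.

```python
from collections import defaultdict

# Hypothetical toy behavior data: user -> set of items the user interacted with
user_items = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"B", "C"},
}

def user_sim(a, b):
    # Jaccard similarity between two users' behavior sets
    return len(a & b) / len(a | b)

def recommend(target, k=2):
    # Rank the other users by similarity to the target, keep the top k
    neighbors = sorted(
        ((u, user_sim(user_items[target], items))
         for u, items in user_items.items() if u != target),
        key=lambda p: p[1], reverse=True)[:k]
    # Score each unseen item by the summed similarity of neighbors who liked it
    scores = defaultdict(float)
    for u, sim in neighbors:
        for item in user_items[u] - user_items[target]:
            scores[item] += sim
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("u2"))  # ['C']: both similar users also liked C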
Clustering on attribute information can also effectively mitigate the cold-start problem, since a new user has attributes before he has any behavior.
But it is still not precise enough: a user usually has more than one interest point, and a group of similar users is only similar on some of those points. In other words, within a user group, a perfect match of users' interests is rarely achieved.
Trying to compute the user's interests
The idea is to use a latent semantic model to mine latent factors that connect users' interests with items.
A latent factor can be thought of as an implicit interest vector;
for example: topic models, matrix factorization (MF), LDA, etc.
The basic steps are as follows:
- Categorize items (for example, books) by interest
- Use latent semantic analysis to compute k latent classes, i.e., k latent features, together with the weights of each user and each item over those k classes. Latent semantic techniques include latent topic models, latent class models, matrix factorization, and so on
- For a user, first find the interest classes he belongs to, then pick out the items he might like from those classes (a matrix-factorization sketch follows this list)
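A minimal matrix-factorization sketch of the latent factor idea: each user and each item gets a weight vector over k latent classes, learned by SGD on the observed interactions, and a missing preference is predicted as the dot product of the two vectors. The toy matrix `R` and the hyperparameters are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy interaction matrix R (users x items); 0 means unobserved. Hypothetical data.
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [0, 2, 5]], dtype=float)

k, lr, reg, epochs = 2, 0.01, 0.02, 500          # k latent classes
P = rng.normal(scale=0.1, size=(R.shape[0], k))  # user weights over the latent classes
Q = rng.normal(scale=0.1, size=(R.shape[1], k))  # item weights over the latent classes

for _ in range(epochs):
    for u, i in zip(*R.nonzero()):               # SGD over observed entries only
        pu = P[u].copy()
        err = R[u, i] - pu @ Q[i]
        P[u] += lr * (err * Q[i] - reg * pu)     # regularized gradient steps
        Q[i] += lr * (err * pu - reg * Q[i])

# Predicted preference of user 0 for item 2 (previously unobserved)
print(P[0] @ Q[2])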
Recommendation systems and deep learning
The Embedding concept from deep-learning-based natural language processing coincides with this idea:
the latent interest vector can represent the correlation between a user and an item;
in natural language processing, Embedding technology can likewise express the correlation between words.
The principle of Embedding technology
One-hot encoding -> Embedding
In natural language processing, one-hot encoding is used to turn words into numbers. It uniquely identifies each word, but it has a drawback: it treats every pair of words as unrelated by default, which clearly does not match reality.
An Embedding vector, by contrast, can reflect the correlation between words. Specifically, by representing a word as a dense vector, word similarity (and hence sentence similarity) can be computed, or the dense vector can be fed directly into downstream prediction models as features.
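A tiny illustration of the difference; the words and the embedding numbers are invented for the demo:

```python
import numpy as np

# Two one-hot vectors in a 5-word vocabulary: "cat" = index 0, "kitten" = index 1
cat_oh    = np.array([1, 0, 0, 0, 0], dtype=float)
kitten_oh = np.array([0, 1, 0, 0, 0], dtype=float)

# Any two distinct one-hot vectors are orthogonal: similarity is always 0,
# so the encoding cannot express that "cat" and "kitten" are related.
print(cat_oh @ kitten_oh)          # 0.0

# Hypothetical dense embeddings (invented numbers) can carry that relation:
cat_emb    = np.array([0.8, 0.1, 0.3])
kitten_emb = np.array([0.7, 0.2, 0.3])
cos = cat_emb @ kitten_emb / (np.linalg.norm(cat_emb) * np.linalg.norm(kitten_emb))
print(round(cos, 3))               # close to 1: the two words are similar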
In natural language processing, Andrew Ng gives the example of word analogical reasoning: man is to woman as king is to queen.
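A toy sketch of that analogy as vector arithmetic. The 4-dimensional vectors are hand-made for the demo; real embeddings would come from a trained model.

```python
import numpy as np

# Hypothetical embeddings, constructed so the gender and royalty
# directions are separate vector components
vec = {
    "man":   np.array([ 1.0, 0.0, 0.2, 0.1]),
    "woman": np.array([-1.0, 0.0, 0.2, 0.1]),
    "king":  np.array([ 1.0, 0.9, 0.1, 0.3]),
    "queen": np.array([-1.0, 0.9, 0.1, 0.3]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Analogy: man -> woman as king -> ?
target = vec["king"] - vec["man"] + vec["woman"]
best = max(vec, key=lambda w: cosine(vec[w], target))
print(best)  # queen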
For example, feed the embedding vectors of the four words to the left and the four words to the right of a position into a neural network, and train it to predict the target word in the middle; this prediction task can also be used to learn word embeddings.
The simplest way to learn word embeddings:
- Learn the word embedding matrix E
Basic model (a sketch follows this list):
- one-hot vector * E = embedding vector
- the embedding vector is fed into a softmax layer over the vocabulary
- the objective function is optimized, and the embedding matrix E is learned
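A minimal numpy sketch of this basic model: one-hot * E gives the embedding, a softmax over the vocabulary predicts the target word, and one SGD step updates both E and the softmax weights. The vocabulary size, dimension, and learning rate are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, k, lr = 10, 4, 0.1                    # toy vocabulary size, embedding dim, learning rate
E = rng.normal(scale=0.1, size=(V, k))   # embedding matrix to be learned
W = rng.normal(scale=0.1, size=(k, V))   # weights of the softmax layer

def one_hot(i):
    x = np.zeros(V)
    x[i] = 1.0
    return x

def train_pair(c, t):
    """One SGD step: context word index c predicts target word index t."""
    e = one_hot(c) @ E                   # one-hot * E = embedding vector
    logits = e @ W
    p = np.exp(logits - logits.max())
    p /= p.sum()                         # softmax over the vocabulary
    grad_logits = p - one_hot(t)         # gradient of the cross-entropy loss
    E[c] -= lr * (W @ grad_logits)       # only row c of E receives a gradient
    W[:] -= lr * np.outer(e, grad_logits)

train_pair(3, 7)  # one step; real training loops over many (c, t) pairs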
**New problem:** the softmax involves too much computation; negative sampling is the solution.
Convert the 10,000-way softmax multi-class problem into **binary classification** problems (a sketch follows).
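A sketch of negative sampling under the same toy setup: each positive (context, target) pair, plus K randomly drawn negatives, is scored with a sigmoid and treated as an independent binary classification, so no full softmax over the vocabulary is needed. Sampling negatives uniformly is a simplification; Word2vec uses a unigram distribution raised to the 3/4 power.

```python
import numpy as np

rng = np.random.default_rng(0)
V, k, lr, K = 10, 4, 0.1, 3              # toy vocab size, dim, learning rate, negatives
E = rng.normal(scale=0.1, size=(V, k))   # context-word embeddings (the matrix we keep)
C = rng.normal(scale=0.1, size=(V, k))   # target-word output embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(context, target):
    """One positive pair plus K random negatives, each a binary classification."""
    samples = [(target, 1.0)] + [(int(rng.integers(V)), 0.0) for _ in range(K)]
    for word, label in samples:
        pred = sigmoid(E[context] @ C[word])
        grad = pred - label              # gradient of the binary cross-entropy
        c_old = C[word].copy()           # keep pre-update value for the second step
        C[word] -= lr * grad * E[context]
        E[context] -= lr * grad * c_old

train_pair(3, 7)  # one SGD step; repeat over many pairs in real training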
In natural language processing, word Embedding is used to express the correlation between words.
Deep learning is used to learn such an Embedding matrix.
The essence of deep learning is to learn a function from a large number of x and y samples related by x -> function -> y; the premise is deciding what x and y should be.
How do we define the correlation between words here?
- Assume that words appearing in the same sentence have some kind of correlation.
- Data preparation:
Specifically, pick any word in a sentence as the context c, to serve as the sample input x, and pick a word near that context as the target word t, to serve as the sample output y. By sliding a window, many (c, t) pairs are collected to enrich the sample set (a sketch follows).
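A sketch of this window-sliding data preparation; the sentence and window size are toy assumptions:

```python
# Slide a window over each sentence and emit (context c, target t) pairs
def make_pairs(sentences, window=2):
    pairs = []
    for sent in sentences:
        for i, c in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((c, sent[j]))   # x = context word, y = nearby word
    return pairs

sentences = [["the", "quick", "brown", "fox"]]
print(make_pairs(sentences, window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ...]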
- Model structure:
- Sample numericalization
One-hot encoding is used to vectorize the words. In the initial state, words are assumed to be unrelated to each other; the step from one-hot to Embedding is the step from unrelated to correlated.
- One-hot -> Embedding
An Embedding matrix is initialized; the final learned Embedding matrix is exactly the result Word2vec pursues.
- Embedding -> softmax output
Problem: because there are many words, the softmax layer is large and expensive to compute, so it is optimized in one of two ways:
- Hierarchical softmax: a hierarchy of binary classifications
- Negative sampling: the multi-class problem becomes a binary classification problem (see the sketch above)
A word near the target word is used as the context; in practice, it is common to use the few words just before the target as the context.
Learning word embeddings with Word2vec
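In practice one rarely implements this by hand; a library such as gensim trains Word2vec directly. A minimal usage sketch, assuming gensim 4.x and a toy corpus:

```python
# Requires: pip install gensim (assuming the gensim 4.x API)
from gensim.models import Word2Vec

sentences = [["recommend", "items", "to", "users"],
             ["users", "like", "similar", "items"]]

model = Word2Vec(
    sentences,
    vector_size=16,   # embedding dimension
    window=2,         # context window size
    min_count=1,      # keep every word in this toy corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
    negative=5,       # number of negative samples
)

print(model.wv["items"])               # learned embedding vector
print(model.wv.most_similar("items"))  # nearest words by cosine similarity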
GloVe word embeddings
What is GloVe?
- A method for producing word vectors
Word vector background
- Word2vec
Where does GloVe come from?
How is GloVe concretely implemented? (the objective from the original paper is given below)
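For reference, the GloVe objective from the original paper (Pennington, Socher, Manning, 2014): it fits word-vector dot products to the log co-occurrence counts $X_{ij}$, with a weighting function $f$ that damps rare pairs and caps frequent ones:

```latex
J \;=\; \sum_{i,j=1}^{V} f(X_{ij}) \,\bigl( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \bigr)^{2},
\qquad
f(x) \;=\;
\begin{cases}
  (x / x_{\max})^{\alpha} & x < x_{\max} \\
  1 & \text{otherwise}
\end{cases}
```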
The experimental data