Produced by: The Cabin | Written by: Peter | Edited by: Peter

Machine Learning – Recommender Systems

This week mainly covers recommender systems. Recommendation is one of the most popular directions in machine learning and artificial intelligence, alongside NLP, CV, etc. The main topics are:

  • Introduction to recommendation systems
  • Content-based recommendation systems
  • Collaborative filtering

Recommendation system

Introduction to recommendation systems

There are three main application scenarios for common recommendation systems:

  • Personalized recommendation: usually presented as "Recommended for you", "Guess you like", or "Discover", and generally placed on the home page
  • Related recommendation: usually presented as "Related recommendations" or "People who viewed this also viewed", and generally placed on the content details page
  • Hot recommendation: rankings computed from various kinds of data, supporting both global and per-category rankings; placement is not restricted

The core value of recommendation system to users is mainly reflected in:

  1. Helping users quickly and conveniently filter out the content they are interested in
  2. Providing references for users in unfamiliar areas
  3. Satisfying users' curiosity

The main tasks of the recommendation system are:

  • First, mine the user’s interests from their historical behavior and match items against the user’s personalized preferences.
  • Then filter the information through recommendation algorithms to solve the information-overload problem.
  • When the user produces a new behavior, such as a click or a search, capture the updated interest promptly.
  • Finally, choose the appropriate scenario (personalized, related, or hot recommendation) to present results to the user.

A personalized recommendation system is a bridge between users and content. Based on users’ interest preferences, it recommends articles, videos, news, and other items they are likely to enjoy, giving users an immersive experience.


Problem formalization

Recommendation systems are widely used: think of sites like Amazon, Netflix, eBay, or iTunes Genius. Plenty of sites and systems try to recommend new products to users: Amazon recommends new books to you, Netflix tries to recommend new movies to you, and so on.

These recommendation systems look at what books you have bought or what movies you have rated in the past, and they generate a large share of revenue for companies such as Amazon and Netflix.

Therefore, improving recommender system performance has a substantial and direct impact on these businesses.

Understanding recommendation systems through an example

Let’s say we’re a movie vendor with five movies and four users, and we ask the users to rate the movies.

The first three are romance movies and the last two are action movies. Alice and Bob lean toward romance, while Carol and Dave lean toward action. Notation:

  • $n_u$: the number of users
  • $n_m$: the number of movies
  • $r(i,j) = 1$ if user $j$ has rated movie $i$
  • $y^{(i,j)}$: the rating of movie $i$ given by user $j$
  • $m_j$: the total number of movies that user $j$ has rated

Content Based Recommendations

In a content-based recommendation system, we assume we have data describing the features of the items we wish to recommend. Suppose each movie has two features:

  • $x_1$: degree of romance
  • $x_2$: degree of action

So each movie has a feature vector; for example, the first movie’s is $[0.9, 0]$.

We construct the recommendation algorithm from these features. Assume a linear regression model for each user, where $\theta^{(1)}$ denotes the parameters of the first user’s model. Define:

  • $\theta^{(j)}$: the parameter vector of user $j$
  • $x^{(i)}$: the feature vector of movie $i$

For movie $i$ and user $j$, the cost of the linear regression model is the sum of squared prediction errors plus a regularization term:


$$\min_{\theta^{(j)}}\frac{1}{2}\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{k=1}^{n}\left(\theta_{k}^{(j)}\right)^2$$

Here $i:r(i,j)=1$ means we only count the movies that user $j$ has rated. In a standard linear regression model both the error term and the regularization term would be divided by the number of training examples $m$; here we drop $m$. We also do not regularize the bias term $\theta_0$.
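As a concrete illustration, the per-user cost above can be computed directly. The feature values, ratings, and $\lambda$ below are made-up numbers, and this sketch regularizes every parameter because it omits the bias term $\theta_0$:

```python
import numpy as np

# X: one row per movie, columns are the features [romance, action].
X = np.array([[0.9, 0.0],
              [1.0, 0.01],
              [0.1, 1.0]])
theta_j = np.array([5.0, 0.0])         # hypothetical parameters for user j
y_j = np.array([5.0, 5.0, 0.0])        # user j's ratings
r_j = np.array([1, 1, 0], dtype=bool)  # r(i, j) = 1 where user j rated movie i
lam = 1.0                              # regularization strength lambda

# Squared error only over movies the user has rated, plus the L2 penalty.
err = X[r_j] @ theta_j - y_j[r_j]
cost = 0.5 * np.sum(err ** 2) + 0.5 * lam * np.sum(theta_j ** 2)
```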

Sum of cost functions for all users:


$$\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2$$

If we use gradient descent to find the optimal solution, then after taking partial derivatives of the cost function, the update rules are:


$$\theta_k^{(j)}:=\theta_k^{(j)}-\alpha\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)x_{k}^{(i)} \quad (\text{for } k = 0)$$


$$\theta_k^{(j)}:=\theta_k^{(j)}-\alpha\left(\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)x_{k}^{(i)}+\lambda\theta_k^{(j)}\right) \quad (\text{for } k\neq 0)$$
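The update rules can be written as one vectorized step. This is a sketch with illustrative data; the feature vectors here carry no intercept, so every component receives the $\lambda$ term, and the function name is a hypothetical helper:

```python
import numpy as np

def update_theta_j(theta_j, X, y_j, r_j, lam, alpha):
    """One gradient descent step for user j's parameters."""
    err = X[r_j] @ theta_j - y_j[r_j]      # prediction errors on rated movies
    grad = X[r_j].T @ err + lam * theta_j  # partial derivatives incl. regularization
    return theta_j - alpha * grad

X = np.array([[0.9, 0.0], [1.0, 0.01], [0.1, 1.0]])
theta_j = np.zeros(2)
y_j = np.array([5.0, 5.0, 0.0])
r_j = np.array([1, 1, 0], dtype=bool)
theta_j = update_theta_j(theta_j, X, y_j, r_j, lam=0.0, alpha=0.1)
```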

Collaborative Filtering

The content-based algorithm above uses the movies’ features to train each user’s parameters. Conversely, given the users’ parameters, we can learn the movies’ features:


$$\min_{x^{(1)},\dots,x^{(n_m)}}\frac{1}{2}\sum_{i=1}^{n_m}\sum_{j:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$$

When neither the users’ parameters nor the movies’ features are known in advance, the collaborative filtering algorithm can learn both:


$$\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2$$

After taking partial derivatives of the cost function, the gradient descent updates are as follows:


$$\theta_k^{(j)}:=\theta_k^{(j)}-\alpha\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)x_{k}^{(i)} \quad (\text{for } k = 0)$$


$$\theta_k^{(j)}:=\theta_k^{(j)}-\alpha\left(\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)x_{k}^{(i)}+\lambda\theta_k^{(j)}\right) \quad (\text{for } k\neq 0)$$

The process of collaborative filtering algorithm:

  1. Initialize $x$ and $\theta$ to small random values
  2. Minimize the cost function $\min J(x,\theta)$ using gradient descent
  3. After training, predict user $j$’s rating of movie $i$ as $(\theta^{(j)})^T x^{(i)}$
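The three steps can be sketched end to end on the five-movie ratings table from this post. The hyperparameters, iteration count, and random seed below are arbitrary choices for illustration:

```python
import numpy as np

Y = np.array([[5, 5, 0, 0],       # "?" entries are stored as 0 and masked by R
              [5, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 5, 0]], dtype=float)
R = np.array([[1, 1, 1, 1],       # r(i, j) = 1 where user j rated movie i
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 1, 1, 1],
              [1, 1, 1, 0]], dtype=float)
n_m, n_u, n = 5, 4, 2
rng = np.random.default_rng(0)
X = 0.01 * rng.standard_normal((n_m, n))      # step 1: small random init
Theta = 0.01 * rng.standard_normal((n_u, n))
lam, alpha = 0.1, 0.02

for _ in range(5000):                         # step 2: gradient descent on J
    E = (X @ Theta.T - Y) * R                 # errors only where r(i, j) = 1
    X_grad = E @ Theta + lam * X
    Theta_grad = E.T @ X + lam * Theta
    X -= alpha * X_grad
    Theta -= alpha * Theta_grad

pred = X @ Theta.T                            # step 3: predicted ratings
```

After training, `pred[i, j]` approximates the observed entries of `Y` and fills in the missing ones.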

Collaborative filtering algorithm

Optimization objectives of collaborative filtering:

Given $x^{(1)},\dots,x^{(n_m)}$, estimate $\theta^{(1)},\dots,\theta^{(n_u)}$:


$$\min_{\theta^{(1)},\dots,\theta^{(n_u)}}\frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2$$

Given $\theta^{(1)},\dots,\theta^{(n_u)}$, estimate $x^{(1)},\dots,x^{(n_m)}$:

Alternatively, minimize over $x^{(1)},\dots,x^{(n_m)}$ and $\theta^{(1)},\dots,\theta^{(n_u)}$ simultaneously:


$$J(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)})=\frac{1}{2}\sum_{(i,j):r(i,j)=1}\left((\theta^{(j)})^Tx^{(i)}-y^{(i,j)}\right)^2+\frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2+\frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2$$


$$\min_{x^{(1)},\dots,x^{(n_m)},\;\theta^{(1)},\dots,\theta^{(n_u)}}J(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)})$$
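This joint cost can be computed in vectorized form, with $Y$ holding the ratings and $R$ the indicator matrix of $r(i,j)=1$. A minimal sketch (the tiny matrices in the usage line are made up):

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Joint collaborative filtering cost J over all movies and users."""
    E = (X @ Theta.T - Y) * R           # only entries with r(i, j) = 1 count
    return (0.5 * np.sum(E ** 2)
            + 0.5 * lam * np.sum(X ** 2)
            + 0.5 * lam * np.sum(Theta ** 2))

# Usage: one movie, one user, a perfect prediction, so only the penalty remains.
cost = cofi_cost(np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]]),
                 np.array([[1.0]]), np.array([[1.0]]), lam=2.0)
```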

Vectorization: Low Rank Matrix Factorization

Collaborative filtering algorithms can do the following:

  1. Given an item, find similar items
  2. When a user browses a product, find similar products and recommend them

Suppose the ratings of five movies by four users are stored in a matrix:

| Movie | Alice (1) | Bob (2) | Carol (3) | Dave (4) |
| --- | --- | --- | --- | --- |
| Love at last | 5 | 5 | 0 | 0 |
| Romance forever | 5 | ? | ? | 0 |
| Cute puppies of love | ? | 4 | 0 | ? |
| Nonstop car chases | 0 | 0 | 5 | 4 |
| Swords vs. karate | 0 | 0 | 5 | ? |

Predict the missing ratings.
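In vectorized form, stacking the movie feature vectors as rows of a matrix $X$ and the user parameter vectors as rows of $\Theta$, every predicted rating comes out at once as an entry of the low-rank product $X\Theta^T$. The feature and parameter values below are illustrative guesses, not learned ones:

```python
import numpy as np

X = np.array([[0.9, 0.0],       # Love at last
              [1.0, 0.01],      # Romance forever
              [0.99, 0.0],      # Cute puppies of love
              [0.1, 1.0],       # Nonstop car chases
              [0.0, 0.9]])      # Swords vs. karate
Theta = np.array([[5.0, 0.0],   # Alice
                  [5.0, 0.0],   # Bob
                  [0.0, 5.0],   # Carol
                  [0.0, 5.0]])  # Dave
pred = X @ Theta.T              # pred[i, j] = (theta^(j))^T x^(i)
```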

Find similar movies: a small distance $\|x^{(i)} - x^{(k)}\|$ between feature vectors suggests that movies $i$ and $k$ are similar.
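A small sketch of this similarity search, with made-up feature vectors:

```python
import numpy as np

X = np.array([[0.9, 0.0],                  # features of three movies
              [1.0, 0.01],
              [0.1, 1.0]])
i = 0
dists = np.linalg.norm(X - X[i], axis=1)   # distance from movie i to every movie
dists[i] = np.inf                          # exclude the movie itself
most_similar = int(np.argmin(dists))       # index of the closest other movie
```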

Mean Normalization

Given the ratings above, suppose a new user, Eva, has not rated any movies. On what basis do we recommend movies to her?

  • Mean-normalize the matrix $Y$: subtract from each rating the average rating that the corresponding movie received across all users, obtaining a mean-normalized matrix.

  • Train the algorithm on the new matrix $Y$. To predict ratings with the trained algorithm, add the mean back: $(\theta^{(j)})^T x^{(i)}+\mu_i$. For a user with no ratings, the model then predicts each movie’s average score.
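A minimal sketch of mean normalization on the first two rows of the ratings matrix, where the mask $R$ marks which entries are observed:

```python
import numpy as np

Y = np.array([[5, 5, 0, 0],
              [5, 0, 0, 0]], dtype=float)
R = np.array([[1, 1, 1, 1],                # r(i, j) = 1 where a rating exists
              [1, 0, 0, 1]], dtype=float)
mu = (Y * R).sum(axis=1) / R.sum(axis=1)   # per-movie mean over rated entries
Y_norm = (Y - mu[:, None]) * R             # normalized ratings; unrated stay 0

# After training on Y_norm, a prediction for user j on movie i becomes
# (theta^(j))^T x^(i) + mu[i]; a brand-new user with theta near 0 therefore
# receives each movie's average rating mu[i].
```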