Several common algorithms in recommendation systems

• Content-based recommendations
  - Content feature representation, feature learning, generating the recommendation list

• Recommendations based on collaborative filtering
  - Collective intelligence mined from historical user behavior

• Recommendations based on association rules
  - Transactions, frequent itemsets, and association rule mining

• Utility-based recommendations
  - Definition of a utility function for user u

• Knowledge-based recommendations
  - Construction of a knowledge graph

• Hybrid recommendations
  - Often used in practical work
  - Each recommendation algorithm has its own usage scenarios, so several can be considered together

Recommendation algorithms based on association rules:

- Apriori algorithm
- FP-Growth algorithm
- PrefixSpan algorithm

What are association rules?

Association rules, also known as market basket analysis

If a consumer buys product A, what is the probability that they will also buy product B?


What is the process of the Apriori algorithm?

The process of the Apriori algorithm:

Step 1: Set K = 1 and calculate the support of every K-itemset.

Step 2: Filter out the itemsets whose support is below the minimum support.

Step 3: If the remaining set is empty, the frequent (K-1)-itemsets are the final result.

Otherwise, set K = K + 1 and repeat steps 1-3.
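Below is a minimal sketch of this loop in plain Python (not a library implementation; the helper name apriori and the min_support value are illustrative). It uses the five example orders listed in the table later in this section.

from itertools import combinations

def apriori(transactions, min_support=0.6):
    """Return frequent itemsets and their support, following the K = 1, 2, ... loop above."""
    n = len(transactions)
    transactions = [set(t) for t in transactions]
    # Candidate 1-itemsets are the individual items
    candidates = sorted({frozenset([item]) for t in transactions for item in t}, key=sorted)
    result = {}
    k = 1
    while candidates:
        # Step 1: compute the support of every candidate K-itemset
        frequent = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            # Step 2: keep only itemsets that reach the minimum support
            if support >= min_support:
                frequent[c] = support
        # Step 3: stop when nothing survives; otherwise grow K and repeat
        if not frequent:
            break
        result.update(frequent)
        k += 1
        pairs = combinations(list(frequent), 2)
        candidates = [c for c in {a | b for a, b in pairs} if len(c) == k]
    return result

orders = [
    {"milk", "bread", "diapers"},
    {"coke", "bread", "diapers", "beer"},
    {"milk", "diapers", "beer", "eggs"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]
for itemset, sup in sorted(apriori(orders).items(), key=lambda kv: -kv[1]):
    print(set(itemset), round(sup, 2))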

What are the deficiencies of the Apriori algorithm?

Deficiencies of the Apriori algorithm during computation:

It may generate a large number of candidate sets, because it enumerates all possible itemsets by combination.

Every iteration rescans the whole data set to recount the support of each candidate itemset, which wastes computing time and space.

Correlation analysis and regression analysis

Correlation analysis:

If correlation analysis shows that some correlation exists, regression analysis can then be used to verify the exact relationship between the variables.

The correlation coefficient obtained from correlation analysis is not as precise as regression analysis.

Correlation analysis is a descriptive analysis, while regression analysis goes deeper and is more precise.

Use a pandas DataFrame to show the correlation between columns:

DataFrame.corr(method='pearson', min_periods=1)

Options for the method parameter:

pearson: measures the degree to which two data sets lie on a line; this correlation coefficient is computed for linear data and is inaccurate for nonlinear data

kendall: an indicator of the correlation between ordinal (ranked) variables, usually used to study the consistency of rating data, such as judges' scores or data rankings

spearman: a correlation coefficient for nonlinear, non-normally distributed data

Pearson's coefficient is the most widely used correlation statistic; it measures the degree of linear correlation between two sets of continuous variables
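A small illustration of the three methods (the column names and numbers below are invented for demonstration):

import pandas as pd

# Invented data: study hours vs. exam score
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 70, 72, 80],
})

print(df.corr(method="pearson"))    # linear correlation
print(df.corr(method="kendall"))    # rank concordance, suited to ordinal/rating data
print(df.corr(method="spearman"))   # rank correlation for nonlinear, non-normal data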

Regression analysis:

A widely used statistical method for determining the quantitative relationship of interdependence between two or more variables

According to the number of independent variables involved, it is divided into simple regression analysis (one independent variable) and multiple regression analysis

According to the number of dependent variables, it can be divided into univariate and multivariate regression analysis

According to the relationship between the independent and dependent variables, it can be divided into linear regression analysis and nonlinear regression analysis

Linear regression model

Loss function

The loss function measures how good the model is

MSE (mean squared error) is the loss function most commonly used in regression problems

from sklearn import linear_model
clf = linear_model.LinearRegression()

fit(X, y): train the model, fitting its parameters

predict(X): make predictions

coef_: holds the regression coefficients

intercept_: holds the intercept

score(X, y): returns the score, i.e. R squared (the coefficient of determination)
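A self-contained sketch of the API listed above (the toy data is invented; MSE is computed with sklearn's mean_squared_error):

import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

# Toy data: y is roughly 2x + 1 with a little noise
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

clf = linear_model.LinearRegression()
clf.fit(X, y)                          # training: fit the parameters

y_pred = clf.predict(X)                # prediction
print(clf.coef_, clf.intercept_)       # regression coefficient(s) and intercept
print(mean_squared_error(y, y_pred))   # MSE loss
print(clf.score(X, y))                 # R squared (coefficient of determination)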

R-squared:

R squared, also called the coefficient of determination, indicates how well the model fits the real data and is used to evaluate the prediction quality

R squared is calculated as 1 minus the ratio of the variation of y left unexplained by the regression equation (the residual sum of squares) to the total variation of y

In simple linear regression, R squared equals the square of the Pearson product-moment correlation coefficient

For example, R squared = 0.8 indicates that the regression relationship explains 80% of the variation in the dependent variable. In other words, if the independent variable X could be held constant, the variability of the dependent variable Y would drop by 80%

In scikit-learn's calculations, correlation coefficients can be positive or negative; the sign indicates the direction of the relationship
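A short numerical check of the R squared definition above, on invented numbers: 1 minus the residual sum of squares over the total sum of squares matches sklearn's r2_score.

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([3.2, 4.8, 7.1, 9.3, 10.6])

ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained (residual) variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation of y
print(1 - ss_res / ss_tot)                       # manual R squared
print(r2_score(y_true, y_pred))                  # same value from sklearn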

What do support, confidence, and lift represent in association rules, and how are they calculated?

Support: the percentage of all orders (transactions) that contain a given item or item combination, based on the five example orders in the table below. The higher the support, the more frequently the combination occurs.

Support(milk) = 4/5 = 0.8

Support(milk + bread) = 3/5 = 0.6

Confidence: a conditional probability

It is the probability of buying product B given that product A has been bought

Confidence(milk → beer) = 2/4 = 0.5

Confidence(beer → milk) = 2/3 ≈ 0.67

Lift: the degree to which buying product A raises the probability of buying product B

Lift(A→B) = Confidence(A→B) / Support(B)

The three possible cases for the lift:

Lift(A→B) > 1: A promotes B;

Lift(A→B) = 1: no effect, neither promotion nor decline;

Lift(A→B) < 1: A suppresses B.

Order no.  Goods purchased
1          Milk, bread, diapers
2          Coke, bread, diapers, beer
3          Milk, diapers, beer, eggs
4          Bread, milk, diapers, beer
5          Bread, milk, diapers, coke
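The numbers above can be reproduced from the five orders with a few lines of plain Python (the helper names support, confidence, and lift are illustrative):

orders = [
    {"milk", "bread", "diapers"},
    {"coke", "bread", "diapers", "beer"},
    {"milk", "diapers", "beer", "eggs"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]

def support(*items):
    # Fraction of orders that contain all of the given items
    return sum(1 for o in orders if set(items) <= o) / len(orders)

def confidence(a, b):
    # P(buy b | buy a) = support(a and b) / support(a)
    return support(a, b) / support(a)

def lift(a, b):
    # confidence(a -> b) / support(b)
    return confidence(a, b) / support(b)

print(support("milk"))             # 0.8
print(support("milk", "bread"))    # 0.6
print(confidence("milk", "beer"))  # 0.5
print(confidence("beer", "milk"))  # approximately 0.67
print(lift("milk", "beer"))        # about 0.83, i.e. milk slightly suppresses beer here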

The difference between association rules and collaborative filtering

Association rules are based on transactions, while collaborative filtering is based on user preferences (ratings)

Product bundling uses market basket analysis (e.g. the Apriori algorithm), while collaborative filtering computes similarities between users or items

Association rules do not rely on "user preferences"; they mine frequent itemsets from purchase orders instead

Current needs:

Recommendations based only on the current purchase/click (association rules)

Long-term preferences:

Based on analysis of the user's historical behavior, build a ranking of preferences over a period of time (collaborative filtering)

The two recommendation algorithms think along different dimensions. In many cases, the results of several recommendation methods need to be combined into a hybrid recommendation.

How are the minimum support and minimum confidence determined in association rules?

The minimum support and minimum confidence are determined experimentally

Minimum support:

The appropriate minimum support varies greatly from data set to data set; it may lie anywhere between 0.01 and 0.5

As a reference, you can output the support of the top 20 itemsets, sorted from high to low (see the sketch at the end of this section)

Minimum confidence: roughly between 0.5 and 1

Lift: the multiple by which the rule improves the likelihood of the consequent; it is the ratio of the confidence to the expected confidence (the support of the consequent)

For a rule to be meaningful, the lift should be greater than 1
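One way to produce the reference mentioned above (itemset supports sorted from high to low), sketched in plain Python on the example orders; for multi-item sets, the apriori() helper sketched earlier can be reused with a very low minimum support.

from collections import Counter

orders = [
    {"milk", "bread", "diapers"},
    {"coke", "bread", "diapers", "beer"},
    {"milk", "diapers", "beer", "eggs"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]

# Support of every single item, highest first; the top 20 give a feel
# for where a sensible minimum support threshold might lie
counts = Counter(item for order in orders for item in order)
for item, c in counts.most_common(20):
    print(item, round(c / len(orders), 2))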