Introduction

A recommendation system recommends items (videos, news, goods, live broadcasts, etc.) that match each user’s personal preferences as closely as possible, and those preferences are inferred from past behavior data. When there is no history to draw on, the recommendation cold start problem arises. This article first describes the concepts related to cold start, then introduces some conventional solutions used in the industry, and finally focuses on how Huajiao Live handles user cold start in practice.

1. Cold start concept

The recommendation system aims to predict users’ preferences for items based on user characteristics, interaction behavior, and item characteristics, so as to make personalized recommendations. When items or users lack sufficient effective information, the cold start problem arises.

Cold start problems can be classified as follows:

1.1 System cold start

This refers to a product that has just launched and has not yet accumulated enough data or users. At this stage, product designers and developers mainly work together to complete the cold start of the system, which includes product positioning, promotion to target groups, seed user acquisition, and so on.

1.2 Cold start of items

This refers to newly added items that have no interaction data (how many people watched a live broadcast, how many clicked or bought a product, how many rated a movie or TV series, etc.), such as new products on e-commerce platforms and new anchors on live broadcast platforms.

1.3 User cold start

This refers to newly registered users who have no interaction records. The initial experience almost determines whether new users are retained, so user cold start is a very important part of recommendation cold start. This article mainly introduces solutions to user cold start and Huajiao Live’s exploration of it.

2. User cold start methods

Several methods for user cold start are common in the industry.

2.1 Recommendation of popular content

Relying on expert experience, popular content can be recommended to new users through a set of rules. This solution is relatively simple, but in most cases it lacks variety, and users lose interest quickly if the list is not refreshed in time.

2.2 Group-representative recommendation

Offer the user several tags or items of interest to choose from. The user’s interests and preferences are initialized from the chosen tags or group-representative items, and related items are then recommended.

The advantage of group-representative recommendation is that its similarity-based recommendation can build on matrix factorization: if the existing recommendation system already has a matrix factorization model, the scheme can be implemented without many changes. The scheme is also highly interpretable. The drawback is that the front-end interaction logic must be modified to add an extra step for new users, which may degrade the user experience.

2.3 Use of auxiliary information

New users may log in with accounts from other platforms, so their profiles can be extended with the user profiles that already exist on those platforms (such as friends, interests, and hobbies on social platforms) to support cold start recommendation.

The use of auxiliary information can solve the user cold start problem to a certain extent. However, because of privacy protection and platform data protection, obtaining user profiles from other platforms involves a relatively large development workload in practice.

2.4 Bandit algorithms (exploration and exploitation)

The multi-armed bandit (MAB) problem is a classic exploration-and-exploitation (E&E) problem in reinforcement learning. Each arm of the slot machine has a different probability of winning. The player pulls arms repeatedly according to some strategy and iteratively updates the strategy according to the outcome of each pull, so as to maximize the total reward. Bandit algorithms include epsilon-greedy, UCB, LinUCB, Thompson sampling, and others. Thompson sampling is taken as an example to briefly explain how an E&E algorithm works.

Thompson sampling

We assume that the winning probability of each arm follows a Beta(win, lose) distribution, so maximizing the return becomes a matter of estimating the parameters of each arm’s winning-probability distribution. In each iteration, we draw a probability p for each arm from its current Beta distribution, select the arm with the largest p for the next trial, and update that arm’s win and lose parameters according to the result.
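
The loop described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the production implementation: the function names, the simulated true win probabilities, and the fixed seed are hypothetical, and 1 is added to both counts (a uniform Beta(1, 1) prior) because the Beta distribution requires strictly positive parameters.

```python
import random


def thompson_select(wins, losses, rng):
    """Draw one probability per arm from its Beta posterior and pick the largest."""
    # Assumption: +1 on both counts, i.e. a uniform Beta(1, 1) prior for unseen arms.
    samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_thompson(true_probs, rounds, seed=0):
    """Simulate pulling arms whose hidden win probabilities are true_probs."""
    rng = random.Random(seed)
    wins = [0] * len(true_probs)
    losses = [0] * len(true_probs)
    for _ in range(rounds):
        arm = thompson_select(wins, losses, rng)
        # Simulated feedback: a win happens with the arm's hidden probability.
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses


if __name__ == "__main__":
    wins, losses = run_thompson([0.1, 0.3, 0.7], rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Early on, the wide posteriors make the choice close to random (exploration); as counts accumulate, the samples concentrate and the best arm is pulled most of the time (exploitation).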

The E&E trade-off in the MAB problem fits the user cold start scenario well. A bandit algorithm exploits the interests a user has already shown while exploring the user’s potential interests over multiple attempts. Such algorithms can also be used to improve the diversity of recommendations. However, a bandit algorithm needs repeated trials and updates before it fits a user’s interests. Because it uses no prior knowledge, its first few recommendations may perform worse than popular recommendation.

2.5 Deep learning

Thanks to multi-layer network design and techniques such as dropout, deep learning can automatically extract higher-order features, and its generalization ability is often better than that of traditional machine learning models. In the cold start scenario, new users have few behavioral features. The model can instead be trained to learn group characteristics, for example which categories of goods are liked by men in region XX who use phone model XX.

Such group features can be obtained through feature crossing, for example the manual feature crosses in the Wide part of the Wide&Deep model. However, manual crossing can only learn weights for cross features that appear in the training samples. For example, if feature A takes values A1 and A2, feature B takes values B1 and B2, and only A1_B1 and A2_B2 appear in the samples, then the weights of A1_B2 and A2_B1 cannot be learned.

DeepFM uses a factorization machine (the FM part) to represent each feature as a vector and realizes automatic feature crossing through vector inner products. This also lets DeepFM assign weights to feature pairs that never co-occur in the training set.
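
To illustrate why the inner product helps, here is a small sketch of the second-order FM term only; it is not DeepFM itself. The embedding values are made up for illustration (in a real model they are learned), and the function names are hypothetical. Because every feature value has its own vector, a pair such as A1 and B2 still gets an interaction score even if it never appeared together in training.

```python
from itertools import combinations


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fm_pairwise_score(active, embeddings):
    """Second-order FM term: sum of <v_i, v_j> over all pairs of active features."""
    return sum(dot(embeddings[i], embeddings[j]) for i, j in combinations(active, 2))


# Hypothetical embeddings. In training, A1 is updated by every sample containing A1
# and B2 by every sample containing B2, so the pair never has to co-occur.
embeddings = {
    "A1": [0.9, 0.1],
    "A2": [0.2, 0.8],
    "B1": [0.7, 0.3],
    "B2": [0.1, 0.9],
}

if __name__ == "__main__":
    print("seen pair A1_B1:", fm_pairwise_score(["A1", "B1"], embeddings))
    print("unseen pair A1_B2:", fm_pairwise_score(["A1", "B2"], embeddings))
```

A manually crossed feature A1_B2 would need its own weight, which stays untrained when the pair is absent from the samples; here the score comes entirely from the two per-feature vectors.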

3. User cold start recommendation at Huajiao Live

Because new users have been on the platform only briefly, their recent interaction records are not enough to build the user profile used for existing users (viewing frequency and duration over a past period, number of danmu (bullet comments) sent, etc.), so the recommendation model for existing users is not suitable for them. We mainly rely on basic profiles and short-term interaction records, and use these features to solve the cold start problem for new users.

3.1 Popular Recommendations

As mentioned in 2.1, a list of hot anchors can be generated according to some rules (such as ranking live rooms by new-user retention rate or by popularity) and then recommended to new users.

3.2 Hot + real-time behavioral feedback

On top of pure popular recommendation, we introduce the user’s recent viewing records as a feature: collaborative filtering recalls anchors from the candidate set that are similar to those in the viewing records, and the recalled anchors are ranked and recommended to the user. If the user has no viewing record, the hot list is recommended. This scheme is relatively simple and reuses the existing collaborative filtering model, so no separate model needs to be trained.
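
The recall-then-fallback logic can be sketched roughly as below. All names are hypothetical, and the similarity table stands in for whatever the existing collaborative filtering model provides. Two details are assumptions beyond the text: anchors the user has already watched are excluded, and a short recall list is padded with hot anchors.

```python
def recommend(recent_views, similar_anchors, hot_list, candidates, k=3):
    """Recall anchors similar to recent views; fall back to the hot list if there are none."""
    if not recent_views:
        # No viewing record: recommend the hot list, restricted to online candidates.
        return [a for a in hot_list if a in candidates][:k]

    # Sum similarity to every recently viewed anchor (a simple item-based CF score).
    scores = {}
    for viewed in recent_views:
        for anchor, sim in similar_anchors.get(viewed, {}).items():
            if anchor in candidates and anchor not in recent_views:
                scores[anchor] = scores.get(anchor, 0.0) + sim

    # Rank by score, breaking ties by anchor id so the order is stable.
    ranked = sorted(scores, key=lambda a: (-scores[a], a))

    # Assumption: pad with hot anchors when recall returns fewer than k results.
    for anchor in hot_list:
        if len(ranked) >= k:
            break
        if anchor in candidates and anchor not in ranked and anchor not in recent_views:
            ranked.append(anchor)
    return ranked[:k]


if __name__ == "__main__":
    similar = {"a1": {"a2": 0.9, "a3": 0.4}, "a4": {"a3": 0.7, "a5": 0.2}}
    hot = ["a5", "a2", "a6"]
    online = {"a2", "a3", "a5", "a6"}
    print(recommend([], similar, hot, online))
    print(recommend(["a1", "a4"], similar, hot, online))
```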

3.3 Bandit algorithm

In the multi-armed bandit problem, the player repeatedly selects arms according to a strategy and iteratively updates the winning probability of each arm according to the feedback. In live broadcast recommendation, many anchors are online, but a user can watch only one anchor at a time, and because the phone screen is small, only a few anchors can be shown on one screen. If each anchor corresponded directly to one arm, with one exposure modeled as a trial and one click or view as a win, many arms would never be updated. We therefore made a modification: we clustered the anchors, treated each category as an arm, and computed each user’s bandit score for each category.

Taking Thompson sampling as an example, we first trained anchor vectors with matrix factorization, Item2vec, and other methods, and obtained multiple categories through K-means clustering. During online learning, the bandit parameters of each category are updated according to the user’s viewing records, and the category score is then multiplied by the anchor score to get the final score. Unlike the category score, which is based on bandit trials for a single user, the anchor score is computed by updating bandit parameters from the exposure and click records of all users on the platform. If an anchor has no exposure or click record, a random value is assigned. Compared with the popular and viewing-based collaborative filtering schemes, the bandit algorithm can recommend more varied types of live streams to users, at the cost of poorer performance in the first few recommendations.
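
A rough sketch of the scoring step follows; it is an illustration under assumptions rather than the production code. The clustering is taken as given (each anchor already has a category), the function and argument names are hypothetical, and 1 is added to each count as a uniform prior so the Beta parameters stay positive.

```python
import random


def sample_beta(wins, losses, rng):
    """Thompson sample from Beta(wins + 1, losses + 1); the +1 prior is an assumption."""
    return rng.betavariate(wins + 1, losses + 1)


def final_scores(user_category_counts, anchor_category, anchor_counts, rng):
    """Final score = per-user category sample * platform-wide anchor sample."""
    # One sample per category, from this user's own (watched, skipped) counts.
    # Sorting keeps the order of random draws reproducible.
    category_score = {
        c: sample_beta(*user_category_counts.get(c, (0, 0)), rng)
        for c in sorted(set(anchor_category.values()))
    }
    scores = {}
    for anchor, category in anchor_category.items():
        if anchor in anchor_counts:
            # (clicks, exposures without a click) aggregated over all users.
            clicks, misses = anchor_counts[anchor]
            anchor_score = sample_beta(clicks, misses, rng)
        else:
            # No exposure or click record: assign a random value, as described above.
            anchor_score = rng.random()
        scores[anchor] = category_score[category] * anchor_score
    return scores


if __name__ == "__main__":
    rng = random.Random(7)
    user_counts = {"music": (5, 1), "games": (0, 4)}
    categories = {"a1": "music", "a2": "games", "a3": "music"}
    platform_counts = {"a1": (120, 880), "a2": (300, 700)}
    scores = final_scores(user_counts, categories, platform_counts, rng)
    print(sorted(scores, key=scores.get, reverse=True))
```

Using categories as arms means every exposure updates an arm that many anchors share, which avoids the sparsity of one arm per anchor.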

3.4 Deep learning

Although new users lack interaction records, they often have some basic profile attributes (gender, age) and contextual features (region, device model, channel, time, etc.). The bandit algorithm does not use this information, and its early recommendation attempts are uncertain, so its initial recommendation quality is mediocre. We therefore try deep learning to learn group interest preferences from these basic features. In Wide&Deep, the user’s contextual features are manually crossed with the anchor’s tags, while DeepFM performs the feature crossing automatically; both aim to learn group features. In the Deep part of the model, following the DIN model, an attention mechanism processes the user’s viewing sequence and learns a user interest vector from the behavior sequence, so that the user’s real-time behavior is captured.
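
The idea of turning a viewing sequence into an interest vector can be sketched as attention pooling. This is a deliberate simplification and not the model used in production: DIN learns its attention weights with a small feed-forward network, whereas the sketch below uses a plain dot product with softmax, and the vectors and function name are hypothetical.

```python
import math


def attention_pool(history, candidate):
    """Weight each viewed-anchor vector by its relevance to the candidate anchor.

    Returns (weights, interest_vector). Dot-product scoring with softmax is an
    assumption made for brevity; DIN uses a learned activation unit instead.
    """
    if not history:
        return [], [0.0] * len(candidate)
    logits = [sum(h * c for h, c in zip(vec, candidate)) for vec in history]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Interest vector: weighted sum of the history vectors, dimension by dimension.
    interest = [
        sum(w * vec[d] for w, vec in zip(weights, history))
        for d in range(len(candidate))
    ]
    return weights, interest


if __name__ == "__main__":
    viewed = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical anchor embeddings
    weights, interest = attention_pool(viewed, [2.0, 0.0])
    print(weights, interest)
```

Because the weights depend on the candidate anchor, the same viewing history yields a different interest vector for each candidate being scored.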

3.5 Use of auxiliary information

The deep model uses basic profile and viewing-sequence features, and the recommendation quality has improved to some extent. On this basis, we are considering using user profiles from other platforms to complete the existing profiles, including profiles from social platforms and existing profiles from the platforms through which users registered.

3.6 Model Effect

In solving the cold start problem for new users, feature mining and model updating are both indispensable. After real-time user feedback was added on top of popular recommendation, the relevant metrics improved significantly; for example, the average viewing time of new users increased by more than 50%, and the deep model brought a further improvement of more than 20% on this basis.

4. Summary

Bringing in new users and new items is one of the keys to keeping a platform vital. The cold start scheme of the recommendation system plays a very important role in this, and cold start has been studied extensively in the industry. When choosing a technology, the scheme should fit the scenario. Relatively simple methods can be chosen at the initial stage to bring the system online as soon as possible, and more complex schemes can then be tried for iterative improvement once the system is relatively stable.