Qiji Guide (360 technical column)

Recommendation, as a technology for solving information overload and tapping users' potential needs, has become a standard part of Internet products. This article mainly outlines the aspects involved in 360 Mobile Assistant recommendation: understanding the business scenarios, the recommendation system architecture, and, within that architecture, the core recommendation process, data warehouse construction, and online analysis and monitoring.

This article is reprinted from Qizhuo.

Introduction

Recommendation, as a technology for solving information overload and tapping users' potential needs, has become a standard part of Internet products. In industry, collaborative filtering came first, popularized by Amazon, including the common item-based and user-based variants. Matrix factorization followed: the user behavior matrix is decomposed into a user matrix and an item matrix, and the top-K most relevant items are retrieved using the user's vector. As the CTR-estimation pipeline in advertising systems matured, most recommendation and search problems were also reformulated as CTR estimation. Common models in this period include LR, FM, GBDT and various fusion attempts (such as LR+GBDT [1]); the difficulty lay in mining features, which required heavy feature engineering. In recent years deep learning, with its powerful expressive capacity and flexible network structures, has made breakthroughs in speech and image and also shines in recommendation: network structures such as Wide&Deep [2], DeepFM [3] and DIN [4] have achieved good results on CTR prediction.
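A minimal sketch of the matrix-factorization retrieval step described above, assuming each user and item already has a latent vector from the decomposition (the factor values here are toys, not learned parameters):

```python
# Hypothetical sketch: after the behavior matrix is factorized, the top-K
# items for a user are those with the highest dot product against the
# user's latent vector.
def top_k_items(user_vec, item_vecs, k=2):
    """Score every item by dot product with the user vector; return the K best IDs."""
    scores = {item_id: sum(u * v for u, v in zip(user_vec, vec))
              for item_id, vec in item_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

user = [0.9, 0.1]  # latent factors (would come from e.g. ALS or SVD in practice)
items = {"news_app": [1.0, 0.0], "game_app": [0.0, 1.0], "video_app": [0.7, 0.3]}
print(top_k_items(user, items))  # ['news_app', 'video_app']
```

In a production system the sorted scan over all items would be replaced by an approximate nearest-neighbour index, but the scoring rule is the same.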

In the 360 Mobile Assistant product, App recommendations on the home, game and software pages, together with related-App recommendations, account for a high proportion of App downloads. To improve user stickiness, 360 Mobile Assistant has also added news feeds and game videos to the product; how to better mix Apps and content so as to improve both download conversion and content clicks is another challenge for recommendation. This article mainly outlines the aspects involved in 360 Mobile Assistant recommendation: understanding the business scenarios, the recommendation system architecture, and, within that architecture, the core recommendation process, data warehouse construction, and online analysis and monitoring.

Business scenarios

Since this is a relatively low-frequency app, the product is designed to diversify its usage scenarios as much as possible, satisfying not only users' need to find applications but also their needs after downloading.

Business characteristics

The application market has the following business scenarios:

  • The total number of apps is large (millions), but high-quality apps are relatively few, and different apps have different life cycles

  • Users use the product at a relatively low frequency, so the basic user profile is coarse-grained

  • Software and games are downloaded at different frequencies, and the factors that drive users' download decisions differ

  • There is a big difference in download habits between new and old phones

  • There are many recommended scenarios and different optimization objectives

Because of these characteristics, an app store recommendation system has its own distinct focus.

Since every part of the system has to be developed and maintained by the team itself, we use open-source projects wherever possible and make parts of the code modular and configurable. On one hand this eases management and builds up accumulated knowledge; on the other it reduces maintenance trouble and hidden risks.

System architecture

The overall framework of 360 mobile Assistant recommendation is as follows:

Data warehouse layer

Because the algorithm technology department is responsible not only for all algorithm-related work but also for data construction, data analysis and other multi-dimensional requirements, we only realized the importance of data construction after experiencing various pains. The department's initial data architecture used a single-layer mode: interface layer (or parsing layer) -> application. This mode causes no big problems when there is a single data source type and few interfaces. Its only advantage is that data support for new business arrives relatively fast; its disadvantages are many:

  • Repeated development

  • Inconsistent metric definitions

  • High transformation and maintenance costs, and so on

As the business grows, data source types multiply (SDK reporting, HTTP requests, internal system databases, etc.) and business requirements keep changing; in single-layer mode the problems of complexity, duplication, timeliness and diversity grow without bound. At this point data construction should be centralized and platformized (a big data center), with the data warehouse as the core of that center. The core objectives are:

  • The interface layer shields the diversity of data sources;

  • The application layer ensures data quality (integrity, timeliness, accuracy and uniqueness);

  • The layering concept reduces the migration and transformation cost and maintenance cost caused by the change of the source system, improves the reuse rate, avoids repeated development and responds to the business demand in time;

  • Unified hierarchical model and naming conventions provide a global view of data, making the knowledge system easier to query and understand.

Computing layer

The computing layer as a whole can be divided into two parts: offline computing and real-time computing. The platforms used are mainly Spark, Hadoop MR, and HBox provided by the 360 Systems Department (which has integrated TensorFlow, MXNet, Caffe, Theano, PyTorch, Keras, XGBoost and other commonly used frameworks, and has good scalability and compatibility).

The output of the offline computing layer mainly includes:

  • User portrait

  • Training model

  • The related indexes

  • Personalized Push list

  • Ranking lists

  • Reports, analyses, etc.

The real-time computing output mainly includes:

  • Real-time behavioral feedback from users

  • Short-term characteristics of item

  • Server & business real-time monitoring indicators, etc

Recommendation system layer

This layer refers mainly to the engine that provides online services. The working mode of the entire engine is defined in a configuration file; according to it, the engine reads the corresponding recall data, selects the ranking model and policy rules, reads operational intervention data, and finally returns a complete recommendation list to the user.

Take APP recommendation as an example

Data collection: user profile data and APP feature data are logged together with the recommendation logs. The logs are collected in real time through Scribe, users' feedback data is joined on uniqID, and the result is stored as raw training data. The data is then cleaned, formatted, saved in CSV format and stored in the DM layer of the data warehouse.
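An illustrative sketch of this collection step, joining exposure logs with click feedback on the shared request ID ("uniqID") to produce labeled CSV rows; the field names are assumptions, not the production schema:

```python
import csv
import io

# Join exposure logs with click feedback on uniqID: rows the user clicked
# become positives (label=1), the rest negatives (label=0).
def build_training_rows(exposures, clicks):
    clicked = {c["uniqID"] for c in clicks}
    return [dict(row, label=1 if row["uniqID"] in clicked else 0)
            for row in exposures]

exposures = [{"uniqID": "r1", "app": "game_a"}, {"uniqID": "r2", "app": "tool_b"}]
clicks = [{"uniqID": "r1"}]
rows = build_training_rows(exposures, clicks)

# Serialize as CSV, as in the pipeline described above.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["uniqID", "app", "label"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```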

Quantitative indicators: the indicators for APP recommendation are easy to define, and the effect can be measured by CTR/CVR. In the video (content) recommendation scenario, however, the quantitative indicators are more complex: in an app store the ultimate purpose of content recommendation is still to increase users' desire to download, yet experiments show that directly optimizing for downloads yields unsatisfactory recommendations. So for video (content) recommendation we are trying multi-task learning (MTL) ideas from transfer learning: the Wide&Deep model is extended to support multi-objective training, implemented with TensorFlow.

Data cleaning and sampling: user exposure logs are very large while user click data is relatively small, so data sampling is a necessary step. First, filter out data that was never truly exposed to the user; otherwise it strongly distorts the model and makes training unstable. Given the particularity of the business scenario, some users are genuinely exposed but their behavior is only transient and they generate no feedback data at all; this data is also filtered out. Finally, we try up-sampling and down-sampling according to the optimization objective.
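The cleaning steps above can be sketched as below; the keep probability and field names are illustrative assumptions, not production values:

```python
import random

# Drop rows that were never truly exposed, drop users who produced no
# feedback at all, then down-sample the remaining negatives.
def clean_and_sample(rows, neg_keep_prob=0.25, seed=7):
    rng = random.Random(seed)
    exposed = [r for r in rows if r["truly_exposed"]]
    users_with_feedback = {r["user"] for r in exposed if r["label"] == 1}
    candidates = [r for r in exposed if r["user"] in users_with_feedback]
    return [r for r in candidates
            if r["label"] == 1 or rng.random() < neg_keep_prob]

data = [{"user": "u1", "truly_exposed": True,  "label": 1},
        {"user": "u1", "truly_exposed": True,  "label": 0},
        {"user": "u2", "truly_exposed": False, "label": 0}]
print(clean_and_sample(data, neg_keep_prob=1.0))  # only u1's exposed rows survive
```

Up-sampling positives would be the mirror image: duplicate (or reweight) the `label == 1` rows instead of thinning the negatives.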

Features: Based on application store scenarios, we abstract the following features:

  • APP features: Basic features: ID, APP name, introduction, classification information, package size, comment information, rating information, update time, etc. Statistical features: downloads, CTR, CVR, and revenue by time period

  • User characteristics: demographic attributes: gender, age, region, etc. Equipment information: model, brand, resolution, operating system, etc. Behavioral attributes: category preference, time-sharing activity, download, search, browse, other data source portraits, etc

  • Context features: time, region, application scenario, exposure position, etc.; recent and long-term exposure data

  • Cross features: browse & ID, gender & age & ID, etc.; gender & age & ID CTR, interest & ID CTR, etc.
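Crosses like "gender & age & ID" from the list above are typically materialized by concatenating the component values and hashing them into a fixed-size bucket space. A minimal sketch, where the bucket count is an illustrative choice and md5 is used so buckets stay stable across processes:

```python
import hashlib

# Deterministically bucket a feature cross such as (gender, age_band, app_id).
def cross_feature(values, num_buckets=1_000_000):
    key = "_x_".join(str(v) for v in values)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

print(cross_feature(("male", "18-24", "app_123")))  # a stable bucket id
```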

Feature analysis: after all, features decide the model's ceiling. Deep learning has achieved excellent end-to-end results on speech and images, but end-to-end learning is much harder in recommendation: the data is noisier, and user logs lack the clean structure of images. Feature analysis, on one hand, optimizes the data fed into the model; on the other, it is very important for finding hidden bugs in the system. Some bugs are hard to spot in the system itself, but feature analysis can surface them in time and prevent further deterioration. Feature analysis is carried out on the joined logs. First, feature quality is judged by overall coverage. Numerical features are then analyzed along the dimensions of mean, variance, correlation coefficient and chi-square; categorical features along single-feature CTR, information gain and AUC.
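Two of the categorical-feature checks mentioned above can be sketched as follows (the row format and field names are illustrative):

```python
from collections import defaultdict

# Coverage: the share of rows where the feature is present at all.
def coverage(rows, feature):
    present = sum(1 for r in rows if r.get(feature) is not None)
    return present / len(rows)

# Single-feature CTR: click-through rate per feature value.
def single_feature_ctr(rows, feature):
    shows, clicks = defaultdict(int), defaultdict(int)
    for r in rows:
        v = r.get(feature)
        if v is None:
            continue
        shows[v] += 1
        clicks[v] += r["label"]
    return {v: clicks[v] / shows[v] for v in shows}

rows = [
    {"category": "game", "label": 1},
    {"category": "game", "label": 0},
    {"category": "tool", "label": 0},
    {"category": None,   "label": 0},
]
print(coverage(rows, "category"))            # 0.75
print(single_feature_ctr(rows, "category"))  # {'game': 0.5, 'tool': 0.0}
```

A sudden drop in a feature's coverage or an implausible per-value CTR is exactly the kind of signal that exposes a hidden logging or join bug.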

Sample analysis of continuous features

Sample analysis of categorical features

Recall deserves its own article, but it is briefly introduced here for completeness. Content-based: compute content similarity from the APP's name, description, author, tags and other information. However, because APP descriptions are usually short and unfocused, content-based similarity recall does not perform well.

Item-based collaborative filtering: users' behavior is strongly shaped by what is displayed, and long-tail APPs have very little behavior data. Recall quality is good for relatively high-frequency APPs, but relying entirely on collaborative filtering leaves long-tail APPs incompletely recalled or not recalled at all.

Embedding-based: we first built embeddings from users' installation data using the word2vec idea, but the apps a user installs are not as strongly correlated as words in text, so it did not work as well as in NLP. Later, following the idea of the YouTube recommendation system [5], we obtain each APP's embedding vector indirectly from the ranking model's predictions and then compute pairwise APP similarities.
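Once each APP has an embedding vector, related-app recall reduces to nearest neighbours under cosine similarity. A self-contained sketch with toy vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings; in practice these come from word2vec or the ranking model.
embeddings = {
    "bike_share_a": [0.9, 0.1, 0.0],
    "bike_share_b": [0.8, 0.2, 0.1],
    "photo_editor": [0.0, 0.1, 0.9],
}

def most_similar(app, k=1):
    others = [(o, cosine(embeddings[app], embeddings[o]))
              for o in embeddings if o != app]
    return sorted(others, key=lambda t: -t[1])[:k]

print(most_similar("bike_share_a"))  # the other bike-share app ranks first
```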

In the figure below, the collaborative-filtering results on the left are almost all shared-bike apps, while the embedding-based results on the right surface some other commonalities; no single model looks good on its own. Multiple models therefore have to be blended online; the blending currently relies on human experience and is tuned by inspecting bad cases.

Population-based recall: mainly for new users or users with few features. Recall is done by attributes such as region, gender and age. If only the group's download and install counts are considered, the results lack differentiation, so differences between groups must be taken into account. TGI is a good measure of such differences, but ranking purely by TGI goes out of control, so the TGI formula has to be modified and combined with other factors to produce the final result.
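A sketch of the TGI idea. Classic TGI is 100 × (the app's install share inside the target group) / (its share in the whole population), which explodes for apps with tiny in-group counts. The additive smoothing below is one illustrative "deformation" that shrinks low-volume apps back toward the neutral score of 100; the smoothing constant is an assumption, not the production formula:

```python
# Smoothed TGI: shrink the group share toward the overall share so that
# apps with little in-group data score close to the neutral 100.
def smoothed_tgi(group_installs, group_total, all_installs, all_total, smooth=50):
    overall_share = all_installs / all_total
    group_share = (group_installs + smooth * overall_share) / (group_total + smooth)
    return 100 * group_share / overall_share

# A popular app with a strong in-group preference scores far above 100...
print(smoothed_tgi(500, 1000, 100_000, 1_000_000))
# ...while an app with almost no in-group data stays near the neutral 100.
print(smoothed_tgi(0, 10, 100_000, 1_000_000))
```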

Recall based on user votes: these results mainly serve ranking-list scenarios, providing users with relatively authoritative lists, building trust, and ultimately lifting download rates. The lists include an overall chart, a trending chart, a newcomer chart, per-category charts and so on. This data is unified with data from the company's other businesses, and algorithms such as IMDB's weighted rating are used in the score calculation.
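The IMDB-style weighted rating mentioned above shrinks an item's average vote R toward the global mean C when its vote count v is small relative to a threshold m. The constants below are illustrative:

```python
# Weighted rating: WR = v/(v+m) * R + m/(v+m) * C
#   R = item's mean vote, v = its vote count,
#   C = global mean vote, m = minimum-votes threshold.
def weighted_rating(R, v, C, m):
    return (v / (v + m)) * R + (m / (v + m)) * C

C, m = 3.5, 100
print(weighted_rating(4.8, 10, C, m))      # few votes -> pulled toward C
print(weighted_rating(4.8, 10_000, C, m))  # many votes -> stays near R
```

This is exactly why a new app with three five-star ratings does not leap to the top of the chart.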

Ranking model: we packaged the code as a general project. The configuration file selects the feature combinations and processing methods, the model, the optimizer, the model export format, and so on; for a custom model, only model_fn needs to be redefined. Feature discretization, embedding, crossing and other operations are integrated into the model, so only raw features are needed online. The online engine and offline training process features with the same configuration file, which avoids online/offline feature skew caused by human error. With the code structured this way, other scenarios can go from offline training to online verification much faster.

Example: in the figure below, the first panel supports transformation processing of feature columns, the second shows the model-training parameters, and the third the complete feature-column definitions.

At the very beginning of the project, ranking rules had a high weight. We borrowed the EdgeRank idea proposed by Facebook: different user behaviors are defined as different edges, each Edge carries a different weight (e.g. downloading outweighs browsing), relevance is used as an item's score under each Edge, and the scores are multiplied to obtain the final ranking. The drawbacks are that the per-Edge weights are hard to set, frequent A/B tests are required, and the handling is not smooth enough.
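A sketch of the multiplicative EdgeRank-style rule described above; the edge weights are illustrative, not the values that were used online:

```python
# Each behaviour type is an "edge" with a hand-tuned weight; an item's
# relevance under an edge is its score, and per-edge scores are multiplied
# into one ranking value.
EDGE_WEIGHTS = {"download": 3.0, "browse": 1.0}

def edge_rank(edge_scores):
    """edge_scores maps edge name -> the item's relevance under that edge."""
    value = 1.0
    for edge, score in edge_scores.items():
        value *= EDGE_WEIGHTS[edge] * score
    return value

print(edge_rank({"download": 0.8, "browse": 0.5}))  # download counts 3x browse
```

The non-smoothness complaint is visible here: changing one edge weight rescales every item's score multiplicatively, so each tweak needs a fresh A/B test.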

After abstracting the features, we quickly tried LR. LR is simple and highly interpretable, which helped us a great deal in understanding the online business; but LR is demanding about feature quality and requires extensive feature mining and feature-combination work, so from a cost-effectiveness standpoint we did not keep it as the main model for long.

Because LR places such high demands on feature combination, we began trying GBDT and GBDT+LR to reduce the burden of manual feature crossing. The tree model partitions continuous features well, but it handles large-scale sparse ID features poorly (tree depth is hard to control, among other issues), while FM has obvious advantages for cross features. Considering implementation cost, we later focused on deep learning, implementing DNN, FM and other models with neural networks.
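FM's advantage on cross features comes from its pairwise-interaction term: the sum over all feature pairs (i, j) of ⟨v_i, v_j⟩·x_i·x_j, which can be computed in O(n·k) via the identity 0.5·Σ_f[(Σ_i v_if·x_i)² − Σ_i(v_if·x_i)²]. A sketch with toy latent vectors:

```python
# FM second-order interaction term in O(n*k); V holds one k-dimensional
# latent vector per feature, x the feature values.
def fm_interaction(x, V):
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total

x = [1.0, 1.0, 0.0]                         # two active features
V = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]   # one k=2 latent vector per feature
print(fm_interaction(x, V))  # equals the naive pairwise sum <v_0, v_1> * x_0 * x_1
```

Because every feature pair interacts through shared latent vectors, FM can score crosses it never saw together in training, which manual LR crosses cannot.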

At present, ranking is mainly based on deep learning, and the online ranking model is updated along the lines of the Wide&Deep model from the Google Play paper [2]. Wide&Deep balances memorization and generalization and can learn both low-order and high-order features; it has now been fully launched in many scenarios. However, the wide part still requires manually engineered cross features, so we are currently using DeepFM to reduce the manual crossing and mine hidden cross features, in order to improve online performance. We chose TensorFlow as the deep learning framework: the training code is based on the high-level Estimator API to reduce the complexity of working with the low-level APIs, and the flexible feature processing of feature_column improves the efficiency of our feature experiments.

Strategy layer

For interest exploration and new-item exploration, UCB and Thompson sampling are mainly used online. For new games, exposure weight is boosted as much as possible; meanwhile, if high-quality resources need to be promoted or brand exposure is mandated, positions are reserved for operational configuration.
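Per-item sketches of the two exploration strategies named above; the counts and traffic total are illustrative:

```python
import math
import random

def ucb_score(clicks, shows, total_shows):
    """UCB1: empirical CTR plus a confidence bonus that shrinks as impressions grow."""
    if shows == 0:
        return float("inf")  # an unseen item always gets tried first
    return clicks / shows + math.sqrt(2 * math.log(total_shows) / shows)

def thompson_score(clicks, shows, rng=random):
    """Thompson sampling: draw a CTR from the Beta(clicks+1, misses+1) posterior."""
    return rng.betavariate(clicks + 1, shows - clicks + 1)

total = 1000
print(ucb_score(5, 100, total))  # empirical CTR 0.05 plus exploration bonus
print(ucb_score(0, 0, total))    # inf: a brand-new game is shown first
print(thompson_score(5, 100))    # a random draw from the posterior, in [0, 1]
```

Both give new games the inflated scores the text calls for, and the bonus (or posterior width) automatically decays as real exposure data accumulates.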

Online

The engine sorting part is currently divided into two parts. The deep learning sorting part is completely based on TF Serving. The advantage is that the secondary development is reduced and the model effects can be easily and quickly verified, but there is an extra network overhead due to the large amount of feature data that needs to be sent to TF Serving per request. The traditional machine learning sorting part is embedded directly into the engine as a unit within the engine, saving network overhead compared to TF Serving. Also, when TF Serving suffers a problem, the engine automatically degrades the sorting to traditional machine learning sorting.


Effect comparison

CTR comparison data after one of the service scenarios went online

The business layer

From the perspective of users, the business layer refers to different recommendation scenarios, and the product design and user requirements in each scenario are different.

  • For example, although the first-screen position on the home page is personalized recommendation, its special location means the promotable content must be strictly screened, and the algorithm has to rank within a limited recall set. Such scenes place relatively high demands on ranking.

  • For example, the similar recommendation of APP focuses more on the correlation between two items, so the requirement of recall similarity is relatively high.

Coordinated scheduling layer

Data must flow between the layers in a complete closed loop for the whole system to develop healthily. The coordination layer mainly covers real-time collection of recommendation logs; promptly updating the engine when item features change; quickly and completely propagating user profile updates to the profile system; quickly and completely pushing model updates to the ranking engine; and so on.

Analysis & monitoring layer

Analysis mainly serves operations and product staff. The team provides a self-service query platform and reporting so that non-technical staff can quickly query data and analyze results. The monitoring layer comprises core business metrics and system metrics; for the latter, a visual monitoring platform can be built quickly with ELK, making it easy to spot problems in the system and fix them in time.

Summary and Outlook

We still have a long way to go in machine learning. On the business side, we need to keep deepening our understanding of the business scenarios and keep optimizing each one. On the system side, the feature server, deduplication server and other components need further splitting and optimization. On features, we will continue in-depth work on feature mining and utilization; we have not yet used image features and will make some image-specific attempts in the future. On models, we will continue to explore network structures, try new model variants, and fit them to the characteristics of each scene. The successful experience of academia and industry is very valuable and supplies new ideas and methods, but because each scenario's business problems and accumulated data differ, it still has to be adapted to the scenario to actually improve business goals.

References

[1] He X, Pan J, Jin O, et al. Practical Lessons from Predicting Clicks on Ads at Facebook. Proceedings of the 8th International Workshop on Data Mining for Online Advertising (ADKDD). ACM, 2014: 1-9.
[2] Cheng H T, Koc L, Harmsen J, et al. Wide & Deep Learning for Recommender Systems. Proceedings of the 1st Workshop on Deep Learning for Recommender Systems (DLRS). ACM, 2016: 7-10.
[3] Guo H, Tang R, Ye Y, et al. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI). 2017: 1725-1731.
[4] Zhou G, Song C, et al. Deep Interest Network for Click-Through Rate Prediction. arXiv preprint arXiv:1706.06978, 2017.
[5] Covington P, Adams J, Sargin E. Deep Neural Networks for YouTube Recommendations. Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016: 191-198.
