Introduction: How to quickly build recommendation models with Machine Learning PAI

Author: Cheng Mengli, Machine Learning PAI Team

With the popularity of mobile apps, personalized recommendation and advertising have become an integral part of many apps, making a huge difference in improving user experience and increasing revenue. Deep learning has also been applied extensively in search and recommendation, bringing significant improvements across many scenarios. There are already many models for each stage of the recommendation pipeline, most with open-source implementations, but these implementations are scattered across GitHub, each with its own way of handling data and constructing features. Applying them in a new scenario usually requires substantial changes:

  • Input transformation: the input format and feature construction of an open-source implementation are usually inconsistent with the online system. Adapting one algorithm typically takes 1-2 weeks, and bugs are easily introduced by unfamiliarity with the code. Trying five algorithms multiplies that time by five. With limited engineering resources, do you give up on something that might work?
  • Parameter tuning: many open-source implementations only achieve good results on public datasets, and the optimal hyperparameters there may not suit a real scenario, so tuning still requires a lot of work. Sometimes a method fails not because the method is bad, but because its parameters are. Without a systematic approach to tuning, many algorithms become a matter of trial and error; without deep exploration, how can you understand an algorithm well enough to spot seemingly simple improvements, or explain why a similar direction didn't work? Good results are usually piled up through compute and countless attempts.
  • Version mismatch: an open-source implementation may use TensorFlow 1.4 while the online system runs TensorFlow 2.3, where many function signatures have changed. Many open-source implementations have never been verified in real scenarios, so their reliability is questionable; they may be missing dropout or batch normalization, which makes a large difference in results.
  • Deployment cost: after the model is finally tuned, many problems surface online, such as training being too slow, memory usage being too high, inference QPS not keeping up, or a model that works well offline collapsing online. With so many problems, do you still have energy for your next idea? Are you still motivated to explore new directions?

These are the problems that make us feel overwhelmed and keep us up late at night: it takes enormous effort to test one simple idea. As the saying goes, in martial arts only speed is unbeatable, and for algorithm engineers in search and recommendation this is especially true: fast iteration lets you verify more ideas, find more problems, and discover the best features and model structures. Slow down, and your business goals change, the front-end layout changes, the business loses trust in you, and your model never goes online.

At this point, our goal is clear: write less code, or even no code, to validate ideas. To address these problems, we launched EasyRec, a new one-stop recommendation modeling framework dedicated to solving problems in modeling, feature construction, parameter tuning, and deployment, so that you write less code, do less repetitive and meaningless dirty work (EasyRec takes care of it), and step on fewer pitfalls (EasyRec shields you from them). This lets you quickly verify new ideas online and improves the iteration efficiency of recommendation models.

Advantages

EasyRec offers significant advantages over other modeling frameworks in the following areas:

  • Support for multi-platform and multi-data-source training:
      • Supported platforms: MaxCompute (formerly ODPS), DataScience (based on Kubernetes), DLC (Deep Learning Containers), Alink, and local;
      • Supported data sources: OSS, HDFS, Hive, MaxCompute Table, Kafka, and DataHub;
      • Users usually only need to define their model; after it passes local tests, it can be trained on any of the distributed platforms.
  • Support for multiple TensorFlow versions (>=1.12, <=2.4, PAI-TF), so EasyRec connects seamlessly to the user's environment with no code migration or changes;
  • Implementations of mainstream feature engineering, especially explicit cross features, which can significantly improve results;
  • HPO automatic parameter tuning, which significantly reduces the tuning workload and has improved model performance in multiple scenarios;
  • Implementations of mainstream deep models, covering recall, ranking, coarse ranking, re-ranking, multi-objective, multi-interest, and more;
  • Advanced functions such as EarlyStop, BestExport, feature importance, feature selection, and model distillation.
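EasyRec jobs are driven by a single configuration file, so switching platforms or data sources mostly means changing paths rather than code. The fragment below sketches the general shape of such a config in protobuf text format; the paths, values, and exact field set are illustrative and should be checked against the EasyRec documentation.

```protobuf
# Illustrative EasyRec-style config (protobuf text format);
# values and paths are made up for this sketch.
train_input_path: "data/train.csv"
eval_input_path: "data/eval.csv"
model_dir: "experiments/demo_multi_tower"

train_config {
  num_steps: 100000
  optimizer_config {
    adam_optimizer {
      learning_rate {
        constant_learning_rate { learning_rate: 0.001 }
      }
    }
  }
}

feature_config: {
  features {
    input_names: "user_id"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 100000
  }
}

model_config {
  model_class: "MultiTower"
}
```

Pointing `train_input_path` at an OSS, HDFS, or MaxCompute location is, in principle, all that changes when moving the same model between platforms.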

Architecture

The EasyRec modeling framework is based on Estimator data-parallel training and supports multi-machine, multi-GPU training through the ParameterServer architecture. The main modules of EasyRec are input, feature construction, deep model, loss, and metric, each of which can be customized. EasyRec deeply optimizes various problems users may encounter when training with TF, such as workers failing to exit, the evaluator failing to exit when num_epoch is set, and inaccurate AUC calculation. For slow AdamOptimizer training, slow asynchronous training, hash conflicts, negative sampling over large sample spaces, and other problems, EasyRec also makes deep optimizations in combination with PAI-TF (PAI-optimized TensorFlow) and AliGraph.
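In Estimator-based ParameterServer training, each process learns its role from a `TF_CONFIG` environment variable describing the cluster; EasyRec builds on this standard mechanism. The sketch below only shows the shape of that cluster spec (hostnames and ports are illustrative), without importing TensorFlow.

```python
import json
import os

# A minimal sketch of the cluster spec used by Estimator-based
# ParameterServer training. Hostnames and ports are illustrative.
cluster = {
    "cluster": {
        "ps": ["ps0.example.com:2222"],            # parameter servers hold the variables
        "chief": ["chief.example.com:2222"],       # chief worker also saves checkpoints
        "worker": ["worker0.example.com:2222",
                   "worker1.example.com:2222"],    # workers compute gradients in parallel
    },
    "task": {"type": "worker", "index": 0},        # this process's role in the cluster
}
os.environ["TF_CONFIG"] = json.dumps(cluster)

# With TF_CONFIG set, tf.estimator.train_and_evaluate dispatches each
# process to its role automatically.
parsed = json.loads(os.environ["TF_CONFIG"])
print(parsed["task"]["type"], len(parsed["cluster"]["worker"]))  # → worker 2
```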

Models

EasyRec implements the industry's advanced deep learning models, covering the full recommendation pipeline, including recall, coarse ranking, ranking, multi-objective, cold start, and more.

EasyRec also supports user-defined models. To implement a customized model in EasyRec, only three parts need to be defined: model structure, loss, and metric. Data processing and feature engineering directly reuse the capabilities provided by the framework, which significantly saves modeling time and lets users focus on exploring model structure. For common model types such as RankModel and MultiTaskModel, the loss and metric parts can also directly reuse the parent class definitions.
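The customization contract can be sketched in plain Python: a user model overrides only the model structure, while loss and metric hooks can be reused from a parent class. Class and method names below are illustrative stand-ins, not the exact EasyRec API.

```python
# Schematic of the three-part customization contract: predict / loss / metric.
class BaseRecModel:
    """Framework side: wires input and feature handling around user hooks."""

    def run_step(self, features, labels):
        preds = self.build_predict_graph(features)
        loss = self.build_loss_graph(preds, labels)
        metric = self.build_metric_graph(preds, labels)
        return loss, metric

    def build_predict_graph(self, features):
        raise NotImplementedError          # the one part a user must define

    def build_loss_graph(self, preds, labels):
        # reusable default: mean squared error on a toy scale
        return sum((p - l) ** 2 for p, l in zip(preds, labels)) / len(labels)

    def build_metric_graph(self, preds, labels):
        # reusable default: accuracy after rounding predictions
        hits = sum(1 for p, l in zip(preds, labels) if round(p) == l)
        return hits / len(labels)


class MyRankModel(BaseRecModel):
    """User side: only the model structure is defined; loss/metric reused."""

    def build_predict_graph(self, features):
        # toy "model": average the feature values of each sample
        return [sum(f) / len(f) for f in features]


model = MyRankModel()
loss, metric = model.run_step([[0.0, 0.0], [1.0, 1.0]], [0, 1])
print(loss, metric)  # → 0.0 1.0
```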

Automatic parameter tuning and automatic feature engineering

EasyRec integrates the PAI AutoML auto-tuning capability to automatically tune a variety of parameters. Any parameter defined in the EasyRec config is searchable; common ones include hash_bucket_size, embedding_dim, learning_rate, dropout, batch_norm, and feature selection. When in doubt about some parameters, you can start an automatic tuning job to help find the best settings. Parameters found by auto-tuning are usually better than those chosen by intuition, and are sometimes surprisingly good.
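At its core, a tuning job samples candidate settings from a search space and keeps the best by validation score. The sketch below uses a grid search over a space mirroring the parameters listed above; the scoring function is a stand-in for a real training-plus-evaluation run, which a real tuner would perform per trial.

```python
import itertools

# Search space mirroring the commonly tuned parameters named above.
search_space = {
    "hash_bucket_size": [10_000, 100_000, 1_000_000],
    "embedding_dim": [8, 16, 32],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.0, 0.1, 0.3],
}

def evaluate(params):
    # Hypothetical score; a real tuner would train a model and report AUC.
    return -abs(params["embedding_dim"] - 16) - abs(params["dropout"] - 0.1)

best, best_score = None, float("-inf")
for combo in itertools.product(*search_space.values()):
    trial = dict(zip(search_space.keys(), combo))
    score = evaluate(trial)
    if score > best_score:            # keep the best trial seen so far
        best, best_score = trial, score

print(best["embedding_dim"], best["dropout"])  # → 16 0.1
```

Random or Bayesian search replaces the exhaustive loop when the space is large, but the keep-the-best skeleton is the same.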

Feature engineering is usually the key to improving recommendation quality, and high-order feature combinations usually help model performance. But the space of high-order combinations is very large, and blind combination leads to feature explosion, dragging down training and inference speed. EasyRec therefore introduces automatic feature engineering (AutoFeature), which automatically searches for valuable high-order features to further improve the model.

Top-5 search results (figure omitted):

Model deployment

EasyRec models can be deployed to the PAI-EAS environment with one click or served via TF Serving. To improve inference performance, EasyRec uses PAI Blade for placement optimization, op fusion, subgraph deduplication, and other optimizations; these increased QPS by more than 30% and reduced RT by 50%. FP16 support will be introduced in the future to further improve inference performance and reduce memory consumption. To support super-scale embeddings, EasyRec splits the large model with custom ops and stores the embeddings in distributed caches such as Redis, breaking the single-machine memory limit. Embedding lookup from Redis is slower than from local memory, so high-frequency ids are cached locally to reduce Redis accesses and speed up embedding lookup.
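The high-frequency-id cache can be sketched as a small LRU layer in front of the remote store. In the sketch below a plain dict stands in for Redis, and the cache capacity and embedding values are illustrative; the point is that hot ids skip the slow network round trip.

```python
from collections import OrderedDict

class CachedEmbeddingStore:
    """LRU cache in front of a slow remote embedding store (e.g. Redis)."""

    def __init__(self, remote, capacity=2):
        self.remote = remote              # slow distributed store
        self.cache = OrderedDict()        # fast local LRU cache
        self.capacity = capacity
        self.remote_hits = 0              # counts slow lookups, for illustration

    def lookup(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as recently used
            return self.cache[key]
        self.remote_hits += 1             # cache miss -> slow path
        vec = self.remote[key]
        self.cache[key] = vec
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return vec

remote = {i: [0.1 * i] * 4 for i in range(10)}   # fake embedding table
store = CachedEmbeddingStore(remote, capacity=2)
for key in [1, 1, 1, 2, 1, 3, 1]:                # id 1 is "high frequency"
    store.lookup(key)
print(store.remote_hits)  # → 3 (instead of 7 without the cache)
```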

Feature consistency

Feature engineering is a key part of search, recommendation, and advertising pipelines, and is often the cause of inconsistency between online and offline results. To maintain offline/online consistency during fast iteration, the usual approach is to run the same feature-engineering code offline and online. The offline training-data pipeline: first construct the user features (both real-time and offline parts), item features, and context features; then join them with the training samples (including labels); finally, run the feature-engineering JAR to generate the training samples fed into EasyRec. The online pipeline: import the user features (offline part) and item features into distributed storage such as Redis or Hologres; the recommendation engine queries the features by user_id and item_id, calls the same feature-engineering library to process them, and feeds them into the EasyRec model for prediction. Online real-time features are usually generated with streaming platforms such as Blink or Alink, while their offline counterparts are constructed in two ways: offline simulation and online feature logging. Both have pros and cons: offline simulation usually has small inconsistencies with online values due to log loss and other issues, while feature logging takes a long time to accumulate enough samples when new features are added. Our solution is to log the user's behavior sequence online, and then compute the various statistical features offline with the same JAR, such as clicks in the last 1h/2h/.. /24h.
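The "same code offline and online" principle can be shown as one feature function with two call sites: the offline sample builder and the online serving path both call it, which is what keeps the two consistent. Field names and transforms below are illustrative toys.

```python
# One shared feature library, two call sites.
def build_features(user, item, context):
    return {
        "user_age_bucket": min(user["age"] // 10, 6),  # bucketized age
        "item_price_digits": len(str(item["price"])),  # toy price transform
        "hour_of_day": context["hour"],
    }

def offline_sample(user, item, context, label):
    """Offline: join features with the label into a training row."""
    row = build_features(user, item, context)
    row["label"] = label
    return row

def online_features(user, item, context):
    """Online: same function, with features fetched from Redis/Hologres by id."""
    return build_features(user, item, context)

u, i, c = {"age": 34}, {"price": 1999}, {"hour": 21}
off = offline_sample(u, i, c, label=1)
on = online_features(u, i, c)
assert all(off[k] == on[k] for k in on)  # offline/online features match
print(off["user_age_bucket"], off["label"])  # → 3 1
```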

Online feature engineering demands higher computational efficiency, and the amount of computation is larger than offline: an offline sample is usually 1 user paired with M exposed items (plus some randomly sampled negatives for recall models), while an online request is 1 user paired with N candidate items (N >> M). A naive online implementation spreads a request into N samples and computes each separately. It is easy to see that the user features are then computed repeatedly, and optimizing user-feature computation can significantly improve online QPS. We made deep optimizations based on the Feature Generation module used in Taobao's system, including memory allocation, string parsing, elimination of repeated computation, and multi-threaded parallel computation, which significantly improved efficiency while ensuring consistency.
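The repeated-computation elimination can be illustrated with counters: user-side features are identical for all N items scored in one request, so computing them once per request instead of once per item removes N-1 redundant calls. Feature names below are illustrative.

```python
user_feature_calls = 0  # counts how often user features are computed

def user_features(user):
    global user_feature_calls
    user_feature_calls += 1
    return {"u_clicks": user["clicks"]}

def item_features(item):
    return {"i_ctr": item["ctr"]}

def score_request_naive(user, items):
    # naive: user features recomputed for each of the N items
    return [{**user_features(user), **item_features(it)} for it in items]

def score_request_optimized(user, items):
    uf = user_features(user)          # computed once, shared by all items
    return [{**uf, **item_features(it)} for it in items]

user = {"clicks": 42}
items = [{"ctr": 0.1}] * 100          # N = 100 candidate items

user_feature_calls = 0
score_request_naive(user, items)
naive_calls = user_feature_calls

user_feature_calls = 0
score_request_optimized(user, items)
print(naive_calls, user_feature_calls)  # → 100 1
```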

Incremental training and real-time training

Incremental training usually brings a significant improvement, because incremental training sees more samples and trains the embeddings more fully. EasyRec supports restoring the checkpoint from the previous day and continuing training on the new day's data. To quickly adapt to the rapidly changing sample distributions of news, holiday, and promotion scenarios, we also provide support for real-time training. EasyRec constructs real-time samples and features with Blink, calls Feature Generation to process the features, and then reads the real-time sample stream from Kafka or DataHub for training. Stability is critical for real-time training: during training we monitor the positive/negative sample ratio, the feature distribution, and the model's AUC in real time; when the change in sample or feature distribution exceeds a threshold, an alarm is raised and model updates are stopped. When a checkpoint is saved, EasyRec synchronously records the offsets trained so far (there are multiple offsets when multiple workers train together); when the system restarts after a failure, training resumes from the saved offsets.
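The offset bookkeeping can be sketched as a small JSON file saved alongside each checkpoint: it records per-worker stream offsets, so after a crash training resumes from exactly where each worker stopped. File naming, paths, and offset values below are illustrative.

```python
import json
import os
import tempfile

def save_checkpoint(ckpt_dir, step, worker_offsets):
    """Record per-worker stream offsets next to the model checkpoint."""
    meta = {"global_step": step, "offsets": worker_offsets}
    with open(os.path.join(ckpt_dir, "offsets-%d.json" % step), "w") as f:
        json.dump(meta, f)

def restore_latest(ckpt_dir):
    """After a failure: find the newest offsets file and resume from it."""
    files = sorted(f for f in os.listdir(ckpt_dir) if f.startswith("offsets-"))
    with open(os.path.join(ckpt_dir, files[-1])) as f:
        return json.load(f)

ckpt_dir = tempfile.mkdtemp()
# two workers consuming different Kafka/DataHub partitions
save_checkpoint(ckpt_dir, 1000, {"worker_0": 50_000, "worker_1": 49_800})
save_checkpoint(ckpt_dir, 2000, {"worker_0": 99_500, "worker_1": 99_100})

state = restore_latest(ckpt_dir)      # resume point after a restart
print(state["global_step"], state["offsets"]["worker_0"])  # → 2000 99500
```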

Effect validation

EasyRec has been validated in more than 20 user scenarios, including product recommendation, news-feed advertising, social media, live streaming, and video recommendation. Here are some of the improvements customers have achieved with EasyRec:

  • An app's ad push: AUC increased by 1 point, online CTR by 4%, and resource consumption dropped by half;
  • A large live-streaming app: AUC increased by 2% with the EasyRec MultiTower model;
  • A large social media app: AUC increased by 6% with the EasyRec MultiTower model, and the online metric by 50%;
  • A large e-commerce platform: online UV value increased by 11% and UV-CTR by 4% with the EasyRec DSSM model;
  • A short-video app: online watch time increased by 30% with the EasyRec DBMTL model, and a further 10% after adding multi-modal features.

Finally, EasyRec has been open-sourced on GitHub (github.com/alibaba/Eas…


This article is original content from Aliyun and may not be reproduced without permission.