ByteDance Technology — Kunlun White
I. What is on-device intelligence?
AI technology now touches every aspect of the Internet, and its application in the cloud is already extensive and mature. To ride the wave of artificial intelligence, major vendors are also continuously enhancing the AI capabilities of mobile devices, mainly in the following areas:
- SoCs tailored specifically for AI workloads provide better computing power
- Mature lightweight inference engines (e.g., TensorFlow Lite) make inference practical on mobile devices with limited computing power
- Model compression reduces model size, making deployment on mobile devices feasible
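To make the model-compression point concrete, here is a minimal, engine-agnostic sketch (plain Python, not tied to any real inference engine) of 8-bit post-training quantization: float weights are mapped to unsigned integers via a scale and zero point, shrinking storage to roughly a quarter of 32-bit floats at a small accuracy cost.

```python
def quantize(weights, num_bits=8):
    """Map float weights to unsigned num_bits integers (asymmetric quantization)."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0  # avoid div-by-zero for constant weights
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the quantized integers."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.2, 0.0, 0.5, 2.3]
q, scale, zp = quantize(weights)
recovered = dequantize(q, scale, zp)
# each recovered value differs from the original by at most half a scale step
```

Real toolchains (e.g., the TensorFlow Lite converter) apply the same idea per tensor or per channel, alongside pruning and other compression techniques.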
After years of rapid development, deploying AI capabilities on terminal devices has gradually entered the public eye, and the concept of on-device intelligence has emerged to provide a complete framework for applying AI on terminal devices. Compared with the cloud, on-device intelligence has the following advantages:
- Low latency: eliminates the latency of network requests
- Security: better protects users' private data
- Personalization: trains locally on user habits and iterates gradually, delivering a truly personalized experience
- Richer features: more diverse user features can be collected, improving prediction accuracy
- Saving cloud resources: combined with cloud inference, preprocessing on the device reduces the load on cloud computing power
- Richer application scenarios: face recognition, gesture recognition, translation, interest prediction, image search, and other intelligent scenarios are already widely used, and more are emerging
Google, Facebook, Apple, and other giants have taken the lead in applying on-device intelligence. Google has proposed the concept of the on-device Recommendation Android App, which recommends content based on users' interests; Apple's Face ID and Siri Suggestions are also examples of intelligent applications.
In China, Alibaba, Tencent, and other companies have also explored on-device intelligence. Alibaba has landed on-device intelligence in several scenarios, such as list reranking, intelligent refresh, mis-tap prediction, intelligent push, and Pailitao (image-based product search), and has launched the MNN deep learning framework. Tencent has launched its own NCNN framework and has widely applied on-device intelligence in healthcare, translation, games, smart speakers, and other fields.
Figure 2-2 shows a typical on-device intelligence development process. First, algorithm design and model training are carried out in the cloud using the collected data, producing a model. At this point the model is not yet suitable for mobile devices, so it must be converted, through model transformation and compression, into a format supported by the mobile inference engine. Finally, the algorithm and model are dynamically deployed to target devices through cloud configuration. On the terminal device, inference is triggered in the appropriate scenario and at the appropriate moment: input data is assembled into the format the model expects and passed to the inference engine, and once the inference result is obtained and analyzed, the corresponding logic adjustments and feedback are made.
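The on-device half of this process can be sketched in a few lines. The following Python sketch is purely illustrative: `MockEngine` and its linear "model" stand in for a real mobile inference engine and neural network, and all names and thresholds are invented for the example.

```python
class MockEngine:
    """Stand-in for a mobile inference engine (e.g., TensorFlow Lite)."""
    def __init__(self, model):
        self.model = model  # e.g., weights delivered from the cloud

    def infer(self, features):
        # A real engine would run a neural network; here we just score linearly.
        return sum(w * f for w, f in zip(self.model, features))

def run_inference(engine, raw_data, preprocess, postprocess):
    features = preprocess(raw_data)  # assemble input data into the model's format
    result = engine.infer(features)  # run inference on the device
    return postprocess(result)       # adjust app logic based on the result

engine = MockEngine(model=[0.4, 0.6])
decision = run_inference(
    engine,
    raw_data={"clicks": 3, "dwell_seconds": 40},
    preprocess=lambda d: [d["clicks"] / 10, d["dwell_seconds"] / 60],
    postprocess=lambda score: "refresh_feed" if score > 0.5 else "keep",
)
# decision == "refresh_feed" for this input
```

The key point the sketch captures is the fixed contract: the preprocess and postprocess steps must match the input/output formats the trained model defines.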
II. The challenges of on-device intelligence
Deploying intelligence on mobile devices is not easy, and many issues need to be addressed in the current development process.
- Development efficiency
In the typical on-device intelligence development process, algorithm engineers first train and export the model in the cloud, which fixes the input and output formats of the data. Next, client engineers develop the on-device side to fit the current model, including collecting and assembling input data and parsing output data. After development is complete, the work is handed over to test engineers for quality assurance. This requires collaboration and communication among multiple roles, making the overall development pipeline lengthy and inefficient.
- Flexibility
As mentioned above, once a model is defined, its inputs and outputs follow a fixed format. If the strategy for the model's input data needs to change, the client logic must be modified as well. The flexibility of online verification and feature iteration is therefore greatly limited, which quietly lengthens the whole launch cycle. In addition, different intelligent application scenarios (CV, recommendation, etc.) have very different requirements; meeting the needs of different business scenarios is another thorny problem to be solved.
- Complexity of the on-device environment
Building a complete runtime environment for on-device intelligence is not easy. Beyond model training and delivery in the cloud, the client must handle data collection, storage, and processing; hardware resource evaluation and scheduling; inference engine selection; operating system compatibility; and inference task management and scheduling. These problems raise the barrier to applying on-device intelligence, and shielding developers from this environmental complexity is an important current challenge.
III. Pitaya: an integrated on-device intelligence solution
To solve the above problems, Pitaya worked closely with the MLX team to build a full-link, dynamically deployable device–cloud solution. MLX is a cloud-based model training and development platform that provides model training, transformation, debugging, release, A/B testing, and other capabilities. The Pitaya SDK provides on-device capabilities such as feature engineering, an inference engine, algorithm package management, task management, and monitoring. The deep integration of the two covers every link in the on-device intelligence process, greatly lowering the barrier to applying it.
1. MLX platform
Before introducing Pitaya, let's look at the MLX platform. Model training runs into many environmental differences: the storage structures and data formats of different data sources, different machine learning frameworks, and the setup of different operating system environments and infrastructure. The MLX platform was born to solve this problem. It provides a service platform for model training, transformation, and final productization in the cloud, and its broad integration with various computing frameworks and service platforms saves teams from complex environment setup.
The architecture of MLX is shown in Figure 3-1-1:
- Base Infra: provides extensive infrastructure capabilities to support the upper layers.
- ML: provides support for various machine learning frameworks, including Scheduler, Model Training, and Model Serving, which are the core capabilities for model training and transformation.
- Core: the main functions developers interact with directly. For algorithm development, MLX provides online IDE environments such as Notebook and Web Studio, and its DAG Designer supports drag-and-drop workflow control, which is easier to use. For model productization, it covers training, task management, export, release, and other links.
- Scene: with the support of the above capabilities, tasks for different intelligent scenarios, such as NLP, CV, and GBDT, can be built on the MLX platform, and Pitaya is a new member among them.
2. Pitaya
2.1 Algorithm package
Pitaya deeply integrates its workflow with MLX platform features. In the traditional on-device intelligence development process, the cloud is responsible only for producing the model, and only the model can be dynamically delivered. In the Pitaya workflow, an "algorithm package" is a complete collection of resources: it contains the model used during inference, as well as the algorithm logic and the information about the libraries it depends on. If the client has synchronized the contents of an algorithm package and has the capabilities its execution requires, it can run the package and complete the whole inference process.
2.2 Development and debugging
During development of an algorithm package, a test algorithm package can be generated on the fly. By scanning a QR code, a data channel is established between the host App and the Pitaya-MLX platform, and the test algorithm package is pushed to the client. When the algorithm package is run and debugged on a real device, the log output is displayed in the MLX IDE environment, providing a complete cloud-side debugging experience. Thanks to the deep integration of Pitaya and MLX, algorithm engineers no longer depend on client engineers for any development and can independently run and debug algorithms on the device, which greatly improves algorithm development efficiency.
2.3 Dynamic Deployment
Once the algorithm engineer finishes debugging, the current project can be packaged into an algorithm package. Each business scenario has a unique Business identifier within the current App, and the algorithm package is bound to that Business. For a given Business, the Pitaya-MLX release platform can configure and manage algorithm package delivery along multiple dimensions, such as App, App version, OS version, and channel. The release platform is also connected to the A/B testing platform, enabling seamless online experiment comparison and greatly speeding up the verification of online business effects.
Beyond the general configuration capabilities above, the Pitaya-MLX release platform also supports performance rating of device models and differential distribution of algorithm packages based on the rating results. In this deployment mode, the performance of online devices can be segmented at a finer granularity: devices with high performance, or with support for certain AI acceleration, can receive a more accurate model, while low-end devices can receive a relatively simplified model for a better user experience.
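Score-based differential distribution can be illustrated with a short sketch. Everything here is hypothetical: the 0–100 performance score, the thresholds, and the variant names are invented for the example and are not Pitaya's actual policy.

```python
def pick_model_variant(score, supports_npu):
    """Choose which algorithm-package variant to deliver to a device.

    `score` is a hypothetical 0-100 device performance rating; the thresholds
    and variant names below are illustrative only.
    """
    if supports_npu and score >= 80:
        return "full_model"      # most accurate model for high-end devices
    if score >= 50:
        return "standard_model"  # default variant for mid-range devices
    return "lite_model"          # simplified model for low-end devices

# Example: a high-end device with AI acceleration gets the full model,
# while the same score without an NPU falls back to the standard variant.
variant = pick_model_variant(90, supports_npu=True)
```

The real platform makes this decision server-side at delivery time, so the client simply receives whichever package matches its tier.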
2.4 Feature engineering
One of Pitaya's core capabilities is feature engineering. On-device inference generally needs to turn raw data into model input features and then obtain results from the model. If each business had to collect and store raw data itself, the workload would be enormous. Feature engineering helps the business side collect user feature data on the device in a non-invasive way for subsequent inference and prediction. Pitaya's feature engineering is integrated with the Applog SDK (an event-tracking SDK), so the Applog events needed during inference can be declared in the algorithm package's configuration. When an algorithm package takes effect locally, Pitaya collects data according to that configuration. Pitaya also provides a customization interface for associating user action contexts such as clicks, exposures, and swipes. While the algorithm package runs, it can obtain the user's raw data through feature engineering and process it into the input the model requires. This data collection method is more dynamic and flexible than the traditional pipeline, freeing businesses from heavy data processing work.
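As a rough illustration of this collect-then-process step, the sketch below turns Applog-style events into a small feature dictionary. The event schema (`name`, `ts`) and the chosen features are invented for the example; a real algorithm package defines its own schema in its configuration.

```python
import time

def build_features(events, now=None):
    """Turn raw Applog-style events into model input features.

    The event shape and feature choices here are illustrative assumptions,
    not Pitaya's actual data format.
    """
    now = now if now is not None else time.time()
    clicks = [e for e in events if e["name"] == "click"]
    shows = [e for e in events if e["name"] == "show"]
    # Recency of the most recent click, or -1.0 if the user never clicked.
    last_click_age = min((now - e["ts"] for e in clicks), default=-1.0)
    ctr = len(clicks) / len(shows) if shows else 0.0
    return {"click_count": len(clicks), "ctr": ctr, "last_click_age": last_click_age}

events = [
    {"name": "show", "ts": 100.0},
    {"name": "show", "ts": 110.0},
    {"name": "click", "ts": 111.0},
]
features = build_features(events, now=120.0)
# features == {"click_count": 1, "ctr": 0.5, "last_click_age": 9.0}
```

Because the event list is declared in the algorithm package rather than hard-coded in the client, changing which events feed the model requires no client release.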
In addition, feature engineering supports uploading specified data to the MLX platform for model training in the cloud, forming a complete closed data loop, as shown in Figure 3-2-1.
2.5 Running the algorithm package
Once an algorithm package has been synchronized to the terminal device, it can be triggered in two ways.
- Applog event triggering: an algorithm package can be configured with Applog events; when a corresponding Applog event is detected, the algorithm package is triggered indirectly. This mode gives the business side an opportunity to trigger the algorithm package through Applog events and perform operations such as feature data extraction and preprocessing.
- Active triggering: the business side actively invokes Pitaya's interface (the Objective-C declaration is shown below) in the appropriate scenario and at the appropriate moment, and can customize the input data and task configuration, receiving the inference result in a callback.
```objectivec
- (void)runBusiness:(NSString *)business
              input:(PTYInput *_Nullable)input
             config:(PTYTaskConfig *_Nullable)config
       taskCallback:(PTYTaskCallback _Nullable)taskCallback;
```
Each time an algorithm package is triggered, a Task is created, which Pitaya's internal task management module takes over and processes uniformly. The algorithm package executes in Pitaya's runtime container, which provides an independent runtime environment for each task and offers interfaces for feature engineering data access and model inference. Pitaya abstracts the inference process and its interfaces so that different inference engines (ByteNN, ByteDT, TFLite) can be integrated, meeting the needs of different business parties as far as possible and reducing the cost of migrating existing projects to the Pitaya framework.
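The engine abstraction described above can be sketched as follows. The backend names (ByteNN, TFLite) come from the article, but the interface, the dispatch table, and the placeholder scoring logic are assumptions made for illustration.

```python
from abc import ABC, abstractmethod

class InferenceEngine(ABC):
    """Abstract inference-engine interface; the API shape is hypothetical."""
    @abstractmethod
    def run(self, inputs):
        ...

class ByteNNEngine(InferenceEngine):
    def run(self, inputs):
        # Placeholder: a real backend would execute a compiled model.
        return {"backend": "ByteNN", "output": sum(inputs)}

class TFLiteEngine(InferenceEngine):
    def run(self, inputs):
        # Placeholder: a real backend would invoke the TFLite interpreter.
        return {"backend": "TFLite", "output": sum(inputs)}

ENGINES = {"ByteNN": ByteNNEngine, "TFLite": TFLiteEngine}

def run_task(engine_name, inputs):
    """Task container resolves the configured backend and runs inference.

    Each task gets a fresh engine instance, mirroring the independent
    runtime environment described in the text.
    """
    engine = ENGINES[engine_name]()
    return engine.run(inputs)

result = run_task("ByteNN", [1.0, 2.0])
```

Because callers depend only on the abstract interface, swapping the configured backend changes no business code, which is what keeps migration costs low.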
2.6 Task Monitoring
In order to achieve the one-click integration experience, Pitaya has built a comprehensive and meticulous monitoring system for tasks. The monitoring content covers the following aspects:
- Indicator monitoring: task PV/UV, success/failure counts, algorithm package download success rate and coverage
- Performance monitoring: memory usage, time spent in each phase of the pipeline, initialization time
- Exception monitoring: task stalls, failure causes, network request failures
The Pitaya SDK categorizes the above indicators. Relying on the data presentation capabilities of the Slardar platform, each integrated business side can copy a template with one click and set up a complete data dashboard in the host App, making it truly out of the box.
IV. Summary and outlook
Pitaya is an integrated on-device intelligence solution designed specifically for mobile. Compared with traditional solutions, Pitaya has the following advantages:
- Lowers the cost of adopting on-device intelligence, letting businesses integrate quickly and reap the benefits
- Complete dynamic deployment capabilities, supporting rapid model iteration and effect verification
- Improves multi-party collaboration efficiency and lets algorithm engineers participate deeply in client scenarios
- Highly reusable algorithms and models, so verified solutions can be promoted quickly
At present, ByteDance has begun practicing and exploring on-device intelligence based on Pitaya in product lines such as Douyin, Toutiao, and Watermelon Video. Throughout this process we have kept communicating with business sides to improve the product and user experience, and we plan Pitaya's future development in the following directions:
- Feature engineering: strengthen feature engineering capabilities, making full use of on-device and device-specific information combined with cloud data to provide richer and more accurate data.
- On-device model training: the biggest application scenario of on-device intelligence is continuous learning from a user's own behavior, so as to better fit the user's habits. Achieving this requires solving the problems of local training data and local model training, and establishing a management mechanism for model accuracy evaluation and rollback.
- General AI capability building: for general application scenarios (network state prediction, etc.), Pitaya can build the relevant capabilities once and promote them quickly to business sides.
About the ByteDance mobile platform team
Client Infrastructure, ByteDance's mobile platform team, is an industry leader in big front-end infrastructure technology. It is responsible for building big front-end infrastructure in ByteDance's China region, improving the performance, stability, and engineering efficiency of the company's entire product line. Supported products include, but are not limited to, Douyin, Toutiao, Watermelon Video, and Huoshan Video, and the team conducts in-depth research on mobile, Web, Desktop, and other terminals.
We are now hiring client, front-end, server, and test development engineers worldwide! Let's change the world with technology. If you are interested, please contact [email protected] with the subject line "Resume – Name – Job objective – Phone number".