Author: Byte Mobile Technology — White Kunlun
I. What is on-device intelligence?
AI technology now permeates the Internet and is widely and maturely applied in the cloud. To ride the AI wave, major manufacturers are also strengthening the AI capabilities of mobile devices, mainly along the following lines:
- SoCs tailored for AI workloads provide better on-device computing power
- Lightweight inference engines (e.g. TensorFlow Lite) have matured and are friendlier to mobile devices with limited computing power
- Model compression significantly reduces model size, making deployment on mobile devices feasible
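Model compression can be made concrete with a toy example: symmetric int8 post-training quantization maps 32-bit float weights to 8-bit integers plus one scale, cutting storage roughly 4x at a small accuracy cost. This is a minimal pure-Python sketch of the idea, not tied to any particular framework:

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]  # each value now fits in one byte
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction used at inference time."""
    return [v * scale for v in q]

weights = [0.4, -0.2, 0.1, -1.0]          # pretend these are model weights
q, scale = quantize_int8(weights)          # [51, -25, 13, -127]
restored = dequantize(q, scale)
# Reconstruction error is bounded by scale / 2 (about 0.004 here).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight shrinks from 4 bytes to 1, which is why quantization is a standard step before shipping models to phones.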
With rapid progress in recent years, the ability to deploy AI on end devices has gradually entered public view, giving rise to the concept of on-device intelligence, which aims to provide a complete framework for using AI capabilities on end devices. Compared with the cloud, on-device intelligence has the following advantages:
- Low latency: no network round trip is needed
- Security: user privacy data stays on the device and is better protected
- Customization: models can be trained locally on user habits and iteratively optimized, achieving true per-user customization
- Richer features: richer user features are available on the device, improving prediction accuracy
- Saving cloud resources: combined with cloud inference, preprocessing on the device reduces the pressure on cloud computing power
- Richer application scenarios: face recognition, gesture recognition, translation, interest prediction, image search and other intelligent scenarios are already widely used, and more keep emerging
Google, Facebook, Apple and other giants have taken the lead in applying on-device intelligence. Google provides an on-device Recommendation demo app for Android that suggests content based on the user's interests, and Apple's Face ID and Siri Suggestions are likewise examples of on-device intelligence.
In China, Alibaba, Tencent and other companies have also experimented with on-device intelligence. Alibaba has landed it in scenarios such as product-list re-ranking, smart refresh, accidental-tap prediction, intelligent push, and Pailitao (image-based product search), and has released the MNN deep learning inference framework. Tencent launched its self-developed NCNN framework and applies on-device intelligence widely in medicine, translation, games, smart speakers and other fields.
Figure 2-2 shows a typical on-device intelligence development process. First, the collected data is used to design the algorithm and train a model in the cloud. The resulting model is not yet suitable for mobile devices, so it must be converted and compressed into a format supported by the mobile inference engine. Finally, the algorithm and model are dynamically deployed to the target device through cloud configuration. On the device, inference is triggered in appropriate scenarios at appropriate times: input data is prepared in the format the model expects and passed to the inference engine, and once the inference results are obtained and parsed, the corresponding logic adjustments and feedback follow.
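The stages of this process can be summarized as a schematic pipeline. Every function and field name below is a hypothetical placeholder, not a Pitaya or MLX API; the sketch only shows how the stages hand off to one another, with a trivial threshold "model" standing in for a real one:

```python
def train_in_cloud(samples):
    # Cloud side: produce a "model" (here just a threshold learned from data).
    threshold = sum(samples) / len(samples)
    return {"format": "cloud", "threshold": threshold}

def convert_and_compress(model):
    # Convert to a mobile-friendly format (stand-in for e.g. a TFLite export).
    return {"format": "mobile", "threshold": round(model["threshold"], 2)}

def deploy(model, config):
    # Dynamic delivery: bundle the converted model with targeting config.
    return {"model": model, "min_os": config["min_os"]}

def run_inference(package, value):
    # On-device: prepare the input, run the engine, interpret the output.
    return value > package["model"]["threshold"]

package = deploy(convert_and_compress(train_in_cloud([1.0, 2.0, 3.0])),
                 {"min_os": "11.0"})
result = run_inference(package, 2.5)  # 2.5 > 2.0, so the prediction is True
```

The point is the division of labor: training and conversion happen in the cloud, while input preparation, inference and result handling happen on the device.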
II. Challenges facing on-device intelligence
Deploying intelligence on mobile devices is not easy, and there are many problems in the current development process that need to be addressed.
- Development efficiency: In the typical development process, an algorithm engineer first trains and exports the model in the cloud, and the model fixes the input and output data formats. Client engineers then develop against that model, including collecting and preparing the input data and parsing the output data. Once development is complete, the work is handed to test engineers for quality assurance. This requires collaboration and communication among multiple roles, and the overall development pipeline is long and inefficient.
- Flexibility: As mentioned above, once the model is fixed, its input and output follow a fixed format. If the input-data policy changes, the client logic must be modified as well. This greatly limits the flexibility of online verification and feature iteration, and effectively lengthens the whole rollout cycle. In addition, different intelligent application scenarios (CV, recommendation, etc.) have different requirements; meeting the needs of different business scenarios is another thorny problem to solve.
- On-device environment complexity: Building a complete on-device runtime environment for intelligence is not easy. Beyond training and delivering models in the cloud, the client must handle data collection, storage and processing, hardware resource evaluation and scheduling, inference engine selection, operating system compatibility, and inference task management and scheduling. These problems quietly raise the bar for applying on-device intelligence; shielding businesses from this complexity is another important challenge.
III. Pitaya: an integrated on-device intelligence solution
To address these problems, Pitaya works closely with the MLX team on a full-link dynamic deployment solution from cloud to device. MLX is a cloud platform for model training and development, providing model training, conversion, debugging, release, A/B testing and other capabilities. The Pitaya SDK provides the on-device capabilities: feature engineering, inference engines, algorithm package management, task management and monitoring. The deep integration of the two covers every link in the on-device intelligence process and greatly lowers the barrier to applying it.
1. MLX platform
Before introducing Pitaya, let's look at the MLX platform. Model training runs into many environmental differences: different data-source storage structures and data formats, different machine learning frameworks, and different operating system environments and infrastructure capabilities to set up. The MLX platform was created to solve these problems. It provides a cloud service platform for model training, conversion and final productionization, and eliminates complex environment setup by integrating with a wide range of computing frameworks and service platforms.
The architecture of MLX is shown in Figure 3-1-1:
- Base Infra: provides a number of infrastructure capabilities that support the upper layers.
- ML: provides support for various machine learning frameworks, consisting mainly of Scheduler, Model Training and Model Serving, which form the core model training and conversion capabilities.
- Core: the layer developers interact with directly. For algorithm development, MLX provides Notebook, Web Studio and other online IDE environments, and supports drag-and-drop workflow orchestration with DAG Designer, making it easier to use. For model conversion, the whole chain of training, task management, export and release is covered.
- Scene: on top of the capabilities above, the MLX platform builds tasks for different intelligent scenarios, such as NLP, CV and GBDT; Pitaya is its newest member.
2. Pitaya
2.1 Algorithm packages
Pitaya deeply integrates its workflow with the MLX platform, as shown in the figure below. In the traditional on-device intelligence process, the cloud produces only the model, and only the model can be dynamically deployed. In the Pitaya workflow, the unit of delivery is an "algorithm package": a complete bundle of resources containing the model used during inference together with the algorithm logic and the information about the libraries it depends on. If the client has synchronized the contents of an algorithm package and offers all the capabilities the package requires, it can run the package and complete an entire inference process.
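To make the idea concrete, an algorithm package can be pictured as a manifest that bundles the model with its algorithm code and declared dependencies, plus a client-side check that those dependencies are available. The field names below are hypothetical, for illustration only; Pitaya's actual package format is not documented here:

```python
# Hypothetical manifest of an "algorithm package".
manifest = {
    "business": "feed_rerank",          # business identifier the package binds to
    "model": "rerank_v3.model",         # model file consumed by the inference engine
    "entry": "main.py",                 # algorithm logic driving the inference
    "requires": ["feature_engine", "inference_runtime"],
}

# Capabilities this client build actually provides.
client_capabilities = {"feature_engine", "inference_runtime", "applog"}

def can_run(manifest, capabilities):
    """The package runs only if the client offers everything it depends on."""
    return all(dep in capabilities for dep in manifest["requires"])

ready = can_run(manifest, client_capabilities)  # True for this client
```

Bundling logic and dependencies alongside the model is what lets the whole behavior, not just the weights, be swapped dynamically.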
2.2 Development and debugging
During algorithm package development, a test algorithm package can be generated on the fly. By scanning a QR code, the host App establishes a data channel with the Pitaya-MLX platform, which pushes the test package to the client; the package can then be run and debugged on a real device, with the output logs displayed in the MLX IDE environment, giving a complete cloud-side debugging experience. Thanks to this deep integration of Pitaya and MLX, algorithm engineers no longer depend on client engineers for any development and can independently run and debug algorithms on the device, which greatly improves algorithm development efficiency.
2.3 Dynamic Deployment
Once the algorithm engineer has finished debugging, the current project can be packaged into an algorithm package. Each business scenario has a unique business identifier within the App, and the algorithm package is bound to that business. On the Pitaya-MLX publishing platform, delivery of a business's algorithm packages can be configured and managed along multiple dimensions such as App, App version, OS version and channel. The publishing platform is also integrated with the A/B testing platform, enabling seamless online experiments and greatly accelerating verification of online business effects.
Beyond these general configuration capabilities, the Pitaya-MLX publishing platform also scores the performance of device models and delivers differentiated algorithm packages based on the results. In this mode, the population of online devices can be partitioned at a finer granularity: devices with strong performance or AI acceleration receive a more precise model, while weaker devices receive a relatively simplified model, yielding a better user experience overall.
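The tiered delivery described above amounts to a scoring rule on the device side of the configuration. The thresholds and package names below are invented for illustration; the platform's real scoring model is internal:

```python
def pick_package(benchmark_score, has_npu):
    """Choose which variant of an algorithm package to deliver to a device.

    benchmark_score: hypothetical 0-100 device performance score.
    has_npu: whether the device has dedicated AI acceleration.
    """
    if has_npu or benchmark_score >= 80:
        return "rerank_full"       # precise model for strong or AI-accelerated devices
    if benchmark_score >= 40:
        return "rerank_standard"
    return "rerank_lite"           # simplified model for weak devices

choice = pick_package(20, True)    # NPU support overrides a low CPU score
```

Because the rule lives in cloud configuration rather than in shipped client code, the tier boundaries can be adjusted without an App release.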
2.4 Feature Engineering
One of Pitaya's core capabilities is feature engineering. On-device inference generally needs to turn raw data into the features the model takes as input before the model can infer a result. If each business had to collect and maintain the raw data itself, the workload would be enormous. The goal of feature engineering is to help businesses collect user feature data on the device, non-intrusively, for later inference. Pitaya's feature engineering is integrated with the Applog SDK (an event-tracking SDK), so the Applog events needed during inference can be declared in the algorithm package configuration; once the package takes effect locally, Pitaya collects data according to that configuration. Pitaya also provides customized interfaces for capturing user behavior such as clicks, impressions and swipes. While an algorithm package runs, it can obtain the user's raw data through feature engineering and process it into the input the model requires. This way of collecting data is more dynamic and flexible than the traditional pipeline and frees businesses from heavy data-processing work.
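The path from raw events to model input can be sketched as follows. The event schema and feature names are hypothetical; the point is that the algorithm package, not the business code, decides which events it needs and how they are reduced to features:

```python
# Raw events in an Applog-like shape, collected per the package's configuration.
events = [
    {"name": "item_show",  "item": "a"},
    {"name": "item_click", "item": "a"},
    {"name": "item_show",  "item": "b"},
    {"name": "item_show",  "item": "c"},
]

def build_features(events):
    """Reduce raw events to the fixed-size feature vector the model expects."""
    shows = sum(1 for e in events if e["name"] == "item_show")
    clicks = sum(1 for e in events if e["name"] == "item_click")
    ctr = clicks / shows if shows else 0.0   # guard against division by zero
    return [shows, clicks, ctr]

features = build_features(events)  # [3, 1, 0.333...]
```

Shipping `build_features`-style logic inside the package means the feature definition can evolve with the model, with no client release needed.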
In addition, feature engineering supports uploading specified data to the MLX platform for cloud model training, forming a complete closed data loop, as shown in Figure 3-2-1.
2.5 Algorithm package running
Once an algorithm package has been synchronized to the device, there are two ways to trigger it to run.
- Applog event triggering: the algorithm package is configured with the Applog events that should trigger it; when a matching Applog event is detected, the package is triggered indirectly. In this mode the business side only needs to emit an Applog event to start the package, and feature-data extraction and preprocessing can happen inside the package itself.
- Active triggering: the business side calls the Pitaya interface (shown below) at an appropriate scenario and time. The input data and task configuration can be customized, and the inference result is returned in a callback.
```objectivec
- (void)runBusiness:(NSString *)business
              input:(PTYInput *_Nullable)input
             config:(PTYTaskConfig *_Nullable)config
       taskCallback:(PTYTaskCallback _Nullable)taskCallback;
```
Every time an algorithm package is triggered, a Task is created, and Pitaya's internal task management module takes over and processes it uniformly. The package executes inside Pitaya's running container, which provides an independent runtime environment for each task and accesses feature engineering data and model inference through the interfaces Pitaya provides. Pitaya abstracts the inference process and interface to a high degree, supports integrating different inference engines (ByteNN, ByteDT, TFLite) to meet the needs of different businesses, and reduces the cost of migrating existing projects to the Pitaya framework.
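A drastically simplified model of this task flow might look like the sketch below. All names are hypothetical (the real SDK is Objective-C/Java and its container provides far more isolation); it only illustrates the shape: triggers become queued tasks, tasks run against a pluggable engine, and results come back through a callback as in `runBusiness:` above:

```python
import queue

class TaskManager:
    """Each trigger creates a task; tasks are processed uniformly, in order."""

    def __init__(self, engines):
        self.engines = engines          # registered engines, e.g. {"tflite": fn}
        self.pending = queue.Queue()

    def submit(self, business, engine, data, callback):
        # Triggering (Applog event or active call) just enqueues a task.
        self.pending.put((business, engine, data, callback))

    def drain(self):
        while not self.pending.empty():
            business, engine, data, callback = self.pending.get()
            # The inference interface is abstracted: any registered engine fits.
            result = self.engines[engine](data)
            callback(business, result)

results = []
mgr = TaskManager({"tflite": lambda xs: sum(xs) / len(xs)})
mgr.submit("feed_rerank", "tflite", [1.0, 3.0],
           lambda biz, res: results.append((biz, res)))
mgr.drain()  # results now holds [("feed_rerank", 2.0)]
```

Keeping the engine behind one abstract interface is what lets ByteNN, ByteDT or TFLite be swapped without touching business code.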
2.6 Task Monitoring
In order to achieve an integrated experience, Pitaya has built a comprehensive and detailed monitoring system for tasks. The monitoring content covers the following aspects:
- Monitoring indicators: Task PV/UV, success/failure, algorithm package download success rate/coverage rate
- Performance monitoring: memory, link phase time, and initialization time
- Exception monitoring: Task stalled, failure cause, network request failed
The Pitaya SDK categorizes the above metrics. Using the data presentation capabilities of the Slardar platform, each integrated business can copy a template with one click and build a complete data dashboard inside the host App, making monitoring truly out of the box.
IV. Summary and outlook
Pitaya is an integrated on-device intelligence solution built specifically for mobile devices. Compared with traditional solutions, Pitaya has the following advantages:
- It lowers the cost of using on-device intelligence, letting businesses integrate quickly and reap the benefits
- Complete dynamic deployment capability supports rapid model iteration and effect verification
- It improves multi-party collaboration efficiency and lets algorithm engineers get deeply involved in client scenarios
- Algorithms and models are highly reusable, so verified schemes can be promoted quickly
At present, many ByteDance product lines, such as Douyin, Toutiao and Xigua Video, have begun practicing and exploring on-device intelligence with Pitaya. Throughout this process we keep communicating with the businesses and polishing the product and user experience, and we plan the following directions for Pitaya's future development:
- Feature engineering: strengthen feature engineering, making full use of device-only information and combining it with cloud data to provide richer, more accurate data.
- On-device training: the biggest application of on-device intelligence is continuously learning from the user's own behavior so the model better fits the user's habits. Achieving this requires solving local training data and local model training, and establishing a management mechanism for model accuracy evaluation and rollback.
- General AI capability building: Pitaya can build in capabilities for common application scenarios (e.g. network state prediction) and quickly promote them to businesses.
About the Byte Mobile Platform team
The ByteDance Client Infrastructure team is an industry leader in big front-end infrastructure, responsible for building big front-end infrastructure across ByteDance's entire China region and improving the performance, stability and engineering efficiency of the company's whole product line. The products it supports include, but are not limited to, Douyin, Toutiao, Xigua Video and Huoshan Video, with in-depth work on mobile, Web, Desktop and other terminals.
Now is the time! We are recruiting client, front-end, server and test developers worldwide. Let's use technology to change the world! If you are interested, contact us at [email protected] with the email subject: Resume – Name – Target position – Phone number.