Author: ByteDance Terminal Technology Team
Preface
On-device intelligence, as the name suggests, means running AI models directly on the device. As a hot new direction, it has begun to gain traction in the industry: Alibaba, Google, Kuaishou, and other large companies are actively investing in on-device intelligence, using on-device AI to optimize a variety of business scenarios with outstanding results.
The ByteDance Client AI team works deeply in the field of on-device intelligence, and earlier this year partnered with Xigua Video (Watermelon Video) to launch an on-device intelligent video preloading solution that has achieved good results. Through this case study, we will lift the veil on on-device intelligence and see how on-device AI is applied in practice to improve business results.
1. Scenario Analysis
1.0 Scenario Introduction
This scenario is simple: while the current video is playing, the client preloads a fixed 800 KB cache for each of the next three videos, so that upcoming videos can start quickly and users get a smoother playback experience.
But such a fixed strategy has some obvious problems:
- In most cases, users do not finish watching the 800 KB buffer; they simply skim the content and skip to the next video, wasting bandwidth
- When a user does watch a video carefully, an insufficient buffer can cause playback to fail or stall, hurting the user experience
The optimal preloading strategy is therefore to make the preload size match the playback size as closely as possible: load in advance exactly as much as the user will actually watch, so that no bandwidth is wasted and the user experience is not affected.
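As a toy illustration of these two failure modes, consider the per-video waste and shortfall of the fixed 800 KB strategy (the watch sizes below are made-up examples, not measured data):

```python
# Toy illustration of the fixed strategy's two failure modes.
# All watch sizes are made-up examples, not measured data.
FIXED_KB = 800  # fixed per-video preload size from the original strategy

def waste_and_shortfall(watched_kb):
    """Return (wasted KB, shortfall KB) for one preloaded video."""
    waste = max(0, FIXED_KB - watched_kb)      # preloaded but never watched
    shortfall = max(0, watched_kb - FIXED_KB)  # must be streamed live -> stall risk
    return waste, shortfall

print(waste_and_shortfall(200))   # quick skip: (600, 0), 600 KB wasted
print(waste_and_shortfall(3000))  # careful watch: (0, 2200), stall risk
```

Either outcome misses the ideal of "preload exactly what will be watched".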
1.1 In-depth Analysis
But reality is so variable that it is almost impossible to load exactly as much as the user will watch. Our idea was this: if we could predict the user's next behavior pattern, such as whether they will "switch videos quickly" or "consume videos slowly", we could use that prediction to optimize the preloading strategy.
In fact, a user's behavior within a given period of time is regular: their "hand speed", "interaction tendency", whether they are in "fragmented time", whether it is a "working day", and so on. We can use these regularities to predict the user's behavior pattern and derive a better preloading strategy. For example, if we predict with high probability that the user will be in "quick browsing" mode, a more suitable strategy may be "reduce the preload cache size and increase the number of preloaded videos", and vice versa.
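The mapping from a predicted mode to a preload plan might be sketched like this (the mode names and parameter values are illustrative assumptions, not the production strategy):

```python
# Hypothetical sketch: map a predicted behavior mode to preload parameters.
# Mode names, sizes, and counts are illustrative, not the production values.
from dataclasses import dataclass

@dataclass
class PreloadPlan:
    per_video_kb: int   # cache size preloaded per video
    video_count: int    # how many upcoming videos to preload

def plan_for_mode(mode: str) -> PreloadPlan:
    if mode == "fast_browse":
        # Likely to skip quickly: smaller caches, more videos.
        return PreloadPlan(per_video_kb=400, video_count=5)
    if mode == "slow_watch":
        # Likely to watch carefully: larger caches, fewer videos.
        return PreloadPlan(per_video_kb=1200, video_count=2)
    # Fallback: the original fixed strategy (800 KB for the next 3 videos).
    return PreloadPlan(per_video_kb=800, video_count=3)
```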
1.2 Breakthrough Direction
At this point we can see that the key to optimizing this scenario boils down to one question: how do we predict user behavior patterns on the device?
This breaks down into several sub-questions:
- Should predictions be made with "rules" or with a "model"?
  - Rules
    - Advantages: simple, low development cost, quick to verify whether the approach works
    - Disadvantages: can only handle simple scenarios; the more complex the scenario, the more complex the rules become, leading to high development and maintenance costs
  - Model
    - Advantages: can handle complex scenarios and support more fine-grained strategies
    - Disadvantages: higher development cost and longer cycles
  - In general:
    - In the early preparation stage, rules can quickly verify the approach and estimate the approximate benefit of the strategy
    - In the implementation stage, the learning and prediction ability of a model is used to build a highly available, highly extensible, fine-grained strategy that achieves the maximum effect
- Should the model's prediction run "in the cloud" or "on the client"?
  - It varies from scenario to scenario; there is no universal answer
  - Taking the current case as an example, the preloading scenario has the following characteristics:
    - The behavior pattern to be predicted is strongly correlated with the user's swiping behavior on the page
    - Swiping behavior shows clear regularity over short periods (seconds, minutes) but weak regularity over long periods (days, months)
    - Preloading is triggered frequently, runs briefly, and has very strict real-time requirements (otherwise it easily causes stalls)
    - Some on-device features are unsuitable for reporting to the server for privacy and data-volume reasons
  - For a scenario like this, with strict real-time requirements and short-lived feature data, running the model directly on the client is a better solution than going through the long request/response link to the cloud
Finally, a table summarizes the characteristics of "on-device intelligence" versus "cloud intelligence":
| | Decision-making cost | Response delay | Data dimension | Computing scale |
|---|---|---|---|---|
| Cloud intelligence | Network connection + native iteration | Minute-level: event-reporting delay + near-line delay | Long-term + all users + statistical features | Big data + big models |
| On-device intelligence | Data triggering + dynamic scripting | Second-level: on-device computation delay | Short-term + single user + time-series features | Small data + small models |
1.3 Optimization Ideas
After the above decomposition and analysis, the optimization idea for the whole scenario has gradually emerged:
We can combine the "real-time processing power of the device" with "AI's ability to abstract complex scenarios" to "predict user behavior patterns" directly on the client, and then "optimize the video preloading strategy" based on the predicted results.
2. Intelligent Preloading Scheme
2.0 The on-device intelligence solution
With the idea in place, what remains is putting it into practice.
Generally speaking, an on-device intelligence solution consists of the following stages:
- On-device AI development
- Client development
- Algorithm package development
These phases are relatively independent and can proceed in parallel. Below, we use the "video preloading" scenario as an example to walk through each of them.
2.1 On-device AI development
2.1.0 Feature mining
From the optimization idea for this scenario, we have already made the prediction target clear: "the user's behavior pattern". The remaining question is: what do we predict it from?
Think about the full-screen video feed: which features will affect, or reflect, whether a user "swipes quickly with brief views" or "swipes slowly and watches carefully"?
- Imagine a user swiping through videos. If several consecutive videos fail to interest them, their expectation of, and patience for, the next video will keep dropping; if the content still fails to interest them, they may glance at the title and swipe away, so the current average overall watch time becomes shorter.
- Conversely, if the user is interested in the first few videos and enjoys watching them, their expectation of and patience for the next video will rise, and the current average overall watch time will be longer.
- When swiping through videos, will the user favor certain types of videos, or even certain sources, leading to different watch durations? For example, a user who wants short videos may swipe straight past a long one.
Thinking about the scenario in this way yields a range of candidate user-behavior features.
With this conceptual picture of user-behavior features, we can collect and analyze data, turning "concepts" into "data". This requires combing through the tracking events (buried points) associated with the scene, which has several benefits:
- Going through the tracking events gives us a comprehensive, intuitive view of the scene
- It inspires further feature-mining ideas
- It reveals gaps in the existing data
2.1.1 Feature processing
Once we have the raw data from tracking events and other sources, we need to process it into feature data. There are many approaches to feature processing: features can be extracted along different dimensions, such as historical features versus real-time features, and different dimensions, granularities, or combinations of conditions can be tried, for example:
- The past x samples
- Sample data from the past x videos
- Sample data from the past x hours
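A minimal sketch of such windowed feature extraction, assuming a hypothetical record schema (a timestamp and a watch-completion ratio) that the post does not actually specify:

```python
# Minimal sketch of windowed feature extraction; the record schema
# ('ts' in epoch seconds, 'watch_ratio' in [0, 1]) is an assumption.
import time
from statistics import mean

def window_features(records, last_n=5, last_seconds=3600, now=None):
    now = time.time() if now is None else now
    recent_n = records[-last_n:]                                      # past N samples
    recent_t = [r for r in records if now - r["ts"] <= last_seconds]  # past time window
    return {
        "avg_watch_ratio_last_n": mean(r["watch_ratio"] for r in recent_n) if recent_n else 0.0,
        "skip_rate_last_n": sum(r["watch_ratio"] < 0.2 for r in recent_n) / len(recent_n) if recent_n else 0.0,
        "plays_last_hour": len(recent_t),
    }
```

Each key is one candidate feature; varying `last_n` and `last_seconds` gives the different window combinations listed above.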
2.1.2 Feature analysis
After feature processing gives us a set of features that seem meaningful for the current problem, we need to analyze the value of each feature. "Value" here means the feature's value to the model, so that we can trade off the cost of acquiring the feature against the benefit it brings.
There are several common ways to analyze a feature's value to a model, such as:
- Correlation-coefficient methods: Pearson, Spearman, Cohen's kappa, etc.
- Regularization methods: Lasso, Ridge, etc.
- The distance correlation coefficient
- Decision-tree feature ranking
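As a small illustration of the correlation-coefficient methods on toy data (the feature names are assumptions; Spearman is computed here as Pearson on ranks):

```python
# Toy feature-analysis sketch: Pearson correlation via numpy, and
# Spearman computed as Pearson on the ranks. Feature names are assumptions.
import numpy as np

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    rank = lambda v: np.argsort(np.argsort(v))  # ranks (no ties in this toy data)
    return pearson(rank(x), rank(y))

rng = np.random.default_rng(0)
n = 500
hand_speed = rng.normal(size=n)                                # informative feature
noise_feat = rng.normal(size=n)                                # irrelevant feature
watch_time = 2.0 * hand_speed + rng.normal(scale=0.3, size=n)  # toy target

print("hand_speed vs watch_time:", round(pearson(hand_speed, watch_time), 3))
print("noise_feat vs watch_time:", round(pearson(noise_feat, watch_time), 3))
print("spearman(hand_speed)    :", round(spearman(hand_speed, watch_time), 3))
```

The informative feature shows a strong correlation with the target while the irrelevant one stays near zero, which is exactly the signal used to keep or drop a feature.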
Each method has its own applicable scenarios, which we will not detail here. Beyond giving a prior estimate of model performance, feature analysis helps us screen out the features most valuable to the current scene, because adding a feature usually carries costs:
- It may lengthen data collection
- It may lengthen feature processing
- It may increase model complexity, and with it model size and inference time
For a model running on the client, keeping the end-to-end latency and the model size as small as possible both improves real-time responsiveness and better guarantees the effect of on-device intelligence.
Once the on-device AI model is ready, development work remains on the client side and in the algorithm package.
2.2 Algorithm package development
Algorithm package development covers two main things.
2.2.0 On-device feature engineering
On-device feature engineering processes raw data into feature data. Inside ByteDance this is supported by Pitaya, the on-device intelligence framework, which provides on-device feature-engineering capabilities.
In general, on-device feature engineering needs to provide:
- Different triggering methods to meet the needs of different scenarios
- The ability to obtain features on the device from different data sources (tracking events, user profile data, device characteristics, etc.)
- Data management at different dimensions and levels
With these capabilities, we can process features into the model's input data in real time on the device, ready for inference and prediction.
2.2.1 On-device model inference
In addition to feature processing, we also need an on-device model-inference link, which Pitaya likewise supports inside ByteDance. The link consists of three parts:
The deployment environment
As the name implies, this provides an inference engine that can be deployed on the device in real time, together with the corresponding virtual-machine environment.
Dynamic capability
Because there are many on-device scenarios, iteration is very fast: policies, models, and even the scenarios themselves evolve rapidly as the client updates and the business progresses. One of the most important capabilities for on-device intelligence is therefore dynamism: the ability to dynamically pull algorithm packages, models, and runtime environments down to the device for deployment and management. Dynamic updates let us change our policies without depending on client releases, enabling policy iteration at very low cost.
Real-time effect monitoring
In addition, a real-time effect-monitoring system is needed to observe the model's effect, performance, stability, and other key indicators in real time, and to raise timely alerts when the model's effect degrades due to shifts in the user population or the scenario.
2.3 Client development
According to the scenario, the client needs to determine when each algorithm package is triggered and how its results are parsed, in order to execute the corresponding business logic. In the video preloading scenario it looks like this:
- Triggering algorithm-package execution
  - After the first frame of the landscape in-feed scene is rendered, the algorithm package is actively triggered
- Parsing algorithm-package results
  - The logic for adding preloading tasks changes from the fixed policy to adding tasks based on the execution results returned by the algorithm package
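The trigger-and-parse flow might be sketched as follows; `run_algo_package` and the `Preloader` class are hypothetical stand-ins, since the post does not spell out the real interfaces:

```python
# Hypothetical sketch of the client-side flow: trigger the algorithm package
# after first-frame render, then turn its result into preload tasks.

def run_algo_package(features):
    # Placeholder: a real implementation would invoke the on-device model.
    return {"per_video_kb": 400, "video_count": 5}

class Preloader:
    def __init__(self):
        self.tasks = []
    def enqueue(self, index, size_kb):
        self.tasks.append((index, size_kb))

def on_first_frame_rendered(features, preloader):
    # Parse the package result and replace the fixed policy with it.
    result = run_algo_package(features)
    for i in range(result["video_count"]):
        preloader.enqueue(index=i + 1, size_kb=result["per_video_kb"])

p = Preloader()
on_first_frame_rendered({}, p)
print(p.tasks)  # [(1, 400), (2, 400), (3, 400), (4, 400), (5, 400)]
```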
In the video preloading scenario, we used the dynamic-update capability in the early stage to quickly test how extreme preload sizes and different preload counts affect the playback experience, and determined the best parameter ranges for video preloading. At the same time, real-time effect monitoring exposed the model's current problems during the experiment, allowing continuous revision and iteration to improve the model's effect.
3. Evaluating the Scheme's Effect
With the client business code adjusted and the model and algorithm package developed, we can verify the scheme's effect through A/B experiments. A/B experiments are familiar territory, and on-device intelligence experiments are no different in essence, but a few points deserve special attention.
3.0 Fast iteration of algorithm packages
On-device intelligence not only provides flexible solutions but also adds more dimensions for experiments to verify: different models, different strategy compositions, even different trigger timings. With so many combinations to try, experiments easily run short of traffic and experimental groups.
The ByteDance Pitaya platform solves this by providing algorithm-package deployment and traffic-splitting capabilities. An algorithm-package release can be split into sub-releases, each serving as a policy group that validates one strategic direction. Each policy group can be bound one-to-one to an A/B experimental group, and strategies can be updated iteratively during the experiment, so different strategies are verified quickly without stopping the experiment.
In the video preloading scenario, we can design our experiment like this:
| Experimental group | Strategy group |
|---|---|
| Online group | |
| Experimental group 1 | Vary the preload count (1, 3, 5, 7, …) |
| Experimental group 2 | Vary the preload size (500, 700, 900, … KB) |
| Experimental group 3 | Vary the preload scheduling (parallel, serial, …) |
Each policy group corresponds to one optimization direction, and the specific details within that direction are adjusted through algorithm-package iteration inside the group. This lets us quickly observe the effect of each direction within a short period and find the best-performing strategy in each group.
Another benefit of this design is that algorithm-package releases are largely decoupled from client versions, so the algorithm package can be adjusted quickly based on experimental feedback, at a frequency of once or even twice a week.
3.1 Algorithm package monitoring
Compared with ordinary experiments, beyond the business-effect indicators of the A/B experiment we also need to watch the algorithm package's own indicators, to ensure the model's effect meets expectations. These indicators typically include:
- Performance indicators: success rate, execution time, PV, UV, …
- Effect indicators: accuracy, precision, recall, TP, FP, TN, FN, …
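The effect indicators can be computed from raw predictions as in this minimal sketch (the sample labels are toy data):

```python
# Sketch: computing the effect indicators above from raw binary predictions.
def confusion_metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    total = tp + fp + tn + fn
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Toy labels: tp=2, fn=1, tn=1, fp=1 -> accuracy 0.6, precision/recall 2/3
m = confusion_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(m)
```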
With such real-time monitoring, we can track the running status of the algorithm package and the model's effect at any time, promptly detect and locate anomalies or drops in effect, and optimize and iterate.
3.2 Scheme Optimization
During the experiment, we optimized the scheme many times based on problems revealed by the experimental data.
For example, when reviewing the model's effect against the experimental data, we found that false negatives (FN) hurt our scenario far more than false positives (FP). We therefore reduced the proportion of FN by adjusting the decision threshold, and verified the effect through rapid iteration within the experiment. As a result, indicators such as the playback stall rate and the playback-start failure rate improved significantly.
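A toy sketch of that threshold adjustment: lowering the threshold converts false negatives into (possibly) false positives, the right trade when FN is the costlier error. Scores and labels are made up:

```python
# Toy sketch of threshold adjustment to trade FN against FP.
def classify(scores, threshold):
    return [1 if s >= threshold else 0 for s in scores]

def fn_fp(y_true, y_pred):
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return fn, fp

scores = [0.9, 0.6, 0.4, 0.38, 0.2]  # model confidence per sample
labels = [1,   1,   1,   0,    0]    # ground truth

for th in (0.5, 0.35):
    preds = classify(scores, th)
    fn, fp = fn_fp(labels, preds)
    print(f"threshold={th}: FN={fn}, FP={fp}")
# At 0.5 the model misses one positive (FN=1, FP=0);
# at 0.35 it catches all positives at the cost of one FP (FN=0, FP=1).
```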
Another example: when analyzing the model's predictions, we found that user behavior patterns follow certain short-term regularities, so we optimized the model strategy for different time periods, further improving the business effect.
With the algorithm package's dynamic-update capability, we can iterate the strategy quickly: starting from a simple, rough version, the model grows steadily more capable and the algorithm strategy more refined, until on-device intelligence finds the refined strategy that best fits the current business scenario.
3.3 Results
Through this series of experiments, Xigua Video's intelligent video preloading strategy was successfully launched and achieved considerable gains on both video playback indicators and bandwidth cost:
- On playback indicators, compared with the fixed preloading strategy, the intelligent strategy reduced the playback failure rate by 3.372%, the playback-start failure rate by 3.892%, the stall rate by 2.031%, stalls per 100 seconds by 1.536%, and stall penetration by 0.541%
- On cost, the intelligent preloading strategy reduced total bandwidth cost by 1.11% compared with the fixed strategy, saving the company tens of millions in bandwidth spend
4. Summary
During online verification, many problems hidden in earlier stages come to light: poorly chosen features that need new additions, inference times that are too high and require optimizing the model and algorithm package, a poor pairing of model and strategy that leaves the overall effect below expectations, and so on.
Each new problem in turn helps us refine an earlier phase, iterate the strategy, and put it back online for another test. The whole process can be viewed as a loop:
Scenarios that combine AI with business usually need several cycles of optimization and refinement around this loop to keep achieving better results. Only through such iteration does a mature, effective scheme emerge that ultimately improves the business effect.
Pitaya, ByteDance's on-device intelligence platform, is an on-device + cloud infrastructure jointly built by the ByteDance Client Infra team and the Data-MLX team. It aims to help client-side businesses use AI capabilities efficiently to improve business results and expand business scenarios. It currently provides frameworks for on-device intelligent business, on-device feature engineering, an on-device inference engine, and on-device training, and has partnered with Douyin, Xigua Video, Toutiao, education, and other businesses to help improve their results.
MARS, the Volcano Engine application development suite, distills the ByteDance terminal technology team's nine years of app-development practice on Douyin, Toutiao, Xigua Video, Dongchedi, and other apps. It provides a one-stop R&D solution for mobile development, front-end development, QA, operations, product managers, project managers, and operations roles, helping enterprises upgrade their R&D model and reduce overall R&D costs. Click the link to visit the official website for more product information.