The Web pages of the QQ Music Android client average tens of millions of page views (PV) per day, yet their opening time lags far behind that of Native pages, calling for systematic optimization. This article describes the problems, ideas, solutions, and results of the QQ Music Android client's general Web page performance optimization, and tries to summarize common bottlenecks and countermeasures in cross-end scenarios. Author: Guan Yue, QQ Music client development engineer.

I. Problems and objectives

As an application focused on content operation, the QQ Music Android client's Web pages average tens of millions of PV per day, and core pages such as the comment page and the MV page are partially or entirely implemented as Web pages.

The opening time of in-client Web pages lags far behind that of Native pages and requires systematic optimization. However, existing front-end and cross-end optimization schemes have limitations.

1. Limitations of front-end optimization

For Web page load-time optimization, the front-end community has already built mature ideas, solutions, services, and tool chains. For in-client Web pages, however, pure front-end optimization has two limitations:

  • It cannot avoid WebView initialization time

  • It is limited by the scope of the WebView life cycle

From the client's perspective, besides optimizing WebView initialization time itself, optimizations can also be designed around “extending the front-end life cycle”.

2. Limitations of cross-end optimization

Existing cross-end optimization schemes, including offline packages and VasSonic, require the front end to participate in retrofitting pages to achieve the best results. This adds logic to existing pages, is unfriendly to pages already online, and introduces extra cost and risk. When front-end development resources are scarce, such optimizations are hard to carry out.

To reduce front-end development workload, an optimization scheme is needed that is more general and less perceptible to the front end.

3. The target

Given this background, the goals of this optimization are to improve generality and to reduce front-end retrofitting cost.

II. Metric design

Alongside developing the optimization ideas and implementation, performance metrics must be established to measure the optimization effect.

1. Existing client performance metrics

Below, based on the in-client Web page loading process, we describe the timing points represented by the client's existing performance metrics.

(1) Client WebView callback

Based on the Android WebView's process-monitoring callbacks and page-frame capabilities, the following performance monitoring can be achieved:

Here, onMainFrameFinished records the time when the first non-main-request (non-HTML) resource is intercepted. For most pages, the main request (HTML) has been downloaded and parsing has begun by that point, so it roughly represents the end of the main request phase.

(2) W3C Performance Timing

W3C Performance Timing provides more detailed information about the loading process than the client callbacks, but it does not include the point at which WebView initialization starts. Only some of its timing points are listed below:

2. Limitations of collecting on each end alone

(1) Limitations of front-end collection

  • The point at which the WebView started initialization cannot be obtained independently.

  • Obtaining the most accurate load-completion time relies mainly on manual instrumentation.

(2) Limitations of client collection

SSR (server rendering) and CSR (client rendering) differ in the time point at which page content can be consumed.

For the WebView page load cycle:

  • A CSR page can only display data after the front-end page frame has loaded; content is requested and rendered after page load completes

  • An SSR page's first response can already carry first-screen data, so its content is consumable before page load completes

The client callback timings are either incomplete or too “strict” to pin down the point at which “page content becomes consumable”.

Tracing the onPageFinished callback shows that the corresponding Blink code fires it only when the page has been parsed and no resources are still downloading.

By this standard, if an image is still loading while the rest of the page frame has been fully processed, onPageFinished waits for the image to finish, which deviates from the actual point at which page content becomes consumable.

3. Metric design scheme

Combined with the above analysis, it can be determined that:

  • The most accurate time for page load completion comes from the front end

  • The most accurate timing for WebView initialization comes from the client

Therefore, a complete time measurement requires the client and the front end to collaborate.

(1) Front end

The front end records the end time, obtains the WebView initialization time from the client, and computes and reports the opening time:

  • The front end obtains the load-completion time via manual instrumentation or by monitoring changes in the number of DOM nodes.

  • When computing the statistic, the front end calls a client-provided JSAPI to obtain the WebView initialization time as the starting point.

  • The front end calculates and reports the loading time.
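The collaborative measurement above can be sketched in plain Kotlin. The class and member names (`PageOpenMetric`, `webViewInitStart`, `openDurationMs`) are illustrative, not from the original; in practice the start timestamp would be fetched across the JSAPI bridge rather than set directly.

```kotlin
// Sketch of the client/front-end collaborative opening-time metric.
// The client records when WebView initialization starts; the front end
// records when content becomes consumable, then computes the difference.
class PageOpenMetric {
    var webViewInitStart: Long = 0L   // recorded by the client, exposed via JSAPI
    var loadEnd: Long = 0L            // recorded by the front end (manual mark or DOM watch)

    // Opening time reported by the front end.
    fun openDurationMs(): Long {
        require(loadEnd >= webViewInitStart) { "loadEnd must not precede init start" }
        return loadEnd - webViewInitStart
    }
}
```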

(2) Client side

As a complement, the client can obtain the domInteractive point from W3C Performance Timing via JavaScript injection and use it as the end point:

  • When domInteractive fires on the front end, all requests and processing of the resources necessary to present the page have completed

  • The resulting time difference reflects the client's general optimization effect for any page

  • It measures the content-consumable time of SSR (server-side rendering) pages and the first-frame time of CSR (client-side rendering) pages

webView.evaluateJavascript(
    // Script: read the domInteractive timestamp from W3C Performance Timing.
    "(function () { return performance.timing.domInteractive; })();"
) { value ->
    responseEndDuration = value.toLong() - getOnCreateTimestamp()
}

Although WebKit maintains the Performance Timing values internally, WebView exposes no interface for reading them at the above time points, hence the JavaScript injection.

III. Optimization plan and effects

1. Overview of the optimization scheme

Based on the in-client Web page loading process, five optimization items are proposed across “WebView initialization time”, “resource loading time”, and “logic processing time”:

  • Preloading the TBS (X5 kernel) environment

  • WebView instance pool

  • Parallel loading of the main request

  • Web common resource pool

  • Skin-following logic optimization

The points in the loading process at which each optimization item takes effect are as follows:

2. Description of optimization methods

(1) WebView initialization

Preliminary analysis showed limited room to compress WebView initialization time itself, so the main approach is to move the initialization logic earlier. For example, the WebView instance pool pre-initializes WebViews while the application is idle in the background and the main thread is not noticeably affected, eliminating the initialization cost when a Web page is opened.
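A framework-agnostic sketch of the instance-pool idea follows; a real pool would hold WebView instances and warm up on the main thread during idle moments, and `InstancePool` and its API are illustrative names, not the article's actual implementation.

```kotlin
// Generic pre-initialized instance pool: pay the creation cost early,
// at an idle moment, instead of when the page opens.
class InstancePool<T>(private val capacity: Int, private val factory: () -> T) {
    private val pool = ArrayDeque<T>()

    // Called while the app is idle (e.g. in the background) to absorb creation cost.
    fun warmUp() {
        while (pool.size < capacity) pool.addLast(factory())
    }

    // Called when a Web page opens: reuse a warm instance if available,
    // fall back to on-demand creation otherwise.
    fun acquire(): T = pool.removeFirstOrNull() ?: factory()
}
```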

(2) Client self-built cache

To implement the resource-loading optimizations above, the client needs to build a resource cache independent of the WebView's own cache mechanism.

The self-built cache follows the three-level cache mechanism (typically memory, file, and network) commonly used in clients. Building on the WebView's strongly bounded life cycle, the cache life cycle is designed as a “cold-hot cache cycle”.

For example, while the WebView is initializing, the self-built cache loads the resources the page needs from the file system into memory. When a resource request is intercepted and its byte stream is handed to the WebView, it is served from the memory cache and can be evicted from memory immediately afterwards. This makes memory-cache eviction more aggressive: byte streams stay in memory for less time, reducing the memory footprint.
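A minimal sketch of this consume-once hot cache, assuming resources are keyed by URL; `ConsumeOnceCache` and its method names are illustrative, and the cold store stands in for the file system.

```kotlin
// "Cold-hot cache cycle" sketch: the cold store (file system) is loaded into
// a memory cache (hot) during WebView init; each hot entry is served once to
// the interception callback and evicted immediately, so byte streams stay in
// memory as briefly as possible.
class ConsumeOnceCache(private val coldStore: Map<String, ByteArray>) {
    private val hot = HashMap<String, ByteArray>()

    // Runs in parallel with WebView initialization.
    fun warmUp(urls: List<String>) {
        for (url in urls) coldStore[url]?.let { hot[url] = it }
    }

    // Called from the resource-interception callback; removal on read makes
    // eviction aggressive by design.
    fun consume(url: String): ByteArray? = hot.remove(url)
}
```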

(3) Inline public resources

After the common resource pool was first rolled out, page opening time actually regressed. Analysis attributed this to a performance bottleneck in the resource-interception callback:

  • The callback's single-threaded model degrades read/write performance

  • The more resources are intercepted, the more the performance impact is magnified

Therefore, to reduce the performance impact of the resource-interception callback, common-resource inlining was introduced to cut the number of interceptions:

  • After a common resource is loaded into the hot cache, it is converted into the corresponding inline HTML node

  • After parallel loading of the main request completes, the corresponding external-link nodes in the main request byte stream are replaced directly, and the new byte stream is returned to the WebView

The introduction of common-resource inlining essentially offsets the performance impact of the resource-interception callback and improves page loading time by 3.2%.
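The inlining step can be sketched as a string rewrite over the main request's HTML. The function name and the exact tag format are illustrative assumptions; a production implementation would rewrite the byte stream and handle attribute variants, not just one literal tag shape.

```kotlin
// Common-resource inlining sketch: external <script src> nodes whose URLs are
// in the hot cache are replaced with inline <script> nodes inside the main
// request's HTML, so the WebView never needs to intercept those resources.
fun inlineScripts(html: String, hotCache: Map<String, String>): String {
    var out = html
    for ((url, body) in hotCache) {
        out = out.replace(
            "<script src=\"$url\"></script>",  // external-link node
            "<script>$body</script>"            // inlined equivalent
        )
    }
    return out
}
```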

3. Optimization effect

For the comment page inside the QQ Music Android client:

  • Load time reduced by 26.2% (1932ms → 1426ms)

  • Lower bounce rate

  • Median length of stay increased

IV. Bottlenecks and countermeasures in cross-end scenarios

Based on the optimization work in the WebView scenario, we extrapolate similar problems that may exist in cross-end scenarios generally. This section lists some likely performance bottlenecks in cross-end scenarios, with possible countermeasures.

1. Front-end/client communication channels are inefficient: prefer “fewer trips, larger payloads”

Cross-end solutions (WebView, React Native, etc.) generally suffer from inefficient communication channels between the front end and the client:

  • The WebView channel cannot pass data beyond a certain order of magnitude in size

  • Communication threads are mostly single-threaded, and some channels must even initiate or process communication on the main thread

  • Performance is more sensitive to the number of transfers than to the total amount of data transferred

Therefore, when a large amount of data must be transferred cross-end, first verify that the current channel can handle it. If the total amount cannot be compressed and the channel allows it, reduce the number of transfers and increase the payload per transfer.

“Common-resource inlining” is a practice of this idea.
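The “fewer trips, larger payloads” idea can be sketched with a buffering bridge; `BatchingBridge` and its API are hypothetical, standing in for whatever JSAPI or RN bridge the channel actually uses.

```kotlin
// Batching sketch: instead of one bridge call per message, buffer messages
// and flush them across the channel in a single call, since per-call overhead
// dominates on front-end/client channels.
class BatchingBridge(private val send: (List<String>) -> Unit) {
    private val buffer = mutableListOf<String>()

    // Cheap: just enqueue, no bridge crossing yet.
    fun post(message: String) { buffer += message }

    // Flush once per frame / per tick instead of per message.
    fun flush() {
        if (buffer.isEmpty()) return
        send(buffer.toList())
        buffer.clear()
    }
}
```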

2. Extend the lifecycle

The front-end life cycle is limited. The client can use the time outside that life cycle to front-load resources and logic appropriately, reducing page load time.

For example, the “common resource pool” and “parallel loading of the main request” optimizations above reflect the extended-life-cycle idea. Similarly, the dual-thread model of the WeChat Mini Program [1] increases the execution time available to front-end code by introducing JSCore, and uses offline packages to help the front end extend its life cycle.

3. Streamlining/preloading common library code

If front-end pages share a common library, the library's natural growth with front-end business complexity may magnify script parse and execution time.

For Web pages, a slimmed-down base library reduces the execution of irrelevant code. For React Native pages, code splitting and instance preloading allow more base-library code to execute before the page opens, reducing the amount of code executed, and time spent, during page startup.

V. Summary and outlook

Based on the loading characteristics of in-client Web pages, and targeting the problems and bottlenecks in WebView initialization, resource loading, and logic processing, this article designed and implemented five optimization items with significant results. It also summarized bottlenecks and countermeasures in cross-end scenarios, providing ideas for subsequent cross-end optimization.

Going forward, the team will further enrich client/front-end collaborative performance monitoring and open the client's Web page framework to the front end in a more fine-grained way. It will also explore CGI preloading and the introduction of JSCore to further improve Web page loading time in specific scenarios.

References:

[1] The dual-thread model of WeChat Mini Programs:

Developers.weixin.qq.com/ebook?actio…
