Introduction: Today’s topic is “The evolution of Baidu’s open source mini program framework and performance optimization in practice”. The talk has two parts. The first covers the overall framework of Baidu Smart Mini Programs and its evolution: an overview of the development history, the framework itself, and how a consistent experience is guaranteed across multiple hosts. The second covers performance optimization of the framework: the mini program startup process and the most important optimization points from the developer’s point of view.

First, the overall framework and evolution of Baidu Smart Mini Programs

The mobile Internet has always been looking for a trade-off between native (NA) and H5. NA offers good performance and strong capabilities, while H5 is more flexible. I think of rendering as split into two schools: the NA rendering school and the H5 (Web) rendering school.

The NA rendering school includes React Native (RN) and Flutter. The Web rendering school includes Baidu’s Light Apps and, later, mini programs.

1. Overview of the development history

Baidu has built three representative Web-rendering products: Light Apps, Direct Numbers, and mini programs.

  • A Light App is H5 plus client-side capabilities. It is standard H5 with some NA APIs added, such as location.

  • A Direct Number is, on a technical level, the same as a Light App.

  • A mini program is essentially a restricted H5 plus a large set of rich APIs and UI components. We now provide more than 300 APIs and more than 30 components for mini programs. Components are interface elements, for example video and maps.

There are two main reasons why mini programs are restricted:

  • Keeping the experience consistent. H5 is too flexible: JS can change the interface at any time.

  • Security. We provide many APIs and components that expose very low-level capabilities and sensitive data, such as phone numbers and account information, and these cannot be opened up casually.

The restriction is enforced in two main ways:

  • Authoring language: developers no longer write HTML directly but use swan, a custom language.

  • Runtime: the runtime layer has two stacks, the rendering stack and the JS execution stack, which are physically isolated for security.

2. The Smart Mini Program framework

(1) The end-to-end development and release process

First, a brief introduction to the end-to-end development process of a Baidu Smart Mini Program.

  • First, the developer writes the layout in swan.

  • Then the developer tools package the code and upload it to our B-side mini program server.

  • Next comes the review process, which includes machine review and manual review.

  • Finally, when a user taps the mini program, the client sends a request to the C-side mini program server, which obtains the package from the B-side server. The whole process uses encrypted transmission, which protects the code.

(2) The Baidu Smart Mini Program framework: SWAN

The figure above shows the framework of Baidu Smart Mini Programs, which we call SWAN internally.

The hierarchical structure is as follows:

  • The top layer is the developer base library, swan-js, which developers work with directly. swan-js is responsible for two things: turning swan code into HTML that can run in a WebView, and exposing encapsulated client-side capabilities.

  • The next layer is swan-native. Its core is the NA implementation of the APIs and components. Dual-stack management also lives in this layer. In addition, the Extension module (marked in red) lets a host add its own capabilities for developers. For example, if the Tieba host wants to add a posting capability, it can do so through this mechanism.

  • Below that is the Porting Layer, an interface layer between the framework and the host that was added so that Baidu Smart Mini Programs could be open sourced. The lowest layer is the host’s basic capability layer. If the host lacks these capabilities, it can use Baidu’s open source reference implementation, which can be integrated directly into the host.

3. Core structure

(1) Front-end perspective

Let’s look at the dual stack from the front end. A host client can run multiple mini programs and keeps each one alive for a period of time. Each mini program has one master, which executes the framework JS and the developer’s JS, and one master corresponds to multiple slaves (each slave represents one user-visible page).

(2) Client perspective

From the client’s perspective, in the dual-stack structure shown in the figure above, the master is responsible for executing JS. It can be implemented in two ways, with a WebView or with a JS engine (V8/JavaScriptCore); the JS engine is more efficient. Each slave is a WebView that displays a page. To speed up WebView creation, WebViews are cached. The master and the slaves communicate through a message bus.

The master does not support the BOM, the DOM, or Web APIs. Mini programs can only invoke the capabilities that the framework exposes.
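To make the master/slave split more concrete, here is a minimal sketch of a message bus between one master and several slaves. It is an illustration only: the talk does not describe the real SWAN bus, so the class name `MessageBus`, the message shape, and the endpoint ids such as `slave-1` are all assumptions. Messages are copied through JSON to mimic the fact that the two stacks are isolated and cannot share objects.

```typescript
// Hypothetical message shape; the real SWAN bus format is not given in the talk.
interface BusMessage {
  type: string;
  payload: unknown;
}

type Handler = (message: BusMessage) => void;

class MessageBus {
  // Handlers keyed by endpoint id ("master" or a slave id such as "slave-1").
  private handlers: Record<string, Handler[]> = {};

  subscribe(endpointId: string, handler: Handler): void {
    const list = this.handlers[endpointId] || [];
    list.push(handler);
    this.handlers[endpointId] = list;
  }

  // Deliver a message to one endpoint and return how many handlers received it.
  // The message is serialized and parsed again because master and slave are
  // isolated: only data crosses the boundary, never live object references.
  post(targetId: string, message: BusMessage): number {
    const list = this.handlers[targetId] || [];
    const copy = JSON.parse(JSON.stringify(message)) as BusMessage;
    for (const handler of list) {
      handler(copy);
    }
    return list.length;
  }
}

const bus = new MessageBus();
const deliveries: string[] = [];

// A slave (page WebView) listens for data pushed by the master.
bus.subscribe("slave-1", (m) => deliveries.push(`slave-1 got ${m.type}`));
// The master (JS execution) listens for user events coming from pages.
bus.subscribe("master", (m) => deliveries.push(`master got ${m.type}`));

bus.post("slave-1", { type: "setData", payload: { title: "hello" } });
bus.post("master", { type: "tap", payload: { x: 10, y: 20 } });
```

The serialization step is also why data crossing between master and slave has a cost that grows with its size.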

(3) Relationship between NA components and the page

Mini programs feel better than H5 partly because they integrate some NA capabilities and NA UI. The main body of a mini program is still rendered with H5 technology, so let’s look at how NA elements are integrated into the UI.

There are two kinds of relationship between an NA element and H5: the patch relationship and the same-layer relationship.

Patch relationship: the NA element is not in the same layer as H5. It floats above H5, and no H5 element can be placed on top of it. Because it is in a different layer, scrolling must be kept in sync: when the WebView scrolls n pixels, the NA element must also scroll n pixels.

Same-layer relationship: the NA element is in the H5 layer, and H5 elements can be placed on top of it.

The view hierarchy trees for the patch and same-layer relationships are shown above.

Take the layers of the video component as an example. The video component is complex and has four layers. The first is the video layer, which holds the raw video picture; the second is the bullet-comment (barrage) layer; the third is the control layer (for example, the play and pause buttons); and the fourth is the slot layer, which holds the H5 elements that float above the video.

The H5 element (swan written by the developer is converted to H5) is marked with a special inline attribute. The browser kernel extracts the surface for that region and hands it to the NA layer. The mini program framework then passes this surface to the player, which draws directly onto it, achieving same-layer rendering. The bullet-comment, control, and slot layers above are all implemented in H5 in the swan-js layer. The slot layer can be thought of as a container: for example, if you write a video component, all of its children are placed in the slot.

The technical solution for same-layer NA components differs between Android and iOS. On iOS, for example, a component with overflow is supported natively, while Android requires support from the browser kernel.

4. Multi-host operation guarantee for mini programs

Baidu Smart Mini Programs are an open system that can run on multiple hosts. How do we keep the experience consistent across hosts?

After a host integrates our mini program framework, it must first run the CTS tests; only then can it obtain the list of mini programs for distribution.

Some capabilities are optional, and a host does not need to implement all of them. Examples include some AI capabilities and push capabilities.

What if a mini program uses optional capabilities?

First, there is a two-way selection mechanism between the mini program and the host: the mini program can choose which hosts it is distributed to, and the host can choose which mini programs it distributes. Second, the mini program should handle compatibility itself.
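As a rough illustration of the second point, compatibility handling in a mini program usually comes down to checking whether the current host offers an optional capability and falling back if it does not. The sketch below is hypothetical: `HostInfo`, `canUse`, the host names, and the capability names are invented for the example and are not the framework’s API.

```typescript
// Hypothetical description of a host; real hosts publish their own capability lists.
interface HostInfo {
  name: string;
  capabilities: string[];
}

// Returns true when the host declares the given optional capability.
function canUse(host: HostInfo, capability: string): boolean {
  return host.capabilities.indexOf(capability) !== -1;
}

// Pick a notification channel: use push where the host supports it,
// otherwise fall back to something every host can do.
function chooseReminderChannel(host: HostInfo): string {
  if (canUse(host, "push")) {
    return "push";
  }
  return "in-app-reminder";
}

const fullHost: HostInfo = { name: "host-a", capabilities: ["location", "push"] };
const basicHost: HostInfo = { name: "host-b", capabilities: ["location"] };

const channelOnFullHost = chooseReminderChannel(fullHost);
const channelOnBasicHost = chooseReminderChannel(basicHost);
```

The point of the design is that the same package keeps working on a host that skipped an optional capability, instead of failing at the call site.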

Extension mechanism

The part shown in red is the Extension mechanism, which a host can use when it has customization requirements. The host needs to do two things: write a set of interfaces in the JS layer, and implement the corresponding capabilities at the Porting Layer. If the host believes the capability is generally useful, it can submit a proposal. Once the proposal passes review, the Baidu Smart Mini Program team merges it into the open source framework.

5. Chapter summary

Second, performance optimization practice for the Baidu Smart Mini Program framework

First, take a look at the loading process of a mini program from the user’s perspective.

1. Loading stages of a Baidu Smart Mini Program

Take the Weibo mini program as an example, as shown in the picture above.

  • First, after the mini program is launched, a loading phase starts. The title bar at the top and the tab bar at the bottom (implemented natively by the framework) are displayed.

  • The second image is defined as the FP (First Paint) phase.

  • The third image has a search box, which is actually content from the mini program itself. It is first rendered through the initData interface, and we define this as the FCP (First Contentful Paint) phase.

  • The fourth image shows the mini program pulling live content from the network and then updating the interface, which we define as the FMP (First Meaningful Paint) phase.

  • In the last image, all elements have been loaded and displayed and the user can interact with any part of the page, which we define as the TTI (Time to Interactive) phase.

2. Performance of Baidu Smart Mini Programs

(1) Performance baseline

Baidu Smart Mini Programs established the FMP metric at the end of 2019. It is displayed on the developer platform under the name “screen time.”

We measured the 80th percentile online, which was 1.9 seconds. What is the 80th percentile? For example, if 100 requests come in and we sort them by duration, the duration of the 80th request is the 80th percentile.
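The percentile definition above can be written down directly. This is a minimal sketch of the “sort, then take the 80th of 100” rule (the nearest-rank method); the function name `percentile` and the sample data are made up for illustration, and the platform’s actual statistics pipeline is not described in the talk.

```typescript
// Nearest-rank percentile: sort ascending and take the value at
// rank ceil(p * n / 100), counting from 1.
function percentile(durationsMs: number[], p: number): number {
  if (durationsMs.length === 0) {
    throw new Error("percentile needs at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in (0, 100]");
  }
  const sorted = durationsMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// 100 made-up samples: 100 ms, 99 ms, ..., 1 ms (deliberately unsorted).
const samples: number[] = [];
for (let ms = 100; ms >= 1; ms--) {
  samples.push(ms);
}

const p80 = percentile(samples, 80); // the 80th of 100 sorted samples: 80
const p99 = percentile(samples, 99); // 99
```

Reporting a high percentile rather than the average means the figure describes what slower users actually experience, which is why the tail is tracked separately.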

(2) Performance history curve

The figure above shows the history of Baidu Smart Mini Program performance optimization in 2019. At the framework layer, FP was optimized from nearly 3 s to about 1.1 s. The goal of Baidu Smart Mini Programs is to get infinitely close to the NA experience.

3. Startup process

Next, what else can we optimize from a developer’s perspective?

Let’s take a look at the startup process. All the startup steps are simply listed in sequence here (in practice some steps run in parallel).

4. Performance optimization

Developers can tune performance mainly in two areas: the size of the mini program package and the business data.

Here are three points to illustrate what developers can do.

(1) Package volume optimization

It is recommended to keep the package size within 1 MB. Why?

According to our statistics, when the package has to be downloaded at launch, the download accounts for 60% of the total startup time. At the 80th percentile, a 1 MB package takes more than 1 s to download, so keep your package small. And that is only the 80th percentile: at the 90th and 99th percentiles the curve becomes very steep and the numbers get much worse.

Package size optimization mechanisms

There are two techniques: subpackaging (including independent subpackages) and resource compression.

  • Subpackages and independent subpackages

Subpackages: a mini program has many pages, but not all of them have high page views (PV). Pages that users rarely open can be moved into subpackages, leaving only the high-PV pages in the main package.

A subpackage cannot run on its own. For example, when a subpackage page is opened from search or feed distribution, the main package must be downloaded as well, but because this is rare it does not affect the vast majority of launches. In short, use subpackages to strip out non-critical pages.

What if the package is still large after the non-critical pages have been stripped out with subpackages?

  • Independent subpackages

An independent subpackage can run as soon as it is downloaded, without downloading the main package. The difference between the main package and an independent subpackage is that a mini program always has an entry point, and the independent package that contains this entry point is called the main package.

Use these two techniques to reduce the package size and keep it under 1 MB.

  • Resource compression

We analyzed some mini programs and found that some packages contained images intended for PC, which certainly increases package size. Our suggestions:

  • Host images on a server instead of bundling them in the package.

  • Compress images. For example, converting PNG to JPEG can reduce the size by 90% (when transparency is not needed).

  • Remove unused resources.

An oversized app.js also needs to be addressed with subpackages. What targets should we hit in the end?

  • Each package must be within 1 MB.

  • The number of files is limited to 200.
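The two targets above are easy to check automatically before uploading. The following is a minimal sketch of such a check, not a tool that ships with the developer tools: `PackageInfo`, `checkPackage`, the package names, and the sample file sizes are all hypothetical, and 1 MB is taken here as 1024 × 1024 bytes.

```typescript
// Hypothetical description of one package (main package or subpackage).
interface PackageInfo {
  name: string;
  fileSizesBytes: number[];
}

const MAX_PACKAGE_BYTES = 1024 * 1024; // "within 1 MB", assumed to mean 1 MiB
const MAX_FILES = 200;                 // "number of files limited to 200"

// Returns a list of human-readable problems; an empty list means the
// package meets both targets.
function checkPackage(pkg: PackageInfo): string[] {
  const problems: string[] = [];
  let totalBytes = 0;
  for (const size of pkg.fileSizesBytes) {
    totalBytes += size;
  }
  if (totalBytes > MAX_PACKAGE_BYTES) {
    problems.push(`${pkg.name}: ${totalBytes} bytes exceeds ${MAX_PACKAGE_BYTES}`);
  }
  if (pkg.fileSizesBytes.length > MAX_FILES) {
    problems.push(`${pkg.name}: ${pkg.fileSizesBytes.length} files exceeds ${MAX_FILES}`);
  }
  return problems;
}

// Made-up packages: the main package is small, the subpackage is too large.
const mainPackage: PackageInfo = { name: "main", fileSizesBytes: [300000, 200000] };
const bigSubpackage: PackageInfo = { name: "sub-detail", fileSizesBytes: [400000, 400000, 400000] };

const mainProblems = checkPackage(mainPackage);
const subProblems = checkPackage(bigSubpackage);
```

Running a check like this in the build step catches a package that has grown past the limit before users pay for it in download time.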

(2) Data pulling

The purpose of data pulling is to fill the interface with content quickly and reduce the time the user sees a white screen. Cache some data locally so that content can be shown even when the user is offline.

The figure above shows the business skeleton screen and the framework skeleton screen. Many mini programs borrow the H5 approach and use an H5-style progressively loaded skeleton screen. In a mini program this technique actually delays the display of the real content; our statistics show a delay of about 300 ms.

To solve the display delay caused by skeleton screens, we built a skeleton screen mechanism at the framework layer. Skeleton screens implemented with our mechanism have a much smaller impact on performance. The strategy is to have the slave load the skeleton screen while the master executes app.js, so that the two run in parallel.

If you write your own business skeleton screen, when is it displayed?

As you can see above, your own business skeleton screen is not rendered until App, Page, and waitNotify have been delivered to the render thread and Ready firstRender is reached, which is of course slow. Even though you are using a skeleton screen, there is still a long white screen between the user’s tap and the skeleton screen appearing. The framework skeleton screen solves this white screen problem. It still costs some time: it runs in parallel, but it competes for the phone’s resources.

So overall, from the perspective of the client or the framework, we do not recommend skeleton screens, but we do not oppose them either. If you want one, use the framework skeleton screen, which has the least impact.

For request optimization, I have two main points: send requests early, and send fewer of them.

  • “Early” has two parts: send early, and don’t block.

First, send early: if a request goes out late, the content is of course displayed late. We recommend issuing network requests in onLaunch, the first event we expose to mini programs. Many mini programs issue them in the Page onLoad instead, which is slower. At the 80th percentile online, the two points are about 200 ms to 300 ms apart. Second, don’t block: we often see mini programs that wait for user authorization or for location as soon as they start. Usually location only involves x/y coordinates, but once altitude is requested, GPS has to be turned on, which slows things down by 2 s to 3 s. Don’t request altitude unless you need it. Some mini programs also ask for authorization as soon as they open and show nothing until it is granted, which blocks everything. If possible, prompt the user only when authorization is actually needed; users are less likely to be annoyed, and startup is faster.

  • “Less” has two parts: defer non-critical requests, and pull only one screen of data.

After a mini program starts, it may issue dozens or even hundreds of network requests: besides its own business requests there are also tracking (logging) requests, which greatly affects network speed. The host’s underlying network library usually uses a thread pool, so too many requests will queue. The mini program framework does not know whether a request is core or non-core and can only queue them all. If everything fires at once at startup, business requests get blocked. In general, request the data needed to display the page first and defer non-critical requests. Second, pull only one screen of data and load the rest in segments.
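One way to apply the “less” advice is to route every request through a small gate that sends critical requests immediately and holds non-critical ones until the first screen is on display. The sketch below shows that idea only; `StartupRequests`, `markFirstScreenRendered`, and the URLs are hypothetical, and the `send` callback stands in for whatever request function the mini program really uses.

```typescript
interface PendingRequest {
  url: string;
  critical: boolean; // true for data the first screen needs
}

type Send = (url: string) => void;

class StartupRequests {
  private deferred: string[] = [];
  private firstScreenDone = false;
  private send: Send;

  constructor(send: Send) {
    this.send = send;
  }

  // Critical requests go out at once; others wait until the first screen
  // has rendered so they do not compete for the host's network threads.
  request(req: PendingRequest): void {
    if (req.critical || this.firstScreenDone) {
      this.send(req.url);
    } else {
      this.deferred.push(req.url);
    }
  }

  // Call once the first screen is visible: releases everything held back.
  markFirstScreenRendered(): void {
    this.firstScreenDone = true;
    const held = this.deferred;
    this.deferred = [];
    for (const url of held) {
      this.send(url);
    }
  }
}

// Record the order in which requests would actually be sent.
const sentOrder: string[] = [];
const startup = new StartupRequests((url) => sentOrder.push(url));

startup.request({ url: "/log/launch", critical: false });      // held
startup.request({ url: "/api/feed?page=1", critical: true });  // sent now
startup.request({ url: "/log/exposure", critical: false });    // held
startup.markFirstScreenRendered();                             // held ones sent
```

In a real mini program the critical request would be the one issued from onLaunch, and it would ask for a single screen of data.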

(3) Rendering

setData operations are expensive, so minimize both the amount of data and the call frequency.

As shown in the figure above, setData is a core API: when network data comes back, content appears on the interface only after setData drives rendering.

The figure above compares before and after optimization. Even 1 KB of data takes about 20 ms. If JS is executed in a WebView, a JS string first travels from the Renderer thread to the Browser thread, where it becomes a C-layer string, and then passes through the Java interface to NA, where it becomes a Java String. Reaching the slave requires the same path in reverse, so it cannot be fast. We did make some optimizations, such as passing a memory pointer through the kernel, but it is still expensive.

We found that some mini programs use setData improperly. Here is what to watch for when using setData.

  • Reduce the number of setData calls. Good case: merge multiple setData calls into one.

  • Reduce the amount of data passed to setData. Bad case: appending a new page of data to the previous pages’ data and passing the whole thing to setData.

  • When a variable changes, update only that variable, not the whole object.
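The first and third points can be combined in a small helper that collects individual field changes and hands them to setData in a single call. This is a sketch under assumptions: `SetDataBatcher` is a hypothetical helper, and the callback passed to it stands in for the page’s real setData so the example can run on its own.

```typescript
type Patch = Record<string, unknown>;

class SetDataBatcher {
  private pending: Patch = {};
  private dirty = false;
  private setData: (patch: Patch) => void;

  constructor(setData: (patch: Patch) => void) {
    this.setData = setData;
  }

  // Record changed fields only; later values for the same key win.
  queue(patch: Patch): void {
    for (const key of Object.keys(patch)) {
      this.pending[key] = patch[key];
    }
    this.dirty = true;
  }

  // Send everything collected so far in ONE setData call.
  flush(): void {
    if (!this.dirty) {
      return;
    }
    const patch = this.pending;
    this.pending = {};
    this.dirty = false;
    this.setData(patch);
  }
}

// Stand-in for the page's setData: just records each call.
const setDataCalls: Patch[] = [];
const batcher = new SetDataBatcher((patch) => {
  setDataCalls.push(patch);
});

batcher.queue({ title: "News" });
batcher.queue({ loading: false });
batcher.queue({ title: "Top news" }); // overrides the earlier title
batcher.flush();                      // one call carrying two fields
```

Three logical updates cross the master/slave boundary once, and only the two changed fields are serialized rather than the whole page data object.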

5. Performance self-check

Performance self-checks happen in three stages: development, testing, and after launch.

  • In the development stage, there are three tools for performance self-checks: the experience score in the developer tools, the performance panel (on the client, it shows the total startup time), and the instrumentation (dotting) system.

  • In the testing stage, there are two methods: screen recording and a high-speed camera. Both faithfully reflect the user experience.

  • After launch, there is the developer platform. For official technical support, go to the developer documentation and the community.

6. Chapter summary

  • Developers can optimize performance through package size, data requests, and rendering.

  • Package size: within 1 MB. Use subpackages, compress images, and remove unused resources.

  • Skeleton screen: if you want one, use the framework skeleton screen.

  • setData: reduce the call frequency and the amount of data.

The overall review is shown below.

———- END ———-


Follow Baidu’s official account [Baidu Architect], a public account focused on technological innovation.
