Original author: CAI Rui, senior network expert on the Baidu App technology team. Source: the Baidu App Technology WeChat official account.

Here, at last, is the long-awaited second installment of the "Baidu App Network Deep Optimization Series". In keeping with Baidu's spirit of being simple and reliable, we will skip the preamble and get straight to the substance. Fellow practitioners are welcome to join the discussion!

One, Foreword

In part one of the series, we saw that DNS is usually the first target of network optimization, after which the HTTP protocol becomes the focus. Typical HTTP optimizations include switching protocols, merging requests, and reducing payload size; strictly speaking, however, these do not fall within network optimization proper.

Connections are the foundation on which HTTP runs, so part two of our series turns to connection optimization. We hope it helps you in your own study and practice of networking.

Two, Background

Connection optimization needs to solve two core problems:

1. Establishing a connection takes a long time, which lengthens request latency and hurts the user experience.

2. In a volatile network environment, connection establishment may fail, which lowers the request success rate and also hurts the user experience.

Baidu App serves hundreds of millions of users, and every request must combine low latency with a high success rate. How can this be achieved at the protocol level? First, let's look at where the time goes when a connection is established.

Where connection setup time goes

The figure above shows the breakdown clearly:

1. A DNS query requires one round-trip time (RTT). Baidu App uses an HTTPDNS service, so most lookups hit its cache; when it degrades to system DNS, lookups usually hit a cache as well. DNS therefore has little effect on connection time, as online data confirms.

2. The TCP three-way handshake (SYN, SYN-ACK, ACK) spans 1.5 RTT, but the client's final ACK is sent together with its first data, so the handshake effectively costs 1 RTT.

3. Transport Layer Security (TLS) requires two RTTs for the full handshake and key exchange.

In total, the DNS, TCP, and TLS handshake phases take 4 RTTs before reaching the ApplicationData phase, where data is actually transmitted.
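The arithmetic above can be captured in a few lines. This is a minimal sketch: `SETUP_RTTS` and `setup_latency_ms` are hypothetical names, and the 50 ms RTT in the example is an arbitrary assumption rather than Baidu App data.

```python
# Illustrative sketch; SETUP_RTTS and setup_latency_ms are hypothetical names.
# Round trips spent in each phase before ApplicationData, per the breakdown above.
SETUP_RTTS = {
    "dns": 1,  # one query/response; often skipped on an HTTPDNS cache hit
    "tcp": 1,  # 1.5 RTT handshake, but the final ACK travels with the first data
    "tls": 2,  # full handshake plus key exchange
}


def setup_latency_ms(rtt_ms: float, dns_cached: bool = False) -> float:
    """Estimate the time before the client can send application data."""
    rtts = sum(SETUP_RTTS.values())
    if dns_cached:
        rtts -= SETUP_RTTS["dns"]
    return rtts * rtt_ms


if __name__ == "__main__":
    print(setup_latency_ms(50))                   # 4 RTTs -> 200
    print(setup_latency_ms(50, dns_cached=True))  # 3 RTTs -> 150
```

With a 50 ms RTT, the full path costs 200 ms before the first byte of application data, and an HTTPDNS cache hit brings it down to 150 ms, which leaves the TCP and TLS round trips as the remaining targets.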

It follows that reducing the RTTs spent in TCP and TLS as much as possible will greatly shorten connection setup time.

Three, What we can do about connection optimization

Baidu App's optimization goals fall into two categories: TLS connection optimization and TCP connection optimization.

TLS connection optimization

TLS connection optimization requires support on both the server and the client, and covers Session Resumption and False Start.

Session Resumption

Session Resumption means reusing a previously negotiated TLS session. The following figure illustrates how Session Resumption works.



How Session Resumption works

As the figure shows, the TLS key negotiation and exchange step is skipped. How is this achieved? There are two mechanisms: Session Identifier and Session Ticket.

1) Session Identifier

A Session Identifier resembles the familiar concept of a session. It is a Session ID generated during the TLS handshake. The server stores the session state under this ID, and the client stores the ID and sends it in subsequent ClientHello messages; if the server finds a match, it can complete an abbreviated handshake.
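To make the server-side state concrete, here is a minimal sketch of a Session ID cache. `SessionIdCache`, its 300-second default lifetime, and the injectable clock are illustrative assumptions; real TLS libraries manage this cache internally. Because the cache lives in one server's memory, a lookup on a different machine misses, which is the weakness discussed next.

```python
# Illustrative sketch; SessionIdCache is hypothetical, not a real TLS library API.
import os
import time


class SessionIdCache:
    """Server-side session cache keyed by TLS Session ID."""

    def __init__(self, lifetime_s: float = 300.0, clock=time.monotonic):
        self._lifetime_s = lifetime_s  # sessions are not permanent
        self._clock = clock
        self._entries = {}             # session_id -> (master_secret, created_at)

    def store(self, master_secret: bytes) -> bytes:
        """Called after a full handshake; returns the Session ID sent to the client."""
        session_id = os.urandom(32)    # TLS Session IDs are at most 32 bytes
        self._entries[session_id] = (master_secret, self._clock())
        return session_id

    def lookup(self, session_id: bytes):
        """Called on ClientHello; returns the cached secret, or None for a full handshake."""
        entry = self._entries.get(session_id)
        if entry is None:
            return None                # e.g. the client landed on a different machine
        secret, created_at = entry
        if self._clock() - created_at > self._lifetime_s:
            del self._entries[session_id]
            return None                # expired
        return secret
```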

2) Session Ticket

Session Identifier has drawbacks. For example, if a client's successive requests do not land on the same server machine, the server cannot find the matching session state, whereas a Session Ticket still works. A Session Ticket resembles the familiar concept of a cookie: the session state is encrypted with a key known only to the server side and stored on the client. The client sends the Session Ticket along with its ClientHello, and if the server decrypts it successfully, it can perform an abbreviated handshake.

Both Session Identifier and Session Ticket expire and are not permanent; see reference [4] for details on both. Baidu App's network protocol layer supports both mechanisms, which skip certificate download and key negotiation and exchange during the TLS handshake and save one RTT.
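The stateless nature of tickets can be sketched as follows. This is an illustration only, under stated assumptions: `TicketIssuer` is a hypothetical class, and where RFC 5077 recommends AES-128-CBC plus HMAC-SHA256, this sketch substitutes an HMAC-SHA256 keystream so that it runs on the Python standard library alone. It is not production cryptography. The point is that any server holding the same keys can redeem a ticket, so no per-session state has to be shared between machines.

```python
# Illustrative sketch: TicketIssuer is a hypothetical class, not a real TLS API,
# and the HMAC keystream below stands in for the AES-CBC that RFC 5077 recommends.
import hashlib
import hmac
import json
import os
import time

NONCE_LEN, TAG_LEN = 16, 32


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode, used as a stand-in stream cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class TicketIssuer:
    """Stateless tickets: servers share keys, not per-session state."""

    def __init__(self, enc_key: bytes, mac_key: bytes,
                 lifetime_s: float = 3600.0, clock=time.time):
        self.enc_key, self.mac_key = enc_key, mac_key  # same keys on every server
        self.lifetime_s, self.clock = lifetime_s, clock

    def issue(self, master_secret: bytes) -> bytes:
        """Encrypt-then-MAC the session state into an opaque ticket for the client."""
        state = json.dumps({"secret": master_secret.hex(),
                            "issued": self.clock()}).encode()
        nonce = os.urandom(NONCE_LEN)
        ciphertext = _xor(state, _keystream(self.enc_key, nonce, len(state)))
        tag = hmac.new(self.mac_key, nonce + ciphertext, hashlib.sha256).digest()
        return nonce + ciphertext + tag

    def redeem(self, ticket: bytes):
        """Return the master secret, or None to fall back to a full handshake."""
        if len(ticket) <= NONCE_LEN + TAG_LEN:
            return None
        nonce = ticket[:NONCE_LEN]
        ciphertext, tag = ticket[NONCE_LEN:-TAG_LEN], ticket[-TAG_LEN:]
        expected = hmac.new(self.mac_key, nonce + ciphertext, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # tampered, or issued under different keys
        state = json.loads(_xor(ciphertext, _keystream(self.enc_key, nonce, len(ciphertext))))
        if self.clock() - state["issued"] > self.lifetime_s:
            return None  # expired: tickets are not permanent
        return bytes.fromhex(state["secret"])
```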

False Start

False Start literally means a "jump start", like a runner leaving before the starting signal.



How False Start works

As the figure shows, the client sends its Change Cipher Spec and Finished messages and immediately begins transmitting application data, without waiting for the server's Finished message. Once the server completes the handshake, it returns the application response directly. Because the client's application data is sent before the handshake has actually completed, this is called a "false start".

The net result is a saving of 1 RTT. False Start requires two conditions: the application protocol must be negotiated through Application-Layer Protocol Negotiation (ALPN), and a forward-secret cipher suite must be used. Because False Start sends data before the handshake completes, forward secrecy is needed to preserve security. See reference [3] for the protocol details. Baidu App's network protocol layer supports False Start.
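The two preconditions can be expressed as a simple check. This sketch assumes IANA-style cipher suite names (for example `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`) and treats ECDHE and DHE as the forward-secret key exchanges; `can_false_start` is a hypothetical helper, and real TLS stacks apply additional checks of their own.

```python
# Illustrative sketch; can_false_start is a hypothetical helper, not a library API.
# Key-exchange families that provide forward secrecy (ephemeral Diffie-Hellman).
FORWARD_SECRET_KX = {"ECDHE", "DHE"}


def can_false_start(alpn_protocol, cipher_suite: str) -> bool:
    """Check the two False Start preconditions named above.

    alpn_protocol: protocol negotiated via ALPN (e.g. "h2"), or None if none was.
    cipher_suite: IANA-style name such as "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256".
    """
    if not alpn_protocol:
        return False  # condition 1: no ALPN negotiation, no False Start
    parts = cipher_suite.split("_")
    key_exchange = parts[1] if len(parts) > 1 else ""
    return key_exchange in FORWARD_SECRET_KX  # condition 2: forward secrecy
```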

As an aside, the TCP layer has a similar connection optimization called TCP Fast Open; interested readers can refer to reference [5].

The difference between Session Resumption and False Start

Both save one TLS RTT. Session Resumption still requires two RTTs on the first handshake, and only later handshakes that reuse the session drop to one RTT. False Start is a client-side behavior, so every handshake drops to 1 RTT.
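The comparison can be tabulated with a small helper. `tls_rtts` is hypothetical and assumes a TLS 1.2 full handshake of 2 RTTs. When both techniques are enabled the savings do not stack: an abbreviated (resumed) handshake already lets the client send data after 1 RTT.

```python
# Illustrative sketch; tls_rtts is a hypothetical helper.
FULL_HANDSHAKE_RTTS = 2  # TLS 1.2 full handshake


def tls_rtts(connection_index: int, resumption: bool, false_start: bool) -> int:
    """TLS round trips before application data on the n-th connection (0-based)."""
    # The first connection has no session to resume yet.
    resumed = resumption and connection_index > 0
    if resumed or false_start:
        return FULL_HANDSHAKE_RTTS - 1  # both remove the same round trip; no stacking
    return FULL_HANDSHAKE_RTTS


if __name__ == "__main__":
    for name, res, fs in [("resumption", True, False), ("false start", False, True)]:
        print(name, [tls_rtts(i, res, fs) for i in range(3)])
```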


TCP connection optimization

For TCP connection optimization, let's start with the connection pool, beginning with the types of connection pools.

1. Connection pools



Types of connection pools

The figure above shows the different types of connection pools, each corresponding to a familiar protocol. The low-level pools include the TCP connection pool (which manages connections for HTTP requests) and the WebSocket connection pool (which manages WebSocket connections).

The higher-level pools include the HTTP proxy connection pool (which manages connections for HTTP proxy requests), the SpdySession connection pool (which manages connections for SPDY and HTTP/2 requests), the SOCKS connection pool (which manages connections for SOCKS and SOCKS5 proxies), and the SSL connection pool (which manages connections for HTTPS requests).


Different types of connection pools are combined so that they can reuse one another's capabilities:

1) The SSL connection pool manages SSLSockets, but each SSLSocket relies on a TCPSocket provided by the TCP connection pool.

2) If the HTTP proxy connection pool uses HTTP, the TCP connection pool provides the TCPSocket; if it uses HTTPS, the SSL connection pool provides the SSLSocket.

3) The SpdySession connection pool relies on the SSL connection pool to provide SSLSockets. Note that although the HTTP/2 protocol does not mandate HTTPS, in practice it is deployed over HTTPS. Baidu App uses ALPN to negotiate HTTP/2.

4) Both the SOCKSSocket and SOCKS5Socket managed by the SOCKS connection pool rely on a TCPSocket provided by the TCP connection pool. Although SOCKS5 supports UDP, the Cronet network library does not currently implement it.

5) The WebSocket connection pool relies on a TCPSocket provided by the TCP connection pool.
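The layering rules above form a small dependency graph. In this sketch the pool names are hypothetical string keys chosen for readability, not Cronet's actual class names; walking the graph shows which sockets a given connection is stacked on.

```python
# Illustrative sketch; pool keys and socket_stack are hypothetical, not Cronet names.
# Which pool supplies the underlying socket for each pool, per rules 1) to 5).
UNDERLYING_POOL = {
    "tcp": None,
    "ssl": "tcp",               # SSLSocket wraps a TCPSocket
    "spdy_session": "ssl",      # SPDY/HTTP2 sessions run over an SSLSocket (ALPN)
    "socks": "tcp",             # SOCKS/SOCKS5 over a TCPSocket (no UDP in Cronet)
    "websocket": "tcp",
    "http_proxy/http": "tcp",   # proxy reached over plain HTTP
    "http_proxy/https": "ssl",  # proxy reached over HTTPS
}


def socket_stack(pool: str) -> list:
    """Return the chain of pools from `pool` down to the TCP pool."""
    chain = []
    while pool is not None:
        chain.append(pool)
        pool = UNDERLYING_POOL[pool]
    return chain
```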

TCP connection optimization is relatively complex. Baidu App has made targeted optimizations for specific scenarios, including pre-connection, connection rebuilding, standby connections, and composite connections.

2. Pre-connection



Pre-connection and connection rebuilding

Pre-connection means creating connections in advance. It targets the scenario in which, while the App is in use, a request can obtain a connection without paying any setup time. The following four questions and answers explain pre-connection.

Question 1: Does pre-connection establish connections in advance for all network requests?

Answer: No. The business side must evaluate its core businesses and set up pre-connections for the core domain names.

Question 2: Since pre-connections target specific domain names, how are they configured?

Answer: Each entry is configured as domain name plus number of connections, for example https://a.baidu.com|2, which configures two connections for the domain a.baidu.com. Note that under HTTP/1.x, network library implementations cap the number of connections per domain, typically at five to six, whereas under HTTP/2 the limit is one connection.
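A configuration entry like the one above might be parsed as follows. The `|` separator comes from the example; `parse_preconnect_entry` is hypothetical, and the per-protocol caps (6 for HTTP/1.1, 1 for HTTP/2) are assumptions picked from the ranges quoted above.

```python
# Illustrative sketch; parse_preconnect_entry and MAX_CONNECTIONS are hypothetical.
from urllib.parse import urlsplit

# Assumed per-domain caps, keyed by ALPN protocol ID.
MAX_CONNECTIONS = {"http/1.1": 6, "h2": 1}


def parse_preconnect_entry(entry: str, protocol: str = "http/1.1"):
    """Parse 'https://a.baidu.com|2' into (origin, clamped connection count)."""
    url, _, count = entry.partition("|")
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"bad pre-connect URL: {entry!r}")
    requested = int(count.strip() or 1)  # default to one connection
    cap = MAX_CONNECTIONS[protocol]
    return f"{parts.scheme}://{parts.hostname}", max(1, min(requested, cap))
```

Clamping at parse time keeps an over-eager configuration from exceeding the per-domain limit that the network library would enforce anyway.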

Question 3: How are pre-connections established?

Answer: During network library initialization, pre-connections are established according to the user's configuration after a 5 s delay. To protect the overall performance of the network library, the total number of pre-connections is capped at 20, mainly to limit the impact on cold-start performance.

Question 4: How are pre-connections maintained?

Answer: During network library initialization, besides establishing the pre-connections, the library creates a pre-connection timer that fires every 31 seconds and re-establishes connections according to the user's configuration. The 31-second value is derived from the minimum timeout configured on BFE (Baidu Front End) and BGW (Baidu Gate Way, Baidu's in-house layer-4 load balancing platform).
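Questions 3 and 4 amount to a budget plus a schedule. The sketch below uses the article's numbers (5 s initial delay, 20 connections in total, 31 s refresh); the helper names and the example.com domains are hypothetical, and it assumes the refresh timer counts from the first pre-connect rather than from initialization. It computes the plan without sleeping, which keeps the timing logic easy to test.

```python
# Illustrative sketch; plan_preconnects and preconnect_times are hypothetical helpers.
INITIAL_DELAY_S = 5      # pre-connects start 5 s after network-library init
MAX_TOTAL = 20           # global cap, to limit the cold-start impact
REFRESH_INTERVAL_S = 31  # derived from the BFE/BGW minimum timeout


def plan_preconnects(config):
    """Trim (origin, count) pairs in order so the total stays within MAX_TOTAL."""
    plan, budget = [], MAX_TOTAL
    for origin, count in config:
        take = min(count, budget)
        if take <= 0:
            break
        plan.append((origin, take))
        budget -= take
    return plan


def preconnect_times(horizon_s: int):
    """Seconds after init at which pre-connections are (re)established.

    Assumes the 31 s timer counts from the first pre-connect at 5 s.
    """
    return list(range(INITIAL_DELAY_S, horizon_s + 1, REFRESH_INTERVAL_S))


if __name__ == "__main__":
    config = [("https://a.example.com", 2), ("https://b.example.com", 15),
              ("https://c.example.com", 6)]
    print(plan_preconnects(config))  # c.example.com is trimmed to 3
    print(preconnect_times(70))      # [5, 36, 67]
```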

3. Connection rebuilding

Connection rebuilding means re-establishing connections. It addresses the scenario in which a change in the App's network status or IP address makes existing connections unusable. The following three questions and answers explain connection rebuilding.

Question 1: Does connection rebuilding apply to all connections in the connection pool?

Answer: Yes.

Question 2: What is the connection rebuilding process?

Answer: When the network status changes, the first step is to clear the idle sockets in the connection pool. What is an idle socket? A socket that has never been used is cleaned up after 60 seconds of idleness, and one that has been used is cleaned up after 90 seconds. The second step is to wait 200 ms for DNS resolution to complete and then rebuild the connections.

Question 3: Does connection rebuilding affect performance?

Answer: For performance reasons, the number of rebuilt connections is capped at 100.
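The two-step process can be sketched as below. The constants come from the article; `IdleSocket`, `is_expired`, and `on_network_change` are hypothetical names, and the sketch assumes that on a network change every idle socket is dropped (idle sockets are bound to the old network path), while the 60 s / 90 s timeouts govern routine cleanup.

```python
# Illustrative sketch; IdleSocket, is_expired and on_network_change are hypothetical.
from dataclasses import dataclass

UNUSED_IDLE_TIMEOUT_S = 60  # idle socket that has never carried a request
USED_IDLE_TIMEOUT_S = 90    # idle socket that has carried requests before
DNS_SETTLE_DELAY_S = 0.2    # wait for DNS after a network change
MAX_REBUILDS = 100          # cap on rebuilt connections


@dataclass
class IdleSocket:
    origin: str
    idle_since: float
    was_used: bool


def is_expired(sock: IdleSocket, now: float) -> bool:
    """Routine idle cleanup using the two timeouts quoted above."""
    limit = USED_IDLE_TIMEOUT_S if sock.was_used else UNUSED_IDLE_TIMEOUT_S
    return now - sock.idle_since >= limit


def on_network_change(idle_sockets: list, wanted_origins):
    """Step 1: drop every idle socket (assumption: all are bound to the old network).
    Step 2: return the DNS wait and the origins to reconnect, capped at MAX_REBUILDS."""
    idle_sockets.clear()
    return DNS_SETTLE_DELAY_S, list(wanted_origins)[:MAX_REBUILDS]
```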

4. Standby connection



Standby connections and composite connections

A standby connection is a backup connection. It addresses the scenario of still sending a request normally when no connection is available within a group. (What is a group? A group is the smallest unit of socket management; it contains active sockets, idle sockets, connection jobs, and waiting requests.) The following three questions and answers explain standby connections.
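The group described above can be pictured as a small record. This `SocketGroup` class is a hypothetical illustration, not Cronet's data structure, and it assumes one group per destination (for example, host and port). When `request_socket` finds no idle socket, the request waits, which is exactly the situation the standby connection is designed for.

```python
# Illustrative sketch; SocketGroup is hypothetical, not Cronet's data structure.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SocketGroup:
    """Smallest unit of socket management, assumed to be one per destination."""
    active: list = field(default_factory=list)        # sockets carrying requests
    idle: list = field(default_factory=list)          # connected sockets ready for reuse
    connect_jobs: list = field(default_factory=list)  # connections still being set up
    pending: deque = field(default_factory=deque)     # requests waiting for a socket

    def request_socket(self, request):
        """Hand out an idle socket, or queue the request if none is available."""
        if self.idle:
            sock = self.idle.pop()
            self.active.append(sock)
            return sock
        self.pending.append(request)
        return None
```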

Question 1: Do standby connections apply to all requests?

Answer: Yes.

Question 2: What is the standby connection process?

Answer: When a request arrives and no connection is available in the connection pool, a 250 ms timer is started; when it fires, a standby connection is opened to race the primary connection. If the primary connection fails due to network jitter or poor network conditions, the request is sent directly over the standby connection. If the primary connection succeeds, the standby connection is cancelled.
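The race can be sketched with `asyncio`. `connect_with_standby` and the `connect` coroutine it calls are hypothetical; the only number taken from the article is the 250 ms timer. The sketch assumes the standby is opened only if the primary is still pending when the timer fires, and that a primary failure before then is simply reported.

```python
# Illustrative sketch; connect_with_standby and connect() are hypothetical.
import asyncio

STANDBY_DELAY_S = 0.25  # 250 ms standby timer, per the article


async def connect_with_standby(connect, delay=STANDBY_DELAY_S):
    """Race a primary connect attempt against a standby started after `delay`.

    `connect(label)` is a hypothetical coroutine returning a connection or
    raising OSError.
    """
    primary = asyncio.create_task(connect("primary"))
    attempts = {primary}
    done, _ = await asyncio.wait(attempts, timeout=delay)
    if not done:
        # Primary still pending when the timer fires: open the standby.
        attempts.add(asyncio.create_task(connect("standby")))
    last_exc = None
    pending = attempts
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.exception() is None:
                for other in pending:
                    other.cancel()  # the winner is used; cancel the other attempt
                return task.result()
            last_exc = task.exception()  # this attempt failed; keep waiting on the other
    raise last_exc
```

On a healthy network the primary usually finishes before the timer fires and no standby is ever opened; the extra attempt is paid only when the primary is slow.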

Question 3: What is the purpose of the standby connection?

Answer: When the connection pool has no available connection, a new one must be created. Adding a standby connection alongside the primary one greatly improves the success rate of connection creation and thus the user experience.

5. Composite connection

Composite connection means making multiple connection attempts. It addresses connection selection when a domain name resolves to multiple IP addresses. The following three questions and answers explain composite connections.

Question 1: Do composite connections apply to all requests?

Answer: Yes. Composite connections can be enabled globally, but Baidu App has not enabled them at this stage.

Question 2: What is the composite connection process?

Answer: A DNS query for a domain name usually returns multiple IP addresses. Take a query that returns two IP addresses as an example:

1) If the results include an IPv6 address, the IPv6 address is tried first, following the Happy Eyeballs mechanism (see the introduction to Happy Eyeballs in part one of this series).

2) The two IP addresses are then tried in sequence.

3) If the connection to the first IP succeeds, it is used, and all other connection attempts, including the one to the second IP, are stopped.

4) If the connection to the first IP fails, the connection to the second IP starts immediately.

5) If the connection to the first IP is still pending, a timer is started, and after a default delay of 2 s the connection to the second IP is initiated. With more than two IP addresses, the same procedure repeats for each subsequent address. Note that the delay is set differently for different network types, which improves the experience.
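Steps 1 to 5 describe a staggered race. The sketch below is a simplified Happy Eyeballs-style loop written with `asyncio`; `connect_composite` and the `connect` coroutine are hypothetical, IPv6 is detected crudely by a colon in the address literal, and the 2 s default stagger is the article's figure.

```python
# Illustrative sketch; connect_composite and connect() are hypothetical.
import asyncio

DEFAULT_STAGGER_S = 2.0  # default delay before trying the next address


async def connect_composite(addrs, connect, stagger=DEFAULT_STAGGER_S):
    """Try addresses in order, IPv6 first; return the first successful connection."""
    if not addrs:
        raise ValueError("no addresses to try")
    # Stable sort: IPv6 literals first, otherwise keep the DNS order.
    queue = sorted(addrs, key=lambda a: ":" not in a)
    pending, last_exc = set(), None
    while queue or pending:
        if queue:
            pending.add(asyncio.create_task(connect(queue.pop(0))))
        # Wait for a result; if more addresses remain, wait at most `stagger` seconds.
        done, pending = await asyncio.wait(
            pending, timeout=stagger if queue else None,
            return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if task.exception() is None:
                for other in pending:
                    other.cancel()  # success: stop all remaining attempts
                return task.result()
            last_exc = task.exception()  # failure: the loop starts the next address now
    raise last_exc
```

Because a failure triggers the next address immediately while a slow attempt only does so after the stagger, a dead IP costs almost nothing and a slow IP costs at most the stagger delay. The parallel attempts are also why the article warns about extra server-side load.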

Question 3: What is the purpose of composite connections?

Answer: Composite connections provide a mechanism for selecting the best IP address, but they also increase server-side load, so their use requires a comprehensive evaluation.

Four, Best practices for connection optimization

For historical reasons, Baidu App's client network architecture has not yet been unified across platforms, but we are working toward that goal.

Our central idea is to build around the system network library's API: the upper layer provides a network facade that is convenient for callers, and the bottom layer uses system mechanisms to inject Cronet (the Chromium net module) into the system network library in an AOP style. This unifies the network architecture on both platforms and enables capability reuse.

The following sections highlight where connection optimization sits in the Android and iOS network architectures and how it is put into practice.

1. Position and practice of connection optimization in the Android network architecture



Position of connection optimization in the Android network architecture

Baidu App's Android network traffic currently runs on OkHttp, with a network facade on top that encapsulates internal implementation details and exposes a friendly API. We are currently refactoring toward Android's standard network interface, HttpURLConnection, whose underlying implementation is the system-provided OkHttp.

On the customization side, the URL Stream Protocol mechanism is used to hand HttpURLConnection's underlying network protocol stack over to Cronet, which is then used by the various business and base modules. All of the connection optimizations described above are implemented in the Cronet network library.

2. Position and practice of connection optimization in the iOS network architecture



Position of connection optimization in the iOS network architecture

At present, all of Baidu App's iOS network traffic runs on Cronet. We use the iOS URL Loading System mechanism to inject the Cronet stack into URLSession, so we can use the URLSession API directly for network operations and the system is easier to maintain. A network facade is encapsulated in the upper layer for use by the various business and base modules.

Within Cronet, we have implemented pre-connection (mainly establishing and maintaining connections for several core Baidu App domain names), connection rebuilding (for all requests), standby connections (for all requests), composite connections (not yet enabled on iOS), Session Resumption (for all requests), and False Start (for all requests).

Five, Benefits

The benefits of connection optimization show up mainly in network latency and network success rate, and both must be evaluated in the context of a specific business. Take Baidu App's Feed refresh as an example.

Network latency for Feed refresh text requests decreased by 16%, and for Feed refresh image requests by 12%, a clearly significant gain.

In terms of success rate, Feed refresh text requests improved by 0.29% and Feed refresh image requests by 0.23%, which are also very good gains.

Six, Conclusion

Connection optimization is an ongoing topic: there is no best, only better. The experience and practices of Baidu App described above may not be perfect, but we will keep optimizing and continuously improve Baidu App's network performance.

The optimizations above were completed jointly by the Baidu App team, the kernel team, and the OP team. Thank you for reading; we hope this helps. Next up in the series is part three of the Baidu App Network Deep Optimization Series, on weak network optimization. Stay tuned.

Seven, References

1. https://chromium.googlesource.com/chromium/src/+/HEAD/docs/android_build_instructions.md

2. https://chromium.googlesource.com/chromium/src/+/HEAD/docs/ios/build_instructions.md

3. https://tools.ietf.org/html/rfc7918 False Start

4. https://tools.ietf.org/html/rfc5077 Session Resumption

5. https://tools.ietf.org/html/rfc7413 TCP Fast Open

Read more:

Baidu App Network Deep Optimization Series, part one: DNS optimization

This article was revised on February 15, 2014.
