Articles in the “100 Million Level Android Architecture” column:

“100 million Level Android Architecture” column with talk

Network Connectivity and Acceleration in the Android Architecture

Long Connection Technology for Android Architecture

“High Availability mobile Connectivity with Android Architecture”

Evolution of Network Security in the Android Architecture

High Performance Mobile Logging System based on Android Architecture

Second Mobile Configuration Center for Android Architecture

Main text

In the previous article, “The Android Architecture of the Web (Part 1),” we covered OkHttp, network acceleration schemes such as HttpDNS, and data compression and serialization. In this article, we look at long connection technology from several angles, drawing on mainstream industry solutions such as Tencent’s Mars framework and Meituan’s Shark system.

This article will cover the following technical points:

  • How long connections differ from HTTP short connections and Keep-Alive connections
  • Why you need a long connection
  • When a long connection gets disconnected
  • How to establish a stable long connection
  • The Mars intelligent heartbeat mechanism
  • Long connection data protocol and encryption
  • Building the long connection channel and disaster recovery

In addition to ordinary HTTP short connections, large apps almost always build a complete TCP long connection channel of their own. Let’s take a look at production data from Meituan’s Shark long connection:

Photo source: Mobile Network Optimization Practice of Meituan-Dianping

The two charts above compare the success rate and network latency of long and short connections, the two most important metrics for a network module. On both metrics, the long connection is significantly better than the short connection.

WeChat is also known for sending and receiving messages instantly, thanks to the stable, highly available long connection system behind it. In fact, besides messaging, WeChat’s other small-payload traffic also travels over long connections.

Let’s take a look at some of the core technical points of long connections.

I. Telling long connections apart from HTTP short connections and Keep-Alive connections

These kinds of connection are easy to confuse, so here are the differences.

Long connection vs HTTP short connection

These correspond to long and short connections at the TCP layer, respectively.

As we all know, TCP establishes a connection with the server through a three-way handshake and then transfers data. A short connection actively closes the connection once the data transfer finishes, while a long connection keeps the connection open so that subsequent reads and writes reuse it.
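The difference can be observed on a loopback socket. The following is a minimal sketch, not code from Mars or Shark: the class and method names are made up for illustration, and the "server" is a toy echo loop that simply counts how many TCP connections it accepts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

class LongVsShortDemo {
    /** Sends each message on the socket and returns how many came back unchanged. */
    static int exchange(Socket socket, String... messages) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
        PrintWriter out = new PrintWriter(socket.getOutputStream(), true); // auto-flush on println
        int echoed = 0;
        for (String m : messages) {
            out.println(m);
            if (m.equals(in.readLine())) echoed++;
        }
        return echoed;
    }

    /** Toy echo server: counts accepted connections and echoes lines until the client closes. */
    static void serve(ServerSocket server, AtomicInteger accepted) {
        while (true) {
            try (Socket s = server.accept()) {
                accepted.incrementAndGet();
                BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                for (String line; (line = in.readLine()) != null; ) out.println(line);
            } catch (IOException e) {
                if (server.isClosed()) return; // server was shut down, stop accepting
            }
        }
    }

    /** Returns how many TCP connections the server saw while handling the requests. */
    static int connectionsUsed(boolean reuseConnection, String... requests) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        AtomicInteger accepted = new AtomicInteger();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Thread t = new Thread(() -> serve(server, accepted));
            t.setDaemon(true);
            t.start();
            if (reuseConnection) {
                // Long connection: one handshake, every request reuses the same socket.
                try (Socket s = new Socket(loopback, server.getLocalPort())) {
                    exchange(s, requests);
                }
            } else {
                // Short connection: connect, send one request, close, and repeat.
                for (String r : requests) {
                    try (Socket s = new Socket(loopback, server.getLocalPort())) {
                        exchange(s, r);
                    }
                }
            }
            return accepted.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With three requests, connectionsUsed(true, "a", "b", "c") involves a single accepted connection, while connectionsUsed(false, "a", "b", "c") pays for three separate handshakes.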

Long connection vs HTTP Keep-Alive

Connection multiplexing was mentioned in the previous article. With Keep-Alive in HTTP/1.1 (sent as the Connection: Keep-Alive header; persistent connections are the default there), an HTTP connection is not closed immediately after a request. Since Keep-Alive literally means keeping the connection alive, is it the same thing as a long connection?

It is not. A long connection here means a TCP long connection built directly on the TCP protocol, whereas the Keep-Alive above is part of the HTTP protocol. The two do not even belong to the same protocol.

A Keep-Alive HTTP connection is also called a persistent connection, and it is not the same thing as a long connection. If you are interested, see the article “TCP Advanced”.

TCP Keep-Alive vs HTTP Keep-Alive

TCP also has a Keep-Alive mechanism. How does it differ from HTTP’s Keep-Alive?

The two serve different purposes. Without Keep-Alive, an HTTP connection is closed once a request completes. By sending Keep-Alive in the request, the client tells the server not to close the connection immediately because it wants to reuse it. A TCP connection, on the other hand, never closes on its own, which raises a question: how do we know whether the connection has been broken by some external cause? When keep-alive is enabled on a socket, TCP sends a keep-alive probe after the connection has been idle for a while (two hours by default) to check whether the peer still responds. It works like a heartbeat packet, but the interval is far too long for it to serve as a real heartbeat.
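At the socket level, TCP keep-alive is an option that is switched on per socket. A minimal sketch using the standard java.net.Socket API (the class and method names are only illustrative; how long the OS waits before probing is an operating system setting and is not controlled here):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

class TcpKeepAliveDemo {
    /** Enables SO_KEEPALIVE on a fresh socket and reports whether the option is now set. */
    static boolean keepAliveEnabledOnNewSocket() {
        try (Socket socket = new Socket()) { // unconnected; options may be set before connect()
            socket.setKeepAlive(true);       // ask the OS to probe the peer when the connection idles
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the probe only fires after a long idle period, apps that need fast failure detection still send their own heartbeat packets at the application layer.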

II. Why do you need a long connection?

So what are the benefits of long connections compared with HTTP short connections?

1. Requests for different domain names can reuse the same long connection channel

With HTTP, each domain name needs its own DNS lookup and its own HTTP connection. The HTTP connection pool described in the previous article cannot be reused across domains, so connections have to be re-established for each one. All of this is overhead. Over a long connection channel, the domain name is just one field in the request, so requests for any domain can share the same channel.

2. No dependence on DNS, so no DNS latency or hijacking problems

We mentioned HttpDNS earlier. It is better than system DNS, but it still performs a DNS lookup. A long connection connects directly to an IP address, so there is no DNS-related cost or delay.

3. With a large number of network requests, latency drops significantly and bandwidth is saved

Large apps issue many network requests in quick succession, which can cause many HTTP connections to be torn down and re-established, wasting time and bandwidth. Over a long connection channel, that setup time disappears and binary data can be sent directly, which also saves the bandwidth that HTTP spends on per-request overhead such as headers.

4. The server can actively push data to the client

Take the WeChat message receiving scenario mentioned above. If the client had to poll, it would issue requests frequently, loading the server heavily and wasting bandwidth. Over a long connection, the server can push messages to the client as soon as they arrive, which gives the best real-time behavior and saves traffic.

III. When will the long connection be disconnected?

Normally, a long connection does not break. If we establish a connection between two sockets and nothing changes on the network, the two sockets can keep sending data to each other indefinitely without disconnecting.

On mobile networks, however, conditions are complex and changeable. A cut line or a server outage, for example, will interrupt the long connection. Besides such line failures, we need to pay attention to the following causes of disconnection:

1. The process holding the long connection is killed

This is easy to understand. Once our app moves to the background, the system can kill it at any time, and the long connection is lost with it.

2. Network switching

For example, when the mobile network drops or the phone switches between Wi-Fi and cellular data, the phone’s IP address changes. A TCP connection is identified by IP address and port, so once the IP changes, the TCP connection becomes invalid and the long connection is broken.

3. NAT timeout caused by system sleep

Here is a brief explanation of NAT for readers who are not familiar with it. When the phone joins a network, the gateway assigns it an IP address, which is actually an internal address. With only this address, we are not really on the public network and cannot reach the server. To get there, the carrier must map our internal IP address to a public IP address so that traffic can flow between us and the server. NAT refers to this mapping process.

In other words, the carrier gives each device a public-facing mapping, which works like a pass for communication. As more devices join the network, the load on the gateway grows, so the carrier reclaims the mappings of devices that have been inactive for a while. When such a device needs the network again, the carrier simply allocates a new mapping.

This may seem harmless, but if our app is idle for a period and a NAT timeout occurs, our public mapping becomes invalid and the long connection fails with it.

4. DHCP lease expiry

If the DHCP lease expires without being renewed, the IP address also becomes invalid.

To sum up, a long connection is not dropped under normal conditions, but once the phone’s IP address becomes invalid, the connection has to be re-established.

IV. How to establish a stable long connection?

We have listed several reasons a long connection can be dropped. How do we optimize for them, so that the long connection stays up as much as possible and, when it does drop, is re-established as quickly as possible?

1. Mars runs the long connection in an independent process

To reduce the chance of being killed, the Mars demo code isolates the long connection logic in a separate process. This process only handles network interaction and uses few resources such as memory, which lowers the probability of the system reclaiming it.

Image from “Android version of WeChat background combat sharing (process)”

2. Restarting the long connection process

Being killed is unavoidable, but AlarmReceiver, ConnectReceiver, and BootReceiver can be used to wake the process up again promptly.

Of course, keeping a process alive is a big topic, and doing it improperly hurts the overall system experience, so I won’t go into it here.

3. Heartbeat mechanism

Many people assume heartbeat packets exist only to report our status to the server periodically, but that is not their only purpose.

As mentioned above, if the app is inactive for a period of time, a NAT timeout occurs: the carrier deletes our public IP mapping and the TCP long connection breaks. A heartbeat mechanism keeps the connection active and prevents the NAT timeout.

4. Disconnect and reconnect

In production, long connections are likely to be dropped by network switches and similar events. We need to detect that the long connection is down as early as possible and reconnect immediately. Common approaches are:

  • Register a Receiver to monitor network status and reconnect immediately when the network switches.
  • Monitor the server’s heartbeat responses. If five consecutive heartbeats go unanswered, treat the long connection as dead.
  • Set a timeout for heartbeat responses. If no response arrives within the timeout, reconnect. This approach consumes more power.
  • Wait for the socket to throw an I/O exception, although detection this way takes about 15 seconds.
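The signals in the list above can be funneled into one small decision helper. This is a minimal sketch under stated assumptions: LinkWatcher and its Event values are hypothetical names, and the real wiring (an Android network Receiver, heartbeat timers, socket callbacks) is left out.

```java
class LinkWatcher {
    /** Signals that the surrounding app code would feed in (hypothetical names). */
    enum Event { HEARTBEAT_ACK, HEARTBEAT_MISSED, NETWORK_CHANGED, IO_ERROR }

    private final int maxMisses; // e.g. 5, following the "five consecutive times" rule
    private int misses;

    LinkWatcher(int maxMisses) {
        this.maxMisses = maxMisses;
    }

    /** Returns true when the long connection should be torn down and re-established. */
    boolean shouldReconnect(Event event) {
        switch (event) {
            case HEARTBEAT_ACK:
                misses = 0; // the link is alive, so the miss streak is forgotten
                return false;
            case HEARTBEAT_MISSED:
                if (++misses >= maxMisses) {
                    misses = 0;
                    return true; // too many unanswered heartbeats in a row
                }
                return false;
            default:
                misses = 0;
                return true; // a network switch or socket error invalidates the link at once
        }
    }
}
```

A miss streak that is interrupted by a single acknowledged heartbeat starts over, so only consecutive misses trigger a reconnect.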

V. Mars Intelligent heartbeat mechanism

1. Fixed heartbeat mechanism

As mentioned above, the heartbeat mainly exists to prevent NAT timeouts from invalidating the public IP mapping. The common practice is therefore to send a heartbeat packet before the NAT mapping expires. In other words, the client sends heartbeats at an interval slightly shorter than the NAT timeout.

Early versions of WeChat sent a heartbeat every 4.5 minutes, which worked well.

2. Mars Intelligent heartbeat strategy

The goal is to adaptively find, for each network type and with as little impact on message timeliness as possible, the largest heartbeat interval that still keeps the signaling TCP connection alive. This reduces the radio channel resources consumed by heartbeats from WeChat for Android, lowers the load on the heartbeat server, and cuts part of the power consumption caused by heartbeats.

Adaptive heartbeat

Building on the fixed heartbeat, WeChat therefore developed a dynamic heartbeat scheme that probes for the maximum NAT timeout and then chooses a suitable interval for sending heartbeat packets. The general idea is as follows.

The longer the heartbeat interval, the smaller the load and power consumption. So WeChat uses an adaptive heartbeat: once a valid heartbeat interval has been found, it deliberately increases the interval and tests whether the longer interval still works. If it works, the interval keeps growing until a test fails. When a test fails, a value slightly shorter than the last successful interval is used as the heartbeat interval.

How do we decide that a heartbeat interval is valid? WeChat sends a fixed short heartbeat until three consecutive short heartbeats succeed, at which point the interval is considered valid.

The probing sequence is roughly: a 60-second short heartbeat sent three times in a row, then probes at 90, 120, 150, 180, 210, 240, and 270 seconds.
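The probing idea can be sketched as a small state machine. This is not the Mars smart_heartbeat implementation. It is a simplified illustration that assumes every candidate interval must succeed three times in a row before stepping up, and that "slightly shorter" means subtracting a configurable safety margin (a hypothetical parameter). The 60 s start, 30 s step, and 270 s ceiling are taken from the sequence above.

```java
class AdaptiveHeartbeat {
    static final int SHORT_INTERVAL_S = 60;  // short heartbeat from the article
    static final int MAX_INTERVAL_S = 270;   // last probe value listed in the article
    static final int STEP_S = 30;            // spacing of the listed probe values
    static final int REQUIRED_SUCCESSES = 3; // consecutive successes needed per interval (assumption)

    private final int safetyMarginS; // assumption: how far below the last good interval to settle
    private int currentS = SHORT_INTERVAL_S;
    private int lastGoodS = SHORT_INTERVAL_S;
    private int streak = 0;
    private boolean settled = false;

    AdaptiveHeartbeat(int safetyMarginS) {
        this.safetyMarginS = safetyMarginS;
    }

    /** Interval to use for the next heartbeat, in seconds. */
    int intervalSeconds() {
        return currentS;
    }

    /** True once probing has finished and the interval no longer changes. */
    boolean isSettled() {
        return settled;
    }

    /** Feed in whether the heartbeat sent at intervalSeconds() was answered. */
    void onHeartbeatResult(boolean success) {
        if (settled) return;
        if (!success) {
            // The probe failed: settle slightly below the last interval that worked.
            currentS = Math.max(SHORT_INTERVAL_S, lastGoodS - safetyMarginS);
            settled = true;
            return;
        }
        if (++streak < REQUIRED_SUCCESSES) return; // not enough evidence for this interval yet
        streak = 0;
        lastGoodS = currentS;
        if (currentS >= MAX_INTERVAL_S) settled = true; // reached the ceiling
        else currentS += STEP_S;                        // try a longer interval next
    }
}
```

For example, if the NAT mapping silently expires after 200 s, heartbeats at 60 through 180 s succeed, the 210 s probe fails, and with a 10 s margin the prober settles on 170 s.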

Foreground and background strategy

An app’s requirements for the long connection also differ between foreground and background. When WeChat is active in the foreground, it uses the fixed heartbeat. When it is in the foreground with the screen off, or in the active background state (within 10 minutes of entering the background), it first sends the minimum heartbeat several times to keep the long connection alive and then switches to the adaptive heartbeat. In the stable background state (more than 10 minutes in the background), it uses the maximum heartbeat found by adaptive probing as a fixed value.

If a heartbeat fails along the way, the client reconnects. It also sets the heartbeat interval to the interval in use before the disconnection minus 20 s and runs adaptive probing again. If probing fails five times in a row, it restarts probing from the initial heartbeat of 180 s.
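These rules can be written down as a few pure functions. The following is only a sketch of the policy as described in this section. The class, enum, and parameter names are hypothetical, and the numbers (10 minutes, 20 s, five failures, 180 s) are the ones quoted above.

```java
class HeartbeatPolicy {
    enum AppState { FOREGROUND_ACTIVE, BACKGROUND_ACTIVE, BACKGROUND_STABLE }

    static final long ACTIVE_WINDOW_MS = 10 * 60 * 1000L; // "within 10 minutes" of entering background
    static final int BACKOFF_S = 20;                      // subtracted after a failed heartbeat
    static final int RESTART_INTERVAL_S = 180;            // restart value after repeated failures
    static final int MAX_PROBE_FAILURES = 5;

    /** Maps what the app is doing to one of the three states described in the article. */
    static AppState classify(boolean inForeground, boolean screenOn, long msInBackground) {
        if (inForeground && screenOn) return AppState.FOREGROUND_ACTIVE;
        // Foreground with the screen off is treated like the active background state.
        if (inForeground || msInBackground <= ACTIVE_WINDOW_MS) return AppState.BACKGROUND_ACTIVE;
        return AppState.BACKGROUND_STABLE;
    }

    /** Picks the heartbeat interval (seconds) for a state from the three candidate values. */
    static int intervalFor(AppState state, int fixedS, int probingS, int adaptiveMaxS) {
        switch (state) {
            case FOREGROUND_ACTIVE: return fixedS;       // fixed heartbeat
            case BACKGROUND_ACTIVE: return probingS;     // minimum heartbeat, then adaptive probing
            default:                return adaptiveMaxS; // largest interval found by probing
        }
    }

    /** Interval to resume probing from after a heartbeat failure forced a reconnect. */
    static int intervalAfterFailure(int intervalBeforeDisconnectS, int consecutiveProbeFailures) {
        if (consecutiveProbeFailures >= MAX_PROBE_FAILURES) return RESTART_INTERVAL_S;
        return intervalBeforeDisconnectS - BACKOFF_S;
    }
}
```

Keeping the policy free of timers and sockets makes it easy to unit test separately from the connection code.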

Alarm alignment strategy

To reduce the power drain caused by frequent wake-ups, Android provides an Alarm alignment mechanism: Alarms that fall within a certain window are merged into a single wake-up, which reduces the number of wake-ups and extends standby time.

Heartbeats, however, rely on a timer firing to trigger each send. The heartbeat interval in Mars is therefore chosen with the Alarm alignment time in mind, to reduce power drain.

Other

If you are interested in WeChat’s heartbeat strategy, see the references at the end of this article. You can also refer to the smart_heartbeat code.

VI. Long connection data protocol and encryption

A long connection carries binary data, so the front end and back end can agree on the meaning of every byte. Generic protocols and serialization schemes such as SMTP and ProtoBuf can also be considered.

Reference article: A Super lean Long Connection messaging protocol based on TCP/WebSockets.
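To make "agreeing on the meaning of every byte" concrete, here is a minimal sketch of a length-prefixed frame. The layout is invented for illustration and is not the Mars or Shark wire format: a 4-byte body length, a 2-byte host length, the host name in UTF-8, then the payload. Carrying the host as a field is what lets requests for different domains share one channel.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class Frame {
    final String host;    // target domain, carried as an ordinary field
    final byte[] payload; // opaque request body, e.g. serialized with ProtoBuf

    Frame(String host, byte[] payload) {
        this.host = host;
        this.payload = payload;
    }

    /** Layout (big-endian): [int bodyLength][short hostLength][host bytes][payload bytes]. */
    byte[] encode() {
        byte[] h = host.getBytes(StandardCharsets.UTF_8);
        if (h.length > 0xFFFF) throw new IllegalArgumentException("host too long");
        ByteBuffer buf = ByteBuffer.allocate(4 + 2 + h.length + payload.length);
        buf.putInt(2 + h.length + payload.length); // length of everything after this field
        buf.putShort((short) h.length);
        buf.put(h);
        buf.put(payload);
        return buf.array();
    }

    /** Parses exactly one complete frame; rejects buffers whose length prefix does not match. */
    static Frame decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int bodyLength = buf.getInt();
        if (bodyLength != buf.remaining()) {
            throw new IllegalArgumentException("length prefix does not match frame size");
        }
        int hostLength = buf.getShort() & 0xFFFF; // read the length as unsigned
        if (hostLength > buf.remaining()) {
            throw new IllegalArgumentException("host length exceeds frame size");
        }
        byte[] h = new byte[hostLength];
        buf.get(h);
        byte[] p = new byte[buf.remaining()];
        buf.get(p);
        return new Frame(new String(h, StandardCharsets.UTF_8), p);
    }
}
```

The length prefix is what allows the receiver to split a continuous TCP byte stream back into individual messages.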

For data encryption, the asymmetric algorithm RSA and the symmetric algorithm AES can be combined to encrypt data in transit.
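One common way to combine the two is to encrypt the data with a random AES key and encrypt only that small key with the server’s RSA public key. The sketch below uses the standard javax.crypto API. The class name, the Envelope type, and the specific modes (AES-GCM, RSA-OAEP) are choices made for this illustration and are not taken from Mars or Shark.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class HybridCrypto {
    static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    static final String AES = "AES/GCM/NoPadding";

    /** What goes on the wire: RSA-wrapped AES key, GCM nonce, and AES ciphertext. */
    static final class Envelope {
        final byte[] wrappedKey, iv, ciphertext;
        Envelope(byte[] wrappedKey, byte[] iv, byte[] ciphertext) {
            this.wrappedKey = wrappedKey;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    /** Stand-in for the server key pair; a real client would ship only the public key. */
    static KeyPair newServerKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static Envelope encrypt(PublicKey serverKey, byte[] plaintext) {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            SecretKey aesKey = keyGen.generateKey(); // fresh symmetric key for the bulk data
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);        // GCM nonce, must not repeat for a given key
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = aes.doFinal(plaintext);
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.ENCRYPT_MODE, serverKey); // RSA protects only the small AES key
            return new Envelope(rsa.doFinal(aesKey.getEncoded()), iv, ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(PrivateKey serverKey, Envelope env) {
        try {
            Cipher rsa = Cipher.getInstance(RSA);
            rsa.init(Cipher.DECRYPT_MODE, serverKey);
            SecretKeySpec aesKey = new SecretKeySpec(rsa.doFinal(env.wrappedKey), "AES");
            Cipher aes = Cipher.getInstance(AES);
            aes.init(Cipher.DECRYPT_MODE, aesKey, new GCMParameterSpec(128, env.iv));
            return aes.doFinal(env.ciphertext); // throws if the data was tampered with
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

On a long connection, the RSA step would more plausibly run once when the connection is set up, with the resulting AES key reused for later messages. The sketch wraps a key per message only to stay short.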

This is not the focus of this article, so I will not go into more detail.

VII. Long connection channel construction and disaster recovery

Given the advantages of long connections described above, how do we build the whole long connection channel? Here we use Meituan’s long connection channel as an example. The solutions used by other major companies are similar.

  1. The client establishes a long connection to a proxy server. Proxy servers can be deployed in multiple locations around the country, and when connecting, the client can pick the IP address of the nearest proxy server.
  2. Once the long connection is established, the client encrypts the binary data it needs to send and transmits it.
  3. After receiving a request, the proxy server forwards it to the business server over an internal dedicated line or an ordinary HTTP request.
  4. If a fault makes the long connection unavailable, the client must immediately fall back to an HTTP short connection or a UDP channel so that it keeps working.
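The fallback in step 4 can be sketched as an ordered list of transports that are tried one after another. Transport and FallbackSender are hypothetical names for this illustration, and the priority order (long connection first, then the other channels) is an assumption about how the degradation would be arranged.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** One way of delivering a request, e.g. long connection, HTTP short connection, or UDP. */
interface Transport {
    byte[] send(byte[] request) throws IOException;
}

class FallbackSender {
    private final List<Transport> transports; // highest priority first

    FallbackSender(List<Transport> transportsInPriorityOrder) {
        if (transportsInPriorityOrder.isEmpty()) {
            throw new IllegalArgumentException("at least one transport is required");
        }
        this.transports = new ArrayList<>(transportsInPriorityOrder);
    }

    /** Tries each channel in order and returns the first successful response. */
    byte[] send(byte[] request) {
        IOException lastFailure = null;
        for (Transport transport : transports) {
            try {
                return transport.send(request);
            } catch (IOException e) {
                lastFailure = e; // this channel is down, degrade to the next one
            }
        }
        throw new UncheckedIOException("all channels failed", lastFailure);
    }
}
```

Callers only see a failure when every channel is down, which is the behavior the disaster recovery step asks for.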

Summary

Drawing on the long connection frameworks of major Chinese companies such as Tencent and Meituan, this article has given a full introduction to and analysis of long connection technology. If anything is wrong or unclear, feel free to leave a comment.


Thank you

wingjay


About the “100 Million Level Android Architecture” column

Rapid business growth is impossible without a stable and reliable architecture. Based on the author’s practical work experience and the current infrastructure at major Chinese companies such as Alibaba, Tencent, and Meituan, the “100 Million Level Android Architecture” column tries to discuss how to design a good architecture that supports a business from 0 to 1 and on to 100 million. I hope to discuss more with you.

The main content of this column:

  1. Which Android architectures the big companies have;
  2. What problems these architectures solve;
  3. What principles lie behind these architectures;
  4. What we can learn from these architectures.


Reference:

Mobile IM Practice: Implementing Intelligent Heartbeat Mechanism of WeChat on Android

Summary of Message Push on Android: Implementation principle, Heartbeat Survival, Problems encountered, etc.

Mobile Network Optimization Practice of Meituan-Dianping

Android version of WeChat backstage combat sharing (Network Protection)

Overview of Mobile APP Network Optimization

High Efficiency Long Life Connection: A Step-by-step Guide to Implementing Adaptive Heartbeat Survival Mechanisms

Discussion on the Design and Implementation of an Intelligent Heartbeat Algorithm for Android Terminal IM

HTTP Long Connection Description

TCP Advanced
