TCP is a stateful communication protocol. Stateful refers to the state of the connection maintained by both parties during communication.
First, TCP keepalive
Let’s briefly review how a TCP connection is established and closed. (Only the main flow is considered here; packet loss, congestion, windows, and retransmission failures are left for later.)
First, the client sends a SYN (Synchronize Sequence Numbers) packet to the server to say that it wants to connect. The SYN packet carries the client's initial sequence number (SEQ). The server replies with a SYN + ACK packet: the SYN part works like the client's but carries the server's own sequence number, and the ACK part acknowledges the client's SYN, confirming that the client may connect. Finally, the client sends an ACK to acknowledge the SYN received from the server. The client and server now have an established connection. The whole process is called the three-way handshake.
After the connection is established, either the client or the server can send data over the socket. On receiving data, the peer acknowledges it with an ACK.
Once the exchange is complete, the client typically sends a FIN packet to tell the other end that it is closing its side. The other end acknowledges the FIN with an ACK, and then sends its own FIN to say that it is closing too. Finally, the client responds with an ACK, confirming that the connection has been terminated. The whole process is called the four-way wave.
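The lifecycle described above can be observed from user space on a loopback connection. The following is a minimal sketch, not part of the original demo: the helper names echoOnce and roundTrip are made up for illustration, and the comments mark where the kernel performs the handshake and sends the FIN packets on the program's behalf.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// echoOnce (hypothetical helper) listens on a loopback port chosen by the OS,
// accepts one connection, echoes what it receives, and reports the result.
func echoOnce() (string, <-chan error, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	done := make(chan error, 1)
	go func() {
		defer ln.Close()
		conn, err := ln.Accept() // returns a connection whose three-way handshake is complete
		if err != nil {
			done <- err
			return
		}
		defer conn.Close()           // the server's FIN is sent here
		_, err = io.Copy(conn, conn) // echo until the client's FIN shows up as EOF
		done <- err
	}()
	return ln.Addr().String(), done, nil
}

// roundTrip (hypothetical helper) connects, sends msg, half-closes the
// connection, and returns everything the server sent back.
func roundTrip(addr, msg string) (string, error) {
	conn, err := net.Dial("tcp", addr) // SYN, SYN+ACK, ACK are exchanged during Dial
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	// CloseWrite sends the client's FIN but keeps the read side open.
	if err := conn.(*net.TCPConn).CloseWrite(); err != nil {
		return "", err
	}
	reply, err := io.ReadAll(conn) // returns once the server's FIN arrives
	return string(reply), err
}

func main() {
	addr, done, err := echoOnce()
	if err != nil {
		fmt.Println(err)
		return
	}
	reply, err := roundTrip(addr, "hello")
	fmt.Println("reply:", reply, "client err:", err, "server err:", <-done)
}
```

Running tcpdump on the loopback interface while this program executes should show the SYN, data, and FIN packets in the order described above.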
TCP is often criticized for its performance. On top of the extra TCP and IP headers, it needs a three-way handshake to establish a connection and a four-way wave to close it. If only a little data is sent, the useful payload is a tiny fraction of what actually crosses the network.
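To put a rough number on that overhead, here is a back-of-the-envelope sketch. It rests on simplifying assumptions that are not from the article: minimum 20-byte IPv4 and TCP headers with no options, a payload that fits in a single segment with one separate ACK, and no link-layer framing. Real traffic usually carries TCP options and may piggyback ACKs, so treat the result as an order of magnitude only.

```go
package main

import "fmt"

const (
	ipv4HeaderBytes = 20 // minimum IPv4 header, no options (assumption)
	tcpHeaderBytes  = 20 // minimum TCP header, no options (assumption)
	handshakePkts   = 3  // SYN, SYN+ACK, ACK
	closePkts       = 4  // FIN, ACK, FIN, ACK
)

// wireBytes estimates the bytes sent for a short-lived connection that
// carries payload bytes in one data segment acknowledged by one ACK.
func wireBytes(payload int) int {
	packets := handshakePkts + 1 + 1 + closePkts // plus the data segment and its ACK
	return packets*(ipv4HeaderBytes+tcpHeaderBytes) + payload
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		total := wireBytes(n)
		fmt.Printf("payload %4d B -> %4d B on the wire (%.1f%% useful)\n",
			n, total, 100*float64(n)/float64(total))
	}
}
```

Under these assumptions a 10-byte message costs 370 bytes in total, so less than 3% of the traffic is useful data.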
Can we establish a connection once and then reuse it? Yes, but this raises another problem: what if connections are never released and the ports are exhausted? This brings us to the first topic of today’s discussion, TCP keepalive. With keepalive enabled, a TCP connection is not torn down as soon as data transmission ends. Instead, it stays open, and the keepalive mechanism periodically probes whether the peer is still alive so that dead connections can be detected and released.
Linux controls keepalive with three parameters: the idle time before the first probe, net.ipv4.tcp_keepalive_time; the interval between probes, net.ipv4.tcp_keepalive_intvl; and the number of probes, net.ipv4.tcp_keepalive_probes. The defaults are 7200 seconds (2 hours), 75 seconds, and 9 probes, respectively. With these defaults, detecting a dead peer through TCP keepalive takes at least 2 hours + 9 x 75 seconds on Linux, roughly 2 hours and 11 minutes. For example, after logging in to a server over SSH, we can see that the TCP keepalive time is 2 hours: the first probe packet is sent only after 2 hours of idleness to check whether the peer is still there.
Below is a leaked TCP connection on a server where TCP keepalive was not in effect:
# ll /proc/11516/fd/10
lrwx------ 1 root root 64 Jan 3 19:04 /proc/11516/fd/10 -> socket:[1241854730]
# date
Sun Jan 5 17:39:51 CST 2020
The connection was established two days earlier, but the peer has since gone away abnormally. The connection was never released because the program was built with an older Go version (versions before 1.9 had this problem).
To solve this problem, you can use the TCP keepalive mechanism. Newer Go versions support setting the keepalive time when a connection is established. First, look at the DialContext method in the net package, which establishes a TCP connection:
	if tc, ok := c.(*TCPConn); ok && d.KeepAlive >= 0 {
		setKeepAlive(tc.fd, true)
		ka := d.KeepAlive
		if d.KeepAlive == 0 {
			ka = defaultTCPKeepAlive
		}
		setKeepAlivePeriod(tc.fd, ka)
		testHookSetKeepAlive(ka)
	}
defaultTCPKeepAlive is 15 seconds. If the connection is made for HTTP through the default client, the keepalive time is set to 30 seconds:
var DefaultTransport RoundTripper = &Transport{
	Proxy: ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
		DualStack: true,
	}).DialContext,
	ForceAttemptHTTP2:     true,
	MaxIdleConns:          100,
	IdleConnTimeout:       90 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}
Here’s a simple demo test, with the following code:
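package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

func main() {
	wg := &sync.WaitGroup{}
	c := http.DefaultClient
	// Two goroutines issue requests in a loop over the shared default client.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				r, err := c.Get("http://10.143.135.95:8080")
				if err != nil {
					fmt.Println(err)
					return
				}
				// Drain and close the body so the connection can be reused.
				_, err = io.ReadAll(r.Body)
				r.Body.Close()
				if err != nil {
					fmt.Println(err)
					return
				}
				time.Sleep(30 * time.Millisecond)
			}
		}()
	}
	wg.Wait()
}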
After starting the program, inspect the connection (for example with ss -to, which shows socket timers). The keepalive timer starts at 30 seconds, counts down to 0, and is then reset to 30 seconds again.
You can capture packets using tcpdump.
# tcpdump -i bond0 port 35832 -nvv -A
In practice, many applications do not rely on TCP keepalive to check liveness, because the default check time of more than two hours is far too long for real-time systems. They typically monitor the connection at the application layer instead, for example with a PING-PONG mechanism (like table tennis, one side serves and the other returns): the application sends heartbeat packets at regular intervals, as WebSocket does with its ping and pong frames.
Second, TCP TIME_WAIT
The second topic I want to share with you is TCP’s TIME_WAIT state.
Why do we need TIME_WAIT? Why not go straight to CLOSED? Going directly to CLOSED would free resources for new connections sooner, instead of waiting another 2MSL (60 seconds on Linux).
There are two reasons:
The first is to guard against delayed packets. Suppose the third packet of the first connection is held up by an underlying network problem. If a new connection on the same addresses and ports is established before the late packet arrives, the new connection receives that packet and its data is corrupted. Waiting 2MSL gives such packets time to expire first.
The second reason is simpler. If the last ACK is lost, the peer stays in the LAST_ACK state. If a new connection is then initiated, the peer answers with an RST packet to reject it, and the new connection cannot be established.
The TIME_WAIT state was designed for these two reasons. Under high concurrency, however, many connections pile up in TIME_WAIT, and this is where TIME_WAIT reuse comes in: a connection in TIME_WAIT can be reused for a new connection, going from TIME_WAIT straight back to ESTABLISHED. The Linux kernel uses net.ipv4.tcp_tw_reuse to control whether TIME_WAIT reuse is enabled.
You may wonder: TIME_WAIT was designed to solve those two problems, so won't reusing it directly bring them back? The answer is that reuse relies on TCP timestamps, so net.ipv4.tcp_timestamps = 1 must also be set.
With timestamps enabled, the first problem is handled because a late packet carries an older timestamp and is simply discarded, so it cannot corrupt the new connection. For the second problem: when the server is still in the LAST_ACK state and receives the new SYN, it replies with a FIN,ACK packet; the client then sends an RST, which closes the old connection on the server, so the client can send the SYN again and establish the new connection.
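The timestamp check that rejects a late packet can be sketched as follows. This is a deliberately simplified model, not the kernel's implementation: the type pawsState and its method are invented for illustration, and the real rules for updating the remembered timestamp are stricter than shown here. The point is only that a segment whose timestamp is older than the most recent one seen is dropped, using a comparison that survives 32-bit wraparound.

```go
package main

import "fmt"

// tsBefore reports whether timestamp a is older than b. Casting the unsigned
// difference to int32 keeps the comparison correct across 32-bit wraparound.
func tsBefore(a, b uint32) bool {
	return int32(a-b) < 0
}

// pawsState (hypothetical type) remembers the most recent timestamp accepted
// on a connection.
type pawsState struct {
	tsRecent uint32
}

// accept returns false for a segment whose timestamp is older than the most
// recent one seen; otherwise it records the timestamp and returns true.
func (p *pawsState) accept(tsVal uint32) bool {
	if tsBefore(tsVal, p.tsRecent) {
		return false // late segment from the past: discard
	}
	p.tsRecent = tsVal
	return true
}

func main() {
	p := &pawsState{tsRecent: 1000}
	fmt.Println(p.accept(1500)) // newer segment
	fmt.Println(p.accept(900))  // delayed segment from the old connection
}
```

In this model the second call is rejected because 900 is older than the 1500 recorded by the first call, which is the behavior that makes TIME_WAIT reuse safe against delayed packets.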
There is also the tcp_tw_recycle parameter, which forcibly reclaims TIME_WAIT connections. It causes packet loss in NAT environments, so enabling it is not recommended; the parameter was removed in Linux 4.12.
Author: Chen Xiaoyu
Chen Xiaoyu is the author of the Cloud Computing Thing: Progressing from IaaS to PaaS