There are usually two techniques for sending heartbeat packets:

Method 1: The application layer implements its own heartbeat packet

The application sends its own heartbeat packets to check whether the connection is healthy. A common approach: the server sends a short packet to each client from a periodic timer event, and a background thread continuously watches for the client's response. If no response arrives within a set period, the client is considered offline. Likewise, if the client receives no heartbeat packet from the server within a set period, it treats the connection as unavailable.
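The server-side bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not a complete server: the `struct client` layout, the function names, and the 30-second timeout are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define HEARTBEAT_TIMEOUT_SEC 30   /* assumed value; tune per application */

/* Hypothetical per-client record kept by the server. */
struct client {
    int    fd;         /* socket descriptor of the client */
    time_t last_seen;  /* when the last heartbeat reply arrived */
    bool   online;     /* false once the client is considered offline */
};

/* Call whenever a heartbeat reply (or any data) arrives from the client. */
void client_mark_alive(struct client *c, time_t now)
{
    assert(c != NULL);
    c->last_seen = now;
    c->online = true;
}

/* Call periodically from the timer: clients whose last reply is older than
 * the timeout are marked offline. Returns how many were newly marked. */
size_t sweep_clients(struct client *clients, size_t n, time_t now)
{
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++) {
        if (clients[i].online &&
            difftime(now, clients[i].last_seen) > HEARTBEAT_TIMEOUT_SEC) {
            clients[i].online = false;
            dropped++;
        }
    }
    return dropped;
}
```

A real server would also close the socket and release per-client resources when a client is marked offline; that part depends on the application.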



Method 2: TCP KeepAlive mechanism

Because a server usually holds connections to many clients, implementing heartbeats in the application layer can get complicated. Using the KeepAlive feature built into the TCP/IP stack is much simpler. Once KeepAlive is enabled on either the server or the client, TCP automatically sends probe packets to the peer after the connection has been idle for the configured time, and the peer's TCP automatically replies to show that it is still online. TCP does not enable KeepAlive by default, because the probes consume extra bandwidth. The overhead is small, but it adds cost where traffic is billed by volume. In addition, an improperly configured KeepAlive can tear down a healthy TCP connection during a transient network glitch. Finally, the default KeepAlive idle time is 7,200,000 milliseconds (two hours), followed by five probes. For many server applications, two hours of idle time is far too long, so we need to enable KeepAlive manually and set appropriate parameters.

As we know, TCP has a connection detection mechanism. If no data is transmitted for a specified period (usually 2 hours), a keep-alive probe is sent to the peer. Its sequence number is that of the last byte of the last packet sent, so the peer answers with a TCP ACK confirming that it has already received that byte, which shows that the connection is not broken. If no response arrives within a certain time, the sender retries. After several failed retries, it sends a reset to the peer and drops the connection. On Windows, the first probe is sent two hours after the last data was sent, followed by up to five more probes at one-second intervals. If none is answered, the connection is dropped.

For Win2K/XP/2003, the keepalive parameter affecting all connections across the system can be found from the following registry key:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]

"KeepAliveTime"=dword:006ddd00

"KeepAliveInterval"=dword:000003e8

"MaxDataRetries"="5"



Heartbeat packet mechanism

Heartbeat packets get their name because they are sent at regular intervals, like a heartbeat, to tell the server that the client is still alive. In effect, they keep a long-lived connection open. There is no special rule for the content of the packet; it is usually very small, or an empty packet containing only a header.

TCP itself has a heartbeat mechanism, the socket option SO_KEEPALIVE, whose default probe period is 2 hours. With that default, however, it cannot promptly detect a disconnection caused by a power failure, an unplugged network cable, or a firewall, and the application logic may not handle the broken connection well. If the goal is only to keep the connection alive, it is still adequate.

Heartbeats are typically implemented at the application (logic) layer by sending empty echo packets. A timer on the server sends an empty packet to the client at a fixed interval, and the client sends back an identical empty packet. If the server does not receive the reply within a certain period, it concludes that the connection is down.

In principle, to find out whether a connection is down you only need to call send or recv: a recv that returns 0 means the peer has closed the connection. With a long-lived connection, however, there may be no traffic for a long time. In theory the connection stays up, but in practice it is hard to know whether an intermediate node has failed. Worse, some nodes (firewalls) automatically drop connections that carry no data for a certain period. This is where heartbeat packets come in: they keep the long-lived connection alive.

After learning of a disconnection, the server logic may need to act, for example by cleaning up the data associated with the connection or reconnecting. What exactly to do is up to the application layer and its requirements. In short, heartbeat packets are mainly used to keep long-lived connections alive and to detect and handle disconnections.

For general applications, a heartbeat interval of 30-40 seconds works well. If requirements are strict, try 6-9 seconds.
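The echo exchange described above can be sketched in C with POSIX sockets. This is only an illustration under assumptions: the one-byte payload, the function names `heartbeat_probe` and `heartbeat_echo`, and the use of `poll` as the timeout timer are choices made for the example, not something the text prescribes.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send a one-byte probe on fd and wait up to timeout_ms for a one-byte reply.
 * Returns 1 if the peer answered, 0 on timeout, -1 on error or peer close. */
int heartbeat_probe(int fd, int timeout_ms)
{
    const char ping = 'P';             /* payload is arbitrary (assumption) */
    if (send(fd, &ping, 1, 0) != 1)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;                      /* no reply before the timer expired */

    char reply;
    ssize_t n = recv(fd, &reply, 1, 0);
    return n == 1 ? 1 : -1;            /* n == 0 means the peer closed */
}

/* Peer side: read one probe byte and echo it back. Returns 0 on success. */
int heartbeat_echo(int fd)
{
    char byte;
    if (recv(fd, &byte, 1, 0) != 1)
        return -1;
    return send(fd, &byte, 1, 0) == 1 ? 0 : -1;
}
```

A caller would typically invoke `heartbeat_probe` from its periodic timer and treat a few consecutive timeouts, rather than a single one, as a disconnection.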


Heartbeat detection procedure:

1) At regular intervals, the client sends a probe packet to the server.

2) When it sends the probe, the client starts a timeout timer.

3) On receiving the probe, the server replies with a response packet.

4) If the client receives the reply, the server is healthy and the client cancels the timeout timer; if the timer expires and still no reply has arrived, the server is considered down.

Go to blog.sina.com.cn/s/blog_a459…

We can set the SO_KEEPALIVE option so that, when the other end disconnects ungracefully, we find out after 2 hours whether the TCP connection is still alive.

    // Windows (needs <winsock2.h> and <mstcpip.h>): enable KeepAlive
    BOOL bKeepAlive = TRUE;
    int nRet = ::setsockopt(sockClient, SOL_SOCKET, SO_KEEPALIVE, (char*)&bKeepAlive, sizeof(bKeepAlive));
    if (nRet != 0) {
        AfxMessageBox("error");
        return;
    }

    // Set the KeepAlive idle time and probe interval (milliseconds)
    tcp_keepalive inKeepAlive = {0};
    unsigned long ulInLen = sizeof(tcp_keepalive);
    tcp_keepalive outKeepAlive = {0};
    unsigned long ulOutLen = sizeof(tcp_keepalive);
    unsigned long ulBytesReturn = 0;

    inKeepAlive.onoff = 1;
    inKeepAlive.keepaliveinterval = 4000;  // interval between two KeepAlive probes
    inKeepAlive.keepalivetime = 1000;      // idle time before the first KeepAlive probe
    nRet = WSAIoctl(sockClient, SIO_KEEPALIVE_VALS, (LPVOID)&inKeepAlive, ulInLen, (LPVOID)&outKeepAlive, ulOutLen, &ulBytesReturn, NULL, NULL);
    if (SOCKET_ERROR == nRet) {
        AfxMessageBox("error");
        return;
    }

    // Linux (needs <netinet/tcp.h>): values are in seconds
    int keepIdle = 6;      // idle time before the first probe
    int keepInterval = 5;  // interval between probes
    int keepCount = 3;     // number of unanswered probes before giving up
    setsockopt(listenfd, SOL_TCP, TCP_KEEPIDLE, (void *)&keepIdle, sizeof(keepIdle));
    setsockopt(listenfd, SOL_TCP, TCP_KEEPINTVL, (void *)&keepInterval, sizeof(keepInterval));
    setsockopt(listenfd, SOL_TCP, TCP_KEEPCNT, (void *)&keepCount, sizeof(keepCount));

See: blog.csdn.net/gavin1203/a…

For details on setsockopt, see: www.cnblogs.com/hateislove2…

———————————————————————————————————————— –

The application layer enables the Keepalive mechanism on each socket with the following call. The parameters take the system-wide values described above.

setsockopt(rs, SOL_SOCKET, SO_KEEPALIVE, (void *)&keepAlive, sizeof(keepAlive));

Note: a keepalive probe is a TCP-level packet, not an application-layer packet, so the application cannot read it with functions such as recv. You can see it with a packet capture tool.
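Since the probes themselves are invisible to the application, one way to confirm that the option took effect is to read it back. A minimal POSIX sketch, where the helper name `enable_keepalive` is an assumption made for the example:

```c
#include <sys/socket.h>

/* Turn SO_KEEPALIVE on for fd and confirm it with getsockopt.
 * Returns 0 if the option is reported as enabled, -1 otherwise. */
int enable_keepalive(int fd)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;

    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0)
        return -1;

    return value ? 0 : -1;   /* nonzero means keepalive is enabled */
}
```

With only this option set, the idle time, probe interval, and probe count stay at the system-wide defaults.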

==================================================================

What is keepalive timer? [1]

Many TCP/IP beginners are surprised to learn that no data flows over an idle TCP connection. That is, if neither process at the two ends of a TCP connection is sending data to the other, nothing is exchanged between the two TCP modules. You may find polling in other network protocols, but TCP has none. This means we can start a client process, establish a TCP connection with the server, and walk away for hours, days, weeks, or months, and the connection will still be up. Intermediate routers may crash and reboot, telephone lines may go down and come back up, but the connection remains established as long as the hosts at both ends are not rebooted.

This assumes that neither the client nor the server application has an application-level timer that detects inactivity and causes either application to terminate. Sometimes, however, a server needs to know whether the client's host has crashed and is down, or crashed and rebooted. Many implementations provide a keepalive timer for this purpose.

The keepalive timer is a controversial feature. Many people believe that, if this kind of polling of the other end is needed at all, it should be done by the application and not by TCP. In addition, if an intermediate network between the two end systems is temporarily down, the keepalive option can terminate a perfectly good connection between two processes. For example, if a keepalive probe is sent just while an intermediate router has crashed and is rebooting, TCP will conclude that the client's host has crashed, although it has not.

Keepalive is not part of the TCP specification. The Host Requirements RFC gives three reasons not to use it: (1) it can cause a perfectly good connection to be dropped during a transient failure, (2) it consumes unnecessary bandwidth, and (3) it costs money on an internet that charges by the packet. Nevertheless, many implementations provide a keepalive timer.

Some server applications tie up resources on behalf of a client and need to know whether the client's host has crashed. The keepalive timer provides this capability. Many versions of the Telnet and Rlogin servers enable the keepalive option by default.

A common example of the need for a keepalive timer is a PC user who logs in to a host with Telnet over TCP/IP. If the user simply powers off the PC at the end of the day without logging off, a half-open connection is left behind. In Figure 18.16, we saw how sending data across a half-open connection causes a reset to be returned, but that was from the client end, where the client was sending the data. If the client disappears, leaving the server with a half-open connection, and the server is waiting for data from the client, the server will wait forever. The keepalive feature is intended to detect these half-open connections from the server side.

How does Keepalive work? [1]

In this description, we call the end that enables the keepalive option the server and the other end the client. Nothing prevents a client from setting the option, but it is normally set by servers. If both ends need to know when the other has disappeared, it can be set on both (for example, NFS).

If there is no activity on a given connection for two hours, the server sends a probe segment to the client. (We'll see what the probe segment looks like in the example below.) The client host must be in one of four states:

1) The client host is still up and reachable from the server. The client's TCP responds normally, so the server knows that the other end is still alive. The server's TCP resets the keepalive timer for two hours in the future. If application traffic crosses the connection before those two hours are up, the timer is reset again for two hours after that exchange.

2) The client host has crashed and is either down or in the process of rebooting. In either case, its TCP does not respond. The server receives no response to its probe and times out after 75 seconds. The server sends a total of 10 such probes, 75 seconds apart. If it receives no response, it concludes that the client host is down and terminates the connection.

3) The client host has crashed and rebooted. The server receives a response to its keepalive probe, but the response is a reset, which causes the server to terminate the connection.

4) The client host is up and running, but unreachable from the server. This is the same as state 2, because TCP cannot distinguish between the two. All it can tell is that no reply has arrived to its probes.
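To make the timing in state 2 concrete: with a two-hour idle time and 10 probes that each wait 75 seconds, the server gives up 7200 + 10 × 75 = 7950 seconds (2 hours 12 minutes 30 seconds) after the last activity. The same arithmetic as a small C helper, where the function name is an assumption made for the example:

```c
/* Worst-case time, in seconds, from the last activity on a connection until
 * keepalive declares the peer dead: the idle time before the first probe,
 * plus one probe interval for each unanswered probe. */
long keepalive_detection_seconds(long idle_sec, long interval_sec, long probe_count)
{
    return idle_sec + interval_sec * probe_count;
}
```

For instance, an idle time of 30 seconds with 3 probes 3 seconds apart gives 30 + 3 × 3 = 39 seconds.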

The server does not have to worry about the client host being shut down and then rebooted (meaning an orderly shutdown by an operator, not a host crash). When the operator shuts the system down, all application processes (that is, the client process) are terminated, and the client's TCP sends a FIN on the connection. On receiving the FIN, the server's TCP reports an end-of-file to the server process, which lets the server detect the situation.

In the first state, the server application has no idea that keepalive probes are taking place. Everything is handled by the TCP layer, and the probes are transparent to the application until one of states 2, 3, or 4 occurs. In those three states, the server's TCP returns an error to the server application. Normally the server has issued a read on the network and is waiting for data from the client. If the keepalive feature reports an error, it is returned as the result of that read. In state 2 the error is something like “Connection timed out”. In state 3 it is “Connection reset by peer”. State 4 may look like a connection timeout, or may produce a different error, depending on whether an ICMP error related to the connection was received.
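On a POSIX system these errors usually surface as `errno` values after a failed read. The mapping below is a rough sketch under that assumption; the enum and the function name are made up for the example, and the exact codes an implementation reports may differ. State 4 can also show up as `ETIMEDOUT` when no ICMP error was received, in which case it is indistinguishable from state 2.

```c
#include <errno.h>

/* Hypothetical labels matching the numbered states in the text. */
enum peer_state {
    PEER_UNKNOWN     = 0,
    PEER_NO_RESPONSE = 2,  /* state 2: probes timed out */
    PEER_REBOOTED    = 3,  /* state 3: probe answered with a reset */
    PEER_UNREACHABLE = 4   /* state 4: an ICMP unreachable error was received */
};

/* Map the errno left by a failed read to the most likely state. */
enum peer_state classify_read_error(int err)
{
    switch (err) {
    case ETIMEDOUT:    return PEER_NO_RESPONSE;
    case ECONNRESET:   return PEER_REBOOTED;
    case EHOSTUNREACH: /* fall through */
    case ENETUNREACH:  return PEER_UNREACHABLE;
    default:           return PEER_UNKNOWN;
    }
}
```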

Windows:

On a normal TCP connection, suppose we call recv or send as follows and wait indefinitely:

   ret=recv(s,&buf[idx],nLeft,flags);

   or

   ret=send(s,&buf[idx],nLeft,flags);

If the TCP connection is closed properly, that is, the peer calls closesocket(s) or shutdown(s), the recv or send call returns immediately and reports that the connection is closed. This is because closesocket(s) and shutdown(s) go through the normal closing handshake, which tells the other end “the TCP connection is closed, there is no need to send or receive any more.” However, if the network cable is pulled, or the machine at either end is suddenly powered off or restarted, the recv or send call keeps waiting for a long time, because nothing announces that the connection has been interrupted. The solution is to enable the KeepAlive mechanism in the TCP code.
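The difference between an orderly close and an error is visible in the return value of recv. The sketch below uses POSIX sockets, not Winsock, and the enum and function name are assumptions made for the example; it presumes a buffer capacity greater than zero.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

enum recv_result { RECV_DATA, RECV_PEER_CLOSED, RECV_ERROR };

/* Perform one blocking recv and classify the outcome.
 * On RECV_DATA, *got holds the number of bytes stored in buf. */
enum recv_result recv_once(int fd, char *buf, size_t cap, size_t *got)
{
    ssize_t n;
    do {
        n = recv(fd, buf, cap, 0);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (n > 0) {
        *got = (size_t)n;
        return RECV_DATA;
    }
    *got = 0;
    /* 0 means the peer closed in an orderly way; -1 is an error such as a
     * reset or a keepalive timeout, with the details left in errno. */
    return n == 0 ? RECV_PEER_CLOSED : RECV_ERROR;
}
```

Without keepalive or an application-level heartbeat, a peer that vanishes silently produces neither of these outcomes, and the call simply keeps blocking.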

    // Needs <winsock2.h> and <mstcpip.h>
    struct tcp_keepalive inKeepAlive = {0};
    unsigned long ulInLen = sizeof(struct tcp_keepalive);
    struct tcp_keepalive outKeepAlive = {0};
    unsigned long ulOutLen = sizeof(struct tcp_keepalive);
    unsigned long ulBytesReturn = 0;

    inKeepAlive.onoff = 1;                 // enable KeepAlive
    inKeepAlive.keepaliveinterval = 5000;  // in milliseconds
    inKeepAlive.keepalivetime = 1000;      // in milliseconds
    ret = WSAIoctl(s, SIO_KEEPALIVE_VALS, (LPVOID)&inKeepAlive, ulInLen, (LPVOID)&outKeepAlive, ulOutLen, &ulBytesReturn, NULL, NULL);

keepalivetime sets how often a probe is sent while the TCP connection is healthy, that is, the idle time before each probe. Once a probe goes unanswered, probes are resent every keepaliveinterval. If the probes are still unanswered after several retries, TCP concludes that the connection has been broken, so the recv or send call above returns instead of hanging indefinitely.


The image above illustrates the preceding text. While the connection is healthy, KeepAlive sends a probe every 1000 milliseconds (the value of keepalivetime). When the 32nd probe gets no reply, TCP resends the probe every 5000 milliseconds (the value of keepaliveinterval). After several retransmissions still get no reply, it concludes that the TCP connection has been broken!

 

For Win2K/XP/2003, the keepalive parameter affecting all connections across the system can be found from the following registry key:


[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]

"KeepAliveTime"=dword:006ddd00

"KeepAliveInterval"=dword:000003e8

"MaxDataRetries"="5"

 

Two hours of idle time is too long for practical applications. Therefore, we need to enable Keepalive manually and set appropriate parameters. On Windows XP and Windows 2003 the parameters can be set per socket; on Windows 2000 they cannot, and changing them affects every socket in the system.

Linux implementation:

SO_KEEPALIVE/TCP_KEEPCNT/TCP_KEEPIDLE/TCP_KEEPINTVL

If one end has closed the connection or terminated abnormally and the other end is not aware of it, we call the TCP connection half-open. TCP uses the KeepAlive timer to detect half-open connections. On a highly concurrent network server, leaked sockets are common and show up as a large number of CLOSE_WAIT connections. Setting the KEEPALIVE option can fix this, although there are other ways as well. See Resources 8 for details.

    // Settings for KeepAlive (needs <sys/socket.h>, <netinet/in.h>, <netinet/tcp.h>)
    int keepalive = 1;          // enable keepalive
    setsockopt(incomingsock, SOL_SOCKET, SO_KEEPALIVE, (void*)(&keepalive), (socklen_t)sizeof(keepalive));
    int keepalive_time = 30;    // seconds of idle time before the first probe
    setsockopt(incomingsock, IPPROTO_TCP, TCP_KEEPIDLE, (void*)(&keepalive_time), (socklen_t)sizeof(keepalive_time));
    int keepalive_intvl = 3;    // seconds between probes
    setsockopt(incomingsock, IPPROTO_TCP, TCP_KEEPINTVL, (void*)(&keepalive_intvl), (socklen_t)sizeof(keepalive_intvl));
    int keepalive_probes = 3;   // unanswered probes before the connection is dropped
    setsockopt(incomingsock, IPPROTO_TCP, TCP_KEEPCNT, (void*)(&keepalive_probes), (socklen_t)sizeof(keepalive_probes));

Set the SO_KEEPALIVE option to enable KEEPALIVE, then use TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT to set the idle time before the first probe, the interval between probes, and the number of probes. The system-wide values tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes can also be set under /proc/sys/net/ipv4/, but that affects all sockets. Therefore, setsockopt is recommended.
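To check which values are actually in effect for a socket, the same options can be read back with getsockopt. This is a Linux-specific sketch; the struct and function names are assumptions made for the example. On a socket where nothing has been set, the values reported should be the system-wide defaults.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical container for the three per-socket keepalive parameters. */
struct keepalive_params {
    int idle_sec;      /* TCP_KEEPIDLE: idle time before the first probe */
    int interval_sec;  /* TCP_KEEPINTVL: time between probes */
    int probes;        /* TCP_KEEPCNT: unanswered probes before giving up */
};

/* Read the keepalive parameters of a TCP socket. Returns 0 on success. */
int get_keepalive_params(int fd, struct keepalive_params *out)
{
    socklen_t len = sizeof(out->idle_sec);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &out->idle_sec, &len) != 0)
        return -1;

    len = sizeof(out->interval_sec);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &out->interval_sec, &len) != 0)
        return -1;

    len = sizeof(out->probes);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &out->probes, &len) != 0)
        return -1;

    return 0;
}
```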



———————————————————————————————————————— ————————————

CAsyncSocket::SetSockOpt

BOOL SetSockOpt( int nOptionName, const void* lpOptionValue, int nOptionLen, int nLevel = SOL_SOCKET );

Return value: Nonzero on success; otherwise zero, and GetLastError can be called to get the specific error code. The error codes that apply to this member function are:

WSANOTINITIALISED: AfxSocketInit must be called successfully before this API function is used.
WSAENETDOWN: The Windows Sockets implementation detected that the network subsystem failed.
WSAEFAULT: lpOptionValue is not in a valid part of the process address space.
WSAEINPROGRESS: A blocking Windows Sockets operation is in progress.
WSAEINVAL: nLevel is invalid, or the information in lpOptionValue is invalid.
WSAENETRESET: The connection timed out while SO_KEEPALIVE was set.
WSAENOPROTOOPT: The option is unknown or unsupported. In particular, SO_BROADCAST is not supported on sockets of type SOCK_STREAM, and SO_DONTLINGER, SO_KEEPALIVE, SO_LINGER, and SO_OOBINLINE are not supported on sockets of type SOCK_DGRAM.
WSAENOTCONN: The connection was reset while SO_KEEPALIVE was set.
WSAENOTCONN: The socket is not connected (only for sockets of type SOCK_STREAM).
WSAENOTSOCK: The descriptor is not a socket.

Parameters:

nOptionName: The socket option whose value is to be set.
lpOptionValue: A pointer to the buffer that holds the value to set.
nOptionLen: The size, in bytes, of the lpOptionValue buffer.
nLevel: The level at which the option is defined. Only SOL_SOCKET and IPPROTO_TCP are supported.

Description: This function sets a socket option. It can set options on a socket of any type and in any state, changing their current values. Although options can exist at multiple protocol levels, this function only sets options at the highest (socket) level. Options affect socket operations, such as whether expedited data is received in the normal data stream, whether broadcast messages can be sent on the socket, and so on.

There are two types of socket options: Boolean options, which enable or disable a feature, and options that take an integer or a structure. To enable a Boolean option, lpOptionValue points to a nonzero integer; to disable it, lpOptionValue points to an integer equal to 0. For Boolean options, nOptionLen should equal sizeof(BOOL). For other options, lpOptionValue points to the integer or structure that contains the desired value, and nOptionLen is the length of that integer or structure.

SO_LINGER controls what happens to unsent data queued on the socket when the Close function is called. For details, see “Windows Sockets Programming Considerations” in the Win32 SDK online documentation.

By default, a socket cannot be bound to a local address that is already in use. Sometimes, however, you want to reuse an address this way. Since every connection is uniquely identified by the combination of local and remote addresses, there is no problem with having two sockets bound to the same local address as long as the remote addresses differ. To tell the Windows Sockets implementation that a Bind call should not be rejected because the address is already in use by another socket, set the SO_REUSEADDR option on the socket before calling Bind. The option is interpreted only at the time of the Bind call: there is no need to set it on a socket that will not be bound to an existing address, and setting or resetting it after Bind has no effect on this or any other socket.

An application can set the SO_KEEPALIVE option to ask the Windows Sockets implementation to send keepalive packets on TCP connections (see “Windows Sockets Programming Considerations” in the Win32 SDK online documentation). A Windows Sockets implementation does not have to support keepalive packets. If it does, the exact semantics are implementation-specific but must conform to section 4.2.3.6 of RFC 1122, “Requirements for Internet Hosts -- Communication Layers.” If a connection is dropped as the result of keepalives, the error code WSAENETRESET is returned to any call in progress on the socket, and subsequent calls fail with WSAENOTCONN.

The TCP_NODELAY option disables the Nagle algorithm. The Nagle algorithm reduces the number of small packets a host sends by buffering unacknowledged send data until a full-size packet can be sent. For some applications, however, this algorithm can hurt performance, and TCP_NODELAY can be used to turn it off. Application writers should not set TCP_NODELAY unless its impact is well understood and desired, because it can have a significant negative effect on network performance. TCP_NODELAY is the only socket option supported at the IPPROTO_TCP level.

Some implementations of Windows Sockets output debugging information if the application sets the SO_DEBUG option.

SetSockOpt supports the following options. The Type column gives the type of data that lpOptionValue points to.
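As an illustration of the ordering rule for SO_REUSEADDR and of TCP_NODELAY, here is a minimal sketch. It uses the plain POSIX socket API and not the MFC CAsyncSocket wrapper described above, and the function names and the loopback address are assumptions made for the example.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listening socket on 127.0.0.1:port, setting SO_REUSEADDR
 * before bind so that the option is taken into account.
 * Returns the descriptor, or -1 on failure. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;   /* Boolean option: nonzero enables it */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) != 0)
        goto fail;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);   /* 0 lets the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(fd, 16) != 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}

/* Disable the Nagle algorithm on a TCP socket. Returns 0 on success. */
int disable_nagle(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```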

| Value | Type | Meaning |
|---|---|---|
| SO_BROADCAST | BOOL | Allows broadcast messages to be transmitted on the socket |
| SO_DEBUG | BOOL | Records debugging information |
| SO_DONTLINGER | BOOL | Does not block Close while waiting for unsent data to be sent; setting this option is equivalent to setting SO_LINGER with l_onoff = 0 |
| SO_DONTROUTE | BOOL | Does not route: sends data directly to the interface |
| SO_KEEPALIVE | BOOL | Sends keepalives |
| SO_LINGER | struct LINGER | Lingers on Close if unsent data is present |
| SO_OOBINLINE | BOOL | Receives out-of-band data in the normal data stream |
| SO_RCVBUF | int | Sets the size of the receive buffer |
| SO_REUSEADDR | BOOL | Allows the socket to bind to an address that is already in use |
| SO_SNDBUF | int | Sets the size of the send buffer |
| TCP_NODELAY | BOOL | Disables the Nagle algorithm when sending data |

Berkeley Software Distribution (BSD) options not supported by SetSockOpt are:

| Value | Type | Meaning |
|---|---|---|
| SO_ACCEPTCONN | BOOL | The socket is listening for incoming connections |
| SO_ERROR | int | Returns and clears the error status |
| SO_RCVLOWAT | int | Receive low-water mark |
| SO_RCVTIMEO | int | Receive timeout |
| SO_SNDLOWAT | int | Send low-water mark |
| SO_SNDTIMEO | int | Send timeout |
| SO_TYPE | int | Type of the socket |
| IP_OPTIONS | int | Sets the options field in the IP header |
