Let’s start with a fitness instructor.

Li Dong, a fitness instructor who calls himself the “Sub-health Terminator”, decided to grow his business the “Internet Plus” way: he would post an ad on Chen Chen, a newly developed chat app.

He went at the keyboard like mad: type “Li Dong”, hit Enter to send! Type “Sub-health Terminator”, hit Enter to send!

Remember what the four-layer network model looks like?

In the four-layer network model, each layer has its own job, and as a message passes down through each layer, that layer adds a header to it. You can think of each header as an extra hat put on the datagram. A header records where the message came from, where it is going, and how long it is. For example, the MAC header records the hardware address, the IP header records the source and destination addresses, and the transport-layer header records (via port numbers) which process on the destination host the message is meant for.

Once the message leaves the sender, it travels through the complex network, passed from router to router according to the information in these headers, until it reaches the destination machine. There, the receiver uses the headers to peel off the layers one by one and restore the original message sent by the sender.
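To make the “hat” picture concrete, here is a toy sketch in Python. It is an illustration only: bracketed text tags stand in for real binary headers, and `LAYERS`, `encapsulate`, and `decapsulate` are hypothetical names, not any networking API. Each layer on the way down prepends its own header, and the receiver strips them in reverse order.

```python
# Toy model: text tags stand in for the real binary headers (illustration only).
LAYERS = ["transport", "network", "link"]  # below the application, top to bottom


def encapsulate(message: bytes) -> bytes:
    """Going down the stack, each layer puts its own header in front."""
    for layer in LAYERS:
        message = f"[{layer}]".encode() + message
    return message


def decapsulate(frame: bytes) -> bytes:
    """Going up the stack, headers are removed in the reverse order."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


frame = encapsulate(b"Li Dong")
print(frame)               # b'[link][network][transport]Li Dong'
print(decapsulate(frame))  # b'Li Dong'
```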

Why slice the data

The chat app itself lives at the application layer.

The two messages, “Li Dong” and “Sub-health Terminator”, are handed down to the transport layer, which here uses TCP. On entering TCP, the data is sliced into segments of at most MSS bytes each.

The network can be compared to a water pipe of fixed thickness. That thickness is what the network interface layer (data link layer) provides to the network layer, and it is generally taken to be the MTU (1500 bytes). Pushing an entire large message through directly would exceed what the pipe can carry, so the message has to be sliced into packets that can pass through the “pipe” normally.

What is the difference between MTU and MSS

  • MTU: Maximum Transmission Unit. The largest amount of data the network interface layer (data link layer) can carry in one go for the network layer; on Ethernet the MTU is 1500 bytes. If the IP layer has 1500 bytes or less to send, one IP packet is enough. If it has more than 1500 bytes, the data must be fragmented, and all the fragments carry the same Identification (ID) value in their IP headers.

  • MSS: Maximum Segment Size. TCP uses the MSS to limit how many bytes of application-layer data go into a single segment. If MTU = 1500 bytes, MSS = 1500 - 20 (IP header) - 20 (TCP header) = 1460 bytes. If the application layer sends 2000 bytes, two segments are needed: the first carries 1460 bytes and the second 540.
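The MSS arithmetic above can be checked with a few lines of Python. This is a sketch assuming IPv4 and TCP headers of 20 bytes each with no options; `mss_for_mtu` and `split_into_segments` are hypothetical helpers for illustration, not part of any library.

```python
IP_HEADER = 20   # minimal IPv4 header, no options (assumption)
TCP_HEADER = 20  # minimal TCP header, no options (assumption)


def mss_for_mtu(mtu: int) -> int:
    """MSS = MTU minus the IP and TCP headers."""
    return mtu - IP_HEADER - TCP_HEADER


def split_into_segments(data: bytes, mss: int) -> list[bytes]:
    """Cut application data into chunks of at most `mss` bytes."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]


mss = mss_for_mtu(1500)
segments = split_into_segments(b"x" * 2000, mss)
print(mss, [len(s) for s in segments])  # 1460 [1460, 540]
```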

What is a sticky packet

So when Li Dong typed “Li Dong” and “Sub-health Terminator” on his phone, TCP split the data into segments of at most MSS bytes and sent them smoothly along the network cable.

The network is stable, and the segments reach the peer, phone B, where the TCP layer reassembles them into a single byte stream: “Li DongSub-health Terminator”.

But Chen Chen is newly developed, and its developer, Xiao Bai, is a notorious bug factory. When his code processed the byte stream, the messages “Li Dong” and “Sub-health Terminator” became “Li Dongya” and “Health Terminator”: the “Li Dong” from the first packet got stuck to the “ya” (亚, the Chinese for “sub-”) from the next packet, and the two were wrongly parsed as one message. This is called a sticky packet.

A fitness coach billed as the “Health Terminator”? Surely his luck can't be that bad. Here's wishing him a flood of customers.

Why do sticky packets occur

To answer that, we have to start with what TCP is.

TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol.

The property behind sticky packets is the last one: TCP is based on a byte stream.

A byte stream can be pictured as data flowing through a two-way channel. This data is what we usually call binary data; simply put, lots of 0s and 1s, with no boundaries between them.

Data that the application layer hands to TCP is sent downstream as a byte stream, not as discrete packets delivered one by one to the destination host. Along the way the data may be cut up and recombined into packets of various sizes. If the receiver cannot restore the original messages from these packets, the result is sticky packets.

Why combine the data being sent

The TCP slicing described above exists to get data through the network pipe. But TCP may also do the opposite and combine data. If each piece the application sends is much smaller than the MSS, for example only a few bytes at a time, sending each one separately wastes network I/O.

For example, Xiao Bai's dad sends him out to buy a bottle of soy sauce, and he goes out and brings it back. Then Xiao Bai's mom sends him out to buy a bottle of vinegar. Xiao Bai makes two separate trips, which eats into his gaming time.

The optimization is simple: when Dad asks for soy sauce, Xiao Bai waits and keeps playing. If Mom then asks for vinegar, he can go out once with both errands and bring everything back together.

TCP's Nagle algorithm applies exactly this optimization to avoid sending lots of small packets.

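With Nagle enabled, packets are sent in one of the two cases below. (This is a simplified model: strictly speaking, Nagle holds back small segments only while previously sent data is still unacknowledged and releases them when the ACK arrives; the roughly 200 ms delay often observed usually comes from the peer's delayed ACK.)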

  • If the buffered data reaches MSS (or contains a FIN), it is sent immediately; otherwise TCP waits for the next packet. If the next packet arrives and the two together exceed MSS, the data is split and sent.
  • If the wait times out (typically 200 ms) before the next packet arrives, the first packet is sent as is, even though it is shorter than MSS.

  • Because Nagle is enabled and msg1 is shorter than MSS, TCP waits up to 200 ms. msg2 arrives, and msg1 + msg2 > MSS, so msg2 is split into msg2(1) and msg2(2); msg1 + msg2(1), exactly MSS long, is sent immediately.
  • msg2(2) + msg3 > MSS, so msg3 is split into msg3(1) and msg3(2), and msg2(2) + msg3(1) is sent as one packet.
  • The remaining msg3(2) is shorter than MSS, and no further packet arrives within 200 ms, so after the timeout it is sent directly.
  • Although the figure draws the three messages in different colors, in reality they are all one continuous string of 0s and 1s. If the receiving developer treats the first chunk received, msg1 + msg2(1), as one complete message, the two messages appear stuck together, which is the sticky packet problem.
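The walkthrough above can be replayed with a small simulation of the simplified Nagle model described in this section (a size threshold plus a fixed wait), not of the real kernel algorithm, which waits for ACKs. The tiny MSS of 10 bytes and the names `nagle_simplified`, `MSS`, and `TIMEOUT_MS` are assumptions chosen for readability.

```python
MSS = 10          # tiny MSS so the output is easy to read (assumption)
TIMEOUT_MS = 200  # wait time used in the simplified description


def nagle_simplified(writes):
    """Return the segments put on the wire for a list of (time_ms, data) writes."""
    sent = []
    buf = b""
    deadline = None  # time at which a buffered partial segment gets flushed
    for t, data in writes:
        # The previous partial segment timed out before this write arrived.
        if buf and t > deadline:
            sent.append(buf)
            buf = b""
        buf += data
        # A full MSS worth of bytes goes out immediately, even if it cuts a message.
        while len(buf) >= MSS:
            sent.append(buf[:MSS])
            buf = buf[MSS:]
        # Any leftover waits up to TIMEOUT_MS for the next write.
        deadline = t + TIMEOUT_MS if buf else None
    if buf:
        sent.append(buf)  # nothing else arrived: flush on timeout
    return sent


# msg1 (6 bytes), msg2 (8 bytes), msg3 (7 bytes), all within 200 ms of each other
segments = nagle_simplified([(0, b"AAAAAA"), (50, b"BBBBBBBB"), (100, b"CCCCCCC")])
print(segments)  # [b'AAAAAABBBB', b'BBBBCCCCCC', b'C']
```

The three segments are msg1 + msg2(1), msg2(2) + msg3(1), and msg3(2): none of them lines up with a message boundary, so a receiver that treats each arrival as one message gets it wrong.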

Does turning off the Nagle algorithm prevent sticky packets?

The Nagle algorithm is actually quite old; it dates back to 1984. It targets applications that send one byte of data at a time: without Nagle, each such byte goes out immediately in its own packet, and the network is overloaded by a flood of tiny packets.

Today's networks are far better, so Nagle's optimization helps much less. Its deliberate delay can also add latency. In a game, for example, your controls feel smooth, yet Nagle holds each send back a beat. Wouldn't that feel bad?

So nowadays it is often turned off.

So the Nagle algorithm's optimization buys little, and it also contributes to the sticky packet “problem”. Can we solve sticky packets simply by turning the algorithm off?

TCP_NODELAY = 1

  • The receiving application picks up msg1 as soon as it arrives, so there is no sticky packet.
  • When msg2 arrives, the application layer is busy, so msg2 stays in the TCP Recv Buffer.
  • Then msg3 arrives and sits in the TCP Recv Buffer alongside msg2.
  • When the application layer finishes its work and comes to fetch data, it takes msg2 and msg3 together. The figure uses two colors to tell them apart, but in reality they are one string of 0s and 1s, so the packets are still stuck together.

Therefore, even if Nagle algorithm is disabled, the application layer at the receiving end does not read the data in the TCP Recv Buffer in time, and sticky packets still occur.
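This is easy to reproduce on the loopback interface. The sketch below is a demo under stated assumptions, not a benchmark: it uses Python's standard `socket` module, sets `TCP_NODELAY` with `setsockopt`, sends two messages, and has the receiver read only after a short pause to imitate a busy application. The helper name `busy_receiver_demo` is hypothetical.

```python
import socket
import time

MESSAGES = [b"Li Dong", b"Sub-health Terminator"]


def busy_receiver_demo():
    """Send two messages with Nagle disabled, then read them late."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # ephemeral loopback port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    # The TCP_NODELAY = 1 setting: turn off Nagle on the sending side.
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    nodelay = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
    conn, _ = server.accept()
    try:
        for msg in MESSAGES:
            client.sendall(msg)
        time.sleep(0.1)  # the receiving application is "busy"
        expected = sum(len(m) for m in MESSAGES)
        chunks = []
        while sum(len(c) for c in chunks) < expected:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return nodelay, chunks
    finally:
        for s in (conn, client, server):
            s.close()


nodelay, chunks = busy_receiver_demo()
print(nodelay != 0, chunks)
```

On a typical run the first `recv` returns both messages in a single chunk, and nothing in those bytes marks where “Li Dong” ends and “Sub-health Terminator” begins.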

How to deal with sticky packets

The root cause of sticky packets is that message boundaries are unknown. Faced with an “infinite” binary stream, the receiver has no idea how many 0s and 1s make up one message; take too many and messages stick together. In fact, sticky packets are not a TCP problem but a problem caused by users misunderstanding TCP.

As long as every message the sender sends carries information identifying its boundary, the receiver can use that information to find the boundaries and separate the messages.

Common methods are:

  • Add special marks

    A special flag can mark the beginning and end of a message. For example, when 0xFFFFFE or a carriage return is received, the receiver treats it as the start of a new message and keeps reading until the next start flag 0xFFFFFE or an end flag arrives; everything in between is one complete message. Similarly, HTTP chunked transfer encoding sends a message as a series of chunks, ending with a chunk of length zero.

  • Add message length information

    This is usually combined with the special flag above: after the header flag, the message length is included to say how many bytes of message follow. Once that many bytes have arrived, they are taken out and handed to the application layer as one complete message. In practice, HTTP's Content-Length plays a similar role: if the receiver has received fewer bytes than Content-Length, part of the message is still missing, and the receiver waits until the rest arrives or it times out, as explained in more detail in the previous article.
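A minimal sketch of the first method, delimiter-based framing, assuming a newline `\n` as the end marker (the article also mentions 0xFFFFFE or a carriage return). `DelimiterFramer` is a hypothetical helper, not a library class. The key point is that bytes after the last delimiter are kept in a buffer until the rest of the message arrives.

```python
class DelimiterFramer:
    """Split a byte stream into messages that end with a delimiter (hypothetical helper)."""

    def __init__(self, delimiter: bytes = b"\n"):
        self.delimiter = delimiter
        self.buffer = b""

    def feed(self, chunk: bytes) -> list[bytes]:
        """Add newly received bytes; return every message completed so far."""
        self.buffer += chunk
        messages = []
        while True:
            end = self.buffer.find(self.delimiter)
            if end == -1:
                break  # incomplete message: keep it until more bytes arrive
            messages.append(self.buffer[:end])
            self.buffer = self.buffer[end + len(self.delimiter):]
        return messages


framer = DelimiterFramer()
print(framer.feed(b"Li Dong\nSub-"))         # [b'Li Dong']
print(framer.feed(b"health Terminator\n"))  # [b'Sub-health Terminator']
```

This only works if the delimiter never appears inside a message; otherwise the sender has to escape it or switch to a length field.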

If the flag 0xFFFFFE marks the beginning of a packet, aren't you worried that some of the data you send happens to contain those same bytes?

Yes, that is a real concern. So besides the flag, the sender usually adds a checksum field after it (a checksum or CRC computed over the whole message), and the receiver verifies it after receiving the whole message to make sure it is exactly the data the sender sent.
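Putting the start flag, the length field, and a checksum together gives a frame layout like the hypothetical one sketched here: 3 bytes of 0xFFFFFE, a 4-byte big-endian payload length, a 4-byte CRC32 of the payload (computed with Python's `zlib.crc32`), then the payload. The layout, the `MAX_PAYLOAD` cap, and the names `encode` and `LengthFramer` are assumptions for illustration. When the checksum does not match, the decoder assumes it locked onto flag bytes that were really part of some data, skips one byte, and searches again.

```python
import struct
import zlib

MAGIC = b"\xff\xff\xfe"          # start flag, as in the 0xFFFFFE example
HEADER = struct.Struct(">3sII")  # flag, payload length, CRC32 of payload
MAX_PAYLOAD = 1 << 20            # sanity cap on the length field (assumption)


def encode(payload: bytes) -> bytes:
    """Frame one message: flag + length + checksum + payload."""
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload


class LengthFramer:
    """Incrementally decode frames produced by encode() from a byte stream."""

    def __init__(self):
        self.buffer = b""

    def feed(self, chunk: bytes) -> list[bytes]:
        self.buffer += chunk
        messages = []
        while True:
            start = self.buffer.find(MAGIC)
            if start == -1:
                # Keep only bytes that could be the beginning of a split flag.
                self.buffer = self.buffer[-(len(MAGIC) - 1):]
                break
            self.buffer = self.buffer[start:]
            if len(self.buffer) < HEADER.size:
                break  # header incomplete: wait for more bytes
            _, length, crc = HEADER.unpack_from(self.buffer)
            if length > MAX_PAYLOAD:
                self.buffer = self.buffer[1:]  # implausible length: false flag
                continue
            end = HEADER.size + length
            if len(self.buffer) < end:
                break  # payload incomplete: wait for more bytes
            payload = self.buffer[HEADER.size:end]
            if zlib.crc32(payload) == crc:
                messages.append(payload)
                self.buffer = self.buffer[end:]
            else:
                self.buffer = self.buffer[1:]  # checksum mismatch: resync
        return messages


stream = encode(b"Li Dong") + encode(b"Sub-health Terminator")
framer = LengthFramer()
received = []
for i in range(len(stream)):  # worst case: the stream arrives one byte at a time
    received += framer.feed(stream[i:i + 1])
print(received)  # [b'Li Dong', b'Sub-health Terminator']
```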

UDP sticky packet

UDP (User Datagram Protocol) is the other transport-layer protocol alongside TCP. It is a connectionless, unreliable, datagram-based transport-layer communication protocol.

Because it is datagram-based, UDP sends whatever the application layer hands it as one datagram, no matter how long. If the datagram is too long and has to be fragmented, that is the IP layer's business, and efficiency suffers. UDP neither merges nor splits the application's data; it preserves message boundaries. So the receiver never faces, as with TCP, an endless binary stream with no idea where one message ends. Because of this difference between datagrams and byte streams, a TCP sender may write a byte stream 10 times while the receiver reads it in 100 reads, sizing each read to its own processing capacity; but if a UDP sender sends 10 datagrams, the receiver must receive 10 times, and each receive returns one complete datagram.
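The contrast with the TCP demo can be seen with two UDP sockets on loopback. This is a sketch using Python's standard `socket` module; `udp_boundary_demo` is a hypothetical helper, and the 2-second timeout is an arbitrary guard against a lost datagram.

```python
import socket


def udp_boundary_demo(messages):
    """Send each message as its own datagram and read them back one by one."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # ephemeral loopback port
    receiver.settimeout(2.0)         # do not hang forever if a datagram is lost
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for msg in messages:
            sender.sendto(msg, receiver.getsockname())
        # Even if several datagrams are already queued, each recvfrom()
        # returns exactly one of them: boundaries are preserved.
        return [receiver.recvfrom(4096)[0] for _ in messages]
    finally:
        sender.close()
        receiver.close()


print(udp_boundary_demo([b"Li Dong", b"Sub-health Terminator"]))
```

Two sends come back as two separate reads, which is exactly the “10 datagrams in, 10 receives out” behavior described above.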

Let’s look at the IP header first

Note the 16-bit Total Length field: the IP header records the total length of the entire IP packet. Next, let's look at the UDP header.

The UDP header has a 16-bit Length field giving the length of the UDP datagram (the 8-byte header plus the data); call it N. It serves as the datagram boundary: the receiving side can take N bytes starting from the UDP header and get exactly one complete datagram, so different datagrams are clearly distinguished and neither sticky packets nor split packets can occur.

Of course, even without this 16-bit UDP length field, the IP header already contains the total length. When the IP packet (network layer) carries UDP (transport layer) data, that total length covers the UDP header plus the UDP data.

Since the UDP header is a fixed 8 bytes (64 bits: source port, destination port, length, and checksum, 16 bits each), the UDP data length is easy to calculate, which makes the UDP length field redundant:

Length of UDP data = Total IP length - IP header length - UDP header length
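The formula can be checked against the UDP Length field on a hand-built packet. This sketch builds a toy IPv4/UDP packet with Python's `struct` module (checksums left as 0, so it is for illustration only) and computes the data length both ways; `build_packet` and `udp_data_length` are hypothetical names. The IP header length is read from the IHL field, which counts 32-bit words.

```python
import struct

IPV4 = struct.Struct(">BBHHHBBH4s4s")  # 20-byte IPv4 header without options
UDP = struct.Struct(">HHHH")            # source port, dest port, length, checksum
UDP_HEADER_LEN = 8


def build_packet(payload: bytes) -> bytes:
    """Build a toy IPv4/UDP packet (checksums left as 0; illustration only)."""
    udp_len = UDP_HEADER_LEN + len(payload)
    total_len = IPV4.size + udp_len
    ip = IPV4.pack(0x45, 0, total_len, 1, 0, 64, 17, 0,
                   bytes([127, 0, 0, 1]), bytes([127, 0, 0, 1]))
    return ip + UDP.pack(12345, 80, udp_len, 0) + payload


def udp_data_length(packet: bytes) -> tuple[int, int]:
    """Return the UDP data length computed from the IP header and from the UDP header."""
    ver_ihl, _, total_len = struct.unpack_from(">BBH", packet)
    ip_header_len = (ver_ihl & 0x0F) * 4  # IHL counts 32-bit words
    _, _, udp_len, _ = UDP.unpack_from(packet, ip_header_len)
    from_ip = total_len - ip_header_len - UDP_HEADER_LEN
    from_udp = udp_len - UDP_HEADER_LEN
    return from_ip, from_udp


print(udp_data_length(build_packet(b"Li Dong")))  # (7, 7)
```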

Let’s look at the TCP header again

The TCP header has no length field. As with UDP, the data length of the current segment can be obtained with the following formula, where the TCP header length comes from the header's 4-bit Data Offset field (counted in 32-bit words):

TCP data length = Total IP length - IP header length - TCP header length
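Unlike UDP's fixed 8 bytes, the TCP header length varies with options, so it has to be read from Data Offset. This sketch builds toy IPv4/TCP packets with Python's `struct` module (checksums left as 0, illustration only) and applies the formula; `build_segment` and `tcp_data_length` are hypothetical names, and the 12 bytes of NOP options are just padding to show a longer header.

```python
import struct

IPV4 = struct.Struct(">BBHHHBBH4s4s")  # 20-byte IPv4 header, no options
TCP = struct.Struct(">HHIIBBHHH")       # fixed 20-byte part of the TCP header


def build_segment(payload: bytes, tcp_options: bytes = b"") -> bytes:
    """Toy IPv4/TCP packet for illustration; checksums are left as 0."""
    if len(tcp_options) % 4:
        raise ValueError("TCP options must be padded to a multiple of 4 bytes")
    data_offset = (TCP.size + len(tcp_options)) // 4  # header length in 32-bit words
    tcp = TCP.pack(12345, 80, 1, 0, data_offset << 4, 0x18, 65535, 0, 0) + tcp_options
    total_len = IPV4.size + len(tcp) + len(payload)
    ip = IPV4.pack(0x45, 0, total_len, 1, 0, 64, 6, 0, bytes(4), bytes(4))
    return ip + tcp + payload


def tcp_data_length(packet: bytes) -> int:
    """TCP data length = total IP length - IP header length - TCP header length."""
    ver_ihl, _, total_len = struct.unpack_from(">BBH", packet)
    ip_header_len = (ver_ihl & 0x0F) * 4
    # Data Offset is the high 4 bits of byte 12 of the TCP header, in 32-bit words.
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    return total_len - ip_header_len - tcp_header_len


print(tcp_data_length(build_segment(b"Li Dong")))                 # 7
print(tcp_data_length(build_segment(b"Li Dong", b"\x01" * 12)))  # 7 (32-byte TCP header)
```

Knowing the length of one segment still says nothing about where an application message starts or ends, which is the point the next paragraph makes.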

Unlike UDP, though, a TCP sender does not send complete datagrams, only an unstructured run of bytes. Even knowing the length, the receiver cannot treat what it received as a message, because it may well be only part of one complete message.

Why does the UDP header carry a redundant length field?

There has been a fair amount of discussion about this. TCP/IP Illustrated (Volume 2) says it is probably there because it is used to compute the checksum. Others say it is because UDP does not have to run over IP: the total length in the IP header happens to let us compute the UDP data length, but if the protocol beneath UDP were some other network-layer protocol, that calculation would no longer be possible.

However, I think the most important reason is that IP is the network layer while UDP is the transport layer. By the time data reaches the transport layer, the IP header is gone, and the UDP datagrams are stored in the UDP socket buffer. If the application does not fetch them in time, two datagrams sitting there are, at the data level, just a run of 0s and 1s. When reading the first datagram, the UDP header is read first; if it carried no length, how much data should the application take to get exactly one datagram?

The length field in the UDP header thus plays the same role as the boundary information that a TCP application adds to its messages to prevent sticky packets.

Bring all this up in an interview and you'll come across as someone who has really thought it through; the interviewer may well see you as a thoughtful person. Bonus points.

If I've got anything wrong, please forward this article to more people so everyone can remember this pile of nonsense, then follow me and scold me mercilessly in a private message, please!

Does the IP layer have a sticky packet problem?

The IP layer slices large packets too. Does it have a sticky packet problem?

Conclusion first: no. As mentioned above, sticky packets are really a problem caused by the user being unable to tell where messages begin and end.

Let's look at how the IP layer fragments packets.

  • If the message is too long, the IP layer splits it into N fragments according to the MTU. Each fragment records its position in the original packet (offset) and carries the same IP header information, including the same Identification value.

  • Each fragment travels through the network on its own and may take a different route; the fragments are reassembled at the final destination.

  • When the first fragment arrives, the receiver allocates new memory, creates the data structure for the IP packet, and waits for the other fragments to arrive.

  • Once all fragments are in place, the whole packet is handed up to the upper layer (the transport layer) for processing.
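The fragmentation and reassembly steps above can be modeled in a few lines. This is a toy sketch, assuming a 20-byte IPv4 header and keeping only the Identification value, the fragment offset, and the More Fragments flag; `Fragment`, `fragment`, and `reassemble` are hypothetical names, and a real IP stack also handles reassembly timeouts, duplicates, and overlaps.

```python
import random
from dataclasses import dataclass

IP_HEADER = 20  # IPv4 header without options (assumption)


@dataclass
class Fragment:
    ident: int            # Identification: identical in every fragment of a packet
    offset: int           # position in the original payload, in 8-byte units
    more_fragments: bool  # MF flag: False only on the last fragment
    data: bytes


def fragment(ident: int, payload: bytes, mtu: int) -> list[Fragment]:
    """Split a payload so that each fragment fits in the MTU."""
    max_data = (mtu - IP_HEADER) // 8 * 8  # offsets count 8-byte blocks
    if max_data <= 0:
        raise ValueError("MTU too small")
    frags = []
    for start in range(0, len(payload), max_data):
        more = start + max_data < len(payload)
        frags.append(Fragment(ident, start // 8, more, payload[start:start + max_data]))
    return frags


def reassemble(frags: list[Fragment]) -> bytes:
    """Rebuild the payload from fragments received in any order."""
    frags = sorted(frags, key=lambda f: f.offset)
    if not frags or len({f.ident for f in frags}) != 1:
        raise ValueError("expected fragments of exactly one packet")
    if frags[-1].more_fragments:
        raise ValueError("last fragment has not arrived yet")
    buf = bytearray()
    for f in frags:
        if f.offset * 8 != len(buf):
            raise ValueError("gap or overlap between fragments")
        buf += f.data
    return bytes(buf)


payload = bytes(range(256)) * 16  # 4096 bytes
frags = fragment(ident=42, payload=payload, mtu=1500)
random.shuffle(frags)  # fragments may take different routes and arrive out of order
print(len(frags), reassemble(frags) == payload)  # 3 True
```

Every fragment says exactly where it belongs and whether more follow, so reassembly never has to guess a boundary.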

Throughout the whole process, from slicing by length to reassembling fragments into a packet, the IP layer cares only about delivery, not about message boundaries or content, so no sticky packets arise.

The IP layer says: I just carry the data from the sender to the receiver; I don't know what's in it.

It sounds like: “I don't care if the product requirements are dumb. I just build them; I don't ask, and I don't argue.” Every good slacker programmer should learn from this. Respect.

Conclusion

The root cause of the sticky packet problem is that developers misunderstand TCP's byte-stream-oriented way of transmitting data. It is not TCP's problem but the developers'.

  • TCP delivers whatever the sender writes to the receiver as a byte stream, and one read of that stream may contain part of the previous message. Applications need to add information identifying message boundaries to their messages; without it, sticky packets may appear.
  • Sticky packets are related to the Nagle algorithm, but disabling Nagle does not solve the problem.
  • UDP is a datagram-based transport protocol and does not have sticky packets.
  • The IP layer also fragments data, but because it does not care what is in the message, it has no sticky packet problem.
  • A TCP sender can write a byte stream 10 times and the receiver can read it in 100 reads; if a UDP sender sends 10 datagrams, the receiver must receive them in 10 reads.

The packets were only being assembled and split the way TCP does things. If they did anything wrong, it was only the mistake every packet makes.

Finally, Li Dong lost his job and Xiao Bai said
