What are we reading and writing when we read and write sockets?

1. Introduction

This article follows the previous one, “Introduction to Brain-dead Network Programming (1) : Learn TCP Three Handshakes and Four Waves with Animation”, and continues the series. Most programmers are very familiar with the concept of a socket: it is the foundation of network programming, and both TCP and UDP messages rely on it. The web servers we know are built on it, and so are the MySQL relational database and the Redis in-memory database. We rely on it to chat on WeChat and to play online games, and you can read this article only because it quietly carries the traffic. As before, this article uses animations to explain the topic in a “brain-dead” way (ha ha), in the hope that readers can understand simply and intuitively what socket reads and writes really do. Note: if your network is slow, the GIF animations may take a while to load, so please be patient.

2. About the author

Qian Wenpin (Lao Qian) graduated from Huazhong University of Science and Technology with a degree in computer science and technology. He has 10 years of experience in distributed and high-concurrency Internet systems and currently works as a senior back-end engineer at Ireader Technology. He is proficient in Java, Python, Golang and other languages, and has developed games, built websites, written a message push system and MySQL middleware, and implemented open source ORM, web, and RPC frameworks.

The author’s GitHub: github.com/pyloque

3. Series of articles

This is the second article in a series, which is outlined as follows:

Introduction to Brain-dead Network Programming (1) : Learn TCP Three Handshakes and Four Waves with Animation
Introduction to Brain-dead Network Programming (II) : What Are We Reading and Writing When We Read and Write Sockets? (this article)

4. Simple process understanding of Socket reading and writing

When a client and a server communicate over TCP, the client encapsulates a request object req, serializes it into a byte array, and sends that byte array to the server through a socket. The server reads the byte array from its socket, deserializes it back into the request object req, and processes it. It then generates a response object res, serializes it into a byte array, and sends that array to the client through the socket. The client reads the byte array from its socket and deserializes it into the response object.
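
The round trip described above can be sketched in a few lines of Python. This is only an illustration, not the author's code: JSON as the serialization format, the `encode`/`decode`/`round_trip` names, and the upper-casing "processing" step are all assumptions, and a local `socket.socketpair()` stands in for a real TCP connection.

```python
import json
import socket


def encode(obj: dict) -> bytes:
    """Serialize a message object into a byte array (JSON is an assumption)."""
    return json.dumps(obj).encode("utf-8")


def decode(data: bytes) -> dict:
    """Deserialize a byte array back into a message object."""
    return json.loads(data.decode("utf-8"))


def round_trip(req: dict) -> dict:
    # A connected pair of local sockets stands in for client and server.
    client, server = socket.socketpair()
    try:
        # Client: serialize req and write the bytes to its socket.
        client.sendall(encode(req))
        # Server: read the bytes and deserialize them into req.
        # Assumes this small message arrives in a single recv call.
        received = decode(server.recv(4096))
        # Server: process the request (hypothetical logic) and build res.
        res = {"echo": received["msg"].upper()}
        # Server: serialize res and write it back.
        server.sendall(encode(res))
        # Client: read the bytes and deserialize them into res.
        return decode(client.recv(4096))
    finally:
        client.close()
        server.close()


response = round_trip({"msg": "hello"})
print(response)
```

A framework would normally hide the `encode` and `decode` calls, which is why application code only ever seems to see request and response objects.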

Communication frameworks usually hide the serialization process, so what we see is what the figure above shows: the request object req and the response object res travelling back and forth between the client and the server.

You may think this process is simple and easy to understand, but the sequence of events behind it is more complicated than most readers would imagine, and the actual communication process is far more involved than the picture above. You may ask: do we really need to understand it that deeply? Can’t we just use it?

Years of experience in the Internet industry have taught me that if you do not understand the underlying mechanism, you will not understand why socket reads and writes run into all sorts of strange problems: why they sometimes block and sometimes do not, why they sometimes fail with an error, why sticky-packet and half-packet problems occur, and what NIO actually is. Is it a particularly new technology? Answering these questions requires an understanding of the underlying mechanics.

5. Detailed process analysis of Socket reading and writing

To help you understand the lower layers of communication, I took the time to create the following animation. It does not cover every low-level detail, but it is good enough for understanding how sockets work. Please study it closely; the rest of this article is built around it.

The socket we normally use is really just a reference (an object ID); the socket object itself lives in the operating system kernel. Inside this socket object are two important buffer structures, the read buffer and the write buffer, both of which are arrays of finite size.
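
Both points can be observed from user code. The sketch below, which assumes a platform where Python's standard `socket` module exposes `SO_SNDBUF` and `SO_RCVBUF`, prints the integer handle behind a socket object and the sizes the kernel reports for its write and read buffers. The exact numbers depend on the operating system and its configuration.

```python
import socket

# Creating the socket asks the kernel to allocate the real socket object.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# What user code holds is only a small integer handle (a file descriptor).
fd = sock.fileno()

# The kernel reports the sizes of the two finite buffers.
send_buf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)  # write buffer
recv_buf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)  # read buffer

print(f"handle={fd} write_buffer={send_buf} bytes read_buffer={recv_buf} bytes")

sock.close()
```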

When we write a byte array (the serialized request message object req) to the client’s socket, we are copying the byte array into the write buffer of the socket object in kernel space. The kernel’s network module has a separate thread that continuously copies data from the write buffer to the NIC hardware. The NIC then puts the data on the network cable, and it travels through a series of routers and switches until it reaches the server’s NIC.

Similarly, the network module of the server’s kernel has a separate thread that continuously copies received data into the socket’s read buffer, where it waits for the user layer to read it. Finally, the server’s user process calls the read method on its socket reference to copy the data from the read buffer into user program memory, where it is deserialized into the request object and processed. The server then sends the response object to the client along the reverse path, which is not described again here.

5.1 Detailed process: blocking

Note that the write buffer has limited space, so if the application writes to the socket too quickly, the buffer fills up. Once it is full, a write operation blocks until enough space becomes available. With NIO (non-blocking IO), a write never blocks: it writes as much as currently fits and returns, and the user program holds on to whatever was not written and tries again later. Note also that the read buffer can be empty. A blocking socket read (typically into a fixed-length byte array) then waits for data to arrive, and code that needs the whole array filled keeps waiting until enough bytes are in the read buffer. With NIO, a read takes whatever is currently available without blocking, and if that is not enough, the program tries again later.
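
The non-blocking behaviour can be tried out with Python's standard `socket` module, used here as a stand-in for NIO. This is a sketch under assumptions: a local `socket.socketpair()` replaces a real network connection, and the 64 KiB chunk size and the safety limit are arbitrary choices.

```python
import socket

a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

# Reading from an empty read buffer: a non-blocking read reports
# "nothing available" immediately instead of waiting.
try:
    b.recv(1024)
    empty_read_blocked = False
except BlockingIOError:
    empty_read_blocked = True

# Writing faster than the peer reads: each send accepts only what fits,
# and once the buffers are full it raises instead of waiting.
chunk = b"x" * 65536
total_written = 0
write_buffer_full = False
while total_written < 64 * 1024 * 1024:  # safety limit for the sketch
    try:
        total_written += a.send(chunk)   # may accept fewer bytes than offered
    except BlockingIOError:
        write_buffer_full = True         # a real program would retry later
        break

print(f"accepted {total_written} bytes before the buffer was full")

a.close()
b.close()
```

In a real program the unsent remainder is kept in user memory and the write is retried once the socket becomes writable again, usually under the control of an event loop or a selector.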

5.2 Detailed process: ACK

Does the diagram above show the entire socket process? Obviously not: the data acknowledgement process (ACK) is not shown at all. For example, when the contents of the write buffer are copied to the network card, they are not removed from the write buffer immediately; they are removed only after the peer’s ACK arrives. If the network is poor and ACKs are slow to arrive, the write buffer fills up quickly.
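
The effect of delayed ACKs on the write buffer can be illustrated with a toy model. The `ToyWriteBuffer` class below is purely hypothetical and is not how a kernel implements it; it only captures the rule that sent bytes keep occupying space until they are acknowledged.

```python
class ToyWriteBuffer:
    """Toy model of a socket write buffer (hypothetical, not kernel code)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = bytearray()  # every byte not yet acknowledged
        self.sent = 0            # bytes at the front already handed to the NIC

    def free_space(self) -> int:
        return self.capacity - len(self.data)

    def write(self, payload: bytes) -> int:
        """Accept as many bytes as fit and return how many were accepted."""
        n = min(len(payload), self.free_space())
        self.data += payload[:n]
        return n

    def transmit(self) -> bytes:
        """Hand unsent bytes to the NIC, but keep them until they are ACKed."""
        out = bytes(self.data[self.sent:])
        self.sent = len(self.data)
        return out

    def ack(self, n: int) -> None:
        """The peer acknowledged n bytes, so their space can be reclaimed."""
        n = min(n, self.sent)
        del self.data[:n]
        self.sent -= n


buf = ToyWriteBuffer(capacity=8)
accepted = buf.write(b"abcdefghij")  # only 8 of the 10 bytes fit
wire = buf.transmit()                # the bytes go out on the wire...
free_after_send = buf.free_space()   # ...but still occupy the buffer
buf.ack(5)                           # the ACK for 5 bytes arrives
free_after_ack = buf.free_space()    # only now is space released
print(accepted, wire, free_after_send, free_after_ack)
```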

5.3 Detailed process: Packet headers

Careful readers may have noticed that the message req in the figure changes to uppercase REQ when it is copied to the network card. Why? Because the two are no longer exactly the same thing. The kernel’s network module transmits the contents of the buffer in chunks: if the contents are too large, they are split into multiple independent small packets. In addition, extra header information is added to each packet, such as the source and destination network card addresses and the packet’s sequence number, and the receiving end has to reorder and reassemble the packets before putting the data into the read buffer. These intricate details are very difficult to animate.
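
The split, tag, and reassemble idea can be sketched as a toy model. Everything here is hypothetical: the `Segment` type carries only a sequence number (a byte offset), whereas real headers hold far more, and the 8-byte payload limit is chosen just to force a split.

```python
import random
from typing import List, NamedTuple


class Segment(NamedTuple):
    seq: int        # header field: offset of this payload in the stream
    payload: bytes  # a slice of the original buffer contents


def segment(data: bytes, max_payload: int) -> List[Segment]:
    """Split buffer contents into small packets, each tagged with a header."""
    return [
        Segment(offset, data[offset:offset + max_payload])
        for offset in range(0, len(data), max_payload)
    ]


def reassemble(segments: List[Segment]) -> bytes:
    """Reorder packets by sequence number and join their payloads."""
    ordered = sorted(segments, key=lambda s: s.seq)
    return b"".join(s.payload for s in ordered)


message = b"REQ: a request larger than one packet"
segments = segment(message, max_payload=8)

# Simulate out-of-order arrival with a fixed seed for repeatability.
shuffled = list(segments)
random.Random(7).shuffle(shuffled)

restored = reassemble(shuffled)
print(len(segments), restored == message)
```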

5.4 Detailed process: Rate

Another question: what happens if the read buffer is full when the NIC receives a message from the peer? The usual approach is to discard the message and not send an ACK. When the peer finds that no ACK arrives, it sends the message again. Why would the buffer be full? Because the receiver processes messages slowly while the sender produces them too fast. For this case the TCP protocol has a dynamic window adjustment algorithm that limits the sender’s rate so that sending and receiving keep pace. With UDP, a lost message is simply gone. There are many more intricate details inside the implementation of network protocols to explore; those are saved for later analysis.
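
The idea behind window-based rate limiting can be shown with a very small simulation. This is a toy model under strong assumptions, not TCP's actual algorithm: time advances in ticks, the receiver advertises its free read buffer space as the window, and the hypothetical `simulate` function ignores loss, latency, and congestion control.

```python
from typing import Tuple


def simulate(total: int, capacity: int, app_read_per_tick: int) -> Tuple[int, int, int]:
    """Return (peak buffer usage, bytes delivered, ticks taken)."""
    buffered = 0   # bytes waiting in the receiver's read buffer
    sent = 0       # bytes the sender has put on the wire so far
    delivered = 0  # bytes the receiving application has consumed
    peak = 0
    ticks = 0
    while delivered < total:
        window = capacity - buffered       # receiver advertises its free space
        burst = min(window, total - sent)  # sender never exceeds the window
        sent += burst
        buffered += burst
        peak = max(peak, buffered)
        consumed = min(app_read_per_tick, buffered)  # the slow application reads
        buffered -= consumed
        delivered += consumed
        ticks += 1
    return peak, delivered, ticks


peak, delivered, ticks = simulate(total=100, capacity=16, app_read_per_tick=4)
print(peak, delivered, ticks)
```

Because the sender is capped by the advertised window, the read buffer never overflows and nothing has to be discarded; the sender simply slows down to the pace of the receiving application.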