This article approaches sockets through the following topics: what a socket is, how a socket is created, how it connects and sends and receives data, how it disconnects, and how it is deleted.
What is a Socket and how to create it
A data packet is generated by the application and enters the protocol stack, which wraps it in the various packet headers. The operating system then invokes the NIC driver, which directs the hardware to send the data to the peer host. A rough sketch of the process is shown below.
As we all know, the protocol stack is a stack of protocols located inside the operating system, including TCP, UDP, ARP, ICMP, and IP. Generally, each protocol is designed to solve a particular problem. For example, TCP is designed to transmit data reliably; UDP is designed to transmit data with little overhead and high transmission efficiency; ARP is designed to look up the physical MAC address that belongs to an IP address; ICMP is designed to report errors back to hosts; and IP is designed to interconnect hosts on a large scale.
Data generated by applications such as browsers, email clients, and file transfer servers is transmitted through a transport layer protocol. The application does not deal with the transport layer directly; instead, a component connects the application layer to the transport layer, and that component is the Socket.
In the figure above, the application contains a Socket and a resolver, whose job is to query the DNS server for the target IP address.
Below the application is the interior of the operating system, which contains the protocol stack. Below the operating system is the NIC driver, which controls the NIC hardware; the hardware does the actual sending and receiving.
Within the operating system there is a storage area that records the control information used to control communication. This control information is in effect the socket entity; put another way, the memory space that holds the control information is the socket entity.
That may still sound abstract, so let's use the netstat command to see what a socket looks like.
Type the following at the Windows command prompt:
netstat -ano
# netstat displays the contents of sockets; the -ano options are optional
# -a displays not only the sockets that are communicating, but all sockets, including those that have not started communicating
# -n displays IP addresses and port numbers in numeric form
# -o displays the PID of the program that owns each socket
My computer will produce the following result.
Each row in the figure corresponds to one socket, and each column is one element of a tuple, so a socket can be described as a five-tuple (protocol, local address, foreign address, state, PID). It is sometimes described as a four-tuple instead, which leaves out the protocol.
For example, in the first row of the figure, the protocol is TCP, and both the local address and the foreign address are 0.0.0.0. This indicates that communication has not started and the IP addresses have not been determined. The local port is known to be 135, but the remote port is not yet known, and the state is LISTENING, which means the application has opened the socket and is waiting for a remote host to connect. The final element is the PID, or process identifier, which, like a person's ID number, uniquely identifies a process.
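A listening socket like the one in that row can be reproduced with a few lines of Python. This is a minimal sketch, not the program behind port 135: the helper name open_listening_socket is made up for illustration, and port 0 is used so the operating system picks any free port.

```python
import socket

def open_listening_socket(port=0):
    """Create a TCP socket in the listening state on all local interfaces."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", port))  # port 0 asks the OS to choose a free port
    srv.listen()                 # the socket now waits for remote hosts
    return srv

srv = open_listening_socket()
local_ip, local_port = srv.getsockname()
# No peer exists yet, so only the local half of the tuple is known.
print("TCP", f"{local_ip}:{local_port}", "LISTENING")
srv.close()
```

While such a program is running, netstat -ano should show a matching row whose local address is 0.0.0.0 followed by the chosen port.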
Now that you have a basic understanding of sockets, take a drink, take a break, and let’s continue exploring sockets.
Now I have a question, how is the Socket created?
The socket is created along with the application. The application contains a socket component, and when the application starts, it calls socket to request the creation of a socket. The protocol stack creates the socket in response to this request. It first allocates the memory that the socket needs, which is like preparing a container for the control information. An empty container is of no use, so the stack then writes the initial control information into it. Without the memory there would be nowhere to store the control information, so both steps, allocating the memory and writing the control information, are indispensable. At this point the socket has been created.
Once the socket is created, a socket descriptor is returned to the application. The descriptor works like a number tag that distinguishes one socket from another, and the application must supply it whenever it asks the protocol stack to send or receive data.
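The descriptor can be observed directly. The sketch below uses Python's standard socket module, where the socket object wraps the descriptor and fileno() exposes it; the helper name create_socket is an assumption made for this example.

```python
import socket

def create_socket():
    """Ask the OS for a new TCP socket and return it with its descriptor."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    return sock, sock.fileno()

first, first_fd = create_socket()
second, second_fd = create_socket()
# Each socket gets its own descriptor, so the two numbers differ.
print(first_fd, second_fd)
first.close()
second.close()
```

After close is called the descriptor is released, and fileno() on the closed object returns -1.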
Socket connection
After the socket is created, its ultimate purpose is to send and receive data. Before that can happen there is a connect step, which is the process of establishing a connection. This connection is not a physical one: no pipe is actually inserted between the two computers.
Rather, it is the process by which an application on one host transfers information to another host over the network medium, following the TCP/IP protocol standards.
Right after the socket is created, it holds no data and has no peer to communicate with. In this state, even if the client application hands data to the protocol stack, the stack does not know where to send it. The browser therefore needs to look up the server's IP address from the URL, and the protocol that does this work is DNS. Once the destination host has been looked up, its IP address is passed to the protocol stack.
The server creates a socket just as the client does, but it too does not know who its peer will be, so the client must tell the server the necessary information about itself: its IP address and port number.
Now both parties have the information needed to establish a connection, and only one thing is still missing. When the parties receive data they need a place to store it. That place is the buffer, which is a region of memory. With buffers in place, data can be sent and received.
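The lookup step can be tried from Python through the system resolver. This is only a sketch: the helper name resolve is made up here, and "localhost" stands in for a real server name so that the example does not depend on an external DNS server.

```python
import socket

def resolve(host, port):
    """Return the IPv4 addresses the system resolver reports for host."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry ends with a (ip, port) pair; keep only the IP address.
    return [info[4][0] for info in infos]

print(resolve("localhost", 80))
```

The address returned here is what the application later hands to the protocol stack when it connects.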
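The size of these buffers can be inspected per socket. The following sketch reads the send and receive buffer sizes through the standard SO_SNDBUF and SO_RCVBUF options; the helper name buffer_sizes is an assumption, and the actual numbers depend on the operating system.

```python
import socket

def buffer_sizes():
    """Return the (send, receive) buffer sizes in bytes of a new TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        send_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        recv_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return send_size, recv_size

send_size, recv_size = buffer_sizes()
print("send buffer:", send_size, "bytes; receive buffer:", recv_size, "bytes")
```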
OK, now suppose the client wants to send a piece of data to the server. What should it do?
First, the client application calls the connect function in the Socket library, providing the socket descriptor and the server's IP address and port number.
connect(<descriptor>, <server IP address and port number>)
This information is passed to the TCP module in the protocol stack, which builds the connection request and adds the TCP header. It then goes to the IP module, which adds the IP header, then to the link layer, which adds the frame header, and finally travels to the server over the network medium. The server parses the frame header, the IP header, and the TCP header in turn to find the corresponding socket. On receiving the request, the socket records the relevant information and changes its state to connecting. Once the request has been processed, the server's TCP module returns a response, which travels back in the same way the client's request did.
Control information plays a key role throughout a complete request and response (more on this later). The most important pieces are the following flags in the TCP header:
- SYN is short for synchronize. The client first sends a SYN packet to ask the server to establish a connection.
- ACK is short for acknowledgement. It confirms that a packet, such as the SYN packet, has been received.
- FIN is short for finish and indicates that the client or server wants to terminate the connection.
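In the TCP header each of these is a single bit in the flags byte: FIN is 0x01, SYN is 0x02, and ACK is 0x10. The sketch below decodes a flags byte into names; the function name decode_flags is made up for illustration, and the other flag bits (RST, PSH, URG) are left out to keep it short.

```python
# Bit positions of the three flags discussed here, as defined for the TCP header.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "ACK": 0x10}

def decode_flags(flag_byte):
    """Return the names of the flags that are set in flag_byte."""
    return [name for name, bit in TCP_FLAGS.items() if flag_byte & bit]

print(decode_flags(0x02))  # ['SYN']         connection request
print(decode_flags(0x12))  # ['SYN', 'ACK']  reply to the request
print(decode_flags(0x11))  # ['FIN', 'ACK']  termination request
```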
Because the network environment is complex and changeable, packets are often lost. The two parties therefore need to confirm that each other's packets have arrived, and they judge this by the ACK number.
(The connection between the two parties is established through the three-way handshake. For a detailed introduction to the three-way handshake, see the author's article TCP basics.)
Once all the packets that establish the connection have been sent and received normally, the sockets are ready to send and receive data. At this point the two sockets can be thought of as joined by a pipe, although of course no pipe actually exists. With the connection established, the protocol stack's part of the connect operation is finished: connect has completed and control returns to the application.
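The whole connect step can be tried on a single machine over the loopback interface. This is a minimal sketch under stated assumptions: the function name connect_to_local_server is hypothetical, the server is a throwaway listener on 127.0.0.1 with an OS-chosen port, and in Python the socket object stands in for the descriptor.

```python
import socket

def connect_to_local_server():
    """Connect a client socket to a local listener and compare both views."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    server_addr = server.getsockname()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server_addr)        # returns once the connection is set up
    conn, peer_addr = server.accept()  # the server-side socket for this client

    # Each side now knows the other's IP address and port number.
    client_sees_server = client.getpeername() == server_addr
    server_sees_client = peer_addr == client.getsockname()

    conn.close()
    client.close()
    server.close()
    return client_sees_server, server_sees_client

print(connect_to_local_server())  # (True, True)
```

Before connect the client socket has no peer; after it returns, both sockets hold the peer's address, which is the control information described above.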
Sending and receiving data
After control returns from connect to the application, the data sending and receiving phase begins. It starts when the application calls write to hand data to the protocol stack, which then carries out the send operation.
The protocol stack does not care what the application's data means, since to the stack it is just a binary sequence. Rather than sending the data as soon as it receives it, the protocol stack places it in the send buffer and waits for the application to hand over the next piece.
Why is the data stored in a buffer instead of being sent right away?
If data were sent as soon as it arrived, a large number of small packets might be sent, which lowers network efficiency. So the protocol stack accumulates a certain amount of data before sending it. Operating systems of different types and versions differ on how much data the stack accumulates, but all of them follow these criteria:
- The first criterion is the amount of data that each network packet can hold. The protocol stack judges this by the MTU, which is the maximum length of a network packet. That maximum includes the headers, so the space left for data is the MTU minus the header length, and the resulting maximum data length is called the MSS.
- The other criterion is time. When the application produces data slowly, the buffer fills slowly, and waiting until a full MSS has accumulated could delay each send for too long. In that case the data should be sent even though it has not reached the MSS.
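The relationship between MTU and MSS is simple arithmetic. The sketch below assumes the common Ethernet MTU of 1500 bytes and IP and TCP headers without options, which are 20 bytes each; the function name mss_for is made up for this example.

```python
IP_HEADER_BYTES = 20   # IPv4 header without options
TCP_HEADER_BYTES = 20  # TCP header without options

def mss_for(mtu, ip_header=IP_HEADER_BYTES, tcp_header=TCP_HEADER_BYTES):
    """Return the largest data length that fits in one packet of size mtu."""
    return mtu - ip_header - tcp_header

print(mss_for(1500))  # 1460
```

If either header carries options, the headers grow and the space left for data shrinks accordingly.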
The TCP specification does not say how to balance these two criteria. If length takes precedence, network efficiency is high, but sends may be delayed while the buffer fills; if time takes precedence, delay is low, but network efficiency drops.
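One way to picture the two criteria working together is a small decision function. This is only an illustrative sketch: the function name should_send, the 1460-byte MSS, and the 200 ms limit are assumptions chosen for the example, not values taken from any particular operating system.

```python
def should_send(buffered_bytes, waited_ms, mss=1460, max_wait_ms=200):
    """Decide whether the send buffer should be flushed now.

    Length criterion: a full MSS worth of data is waiting.
    Time criterion: the data has waited max_wait_ms (hypothetical limit).
    """
    if buffered_bytes == 0:
        return False  # nothing to send
    return buffered_bytes >= mss or waited_ms >= max_wait_ms

print(should_send(1460, 0))    # True: a full packet is ready
print(should_send(100, 10))    # False: keep waiting for more data
print(should_send(100, 200))   # True: waited long enough, send anyway
```

Lowering max_wait_ms favors low delay, while raising it favors fuller packets, which is the trade-off described above.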
After a while…
Suppose the length-first rule is in use and the buffer is full. The stack is about to send the data, but it cannot send such a (relatively) large amount at once. What happens?
In this case the data in the send buffer is longer than the MSS, so it is split into pieces of MSS size. Each piece is given TCP, IP, and Ethernet headers and placed in its own network packet.
At this point the network packets are ready to be sent to the server, but the send operation is not over, because the server has not yet confirmed that it received them. So after the client sends a packet, the server needs to acknowledge it.
When the TCP module splits the data, it calculates each piece's offset, that is, how many bytes the piece lies from the very beginning of the data, and writes this value into the TCP header as the sequence number. The sequence number identifies the data carried by each network packet, and the server uses it to confirm what it has received.
The server confirms the data packets sent by the client. After confirming, it generates a sequence number and an ACK number and sends them to the client. The client confirms these in turn and sends an ACK number back to the server.
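The splitting step can be sketched in a few lines. The function name split_into_segments is hypothetical, the default MSS of 1460 bytes assumes an Ethernet MTU of 1500 with 20-byte IP and TCP headers, and the headers themselves are not built here.

```python
def split_into_segments(data, mss=1460):
    """Split data into (offset, chunk) pairs of at most mss bytes each."""
    return [
        (offset, data[offset:offset + mss])
        for offset in range(0, len(data), mss)
    ]

segments = split_into_segments(b"x" * 4000)
print([(offset, len(chunk)) for offset, chunk in segments])
# [(0, 1460), (1460, 1460), (2920, 1080)]
```

Only the last piece may be shorter than the MSS; joining the chunks in offset order gives back the original data.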
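The numbering can be worked through with small values. In TCP the acknowledgement number is the sequence number of the next byte the receiver expects, which is the segment's sequence number plus its data length, wrapping at 32 bits. The function names and the starting value 1 below are assumptions for illustration.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around

def expected_ack(seq, payload_len):
    """Return the ACK number a receiver sends after getting this segment."""
    return (seq + payload_len) % SEQ_SPACE

def number_segments(first_seq, lengths):
    """Return (sequence number, expected ACK number) for each segment."""
    numbered = []
    seq = first_seq
    for length in lengths:
        ack = expected_ack(seq, length)
        numbered.append((seq, ack))
        seq = ack  # the next segment starts where this one ended
    return numbered

print(number_segments(1, [1460, 1460, 1080]))
# [(1, 1461), (1461, 2921), (2921, 4001)]
```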
Let’s look at the actual process.
First, the client calculates an initial sequence number when connecting and sends it to the server. The server calculates an acknowledgement number from this initial value and returns it to the client. The packet carrying the initial value may be lost in transit, which is why the server must return an acknowledgement number after receiving it. At the same time, the server calculates its own initial sequence number for the server-to-client direction and sends it to the client. The client then calculates an acknowledgement number from the server's initial value and sends it to the server. At this point the connection is established and data can be sent and received.
In the data sending and receiving phase, both parties can send requests and responses at the same time, and acknowledge each other's data at the same time.
This acknowledgement mechanism is very powerful: it lets the sender verify that a packet has reached the recipient and resend it if it has not, so that any error that occurs in the network can be detected and remedied.
Network cards, hubs, and routers have no error-recovery mechanism; they simply drop a packet once an error is detected. Applications do not have such a mechanism either. Only the TCP module in the TCP/IP stack provides it.
Because the network environment is complex and changeable and packets get lost, there are rules governing how sequence numbers and acknowledgement numbers are sent, and TCP manages acknowledgements through windows. This article does not go into those details; you can find them in the author's article TCP basics.
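The exchange of initial values can be modeled without any networking. In the sketch below the function name three_way_handshake and the dictionary layout are made up for illustration, and random 32-bit values stand in for the initial sequence numbers, which real protocol stacks choose in their own ways. A SYN consumes one sequence number, so each acknowledgement number is the peer's initial value plus one.

```python
import random

SEQ_SPACE = 2 ** 32  # sequence numbers wrap around at 32 bits

def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three packets of the handshake as simple dictionaries."""
    if client_isn is None:
        client_isn = random.getrandbits(32)  # stand-in for the real choice
    if server_isn is None:
        server_isn = random.getrandbits(32)
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {
        "flags": "SYN+ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_SPACE,  # confirms the client's SYN
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % SEQ_SPACE,
        "ack": (server_isn + 1) % SEQ_SPACE,  # confirms the server's SYN
    }
    return [syn, syn_ack, ack]

for packet in three_way_handshake(client_isn=100, server_isn=300):
    print(packet)
```

With the values 100 and 300, the server acknowledges with 101 and the client acknowledges with 301.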
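The resend-if-unacknowledged idea can be simulated in a few lines. This is a deliberately simplified sketch that sends one packet at a time and waits for its acknowledgement, which is not how TCP schedules real traffic; the function name send_with_retransmit and the drop_plan parameter are assumptions made for the example.

```python
def send_with_retransmit(packets, drop_plan, max_tries=5):
    """Deliver packets in order, resending any that are not acknowledged.

    drop_plan is a set of (packet index, attempt number) pairs that the
    simulated network loses. Returns the delivered packets and the number
    of attempts each one needed.
    """
    delivered = []
    attempts_used = []
    for index, packet in enumerate(packets):
        for attempt in range(1, max_tries + 1):
            if (index, attempt) in drop_plan:
                continue  # no acknowledgement came back, so send again
            delivered.append(packet)
            attempts_used.append(attempt)
            break
        else:
            raise TimeoutError(f"packet {index} was never acknowledged")
    return delivered, attempts_used

# The second packet is lost twice before it gets through.
print(send_with_retransmit(["a", "b", "c"], {(1, 1), (1, 2)}))
# (['a', 'b', 'c'], [1, 3, 1])
```

Despite the two losses, the receiver ends up with every packet in order, which is the guarantee that acknowledgement and retransmission provide.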
Disconnect
When the two parties no longer need to send or receive data, the connection is closed. Different applications disconnect at different times. Take the Web as an example: the browser sends a request message to the Web server, and the Web server returns a response message. At that point the sending and receiving of data is complete.
Either party can initiate the disconnection, and it does so by calling close in the Socket library. Let's take the case where the server disconnects first. The server's protocol stack generates a TCP header for disconnection, which in practice means setting the FIN bit, and then asks the IP module to send it to the client.
On receiving the FIN from the server, the client's protocol stack marks the socket as entering the disconnecting state. The client then returns an acknowledgement number to the server, which completes the first step of the disconnection. After that, the application calls read to read data. Once all of the server's data has been read, the protocol stack notifies the client application that the data from the server has been completely received.
After receiving all the data from the server, the client calls close to end its own sending and receiving. The client's stack then generates a FIN and sends it to the server, and after a while the server returns an ACK number.
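The server-first disconnection can be observed over the loopback interface. In this sketch the function name server_closes_first is hypothetical, both ends run in one process for simplicity, and recv plays the role of read; an empty result from recv is how the application learns that the peer has finished sending.

```python
import socket

def server_closes_first():
    """Let the server send a response and close; return what the client read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)  # avoid hanging forever if something goes wrong
    client.connect(listener.getsockname())
    conn, _ = listener.accept()

    conn.sendall(b"response")
    conn.close()  # server calls close, so its stack sends a FIN

    received = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:  # empty read: all of the server's data has arrived
            break
        received += chunk

    client.close()  # client calls close in turn, sending its own FIN
    listener.close()
    return received

print(server_closes_first())  # b'response'
```

The client still receives everything the server sent before closing, and only then closes its own side.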
Delete the socket
When the communication is complete, the socket used for it is no longer needed and can be deleted. However, the socket is not deleted immediately; it is deleted after a waiting period.
This waiting period guards against mishaps, the most common of which is the loss of the acknowledgement number returned by the client. How long to wait depends on how packets are retransmitted.
This is Socket!