Whether you work on Web front ends, Web back ends, search engines, or big data, almost every field of software development involves network programming. Take Web server development as an example. Besides the Web protocol itself running over the network, the server usually needs to connect to a database, and that connection normally goes over the network to a database server or a database cluster. If the load is too high, a cache cluster is added as well, and it too is reached over the network.

Most of us studied network programming and network protocols at school, but the exact relationship between the two can still be confusing. So let us first look at the two concepts: network programming and protocols.

We know that network protocols form a layered protocol family, that is, a set of protocols stacked from the bottom up, each responsible for its own function. So what is a protocol? The word means something that several parties have agreed on together. Put simply, a protocol is a set of rules for communication between multiple parties, and a network protocol is the set of rules by which compute nodes in a network interact and communicate. Compared with daily life, a protocol is like a language, such as Mandarin Chinese. If two people both speak Mandarin, each understands what the other is saying. If one person speaks Sichuan dialect and the other speaks Zhejiang dialect, communication is almost impossible. The same is true of network protocols, which formalize data formats so that computers can understand each other’s intentions.
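To make "agreeing on a data format" concrete, here is a minimal sketch of a made-up message format, not any real protocol: a 2-byte big-endian length followed by UTF-8 text. The function names `encode_message` and `decode_message` are hypothetical and exist only for illustration.

```python
import struct

def encode_message(text: str) -> bytes:
    """Prefix the UTF-8 payload with its length as a 2-byte big-endian integer."""
    payload = text.encode("utf-8")
    return struct.pack("!H", len(payload)) + payload

def decode_message(data: bytes) -> str:
    """Read the 2-byte length, then decode exactly that many payload bytes."""
    (length,) = struct.unpack("!H", data[:2])
    return data[2:2 + length].decode("utf-8")

wire = encode_message("hello")
print(wire)                  # b'\x00\x05hello'
print(decode_message(wire))  # hello
```

Both sides can exchange messages only because they share the same rule for where the length ends and the text begins, which is exactly the role a protocol plays.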

The rest of this article introduces network programming, which is also called socket programming. The usual translation of “socket” is a literal one, but its meaning is closer to “interface”: it is the API that the operating system provides to developers for network development. Through its parameters, this interface supports multiple protocols, including TCP, UDP and IP. The following sections cover socket programming and protocols in turn.
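As a small illustration of one interface serving several protocols, the sketch below creates a TCP socket and a UDP socket with Python's standard `socket` module. The helper `open_socket` and the `SOCKET_TYPES` table are hypothetical names introduced here; only the type argument differs between the two calls.

```python
import socket

# Hypothetical lookup table: protocol name -> socket type that selects it.
SOCKET_TYPES = {"tcp": socket.SOCK_STREAM, "udp": socket.SOCK_DGRAM}

def open_socket(protocol: str) -> socket.socket:
    """Create an IPv4 socket for the given transport protocol."""
    return socket.socket(socket.AF_INET, SOCKET_TYPES[protocol])

with open_socket("tcp") as tcp_sock, open_socket("udp") as udp_sock:
    print(tcp_sock.type)   # stream socket: connection-oriented, used for TCP
    print(udp_sock.type)   # datagram socket: connectionless, used for UDP
```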

# Network programming

To make this easier to follow, we start with something concrete: an example that shows how network programming works.

We use TCP as the example protocol to show the relationship between network programming and protocols, and Python as the language, for simplicity. If you do not know Python, it does not matter; the code below is easy to follow. In network communication, whether the architecture is B/S (browser/server) or C/S (client/server), there is a server and a client; in the B/S architecture the browser is the client. The example therefore has a server part and a client part. Its function is very simple: the client and the server send strings to each other.

The following listing is the server code. It listens on a port and waits for a client to connect. Once the connection is established, it waits for the client to send data and sends that data back to the client, prefixed with a timestamp.

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from socket import socket, AF_INET, SOCK_STREAM
from time import ctime

host = ''          # empty string: listen on all local interfaces
port = 12345
buffsize = 2048
ADDR = (host, port)

# Create a socket based on TCP
tctime = socket(AF_INET, SOCK_STREAM)
tctime.bind(ADDR)
# Listen at the specified address and port
tctime.listen(3)

try:
    while True:
        print('Wait for connection ... ')
        # accept() blocks until a client connects
        tctimeClient, addr = tctime.accept()
        print("Connection from :", addr)

        while True:
            data = tctimeClient.recv(buffsize).decode()
            if not data:       # empty data: the client closed the connection
                break
            tctimeClient.send(('[%s] %s' % (ctime(), data)).encode())
        tctimeClient.close()
finally:
    tctime.close()             # release the listening socket on exit
```

The server code uses the socket, bind, listen, accept, recv and send functions. The two worth noting are listen and accept, which listen on a port and accept connection requests from clients, respectively.

The following listing is the client code. Its distinctive feature is the connect function, which establishes a connection with the server.

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from socket import socket, AF_INET, SOCK_STREAM

HOST = 'localhost'
PORT = 12345
BUFFSIZE = 2048
ADDR = (HOST, PORT)

tctimeClient = socket(AF_INET, SOCK_STREAM)
# Actively establish the connection with the server
tctimeClient.connect(ADDR)

while True:
    data = input(">")
    if not data:               # empty input ends the session
        break
    tctimeClient.send(data.encode())
    data = tctimeClient.recv(BUFFSIZE).decode()
    if not data:               # empty data: the server closed the connection
        break
    print(data)
tctimeClient.close()
```

As the example shows, the server is usually passive and the client active. The server program listens on a port and waits for connection requests from clients. The client sends a connection request to the server, and if nothing goes wrong the connection is established. After that, the client and the server can send data to each other. In a real production environment, of course, things do go wrong, so failures have to be handled at both the protocol level and the interface level, as described in the protocol section of this article.

Note also that this article implements only a basic client-server program, of a kind that is rarely used in production. To improve the efficiency of data transmission and processing, production systems usually adopt an asynchronous model, which is beyond the scope of this article and will be covered in later articles.
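The passive and active roles, and one way of handling failures at the interface level, can be sketched in a single script. This is an illustrative variant of the listings above, not a production design: the server runs in a background thread on a port chosen by the operating system, and the function names `serve_once` and `echo_roundtrip` are hypothetical.

```python
import socket
import threading
from time import ctime

def serve_once(listener: socket.socket) -> None:
    """Passive side: accept one client and echo each message with a timestamp."""
    try:
        conn, _addr = listener.accept()
    except OSError:               # listener closed or timed out before a client arrived
        return
    with conn:
        while True:
            data = conn.recv(2048)
            if not data:          # empty bytes: the client closed the connection
                break
            conn.sendall(f"[{ctime()}] {data.decode()}".encode())

def echo_roundtrip(message: str, timeout: float = 3.0) -> str:
    """Active side: connect, send one message, and return the server's reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.settimeout(timeout)
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
        worker.start()
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
                client.sendall(message.encode())
                reply = client.recv(2048).decode()
        except OSError as exc:    # refused connection, timeout, reset, and so on
            return f"connection failed: {exc}"
        worker.join(timeout)
        return reply

print(echo_roundtrip("hello"))
```

The timeouts and the `OSError` handler are the interface-level counterpart of the protocol-level safeguards described later: without them, a lost peer would leave the program blocked forever.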

# TCP protocol details

A network protocol is a language used to communicate information between different computers in a network. In order to achieve interaction, this language needs to have a certain format. This section uses TCP as an example.

TCP is a reliable transport protocol. Its reliability shows in two ways: it ensures that data is delivered in the order in which it was sent, and it ensures, to a certain extent, that the data is correct. Two mechanisms provide this. One is a checksum (a 16-bit ones' complement sum, not a CRC), which allows corrupted data in a packet to be detected. The other is that every packet carries a sequence number, which preserves ordering; if a packet is lost or arrives out of order, it can be retransmitted.

Speaking of format, let us look at the format of a TCP packet. The following figure shows it, including the source port, destination port, sequence number, flag bits, and so on. It looks a little overwhelming, but seen as a whole the packet contains only two parts: the header and the data to be transmitted. In TCP's control logic the header plays the key role. It is the basis of TCP features such as connection establishment, connection teardown, retransmission and error checking.
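The header layout can also be read as code. The sketch below unpacks the fixed 20-byte part of a TCP header with Python's `struct` module; the `TCPHeader` type and `parse_tcp_header` function are hypothetical names, options are ignored, and the sample bytes are constructed by hand rather than captured from a network.

```python
import struct
from typing import NamedTuple

class TCPHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int           # sequence number
    ack: int           # acknowledgment number
    data_offset: int   # header length in 32-bit words
    flags: int         # flag bits (FIN, SYN, RST, PSH, ACK, URG, ...)
    window: int
    checksum: int
    urgent_ptr: int

def parse_tcp_header(raw: bytes) -> TCPHeader:
    """Unpack the fixed 20-byte part of a TCP header (network byte order)."""
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", raw[:20])
    # The top 4 bits of off_flags hold the data offset, the low 8 bits the flags.
    return TCPHeader(src, dst, seq, ack, off_flags >> 12, off_flags & 0xFF,
                     window, checksum, urgent)

# Hand-built sample: a SYN packet (flag value 0x02) from port 54321 to port 12345.
sample = struct.pack("!HHIIHHHH", 54321, 12345, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(sample)
print(header.src_port, header.dst_port, header.seq, header.data_offset)  # 54321 12345 1000 5
```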

The other header fields are fairly self-explanatory, so this article describes only the flag bits (URG, ACK, PSH, RST, SYN, and FIN). Their meanings are as follows:

  • URG: the urgent pointer field is valid
  • ACK: the acknowledgment number is valid
  • PSH: the receiver should pass the data to the application promptly
  • RST: resets the connection
  • SYN: initiates a new connection
  • FIN: releases a connection
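Each flag occupies one bit of the header, so a packet can carry several flags at once. The following sketch decodes a flag value into names; the bit positions are those of the TCP header, while `FLAG_BITS` and `flag_names` are names made up for this example.

```python
from typing import List

# Bit masks of the six flags, from the least significant bit upward.
FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}

def flag_names(flags: int) -> List[str]:
    """Return the names of all flags whose bit is set in `flags`."""
    return [name for name, bit in FLAG_BITS.items() if flags & bit]

print(flag_names(0x02))  # ['SYN']
print(flag_names(0x12))  # ['SYN', 'ACK']
print(flag_names(0x11))  # ['FIN', 'ACK']
```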

**Connection establishment** TCP must establish a connection before any data is actually transmitted. This is not a physical connection. The physical link is provided by the lower-layer protocols, and TCP assumes that it already works. A TCP connection is a virtual, logical connection. Put crudely, the client and the server each record the sequence numbers of the packets they exchange and put themselves into a particular state. Establishing a TCP connection is usually called the three-way handshake, because it takes three messages.

The TCP three-way handshake process is shown in the following figure. In the initial state, both the client and server are closed. The main process is divided into three steps:

  1. The client sends a connection request: the client initiates the TCP connection by sending a packet to the server with the SYN bit set to 1. As described earlier, SYN set to 1 marks a packet that establishes a connection. The packet also carries the client's initial sequence number, which is the basis for establishing the connection.
  2. The server replies with a confirmation: if the server can accept the connection (which is not always possible, because the number of sockets in the system is limited), it sends a reply packet to the client with both the SYN and ACK flag bits set to 1. The reply carries the server's own sequence number and an acknowledgment number (the client's sequence number plus 1), as shown in Figure 3.
  3. The client confirms the connection: finally, the client sends an acknowledgment packet (ACK set to 1) to tell the server that the connection has been established.
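The three steps above can be sketched as a small simulation that tracks only flags and numbers. In real TCP the second packet carries both SYN and ACK, and each acknowledgment number is the peer's sequence number plus 1. The function name and the dictionary layout are assumptions for illustration; no packets are sent.

```python
from typing import Dict, List

def three_way_handshake(client_isn: int, server_isn: int) -> List[Dict[str, object]]:
    """Return the three handshake packets as plain dictionaries."""
    wrap = 2 ** 32   # sequence numbers are 32-bit and wrap around
    syn = {"from": "client", "flags": ["SYN"], "seq": client_isn}
    syn_ack = {"from": "server", "flags": ["SYN", "ACK"],
               "seq": server_isn, "ack": (client_isn + 1) % wrap}
    ack = {"from": "client", "flags": ["ACK"],
           "seq": (client_isn + 1) % wrap, "ack": (server_isn + 1) % wrap}
    return [syn, syn_ack, ack]

for packet in three_way_handshake(client_isn=100, server_isn=300):
    print(packet)
```

With initial sequence numbers 100 and 300, the server acknowledges 101 and the client acknowledges 301, so each side has proof that the other received its starting number.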

As the process shows, establishing a connection takes several round trips, so it is what we would call a high-cost operation. In production, the usual remedy is to establish connections less often: create a connection pool and take a connection from it when data needs to be sent, rather than opening a new connection each time.
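A connection pool can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (fixed size, no health checks, no reconnection), and `ConnectionPool` and `fake_connect` are hypothetical names; in real use the factory would open a socket or database connection.

```python
import queue
from typing import Any, Callable

class ConnectionPool:
    """Fixed-size pool that creates its connections once and then reuses them."""

    def __init__(self, factory: Callable[[], Any], size: int) -> None:
        self._idle: "queue.Queue[Any]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())   # pay the connection cost once, up front

    def acquire(self, timeout: float = 1.0) -> Any:
        """Borrow an idle connection; raises queue.Empty if none frees up in time."""
        return self._idle.get(timeout=timeout)

    def release(self, conn: Any) -> None:
        """Return a borrowed connection so that another caller can reuse it."""
        self._idle.put(conn)

created = []

def fake_connect() -> int:
    """Stand-in for an expensive connect(); it only counts how often it is called."""
    created.append(len(created) + 1)
    return created[-1]

pool = ConnectionPool(fake_connect, size=2)
for _ in range(5):                      # five requests, but no new connections
    conn = pool.acquire()
    pool.release(conn)
print(len(created))  # 2
```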

One might think that the process of establishing a connection could be optimized, for example by dropping the client’s final confirmation as useless. In the normal case it indeed contributes little; it exists mainly to handle abnormal cases. Network topology is very complex, especially in a wide area network with countless nodes, and all kinds of anomalies can occur. TCP must therefore be designed to remain reliable when they do.

Consider a connection request that times out. Suppose a client sends a connection request to a server and, for some reason, the request is delayed in the network, so the server sends no confirmation. The client's connection attempt times out, so it sends a new request, which this time goes better: it arrives quickly and the connection is established. After a long journey, the earlier request finally reaches the server, which sends a reply packet to the client. Without the final confirmation, the server would regard this connection as established and keep maintaining it, while the client, which regards that attempt as timed out, would never use it or close it. The server would be left holding resources for a dead connection, and in the long run it might have no resources left for new connections.

Note also that the client and server sockets each have a state, which changes as the connection passes through its phases. The initial state is CLOSED and the state once the connection is set up is ESTABLISHED, as shown in Figure 3. The state changes are described in more detail later in this article.

After the connection is established, the client and server can transfer data. TCP is a reliable transport, so what guarantees that reliability? Mainly the checksum, the sequence number, and the acknowledgment number in the packet header (see Figure 2).
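The state changes during connection setup can be written down as a lookup table. The state names follow the TCP specification; the event names (`active_open`, `recv_syn`, and so on) and the `run` helper are labels invented for this sketch, and only the handshake part of the state machine is covered.

```python
from typing import Iterable

# (current state, event) -> next state, for connection establishment only.
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN_SENT",        # client calls connect() and sends SYN
    ("CLOSED", "passive_open"): "LISTEN",         # server calls listen()
    ("LISTEN", "recv_syn"): "SYN_RCVD",           # server gets SYN, replies SYN+ACK
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",  # client gets SYN+ACK, replies ACK
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",      # server gets the final ACK
}

def run(events: Iterable[str], state: str = "CLOSED") -> str:
    """Apply the events in order and return the resulting state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

print(run(["active_open", "recv_syn_ack"]))           # ESTABLISHED (client)
print(run(["passive_open", "recv_syn", "recv_ack"]))  # ESTABLISHED (server)
```

The server reaches ESTABLISHED only after the final ACK, which is how a stale connection request ends up being discarded instead of occupying resources indefinitely.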

The correctness of TCP data is protected by the checksum. The sender computes a checksum over the whole packet and stores it in the checksum field of the packet header. The receiver performs a calculation by the same rules to verify that the received data is correct. The sender computes the checksum as follows:

  1. Split the pseudo-header, the TCP header and the TCP data into 16-bit words, with the checksum field in the TCP header set to 0
  2. Add up all the 16-bit words using ones' complement addition (any carry out of the top bit is added back to the lowest bit)
  3. Invert every bit of the result and write it into the checksum field of the TCP header

The receiver adds up all the 16-bit words, this time including the checksum, again using ones' complement addition. If every bit of the result is 1, the data is taken to be correct; otherwise, the data is corrupted.
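The calculation on both sides can be sketched as follows, assuming IPv4 (whose pseudo-header consists of the source address, the destination address, a zero byte, the protocol number and the TCP length). The function names are hypothetical, and the addresses come from the documentation range 192.0.2.0/24.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """Add 16-bit big-endian words, wrapping any carry back into the low bits."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total

def pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol, TCP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_length))

def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Sender side: the checksum field inside `segment` must be zero."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return ~total & 0xFFFF                         # step 3: invert every bit

def is_valid(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Receiver side: the sum over everything, checksum included, must be all ones."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF

# Example packet: 20-byte header (checksum field = 0) followed by 5 bytes of data.
header = struct.pack("!HHIIHHHH", 54321, 12345, 1000, 0, (5 << 12) | 0x18, 65535, 0, 0)
segment = header + b"hello"
checksum = tcp_checksum("192.0.2.1", "192.0.2.2", segment)
filled = segment[:16] + struct.pack("!H", checksum) + segment[18:]   # bytes 16-17 hold the checksum
print(is_valid("192.0.2.1", "192.0.2.2", filled))   # True
```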

The ordering of TCP data is protected by the sequence number and the acknowledgment number. Every packet that carries data has a sequence number, and the receiver replies with an acknowledgment number after receiving data. The sender thus knows whether the data was received correctly, and the receiver can tell whether data has arrived out of order, so the data can be delivered in the order in which it was sent.
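The following sketch shows how a receiver could use sequence numbers to restore order. It is a simplified model rather than TCP's actual implementation: packets are `(sequence number, payload)` pairs, the sequence number counts bytes, and the function name `reassemble` is hypothetical. The second return value plays the role of the acknowledgment number, that is, the next byte the receiver expects.

```python
from typing import Iterable, Tuple

def reassemble(packets: Iterable[Tuple[int, bytes]], initial_seq: int) -> Tuple[bytes, int]:
    """Deliver payload bytes in order; return (data, next expected sequence number)."""
    pending = {seq: payload for seq, payload in packets}   # duplicates collapse by seq
    expected = initial_seq
    delivered = b""
    while expected in pending:
        payload = pending.pop(expected)
        delivered += payload
        expected += len(payload)    # sequence numbers count bytes, not packets
    return delivered, expected

# Packets arrive out of order but are delivered in order.
print(reassemble([(1005, b" wor"), (1000, b"hello"), (1009, b"ld")], 1000))
# (b'hello world', 1011)

# A missing packet stops delivery; the returned number tells the sender what to resend.
print(reassemble([(1000, b"hello"), (1009, b"ld")], 1000))
# (b'hello', 1005)
```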

**Connection closing** TCP closes a connection in four steps, known as the four-way wave. The close does not have to be initiated by the client; the server can initiate it too. The procedure is as follows:

  1. The initiator sends a packet with the FIN flag set, to close the direction from the initiator to the receiver
  2. The receiver replies with a packet whose ACK flag bit is 1, confirming the close. The direction from the initiator to the receiver is now closed: the initiator can no longer send data to the receiver, but the receiver can still send data to the initiator.
  3. When it has finished sending its data, the receiver sends a packet with FIN set to 1 to the initiator, requesting to close its own direction
  4. The initiator replies with an ACK packet, confirming that the close is complete

TCP is full-duplex, so the connection has to be closed in both directions. The first two steps close the direction from the initiator to the receiver. The last two steps close the other direction: once the receiver has acknowledged the initiator's close request and finished its own sending, it sends its own FIN, which the initiator acknowledges.
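The half-closed state between steps 2 and 3 can be observed from Python with `shutdown(SHUT_WR)`, which closes only the sending direction. The sketch below runs both ends in one process over the loopback interface; the function name `half_close_demo` and the message texts are made up for the example.

```python
import socket

def half_close_demo() -> tuple:
    """Close one direction of a TCP connection and keep using the other."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.settimeout(3)
        listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
        listener.listen(1)
        initiator = socket.create_connection(listener.getsockname(), timeout=3)
        receiver, _addr = listener.accept()
    with initiator, receiver:
        receiver.settimeout(3)
        initiator.sendall(b"last request")
        initiator.shutdown(socket.SHUT_WR)       # sends FIN: no more data from this side
        request = receiver.recv(1024)
        eof = receiver.recv(1024)                # b"" means the initiator's FIN arrived
        receiver.sendall(b"late reply")          # the other direction is still open
        reply = initiator.recv(1024)
    return request, eof, reply

print(half_close_demo())   # (b'last request', b'', b'late reply')
```

Leaving the `with` block closes both sockets, which lets the operating system complete the remaining steps of the close.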

This concludes our introduction to the basics of network programming over TCP. It is only the entry level; there is still a lot to learn if you really want to understand TCP and network programming. This account will cover these topics in later articles.