TCP/IP five-layer model
Start with the diagram of the layered network model, which shows the complete workflow of an HTTP request through the five layers
- The application layer: the highest layer, which provides application-specific protocols. Protocols running at this layer include HTTP, FTP, SSH, WebSocket, and so on
- The transport layer: provides general-purpose data transfer between processes on two hosts, using protocols such as TCP and UDP
- The network layer: responsible for addressing and routing, so that packets reach a specific computer. The main protocol is IP. Routers operate at this layer
- The link layer: responsible for converting between binary data packets and network signals. Switches and network cards operate at this layer
- The physical layer: mainly receivers, transmitters, repeaters, fiber optic cables and so on
Network protocols are layered to make the responsibilities of each layer clear. The layers cooperate through well-defined interfaces, so each layer can use the functions of the layers below it without worrying about how they are implemented. It is just like developing encapsulated components: each component is responsible for its own business and does not interfere with the others, which also increases reuse. Taking the basic transfer process in the diagram above as an example, the layered workflow of an HTTP request is:
- Host A initiates a request. The application layer builds the request data according to the HTTP protocol, adding the request headers, and passes it to the next layer
- The transport layer takes the data, splits it into segments where necessary, and adds a TCP header (or a UDP header, for protocols that run over UDP) carrying the port numbers, which identify the application on the target computer. For HTTP this is TCP. It then passes the data to the next layer
- The network layer takes the data and adds the IP address of the target computer to each packet, decides which route to take or which host should receive it, and then encapsulates it and passes it to the next layer
- The link layer encapsulates the data into data frames and passes them to the physical layer, where they are converted into electrical signals
- The physical layer transmits the signals over a cable to the link layer on host B
- After receiving the data, the link layer of host B checks the destination address of each frame. If the frame is not addressed to host B, it is discarded. Otherwise the link layer determines the protocol type from the frame and passes the data to the IP module at the network layer
- The network layer strips the IP header, checks that the destination IP address in the header matches its own, and passes the data to TCP or UDP according to the protocol type in the header
- After receiving the data, TCP at the transport layer computes and verifies the checksum to confirm the integrity of the data, puts the segments back in order, and decides which program to hand the data to according to the port
- Finally, the application layer receives the data and parses it according to the HTTP protocol
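The wrapping and unwrapping in these steps can be sketched in a few lines of Python. This is only a toy: the "headers" are readable text tags rather than real wire formats, and the function names, addresses, and ports are made up for illustration.

```python
# Toy illustration only: the "headers" are text tags, not real protocol headers.

def encapsulate(payload: bytes, src_port: int, dst_port: int,
                src_ip: str, dst_ip: str, dst_mac: str) -> bytes:
    """Wrap application data the way the sending host does, top layer first."""
    segment = f"TCP|{src_port}|{dst_port}|".encode() + payload   # transport layer
    packet = f"IP|{src_ip}|{dst_ip}|".encode() + segment          # network layer
    frame = f"ETH|{dst_mac}|".encode() + packet                   # link layer
    return frame


def decapsulate(frame: bytes, my_mac: str, my_ip: str):
    """Undo the wrapping on the receiving host, bottom layer first."""
    tag, dst_mac, packet = frame.split(b"|", 2)
    if tag != b"ETH" or dst_mac.decode() != my_mac:
        return None                      # link layer: frame is not for this host
    tag, _src_ip, dst_ip, segment = packet.split(b"|", 3)
    if tag != b"IP" or dst_ip.decode() != my_ip:
        return None                      # network layer: wrong destination IP
    tag, _src_port, dst_port, payload = segment.split(b"|", 3)
    return int(dst_port), payload        # transport layer: the port selects the program
```

Each layer only looks at its own header and treats everything after it as opaque data, which is the point of the layering.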
Next, we expand on IP at the network layer and on TCP and UDP at the transport layer
IP
Inside a LAN, hosts are addressed by MAC address; outside the LAN, they are addressed by IP. A MAC address is like an ID card, and an IP address is like a postal address. All TCP, UDP, and ICMP data is transmitted as IP datagrams
IP itself has no mechanism for reporting packets that fail to reach the destination address, nor does it provide a direct way to obtain diagnostic information, such as which routers a packet passed through or the round-trip time. ICMP is responsible for this
ICMP does not make the IP network reliable. It is only used to feed back fault and configuration information, and a lost packet does not necessarily trigger an ICMP message
The common ping program sends ICMP query (echo) messages. Because ICMP is carried directly in IP and skips the transport layer, ping has no port number
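To make the "no port number" point concrete, here is a minimal sketch that builds the bytes of an ICMP echo request, the message ping sends. It only constructs the packet; actually sending it needs a raw socket and usually administrator rights, which is left out. The function names are illustrative.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IP, ICMP, UDP and TCP."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """ICMP echo request: type 8, code 0, checksum, identifier, sequence number."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload
```

The header holds a type, a code, a checksum, an identifier, and a sequence number. There is no source or destination port field anywhere in it.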
The IP protocol has the following features:
- IP is an unreliable transport protocol. If a transmission error occurs, IP discards the packet and may respond to the sender with an ICMP error message. Any reliability requirements must be met by upper-layer protocols such as TCP
- IP is connectionless. It keeps no state about successive datagrams, and each datagram is handled independently, so datagrams can arrive in a different order from the one in which they were sent. IP maintains no connection state (TCP, covered below, does)
UDP
The characteristics of UDP
- Connectionless: data is sent directly, with no handshake before and no wave afterwards, which reduces overhead and the latency before data is sent
- Unreliable: UDP is only a porter for the data. It sends a packet without keeping a copy, does not care whether the other party received it correctly, and does not guarantee transmission order. Because the network layer is also unreliable, only the application layer can provide reliability
  - At the sending end, the application layer passes the data to UDP at the transport layer. UDP only adds a UDP header, which identifies the protocol, and passes the data straight to the network layer
  - At the receiving end, the network layer strips the IP header and passes the data to the transport layer. UDP only removes its own header and hands the data to the application layer, without any reassembly
  - Beyond that, UDP does nothing else
- Supports broadcasting: UDP offers unicast, multicast, and broadcast, so it supports not only one-to-one transmission but also one-to-many and many-to-many
- Small header overhead: 8 bytes, made up of four 2-byte fields: source port (optional), destination port, UDP length (the length of the whole datagram), and UDP checksum (used to detect errors; a datagram that fails the check, or whose destination port has no corresponding process, is discarded). Because UDP demands little and does little, it needs few header fields, whereas the TCP header is at least 20 bytes. The data part can be empty, so the smallest UDP datagram is 8 bytes
- Packet-oriented: UDP suits sending a small amount of data at a time. Whatever the length of the message handed down by the application layer, UDP sends it as one complete datagram without merging or splitting. If the datagram is too long, the network layer has to fragment it, because the link layer imposes an MTU, and this reduces the efficiency of the network layer
- No congestion control: UDP itself never adjusts the sending rate, even when network conditions are poor. This may lead to packet loss on a poor network, but it is a clear advantage in scenarios with high real-time requirements, such as chat, online video, and VoIP, where UDP is used instead of TCP. For example, an occasional glitch during a WeChat call is not a big problem. Applications can still add remedies of their own, such as forward error correction or retransmission
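The 8-byte header described above is small enough to build by hand. The sketch below packs and parses the four 2-byte fields with Python's `struct` module. The helper names are illustrative, and the checksum is left at 0 (meaning "not computed" for UDP over IPv4) to keep the example short.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")   # source port, destination port, length, checksum


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = UDP_HEADER.size + len(payload)       # header plus data, in bytes
    # Checksum 0 means "not computed" for UDP over IPv4; a real stack usually fills it in.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp_datagram(datagram: bytes):
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, length, checksum, datagram[UDP_HEADER.size:length]
```

With an empty payload, `build_udp_datagram` returns exactly 8 bytes, which matches the minimum datagram size mentioned in the list.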
Why is UDP unreliable
- No connection needs to be established before transferring data
- Message delivery is not guaranteed: the transport layer of the remote host does not acknowledge the UDP packets it receives
- Ordering is not guaranteed: no packet numbers are set, no reordering is performed, and no head-of-line blocking occurs
- Congestion control is not implemented: there is no built-in feedback mechanism, no retransmission, and no timeout
TCP
This is the protocol we use most of the time, especially between the front end and the back end. TCP provides a completely different service to applications than UDP: a reliable, connection-oriented service. Connection-oriented means that before two applications can exchange data over TCP, they must establish a TCP connection by contacting each other.
TCP presents a byte stream abstraction to applications and does not automatically insert record markers or message boundaries. For example, the sender may write 10 bytes and then 30 bytes, but the receiver may read the data as two reads of 20 bytes each
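The 10 + 30 versus 20 + 20 example can be tried on the loopback interface. The sketch below assumes Python 3.8 or later (for `socket.create_server`). `recv_exact` and `demo_byte_stream` are illustrative helpers, not part of any library.

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv until exactly n bytes have arrived (or the peer closes)."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed before all bytes arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def demo_byte_stream():
    listener = socket.create_server(("127.0.0.1", 0))     # port 0: let the OS pick
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server_side, _ = listener.accept()
    server_side.settimeout(5)
    try:
        client.sendall(b"a" * 10)      # first write: 10 bytes
        client.sendall(b"b" * 30)      # second write: 30 bytes
        first = recv_exact(server_side, 20)    # first read spans both writes
        second = recv_exact(server_side, 20)
        return first, second
    finally:
        for s in (client, server_side, listener):
            s.close()
```

The first 20-byte read contains all of the first write and part of the second, so the receiver has no way to tell where one write ended and the next began.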
The characteristics of TCP
- It is connection-oriented. Both parties must establish a connection before they can communicate
- Only unicast transmission is supported, that is, point-to-point transmission. A TCP connection can have only two endpoints
- It provides reliable delivery with integrity checks. Data is not lost: lost packets are retransmitted, and data arrives in sequence
- It is byte stream oriented. Unlike UDP, where each packet is transmitted independently, TCP transmits data as a byte stream without preserving packet boundaries
- It provides congestion control. When the network is congested, TCP reduces the rate and amount of data it sends to relieve the congestion and keep the network stable
- It provides full-duplex communication: both sides can send and receive data at the same time, because each side has a send cache and a receive cache
  - Send cache: holds the data queued and ready to be sent, plus the data that has been sent but not yet acknowledged by the recipient. Unacknowledged data cannot be thrown away, because it may have to be retransmitted to keep the transmission reliable
  - Receive cache: holds the data that arrived in order but has not yet been read by the receiving application, plus the data that arrived out of order, which must be sorted before the application can read it piece by piece
Why is TCP reliable
After receiving data, the recipient sends an ACK acknowledgement message, so the sender knows that its data has been received by the other party. If the sender does not receive the ACK within a certain period of time, it resends the data. Therefore, whether the data never reached the receiver or the receiver's ACK packet was lost, the retransmission mechanism ensures that both parties eventually receive the message correctly
The retransmission mechanism
Because packets in the network layer underneath TCP may be lost, duplicated, or delivered out of order, TCP itself has to provide reliable data transmission.
To ensure that data is transmitted correctly, TCP starts a timer after sending a packet. If no ACK packet is received within a certain period of time, the packet is retransmitted. If the packet has been retransmitted a certain number of times and still no ACK has arrived, TCP gives up and sends a reset signal
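The timer-and-retry logic can be sketched as a simple stop-and-wait loop. Everything here is a simplification: `deliver` is a hypothetical stand-in for "transmit the segment and wait for the retransmission timer", the retry limit is arbitrary, and real TCP stacks also lengthen the timeout between attempts.

```python
def send_with_retransmit(deliver, segment, max_retries=3):
    """Resend `segment` until `deliver` reports an ACK, or give up.

    `deliver(segment)` returns True if an ACK arrived before the timeout.
    Returns the number of transmissions used.
    """
    for attempt in range(1, max_retries + 2):     # first send plus max_retries resends
        if deliver(segment):
            return attempt
    raise ConnectionResetError("no ACK after retries; a real stack would reset the connection")


def lossy_channel(outcomes):
    """Build a `deliver` function that replays a fixed list of ACK / no-ACK outcomes."""
    remaining = iter(outcomes)
    return lambda segment: next(remaining, False)
```

For example, `send_with_retransmit(lossy_channel([False, False, True]), b"data")` succeeds on the third transmission, while a channel that never acknowledges ends in `ConnectionResetError`.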
Congestion control mechanism
It is mainly reflected in four aspects
- Slow start: do not send large amounts of data at first. Probe the network, then grow the congestion window from small to large
- Congestion avoidance: once the congestion window reaches the slow start threshold, it grows more cautiously. When congestion is detected, the slow start threshold is set to half the window size at the time of congestion, the congestion window is set to 1, and the slow start algorithm is restarted
- Fast retransmit: as soon as the sender receives three duplicate acknowledgements in a row, it immediately retransmits the segment the receiver is missing, without waiting for the retransmission timer to expire
- Fast recovery: if the network were badly congested, several duplicate acknowledgements in a row could not get through, so the sender concludes that the network is probably not congested. Instead of running slow start, it continues with the congestion avoidance algorithm
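The four mechanisms can be put together in a toy model that tracks the congestion window once per round trip. This is a simplified, Reno-style sketch and not any operating system's implementation. The window is counted in whole segments, the initial threshold of 16 is arbitrary, and halving the window on three duplicate acknowledgements is an assumption borrowed from classic TCP Reno.

```python
def next_cwnd(cwnd: int, ssthresh: int, event: str):
    """Return (cwnd, ssthresh) after one round trip, in units of segments.

    event is "ack" (all data acknowledged), "timeout", or "3dupack".
    """
    if event == "timeout":                 # severe congestion: restart slow start
        return 1, max(cwnd // 2, 2)
    if event == "3dupack":                 # fast retransmit followed by fast recovery
        ssthresh = max(cwnd // 2, 2)
        return ssthresh, ssthresh          # skip slow start, continue in avoidance
    if cwnd < ssthresh:                    # slow start: double each round trip
        return min(cwnd * 2, ssthresh), ssthresh
    return cwnd + 1, ssthresh              # congestion avoidance: linear growth


def trace(events, cwnd=1, ssthresh=16):
    """Replay a list of events and record the window after each one."""
    history = [cwnd]
    for event in events:
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, event)
        history.append(cwnd)
    return history
```

With six clean round trips the window goes 1, 2, 4, 8, 16, 17, 18: exponential growth up to the threshold, then linear growth. A timeout drops it back to 1, while three duplicate acknowledgements only halve it.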
Flow control
The idea is to keep the sender from sending data so fast that the receiver cannot keep up.
While the receiver is still processing data in its receive cache, it advertises a smaller window so that the sender slows down and the receiver has enough time to handle the packets. When the receiver is idle, it advertises a larger window so that the sender can speed up transmission and make rational use of network resources
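The window arithmetic behind this is short enough to write down. In the sketch below the function and parameter names are illustrative: `rwnd` is the window advertised by the receiver, `cwnd` is the sender's congestion window from the previous section, and the sender is limited by whichever is smaller.

```python
def advertised_window(buffer_size: int, unread_bytes: int) -> int:
    """Receiver side: advertise the buffer space the application has not yet freed."""
    return max(buffer_size - unread_bytes, 0)


def usable_window(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Sender side: bytes that may still be transmitted right now.

    bytes_in_flight is data that has been sent but not yet acknowledged.
    """
    return max(min(rwnd, cwnd) - bytes_in_flight, 0)
```

When the receiving application falls behind, `advertised_window` shrinks toward 0 and the sender has to pause. Once the application reads its data, the window opens again.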
TCP Three-way handshake
- First handshake: the client sends a message to the server (SYN, seq)
  - A SYN packet
  - The client's randomly initialized sequence number (seq)
- Second handshake: after receiving it, the server replies to the client (SYN, ACK, seq, ack)
  - Its own SYN packet and an ACK packet
  - The server's randomly initialized sequence number seq
  - An acknowledgement number ack = the sequence number sent by the client + 1, indicating that the client's SYN has been received
- Third handshake: after receiving the server's reply, the client sends (ACK, seq, ack) to the server
  - An ACK packet confirming the reply
  - A seq equal to the ack value that the server sent during the second handshake
  - An acknowledgement number ack = the server's sequence number + 1, telling the server "I received it"
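The numbers exchanged in the three handshakes can be traced with a small simulation. It only models the flags and the seq/ack arithmetic and sends nothing over a network. The dictionary layout and function name are illustrative.

```python
import random


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three segments as dicts of flags and sequence/acknowledgement numbers."""
    mod = 2**32                                   # sequence numbers are 32-bit and wrap around
    if client_isn is None:
        client_isn = random.randrange(mod)
    if server_isn is None:
        server_isn = random.randrange(mod)
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {"flags": "SYN,ACK", "seq": server_isn, "ack": (client_isn + 1) % mod}
    ack = {"flags": "ACK", "seq": syn_ack["ack"], "ack": (server_isn + 1) % mod}
    return [syn, syn_ack, ack]
```

Calling `three_way_handshake(100, 300)` makes the pattern easy to see: the server acknowledges 101, and the client's final segment carries seq 101 and ack 301.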
Why not twice?
- To confirm that both sides can send and receive normally
- To let each side confirm the other's initial sequence number
If only two handshakes were used, the server would not know whether its sequence number had been acknowledged by the peer. As a result, invalid segments, such as a delayed old connection request, could be accepted by the server and cause errors.
Why not four times?
Because there is no need: everything that has to be confirmed is already confirmed after three handshakes
TCP four-way wave
Both the client and the server can initiate the shutdown operation. The following uses the client as an example
- The browser first sends a FIN message, with Seq = its sequence number, to the server and stops sending data, but it still accepts response data from the server
- After receiving it, the server sends ACK = the browser's sequence number + 1 to the browser, indicating "received"
- Once all of the server's data has been sent, the server sends a FIN message, with Seq = its sequence number, to the browser
- The browser receives it and sends ACK = the server's sequence number + 1 to the server, indicating "received"
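The same kind of trace works for the four waves. The sketch below assumes the client closes first and neither side sends further data in between. Real FIN segments normally carry the ACK flag as well, which is left out here to match the simplified description. The names are illustrative.

```python
def four_way_close(client_seq: int, server_seq: int):
    """Segments exchanged when the client closes first (simplified flags)."""
    fin1 = {"from": "client", "flags": "FIN", "seq": client_seq}
    ack1 = {"from": "server", "flags": "ACK", "ack": client_seq + 1}
    fin2 = {"from": "server", "flags": "FIN", "seq": server_seq}   # sent once the server is done
    ack2 = {"from": "client", "flags": "ACK", "ack": server_seq + 1}
    return [fin1, ack1, fin2, ack2]
```

Each FIN consumes one sequence number, which is why the matching acknowledgement is the sender's sequence number plus 1.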
After the fourth wave, the client waits for a period of time (a timer set to 2MSL) to make sure the server has received its ACK packet, and only then closes. The server closes the connection as soon as it receives the ACK packet
Why wait a while before closing? Why not close immediately?
This guards against the final acknowledgement segment being lost or corrupted on its way to the server, which would leave the server unable to close normally. The waiting time is 2MSL, where MSL is the maximum lifetime of a packet on the network. A packet that has been on the network longer than that is discarded
RFC 793 sets MSL at 2 minutes, but in practice 30 seconds is commonly used. If the passive closer sends anything after this time has passed, the active closer replies with a packet that has the RST flag set, indicating that the connection has been reset, and the passive closer then knows that the other party has closed the connection
If the active closer does not wait, then because ports are reused, it may already have opened a new connection on the same port while the passive closer is still retrying its FIN, so the active closer receives a lot of useless packets. Because packets carry sequence numbers, it can tell that they do not belong to the new connection and ignore them. To avoid this situation altogether, the active closer waits, which ensures that the passive closer sends no more FIN requests before the port is reused
Why four waves
TCP needs four waves because a TCP connection is full-duplex, so each side has to release its own direction of the connection. When one side releases its direction, it can no longer send data to the other side but can still receive
So when closing a connection, the server that receives a FIN message probably does not close the socket immediately, because it may still have messages to send. It can only reply with an ACK packet first, telling the client "I have received your message". Only after all of its own messages have been sent can it send its FIN message. The ACK and the FIN therefore cannot be sent together, and that is why four waves are needed
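The half-closed state in the middle of the four waves can be observed with ordinary sockets. In this loopback sketch (Python 3.8 or later, illustrative function name), the client shuts down only its sending direction with `shutdown(SHUT_WR)`, and the server is still able to send a reply afterwards.

```python
import socket


def demo_half_close():
    listener = socket.create_server(("127.0.0.1", 0))     # port 0: let the OS pick
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server_side, _ = listener.accept()
    server_side.settimeout(5)
    try:
        client.sendall(b"request")
        client.shutdown(socket.SHUT_WR)          # client's FIN: "nothing more to send"
        received = b""
        while chunk := server_side.recv(1024):   # recv returns b"" once the FIN arrives
            received += chunk
        server_side.sendall(b"late reply")       # the server's direction is still open
        server_side.close()                      # server's FIN
        reply = b""
        while chunk := client.recv(1024):
            reply += chunk
        return received, reply
    finally:
        for s in (client, server_side, listener):
            s.close()
```

The client still receives `b"late reply"` after it has stopped sending, which is exactly the gap between the second and third wave.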
TCP packet sticking and processing
To avoid sending a large number of small segments, TCP may combine small pieces of data before sending them. Sticky packets means that several packets of data are glued into one: seen from the receive buffer, the head of the next packet follows directly after the tail of the previous one. Because TCP is a streaming transport, its biggest problem is that the stream has no boundaries, and without boundaries data sticks together
Unpacking means splitting a large message into smaller pieces for sending, which reduces the error rate
Sticky packet scenarios
- The receiver does not read packets from the buffer in time, so several packets are read at once
- TCP uses the Nagle algorithm by default, and the algorithm itself can lead to sticky packets
- TCP connection reuse can cause sticky packets. A TCP connection can be reused by multiple processes on one host, so when data of many different structures is streamed into the same connection, telling the boundaries apart is bound to be a problem
- Oversized packets can cause sticky packets. If the application's message is larger than the send cache, the message is split: the first part may already have been received while the rest has only just entered the cache and is waiting to be sent, so the receiver ends up with part of a message
- Flow control and congestion control can also lead to sticky packets
The Nagle algorithm does two things. First, it sends the next small packet only after the previous one has been acknowledged. Second, while waiting, it collects small packets and sends them together as one data segment, either when the acknowledgement arrives or once they add up to the maximum segment size (MSS). If the application does not handle message boundaries properly, these merged packets stick together when they are unpacked
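Nagle's algorithm is a per-socket setting, and most socket APIs expose a switch for it called `TCP_NODELAY`. A minimal Python sketch (the function name is illustrative):

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled via TCP_NODELAY."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Turning Nagle off trades bandwidth efficiency for latency, so it mostly makes sense for interactive traffic made of many small writes. It does not give the stream message boundaries, so the receiver still needs its own framing.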
How to deal with sticky packets
- If the Nagle algorithm is the cause, it can be turned off where the application scenario allows
- If it is not, there are several options
  - Tail tag sequence: mark the boundary of each packet with a special identifier, such as \n\r\t or some hidden character
  - Header mark with step-by-step receiving: add the data length to the header of each message. With a protocol that has a header, the header stores a start identifier and the length of the message. When the receiver gets the header, it parses the message length and then reads exactly that much content after it
  - Fixed length: the application layer sends every message at a fixed length, and the server reads a fixed length of content as one complete message. If a message is too short, the remaining space is padded with a fixed character
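The length-in-header approach is probably the most common of these in practice. Below is a minimal sketch of it, assuming a 4-byte big-endian length prefix and no start identifier. The names `frame` and `FrameDecoder` are illustrative.

```python
import struct

LENGTH_PREFIX = struct.Struct("!I")     # 4-byte big-endian message length


def frame(message: bytes) -> bytes:
    """Prefix a message with its length before writing it to the stream."""
    return LENGTH_PREFIX.pack(len(message)) + message


class FrameDecoder:
    """Accumulates bytes from the stream and returns complete messages."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data: bytes):
        self._buffer += data
        messages = []
        while len(self._buffer) >= LENGTH_PREFIX.size:
            (length,) = LENGTH_PREFIX.unpack_from(self._buffer)
            end = LENGTH_PREFIX.size + length
            if len(self._buffer) < end:
                break                         # body not complete yet: wait for more bytes
            messages.append(bytes(self._buffer[LENGTH_PREFIX.size:end]))
            del self._buffer[:end]
        return messages
```

Whatever comes out of `recv` is passed to `feed`. The decoder copes with several messages stuck together in one read as well as one message split across several reads, because it only returns a message once the full length announced in its header has arrived.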
Why UDP has no sticky packets
- UDP is a message-oriented protocol. Each UDP segment is one message, and the application must extract data one message at a time. It cannot extract an arbitrary number of bytes at a time
- UDP protects message boundaries. Each UDP packet has its own header (source port, destination port, and so on), so the receiver can easily tell packets apart. The protocol carries each piece of data across the network as an independent message, and the receiver can only receive independent messages. If a message is too large for the receiver to accept in one go, the part that does not fit is lost, because the receiver never reads one message in two receives
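Message boundaries are easy to observe on the loopback interface. In this sketch (illustrative function name), two datagrams are sent back to back and each `recvfrom` call returns exactly one of them, even though the receive buffer of 1024 bytes is large enough to hold both.

```python
import socket


def demo_udp_boundaries():
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))              # port 0: let the OS pick
    receiver.settimeout(5)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(b"first", receiver.getsockname())
        sender.sendto(b"second message", receiver.getsockname())
        # Each recvfrom returns exactly one datagram, never a mix of two.
        one, _ = receiver.recvfrom(1024)
        two, _ = receiver.recvfrom(1024)
        return one, two
    finally:
        sender.close()
        receiver.close()
```

Compare this with the TCP case, where the same two writes could come back merged into a single read.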
Differences and application scenarios between TCP and UDP
- TCP is slower; UDP is faster
- TCP is reliable and has congestion control and flow control; UDP is unreliable and has no congestion control or flow control
- TCP is connection-oriented and requires a three-way handshake; UDP is connectionless and needs no handshake
- TCP supports only one-to-one connections; UDP supports broadcasting, so one-to-one, one-to-many, and many-to-many are all possible
- The TCP header is at least 20 bytes; the UDP header is only 8 bytes
- TCP is byte stream oriented; UDP is message oriented
- TCP numbers the data segments it transmits; UDP does not
In addition, the TCP and UDP port numbers are independent, so they can be the same
Applicable scenario
TCP is suitable for scenarios that transfer large amounts of data and need reliable transmission (acknowledgement, retransmission, and ordering), such as login and file transfer. UDP is suitable for scenarios that transfer small amounts of data and need high efficiency, such as real-time applications, instant messaging, chat, and video calls