TCP

What is TCP?

TCP is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol.

  • Connection-oriented: communication is strictly one-to-one. Unlike UDP, where one host can send messages to multiple hosts at the same time, TCP cannot do one-to-many.
  • Reliable: TCP ensures that a packet reaches the receiving end no matter how the network link changes.
  • Byte stream: messages have no boundaries, so data of any size can be transmitted. Delivery is also ordered: if an earlier part of the stream has not arrived, later bytes cannot be handed to the application layer even if they arrive first, and duplicate packets are discarded automatically.

What is a TCP connection?

In simple terms, a TCP connection is the combination of state information maintained to guarantee reliability and flow control, including the socket, sequence numbers, and window size.

Therefore, to establish a TCP connection, the client and the server need to reach a consensus on these three pieces of information:

  • Socket: consists of an IP address and a port number
  • Sequence number: used to solve out-of-order (reordering) problems
  • Window size: used for flow control

How do you determine a unique TCP connection?

A TCP four-tuple uniquely identifies a connection. The four-tuple consists of:

  • Source address
  • Source port
  • Destination address
  • Destination port

The source address and destination address fields (32 bits each) are in the IP header and are used to send packets to the peer host over IP.

The source port and destination port fields (16 bits each) are in the TCP header and tell TCP which process to deliver the packet to.
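As a sketch of this idea, the four-tuple can serve as the key of a connection lookup table; the addresses, ports, and table below are illustrative, not from the article:

```python
from typing import NamedTuple

# A connection is identified by (source IP, source port, dest IP, dest port).
class ConnKey(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

connections: dict[ConnKey, str] = {}
key = ConnKey("192.0.2.1", 51512, "198.51.100.7", 80)
connections[key] = "ESTABLISHED"

# Same pair of hosts, different source port: a distinct connection.
key2 = key._replace(src_port=51513)
assert key in connections and key2 not in connections
```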

TCP header format

  • Source port (16 bits): the port number of the program sending the network packet.
  • Destination port (16 bits): the port number of the program receiving the network packet.
  • Sequence Number (32 bits): the sequence number of this segment, used to solve reordering of network packets.
  • Acknowledgement Number (32 bits): the ACK value, used to ensure packets are not lost.
  • Header length (4 bits): marks where the data portion starts; equivalently, the length of the header (in 32-bit words).
  • Reserved (6 bits): reserved and currently unused.
  • Control bits (6 bits): each bit carries one of the following control meanings:

    • URG: 1 indicates a high-priority segment and that the urgent-pointer field is valid.
    • ACK: 1 indicates that the acknowledgement-number field is valid, generally meaning data has been received by the peer.
    • PSH: 1 marks data with the PUSH flag: the receiver should deliver this segment to the application layer as soon as possible instead of waiting for its buffer to fill.
    • RST: 1 indicates a serious error; the TCP connection may need to be rebuilt. It is also used to reject invalid segments and connection requests.
    • SYN: 1 indicates a connection request or a connection-accept request, used to create a connection and synchronize sequence numbers.
    • FIN: 1 indicates that the sender has no more data to transmit and wants to release the connection.
  • Window size (16 bits): the receiver tells the sender its window size, i.e. how much data can be sent without waiting for an acknowledgement.
  • Checksum (16 bits): used to detect errors.
  • Urgent pointer (16 bits): the position of data that should be processed urgently.
  • Options (variable length): optional fields beyond the fixed header; rarely used except during connection setup.
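As a sketch, the fixed 20-byte header above can be unpacked with Python’s struct module. The layout follows the table (4-bit header length, 6 reserved bits, 6 control bits), and the sample values are made up:

```python
import struct

def parse_tcp_header(data: bytes) -> dict:
    """Parse the 20-byte fixed TCP header."""
    (src, dst, seq, ack, off_res, flags,
     window, checksum, urgptr) = struct.unpack("!HHIIBBHHH", data[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "data_offset": off_res >> 4,  # header length, in 32-bit words
        "flags": {
            "URG": bool(flags & 0x20),
            "ACK": bool(flags & 0x10),
            "PSH": bool(flags & 0x08),
            "RST": bool(flags & 0x04),
            "SYN": bool(flags & 0x02),
            "FIN": bool(flags & 0x01),
        },
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgptr,
    }

# A hand-built SYN segment: ports 1234 -> 80, seq 1000, 5-word (20-byte) header.
hdr = struct.pack("!HHIIBBHHH", 1234, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
parsed = parse_tcp_header(hdr)
assert parsed["flags"]["SYN"] and not parsed["flags"]["ACK"]
assert parsed["data_offset"] == 5
```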

TCP’s three-way handshake and four-way wave

Three-way handshake

  1. The first packet is a SYN packet

    • The client randomly initializes its sequence number (client_isn, here seq = x), places it in the sequence-number field of the TCP header, and sets the SYN flag to 1, making this a SYN segment. It then sends this first SYN packet to the server to initiate a connection. The packet carries no application-layer data, and the client enters the SYN-SENT state.
  2. The second packet is a SYN + ACK packet

    • After receiving the client’s SYN packet, the server randomly initializes its own sequence number (server_isn, here seq = y), places it in the sequence-number field of the TCP header, fills the acknowledgement-number field with client_isn + 1, and sets both the SYN and ACK flags to 1. It then sends the packet to the client; this packet also carries no application-layer data. The server enters the SYN-RCVD state.
  3. The third packet is an ACK packet

    • After receiving the server’s packet, the client sends one last reply: it sets the ACK flag in the TCP header to 1, fills the acknowledgement-number field with server_isn + 1, and sends the packet to the server. This packet may carry application-layer data. The client then enters the ESTABLISHED state.
    • After receiving this reply packet, the server also enters the ESTABLISHED state.

Note:

We can see two intermediate states, SYN_SENT and SYN_RCVD. These are called “half-open” states: a packet has been sent to the peer, but no response has been seen yet.

SYN_SENT is the “half-open” state of the active opener, and SYN_RCVD is the “half-open” state of the passive opener. The client is the active opener; the server is the passive opener.

  • SYN_SENT: a SYN packet has been sent
  • SYN_RCVD: a SYN packet has been received
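The handshake itself is carried out by the kernel; an application only triggers it through the socket API. A minimal loopback sketch in Python: connect() returning means the SYN, SYN + ACK, and ACK have all been exchanged.

```python
import socket

# Passive opener: bind + listen puts the server socket in LISTEN.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
server.listen(1)

# Active opener: the kernel sends SYN, receives SYN + ACK, replies ACK.
client = socket.create_connection(server.getsockname())

# The completed connection is taken from the server's accept queue.
conn, addr = server.accept()
peer = client.getpeername()          # both ends are now ESTABLISHED
assert peer == conn.getsockname()

client.close(); conn.close(); server.close()
```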

Four-way wave

  • When the client intends to close the connection, it sends a packet with the FIN flag in the TCP header set to 1 (a FIN packet), then enters the FIN_WAIT_1 state.
  • After receiving this packet, the server sends an ACK reply to the client, and the server enters the CLOSE_WAIT state.
  • After receiving the server’s ACK reply, the client enters the FIN_WAIT_2 state.
  • Once the server has finished processing its data, it also sends a FIN packet to the client, and the server enters the LAST_ACK state.
  • After receiving the server’s FIN packet, the client returns an ACK reply and enters the TIME_WAIT state.
  • After receiving the ACK reply, the server enters the CLOSED state; the server has now finished closing the connection.
  • After 2MSL (traditionally about 4 minutes), the client automatically enters the CLOSED state; the client has now finished closing the connection.

As you can see, each direction requires a FIN and an ACK, which is often referred to as four waves.

Note that only those who actively close a connection have a TIME_WAIT state.

The four-way wave does not always take four packets. Sometimes the two middle packets (the ACK and the FIN) can be combined into one, producing a three-way wave, and the active closer then goes directly from FIN_WAIT_1 to TIME_WAIT, skipping FIN_WAIT_2.
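Closing follows the same pattern at the API level: close() or shutdown() makes the kernel send the FIN. A loopback sketch of the half-close behavior described above; shutdown(SHUT_WR) sends this side’s FIN while it can still receive:

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

client.shutdown(socket.SHUT_WR)  # client's FIN: "no more data from me"
eof = conn.recv(1024)            # the FIN is read as end-of-file
conn.sendall(b"bye")             # the server may still send data...
data = client.recv(1024)         # ...and the client may still receive it

assert eof == b"" and data == b"bye"
conn.close()                     # server's FIN completes the teardown
client.close(); server.close()
```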

Data transfer

Retransmission mechanisms

Timeout retransmission

A mode of the retransmission mechanism is to set a timer when sending data. If no ACK packet is received within the specified time, the data is retransmitted.

TCP timeout retransmission occurs in the following two cases:

  • Data packet loss
  • ACK reply loss

Fast retransmission

TCP also has another Fast Retransmit mechanism, which is data-driven rather than time-driven.

In the figure above, the sender sends 1,2,3,4,5 pieces of data:

  • Seq1 is sent and received first, so the receiver ACKs 2;
  • Seq2 is lost for some reason; when Seq3 arrives, the receiver still ACKs 2;
  • Seq4 and Seq5 also arrive, but the receiver keeps ACKing 2, because Seq2 has still not arrived;
  • after receiving three duplicate Ack = 2 acknowledgements, the sender retransmits the lost Seq2 before the timer expires;
  • once the retransmitted Seq2 arrives, since Seq3, Seq4, and Seq5 have all been received, the receiver ACKs 6.

The fast-retransmit mechanism solves the timeout problem, but it still faces another question: when retransmitting, should the sender resend only the one missing segment, or everything from that point on?
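The duplicate-ACK trigger can be sketched as a simple counting rule (a simplification; real TCP tracks far more state):

```python
def fast_retransmit_trigger(acks):
    """Return the sequence number to retransmit after 3 duplicate ACKs,
    or None if fast retransmit is never triggered."""
    counts = {}
    for ack in acks:
        counts[ack] = counts.get(ack, 0) + 1
        if counts[ack] == 4:       # the original ACK plus 3 duplicates
            return ack
    return None

# The receiver got Seq1, missed Seq2, then got Seq3/4/5, ACKing 2 each time:
assert fast_retransmit_trigger([2, 2, 2, 2]) == 2
# No sequence number was ever ACKed four times, so no fast retransmit:
assert fast_retransmit_trigger([2, 3, 4, 5]) is None
```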

The SACK method

SACK (Selective Acknowledgment) requires a SACK field in the TCP header’s “Options”, through which the receiver sends a map of its received (cached) data to the sender. The sender then knows exactly which data has and has not arrived, and can retransmit only the lost data.

As shown in the figure below, after receiving the same ACK packets for three times, the sender triggers the fast retransmission mechanism. It finds that only the data segment 200~299 is lost through SACK information, and only this TCP segment is selected for retransmission.

Duplicate SACK

Duplicate SACK, also known as D-SACK, mainly uses SACK to tell the “sender” which data has been received more than once.

  1. ACK packet loss

  • Both ACK replies sent by “receiver” to “sender” are lost, so the sender retransmits the first packet (3000-3499) after timeout.
  • The “receiver” sees that the 3000~3499 data arrived twice, so it replies with an ACK carrying SACK = 3000~3500. Since the acknowledgement number has already reached 4000 (everything before 4000 was received), this SACK is a D-SACK.
  • In this way, the sender knows that the data is not lost, but that the ACK packet from the receiver is lost.
  2. Network delay

  • A packet delayed in the network can cause the sender to retransmit it after a timeout; when the delayed original finally arrives, the receiver has the data twice and reports the duplicate range via D-SACK, so the sender learns the packet was delayed rather than lost.

D-SACK has several advantages:

  1. It lets the “sender” know whether its packet was lost or the receiver’s ACK reply was lost;
  2. it can tell whether the “sender”’s packet was delayed by the network;
  3. it can tell whether the network duplicated the “sender”’s packet.

The sliding window

TCP acknowledges data packet by packet: send one packet, wait for its ACK, send the next. The longer the round-trip time, the lower the efficiency. To solve this problem, TCP introduced the concept of windows, so that network communication stays efficient even when round-trip times are long.

So with a window, you can specify the window size, which is the maximum value at which data can continue to be sent without waiting for an acknowledgement.

A window is implemented as buffer space set aside by the operating system: the sending host must keep sent data in the buffer until the acknowledgement comes back. If acknowledgements arrive on schedule, the acknowledged data can be removed from the buffer.

If the window size is three TCP segments, the sender can send three segments back to back. Even if an ACK in the middle is lost, a later acknowledgement can confirm the data for it, as the diagram below shows:

As long as the sender receives an ACK of 700, it knows the receiver has received all data before 700. This pattern is called cumulative acknowledgement (or cumulative response).

Which side decides the window size?

The TCP header has a field called Window, i.e. the window size.

This field is used by the receiver to tell the sender how many buffers it has left to receive data. Then the sender can send data according to the processing capacity of the receiver without causing the receiver to be overwhelmed.

Therefore, the window size is usually determined by the receiver.

The size of the data sent by the sender cannot exceed the size of the window on the receiver; otherwise, the receiver cannot receive data normally.

The sender’s sliding window

Let’s take a look at the sender’s window first. The following figure shows the data cached by the sender, which is divided into four parts according to the processing situation, among which the dark blue box is the sending window and the purple box is the available window:

  • #1 is the data that has been sent and received an ACK: 1 to 31 bytes;
  • #2 is the data that has been sent but has not received an ACK: 32 to 45 bytes;
  • #3 is not sent but the total size is within the receiver’s processing range (the receiver has space) : 46 to 51 bytes;
  • #4 is not sent but the total size is out of the receiver’s processing range (the receiver has no space) : after 52 bytes.

In the figure below, when the sender sends the data “all” at once, the size of the available window is 0, indicating that the available window is exhausted and data cannot be sent until an ACK is received.

In the figure below, after the sender receives an ACK for the previously sent bytes 32 to 36, if the send-window size is unchanged, the sliding window moves 5 bytes to the right: those 5 bytes have been acknowledged, and bytes 52 to 56 become usable again, so the sender can now send bytes 52 to 56.

How does the program represent the four parts of the sender?

The TCP sliding-window scheme uses three pointers to track bytes in the four transmit categories. Two of these pointers are absolute (they hold a specific sequence number) and one is relative (an offset).

  • SND.WND: represents the size of the send window (the size is specified by the receiver);
  • SND.UNA: is an absolute pointer to the sequence number of the first byte sent but not acknowledged, that is, the first byte of #2;
  • SND.NXT: is also an absolute pointer to the sequence number of the first byte in the unsent but deliverable range, that is, the first byte in #3;
  • The first byte of #4 is tracked by a relative pointer: adding the SND.WND offset to the SND.UNA pointer points to the first byte of #4.

Then the calculation of the available window size can be:

Available window size = SND.WND − (SND.NXT − SND.UNA)
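In code, the formula reads as follows; the byte values match the figure above (a 20-byte window spanning bytes 32 to 51, with 32 to 45 already in flight):

```python
def usable_window(snd_una: int, snd_nxt: int, snd_wnd: int) -> int:
    """Bytes the sender may still transmit before needing a new ACK."""
    return snd_wnd - (snd_nxt - snd_una)

# Bytes 32-45 are sent-but-unacknowledged; the window covers 32-51:
assert usable_window(snd_una=32, snd_nxt=46, snd_wnd=20) == 6  # bytes 46-51
```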

The receiver’s sliding window

Next, let’s look at the receiver window. The receiver window is relatively simple, divided into three parts according to the processing situation:

  • #1 + #2 is the data that has been successfully received and acknowledged (waiting for the application process to read);
  • #3 is the data that has not been received but can be received;
  • #4 is data that has not been received and cannot be received (it lies outside the receive window).

The three receive-side parts are divided by two pointers (the third boundary is derived as an offset):

  • RCV.WND: indicates the size of the receive window, which is advertised to the sender.
  • RCV.NXT: is a pointer to the sequence number of the next data byte expected from the sender, the first byte of #3.
  • The first byte of #4 is tracked by a relative pointer: adding the RCV.WND offset to the RCV.NXT pointer points to the first byte of #4.

Are the receive and send Windows equal in size?

The size of the receive window is approximately equal to the size of the send window.

Because sliding windows are not static. For example, when the receiving application reads data very quickly, the receive window empties out quickly and can grow again. The new receive-window size is told to the sender through the Window field in the TCP header, and since that notification takes time to arrive, the receive window and the send window are only approximately equal.

Flow control

The sender should not mindlessly send data to the receiver. The processing capability of the receiver should be considered.

If you mindlessly send data to the other party, but the other party can not process it, it will trigger the retransmission mechanism, resulting in unreasonable waste of network traffic.

To solve this problem, TCP provides a mechanism that allows the “sender” to control the amount of data sent based on the “receiver” ‘s actual ability to receive data. This is called flow control.

Each step of the flow-control exchange in the figure above is described below:

  1. The client sends a request packet to the server. Note that this example treats the server as the sender, so the server’s receive window is not drawn.
  2. After receiving the request, the server sends an acknowledgement along with 80 bytes of data. Its usable window shrinks to 120 bytes, and SND.NXT moves 80 bytes to the right to point to 321, meaning the next data sent will start at sequence number 321.
  3. When the client receives the 80 bytes of data, its receive window moves 80 bytes to the right and RCV.NXT points to 321, meaning it expects the next packet’s sequence number to be 321; it then sends an acknowledgement packet to the server.
  4. The server sends another 120 bytes of data, so its usable window drops to zero and it cannot send any more data.
  5. When the client receives the 120 bytes of data, its receive window moves 120 bytes to the right, RCV.NXT points to 441, and it sends an acknowledgement to the server.
  6. After the server receives the acknowledgement for the 80 bytes of data, SND.UNA moves right to point to 321, so the usable window grows to 80.
  7. After the server receives the acknowledgement for the 120 bytes of data, SND.UNA moves right to point to 441, so the usable window grows to 200.
  8. The server can continue sending; after sending 160 bytes of data, SND.NXT points to 601, so the usable window shrinks to 40.
  9. After the client receives the 160 bytes, its receive window moves 160 bytes to the right, RCV.NXT points to 601, and it sends an acknowledgement packet to the server.
  10. When the server receives the acknowledgement for the 160 bytes, the send window moves 160 bytes to the right, SND.UNA moves to 601, and the usable window grows back to 200.

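The ten steps above can be replayed with the sender-side pointers. A sketch: the starting sequence number 241 is inferred from the figure’s values (321 − 80) and is an assumption:

```python
class SendWindow:
    def __init__(self, isn: int, wnd: int):
        self.una = self.nxt = isn   # SND.UNA and SND.NXT start together
        self.wnd = wnd              # SND.WND, granted by the receiver

    @property
    def usable(self) -> int:
        return self.wnd - (self.nxt - self.una)

    def send(self, n: int) -> None:
        assert n <= self.usable     # may not overrun the receiver
        self.nxt += n

    def ack(self, ackno: int) -> None:
        self.una = ackno            # everything before ackno is confirmed

w = SendWindow(isn=241, wnd=200)
w.send(80);  assert (w.nxt, w.usable) == (321, 120)  # step 2
w.send(120); assert (w.nxt, w.usable) == (441, 0)    # step 4
w.ack(321);  assert w.usable == 80                   # step 6
w.ack(441);  assert w.usable == 200                  # step 7
w.send(160); assert (w.nxt, w.usable) == (601, 40)   # step 8
w.ack(601);  assert w.usable == 200                  # step 10
```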
Congestion control

Why congestion control, when there is flow control?

The previous flow control prevents the “sender” from filling the “receiver” cache without knowing what is going on in the network.

Generally speaking, computer networks live in a shared environment. Therefore, there is also the possibility of network congestion due to communication between other hosts.

When the network is congested, continuing to send a large number of packets may cause packet delay and loss. TCP will then retransmit the data, but retransmission puts an even heavier burden on the network, leading to greater delays and more packet loss. It is a vicious circle.

So TCP cannot ignore what is happening on the network. It is designed as an unselfish protocol: when the network is congested, it sacrifices itself and reduces the amount of data it sends.

Congestion control, then, is designed to keep the “sender” from filling the network.

To adjust the amount of data to be sent at the “sender” side, a concept called the “congestion window” is defined.

What is a congested window? What does it have to do with the send window?

The congestion window cwnd is a state variable maintained by the sender; it changes dynamically with the degree of congestion in the network.

We mentioned earlier that the send window swnd and the receive window rwnd are approximately equal. Once the congestion window is added, the send window becomes swnd = min(cwnd, rwnd), the minimum of the congestion window and the receive window.

Congestion window CWND change rules:

  • As long as the network is not congested, cwnd keeps increasing;
  • once congestion appears in the network, cwnd decreases.

So how do you know if the current network is congested?

In fact, as long as the sender does not receive the ACK reply packet within the specified time, that is, timeout retransmission occurs, the network is considered congested.

What are the congestion control algorithms?

Congestion control mainly consists of four algorithms:

  • Slow start
  • Congestion avoidance
  • Congestion occurs
  • Fast recovery

Slow start

After a TCP connection is established, there is first a slow-start process: the number of packets sent is increased bit by bit. If a large amount of data were injected right away, it could easily congest the network.

Slow-start algorithms just need to remember one rule: as the sender receives an ACK, the size of the congestion window CWND increases by one.

Here we assume that the congestion window CWND is equal to the send window SWND.

  • After the connection is established, cwnd is initialized to 1, meaning one MSS-sized segment can be transmitted.
  • When that ACK arrives, cwnd increases by 1, so two segments can be sent next time.
  • When those two ACK acknowledgements arrive, cwnd increases by 2, so four segments can be sent this time.
  • When those four ACK acknowledgements arrive, cwnd increases by 4 (1 per ACK), so eight segments can be sent this time.

As you can see, with the slow-start algorithm the number of packets sent grows exponentially.

So when does slow start stop?

There is a state variable called ssthresh (slow start threshold).

  • When cwnd < ssthresh, the slow-start algorithm is used.
  • When cwnd >= ssthresh, the congestion-avoidance algorithm is used.

Congestion avoidance

As mentioned earlier, when the congestion window cwnd reaches the slow-start threshold ssthresh, TCP enters the congestion-avoidance algorithm.

ssthresh is typically 65535 bytes.

After entering congestion avoidance, the rule is: cwnd increases by 1/cwnd every time an ACK is received.

Continuing the slow-start example above, assume ssthresh is 8:

  • When the 8 ACK acknowledgements arrive, each adds 1/8 to cwnd, so together the 8 ACKs increase cwnd by 1. Nine MSS-sized segments can be sent this time: growth has become linear.

So congestion avoidance turns the exponential growth of slow start into linear growth: still growing, just more slowly.
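Both growth rules can be sketched as one function applied per ACK (with cwnd counted in MSS units; real implementations work in bytes, so this is a simplification):

```python
def on_ack(cwnd: float, ssthresh: float) -> float:
    """cwnd growth on each ACK received."""
    if cwnd < ssthresh:
        return cwnd + 1           # slow start: +1 per ACK (exponential per RTT)
    return cwnd + 1.0 / cwnd      # congestion avoidance: ~+1 per RTT (linear)

cwnd, trace = 1.0, []
for _ in range(8):
    cwnd = on_ack(cwnd, ssthresh=8)
    trace.append(cwnd)

assert trace[:7] == [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # slow-start phase
assert trace[7] == 8.125          # at ssthresh, growth turns linear
```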

If growth continues like this, the network slowly becomes congested and packets start to be lost, at which point the lost packets must be retransmitted.

When the retransmission mechanism is triggered, TCP enters the “congestion occurs” algorithm.

Congestion occurs

When network congestion occurs, data packets are retransmitted. There are two retransmission mechanisms:

  • Timeout retransmission
  • Fast retransmission

The two use different “congestion occurs” algorithms, discussed separately below.

“Congestion occurs” algorithm for timeout retransmission

At this point, the values of ssthresh and cwnd change:

  • ssthresh is set to cwnd/2;
  • cwnd is reset to 1.

Then slow start begins again, which suddenly reduces the data flow: a single timeout retransmission sends everything back to square one. This approach is too aggressive and reacts too sharply, and can cause noticeable stalls in transmission.

“Congestion occurs” algorithm for fast retransmission

There is a better way, the fast-retransmit algorithm discussed earlier: when the receiver finds that a middle packet is missing, it sends the ACK of the last in-order packet three times, and the sender retransmits quickly without waiting for a timeout.

TCP considers this situation less serious, because most packets arrived and only a few were lost, so ssthresh and cwnd change as follows:

  • cwnd = cwnd/2, i.e. half its previous value;
  • ssthresh = cwnd;
  • then the fast-recovery algorithm is entered.

Fast recovery

Fast retransmission and fast recovery algorithms are used together, and the fast recovery algorithm assumes that the fact that you still get three duplicate Acks means that the network isn’t that bad, so it doesn’t have to be as strong as the RTO timeout.

As mentioned earlier, CWND and SSTHRESH have been updated before going into quick recovery:

  • cwnd = cwnd/2, that is, set to half of the original;
  • ssthresh = cwnd;

Then, enter the fast recovery algorithm as follows:

  • The congestion window is set to cwnd = ssthresh + 3 (the 3 accounts for the three packets confirmed by the duplicate ACKs);
  • the lost packet is retransmitted;
  • if another duplicate ACK arrives, cwnd increases by 1;
  • when an ACK for new data arrives, cwnd is set back to the ssthresh value from step 1. An ACK for new data means the data sent since the duplicate ACKs has been received, so the recovery process is over and TCP returns to the state before recovery, i.e. congestion avoidance.

In other words, there is no falling back to square one as with timeout retransmission: cwnd stays at a relatively high value and then grows linearly.
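The two loss reactions described above can be put side by side (again with cwnd in MSS units; a sketch, not kernel code):

```python
def on_timeout(cwnd: int):
    """RTO expired: drastic reset, back to slow start."""
    ssthresh = cwnd // 2
    return 1, ssthresh             # (new cwnd, new ssthresh)

def on_three_dup_acks(cwnd: int):
    """Fast retransmit: halve, then enter fast recovery."""
    cwnd = cwnd // 2
    ssthresh = cwnd
    return ssthresh + 3, ssthresh  # cwnd inflated by the 3 duplicate ACKs

assert on_timeout(10) == (1, 5)         # collapses to 1 MSS
assert on_three_dup_acks(10) == (8, 5)  # stays at a relatively high value
```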

Schematic diagram of congestion algorithm

Handshake related questions

Why three handshakes, not two? Not four times?

Cause one: Avoid historical connections

Simply put, the primary reason for the three-way handshake is to prevent the old duplicate connection initialization from causing chaos.

  • Consider a client that sends multiple SYN packets to establish a connection. Under network congestion:

    • an old SYN packet may arrive at the server earlier than the latest one;
    • the server then returns a SYN + ACK packet to the client;
    • when the client receives it, it can tell from its own context that this is a historical connection (the sequence number is expired or timed out), so it sends an RST packet to the server to terminate the connection.

    With only two handshakes, there is no way to determine whether the current connection is historical. With three handshakes, by the time the client (the initiator) prepares to send the third packet, it has enough context to judge whether the current connection is historical:

    • if the connection is historical (sequence number expired or timed out), the third packet sent is an RST, terminating the historical connection;
    • if the connection is not historical, the third packet sent is an ACK, and the two sides successfully establish the connection.

    So, the main reason TCP uses a three-way handshake to establish a connection is to prevent old, duplicate connection attempts from initializing a connection.

Cause two: Synchronize the initial sequence number of both parties

Both sides of TCP communication must maintain a sequence number; the sequence number is a key factor in reliable transmission. It serves to:

  • let the receiver discard duplicate data;
  • let the receiver accept packets in order according to their sequence numbers;
  • let the sender identify which of its sent packets have been received.

Therefore, when the client sends a SYN packet carrying its initial sequence number, the server must reply with an ACK to show that the SYN was received; and when the server sends its own initial sequence number to the client, it likewise needs a response from the client. Only in this way can both parties’ initial sequence numbers be reliably synchronized.

A four-way handshake could also reliably synchronize both initial sequence numbers, but steps 2 and 3 can be merged into a single step, which yields the “three-way handshake.”

Two handshakes, however, only guarantee that one party’s initial sequence number is successfully received by the other; there is no guarantee that both parties’ initial sequence numbers are received.

Reason 3: Avoid wasting resources

If there were only a “two-way handshake”, the server would have no way to know whether the client ever received its ACK for establishing the connection, because there is no third handshake. So for every SYN it receives, it has to set up a connection. What happens then?

If the client’s SYN is blocked in the network and the client retransmits SYN packets repeatedly, the server will establish multiple redundant, invalid connections after receiving them, causing unnecessary waste of resources.

That is, the server repeatedly receives useless SYN connection requests and repeatedly allocates resources.

Summary

When establishing a TCP connection, the three-way handshake prevents historical connections from being established, reduces unnecessary resource costs on both sides, and lets both sides synchronize their initial sequence numbers. Sequence numbers ensure that data packets are not duplicated, are not lost, and are delivered in order.

Reasons not to use “two handshakes” and “four handshakes” :

  • “Two-way handshake”: it cannot prevent historical connections from being established, which wastes resources on both sides, and it cannot reliably synchronize both parties’ sequence numbers.
  • “Four-way handshake”: three handshakes are already the theoretical minimum for reliably establishing a connection, so there is no need for more exchanges.

What is a SYN attack? How do I avoid SYN attacks?

The SYN attack

In short, an attacker forges SYN packets from many different IP addresses in a short time. Each SYN the server receives puts a connection into the SYN_RCVD state, but the SYN + ACK packets the server sends back never get an ACK response from those unknown addresses. Over time the server’s SYN queue (half-open connection queue) fills up, and the server can no longer serve normal users.

Avoiding SYN attacks: method 1

One solution is to modify Linux kernel parameters to control the queue size and what to do when the queue is full.

  • When the NIC receives packets faster than the kernel can process them, a queue holds the backlog. The following parameter controls the maximum length of this queue:

    net.core.netdev_max_backlog
  • Maximum number of SYN_RCVD state connections:

    net.ipv4.tcp_max_syn_backlog
  • If the SYNs exceed the processing capacity, return RST for new SYNs and discard the connections:

    net.ipv4.tcp_abort_on_overflow

Avoiding SYN attacks: method 2

Let’s take a look at how the Linux kernel’s SYN queue (half-open connections) and Accept queue (completed connections) work.

Normal process:

  • When the server receives a SYN packet from a client, it adds the connection to the kernel’s SYN queue;
  • it then sends SYN + ACK to the client, and the client responds with an ACK packet;
  • after receiving the ACK, the server removes the connection from the SYN queue and places it in the Accept queue;
  • the application calls the accept() socket interface to take the connection out of the Accept queue.

Application is too slow:

  • If the application is too slow, the “Accept queue” will fill up.

SYN attack:

  • If SYN attacks continue, the SYN queue will fill up.

The tcp_syncookies parameter can be used to counter SYN attacks:

net.ipv4.tcp_syncookies = 1

How tcp_syncookies responds to a SYN attack:

  • When the SYN queue is full, subsequently received SYN packets do not enter the SYN queue;
  • instead, the server computes a cookie value and returns it to the client as the “sequence number” in the SYN + ACK;
  • when the server receives the client’s ACK packet, it checks its validity; if it is valid, the connection is placed directly into the “Accept queue”;
  • finally, the application calls the accept() socket interface to take the connection out of the Accept queue.
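The cookie idea can be sketched as follows. This is a toy illustration of encoding connection state into the sequence number; the Linux kernel’s real syncookie algorithm (which also encodes things like the MSS and a timestamp) is different, and the secret, hash, and field choices below are all assumptions:

```python
import hmac
import hashlib

SECRET = b"server-local-secret"   # known only to the server

def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn):
    """Derive the SYN+ACK sequence number from the connection itself,
    so the server stores nothing for half-open connections."""
    msg = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{client_isn}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def check_ack(ack_seq, src_ip, src_port, dst_ip, dst_port, client_isn):
    """A valid final ACK must acknowledge cookie + 1."""
    return ack_seq == make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn) + 1

cookie = make_cookie("203.0.113.5", 40000, "198.51.100.7", 443, 12345)
assert check_ack(cookie + 1, "203.0.113.5", 40000, "198.51.100.7", 443, 12345)
assert not check_ack(cookie + 2, "203.0.113.5", 40000, "198.51.100.7", 443, 12345)
```

A forged SYN costs the server nothing, while a genuine client that completes the handshake can be validated purely from its ACK.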

Wave-related questions

Why does it take four waves?

  • When a connection is closed, the client sends a message to the serverFIN, simply means that the client is no longer sending data but can still receive data.
  • The server receives the packet from the clientFINA packet is returned firstACKThe server may have data to process and send, and sends it only when the server does not send any more dataFINA message to the client agreeing to close the connection now.

As shown in the preceding procedure, the server usually waits until its remaining data has been sent and processed. Therefore, the server’s ACK and FIN packets are sent separately, resulting in one more packet exchange than the three-way handshake.
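The “client stops sending but can still receive” half-close in the first step maps directly onto the sockets API: shutdown(SHUT_WR) sends the FIN. A loopback sketch:

```python
import socket

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

client.shutdown(socket.SHUT_WR)  # client sends FIN: "no more data from me"
assert conn.recv(1024) == b""    # server sees EOF in the client->server direction

conn.sendall(b"late reply")      # ...but server->client still works
data = client.recv(1024)
print(data)

conn.close(); client.close(); server.close()
```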

Why is the wait time of TIME_WAIT 2MSL?

MSL is the Maximum Segment Lifetime: the longest time a packet may exist in the network before being discarded. TCP packets are carried over IP, and the TTL field in the IP header is the maximum number of routers an IP packet may pass through. Each router that processes the packet decreases this value by 1; when it reaches 0, the packet is discarded and an ICMP message is sent to notify the source host.

The difference between MSL and TTL: MSL is measured in time, while TTL is measured in route hops. Therefore, MSL should be greater than or equal to the time it takes for the TTL to be consumed to 0, to ensure that the packet has naturally died.

TIME_WAIT waits for twice the MSL because there may still be packets from the sender in the network, and the receiver’s response to those packets also needs up to one MSL to arrive, so one round trip takes at most 2 MSL.

For example, if the passive closer does not receive the last ACK of the teardown, it triggers a timeout retransmission of the FIN packet. After the active closer receives that FIN, it resends an ACK to the passive closer; this round trip is exactly 2 MSL.

The 2MSL timer starts when the client sends the ACK after receiving the FIN. If during TIME-WAIT the client receives another FIN from the server (because the client’s ACK never reached the server), the 2MSL timer is reset.

On Linux, 2MSL defaults to 60 seconds, so one MSL is 30 seconds. That is, Linux stays in TIME_WAIT for a fixed 60 seconds.

This is defined in the Linux kernel code as TCP_TIMEWAIT_LEN:

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT state, about 60 seconds  */

To change the length of TIME_WAIT, change the value of TCP_TIMEWAIT_LEN in the Linux kernel code and recompile the Linux kernel.

Why do I need the TIME_WAIT state?

The TIME-WAIT state exists only on the side that actively closes the connection.

Time-wait is required for two reasons:

  • Prevent “old” packets with the same “quad” from being received;
  • Ensure that the “passive closing connection” can be closed correctly, that is, ensure that the last ACK can be received by the passive closing party, thus helping it to close properly;

Cause one: Prevent data packets from old connections

An exception caused by receiving historical data

  • As shown in the yellow box above, before closing the connection the server sends a packet with SEQ = 301, which is delayed in the network.
  • A new TCP connection then reuses the same four-tuple. If the delayed SEQ = 301 packet reaches the client, the client may accept this expired packet as normal data, causing serious problems such as data corruption.

Therefore, TCP designed this mechanism: 2MSL is long enough for packets in both directions to be discarded, so the original connection’s packets have all naturally disappeared from the network, and any packets that appear afterwards must belong to the new connection.

Cause two: Ensure that the connection is closed correctly

RFC 793 points out that another important function of time-wait is:

TIME-WAIT – represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.

In other words, the function of TIME-WAIT is to wait long enough to ensure that the final ACK is received by the passive closer, helping it close properly.

Suppose TIME-WAIT had no wait time, or the wait were too short. What problems would disconnection cause?

  • As shown in the red box above, if the last ACK of the client’s four-way wave is lost in the network and the client’s TIME-WAIT is too short or absent, the client goes straight to the CLOSED state, while the server stays in the LAST_ACK state.
  • When the client later initiates a connection with a SYN request packet, the server replies with an RST packet and the connection is terminated.

If TIME-WAIT lasts long enough, there are two possible cases:

  • The server normally receives the last ACK of the four-way wave and closes the connection.
  • The server does not receive the last ACK of the four-way wave, so it retransmits the FIN packet and waits for a new ACK.

Therefore, after the client waits in time-wait state for 2MSL, the connection between the two parties can be closed normally.
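A practical consequence for servers: a restarted listener may find its port still held by a connection lingering in TIME-WAIT. Setting SO_REUSEADDR before bind() lets the new listener bind anyway; a minimal sketch:

```python
import socket

# A server that actively closed a connection leaves its (addr, port) in
# TIME-WAIT for 2MSL; without SO_REUSEADDR a quick restart may fail
# with EADDRINUSE. Setting the option before bind() avoids that.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("127.0.0.1", 0))
s.listen(1)
reuse = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
print(reuse != 0)
s.close()
```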

What if the connection has been established, but the client suddenly fails?

TCP has a keepalive mechanism. The mechanism works like this:

A period of time is defined. If there is no connection-related activity within that period, the TCP keepalive mechanism kicks in: at every interval it sends a probe packet containing very little data. If several consecutive probes get no response, the current TCP connection is considered dead, and the kernel notifies the upper-layer application of the error.

The Linux kernel has corresponding parameters to set the keepalive time, number of keepalive probes, and interval of keepalive probes. The default values are as follows:

net.ipv4.tcp_keepalive_time=7200
net.ipv4.tcp_keepalive_intvl=75  
net.ipv4.tcp_keepalive_probes=9
  • tcp_keepalive_time = 7200: the keepalive time is 7200 seconds (2 hours); if there is no connection-related activity within 2 hours, the keepalive mechanism starts probing
  • tcp_keepalive_intvl = 75: each probe is sent 75 seconds after the previous one
  • tcp_keepalive_probes = 9: after 9 unanswered probes, the peer is considered unreachable and the connection is torn down

This means that on Linux, it takes at least 2 hours, 11 minutes and 15 seconds to discover a “dead” connection.

This is rather long, so we can also tune these keepalive parameters according to actual needs.
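The figure above is 7200 + 9 × 75 = 7875 seconds. Instead of changing the system-wide sysctls, keepalive can also be tuned per socket; a sketch (the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT option names are Linux-specific):

```python
import socket

# Worst-case detection time with the defaults: idle period + probes * interval.
seconds = 7200 + 9 * 75
print(seconds)   # 7875 s = 2 h 11 min 15 s

# Per-socket keepalive tuning (these TCP_* options are Linux-specific).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)     # enable keepalive
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # probe after 60 s idle
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # then every 10 s
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # give up after 5 misses
idle = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
s.close()
```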

If TCP keepalive is enabled, consider the following situations:

First, the peer program is working normally. The TCP keepalive probe gets a normal response, the keepalive timer is reset, and the mechanism waits for the next keepalive timeout.

Second, the peer program crashes and restarts. The keepalive probe reaches the peer, but the peer no longer has any valid information about the connection, so it responds with an RST packet; the sender then quickly discovers that the TCP connection has been reset.

Third, the peer program crashes, or the packets are unreachable for other reasons. The keepalive probes get no response; after the configured number of probes, TCP reports that the connection is dead.

UDP

UDP header format


  • Destination and source ports: Indicates which process the UDP protocol should send the packet to.
  • Packet length: This field stores the sum of the length of the UDP header and the data length.
  • Checksum: the checksum exists to detect corruption of the UDP header and data in transit.

What’s the difference between UDP and TCP?

1. Connection

  • TCP is a connection-oriented transport layer protocol. A connection is established before data transmission.
  • UDP does not require a connection and transmits data instantly.

2. Service objects

  • TCP is a one-to-one two-point service, that is, a connection has only two endpoints.
  • UDP supports one-to-one, one-to-many, and many-to-many communication.

3. Reliability

  • TCP delivers data reliably: error-free, without loss, without duplication, and arriving in order.
  • UDP is best effort delivery and does not guarantee reliable delivery of data.

4. Congestion control and flow control

  • TCP provides congestion control and traffic control mechanisms to ensure data transmission security.
  • UDP does not. Even if the network is very congested, it does not affect the sending rate of UDP.

5. Overhead

  • The TCP header is long and carries some overhead: 20 bytes if the “options” field is not used, longer if it is.
  • The UDP header is only 8 bytes, fixed length, with low overhead.

6. Transmission mode

  • TCP is streaming, without boundaries, but sequential and reliable.
  • UDP is the sending of packets one by one. It has boundaries, but may lose packets and be out of order.

7. Fragmentation

  • If the TCP data is larger than the MSS, it is split into segments at the transport layer. After receiving them, the destination host reassembles the TCP segments at the transport layer; if a segment is lost, only that segment needs to be retransmitted.
  • If the UDP data is larger than the MTU, it is fragmented at the IP layer. The destination host reassembles the data at the IP layer and then passes it to the transport layer; if one fragment is lost, the whole packet must be retransmitted.
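Difference 6, the byte stream versus datagram distinction, is easy to observe on the loopback interface: each UDP send arrives as one distinct datagram.

```python
import socket

# UDP preserves message boundaries: two sends arrive as two datagrams,
# not one merged byte stream as with TCP.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

tx.sendto(b"hello", rx.getsockname())
tx.sendto(b"world", rx.getsockname())

first, _ = rx.recvfrom(2048)
second, _ = rx.recvfrom(2048)
print(first, second)   # two distinct datagrams

tx.close(); rx.close()
```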

UDP and TCP application scenarios

TCP is connection-oriented and ensures reliable data delivery. Therefore, it is often used for:

  • FTP file transfer
  • HTTP / HTTPS

Because UDP is connectionless, it can send data at any time, and UDP itself is simple and efficient, so it is often used for:

  • Communications with small packet volumes, such as DNS and SNMP
  • Video, audio and other multimedia communication
  • Broadcast communication

IP

The role of IP

IP is at the third layer, the network layer, in the TCP/IP reference model.

The main function of the network layer is to realize the communication between hosts, also called end to end communication.

IPv4 address notation

IP addresses (IPv4 addresses) are represented as 32-bit positive integers, which computers handle in binary.

To make them easier to remember, humans use dotted-decimal notation: the 32-bit IP address is split into four 8-bit groups, the groups are separated by “.”, and each group is converted to decimal.
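Python’s ipaddress module can show the two views of the same 32-bit value; a small sketch:

```python
import ipaddress

# 0xAC140000 is the 32-bit integer behind 172.20.0.0
# (0xAC = 172, 0x14 = 20, then two zero octets).
addr = ipaddress.IPv4Address(0xAC140000)
dotted = str(addr)
print(dotted)                                    # 172.20.0.0
print(int(ipaddress.IPv4Address("172.20.0.0")))  # back to the integer
```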

IP classification

When the Internet was born, IP addresses seemed abundant, so computer scientists devised classified addresses.

IP addresses are classified into five types: class A, B, C, D, and E.

In the figure above, the yellow part is a category number, which is used to distinguish IP address types.

What are class A, B, and C addresses?

Class A, B, and C addresses are divided into two parts: the network number and the host number.

How to calculate the maximum number of hosts for class A, B, and C addresses?

The maximum number of hosts depends on the number of bits in the host number. For example, the host number of a class C address occupies 8 bits, so the maximum number of hosts on a class C network is 2^8 - 2 = 254.

Why do we subtract 2?

Because among IP addresses, there are two special IP addresses: all-1 and all-0.

  • Host number of all 1s: specifies all hosts on the network; used for broadcasting
  • Host number of all 0s: specifies the network itself

Therefore, in the allocation process, these two cases should be removed.
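The subtract-2 rule as a one-line helper (the classful host-number widths are 24, 16 and 8 bits for classes A, B and C):

```python
# Usable hosts = 2^(host bits) minus the all-zeros (network) and
# all-ones (broadcast) host numbers.
def max_hosts(host_bits: int) -> int:
    return 2 ** host_bits - 2

class_c = max_hosts(8)
class_b = max_hosts(16)
class_a = max_hosts(24)
print(class_c, class_b, class_a)   # 254 65534 16777214
```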

What are broadcast addresses used for?

Broadcast addresses are used to send packets between hosts connected to each other on the same link.

If all host ids are 1, the host id indicates the broadcast address of the network. For example, 172.20.0.0/16 is expressed in binary as follows:

10101100.00010100.00000000.00000000

Change all the host parts of this address to 1 to form a broadcast address:

10101100.00010100.11111111.11111111

If the IP address is expressed in decimal notation, it is 172.20.255.255.
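The same derivation with the standard library: ipaddress computes the all-ones host number directly.

```python
import ipaddress

net = ipaddress.ip_network("172.20.0.0/16")
bcast = str(net.broadcast_address)   # host bits all 1
print(bcast)                         # 172.20.255.255
print(net.network_address)           # host bits all 0 -> 172.20.0.0
```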

Broadcast addresses can be divided into local broadcast and direct broadcast.

  • Broadcasting within the local network is called local broadcast. For example, for the network address 192.168.0.0/24, the broadcast address is 192.168.0.255. IP packets sent to this broadcast address are blocked by routers, so they never reach any link outside 192.168.0.0/24.
  • Broadcasting between different networks is called directed broadcast. For example, a host on the 192.168.0.0/24 network sends an IP packet to the destination address 192.168.1.255/24. The receiving router forwards it to 192.168.1.0/24, so all hosts from 192.168.1.1 to 192.168.1.254 receive the packet (in most cases, however, routers do not forward directed broadcasts because of their security risks).

What are class D and class E addresses?

Class D and E addresses do not have host numbers, so they cannot be used for host IP addresses. Class D is often used for multicast, and class E is reserved and temporarily unused.

What are multicast addresses used for?

Broadcast cannot cross routers. To send the same packet to other network segments, use multicast, which can cross routers.

The first four bits of a class D address used for multicast are 1110, and the remaining 28 bits are multicast group numbers.

The range from 224.0.0.0 to 239.255.255.255 is available for multicast, which can be divided into the following three categories:

  • 224.0.0.0 to 224.0.0.255 are reserved multicast addresses that can only be used on the LAN. The router does not forward the multicast addresses.
  • 224.0.1.0 to 238.255.255.255 are available multicast addresses that can be used on the Internet.
  • 239.0.0.0 to 239.255.255.255 are the local management multicast addresses that can be used internally by the Intranet. They are valid only in a specific local range.
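Whether an address falls in the 1110-prefixed class D range (224.0.0.0/4) can be checked with ipaddress:

```python
import ipaddress

# is_multicast is true exactly for 224.0.0.0/4 (the 1110 prefix).
samples = ["224.0.0.1", "239.255.255.255", "192.168.0.1"]
results = {ip: ipaddress.ip_address(ip).is_multicast for ip in samples}
print(results)
```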

Advantages of IP classification

Both routers and hosts resolve an IP address the same way: for example, if the first bit is 0, the address is a class A address, so the network address and host address can be found quickly.

For other classification and judgment methods, please refer to the figure below:

Therefore, the advantage of this address classification is that it is simple and clear, and route selection is simple (based on the network address).

Disadvantages of IP classification

Defect 1

There is no address hierarchy within the same network. For example, a company using a class B address may need to subdivide it for production, test, and development environments, but this classification scheme offers no hierarchical subdivision, so it lacks address flexibility.

Defect 2

Class A, B, and C addresses sit in an awkward position: they do not match real networks well.

  • A class C address can hold at most 254 hosts, which is too few; probably not even enough for one Internet cafe.
  • A class B address can hold too many hosts: more than 60,000 machines on one network. Few enterprises reach that scale, so the idle addresses are wasted.

IPv6

The highlight of IPv6

IPv6 doesn’t just have more addresses to assign, it has a lot more to offer.

  • IPv6 supports automatic configuration: even without a DHCP server, addresses can be assigned automatically, making it truly plug and play.
  • The IPv6 packet header has a fixed length of 40 bytes, removes the header checksum, and simplifies the header structure, reducing router load and greatly improving transmission performance.
  • IPv6 has network security functions to prevent IP address forgery and line eavesdropping, greatly enhancing security.

Method of identifying an IPv6 address

An IPv4 address is 32 bits long, expressed in dotted-decimal notation with 8 bits per group.

An IPv6 address is 128 bits long, written as groups of 16 bits separated by colons (:).

Consecutive groups of zeros can be omitted and replaced with a double colon (::), but the double colon may appear only once in an address.
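The zero-omission rules are implemented by ipaddress; a sketch showing both the compressed and the full form:

```python
import ipaddress

addr = ipaddress.ip_address("2001:0db8:0000:0000:0000:0000:0000:0001")
short = addr.compressed
print(short)           # 2001:db8::1 (leading zeros dropped, one zero run -> ::)
print(addr.exploded)   # the full 8-group form
```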

IPv6 address structure

IPv6 is similar to IPv4. The first few bits of an IP address identify the type of an IP address.

IPv6 addresses include the following types:

  • Unicast address used for one-to-one communication
  • Multicast address used for one-to-many communication
  • An anycast address used to communicate with the nearest node, which is determined by the routing protocol
  • IPv6 has no broadcast address

IPv6 unicast address type

For IPv6 addresses for one-to-one communication, there are three types of unicast addresses, each of which has a different effective range.

  • For unicast communication on the same link, the link-local unicast address can be used without a router. IPv4 does not have this type
  • Unicast communication on the Intranet can use a unique local address, equivalent to an IPv4 private IP address
  • When communicating over the Internet, you can use a global unicast address, which is equivalent to the IPv4 public IP address

IPv4 and IPv6 headers

IPv6 header improvements over IPv4:

  • Removes the header checksum field. IPv6 drops IP-level checksumming because both the data link layer and the transport layer already perform verification.
  • Removes the fragmentation/reassembly fields. Fragmentation and reassembly are time-consuming; IPv6 forbids them on intermediate routers and allows them only on the source and destination hosts, which greatly improves router forwarding speed.
  • Removes the option field. The option field is no longer part of the standard IP header; it has not disappeared, but may appear at the position indicated by the “next header” field of the IPv6 header. Removing it gives the IPv6 header a fixed length of 40 bytes.

DNS

DNS Domain name resolution: The DNS automatically translates domain name addresses into specific IP addresses.

Workflow of domain name resolution

The client first checks the local hosts file for the domain name; if there is no entry, it asks the DNS servers. The query process is as follows:

  1. The client first sends a DNS request asking for the IP address of www.server.com to the local DNS server (that is, the DNS server address specified in the TCP/IP Settings of the client).
  2. After receiving the client’s request, the local DNS server returns the IP address directly if www.server.com is in its cached table. If not, the local DNS asks its root DNS server: “Boss, can you tell me the IP address of www.server.com?” The root DNS server is the highest level; it does not resolve domain names directly, but it can point the way.
  3. After receiving the request from the local DNS server, the root DNS server sees that the domain name ends in .com and says: “www.server.com is managed by the .com zone; I’ll give you the address of the .com top-level domain name server.”
  4. After receiving the address of the top-level domain name server, the local DNS sends it a request: “Second brother, can you tell me the IP address of www.server.com?”
  5. The top-level DNS server says: “I’ll give you the address of the authoritative DNS server responsible for the server.com zone; go ask it.”
  6. The local DNS then turns to the authoritative DNS server and asks: “Third brother, what is the IP address corresponding to www.server.com?” The authoritative DNS server for server.com is the original source of the resolution result. Why “authoritative”? It’s my domain name, so I call the shots.
  7. The authoritative DNS server returns the corresponding IP address X.X.X.X to the local DNS server.
  8. The local DNS returns the IP address to the client, and the client establishes a connection with the target.
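From an application’s point of view, the whole chain above is hidden behind the resolver API. A sketch (localhost normally resolves from the local hosts file, so no network query is needed):

```python
import socket

# gethostbyname() asks the system's stub resolver, which consults the
# hosts file and any caches before querying the configured DNS server.
ip = socket.gethostbyname("localhost")
print(ip)   # typically 127.0.0.1
```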

ARP

When transmitting an IP packet, once the source and destination IP addresses are determined, the host routing table determines the next hop. However, below the network layer sits the data link layer, so we also need to know the MAC address of the “next hop”.

Because the IP address of the next hop can be found in the routing table of the host, you can use ARP to obtain the MAC address of the next hop.

How does ARP know the MAC address of the peer?

  • The host broadcasts an ARP request containing the IP address whose MAC address it wants to learn.
  • When receiving an ARP request, all devices on the same link unpack the ARP request packet. If the destination IP address in the ARP request packet is the same as its own IP address, the device inserts its MAC address into the ARP response packet and sends it back to the host.

The operating system caches the MAC address obtained through ARP for the first time so that the MAC address corresponding to the IP address can be directly retrieved from the cache next time.

However, the MAC address is cached for a certain period of time, after which the cached content will be cleared.
