Peeling Back the Layers of Network Protocols (A Long Read)

Preface

This is a long, long article that will test your patience!

I. Theoretical five-layer model

The implementation of the Internet is divided into several layers. Each layer has its own function and relies on the layer below it. Users only touch the top layer, which we call the application layer. To understand the Internet, you have to start at the bottom and work out what each layer does on the way up.

There are three common network models:

OSI seven-layer model
Theoretical five-layer model
TCP/IP four-layer model

The relationship between them is shown below

Among them, the theoretical five-layer model is a conceptual architecture that combines the strengths of the OSI seven-layer and TCP/IP four-layer models. The discussion below is based on the theoretical five-layer model.

The structure of the theoretical five-layer model is shown below

The functions of each layer are as follows:

The application layer
- Directly provides services to user application processes.
- Common protocols include HTTP, HTTPS, SMTP, and TELNET.
The transport layer
- The transport layer provides end-to-end service between two processes communicating at the application layer, hiding from them the details of data communication below the transport layer.
- End-to-end: data is sent from one port to a specified port.
- Port: an integer between 0 and 65535 that identifies a specific application.
- Common protocols include TCP and UDP.
The network layer
- The network layer selects an appropriate route so that a packet can find its destination by address and be delivered to the transport layer of the destination host.
- The network layer protocol is IP.
Data link layer
- The data link layer groups bits into frames and defines the format of each frame.
- The data link layer protocol discussed here is Ethernet.
The physical layer
- The physical layer defines the standards used to connect computers to one another; network signals are transmitted as electrical signals representing 0 and 1.

Simply put, the lower the layer, the closer it is to the hardware; the higher the layer, the closer it is to the user.

What is a protocol?

Each layer serves a function. In order to implement these functions, there are some common rules that need to be followed. These rules are called protocols.

Many protocols are defined at every layer of the Internet. Together they are called the Internet protocol suite, and they are the heart of the Internet.

In the following sections, we introduce the functions of each layer and explain the functions of the main protocols in each layer.

II. The physical layer

To join a network, a computer has to be connected to it through some kind of medium, such as optical fiber, cable, twisted pair, or radio waves.

This is the physical layer, which is the physical means that connect computers together. It mainly defines the electrical properties of the network, and its role is to transmit the 0 and 1 electrical signals.

III. Data link layer

1. Definition

The physical layer is the medium used to transmit signals, which are 0 and 1 electrical signals. But there are no rules about how electrical signals are grouped and what each signal bit means.

This is the job of the data link layer, which sits above the physical layer and defines how the 0s and 1s are grouped, so that information can be transferred between two devices (nodes on the same type of data link).

2. Ethernet protocol

In the early days, each company had its own way of grouping electrical signals. Gradually, a protocol called Ethernet came to dominate.

Ethernet calls a group of electrical signals a frame. Each frame is divided into two parts: a header (Head) and data (Data).

Headers and data

The header contains some description of the packet, such as sender, receiver, data type, etc. Data is the content of the packet.
Restrictions on headers and data

The header length is fixed at 18 bytes (this counts the 14 bytes in front of the data and the 4-byte checksum after it). The data is at least 46 bytes and at most 1500 bytes long.

Therefore, the entire frame has a minimum of 64 bytes and a maximum of 1518 bytes. If the data is long, it must be split into multiple frames and sent.
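The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated limits (18 bytes of framing, 46 to 1500 bytes of data); the function names are made up for this example, and real network cards do this work in hardware.

```python
import math

FRAMING_OVERHEAD = 18          # header bytes per frame, as stated above
MIN_DATA, MAX_DATA = 46, 1500  # limits on the data part of one frame

def frame_size(data_len: int) -> int:
    """Size of one frame; data shorter than MIN_DATA is padded."""
    if data_len > MAX_DATA:
        raise ValueError("data does not fit in a single frame")
    return FRAMING_OVERHEAD + max(data_len, MIN_DATA)

def frames_needed(data_len: int) -> int:
    """How many frames are needed to carry data_len bytes of data."""
    return max(1, math.ceil(data_len / MAX_DATA))

print(frame_size(10))       # 64   (smallest possible frame)
print(frame_size(1500))     # 1518 (largest possible frame)
print(frames_needed(4000))  # 3
```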

MTU(Maximum Transmission Unit)

The MTU is the largest amount of data that one link layer frame can carry; it is a limit the link layer places on the packets handed down to it.

The data portion of an Ethernet frame ranges from 46 to 1500 bytes. The maximum, 1500 bytes, is called the MTU of Ethernet. Different network types have different MTUs.
If a packet is routed from Ethernet onto a dial-up link and the packet is larger than the MTU of the dial-up link, the packet must be fragmented.

Impact of MTU on IP

Because of the link layer's MTU limit, an IP packet larger than the MTU (1500 bytes on Ethernet) must be fragmented before it is sent.

A large IP packet is divided into multiple smaller packets, and each of them carries the same 16-bit ID in its IP header, so that during reassembly the receiver can tell which original packet each fragment belongs to.
The IP header of each fragment has a 3-bit flags field: the second bit, DF (Don't Fragment), is 0 when fragmentation is allowed, and the third bit, MF (More Fragments), marks whether this is the last fragment (0 on the last fragment, 1 on all the others).
When the fragments arrive at the peer, they are reassembled in order and passed up to the transport layer. If any fragment is lost, reassembly at the receiver fails, and the IP layer is not responsible for retransmitting the data.
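As a rough sketch of the splitting step, the Python function below computes how a payload would be cut up for a given MTU. It assumes a 20-byte IP header without options, and it also models the fragment offset field of the IPv4 header, which is not discussed above: offsets are counted in 8-byte units, so every fragment except the last carries a multiple of 8 bytes. The function name and the tuple layout are illustrative only.

```python
def fragment(payload_len: int, mtu: int = 1500, header_len: int = 20):
    """Return a list of (offset_in_8_byte_units, data_len, more_fragments)."""
    # Largest data size per fragment, rounded down to a multiple of 8.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload_len:
        data_len = min(max_data, payload_len - offset)
        more = offset + data_len < payload_len  # MF is 0 only on the last one
        fragments.append((offset // 8, data_len, more))
        offset += data_len
    return fragments

for frag in fragment(4000):
    print(frag)
# (0, 1480, True)
# (185, 1480, True)
# (370, 1040, False)
```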

Impact of MTU on UDP

As long as the data carried in a UDP packet exceeds 1472 bytes (1500 - 20 (IP header) - 8 (UDP header)), the IP packet is split into multiple fragments at the network layer.
The loss of any one of these fragments causes reassembly at the receiving network layer to fail. This means that if a UDP datagram is fragmented at the network layer, the probability of losing the entire datagram increases greatly.
In a LAN environment, it is recommended that UDP data be kept under 1472 bytes. On the Internet, it is recommended that UDP data be kept under 548 bytes.

Impact of MTU on TCP

The length of a TCP packet cannot be unlimited; it is also constrained by the MTU. The maximum amount of data in one TCP packet is called the Maximum Segment Size (MSS).
When TCP establishes a connection, the two sides exchange MSS values (this can only happen during the three-way handshake; if no value is announced, the default is 536 bytes). Ideally, the MSS is exactly the largest size at which the IP packet will not be fragmented (which is still bounded by the MTU of the data link layer).
Each side writes the MSS it supports into the TCP header of its SYN. Once both sides know the other's value, the smaller one is used as the final MSS. The MSS value is carried in the variable-length options area of the TCP header (up to 40 bytes), as option kind=2.
MSS = MTU - IP header - TCP header, that is, the length of the data in a TCP packet.
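The formula and the "smaller value wins" rule can be checked with a short Python sketch. It assumes the minimum 20-byte IP and TCP headers (no options); the function names are invented for this example.

```python
IP_HEADER = 20   # minimum IPv4 header, no options (assumption)
TCP_HEADER = 20  # minimum TCP header, no options (assumption)

def mss_for_mtu(mtu: int) -> int:
    """MSS = MTU - IP header - TCP header."""
    return mtu - IP_HEADER - TCP_HEADER

def negotiated_mss(local_mtu: int, peer_mtu: int) -> int:
    """Each side announces its own MSS in its SYN; the smaller one is used."""
    return min(mss_for_mtu(local_mtu), mss_for_mtu(peer_mtu))

print(mss_for_mtu(1500))          # 1460
print(negotiated_mss(1500, 576))  # 536
```

Note that an MTU of 576 gives exactly the 536-byte default mentioned above.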

3. The MAC address

As mentioned above, the header of an Ethernet packet contains information about the sender and receiver. So how are senders and receivers identified?

Ethernet requires every device connected to the network to have a network interface card (NIC). Packets travel from one network card to another, and the address of the card, called the MAC address, is the address used for sending and receiving packets.

Each NIC leaves the factory with a MAC address that is unique in the world. The address is 48 bits long and is usually written as 12 hexadecimal digits.

The first six hexadecimal digits identify the manufacturer, and the last six are the serial number the manufacturer assigned to that card. With MAC addresses, the network card, and therefore the path of the packet, can be identified.
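A small Python helper makes the two halves of the address concrete. The function name and the sample address are made up for illustration; the address does not refer to any real manufacturer.

```python
def split_mac(mac: str):
    """Split a 48-bit MAC address into (manufacturer part, card part)."""
    digits = mac.replace(":", "").replace("-", "").upper()
    if len(digits) != 12 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits[:6], digits[6:]

vendor, serial = split_mac("00:1A:2B:3C:4D:5E")
print(vendor, serial)  # 001A2B 3C4D5E
```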

4. Broadcasting

Defining the address is only the first step, there are more steps to follow:

1) First: how does one network card know the MAC address of another?

The answer is the ARP protocol. It is covered later in the network layer section; for now it is enough to know that the sender must know the receiver's MAC address before an Ethernet packet can be sent.
2) Second: even with the MAC address, how does the system deliver the packet to exactly the right recipient?

The answer is that Ethernet works in a very primitive way. Instead of delivering a packet precisely to the receiver, it sends the packet to every computer on the network and lets each computer decide for itself whether it is the receiver.

In the figure above, computer No. 5 sends a packet to computer No. 3. Computers No. 1, 2, 3, 4, and 6 in the same subnetwork all receive the packet. Each of them reads the packet header, finds the recipient's MAC address, and compares it with its own MAC address. If the two match, it accepts the packet for further processing; otherwise, it discards the packet. This mode of transmission is called broadcasting.
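The accept-or-discard decision can be simulated in a few lines of Python. This is a toy model: the host names, MAC addresses, and dictionary-based "frame" are all invented for illustration and do not reflect how a real network card filters frames.

```python
def broadcast(frame: dict, hosts: dict) -> list:
    """Show the frame to every host; return the names of hosts that accept it."""
    accepted = []
    for name, mac in hosts.items():
        if mac == frame["dst"]:  # each host compares the target with its own MAC
            accepted.append(name)
        # any other host silently discards the frame
    return accepted

# Six hosts on one subnetwork, with made-up MAC addresses.
hosts = {f"host{i}": f"02:00:00:00:00:0{i}" for i in range(1, 7)}
frame = {"src": hosts["host5"], "dst": hosts["host3"], "data": b"hello"}
print(broadcast(frame, hosts))  # ['host3']
```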

With a packet definition, a MAC address for the network card, and a way to send the broadcast, the “link layer” can transmit data between multiple computers.

IV. Network layer

1. The origin of the network layer

The Ethernet protocol relies on MAC addresses to send data. In theory, using MAC addresses alone, a network card in Chengdu could find a network card in Houston; this is technically achievable.

There is, however, a major drawback to this. Ethernet uses broadcast mode to send data packets, which is inefficient and limited to the sub-network of the sender. That is, if the two computers are not in the same subnetwork, the broadcast will not pass through. This design makes sense, otherwise every computer on the Internet would receive all the packets and that would be a disaster.

The Internet is a vast network made up of countless subnetworks. It is almost impossible to imagine computers in Chengdu and Houston being on the same subnetwork.

Therefore, a way must be found to tell which MAC addresses belong to the same subnetwork and which do not. If two hosts are in the same subnetwork, packets are sent by broadcast; otherwise, they are sent by routing. (Routing, meaning how packets are forwarded between different subnetworks, is a big topic that this article does not cover.) Unfortunately, MAC addresses alone cannot do this: a MAC address depends only on the vendor and has nothing to do with the network the card is on.

This led to the creation of the network layer. It introduces a new set of addresses that let us tell whether different computers belong to the same subnetwork. These addresses are called network addresses, or simply addresses.

So, with the advent of the network layer, each computer has two kinds of address: a MAC address and a network address. The MAC address is bound to the network card, while the network address is assigned by an administrator. There is no inherent relationship between the two; they just happen to be paired on the same machine.

Network addresses help us determine which subnetwork the computer is on, and MAC addresses send packets to the destination network card in that subnetwork. Therefore, it is logical to assume that the network address must be processed first and then the MAC address.

2. IP protocol

The protocol that defines network addresses is called the IP protocol, and the addresses it defines are called IP addresses. Two versions of the IP protocol are in wide use today: version 4 and version 6, known as IPv4 and IPv6.

(1) IPv4

IPv4 specifies that a network address consists of 32 bits.
By convention, an IP address is written as four decimal numbers separated by dots, ranging from 0.0.0.0 to 255.255.255.255.

Every computer on the Internet is assigned an IP address.

The address is divided into two parts, the first part representing the network and the second part representing the host.

For example, take the IP address 14.215.177.39, a 32-bit address, and assume that the network part is the first 24 bits (14.215.177); the host part is then the last 8 bits (the final 39). Computers in the same subnetwork must have the same network part in their IP addresses, so 14.215.177.2 and 14.215.177.1 must be in the same subnetwork.

The problem, however, is that we can’t judge the network from the IP address alone. Take 14.215.177.39 as an example. It is not clear from the IP address whether the network part is the first 24 bits, the first 16 bits, or even the first 28 bits. So how can you tell whether two computers belong to the same subnetwork from their IP addresses? This uses another parameter, the subnet mask.

Subnet mask:

A subnet mask is a parameter that describes a subnetwork. It has the same form as an IP address: a 32-bit binary number with all 1s in the network part and all 0s in the host part. For example, for the IP address 172.16.254.1, if the network part is known to be the first 24 bits and the host part the last 8 bits, the subnet mask is 11111111.11111111.11111111.00000000, or 255.255.255.0 in decimal.
Knowing the subnet mask, we can tell whether any two IP addresses are in the same subnetwork. The method is to perform a bitwise AND of each IP address with the subnet mask (a result bit is 1 only if both bits are 1, otherwise it is 0) and then compare the two results. If they are equal, the addresses are in the same subnetwork; otherwise, they are not.
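The bitwise AND comparison translates directly into Python. The sketch below uses the standard library `ipaddress` module only to turn dotted-decimal text into 32-bit integers; the function name and the second sample address are made up for illustration.

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """AND each address with the mask and compare the network parts."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    m = int(ipaddress.IPv4Address(mask))
    return (a & m) == (b & m)

print(same_subnet("172.16.254.1", "172.16.254.233", "255.255.255.0"))  # True
print(same_subnet("172.16.254.1", "172.16.253.1", "255.255.255.0"))    # False
```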

NAT

We know that IPv4 addresses are only 32 bits long, and the number of Internet users on the planet has far exceeded what 32 bits can number, so why haven't the addresses run out yet?

Because there are technologies that relieve the address shortage indirectly, such as NAT.

Each small local area network (LAN) uses private network addresses on its own segment, which are translated into a public address when traffic goes to the outside world. In this way, dozens or hundreds of computers need only one public address.
You can even nest private networks inside private networks, NAT on top of NAT, layer by layer. This greatly reduces the number of public IP addresses needed, and it is what has kept IPv4 alive to this day, so that we can still get online.
However, NAT also has many disadvantages. Although it is easy for a private address to reach an Internet address, it is hard for an Internet address to reach a private address. Many services are restricted as a result and can only be made to work through complex configuration, which also reduces the efficiency of network processing.
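To make the idea concrete, here is a deliberately simplified Python sketch of a translation table. It assumes the port-based flavor of NAT, where each private (address, port) pair is mapped to a distinct port on the single public address; the class name, starting port, and addresses are all hypothetical, and real NAT devices track much more state.

```python
class Nat:
    """Toy port-based NAT: many private hosts share one public address."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}   # (private_ip, private_port) -> public_port
        self.back = {}  # public_port -> (private_ip, private_port)

    def outbound(self, private_ip: str, private_port: int):
        """Translate an outgoing connection, creating a mapping if needed."""
        key = (private_ip, private_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def inbound(self, public_port: int):
        """Translate a reply; None means no mapping exists, so it is dropped."""
        return self.back.get(public_port)

nat = Nat("203.0.113.7")
print(nat.outbound("192.168.1.10", 5000))  # ('203.0.113.7', 40000)
print(nat.outbound("192.168.1.11", 5000))  # ('203.0.113.7', 40001)
print(nat.inbound(40001))                  # ('192.168.1.11', 5000)
print(nat.inbound(12345))                  # None
```

The last line shows the drawback described above: traffic arriving from outside has nowhere to go unless a host inside has already opened a mapping.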

(2) IPv6

Definition:

An IPv6 address consists of eight groups of four hexadecimal digits, separated by colons (:).
IPv6 supports compression. Leading zeros in each group can be omitted, so the address shown in the figure can be written as 2001:D12:0:0:2AA:987:FE29:9871
When several consecutive groups are all 0, they can be compressed into a double colon (::), which gives the final simplified form: 2001:D12::2AA:987:FE29:9871
Note: the double colon can appear only once in an address
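Python's standard `ipaddress` module applies the same compression rules, which gives a quick way to check them. The full-length address below is the example above with its leading zeros written out (an assumption about what the missing figure showed); note that the module prints hexadecimal digits in lowercase.

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0D12:0000:0000:02AA:0987:FE29:9871")
print(addr.exploded)    # 2001:0d12:0000:0000:02aa:0987:fe29:9871
print(addr.compressed)  # 2001:d12::2aa:987:fe29:9871

# A second double colon would be ambiguous, so it is rejected.
try:
    ipaddress.IPv6Address("2001::2AA::9871")
except ValueError as err:
    print("invalid address:", err)
```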

IPv6 number segment division and prefix representation

IPv6 has a huge address space of 128 bits. Such a large space is not divided up arbitrarily; instead, ranges are allocated according to bit positions.

The following figure shows the IPv6 address structure

For example, with n=48 and m=16 in the structure defined by RFC4291, the subnet prefix (the first n+m bits) and the interface ID are 64 bits each.

IPv6 does not have the concept of subnet mask. Instead, it supports the identification method of subnet prefix.

The notation is IPv6 address/prefix length, for example:

2001:C3:0:2C6A::/64 indicates a subnet.
2001:C3:0:2C6A:C9B4:FF12:48BC:1A22/64 indicates the address of a node in that subnet.

An IPv6 address consists of a subnet prefix and an interface ID. The subnet prefix is defined and assigned by the address allocation and management organization, and the interface ID can be generated by each operating system.
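The prefix notation can be explored with the standard `ipaddress` module. The sketch below reuses the two example addresses from above and splits the node address into its 64-bit subnet prefix and 64-bit interface ID; the variable names are arbitrary.

```python
import ipaddress

subnet = ipaddress.ip_network("2001:C3:0:2C6A::/64")
node = ipaddress.ip_interface("2001:C3:0:2C6A:C9B4:FF12:48BC:1A22/64")

print(node.network == subnet)  # True: the node's prefix is this subnet
print(node.ip in subnet)       # True

# The low 64 bits of the address are the interface ID.
interface_id = int(node.ip) & ((1 << 64) - 1)
print(f"{interface_id:016x}")  # c9b4ff1248bc1a22
```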

Advantages of IPv6

IPv6 is used to solve the problem of IPv4 address exhaustion. IPv4 addresses are 32 bits and IPv6 addresses are 128 bits. Besides the number of addresses, IPv6 has many advantages.

IPv6 uses smaller routing tables, which lets routers forward packets faster.
IPv6 adds enhanced multicast support and flow control, which benefits multimedia applications and quality of service (QoS) control.
IPv6 adds support for automatic configuration. This improves on and extends the DHCP protocol, making network management (especially on LANs) more convenient and faster.
IPv6 offers better security. Users can encrypt data at the network layer and verify IP packets, which greatly enhances network security.
IPv6 is more extensible. The protocol allows extensions when new technologies or applications require them.
IPv6 has a better header format. The new header format simplifies and speeds up routing and improves efficiency.

(3) The difference between IPv4 and IPv6

In the link layer frame, the type field is 0x86DD for IPv6 and 0x0800 for IPv4.
The IPv6 header is a fixed 40 bytes, whereas the IPv4 header varies from 20 to 60 bytes. A fixed-length header makes code that processes IPv6 packets simpler and more efficient.
The checksum field is removed from the IPv6 header, which improves the forwarding efficiency of routers. Note, however, that the UDP checksum is mandatory under IPv6 (it is optional under IPv4); the TCP checksum is mandatory under both.
When an IPv6 packet carries an upper-layer protocol such as ICMPv6, TCP, or UDP, the Next Header value is 58, 6, or 17 respectively, similar to the Protocol field in the IPv4 header.
If the header is not followed directly by one of these three protocols, it is followed by an extension header. Extension headers are a new concept introduced by IPv6. Each IPv6 packet can carry zero or more extension headers, organized as a linked list: when a packet carries an extension header, the Next Header field holds the type of that extension header. Why introduce extension headers? They are another improvement over IPv4: they replace IPv4's options, simplify the IPv6 header, and make IPv6 more extensible. For example, when a fragmented IPv6 packet is sent, the information about each fragment is carried in an extension header. As shown in the figure, a Next Header value of 44 in the IPv6 header indicates that a fragment extension header follows. In contrast, IPv4 records fragment information in the fragment fields of the IPv4 header itself.
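Because the IPv6 header has a fixed 40-byte layout, it can be unpacked with a single `struct` format string. The sketch below is a minimal parser that assumes the standard field order (version, traffic class and flow label in the first 4 bytes, then payload length, Next Header, hop limit, and the two addresses); the function name, the sample packet, and the 2001:db8:: documentation addresses are illustrative.

```python
import ipaddress
import struct

# Next Header values mentioned above.
NEXT_HEADER = {6: "TCP", 17: "UDP", 44: "Fragment header", 58: "ICMPv6"}

def parse_ipv6_header(packet: bytes) -> dict:
    """Decode the fixed 40-byte IPv6 header at the start of packet."""
    if len(packet) < 40:
        raise ValueError("an IPv6 header is 40 bytes long")
    first_word, payload_len, next_header, hop_limit, src, dst = struct.unpack(
        "!IHBB16s16s", packet[:40])
    return {
        "version": first_word >> 28,  # top 4 bits of the first 32-bit word
        "payload_len": payload_len,
        "next_header": NEXT_HEADER.get(next_header, str(next_header)),
        "hop_limit": hop_limit,
        "src": str(ipaddress.IPv6Address(src)),
        "dst": str(ipaddress.IPv6Address(dst)),
    }

# A hand-built sample header that announces a UDP payload.
sample = struct.pack("!IHBB16s16s", 6 << 28, 8, 17, 64,
                     ipaddress.IPv6Address("2001:db8::1").packed,
                     ipaddress.IPv6Address("2001:db8::2").packed)
print(parse_ipv6_header(sample))
```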

(4) IP protocol summary

As mentioned above, the IP protocol has two main functions:

Assign IP addresses to each computer
Determine which addresses are in the same subnetwork

3. IP packets

Data sent over the IP protocol is called an IP packet. IP packets are placed directly into the "data" section of Ethernet packets, with no change to the Ethernet specification at all. That is the beauty of the Internet's layered structure: changes in an upper layer do not affect the structure of the layers below.

Specifically, IP packets are also divided into header and data: The header ranges from 20 to 60 bytes (IPv6 is fixed to 40 bytes), and the total length of the entire packet is up to 65535 bytes. Therefore, theoretically, the data portion of an IP packet is 65515 bytes at most.

As shown in the figure, the first 20 bytes of the header are fixed and contain the version, length, IP addresses, and other information; they may be followed by an optional, variable-length part of the header. The data is the content of the IP packet.
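As an illustration of the fixed 20-byte part, the Python sketch below unpacks an IPv4 header with `struct`. It assumes the standard IPv4 field layout; the function name is made up, the sample header is built by hand, and its checksum is left at 0 rather than computed.

```python
import socket
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, _tos, total_len, ident, flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4  # stored in 4-byte units: 20 to 60 bytes
    return {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "total_len": total_len,
        "data_len": total_len - header_len,
        "id": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "ttl": ttl,
        "protocol": proto,  # 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Sample: version 4, 20-byte header, 40 bytes in total, DF set, carrying TCP.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0x4000, 64, 6, 0,
                     socket.inet_aton("192.168.1.10"),
                     socket.inet_aton("14.215.177.39"))
print(parse_ipv4_header(sample))
```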

When placed in an Ethernet packet, the Ethernet packet looks like this:

In the Ethernet protocol, the data portion of an Ethernet packet is at most 1500 bytes. Therefore, an IP packet larger than 1500 bytes must be split into several fragments, each sent in its own Ethernet packet.

4. ARP protocol

There is one last point about the network layer. Because IP packets are sent in Ethernet packets, we must know both the MAC address and IP address of the other party. Usually, the IP address of the other party is known, but we do not know its MAC address. So, we need a mechanism to get MAC addresses from IP addresses.

Again, there are two cases:

In the first case, the two hosts are not in the same subnetwork. There is then no way to obtain the other host's MAC address; the packet can only be sent to the gateway that connects the two subnetworks, which handles it from there.
In the second case, the two hosts are in the same subnetwork, and ARP can be used to obtain the other host's MAC address. ARP also sends a packet (carried in an Ethernet packet) containing the IP address of the host being queried, with the destination MAC address field set to FF:FF:FF:FF:FF:FF, the broadcast address. Every host on the subnetwork receives the packet, extracts the IP address from it, and compares it with its own. If they match, the host replies with its MAC address; otherwise, it discards the packet.

In short, with the ARP protocol we can obtain the MAC address of any host in the same subnetwork and can therefore send packets to any host.
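The second case can be modeled as a toy Python function: a request is "broadcast" to every host on the subnetwork, and only the host whose IP address matches answers. The function name, the dictionary-based request, and all addresses are invented for illustration; a real ARP exchange uses binary packets and a cache.

```python
BROADCAST_MAC = "FF:FF:FF:FF:FF:FF"

def arp_lookup(target_ip: str, subnet_hosts: dict):
    """subnet_hosts maps MAC -> IP for every host on the subnetwork."""
    request = {"dst_mac": BROADCAST_MAC, "target_ip": target_ip}
    for mac, ip in subnet_hosts.items():  # every host sees the broadcast
        if ip == request["target_ip"]:
            return mac                    # the matching host replies
        # every other host discards the request
    return None                           # nobody on this subnetwork answered

subnet_hosts = {
    "02:00:00:00:00:01": "192.168.1.1",
    "02:00:00:00:00:02": "192.168.1.2",
    "02:00:00:00:00:03": "192.168.1.3",
}
print(arp_lookup("192.168.1.3", subnet_hosts))   # 02:00:00:00:00:03
print(arp_lookup("192.168.1.99", subnet_hosts))  # None
```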

ARP attack

ARP attacks exploit the fact that the ARP protocol was designed without any security verification. By forging ARP packets, an attacker can steal the communication data of legitimate users, causing serious harm such as reduced network transmission rates and theft of private user information.

ARP attacks occur mainly on local area networks (LANs). If a computer on the LAN is infected with an ARP Trojan, the infected system uses ARP spoofing to try to intercept the traffic of other computers on the network, which in turn causes communication failures for those computers.

Common symptoms of an ARP attack on a LAN include intermittent Internet access, failed file copies, and a surge of ARP packets. Other signs are one MAC address mapping to multiple IP addresses, data that cannot be sent, information on the network being stolen, and mismatched protocol addresses in packets that generate large numbers of ARP packets.

On a LAN, ARP attacks are the main security threat. Traditional networks deal with them through static binding, but this limits how easily the network can be extended.

V. Transport layer

1. The origin of the transport layer

With MAC addresses and IP addresses, we can already establish communication between any two hosts on the Internet.

The next problem is that many programs on the same host need to use the network, for example when you browse the web and chat with friends online at the same time. When a packet arrives from the Internet, how do you know whether it holds web page content or online chat content?

That is, we also need a parameter to indicate which program (process) is using the packet. This parameter is called port, and it is actually the number of each program that uses the network card. Each packet is sent to a specific port on the host, so different programs can get the data they need.

A port is an integer between 0 and 65535, which is exactly 16 bits. Ports 0 to 1023 are reserved for the system, so user programs can only choose ports above 1023. Whether browsing the web or chatting online, the application picks a port at random and then contacts the corresponding port on the server.

The function of the transport layer is to establish port-to-port communication, whereas the function of the network layer is to establish host-to-host communication. Once the host and port are identified, programs can communicate with each other. For this reason, Unix systems call the combination of host + port a socket. With sockets, you can develop network applications.
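A short Python sketch shows the host + port pair in action, using the standard `socket` module and UDP over the loopback address. Binding to port 0 asks the operating system to pick a free port; the payload and variable names are arbitrary, and the example assumes the loopback interface is available.

```python
import socket

# "Server": bind a UDP socket to a host + port chosen by the OS.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2)
host, port = server.getsockname()

# "Client": send a datagram to that host + port.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"hello", (host, port))

# The packet is delivered to the program listening on the target port.
data, client_addr = server.recvfrom(1024)
print(data, "arrived at port", port, "from port", client_addr[1])

client.close()
server.close()
```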

2. UDP protocol: user datagram protocol

Now, we have to add port information to packets, which requires new protocols. The simplest implementation is called UDP, and the format is almost nothing more than a port number in front of the data.

A UDP packet consists of a header and data:

As shown in the figure, the UDP header is fixed at 8 bytes and contains the 16-bit source port and 16-bit destination port, as well as the 16-bit UDP length and 16-bit UDP checksum. Then, the entire UDP packet is placed in the data section of the IP packet, which, as mentioned earlier, is placed in the Ethernet packet, so the entire Ethernet packet now looks like this:

UDP packets are very simple. The header is only 8 bytes, and the total length is no more than 65,535 bytes, which fits into an IP packet.
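The four 16-bit header fields map directly onto a `struct` format string. The sketch below builds and parses a UDP datagram; the function names and port numbers are made up, and the checksum is simply set to 0 (which UDP over IPv4 treats as "no checksum") instead of being computed.

```python
import struct

def build_udp(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prefix payload with the 8-byte UDP header."""
    length = 8 + len(payload)  # the length field covers header + data
    checksum = 0               # not computed in this sketch
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

def parse_udp(datagram: bytes):
    """Return (source port, destination port, length, data)."""
    src_port, dst_port, length, _checksum = struct.unpack("!HHHH", datagram[:8])
    return src_port, dst_port, length, datagram[8:length]

datagram = build_udp(50000, 53, b"hello")
print(len(datagram))        # 13
print(parse_udp(datagram))  # (50000, 53, 13, b'hello')
```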

The maximum size of a UDP packet

In a LAN environment, it is recommended that UDP data be kept under 1472 bytes
- The data portion of an Ethernet frame must be between 46 and 1500 bytes, a limit that comes from the physical characteristics of Ethernet. This 1500 bytes is called the link layer's MTU (Maximum Transmission Unit). It does not mean the whole frame is limited to 1500 bytes: the MTU refers to the data area of the frame and excludes the 18 bytes of header and trailer around it.
- So, in effect, 1500 bytes is the length limit for a network layer IP datagram. Since the IP header is at least 20 bytes, the data portion of an IP datagram is at most 1480 bytes. These 1480 bytes hold a TCP segment or a UDP datagram.
- Since the UDP header is 8 bytes, the data portion of a UDP datagram is at most 1472 bytes. These 1472 bytes are what we can actually use.
- What happens when we send more than 1472 bytes of UDP data? The IP datagram then exceeds 1500 bytes, which is larger than the MTU. The sender's IP layer has to fragment the datagram into pieces no larger than the MTU, and the receiver's IP layer has to reassemble them. This adds work and, worse, because UDP does not retransmit, if a single fragment is lost the receiver cannot rebuild the datagram, and the entire UDP datagram is discarded.
- Therefore, in a normal LAN environment, I recommend keeping UDP data under 1472 bytes.
For Internet programming, it is recommended that UDP data be kept under 548 bytes
- The Internet is different because links along the path may have different MTUs. If we send data assuming an MTU of 1500 and some network on the path has a smaller MTU, the datagram has to be fragmented along the way (or the sender has to discover and adapt to the smaller MTU) to reach its destination, which adds a lot of unnecessary work.
- A figure of 576 bytes is often quoted as the "standard" Internet MTU, which gives a UDP data length of 548 bytes (576-8-20).
- Strictly speaking, 576 is not the Internet's standard MTU. The IPv4 specification requires every host to be able to accept (reassemble) datagrams of at least 576 bytes, and that is the real reason to keep UDP packets within this size.
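Both recommended sizes come from the same subtraction, shown here as a tiny Python check. It assumes the minimum 20-byte IP header and the 8-byte UDP header; the function name is made up for this example.

```python
IP_HEADER = 20  # minimum IPv4 header, no options (assumption)
UDP_HEADER = 8

def max_udp_payload(datagram_limit: int) -> int:
    """Largest UDP data size that fits in an IP datagram of the given size."""
    return datagram_limit - IP_HEADER - UDP_HEADER

print(max_udp_payload(1500))  # 1472 (Ethernet MTU)
print(max_udp_payload(576))   # 548  (IPv4 minimum reassembly size)
```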

3. TCP: Transmission control protocol

The advantage of UDP is that it is simple and easy to implement; the disadvantage is poor reliability: once a packet is sent, there is no way to know whether it was received. To solve this problem and improve reliability, the TCP protocol was created. The protocol is very complex, but it can be roughly thought of as UDP with an acknowledgment mechanism: every packet sent must be acknowledged. If a packet is lost, no acknowledgment arrives, and the sender knows it must resend the packet.

Therefore, TCP ensures that data is not lost. Its disadvantages are a complicated process, difficult implementation, and higher resource consumption.
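The "send, wait for acknowledgment, resend if none arrives" idea can be sketched as a small Python simulation. This is a stop-and-wait toy, far simpler than real TCP (which uses sequence numbers, sliding windows, and timers); the function name, the `drop_attempts` parameter, and the log messages are all invented so that the loss pattern is deterministic.

```python
def send_reliably(packets, drop_attempts, max_tries=5):
    """Send packets one at a time, resending until each one is acknowledged.

    drop_attempts is a set of (packet_index, attempt_number) pairs that the
    simulated network loses. Returns a log of what happened.
    """
    log = []
    for seq, _data in enumerate(packets):
        for attempt in range(1, max_tries + 1):
            if (seq, attempt) in drop_attempts:
                log.append((seq, attempt, "lost, no ACK, resend"))
                continue
            log.append((seq, attempt, "ACK received"))
            break
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return log

# The first attempt to send packet 1 is lost; the retry succeeds.
for entry in send_reliably([b"a", b"b", b"c"], drop_attempts={(1, 1)}):
    print(entry)
```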

As shown in the figure, the TCP header ranges from 20 to 60 bytes. In addition to the source and destination port numbers, it contains other information, such as the sequence number and acknowledgment number, that is used to keep the connection reliable and to resend data.

Like UDP packets, TCP packets are embedded in the data section of IP packets. In theory a TCP packet has no length limit, but for network efficiency its length is usually kept within that of a single IP packet, so that a single TCP packet does not have to be split.

VI. Application layer

The application receives data from the transport layer and then interprets it. Because the Internet is an open architecture, and data comes from so many different sources, it has to be formatted beforehand, otherwise it’s impossible to read. The role of the application layer is to dictate the data format of the application.

For example, TCP can transfer data for a variety of applications, such as Email, WWW, FTP, and so on. Then, different protocols must dictate the format of E-mail, web pages, and FTP data, and these application protocols constitute the application layer. This is the highest layer, directly facing the user. Its data is in the data section of the TCP packet.

So Ethernet packets now look like this:
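The nesting can be sketched in Python as each layer prepending its own header to whatever the layer above handed down. The bracketed headers are readable placeholders rather than real binary headers, and the HTTP request and host name are only an example of application layer data.

```python
def wrap(header: bytes, data: bytes) -> bytes:
    """One layer of encapsulation: this layer's header, then the upper layer's data."""
    return header + data

http = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"  # application layer data
tcp_packet = wrap(b"[TCP header]", http)                # transport layer
ip_packet = wrap(b"[IP header]", tcp_packet)            # network layer
frame = wrap(b"[Ethernet header]", ip_packet)           # data link layer

print(frame)
```

On the receiving side the same steps run in reverse: each layer strips its own header and passes the rest upward.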

Conclusion

I just want to say that anyone reading this is a monster!!


Source: juejin. Im /post/687686…