All over the world, a wide variety of computers run a wide variety of operating systems to serve their users. A single computer cannot do much on its own, so we need ways to connect computers and let them cooperate, so that together they can provide richer, more convenient services and deliver the most value.

Of course, simply connecting computers with a wire achieves nothing. It is like two people who meet but speak different languages: neither can understand the other. People invented languages such as Chinese and English so that they could work together, and computers likewise need a common set of rules and protocols to communicate. That is how TCP/IP was born. It is important to note that TCP/IP is not a single protocol but a network communication model. It is a family of network protocols that includes IP, ICMP, and TCP, as well as the main characters in this book, such as HTTP, FTP, POP3, and SMTP. It defines how electronic devices connect to the Internet and how data is transmitted between them, and it is the basic communication architecture of the entire network. With it, computers can freely communicate and collaborate with each other.

Now we know that the TCP/IP protocol family contains quite a lot of protocols. They can be divided into four layers according to their functions. Each layer relies on the protocols of the layer below it to do its own job, and the four layers correspond to the seven layers of the OSI model, as shown in the figure below:

In the diagram you can see some familiar protocols: IP at the network layer, TCP at the transport layer, and HTTP at the application layer, along with the familiar FTP, DNS, and SMTP (for e-mail). Our daily work mostly happens at the application layer, and we rarely need to worry about what goes on in the layers beneath it.

In addition, the red and blue lines in the figure above briefly show how data flows through the TCP/IP protocols. When a request is sent over HTTP, the protocols of the application layer, transport layer, network layer, and link layer each wrap the request in turn and attach their own header. The link layer finally produces an Ethernet packet, which travels over the physical medium to the other host. When that host receives the packet, it unwraps it layer by layer with the corresponding protocols and finally hands the application layer data to the application for processing.
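The wrapping and unwrapping described above can be sketched in a few lines of Python. This is only a toy model: the bracketed tags stand in for real protocol headers, and the function names encapsulate and decapsulate are made up for this illustration.

```python
# Toy model of layered encapsulation. The bracketed tags are placeholders,
# not real protocol headers.
LAYERS = ["HTTP", "TCP", "IP", "ETH"]  # application layer first, link layer last


def encapsulate(payload: str) -> str:
    """Wrap the payload with one header per layer, application layer first."""
    packet = payload
    for layer in LAYERS:
        packet = f"[{layer}]{packet}"
    return packet


def decapsulate(packet: str) -> str:
    """Strip the headers in reverse order, link layer first."""
    for layer in reversed(LAYERS):
        prefix = f"[{layer}]"
        if not packet.startswith(prefix):
            raise ValueError(f"expected a {layer} header")
        packet = packet[len(prefix):]
    return packet


frame = encapsulate("GET /index.html")
print(frame)               # [ETH][IP][TCP][HTTP]GET /index.html
print(decapsulate(frame))  # GET /index.html
```

Note that the outermost header belongs to the lowest layer, which is why the receiver has to start unwrapping at the link layer.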

Network communication is much like parcel delivery. The goods are wrapped by the various protocols, and each wrapper carries information such as a description of the goods, the delivery address, the recipient, and the contact information. The parcel then passes through delivery trucks, distribution stations, and couriers before it finally reaches the recipient.

In general, a parcel is not delivered directly. It is first forwarded to the appropriate distribution station, which then sends it onward.

The delivery truck is the physical medium, the distribution station is the gateway, the courier is the router, the delivery address is the IP address, and the contact information is the MAC address.

The courier is responsible for carrying the parcel from one distribution station to the next. Each distribution station checks the province and city in the delivery address to decide whether the parcel must be forwarded to another station. When the parcel arrives at the target distribution station, that station finds the recipient using the contact information.

With the overall concept in mind, let’s take a detailed look at the division of labor at each level.

1. Link layer

Network communication means transmitting meaningful data over a physical medium. Simply sending a stream of 0s and 1s is meaningless. To carry meaning, the 0s and 1s must be grouped into bytes, each group of electrical signals must be labeled with information that describes it, and the groups must then be sent in order. Ethernet defines such a group of electrical signals as a packet, a packet is called a frame, and the protocol that sets these rules is the Ethernet protocol.

A complete Ethernet packet looks like this:

The whole data frame consists of a header, data, and a trailer. The header is 14 bytes long and contains the destination MAC address, the source MAC address, and the type. The data is at least 46 bytes and at most 1500 bytes long. If the data to be transmitted is longer than that, it must be split across multiple frames. The trailer is 4 bytes long and holds the frame check sequence, which is used to detect whether the frame was damaged during transmission. In this way, the Ethernet protocol groups electrical signals into data frames and then sends the frames to the receiver over the physical medium. So how does Ethernet identify the receiver?
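To make the sizes concrete, here is a minimal sketch that builds a frame with the layout just described. It assumes the check sequence can be modeled with zlib.crc32 (Ethernet uses a CRC-32, but real network hardware computes it for you). The source MAC address and the helper names build_frame and check_frame are invented for the example.

```python
import struct
import zlib

MIN_PAYLOAD = 46    # shorter payloads are padded up to this size
MAX_PAYLOAD = 1500  # longer data must be split across several frames


def mac_to_bytes(mac: str) -> bytes:
    """Turn '4C-0F-6E-12-D2-19' into 6 raw bytes."""
    return bytes.fromhex(mac.replace("-", "").replace(":", ""))


def build_frame(dst: str, src: str, ethertype: int, payload: bytes) -> bytes:
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too long, split it across several frames")
    payload = payload.ljust(MIN_PAYLOAD, b"\x00")  # pad short payloads with zeros
    # 14-byte header: destination MAC (6) + source MAC (6) + type (2)
    header = mac_to_bytes(dst) + mac_to_bytes(src) + struct.pack("!H", ethertype)
    body = header + payload
    fcs = struct.pack("<I", zlib.crc32(body))  # 4-byte frame check sequence
    return body + fcs


def check_frame(frame: bytes) -> bool:
    """Recompute the check sequence and compare it with the trailer."""
    body, fcs = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == fcs


# 0x0800 is the type value for IPv4; the source address is made up.
frame = build_frame("4C-0F-6E-12-D2-19", "00-11-22-33-44-55", 0x0800, b"hello")
print(len(frame))          # 64 (14 + 46 + 4)
print(check_frame(frame))  # True
```

Flipping any single bit of the frame makes check_frame return False, which is how the receiver notices damage in transit.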

According to the Ethernet protocol, every device that accesses the network must be equipped with a network interface card (NIC), also called a network adapter, and packets are always sent from one NIC to another. The NIC address is the sending and receiving address of the packet, which is the MAC address contained in the frame header. The MAC address is the identity of each NIC, just like the number on our ID card, and it is globally unique. A MAC address is 6 bytes long and is usually written as six pairs of hexadecimal digits. The first three bytes are the manufacturer number and the last three bytes are the serial number of the network adapter, for example, 4C-0F-6E-12-D2-19.
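As a small illustration of the split between manufacturer number and serial number, the sketch below takes a MAC address apart. The function name split_mac is made up for this example.

```python
def split_mac(mac: str) -> tuple[str, str]:
    """Return (manufacturer part, serial part) of a MAC address."""
    parts = mac.replace(":", "-").upper().split("-")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"not a MAC address: {mac!r}")
    int("".join(parts), 16)  # raises ValueError if a digit is not hexadecimal
    return "-".join(parts[:3]), "-".join(parts[3:])


print(split_mac("4C-0F-6E-12-D2-19"))  # ('4C-0F-6E', '12-D2-19')
```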

With MAC addresses in place, Ethernet sends a packet by broadcasting it to every host in the subnet. Each host that receives the packet reads the destination MAC address in the header and compares it with its own MAC address. If they match, the host goes on to process the packet; if not, it discards the packet.
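The accept-or-discard rule above is simple enough to simulate. In this sketch a frame is just a dictionary, the host addresses are invented, and the function name deliver is hypothetical.

```python
def deliver(frame: dict, hosts: list[str]) -> list[str]:
    """Broadcast a frame to every host and return the hosts that keep it."""
    accepted = []
    for mac in hosts:            # every host in the subnet sees the frame
        if frame["dst"] == mac:  # keep it only if the destination MAC matches
            accepted.append(mac)
        # otherwise the host silently discards the frame
    return accepted


hosts = ["4C-0F-6E-12-D2-19", "00-11-22-33-44-55", "66-77-88-99-AA-BB"]
frame = {"dst": "00-11-22-33-44-55", "data": b"hello"}
print(deliver(frame, hosts))  # ['00-11-22-33-44-55']
```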

Therefore, the main work of the link layer is to group electrical signals into data frames with a defined meaning, and then send those frames to the receiver by broadcast over the physical medium.

2. Network layer

For the above process, a few details are worth thinking about:

  • How does the sender know the MAC address of the receiver?
  • How does the sender know whether the receiver is on the same subnet?
  • If the receiver is not on the same subnet, how can the packet be delivered to it?

To solve these problems, the network layer introduces three protocols, namely IP protocol, ARP protocol and routing protocol.

IP protocol

From the previous introduction, we know that the MAC address is tied only to the manufacturer and has nothing to do with the network a host is on. Therefore, it is not possible to determine from MAC addresses whether two hosts belong to the same subnet.

Therefore, the network layer introduces the IP protocol, which defines a new set of addresses that enable us to distinguish whether two hosts belong to the same network. This set of addresses is the network address, also known as the IP address.

There are two versions of IP addresses, IPv4 and IPv6. An IPv4 address is 32 bits long and is usually written as four decimal numbers separated by dots. The IP protocol divides the 32-bit address into two parts: the front part is the network address, and the back part is the address of the host within the LAN. Take the class C address 192.168.24.1 as an example. The first 24 bits are the network address, and the last 8 bits are the host address. If two IP addresses are in the same subnet, their network addresses must be the same. To determine which part of an IP address is the network address, the IP protocol also introduces the subnet mask. A bitwise AND of the IP address and the subnet mask yields the network address.

Since the IP addresses of the sender and the receiver are both known (they are passed in by the application layer protocol), we can AND each of them with the subnet mask and compare the results to determine whether the two addresses are on the same subnet.
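The AND operation can be tried out directly with Python's standard ipaddress module. This is a sketch for illustration only: the mask 255.255.255.0 matches the 24-bit network part of the example above, and the second and third host addresses are invented.

```python
import ipaddress


def network_address(ip: str, mask: str) -> str:
    """Bitwise AND of an IPv4 address and a subnet mask."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Two hosts share a subnet when their network addresses are equal."""
    return network_address(ip_a, mask) == network_address(ip_b, mask)


mask = "255.255.255.0"
print(network_address("192.168.24.1", mask))               # 192.168.24.0
print(same_subnet("192.168.24.1", "192.168.24.77", mask))  # True
print(same_subnet("192.168.24.1", "192.168.25.1", mask))   # False
```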

ARP protocol

The Address Resolution Protocol (ARP) is a network layer protocol that obtains a MAC address from an IP address. Here’s how it works:

ARP first builds a request packet whose header contains the IP address of the target host. The link layer wraps this packet in an Ethernet packet, which is then broadcast to every host in the subnet. Each host receives the packet, extracts the IP address from the header, and compares it with its own IP address. If they match, the host replies with its own MAC address; if not, it discards the packet. When ARP receives the reply, it knows the MAC address of the target machine. ARP also stores the returned MAC address and the corresponding IP address in the local ARP cache for a certain period of time, so the next request can be answered directly from the cache, which saves resources. Enter arp -a in CMD to view the ARP data in the local cache.
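The request, reply, and cache steps can be modeled roughly as follows. This is a simulation, not a real ARP implementation: the dictionary subnet_hosts stands in for the hosts that would answer the broadcast, the 60-second lifetime is an arbitrary assumption, and the names ArpCache and resolve are invented.

```python
import time


class ArpCache:
    """Maps IP addresses to MAC addresses, each entry with an expiry time."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self.entries = {}  # ip -> (mac, expiry time)

    def lookup(self, ip, now=None):
        now = time.monotonic() if now is None else now
        entry = self.entries.get(ip)
        if entry and entry[1] > now:
            return entry[0]
        self.entries.pop(ip, None)  # drop missing or expired entries
        return None

    def store(self, ip, mac, now=None):
        now = time.monotonic() if now is None else now
        self.entries[ip] = (mac, now + self.ttl)


def resolve(ip, subnet_hosts, cache, now=None):
    """Return the MAC address for ip, asking the subnet only on a cache miss."""
    mac = cache.lookup(ip, now)
    if mac is not None:
        return mac              # answered from the cache, no broadcast needed
    mac = subnet_hosts.get(ip)  # only the host that owns this IP would reply
    if mac is not None:
        cache.store(ip, mac, now)
    return mac


hosts = {"192.168.24.7": "4C-0F-6E-12-D2-19"}
cache = ArpCache()
print(resolve("192.168.24.7", hosts, cache))  # 4C-0F-6E-12-D2-19
print(resolve("192.168.24.9", hosts, cache))  # None (nobody replied)
```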

Routing protocol

From the way ARP works, we can see that the MAC addresses it resolves are limited to the local subnet. Routing protocols are therefore introduced at the network layer. First, the IP protocol is used to determine whether the two hosts are on the same subnet. If they are, ARP looks up the MAC address of the target and the packet is broadcast to the hosts in the subnet. If they are not, the packet is forwarded to the gateway of the local subnet for routing. The gateway is a bridge between subnets on the Internet. After possibly several hops from gateway to gateway, the packet reaches the subnet where the target IP address resides. There, ARP obtains the MAC address of the target, and the packet is finally sent to the receiver by broadcast.

The physical device that carries out this routing is the router. In the intricate network world, the router plays the role of a traffic hub. It selects a route according to the state of the network and forwards packets along the best path.
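The first decision a sending host makes, deliver directly or hand the packet to the gateway, can be sketched with the standard ipaddress module. The gateway address 192.168.24.254 and the function name next_hop are assumptions made for this example.

```python
import ipaddress


def next_hop(src_ip: str, dst_ip: str, mask: str, gateway_ip: str) -> str:
    """Return the IP address the packet should be handed to next."""
    # strict=False lets us pass a host address and still get its network
    network = ipaddress.IPv4Network(f"{src_ip}/{mask}", strict=False)
    if ipaddress.IPv4Address(dst_ip) in network:
        return dst_ip   # same subnet: deliver directly (after an ARP lookup)
    return gateway_ip   # different subnet: let the gateway route it


mask, gateway = "255.255.255.0", "192.168.24.254"
print(next_hop("192.168.24.1", "192.168.24.77", mask, gateway))  # 192.168.24.77
print(next_hop("192.168.24.1", "10.0.0.5", mask, gateway))       # 192.168.24.254
```

Each gateway along the way repeats a similar decision with its own routing table until the packet reaches the target subnet.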

IP packets

Packets wrapped at the network layer are called IP packets. The structure of IPv4 packets is as follows:

An IP packet consists of a header and data. The header is at least 20 bytes long and contains, among other fields, the source IP address and the destination IP address. The destination IP address is what gateways use to route the packet. In theory, the total length of an IP packet can reach 65535 bytes, so with a 20-byte header the data can be at most 65515 bytes long. The data part of an Ethernet packet, however, holds at most 1500 bytes. If an IP packet exceeds this size, it must be split into multiple frames for sending.
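A quick calculation shows how many frames a large IP packet needs. The sketch assumes a 20-byte header on every fragment and a 1500-byte limit per frame. One detail not mentioned above is also assumed: IPv4 counts fragment offsets in units of 8 bytes, so every fragment except the last carries a multiple of 8 data bytes. The function name fragment_sizes is made up.

```python
IP_HEADER = 20  # bytes, assuming a header without options
MTU = 1500      # largest IP packet that fits into one Ethernet frame


def fragment_sizes(data_len: int, mtu: int = MTU, header: int = IP_HEADER) -> list[int]:
    """Return the number of data bytes carried by each fragment."""
    per_fragment = (mtu - header) // 8 * 8  # round down to a multiple of 8
    sizes = []
    remaining = data_len
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


print(fragment_sizes(4000))        # [1480, 1480, 1040]
print(len(fragment_sizes(65515)))  # 45
```

So even a maximum-size IP packet fits into 45 Ethernet frames, each carrying its own copy of the IP header.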

Therefore, the main work of the network layer is to define network addresses, distinguish network segments, resolve MAC addresses within a subnet, and route packets between different subnets.

3. Transport layer

The link layer defines the identity of a host (the MAC address), and the network layer defines the IP address (the network segment) of the host. With these two addresses, packets can be sent from one host to another. In reality, though, a packet is sent by an application on one host and received by an application on the other host. Each computer may be running many applications at the same time, so when a packet arrives at a host, there is still no way to tell which application should receive it.

Therefore, the transport layer introduces the UDP protocol to solve this problem. To identify each application, UDP defines the port. Each application on the same host must use a unique port number, and every data packet transmitted over the network must carry the port information. In this way, when a packet reaches the host, the right application can be found by its port number. Packets defined by UDP are called UDP packets and have the following structure:

A UDP packet consists of the header and the data. The header is 8 bytes long and contains the source port and the destination port. The maximum length of data is 65527 bytes, and the maximum length of the entire data packet is 65535 bytes.
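The 8-byte header is small enough to build by hand with the struct module. Besides the two ports mentioned above, the real header also holds a length field and a checksum field. This sketch simply leaves the checksum at 0 (an assumption, real network stacks normally compute it), and the port numbers and function names are chosen for illustration.

```python
import struct

UDP_HEADER = 8  # bytes: source port, destination port, length, checksum


def build_udp(src_port: int, dst_port: int, data: bytes) -> bytes:
    length = UDP_HEADER + len(data)
    if length > 65535:
        raise ValueError("data too long for a single UDP packet")
    checksum = 0  # placeholder, not a real checksum
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + data


def parse_udp(packet: bytes) -> tuple[int, int, bytes]:
    """Return (source port, destination port, data)."""
    src_port, dst_port, length, _checksum = struct.unpack("!HHHH", packet[:UDP_HEADER])
    return src_port, dst_port, packet[UDP_HEADER:length]


packet = build_udp(50000, 53, b"query")
print(len(packet))        # 13 (8-byte header + 5 bytes of data)
print(parse_udp(packet))  # (50000, 53, b'query')
```

Each header field is 16 bits wide, which is why port numbers and the total packet length both top out at 65535.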

The UDP protocol is simple and easy to implement, but it has no acknowledgement mechanism. Once a packet has been sent, the sender cannot know whether the other side received it, so reliability is poor. To solve this problem and improve reliability, the TCP protocol was born. TCP, the Transmission Control Protocol, is a connection-oriented, reliable communication protocol based on byte streams. In simple terms, TCP can be thought of as UDP with an acknowledgement mechanism added. Each packet sent must be acknowledged. If a packet is lost, no acknowledgement arrives and the sender must resend the packet.
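The acknowledge-or-resend rule can be simulated without any real network. In this sketch the sender transmits one packet at a time and waits for its acknowledgement, which is a simplification (real TCP keeps many packets in flight). The argument drop_plan, which says which attempts get lost, and the function name send_reliably are invented for the example.

```python
def send_reliably(packets, drop_plan, max_tries=5):
    """Send packets one by one, resending each until it is acknowledged.

    drop_plan is a set of (packet index, attempt number) pairs that are lost.
    Returns a log of (packet, attempt, outcome) tuples.
    """
    log = []
    for index, packet in enumerate(packets):
        for attempt in range(1, max_tries + 1):
            lost = (index, attempt) in drop_plan
            log.append((packet, attempt, "lost" if lost else "acked"))
            if not lost:
                break  # acknowledgement received, move on to the next packet
        else:
            raise TimeoutError(f"gave up on {packet!r} after {max_tries} tries")
    return log


# The first attempt at sending "b" is lost, so "b" is sent a second time.
for entry in send_reliably(["a", "b"], drop_plan={(1, 1)}):
    print(entry)
# ('a', 1, 'acked')
# ('b', 1, 'lost')
# ('b', 2, 'acked')
```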

To ensure reliable transmission, TCP uses a three-way handshake. That is to say, a reliable connection must be established with the other party before any data is formally sent or received. Since the process is somewhat complicated, here is a vivid way to describe it:

Host A: I want to send the data to you, is that ok?

Host B: Sure. When will you send it?

Host A: I’ll send it right away, you catch it!

After these three exchanges, host A sends the actual data to host B. UDP, by contrast, is a connectionless protocol. It does not establish a connection with the other party, but simply sends the data packet. TCP can therefore make sure that lost packets are detected and sent again, but good things come at a price. Compared with UDP, TCP is more complex to implement, consumes more connection resources, and transmits more slowly.
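In real TCP, the three exchanges between host A and host B are known as SYN, SYN+ACK, and ACK, and each side acknowledges the other's starting sequence number plus one. The sketch below only models the messages as dictionaries. The starting numbers 100 and 300 are arbitrary, and the function name three_way_handshake is made up.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three messages exchanged before any data is sent."""
    # 1. Host A asks to open a connection and announces its sequence number.
    syn = {"from": "A", "flags": "SYN", "seq": client_isn}
    # 2. Host B agrees, acknowledges A's number, and announces its own.
    syn_ack = {"from": "B", "flags": "SYN+ACK", "seq": server_isn,
               "ack": syn["seq"] + 1}
    # 3. Host A acknowledges B's number; the connection is now established.
    ack = {"from": "A", "flags": "ACK", "seq": syn["seq"] + 1,
           "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]


for message in three_way_handshake(client_isn=100, server_isn=300):
    print(message)
```

UDP has no counterpart to this exchange, which is one reason it is faster but less reliable.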

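TCP packets, like UDP packets, consist of a header and data. One notable difference is that the TCP header has no field that limits the total length, so in theory a TCP data stream can be arbitrarily long. In practice, TCP splits the stream into segments that are small enough for each one to fit into a single IP packet.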

To sum up, the main work of the transport layer is to define ports, identify applications, and provide port-to-port communication. In addition, the TCP protocol ensures that data is transmitted reliably.

4. Application layer

In theory, with the support of the three layers above, data can already be transferred from an application on one host to an application on another host. At this point, however, the data is just a stream of bytes, which programs cannot easily interpret or work with. Therefore, the application layer defines various protocols, such as HTTP, FTP, and SMTP, to standardize the data format. HTTP is a common application layer protocol, mainly used for data communication in browser/server (B/S) architectures. The packet format is as follows:

In the Request Headers, Accept indicates the format of the data that the client expects to receive, and Content-Type indicates the format of the data that the client sends. In the Response Headers, Content-Type indicates the format of the data in the server's response. This format is generally the same as the one requested by Accept in the Request Headers.

With this specification, when the server receives the request, it can correctly parse the data sent by the client. When the request is processed, it can return the data in the format required by the client. After the client receives the result, it can parse the data in the format returned by the server.
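To see these headers at work, here is a minimal sketch that reads them out of a raw HTTP message. The request and response texts are invented examples, the parser ignores many details of real HTTP (for instance, Accept may list several formats), and the function name parse_headers is hypothetical.

```python
def parse_headers(raw: str) -> dict[str, str]:
    """Return the headers of a raw HTTP message, with lowercase names."""
    headers = {}
    for line in raw.split("\r\n")[1:]:  # skip the request or status line
        if not line:
            break                       # an empty line ends the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


request = (
    "POST /api/users HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Accept: application/json\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "name=alice"
)
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"ok": true}'
)

req, res = parse_headers(request), parse_headers(response)
print(req["content-type"])                   # application/x-www-form-urlencoded
print(req["accept"] == res["content-type"])  # True
```

The server uses the request's Content-Type to decode name=alice, and the client uses the response's Content-Type to decode the JSON body.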

So the main job of the application layer is to define the data format and interpret the data according to the corresponding format.

5. Summary

First of all, let’s sort out the responsibilities of each layer of the model:

  • Link layer: Groups 0s and 1s into data frames, identifies the physical address of the host, and transmits the data.
  • Network layer: Defines IP addresses, identifies the network location of hosts, resolves IP addresses to MAC addresses, and routes and forwards packets between networks.
  • Transport layer: Defines ports, identifies applications on the host, and delivers packets to the corresponding applications.
  • Application layer: Defines data formats and interprets data according to the corresponding formats.

The responsibilities of each layer can then be strung together in one short walkthrough:

When you enter a web address and press Enter, the application layer protocol first defines the format of the request packet. Then the transport layer protocol adds the port numbers of both sides, identifying the applications that are communicating. Next, the network layer protocol adds the IP addresses of both sides, identifying their network locations. Finally, the link layer protocol adds the MAC addresses of both sides, identifying their physical locations, and groups the data into data frames. The data frames are broadcast over the transmission medium to the other host. If the two hosts are on different network segments, the packet is first forwarded to the gateway and then from router to router until it finally reaches the target host. After receiving the frames, the target machine reassembles the data with the corresponding protocol and then unwraps it layer by layer. At the end, the application layer protocol parses the data and hands it over to the server for processing.