Preface


Computer networking is a foundational course, but what a teacher covers in class is only a starting point meant to draw out further study. For those who have to learn it on their own, it is undoubtedly harder. There's a long way to go.

Computer networking can be a dry subject and this article is fairly long, so I suggest reading it patiently; I hope everyone takes something away from it. Let's start by laying out the overall structure of the article.

Preliminary knowledge


Xie Xiren's "Computer Network" is the textbook chosen by many universities. Its first chapter is an introduction that roughly covers the development of computer networks, the kind of common knowledge everyone should have. Here I will summarize it as preparatory knowledge for learning networking.

A Brief History of the Internet

  • Stage 1: the 1950s, basic research on data communication technology and network theory
  • Stage 2: the 1960s, ARPANET and packet switching technology
  • Stage 3: the mid-1970s, standardization of network architectures and protocols
  • Stage 4: the 1990s onward, development of the Internet, high-speed networks, wireless networks, mobile Internet, and network security technology

Development of the Internet

The development of computer networks has mainly gone through the following seven stages.

  • Batch processing: To let more people use computers, batch processing systems emerged. Batch processing means loading programs and data onto cards or magnetic tape in advance and having the computer read and process them in a fixed order.

  • Time sharing: After batch processing came time sharing, in which multiple terminals are connected to one computer so that multiple users can use it at the same time.

  • Computer communication: In time-sharing systems, terminals are connected to a computer, but that does not mean computers are connected to each other. As computers became more common, more attention was paid to making data exchange between computers convenient. At first, exchanging data between two hosts was quite tedious, so computer-to-computer communication over communication lines came into being. People could then read data from another computer in real time, greatly reducing transfer time.

  • The emergence of computer networks: In the 1970s, people began to experiment with computer networks based on packet switching and to study how computers from different manufacturers could communicate. In the 1980s, networks that could connect many computers appeared, and network communication technology entered a period of rapid development.

  • The popularity of the Internet: In the 1990s, as prices fell, performance improved, and applications multiplied, computers became more and more widespread. Facing this trend, manufacturers not only had to make sure their products could interconnect, but also had to keep their network technology compatible with Internet technology (TCP/IP).

  • Internet age: With the popularity of the Internet, people have become more and more inseparable from it. Life, study, and work all depend on networked information, and the Internet of Everything has already arrived.

  • Network security: The Internet has brought disruptive change to the world and great convenience to daily life. It presents a highly convenient, modern information environment and, like water, electricity, and gas, has become an indispensable national resource; with everything connected, network security has become a vital part of national security. In the early days of the Internet, people focused purely on connectivity, on establishing connections without restriction. Now, people are no longer satisfied with a "simple connection" but pursue a "secure connection".

Network performance indicators

  • Bit and rate: A bit is the unit of data in a computer and the unit of information in information theory; the English word bit comes from "binary digit". Rate, in networking, refers to the speed at which a host connected to a computer network transmits data over a digital channel; it is also called the data rate or bit rate.
  • Bandwidth: In a computer network, the bandwidth is used to describe the ability of the network’s communication lines to transmit data, so the network bandwidth represents the “highest data rate” that can be passed from one point in the network to another in a unit of time. Bandwidth in this sense is measured in bits per second.
  • Throughput: Throughput indicates the amount of data that passes through a network (or channel or interface) in a unit of time. It indicates the capacity of data transmission on the current network.
  • Delay:
    • 1. Transmission delay: the time a host or router needs to send a data frame, measured from when the first bit of the frame is sent until the last bit has been sent.
    • 2. Propagation delay: the time it takes an electromagnetic wave to travel a certain distance along the channel.
  • Delay-bandwidth product: The delay-bandwidth product represents the number of bits that can be held by a link. Therefore, the delay-bandwidth product of a link is also called the link length in bits.
  • Round-trip time (RTT): the total time from when the sender sends the data until the sender receives the acknowledgement from the receiver (assuming the receiver sends the acknowledgement immediately after receiving the data). The round-trip time generally includes the various delays the packet experiences in the network.
  • Utilization: Utilization can be divided into channel utilization and network utilization. Channel utilization is the percentage of time a channel is actually in use (has data passing through it); a completely idle channel has zero utilization. Network utilization is the weighted average of the channel utilizations across the whole network. Higher channel utilization is not necessarily better: according to queuing theory, as a channel's utilization increases, the delay it causes grows rapidly, so an excessively high channel or network utilization produces very large delays, as the relation sketched below suggests.
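
A commonly quoted simplification from queuing theory (the single-queue approximation used in many networking textbooks) relates the current delay $D$ to the delay of an idle network $D_0$ and the utilization $U$:

$$D = \frac{D_0}{1 - U}, \qquad 0 \le U < 1$$

As $U$ approaches 1, $D$ grows without bound, which is why some headroom is always left on a channel.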

A little common sense

Classification of computer networks

In terms of geographical coverage, computer networks can be divided into three categories:

  • Local Area Network (LAN): the common office, dormitory, or Internet-cafe network, covering anywhere from a few meters up to about 10 km. It is characterized by a small coverage area, few users, easy configuration, and a high connection rate.

  • Metropolitan Area Network (MAN): used to connect the LANs of enterprises, institutions, or schools within a city or region so that resources can be shared across that region.

  • Wide Area Network (WAN): also called a long-distance network, it connects LANs or MANs in different cities. Because of the long distances, signal attenuation is severe, so this kind of network usually requires leased lines and special protocols to form its structure, and the connection rate per user is generally lower.

The topology of a computer network

  • Bus structure:
    • Advantages: low cost, easy to expand, high utilization of the line;
    • Disadvantages: Low reliability, difficult maintenance, low transmission efficiency.
  • Ring structure:
    • Advantages: token-based control, no contention for the line, strong real-time performance, simple transmission control;
    • Disadvantages: Difficult maintenance and low reliability

  • Star structure
    • Advantages: High reliability, convenient management, easy to expand, high transmission efficiency.
    • Disadvantages: Low line utilization; the central node must be highly reliable and redundant.

What layered architectures do computer networks have?

There are three different hierarchical models for computer networks:

  • OSI seven-layer model

  • Five layer structure model

  • TCP/IP hierarchical structure model

TCP/IP is the protocol suite the Internet follows today. It is not simply TCP plus IP; rather, the protocols at each layer together make up what we usually call the TCP/IP protocol stack. For easier understanding, the rest of this article is organized around the five-layer model.

The physical layer


Here I would like to offer a suggestion first: when learning computer networking, don't study each protocol in isolation; instead, understand why it exists and what role it plays in the network as a whole.

Digital and analog signals

The physical layer's job is to shield the upper layers from the differences among transmission media and communication methods. As we all know, there are two kinds of signals: analog and digital. So what is an analog signal, and what is a digital signal?

To put it plainly, an analog signal is a physical quantity that changes continuously: its amplitude is continuous (meaning it can take infinitely many values within some range), and its waveform is also continuous in time, so it is a continuous signal. If we sample a continuous signal we get a sampled signal, which is discrete (speaking of signals and systems, it seems that retaking that exam left its mark on me). A digital signal is different: it is discrete in the time domain and has two distinct states, represented by "0" and "1", much like a light switch with two positions.

Of course, digital and analog signals can be converted into each other. An analog signal is usually quantized into a digital signal using PCM (pulse code modulation), which maps different ranges of the analog signal to different binary values. Conversely, digital signals are generally converted to analog signals by modulation (for example, phase modulation).

The transport medium of the physical layer

The physical layer can use many different transmission media (the hub is a typical device that works at this layer). Broadly, the media fall into the following two categories:

  • Guided transmission media: these come in several varieties, such as coaxial cable, optical fiber, and twisted pair, the latter of which can be further divided into shielded and unshielded.

  • Non-guided transmission medium: Non-guided transmission medium refers to the transmission of radio waves in space, using different frequency bands to transmit different signals.

Channel

Speaking of channels, the preliminary section above mentioned channel utilization but did not introduce channels in any detail, so let's take a closer look now. By transmission medium, channels can be divided into three categories:

  • Wired channel: Wired channel takes wire as the transmission medium, and signals are transmitted along the wire. The signal energy is concentrated near the wire, so the transmission efficiency is high, but the deployment is not flexible enough. Transmission media used in this type of channel include overhead wires, telephone wires, twisted pair wires, symmetric cables, coaxial cables, etc., as well as optical fibers that transmit modulated optical pulses.
  • Wireless channel: the wireless channel mainly includes the radio channel with radiating radio waves as the transmission mode and the underwater acoustic channel with the propagation of sound waves. Radio signals are radiated across free space from the antenna of the transmitter. Radio waves in different frequency bands travel in different ways.
  • Storage channel: In a sense, data storage media such as tape, optical disc, disk can also be considered a communication channel. The process of writing the data into the storage medium is equivalent to the process of transmitting the signal from the transmitter to the channel, and the process of reading the data from the storage medium is equivalent to the process of receiving the signal from the channel by the receiver.

A channel is a pathway for transmitting information. Channel capacity describes the maximum rate at which a channel can carry information without error, and it can be used to measure the quality of the channel.

The channel has another important parameter: the signal-to-noise ratio. The larger the signal-to-noise ratio, the larger the channel capacity. The famous Shannon formula is

$$C = B \log_2\left(1 + \frac{S}{N}\right)$$

where C is the channel capacity, B is the bandwidth, and S/N is the signal-to-noise ratio.
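
As a quick worked example (my own sketch in Python, with purely illustrative numbers), a classic telephone channel of about 3.1 kHz bandwidth and a 30 dB signal-to-noise ratio gives a capacity of roughly 31 kbit/s:

```python
import math

def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Channel capacity C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

snr_db = 30
snr_linear = 10 ** (snr_db / 10)            # convert decibels to a linear ratio
print(shannon_capacity(3100, snr_linear))   # ~30,898 bit/s, i.e. about 31 kbit/s
```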

Channel multiplexing

We know that when there is no data to transmit, a channel sits idle; but when many data requests arrive at once, such as during the 618 shopping festival, transmission slows down. So what is channel multiplexing? Multiplexing simply means letting many users share one channel. Channel multiplexing can be divided into the following kinds:

  • Time division multiplexing (TDM): the channel's use is divided into time slots. With TDM, all users occupy the same bandwidth but at different times. TDM can waste line capacity when a user's slot goes unused.

  • Frequency division multiplexing (FDM): signals are separated onto different frequencies. With FDM, all users transmit at the same time but occupy different portions of the bandwidth.

  • Statistical TDM: Statistical TDM systems can also be called asynchronous TDM systems. It has a buffer-like mechanism that forwards data when it reaches a certain amount, which greatly improves channel utilization.

Data link layer


Ethernet frame

The data link layer receives IP datagrams from the network layer and encapsulates them so that they can be transmitted over the link. An IP datagram encapsulated this way is called an Ethernet frame, or MAC frame. A MAC frame is made up of the following important parts (a minimal construction sketch follows the list):

  • Destination MAC Address: The destination MAC address of a MAC frame is 6 bytes long and indicates the address of the destination host.

  • Source MAC address: like the destination MAC address, it is 6 bytes long; it carries the address of the sending host.

  • Type: occupies two bytes and records the protocol used by the upper layer; 0x0800 indicates the IP protocol.

  • Data part: The data part is naturally IP datagrams from the upper layer.

  • FCS: the frame check sequence is 4 bytes long and is used for error detection. If a MAC frame is found to be in error, it is discarded rather than delivered to the destination host.
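
Here is a toy sketch (my own, not from the original article) of assembling such a frame in Python. It is simplified: real Ethernet computes the FCS with a specific bit ordering and pads short payloads to a minimum length, which this sketch ignores.

```python
import struct
import zlib

def build_mac_frame(dst_mac: bytes, src_mac: bytes, payload: bytes,
                    ether_type: int = 0x0800) -> bytes:
    """Assemble a simplified MAC frame: dst | src | type | data | FCS."""
    header = dst_mac + src_mac + struct.pack("!H", ether_type)
    fcs = struct.pack("!I", zlib.crc32(header + payload))   # CRC-32 over header + data
    return header + payload + fcs

frame = build_mac_frame(bytes.fromhex("ffffffffffff"),      # broadcast destination
                        bytes.fromhex("001122334455"),      # made-up source MAC
                        b"hello, link layer")
print(len(frame), frame.hex())
```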

Error detection

Why error detection?

No real communication link is ideal. Bits can be corrupted in transmission: a 1 can turn into a 0 and a 0 into a 1; this is called a bit error. Over a period of time, the ratio of erroneous bits to the total number of transmitted bits is called the Bit Error Rate (BER). The BER is closely related to the signal-to-noise ratio (SNR), and in practice it cannot be reduced to zero. Therefore, to ensure reliable data transmission, computer networks must adopt various error-detection measures.

Errors in MAC frames can occur during transmission; they are unavoidable. In the section on Ethernet frames above, we mentioned the frame check sequence FCS. Using the FCS, we can tell whether a MAC frame was corrupted in transit.

We’ll talk about error detection later when we get to the transport layer, so what’s the difference? To sum up, it can be summed up in one sentence:

  • The goal of error detection at the data link layer is "no bit errors".
  • The goal of error detection at the transport layer is "error-free transmission": it also makes up for lost frames, duplicated frames, and frames arriving out of order.

There are two main error-detection methods: the parity check code and the cyclic redundancy check (CRC). Parity is very simple and is not the focus of this article; the following introduces CRC.

Cyclic redundancy check (CRC) is a method that generates a fixed-length check code from the data being transmitted or stored; it is mainly used to detect errors that may occur after data is transmitted or saved. The check bits are computed before transmission or storage and appended to the data, and the receiver recomputes them to determine whether the data has changed.

Using CRC, we compute the redundancy check code carried in the FCS, which sits at the end of the MAC frame. With the FCS, we can tell whether a MAC frame was corrupted in transit.
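
Below is a short sketch (mine, not the article's) of CRC's mod-2 division on bit strings, using the classic textbook example of data 101110 with generator 1001, which yields the check bits 011.

```python
def crc_remainder(data_bits: str, generator: str) -> str:
    """Compute CRC check bits by binary (mod-2) division of bit strings."""
    k = len(generator) - 1                       # number of check bits
    padded = list(data_bits + "0" * k)           # append k zero bits to the data
    for i in range(len(data_bits)):
        if padded[i] == "1":                     # XOR in the generator wherever the leading bit is 1
            for j in range(len(generator)):
                padded[i + j] = str(int(padded[i + j]) ^ int(generator[j]))
    return "".join(padded[-k:])                  # the remainder becomes the FCS

print(crc_remainder("101110", "1001"))           # -> "011"
```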

The adapter

Speaking of adapters, you can think of the adapters we use in everyday life. For example, charging a phone requires a power adapter, which is really just a converter, a carrier that transfers energy in a usable form. The adapters in a computer play a similar role. Consider this picture:

As we all know, data is transmitted in serial through external media, while computers process internal instructions in parallel. How do you convert data from serial to parallel transmission? This is where the adapter comes in. The adapter acts as a bridge through which the data transfer mode can be easily converted.

CAM table

We all know about switches: they are multi-port bridges that use MAC addresses to forward data at the data link layer. Internally, a switch actually keeps a table, called a CAM table, which records hosts' MAC addresses and the interfaces they are reached through. Look at the following diagram:

There are three hosts A, B, and C connected to the switch. Initially, no information is stored in the CAM.

One day, host A (the source MAC address) wants to send a message to host B (the destination MAC address). The switch checks whether host A's information is already stored in its CAM table; seeing that it is not, the switch writes host A's entry into the table. The switch's CAM table now looks like this:

Now the switch's CAM table holds host A's information, but host A wants to reach host B. What next? First, the switch checks whether B's entry exists in its CAM table. If it does, the frame is forwarded directly to B. If not, the switch broadcasts host A's message to every host connected to it. Host C also receives the message, but it sees that the destination address is not its own and discards it. Host B receives the message, checks the destination address, finds that the message is for itself, and accepts it. The switch then updates its CAM table as follows:

In this way, the CAM table stores entries for both host A and host B. The next time host A wants to send a message to host B, the switch no longer needs to broadcast.
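
The learning behavior described above can be captured in a few lines. The following toy model (my own sketch, not from the article) uses a Python dictionary as the CAM table:

```python
class LearningSwitch:
    """Toy model of how a switch fills its CAM table (MAC address -> port)."""

    def __init__(self):
        self.cam_table = {}

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> str:
        self.cam_table[src_mac] = in_port            # learn: remember where the source lives
        if dst_mac in self.cam_table:                # known destination: forward out that port
            return f"forward out port {self.cam_table[dst_mac]}"
        return "flood to all ports except the incoming one"

sw = LearningSwitch()
print(sw.receive(1, "AA", "BB"))   # B unknown -> flood, but A is now learned
print(sw.receive(2, "BB", "AA"))   # A known -> forward out port 1
print(sw.cam_table)                # {'AA': 1, 'BB': 2}
```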

CSMA/CD protocol

CSMA/CD sees fairly little use these days; it applies in the following two situations:

  • Wired (not wireless) networks
  • 10M/100M half-duplex Ethernet

A CSMA/CD network has the following three features:

  • The network is a bus structure: all computers are connected to the same bus, and at any given moment only one computer is allowed to send (or receive), i.e., half-duplex communication is used.
  • Carrier sense: Before and during transmission, it is necessary to continuously monitor the channel. It can only send messages when the channel is idle.
  • Collision detection: the host keeps monitoring the channel while sending. If two hosts send at the same time, a collision is detected and transmission stops immediately; each host then waits a random period of time before trying again. This is the backoff algorithm.

As a supplement, here are the carrier-sense persistence strategies (a sketch of the collision backoff itself follows this list):

  • Non-persistent CSMA: if the line is busy, wait a while and then listen again; if idle, send immediately. Fewer collisions, but lower channel utilization.
  • 1-persistent CSMA: if the line is busy, keep listening; if idle, send immediately. Channel utilization rises, but so do collisions.
  • p-persistent CSMA: if the line is busy, keep listening; if idle, send with probability p and keep listening with probability 1-p (p is a chosen probability).
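
For the backoff itself, classic Ethernet uses truncated binary exponential backoff. Here is a small sketch of it (mine; the 51.2 µs slot time is the classic 10 Mb/s value):

```python
import random

def backoff_delay(collision_count: int, slot_time_us: float = 51.2) -> float:
    """Truncated binary exponential backoff after the i-th collision."""
    if collision_count > 16:
        raise RuntimeError("too many collisions, give up on this frame")
    k = min(collision_count, 10)            # the exponent is capped at 10
    r = random.randint(0, 2 ** k - 1)       # pick r uniformly from 0 .. 2^k - 1
    return r * slot_time_us                 # wait r slot times before retrying

for attempt in range(1, 5):
    print(f"collision {attempt}: wait {backoff_delay(attempt):.1f} us")
```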

The network layer


TCP/IP protocol

An overview of IP

The IP protocol works with IP addresses, so what is an IP address?

Here’s what Wikipedia says:

An IP address (Internet Protocol address) is assigned to a device when it connects to the network and serves as its identifier. IP addresses allow devices to communicate with each other: without them, there would be no way to know which device is the sender and which is the receiver. An IP address has two main functions: identifying a device or network, and location addressing.

The above text actually explains two points, summed up as follows:

  • An IP address is used to mark the address of a host. Without an IP address, a host cannot be identified. (Mark host)
  • Because hosts are uniquely tagged, they can be used to find hosts on the network. (address)

Now think about the MAC address, which is a status symbol for a host. The MAC address of a host is unique and cannot be changed. Of course, you can change the MAC address of a host using software, but you must ensure that two hosts on the same LAN cannot have the same MAC address.

So why do we need IP addresses when we already have MAC addresses? Or conversely, why do we still need MAC addresses when we have IP addresses?

This is actually a classic question, there are many answers online, here are two recommended articles:

  • Why use MAC addresses when you have IP addresses?
  • Why do you have an IP address when you have a MAC address?

After reading the above two articles, I summarize as follows:

  • Historical reason: Ethernet predates the Internet, and MAC addresses were in use before IP addresses. The two protocols are used together to ensure that existing protocols are not affected
  • Layered implementation: After the network protocol is layered, the implementation of the data link layer does not need to consider the forwarding between data, and the implementation of the network layer does not need to consider the impact of the data link layer.
  • Division of labor: IP addresses change as hosts access different networks, but MAC addresses do not change. In this way, we can use IP addresses for addressing and MAC addresses for data delivery when the datagram is on the same network as the destination host.

The IP datagram

An IP datagram looks like this:

A few important things need to be said:

  • Version: Indicates the VERSION of the IP protocol used by the IP datagram, which occupies four binary digits. At present, the IP protocol with version 4 in the TCP/IP family is mainly used on the Internet.
  • Header length: Takes up 4 bits. This field indicates the length of the entire header (including options) in 32-bit binary numbers. This field allows the receiver to calculate where the header ends and where the data is read. Normal IP datagrams (without any options) have a value of 5 (that is, a length of 20 bytes).
  • Type of service (TOS): specifies how the datagram should be handled; it occupies eight binary bits.
  • TTL (Time To Live) : Takes up eight binary bits and specifies the maximum Time that a datagram can be transmitted over the network. In practice, the time to live field is set to the maximum number of routers that a datagram can pass through. The initial value of TTL is set by the source host (typically 32, 64, 128, or 256) and is reduced by one once it passes through a router that processes it. When the value of this field is 0, the datagram is discarded and an ICMP packet is sent to notify the source host. This prevents the datagram from being endlessly transmitted when entering a loop.
  • Upper-layer protocol id: Occupies 8 binary bits. The IP protocol can carry various upper-layer protocols. According to the protocol ID, the target end can send the received IP data to the upper-layer protocol, such as TCP or UDP, that processes the packet.

For a more detailed article on IP datagram, see this article: IP Datagram Format In Detail
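
To make the header fields above concrete, here is a small parsing sketch of my own (the sample header is hand-made, with a zero checksum, purely for illustration):

```python
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header."""
    version_ihl, tos, total_len, ident, flags_frag, ttl, proto, checksum, src, dst = \
        struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": version_ihl >> 4,
        "header_len_bytes": (version_ihl & 0x0F) * 4,   # header length is counted in 32-bit words
        "ttl": ttl,
        "protocol": proto,                               # 6 = TCP, 17 = UDP, 1 = ICMP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Version 4, header length 5 words, TTL 64, protocol 6 (TCP), 10.0.0.1 -> 10.0.0.2
sample = bytes.fromhex("45000028abcd40004006") + b"\x00\x00" + \
         bytes([10, 0, 0, 1]) + bytes([10, 0, 0, 2])
print(parse_ipv4_header(sample))
```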

Subnet mask and IP address

A common IP address is made up of a network number and a host number. So what is the network number? It identifies the network a computer currently sits on, under which there can be many hosts. How do we compute the network number? This is where subnet masks come in.

Usually a computer's IP address and subnet mask come as a pair, and comparing them reveals the network number and the host number. For convenience of representation, a subnet mask is a run of consecutive ones followed by consecutive zeros; ones and zeros never alternate.

Look at the following example.

Now that you know the IP address and subnet mask of host A, convert them to binary form. The network number of the IP address corresponds to the 1 part of the subnet mask in binary, and the host number corresponds to the 0 part of the subnet mask. The picture below is clear:
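
In practice you rarely do the binary comparison by hand. A quick sketch with Python's standard ipaddress module (the address and mask here are made up for illustration):

```python
import ipaddress

host = ipaddress.IPv4Interface("192.168.1.56/255.255.255.0")   # hypothetical host + mask

print(host.network)                             # 192.168.1.0/24  -> the network
print(host.network.network_address)             # 192.168.1.0     -> the network number
print(int(host) & int(host.network.hostmask))   # 56              -> the host number part
```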

The ICMP protocol

As we know, IP is an unreliable protocol; reliable transport over the network is TCP's job, which we will get to in the transport layer. So what does the network layer do when a datagram fails to arrive? This is where ICMP comes in. ICMP stands for Internet Control Message Protocol.

Function: to help IP datagrams be forwarded more effectively and improve the chance of successful delivery. ICMP messages are carried as the data part of IP datagrams and fall into two categories: ICMP error-report messages and ICMP query messages. Error messages merely report an error; handling it is the responsibility of higher-level protocols. Error messages are always sent back to the original source (because the only usable addresses in the datagram are the source and destination IP addresses), and query messages always come in request/reply pairs.

ARP protocol

The IP address is used for addressing, while the MAC address is used to actually deliver the frame once the datagram is on the same network as its destination. Now we have a problem: host A sends a message to host B, and the message is forwarded through a series of routers until it reaches the network holding host B's IP address. But as we know, transmission at the link layer needs MAC addresses; knowing only B's IP address is not enough to communicate. Take a look at this:

This is where ARP comes in handy. The full name of ARP is the Address Resolution Protocol. Its basic function is to query the MAC Address of the target device through the IP Address of the target device to ensure smooth communication. It is an essential Protocol in the IPv4 network layer.

Just as switches work at the data link layer, routers work at the network layer. Switches have CAM tables and routers have routing tables.

Now, in order for the router to send a message to host B, it needs to know host B’s MAC address to communicate. At this time, the router sends an ARP request, which is sent in the form of a broadcast. Each host connected to the router receives this message. However, only host B can check that its IP address meets the requirements. Host B then sends an ARP response to the router, informing the router of its MAC address. As shown below:

Each time the router completes an ARP exchange, it adds an entry to its ARP cache recording which MAC address corresponds to which IP address. That way, it does not need to broadcast the next time it sends to that host. Of course, just as entries in a switch's CAM table have a lifetime, entries in the ARP cache expire too; if they stayed forever, the router would waste a lot of storage caching entries that are no longer valid.

Interior Gateway Protocols

Two common interior gateway protocols on the Internet are RIP and OSPF. They are described below.

First, introduce RIP:

  • Routing Information Protocol (RIP) was the first widely used interior gateway protocol (IGP). It is a distributed routing protocol based on distance vectors and is an Internet standard. RIP's biggest advantages are simple implementation and low overhead.
  • Basic algorithm: the distance-vector (V-D) algorithm. The idea is that each gateway periodically broadcasts routing updates whose main content is a list of (V, D) pairs. In a (V, D) pair, V stands for "vector" and identifies a destination (network or host) the gateway can reach, and D stands for "distance", the distance from this gateway to destination V, measured in hop count. When the other gateways receive a (V, D) message, they refresh their routing tables according to the shortest-path principle.
  • RIP is only suitable for small networks (15 hops is the limit); if the network is too large, it takes a long time for routing information to propagate to every router when a failure occurs. A toy distance-vector update step is sketched below.
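
The following sketch (my own; the tables and names are hypothetical) shows one RIP-style update, where a router merges a neighbor's advertisement into its own table and caps distances at 16 (unreachable):

```python
def rip_update(my_table: dict, neighbor: str, neighbor_table: dict,
               cost_to_neighbor: int = 1) -> dict:
    """One distance-vector update: tables map destination -> (hops, next_hop)."""
    for dest, (dist, _) in neighbor_table.items():
        new_dist = min(dist + cost_to_neighbor, 16)        # 16 means unreachable in RIP
        known = my_table.get(dest)
        # Adopt the route if it is new, shorter, or previously learned via this neighbor.
        if known is None or new_dist < known[0] or known[1] == neighbor:
            my_table[dest] = (new_dist, neighbor)
    return my_table

table = {"net1": (1, "direct")}
advert = {"net2": (1, "direct"), "net3": (2, "R2")}
print(rip_update(table, "R2", advert))
# {'net1': (1, 'direct'), 'net2': (2, 'R2'), 'net3': (3, 'R2')}
```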

What is OSPF?

  • Basic definition: Open Shortest Path First (OSPF) is an Interior Gateway Protocol (IGP) used to determine routes within a single autonomous system (AS).
  • Basic algorithm: Dijkstra's shortest-path-first algorithm. Neighbor relationships are established by exchanging HELLO packets, and a designated router (DR) is elected.

Reference article: Computer network principles of RIP and OSPF comparison

NAT protocol

NAT technology is very simple, so what does NAT do?

Network Address Translation (NAT) was proposed in 1994. The NAT method can be used when a host on a private network has been assigned a local IP address (that is, a private address used only in the private network) but now wants to communicate with the host on the Internet (without encryption).

This approach requires NAT software to be installed on the router whose private network is connected to the Internet. A router with NAT software is called a NAT router and has at least one valid external global IP address. In this way, all hosts using local addresses must translate their local addresses into global IP addresses on the NAT router when communicating with the outside world, so that they can connect to the Internet. In addition, using a small number of public IP addresses to represent a larger number of private IP addresses will help slow down the exhaustion of the available IP address space.

Simply put, NAT is a technique that lets a LAN communicate with the Internet. There are three types of NAT (a toy NAPT translation table is sketched after this section):

  • Static NAT: Static NAT is the simplest to set up and easiest to implement. Each host on the internal network is permanently mapped to a valid address on the external network. Static NAT is implemented when an internal host must be accessed as a fixed external address.

  • Dynamic ADDRESS NAT(Pooled NAT) : In dynamic NAT mode, a series of valid IP addresses (address pools) are defined on an external network and mapped to the internal network through dynamic allocation. The working process of dynamic NAT is as follows: When an internal host needs to access the Internet, an available IP address is selected from the public IP address pool and assigned to the host. After the communication is complete, the obtained public IP address is released back to the address pool. When an external public IP address is assigned to one internal host for communication, it cannot be assigned to another internal host.

  • Network Address Port Translation (NAPT): NAPT maps multiple internal addresses onto the ports of a single valid public IP address, using different protocol port numbers to distinguish the internal hosts. In other words, it translates between <internal address + internal port> and <external address + external port>.

NAT: Network address translation
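
Here is a toy NAPT table of my own (the addresses are documentation/private examples), showing how outbound flows get a public port and how replies are mapped back:

```python
import itertools

class Napt:
    """Toy NAPT table: (private ip, private port) <-> public port on one public IP."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.ports = itertools.count(40000)      # hand out external ports from 40000 upward
        self.out_map = {}                        # (priv_ip, priv_port) -> public_port
        self.in_map = {}                         # public_port -> (priv_ip, priv_port)

    def outbound(self, priv_ip: str, priv_port: int):
        key = (priv_ip, priv_port)
        if key not in self.out_map:
            port = next(self.ports)
            self.out_map[key] = port
            self.in_map[port] = key
        return self.public_ip, self.out_map[key]

    def inbound(self, public_port: int):
        return self.in_map.get(public_port)      # where to deliver the reply internally

nat = Napt("203.0.113.7")
print(nat.outbound("192.168.0.10", 51000))   # ('203.0.113.7', 40000)
print(nat.outbound("192.168.0.11", 51000))   # ('203.0.113.7', 40001)
print(nat.inbound(40001))                    # ('192.168.0.11', 51000)
```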

IPV6

The IP addresses we talked about earlier are actually IPv4 addresses, so why do we need IPv6 when we already have IPv4? Back in the last century, people foresaw the day IPv4 addresses would run out, and development of IPv6 was started to solve that problem.

IPv6 (IP version 6) is the standardized Internet protocol designed to solve IPv4 address exhaustion. An IPv4 address consists of four 8-bit bytes, i.e., 32 bits. An IPv6 address is four times as long, 128 bits, usually written as eight 16-bit groups. As you can see, the supply of IPv6 addresses is practically inexhaustible, so why not simply switch from IPv4 to IPv6 altogether?

Switching from IPv4 to IPv6 is time-consuming and requires reconfiguring the IP addresses of every host and router on the network. Now that the Internet is so widespread, replacing every address is harder still.

Existing networks contain both IPv4 and IPv6, so how do they interoperate? There are two techniques, dual protocol stack and tunneling, described below:

  • Dual protocol stack: hosts and routers run both IPv4 and IPv6. When a packet's header has to be converted between the two, some information in the IPv6 header is inevitably lost.
  • Tunneling: what is tunneling? You can take it fairly literally. In plain terms, tunneling means the data is encapsulated inside another protocol for part of its journey and decapsulated at the far end. As shown in the figure, when data from an IPv6 network has to cross an IPv4 network, the IPv6 packets are encapsulated inside IPv4 packets.

The transport layer


Stop-and-wait protocol

What is the stop-and-wait protocol? Look at the picture below and you will probably get the idea.

The stop-and-wait protocol covers the following three situations:

  • Error-free case: host A must receive a reply from host B before it may send the next message to host B.
  • Error case: if host A never receives a reply from host B, there must be a mechanism for host A to resend the message. This involves choosing a retransmission timeout, which should be no shorter than the RTT (the time for host A's message to reach host B plus the time for host B's reply to reach host A).
  • Acknowledgement lost and acknowledgement delayed: for these two cases, take a look at the chart below.

Data may be lost or arrive late during transmission. Lost data is retransmitted, while a late (duplicate) acknowledgement is simply discarded. Speaking of stop-and-wait, I have to mention the continuous ARQ protocol as well. What is continuous ARQ?

With continuous ARQ, the sender does not have to wait for the acknowledgement of the previous packet before sending the next; it can send several packets in a row, which raises channel utilization and lets more data be transmitted in a given time. A toy stop-and-wait sender with timeouts is sketched below.
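
The sketch below (my own; the "channel" is simulated with a random loss rate rather than a real network) shows the stop-and-wait retransmission loop:

```python
import random

def ack_received(frame: str, loss_rate: float = 0.3) -> bool:
    """Pretend channel: returns False when the frame or its ACK is 'lost'."""
    return random.random() > loss_rate

def stop_and_wait(frames, max_retries: int = 5):
    for seq, frame in enumerate(frames):
        for attempt in range(1, max_retries + 1):
            print(f"send frame {seq} (attempt {attempt})")
            if ack_received(frame):
                print(f"ACK {seq} received")
                break                        # only now move on to the next frame
            print(f"timeout for frame {seq}, retransmitting")
        else:
            raise RuntimeError(f"frame {seq} failed after {max_retries} tries")

stop_and_wait(["hello", "world"])
```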

UDP

Compared with TCP, UDP is relatively simple; TCP is the real focus of the transport layer. Let's start with a brief look at UDP.

UDP has the following features (a minimal socket example follows the list):

  • Connectionless, unreliable transport
  • Message (datagram) oriented
  • No congestion control
  • Low header overhead
  • Supports one-to-one, one-to-many, many-to-one, and many-to-many communication
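
A minimal sketch of UDP's "send and hope" style in Python (both ends squeezed into one script over the loopback address for brevity; port 9999 is arbitrary):

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 9999))

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"hello over UDP", ("127.0.0.1", 9999))   # no handshake, no connection

data, addr = server.recvfrom(1024)    # datagram-oriented: one sendto, one recvfrom
print(data, "from", addr)

client.close()
server.close()
```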

TCP

Summary of TCP

TCP is another transport layer protocol. It has the following characteristics:

  • TCP is a connection-oriented transport layer protocol
  • Provide reliable delivery
  • Use full-duplex communication
  • Byte-stream oriented

TCP datagram

Take a look at the image below (from the Internet).

Some fields of the datagram are:

  • Source Port: The port number of the sending host
  • Destination Port: indicates the port number of the receiving host
  • Sequence number: each byte in the byte stream carried over a TCP connection is numbered in order; the starting sequence number is set when the connection is established. The sequence number field in a TCP segment's header gives the number of the first data byte carried in that segment.
  • Acknowledgement number: the sequence number of the first data byte expected in the next segment from the peer. If the acknowledgement number is N, all data up to sequence number N-1 have been received correctly.
  • Data offset: indicates how far the start of the data portion is from the start of the TCP segment, i.e., the header length.
  • Window: specifies how much data the peer is currently allowed to send. The window value changes dynamically, and it refers to the receive window of the party sending this segment (not its send window).
  • Checksum: The scope of the checksum field includes the header and the data. To calculate the checksum, a 12-byte pseudo-header (the same as UDP) is added before the TCP segment.
  • Acknowledge ACK: The acknowledge number field is valid only if ACK=1. When ACK=0, the confirmation number is invalid. TCP states that all segments sent after a connection is established must have an ACK of 1.
  • PUSH PUSH: When two application processes are communicating interactively, sometimes an application process on one end wants to receive a response immediately after typing a command, rather than waiting for the entire cache to fill up and then delivering it up. At this point, the sender TCP sets PSH to 1 and immediately creates a segment to send. After receiving the packet segment with PSH=1, the TCP receiver delivers the packet to the receiving application process as soon as possible (that is, “push” forward).
  • Reset RST: When RST=1, it indicates that there has been a serious error in the TCP connection (such as due to a host crash or other reason) and that the connection must be released before the transport connection is re-established.
  • SYN: the synchronization flag used during connection establishment; SYN=1 and ACK=0 indicate a connection-request segment.
  • FIN: Used to release a connection.

The sliding window

To improve the efficiency of data transmission, TCP uses a mechanism called the sliding window.

The following is a schematic diagram of the sender's sliding window; the window size is the combined length of the green and red parts. The mechanism is simple: whenever the sender receives an acknowledgement, the window slides to the right. A toy model of the sender window follows.
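
The sketch below (mine, heavily simplified, with cumulative ACKs and sequence numbers counted per byte) captures the sliding behavior:

```python
class SendWindow:
    """Toy sender-side sliding window over sequence numbers."""

    def __init__(self, size: int):
        self.base = 0          # oldest unacknowledged sequence number
        self.size = size       # window size granted by the receiver

    def can_send(self, seq: int) -> bool:
        return self.base <= seq < self.base + self.size

    def on_ack(self, ack: int):
        if ack > self.base:    # cumulative ACK: everything before `ack` is confirmed
            self.base = ack    # the window slides to the right

w = SendWindow(size=4)
print([w.can_send(s) for s in range(6)])   # [True, True, True, True, False, False]
w.on_ack(2)                                # bytes 0 and 1 acknowledged
print([w.can_send(s) for s in range(6)])   # [False, False, True, True, True, True]
```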

Flow control

Flow control can be summarized in a short sentence.

The receiver feeds back its receive window to the sender, and this limits the size of the sender's sliding window.

You can also look at how it is explained on Zhihu; I found the following answer the most vivid, and it is worth reading alongside this section.

Zhihu: How does the SLIDING window of TCP protocol control the flow?

Congestion control

  • Slow start: when a TCP connection is first established, instead of injecting a large amount of data at once and causing sudden congestion, the congestion window is grown gradually from a small initial value based on network feedback.

  • Congestion avoidance: lets the congestion window grow slowly (linearly) rather than doubling each round as slow start does.

  • Fast retransmit: if the sender receives three duplicate acknowledgements in a row, it retransmits the missing segment immediately instead of waiting for the retransmission timer to expire.

  • Fast recovery: fast recovery has the following two features (a toy trace of these phases follows the list):

    • When the sender receives three duplicate acknowledgements in a row, it performs "multiplicative decrease", halving the slow-start threshold, to head off congestion. Note that slow start is not run next.
    • Instead, the congestion window is set to the new threshold and the congestion-avoidance algorithm takes over, so the window grows slowly from there.
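
To tie the four mechanisms together, here is a toy trace of my own (units are segments per RTT, with a single simulated loss; real TCP behavior is more involved):

```python
def cwnd_trace(rounds: int = 12, ssthresh: int = 16, loss_at: int = 8):
    """Toy congestion-window trace: slow start, congestion avoidance, fast recovery."""
    cwnd, trace = 1, []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt == loss_at:                   # three duplicate ACKs detected this round
            ssthresh = max(cwnd // 2, 2)     # multiplicative decrease
            cwnd = ssthresh                  # fast recovery: skip slow start
        elif cwnd < ssthresh:
            cwnd *= 2                        # slow start: exponential growth
        else:
            cwnd += 1                        # congestion avoidance: linear growth
    return trace

print(cwnd_trace())
# [1, 2, 4, 8, 16, 17, 18, 19, 20, 10, 11, 12]
```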

Three-way handshake

The three-way handshake and four-way wave are common interview topics, but before introducing the handshake, I think it is worth spelling out what ideal transmission conditions would look like:

  • The transmission channel does not generate errors
  • No matter how fast the sender sends the data, the receiver always receives the data in time.

The ideal situation is only ideal; neither condition above holds in the real world. So let's talk about how to bring reality a bit closer to the ideal, and that is where the three-way handshake comes in.

First, the three-way handshake and the four-way wave belong to TCP. UDP is connectionless, so it has neither. The handshake and the wave exist in the service of reliable transmission. Let's first look at the flow chart of the three-way handshake.

Since the goal is reliable transmission, the point is simply to confirm that both the client and the server can send and receive normally.

  • First handshake: The Client cannot confirm anything; The Server confirms that the Client is sending properly.
  • The second handshake: The Client confirms that its own sending and receiving are normal, and the other party’s sending and receiving are normal. The Server confirms that it receives properly and the Client sends properly.
  • Third handshake: The Client confirms that its own sending and receiving are normal, and the other party’s sending and receiving are normal. The Server confirms that its own sending and receiving are normal, and the peer party’s sending and receiving are normal.

Why is the third handshake needed? In short, it mainly prevents an old, delayed connection-request segment from suddenly reaching the server and creating a bogus connection, which would cause errors.

Through the above three steps, the Client and the Server can carry out reliable transmission; none of the steps can be omitted. (A minimal socket example that triggers the handshake is shown below.)
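
From application code you never build the handshake yourself; the kernel performs it inside connect()/accept(). A minimal sketch (mine; both ends in one script over loopback, port 8888 is arbitrary):

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 8888))
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", 8888))   # SYN -> SYN/ACK -> ACK happens here

conn, addr = server.accept()          # the completed connection is taken off the queue
print("connection established with", addr)

client.close()                        # closing each side triggers the FIN/ACK exchanges
conn.close()
server.close()
```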

Four-way wave

Now that you understand the three-way handshake, the four-way wave should be easy. First, here is the flow chart.

Like the three-way handshake, the four-way wave serves reliable transmission. It is the process of disconnecting the Client and the Server, so you might wonder: if establishing the connection takes three steps, why does tearing it down take four? Wouldn't once or twice be enough?

Well, just as the three-way handshake has both sides confirm each other, the four-way wave also requires confirmation from both sides.

  • First wave: The Client sends a disconnected request to the Server.
  • Second wave: The Server sends an acknowledgement of the disconnection to the Client. After the Client receives it, the TCP connection enters the half-close state: the channel for sending data from the Client to the Server is closed.
  • Third wave: The Server sends a disconnection request to the Client.
  • Fourth wave: The Client sends an acknowledgement of disconnection to the Server. After the Server receives the packet, the TCP connection is completely disconnected.

Another way to look at the question above: suppose that in the second wave the Server sent its FIN together with the ACK. If the Server still had data on its way to the Client, that direction would be shut down as soon as the Client answered with its final acknowledgement, and the remaining data would fail to arrive, as shown in the figure below.

Here is an article to help you better understand the process of establishing and disconnecting A TCP connection: Two GiFs – Fully understand TCP’s three handshake and four wave

Application scenarios of TCP and UDP

As for the relationship between TCP and UDP, look at this diagram (image from the web) and you may understand:

TCP is a reliable transmission and UDP is an unreliable transmission, so why do we need to use unreliable UDP for data transmission?

As we know, UDP does not need to establish a connection before transmitting data, and the remote host does not need to give any acknowledgement after receiving a UDP packet. While UDP does not provide reliable delivery, there are some situations where UDP is the most efficient way to work (typically for instant messaging), such as QQ voice, QQ video, live streaming, and so on.

TCP provides a connection-oriented service: a connection must be established before data transfer and released afterwards, and TCP offers no broadcast or multicast service. Because TCP provides reliable, connection-oriented transport (the three-way handshake before data is exchanged; acknowledgements, windows, retransmission, and congestion control while it is exchanged; and connection release afterwards to free system resources), it inevitably adds considerable overhead such as acknowledgements, flow control, timers, and connection management. This not only enlarges the protocol header but also consumes a lot of processor resources. TCP is typically used for file transfer, sending and receiving mail, and remote login.

The application layer


The HTTP protocol

For a definition of HTTP, check out wikipedia:

HTTP is a standard for requests and responses between a client (user) and a server (web site), usually using the TCP protocol. Using a web browser, web crawler, or other tool, the client initiates an HTTP request to the specified port on the server (default port is 80). We call this client a user agent. The responding server stores resources such as HTML files and images. We call this reply server the Origin server.

The HTTP protocol, now used everywhere on the World Wide Web, deserves a separate article later; for now, it's time to talk briefly about HTTPS.

HTTP and HTTPS carry the same protocol, but HTTPS wraps HTTP in SSL (Secure Sockets Layer) or TLS (Transport Layer Security). From that alone you can tell that HTTPS is secure where plain HTTP is not. (A minimal HTTP request from Python is sketched below.)
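
For illustration, a bare-bones request with Python's standard http.client (example.com is just a placeholder host; swapping HTTPSConnection for HTTPConnection gives the TLS-wrapped variant on port 443):

```python
import http.client

conn = http.client.HTTPConnection("example.com", 80, timeout=5)
conn.request("GET", "/")                # the client side of the request/response exchange
resp = conn.getresponse()
print(resp.status, resp.reason)         # e.g. 200 OK
print(resp.getheader("Content-Type"))   # how the server describes the returned resource
conn.close()
```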

The FTP protocol

File Transfer Protocol (FTP) is an application-layer Protocol in the TCP/IP Protocol family. Running on top of TCP, FTP is a reliable Transfer Protocol and is used to distribute and share files among users. The FTP function is used when the network administrator upgrades the device version, downloads logs, and saves configurations.

DNS protocol

As mentioned above, IP addresses are used to locate hosts, but in daily life it is hard to remember such irregular strings of numbers; we only remember a website's domain name. So what do we do?

Hence the DNS protocol.

DNS is the domain-name resolution protocol: when we know a domain name but not the server's IP address, DNS does the translation. (A one-line lookup from Python is shown below.)
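
In code the resolver hides the whole DNS exchange; you hand it a name and get back addresses. A quick sketch (example.com is a placeholder name):

```python
import socket

hostname = "example.com"
print(hostname, "resolves to", socket.gethostbyname(hostname))   # a single IPv4 address

# getaddrinfo returns richer records (IPv4 and IPv6) for the same name.
for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, 80):
    print(family, sockaddr)
```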

DHCP protocol

What is DHCP protocol? Look at the definition on the Wiki

Dynamic Host Configuration Protocol (DHCP) is a communication protocol that enables network administrators to centrally manage and automatically assign IP network addresses. On an IP network, each device that connects to the Internet needs to be assigned a unique IP address. DHCP enables network administrators to monitor and assign IP addresses from central nodes. When a computer moves to another location on the network, it automatically receives a new IP address.

As the wiki explains very clearly, the role of DHCP is to dynamically assign IP addresses to hosts, greatly reducing the workload of network administrators.