Preface
- This is a series of topics on computer networks that will continue to be updated.
- Learning materials come from Computer Networks (7th edition) and Illustrated HTTP.
- This article mainly covers several application-layer protocols: DNS for domain name resolution; FTP, the File Transfer Protocol; DHCP, the Dynamic Host Configuration Protocol; and SMTP and POP3 for e-mail.
- HTTP, being the most important of them, will be covered in a separate article.
The application layer
Each application layer protocol is designed to solve a certain kind of application problem, which must be solved through the communication and cooperation between multiple application processes located in different hosts. The specific content of the application layer is precisely to define these communication rules. Specifically, the application layer protocol should define:
- The types of messages exchanged by application processes, such as request messages and response messages.
- The syntax of each message type, such as the fields in the message and their detailed descriptions.
- The semantics of each field, that is, the meaning of the information the field contains.
- When and how a process sends messages, and the rules for responding to messages.
Domain Name System (DNS)
Overview
What is a DNS?
The Domain Name System (DNS) is the naming system the Internet uses to translate easy-to-use machine names into IP addresses. Why is it called a "domain name" system rather than simply a "name" system? Because the Internet's naming system makes heavy use of "domains".
Why do users use domain names to access?
When communicating with a host on the Internet, a user must know the IP address of the host. However, users have a hard time remembering 32-bit binary host addresses. The Domain name System (DNS) translates host names on the Internet into IP addresses.
Why do machines use IP addresses instead of domain names when processing IP datagrams?
Because an IP address has a fixed length (32 bits for IPv4, 128 bits for IPv6), whereas domain names vary in length, which makes them harder for machines to process.
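To make the fixed-length point concrete, here is a minimal Python sketch. The helper name `ipv4_to_int` and the sample names are my own choices for illustration; the sketch only shows that every IPv4 address packs into exactly 4 bytes while host names have arbitrary lengths.

```python
import socket
import struct

def ipv4_to_int(dotted: str) -> int:
    """Pack a dotted-decimal IPv4 address into its fixed 32-bit integer form."""
    packed = socket.inet_aton(dotted)        # always exactly 4 bytes
    return struct.unpack("!I", packed)[0]    # network byte order, unsigned 32-bit

# Every address has the same size; names do not.
addr = ipv4_to_int("192.168.1.10")
names = ["a.cn", "mail.cctv.com"]            # sample names, chosen for illustration
name_lengths = [len(n.encode("ascii")) for n in names]
```

A router can compare or mask such an integer in a single operation, which is not possible with a variable-length string.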
The structure of the DNS
The Internet uses a hierarchical tree naming system and a distributed domain name system (DNS).
DNS, the domain name system of the Internet, is designed as an online distributed database system and uses the client-server model. DNS allows most names to be resolved locally, and only a small number of resolutions require communication over the Internet, so the DNS system is efficient. Because DNS is a distributed system, the failure of a single computer does not affect the normal operation of the whole system.
Domain name to IP address resolution is done by many DNS programs distributed across the Internet. DNS programs run on a dedicated node, and the machine on which they run is often referred to as a DNS server.
The resolution process
The process is as follows. When an application process needs to resolve a host name to an IP address, it calls the resolver and becomes a DNS client. It places the domain name to be resolved in a DNS request message and sends it as a UDP user datagram to the local domain name server. After looking up the name, the local domain name server returns the corresponding IP address in a reply message. Once the application process has the destination IP address, it can communicate. If the local domain name server cannot answer the request, it temporarily becomes a DNS client itself and queries other domain name servers until it finds one that can answer.
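As a rough illustration of the request message a resolver places in a UDP datagram, the sketch below builds a DNS query by hand. The layout (a 12-byte header followed by a question with length-prefixed labels) follows the standard DNS message format; the function names and the fixed query ID are assumptions made for this example, and nothing is sent over the network.

```python
import struct

def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query for an A record (illustrative, not a full resolver)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: encoded name, QTYPE=1 (A record), QCLASS=1 (Internet).
    question = encode_qname(name) + struct.pack("!HH", 1, 1)
    return header + question

packet = build_query("mail.cctv.com")
```

To actually send it, the packet would go to port 53 of a DNS server through a UDP socket (`socket.SOCK_DGRAM`); that step is left out so the sketch runs offline.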
Domain structure
In the early days, the Internet used a non-hierarchical name space because there were few users. As the number of users grew rapidly, a hierarchical tree-structured naming method was adopted.
In mail.cctv.com, com is the top-level domain name, cctv is the second-level domain name, and mail is the third-level domain name. The lowest level is at the far left, and the highest level is at the far right.
DNS does not specify how many sub-domains a domain name must contain, nor what each level of domain name stands for. Each level of domain name is managed by the domain name authority above it, and the top-level domain name is managed by ICANN. This approach makes each domain name unique throughout the Internet and makes it easy to devise a mechanism for finding domain names.
- Generic top-level domains: com (companies), net (network service organizations), org (non-profit organizations), int (international organizations), edu (US educational institutions), gov (US government departments), mil (US military departments), and so on, 20 in total.
- Country-code top-level domains: cn (China), us (United States), and so on.
- Infrastructure domain: arpa, used for reverse domain name resolution, so it is also called the reverse domain.
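The level ordering and the arpa reverse domain can be sketched in a few lines of Python. `domain_levels` is a hypothetical helper written for this example, and the IP address is just a sample value; `reverse_pointer` is a property provided by the standard `ipaddress` module.

```python
import ipaddress

def domain_levels(name: str) -> list:
    """Return the labels of a domain name from the top level downwards."""
    labels = name.lower().rstrip(".").split(".")
    return list(reversed(labels))            # rightmost label is the top level

levels = domain_levels("mail.cctv.com")      # top-level, second-level, third-level

# Reverse lookups query a name under in-addr.arpa, built from the reversed octets.
ptr_name = ipaddress.ip_address("101.12.123.234").reverse_pointer
```

Here `levels` lists com first and mail last, and `ptr_name` is the name a reverse lookup would query.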
Domain name server
In theory, each level of domain name could have its own DNS server, but this would create too many servers and reduce efficiency. Partitioning into zones solves this problem.
The range that a server is responsible for (or has authority over) is called a zone. Each zone is configured with a domain name server that stores the mapping from domain names to IP addresses for all hosts in the zone.
A DNS server's jurisdiction is measured in zones, not domains. A zone is the part of a domain that the server actually manages. A zone may be equal to or smaller than a domain, but never larger. A domain may contain one or more zones.
Classification of domain name servers
- Root domain name server. The highest-level and most important domain name server.
- Top-level domain name server. Manages the second-level domain names registered under a top-level domain such as com, cn, or gov.
- Authoritative domain name server. The domain name server responsible for a zone.
- Local domain name server. When a host sends a DNS query, the query message goes to the local DNS server. Every ISP, or a university, can have a local domain name server. The local DNS server is usually only a few routers away from the host. When the host being queried belongs to the same ISP, the local DNS server can convert the host name to its IP address immediately, without asking other DNS servers.
Secondary domain name server
To improve reliability, DNS servers replicate their data to several servers, one of which is the primary DNS server while the others are secondary DNS servers. When the primary DNS server fails, a secondary DNS server keeps the service running without interruption.
Two ways to query domain names
- Recursive query. Suppose host A wants to reach host B. After the user enters B's domain name, host A first queries the local domain name server. If the local server does not know the answer, it acts as a DNS client on behalf of host A and queries a root name server. If the root name server does not know either, it queries the top-level domain name server below it in the same way, and so on down the hierarchy until B's IP address is obtained. The answer is then passed back along the same path to host A.
- Iterative query. When a root DNS server receives a request from the local DNS server, it either returns the IP address or tells the local DNS server which DNS server to query next. The local DNS server then sends the query to that server itself, and repeats this until some server returns the IP address.
Note: The host queries the local DNS server recursively. The local DNS server uses iterative query to query the root DNS server.
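The iterative part can be simulated without any network. In the sketch below, the server names, the referral table, and the IP address are all made-up toy data; the point is only the control flow, in which the local DNS server follows referrals itself until some server gives an answer.

```python
# Hypothetical toy data: each "server" either knows an answer or refers to another server.
SERVERS = {
    "root": {"referrals": {"com": "tld-com"}},
    "tld-com": {"referrals": {"cctv.com": "auth-cctv"}},
    "auth-cctv": {"answers": {"mail.cctv.com": "101.12.123.234"}},
}

def ask(server: str, name: str):
    """Return ("answer", ip), ("referral", next_server) or ("error", None)."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return "answer", data["answers"][name]
    for suffix, next_server in data.get("referrals", {}).items():
        if name == suffix or name.endswith("." + suffix):
            return "referral", next_server
    return "error", None

def resolve_iteratively(name: str):
    """What the local DNS server does: start at the root and follow referrals."""
    server, path = "root", []
    while True:
        path.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        if kind == "error":
            return None, path
        server = value                       # query the referred server next
```

Calling `resolve_iteratively("mail.cctv.com")` visits the root, the com server, and the authoritative server in turn. In a recursive query, each server would instead call the next one on the client's behalf.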
The cache
- In order to improve DNS query efficiency, lighten the load of the root DNS server and reduce the number of DNS query packets on the Internet, DNS servers widely use cache. The cache is used to hold a record of the most recently queried domain name and where to get the domain name mapping information.
- Many hosts download their entire database of names and addresses from the local DNS server at startup, maintain a cache of their most recently used domain names, and use the DNS server only when names are not found in the cache.
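A cache of recently queried names might look like the sketch below. The class name, the fixed lifetime per entry, and the injectable clock are assumptions of this example (the text above does not say how entries expire); the idea is simply that the server is asked only when the name is missing from the cache.

```python
import time

class DnsCache:
    """Toy cache of recently resolved names; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock                  # injectable so the cache can be tested
        self._entries = {}                   # name -> (ip, expiry time)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:      # stale entry: drop it
            del self._entries[name]
            return None
        return ip

    def put(self, name, ip):
        self._entries[name] = (ip, self._clock() + self._ttl)

def lookup(name, cache, query_server):
    """Use the cache first and fall back to the DNS server on a miss."""
    ip = cache.get(name)
    if ip is None:
        ip = query_server(name)              # hypothetical callable doing the real query
        cache.put(name, ip)
    return ip
```

Repeated lookups of the same name within the lifetime never reach `query_server`, which is how caching lightens the load on root servers.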
Summary of the DNS resolution process
- The user enters an address in the browser and presses Enter.
- If the address is incomplete (a full address has the form protocol://host:port, for example http://www.baidu.com:8080), a modern browser automatically fills in the protocol and port number.
- The browser takes the full address and parses it into the protocol, the domain name at each level, the port, and the path.
- It checks the locally stored cache, and if the name is there, takes the IP address directly.
- If not, the system sends a query to the local DNS server (recursive query).
- If the local DNS server does not have the answer, it sends a query to a root DNS server (the top level). The root DNS server either returns the IP address or tells the local DNS server which DNS server to query next, and this repeats until the IP address is returned (iterative query).
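The completion and parsing steps at the start of this list can be sketched with Python's `urllib.parse`. The default scheme and the port table are assumptions for illustration; real browsers apply more elaborate rules.

```python
from urllib.parse import urlsplit

# Assumed defaults for this sketch only.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def parse_address(typed: str) -> dict:
    """Complete a typed address and split it into protocol, host, port and path."""
    if "://" not in typed:
        typed = "http://" + typed            # fill in the missing protocol
    parts = urlsplit(typed)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)   # fill in the missing port
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path or "/",
    }

full = parse_address("www.baidu.com")
```

Only the `host` field then goes through the cache lookup and DNS queries described in the remaining steps.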
File transfer protocol FTP
File Transfer protocol FTP is the most widely used file transfer protocol on the Internet.
- FTP provides interactive access
- It allows clients to specify the type and format of the file.
- It allows files to have access permissions (a user who accesses a file must be authorized and enter a valid password).
- FTP hides the details of individual computer systems, making it suitable for transferring files between any computer on a heterogeneous network.
Characteristics
- FTP (based on TCP) and TFTP (based on UDP) are both file sharing protocols of the whole-file-copy kind. To access a file, you must first obtain a local copy. To modify a file, you can only modify the copy and then send the modified copy back to the original node.
- The other kind of file sharing protocol is online access, which allows multiple programs to access a file simultaneously.
Basic working principles of FTP
Transferring files between two hosts may seem simple, but it is often difficult.
The reason is that many computer manufacturers have developed hundreds of file systems that vary widely.
The problems commonly encountered are:
- Computers store data in different formats.
- The file directory structure and file naming are different.
- Operating systems use different commands for the same file access function.
- Access control methods are different.
FTP provides only basic file transfer services. Its main function is to reduce or eliminate the incompatibilities in handling files under different operating systems.
FTP uses the client-server model. One FTP server process can serve multiple client processes.
The server process has two parts: a master process, which accepts new requests, and several slave processes, each of which handles a single request.
On both the client and the server, the slave side consists of two processes: a control process and a data transfer process. The control connection stays open for the entire session. The data connection is used to transfer files and is closed when the transfer completes.
Summary of the FTP file transfer process
- The client sends the request and the server provides the file.
- Both the client and the server have two slave processes: a control process and a data transfer process.
- The control connection stays open for the entire session, so the two sides are connected the whole time, somewhat like circuit switching.
- The data connection is used to transfer files and is closed when the transfer completes.
TFTP, the Trivial File Transfer Protocol
TFTP is a small, easy-to-implement file transfer protocol.
Advantages: it runs over UDP datagrams, and its code takes up little memory.
TFTP supports only file transfer, not interaction.
The sender retransmits a data PDU if it does not receive an acknowledgment within the specified time.
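The retransmission rule can be tried out in a small simulation. The packet layout used here (opcode 3 for DATA, opcode 4 for ACK, 512-byte blocks, a short final block ending the transfer) comes from the TFTP specification rather than from the text above, and the `channel` callable that stands in for "send over UDP and wait for a reply" is purely hypothetical.

```python
import struct

BLOCK_SIZE = 512
OP_DATA, OP_ACK = 3, 4

def make_data(block_no: int, payload: bytes) -> bytes:
    return struct.pack("!HH", OP_DATA, block_no) + payload

def make_ack(block_no: int) -> bytes:
    return struct.pack("!HH", OP_ACK, block_no)

def send_file(data: bytes, channel, max_tries: int = 5) -> int:
    """Stop-and-wait sender: resend each block until its ACK arrives.

    channel(packet) returns the reply bytes, or None to model a timeout.
    Returns the total number of DATA packets sent, retransmissions included.
    """
    sent, block_no, offset = 0, 1, 0
    while True:
        payload = data[offset:offset + BLOCK_SIZE]
        packet = make_data(block_no, payload)
        for _ in range(max_tries):
            sent += 1
            if channel(packet) == make_ack(block_no):
                break                        # acknowledged, move on
        else:
            raise TimeoutError(f"block {block_no} was never acknowledged")
        if len(payload) < BLOCK_SIZE:        # a short block ends the transfer
            return sent
        offset += BLOCK_SIZE
        block_no += 1
```

With a channel that loses the first copy of every block, a 1000-byte file needs two blocks and four transmissions.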
Downloading files through FTP (using a browser)
- Enter the address as protocol + host address, for example "ftp://127.0.0.1".
- The browser then asks for the username and password used to log in to the host.
- After logging in to the FTP server, select the file to download and save it.
- While you download files over FTP, the browser acts as the FTP client, and the FTP server must be running to serve the download.
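The same download can be scripted with Python's standard `ftplib` instead of a browser. This is only a sketch: the function name and parameters are my own, the anonymous login is an assumption, and running it needs a reachable FTP server.

```python
from ftplib import FTP

def download(host, remote_name, local_path, user="anonymous", password=""):
    """Log in over the control connection, then fetch one file over a data connection."""
    with FTP(host) as ftp:                   # opens the control connection
        ftp.login(user=user, passwd=password)
        with open(local_path, "wb") as out:
            # RETR opens a data connection, which is closed when the transfer ends.
            ftp.retrbinary("RETR " + remote_name, out.write)
```

For example, `download("127.0.0.1", "notes.txt", "notes.txt")` would fetch a file named notes.txt (a made-up name) from a server on the local machine.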
E-mail
Overview
In 1982, ARPANET's e-mail standards were published.
The two most important e-mail standards are the Simple Mail Transfer Protocol (SMTP) and the Internet text message format [RFC 5322].
Because SMTP can carry only printable 7-bit ASCII mail, the Multipurpose Internet Mail Extensions (MIME) were proposed in 1993. MIME declares the data type of the message (text, sound, image, and so on) in the message header, and a single MIME message can carry several types of data at once.
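Python's standard `email` package can show what "several types of data in one message" looks like. The addresses, subject, and attachment bytes below are placeholders invented for this sketch.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"            # placeholder addresses
msg["To"] = "bob@example.org"
msg["Subject"] = "Photo"
msg.set_content("The picture is attached.")  # becomes a text/plain part
msg.add_attachment(
    b"\x89PNG placeholder bytes",            # not a real image, just sample bytes
    maintype="image",
    subtype="png",
    filename="photo.png",
)

# Each part carries its own Content-Type header describing its data type.
part_types = [part.get_content_type() for part in msg.iter_parts()]
```

Printing `msg.as_string()` shows the headers MIME adds and the base64 text that keeps the binary part within 7-bit ASCII.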
E-mail system composition
An e-mail system has three main components: user agents, mail servers, and the mail protocols, namely a mail sending protocol (such as SMTP) and a mail reading protocol (such as POP3).
Sender's user agent (SMTP client) -> send mail with SMTP -> sender's mail server (SMTP server, then SMTP client) -> send mail with SMTP -> recipient's mail server (SMTP server, POP3 server) -> read mail with POP3 -> recipient's user agent (POP3 client)
The user agent
The user agent (UA) is the interface between the user and the e-mail system, and in most cases it is simply a program running on the user's computer. For this reason the user agent is also called e-mail client software. It provides a friendly interface (mainly a window interface) for sending and receiving mail. Microsoft Outlook and Zhang Xiaolong's Foxmail are popular e-mail user agents.
Mail server
There are many mail servers on the Internet for users to choose from. A mail server runs 24 hours a day and has large-capacity mailboxes.
The function of a mail server is to send and receive mail, and to report the result of each delivery (delivered, rejected, lost, and so on) to the sender.
A mail server uses two different protocols: SMTP is used to send mail, and POP3 is used by user agents to read mail from the mail server.
A mail server is both a client and a server. When server A sends mail to server B, A is the SMTP client and B is the SMTP server, and vice versa.
In the TCP/IP e-mail system, an e-mail address has the following format: username@domain name of the mail server.
SMTP Simple mail transfer protocol
SMTP specifies how information should be exchanged between two SMTP processes that communicate with each other. SMTP does not specify the format of the message content, how the message is stored, or how fast the mail system should send the message. SMTP communication has three stages: establish a connection -> mail transfer -> release the connection. I won’t go into the details.
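For readers who do want a little more detail, the three stages can be traced with a toy simulation of the server side of an SMTP dialogue. The command names and reply codes (220, 250, 354, 221) are the standard SMTP ones, but the function itself is a simplified sketch of my own: it does not check command order or addresses.

```python
def smtp_replies(lines):
    """Feed client lines to a toy SMTP server and collect its reply codes."""
    replies = [220]                          # stage 1: connection set up, server greets
    in_data = False
    for line in lines:
        if in_data:
            if line == ".":                  # a lone dot ends the message body
                in_data = False
                replies.append(250)
            continue                         # body lines get no reply of their own
        verb = line[:4].upper()
        if verb in ("HELO", "EHLO", "MAIL", "RCPT"):
            replies.append(250)              # stage 2: envelope accepted
        elif verb == "DATA":
            in_data = True
            replies.append(354)              # server asks for the message text
        elif verb == "QUIT":
            replies.append(221)              # stage 3: connection released
            break
        else:
            replies.append(500)              # unrecognized command
    return replies

dialogue = [
    "HELO client.example",
    "MAIL FROM:<alice@example.com>",
    "RCPT TO:<bob@example.org>",
    "DATA",
    "Subject: hi",
    "",
    "Hello Bob",
    ".",
    "QUIT",
]
codes = smtp_replies(dialogue)
```

The addresses in `dialogue` are placeholders. In a real program, Python's `smtplib` would conduct this dialogue with an actual mail server.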
POP3 mail read protocol
The Post Office Protocol (POP) is a very simple but limited mail reading protocol. It has been updated several times, and the version now in use is POP3 from 1996, which has become an Internet standard. Another mail reading protocol is IMAP.
Finally, do not confuse the mail read protocol POP3 or IMAP with the mail transfer protocol SMTP. The user agent of the sender sends mails to the sender mail server and the sender mail server sends mails to the receiver mail server using SMTP. POP3 is used only when the user agent reads messages from the recipient mail server.
Dynamic Host configuration protocol DHCP
Background
The following items need to be configured for the protocol software of computers connected to the Internet:
- The IP address
- Subnet mask
- IP address of the default router
- IP address of the DNS server
To save the trouble of assigning IP addresses, could each computer be given a unique IP address at the factory, just as each Ethernet adapter has a unique hardware address? Why make getting online so complicated, when the address could be configured once and the computer could stay online forever?
This clearly does not work, because an IP address contains not only a host number but also a network number.
An IP address indicates which network a computer is attached to, and at production time there is no way to know which network the computer will join after it leaves the factory. Therefore, a computer that needs to connect to the Internet must have its protocol software configured with items such as the IP address.
Can it be manually configured?
Manual configuration is inconvenient and error prone.
Dynamic IP
Dynamic IP addresses are allocated only when needed. "Dynamic" means that each time you go online, the ISP temporarily assigns you an IP address.
Because IP address resources are scarce, most users access the Internet with dynamic IP addresses. For example, computers that connect through a modem, ISDN, ADSL, cable broadband, or residential-community broadband are temporarily assigned an IP address each time they go online.
An IP address is a 32-bit binary number. In theory there are about 4 billion (2^32) possible addresses, which seems like a large address space. In practice, depending on how the bits are divided between network ID and host ID, IP addresses fall into class A (7-bit network ID and 24-bit host ID), class B (14-bit network ID and 16-bit host ID), and class C (21-bit network ID and 8-bit host ID). For historical reasons and because of differences in technical development, class A and class B addresses are almost exhausted, and only class C addresses remain for organizations around the world to apply for. IP addresses are therefore a very important network resource.
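The class boundaries can be worked through in code. The sketch below is illustrative only: `classify` is a hypothetical helper, the leading-octet ranges follow the classic classful scheme, and the network number it returns includes the leading class bits (so 8, 16, or 24 bits in total rather than 7, 14, or 21).

```python
TOTAL_ADDRESSES = 2 ** 32                    # 4294967296, about 4 billion

def classify(dotted: str):
    """Return (class, network number, host number) for a classful IPv4 address."""
    octets = [int(part) for part in dotted.split(".")]
    value = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    first = octets[0]
    if first < 128:                          # leading bit 0
        cls, host_bits = "A", 24
    elif first < 192:                        # leading bits 10
        cls, host_bits = "B", 16
    elif first < 224:                        # leading bits 110
        cls, host_bits = "C", 8
    else:
        return "D/E", None, None             # multicast and reserved ranges
    network = value >> host_bits             # network part, class bits included
    host = value & ((1 << host_bits) - 1)    # remaining low-order bits
    return cls, network, host
```

For example, `classify("192.168.1.10")` reports class C with host number 10, which shows how few hosts (at most 2^8 combinations) a single class C network can hold.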
An organization that offers Internet services usually publishes a fixed IP address, because its hosts provide services such as WWW, FTP, and e-mail, and a fixed address makes access easier for users. Of course, numeric IP addresses are hard to remember and recognize, so people prefer to reach hosts through domain names, which still have to be translated into IP addresses by domain name servers (DNS). For example, the address of your home page, which users can easily remember and use, is translated by the DNS server into something like 101.12.123.234, which is your real address on the network.
For most dial-up users, however, assigning a fixed (static) IP address to each user is not advisable, because their Internet use is scattered in time and place and a fixed address would waste a great deal of address space. These users therefore automatically receive a dynamic IP address each time they dial in to the ISP's host. The address is not arbitrary: it comes from the valid range of network IDs and host IDs that the ISP has applied for. A dial-up user's IP address may differ between any two connections, but it does not change during a single connection.
Static IP
A static IP address (also called a fixed IP address) is an IP address assigned to a computer or network device for a long period. Generally, only dedicated servers or computers with leased-line Internet access have fixed IP addresses, and they cost more.
A static IP address is one that can reach the Internet directly. When the ISP installs the service, it assigns you an IP address, so the computer does not obtain a network address automatically each time it connects, which avoids connection problems. The broadband operator gives the user an IP address, subnet mask, gateway, and DNS server address. Without a router, you only need to plug the network cable into the computer and set the IP address manually for the computer to access the Internet. Static IP addresses do not change and are mainly used for websites or services offered over the Internet. Some gamers and VoIP users also prefer static IP addresses because communication is easier.
A dynamic IP address is the opposite of a static one. First, to save IP resources, machines that access the Internet through telephone dial-up, ADSL virtual dial-up, and similar methods are not given a fixed IP address. The ISP allocates one dynamically and temporarily, which improves the utilization of IP addresses. Second, dynamic allocation is also common on LANs because it makes clients simple to set up, which means the IP address you get may differ each time you connect. This does not affect your own access to the Internet, but your friends and other users cannot reach you, because they do not know where your computer is. It is like everyone having a phone while your number changes every day.
The distinction between static and dynamic IP addresses exists because there are not enough IP addresses. Many people need to go online, but not all of them are online at the same time, so addresses can be shared.
DHCP
DHCP (Dynamic Host Configuration Protocol) is a network protocol used on local area networks. The server controls a range of IP addresses, and when a client logs in, it automatically obtains the IP address and subnet mask assigned by the server. On Windows Server, DHCP is a service component that is not installed by default. You need to install it manually and perform the necessary configuration.
DHCP provides a mechanism called plug-and-play networking, which allows a computer to join a new network and obtain an IP address without manual intervention.
Host networking process
We know that in order to access the Internet, the host must have an IP address.
When a host that needs an IP address starts up, it broadcasts a discovery message (DHCPDISCOVER) with the destination IP address set to all ones, that is, 255.255.255.255. At this point the host is a DHCP client.
The message is broadcast because the host does not yet know where the DHCP server is. The host does not have an IP address of its own yet either, so the source IP address of the datagram is set to all zeros. All hosts on the local network receive the broadcast, but only a DHCP server replies. The DHCP server first looks for the computer's configuration information in its database and returns it if found. If not, it takes an address from its IP address pool and assigns it to the computer. The DHCP server's reply is called an offer message (DHCPOFFER), and it carries configuration information such as the IP address.
However, we do not want a DHCP server on every network, since that would mean too many DHCP servers. Instead, each network has at least one DHCP relay agent (usually a router) configured with the IP address of a DHCP server. When the relay agent receives a discovery message from host A, it forwards the message to the DHCP server by unicast and waits for the reply. After receiving the reply, the relay agent sends the offer message back to host A.
The IP address a DHCP server assigns to a client is temporary, so the client can use it only for a limited time. DHCP calls this time the lease period, but the protocol does not specify how long it should be or set a minimum. The DHCP server decides the lease period.
Simple summary
- The client requests a lease (it broadcasts a discovery message).
- Servers offer a lease (every DHCP server that receives the discovery message responds).
- The client selects an IP lease (several DHCP servers may have answered, so it picks one offer and sends a request message).
- The server confirms the IP lease (it sends an acknowledgment message).
- After half of the lease period has passed, the client asks to renew the lease.
- If the DHCP server refuses, the client stops using its current IP address and must send a discovery message again.
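The lease steps above can be sketched as a toy server with an address pool. Everything in it is hypothetical: the class and method names, identifying clients by a MAC-like string, and the one-hour lease. Real DHCP messages carry many more fields.

```python
class ToyDhcpServer:
    """Toy model of a DHCP server's address pool and lease table."""

    def __init__(self, pool, lease_seconds=3600):
        self.free = list(pool)               # addresses not yet leased
        self.lease_seconds = lease_seconds
        self.leases = {}                     # client id -> (ip, expiry time)

    def offer(self, client_id):
        """Answer a discovery: reuse the client's known address or propose a free one."""
        if client_id in self.leases:
            return self.leases[client_id][0]
        return self.free[0] if self.free else None

    def acknowledge(self, client_id, ip, now):
        """Confirm or renew a lease; return its new expiry time, or None if refused."""
        if ip in self.free:
            self.free.remove(ip)
        elif self.leases.get(client_id, (None,))[0] != ip:
            return None                      # address is leased to someone else
        expiry = now + self.lease_seconds
        self.leases[client_id] = (ip, expiry)
        return expiry

def renewal_time(lease_start, lease_seconds):
    """The client asks for renewal once half of the lease period has passed."""
    return lease_start + lease_seconds / 2
```

A client that obtained its lease at time 0 with a 3600-second lease would call `acknowledge` again at time 1800, and a refusal (`None`) would send it back to the discovery step.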
DHCP is well suited to computers that move around a lot. On the Windows operating system, choose Control Panel/Network, locate the menu under a connected network, find TCP/IP, and click the Properties button. If you select Automatically obtain IP address and automatically Obtain DNS server address, DHCP is used.
P2P applications
Overview
P2P applications are network applications built on a peer-to-peer (P2P) architecture: there is no fixed server, and most interactions take place directly between peers.
P2P file distribution does not require the use of a centralized media server, and all audio/video files are transmitted between ordinary Internet users.
This is essentially equivalent to having many scattered media servers (ordinary users' computers acting as media servers) that provide audio/video files for other users to download.
This P2P file distribution method solves the bottleneck problem that can occur in centralized media server.
P2P working mode with centralized directory server
The first to use P2P was Napster, a software program written by an American college student in 1999. You can download all kinds of MP3 music for free with this software.
Napster made MP3 the de facto standard for music on the Internet.
Napster is able to search music files and provide retrieval capabilities. The index information of all music files is stored centrally in the Napster directory server, which functions as an index server. Users can search the directory server to know where to download MP3 files.
How Napster works
- All users running Napster must promptly report to Napster’s directory server which music files they have stored.
- The Napster directory server uses this user information to build a dynamic database that centrally stores all users’ music file information (that is, object names and corresponding IP addresses).
- When a user wants to download an MP3 file, it queries the directory server (still in the traditional client-server fashion), retrieves the results and returns the user the IP address of the computer on which the file was stored. The user can then choose an address to download the desired MP3 file (this download process is P2P).
- As you can see, Napster’s file transfer is decentralized (P2P), but file location is centralized (client-server).
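The centralized lookup can be sketched as a small in-memory index. The class, method names, file name, and IP addresses are invented for this example; it only illustrates that the directory server stores who has which file, while the file itself would be fetched directly from a peer.

```python
from collections import defaultdict

class DirectoryServer:
    """Toy centralized index: file name -> addresses of peers that store it."""

    def __init__(self):
        self._index = defaultdict(set)

    def report(self, peer_ip, files):
        """A peer reports which music files it currently stores."""
        for name in files:
            self._index[name].add(peer_ip)

    def search(self, name):
        """Client-server step: return the peers that hold the file."""
        return sorted(self._index.get(name, set()))

directory = DirectoryServer()
directory.report("192.0.2.1", ["song.mp3", "tune.mp3"])
directory.report("198.51.100.7", ["song.mp3"])
peers_with_song = directory.search("song.mp3")
```

The downloading user would then pick one address from `peers_with_song` and fetch the file from that peer directly, which is the P2P step.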
In 2000, Napster was ruled liable for indirect copyright infringement and was forced to shut down.
P2P file sharing program with full distributed structure
After Napster, the first generation of P2P file sharing software, was shut down, Gnutella, the second generation of P2P file sharing software, emerged.
The biggest difference between Gnutella and Napster is that Gnutella does not use a centralized directory server for queries. Instead, queries are flooded among a large number of Gnutella users.
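A flooded query can be simulated on a small graph of peers. The peer names, the neighbor table, and the hop limit `ttl` are assumptions of this sketch; it shows only the idea that each peer passes the query on to its neighbors instead of asking a central directory.

```python
from collections import deque

def flood_query(neighbors, files, start, wanted, ttl=3):
    """Breadth-first flooding: forward the query to neighbors until the hop limit runs out."""
    seen = {start}                           # peers that already got the query
    queue = deque([(start, ttl)])
    hits = []
    while queue:
        peer, hops_left = queue.popleft()
        if wanted in files.get(peer, ()):
            hits.append(peer)                # this peer has the file
        if hops_left == 0:
            continue                         # hop limit reached, do not forward
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops_left - 1))
    return hits

# Toy overlay network: A - B - D - E, and A - C.
NEIGHBORS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}
FILES = {"D": {"song.mp3"}, "E": {"song.mp3"}}
```

With `ttl=2` the query from A reaches D but not E; raising the limit finds more copies at the cost of more query traffic.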
In recent years, many P2P file sharing programs have been developed that let large numbers of users download and share files more efficiently. They use both decentralized location and decentralized transfer. Examples include KaZaA, eMule, and BitTorrent (BT).
BT working principle
BitTorrent calls the unit of data that a peer downloads from other peers a file block, and the length of a file block is fixed.
A user who wants to obtain a file is called a peer. A newly joined peer has no file blocks at first, but it gradually acquires some and can then provide them to other peers. Once the peer has collected all the blocks, it can assemble the whole file. After obtaining the complete file, it can either leave BT (a selfish user) or stay in BT (a selfless user). A peer can join or leave at any time: even if a file has not been fully downloaded, the peer can rejoin later and download the remaining blocks.
Therefore, it is generally much faster to obtain different blocks of data from different peers and assemble the entire file than to download the entire file from just one place.
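The assembly step can be sketched as follows. The functions, peer names, and block size are hypothetical, and the sketch ignores block selection and scheduling entirely; it only shows a file being rebuilt from blocks held by different peers.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def download_from_peers(peers, block_count):
    """Fetch each block from the first peer that has it, then assemble the file.

    peers maps a peer name to the blocks it holds: {block index: bytes}.
    Returns (file bytes, {block index: peer that supplied it}).
    """
    blocks, sources = {}, {}
    for index in range(block_count):
        for name, held in peers.items():
            if index in held:
                blocks[index] = held[index]
                sources[index] = name
                break
        else:
            raise LookupError(f"no peer has block {index}")
    assembled = b"".join(blocks[i] for i in range(block_count))
    return assembled, sources

original = b"abcdefghij"
parts = split_into_blocks(original, 4)
peers = {"p1": {0: parts[0], 2: parts[2]}, "p2": {1: parts[1]}}
rebuilt, sources = download_from_peers(peers, len(parts))
```

In a real client the blocks would be requested from several peers in parallel, which is where the speed advantage comes from.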
As for the problems BT has to solve, such as how to locate exactly the file blocks you want, which peer to serve first when several requests arrive, and which file block to send first, I am not going to go into them here.