This article is participating in the “Network protocols you must know” essay campaign

Preface

Recently I was asked a lot of DNS-related questions in interviews, so I put this article together.

The scroll bar may look short, but the article is packed with content, so take your time reading it. Corrections are very welcome if you spot a mistake (I have been a bit out of practice with writing lately 🤔). If this article helps you, please give it a like as encouragement; writing and drawing the diagrams took real effort.

DNS

The Domain Name System (DNS) is an Internet service. As a distributed database that maps domain names and IP addresses to each other, it makes it easier for people to access the Internet.

Link to Wikipedia zh.wikipedia.org/wiki/%E5%9F…

Domain name mapping: A domain name is mapped to another domain name or IP address, or an IP address is mapped to a domain name

Distributed database: Each DNS server has a database (resolution record table) that records the domain names that the DNS server is currently responsible for resolving

DNS consists of resolvers and domain name servers. A domain name server stores the domain names and IP addresses of the hosts it is responsible for and converts domain names into IP addresses.

Modern network communication usually relies on three identifiers: domain name, IP address, and MAC address.

To communicate with each other, two hosts must know each other’s IP address. Usually, when we visit a website, we just type in a domain name, such as www.baidu.com. How does the computer find the IP address corresponding to this domain name? That is what the DNS protocol does.

Finding MAC addresses from IP addresses is the work of ARP, which is not explained in detail in this article.

Domain names

A domain name is the name of a computer or group of computers on the Internet, consisting of a series of labels separated by dots. It identifies the electronic location of the computer during data transmission. A domain name is an easy-to-remember alias for an IP address. For example, wikipedia.org is a domain name. People can access wikipedia.org directly instead of typing an IP address, and the Domain Name System (DNS) converts it into an IP address that machines can work with. In this way, people only need to remember the meaningful string wikipedia.org rather than a numeric IP address with no meaning.

To Wikipedia: zh.wikipedia.org/wiki/%E5%9F…

Domain names have a clear hierarchy: root domain, top-level domain, second-level domain, third-level domain, fourth-level domain…

❓ So who defines and manages domain names? They can’t just be picked at random, can they?

ICANN (Internet Corporation for Assigned Names and Numbers) is the world’s top domain name management organization, headquartered in California, the United States. ICANN manages the operation of the world’s domain name system.

Root domain

All domain names are subdomains of the root domain. The root domain is written as a single dot (.). Because the root domain is the same for every domain name, we usually omit it, and the computer automatically appends it during DNS resolution.

For example 🌰, the full form of www.baidu.com is www.baidu.com. (with a trailing dot), which can be thought of as www.baidu.com.root

If you run ping www.baidu.com. (with the trailing dot), you will get the same IP address as with ping www.baidu.com

In theory, every domain name query must start at the root domain, because only the root domain can tell you which server manages a given top-level domain. ICANN maintains a list (the root zone) of top-level domains and their corresponding custodians.

Top-level domain (level 1 domain)

Let’s start with top-level domains (TLDs). In short, a TLD is the last part of a website’s domain name. For example, the top-level domain of www.baidu.com is .com.

One of ICANN’s main jobs is to dictate which strings can be used as top-level domains. As of July 2015, there were 1,058 top-level domains, which can be roughly divided into two categories:

  • One kind is the generic top-level domain (gTLD), for example .com, .net, .edu, .org, and .xxx; there are over 700 of them.
  • The other kind is the country code top-level domain (ccTLD), representing different countries and regions, such as .cn (China), .io (British Indian Ocean Territory), .cc (Cocos Islands), and .tv (Tuvalu); there are more than 300 of them.

Of course, ICANN doesn’t manage these top-level domains on its own, because it can’t. There are more than 1,000 top-level domains, each with many resellers underneath, so managing all of them directly would be far too much work. ICANN’s policy is to appoint a custodian for each top-level domain, and that custodian is responsible for everything about the domain. ICANN only deals with the custodians, which makes administration much easier. The custodian of the .cn country code top-level domain, for example, is the China Internet Network Information Center (CNNIC), which determines the policies for that domain.

Second-level domain name (subdomain)

Second-level domains (SLDs) have different meanings under generic top-level domains and country code top-level domains:

  • Second-level domain under a gTLD: generally the online name chosen by the domain name registrant, such as yahoo.com. Businesses often use their own trademarks, trade names, or other business symbols as their online names, e.g. baidu.com.
  • Second-level domain under a country code top-level domain (ccTLD): generally a label similar to a generic top-level domain that indicates the category and function of the registrant. For example, in the .com.cn domain structure, .com is a second-level domain placed under the country code top-level domain .cn and represents commercial organizations in China, and so on.

A third-level domain name has the form www.baidu.com and can be regarded as a subdomain of a second-level domain. For the domain owner/user, a third-level domain is an adjunct to the second-level domain and carries no separate fee. Some do not even count it as a domain name in its own right and refer to it as a “level 2 directory” under the domain.

Below second-level domains come third-level domains (www.baidu.com), fourth-level domains, …

Many articles on the Internet say baidu.com is a top-level domain name, which is wrong.
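To make the hierarchy concrete, here is a minimal Python sketch that lists every level of a domain name from the root downwards. The function name `domain_levels` is made up for this illustration, and the code only splits on dots; it knows nothing about which names are actually registered.

```python
def domain_levels(name: str) -> list:
    """Return the chain of domains from the root down to the full name."""
    # Drop the optional trailing root dot and ignore letter case.
    labels = name.rstrip(".").lower().split(".")
    levels = ["."]  # index 0: the root domain
    suffix = ""
    # Walk the labels from right to left: com, baidu.com, www.baidu.com
    for label in reversed(labels):
        suffix = label if not suffix else f"{label}.{suffix}"
        levels.append(suffix)
    return levels


print(domain_levels("www.baidu.com"))
# ['.', 'com', 'baidu.com', 'www.baidu.com']
```

The index in the returned list is the level: com is the top-level domain, baidu.com the second-level domain, and www.baidu.com the third-level domain.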

Domain name server

A domain name server, also known as a DNS server, is a server that translates domain names into IP addresses. In other words, DNS servers manage domain names; usually several DNS servers cooperate to resolve a domain name and point it to the server hosting your web application.

Domain names are hierarchical, and there are a huge number of them in the world. How can DNS servers handle so many resolution requests?

The answer is that DNS servers are organized hierarchically too. Each DNS server is only responsible for resolving its own domain and the subdomains below it.

Root DNS server

As mentioned above, ICANN maintains a list of top-level domain names and their corresponding custodians, formally known as the DNS root zone. A server that holds the DNS root zone file is called a DNS root name server. The root DNS servers hold the addresses of all top-level DNS servers.

Early DNS responses were limited to a 512-byte UDP packet, which can hold at most 13 server addresses, so there are 13 root DNS server names worldwide, from a.root-servers.net to m.root-servers.net. Ten of them are operated from the United States, with one each in the Netherlands, Sweden and Japan. Today each name is served by many anycast instances in different locations, so the number of physical machines is far larger than 13.

As mentioned earlier, all domain name queries must theoretically start at the root domain, so DNS servers generally keep a cached list of root DNS server IP addresses to send requests to when necessary.

For example, when you query www.baidu.com at a root DNS server, the root DNS server returns the IP address of the top-level DNS server for com.

Top-level domain name server

By the same logic that the root DNS manages top-level domain names, the top-level domain name server is obviously used to manage all the secondary domain names registered under the top-level domain name and record the IP addresses of these secondary domain names.

Authoritative domain name server

Following the logic above, an authoritative domain name server should hold all third-level and fourth-level domain names registered under a second-level domain. In practice this is not the case: if every second-level, third-level or fourth-level domain name had its own domain name server, there would be far too many servers. Zones are used to solve this problem. An authoritative domain name server is a domain name server that manages one zone.

❓ What is a zone? How is it divided up?

Zones and domains are actually different things, and zones can be divided in many different ways. Taking Baidu as an example, assume there are three third-level domains: fanyi.baidu.com, ai.baidu.com and tieba.baidu.com. We can put fanyi.baidu.com and tieba.baidu.com into the zone served by the baidu.com name server, and ai.baidu.com into the zone served by the ai.baidu.com name server. The baidu.com and ai.baidu.com name servers have equal status, and the concrete zoning is decided by Baidu itself according to the number of domain names and the amount of traffic.
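One way to picture the division is as a "longest matching suffix" lookup: a name belongs to the most specific zone that contains it. The sketch below assumes the two hypothetical zones from the Baidu example; `find_zone` is an illustrative helper, not part of any DNS library.

```python
from typing import List, Optional


def find_zone(name: str, zones: List[str]) -> Optional[str]:
    """Pick the zone responsible for `name`: the longest matching suffix wins."""
    name = name.rstrip(".").lower()
    best = None
    for zone in zones:
        # A zone covers the name itself and everything below it.
        if name == zone or name.endswith("." + zone):
            if best is None or len(zone) > len(best):
                best = zone
    return best


# Zones from the Baidu example (the split itself is an assumption).
zones = ["baidu.com", "ai.baidu.com"]
for host in ["fanyi.baidu.com", "tieba.baidu.com", "ai.baidu.com"]:
    print(host, "->", find_zone(host, zones))
```

With these zones, fanyi.baidu.com and tieba.baidu.com land in the baidu.com zone, while ai.baidu.com lands in its own zone.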

Let me draw a picture to get a sense of it:

Local domain name server

In addition to the above three kinds of DNS servers, there is another important one that is not part of the DNS hierarchy: the local domain name server (also known as a recursive resolver). The local DNS server is the default DNS server configured on the PC for resolution.

Domain name Resolution Process

Resolution records

Resolution records are stored on domain name servers and express the mapping between domain names and IP addresses. They are called records, and they are divided into types according to their purpose. Here are a few common record types:

Type    Description
A       Address: returns the IPv4 address that the domain name points to.
AAAA    IPv6 address: returns the IPv6 address that the domain name points to.
NS      Name Server: returns the address of the server that stores the next-level domain name information. The record can only be set to a domain name, not an IP address.
CNAME   Canonical Name: returns another domain name, i.e. the current query jumps to another domain name, as described below.
MX      Mail eXchange: returns the address of the server that receives e-mail.
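As a rough illustration of how these record types relate, the following sketch keeps a tiny record table in a dictionary and follows a CNAME until it reaches an A record. The table layout, the name `ns1.baidu.com` and the address `192.0.2.10` (from the documentation address range) are assumptions for this example, not Baidu's real data.

```python
# A tiny in-memory record table: (name, type) -> list of values.
# The NS name and the IP address are placeholders, not real Baidu data.
RECORDS = {
    ("www.baidu.com", "CNAME"): ["www.a.shifen.com"],
    ("www.a.shifen.com", "A"): ["192.0.2.10"],
    ("baidu.com", "NS"): ["ns1.baidu.com"],
}


def lookup_a(name: str, max_hops: int = 5) -> list:
    """Return the A records for `name`, following CNAME records if needed."""
    for _ in range(max_hops):
        addresses = RECORDS.get((name, "A"))
        if addresses:
            return addresses
        alias = RECORDS.get((name, "CNAME"))
        if not alias:
            return []  # no A record and no alias: nothing to return
        name = alias[0]  # jump to the canonical name and try again
    return []  # give up on overly long (or circular) CNAME chains


print(lookup_a("www.baidu.com"))
```

The `max_hops` limit is a design choice for the sketch: it keeps a misconfigured CNAME loop from running forever.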

The DNS server performs a hierarchical lookup based on the levels of the domain name.

To be clear, every level of a domain name has its own NS records, which point to the DNS servers for that level. These servers know the records of the next level down.

Hierarchical lookup means querying the NS records of each level, starting from the root domain, until the final IP address is found. The process is as follows.

  1. Ask the root DNS server for the NS record and A record (IP address) of the top-level DNS server
  2. Ask the top-level DNS server for the NS record and A record (IP address) of the second-level DNS server
  3. Ask the second-level DNS server for the IP address of the host name

If you look closely at the procedure above, you may notice that it never says how a DNS server knows the IP address of the root DNS server. The NS records and IP addresses of the root DNS servers rarely change, so they are built into DNS servers as a preconfigured list.

Iterative query

Tips: Typical DNS does not use iterative or recursive queries all the way, but a combination of both. This is just to demonstrate the entire process of using iterative queries.

In an iterative query, when a domain name server receives the request and does not have the answer, it gives the host a referral (“which domain name server you should query next”), and the host then sends the next query itself. If the returned content is neither the exact result nor a further referral, the DNS lookup fails.

The referral here means records such as NS or CNAME.

Please look at the picture (small robot)

You can use the dig command to trace the DNS process step by step. On Windows you need to install dig yourself.

dig +trace www.baidu.com

Note: How the client selects one of the NS records for the next query varies between operating systems. Some systems select the first NS record, while others pick one at random. Therefore, the DNS server usually puts the fastest NS record first.

When multiple NS records are returned and the server behind the first NS record does not respond within a certain period of time, the system tries the servers behind the other NS records.

If the returned record is empty, the query fails and the DNS process ends.
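The iterative process can be simulated without any network access. In the sketch below, three dictionaries stand in for the root, top-level and authoritative servers; the server names (`root`, `tld-com`, `ns-baidu`) and the IP address are invented for the example. The important point is that the client itself follows every referral.

```python
# Each simulated server maps a domain suffix to either a referral ("NS", next server)
# or a final answer ("A", ip). All server names and the IP are made up.
SERVERS = {
    "root": {"com": ("NS", "tld-com")},
    "tld-com": {"baidu.com": ("NS", "ns-baidu")},
    "ns-baidu": {"www.baidu.com": ("A", "192.0.2.10")},
}


def ask(server: str, name: str):
    """One query to one server: return the record for the longest matching suffix."""
    table = SERVERS[server]
    matches = [s for s in table if name == s or name.endswith("." + s)]
    if not matches:
        return None  # empty answer
    return table[max(matches, key=len)]


def resolve_iteratively(name: str, max_steps: int = 10):
    """The client itself follows each referral until it gets an A record."""
    server, path = "root", []
    for _ in range(max_steps):
        path.append(server)
        answer = ask(server, name)
        if answer is None:
            return None, path  # empty record: the query fails
        kind, value = answer
        if kind == "A":
            return value, path
        server = value  # referral: the client sends the next query itself
    return None, path


print(resolve_iteratively("www.baidu.com"))
# ('192.0.2.10', ['root', 'tld-com', 'ns-baidu'])
```

The returned path shows the order in which the servers were asked, which mirrors what dig +trace prints for a real lookup.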

Recursive query

In a recursive query, if the local domain name server asked by the host does not know the IP address of the queried domain name, it acts as a DNS client itself and sends query requests to other domain name servers (such as the root name servers) on behalf of the host, instead of letting the host perform the next query. Therefore, a recursive query returns either the requested IP address or an error indicating that the IP address could not be found.

The host says to the domain name server: look up the IP of www.baidu.com for me. I don’t care how you do it; if you don’t know, go ask someone else yourself. I only want the result, not the process 🐅.
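For contrast with the iterative case, here is a minimal sketch of a purely recursive chain, in which every server that does not know the answer asks the next one on behalf of the caller. The server names and the IP address are placeholders, and a fully recursive chain like this is a simplification for illustration only.

```python
# Simulated servers: each one either knows the answer or knows whom to ask next.
# Names and the IP address are placeholders for illustration.
KNOWN = {
    "local": {},
    "root": {},
    "tld-com": {},
    "ns-baidu": {"www.baidu.com": "192.0.2.10"},
}
NEXT_SERVER = {"local": "root", "root": "tld-com", "tld-com": "ns-baidu"}


def query_recursively(server: str, name: str):
    """Ask `server`; if it does not know, it asks the next server on our behalf."""
    if name in KNOWN[server]:
        return KNOWN[server][name]
    upstream = NEXT_SERVER.get(server)
    if upstream is None:
        return None  # nobody left to ask: report an error to the caller
    return query_recursively(upstream, name)  # the server does the asking, not the host


# The host sends exactly one query and gets back the final result.
print(query_recursively("local", "www.baidu.com"))
```

From the host's point of view there is a single request and a single response; all intermediate steps are hidden inside the call.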

Of course, the picture has to go up.

The actual DNS query process

As mentioned above, actual DNS queries are not purely iterative or recursive queries, but are used in combination with each other. So when to use recursion and when to use iteration?

The host sends recursive queries to the local DNS server, and the local DNS server sends iterative queries to other DNS servers.

Resolving www.baidu.com involves a CNAME record pointing to www.a.shifen.com, which should make the purpose of CNAME records clear by now. In the following examples, I’ll use baidu.com instead of www.baidu.com. (lazy)

Here you will find that baidu.com and shifen.com have the same authoritative domain name server, i.e. they are in the same zone, as mentioned above.

DNS cache

To ease DNS query pressure and speed up DNS query, browsers, operating systems, and DNS servers cache DNS query results. If a DNS request is received again within the cache validity period, the DNS server returns the IP address instead of performing subsequent domain name query.

The priority of the cache is determined by the path of the DNS query, which is easy to understand

Browser cache > operating system cache > hosts file > local DNS server cache > root DNS server cache > …
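The priority order amounts to "try each layer in turn and stop at the first hit". The following sketch uses plain dictionaries as stand-ins for the real caches; the layer contents and IP addresses are placeholders chosen for the example.

```python
def resolve_with_caches(name, layers):
    """Try each (label, lookup) layer in order and stop at the first hit."""
    for label, lookup in layers:
        ip = lookup(name)
        if ip is not None:
            return ip, label
    return None, None


# Plain dicts stand in for the real caches; all entries are placeholders.
browser_cache = {}
os_cache = {"www.baidu.com": "192.0.2.10"}
hosts_file = {"localhost": "127.0.0.1"}
local_dns_cache = {"www.baidu.com": "192.0.2.99"}

layers = [
    ("browser cache", browser_cache.get),
    ("OS cache", os_cache.get),
    ("hosts file", hosts_file.get),
    ("local DNS server cache", local_dns_cache.get),
]

print(resolve_with_caches("www.baidu.com", layers))
# ('192.0.2.10', 'OS cache')
```

Because the OS cache is consulted before the local DNS server cache, its entry wins even though a later layer holds a different address.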

How to control the cache duration

The cache Time is generally specified by the TTL (Time to Live) in the resource record of the DNS response packet, in seconds

baidu.com: type A, class IN, addr 220.181.57.216
    Type: A (Host Address) (1)
    Class: IN (0x0001)
    Time to live: 5
    Data length: 4
    Address: 220.181.57.216    # resource data, here the IP address

However, some browsers and operating systems do not honor the TTL, because the TTL value is too small or for other reasons. As a result, TTL-based solutions to many caching problems do not work reliably.
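A TTL-based cache can be sketched in a few lines. The class below is a simplified illustration, not how any particular browser or operating system implements its cache; the `clock` parameter is an assumption added so the behaviour can be checked without waiting.

```python
import time


class TtlCache:
    """Cache DNS answers and forget them once their TTL has passed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the example can be tested without sleeping
        self._entries = {}   # name -> (ip, expiry time)

    def put(self, name, ip, ttl_seconds):
        self._entries[name] = (ip, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: a fresh DNS query is needed
            return None
        return ip


cache = TtlCache()
cache.put("baidu.com", "220.181.57.216", ttl_seconds=5)  # values from the record above
print(cache.get("baidu.com"))
```

A client that ignores the TTL would simply keep returning the stored address after `expires_at`, which is exactly the non-compliant behaviour described above.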

The hosts file is not a cache, so it is always available

Does DNS use UDP or TCP

DNS uses port 53 for both TCP and UDP. A single application protocol using both transport protocols is unusual in the TCP/IP stack, and it is not widely known under which circumstances DNS uses each of them.

DNS uses TCP for zone transfers (synchronizing resolution records) and when a DNS response is larger than the maximum UDP message length; UDP is used in all other cases.

Why use UDP

Speed is the biggest advantage of UDP

When a client queries a domain name from the DNS server, the returned value is usually no more than 512 bytes and can be transmitted through UDP. Without the need for a three-way handshake, the DNS server is less loaded and more responsive.
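To show how small a typical query is, here is a sketch that builds a DNS query message by hand with Python's `struct` module, following the standard DNS message layout (12-byte header, then the question). The function name `build_query` and the query ID `0x1234` are arbitrary choices for this example.

```python
import struct


def build_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a DNS query message (qtype 1 = A record, class IN)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS (1 = IN)
    return header + question


packet = build_query("www.baidu.com")
print(len(packet), "bytes")
# 31 bytes
```

The whole query fits in 31 bytes, far below the 512-byte limit, so it can be sent in a single UDP datagram (for example with `socket.sendto` to port 53 of a DNS server) without any connection setup.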

In theory, clients can also specify TCP for DNS queries, but in fact, many DNS servers are configured to support only UDP query packets.

When to use TCP

TCP is used when the transmitted data is larger than the maximum UDP packet length:

Let’s start with TCP and UDP byte transmission limits:

A traditional DNS message carried over UDP is limited to 512 bytes, while a DNS message carried over TCP can be much larger (up to 65,535 bytes). If a DNS response exceeds 512 bytes, the part that does not fit is dropped and the response is marked as truncated, so TCP is used to send the full data instead. In practice, traditional DNS responses over UDP are usually smaller than 512 bytes. Even if the DNS server has a large number of matching records, it limits the number of records returned to 13 to keep the packet within 512 bytes.
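A server signals an oversized UDP answer by setting the truncated (TC) flag in the DNS header, and a client that sees it repeats the query over TCP, where every message is preceded by a two-byte length field. The sketch below checks the flag and adds that length prefix; the helper names and the sample header are invented for the example.

```python
import struct

MAX_UDP_DNS_SIZE = 512  # classic limit for a DNS message carried over UDP
TC_FLAG = 0x0200        # "truncated" bit in the 16-bit flags field of the header


def is_truncated(response: bytes) -> bool:
    """Return True if the server set the TC bit, i.e. the client should retry over TCP."""
    _query_id, flags = struct.unpack("!HH", response[:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """Over TCP, each DNS message is preceded by a two-byte length field."""
    return struct.pack("!H", len(message)) + message


# A made-up 12-byte response header with QR=1 and TC=1 (flags 0x8200).
truncated_header = struct.pack("!HHHHHH", 0x1234, 0x8200, 1, 0, 0, 0)
print(is_truncated(truncated_header))
# True
```

The two-byte length prefix is also why a DNS message over TCP can be up to 65,535 bytes long.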

TCP is used for DNS zone transfers:

The DNS specification defines two types of DNS servers, the primary DNS server and the secondary DNS server. Within a zone, the primary DNS server reads the zone's DNS data from its own data files, and the secondary DNS server reads the zone's DNS data from the primary DNS server. When a secondary DNS server starts, it needs to communicate with the primary DNS server and load this data. This is called a zone transfer.

The secondary DNS server queries the primary DNS server periodically (usually every 3 hours) to see whether the data has changed. If it has, a zone transfer is performed to synchronize the data. Zone transfers use TCP rather than UDP because data synchronization moves much more data than a single request and response, and it requires the data to arrive reliably and completely.

Application of DNS in CDN

A CDN is a common solution for optimizing front-end loading speed. What is a CDN, and how does it speed up loading?

CDN stands for Content Delivery Network. Its purpose is to add a new layer of network architecture on top of the existing Internet and publish the content of a website to the network edge closest to users. Users can then fetch the content they need from a nearby node, which improves the response time when they visit the website.

The principle of CDN is simple

When users access CDN resources, DNS resolution is also required

During DNS resolution, the intelligent DNS scheduling system in the CDN selects the most suitable resource server IP address for the host, taking load balancing and network conditions into account.
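Real CDN scheduling systems are proprietary and far more elaborate, but the basic idea can be sketched as "prefer a node near the client, then pick the least loaded one". Everything in the example below (node list, regions, load values, IP addresses and the function `pick_edge_node`) is hypothetical.

```python
# Hypothetical edge nodes; the regions, loads and IPs are placeholders.
EDGE_NODES = [
    {"ip": "192.0.2.1", "region": "beijing", "load": 0.80},
    {"ip": "192.0.2.2", "region": "beijing", "load": 0.35},
    {"ip": "192.0.2.3", "region": "shanghai", "load": 0.10},
]


def pick_edge_node(nodes, client_region):
    """Prefer nodes in the client's region, then pick the least loaded one."""
    nearby = [node for node in nodes if node["region"] == client_region]
    candidates = nearby or nodes  # no nearby node: fall back to all nodes
    return min(candidates, key=lambda node: node["load"])["ip"]


print(pick_edge_node(EDGE_NODES, "beijing"))
# 192.0.2.2
```

The DNS answer returned to the host would then contain the chosen node's IP address instead of the origin server's address.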

CDN cache

When the browser's local cache is no longer valid, the browser sends a request to a CDN edge node. Like the browser, the CDN edge node also has a caching mechanism.

Disadvantages of CDN caching

The CDN cache reduces both the access latency for users and the load on the origin server. However, its disadvantage is also obvious: when the website is updated and the data on the CDN node is not refreshed in time, users still get stale content and may see errors. This happens even if the user invalidates the browser cache with Ctrl+F5, because the CDN edge node has not yet synchronized the latest data.

CDN cache mechanism

Cache policies on CDN edge nodes vary between service providers, but they generally follow the HTTP standard and set the cache time of the edge node through the Cache-Control: max-age field in the HTTP response header.

When the client requests data from the CDN node, the CDN node will judge whether the cached data has expired. If the cached data has not expired, the CDN node will directly return the cached data to the client. Otherwise, the CDN node will issue a back-source request to the source station, pull the latest data from the source station, update the local cache, and return the latest data to the client. Therefore, if we modify the content, it is better to add a version number to allow CDN to obtain resources again, so as to reduce unnecessary troubles, such as:

app.js?v=20171114 or style.css?v=20171114
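Both ideas, the max-age freshness check and the version parameter, are easy to sketch. The helpers below are illustrative only and assume a simple Cache-Control value; real CDN nodes apply additional rules that are not modelled here.

```python
import re


def parse_max_age(cache_control: str):
    """Extract max-age (in seconds) from a Cache-Control header value, or None."""
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(age_seconds: int, cache_control: str) -> bool:
    """A cached copy is fresh while its age is below max-age."""
    max_age = parse_max_age(cache_control)
    return max_age is not None and age_seconds < max_age


def with_version(url: str, version: str) -> str:
    """Append a version parameter so the CDN treats the URL as a new resource."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}v={version}"


print(is_fresh(120, "public, max-age=3600"))  # True
print(with_version("app.js", "20171114"))     # app.js?v=20171114
```

Changing the version string changes the URL, so the edge node has no cached copy for it and must fetch the resource from the origin server.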

Refreshing the CDN cache

CDN edge nodes are transparent to developers. Whereas a forced refresh with Ctrl+F5 invalidates the browser's local cache, the cache on CDN edge nodes can be cleared through the cache refresh interface offered by the CDN service provider. After updating data, the developer can use this refresh function to force the cached data on the CDN nodes to expire, so that clients pull the latest data on their next access.

DNS interview questions from big tech companies

What is the complete process of DNS query?

  1. The browser checks its cache for a resolved IP address of the domain name. If there is one, the resolution process ends. The browser's domain name cache is limited in both duration and size; the duration can be set through the TTL attribute.

  2. If the browser cache has no mapping for the domain name, the operating system checks its local DNS resolver cache and the hosts file. If a mapping exists there, the OS uses it to complete the resolution.

  3. If neither has a mapping, the system looks up the preferred DNS server set in the TCP/IP parameters, which we call the local DNS server, and sends it a recursive query. If the local DNS server has a record or a cached mapping for the domain name, it returns the result.

  4. If not, the local DNS server queries iteratively, starting with one of the 13 root DNS servers. The root DNS server returns the NS record and IP address of the top-level DNS server for the domain name. The local DNS server then sends a query to the top-level DNS server, which returns the NS record of the second-level DNS server. The process is repeated until an A record is returned, and finally the IP address in the A record is returned to the host.

What are recursive and iterative queries? What are their strengths and weaknesses? When is each one used?

Is DNS based on UDP or TCP? When is UDP used and when is TCP used?

Do you know about CDNs? How do they work?

Writing this was not easy. Did you give it a like? (Key question) 😃 You are welcome to add DNS-related interview questions you have encountered.

Reference article:

Super detailed DNS resolution

Understand the principle of CDN acceleration