Background
Some time ago, while working on an HTTPDNS SDK for iOS, I found during integration and testing that many people were unclear about the overall HTTP request process, including how HTTP hijacking and HTTPDNS work. I wrote this article to walk through the web request process in depth: how a request is initiated, how the HTTP protocol is handled, and how DNS resolves domain names.
How an HTTP request is initiated
When we request the domain name @"www.baidu.com" on a mobile phone, the following happens:
1. The carrier's DNS server resolves the domain name into an IP address.
2. The client locates the corresponding server on the Internet by that IP address and sends it a GET or POST request.
3. The server locates the requested resource and returns it to the user.
This is only an overview. In practice, each step involves complex structure and logic. For example, there may be many servers behind the domain name, and the requested data may live in a distributed cache, in a static file, or in a database. When the data is returned to the browser, the browser parses it, and whenever it finds static resources (such as CSS, JS, or images) it issues further HTTP requests. These resources are likely hosted on a CDN, in which case a CDN server handles those requests.
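The three steps listed in this section can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: plain HTTP on port 80, no redirects, no HTTPS. The function names `build_get_request` and `fetch` are made up for this example.

```python
import socket


def build_get_request(host, path="/"):
    """Step 2 (first half): the bytes of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch(host, path="/", port=80, timeout=5.0):
    """Resolve the host, connect to the resulting IP, and read the raw response."""
    # Step 1: the system resolver (which normally asks the carrier's DNS
    # server) turns the domain name into an IP address.
    info = socket.getaddrinfo(host, port, family=socket.AF_INET,
                              type=socket.SOCK_STREAM)
    ip = info[0][4][0]

    # Step 2: connect to that IP address and send the request.
    with socket.create_connection((ip, port), timeout=timeout) as conn:
        conn.sendall(build_get_request(host, path))
        # Step 3: the server returns the resource; read until it closes.
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return ip, b"".join(chunks)
```

Calling `fetch("www.baidu.com")` needs network access and returns the resolved IP together with the raw response bytes. Note that the domain name still travels inside the request as the `Host` header, even though the connection itself is made to an IP address.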
HTTP hijacking
The main reason we use HTTPDNS is to deal with HTTP hijacking. HTTP hijacking comes in two kinds: DNS hijacking and content hijacking. The latter builds on the former and is a more advanced technique, for which we currently have no solution. Each is explained separately below:
1: DNS hijacking
- Hijacking process
DNS hijacking, also called "domain name hijacking", means intercepting domain name resolution requests inside the hijacked network and inspecting the requested domain name. Requests outside the scope of review are let through; for the rest, the hijacker either returns a fake IP address or does nothing, so that the request never gets a response. The effect is that a particular site becomes unreachable, or that the user ends up at a fake site. In essence, the DNS server has been tampered with, or a fake DNS server is being used, as illustrated in the following figure.
You can see in red the path along which your request is hijacked and forwarded to a fake server.
- Solution:
DNS hijacking works by attacking the carrier's resolution server. To avoid it, we can use our own DNS server instead of the carrier's, or deliver the resolved IP addresses of our domain names to the App in advance, so that the carrier's DNS resolution is bypassed altogether.
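The second approach, shipping resolved IP addresses to the App and connecting by IP, can be sketched as follows. This is a minimal illustration, not the SDK's actual code: the table `PRERESOLVED`, the function `rewrite_for_ip_direct`, and the address `203.0.113.10` (a documentation-only address, not a real Baidu server) are all assumptions made for the example, and it covers plain HTTP only.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of pre-resolved addresses delivered to the app in advance.
# 203.0.113.10 is a placeholder from the documentation address range.
PRERESOLVED = {"www.baidu.com": "203.0.113.10"}


def rewrite_for_ip_direct(url, table=PRERESOLVED):
    """Return (new_url, extra_headers) so the request connects by IP.

    The original domain name is kept in the Host header, because the server
    still needs it to pick the right site. Unknown hosts are left untouched
    and fall back to normal DNS resolution.
    """
    parts = urlsplit(url)
    host = parts.hostname
    ip = table.get(host)
    if ip is None:
        return url, {}

    if parts.port is None:
        netloc, host_header = ip, host
    else:
        netloc = f"{ip}:{parts.port}"
        host_header = f"{host}:{parts.port}"

    new_url = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
    return new_url, {"Host": host_header}
```

For example, `rewrite_for_ip_direct("http://www.baidu.com/index.html?a=1")` yields the URL `http://203.0.113.10/index.html?a=1` plus a `Host: www.baidu.com` header. Since no DNS query leaves the device, there is nothing for a hijacked resolver to answer.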
2: Content hijacking
- Hijacking process
Content hijacking is rarely discussed online; we ran into it while working on the HTTPDNS SDK. It actually started with good intentions: to speed up user access and reduce their own traffic costs, carriers built a caching mechanism. When a user requests data from a server, the carrier redirects the request to its cache pool. If the data is cached, it is returned directly. If not, the request is forwarded to the server, and the response is intercepted, stored in the cache, and then returned to the user. This greatly reduces the number of requests the carrier sends to the server and also speeds up access for the user. However, some unscrupulous vendors tamper with the cache pool and modify the returned content directly, so that we receive wrong data.
The yellow line is the dangerous part, because the returned data may have been tampered with.
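The caching mechanism described above, and the point where it can go wrong, can be modeled with a small sketch. The class `CarrierCache` and the `fetch_from_origin` callback are hypothetical names for illustration; a real carrier cache is of course far more involved.

```python
class CarrierCache:
    """Toy model of a carrier-side cache pool sitting between user and server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> response body
        self.pool = {}                   # url -> cached response body
        self.origin_requests = 0         # how often the real server was asked

    def get(self, url):
        # Cache hit: answer directly without contacting the server.
        if url in self.pool:
            return self.pool[url]
        # Cache miss: forward to the server, store the response, return it.
        self.origin_requests += 1
        body = self._fetch(url)
        self.pool[url] = body
        return body


# Usage: the second request is served from the pool.
cache = CarrierCache(lambda url: "<html>real page</html>")
first = cache.get("http://www.baidu.com/")
second = cache.get("http://www.baidu.com/")

# Content hijacking: someone edits the pool, and every later user
# receives the modified body although the server never sent it.
cache.pool["http://www.baidu.com/"] = "<html>real page + injected ad</html>"
third = cache.get("http://www.baidu.com/")
```

The user cannot tell `second` from `third` by looking at the request alone: both come back for the same URL, and the server is not involved in either.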
- Solutions:
We have not found a solution so far, but this kind of hijacking is not very common.
DNS Resolution Process
On iOS devices, the process starts at step 3.
1. The browser checks whether its cache holds a resolved IP address for the domain name. If it does, resolution is complete. The browser cache is limited by the domain name's expiration time and by the size of the cache.
2. If the browser cache has no entry, the browser checks the operating system's cache, which includes the local hosts file.
3. If the local hosts file does not contain the domain name either, the operating system sends the domain name to LocalDNS (LDNS), that is, the local domain name server. This is usually the DNS service offered by whoever provides your Internet access. Such a dedicated DNS server performs well and generally caches resolution results. The cache time is governed by the domain name's expiration time; cache space is usually not the main factor in expiry. About 90% of domain name resolutions are completed at this point, so LDNS carries most of the resolution workload.
4. If LDNS has no result either, it sends the resolution request directly to a root server.
5. The root server returns to the Local DNS Server the address of the gTLD server responsible for the queried domain. gTLD servers are the name servers for generic top-level domains such as .com and .org (.cn is a country-code top-level domain and has its own servers at the same level). It is the root servers that are few in number: there are only 13 root server addresses in the world.
6. The Local DNS Server sends a request to the gTLD server returned in the previous step.
7. The gTLD server that receives the request looks up and returns the address of the Name Server for the domain name. This Name Server is usually the one the domain name was registered with, for example the server of the domain name service provider you applied to.
8. The Name Server looks up its mapping table from domain names to IP addresses. Normally it finds the destination IP address record for the domain name and returns it, together with a TTL value, to the Local DNS Server.
9. The Local DNS Server receives the IP address and TTL for the domain name and caches the mapping between the domain name and the IP address.
10. The resolution result is returned to the user and cached locally according to the TTL value. The domain name resolution process is now complete.
The above process can be simplified as the following figure.
The green part shows the steps that apply to non-iOS devices.
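The ten steps can also be walked through in a small simulation. Everything here is a stand-in: the tables `ROOT`, `TLD_SERVERS`, and `NAME_SERVERS`, the class `LocalDNS`, the function `resolve_on_device`, the address `203.0.113.10`, and the TTL of 300 seconds are invented for illustration. Real servers are reached over the network, not through dictionaries.

```python
import time

# Hypothetical stand-ins for the server hierarchy.
ROOT = {"com": "tld-com"}                                   # TLD -> gTLD server
TLD_SERVERS = {"tld-com": {"baidu.com": "ns-baidu"}}        # domain -> Name Server
NAME_SERVERS = {"ns-baidu": {"www.baidu.com": ("203.0.113.10", 300)}}  # host -> (ip, ttl)


class LocalDNS:
    """Toy LDNS: answers from its cache, otherwise walks root -> gTLD -> Name Server."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._cache = {}            # host -> (ip, expires_at)
        self.upstream_lookups = 0   # how often the hierarchy was consulted

    def resolve(self, host):
        now = self._clock()
        entry = self._cache.get(host)
        if entry is not None and entry[1] > now:
            return entry[0]                                   # step 3: cache hit

        self.upstream_lookups += 1
        labels = host.split(".")
        tld_server = ROOT[labels[-1]]                         # steps 4-5
        name_server = TLD_SERVERS[tld_server][".".join(labels[-2:])]  # steps 6-7
        ip, ttl = NAME_SERVERS[name_server][host]             # step 8
        self._cache[host] = (ip, now + ttl)                   # step 9
        return ip                                             # step 10


def resolve_on_device(host, browser_cache, hosts_file, ldns):
    """Steps 1-3 as seen from the device: browser cache, hosts file, then LDNS."""
    return browser_cache.get(host) or hosts_file.get(host) or ldns.resolve(host)
```

With a fake clock you can watch the TTL at work: two lookups within 300 seconds hit the hierarchy once, and a lookup after the TTL has passed walks the hierarchy again.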
CDN working mechanism
Introduction to CDN
CDN stands for Content Delivery Network. The basic idea is to publish a website's content to the "edge" of the network closest to users, so that users can fetch the content they need from nearby and the site responds faster. CDN = mirror + cache + global server load balancing (GSLB). A CDN can therefore significantly improve the efficiency of information flow on the Internet. At present, CDNs mainly cache a website's static data, such as CSS, JS, images, and static web pages. After requesting dynamic content from the origin server, users download this static data from the CDN, which speeds up the download of page content. For example, more than 90% of Taobao's data is served by CDN.
Consider an analogy. Household A has 100M of bandwidth but uses only 10M. Household B has 10M of bandwidth but needs 15M. What can be done? C is a CDN provider with a node in A's home (A acts as a kind of sponsor), and B buys CDN acceleration from C. When B's bandwidth falls short, the CDN picks a node with spare capacity to help B. B can now reach 15M or more, and everyone is happy: A wastes nothing, B gets the speed, and C makes money. Once C has many nodes all over the country, anyone who uses C's CDN acceleration gets very fast access.
CDN workflow
When a user accesses a static file (such as a CSS file) whose domain name is www.baidu.com, the domain name ultimately resolves to the CDN's global load balancing server. The load balancing server determines where the user is located and returns the nearest CDN node, and the user then fetches the static file directly from that node. If the node does not have the requested file, it goes back to the origin server to fetch the file and then returns it to the user.
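This workflow can be sketched as follows. It is a rough model under stated assumptions: "nearest" is reduced to matching a region name, and `ORIGIN`, `EdgeNode`, `NODES`, and `pick_node` are hypothetical names invented for the example.

```python
# Hypothetical origin server content.
ORIGIN = {"/static/site.css": "body { margin: 0 }"}


class EdgeNode:
    """Toy CDN node: serves from its cache, going back to the origin on a miss."""

    def __init__(self, region):
        self.region = region
        self.cache = {}
        self.origin_fetches = 0

    def get(self, path):
        if path not in self.cache:
            # Miss: fetch from the origin server once and keep a copy.
            self.origin_fetches += 1
            self.cache[path] = ORIGIN[path]
        return self.cache[path]


NODES = {"north": EdgeNode("north"), "south": EdgeNode("south")}


def pick_node(user_region, nodes=NODES, default="north"):
    """Toy global load balancer: choose the node in the user's region."""
    return nodes.get(user_region, nodes[default])


# Usage: a user in the south is sent to the southern node; only the
# first request for the file reaches the origin server.
node = pick_node("south")
css_first = node.get("/static/site.css")
css_again = node.get("/static/site.css")
```

A real GSLB decides based on things like the resolver's IP address, node health, and load rather than a region string, but the split of responsibilities is the same: the balancer chooses the node, and the node decides whether it has to go back to the origin.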
Load balancing
Load balancing distributes work evenly across multiple processing units, such as image servers and application servers, which complete the work together. It improves server response speed and utilization, avoids single points of failure in software or hardware modules, relieves network congestion, makes the service independent of geographic location, and gives users more consistent access quality. The overall CDN workflow can be summarized roughly as in the following figure:
Conclusion
That is my understanding of HTTP requests. If anything here is wrong, please let me know.