This series summarizes the HTTP course I bought on Geek Time. If you want the full details, I recommend buying the course, which is very easy to understand.
01 – The past and present of HTTP
- HTTP/0.9 (1989): Tim Berners-Lee published a paper that laid the groundwork for HTTP; it was a simple text protocol that could only fetch text resources
- HTTP/1.0 (1996): established most of the techniques in use today, but it was not an official standard
- HTTP/1.1 (1999): by far the most widely used protocol on the Internet, with a very complete feature set
- HTTP/2 (2015): based on Google’s SPDY protocol and focused on performance improvements, but not yet widespread
- HTTP/3 (entered standardization in 2018): based on Google’s QUIC protocol; the future direction of development
02 – What is HTTP? What is HTTP not?
What is HTTP (Hypertext Transfer Protocol)?
- First, HTTP is a protocol used in the computer world. It uses a language that computers can understand to establish a specification for communication between computers, together with the related control and error handling.
- HTTP is a transfer protocol, and a two-way one: it is a convention and specification dedicated to transferring data between two points.
- HTTP transfers “hypertext”, which is text that goes beyond ordinary text: a mixture of text, pictures, audio, and video and, crucially, “hyperlinks”.
In short, HTTP is a convention and specification for transferring hypertext data, such as text, pictures, audio, and video, between two points in the computer world.
What is HTTP not?
- HTTP is a protocol, so it is not a “standalone entity”. It is not an application such as a browser or a mobile app, not an operating system such as Windows or Linux, and not a web server such as Apache, Nginx, or Tomcat.
- HTTP is not the Internet.
- HTTP is not a programming language.
- HTTP is not HTML.
- HTTP is not an isolated protocol.
03 – Various concepts related to HTTP
04 – Various protocols related to HTTP
05 – The four layers of TCP/IP & the seven layers of OSI
TCP/IP has four layers, from bottom to top:
1. The first layer is the “link layer”, which is responsible for sending raw packets over underlying networks such as Ethernet and WiFi. It works at the network interface card (NIC) level and uses MAC addresses to identify devices on the network, so it is sometimes called the MAC layer.
2. The second layer is the “internet layer”, where the IP protocol lives. Because IP defines the concept of an “IP address”, it can use IP addresses in place of MAC addresses on top of the link layer and connect many LANs and WANs into one huge virtual network. To find a device in this network, the IP address only needs to be “translated” back into a MAC address.
3. The third layer is the “transport layer”, which is responsible for the “reliable” transfer of data between two points identified by IP addresses. This is where TCP works, along with its sibling protocol, UDP. TCP is stateful: data can be sent only after a connection has been established with the peer, and TCP ensures that data is neither lost nor duplicated. UDP is simpler: it is stateless and can send data at any time without establishing a connection first, but it does not guarantee that the data reaches the other side. Another important difference is the form of the data: TCP delivers a continuous, ordered byte stream, while UDP delivers separate packets that are sent in order but may arrive out of order.
4. The fourth layer is the “application layer”. Because the three layers below lay such a solid foundation, this layer is full of application-oriented protocols, such as Telnet, SSH, FTP, SMTP, and of course HTTP.
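The difference in data form between the two transport protocols can be observed without any real network traffic. The following is a minimal sketch, assuming a Unix-like system: it uses local socket pairs as stand-ins for TCP and UDP, because a `SOCK_STREAM` socket has the same byte-stream semantics as TCP and a `SOCK_DGRAM` socket preserves message boundaries like UDP. The function names and the sample messages are illustrative only.

```python
import socket

# Three pieces of data handed to the transport layer one after another.
MESSAGES = [b"GET ", b"/index.html ", b"HTTP/1.1"]


def send_over_stream(messages):
    """Send messages over a byte-stream socket pair (TCP-like semantics)."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        for message in messages:
            a.sendall(message)
        a.shutdown(socket.SHUT_WR)  # tell the reader that the stream has ended
        chunks = []
        while True:
            data = b.recv(4096)
            if not data:  # empty read means end of stream
                break
            chunks.append(data)
    # The receiver only sees one ordered run of bytes; the boundaries
    # between the original send calls are not preserved.
    return b"".join(chunks)


def send_over_datagrams(messages):
    """Send messages over a datagram socket pair (UDP-like semantics)."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with a, b:
        for message in messages:
            a.send(message)
        # Each recv call returns exactly one datagram. A local socket pair
        # never drops or reorders them; real UDP over a network might.
        return [b.recv(4096) for _ in messages]


if __name__ == "__main__":
    print(send_over_stream(MESSAGES))
    print(send_over_datagrams(MESSAGES))
```

With the stream socket the receiver has to decide for itself where one message ends and the next begins, which is why protocols built on TCP, HTTP included, define their own message framing.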
HTTP relies on the TCP/IP protocol stack to wrap and unwrap its data layer by layer, but the details of the lower layers are invisible to HTTP itself.
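The layer-by-layer wrapping and unwrapping can be pictured with a toy model. This is only a sketch: the bracketed text headers, the addresses, and the helper names `wrap`, `unwrap`, and `receive` are made up for illustration and bear no resemblance to the real binary header formats of TCP, IP, or Ethernet.

```python
def wrap(header, payload):
    """Prepend a toy textual header, as a lower layer does when sending."""
    return f"[{header}]".encode("ascii") + payload


def unwrap(data):
    """Strip the outermost toy header and return (header, payload)."""
    end = data.index(b"]")
    return data[1:end].decode("ascii"), data[end + 1:]


# The application layer produces the HTTP message.
http_message = b"GET / HTTP/1.1\r\nHost: time.geekbang.org\r\n\r\n"

# Sending side: each layer wraps what the layer above handed down.
segment = wrap("TCP dst_port=80", http_message)      # transport layer
packet = wrap("IP dst=192.0.2.10", segment)          # internet layer
frame = wrap("MAC dst=02:00:00:00:00:01", packet)    # link layer


def receive(data, layers=3):
    """Receiving side: peel the headers off from the outside in."""
    headers = []
    for _ in range(layers):
        header, data = unwrap(data)
        headers.append(header)
    return headers, data


if __name__ == "__main__":
    headers, payload = receive(frame)
    print(headers)
    print(payload)
```

The HTTP message comes out of `receive` unchanged, which is the sense in which the lower layers are invisible to HTTP.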
OSI stands for the Open Systems Interconnection Reference Model.
TCP/IP was invented in the 1970s, when many other network protocols also existed and the networking world was a mess. The International Organization for Standardization (ISO) noticed this, felt that there were too many “unorthodox” approaches, and wanted to “unify” them, so it designed a new layered network model to bring the existing protocols together. The OSI model has seven layers, some of which are similar to those of TCP/IP. From bottom to top:
1. Layer 1, the physical layer: the physical form of the network, such as cables, optical fiber, network cards, and hubs;
2. Layer 2, the data link layer: basically equivalent to the TCP/IP link layer;
3. Layer 3, the network layer: equivalent to the TCP/IP internet layer;
4. Layer 4, the transport layer: equivalent to the TCP/IP transport layer;
5. Layer 5, the session layer: maintains the connection state in the network, that is, keeps sessions and synchronization;
6. Layer 6, the presentation layer: converts data into a suitable, understandable syntax and semantics;
7. Layer 7, the application layer: transfers data for specific applications.
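The correspondence between the two models can be written down as a small lookup table. This is a sketch with one assumption that the list above does not state: it folds OSI layers 5 to 7 into the TCP/IP application layer, which is the usual convention, and it gives the physical layer no counterpart in the four-layer model. The names `OSI_TO_TCPIP` and `tcpip_layer` are illustrative.

```python
# OSI layer number -> (OSI layer name, corresponding TCP/IP layer or None)
OSI_TO_TCPIP = {
    1: ("physical", None),                # no counterpart in the four-layer model
    2: ("data link", "link"),
    3: ("network", "internet"),
    4: ("transport", "transport"),
    5: ("session", "application"),        # assumption: usual convention
    6: ("presentation", "application"),   # assumption: usual convention
    7: ("application", "application"),
}


def tcpip_layer(osi_layer):
    """Return the TCP/IP layer name for an OSI layer number, or None."""
    if osi_layer not in OSI_TO_TCPIP:
        raise ValueError(f"OSI has layers 1 to 7, got {osi_layer}")
    return OSI_TO_TCPIP[osi_layer][1]


if __name__ == "__main__":
    for number, (name, tcpip) in OSI_TO_TCPIP.items():
        print(number, name, "->", tcpip)
```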
A good (but not absolute) rule of thumb for telling layer 4 from layer 7 is the “two whatevers”: whatever is handled by the operating system is layer 4 or below, and whatever is handled by the application (that is, by code you write yourself) is layer 7.
06 – What are the keys to a domain name?
1. Domain name form
A domain name has a hierarchical structure: it is a string of words separated by “.”. The right-most word is the top-level domain, followed by the second-level domain, with the level descending toward the left. The host name on the far left usually indicates the purpose of the host.
For example, the Geek Time domain name is “time.geekbang.org”, where “org” is the top-level domain, “geekbang” is the second-level domain, and “time” is the host name. DNS translates this domain name into the corresponding IP address, and you can then access the Geek Time website.
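The right-to-left reading of a domain name is easy to reproduce in code. A minimal sketch, assuming a simple three-part reading of the name as in the example above; the function name `split_domain` and the dictionary keys are illustrative.

```python
def split_domain(name):
    """Split a domain name into its levels, reading from right to left."""
    labels = name.rstrip(".").lower().split(".")
    return {
        "top_level": labels[-1],
        "second_level": labels[-2] if len(labels) > 1 else None,
        # The left-most label is treated as the host name only when the
        # name has more than two labels.
        "host": labels[0] if len(labels) > 2 else None,
        "right_to_left": labels[::-1],
    }


if __name__ == "__main__":
    print(split_domain("time.geekbang.org"))
```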
2. Resolve domain names
Just as an IP address must be converted to a MAC address before a host can be reached, a domain name must be converted to an IP address. This process is known as domain name resolution.
The core of DNS is a three-layer, tree-structured, distributed service that basically corresponds to the structure of domain names:
1. Root DNS server: manages the top-level DNS servers and returns the IP addresses of the top-level DNS servers for domains such as “com”, “net”, and “cn”.
2. Top-level DNS server: manages the authoritative DNS servers of the domain names under it. For example, the “com” top-level DNS server can return the IP address of the apple.com DNS server.
3. Authoritative DNS server: manages the IP addresses of the hosts under its own domain name. For example, the apple.com authoritative DNS server can return the IP address of www.apple.com.
With this system, any domain name can be looked up from top to bottom in the tree, which corresponds to reading the domain name from right to left, until the IP address for the domain name is found. For example, accessing “www.apple.com” takes three queries: ask the root DNS server, which tells you the address of the “com” top-level DNS server; ask the “com” top-level DNS server, which tells you the address of the “apple.com” DNS server; finally, ask the “apple.com” DNS server, which gives you the address of “www.apple.com”.
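The three queries can be simulated with a toy tree held in dictionaries. This is only a sketch of the lookup order, not a real DNS client: the tables `ROOT`, `TLD_SERVERS`, and `AUTH_SERVERS`, the function `resolve`, and all addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
# Root server: top-level domain -> address of that top-level DNS server.
ROOT = {"com": "192.0.2.1"}

# Top-level servers: server address -> {domain -> authoritative server address}.
TLD_SERVERS = {"192.0.2.1": {"apple.com": "192.0.2.2"}}

# Authoritative servers: server address -> {host name -> host address}.
AUTH_SERVERS = {"192.0.2.2": {"www.apple.com": "192.0.2.3"}}


def resolve(name):
    """Walk the tree from the root down; return (ip_or_None, queries_made)."""
    name = name.rstrip(".").lower()
    labels = name.split(".")
    if len(labels) < 2:
        return None, []
    tld = labels[-1]
    domain = ".".join(labels[-2:])
    queries = []

    # Query 1: the root server knows where the top-level server is.
    queries.append(f"root: where is the '{tld}' server?")
    tld_server = ROOT.get(tld)
    if tld_server is None:
        return None, queries

    # Query 2: the top-level server knows where the domain's server is.
    queries.append(f"{tld}: where is the '{domain}' server?")
    auth_server = TLD_SERVERS[tld_server].get(domain)
    if auth_server is None:
        return None, queries

    # Query 3: the authoritative server knows the host's address.
    queries.append(f"{domain}: what is the address of '{name}'?")
    return AUTH_SERVERS[auth_server].get(name), queries


if __name__ == "__main__":
    print(resolve("www.apple.com"))
    print(resolve("www.nonexistent.com"))
```

In this toy tree a name whose domain is unknown to the “com” server stops after the second query, which is how a lookup for a nonexistent domain ends with no address.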
In addition to the core DNS system, there are two ways to reduce the load of domain name resolution and get results faster. The basic idea behind both is “caching”.
First, many large companies and network operators run their own DNS servers, which act as proxies for users’ DNS queries and access the core DNS system on their behalf. These “unofficial” servers are called “non-authoritative DNS servers”. They can cache earlier query results: if a record is already cached, they return the IP address directly without querying the root DNS server.
Second, the operating system also caches DNS resolution results. If you have visited “www.apple.com” before, the next time you type the URL into the browser the operating system can supply the IP address directly, without asking a DNS server.
In addition, the operating system has a special “host mapping” file, usually an editable text file: “/etc/hosts” on Linux and “C:\Windows\System32\drivers\etc\hosts” on Windows. If the operating system cannot find a record in its cache, it looks in this file.
With these “unofficial” DNS servers, the operating system cache, and the hosts file, much of the domain name resolution work no longer needs a long trip across the network: it can be handled on the local machine or on a nearby server. This is convenient for users, reduces the load on DNS servers at every level, and greatly improves efficiency.
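The lookup order described above (operating system cache, then the hosts file, then a DNS server) can be sketched as follows. The functions `parse_hosts` and `lookup`, the sample hosts text, and the `query_dns` callback are hypothetical; a real operating system resolver is configurable and more involved than this.

```python
def parse_hosts(text):
    """Parse hosts-file text ("IP name [name ...]" per line) into {name: ip}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # the first entry wins
    return table


def lookup(name, os_cache, hosts, query_dns):
    """Return (ip_or_None, source), trying the cheapest source first."""
    name = name.lower()
    if name in os_cache:
        return os_cache[name], "os cache"
    if name in hosts:
        return hosts[name], "hosts file"
    ip = query_dns(name)  # e.g. a non-authoritative DNS server
    if ip is not None:
        os_cache[name] = ip  # remember the answer for next time
    return ip, "dns server"


if __name__ == "__main__":
    hosts = parse_hosts("127.0.0.1 localhost  # loopback\n")
    cache = {}
    fake_dns = {"www.apple.com": "192.0.2.3"}.get  # stand-in for a DNS server
    print(lookup("localhost", cache, hosts, fake_dns))
    print(lookup("www.apple.com", cache, hosts, fake_dns))
    print(lookup("www.apple.com", cache, hosts, fake_dns))
```

The second lookup of the same name is answered from the cache, so the DNS server is contacted only once.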
Q&A
Q: How do you understand HTTP?
Q: Do you think the CDN treats browsers differently than crawlers? Why is that?
A: There is no difference, because a crawler is also just accessing resources, and a CDN cannot identify crawlers with 100% accuracy.
Q: How do you understand the very similar terms WebService and Web Server?
A: A WebService is a web service entity, while a Web Server is a web server. The latter exists to host the former.
Q: What does DNS have to do with URIs?
A: DNS is a system that resolves domain names to real IP addresses. A URI (Uniform Resource Identifier) identifies the location of a resource that the client needs to access. If the host name in the URI is a domain name, DNS is used to resolve it to an IP address.
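The relationship can be shown with Python’s standard library: extract the host part of a URI and check whether it is a domain name that still has to go through DNS. A minimal sketch; the function name `host_needing_dns` and the sample URIs are illustrative.

```python
import ipaddress
from urllib.parse import urlsplit


def host_needing_dns(uri):
    """Return the domain name in the URI that DNS must resolve.

    Returns None when the URI has no host or the host is already an
    IP address, in which case no DNS lookup is needed.
    """
    host = urlsplit(uri).hostname
    if host is None:
        return None
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return host  # not an IP literal, so it is a name for DNS to resolve
    return None


if __name__ == "__main__":
    print(host_needing_dns("http://www.apple.com:80/index.html"))
    print(host_needing_dns("http://192.0.2.3/index.html"))
```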
Q: Can you explain “Layer 2 forwarding” and “Layer 3 routing” in your own words?
A: Layer-2 forwarding means that the device works at the link layer. When a frame passes through a switch, the switch checks the frame header, obtains the destination MAC address, and forwards or broadcasts the frame locally. Layer-3 routing means that the device works at the network layer. When a packet passes through a device with routing capability, the device analyzes the packet header, obtains the IP address, and, depending on the network segment, either forwards the packet locally or selects the next gateway.
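The two decisions can be contrasted in a toy model: a switch looks up a MAC address in a table, while a router matches an IP address against network segments. This is a simplified sketch; the function names, the table layouts, and all addresses are hypothetical, and real devices also learn addresses, age out entries, and handle many more cases.

```python
import ipaddress


def forward_frame(mac_table, dst_mac, in_port, all_ports):
    """Layer 2: return the ports a switch sends the frame out of."""
    port = mac_table.get(dst_mac)
    if port is None:
        # Unknown destination: send on every port except the incoming one.
        return sorted(p for p in all_ports if p != in_port)
    return [port]


def route_packet(routes, dst_ip):
    """Layer 3: pick the next hop using the most specific matching network."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [(net, hop) for net, hop in routes if dst in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


if __name__ == "__main__":
    mac_table = {"02:00:00:00:00:01": 1}
    print(forward_frame(mac_table, "02:00:00:00:00:01", 2, [1, 2, 3]))
    print(forward_frame(mac_table, "02:00:00:00:00:09", 2, [1, 2, 3]))

    routes = [
        (ipaddress.ip_network("192.0.2.0/24"), "deliver locally"),
        (ipaddress.ip_network("0.0.0.0/0"), "198.51.100.1"),  # default gateway
    ]
    print(route_packet(routes, "192.0.2.9"))
    print(route_packet(routes, "203.0.113.5"))
```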
Q: What layer do you think DNS protocol is located in? A: Application layer
Q: At what level do you think the CDN works? A: Application layer
Q: Enter a nonexistent domain name in the browser address bar, such as “www.nonexistent.com”, and try to explain its DNS resolution process.
A: 1. Check whether the local DNS cache holds an IP address for the domain name “www.nonexistent.com”. 2. If not, check whether the local hosts file contains a fixed record for it. 3. If there is still no match, send the query to the DNS server whose IP address is assigned to the local network card. This is usually a “non-authoritative” server, such as Google’s “8.8.8.8”, which also caches resolution results. If it has no cached result or the cache has expired, it asks the “com” top-level DNS server for the IP address of the “nonexistent.com” DNS server. The answer is that no such server exists, so a domain name resolution error is returned to the browser. These queries are, of course, sent over UDP.
Q: What happens if, for some reason, DNS fails or goes wrong? A: The website cannot be accessed, and the browser shows a message such as “cannot access this website, cannot find the server IP address of www.XXX.com”.