History of HTTP

1. Key technologies for building a hyperlinked document system on the Internet

  • URI: Uniform Resource Identifier, the unique identity of a resource on the Internet;
  • HTML: HyperText Markup Language, used to describe hypertext documents;
  • HTTP: HyperText Transfer Protocol, used to transfer hypertext.

2. HTTP/0.9 (simple text protocol, text resources only)

At the time, the vast majority of resources on the network were plain text, and many communication protocols were plain text as well. HTTP/0.9 has a very simple structure, and to make processing easier for both servers and clients it also uses a plain-text format.

3. HTTP/1.0 (not an official standard)

  • New methods such as HEAD and POST were added;
  • Response status codes were added to indicate the possible cause of an error;
  • The protocol version number was introduced;
  • HTTP headers were introduced, making the handling of requests and responses more flexible;
  • Transmitted data was no longer limited to text.
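The version number, status code, and headers introduced in HTTP/1.0 all sit in the head of a response. As a minimal sketch (the function name `parse_response_head` and the sample response are illustrative assumptions, not part of any library), the head could be split apart like this:

```python
def parse_response_head(raw: bytes):
    """Split a response into (version, status, reason, headers)."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line: protocol version, status code, optional reason phrase.
    version, status, *reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), (reason[0] if reason else ""), headers


# Hypothetical sample response, for illustration only.
sample = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Length: 4\r\n"
    b"\r\n"
    b"\x89PNG"
)
version, status, reason, headers = parse_response_head(sample)
```

Note that the body here is binary rather than text, which the Content-Type header makes possible to describe.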

4. HTTP/1.1 (currently the most widely used version on the Internet, with a very complete feature set)

  • New methods such as PUT and DELETE were added;
  • Cache management and control were added;
  • Connection management was made explicit, allowing persistent connections;
  • Chunked transfer of response data is allowed, which helps with large files;
  • The Host header became mandatory, making virtual hosting possible.
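Chunked transfer is easier to picture with a small decoder. The sketch below assumes a complete, well-formed chunked body is already in memory and ignores trailers; `decode_chunked` and the sample body are hypothetical names and data used only for illustration.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode a chunked message body (trailers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hexadecimal, ended by CRLF.
        eol = body.index(b"\r\n", pos)
        size_field = body[pos:eol].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.decode("ascii").strip(), 16)
        if size == 0:
            break  # a zero-sized chunk marks the end of the body
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(out)


# Hypothetical chunked body, for illustration only.
encoded = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
decoded = decode_chunked(encoded)
```

Because every chunk carries its own length, the sender does not need to know the total size before it starts transmitting.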

5. HTTP/2 (not yet widely adopted)

  • It is a binary protocol, no longer plain text;
  • Multiple requests can be in flight at once (multiplexing), so HTTP/1.1 pipelining is no longer needed;
  • Headers are compressed with a dedicated algorithm (HPACK) to reduce the amount of data transferred;
  • The server is allowed to push data to the client proactively;
  • Security is strengthened: encrypted communication is required “de facto”.
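To show what “binary protocol” means in practice, here is a small sketch of the fixed 9-byte header that starts every HTTP/2 frame: a 24-bit payload length, an 8-bit type, 8-bit flags, one reserved bit, and a 31-bit stream identifier. The helper names are assumptions made for this example; the stream identifier is what lets several requests share one connection.

```python
import struct


def pack_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Build the 9-byte HTTP/2 frame header."""
    if not 0 <= length < 2**24:
        raise ValueError("length must fit in 24 bits")
    # The 24-bit length is packed as one high byte plus two low bytes.
    return struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                       frame_type, flags, stream_id & 0x7FFFFFFF)


def parse_frame_header(header: bytes):
    """Return (length, type, flags, stream_id) from a 9-byte header."""
    high, low, frame_type, flags, stream_id = struct.unpack(">BHBBI", header[:9])
    # Mask off the reserved top bit of the stream identifier.
    return (high << 16) | low, frame_type, flags, stream_id & 0x7FFFFFFF


# A HEADERS frame (type 0x1) with END_HEADERS (flag 0x4) on stream 1,
# announcing a 13-byte payload. The values are chosen for illustration.
header = pack_frame_header(13, 0x1, 0x4, 1)
fields = parse_frame_header(header)
```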

6. HTTP/3 (future direction)

HTTP/3 is the direction in which the protocol is heading.

What is HTTP

1. HTTP is a protocol used in the computer world. It establishes a standard for communication between computers, together with the related control and error-handling methods.

2. HTTP is designed to transfer data between two points; it is not used for broadcasting, addressing, or routing.

3. HTTP transfers hypertext data, such as text, images, audio, and video.

4. HTTP is a fundamental technology for building the Internet. It is not a tangible thing: it relies on many other technologies to be implemented, and many other technologies rely on it in turn.

HTTP overview

1. The browser is essentially the requester in the HTTP protocol. In this role it is called the “user agent”, or simply the “client”.

2. The server is the responder. Commonly used servers include Nginx and Apache.

3. A CDN sits between the browser and the server and mainly acts as a caching accelerator. It caches data from the origin site, so a browser request can get its response “halfway” without travelling “thousands of miles” to the origin server. If the CDN’s scheduling algorithm is good, it finds the node nearest to the user and greatly shortens the response time.

4. A crawler is an application that accesses Web resources automatically; it is another kind of user agent. An aggressive crawler can consume excessive network resources, tie up servers and bandwidth, distort a website’s analysis of real traffic, and even lead to the leakage of sensitive information.
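The caching role of the CDN in item 3 can be sketched as a toy edge node that answers from its own cache and contacts the origin site only on a miss. The class `EdgeNode`, the `origin` function, and the sample path are hypothetical; a real CDN also handles expiry, validation, and node selection.

```python
class EdgeNode:
    """A toy CDN edge node that caches responses from the origin site."""

    def __init__(self, fetch_from_origin):
        self._fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self._cache:
            self.hits += 1  # answered "halfway", origin not contacted
            return self._cache[path]
        self.misses += 1  # the first request must go back to the origin
        body = self._fetch_from_origin(path)
        self._cache[path] = body
        return body


origin_requests = []


def origin(path: str) -> bytes:
    # Stand-in for the origin server; records every request it receives.
    origin_requests.append(path)
    return b"<html>home</html>"


edge = EdgeNode(origin)
first = edge.get("/index.html")
second = edge.get("/index.html")
```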

Protocols related to HTTP

1. TCP/IP

  • Layer 1: “link layer”;
  • Layer 2: “internet layer”, also called the “network interconnection layer”;
  • Layer 3: “transport layer”;
  • Layer 4: “application layer”.

TCP belongs to the transport layer and IP belongs to the Internet layer.

2. DNS

A domain name has to be translated, or “mapped”, to its real IP address. This process is called “domain name resolution”.

3. URI and URL

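A URI is a name used to identify a resource on the Internet. It consists mainly of a protocol name (scheme), a host name, and a path. A URL is the kind of URI that also tells you where the resource is located.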
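As a quick illustration, Python’s standard `urllib.parse.urlsplit` breaks a URI into these parts. The path `/index.html` is an assumption added for the example.

```python
from urllib.parse import urlsplit

# The path "/index.html" is hypothetical, chosen only for illustration.
parts = urlsplit("https://www.apple.com/index.html")

scheme = parts.scheme  # protocol name
host = parts.hostname  # host name
path = parts.path      # path of the resource on that host
```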

4. HTTPS

HTTPS is HTTP running over SSL/TLS, which in turn runs over TCP/IP. The SSL/TLS layer provides a security shell for HTTP.
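The layering can be made visible with Python’s standard `socket` and `ssl` modules. This is only a sketch under stated assumptions: the function name `https_head` is made up, the server is assumed to listen on port 443, and error handling is omitted.

```python
import socket
import ssl


def https_head(host: str, path: str = "/", timeout: float = 5.0) -> bytes:
    """Send a HEAD request over TLS and return the raw response bytes."""
    context = ssl.create_default_context()  # verifies certificate and host name
    request = (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    # TCP/IP: open an ordinary connection.
    with socket.create_connection((host, 443), timeout=timeout) as tcp:
        # SSL/TLS: wrap the TCP socket in an encrypted channel.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            # HTTP: the request text is the same as it would be without TLS.
            tls.sendall(request)
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling `https_head("www.apple.com")` would need network access; the point of the sketch is that the HTTP text itself does not change, only the channel underneath it.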

5. Proxy

A proxy is a “relay station” on the HTTP transmission path. It can provide functions such as cache acceleration and load balancing.

Network layering

1. The TCP/IP layered network model

1. The first layer is called the “link layer”. It is responsible for sending raw packets over underlying networks such as Ethernet and Wi-Fi. It works at the network interface card (NIC) level and uses MAC addresses to identify devices on the network, so it is sometimes called the MAC layer.

2. The second layer is called the “internet layer”, and the IP protocol lives here. Because IP defines the concept of an “IP address”, it can build on the link layer and use IP addresses in place of MAC addresses, joining many LANs and WANs into one huge virtual network. To find a device in this network, the IP address only has to be “translated” back into a MAC address.

3. The third layer is called the “transport layer”. It is responsible for ensuring the “reliable” transfer of data between two points identified by IP addresses. This is where TCP works, along with its sibling UDP (which, unlike TCP, does not guarantee delivery).

4. The fourth layer is called the “application layer”. Because the three layers below provide such a solid foundation, a great variety of application-oriented protocols flourish here, including Telnet, SSH, FTP, SMTP, and of course HTTP.

2. The OSI seven-layer model

  • Layer 1, physical layer: the physical form of the network, such as cables, optical fiber, network cards, and hubs;
  • Layer 2, data link layer: roughly equivalent to the TCP/IP link layer;
  • Layer 3, network layer: equivalent to the TCP/IP internet layer;
  • Layer 4, transport layer: equivalent to the TCP/IP transport layer;
  • Layer 5, session layer: maintains connection state in the network, that is, keeps sessions and synchronization;
  • Layer 6, presentation layer: converts data into a suitable, understandable syntax and semantics;
  • Layer 7, application layer: transfers data for specific applications.

3. Mapping between the two layered models

  • Layer 1, physical layer: no counterpart in TCP/IP;
  • Layer 2, data link layer: corresponds to the TCP/IP link layer;
  • Layer 3, network layer: corresponds to the TCP/IP internet layer;
  • Layer 4, transport layer: corresponds to the TCP/IP transport layer;
  • Layers 5, 6, and 7: all correspond to the TCP/IP application layer.
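The same mapping can be written down as a small lookup table, which makes it easy to see which OSI layers collapse into one TCP/IP layer. The names `OSI_TO_TCPIP` and `tcpip_layer` are assumptions made for this sketch.

```python
# OSI layer number -> (OSI layer name, TCP/IP layer name or None)
OSI_TO_TCPIP = {
    1: ("physical", None),  # no counterpart in TCP/IP
    2: ("data link", "link"),
    3: ("network", "internet"),
    4: ("transport", "transport"),
    5: ("session", "application"),
    6: ("presentation", "application"),
    7: ("application", "application"),
}


def tcpip_layer(osi_layer: int):
    """Return the TCP/IP layer name for an OSI layer number."""
    return OSI_TO_TCPIP[osi_layer][1]


# OSI layers that end up in the TCP/IP application layer.
merged = [n for n in OSI_TO_TCPIP if tcpip_layer(n) == "application"]
```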

4. How the TCP/IP stack works

When HTTP sends data, the data travels down the protocol stack layer by layer. Each layer adds its own header, wraps what it received from the layer above, and hands the result to the layer below. Receiving is the reverse: the data travels up the stack and is unwrapped layer by layer, with each layer removing its own header and passing the remaining data to the layer above. The lower layers are completely “transparent” to the upper layers, which do not need to care how the lower layers are implemented. HTTP therefore does not care whether TCP/IP is underneath; it sees only a reliable transport link, and as long as it attaches its own header to the data, the other side receives that data unchanged.
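The wrapping and unwrapping can be mimicked with a toy model in which each layer’s header is just a text label. Real headers are binary structures with addresses, ports, and checksums; the labels and function names below are assumptions used only to show the order of operations.

```python
# Layers below HTTP, in the order the data passes through them when sending.
LAYERS = ["transport", "internet", "link"]


def send_down(payload: bytes) -> bytes:
    """Wrap the payload with one toy header per layer, top to bottom."""
    for layer in LAYERS:
        payload = f"[{layer}]".encode("ascii") + payload
    return payload


def receive_up(frame: bytes) -> bytes:
    """Strip the toy headers in the opposite order, bottom to top."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode("ascii")
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


http_message = b"GET / HTTP/1.1\r\nHost: www.apple.com\r\n\r\n"
on_the_wire = send_down(http_message)
delivered = receive_up(on_the_wire)
```

The HTTP message that comes out at the receiving end is byte-for-byte the one that went in, which is the “transparency” described above.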

Summary:

1. TCP/IP is divided into four layers. Its core is IP at layer 2 and TCP at layer 3; HTTP is at layer 4.

2. OSI is divided into seven layers that roughly correspond to TCP/IP: TCP is at layer 4 and HTTP at layer 7.

3. OSI can be mapped onto TCP/IP, but layers 1, 5, and 6 disappear in the process.

4. In everyday discussion we usually use the OSI numbering, with terms such as “layer 4” and “layer 7”.

5. HTTP relies on the TCP/IP stack to wrap and unwrap its data layer by layer, but the details of the lower layers are invisible to it.

A good (but not absolute) rule of thumb for telling layer 4 from layer 7 is the “two whatevers”: whatever is handled by the operating system is layer 4 or below, and whatever is handled by the application (that is, code you write yourself) is layer 7.

Domain name resolution

1. Domain name format

A domain name has a hierarchical structure: it is a string of labels separated by “.”. The right-most label is the top-level domain, the next one to its left is the second-level domain, and the level descends as you move left.

The left-most label is the host name, which usually indicates what the host is for, such as “www” for the World Wide Web and “mail” for mail service, but this is not absolute. The key is to make the name easy to remember.
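A short sketch of that structure: splitting a name on “.” and reading the labels from right to left gives the levels in order. The function name `domain_levels` is an assumption made for this example.

```python
def domain_levels(name: str) -> list:
    """Return the labels of a domain name, top-level domain first."""
    labels = name.rstrip(".").split(".")
    return list(reversed(labels))


levels = domain_levels("www.apple.com")
top_level, second_level, host_name = levels
```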

2. Resolving domain names

Domain names must be converted to IP addresses, a process known as domain name resolution.

The core of DNS is a distributed service organized as a three-level tree, which roughly matches the structure of a domain name:

1. Root DNS server: manages the top-level DNS servers and returns the IP addresses of the top-level DNS servers for domains such as com, net, and cn.

2. Top-level DNS server: manages the authoritative DNS servers under its own domain. For example, the com top-level DNS server can return the IP address of the apple.com DNS server.

3. Authoritative DNS server: manages the IP addresses of the hosts under its own domain. For example, the apple.com authoritative DNS server can return the IP address of www.apple.com.

For example, resolving “www.apple.com” takes the following three queries:

1. Query the root DNS server, which tells you the address of the com top-level DNS server.

2. Query the com top-level DNS server, which tells you the address of the apple.com DNS server.

3. Query the apple.com DNS server, which returns the address of www.apple.com.
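The three queries can be simulated with in-memory tables standing in for the three levels of servers. Everything here is a toy: the table names are invented for the sketch and all IP addresses are made-up values from the ranges reserved for documentation, not real DNS data.

```python
# Root level: top-level domain -> address of its top-level DNS server.
ROOT = {"com": "192.0.2.1"}
# Top-level servers: server address -> {domain: authoritative server address}.
TLD_SERVERS = {"192.0.2.1": {"apple.com": "198.51.100.1"}}
# Authoritative servers: server address -> {host name: host address}.
AUTH_SERVERS = {"198.51.100.1": {"www.apple.com": "203.0.113.10"}}


def resolve(name: str):
    """Resolve a name in three steps and record what each level answered."""
    labels = name.split(".")
    trace = []

    tld_server = ROOT[labels[-1]]  # query 1: ask the root
    trace.append(("root", tld_server))

    domain = ".".join(labels[-2:])
    auth_server = TLD_SERVERS[tld_server][domain]  # query 2: ask the top level
    trace.append(("top-level", auth_server))

    address = AUTH_SERVERS[auth_server][name]  # query 3: ask the authority
    trace.append(("authoritative", address))
    return address, trace


address, trace = resolve("www.apple.com")
```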

3. Reducing the load of domain name resolution

The basic idea for reducing the load of domain name resolution, and for getting results faster, is caching.
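A minimal sketch of such a cache follows. It assumes a fixed time-to-live for every entry, whereas real DNS records carry their own TTL; the class `DnsCache`, the `slow_lookup` stand-in, and the address are hypothetical.

```python
import time


class DnsCache:
    """Cache lookup results and reuse them until they expire."""

    def __init__(self, lookup, ttl: float = 60.0, clock=time.monotonic):
        self._lookup = lookup  # callable: name -> address (the slow path)
        self._ttl = ttl        # assumed fixed lifetime, in seconds
        self._clock = clock
        self._entries = {}     # name -> (address, expiry time)

    def resolve(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: no query is sent
        address = self._lookup(name)
        self._entries[name] = (address, now + self._ttl)
        return address


calls = []


def slow_lookup(name: str) -> str:
    # Stand-in for a real resolution; records each query it receives.
    calls.append(name)
    return "203.0.113.10"  # made-up documentation address


fake_now = [0.0]  # controllable clock so the example runs instantly
cache = DnsCache(slow_lookup, ttl=60.0, clock=lambda: fake_now[0])
cache.resolve("www.apple.com")  # miss: goes to slow_lookup
cache.resolve("www.apple.com")  # hit: answered from the cache
fake_now[0] = 61.0
cache.resolve("www.apple.com")  # expired: goes to slow_lookup again
```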

4. Domain name resolution process

Browser cache -> operating system cache -> hosts file -> local DNS server -> root DNS server -> top-level DNS server -> authoritative DNS server
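The first part of this chain behaves like a list of places tried in order, where the first one that knows the answer wins. The sketch below models each stage as a plain dictionary, which is a simplification; the stage contents and the address are made up for illustration.

```python
def resolve_through_chain(name: str, stages):
    """Try each stage in order; return (address, name of the stage that answered)."""
    for stage_name, table in stages:
        if name in table:
            return table[name], stage_name
    raise LookupError(f"{name} not found in any stage")


# Hypothetical stage contents; the address is a documentation value.
stages = [
    ("browser cache", {}),
    ("operating system cache", {}),
    ("hosts file", {"localhost": "127.0.0.1"}),
    ("local DNS server", {"www.apple.com": "203.0.113.10"}),
]

local_answer = resolve_through_chain("localhost", stages)
remote_answer = resolve_through_chain("www.apple.com", stages)
```

When even the local DNS server has no answer, it is the one that goes on to query the root, top-level, and authoritative servers.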

5. Load balancing based on domain names

In the first method, domain name resolution returns multiple IP addresses, so one domain name can correspond to multiple hosts. After receiving them, the client can use a round-robin algorithm to send requests to the servers in turn, which achieves load balancing.

In the second method, the DNS server is configured with an internal policy that returns the host closest to the client or the host with the best service quality. Requests are thus distributed to different servers at the DNS level, which achieves load balancing.
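Both methods can be sketched in a few lines. The function names, regions, latency figures, and IP addresses below are all assumptions made for illustration; real DNS policies use their own measurements and rules.

```python
from itertools import cycle


# Method 1: the client receives several addresses and rotates through them.
def round_robin(addresses):
    """Return an endless iterator that yields the addresses in turn."""
    return cycle(addresses)


# Method 2: the DNS server applies a policy and returns a single host.
def pick_host(client_region: str, hosts: list) -> str:
    """Prefer hosts in the client's region, then the lowest latency."""
    nearby = [h for h in hosts if h["region"] == client_region]
    candidates = nearby or hosts  # fall back to all hosts if none are nearby
    return min(candidates, key=lambda h: h["latency_ms"])["address"]


servers = round_robin(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
picked = [next(servers) for _ in range(5)]

HOSTS = [
    {"address": "192.0.2.10", "region": "eu", "latency_ms": 40},
    {"address": "192.0.2.11", "region": "eu", "latency_ms": 25},
    {"address": "198.51.100.20", "region": "us", "latency_ms": 30},
]
eu_choice = pick_host("eu", HOSTS)
us_choice = pick_host("us", HOSTS)
```

In the first method the balancing decision is made by each client; in the second it is made once, centrally, by the DNS server.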