Preface

As front-end programmers we work with HTTP every day, yet many of us, even after years on the job, have no idea how it actually works. Ask about how HTTP works, the structure of HTTP messages, how HTTP communication happens, the protocol's features, HTTP0.9, HTTP1.1, HTTP2, HTTP3, HTTPS and so on, and most people are stumped. If you try to learn it online, you will find the material scattered and unsystematic, sometimes even wrong, and often too superficial. Worst of all, the learning process is dull and hard to stick with. So I decided to see whether I could keep at it myself. :)

Study and reference materials:

  1. Big Talk About the HTTP Protocol
  2. Baidu Baike (Baidu Encyclopedia)
  3. Etc. (don't ask me who Etc. is!)

Getting to know HTTP

Let’s take a look at the HTTP protocol to get a feel for it.

What is the HTTP protocol

HTTP stands for HyperText Transfer Protocol. It is the most widely used network transfer protocol on the Internet, and all WWW documents must comply with it. HTTP is a communication protocol for transferring hypertext data, such as text, pictures, audio and video, between two points in the computer world. It belongs to the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended over years of use and development.

The Web and HTTP

The Web (World Wide Web) is a global, dynamic, cross-platform, distributed graphical information system built on hypertext and HTTP. It is a network service running on the Internet that gives visitors a graphical, easy-to-use and intuitive interface for finding and browsing information; documents and hyperlinks tie the information nodes on the Internet together into an interconnected network.

The past and present of the HTTP protocol

  1. In October 1990, Tim Berners-Lee, the father of the World Wide Web, first proposed the HTTP protocol
  2. In 1991, HTTP0.9 appeared; it supported only the GET method and could only return HTML text
  3. In May 1996, HTTP1.0 was released (it added many enhancements to HTTP0.9: request and response headers, status codes, redirection, the HEAD and POST methods, response bodies no longer limited to HTML text, limited support for persistent connections, a caching mechanism, etc.)
  4. In January 1997, HTTP1.1 was released (its specification was superseded by a revised one in June 1999). HTTP1.1 remains in very wide use today; it added the OPTIONS, PUT, DELETE, TRACE and CONNECT methods, persistent connections, pipelining, chunked transfer encoding, etc.
  5. In May 2015, HTTP2.0 was published (it improves transport performance to achieve low latency and high throughput, while keeping the same semantics as HTTP1.1)
  6. HTTP3.0 is based on the QUIC protocol. QUIC runs over UDP, which is connectionless and, on its own, not very reliable. Google introduced QUIC in 2013, mainly to reduce the latency and other overhead of TCP-based communication. Google built many improvements on top of UDP, trying to combine the reliability of TCP with the efficiency of UDP, but its version was not adopted as the standard as-is. Instead, the IETF set up a working group in 2016 to create its own QUIC (which is not radically different from Google's!)

Look at HTTP through TCP/IP

HTTP is built on top of TCP/IP and is one of the protocols in the TCP/IP protocol family.

TCP/IP protocol family

TCP/IP is the collective name for a family of protocols associated with the Internet. Layering is an important feature of TCP/IP: the TCP/IP protocol family is organized into four layers:

  1. The application layer
  2. The transport layer
  3. The network layer
  4. The data link layer

The application layer

The application layer is where the applications we write live; it determines which application services are provided to users. The application layer communicates with the transport layer through system calls. FTP, DNS and HTTP are examples of application-layer protocols.

The transport layer

The transport layer provides the application layer, through system calls, with the ability to transfer data between two computers over a network connection. There are two different protocols at the transport layer: TCP and UDP. TCP is connection-oriented, so it is reliable, but establishing a connection adds overhead. UDP is connectionless, so it is very efficient, but without a connection there is no acknowledgement or retransmission, so it is less reliable. In practice, which protocol to use depends on the scenario.

The network layer

The network layer handles the packets that flow across the network; a packet is the smallest unit of data transmitted over the network. This layer determines the route by which packets are transmitted to the other party.

The link layer

The link layer handles the hardware that connects to the network, including the operating system's hardware control, device drivers, the Network Interface Card (NIC), and physical media such as optical fiber. Everything hardware-related falls within the scope of the link layer.

Packet encapsulation

Data travels through this protocol stack: down the layers on its way out to the network, and up the layers on its way in.

Each protocol layer adds its own header in front of the data it receives from the layer above, and the link layer also appends a trailer.

HTTP data transfer process

When the sender sends data, the data passes from the upper layers to the lower layers, and each layer adds its own header. When the receiver receives data, the data passes from the lower layers to the upper layers, and each layer removes its own header before handing the data up.
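
The wrapping and unwrapping described above can be sketched as a toy Python simulation. The bracketed `[TCP-hdr]`-style markers are hypothetical placeholders, not real header formats (real headers are binary structures defined by each protocol), and the HTTP message stands in for the application-layer data.

```python
# Layers below the application, in the order they wrap data on the way down:
# transport, network, link. The marker strings are hypothetical placeholders.
LOWER_LAYERS = ["TCP", "IP", "Ethernet"]
LINK_TRAILER = "[Ethernet-trl]"


def encapsulate(app_data: str) -> str:
    """Sender: each layer prepends its own header to what it got from above."""
    data = app_data
    for layer in LOWER_LAYERS:
        data = f"[{layer}-hdr]" + data
    return data + LINK_TRAILER  # only the link layer also adds a trailer


def decapsulate(frame: str) -> str:
    """Receiver: each layer strips its own header before passing data up."""
    if not frame.endswith(LINK_TRAILER):
        raise ValueError("missing link-layer trailer")
    data = frame[: -len(LINK_TRAILER)]
    for layer in reversed(LOWER_LAYERS):  # outermost (link) header comes off first
        header = f"[{layer}-hdr]"
        if not data.startswith(header):
            raise ValueError(f"missing {layer} header")
        data = data[len(header):]
    return data


message = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"  # application-layer data
frame = encapsulate(message)
print(frame.split("GET")[0])          # [Ethernet-hdr][IP-hdr][TCP-hdr]
print(decapsulate(frame) == message)  # True
```

Note that the link-layer header ends up outermost, so it is also the first one the receiver removes.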

TCP three-way handshake (Transport layer)

The two parties using TCP to communicate must establish a connection before data can be transmitted.

To make the connection reliable for both parties, TCP uses a three-way handshake when establishing it.

First handshake

The client sends a connection request segment with the SYN flag set. The client then enters the SYN_SENT state and waits for confirmation from the server.

Second handshake

After receiving the client's SYN segment, the server must send an ACK to acknowledge it, along with its own SYN. The server puts both into a single segment (the SYN + ACK segment) and sends it to the client. At this point, the server enters the SYN_RECV state.

Third handshake

After receiving the SYN + ACK segment from the server, the client sends an ACK segment to the server and enters the ESTABLISHED state. When the server receives that ACK, it also enters the ESTABLISHED state, completing the TCP three-way handshake.
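
Here is a minimal Python sketch of the three steps, assuming a toy `Segment` type and hand-picked initial sequence numbers (100 and 300; real TCP stacks choose them randomly). It models only the state names and the SYN/ACK bookkeeping described above, not a real TCP implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Segment:
    """A toy TCP segment carrying only what the handshake needs."""
    flags: FrozenSet[str]
    seq: int
    ack: Optional[int] = None


class Client:
    def __init__(self, isn: int) -> None:
        self.isn = isn            # initial sequence number (arbitrary here)
        self.state = "CLOSED"

    def connect(self) -> Segment:
        # 1st handshake: send SYN and wait in SYN_SENT
        self.state = "SYN_SENT"
        return Segment(frozenset({"SYN"}), seq=self.isn)

    def on_syn_ack(self, seg: Segment) -> Segment:
        # 3rd handshake: check that our SYN was acknowledged, then ACK the server's SYN
        if seg.flags != {"SYN", "ACK"} or seg.ack != self.isn + 1:
            raise ValueError("unexpected segment")
        self.state = "ESTABLISHED"
        return Segment(frozenset({"ACK"}), seq=self.isn + 1, ack=seg.seq + 1)


class Server:
    def __init__(self, isn: int) -> None:
        self.isn = isn
        self.state = "LISTEN"

    def on_syn(self, seg: Segment) -> Segment:
        # 2nd handshake: acknowledge the client's SYN and send our own SYN
        if seg.flags != {"SYN"}:
            raise ValueError("expected SYN")
        self.state = "SYN_RECV"
        return Segment(frozenset({"SYN", "ACK"}), seq=self.isn, ack=seg.seq + 1)

    def on_ack(self, seg: Segment) -> None:
        # The server is ESTABLISHED only once the client's final ACK arrives
        if seg.flags != {"ACK"} or seg.ack != self.isn + 1:
            raise ValueError("unexpected segment")
        self.state = "ESTABLISHED"


client, server = Client(isn=100), Server(isn=300)
syn = client.connect()
syn_ack = server.on_syn(syn)
ack = client.on_syn_ack(syn_ack)
print(client.state, server.state)  # ESTABLISHED SYN_RECV
server.on_ack(ack)
print(client.state, server.state)  # ESTABLISHED ESTABLISHED
```

Each ACK number is the peer's sequence number plus one, which is how each side confirms that its own SYN was received.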

DNS domain name resolution

We usually visit a website by its host name or domain name, because domain names are easier to remember than IP addresses. But TCP/IP addresses hosts by IP address, so there must be a mechanism to convert domain names into IP addresses. DNS is the service designed to solve this problem: it resolves domain names to IP addresses.
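
In Python, name resolution is available through the standard `socket.getaddrinfo` function. Note that it asks the operating system's resolver, which may answer from a hosts file or a cache before querying DNS servers. `resolve` is a hypothetical helper name, and `localhost` is used so the sketch runs without network access; any domain name can be passed instead.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the OS resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


addresses = resolve("localhost")
print(addresses)  # typically contains '127.0.0.1' and/or '::1'
```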

A closer look at HTTP

HTTP Protocol Features

Client/server mode

In the client/server mode, the client sends a request to the server, and the server processes the request and sends back a response.
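
The request/response round trip can be sketched with Python's standard library: `http.server` plays the server and `http.client` plays the client, both on the loopback interface. `HelloHandler` and the `/index.html` path are illustrative assumptions.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET with a short plain-text body."""

    def do_GET(self) -> None:
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port on the loopback interface.
server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: send a request, then read the server's response.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/index.html")
response = conn.getresponse()
status, text = response.status, response.read().decode("utf-8")
conn.close()

server.shutdown()
server.server_close()
print(status, text)  # 200 You asked for /index.html
```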

Simple and quick

When a client requests a service from a server, it only needs to send the request method and path. The request methods are GET, POST, PUT, TRACE, DELETE, HEAD, OPTIONS and CONNECT. Because the HTTP protocol is simple, HTTP server programs are small, so communication is fast.
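
To see how little a request needs, this illustrative sketch (not a production client) writes an HTTP/1.0 request by hand over a plain TCP socket: just the method, path and version, followed by an empty line. HTTP/1.1 additionally requires a `Host` header. The local server and the `/hello` path are assumptions for the demo.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Replies with the requested path as the body."""

    def do_GET(self) -> None:
        body = self.path.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPathHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The entire request: method, path and version, then an empty line.
raw_request = b"GET /hello HTTP/1.0\r\n\r\n"

with socket.create_connection(server.server_address) as sock:
    sock.sendall(raw_request)
    chunks = []
    while chunk := sock.recv(4096):  # an HTTP/1.0 server closes when done
        chunks.append(chunk)

server.shutdown()
server.server_close()

raw_response = b"".join(chunks).decode("latin-1")
status_line = raw_response.split("\r\n", 1)[0]
body = raw_response.split("\r\n\r\n", 1)[1]
print(status_line)  # HTTP/1.0 200 OK
print(body)         # /hello
```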

Flexible

Although HTTP0.9 could only transfer HTML text, HTTP has since evolved to allow data objects of any type to be transferred. The type of the data being transferred is indicated by the Content-Type header.

Connectionless

Connectionless means that each connection handles only one request. The server closes the connection once it has processed the client's request and received the client's acknowledgement.

HTTP was designed so that Internet servers could handle millions of web visits. Yet the data exchanged between each client and the server is highly intermittent: transfers are bursty and short-lived, and because web browsing is associative and wide-ranging, successive transfers are often only loosely related. Most of the time a held-open connection would sit idle, tying up resources for nothing. So HTTP's designers deliberately had connections set up when a request is made and released once it has been answered, freeing resources as quickly as possible to serve other clients. Over time, however, the web became more and more complex, and keep-alive came into being. As the name implies, keep-alive keeps the connection between the client and the server alive, avoiding the cost of re-establishing a connection for every request. However, keep-alive also keeps occupied resources that could otherwise be released, so it should not be abused.
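
Keep-alive can be observed with Python's standard library: against an HTTP/1.1 server, one `http.client.HTTPConnection` sends several requests over the same TCP connection. The sketch records the client port of each request in a hypothetical bookkeeping list to show that all requests arrived over a single connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_client_ports = []  # client port of each request, recorded by the server


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 connections are persistent by default

    def do_GET(self) -> None:
        seen_client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        # A known body length lets the client find the end of the response
        # without the server closing the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
for path in ("/a", "/b", "/c"):
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()  # the body must be fully read before the next request
conn.close()

server.shutdown()
server.server_close()
# Three requests, one client port: a single TCP connection served them all.
print(len(seen_client_ports), len(set(seen_client_ports)))  # 3 1
```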

Stateless

HTTP is a stateless protocol: it has no memory of previous transactions. If later processing needs earlier information, that information must be sent again, which can increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need earlier information. So what do you do when you do need it? Cookies and sessions are the two techniques used to keep state across HTTP requests.
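
A common pattern keeps only an identifier in the cookie and the actual state on the server, in a session. The sketch below simulates that in-process with the standard `http.cookies.SimpleCookie`; `handle_request`, the `session_id` cookie name and the in-memory `sessions` dict are hypothetical, and a real server would also set attributes such as an expiry and `Secure`.

```python
import secrets
from http.cookies import SimpleCookie

sessions: dict[str, dict] = {}  # server-side session store (in memory, illustration only)


def handle_request(cookie_header: str) -> tuple[str, str]:
    """Hypothetical server logic: returns (Set-Cookie value, response body)."""
    cookie = SimpleCookie(cookie_header)
    session_id = cookie["session_id"].value if "session_id" in cookie else None
    if session_id not in sessions:
        session_id = secrets.token_hex(8)  # unknown visitor: start a new session
        sessions[session_id] = {"visits": 0}
    sessions[session_id]["visits"] += 1

    out = SimpleCookie()
    out["session_id"] = session_id
    out["session_id"]["httponly"] = True  # hide the cookie from page scripts
    return out["session_id"].OutputString(), f"visit #{sessions[session_id]['visits']}"


# 1st request: the browser has no cookie yet.
set_cookie, body1 = handle_request("")
# The browser stores the cookie and sends back its "name=value" part.
browser_cookie = set_cookie.split(";", 1)[0]
_, body2 = handle_request(browser_cookie)
# Without the cookie, the server cannot tell it is the same visitor.
_, body3 = handle_request("")
print(body1, body2, body3)  # visit #1 visit #2 visit #1
```

HTTP itself still remembers nothing here: the state survives only because the client resends the cookie with each request.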

HTTP message structure

HTTP header fields are classified into four categories:

  1. General headers
  2. Request headers
  3. Response headers
  4. Entity headers

General headers

These can be used in both request messages and response messages.

  1. Cache-Control: controls caching behavior
  2. Connection: connection management; lists hop-by-hop headers
  3. Date: date and time the message was created
  4. Pragma: message directives
  5. Trailer: lists the header fields sent at the end of the message body
  6. Transfer-Encoding: the transfer encoding of the message body
  7. Upgrade: upgrade to another protocol
  8. Via: information about proxy servers
  9. Warning: error notification

Request headers

  1. Accept: media types the user agent can handle
  2. Accept-Charset: preferred character sets
  3. Accept-Encoding: preferred content encodings
  4. Accept-Language: preferred (natural) languages
  5. Authorization: web authentication credentials
  6. Expect: specific behavior expected from the server
  7. From: email address of the user
  8. Host: the server hosting the requested resource
  9. If-Match: compares entity tags (ETag)
  10. If-Modified-Since: compares the resource's last-update time
  11. If-None-Match: compares entity tags (the opposite of If-Match)
  12. If-Range: sends a byte-range request for the entity if the resource has not been updated
  13. If-Unmodified-Since: compares the resource's last-update time (the opposite of If-Modified-Since)
  14. Max-Forwards: maximum number of hops
  15. Proxy-Authorization: client authentication credentials required by a proxy server
  16. Range: requests a byte range of the entity
  17. Referer: the URI of the page from which the request URI was obtained
  18. TE: preferred transfer encodings and their priority
  19. User-Agent: information about the HTTP client program

Response headers

  1. Accept-Ranges: whether byte-range requests are accepted
  2. Age: estimated time elapsed since the response was created
  3. ETag: matching information (entity tag) for the resource
  4. Location: redirects the client to the specified URI
  5. Proxy-Authenticate: authentication information a proxy server requires from the client
  6. Retry-After: when to retry the request
  7. Server: information about the HTTP server software
  8. Vary: cache-management information for proxy servers
  9. WWW-Authenticate: how the server authenticates the client

Entity headers

  1. Allow: HTTP methods supported by the resource
  2. Content-Encoding: encoding applied to the entity body
  3. Content-Language: natural language of the entity body
  4. Content-Length: size of the entity body
  5. Content-Location: an alternative URI for the corresponding resource
  6. Content-MD5: message digest of the entity body
  7. Content-Range: position range of the entity body
  8. Content-Type: media type of the entity body
  9. Expires: date and time after which the entity body expires
  10. Last-Modified: date and time the resource was last modified
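
To tie these header categories to an actual message: an HTTP/1.1 request consists of a request line, header fields, a blank line and an optional body. The minimal parser below is a sketch under stated assumptions (well-formed input with CRLF line endings, a `Content-Length` body with no chunked encoding, no repeated header names); the sample message and the `parse_request` name are illustrative.

```python
def parse_request(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw HTTP/1.1 request into request line, header fields and body."""
    head, _, rest = raw.partition(b"\r\n\r\n")  # a blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    length = int(headers.get("content-length", "0"))
    return request_line, headers, rest[:length]


raw = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.com\r\n"                                # request header
    b"Cache-Control: no-cache\r\n"                          # general header
    b"Content-Type: application/x-www-form-urlencoded\r\n"  # entity header
    b"Content-Length: 18\r\n"                               # entity header
    b"\r\n"
    b"name=Weilai&age=25"
)
request_line, headers, body = parse_request(raw)
print(request_line)             # POST /login HTTP/1.1
print(headers["content-type"])  # application/x-www-form-urlencoded
print(body)                     # b'name=Weilai&age=25'
```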

HTTP request method analysis

Common HTTP1.1 request methods include: GET, POST, PUT, HEAD, DELETE, OPTIONS, TRACE and CONNECT.

GET

The GET method requests the resource identified by a URI. The server parses the request and returns the resource's content in the response. GET is also the browser's default request method. It can be used to submit forms and other data too, for example: http://xxxxxx/xxxx.html?name=Weilai&age=25. From this URL you can easily see the submitted form data. URLs also have a length limit, which differs between browsers (the HTTP specification itself sets none); Internet Explorer has the shortest, limiting URLs to 2083 bytes (2K+35). Today GET is mainly used to fetch one or more resources from the server.
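
The GET form submission can be reproduced with Python's `urllib.parse`: the form fields are encoded into the query string of the URL, and the server decodes them back out. `example.com` stands in for the placeholder host.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Client side: encode the form fields into the URL's query string.
form = {"name": "Weilai", "age": 25}
url = "http://example.com/xxxx.html?" + urlencode(form)
print(url)  # http://example.com/xxxx.html?name=Weilai&age=25

# Server side: the data arrives as part of the URL and is decoded from it.
query = parse_qs(urlsplit(url).query)
print(query)  # {'name': ['Weilai'], 'age': ['25']}
```

Because the data lives in the URL, it shows up in browser history and server logs, and it counts toward the URL length limit.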

POST

The POST method is used to transfer an entity body. Although it looks similar to GET, its main purpose is not to retrieve the response body. POST was originally designed as an alternative to GET for submitting form data to a web server, especially large amounts of it. POST does not put the submitted data in the URL; it puts it in the request body. Today POST is mainly used to create new resources on the server.
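
For comparison with GET, this sketch posts the same form fields to a local server with `http.client`; the data travels in the request body and the URL carries no query string. The `/users` path, the `FormHandler` class and the 201 status are assumptions for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlencode

received = {}  # what the server saw, kept for inspection


class FormHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        received["path"] = self.path
        received["content_type"] = self.headers.get("Content-Type")
        received["body"] = self.rfile.read(length).decode("utf-8")
        self.send_response(201)  # 201 Created: a new resource was created
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), FormHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

data = urlencode({"name": "Weilai", "age": 25}).encode("ascii")
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
# http.client adds Content-Length for the bytes body automatically.
conn.request("POST", "/users", body=data,
             headers={"Content-Type": "application/x-www-form-urlencoded"})
resp = conn.getresponse()
status = resp.status
resp.read()
conn.close()

server.shutdown()
server.server_close()
print(status)            # 201
print(received["path"])  # /users (no query string)
print(received["body"])  # name=Weilai&age=25
```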

PUT

Like POST, the PUT method is used to submit data. The biggest difference between them is that PUT is idempotent and POST is not. An idempotent operation has the same effect no matter how many times it is executed as it does when executed once. That is why PUT is mostly used to update data (the client sends the complete modified resource), while POST is mostly used to create data. However, the HTTP1.1 PUT method has no authentication mechanism of its own: anyone could upload files, so it carries security risks.
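
Idempotence is easiest to see with a toy in-memory store. In this sketch, `post` and `put` are hypothetical functions that mimic the server-side effect of each method: repeating a POST (for example, a retried request) creates duplicates, while repeating a PUT leaves the state unchanged.

```python
import itertools

store: dict[int, dict] = {}   # hypothetical server-side resources by id
_next_id = itertools.count(1)


def post(resource: dict) -> int:
    """POST-like: create a new resource every time; the server picks the id."""
    new_id = next(_next_id)
    store[new_id] = dict(resource)
    return new_id


def put(resource_id: int, resource: dict) -> None:
    """PUT-like: replace the resource at a known id with the full new version."""
    store[resource_id] = dict(resource)


# Sending the same POST twice creates two resources.
post({"name": "Weilai"})
post({"name": "Weilai"})
print(len(store))  # 2

# Sending the same PUT twice leaves the server in the same state as sending it once.
put(1, {"name": "Weilai", "age": 25})
snapshot = dict(store)
put(1, {"name": "Weilai", "age": 25})
print(store == snapshot)  # True
```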

HEAD

The HEAD method is almost identical to GET, except that the response to HEAD has no body; it is mainly used to get the headers. The server returns the same header fields it would return for GET, but not the content. With HEAD, we can get information about the resource identified by the request URI without transferring the entire content, so it is often used to check whether a hyperlink is still valid.
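
The GET/HEAD difference can be checked against a local server: both responses carry the same status and `Content-Length`, but the HEAD response has no body. `PageHandler` and the sample page are assumptions for the demo.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAGE = b"<html><body>hello</body></html>"


class PageHandler(BaseHTTPRequestHandler):
    def send_page_headers(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self) -> None:
        self.send_page_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self) -> None:
        self.send_page_headers()  # same headers as GET, but no body

    def log_message(self, format, *args) -> None:
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

results = {}
for method in ("GET", "HEAD"):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/page.html")
    resp = conn.getresponse()
    results[method] = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()

server.shutdown()
server.server_close()
print(results["GET"])   # (200, '31', b'<html><body>hello</body></html>')
print(results["HEAD"])  # (200, '31', b'')
```

A link checker could use HEAD the same way, looking only at the status code without downloading any page content.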

DELETE

The DELETE method is the opposite of PUT: it asks the server to delete the resource identified by the request URI. However, the HTTP1.1 DELETE method has no authentication mechanism of its own: anyone could delete files, which is a security risk.

OPTIONS

The OPTIONS method asks which methods are supported for the resource identified by the request URI. It is useful when we don't know which request methods the other party supports. In cross-origin resource sharing (CORS), before sending any request that is not a simple request, the browser first sends an extra HTTP request, called a "preflight" request. The preflight request uses the OPTIONS method, since it is asking the server what is allowed.
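
Whether a browser sends a preflight depends on the request itself. The function below is a simplified sketch of the Fetch standard's rules: it checks only the method, the header names and the `Content-Type` value, and ignores other conditions (such as value limits on the other safelisted headers and the `Range` header). `needs_preflight` is a hypothetical helper, not a browser API.

```python
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method: str, headers: dict[str, str]) -> bool:
    """Simplified check of whether a browser would send an OPTIONS preflight."""
    if method.upper() not in SAFELISTED_METHODS:
        return True  # e.g. PUT or DELETE
    for name, value in headers.items():
        lowered = name.lower()
        if lowered not in SAFELISTED_HEADERS:
            return True  # e.g. Authorization or a custom X-* header
        if lowered == "content-type":
            mime = value.split(";", 1)[0].strip().lower()
            if mime not in SIMPLE_CONTENT_TYPES:
                return True  # e.g. application/json
    return False


print(needs_preflight("GET", {}))                                       # False
print(needs_preflight("POST", {"Content-Type": "text/plain; charset=utf-8"}))  # False
print(needs_preflight("POST", {"Content-Type": "application/json"}))    # True
print(needs_preflight("PUT", {}))                                       # True
print(needs_preflight("GET", {"X-Token": "abc"}))                       # True
```

When a preflight is needed, the browser sends an OPTIONS request with an `Access-Control-Request-Method` header (and `Access-Control-Request-Headers` for non-safelisted headers), and sends the real request only if the server's response allows it.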

TRACE

The TRACE method makes the server echo back the request it received, for testing or diagnosis. The client can use TRACE to find out how the request it sent was modified or tampered with along the way, confirming what happened to it over the connection path (for example, at proxies). However, TRACE makes servers vulnerable to XST (cross-site tracing) attacks.

CONNECT

The CONNECT method opens a two-way communication channel between a client and the requested resource. It can be used to create a tunnel.