Before we talk about HTTP, it’s worth a brief introduction to the Internet and the Web.
The essence of the Internet is to connect computers around the world into a single network based on the IP protocol. On top of that foundation sit the applications: distributed software deployed and run across many machines, which only as a whole can provide a complete service. The essence of those services is the storage, exchange, and sharing of information, for example the World Wide Web (WWW, or simply the Web), DNS, email, file sharing, and Internet telephony. As time went on, however, the Web grew more and more capable, and many of the other applications slowly faded from the stage. We often equate the Internet with the Web because the Web is where we spend most of our time, and even email now largely lives in webmail. Yet the Web was originally conceived as a document sharing and cross-referencing service for research institutions such as CERN, and was far less powerful than the Web we know today.
Complete Web services include three aspects of technology:
- HTTP: defines how documents are transferred between clients and servers.
- URI: defines a unique identifier for a document, that is, where it can be found.
- HTML: defines how documents are written (together with CSS and JavaScript).
The focus here is HTTP, which stands for Hypertext Transfer Protocol. So what is hypertext? The Oxford Dictionary defines hypertext as "text stored in a computer system that contains links that allow the user to move from one piece of text or document to another." Simply put, hypertext is text with links that you can click to jump to other text. Hypertext takes texts that were originally isolated and connects them through links so that they can refer to and cite one another. After all, the Web was invented for scientific research, and research requires access to a large body of literature; with hypertext, citing and following references becomes far more convenient and efficient.
HTTP was originally invented to transfer documents, but developers soon realized it could also carry other resources, such as images and JSON data, so the name HTTP no longer quite matches what it actually does. This is a very common phenomenon on the Internet: because the field is young and moves fast, people coin a new technology or term, and before long its real connotation swells and even drifts away from the original intent; examples abound. So sometimes we should not get hung up on a name. It is enough to know where it came from, understand its historical background, and grasp its connotation and extension. More importantly, we should focus on what it is now and what it will become.
HTTP gained popularity and success because it was designed to be simple (especially its first official release, HTTP/0.9). People like simple things. A simple protocol is easy to implement, so clients and servers could quickly add support for HTTP.
How simple is HTTP/0.9? This request line is all there is: GET /. Yes, no request header fields, no cookies, nothing. There is only the GET method and a path, and it can only transfer documents; other types of resources are not supported at all. HTTP/0.9 does not specify a host, so how does it know which server to ask for the document? Because HTTP/0.9 simply does not care who the host is: it assumes you are already connected to port 80 of that host, so you must make sure you are connected to the target host before sending an HTTP/0.9 message.
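To make the format concrete, here is a minimal sketch in Python of what an HTTP/0.9 exchange looks like on the wire. The host `example.com` is only a placeholder, and most modern servers no longer accept bare HTTP/0.9 requests, so treat this as an illustration of the format rather than a working client.

```python
import socket

# Placeholder host; most modern servers will not speak HTTP/0.9 anymore.
HOST, PORT = "example.com", 80

with socket.create_connection((HOST, PORT)) as sock:
    # The entire request is a single line: the GET method and a path.
    # No version, no headers, no Host; the connection itself identifies the server.
    sock.sendall(b"GET /\r\n")

    # The response is just the document bytes; the server closes the
    # connection when it is done, which is how the client knows the end.
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)

print(b"".join(chunks).decode("utf-8", errors="replace"))
```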
Often the strengths and weaknesses of a thing are one and the same. The minimalist design of HTTP/0.9 slowly fell out of step with the development of the Web. As the Web grew more complex and more documents were transferred, HTTP/0.9's overly simple design could no longer meet increasingly demanding requirements, and it was gradually abandoned by developers. Almost every Web server implemented its own extensions on top of HTTP/0.9. This is another characteristic of the Web: the specification always lags behind the implementation, and the implementation always runs ahead of the specification. The specification is more a summary of real-world implementations than a prior constraint; after all, no one knew how the Web would evolve, much less how fast it would evolve.
HTTP/1.0 soon followed. Compared with HTTP/0.9, it adds the following features:
- More request methods: HEAD and POST
- The request line and the response status line now include the HTTP version; if it is absent, HTTP/0.9 is assumed
- Added request/response header fields
- The response adds a three-digit status code that indicates whether the request succeeded
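Here is a hedged sketch, again over a raw socket, of what these HTTP/1.0 additions look like on the wire: the version in the request line, header fields, and the three-digit status code in the response status line. `example.com` is a placeholder host.

```python
import socket

HOST, PORT = "example.com", 80  # placeholder host for illustration

request = (
    "GET / HTTP/1.0\r\n"           # the request line now carries the version
    "User-Agent: demo-client\r\n"  # header fields are new in HTTP/1.0
    "Accept: text/html\r\n"
    "\r\n"                         # blank line ends the header section
)

with socket.create_connection((HOST, PORT)) as sock:
    sock.sendall(request.encode("ascii"))
    raw = b""
    while True:
        data = sock.recv(4096)
        if not data:
            break
        raw += data

head, _, body = raw.partition(b"\r\n\r\n")
status_line = head.split(b"\r\n", 1)[0]
print(status_line.decode())        # e.g. "HTTP/1.0 200 OK": version, code, reason
```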
To be precise, HTTP/1.0 did not invent new syntax or features for clients and servers to implement; it recorded the capabilities that had already evolved naturally in real-world Web clients and servers. So HTTP/1.0 is not really a formal standard, but more of a memo.
Let’s focus on the POST method. With POST, a request can carry a body, just as a response does, so the client can transmit data to the server.
Furthermore, it is the addition of request/response header fields that gives HTTP the ability to transfer media resources other than documents, such as images, audio, video, and scripts. By specifying the Content-Type of the response body in the response header, the server tells the client what type of resource it is returning.
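As a rough illustration of both points, the sketch below sends a POST request with a JSON body using Python's standard `http.client`, then branches on the response's Content-Type to decide how to interpret the body. `httpbin.org` is used only as a stand-in echo service, not something this article depends on.

```python
import http.client
import json

# Placeholder echo service used purely for demonstration.
conn = http.client.HTTPConnection("httpbin.org")

payload = json.dumps({"title": "hello", "body": "world"})
conn.request(
    "POST", "/post",
    body=payload,
    headers={"Content-Type": "application/json"},  # describes the request body
)

resp = conn.getresponse()
content_type = resp.getheader("Content-Type", "")
data = resp.read()

# The client branches on the declared media type instead of guessing from raw bytes.
if content_type.startswith("application/json"):
    print(json.loads(data))
elif content_type.startswith("text/"):
    print(data.decode())
else:
    print(f"{len(data)} bytes of {content_type}")

conn.close()
```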
Soon, HTTP/1.1 came along, bringing two major changes:
- The Host field must be included in the request header: before HTTP/1.1, a request did not need a Host field, because the client had to establish a TCP connection to the server before sending an HTTP request, and one server corresponded to exactly one website; once you were connected, there was no need to say who the host was. But with the rise of virtual hosting, the same server can host multiple websites, so if the request does not say which host it is for, the server does not know which site should handle it (both changes are sketched after this list).
- Persistent connections: before HTTP/1.1, a TCP connection was established for each request and closed after the response, and the process was repeated for the next request. That design made sense when early requests fetched a single document and few resources were needed. But as web pages grew more complex, fully rendering a page often requires dozens or even hundreds of static resources, so repeatedly setting up and tearing down TCP connections wastes a great deal of time and resources. HTTP/1.1 therefore introduced persistent connections: keeping the connection open is the default behavior (the effect of the Connection: keep-alive header), so the TCP connection is not closed immediately after a response.
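The sketch below illustrates both changes under simplifying assumptions: two requests to a placeholder host `example.com`, each carrying the mandatory Host header, reusing a single TCP connection. The response reader assumes a plain Content-Length body and ignores chunked transfer encoding, so it is a teaching aid rather than a real HTTP client.

```python
import socket

HOST, PORT = "example.com", 80  # placeholder host

def read_response(sock):
    """Read the header block, then exactly Content-Length body bytes.

    Simplification: assumes the server sends Content-Length and does not
    use chunked transfer encoding.
    """
    buf = b""
    while b"\r\n\r\n" not in buf:
        buf += sock.recv(4096)
    head, _, rest = buf.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    body = rest
    while len(body) < length:
        body += sock.recv(4096)
    return head, body[:length]

with socket.create_connection((HOST, PORT)) as sock:
    for path in ("/", "/about"):
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {HOST}\r\n"            # required: selects the site on this server
            "Connection: keep-alive\r\n"   # keep the TCP connection open for the next request
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        head, body = read_response(sock)   # both requests reuse the same TCP connection
        print(head.split(b"\r\n", 1)[0], len(body), "bytes")
```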
However, even with persistent connections, HTTP/1.1 still has a serious performance problem: requests and responses on a connection can only be handled sequentially. A TCP connection carries one request/response pair at a time, and the next request cannot be sent until the response to the previous one has come back. As a result, if the first request is delayed or stuck for any reason, none of the subsequent requests can be sent. This is the head-of-line blocking problem.
So how do we solve this problem? Here are a few ideas:
- Pipelining: send multiple requests without waiting for each response. Unfortunately, for a variety of reasons this technique is not widely supported and is essentially unusable in practice.
- Use multiple TCP connections: browsers typically open at most six connections per domain name. By placing static resources such as images, stylesheets, and scripts on different subdomains (often combined with a CDN), we effectively raise the browser's connection limit to the server. Since the connections are independent of one another, head-of-line blocking does not occur between them. But this scheme has drawbacks: establishing a TCP connection takes time (the three-way handshake: SYN, SYN-ACK, ACK), maintaining connections costs memory and CPU, the more connections there are the greater the pressure on both client and server, and each TCP connection suffers from slow start and consumes bandwidth.
- Reduce the number of requests and carry more data per request: cache static resources, merge resources (e.g. sprite images, CSS concatenation, JavaScript bundling), inline small resources into other files (e.g. images as inline SVG or base64 data URIs), and so on.
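As one concrete example of the last idea, the sketch below inlines a small image as a base64 data URI so it travels inside the HTML instead of costing a separate request. The file `icon.png` is hypothetical; note that inlined bytes cannot be cached separately and grow by roughly a third when base64-encoded.

```python
import base64
from pathlib import Path

def to_data_uri(path: str, mime: str) -> str:
    # Encode the file's bytes as base64 and wrap them in a data URI.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# "icon.png" is a hypothetical local file used for illustration.
icon = to_data_uri("icon.png", "image/png")
html = f'<img src="{icon}" alt="inlined icon">'
print(html[:80] + "...")   # the image now ships inside the HTML itself
```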
Versions of HTTP before HTTP/2 transmit content in a text-based format: what travels over the wire is an ASCII byte stream, and the client/server must decode it into text to understand the content. HTTP/2 transmits content in a binary format: the receiver does not parse text but splits the byte stream directly, knowing what each byte or group of bytes represents (the stream is divided into frames whose fields and semantics are predefined).
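To give a feel for the binary framing, here is a small sketch that parses the fixed 9-byte HTTP/2 frame header described in RFC 7540: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a 31-bit stream identifier (one bit is reserved).

```python
import struct

def parse_frame_header(data: bytes):
    """Parse the 9-byte HTTP/2 frame header (RFC 7540, section 4.1)."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes")
    length = int.from_bytes(data[0:3], "big")        # 24-bit payload length
    frame_type, flags = data[3], data[4]             # one byte each
    stream_id = struct.unpack(">I", data[5:9])[0] & 0x7FFFFFFF  # drop the reserved bit
    return length, frame_type, flags, stream_id

# Example: a SETTINGS frame (type 0x4) with an empty payload on stream 0.
header = bytes([0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00])
print(parse_frame_header(header))   # (0, 4, 0, 0)
```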