🔊 This article is posted on ⭐ CS-Wiki (Gitee recommended project, 0.9k star). Welcome star ~ 😊
0. Foreword
Do you know how a Web page appears when we type a URL into the address bar of a Web browser?
A Web page does not appear out of thin air, of course. Based on the URL entered in the address bar, the browser follows a protocol called HTTP, which specifies the sequence of steps from the client to the server. It can be said that the Web is built on HTTP communication.
1. Birth of HTTP
In fact, in March 1989 the Internet still belonged to only a few people, and information could not yet be shared among the world’s netizens. HTTP was born at this dawn of the Internet.
Dr. Tim Berners-Lee of CERN, the European Organization for Nuclear Research, came up with an idea for sharing knowledge between distant users. The original idea was a World Wide Web (WWW) in which multiple documents are connected to each other by hypertext so that one can be consulted from another.
Three technologies were proposed for building the WWW:
- HTML (HyperText Markup Language), an SGML-based (Standard Generalized Markup Language) text markup language for pages
- HTTP, the document transfer protocol
- URL, which specifies the address of a document
WWW was originally the name of the client application (the browser) used to browse hypertext. Today the name refers to this collection of technologies, or the Web for short.
2. What is HTTP
After all that talk about how impressive HTTP is, you may still lack an intuitive idea of what HTTP actually is. Before answering that, we need to know what hypertext is.
What HTTP transfers is hypertext:
- First, “text”: in the early days of the Internet it meant simple character text, but as technology developed, the meaning of “text” expanded to pictures, videos, compressed archives, and so on. In the eyes of HTTP, all of these count as “text”.
- Then, “hypertext”: text that goes beyond ordinary text. It is a mixture of text, pictures, video, and so on. The key is the hyperlink, the ability to jump from one hypertext to another.
HTML is the most common hypertext. It is itself just a plain text file, but it uses many tags to define links to images, videos, and other content. After the browser interprets it, what we see is a Web page with text and pictures.
OK, so what is HTTP?
HTTP, the HyperText Transfer Protocol, is the most widely used network protocol on the Internet today. All WWW (World Wide Web) documents must comply with this standard. Like many protocols in the TCP/IP protocol suite, HTTP is used for communication between clients and servers.
3. HTTP stands still
The HTTP protocol in wide use today is still a version from more than 20 years ago. In other words, HTTP, the Web's document transfer protocol, has hardly been updated. Seen another way, the wisdom of its designers is impressive 👍
HTTP/0.9: HTTP was introduced in 1990. It was rudimentary, supported only GET requests and could only access resources in HTML format. HTTP was not established as a formal standard at the time and was therefore referred to as HTTP 0.9.
HTTP/1.0: HTTP was officially published as a standard in May 1996, with version number HTTP/1.0. It improved on version 0.9 by adding POST and HEAD requests. It is also no longer limited to HTML as version 0.9 was: multiple data formats can be supported according to the Content-Type… Note that version 1.0 works with short connections. Although HTTP/1.0 is an early standard, it is still widely used today.
HTTP/1.1: Published in 1997, HTTP/1.1 is the current mainstream version of HTTP. HTTP originally emerged mainly to solve the problem of text transmission. Today it has gone beyond the confines of the Web and is used in all kinds of scenarios. The biggest change in version 1.1 is the introduction of long (persistent) connections and pipelining.
Don't worry about the terms mentioned above for now; each of them is explained in the sections that follow.
4. Distinguishing URLs from URIs
You are probably more familiar with the URL (Uniform Resource Locator) than with the URI. A URL is the address you enter when accessing a Web page in a Web browser, such as http://baidu.com.
URI stands for Uniform Resource Identifier. RFC 2396 defines these three words as follows:
- Uniform: a uniform format makes it easy to handle many different types of resources
- Resource: a resource is defined as anything that can be identified. It can be a single item or a collection
- Identifier: an object that identifies something identifiable
In summary, a URI is an identifier that locates a resource under a particular protocol scheme. For example, when HTTP is used, the scheme is http. In addition, there are about 30 standard URI schemes, such as ftp and telnet.
URIs come in two forms, relative URIs and absolute URIs.
- Relative URI: a URI specified relative to a base URI (for example, the page currently open in the browser), such as /user/logo.png
- Absolute URI: contains all the information needed to locate the resource
To sum up: a URI is a string that identifies an Internet resource, while a URL identifies where the resource is (its location on the Internet). URLs are therefore a subset of URIs.
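The relationship between relative and absolute URIs can be sketched with Python's standard `urllib.parse` module. This is only an illustration: the base URI `http://baidu.com/reader/index.html` and the helper names `describe` and `resolve` are assumptions made for the example, not something defined by the article.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URI: the page assumed to be open in the browser.
BASE_URI = "http://baidu.com/reader/index.html"


def describe(uri: str) -> dict:
    """Split an absolute URI into the parts a client needs to locate the resource."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


def resolve(relative_uri: str, base_uri: str = BASE_URI) -> str:
    """Turn a relative URI into an absolute one by resolving it against a base URI."""
    return urljoin(base_uri, relative_uri)


if __name__ == "__main__":
    absolute = resolve("/user/logo.png")
    print(absolute)
    print(describe(absolute))
```

A relative URI such as /user/logo.png carries only a path, so the scheme and host have to come from the base URI. The absolute result contains everything needed on its own.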
5. HTTP request and response
When two computers communicate using the HTTP protocol, one end of the communication line must be the client and the other end must be the server. When you enter a URL into the browser to visit a website, your browser (the client) wraps your request in an HTTP request and sends it to the server. The server receives the request and sends the response data back to the browser as an HTTP response. In other words, communication always starts from the client, and the server does not send a response until it has received a request.
Let’s analyze HTTP request and response messages in detail.
① HTTP request messages
An HTTP request message consists of three parts:
1) Request line (must be the first line of the HTTP request message)
2) Request headers (starting from the second line and ending at the first blank line; a blank line separates the request headers from the request body)
3) Request body (usually passes data as key-value pairs)
Consider an example request message whose request line is POST /form/login HTTP/1.1 and whose body carries name=veal and age=37.
The POST at the beginning of the request line indicates the type of request made to the server, called the method. The string /form/login that follows identifies the requested resource and is known as the request URI (request-URI). The final HTTP/1.1 is the HTTP version number, which tells the server which version of the protocol the client is using.
Note that in both HTTP request and HTTP response messages there is always a blank line between the headers and the body, and that the body itself is optional.
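The three-part layout described above can be sketched in a few lines of Python. This is a simplified illustration, not a full HTTP parser: the helper names `build_request` and `parse_request` and the host `example.com` are assumptions made for the example.

```python
CRLF = "\r\n"  # HTTP separates lines with carriage return + line feed


def build_request(method, target, headers, body=""):
    """Assemble request line + header lines + blank line + optional body."""
    request_line = f"{method} {target} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join([request_line, *header_lines, "", body])


def parse_request(message):
    """Split a request message back into its three parts."""
    # The first blank line marks the end of the headers.
    head, _, body = message.partition(CRLF + CRLF)
    request_line, *header_lines = head.split(CRLF)
    method, target, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return {"method": method, "target": target, "version": version,
            "headers": headers, "body": body}


if __name__ == "__main__":
    body = "name=veal&age=37"
    message = build_request(
        "POST", "/form/login",
        {"Host": "example.com",  # hypothetical host
         "Content-Type": "application/x-www-form-urlencoded",
         "Content-Length": str(len(body))},
        body,
    )
    print(message)
    print(parse_request(message))
```

Because the body is optional, a request built with an empty body simply ends right after the blank line.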
HTTP request methods
The method in the request line specifies the behavior expected of the requested resource. In other words, the client uses methods to give commands to the server.
The methods are GET, POST, PUT, HEAD, DELETE, OPTIONS, CONNECT, and TRACE. Of course, the first three are the ones we use most often in development.
1) GET: obtain a resource
The GET method requests access to the resource identified by the URI. The server processes the specified resource and returns the response content.
An example of a request and response using the GET method:
2) POST: transfer an entity body
POST is used to send data to the server in the request body, whereas the main purpose of GET is to obtain resources.
An example of a request and response using the POST method:
3) PUT: transfer a file
The PUT method is used to transfer files. The method itself provides no authentication mechanism, so anyone could upload files. Because of this security issue, ordinary websites generally do not enable it unless it is combined with the application's own authentication.
An example of a request and response using the PUT method:
4) HEAD: obtain the message headers
HEAD is like GET, but the response does not include the entity body. It is used to check the validity of a URI and the date and time a resource was last updated.
An example of a request and response using the HEAD method:
5) DELETE: delete a file
The opposite of PUT: it deletes the resource specified by the request URI. Like PUT, it has no authentication mechanism of its own.
An example of a request and response using the DELETE method:
6) OPTIONS: query the supported methods
OPTIONS is used to find out which methods the resource at the current URI supports. If the request succeeds, the response headers contain a field named “Allow” whose value lists the supported methods, such as “GET, POST”.
An example of a request and response using the OPTIONS method:
7)…
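To see how GET, HEAD, and OPTIONS differ on the wire, the sketch below starts a throwaway server on the local machine with Python's standard `http.server` and queries it with `http.client`. Everything specific here is an assumption for illustration: the `DemoHandler` class, the `PAGE` content, the path `/index.html`, and the `Allow` value the server chooses to advertise.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<h1>hello</h1>"  # hypothetical resource returned for every path


class DemoHandler(BaseHTTPRequestHandler):
    def _send_page_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_page_headers()
        self.wfile.write(PAGE)  # GET returns headers and body

    def do_HEAD(self):
        self._send_page_headers()  # HEAD returns the same headers, no body

    def do_OPTIONS(self):
        self.send_response(200)
        self.send_header("Allow", "GET, HEAD, OPTIONS")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_demo():
    """Send GET, HEAD and OPTIONS to a local server; return status, Allow, body."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method in ("GET", "HEAD", "OPTIONS"):
            conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
            conn.request(method, "/index.html")
            response = conn.getresponse()
            results[method] = (response.status,
                               response.getheader("Allow"),
                               response.read())
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results


if __name__ == "__main__":
    for method, result in run_demo().items():
        print(method, result)
```

The GET entry contains the page bytes, the HEAD entry has the same status with an empty body, and only the OPTIONS entry carries an Allow value.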
HTTP request headers
Request headers supplement the request with additional information, client information, preferences regarding the response content, and so on. Common headers are listed below:
1) Referer: indicates the URI of the page from which the request was made. For example, when you reach Taobao.com through a Baidu search, the Referer value in the request sent to Taobao.com is www.baidu.com. If you access the site directly, this header is absent. This field is often used to prevent hotlinking.
2) Accept: tells the server which types of response data the client can handle. (Content-Type, by contrast, indicates the type of the data actually sent. If the server cannot produce any of the types listed in Accept, it may report an error, for example 406 Not Acceptable.)
In an Accept value such as text/plain; q=0.3, the q=0.3 part means that the text/plain media type has a weight (priority) of 0.3. q ranges from 0 to 1, and if no weight is specified the default is 1.0.
The data format types are shown below:
3) Host: tells the server the Internet host name and port number of the requested resource. It is the only header field that the HTTP/1.1 specification requires in every request.
4) Cookie: carries the client's cookies to the server, for example:
Cookie: JSESSIONID=15982C27F7507C7FDAF0F97161F634B5
5) Connection: indicates the type of connection between the client and the server. keep-alive means the connection is persistent, and close means the connection is closed after the response
6) Content-Length: specifies the length of the request body
7) Accept-Language: tells the server which languages the browser supports
8) Range: used for range requests that need only part of a resource. The Range header field tells the server which range of the resource is requested
9)…
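As a small illustration of how the q weights in the Accept header can be read, here is a minimal sketch. The function name `parse_accept` and the sample header value are assumptions for the example, and the parsing is deliberately simplified (no quoted parameters, no specificity rules).

```python
def parse_accept(header_value):
    """Return (media_type, q) pairs sorted from most to least preferred."""
    preferences = []
    for item in header_value.split(","):
        media_type, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        preferences.append((media_type, q))
    # sorted() is stable, so entries with equal weight keep their original order
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = "text/html, application/xml;q=0.9, text/plain; q=0.3, */*;q=0.8"
    for media_type, q in parse_accept(sample):
        print(media_type, q)
```

With this sample value, text/html comes first because it falls back to the default weight of 1.0, and text/plain comes last with 0.3.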
② HTTP response messages
An HTTP response message also consists of three parts:
- Response line, also called the status line (must be the first line of the HTTP response message)
- Response headers (starting from the second line and ending at the first blank line; a blank line separates the response headers from the response body)
- Response body
In a response line such as HTTP/1.1 200 OK, the leading HTTP/1.1 indicates the HTTP version used by the server. The 200 OK that follows is the status code and reason phrase describing the result of processing the request.
HTTP status codes
The HTTP status code represents the result returned for the client's HTTP request: it indicates whether the server processed the request normally or reports the error that occurred. (This is a top priority and closely tied to our daily development.)
The status code consists of three digits, the first of which defines the category of the response:
Code | Category | Reason phrase |
---|---|---|
1xx | Informational (informational status code) | The received request is being processed |
2xx | Success (success status code) | The request was processed successfully |
3xx | Redirection (redirection status code) | Additional action is required to complete the request |
4xx | Client Error (client error status code) | The server cannot process the request |
5xx | Server Error (server error status code) | The server failed to process the request |
🔶 2xx: The request was processed successfully
- 200 OK: The client request succeeded.
- 204 No Content: The server processed the request successfully but returns no content in the response. This is typically used when the client only needs to send information to the server and nothing new needs to be sent back. The page displayed by the browser is not refreshed.
- 206 Partial Content: The server has fulfilled a partial GET request (the client made a range request). The response message contains the entity content for the range specified by Content-Range.
🔶 3xx: Additional action is required to complete the request (redirection)
- 301 Moved Permanently: Permanent redirection. The requested resource has been permanently moved to another location.
- 302 Found: Temporary redirection. The requested resource has been temporarily moved to another location.
- 303 See Other: Temporary redirection. It works like 302, except that 303 explicitly states that the client should use GET to obtain the resource.
- 304 Not Modified: The client sent a conditional request (a GET request carrying an If-* header such as If-Modified-Since) and the condition was not met. A 304 response contains no response body. Although 304 is classified under 3xx, it has nothing to do with redirection.
- 307 Temporary Redirect: Temporary redirection with the same meaning as 302, except that a POST is not changed into a GET.
🔶 4xx: A client error occurred
- 400 Bad Request: The client request has a syntax error that the server cannot understand.
- 401 Unauthorized: The request is not authorized. This status code must be used together with the WWW-Authenticate header field.
- 403 Forbidden: The server received the request but refuses to serve it.
- 404 Not Found: The requested resource does not exist, for example because the wrong URL was entered.
- 415 Unsupported Media Type: The server does not support the media type of the request.
🔶 5xx: A server error occurred. The server failed to fulfill a valid request.
- 500 Internal Server Error: An unexpected error occurred on the server.
- 503 Service Unavailable: The server is overloaded or down for maintenance and temporarily cannot process client requests. It may recover after a period of time.
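The rule that the first digit of a status code decides its category can be sketched as follows. The helper names are assumptions for the example. The standard reason phrases come from `http.HTTPStatus` in Python's standard library.

```python
from http import HTTPStatus

# First digit of the status code -> response category (from the table above).
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Split e.g. 'HTTP/1.1 200 OK' into version, numeric code and reason phrase."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def category(code):
    """The first digit of the three-digit status code selects the category."""
    return CATEGORIES[code // 100]


def standard_phrase(code):
    """Look up the standard reason phrase in Python's built-in table."""
    return HTTPStatus(code).phrase


if __name__ == "__main__":
    for line in ("HTTP/1.1 200 OK", "HTTP/1.1 404 Not Found"):
        version, code, reason = parse_status_line(line)
        print(version, code, reason, "->", category(code))
    print(standard_phrase(503))
```

Clients usually branch on the category first and only then look at the specific code, which is why an unfamiliar code such as 418 can still be handled as a client error.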
HTTP response headers
Response headers also use key: value pairs. They supplement the response with additional information, server information, and additional requirements for the client.
One field worth highlighting is Location, which directs the recipient of the response to a resource at a location other than the request URI. Typically, this field is used together with a 3xx Redirection response to provide the URI to redirect to.
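How a client might combine the status code with the Location header can be sketched like this. The function `next_request` and the `example.com` URIs are hypothetical. Treating a redirected POST as a GET for 301 and 302 mirrors what many clients do in practice and is an assumption here, not a requirement stated by the article.

```python
from urllib.parse import urljoin

# Redirection status codes from the list above that carry a Location header.
REDIRECT_CODES = {301, 302, 303, 307}


def next_request(method, request_uri, status, location):
    """Return the (method, uri) to request next, or None if there is no redirect."""
    if status not in REDIRECT_CODES or not location:
        return None
    # Location may be a relative URI, so resolve it against the request URI.
    target = urljoin(request_uri, location)
    if status == 303:
        method = "GET"  # 303 tells the client to fetch the target with GET
    elif status in (301, 302) and method == "POST":
        method = "GET"  # assumption: common client behaviour, not mandated
    # 307 keeps the original method, so a POST stays a POST
    return method, target


if __name__ == "__main__":
    print(next_request("POST", "http://example.com/form/login", 303, "/done"))
    print(next_request("POST", "http://example.com/form/login", 307, "/v2/login"))
    print(next_request("GET", "http://example.com/index.html", 200, None))
```

The only difference between the 303 and 307 calls above is the method of the follow-up request, which is exactly the distinction drawn in the status code list.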
6. HTTP connection management
① Short connection (non-persistent connection)
In the original version of the HTTP protocol (HTTP/1.0), the client and server established a connection for each HTTP operation and closed it when the task ended. When the browser loads an HTML page or other Web page that references further Web resources (such as JavaScript files, image files, and CSS files), it establishes a new HTTP session each time it encounters such a resource. This is called a short connection (also called a non-persistent connection).
This means that a connection must be re-established for every HTTP request. Because HTTP runs on top of TCP/IP, each connection that is set up and torn down requires the TCP three-way handshake and the TCP four-way teardown.
Obviously, this approach has serious drawbacks. For example, when you access an HTML page containing multiple images, every image request causes an unnecessary TCP connection setup and teardown, which greatly increases the communication overhead.
② Long connection (persistent connection)
Since HTTP/1.1, connections are persistent (keep-alive) by default. With a long connection, HTTP adds this line to the response headers: Connection: keep-alive
With a long connection, the TCP connection used to carry HTTP data between the client and the server is not closed after a Web page has been opened. When the client accesses the server again, it keeps using the established connection. Keep-alive does not hold a connection open forever. It has a hold time that can be configured in server software such as Apache. Persistent connections work only if both the client and the server support them.
HTTP long and short connections are essentially TCP long and short connections.
③ Pipelining
By default, HTTP requests are made sequentially: the next request is sent only after the response to the current one has been received. Because of network latency and bandwidth constraints, it can take a long time before the next request reaches the server.
Persistent connections make pipelining possible for most requests: requests are sent back to back over the same persistent connection without waiting for each response to return. Multiple requests can thus be in flight at the same time instead of waiting for one response after another.
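A back-of-the-envelope model makes the difference between the three connection styles concrete. It is deliberately crude and rests on stated assumptions: opening a TCP connection costs one round trip (the three-way handshake), each request and response pair costs one round trip, and connection teardown, bandwidth, and server processing time are ignored. The function name `round_trips` is made up for the example.

```python
def round_trips(resources, mode):
    """Estimate the network round trips needed to fetch `resources` items.

    Simplifying assumptions: 1 round trip per TCP handshake, 1 round trip
    per request/response, everything else ignored.
    """
    if resources == 0:
        return 0
    if mode == "short":       # a new connection for every request
        return resources * (1 + 1)
    if mode == "persistent":  # one connection, requests sent one after another
        return 1 + resources
    if mode == "pipelined":   # one connection, all requests sent back to back
        return 1 + 1
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Example: an HTML page plus 9 images = 10 resources.
    for mode in ("short", "persistent", "pipelined"):
        print(mode, round_trips(10, mode))
```

Under this model a page with 10 resources needs 20 round trips over short connections, 11 over one persistent connection, and 2 with ideal pipelining. Real numbers depend on many factors the model leaves out.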
7. Stateless HTTP
HTTP is a stateless protocol. That is, it does not keep track of previous requests and responses, so it cannot process the current request based on the state of earlier ones.
This leads to an obvious problem. If HTTP cannot remember the user’s login status, won’t the user have to log in again every time they move to another page?
Of course, there is no denying that statelessness has significant advantages: because the server does not have to keep state, its CPU and memory consumption is naturally reduced. Moreover, it is precisely because HTTP is so simple that it is used so widely.
So the problem caused by statelessness needs to be solved while the protocol itself stays stateless. There are many solutions, and a simple one is to use cookies.
Cookies control client state by writing cookie information into request and response messages. Specifically, a header field called Set-Cookie in the response sent by the server tells the client to save the cookie. The next time the client sends a request to that server, it automatically adds the cookie value to the request. When the server receives the cookie, it checks which client sent the request and compares it with its own records to recover the previous state.
To use an analogy: after the client’s first request, the server issues an ID card carrying the client’s information. When the client makes later requests with that card, the server recognizes it.
The following shows the messages exchanged when cookies are used:
1) Request without Cookie information:
Corresponding HTTP request message (no Cookie information yet)
GET /reader/ HTTP/1.1
Host: baidu.com
* The header fields contain no Cookie information
Corresponding HTTP response message (the server generates the Cookie information)
HTTP/1.1 200 OK
Date: Thu, 12 Jul 2020 15:12:20 GMT
Server: Apache
Set-Cookie: sid=1342077140226; path=/; expires=Wed, 10-Oct-12 15:12:20 GMT
Content-Type: text/plain; charset=UTF-8
2) Second and subsequent requests (with Cookie information)
Corresponding HTTP request message (the saved Cookie information is sent automatically)
GET /image/ HTTP/1.1
Host: baidu.com
Cookie: sid=1342077140226
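The exchange above can be sketched with `http.cookies.SimpleCookie` from Python's standard library. This is a simplified illustration: the three helper functions and the plain dictionary used as a cookie jar are assumptions for the example, and attributes such as expiry and path matching are ignored on the client side.

```python
from http.cookies import SimpleCookie


def server_issue_cookie(session_id):
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["path"] = "/"
    return cookie["sid"].OutputString()


def client_store(set_cookie_value, jar):
    """Client side: remember the name/value pairs from a Set-Cookie header."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    for name, morsel in parsed.items():
        jar[name] = morsel.value


def client_cookie_header(jar):
    """Client side: build the Cookie request header for the next request."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


if __name__ == "__main__":
    jar = {}  # hypothetical in-memory cookie jar
    set_cookie = server_issue_cookie("1342077140226")
    print("Set-Cookie:", set_cookie)              # first response
    client_store(set_cookie, jar)
    print("Cookie:", client_cookie_header(jar))   # second request
```

The protocol itself stays stateless. The state lives in the cookie jar on the client and in whatever records the server keeps under the sid value.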
8. HTTP resumable downloads (breakpoint resume)
A resumable download means that a file transfer can be interrupted and later resumed from where it left off, without having to start again from the beginning. It requires support from both the client and the server.
This is a very common feature, and its principle is simple: it relies on the Range field in the HTTP request header and the Content-Range field in the response header. The client requests the data piece by piece and finally splices the downloaded pieces into the complete file. For example, when a browser requests a file from a server, it sends the following request:
Assume that the server domain name is www.baidu.com and the file name is down.zip.
GET /down.zip HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Connection: keep-alive
After receiving the request, the server looks for the requested file as required, extracts the information of the file, and then returns the following information to the browser:
200
Content-Length=106786028
Accept-Ranges=bytes
Date=Mon, 30 Apr 2001 12:56:11 GMT
ETag=W/"02CA57E173C11:95B"
Content-Type=application/octet-stream
Server=Microsoft-IIS/5.0
Last-Modified=Mon, 30 Apr 2001 12:56:11 GMT
OK, so to resume a download, the client has to send the server one additional piece of information: where to start. For example, suppose we want to start from byte 2000070:
GET /down.zip HTTP/1.0
User-Agent: NetFox
RANGE: bytes=2000070-
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
If you look closely, you’ll notice the extra line RANGE: bytes=2000070-. This line tells the server to send down.zip starting from byte 2000070. The bytes before that do not need to be sent.
When the server receives this request, it returns the following message:
206
Content-Length=106786028
Content-Range=bytes 2000070-106786027/106786028
Date=Mon, 30 Apr 2001 12:55:20 GMT
ETag=W/"02CA57E173C11:95B"
Content-Type=application/octet-stream
Server=Microsoft-IIS/5.0
Last-Modified=Mon, 30 Apr 2001 12:55:20 GMT
Compare this with the information the server returned earlier and you will see an added line: Content-Range=bytes 2000070-106786027/106786028. The status code returned has also changed from 200 to 206.
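The Range and Content-Range mechanics can be sketched without any network code. In this simplified model the "file" is just a bytes object held in memory, only the single-range forms bytes=start- and bytes=start-end are understood, and anything else falls back to a full 200 response (a real server would typically answer an unsatisfiable range with 416). The names `serve_range` and `resume_download` are made up for the example.

```python
def serve_range(data, range_header=None):
    """Return (status, headers, body) for a GET with an optional Range header."""
    total = len(data)
    if range_header and range_header.startswith("bytes="):
        start_text, _, end_text = range_header[len("bytes="):].partition("-")
        if start_text.isdigit() and (end_text == "" or end_text.isdigit()):
            start = int(start_text)
            # An open-ended range ("start-") runs to the last byte.
            end = min(int(end_text), total - 1) if end_text else total - 1
            if start <= end:
                body = data[start:end + 1]
                headers = {
                    "Content-Range": f"bytes {start}-{end}/{total}",
                    "Content-Length": str(len(body)),
                }
                return 206, headers, body
    # No usable Range header: send the whole resource.
    headers = {"Accept-Ranges": "bytes", "Content-Length": str(total)}
    return 200, headers, data


def resume_download(data, already_have):
    """Client side: request the bytes after those already downloaded, then splice."""
    status, _, rest = serve_range(data, f"bytes={len(already_have)}-")
    return already_have + rest if status == 206 else rest


if __name__ == "__main__":
    resource = b"0123456789"  # stand-in for down.zip
    print(serve_range(resource, "bytes=4-"))
    print(resume_download(resource, b"0123"))
```

Note that the byte positions in Content-Range are inclusive and zero-based, which is why a 10-byte resource requested from byte 4 is reported as bytes 4-9/10.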
9. Disadvantages of HTTP
So far, we have seen the good and convenient side of HTTP. However, there are two sides to everything, and HTTP also has its drawbacks:
- Communication is in clear text (not encrypted), so the content can be eavesdropped on
- The identity of the other party is not verified, so impersonation is possible
- The integrity of a message cannot be proved, so it may have been tampered with
These problems occur not only in HTTP but also in other unencrypted protocols. HTTPS was created to address these pain points. Put simply, HTTP + encryption + authentication + integrity protection = HTTPS. There is a lot of important material about HTTPS, which will be covered in a separate article.
- The blogger is a postgraduate student at Southeast University who runs the public account “Flying Veal” in her spare time. The account opened on 2020/12/29. It focuses on original technical articles about computer fundamentals (data structures, algorithms, computer networks, databases, operating systems, Linux), Java basics, and interview guides. The aim of the account is to help you grasp key knowledge quickly and in a targeted way. I hope you will support me and grow together with Veal 😃