An HTTP article well worth reading, covering both the basics and the details

Foreword

HTTP is short for HyperText Transfer Protocol. It is used to transfer hypertext from a World Wide Web server to the local browser. HTTP is a TCP/IP-based communication protocol for transferring data (HTML files, image files, query results, and so on). This article introduces HTTP step by step. It also contains detailed tables of header fields that you can use as a reference manual.

Basic concepts

URI

URIs include URLs and URNs.

Request and response messages

The request message

The response message

HTTP methods

The first line of the request message sent by the client contains the method field.

GET: obtains a resource. The vast majority of requests on today's networks use the GET method.
HEAD: obtains the message headers. It is similar to GET but does not return the entity body. It is mainly used to check that a URL is valid and when a resource was last updated.
POST: transfers an entity body. POST is mainly used to send data, whereas GET is mainly used to obtain resources.
PUT: uploads a file. The method itself has no authentication mechanism, so anyone could upload files. Because of this security problem it is generally not used.
PATCH: partially modifies a resource. PUT can also be used to modify a resource, but it can only replace the original resource completely. PATCH allows partial modification.
DELETE: deletes a file. It is the opposite of PUT and likewise has no authentication mechanism.
OPTIONS: queries the methods supported by the specified URL. The response contains a header such as Allow: GET, POST, HEAD, OPTIONS.
CONNECT: asks a proxy server to establish a tunnel. The communication is encrypted with the SSL (Secure Sockets Layer) or TLS (Transport Layer Security) protocol and then transmitted through the tunnel.
TRACE: traces the request path. The server returns the communication path to the client. When sending the request, the client fills in the Max-Forwards header field with a value that is reduced by 1 each time the request passes through a server. Forwarding stops when the value reaches 0. TRACE is rarely used, and it is vulnerable to XST (cross-site tracing) attacks.

HTTP status codes

The first line of the response message returned by the server is the status line. It contains the status code and a reason phrase that inform the client of the result of the request.

Status code	category	meaning
1XX	Informational (Informational status code)	The received request is being processed
2XX	Success (Success Status code)	The request is successfully processed
3XX	Redirection (Redirection status code)	Additional action is required to complete the request
4XX	Client Error (Client Error status code)	The server cannot process the request
5XX	Server Error	The server failed to process the request

1XX Informational
- 100 Continue: everything is fine so far. The client can continue sending the request, or ignore this response.
2XX Success
- 200 OK
- 204 No Content: the request was processed successfully, but the response contains no entity body. Typically used when the client only needs to send information to the server and no data needs to be returned.
- 206 Partial Content: the client sent a range request, and the response contains the part of the entity specified by Content-Range.
3XX Redirection
- 301 Moved Permanently: permanent redirection.
- 302 Found: temporary redirection.
- 303 See Other: provides the same function as 302, but 303 explicitly requires the client to use the GET method to obtain the resource.
- Note: although the HTTP specification originally did not allow POST to be changed to GET on 301 and 302 redirects, most browsers change POST to GET on 301, 302, and 303 redirects.
- 304 Not Modified: if the request contains conditional header fields, such as If-Match, If-Modified-Since, If-None-Match, If-Range, or If-Unmodified-Since, and the condition is not met, the server returns the 304 status code.
- 307 Temporary Redirect: temporary redirection, similar to 302, but the browser must not change POST to GET when following the redirect.
4XX Client error
- 400 Bad Request: the request message contains a syntax error.
- 401 Unauthorized: the request requires authentication information (BASIC or DIGEST authentication). If the request was already sent with credentials, user authentication failed.
- 403 Forbidden: the request was rejected.
- 404 Not Found
5XX Server error
- 500 Internal Server Error: an error occurred while the server was processing the request.
- 503 Service Unavailable: the server is temporarily overloaded or down for maintenance and cannot process requests right now.
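
The category table above can be turned into a small lookup. This is only a sketch: the function name status_category is made up for illustration, and it relies on nothing more than the first digit of the code.

```python
def status_category(code: int) -> str:
    """Return the category of an HTTP status code based on its first digit."""
    categories = {
        1: "Informational",
        2: "Success",
        3: "Redirection",
        4: "Client Error",
        5: "Server Error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return categories[code // 100]
```

For example, status_category(204) falls in the 2XX row and status_category(503) in the 5XX row.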

HTTP headers

There are four types of header fields: general header fields, request header fields, response header fields, and entity header fields. The header fields and their meanings are listed below (there is no need to memorize them all; they are for reference only):

General header fields

Header field name	Description
Cache-Control	Controls caching behavior
Connection	Controls header fields that are not forwarded by proxies, and manages persistent connections
Date	Date and time when the message was created
Pragma	Message directives
Trailer	Lists the header fields at the end of the message
Transfer-Encoding	Specifies the transfer encoding of the message body
Upgrade	Upgrade to another protocol
Via	Proxy server information
Warning	Error notification

Request header fields

Header field name	Description
Accept	Media types that the user agent can handle
Accept-Charset	Preferred character sets
Accept-Encoding	Preferred content encodings
Accept-Language	Preferred (natural) languages
Authorization	Web authentication information
Expect	Expects specific behavior from the server
From	Email address of the user
Host	Server on which the requested resource resides
If-Match	Compares entity tags (ETag)
If-Modified-Since	Compares the update time of the resource
If-None-Match	Compares entity tags (the opposite of If-Match)
If-Range	Sends a byte range request for the entity if the resource has not been updated
If-Unmodified-Since	Compares the update time of the resource (the opposite of If-Modified-Since)
Max-Forwards	Maximum number of hops
Proxy-Authorization	Authentication information that the client sends to the proxy server
Range	Byte range request for the entity
Referer	URI of the page from which the request originated
TE	Preferred transfer encodings
User-Agent	Information about the HTTP client program

Response header fields

Header field name	Description
Accept-Ranges	Whether byte range requests are accepted
Age	Estimated time elapsed since the resource was created
ETag	Matching information for the resource
Location	Redirects the client to the specified URI
Proxy-Authenticate	Authentication information that the proxy server requests from the client
Retry-After	When the client should send the request again
Server	Information about the HTTP server software
Vary	Cache management information for proxy servers
WWW-Authenticate	Authentication information that the server requests from the client

Entity header fields

Header field name	Description
Allow	HTTP methods supported by the resource
Content-Encoding	Encoding applied to the entity body
Content-Language	Natural language of the entity body
Content-Length	Size of the entity body
Content-Location	Alternative URI for the resource
Content-MD5	Message digest of the entity body
Content-Range	Position range of the entity body
Content-Type	Media type of the entity body
Expires	Date and time when the entity body expires
Last-Modified	Date and time when the resource was last modified

Specific applications

Connection management

Short and long connections: when a browser loads an HTML page that contains multiple images, it requests the image resources in addition to the HTML page itself. Creating a new TCP connection for every HTTP exchange would be expensive. A long connection needs only one TCP connection to carry multiple HTTP exchanges.
- Since HTTP/1.1, connections are long by default. To close the connection, the client or server sends Connection: close.
- Before HTTP/1.1, connections were short by default. To use a long connection, Connection: keep-alive is sent.
Pipelining: by default, HTTP requests are sent sequentially, and the next request is sent only after the response to the current one has been received. Because of network latency and bandwidth limits, it can take a long time before the next request reaches the server. Pipelining reduces latency by sending requests back to back over the same long connection without waiting for each response.
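
The two default rules for long connections can be sketched as a small decision function. The name is_persistent and the plain dict of headers are assumptions made for this illustration, not part of any library.

```python
def is_persistent(http_version: str, headers: dict) -> bool:
    """Decide whether the connection stays open after this message.

    HTTP/1.1: long connection unless "Connection: close" is present.
    Earlier versions: short connection unless "Connection: keep-alive" is present.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    tokens = {
        token.strip().lower()
        for token in lowered.get("connection", "").split(",")
        if token.strip()
    }
    if "close" in tokens:
        return False
    if http_version == "HTTP/1.1":
        return True
    return "keep-alive" in tokens
```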

Cookie

The HTTP protocol is stateless, mainly to keep HTTP as simple as possible so that it can handle a large number of transactions. Cookies were introduced to store state information.

A Cookie is a small piece of data that the server sends to the user's browser and that the browser saves locally. The browser sends it back with later requests to the same server, which lets the server tell whether two requests come from the same browser. Because every subsequent request carries the Cookie data, there is additional performance overhead (especially in a mobile environment).

Cookies were once the only way to store data on the client because no other suitable storage mechanism existed. Now that modern browsers support a variety of storage mechanisms, cookies are gradually being phased out for this purpose. New browser APIs allow developers to store data locally, for example with the Web Storage API (local storage and session storage) or IndexedDB.

Uses

Session state management (such as user login status, shopping cart, game score, or other information that needs to be recorded)
Personalization (such as user-defined settings and themes)
Browser behavior tracking (such as tracking and analyzing user behavior)

Creation process

The response message sent by the server contains the Set-Cookie header field. After receiving the response, the client saves the Cookie content in the browser.

When the client later sends a request to the same server, it reads the Cookie information from the browser and sends it to the server in the Cookie request header field.
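
The exchange can be sketched with Python's standard http.cookies module. The cookie name theme and its value are made up for illustration, and the browser side is reduced to a single string join.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie response header.
response_cookie = SimpleCookie()
response_cookie["theme"] = "dark"        # hypothetical cookie name and value
response_cookie["theme"]["path"] = "/"
set_cookie_header = response_cookie.output()

# Client side: the browser stores name=value and sends it back later
# in the Cookie request header (attributes such as Path are not sent back).
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in response_cookie.items()
)

# Server side again: parse the Cookie request header.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header)
theme = request_cookie["theme"].value
```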

Classification

Session Cookie: deleted automatically when the browser is closed, that is, it is valid only during the session.
Persistent Cookie: a Cookie becomes persistent when an expiration time (Expires) or a validity period (Max-Age) is specified.

Scope

The Domain attribute specifies which hosts can accept the Cookie. If it is not specified, it defaults to the host of the current document (excluding subdomains). If Domain is specified, subdomains are generally included. For example, if you set Domain=mozilla.org, the Cookie is also sent to subdomains (such as developer.mozilla.org).
The Path attribute specifies which paths under the host can accept the Cookie (the path must be present in the request URL). Child paths also match, with the character %x2F ("/") as the path separator. For example, if Path=/docs is set, the following addresses all match:
- /docs
- /docs/Web/
- /docs/Web/HTTP
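
The Path rule can be written as a short check. The function below is a sketch of the matching described above, with a made-up name, not the code any particular browser uses.

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """Return True if a cookie with the given Path applies to the request path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end at a "/" boundary, so /docs does not match /docsets.
        if cookie_path.endswith("/"):
            return True
        if request_path[len(cookie_path)] == "/":
            return True
    return False
```

With cookie_path set to /docs, the three addresses listed above match, while a path such as /docsets does not.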

JavaScript

Through the document.cookie property, the browser can create new cookies and access cookies that are not marked HttpOnly.

HttpOnly

Cookies marked HttpOnly cannot be accessed by JavaScript. Cross-site scripting (XSS) attacks often use the JavaScript document.cookie API to steal users' Cookie information, so the HttpOnly flag can protect against XSS attacks to some extent.

Secure

Cookies marked Secure can be sent to the server only in requests encrypted with the HTTPS protocol. Even with the Secure flag set, sensitive information should not be transmitted in cookies, because cookies are inherently insecure and the Secure flag does not provide a real guarantee of security.

Session

Besides storing user information in the browser through cookies, you can use a Session to store it on the server, which is more secure. Sessions can be stored in files, databases, or memory on the server. They can also be stored in an in-memory database such as Redis, which is more efficient. The process of using a Session to maintain user login status is as follows:

When logging in, the user submits a form containing the user name and password, which is sent in an HTTP request message.
The server verifies the user name and password and, if they are correct, stores the user information in Redis. Its key in Redis is called the Session ID.
The Set-Cookie header field of the response returned by the server contains the Session ID. After receiving the response, the client saves the Cookie value in the browser.
When the client later sends requests to the same server, they include the Cookie value. After receiving it, the server extracts the Session ID, uses it to fetch the user information from Redis, and continues the previous business operations.
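
The login flow above can be sketched in a few lines. Everything here is an assumption made for illustration: a plain dict stands in for Redis, the users table with a plaintext password is a toy (real systems store password hashes), and the cookie name session_id is made up.

```python
import secrets
from http.cookies import SimpleCookie

session_store = {}            # stand-in for Redis: Session ID -> user information
users = {"alice": "s3cret"}   # hypothetical user table, for illustration only


def login(username, password):
    """Verify the credentials and return a Set-Cookie header value, or None."""
    if users.get(username) != password:
        return None
    session_id = secrets.token_urlsafe(32)   # hard-to-guess Session ID
    session_store[session_id] = {"username": username}
    return f"session_id={session_id}; HttpOnly; Path=/"


def current_user(cookie_header):
    """Fetch the user information for the Session ID in a Cookie request header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return session_store.get(morsel.value)
```

The browser would send back only the name=value part of the Set-Cookie value, so current_user receives something like session_id=... in the Cookie header.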

Note that the Session ID must not be easy for an attacker to obtain, so do not generate Session ID values that are easy to guess. Session IDs also need to be regenerated frequently. In scenarios with high security requirements, such as transferring money, managing user state with a Session is not enough: the user must also be re-authenticated, for example by entering the password again or by using an SMS verification code.

When the browser disables cookies

In this case, cookies cannot be used to save user information, and only a Session can be used. The Session ID can no longer be stored in a Cookie either. Instead, URL rewriting is used to pass the Session ID as a URL parameter.

Choosing between Cookie and Session

Cookies can store only ASCII strings, while a Session can store any type of data, so Session is preferred when data complexity matters;
Cookies are stored in the browser and can be viewed by others. If you must store private data in a Cookie, encrypt the Cookie value and decrypt it on the server;
For large websites, storing all user information in Sessions is very expensive, so it is not recommended to put all user information in a Session.

Caching

Advantages

Relieves server load;
Reduces the latency of obtaining resources for the client: caches are usually held in memory and can be read quickly. A cache, such as the browser cache, may also be geographically closer than the origin server.

Implementation methods

Let a proxy server cache;
Let the client browser cache.

Cache-Control

HTTP/1.1 controls caching through the Cache-Control header field.

Disable caching: the no-store directive states that no part of the request or response may be cached.
Force cache validation: the no-cache directive states that the cache server must first verify the validity of the cached resource with the origin server, and may use the cached resource to respond to the client only if it is still valid.
Private and public caches: the private directive marks the resource as a private cache entry that can be used only by a single user and is typically stored in the user's browser. The public directive marks the resource as a shared cache entry that can be used by multiple users and is typically stored in a proxy server.
Cache expiration: when the max-age directive appears in a request, the cache accepts a cached resource whose age is less than the specified time. When the max-age directive appears in a response, it indicates how long the resource may be kept in the cache server. The Expires header field can also be used to tell the cache server when the resource expires. In HTTP/1.1, the max-age directive takes precedence; in HTTP/1.0, the max-age directive is ignored.
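
How a cache might combine these directives can be sketched as follows. The function names and the plain dict of response headers are assumptions for illustration. Treating no-cache as a freshness lifetime of zero (always revalidate) is a simplification of this sketch, and the Date header is used as the reference time for Expires.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control value into a dict of directive -> argument (or None)."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers):
    """Return how many seconds a response may be reused, or None if unknown."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return 0                      # never reuse without asking the origin server
    if cc.get("max-age") is not None:
        return int(cc["max-age"])     # max-age takes precedence over Expires
    if "Expires" in headers and "Date" in headers:
        delta = parsedate_to_datetime(headers["Expires"]) - parsedate_to_datetime(headers["Date"])
        return max(0, int(delta.total_seconds()))
    return None
```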

Cache validation

First you need to understand the ETag header field, which is a unique identifier for a resource. A URL cannot uniquely identify a resource. For example, www.google.com/ has two resources, a Chinese one and an English one, and only the ETag can tell them apart.

The client can put the ETag value of a cached resource into the If-None-Match header field. After receiving the request, the server compares the ETag value of the cached resource with the ETag value of the latest version of the resource. If they are the same, the cached resource is still valid and the server returns 304 Not Modified.

The Last-Modified header field can also be used for cache validation. It is included in the response sent by the origin server and indicates when the origin server last modified the resource. It is a weak validator, however, because it is accurate only to one second, so it is usually used as a fallback for ETag. If the response contains this header field, the client can send If-Modified-Since in subsequent requests to validate the cache. The server returns the requested resource, with status code 200 OK, only if the content has been modified after the given date and time. If the resource has not been modified since then, the server returns a 304 Not Modified response with no entity body.
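
The server-side decision between 200 and 304 can be sketched like this. The function name validate and the way the current ETag and Last-Modified values are passed in are assumptions for illustration; ETags are compared as exact strings.

```python
from email.utils import parsedate_to_datetime


def validate(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, otherwise 200.

    etag and last_modified describe the current version on the server.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates or "*" in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # Modified after the given time: send the full resource again.
        return 304 if parsedate_to_datetime(last_modified) <= since else 200

    return 200
```

If-None-Match is checked first here, which mirrors the role of Last-Modified as the fallback validator.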

Content negotiation

Content negotiation returns the most appropriate content, for example choosing between the Chinese and the English version of a page based on the browser's default language.

Types

Server-driven: the client sets specific HTTP header fields, such as Accept, Accept-Charset, Accept-Encoding, and Accept-Language, and the server returns a specific resource based on these fields. It has the following problems:
- It is difficult for the server to know everything about the client's browser;
- The information provided by the client is rather verbose (HTTP/2 header compression alleviates this problem) and carries a privacy risk (HTTP fingerprinting);
- Because a given resource must be returned in different representations, shared caches become less efficient and server-side implementations become more complex.
Agent-driven: the server returns 300 Multiple Choices or 406 Not Acceptable, and the client selects the most suitable resource from the choices.

Vary

When content negotiation is used, a cached response in the cache server can be used only if it satisfies the negotiation conditions; otherwise the resource must be requested from the origin server. For example, a client sends a request containing the Accept-Language header field, and the origin server returns a response containing Vary: Accept-Language. After the cache server caches this response, it returns the cached copy the next time a client requests the same URL only if the Accept-Language value of that request matches the value stored with the cached response.
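
One way to picture the effect of Vary is a cache whose key includes the values of the listed request headers. The class below is a toy sketch with made-up names, not the behavior of any specific cache server.

```python
class VaryCache:
    """Toy cache that keys entries on the URL plus the headers named in Vary."""

    def __init__(self):
        self._vary = {}      # url -> sorted list of header names from Vary
        self._entries = {}   # (url, header values) -> cached body

    def _key(self, url, request_headers):
        lowered = {name.lower(): value for name, value in request_headers.items()}
        names = self._vary.get(url, [])
        return (url, tuple(lowered.get(name, "") for name in names))

    def store(self, url, request_headers, vary, body):
        self._vary[url] = sorted(
            name.strip().lower() for name in vary.split(",") if name.strip()
        )
        self._entries[self._key(url, request_headers)] = body

    def lookup(self, url, request_headers):
        """Return the cached body, or None if the origin server must be asked."""
        return self._entries.get(self._key(url, request_headers))
```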

Content encoding

Content encoding compresses the entity body to reduce the amount of data transferred.

The commonly used content encodings are gzip, compress, deflate, and identity.

The browser sends the Accept-Encoding header, which lists the compression algorithms it supports and their priorities. The server chooses one of them, compresses the body of the response with that algorithm, and sends the Content-Encoding header to tell the browser which algorithm it chose. Because this content negotiation selects the representation of the resource based on the encoding, the Vary header field of the response must contain at least Accept-Encoding.
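
The negotiation can be sketched with Python's gzip module. The helper names are made up, only gzip and identity are assumed to be supported by this imaginary server, and the q-value parsing is deliberately minimal.

```python
import gzip


def choose_encoding(accept_encoding, supported=("gzip", "identity")):
    """Pick the supported encoding with the highest q-value (identity as fallback)."""
    best, best_q = "identity", 0.0
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        params = params.strip()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if name in supported and q > best_q:
            best, best_q = name, q
    return best


def encode_body(body, accept_encoding):
    """Return (response headers, possibly compressed body)."""
    headers = {"Vary": "Accept-Encoding"}
    if choose_encoding(accept_encoding) == "gzip":
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body
```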

Range requests

If the network is interrupted and the server has sent only part of the data, a range request lets the client request only the part that was not sent, so that the server does not have to resend everything.

Range

Add the Range header field to the request message to specify the requested range.

If the request succeeds, the server returns a response with the 206 Partial Content status code.

Accept-Ranges

The Accept-Ranges response header field tells the client whether the server can handle range requests: the value is bytes if it can, and none otherwise.

Response status codes

If the request succeeds, the server returns the 206 Partial Content status code.
If the requested range is out of bounds, the server returns the 416 Requested Range Not Satisfiable status code.
If range requests are not supported, the server returns the 200 OK status code.
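
The three outcomes can be sketched for the simplest case, a single range of the form bytes=start-end. The function name handle_range is an assumption for illustration, and other range forms (suffix ranges, multiple ranges) are simply treated as unsupported here.

```python
import re


def handle_range(range_header, body):
    """Return (status, headers, payload) for a request with an optional Range header."""
    if range_header is None:
        return 200, {"Accept-Ranges": "bytes"}, body

    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if match is None:
        # Form not handled by this sketch: ignore Range and send everything.
        return 200, {"Accept-Ranges": "bytes"}, body

    start = int(match.group(1))
    if start >= len(body):
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""

    end = int(match.group(2)) if match.group(2) else len(body) - 1
    if end < start:
        # Malformed range: fall back to the full response.
        return 200, {"Accept-Ranges": "bytes"}, body
    end = min(end, len(body) - 1)

    headers = {"Content-Range": f"bytes {start}-{end}/{len(body)}"}
    return 206, headers, body[start:end + 1]
```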

Chunked transfer encoding

Chunked Transfer Encoding splits the data into chunks, which lets the browser display the page progressively.
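
On the wire, each chunk is sent as its size in hexadecimal, a CRLF, the chunk data, and another CRLF, and a chunk of size 0 ends the body. A minimal sketch of both directions, with made-up function names and without trailer support:

```python
def encode_chunked(chunks):
    """Encode an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would end the body early
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk, no trailers
    return bytes(out)


def decode_chunked(data):
    """Reassemble the body from chunked data (trailers are ignored)."""
    body, pos = bytearray(), 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return bytes(body)
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF after the chunk data
```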

Multipart object collections

A message body can contain several entities of different types that are sent together. The parts are separated by the delimiter defined by the boundary parameter, and each part can have its own header fields. For example, multiple form fields can be uploaded in a single request this way.
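
A multipart/form-data body for a few text fields might be assembled as in the sketch below. The function name and the field names in the usage note are made up, and file parts (which carry a filename and their own Content-Type) are left out.

```python
import uuid


def build_multipart(fields):
    """Return (Content-Type header value, body bytes) for simple text form fields."""
    boundary = uuid.uuid4().hex  # random delimiter unlikely to appear in the data
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")          # blank line between part headers and part body
        lines.append(value)
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body
```

Calling build_multipart({"name": "alice", "city": "Paris"}) produces two parts, each introduced by the boundary line and carrying its own Content-Disposition header.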

Virtual host

HTTP/1.1 uses virtual hosting technology to allow a single server to have multiple domain names and be logically treated as multiple servers.

Communication data forwarding

Proxy

A proxy server accepts requests from clients and forwards them to other servers. The main purposes of using a proxy are:

Caching
Load balancing
Network access control
Access logging

Proxy servers are divided into forward proxies and reverse proxies:

With a forward proxy, the user is aware that the proxy exists.
A reverse proxy is usually located on an internal network and is invisible to users.

Gateway

Unlike a proxy server, a gateway server converts HTTP into other protocols, so that it can request services from non-HTTP servers.

Tunnel

A tunnel uses encryption such as SSL to establish a secure communication line between the client and the server.

HTTPS

HTTP has the following security issues:

Communication is in plaintext and can be intercepted;
The identity of the other party is not verified, so it can be forged;
The integrity of a message cannot be proved, so it may have been tampered with.

HTTPS is not a new protocol. Instead, HTTP first talks to SSL (Secure Sockets Layer), and SSL then talks to TCP. In other words, HTTPS communicates through a tunnel. By using SSL, HTTPS provides encryption (against eavesdropping), authentication (against impersonation), and integrity protection (against tampering).

Encryption

Symmetric key encryption

Symmetric-key encryption uses the same key for encryption and decryption.

Advantage: fast;
Disadvantage: the key cannot be transferred securely to the other party.

Asymmetric key encryption

Asymmetric-key encryption, also known as public-key encryption, uses different keys for encryption and decryption.

The public key is available to everyone. After obtaining the receiver's public key, the sender encrypts the communication with it, and the receiver decrypts the communication with the private key.

Besides encryption, asymmetric keys can also be used for signing. Because nobody else can obtain the private key, the sender signs with its private key, and the receiver uses the sender's public key to verify that the signature is correct.

Advantage: the public key can be transferred to the sender more securely;
Disadvantage: slow.

HTTPS encryption mode

As described above, symmetric-key encryption is more efficient, but the secret key cannot be transferred securely to the other party. Asymmetric-key encryption can transfer data securely, so it can be used to transfer the secret key. HTTPS uses a hybrid encryption mechanism that combines the two:

Asymmetric-key encryption is used to transfer the secret key needed for symmetric-key encryption, which ensures security.
After the secret key has been obtained, symmetric-key encryption is used for the communication itself, which ensures efficiency. (The Session Key in the figure below is the Secret Key.)

Authentication

The other party is authenticated by means of a certificate. A digital Certificate Authority (CA) is a third-party organization trusted by both the client and the server. The operator of the server submits its public key to the CA. After verifying the applicant's identity, the CA digitally signs the submitted public key and issues it bound into a public key certificate.

During HTTPS communication, the server sends the certificate to the client. After obtaining the public key in it, the client verifies the digital signature. If the verification succeeds, communication can begin.

Integrity protection

SSL provides a message digest function to protect integrity. HTTP also provides an MD5 message digest, but it is not secure: if an attacker tampers with the content and then recalculates the MD5 value, the receiver cannot detect the tampering.

The message digest function of HTTPS is secure because it is combined with encryption and authentication. If an encrypted message is tampered with, it is difficult to recalculate the message digest because the plaintext is not readily available.
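
The difference between a plain digest and a keyed one can be illustrated with Python's hashlib and hmac modules. This is only an analogy under stated assumptions: the messages and the key are made up, and the code is not the record protection that SSL/TLS actually performs.

```python
import hashlib
import hmac

message = b"transfer 100 to alice"
tampered = b"transfer 900 to mallory"

# Plain MD5 digest: an attacker who changes the message can simply
# recompute the digest, so the receiver's check still passes.
forged_digest = hashlib.md5(tampered).hexdigest()
plain_check_passes = hashlib.md5(tampered).hexdigest() == forged_digest

# Keyed digest (HMAC): without the shared secret key the attacker
# cannot produce a tag that the receiver will accept.
key = b"shared-secret-key"  # hypothetical key known only to both parties
attacker_tag = hmac.new(b"attacker-guess", tampered, hashlib.sha256).digest()
expected_tag = hmac.new(key, tampered, hashlib.sha256).digest()
tampering_detected = not hmac.compare_digest(attacker_tag, expected_tag)
```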

Disadvantages of HTTPS

Encryption and decryption are required, so it is slower;
Certificates can require a high authorization fee.

HTTP/2.0

HTTP/1.x defects

The simplicity of the HTTP/1.x implementation comes at the expense of performance:

Clients need to use multiple connections to achieve concurrency and reduce latency;
Request and response headers are not compressed, resulting in unnecessary network traffic;
Effective resource prioritization is not supported, resulting in low utilization of the underlying TCP connection.

Binary framing layer

HTTP/2.0 divides messages into HEADERS frames and DATA frames, both of which are in binary format.

During communication, only one TCP connection exists, which carries any number of bidirectional streams.

A Stream has a unique identifier and optional priority information, and carries bidirectional messages.
A Message is a complete sequence of frames corresponding to a logical request or response.
A Frame is the smallest unit of communication. Frames from different streams can be interleaved when sent and then reassembled according to the stream identifier in each frame header.
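
Every HTTP/2 frame starts with a 9-byte header: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. The sketch below packs and unpacks that header. The function names are made up, and only the DATA (0x0) and HEADERS (0x1) types mentioned above are named.

```python
import struct

FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS"}


def pack_frame(frame_type, flags, stream_id, payload):
    """Build one frame: 9-byte header followed by the payload."""
    length = len(payload)
    if length >= 1 << 24:
        raise ValueError("payload too large for a 24-bit length field")
    header = struct.pack(">I", length)[1:]                 # 24-bit length
    header += struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def unpack_frame(data):
    """Parse one frame and return (type name, flags, stream id, payload)."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", data[3:9])
    stream_id &= 0x7FFFFFFF                                # drop the reserved bit
    return FRAME_TYPES.get(frame_type, "UNKNOWN"), flags, stream_id, data[9:9 + length]
```

A receiver can use the stream identifier returned by unpack_frame to put interleaved frames back into their streams.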

Server push

With HTTP/2.0, when a client requests a resource, the server can send related resources along with it, so that the client does not need to request them separately. For example, when a client requests page.html, the server also sends related resources such as script.js and style.css.

Header compression

In HTTP/1.1, headers carry a lot of information and are sent again with every request. HTTP/2.0 requires both the client and the server to maintain and update a table of previously seen header fields, which avoids repeated transfers. In addition, HTTP/2.0 uses Huffman coding to compress header fields.

HTTP/1.1 new features

See above for details.

Long connections by default
Support for pipelining
Multiple TCP connections can be opened at the same time
Support for virtual hosts
New status code 100
Support for chunked transfer encoding
New cache directive max-age

Comparing GET and POST

Purpose

GET is used to obtain resources, and POST is used to transfer an entity body.

Parameters

Both GET and POST requests can carry additional parameters, but GET parameters appear in the URL as a query string, while POST parameters are stored in the entity body. Storing the parameters in the entity body does not make POST more secure, because they can still be viewed with a packet capture tool such as Fiddler.

Because URLs support only ASCII characters, Chinese characters in GET parameters must be encoded first. For example, 中文 is converted to %E4%B8%AD%E6%96%87, and a space is converted to %20. POST parameters support the standard character set.
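
The encoding can be reproduced with Python's urllib.parse module. The parameter name q is made up for illustration.

```python
from urllib.parse import quote, unquote, urlencode

encoded_word = quote("中文")     # UTF-8 bytes, each written as %XX
encoded_space = quote(" ")

# Build a query string for a GET request; quote_via=quote encodes the
# space as %20 instead of the "+" that urlencode uses by default.
query = urlencode({"q": "中文 HTTP"}, quote_via=quote)

decoded = unquote(encoded_word)
```

Here encoded_word is %E4%B8%AD%E6%96%87 and encoded_space is %20, matching the conversions described above.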

Safety

A safe HTTP method does not change the state of the server; in other words, it is read-only.

The GET method is safe, but the POST method is not, because the purpose of POST is to transfer entity body content, which may be form data uploaded by the user. Once the upload succeeds, the server may store this data in a database, which changes the state.

Other safe methods besides GET are HEAD and OPTIONS.

Besides POST, the unsafe methods include PUT and DELETE.

Idempotence

For an idempotent HTTP method, executing the same request once has the same effect as executing it several times in a row: the state of the server is the same. In other words, idempotent methods should not have side effects (except for statistics). All safe methods are idempotent. When implemented correctly, GET, HEAD, PUT, and DELETE are idempotent, while POST is not.

GET /pageX HTTP/1.1 is idempotent: the client receives the same result for multiple consecutive calls.

POST /add_row HTTP/1.1 is not idempotent: if it is called more than once, multiple rows are added.

DELETE /idX/delete HTTP/1.1 is idempotent, even though successive requests may receive different status codes.
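
The three cases can be mimicked with a toy in-memory resource. The class RowStore and its method names are made up for illustration; each method returns the status code a server might send.

```python
class RowStore:
    """Toy server state used to compare POST, PUT, and DELETE."""

    def __init__(self):
        self.rows = {}
        self._next_id = 1

    def post_add_row(self, data):
        """Not idempotent: every call adds another row."""
        row_id = self._next_id
        self._next_id += 1
        self.rows[row_id] = data
        return 201

    def put_row(self, row_id, data):
        """Idempotent: repeating the call leaves the same state."""
        self.rows[row_id] = data
        return 200

    def delete_row(self, row_id):
        """Idempotent: the status code differs, but the state does not."""
        if row_id in self.rows:
            del self.rows[row_id]
            return 200
        return 404
```

Calling post_add_row twice leaves two rows, while calling delete_row twice returns 200 and then 404 with the same final state.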

Cacheability

For a response to be cached, the following conditions must be met:

The HTTP method of the request is itself cacheable. GET and HEAD are cacheable, PUT and DELETE are not, and POST is not cacheable in most cases.
The status code of the response is cacheable. Cacheable status codes include 200, 203, 204, 206, 300, 301, 404, 405, 410, 414, and 501.
The Cache-Control header field of the response does not forbid caching.
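
The three conditions can be combined into one check. This is a sketch with a made-up function name: POST is treated as never cacheable for simplicity, and only the no-store directive is taken to forbid caching.

```python
CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUS = {200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501}


def is_cacheable(method, status, response_headers):
    """Return True if all three conditions listed above are satisfied."""
    cache_control = response_headers.get("Cache-Control", "").lower()
    directives = {
        part.strip().split("=")[0] for part in cache_control.split(",") if part.strip()
    }
    return (
        method.upper() in CACHEABLE_METHODS
        and status in CACHEABLE_STATUS
        and "no-store" not in directives
    )
```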

XMLHttpRequest

To illustrate another difference between POST and GET, you first need to understand XMLHttpRequest. XMLHttpRequest is an API that lets a client transfer data between itself and the server. It provides an easy way to fetch data from a URL without refreshing the entire page, so a web page can update only part of its content without disturbing the user. XMLHttpRequest is widely used in AJAX.

When the POST method is used with XMLHttpRequest, the browser sends the header first and then the data. Not all browsers do this, however; Firefox, for example, does not.
With the GET method, the header and the data are sent together.