Preface
HTTP is short for HyperText Transfer Protocol. It is used to transfer hypertext from a World Wide Web server to the local browser. HTTP is a communication protocol built on TCP/IP for transferring data (HTML files, image files, query results, and so on). This article introduces HTTP step by step; its tables of status codes and header fields can also be used as a reference manual.
Basic concept
URI
A URI (Uniform Resource Identifier) covers both URLs (Uniform Resource Locators) and URNs (Uniform Resource Names).
Request and response messages
The request message
The response message
HTTP method
The first line of the request packet sent by the client contains the method field.
- GET: obtains a resource. The vast majority of requests on the web today use the GET method.
- HEAD: obtains the message headers. It works like GET but does not return the message body. It is mainly used to check the validity of a URL and the date and time a resource was last updated.
- POST: transfers an entity body. POST is mainly used to send data, whereas GET is mainly used to obtain resources.
- PUT: uploads a file. The method itself carries no authentication mechanism, so anyone could upload files; because of this security problem it is generally not used without additional protection.
- PATCH: partially modifies a resource. PUT can also modify a resource, but only by completely replacing the original. PATCH allows partial modification.
- DELETE: deletes a file. It is the opposite of PUT and likewise carries no authentication mechanism.
- OPTIONS: queries the methods supported by the specified URL. The response contains a header such as `Allow: GET, POST, HEAD, OPTIONS`.
- CONNECT: asks a proxy server to establish a tunnel. The communication is encrypted with the SSL (Secure Sockets Layer) or TLS (Transport Layer Security) protocol and then carried through the tunnel.
- TRACE: traces the path. The server returns the communication path to the client. When the request is sent, the `Max-Forwards` header field is filled with a number that is decremented by 1 each time the request passes through a server; forwarding stops when the value reaches 0. TRACE is rarely used, and it is vulnerable to XST (Cross-Site Tracing) attacks.
The HTTP status code
The first line of the response message returned by the server is the status line. It contains the status code and a reason phrase that tell the client the result of the request.
Status code | Category | Meaning |
---|---|---|
1XX | Informational | The received request is being processed |
2XX | Success | The request was processed successfully |
3XX | Redirection | Additional action is required to complete the request |
4XX | Client Error | The server cannot process the request |
5XX | Server Error | The server failed while processing the request |
- 1XX Informational
  - 100 Continue: everything is fine so far; the client can continue sending the request, or ignore this response.
- 2XX Success
  - 200 OK
  - 204 No Content: the request was processed successfully, but the response message contains no entity body. It is typically used when the client only needs to send information to the server and no data has to be returned.
  - 206 Partial Content: the client sent a range request, and the response message contains the part of the entity specified by Content-Range.
- 3XX Redirection
  - 301 Moved Permanently: permanent redirection.
  - 302 Found: temporary redirection.
  - 303 See Other: same function as 302, but 303 explicitly requires the client to use the GET method to obtain the resource.
  - Note: although the HTTP specification originally did not allow the method to change when following a 301 or 302 redirect, most browsers change POST to GET for 301, 302, and 303 redirects.
  - 304 Not Modified: request headers can carry conditions such as If-Match, If-Modified-Since, If-None-Match, If-Range, and If-Unmodified-Since. For a conditional GET (typically with If-None-Match or If-Modified-Since) whose condition is not met because the resource is unchanged, the server returns 304 with no entity body.
  - 307 Temporary Redirect: temporary redirection like 302, but the browser must not change POST to GET when following the redirect.
- 4XX Client Error
  - 400 Bad Request: the request message contains a syntax error.
  - 401 Unauthorized: the request requires authentication information (BASIC or DIGEST authentication). If the request has already been made once, it means that user authentication failed.
  - 403 Forbidden: the request was rejected.
  - 404 Not Found
- 5XX Server Error
  - 500 Internal Server Error: an error occurred while the server was processing the request.
  - 503 Service Unavailable: the server is temporarily overloaded or down for maintenance and cannot process requests right now.
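The mapping from the first digit of a status code to its category can be sketched in a few lines of Python. This is only an illustration: the `CATEGORIES` table and the `describe` helper are names made up for this example, while `http.HTTPStatus` comes from the Python standard library.

```python
from http import HTTPStatus

# First digit of the status code -> category, as in the table above.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe(code: int) -> str:
    """Return 'code reason-phrase (category)' for a status code."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Code is not registered in the standard library's table.
        phrase = "Unregistered"
    return f"{code} {phrase} ({category})"
```

A client that does not recognize a specific code can still fall back on the category given by its first digit.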
The HTTP header
There are four types of header fields: general header fields, request header fields, response header fields, and entity header fields. The header fields and their meanings are listed below (there is no need to memorize them; they are for reference):
General header fields
Header field name | Description |
---|---|
Cache-Control | Controls caching behavior |
Connection | Lists header fields that must not be forwarded by proxies, and manages persistent connections |
Date | Date and time when the message was created |
Pragma | Message directives |
Trailer | Lists the header fields present in the trailer at the end of the message |
Transfer-Encoding | Specifies the transfer coding of the message body |
Upgrade | Upgrades to another protocol |
Via | Proxy server information |
Warning | Error notification |
Request header fields
Header field name | Description |
---|---|
Accept | Media types the user agent can handle |
Accept-Charset | Preferred character sets |
Accept-Encoding | Preferred content encodings |
Accept-Language | Preferred (natural) languages |
Authorization | Web authentication information |
Expect | Expects specific behavior from the server |
From | Email address of the user |
Host | The server hosting the requested resource |
If-Match | Compares entity tags (ETag) |
If-Modified-Since | Compares the update time of the resource |
If-None-Match | Compares entity tags (the opposite of If-Match) |
If-Range | Sends a byte-range request for the entity if the resource has not been updated |
If-Unmodified-Since | Compares the update time of the resource (the opposite of If-Modified-Since) |
Max-Forwards | Maximum number of hops |
Proxy-Authorization | Authentication information the client sends to the proxy server |
Range | Byte-range request for the entity |
Referer | URI of the page from which the request originated |
TE | Preferred transfer codings |
User-Agent | Information about the HTTP client program |
Response header fields
Header field name | Description |
---|---|
Accept-Ranges | Whether byte-range requests are accepted |
Age | Estimated time elapsed since the resource was created |
ETag | Matching information (entity tag) for the resource |
Location | Redirects the client to the specified URI |
Proxy-Authenticate | Authentication information the proxy server requires from the client |
Retry-After | When the request should be made again |
Server | Information about the HTTP server software |
Vary | Cache management information for proxy servers |
WWW-Authenticate | Authentication information the server requires from the client |
Entity header fields
Header field name | Description |
---|---|
Allow | HTTP methods supported by the resource |
Content-Encoding | Encoding applied to the entity body |
Content-Language | Natural language of the entity body |
Content-Length | Size of the entity body |
Content-Location | Alternative URI for the corresponding resource |
Content-MD5 | Message digest of the entity body |
Content-Range | Position range of the entity body |
Content-Type | Media type of the entity body |
Expires | Date and time when the entity body expires |
Last-Modified | Date and time the resource was last modified |
Practical applications
Connection management
- Short-lived and persistent connections: when a browser loads an HTML page that contains several images, it requests the image resources in addition to the HTML page itself. Creating a new TCP connection for every HTTP exchange would be expensive. A persistent (long) connection needs only one TCP connection to carry multiple HTTP exchanges.
  - Since HTTP/1.1, connections are persistent by default. To close the connection, the client or the server sends `Connection: close`.
  - Before HTTP/1.1, connections were short-lived by default. A persistent connection had to be requested with `Connection: keep-alive`.
- Pipelining: by default, HTTP requests are issued sequentially; the next request is sent only after the response to the current one has been received. Because of network latency and bandwidth limits, a long time can pass before the next request reaches the server. Pipelining sends requests back to back over the same persistent connection without waiting for each response, which reduces latency.
Cookie
HTTP is stateless, mainly to keep the protocol as simple as possible so that it can handle a large number of transactions. HTTP/1.1 introduced cookies to store state information.
A cookie is a small piece of data that the server sends to the user's browser, which saves it locally. The browser sends it back with later requests to the same server, which lets the server tell whether two requests come from the same browser. Because every subsequent request carries the cookie data, cookies add performance overhead (especially on mobile networks).
Cookies were once the only way to store data on the client because no other suitable mechanism existed. Now that modern browsers support several storage mechanisms, cookies are gradually being phased out for that purpose. Newer browser APIs let developers store data locally, for example with the Web Storage API (local storage and session storage) or IndexedDB.
Uses
- Session state management (such as login status, shopping carts, game scores, or other information that needs to be recorded)
- Personalization (such as user-defined settings and themes)
- Browser behavior tracking (such as tracking and analyzing user behavior)
Creation process
The response message sent by the server contains the Set-Cookie header field. After receiving the response, the client saves the cookie in the browser.
When the client later sends a request to the same server, it reads the cookie from the browser and sends it to the server in the Cookie request header field.
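The two header fields involved in this exchange can be sketched with Python's standard `http.cookies` module. The helper names `build_set_cookie` and `parse_cookie_header`, and the cookie name `sid` used in the note after the block, are assumptions made for illustration only.

```python
from http.cookies import SimpleCookie

def build_set_cookie(name, value, max_age=None):
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["httponly"] = True      # hide the cookie from document.cookie
    if max_age is not None:
        cookie[name]["max-age"] = max_age  # makes the cookie persistent
    return cookie[name].OutputString()

def parse_cookie_header(header):
    """Server side: parse the Cookie request header sent back by the browser."""
    cookie = SimpleCookie()
    cookie.load(header)
    return {key: morsel.value for key, morsel in cookie.items()}
```

For instance, `build_set_cookie("sid", "abc123", max_age=3600)` yields a header value that starts with `sid=abc123` and carries the `Max-Age`, `Path`, and `HttpOnly` attributes; on later requests the browser sends only the `name=value` pairs, which `parse_cookie_header` turns back into a dictionary.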
Classification
- Session cookie: deleted automatically when the browser is closed, so it is valid only for the duration of the session.
- Persistent cookie: a cookie becomes persistent when an expiration time (Expires) or a validity period (Max-Age) is specified.
Scope
- The Domain attribute specifies which hosts can receive the cookie. If it is not specified, it defaults to the host of the current document (excluding subdomains). If Domain is specified, subdomains are generally included. For example, with `Domain=mozilla.org`, the cookie is also sent to subdomains such as developer.mozilla.org.
- The Path attribute specifies which paths on the host can receive the cookie (the path must be present in the request URL). Subpaths also match, with the character %x2F ("/") as the path separator. For example, with `Path=/docs`, the following paths all match:
  - /docs
  - /docs/Web/
  - /docs/Web/HTTP
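The Path rule above can be written as a small predicate. This is a minimal sketch of the path-matching idea (exact match, or a prefix that ends at a "/" boundary); `path_matches` is a name invented for this example, not a browser API.

```python
def path_matches(cookie_path: str, request_path: str) -> bool:
    """Return True if a cookie with the given Path applies to request_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end at a "/" boundary, so /docs matches
        # /docs/Web but not /docsets.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False
```

The boundary check is the reason `Path=/docs` covers `/docs/Web/HTTP` but would not cover an unrelated path such as `/docsets`.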
JavaScript
In the browser, the `document.cookie` property can be used to create new cookies and to access cookies that are not marked HttpOnly.
HttpOnly
Cookies marked HttpOnly cannot be accessed from JavaScript. Cross-site scripting (XSS) attacks often use the JavaScript `document.cookie` API to steal users' cookies, so the HttpOnly flag offers some protection against XSS.
Secure
Cookies marked Secure are sent to the server only over requests encrypted with HTTPS. Even with the Secure flag set, sensitive information should not be transmitted in cookies, because cookies are inherently insecure and the Secure flag does not guarantee security.
Session
Besides storing user information in the browser with cookies, a Session can be used to store the information on the server, which is more secure. Sessions can be stored in files, a database, or memory on the server. They can also be stored in an in-memory database such as Redis, which is more efficient. The process of using a Session to maintain a user's login state is as follows:
- When logging in, the user submits a form containing the user name and password, which is sent in an HTTP request message.
- The server verifies the user name and password and, if they are correct, stores the user information in Redis. Its key in Redis is called the Session ID.
- The Set-Cookie header field of the response message returned by the server contains the Session ID. After receiving the response, the client saves the cookie value in the browser.
- The client includes the cookie value in later requests to the same server. After receiving it, the server extracts the Session ID and uses it to fetch the user information from Redis, then continues the earlier business operation.
Take care that the Session ID cannot easily be obtained by an attacker: it must not be an easily guessed value, and it should be regenerated frequently. In scenarios with high security requirements, such as money transfers, managing user state with a Session is not enough on its own; the user should also be re-authenticated, for example by re-entering the password or entering an SMS verification code.
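The login flow above can be sketched as follows. This is a simplified illustration under several assumptions: a plain dictionary (`SESSIONS`) stands in for Redis, the cookie is named `session_id`, passwords are compared in plain text (a real system would store and compare password hashes), and the function names are invented for this example. The unguessable Session ID comes from the standard `secrets` module.

```python
import secrets

SESSIONS = {}  # stand-in for Redis: Session ID -> user information

def login(username, password, users):
    """Verify credentials and return a new Session ID, or None on failure."""
    if users.get(username) != password:   # simplified check, see lead-in
        return None
    session_id = secrets.token_urlsafe(32)  # hard-to-guess random value
    SESSIONS[session_id] = {"user": username}
    return session_id                       # sent back via Set-Cookie

def current_user(cookie_header):
    """Extract the Session ID from a Cookie header and look up the user."""
    for part in cookie_header.split(";"):
        name, _, value = part.strip().partition("=")
        if name == "session_id":
            return SESSIONS.get(value, {}).get("user")
    return None
```

Only the random identifier travels in the cookie; the user information itself stays on the server.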
When the browser disables cookies
In this case cookies cannot be used to store user information; only a Session can be used. The Session ID can no longer be stored in a cookie either. Instead, URL rewriting is used to pass the Session ID as a URL parameter.
Choosing between Cookie and Session
- Cookies can store only ASCII strings, while a Session can store data of any type, so a Session is the first choice when the data is complex.
- Cookies are stored in the browser and can be viewed maliciously. If private data must be stored in a cookie, the cookie value can be encrypted and then decrypted on the server;
- For large websites, storing all user information in Sessions is very expensive, so it is not recommended to put all user information there.
Caching
Advantages
- Relieves pressure on the server;
- Reduces the latency the client sees when obtaining resources: caches usually sit in memory and are faster to read, and a cache (a cache server, or the browser's own cache) may be geographically closer than the origin server.
Implementation
- Let a proxy server cache;
- Let the client browser cache.
Cache-Control
HTTP/1.1 controls caching through the Cache-Control header field.
- Disabling caching: the `no-store` directive states that no part of the request or response may be cached.
- Forcing revalidation: the `no-cache` directive states that the cache server must first validate the cached resource with the origin server, and may use it to answer the client's request only if it is still valid.
- Private and public caches: the `private` directive marks the resource as a private cache entry that can be used only by a single user and is typically stored in the user's browser. The `public` directive marks the resource as a shared cache entry that can be used by multiple users and is typically stored in a proxy server.
- Cache expiration: when the `max-age` directive appears in a request message, a cached resource is accepted if it has been cached for less than the specified time. When `max-age` appears in a response message, it states how long the resource may be kept in the cache server. The `Expires` header field can also tell the cache server when the resource expires. In HTTP/1.1 the `max-age` directive takes precedence; in HTTP/1.0 the `max-age` directive is ignored.
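As a rough illustration of how a cache might act on these directives, the sketch below parses a Cache-Control value and decides whether a stored response can be reused without contacting the origin server. It covers only `no-store`, `no-cache`, and `max-age`; the function names are hypothetical, and real caches apply many more rules.

```python
def parse_cache_control(value):
    """Parse 'public, max-age=3600' into {'public': True, 'max-age': '3600'}."""
    directives = {}
    for item in value.split(","):
        name, _, arg = item.strip().partition("=")
        if name:
            # Directives without an argument are stored as True.
            directives[name.lower()] = arg.strip('"') or True
    return directives

def can_reuse_without_revalidation(response_cache_control, age_seconds):
    """True if a stored response is still fresh and needs no revalidation."""
    directives = parse_cache_control(response_cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if not isinstance(max_age, str) or not max_age.isdigit():
        return False  # no usable max-age: be conservative in this sketch
    return age_seconds < int(max_age)
```

Note that `no-cache` does not forbid storing the response; it only forces validation before reuse, which is why it is handled separately from freshness.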
Cache validation
First, the meaning of the `ETag` header field: it is a unique identifier for a resource. A URL cannot uniquely identify a resource; for example, `www.google.com/` has two resources, a Chinese one and an English one, and only the ETag can tell them apart.
The `ETag` value of a cached resource can be placed in the `If-None-Match` header field of a request. After receiving the request, the server compares that value with the `ETag` of the latest version of the resource. If they are the same, the cached resource is still valid and the server returns `304 Not Modified`.
The `Last-Modified` header field can also be used for cache validation. It is included in the response message sent by the origin server and indicates when the origin server last modified the resource. It is a weak validator because it is accurate only to one second, so it is usually used as a fallback for `ETag`. If the response contains this header field, the client can send its value in the `If-Modified-Since` header field of later requests to validate the cache. The server returns the requested resource, with status code `200 OK`, only if the content was modified after the given date and time. If the resource has not been modified since then, the server returns a `304 Not Modified` response with no entity body.
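The `ETag` / `If-None-Match` exchange can be sketched as a small server-side function. Deriving the tag from a hash of the body is only one possible choice and is an assumption of this example, as are the names `make_etag` and `respond`.

```python
import hashlib
from typing import Optional

def make_etag(body: bytes) -> str:
    """Derive an entity tag from the content (one possible strategy)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a possibly conditional GET."""
    etag = make_etag(body)
    if if_none_match is not None:
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        if etag in client_tags:
            # The client's cached copy is still valid: no entity body.
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

The first request receives `200 OK` together with an `ETag`; repeating the request with that value in `If-None-Match` yields `304 Not Modified` until the content changes.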
Content negotiation
Content negotiation returns the most suitable content, for example the Chinese or the English version of a page depending on the browser's default language.
Types
- Server-driven: the client sets specific HTTP header fields, such as Accept, Accept-Charset, Accept-Encoding, and Accept-Language, and the server returns a specific resource based on them. This approach has the following problems:
  - It is hard for the server to know everything about the client's browser;
  - The information the client provides is rather verbose (HTTP/2 header compression alleviates this) and carries a privacy risk (HTTP fingerprinting);
  - Because a given resource has to be returned in different representations, shared caches become less efficient and server-side implementations become more complex.
- Agent-driven: the server returns 300 Multiple Choices or 406 Not Acceptable, and the client selects the most suitable resource from the alternatives.
Vary
When content negotiation is used, a cached response can be used only if it satisfies the negotiation conditions; otherwise the resource must be requested from the origin server. For example, suppose a client sends a request containing the `Accept-Language` header field and the origin server's response contains `Vary: Accept-Language`. After the cache server stores that response, it returns the cached copy for a later request to the same URL only if the request's `Accept-Language` value matches the one stored with the cached response.
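One way to picture this is a cache that remembers, for each stored response, the request header values named by `Vary`. The `VaryCache` class below is a toy model written for this article (not a real cache implementation); it ignores expiration and treats header names case-insensitively.

```python
def _lower_keys(headers):
    return {name.lower(): value for name, value in headers.items()}

class VaryCache:
    def __init__(self):
        self._entries = {}  # url -> list of (selected request headers, body)

    def store(self, url, request_headers, response_headers, body):
        request_headers = _lower_keys(request_headers)
        vary = _lower_keys(response_headers).get("vary", "")
        names = [n.strip().lower() for n in vary.split(",") if n.strip()]
        # Remember the request values of every header listed in Vary.
        selected = {n: request_headers.get(n) for n in names}
        self._entries.setdefault(url, []).append((selected, body))

    def lookup(self, url, request_headers):
        request_headers = _lower_keys(request_headers)
        for selected, body in self._entries.get(url, []):
            if all(request_headers.get(n) == v for n, v in selected.items()):
                return body
        return None  # cache miss: ask the origin server
```

A response stored for `Accept-Language: en` is therefore not served to a request carrying `Accept-Language: zh`, even though the URL is the same.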
Content encoding
Content encoding compresses the entity body to reduce the amount of data transferred.
The commonly used content encodings are gzip, compress, deflate, and identity.
The browser sends the Accept-Encoding header field, which lists the compression algorithms it supports and their priorities. The server picks one of them, compresses the body of the response message with it, and sends the Content-Encoding header field to tell the browser which algorithm it chose. Because content negotiation selects the representation of the resource based on the encoding, the Vary header field of the response message must contain at least Accept-Encoding.
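The selection step can be sketched as below. The parser is deliberately simplified (it reads only the `q` parameter, does not expand the `*` wildcard, and does not guard against malformed values), and the server is assumed to support only gzip and identity; `pick_encoding` and `encode_body` are illustrative names.

```python
import gzip

def pick_encoding(accept_encoding, supported=("gzip", "identity")):
    """Pick the supported encoding with the highest q-value."""
    preferences = []
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        preferences.append((q, name.strip().lower()))
    # Highest priority first; sorted() keeps the original order on ties.
    for q, name in sorted(preferences, key=lambda pref: -pref[0]):
        if q > 0 and name in supported:
            return name
    return "identity"

def encode_body(body, accept_encoding):
    """Compress the body if the client allows it and set response headers."""
    headers = {"Vary": "Accept-Encoding"}
    if pick_encoding(accept_encoding) == "gzip":
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body
```

With `Accept-Encoding: br;q=1.0, gzip;q=0.8`, this sketch falls back to gzip because it does not support br, and it always adds `Vary: Accept-Encoding` so that caches keep the compressed and uncompressed variants apart.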
Range requests
If the network is interrupted after the server has sent only part of the data, a range request lets the client ask only for the part that has not been received, so the server does not have to resend everything.
Range
The client adds the Range header field to the request message to specify the requested range.
If the request succeeds, the server returns a response with the 206 Partial Content status code.
Accept-Ranges
The `Accept-Ranges` response header field tells the client whether the server can handle range requests: the value is `bytes` if it can and `none` otherwise.
Response status codes
- If the request succeeds, the server returns the `206 Partial Content` status code.
- If the requested range is out of bounds, the server returns the `416 Requested Range Not Satisfiable` status code.
- If range requests are not supported, the server returns the `200 OK` status code.
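The three outcomes can be sketched with a function that serves a single byte range. It is a simplified model: only one range per request is handled, a missing or non-`bytes` Range header is treated as an ordinary request, and `serve_range` is a name chosen for this example.

```python
def serve_range(data: bytes, range_header=None):
    """Return (status, headers, body) for an optional single byte range."""
    total = len(data)
    if not range_header or not range_header.startswith("bytes="):
        return 200, {"Accept-Ranges": "bytes"}, data
    start_s, _, end_s = range_header[len("bytes="):].partition("-")
    if start_s == "":
        # Suffix form "bytes=-N": the last N bytes.
        start, end = max(total - int(end_s), 0), total - 1
    else:
        start = int(start_s)
        end = min(int(end_s), total - 1) if end_s else total - 1
    if start >= total or start > end:
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    headers = {"Content-Range": f"bytes {start}-{end}/{total}"}
    return 206, headers, data[start:end + 1]
```

A client resuming an interrupted download of a 10-byte resource after 7 bytes would send `Range: bytes=7-` and receive the remaining 3 bytes with `Content-Range: bytes 7-9/10`.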
Chunked transfer encoding
Chunked transfer encoding splits the data into chunks, which lets the browser display the page progressively.
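On the wire, each chunk is preceded by its size in hexadecimal, and a zero-sized chunk marks the end of the body. The sketch below encodes and decodes that framing; it ignores trailers and assumes well-formed input, and the function names are made up for this example.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as a chunked message body."""
    out = b""
    for chunk in chunks:
        if chunk:  # an empty chunk would be read as the terminator
            out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # last chunk (size 0) plus final CRLF

def decode_chunked(raw):
    """Reassemble the body from a chunked message body."""
    body, pos = b"", 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The size line may carry extensions after ';', which are ignored.
        size = int(raw[pos:line_end].split(b";")[0].decode("ascii"), 16)
        if size == 0:
            return body
        start = line_end + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
```

Because every chunk announces its own length, the sender does not need to know the total size in advance, which is what allows the receiver to start processing early.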
Multipart objects
A single message body can contain several entities of different types that are sent together. The parts are separated by the delimiter defined in the boundary parameter, and each part can have its own header fields. Uploading a form with several fields, for example, uses this format.
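The layout of such a body can be sketched by assembling a `multipart/form-data` payload by hand. The field names in the usage note are hypothetical and the boundary is a random string generated for this example; real clients normally let an HTTP library build the body.

```python
import uuid

def build_multipart(fields):
    """Return (Content-Type value, body bytes) for simple text form fields."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",                                   # part delimiter
            f'Content-Disposition: form-data; name="{name}"',  # part header
            "",                                                # blank line before content
            value,
        ]
    lines += [f"--{boundary}--", ""]                           # closing delimiter
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body
```

Calling `build_multipart({"name": "alice", "city": "Paris"})` produces two parts, each introduced by `--boundary` and its own `Content-Disposition` header, followed by the closing `--boundary--` line.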
Virtual host
HTTP/1.1 uses virtual hosting technology to allow a single server to have multiple domain names and be logically treated as multiple servers.
Forwarding communication data
Proxy
A proxy server accepts requests from clients and forwards them to other servers. The main purposes of using a proxy are:
- Caching
- Load balancing
- Network access control
- Access logging
Proxy servers are divided into forward proxies and reverse proxies:
- A forward proxy is visible to the user: the client is aware that it exists.
- A reverse proxy usually sits in the internal network and is invisible to the user.
Gateway
Unlike a proxy server, a gateway server converts HTTP into another protocol, so that it can request services from servers that are not HTTP servers.
Tunnel
A tunnel establishes a secure communication line between the client and the server using encryption such as SSL.
HTTPS
HTTP has the following security problems:
- Communication is in plaintext, so the content can be eavesdropped on;
- The identity of the other party is not verified, so it can be forged;
- The integrity of a message cannot be proven, so it may have been tampered with.
HTTPS is not a new protocol. With HTTPS, HTTP first talks to SSL (Secure Sockets Layer), and SSL then talks to TCP; in other words, HTTPS communicates through a tunnel. By using SSL, HTTPS gains encryption (against eavesdropping), authentication (against impersonation), and integrity protection (against tampering).
Encryption
Symmetric-key encryption
Symmetric-key encryption uses the same key for encryption and decryption.
- Advantage: fast;
- Disadvantage: the key cannot be delivered securely to the other party.
Asymmetric-key encryption
Asymmetric-key encryption, also known as public-key encryption, uses different keys for encryption and decryption.
The public key is available to everyone. After obtaining the receiver's public key, the sender uses it to encrypt the communication, and the receiver decrypts it with the private key.
Besides encryption, asymmetric keys can also be used for signing. Because nobody else can obtain the private key, the sender signs with its private key, and the receiver uses the sender's public key to verify that the signature is correct.
- Advantage: the public key can be delivered to the sender more securely;
- Disadvantage: slow.
Encryption used by HTTPS
As mentioned above, symmetric-key encryption is more efficient, but the secret key cannot be delivered securely to the other party. Asymmetric-key encryption keeps the transfer secure, so it can be used to deliver the secret key. HTTPS uses a hybrid mechanism that combines the two:
- Asymmetric-key encryption is used to transmit the secret key needed for symmetric-key encryption, which ensures security.
- Once the secret key has been obtained, symmetric-key encryption is used for the communication itself, which ensures efficiency. (The Session Key in the figure below is the Secret Key)
Authentication
The communicating party is authenticated by means of a certificate. A Certificate Authority (CA) is a third-party organization trusted by both the client and the server. The operator of the server submits its public key to the CA. After verifying the applicant's identity, the CA digitally signs the public key and binds the signed public key into a public-key certificate.
During HTTPS communication, the server sends the certificate to the client. After obtaining the public key from it, the client verifies the digital signature. If verification succeeds, communication can begin.
Integrity protection
SSL provides a message digest function to protect message integrity. HTTP also offers an MD5 message digest, but it is not secure: if an attacker tampers with the message and recalculates the MD5 value, the receiver cannot detect the tampering.
The message digest function of HTTPS is secure because it is combined with encryption and authentication. If an encrypted message is tampered with, it is hard to recalculate the message digest because the plaintext is not readily available.
Disadvantages of HTTPS
- Encryption and decryption are required, so communication is slower;
- Certificates issued by a CA can be expensive.
HTTP/2.0
Defects of HTTP/1.x
The simplicity of the HTTP/1.x implementation comes at the expense of performance:
- Clients need multiple connections to achieve concurrency and reduce latency;
- Request and response headers are not compressed, which causes unnecessary network traffic;
- Effective resource prioritization is not supported, so the underlying TCP connection is poorly utilized.
Binary framing layer
HTTP/2.0 splits a message into HEADERS frames and DATA frames, both in binary format.
During communication there is only one TCP connection, which carries any number of bidirectional streams.
- A stream has a unique identifier and optional priority information, and carries bidirectional messages.
- A message is the complete sequence of frames that corresponds to a logical request or response.
- A frame is the smallest unit of communication. Frames from different streams can be interleaved when sent and then reassembled according to the stream identifier in each frame header.
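To make the frame header concrete, the sketch below packs and unpacks the 9-byte header defined by the HTTP/2 specification: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit followed by a 31-bit stream identifier. Only the DATA (0x0) and HEADERS (0x1) types mentioned above are named; `pack_frame` and `unpack_frame` are illustrative helpers, not part of any library.

```python
import struct

FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS"}

def pack_frame(frame_type, flags, stream_id, payload):
    """Prefix the payload with a 9-byte HTTP/2 frame header."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,  # 24-bit length split as 8 + 16 bits
        frame_type,
        flags,
        stream_id & 0x7FFFFFFF,         # the top bit is reserved
    )
    return header + payload

def unpack_frame(data):
    """Parse one frame from the start of data."""
    high, low, frame_type, flags, stream_id = struct.unpack(">BHBBI", data[:9])
    length = (high << 16) | low
    return {
        "length": length,
        "type": FRAME_TYPES.get(frame_type, hex(frame_type)),
        "flags": flags,
        "stream_id": stream_id & 0x7FFFFFFF,
        "payload": data[9:9 + length],
    }
```

Since every frame carries its stream identifier, a receiver can interleave frames from many streams on one connection and still sort them back into separate messages.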
Server push
With HTTP/2.0, when a client requests a resource, the server can send related resources along with it, so the client does not have to request them separately. For example, when the client requests the page `page.html`, the server also sends related resources such as `script.js` and `style.css`.
Header compression
HTTP/1.1 headers carry a lot of information and are sent again with every request. HTTP/2.0 requires the client and the server to both maintain and update a table of previously seen header fields, which avoids retransmitting them. In addition, HTTP/2.0 compresses header fields with Huffman coding.
New features in HTTP/1.1
See the sections above for details.
- Persistent connections by default
- Support for pipelining
- Multiple TCP connections can be opened at the same time
- Support for virtual hosts
- New status code 100
- Support for chunked transfer encoding
- New cache-handling directive `max-age`
Comparing GET and POST
Purpose
GET is used to obtain resources, and POST is used to transfer an entity body.
Parameters
Both GET and POST requests can carry additional parameters, but GET parameters appear in the URL as a query string, while POST parameters are stored in the entity body. Storing parameters in the entity body does not make POST more secure, because they can still be seen with a packet capture tool such as Fiddler.
Because URLs support only ASCII characters, Chinese characters in GET parameters must be encoded first. For example, `中文` is converted to `%E4%B8%AD%E6%96%87`, and a space is converted to `%20`. POST parameters support the standard character set.
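The encoding described above is percent-encoding of the UTF-8 bytes, which Python's standard `urllib.parse` module can demonstrate. The helper `build_get_url` and the example URL in the note below are assumptions made for illustration.

```python
from urllib.parse import quote, unquote, urlencode

def build_get_url(base, params):
    """Append params to base as a percent-encoded query string."""
    # quote_via=quote encodes spaces as %20 (the default would use "+").
    return base + "?" + urlencode(params, quote_via=quote)

encoded_word = quote("中文")   # percent-encoded UTF-8 bytes
encoded_space = quote(" ")
decoded_word = unquote("%E4%B8%AD%E6%96%87")
```

For example, `build_get_url("http://example.com/search", {"q": "中文 test"})` produces a URL whose query string is `q=%E4%B8%AD%E6%96%87%20test`.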
Safety
A safe HTTP method does not change the state of the server; in other words, it is read-only.
GET is safe and POST is not, because the purpose of POST is to transfer the content of an entity body, which may be form data uploaded by the user. Once the upload succeeds, the server may store the data in a database, which changes its state.
Besides GET, the other safe methods are HEAD and OPTIONS.
Besides POST, the other unsafe methods are PUT and DELETE.
Idempotence
An idempotent HTTP method is one for which executing the same request once has the same effect as executing it several times in a row: the server ends up in the same state. In other words, idempotent methods should have no side effects (other than for statistics). All safe methods are idempotent. When implemented correctly, GET, HEAD, PUT, and DELETE are idempotent, while POST is not.
`GET /pageX HTTP/1.1` is idempotent: sending it several times in a row returns the same result to the client each time.
`POST /add_row HTTP/1.1` is not idempotent: if it is sent more than once, several rows are added.
`DELETE /idX/delete HTTP/1.1` is idempotent even though the requests may receive different status codes: for example, the first call deletes the resource and a later call finds that it no longer exists.
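These three cases can be imitated with a tiny in-memory model. `TinyServer` is a toy class written for this article, with method names and status codes chosen for illustration; it only mirrors how server state changes under repeated calls.

```python
class TinyServer:
    def __init__(self):
        self.rows = {"idX": "existing row"}
        self._next_id = 1

    def get(self, key):
        """GET: reads state without changing it."""
        if key in self.rows:
            return 200, self.rows[key]
        return 404, None

    def post_add_row(self, value):
        """POST: every call adds another row."""
        key = f"row{self._next_id}"
        self._next_id += 1
        self.rows[key] = value
        return 201, key

    def delete(self, key):
        """DELETE: the state after one call equals the state after many."""
        if key in self.rows:
            del self.rows[key]
            return 200
        return 404
```

Calling `delete("idX")` twice returns 200 and then 404, yet the server state is identical after both calls, whereas two identical `post_add_row` calls leave two new rows behind.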
Cacheability
For a response to be cached, the following conditions must be met:
- The HTTP method of the request message is itself cacheable. GET and HEAD are cacheable, PUT and DELETE are not, and POST is not cacheable in most cases.
- The status code of the response message is cacheable. Cacheable status codes include 200, 203, 204, 206, 300, 301, 404, 405, 410, 414, and 501.
- The `Cache-Control` header field of the response message does not forbid caching.
XMLHttpRequest
One more difference between POST and GET involves XMLHttpRequest. XMLHttpRequest is an API that lets a client transfer data between itself and a server. It provides an easy way to fetch data from a URL without refreshing the whole page, so a web page can update part of its content without disturbing the user. XMLHttpRequest is widely used in AJAX.
- With the POST method of XMLHttpRequest, the browser sends the headers first and then the data. Not all browsers do this, however; Firefox, for example, does not.
- With the GET method, the headers and the data are sent together.