1. Basic Concepts

1.1 Request and Response Messages

The client sends a request message to the server. The server processes the request and returns the result to the client in a response message

Request message structure

  • The first line contains the request method, URL, and protocol version

  • The following lines are the request headers, each consisting of a header name and its corresponding value

  • A blank line separates the header from the content body

GET http://www.example.com/ HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8
Cache-Control: max-age=0
Host: www.example.com
If-Modified-Since: Thu, 17 Oct 2019 07:18:26 GMT
If-None-Match: "3147526947+gzip"
Proxy-Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 xxx

param1=1&param2=2

Response message structure

  • The first line contains the protocol version, status code, and reason phrase, most commonly 200 OK, indicating that the request was successful

  • The following lines are the response headers

  • A blank line separates the headers from the content body

  • Finally, the content body of the response

HTTP/1.1 200 OK
Age: 529651
Cache-Control: max-age=604800
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 648
Content-Type: text/html; charset=UTF-8
Date: Mon, 02 Nov 2020 17:53:39 GMT
Etag: "3147526947+ident+gzip"
Expires: Mon, 09 Nov 2020 17:53:39 GMT
Keep-Alive: timeout=4
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Proxy-Connection: keep-alive
Server: ECS (sjc/16DF)
Vary: Accept-Encoding
X-Cache: HIT

<!doctype html>
<html>
<head>
<title>Example Domain</title>
// omitted...
</body>
</html>
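
The message layout described above (start line, header lines, blank line, body) can be illustrated with a minimal Python sketch. The function name parse_http_message and the sample response are assumptions made for illustration; the parser ignores details such as repeated headers and binary bodies.

```python
def parse_http_message(raw: str):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical sample response, shortened from the example above.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<html></html>"
)

start_line, headers, body = parse_http_message(raw_response)
version, status_code, reason = start_line.split(" ", 2)
print(version, status_code, reason)
print(headers["content-type"])
print(body)
```

The same function works for a request, where the start line holds the method, URL, and protocol version instead.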

1.2 URL

HTTP uses Uniform Resource Locators (URLs) to locate resources. URLs are a subset of Uniform Resource Identifiers (URIs): a URL adds the ability to locate a resource on top of identifying it. URIs also include Uniform Resource Names (URNs), which only name a resource

  • urn:isbn:0451450523, for example, identifies a book by its ISBN but does not say how to find it
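
The difference between a URL and a URN can be seen by splitting both with Python's standard urllib.parse module. The URL below, including its port, path, and query string, is a made-up example; only the URN comes from the text above.

```python
from urllib.parse import urlsplit

url = urlsplit("http://www.example.com:80/docs/index.html?lang=en")
urn = urlsplit("urn:isbn:0451450523")

# A URL carries a network location, so a client knows where to fetch it.
print(url.scheme, url.hostname, url.port, url.path, url.query)

# A URN only names the resource: there is no host to contact.
print(urn.scheme, repr(urn.netloc), urn.path)
```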


2. HTTP Methods

The first line of the request packet sent by the client contains the method field

2.1 GET

Retrieves a resource

The vast majority of current network requests use the GET method

2.2 HEAD

Retrieves the message headers

  • Similar to GET, but the response does not include the entity body
  • Mainly used to check whether a URL is valid and when the resource was last updated

2.3 POST

Transfers an entity body

POST is mainly used to transfer data to the server, while GET is mainly used to obtain resources

2.4 PUT

Uploads a file

PUT itself carries no authentication mechanism, so anyone can upload files. Because of this security problem, the method is generally not used

PUT /new.html HTTP/1.1
Host: example.com
Content-type: text/html
Content-length: 16

<p>New File</p>

2.5 PATCH

Partially modifies a resource

PUT can also be used to modify a resource, but it can only replace the original resource completely. PATCH allows partial modification

PATCH /file.txt HTTP/1.1
Host: www.example.com
Content-Type: application/example
If-Match: "e0023aa4e"
Content-Length: 100

[description of changes]
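
The difference between full replacement (PUT) and partial modification (PATCH) can be sketched with a dictionary standing in for the server's resource store. The functions put and patch, the /users/1 resource, and the merge-style change format are all assumptions for illustration; HTTP itself does not define what a PATCH body looks like.

```python
def put(store: dict, path: str, representation: dict) -> None:
    """PUT replaces the stored resource with the supplied representation."""
    store[path] = dict(representation)


def patch(store: dict, path: str, changes: dict) -> None:
    """PATCH applies only the listed changes and keeps every other field."""
    store[path] = {**store[path], **changes}


# Hypothetical resource store.
store = {"/users/1": {"name": "Larry", "email": "larry@example.com"}}

patch(store, "/users/1", {"email": "new@example.com"})
print(store["/users/1"])  # the name field is preserved

put(store, "/users/1", {"email": "only@example.com"})
print(store["/users/1"])  # the name field is gone
```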

2.6 DELETE

Deletes a file

The opposite of PUT; it likewise carries no authentication mechanism

DELETE /file.html HTTP/1.1

2.7 OPTIONS

Queries the supported methods

Queries the methods supported by the specified URL

The response contains a header such as Allow: GET, POST, HEAD, OPTIONS

2.8 CONNECT

Asks a proxy server to establish a tunnel for the communication

The Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols are used to encrypt the communication, which is then transmitted through the network tunnel

CONNECT www.example.com:443 HTTP/1.1

2.9 TRACE

Traces the path

The server returns the communication path to the client

  • When the request is sent, a value is placed in the Max-Forwards header field. The value is decremented by 1 at each server the request passes through, and forwarding stops when it reaches 0

  • TRACE is not normally used, and it is vulnerable to cross-site tracing (XST) attacks

3. HTTP Status Codes

The first line of the response message returned by the server is the status line. It contains the status code and reason phrase, which inform the client of the result of the request
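
As a small illustration, the class of a status code follows from its first digit, and Python's standard http.HTTPStatus enum supplies the usual reason phrase. The CLASSES table and the describe function are names chosen for this sketch.

```python
from http import HTTPStatus

# The first digit of the status code determines its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    return f"{code} {HTTPStatus(code).phrase} ({CLASSES[code // 100]})"


for code in (100, 200, 301, 404, 503):
    print(describe(code))
```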

3.1 1XX Informational

  • 100 Continue: Indicates that everything is fine so far; the client can continue sending the request, or ignore this response if the request is already complete

3.2 2XX Success

  • 200 OK

  • 204 No Content: The request is processed successfully, but the returned packet does not contain the body part. Usually used when you only need to send information from the client to the server without returning data

  • 206 Partial Content: Indicates that the client made a range request. The response message contains the part of the entity specified by Content-Range

3.3 3XX Redirection

  • 301 Moved Permanently: Permanently redirected

  • 302 Found: Temporary redirection

  • 303 See Other: Does the same as 302, but 303 explicitly requires that the client use the GET method to obtain the resource

  • Note: Although the HTTP specification does not allow POST to be changed to GET on 301 and 302 redirects, most browsers change POST to GET on 301, 302, and 303 redirects

  • 304 Not Modified: If the request contains a conditional header such as If-None-Match or If-Modified-Since and the condition is not met, that is, the resource has not changed, the server returns 304. (A failed If-Match or If-Unmodified-Since condition yields 412 Precondition Failed instead, and a failed If-Range causes the full resource to be returned.)

  • 307 Temporary Redirect: Does the same as 302, but requires that the browser not change the POST method to GET on the redirected request
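
The method-rewriting behavior described in the list above can be summarized in a short sketch. The function method_after_redirect is a hypothetical helper that models the common browser behavior described here, not the internals of any particular browser.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Return the method a typical browser uses for the redirected request."""
    if status == 303:
        # 303 asks the client to fetch the new location with GET
        # (a HEAD request stays HEAD).
        return method if method == "HEAD" else "GET"
    if status in (301, 302) and method == "POST":
        # Common browser behavior: POST is changed to GET.
        return "GET"
    # 307 (and every other case) keeps the original method.
    return method


for status in (301, 302, 303, 307):
    print(status, "POST ->", method_after_redirect(status, "POST"))
```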

3.4 4XX Client Error

  • 400 Bad Request: The request message contains a syntax error.

  • 401 Unauthorized: The request requires authentication information (BASIC or DIGEST authentication). If a request was already made, this indicates that user authentication failed.

  • 403 Forbidden: The request is rejected.

  • 404 Not Found

3.5 5XX Server Error

  • 500 Internal Server Error: An Error occurred while the Server was performing a request.

  • 503 Service Unavailable: The server is temporarily overloaded or is down for maintenance and cannot process requests now.

4. HTTP Headers

There are four types of header fields: general header fields, request header fields, response header fields, and entity header fields

The various header fields and their meanings are as follows

5. Specific Applications

5.1 Connection Management

5.1.1 Short and Persistent Connections

When a browser loads an HTML page that contains multiple images, it requests the image resources in addition to the HTML page itself. Establishing a new TCP connection for every HTTP exchange would be expensive

  • A persistent connection needs only one TCP connection to carry multiple HTTP exchanges
    • Starting with HTTP/1.1, connections are persistent by default. To close the connection, the client or the server sends Connection: close
    • Before HTTP/1.1, connections were short-lived by default. To use a persistent connection, send Connection: Keep-Alive
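
The default rules above can be expressed as a small decision function. The name is_persistent and the plain dictionary used for headers are assumptions made for this sketch.

```python
def is_persistent(version: str, headers: dict) -> bool:
    """Decide whether the connection stays open after this message."""
    tokens = {t.strip().lower() for t in headers.get("Connection", "").split(",")}
    if "close" in tokens:
        # Either side can ask for the connection to be closed.
        return False
    if version == "HTTP/1.1":
        # Persistent by default from HTTP/1.1 onward.
        return True
    # Earlier versions are short-lived unless Keep-Alive is requested.
    return "keep-alive" in tokens


print(is_persistent("HTTP/1.1", {}))
print(is_persistent("HTTP/1.1", {"Connection": "close"}))
print(is_persistent("HTTP/1.0", {}))
print(is_persistent("HTTP/1.0", {"Connection": "Keep-Alive"}))
```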

5.1.2 Pipelining

By default, HTTP requests are sent sequentially: the next request is sent only after the response to the current one has been received. Because of network latency and bandwidth limits, it can take a long time before the next request reaches the server

  • Pipelining sends requests back to back over the same persistent connection without waiting for each response, which reduces latency

5.2 Cookie

The HTTP protocol is stateless, mainly to keep HTTP as simple as possible so that it can handle a large number of transactions. HTTP/1.1 introduced cookies to store state information

A cookie is a small piece of data that the server sends to the user's browser, which stores it locally. The browser sends it back with later requests to the same server, which lets the server tell whether two requests come from the same browser. Because every subsequent request has to carry the cookie data, this adds performance overhead (especially in mobile environments)

Cookies were once used for general client-side data storage because no other suitable storage mechanism existed. Now that modern browsers support a variety of storage mechanisms, cookies are gradually being phased out for this purpose. Newer browser APIs let developers store data locally, for example with the Web Storage API (localStorage and sessionStorage) or IndexedDB

5.2.1 Uses

  • Session state management (such as user login status, shopping carts, game scores, or other information that needs to be recorded)
  • Personalization (such as user preferences and themes)
  • Browser behavior tracking (such as tracking and analyzing user behavior)

5.2.2 Creation Process

  • The response message sent by the server contains the Set-Cookie header field. After receiving the response, the client saves the cookie in the browser
HTTP/1.0 200 OK
Content-type: text/html
Set-Cookie: yummy_cookie=choco
Set-Cookie: tasty_cookie=strawberry

[page content]
  • When the client sends a request to the same server, it extracts the Cookie information from the browser and sends it to the server through the Cookie request header field
GET /sample_page.html HTTP/1.1
Host: www.example.org
Cookie: yummy_cookie=choco; tasty_cookie=strawberry
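
The round trip described above can be reproduced with Python's standard http.cookies module. This is only a sketch of the exchange: the variable names are assumptions, and a real browser also tracks attributes such as Domain, Path, and expiry for each cookie.

```python
from http.cookies import SimpleCookie

# Browser side: remember every cookie received in Set-Cookie headers.
jar = SimpleCookie()
for set_cookie_value in ("yummy_cookie=choco", "tasty_cookie=strawberry"):
    jar.load(set_cookie_value)

# Browser side: send the stored cookies back in a single Cookie header.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print("Cookie:", cookie_header)

# Server side: parse the Cookie header of the incoming request.
received = SimpleCookie(cookie_header)
print(received["tasty_cookie"].value)
```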

5.2.3 Classification

  • Session cookie: deleted automatically when the browser is closed, that is, it is valid only for the duration of the session
  • Persistent cookie: a cookie becomes persistent once an expiration time (Expires) or a validity period (Max-Age) is specified
Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT;

5.2.4 Scope

The Domain attribute specifies which hosts can receive the cookie. If it is not specified, it defaults to the host of the current document (excluding subdomains). If Domain is specified, subdomains are generally included

  • For example, if Domain=mozilla.org is set, the cookie is also sent to subdomains (such as developer.mozilla.org)

The Path attribute specifies which paths on the host can receive the cookie (the path must be present in the request URL). Subpaths also match, with the character %x2F ("/") treated as the path separator

  • For example, with Path=/docs the following paths all match:
    • /docs
    • /docs/Web/
    • /docs/Web/HTTP
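
The Domain and Path matching rules above can be sketched as two small functions. The names domain_matches and path_matches are assumptions, and the sketch leaves out details that real browsers handle, such as IP-address hosts and public-suffix checks.

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """True if host is the cookie's domain or one of its subdomains."""
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


def path_matches(request_path: str, cookie_path: str) -> bool:
    """True if request_path is cookie_path or lies beneath it."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end at a "/" separator, so /docs does not match /docsets.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


print(domain_matches("developer.mozilla.org", "mozilla.org"))
for path in ("/docs", "/docs/Web/", "/docs/Web/HTTP", "/docsets"):
    print(path, path_matches(path, "/docs"))
```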

5.2.5 JavaScript

Through the document.cookie property, JavaScript running in the browser can create new cookies and read existing cookies that are not marked HttpOnly

document.cookie = "yummy_cookie=choco";
document.cookie = "tasty_cookie=strawberry";
console.log(document.cookie);

5.2.6 HttpOnly

Cookies marked HttpOnly cannot be accessed by JavaScript. Cross-site scripting (XSS) attacks often use JavaScript's document.cookie API to steal users' cookies, so the HttpOnly flag helps mitigate XSS attacks to some extent

Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Secure; HttpOnly

5.2.7 Secure

Cookies marked Secure are sent to the server only over requests encrypted with HTTPS. Even with the Secure flag set, sensitive information should not be transmitted in cookies, because cookies are inherently insecure and the Secure flag does not provide real security guarantees.

5.2.8 Session

In addition to storing user information in the browser through cookies, Session can also be used to store information on the server, which is more secure

Sessions can be stored in files, databases, or memory on the server. Sessions can also be stored in an in-memory database such as Redis, which is more efficient

The process of using Session to maintain user login status is as follows:

  • When logging in, the user submits a form containing the user name and password, which is sent in an HTTP request message
  • The server verifies the user name and password. If they are correct, it stores the user information in Redis; the key under which it is stored is called the Session ID
  • The Set-Cookie header field of the response returned by the server contains the Session ID. After receiving the response, the client saves the cookie in the browser
  • Later requests from the client to the same server include the cookie. The server extracts the Session ID from it, looks up the user information in Redis, and continues with the previous business operations

Care must be taken that the Session ID cannot easily be obtained by attackers, so an easily guessed Session ID must not be generated. The Session ID should also be regenerated frequently. In scenarios with high security requirements, such as money transfers, managing user state with sessions is not enough: the user should also be re-authenticated, for example by re-entering a password or using an SMS verification code
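
The login flow above can be sketched as follows. A plain dictionary stands in for Redis, and the names USERS, SESSIONS, login, and current_user, the cookie name session_id, and the sample credentials are all assumptions for illustration. The hard-to-guess Session ID comes from Python's standard secrets module; a real system would also store hashed passwords rather than plain text.

```python
import secrets

USERS = {"alice": "correct horse"}  # hypothetical credential store
SESSIONS = {}                       # stand-in for Redis: Session ID -> user info


def login(username: str, password: str):
    """Verify credentials and return a Set-Cookie header line, or None."""
    if USERS.get(username) != password:
        return None
    # A long random token is hard for an attacker to guess.
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = {"username": username}
    return f"Set-Cookie: session_id={session_id}; HttpOnly; Secure"


def current_user(cookie_header: str):
    """Look up the user information for the Session ID in a Cookie header."""
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "session_id":
            return SESSIONS.get(value)
    return None


set_cookie = login("alice", "correct horse")
session_id = set_cookie.split("session_id=")[1].split(";")[0]
print(current_user(f"session_id={session_id}"))
```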

5.2.9 Cookies Are Disabled on Browsers

In this case, cookies cannot be used to save user information; only sessions can be used. The Session ID can no longer be stored in a cookie either. Instead, URL rewriting is used to pass the Session ID as a URL parameter

5.2.10 Cookie and Session Selection

  • Cookies can only store ASCII strings, while sessions can store data of any type, so sessions are preferred when data complexity is a concern;

  • Cookies are stored in the browser and can be inspected by malicious parties. If private data must be stored in a cookie, the cookie value can be encrypted and then decrypted on the server;

  • For large websites, keeping all user information in sessions is very expensive, so storing everything in sessions is not recommended

5.3 Cache

5.3.1 Advantages

  • Relieves pressure on the server;
  • Reduces the latency with which clients obtain resources: caches are usually held in memory and are fast to read, and a cache (for example, the browser cache) may be geographically closer than the origin server

5.3.2 Implementation Methods

  • Have a proxy server cache responses;
  • Have the client browser cache responses

5.3.3 Cache-Control

  • HTTP/1.1 controls caching through the Cache-Control header field

5.3.3.1 Disabling caching

The no-store directive states that no part of the request or response can be cached

Cache-Control: no-store

5.3.3.2 Forcing cache validation

The no-cache directive requires the cache server to validate the cached resource with the origin server first. The cached copy can be used to answer the client only if it is still valid

Cache-Control: no-cache

5.3.3.3 Private and Public Caches

The private directive specifies that resources be kept as a private cache that can only be used by individual users, typically stored in the user’s browser

Cache-Control: private

The public directive specifies a resource as a public cache that can be used by multiple users and is typically stored in a proxy server

Cache-Control: public

5.3.3.4 Cache expiration mechanism

If the max-age directive appears in a request message and the cached resource has been cached for less than the specified time, the cached copy can be accepted. If the max-age directive appears in a response message, it indicates how long the resource may be kept in the cache server

Cache-Control: max-age=31536000

The Expires header field can also be used to tell the cache server when the resource will expire

Expires: Wed, 04 Jul 2012 08:26:05 GMT
  • In HTTP/1.1, the max-age directive takes precedence;
  • In HTTP/1.0, the max-age directive is ignored
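
The HTTP/1.1 precedence rule above can be sketched as a function that computes how long a response stays fresh. The name freshness_lifetime is an assumption, and the sketch handles only max-age and Expires, leaving out other directives and heuristic freshness. The sample header values are taken from the response example in section 1.1.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict):
    """Return the freshness lifetime in seconds, or None if none is given."""
    # In HTTP/1.1, max-age takes precedence over Expires.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    # Otherwise fall back to Expires, measured from the Date header.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return int((expires - date).total_seconds())
    return None


response_headers = {
    "Date": "Mon, 02 Nov 2020 17:53:39 GMT",
    "Expires": "Mon, 09 Nov 2020 17:53:39 GMT",
}
print(freshness_lifetime(response_headers))
print(freshness_lifetime({**response_headers, "Cache-Control": "max-age=31536000"}))
```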

5.3.4 Cache Verification

The ETag header field is a unique identifier for a resource. A URL alone cannot uniquely identify a resource

  • For example, google.com has a Chinese and an English version; only the ETag can uniquely identify these two resources
ETag: "82e22293907ce725faf67773957acd12"

The client can place the ETag value of its cached resource in the If-None-Match header. After receiving the request, the server checks whether that value matches the resource's current ETag. If it does, the cached resource is still valid and 304 Not Modified is returned

If-None-Match: "82e22293907ce725faf67773957acd12"
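
The server-side check can be sketched as follows. The function conditional_get is a hypothetical helper; it compares a single ETag value exactly and does not handle lists of ETags or weak validators.

```python
def conditional_get(request_headers: dict, current_etag: str, body: bytes):
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    if request_headers.get("If-None-Match") == current_etag:
        # The client's cached copy is still valid: send no body.
        return 304, {"ETag": current_etag}, b""
    # No cached copy, or the resource changed: send the full representation.
    return 200, {"ETag": current_etag}, body


etag = '"82e22293907ce725faf67773957acd12"'
page = b"<html>...</html>"

print(conditional_get({}, etag, page)[0])
print(conditional_get({"If-None-Match": etag}, etag, page)[0])
print(conditional_get({"If-None-Match": '"stale"'}, etag, page)[0])
```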

The Last-Modified header field can also be used for cache validation. It is included in the response sent by the origin server and indicates when the origin server last modified the resource. It is a weak validator because it is accurate only to one second, so it is usually used as a fallback for ETag. If a response contains this header, the client can validate its cache with If-Modified-Since on subsequent requests. The server returns the requested resource with status code 200 only if its content has been modified after the given date and time. If the resource has not been modified since then, it returns a 304 Not Modified response with no entity body

Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT

If-Modified-Since: Wed, 21 Oct 2015 07:28:00 GMT

5.4 Content Negotiation

Content negotiation is used to return the most appropriate content, such as Chinese or English pages based on the default language of the browser

5.4.1 Types

5.4.1.1 Server-driven

The client sets specific HTTP header fields, such as Accept, Accept-Charset, Accept-Encoding, and Accept-Language, and the server returns a specific resource based on these fields

It has the following problems:

  • It is difficult for the server to know all the information about the client’s browser;
  • The information provided by the client is verbose (HTTP/2’s header compression mitigates this) and has privacy risks (HTTP fingerprinting);
  • Because a given resource has to be returned in several representations, shared caches become less efficient and the server implementation becomes more complex

5.4.1.2 Agent-driven

The server returns 300 Multiple Choices or 406 Not Acceptable, from which the client selects the most suitable resource

5.4.2 Vary

Vary: Accept-Language

In the case of content negotiation, the cache can only be used if the cache in the cache server meets the content negotiation criteria, otherwise the resource should be requested from the source server

  • For example, after a client sends a request containing the Accept-Language header field, the origin server's response contains Vary: Accept-Language. The cache server stores that response, and the next time a client requests the same URL it returns the cached copy only if the request's Accept-Language value matches the one stored with the cached response
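
One way to picture this is a cache whose key combines the URL with the request headers named by Vary. The functions cache_key and lookup and the in-memory cache dictionary are assumptions for this sketch; a real cache would store the Vary value alongside each entry rather than receive it as an argument.

```python
def cache_key(url: str, request_headers: dict, vary: str) -> tuple:
    """Build a cache key from the URL and the request headers listed in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url, tuple((n, lowered.get(n, "")) for n in names))


cache = {}
url = "http://www.example.com/"

# The origin server answered a zh-CN request with Vary: Accept-Language.
cache[cache_key(url, {"Accept-Language": "zh-CN"}, "Accept-Language")] = "Chinese page"


def lookup(url: str, request_headers: dict, vary: str = "Accept-Language"):
    """Return the cached response, or None if the origin server must be asked."""
    return cache.get(cache_key(url, request_headers, vary))


print(lookup(url, {"Accept-Language": "zh-CN"}))
print(lookup(url, {"Accept-Language": "en"}))
```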

5.5 Content Encoding

Content encoding compresses the entity body to reduce the amount of data transferred

Commonly used content encodings are gzip, compress, deflate, and identity

The browser sends the Accept-Encoding header, which lists the compression algorithms it supports and their priorities. The server chooses one of them, compresses the message body with it, and sends the Content-Encoding header to tell the browser which algorithm it chose. Because content negotiation selected the representation based on its encoding, the Vary header field of the response must contain at least Accept-Encoding
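
The negotiation can be sketched with Python's standard gzip module. The functions choose_encoding and encode_body are assumptions for illustration, the server is assumed to support only gzip and identity, and the q-value parsing is simplified (no wildcard handling and no validation of malformed values).

```python
import gzip


def choose_encoding(accept_encoding: str, supported=("gzip", "identity")) -> str:
    """Pick the supported encoding with the highest q-value."""
    best, best_q = "identity", 0.0
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        name, params = name.strip(), params.strip()
        # A missing q parameter means the default priority of 1.0.
        q = float(params[2:]) if params.startswith("q=") else 1.0
        if name in supported and q > best_q:
            best, best_q = name, q
    return best


def encode_body(body: bytes, accept_encoding: str):
    """Return (headers, payload) for the negotiated content encoding."""
    headers = {"Vary": "Accept-Encoding"}
    if choose_encoding(accept_encoding) == "gzip":
        headers["Content-Encoding"] = "gzip"
        body = gzip.compress(body)
    headers["Content-Length"] = str(len(body))
    return headers, body


headers, payload = encode_body(b"hello " * 100, "gzip, deflate")
print(headers)
```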

5.6 Range Request

If the network is interrupted and the server has sent only part of the data, a range request lets the client request only the part that was not sent, so the server does not have to resend everything

5.6.1 Range

Add the Range header field to the request message to specify the Range of the request

GET /z4d4kWk.jpg HTTP/1.1
Host: i.imgur.com
Range: bytes=0-1023

If the request succeeds, the server returns a 206 Partial Content status code

HTTP/1.1 206 Partial Content
Content-Range: bytes 0-1023/146515
Content-Length: 1024
...
(binary content)

5.6.2 Accept-Ranges

The Accept-Ranges header field in the response tells the client whether the server can handle range requests: bytes if it can, none otherwise

Accept-Ranges: bytes

5.6.3 Response status code

  • In the case of a successful request, the server returns the 206 Partial Content status code

  • If the requested range is out of bounds, the server returns the 416 Requested Range Not Satisfiable status code

  • In cases where range requests are not supported, the server returns a 200 OK status code
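
The three outcomes above can be sketched in one function. The name serve_range is an assumption, and only the forms bytes=first-last and bytes=first- are handled; any other form (suffix ranges, multiple ranges) is treated as unsupported and answered with the full resource.

```python
def serve_range(data: bytes, range_header=None):
    """Return (status, headers, body) for an optional Range header."""
    total = len(data)
    full = (200, {"Content-Length": str(total)}, data)
    if not range_header or not range_header.startswith("bytes="):
        return full
    first, _, last = range_header[len("bytes="):].partition("-")
    if not first.isdigit() or (last and not last.isdigit()):
        return full  # unsupported form: ignore the Range header
    start = int(first)
    if last and int(last) < start:
        return full  # invalid range: ignore the Range header
    if start >= total:
        # The range starts beyond the end of the resource.
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    end = min(int(last), total - 1) if last else total - 1
    body = data[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


image = bytes(146515)  # placeholder content with the size used in the example
status, headers, body = serve_range(image, "bytes=0-1023")
print(status, headers)
```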

5.7 Chunked Transfer Encoding

Chunked transfer encoding splits the data into multiple chunks, which allows the browser to display the page progressively
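
On the wire, each chunk is sent as its size in hexadecimal, a CRLF, the chunk data, and another CRLF; a chunk of size 0 ends the body. The sketch below encodes and decodes this format. The function names are assumptions, and chunk extensions and trailers are not handled.

```python
def encode_chunked(chunks) -> bytes:
    """Encode an iterable of byte strings as a chunked message body."""
    out = b""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the body prematurely
            out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # the terminating zero-size chunk


def decode_chunked(raw: bytes) -> bytes:
    """Reassemble the original data from a chunked message body."""
    body, pos = b"", 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        size = int(raw[pos:line_end], 16)
        if size == 0:
            return body
        start = line_end + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


encoded = encode_chunked([b"Wiki", b"pedia"])
print(encoded)
print(decode_chunked(encoded))
```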

5.8 Multipart Object Collections

A message body can contain several types of entities sent together. Each part is separated by the delimiter defined by the boundary parameter, and each part can have its own header fields

  • For example, a form with multiple parts can be uploaded as follows
Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="submit-name"

Larry
--AaB03x
Content-Disposition: form-data; name="files"; filename="file1.txt"
Content-Type: text/plain

... contents of file1.txt ...
--AaB03x--

5.9 Virtual Host

HTTP/1.1 uses virtual hosting technology to allow a single server to have multiple domain names and logically be treated as multiple servers

5.10 Forwarding Communication Data

5.10.1 Proxy

The proxy server accepts requests from clients and forwards them to other servers

The main purposes for using proxies are:

  • Caching;
  • Load balancing;
  • Network access control;
  • Access logging

Proxy servers are classified into forward proxy and reverse proxy:

  • A forward proxy is visible to the user, who is aware of its presence

  • A reverse proxy usually sits on the internal network and is invisible to the user

5.10.2 Gateway

Unlike a proxy server, a gateway server converts HTTP into other protocols, so that it can request services from servers other than HTTP servers

5.10.3 Tunnel

Uses encryption such as SSL to establish a secure communication line between the client and the server