This post summarizes some key points from HTTP: The Definitive Guide. I will keep updating it, so stay tuned ~ ~

An overview of HTTP

1.1 Media Type (MIME Type)

There are thousands of different data types on the Internet, and HTTP carefully labels each object transferred over the Web with a data format called a MIME (Multipurpose Internet Mail Extensions) type. MIME was originally designed for e-mail and was later adopted by HTTP.

When a Web browser retrieves an object from a server, it looks at the associated MIME type to see whether it knows how to handle the object. Most browsers can handle hundreds of common object types: displaying images, parsing and formatting HTML files, playing audio through the computer's speakers, launching external plug-in software to handle special formats, and so on.

A MIME type is a text label made of a primary object type and a specific subtype, separated by a slash, and it is carried in the Content-Type header of the HTTP response. Common primary types include:

  • application/*

Describes many application-specific MIME types

  1. application/x-javascript: JavaScript files (the currently standard type is text/javascript)
  2. application/json: JSON data, such as XHR request/response bodies or .json files
  3. application/octet-stream: unclassified binary data; used, for example, for some font transfers and for segmented transfers of long videos
  4. application/x-www-form-urlencoded: the encoding for simple form submissions, where the data is encoded as key=value pairs separated by &. Binary data is not supported (use multipart/form-data instead).
  • text/*

Contains characters and potential format conversion information

  1. text/css: the CSS file
  2. text/html: the HTML file
  3. text/plain: Plain text
  • video/*

Video formats. Note that some video formats are classified under the application type.

  1. video/quicktime: Apple QuickTime video format
  • image/*

Images

  1. image/png: PNG encoded image
  2. image/webp: WebP images (initially supported mainly by Chrome and Opera; now supported by all major browsers)
  • multipart/*

A composite object that contains other objects. The subtype describes how the parts are packaged and how they should be handled.

  1. multipart/form-data: packages the set of values a user filled into a form
  • model/*

An extension type registered with the IETF for mathematical models of physical-world objects, used in computer-aided design and 3D graphics.

  1. model/vnd.dwf: DWF CAD file
  • binary/*

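Binary format (note: binary/* is not an IANA-registered top-level type; the standard type for generic binary data is application/octet-stream)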

  1. binary/octet-stream: Some font files
  • message/*

A composite type used to encapsulate messages exchanged via e-mail, HTTP or other transport protocols

  1. message/s-http: a Secure HTTP (S-HTTP) message, an alternative to HTTP over SSL
  2. message/rfc822: a complete e-mail message
  • audio/*

Audio content file

  1. audio/x-wav: WAV audio file
  2. audio/mp4: Audio files in MP4 format
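To make the type/subtype structure concrete, here is a minimal JavaScript sketch of how a client might split a Content-Type value and pick a handler, loosely mirroring the browser behavior described at the start of this section. The handler table and the "save to disk" fallback are hypothetical; real browsers use far more elaborate logic.

```javascript
// Sketch: split a Content-Type value into type, subtype and parameters.
// Simplified: quoted parameter values that contain ';' are not handled.
function parseMediaType(headerValue) {
  const [essence, ...params] = headerValue.split(';');
  const [type = '', subtype = ''] = essence.trim().toLowerCase().split('/');
  const parameters = {};
  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq === -1) continue; // skip malformed parameters
    const name = param.slice(0, eq).trim().toLowerCase();
    let value = param.slice(eq + 1).trim();
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1); // strip surrounding quotes
    }
    parameters[name] = value;
  }
  return { type, subtype, parameters };
}

// Hypothetical handler table: an exact "type/subtype" entry wins,
// otherwise fall back to a "type/*" entry, otherwise save the file.
const mimeHandlers = {
  'text/html': 'render as a page',
  'application/json': 'parse as JSON',
  'image/*': 'display as an image',
};

function pickHandler(headerValue) {
  const { type, subtype } = parseMediaType(headerValue);
  return mimeHandlers[`${type}/${subtype}`] ?? mimeHandlers[`${type}/*`] ?? 'save to disk';
}

console.log(pickHandler('text/html; charset=UTF-8')); // render as a page
console.log(pickHandler('image/webp'));               // display as an image
console.log(parseMediaType('multipart/form-data; boundary="abc"').parameters.boundary); // abc
```

The wildcard fallback is the interesting design choice: a client that knows how to display images in general does not need an entry for every image subtype.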

Structural components of the Web

1.2 Proxies

A Web proxy server is an intermediate entity on the network, located between a client and a server, that receives all HTTP requests from clients and forwards them to the server (with possible modifications).

For security reasons, proxies are often used as trusted intermediaries through which all Web traffic flows. Proxies can also filter requests and responses; for example, a school proxy can block adult content from students.

1.2.1 Proxies vs. gateways

Strictly speaking, a proxy connects to two or more applications that use the same protocol, while a gateway connects to two or more endpoints that use different protocols.

HTTP/HTTP proxy

HTTP client A <--HTTP--> Web proxy <--HTTP--> Web server

An HTTP/POP gateway connects an HTTP front end to a POP e-mail back end, letting users read e-mail over HTTP. Web-based e-mail services such as Yahoo! Mail are HTTP/e-mail gateways.

HTTP client B <--HTTP--> HTTP/E-mail gateway <--POP--> E-mail server

But in practice, the distinction between proxy and gateway is fuzzy. Because browsers and servers implement different versions of HTTP, proxies often do some protocol translation as well. Commercial proxy servers also implement gateway functions to support SSL security protocols, SOCKS firewalls, FTP access, and Web-based applications.

1.2.2 Proxy Server Deployment Mode

  • Egress proxy

Proxies can be deployed at the exit points of a local network to control traffic between the local network and the wider Internet. For example, a company can use an egress proxy to provide firewall protection against malicious hackers outside the company, reduce bandwidth costs and improve the performance of Internet traffic. Primary schools use egress proxies to keep young students from viewing inappropriate content.

Egress proxy for a private LAN:

Client <--local network--> Egress proxy <--Internet--> Web server

  • Access (ingress) proxy

Proxies are often placed at ISP (Internet service provider) access points to handle aggregated requests from customers. ISPs use caching proxies to store copies of frequently used documents, which speeds up downloads for their users (especially those with high-speed connections) and reduces Internet bandwidth costs.

  • Reverse proxy

A reverse proxy is typically deployed at the edge of the network, in front of the Web servers, as the counterpart of an egress proxy. It can strengthen the security of the Web servers and perform load balancing; Nginx is a common example.

Client <--Internet--> Reverse proxy <--local network--> Web server

  • Network exchange proxy

Proxies with sufficient processing power can be placed at the Internet peering exchange points between networks, where they can reduce congestion at Internet junctions through caching and monitor traffic flows.
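The load balancing mentioned for reverse proxies can be as simple as handing requests to backend servers in turn. Below is a minimal round-robin sketch; the backend addresses are hypothetical, and round-robin is only one strategy (it is what Nginx uses by default when no other balancing method or weights are configured).

```javascript
// Sketch of round-robin upstream selection for a reverse proxy.
// Server addresses are hypothetical placeholders.
class RoundRobinBalancer {
  constructor(upstreams) {
    if (upstreams.length === 0) throw new Error('need at least one upstream');
    this.upstreams = upstreams;
    this.next = 0;
  }

  // Return the upstream for the next incoming request, cycling through the list.
  pick() {
    const server = this.upstreams[this.next];
    this.next = (this.next + 1) % this.upstreams.length;
    return server;
  }
}

const balancer = new RoundRobinBalancer(['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.3:8080']);
for (let i = 0; i < 4; i++) {
  console.log(balancer.pick()); // .1, .2, .3, then .1 again
}
```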

1.2.3 Proxy hierarchies

In practice, proxies are usually deployed as combinations of the types above.

In a proxy hierarchy, proxies have parent-child relationships: the next proxy toward the server is the parent, and the next proxy toward the client is the child. A child proxy can select its parent dynamically, based on rules such as:

  • Load balancing: the child proxy picks a parent based on the parents' current workload
  • Geographic proximity routing: the child proxy may pick the parent responsible for the origin server's physical region
  • Protocol/type routing: the child proxy may route requests to different parents and origin servers based on the URI. For example, requests for certain image types may be routed to a dedicated compression proxy, which then compresses the images.
  • Subscription-based routing: if publishers pay extra for high-performance service, requests for their URIs can be routed to large caches or compression engines to improve performance.

The implementation of dynamic parent routing logic varies from product to product, including the use of configuration files, scripting languages, and dynamic executable plug-ins.
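As one possible shape for such routing logic, here is a hypothetical sketch that combines two of the rules above: protocol/type routing first, then load balancing among general-purpose parents. The parent names, roles and load values are invented for illustration; real products express this in their own configuration formats or plug-ins.

```javascript
// Hypothetical child-proxy routing: type routing first, then least-loaded parent.
const parents = [
  { name: 'parent-a', role: 'general', load: 0.7 },
  { name: 'parent-b', role: 'general', load: 0.3 },
  { name: 'image-compressor', role: 'images', load: 0.9 },
];

function chooseParent(url, parentList) {
  // Protocol/type routing: send image URIs to a dedicated compressor parent.
  if (/\.(png|jpe?g|webp)$/i.test(new URL(url).pathname)) {
    const compressor = parentList.find((p) => p.role === 'images');
    if (compressor) return compressor.name;
  }
  // Load balancing: otherwise pick the least-loaded general parent.
  const general = parentList.filter((p) => p.role === 'general');
  if (general.length === 0) return null; // no parent available: go direct
  return general.reduce((best, p) => (p.load < best.load ? p : best)).name;
}

console.log(chooseParent('http://www.example.com/photo.PNG', parents)); // image-compressor
console.log(chooseParent('http://www.example.com/index.html', parents)); // parent-b
```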

1.2.4 Proxy Settings on the Client

  • Manual configuration: Many Web clients allow users to manually configure the proxy.
  • Client proxy configuration (PAC file): the user enters the URI of a PAC (proxy auto-configuration) file in the browser's automatic proxy configuration setting. The browser fetches the PAC file and calls the FindProxyForURL function, which every PAC file must define, to decide which proxy server to use for each URL:
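function FindProxyForURL(url, host) {
  // Hypothetical rule: send hosts under example.com through a proxy.
  // dnsDomainIs is one of the helper functions browsers provide to PAC scripts.
  if (dnsDomainIs(host, ".example.com")) {
    return "PROXY proxy.example.com:8080"; // Use the specified proxy
  }
  return "DIRECT"; // Connect directly without going through any proxy
}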

In Chrome, the proxy settings are under Settings > Advanced > Open your computer's proxy settings.
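The string that FindProxyForURL returns is a semicolon-separated list of fallbacks (for example PROXY host:port, SOCKS host:port, or DIRECT), which the browser tries in order. A minimal sketch of how a client might split that string into entries; the host names in the example are placeholders.

```javascript
// Sketch: interpret the string returned by FindProxyForURL, e.g.
// "PROXY proxy.example.com:8080; DIRECT". Entries are tried in order,
// falling back to the next one if a proxy is unreachable.
function parsePacResult(result) {
  return result
    .split(';')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const [kind, address] = entry.split(/\s+/);
      return kind.toUpperCase() === 'DIRECT'
        ? { kind: 'DIRECT' }
        : { kind: kind.toUpperCase(), address };
    });
}

// Logs a PROXY entry followed by a DIRECT entry.
console.log(parsePacResult('PROXY proxy.example.com:8080; DIRECT'));
```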

1.2.5 Client automatic URI extension and host name resolution

For example, why can typing just baidu into the address bar take you to www.baidu.com? When the hostname cannot be resolved, many browsers apply some form of automatic hostname expansion, trying to add the prefix www. and the suffix .com. In addition, most DNS resolver configurations automatically try appending search domains to the name the user entered.
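A hypothetical sketch of the expansion rule just described: generate candidate hostnames in the order a browser might try them. The exact order and rules are assumptions (real browsers differ), and the DNS lookups themselves are left out.

```javascript
// Hypothetical hostname expansion: when the typed name cannot be resolved,
// try adding the "www." prefix and the ".com" suffix.
function hostnameCandidates(typed) {
  const name = typed.trim().toLowerCase();
  const candidates = [name];
  if (!name.includes('.')) {
    candidates.push(`www.${name}.com`, `${name}.com`);
  } else if (!name.startsWith('www.')) {
    candidates.push(`www.${name}`);
  }
  return candidates;
}

// A client would resolve each candidate in order and use the first that succeeds.
console.log(hostnameCandidates('baidu'));     // [ 'baidu', 'www.baidu.com', 'baidu.com' ]
console.log(hostnameCandidates('baidu.com')); // [ 'baidu.com', 'www.baidu.com' ]
```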

1.2.6 Tracing messages

It is now common for Web requests to pass through multiple proxies on the path from client to server.

The Via header field lists information about each intermediate node (proxy or gateway) on the message's path. Each time a message passes through a node, that node must append itself to the end of the Via list. Here's an example:

// Response Headers
server: Tengine // The software used by the origin server
via: cache45.l2cn1803[0.304-0,H], cache17.l2cn1803[1.0], cache13.cn747[0.0.200-0,H], cache6.cn747[5.0]

Each Via waypoint contains up to four components: an optional protocol name (HTTP by default), a required protocol version, a required node name, and an optional descriptive comment.

Note that the Server header describes the origin server's software; proxies should not modify it, and should add Via entries instead.
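A minimal sketch of parsing and extending a Via header in the standard waypoint format (for example "1.0 fred, HTTP/1.1 p.example.net (Apache/1.1)"). The CDN values in the example above appear to use a vendor-specific format that does not follow this pattern, so the sketch targets the standard syntax; the node names are hypothetical.

```javascript
// Sketch: parse standard Via waypoints into their four components.
// Simplification: commas inside comments are not handled.
function parseVia(value) {
  return value.split(',').map((part) => {
    const match = part.trim().match(/^(?:([^\s/]+)\/)?(\S+)\s+(\S+)(?:\s+\((.*)\))?$/);
    if (!match) return null; // waypoint not in the standard format
    const [, protocol = 'HTTP', version, node, comment = null] = match;
    return { protocol, version, node, comment };
  });
}

// Each proxy appends its own waypoint to the end of the list.
function appendVia(existing, version, node) {
  const waypoint = `${version} ${node}`;
  return existing ? `${existing}, ${waypoint}` : waypoint;
}

const via = appendVia('1.0 fred, HTTP/1.1 p.example.net (Apache/1.1)', '1.1', 'proxy-2');
console.log(parseVia(via).map((w) => w.node)); // [ 'fred', 'p.example.net', 'proxy-2' ]
```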

1.3 Caches

A Web cache (or caching proxy) is a special HTTP proxy server that keeps copies of popular documents that pass through it.

1.4 Gateways

A gateway is a special server that acts as an intermediary for other servers. It is usually used to convert HTTP traffic to another protocol.

For example, an HTTP/FTP gateway receives requests for FTP URIs over HTTP, but fetches the documents using FTP.

HTTP client <--HTTP--> HTTP/FTP gateway <--FTP--> FTP server

1.5 Tunnels

A tunnel is an HTTP application that blindly forwards raw data between two connections once it is established. A common use of HTTP tunneling is to carry encrypted SSL (Secure Sockets Layer) traffic over an HTTP connection so that SSL traffic can pass through a firewall that only allows Web traffic.

1.6 User agents

A user agent is a client program that issues HTTP requests on behalf of the user, such as a Web browser or a crawler.

HTTPS

HTTPS is the most popular form of secure HTTP. It was pioneered by Netscape and is supported by all major browsers and servers. With HTTPS, all HTTP request and response data is encrypted before being sent across the network. HTTPS places a transport-level cryptographic security layer underneath HTTP, using either SSL or its successor, Transport Layer Security (TLS).

  • HTTP

HTTP --> TCP --> IP --> Network interface

  • HTTPS

Most of the hard encoding and decoding work is done in SSL libraries, so Web clients and servers don't need to change much of their protocol-processing logic: they mostly replace TCP input/output calls with SSL calls, and add a few more calls to configure and manage security information.

HTTP --> SSL --> TCP --> IP --> Network interface

Introduction to basic concepts of digital encryption

  • Cipher: an algorithm that encodes text so that eavesdroppers cannot read it
  • Key: a digitized parameter that changes the behavior of a cipher
  • Symmetric key cryptosystem: an algorithm that uses the same key for encoding/decoding
  • Asymmetric key cryptosystem: algorithms that encode/decode with different keys
  • Public-key encryption: Instead of using a separate encryption/decryption key for each host pair, public-key encryption uses two asymmetric keys: one for encoding and one for decoding host messages. The encoding key is well known, but only the host knows the private decryption key. Each client can encode a message to the server with the same key, but no one else can decode the message except the server, because only the server has the private key to decode it.
  • RSA: The common challenge facing all public-key asymmetric encryption systems is to ensure that one cannot calculate the secret private key even if one has all the following clues: (1) the public key (2) a small piece of intercepted ciphertext (3) a message and the ciphertext associated with it. The RSA algorithm, invented at MIT and later commercialized by RSA Data Security, is a popular public-key encryption system that satisfies all of these conditions.
  • Hybrid encryption systems and session keys: public-key algorithms are slow to compute, so HTTPS uses a hybrid scheme that combines symmetric and asymmetric encryption. (1) After obtaining the server's public key, the browser generates a temporary random symmetric key, encrypts the message with that symmetric key, and encrypts the symmetric key itself with the server's public key; both are sent to the server. (2) The server decrypts the symmetric key with its own private key, uses it to decrypt the message, and encrypts its reply with the same symmetric key. In this way, slow public-key encryption is applied only to the browser's randomly generated symmetric key (the session key), and the rest of the data is protected by faster symmetric encryption.

The HTTPS encryption process (simplified):

  1. The server sends the client its certificate, which contains the server's public key.
  2. The client checks whether the certificate is valid.
  3. The client generates a random symmetric encryption key and encrypts it with the public key from step 1.
  4. The client sends the result of step 3 to the server, together with the data to be transmitted (the HTTP request), which is encrypted with the symmetric key.
  5. The server decrypts the symmetric key from step 3 with its private key, then uses that symmetric key to decrypt the transmitted data.
  6. After processing the data, the server encrypts its reply (the HTTP response) with the symmetric key and sends it to the client.
  7. The client decrypts the response with the symmetric key.
  8. Steps 4-7 repeat for subsequent messages.
  • Digital signature: a special encrypted checksum used to verify that a message has not been forged or tampered with
  • Digital certificate: Identifying information verified and issued by a trusted organization
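As a sketch of the hybrid scheme and of steps 3 to 7 above, the following uses Node.js's built-in crypto module. The choice of RSA-OAEP to wrap the session key and AES-256-GCM for the data is an assumption for illustration; the real TLS handshake adds certificates, key derivation and record framing, and modern TLS prefers an ephemeral Diffie-Hellman key exchange over encrypting the key with RSA.

```javascript
const crypto = require('crypto');

// Server side: an RSA key pair (in real HTTPS the public key comes from the certificate).
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const rsaOptions = (key) => ({ key, padding: crypto.constants.RSA_PKCS1_OAEP_PADDING, oaepHash: 'sha256' });

// Client, step 3: generate a random session key and encrypt it with the server's public key.
const sessionKey = crypto.randomBytes(32); // 256-bit AES key
const wrappedKey = crypto.publicEncrypt(rsaOptions(publicKey), sessionKey);

// Symmetric encryption with AES-256-GCM using the session key.
function seal(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

function unseal(key, { iv, data, tag }) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag); // GCM also detects tampering
  return Buffer.concat([decipher.update(data), decipher.final()]).toString('utf8');
}

// Client, step 4: send the wrapped key together with the encrypted request.
const request = seal(sessionKey, 'GET /index.html HTTP/1.1');

// Server, step 5: unwrap the session key with the private key, then decrypt the request.
const serverKey = crypto.privateDecrypt(rsaOptions(privateKey), wrappedKey);
console.log(unseal(serverKey, request)); // GET /index.html HTTP/1.1

// Steps 6-7: the response travels back under the same symmetric key.
const response = seal(serverKey, 'HTTP/1.1 200 OK');
console.log(unseal(sessionKey, response)); // HTTP/1.1 200 OK
```

Only the 32-byte session key goes through slow RSA; everything else uses fast AES, which is exactly the point of the hybrid design.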

Click the small lock 🔒 to the left of your browser's address bar to see the site's certificate; my screenshot showed Aliyun's (Alibaba Cloud's) SSL certificate. To learn how certificates are purchased and private keys created, see the product introduction of Alibaba Cloud's SSL certificate service, which also gives a more intuitive feel for these concepts.

  • Site certificate validation: SSL itself does not require the client to check the Web server's certificate, but most modern browsers perform simple sanity checks on certificates and give users ways to investigate further. The checks are: (1) date check (2) signer trust check (3) signature check (4) site identity check
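The signature check described above boils down to verifying data with the signer's public key. A minimal sketch with Node.js's built-in crypto module; the key pair stands in for a certificate authority and the signed text is a hypothetical stand-in for certificate contents (real certificates are X.509 structures verified along a chain).

```javascript
const crypto = require('crypto');

// Hypothetical signer key pair (a CA in the certificate case).
const signer = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

const message = Buffer.from('subject=www.example.com; notAfter=2030-01-01');
// Sign: hash the data with SHA-256 and sign the digest with the private key.
const signature = crypto.sign('sha256', message, signer.privateKey);

// Verify with the public key: succeeds only if data and signature are untouched.
console.log(crypto.verify('sha256', message, signer.publicKey, signature)); // true

const forged = Buffer.from('subject=www.evil.example; notAfter=2030-01-01');
console.log(crypto.verify('sha256', forged, signer.publicKey, signature)); // false
```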

Establishing secure transmission

In unencrypted HTTP, the client opens a TCP connection to port 80 on the Web server. In HTTPS, the client first opens a connection to port 443 on the Web server. Once the TCP connection is established, the client and server initialize the SSL layer and perform an SSL handshake to:

  1. Exchange protocol version numbers
  2. Select a cipher that both sides understand
  3. Authenticate the identities of both ends
  4. Generate a temporary session key

Tunneling secure traffic through proxies

Once the client starts encrypting the data it sends to the server, proxies can no longer read the HTTP headers, so a proxy cannot tell where to forward the request. To make HTTPS work through proxies, the client needs some way to tell the proxy where to connect. One common technique is the HTTPS SSL tunneling protocol: the client first tells the proxy which secure host and port it wants to reach, in plaintext before encryption begins, so the proxy can read it.

The client sends this endpoint information in plaintext using an HTTP extension method called CONNECT, which tells the proxy to open a connection to the desired host and port. Once that is done, data is relayed directly between the client and the server through the tunnel.

CONNECT home.netscape.com:443 HTTP/1.1
User-agent: Mozilla/5.0

<raw SSL-encrypted data would follow here ...>

After the blank line that ends the request, the client waits for the proxy's response. The proxy evaluates the request to make sure it is valid and that the user is allowed to request such a connection. If everything checks out, the proxy connects to the target server and, on success, sends a 200 Connection Established response to the client.
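A minimal sketch of the client side of this exchange: building the plaintext CONNECT request and deciding from the proxy's status line whether the tunnel is open. It adds a Host header (which HTTP/1.1 requires), and it does not perform any real network I/O or the TLS handshake that would follow.

```javascript
// Sketch: build the plaintext CONNECT request a client sends to its proxy.
function buildConnectRequest(host, port, userAgent = 'Mozilla/5.0') {
  return [
    `CONNECT ${host}:${port} HTTP/1.1`,
    `Host: ${host}:${port}`,
    `User-Agent: ${userAgent}`,
    '', // blank line ends the request headers
    '',
  ].join('\r\n');
}

// Any 2xx status from the proxy means it has connected to the target;
// after that, the client starts the SSL/TLS handshake through the tunnel.
function isTunnelEstablished(responseHead) {
  const statusLine = responseHead.split('\r\n')[0];
  const match = statusLine.match(/^HTTP\/\d\.\d (\d{3})/);
  return Boolean(match) && match[1].startsWith('2');
}

console.log(JSON.stringify(buildConnectRequest('home.netscape.com', 443)));
console.log(isTunnelEstablished('HTTP/1.1 200 Connection Established\r\n\r\n')); // true
```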

HTTP entity headers

HTTP/1.1 defines ten basic entity header fields, listed first below; ETag and Cache-Control are not formally entity headers but are just as important when working with entities:

  • Content-Type: the type of the object carried by the entity
  • Content-Length: the length (size) of the entity body being sent. Unless chunked transfer encoding is used, Content-Length is what lets the receiver tell where one message ends and the next begins.
  • Content-Language: the natural language that best matches the transmitted object
  • Content-Encoding: any transformation (such as compression) applied to the object data, e.g. gzip, compress, deflate, br, or identity (no encoding). gzip is the most widely supported; br (Brotli) usually compresses better.
  • Content-Location: an alternate location where the object can be requested
  • Content-Range: for a range request, which part of the whole resource this entity is
  • Content-MD5: a checksum of the entity body
  • Last-Modified: the time the transmitted content was created or last modified on the server
  • Expires: the date and time at which the entity data becomes stale
  • Allow: the request methods allowed on this resource
  • ETag: an entity tag, a validator that identifies this particular version of the entity
  • Cache-Control: how the document may be cached

To make requests more efficient, resources from the origin server can be cached by CDNs, by Nginx (NG) and by the browser, so cache policies can be configured to optimize resource loading.

This raises the question of when cached resources should be updated, which requires freshness checks. The server can supply this information with one of two headers: Expires (HTTP/1.0) or Cache-Control (HTTP/1.1). While the document is still fresh, a cached copy is used directly and the status code is 200 (Chrome DevTools shows it as "from memory cache" or "from disk cache"). Once the document is stale, the cache must revalidate it with the server: if the resource has changed, the new version is returned with status 200; otherwise the server replies 304 Not Modified and the cached copy continues to be used.
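The 200/304 revalidation described above is commonly done with an ETag validator: the cache sends its stored ETag in an If-None-Match request header, and the server compares it with the current one. A minimal server-side sketch; deriving the ETag from a SHA-1 hash of the body is a hypothetical choice, and the comparison ignores lists, weak validators and "*".

```javascript
const crypto = require('crypto');

function makeEtag(body) {
  // Hypothetical choice: a strong ETag derived from a hash of the content.
  return '"' + crypto.createHash('sha1').update(body).digest('hex').slice(0, 16) + '"';
}

// Answer 304 Not Modified (no body) if the cached ETag still matches,
// otherwise 200 with the new body and its new ETag.
function respond(currentBody, ifNoneMatch) {
  const etag = makeEtag(currentBody);
  if (ifNoneMatch === etag) {
    return { status: 304, headers: { ETag: etag }, body: '' };
  }
  return { status: 200, headers: { ETag: etag }, body: currentBody };
}

const first = respond('body v1', undefined);          // initial fetch
const again = respond('body v1', first.headers.ETag); // unchanged
const later = respond('body v2', first.headers.ETag); // changed
console.log(first.status, again.status, later.status); // 200 304 200
```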

Expires gives an absolute expiration date, which is compared against the browser's clock. So if the browser's clock is wrong, the cached copy may be treated as expired too early or kept too long.

expires: Sun, 21 Aug 2022 03:40:05 GMT

Cache-Control expresses expiration as a relative time, through the max-age directive. It supports many other directives as well:

| Directive | Message type | Description | Example |
|---|---|---|---|
| no-cache | Request | Do not return a cached copy of the document until it has been revalidated with the server | Checking "Disable cache" in the Chrome DevTools Network panel adds this directive to request headers |
| no-cache | Response | The resource may be cached, but the cache must ask the server whether it has been updated before each use | Cache-Control: no-cache |
| no-store | Request | Do not return a cached copy of the document, and do not store the server's response | |
| no-store | Response | Do not cache the resource at all; fetch it from the server every time | |
| max-age | Request | The cached document must not be older than the specified age | max-age=31536000 |
| max-age | Response | How long, in seconds, the document stays fresh. Within that time a cached copy is used directly without asking the server; after it, the server is asked whether there is an update | max-age=31536000 |
| private | Response | Only the browser may cache the resource; shared caches such as proxy or CDN caches may not | |
| public | Response | Any cache, including shared proxy caches, may store the resource | Cache-Control: public, max-age=3600 |
| must-revalidate | Response | Once the cached copy is stale, it must be revalidated with the server before being used | Cache-Control: max-age=0, must-revalidate |
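
A minimal client-side sketch of how a cache might combine these rules into a decision: no-store means never reuse, no-cache means always revalidate, max-age (relative) takes precedence over Expires (absolute), and anything else is revalidated. Heuristic freshness and many other directives are ignored, and header names are assumed to be lowercased already (as in Node's incoming message headers).

```javascript
// Parse "public, max-age=3600" into { public: true, 'max-age': '3600' }.
function parseCacheControl(value = '') {
  const directives = {};
  for (const part of value.split(',')) {
    const [name, arg] = part.trim().toLowerCase().split('=');
    if (name) directives[name] = arg === undefined ? true : arg;
  }
  return directives;
}

function cacheDecision(headers, storedAtMs, nowMs) {
  const cc = parseCacheControl(headers['cache-control']);
  if (cc['no-store']) return 'do-not-store';
  if (cc['no-cache']) return 'revalidate';
  if (cc['max-age'] !== undefined) {
    // Relative: fresh while the stored copy is younger than max-age seconds.
    return nowMs - storedAtMs < Number(cc['max-age']) * 1000 ? 'use-cache' : 'revalidate';
  }
  if (headers.expires) {
    // Absolute: compared against the local clock, so a wrong clock skews the result.
    return nowMs < Date.parse(headers.expires) ? 'use-cache' : 'revalidate';
  }
  return 'revalidate'; // no freshness information
}

const t0 = Date.parse('2022-08-21T00:00:00Z');
console.log(cacheDecision({ 'cache-control': 'public, max-age=3600' }, t0, t0 + 60000)); // use-cache
console.log(cacheDecision({ 'cache-control': 'max-age=0, must-revalidate' }, t0, t0)); // revalidate
console.log(cacheDecision({ expires: 'Sun, 21 Aug 2022 03:40:05 GMT' }, t0, t0)); // use-cache
```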

Range requests

Common scenarios are:

  1. Resuming an interrupted download: if the network fails partway through a download, the client can continue by requesting only the remaining bytes.
  2. Segmented downloads: different parts of a file can be fetched separately, even from different servers, to speed up the download.

The associated HTTP header fields are:

  • Range: bytes=2323- means the client is requesting everything after the first 2323 bytes of the document. No end byte is given because the requester may not know the size of the document.
  • Not all servers accept range requests; servers that do indicate it by including Accept-Ranges: bytes in their responses.
  • When a client requests multiple ranges in a single request, the response is still a single entity, with a multipart body and a Content-Type: multipart/byteranges header.
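A minimal server-side sketch of resolving a single byte range against a file size and building the matching Content-Range value, which a server would send with a 206 Partial Content response (or answer 416 Range Not Satisfiable when it returns null here). Multiple ranges, and therefore multipart/byteranges, are left out.

```javascript
// Resolve "bytes=start-end", "bytes=start-" or "bytes=-suffixLength".
function resolveRange(rangeHeader, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader.trim());
  if (!match || (match[1] === '' && match[2] === '')) return null;
  let start;
  let end;
  if (match[1] === '') {
    // Suffix range "bytes=-N": the last N bytes.
    start = Math.max(size - Number(match[2]), 0);
    end = size - 1;
  } else {
    start = Number(match[1]);
    end = match[2] === '' ? size - 1 : Math.min(Number(match[2]), size - 1);
  }
  if (start > end || start >= size) return null; // unsatisfiable
  return { start, end, contentRange: `bytes ${start}-${end}/${size}` };
}

// Resuming a 10000-byte download after the first 2323 bytes:
console.log(resolveRange('bytes=2323-', 10000).contentRange); // bytes 2323-9999/10000
```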

Further reading

  • When is an OPTIONS request sent?
  • Cache-Control and Nginx cache configuration