Do you really understand HTTP? 🍻 Consolidate your HTTP knowledge 🚀🚀

HTTP, and the HTTP caching mechanism in particular, is one of the key topics a front-end engineer needs to master. Knowing it well lets you calmly handle all kinds of probing interview questions and also raises your professional level as a web developer.

Hypertext Transfer Protocol (HTTP) is an application-layer protocol for distributed, collaborative, hypermedia information systems. It can be said that HTTP is the foundation of communication on today's Internet.

Hypertext Transfer Protocol Secure (HTTPS), often called HTTP over TLS, HTTP over SSL, or HTTP Secure, is a protocol for secure communication over a computer network. HTTPS still communicates via HTTP, but uses SSL/TLS to encrypt the data exchanged. It was developed to authenticate web servers and to protect the privacy and integrity of the exchanged data.

Features and disadvantages of HTTP

Features

Connectionless, stateless, flexible, simple and fast

Connectionless: Each connection handles only one request. The server closes the connection once it has processed the client's request and the client has acknowledged the response, which saves transmission time
Stateless: State is the context of a communication. Each HTTP request is independent of and unrelated to the others, and by default no state information is kept between requests
Flexible: Data objects of any type (text, image, video, etc.) can be transmitted; the Content-Type header identifies the type
Simple and fast: To request a resource, the client only needs to send the request method and the URL. Because the protocol is simple, HTTP server programs are small and communication is fast

Disadvantages

Stateless, insecure, plaintext transmission, head-of-line blocking

Stateless: Requests record no information about earlier exchanges. Without this memory, the server cannot tell whether several requests come from the same client. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent on each connection
Insecure: Plaintext transmission can be eavesdropped on, the lack of identity authentication allows impersonation, and the lack of message integrity checks allows tampering
Plaintext transmission: Messages (the header part in particular) are sent in plaintext, directly exposing the information to the outside world. Wi-Fi traps exploit exactly this weakness: they lure you into connecting to a hotspot and then capture all of your traffic to obtain your sensitive information
Head-of-line blocking: When a long (persistent) connection is enabled (described below), requests share one TCP connection, which can process only one request at a time. If that request takes too long, the other requests are blocked.

HTTP message structure

Composition: request message and response message

Request message: request line, request headers, blank line, request data

Response message: status line, response headers, blank line, response data

Request line: contains the request method, the request address, and the HTTP protocol version
Request headers: give the server additional information about the client's request
Blank line: a carriage return and line feed (CRLF) telling the receiver that no more headers follow
Request data: the request parameters (the request body)
Status line: contains the HTTP protocol version, the numeric status code, and the English name (reason phrase) of the status code
Response headers: descriptive information the server returns to the client
Response data: the content (response body) the server returns to the client
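
To make the structure concrete, here is a minimal Python sketch that splits a raw request into the parts listed above. The function name parse_request and the sample request (host example.com, path /login) are made up for illustration; a real server handles many more edge cases.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.x request into request line, headers, and body."""
    # The blank line (CRLF CRLF) separates the headers from the request data.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: method, request address, protocol version.
    method, target, version = lines[0].split(" ", 2)

    # Request headers: one "Name: value" pair per line.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


# Hypothetical sample request, used only for illustration.
raw = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"user=anna"
)
method, target, version, headers, body = parse_request(raw)
print(method, target, version)
print(headers["content-type"], body)
```

A response message can be split the same way; only the first line differs (status line instead of request line).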

HTTP request methods

HTTP/1.0

GET: obtains data from the server
POST: submits data to the server, usually changing server resources
HEAD: obtains only the message headers

HTTP/1.1

GET: obtains data from the server
POST: submits data to the server, usually changing server resources
PATCH/PUT: updates data
DELETE: deletes data
HEAD: similar to GET, except that the response contains no body; used to obtain only the message headers
OPTIONS: lets the client query the server's capabilities, such as the request methods it supports
TRACE: traces the request path; used for testing or diagnosis
CONNECT: asks a proxy to establish a tunnel to the target server

GET and POST

GET is harmless when the browser navigates back, while POST submits the request again
GET requests are actively cached by the browser and remain in the browser history, while POST requests are not cached by default unless configured manually
GET parameters are passed in the URL, whose length is limited (the limit varies by browser), while POST has no such limit
GET parameters are passed in the URL, while POST data is placed in the request body
URLs generated by GET can be bookmarked, but POST requests cannot
GET is less secure than POST because its parameters are exposed directly in the URL, so it should not be used to pass sensitive information
GET requests can only be URL-encoded, while POST supports multiple encodings
GET parameters accept only ASCII characters (other characters must be percent-encoded), while POST has no such restriction
It is often said that GET generates one TCP packet and POST generates two (Firefox is said to send only one): with GET the browser sends the headers and data together and the server responds 200, while with POST the browser first sends the headers, the server responds 100 Continue, and then the browser sends the data and the server responds 200. Treat this as the behavior of particular clients, not as part of the HTTP specification
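
The difference in where the parameters travel can be sketched with Python's standard library. The URL https://example.com/search and the parameter names are placeholders for illustration; nothing is actually sent over the network here.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"q": "http cache", "page": "2"}
query = urlencode(params)  # URL-encoded form: spaces become '+'

# GET: the parameters are part of the URL itself.
get_req = Request("https://example.com/search?" + query)

# POST: the same parameters travel in the request body instead.
post_req = Request(
    "https://example.com/search",
    data=query.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Because the GET parameters end up in the URL, they also end up in bookmarks, history, and server logs, which is the reason for the security remark above.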

HTTP status codes

Status code classification

Status code	Meaning	Explanation
1xx	Informational: request received	The server has received the request but has not yet returned the final response to the client
2xx	Success, such as 200	The client's request was handled successfully
3xx	Redirection, such as 302	The server does not serve the resource at the requested address and asks the client to request another address
4xx	Client error, such as 404	The error originates from the client, for example a request for an address unknown to the server
5xx	Server error, such as 500	The error originates from the server, for example a bug in an interface written on the server side

Common status codes

Status code	Meaning	Use
200	OK: success	Typically returned for successful GET and POST requests
301	Moved Permanently: permanent redirect	Used with the Location header; the browser follows it automatically
302	Found: temporary redirect	Used with the Location header; the browser follows it automatically
304	Not Modified: the resource has not changed	The requested resource has not been modified, so the server returns no resource body with this status code. Clients that cache resources usually send a header stating that they want the resource only if it has been modified since a specified date
404	Not Found: the resource was not found	The server could not find the resource (web page) the client requested. With this code, a web designer can set up a personalized page that says "the resource you requested could not be found"
403	Forbidden: no permission	The server understands the client's request but refuses to execute it
500	Internal Server Error: server error	An internal error occurred on the server
504	Gateway Timeout: the gateway timed out	The server, acting as a gateway or proxy, did not get a response from the upstream server in time
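
As a quick way to check the tables above, Python's standard http module knows the official names of the status codes. The helper describe and the CLASSES mapping below are only an illustrative sketch; the class is simply the first digit of the code.

```python
from http import HTTPStatus

# First digit of the status code -> class of the response.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its official reason phrase, and its class."""
    status = HTTPStatus(code)  # raises ValueError for codes it does not know
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 301, 304, 403, 404, 500, 504):
    print(describe(code))
```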

HTTP cache

1. Introduction to caching

What is caching?

Caching is a technique for saving a copy of a resource and using it directly on the next request.

Why is caching needed?

Without caching, every page visit reloads a large number of images and other resources over the network, which makes pages load much more slowly. The purpose of caching is to reduce the size and number of network requests as much as possible so that pages load faster.

Which resources can be cached? Static resources (JS, CSS, images)

The HTML of a website is generally not cached, because the HTML can be updated and its templates replaced at any time.
Business data on web pages is generally not cached either. In message boards and comment sections, for example, users can post at any time, so the contents of the database change frequently.

2. HTTP cache strategy (forced cache + negotiated cache)

Forced cache

What is a forced cache?

With a forced cache, files are read directly from the local cache without sending a request to the server.

Illustration

As the first figure shows, the browser sends a request to the server for the first time. After receiving the request, the server returns the resource together with a Cache-Control header, which usually sets the maximum lifetime of the cached copy.

As the second figure shows, the browser has stored the Cache-Control value. When the browser requests the resource again, it checks whether the cached copy has expired according to Cache-Control. If it has not, the browser takes the resource from the local cache and returns it to the page without going through the server.

A forced cache has an expiration time, so at some point the cached copy becomes invalid. When that happens, the browser can no longer take the resource from the local cache. It requests the server again, as in the first figure, and the server returns the resource and a Cache-Control value once more.

That is the whole process of forced caching.
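
The expiry check described above can be sketched in a few lines of Python. This is a simplified model, not how any particular browser implements it: the names max_age, is_fresh, stored_at, and now are hypothetical, and only the max-age directive is considered.

```python
import re


def max_age(cache_control: str):
    """Return the max-age value in seconds, or None if it is absent."""
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, stored_at: float, now: float) -> bool:
    """True if the cached copy may be used without contacting the server."""
    lifetime = max_age(cache_control)
    return lifetime is not None and (now - stored_at) < lifetime


# The copy was stored at t=1000 s with a lifetime of 60 s.
print(is_fresh("max-age=60", stored_at=1000.0, now=1030.0))  # still within 60 s
print(is_fresh("max-age=60", stored_at=1000.0, now=1061.0))  # lifetime exceeded
```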

Cache-Control

What is Cache-Control?

It appears in the Response Headers;
It controls the logic of the forced cache;
For example: Cache-Control: max-age=31536000 (in seconds).

Cache-Control values

Cache-Control value	Meaning
max-age	Sets the maximum lifetime of the cache
no-cache	Do not use the local forced cache; the request goes to the server as normal (the cached copy must be revalidated), and how the server handles it is up to the server
no-store	Simple and crude: store nothing in the cache and always fetch the resource from the server
private	Only the end user may cache the response, i.e. the user's computer, phone, and so on
public	Intermediate routers and proxies may also cache the response
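
A header can carry several of these values at once, separated by commas. The following Python sketch parses such a header and summarizes what it means for the forced cache. The function names parse_cache_control and browser_action are hypothetical, and the summary strings are a simplification of the table above.

```python
def parse_cache_control(value: str) -> dict:
    """Parse 'public, max-age=60' into {'public': True, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without '=' are flags; the others carry an argument.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


def browser_action(value: str) -> str:
    """Summarize the effect of a Cache-Control header on the forced cache."""
    d = parse_cache_control(value)
    if "no-store" in d:
        return "do not store"
    if "no-cache" in d:
        return "store, but revalidate before every use"
    if "max-age" in d:
        return f"store and reuse for {int(d['max-age'])} seconds"
    return "no explicit forced-cache lifetime"


print(parse_cache_control("public, max-age=31536000"))
print(browser_action("private, max-age = 60"))
```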

Expires

Also in the Response Headers
Also controls the expiration time of the cache (used in the early days)
If Cache-Control and Expires are both present, Cache-Control has higher priority than Expires
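
The priority rule can be illustrated with a small Python sketch. The function freshness_lifetime and the sample header values are assumptions made for this example; it only handles max-age and Expires.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers: dict, now: datetime) -> float:
    """Seconds the response may be reused; max-age takes priority over Expires."""
    match = re.search(r"max-age\s*=\s*(\d+)", headers.get("Cache-Control", ""))
    if match:
        return float(match.group(1))
    if "Expires" in headers:
        # Expires is an absolute HTTP date, so the lifetime depends on "now".
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0.0, (expires - now).total_seconds())
    return 0.0


now = datetime(2015, 10, 21, 7, 0, 0, tzinfo=timezone.utc)
only_expires = {"Expires": "Wed, 21 Oct 2015 07:28:00 GMT"}
both = {"Cache-Control": "max-age=60", "Expires": "Wed, 21 Oct 2015 07:28:00 GMT"}

print(freshness_lifetime(only_expires, now))  # 28 minutes, taken from Expires
print(freshness_lifetime(both, now))          # 60 seconds, max-age wins
```

Note that Expires is an absolute time and therefore depends on the client's clock, while max-age is relative to the moment the response was received.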

Negotiated cache

What is a negotiated cache?

A negotiated cache is also known as a comparison cache.
It is a server-side caching policy: the server decides whether the cached resource can still be used.
The server checks whether the client's resource is the same as the server's. If it is, the server returns 304; otherwise it returns 200 and the latest resource.

Illustration

Similarly, a few figures illustrate the negotiated cache.

The figure above shows the whole process of a negotiated cache. When the client requests a resource from the server for the first time, the server returns the resource together with a resource identifier. This identifier uniquely identifies the version of the resource being returned; it can be either Etag or Last-Modified, both described after the figures.

When the browser requests the resource again, it sends the resource identifier along. The server uses the identifier to determine whether the browser's copy is the same as the server's. If it is, the server returns 304 (Not Modified), indicating that the resource has not been modified. If it is not, the server returns 200 together with the resource and a new resource identifier. This completes the negotiated-cache process.

Suppose the negotiated cache is based on Last-Modified. On the browser's first request, the server returns the resource together with a Last-Modified value. The browser saves this value and sends it back in the If-Modified-Since request header on later requests.

When the browser requests the resource again, the request header carries the If-Modified-Since value. The server compares this value with the resource's current Last-Modified value. If they are equal, the server returns 304, indicating that the resource has not been modified. If they are not, the server returns 200 together with the resource and the new Last-Modified value.
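
The server side of the Last-Modified exchange might look roughly like the Python sketch below. The function respond_last_modified and the timestamp are made up for illustration, and the comparison treats "not modified since that time" as a match rather than requiring exact equality.

```python
from email.utils import formatdate, parsedate_to_datetime


def respond_last_modified(request_headers: dict, mtime: float):
    """Return (status, headers) for a resource last modified at Unix time mtime."""
    # HTTP dates have one-second resolution, so fractions are dropped.
    last_modified = formatdate(int(mtime), usegmt=True)
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        since_ts = parsedate_to_datetime(since).timestamp()
        if int(mtime) <= since_ts:
            # The client's copy is still current: no body is sent.
            return 304, {"Last-Modified": last_modified}
    return 200, {"Last-Modified": last_modified}


mtime = 1700000000.0  # hypothetical modification time of the resource

# First request: no validator yet, so the full resource is returned.
status, headers = respond_last_modified({}, mtime)
print(status, headers)

# Second request: the browser echoes the value back.
echo = {"If-Modified-Since": headers["Last-Modified"]}
print(respond_last_modified(echo, mtime)[0])       # resource unchanged
print(respond_last_modified(echo, mtime + 10)[0])  # resource modified later
```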

Suppose instead that the negotiated cache is based on Etag. On the browser's first request, the server returns the resource together with an Etag value. The browser saves this value and sends it back in the If-None-Match request header on later requests.

When the browser requests the resource again, the request header carries the If-None-Match value. The server compares this value with the resource's current Etag value. If they are equal, the server returns 304, indicating that the resource has not been modified. If they are not, the server returns 200 together with the resource and the new Etag value.
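
The same exchange with Etag can be sketched as follows. Deriving the Etag from a hash of the content is only one possible choice, assumed here for illustration; servers are free to generate the value however they like. The names make_etag and respond_etag are hypothetical.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive an Etag from the content (one possible scheme among many)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond_etag(request_headers: dict, content: bytes):
    """Return (status, headers, body) according to If-None-Match."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""  # unchanged: empty body
    return 200, {"ETag": etag}, content


page = b"<h1>hello</h1>"

# First request: full response plus the Etag.
status, headers, body = respond_etag({}, page)
print(status, headers)

# Later request: the browser sends the saved Etag back.
echo = {"If-None-Match": headers["ETag"]}
print(respond_etag(echo, page)[0])                # same content
print(respond_etag(echo, b"<h1>changed</h1>")[0])  # content changed
```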

After these figures, you should have a clearer picture of the negotiated cache. Next, let's look at the fields that appeared in them.

Resource identifier

In the Response Headers, there are two resource identifiers:

Last-Modified: the time the resource was last modified; the corresponding request header is If-Modified-Since;
Etag: the unique identifier of the resource. It is unique in the way a human fingerprint is, but it is essentially just a string. The corresponding request header is If-None-Match.

Last-Modified and Etag

When the Response Headers contain both Last-Modified and Etag, Etag is used first;
Last-Modified is only accurate to the second;
If a resource is regenerated repeatedly without its content changing, Etag is more accurate.
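
Both situations in which Etag is more accurate can be shown with a small Python experiment. It assumes a content-hash Etag (an illustrative choice, not a requirement of HTTP); the helper validators and the timestamps are made up.

```python
import hashlib
from email.utils import formatdate


def validators(content: bytes, mtime: float):
    """Return (etag, last_modified) for the given content and modification time."""
    etag = '"' + hashlib.sha256(content).hexdigest()[:16] + '"'
    return etag, formatdate(int(mtime), usegmt=True)


# Case 1: the file is regenerated later with identical content.
etag_a, lm_a = validators(b"body{margin:0}", 1700000000.0)
etag_b, lm_b = validators(b"body{margin:0}", 1700000500.0)
print(etag_a == etag_b, lm_a == lm_b)  # Etag unchanged, Last-Modified changed

# Case 2: the content changes twice within the same second.
etag_c, lm_c = validators(b"v1", 1700000000.2)
etag_d, lm_d = validators(b"v2", 1700000000.9)
print(etag_c == etag_d, lm_c == lm_d)  # Etag changed, Last-Modified identical
```

In case 1, Last-Modified would cause a needless download; in case 2, it would miss a real change.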

Headers sample

As the figure above shows, Last-Modified in the response headers corresponds to If-Modified-Since in the request headers, and Etag corresponds to If-None-Match in the request headers.

Flow chart

A flow chart shows the whole process of the negotiated cache.

3. Refresh methods and their impact on the cache

When we are online, there are always moments when the network suddenly lags. People get impatient at such times and refresh without hesitation. However, refreshing also affects the cache. Let's look at the different ways of refreshing and their impact on the cache.

Normal operation

Definition: entering a URL in the address bar, following a link, going forward or back, etc.

Impact on cache: the forced cache is valid and the negotiated cache is valid.

Manual refresh

Definition: F5, clicking the refresh button, or choosing refresh from the right-click menu.

Impact on cache: the forced cache is invalid, but the negotiated cache is valid.

Forced refresh

Definition: Ctrl + F5.

Impact on cache: the forced cache is invalid and the negotiated cache is invalid.
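
The three cases can be condensed into a small lookup table. The Python sketch below only models the rules listed above; the names Mode, RULES, and resolve are hypothetical, and real browsers differ in the details.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal operation"
    MANUAL_REFRESH = "manual refresh"
    FORCED_REFRESH = "forced refresh"


# (forced cache valid, negotiated cache valid), as listed above.
RULES = {
    Mode.NORMAL: (True, True),
    Mode.MANUAL_REFRESH: (False, True),
    Mode.FORCED_REFRESH: (False, False),
}


def resolve(mode: Mode, is_fresh: bool, unchanged_on_server: bool) -> str:
    """Decide how a resource that is already in the cache gets loaded."""
    forced_ok, negotiated_ok = RULES[mode]
    if forced_ok and is_fresh:
        return "served from local cache, no request"
    if negotiated_ok and unchanged_on_server:
        return "304 Not Modified, reuse cached copy"
    return "200 OK, full download"


for mode in Mode:
    print(mode.value, "->", resolve(mode, is_fresh=True, unchanged_on_server=True))
```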

HTTP head-of-line blocking

When long (persistent) connections are enabled in HTTP, a shared TCP connection can process only one request at a time. If the current request takes too long, the other requests are blocked. This is known as head-of-line blocking.

Concurrent connections

Because multiple persistent connections can be opened to one domain name, there are effectively more task queues, so a single task no longer blocks all the others. RFC 2616 stipulated that a client should keep at most 2 concurrent connections, but in reality many browsers did not play by the rules and simply ignored this standard T_T, so RFC 7230 removed the limit. Browsers today allow about 6 to 8 concurrent connections per domain name (remember: 6 to 8, not just 6; Chrome 6 / Firefox 8). And if that is still not enough for you, read on.
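
A toy simulation makes the effect visible. The Python sketch below assumes that requests are dispatched in order and that each connection handles one request at a time; the durations are invented, and finish_times is a hypothetical helper, not a model of any real browser scheduler.

```python
import heapq


def finish_times(durations, connections):
    """Finish time of each request when every connection serves one request at a time."""
    free_at = [0.0] * connections  # time at which each connection becomes free
    heapq.heapify(free_at)
    result = []
    for duration in durations:  # requests are dispatched in order
        start = heapq.heappop(free_at)  # earliest available connection
        end = start + duration
        result.append(end)
        heapq.heappush(free_at, end)
    return result


# One slow request followed by three fast ones (seconds, made-up numbers).
durations = [5.0, 1.0, 1.0, 1.0]

print(finish_times(durations, connections=1))  # fast requests wait behind the slow one
print(finish_times(durations, connections=6))  # fast requests finish independently
```

With a single connection the three fast requests finish only after the slow one; with six connections each of them finishes after its own one second.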

Domain sharding

A domain name allows at most 6 to 8 concurrent connections, so we can simply add more domain names, such as a.baidu.com, b.baidu.com, and c.baidu.com. With several second-level domains prepared, different resources can be fetched from different second-level domains when we visit baidu.com, even though they all point to the same server. This allows more persistent connections in parallel. With HTTP/2, many resources can be loaded at once thanks to multiplexing: multiple requests can be sent over a single TCP connection.
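
One way to distribute resources across the shard domains is to hash the resource path, as in the Python sketch below. The hashing scheme and the helper shard_url are assumptions for illustration; the host names are the examples from the text.

```python
import zlib

# Shard hosts taken from the example above.
SHARDS = ["a.baidu.com", "b.baidu.com", "c.baidu.com"]


def shard_url(path: str) -> str:
    """Map a resource path to one shard host, always the same host for the same path."""
    index = zlib.crc32(path.encode("utf-8")) % len(SHARDS)
    return f"https://{SHARDS[index]}{path}"


for path in ("/img/logo.png", "/css/site.css", "/js/app.js"):
    print(shard_url(path))

# With 6 connections per domain, three shards allow up to 18 parallel connections.
print(6 * len(SHARDS))
```

Keeping the mapping stable matters: if the same resource were served from a different domain on each visit, its URL would change and the browser cache could not be reused.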

HTTPS

The main functions of the HTTPS protocol rely on TLS/SSL

Features of HTTPS

Encryption: HTTPS encrypts data to protect it from eavesdroppers. While a user browses a website, no one can listen in on the information exchanged between the user and the site, or track the user's activity and access history to steal user information.
Data integrity: Data cannot be tampered with in transit without detection. What the user sends arrives at the server intact, so the server receives exactly what the user sent.
Authentication: confirming the true identity of the other party, i.e. proving that you are you (comparable to face recognition). This prevents man-in-the-middle attacks and builds user trust.
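
On the client side, these three properties map onto settings of a TLS context. The Python sketch below uses the standard ssl module; the helper names make_client_context and fetch_peer_info are hypothetical, and fetch_peer_info is only defined, not called, because it needs network access.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """TLS context with the standard library's client-side defaults."""
    return ssl.create_default_context()


def fetch_peer_info(host: str, port: int = 443):
    """Open a TLS connection and report the negotiated protocol and cipher."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        # server_hostname lets the certificate be checked against the host name.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]


context = make_client_context()
# Authentication: the server certificate must be valid and match the host name.
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```

Encryption and integrity are provided by the cipher suite negotiated during the handshake, which fetch_peer_info would report for a real host.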

Relationship between SSL and TLS

Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are security protocols designed to provide security and data integrity for Internet communication.
When Netscape released the first version of its web browser, Netscape Navigator, in 1994, it introduced the HTTPS protocol, which used SSL for encryption. This is the origin of SSL.
The IETF standardized SSL and published the first version of the TLS standard in 1999. It was followed by RFC 5246 (August 2008) and RFC 6176 (March 2011). The protocol is widely supported in applications such as browsers, email, instant messaging, VoIP, and network fax.

SSL/TLS

For a full explanation, refer to the TLS/SSL article.

To be continued and improved.
