Preface

Hello, I’m Tutu. I haven’t written an article for more than a month; for all sorts of reasons it has been a hard stretch. After several rounds of interviews in which the interviewers tore me apart, I realized that many of my failures came down to never having studied HTTP carefully. My knowledge in this area was weak, so I decided to put data structures and algorithms on hold, relearn HTTP, and then shore up my JavaScript fundamentals.

Enough preamble, let’s begin!

What is HTTP

HTTP is the Hypertext Transfer Protocol, used to carry out the exchanges between a client and a server. A protocol is an agreed set of rules. It is fair to say that the Web is built on communication over HTTP.

The birth of HTTP

I’m sure you’re all in the same position. Before you learn a technology, you’ll want to know its history. Let’s take a look at the history of HTTP.

HTTP was born in March 1989. It was the brainchild of a fellow named Tim Berners-Lee, whose basic idea was to connect documents to one another through hypertext so that they could be consulted from anywhere, forming the World Wide Web (WWW), hereinafter referred to as the Web.

HTTP 0.9 came out in 1990. HTTP had not yet been established as a formal standard.

HTTP 1.0 became a standard in May 1996. The protocol standard is still widely used on the server side.

HTTP 1.1 was published as the current mainstream HTTP protocol version in January 1997.

HTTP 2.0 was called for proposals in March 2012.

The first draft of HTTP 2.0 was released in September of that year.

HTTP 2.0 (officially named HTTP/2) was published as a standard, RFC 7540, in May 2015.

Understand TCP/IP

Before we understand HTTP, let’s take a quick look at the TCP/IP protocol family. Most commonly used networks operate on TCP/IP, and HTTP is one member of that family.

TCP/IP protocol family

When computers and network devices communicate with each other, both sides must follow the same method. How to detect the communication target, which side initiates the communication, what language is used, how to end the communication, and so on all have to be decided in advance. Communication between different hardware and operating systems requires rules, and such rules are called protocols.

Protocols cover everything from cable specifications and how IP addresses are chosen to how a remote party is found, the order in which the two parties establish communication, and the steps for displaying a Web page. Collectively, these associated protocols are called TCP/IP.

The TCP/IP model layer functions

The important thing about TCP/IP is layering. There are four layers: application layer, transport layer, network layer, and data link layer.

Here’s what each layer does.

  • Application layer: The application layer determines the communication activities involved in providing application services to users, for example FTP (File Transfer Protocol) and DNS (Domain Name System). The HTTP protocol also belongs to this layer.

  • Transport layer: The transport layer serves the application layer above it, providing data transfer between two computers connected over the network. This layer has two different protocols: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).

  • Network layer: The network layer handles packets travelling over the network. A packet is the smallest unit of data transmitted over a network. The function of the network layer is to select one transmission route among the many possible routes.

  • Link layer: Handles the hardware that connects to the network. This layer covers the operating system’s device drivers, network interface cards (NICs), cables, and other physical hardware.

The advantage of layering shows when the design needs to change. If the Internet were governed by a single monolithic protocol, a design change anywhere would require replacing everything. With layering, only the layer that changes needs to be replaced. Once the interfaces between the layers are fixed, the internal design of each layer is free to change. For example, an application at the application layer only has to think about the tasks assigned to it and does not need to worry about anything else.

TCP/IP traffic

TCP/IP communicates with each other in a hierarchical order. The client goes down from the application layer, and the server goes up from the link layer. Look at the picture below.

  1. First the client makes an HTTP request at the application layer.
  2. The transport layer receives the data from the application layer, splits it into segments, marks each segment with a sequence number and port number, and forwards it to the network layer.
  3. The network layer adds the MAC address of the communication destination and forwards the data to the link layer.
  4. The server on the receiving end receives the data at the link layer and passes it upward layer by layer, all the way to the application layer, where it finally receives the HTTP request sent by the client.
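The steps above can be sketched as a toy model in Python. This is only an illustration of the idea that each layer wraps the data on the way down and unwraps it on the way up; the `TCP|`, `IP|`, and `ETH|` markers are made-up placeholders, not real header formats.

```python
def encapsulate(http_message: bytes) -> bytes:
    """Wrap an application-layer message in one toy header per lower layer."""
    segment = b"TCP|" + http_message   # transport layer adds its header
    packet = b"IP|" + segment          # network layer adds its header
    frame = b"ETH|" + packet           # link layer adds its header
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip the toy headers in reverse order, as the receiving side does."""
    payload = frame
    for header in (b"ETH|", b"IP|", b"TCP|"):
        if not payload.startswith(header):
            raise ValueError(f"expected header {header!r}")
        payload = payload[len(header):]
    return payload
```

Calling `decapsulate(encapsulate(message))` returns the original message, which mirrors the client going down the layers and the server going back up.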

HTTP related protocols

Before the HTTP client can send packets to the server, it needs to use IP, TCP, and DNS, which are inseparable from HTTP.

IP protocol

The Internet Protocol (IP) sits at the network layer. Its function is to deliver packets to the other party. Two important pieces of information make sure a packet really reaches the peer: the IP address and the MAC address. Think of them as a home address or a phone number.

The IP address is the address assigned to a node, and the MAC address is the fixed address of the network interface card (NIC). An IP address can be paired with a MAC address. IP addresses can change, but MAC addresses generally do not.

Do not confuse IP with an IP address: IP is a protocol, whereas an IP address identifies a computer on the network.

ARP protocol

Communication over IP depends on MAC addresses. The two parties communicating on a network are rarely on the same LAN; usually the data passes through several computers and network devices before reaching the other side. At each hop, the MAC address of the next relay device is used to find the next destination, and this is where ARP comes in. ARP (Address Resolution Protocol) resolves addresses: given the IP address of the communication party, it looks up the corresponding MAC address.

No computer or router knows the complete route to the destination in advance; each one only has a rough idea of where to send the data next. This mechanism is called routing.

It’s the same as when you buy things on Taobao. For example, if you buy a piece of clothing on Taobao.com, the delivery company will deliver it to you based on your address, rather than directly to you. It goes through all sorts of relay stations in Hangzhou and then to Shenzhen, and then it gets delivered to you.

TCP protocol

TCP sits at the transport layer, and its main function is to provide a reliable byte stream service. A byte stream service splits large chunks of data into segments so that they are easier to transmit. A reliable service means that data is delivered to the other party accurately and dependably.

To make sure data reaches the other party, TCP uses a three-way handshake. The following figure shows the process.

  1. First handshake: The client sends a packet with the SYN flag to the peer.
  2. Second handshake: After receiving the packet, the server sends back a packet with the SYN/ACK flag to acknowledge the packet.
  3. Third handshake: At the end of the handshake, the client sends back a packet with an ACK flag.
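Application code never sends the SYN and ACK packets itself; the operating system performs the handshake when a program opens a connection. As a minimal sketch, assuming the loopback interface is available, the Python function below opens a TCP connection to a listener in the same process. The function name is made up for this example.

```python
import socket


def connect_over_loopback() -> bool:
    """Open a TCP connection on 127.0.0.1 and check that data gets through."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # connect() returns once the kernel has completed
            # SYN -> SYN/ACK -> ACK with the listening socket.
            client.connect(("127.0.0.1", port))
            conn, _addr = server.accept()
            with conn:
                client.sendall(b"ping")
                return conn.recv(4) == b"ping"
```

Only after `connect()` has returned, that is, after the handshake, can the two sides exchange data.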

DNS service

The DNS service, like HTTP, is at the application layer. Its main function is to resolve domain names into IP addresses. The DNS protocol allows users to search for IP addresses by domain name or reverse lookup by IP address.
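A forward lookup can be tried from Python with the standard `socket.getaddrinfo` call, which asks the system resolver (hosts file, DNS, and so on). This is a small sketch; the helper name `resolve` is made up for this example, and the result depends on the machine’s resolver configuration.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the distinct IP addresses the system resolver finds for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:  # sockaddr[0] is the IP address string
            addresses.append(sockaddr[0])
    return addresses
```

For example, `resolve("localhost")` normally returns loopback addresses, and `resolve("www.tutu.com")` would return whatever the resolver currently reports for that name.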

The relationship between each protocol and HTTP is shown below.

What are URLs and URIs?

  • URL stands for Uniform Resource Locator. It is the Web address used to access a Web site, for example http://www.tutu.com.
  • URI stands for Uniform Resource Identifier. Its function is to distinguish between different resources on the Internet, for example HTML documents, images, video clips, programs, and so on. URL is a subset of URI.

URI format

The following figure shows the format of the URI.

  • Protocol scheme: http: or https: indicates the protocol name. It is case insensitive and is followed by //.
  • Login information: user:pass@ gives the user name and password for obtaining server resources. It is not recommended because it is not safe.
  • Server address: There are three forms of server address:
    • a domain name such as www.tutu.com;
    • an IPv4 address such as 192.168.0.1;
    • an IPv6 address enclosed in square brackets, such as [0:0:0:0:0:0:0:1].
  • Server port number: :8080 indicates the port number.
  • File path: /html/index.html is the path of the file on the server, that is, the location of the resource.
  • Query string: ?userId=1 passes parameters to the resource at the file path. Parameters follow the ? in key=value form; additional parameters are joined with &.
  • Fragment identifier: #cn1 marks a location within the file. This is the usual Web page anchor.
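To see these parts in practice, Python’s standard `urllib.parse` module can split a URI into its components. The sample URI below is assembled from the pieces used in the list above; the helper name `describe_uri` is made up for this example.

```python
from urllib.parse import parse_qs, urlsplit


def describe_uri(uri: str) -> dict:
    """Split a URI into the components described in the list above."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parse_qs(parts.query),  # each key maps to a list of values
        "fragment": parts.fragment,
    }


info = describe_uri("http://user:pass@www.tutu.com:8080/html/index.html?userId=1#cn1")
```

Here `info["host"]` is `"www.tutu.com"`, `info["port"]` is the integer `8080`, and `info["query"]` is `{"userId": ["1"]}`.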

HTTP basics

HTTP is a stateless protocol: it does not keep any record of requests or responses that have already been sent.

Persistent connection

In HTTP 1.1 all connections are persistent (keep-alive) by default, whereas in HTTP 1.0 persistence is off (close) by default. The Connection field in the request/response header shows whether persistent connections are enabled (the values of this field are described below).

A persistent connection keeps the TCP connection open until either the client or the server explicitly closes it. The benefit is less overhead from repeatedly establishing and tearing down TCP connections and less load on the server. HTTP requests and responses finish sooner, so pages display faster.
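One way to observe connection reuse is to run a tiny local server and send several requests through a single `http.client.HTTPConnection`. The sketch below uses only the Python standard library; the handler answers with the client’s source port, so identical answers mean that the same TCP connection carried every request. The class and function names are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PortEchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = str(self.client_address[1]).encode()  # client's source port
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def ports_seen_over_one_connection(n: int = 2) -> list[str]:
    """Send n GET requests over one connection; return the port seen each time."""
    server = HTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        ports = []
        for _ in range(n):
            conn.request("GET", "/")
            ports.append(conn.getresponse().read().decode())
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()
```

If the connection were closed after every response, each request would need a fresh TCP handshake and would normally arrive from a different source port.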

Pipelining

Pipelining builds on persistent connections: the client can send the next request without waiting for the previous response, so several requests are in flight at once. Because the client no longer waits for each response in turn, pipelining is faster than a persistent connection alone.

HTTP messages

There are two types of HTTP messages: request messages and response messages. A message is divided into a header and a body, and the body is optional. A message contains the following three parts.

  • Start line: there are two types.
    • Request line: request method, request URL, HTTP version
    • Response line (status line): HTTP version, status code, reason phrase
  • Header fields: header information in key: value form.
  • Body: the data to be sent.

This figure is an example of a request message.
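The three parts can also be pulled apart in a few lines of Python. This is a simplified sketch that assumes a well-formed message with CRLF line endings and no repeated header names; the sample request is invented for illustration.

```python
def parse_message(raw: str):
    """Split a raw HTTP message into start line, header dict, and body."""
    head, _, body = raw.partition("\r\n\r\n")  # a blank line ends the header
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return start_line, headers, body


raw_request = (
    "POST /form HTTP/1.1\r\n"
    "Host: www.tutu.com\r\n"
    "Content-Length: 8\r\n"
    "\r\n"
    "userId=1"
)
start_line, headers, body = parse_message(raw_request)
method, target, version = start_line.split(" ")
```

After this runs, `method` is `"POST"`, `headers["host"]` is `"www.tutu.com"`, and `body` is `"userId=1"`.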

HTTP request methods

  • GET: Obtains a resource from the server.
  • POST: Submits information to the server.
  • PUT: Transfers a file to the server.
  • HEAD: Same as GET, but only the response header is returned. It is used to check whether a URL is valid and when the resource was last updated.
  • DELETE: Deletes the specified resource.
  • OPTIONS: Queries which methods the server supports for the requested resource.
  • TRACE: Traces the path of the request by having the server echo back the request it received.
  • CONNECT: Asks a proxy server to establish a tunnel.

HTTP status codes

1XX

1XX indicates that the request has been received and is being processed.

2XX Success

  • 200 OK: Indicates that the request sent by the client was processed successfully by the server.
  • 204 No Content: Indicates that the request was processed successfully, but there is no resource to return.
  • 206 Partial Content: Indicates that the client requested only part of a resource and the server successfully executed that partial GET request. The response contains a Content-Range field that specifies which part of the entity is returned.

3XX Redirection

  • 301 Moved Permanently: Permanent redirection. Indicates that the requested resource has been assigned a new URL and that the new URL should be used from now on.
  • 302 Found: Temporary redirection. Indicates that the requested resource is temporarily available at a different URL.
  • 303 See Other: Indicates that the requested resource is available at another URL and that the GET method should be used to fetch it.
  • 304 Not Modified: Indicates that the client sent a conditional request and the resource has not changed, so the client can keep using its cached copy. The negotiation cache returns this status code.
  • 307 Temporary Redirect: Temporary redirection, similar to 302, but the request method must not be changed when following the redirect.

When the 301, 302, and 303 response status codes return, almost all browsers change POST to GET and remove the body from the request message, after which the request is automatically sent again. The 301 and 302 standards forbid changing POST to GET, but everyone does it in practice.

4XX Client error

  • 400 Bad Request: indicates that a syntax error exists in the request packet.
  • 401 Unauthorized: Indicates that the request requires HTTP authentication. If the request already included credentials, it means that user authentication failed.
  • 403 Forbidden: Indicates that the access to the requested resource is denied by the server.
  • 404 Not Found: Indicates that the requested resource cannot be found on the server.

5XX Server error

  • 500 Internal Server Error: Indicates that an error occurred while the server was executing the request.
  • 503 Service Unavailable: Indicates that the server is temporarily overloaded or is being stopped for maintenance.
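The first digit of a status code gives its class, so the class can be derived by integer division. The sketch below uses Python’s standard `http.HTTPStatus` enum for the reason phrases; the function name and the class labels are chosen for this example.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    phrase = HTTPStatus(code).phrase       # raises ValueError for unknown codes
    return f"{code} {phrase} ({STATUS_CLASSES[code // 100]})"
```

For example, `describe_status(404)` returns `"404 Not Found (client error)"`.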

HTTP-related Web servers

In addition to the client side and the server side, there are applications for forwarding communication data. Examples are proxy, gateway, tunnel, and cache.

Proxy

A proxy is a forwarding application that exists between the client and server, acting as a middleman. It forwards requests from the client to the server. Of course, it also forwards the response returned by the server to the client.

The Via field appears in the header each time a request or response is forwarded through a proxy server.

Gateway

A gateway is a special type of server that acts as an intermediary for other servers. It is used to convert HTTP requests into communication over other protocols. The gateway handles a request as if it were itself the source server that owns the resource.

Tunnel

The tunnel can establish a communication line with other servers as required, and then use SSL encryption for communication. The purpose of the tunnel is to ensure secure communication between the client and the server.

Cache

A cache is a copy of a resource stored on a proxy server or on the client’s local disk. Caching reduces the number of requests to the source server. Its main purpose is to save network bandwidth and shorten communication time.

A cache server is a type of proxy server that keeps a copy of the resource when forwarding the response returned from the server. The advantage of the cache server is that the cache avoids multiple forwarding of resources from the source server. So the client can fetch resources from the nearest cache server, and the source server does not have to process the same request multiple times.

Cache validity

Every time a resource on the source server is updated, if the cache is still unchanged, it will return the old resource before the update.

Even when a cached copy exists, the cache may still confirm its validity with the source server, for example because the client requires it or because the cached copy has reached its expiration date. If the cached resource has expired, the cache server fetches the resource again from the source server.

Client cache

Client cache here means the cache in the browser. If the cached copy has not expired, the browser reads the resource from the local disk without requesting it from the source server. When the cached copy expires, the browser confirms its validity with the source server and, if it is no longer valid, requests the resource again.

Content negotiation

Content negotiation means that the client and the server negotiate the content of the response so that the client receives the most suitable version of the resource. Negotiation is based on language, character set, encoding, and so on.

The main headers used are:

  • Accept
  • Accept-Charset
  • Accept-Language
  • Content-Language

There are three types of content negotiation techniques.

  • Server-driven negotiation
    • Content negotiation is performed by the server.
  • Agent-driven negotiation (client-driven)
    • Content negotiation is performed by the client.
  • Transparent negotiation
    • A combination of server-driven and client-driven negotiation, in which the server and the client each take part in content negotiation.

End-to-end headers and hop-by-hop headers

HTTP header fields are divided into two types according to how caching and non-caching proxies handle them.

  • End-to-end headers: Headers in this category are forwarded to the final recipient of the request or response. They must be stored in responses generated from a cache and must always be forwarded.
  • Hop-by-hop headers: Headers in this category are valid only for a single hop and are not forwarded once they have passed through a cache or proxy. In HTTP 1.1 and later, a hop-by-hop header must be listed in the Connection header field.

In HTTP 1.1, all header fields except the following eight are end-to-end headers.

  • Connection
  • Keep-Alive
  • Proxy-Authenticate
  • Proxy-Authorization
  • Trailer
  • TE
  • Transfer-Encoding
  • Upgrade
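As a rough sketch of what a proxy does with this list before forwarding a message, the Python function below drops the eight hop-by-hop fields plus any field named in the Connection header. It assumes headers are held in a plain dict; the `X-Debug` header in the usage note is hypothetical.

```python
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "trailer", "te", "transfer-encoding", "upgrade",
}


def strip_hop_by_hop(headers: dict[str, str]) -> dict[str, str]:
    """Return only the headers a proxy should forward to the next hop."""
    # Find the Connection header regardless of how its name is capitalized.
    connection = next(
        (value for name, value in headers.items() if name.lower() == "connection"),
        "",
    )
    # Fields named in Connection are hop-by-hop for this message as well.
    listed = {token.strip().lower() for token in connection.split(",") if token.strip()}
    drop = HOP_BY_HOP | listed
    return {name: value for name, value in headers.items() if name.lower() not in drop}
```

Given `{"Host": "www.tutu.com", "Connection": "keep-alive, X-Debug", "Keep-Alive": "timeout=5", "X-Debug": "1", "Accept": "*/*"}`, only `Host` and `Accept` survive.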

HTTP general header fields

The fields that appear in the request/response header are listed below and contain important information.

Cache-Control

Cache-Control specifies how resources are cached. Its directives are optional; multiple directives are separated by commas (,).

Request header

When the Cache-Control field is used in the request header, it can take the following values:

  • no-cache: Does not use the strong cache and forces validation with the source server to check whether the cached resource has expired (negotiation cache).
  • no-store: Does not use any cache, fetching the latest resource from the source server each time.
  • max-age: In seconds. The client accepts a cached resource if it has been cached for no longer than the specified time.
  • min-fresh: In seconds. Asks the proxy server to return a cached resource only if it will stay fresh for at least the specified time.
  • max-stale: Accepts cached resources even if they have expired.
  • only-if-cached: Tells the proxy server to return the resource only if it is in the cache.
  • no-transform: The resource must not be transformed, which prevents caches or proxies from compressing images and similar operations.

Response headers

When the Cache-Control field is used in the response header, it can take the following values:

  • public: Resources can be cached by browsers and proxy servers.
  • private: Resources can only be cached by the browser. Nothing else.
  • no-cache: You can cache, but verify with the source server that the cache resource has expired before using it.
  • s-maxage: In seconds. Indicates the expiration time of the resource on proxy servers. When s-maxage is present, the max-age and Expires fields are ignored.
  • max-age: In seconds. Sets how long the resource may be cached. Until that time has passed, the resource is not requested from the server again; after it, the resource has expired. If the response also contains an Expires field, HTTP 1.1 gives priority to max-age, while in HTTP 1.0 the opposite is true.
  • must-revalidate: The resource can be cached, but must be validated again with the source server. If the validation request cannot reach the source server, a 504 status code is returned. This directive causes max-stale to be ignored.
  • proxy-revalidate: Asks the cache server to validate the cached response.
  • no-transform: Resources cannot be converted to prevent caching or proxy compression of images and similar operations.
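Since the directives are comma-separated and some carry an `=value`, a Cache-Control value can be turned into a dictionary with a few lines of Python. This is a simplified sketch: it does not handle quoted arguments that themselves contain commas.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Map each directive name to its argument, or None if it has none."""
    directives: dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive, so normalize to lower case.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives
```

For example, `parse_cache_control("public, max-age=3600, s-maxage=600")` returns `{"public": None, "max-age": "3600", "s-maxage": "600"}`.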

Connection

The Connection field determines whether the TCP connection is closed once the current request/response exchange is complete. There are two values.

  • keep-alive: Persistent connection.
  • close: Closes the TCP connection as soon as the exchange is complete.

Date

The Date field is in GMT format and indicates the date and time at which the HTTP message was created.

Date: Tue, 13 Apr 2021 12:35:41 GMT
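Python’s standard `email.utils` module can produce and parse this date format. A small sketch, assuming the input is a timezone-aware `datetime`; the helper name `http_date` is made up for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(moment: datetime) -> str:
    """Format an aware datetime the way HTTP date headers expect (always GMT)."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


created = datetime(2021, 4, 13, 12, 35, 41, tzinfo=timezone.utc)
header_value = http_date(created)
parsed_back = parsedate_to_datetime(header_value)
```

Here `header_value` is `"Tue, 13 Apr 2021 12:35:41 GMT"`, and parsing it gives back the same instant.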

Pragma

Pragma is used for backward compatibility with cache servers that only support HTTP 1.0. Pragma: no-cache works the same as Cache-Control: no-cache.

Pragma: no-cache

Upgrade

Upgrade is used to check whether communication can switch to a later version of HTTP or to another protocol.

Upgrade: HTTP/2.0
Connection: Upgrade

Because Connection: Upgrade is specified, the Upgrade field is deleted before the message is forwarded.

Via

Via is used to track the transmission path of request and response packets between the client and server, and to avoid request loops.

Through a single proxy server:

Via: 1.0 gw.hackr.jp (Squid/3.1)

Through multiple proxy servers:

Via: 1.0 gw.hackr.jp (Squid/3.1), 1.1 al.example.com (Squid/2.7)

When a message passes through proxy server A, a string such as “1.0 gw.hackr.jp (Squid/3.1)” is appended to the Via header. The leading 1.0 is the version of the HTTP protocol used by the server that received the request. If the message passes through several proxy servers, each one appends its own entry.

Warning

The Warning field tells the user some cache-related warnings.

Request header fields

Accept

The Accept header is used to tell the server what type of content the client can handle. Several media types are listed below.

  • Text files: text/html, text/plain, text/css, application/xhtml+xml, application/xml, and so on.
  • Image files: image/jpeg, image/gif, image/png;
  • Video files: video/mpeg, video/quicktime;
  • Binary files: application/octet-stream, application/zip;

If the value is */*, the client accepts any content type. A value such as image/* stands for any image type.

To give priority to certain media types, add a weight with q=, separated from the media type by a semicolon (;). Weights range from 0 to 1, with up to 3 decimal places, and 1 is the highest. When no weight is specified, the default is q=1.0.

Accept: text/html, application/json;q=0.9
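The weighting rule can be sketched in Python as follows. The function splits an Accept value, reads each q parameter (defaulting to 1.0), and sorts the media types by preference. It is a simplified illustration and ignores parameters other than q.

```python
def parse_accept(value: str) -> list[tuple[str, float]]:
    """Return (media type, weight) pairs ordered from most to least preferred."""
    items = []
    for part in value.split(","):
        media, *params = [piece.strip() for piece in part.split(";")]
        if not media:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        items.append((media, q))
    # sorted() is stable, so entries with equal weights keep their original order.
    return sorted(items, key=lambda item: item[1], reverse=True)
```

For example, `parse_accept("text/html, application/json;q=0.9")` returns `[("text/html", 1.0), ("application/json", 0.9)]`.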

Accept-Charset

The Accept-Charset request header is used to tell the server which character sets the client can handle. Multiple character sets can be specified at once. As with Accept, priority is indicated by a q value. This header is used in server-driven content negotiation.

Accept-Charset: iso-8859-1

Accept-Charset: iso-8859-1;q=0.5

Accept-Encoding

The Accept-Encoding request header tells the server which content encodings the client can understand. You can specify more than one content encoding at a time, including the following.

  • gzip: The encoding format produced by the file compression program gzip, which uses the Lempel-Ziv algorithm and a 32-bit cyclic redundancy check.
  • compress: The encoding format produced by the UNIX file compression program compress, which uses the Lempel-Ziv-Welch algorithm.
  • deflate: The encoding format that combines the zlib format with the deflate compression algorithm.
  • identity: The default encoding format, which performs no compression or transformation.

Accept-Encoding: gzip, deflate

As with Accept, the q value is used to set the priority. There is also the use of an asterisk (*) to specify any encoding format.

Accept-Language

The Accept-Language request header tells the server which natural languages (for example Chinese or English) the client can understand, and their priority. As with Accept, several languages can be specified at once, with priority set by the q value.

Accept-Language: zh-CN,zh;q=0.9,en;q=0.8

The client requests a response in Chinese if the server has one, or returns an English version if it does not.

Authorization

Authorization is used to tell the server the authentication information (credentials) of the user agent. The Authorization header field is typically added to the request after the server has returned a 401 status code response.

Authorization: Basic dWVub3NlbjpwYXNzd29yZA==
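With the Basic scheme, the value after `Basic` is simply `username:password` encoded in Base64, which is an encoding and not encryption. The Python sketch below builds and decodes such a value; the function names are made up for this example.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic(header_value: str) -> tuple[str, str]:
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password
```

Decoding the example value above with `decode_basic("Basic dWVub3NlbjpwYXNzd29yZA==")` returns `("uenosen", "password")`, which shows why Basic authentication should only be sent over an encrypted connection.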

Expect

Expect is used to tell the server that the client expects a particular behavior and that the request should be processed only if the server can meet that expectation. If the server cannot meet it, the 417 status code is returned. Currently, only the 100-continue expectation is defined.

Expect: 100-continue

From

The From field represents the E-mail address of the user agent’s user. The purpose is to display the E-mail contact information of the search engine user agent’s owner.

From: [email protected]

Host

The Host header specifies the Host name and port number of the server where the requested resource resides. If the server does not set a host name, a null value is sent.

Host: www.tutu.com

If-Match

Request header fields of the form If-XXX are conditional requests. When the server receives a conditional request, it executes the request only if it determines that the condition is true.

The If-Match field is compared with the ETag value of the resource on the server. The request is processed only when the ETag value is the same as the If-Match value. Otherwise, the 412 status code is returned. An asterisk (*) means that the server ignores the ETag value and processes the request as long as the resource exists.

If-Match: "123456"

If-Modified-Since

If-Modified-Since is used to confirm whether the copy of a resource held by a proxy server or client is still valid. The request is processed only if the requested resource has changed after the specified time. If the resource has not changed, the 304 status code is returned.

If-Modified-Since: Tue, 13 Apr 2021 12:35:41 GMT

If-None-Match

If-None-Match is the opposite of If-Match. The request is processed only when the ETag value of the server resource is different from the If-None-Match value. Adding this field to GET and HEAD requests fetches the latest resource.
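On the server side, the check can be sketched like this in Python. The ETag here is derived from a hash of the body, which is only one possible scheme chosen for illustration; the function names are made up, and the request headers are assumed to be a plain dict.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag value from the content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond_to_get(body: bytes, request_headers: dict[str, str]):
    """Return (status, headers, payload) for a GET that may carry If-None-Match."""
    etag = make_etag(body)
    candidates = [
        tag.strip() for tag in request_headers.get("If-None-Match", "").split(",")
    ]
    # If-None-Match compares weakly, so a W/ prefix on the client's tag is ignored.
    if "*" in candidates or etag in [tag.removeprefix("W/") for tag in candidates]:
        return 304, {"ETag": etag}, b""   # the client's cached copy is still good
    return 200, {"ETag": etag}, body
```

A first request without the header gets 200 and an ETag; repeating the request with that ETag in If-None-Match gets 304 with an empty body until the content changes.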

If-Range

If-Range tells the server to process the request as a range request only if the value of the If-Range field matches the ETag value or modification time of the requested resource (the Range field specifies which bytes are requested). Otherwise, the range is ignored and the entire resource is returned.

If-Range: "123456"
Range: bytes=5001-10000

If-Unmodified-Since

The If-Unmodified-Since field tells the server to process the request only if the requested resource has not been modified after the specified time. If it has been modified after that time, the 412 status code is returned.

If-Unmodified-Since: Tue, 13 Apr 2021 12:35:41 GMT

Proxy-Authorization

The Proxy-Authorization field contains the credentials that the user agent provides to the proxy server for authentication.

Proxy-Authorization: Basic dGlwOjkpNLAGfFY5

Range

The Range request header field indicates which part of the resource to fetch. A server that can process a request with the Range field returns a 206 status code together with the requested part. If it cannot process the range, it returns a 200 status code together with the entire resource. The following example requests bytes 5001 through 10000 of the resource.

Range: bytes=5001-10000
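The server’s side of a range request can be sketched as below. This simplified Python version only understands a single `bytes=start-end` or `bytes=start-` range; anything else is treated as if no Range header had been sent. The 416 response for a range that starts beyond the end of the resource is standard HTTP behavior that the text above does not cover. The function name is made up for this example.

```python
import re
from typing import Optional


def serve_range(body: bytes, range_header: Optional[str]):
    """Return (status, headers, payload) for an optional single byte range."""
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header or "")
    if not match:
        return 200, {}, body                      # no usable range: send everything
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else len(body) - 1
    end = min(end, len(body) - 1)                 # clamp to the last byte
    if start > end:
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    headers = {"Content-Range": f"bytes {start}-{end}/{len(body)}"}
    return 206, headers, body[start:end + 1]      # byte positions are inclusive
```

With a ten-byte body `b"0123456789"`, the header `bytes=2-5` yields status 206, `Content-Range: bytes 2-5/10`, and the payload `b"2345"`.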

Referer

The Referer field indicates which Web page the request originated from. The server uses the Referer field to identify access sources for statistical analysis, logging, and cache optimization. In the following example, the request was made from www.tutu.com.

Referer: www.tutu.com

TE

TE indicates the transfer encodings, and their priority, that the client can handle in the response.

TE: gzip, deflate;q=0.5

User-Agent

The User-Agent field is used to pass information such as the browser making the request and the user agent name to the server.

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36

Response header fields

Accept-Ranges

Accept-Ranges indicates whether the server can process range requests. There are two values: bytes and none. none indicates that range requests cannot be processed.

Age

Age indicates how long ago the source server returned the resource. The unit is seconds.

Age: 3600

ETag

ETag is an identifier for a specific version of a resource. The server assigns an ETag value to each resource, and when the resource changes, the ETag value changes too. For example, if the same URL serves both Chinese and English versions, the Chinese resource (ETag: user-chi) is returned when the site is switched to Chinese, and the English resource (ETag: user-us) when it is switched to English.

Strong and weak ETags

  • Strong ETag: The value changes no matter how small the change to the resource is.

ETag: "user-123456"

  • Weak ETag: Only indicates whether resources are equivalent. The ETag value changes only when the resource changes in a fundamental way. The field value is prefixed with W/.

ETag: W/"user-123456"

Location

The Location field indicates the address the page should be redirected to. It is usually only useful when the status code is 3XX. The following example redirects to www.baidu.com.

Location: www.baidu.com

Proxy-Authenticate

The Proxy-Authenticate field indicates that authentication is required to obtain resources through the proxy server.

Retry-After

The Retry-After field tells the client how long to wait before making the request again. It is used with 503 and 3XX status code responses.

Retry-After: 120

Server

The Server field gives information about the software used by the server that processed the request.

Server: Apache/2.2.17

Vary

The Vary field controls how caches handle a response. When a proxy server receives from the source server a response that contains Vary, it may serve the cached copy only to later requests whose header fields named by Vary have the same values as the request that produced the cached response. Even for the same resource, if the header fields named by Vary differ, the resource must be fetched again from the source server.

Vary: Accept-Language

Entity header fields

Allow

Allow is used to inform clients of HTTP methods supported by resources. If the server receives an unsupported HTTP method, it returns a 405 status code as a response.

Allow: GET, DELETE

Content-Encoding

The Content-Encoding field indicates how the server has encoded the entity body. The four content encodings are the ones described under Accept-Encoding.

Content-Encoding: gzip

Content-Language

The Content-Language field indicates the natural language used by the entity body.

Content-Language: zh-CN

Content-Length

Content-Length indicates the size (in bytes) of the entity body.

Content-Length: 1500

Content-Location

The Content-Location field gives the address of the data being returned.

Content-Location: https://www.tutu.com/index.html

Content-Range

The Content-Range field indicates where the returned piece of data sits within the entire file.

Content-Range: bytes 5001-10000/10000

Content-Type

The Content-Type field indicates the media type of the object in the entity body.

Content-Type: text/html; charset=UTF-8

Expires

The Expires field tells the client when the resource expires. Until that date the resource can be read from the browser cache; after it, the resource must be requested from the server again. If the header also contains Cache-Control: max-age, the max-age directive takes priority.

Expires: Tue, 13 Apr 2021 12:35:41 GMT

Last-Modified

The last-Modified field indicates the time when the resource was Last Modified.

Last-Modified: Tue, 13 Apr 2021 12:35:41 GMT

HTTP cache

HTTP caching comes in two kinds: strong cache and negotiation cache. Caching speeds up resource loading, improves the user experience, reduces network traffic, and relieves pressure on the server.

Strong cache

With strong caching, the browser checks whether the requested resource is still within its validity period. If it is, the resource is read directly from the cache and no request is sent to the server. Strong caching is controlled by three header fields: Expires, Cache-Control, and Pragma.

Cache-Control

The Cache-Control header field was covered in detail above, including the directives available to each side. The most common values are listed below.

  • public: The resource can be cached by both browsers and proxy servers.
  • private: The resource can be cached only by the browser.
  • no-cache: Skips the strong cache and forces the cached copy to be revalidated with the origin server before use. This value means the negotiation cache is used.
  • no-store: Caches nothing; the latest resource is fetched from the origin server every time.
  • max-age: The client uses the cached resource as long as its age does not exceed the given time, in seconds.
  • s-maxage: Applies only to proxy servers. It sets the resource's expiration time on the proxy; when s-maxage is present, the proxy ignores max-age and Expires.
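
The directives above arrive as one comma-separated header value. A minimal Python sketch of splitting it apart could look like this; `parse_cache_control` is a hypothetical helper and ignores edge cases such as commas inside quoted strings.

```python
def parse_cache_control(value: str) -> dict:
    """Turn 'public, max-age=3600' into {'public': None, 'max-age': '3600'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        # Directives without an argument (public, no-store, ...) map to None.
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


parsed = parse_cache_control("public, max-age=3600, s-maxage=600")
print(parsed)
```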

Expires

The Expires field is a date and time in GMT format that tells the client when the resource expires. The client caches the response that carries this field. On a later request for the same resource, the client compares the Expires value with its local time. If the local time is earlier than the Expires value, the client uses the cached resource instead of sending a request to the server.

Expires has a weakness. Because the comparison uses the client's local clock, changing the local time (or any drift between the client and server clocks) makes the expiration check behave unexpectedly.

Expires has the lowest priority of the three.
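
The priority between max-age and Expires can be sketched in a few lines of Python. This is a simplified model, not what any browser literally runs: `is_fresh` is a made-up function, Pragma is left out, and both timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def is_fresh(stored_at: datetime, now: datetime,
             max_age: Optional[int], expires: Optional[str]) -> bool:
    """Decide whether a cached response may be used without contacting the server."""
    if max_age is not None:
        # max-age wins over Expires and does not depend on the server's clock.
        return now - stored_at < timedelta(seconds=max_age)
    if expires is not None:
        return now < parsedate_to_datetime(expires)
    return False


stored = datetime(2021, 4, 13, 12, 0, 0, tzinfo=timezone.utc)
print(is_fresh(stored, stored + timedelta(minutes=30), None, "Tue, 13 Apr 2021 12:35:41 GMT"))
```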

Pragma

Pragma was introduced above, so I won't go over it again here.

When Chrome serves a resource from the strong cache, it shows a 200 status code, in one of two forms.

  • memory cache: The resource is read from browser memory. This lasts only as long as the page stays open.
  • disk cache: The resource is read from the cache on disk.

With strong caching, if a resource on the server is updated, the client is unaware of it and will use the resource in the cache until it expires. You can force the refresh by pressing Ctrl + F5.

Negotiation cache

With the negotiation cache, before using its local copy the browser sends a GET request to the server to check whether the locally stored resource is still valid.

Last-Modified and If-Modified-Since

Validity is usually judged by the time the requested resource was last modified. Suppose a client requests a file from the server. To let later requests reuse the local copy through the negotiation cache, the response that first returns the resource includes a Last-Modified field whose value is the time the resource was last modified. When the page is refreshed, the browser cannot tell on its own whether its local copy is still valid, so it sends a GET request to the server to check. This request carries an If-Modified-Since field whose value is the Last-Modified value from the previous response. If the resource has not been modified since that time, the server answers with 304 and no body; otherwise it returns the new resource.
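
On the server side, the check boils down to one date comparison. The following Python sketch shows the idea; `conditional_status` is a hypothetical function, and a real server would also handle ETags, other methods, and so on.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds first.
        unchanged = last_modified.replace(microsecond=0) <= since
    except (TypeError, ValueError):
        return 200  # unusable date: ignore the condition and send the resource
    return 304 if unchanged else 200


modified = datetime(2021, 4, 13, 12, 35, 41, tzinfo=timezone.utc)
last_modified_header = format_datetime(modified, usegmt=True)  # value for Last-Modified
print(last_modified_header, conditional_status(modified, last_modified_header))
```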

Shortcomings of Last-Modified

Last-Modified has two defects:

  1. It judges only by the time of the last modification. If a file is edited and saved without its content actually changing, the modification time is still updated. Validation then fails and the resource is downloaded again unnecessarily.
  2. The modification time has a resolution of one second. If a file is modified several times within a second, for example a few hundred milliseconds apart, the later changes cannot be detected.

ETag and If-None-Match

To make up for the limits of time-based validation, HTTP 1.1 added the ETag (entity tag) header.

An ETag is an identifier for a specific version of a resource, similar to a file fingerprint. It was mentioned above; here is a bit more detail.

If the response headers contain both Last-Modified and ETag, ETag takes precedence. On the next request for the resource, the ETag value from the previous response is sent as the If-None-Match value so that the server can validate the cache. If the cached copy is still valid, the server returns a 304 status code and the browser uses its local cache.

Shortcomings of ETag

ETag is a supplement to Last-Modified rather than a replacement, and it has drawbacks of its own.

  • If resources are large, numerous, and frequently modified, generating ETags can hurt server performance.
  • ETags come in strong and weak forms.
    • A strong ETag is generated from the resource content, which guarantees that every byte is identical.
    • A weak ETag is generated from some attributes of the resource. It is fast to compute but cannot guarantee that every byte is identical.

If the browser uses the negotiated cache and the resource does not change, the server returns a 304 status response code telling the browser to obtain the locally cached resource.
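
Here is a rough Python sketch of both ETag flavours and the 304 decision. How a server derives its ETags is up to the server; hashing the content with SHA-256 and combining size with modification time are just assumptions I picked for the example, and all three function names are made up.

```python
import hashlib
from typing import Optional


def strong_etag(content: bytes) -> str:
    """Strong validator: derived from every byte of the content."""
    return '"' + hashlib.sha256(content).hexdigest()[:32] + '"'


def weak_etag(size: int, mtime_seconds: int) -> str:
    """Weak validator: cheap to compute, built from file attributes only."""
    return f'W/"{size:x}-{mtime_seconds:x}"'


def respond(content: bytes, if_none_match: Optional[str]) -> tuple:
    """Return (status, body): 304 with an empty body if the client's ETag matches."""
    current = strong_etag(content)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if current in candidates:
            return 304, b""
    return 200, content


page = b"<h1>hello</h1>"
first_status, _ = respond(page, None)
second_status, second_body = respond(page, strong_etag(page))
print(first_status, second_status)
```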

Disadvantages of HTTP

The main shortcomings of the HTTP protocol are as follows.

  • Communication is in plaintext, so the content can be eavesdropped on
  • The identity of the communicating party is not verified, so masquerading is possible
  • The integrity of a message cannot be proven, so it may have been tampered with

Communication is in plaintext, so the content can be eavesdropped on

HTTP has no encryption function, so it cannot encrypt communication requests and responses.

TCP/IP networks can be eavesdropped on

Because of how the TCP/IP protocol family works, communication content can be observed at any point along the path. Wherever in the world the server is, some of the devices and lines between it and the client will not belong to you, so malicious snooping at some hop cannot be ruled out. Even encrypted traffic can be captured; encryption only makes the captured content hard to read. Eavesdropping on the same network segment is easy: you only need to collect the packets as they flow across the network, which packet capture and sniffer tools can do.

Solution: Encryption prevents eavesdropping

The two most common approaches are communication encryption and content encryption.

Communication encryption

HTTP has no encryption mechanism of its own, but it can be combined with SSL (Secure Socket Layer) or TLS (Transport Layer Security) to encrypt HTTP traffic. Once SSL has established a secure communication line, HTTP communication runs over that line. HTTP combined with SSL is called HTTPS (HTTP Secure) or HTTP over SSL.

Content encryption

Since HTTP has no encryption mechanism, the transmitted content itself can be encrypted instead, that is, the content carried in the HTTP message. In this case the client encrypts the body of the HTTP message before sending the request. Content encryption requires that both the client and the server can encrypt and decrypt. It is mainly used in Web services. Unlike SSL and TLS, this approach does not encrypt the whole communication line, so the content is still at risk of being tampered with.

The identity of the communicating party is not verified, so masquerading is possible

HTTP requests and responses do not verify who the communicating party is.

Anyone can initiate a request

In HTTP communication there is no step that identifies the communicating party, so anyone can send a request. When the server receives a request, it returns a response no matter who sent it (provided the sender's IP address and port number are not restricted by the Web server's settings). In other words, it accepts all comers.

  • The server may be a disguised one.
  • The client may be a disguised one.
  • There is no way to tell whether the other party has access rights. Some Web servers hold important information and want to allow communication only with specific users.
  • There is no way to tell where a request came from or who sent it.
  • Even meaningless requests are accepted, so Denial of Service (DoS) attacks that flood the server with requests cannot be prevented.

Solution: check the other party's certificate

HTTP cannot identify the communicating party, but SSL can. Besides encryption, SSL uses certificates to identify the other party. Certificates are issued by trusted third parties and prove that the server or client really exists.

A certificate proves that the other party is the server the client intended to reach, which lowers the risk of personal information leaking. A client can also hold a certificate to confirm its own identity, and this can be used for authentication on Web sites.

Message integrity cannot be proven, so messages may have been tampered with

The received content may have been altered

There is no way to confirm that the request or response that was sent is identical to the one that was received. It may have been altered in transit, and even if it was, the recipient cannot tell.

Solution: MD5 and SHA-1

Integrity can be checked with hash values such as MD5 and SHA-1, and with digital signatures (for example PGP signatures) that confirm a file. These methods still give no firm guarantee, because if the MD5 value or the PGP signature itself has been replaced in transit, the user has no way of noticing.
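
A hash check of this kind takes only a few lines with Python's `hashlib`. In this sketch `digest_matches` is a made-up helper and the "published" digest is generated locally just so the example runs. I default to SHA-256 because MD5 and SHA-1 are no longer considered collision-resistant, but the same call accepts `"md5"` or `"sha1"`.

```python
import hashlib
import hmac


def digest_matches(content: bytes, published_hex: str, algorithm: str = "sha256") -> bool:
    """Check downloaded content against a digest published alongside it."""
    actual = hashlib.new(algorithm, content).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, published_hex.lower())


original = b"installer-1.0 contents"
published = hashlib.sha256(original).hexdigest()  # what the site would list

print(digest_matches(original, published))
print(digest_matches(b"installer-1.0 c0ntents", published))
```

Note that this only helps if the published digest itself reaches you untampered, which is exactly the weakness described above.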

HTTPS

HTTP plus encryption, authentication, and integrity protection is HTTPS

HTTPS is HTTP inside an SSL shell

HTTPS is not a new application-layer protocol. Only the HTTP communication interface is replaced by the SSL and TLS protocols. HTTP used to talk directly to TCP; with SSL, HTTP talks to SSL first, and SSL then talks to TCP.

With SSL, HTTP gains the encryption, certificate, and integrity protection features of HTTPS.

Encryption

SSL uses public-key encryption. In modern encryption methods the algorithm is public and the key is secret, which is what keeps the method secure.

Both encryption and decryption use keys. Without the key, the ciphertext cannot be decrypted; with the key, anyone can decrypt it. If an attacker obtains the key, encryption becomes meaningless.

Symmetric encryption

Encryption and decryption using the same key is called shared key encryption, also known as symmetric key encryption. That is, the client and server share a key to encrypt messages. When a client sends a request, it encrypts the message with a key. After receiving the message, the server decrypts it with the key.
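
To show the shape of shared-key encryption, here is a toy Python sketch. Python's standard library has no production cipher such as AES, so I use a one-time XOR pad purely for illustration; `xor_bytes` is a made-up helper and this must not be used to protect real data.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (same call encrypts and decrypts)."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"GET /account HTTP/1.1"
shared_key = secrets.token_bytes(len(message))  # both sides must hold this same key

ciphertext = xor_bytes(message, shared_key)    # client encrypts
recovered = xor_bytes(ciphertext, shared_key)  # server decrypts with the same key
print(recovered)
```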

Disadvantages

Symmetric encryption keeps messages confidential, but the client and server must share the same key. If a middleman or attacker intercepts the key while it is being transmitted, encrypting the messages becomes pointless.

Asymmetric encryption

Asymmetric encryption solves the disadvantage of symmetric encryption. Asymmetric encryption uses an asymmetric pair of keys. One is called a private key and the other is called a public key. A private key can only be owned by oneself, while a public key can be accessed by anyone.

Before sending a message, the client encrypts it with the public key, and after receiving the message, the server decrypts it with the private key.
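
The public/private split can be illustrated with textbook RSA on tiny numbers. This Python sketch is a classroom toy under heavy assumptions: the primes 61 and 53 are far too small, there is no padding, and the variable names are my own. Real systems rely on vetted cryptographic libraries. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
# Key generation (done once by the server).
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent: (n, e) is the public key
d = pow(e, -1, phi)            # private exponent: (n, d) is the private key


def encrypt(message: int) -> int:
    """Anyone holding the public key can encrypt."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Only the holder of the private key can decrypt."""
    return pow(ciphertext, d, n)


secret = 65
print(decrypt(encrypt(secret)) == secret)
```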

Disadvantages

Asymmetric encryption requires the sender to encrypt messages with the receiver's public key. But the public key is available to anyone, including a middleman. The middleman does not know the receiver's private key, but can intercept the public key while it is being delivered and replace or tamper with it, so the sender ends up encrypting with a key the middleman controls. Asymmetric encryption is also more computationally complex than symmetric encryption, so it is slower.

Hybrid encryption mechanism

HTTPS uses a mixture of symmetric and asymmetric encryption. Symmetric encryption is fast, while asymmetric encryption protects the message in transit: even if the data is intercepted, it cannot be decrypted without the corresponding private key. HTTPS therefore uses asymmetric encryption to exchange the shared key and symmetric encryption for the messages that follow.

Digest algorithm

A digital digest is produced by running the plaintext through a hash function, which turns it into a fixed-length string (128 bits for MD5, for example). This string is also called a digital fingerprint. Its length is fixed, and in practice different plaintexts produce different digests. The digital digest is the basis on which HTTPS ensures data integrity and detects tampering.
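
The fixed-length property is easy to see with Python's `hashlib`. The helper name `digest_bits` is mine; the digest sizes are properties of the algorithms themselves.

```python
import hashlib


def digest_bits(algorithm: str, data: bytes) -> int:
    """Length of the digest in bits for the given algorithm."""
    return len(hashlib.new(algorithm, data).digest()) * 8


# The output length depends on the algorithm, never on the input size.
sizes = {name: digest_bits(name, b"hello") for name in ("md5", "sha1", "sha256")}

short_input = hashlib.sha256(b"a").hexdigest()
long_input = hashlib.sha256(b"a" * 10_000).hexdigest()
print(sizes, len(short_input), len(long_input))
```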

Digital signature

A digital signature combines asymmetric encryption with a digital digest. The sender encrypts the digest with its private key and sends it to the receiver together with the original text. The receiver decrypts the encrypted digest with the sender's public key, computes a digest of the received text with the same hash function, and compares the two. If they match, the message is intact; otherwise it has been modified. A digital signature can therefore verify the integrity of the information.

A signature is an encrypted checksum

A digital signature is a special encrypted checksum attached to a message. Using one has two benefits.

  • The signature proves that the message was signed and sent by the sender, because nobody else can forge the sender's signature.
  • The signature proves the integrity of the message: the data has not been tampered with.

The signing process is: plaintext > hash > digest > private key encryption > digital signature
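
The sign-then-verify flow can be sketched with toy RSA numbers in Python. Everything here is an illustration under stated assumptions: the key is built from two well-known Mersenne primes and is nowhere near secure, there is no padding scheme, and `sign` and `verify` are made-up names. Python 3.8 or later is needed for `pow(E, -1, ...)`.

```python
import hashlib

# Toy key pair: (N, E) is public, D is private.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))


def digest_as_int(message: bytes) -> int:
    """Plaintext -> hash -> digest, reduced so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: encrypt the digest with the private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: decrypt with the public key and compare with a fresh digest."""
    return pow(signature, E, N) == digest_as_int(message)


signature = sign(b"transfer 10")
print(verify(b"transfer 10", signature), verify(b"transfer 1000", signature))
```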

Digital certificate

A digital certificate is like an ID card: it carries unique identifying information. It is issued by a trusted third-party organization, a certificate authority (CA). The certificate contains the following information.

  • Certificate issuer (CA)
  • Validity period of the certificate
  • The public key
  • Certificate owner
  • The signature

The digital certificate also includes the public key of the object, a description of the object, and the signature algorithm used. Anyone can create a digital certificate, but not everyone can get a trusted signing authority to vouch for the certificate information and sign the certificate with its private key.

HTTPS workflow

  1. The client sends an HTTPS request to the server.
  2. The server returns its public key certificate to the client.
  3. The client verifies the digital signature on the certificate with the public key of the CA that issued it, confirming that the server's public key is authentic.
  4. The client uses a random number generator to create a temporary session key, encrypts it with the server's public key, and sends it to the server.
  5. The server decrypts the session key with its own private key.
  6. The client and server then communicate over HTTPS, encrypting messages with the session key.
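
In practice you rarely code these steps by hand, since a TLS library performs them. As a small illustration with Python's standard `ssl` module, the default client context already insists on the certificate check from step 3. The `negotiated_protocol` helper is my own and needs network access, so it is only defined here, not called.

```python
import socket
import ssl
from typing import Optional

# Client-side context: loads the system's trusted CA certificates.
context = ssl.create_default_context()

verifies_certificate = context.verify_mode == ssl.CERT_REQUIRED
checks_hostname = context.check_hostname
print(verifies_certificate, checks_hostname)


def negotiated_protocol(host: str, port: int = 443) -> Optional[str]:
    """Open a TLS connection and report the protocol version that was agreed on."""
    with socket.create_connection((host, port), timeout=5) as raw:
        # The handshake (certificate check and key exchange) happens in wrap_socket.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```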

SSL and TLS

HTTPS uses the Secure Socket Layer (SSL) and Transport Layer Security (TLS) protocols. SSL was originally developed by Netscape; after Netscape declined, stewardship passed to the IETF. Using SSL 3.0 as the baseline, the IETF defined TLS 1.0, TLS 1.1, and TLS 1.2. TLS is a protocol developed from SSL, and it is sometimes still referred to as SSL.

Why not always use HTTPS

Every coin has two sides. HTTPS is secure, but it has problems of its own. Processing is slower when SSL is used, for two reasons: communication is slower, and encryption consumes a lot of CPU and memory, which slows processing down.

  • Besides the TCP connection and the HTTP requests and responses, the SSL handshake adds extra communication.
  • SSL requires encryption and decryption on both the server and the client.
  • Communicating over HTTPS requires a certificate, which usually has to be purchased.

The performance problem can be eased with SSL accelerator hardware (a dedicated server), which speeds up SSL computation and shares the load. However, an SSL accelerator only helps with SSL processing. One way to save resources is to use HTTP for non-sensitive information and HTTPS only for sensitive information.

Difference between HTTP and HTTPS

  1. HTTP is a plaintext transmission protocol, while HTTPS is a secure transmission protocol encrypted with SSL.
  2. HTTP and HTTPS connect in different ways and use different default ports: 80 for the former, 443 for the latter.
  3. HTTPS requires a certificate issued by a CA. Free certificates are relatively few, so a fee is usually involved.
  4. HTTPS is friendlier for SEO: search engines index HTTPS pages preferentially.
  5. An HTTP connection is simple and stateless. HTTPS combines SSL with HTTP to provide encrypted transmission and identity authentication, so it is more secure than HTTP.
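
The default ports from point 2 are exposed as constants in Python's standard library. This small sketch resolves the port a URL would use; `effective_port` is a hypothetical helper.

```python
import http.client
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": http.client.HTTP_PORT, "https": http.client.HTTPS_PORT}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the scheme's default."""
    parts = urlsplit(url)
    return parts.port or DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.tutu.com/"), effective_port("https://www.tutu.com/"))
```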

SPDY to solve HTTP 1.x bottlenecks

Disadvantages of HTTP 1.x

HTTP 1.x has the following major disadvantages:

  1. HTTP 1.0 allows only one request per TCP connection, while HTTP 1.1 keeps connections alive by default so that one TCP connection can carry multiple requests. But within a single TCP connection all traffic is processed in order: the server normally finishes one response before moving on to the next. This causes head-of-line blocking.
  2. Requests can only be initiated by the client, and the client cannot receive anything other than responses.
  3. Request and response headers are sent uncompressed. The more headers there are, the greater the delay.
  4. Lengthy headers are sent, and the same headers are sent again on every request, which wastes resources.
  5. Data compression is optional, so data may be sent uncompressed.

SPDY

SPDY is an application-layer protocol built on TCP and developed by Google. Its goal is to optimize HTTP performance, shortening page load times and improving security through compression, multiplexing, and prioritization. The core idea of SPDY is to minimize the number of TCP connections. SPDY is an enhancement of HTTP, not a replacement for it.

Instead of rewriting HTTP, SPDY adds a new session layer between the application layer and the transport layer of the TCP/IP model. For security reasons, SPDY requires SSL for communication.

SPDY is added as a session layer to control the flow of data, but still uses HTTP to establish communication. Therefore, HTTP request methods, cookies, HTTP packets, and so on can be used as usual.

HTTP 2.0

HTTP 2.0 is an evolution of SPDY (its design is based on SPDY), but the two differ in two main ways:

  1. HTTP 2.0 supports plaintext transport, whereas SPDY enforces HTTPS.
  2. HTTP 2.0 compresses headers with HPACK, while SPDY uses DEFLATE.
  2. The HTTP 2.0 header compression algorithm uses HPACK, while SPDY uses DEFLATE.

Here is a brief overview of the new features in HTTP 2.0. There is so much in HTTP 2.0 that I will cover it in a separate article later.

  • Binary framing layer: The core of HTTP 2.0's performance gains is the new binary framing layer. HTTP 1.x uses newlines as delimiters in plain text, whereas HTTP 2.0 splits everything it transmits into smaller messages and frames and encodes them in binary.

  • Multiplexing of requests and responses: The binary framing layer breaks HTTP messages into independent frames and sends them interleaved. The other end reassembles them using the stream identifier in each frame header. This fixes the head-of-line blocking of HTTP 1.x.

  • Request priority: Once HTTP messages are split into independent frames, performance can be tuned further by optimizing how the frames are interleaved and the order in which they are sent.

  • Server push: The server can send multiple responses to a single client request, pushing resources to the client without the client explicitly asking for them.

  • Header compression: HTTP 2.0 encodes headers with the HPACK compression format (the HTTP/2 header compression algorithm), which reduces header size. Both ends maintain an index table of the header fields they have seen, so a header that has already been sent can be transmitted as a short index, and the receiver looks the full field up in its table.
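
To give a feel for what "binary framing" means, here is a Python sketch that packs and unpacks the 9-byte header that starts every HTTP/2 frame: a 24-bit payload length, an 8-bit type, 8 bits of flags, and one reserved bit followed by a 31-bit stream identifier. The function names are my own, and this covers only the header, not frame payloads.

```python
import struct

FRAME_TYPE_HEADERS = 0x1
FLAG_END_HEADERS = 0x4


def pack_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Encode the fixed 9-byte frame header."""
    if not 0 <= length < 2**24:
        raise ValueError("length must fit in 24 bits")
    if not 0 <= stream_id < 2**31:
        raise ValueError("stream id must fit in 31 bits")
    # Pack the length as 4 bytes and drop the top one to get 24 bits.
    return struct.pack(">I", length)[1:] + struct.pack(">BBI", frame_type, flags, stream_id)


def unpack_frame_header(header: bytes) -> tuple:
    """Decode (length, type, flags, stream_id) from the first 9 bytes of a frame."""
    length = int.from_bytes(header[:3], "big")
    frame_type, flags, stream = struct.unpack(">BBI", header[3:9])
    return length, frame_type, flags, stream & 0x7FFFFFFF  # mask the reserved bit


header = pack_frame_header(16, FRAME_TYPE_HEADERS, FLAG_END_HEADERS, stream_id=1)
print(header.hex(), unpack_frame_header(header))
```

The stream identifier in each header is what lets the receiver sort interleaved frames back into their own request or response.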

If you want to learn more about HTTP2.0, check out the Definitive Guide to Web Performance, which is quite detailed.

Supplement

The OSI model

Besides the TCP/IP model there is the OSI model. The OSI model has seven layers, three more than the TCP/IP model.

With SSL and SPDY added, both sit in the application layer.

The link layer of the TCP/IP model is divided into two layers:

  • Data link layer: Provides reliable data transmission services over unreliable physical links. Including framing, physical addressing, flow control, error control, access control and so on.
  • Physical layer: The main function is to connect network devices.

Cookie

As mentioned earlier, HTTP is a stateless protocol: it does not keep track of previously sent requests and responses. Suppose a client sends a request and the server wants to know which client sent it. Something has to manage that state, and cookies are the solution to this kind of problem.

The response returned by the server carries a Set-Cookie header field that tells the client to save the cookie. The next time the client sends a request to the server, it automatically adds the cookie value to the request headers.

When the server receives the cookie from the client, it works out which client sent the request, compares it with the records on the server, and recovers the previous state.

Set-Cookie

Set-Cookie is a response header field that can contain the following attributes.

  • NAME=VALUE: The name and value of the cookie.
  • expires=DATE: The validity period of the cookie.
  • path=PATH: The directory on the server that the cookie applies to. If not set, it defaults to the directory of the document.
  • domain=DOMAIN: The domain name that the cookie applies to. If not set, it defaults to the domain name of the server that created the cookie.
  • Secure: The cookie is sent only over HTTPS.
  • HttpOnly: JavaScript cannot access the cookie. This mainly guards against cookie theft through cross-site scripting attacks.
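
The attributes above can be assembled with Python's standard `http.cookies` module. The cookie name `session_id`, its value, and the domain are placeholders I chose for the example.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "abc123"                      # NAME=VALUE
cookie["session_id"]["expires"] = "Tue, 13 Apr 2021 12:35:41 GMT"
cookie["session_id"]["path"] = "/"
cookie["session_id"]["domain"] = "example.com"
cookie["session_id"]["secure"] = True                # only sent over HTTPS
cookie["session_id"]["httponly"] = True              # hidden from JavaScript

# output() renders the full response header line.
header_line = cookie.output()
print(header_line)
```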

Cookie field in the request header

Cookie is a request header field that carries the values the server set through the Set-Cookie header and the client stored. If the client has received multiple cookies, it can send them back together.
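
On the server side, the incoming Cookie header can be read back with the same standard module. The header value below is a made-up example with two cookies.

```python
from http.cookies import SimpleCookie

# Several cookies travel in one Cookie header, separated by "; ".
request_cookie_header = "session_id=abc123; theme=dark"

jar = SimpleCookie()
jar.load(request_cookie_header)
values = {name: morsel.value for name, morsel in jar.items()}
print(values)
```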

Finally

These are the notes I took over a month of studying books on HTTP. If anything here is wrong, I'd be grateful if you pointed it out! And if the article helped you, you're welcome to follow me!

References

  • Illustrated HTTP
  • The Definitive Guide to Web Performance