HTTP entity data
1. Data type and encoding
Encoding type: three formats are commonly used:
- gzip: the GNU zip compression format, the most widely used compression format on the Internet;
- deflate: the zlib (deflate) compression format, second in popularity to gzip;
- br: a newer compression algorithm (Brotli) designed specifically for HTTP.
2. Header fields used by the data type
The HTTP protocol defines two Accept request header fields and two Content entity header fields for “Content negotiation” between the client and server.
The Accept field lists the MIME types that the client understands, giving the server more options.
The server uses the Content-Type header field in the response message to tell the client the actual type of the entity data.
The Accept-Encoding field lists the compression formats the client supports. The server can select one of them to compress the data, and the format actually used is placed in the Content-Encoding response header field.
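The negotiation described above can be sketched in a few lines of Python. This is only an illustration, not how any particular server does it: the function name `compress_body` is hypothetical, q weights are ignored, and Brotli is skipped because it is not in the Python standard library.

```python
import gzip
import zlib
from typing import Optional, Tuple

def compress_body(body: bytes, accept_encoding: str) -> Tuple[bytes, Optional[str]]:
    """Pick the first format the client lists that we support and compress with it.

    Returns the payload and the value to put in Content-Encoding
    (None means the body is sent uncompressed).
    """
    offered = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    for name in offered:
        if name == "gzip":
            return gzip.compress(body), "gzip"
        if name == "deflate":
            # HTTP "deflate" is the zlib container format
            return zlib.compress(body), "deflate"
    return body, None

body = b"<html>" + b"hello " * 100 + b"</html>"
payload, encoding = compress_body(body, "br, gzip, deflate")
print(encoding, len(body), len(payload))
```

Here the client lists br first, but the sketch falls through to gzip because that is the first format it knows how to produce.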
3. Header fields used by the language type
The Accept-Language field lists the natural languages that the client understands; several values can be listed, separated by commas.
Accordingly, the server should use the Content-Language header field in the response message to tell the client the actual language of the entity data.
For character sets, the request header uses the Accept-Charset field, but there is no matching Content-Charset response header. Instead, the character set is given in the Content-Type field as “charset=xxx”.
4. Quality value of content negotiation
When header fields such as Accept, Accept-Encoding and Accept-Language are used for content negotiation, a special “q” parameter can set a weight that expresses priority, where “q” stands for “quality factor”. Weights range from 0 to 1, and an option with no q value defaults to 1.
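As a rough sketch of how a server might read these weights, the hypothetical helper `parse_weighted` below splits a header value on commas, reads the optional q parameter, and sorts by weight. It assumes well-formed input, and the sample header value is made up for illustration.

```python
def parse_weighted(header: str) -> list:
    """Return (value, q) pairs sorted from most to least preferred."""
    items = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        value = fields[0]
        if not value:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                q = float(raw)
        items.append((value, q, position))
    # higher q first; ties keep the order in which the client listed them
    items.sort(key=lambda item: (-item[1], item[2]))
    return [(value, q) for value, q, _ in items]

print(parse_weighted("text/html, application/xml;q=0.9, */*;q=0.8"))
```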
5. Results of content negotiation
The server may add a Vary field to the response header. It records the request header fields that the server consulted during content negotiation, which tells the client what the result depended on.
summary
- The data type indicates what the content of the entity data is, expressed as a MIME type, with the associated header fields Accept and Content-Type;
- The data encoding indicates how the entity data is compressed. The related header fields are Accept-Encoding and Content-Encoding;
- The language type indicates the natural language of the entity data, with the associated header fields Accept-Language and Content-Language;
- The character set indicates the character encoding of the entity data. The related header fields are Accept-Charset and Content-Type;
- The client uses Accept and similar fields in the request header to conduct “content negotiation” with the server, asking the server to return the most appropriate data;
- Header fields such as Accept can list several options separated by “,”, and can use the “;q=” parameter to specify exact weights.
Transferring large files over HTTP
1. Data compression
If the compression ratio reaches 50%, meaning that 100 KB of data compresses to 50 KB, the effect is the same as doubling the network speed at constant bandwidth, which is a very noticeable speedup.
However, this solution has a drawback. gzip and similar algorithms usually compress only text files well. Multimedia data such as images, audio and video is already highly compressed, so running it through gzip will not make it smaller (it may even grow), and compression does not help.
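The difference between text and already-compressed data is easy to check. The sketch below uses repetitive HTML as the text sample and random bytes as a stand-in for compressed media; both samples are made up for illustration, and real files will give different ratios.

```python
import gzip
import os

# Repetitive markup, typical of text that gzip handles well
text = ("<p>HTTP entity data can be compressed.</p>\n" * 500).encode("utf-8")
# Random bytes behave like already-compressed media: there is no redundancy left
noise = os.urandom(len(text))

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)

print(f"text:  {ratio(text):.1%} of original size")
print(f"noise: {ratio(noise):.1%} of original size")
```

The random sample comes out slightly larger than the input because gzip still has to add its own header and trailer.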
2. Chunked transfer
The response message uses the header field “Transfer-Encoding: chunked” to indicate that the body is sent as a series of chunks rather than all at once.
“Transfer-Encoding: chunked” and “Content-Length” are mutually exclusive, which means they cannot appear together in a response message. Keep in mind that the body length of a response is either known (Content-Length) or unknown (chunked).
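The chunk format itself is simple: each chunk is its length in hexadecimal, CRLF, the data, CRLF, and a zero-length chunk ends the body. A minimal encoder and decoder might look like this; the function names are hypothetical, and trailers and error handling are left out.

```python
def encode_chunked(pieces) -> bytes:
    """Encode an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for piece in pieces:
        if piece:  # a zero-length chunk would end the body early
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # terminating chunk, no trailers
    return bytes(out)

def decode_chunked(raw: bytes) -> bytes:
    """Reassemble the original body from a chunked message body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # ignore any chunk extensions after ';'
        size = int(raw[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF

wire = encode_chunked([b"hello", b", world"])
print(wire)
print(decode_chunked(wire))
```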
3. Range requests
A range request lets the client use a special field in the request header to say that it wants only part of a file, which amounts to the client “splitting the whole into parts”.
The server uses the “Accept-Ranges: bytes” field in the response header to tell the client explicitly: “I support range requests.”
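To make the mechanics concrete, here is a sketch of how a server could answer a single-range request. `serve_range` is a hypothetical helper, it handles only one `bytes=` range, and it returns a plain tuple rather than a real HTTP response.

```python
import re

def serve_range(data: bytes, range_header: str):
    """Return (status, headers, body) for a request with a single byte range."""
    size = len(data)
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", range_header.strip())
    if not match or match.group(1) == match.group(2) == "":
        # Unparseable range: fall back to sending the whole resource
        return 200, {"Accept-Ranges": "bytes"}, data
    first, last = match.groups()
    if first == "":
        # Suffix form "bytes=-N": the last N bytes
        start = max(size - int(last), 0)
        end = size - 1
    else:
        start = int(first)
        end = min(int(last), size - 1) if last else size - 1
    if start >= size or start > end:
        return 416, {"Content-Range": f"bytes */{size}"}, b""
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{size}",
    }
    return 206, headers, data[start:end + 1]

resource = b"0123456789abcdef"
print(serve_range(resource, "bytes=0-3"))
print(serve_range(resource, "bytes=-4"))
```

Both positions in a range are inclusive, so “bytes=0-3” returns four bytes.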
4. Multi-segment data
The type “multipart/byteranges” indicates that the message body consists of several byte ranges, and its “boundary=xxx” parameter gives the separator between the parts.
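The layout of such a body can be sketched as follows. `build_byteranges` is a hypothetical helper, the boundary string and sample data are arbitrary, and the ranges are assumed to be valid (start, end) pairs with inclusive ends.

```python
def build_byteranges(data: bytes, ranges, content_type: str, boundary: str):
    """Return (Content-Type header value, body) for a multi-range response."""
    size = len(data)
    body = bytearray()
    for start, end in ranges:
        # Each part starts with "--boundary" and carries its own headers
        body += f"--{boundary}\r\n".encode("ascii")
        body += f"Content-Type: {content_type}\r\n".encode("ascii")
        body += f"Content-Range: bytes {start}-{end}/{size}\r\n\r\n".encode("ascii")
        body += data[start:end + 1] + b"\r\n"
    # The closing delimiter has an extra "--" at the end
    body += f"--{boundary}--\r\n".encode("ascii")
    return f"multipart/byteranges; boundary={boundary}", bytes(body)

header, body = build_byteranges(b"0123456789", [(0, 1), (5, 7)], "text/plain", "xyz")
print(header)
print(body.decode("ascii"))
```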
summary
- Compressing HTML and other text files is the most basic way to transfer large files.
- Chunked transfer sends and receives data as a stream, saving memory and bandwidth. It is signalled by the response header field “Transfer-Encoding: chunked”. Each chunk is a hexadecimal length line followed by the data.
- A range request fetches only part of the data, a “partial request”, which makes video seeking and resumable downloads possible. It uses the request header field “Range” and the response header field “Content-Range”, and the response status code must be 206.
- Several ranges can also be requested at once. In that case the data type of the response message is “multipart/byteranges”, and the parts in the body are separated by the boundary string.
HTTP connection management
1. Short connections
Because each connection between the client and the server lasts only briefly and is not kept open, it is called a “short connection”.
The drawback of short connections is serious, because establishing and closing a TCP connection are both “expensive” operations. Establishing a connection takes a “three-way handshake”: three packets and one RTT. Closing it takes a “four-way wave”: four packets and 2 RTT.
2. Long connection
To address the shortcomings of short connections, HTTP introduced “long connections”, also known as “persistent connections”, “keep-alive” or “connection reuse”.
3. Connection related header fields
The request header can explicitly ask for the long connection mechanism. The field used is Connection and the value is “keep-alive”.
If the server supports long connections, it always puts a “Connection: keep-alive” field in the response message.
Because the TCP connection is not closed for a long time, the server must keep its state in memory, which consumes the server’s resources. If there are a large number of idle long connections, the resources of the server will soon be exhausted and the server will not be able to provide services to the users who really need them. Therefore, long connections also need to be closed at the appropriate time and cannot remain connected to the server forever, which can be done on either the client or the server.
On the client side, you can add a “Connection: close” field to the request header to tell the server, “Close the connection after this exchange.” When the server sees this field, it knows that the client wants to close the connection, so it adds the same field to the response message and calls the socket API to close the TCP connection after sending the response.
The server usually does not close the connection on its own initiative, but it can apply some policies. Nginx, for example, offers two:
- Use the keepalive_timeout directive to set a timeout for long connections. If no data is sent or received on a connection within that period, Nginx closes it so that idle connections do not occupy system resources.
- Use the keepalive_requests directive to set the maximum number of requests that can be sent over one long connection. For example, if it is set to 1000, Nginx closes the connection after 1000 requests have been processed on it.
4. Head-of-line blocking
Head-of-line blocking has nothing to do with short or long connections. It is caused by HTTP's basic request-response model.
Because HTTP requires messages to be handled first in, first out, requests form a “serial” queue. The requests in the queue have no priority, only the order in which they were queued, and the first request is processed first. If the request at the head of the queue is slow to process, all the requests behind it have to wait along with it, so they pay a time cost they did not cause.
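A toy model makes the cost visible. The sketch below assumes a single connection that handles requests strictly in order, with made-up processing times in milliseconds; `completion_times` is a hypothetical helper, not part of any HTTP library.

```python
def completion_times(service_times_ms):
    """Finish time of each request when one connection serves them in FIFO order."""
    finished = []
    clock = 0
    for cost in service_times_ms:
        clock += cost  # each request starts only after the previous one ends
        finished.append(clock)
    return finished

# One slow request at the head of the queue, followed by two fast ones
print(completion_times([5000, 100, 100]))
# The same requests with the slow one last
print(completion_times([100, 100, 5000]))
```

With the slow request first, the two fast requests finish at 5100 ms and 5200 ms even though each needs only 100 ms of work.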
5. Performance optimization
Because the “request-response” model cannot be changed, head-of-line blocking cannot be solved in HTTP/1.1, only mitigated. What can be done?
- Concurrent connections: open several long connections to the same domain name at once, using quantity to make up for quality.
- Domain sharding: spread resources over several domain names that point to the same server, again using quantity to make up for quality.
summary
- The early HTTP protocol used short connections and immediately closed the connection after receiving the response, which was very inefficient.
- HTTP/1.1 enables long connections by default. Multiple requests and responses are sent and received on one connection, improving transmission efficiency.
- The server sends the “Connection: keep-alive” field to indicate that a long connection is in use.
- If “Connection: close” appears in the message header, the long connection is about to be closed.
- Too many long connections consume server resources, so the server uses some policies to selectively close long connections.
- Head-of-line blocking, which degrades performance, can be mitigated with “concurrent connections” and “domain sharding”.
HTTP redirection
1. Redirection concept
Jumps that are initiated by the server and that the browser user cannot control are called “passive jumps”, formally known as redirects.
The Location field is a response header field and appears only in response messages. It is meaningful only together with redirect status codes such as 301/302, where it gives the URI that the server wants the browser to go to, for example “index.html”.
301, commonly known as “Moved Permanently”, means that the original URI has permanently ceased to exist and all future requests must use the new URI.
302, commonly known as “Moved Temporarily” (its official reason phrase is “Found”), means that the original URI is under “temporary maintenance” and the new URI is a temporary stand-in.
2. Application scenarios of redirection
For example, a domain name change, a server change, a site redesign or system maintenance can all make the resource behind the original URI unreachable. To avoid a 404, you redirect to the new URI and keep serving users.
Another reason is to “avoid duplication”: several URIs can jump to the same URI, adding entry points without adding extra work.
3. Redirection related issues
The first problem is “performance loss”. A redirect inevitably means two request-response round trips, one more than a normal access. The second problem is “redirect loops”. If the redirection policy is not set up properly, an infinite loop such as “A=>B=>C=>A” can occur, with the client going round and round, and the consequences are easy to imagine.
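A client can guard against both problems by counting hops and remembering the URIs it has already visited. The sketch below works on a plain dictionary that maps each URI to its Location target instead of real HTTP responses; `follow_redirects` and the hop limit of 10 are assumptions for illustration.

```python
def follow_redirects(start, redirects, max_hops=10):
    """Follow Location targets until a final URI is reached.

    Returns (final URI, number of extra round trips).
    Raises RuntimeError on a loop or when max_hops is exceeded.
    """
    seen = [start]
    current = start
    while current in redirects:
        current = redirects[current]
        if current in seen:
            raise RuntimeError("redirect loop: " + " => ".join(seen + [current]))
        seen.append(current)
        if len(seen) - 1 > max_hops:
            raise RuntimeError("too many redirects")
    return current, len(seen) - 1

print(follow_redirects("/old", {"/old": "/new", "/new": "/index.html"}))

try:
    follow_redirects("A", {"A": "B", "B": "C", "C": "A"})
except RuntimeError as error:
    print(error)
```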
summary
- A redirect is a jump initiated by the server that requires the client to re-send the request using a new URI. This usually happens automatically, and the user is unaware of it.
- 301 and 302 are the most commonly used redirect status codes, meaning permanent redirect and temporary redirect respectively.
- The response header field Location indicates the URI to jump to, either in absolute or relative form;
- Redirection can point one URI to another, or multiple URIs can point to the same URI.
- When using redirects, you need to be careful of performance losses and avoid circular jumps.
Cookie mechanism of HTTP
HTTP is “stateless,” which is both a strength and a weakness. The advantage is that servers hold no per-client state and can easily be clustered, while the disadvantage is that operations that need to remember state cannot be supported.
1. The working process of Cookie
Cookies rely on two fields: the response header field Set-Cookie and the request header field Cookie.
- When a user accesses the server through the browser for the first time, the server creates a unique ID in “key=value” format and puts it in the Set-Cookie field, which is sent to the browser along with the response message.
- The browser receives the response message and sees Set-Cookie in it. It knows that this is the identity assigned by the server, so it saves the value and automatically sends it back in the Cookie field with its next request.
- Because the second request carries a Cookie field, the server knows that the user is not new and has been here before, so it can read the value in the Cookie and identify the user.
- The server sometimes adds several Set-Cookie fields to the response header, storing several “key=value” pairs. The browser does not need several Cookie fields when sending them back; it puts them on one line, separated by “;”.
Note: Cookies are stored by the browser, not the operating system. Therefore, it is “browser-bound” and only works within the browser.
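The round trip above can be sketched with the standard library's `http.cookies` module. The helper name `cookie_header_from` and the cookie names and values are made up for illustration, and a real browser also applies the attribute checks that this sketch skips.

```python
from http.cookies import SimpleCookie

def cookie_header_from(set_cookie_lines):
    """Build the Cookie request header value from several Set-Cookie values."""
    jar = SimpleCookie()
    for line in set_cookie_lines:
        jar.load(line)  # parses "key=value; Attribute=..." into the jar
    # Only key=value pairs go back to the server, joined by "; "
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

received = ["sid=abc123; Path=/", "lang=en; Max-Age=3600"]
print("Cookie:", cookie_header_from(received))
```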
2. The attribute of the Cookie
- Expires gives an absolute point in time, which can be understood as a deadline.
- Max-Age gives a relative time in seconds. The browser adds Max-Age to the time at which it received the message to get the absolute expiry time.
- Domain and Path specify the domain name and path that the cookie applies to. Before sending a cookie, the browser extracts the host and path from the URI and compares them with the cookie's attributes. If they do not match, the cookie is not sent in the request header.
- HttpOnly tells the browser that the cookie may only be transmitted over HTTP and blocks all other access. The browser's JavaScript engine disables document.cookie and related APIs for it, so script-based attacks cannot get hold of it.
- SameSite guards against cross-site request forgery (XSRF). “SameSite=Strict” forbids sending the cookie across sites when following links, while “SameSite=Lax” is a little looser: it allows safe methods such as GET/HEAD but forbids cross-site POST.
- Secure means that the cookie may only be sent over encrypted HTTPS and is never sent over plaintext HTTP. The cookie itself is not encrypted, though, and is still stored in clear text in the browser.
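For reference, a Set-Cookie value that carries several of these attributes can be produced with `http.cookies`. The cookie name, value and domain below are placeholders, and the `samesite` key needs Python 3.8 or later.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["sid"] = "abc123"                  # placeholder name and value
cookie["sid"]["domain"] = "example.com"   # placeholder domain
cookie["sid"]["path"] = "/"
cookie["sid"]["max-age"] = 3600           # relative lifetime in seconds
cookie["sid"]["httponly"] = True          # hide from document.cookie
cookie["sid"]["secure"] = True            # HTTPS only
cookie["sid"]["samesite"] = "Lax"

header_value = cookie["sid"].OutputString()
print("Set-Cookie:", header_value)
```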
3. The application of cookies
- One of the most basic uses of cookies is identity recognition, saving user login information and realizing session transactions.
- Another common use of cookies is ad tracking.
summary
- A cookie is data that the server entrusts the browser to store, giving the server a “memory”;
- The response message uses the Set-Cookie field to send a cookie value in the form key=value;
- The request message uses the Cookie field to send one or more cookie values;
- To protect cookies, set their validity period, scope and other attributes. Commonly used ones are Max-Age, Expires, Domain, HttpOnly and so on.
- The most basic use of cookies is identity recognition and the realization of stateful session transactions.
HTTP cache control
Following the “request-response” model, cache control can be roughly divided into client-side and server-side cache control.
1. Server cache control
1.1 Process
- The browser finds no data in the cache and sends a request to the server to obtain resources.
- The server responds to the request, returns the resource and marks the expiration date of the resource.
- The browser caches resources for reuse.
1.2 Other cache attributes
The server marks the resource's expiration with the header field “Cache-Control”. The value “max-age=30” is the resource's validity period, which tells the browser, “This page can only be cached for 30 seconds; after that it counts as expired.”
Other attributes:
- no-store: caching is not allowed. Used for data that changes very frequently, such as flash-sale pages;
- no-cache: easily confused with no-store. It does not forbid caching. The response may be cached, but before using it the cache must check with the server whether it has expired and whether a newer version exists;
- must-revalidate: similar to no-cache. The cached copy can be used while it has not expired, but once it expires it must be revalidated with the server before it is used again.
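The way a cache might act on these attributes can be sketched as follows. Both functions are hypothetical helpers that cover only max-age, no-store and no-cache; must-revalidate needs no special branch here because this sketch never serves an expired copy.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or True
    return directives

def cache_decision(value: str, age_seconds: int) -> str:
    """Decide what to do with a stored response of the given age."""
    directives = parse_cache_control(value)
    if "no-store" in directives:
        return "do not store"
    if "no-cache" in directives:
        return "revalidate"  # may be stored, but must be checked before use
    max_age = int(directives.get("max-age", 0))
    return "use cache" if age_seconds < max_age else "revalidate"

print(cache_decision("max-age=30", 10))
print(cache_decision("max-age=30", 45))
print(cache_decision("no-cache, max-age=30", 1))
```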
Github.com/amandakelak…
2. Client cache control
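When you hit the “refresh” button, the browser adds “Cache-Control: max-age=0” to the request header. Because max-age is a “time to live”, max-age=0 means any cached copy counts as expired, so the browser goes back to the server for fresh data instead of using its cache.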
3. Conditional requests
The most common ones are “If-Modified-Since” and “If-None-Match”. The first response must provide Last-Modified and ETag, and the second request can then carry the cached values to check whether the resource is still up to date. If the resource has not changed, the server responds with “304 Not Modified”, indicating that the cache is still valid, and the browser can update the expiration time and keep using the cache.
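A minimal sketch of the server side of an “If-None-Match” check is shown below. The ETag here is derived from a hash of the body, which is only one possible scheme, and `make_etag` and `respond` are hypothetical helpers that return a plain tuple rather than a real response.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content (an assumed scheme, not a standard one)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, request_headers: dict):
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""  # no body: the client reuses its cache
    return 200, {"ETag": etag}, body

first = respond(b"hello", {})
second = respond(b"hello", {"If-None-Match": first[1]["ETag"]})
third = respond(b"hello, again", {"If-None-Match": first[1]["ETag"]})
print(first[0], second[0], third[0])
```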
summary
- Caching is an important way to optimize system performance, and every link in an HTTP transfer can have a cache;
- The server uses Cache-Control to set the cache policy. max-age is the most common attribute and gives the validity period of the resource.
- The browser stores the data it receives in its cache. While the data has not expired it can be used directly. After it expires, the browser has to ask the server whether it is still usable.
- To check whether a resource is still valid, use a “conditional request” such as “If-Modified-Since” or “If-None-Match”. After receiving 304, the browser can reuse the resource in its cache.
- Two values are used to check whether a resource has been modified: Last-Modified and ETag. The server must set them in the response message in advance so that they can be used with conditional requests.
- The browser can also send the Cache-Control field, using max-age=0 or no-cache to refresh data.
HTTP proxy service
A “proxy service” is a service that does not produce content itself but sits in the middle, forwarding requests and responses between upstream and downstream. It has a dual identity: facing downstream users, it acts as a server and answers client requests on behalf of the origin server; facing the upstream origin server, it acts as a client and sends requests on the client's behalf.
- An HTTP proxy is an intermediate link in the communication path between client and server, providing a “proxy service” to both ends.
- Sitting in the middle layer, a proxy adds flexibility to HTTP processing and can provide load balancing, security protection, data filtering and other functions;
- Proxy servers identify themselves with the “Via” field, and several proxies form a list;
- If you want to know the real IP address of the client, you can use the fields X-Forwarded-For and X-Real-IP.
- A dedicated “proxy protocol” can pass on the real IP address of the client without changing the original message.
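How a proxy might maintain these fields as a request passes through can be sketched like this. `forward_headers` is a hypothetical helper, the IP addresses and proxy names are placeholders, and header names are treated as case-sensitive dictionary keys for simplicity.

```python
def forward_headers(headers: dict, client_ip: str, proxy_name: str) -> dict:
    """Return the headers a proxy would send upstream for one request."""
    out = dict(headers)
    # X-Forwarded-For grows into a list: original client first, then each hop
    prior = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    # X-Real-IP keeps only the original client, so later proxies leave it alone
    out.setdefault("X-Real-IP", client_ip)
    # Via records each proxy the message passed through
    hop = f"1.1 {proxy_name}"
    via = out.get("Via")
    out["Via"] = f"{via}, {hop}" if via else hop
    return out

at_proxy1 = forward_headers({"Host": "example.com"}, "203.0.113.7", "proxy1")
at_proxy2 = forward_headers(at_proxy1, "10.0.0.1", "proxy2")
print(at_proxy2)
```

The second call passes 10.0.0.1 as the client address because, from the second proxy's point of view, the connection comes from the first proxy.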
HTTP caching proxy
- One of the most common performance optimizations in computing is trading “time for space” or “space for time”. HTTP caching is the latter.
- A caching proxy is a proxy service with a caching function. It caches data from the origin server and distributes it to downstream clients.
- The Cache-Control field can also control caching proxies, with attributes such as “private”, “s-maxage” and “no-transform”. It must also be used together with fields such as “Last-Modified” and “ETag”.
- Caching proxies can sometimes have negative effects, caching bad data that needs to be refreshed or deleted in a timely manner.