How to transfer large files

  • The first thing that comes to mind is data compression. The client sends an Accept-Encoding request header listing the compression formats it supports, such as gzip and deflate, and the server picks one and names it in the Content-Encoding response header. Gzip typically achieves a compression ratio of around 60% on text files, but it does little for multimedia such as audio and video, which is usually compressed already.
  • Chunked transfer, indicated by the header field Transfer-Encoding: chunked, splits the body into chunks that are sent one after another. It also suits streaming data, where the body length is unknown in advance and the Content-Length field cannot be given.
  • Range requests let the client fetch only a fragment of a large file. They are an optional server feature, so the server must send Accept-Ranges: bytes in the response header to tell the client that it supports them. A server that does not support range requests can either send Accept-Ranges: none or simply omit the Accept-Ranges field, in which case the client can only download the entire file.
    • The request header Range is the dedicated field for range requests, in the format bytes=x-y, where x and y are the start and end offsets in bytes. On receiving a Range field, the server validates it: if the range is invalid it returns 416, and if the range is valid it returns 206 together with the response field Content-Range: bytes x-y/length.
    • Multiple fragments can be fetched at once by listing several x-y ranges in the Range header. The response then uses a special MIME type, multipart/byteranges, which indicates that the body is made up of multiple byte ranges, with a boundary=xxx parameter marking the separator between segments, for example Content-Type: multipart/byteranges; boundary=00000000001
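
The single-range case described above can be sketched as follows. This is a minimal illustration, not a complete server: the function names resolve_range and respond are hypothetical, only one range per header is handled, and every malformed or out-of-bounds range is treated as the 416 case. The open-ended (bytes=N-) and suffix (bytes=-N) forms are not mentioned in the text above but are part of the standard Range syntax.

```python
from typing import Dict, Optional, Tuple


def resolve_range(header: str, length: int) -> Optional[Tuple[int, int]]:
    """Resolve a header such as 'bytes=0-99' against a body of `length` bytes.

    Returns inclusive (start, end) offsets, or None when the range cannot be
    satisfied (the 416 case in this simplified sketch).
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return None  # assumption: this sketch supports a single byte range only
    first, sep, last = spec.strip().partition("-")
    if not sep:
        return None
    try:
        if first == "":  # suffix form: bytes=-N means "the last N bytes"
            n = int(last)
            if n <= 0 or length == 0:
                return None
            return max(length - n, 0), length - 1
        start = int(first)
        end = int(last) if last else length - 1  # open-ended form: bytes=N-
    except ValueError:
        return None
    if start >= length or end < start:
        return None
    return start, min(end, length - 1)


def respond(body: bytes, range_header: str) -> Tuple[int, Dict[str, str], bytes]:
    """Build (status, headers, payload) for a request carrying a Range header."""
    resolved = resolve_range(range_header, len(body))
    if resolved is None:
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    start, end = resolved
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(body)}",
        "Content-Length": str(end - start + 1),
    }
    return 206, headers, body[start:end + 1]
```

For example, respond(b"0123456789", "bytes=2-5") yields status 206 with the four bytes at offsets 2 through 5, while a range starting beyond the end of the body yields 416.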

Cookie

Cookies are data that the server entrusts the browser to store, giving the server a “memory”. They mainly solve the problem that HTTP is stateless. Cookies are transmitted with the response header field Set-Cookie and the request header field Cookie, in the format “key=value”.

Set the Cookie lifetime

Expires is an absolute point in time. Max-Age is a relative time in seconds: the browser calculates the absolute expiration time by adding Max-Age to the time the message was received. When both are present, Max-Age takes precedence.
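
The precedence rule can be illustrated with a small sketch. The function name cookie_expiry and its parameters are hypothetical, and returning None for a Cookie with neither attribute (a session Cookie that lasts until the browser closes) is an addition not covered in the text above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def cookie_expiry(
    received_at: datetime,
    expires: Optional[str] = None,
    max_age: Optional[int] = None,
) -> Optional[datetime]:
    """Return the absolute expiration time of a Cookie.

    Max-Age (relative, in seconds) wins over Expires (an absolute HTTP date).
    """
    if max_age is not None:
        return received_at + timedelta(seconds=max_age)
    if expires is not None:
        return parsedate_to_datetime(expires)
    return None  # neither attribute set: treated here as a session Cookie


received = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Both attributes present: the result is received + 60 s, not the Expires date.
print(cookie_expiry(received, expires="Wed, 21 Oct 2015 07:28:00 GMT", max_age=60))
```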

Set the Cookie scope

Domain and Path specify the domain name and path to which the Cookie belongs. Before sending a Cookie, the browser extracts the host and path from the URI and compares them with the Cookie's attributes. If they do not match, the Cookie is not sent in the request header.
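
A rough sketch of that comparison is shown below. The helper names are hypothetical and the matching is simplified: it assumes the Cookie has an explicit Domain attribute and ignores special cases such as IP-address hosts and public suffixes that real browsers handle.

```python
from urllib.parse import urlsplit


def domain_matches(host: str, cookie_domain: str) -> bool:
    """True if host equals the Cookie domain or is a subdomain of it."""
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


def path_matches(request_path: str, cookie_path: str) -> bool:
    """True if cookie_path is a prefix of request_path on a '/' boundary."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # "/account" must match "/account/orders" but not "/accounting".
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def should_send(url: str, cookie_domain: str, cookie_path: str) -> bool:
    """Decide whether a Cookie with the given Domain and Path goes with url."""
    parts = urlsplit(url)
    return domain_matches(parts.hostname or "", cookie_domain) and path_matches(
        parts.path or "/", cookie_path
    )
```

With Domain=example.com and Path=/account, a request to https://shop.example.com/account/orders carries the Cookie, while requests to https://example.org/account or https://example.com/accounting do not.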

Cookie security

In JS scripts, document.cookie can be used to read and write Cookie data. This is a security risk: a “cross-site scripting” (XSS) attack can use it to steal data.

  • HttpOnly: The Cookie is transmitted only over HTTP and is hidden from script APIs such as document.cookie. This is an effective defense against Cookie theft through XSS.
  • SameSite: Protects against cross-site request forgery (XSRF) attacks. There are three values:
    • Strict: Cookies are never sent on cross-site requests, including when following a link from another site.
    • Lax: More relaxed. Cookies are sent across sites only for top-level navigation with a safe method, such as following an a tag link or submitting a form with GET.
    • None: Cookies are sent automatically on all requests, including cross-site ones.
  • Secure: The Cookie is sent only over encrypted HTTPS connections.
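
As an illustration, the attributes above can be combined into one Set-Cookie value using Python's standard http.cookies module. The function name build_set_cookie and the chosen values (Path=/, SameSite=Lax) are assumptions for the example; the samesite attribute needs Python 3.8 or later.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name: str, value: str, max_age: int) -> str:
    """Return a Set-Cookie header value with the security attributes set."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["max-age"] = max_age   # lifetime in seconds
    morsel["path"] = "/"          # assumption: valid for the whole site
    morsel["httponly"] = True     # hide the Cookie from document.cookie
    morsel["secure"] = True       # send only over HTTPS
    morsel["samesite"] = "Lax"    # limit cross-site sending
    return morsel.OutputString()


print("Set-Cookie:", build_set_cookie("sid", "abc123", 3600))
```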

Cache

Fetching resources over HTTP is expensive. It is therefore very important to cache this “hard-won” data and reuse it as much as possible on later requests. Doing so avoids repeated request-response round trips, saves network bandwidth, and speeds up responses. The header field the server uses to declare a resource's validity period is Cache-Control, with the following directives:

  • max-age=30: the response stays fresh for at most 30 seconds
  • no-store: caching is not allowed, for example on flash-sale pages
  • no-cache: the resource may be cached, but the cache must check with the server whether it is still valid before each use
  • must-revalidate: the cached copy may be used while it is fresh. Once it expires, it must be revalidated with the server before use
  • private: the response may be stored only on the client. It is “private” to the user and must not be stored by shared proxies
  • public: the response is completely open. Any cache may store it and serve it.
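
To make the directive list concrete, here is a minimal sketch of parsing a Cache-Control value into a dictionary. The function name parse_cache_control is hypothetical, and the naive comma split is an assumption that breaks on quoted arguments containing commas.

```python
from typing import Dict, Union


def parse_cache_control(value: str) -> Dict[str, Union[bool, int, str]]:
    """Parse 'public, max-age=30' into {'public': True, 'max-age': 30}."""
    directives: Dict[str, Union[bool, int, str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        arg = arg.strip().strip('"')
        if not sep:
            directives[name] = True        # flag directive such as no-store
        elif arg.isdigit():
            directives[name] = int(arg)    # numeric directive such as max-age
        else:
            directives[name] = arg
    return directives


print(parse_cache_control("public, max-age=30"))
```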

Negotiation rules

Strong cache

Strong caching means that while the cached copy has not expired, the browser uses it directly without contacting the server; once it has expired, the browser asks the server again. How does the browser know whether the copy has expired? In other words, what are the rules for strong caching? They apply when Cache-Control carries max-age=xxx: the copy is fresh until xxx seconds have passed since it was received.
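
The freshness check can be sketched as a tiny in-memory cache. The class StrongCache and its methods are hypothetical names for illustration; a real browser cache also accounts for the Age header, the other Cache-Control directives, and storage limits.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class StrongCache:
    """Serve a stored body only while its age is below max-age."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float, int]] = {}

    def store(self, url: str, body: bytes, max_age: int) -> None:
        """Remember the body together with its arrival time and max-age."""
        self._entries[url] = (body, self._clock(), max_age)

    def lookup(self, url: str) -> Optional[bytes]:
        """Return the body if still fresh, else None (ask the server again)."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, stored_at, max_age = entry
        age = self._clock() - stored_at
        return body if age < max_age else None
```

Passing the clock in as a parameter is a design choice that makes the expiry behaviour easy to test without sleeping.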

Negotiated cache

The negotiated cache is triggered when the value of Cache-Control is no-cache, or when max-age has expired, as in the case of max-age=0. So what does the negotiation process look like? First, let's look at two fields, ETag and Last-Modified.

ETag: each resource has one. Its value changes whenever the file changes, so it uniquely identifies a version of the resource. A strong ETag requires resources to match exactly at the byte level, while a weak ETag, marked by a W/ prefix, only requires that the resource be semantically unchanged; its internal bytes may differ. Last-Modified is the time when the file was last modified.

The server provides ETag and Last-Modified in the first response, and the browser sends them back in the next request under different names: ETag becomes If-None-Match and Last-Modified becomes If-Modified-Since. The server uses them to check whether the cached copy is out of date. If the resource has not changed, the server returns a 304 status code, and the browser refreshes the cached copy's validity period and keeps using it. If the resource has changed, the server returns 200 with the latest resource.
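
The ETag half of this negotiation might look like the following on the server side. This is a sketch under stated assumptions: make_etag derives the tag from a SHA-256 hash of the body, which is just one possible scheme, and handle_get is a hypothetical handler that ignores Last-Modified entirely.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the body (assumed scheme: truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def strip_weak(tag: str) -> str:
    """Drop the W/ prefix so weak and strong tags compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def handle_get(
    body: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return 304 with no body if the client's ETag is current, else 200."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates or any(strip_weak(c) == etag for c in candidates):
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


status, headers, _ = handle_get(b"hello")                 # first request
revalidated = handle_get(b"hello", headers["ETag"])       # unchanged resource
print(status, revalidated[0])
```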

Proxy cache

What is a proxy

There are two communicating parties in the HTTP protocol, the “requester” (the browser) and the “responder” (the server). We now introduce a new role into this model, the proxy: an intermediate node, itself a server, inserted into the original communication link to provide proxy services. A proxy service does not produce content itself; it sits in the middle and forwards requests and responses between upstream and downstream, so it has a dual identity. Facing the client, it plays the role of the server. Facing the origin server, it plays the role of the client. One of the most basic functions of a proxy is load balancing.

A proxy identifies itself with the Via field: each time a message passes through a proxy node, the proxy appends its own information to the end of the field, so multiple proxies form a list. X-Forwarded-For means “forwarded for whom”: each proxy appends the IP address of the party it received the request from, so the first entry is the client's IP address. X-Real-IP records only the client IP address, without intermediate proxy information.
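
How a proxy appends to these two fields can be sketched as follows. The function forward_headers and the proxy names are hypothetical; real proxies also normalize header names and may rewrite or strip these fields for security reasons.

```python
from typing import Dict


def forward_headers(
    headers: Dict[str, str], peer_ip: str, proxy_name: str, protocol: str = "1.1"
) -> Dict[str, str]:
    """Return a copy of headers with this proxy's Via and X-Forwarded-For added.

    peer_ip is the address of the party this proxy received the request from.
    """
    out = dict(headers)
    hop = f"{protocol} {proxy_name}"
    out["Via"] = f'{out["Via"]}, {hop}' if out.get("Via") else hop
    previous = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{previous}, {peer_ip}" if previous else peer_ip
    return out


# A client at 203.0.113.7 reaches proxy-a, which forwards to proxy-b.
first_hop = forward_headers({}, "203.0.113.7", "proxy-a")
second_hop = forward_headers(first_hop, "10.0.0.1", "proxy-b")
print(second_hop)
```

After two hops the first entry of X-Forwarded-For is still the original client address, which is why servers read the list from the left when they trust the proxies in front of them.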

Meaning of proxy caching

The earlier discussion of caching dealt mostly with client-side (browser) caching, which reduces response time, saves bandwidth, and improves the user experience on the client. But on an HTTP link the client is not the only place with a cache: caches on intermediate servers can be just as valuable, letting requests obtain resources without going all the way to the origin server. Server-side HTTP caching is mainly implemented by proxy servers.

Cache control for the source server

  • proxy-revalidate: only proxy caches must revalidate with the origin server once the copy expires. The client cache does not have to.
  • s-maxage: specifies how long the response may be stored on a proxy, in place of max-age.
  • no-transform: proxies sometimes optimize cached data, for example converting images into PNG or WebP for later requests. no-transform forbids this and does not allow such “sneaky” changes.
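
The difference between a proxy and a browser cache when reading these directives can be sketched as below. The function freshness_lifetime is a hypothetical helper that takes already-parsed directives as a dictionary; treating a missing max-age as a lifetime of 0 is a simplifying assumption, since real caches may apply heuristics instead.

```python
from typing import Dict, Optional, Union

Directives = Dict[str, Union[bool, int, str]]


def freshness_lifetime(directives: Directives, shared: bool) -> Optional[int]:
    """Return how many seconds a response may be served from this cache.

    shared=True models a proxy cache, shared=False a browser cache.
    None means the response must not be stored at all.
    """
    if "no-store" in directives:
        return None
    if shared and "private" in directives:
        return None  # private responses must not be stored by proxies
    if shared and "s-maxage" in directives:
        return int(directives["s-maxage"])  # proxy-specific lifetime
    return int(directives.get("max-age", 0))


response = {"public": True, "max-age": 60, "s-maxage": 600}
print(freshness_lifetime(response, shared=False))  # browser uses max-age
print(freshness_lifetime(response, shared=True))   # proxy uses s-maxage
```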

Client cache control

  • max-stale and min-fresh adjust the acceptable cache lifetime. “max-stale” means the client accepts a copy that has expired on the proxy, but not by too much: no more than x seconds past expiry. “min-fresh” means the copy must be valid now and must still be valid x seconds from now.
  • only-if-cached: the client accepts only data cached by the proxy and does not accept a response fetched from the origin server.
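
The max-stale and min-fresh rules reduce to simple arithmetic on the cached copy's age and freshness lifetime. The function name acceptable and its parameters are hypothetical, all values are in seconds, and the sketch assumes at most one of the two directives is sent.

```python
from typing import Optional


def acceptable(
    age: int,
    lifetime: int,
    max_stale: Optional[int] = None,
    min_fresh: Optional[int] = None,
) -> bool:
    """Decide whether a proxy may answer the client from its cached copy.

    age: how long the copy has been cached.
    lifetime: how long the copy counts as fresh (for example from max-age).
    """
    if min_fresh is not None:
        # The copy must still be fresh min_fresh seconds from now.
        return lifetime - age >= min_fresh
    if age < lifetime:
        return True  # still fresh, no extra condition
    if max_stale is not None:
        # Expired, but tolerated if stale by no more than max_stale seconds.
        return age - lifetime <= max_stale
    return False
```

For a copy with a 30-second lifetime, acceptable(40, 30, max_stale=15) is true because the copy is only 10 seconds stale, while acceptable(25, 30, min_fresh=20) is false because only 5 seconds of freshness remain.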