Psychological guidance

You’ve read a lot of articles about caching and have a general idea. And then, in the interview, you still stammer a little, flustered by the questions, with no real confidence 😶. Let me share my notes from OneNote. For reliability, I’ll use professional language wherever possible, relying on HTTP: The Definitive Guide for the descriptions. As for the Easter eggs, take your time getting to them.

It’s not that no-cache, ETag, Expires, and If-Modified-Since are words born to bother us. HTTP didn’t arrive fully formed; starting from an imperfect original design, it was optimized bit by bit into what we have today, and along the way these constraints were agreed on. The code you write also goes from zero, to good, to optimized, and the same happens when your colleagues review your code. So don’t be afraid of the trouble; read on with a little patience (and if you can’t finish, bookmark it and come back next time, hee hee 😁).

In fact, web caching is nothing more than: database caches, server-side caches (proxy server caches, CDN caches), and the browser cache. Here, let’s start with the HTTP cache in the browser.

Why cache?

Every morning, I set an alarm to collect energy on time in Alipay’s Ant Forest. The iPhone 6S in my hand, which I’ve used for more than 4 years, is very slow, and it takes forever to load every time I enter a friend’s home page. Colleague A (same phone as me) said the network is very bad, but colleague B’s iPhone XS loads in seconds, with almost no visible delay. Actually the network is innocent here and shouldn’t take the blame; the phone’s performance just can’t keep up, like that 2004 desktop at home that runs like a rusty robot. And of course caching deserves some of the credit for B’s speed. This is the most typical case in daily life, and it directly affects how fast I can collect energy every day!

If you can, leave your Alipay account in the comments so I can harvest your energy!! 😍

So, what problems does caching solve?

  • Caching reduces redundant data transfers, saving you money in network charges.
  • Caching relieves network bottlenecks. Pages load faster without requiring more bandwidth.
  • Caching reduces demand on the original server. The server can respond more quickly and avoid overload.
  • Caching reduces distance-induced latency, since pages load more slowly the farther away they are.

Why drive to the train station when I can buy a train ticket on my phone? And with the time saved, how about a round of Honor of Kings?

Of course, a cache cannot hold a copy of every document in the world, but some requests that reach the cache can be served from an existing copy. This is called a cache hit. Other requests reaching the cache must be forwarded to the original server because no copy is available. This is called a cache miss. As follows:



Overview of cache processing steps

Step 1: Receive — the cache reads the incoming request message from the network

The cache detects activity on a network connection and reads the input data. High-performance caches read data from multiple input connections at the same time and start processing transactions before the entire message arrives.

Step 2: Parse — the cache parses the message and extracts the URL and headers

Next, the cache parses the request message into fragments, putting the pieces of the header into easy-to-manipulate data structures. This makes it easier for the caching software to process the header fields and modify them.

The parser is also responsible for normalizing the headers, treating minor differences, such as case or alternative date formats, as equivalent. Also, some request messages contain complete absolute URLs while others contain relative URLs plus Host headers, so parsers usually hide these details as well.

Step 3: Lookup — the cache checks whether a local copy is available; if not, it fetches one (and stores it locally)

In step 3, the cache takes the URL and looks for a local copy. The local copy may be stored in memory, on a local disk, or even on another nearby computer. Professional-grade caches use fast algorithms to determine whether an object is in the local cache. If the document is not available locally, the cache can fetch it from the original server or a parent proxy, or return an error message, depending on the situation and configuration.

The cached object contains both the server response body and the original server response header, so that the correct server header is returned in case of a cache hit. The cached object also contains metadata that records how long the object has been in the cache and how many times it has been used.
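As a small sketch of the description above (the class and field names here are my own invention, not from the book), a cached object storing the body, the original response headers, and usage metadata might look like this:

```python
import time
from dataclasses import dataclass, field

@dataclass
class CacheEntry:
    """Hypothetical shape of a cached object, per the description above."""
    url: str
    status: int
    headers: dict   # original server response headers
    body: bytes     # server response body
    stored_at: float = field(default_factory=time.time)  # when it entered the cache
    hit_count: int = 0                                   # how many times it was served

    def age(self) -> float:
        """Seconds this object has lived in the cache."""
        return time.time() - self.stored_at

entry = CacheEntry("/index.html", 200, {"Content-Type": "text/html"}, b"<html></html>")
entry.hit_count += 1   # a cache hit updates the usage metadata
```

Keeping the original headers alongside the body is what lets the cache return correct server headers on a hit.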

Sophisticated caches also keep a copy of the original client request headers that triggered the server response, for use in HTTP/1.1 content negotiation.

Step 4: Freshness check — the cache checks whether the cached copy is fresh enough; if not, it asks the server whether there are any updates

HTTP keeps a copy of a server document for a period of time through caching. During this time, the document is considered “fresh” and can be served by the cache without contacting the server. But once a cached copy stays for too long beyond the freshness limit of the document, the object is considered “out of date” and the cache verifies with the server again to see if the document has changed before serving it. To make things even more complicated, all request headers sent by the client to the cache can themselves force the cache to revalidate, or avoid validation altogether.

HTTP has a very complex set of freshness detection rules, compounded by the large number of configuration options supported by caching products and the need to interact with non-HTTP freshness standards.
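Ignoring the many edge cases the book mentions, the core freshness rule can be sketched in a few lines (this is a rough approximation, not the full algorithm):

```python
from email.utils import parsedate_to_datetime

def is_fresh(response_headers: dict, age_seconds: float) -> bool:
    """Rough freshness check: Cache-Control: max-age wins over Expires."""
    cc = response_headers.get("Cache-Control", "")
    for directive in cc.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            return age_seconds < int(directive.split("=", 1)[1])
    # Fall back to the absolute Expires date, compared against Date
    if "Expires" in response_headers and "Date" in response_headers:
        expires = parsedate_to_datetime(response_headers["Expires"])
        date = parsedate_to_datetime(response_headers["Date"])
        return age_seconds < (expires - date).total_seconds()
    return False  # no freshness info: treat as stale and revalidate

assert is_fresh({"Cache-Control": "max-age=3600"}, 120) is True
assert is_fresh({"Cache-Control": "max-age=60"}, 120) is False
```

A fresh copy is served directly; a stale one triggers the revalidation described above.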

Step 5: Create the response — the cache builds a response message from the new headers and the cached body

We want the cached response to look as if it came from the original server, so the cache uses the cached server response headers as the starting point for the response headers. The cache then modifies and extends these base headers.

The cache is responsible for adapting these headers to match the requirements of the client. For example, the server may return an HTTP/1.0 response (or even an HTTP/0.9 response) while the client expects an HTTP/1.1 response, in which case the cache must translate the headers accordingly. Caches also insert freshness information (the Cache-Control, Age, and Expires headers), and often include a Via header indicating that the response was provided by a proxy cache.

Note that the cache should not adjust the Date header. The Date header represents the date on which the object was originally generated by the original server.
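A minimal sketch of this header adaptation (the proxy name `my-cache` is made up for illustration):

```python
def build_cache_response_headers(stored_headers: dict, age_seconds: int) -> dict:
    """Start from the cached server headers, then add freshness and proxy info.
    The Date header is deliberately left untouched, as noted above."""
    out = dict(stored_headers)     # cached server response headers as the base
    out["Age"] = str(age_seconds)  # how long the copy has sat in the cache
    out["Via"] = "1.1 my-cache"    # advertise that a proxy cache served this
    return out

stored = {"Date": "Tue, 01 Oct 2019 00:00:00 GMT", "Content-Type": "text/html"}
sent = build_cache_response_headers(stored, 42)
```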
 

Step 6: Send — the cache sends the response back to the client over the network

Once the response header is ready, the cache sends the response back to the client. Like all proxy servers, the proxy cache manages the connection to the client. High-performance caches try to send data as efficiently as possible and generally avoid copying document content between the local cache and the network I/O buffer.

Step 7: Log — the cache optionally creates a log file entry describing this transaction

Most caches hold log files and some statistics related to the use of the cache. At the end of each cache transaction, the cache updates statistics on the number of cache hits and misses (as well as other relevant metrics) and inserts the entry into a log file that displays the request type, URL, and event that occurred.

Instead of saying so much, here’s a picture, clear and objective:



Strong cache

Through the special HTTP Cache-Control and Expires headers, HTTP lets the original server attach an “expiration date” to each document. Like the expiration date on a quart of milk, these headers indicate how long the content can be considered fresh.



The server specifies the expiration date using the HTTP/1.0+ Expires header or the HTTP/1.1 Cache-Control: max-age response header, sent along with the response body. The Expires header does essentially the same thing as Cache-Control: max-age, but because Cache-Control uses relative time rather than an absolute date, we prefer the newer Cache-Control header; an absolute date depends on the computer clock being set correctly.

That is, Cache-Control has higher priority than Expires.

Server revalidation

Just because a cached document has expired does not mean it is actually different from the document currently on the original server; it just means it’s time to check. This is called server revalidation: the cache needs to ask the original server whether the document has changed.

  • If revalidation shows the document has changed, the cache fetches a new copy, stores it in place of the old one, and sends the document to the client.

  • If revalidation shows the document has not changed, the cache only needs new headers, including a new expiration date, and it updates the headers in the cache.
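These two outcomes can be sketched as follows (the dict-based entry shape is my own, purely for illustration):

```python
def apply_revalidation(cached: dict, server_status: int,
                       server_headers: dict, server_body: bytes) -> dict:
    """Sketch of the two revalidation outcomes described above."""
    if server_status == 304:
        # Not modified: keep the cached body, merge in the refreshed headers
        cached["headers"].update(server_headers)
        return cached
    # Modified (200): replace the old copy wholesale
    return {"headers": server_headers, "body": server_body}

copy = {"headers": {"Expires": "old-date"}, "body": b"v1"}
copy = apply_revalidation(copy, 304, {"Expires": "new-date"}, b"")
```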



Now let’s look at the Cache-Control header (no need to memorize it all at once; take it one directive at a time 🏆):

1) Cache-Control: no-store

Forbids caching entirely (this is the directive that truly makes responses uncacheable). A cache typically forwards a no-store response to the client, like a non-caching proxy server would, and then deletes the object.

2) Cache-Control: no-cache

Forces the client to revalidate with the server: each request must go to the server, which checks whether the resource has changed, returning new content if it has, or 304 Not Modified if it has not. The name is very misleading and is easily mistaken for “do not cache the response.” In fact, a Cache-Control: no-cache response is cached; the cache just has to revalidate it with the server every time before serving the response data to the client (browser).

HTTP/1.1 provides the Pragma: no-cache header for compatibility with HTTP/1.0+. All HTTP/1.1 applications should use Cache-Control: no-cache, except when interacting with HTTP/1.0 applications that only understand Pragma: no-cache.
Technically, the Pragma: no-cache header may only be used in HTTP requests, but in practice it is widely used as an extension header in both HTTP requests and responses.

3) Cache-Control: must-revalidate

A stale copy of this object cannot be provided without prior revalidation with the original server. The cache can still serve fresh copies at will. If the original server is not available while the cache is performing a must-revalidate freshness check, the cache must return a 504 Gateway Timeout error.

4) Cache-Control: max-age

The header represents the number of seconds that the document can be considered fresh from the time it was sent from the server.

5) Cache-Control: s-maxage

This is the same as max-age, but only for proxy caches.

6) Cache-Control: private

The response is intended for a single user only and must not be stored by shared (proxy) caches.

7) Cache-Control: public

The response may be cached by anyone, including the client that sent the request and proxy servers.

8) Cache-Control: max-stale

The cache is free to serve expired documents. If a value is given (max-stale=\<s\>), the document may be served stale by at most that many seconds. This request directive loosens the caching rules.
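Since Cache-Control is just a comma-separated list of directives, parsing it is straightforward. A minimal sketch:

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into {directive: value-or-None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, val = part.partition("=")
        directives[name] = val.strip('"') if val else None
    return directives

cc = parse_cache_control("public, max-age=600, s-maxage=300")
```

Valueless directives like `public` or `no-cache` map to `None`; valued ones like `max-age` keep their value as a string.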

Controlling HTML caching via http-equiv

HTML 2.0 defined the <META HTTP-EQUIV> tag. This optional tag sits at the top of an HTML document and defines HTTP headers that should be associated with the document.

<HTML>
    <HEAD>
	<TITLE>My Document</TITLE>

	<META HTTP-EQUIV="Cache-control" CONTENT="no-cache">
    </HEAD>
    ... 			
</HTML>

The HTTP server can use this information when serving the document. In particular, it can include a header field in responses to requests for this document: the header name is taken from the HTTP-EQUIV attribute value, and the header value is taken from the CONTENT attribute value.

Unfortunately, supporting this optional feature puts extra load on the server, the values are static, and it only supports HTML, not many other file types, so few Web servers and proxies support this feature.

In summary, the <META HTTP-EQUIV> tag is not a good way to control document caching. Configuring the server properly to send HTTP headers is the only reliable way to communicate caching instructions for a document.



tips:

Web browsers all have Refresh or Reload buttons that force a refresh of possibly stale content in the browser or proxy cache. The Refresh button issues a GET request with a Cache-Control request header attached, forcing either a revalidation or an unconditional fetch of the document from the server. The exact behavior of Refresh depends on the configuration of the particular browser, document, and intercepting cache.

Borrowing a picture from “Understand Web Caching”:



Conditional method revalidation (negotiated cache)

If-Modified-Since: Date revalidation

If the document has been modified since the specified date, the request method is executed. This is used together with the Last-Modified server response header, so that content is fetched only when it has been modified and differs from the cached copy.

  • If the document has been modified since the specified date, the If-Modified-Since condition is true, and the GET usually succeeds. A new document, carrying new headers, is returned to the cache; the new headers include, among other things, a new expiration date.

  • If the document has not been modified since the specified date, the condition is false, and a small 304 Not Modified response message is returned to the client. For efficiency, the body of the document is not returned; headers are returned in the response, but only those that need updating at the source. For example, the Content-Type header usually doesn’t change, so it normally need not be sent. A new expiration date is typically sent.

  • The If-Modified-Since header works together with the Last-Modified server response header. The original server attaches the last modification date to the documents it serves. When a cache revalidates a cached document, it includes an If-Modified-Since header carrying the date the cached copy was last modified:

    If-Modified-Since revalidation returns a 304 response if nothing has changed, or a 200 response with a new body if it has.
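The server side of this exchange can be sketched as follows (the date and body are made up for illustration):

```python
from email.utils import parsedate_to_datetime

LAST_MODIFIED = "Mon, 07 Oct 2019 10:00:00 GMT"  # hypothetical modification date

def conditional_get(if_modified_since):
    """Return (status, body) for a GET carrying an optional If-Modified-Since."""
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        if parsedate_to_datetime(LAST_MODIFIED) <= since:
            return 304, None                   # Not Modified: headers only, no body
    return 200, b"<html>new version</html>"    # changed, or no condition given
```

An unconditional request always gets 200 with the body; a condition matching the current modification date gets a compact 304.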



If-None-Match: ETag

ETag (entity tag): usually a hash of the resource entity. In other words, an ETag is a token the server generates to identify whether the returned content has changed. ETag has a higher priority than Last-Modified.

Instead of matching documents by their last modification date, the server can attach special tags to documents, which act like serial numbers. The If-None-Match header executes the request method if the cached tag differs from the tag on the server’s document.

Why use If-None-Match when we already have If-Modified-Since 😇?

  • Some documents may be periodically rewritten (for example, written from a background process), but the actual data contained is often the same. Although the content does not change, the modification date does.

  • Some documents may have been modified, but the changes are not significant enough to require a worldwide cache to reload the data (such as changes to spelling or comments).

  • Some servers cannot accurately determine the last modification date of their pages.

  • Modification dates with one-second granularity may not be sufficient for servers whose documents change at sub-second intervals (for example, live monitors).

Suppose the cache holds a document with entity tag ETag: v2.6. It revalidates with the original server by sending If-None-Match: v2.6, asking for a new object only if the tag no longer matches. If the tag still matches, a 304 Not Modified response is returned. If the entity tag on the server has changed (perhaps to ETag: v3.0), the server returns the new content in a 200 OK response, along with the corresponding new ETag.
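Server-side, that exchange can be sketched like this (the hashing scheme is my own choice; real servers generate ETags however they like):

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Hypothetical ETag: a short hash of the entity body."""
    return '"' + hashlib.sha1(body).hexdigest()[:12] + '"'

def serve(if_none_match, body: bytes):
    """Return (status, body, etag) for a request with optional If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, None, etag   # tag still matches: no body needed
    return 200, body, etag       # tag changed (or first visit): full response

body = b"hello cache"
first = serve(None, body)        # first request: 200 plus an ETag
second = serve(first[2], body)   # revalidation with the same tag: 304
```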

Here’s an example:

The first time you make an HTTP request, the server returns an Etag



And when you make the same request a second time, the client sends If-None-Match, whose value is the ETag it received (set here by the requesting client).



What most people don’t know:

Sometimes, clients and servers may need a less precise form of entity-tag validation. For example, a server might want to make cosmetic touch-ups to a large, widely cached document, but not want to generate a lot of transfer traffic when caches revalidate. In this case, the server can broadcast a “weak” entity tag by prefixing it with “W/”. A weak entity tag changes only when the associated entity changes in a semantically significant way; a strong entity tag changes whenever the associated entity changes in any way.

The following example shows how a client can request revalidation from the server with weak entity tags. The server returns the body only if the content of the document has changed significantly since version 4.0:

GET /announce.html HTTP/1.1
If-None-Match: W/"v4.0"

To summarize: when a client accesses the same resource multiple times, it first needs to determine whether its current copy is still fresh. If it is not, the client must get the latest version from the server. To avoid receiving an identical copy when the resource has not changed, the client can send a conditional request carrying a validator that uniquely identifies its current copy. The server then sends a copy of the resource only if it differs from the client’s copy.

Jianshu, for example, has a similar example:



So, what exactly is the Easter egg?

Yesterday I pulled the expert next to me into a discussion about login state: besides cookies and tokens, is there any other way?

Because a CSRF attack can ride on the cookie (it is sent automatically) and an XSS attack can steal the token, a site exposed to both kinds of attack is done for.



It just so happens that I recently came across a few other approaches (honestly, none of them are great either 😆, but they are ideas):

  • The HTTP header that carries user identity information.

  • Client IP address tracing: Identifies users by their IP addresses.

  • User login, using authentication to identify users.

  • Fat URL, a technique for embedding identifying information in a URL.

From: the user’s E-mail address

The From header contains the user’s E-mail address. Since every user has a different E-mail address, it could in principle serve to identify the user. But few browsers send the From header, for fear that unscrupulous servers would collect the addresses for spam. In practice, From headers are sent by automated robots and spiders, so that the webmaster has somewhere to send angry complaints when problems arise.

User-Agent: the user’s browser software

The User-Agent header tells the server which browser the user is using, including the program name and version, and often information about the operating system as well. This is useful for making custom content interoperate well with particular browsers and their quirks, but it offers little meaningful help in identifying a specific user.

Referer: the page whose link the user followed

The Referer header provides the URL of the page the user came from. The Referer header alone cannot fully identify a user, but it does reveal which page the user visited previously, giving a better picture of the user’s browsing behavior and interests.

Authorization: user name and password

To make logging in to Web sites easier, HTTP includes a built-in mechanism for passing user credentials to Web sites, using the WWW-Authenticate and Authorization headers. Once logged in, the browser can keep sending this login information with every request to the site, so the login information is always available.

Client-IP: the IP address of the client

X-Forwarded-For: the IP address of the client

This works if every user has a distinct IP address, the IP address rarely (if ever) changes, and the Web server can determine the client IP address for each request. Although the client’s IP address is not usually provided in the HTTP headers, a Web server can find the IP address of the other end of the TCP connection carrying the HTTP request.

However, using the client IP address to identify users has many disadvantages that limit its effectiveness as a user identification technique.

  • The client IP address describes the machine being used, not the user. If multiple users share the same computer, it is impossible to distinguish between them.

  • Many Internet service providers dynamically assign IP addresses to users when they log on. Each time a user logs in, he or she gets a different address, so the Web server cannot assume that an IP address identifies the user between login sessions.

  • To improve security and manage scarce Address resources, many users browse online content through Network Address Translation (NAT) firewalls. These NAT devices hide the IP addresses of the actual clients behind the firewall, translating the actual client IP addresses into a shared firewall IP address (and different port numbers).

  • HTTP proxies and gateways typically open new TCP connections to the original server, so the Web server sees the proxy’s IP address, not the client’s. Some proxies work around this by adding a special Client-IP or X-Forwarded-For header to carry the original IP address (see Figure 11-1), but not all proxies support this behavior.
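A sketch of how a server behind proxies might guess the client IP, subject to all the caveats in the list above:

```python
def client_ip(headers: dict, tcp_peer_ip: str) -> str:
    """Best-effort client IP: leftmost X-Forwarded-For entry, else the TCP peer.
    As noted above, this header is easy to spoof and not always present."""
    xff = headers.get("X-Forwarded-For")
    if xff:
        return xff.split(",")[0].strip()  # leftmost entry = original client
    return tcp_peer_ip

ip = client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.2"}, "10.0.0.2")
```

Because any hop (or the client itself) can forge this header, it is a hint at best, never an identity.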

Fat URL

Some Web sites generate a specific version of the URL for each user to track their identity. Typically, the real URL is extended to add some status information at the beginning or end of the URL path. When the user views the site, the Web server dynamically generates some hyperlinks to continue to maintain the state information in the URL.

A modified URL that contains user state information is called a fat URL. Here are some example fat URLs used by the Amazon.com e-commerce site. Each URL carries a user-specific identification code (002-1145265-8016838 in this case) that helps track the user while browsing the store.

. <a href="/exec/obidos/tg/browse/-/229220/ref=gr_gifts/002-1145265-8016838">All Gifts</a><br>
<a href="/exec/obidos/wishlist/ref=gr_pl1_/002-1145265-8016838">WishList</a><br>
...
<a href="http://s1.amazon.com/exec/varzea/tg/armed-forces/-//ref=gr_af_/002-1145265-8016838">Salute Our Troops</a><br>
<a href="/exec/obidos/tg/browse/-/749188/ref=gr_p4_/002-1145265-8016838">Free Shipping</a><br>
<a href="/exec/obidos/tg/browse/-/468532/ref=gr_returns/002-1145265-8016838">Easy Returns</a>
...

Several independent HTTP transactions on a Web server can be bundled into a “session” or “visit” using fat URLs. The first time a user visits the Web site, a unique ID is generated and added to the URL in a way the server can recognize, and the server redirects the client to that fat URL. Whenever the server receives a request for a fat URL, it can look up any incremental state associated with that user ID (shopping carts, profiles, and so on), and it rewrites all outgoing hyperlinks as fat URLs to maintain the user ID.
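The link-rewriting step can be sketched like this, in the spirit of the Amazon example above (the exact URL scheme and function name here are invented for illustration):

```python
import re

def fatten_links(html: str, user_id: str) -> str:
    """Append a user-specific ID to every href so state survives each click."""
    return re.sub(r'href="([^"]+)"',
                  lambda m: 'href="%s/%s"' % (m.group(1), user_id),
                  html)

page = '<a href="/exec/obidos/wishlist/ref=gr_pl1_">WishList</a>'
fat = fatten_links(page, "002-1145265-8016838")
```

Every page the server emits must pass through a rewrite like this, which is exactly the “additional server load” problem listed below.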

But there are several serious problems with this technique:

  • Cannot share URLs: fat URLs contain state information tied to a particular user and session. If you send such a URL to someone else, you may inadvertently share all the personal information you have accumulated.

  • Breaks caching: generating a user-specific version of every URL means there are no longer commonly accessible URLs to cache.

  • Extra server load: the server has to rewrite HTML pages to fatten the URLs.

  • Escape hatches: users can easily “escape” a fat URL session unintentionally by jumping to another site or requesting a specific URL. Fat URLs only work if the user strictly follows the pre-modified links; wander off them, and the user loses his progress (perhaps an already full shopping cart) and has to start over.

  • Not persistent across sessions: unless the user bookmarks a particular fat URL, all the information is lost when the user logs out.

  • Ugly URLs: fat URLs displayed in the browser can be confusing to new users.


And that’s a wrap.

If you spot a mistake, corrections are welcome ~