In “What are Architectural Attributes?” I mentioned that the main way to improve “performance” is optimization, and one of the main ways to optimize is by adding caching!

There is a well-known saying in software engineering: “There is no silver bullet”! That is to say, because of the inherent complexity of software, no single technology or method can solve every problem. And yet one particular problem in software engineering arguably does have a silver bullet!

As mentioned in “Architectural Style: Cure-all CS and Layering”, “any problem in computer science can be solved by adding another layer of indirection”! The cache layer is arguably the layer added most often, and its main purpose is to improve performance. In that sense, caching is the silver bullet of performance!

This article will discuss the following:

  • The role of caching
  • Types of caches
  • Cache replacement algorithms
  • Distributed caching
  • Where caches are used
  • Caching in the network
  • Application caching
  • Database caching
  • Caching inside the computer

Start with the code

```
fn longRunningOperations() { /* ... */ }

let result = longRunningOperations();
// do other things
```

Looking at the pseudocode above, longRunningOperations is a time-consuming method (each call takes tens of seconds or even minutes), for example one that performs:

  • Complex business logic calculations
  • Complex data queries
  • Time-consuming network operations

If such a method runs in full on every call, performance suffers and the user experience is very poor.

So how do we deal with that?

Generally, there are several optimization schemes:

  • Optimize the business code, for example with faster data structures and algorithms, a faster I/O model, database indexes, and so on
  • Simplify the business logic: overly complex requirements lead to complex logic, and trimming that logic reduces the time-consuming parts
  • Store the result of the operation. For example, statistical results can be computed by an end-of-day job and written to a results table, so later queries simply read from that table; for more transient computations, the result can be kept in memory and returned directly the next time it is needed

This article focuses on the third option: use caching!
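As a minimal sketch of this third option (the `long_running_operation` function and the `HashMap`-based cache below are illustrative assumptions, not code from any particular framework), an expensive result can be computed once and then served from memory:

```rust
use std::collections::HashMap;

// Hypothetical expensive operation: imagine a complex calculation or query.
fn long_running_operation(input: u64) -> u64 {
    // ... expensive work stands in here ...
    input * 2
}

// A naive in-memory result cache: compute once per distinct input, then reuse.
struct ResultCache {
    results: HashMap<u64, u64>,
}

impl ResultCache {
    fn new() -> Self {
        Self { results: HashMap::new() }
    }

    fn get(&mut self, input: u64) -> u64 {
        // Only call the expensive operation on a cache miss.
        *self.results
            .entry(input)
            .or_insert_with(|| long_running_operation(input))
    }
}

fn main() {
    let mut cache = ResultCache::new();
    let first = cache.get(21);  // slow: computes and stores the result
    let second = cache.get(21); // fast: served from the in-memory cache
    assert_eq!(first, second);
}
```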

Active and passive caching

Typically we use caches to store content that has one or more of the following characteristics:

  • Is used frequently
  • Changes infrequently
  • Is time-consuming to obtain
  • Is accessed by multiple systems

For instance,

  • Dictionary data: dictionary data is used in many places throughout the system and is generally not modified once configured. Fetching it from the database is not expensive in itself, but because it is queried many times and travels over the network, it is still much slower than reading it straight from a cache
  • Flash-sale (“seckill”) product information: traffic spikes when a flash sale starts, and serving the product data from a cache (static files, a CDN, etc.) is far faster than querying the database
  • Other frequently accessed information: listed separately because it is cached differently from the dictionary data above, as explained below

For dictionary data, we usually load it into the cache when the system starts. Such cached data generally has no expiration time; when a dictionary entry is modified, the contents of the cache are updated as well. This kind of cache is called an “active cache”, because the cached data is refreshed by an explicit modification from the user.
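Here is a minimal sketch of an active cache, assuming a hypothetical `load_dictionary_from_db` source and a plain in-process `HashMap`; a real system would more likely use middleware such as Redis:

```rust
use std::collections::HashMap;

// Hypothetical source of truth; in a real system this would query the database.
fn load_dictionary_from_db() -> HashMap<String, String> {
    HashMap::from([
        ("country.cn".to_string(), "China".to_string()),
        ("country.us".to_string(), "United States".to_string()),
    ])
}

struct DictionaryCache {
    entries: HashMap<String, String>, // no expiration: refreshed only on updates
}

impl DictionaryCache {
    // Load everything into the cache at system startup.
    fn warm_up() -> Self {
        Self { entries: load_dictionary_from_db() }
    }

    fn get(&self, key: &str) -> Option<&String> {
        self.entries.get(key)
    }

    // When a dictionary entry is modified, the caller updates both the
    // database (omitted here) and the cache, keeping the cache "actively" fresh.
    fn update(&mut self, key: String, value: String) {
        self.entries.insert(key, value);
    }
}

fn main() {
    let mut cache = DictionaryCache::warm_up();
    println!("{:?}", cache.get("country.cn"));
    cache.update("country.cn".to_string(), "People's Republic of China".to_string());
    println!("{:?}", cache.get("country.cn"));
}
```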

For other information, there is too much data to load into the cache all at once, and it is not clear in advance which entries will be accessed frequently and which will not. For such data, the usual approach is:

  • Check whether the requested data is already in the cache; if so, return it to the user directly
  • If not, fall back to the source (for example, the database)
  • Put the retrieved data into the cache
  • Finally, return it to the user

This type of caching is called a “passive cache” (a minimal sketch follows below). The expiration of its cached data is controlled by the system. So how does the system control that? This is where cache replacement algorithms come in!
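A minimal sketch of this passive (cache-aside) flow, with a hypothetical `load_from_source` standing in for the database or other origin:

```rust
use std::collections::HashMap;

// Hypothetical slow origin (database, remote service, ...).
fn load_from_source(key: &str) -> Option<String> {
    Some(format!("value-for-{key}"))
}

struct PassiveCache {
    entries: HashMap<String, String>,
}

impl PassiveCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    fn get(&mut self, key: &str) -> Option<String> {
        // 1. Check the cache first and return a hit directly.
        if let Some(value) = self.entries.get(key) {
            return Some(value.clone());
        }
        // 2. On a miss, go back to the source.
        let value = load_from_source(key)?;
        // 3. Put the fetched value into the cache ...
        self.entries.insert(key.to_string(), value.clone());
        // 4. ... and return it to the caller.
        Some(value)
    }
}

fn main() {
    let mut cache = PassiveCache::new();
    println!("{:?}", cache.get("user:42")); // miss: falls back to the source
    println!("{:?}", cache.get("user:42")); // hit: served from the cache
}
```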

Cache replacement algorithm

As mentioned above, for passive caches, there is too much information to load all data into the cache at one time. When the cache is full and new data needs to be added, it is necessary to determine which data should be removed from the cache to make room for new data.

The algorithm that decides which data is evicted from the cache first is called the cache (page) replacement algorithm!

Wikipedia lists the following replacement algorithms:

  • RR (Random Replacement)
  • FIFO (First In, First Out)
  • LIFO (Last In, First Out)
  • MRU (Most Recently Used)
  • LFU (Least Frequently Used)
  • LRU (Least Recently Used)
  • TLRU (Time-Aware Least Recently Used)
  • PLRU (Pseudo-LRU)
  • LRU-K
  • SLRU (Segmented LRU)
  • MQ (Multi-Queue)
  • LFRU (Least Frequent Recently Used)
  • LFUDA (LFU with Dynamic Aging)
  • LIRS (Low Inter-Reference Recency Set)
  • ARC (Adaptive Replacement Cache)
  • CAR (Clock with Adaptive Replacement)
  • Pannier (Container-based Caching Algorithm for Compound Objects)

We don’t usually implement caching ourselves; there is plenty of open source caching middleware, such as Redis and Memcached. Below is a quick walk-through of several commonly used replacement algorithms.

FIFO

FIFO is probably the simplest replacement algorithm:

  • A queue holds the cached entries
  • Entries are ordered by when they were loaded into the cache: the earliest at the head of the queue, the latest at the tail
  • When the cache is full, entries are evicted from the head of the queue to make room for the data being loaded
  • New data is appended to the tail of the queue (a minimal sketch follows the list)
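
A minimal FIFO cache sketch with a fixed capacity; the `VecDeque`-plus-`HashMap` layout is just one illustrative way to implement it:

```rust
use std::collections::{HashMap, VecDeque};

struct FifoCache {
    capacity: usize,
    order: VecDeque<String>,          // insertion order: front = oldest
    entries: HashMap<String, String>,
}

impl FifoCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), entries: HashMap::new() }
    }

    fn get(&self, key: &str) -> Option<&String> {
        self.entries.get(key) // reads do not change the eviction order
    }

    fn put(&mut self, key: String, value: String) {
        if !self.entries.contains_key(&key) {
            // Cache full: evict from the head of the queue to make room.
            if self.entries.len() >= self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.entries.remove(&oldest);
                }
            }
            // New data is appended to the tail of the queue.
            self.order.push_back(key.clone());
        }
        self.entries.insert(key, value);
    }
}

fn main() {
    let mut cache = FifoCache::new(2);
    cache.put("a".into(), "1".into());
    cache.put("b".into(), "2".into());
    cache.put("c".into(), "3".into()); // evicts "a", the oldest entry
    assert!(cache.get("a").is_none());
    assert!(cache.get("c").is_some());
}
```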

FIFO is simple to implement, but it does not always perform well. As a simple example, suppose a system needs 10 pieces of data: 5 of them happen to sit at the head of a full queue, and the other 5 are not cached at all. Under FIFO, loading the 5 missing pieces evicts the 5 at the head of the queue, so those evicted pieces then have to be loaded into the cache all over again. This hurts performance.

This problem can even get worse as more cache space is allocated, so a cache that was meant to improve performance ends up hurting it. This phenomenon is known as Belady’s anomaly.

LIFO is very similar to FIFO, so I won’t repeat it here.

LRU

The most commonly used replacement algorithm today is LRU: the “least recently used” data is evicted first.

  • Each entry records the last time it was used
  • When eviction is needed, LRU selects the entry that has gone unused for the longest time (a minimal sketch follows the list)
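
A minimal LRU sketch; for clarity it tracks recency with a simple `Vec`, whereas production implementations typically combine a hash map with a doubly linked list:

```rust
use std::collections::HashMap;

struct LruCache {
    capacity: usize,
    recency: Vec<String>,             // back = most recently used, front = least
    entries: HashMap<String, String>,
}

impl LruCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, recency: Vec::new(), entries: HashMap::new() }
    }

    // Move a key to the "most recently used" end.
    fn touch(&mut self, key: &str) {
        self.recency.retain(|k| k != key);
        self.recency.push(key.to_string());
    }

    fn get(&mut self, key: &str) -> Option<String> {
        let value = self.entries.get(key).cloned()?;
        self.touch(key); // every access refreshes the entry's recency
        Some(value)
    }

    fn put(&mut self, key: String, value: String) {
        if !self.entries.contains_key(&key) && self.entries.len() >= self.capacity {
            // Evict the entry that has gone unused for the longest time.
            let least_recent = self.recency.remove(0);
            self.entries.remove(&least_recent);
        }
        self.touch(&key);
        self.entries.insert(key, value);
    }
}

fn main() {
    let mut cache = LruCache::new(2);
    cache.put("a".into(), "1".into());
    cache.put("b".into(), "2".into());
    cache.get("a");                    // "a" is now the most recently used
    cache.put("c".into(), "3".into()); // evicts "b", the least recently used
    assert!(cache.get("b").is_none());
    assert!(cache.get("a").is_some());
}
```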

There are many variants of LRU, for example:

  • TLRU (Time-Aware Least Recently Used): most cached data has an expiration time; TLRU evicts along two dimensions, least recent use and expiration time
  • LRU-K: maintains an additional queue recording how many times each item has been accessed. An item is put into the cache only after it has been accessed K times. When eviction is needed, LRU-K evicts the item whose K-th most recent access lies furthest in the past.
  • SLRU (Segmented LRU, similar to 2Q): one FIFO queue and one LRU queue. When data is accessed for the first time it is cached in the FIFO queue; when it is accessed a second time it is moved from the FIFO queue to the LRU queue, and each queue evicts data in its own way.
  • PLRU (Pseudo-LRU): true LRU has to track access times, which takes extra space and is too wasteful for devices with very little memory. PLRU needs only one bit per cached entry yet approximates the effect of LRU. See the following figure for the details:
![non-functional constraints (1) – performance of silver bullet: cache](https://p1-tt.byteimg.com/origin/pgc-image/27411dbc9fd14e79ad3040f68bd21139?from=pc)

MRU and LFU are similar in spirit to LRU, so I won’t repeat them here!

Cache clusters

To improve cache availability, we usually deploy at least a primary/secondary pair: a primary cache and a secondary cache.

  • Writes go only to the primary cache
  • The primary cache synchronizes data to the secondary cache
  • Reads can come from the primary cache, and optionally from the secondary cache as well
  • When the primary cache fails, the secondary cache is promoted to primary

A safer way to do this is to create a cache cluster:

  • Multiple machines cache the same data, and one of them acts as the primary cache
  • Writes go only to the primary cache
  • The primary cache synchronizes data to the other caches
  • Reads can come from the primary cache, and optionally from the other caches as well
  • When the primary cache fails, one of the other cache servers is elected as the new primary

Distributed cache

Neither a standalone cache, a primary/secondary pair, nor a cache cluster solves the problem of limited cache size. Caches live in memory, and a single machine’s memory is finite. When the amount of data to be cached far exceeds the memory of one machine, the cached data has to be spread across multiple machines, each caching only part of it. This is called a distributed cache.

Distributed caching solves the problem of limited data cached by one machine, but it also introduces new problems:

  • Which server should a given piece of data be cached on?
  • How do we keep the amount of data cached by each server roughly the same?

The common approach is to hash the key and take the result modulo the number of servers to decide which server a piece of data lives on. This answers the question of which server caches which data, but it does not guarantee that every server caches roughly the same amount: many keys may land on the same server after the modulo operation, leaving one server with a huge amount of cached data while the others hold very little. A server holding too much cached data may run out of memory and start swapping, which degrades performance.
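A tiny sketch of this hash-and-mod placement (the hash function and server names are illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Decide which cache server a key belongs to: hash it, then take the
// result modulo the number of servers.
fn server_index(key: &str, server_count: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() as usize) % server_count
}

fn main() {
    let servers = ["cache-1", "cache-2", "cache-3"];
    let key = "user:42";
    println!("{key} -> {}", servers[server_index(key, servers.len())]);
}
```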

A consistent hash ring can be used to keep the amount of data cached on each server roughly the same. The logic is as follows (a minimal sketch follows the list):

  • Distribute the points 0 to 2^32 evenly around a circle
  • Each point corresponds to a cache server
  • The number of cache servers is far smaller than 2^32, so many points map to the same cache server; these extra points are called virtual nodes
  • The virtual nodes ensure the cache servers are spread evenly around the ring
  • To place a piece of data, hash its key
  • Take the hash modulo 2^32
  • Locate the result on the hash ring
  • If it lands exactly on a node, the data is cached on that node’s cache server
  • Otherwise, it is stored on the cache server of the next node encountered when moving forward (clockwise) around the ring
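
A minimal consistent-hash-ring sketch along the lines above; the hash function, the number of virtual nodes, and the `BTreeMap` used as the ring are illustrative choices, and for simplicity it uses a 64-bit hash space rather than mod 2^32:

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of(value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

struct HashRing {
    // Ring position -> server name. A BTreeMap keeps positions sorted, so
    // "move clockwise to the next node" becomes a simple range lookup.
    ring: BTreeMap<u64, String>,
}

impl HashRing {
    fn new(servers: &[&str], virtual_nodes: usize) -> Self {
        let mut ring = BTreeMap::new();
        for server in servers {
            // Each physical server is placed on the ring many times via
            // virtual nodes, which keeps the key distribution roughly even.
            for i in 0..virtual_nodes {
                ring.insert(hash_of(&format!("{server}#{i}")), server.to_string());
            }
        }
        Self { ring }
    }

    fn server_for(&self, key: &str) -> Option<&String> {
        let point = hash_of(key);
        // First node at or after the key's position; wrap to the start if none.
        self.ring
            .range(point..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, server)| server)
    }
}

fn main() {
    let ring = HashRing::new(&["cache-1", "cache-2", "cache-3"], 100);
    println!("user:42 -> {:?}", ring.server_for("user:42"));
    println!("user:43 -> {:?}", ring.server_for("user:43"));
}
```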

The ubiquitous cache

So far this has mostly been about application-level caching, but caching is everywhere.

Here is a brief overview of where caching may be used in the process of visiting a website.

Web caching

Suppose we type a URL into the browser and press Enter.

First, the domain name has to be resolved to an IP address, and there are caches all along the way:

  • Browser cache: The browser caches DNS records for a period of time, first retrieving the corresponding IP address from the browser cache.
  • System cache: If the desired record is not found in the browser cache, it is searched in the system cache
  • Router cache: If not found in the system cache, the router cache is searched for records
  • ISP DNS cache: If you still can’t find it, go to the ISP DNS cache server. You can usually find the corresponding cached record here.
  • Recursive search: If none of the above caches can be found, you need to start a recursive search from the root DNS server

Even once the IP is known, a request may not be necessary: the resource being accessed may have been fetched before and may already sit in the browser cache. In that case the browser simply returns the cached copy without sending a request.

If there is no cache, a request is sent to retrieve the resource.

The request may next reach a CDN, which is an edge cache. When users visit the website, GSLB (Global Server Load Balancing) directs them to the nearest cache server that is working properly, and that cache server responds to the request directly. If the required resource is not found in the CDN, the request may go on to a reverse proxy.

Some reverse proxies sit on the same network as the users, so accessing them gives very good response times. Such a reverse proxy cache is also generally called an edge cache; a CDN is essentially edge caching combined with GSLB.

Generally, a reverse proxy has two functions:

  • Hide the origin server to protect it from malicious attacks; the client cannot tell the proxy apart from the origin server
  • Caching: cache the origin server’s data to reduce the load on the origin server

If the required resource is not found in the reverse proxy either, the request finally goes to the origin server for the resource.

Server and database cache

Generally, the Server receives a request, assembles a response based on the request, and returns it. This process may require database queries, business logic calculations, page rendering, and so on. Each of these steps can introduce caching.

For database queries, most persistence frameworks provide a query cache: when the same SQL query is issued a second time, the data from the first query is returned from the cache instead of hitting the database again, saving the cost of the query. Heavily accessed data can also be placed in caching middleware and fetched from there on subsequent requests.

And the database itself has a cache!

![non-functional constraints (1) – performance of silver bullet: cache](https://p3-tt.byteimg.com/origin/pgc-image/14ea23a939de43d3b7d0c161c9de4f1b?from=pc)
  • The client sends a query to the server
  • The server checks the query cache; on a hit it returns the cached result immediately, otherwise it continues
  • The SQL is parsed and preprocessed, and the optimizer generates an execution plan
  • Following the execution plan, the storage engine’s API is called to run the query
  • The result is returned to the client

MySQL’s query cache can be inefficient. First, writes to the cache are done under an exclusive lock. Second, once a query result is cached, it is invalidated whenever any of the tables involved is modified, so for frequently changing data the cache brings little benefit.

For business logic, if a calculation is complex its result can be cached, either in a database or in caching middleware. A later request with the same parameters then skips the computation and returns the cached result directly.

For page rendering, heavily visited pages whose data rarely changes can be made static: instead of rendering the page dynamically on every visit, it is generated in advance and stored on disk, then served straight from disk when requested. The page content can even be cached in caching middleware to improve performance further.

In addition, for servers that require login, user information is effectively cached as well, whether it lives in the server session or in caching middleware; otherwise the user’s information would have to be fetched from the database on every request, hurting server performance.

Computer cache

Finally, the computer running the system has a lot of cache itself!

We all know that a computer is generally made up of a CPU, memory, motherboard, hard disk, graphics card, mouse, keyboard, network card, and so on. Its storage devices include cloud storage (such as Baidu Cloud or a NAS), the local hard disk, memory, the CPU caches (L1, L2, and L3), and the CPU registers. Their speeds differ by several orders of magnitude; the following figure shows typical access rates for each device.

![non-functional constraints (1) – performance of silver bullet: cache](https://p1-tt.byteimg.com/origin/pgc-image/d6c8ccf82f6f49fd9fb04ae7750c181c?from=pc)
  • CPU registers are the fastest, around 1 ns, but hold only a few hundred bytes and are the most expensive to build
  • The CPU cache comes next, around 10 ns, holding tens of megabytes at the next-highest cost; L1, L2, and L3 get progressively slower
  • Then comes memory, at around 100 ns and up to gigabytes, cheaper than cache (though its price has been absurd for the past couple of years)
  • Hard disk access is on the order of 10 ms, with capacity up to terabytes, at what can only be called a bargain price
  • Cloud storage access is on the order of seconds, and capacity is essentially unlimited if the budget allows

We usually reserve the word “cache” for the CPU cache, but in fact, for the devices listed above, each faster level can be regarded as a cache for the level below it!

The book “Computer Systems: A Programmer’s Perspective” briefly walks through how a computer executes a C “hello world” program.

  • The command `./hello` is typed via the mouse and keyboard
  • The keyboard input travels over the bus into a register and then into memory
![non-functional constraints (1) – performance of silver bullet: cache](https://p3-tt.byteimg.com/origin/pgc-image/665f9e5f109041888dd09cccd9fafd5e?from=pc)
  • When you press Enter
  • Using DMA technology, the target file is read directly from the hard disk into memory
![non-functional constraints (1) – performance of silver bullet: cache](https://p3-tt.byteimg.com/origin/pgc-image/40991d8f6a8e4ae2b620e315970a7cb5?from=pc)
  • Finally, the program is executed
  • “hello, world” is copied from memory into registers
  • and from the registers to the display device
![non-functional constraints (1) – performance of silver bullet: cache](https://p3-tt.byteimg.com/origin/pgc-image/f7792c8968bc4cb389b11a77a6500aa2?from=pc)

You can see that the vast majority of these operations are data copies! The CPU does the final execution, and so that data can reach the CPU faster, there is a layer of “cache” at every level!

  • Data in the CPU registers is used directly by the CPU; the registers are, in effect, a cache for the L1 cache
  • L1 is a cache for L2, and L2 is a cache for L3
  • L3 is a cache for memory
  • Memory is a cache for the hard disk: data stored on disk must be loaded into memory before the CPU can use it. In addition, HMB (Host Memory Buffer) technology lets a drive borrow host memory as its own cache
  • The hard disk itself also has an onboard cache, used to reduce I/O operations by batching reads and writes
  • The hard disk can in turn serve as a cache for cloud storage: for example, if the network is poor we can download a movie first and watch it later without any stuttering

Conclusion

Performance is a non-functional constraint that needs to be considered when designing an architecture, and introducing caching is a simple and straightforward way to improve system performance.

Starting from a simple piece of pseudocode, this article has briefly described the role of caching, the techniques involved, and the scenarios where caches are commonly used today, in the hope of providing some reference for architecture design.

References

  • Computer Systems: A Programmer’s Perspective
  • Diagram of TCP/IP
  • High Performance MySQL
  • Operating System Concepts
  • What really happens when you navigate to a URL
  • Cache replacement policies (Wikipedia)