Hi, I’m Tom
Cache design is a well-worn topic. Memcache was common in the early days; now most teams reach for Redis. Beyond knowing the common data types and choosing the right one for the business scenario, there seems to be no great difficulty left.
If your business scenario sees only tens or hundreds of concurrent requests, the cache design may not need that much thought. But what about a system at the hundred-million scale?
First, look at the cache knowledge graph
Early caches used RAM to speed up CPU data exchange. With the rapid development of the Internet, the application of cache is more extensive. All storage media used for high-speed data exchange are called cache.
What metrics should we focus on when using caching? What are the application patterns of caching? And what tips are there for cache design? A picture is worth a thousand words:
Seven classic problems
Caching inevitably runs into problems in practice. The high-frequency ones can be roughly grouped into seven categories. Let's go through them one by one.
1. Mass cache expiration
When the business system queries data, it checks the cache first. If the data is not in the cache, it queries the database, preheats the result into the cache, and returns it. Cache performance is 50 to 100 times that of the DB.
In many business scenarios, such as flash-sale items, Weibo trending searches, or other campaign data, batch jobs preheat the data into the cache all at once, so the cached entries end up with nearly identical expiration times.
When that batch of data expires, every request for it misses the cache at the same moment, and the pressure shifts to the DB. DB requests surge, load climbs, and responses slow down.
So is there a solution?
Of course there is.
Starting from the moment the expiration time is written, we can change the fixed TTL to expiration time = base time + random time. The cached entries then expire gradually rather than all at once, avoiding a sudden spike of pressure on the DB.
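A minimal sketch of this TTL jitter, assuming Spring Data Redis's StringRedisTemplate (the base TTL and jitter range are made-up values; tune them to your batch size):

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import org.springframework.data.redis.core.StringRedisTemplate;

public class CacheWarmer {
    private static final Duration BASE_TTL = Duration.ofMinutes(30); // assumed base time
    private static final long MAX_JITTER_SECONDS = 300;              // assumed random spread

    private final StringRedisTemplate redis;

    public CacheWarmer(StringRedisTemplate redis) {
        this.redis = redis;
    }

    /** Preheat one entry with "base time + random time" so a batch does not expire at once. */
    public void preheat(String key, String value) {
        long jitter = ThreadLocalRandom.current().nextLong(MAX_JITTER_SECONDS + 1);
        redis.opsForValue().set(key, value, BASE_TTL.plusSeconds(jitter));
    }
}
```

With a 5-minute jitter window, a batch of keys warmed together drains from the cache over roughly five minutes instead of in one instant.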
2. Cache penetration
Not every request can find its data, whether in the cache or in the DB.
Suppose a hacker attacks a forum by sending a flood of requests for nonexistent post IDs. Those IDs are never in the cache, so every request falls through to the database; and because the rows don't exist, nothing ever gets written back to the cache, so the DB is hit on every single request.
The DB's comparatively poor throughput then drags down overall system performance and can even affect normal users' access.
Solutions:

- Solution 1: When the DB lookup finds no data, preheat a special null value into the cache. Subsequent queries then hit the cache, and the special value is parsed out before returning.
- Solution 2: Build a BloomFilter and initialize it with the full data set. When a request arrives, check whether the key exists in the BloomFilter; if not, return directly without querying the cache or the DB.
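A minimal sketch combining both schemes, assuming Guava's BloomFilter and StringRedisTemplate (the sentinel string, key prefix, and queryDb helper are hypothetical):

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;

public class PostCache {
    private static final String NULL_SENTINEL = "<null>"; // assumed marker for "known missing"

    private final StringRedisTemplate redis;
    // On startup, put() every existing post ID into the filter (initialization omitted).
    private final BloomFilter<CharSequence> existingIds =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 10_000_000, 0.01);

    public PostCache(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public String getPost(String postId) {
        // Solution 2: IDs the filter has never seen skip both the cache and the DB.
        if (!existingIds.mightContain(postId)) {
            return null;
        }
        String cached = redis.opsForValue().get("post:" + postId);
        if (cached != null) {
            return NULL_SENTINEL.equals(cached) ? null : cached; // parse the special value
        }
        String fromDb = queryDb(postId); // hypothetical DB lookup
        // Solution 1: cache a short-lived sentinel for missing rows so repeats hit the cache.
        redis.opsForValue().set("post:" + postId,
                fromDb == null ? NULL_SENTINEL : fromDb, Duration.ofMinutes(5));
        return fromDb;
    }

    private String queryDb(String postId) {
        return null; // placeholder for the real database query
    }
}
```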
3. Cache avalanche
Cache avalanche refers to the unavailability of some cache nodes, which leads to the unavailability of the entire cache architecture and even the service system.
Distributed cache designs generally use consistent hashing. When some nodes fail, a rehash policy redistributes their requests evenly across the remaining cache nodes. But when a heavy traffic peak arrives, if the high-traffic keys happen to be concentrated on one or two cache nodes, those nodes can easily exhaust their memory and network cards and crash. Once the abnormal nodes go offline, their high-traffic keys are rehashed onto other cache nodes, which then overload and crash in turn. The failure keeps spreading until the whole cache system is abnormal and can no longer serve requests.
Solutions:

- Solution 1: Add real-time monitoring and timely alerting, so that machine replacement and automated failover strategies can quickly restore the cache's ability to serve traffic.
- Solution 2: Keep multiple replicas of the cache. When one replica fails, read another. To ensure the replicas themselves stay available, deploy them on different racks to spread the risk. A fallback-read sketch follows.
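A minimal sketch of the replica fallback read, assuming one StringRedisTemplate per cache copy (the copy list and its ordering are illustrative):

```java
import java.util.List;
import org.springframework.data.redis.core.StringRedisTemplate;

public class ReplicatedCacheReader {
    // One client per cache copy; in practice each copy lives on a different rack/cluster.
    private final List<StringRedisTemplate> copies;

    public ReplicatedCacheReader(List<StringRedisTemplate> copies) {
        this.copies = copies;
    }

    public String get(String key) {
        for (StringRedisTemplate copy : copies) {
            try {
                String value = copy.opsForValue().get(key);
                if (value != null) {
                    return value; // first healthy copy that has the key wins
                }
            } catch (Exception e) {
                // this copy is unreachable: try the next one instead of failing the request
            }
        }
        return null; // all copies missed or failed; the caller falls back to the DB
    }
}
```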
4. Cache hotspots
During breaking events, a huge number of users access the same piece of hot information at once. The cache node holding that hot key is prone to overload and lag, or even to crash. This is called a cache hotspot.
Sina Weibo runs into this all the time: a big-V celebrity's affair, marriage, or divorce instantly draws hundreds of millions of viewers, all accessing the same key. Traffic concentrates on one cache node, easily hitting the limits of its network card, bandwidth, and CPU, and ultimately making the cache unavailable.
Solutions:

- Solution 1: Find the hot keys first. For example, use Spark real-time stream analysis to detect newly emerging hot keys promptly.
- Solution 2: Spread the concentrated traffic so no single cache node overloads. Since there is only one key, append an ordered number to it, such as key#01, key#02 ... key#10, and place these derived keys on multiple cache nodes. On each request, the client randomly accesses one copy. (See the sketch after this list.)
- Solution 3: Build a cache management console that monitors the cache's SLA in real time, backed by a distributed configuration center, so hot keys can be scaled out quickly and dynamically.
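A minimal sketch of the hot-key splitting in Solution 2, assuming StringRedisTemplate (the copy count and key format are illustrative):

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import org.springframework.data.redis.core.StringRedisTemplate;

public class HotKeyCache {
    private static final int COPIES = 10; // key#01 .. key#10

    private final StringRedisTemplate redis;

    public HotKeyCache(StringRedisTemplate redis) {
        this.redis = redis;
    }

    /** Writes go to every numbered copy so all of them hold the same value. */
    public void put(String key, String value, Duration ttl) {
        for (int i = 1; i <= COPIES; i++) {
            redis.opsForValue().set(copyKey(key, i), value, ttl);
        }
    }

    /** Each read picks one copy at random, spreading traffic across cache nodes. */
    public String get(String key) {
        int i = ThreadLocalRandom.current().nextInt(COPIES) + 1;
        return redis.opsForValue().get(copyKey(key, i));
    }

    private String copyKey(String key, int i) {
        return String.format("%s#%02d", key, i); // e.g. hot:topic#07
    }
}
```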
5. Cache big keys
When the value of a key is too large, reads, writes, and loads time out easily and can congest the network. Moreover, if the cached object has many fields, a change to any one of them rewrites the whole cached value, causing frequent reads and writes and slow queries. And when such a key expires, re-warming the data also takes a long time.
So when designing a cache, pay attention to granularity. Too coarse, and you easily get network congestion; too fine, and each request needs several lookups, driving the query frequency up.
Solutions:

- Solution 1: Set a threshold; when the length of the value exceeds it, compress the content to shrink the kv size. (A compression sketch follows this list.)
- Solution 2: Evaluate the proportion of large keys. Many frameworks use pooling techniques; Memcache, for example, can pre-allocate space for large objects, which real business requests then use directly.
- Solution 3: Split by granularity: break the large key into several small keys maintained independently; the cost of each change drops accordingly.
- Solution 4: Give large keys a reasonable expiration time, and try not to let them be evicted.
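A minimal sketch of Solution 1's threshold compression using the JDK's GZIP support (the ~10 KB threshold is an assumed value; a real implementation would also tag compressed values so readers know to decompress):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ValueCompressor {
    private static final int THRESHOLD_BYTES = 10 * 1024; // assumed threshold

    /** Gzip the value once it crosses the threshold, shrinking the kv size before caching. */
    public static byte[] encode(String value) throws IOException {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        if (raw.length < THRESHOLD_BYTES) {
            return raw; // small values are stored as-is
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        }
        return out.toByteArray(); // store with a flag so readers know to decompress
    }
}
```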
6. Cache data consistency
Caching is used for acceleration, not persistence, so a piece of data usually lives in both the DB and the cache. That raises the question of how to keep the two consistent. In addition, the multiple replicas introduced for cache hotspots can also drift out of sync.
Solutions:

- Solution 1: If the cache update fails, retry. If the retries keep failing, write the failed key to an MQ message queue and use an asynchronous task to compensate the cache, ensuring eventual consistency. (A sketch follows this list.)
- Solution 2: Set a short expiration time, so that after the cache expires, it self-heals by reloading the latest data.
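A minimal sketch of Solution 1, with a hypothetical MessageQueue interface standing in for your real producer (Kafka, RocketMQ, etc.); the retry count and topic name are illustrative:

```java
import org.springframework.data.redis.core.StringRedisTemplate;

public class CacheUpdater {
    private static final int MAX_RETRIES = 3; // assumed retry budget

    /** Hypothetical producer interface; back it with your actual MQ client. */
    interface MessageQueue {
        void send(String topic, String payload);
    }

    private final StringRedisTemplate redis;
    private final MessageQueue mq;

    public CacheUpdater(StringRedisTemplate redis, MessageQueue mq) {
        this.redis = redis;
        this.mq = mq;
    }

    /** After the DB write succeeds, refresh the cache; hand off to MQ if retries run out. */
    public void refresh(String key, String newValue) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                redis.opsForValue().set(key, newValue);
                return;
            } catch (Exception e) {
                // transient cache error: loop and retry
            }
        }
        // All retries failed: an async consumer replays this key to repair the cache later.
        mq.send("cache-compensation", key);
    }
}
```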
7. Concurrency competition during preheating
Once a cached entry expires or is deleted for some reason, that key is empty in the cache. A large number of concurrent threads requesting the same key will then all query the database together, and DB pressure spikes.
If the request volume is very large and every request lands on the database, it may be overwhelmed, making the entire system unavailable.
Solutions:

- Solution 1: Introduce a global (distributed) lock. On a cache miss, first try to acquire the lock; only the holder is eligible to query the DB and preheat the data into the cache. However many requests the other clients fire, they can only wait, since they cannot get the lock; once the data is preheated, they read it from the cache. One point deserves special attention: because of the timing gap between concurrent threads, the lock holder must check the cache a second time before loading, to prevent repeatedly overwriting an already-preheated value. (See the sketch after this list.)
- Solution 2: Create multiple backups of the cached data. When one backup expires, the others can still be accessed.
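A minimal sketch of Solution 1's lock-then-double-check flow, assuming StringRedisTemplate; setIfAbsent maps to Redis SET NX EX, and the lock TTL, wait time, and queryDb helper are illustrative:

```java
import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;

public class CacheLoader {
    private final StringRedisTemplate redis;

    public CacheLoader(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public String load(String key) throws InterruptedException {
        String value = redis.opsForValue().get(key);
        if (value != null) {
            return value;
        }
        String lockKey = "lock:" + key;
        // setIfAbsent maps to Redis SET NX EX: only one client cluster-wide wins the lock.
        Boolean locked = redis.opsForValue().setIfAbsent(lockKey, "1", Duration.ofSeconds(10));
        if (Boolean.TRUE.equals(locked)) {
            try {
                // Second check: another thread may have preheated the key during the race window.
                value = redis.opsForValue().get(key);
                if (value == null) {
                    value = queryDb(key); // hypothetical DB lookup
                    redis.opsForValue().set(key, value, Duration.ofMinutes(30));
                }
                return value;
            } finally {
                redis.delete(lockKey); // release the lock
            }
        }
        // Lost the race: wait briefly for the winner to preheat, then try again.
        // (In production, bound the number of retries.)
        Thread.sleep(50);
        return load(key);
    }

    private String queryDb(String key) {
        return "db-value"; // placeholder for the real database query
    }
}
```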
Final thoughts
Cache design has many tricks, and the optimization techniques vary, but the core goal stays the same: serve as many accesses as possible from the cache while keeping the data consistent.