Preface

When building consumer-facing (C-end) services, it is almost inevitable that we introduce a level 1 cache to take pressure off the database and reduce response times. But every middleware we introduce to solve one problem also brings new problems that need attention, such as how to keep the database and the cache consistent, which was covered in the earlier article on cache consistency in practice. There are other problems too: using Redis as a level 1 cache can lead to hot keys, big keys, and so on. This article discusses the hot key problem and how to solve it reasonably.

Main text

Background

What is a hot key, and what causes it?

Generally speaking, the Redis cache we use is a multi-node cluster. When a key is read or written, its slot is calculated from the hash of the key, and the slot determines which shard (a master with one or more replicas) serves that key-value pair. In practice, however, some services or certain periods of time (for example, flash sales in e-commerce) produce a large number of requests for the same key. All of these requests, which are overwhelmingly reads, land on the same Redis server and put a heavy load on it. Adding new Redis instances to the cluster does not help, because the hash algorithm still sends every request for the same key to the same machine. That machine remains the bottleneck of the system and can even bring down the whole cluster. If the value of the hot key is also large, the network adapter can become the bottleneck as well. This is called the "hot key" problem.
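
To make the "same key, same machine" point concrete, here is a minimal sketch of the key-to-slot mapping. Redis Cluster computes CRC16(key) mod 16384; the class and method names below are only illustrative, and hash tags such as {user} are ignored for simplicity.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: maps a key to a Redis Cluster slot (hash tags are ignored here).
class SlotCalculator {
    static final int SLOT_COUNT = 16384;

    // CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, as used by Redis Cluster.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slotOf(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        // The same key always yields the same slot, so every request for it
        // lands on the shard that owns that slot, however many shards exist.
        System.out.println(slotOf("good_100"));
        System.out.println(slotOf("good_100"));
    }
}
```

Because the mapping depends only on the key, adding shards moves slots around but never spreads one key over several shards.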

Figures 1 and 2 below show key access in a normal Redis Cluster and in a Redis cluster accessed through a proxy layer, respectively.

As mentioned above, hot keys will bring high load pressure to a small number of nodes in the cluster. If not handled correctly, these nodes may break down, which will affect the operation of the whole cache cluster. Therefore, we must find hot keys in time and solve the hot key problem.

1. Hot key detection

Because a Redis cluster is distributed and a hot key has some clearly visible effects, we can approach hot key detection step by step, from coarse to fine.

1.1 QPS monitoring for each slot in the cluster

The most obvious symptom of a hot key is that, even though the QPS of the whole Redis cluster is not that large, the traffic is distributed unevenly across the slots. So the first thing we can think of is to monitor the traffic of each slot and compare the slots with one another; when a hot key appears, this shows which slot is affected. Although this is the most convenient monitoring method, its granularity is too coarse. It suits early-stage cluster monitoring, not scenarios where hot keys must be pinpointed precisely.
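
The comparison between slots can be sketched as follows. This is only an illustration: it assumes the per-slot QPS has already been collected into an array by some monitoring system, and the "factor times the mean" rule is an assumed heuristic, not something prescribed by Redis.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: flags slots whose QPS is far above the cluster average.
class SlotSkewDetector {
    // Returns the indexes of slots whose QPS exceeds `factor` times the mean QPS.
    static List<Integer> hotSlots(long[] qpsPerSlot, double factor) {
        long total = 0;
        for (long qps : qpsPerSlot) {
            total += qps;
        }
        double mean = (double) total / qpsPerSlot.length;
        List<Integer> hot = new ArrayList<>();
        for (int slot = 0; slot < qpsPerSlot.length; slot++) {
            if (qpsPerSlot[slot] > mean * factor) {
                hot.add(slot);
            }
        }
        return hot;
    }
}
```

A flagged slot only says "something in this slot is hot"; it still does not identify which key is responsible.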

1.2 Collecting statistics in the proxy, the single traffic entrance

If we use the proxy mode shown in Figure 2, all requests go to the proxy first and only then to the node that owns the slot, so hot key statistics can be collected in the proxy. The proxy counts accesses per key over a sliding time window and reports the keys whose count exceeds a threshold. To avoid collecting too many redundant statistics, you can also set rules so that only keys with certain prefixes and types are counted. This method requires a proxy layer, so it places a requirement on the Redis architecture.
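
A minimal sketch of per-key sliding-window counting might look like the following. The class name, the prefix rule, and the explicit timestamp parameter are assumptions made for illustration. Storing one timestamp per access is also a simplification; a real proxy would more likely use bucketed counters and a thread-safe structure.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: per-key sliding-window counter (not thread-safe).
class SlidingWindowHotKeyDetector {
    private final long windowMillis;
    private final int threshold;
    private final String prefix;
    private final Map<String, Deque<Long>> hits = new HashMap<>();

    SlidingWindowHotKeyDetector(long windowMillis, int threshold, String prefix) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
        this.prefix = prefix;
    }

    // Records one access at time nowMillis; returns true if the key is now hot.
    boolean record(String key, long nowMillis) {
        if (!key.startsWith(prefix)) {
            return false; // rule: only keys with the configured prefix are counted
        }
        Deque<Long> times = hits.computeIfAbsent(key, k -> new ArrayDeque<>());
        times.addLast(nowMillis);
        // Drop timestamps that have slid out of the window.
        while (!times.isEmpty() && times.peekFirst() <= nowMillis - windowMillis) {
            times.removeFirst();
        }
        return times.size() > threshold;
    }
}
```

Passing the timestamp in explicitly keeps the sketch deterministic; production code would normally read the clock itself.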

1.3 Redis hot key discovery mechanism based on LFU

Redis 4.0 and later support an LFU-based hot key discovery mechanism on each node: run redis-cli with the --hotkeys option. This requires the node's maxmemory-policy to be an LFU policy (allkeys-lfu or volatile-lfu). You can run the command periodically on a node to discover its hot keys.

As you can see below, the redis-cli --hotkeys command takes a long time to execute. You can schedule it to run periodically.

1.4 Detection based on Redis client

Because every Redis command is issued from a client, we can also collect statistics inside the Redis client code. Each client counts key accesses over a sliding time window. Once a key exceeds a certain threshold, the client reports it to a server, which then pushes the hot key to all clients uniformly, together with an expiration time.

This approach may look more elegant, but it is not suitable in every scenario, because the change on the client side adds memory overhead to the running process. More concretely, in languages with automatic memory management such as Java and Go, objects are created more frequently, which triggers GC and makes interface response times longer and harder to predict.

In the end, each company can choose among these options according to its own infrastructure.

2. Hot key solution

Once we have detected the hot key or slot in one of the ways above, we need to solve the problem. There are several ways to deal with hot keys, so let's go through them one by one.

2.1 Traffic Limiting for Specific Keys or Slots

The simplest and crudest method is to limit traffic for specific slots or hot keys. This obviously hurts the business, so it is recommended only as a stop-loss measure when a problem is already occurring online.
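
As one possible shape for such a stop-loss switch, the sketch below applies a fixed-window limit only to keys that have been registered as hot. The class and method names are hypothetical, and a fixed window is just one of several limiting algorithms that could be used here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: fixed-window rate limiter applied only to registered hot keys.
class HotKeyRateLimiter {
    private static final class Window {
        long start;
        int count;
    }

    private final int limitPerWindow;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    HotKeyRateLimiter(int limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    // Call once a key has been detected as hot.
    void limit(String hotKey) {
        windows.putIfAbsent(hotKey, new Window());
    }

    // Returns true if the request may go to Redis, false if it should be rejected.
    boolean tryAcquire(String key, long nowMillis) {
        Window w = windows.get(key);
        if (w == null) {
            return true; // not a registered hot key: never limited
        }
        synchronized (w) {
            if (nowMillis - w.start >= windowMillis) {
                w.start = nowMillis; // start a new window
                w.count = 0;
            }
            if (w.count >= limitPerWindow) {
                return false;
            }
            w.count++;
            return true;
        }
    }
}
```

Requests that are rejected here never reach Redis, which is exactly why this approach protects the cluster at the cost of the business.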

2.2 Using level 2 (local) caching

A local cache is the most common solution: since our level 1 cache cannot handle this much pressure, we add a level 2 cache. Because every request is issued by a service, this level of cache is best added on the service side. When the service obtains the value of a hot key, it keeps a copy in its local cache and only requests it from Redis again after the local copy expires, which reduces the pressure on the Redis cluster. In Java, Guava Cache is an off-the-shelf tool for this. For example:

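    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import java.util.List;
    import java.util.concurrent.TimeUnit;

    private static LoadingCache<String, List<Object>> configCache = CacheBuilder.newBuilder()
            .concurrencyLevel(8)                     // the number of CPU cores is recommended
            .expireAfterWrite(10, TimeUnit.SECONDS)  // how long data lives after being written
            .initialCapacity(10)                     // initial size of the cache
            .maximumSize(10)                         // maximum size of the cache
            .recordStats()
            // build() can take a CacheLoader, which loads the value on a cache miss
            .build(new CacheLoader<String, List<Object>>() {
                @Override
                public List<Object> load(String hotKey) throws Exception {
                    // Left empty in the original: fetch the value of hotKey from Redis here.
                    throw new UnsupportedOperationException("load " + hotKey + " from Redis");
                }
            });

    // Get the value (LoadingCache.get throws ExecutionException if loading fails)
    List<Object> result = configCache.get(key);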

The biggest drawback of a local cache is data inconsistency: the expiration time we set is the longest period for which online data can be inconsistent. This expiration time has to be chosen by weighing the pressure on our own cluster against the maximum inconsistency window the business can accept.

2.3 Splitting the key

How can we avoid the hot key problem while keeping data as consistent as possible? Splitting the key is also a good solution.

When we put data into the cache, we split the cache key of the business object into several different keys. As shown in the figure below, the key is first split into N copies on the side that updates the cache. For example, a key named "good_100" can be split into four copies: "good_100_copy1", "good_100_copy2", "good_100_copy3", and "good_100_copy4". Every update or insert must then change all N keys. This step is the key split.

On the service side, we need a way to spread reads evenly across the copies, that is, to decide which suffix to append to the hot key we are about to access. There are several options. One is to hash the IP or MAC address of the local machine and take the result modulo the number of copies; this determines which suffix is appended and therefore which machine the request is sent to. Another is to generate a random number at service startup and take it modulo the number of copies.
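
The split and the suffix choice could be sketched like this. The class name and the use of String.hashCode on an instance identifier are assumptions for illustration; the key names follow the "good_100_copyN" example above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: splits one hot key into N copies and picks one copy per service instance.
class HotKeySplitter {
    private final int copies;

    HotKeySplitter(int copies) {
        this.copies = copies;
    }

    // Write side: every update must be applied to all of these keys.
    List<String> allCopies(String key) {
        List<String> keys = new ArrayList<>();
        for (int i = 1; i <= copies; i++) {
            keys.add(key + "_copy" + i);
        }
        return keys;
    }

    // Read side: hash a stable instance identifier (for example the local IP)
    // and take it modulo the number of copies to choose the suffix.
    String copyFor(String key, String instanceId) {
        int index = Math.floorMod(instanceId.hashCode(), copies);
        return key + "_copy" + (index + 1);
    }
}
```

Math.floorMod is used because hashCode can be negative. Note that the copies spread the load only to the extent that their names hash to slots on different shards.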

2.4 Another approach: local caching in the style of a configuration center

For readers familiar with microservice configuration centers, the problem can be reframed as the consistency problem that a configuration center solves. How does Nacos, for example, keep distributed configuration consistent and respond to changes quickly? We can treat cached data like configuration and handle it the same way.

The mechanism is long polling plus a local copy of the configuration. All configuration is loaded when the service starts, and then long polls are issued periodically to check whether any configuration the service listens to has changed. If there is a change, the long poll request returns immediately and the local configuration is updated. If nothing has changed, all business code keeps using the configuration cached in local memory. This keeps the distributed cached configuration timely and consistent.
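
The long-polling idea can be illustrated with a small in-process simulation. This is not how Nacos is implemented; both classes below are hypothetical stand-ins, and the "server" is a plain object rather than a remote service.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch: in-process stand-in for a center that supports long polling.
class LongPollServer {
    private String value;
    private long version;

    LongPollServer(String initial) {
        this.value = initial;
    }

    synchronized void update(String newValue) {
        value = newValue;
        version++;
        notifyAll(); // pending long polls return immediately
    }

    synchronized long version() {
        return version;
    }

    synchronized String value() {
        return value;
    }

    // Blocks until the version differs from knownVersion or the timeout elapses.
    synchronized long awaitChange(long knownVersion, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (version == knownVersion) {
            long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remaining <= 0) {
                break;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return version;
    }
}

// Illustrative sketch: business code reads only the local copy.
class LocalCacheClient {
    private final LongPollServer server;
    private volatile String localValue;
    private long localVersion;

    LocalCacheClient(LongPollServer server) {
        this.server = server;
        this.localVersion = server.version(); // load everything at startup
        this.localValue = server.value();
    }

    String get() {
        return localValue;
    }

    // One long-poll round; returns true if the local copy was refreshed.
    boolean pollOnce(long timeoutMillis) {
        long current = server.awaitChange(localVersion, timeoutMillis);
        if (current == localVersion) {
            return false; // timed out without a change: keep using the local copy
        }
        localValue = server.value();
        localVersion = current;
        return true;
    }
}
```

In a real service, pollOnce would run in a loop on a background thread, while request-handling code only ever calls get().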

2.5 Other plans that can be made in advance

Each of the solutions above addresses the hot key problem more or less in isolation. When we face a real business requirement, we usually have plenty of time to design an overall solution. For hot keys caused by extreme flash sale scenarios, if the budget allows, we can directly isolate the services and the Redis cache cluster so that normal business is not affected, and prepare better disaster recovery and traffic limiting measures for the duration of the event.

Some integrated solutions

There are already a number of fairly complete application-level solutions for hot keys. Among them, JD has open-sourced a hotKey tool. Its principle is to observe key accesses on the client side and report candidate hot keys; once the server side detects a hot key, it pushes that key to the corresponding service instances for local caching. In addition, the local cache is updated synchronously after the corresponding remote key is updated. It has become a relatively mature solution (JD Retail's hot key framework) for automatic hot key detection and distributed consistent caching.

Conclusion

The above are the hot key solutions that the author knows of or has practiced, covering the two key problems of discovering hot keys and resolving them. Each scheme has its advantages and disadvantages, such as data inconsistency for the business or difficulty of implementation. You can adjust and adapt them according to the characteristics of your own business and your company's current infrastructure.

Tomorrow will be the first working day of the Year of the Tiger in the Lunar calendar. I wish you all good health, promotion and salary increase this year.