In Redis, we often encounter big keys and hot keys. If they are not discovered and handled in time, service performance degrades, user experience suffers, and serious faults can occur.

Source: Alibaba Tech official account

One Preface

When using Redis, we often encounter bigkeys (hereafter "big keys") and hotkeys (hereafter "hot keys"). If big keys and hot keys are not discovered and handled in time, service performance degrades, user experience suffers, and serious faults can follow.

Two Definitions of big keys and hot keys

Definitions of big keys and hot keys can be found in many companies' internal Redis development guidelines and in the large number of Redis best-practice articles online. Although the exact thresholds differ from one source to another, they are clearly judged along the same dimensions: big keys are judged by data size and number of members, while hot keys are judged by how frequently and how often they are requested.

1 What is a big Key

A Key whose value is very large, or that contains a very large number of members or list elements, is usually called a big Key. Here are several concrete examples that illustrate the characteristics of a big Key:

  • A STRING Key whose value is 5 MB (the data itself is too large)
  • A LIST Key with 20,000 elements (too many list elements)
  • A ZSET Key with 10,000 members (too many members)
  • A HASH Key with only 1,000 members whose values total 100 MB in size (the members are too large)

Note that the specific sizes, member counts, and element counts above are given only for ease of understanding. To avoid being misled, in real business you should judge whether a Key is a big Key based on the actual application and business scenarios of your Redis deployment.

2 What is a hot Key

A Key that is accessed significantly more often than the other keys in an instance is called a hot Key. Common examples of hot keys are:

  • A Redis instance receives 10,000 requests per second in total, and a single Key receives 7,000 of them (its request rate is significantly higher than that of the other keys).
  • A large number of HGETALL requests are sent per second to a HASH Key with thousands of members and a total size of 1 MB (its bandwidth usage is significantly higher than that of the other keys).
  • A large number of ZRANGE requests are sent per second to a ZSET Key with tens of thousands of members (its CPU usage is significantly higher than that of the other keys).

Three Problems caused by big keys and hot keys

Big keys and hot keys cause a variety of problems in Redis; the most common are performance degradation, access timeouts, and data imbalance.

1 Common problems caused by big keys

  • Clients observe that Redis has become slow.
  • Redis memory keeps growing until the process runs out of memory (OOM), or memory reaches the maxmemory limit, causing writes to be blocked or important keys to be evicted.
  • The memory usage of one node in a Redis Cluster far exceeds that of the other nodes, and it cannot be rebalanced because the minimum granularity of data migration in Redis Cluster is a Key.
  • Read requests for big keys consume a large share of the server's network bandwidth, slowing Redis itself down and affecting other services on the same server.
  • Deleting a big Key blocks the master for a long time, which can interrupt master-replica synchronization or even trigger a failover.

2 Common problems caused by hot keys

  • Hot keys consume a large share of Redis CPU time, degrading performance and affecting other requests.
  • In a Redis Cluster, traffic is skewed toward the node holding the hot Key: one shard is heavily loaded while the others sit idle, creating a read/write hotspot and preventing clients from benefiting from the cluster's distributed design.
  • During flash-sale or seckill promotions, requests to the corresponding inventory Key exceed what Redis can handle, resulting in overselling.
  • When requests for a hot Key exceed Redis's capacity, a cache breakdown occurs: a flood of requests hits the backend storage directly, potentially bringing it down and affecting other services that depend on it.

Four Common causes of big keys and hot keys

Big keys and hot keys are usually produced by insufficient business planning, incorrect use of Redis, accumulation of stale data, or sudden spikes in traffic. For example:

  1. Using Redis in scenarios it is not suited for, so that a Key's value becomes too large, for example storing large binary files in STRING keys (big Key);
  2. Insufficient planning and design before a service goes live, so that a Key's members are not split properly, resulting in an excessive number of members (big Key);
  3. Not cleaning up stale data regularly, so that, for example, the members of a HASH Key keep accumulating (big Key);
  4. Unexpected traffic spikes, such as a product suddenly becoming popular, breaking news attracting surging traffic, a flood of comments and likes during a top streamer's live broadcast, or a large-scale battle between guilds in a game region involving many players (hot Key);
  5. A bug in the consumer-side code for a LIST Key, so that elements keep being added to the Key but are never removed (big Key).

Five Finding big keys and hot keys in Redis

Analyzing big keys and hot keys is not difficult. There are many ways to examine the keys in Redis and locate the "problem" keys, such as Redis's built-in commands, open source tools, and the Key analysis feature in the Redis console of Alibaba Cloud.

1 Use built-in Redis commands to discover big keys and hot keys

Redis has built-in commands and tools that help us find these problem keys. If you already have specific keys in mind for big Key or hot Key analysis, you can analyze them with the following commands.

Analyze a target Key with built-in Redis commands

You can use the DEBUG OBJECT command to analyze a Key. It takes the Key name as its argument and returns a large amount of information, of which the serializedlength field is the serialized length of the Key. You can use this value to judge whether the Key meets your criteria for a big Key.

Note that a Key's serialized length is not the same as its actual memory footprint. Also, DEBUG OBJECT is a debugging command that is expensive to run; while it runs, all other requests to Redis are blocked, and its running time grows with the serialized length of the Key. It is therefore not recommended for analyzing big keys in a production environment, where it can cause outages.
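If you do want to inspect serializedlength on a test instance, a minimal sketch with the redis-py client looks like this (the key name is an illustrative assumption; avoid running this against production for the reasons above):

```python
import redis

# DEBUG OBJECT blocks Redis while it runs, so use it only on a test instance.
r = redis.Redis(host="localhost", port=6379)

info = r.debug_object("my_big_hash")      # "my_big_hash" is a placeholder key
print(info["serializedlength"])           # serialized length in bytes, not RAM usage
```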

Since version 4.0, Redis has provided the MEMORY USAGE command for analyzing the memory footprint of a Key. It is much cheaper to run than DEBUG OBJECT, but its time complexity is still O(N), so it can still block Redis when analyzing a big Key.
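A minimal sketch with redis-py (the key name is an illustrative assumption):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Approximate memory footprint in bytes; by default only a sample of members is inspected.
print(r.memory_usage("my_big_hash"))

# SAMPLES 0 inspects every member: more accurate, but O(N) and potentially blocking on a big key.
print(r.memory_usage("my_big_hash", samples=0))
```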

We suggest a less risky way of analyzing keys. Redis provides a command for each data structure that returns its length or number of members, all with O(1) complexity, as listed below:

  • STRING: STRLEN returns the length of the value in bytes
  • LIST: LLEN returns the number of elements
  • HASH: HLEN returns the number of fields
  • SET: SCARD returns the number of members
  • ZSET: ZCARD returns the number of members

With these built-in commands, keys can be analyzed conveniently and safely, without affecting the online service. However, the values they return are not the keys' real memory usage, so they are not precise and should be treated only as a reference.
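For a known target Key, each of these checks is a single cheap call; a minimal redis-py sketch (the key names are illustrative assumptions):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

print(r.strlen("page_cache:home"))   # STRING: value length in bytes
print(r.llen("task_queue"))          # LIST: number of elements
print(r.hlen("user_profile"))        # HASH: number of fields
print(r.scard("online_users"))       # SET: number of members
print(r.zcard("leaderboard"))        # ZSET: number of members
```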

Use the --bigkeys option of redis-cli to discover big keys

If you do not have a specific target Key and simply want a tool to find the big keys in an entire Redis instance, the --bigkeys option of redis-cli can do this for you.

With --bigkeys, redis-cli traverses every Key in the instance and produces a summary report. The advantages of this approach are convenience and safety; the obvious disadvantage is that the results cannot be customized.

--bigkeys can only report the largest Key of each Redis data type. If you want to analyze only STRING keys, or to find every HASH Key with more than 10 members, --bigkeys cannot help.

There are open source projects on GitHub that implement enhanced versions of --bigkeys with configurable output, and you can also build your own instance-level big Key analysis tool using SCAN + TYPE together with the commands listed above, as sketched below.
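A rough sketch of such a do-it-yourself scanner (the thresholds and connection details are illustrative assumptions, not recommendations):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# One O(1) probe per data type, as in the list above.
PROBES = {
    "string": r.strlen,
    "list": r.llen,
    "hash": r.hlen,
    "set": r.scard,
    "zset": r.zcard,
}
# Bytes for strings, member counts for collections -- tune these to your own criteria.
THRESHOLDS = {"string": 10 * 1024, "list": 5000, "hash": 5000, "set": 5000, "zset": 5000}

for key in r.scan_iter(count=100):        # SCAN walks the keyspace incrementally, without blocking
    key_type = r.type(key)
    probe = PROBES.get(key_type)
    if probe is None:
        continue                          # skip streams and other types not covered here
    size = probe(key)
    if size > THRESHOLDS[key_type]:
        print(f"possible big key: {key} ({key_type}, length/members={size})")
```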

As before, the way this approach works and the values it returns mean that it is neither precise nor real-time, so it should be treated as a reference only.

Use the --hotkeys option of redis-cli to discover hot keys

Since Redis 4.0, redis-cli offers the --hotkeys option for instance-level hot Key analysis, which reports how frequently keys are accessed based on their LFU counters. Like --bigkeys, its report cannot be customized, and it requires the maxmemory-policy of redis-server to be set to one of the LFU policies (allkeys-lfu or volatile-lfu).
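If the instance is already running an LFU policy, you can also read each key's LFU counter yourself with OBJECT FREQ; a hedged sketch (sent as a raw command via redis-py; the scan batch size and top-10 cutoff are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# OBJECT FREQ returns the key's LFU access-frequency counter; it errors out
# unless maxmemory-policy is an LFU policy (e.g. allkeys-lfu).
freqs = {key: r.execute_command("OBJECT", "FREQ", key) for key in r.scan_iter(count=100)}

for key, freq in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(key, freq)
```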

Locate hot keys at the business layer

Every Redis access originates in the business layer, so we can add code there to record each access and aggregate the records asynchronously. The advantage of this approach is that it detects hot keys accurately and promptly; the disadvantages are added complexity in the business code and a small performance cost.
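A hedged sketch of such instrumentation (the wrapper class, the ten-second flush window, and the GET-only coverage are all illustrative assumptions; a real implementation would ship the counters to a metrics system):

```python
import threading
from collections import Counter

import redis

class CountingRedis:
    """Wrap a redis-py client, count accesses per key, and flush the counts asynchronously."""

    def __init__(self, client: redis.Redis, flush_interval: float = 10.0):
        self._client = client
        self._counter = Counter()
        self._lock = threading.Lock()
        self._schedule_flush(flush_interval)

    def get(self, key):
        with self._lock:
            self._counter[key] += 1          # cheap in-memory accounting on the hot path
        return self._client.get(key)

    def _schedule_flush(self, interval):
        timer = threading.Timer(interval, self._flush, args=(interval,))
        timer.daemon = True
        timer.start()

    def _flush(self, interval):
        with self._lock:
            top = self._counter.most_common(10)
            self._counter.clear()
        print("hottest keys in the last window:", top)   # replace with your metrics pipeline
        self._schedule_flush(interval)

r = CountingRedis(redis.Redis(host="localhost", port=6379))
```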

Use the MONITOR command to find hot keys in an emergency

Redis's MONITOR command faithfully prints every request processed by Redis, including the timestamp, client information, command, and Key. In an emergency, you can run MONITOR briefly, redirect its output to a file, and, after stopping it, classify and count the requests in the file to find the hot keys.

MONITOR consumes CPU, memory, and network resources on the Redis server, so for an instance that is already under heavy pressure it may make matters worse. Also, this collect-then-analyze approach has poor timeliness, and because its accuracy depends on how long MONITOR runs, it is usually not accurate enough in production, where the command cannot be left running for long.
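A small sketch of the offline counting step, assuming the output of `redis-cli monitor` was saved to a file named monitor.log (the file name and the simple line format assumed here are illustrative):

```python
import re
from collections import Counter

# A typical MONITOR line looks like:
# 1663047352.517 [0 127.0.0.1:50212] "GET" "foo"
LINE = re.compile(r'\[.*?\]\s+"(?P<cmd>[^"]+)"(?:\s+"(?P<key>[^"]*)")?')

counter = Counter()
with open("monitor.log") as log:
    for line in log:
        m = LINE.search(line)
        if m and m.group("key"):
            counter[m.group("key")] += 1     # the first argument after the command is usually the key

for key, hits in counter.most_common(10):
    print(key, hits)
```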

2 Use open source tools to discover big keys

Redis's popularity means there are plenty of open source tools that address our current challenge: accurate analysis without affecting online services.

Use the redis-rdb-tools tool to find big keys with custom criteria

If you want to accurately analyze the real memory usage of all keys in a Redis instance according to your own criteria, without affecting online services, and get a concise, easy-to-understand report at the end, redis-rdb-tools is a great choice.

The tool performs customizable analysis of Redis RDB files. Because it works on the RDB file offline, it has no impact on online services; this is both its biggest advantage and its biggest drawback, since offline analysis means poor timeliness, and analyzing a large RDB file can take a long time.
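redis-rdb-tools can export a per-key memory report as CSV (for example with `rdb -c memory dump.rdb -f memory.csv`); a hedged sketch of post-processing that report with your own criteria (the 10 MB and 5,000-member thresholds are illustrative assumptions):

```python
import csv

SIZE_LIMIT = 10 * 1024 * 1024   # bytes
MEMBER_LIMIT = 5000             # elements/fields/members

with open("memory.csv") as f:
    for row in csv.DictReader(f):
        size = int(row["size_in_bytes"])
        members = int(row["num_elements"])
        if size > SIZE_LIMIT or members > MEMBER_LIMIT:
            print(row["key"], row["type"], size, members)
```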

3 Rely on a public cloud's Redis analysis service to discover big keys and hot keys

If you want to analyze all keys in a Redis instance in real time, find the big keys and hot keys that currently exist, and know which big keys and hot keys have appeared over the instance's lifetime, so that you can form a comprehensive and accurate picture of the instance's running state, the Redis console of a public cloud can meet this requirement.

CloudDBA in the Redis console of Alibaba Cloud

CloudDBA is Alibaba Cloud's intelligent database service system, and it supports real-time analysis and discovery of big keys and hot keys in Redis.

Under the hood, big Key and hot Key analysis is implemented by the Key analysis feature of Alibaba Cloud's Redis kernel, which discovers and reports information about big keys and hot keys directly from the kernel. Its results are therefore accurate and efficient, with almost no impact on performance. You can access this feature by clicking "Key Analysis" in CloudDBA, as shown in Figure 1-1:

Figure 1-1: Alibaba Cloud Redis console CloudDBA

The Key analysis feature has two pages that allow analysis of keys in the corresponding Redis instance at different time dimensions:

  • Real-time: Analyzes the current instance immediately and displays all existing big keys and hot keys.
  • History: displays the big keys and hot keys that have appeared in the instance recently. All big keys and hot keys that have ever appeared are recorded, even if they no longer exist. This gives a good picture of the instance's historical Key status and helps trace issues whose scene has already passed or been lost.

Six Handling big keys and hot keys

Now that we have found the problem keys in Redis through various means, we should start working on them immediately to prevent them from causing problems later.

1 Common ways to handle big keys

Split the big Key

For example, a HASH Key with tens of thousands of members can be split into multiple HASH keys so that each Key's member count stays within a reasonable range. In a Redis Cluster, splitting big keys in this way also plays a significant role in balancing memory across nodes.
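A minimal sketch of bucket-based splitting with redis-py (the bucket count and the user_profile key name are illustrative assumptions):

```python
import zlib

import redis

BUCKETS = 64
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def bucket_key(base: str, field: str) -> str:
    # Route each field to one of BUCKETS smaller hashes by hashing the field name.
    return f"{base}:{zlib.crc32(field.encode()) % BUCKETS}"

def hset_split(base: str, field: str, value: str) -> None:
    r.hset(bucket_key(base, field), field, value)

def hget_split(base: str, field: str):
    return r.hget(bucket_key(base, field), field)

hset_split("user_profile", "uid:10086", "{...}")
print(hget_split("user_profile", "uid:10086"))
```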

Clean up the big Key

Move data that is not a good fit for Redis to another store and delete it from Redis. Since version 4.0, Redis provides the UNLINK command, which reclaims a key's memory gradually in the background instead of blocking. With UNLINK, you can safely delete big or even huge keys.
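For example, with redis-py (the key name is an illustrative assumption):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# DEL frees the memory inline and can block on a big key;
# UNLINK removes the key from the keyspace at once and frees its memory in a background thread.
r.unlink("my_big_hash")
```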

Monitor the Redis memory watermark at all times

A sudden big Key problem can catch us off guard, so finding and handling big keys before they cause trouble is an important way to keep a service stable. Set reasonable memory alarm thresholds in your monitoring system to warn you that a big Key may be forming, for example: Redis memory usage exceeds 70%, or Redis memory grows by more than 20% within an hour.

With this kind of monitoring, problems can be resolved before they happen. For example, if a LIST consumer program fails and the number of elements in the corresponding Key keeps growing, the alarm becomes an early warning and the fault can be avoided.
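A hedged sketch of such a watermark check, as it might run from a monitoring job (the 70% threshold is illustrative; wire the alert into your own alerting system):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

mem = r.info("memory")
used, limit = mem["used_memory"], mem["maxmemory"]

# maxmemory may be 0 (unlimited); in that case compare against the host's RAM instead.
if limit and used / limit > 0.70:
    print(f"ALERT: Redis memory at {used / limit:.0%} of maxmemory")
```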

Clean up stale data periodically

For example, if data keeps being appended to a HASH Key incrementally while its freshness is ignored, the accumulated stale data will eventually produce a big Key. Such stale data can be cleaned up by a scheduled task; in this scenario it is recommended to combine HSCAN with HDEL so that the cleanup does not block Redis.
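A small sketch of such a cleanup task (it assumes each field's value is JSON carrying an expire_at timestamp; the key name and value format are illustrative assumptions):

```python
import json
import time

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
now = time.time()

expired = []
for field, value in r.hscan_iter("user_sessions", count=200):   # incremental, does not block Redis
    if json.loads(value).get("expire_at", float("inf")) < now:
        expired.append(field)
    if len(expired) >= 100:                                      # delete in small batches
        r.hdel("user_sessions", *expired)
        expired.clear()

if expired:
    r.hdel("user_sessions", *expired)
```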

Use Alibaba Cloud's Tair (Redis Enterprise Edition) service to avoid having to clean up stale data yourself

If you have a great many HASH keys, each with many expired members that need cleaning, the combination of so many keys and so much stale data means scheduled tasks cannot clear it in time. Alibaba Cloud's Tair service solves this kind of problem well.

Tair is Alibaba Cloud's enterprise edition of Redis. It offers all of Redis's features, including its high performance, plus a large number of additional advanced capabilities.

TairHash is a hash-type data structure that can set expiration times and versions on individual fields. It offers the same rich interface and high performance as the native Redis Hash, but removes the restriction that expiration can only be set on a whole Key: with TairHash, expiration times and versions can be set per field. This greatly improves the flexibility of the hash data structure and simplifies business development in many scenarios.

TairHash uses an efficient Active Expire algorithm to detect and delete expired fields with little impact on response time. Using such advanced features well removes a great deal of Redis operations work, troubleshooting, and business-code complexity, letting operations staff focus on more valuable work and developers write more valuable code.
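A hedged illustration of per-field expiration (TairHash commands are module commands, so they are sent as raw commands here; the EXHSET/EXHGET syntax with the EX option follows the open source TairHash documentation, and the key and field names are assumptions):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Write a field that expires on its own after 3600 seconds -- no cleanup task needed.
r.execute_command("EXHSET", "user_sessions", "uid:10086", "{...}", "EX", 3600)

# Read it back; once the TTL passes, TairHash's active expiration removes the field automatically.
print(r.execute_command("EXHGET", "user_sessions", "uid:10086"))
```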

2 Common solutions to hot keys

Copy hot keys in a Redis Cluster

In a Redis Cluster, because the minimum migration granularity is a Key, requests to a hot Key cannot be spread out, so the pressure stays concentrated on a single node. In this case you can make copies of the hot Key that land on other nodes. For example, for a hot Key foo you can create three identical copies named foo2, foo3, and foo4, let them live on other nodes, and have clients read from the copies at random to relieve the hot Key pressure on the original node.
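A hedged sketch of how a client might spread reads across such copies (the copy names follow the foo example above; keys that hash to different slots will generally land on different nodes):

```python
import random

from redis.cluster import RedisCluster

rc = RedisCluster(host="localhost", port=7000, decode_responses=True)
COPIES = ["foo", "foo2", "foo3", "foo4"]

def read_hot_key():
    # Each read picks one copy at random, spreading the load across the owning nodes.
    return rc.get(random.choice(COPIES))

def write_hot_key(value):
    # Every copy must be updated, which is exactly the consistency burden noted below.
    for key in COPIES:
        rc.set(key, value)
```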

The drawback of this approach is that the application code must be changed to match, and having multiple copies brings data consistency challenges: updating one Key turns into updating several keys at once. In most cases this is recommended only as a temporary fix for an urgent problem.

Use the read-write separation architecture

If the hot Key traffic consists of read requests, read/write separation is a good solution: you can keep adding replica nodes to relieve the read pressure on each Redis node, for example by routing reads to replicas as sketched below.
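A hedged sketch of client-side read routing (the host names are illustrative; in practice the replica endpoint is usually a proxy or VIP in front of several replicas):

```python
import redis

master = redis.Redis(host="redis-master.internal", port=6379)
replica = redis.Redis(host="redis-replica.internal", port=6379)

def set_value(key, value):
    master.set(key, value)

def get_value(key):
    # Replication is asynchronous, so reads from the replica may lag slightly behind the master.
    return replica.get(key)
```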

However, read/write separation increases the complexity of both the business code and the Redis architecture: we need a forwarding layer (a Proxy, LVS, and so on) in front of the replicas to provide load balancing, and we must account for the higher failure rate that comes with a much larger number of replica nodes. This change in architecture makes monitoring, operations, and fault handling considerably more challenging.

All of this, however, is very simple with the Alibaba Cloud Redis service, which works out of the box. As the business evolves, the service lets users adjust the architecture through configuration changes to cope easily, for example: from master-replica to read/write separation, from read/write separation to cluster, from master-replica to a cluster that supports read/write separation, or from the Redis community edition to Tair (the enterprise edition) with its many advanced features.

Read/write separation also has drawbacks. Under heavy request volume, replication inevitably lags, so stale data may be read. It is therefore unsuitable for scenarios that combine heavy read/write pressure with strict data consistency requirements.

Use the QueryCache feature of Alibaba Cloud Tair

QueryCache is one of the enterprise-level features of the Alibaba Cloud Tair (Redis Enterprise Edition) service. Its principle is shown in Figure 2-1:

Figure 2-1: How Tair QueryCache works

Alibaba Cloud's Redis service identifies hot keys in an instance using efficient sorting and statistics algorithms. With QueryCache enabled, the Proxy nodes cache hot Key requests and their query results according to the configured rules (only the query results of hot keys are cached, not the whole Key). When the same request arrives within the cache validity period, the Proxy returns the result to the client directly, without interacting with the backend Redis shard. Besides speeding up reads of the hot Key, this reduces the load on the data shard and avoids skewed requests.

In other words, identical client requests no longer need to reach the Redis behind the Proxy: the Proxy returns the data directly, and the traffic aimed at the hot Key is shifted from a single Redis node onto multiple Proxy nodes, greatly reducing the hot Key pressure on that node. QueryCache also provides a set of commands for viewing and managing the cache, such as the querycache keys command to view all cached hot keys and querycache listall to list all cached requests.

Tair QueryCache's combination of intelligent hot Key detection and caching also reduces the workload of both operations and development teams.

Compared with traditional Redis synchronization middleware, Alibaba Cloud's Redis global distributed cache offers high reliability, high throughput, low latency, and high synchronization accuracy.


This article is original content from Alibaba Cloud and may not be reproduced without permission.