This article takes Redis bigkeys as its theme. It walks through what makes Redis fast, the harm bigkeys cause and why they appear, four ways to find them, and a hands-on simulation, as a way to understand, discuss and learn Redis together.

1. Getting to know Redis and bigkeys

Redis — the darling of the Internet

Redis, an excellent industrial-grade in-memory database, has been the darling of the Internet since its birth. It supports a rich variety of Internet features under enormous QPS (queries per second), and, like Nginx, its name has become a synonym for high performance. Behind every trending search on Weibo, for example, Redis is quietly doing its work. In a sense, how heavily a company uses Redis is a rough proxy for how much traffic it handles.

Memory plays a central role in the von Neumann architecture. Computation and storage trade off against each other: more computation can reduce what has to be stored, and storing results can save repeated computation. Caching builds on this trade-off and has become one of the most important ways to cut computation and improve system performance.

Let’s look at two comparisons.

The first compares CPU, memory and disk performance: memory reads and writes are roughly a thousand times faster than disk, so Redis, which keeps data in memory, can bring a revolutionary improvement in system performance. But as the saying goes, the economic base determines the superstructure, so cost effectiveness still comes first.

Let’s look at the price of storage.

A rough estimate: memory costs about 30 yuan per GB and disk about 0.5 yuan per GB. Set against the thousandfold performance gap, the return on that extra spend is still very attractive.

Redis's storage structure: Redis is a key/value store. Internally it indexes data with a hash table, so looking up a key is O(1). Keys are strings, while values can be a string, list, hash, set or zset. All of this data lives in memory, which gives very efficient reads and writes, so Redis is most often used as a cache to improve system performance.
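To make the key/value model concrete, here is a minimal sketch using the redis-py client against a local instance (the key names are invented for illustration), touching each of the value types just listed:

```python
# Minimal sketch, assuming a local Redis and the redis-py client.
# Every key, whatever its value type, is located through the same O(1) hash lookup.
import redis

r = redis.Redis(host="127.0.0.1", port=6379, db=0)

r.set("user:1:token", "abc123")                     # string
r.lpush("queue:orders", "o1", "o2")                 # list
r.hset("user:1:profile", mapping={"name": "bob"})   # hash
r.sadd("tags:redis", "cache", "nosql")              # set
r.zadd("rank:score", {"alice": 99, "bob": 87})      # zset (sorted set)

print(r.type("queue:orders"))   # b'list'
print(r.get("user:1:token"))    # b'abc123'
```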

The underlying storage structure is as follows:

Another design decision that underpins Redis's high performance is its single-threaded task processing.

Faced with a huge workload, the usual instinct is to bring in more hands and split the work into parts that run in parallel. That is the basic idea of multi-threaded concurrency, and it can indeed raise throughput.

So why does Redis go the other way and stay single-threaded? Single-threaded here means that Redis handles network IO and key-value reads and writes in one thread, which is also the main thread through which Redis serves key-value storage to the outside world.

First, what problems does multi-threading bring when handling network IO? A thread is the basic unit of CPU scheduling: the CPU switches time slices on clock interrupts, which looks like parallelism from the outside, but every context switch costs resources. Once too many threads are created, the switching overhead itself becomes a performance bottleneck.

There is also the problem of shared resources, the core difficulty of concurrent programming. When threads compete for a resource, the critical section has to be locked, which turns concurrent processing back into serial execution. Redis indexes its data with a hash table underneath, so with multiple threads, contention on that shared structure would be unavoidable and access would degenerate into serialized, synchronized operations. Hence multi-threading is rejected here.

So what is the advantage of a single thread in this scenario?

A client and server establish a network connection through the TCP three-way handshake. Request data arrives at the network card and is written into the operating system's kernel buffer; when the user program performs a read, the data is copied from kernel space into the program's own variables. If the data has not yet reached kernel space, the read blocks and waits.

To solve this problem, IO multiplexing was created. Linux offers three implementations: select, poll and epoll. In simple terms, the kernel watches many listening sockets and connected sockets at once, and only notifies user space when a socket is actually ready to be read or written. We will not expand on the details here; interested readers can look them up.
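As a rough illustration of the idea (not Redis's actual event loop), the sketch below uses Python's selectors module, which picks epoll on Linux, to serve many connections from a single thread; the port number is arbitrary:

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, select/poll elsewhere

def accept(server_sock):
    conn, _ = server_sock.accept()              # listening socket is ready: accept won't block
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn):
    data = conn.recv(1024)                      # connected socket is ready: recv won't block
    if data:
        conn.send(data)                         # echo back (best effort for this sketch)
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 6399))                # arbitrary demo port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:                                     # one thread serves every connection
    for key, _ in sel.select():                 # blocks until at least one socket is ready
        key.data(key.fileobj)                   # dispatch to accept() or read()
```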

Node.js and Nginx, both gateway-layer technologies, use the same single-threaded design; for network IO, a single thread can outperform many. But single-threading has its Achilles' heel: if one request takes too long, every request behind it is blocked. This is the root of the damage a bigkey does, which we will explore later. It is like a one-way street: when one car breaks down, the whole road jams.

What is Bigkey?

As the underlying storage structure above shows, a value can be implemented by several different data structures. The "size" of a value is therefore measured as the string length for the string type, and as the number of elements for compound types.

A bigkey is really a "big value" problem in Redis's key/value model. Depending on the data type, it shows up in two ways:

  • The value is of string type and the string is too long.

  • Value is a compound type and contains too many elements.

In Redis, a string value can be at most 512 MB, and a secondary data structure (hash, list, set, zset) can hold about 4 billion (2^32 - 1) elements. Those are theoretical ceilings; in practice the limits should be set much lower, based on the numbers operations teams observe. A common rule of thumb is to keep strings under 10 KB and to keep hash, list, set and zset keys under 5,000 elements.

2. What harm do bigkeys cause, and how do they arise?

By now we have a preliminary understanding of bigkeys. Next, let's look at the harm they cause and how they come about.

1. Bigkey’s four hazards

As the saying goes, "one piece of rat poop spoils the broth," and for Redis, a bigkey is just such a piece of rat poop. The risk shows up in four main ways:

1. Uneven memory distribution

In cluster mode, a bigkey makes memory usage uneven across nodes, which complicates the cluster's unified memory management and in the worst case can lead to data loss.

2. Timeouts from blocking

Because Redis is single-threaded, operating on a bigkey is usually time-consuming, so Redis is more likely to block, which in turn can block clients or even trigger a failover. Such operations typically show up in the slow query log.

3. Network congestion

A bigkey also means heavy network traffic on every fetch. Suppose a bigkey is 1 MB and clients read it 1,000 times per second: that is 1,000 MB of traffic per second, a disaster for a server with an ordinary gigabit network card (about 128 MB/s in bytes).

4. Blocking deletion

Suppose a bigkey has an expiration time set; when it expires it must be deleted. Before Redis 4.0, expired keys were deleted synchronously in the main thread, so deleting a big one could block Redis, and the expiry deletion never shows up in the slow query log (the delete is not issued by a client; it happens in Redis's internal event loop).
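Since Redis 4.0, lazy freeing offers a way around this: UNLINK (and the lazyfree-lazy-expire option) reclaims the memory in a background thread. A minimal sketch, assuming redis-py, Redis 4.0 or later, and a made-up key name:

```python
# Sketch only: UNLINK removes the key from the keyspace at once and frees the
# memory asynchronously, so dropping a bigkey does not stall the main thread
# the way a plain DEL of a huge value can.
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
r.unlink("fans:some_big_v")   # hypothetical bigkey name, for illustration
```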

2. How do bigkeys come about?

Bigkeys are mainly the result of flawed program design. Some common business scenarios:

  • Social: follower lists. For a celebrity or big V, a carelessly designed follower list is bound to become a bigkey.

  • Statistics: for example, a per-day set of the users of a feature or website. Unless hardly anyone uses it, it will become a bigkey.

  • Caching: serializing data loaded from the database into Redis. This is very common, but two questions are worth asking: first, do all the fields really need to be cached; second, is related data being pulled in along with it.

So at design time we should make a basic assessment of how the data volume will grow and where its boundary lies, and choose the technology and architecture accordingly.

3. Four solutions for finding bigkeys

Let’s start with a thought question:

At the beginning of this year, a cluster of COVID-19 cases broke out in Shijiazhuang. For a large city of more than ten million people, epidemic prevention and control came under enormous pressure, and quickly identifying infected people and their contacts became the key to winning that battle. The government's response can be summarized in four measures:

  1. Prohibit the movement of people and quarantine at home;

  2. Grading areas by risk level;

  3. Grid management;

  4. Mass nucleic acid testing.

Epidemiology relies on people reporting symptoms themselves, but that is passive discovery: the novel coronavirus has an incubation period and many carriers are asymptomatic, so they have to be found proactively through mass nucleic acid testing. That active-detection mechanism mirrors the idea of scanning in computing.

The idea of finding and dealing with Bigkey is similar to that of epidemic prevention and control. There are also four conventional approaches.

1. Redis client tools

redis-cli provides the --bigkeys option to find bigkeys, as shown in the following example.

As the figure shows, this method reports only the single biggest key for each data type, plus the key count and average size per type. If we need more than the top one per type, this approach falls short. Internally it walks the keyspace with SCAN, which carries some performance cost, so to avoid affecting the service you can run it against a replica node.

2. The debug object command

Redis provides the debug object <key> command. Suppose the requirement is "find all keys larger than 10 KB in Redis": we can first SCAN through all keys, then call debug object on each one in a loop to get its size in bytes.

Because debug object can be slow, it may block the Redis thread, so this scheme also puts some strain on the service. When you do use it, run the program against a replica node.
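A rough sketch of this scan-plus-debug-object loop, assuming redis-py and the 10 KB rule of thumb from earlier (serializedlength is the serialized size, not the exact in-memory footprint, but it is enough to flag suspects):

```python
# Run this against a replica if possible: DEBUG OBJECT can be slow on big values.
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
THRESHOLD = 10 * 1024   # 10 KB

for key in r.scan_iter(count=100):          # incremental SCAN, not KEYS *
    info = r.debug_object(key)              # parsed result of DEBUG OBJECT <key>
    if info["serializedlength"] > THRESHOLD:
        print(key, info["serializedlength"])
```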

3. RDB file scanning

As we know, Redis offers RDB persistence, a disk snapshot of the data it holds in memory. By scanning the RDB file offline (open-source tools such as redis-rdb-tools can parse it), you can find bigkeys without touching the running instance.

To use this approach you must have RDB persistence enabled in the first place. RDB snapshots are taken at a configured frequency, and since the analysis runs on a file, it does not affect the running Redis host at all, which makes it an attractive option. However, scenarios with strict data-reliability requirements usually do not choose RDB persistence, so the approach is not universally applicable.

4. The design idea of the DataFlux bigkey scan

The schemes above either rely on client-side discovery or require a full scan of the data, which consumes a lot of computing resources. In the epidemic analogy, it is like testing all ten million residents of the city without any risk grading: it burns enormous material, financial and human resources, is inefficient, and works against the race against time.

As analyzed above, most bigkeys come from unreasonable business design and inadequate capacity evaluation. The DataFlux product therefore takes a different approach: the Redis collector in Datakit lets you explicitly configure the potential bigkeys to scan, supporting both fixed key names and key patterns. For a key pattern, a range of keys is obtained via SCAN with a match pattern, and the length of each key is then read with the length function for its type ("HLEN", "LLEN", "SCARD", "ZCARD", "PFCOUNT", "STRLEN"). The resulting length is reported to the DataFlux platform for storage and monitoring.

There are two advantages to this approach:

First, only the configured target keys are measured, and the length commands for every Redis data type run in O(1) time, so the scan is extremely cheap;

Second, the collected results are reported to the DataFlux platform, where the metrics can be charted in various ways and hooked up to monitoring alarms.
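A simplified sketch of the same length-based idea, assuming redis-py (the key pattern and the print stand in for the collector's configuration and its reporting to DataFlux):

```python
# For each monitored key, pick the O(1) length command that matches its type.
# HyperLogLogs are stored as strings, so PFCOUNT would have to be configured
# explicitly for those keys; they are omitted here for brevity.
import redis

LENGTH_CMD = {
    b"string": "STRLEN",
    b"list": "LLEN",
    b"hash": "HLEN",
    b"set": "SCARD",
    b"zset": "ZCARD",
}

r = redis.Redis(host="127.0.0.1", port=6379)

def key_length(key):
    cmd = LENGTH_CMD.get(r.type(key))
    return r.execute_command(cmd, key) if cmd else None

# fixed key names plus a key pattern, mirroring the collector's two options
candidates = ["Queue"] + list(r.scan_iter(match="queue:*"))
for key in candidates:
    print(key, key_length(key))   # DataFlux would receive this value as a metric
```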

Next, we illustrate the solution by performing a simple business scenario simulation.

4. Hands-on drill: this is the moment we've been waiting for!

Suppose a service uses Redis strings to store user authentication tokens and a Redis list as an asynchronous message queue.

Analysis: the token keys hold fixed-length values whose size does not grow, so they will not become bigkeys. The message queue is different: if the consumer fails while the producer keeps flooding in data, the list key backing the queue becomes a potential bigkey, so that is the key we need to monitor.

Let’s assume that the key name of the message queue is Queue.

Let’s follow the official tutorial to install the Datakit tool.

The official tutorial: "How to install DataKit", help.dataflux.cn/doc/ef29e83…

After the installation is complete, go to the conf.d/db directory in the DataKit installation directory, copy redis.conf.sample, and name it redis.conf. Do the following configuration:

Simulate the initial state of the queue by pushing 10 values.
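For example, a sketch assuming redis-py, a local Redis, and the key name Queue from above:

```python
# Push 10 placeholder messages to establish the queue's initial state.
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
for i in range(10):
    r.lpush("Queue", f"msg-{i}")

print(r.llen("Queue"))   # 10
```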

Data is reported to the DataFlux platform

Then push a larger batch of data.
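For instance, a burst like the following (the batch size is arbitrary) makes the monitored LLEN value climb on the chart:

```python
# Simulate the producer flooding the queue while the consumer is down.
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
r.lpush("Queue", *[f"msg-{i}" for i in range(10_000)])
print(r.llen("Queue"))
```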

The final collection results can be viewed in the DataFlux console.

On the DataFlux platform, the collected metrics for the monitored keys can be charted, visualized and wired into alarm monitors, getting the most value out of the data.