Redis is an in-memory database: because it keeps its data in memory, it reads and writes much faster than traditional databases that keep data on disk. Even so, Redis can still suffer from high latency, and we need a solid understanding of the possible causes in order to troubleshoot and fix latency problems quickly.
The command execution process
Here, latency means the interval between the client sending a command and the client receiving that command's reply. So let's take a look at the steps involved in executing a Redis command, any of which can cause high latency if it goes wrong.
The diagram above shows the execution process of a command sent by the Redis client, with the steps shown in green and the possible causes of high latency shown in blue.
Network connection limits, network transfer rates, and CPU performance are issues that can affect any server. But Redis also has its own specific causes of high latency: misuse of commands or data structures, blocking caused by persistence, and memory swapping.
Worse still, Redis handles network requests with a single-threaded, event-driven mechanism. Connection accept handlers, command request handlers, and command reply handlers process the network events coming from clients. After processing one event, Redis moves on to the next one in the queue, so high latency in processing one command delays every command queued behind it. See this article for more information about Redis event handling.
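The queuing effect can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: all commands arrive at the same moment, execution times are given in microseconds, and the function name is made up for the illustration. It is not how Redis itself is implemented.

```python
def reply_latencies(service_times_us):
    """Return the latency each queued command sees when a single thread
    processes them strictly one after another.

    All commands are assumed to arrive at time 0, so a command's latency is
    its own execution time plus the execution time of everything ahead of it.
    """
    latencies = []
    clock = 0
    for service_time in service_times_us:
        clock += service_time      # the single thread is busy until this point
        latencies.append(clock)
    return latencies


# Five commands of 100 us each, except a 50 ms command in second position.
print(reply_latencies([100, 50000, 100, 100, 100]))
# prints [100, 50100, 50200, 50300, 50400]
```

Every command behind the slow one waits for it, even though each of them needs only 100 microseconds of work.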
For high latency, Redis has built-in slow query statistics. Running the slowlog get {n} command returns the n most recent slow queries. By default, commands that take longer than 10 milliseconds to execute (configurable) are recorded in a fixed-length queue. For online instances, it is advisable to set this threshold to 1 ms so that commands taking longer than a millisecond are discovered promptly.
# Commands that exceed the slowlog-log-slower-than threshold (in microseconds) are logged to the slow query queue
slowlog-log-slower-than 10000
# The maximum length of the slow query queue is slowlog-max-len
slowlog-max-len 128
If each command takes on the order of a millisecond to execute, the instance can only reach about 1,000 OPS. The default slow query queue length is 128, which can be increased. The slow query log records only the command execution time; it excludes network transmission time and the time the command spends queued. So when a client experiences abnormal blocking, the current command is not necessarily slow: it may be waiting behind other commands. Compare the time of the incident with the timestamps of the slow queries to check whether a slow query caused commands to queue up.
The output format of slowlog is shown below, with the latest entry displayed first. The first field is the entry's sequence number in the slow log. The second field is the Unix timestamp at which the entry was recorded, which can be converted to a readable format with the date command. The third field is the execution time of the command in microseconds. The fourth field is the Redis command that was executed, together with its arguments.
> slowlog get
1) 1) (integer) 26
   2) (integer) 1450253133
   3) (integer) 43097
   4) 1) "flushdb"
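To make the four fields easier to read, and to line slow queries up against the time of an incident, something like the following sketch may help. It assumes each entry is available as a plain list in the raw four-field layout described above (client libraries may expose entries in a different shape). The function names and the five-second window are hypothetical choices for the illustration.

```python
from datetime import datetime, timezone


def describe_entry(entry):
    """Turn one raw slow log entry into a readable one-line summary.

    `entry` is assumed to be a list in the four-field layout:
    [id, unix_timestamp, duration_in_microseconds, [command, arg, ...]].
    """
    entry_id, timestamp, duration_us, command = entry[:4]
    logged_at = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return "#{} {} {:.3f} ms {}".format(
        entry_id,
        logged_at.strftime("%Y-%m-%d %H:%M:%S UTC"),
        duration_us / 1000,
        " ".join(command),
    )


def entries_near(entries, incident_ts, window_s=5):
    """Return the entries logged within `window_s` seconds of an incident."""
    return [e for e in entries if abs(e[1] - incident_ts) <= window_s]


sample = [26, 1450253133, 43097, ["flushdb"]]
print(describe_entry(sample))
# prints: #26 2015-12-16 08:05:33 UTC 43.097 ms flushdb
```

Passing the timestamp at which a client reported blocking to entries_near narrows the slow log down to the commands most likely to have caused the queue.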
Let's look in turn at the high-latency problems caused by improper use of commands or data structures, persistence blocking, and memory swapping.
Improper use of commands or data structures
Generally speaking, Redis executes commands very quickly, but once the data volume reaches a certain level, some commands take a long time to run, such as hgetall on a hash containing tens of thousands of elements. Because the amount of data is large and the command's time complexity is O(n), it is bound to execute slowly.
This is a typical case of improper use of commands and data structures. In high-concurrency scenarios, we should avoid running commands with O(n) or higher time complexity on large objects. For a hash with many fields, use the scan family of commands (hscan for hashes) to iterate over it step by step instead of fetching everything at once with hgetall.
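A step-by-step iteration could look roughly like the sketch below. It assumes a client object with a redis-py style hscan(name, cursor, count=...) method that returns the next cursor and a dict of fields. The FakeClient class is a hypothetical in-memory stand-in so the example runs without a server, and the batch size of 100 is an arbitrary choice.

```python
def iter_hash_fields(client, key, batch_size=100):
    """Yield (field, value) pairs of a hash in small batches via HSCAN.

    `client` is assumed to expose a redis-py style method
    hscan(name, cursor, count=...) returning (next_cursor, {field: value}).
    """
    cursor = 0
    while True:
        cursor, batch = client.hscan(key, cursor, count=batch_size)
        yield from batch.items()
        if cursor == 0:        # a cursor of 0 means the iteration is complete
            break


class FakeClient:
    """Hypothetical in-memory stand-in so the sketch runs without a server."""

    def __init__(self, hashes):
        self._hashes = hashes

    def hscan(self, name, cursor=0, match=None, count=10):
        items = list(self._hashes.get(name, {}).items())
        end = cursor + count
        next_cursor = end if end < len(items) else 0
        return next_cursor, dict(items[cursor:end])


client = FakeClient({"idx:reply": {"f%d" % i: str(i) for i in range(250)}})
print(sum(1 for _ in iter_hash_fields(client, "idx:reply")))
# prints 250
```

Each round trip handles only a bounded amount of data, so other clients' commands can be served between batches. On a real server the count is only a hint, and a field may occasionally be returned more than once, so the consumer should tolerate duplicates.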
Redis provides a tool for discovering large objects: redis-cli -h {ip} -p {port} --bigkeys. It walks the keyspace of the specified Redis DB using scan, prints in real time the biggest key found so far, and finally gives a summary report of the biggest key of each data structure.
> redis-cli -h host -p 12345 --bigkeys
# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type. You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).
[00.00%] Biggest hash found so far 'idx:user' with 1 fields
[00.00%] Biggest hash found so far 'idx:product' with 3 fields
[00.00%] Biggest hash found so far 'idx:order' with 14 fields
[02.29%] Biggest hash found so far 'idx:fund' with 16 fields
[02.29%] Biggest hash found so far 'idx:pay' with 69 fields
[04.45%] Biggest set found so far 'indexed_word_set' with 1482 members
[05.93%] Biggest hash found so far 'idx:address' with 159 fields
[11.79%] Biggest hash found so far 'idx:reply' with 196 fields
-------- summary -------
Sampled 1484 keys in the keyspace!
Total key length in bytes is 13488 (avg len 9.09)
Biggest set found 'indexed_word_set' has 1482 members
Biggest hash found 'idx:reply' has 196 fields
0 strings with 0 bytes (00.00% of keys, avg size 0.00)
0 lists with 0 items (00.00% of keys, avg size 0.00)
2 sets with 1710 members (00.13% of keys, avg size 855.00)
1482 hashs with 6731 fields (99.87% of keys, avg size 0.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)
Persistence blocking
If the Redis node has persistence enabled, check whether the blocking is caused by persistence. The operations that can block the main thread are fork and AOF disk flushing (fsync).
fork happens when an RDB snapshot is generated and when the AOF file is rewritten. The Redis main thread calls fork to create a child process that shares its memory, and the child process does the actual persistence work. If the fork call itself takes too long, it inevitably blocks the main thread.
When Redis forks, the child process appears to use as much memory as the parent, so in theory twice the physical memory would be needed to complete the operation. However, Linux uses copy-on-write: the parent and child processes share the same physical memory pages. When the parent process handles a write request, it makes a copy of the page to be modified and completes the write there, while the child process still reads the snapshot of the parent's entire memory as it was at the time of the fork. So, in general, a fork does not take too long.
The latest_fork_usec metric, available from the info stats command, gives the duration of the most recent fork in microseconds. If it is very long, for example more than one second, it needs to be optimized.
> redis-cli -c -p 7000 info | grep -w latest_fork_usec
latest_fork_usec:315
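A monitoring script could apply the one-second rule of thumb automatically. The following is a minimal sketch that works on the text returned by the info command. The function names are hypothetical, and the sample text is a shortened illustration rather than complete output.

```python
def parse_info(text):
    """Parse the `key:value` lines of INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and section headers
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fork_is_slow(info_text, threshold_usec=1_000_000):
    """Return True if the last fork took longer than the threshold (1 s here)."""
    return int(parse_info(info_text)["latest_fork_usec"]) > threshold_usec


sample = "# Stats\r\ntotal_connections_received:12\r\nlatest_fork_usec:315\r\n"
print(fork_is_slow(sample))
# prints False
```

The threshold is a parameter because what counts as too long depends on how much latency the application can tolerate.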
When AOF persistence is enabled, the file is usually flushed to disk once per second: a background thread performs an fsync on the AOF file every second. When the disk is under heavy load, the fsync operation has to wait until the write completes. If the main thread finds that more than two seconds have passed since the last successful fsync, it blocks, for the sake of data safety, until the background thread finishes its fsync. This blocking is mainly caused by disk pressure. You can identify it in the Redis logs; when it occurs, the following log is printed:
Asynchronous AOF fsync is taking too long (disk is busy?). \
Writing the AOF buffer without waiting for fsync to complete, \
this may slow down Redis.
You can also look at the aof_delayed_fsync metric in the info persistence statistics, which is incremented each time a delayed fdatasync blocks the main thread.
> info persistence
loading:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
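Because aof_delayed_fsync is a cumulative counter, what matters in practice is how much it grows between two samples. A rough sketch of that comparison follows. The function names are hypothetical and the "after" sample uses made-up values purely for illustration.

```python
import re


def delayed_fsync_count(info_text):
    """Extract the aof_delayed_fsync counter from INFO persistence output."""
    match = re.search(r"^aof_delayed_fsync:(\d+)", info_text, re.MULTILINE)
    if match is None:
        raise ValueError("aof_delayed_fsync not found in INFO output")
    return int(match.group(1))


def new_delays(previous_info, current_info):
    """Number of delayed fsyncs that happened between two INFO samples."""
    return delayed_fsync_count(current_info) - delayed_fsync_count(previous_info)


# Made-up samples in the same format as the output shown above.
before = "loading:0\naof_pending_bio_fsync:0\naof_delayed_fsync:0\n"
after = "loading:0\naof_pending_bio_fsync:1\naof_delayed_fsync:3\n"
print(new_delays(before, after))
# prints 3
```

A negative result would suggest the counter was reset between the two samples, for example by a server restart.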
Memory swapping
Memory swapping is deadly for Redis: an important prerequisite for Redis's high performance is that all data stays in memory. If the operating system swaps part of the memory used by Redis out to disk, Redis performance deteriorates sharply, because memory and disk differ in read/write speed by several orders of magnitude. To check whether Redis memory is being swapped:
# Query the Redis process ID
> redis-cli -p 6383 info server | grep process_id
# Query the memory swap sizes of that process
> cat /proc/4476/smaps | grep Swap
Swap: 0 kB
Swap: 4 kB
Swap: 0 kB
Swap: 0 kB
If all the swap sizes are 0 kB, or only a few of them are 4 kB, this is normal and indicates that the Redis process memory has not been swapped out.
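Since smaps lists one Swap line per memory region, it can be easier to judge the total. The sketch below sums those lines. It assumes a Linux system with the usual "Swap: N kB" line format, and the function names are hypothetical.

```python
import re


def total_swap_kb(smaps_text):
    """Sum every `Swap:` entry (in kB) found in the text of /proc/<pid>/smaps."""
    sizes = re.findall(r"^Swap:\s+(\d+) kB", smaps_text, re.MULTILINE)
    return sum(int(size) for size in sizes)


def swap_kb_for_pid(pid):
    """Read /proc/<pid>/smaps (Linux only) and return the swapped total in kB."""
    with open("/proc/{}/smaps".format(pid)) as smaps:
        return total_swap_kb(smaps.read())


sample = "Swap:                  0 kB\nSwap:                  4 kB\nSwap:                  0 kB\n"
print(total_swap_kb(sample))
# prints 4
```

A total of a few kB matches the normal case described above, while a total in the hundreds of megabytes would point to real swapping.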
There are several ways to avoid memory swapping, for example:
- Ensure that the machine has sufficient memory available
- Ensure that every Redis instance has maxmemory set, to prevent uncontrolled memory growth in extreme cases.
- Lower the swap priority, for example:
echo 10 > /proc/sys/vm/swappiness