Five factors that affect Redis performance:

  • Blocking operations inside Redis
  • CPU cores and the NUMA architecture
  • Key Redis-related system configuration
  • Redis memory fragmentation
  • Redis buffers

What are the potential blocking points in Redis?

  • Client: network I/O, key-value pair create/read/update/delete operations, database-wide operations
  • Disk: generating RDB snapshots, recording AOF logs, rewriting AOF logs
  • Master and slave nodes: the master generates and transmits RDB snapshots; the slave receives the RDB snapshot, clears its database, and loads the snapshot
  • Sliced cluster instances: transferring hash slot information to other instances and migrating data

The client

Slow network I/O can cause performance problems under a blocking I/O model. Redis uses I/O multiplexing so that the main thread never waits on any single connection, so blocking caused by the network is rarely noticeable.

Key-value pair operations become slow for more complex O(N) commands, which block the main thread. For example, HGETALL, SMEMBERS, and aggregate statistics over a whole collection all have high complexity. Deleting a collection essentially means freeing the memory occupied by its key-value pairs. To manage memory efficiently, when a block is freed the operating system records it in a linked list of free memory blocks for later reallocation. If a large amount of memory is freed at once, operating on that free-block list takes longer, and the Redis main thread blocks.
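As a rule of thumb, O(N) reads over big collections can be replaced with incremental, cursor-based scans that process a bounded batch per call. A sketch (the key name `user:1000` is just an example, and a locally running Redis instance is assumed):

```shell
# Instead of HGETALL user:1000 (O(N) in one shot, blocks the main thread on big hashes),
# walk the hash in small batches with a cursor:
redis-cli HSCAN user:1000 0 COUNT 100
# HSCAN returns a new cursor plus roughly COUNT fields; repeat the call with
# the returned cursor until it comes back as 0. SSCAN and ZSCAN work the same
# way for sets and sorted sets.
```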

One scenario that frees a large amount of memory at once is deleting a bigkey. The following figure shows how long it takes to delete collections with different numbers and sizes of elements:

Three patterns emerge:

  • Growing from 100,000 to 1,000,000 elements increases the deletion time by 5 to 20 times
  • The larger each element, the longer the deletion takes
  • At 1,000,000 elements, deletion takes close to 2s and inevitably blocks the main thread

In addition, the FLUSHDB and FLUSHALL commands, which empty databases, also release large numbers of key-value pairs.

Disk

Disk I/O has always been a performance bottleneck and deserves attention. Redis is designed to use child processes for generating RDB snapshots and rewriting AOF logs, precisely so that slow disk I/O does not block the main thread.

Besides rewrites, AOF logs are written on the normal command path: each executed command is appended to the log and flushed to disk according to the configured write-back policy. Writing data to disk takes roughly 1 to 2 ms per command. If a large number of write operations must be recorded in the AOF log and synchronously flushed, the main thread blocks.

Redis has three preset synchronization policies:

  • appendfsync always: fsync on every write operation; worst performance
  • appendfsync everysec: fsync once per second; a balance between performance and safety
  • appendfsync no: Redis never flushes actively and relies on the operating system; best performance
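The three policies above map directly to a redis.conf setting. A minimal fragment, with `everysec` chosen as the common middle ground:

```shell
# redis.conf -- enable AOF and pick one write-back policy
appendonly yes
appendfsync everysec   # fsync at most once per second; may lose up to ~1s of writes
# appendfsync always   # fsync every write; safest, slowest
# appendfsync no       # let the OS decide when to flush; fastest, least safe
```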

Blocking points in master-slave interaction. During replication, the master must fork a child process to generate the RDB snapshot and transfer it to the slave. Only the fork itself blocks the main thread, so the optimization here is simple: do not attach too many slave nodes to one master. The slave, in turn, must clear its database and load the RDB snapshot into memory; the larger the RDB file, the slower the load. A Redis master therefore should not use too much memory; if you need to scale memory, consider a sliced cluster.
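When disk I/O rather than fork is the pain point of full synchronization, newer Redis versions can also stream the RDB directly to the replica socket instead of writing it to disk first. A hedged redis.conf sketch (check that your version supports these options before enabling them):

```shell
# redis.conf -- diskless replication: the forked child streams the RDB
# over the network instead of materializing it on disk first
repl-diskless-sync yes
repl-diskless-sync-delay 5   # wait up to 5s so several replicas can share one fork
```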

Sliced clusters

A Redis sliced cluster transfers hash slot information and migrates data between instances. Because both operations are implemented incrementally, the risk of blocking is small. However, you still need to watch out for the blocking caused by migrating a bigkey.

What optimizations did Redis make for performance?

Some of the blocking points above can be optimized with asynchronous operations. To decide whether an operation can be made asynchronous, check whether it lies on the critical path of the Redis main thread.

In the figure above, operation 1 on the left does not need to return data to the client, so the main thread can hand it off to a background thread and return immediately, without waiting for the background thread to finish. Operation 2 on the right must wait for a result; only the part of it that does not need to return data can be delegated to a background thread.

Read operations are typically critical path operations for Redis.

Therefore, aggregate queries cannot benefit from asynchronous execution; the data set itself should be kept small. Alternatively, use a more specialized column store for BI (Business Intelligence)-style statistics.

By the same logic, bigkey deletion and database flushing are not on the critical path and can be executed asynchronously by background threads.

Synchronous AOF writes: depending on the policy, the instance can either wait for the write itself or let a background thread perform the AOF fsync.

Loading the RDB on a slave is on the slave's critical path and must be performed by the slave's main thread.

The asynchronous background-thread mechanism

When the Redis main thread starts, it uses the operating system's pthread_create to spawn three background threads, responsible for asynchronously executing AOF fsync, key-value pair deletion, and file closing.

The main thread hands work to the background threads through linked-list task queues, which increases throughput.

Redis's asynchronous deletion is lazy deletion: when a client deletes data, the key is only marked as deleted and unlinked from the keyspace, while the actual memory release is deferred to a background thread that frees it later. AOF writes and file closes are handled the same way.
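Beyond explicit deletes, Redis also deletes keys internally (eviction, expiry, implicit deletes such as RENAME overwriting a key). These can be made lazy as well via redis.conf; a sketch, assuming Redis 4.0 or later:

```shell
# redis.conf -- route internal deletions through the background lazy-free thread
lazyfree-lazy-eviction yes    # free keys evicted under maxmemory pressure lazily
lazyfree-lazy-expire yes      # free expired keys lazily
lazyfree-lazy-server-del yes  # implicit DELs (e.g. RENAME onto an existing key) become lazy
```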

Redis 4.0 added asynchronous key-value pair deletion and database flushing:

  • UNLINK: marks the key as deleted and frees its memory lazily
  • FLUSHDB ASYNC / FLUSHALL ASYNC: flush a database (or all databases) in the background
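In practice this means swapping one command for another. An illustrative fragment (the key name `big:hash:key` is just an example, and a running local instance is assumed):

```shell
# UNLINK returns almost immediately after removing the key from the keyspace;
# the background lazy-free thread reclaims the memory afterwards:
redis-cli UNLINK big:hash:key

# Flush one database, or all of them, without blocking the main thread:
redis-cli FLUSHDB ASYNC
redis-cli FLUSHALL ASYNC
```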

The impact of CPU architecture on Redis performance. Many people assume the relationship between Redis and the CPU is simple: the faster the CPU, the faster Redis. That view is one-sided.

First, CPUs come in multi-core and multi-CPU (multi-socket) architectures, and both affect Redis performance. A mainstream CPU processor contains multiple cores; each is called a physical core and has a private level 1 (L1) cache, split into an L1 instruction cache and an L1 data cache, plus a private level 2 (L2) cache. The L2 cache is effectively a buffer between the L1 cache and main memory.

The physical cores also share a level 3 (L3) cache, which is an order of magnitude faster to access than main memory. In addition, each physical core on a mainstream processor usually runs two hyperthreads, also called logical cores.

A single-CPU multi-core architecture is a simple solution with good performance but high cost. Most server CPU vendors therefore balance cost and performance with a multi-CPU, multi-core scheme, in which CPUs in different sockets exchange data over a bus between the sockets.

If an application runs on one socket and saves data to that socket's memory, and is later scheduled onto another socket, it must reach back across the bus to the memory attached to the original socket. This is remote memory access, which adds latency compared with accessing memory directly attached to the local socket. An architecture with this property is called a Non-Uniform Memory Access (NUMA) architecture.

A context switch here means a thread context switch, where the context is the thread's runtime state. In a multi-core environment, a context switch occurs when a thread running on one CPU core is rescheduled onto another core.

Redis performance tuning method in multi-core CPU environment:

Scheme 1: One Redis instance corresponds to one physical core

taskset -c 0 ./redis-server

Under the NUMA architecture, there is a potential risk when the network interrupt handler and the Redis instance are each bound to a CPU core: if they end up on cores belonging to different CPU sockets, the Redis instance has to access memory across sockets to read the network data, which takes noticeably longer.

To avoid cross-socket access to network data, it is best to bind the network interrupt routine and the Redis instance to cores on the same CPU socket.

Note that CPU core numbering is not necessarily sequential. Run the lscpu command to check the distribution of physical and logical cores across NUMA nodes, to avoid binding to the wrong core.
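A short sketch of this check plus a safer binding. The core IDs 0 and 12 are purely an example of two logical cores that lscpu reports as siblings on the same physical core; substitute the IDs from your own machine:

```shell
# Inspect the socket / core / thread layout before binding anything:
lscpu | grep -E 'NUMA|Socket|Core|Thread'

# Bind the instance to both logical cores of one physical core, so the main
# thread and the background threads share two hardware threads instead of one:
taskset -c 0,12 ./redis-server
```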

Core binding also carries a risk: if a multi-threaded instance is pinned to a single logical core, its threads compete for that core's CPU time, which can block the main thread and increase Redis request latency.