Hello, I’m Tiger!

Today I’m going to share 10 ways to improve the performance of Redis. As an in-memory database, Redis is already fast, but getting these 10 things right can take its performance to the next level!

Note: This article is based on Redis 6.2


01 Use pipelines

Redis is a TCP server based on the request-response model. This means every single request pays one RTT (round-trip time), and that RTT depends on current network conditions. A single Redis request can be very fast, for example over the local loopback interface, or very slow, for example over a poor Internet connection.

In addition, every request-response cycle involves read and write system calls, and may even trigger multiple epoll_wait system calls (on Linux). As a result, Redis constantly switches between user and kernel mode.

static int connSocketRead(connection *conn, void *buf, size_t buf_len) {
    // read() system call
    int ret = read(conn->fd, buf, buf_len);
    ...
    return ret;
}

static int connSocketWrite(connection *conn, const void *data, size_t data_len) {
    // write() system call
    int ret = write(conn->fd, data, data_len);
    ...
    return ret;
}

int aeProcessEvents(aeEventLoop *eventLoop, int flags) {
    ...
    // epoll_wait() system call on Linux
    numevents = aeApiPoll(eventLoop, tvp);
    ...
}

So how do you save round-trip times and system calls? Batch processing is a good idea.

Redis provides “pipelines” for this purpose. The principle of a pipeline is simple: multiple commands are packaged and sent as “one command”. Redis receives the command and parses it into multiple commands. Multiple results are eventually returned as a package.

“Pipeline can effectively improve Redis performance”.

However, there are a few things you need to be aware of when using pipelines:

  1. “Pipeline does not guarantee atomicity”. While a pipeline is executing, commands initiated by other clients may be interleaved. Remember that a pipeline is just a batch of commands. To ensure atomicity, use MULTI or a Lua script.

  2. “A single pipeline should not contain too many commands”. When a pipeline is used, Redis temporarily stores the responses to the pipelined commands in an in-memory reply buffer and returns them only after all commands have executed. A pipeline with too many commands can therefore take up a lot of memory; split one huge pipeline into several smaller ones.
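The splitting advice in point 2 is plain batching logic, independent of any client library. Below is a minimal sketch: the chunking function is pure Python, and the commented usage assumes the redis-py client (an assumption for illustration, not part of this article).

```python
def chunked(commands, batch_size):
    """Split a flat command list into bounded batches so each
    pipeline's reply buffer stays small."""
    for i in range(0, len(commands), batch_size):
        yield commands[i:i + batch_size]

# Hypothetical usage with redis-py (assumed client, not from this article):
# r = redis.Redis()
# for batch in chunked(commands, 1000):
#     pipe = r.pipeline(transaction=False)  # plain pipeline, no MULTI
#     for name, args in batch:
#         getattr(pipe, name)(*args)
#     pipe.execute()

commands = [("set", (f"k{i}", i)) for i in range(2500)]
sizes = [len(b) for b in chunked(commands, 1000)]
print(sizes)  # [1000, 1000, 500]
```

Each batch becomes one round trip, so the batch size trades memory in the reply buffer against the number of RTTs.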


02 Enable I/O multithreading

Prior to Redis 6, Redis was “single-threaded” for reading, parsing, and executing commands. Starting with Redis 6, I/O multithreading was introduced.

The I/O threads read commands, parse them, and return results. Once this feature is enabled, I/O performance improves.

I drew a diagram for your reference

As shown above, the main thread and the I/O threads all take part in reading, parsing, and responding to commands.

However, only the main thread executes commands.

I/O threads are off by default; you can enable them by modifying the following settings in redis.conf.

io-threads 4
io-threads-do-reads yes

io-threads is the number of I/O threads (including the main thread). I suggest you try different values on different machines and select the optimal one.


03 Avoid big keys

Redis executes commands on a single thread, which means operations on a “big key” run the risk of blocking.

A “big key” usually refers to a key whose stored value is too large. This includes:

  • A single value that is too large. For example, a 200 MB String.
  • A collection with too many elements. For example, a list, hash, set, or zset containing hundreds of millions of entries.

For example, suppose we have a 200 MB String key named “foo”.

Run the following command

127.0.0.1:6379> GET foo

When the result is returned, Redis allocates 200 MB of memory and performs a memcpy copy.

void _addReplyProtoToList(client *c, const char *s, size_t len) {
    ...
    if (len) {
        /* Create a new node, make sure it is allocated to at
         * least PROTO_REPLY_CHUNK_BYTES */
        size_t size = len < PROTO_REPLY_CHUNK_BYTES ? PROTO_REPLY_CHUNK_BYTES : len;
        // Allocate memory (200 MB in this example)
        tail = zmalloc(size + sizeof(clientReplyBlock));
        /* take over the allocation's internal fragmentation */
        tail->size = zmalloc_usable_size(tail) - sizeof(clientReplyBlock);
        tail->used = len;
        // Memory copy
        memcpy(tail->buf, s, len);
        listAddNodeTail(c->reply, tail);
        c->reply_bytes += tail->size;

        closeClientOnOutputBufferLimitReached(c, 1);
    }
}

The Redis client output buffer is 16 KB:

// server.h
#define PROTO_REPLY_CHUNK_BYTES (16*1024) /* 16k output buffer */

typedef struct client {
    ...
    char buf[PROTO_REPLY_CHUNK_BYTES];
} client;

This means that Redis cannot return the response in one shot; it must register “writable events” and trigger multiple write system calls.

There are two costly points in time:

  • Allocating large amounts of memory (possibly also freeing it, as with the DEL command)
  • Triggering multiple writable events (frequent system calls such as write and epoll_wait)

So, how do you find the big key?

First, if the slow log shows simple commands such as GET, SET, or DEL, there is a high probability that big keys are involved.

127.0.0.1:6379> SLOWLOG GET
    ...
    3) (integer) 201323   // execution time, in microseconds
    4) 1) "GET"
       2) "foo"

Second, you can use the redis-cli --bigkeys analysis tool to find big keys.

$ redis-cli --bigkeys -i 0.1
...
[00.00%] Biggest string found so far '"foo"' with 209715200 bytes

-------- summary -------

Sampled 1 keys in the keyspace!
Total key length in bytes is 3 (avg len 3.00)

Biggest string found '"foo"' has 209715200 bytes

1 strings with 209715200 bytes (100.00% of keys, avg size 209715200.00)
0 lists with 0 items (00.00% of keys, avg size 0.00)
0 hashs with 0 fields (00.00% of keys, avg size 0.00)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)

Here are some suggestions for big keys:

1. Avoid big keys in your business logic. When a big key appears, decide whether the design is reasonable or whether there is a bug.

2. Split the big key into smaller keys.

3. Run alternative commands.

  • If the Redis version is 4.0 or later, run the UNLINK command instead of DEL. If the Redis version is 6.0 or later, lazy-free can be enabled so that freeing memory happens in a background thread.

  • Replace HGETALL with HSCAN, and read lists in bounded pages with LRANGE instead of fetching the whole list at once (there is no list SCAN command).

But I still recommend avoiding big keys in your business.
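As a sketch of suggestion 2 (splitting), a common approach is to shard one huge hash into many smaller hashes by hashing the field name. The key naming scheme below is an assumption for illustration, not a Redis convention:

```python
import zlib

def shard_key(base_key, field, shards=16):
    """Map a field of one huge hash onto one of N smaller hashes,
    e.g. HSET big_hash f v  ->  HSET big_hash:<n> f v."""
    n = zlib.crc32(field.encode("utf-8")) % shards
    return f"{base_key}:{n}"

# The same field always routes to the same shard:
k1 = shard_key("big_hash", "field_42")
k2 = shard_key("big_hash", "field_42")
print(k1 == k2, k1.startswith("big_hash:"))  # True True
```

Each shard stays small enough that HGETALL or DEL on it no longer risks blocking the main thread.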


04 Avoid executing commands with high time complexity

We know that Redis executes commands “single-threaded”. Executing a command with a high time complexity is likely to block other requests.

Commands with high complexity are related to the number of elements. There are two common scenarios.

  1. Too many elements consume I/O resources. For example, HGETALL and LRANGE have O(N) time complexity.

  2. The computation is too complex and consumes CPU resources. For example, ZUNIONSTORE has O(N)+O(M log(M)) time complexity.

The official Redis manual documents the time complexity of each command. Before using an unfamiliar command, check the manual and pay attention to its time complexity.

In real business, you should avoid commands with high time complexity. If you must use them, here are two suggestions:

  1. Keep the number of elements in the operation as small as possible.

  2. Read/write separation. Complex commands are usually read requests and can be executed on a slave node.


05 Use lazy free

When a key expires or is removed with the DEL command, Redis deletes the object from the global hash table and frees its allocated memory. When a big key is encountered, freeing that memory blocks the main thread.

To do this, Redis 4.0 introduced the UNLINK command, which puts the operation of freeing object memory into the BIO background thread. This effectively reduces main thread blocking.

Redis 6.0 goes a step further and introduces lazy-free configuration options. When they are enabled, the “free object” operation is executed asynchronously for key expiration and the DEL command.

void delCommand(client *c) {
    delGenericCommand(c,server.lazyfree_lazy_user_del);
}

void delGenericCommand(client *c, int lazy) {
    int numdel = 0, j;

    for (j = 1; j < c->argc; j++) {
        expireIfNeeded(c->db,c->argv[j]);
        // If lazy free is enabled, asynchronous deletion is used
        int deleted = lazy ? dbAsyncDelete(c->db,c->argv[j]) :
                             dbSyncDelete(c->db,c->argv[j]);
        ...
    }
}

It is recommended to upgrade to at least Redis 6.0 and enable lazy-free.
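The lazy-free behavior is controlled by a family of redis.conf flags (option names from Redis 6.x; they default to no, so enabling them is an explicit choice worth verifying against your version):

```
lazyfree-lazy-eviction yes
lazyfree-lazy-expire yes
lazyfree-lazy-server-del yes
lazyfree-lazy-user-del yes
```

lazyfree-lazy-user-del, which makes a plain DEL behave like UNLINK, was added in Redis 6.0; the other three date back to Redis 4.0.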


06 Read/Write Separation

Redis implements a “master-slave” mode through replication, which is the cornerstone of failover and improves reliability. It also supports read/write separation to improve read performance.

You can deploy one primary and multiple replicas, and distribute read commands to the replicas to reduce pressure on the primary and improve performance.
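A minimal sketch of the routing idea, with a hand-rolled read-command whitelist (real deployments usually rely on a proxy or the pooling/routing built into client libraries; the node addresses here are placeholders, not live servers):

```python
import itertools

class ReadWriteRouter:
    """Send writes to the primary and spread reads over replicas
    round-robin. Node addresses are placeholders for illustration."""

    READ_ONLY = {"get", "mget", "hgetall", "lrange", "zscore", "smembers"}

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def node_for(self, command):
        # Read commands rotate over the replicas; everything else
        # (writes, admin commands) goes to the primary.
        if command.lower() in self.READ_ONLY:
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter("primary:6379", ["replica-1:6379", "replica-2:6379"])
print(router.node_for("SET"))  # primary:6379
print(router.node_for("GET"))  # replica-1:6379
```

Note that replication is asynchronous, so reads routed to replicas may see slightly stale data.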


07 Bind CPUs

Redis 6.0 began to support CPU binding, which effectively reduces thread context switching.

CPU affinity is a scheduling property that “binds” a process or thread to one CPU or a set of CPUs. This is also known as CPU binding.

Setting CPU affinity can avoid CPU context switching to some extent and improve the hit ratio of the CPU's L1 and L2 caches.

In early SMP architectures, every CPU shared resources over a single bus, so CPU binding was of little benefit.

Under the current prevailing “NUMA” architecture, each CPU has its own local memory. Access to local memory is much faster. Accessing other CPU memory causes significant latency. In this case, CPU binding is of great significance to improve the running speed of the system.

The actual NUMA architecture is more complex than shown above; typically a group of CPUs and a region of memory are bundled together into a “node”.

You can run the numactl -H command to view the NUMA hardware information.

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
node 0 size: 32143 MB
node 0 free: 26681 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
node 1 size: 32309 MB
node 1 free: 24958 MB
node distances:
node 0 1
  0: 10 21
  1: 21 10

In the output above, you can see that the machine has 40 CPUs grouped into 2 nodes.

node distances is a two-dimensional matrix representing the “access distance” between nodes, where 10 is the base value. As shown above, a node's distance to itself is 10, while the distance between node 0 and node 1 is 21. This means cross-node access is about 2.1 times slower than same-node access.

In fact, as early as 2015, it was proposed that Redis support setting CPU affinity; since Redis did not support I/O multithreading at the time, the proposal was shelved.

Redis 6.0 introduced I/O multithreading, and with it, configurable CPU affinity.

I drew a picture of the Redis 6.0 thread family for your reference.

The figure above can be divided into three modules:

  • Main thread and I/O threads: responsible for reading, parsing, and returning results for commands. Command execution is done by the main thread.
  • Bio threads: responsible for time-consuming asynchronous tasks, such as closing file descriptors.
  • Background processes: forked child processes that execute time-consuming jobs such as RDB snapshots and AOF rewrite.

Redis supports configuring the CPU affinity of each of these modules separately. You can find the following settings in redis.conf (they are disabled by default and must be enabled manually).

# The I/O threads (including the main thread) are bound to CPUs 0, 2, 4, 6
server_cpulist 0-7:2
# The bio threads are bound to CPUs 1, 3
bio_cpulist 1,3
# The AOF rewrite background process is bound to CPUs 8, 9, 10, 11
aof_rewrite_cpulist 8-11
# The bgsave background process is bound to CPUs 1, 10, 11
bgsave_cpulist 1,10-11

I ran the following tests on the above machine for IO thread and main thread:

First, turn on the IO thread configuration.

io-threads 4              # main thread + 3 I/O threads
io-threads-do-reads yes   # I/O threads also read and parse commands

Test the following three scenarios:

  1. CPU binding disabled.

  2. Bound across different nodes.

“server_cpulist 0,1,2,3”

  3. Bound to the same node.

“server_cpulist 0,2,4,6”

Benchmark the GET command with redis-benchmark, three times for each scenario.

$ redis-benchmark -n 5000000 -c 50 -t get --threads 4

The results are as follows:

1. CPU binding disabled

Throughput summary: 248818.11 requests per second
Throughput summary: 248694.36 requests per second
Throughput summary: 249004.00 requests per second

2. Bound across different nodes

Throughput summary: 248447.20 requests per second
Throughput summary: 248818.11 requests per second

3. Bound to the same node

Throughput summary: 284414.09 requests per second
Throughput summary: 284333.25 requests per second
Throughput summary: 265252.00 requests per second

Based on the test results, binding to the same node improved QPS by approximately 15%.

When using a bound CPU, you need to pay attention to the following:

  1. On Linux, you can use numactl --hardware to view the hardware layout and ensure that NUMA is supported and enabled.

  2. CPU affinity settings pay off only when threads are spread across different CPUs on the same node as far as possible. Otherwise, they can cause frequent context switches and remote memory access.

  3. Familiarize yourself with the CPU architecture and test thoroughly. Otherwise, it may backfire and degrade Redis performance.


08 Configure the persistence policy properly

Redis supports two persistence strategies, RDB and AOF.

RDB generates a binary snapshot of the data via a forked child process.

AOF is an incremental log in text format and is usually large. An AOF rewrite compacts the log to save space.

In addition to manually running the “BGREWRITEAOF” command, the following four events also trigger an AOF rewrite:

  1. Running the config set appendonly yes command

  2. The AOF file growth ratio exceeding the threshold auto-aof-rewrite-percentage

  3. The AOF file size exceeding the absolute threshold auto-aof-rewrite-min-size

  4. The replica finishing loading the RDB during master-slave replication

Both RDB and AOF rewrite are triggered in the main thread. Although the actual work is handed to a child process via fork, the fork itself copies process data structures such as the page table, which hurts performance when instance memory is large.

AOF supports the following three fsync strategies.

  1. appendfsync no: the operating system decides when to fsync. On Linux, data is typically flushed from the buffer to disk about every 30 seconds. If Redis QPS is very high or big keys are written, the buffer may fill up and trigger fsync frequently.

  2. appendfsync everysec: fsync once per second.

  3. appendfsync always: fsync on every “write”, which has a significant performance impact.

Both AOF and RDB put heavy pressure on disk I/O. An AOF rewrite iterates over all the data in the Redis hash table and writes it to disk, which can affect performance.

Online Redis deployments are usually highly available. If the business is not sensitive to losing cached data, consider turning off RDB and AOF to improve performance.

If you cannot turn them off, here are some suggestions:

  1. Schedule RDB snapshots during off-peak business hours, usually early in the morning. Keep a single instance's memory under 32 GB; too much memory makes fork take longer.

  2. For AOF, choose appendfsync no or appendfsync everysec.

  3. Set auto-aof-rewrite-min-size larger, such as 2 GB, to avoid triggering rewrites too frequently.

  4. Enable AOF only on the replica, reducing pressure on the primary.
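As a sketch, suggestions 2 and 3 translate into the following redis.conf fragment (the values are examples to tune against your workload, not benchmarked recommendations):

```
# fsync once per second (suggestion 2)
appendfsync everysec
# rewrite when the AOF has grown 100% beyond its post-rewrite size...
auto-aof-rewrite-percentage 100
# ...but never for files smaller than 2 GB (suggestion 3)
auto-aof-rewrite-min-size 2gb
```

If you go with disabling RDB entirely, as discussed above, that is expressed as an empty save directive (`save ""`).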

According to local tests, write performance improves by about 20% without AOF enabled.


09 Use long connections

Redis is a TCP-based, request-response server. Using short connections results in frequent connection creation and teardown.

Short connections incur the following slow operations:

  1. When creating a connection, TCP performs the three-way handshake, slow start, and other policies.

  2. Redis triggers connection and disconnection events, performing time-consuming operations such as allocating and destroying the client object.

  3. With Redis Cluster, the client pulls the slots mapping when creating a connection to initialize itself, making connection setup even slower.

As a result, compared to Redis's fast command execution, creating a connection is a very slow operation.

You are advised to use a connection pool and set the connection pool size appropriately.

However, when using long connections, make sure you have an “automatic reconnect” policy, since network exceptions may break connections and affect normal service.
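A minimal sketch of a long-connection pool (the `connect` factory is a placeholder for a real client dial; production code would use the pooling built into clients such as redis-py or Jedis rather than rolling its own):

```python
import queue

class ConnectionPool:
    """Minimal long-lived connection pool sketch: reuse connections
    instead of paying the TCP handshake on every request. `connect`
    is a caller-supplied factory standing in for a real client dial."""

    def __init__(self, connect, size=8):
        self._idle = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self):
        # Blocks when all connections are checked out.
        return self._idle.get()

    def release(self, conn):
        self._idle.put(conn)

pool = ConnectionPool(connect=lambda: object(), size=2)
c1 = pool.acquire()
pool.release(c1)
c2 = pool.acquire()
print(c1 is c2)  # True: the connection is reused, not re-created
```

A real pool would also validate connections on acquire and reconnect broken ones, which is exactly the “automatic reconnect” policy mentioned above.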


10 Disable SWAP

SWAP is a memory-swapping technique: pages of memory are copied out to a preset disk area.

Memory is fast and expensive. Disks are slow and cheap.

In general, the more SWAP is used, the worse system performance is.

Redis is an in-memory database. Using SWAP causes rapid performance degradation.

You are advised to reserve sufficient memory and disable SWAP.
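On Linux, you can verify whether a process is being swapped by checking the VmSwap line in /proc (shown here against the current shell as a stand-in; for Redis you would substitute its real PID, e.g. from pidof redis-server):

```shell
# How much of a process's memory currently sits in swap (Linux).
# Using $$ (this shell) as a stand-in for the Redis server PID.
pid=$$
grep VmSwap "/proc/$pid/status"
```

A healthy in-memory Redis instance should report VmSwap close to 0 kB.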


Conclusion

Those are the 10 ways to improve Redis performance.

I made a mind map so you can remember.

As you can see, performance tuning is not easy; it requires a lot of low-level knowledge and thorough testing. Redis performs differently on different machines, systems, and configurations.

“I suggest you test thoroughly and optimize according to your actual situation.”

If you have a better Redis performance optimization method, feel free to share it in the comments section.

-End-


Finally, welcome to my public account “Tiger”.

I will continue to write better technical articles.

If my article is helpful to you, please give it a thumbs up