In some network services, the performance of Redis can matter even more than that of disk-based databases such as MySQL. Weibo, for example, stores hot posts [1] and the latest follower relationships in Redis, so a large share of queries hit Redis rather than MySQL.

So what performance tuning can we do for a Redis service? Put differently, what sources of wasted performance should we avoid?

Fundamentals of Redis performance

Before discussing optimization, we should recognize that Redis has some inherent characteristics, such as running single-threaded. Unless you modify the Redis source code, these characteristics are the foundation for any thinking about performance.

So, what basic Redis features do we need to consider? The Redis project description outlines them:

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported.
First, Redis stores data in virtual memory provided by the operating system, which is usually a Unix-like system (Redis can also run on Windows, but that requires special handling). If the operating system uses swap space, some Redis data may actually end up on the hard disk.

Second, Redis supports persistence, so data can be saved to hard disk. In many cases persistence is necessary for backup, data recovery, and so on. But persistence is not free; it consumes resources.

Third, Redis reads and writes data through a key-value model, and values can be of many different types. Each data type is backed by one of several underlying storage structures, and the choice of structure determines the complexity, and therefore the cost, of creating, deleting, updating, and querying the data.

Finally, though not mentioned in the description above, Redis is mostly single-threaded [2]: at any moment only one CPU core is executing only one command, with no parallel reads or writes. Many latency problems around individual operations can be explained by this fact.

As for the last point: why does Redis perform so well despite being single-threaded? (By Amdahl's Law, it makes most sense to optimize whatever consumes the largest share of time.) In short: Redis uses an I/O multiplexing mechanism [3] to handle client requests without blocking the main thread, and Redis executes most commands in under 1 microsecond [4], so a single CPU core can process about a million commands (probably hundreds of thousands of requests) per second without multithreading; the network is the bottleneck [5].

Optimizing network latency

The official Redis blog says in several places that the performance bottleneck is more likely to be the network [6], so how can we optimize latency on the network?

First, if you use a single-machine deployment (the application runs on the same machine as Redis), connecting to Redis through a Unix domain socket is faster than going through the TCP loopback interface (localhost). The official documentation says so [7], and on reflection the theory agrees.

But many companies operate at a scale where a single-machine deployment is not an option, so TCP is still used.

The Redis client communicates with the server over a long-lived TCP connection. If the client must wait for Redis to return the result of one request before sending the next, multiple requests from the client to Redis look like this:

![](https://pic2.zhimg.com/80/v2-d634419ea48125b8a25b530081af01e7_720w.png)

(Note: if the keys being sent are not very long, a single TCP packet can hold the whole Redis command, so only one push packet is drawn per request.)

For both requests, the client spends a stretch of time on network transmission.

Where possible, however, requests can be combined using multi-key commands: for example, a single MGET key1 key2 replaces two separate GETs. This reduces the number of round trips in the actual communication, which naturally improves latency.
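As a minimal sketch with the Ruby client used later in this post (key names are placeholders):

```ruby
require "redis"

redis = Redis.new

# Two round trips: one per command.
v1 = redis.get("key1")
v2 = redis.get("key2")

# One round trip: a single multi-key command returns both values.
v1, v2 = redis.mget("key1", "key2")
```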

But what if the commands cannot be merged into one multi-key command, say a SET followed by a GET? What then?

Redis offers at least two ways to pack multiple commands into a single request: MULTI/EXEC and scripting. The former was designed for building Redis transactions, but it effectively merges several commands into one request, as shown below. For scripts, it is best to invoke a cached script by its SHA1 hash, which further reduces traffic.

![](https://pic2.zhimg.com/80/v2-c4b248f9b396f21357c692488f30c0ac_720w.png)

This really does cut down network time. Note, however, that all keys touched by a transaction or script must live on the same node, so be careful.
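A sketch of both approaches in redis-rb (recent client versions pass the transaction object into the block; the keys and script here are made up for illustration):

```ruby
require "redis"

redis = Redis.new

# MULTI/EXEC: both commands travel to the server in one request
# and execute atomically; the replies come back together.
set_reply, value = redis.multi do |tx|
  tx.set("key1", "some value")
  tx.get("key2")
end

# Script: load it once, then invoke it by SHA1 (EVALSHA) so that
# later requests carry only the hash, not the whole script body.
sha = redis.script(:load, "return redis.call('GET', KEYS[1])")
value = redis.evalsha(sha, keys: ["key2"])
```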

If, after all the methods above, requests still cannot be merged, we can consider merging the responses instead. For example, merging two replies:

![](https://pic2.zhimg.com/80/v2-8a707d59dc51d0b1f1c181a12fe5775e_720w.png)

In theory this saves the transmission time of one reply. That is exactly what pipelining does. Here is an example using the Ruby client:

```ruby
require "redis"

@redis = Redis.new

@redis.pipelined do
  @redis.get "key1"
  @redis.set "key2", "some value"
end
# => ["<value of key1>", "OK"]
```

Some language clients even use pipelines to optimize latency by default, such as node_redis.

Note that not every set of replies fits into a single TCP packet: if there are many requests, or a reply is very long (such as a GET of a long string), TCP will split the transmission across multiple packets. Even so, pipelining still reduces the number of transmissions.

Pipelining differs from the methods above in that it is not atomic. That very fact makes pipelines easier to support on a cluster than the atomic approaches, since the keys need not live on the same node.

To summarize:

  1. Use Unix domain sockets (interprocess communication) in single-machine deployments
  2. Use multi-key commands to combine requests and reduce their number, where possible
  3. Merge requests with transactions (MULTI/EXEC) or scripts
  4. Use pipelines to merge responses

Beware of operations that take a long time to execute

With large data volumes, some operations take a long time to execute, for example KEYS * and LRANGE mylist 0 -1, and other commands with O(n) algorithmic complexity. Because Redis uses a single thread to query data, a long-running command blocks Redis and causes heavy latency for everything else.

The documentation describes KEYS * as fast: a million keys can be scanned in about 40 milliseconds. But tens of milliseconds is a long time for a high-performance system, and it only gets worse with hundreds of millions of keys, which fit comfortably on one machine (at 100 bytes per key, 100 million keys occupy only about 10 GB).

So, as the Redis author wrote on his blog [8], try to keep these slow commands out of production code, and operations staff should likewise avoid them when querying Redis by hand. Redis Essentials even recommends using rename-command KEYS "" to disable this time-consuming command altogether.
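That rename-command trick is a single line in redis.conf; renaming a command to the empty string disables it entirely:

```conf
# Disable KEYS outright; any client calling it will get an error.
rename-command KEYS ""
```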

Besides these inherently slow commands, transactions and scripts in Redis can also run long, since they merge multiple commands into one atomic execution.

To find the "slow commands" actually used in production, run SLOWLOG GET count to see the last count commands that took a long time to execute. How long counts as "long" is defined by the slowlog-log-slower-than setting in redis.conf.
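A sketch of both pieces (the 10 ms threshold is only an example value; the unit is microseconds):

```conf
# redis.conf: log any command that takes longer than 10,000 microseconds (10 ms)
slowlog-log-slower-than 10000
# keep the most recent 128 slow entries in memory
slowlog-max-len 128
```

After that, SLOWLOG GET 10 in redis-cli returns the ten most recent offenders.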

Another potentially slow command, rarely mentioned elsewhere but noted in the comments of the redis.conf file [9], is DEL. To make a long story short, reclaiming the memory of a large object can take a long time (even seconds), so the asynchronous version of DEL, namely UNLINK, is recommended: it removes the target key without blocking the original thread, reclaiming the memory in a separate thread.
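In redis-rb the swap is a one-word change (the key name is a made-up example):

```ruby
require "redis"

redis = Redis.new

# DEL frees the value's memory synchronously; on a huge object
# this can block the server for a noticeable time.
redis.del("big:hash")

# UNLINK removes the key immediately and reclaims the memory
# in a background thread, leaving the main thread unblocked.
redis.unlink("big:hash")
```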

Furthermore, when keys expire, Redis usually also deletes them synchronously. One of its removal mechanisms checks for expired keys 10 times per second; keys with an expiration time are stored in a global structure reachable as server.db->expires. Each check works like this:

  1. Randomly sample 20 keys from that set
  2. Delete those that have expired.
  3. If more than 25% of the sampled 20 (i.e. more than 5 keys) were expired, Redis assumes many expired keys remain and repeats from step 1, until the exit condition is met: the latest sample no longer contained that many expired keys.

The performance implication: if very many keys really do expire at the same moment, Redis loops through this deletion cycle and occupies the main thread.

The Redis author's advice [10] is to be wary of the EXPIREAT command, because it makes simultaneous expiration more likely. I have also seen the suggestion of adding random jitter to expiration times. Finally, redis.conf offers a way to make expired-key deletion asynchronous: set lazyfree-lazy-expire yes.
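A sketch of the jitter idea in redis-rb (the base TTL and jitter range are arbitrary example values):

```ruby
require "redis"

redis = Redis.new

BASE_TTL = 3600  # one hour
JITTER   = 300   # up to five extra minutes

# Randomizing the TTL spreads expirations out over time,
# avoiding a burst of keys all expiring at the same instant.
redis.set("cache:item:42", "value", ex: BASE_TTL + rand(JITTER))
```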

Optimize data structures and use the right algorithms

The efficiency of adding, deleting, modifying, and querying a data type (such as string or list) is determined by its underlying storage structure.

When using a data type, it pays to be aware of its underlying storage structure and algorithmic cost, and to avoid needlessly expensive operations. Two examples:

  1. ZADD's time complexity is O(log(N)), costlier than adding an element to most other data types, so use it with care.
  2. If a Hash has only a limited number of fields, it is likely stored as a ziplist; but a ziplist can be slower to query than a hashtable holding the same fields. When necessary, you can adjust the thresholds at which Redis switches storage structures (see the snippet after this list).

In addition to time, we sometimes need to save storage space. The ziplist mentioned above, for instance, is more compact than a hashtable: the authors of Redis Essentials inserted 500 fields into a hashtable-encoded Hash and a ziplist-encoded Hash, each field and value a string of about 15 characters, and found the hashtable encoding used about four times as much space as the ziplist. But space-saving structures can carry higher algorithmic complexity, so there is a trade-off to be made here.
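The switch points live in redis.conf; the values below are the long-standing defaults (newer Redis versions rename ziplist to listpack, with correspondingly renamed options):

```conf
# A Hash keeps the compact ziplist encoding only while it stays under
# both thresholds; past either one it is converted to a hashtable.
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
```

You can check which encoding a key currently uses with OBJECT ENCODING key.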

How to make that trade-off well? I feel I need to dig deeper into Redis's storage internals to be truly confident. We will cover that next time.

Everything so far concerns programming, and should be kept in mind while writing code. The following points also affect Redis performance, but addressing them involves architecture and operations, not just code-level changes.

Consider whether the operating system and hardware affect performance

The external environment in which Redis runs, namely the operating system and hardware, obviously also affects Redis performance. In the official documentation, some examples are given:

  1. CPU: Intel CPUs perform better than the AMD Opteron series
  2. Virtualization: Physical machines beat virtual machines, mainly because on some VMs the disks are not local and hypervisor monitoring software makes the fork system call slow (fork is used for persistence); this is especially true under Xen.
  3. Memory management: In Linux, the translation lookaside buffer (TLB) can cache only a limited number of page mappings, so to let it cover more memory the operating system can use larger pages, e.g. 2 MB or 1 GB instead of the usual 4096 bytes; these are called huge pages. To spare programmers the trouble of requesting them explicitly, Linux adds a transparent huge pages (THP) mechanism, so large pages can be used as if they were ordinary ones. But databases do not want this. The MongoDB documentation [11] states plainly that databases access memory sparsely rather than contiguously, and asks you to disable THP. Redis is no exception, though the Redis blog gives its own reasons: huge pages slow down the fork used by BGSAVE, and if pages are modified in the parent process after the fork they must be copied on write, which wastes a great deal of memory (they are huge pages, after all, so copying them is expensive). Therefore, disable transparent huge pages in the operating system (see the snippet after this list).
  4. Swap space: When memory pages are sitting in a swap file and Redis needs their data, the operating system blocks the Redis process while it pages them back into memory. Since the whole process blocks, this causes latency. One solution is to forbid the use of swap space (as Redis Essentials suggests: if memory runs out, find another way to deal with it).
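For reference, a sketch of the usual Linux commands for these last two items (run as root; the settings do not survive a reboot unless added to your startup scripts):

```bash
# Disable transparent huge pages (the setting Redis checks at startup)
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Discourage swapping, or turn it off outright
sysctl vm.swappiness=0
swapoff -a
```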

Consider the overhead of persistence

An important feature of Redis is persistence: copying data to hard disk. Persistence is what makes data recovery and related capabilities possible.

But maintaining this persistence also carries a performance cost.

First, let's look at RDB full persistence.

This method packages the full Redis dataset into an RDB file stored on the hard disk. The work is done by a child process forked from the original process, and the fork system call itself takes time. In an experiment Redis Labs ran six years ago [12], forking a 1 GB Redis process on a then-new AWS EC2 m1.small [13] took 700+ milliseconds, during which Redis could not process requests.

Machines today are probably better than they were then, but fork overhead still deserves attention. So choose reasonable RDB snapshot intervals, and do not snapshot too often.
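Snapshot intervals are controlled by the save rules in redis.conf; a sketch using the long-standing defaults:

```conf
# Snapshot if at least 1 key changed within 900 s, 10 keys within 300 s,
# or 10000 keys within 60 s, whichever condition triggers first.
save 900 1
save 300 10
save 60 10000
```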

Next, let’s look at another type of persistence: AOF incremental persistence.

This method appends every write command sent to the Redis server as text (in the Redis protocol format). Two system calls are involved in the process: write(2), which is performed synchronously, and fsync(2), whose timing can be deferred (see below).

Either of them can be a source of latency:

  1. write can block when the output buffer is full, or while the kernel is flushing buffered data to disk.
  2. fsync must ensure that data written to the AOF file actually reaches the disk, which can take around 20 milliseconds on a 7200 RPM drive. Worse, write can be blocked while an fsync is in progress.

Blocking on write has to be accepted, since there is no better way to get data into a file. For fsync, however, Redis offers three policies, depending on how you balance backup timeliness against performance (the matching redis.conf lines appear after this list):

  1. always: fsync runs in step with each client command; this is the most likely to add latency, but the backup is the most up to date.
  2. everysec: fsync runs asynchronously once per second; Redis performs better, though an fsync can still block a write. This is the compromise setting.
  3. no: Redis itself never initiates fsync (the data is not left unsynced forever; the kernel decides when to flush).
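In redis.conf the choice looks like this (everysec shown, as the usual compromise):

```conf
appendonly yes          # enable AOF persistence
appendfsync everysec    # one of: always | everysec | no
```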

Use a distributed architecture: read/write splitting and data sharding

So far our optimizations have assumed a single machine, or a single Redis service. Next, let's consider using a distributed architecture to keep Redis performing well as the site grows.

First, the situations in which you must (or had better) use a distributed architecture:

  1. The data is too large to fit in a single server's memory, say one terabyte
  2. The service needs to be highly available
  3. The request load is too high for a single instance
These problems can be addressed by data sharding, by master-slave replication, or by both (that is, each shard node in the cluster also carries a master-slave structure).

Such an architecture adds new entry points for performance improvement:

  1. Send slow commands to a slave for execution
  2. Run persistence on a lightly used slave
  3. Shard a large list across nodes

The first two work around Redis's single-threaded nature by supplementing it with other processes (or even other machines).

Of course, a distributed architecture has performance costs of its own: requests may need to be forwarded, and data takes time to replicate and distribute.

Final thoughts

Many other things affect Redis performance, such as active rehashing (incremental rehashing of the main key table, 10 times per second; turning it off buys a little performance), but this post is long enough already. More importantly, the point is not to collect problems others have asked about and memorize the solutions; it is to grasp the basic principles of Redis so that new problems can be resolved from those constants, however the circumstances change.