This article covers most of the common Redis knowledge points. I hope it helps you.

Summary of this article

  1. Why is Redis so fast: 1.1 Clever use of the SDS data structure; 1.2 Event-driven model with excellent performance; 1.3 Memory-based operations

  2. Why is Redis so reliable: 2.1 RDB persistence; 2.2 AOF persistence; 2.3 Sentinel high availability

  3. Redis 6.x multithreading overview

  4. Redis best practices

Part1 Why is Redis so fast

1.1 Clever use of the SDS data structure

We know that Redis is written in C at the bottom layer, but its data structures are not taken directly from C; instead, Redis builds its own set of data structures tailored to its positioning.

String structure in C:
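
In essence, a conventional C string is just a '\0'-terminated character array with no length metadata, roughly:

```c
/* A plain C string: the only "length" information is the trailing '\0',
 * so strlen() must scan every byte, which is O(N). */
char buf[] = "Redis"; /* stored as 'R','e','d','i','s','\0' */
```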

String structure as defined by SDS:
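
A minimal sketch of the classic header, following the sdshdr definition in the Redis 3.x source (later versions split it into several width-optimized variants):

```c
/* Simplified SDS header (early Redis versions). */
struct sdshdr {
    unsigned int len;  /* bytes actually used in buf: O(1) length queries */
    unsigned int free; /* bytes allocated but unused: enables pre-allocation */
    char buf[];        /* data; still '\0'-terminated for C-API compatibility */
};
```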

As you can see, SDS adds only a couple of fields beyond the C layout, recording the free space and the current data length, but it is a stroke of genius:

  • The length of the string can be obtained in O(1) time; the len field removes the need to traverse and count characters, as plain C strings require.

  • Prevents buffer overflows; a C string does not record how much space it occupies, so callers must allocate enough space in advance or risk overflowing it. The free field lets SDS check for, and allocate, sufficient space before carrying out an operation.

  • Reduces the number of memory reallocations caused by string modifications; the free field gives SDS the ability to pre-allocate space and release it lazily (see the sketch after this list).

  • It is binary-safe; binary data may contain bytes that collide with the C string terminator '\0', causing truncation when traversing or reading, whereas SDS relies on the len field to identify the exact data length, so an embedded '\0' cannot truncate it.
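
To make the pre-allocation point concrete, here is a simplified sketch of the growth policy used by sdsMakeRoomFor in the Redis source (the function below is illustrative; the real code also handles header variants and allocation failure):

```c
#define SDS_MAX_PREALLOC (1024 * 1024) /* 1 MB, as defined in sds.h */

/* Decide the new allocation when an SDS must hold addlen more bytes.
 * If the existing free space suffices, no reallocation happens at all. */
size_t sds_new_alloc(size_t len, size_t free, size_t addlen) {
    if (free >= addlen) return len + free;  /* enough room: no realloc */
    size_t newlen = len + addlen;
    if (newlen < SDS_MAX_PREALLOC)
        return newlen * 2;                  /* small strings: double */
    return newlen + SDS_MAX_PREALLOC;       /* big strings: grow by 1 MB */
}
```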

The above uses strings to illustrate the differences between SDS and the native C structures, and the advantages SDS gains. Redis applies the same thinking to the linked list, hash table, and skip list it designs: each of them also keeps metadata such as its length in a header.

As you can see, Redis had the same starting point when designing each of these data structures. It can be summed up in one sentence: trade space for time, that is, fast data manipulation at the cost of a little extra storage space and bookkeeping.
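
For instance, the doubly linked list records its length in the list header (simplified below from adlist.h in the Redis source), so LLEN answers in O(1) instead of traversing the list:

```c
/* Simplified from Redis adlist.h (dup/free/match callbacks omitted). */
typedef struct listNode {
    struct listNode *prev;
    struct listNode *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head;
    listNode *tail;
    unsigned long len; /* maintained on every push/pop: O(1) LLEN */
} list;
```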

1.2 Event-driven model with excellent performance

Before Redis 6.x, the story was all about why a single thread could perform so well.

So where does this single thread sit, and how does it handle data reads and writes?

$Single-threaded model

The new version's multithreaded model is discussed separately in a later section; first, the single-threaded model.

A single thread means that all operations on data are performed sequentially by one thread. Using a single thread:

  • Avoids unnecessary context switches and race conditions; there is no CPU consumed by switching between processes or threads;

  • Removes the need to worry about locks: there are no lock acquire/release operations and no performance penalty from potential deadlocks.

However, single-threaded processing also means that requests arriving at the server cannot all be handled the moment they arrive.

So how does a single thread keep resource utilization and processing efficiency high?

$I/O multiplexing and event-driven design

The Redis server as a whole is an event-driven application: all operations are carried out as events.

As shown in the figure, Redis's event-driven architecture consists of sockets, I/O multiplexing, a file event dispatcher, and event handlers:

A Socket is an abstraction of an endpoint for two-way communication between application processes on different hosts in a network.

I/O multiplexing lets a single thread efficiently handle many connection requests: it monitors multiple descriptors and notifies the program to act as soon as any of them becomes ready.

Redis implements the same API for each IO multiplexing function, so the underlying implementation is interchangeable.

Redis's default I/O multiplexing mechanism is epoll. Compared with other mechanisms such as select and poll, epoll has many advantages:

| | Max concurrent connections | Memory copy | Active connection awareness |
| --- | --- | --- | --- |
| epoll | No limit on the maximum number of concurrent connections | Shared memory between user and kernel space; no fd-set copy | Event-callback based; only active (ready) connections are reported |
| select | fd limit: 1024 by default on 32-bit machines, 2048 on 64-bit machines | Copies the fd set from user space to kernel space on every call | Only knows some fd is ready, not which one; must traverse and poll |
| poll | Stores fds in a linked list, so no maximum connection limit | Same as select | Same as select; must traverse and poll |
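
As a rough illustration of how one thread serves many connections, here is a minimal epoll loop using the plain Linux API (this is a sketch, not Redis's actual ae.c wrapper; accept/dispatch details are elided):

```c
#include <sys/epoll.h>

#define MAX_EVENTS 64

void event_loop(int listen_fd) {
    struct epoll_event ev, events[MAX_EVENTS];
    int epfd = epoll_create1(0);

    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        /* Blocks until some fd is ready; only ready fds are returned,
         * so there is no O(N) scan over every connection. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd)
                ; /* accept the new client and register it via epoll_ctl */
            else
                ; /* dispatch to the matching read/write event handler */
        }
    }
}
```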

On the event side, Redis defines two types of events: file events, which are abstractions of socket operations, and time events, which are abstractions of timed operations.

File events:

  • Client connection request (AE_READABLE event)

  • Client command requests (AE_READABLE event)

  • Server command replies (AE_WRITABLE event)

Time events are divided into one-shot timed events and recurring periodic events. All time events in Redis are stored in an unordered linked list; when the time event executor runs, it must traverse the whole list to make sure every event whose time has arrived gets processed.
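
A sketch of that traversal, with hypothetical names (the real logic lives in processTimeEvents in ae.c):

```c
/* Hypothetical time-event node in an unordered singly linked list. */
typedef struct timeEvent {
    long long when_ms;      /* absolute time at which the event is due */
    void (*handler)(void);  /* what to run when it fires */
    struct timeEvent *next;
} timeEvent;

/* Because the list is unordered, every node must be inspected to find
 * all events whose time has arrived. */
void process_time_events(timeEvent *head, long long now_ms) {
    for (timeEvent *te = head; te != NULL; te = te->next) {
        if (te->when_ms <= now_ms)
            te->handler(); /* a periodic event would also reschedule here */
    }
}
```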

As you can see, the whole Redis implementation achieves its excellent processing efficiency and high throughput by pairing an efficient I/O multiplexing driver with single-threaded, in-memory operations.

1.3 Memory-based Operations

As mentioned in the previous section, one of the reasons Redis can afford single-threaded processing is that memory operations consume few resources, which keeps processing efficient.

How does Redis maintain and manage such precious memory resources?

$Beyond create, read, update, and delete: what other maintenance operations are there? [1]

Hit-ratio statistics: after reading a key, the server updates the keyspace-hit or keyspace-miss counter depending on whether the key exists.

LRU time update: after a key is read, the server refreshes the key's LRU timestamp, which can later be used to compute how long the key has been idle.

Lazy deletion: if the server reads a key and finds it has expired, it deletes the expired key before performing any other operation.

The key's dirty flag: if a client is watching a key with the WATCH command, the server marks that key as dirty so the transaction program notices it has changed. The dirty counter is also incremented on every change and is used to trigger persistence and replication.

Database notifications: if the server has database notifications enabled, it sends the corresponding notification, as configured, after modifying a key.
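
Pulling these together, a hypothetical sketch of the bookkeeping around a single read (names and types below are illustrative; the real logic is spread across lookupKeyRead and friends in db.c):

```c
/* Illustrative types; real Redis uses robj/redisDb from server.h. */
typedef struct { long long hits, misses, lru_clock; } Stats;
typedef struct Entry { int expired; long long lru; /* value ... */ } Entry;

Entry *db_find(const char *key);   /* assumed dictionary lookup */
void   db_delete(const char *key); /* assumed key removal */

Entry *lookup_key_read(const char *key, Stats *st) {
    Entry *e = db_find(key);
    if (e && e->expired) {         /* lazy deletion */
        db_delete(key);
        e = NULL;
    }
    if (e) {
        st->hits++;                /* hit-ratio statistics */
        e->lru = st->lru_clock;    /* refresh the LRU timestamp */
    } else {
        st->misses++;
    }
    return e;
}
```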

$How Redis manages memory

Expired key deletion: memory and CPU are both precious, so Redis runs periodic deletion with a carefully chosen execution time and frequency, backed by lazy deletion as a safety net, striking a balance between CPU time and wasted memory.

Data eviction: what if keys are produced so fast that periodic deletion cannot keep up, and those keys are rarely accessed so lazy deletion never fires? Will memory overflow? No, because Redis has eviction policies:

  • noeviction: new writes return an error when memory cannot accommodate the new data.

  • allkeys-lru: when memory cannot accommodate new data, evict the least recently used key.

  • allkeys-random: when memory cannot accommodate new data, evict a random key.

  • volatile-lru: when memory cannot accommodate new data, evict the least recently used key among keys that have an expiry set.

  • volatile-random: when memory cannot accommodate new data, evict a random key among keys that have an expiry set.

  • volatile-ttl: when memory cannot accommodate new data, evict the key with the nearest expiration time among keys that have an expiry set.
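
The policy is selected with the maxmemory-policy directive in redis.conf; the limit below is a placeholder value:

```
# redis.conf: cap memory and pick an eviction policy
maxmemory 2gb
maxmemory-policy allkeys-lru
```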

It is worth mentioning that the LRU here is not exactly the LRU we are familiar with: Redis uses a sampling-based approximation, which avoids the memory overhead of maintaining a doubly linked list.

Each time a command is processed, Redis checks whether the memory limit has been reached; if so, it applies the configured algorithm to delete keys, and this is where the LRU value maintained earlier comes in useful. A rough sketch of the sampling idea follows.
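
Here is that sketch: sample a handful of keys and evict the one that has been idle longest (the real evictionPoolPopulate keeps a small pool of candidates across rounds; the sample count is set by maxmemory-samples, default 5, and the helper below is illustrative):

```c
/* Illustrative entry; real Redis samples its dict via dictGetSomeKeys. */
typedef struct { long long lru; /* key, value ... */ } Entry;

Entry *random_entry(void); /* assumed: returns a randomly chosen key's entry */

/* Approximate LRU: instead of maintaining a full doubly linked LRU list,
 * sample a few keys and evict the stalest of them. */
Entry *pick_eviction_victim(int samples) {
    Entry *victim = NULL;
    for (int i = 0; i < samples; i++) {
        Entry *e = random_entry();
        if (victim == NULL || e->lru < victim->lru)
            victim = e; /* a smaller LRU clock means idle for longer */
    }
    return victim;      /* the caller deletes this key */
}
```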

Part2 Why is Redis so reliable

What should Redis, a memory-based storage, do when the server goes down?

2.1 RDB Persistence

Persistence is the common answer. The simplest idea is to save the in-memory data to disk every so often, so that at most a small window of data can be lost. That is exactly the idea behind Redis's RDB persistence.

RDB can be produced in two ways: SAVE and BGSAVE.

SAVE blocks the server until the save completes, so all command requests during that period are rejected, which affects clients severely.

With BGSAVE, the data is saved by a forked child process while Redis keeps serving client requests. To prevent competition and conflicts, BGSAVE is designed to be mutually exclusive with SAVE and BGREWRITEAOF.

By default, the Redis server checks every 100 milliseconds whether the number of database changes since the last save (the dirty counter) and the time elapsed since the last save (the lastsave attribute) exceed the configured thresholds, and performs a save when a condition is met.
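
These thresholds are the save lines in redis.conf; the long-standing stock defaults are:

```
# BGSAVE if >=1 change in 900 s, >=10 in 300 s, or >=10000 in 60 s
save 900 1
save 300 10
save 60 10000
```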

Because it stores everything in one consolidated batch, an RDB file is binary, compact, space-efficient, and fast to restore from, which makes it indispensable in any persistence scheme.

2.2 AOF Persistence

However, given the BGSAVE interval and the save trigger conditions, some of the most recent data will inevitably be lost if the server goes down. RDB therefore needs a complementary persistence mechanism.

RDB holds key-value pairs, while AOF holds write commands.

Why does AOF hold commands instead of key-value pairs?

The view of Coder's Technical Path is that there are two reasons. First, the AOF flush happens during file event processing: the append function is called just before an event loop iteration ends, so it is most convenient to store the request commands themselves. Second, if a command is corrupted during the append, the file can still be repaired with redis-check-aof, so recovery stays convenient.

AOF flushing policy: AOF appends happen inline with client request processing, and writing straight to disk would hurt performance badly, so commands are first appended to the aof_buf buffer. Whether and when the buffer is synced to the AOF file depends on the appendfsync setting: always, everysec (the default), or no. always hurts performance more than everysec, while no is more likely to lose data.
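
In redis.conf this corresponds to two directives:

```
# Enable AOF and choose the fsync policy
appendonly yes
appendfsync everysec   # alternatives: always | no
```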

AOF rewrite and compaction: AOF files are naturally larger than RDB files because they hold the request commands, and they keep growing as the program runs. But the file contains plenty of redundant command data that can be compacted away, because any given key-value pair has exactly one state at any moment.

So what about the write operations generated while the rewrite is running? The server appends them both to the regular AOF buffer and to a dedicated AOF rewrite buffer; when the child process finishes the rewrite, the rewrite buffer's contents are appended to the new AOF file before it atomically replaces the old one.

2.3 Sentinel high-availability solution

The two sections above mainly cover data durability on a single server. What about guarantees across multiple machines and processes?

Sentinel's role: monitoring the health of service nodes.

When the master node goes down, Sentinel detects it and elects a new master from among the slave nodes:

Sentinel also keeps watching failed master nodes; when one recovers, it is configured to rejoin the cluster as a slave.
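
A minimal sentinel.conf sketch (master name, address, and timeout values below are placeholders):

```
# Watch a master called mymaster; 2 sentinels must agree it is down
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 180000
```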

Besides the Sentinel master-slave failover scheme, there is also Cluster mode to keep Redis highly available; it additionally addresses the storage waste of plain master-slave replication.

Part3 Multithreading in Redis 6.x

The overall flow of the single-threaded model has been explained previously, so I won’t go into it too much here.

Redis's multithreaded model is not multithreaded concurrency in the traditional sense: only the socket read/parse and reply-write portions are parallelized, to remove the system bottleneck caused by I/O time.

Every client command is still executed by the main thread, which avoids contention between threads operating on the same data; only the I/O is parallelized, cutting the resource cost of I/O and raising system throughput. If you think about it, it resembles asynchronous calls in RPC: each request stays bound to its source, and once processing completes, the result is returned to that source.
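
The I/O threads are switched on in redis.conf (available since Redis 6.0; the thread count is a placeholder to tune per machine):

```
# Parallelize socket I/O, not command execution
io-threads 4
io-threads-do-reads yes   # by default only writes use the I/O threads
```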

Part4 Redis best practices

Using Redis as a distributed cache is extremely common. Frequent problems such as cache penetration, cache breakdown, cache avalanche, data drift, cache stampede, cache pollution, and hot keys were already covered in the previous article, "Many strategies, cache is king", and are not repeated here.

Here are some of the concerns for daily development:

  • Key design. Keep keys reasonably short: a long key takes more space, and since the key space is a dictionary, even though lookups are fast, a longer key still lengthens the comparison step.

  • Batch commands. Since most of the cost of a Redis operation is network transmission, turning many round trips into one will very likely improve performance (see the pipelining sketch after this list).

  • Value size. Avoid large values where possible; for the same reason, an oversized value hurts network transmission efficiency. For example, I once fetched the details of 200 products in a single batch call (a lot of data, effectively one big value) and found it very slow; splitting the 200 into four batches of 50 called in parallel improved things markedly. Data compression can also help here.

  • Complex commands. Operations such as sorting and aggregation should be computed offline and cached, rather than computed online with complex commands.

  • Exploit the data structures. Redis's rich data structures are a natural fit for business logic. For example, I have combined message queues with the bitmap structure to store and maintain multiple product states (stock, delisting, flash sale, black/white lists, and so on), and then used GETBIT to decide directly whether a product may be displayed.
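
As a minimal sketch of the batching advice above, using the hiredis C client (key names and values are placeholders; error handling is mostly omitted):

```c
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    /* One round trip for several keys instead of N separate GETs. */
    redisReply *r = redisCommand(c, "MGET key:1 key:2 key:3");
    freeReplyObject(r);

    /* Pipelining: queue commands locally, then read replies in order. */
    redisAppendCommand(c, "SET stock:1001 42");
    redisAppendCommand(c, "GETBIT product:online 1001");
    void *reply;
    redisGetReply(c, &reply); freeReplyObject(reply); /* SET reply */
    redisGetReply(c, &reply); freeReplyObject(reply); /* GETBIT reply */

    redisFree(c);
    return 0;
}
```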

In truth, there is no universal best practice: every business is different and calls for study and experimentation. If you have a great practical case, you are welcome to add it in the comments~

Recommended reading:

1. High concurrency Architecture optimization: Load balancing in detail

2. High Concurrency Architecture optimization: Load balancing practice under trillions of traffic

3. Optimization of high concurrency architecture: The clever use of message-oriented middleware from BAT cases

4. High concurrent storage optimization: Database indexing principles and optimization strategies in detail

5. High concurrent storage optimization: Perhaps one of the most detailed articles on database and table sharding ever written

6. High concurrent storage optimization: Database index optimization Explain combat

7. High concurrent storage optimization: An incomplete analysis of the source code of Ali's data middleware

8. High concurrent storage optimization: Many strategies, cache is king

If you found this article worthwhile, a "like" is genuinely encouraging. You are also welcome to search for and follow the WeChat official account of the same name, so we can exchange ideas and learn together; everyone's encouragement is my motivation!

References

[1] The Design and Implementation of Redis, Huang Jianhong.

Copyright notice: this article is an original work of Coder's Technical Path and may not be reproduced without permission.