“Java Engineers Becoming Gods” has 19K stars on GitHub. Are you really not going to check it out?

Redis is a well-known in-memory database that is used in a wide range of scenarios. Some time ago Redis released version 6.0, and the new version introduces a multi-threaded model.

The in-memory database used at our company is developed in-house, so I had not paid much attention to Redis. But because Redis is so widely used, I need to know it well enough to conduct interviews.

After all, when a candidate says they have used Redis, I can hardly insist on asking them about Alibaba’s Tair instead.

So, after Redis 6.0 came out, I wanted to understand why it adopted multithreading, how the current multithreading differs from previous versions, and why multithreading arrived so late.

Isn’t Redis already using multiplexing? Isn’t its performance supposed to be very high already? Why add a multi-threaded model?

This article will analyze these problems and the thinking behind them.

Why was Redis originally designed to be single-threaded?

As a mature distributed cache framework, Redis is composed of many modules, such as the network request module, the index module, the storage module, the high-availability cluster support module, the data manipulation module, and so on.

Many people say that Redis is single-threaded and assume that every module in Redis runs on a single thread. That is not true.

When we say Redis is single-threaded, we mean that network IO and key-value reads and writes are performed by a single thread. In other words, only the network request module and the data manipulation module are single-threaded. Others, such as the persistent storage module and the cluster support module, are multithreaded.

So it is not that Redis had no multithreading at all: as early as Redis 4.0, multithreading was already used for some commands.

So why didn’t the network request module and the data manipulation module use multithreading in the first place?

The answer to this question is fairly simple: there was no need.

Why was there no need? Let’s start with a more basic question: when do you want to use multiple threads?

Application scenarios for multithreading

While a computer program runs, it performs two main kinds of operations: read and write operations, and computation.

Read and write operations mainly involve I/O, including network I/O and disk I/O. Computation mainly involves the CPU.

The purpose of multithreading is to improve the utilization of I/O and CPU through concurrency.
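To make the I/O half of that claim concrete, here is a minimal Python sketch. It is only an illustration under assumptions: `fake_io`, `run_sequential` and `run_threaded` are hypothetical names, and `time.sleep` stands in for a blocking read or write. While one thread waits, the others can wait too, so the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    """Hypothetical stand-in for a blocking I/O call; the thread just waits."""
    time.sleep(delay)
    return delay

def run_sequential(delays):
    """Perform the waits one after another on a single thread."""
    start = time.perf_counter()
    for d in delays:
        fake_io(d)
    return time.perf_counter() - start

def run_threaded(delays):
    """Perform the same waits on one thread each, so they overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        list(pool.map(fake_io, delays))
    return time.perf_counter() - start

delays = [0.1] * 4
sequential = run_sequential(delays)
threaded = run_threaded(delays)
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The sequential run takes roughly the sum of the delays, while the threaded run takes roughly the longest single delay. Exact timings depend on the machine.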

Does Redis need multithreading to improve I/O utilization and CPU utilization?

First of all, it’s safe to say that Redis does not need to improve CPU utilization, because Redis operations are mostly memory-based and CPU resources are not a performance bottleneck for Redis.

Therefore, it is unnecessary to use multi-threading to improve Redis CPU utilization.

What about using multithreading to improve I/O utilization in Redis? Is it necessary?

Redis is indeed I/O-intensive: a lot of network I/O and disk I/O occurs during data operations. To improve the performance of Redis, you have to improve its I/O utilization.

However, to improve I/O utilization, multithreading is not the only way to go!

Disadvantages of multithreading

In many earlier articles I introduced multithreading techniques in Java, such as the memory model, locking, and CAS. These are techniques Java provides to ensure thread safety when multiple threads are involved.

Thread safety: a programming term that refers to the ability of a function or library to correctly handle variables shared between multiple threads when called in a concurrent environment.

Like Java, every programming language or framework that supports multithreading has to face the same problem: how to control concurrent access to shared resources.

While multithreading can help improve CPU and I/O utilization, the concurrency issues it brings add complexity to these languages and frameworks. Moreover, in a multi-threaded model, switching between threads carries a performance overhead.
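As a small illustration of the extra machinery that shared state requires, here is a sketch in Python rather than Java. `SafeCounter` and `hammer` are hypothetical names chosen for this example; the lock is what keeps the read-modify-write on the shared value from interleaving between threads.

```python
import threading

class SafeCounter:
    """A counter shared by several threads, guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value
        # and one of the increments would be lost.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

def hammer(counter, times):
    """Increment the shared counter `times` times."""
    for _ in range(times):
        counter.increment()

counter = SafeCounter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000
```

Every access now pays for acquiring and releasing the lock, and every new piece of shared state needs the same care. A single-threaded design avoids both costs.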

Therefore, to improve I/O utilization, Redis chose I/O multiplexing rather than multithreading.

Summary

Redis does not use a multi-threaded model in the network request module and the data manipulation module, mainly for the following four reasons:

  • 1. Redis operations are memory-based, and the CPU is not the performance bottleneck for most of them
  • 2. A single-threaded model is easier to maintain, so development, debugging and maintenance cost less
  • 3. A single-threaded model avoids the performance overhead of switching between threads
  • 4. I/O multiplexing on a single thread can also improve the I/O utilization of Redis

Again, keep in mind that Redis is not completely single-threaded; it is the critical network IO and key-value reads and writes that are done by a single thread.

Redis multiplexing

Many readers will be familiar with the word multiplexing. I have mentioned it in many of my earlier posts.

It came up in the introduction to the Linux IO model and in the introduction to HTTP/2.

So, what is the difference between Redis’s multiplexing and what we have covered before?

In Linux, IO multiplexing is a technique that registers the IO of multiple connections on a single channel, and that channel interacts with the kernel on their behalf. When the data needed by one of the requests on the channel is ready, the process copies that data into user space.

Keep that sentence in mind; you may need it later.

That is, multiple IO streams are processed by a single thread.

IO multiplexing in Linux comes in three forms: select, poll, and epoll. In the abstract they offer similar functionality, but the details differ.

In fact, all of Redis’s IO multiplexing code is implemented by wrapping the operating system’s IO multiplexing libraries. Each IO multiplexing library has a separate file in the Redis source code.

In Redis, a file event is generated whenever a socket is ready to accept a connection, write, read, close, and so on. Because a server typically holds many sockets, multiple file events can occur concurrently.

As soon as a request arrives, it is handed to the Redis thread for processing, which gives the effect of one Redis thread handling multiple IO streams.

This is how Redis uses IO multiplexing to improve I/O utilization.
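The idea of one thread serving many IO streams can be sketched with Python’s standard `selectors` module, which wraps select, poll, epoll or kqueue depending on the platform. This is not how Redis implements its event loop; it is a toy under assumptions, with `serve_ready` as a hypothetical helper and `socket.socketpair()` standing in for real client connections.

```python
import selectors
import socket

def serve_ready(sel, expected):
    """One thread waits on every registered socket and serves whichever is ready."""
    handled = 0
    while handled < expected:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became ready in time; stop instead of spinning
        for key, _mask in events:
            conn = key.fileobj
            data = conn.recv(1024)
            if data:
                conn.sendall(data.upper())  # reply on the same connection
                handled += 1
    return handled

# Three connected socket pairs play the role of three client connections.
pairs = [socket.socketpair() for _ in range(3)]
sel = selectors.DefaultSelector()
for _client, server in pairs:
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

for i, (client, _server) in enumerate(pairs):
    client.sendall(f"ping {i}".encode())

handled = serve_ready(sel, expected=len(pairs))
replies = [client.recv(1024).decode() for client, _server in pairs]
print(handled, replies)

sel.close()
for client, server in pairs:
    client.close()
    server.close()
```

No thread is created per connection: the single loop only touches sockets that the selector reports as readable.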

The high performance of Redis is not due to multiplexing and single threading alone. Several factors contribute:

  • 1. It is completely memory-based; most requests are pure memory operations, which are very fast.

  • 2. Its data structures are simple and so are the operations on them; structures such as hash tables and skip lists perform well.

  • 3. It uses a single thread, which avoids unnecessary context switches and race conditions, so no CPU is spent on switching between processes or threads.

  • 4. It uses the I/O multiplexing model.

Why did Redis 6.0 introduce multithreading?

In May 2020, Redis officially released version 6.0, which has many important new features. Among them, multithreading has attracted a lot of attention.

However, a reminder: the multithreading in Redis 6.0 applies only to the processing of network requests. Data read and write commands are still processed by a single thread.

Still, some of you may be asking:

Doesn’t Redis boast that a single thread already gives it high performance?

Multiplexing has already greatly improved I/O utilization. Why add multithreading?

Mainly because we now demand more of Redis.

Redis keeps all of its data in memory, and memory responds in roughly 100 nanoseconds. For small packets, a Redis server can handle 80,000 to 100,000 QPS, so single-threaded Redis is enough for about 80% of companies.
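A quick back-of-the-envelope calculation puts those two numbers side by side. It is only a rough sketch: `per_request_budget_ns` is a hypothetical helper, and treating a request as a single 100-nanosecond memory access is a simplifying assumption, not a measurement.

```python
NANOS_PER_SECOND = 1_000_000_000
MEMORY_ACCESS_NS = 100  # memory response time quoted in the text

def per_request_budget_ns(qps):
    """Average time a single thread can spend on each request at a given QPS."""
    return NANOS_PER_SECOND / qps

for qps in (80_000, 100_000):
    budget = per_request_budget_ns(qps)
    share = MEMORY_ACCESS_NS / budget  # assumes one memory access per request
    print(f"{qps} QPS -> {budget:.0f} ns per request, memory access is {share:.1%} of it")
```

At 80,000 to 100,000 QPS a single thread has 10,000 to 12,500 nanoseconds per request. Under the assumption above, the memory access itself is about 1% of that, which suggests most of the time goes to something other than memory.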

However, as business scenarios grow more complex, some companies routinely handle hundreds of millions of transactions and therefore need higher QPS.

To raise QPS, many companies deploy Redis clusters and keep adding Redis machines. But this consumes a huge amount of resources.

Analysis shows that the main bottleneck limiting Redis performance is the processing of network IO, even though multiplexing is already in use. As we mentioned earlier, the IO multiplexing model is still essentially a synchronous, blocking IO model.

Consider how the select function works in IO multiplexing: the process calls select and blocks until at least one socket is ready, then it reads from that socket and the data is copied from the kernel into user space.

As this flow shows, in the IO multiplexing model the process is blocked while it calls select (or a similar function) to handle network requests. In other words, this step blocks the thread, which can become a bottleneck when concurrency is high.
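The blocking behaviour is easy to observe with Python’s `select` module. This is a minimal sketch, not Redis code: `delayed_send` is a hypothetical helper that plays a slow client, and the 0.2 second delay is arbitrary.

```python
import select
import socket
import threading
import time

reader, writer = socket.socketpair()

# Nothing has been written yet, so a zero-timeout poll finds no ready socket.
ready_before, _, _ = select.select([reader], [], [], 0)

def delayed_send():
    """Hypothetical slow client: sends its request after a short delay."""
    time.sleep(0.2)
    writer.sendall(b"SET k v")

sender = threading.Thread(target=delayed_send)
sender.start()

start = time.perf_counter()
# The calling thread is blocked here until the socket becomes readable.
ready_after, _, _ = select.select([reader], [], [], 5.0)
waited = time.perf_counter() - start

# Second step: the data is copied from the kernel into user space.
payload = reader.recv(1024)
sender.join()
print(ready_before, waited, payload)

reader.close()
writer.close()
```

While `select` is waiting, the thread that called it does nothing else, and the subsequent `recv` is also done synchronously by that same thread.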

Many servers now have multiple CPU cores. But because Redis uses a single thread, a large share of CPU time during each data operation is spent on synchronous network IO processing, and the advantages of multiple cores go unused.

If network requests can be processed concurrently by multiple threads, performance can improve considerably. Besides reducing the impact of waiting on network I/O, multithreading also makes full use of a multi-core CPU.

Therefore, Redis 6.0 uses multiple IO threads to process network requests. Parsing of network requests can be done by other threads, and the parsed requests are then handed to the main thread for the actual memory reads and writes. This increases the parallelism of network request processing and thus improves overall performance.

However, Redis’s IO threads are only used to handle network requests. Redis still uses a single thread for read and write commands.

So, now that multithreading has been introduced, how does Redis deal with the thread safety problems that concurrency brings?

This is why we have said several times that multithreading in Redis 6.0 is only used to handle network requests, while data is read and written by a single thread.

Redis 6.0 uses multiple threads only to receive and parse network requests and to send the requested data back over the network. Data reads and writes are done by a single thread, so they raise no concurrency problems.
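The division of labour described above can be sketched in a few lines of Python. This is a toy model under assumptions, not the Redis implementation: `parse_request`, `execute`, `format_reply` and `handle_batch` are hypothetical names, the wire format is a simplified space-separated command rather than the real Redis protocol, and Python threads here only illustrate who does which step, not real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_request(raw):
    """IO-thread work: turn raw bytes into a command (simplified format)."""
    return raw.decode().strip().split()

def format_reply(value):
    """IO-thread work: turn a result into bytes to send back."""
    return f"{value}\r\n".encode()

def execute(store, command):
    """Main-thread work: the only code that touches the data."""
    name, *args = command
    if name.upper() == "SET":
        store[args[0]] = args[1]
        return "OK"
    if name.upper() == "GET":
        return store.get(args[0], "(nil)")
    return "ERR unknown command"

def handle_batch(raw_requests, io_threads=4):
    store = {}  # shared data, accessed only from the calling (main) thread
    with ThreadPoolExecutor(max_workers=io_threads) as io_pool:
        # Step 1: parsing is spread across the IO threads.
        commands = list(io_pool.map(parse_request, raw_requests))
        # Step 2: commands run one by one on the main thread, so no lock is needed.
        results = [execute(store, cmd) for cmd in commands]
        # Step 3: replies are formatted by the IO threads again.
        replies = list(io_pool.map(format_reply, results))
    return replies

replies = handle_batch([b"SET k v\r\n", b"GET k\r\n", b"GET missing\r\n"])
print(replies)
```

Because `store` is only ever touched in step 2, the parsing and reply steps can run in parallel without any locking around the data.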

About the author: Hollis, a developer with a distinctive passion for coding, is a technical expert at Alibaba, co-author of “Three Courses for Programmers”, and author of the article series “Java Engineers Becoming Gods”.
