Search WeChat for the "Water Drops and Silver Bullets" official account to get high-quality technical articles as soon as they are published. I'm a senior back-end developer with 7 years of experience, here to show you a different technical perspective.

Hi, I’m Kaito.

I hear a lot of people talking about whether it’s appropriate to use Redis as a queue.

Some people agree, arguing that Redis is lightweight and convenient to use as a queue.

Others argue that Redis will “lose” data and that it is better to use “professional” queue middleware.

Which is better?

In this article, I’ll talk to you about the appropriateness of Redis as a queue.

I will walk you through the details step by step, from simple to complex, and make this problem really clear.

After reading this article, I hope you will have a fresh perspective on this issue.

At the end of the article, I will also share some thoughts on "technology selection." The article is a bit long; I hope you can read it patiently.

Start with the simplest: the List queue

First, let’s start with the simplest scenario.

If your business needs are simple enough to use Redis as a queue, the first thing that comes to mind is using the List data type.

Because the underlying implementation of a List is a linked list, which supports operating on elements at the head and tail in O(1) time, it fits the message queue model very well.

If you think of a List as a queue, you can use it this way.

Producers use LPUSH to publish messages:

127.0.0.1:6379> LPUSH queue msg1
(integer) 1
127.0.0.1:6379> LPUSH queue msg2
(integer) 2

On the consumer side, pull messages using RPOP:

127.0.0.1:6379> RPOP queue
"msg1"
127.0.0.1:6379> RPOP queue
"msg2"

The model is very simple and easy to understand.

However, there is a small problem: when there are no messages in the queue, RPOP returns NULL to the consumer.

127.0.0.1:6379> RPOP queue
(nil)    // No message

When we write consumer logic, it is usually an "infinite loop": the consumer needs to keep pulling messages from the queue and processing them. The pseudocode would read:

while true:
    msg = redis.rpop("queue")
    if msg == null:    // No message, continue the loop
        continue
    handle(msg)        // Handle the message

If the queue is empty, consumers will still pull messages frequently, which will cause “CPU idling”, wasting CPU resources and putting pressure on Redis.

How to solve this problem?

Quite simply, when the queue is empty, we can “sleep” for a while and then try to pull the message. The code could be modified to look like this:

while true:
    msg = redis.rpop("queue")
    if msg == null:    // No message, sleep for 2s and try again
        sleep(2)
        continue
    handle(msg)        // Handle the message

This solves the CPU idling problem.

This creates another problem, though: while the consumer is asleep, new messages are processed with a "delay."

Assuming the sleep time is set to 2s, new messages will have a maximum delay of 2s.

The only way to reduce this delay is to reduce the length of sleep. However, a smaller sleep time may cause CPU idling problems.

You can’t have your cake and eat it too.

So how do you keep your CPU from idling while processing new messages in real time?

Is there a mechanism in Redis where, if the queue is empty, the consumer "blocks and waits" when pulling messages, and is notified to process a new message as soon as one arrives?

Fortunately, Redis does provide a “blocking” command for pulling messages: BRPOP/BLPOP, where the B stands for Block.

Now you can pull messages like this:

while true:
    msg = redis.brpop("queue", 0)    // No message: block and wait; 0 means no timeout
    if msg == null:
        continue
    handle(msg)                      // Handle the message

When pulling messages with the blocking BRPOP, you can also pass in a timeout. If it is set to 0, there is no timeout and the call does not return until a new message arrives; otherwise, NULL is returned once the specified timeout expires.

This is a good solution because it achieves both efficiency and avoids CPU idling, killing two birds with one stone.

Note: if the timeout is set too long and the connection stays inactive, Redis Server may judge the connection to be invalid and force the client offline. So with this scheme, the client must have a reconnection mechanism.
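If it helps to see the polling-versus-blocking difference in runnable form, Python's standard queue.Queue behaves much like a Redis List with BRPOP: get() blocks until an item arrives. This is only an illustrative analogy, not the Redis client API:

```python
import queue
import threading
import time

# A stand-in for a Redis List: queue.Queue.get() blocks like BRPOP,
# waking the consumer the moment a message arrives (no polling, no sleep).
q = queue.Queue()
results = []

def consumer():
    msg = q.get(timeout=5)   # like BRPOP with a 5s timeout
    results.append(msg)

t = threading.Thread(target=consumer)
t.start()
time.sleep(0.1)              # the consumer is now blocked, waiting
q.put("msg1")                # "publish" a message; the consumer wakes immediately
t.join()
print(results)               # ['msg1']
```

The consumer burns no CPU while waiting, yet processes the new message the instant it lands, which is exactly the property BRPOP gives you.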

So once you’ve solved the problem of messages not being processed in a timely manner, you can think about, what’s the downside of this queue model?

Let’s analyze it:

  1. Repeated consumption is not supported: once a consumer pulls a message, the message is deleted from the List and cannot be consumed by other consumers. In other words, multiple consumers cannot consume the same data
  2. Messages can be lost: if a consumer crashes after pulling a message, the message is lost

The first problem is a functional one: a List-based message queue only supports the simplest model, one group of producers to one group of consumers, and cannot satisfy business scenarios with multiple groups of producers and consumers.

The second problem is trickier, because when a message pops out of the List, it is immediately removed from the List. That is, the message cannot be consumed again, whether or not the consumer processes it successfully.

This also means that if a consumer is abnormally down while processing a message, the message is effectively lost.

How to solve these two problems? Let’s look at them one by one.

Publish/subscribe model: Pub/Sub

As the name suggests, this module is designed by Redis specifically for the publish/subscribe queue model.

It solves the first problem mentioned earlier, the lack of repeated consumption: it supports multiple groups of producers and consumers. Let's see how it works.

Redis provides PUBLISH/SUBSCRIBE commands to complete PUBLISH and SUBSCRIBE operations.

Suppose you want to open two consumers and consume the same batch of data at the same time.

First, use the SUBSCRIBE command to start two consumers and “SUBSCRIBE” to the same queue.

127.0.0.1:6379> SUBSCRIBE queue
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "queue"
3) (integer) 1

At this point, both consumers are blocked, waiting for new messages to arrive.

After that, a producer is started and a message is published.

127.0.0.1:6379> PUBLISH queue msg1
(integer) 1

At this point, the two consumers are unblocked and receive a new message from the producer.

127.0.0.1:6379> SUBSCRIBE queue
// Receive the new message
1) "message"
2) "queue"
3) "msg1"

As you can see, the Pub/Sub scheme, which also supports blocking message pulls, nicely satisfies the business need for multiple groups of consumers to consume the same batch of data.

In addition, Pub/Sub also provides a "pattern subscription" mode, which allows consumers to subscribe to "multiple" queues of interest according to a matching rule.

127.0.0.1:6379> PSUBSCRIBE queue.*
Reading messages... (press Ctrl-C to quit)
1) "psubscribe"
2) "queue.*"
3) (integer) 1

The consumer here subscribes to queue.* related queue messages.

Producers then publish messages to queue.p1 and queue.p2, respectively.

127.0.0.1:6379> PUBLISH queue.p1 msg1
(integer) 1
127.0.0.1:6379> PUBLISH queue.p2 msg2
(integer) 1

At this point, the consumer can receive messages from these two producers.

127.0.0.1:6379> PSUBSCRIBE queue.*
Reading messages... (press Ctrl-C to quit)
// Message from queue.p1
1) "pmessage"
2) "queue.*"
3) "queue.p1"
4) "msg1"
// Message from queue.p2
1) "pmessage"
2) "queue.*"
3) "queue.p2"
4) "msg2"

As we can see, the biggest advantage of Pub/Sub is that it supports multiple groups of producers and consumers to process messages.

Having said its advantages, what are its disadvantages?

The biggest problem with Pub/Sub is data loss.

Data loss may occur if:

  1. Consumer offline
  2. Redis downtime
  3. Messages are stacked

What the hell is going on?

This actually has a lot to do with how Pub/Sub is implemented.

Pub/Sub is very simple to implement. It is not based on any data type, nor does it do any data storage. It simply establishes a “data forwarding channel” for producers and consumers, and forwards the data that meets the rules from one end to the other.

A complete publish/subscribe message processing process looks like this:

  1. When a consumer subscribes to a specified queue, Redis records a mapping: queue -> consumer
  2. The producer publishes a message to the queue, and Redis finds the corresponding consumer from the mapping and forwards the message to it

You see, there’s no data stored, everything is forwarded in real time.

This design scheme leads to the problems mentioned above.

For example, if a consumer hangs abnormally, it can only receive new messages when it comes back online, and messages posted by the producer during the offline period are discarded because the consumer can’t be found.

If all consumers are offline, the messages posted by the producer will also be “discarded” because no consumers can be found.

So, when you use Pub/Sub, be careful: consumers must subscribe to queues before producers can publish messages, or messages will be lost.

This is why, in the previous example, we asked the consumer to subscribe to the queue before we asked the producer to publish the message.
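The forward-only behavior described above can be captured in a tiny Python sketch. This is a toy model for intuition, not Redis internals:

```python
# Toy model of Pub/Sub: a mapping of channel -> subscriber inboxes, no storage.
# Messages published to a channel with no subscribers are simply dropped.
subscribers = {}   # channel -> list of inboxes (one per consumer)

def subscribe(channel):
    inbox = []
    subscribers.setdefault(channel, []).append(inbox)
    return inbox

def publish(channel, msg):
    # Forward only to currently-registered consumers; nothing is persisted
    targets = subscribers.get(channel, [])
    for inbox in targets:
        inbox.append(msg)
    return len(targets)   # like PUBLISH, returns the number of receivers

publish("queue", "lost-msg")   # no subscribers yet: the message is discarded
inbox = subscribe("queue")
publish("queue", "msg1")       # delivered to the subscriber
print(inbox)                   # ['msg1'] -- 'lost-msg' is gone forever
```

Because there is no storage step between publish and forward, any message published while a consumer is absent can never be recovered.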

Also, because Pub/Sub is not implemented on any data type, it does not have “data persistence” capabilities.

In other words, Pub/Sub operations will not be written to the RDB and AOF. When Redis is down and restarted, Pub/Sub data will be lost.

Finally, why does Pub/Sub lose data when dealing with “message backlog”?

When consumers can’t keep up with producers, there’s a backlog of data.

If a List is used as a queue, messages are backlogged and the linked List becomes very long. The most immediate effect is that Redis memory continues to grow until the consumer removes all data from the linked List.

However, Pub/Sub is handled differently, and when messages are backlogged, it can lead to consumption failures and message loss!

What’s going on here?

Back to the implementation details of Pub/Sub.

When each consumer subscribes to a queue, Redis allocates a “buffer” on the Server to that consumer, which is essentially a chunk of memory.

When a producer publishes a message, Redis first writes the message to the corresponding consumer’s buffer.

After that, the consumer continually reads messages from the buffer and processes them.

However, the problem lies in this buffer.

The problem is that the buffer has an upper limit (configurable). If the consumer pulls messages slowly, the messages published by producers start to back up in the buffer, and the buffer memory keeps growing.

If the buffer configuration limit is exceeded, Redis will “force” the consumer offline.

That’s when the consumer fails and loses data.

If you have looked at the Redis configuration file, you will see the default configuration of this buffer: client-output-buffer-limit pubsub 32mb 8mb 60.

Its parameters have the following meanings:

  • 32MB: Once the buffer exceeds 32MB, Redis simply forces the consumer offline
  • 8MB + 60: If the buffer exceeds 8MB and lasts for 60 seconds, Redis will also kick the consumer offline
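The kick-out rule above can be expressed as a small function. This is a sketch of the policy as described, with the default pubsub limits as parameters:

```python
# Sketch of client-output-buffer-limit pubsub <hard> <soft> <soft-seconds>:
# a consumer is disconnected if its buffer ever exceeds the hard limit, or
# if it stays above the soft limit for soft_seconds in a row.
def should_disconnect(buf_bytes, soft_since, now,
                      hard=32 * 1024 * 1024,      # 32MB hard limit
                      soft=8 * 1024 * 1024,       # 8MB soft limit
                      soft_seconds=60):
    if buf_bytes > hard:
        return True                               # over 32MB: kick immediately
    if buf_bytes > soft and soft_since is not None:
        return now - soft_since >= soft_seconds   # over 8MB for 60s: kick
    return False

assert should_disconnect(33 * 1024 * 1024, None, now=0)           # hard limit hit
assert should_disconnect(9 * 1024 * 1024, soft_since=0, now=61)   # soft limit held too long
assert not should_disconnect(9 * 1024 * 1024, soft_since=0, now=10)
```

The function names and the way elapsed time is tracked are illustrative assumptions; only the three thresholds come from the configuration line above.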

Pub/Sub differs from List in this respect.

From here you can see that List is really a “pull” model and Pub/Sub is really a “push” model.

The data in the List can be stored in memory forever, and consumers can “pull” at any time.

But Pub/Sub “pushes” messages to the consumer’s buffer on Redis Server and waits for the consumer to retrieve them.

When the production and consumption rates do not match, the memory of the buffer starts to swell. In order to control the upper limit of the buffer, Redis has the mechanism described above to force consumers to kick out.

Ok, now let’s summarize the pros and cons of Pub/Sub:

  1. Support for publish/subscribe, support for multiple groups of producers and consumers processing messages
  2. Consumers go offline and data is lost
  3. Data persistence is not supported, Redis is down, and data will be lost
  4. Messages pile up, buffers overflow, consumers are forced offline, and data is lost

Have you noticed that, apart from the first item, which is an advantage, the rest are all weaknesses?

As a result, many people consider Pub/Sub's features to be of little practical value.

For these reasons, Pub/Sub is not used much in practical application scenarios.

In practice, Pub/Sub is mainly used for communication between Sentinels and Redis instances, because that scenario happens to fit Pub/Sub's real-time, loss-tolerant characteristics.

Also, Pub/Sub does not solve the second problem either: a message that a consumer was processing when it crashed cannot be consumed again.

Once Pub/Sub hands data from the buffer to the consumer, that data is deleted from the Redis buffer, and a consumer that crashes mid-processing has no way to read it again.

Ok, now let’s reframe our requirements when using message queues.

When we use a message queue, we want it to do the following:

  • Support blocking waits when pulling messages
  • Support the publish/subscribe model
  • If consumption fails, messages can be re-consumed and are not lost
  • If the instance goes down, messages are not lost and data is persisted
  • Message backlogs can be handled

Does Redis have data types other than List and Pub/Sub that meet these requirements?

In fact, the authors of Redis also see these problems, and have been working towards these directions.

The Redis authors also developed an open source project, Disque, while working on Redis.

The positioning of this project is a distributed message queue middleware based on memory.

But for various reasons, the project never really took off.

Finally, in Redis 5.0, the author ported Disque's functionality into Redis and gave it a new data type: Stream.

Now let’s see, does it meet the requirements mentioned above?

Mature queue: Stream

Let’s look at how Stream solves these problems.

Again, from simple to complex, how does a Stream queue messages?

First, Stream implements the simplest production/consumption model with XADD and XREAD:

  • XADD: Publishes messages
  • XREAD: Reads messages

The producer issues two messages:

// The "*" tells Redis to generate the message ID automatically
127.0.0.1:6379> XADD queue * name zhangsan
"1618469123380-0"
127.0.0.1:6379> XADD queue * name lisi
"1618469127777-0"

Publish messages using the XADD command, where the “*” means that Redis automatically generates a unique message ID.

The format of the message ID is "millisecond timestamp-incrementing sequence number."
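To make the ID format concrete, here is a small sketch of how such IDs order and increment. The exact generation logic inside Redis is an assumption here; only the "timestamp-sequence" format comes from the text:

```python
# A Stream message ID is "<millisecond timestamp>-<sequence number>", e.g.
# "1618469123380-0". IDs are totally ordered, so they double as read cursors.
def parse_id(msg_id):
    ts, seq = msg_id.split("-")
    return (int(ts), int(seq))

def next_id(last_id, now_ms):
    # Sketch of generating a unique, increasing ID: bump the sequence
    # number when two messages land in the same millisecond.
    last_ts, last_seq = parse_id(last_id)
    if now_ms > last_ts:
        return f"{now_ms}-0"
    return f"{last_ts}-{last_seq + 1}"

assert parse_id("1618469127777-0") > parse_id("1618469123380-0")
assert next_id("1618469123380-0", 1618469123380) == "1618469123380-1"
assert next_id("1618469123380-1", 1618469123999) == "1618469123999-0"
```

Because IDs only ever grow, "give me everything after ID X" is a well-defined query, which is what XREAD relies on below.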

The consumer pulls messages:

// Read 5 messages from the beginning (0-0 means from the start)
127.0.0.1:6379> XREAD COUNT 5 STREAMS queue 0-0
1) 1) "queue"
   2) 1) 1) "1618469123380-0"
         2) 1) "name"
            2) "zhangsan"
      2) 1) "1618469127777-0"
         2) 1) "name"
            2) "lisi"

If you want to continue pulling messages, pass in the ID of the previous message:

127.0.0.1:6379> XREAD COUNT 5 STREAMS queue 1618469127777-0
(nil)

No message, Redis returns NULL.

This is Stream’s simplest production and consumption.

I won't dwell on every parameter of the Stream commands in these demonstrations. In my examples, all uppercase words are "fixed" keywords, and all lowercase words (queue names, lengths, and so on) are values you can define yourself. The later examples follow the same convention; I point this out here to make them easier to follow.

How does a Stream address the message queue requirement mentioned above?

1) Does Stream support “blocking” pull messages?

Yes, just add the BLOCK argument when reading the message.

127.0.0.1:6379> XREAD COUNT 5 BLOCK 0 STREAMS queue 1618469127777-0

At this point, the consumer blocks and waits until the producer releases a new message.

2) Does the Stream support publish/subscribe?

No problem. Stream implements publish/subscribe with the following commands:

  • XGROUP: creates a consumer group
  • XREADGROUP: pulls messages as a consumer in the specified consumer group

Now, how do we do that?

First, the producer still publishes two messages:

127.0.0.1:6379> XADD queue * name zhangsan
"1618470740565-0"
127.0.0.1:6379> XADD queue * name lisi
"1618470743793-0"

After that, since we want two groups of consumers to process the same batch of data, we need to create two consumer groups:

// Create consumer group 1; 0-0 means pulling messages from the beginning
127.0.0.1:6379> XGROUP CREATE queue group1 0-0
OK
// Create consumer group 2, also pulling from the beginning
127.0.0.1:6379> XGROUP CREATE queue group2 0-0
OK

Once the consumer groups are created, we can attach a “consumer” to each of them and let them process the same set of data.

The first consumer group starts consuming:

// group1's consumer starts consuming; ">" means reading messages not yet delivered to this group
127.0.0.1:6379> XREADGROUP GROUP group1 consumer COUNT 5 STREAMS queue >
1) 1) "queue"
   2) 1) 1) "1618470740565-0"
         2) 1) "name"
            2) "zhangsan"
      2) 1) "1618470743793-0"
         2) 1) "name"
            2) "lisi"

Similarly, the second consumer group starts consuming:

// group2's consumer starts consuming the same messages
127.0.0.1:6379> XREADGROUP GROUP group2 consumer COUNT 5 STREAMS queue >
1) 1) "queue"
   2) 1) 1) "1618470740565-0"
         2) 1) "name"
            2) "zhangsan"
      2) 1) "1618470743793-0"
         2) 1) "name"
            2) "lisi"

As we can see, both groups of consumers receive the same data and can process it independently.

In this way, the purpose of “subscription” consumption by multiple groups of consumers is achieved.
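The key point, that each group keeps its own read position over the same stored messages, can be modeled in a few lines of Python. This is a toy model, not the real Stream implementation:

```python
# Toy model of consumer groups: the stream stores every message exactly once,
# and each group keeps its own "last delivered" cursor into it.
stream = []        # list of (id, payload), shared by all groups
cursors = {}       # group name -> index of the next message to deliver

def xadd(msg_id, payload):
    stream.append((msg_id, payload))

def xgroup_create(group):
    cursors[group] = 0     # like XGROUP CREATE ... 0-0: start from the top

def xreadgroup(group, count):
    start = cursors[group]
    batch = stream[start:start + count]
    cursors[group] += len(batch)
    return batch

xadd("1-0", "zhangsan")
xadd("2-0", "lisi")
xgroup_create("group1")
xgroup_create("group2")
print(xreadgroup("group1", 5))   # both groups see the full batch,
print(xreadgroup("group2", 5))   # because each advances its own cursor
```

This is why Stream, unlike List, supports multiple groups: consuming a message only advances the group's cursor rather than deleting the data.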

3) If the message processing is abnormal, can the Stream ensure that the message is not lost and re-consumed?

Besides serving as the cursor for pulling messages, as shown above, the message ID is also what makes re-consumption possible.

When a consumer in a group has finished processing a message, it needs to execute the XACK command to notify Redis, and Redis then marks that message as "processing completed."

// Message 1618472043089-0 in group1 has been processed successfully
127.0.0.1:6379> XACK queue group1 1618472043089-0

If the consumer is abnormally down, the XACK will definitely not be sent, and Redis will keep the message.

After the group of consumers come back online, Redis will send the data that was not processed successfully to the consumer again. This way, even if the consumer is abnormal, the data will not be lost.

// After coming back online, read messages that were delivered but not acknowledged (starting from 0-0)
127.0.0.1:6379> XREADGROUP GROUP group1 consumer1 COUNT 5 STREAMS queue 0-0
1) 1) "queue"
   2) 1) 1) "1618472043089-0"
         2) 1) "name"
            2) "zhangsan"
      2) 1) "1618472045158-0"
         2) 1) "name"
            2) "lisi"
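The acknowledgment mechanism can be sketched as a pending list per group. This toy model (not Redis internals) shows why an un-acked message survives a consumer crash:

```python
# Toy model of a pending list: a delivered message stays "pending" until the
# consumer acknowledges it, so a crashed consumer can read it again later.
pending = {}   # group -> {msg_id: payload}

def deliver(group, msg_id, payload):
    pending.setdefault(group, {})[msg_id] = payload

def xack(group, msg_id):
    pending[group].pop(msg_id, None)   # processing confirmed: forget it

def redeliver(group):
    # What re-reading after a crash amounts to: return un-acked messages
    return dict(pending.get(group, {}))

deliver("group1", "1618472043089-0", "zhangsan")
deliver("group1", "1618472045158-0", "lisi")
xack("group1", "1618472043089-0")      # first message processed successfully
print(redeliver("group1"))             # only the un-acked message comes back
```

Since acknowledgment is a separate step from delivery, a crash between the two leaves the message pending instead of lost.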

4) Will Stream data be written to RDB and AOF for persistence?

Stream is a new data type that, like any other data type, writes every write to the RDB and AOF.

We just need to configure the persistence policy so that if Redis goes down and restarts, the Stream data can be recovered from the RDB or AOF.

5) How to deal with Stream when messages pile up?

In fact, there are generally only two solutions when messages pile up on message queues:

  1. Producer rate limiting: slow down production when consumers cannot process in time, to avoid a continuously growing backlog
  2. Discarding messages: the middleware discards old messages and retains only a fixed number of new ones

Redis uses the second solution when implementing Stream.

When publishing messages, you can specify a maximum length for the queue to prevent a backlog from blowing up memory.

// Cap the queue length at 10000
127.0.0.1:6379> XADD queue MAXLEN 10000 * name zhangsan
"1618473015018-0"

When the queue length exceeds the upper limit, old messages are deleted and only new messages of fixed length are retained.

If you specify a maximum length, a Stream may lose messages when messages are backlogged.
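This trimming behavior is easy to picture with Python's bounded deque, which evicts the oldest element when full, much like MAXLEN (an analogy, not the Stream implementation):

```python
from collections import deque

# MAXLEN-style trimming: once the queue is full, appending a new message
# silently evicts the oldest one instead of growing memory without bound.
trimmed_queue = deque(maxlen=3)    # think: XADD queue MAXLEN 3 ...

for i in range(5):
    trimmed_queue.append(f"msg{i}")

print(list(trimmed_queue))         # ['msg2', 'msg3', 'msg4'] -- msg0/msg1 were dropped
```

The trade-off is exactly the one described above: memory stays bounded, but the oldest messages are gone before anyone consumes them.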

Beyond the commands mentioned above, Stream also supports simple operations such as checking the queue length (XLEN) and viewing consumer status (XINFO). You can check the official documentation for more information.

Redis Stream covers almost all scenarios of message queues. Is that perfect?

Since it is so powerful, does this mean that Redis can be used as a professional message queue middleware?

Not quite. Even with all of these capabilities, Redis still falls just short of a professional message queue.

The reason is that Redis itself has some issues that leave it somewhat lacking when positioned as a message queue.

At this point, you have to compare Redis to professional queue middleware.

Let’s take a look at what Redis lacks when it comes to queueing.

Contrast this with a professional message queue

In fact, a professional message queue must address two things:

  1. Messages must not be lost
  2. Message backlogs must be handled

The first point has been the focus of much of our discussion.

Here, let's change perspective and analyze the "usage model" of a message queue: what does it take to ensure that data is not lost?

When using a message queue, there are three roles involved: producers, queue middleware, and consumers.

Whether a message will be lost depends on the following three aspects:

  1. Does the producer lose messages?
  2. Will consumers lose messages?
  3. Does queuing middleware lose messages?

1) Will the producer lose messages?

When a producer publishes a message, the following exceptions may occur:

  1. Message not sent: The middleware returns a failure when the publication fails due to network failure or other problems
  2. Uncertain whether publishing is successful: Publishing times out due to a network problem. Data may be sent successfully, but reading response results times out

If it’s case 1, and the message doesn’t go out at all, then it’s good to send it again.

In case 2, the producer has no way of knowing whether the message was sent or not, right? So, to avoid message loss, it can only continue to retry until the publication is successful.

The producer usually sets a maximum number of retries. If the maximum number of retries exceeds the threshold, the system still fails and logs alarms.

In other words, to avoid message loss, the producer can only process the message in a retry mode.
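A minimal sketch of this retry loop, with the function names and the max-retries value as illustrative assumptions:

```python
# Sketch of producer-side retries: on a timeout the producer cannot know
# whether the publish landed, so it retries up to a maximum, then alarms.
class PublishTimeout(Exception):
    pass

def publish_with_retry(send, msg, max_retries=3):
    for _attempt in range(max_retries):
        try:
            return send(msg)
        except PublishTimeout:
            continue               # unsure whether it landed: send again
    raise RuntimeError("publish failed after retries, raise an alarm")

attempts = []
def flaky_send(msg):               # times out twice, then succeeds,
    attempts.append(msg)           # so the same message is sent 3 times
    if len(attempts) < 3:
        raise PublishTimeout()
    return "OK"

result = publish_with_retry(flaky_send, "msg1")
print(result, len(attempts))       # OK 3 -- duplicates are possible by design
```

Notice that the retried message reached the stub three times: retrying trades "maybe lost" for "maybe duplicated," which is what the next paragraphs address.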

But have you noticed? This also means that messages may be sent repeatedly.

Yes. When using a message queue, to guarantee that a message is never lost, it is better to resend it than to risk discarding it.

On the consumer side, we need to do a little more logic.

For sensitive business, when consumers receive repeated data, idempotent logic should be designed to ensure the correctness of business.
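A minimal sketch of such idempotent handling, deduplicating by message ID. The in-memory set is an illustrative stand-in; real systems typically use a database unique key or similar:

```python
# Consumer-side idempotency: remember processed message IDs so that a
# redelivered duplicate (from producer retries or re-consumption) is a no-op.
processed_ids = set()
side_effects = []

def handle_idempotently(msg_id, payload):
    if msg_id in processed_ids:
        return False               # duplicate: skip the business logic
    side_effects.append(payload)   # e.g. debit an account exactly once
    processed_ids.add(msg_id)
    return True

handle_idempotently("1618469123380-0", "debit 100")
handle_idempotently("1618469123380-0", "debit 100")   # redelivered duplicate
print(side_effects)                # ['debit 100'] -- applied only once
```

With producer retries ensuring at-least-once delivery and consumer idempotency absorbing the duplicates, the business outcome stays correct.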

From this point of view, whether a producer loses a message depends on how well the producer handles the exception.

So, whether it is Redis or professional queue middleware, the producer can guarantee that the message is not lost at this point.

2) Will consumers lose messages?

This is the situation we mentioned before: the consumer crashes after pulling a message but before finishing processing it. Can the consumer re-consume the failed message?

To solve this problem, the consumer must "tell" the queue middleware once it has finished processing a message, so the middleware can mark the message as completed; otherwise, the middleware will keep delivering the message to the consumer.

This scheme requires the cooperation of the consumer and the middleware to ensure that the message on the consumer side is not lost.

Both Redis Stream and professional queue middleware such as RabbitMQ and Kafka do this.

So, in that respect, Redis also qualified.

3) Will the queue middleware lose messages?

The first two problems are relatively easy to deal with, as long as the client and server work well together, you can ensure that the production end and the consumer end do not lose messages.

But what if queue middleware is inherently unreliable?

After all, producers and consumers depend on it, and if it’s unreliable, then no matter what producers and consumers do, they can’t guarantee data loss.

In this respect, Redis actually falls short.

Redis causes data loss in the following two scenarios.

  1. AOF persistence is typically configured to flush to disk once per second; because the flush is asynchronous, the most recent writes may be lost when Redis goes down
  2. Master/slave replication is also asynchronous; during a master/slave switchover, data that was written to the master but not yet synchronized to the slave may be lost

Based on the above reasons, we can see that Redis itself cannot guarantee strict data integrity.

Therefore, if Redis is treated as a message queue, it is possible to lose data in this respect.

How do professional message queue middleware solve this problem?

Professional queue middleware, such as RabbitMQ or Kafka, is usually deployed as a cluster. When a producer publishes a message, the middleware writes it to "multiple nodes," i.e. keeps multiple copies. In this way, even if one of the nodes fails, the data in the cluster is not lost.
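The multi-node idea can be sketched as a quorum write: acknowledge a publish only after enough replicas hold the message. A toy model under obviously simplified assumptions (real clusters replicate over the network and handle many more failure modes):

```python
# Toy model of multi-node writes: a publish succeeds only if at least
# `quorum` replicas accepted the message, so one failed node loses nothing.
class DownNode:
    def append(self, msg):
        raise ConnectionError("node is down")

def replicated_publish(nodes, msg, quorum):
    acks = 0
    for node in nodes:
        try:
            node.append(msg)   # in reality: a network send plus a per-node ack
            acks += 1
        except ConnectionError:
            pass
    return acks >= quorum

nodes = [[], DownNode(), []]
ok = replicated_publish(nodes, "msg1", quorum=2)
print(ok)                      # True: one node is down, but the message is still safe
```

This is the guarantee Redis's asynchronous replication does not give you: in Redis, the master acknowledges the write before the slave necessarily has a copy.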

For this reason RabbitMQ and Kafka are more complex to design. After all, they are designed specifically for queue scenarios.

But Redis is positioned differently: it is used more as a cache, so the two are naturally different in this respect.

Finally, what about the message backlog?

4) What to do with the message backlog?

Because Redis data is stored in memory, it means that once message backlog occurs, Redis memory will continue to grow, if the machine memory limit is exceeded, it will face the risk of OOM.

Therefore, Redis Stream provides the ability to specify the maximum queue length to avoid this situation.

Message queues such as Kafka and RabbitMQ are different: they store data on disk, and disk is much cheaper than memory. When messages back up, they simply take up more disk space, so a disk-based queue handles a backlog far more comfortably than a memory-based one.

To sum up, we can see that there are always two problems when using Redis as a queue:

  1. Redis itself may lose data
  2. When messages pile up, Redis memory keeps growing and can be exhausted

At this point, whether Redis can be used as a queue, I think this answer should be clearer to you.

If your business scenario is simple enough, insensitive to data loss, and with a low probability of message backlogs, using Redis as a queue is perfectly fine.

Redis is also lighter to deploy and operate than Kafka and RabbitMQ.

If your business scenario is very sensitive to data loss, your write volume is heavy, and message backlogs would consume a lot of machine resources, then I recommend using professional message queue middleware.

Conclusion

All right, so to sum up. From the perspective of “whether Redis can be used as a queue”, this article introduces the use of List, Pub/Sub and Stream in queues, and their advantages and disadvantages.

Then we compare Redis with professional message queue middleware and find the deficiency of Redis.

Finally, we find a suitable scenario for Redis as a queue.

Here I have also compiled a table summarizing the pros and cons of each.

Afterword

Finally, I would like to talk with you about the problem of “technical solution selection”.

As you can see, this article starts with Redis, but it doesn’t stop there.

We were asking questions as we analyzed the details of Redis, looking for better solutions, and at the end of the article we talked about what a professional message queue should do.

In fact, when we discuss the selection of technology, it is a question of how to choose.

The message I want to convey here is this: when facing a technology selection, don't think of one solution as simply good and another as simply bad.

You need to analyze it on a case-by-case basis, and I’ve divided the analysis process into two levels:

  1. Business function perspective
  2. Technical resource perspective

This article is all about making decisions from a business function perspective.

But the second point here, from a technical resource point of view, is also important.

From the perspective of technical resources, ask whether your company's environment and technical resources match these technical solutions.

How do you explain this?

Simply put: does your company or team have the people and resources to operate these technical solutions well?

We all know that Kafka and RabbitMQ are very specialized messaging middleware, but their deployment and operation is a bit more complicated than Redis.

If you’re in a big company that already has a good operations team, it’s ok to use this middleware because there are good enough people who can handle it and the company will invest people and time in this direction.

However, if you are in a startup company and the business is in a rapid growth period, you do not have the team and people who can handle these middleware. If you use these components hastily, it will be difficult to troubleshoot the problems when they happen, and even hinder the development of the business.

In this case, if the company’s technical personnel are familiar with Redis, and the comprehensive evaluation shows that Redis can basically meet 90% of the business needs, it is not necessarily a bad decision to choose Redis at present.

Therefore, the selection of technology is not only a technical problem, but also related to people, team, management and organizational structure.

It is for these reasons that when you talk to others about technology selection, you will find that every company has a different approach.

After all, every company is in a different environment and culture, so of course the decisions will be different.

If you don't understand this logic, then your technology selections will stay at the surface level, never getting to the root of the problem.

Once you understand this logic, you will not only have a deeper understanding of the technology, but also a clearer grasp of the technical resources and people when you look at the problem.

I hope you can take these factors into account when you choose your technology, which is very helpful for your technological growth.

Want to read more hardcore technical articles? Welcome to follow my official account "Water Drops and Silver Bullets".

I’m Kaito, a veteran backend programmer who thinks about technology. In my article, I will not only tell you what a technology point is, but also tell you why. I’ll also try to distill these thought processes into general methodologies that you can apply to other fields.