Request/Response protocols and RTT

Redis is a TCP server using the client-server model and what is called a request/response protocol. This means that a request is usually accomplished with the following steps:

  • The client sends a query to the server and usually blocks to read the server’s response from the socket.
  • The server processes the command and sends the response back to the client.

So, for example, a four-command sequence looks like this:

  • Client: INCR X
  • Server: 1
  • Client: INCR X
  • Server: 2
  • Client: INCR X
  • Server: 3
  • Client: INCR X
  • Server: 4

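To make the round trips concrete, here is a minimal Ruby sketch of the same exchange, assuming a local Redis server and the redis gem (the key name X comes from the example above):

require 'redis'

r = Redis.new               # assumes Redis is listening on localhost:6379
4.times {
    # Each INCR blocks until the server's reply has travelled back over
    # the socket, so every command pays one full round trip.
    puts r.incr("X")        # => 1, 2, 3, 4 on a fresh key
}
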
The client and the server are connected via a network link. Such a link can be very fast (a loopback interface) or very slow (a connection established over the Internet with many hops between the two hosts). Whatever the network latency is, it takes time for packets to travel from the client to the server, and then back from the server to the client to carry the reply.

This time is called RTT (round trip time). It is easy to see how this can affect performance when a client needs to perform many requests in a row (for example, adding many elements to the same list, or populating a database with many keys). For instance, if the RTT is 250 milliseconds (in the case of a very slow link over the Internet), even if the server is able to process 100,000 requests per second, we will be able to process at most four requests per second.
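
The arithmetic behind that bound is one command per round trip, so throughput is capped at 1/RTT no matter how fast the server is; a quick Ruby illustration using the 250 ms figure assumed above:

# With a blocking request/response client, each command costs at least
# one round trip, so 1/RTT is a hard ceiling on requests per second.
rtt = 0.250        # seconds, the very slow Internet link assumed above
puts 1.0 / rtt     # => 4.0 requests per second at best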

If you’re using a loopback interface, the RTT is much shorter (for example, my host reports 0.044 ms when pinging 127.0.0.1), but it’s still a lot if you need to perform multiple writes in a row.

Fortunately, there is a way to improve this use case.

Redis Pipelining

A request/response server can be implemented so that it is able to process new requests even if the client has not yet read the old responses. This way it is possible to send multiple commands to the server without waiting for the replies at all, and finally read the replies in a single step.

This is called pipelining and has been widely used for decades. For example, many POP3 protocol implementations already support this feature, greatly speeding up the process of downloading new E-mail messages from the server.

Redis has supported pipelining since its earliest versions, so whatever version you are running, you can use pipelining with Redis. Here is an example using the raw netcat utility:

$ (printf "PING\r\nPING\r\nPING\r\n"; sleep 1) | nc localhost 6379
+PONG
+PONG
+PONG

This time we don’t pay the cost of RTT for every call, but just once for the three commands. To be explicit, with pipelining the order of operations of our very first example would be the following:

  • Client: INCR X
  • Client: INCR X
  • Client: INCR X
  • Client: INCR X
  • Server: 1
  • Server: 2
  • Server: 3
  • Server: 4

Important: While a client sends commands using pipelining, the server is forced to queue the replies, using memory. So if you need to send a lot of commands with pipelining, it is better to send them in batches of a reasonable size, for instance 10,000 commands, read the replies, then send another 10,000 commands, and so forth. The speed will be nearly the same, but the additional memory used will be at most the amount needed to queue the replies for these 10,000 commands.
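
A minimal sketch of that batching pattern in Ruby, assuming the redis gem and made-up key names:

require 'redis'

r = Redis.new
keys = (1..100_000).map { |i| "key:#{i}" }   # hypothetical keys to populate

# Pipeline in batches of 10K so the server never has to queue more
# than 10K replies in memory at once.
keys.each_slice(10_000) { |batch|
    r.pipelined { |pipeline|
        batch.each { |k| pipeline.set(k, "value") }
    }   # the replies for this batch are read here, freeing server memory
}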

It’s not just RTT

Pipelining is not only a way to reduce the latency cost associated with round trip times; it actually dramatically improves the total number of operations you can perform per second on a given Redis server. This is because, without pipelining, serving each command is very cheap from the point of view of accessing the data structures and producing the reply, but very costly from the point of view of doing the socket I/O. This involves calling the read() and write() system calls, which means going from user land to kernel land: the context switch is a huge speed penalty.

When pipelining is used, many commands are usually read with a single read() system call, and multiple replies are delivered with a single write() system call. As a result, the total number of queries performed per second initially increases almost linearly with longer pipelines, eventually reaching about 10 times the baseline obtained without pipelining.
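
As a rough socket-level illustration of that effect (a sketch only, assuming a Redis server on localhost:6379), a single write() can carry several commands and a single read() can return all of their replies:

require 'socket'

sock = TCPSocket.new("localhost", 6379)
sock.write("PING\r\nPING\r\nPING\r\n")   # one write() syscall, three commands
sleep 0.1                                # crude wait for the replies to arrive
puts sock.readpartial(4096)              # one read() returning "+PONG\r\n" three times
sock.close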

Some real code examples

In the following benchmark, we will use the Redis Ruby client, which supports pipelining, to test the speed improvement due to pipelining:

require 'redis'

def bench(descr)
    start = Time.now
    yield
    puts "#{descr} #{Time.now - start} seconds"
end

def without_pipelining
    r = Redis.new
    10000.times {
        r.ping          # each PING waits for its reply: one RTT per command
    }
end

def with_pipelining
    r = Redis.new
    # All 10,000 PINGs are buffered and flushed together; the replies
    # are read in a single final step. Recent versions of redis-rb
    # require queuing the commands on the block's pipeline object.
    r.pipelined { |pipeline|
        10000.times {
            pipeline.ping
        }
    }
end

bench("without pipelining") {
    without_pipelining
}
bench("with pipelining") {
    with_pipelining
}

Running the simple script above on a Mac OS X system, over the loopback interface, where pipelining provides the smallest improvement since the RTT is already quite low, yields the following figures:

without pipelining 1.185238 seconds
with pipelining 0.250783 seconds

As you can see, using pipelining we improved the transfer speed by a factor of five.

Pipelining vs. Scripting

Using Redis scripting (available since Redis 2.6), a number of pipelining use cases can be addressed more efficiently by scripts that perform a lot of the needed work on the server side. A big advantage of scripting is that it is able to both read and write data with minimal latency, making operations like read, compute, write very fast. Pipelining can’t help in this scenario, since the client needs the reply of the read command before it can call the write command.
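
For illustration, here is a hedged sketch of such a read, compute, write step done server side in a single round trip (the counter key and the doubling logic are made up for the example):

require 'redis'

r = Redis.new

# The script reads a value, computes on it, and writes it back entirely
# on the server, so the whole read/compute/write cycle costs one RTT.
script = <<~LUA
    local v = tonumber(redis.call('GET', KEYS[1]) or '0')
    v = v * 2
    redis.call('SET', KEYS[1], v)
    return v
LUA

r.set("counter", 21)
puts r.eval(script, keys: ["counter"])   # => 42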

Sometimes, an application may also want to send EVAL or EVALSHA commands in a pipeline. This is entirely possible, and Redis explicitly supports it with the SCRIPT LOAD command (it guarantees that EVALSHA can be called without the risk of failing).
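
A brief sketch of that pattern, assuming the redis gem (the trivial INCR script is just for illustration):

require 'redis'

r = Redis.new
script = "return redis.call('INCR', KEYS[1])"

# SCRIPT LOAD caches the script server side and returns its SHA1 digest,
# so the pipelined EVALSHA calls below cannot fail with NOSCRIPT.
sha = r.script(:load, script)

replies = r.pipelined { |pipeline|
    3.times { pipeline.evalsha(sha, keys: ["X"]) }
}
puts replies.inspect   # => [1, 2, 3] on a fresh key X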

Appendix: Why are busy loops slow even on the loopback interface?

Even with all the background covered in this page, you may still wonder why a Redis benchmark like the following (in pseudocode) is slow even when the client and the server run on the same physical machine, over the loopback interface:

FOR-ONE-SECOND:
    Redis.SET("foo", "bar")
END

After all, if both the Redis process and the benchmark run on the same box, isn’t it just messages copied in memory from one place to another, without any actual latency and actual networking involved?

The reason is that processes in a system are not always running; it is the kernel scheduler that lets a process run. So what happens is, for instance, that the benchmark is allowed to run, reads the reply from the Redis server (related to the last command executed), and writes a new command. The command is now in the loopback interface buffer, but in order to be read by the server, the kernel has to schedule the server process (currently blocked in a system call) to run, and so forth. So, in practical terms, the loopback interface still involves network-like latency because of how the kernel scheduler works.

Basically, a busy-loop benchmark is the most naive way to measure performance in a networked server. The wise thing is simply to avoid benchmarking in this way.