English original | I'm not feeling the async pressure [1]

Original author | Armin Ronacher, 2020.01.01

Translator | Cat Under the Pea Flower @ Python Cat

Note: This translation is published under the CC BY-NC-SA 4.0 [2] license. The content has been lightly edited; please keep the attribution to the original source and do not use it for commercial or illegal purposes.

Async is all the rage. Async Python, async Rust, Go, Node, .NET: pick your favorite language ecosystem and you'll find some flavor of async in it. How well async works depends a lot on the language's ecosystem and its runtime, but overall it has some nice benefits. It makes it very easy to wait for operations that might take a while to complete.

It is so easy that it has created countless new ways to shoot yourself in the foot. The case I want to talk about is the one where you don't realize you've shot yourself in the foot until the system gets overloaded, and that is the topic of back pressure management. A related term in protocol design is flow control.

What is back pressure

There are many explanations of back pressure, and a good one I recommend reading first is Backpressure explained — the resisted flow of data through software [3]. So rather than going into detail about what back pressure is, I just want to give a very brief definition and explanation: back pressure is resistance that opposes the flow of data through a system. Back pressure sounds negative (anyone can picture a bathtub overflowing because of a clogged drain), but it's there to save you.

(Back pressure)

What we're dealing with here is more or less the same in every case: we have a system that combines different components into a pipeline, and that pipeline has to accept a certain number of incoming messages.

You can picture it like baggage handling at an airport. Luggage arrives, gets sorted, is loaded onto the plane, and is finally unloaded. Along the way, individual pieces of luggage are thrown into containers to be transported together. When a container is full, it has to be taken away. When no containers are left, we have a natural example of back pressure: the workers who want to load luggage now can't, because there is no container to put it in.

A decision has to be made at this point. One option is to wait: this is often called queueing or buffering. The other option is to throw away some luggage until a container shows up: this is called dropping. That sounds bad, but we'll get to why this is sometimes important later.

But there is another aspect to this. Imagine that the person responsible for loading luggage into containers doesn't get a container for an extended period of time (say, a week). If they never drop any luggage, an enormous pile of it accumulates around them. Eventually they will have buffered so many bags that they run out of physical space to store them. At that point they would be better off telling the airport not to accept any more luggage until the container problem is sorted out. This is commonly referred to as flow control [4], a crucial networking concept.

Typically these processing pipelines can only hold a certain number of messages at a time (such as the luggage here). If that number is exceeded, or worse, the pipeline stalls, terrible things can happen. A real-world example is the opening of London Heathrow's Terminal 5, where 42,000 bags failed to travel with their owners over the course of 10 days because the IT infrastructure did not work correctly. More than 500 flights had to be cancelled, and for a while the airlines decided to allow carry-on baggage only.

Back pressure is important

What we learn from the Heathrow disaster is that being able to communicate back pressure is crucial. In real life, as in computing, time is always limited, and eventually someone gives up waiting. In particular, even if something could wait forever internally, it can't externally.

To take a practical example: if your luggage is supposed to go via London Heathrow to your destination in Paris, where you will only stay for seven days, it is pointless for it to arrive 10 days late. In fact, you would rather have your luggage re-routed back to your home airport.

In fact, it's better to admit failure (that you're overloaded) than to pretend to be operational and keep buffering, because at some point that only makes things worse.

So why is back pressure suddenly a topic of discussion, after all these years of writing thread-based software? A combination of factors is at play, several of which make it easy to get into trouble.

Bad defaults

To understand why back pressure matters in asynchronous code, I'd like to give you a deceptively simple piece of Python asyncio code that shows a situation where back pressure was accidentally forgotten:

from asyncio import start_server, run

async def on_client_connected(reader, writer):
    while True:
        data = await reader.readline()
        if not data:
            break
        writer.write(data)

async def server():
    srv = await start_server(on_client_connected, '127.0.0.1', 8888)
    async with srv:
        await srv.serve_forever()

run(server())

If you are new to the concept of async/await, imagine that whenever an await is hit, the function suspends until the awaited expression resolves. Here, the start_server function provided by Python's asyncio library runs a hidden accept loop. It listens on a socket and spawns an independent task running the on_client_connected function for each connected socket.

Now, this looks pretty straightforward. You could remove all the async and await keywords and end up with code that looks very similar to what you would write with threads.

However, it hides a crucial problem that is the root of all our issues here: there is no await in front of certain function calls. In threaded code, any function can block. In asynchronous code, only async functions can. This means, for instance, that the writer.write method cannot block. So how does it work? It tries to write the data straight into the operating system's non-blocking socket buffer.

But what happens if the buffer is full and the socket would block? In the threaded case we could just block here, which would be ideal, because it means we're applying some back pressure. However, because there are no threads here, we can't do that. So we're left with either buffering or dropping the data. Because dropping data would be pretty terrible, Python instead chooses to buffer.

Now, what happens if someone sends a lot of data in and nobody reads it on the other end? Well, in that case the buffer just grows and grows and grows. This API deficiency is why the Python documentation says not to use write on its own, but to follow it with a call to drain:

writer.write(data)
await writer.drain()
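
Applied back to the echo server from earlier, the loop then looks roughly like this (a minimal sketch; error handling is omitted and the rest of the server stays the same):

async def on_client_connected(reader, writer):
    while True:
        data = await reader.readline()
        if not data:
            break
        writer.write(data)
        # Suspend here until enough of the send buffer has drained.
        # This is the point where back pressure from a slow reader
        # actually reaches our coroutine.
        await writer.drain()
    writer.close()
    await writer.wait_closed()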

Drain does drain some of the buffer. It doesn't empty it, just brings it back down to the point where things don't get out of hand. So why doesn't write do an implicit drain? Well, that would be a massive API oversight, and I'm not entirely sure how it came to be this way.

What's important here is that most sockets are based on TCP, and TCP has built-in flow control. A writer will only write as fast (give or take some buffering) as the reader is able to accept data. This is completely hidden from the developer, because even the BSD socket libraries do not expose this implicit flow control handling.

So did we solve the back pressure problem here? Well, let's look at how this would play out in the threaded world. In the threaded world, our code would most likely run a fixed number of threads, and the accept loop would wait for a thread to become available before taking on the next request.
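
As a point of comparison, here is a minimal sketch of that threaded pattern using only the standard library (the fixed worker count and the echo handler are assumptions chosen to mirror the asyncio example above):

import socket
import threading

NUM_WORKERS = 50  # the fixed worker count is where the back pressure comes from

def worker(listener):
    while True:
        # Each worker blocks in accept(); once all workers are busy,
        # new connections simply wait in the kernel's listen backlog.
        conn, _ = listener.accept()
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                # sendall() blocks when the peer stops reading: TCP flow control.
                conn.sendall(data)

def main():
    listener = socket.create_server(("127.0.0.1", 8888), backlog=128)
    workers = [threading.Thread(target=worker, args=(listener,)) for _ in range(NUM_WORKERS)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

if __name__ == "__main__":
    main()

When all 50 workers are busy, new clients queue up in the operating system's accept backlog (and eventually get connection errors), rather than silently piling up inside our process.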

In our asynchronous example, however, we happily accept an unbounded number of connections, even if that means the system becomes overloaded. That might not be a problem in this very simple echo example, but imagine what would happen if we were doing database access.

Picture a database connection pool that will hand out at most 50 connections. What good is it to accept 10,000 connections when most of them will just block waiting for the pool?
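
To make the mismatch concrete, here is a small sketch of that situation (asyncio.Semaphore stands in for a hypothetical 50-connection database pool, and the sleep stands in for a query):

import asyncio

db_pool = asyncio.Semaphore(50)       # stand-in for a pool with 50 connections

async def handle_connection(i):
    async with db_pool:               # only 50 of these make progress at a time
        await asyncio.sleep(0.1)      # pretend this is a database query

async def main():
    # "Accepting" 10,000 connections: 50 run, the other 9,950 just sit
    # in memory waiting for the pool, invisible to the clients.
    await asyncio.gather(*(handle_connection(i) for i in range(10_000)))

asyncio.run(main())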

Waiting and waiting and waiting

Well, this brings us back to where I wanted to start. In most asynchronous systems, and especially in most of the situations I've encountered in Python, even if you fix all the socket-level buffering behavior, you end up in a world where a bunch of asynchronous functions are chained together with no regard for back pressure.

If we take the database connection pool as an example, assume only 50 connections are available. This means our code can have at most 50 concurrent database sessions. Now suppose we want to handle four times as many requests, because we expect much of what the application does to be independent of the database. One way to do this is to create a semaphore with 200 tokens and acquire one at the start of each request. If we're out of tokens, we wait for the semaphore to hand one out.

But hold on, now we're queueing again! We're just queueing a little earlier. If the system gets severely overloaded, clients will now wait right at the front door. Each of them will wait as long as they're willing to and then give up. Worse: the server may still spend time processing those requests, only to discover that the client has gone away and is no longer interested in the response.

So instead of waiting indefinitely, we want immediate feedback. Imagine you're at a post office and pull a ticket from a machine that tells you when it's your turn. That ticket gives you a pretty good indication of how long you'll have to wait. If the wait is too long, you can decide to give up your ticket and come back later. Note that the time you spend waiting in line at the post office is separate from the time it takes to actually handle your request (for example, because someone needs to fetch a package, check documents, and collect a signature).

So here is the naive version, where all we know is that we're waiting:

from asyncio import Semaphore

semaphore = Semaphore(200)

async def handle_request(request):
    await semaphore.acquire()
    try:
        return generate_response(request)
    finally:
        semaphore.release()

From the perspective of the caller of the handle_request coroutine, all we can see is that we're waiting and nothing is happening. We can't tell whether we're waiting because the system is overloaded or because generating the response simply takes a long time. Essentially, we just keep buffering here until the server finally runs out of memory and crashes.

That's because we have no communication channel for back pressure. So how do we fix this? One option is to add an intermediary layer. Unfortunately, the asyncio semaphore is of no use here, because it only lets us wait. But suppose we could ask the semaphore how many tokens it has left; then we could do something like this:

from hypothetical_asyncio.sync import Semaphore, Service

semaphore = Semaphore(200)

class RequestHandlerService(Service):
    async def handle(self, request):
        await semaphore.acquire()
        try:
            return generate_response(request)
        finally:
            semaphore.release()

    @property
    def is_ready(self):
        return semaphore.tokens_available()

Now we've changed the system somewhat. We now have a RequestHandlerService that carries a bit more information. In particular, it has a notion of readiness: the service can be asked whether it is ready. That check is essentially non-blocking and is a best-effort estimate.

Now, instead of this:

response = await handle_request(request)

the caller would write this:

request_handler = RequestHandlerService()
if not request_handler.is_ready:
    response = Response(status_code=503)
else:
    response = await request_handler.handle(request)

There are several ways to approach this, but the idea is the same: before we actually commit to doing something, we have a way to estimate whether it is likely to succeed, and if we're overloaded, we communicate that upwards.

Now, the Service used here doesn't actually exist. The idea comes from Rust's tower [5] and Rust's actix-service [6], both of which define a very similar service trait.

Because the readiness check is racy, it is still possible to over-commit on the semaphore. You can either accept that risk or still raise a failure when handle is called.

One library that handles this better than asyncio is trio, which exposes the internal counters of its semaphore and also provides a CapacityLimiter, a semaphore optimized for capacity limiting that protects against some common pitfalls.
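
As a rough illustration, a readiness check with trio could look something like this (a sketch only: Response and generate_response are the made-up helpers from the snippets above, not trio APIs):

import trio

limiter = trio.CapacityLimiter(200)

async def handle_request(request):
    # Best-effort readiness check using the counters trio exposes.
    if limiter.available_tokens <= 0:
        return Response(status_code=503)      # hypothetical response object
    async with limiter:                       # borrows and returns one token
        return generate_response(request)     # hypothetical handler from above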

Streams and protocols

Now, the example above solves the RPC-style case for us. For every call we can learn as early as possible that the system is overloaded. Many protocols have very direct ways of communicating that the server is overloaded. In HTTP, for example, you can respond with a 503 carrying a Retry-After header that tells the client when it may retry. That retry adds a natural point to re-evaluate whether to retry with the same request or to change something. For example, if you can't retry within 15 seconds, it might be better to surface that inability to the user than to show an endless loading spinner.
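
Sticking with the made-up RequestHandlerService from before, that could look roughly like this (the Response object and its headers argument are assumptions, not a specific framework's API):

request_handler = RequestHandlerService()

async def dispatch(request):
    if not request_handler.is_ready:
        # Refuse early and tell the client when a retry makes sense.
        return Response(status_code=503, headers={"Retry-After": "15"})
    return await request_handler.handle(request)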

But request/response protocols aren't the only kind. Many protocols keep connections open and stream large amounts of data over them. Traditionally, a lot of these protocols were based on TCP, which, as mentioned earlier, has built-in flow control. That flow control is not really exposed by socket libraries, though, which is why higher-level protocols usually need to add their own flow control on top. In HTTP/2, for example, a custom flow-control protocol exists because HTTP/2 multiplexes multiple independent streams over a single TCP connection.

Because TCP manages flow control silently in the background, it can lead developers down a dangerous path where they just read bytes from a socket and assume that's all there is to know. The TCP API is misleading in that, from an API perspective, flow control is completely hidden from the user. When you design your own protocols on top of streams, you need to be absolutely sure there is a bidirectional communication channel, so that senders not only send but also read, to find out whether they are allowed to continue.

With streams the concerns are usually different. Many streams are just streams of bytes or data frames, and you can't simply drop packets in the middle. Worse: it's often not easy for the sender to know whether it should slow down. In HTTP/2 you have to interleave reads and writes constantly at the user level, because you have to handle flow control there yourself. You are only allowed to write when the server grants you credit via WINDOW_UPDATE frames.

This means that streaming code becomes a lot more complex, because you first need to build yourself a framework that can deal with flow control. The hyper-h2 [7] Python library, for example, has a curio-based file upload server example [8] with flow control that is surprisingly complex, and that example is still not complete.
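
To give a flavor of what that interleaving looks like, here is a heavily simplified sketch of just the send side on top of the h2 state machine (local_flow_control_window, WindowUpdated, send_data and data_to_send are real h2 APIs; the async sock object with sendall()/recv() coroutines is assumed to be provided by the surrounding server, and all other events are ignored to keep the sketch short):

import h2.connection
import h2.events

async def send_with_flow_control(conn: h2.connection.H2Connection,
                                 stream_id, data, sock):
    while data:
        # How many bytes are we currently allowed to send on this stream?
        window = conn.local_flow_control_window(stream_id)
        if window == 0:
            # Not allowed to write: wait until the peer grants more credit
            # via a WINDOW_UPDATE frame.
            raw = await sock.recv(65536)
            for event in conn.receive_data(raw):
                if isinstance(event, h2.events.WindowUpdated):
                    break
            continue
        chunk_size = min(window, len(data), conn.max_outbound_frame_size)
        conn.send_data(stream_id, data[:chunk_size])
        data = data[chunk_size:]
        await sock.sendall(conn.data_to_send())
    conn.end_stream(stream_id)
    await sock.sendall(conn.data_to_send())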

New footguns

Async/await is great, but it encourages writing code that behaves catastrophically when it gets overloaded. Partly because queueing is just so easy, but also because making a function async is an API-breaking change. I can only assume this is why Python still has a non-awaitable write function on the stream writer.

The biggest reason, though, is that async/await lets you write code that many people would never have attempted to write with threads in the first place. I think that's a good thing, because it lowers the barrier to actually writing larger systems. The downside is that it also means many developers who previously had little experience with distributed systems now face many of the problems of a distributed system, even though they only wrote a single program. HTTP/2, due to its multiplexing nature, is a complex enough protocol that the only reasonable way to implement it is on the basis of async/await, to name one example.

And it isn't only async/await code that runs into these problems. Dask [9], for example, is a parallelism library for Python used by data science programmers, and despite not using async/await, there are bug reports [10] of the system running out of memory because of missing back pressure. These issues are quite fundamental.

The lack of back pressure, however, is a footgun the size of a bazooka. If you realize too late that you've built a monster, it's nearly impossible to fix without major changes to the code base, because you might have missed making some functions async that really should have been.

Other programming environments don't help much either. People run into the same problems everywhere, including in the latest async environments for Go and Rust. Even in very popular projects that have been open source for a long time, it's not uncommon to find a long-open issue about "flow control" or "handle back pressure", because adding it after the fact turns out to be really hard. For example, Go has had an open issue since 2014 about adding a semaphore to all file system IO [11], because it can overload the host. aiohttp has an issue dating back to 2016 [12] about clients being able to break the server due to insufficient back pressure. There are many, many more examples.

If you look at the Python hyper-h2 documentation, you'll find an alarming number of examples with disclaimers like "does not handle flow control" or "it does not obey HTTP/2 flow control, which is a bug, but it's otherwise fine". Flow control is complex enough, the first time you're confronted with it, that it's easy to pretend it isn't a problem, and that is exactly how we ended up in this mess. Flow control also adds significant overhead and doesn't look good in benchmarks.

So, to all of you asynchronous library developers, here's a New Year's resolution: give back pressure and flow control the attention they deserve in your documentation and APIs.

Links

[1] I’m not feeling the async pressure: https://lucumr.pocoo.org/2020/1/1/async-pressure/

[2] CC BY-NC-SA 4.0: https://creativecommons.org/licenses/by-nc-sa/4.0/

[3] Backpressure explained — the resisted flow of data through software: https://medium.com/@jayphelps/backpressure-explained-the-flow-of-data-through-software-2350b3e77ce7

[4] flow control: https://en.wikipedia.org/wiki/Flow_control_(data)

[5] tower: https://github.com/tower-rs/tower

[6] actix-service: https://docs.rs/actix-service/

[7] hyper-h2: https://github.com/python-hyper/hyper-h2

[8] file upload server example: https://python-hyper.org/projects/h2/en/stable/curio-example.html

[9] Dask: https://dask.org/

[10] back pressure: https://github.com/dask/distributed/issues/2602

[11] adding a semaphore to all file system IO: https://github.com/golang/go/issues/7903

[12] aiohttp issue dating back to 2016: https://github.com/aio-libs/aiohttp/issues/1368
