Sync vs. Async Python: What is the Difference?
Have you ever heard someone say that asynchronous Python code is faster than “normal” (or synchronous) Python code? How is that possible? In this article, I’ll try to explain what asynchrony is and how it differs from normal Python code.
What do Sync and Async Mean?
Web applications typically need to process many requests from different clients in a short period of time. To avoid processing delays, they must be able to work on several requests at the same time, a capability usually referred to as concurrency. I'll continue to use web applications as the example throughout this article, but keep in mind that other types of applications also benefit from multitasking, so this discussion is not specific to the web.
The terms “sync” and “async” refer to the two ways concurrent applications can be written. A “sync” server uses the operating system's support for threads and processes to achieve concurrency. Here is what a synchronous deployment looks like:
In this setup we have five clients, all sending requests to the application. The public point of access is a web server acting as a load balancer, which distributes the requests among a pool of server workers. The workers can be implemented as processes, threads, or a combination of the two, and each one executes the requests that the load balancer assigns to it. The application logic, written with a web framework such as Flask or Django, lives inside these workers.
This type of solution is ideal for servers with multiple CPUs, because you can configure the number of workers to be a multiple of the number of CPUs. With this configuration you can achieve an even utilization of all your cores, something that a single Python process cannot do on its own because of the restrictions imposed by the global interpreter lock (GIL).
In terms of drawbacks, the figure above also shows the main limitation of this approach: we have five clients, but only four workers. If all five clients send their requests at the same time, the load balancer can dispatch only one request to each worker, and the request that loses the race has to stay in a queue, waiting for a worker to become available. Four of the five clients receive their responses promptly, but the fifth has to wait longer. The key to good server performance is choosing the right number of workers for the expected load, so that request blocking is prevented or minimized.
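The queueing effect described above can be simulated with a thread pool standing in for the pool of server workers. This is only a didactic sketch; the function name `handle_request` is illustrative and not from any real framework:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(client_id):
    time.sleep(0.2)  # simulate a blocking operation, e.g. a database query
    return f"response for client {client_id}"

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:          # four workers
    results = list(pool.map(handle_request, range(5)))   # five clients
elapsed = time.monotonic() - start

# The fifth request had to wait in the queue for a free worker, so the
# total time is roughly two request times (~0.4s) instead of one (~0.2s).
```

Running this, you can verify that all five responses arrive, but the wall-clock time is about double the single-request time, exactly because one request had to queue.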
The asynchronous server setup is harder to draw, but here is my best attempt:
This type of server runs in a single process that is controlled by a loop. The loop is a very efficient task manager and scheduler that creates tasks to handle the requests sent by clients. Unlike long-lived server workers, an asynchronous task is created by the loop to handle one specific request, and when that request completes, the task is destroyed. At any given time an asynchronous server may have hundreds or even thousands of active tasks, all managed by the loop and making progress concurrently.
You may be wondering how concurrency among asynchronous tasks is achieved. This is the interesting part, because asynchronous applications rely entirely on cooperative multitasking. What does that mean? When a task needs to wait for an external event, such as a response from a database server, instead of just waiting the way a sync worker would, it tells the loop what it needs to wait for and returns control to the loop. The loop is then free to find another task that is ready to run while this one remains blocked on the database. Eventually the database sends its response, at which point the loop considers the first task ready to run again and resumes it as soon as possible.
The ability of asynchronous tasks to suspend and resume execution may be hard to grasp in the abstract. To help you relate it to something you may already know: in Python, one way to implement this is with the await or yield keywords, but as you will see later, this is not the only way.
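Here is a minimal asyncio sketch of that pause-and-resume behavior. Each await hands control back to the loop, so two tasks that each wait 0.2 seconds finish in about 0.2 seconds total, not 0.4:

```python
import asyncio
import time

async def task(name, delay):
    # await suspends this task and returns control to the loop until
    # the sleep completes; other tasks get to run in the meantime
    await asyncio.sleep(delay)
    return name

async def main():
    # run both tasks concurrently under the same loop
    return await asyncio.gather(task("a", 0.2), task("b", 0.2))

start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
print(results)  # ['a', 'b']
```

The two sleeps overlap because neither task holds the loop while it waits; this is cooperative multitasking in its simplest form.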
Asynchronous applications run entirely in a single process and a single thread, which is nothing short of surprising. Of course, this style of concurrency requires some discipline: you cannot let a task hold the CPU for too long, or the remaining tasks will starve. For async to work, all tasks need to pause themselves voluntarily and return control to the loop in a timely manner. To benefit from the asynchronous style, an application needs to have tasks that are frequently blocked by I/O and do not do much CPU work. Web applications are usually a good fit, especially when they need to handle a large number of clients.
To maximize the utilization of multiple CPUs with an asynchronous server, it is common to create a hybrid solution that adds a load balancer and runs one asynchronous server per CPU, as shown in the figure below:
Two ways to implement asynchrony in Python
As I'm sure you know, to write an asynchronous application in Python you can use the asyncio package, which builds on top of coroutines to implement the suspend and resume features that all asynchronous applications require. The yield keyword, along with the newer async and await keywords, is the foundation on which asyncio builds its asynchronous capabilities. To paint a complete picture, there are other coroutine-based asynchronous solutions in the Python ecosystem, such as Trio and Curio. And then there is Twisted, the oldest asynchronous framework of them all, predating even asyncio.
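To see why yield is a natural building block for suspend-and-resume, here is a toy round-robin “loop” built on plain generators. This is a didactic sketch only, not how asyncio is actually implemented:

```python
from collections import deque

def worker(name, steps):
    # a "task": each yield suspends the function and hands
    # control back to the scheduler below
    for i in range(steps):
        yield f"{name} step {i}"

def run(tasks):
    # a minimal round-robin scheduler: resume each generator in turn
    queue, log = deque(tasks), []
    while queue:
        task = queue.popleft()
        try:
            log.append(next(task))  # resume the task until its next yield
            queue.append(task)      # not finished: put it back in line
        except StopIteration:
            pass                    # task finished: drop it
    return log

log = run([worker("a", 2), worker("b", 2)])
print(log)  # ['a step 0', 'b step 0', 'a step 1', 'b step 1']
```

Note how the two workers interleave even though everything runs in one thread; asyncio's loop does the same job with far more sophistication.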
If you're interested in writing an asynchronous web application, there are many coroutine-based asynchronous frameworks to choose from, including aiohttp, Sanic, FastAPI, and Tornado.
What many people don't know is that coroutines are only one of the two methods available in Python to write asynchronous code. The second method is based on a package called greenlet, which you can install with pip. Greenlets are similar to coroutines in that they also allow a Python function to pause its execution and resume it later, but they accomplish this in a completely different way, which means that the asynchronous ecosystem in Python is split into two big families.
An interesting difference between coroutines and greenlets for asynchronous development is that the former require keywords and features specific to the Python language to work, while the latter do not. What I mean is that coroutine-based applications have to be written with a very particular syntax, while greenlet-based applications look exactly like normal Python code. This is very cool because, under certain conditions, it allows synchronous code to be executed asynchronously, something that coroutine-based solutions such as asyncio cannot do.
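A minimal greenlet sketch of the same pause-and-resume idea (this assumes the greenlet package is installed, e.g. with pip install greenlet). Notice that the two functions are plain Python, with no async or await syntax anywhere:

```python
from greenlet import greenlet

log = []

def foo():
    log.append("foo 1")
    gr_bar.switch()        # suspend foo, resume bar
    log.append("foo 2")    # resumed here when bar switches back

def bar():
    log.append("bar 1")
    gr_foo.switch()        # suspend bar, resume foo

gr_foo = greenlet(foo)
gr_bar = greenlet(bar)
gr_foo.switch()            # start foo; control returns when foo finishes
print(log)  # ['foo 1', 'bar 1', 'foo 2']
```

Packages such as Gevent hide these explicit switch() calls from you entirely: the switching happens inside patched I/O functions, which is why application code can stay looking synchronous.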
So what are the greenlet equivalents of asyncio? I know of three greenlet-based asynchronous packages: Gevent, Eventlet, and Meinheld, although the last one is more of a web server than a general-purpose asynchronous library. They all have their own asynchronous loop implementation, and they all provide an interesting “monkey-patching” feature that replaces the blocking functions in the Python standard library, such as those that do networking and threading, with equivalent non-blocking versions implemented on top of greenlets. If you have a piece of synchronous code that you want to run asynchronously, chances are these packages will let you do it.
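With Gevent, the monkey-patching step is typically just two lines at the very top of the program (this sketch assumes Gevent is installed; Eventlet offers a similar eventlet.monkey_patch() call):

```python
# This must run before any other imports, so that the rest of the
# program picks up the patched versions of socket, ssl, threading,
# time, and friends.
from gevent import monkey
monkey.patch_all()

import socket  # socket operations now cooperate with the gevent loop
```

After patch_all(), ordinary blocking calls such as socket reads yield to Gevent's loop instead of blocking the whole process, which is what lets unmodified synchronous code run asynchronously.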
You may be surprised to learn that, as far as I know, the only web framework with explicit support for greenlets is Flask. The framework automatically detects when it is running on a greenlet web server and adjusts accordingly, without any configuration. You still need to be careful not to call blocking functions, or, if you do, to apply monkey-patching so that those blocking functions are “fixed”.
Flask isn't the only framework that can benefit from greenlets, however. Other web frameworks, such as Django and Bottle, are unaware of greenlets, yet they too can run asynchronously when they are paired with a greenlet web server and monkey-patching is applied to fix their blocking calls.
Is asynchronous faster than synchronous?
There is a widespread misconception about the performance of synchronous and asynchronous applications: the belief that asynchronous applications are much faster than their synchronous counterparts.
Let me clarify this so that we are on the same page. Python code runs at exactly the same speed whether it is written in synchronous or asynchronous style. Aside from the code itself, there are two factors that can influence the performance of a concurrent application: context switching and scalability.
Context switching
The work of sharing the CPU fairly among all running tasks, known as context switching, can affect the performance of an application. For sync applications, this work is done by the operating system and is basically a black box, with no configuration or fine-tuning options. For async applications, context switching is done by the loop.
The default loop implementation provided by asyncio is written in Python and is not considered very efficient. The uvloop package provides an alternative loop that is partly implemented in C code for better performance. The event loops used by Gevent and Meinheld are also written in C. Eventlet uses a loop written in Python.
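Opting into uvloop is a one-line change. This is just a sketch that assumes uvloop has been installed with pip, and main() stands in for your application's entry coroutine:

```python
# requires: pip install uvloop
import asyncio
import uvloop

uvloop.install()  # make asyncio use uvloop's faster C-based event loop

async def main():
    ...  # your application's entry coroutine goes here

asyncio.run(main())  # now runs on uvloop instead of the default loop
```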
A highly optimized asynchronous loop may be more efficient at context switching than an operating system, but in my experience you have to run at a high concurrency level to see a real performance improvement. For most applications, I don’t think the performance difference between synchronous and asynchronous context switches will be significant.
Scalability
In my opinion, the myth that async is faster comes from the fact that asynchronous applications generally use the CPU more efficiently, because they scale better and in a more flexible way than their synchronous counterparts.
Consider what would happen if the synchronous server shown in the figure above received 100 requests at the same time. This server cannot handle more than four requests concurrently, so most of those requests will sit in a queue for some time before a worker becomes available to take them.
Contrast that with the asynchronous server, which would immediately create all 100 tasks (or, in the hybrid model, 25 tasks on each of the four async workers). With an asynchronous server, all requests start processing without having to wait (though, to be fair, there may be other bottlenecks that slow things down, such as a limit on the number of active database connections).
If these 100 tasks make heavy use of the CPU, then the sync and async solutions would have similar performance, since the CPU runs at a fixed speed, Python executes code at the same speed, and the application has the same work to do. But if the tasks do a lot of I/O, the synchronous server may not be able to achieve high CPU utilization with only four concurrent requests. The asynchronous server, on the other hand, will certainly be better at keeping the CPU busy, because it runs all 100 requests concurrently.
You may wonder why you can't just run 100 sync workers, so that the two servers have the same concurrency. Consider that each worker needs its own Python interpreter with all the resources associated with it, plus a separate copy of the application with its own resources. The size of your server and of your application will determine how many worker instances you can run, but that number is usually not very high. Asynchronous tasks, on the other hand, are extremely lightweight and all run within the context of a single worker process, so they have a clear advantage.
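The lightweight nature of tasks is easy to demonstrate: a hundred tasks that each wait on simulated I/O can all run concurrently inside a single process, completing in roughly the time of a single wait. This sketch uses asyncio.sleep as a stand-in for real I/O:

```python
import asyncio
import time

async def fetch(i):
    await asyncio.sleep(0.1)  # simulate waiting on I/O (network, database)
    return i

async def main():
    # all 100 tasks wait concurrently under a single loop
    return await asyncio.gather(*(fetch(i) for i in range(100)))

start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
# total time is roughly one 0.1s wait, not 100 of them
```

Spawning 100 OS processes or threads for the same job would cost far more memory and startup time; here the only per-task cost is a small Python object managed by the loop.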
With this in mind, we can say that async is faster than sync only when:
- The load is high (without high load there is no advantage in having high concurrency)
- The tasks are I/O bound (if the tasks are CPU bound, concurrency above the number of CPUs does not help)
- You look at the average number of requests handled per unit of time. If you look at individual request times, you won't see much difference, and async may even be slightly slower, because there are more concurrent tasks competing for the CPU
I hope this article clears up some confusion and misconceptions about asynchronous code. I want you to remember the following two points:
- Only under high load will an asynchronous application do better than its synchronous equivalent
- Even if you write normal code and use a traditional framework such as Flask or Django, you can benefit from async thanks to greenlets
If you want to learn more about how asynchronous systems work, check out my PyCon presentation Asynchronous Python for the Complete Beginner.