Python’s asyncio library is based on coroutines, with the event loop serving as the driver and scheduler for those coroutines. The model is single-threaded and asynchronous, similar to Node.js. Below is my understanding of how it works.

The event loop listens for ready events via select(); when an event is ready, it adds the corresponding callback to a task list, then takes a task from the head of that list and executes it. By continuously registering events and executing their callbacks in a single thread, we get our event_loop model.
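The loop described above can be sketched in a few lines. This is a hypothetical minimal model for illustration; names like MiniEventLoop, task_list, and run_once are mine, not asyncio's real internals.

```python
import selectors
from collections import deque

class MiniEventLoop:
    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.task_list = deque()              # FIFO list of ready callbacks

    def call_soon(self, callback):
        self.task_list.append(callback)       # register a ready task

    def run_once(self):
        # Poll the selector for ready I/O without blocking.
        if self.selector.get_map():
            for key, _ in self.selector.select(timeout=0):
                self.task_list.append(key.data)   # key.data holds the callback
        # Take one task from the head of the list and execute it.
        if self.task_list:
            self.task_list.popleft()()

loop = MiniEventLoop()
results = []
loop.call_soon(lambda: results.append("task 1"))
loop.call_soon(lambda: results.append("task 2"))
loop.run_once()   # executes task 1
loop.run_once()   # executes task 2
```

Each run_once() pass does exactly the two things the text describes: collect ready callbacks from the selector, then pop one task from the head and run it.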

Tasks executed in the event_loop are not ordinary functions

Think of the figure above as a Web server, with the task on the left being the complete work for one HTTP request. If run_task() finishes one complete task before starting the next, this is no different from an ordinary serial server, and the user experience under concurrency is poor.

How poor? Quite poor, as you can imagine; after all, we are building a single-threaded Web server.

So a task corresponding to a complete HTTP request cannot be an ordinary function, because a function must run from beginning to end, occupying the entire thread. What, then, is a task?

If you don’t know the answer, take a look at my other article outlining yield and yield from in Python.

Yes, a task is a generator, in other words an interruptible function. The task’s code is still written straight down, as if processing one HTTP request from start to finish. This is what we call synchronous code organization.
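A tiny demonstration of what "interruptible function" means: the generator runs until a yield, returns control to its caller, and later resumes from exactly where it stopped. The handle_request name and the step strings are purely illustrative.

```python
def handle_request():
    log = ["parse request"]
    yield "waiting for I/O"        # suspend here, as a task does on I/O
    log.append("build response")   # execution resumes here later
    return log

gen = handle_request()
paused_at = next(gen)              # runs the body until the first yield
result = None
try:
    gen.send(None)                 # resume after the pretend I/O completes
except StopIteration as done:
    result = done.value            # the generator's return value
```

The function body reads top to bottom like synchronous code, yet the thread was free to do other work while it sat suspended at the yield.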

The difference is that inside a task, when we encounter an I/O operation, we hand that operation to the selector (we’ll look at the selector later), telling it which callback to run when the I/O is ready. We then use yield to save our position and suspend the function.

Control of the thread returns to the event_loop. The event_loop first checks the selector for ready events, appends the corresponding callbacks to the end of the task list (see figure), and continues running tasks from the head with run_task().

You may be wondering when the interrupted task will resume execution. As mentioned above, on each pass the event_loop asks the selector for ready I/O operations and appends their callbacks to the end of the task list. When the event_loop reaches such a callback, it jumps back into the function we suspended, and the data the I/O operation was waiting for is already there.

From the function’s perspective, you need some data from a remote server, so you call get(), and suddenly the data is there. That’s right: in the function’s view it is instantaneous, like traveling to the future.

You may wonder how the callback, once taken from the task list and run, gets back to the position where the original function was suspended.

I don’t know for sure; I haven’t really followed asyncio’s code, and it’s a bit complicated for me. If it were me, I would simply have the callback hold a variable gen pointing to our generator, and then gen.send(res_data) in the callback would return to the interruption point and continue. If you are interested, you can trace the code yourself with a debugger.
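The guess above can be sketched as follows. This is speculation about the mechanism, not asyncio's actual code (the names get_data_task and on_io_ready are invented for the example): the callback keeps a reference to the suspended generator and resumes it with gen.send(res_data).

```python
received = {}

def get_data_task():
    data = yield                   # suspend; the I/O result arrives via send()
    received['body'] = data        # execution continues here after resume

gen = get_data_task()
next(gen)                          # run until the yield (I/O handed off)

def on_io_ready(res_data):
    # What the selector's callback would do when the I/O is ready:
    # jump back into the interrupted function, carrying the result.
    try:
        gen.send(res_data)
    except StopIteration:
        pass                       # the generator ran to completion

on_io_ready(b"response bytes")
```

After on_io_ready() fires, the generator has resumed past its yield and stored the data, exactly the "travel to the future" effect described above.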

But I recommend you read this blog post for an in-depth understanding: Python asynchronous programming (part 1).

There are a few more questions here.

For example, suppose a task needs to compute 1+2+3+... up to 20 million. The computation is time-consuming, but it is not an I/O operation, so it cannot be handed to a selector; we need to yield ourselves so that other tasks get a chance to use our single thread. That raises a new problem: after we yield, when does the interrupted function run again?

Sample code for the problem:

import asyncio

def print_sum():
    total = 0
    count = 0
    for a in range(20000000):
        total += a
        count += 1
        if count > 1000000:
            count = 0
            yield          # give other tasks a chance to run
    print('1 + ... + 20 million is {}'.format(total))

@asyncio.coroutine
def init():
    yield from print_sum()

loop = asyncio.get_event_loop()
loop.run_until_complete(init())

I think we could simply append the interrupted task to the end of the task list and continue the event_loop, so that other tasks get a chance to execute; that is the easiest way to handle it. The asyncio library does exactly that.

But asyncio also provides a better way: we can run this CPU-intensive operation in another thread.
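A sketch of that approach, using the modern asyncio.run spelling rather than the older get_event_loop style used elsewhere in this post. loop.run_in_executor is a real asyncio API; the big_sum helper is invented for the example.

```python
import asyncio
import concurrent.futures

def big_sum(n):
    # CPU-bound work that would otherwise block the event loop.
    return sum(range(n))

async def main():
    loop = asyncio.get_running_loop()
    # Hand the heavy computation to a worker thread; the event loop
    # stays free to run other tasks while we await the result.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return await loop.run_in_executor(pool, big_sum, 20000000)

total = asyncio.run(main())
```

Note that because of the GIL a worker thread does not make pure-Python arithmetic truly parallel; for genuinely CPU-bound work a ProcessPoolExecutor can be passed to run_in_executor instead.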


Let’s look at another problem. If your task list is empty at 3:30 a.m., what does your event_loop do? Keep looping, waiting for new HTTP requests to come in? No: we do not allow such a waste of CPU resources, and neither does the asyncio library.

Start by looking at these two lines from the event_loop, the select(timeout) part in the upper right corner of the image above:

    event_list = self._selector.select(timeout)
    self._process_events(event_list)

In addition, as a Web server we always need socket(), bind(), and listen() to create a listening descriptor, sockfd, which listens for incoming HTTP requests and completes the three-way handshake with each one. A connected descriptor, connectfd, is then obtained from the accept() call.

Both file descriptors now exist in our system: sockfd continues to listen for new HTTP requests, while connectfd is used to communicate with the connected client. There is usually one sockfd versus many connectfds.
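The sockfd setup described above looks like this in Python. The address 127.0.0.1 with port 0 (OS-assigned) is chosen just for this demonstration.

```python
import socket

sockfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sockfd.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sockfd.bind(('127.0.0.1', 0))      # port 0: let the OS pick a free port
sockfd.listen(128)                  # backlog of pending connections
sockfd.setblocking(False)           # required before registering in a selector

host, port = sockfd.getsockname()

# Later, when the selector reports sockfd readable, accept() returns the
# connected descriptor for one client:
#     connectfd, addr = sockfd.accept()
sockfd.close()
```

Each accept() on the one listening sockfd produces a fresh connectfd, which is why one sockfd corresponds to many connectfds.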

Read the chapters on socket programming in Unix Network Programming, Volume 1 for more details.

asyncio uses the selectors module for network I/O. On Linux, the selectors module is implemented on top of epoll(), a synchronous I/O multiplexing system call (surprised that asyncio is built on synchronous I/O? We’ll look at the epoll function in the next section).

You can read the selectors module documentation in the Python manual to see what it does.
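A minimal use of the selectors module, the same pattern the event_loop relies on: register a descriptor together with a callback stored in data, then let select() hand the callback back when the descriptor is ready. socketpair() stands in for a real client connection here, and on_readable is an invented name.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
r, w = socket.socketpair()          # a connected pair standing in for a client
r.setblocking(False)

def on_readable(sock):
    return sock.recv(1024)

# Store the callback in `data`; select() returns it when r becomes readable.
sel.register(r, selectors.EVENT_READ, data=on_readable)

w.send(b"ping")                      # makes r readable
events = sel.select(timeout=1)
received = [key.data(key.fileobj) for key, mask in events]

sel.unregister(r)
r.close()
w.close()
```

This is the whole contract between a task and the selector: the task registers interest plus a callback, and the loop invokes that callback once select() reports readiness.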

The epoll() function has a timeout parameter that controls whether, and for how long, the call blocks. Mapped to a higher level, this is the timeout in our selector.select(timeout) above, and thus the timeout in the event_loop. So you should now have a pretty good idea what the event_loop does at 3:30 in the morning.

asyncio’s implementation is pretty much what you’d expect. If the task list is not empty, then timeout=0, i.e. non-blocking: we call selector.select(timeout=0), which returns immediately; we process the result as before with self._process_events(event_list), then continue running tasks.

If our task list is empty, then timeout=None, i.e. blocking. At this point our thread blocks at selector.select(timeout=None), waiting for the call to return. The only thing that can wake it, of course, is activity on a socket descriptor registered in the selector.
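The two timeout modes can be observed directly. In this sketch, timeout=0 polls and returns at once; timeout=None would block this example forever, so a small finite timeout of 0.2 s stands in for it to show the blocking behavior.

```python
import selectors
import socket
import time

sel = selectors.DefaultSelector()
r, w = socket.socketpair()
sel.register(r, selectors.EVENT_READ)

# timeout=0: poll and return immediately, even though nothing is ready.
start = time.monotonic()
polled = sel.select(timeout=0)
poll_elapsed = time.monotonic() - start

# A finite timeout blocks until ready data arrives or the timeout expires;
# nothing is written to w, so this waits the full 0.2 seconds.
start = time.monotonic()
blocked = sel.select(timeout=0.2)
block_elapsed = time.monotonic() - start

sel.unregister(r)
r.close()
w.close()
```

With tasks pending, the loop uses the first mode so it can get back to running them; with nothing to do, it uses the blocking mode and the thread sleeps until the kernel reports a ready descriptor.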


There are other questions, such as how asynchronous MySQL is implemented on top of asyncio, which may require reading the aiomysql library.

You might notice that once we use an event_loop to implement a single-threaded asynchronous server, all the code we write is handed over to the event_loop, which runs each task at the appropriate time. If you have read Liao Xuefeng’s Python tutorial, you must have seen this sentence:

This is one of the principles of asynchronous programming: once you decide to use asynchrony, every layer of the system must be asynchronous.

This is asynchronous programming.


You may have a lot of questions about asyncio’s role, its use, or its implementation, and so do I. But I’m sorry: I’m not very familiar with Python and have never done a project with asyncio; I only looked into Python’s asynchronous I/O out of curiosity.

I’m an armchair amateur, and in the end I didn’t read the asyncio library’s implementation. I don’t plan to research asyncio further for now, but I’m not willing to let these two days of study go to waste, so I leave this blog post as a record for myself. I hope that next time, knowing more about Python and asyncio, I can write a truly in-depth analysis of Python asyncio.