Used properly, concurrent programming can bring significant performance improvements to our programs. Today let’s take a look at concurrent programming in Python. Before we get into it, we first need to understand the difference between concurrency and parallelism.

The first thing to understand is that concurrency does not mean multiple operations happening at the same instant. Instead, only one operation occurs at any given moment, but threads or tasks switch between each other until they all complete. As shown in the figure below:

Here, threads and tasks correspond to the two forms of concurrency in Python: threading and asyncio, respectively. With multiple threads, it is the operating system that controls thread switching. With asyncio, the running task itself must notify the event loop when it can be switched, typically by awaiting an operation.

Parallelism, by contrast, means executing multiple tasks literally at the same time. As shown in the figure below:

Multi-processing is Python’s implementation of parallelism.
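As a minimal sketch of this (the function and inputs here are illustrative, not from the article), CPU-bound work can be spread across processes with `concurrent.futures.ProcessPoolExecutor`:

```python
import concurrent.futures

def cpu_task(n):
    # a CPU-bound computation: sum of squares below n
    return sum(i * i for i in range(n))

def run_parallel(inputs):
    # each call to cpu_task runs in a separate process, so the tasks
    # can execute truly in parallel on multiple CPU cores
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(cpu_task, inputs))

if __name__ == '__main__':
    print(run_parallel([10, 100]))
```

Unlike threads, processes do not share memory, so the inputs and results are pickled and sent between processes behind the scenes.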

In contrast, concurrency is typically applied to scenarios with heavy I/O operations, while parallelism is typically applied to scenarios with heavy CPU loads.

Single-threaded versus multi-threaded performance

Let’s compare the performance differences between single and multi-threaded.

Let’s look at the single-threaded version first.

import time

def process(work):
    time.sleep(2)
    print('process {}'.format(work))

def process_works(works):
    for work in works:
        process(work)

def main():
    works = ['work1', 'work2', 'work3', 'work4']
    start_time = time.time()
    process_works(works)
    end_time = time.time()
    print('use {} seconds'.format(end_time - start_time))

if __name__ == '__main__':
    main()

#### Outputs ####
process work1
process work2
process work3
process work4
use 8.016737222671509 seconds

Single threading is the simplest and most straightforward.

First, the task list is traversed; each task is processed in turn, and we wait for the current task to complete before starting the next, until all are done. We can see that it takes about 8 seconds. The advantage of single threading is simplicity, but it is clearly inefficient, since the vast majority of the program’s time is wasted waiting on I/O (here, time.sleep(2) stands in for the time an I/O operation takes). Let’s take a look at the multithreaded implementation.

import time
import concurrent.futures

def process(work):
    time.sleep(2)
    print('process {}'.format(work))

def process_works(works):
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        executor.map(process, works)

def main():
    works = ['work1', 'work2', 'work3', 'work4']
    start_time = time.time()
    process_works(works)
    end_time = time.time()
    print('use {} seconds'.format(end_time - start_time))

if __name__ == '__main__':
    main()

#### Outputs ####
process work1
process work2
process work3
process work4
use 2.006268262863159 seconds

You can see that it now takes just over 2 seconds, a roughly four-fold improvement in efficiency. Let’s examine the key code.

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
         executor.map(process, works)

Here we create a thread pool with four threads to allocate. executor.map() then calls the process() function concurrently for each element in works.
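For finer-grained control, the same thread pool also offers submit(), which returns a Future per task; as_completed() then yields each future as soon as it finishes. A sketch of this variant (with a shortened sleep so it runs quickly; not from the original article):

```python
import concurrent.futures
import time

def process(work):
    time.sleep(0.1)  # stand-in for an I/O wait
    return 'process {}'.format(work)

works = ['work1', 'work2', 'work3', 'work4']

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # submit() schedules one task per work item and returns a Future
    futures = [executor.submit(process, work) for work in works]
    # as_completed() yields futures in the order they finish,
    # not the order they were submitted
    results = [f.result() for f in concurrent.futures.as_completed(futures)]

print(sorted(results))
```

This pattern is handy when you want to handle each result as soon as it is ready, rather than waiting for the whole batch.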

Asyncio for concurrent programming

Let’s take a look at asyncio, another way to implement concurrent programming. asyncio is single-threaded: it has only one main thread, but it can run multiple different tasks, which are scheduled by an object called an event loop. You can think of a task here as the analogue of a thread in the multithreaded version.

To simplify, assume a task has only two states: ready and waiting. A ready task is currently idle but prepared to run at any moment; a waiting task has started but is blocked on some external operation, such as I/O. The event loop maintains two lists of tasks, one per state. It selects a ready task and runs it until that task returns control to the event loop. When control comes back, the event loop places the task on the ready or waiting list depending on whether it has finished, then walks the waiting list to check whether any waiting tasks have completed their external operations: those that have are moved to the ready list, and those that have not stay where they are. Tasks already on the ready list keep their positions, since they have not yet run. Once every task is on the appropriate list, the cycle begins again: the event loop picks another ready task and executes it, and so on until all tasks are complete.
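The hand-off of control described above can be seen in a few lines. In this illustrative sketch (the task names and the `asyncio.sleep(0)` yield points are my own), each task returns control to the event loop at every await, so the two tasks alternate:

```python
import asyncio

order = []  # records the interleaving of the two tasks

async def task(name):
    for step in range(2):
        order.append('{} step {}'.format(name, step))
        # awaiting returns control to the event loop, which then
        # picks the next ready task from its list
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(task('A'), task('B'))

asyncio.run(main())
print(order)
```

Because each task yields at every await, the recorded order alternates between A and B rather than running either task to completion first.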

Let’s look at how Asyncio can be used to implement concurrent programming.

import asyncio
import time

async def process(work):
    await asyncio.sleep(2)
    print('process {}'.format(work))

async def process_works(works):
    tasks = [asyncio.create_task(process(work)) for work in works]
    await asyncio.gather(*tasks)

def main():
    works = ['work1', 'work2', 'work3', 'work4']
    start_time = time.time()
    asyncio.run(process_works(works))
    end_time = time.time()
    print('use {} seconds'.format(end_time - start_time))

if __name__ == '__main__':
    main()

#### Outputs ####
process work1
process work2
process work3
process work4
use 2.0058629512786865 seconds

So far, we’ve covered both concurrent programming methods in Python: multithreading and asyncio. But for practical problems, how do we choose? In general, we can follow these guidelines.

If the I/O load is heavy and I/O operations are slow, requiring many tasks/threads working together, then asyncio is the better fit. If the I/O load is heavy but I/O operations are fast and only a limited number of tasks/threads are needed, then multithreading will do. Feel free to leave a comment and discuss.