How should programmers understand coroutines in high concurrency?

1. Introduction to the series

1.1 Purpose

As a developer of instant messaging technology, you have long been familiar with the concepts behind high performance and high concurrency: thread pools, zero copy, multiplexing, event-driven design, epoll, and so on. You may also be proficient in frameworks built on them, such as Java Netty, PHP Workerman, or Go gnet. But when a job interview or a practical problem raises a question you cannot answer, you realize that what you have mastered is only the surface.

Getting back to basics, what are the underlying principles behind these technical features? Helping you understand those principles in an accessible, low-effort way is what the series “Understanding High Performance, High Concurrency at the Root” is all about.

1.2 Origin of the article

I’ve compiled quite a few resources and articles on IM, message push, and related technologies: from the open source IM framework MobileIMSDK at the very beginning, to an online version of the classic network programming book “TCP/IP In Detail”, to the IM development guide “One Entry Is Enough: Develop Mobile IM from Zero”, and on to network programming series of increasing depth, namely “Network Programming Lazy Introduction”, “Brain-dead Network Programming Introduction”, “High-performance Network Programming”, and “Unknown Network Programming”.

The deeper I go, the more I realize how little I know about instant messaging technology. So later, to give developers a better understanding of network characteristics (especially on mobile networks) from the perspective of basic telecom technology, I put together a cross-disciplinary series of advanced articles called “Introduction to Zero-based Communication Technologies for IM Developers”. That series already marks the boundary of the network communication knowledge an ordinary IM developer needs; together with the network programming material above, it is basically enough to cover the blind spots in network communication knowledge.

Knowledge of network communication is indeed important for developing systems such as IM. But getting back to the essence of the technology: what lies behind the techniques that implement network communication itself, such as the thread pools, zero copy, multiplexing, and event-driven design mentioned above? What are their underlying principles? Answering that is the purpose of this series, and I hope it will be useful to you.

1.3 Article Catalogue

Understanding High Performance, High Concurrency at the Root (1): Understanding Threads and Thread Pools at the Bottom of Computers

Understanding High Performance, High Concurrency at the Root (2): Deep in the Operating System, Understanding I/O and Zero-copy Technology

Understanding High Performance, High Concurrency at the Root (3): Inside the Operating System for a Thorough Understanding of I/O Multiplexing

Understanding High Performance, High Concurrency at the Root (4): Deep in the Operating System, Fully Understand Synchronization and Asynchronous

Understanding High Performance, High Concurrency at the Root (5): Understanding Coroutines in Operating Systems at High Concurrency (this article)

Understanding High Performance, High Concurrency at the Root (6): How High-concurrency, High-performance Servers Are Implemented (to be released later)

1.4 Overview

This is the fifth article in our series on high performance and high concurrency, following on from the previous one, “Deep in the Operating System, Fully Understand Synchronization and Asynchronous”.

Coroutines are an essential technique in high-performance, high-concurrency programming and are widely used in Internet products, instant messaging (IM) included. For example, the backend framework said to support WeChat’s massive user base is built on coroutines (see “Open source libco library: ten million connections on a single machine, the foundation of the backend framework supporting WeChat’s 800 million users”). More and more modern programming languages also treat coroutines as one of their most important features, including Go, Python, and Kotlin.

Therefore, many programmers, especially backend developers of large-scale network communication applications, need to understand and master coroutine technology. This article is written to make the principles behind coroutines clear to you.

2. The author of this article

At the author’s request, no real name or personal photo is provided.

The author’s main technical focus is Internet backend development, high-concurrency and high-performance servers, and search engine technology. The author’s online name and WeChat public account are both “code farmer’s desert island survival”. Thanks to the author for the selfless sharing.

3. Introduction to the text

As a programmer, you’ve probably heard the term coroutine at some point, and the technology has been showing up more and more in recent years, especially in the world of high performance and high concurrency. If your mind goes blank when a classmate or colleague mentions coroutines…

Then this article is made for you.

Without further ado, today’s topic is how you, as a programmer, can thoroughly understand coroutines.

4. Ordinary functions

Let’s start with an ordinary function, which is very simple:

def func():
    print("a")
    print("b")
    print("c")

This is a simple, ordinary function. What happens when we call it?

  • 1) The caller calls func;
  • 2) func starts executing and runs until it returns;
  • 3) func completes and control returns to the caller.

The function func executes until it returns and prints:

a
b
c

So easy!

Very good!

Note that this code is written in Python, but this discussion of coroutines applies to any language, since coroutines are not specific to a particular language. We just happen to use Python as an example because it’s simple enough.

So what is a coroutine?

5. From ordinary functions to coroutines

Next, we are going to transition from ordinary functions to coroutines. Coroutines can have multiple return points, unlike ordinary functions that have only one return point.

What does that mean?

void func() {
    print("a")
    pause and return
    print("b")
    pause and return
    print("c")
}

In a normal function, the function returns only after print("c") has finished. In a coroutine, however, func returns to its caller as soon as print("a") has finished, because of the “pause and return” code.

Some of you might be confused: what is so amazing about this?

I can also write a return, like this:

void func() {
    print("a")
    return
    print("b")
    pause and return
    print("c")
}

It is possible to write a return statement, but none of the code following the return will be executed.

The magic of coroutines is that after a coroutine returns we can call it again, and it continues execution from the point where it last returned.

It’s like the Monkey King casting his “freeze” spell: the function is frozen in place.

void func() {
    print("a")
    freeze
    print("b")
    freeze
    print("c")
}

At this point control returns to the calling function, and whenever the caller is ready, it can call the coroutine again, and the coroutine continues from its previous return point.

Amazing, isn’t it? Stay focused and don’t get lost.

In other languages, there may be different implementations, but the essence is the same.

It is important to note that when a normal function returns, the process address space no longer holds any runtime information for that function. When a coroutine returns, however, its runtime information must be saved so that it can be resumed later.

Next, let’s take a look at coroutines in real code.

6. “Talk is cheap, show me the code”

Let’s use a real example. The language is Python, but even if you’re not familiar with it, there’s no barrier to understanding.

In Python, this “pause and return” is written with the keyword yield.

So our func function becomes:

def func():
    print("a")
    yield
    print("b")
    yield
    print("c")

Note: func is no longer a simple function but a coroutine. So how do we use it?

It’s simple:

def A():
    co = func()             # get the coroutine
    next(co)                # call the coroutine
    print("in function A")  # do something
    next(co)                # call the coroutine again

We can see that even though func does not have a return statement, we can still write code like co = func(), which means that co is the coroutine we get.
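One detail that is easy to miss: writing co = func() does not run any of the body. The sketch below, assuming CPython's generator-based implementation of yield, checks this with the standard inspect module; nothing is printed until next(co) is called for the first time.

```python
import inspect

def func():
    print("a")
    yield
    print("b")
    yield
    print("c")

# Calling func() only creates the coroutine object; the body has not started,
# so "a" is NOT printed by this line.
co = func()

print(type(co).__name__)              # the kind of object we got back
print(inspect.getgeneratorstate(co))  # created, but not yet started
```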

Next we call the coroutine, using next(co), and run function A to see what happens on line 3:

a

As expected, the coroutine func pauses at the yield after print("a") and returns to function A.

Next comes line 4. No surprise here: A does some work of its own, so the output so far is:

a

in function A

Now the important line, what is printed when the coroutine is called again at line 5?

If func were a normal function, the first line of func would be executed, printing a.

But func is not a normal function, it’s a coroutine, and as we said before, a coroutine continues from its last yield. So what is executed here is the code after the first yield in func, i.e. print("b"), and the output becomes:

a

in function A

b

You see, a coroutine is an amazing function that remembers its previous execution state, and when called again, it will continue from the last return point.
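For completeness, here is a runnable sketch of the same example carried one step further. The third next(co) and the StopIteration handling are additions for illustration, not part of the walkthrough above: in Python, resuming a coroutine whose body runs to the end raises StopIteration, which is how the caller learns there is nothing left to resume.

```python
def func():
    print("a")
    yield
    print("b")
    yield
    print("c")

def A():
    co = func()              # get the coroutine
    next(co)                 # runs up to the first yield, printing "a"
    print("in function A")   # the caller does its own work
    next(co)                 # resumes after the first yield, printing "b"
    try:
        next(co)             # resumes after the second yield, printing "c"
    except StopIteration:
        # The body ran to its end: there is no further suspension point.
        print("coroutine finished")

A()
```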

7. Graphical interpretation

To give you a more thorough understanding of coroutines, let’s look at them again graphically.

First is the normal function call:

In the figure: the boxes represent a function’s sequence of instructions. If the function does not call any other function, it executes from top to bottom; but a function can call other functions, so its execution is not simply top to bottom. The arrowed lines indicate the direction of the flow of execution.

From the figure we can see: execution first enters funcA, and after some time another function, funcB, is called. At that point control transfers to funcB, and when funcB finishes, execution returns to the call site in the calling function and continues. This is a normal function call.

Next comes the coroutine:

Here, too, we start executing in funcA. After running for a while, funcA calls the coroutine, which executes until its first suspension point and then returns to funcA like a normal function. funcA executes some more code and then calls the coroutine again.

Note: this is where coroutines differ from normal functions. The coroutine does not start from its first instruction but from the point where it was last suspended. It runs until its second suspension point and then, again like an ordinary function, returns to funcA. funcA runs for a while longer, and then the whole program ends.

8. Functions are only a special case of coroutines

How about that? Amazing. Unlike normal functions, coroutines know where they were last executed.

By now you get the idea that coroutines save the running state of a function when it is suspended, and can recover from that saved state and continue running.
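The saved state can actually be looked at. The following is a small sketch that relies on CPython-specific details (the gi_frame attribute of generator objects and inspect.getgeneratorstate); the counter function is a hypothetical example introduced only for this illustration.

```python
import inspect

def counter():
    total = 0
    while True:
        total += 1
        yield total          # suspension point: locals must survive this

co = counter()
next(co)                     # total becomes 1, then the coroutine pauses
next(co)                     # resumes, total becomes 2, pauses again

# While suspended, the coroutine still owns a frame holding its local variables.
state = inspect.getgeneratorstate(co)
saved_locals = dict(co.gi_frame.f_locals)

print(state)                 # GEN_SUSPENDED
print(saved_locals)          # {'total': 2}
```

Resuming with another next(co) picks up from this saved frame, which is why the count continues at 3 rather than starting over.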

Does this smell familiar? It is exactly how the operating system schedules threads (see “Understanding Threads and Thread Pools at the Bottom of Computers”): a thread can be suspended, the operating system saves its running state and schedules other threads, and when the thread is later assigned a CPU again it continues to run as if it had never been suspended.

However, the scheduling of threads is implemented by the operating system, which is not visible to the programmer, while coroutines are implemented in user mode, which is visible to the programmer.

That’s why some people say you can think of coroutines as user-mode threads.

There should be applause.

So now that the programmer can play the role of the operating system, you can control when the coroutine runs and when it stops, which means that the scheduling of the coroutine is in your own hands.

In the case of coroutines, you’re the scheduler.

When you write yield in a coroutine, you are choosing to pause it; when you call next(), you are choosing to run it again.
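To make “you’re the scheduler” concrete, here is a minimal sketch of a round-robin scheduler built from nothing but yield and next(). The names worker and run_round_robin are hypothetical, chosen for this illustration; real coroutine libraries are considerably more elaborate.

```python
from collections import deque

def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}{i}")   # do one unit of work
        yield                      # hand control back to the scheduler

def run_round_robin(coroutines):
    ready = deque(coroutines)      # queue of coroutines that can still run
    while ready:
        co = ready.popleft()
        try:
            next(co)               # resume it until its next yield
        except StopIteration:
            continue               # it finished: do not reschedule it
        ready.append(co)           # still alive: back of the queue

log = []
run_round_robin([worker("x", 2, log), worker("y", 3, log)])
print(log)  # ['x0', 'y0', 'x1', 'y1', 'y2']
```

The interleaving in the log is decided entirely by run_round_robin, not by the operating system.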

Now you can see why a function is just a special case of a coroutine: a function is simply a coroutine with no suspension points.

9. History of coroutines

Some of you may think that coroutines are a relatively new technology, but coroutines were first introduced in 1958, before the concept of threads was even introduced.

By the 1970s, the concept had finally been implemented in programming languages, among them Simula 67 and Scheme.

But the coroutine concept never caught on, and as late as 1993 people were still writing “archaeological” papers to dig up this ancient technique.

There were no threads in those early days, so if you wanted to write concurrent programs you had to rely on techniques like coroutines. Later, threads came along, operating systems began to support concurrent execution natively, and coroutines gradually faded from programmers’ view.

In recent years, with the growth of the Internet and especially the arrival of the mobile Internet era, servers face ever higher concurrency requirements, and coroutines have returned to the technical mainstream. Major programming languages either already support coroutines or plan to.

So how does a coroutine actually work?

10. How is a coroutine implemented?

Let’s think about this from the nature of the problem: what is the essence of a coroutine?

It is a function that can be paused and resumed. So what does it mean to be paused and resumed?

Anyone who has watched a basketball game knows (and those who have not can guess) that a game can be paused at any time. When it is paused, everyone needs to remember who has the ball and where each player is standing. When play resumes, everyone goes back to their position, the referee blows the whistle, and the game carries on as if it had never been paused.

The key point: play can be paused and resumed because the state of the game is recorded (positions, who has the ball). In computer science this state is known as the context.

Back to the coroutine.

If a coroutine can be paused and resumed, its state (context) must be recorded when it is paused and restored when it resumes. And all of a function’s state while it is running lives on the function’s runtime stack.

The function’s runtime stack is the state we need to save, the so-called context.

As shown in the figure:

There is only one thread in the process, and the stack area holds four stack frames: main calls A, A calls B, and B calls C. While C is running, the state of the process is as shown in the figure.

Now that we know a function’s runtime state is stored in a stack frame in the stack area, here comes the important part.

Since the runtime state of a function is stored in a stack frame in the stack area, we must store the data for the entire stack frame if we want to pause the coroutine. Where should we store the data for the entire stack frame?

Consider this question: which part of a process’s memory is meant for data that must live for a long time (up to the lifetime of the process)? Did your mind go blank again?

Don’t go blank yet!

It is the heap. We can store the stack frames in the heap. How do we store data in the heap? Hopefully you’re not panicking: allocating space on the heap is something we do all the time, with malloc in C or new in C++.

All we need to do is allocate a block of space on the heap and save the coroutine’s entire stack there. When the coroutine needs to run again, we copy the data back out of the heap to restore the function’s running state.

But think again: why bother copying data back and forth?

In fact, we can allocate the stack frame space that the coroutine needs directly on the heap and let it run there, so that no data has to be copied back and forth, as shown in the figure below.

From the figure we can see: the program has started two coroutines, both of which have their stacks allocated on the heap, so we can interrupt or resume either of them at any time.
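The “two coroutines, each with its own saved state” picture can be reproduced in a few lines. This is a sketch under the assumption of CPython, where each generator object keeps its own frame in heap-allocated objects rather than on the thread’s call stack; countdown is a hypothetical example function.

```python
def countdown(start):
    n = start
    while n > 0:
        yield n              # pause here, keeping n for the next resume
        n -= 1

# Two coroutines created from the same function, each with private state.
first = countdown(3)
second = countdown(10)

# Resume them alternately: neither one disturbs the other's local variable n.
trace = [next(first), next(second), next(first), next(second)]
print(trace)                                  # [3, 10, 2, 9]
print(first.gi_frame is not second.gi_frame)  # True: separate frames
```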

Now, some of you may ask: what is the stack area at the top of the process address space used for now?

The answer is: that area still holds the stack frames of functions, but of functions that run in ordinary threads rather than in coroutines.

Now you can see that there are actually three execution flows in the figure above:

  • 1) one ordinary thread;
  • 2) two coroutines.

There are three execution flows, but how many threads have we created?

The answer is: one.

Now you should understand why we use coroutines: in theory, we can start as many concurrent execution flows as we like, as long as there is enough heap space, without creating any additional threads. All the overhead of scheduling and switching coroutines stays in user mode, which is why coroutines are also called user-mode threads.

Where’s the applause?

Thus, even if you create N coroutines, there is still only one thread from the operating system’s point of view; in other words, coroutines are invisible to the operating system.
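This claim is easy to check from inside a program. The sketch below starts 10,000 Python coroutines and uses the standard threading module to confirm that they all run on the calling thread and that the thread count does not change; the figure 10,000 and the function name task are arbitrary choices for illustration.

```python
import threading

def task():
    # Report which OS-level thread this coroutine is running on, then pause.
    yield threading.get_ident()

threads_before = threading.active_count()

coroutines = [task() for _ in range(10_000)]   # 10,000 execution flows
idents = {next(co) for co in coroutines}       # run each up to its first yield

threads_after = threading.active_count()

# Every coroutine ran on the caller's thread, and no new thread appeared.
print(idents == {threading.get_ident()})
print(threads_after - threads_before)
```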

This may be why coroutines were proposed earlier than threads. Perhaps programmers writing ordinary applications ran into the need for multiple parallel execution flows before programmers writing operating systems did. At that time there may have been no concept of an operating system at all, or operating systems offered no support for such parallelism, so programmers outside the operating system could only implement execution flows themselves, and that is the coroutine.

Now you should have a clear idea of coroutines.

11. Coroutine technical concept summary

The text above uses a playful tone to help you grasp the concept of coroutines in a relaxed, humorous way. So, to sum up in more serious, technical terms: what exactly is a coroutine?

11.1 Coroutines are smaller units of execution than threads

A coroutine is a smaller unit of execution than a thread, which you can think of as a lightweight thread.

One reason is that a coroutine holds a much smaller stack than a thread. In Java, each thread is allocated about 1 MB of stack space, while a coroutine may need only tens or hundreds of KB. The stack mainly holds function parameters, local variables, and return addresses.
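As a very rough illustration of the size difference, the sketch below asks CPython how large a suspended coroutine object is. This is an assumption-laden measurement, not a benchmark: sys.getsizeof reports only the generator object itself (not everything it references), the number varies between Python versions, and Python generators are not the same thing as the stackful coroutines of a library like libco.

```python
import sys
import threading

def func():
    print("a")
    yield

co = func()

# Size in bytes of the coroutine object as reported by CPython.
coroutine_bytes = sys.getsizeof(co)

# Stack size used for newly created threads; 0 means "platform default".
thread_stack_setting = threading.stack_size()

print(coroutine_bytes)
print(thread_stack_setting)
```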

Thread scheduling is done by the operating system, whereas coroutine scheduling is done in user space: the developer performs it by calling the system’s low-level APIs for manipulating execution contexts. Some languages and platforms, such as Node.js and Go, support coroutines at the language level, while others, such as C, need a third-party library to gain coroutine capabilities (for example WeChat’s open source libco library, see “Open source libco library: ten million connections on a single machine, the foundation of the backend framework supporting WeChat’s 800 million users”).

Since the thread is the smallest unit of execution for the operating system, it follows that coroutines are implemented on top of threads: a coroutine is created, switched, and destroyed within some thread.

Coroutines are used because switching threads is expensive, and this is where coroutines have the advantage.

11.2 Why is coroutine switching cheap?

To answer this, let’s first review what happens during a thread switch:

  • 1) When threads are switched, the contents of the CPU registers must be saved and the other thread’s data loaded, which takes time;
  • 2) Data in the CPU cache may be invalidated and have to be reloaded;
  • 3) A thread switch involves a transition from user mode to kernel mode, which is said to cost thousands of instructions each time and is therefore time-consuming.

In fact, I think the main reasons coroutine switching is fast are:

  • 1) Relatively little register data has to be saved and loaded on a switch;
  • 2) The cache can be used effectively;
  • 3) There is no switch from user mode to kernel mode;
  • 4) Scheduling is more efficient: coroutines are non-preemptive, so a switch happens only when the running coroutine finishes or blocks, whereas threads are generally scheduled with a time-slice algorithm that causes many unnecessary switches (so that the user does not notice any single thread stalling).
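The non-preemptive point in item 4 can be sketched in Python. In the hypothetical example below, two coroutines update a shared counter with a separate read and write step. Because a switch can only happen at a yield, the read-modify-write can never be interrupted halfway, so no lock is needed and the result is exact. With preemptively scheduled threads the same code would in general need a lock.

```python
state = {"count": 0}         # data shared by both coroutines

def incrementer(times):
    for _ in range(times):
        current = state["count"]        # read
        state["count"] = current + 1    # write; no yield between read and write
        yield                           # the only place a switch can happen

a = incrementer(1000)
b = incrementer(1000)

# Alternate between the two coroutines, one increment each per round.
for _ in range(1000):
    next(a)
    next(b)

print(state["count"])  # 2000
```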

12. In closing

By now I believe you understand what coroutines are all about. For more systematic knowledge of coroutines you can consult the relevant literature; I won’t go on at length here.

Stay tuned for the next article, “Understanding High Performance, High Concurrency at the Root (6): How High-concurrency, High-performance Servers Are Implemented”. (This article was published simultaneously at www.52im.net/thread-3306…)