1. Introduction to a series of articles

1.1 Purpose

As an instant messaging developer, you already know the vocabulary of high performance and high concurrency: thread pools, zero copy, multiplexing, event-driven design, epoll, and so on. You may even be proficient in a framework built on these features: Netty in Java, Workerman in PHP, gnet in Go, etc. But in a real interview, or in actual technical practice, the doubts refuse to go away, and you realize that what you have mastered is only the surface.

What are the underlying principles behind these technical features? Understanding high performance and high concurrency at its roots is exactly what this series of articles is about.

1.2 Article origin

I have collected quite a few resources and articles on IM, message push, and other instant-communication technologies: from the open source IM framework MobileIMSDK, to the online edition of the classic network-programming book "TCP/IP Illustrated", to the programmatic IM-development guide "A Beginner's Introduction Is Enough: Developing Mobile IM from Scratch", as well as the shallow-to-deep network-programming series "A Lazy Introduction to Network Programming", "Network Programming for Dummies", "High-Performance Network Programming", and "Unknown Network Programming".

The deeper I went, the more I realized how little I knew about instant messaging technology. Later, to help developers better understand the characteristics of networks (especially mobile networks) from the perspective of basic telecom technology, I collected a series of advanced articles titled "Introduction to Zero-Base Communication Technology for IM Developers". That series already sits at the edge of what the average IM developer needs to know about network communication, and together with the network-programming materials above it is almost enough to cover the blind spots in network communication knowledge.

Knowledge of network communication is certainly important for building systems such as instant messaging, but what about the technical essentials of network communication itself: the thread pools, zero copy, multiplexing, and event-driven mechanisms mentioned above? What are their underlying principles? Answering that is the purpose of this series, which I hope you will find useful.

1.3 Article Contents

Understanding High Performance and High Concurrency at Its Roots (Part 1): Understanding Threads and Thread Pools

Understanding High Performance and High Concurrency at Its Roots (Part 2): Understanding I/O and Zero-copy Technologies

"Understanding High Performance and High Concurrency at Its Roots (Part 3): Understanding I/O Multiplexing from the Ground Up" (* this article)

Understanding High Performance and High Concurrency at Its Roots (Part 4): Dive Deep into Operating Systems to Understand Synchronization and Asynchrony (to be released later)

Understanding High Performance and High Concurrency at Its Roots (Part 5): How High-Concurrency, High-Performance Servers Really Work (to be released later)

1.4 Overview of this Article

Following on from "Understanding I/O and Zero-copy Technologies", this is the third article in the high-performance, high-concurrency series. The previous article covered I/O techniques; this one starts from the more concrete topic of files and walks you, step by step, through I/O multiplexing, a technology that no one writing high-performance, high-concurrency servers can avoid.

2. The author of this article

At the author's request, no real name or personal photo is provided.

The author's main technical areas are Internet back ends, high-concurrency and high-performance servers, and search engine technology. He goes by the online name "Desert Island Survival of a Coder" and runs a public account of the same name. Thanks to the author for sharing.

3. What is a file?

Before we begin this article, we need to go over the concepts of files and file descriptors.

Programmers who work with I/O cannot escape the concept of a file.

In the Linux world, a file is a very simple concept: as programmers, we only need to understand it as a sequence of N bytes:

b1, b2, b3, b4, ..., bN

Virtually all I/O devices are abstracted into the concept of a file ("everything is a file"): disks, network data, terminals, even pipes are all treated as files.

All I/O operations can then be performed through file reads and writes. This very elegant abstraction allows programmers to drive all peripherals through a single set of interfaces.

Common I/O operation interfaces are as follows:

  • 1) Open the file: open;
  • 2) Change the read/write position: seek;
  • 3) Read and write the file: read and write;
  • 4) Close the file: close.

The power of the file concept is that programmers can perform almost any I/O operation through these interfaces.

4. What is a file descriptor?

In the previous article, "Understanding I/O and Zero-copy Technologies", we explained that to perform an I/O read, such as reading disk data, we need to specify a buffer to hold the data.

It usually goes something like this:

read(buff);

But we are missing a key point here: we have specified where to write the data, but where do we read the data from?

As we know from the previous section, almost any I/O operation can be carried out through the concept of a file, so the missing piece here is the file.

So how do we use files in general?

**Here's an example:** if you have ever gone to a popular restaurant on a weekend, you have probably had to queue. The waiter gives you a queue number, and through that number the waiter can find you. The benefit is that the waiter does not need to remember who you are, your name, where you come from, or what your hobbies are. The key point is this: the waiter knows nothing about you, yet can still find you with a number.

**Likewise:** to use a file in the Linux world, we also need a number, which, following Linux's famously obscure naming conventions, is called a file descriptor. It is famous in the Linux world for the same reason as the queue number above.

**Therefore:** a file descriptor is just a number, but through this number we can manipulate an open file. Remember that.

With file descriptors, a process need know nothing about the file: where it sits on disk, how it is loaded into memory, and so on. All of that information is the operating system's business; the operating system only needs to hand the process a file descriptor.

So let’s improve the above procedure:

int fd = open(file_name); // get the file descriptor
read(fd, buff);

How about that? It’s very simple.

5. What if there are too many file descriptors?

After all this preparation, we finally arrive at the theme of high performance and high concurrency.

As we saw in the previous sections, all I/O operations can be carried out through the file concept, and that of course includes network communication.

If you run an IM server, you call accept to obtain a connection once the three-way handshake succeeds. That call also returns a file descriptor, through which we can receive the chat messages a client sends and forward them to the receiver.

That is, we can communicate with the client using this descriptor:

// Get the client's file descriptor with accept
int conn_fd = accept(...);

The processing logic on the server side is usually to receive the client's message data and then run the forwarding (to the receiver) logic:

if (read(conn_fd, msg_buff) > 0) {
    do_transfer(msg_buff);
}

But the world is complicated, and of course things are not that simple.

Now comes the tricky part.

Since our topic is high concurrency, the server cannot communicate with just one client; it may be communicating with thousands of clients simultaneously. You are no longer dealing with one descriptor but with tens of thousands of descriptors.

To avoid getting too complicated at first, let’s simplify and assume that only two client requests are processed at the same time.

Some of you might say: "Well, that's easy, I'll just write it like this:"

if (read(socket_fd1, buff) > 0) { // process the first
    do_transfer();
}

if (read(socket_fd2, buff) > 0) { // process the second
    do_transfer();
}

As we discussed in the last article, "Understanding I/O and Zero-copy Technologies", this is very typical blocking I/O: if there is no data to read, the process is blocked and suspended. While blocked, we cannot process the second request even if its data has already arrived. In other words, while one client is being served, all remaining clients must wait; for a server handling tens of thousands of clients simultaneously, that is clearly intolerable.

Being smart, you might think of multithreading: start one thread per client request, so that one blocked client does not affect the threads serving the others. But note: since concurrency is high, do we really want to start thousands of threads for thousands of requests? Creating and destroying threads in large numbers seriously hurts system performance.

So how to solve this problem?

**The key point here is:** we do not know in advance whether the device behind a file descriptor is readable or writable, and issuing I/O while the peripheral is not ready will only cause the process to block and be suspended.

So the elegant way to solve this problem is to approach it from a different direction.

6. “Don’t call me, I’ll call you if I need anything.”

You have probably received cold calls in your life; ten or eight of them a day would wear anyone out.

The crux of this scenario is that the caller does not know whether you want anything, so they can only ask you again and again. A better strategy is not to let them call you at all: take down their number and call them if you ever need something. That way the salesperson cannot bother you over and over (although that is hardly possible in real life).

**In this analogy:** you are the kernel, the salesperson is the application, the phone number is the file descriptor, and talking on the phone is the I/O.

By now you can see that a better way to handle multiple file descriptors can be found in the cold-call story.

Therefore, rather than asking the kernel through the I/O interfaces whether the peripheral behind each file descriptor is ready, a better approach is to hand the file descriptors we care about to the kernel and tell it: "I have 10,000 file descriptors here. Monitor them for me, and notify me when any of them become readable or writable so I can handle them." That beats weakly asking the kernel: "Is the first file descriptor readable or writable? Is the second? Is the third? ..."

**Thus:** the application goes from busily active to idly passive: the kernel will notify me when any file descriptor becomes readable or writable, so I can afford to be lazy instead of diligent.

This is a much more efficient I/O handling mechanism, and it lets us handle multiple I/O streams at once. We call this mechanism I/O multiplexing.

7. What is I/O multiplexing?

The term multiplexing actually comes from the field of communications. To make full use of a communication line, one wants to transmit multiple signals over a single channel, which requires combining the signals into one; the device that combines them is called a multiplexer. The receiving side obviously needs to recover the original signals from the combined one; that device is called a demultiplexer.


Back to our subject.

I/O multiplexing refers to a process that:

  • 1) We get a batch of file descriptors (network, disk file, or any other kind);
  • 2) We call a function that tells the kernel: "Do not return from this function; monitor these descriptors for me and return only when some of them are ready for I/O";
  • 3) When that function returns, we know which file descriptors can be used for I/O operations.

In other words, through I/O multiplexing we can handle multiple I/O streams simultaneously. So which functions can be used for I/O multiplexing?

In Linux, for example, there are three mechanisms that can be used for I/O multiplexing:

  • 1) select;
  • 2) poll;
  • 3) epoll.

Let’s take a look at the three awesome I/O multiplexers.

8. The three musketeers of I/O multiplexing

On Linux, select, poll, and epoll are all blocking I/O, also known as synchronous I/O.

The reason is that when these I/O multiplexing functions are called, if none of the monitored file descriptors is readable or writable, the process blocks and is suspended until some descriptor becomes ready.

8.1 select: the novice

Under the select mechanism, we tell select the set of file descriptors we want to monitor by passing it as a function parameter; select then copies that set into the kernel.

To limit the performance cost of this copying, the Linux kernel caps the size of the set: a user may monitor no more than 1024 file descriptors. Moreover, when select returns, we only know that some file descriptors are readable or writable, not which ones, so the programmer must iterate over the whole set to find out.

Therefore, select has the following characteristics:

  • 1) the number of file descriptors it can watch is capped at 1024;
  • 2) the descriptor set supplied by the user must be copied into the kernel;
  • 3) it can only report that some descriptors are ready, not which ones; you must check them one by one.

As you can see, these characteristics make select inefficient for a highly concurrent network server with tens of thousands of connections.

8.2 poll: a modest improvement

poll is very similar to select.

poll's only improvement over select is removing the 1024 file descriptor limit. Both select and poll degrade as the number of monitored file descriptors grows, so neither is suitable for high-concurrency scenarios.

8.3 epoll: in a class of its own

Of the three problems select faces, the file descriptor limit is solved by poll. What about the remaining two?

For the copying problem, epoll uses a strategy of incremental updates plus shared memory.

**Actually:** the set of file descriptors changes so rarely that copying the entire set on every call, as select and poll do, borders on harassment of the kernel. epoll is more considerate: through epoll_ctl, it operates only on the file descriptors that have changed. epoll and the kernel also become good friends, sharing a block of memory that holds the set of descriptors that are already readable or writable, which reduces the copying overhead between the kernel and the program.

To solve the problem of traversing all the descriptors to find out which are readable or writable, epoll enlists helpers.

With select and poll, the process itself must wait on the descriptors and is woken when any of them becomes readable or writable; but once woken, it has no idea which descriptor is ready and must check from beginning to end all over again.

But epoll is smarter: it lets the ready file descriptors step forward and report themselves.

When a file descriptor becomes readable or writable, the kernel tells epoll, which records it in a little notebook and then wakes the process: "Wake up, brother process, I've already written down all the file descriptors you need to handle." So the process does not have to scan from start to finish when it wakes; epoll has taken notes for it.

So we can see that epoll actually uses the "don't call me, I'll call you if I need anything" strategy: the process no longer has to ask each file descriptor over and over; instead, the readable and writable descriptors actively report themselves.

This mechanism is actually known as event-driven, which will be the subject of our next article.

**In reality:** on Linux platforms, epoll is practically synonymous with high concurrency.

9. Summary of this article

Based on the design philosophy that "everything is a file", all I/O can be carried out through files. In high-concurrency scenarios, many such files must be handled at the same time, which calls for efficient I/O multiplexing technology.

In this article we examined in detail what I/O multiplexing is and how it is used. Among these mechanisms, epoll, the representative of I/O multiplexing (and event-driven) technology, is the most widely used; in fact, you will find the event-driven programming style in every scenario involving high concurrency and high performance. That is also the focus of the next article, "Understanding High Performance and High Concurrency at Its Roots (Part 4): Dive Deep into Operating Systems to Understand Synchronization and Asynchrony". Stay tuned! (This article is simultaneously published at: www.52im.net/thread-3287…)