Preface

The most basic TCP socket programming uses the blocking I/O model and can essentially only handle one-to-one communication. To serve more clients, we need to improve the network I/O model. The traditional approach is the multi-process/multi-thread model, which allocates one process or thread to each client connection and keeps all subsequent reads and writes in that process or thread. This works fine with 100 clients, but when the number of clients grows to 10,000, scheduling 10,000 processes/threads, the context switching between them, and the memory they consume all become bottlenecks. I/O multiplexing emerged to solve this problem.

1. Traditional Socket communication

Traditional socket communication can only be one-to-one, and its invocation flow is the simplest and most basic. Because the calls are synchronous and blocking, while the server has not finished processing one client's network I/O, or while a read or write is blocked, other clients cannot be served.

2. How to handle more requests: the multi-process model

The drawbacks of the traditional approach are well known, so how do we handle this situation? The obvious answer is the multi-process model, which assigns one process to each client. Once a connection is complete, accept() returns a "connected socket." At this point a child process is created with fork(), which in effect copies everything related to the parent process: file descriptors, the memory address space, the program counter, the code being executed, and so on.

Because the child process copies the parent's file descriptors, it can use the connected socket directly to communicate with the client. Notice that the child does not need to care about the listening socket, only about the connected socket, while the parent hands the client off to the child, so it does not need to care about the connected socket, only the listening socket.

This approach of using multiple processes to serve multiple clients is feasible for a small number of clients, but when the client count reaches 10,000 it certainly cannot hold up: each process occupies a certain amount of system resources, and the cost of inter-process context switching takes a heavy toll on performance. After all, a process context switch covers not only user-space resources such as virtual memory, the stack, and global variables, but also kernel-space resources such as the kernel stack and registers.

3. The multi-thread model

As mentioned above, with a large number of connections the multi-process model costs more than it is worth because processes are heavyweight. We can adopt the multi-thread model instead, because threads are lightweight relative to processes and the cost of context switching between them is much more reasonable. A single process can run multiple threads, and the threads within a process share part of its resources, such as the file descriptor table, the address space, code, global data, the heap, and shared libraries. These shared resources do not need to be switched during a context switch; only the thread's private data, registers, and other unshared state must be switched. Therefore, the overhead of a thread context switch within the same process is much lower than that of a process switch.

In combination with the process model shown above, the threaded version is easy to picture: the server accepts a connection and hands it to a thread. There is a slight problem, however: we cannot create a thread for every new connection and destroy it when the connection closes, since thread creation and teardown are themselves a performance drain. We can solve this with a thread pool. A thread pool creates threads in advance, so that when a new connection arrives, the connected socket is put into a queue, and the threads in the pool fetch connected sockets from the queue and process them.

Drawback of the multi-thread model: C10K

Since the thread pool is global, every incoming connection claims a thread, so reaching C10K means one machine has to maintain 10,000 connections, which is equivalent to maintaining 10,000 processes/threads — the operating system simply cannot sustain that. This is only a brief mention rather than a full introduction to C10K; in fact, C100K and beyond exist today, and you can look them up yourself.

4. I/O multiplexing

In fact, I/O multiplexing technology was designed to solve the disadvantages above, by maintaining multiple client connections in a single process. How should we interpret that? Although one process can only handle one request at a time, we can shorten the time each request takes — say, to 1 millisecond — so that a single process can handle 1,000 requests per second. It is just like a single-core CPU processing multiple tasks by rapidly switching between them, back before single cores gave way to dual cores…

The familiar select/poll/epoll are the multiplexing system calls the kernel provides to user space: a process can retrieve multiple events from the kernel through a single system call. How do select/poll/epoll get network events? To retrieve events, all connections (file descriptors) are passed to the kernel, the kernel returns the connections that generated events, and the connection requests are then processed in user space. select, poll, and epoll are the three multiplexing interfaces — can they all achieve C10K? Let's look at each in turn.

5. select/poll

select implements multiplexing by placing all socket connections into a file descriptor set, then calling select to copy that set into the kernel. The kernel iterates through the set and, when it detects an event, marks the socket as readable or writable. The entire file descriptor set is then copied back to user space, which iterates over it again to find the readable or writable sockets and processes them.

So with select the file descriptor set is traversed twice, once in kernel space and once in user space, and copied twice, first from user space into the kernel, where it is modified, and then back out to user space. poll implements multiplexing in the same way as select, except that select uses a fixed-length BitsMap and so has a length constraint, while poll does not: poll uses a dynamic array, which removes select's limit on the number of file descriptors (though it is, of course, still bounded by the system's file descriptor limits).

poll and select are not fundamentally different. Both use a "linear structure" to store the set of sockets a process cares about, so both need to traverse the file descriptor set to find readable or writable sockets, an O(n) operation, and both need to copy the set between user space and kernel space. As the number of concurrent connections grows, this overhead grows with it and performance degrades.

6. epoll

epoll solves the select/poll problems in two ways.

1. epoll uses a red-black tree in the kernel to track all the file descriptors to be checked, and the socket to be monitored is added to that red-black tree via the epoll_ctl() function. The red-black tree is an efficient data structure whose insert and delete operations generally run in O(log n). By operating on this red-black tree, there is no need to pass in the entire socket set on every call as with select/poll; only the one socket being added, modified, or removed is passed in, which reduces data copying and memory allocation between kernel and user space.

2. epoll uses an event-driven mechanism. A linked list of ready events is maintained in the kernel; when an event occurs on a socket, the kernel adds it to this list through a callback function. There is no need to poll and scan the whole socket set as select/poll does, which greatly improves detection efficiency.

In epoll mode, efficiency does not drop much even as the number of monitored sockets grows, and the number of sockets that can be monitored at the same time is very large; the upper limit is the maximum number of file descriptors the system allows a process to open. epoll is therefore regarded as a sharp tool for solving the C10K problem.

epoll supports two event-trigger modes: edge-triggered (ET) and level-triggered (LT).

• In edge-triggered mode, when a readable event occurs on the monitored socket, the server is woken from epoll_wait only once, even if the process never calls read to take the data from the kernel. The program must therefore make sure it reads the kernel buffer in full in one go.
• In level-triggered mode, when a readable event occurs on the monitored socket, the server keeps being woken from epoll_wait until the kernel buffer has been read empty — it keeps telling us there is data to read.

For example, suppose your parcel is placed in a delivery locker. If the locker texts you only once, and never sends a second reminder even if you never pick the parcel up, that is edge triggering. If the locker keeps texting you as long as it finds your parcel has not been taken out, and only stops once you take it, that is level triggering.

That is the difference between the two. Level triggering means that as long as the event's condition holds — for example, there is data in the kernel waiting to be read — the event keeps being delivered to the user. Edge triggering means the event fires only the first time the condition is met, and the same event is never delivered again after that.

With level-triggered mode, when the kernel notifies us that a file descriptor is readable or writable, we can simply keep checking its state afterwards to see whether it is still readable or writable, so there is no need to read or write as much as possible in one go after each notification.

In edge-triggered mode, we are notified only once when an I/O event occurs, and we do not know how much data can be read or written, so after receiving the notification we should read or write as much as possible so as not to miss the chance. We therefore loop reading or writing on the file descriptor — but if the descriptor is in blocking mode, the process will block inside the read or write call once there is no more data, and the program cannot proceed. Thus, edge-triggered mode is typically used in conjunction with non-blocking I/O: the program keeps performing I/O until the system call (such as read or write) returns an error of type EAGAIN or EWOULDBLOCK.

In general, edge triggering is more efficient than level triggering because it reduces the number of epoll_wait system calls, and system calls carry some overhead of their own, including context switches.

select/poll have only level-triggered mode. epoll defaults to level-triggered mode, but it can be set to edge-triggered mode depending on the application scenario.

To put it simply, the events returned by a multiplexing API are not guaranteed to still be readable or writable by the time you act on them. If you use blocking I/O, the program would block inside read/write in that case, so it is best to use non-blocking I/O to handle these rare corner cases.

Conclusion

The most basic TCP socket programming uses the blocking I/O model and can essentially only handle one-to-one communication; to serve more clients, we need to improve the network I/O model.

The traditional approach is the multi-process/multi-thread model, which allocates one process or thread to each client connection and keeps all subsequent reads and writes in that process or thread. This works fine with 100 clients, but when the number of clients grows to 10,000, scheduling 10,000 processes/threads, the context switching between them, and the memory they consume all become bottlenecks.

To solve this problem, I/O multiplexing emerged: a single process can handle I/O for multiple files. Linux provides three I/O multiplexing APIs: select, poll, and epoll.

select and poll are not fundamentally different: both use a "linear structure" to store the set of sockets a process cares about.

When they are used, the socket set of interest must be copied from user space into the kernel by the select/poll system call, and the kernel then checks for events. When a network event occurs, the kernel has to iterate through the process's socket set to find the corresponding sockets and mark them readable/writable, then copy the entire set back from kernel space to user space, where user code must again traverse the whole set to find the readable/writable sockets and process them.

Obviously, the drawback of select and poll is that the larger the number of clients — that is, the larger the socket set — the more overhead its traversal and copying bring, so they struggle to cope with C10K.

epoll is the tool that solves the C10K problem. It fixes the select/poll problems in two ways.

First, epoll uses a "red-black tree" in the kernel to hold the sockets the process wants checked. The red-black tree is an efficient data structure whose insert and delete operations generally run in O(log n); by managing this tree, epoll does not need to pass in the whole socket set on every operation as select/poll do, which reduces data copying and memory allocation between kernel and user space. Second, epoll uses an event-driven mechanism: a "linked list" of ready events is maintained in the kernel, and only the sockets on which events occurred are passed to the application. In addition, epoll supports both edge and level triggering, while select/poll support only level triggering; in general, edge triggering is more efficient.

A final word: thanks for the likes!

Welcome to follow my WeChat official account [Village of the Apes].

To talk about Java interviews or to study together, search [codeYuanzhicunup] on mobile WeChat to add me. If you have related technical questions, feel free to leave a message to discuss. The account is mainly used for technology sharing, including analyses of frequently asked interview questions, source-code walkthroughs, microservice frameworks, technical hot spots, and more.