Blog.csdn.net/tycoon1988/…

Question:

I have a single-process Linux epoll server program that I recently wanted to rewrite as a multi-process version.

The main reasons are: 1. The number of concurrent network requests during peak periods is very large, and the current single-process version cannot keep up. 2. I want to make full use of the server's multiple CPUs.

The plan was roughly: listen_fd = socket(…); epoll_fd = epoll_create(…); then fork(), with each child process entering a big loop that calls epoll_wait(…), accepts new connections, and handles events.
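A minimal sketch of that layout (my own illustration, not the actual server code; the port number and the count of four children are arbitrary): the epoll instance is created before fork(), so every child waits on the same epoll_fd and the same listen_fd.

```c
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    int on = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                     /* illustrative port */
    bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr));
    listen(listen_fd, 128);
    fcntl(listen_fd, F_SETFL, fcntl(listen_fd, F_GETFL, 0) | O_NONBLOCK);

    int epoll_fd = epoll_create(256);                /* created BEFORE fork(): shared by all children */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (int i = 0; i < 4; i++) {                    /* e.g. one child per CPU */
        if (fork() == 0) {
            struct epoll_event events[64];
            for (;;) {                               /* the big loop in each child */
                int n = epoll_wait(epoll_fd, events, 64, -1);
                for (int j = 0; j < n; j++) {
                    if (events[j].data.fd == listen_fd) {
                        int conn = accept(listen_fd, NULL, NULL);
                        if (conn < 0)
                            continue;                /* another child won the race (EAGAIN) */
                        /* ... handle the connection ... */
                        close(conn);
                    }
                }
            }
        }
    }
    for (;;) pause();                                /* parent does nothing else here */
}
```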

When listen_fd receives a new connection, the operating system wakes up all of the child processes (since they are all epoll_wait()ing on the same listening socket, the kernel does not know which one should accept, so it simply wakes them all up). However, only one process will accept() successfully; the others fail. English-speaking developers picture all the child processes as a stampeding herd, which is why this is called the "thundering herd" problem.

For example, imagine a McDonald's on the street with four small service windows and a server at each window. When a new customer walks in the door, the doorbell rings automatically (the operating system catches a network event), and all four servers look up (the operating system wakes up all the service processes), each hoping to direct the customer to their own window. In the end the customer walks to only one window, while the other three servers fail to "accept" the customer and go back to what they were doing. This is bound to waste resources. Is there a good solution?

 

Finding a solution:

I read a great many posts and web pages, studied the source code of several excellent open source programs, combined that with my own experiments, and concluded: 1. In practice, not all of the child processes are woken up when the thundering herd occurs; only some of them are. Even so, among the woken processes only one accept() succeeds and all the others fail with errno = EAGAIN. 2. Every multi-process server program built on the Linux epoll mechanism is affected by the thundering herd, including Lighttpd and Nginx, and each program handles it differently.

Lighttpd's solution: ignore the thundering herd. It uses the Watcher/Workers pattern, optimizes where fork() and epoll_create() happen (each child process calls epoll_create() and epoll_wait() itself), and catches and ignores the errors returned by accept(). With this approach, multiple Lighttpd child processes are still woken up when a new connection arrives.

Nginx's solution: avoid the thundering herd. It uses a global mutex: before calling epoll_wait(), every child process tries to acquire the lock; the one that gets it continues, and the others wait. On top of that there is a load-balancing rule (once a child process has used 7/8 of its configured connection quota, it stops trying to acquire the lock) to balance the work across processes.

In addition, it is widely said that the Linux 2.6.x kernel has already solved the thundering herd problem for accept() (paper: static.usenix.org/event/useni…). However, the improvement described in that paper does not completely solve the problem in real production environments, because most multi-process server programs fork() and then call epoll_wait(listen_fd, …), so every process is still woken up when a new connection arrives on listen_fd. The paper's main improvement is to make accept() effectively exclusive at the kernel level, so that it is not re-entered by multiple processes for the same connection.

 

After weighing the options, I finally chose to follow Lighttpd's Watcher/Workers model to implement the multi-process epoll program I needed. The core flow: the main process listens on the port first: listen_fd = socket(…); setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, …); setnonblocking(listen_fd); listen(listen_fd, …). Then it fork()s. Once the maximum number of child processes is reached (I recommend configuring this based on the actual number of CPU cores on the server), the main process becomes a Watcher and only does global work such as child process maintenance and signal handling. Each child process (Worker) creates its own epoll instance, epoll_fd = epoll_create(…), adds listen_fd to epoll_fd, and then enters the big loop, where epoll_wait() waits for and handles events. Note that the epoll_create() step must come after fork(). A bold idea (not implemented): each Worker process could use multiple threads to speed up the handling of socket fds in the big loop, adding a mutex for synchronization if necessary.
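A rough skeleton of that Watcher/Workers flow, as I understand it (my own sketch, not Lighttpd's source; it assumes the Watcher has already created listen_fd as above and set it non-blocking, and the worker count is passed in):

```c
/* Watcher/Workers sketch (illustration only, not Lighttpd code).
 * Key point: epoll_create() is called AFTER fork(), so every Worker owns
 * a private epoll instance that watches the shared, non-blocking listen_fd. */
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

static void worker_loop(int listen_fd) {
    int epoll_fd = epoll_create(256);                      /* per-Worker epoll */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[64];
    for (;;) {                                             /* the big loop */
        int n = epoll_wait(epoll_fd, events, 64, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                int conn = accept(listen_fd, NULL, NULL);
                if (conn < 0) {
                    /* lost the race to another Worker: ignore, as Lighttpd does */
                    if (errno == EAGAIN || errno == EWOULDBLOCK)
                        continue;
                    continue;
                }
                /* register conn with epoll_fd and serve the request ... */
                close(conn);
            }
            /* ... handle the other fds owned by this Worker ... */
        }
    }
}

void run_watcher(int listen_fd, int nworkers) {            /* nworkers ~ CPU cores */
    for (int i = 0; i < nworkers; i++)
        if (fork() == 0) { worker_loop(listen_fd); _exit(0); }

    for (;;) {                                             /* Watcher: child maintenance, signals, ... */
        if (waitpid(-1, NULL, 0) > 0) { /* optionally re-fork a replacement Worker */ }
    }
}
```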


To analyze the event-handling flow, let's take how a single process and how multiple processes handle an HTTP request as examples. I inserted ngx_log_debugX() calls (already available in Nginx) into the main event-handling functions ngx_epoll_process_events() and ngx_event_process_posted(). To see the output you need to add the "--with-debug" parameter at compile time, set "error_log logs/debug.log debug_core | debug_event | debug_http;" in nginx.conf, and then restart nginx.


worker_processes 1:

1. During initialization, ngx_worker_process_init() calls ngx_epoll_add_event() twice. The first call is in ngx_event_process_init(), which adds an NGX_READ_EVENT event for each listening port (in my case only port 80). The second is in ngx_add_channel_event(), which adds an NGX_READ_EVENT event for the socketpair used for inter-process communication.

2. ngx_epoll_process_events() is called repeatedly to check whether any monitored event has occurred. If an HTTP request arrives at this point, an epoll event is triggered. Since the handler of each listening port was previously set to ngx_event_accept(), ngx_epoll_process_events() calls rev->handler(rev), which calls ngx_event_accept(). In that function accept() is called to receive the request, a new connection is allocated and initialized, and the listening socket's handler, ls->handler(c), is called. Because ls->handler was already set in http_block() (after the configuration was read) to ngx_http_init_connection (ls->handler = ngx_http_init_connection;), ngx_http_init_connection() is called. Inside this function a read event is added and its handler is set to ngx_http_init_request().

3. When epoll triggers a new event, ngx_http_init_request() is called and each step of HTTP request processing continues (process request line, process headers, the various phases, and so on).

4. Finally, the client closes the connection: ngx_http_finalize_request() => ngx_http_finalize_connection() => ngx_http_set_keepalive(). ngx_http_set_keepalive() sets the event handler to ngx_http_keepalive_handler() and calls ngx_post_event() to add the event to the ngx_posted_events queue; ngx_event_process_posted() then processes and removes the queued events one by one. Inside ngx_http_keepalive_handler(), ngx_http_close_connection() => ngx_close_connection() => ngx_del_conn(c, NGX_CLOSE_EVENT) is called. ngx_del_conn(), i.e. ngx_epoll_del_connection(), removes the connection that handled the request from the list of events epoll listens for.


Multi-process (I set worker_processes 2): Unlike the single-process case, where the epoll timeout is set to -1 so that the process blocks until the listening port receives a request when there are no events, in multi-process mode each process gives epoll_wait() a timeout, takes turns acquiring the right to accept requests on the listening port, and handles other events in the meantime; if there are none, it blocks (* until some event occurs).
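In plain epoll terms the difference boils down to the timeout passed to epoll_wait(). A small sketch of that decision (my own illustration; the parameter names echo Nginx's ngx_accept_mutex_held and ngx_accept_mutex_delay, but this is not Nginx code):

```c
/* Sketch only: choose the epoll_wait() timeout.
 * Single-process mode (or a worker holding the accept mutex) may block
 * indefinitely; a worker that failed to get the mutex caps its wait so it
 * can retry the lock soon. Names are illustrative, not Nginx's. */
static int choose_timeout(int single_process, int accept_mutex_held,
                          int pending_timer_ms,     /* -1 = no timer pending */
                          int accept_mutex_delay_ms)
{
    int timeout = pending_timer_ms;

    if (!single_process && !accept_mutex_held) {
        if (timeout < 0 || timeout > accept_mutex_delay_ms)
            timeout = accept_mutex_delay_ms;
    }
    return timeout;                                  /* pass to epoll_wait() */
}
```

A worker would then call epoll_wait(epoll_fd, events, max_events, choose_timeout(...)) on each pass through its loop.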

1. In ngx_event_process_init(), only ngx_add_channel_event() is called, to add events for the socketpairs used for inter-process communication; the HTTP listening ports are not added (this ensures that no more than one worker process accepts requests at the same time). After each process is fork()ed, the master process calls ngx_pass_open_channel() => ngx_write_channel() => sendmsg() to notify all existing processes, which triggers an event on the receiving side and calls ngx_channel_handler().

2. In ngx_process_events_and_timers(), a lock is used to synchronize the processes: ngx_trylock_accept_mutex(), which only one process can obtain. The process that holds the lock calls ngx_enable_accept_events() to add the listening port to its epoll.

3. In ngx_epoll_process_events(), ngx_locked_post_event() adds a read event to the accept queue (ngx_posted_accept_events). It is then processed inside ngx_event_process_posted(), which calls ngx_event_accept() and adds a read event. After all the accept events in the ngx_posted_accept_events queue have been processed, the ngx_accept_mutex lock is released, giving other processes the chance to accept requests.


* In multi-process mode, whenever a new child process is started, the master process broadcasts the new child's channel to the socketpair channels of all the other processes. This causes the process that currently owns the listening port (holding ngx_accept_mutex) to be woken by an epoll event and release ngx_accept_mutex, although this only happens during initialization (after which child processes generally do not communicate). So it is normally not possible for two or more processes to add the listening port to epoll at the same time. In theory, however, such a design could lead to bugs (for example, if a signal sent to the child processes makes one process give up ngx_accept_mutex, and another process acquires ngx_accept_mutex before the first has finished).


Before we get to nginx, what is the "thundering herd"? Simply put, multiple threads or processes (under Linux, threads and processes are not that different) wait on the same socket event; when the event occurs, they are all woken up at once. That is the thundering herd. As you can imagine, many processes are scheduled by the kernel to respond to the event, yet only one of them can actually handle it; the rest go back to sleep after failing (there are other options too). This wasted work is the thundering herd problem.


The thundering herd typically occurs on a server like this: the parent process binds a port and listens on a socket, then forks multiple child processes, which all start looping on the socket (for example in accept()). Whenever a user initiates a TCP connection, multiple child processes are woken up at once; one of them accepts the new connection, and the rest fail and go back to sleep.


So, can't we just have a single process accept new connections and then hand them off to the other child processes through message queues or other synchronization mechanisms, so that the thundering herd is avoided? Yes, it is avoided, but it is inefficient, because that one process can do nothing but accept connections. On a multi-core machine, having only one accepting process means the programmer has built the accept bottleneck in personally. So I still insist on having multiple processes handle accept events.


In fact, the thundering herd on the accept system call no longer exists on Linux 2.6 kernels (at least not on my 2.6.18). You can write a simple program: bind(), listen(), then fork(), with all the children calling accept() on the listening handle. When a new connection comes in, you will notice that only one child process returns from accept() with the new connection; the other children keep sleeping in accept() without being woken up.
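A test along those lines, as a hedged sketch (port 8080 and the four children are arbitrary choices of mine): connect with telnet or curl, and on a 2.6+ kernel only one child should print a line per connection.

```c
/* Minimal experiment (my own sketch): parent binds and listens, then forks
 * children that all block in accept() on the same listening socket.
 * On Linux 2.6+ kernels only one child returns per incoming connection. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void) {
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    int on = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);                        /* test port, pick any free one */
    bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr));
    listen(listen_fd, 16);

    for (int i = 0; i < 4; i++) {
        if (fork() == 0) {
            for (;;) {
                int conn = accept(listen_fd, NULL, NULL);   /* all children block here */
                if (conn >= 0) {
                    printf("child %d (pid %d) accepted a connection\n", i, getpid());
                    close(conn);
                }
            }
        }
    }
    for (;;) pause();                                   /* parent just keeps the children alive */
}
```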


Unfortunately, our programs are usually not that simple and do not block in accept(). We have many other network read and write events to handle, and under Linux we like to use epoll with non-blocking sockets. So even though accept() itself no longer has a thundering herd, we still have to deal with one, because epoll does have this problem: if we call epoll_wait() instead of blocking in accept(), we will see that when a new connection arrives, all the child processes wake up from epoll_wait().


All Nginx worker processes use epoll_wait() to handle new events (on Linux). Without any protection, when a new connection arrives, multiple worker processes wake up from epoll_wait() and all but one fail to accept it. Now let's look at how Nginx handles this thundering herd problem.


Each Nginx worker process handles events in the function ngx_process_events_and_timers(); there, (void) ngx_process_events(cycle, timer, flags); encapsulates the different event-handling mechanisms, and on Linux it wraps the epoll_wait() call by default. Let's take a look at what ngx_process_events_and_timers() does to resolve the thundering herd:


```c
void
ngx_process_events_and_timers(ngx_cycle_t *cycle)
{
    ...
    /* ngx_use_accept_mutex indicates whether accept needs to be protected
       by a lock to resolve the thundering herd problem. */
    if (ngx_use_accept_mutex) {

        /* ngx_accept_disabled > 0 means this worker is fully loaded and will
           not take new connections. nginx.conf sets the maximum number of
           connections each worker may handle; once 7/8 of that is used,
           ngx_accept_disabled becomes positive, meaning the worker is very
           busy and skips new connections for a while. */
        if (ngx_accept_disabled > 0) {
            ngx_accept_disabled--;

        } else {
            /* Try to grab the accept lock; only one of the workers can get it.
               The call does not block but returns immediately, setting
               ngx_accept_mutex_held to 1 on success. The holder adds the
               listening handle to its epoll; the others remove it. */
            if (ngx_trylock_accept_mutex(cycle) == NGX_ERROR) {
                return;
            }

            if (ngx_accept_mutex_held) {
                /* NGX_POST_EVENTS means events found in ngx_process_events are
                   queued for later: accept events go to ngx_posted_accept_events,
                   EPOLLIN/EPOLLOUT events go to ngx_posted_events. */
                flags |= NGX_POST_EVENTS;

            } else {
                /* The lock was not obtained, so the listening handle will not
                   be processed. This timer is really the epoll_wait timeout;
                   cap it at ngx_accept_mutex_delay so that a new connection is
                   not left unhandled for too long. */
                if (timer == NGX_TIMER_INFINITE
                    || timer > ngx_accept_mutex_delay)
                {
                    timer = ngx_accept_mutex_delay;
                }
            }
        }
    }
    ...

    /* On Linux this calls ngx_epoll_process_events() to start processing. */
    (void) ngx_process_events(cycle, timer, flags);
    ...

    /* If the ngx_posted_accept_events list has entries, process the accept
       events first. */
    if (ngx_posted_accept_events) {
        ngx_event_process_posted(cycle, &ngx_posted_accept_events);
    }

    /* The accept events are done, so release the accept lock. */
    if (ngx_accept_mutex_held) {
        ngx_shmtx_unlock(&ngx_accept_mutex);
    }

    if (delta) {
        ngx_event_expire_timers();
    }

    ngx_log_debug1(NGX_LOG_DEBUG_EVENT, cycle->log, 0,
                   "posted events %p", ngx_posted_events);

    /* Normal read/write requests take longer, so with NGX_POST_EVENTS set
       they were queued in ngx_posted_events by ngx_process_events and are
       handled only now, after the lock has been released. */
    if (ngx_posted_events) {
        if (ngx_threaded) {
            ngx_wakeup_worker_thread(cycle);

        } else {
            ngx_event_process_posted(cycle, &ngx_posted_events);
        }
    }
}
```
