Recently, I have been interested in the Nginx source code. With the help of the powerful VS Code, I have started to explore Nginx step by step. For more information on how VS Code can debug Nginx, see VS Code Debugging Nginx Easily.


In fact, Nginx needs little introduction. A well-known high-performance server, it is used by most Internet companies; Alibaba's Tengine, for example, is developed on top of Nginx.

Nginx is mainly used for load balancing, reverse proxying, and dynamic/static content separation. Most companies today use Nginx as a load balancer, and the most basic requirement for a load balancer is to support high concurrency; after all, every request is forwarded through it.

So why can Nginx handle such high concurrency? That is what I am interested in, and that is what this article is about. But given the title, "Getting Your Hands Dirty with Nginx Multi-process Architecture", is this just a simple source code analysis?

While researching the Nginx process model these days, I was often lost in Nginx's complex source code, unable to untangle it. I consulted some articles and books, but never felt I had reached the essence: everything seemed understood, yet the specific flow and details stayed vague. So I took advantage of the weekend and spent half a day combing through the Nginx multi-process event source code again, then modeled a simple server on it. The code and features are very basic, but that makes it just right for understanding Nginx without wandering into the jungle and losing your way.

2. Traditional Web Server Architecture

Let’s think about what you would do if you were to build a Web server.

Step one, listen on the port

Step two, process the request

Listening on a port is simple, but how do you handle requests? Remember the chat room assignments from when we first learned C in university? Back then I completed mine entirely with Baidu's help: open a listening port, receive requests in a loop, and spawn a new thread for each request received.
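Here is a minimal sketch of that homework-style server (my own illustration, not taken from any real project): accept connections in a loop and spawn one thread per connection, each thread simply echoing data back.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// One thread per client: read whatever arrives and echo it back
static void *handleClient(void *arg) {
    int clientSock = (int) (intptr_t) arg;
    char buf[256];
    ssize_t n;
    while ((n = read(clientSock, buf, sizeof(buf))) > 0) {
        write(clientSock, buf, n);
    }
    close(clientSock);
    return NULL;
}

int main() {
    int serverSock = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9999);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    bind(serverSock, (struct sockaddr *) &addr, sizeof(addr));
    listen(serverSock, 20);

    for (;;) {
        int clientSock = accept(serverSock, NULL, NULL);  // block until a client connects
        if (clientSock == -1) {
            continue;
        }
        pthread_t tid;  // a brand-new thread for every single request
        pthread_create(&tid, NULL, handleClient, (void *) (intptr_t) clientSock);
        pthread_detach(tid);
    }
}
```

Every connection costs a full thread, which is exactly the problem discussed next.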


This certainly works, it is very simple, and it fully satisfied my homework requirements at the time. In fact, many Web servers, such as Tomcat, do the same thing, allocating a separate thread for each request. So what are the downsides?

The most immediate drawback is that too many threads are created, forcing the CPU to constantly switch contexts between them. Each task switch requires the CPU to save some context (such as register values) for the previous task and then load the context of the new task, which is not a small overhead.

The second drawback is reduced CPU utilization. Consider a single thread: while it waits on network I/O it is blocked, and the CPU sits idle. The CPU never gets fully used, which is simply a waste!

This architecture makes Web servers inherently incapable of carrying high concurrency!

3. Nginx Multi-process Architecture

Nginx can support high concurrency precisely because it abandons the thread-per-request architecture of traditional Web servers and takes full advantage of the CPU.

Nginx adopts a single-master, multi-worker architecture. As the name implies, the Master is the boss, while the Worker is the real working class.

Let’s take a look at the general architecture of how Nginx receives requests.


At first glance, it looks like a traditional Web server, except that the Thread on the right has become a Worker. And that is exactly where the beauty of Nginx lies.

After the Master process starts, N Worker processes will be forked. N is configurable. Generally speaking, it can be set to the number of server cores.
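For reference, this is controlled by the worker_processes directive in nginx.conf; setting it to auto lets Nginx match the number of CPU cores:

```nginx
# nginx.conf: spawn one Worker process per CPU core
worker_processes  auto;
```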

Each Worker process listens for requests from clients and processes them. Unlike a traditional Web server, a Worker does not allocate a separate thread to each request; instead it makes full use of asynchronous I/O.

If you are not yet familiar with asynchronous I/O, now is the time to catch up. Looper in Android and Netty, the famous open source library in Java, are both built on asynchronous I/O. The biggest difference between asynchronous and synchronous I/O is that the process is not blocked while an I/O operation is in flight: it can work on other tasks, and the operating system actively notifies it when the I/O is ready.

Nginx uses exactly this idea. Although there are many requests to handle at the same time, there is no need to allocate a thread to each one: whichever request's I/O is ready gets processed next. Why create a thread that just sits around waiting?

To use an imperfect analogy: the server is like a school, clients are like students, and a student who has a question asks a teacher.

  • In the traditional Web server model, the school assigns a dedicated teacher to every student. A school may have thousands of students; hiring thousands of teachers would leave the school leadership unable to even pay the wages. And come to think of it, no student asks questions all the time, so what do the teachers do while the students rest? They certainly don't work for free.
  • With Nginx, teachers never get a chance to be idle. The school sets up a few offices and hires a few teachers; whenever a student raises a question, one of the teachers is assigned to answer it.

Some readers may wonder: what if one student keeps hogging a teacher? Wouldn't that teacher never get to answer other students' questions? If Nginx were responsible for the business processing itself, this architecture could indeed run into that problem. But remember, Nginx is primarily used for load balancing: when it receives a request, its main task is to forward it, handing the business processing off to other servers. As long as both receiving and forwarding use asynchronous I/O, no single connection can hog a Worker.

4. Source Code Analysis

The following analysis is based on the latest version, 1.15.5.

4.1 Overall Operation Mechanism

It all starts with main().

There is a lot of logic in Nginx's main() function, but two things stand out for what I want to talk about today:

  1. Create a socket, listen on the port;
  2. Fork N Worker processes.

There’s not much logic in listening on ports, so let’s look at the birth of Worker processes:

```c
static void
ngx_start_worker_processes(ngx_cycle_t *cycle, ngx_int_t n, ngx_int_t type)
{
    ngx_int_t      i;
    ngx_channel_t  ch;

    ...

    for (i = 0; i < n; i++) {

        ngx_spawn_process(cycle, ngx_worker_process_cycle,
                          (void *) (intptr_t) i, "worker process", type);
        ...
    }
}
```

The Worker process is created by ngx_spawn_process(). The second parameter, ngx_worker_process_cycle, is the new starting point of the child process.

```c
static void
ngx_worker_process_cycle(ngx_cycle_t *cycle, void *data)
{
    ......

    for ( ;; ) {
        ...
        ngx_process_events_and_timers(cycle);
        ...
    }
}
```

The code above omits some logic, leaving only the core. As its name suggests, ngx_worker_process_cycle() starts an endless loop that keeps calling ngx_process_events_and_timers().

```c
void
ngx_process_events_and_timers(ngx_cycle_t *cycle)
{
    ......

    if (ngx_use_accept_mutex) {
        if (ngx_accept_disabled > 0) {
            ngx_accept_disabled--;

        } else {
            if (ngx_trylock_accept_mutex(cycle) == NGX_ERROR) {
                return;
            }
            ...
        }
    }

    ...

    (void) ngx_process_events(cycle, timer, flags);

    ...
}
```

ngx_process_events() is finally called to receive and process events.

ngx_process_events() points to a different asynchronous I/O module on each platform: on Linux it is epoll, while on macOS it points to ngx_kqueue_process_events() in the kqueue module.
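Under the hood, ngx_process_events is not a function at all but a macro that dispatches through a table of callbacks installed by whichever event module is active. A simplified paraphrase of src/event/ngx_event.h:

```c
/* Each event module (epoll, kqueue, select, ...) fills in this table of
 * callbacks at initialization time; other members are omitted here. */
typedef struct {
    ngx_int_t  (*process_events)(ngx_cycle_t *cycle, ngx_msec_t timer,
                                 ngx_uint_t flags);
    /* ... add/del/enable/disable, notify, init, done ... */
} ngx_event_actions_t;

extern ngx_event_actions_t   ngx_event_actions;

#define ngx_process_events   ngx_event_actions.process_events
```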

```c
static ngx_int_t
ngx_kqueue_process_events(ngx_cycle_t *cycle, ngx_msec_t timer,
    ngx_uint_t flags)
{
    int              events, n;
    ngx_int_t        i, instance;
    ngx_uint_t       level;
    ngx_err_t        err;
    ngx_event_t     *ev;
    ngx_queue_t     *queue;
    struct timespec  ts, *tp;

    n = (int) nchanges;
    nchanges = 0;

    ...

    events = kevent(ngx_kqueue, change_list, n, event_list, (int) nevents, tp);

    ...

    for (i = 0; i < events; i++) {

        ......

        ev = (ngx_event_t *) event_list[i].udata;

        switch (event_list[i].filter) {

        case EVFILT_READ:
        case EVFILT_WRITE:
            ......
            break;

        case EVFILT_VNODE:
            ev->kq_vnode = 1;
            break;

        case EVFILT_AIO:
            ev->complete = 1;
            ev->ready = 1;
            break;
        ...
        }

        ...

        ev->handler(ev);
    }

    return NGX_OK;
}
```

This is actually fairly basic kqueue usage. At this point, we have to talk about how kqueue is used.

kqueue relies on two APIs:

```c
// Create a kernel event queue and return its descriptor
int kqueue(void);

// Register/unregister events to listen for, and wait for event notifications
// kq         - the queue descriptor created above
// changelist - the events to register
// nchanges   - size of the changelist array
// eventlist  - output buffer for the events the kernel returns
// nevents    - size of the eventlist array
// timeout    - how long to wait for the kernel to return events; NULL blocks indefinitely
int kevent(int kq, const struct kevent *changelist, int nchanges,
           struct kevent *eventlist, int nevents,
           const struct timespec *timeout);
```
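To make the API concrete, here is a tiny self-contained example of my own (not from Nginx) that uses kqueue to wait for stdin to become readable:

```cpp
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    int kq = kqueue();  // create the kernel event queue

    // Register interest: notify us when stdin (fd 0) becomes readable
    struct kevent change;
    EV_SET(&change, STDIN_FILENO, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, NULL);

    // Block until the event fires; the kernel fills in `event`
    struct kevent event;
    int n = kevent(kq, &change, 1, &event, 1, NULL);

    if (n > 0 && event.filter == EVFILT_READ) {
        // For EVFILT_READ, event.data reports the number of bytes available
        printf("stdin is readable, %ld bytes pending\n", (long) event.data);
    }

    close(kq);
    return 0;
}
```

Type a line into the terminal and kevent() returns immediately. This one pattern (register, wait, handle) is all Nginx's kqueue module does, just at scale.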

Looking back at the ngx_kqueue_process_events() code above: kevent() is called to wait for the kernel to return events, which are then processed. Here the events handled are mainly ACCEPT, READ, WRITE, and so on.

Therefore, on the whole, the Nginx event module works like this: each Worker process sits in an endless loop, waiting for the kernel event queue to return event messages and processing them.

4.2 The Thundering Herd Problem

So far we have been talking about the running mechanism of a single Worker process. Is there any interaction between Worker processes?

Returning to ngx_process_events_and_timers() above: each time before calling ngx_process_events() to wait for events, the Worker process performs ngx_trylock_accept_mutex(). This is multiple Worker processes competing for the right to listen, and it is the solution Nginx designed for the thundering herd problem.

The so-called thundering herd problem: if multiple Worker processes listen for kernel events at the same time, then when a request arrives every Worker is woken up to accept the same connection, but only one of them accepts successfully; the others fail and were woken up for nothing. It is like being dragged out of bed in the middle of the night for something that has nothing to do with you.

To solve this, Nginx makes each Worker process compete for a lock before listening for kernel events. Only the process that obtains the lock may listen for kernel events; the other processes wait obediently on the lock. When the lock holder finishes processing the accept event, it releases the lock, and all processes compete for it again.

To prevent the same process from grabbing the lock every time, Nginx uses a small algorithm: a factor called ngx_accept_disabled evens out each process's chance of acquiring the lock. Interested readers can check the source; a simplified version follows.
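From my reading of the 1.15.x source, the factor works roughly like this (a simplified paraphrase; see src/event/ngx_event_accept.c for the real thing):

```c
/* Simplified paraphrase of src/event/ngx_event_accept.c: after each accept,
 * recompute how "full" this Worker is. The value turns positive once fewer
 * than 1/8 of the Worker's connections remain free. */
ngx_accept_disabled = ngx_cycle->connection_n / 8
                      - ngx_cycle->free_connection_n;

/* In ngx_process_events_and_timers() (quoted earlier), a "full" Worker then
 * skips the competition for the accept mutex and just decrements the counter,
 * giving less-loaded Workers a better chance to grab the lock. */
```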

5. Building an Nginx-style Multi-process Architecture

Finally, the DIY part. I developed this demo on macOS, and the asynchronous I/O facility is the kqueue described above.

5.1 Creating a Process Lock for the Right to Listen for Events

```cpp
// Place the mutex in shared anonymous memory so all forked children can see it
mm = (mt *) mmap(NULL, sizeof(*mm), PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANON, -1, 0);
memset(mm, 0x00, sizeof(*mm));
pthread_mutexattr_init(&mm->mutexattr);
// Mark the mutex as shared between processes, not just between threads
pthread_mutexattr_setpshared(&mm->mutexattr, PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&mm->mutex, &mm->mutexattr);
```
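The snippet assumes a small struct named mt and a global pointer mm that are not shown above; here is a sketch of what they presumably look like:

```cpp
#include <pthread.h>

// Hypothetical definitions assumed by the snippet above
typedef struct {
    pthread_mutex_t     mutex;      // the cross-process lock itself
    pthread_mutexattr_t mutexattr;  // its attributes (PTHREAD_PROCESS_SHARED)
} mt;

static mt *mm;  // lives in MAP_SHARED | MAP_ANON memory, so it survives fork()
```

Putting the mutex in MAP_SHARED anonymous memory is what lets all the forked Workers lock and unlock the very same mutex.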

5.2 Creating a Socket and Listening on a Port

```cpp
int serverSock = socket(AF_INET, SOCK_STREAM, 0);
if (serverSock == -1) {
    printf("socket failed\n");
    exit(0);
}

struct sockaddr_in server_addr;
server_addr.sin_family = AF_INET;
server_addr.sin_port = htons(9999);
server_addr.sin_addr.s_addr = inet_addr("127.0.0.1");

if (::bind(serverSock, (struct sockaddr *) &server_addr,
           sizeof(server_addr)) == -1) {
    printf("bind failed\n");
    exit(0);
}

if (listen(serverSock, 20) == -1) {
    printf("listen failed\n");
    exit(0);
}
```

5.3 Creating Multiple Worker Processes

```cpp
// Fork three Worker processes
int result;
for (int i = 1; i <= 3; i++) {
    result = fork();
    if (result == 0) {
        // Child process: become a Worker and never return
        printf("start worker %d\n", i);
        startWorker(i, serverSock);
        break;
    }
}
```

5.4 Starting Worker Processes and Asynchronously Listening for I/O Events

```cpp
void startWorker(int workerId, int serverSock) {
    int kqueuefd = kqueue();

    struct kevent change_list[1];   // events to register
    struct kevent event_list[1];    // buffer for events returned by the kernel

    // Register a READ event on the listening socket to accept new connections
    EV_SET(&change_list[0], serverSock, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, 0);

    while (true) {
        // Compete for the accept lock before listening for kernel events
        pthread_mutex_lock(&mm->mutex);
        printf("Worker %d get the lock\n", workerId);

        // Wait for the kernel to return ready events
        int nevents = kevent(kqueuefd, change_list, 1, event_list, 1, NULL);

        pthread_mutex_unlock(&mm->mutex);

        // Loop over all ready events
        for (int i = 0; i < nevents; i++) {
            struct kevent event = event_list[i];
            if (event.ident == (uintptr_t) serverSock) {
                // ACCEPT event: a new connection has arrived
                handleNewConnection(kqueuefd, serverSock);
            } else if (event.filter == EVFILT_READ) {
                // READ event: data from an existing client
                char *msg = handleReadFromClient(workerId, event);
                handleWriteToClient(workerId, event, msg);
            }
        }
    }
}
```
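handleNewConnection() is referenced above but not shown (see the repo for the real version); a minimal sketch of what it presumably does: accept the connection and register the new client socket with the same kqueue for READ events.

```cpp
// Hypothetical sketch of the helper: accept the client, then watch it for READs
void handleNewConnection(int kqueuefd, int serverSock) {
    struct sockaddr_in client_addr;
    socklen_t len = sizeof(client_addr);

    int clientSock = accept(serverSock, (struct sockaddr *) &client_addr, &len);
    if (clientSock == -1) {
        return;
    }

    // Register the client socket with the Worker's kqueue so that incoming
    // data shows up as an EVFILT_READ event in the main loop
    struct kevent ev;
    EV_SET(&ev, clientSock, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, NULL);
    kevent(kqueuefd, &ev, 1, NULL, 0, NULL);
}
```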

5.5 Starting Multiple Client Processes for Testing

```cpp
void startClientId(int clientId) {
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in serv_addr;
    serv_addr.sin_family = AF_INET;                      // IPv4
    serv_addr.sin_addr.s_addr = inet_addr("127.0.0.1");  // server address
    serv_addr.sin_port = htons(9999);                    // server port

    connect(sock, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    while (true) {
        // Send data to the server
        string s = "I am Client ";
        s.append(to_string(clientId));
        char str[60];
        strcpy(str, s.c_str());
        write(sock, str, strlen(str));

        // Read the server's reply (null-terminate before printing)
        char buffer[60];
        ssize_t n = read(sock, buffer, sizeof(buffer) - 1);
        if (n > 0) {
            buffer[n] = '\0';
            printf("Client %d receive : %s\n", clientId, buffer);
        }
        sleep(9);
    }
}
```
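The original test driver is not shown; a hypothetical one just forks a few client processes:

```cpp
// Hypothetical test driver: fork three client processes
int main() {
    for (int i = 1; i <= 3; i++) {
        if (fork() == 0) {
            startClientId(i);  // each child loops forever talking to the server
            exit(0);
        }
    }
    while (true) {
        sleep(60);  // keep the parent alive so the children keep running
    }
}
```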

Running results:


Ha ha, it basically meets my requirements.

The demo source is available at:

HalfStackDeveloper/LearnNginx

6. Summary

Nginx's high concurrency capability comes from its unique architectural design; the multi-process model and asynchronous I/O are both integral parts of it. Studying the Nginx source is very interesting, but reading source code and writing code are two different things: reading only gives you the outline. Only by getting your hands dirty yourself can you truly understand and apply it!