1. The epoll concept

Poll system calls mainly solve the limitation of the number of file descriptors compared to SELECT, but do not solve the fundamental problem in high concurrency scenarios:

  1. The whole fd array is copied between kernel space and user space
  2. Traversing the entire FD array for events wastes resources

Both performance issues were addressed in Banga’s 1999 paper A Scalable and Explicit Event Delivery Mechanism for UNIX, which proposed that both SELECT and poll were stateless, Processes that require user space to traverse for events themselves, and an improvement is to maintain the event collection internally. With a system call like declare_interest, the kernel can incrementally update the list of events the process is interested in. The application process can dispatch new events to the kernel by using the get_next_event call.

According to the research results of the paper, LINUX and FreeBSD provide their own solutions: epoll and Kqueue. We’ll focus on epoll, since the everyday server environment is LINUX.

Epoll is not supported until the LINUX kernel 2.6 onwards.

2. Epoll function

The epoll operation has three functions

#include <sys/epoll.h>

int epoll_create(int size);

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);
Copy the code
  1. epoll_createTo create aepoll fdAfter LINUX2.6.8, red and black trees are used to manage epoll event tables, so size does not have much effect
  2. epoll_ctlYou can add socket read and write events to the epoll event table created above
  3. epoll_waitSimilar to what we had beforeselectandpollTo get the event that occurred, such as whether the socket can be read or written

The second argument to epoll_ctl uses three macros to represent the action:

  • EPOLL_CTL_ADD: Registers a new FD into an EPFD
  • EPOLL_CTL_MOD: Modification Suggestion The listening event of the registered FD
  • EPOLL_CTL_DEL: Deletes a fd from an EPFD

The fourth parameter tells the kernel what event to listen for. The epoll_event structure is as follows

struct epoll_event { __uint32_t events; // epoll event epoll_data_t data; // User data variable}Copy the code

Epoll events are similar to the previous poll events, mainly the three:

  • EPOLLIN: The corresponding FD is readable
  • EPOLLOUT: The corresponding FD is writable
  • EPOLLERR: indicates the fifth day of fd sending

3. Epoll of actual combat

First of all, MacOS is bSD-based, so there is no epoll function. It is developed on Mac and compiled on LINUX. Same old 5 steps

  1. Create a socket to listen to ports
  2. The FD bound port of the socket
  3. The socket listening
  4. Create epoll fd
  5. Issue epoll_WAIT to get the event and use epoll_ctl to manage the event
#include <sys/socket.h> #include <netinet/in.h> #include <arpa/inet.h> #include <unistd.h> #include <stdlib.h> #include <assert.h> #include <stdio.h> #include <string.h> #include <errno.h> #include <sys/epoll.h> #define MAX_FD_NUM 1024 #define MAXLEN 1024 int buf_len = 0; Int listenfd = socket(AF_INET, SOCK_STREAM, 0); If (listenfd == -1) printf(" Failed to create socket, error: %s (errno: %d)\n", strError (errno), errno); // 2. Unsigned short listenPort = 8090; struct sockaddr_in server_addr; memset(&server_addr, 0, sizeof(server_addr)); server_addr.sin_family = AF_INET; server_addr.sin_addr.s_addr = htonl(INADDR_ANY); server_addr.sin_port = htons(listenPort); int on = 1; If ((setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(int))) < 0) {exit(1); } int bindRet = bind(listenfd, (struct sockaddr *) &server_addr, sizeof(server_addr)); If (bindRet == -1) {printf(" Socket binding failed, error: %s (errno: %d)\n", strerror(errno), errno); exit(1); Int listenRet = listen(listenfd, 10); If (listenRet == -1) printf(" Socket listening on port failed, error: %s (errno: %d)\n", strError (errno), errno); Printf ("socket listening completed, address: 127.0.0.1:%d", listenPort); struct sockaddr_in client_addr; socklen_t client_addr_len = sizeof(struct sockaddr_in); // 4. Create an EPFD and register listenfd with the EPFD. int epfd = epoll_create(1024); struct epoll_event ev,events[20]; ev.data.fd = listenfd; ev.events = EPOLLIN; epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev); int cur_fd_num = 1; char buf[MAXLEN]={0}; While (1) {// 5. Call epoll_wait to fetch IO events. // nReady is the length of events array. int nready = epoll_wait(epfd, events, 20, 50); int i = 0; for (; i < nready; i++) { if (events[i].data.fd == listenfd) { int client_sockfd = accept(listenfd,(struct sockaddr*)&client_addr,&client_addr_len); if(client_sockfd < 0) { perror("accept"); } else { printf("accept client_addr %s\n",inet_ntoa(client_addr.sin_addr)); ev.data.fd = client_sockfd; ev.events=EPOLLIN; epoll_ctl(epfd, EPOLL_CTL_ADD, client_sockfd, &ev); } } else if (events[i].events & EPOLLIN) { int connfd = events[i].data.fd; int n = recv(connfd, buf, MAXLEN, 0); if(n <= 0) { if(ECONNRESET == errno) { close(connfd); epoll_ctl(epfd, EPOLL_CTL_DEL, connfd, 0); } else { perror("recv"); } } printf("receive %s", buf); buf_len = n; ev.data.fd = connfd; ev.events = EPOLLOUT; epoll_ctl(epfd, EPOLL_CTL_MOD, connfd, &ev); } else if (events[i].events & EPOLLOUT) { int connfd = events[i].data.fd; write(connfd, buf, buf_len); ev.data.fd = connfd; ev.events = EPOLLIN; epoll_ctl(epfd, EPOLL_CTL_MOD, connfd, &ev); } } } return 0; }Copy the code

Compile with GCC and run

$ gcc -o epoll ./epoll.c
$ ./epoll
Copy the code

Then use the NC test

4. Epoll summary

Epoll solves two performance problems left over from the SELECT and Poll era, eliminating the need to copy the entire FD array back and forth between kernel and user space, and eliminating the need for application processes to traverse the entire FD array to find events. Epoll uses Mmap to speed up kernel and user-space messaging, avoiding unnecessary memory copies. Epoll only returns active socket FDS, so I/O efficiency does not decrease significantly as the number of FDS increases.

Epoll also supports ET(edge triggering) and LT(horizontal triggering), which won’t be covered in detail.