1. The epoll concept
Poll system calls mainly solve the limitation of the number of file descriptors compared to SELECT, but do not solve the fundamental problem in high concurrency scenarios:
- The whole fd array is copied between kernel space and user space
- Traversing the entire FD array for events wastes resources
Both performance issues were addressed in Banga’s 1999 paper A Scalable and Explicit Event Delivery Mechanism for UNIX, which proposed that both SELECT and poll were stateless, Processes that require user space to traverse for events themselves, and an improvement is to maintain the event collection internally. With a system call like declare_interest, the kernel can incrementally update the list of events the process is interested in. The application process can dispatch new events to the kernel by using the get_next_event call.
According to the research results of the paper, LINUX and FreeBSD provide their own solutions: epoll and Kqueue. We’ll focus on epoll, since the everyday server environment is LINUX.
Epoll is not supported until the LINUX kernel 2.6 onwards.
2. Epoll function
The epoll operation has three functions
#include <sys/epoll.h>
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);
Copy the code
epoll_create
To create aepoll fd
After LINUX2.6.8, red and black trees are used to manage epoll event tables, so size does not have much effectepoll_ctl
You can add socket read and write events to the epoll event table created aboveepoll_wait
Similar to what we had beforeselect
andpoll
To get the event that occurred, such as whether the socket can be read or written
The second argument to epoll_ctl uses three macros to represent the action:
EPOLL_CTL_ADD
: Registers a new FD into an EPFDEPOLL_CTL_MOD
: Modification Suggestion The listening event of the registered FDEPOLL_CTL_DEL
: Deletes a fd from an EPFD
The fourth parameter tells the kernel what event to listen for. The epoll_event structure is as follows
struct epoll_event { __uint32_t events; // epoll event epoll_data_t data; // User data variable}Copy the code
Epoll events are similar to the previous poll events, mainly the three:
- EPOLLIN: The corresponding FD is readable
- EPOLLOUT: The corresponding FD is writable
- EPOLLERR: indicates the fifth day of fd sending
3. Epoll of actual combat
First of all, MacOS is bSD-based, so there is no epoll function. It is developed on Mac and compiled on LINUX. Same old 5 steps
- Create a socket to listen to ports
- The FD bound port of the socket
- The socket listening
- Create epoll fd
- Issue epoll_WAIT to get the event and use epoll_ctl to manage the event
#include <sys/socket.h> #include <netinet/in.h> #include <arpa/inet.h> #include <unistd.h> #include <stdlib.h> #include <assert.h> #include <stdio.h> #include <string.h> #include <errno.h> #include <sys/epoll.h> #define MAX_FD_NUM 1024 #define MAXLEN 1024 int buf_len = 0; Int listenfd = socket(AF_INET, SOCK_STREAM, 0); If (listenfd == -1) printf(" Failed to create socket, error: %s (errno: %d)\n", strError (errno), errno); // 2. Unsigned short listenPort = 8090; struct sockaddr_in server_addr; memset(&server_addr, 0, sizeof(server_addr)); server_addr.sin_family = AF_INET; server_addr.sin_addr.s_addr = htonl(INADDR_ANY); server_addr.sin_port = htons(listenPort); int on = 1; If ((setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(int))) < 0) {exit(1); } int bindRet = bind(listenfd, (struct sockaddr *) &server_addr, sizeof(server_addr)); If (bindRet == -1) {printf(" Socket binding failed, error: %s (errno: %d)\n", strerror(errno), errno); exit(1); Int listenRet = listen(listenfd, 10); If (listenRet == -1) printf(" Socket listening on port failed, error: %s (errno: %d)\n", strError (errno), errno); Printf ("socket listening completed, address: 127.0.0.1:%d", listenPort); struct sockaddr_in client_addr; socklen_t client_addr_len = sizeof(struct sockaddr_in); // 4. Create an EPFD and register listenfd with the EPFD. int epfd = epoll_create(1024); struct epoll_event ev,events[20]; ev.data.fd = listenfd; ev.events = EPOLLIN; epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev); int cur_fd_num = 1; char buf[MAXLEN]={0}; While (1) {// 5. Call epoll_wait to fetch IO events. // nReady is the length of events array. int nready = epoll_wait(epfd, events, 20, 50); int i = 0; for (; i < nready; i++) { if (events[i].data.fd == listenfd) { int client_sockfd = accept(listenfd,(struct sockaddr*)&client_addr,&client_addr_len); if(client_sockfd < 0) { perror("accept"); } else { printf("accept client_addr %s\n",inet_ntoa(client_addr.sin_addr)); ev.data.fd = client_sockfd; ev.events=EPOLLIN; epoll_ctl(epfd, EPOLL_CTL_ADD, client_sockfd, &ev); } } else if (events[i].events & EPOLLIN) { int connfd = events[i].data.fd; int n = recv(connfd, buf, MAXLEN, 0); if(n <= 0) { if(ECONNRESET == errno) { close(connfd); epoll_ctl(epfd, EPOLL_CTL_DEL, connfd, 0); } else { perror("recv"); } } printf("receive %s", buf); buf_len = n; ev.data.fd = connfd; ev.events = EPOLLOUT; epoll_ctl(epfd, EPOLL_CTL_MOD, connfd, &ev); } else if (events[i].events & EPOLLOUT) { int connfd = events[i].data.fd; write(connfd, buf, buf_len); ev.data.fd = connfd; ev.events = EPOLLIN; epoll_ctl(epfd, EPOLL_CTL_MOD, connfd, &ev); } } } return 0; }Copy the code
Compile with GCC and run
$ gcc -o epoll ./epoll.c
$ ./epoll
Copy the code
Then use the NC test
4. Epoll summary
Epoll solves two performance problems left over from the SELECT and Poll era, eliminating the need to copy the entire FD array back and forth between kernel and user space, and eliminating the need for application processes to traverse the entire FD array to find events. Epoll uses Mmap to speed up kernel and user-space messaging, avoiding unnecessary memory copies. Epoll only returns active socket FDS, so I/O efficiency does not decrease significantly as the number of FDS increases.
Epoll also supports ET(edge triggering) and LT(horizontal triggering), which won’t be covered in detail.