Basic principles of IO read and write
User programs rely on the Linux kernel's read() and write() system calls to perform IO operations.
A call to read() does not read data directly from the network card into user memory; it copies data from the kernel buffer into the user buffer.
A call to write() does not write data directly to the network card; it copies data from the user buffer into the kernel buffer.
Moving data between the kernel buffer and the hardware is the job of the operating system kernel.
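As a concrete illustration (not from the original text; echo_once() and its socket argument are hypothetical), the sketch below shows these two copies on an already-connected socket: read() copies from the kernel buffer into the user buffer, and write() copies from the user buffer back into the kernel buffer.
#include <unistd.h>

// Minimal sketch: echo whatever arrives on an already-connected socket fd.
// The kernel has already moved the bytes from the network card into the
// kernel buffer; read() only copies them into buf, and write() only copies
// buf back into the kernel buffer, from which the kernel transmits it.
ssize_t echo_once(int sockfd) {
    char buf[4096];                               // user buffer
    ssize_t n = read(sockfd, buf, sizeof(buf));   // kernel buffer -> user buffer
    if (n > 0) {
        n = write(sockfd, buf, (size_t)n);        // user buffer -> kernel buffer
    }
    return n;
}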
Blocking IO and non-blocking IO
The network adapter delivers incoming data into the kernel buffer. If a user process issues a read while the kernel buffer is not yet ready, a blocking IO call suspends the process until the data in the kernel buffer is complete; a non-blocking call returns immediately instead.
Regardless of which mode is used, the copy between the kernel buffer and the user buffer is always a blocking operation.
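One common way to switch between the two modes is the O_NONBLOCK flag; the helper below is a sketch under that assumption (set_nonblocking() is an illustrative name, not from the original).
#include <fcntl.h>

// Sketch: put a descriptor into non-blocking mode with fcntl().
// In blocking mode a read() sleeps until the kernel buffer has data;
// in non-blocking mode it returns -1 with errno EAGAIN/EWOULDBLOCK instead.
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}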
Synchronous and Asynchronous
Synchronous: the caller actively initiates the request and actively waits for the result; once the call is made, it does not return until a result is available.
Asynchronous: the call returns immediately without a result; the caller is later informed of completion through a callback, a notification, or a similar mechanism.
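To make the distinction concrete, here is a small sketch (compiled with -pthread) that contrasts a synchronous call with an asynchronous one reporting completion through a callback; do_work(), on_done(), and call_async() are hypothetical names used only for illustration.
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

// Hypothetical unit of work used only for illustration.
static int do_work(void) { sleep(1); return 42; }

// Synchronous: the caller waits until the result is available.
static void call_sync(void) {
    int result = do_work();                      // blocks here until done
    printf("sync result=%d\n", result);
}

// Asynchronous: the call returns at once; a callback delivers the result later.
typedef void (*done_cb)(int result);

struct async_req { done_cb cb; };

static void *worker(void *arg) {
    struct async_req *req = arg;
    req->cb(do_work());                          // notify the caller when done
    return NULL;
}

static void on_done(int result) { printf("async result=%d\n", result); }

static void call_async(struct async_req *req) {
    pthread_t tid;
    pthread_create(&tid, NULL, worker, req);     // returns immediately
    pthread_detach(tid);
}

int main(void) {
    call_sync();
    struct async_req req = { on_done };
    call_async(&req);
    sleep(2);                                    // give the callback time to fire
    return 0;
}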
Synchronous blocking IO
Read data flow
- The user process calls the read() system call and is blocked
- The kernel receives the read() call and the network adapter begins receiving data; at first the kernel buffer is empty, so the kernel waits for data to arrive while the user process stays blocked
- Once the kernel buffer holds complete data, the kernel copies it from the kernel buffer into the user buffer
- Only when the data is in the user buffer does the user process unblock and continue
Underlying implementation of synchronous blocking IO
// Create a socket
int listenfd = socket(AF_INET, SOCK_STREAM, 0);
// Bind the address
bind(listenfd, (struct sockaddr*)&my_addr, sizeof(my_addr));
// Listen
listen(listenfd, 5);
// Accept a client connection (blocks until one arrives)
int socketFd = accept(listenfd, (struct sockaddr*)&clientaddr, &clientaddrlen);
// Receive the client data (blocks until data arrives)
recv(socketFd, buf, 256, 0);
Advantages and disadvantages of synchronous blocking IO
Advantages:
- Development is simple: since accept() and recv() both block, each new connection can simply be handed to its own thread in order to serve multiple clients (see the sketch after this list)
- While blocked, a thread is suspended and consumes no CPU
Disadvantages:
- Every new IO request needs a new thread, so under high concurrency the system overhead is large and thread context switching becomes frequent
- Creating too many threads consumes too much memory
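As a rough sketch of the thread-per-connection approach mentioned in the advantages above (error handling omitted; handle_client() and accept_loop() are illustrative names, not from the original):
#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

// Per-connection handler: each accepted socket gets its own thread,
// so a blocking recv() only stalls that one thread.
static void *handle_client(void *arg) {
    int fd = *(int *)arg;
    free(arg);
    char buf[256];
    while (recv(fd, buf, sizeof(buf), 0) > 0) {
        // process the request ...
    }
    close(fd);
    return NULL;
}

// Accept loop: spawn one new thread per connection.
static void accept_loop(int listenfd) {
    while (1) {
        int *fd = malloc(sizeof(int));
        *fd = accept(listenfd, NULL, NULL);
        pthread_t tid;
        pthread_create(&tid, NULL, handle_client, fd);
        pthread_detach(tid);
    }
}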
Thinking about the disadvantages of synchronous blocking IO
Since accept() and recv() both block, a system that wants to handle many IO requests has to keep creating threads. How can this problem be solved?
If accept() and recv() could be made non-blocking, we could avoid creating so many threads. This leads us to synchronous non-blocking IO.
Synchronous non-blocking IO
Read data flow
- The user process calls the read() function; the kernel receives the read() system call and the network adapter starts preparing to receive data
- If the kernel buffer data is not ready, the call returns immediately, and the user process keeps retrying to check whether the kernel buffer data is ready
- Once the kernel buffer data is ready, the user process blocks and the kernel starts copying the data from the kernel buffer to the user buffer
- After the copy is complete, the user process is unblocked and continues processing the data
Underlying implementation of synchronous non-blocking IO
// Create a socket
int listenfd = socket(AF_INET, SOCK_STREAM, 0);
// Bind the address
bind(listenfd, (struct sockaddr*)&my_addr, sizeof(my_addr));
// Listen
listen(listenfd, 5);
// Set the listening socket to non-blocking
int on = 1;
ioctl(listenfd, FIONBIO, &on);
// Accept a client connection
int socketFd = accept(listenfd, (struct sockaddr*)&clientaddr, &clientaddrlen);
// Set the client socket to non-blocking
ioctl(socketFd, FIONBIO, &on);
while (1) {
    int i;
    // Loop over every connected client descriptor (fds/nfds maintained elsewhere)
    for (i = 0; i < nfds; i++) {
        // Try to receive client data; returns immediately if none is available
        recv(fds[i], buf, 256, 0);
    }
}
Advantages and disadvantages of synchronous non-blocking IO
Advantages:
- Non-blocking: neither accept() nor recv() blocks, so the call returns to the user thread immediately
- Avoids the multithreading problem of synchronous blocking mode
Disadvantages:
- Suppose 10,000 clients are connected and only one of them is sending data. To receive that single client's message, each polling round issues 10,000 recv() system calls to the kernel, 9,999 of which are useless, wasting CPU resources
Thoughts on the drawbacks of synchronous non-blocking IO
To address the drawbacks of synchronous non-blocking IO, imagine the kernel provided a method that takes all 10,000 client socket connections at once and traverses them inside the kernel: if no socket has data, the method stays blocked; as soon as data arrives it unblocks and returns all the sockets that have data. By handing the traversal to the kernel, could we avoid the wasted polling and the 10,000 switches from user mode to kernel mode?
IO multiplexing model
What is IO multiplexing?
One thread monitors multiple IO operations
IO multiplexing principle
The IO multiplexing model is built on the select function provided by the kernel. Using select avoids the busy polling of the synchronous non-blocking model: N client socket connections are passed to the kernel in one call, which then blocks. When an event occurs on one or more of those sockets, the call unblocks and returns the list of connections with events, and the user process loops over only those connections to handle them. This avoids calling recv() over and over and switching from user mode to kernel mode for every socket.
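A compact sketch of this flow (poll_once_with_select(), fds, nfds, and maxfd are illustrative names, not part of the kernel API; the full server example follows further below):
#include <sys/select.h>
#include <sys/socket.h>

// Hand every monitored socket to the kernel in one call, block until at
// least one is readable, then recv() only from the sockets that are ready.
void poll_once_with_select(int *fds, int nfds, int maxfd) {
    fd_set rset;
    char buf[512];
    int i;
    FD_ZERO(&rset);
    for (i = 0; i < nfds; i++) {
        FD_SET(fds[i], &rset);                      // register all sockets at once
    }
    select(maxfd + 1, &rset, NULL, NULL, NULL);     // one blocking call for all of them
    for (i = 0; i < nfds; i++) {
        if (FD_ISSET(fds[i], &rset)) {              // only the ready sockets are read
            recv(fds[i], buf, sizeof(buf), 0);
        }
    }
}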
Three implementations of IO multiplexing
The select function
The select function only reports how many IO events occurred, not which sockets they occurred on, so the caller still has to scan the descriptor set to find them. The complexity is O(n): the more descriptors being monitored, the longer each scan takes.
The process of executing the select function
- The fd_set (the set of registered events) is copied from user space to kernel space
- All fds are traversed and the current process is attached to the wait queue of each fd; when a device behind one of the fds receives data, it wakes the processes sleeping on its wait queue, so the current process gets woken up
- If no IO event is found after traversing all fds, the current process goes to sleep; when an IO event occurs on some fd or the sleep times out, the process wakes up and traverses all fds again
Select function interface definition
#include <sys/select.h>
#include <sys/time.h>

// At most 1024 descriptors are supported
#define FD_SETSIZE 1024
#define NFDBITS (8 * sizeof(unsigned long))
#define __FDSET_LONGS (FD_SETSIZE / NFDBITS)

/*
 * Data structure (a bitmap):
 * fd_set holds the socket descriptors whose events are being monitored
 */
typedef struct {
    unsigned long fds_bits[__FDSET_LONGS];
} fd_set;

/*
 * select is a blocking call.
 * Returns the number of ready descriptors.
 */
int select(
    int max_fd,              // highest monitored descriptor plus 1; descriptors 0..max_fd-1 are checked
    fd_set *readset,         // read event set
    fd_set *writeset,        // write event set
    fd_set *exceptset,       // exception event set
    struct timeval *timeout  // block timeout (NULL blocks indefinitely)
);

FD_ZERO(fd_set* fds)          // clear the set
FD_SET(int fd, fd_set* fds)   // add the given descriptor to the set
FD_ISSET(int fd, fd_set* fds) // test whether the given descriptor is in the set
FD_CLR(int fd, fd_set* fds)   // remove the given descriptor from the set
Select Example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

void server() {
    // Create a socket
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in my_addr;
    bzero(&my_addr, sizeof(my_addr));
    my_addr.sin_family = AF_INET;                 // ipv4
    my_addr.sin_port = htons(9090);
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    // Bind the port
    bind(lfd, (struct sockaddr*)&my_addr, sizeof(my_addr));
    // Listen for connection requests
    listen(lfd, 128);
    printf("listen client @port=%d...\n", 9090);
    int lastfd = lfd;
    // Define the file descriptor sets
    fd_set read_fd_set, all_fd_set;
    // Add the listening socket descriptor to the set
    FD_ZERO(&all_fd_set);
    FD_SET(lfd, &all_fd_set);
    printf("Ready to enter the while loop\n");
    while (1) {
        read_fd_set = all_fd_set;
        printf("Blocking in select... lastfd=%d\n", lastfd);
        int nready = select(lastfd + 1, &read_fd_set, NULL, NULL, NULL);
        switch (nready) {
        case 0:
            printf("select time out ......\n");
            break;
        case -1:
            perror("select error");
            break;
        default:
            // A new client connection arrived on the listening socket
            if (FD_ISSET(lfd, &read_fd_set)) {
                struct sockaddr_in client_addr;
                socklen_t cliaddr_len = sizeof(client_addr);
                char cli_ip[INET_ADDRSTRLEN] = "";
                // A connection is pending, so accept() does not block here
                int clientfd = accept(lfd, (struct sockaddr*)&client_addr, &cliaddr_len);
                inet_ntop(AF_INET, &client_addr.sin_addr, cli_ip, INET_ADDRSTRLEN);
                printf("----------------------------------------------\n");
                printf("client ip=%s,port=%d\n", cli_ip, ntohs(client_addr.sin_port));
                // Add clientfd to the monitored set
                FD_SET(clientfd, &all_fd_set);
                lastfd = clientfd;
                if (0 == --nready) {
                    continue;
                }
            }
            int i;
            // Handle read events on the client sockets
            for (i = lfd + 1; i <= lastfd; i++) {
                if (FD_ISSET(i, &read_fd_set)) {
                    char recv_buf[512] = "";
                    int rs = read(i, recv_buf, sizeof(recv_buf));
                    if (rs == 0) {
                        // The client closed the connection
                        close(i);
                        FD_CLR(i, &all_fd_set);
                    } else {
                        printf("%s\n", recv_buf);
                        // Broadcast the data to every other client
                        int j;
                        for (j = lfd + 1; j <= lastfd; j++) {
                            if (j != i) {
                                write(j, recv_buf, strlen(recv_buf));
                            }
                        }
                    }
                }
            }
        }
    }
}

int main() {
    server();
    return 0;
}
Disadvantages of the select function
- The number of fds a single process can monitor is limited by FD_SETSIZE, which defaults to 1024
- Every call to select copies the fd set from user space to kernel space, which is costly when there are many fds
- Every call to select has to add the process to the wait queue of every monitored socket, and every wake-up has to remove it from each of those queues
- The fd set is modified in place by select, so it must be rebuilt before every call, which is cumbersome and hurts performance (see the sketch after this list)
- After the process is woken up, the program does not know which sockets received data and has to iterate over the whole set again
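The reset requirement above shows up in practice as keeping a master set and re-copying it before every call — a minimal sketch (select_loop() and all_fds are illustrative names):
#include <stddef.h>
#include <sys/select.h>

// select() overwrites the set it is given, so the caller keeps a master
// copy and rebuilds the working set on every iteration.
void select_loop(int maxfd, fd_set *all_fds) {
    while (1) {
        fd_set read_fds = *all_fds;                     // must be rebuilt every time
        select(maxfd + 1, &read_fds, NULL, NULL, NULL);
        // ... scan read_fds with FD_ISSET() to find the ready descriptors ...
    }
}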
poll
poll works in essentially the same way as select: it copies the array passed in by the user into kernel space and then queries the device state of every fd. Unlike select, it has no hard limit on the number of descriptors, because they are passed as a caller-sized array of pollfd structures rather than a fixed-size bitmap
Poll function interface
#include <poll.h>
// Data structure
struct pollfd {
int fd; // File descriptors to monitor
short events; // Events that require kernel monitoring
short revents; // Returned events: the bitmask of events that actually occurred (filled in by the kernel)
};
// Block method
int poll(struct pollfd fds[], // List of file descriptors to listen to
nfds_t nfds, // Number of file descriptors
int timeout // Timeout in milliseconds; -1 blocks indefinitely
);
Poll sample
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <poll.h>
#include <unistd.h>
#include <sys/time.h>

#define MAX_POLLFD_LEN 4096
#define PORT 9108

void server() {
    // Create a socket
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in my_addr;
    bzero(&my_addr, sizeof(my_addr));
    my_addr.sin_family = AF_INET;                 // ipv4
    my_addr.sin_port = htons(PORT);
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    // Bind the port
    bind(lfd, (struct sockaddr*)&my_addr, sizeof(my_addr));
    // Listen for connection requests
    listen(lfd, 128);
    printf("listen client @port=%d...\n", PORT);
    // Define the pollfd array
    struct pollfd fds[MAX_POLLFD_LEN];
    memset(fds, 0, sizeof(fds));
    // Register the listening socket
    fds[0].fd = lfd;
    fds[0].events = POLLIN;
    int i;
    for (i = 1; i < MAX_POLLFD_LEN; i++) {
        fds[i].fd = -1;                           // unused slot
    }
    int maxFds = 0;                               // index of the highest used slot
    printf("Ready to enter the while loop\n");
    while (1) {
        printf("Blocking in poll [maxFds=%d]...\n", maxFds);
        int nready = poll(fds, maxFds + 1, -1);
        switch (nready) {
        case 0:
            printf("poll time out ......\n");
            break;
        case -1:
            perror("poll error");
            break;
        default:
            // A new client connection arrived on the listening socket
            if (fds[0].revents & POLLIN) {
                struct sockaddr_in client_addr;
                socklen_t cliaddr_len = sizeof(client_addr);
                char cli_ip[INET_ADDRSTRLEN] = "";
                // A connection is pending, so accept() does not block here
                int clientfd = accept(lfd, (struct sockaddr*)&client_addr, &cliaddr_len);
                inet_ntop(AF_INET, &client_addr.sin_addr, cli_ip, INET_ADDRSTRLEN);
                printf("----------------------------------------------\n");
                printf("client ip=%s,port=%d\n", cli_ip, ntohs(client_addr.sin_port));
                // Put clientfd into the first free slot
                int j;
                for (j = 1; j < MAX_POLLFD_LEN; ++j) {
                    if (fds[j].fd < 0) {
                        fds[j].fd = clientfd;
                        fds[j].events = POLLIN;
                        printf("Client added successfully...\n");
                        if (j > maxFds) {
                            maxFds = j;
                        }
                        break;
                    }
                }
                if (j == MAX_POLLFD_LEN) {
                    printf("too many clients\n");
                    exit(1);
                }
                if (--nready <= 0) {
                    continue;
                }
            }
            printf("maxFds=%d\n", maxFds);
            // Handle read events on the client sockets
            for (i = 1; i <= maxFds; i++) {
                printf("i=%d\n", i);
                if (fds[i].revents & POLLIN) {
                    int sockfd = fds[i].fd;
                    char recv_buf[512] = "";
                    int rs = read(sockfd, recv_buf, sizeof(recv_buf));
                    if (rs == 0) {
                        // The client closed the connection; free the slot
                        close(sockfd);
                        fds[i].fd = -1;
                    } else {
                        printf("%s\n", recv_buf);
                        // Broadcast the data to every other client
                        int j;
                        for (j = 1; j <= maxFds; j++) {
                            if (j != i && fds[j].fd >= 0) {
                                write(fds[j].fd, recv_buf, strlen(recv_buf));
                            }
                        }
                    }
                }
            }
        }
    }
}

int main() {
    server();
    return 0;
}
epoll
epoll can be thought of as an event pool. Unlike the polling mechanism of select and poll, epoll is event driven: a callback function is registered on each fd.
When epoll_wait is called to check for events, the kernel only has to check whether there is an epitem in the rdlist doubly linked list of the eventpoll object. If rdlist is not empty, the events that occurred are copied to user space and the number of events is returned to the caller.
Interface definition of the epoll function
#include <sys/epoll.h>

// Data structures
// Each epoll instance has its own eventpoll structure,
// which stores the events added to it via epoll_ctl.
// epoll_wait only needs to check whether the rdlist doubly linked list
// of the eventpoll object contains any epitem elements.
struct eventpoll {
    /* The root of the red-black tree that stores all monitored events added to this epoll instance */
    struct rb_root rbr;
    /* The doubly linked list of ready events to be returned to the user by epoll_wait */
    struct list_head rdlist;
};

// API
// Create an eventpoll (ep) object inside the kernel; all sockets to be monitored are attached to it
int epoll_create(int size);

// epoll_ctl adds, modifies, or removes sockets on the kernel red-black tree
int epoll_ctl(int epfd,                  // the epoll instance created by epoll_create
              int op,                    // operation type: EPOLL_CTL_ADD / EPOLL_CTL_MOD / EPOLL_CTL_DEL
              int fd,                    // the file descriptor to operate on
              struct epoll_event *event  // the event to register
);

// epoll_wait checks the ready list and blocks the calling process if no socket is ready
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
The epoll execution flow
- Calling epoll_create() creates an eventpoll (ep) object holding the root of a red-black tree and returns a file descriptor for it
- Calling epoll_ctl() adds, removes, or modifies the events of interest on this ep object (the red-black tree)
- Calling epoll_wait() blocks; when an event occurs, the device driver invokes the callback registered on that fd, the fd is added to rdlist, and epoll_wait unblocks and returns the ready events
The sample code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/epoll.h>

void server() {
    // Create a socket
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in my_addr;
    bzero(&my_addr, sizeof(my_addr));
    my_addr.sin_family = AF_INET;                 // ipv4
    my_addr.sin_port = htons(8088);
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    // Bind the port
    bind(lfd, (struct sockaddr*)&my_addr, sizeof(my_addr));
    // Listen for connection requests
    listen(lfd, 128);
    printf("listen client @port=%d...\n", 8088);

    struct epoll_event event;
    struct epoll_event events[100];
    memset(events, 0, 100 * sizeof(struct epoll_event));

    // Create the epoll instance and register the listening socket
    int epfd = epoll_create(1);
    event.data.fd = lfd;
    event.events = EPOLLIN;
    epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &event);

    while (1) {
        printf("Blocked....\n");
        int nready = epoll_wait(epfd, events, 20, -1);
        int i;
        for (i = 0; i < nready; ++i) {
            // A new client connection arrived on the listening socket
            if (events[i].data.fd == lfd) {
                struct sockaddr_in client_addr;
                socklen_t cliaddr_len = sizeof(client_addr);
                char cli_ip[INET_ADDRSTRLEN] = "";
                // A connection is pending, so accept() does not block here
                int clientfd = accept(lfd, (struct sockaddr*)&client_addr, &cliaddr_len);
                inet_ntop(AF_INET, &client_addr.sin_addr, cli_ip, INET_ADDRSTRLEN);
                // Register the client socket (edge-triggered)
                event.data.fd = clientfd;
                event.events = EPOLLIN | EPOLLET;
                epoll_ctl(epfd, EPOLL_CTL_ADD, clientfd, &event);
                printf("----------------------------------------------\n");
                printf("client ip=%s,port=%d\n", cli_ip, ntohs(client_addr.sin_port));
            } else {
                // A client socket is readable
                char recv_buf[512] = "";
                int rs = read(events[i].data.fd, recv_buf, sizeof(recv_buf));
                if (rs <= 0) {
                    // Error or connection closed: remove it from the epoll instance
                    close(events[i].data.fd);
                    epoll_ctl(epfd, EPOLL_CTL_DEL, events[i].data.fd, &event);
                    continue;
                }
                printf("%s\n", recv_buf);
            }
        }
    }
}

int main() {
    server();
    return 0;
}
Epoll summary
- The maximum number of file descriptors epoll supports is the system-wide limit on open files; as a rough figure, about 100,000 descriptors can be monitored per 1 GB of memory
- A callback is registered on each file descriptor; when an event occurs on the socket, the callback adds that fd to the ready list. select and poll cannot tell the program which descriptors are ready, while epoll can: after the system call returns, a program using select or poll has to walk through every monitored descriptor to find the ready ones, whereas epoll returns only the descriptors that are ready (see the sketch after this list)
- select and poll check readiness by polling every descriptor, while epoll relies on callbacks. As a result, the efficiency of select and poll degrades linearly as the number of fds grows, whereas epoll is largely unaffected unless a large proportion of the sockets are active
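To make the second point above concrete, here is a sketch of the handler loop (epoll_loop_once() is an illustrative name): epoll_wait() hands back just the ready descriptors, so the loop touches nready entries instead of rescanning every monitored fd.
#include <sys/epoll.h>

// One round of an epoll-based event loop: only the ready descriptors
// returned in events[0..nready-1] are visited.
void epoll_loop_once(int epfd, struct epoll_event *events, int maxevents) {
    int nready = epoll_wait(epfd, events, maxevents, -1);
    int i;
    for (i = 0; i < nready; i++) {
        int fd = events[i].data.fd;   // already known to be ready; no full scan needed
        (void)fd;                     // a real server would call its handler here
    }
}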