Basic principles of IO read and write

User programs actually rely on the Linux kernel read() and write() functions to perform IO operations

Instead of reading data directly from the network card into user memory, the call to the read() function copies data from the kernel buffer into the user buffer

The call to write() does not write data directly to the network card, but instead writes data from the user buffer to the kernel buffer

Data is read from and written to the kernel buffer by the operating system kernel

Blocking IO and non-blocking IO

The network adapter synchronizes data to the kernel buffer. If the data in the kernel buffer is not ready, the user process initiates a read operation. The blocked process waits until the data in the memory buffer is complete

Read and write operations between the kernel buffer and the user buffer are definitely blocked

Synchronous and Asynchronous

Synchronization: The caller actively initiates the request, the caller actively waits for the result to return, and once the call is made, it must have a return value

Asynchronous: The call is returned directly after it is issued, so no result is returned. Notification callback, notification and other mechanisms to notify the caller when the processing is complete

Synchronous blocking IO

Read data flow
  • The user process calls the read() system function, and the user process is blocked
  • The system kernel receives the read() system call, the network adapter begins to prepare to receive data, at first the kernel buffer data is empty, the kernel is waiting for the data to be received, and the user process synchronously blocks and waits
  • Once the kernel buffer has complete data, the kernel copies the data from the kernel buffer to the user buffer
  • The user process cannot unblock until there is data in the user buffer
Synchronization blocks the underlying IO implementation
/ / create a socket
int listenfd = socket(AF_INET, SOCK_STREAM, 0);
/ / binding
bind(listenfd, (struct sockaddr*)&my_addr, sizeof(my_addr));
/ / to monitor
listen(listenfd, 5);
// Accept the client connection
int socketFd = accept(listenfd, (struct sockaddr*) &clientaddr, &clientaddrlen) 

// Receive the client data
recv(socketFd, buf, 256.0); 
Copy the code
Advantages and disadvantages of synchronous blocking IO

Advantages:

  • Development is simple, since accept() and recv() are both blocked, in order to service multiple client requests, a new connection can be created and processed by a thread
  • When blocked, the thread hangs and consumes no CPU resources

Disadvantages:

  • Each new I/O request needs to create a new thread, high concurrency system overhead, multithreaded context switching frequent
  • Creating too many threads consumes too much memory
Thinking about the disadvantages of synchronous blocking IO

Since the accept() and recv() functions are both blocked, if the system wants to support multiple IO requests, it creates more threads. How about solving this problem?

If we could make the accept and recv functions non-blocking, wouldn’t we be able to avoid creating multiple threads? This introduces our synchronous non-blocking IO

Synchronize non-blocking IO

Read data flow
  • The user process makes a request to call the read() function, the system kernel receives the read() system call, and the network adapter starts preparing to receive data
  • The kernel buffer data is not ready, the request is returned immediately, and the user process keeps retrying to see if the kernel buffer data is ready
  • When the kernel buffer data is ready, the user process blocks and the kernel starts copying the kernel buffer data to the user buffer
  • After the replication is complete, the user process is unblocked and data continues to be read
Synchronizing the underlying non-blocking IO implementation
/ / create a socket
int listenfd = socket(AF_INET, SOCK_STREAM, 0);
/ / binding
bind(listenfd, (struct sockaddr*)&my_addr, sizeof(my_addr));
/ / to monitor
listen(listenfd, 5);
// Set to non-blocking
ioctl(listenfd, FIONBIO, 1);
// Accept the client connection
int socketFd = accept(listenfd, (struct sockaddr*) &clientaddr, &clientaddrlen);
// Set to non-blocking
ioctl(socketFd, FIONBIO, 1);
while (1) {
    int fd;
    // Loop traversal
    for (fd : fds) {
        // Receive the client data
		recv(fd, buf, 256.0); }}Copy the code
Advantages and disadvantages of synchronizing non-blocking IO

Advantages:

  • Non-blocking, neither Accept () nor recv() blocks, and the user thread returns immediately
  • Avoids the multithreading problem of synchronous blocking mode

Disadvantages:

  • Let’s say I have 10,000 clients connected and only one client is sending data. In order to get the message from that one client, I need to loop 10,000 recv() system calls to the kernel, and 9999 of those requests are invalid, wasting CPU resources
Thoughts on the drawbacks of synchronizing non-blocking IO

Against disadvantages of synchronous non-blocking IO, imagine if the kernel provides a method, can blow 10000 client incoming socket connections, in the kernel to traverse, if there is no data this method has been blocked, but with the data of this method is unblocked and put all the data socket returns, the traversing the process of the hands of the kernel, Is it possible to avoid the empty run, to avoid 10,000 user mode to kernel mode switch?

IO multiplexing model

What is IO multiplexing?

One thread monitors multiple IO operations

IO multiplexing principle

The IO multiplexing model is based on the select function provided by the kernel. Using the SELECT function can avoid the polling waiting problem in the synchronous non-blocking IO model. That is, N client socket connections are passed to the kernel at one time and then blocked. When an event occurs on one or more socket connections, the block is unblocked and the list of events is returned. The user process loops through the socket connections that handle the event. This avoids calling recv() multiple times and switching from user mode to kernel mode.

Three implementations of IO multiplexing

The select function

The select function only knows how many I/O events have occurred, but it does not know which sockets have I/O events, and it needs to poll to find them. The complexity is O(n). The more requests processed, the longer the time.

The process of executing the select function
  • Copy fD_set (set of registered events) from user space to kernel space
  • All fd files are traversed, and the current process is attached to the waiting queue of each FD. When a fd file device receives a message, it wakes up the sleeping process on the waiting queue of the device, so the current process will be woken up
  • If there is no I/O event after traversing all FDS, the current process goes to sleep. When an I/O event occurs on a FD file or the current process’s sleep times out, the current process wakes up and traverses all FDS again
Select function interface definition
#include <sys/select.h>
#include <sys/time.h>

// A maximum of 1024 connections are supported
#define FD_SETSIZE 1024
#define NFDBITS (8 * sizeof(unsigned long))
#define __FDSET_LONGS (FD_SETSIZE/NFDBITS)

/** * Data structure (bitmap) * fd_set holds related socket events */
typedef struct {
    unsigned long fds_bits[__FDSET_LONGS];
} fd_set;

/** * select is a blocking function
// Return the number of value ready descriptors
int select(
    int max_fd,  // Max file descriptor value, 0-max_fd for traversal
    fd_set *readset,  // Read the event list
    fd_set *writeset,  // Write the event list
    fd_set *exceptset,  // Exception list
    struct timeval *timeout  // Block timeout
)

FD_ZERO(int fd, fd_set* fds)   // Clear the collection
FD_SET(int fd, fd_set* fds)    // Add the given descriptor to the set
FD_ISSET(int fd, fd_set* fds)  // Determines whether the specified descriptor is in the collection
FD_CLR(int fd, fd_set* fds)    // Delete the given descriptor from the file
Copy the code
Select Example
#include<stdio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>						
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>	

void server(a) {
	
	// Create a socket connection
	int lfd = socket(AF_INET,SOCK_STREAM,0);
	struct sockaddr_in my_addr; 
	bzero(&my_addr, sizeof(my_addr));
	my_addr.sin_family = AF_INET; // ipv4
	my_addr.sin_port   = htons(9090);
	my_addr.sin_addr.s_addr = htonl(INADDR_ANY); 
	// Bind the port
	bind(lfd, (struct sockaddr*)&my_addr, sizeof(my_addr));
	// Listen for connection requests
	listen(lfd, 128);
	printf("listen client @port=%d... \n".9090);
	int lastfd = lfd;
	// Define the file descriptor set
	fd_set read_fd_set, all_fd_set;
	// Add the service socket descriptor to the set
	FD_ZERO(&all_fd_set);
	FD_SET(lfd, &all_fd_set);
	printf("Ready to enter the while loop \n");
	while (1) {
		read_fd_set = all_fd_set;
		printf("In the block... lastfd=%d\n", lastfd);
		int nready = select(lastfd+1, &read_fd_set, NULL.NULL.NULL);
		switch (nready) {
			case 0 :
				printf("select time out ...... \n");
				break;
			case - 1 :
				perror("select error \n");
				break;
			default:
				// Listen for new client connections
				if (FD_ISSET(lfd, &read_fd_set)) {
					struct sockaddr_in client_addr;	
					socklen_t cliaddr_len = sizeof(client_addr);
					char cli_ip[INET_ADDRSTRLEN] = "";	
					// There must be a connection that does not block
					int clientfd = accept(lfd, (struct sockaddr*)&client_addr, &cliaddr_len);
					inet_ntop(AF_INET, &client_addr.sin_addr, cli_ip, INET_ADDRSTRLEN);
					printf("----------------------------------------------\n");
					printf("client ip=%s,port=%d\n", cli_ip, ntohs(client_addr.sin_port));
					// Add clientfd to the read set
					FD_SET(clientfd, &all_fd_set);	
					lastfd = clientfd;
					if(0 == --nready) {
						continue; }}int i;
				for (i = lfd + 1; i <= lastfd; i++) {// Handle read events
					if (FD_ISSET(i, &read_fd_set)) {
						char recv_buf[512] = "";
						int rs = read(i, recv_buf, sizeof(recv_buf));
						if (rs == 0 ) {
							close(i);
							FD_CLR(i, &all_fd_set);
						} else {
							printf("%s\n",recv_buf);
							// Write data to each server
							int j;
							for (j = lfd + 1; j <= lastfd; j++) {if(j ! = i) { write(j, recv_buf,strlen(recv_buf));
								}
							}
						}
					}
				}
		}
		
	}
}

int main(a){
	server();
	return 0;
}
Copy the code
Disadvantages of the select function
  • The number of FD’s that can be opened by a single process is limitedFD_SETSIZEThe default value is 1024
  • Each time select is called, the collection of FDS needs to be copied from user mode to kernel mode, which can be costly with many FDS
  • Each call to SELECT requires the process to be added to all wait queues that monitor sockets, and each wake requires the process to be removed from each queue
  • The select function has to reset its parameters before each call, which is cumbersome and degrades performance
  • After the process is awakened, the program does not know which sockets received the data and needs to iterate over it once more

poll

Poll is essentially the same as select. It copies the array passed in by the user into kernel space and then queries the device state for each FD, but it does not have a maximum number of connections because it is stored based on a linked list

Poll function interface
#include <poll.h>
// Data structure
struct pollfd {
    int fd;                         // File descriptors to monitor
    short events;                   // Events that require kernel monitoring
    short revents;                  // Actual events: 1: event occurred, 0: no event occurred
};

// Block method
int poll(struct pollfd fds[],   // List of file descriptors to listen to
         nfds_t nfds,           // Number of file descriptors
         int timeout            // Timeout period
        );
Copy the code
Poll sample
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <poll.h>
#include <unistd.h>
#include <sys/time.h>


#define MAX_POLLFD_LEN 4096  
#define PORT 9108


void server(a) {
	
	// Create a socket connection
	int lfd = socket(AF_INET,SOCK_STREAM,0);
	struct sockaddr_in my_addr; 
	bzero(&my_addr, sizeof(my_addr));
	my_addr.sin_family = AF_INET; // ipv4
	my_addr.sin_port   = htons(PORT);
	my_addr.sin_addr.s_addr = htonl(INADDR_ANY); 
	// Bind the port
	bind(lfd, (struct sockaddr*)&my_addr, sizeof(my_addr));
	// Listen for connection requests
	listen(lfd, 128);
	printf("listen client @port=%d... \n",PORT);
	
	// Define the pollfd object
	struct pollfd fds[MAX_POLLFD_LEN];
	memset(fds, 0.sizeof(fds));
	// Add the socket service listener
	fds[0].fd = lfd;
	fds[0].events = POLLIN;
	int nfds = 1;
	int i;
	for(i = 1; i < MAX_POLLFD_LEN; i++) {
		fds[i].fd = - 1;
	}
	int maxFds = 0;
	printf("Ready to enter the while loop \n");
	while (1) {
		printf([maxFds=%d]... \n", maxFds);
		int nready = poll(fds, maxFds + 1.- 1);
		switch (nready) {
			case 0 :
				printf("select time out ...... \n");
				break;
			case - 1 :
				perror("select error \n");
				break;
			default:
				// Listen for new client connections
				if (fds[0].revents & POLLIN) {
					struct sockaddr_in client_addr;	
					socklen_t cliaddr_len = sizeof(client_addr);
					char cli_ip[INET_ADDRSTRLEN] = "";	
					// There must be a connection that does not block
					int clientfd = accept(lfd, (struct sockaddr*)&client_addr, &cliaddr_len);
					inet_ntop(AF_INET, &client_addr.sin_addr, cli_ip, INET_ADDRSTRLEN);
					printf("----------------------------------------------\n");
					printf("client ip=%s,port=%d\n", cli_ip, ntohs(client_addr.sin_port));
					// Add clientfd to the read set
					int j;
					for (j = 1; j < MAX_POLLFD_LEN; ++j) {
						if (fds[j].fd < 0) {
							fds[j].fd = clientfd;
							fds[j].events = POLLIN;
							printf("Client added successfully... \n");
							maxFds++;   
							break;
						}
						if(j == MAX_POLLFD_LEN){
							printf("too many clients"); 
							exit(1); }}if(--nready <= 0) {
						continue; }}int i;
				printf("maxFds=%d\n", maxFds);
				for (i = 1; i <= maxFds; i++) {
					printf("i=%d\n", i);
					// Handle read events
					if (fds[i].revents & POLLIN) {
						int sockfd = fds[i].fd;
						char recv_buf[512] = "";
						int rs = read(sockfd, recv_buf, sizeof(recv_buf));

						if (rs == 0) {
							close(sockfd);
							fds[i].fd = - 1;
						} else {
							printf("%s\n",recv_buf);
							// Write data to each server
							int j;
							for (j = 1; j <= maxFds; j++) {if(j ! = i) { write(fds[j].fd, recv_buf,strlen(recv_buf));
								}
							}
						}
					}
				}
		}
		
	}
}

int main(a){
	server();
	return 0;
}

Copy the code

epoll

Epoll can be understood as an event pool. Different from the polling mechanism of SELECT and poll, ePoll uses an event-driven mechanism. Each FD has a callback function registered on it.

When epoll_WAIT is called to check for an event, you simply check for an EPItem element in the RDList double-linked list in the EventPoll object. If rdList is not empty, the events that occurred are copied to the user mode and the number of events is returned to the user.

Interface definition of the epoll function
#include <sys/epoll.h>

// Data structure
// Each ePoll object has a separate EventPoll structure
// Used to store the events added to the epoll object by the epoll_ctl method
// Epoll_WAIT checks whether an event has occurred by checking whether there is an EPItem element in the RDList double linked list in the EventPoll object
struct eventpoll {
    /* The root node of the red-black tree, which stores all events added to epoll to be monitored */
    struct rb_root  rbr;
    /* The double-linked list contains events that meet the criteria to be returned to the user via epoll_wait */
    struct list_head rdlist;
};

// API
// Add an EP object in the middle of the kernel, and put all the sockets that need to be listened on into the EP object
int epoll_create(int size); 
// epoll_ctl is responsible for adding and deleting sockets to the kernel red-black tree
int epoll_ctl(int epfd,  // The created EP object
              int op,    // Add or delete the operation type
              int fd,    // The object to operate on
              struct epoll_event *event  / / event
             ); 
// epoll_wait detects the readable queue and blocks the process if there is no readable socket
int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);

Copy the code
Epoll executes the process
  • The call to epoll_create() creates an EP object, the root node of the red-black tree, and returns a file handle
  • Call epoll_ctl() to add, delete, and modify events of interest to this EP object (red-black tree)
  • Call epoll_wait() and when an event occurs, the driver will call the function registered on the FD and add the FD to the RDList, unblocking

The sample code
#include<stdio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>						
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>	
#include <sys/epoll.h>

void server(a) {
	
	// Create a socket connection
	int lfd = socket(AF_INET,SOCK_STREAM,0);
	struct sockaddr_in my_addr; 
	bzero(&my_addr, sizeof(my_addr));
	my_addr.sin_family = AF_INET; // ipv4
	my_addr.sin_port   = htons(8088);
	my_addr.sin_addr.s_addr = htonl(INADDR_ANY); 
	// Bind the port
	bind(lfd, (struct sockaddr*)&my_addr, sizeof(my_addr));
	// Listen for connection requests
	listen(lfd, 128);
	printf("listen client @port=%d... \n".8088);
	int epct, i;
	struct epoll_event event;
	struct epoll_event events[100].
	memset(events, 0.100 * sizeof(struct epoll_event));
	int epfd = epoll_create(1);
	event.data.fd = lfd;
	event.events = EPOLLIN;
	epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &event);
	while (1) {
		printf("Blocked.... \n");
		int nready = epoll_wait(epfd, events, 20.- 1);
		int i;
		for (i = 0; i < nready; ++i) {
			// Listen for new client connections
			if (events[i].data.fd == lfd) {
				struct sockaddr_in client_addr;	
				socklen_t cliaddr_len = sizeof(client_addr);
				char cli_ip[INET_ADDRSTRLEN] = "";	
				// There must be a connection that does not block
				int clientfd = accept(lfd, (struct sockaddr*)&client_addr, &cliaddr_len);
				inet_ntop(AF_INET, &client_addr.sin_addr, cli_ip, INET_ADDRSTRLEN);
				
				event.data.fd = clientfd;
				event.events = EPOLLIN | EPOLLET;
				epoll_ctl(epfd, EPOLL_CTL_ADD, clientfd, &event);
				
				printf("----------------------------------------------\n");
				printf("client ip=%s,port=%d\n", cli_ip, ntohs(client_addr.sin_port));
			} else {
				char recv_buf[512] = "";
				int rs = read(events[i].data.fd, recv_buf, sizeof(recv_buf));
				if (rs < 0) {
					close(events[i].data.fd);
					epoll_ctl(epfd, EPOLL_CTL_DEL, events[i].data.fd, &event);
					continue;
				}
				printf("%s\n",recv_buf); }}}}int main(a){
	server();
	return 0;
}

Copy the code
Epoll summary
  • The maximum number of file descriptors supported by EPOLL is the maximum number of files that can be opened in the entire system. In theory, a maximum of 100,000 file descriptors can be created with 1 GB of memory
  • Each file descriptor has a callback function that is called when an event occurs on the socket, adding a reference to that fd to the ready list. Select and poll do not specify which file descriptor is ready, while epoll does. The difference is that after the system call returns, the program calling select and Poll has to go through the entire listener’s file descriptor to find out who is ready, whereas epoll just does it
  • Select and poll use polling to check whether the file descriptor is in the ready state, while epoll uses callback mechanism. As a result, the efficiency of select and poll decreases linearly as the number of FDS increases, while epoll is not affected much unless there are many active sockets
Select, poll, and epoll are compared