File descriptor

A file descriptor is a small non-negative integer that a process uses to refer to an open-file data structure kept by the kernel.

File descriptor for a normal file

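# Create file descriptor 6 to represent read operations on 1.txt (the file must already exist)
exec 6< 1.txt
# Create file descriptor 7 to represent write operations on 1.txt
exec 7> 1.txt
# Open 1.txt for both reading and writing on file descriptor 8
exec 8<> 1.txt
# Either command lists the descriptors held by the current process
lsof -op $$
lsof -p $BASHPID
# The same descriptors appear in the fd directory of the current process; $$ expands to the ID of the current process
cd /proc/$$/fd
# Write through descriptor 7
echo "HHH" >&7
# Read one line through descriptor 6 into the variable a
read a 0<&6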

In addition, every process starts with three default file descriptors: 0u (standard input), 1u (standard output), and 2u (standard error). In the lsof output, the u suffix means the descriptor is open for both reading and writing. The descriptors created above for reading and writing appear as 6r and 7w, so the current bash process now holds them in addition to the three defaults.

Each file descriptor refers to a kernel data structure that keeps its own offset: the position in the file at which the next read or write will take place.

Each process has its own file descriptor table. Because processes are isolated from one another, the same descriptor number can be used by different processes and can point to different files. Even when the same descriptor number in different processes points to the same file, descriptors that were opened independently still keep their own offsets, so each process reads and writes at its own position.
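The offset behaviour can be checked within a single process. The following is a minimal sketch, not taken from the original article: the scratch path /tmp/fd_offset_demo.txt and all function names are assumptions made for illustration. It compares two independent open() calls, which get separate offsets, with dup(), which yields a second number for the same open file and therefore shares the offset.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical scratch file used only for this demonstration. */
static const char *DEMO_PATH = "/tmp/fd_offset_demo.txt";

/* Create the scratch file with six bytes of content. Returns 0 on success. */
static int make_demo_file(void)
{
    int fd = open(DEMO_PATH, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "abcdef", 6);
    close(fd);
    return n == 6 ? 0 : -1;
}

/* Read 3 bytes through `reader`, then report the offset seen through `other`. */
static long offset_after_read(int reader, int other)
{
    char buf[3];
    if (read(reader, buf, sizeof buf) != 3)
        return -1;
    return (long)lseek(other, 0, SEEK_CUR);
}

/* Two separate open() calls: reading through one leaves the other at 0. */
long offset_with_two_opens(void)
{
    if (make_demo_file() != 0)
        return -1;
    int a = open(DEMO_PATH, O_RDONLY);
    int b = open(DEMO_PATH, O_RDONLY);
    long pos = (a >= 0 && b >= 0) ? offset_after_read(a, b) : -1;
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    unlink(DEMO_PATH);
    return pos;
}

/* dup(): both numbers name the same open file, so the offset is shared. */
long offset_with_dup(void)
{
    if (make_demo_file() != 0)
        return -1;
    int a = open(DEMO_PATH, O_RDONLY);
    int b = (a >= 0) ? dup(a) : -1;
    long pos = (a >= 0 && b >= 0) ? offset_after_read(a, b) : -1;
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    unlink(DEMO_PATH);
    return pos;
}
```

Under these assumptions, offset_with_two_opens() reports 0 and offset_with_dup() reports 3, which is why descriptors inherited from a parent process behave differently from descriptors each process opens for itself.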

Socket file descriptor

# Create a socket file descriptor (bash expects /dev/tcp/host/port; port 80 is assumed here)
exec 8<> /dev/tcp/www.baidu.com/80

lsof -op $$

A socket file descriptor also has its own buffers, but the buffered data is not flushed to disk. Instead, it is handed down through the network protocol layers, packaged into packets, and sent through the network adapter to the destination IP address.
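Because a socket is just another file descriptor, the ordinary read() and write() calls work on it. The sketch below is an illustration only: it uses a local AF_UNIX socket pair instead of the TCP connection shown above so that it needs no network, and the function name socket_round_trip is made up for this example.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send `msg` into one end of a local socket pair and read it back from the
 * other end using plain write()/read(). Returns the number of bytes received,
 * or -1 on error. */
long socket_round_trip(const char *msg, char *out, size_t out_size)
{
    int sv[2];                                   /* two connected socket descriptors */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t sent = write(sv[0], msg, len);       /* data goes into the socket buffer */
    ssize_t got = (sent == (ssize_t)len) ? read(sv[1], out, out_size) : -1;

    close(sv[0]);
    close(sv[1]);
    return (long)got;
}
```

With a real TCP socket the same calls apply; only the way the descriptor is created (socket() and connect()) differs.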

Process descriptor

In Linux, every process has a process descriptor. This "process descriptor" is a structure called task_struct that holds the information needed to control the process. task_struct is a Linux kernel data structure that is kept in RAM; the kernel records the information about each process in that process's task_struct.

Contents of task_struct

Identifier: a unique identifier for the process, used to distinguish it from other processes.

Status: task state, exit code, exit signal, etc.

Priority: priority relative to other processes.

Program counter: address of the next instruction to be executed in the program.

Memory pointers: pointers to the program code and process-related data, as well as pointers to memory blocks shared with other processes.

Context data: the data held in the processor registers while the process is executing.

I/O status information: includes outstanding I/O requests, the I/O devices assigned to the process, and the list of files in use by the process.

Accounting information: may include total processor time, total clock time used, time limits, account numbers, etc.
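To make the list concrete, here is a deliberately simplified structure with one member per category. It is a teaching sketch only: the real task_struct in the kernel is far larger and laid out differently, and every name below (toy_task, toy_state, toy_task_init) is invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a process descriptor. */
enum toy_state { TOY_RUNNING, TOY_SLEEPING, TOY_ZOMBIE };

struct toy_task {
    int            pid;              /* identifier */
    enum toy_state state;            /* status */
    int            exit_code;        /* status: exit code */
    int            priority;         /* priority relative to other tasks */
    uintptr_t      program_counter;  /* address of the next instruction */
    void          *code;             /* memory pointers: program code */
    void          *data;             /* memory pointers: process data */
    uintptr_t      saved_regs[8];    /* context data: saved registers */
    int            open_files[16];   /* I/O status: descriptors in use, -1 = free */
    uint64_t       cpu_time_ns;      /* accounting information */
};

/* Initialise a task with the three default descriptors 0, 1 and 2 open. */
void toy_task_init(struct toy_task *t, int pid, int priority)
{
    memset(t, 0, sizeof *t);
    t->pid = pid;
    t->state = TOY_RUNNING;
    t->priority = priority;
    for (size_t i = 0; i < sizeof t->open_files / sizeof t->open_files[0]; ++i)
        t->open_files[i] = -1;
    t->open_files[0] = 0;   /* standard input */
    t->open_files[1] = 1;   /* standard output */
    t->open_files[2] = 2;   /* standard error */
}
```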

Kernel mode and user mode

Concepts

  • Kernel mode:

A system runs both operating system code and ordinary user programs. For security and stability, operating system code and the resources it manages cannot be accessed at will; code that is allowed to do so runs in kernel mode. In other words, to execute operating system code the processor must switch to kernel mode, in which all of the computer's hardware resources can be used.

  • User mode:

Code in user mode cannot use system resources directly or change the CPU's operating mode, and it can access only the user program's own memory space.

Kernel mode and user mode

When a task (process) makes a system call and traps into kernel code, the process is said to be in kernel mode. At this point the processor is executing kernel code at the highest privilege level (level 0). When a process is in kernel mode, the kernel code being executed uses the kernel stack of the current process; each process has its own kernel stack. A process is said to be in user mode when it is executing the user's own code, that is, when the processor is running user code at the lowest privilege level. When a user program is interrupted while it is executing, it can also be regarded as being in kernel mode, because the interrupt handler uses the kernel stack of the current process.

Kernel mode and user mode conversion

A. System call

This is how a user process actively requests a switch to kernel mode. The user process uses system calls to request services provided by the operating system. At its core, the system call mechanism relies on an interrupt that the operating system reserves for user programs, such as the int 80h interrupt on Linux.
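From C, a system call can be requested explicitly through the generic syscall() wrapper. The sketch below is illustrative: raw_getpid is a made-up name, and on current 64-bit x86 systems the C library enters the kernel with the syscall instruction rather than int 80h, but the idea of asking the kernel for a service by number is the same.

```c
#define _GNU_SOURCE           /* makes syscall() visible under strict compiler modes */
#include <sys/syscall.h>      /* SYS_getpid: the system call number */
#include <sys/types.h>
#include <unistd.h>           /* syscall(), getpid() */

/* Ask the kernel for the current process ID through the generic
 * system call entry point instead of the getpid() wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The result matches what the ordinary getpid() wrapper returns, since that wrapper ends up requesting the same service from the kernel.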

B. Exception

When the CPU, while executing a program in user mode, encounters an unexpected event, an exception is raised. This triggers a switch from the running process to the kernel routine that handles the exception, that is, a switch to kernel mode. A page fault is one example.

C. Peripheral device interrupt

When a peripheral device finishes an operation requested by the user, it sends an interrupt signal to the CPU. The CPU then suspends the instruction it was about to execute and runs the handler for that interrupt. If the suspended instruction belonged to a user-mode program, this naturally involves a switch from user mode to kernel mode. For example, when a disk read or write completes, the system switches to the disk interrupt handler to carry out the follow-up work.

Interrupt

What is an interrupt

An interrupt is an event that changes the sequence of instructions the CPU executes. It corresponds to an electrical signal generated by hardware, either inside or outside the CPU. When the CPU receives an interrupt, it hands the signal to the OS, which processes the incoming event. Different events correspond to different interrupts, and the OS uses the interrupt number (associated with an IRQ line) to find the corresponding handler. Depending on the system, interrupt numbers may be fixed or dynamically allocated.

When an interrupt occurs, the interrupt controller is notified first. The interrupt controller collects the interrupts from all interrupt sources. It can control the priority of interrupt sources and the interrupt types, and it decides which CPU each interrupt is sent to.

After the interrupt controller notifies the CPUs, exactly one CPU responds to a given interrupt request. That CPU pauses the program it is executing and runs the corresponding interrupt handler in the OS. Each interrupt handler is associated with a particular interrupt.

Interrupt descriptor table

So how does the CPU find the interrupt service routine? To let the CPU locate the entry point of a handler from the interrupt number, a lookup table must be set up in memory: the interrupt descriptor table (IDT). The CPU has a dedicated register, IDTR, that holds the location of the IDT in memory. Note that the frequently mentioned interrupt vector table is the real-mode counterpart. An interrupt vector directly gives the entry point of the handler, whereas an interrupt descriptor carries other information in addition to the entry address.


Permissions

On 32-bit x86 Linux, for example, the hardware works with two privilege values: DPL and CPL. The DPL (descriptor privilege level) is a field of a segment or gate descriptor; for the kernel memory space it is set to 0 when the system is loaded. The CPL (current privilege level) is held in the low two bits of the CS register; once the system has been loaded and a shell starts a user-mode application, it is 3.

DPL = 0, CPL = 3;

Access is allowed only when DPL >= CPL, so user code with CPL = 3 cannot access the kernel segment memory space directly.

In this model, int 0x80 is the gateway into the kernel: its gate has DPL 3, so user code is allowed to invoke it, and passing through the gate sets CPL to 0 so that kernel space can be accessed.

The execution logic of int 0x80

Suppose fmt.Println("Hello world") is executed. This code eventually issues the write() system call to print the output to the screen, so the final printing step is carried out through a system call.

  • The gate for interrupt 0x80 has DPL 3, so user-mode code is allowed to enter the kernel through it.
  • Interrupt 0x80 leads to a single kernel entry function: system_call.
  • CPL is set to 0, so kernel functions can be executed.
  • system_call reads the system call number from the %eax register and dispatches to write, then takes the remaining arguments (the file descriptor, the address of the characters to print, and the length) from other registers. The prototype is int write(int fd, const char *buf, off_t count); the caller has already stored fd, the buffer address, and the count in registers or memory.
  • When the call finishes, CPL is set back to 3 and execution returns to user mode, where the program continues.
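The argument passing in these steps can be mirrored from user space with the generic syscall() wrapper: the first argument is the system call number, and the rest are fd, buf, and count. This is a hedged sketch, not the kernel path itself: raw_write and raw_write_round_trip are invented names, a pipe stands in for the screen so the result can be checked, and on 64-bit x86 the C library uses the syscall instruction instead of int 0x80.

```c
#define _GNU_SOURCE           /* makes syscall() visible under strict compiler modes */
#include <string.h>
#include <sys/syscall.h>      /* SYS_write: the system call number */
#include <unistd.h>

/* Issue write through the generic entry point: call number first,
 * then the three arguments fd, buf and count. */
long raw_write(int fd, const char *buf, size_t count)
{
    return syscall(SYS_write, fd, buf, count);
}

/* Write `msg` into a pipe with raw_write() and read it back.
 * Returns the number of bytes read, or -1 on error. */
long raw_write_round_trip(const char *msg, char *out, size_t out_size)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    long written = raw_write(p[1], msg, strlen(msg));
    long got = (written > 0) ? (long)read(p[0], out, out_size) : -1;

    close(p[0]);
    close(p[1]);
    return got;
}
```

Calling raw_write(1, "Hello world\n", 12) would send the text to standard output through the same system call.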

select

Select function details

#include <sys/select.h>
int select(int maxfdp1, fd_set *read_fds, fd_set *write_fds, fd_set *exception_fds, struct timeval *restrict tvptr);
// Returns the number of ready file descriptors, 0 on timeout, or -1 on error.

The select function performs I/O multiplexing. The parameters passed to select tell the kernel:

  • Which file descriptors we are interested in
  • Which conditions we care about for each descriptor
  • How long we are willing to wait

When select returns, the kernel tells us:

  • The number of descriptors that are ready
  • Which descriptors are ready for each of the three conditions (read, write, exception)

fd_set is a bitmap with one bit per descriptor; by default it is 1024 bits long (FD_SETSIZE).

read_fds, write_fds, exception_fds: the sets of descriptors to monitor for readability, writability, and exceptional conditions.

maxfdp1: the highest file descriptor number plus 1, at most 1024. Because we specify the largest descriptor we care about, the kernel only needs to scan the bits within this range.

Related operating functions:

void FD_ZERO(fd_set *fdset)   // Set all bits of a variable of type fd_set to 0

void FD_CLR(int fd, fd_set *fdset)  // Clear the bit for fd

void FD_SET(int fd, fd_set *fdset)   // Set the bit for fd to 1

int FD_ISSET(int fd, fd_set *fdset)    // Test whether the bit for fd is set to 1
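The four macros can be exercised without making any system call, since they only manipulate the bitmap. A small sketch (the function name fdset_demo is an assumption for illustration):

```c
#include <sys/select.h>

/* Build a set containing fd_a and fd_b, then remove fd_b again.
 * Returns a bitmask: bit 0 is set if fd_a is still in the set,
 * bit 1 is set if fd_b is still in the set. */
int fdset_demo(int fd_a, int fd_b)
{
    fd_set set;

    FD_ZERO(&set);          /* start with every bit cleared */
    FD_SET(fd_a, &set);     /* turn on the bit for fd_a */
    FD_SET(fd_b, &set);     /* turn on the bit for fd_b */
    FD_CLR(fd_b, &set);     /* turn the bit for fd_b off again */

    return (FD_ISSET(fd_a, &set) ? 1 : 0) | (FD_ISSET(fd_b, &set) ? 2 : 0);
}
```

Both descriptor numbers must be below FD_SETSIZE; the macros do not check this.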

The select() system call checks which of the registered file descriptors are ready. When it returns, the bit of a descriptor that is not readable or writable has been cleared to 0, while the bit of a ready descriptor is still 1. The return value is the total number of ready descriptors across the sets. The service program then loops over all the descriptors it registered and uses FD_ISSET() to decide which ones need a read or write operation.
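A minimal sketch of this call sequence for a single descriptor follows. The helper name wait_readable and the millisecond timeout parameter are assumptions made for the example; a real server would register many descriptors and loop over them with FD_ISSET().

```c
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readset;
    struct timeval tv;

    FD_ZERO(&readset);                 /* build the set from scratch on every call */
    FD_SET(fd, &readset);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* First argument: highest descriptor of interest plus one. */
    int ready = select(fd + 1, &readset, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    return (ready > 0 && FD_ISSET(fd, &readset)) ? 1 : 0;
}
```

For example, the read end of an empty pipe yields 0, and the same descriptor yields 1 once a byte has been written to the other end.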

Illustration

The client initiates a request to the server

The socket file descriptors are recorded and the readset is built

The select function makes a system call that copies the readset from user space to kernel space

Clients 1 and 3 send data; the network card uses DMA to place the data in memory, in the network card buffer

The network card raises a hardware interrupt

The data is read from the network card buffer

Process A returns to the run queue

Advantages:

All fds are passed to the kernel in a single system call and the kernel does the traversal, which avoids the overhead of making many separate system calls

Disadvantages:

  • Because select modifies the sets passed as &readset and &writeset in place, they cannot be reused and must be rebuilt before every call

  • Every select call traverses the full set of fds again
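The first disadvantage is easy to observe. In the sketch below (function and variable names are illustrative assumptions), two pipe read ends are registered but only one has data; after select returns, the bit of the idle descriptor has been cleared, so the set no longer describes what we want to watch.

```c
#include <sys/select.h>
#include <unistd.h>

/* Register two pipe read ends, only one of which has data pending.
 * Returns 1 if select kept the ready descriptor's bit and cleared the
 * idle one, 0 if not, -1 on error. */
int select_clears_unready(void)
{
    int ready_pipe[2], idle_pipe[2];
    if (pipe(ready_pipe) != 0)
        return -1;
    if (pipe(idle_pipe) != 0) {
        close(ready_pipe[0]);
        close(ready_pipe[1]);
        return -1;
    }

    int result = -1;
    if (write(ready_pipe[1], "x", 1) == 1) {
        fd_set readset;
        FD_ZERO(&readset);
        FD_SET(ready_pipe[0], &readset);
        FD_SET(idle_pipe[0], &readset);

        int maxfd = ready_pipe[0] > idle_pipe[0] ? ready_pipe[0] : idle_pipe[0];
        struct timeval tv = {0, 0};            /* return immediately */

        if (select(maxfd + 1, &readset, NULL, NULL, &tv) == 1)
            result = FD_ISSET(ready_pipe[0], &readset) &&
                     !FD_ISSET(idle_pipe[0], &readset);
    }

    close(ready_pipe[0]);
    close(ready_pipe[1]);
    close(idle_pipe[0]);
    close(idle_pipe[1]);
    return result;
}
```

To keep watching the idle descriptor, the caller has to add it to the set again before the next call.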

poll

#include <poll.h>
int poll(struct pollfd fdarray[], nfds_t nfds, int timeout);

struct pollfd {
    int fd;         // file descriptor to check
    short events;   // events of interest on fd
    short revents;  // events that occurred on fd
};

pollfd

Rather than building a set of descriptors, poll takes an array of pollfd structures. Each element specifies a descriptor number (the fd member) and the conditions we are interested in for that descriptor (the events member).

When the poll system call completes, it also returns the number of ready file descriptors, and the service program checks the revents member of each pollfd

int a = poll(fdarray, nfds, 0);
if (a > 0) { // at least one file descriptor can be read or written
    for (i = 0; i < nfds; ++i) {
        if (fdarray[i].revents) { // the kernel reported an event on this descriptor
            // operate on fdarray[i].fd
        }
    }
}

Advantages:

  • The kernel only writes to the revents field of each structure and leaves the other fields alone, so there is no need to rebuild a bitmap such as readset before every call
  • The 1024 file descriptor limit of select does not apply

Disadvantages:

  • Every poll call still traverses the full set of fds in the kernel

  • The service program also has to traverse the full set of fds, checking the revents field of each entry to see whether a read or write is needed
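Both points can be seen in a short sketch: the same pollfd array is passed to poll repeatedly without being rebuilt, yet every call still ends with a scan over all entries. The function name count_readable is an assumption for illustration.

```c
#include <poll.h>

/* Poll the array once without blocking and count the entries whose
 * revents field reports readable data. Returns that count, 0 if nothing
 * is ready, or -1 on error. */
int count_readable(struct pollfd *fds, nfds_t nfds)
{
    int ready = poll(fds, nfds, 0);    /* timeout 0: return immediately */
    if (ready <= 0)
        return ready;

    int readable = 0;
    for (nfds_t i = 0; i < nfds; ++i)  /* the caller still scans every entry */
        if (fds[i].revents & POLLIN)
            ++readable;
    return readable;
}
```

The events field of each entry is left untouched by the kernel, so the array can be handed to the next call as is.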

epoll

Functions

epoll is a Linux-specific I/O multiplexing facility that uses a group of functions rather than a single function. epoll keeps the file descriptors the user is interested in inside an event table in the kernel, so it needs one additional file descriptor to represent that event table.

#include <sys/epoll.h>
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

  • epoll_create() returns a file descriptor that refers to an area of kernel memory holding the event table. The size argument no longer has any effect, although it must still be greater than zero.

  • epoll_ctl() is used to manipulate the kernel event table.

    • int epfd: the event table descriptor returned by epoll_create()

    • int fd: the file descriptor to operate on, for example a newly created socket

    • int op

      • EPOLL_CTL_ADD: adds a file descriptor to the event table. The socket events that the kernel should watch are stored in the epoll_event structure. The file descriptors added to the event table are kept in a red-black tree, which prevents the same descriptor from being added twice
      • EPOLL_CTL_MOD: modifies the events registered on fd
      • EPOLL_CTL_DEL: deletes the events registered on fd
    • struct epoll_event *event

      struct epoll_event {
          uint32_t events;    // epoll events
          epoll_data_t data;  // user data
      };
      typedef union epoll_data {
          void *ptr;
          int fd;
          uint32_t u32;
          uint64_t u64;
      } epoll_data_t;

  • epoll_wait() waits for events and returns the number of ready file descriptors; the ready events are written to the events array.
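The three calls fit together as in the following minimal sketch. It is an illustration under simplifying assumptions: the helper name epoll_wait_one is made up, and it creates and destroys the epoll instance for a single descriptor, whereas a real server creates the instance once and registers many sockets.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for readability on a fresh epoll instance and wait up to
 * timeout_ms milliseconds. Returns the descriptor reported ready, or -1
 * if nothing became ready or an error occurred. */
int epoll_wait_one(int fd, int timeout_ms)
{
    int epfd = epoll_create(1);            /* size is ignored but must be > 0 */
    if (epfd < 0)
        return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;                   /* event of interest: readable */
    ev.data.fd = fd;                       /* user data returned with the event */

    int ready_fd = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
        struct epoll_event out[1];
        if (epoll_wait(epfd, out, 1, timeout_ms) == 1)
            ready_fd = out[0].data.fd;     /* only ready descriptors are returned */
    }

    close(epfd);
    return ready_fd;
}
```

Unlike select and poll, the caller receives only the ready entries and does not have to scan the full list of registered descriptors.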

Working modes (LT mode, ET mode)

LT mode (level-triggered)
  • Once the fd is readable, if the service program stops after reading only part of the data, the file descriptor is still reported as readable
  • Once the fd is writable, if the service program stops after writing only part of its data, the file descriptor is still reported as writable
ET mode (edge-triggered)
  • Once the fd is readable, if the server stops after reading only part of the data, the descriptor is not reported as readable again until new data arrives. The server therefore has to read in a loop until all available data has been consumed
  • Once the fd is writable, if the server stops after writing only part of its data, the descriptor is not reported as writable again. The server therefore has to keep writing until all of its data has been written or the buffer is full

ET mode greatly reduces the number of times the same epoll event is reported, so it can be more efficient than LT mode
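The difference between the two modes can be observed with a pipe. The sketch below (count_reports is an invented name) writes two bytes, reads only one of them between two non-blocking epoll_wait calls, and counts how often the descriptor is reported. The expectation that edge-triggered mode reports only once follows the scenario described in the epoll manual page; treat it as typical behaviour rather than a guarantee for every kind of descriptor.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register the read end of a pipe with the given event flags, write two
 * bytes, and call epoll_wait twice with a one-byte read in between.
 * Returns how many times the descriptor was reported (0 to 2), or -1 on error. */
int count_reports(uint32_t flags)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    int reports = -1;
    int epfd = epoll_create(1);
    if (epfd >= 0) {
        struct epoll_event ev = {0};
        struct epoll_event out;
        ev.events = flags;
        ev.data.fd = p[0];

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
            write(p[1], "ab", 2) == 2) {
            char c;
            reports = 0;
            if (epoll_wait(epfd, &out, 1, 0) == 1)
                ++reports;                      /* first report: data arrived */
            if (read(p[0], &c, 1) == 1 &&       /* partial read: one byte remains */
                epoll_wait(epfd, &out, 1, 0) == 1)
                ++reports;                      /* second report: level-triggered only */
        }
        close(epfd);
    }

    close(p[0]);
    close(p[1]);
    return reports;
}
```

With EPOLLIN alone (LT) the descriptor is reported on both calls; with EPOLLIN | EPOLLET it is reported once, which is why ET code normally uses non-blocking descriptors and reads until the call reports that no more data is available.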

Illustration

The client establishes the socket connection

Calling epoll_create() returns a file descriptor for the event table

Call epoll_ctl()

Call epoll_wait()

The client sends data

Interrupt