The nature of the file descriptor
A file descriptor is essentially a credential that the kernel issues to a process. The kernel maintains a file descriptor table for each process, which records the mapping between file descriptors and files along with related information such as the read/write mode.
In Linux, when a process starts, the kernel creates a virtual directory for it under /proc, named after the process's PID. Inside that directory is a folder called fd, which lists the files the process has open.
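The fd folder can also be read from inside a process. Here is a minimal sketch, assuming a Linux system with /proc mounted; the helper name list_open_fds is my own and not part of any library.

```python
import os


def list_open_fds(pid="self"):
    """Return the descriptor numbers listed under /proc/<pid>/fd, sorted."""
    fd_dir = f"/proc/{pid}/fd"
    # Each entry name is a descriptor number. os.listdir briefly opens the
    # directory itself, so one extra short-lived descriptor may show up.
    return sorted(int(name) for name in os.listdir(fd_dir))


if __name__ == "__main__":
    print(os.getpid(), list_open_fds())
```

Passing a PID such as 7072 instead of the default "self" inspects another process, as long as you have permission to read its /proc entry.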
First, we start a Python interpreter.
Then run ps -ef | grep python3
The PID of the Python interpreter process is 7072, so we enter /proc/7072/fd
As you can see, there are three files in this directory, which means the process has three files open. Their file descriptors are 0, 1, and 2.
- Standard input, fd=0.
- Standard output, fd=1.
- Standard error output, fd=2.
Run ls -l and you can see that all three files are links to the /dev/pts/0 device, which represents the user's SSH terminal.
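What ls -l shows can be reproduced with os.readlink. This is a rough sketch for Linux only; fd_targets is a name I made up for illustration.

```python
import os


def fd_targets(pid="self"):
    """Map each descriptor number of a process to the target of its /proc link."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            # The descriptor was closed between listdir and readlink
            # (for example, the one listdir itself used).
            pass
    return targets


if __name__ == "__main__":
    for fd, target in sorted(fd_targets().items()):
        print(fd, "->", target)
```

In an SSH session, descriptors 0, 1, and 2 would typically all point at the same pseudo-terminal device; under a pipe or a redirect the targets look different.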
(An SSH connection to the server is used for the tests in this article.)
Since they are files, that means we can manipulate them as if they were files. (This is where Linux gets interesting.)
Write print('Hello from Kovogo') to the standard input file, then switch to the other window.
As you can see, the other window displays what we wrote.
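The same idea can be tried without a second terminal. The sketch below uses a pipe as a stand-in for the terminal device and writes to it through its /proc entry. It assumes Linux, and write_via_proc is a hypothetical helper name.

```python
import os


def write_via_proc(fd, data, pid="self"):
    """Open /proc/<pid>/fd/<fd> like an ordinary file and write bytes to it."""
    path = f"/proc/{pid}/fd/{fd}"
    proc_fd = os.open(path, os.O_WRONLY)
    try:
        return os.write(proc_fd, data)
    finally:
        os.close(proc_fd)


if __name__ == "__main__":
    # A pipe stands in for the terminal so the effect is easy to observe.
    read_end, write_end = os.pipe()
    write_via_proc(write_end, b"Hello from Kovogo\n")
    print(os.read(read_end, 64))
    os.close(read_end)
    os.close(write_end)
```

Note that when the target is a terminal device, the bytes written this way are displayed on that terminal; they are not typed into the program reading from it.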
From the above, we can observe that file descriptors start at zero and increase one at a time (more precisely, the kernel hands out the lowest number not currently in use). Some features can be cleverly built on this rule, and it also answers a few questions, but before we get to that, let's verify the mechanism.
Let's create three files and then open them in order.
1.log should get file descriptor 3, 2.log should get file descriptor 4, and 3.log should get file descriptor 5. Enter the process's file descriptor directory and take a look.
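The experiment above can be sketched in a few lines. The temporary directory and the helper open_in_order are my own choices for illustration; the exact numbers you get depend on what the process already has open.

```python
import os
import tempfile


def open_in_order(names):
    """Create the named files in a temp directory, open them, return the descriptors."""
    workdir = tempfile.mkdtemp()
    return [
        os.open(os.path.join(workdir, name), os.O_CREAT | os.O_WRONLY, 0o644)
        for name in names
    ]


if __name__ == "__main__":
    fds = open_in_order(["1.log", "2.log", "3.log"])
    # In a fresh interpreter with only 0, 1, 2 open this is typically [3, 4, 5].
    print(fds)

    # Closing a descriptor frees its number; the next open reuses the lowest free one.
    os.close(fds[0])
    reused = os.open(os.devnull, os.O_RDONLY)
    print(reused == fds[0])
    for fd in [reused, *fds[1:]]:
        os.close(fd)
```

The second half shows the detail behind the "grow one by one" rule: numbers are not simply counted upward forever, a closed number becomes available again.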
How is select implemented? Why can it monitor fewer file descriptors than epoll?
From the previous analysis, we already know that file descriptors are handed out in increasing order starting from 0, normally without gaps. The select system call takes advantage of this by storing the file descriptors to monitor in a bitmap. So what is a bitmap?
We know that a byte is made up of eight bits. When we assign a meaning to each of those bits, the byte becomes a bitmap; a bitmap is typically used to test whether an element belongs to a set.
Suppose we are given an array whose elements range from 0 to 15, and we need to de-duplicate it and output only the unique elements. Instead of using a Map to record the elements already seen, we can use a bitmap.
As shown below, here is a 2-byte bitmap.
To determine whether an element is in the bitmap, we just need:
- If you use a short to store the bitmap
bool exists = ((1 << elem) & bitmap) != 0;
- If you use a byte array to store the bitmap
First locate the byte that would hold the element by dividing by 8, then locate the element's bit within that byte
int byte_pos = elem / 8;
int elem_pos = elem % 8;
bool exists = ((1 << elem_pos) & bitmap[byte_pos]) != 0;
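To tie the de-duplication example and the byte-array lookup together, here is a small Python sketch. The function name dedupe_with_bitmap and the universe parameter are my own; the byte and bit arithmetic mirrors the byte_pos / elem_pos lines above.

```python
def dedupe_with_bitmap(values, universe=16):
    """Return (unique values in first-seen order, final bitmap bytes)."""
    bitmap = bytearray((universe + 7) // 8)  # 16 possible values -> 2 bytes
    unique = []
    for elem in values:
        if not 0 <= elem < universe:
            raise ValueError(f"{elem} is outside 0..{universe - 1}")
        byte_pos, elem_pos = divmod(elem, 8)  # which byte, which bit inside it
        mask = 1 << elem_pos
        if not bitmap[byte_pos] & mask:
            bitmap[byte_pos] |= mask  # mark the element as seen
            unique.append(elem)
    return unique, bytes(bitmap)


if __name__ == "__main__":
    print(dedupe_with_bitmap([3, 15, 3, 0, 8, 15]))
```

The whole "seen" set costs 2 bytes no matter how long the input is, which is the attraction of a bitmap when the range of values is small and dense.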
Once you understand bitmaps, it is easy to see why select can only monitor a limited number of descriptors: the bitmap that select uses to store file descriptors has a fixed size. Because descriptors are handed out in increasing order, the bitmap wastes very few bits, but its size is still limited.
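For reference, this is roughly what a select call looks like from Python, whose select module wraps the system call. It is only a sketch with a pipe as the monitored descriptor; readable_now is a hypothetical helper.

```python
import os
import select


def readable_now(fds):
    """Return the descriptors in fds that select reports as readable right now."""
    ready, _, _ = select.select(fds, [], [], 0)  # timeout 0: check without blocking
    return ready


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    print(readable_now([read_end]))  # nothing has been written yet
    os.write(write_end, b"x")
    print(readable_now([read_end]))  # the read end is now reported
    os.close(read_end)
    os.close(write_end)
```

As far as I know, CPython refuses descriptor numbers that do not fit into the underlying fd_set and raises ValueError for them, but treat that as an assumption and check it on your own build.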
Let's simply call sizeof to check the size of the bitmap, which is defined in select.h.
The structure type of the bitmap is fd_set.
On my machine fd_set is 128 bytes, so the maximum number of file descriptors that can be monitored is 8 * 128 = 1024. Of course, you can adjust the size of fd_set by recompiling the kernel.
A quiet aside: to be honest, I have never actually gone that far.
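The 8 * 128 = 1024 arithmetic can be made concrete with a toy model of fd_set. This is not how the C library lays out the structure internally; it is a simplified bitmap, with the 128-byte size taken from the measurement above and the class name FdSetModel invented for illustration.

```python
FD_SET_BYTES = 128            # size measured in the text; may differ elsewhere
MAX_FDS = FD_SET_BYTES * 8    # one bit per descriptor number


class FdSetModel:
    """Toy model of fd_set: a fixed-size bitmap indexed by descriptor number."""

    def __init__(self):
        self.bits = bytearray(FD_SET_BYTES)

    def _locate(self, fd):
        if not 0 <= fd < MAX_FDS:
            raise ValueError(f"descriptor {fd} does not fit in {MAX_FDS} bits")
        return divmod(fd, 8)  # (byte index, bit index)

    def add(self, fd):
        byte_pos, bit_pos = self._locate(fd)
        self.bits[byte_pos] |= 1 << bit_pos

    def contains(self, fd):
        byte_pos, bit_pos = self._locate(fd)
        return bool(self.bits[byte_pos] & (1 << bit_pos))


if __name__ == "__main__":
    monitored = FdSetModel()
    monitored.add(1023)                # the largest number that still fits
    print(monitored.contains(1023), MAX_FDS)
```

Descriptor 1024 has no bit to occupy in this model, which is the same reason select cannot watch it.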