This is the third day of my participation in the August Text Challenge.

Epoll analysis

As an I/O model based on event notification, epoll is widely used for I/O multiplexing. The epoll API performs a task similar to poll: monitoring multiple file descriptors to see whether I/O is possible on any of them.

Epoll features:

  • Trigger modes: the epoll API can be used in either edge-triggered or level-triggered mode. The two modes behave quite differently, as explained in detail below.
  • Performance: epoll scales well to large numbers of watched file descriptors, which makes it a good choice for large-scale I/O multiplexing.

Basic management of epoll instances

The epoll_create, epoll_ctl, and epoll_wait system calls create an epoll instance, add, modify, or delete the file descriptors it monitors, and wait for I/O events.

  • epoll_create: creates an epoll instance and returns a file descriptor referring to it. epoll_create1 extends epoll_create: it drops the obsolete size argument and accepts a flags argument (for example, EPOLL_CLOEXEC).
  • epoll_ctl: registers file descriptors of interest with the epoll instance, and modifies or removes existing registrations. The set of file descriptors currently registered on an epoll instance is commonly called the epoll set.
  • epoll_wait: waits for I/O events, blocking the calling thread if no events are currently available.
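A minimal sketch of the three calls working together, assuming Linux and glibc's <sys/epoll.h>. The helper name basic_epoll_demo is hypothetical: it registers the read end of a pipe, makes it readable with a one-byte write, and returns whatever epoll_wait reports.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance, register the read end of a
 * pipe, make it readable, and wait. Returns epoll_wait's result, or -1. */
static int basic_epoll_demo(void)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    /* epoll_create1: like epoll_create, minus the obsolete size argument. */
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    int n = -1;
    if (epfd != -1) {
        /* Add the read end to the epoll set, interested in "readable" events. */
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipefd[0] };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0 &&
            write(pipefd[1], "x", 1) == 1) {
            struct epoll_event ready[8];
            /* Waits up to 1000 ms; returns the number of ready descriptors. */
            n = epoll_wait(epfd, ready, 8, 1000);
            if (n == 1 && ready[0].data.fd != pipefd[0])
                n = -1; /* should not happen: only one fd is registered */
        }
        close(epfd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

Called on its own, basic_epoll_demo() should return 1, since exactly one registered descriptor became readable.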

Edge-triggered and level-triggered modes

The epoll event distribution interface can behave either as edge-triggered (ET) or as level-triggered (LT). The difference between the two mechanisms can be seen in the following scenario:

  1. Open a pipe and register the file descriptor of its read end (rfd) with the epoll instance.
  2. A writer at the other end writes 2 kB of data to the pipe.
  3. epoll_wait returns, reporting rfd as ready for reading.
  4. The reader reads 1 kB of data from rfd.
  5. epoll_wait is called again.

If rfd was added to the epoll instance with the EPOLLET (edge-triggered) flag, the epoll_wait call in step 5 may hang even though data is still available in rfd's input buffer; meanwhile, the writer at the other end may be waiting for a response to the data it has already sent. This happens because edge-triggered mode delivers events only when changes occur on the monitored file descriptor, so in step 5 the caller may end up waiting for data that is already sitting in the input buffer. In the example above, the write in step 2 generates an event on rfd, and step 3 consumes it. Because the read in step 4 does not drain the whole buffer, the epoll_wait call in step 5 may block indefinitely.
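The five-step scenario can be replayed in code. This sketch assumes Linux; the helper step5_ready_count is hypothetical, and step 5 uses a 0 ms timeout so that the edge-triggered "hang" shows up as a return value of 0 instead of actually blocking.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper replaying steps 1-5. 'flags' is EPOLLET for
 * edge-triggered mode or 0 for level-triggered mode. Returns what the
 * step-5 epoll_wait reports (0 ms timeout), or -1 on setup failure. */
static int step5_ready_count(unsigned int flags)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | flags, .data.fd = p[0] };
    struct epoll_event out;
    char buf[2048];
    memset(buf, 'a', sizeof buf);
    int result = -1;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&  /* step 1 */
        write(p[1], buf, 2048) == 2048 &&                   /* step 2: 2 kB */
        epoll_wait(epfd, &out, 1, 1000) == 1 &&             /* step 3 */
        read(p[0], buf, 1024) == 1024)                      /* step 4: 1 kB */
        result = epoll_wait(epfd, &out, 1, 0);              /* step 5 */

    close(epfd);
    close(p[0]);
    close(p[1]);
    return result;
}
```

Under these assumptions, step5_ready_count(EPOLLET) should return 0 (no new edge, so a blocking wait would hang), while step5_ready_count(0) should return 1, because 1 kB is still in the pipe and level-triggered mode keeps reporting it.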

An application that uses the EPOLLET flag should use non-blocking file descriptors, so that a blocking read or write on one descriptor cannot starve a task that is handling multiple file descriptors. The recommended way to use epoll in edge-triggered mode is:

  • use non-blocking file descriptors; and
  • call epoll_wait to wait for an event only after read or write has returned EAGAIN.
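A sketch of that recommended read path, assuming a Linux pipe or socket descriptor; set_nonblocking and drain_until_eagain are hypothetical helper names. The descriptor is switched to O_NONBLOCK with fcntl, and the loop returns only once read reports EAGAIN (or EOF), which is the point at which going back to epoll_wait is safe.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode. */
static int set_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    return fl == -1 ? -1 : fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Hypothetical helper: read everything currently available on a
 * non-blocking fd, stopping only at EAGAIN/EWOULDBLOCK or EOF.
 * Returns the number of bytes consumed, or -1 on a real error. */
static long drain_until_eagain(int fd)
{
    char buf[512];
    long total = 0;
    for (;;) {
        ssize_t r = read(fd, buf, sizeof buf);
        if (r > 0) {
            total += r;        /* a real handler would process buf here */
        } else if (r == 0) {
            return total;      /* EOF: the writer closed its end */
        } else if (errno == EINTR) {
            continue;          /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;      /* drained: now wait for the next edge */
        } else {
            return -1;         /* genuine error */
        }
    }
}
```

With 2048 bytes waiting in a non-blocking pipe, a first call should return 2048 and an immediate second call 0, because the first read already hits EAGAIN.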

By contrast, in level-triggered mode (the default), epoll is simply a faster poll and can be used wherever poll is used, since it has the same semantics.

Even in edge-triggered mode, receiving multiple chunks of data can generate multiple events, so the caller can specify the EPOLLONESHOT flag to tell epoll to disable the associated file descriptor after it has reported one event through epoll_wait(2). When EPOLLONESHOT is set, the caller is responsible for re-arming the file descriptor with epoll_ctl(2) using EPOLL_CTL_MOD.
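The following sketch, assuming Linux, illustrates EPOLLONESHOT with a pipe; oneshot_demo is a hypothetical name. The second epoll_wait reports nothing even though the byte was never read, and the EPOLL_CTL_MOD call re-arms the descriptor.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper. Fills counts[0..2] with the results of three
 * 0 ms epoll_wait calls: before disabling, while disabled, after re-arming.
 * Returns 0 on success, -1 on setup failure. */
static int oneshot_demo(int counts[3])
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = p[0] };
    struct epoll_event out;
    int rc = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        counts[0] = epoll_wait(epfd, &out, 1, 0);  /* reported once */
        counts[1] = epoll_wait(epfd, &out, 1, 0);  /* data unread, but fd disabled */
        /* Re-arm: the full mask, including EPOLLONESHOT, is supplied again. */
        if (epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == 0) {
            counts[2] = epoll_wait(epfd, &out, 1, 0);  /* reported again */
            rc = 0;
        }
    }
    close(epfd);
    close(p[0]);
    close(p[1]);
    return rc;
}
```

The three counts should come back as 1, 0, 1.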

Epoll and memory usage

/proc/sys/fs/epoll/max_user_watches (since Linux 2.6.28) limits the total amount of kernel memory that epoll may consume. It sets the maximum total number of file descriptors that a single user can register across all of that user's epoll instances. Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel and roughly 160 bytes on a 64-bit kernel. Currently, the default value of max_user_watches is 1/25 (4%) of the available low memory divided by this per-registration cost in bytes.
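To make the formula concrete, here is a sketch that reads the current limit and computes a rough estimate of the default. Both helper names are hypothetical, and the estimate uses the approximate per-registration costs quoted above rather than the kernel's exact internal constant, so the real value on a given machine can differ slightly.

```c
#include <stdio.h>

/* Hypothetical helper: rough default estimate, following the formula above:
 * (low memory / 25) / cost of one registration. */
static unsigned long long estimate_max_user_watches(unsigned long long lowmem_bytes,
                                                    unsigned long long cost_per_watch)
{
    return lowmem_bytes / 25 / cost_per_watch;
}

/* Hypothetical helper: read the current limit; -1 if the file is missing
 * (for example, on kernels older than 2.6.28) or unreadable. */
static long long read_max_user_watches(void)
{
    FILE *f = fopen("/proc/sys/fs/epoll/max_user_watches", "r");
    if (!f)
        return -1;
    long long v = -1;
    if (fscanf(f, "%lld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}
```

For example, with 8 GiB of low memory on a 64-bit kernel, 8 GiB / 25 / 160 bytes comes to roughly 2.1 million watches.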

How to avoid common traps

  • Starvation (edge-triggered mode)

If there is a large amount of I/O space on one file descriptor, trying to drain it completely may keep other files from being processed, causing starvation. (This problem is not specific to epoll.) The solution is to maintain a ready list and mark each file descriptor as ready in its associated data structure. The application can then remember which files still need processing while cycling round-robin through all the ready files. This also makes it possible to ignore further events for file descriptors that are already marked ready.

  • If using an event cache

If you cache all the file descriptors returned by epoll_wait, make sure you have a way to mark their closure dynamically (that is, closure caused by processing an earlier event in the same batch). Suppose you receive 100 events from epoll_wait, and handling event #47 causes the file descriptor of event #13 to be closed. If you remove the structure and close the file descriptor for event #13, your event cache may still report that events are waiting for that file descriptor, causing confusion. This is a common cache-consistency problem.

One solution is, while processing event #47, to call epoll_ctl(EPOLL_CTL_DEL) to remove file descriptor 13, close it, and then mark its associated data structure as removed and link it onto a cleanup list. If another event for file descriptor 13 turns up later in the batch, you will find that it was previously removed, and there will be no confusion.
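A sketch of this deferred-cleanup approach, assuming each registration stores a pointer to a per-connection record in epoll_event.data.ptr. struct conn, retire_conn, process_batch and flush_dead are hypothetical names, and the peer field is an illustrative stand-in for "handling event #47 closes event #13" (think of the two sides of a proxied connection).

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection record, stored in epoll_event.data.ptr. */
struct conn {
    int fd;
    int removed;              /* set once the fd is deleted and closed */
    struct conn *peer;        /* hypothetical: handling this conn retires the peer */
    struct conn *next_dead;   /* link in the deferred cleanup list */
};

/* Delete and close mid-batch, but keep the record alive until the batch ends. */
static void retire_conn(int epfd, struct conn *c, struct conn **dead)
{
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, NULL);
    close(c->fd);
    c->fd = -1;
    c->removed = 1;
    c->next_dead = *dead;
    *dead = c;
}

/* Process one batch; stale events for retired records are skipped. */
static int process_batch(int epfd, struct epoll_event *evs, int n,
                         struct conn **dead, int *skipped)
{
    int handled = 0;
    *skipped = 0;
    for (int i = 0; i < n; i++) {
        struct conn *c = evs[i].data.ptr;
        if (c->removed) {       /* like finding event #13 after #47 closed it */
            ++*skipped;
            continue;
        }
        handled++;              /* real I/O handling would go here */
        if (c->peer && !c->peer->removed)
            retire_conn(epfd, c->peer, dead);
    }
    return handled;
}

/* After the batch nothing refers to retired records any more: free them. */
static void flush_dead(struct conn **dead)
{
    while (*dead) {
        struct conn *next = (*dead)->next_dead;
        free(*dead);
        *dead = next;
    }
}
```

Records are freed only in flush_dead, after the whole batch has been processed, so a later event in the same batch never dereferences freed memory or acts on a closed descriptor.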

Q&A

  • Q0: What key does epoll use to distinguish the file descriptors registered in an epoll set?

  • A0: The key is the combination of the file descriptor number and the open file description (also known as an “open file handle”, the kernel's internal representation of an open file).

  • Q1: What happens when the same file descriptor is registered twice?

  • A1: You will probably get EEXIST. However, it is possible to add a duplicate of the file descriptor (created with dup(2), dup2(2), or fcntl(2) F_DUPFD) to the same epoll instance. This can be a useful technique for filtering events, if the duplicate file descriptors are registered with different event masks.
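A sketch, assuming Linux, that checks both halves of A1; duplicate_registration_demo is a hypothetical name. Registering the same descriptor twice should fail with EEXIST, while a dup() of it has a different descriptor number and is accepted, here with a different (edge-triggered) mask.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper. Stores the errno of a second EPOLL_CTL_ADD of the
 * same fd in *second_add_errno (0 if it unexpectedly succeeded) and returns
 * the result of adding a dup() of that fd (0 on success), or -2 on setup failure. */
static int duplicate_registration_demo(int *second_add_errno)
{
    int p[2];
    if (pipe(p) == -1)
        return -2;
    int epfd = epoll_create1(0);
    int dupfd = dup(p[0]);                 /* same open file description, new number */
    int rc = -2;
    struct epoll_event in = { .events = EPOLLIN, .data.fd = p[0] };
    struct epoll_event et = { .events = EPOLLIN | EPOLLET, .data.fd = dupfd };

    if (epfd != -1 && dupfd != -1 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &in) == 0) {
        *second_add_errno = 0;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &in) == -1)
            *second_add_errno = errno;                    /* expected: EEXIST */
        rc = epoll_ctl(epfd, EPOLL_CTL_ADD, dupfd, &et);  /* different key: accepted */
    }
    if (dupfd != -1)
        close(dupfd);
    if (epfd != -1)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return rc;
}
```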

  • Q2: Can two epoll instances monitor the same file descriptor? If so, are events reported to both of them?

  • A2: Yes, events are reported to both. However, careful programming may be needed to do this correctly.
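A quick check of A2, assuming Linux; two_instances_demo is a hypothetical name. One write to the pipe should make the read end show up in both instances.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: registers one pipe read end with two independent
 * epoll instances and returns how many of them report it readable after a
 * single write (0-2), or -1 on setup failure. */
static int two_instances_demo(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int ep1 = epoll_create1(0);
    int ep2 = epoll_create1(0);
    int result = -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };

    if (ep1 != -1 && ep2 != -1 &&
        epoll_ctl(ep1, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        epoll_ctl(ep2, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        struct epoll_event out;
        result = 0;
        if (epoll_wait(ep1, &out, 1, 0) == 1)   /* first instance notified */
            result++;
        if (epoll_wait(ep2, &out, 1, 0) == 1)   /* second instance notified too */
            result++;
    }
    if (ep1 != -1)
        close(ep1);
    if (ep2 != -1)
        close(ep2);
    close(p[0]);
    close(p[1]);
    return result;
}
```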

  • Q3: Is an epoll file descriptor itself poll/epoll/selectable?

  • A3: Yes. If an epoll file descriptor has events waiting, it is reported as readable.

  • Q4: What happens if you try to add an epoll file descriptor to its own epoll set?

  • A4: epoll_ctl fails with EINVAL. However, you can add an epoll file descriptor to the epoll set of another epoll instance.
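A sketch covering Q3 and Q4 together, assuming Linux; nested_epoll_demo is a hypothetical name. It expects EINVAL when an epoll descriptor is added to itself, then nests the same descriptor inside a second instance and checks that the outer instance reports it readable once the inner one has a pending event.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if (a) adding an epoll fd to itself fails
 * with EINVAL and (b) the same fd, nested in another instance, is reported
 * readable once it has a pending event; returns -1 otherwise. */
static int nested_epoll_demo(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int inner = epoll_create1(0);
    int outer = epoll_create1(0);
    int ok = -1;

    if (inner != -1 && outer != -1) {
        struct epoll_event self = { .events = EPOLLIN, .data.fd = inner };
        struct epoll_event pin = { .events = EPOLLIN, .data.fd = p[0] };
        struct epoll_event out;

        /* Q4: an epoll instance cannot watch itself. */
        int self_rejected = epoll_ctl(inner, EPOLL_CTL_ADD, inner, &self) == -1 &&
                            errno == EINVAL;

        /* Q3: another instance can watch it like any pollable fd. */
        if (self_rejected &&
            epoll_ctl(inner, EPOLL_CTL_ADD, p[0], &pin) == 0 &&
            epoll_ctl(outer, EPOLL_CTL_ADD, inner, &self) == 0 &&
            epoll_wait(outer, &out, 1, 0) == 0 &&   /* inner has nothing pending */
            write(p[1], "x", 1) == 1 &&
            epoll_wait(outer, &out, 1, 0) == 1 &&   /* inner is now readable */
            out.data.fd == inner)
            ok = 0;
    }
    if (inner != -1)
        close(inner);
    if (outer != -1)
        close(outer);
    close(p[0]);
    close(p[1]);
    return ok;
}
```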

  • Q5: Can I send an epoll file descriptor to another process over a UNIX domain socket?

  • A5: Yes, but it makes no sense to do so, because the receiving process would not have copies of the file descriptors in the epoll set.

  • Q6: Does closing a file descriptor automatically remove it from all epoll sets?

  • A6: Yes, but be aware of the following point. A file descriptor is a reference to an open file description (see open(2)). Whenever a descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or fork(2), a new file descriptor referring to the same open file description is created. An open file description continues to exist until all file descriptors referring to it have been closed. A file descriptor is removed from an epoll set only after all the file descriptors referring to the underlying open file description have been closed (or earlier, if it is explicitly removed with epoll_ctl(2) EPOLL_CTL_DEL). This means that even after a file descriptor in an epoll set has been closed, events may still be reported for it if other file descriptors referring to the same underlying file description remain open.
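A sketch of this caveat, assuming Linux; close_with_dup_demo is a hypothetical name. After only the registered descriptor is closed, the event should still be reported (carrying the stale number stored in data.fd); once the duplicate is closed too, the registration disappears.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper. counts[0]: events seen after closing only the
 * registered fd; counts[1]: events seen after also closing its dup().
 * Returns 0 on success, -1 on setup failure. */
static int close_with_dup_demo(int counts[2])
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    struct epoll_event out;
    int dupfd = -1, rc = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0)
        dupfd = dup(p[0]);        /* second reference to the same description */
    close(p[0]);                  /* close the registered descriptor number */

    if (dupfd != -1 && write(p[1], "x", 1) == 1) {
        /* Still reported: the dup keeps the open file description alive.
         * out.data.fd holds the old, now-closed number. */
        counts[0] = epoll_wait(epfd, &out, 1, 0);
        close(dupfd);             /* last reference: the registration goes away */
        dupfd = -1;
        counts[1] = epoll_wait(epfd, &out, 1, 0);
        rc = 0;
    }
    if (dupfd != -1)
        close(dupfd);
    close(p[1]);
    close(epfd);
    return rc;
}
```

The counts should come back as 1 and 0. Because the registered number is already closed when the first event arrives, an EPOLL_CTL_DEL on it would fail at that point, which is why removing a descriptor from the epoll set before closing it is the safer order.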

  • Q7: If more than one event occurs between epoll_wait calls, are they combined or reported separately?

  • A7: They are combined.

  • Q8: Does an operation on a file descriptor affect events that have already been collected but not yet reported?

  • A8: You can do two operations on an existing file descriptor: removing it is meaningless in this case, while modifying it causes the available I/O to be re-read.

  • Q9: When using the EPOLLET flag (edge-triggered behavior), do I need to keep reading/writing the file descriptor until it returns EAGAIN?

  • A9: Receiving an event from epoll_wait(2) should suggest to you that the file descriptor is ready for the requested I/O operation. You must consider it ready until the next (non-blocking) read/write returns EAGAIN. When and how you use the file descriptor is entirely up to you.

For packet/token-oriented files (for example, datagram sockets or terminals in canonical mode), the only way to detect the end of the read/write I/O space is to keep reading/writing until EAGAIN is returned. For stream-oriented files (such as pipes, FIFOs, and stream sockets), you can also detect that the read/write I/O space is exhausted by checking how much data was read from or written to the target file descriptor. For example, if you call read(2) asking for a certain amount of data and read(2) returns fewer bytes, you can be sure that you have exhausted the read I/O space for that file descriptor. The same is true when writing with write(2). (Avoid this latter technique if you cannot guarantee that the monitored file descriptor always refers to a stream-oriented file.)
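A sketch of the short-read shortcut for stream-oriented descriptors, assuming Linux; drain_stream is a hypothetical helper that also counts its read calls. A full buffer means there may be more data, so it reads again; a short read means the stream is drained for now, which saves the extra read that would otherwise just return EAGAIN. The descriptor is assumed to be non-blocking, as recommended for edge-triggered use.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper for stream-oriented, non-blocking fds only (pipes,
 * FIFOs, stream sockets); not for datagram sockets or canonical-mode
 * terminals. Returns bytes read (or -1 on error) and stores the number of
 * read() calls in *calls. */
static long drain_stream(int fd, int *calls)
{
    char buf[4096];
    long total = 0;
    *calls = 0;
    for (;;) {
        ssize_t r = read(fd, buf, sizeof buf);
        ++*calls;
        if (r == -1) {
            if (errno == EINTR)
                continue;      /* interrupted: retry */
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? total : -1;
        }
        total += r;            /* a real handler would consume buf here */
        if ((size_t)r < sizeof buf)
            return total;      /* short read (or EOF when r == 0): drained for now */
    }
}
```

With 2048 bytes waiting in a pipe, drain_stream should need a single read call; with 4196 bytes it needs two (one full 4096-byte read, then a short one), where an EAGAIN-based loop would need one more in each case.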