Preface

While recently reading *Linux Kernel Design and Implementation*, I decided to organize what I've learned.

So what are we talking about? Today: the Android I/O call chain.

I/O is a genuinely complex process that spans many layers, starting in software and ending at the hardware. Here is a picture of the whole path:

The purpose of this article is to briefly discuss the IO process.

1. Application layer

As app developers, we are usually where I/O starts: a user likes a novel and downloads it locally, or loves a photo and shares it with a friend.

These are all common I/O scenarios, but do you know what kinds of I/O there are?

1. I/O classification

When using IO, we usually have a number of options. Common ones are:

  1. Buffered and unbuffered IO
  2. Direct and indirect IO
  3. Blocking and non-blocking IO
  4. Synchronous and asynchronous I/O

You've probably heard of buffered I/O and blocking I/O; those are usually what we mean in day-to-day development.

1.1 Buffered and direct

The first two categories are both about caching.

Buffered I/O refers to the standard library's buffer.

The Linux standard library implements many basic operating-system services, such as input/output and string handling. Android's standard library is Bionic, which acts as the bridge between the application layer and the kernel; it can also be accessed through the NDK.

I/O that goes through the standard library is called buffered I/O. When reading a file, for example, we can ask for a whole line at a time because the library buffers the underlying reads; Android does something similar internally.
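To make the distinction concrete, here is a minimal C sketch contrasting the two; novel.txt is a hypothetical file, and on Android the library underneath fopen/fgets would be Bionic:

```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void) {
    // Buffered I/O: the C library reads a large chunk into its own
    // user-space buffer; fgets() is then served from that buffer
    // without a system call per line.
    FILE *fp = fopen("novel.txt", "r");
    char line[256];
    while (fp && fgets(line, sizeof(line), fp))
        ;  // each call usually hits the library buffer, not the kernel
    if (fp) fclose(fp);

    // Unbuffered I/O: every read() is a system call that goes
    // straight into the kernel (which may still use the page cache).
    int fd = open("novel.txt", O_RDONLY);
    char buf[256];
    while (fd >= 0 && read(fd, buf, sizeof(buf)) > 0)
        ;  // one syscall per call; no user-space buffering
    if (fd >= 0) close(fd);
    return 0;
}
```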

Direct vs. indirect refers to the kernel's page cache.

As with Binder, data is copied from user space into kernel space; with indirect I/O the kernel adds an extra layer of page caching on top. With direct I/O, the application calls the file system directly, bypassing the page cache.

The library buffer and the kernel's page cache act like L1 and L2 caches for I/O. Why so much caching? Because operating the disk is inherently expensive: without caching, frequent I/O would cost both resources and time.
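Here is a hedged sketch of direct I/O on Linux using O_DIRECT. data.bin is a hypothetical file, and the 512-byte alignment is only an example; the required alignment varies by device and file system:

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main(void) {
    // O_DIRECT asks the kernel to bypass the page cache; buffers must
    // then be aligned (512 bytes used here as an illustrative value).
    int fd = open("data.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) return 1;

    void *buf;
    if (posix_memalign(&buf, 512, 4096) != 0) return 1;  // aligned buffer

    ssize_t n = read(fd, buf, 4096);  // transferred from the device, no page cache
    (void)n;

    free(buf);
    close(fd);
    return 0;
}
```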

1.2 Blocking and asynchronous

Synchronous versus asynchronous is something we all roughly understand.

Blocking I/O means the thread blocks while performing a read or write: both preparing the data and copying it to the user process happen while the thread waits:

Java's NIO is non-blocking I/O. When a read or write is initiated, the thread does not block; afterwards, the caller learns the result of the I/O by polling or by receiving a notification:

Note that even with non-blocking I/O, only the data-preparation phase is asynchronous for reads; copying the data from the kernel to the user process is still synchronous. So non-blocking I/O is not truly asynchronous I/O.
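At the Linux level, the non-blocking behavior that Java NIO wraps looks roughly like this sketch, using O_NONBLOCK plus poll() on a pipe purely for illustration:

```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <poll.h>

int main(void) {
    int fds[2];
    if (pipe(fds) != 0) return 1;

    // Mark the read end non-blocking: read() now returns immediately.
    fcntl(fds[0], F_SETFL, O_NONBLOCK);

    char buf[64];
    ssize_t n = read(fds[0], buf, sizeof(buf));
    if (n < 0 && errno == EAGAIN)
        printf("no data yet, but the thread was not blocked\n");

    write(fds[1], "hello", 5);  // now there is data to read

    // Wait for readiness instead of busy-polling; this readiness
    // notification is the kind of thing Java NIO's Selector builds on.
    struct pollfd p = { .fd = fds[0], .events = POLLIN };
    if (poll(&p, 1, 1000) > 0 && (p.revents & POLLIN)) {
        n = read(fds[0], buf, sizeof(buf));
        printf("read %zd bytes\n", n);
    }
    close(fds[0]); close(fds[1]);
    return 0;
}
```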

True asynchronous IO looks like this:

Both preparing the data and copying it from the kernel to the user process are asynchronous; by the time we are notified, the data is already sitting in the application process, ready to use.
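As a sketch of that model, here is POSIX asynchronous I/O (aio_read) in C. novel.txt is again hypothetical, on glibc this needs linking with -lrt, and modern kernels also offer newer interfaces such as io_uring:

```c
#include <aio.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("novel.txt", O_RDONLY);
    if (fd < 0) return 1;

    char buf[4096];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf);
    cb.aio_offset = 0;

    aio_read(&cb);  // returns immediately; the read proceeds in the background

    // Do other useful work here while the read is in flight...

    while (aio_error(&cb) == EINPROGRESS)
        ;  // real code would use a signal or callback instead of spinning
    ssize_t n = aio_return(&cb);  // data is already in buf at this point
    printf("read %zd bytes asynchronously\n", n);
    close(fd);
    return 0;
}
```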

2. IO process

At the application layer we don't write much low-level I/O code: we mostly read and write files with BufferedInputStream and BufferedOutputStream, and NIO is even less common.

Let's look at the flow of a read call under blocking I/O.

2. System call (sysCall)

When the application layer makes its call, does execution drop straight into the kernel?

Apart from direct I/O, mostly not! User space is separated from the kernel by the system-call layer (sysCall), which:

  1. Provides user space with abstract interfaces for accessing hardware, such as requesting system resources and operating devices
  2. Ensures system security and stability: the kernel arbitrates user processes' access, preventing them from doing things that could harm the system

The kernel is complex, after all; abstracting a common interface keeps a user-space process from overstepping its bounds and touching something it shouldn't.

To reach the kernel, the application process raises a software interrupt that tells the kernel: I want to call the read interface among the system calls.

For a read, there is a corresponding sys_read method among the system calls; once notified, the kernel executes it, which in turn invokes the read method of the virtual file system.
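To see how thin the user-space side is, here is a minimal sketch comparing the libc read() wrapper with invoking the same system call by number (novel.txt is hypothetical):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/syscall.h>

int main(void) {
    int fd = open("novel.txt", O_RDONLY);
    if (fd < 0) return 1;

    char buf[128];

    // The libc wrapper: what we normally call.
    ssize_t a = read(fd, buf, sizeof(buf));

    // The same kernel entry point, invoked by number; the wrapper
    // above is little more than this plus errno handling.
    lseek(fd, 0, SEEK_SET);
    ssize_t b = syscall(SYS_read, fd, buf, sizeof(buf));

    printf("wrapper read %zd bytes, raw syscall read %zd bytes\n", a, b);
    close(fd);
    return 0;
}
```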

3. Virtual file system

There are many file systems out there. On my phone, for example, the user-data partition uses F2FS while the system partition uses ext4. An application just wants to call one read method, regardless of what the underlying file system happens to be!

The virtual file system solves this by masking the concrete file systems: it defines a set of data structures and standard interfaces that every file system supports, so application-layer programmers only need to understand this unified interface.

The virtual file system is commonly abbreviated VFS (Virtual File System).

1. The structure of the VFS

VFS takes an object-oriented approach and is made up of the following objects (structures in C):

  • Superblock: represents a specific mounted file system
  • Index node (inode): represents a specific file
  • Directory entry (dentry): represents a single component of a path
  • File: represents a file as opened by a process

These objects constitute the basic virtual file system.

The VFS also needs to know how to operate on them, so each object has a corresponding operations object:

  • super_operations: methods the kernel can invoke on a superblock
  • inode_operations: methods the kernel can invoke on an index node
  • dentry_operations: methods the kernel can invoke on a directory entry
  • file_operations: methods the kernel can invoke on a file opened by a process

The most familiar is the file object, the one we actually manipulate from a process. A file's file_operations contains well-known methods such as read, write, open, and fsync (flush to disk).
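As a sketch of how this dispatch works, a kernel-side implementation registers its own handlers in a file_operations table; my_read here is a hypothetical handler invented for illustration:

```c
#include <linux/fs.h>
#include <linux/module.h>

// Hypothetical read handler: the VFS calls this when a process
// read()s a file backed by this implementation.
static ssize_t my_read(struct file *filp, char __user *buf,
                       size_t count, loff_t *pos)
{
    return 0;  // pretend end-of-file
}

// The operations table: the VFS dispatches through these function
// pointers, so every file system or driver supplies its own behavior.
static const struct file_operations my_fops = {
    .owner = THIS_MODULE,
    .read  = my_read,
};
```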

If you look closely at the figure, I specifically marked that superblocks and index nodes exist both in memory and on disk, while directory entries and files exist only in memory.

My understanding: on disk, index nodes are enough to record file information, so there is no need to persist directory entries for hierarchy. In memory, to save space, only the inodes and directory entries of files actually in use are loaded, and the superblock is loaded only when the file system is mounted.

Structure diagram of directory entries, index nodes, files, and superblocks:

There are a few other things to note about the structure diagram above:

  1. For /home/pic/a.jpg, the root /, home, pic, and a.jpg are each a directory entry
  2. Each directory entry holds a pointer to an index node
  3. The index node holds all the information the kernel needs to operate on the file, such as its location on disk
  4. A file opened in a process holds a pointer to a directory entry

2. Caches in the VFS

As you can see from the first figure in this article, VFS has a directory entry cache, an index node cache, and a page cache. We know what directory entries and index nodes mean.

The page cache consists of physical pages in RAM whose contents correspond to blocks on storage (what phone specs call "ROM"). Mainstream Android RAM today reaches roughly 8.5 GB/s while storage tops out around 6400 MB/s, so RAM is much faster than storage; exploiting that gap is the whole point of the page cache.

When a read is initiated, the kernel first checks whether the desired data is already in the page cache. If it is, the data is read directly from memory; this is a cache hit. If not, the kernel reads from storage and adds the data to the page cache as it goes. Note that the page cache may hold an entire file or only a few of its pages.
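One way to observe the page cache from user space is mincore(), which reports which pages of a mapped file are currently resident; a minimal sketch (novel.txt hypothetical):

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void) {
    int fd = open("novel.txt", O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0 || st.st_size == 0) return 1;

    // Map the file, then ask the kernel which pages are already
    // resident in the page cache (reading those now is a cache hit).
    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) return 1;

    long page = sysconf(_SC_PAGESIZE);
    size_t pages = (st.st_size + page - 1) / page;
    unsigned char vec[pages];
    if (mincore(map, st.st_size, vec) == 0) {
        size_t cached = 0;
        for (size_t i = 0; i < pages; i++)
            cached += vec[i] & 1;
        printf("%zu of %zu pages already in the page cache\n", cached, pages);
    }
    munmap(map, st.st_size);
    close(fd);
    return 0;
}
```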

3. IO process

After the system call, the read enters the VFS.

It reaches the file object in the VFS and calls read through the file object's file_operations, passing in how much data to read. The read path also follows the file object's directory entry to the index node; after all, only the index node knows where the file lives.

With the index node, the kernel can uniquely identify the file; it searches the page cache for the data it needs and, on a hit, returns it directly.

If not, it proceeds to the next step.

4. File system

The VFS defines the unified interface; the implementation is up to each file system: how superblock data is organized, how directory and index structures are designed, and how data is allocated and reclaimed.

To put it bluntly, file systems are used to manage persistent data on disk. For Android, the most common ones are ext4 and F2FS.

1. File system structure

Since a file system is a concrete implementation of the VFS, it too has directory entries, inodes, and superblocks, and the diagrams above describe concrete file systems just as well.

Take the earlier ext2 architecture as an example:

An ext2 partition consists of many block groups, each structured like the directory-entry and index-node chart above. A block is the file system's smallest addressing unit (see the block I/O layer section below), and the block size can typically be configured somewhere between 2 KB and 64 KB.
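If you are curious what block size your own file system reports, statvfs() exposes it; a tiny sketch querying the root file system ("/" is just an example path):

```c
#include <stdio.h>
#include <sys/statvfs.h>

int main(void) {
    // Ask the mounted file system for its block size.
    struct statvfs vfs;
    if (statvfs("/", &vfs) == 0)
        printf("file system block size: %lu bytes\n", vfs.f_bsize);
    return 0;
}
```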

2. Differences between file systems

While most file systems share the notions of superblocks, inodes, and data blocks, each implementation differs substantially, which leads to different strengths. Take ext4 and F2FS:

  • ext4 is better at sequential reads of large files and uses less space
  • F2FS is faster at random I/O

In other words, they allocate free space and manage existing data differently; different data structures and algorithms produce different trade-offs.

3. IO process

The I/O flow here mirrors the VFS flow; after all, the file system is the concrete implementation of the VFS.

5. Block I/O layer

There are two basic device types under Linux:

  • Block devices: hardware devices that provide random access to fixed-size chunks of data. Hard disks and flash memory (described below) are common block devices
  • Character devices: devices that can only be accessed in order, as a stream of characters, such as keyboards and serial ports

The difference between the two is random access. Take the keyboard, a character device: when we type "hello world", the system cannot receive "eholl wrodl" first, or the output would be garbage. Flash memory, in contrast, might be showing a picture one moment, querying a database the next, then reading a novel, touching blocks that sit far apart, so the blocks being read are certainly not contiguous on disk.

Managing block devices is complex for the kernel, so there is a dedicated subsystem for them: the block I/O layer.

1. Block device structure

Common data management units in block devices:

  • Sector: the smallest addressing unit of the device
  • Block: the smallest addressing unit of the file system, an integer multiple of the sector size
  • Segment: a small contiguous region of memory holding several adjacent blocks

Since Linux traditionally targets hard drives, I do wonder whether these management units map one-to-one onto the flash units described below.

2. IO process

When I/O operations arrive, the kernel describes each one with a bio structure, the basic container for block I/O; a bio is built from segments, each of which is a small contiguous buffer in memory.

The kernel then stores the I/O requests in a request_queue.

If requests were sent to the block device in the order they were generated, performance would be unacceptable. So before requests are queued, the kernel merges and sorts them by disk address, as the toy sketch below illustrates.
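The following is a toy illustration of the merge-and-sort idea only, nothing like the kernel's real I/O scheduler code:

```c
#include <stdio.h>
#include <stdlib.h>

// Toy model of a block request: start sector + length in sectors.
struct req { long sector; long count; };

static int by_sector(const void *a, const void *b) {
    const struct req *x = a, *y = b;
    return (x->sector > y->sector) - (x->sector < y->sector);
}

int main(void) {
    // Requests in arrival order: scattered across the disk.
    struct req q[] = { {900, 8}, {100, 8}, {108, 8}, {500, 8} };
    size_t n = sizeof(q) / sizeof(q[0]);

    // Sort by disk address so the device sweeps in one direction...
    qsort(q, n, sizeof(q[0]), by_sector);

    // ...then merge requests that touch adjacent sectors.
    size_t m = 0;
    for (size_t i = 1; i < n; i++) {
        if (q[m].sector + q[m].count == q[i].sector)
            q[m].count += q[i].count;  // merged into one larger request
        else
            q[++m] = q[i];
    }
    for (size_t i = 0; i <= m; i++)
        printf("dispatch: sector %ld, %ld sectors\n", q[i].sector, q[i].count);
    return 0;
}
```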

6. Disk

NAND flash is the most common persistent storage in mobile devices, and UFS is the leading NAND-based standard, characterized by higher speed, smaller size, and lower power consumption.

Today's flagship Android devices ship with UFS 3.1, which is just a tiny chip:

Flash memory is non-volatile storage: it retains data even when power is lost. Its storage units, from small to large:

  • Cell: the smallest unit of flash storage. By the number of bits stored per cell, flash is classified as SLC (1 bit/cell), MLC (2 bits/cell), TLC (3 bits/cell), or QLC (4 bits/cell)
  • Page: made up of a large number of cells; each page is usually 16 KB and is the smallest unit flash can read or write
  • Block: each block consists of hundreds to thousands of pages
  • Plane: each plane consists of hundreds to thousands of blocks
  • Die (logical unit): each die consists of one or more planes and is the smallest unit in flash that can execute commands or report status

Each cell is a floating-gate MOSFET, a kind of NMOS with an extra gate layer, roughly as follows:

For SLC (storing 1 bit):

  • To write a 1, a voltage is applied at the P substrate, pulling electrons out of the floating gate
  • To write a 0, a voltage is applied at the top control gate, pulling electrons back into the floating gate

And that gives us the smallest units of data storage: 0 and 1!

Conclusion

Here is the whole process, briefly, in one diagram:

Since I'm not deeply familiar with the kernel, this article inevitably has mistakes; corrections are welcome in the comments. And if you found the article useful, a like is the best encouragement!
