• In UNIX Everything is a File
  • Ph7spot.com
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: pmwangyang

The UNIX system crystallized a set of unifying concepts and ideas that shaped its architecture, interfaces, culture, and evolution. The most important of these is probably the mantra “everything is a file”, widely regarded as one of the defining features of UNIX.

This design principle provides a unified paradigm for accessing a wide range of input/output resources: files, directories, hard disks, CD-ROMs, modems, keyboards, printers, monitors, terminals, and even some inter-process and network communications. The trick is to provide a common abstraction for all of these resources, which the fathers of UNIX called a “file.” Because every “file” is exposed through the same API[1], you can use the same set of commands to read, write, and otherwise manipulate disks, keyboards, documents, or network devices.

This basic concept has two sides:

  • In UNIX, everything is a byte stream
  • In UNIX, the file system is used as a generic namespace

In UNIX, everything is a byte stream

What does a file consist of in UNIX? A file is nothing more than a sequence of bytes that can be read and written. Once you have a reference to a file (called a “file descriptor” [2]), I/O in UNIX goes through the same set of operations and the same API, regardless of the device type or underlying hardware.

Historically, UNIX was the first operating system to abstract all I/O under a unified concept and a small set of primitives. At the time, most operating systems provided a different API for each device or device class. Some early microcomputer operating systems even required multiple commands to copy files, because each command was specific to one floppy disk size!

For most programmers and users, UNIX exposes the following as files:

  • Files on disk
  • Directories
  • Links
  • Mass storage devices (e.g., hard disks, CD-ROMs, tapes, USB devices)
  • Inter-process communication (e.g., pipes, shared memory, UNIX sockets)
  • Network connections
  • Interactive terminals
  • Almost all other devices (e.g., printers, graphics cards)

On any byte stream you can call:

  • read: read bytes
  • write: write bytes
  • lseek: move the read/write position
  • close: close the stream
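
The four operations listed above can be sketched with Python's thin wrappers around the system calls. This is only an illustration: the helper name demo_primitives and the file name demo.txt are made up for the example and do not come from the article.

```python
import os
import tempfile

def demo_primitives(path):
    """Exercise write, lseek, read and close on a single file descriptor."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"hello, unix")   # write: put bytes at the current offset
        os.lseek(fd, 7, os.SEEK_SET)   # lseek: move the offset to byte 7
        tail = os.read(fd, 4)          # read: up to 4 bytes from the offset
    finally:
        os.close(fd)                   # close: release the descriptor
    return tail

# Hypothetical file in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    print(demo_primitives(os.path.join(tmp, "demo.txt")))  # b'unix'
```

The descriptor returned by os.open is a plain integer, which is all the other calls need.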

This uniform API is fundamental and very powerful for UNIX programs: you can write a program that processes a file without caring whether the file is stored on a local disk, stored on a remote network drive, streamed over the Internet, typed interactively by a user, or generated in memory by another program. This significantly reduces application complexity and flattens the learning curve for developers. It also makes programs easy to compose: you simply connect them through two special files, standard input and standard output.
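
As a rough sketch of that point, the function below counts bytes from a file descriptor without knowing what is behind it. The two wrappers feed it a regular file and a pipe. All names here (count_bytes, count_from_file, count_from_pipe) are hypothetical, and the pipe demo assumes the data is small enough to fit in the pipe buffer.

```python
import os
import tempfile

def count_bytes(fd, chunk_size=4096):
    """Count the bytes readable from fd until end of stream."""
    total = 0
    while True:
        chunk = os.read(fd, chunk_size)
        if not chunk:              # an empty read signals end of stream
            return total
        total += len(chunk)

def count_from_file(data):
    """Store data in a temporary regular file, then count it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.lseek(fd, 0, os.SEEK_SET)   # rewind before reading
        return count_bytes(fd)
    finally:
        os.close(fd)
        os.unlink(path)

def count_from_pipe(data):
    """Push data through a pipe, then count it on the read end.

    Assumes data fits in the pipe buffer, so the write does not block.
    """
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, data)
    finally:
        os.close(write_fd)             # lets the reader see end of stream
    try:
        return count_bytes(read_fd)
    finally:
        os.close(read_fd)

print(count_from_file(b"abc" * 10), count_from_pipe(b"abc" * 10))  # 30 30
```

count_bytes contains no branch for "file" versus "pipe"; the read call behaves the same way on both.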

Finally, note that while all files provide a consistent API, some kinds of devices do not support every operation. For example, you cannot lseek on a mouse device, or write to a CD-ROM device (assuming the disc is read-only).
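
A mouse device is awkward to demonstrate portably, so the sketch below substitutes a pipe, which is another stream that rejects lseek. The helper name is_seekable is an assumption made for the example.

```python
import errno
import os
import tempfile

def is_seekable(fd):
    """Return True if lseek works on fd, False if the stream rejects it."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)    # ask for the current offset only
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:   # "Illegal seek": pipe, FIFO or socket
            return False
        raise

read_fd, write_fd = os.pipe()
try:
    print("pipe seekable:", is_seekable(read_fd))
finally:
    os.close(read_fd)
    os.close(write_fd)

with tempfile.TemporaryFile() as f:
    print("regular file seekable:", is_seekable(f.fileno()))
```

The API is the same in both cases; only the result of the call differs, and the error code tells the program why.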

In UNIX, the file system is used as a generic namespace

In UNIX, files are not just byte streams with a consistent API; they can also be referenced in a uniform way: the file system serves as a generic namespace.

Global namespace and mount mechanism

UNIX file system paths provide a consistent, global scheme for naming resources, independent of their physical location. For example, you can use /usr/local to refer to a local directory, /home/joe/memo.pdf to a file, /mnt/cdrom to a CD-ROM, /usr to a directory on a network drive, /dev/sda1 to a disk partition, a socket file to a UNIX domain socket, /dev/tty0 to a terminal, and even /dev/mouse to a mouse. This generic namespace usually looks like a hierarchy of files and directories, but as the examples show, that is only a convenient abstraction: a file path can refer to almost anything, whether a file system, a device, a network share, or a communication channel.
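
To see what kind of resource a path actually names, a program can inspect the mode bits returned by stat. The sketch below is illustrative only: the describe helper is hypothetical, and the sample paths are assumed to exist on a typical UNIX-like system (missing ones are skipped).

```python
import os
import stat

def describe(path):
    """Name the kind of resource a path refers to, using os.stat."""
    mode = os.stat(path).st_mode        # follows symbolic links
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    return "other"

# Sample paths are assumptions; any that are missing are skipped.
for p in ("/", "/dev/null", "/dev/tty"):
    if os.path.exists(p):
        print(p, "->", describe(p))
```

The same stat call answers the question for every path, whatever sits behind it.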

The namespace is hierarchical, and all resources can be reached from the root directory (/). You can access multiple file systems through the same namespace: you simply “attach” a device or file system (such as an external hard drive) at a given location in the namespace (such as /backups). In UNIX parlance, this operation is called mounting a file system, and the location where the file system is attached is called a mount point. You can then access every resource on the mounted file system by prefixing its path with the mount point, as if it were part of the common namespace (for example, the file /backups/myproject-oct07.zip).
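
Because mounted file systems blend into one tree, a path alone does not reveal which file system holds it. One way to find out, sketched here under the assumption of a POSIX system, is to walk up the path until os.path.ismount reports a mount point. The function name find_mount_point is made up for the example.

```python
import os

def find_mount_point(path):
    """Walk up from path until reaching a directory that is a mount point."""
    current = os.path.realpath(path)    # resolve symbolic links first
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current                      # the root directory at the latest

print(find_mount_point(os.getcwd()))
```

The loop always terminates on a POSIX system, since the root directory is itself a mount point.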

The mount mechanism just described is crucial for building a unified, coherent namespace in which heterogeneous resources can be overlaid. Compare this with the file system namespace in Microsoft operating systems: MS-DOS and Windows treat devices as files but do not place file systems in a common namespace. Instead, the namespace is partitioned, and each physical storage location is treated as a distinct entity [3]: C:\ is the first hard disk, E:\ is the CD-ROM drive, and so on.

Pseudo-file systems

In its early days, UNIX greatly improved the integration of input/output resources by providing a uniform API and mounting devices into a unified file system namespace. This approach was so successful that there has since been a trend to expose more and more resources and system services through the file system’s global namespace. Plan 9 pioneered this approach, and newer UNIX systems have adopted it.

The result is a number of pseudo-file systems that look like regular file systems but give access to resources not directly associated with a traditional file system. For example, you can use pseudo-file systems to query and control processes, access kernel internals, or establish TCP connections. These pseudo-file systems have file system semantics, can present hierarchical information, and provide uniform access to most objects. Pseudo-file systems, sometimes called virtual file systems, have no physical device or backing store and live only in memory.

Examples of pseudo-file systems:

  • procfs (/proc) : The proc file system contains a hierarchy of special files that can be used to query or control running processes, or to peek into kernel internals through standard (mostly text-based) file entries.
  • devfs (/dev or /devices) : devfs presents all devices on the system as a dynamic file system namespace. devfs also manages this namespace and interfaces directly with kernel device drivers to provide intelligent device management, including device entry registration and de-registration.
  • tmpfs (/tmp) : A memory-based temporary file system. tmpfs is designed for speed and efficiency, with a dynamic file system size, fallback to swap space, and more.
  • portalfs (/p) : With the BSD portal file system, you can attach a server process to the file system’s common namespace. This provides access to network services through the file system. For example, an application can open the file /p/tcp/ph7spot.com/smtp to connect to the SMTP server on ph7spot.com. Portal file systems are remarkable because they provide socket semantics within the file system, so sockets can be used by standard UNIX tools (e.g., cat, grep, awk), even from a shell!
  • ctfs (/system/contract) : The file-based interface to the Solaris contract subsystem. A Solaris contract defines how a process or process group should behave in response to various events and failure scenarios, for example, restarting when the process stops. Solaris contracts provide very advanced capabilities for software management and monitoring in environments such as clustered failover software, batch queuing systems, and grid computing engines.

The examples above give you a good idea of the scope of system resources that can be managed through file system semantics.
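
As a small illustration of procfs, the sketch below reads a process status entry with ordinary file I/O. It assumes the Linux layout, where /proc/<pid>/status holds "Key: value" lines. Other systems organize /proc differently or lack it, in which case the hypothetical helper read_proc_status returns None.

```python
import os

def read_proc_status(pid="self"):
    """Parse /proc/<pid>/status into a dict, or return None without procfs.

    Assumes the Linux format of one "Key: value" pair per line.
    """
    path = f"/proc/{pid}/status"
    if not os.path.exists(path):
        return None
    fields = {}
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep:                     # skip lines without a colon
                fields[key.strip()] = value.strip()
    return fields

status = read_proc_status()
if status is not None:
    print(status.get("Name"), status.get("Pid"))
```

No special system call is involved: the kernel generates the text when the file is read, and the program uses the same open and read path it would use for a document on disk.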

Conclusion

In modern UNIX operating systems, all devices and most inter-process communication are exposed and managed through the file system as files or pseudo-files. The “everything is a file” vision and design principle has been key to the success and longevity of UNIX. It provides a powerful, simple abstraction on which systems, tools, and communities can be built. More importantly, it solves the integration problem neatly by providing a powerful composition mechanism for linking tools and applications.

Despite the success of the “everything is a file” metaphor, some remain skeptical of its universality. One consequence of treating every file as a byte stream is the lack of standard support for metadata: to process a file properly, every application must work out the file’s type, structure, and semantics on its own. And to preserve metadata, every tool that handles the stream must pass it through unchanged (XMP information in a photo, for example). So while the “big bag of bytes” form of UNIX files is extremely effective for composing programs with text interfaces, it severely limits the composition of multimedia and binary applications.
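
To make the metadata problem concrete: since a byte stream carries no type information, a program typically has to guess the format from the leading bytes. The sketch below checks a few well-known signatures. The names MAGIC, sniff_type, and sniff_bytes are made up for the example, and the table is far from complete.

```python
import os

# A few well-known leading signatures; real detectors know many more.
MAGIC = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),
)

def sniff_type(fd):
    """Guess a format from the first bytes read from fd; None if unknown."""
    head = os.read(fd, 16)
    for signature, name in MAGIC:
        if head.startswith(signature):
            return name
    return None

def sniff_bytes(data):
    """Send data through a pipe and sniff it on the other end."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, data)        # assumes data fits the pipe buffer
    finally:
        os.close(write_fd)
    try:
        return sniff_type(read_fd)
    finally:
        os.close(read_fd)

print(sniff_bytes(b"%PDF-1.7\n"), sniff_bytes(b"plain text"))  # pdf None
```

Every tool that cares about the format has to repeat this kind of guesswork, which is the cost described above.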

Despite these limitations, many acknowledge the power of the metaphor and its effect on operating system integration. Since UNIX was first released, researchers have continued to push this central idea further. The Plan 9 operating system, for example, takes a fully integrated approach to system resources: its goal is to represent not just devices and communication channels but all system interfaces through the file system. The Plan 9 designers noted that in UNIX, network devices cannot be treated as first-class files: they are accessed through sockets, which have their own open semantics and live in a different namespace (the host and port of an Internet socket). Plan 9 demonstrated that all local and remote devices can be unified in one global namespace. This idea was eventually implemented in UNIX in the form of Portalfs.

Other innovative concepts in Plan 9 also build on the “everything is a file” principle. For example, Plan 9 adds another layer of abstraction on top of the unified namespace: the file system namespace can be customized per user and per process, and can even be adjusted dynamically [4]. Finally, Plan 9 proves that the “everything is a file” metaphor can be implemented on a larger scale. This basic concept continues to be developed in modern UNIX operating systems [5].

References

  1. For more background on the UNIX operating system, read the Wikipedia entry.
  2. File descriptors are simply abstract keys to access a file. File descriptors are usually integer values associated with an open file.
  3. See Wikipedia for more details.
  4. This concept was eventually implemented in UNIX in the form of UnionFS.
  5. For some examples of current activity in this domain, see Unionfs, Portalfs, and objfs.
