The I/O and storage internals of Vim, the god of Linux editors


How it started

Original writing takes effort. For more in-depth content, follow the public account Qiya Cloud Storage.

I accidentally opened a 10 GB file with Vim, changed one line, and saved with :w. The save took so long that I could have brewed several cups of tea in the meantime. That got me wondering: what does Vim actually do when it opens and saves a file?

Vim – The god of editors

Dubbed the god of editors, Vim is known for its extreme extensibility and rich functionality. Vi/Vim ships as the standard editor on almost every Linux distribution. Its learning curve is steep, so expect some grinding at the beginning.

Vim is a terminal editor. In today’s world of visual editors, why is Vim so important?

Because in some situations it is the only option. On a production server, for example, all you have is a terminal, so you have no choice but to use a terminal editor like Vi/Vim.

Vim has a long history. GitHub has a document that summarizes it (Vim History), and the source code is open on GitHub as well (Code Repository).

I won't cover how to use Vim today; a quick search online turns up plenty of articles on that. Instead, I will analyze this tool from the perspective of its storage I/O.

Here are a few quick questions to consider, so if you’re interested, you can read on:

How does Vim edit files? Does it use some kind of dark magic?
Vim opens a large 10 GB file. Why is it so slow, and what is going on inside?
Vim modifies a large 10 GB file, and saving with :w feels slow too. Why is that?
Vim seems to generate extra files: a ~ file and a .swp file. What are they for?

This article will focus on IO and analyze vim’s principles from the perspective of storage.

IO principle of Vim

For the record, the system and Vim versions are as follows:

Operating system version: Ubuntu 16.04.6 LTS

Vim version: VIM - Vi IMproved 8.2 (2019 Dec 12, compiled Jul 25 2021 08:44:54)

Test file name: test.txt

Vim is just a binary program. You can download the source from GitHub, then compile and debug it yourself; following along that way works even better.

Editing a file with Vim is generally simple. All you need is vim followed by the file name:


vim test.txt


This opens the file so you can edit it. Normally, once the command is entered, the contents of the file show up in the terminal almost immediately.

What happens during this process? What does vim test.txt actually mean?

Essentially, it runs a program called vim with argv[1] set to test.txt. It is no different from the hello world program you once wrote, except that Vim interacts with the user through the terminal.

So this is nothing more than ordinary process initialization: it starts at main and ends up in main_loop (the loop that waits for input in the background).

Vim process initialization

Vim has an entry file called main.c, where the main function is defined. The first thing it does is OS-specific initialization (mch stands for machine):


mch_early_init();


Next it assigns parameters and initializes global variables:


/*
 * Various initialisations shared with tests.
 */
common_init(&params);

For example, a parameter like test.txt must be assigned to a global variable because it will be used frequently.

In addition, the table that maps commands to their handler functions is statically defined:


static struct cmdname
{
    char_u      *cmd_name;      // name of the command
    ex_func_T   cmd_func;       // function for this command
    long_u      cmd_argt;       // flags declared above
    cmd_addr_T  cmd_addr_type;  // flag for address type
} cmdnames[] = {
    EXCMD(CMD_write, "write", ex_write,
        EX_RANGE|EX_WHOLEFOLD|EX_BANG|EX_FILE1|EX_ARGOPT|EX_DFLALL|EX_TRLBAR|EX_CMDWIN|EX_LOCK_OK,
        ADDR_LINES),
    // ...
};

Vim commands such as :w, :write, and :saveas map to a C callback function defined up front: ex_write. The ex_write function is the entry point for writing data. Likewise, :quit maps to ex_quit, the callback used for exiting.
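
To make the table-plus-callback idea concrete, here is a minimal sketch of the same pattern. It is not Vim's code: the names demo_cmds, demo_write, demo_quit, run_command and last_cmd are all made up for illustration, the handler signature is simplified (Vim's real handlers take an exarg_T pointer), and the prefix matching is a naive first-match rule rather than Vim's actual abbreviation logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified handler type (not Vim's ex_func_T). */
typedef int (*cmd_func_t)(const char *arg);

/* Records which handler ran last, purely for demonstration. */
static int last_cmd = 0;

static int demo_write(const char *arg) { (void)arg; last_cmd = 1; return 0; }
static int demo_quit(const char *arg)  { (void)arg; last_cmd = 2; return 0; }

/* Statically defined table: command name -> callback. */
static const struct demo_cmd {
    const char *name;
    cmd_func_t  func;
} demo_cmds[] = {
    { "write",  demo_write },
    { "saveas", demo_write },
    { "quit",   demo_quit  },
};

/*
 * Find the first command whose name starts with `input` and run its
 * callback. Returns the callback's result, or -1 if nothing matches.
 */
int run_command(const char *input, const char *arg)
{
    size_t len;
    size_t i;

    assert(input != NULL);
    len = strlen(input);
    if (len == 0)
        return -1;
    for (i = 0; i < sizeof demo_cmds / sizeof demo_cmds[0]; i++) {
        if (strncmp(demo_cmds[i].name, input, len) == 0)
            return demo_cmds[i].func(arg);
    }
    return -1;
}
```

Calling run_command("w", "test.txt") lands in demo_write, just as typing :w lands in ex_write. Adding a command means adding one row to the table.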

In other words, the commands Vim supports, such as :w, are fixed during initialization. Interaction works like this: the user types a string, and the Vim process reads it from the terminal, finds the matching callback function, and executes it. Next, Vim initializes some variables such as the home directory and the current directory:


init_homedir();     // find real value of $HOME

// Save the command-line arguments
set_argv_var(paramp->argv, paramp->argc);

Next, Vim configures what is displayed in the terminal window. This part mostly involves the terminal library:


// Initialize some terminal configuration
termcapinit(params.term);   // set terminal name and get terminal

// Initialize the cursor position
screen_start();             // don't know where cursor is now

// Get some terminal information
ui_get_shellsize();         // inits Rows and Columns

Then Vim loads configuration files such as .vimrc, which is what makes your Vim unique.


// Source startup scripts.
source_startup_scripts(&params);

Vim plugins are loaded as well (source_in_path), and packages are loaded with load_start_packages.

Here’s the first interaction, waiting for the user to hit Enter:


wait_return(TRUE);


This is where the familiar “Press ENTER or type command to continue” prompt comes from. Once the user confirms, Vim goes on to open the file and display it in the terminal.

How is the file opened? How do the characters get onto the terminal screen?

This all happens in the create_windows function. The name makes sense, because this is where the terminal windows are created during initialization.


/*
 * Create the requested number of windows and edit buffers in them.
 * Also does recovery if "recoverymode" set.
 */
create_windows(&params);

Two things are involved here:

Read the data from disk into memory;
Render the characters to the terminal;

How is data read off the disk? That is the I/O part. We won't go into how rendering to the terminal works; it is implemented with a terminal programming library such as termlib or ncurses, which you can explore if you are interested.

This function calls our first core function, open_buffer, which does two things:

Create the memfile: sets up an abstraction layer over memory plus the .swp file, through which data is read and written;
Read the file: reads the original file and decodes it (for display on screen);

Function call stack:


-> readfile
-> open_buffer
-> create_windows
-> vim_main2
-> main

The real work happens in the readfile function, which is 2533 lines long. Just so you know…

At some point readfile creates the swap file (which can be used to recover data if something goes wrong) by calling ml_open_file. Once created, the file takes up 4 KB and contains some metadata used for recovery.


Further down, readfile calls read_eintr to read the file contents:


long
read_eintr(int fd, void *buf, size_t bufsize)
{
    long ret;

    for (;;) {
        ret = vim_read(fd, buf, bufsize);
        // Retry only when the read was interrupted by a signal
        if (ret >= 0 || errno != EINTR)
            break;
    }
    return ret;
}

This is a low-level function that wraps the read system call. It answers a key question: how does Vim do its storage I/O?

Key point: underneath, it simply calls read, write, and lseek. Plain system calls, nothing more.

readfile reads the raw bytes and then converts the characters according to the configured encoding, so the text is not garbled. Data is read through a fixed-size buffer each time, for example 8192 bytes.

Key point: readfile reads the whole file from beginning to end. This is why Vim is so slow to open a very large file.
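
As a rough illustration of why opening costs one full pass over the file, the sketch below reads a file front to back through a fixed 8192-byte buffer, retrying on EINTR the same way read_eintr does. It is a stand-alone approximation, not Vim's readfile: read_retry, read_whole_file and DEMO_CHUNK_SIZE are names I made up, and the decoding and line-storing work Vim does per chunk is reduced to counting bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define DEMO_CHUNK_SIZE 8192   /* assumed buffer size, as in the example above */

/* read(2) wrapper that retries when the call is interrupted by a signal. */
long read_retry(int fd, void *buf, size_t bufsize)
{
    long ret;

    for (;;) {
        ret = (long)read(fd, buf, bufsize);
        if (ret >= 0 || errno != EINTR)
            break;
    }
    return ret;
}

/*
 * Read `path` from start to end in fixed-size chunks.
 * Returns the total number of bytes read, or -1 on error.
 */
long long read_whole_file(const char *path)
{
    char buf[DEMO_CHUNK_SIZE];
    long long total = 0;
    long n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while ((n = read_retry(fd, buf, sizeof buf)) > 0)
        total += n;     /* a real editor would decode and store lines here */
    close(fd);
    return n < 0 ? -1 : total;
}
```

The loop only ends when read returns 0 at end of file, so the cost grows linearly with file size: a 10 GB file means over a million 8 KB reads before the first screen can be drawn.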

As an aside, memline is an abstraction layered on top of the file. Vim applies edits to memory buffers, and syncs the memfile to the swap file according to its policy, both to prevent losing unsaved data and to save memory.

mf_write writes in-memory data to the swap file. This is the data structure inside .test.txt.swp:

Header of block 0

Vim version;
Path of the file being edited;
Character encoding;

One important point about the implementation: the swap file is stored in units of blocks, and the blocks are organized as a tree. There are three types of blocks:

Block 0: a 4 KB header that stores metadata such as the file path, encoding, and timestamp;
Pointer block: an inner node of the tree;
Data block: a leaf node of the tree, storing user data;

What happens behind :w

Now that process initialization is done, let's look at the calls that :w triggers. When the user enters the :w command, it triggers the ex_write callback (registered during initialization).

The whole flow lives in ex_write, so let's see what this function does.

Setting the code aside for a moment, what the user wants from typing :w is simple: save the changes.

So the first question: where are the user's changes?

They are in the memline abstraction. Until :w is executed, the user's changes are not applied to the original file. They may be in memory or in the swap file, stored as blocks.

So :w basically flushes the memline data to the user's file. How?

The key steps are as follows (using test.txt as an example):

Create a backup file (test.txt~) and copy the original file into it;
Truncate test.txt to 0 bytes, which empties the original file;
Copy the data from memline (memory + .test.txt.swp) and write it to test.txt;
Delete the backup file test.txt~;
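
The four steps above can be sketched as a small stand-alone function. This is only an approximation under simplifying assumptions: the new contents arrive as one in-memory buffer instead of being fetched line by line from memline, the backup name is always the path plus ~, and the helper names (write_all, copy_file, save_with_backup, DEMO_COPY_BUF) are mine, not Vim's.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define DEMO_COPY_BUF 8192

/* Keep calling write(2) until the whole buffer is out; retry on EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}

/* Full copy of src into dst, one fixed-size chunk at a time. */
static int copy_file(const char *src, const char *dst)
{
    char buf[DEMO_COPY_BUF];
    ssize_t n;
    int rc = 0;
    int in, out;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        if (write_all(out, buf, (size_t)n) < 0) {
            rc = -1;
            break;
        }
    }
    close(in);
    if (close(out) < 0)
        rc = -1;
    return rc;
}

/* Save `data` over `path` using the backup/truncate/rewrite/remove steps. */
int save_with_backup(const char *path, const char *data, size_t len)
{
    char backup[4096];
    int fd, n, rc;

    assert(path != NULL && data != NULL);
    n = snprintf(backup, sizeof backup, "%s~", path);
    if (n < 0 || (size_t)n >= sizeof backup)
        return -1;
    if (copy_file(path, backup) < 0)    /* 1. back up the original */
        return -1;
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    rc = ftruncate(fd, 0);              /* 2. truncate the original to 0 */
    if (rc == 0)
        rc = write_all(fd, data, len);  /* 3. write the new contents */
    if (close(fd) < 0)
        rc = -1;
    if (rc == 0)
        rc = unlink(backup);            /* 4. remove the backup on success */
    return rc;
}
```

Note that if step 2 or 3 fails, the sketch deliberately leaves the ~ file in place, which is the whole point of making it: the old contents can still be recovered from it.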

That is all :w does. Now let's look at the code.

The callback that fires is ex_write, and the core function is buf_write, which is 1987 lines long.

In this function, mch_open is used to create the backup file, whose name is the original name followed by ~, such as test.txt~:


bfd = mch_open((char *)backup


The data is then copied from test.txt to test.txt~, 8 KB at a time, to make the backup.

Key point: if test.txt is a very large file, this step is slow.

The backup loop is as follows:


// buf_write
while ((write_info.bw_len = read_eintr(fd, copybuf, WRITEBUFSIZE)) > 0)
{
    if (buf_write_bytes(&write_info) == FAIL)
    {
        // On failure, stop copying
        break;
    }
    // Otherwise continue until the end of the file
}

buf_write_bytes ends up in the write_eintr function, which writes a buffer to disk:


long write_eintr(int fd, void *buf, size_t bufsize) {
    long ret = 0;
    long wlen;

    // Keep going until the whole buffer is written; a write may be partial
    while (ret < (long)bufsize) {
        // vim_write wraps the write system call
        wlen = vim_write(fd, (char *)buf + ret, bufsize - ret);
        if (wlen < 0) {
            if (errno != EINTR)
                break;
        } else
            ret += wlen;
    }
    return ret;
}

Once the backup copy is complete, Vim is ready to modify the original file.

Think about it: why make a backup file first?

To leave a way back. If something goes wrong, the file can be restored. This is the real backup file.

The first step in modifying the original file is to ftruncate it to 0. Then the data is copied from memline (memory + swap file) and written back to the original file.

Key point: this is yet another full copy of the file, so it can be very slow for very large files.


for (lnum = start; lnum <= end; ++lnum)
{
    // Fetch the line from memline, which returns a memory buffer
    // (memline wraps memory and the swap file)
    ptr = ml_get_buf(buf, lnum, FALSE) - 1;

    // Write the memory buffer to the original file
    if (buf_write_bytes(&write_info) == FAIL)
    {
        end = 0;    // write error: break loop
        break;
    }
    // ...
}

Key point: Vim does not patch the original file in place with calls like pwrite/pread. Instead, it empties the entire file and rewrites it with a full copy. Something new learned.

Finally, the backup file is deleted.


// Remove the backup unless 'backup' option is set or there was a
// conversion error.
mch_remove(backup);

That is the whole data-writing process. Not as simple as you thought, is it?

To recap: what happens when you modify test.txt and run :w to save the data?

In the interaction loop, :w triggers the ex_write callback, and the write is carried out through do_write -> buf_write;
Concretely, Vim first makes a backup file test.txt~ (a full copy);
Next, the original file test.txt is truncated to 0, and the data is copied from memline (the latest in-memory data plus .test.txt.swp) and written to test.txt (another full copy);


Data organization structure

I went into a lot of detail above, so let's step back and look at it from the perspective of data organization. On top of the original file, Vim builds two layers of abstraction through which the user's edits go: memline and memfile. The corresponding source files are memline.c and memfile.c.

First, what is memline?

memline corresponds to the lines of a text file, and it is built on top of memfile.

So what is a memfile?

It is an implementation of a virtual memory space: Vim maps the entire text file into memory and manages it in its own way. The unit here is the block, and blocks are organized in a tree. A block is of variable length and consists of pages; a page has a fixed length of 4 KB.

This is a typical virtual memory scheme. An edit in the editor becomes a change in the memfile, that is, a change to a block. The blocks form a linear space: each block has a block number that determines its position in the swap file. Vim uses a policy to swap blocks out of memory and write them to the swap file to save memory. This is where the swap file gets its name.
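
Here is a minimal sketch of the "block number determines the position in the file" idea, using nothing but lseek, write and read. It assumes every block is exactly one 4 KB page and ignores partial writes, caching and dirty tracking, all of which a real memfile has to deal with. The names open_swap_file, swap_out_block, swap_in_block and DEMO_PAGE_SIZE are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_PAGE_SIZE 4096   /* page size quoted in the text */

/* Open (or create and empty) a demo swap file for reading and writing. */
int open_swap_file(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
}

/* Swap out: write one page at offset block_nr * page size. */
int swap_out_block(int fd, unsigned block_nr, const char *page)
{
    off_t offset = (off_t)block_nr * DEMO_PAGE_SIZE;

    assert(page != NULL);
    if (lseek(fd, offset, SEEK_SET) != offset)
        return -1;
    /* Sketch only: a short write is treated as failure. */
    return write(fd, page, DEMO_PAGE_SIZE) == DEMO_PAGE_SIZE ? 0 : -1;
}

/* Swap in: read the page back from the same offset. */
int swap_in_block(int fd, unsigned block_nr, char *page)
{
    off_t offset = (off_t)block_nr * DEMO_PAGE_SIZE;

    assert(page != NULL);
    if (lseek(fd, offset, SEEK_SET) != offset)
        return -1;
    return read(fd, page, DEMO_PAGE_SIZE) == DEMO_PAGE_SIZE ? 0 : -1;
}
```

With this layout, swapping out block 2 of a fresh file leaves the file 3 pages long, and any block can be fetched again later with a single seek, which is what makes the swap file usable as backing store for memory.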

There are three types of blocks:

Block 0: The root of the tree, file metadata;
Pointer block: a branch of the tree that points to the next block;
Data block: Leaf node of a tree that stores user data;

Swap file organization:

Block 0 is a special block. Its structure takes up 1024 bytes in memory, but writes to the file are aligned to one page, so on disk it occupies 4096 bytes. The diagram below:

There are two other types of block:

Pointer block: an intermediate branch node that points to other blocks;
Data block: a leaf node;


#define DATA_ID (('d' << 8) + 'a')  // data block id
#define PTR_ID  (('p' << 8) + 't')  // pointer block id

This ID works as a magic number, which makes blocks easy to identify in a swap file. For example, in the following file, the first 4 KB stores block 0 and the second 4 KB stores a pointer block.

The third and fourth 4 KB each store a data block, which holds the data of the original file.
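
To show how such a magic number can be used, the sketch below classifies a block by its ID. The two macros are the ones quoted above; everything else (block_kind, block_kind_from_bytes) is a hypothetical helper. It also assumes the ID sits in the first two bytes of the block as a 16-bit value in the machine's native byte order, which is my reading of the layout, so treat it as an assumption.

```c
#include <assert.h>
#include <string.h>

#define DATA_ID (('d' << 8) + 'a')  /* data block id */
#define PTR_ID  (('p' << 8) + 't')  /* pointer block id */

/* Map a block ID to a human-readable kind. */
const char *block_kind(unsigned id)
{
    if (id == (unsigned)DATA_ID)
        return "data";
    if (id == (unsigned)PTR_ID)
        return "pointer";
    return "unknown";
}

/*
 * Classify a raw block by its first two bytes.
 * Assumption: the ID is stored as a native-endian 16-bit value.
 */
const char *block_kind_from_bytes(const unsigned char *block)
{
    unsigned short id;

    assert(block != NULL);
    memcpy(&id, block, sizeof id);
    return block_kind(id);
}
```

Numerically, DATA_ID is 0x6461 and PTR_ID is 0x7074. On a little-endian machine the low byte is stored first, so under the assumption above a data block would begin with the bytes "ad" and a pointer block with "tp" when viewed in a hex dump.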

When a user modifies a line, the change goes through memline into the block that contains that line, and the block is periodically flushed to the swap file.

Vim's special files: ~ and .swp

Assume that the original file name is test.txt.

The test.txt~ file

Many people have probably never seen test.txt~, because it disappears so quickly. It is created before the original file is modified and deleted once the modification is done. It exists only for the duration of buf_write and serves as a safety backup.

test.txt~ has exactly the same content as the original test.txt. It has no special format; it is all user data.

Try opening a 10 GB file with Vim, changing a line, and saving with :w. You should be able to spot this file easily, because the backup and the write-back take a long time.

The .test.txt.swp file

The .swp file lives for the entire lifetime of the process, and its handle stays open the whole time. Many people think .test.txt.swp is a backup file, but it is not: it is the swap file for the virtual memory space. test.txt~ is the real backup file.

The swap file is part of the memfile. The first 4 KB is header metadata, and what follows is data lines wrapped in 4 KB units, so it does not map one-to-one onto the user's data.

memfile = memory + swap file, and together they hold the latest data.

Answering the questions

How does Vim's storage I/O work?

Nothing special: it reads and writes data with plain system calls like read and write.

What are the two extra files Vim creates?

test.txt~: the real backup file, created before the original file is modified and removed once the modification succeeds;

.test.txt.swp: the swap file, made up of blocks. It may contain the user's unsaved changes, which are written over the original file when something like :w is called;
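
For reference, the two file names can be derived from the original path as in the sketch below. It only covers the default naming seen in this article (a trailing ~ for the backup, a leading dot plus .swp for the swap file in the same directory). Vim's options for changing the backup extension or directories, and the fallback names it picks when a swap file already exists, are deliberately left out. The function names backup_name and swap_name are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "test.txt" -> "test.txt~". Returns 0 on success, -1 if out is too small. */
int backup_name(const char *path, char *out, size_t outsize)
{
    int n;

    assert(path != NULL && out != NULL);
    n = snprintf(out, outsize, "%s~", path);
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}

/*
 * "dir/test.txt" -> "dir/.test.txt.swp".
 * A name that already starts with a dot gets no second dot.
 * Returns 0 on success, -1 if out is too small.
 */
int swap_name(const char *path, char *out, size_t outsize)
{
    const char *slash;
    const char *base;
    const char *dot;
    int dirlen, n;

    assert(path != NULL && out != NULL);
    slash = strrchr(path, '/');
    dirlen = slash ? (int)(slash - path) + 1 : 0;   /* keep the trailing '/' */
    base = path + dirlen;
    dot = (base[0] == '.') ? "" : ".";
    n = snprintf(out, outsize, "%.*s%s%s.swp", dirlen, path, dot, base);
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```

So for /tmp/test.txt the sketch yields /tmp/test.txt~ and /tmp/.test.txt.swp, which matches the two files you can watch appear next to a large file during a slow :w.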

Why is Vim slow when editing very large files?

In general, the slowness is noticeable in two places:

When Vim opens the file;
When saving with :w after changing a single line;

Start with the first scenario: opening a 10 GB file with Vim. How does it feel?

To me it feels like this: after entering the command, you can go make a cup of tea, and by the time the tea has cooled a bit the interface is about ready. Why is that?

During process initialization, before the window is set up, the readfile call in create_windows -> open_buffer reads the file once in its entirety and decodes the characters for display on the screen.

Key point: during initialization, readfile reads the entire file. With a 10 GB file, you can imagine how slow that is. A quick calculation: at a disk bandwidth of 100 MB/s, it takes about 102 seconds.

Now the second scenario: you have had your tea, changed one word, and saved with :w. After entering the command, you can go make another cup of tea. Why is that?

Copying a 10 GB backup file test.txt~ takes 102 seconds;
Copying the memfile (.test.txt.swp) back to test.txt is another 10 GB of data and another 102 seconds;
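
The arithmetic behind these figures is simple enough to write down. The sketch below is a back-of-the-envelope estimate only: it assumes throughput is a flat 100 MB/s, counts one pass for opening and two passes for saving as described above, and ignores page cache effects, decoding cost and everything else. copy_seconds is a made-up helper name.

```c
#include <assert.h>

/*
 * Seconds needed to move `bytes` of data `passes` times at a constant
 * throughput of `bytes_per_sec`.
 */
double copy_seconds(double bytes, double bytes_per_sec, int passes)
{
    assert(bytes_per_sec > 0.0 && passes >= 0);
    return passes * bytes / bytes_per_sec;
}
```

With 10 GB taken as 10 x 1024 MB, copy_seconds(10.0 * 1024 * 1024 * 1024, 100.0 * 1024 * 1024, 1) gives 102.4 seconds for opening, and the same call with 2 passes gives 204.8 seconds for :w, which is where the two 102-second steps come from.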

Does disk usage balloon when Vim edits a large file?

Yes. When Vim edits a 10 GB test.txt, there is a moment when at least 30 GB of disk space is needed:

The original file test.txt: 10 GB
The backup file test.txt~: 10 GB
The swap file .test.txt.swp: more than 10 GB

Conclusion

Vim does not edit files with dark magic, just plain system calls such as read and write;
Vim is slow to open very large files because it reads the whole file once, and slow to save because it copies the whole file twice (once for the backup, once when the memfile overwrites the original file);
memfile is Vim's abstraction of a virtual storage space for a file under edit (physically made up of memory blocks and the swap file), and its storage unit is the block. On :w, data is read from the memfile and written to the original file;
memline is another layer on top of memfile that abstracts the user's file into “lines”;
The .test.txt.swp file stays open the whole time, and the memfile periodically swaps data out to it, which also helps with crash recovery;
test.txt~ is the real backup file, created before :w overwrites the original file and removed after the overwrite succeeds;
Vim basically works on whole files, not parts of them, so it is not suited to editing very large files at all. Then again, who uses Vim to edit a 10 GB file? Vim is a text editor;
A 2533-line readfile function and a 1987-line buf_write function… I'm not trying to put you off, but I for one don't want to read them again…

Afterword

I was curious about Vim, so I went through the source code and picked up some I/O knowledge, though I did not expect to be schooled by functions thousands of lines long. If you followed along, reply with a 1; if not, well… Did you learn something?