COW is not a cow. It is short for copy-on-write, a technique for copying that does not actually copy right away.
In general, to copy means to create two identical and fully independent copies.
However, a full copy is often unnecessary. Instead of duplicating the content, the new "copy" can simply reference the original, and only when part of the content is written does that part actually get copied. Reading costs nothing extra; only a write triggers a real copy of the affected portion.
This is called copy-on-write, or COW for short.
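To make the idea concrete, here is a minimal sketch in plain JavaScript (a toy illustration, not how any particular operating system implements it): a "copy" just shares the original data, and the real copy happens only on the first write.
// A toy copy-on-write container: copies share data until one of them writes.
class CowArray {
  constructor(data, shared = false) {
    this.data = data;       // the underlying array, possibly shared with another copy
    this.shared = shared;   // true if this.data may be referenced elsewhere
  }
  // "Copying" only creates a new reference to the same data.
  copy() {
    this.shared = true;
    return new CowArray(this.data, true);
  }
  get(i) {
    return this.data[i];    // reads never copy anything
  }
  set(i, value) {
    if (this.shared) {
      this.data = this.data.slice(); // first write: actually copy the data
      this.shared = false;
    }
    this.data[i] = value;
  }
}

const a = new CowArray([1, 2, 3]);
const b = a.copy();   // cheap: no data copied yet
b.set(0, 100);        // now b gets its own copy; a is unaffected
console.log(a.get(0), b.get(0)); // 1 100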
The principle is simple, but it is widely used in operating system memory management and file systems, and Node.js uses it to be “lazy” as well.
In this article we will explore the application of copy-on-write in Node.js process creation and file copying:
Copying files
The most intuitive way to copy a file is to write an identical copy of its contents to a new location, but this has two problems:
- It duplicates identical content. If the same file is copied hundreds of times, the same data is written to disk hundreds of times. What a waste of disk space.
- What if the power goes out halfway through the write? How can content that has already been overwritten be restored?
What to do? This is where operating system designers came up with COW.
Implementing file copying with COW solves both problems:
- Copying only adds a reference to the existing content. A data block is physically copied only the first time it is modified, so unmodified content never wastes extra disk space.
- When writing, the change is first made on a free disk block and only then linked into the target location, so even if the power fails midway, the original content is still intact and can be recovered.
The Node.js fs.copyFile API can use copy-on-write.
By default, copyFile writes the source content to the target file, overwriting any existing content:
const fsPromises = require('fs').promises;

(async function() {
  try {
    // Default behavior: copy source.txt to destination.txt, overwriting it if it exists
    await fsPromises.copyFile('source.txt', 'destination.txt');
  } catch (e) {
    console.log(e.message);
  }
})();
But you can specify the copy mode with a third argument:
const fs = require('fs');
const fsPromises = fs.promises;
const { COPYFILE_EXCL, COPYFILE_FICLONE, COPYFILE_FICLONE_FORCE } = fs.constants;

(async function() {
  try {
    // COPYFILE_FICLONE: create a copy-on-write clone if the filesystem supports it
    await fsPromises.copyFile('source.txt', 'destination.txt', COPYFILE_FICLONE);
  } catch (e) {
    console.log(e.message);
  }
})();
The following three flags are supported:
- COPYFILE_EXCL: an error is thrown if the target file already exists (by default the target is simply overwritten)
- COPYFILE_FICLONE: copy in copy-on-write mode; if the operating system does not support it, fall back to a regular copy (the default is a regular copy)
- COPYFILE_FICLONE_FORCE: copy in copy-on-write mode; if the operating system does not support copy-on-write, an error is thrown
The three constants have the values 1, 2, and 4, so they can be combined with bitwise OR and passed together:
// Copy-on-write if supported, and fail if destination.txt already exists
const flags = COPYFILE_FICLONE | COPYFILE_EXCL;
fsPromises.copyFile('source.txt', 'destination.txt', flags);
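If you want a guaranteed copy-on-write clone but still need to handle filesystems that cannot do it, one possible pattern (a sketch; the helper name and file names are just for illustration) is to try COPYFILE_FICLONE_FORCE first and fall back to a regular copy when it fails:
const fs = require('fs');
const fsPromises = fs.promises;
const { COPYFILE_FICLONE_FORCE } = fs.constants;

// Hypothetical helper: prefer a copy-on-write clone, fall back to a normal copy.
async function cloneOrCopy(src, dest) {
  try {
    // Throws if the underlying filesystem cannot do copy-on-write
    await fsPromises.copyFile(src, dest, COPYFILE_FICLONE_FORCE);
    console.log('copied with copy-on-write');
  } catch (e) {
    // Fall back to a regular, full copy of the data
    await fsPromises.copyFile(src, dest);
    console.log('copied normally:', e.message);
  }
}

cloneOrCopy('source.txt', 'destination.txt').catch((e) => console.error(e));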
Node.js exposes the operating system's copy-on-write support, which can improve performance in some scenarios. Passing COPYFILE_FICLONE is recommended over the default mode.
Process creation
Fork is a common way to create processes, and it is implemented using copy-on-write.
As we know, a process's memory is divided into a code segment, a data segment, and a stack segment:
- Code segment: holds the code to be executed
- Data segment: stores global data
- Stack segment: holds the execution state
If a new process were created from an existing one by copying, all three segments would be duplicated. But if their contents are identical, that duplication is a waste of memory.
So fork does not actually copy the memory. It creates a new process that references the parent process's memory, and a portion of memory is actually copied only when its data is modified.
That's why it's called a fork: the child is not built from scratch; only the parts that diverge are split in two, while the rest stays shared with the parent.
But what if the code to be executed is different? That's what exec is for: it replaces the code, data, and stack segments with new ones and runs the new program.
We can also use the fork and exec APIs in Node.js:
fork:
const cluster = require('cluster');

if (cluster.isMaster) {
  console.log('I am master');
  // Each fork() creates a worker process from the current (master) process
  cluster.fork();
  cluster.fork();
} else if (cluster.isWorker) {
  console.log(`I am worker #${cluster.worker.id}`);
}
exec:
const { exec } = require('child_process');

// exec runs a command in a shell and buffers its output
exec('my.bat', (err, stdout, stderr) => {
  if (err) {
    console.error(err);
    return;
  }
  console.log(stdout);
});
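Besides cluster.fork, the child_process module also has a fork method that starts a new Node.js process running a given script and sets up a message channel between parent and child. A small sketch (the script name and messages are just for illustration):
const { fork } = require('child_process');

// Start a new Node.js process running child.js and talk to it over IPC
const child = fork('child.js');
child.on('message', (msg) => console.log('from child:', msg));
child.send('ping');

// child.js
process.on('message', (msg) => {
  process.send(`got: ${msg}`);
});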
Fork is the foundation of Linux process creation, which shows how important copy-on-write technology is.
Conclusion
The operating system uses copy-on-write both for file copying and for memory copying during process creation: data is actually copied only when it is modified.
Node.js supports flags on fs.copyFile; you can pass COPYFILE_FICLONE to copy files in copy-on-write mode.
Forking a process is also an application of copy-on-write: the code, data, and stack segments are not copied to the new process up front, but only when they are modified.
In addition, copy-on-write has many applications in immutable data structure implementations and in distributed read/write separation.
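For example, many immutable update patterns in JavaScript rely on the same idea: unchanged parts are shared by reference, and only the changed path is copied. A rough illustration, not tied to any particular library:
// Update one nested field without mutating the original object.
const state = {
  user: { name: 'Alice', age: 30 },
  settings: { theme: 'dark' },
};

// Only the objects along the changed path are copied;
// state.settings is reused by reference, copy-on-write style.
const next = {
  ...state,
  user: { ...state.user, age: 31 },
};

console.log(state.user.age);                   // 30 (original untouched)
console.log(next.settings === state.settings); // true (shared, not copied)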
COW makes Node.js “lazy”, and that laziness is exactly what makes it perform better.