Background

Recently, as the business grew quickly, the operations team asked for sitemap optimization for SEO. The back end handed me a TXT file with tens of millions of records and asked me to turn each ID into a URL entry, then deliver multiple XML files conforming to the Sitemap standard to operations.

The TXT file format is as follows:

123
456
789
xxx
yyy

Initial approach

My first idea was to use the readFile and appendFileSync APIs from the fs module: read the whole file at once, split the content into lines, and concatenate each line into the sitemap with template strings. The core code looks like this:

const fs = require("fs");

fs.readFile("./validBrandItems.txt"."utf-8".(err, data) = > {
  if (err) {
    console.log('err => ', err);
  } else {
    // todo...}});Copy the code

The approach itself is fine, but the TXT file is simply too large: Node.js threw an error as soon as I tried to read it, because the file content exceeds the maximum string length Node allows.

caught err Error: Cannot create a string longer than 0x1fffffe8 characters
    at Object.slice (buffer.js:608:37)
    at Buffer.toString (buffer.js:805:14)
    at FSReqCallback.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:58:23) {
  code: 'ERR_STRING_TOO_LONG'
}

The solution

After some searching on Baidu and Google, I found that this situation calls for reading the file with a Stream. Simply put, instead of loading the entire file into memory at once, a stream reads it piece by piece, which is the same technique streaming media sites use to transfer data.
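To illustrate the idea, here is a minimal sketch (not part of the original code) showing how a readable stream delivers the file in chunks instead of as one giant string:

const fs = require("fs");

// Read the file chunk by chunk instead of all at once.
const rs = fs.createReadStream("./validBrandItems.txt", { encoding: "utf-8" });

rs.on("data", (chunk) => {
  // Each chunk is only a small slice of the file (roughly 64 KB by default).
  console.log("received", chunk.length, "characters");
});

rs.on("end", () => {
  console.log("done");
});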

The fs.createReadStream API creates a readable stream, and readline.createInterface reads it line by line. I still use the appendFileSync API to write the output files.

const fs = require("fs");
const readline = require("readline");

const source = "./validBrandItems.txt"; // Read the target
const rs = fs.createReadStream(source);

const rl = readline.createInterface({
  input: rs,
  crlfDelay: Infinity}); rl.on("line".(line) = > {
  if(! line)return;
  
  // todo...
}
Copy the code
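To close the loop, here is a minimal sketch of what the todo in the line handler might become, combined with the appendFileSync writes mentioned above. The URL pattern, the output file naming, and the choice to rotate files every 50,000 URLs (the upper limit for a single sitemap file) are assumptions added for illustration:

const fs = require("fs");
const readline = require("readline");

const rl = readline.createInterface({
  input: fs.createReadStream("./validBrandItems.txt"),
  crlfDelay: Infinity,
});

const URLS_PER_FILE = 50000; // A single sitemap may hold at most 50,000 URLs
let count = 0;
let fileIndex = 1;

const header =
  '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n';
const footer = "</urlset>\n";

rl.on("line", (line) => {
  if (!line) return;

  const id = line.trim();

  // Start a new sitemap file when the current one is empty or full.
  if (count % URLS_PER_FILE === 0) {
    if (count > 0) fs.appendFileSync(`./sitemap-${fileIndex}.xml`, footer);
    fileIndex = Math.floor(count / URLS_PER_FILE) + 1;
    fs.appendFileSync(`./sitemap-${fileIndex}.xml`, header);
  }

  // The URL pattern is a placeholder; substitute the real page URL here.
  fs.appendFileSync(
    `./sitemap-${fileIndex}.xml`,
    `  <url><loc>https://example.com/item/${id}</loc></url>\n`
  );
  count++;
});

rl.on("close", () => {
  if (count > 0) fs.appendFileSync(`./sitemap-${fileIndex}.xml`, footer);
  console.log(`done, wrote ${fileIndex} sitemap file(s)`);
});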

The code above gets around Node's limit on reading large files, and along the way I learned a practical use case for streams.