Explore Buffer objects that are not stored in V8 heap memory

Preface

The previous article, "To learn Node.js, it is necessary to know stream first", introduced streams. To understand what actually flows through a stream, you need to understand Buffer. This article explains it in depth.

Koala is dedicated to sharing the complete Node.js technology stack, from JavaScript to Node.js to back-end databases, and hopes you become an excellent senior Node.js engineer. Author of [Programmer growth refers to north]; GitHub blog open source project: github.com/koala-codin…

A first look at Buffer

Here’s an example of using stream to manipulate a file:

const fs = require('fs');
const path = require('path');

const fileName = path.resolve(__dirname, 'data.txt');
const stream = fs.createReadStream(fileName);
console.log('the stream content', stream);
stream.on('data', function (chunk) {
    // No encoding was set on the stream, so each chunk is a Buffer
    console.log(chunk instanceof Buffer);
    console.log(chunk);
});

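If you look at the output, the first log shows that stream is an object, while each chunk that flows out of it is a Buffer, printed as a sequence of bytes. In other words, the data is binary.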

What is binary

Binary is the lowest-level data format in a computer. Strings, numbers, video, audio, programs, network packets and so on are all ultimately stored as binary. These higher-level formats and binary can be converted to each other through fixed encoding rules.

For example, a 32-bit integer type in C occupies 32 bits, or 4 bytes. The decimal number 3 is stored as the binary number 00000000 00000000 00000000 00000011. The same is true for strings, which can be converted to and from binary according to ASCII encoding rules or Unicode encoding rules such as UTF-8. In short, the data a computer stores at the lowest level is binary, and every higher-level type has an encoding rule that maps it to and from binary.
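The two conversions described above can be reproduced in Node.js. This is a minimal sketch; the variable names are chosen for illustration only, and big-endian byte order is assumed so that the bytes print in the same order as the binary number in the text.

```javascript
// Store the decimal number 3 as a 32-bit unsigned integer (4 bytes, big-endian).
const intBytes = Buffer.alloc(4);
intBytes.writeUInt32BE(3, 0);

// Render every byte as 8 binary digits.
const bits = Array.from(intBytes, (b) => b.toString(2).padStart(8, '0')).join(' ');
console.log(bits); // 00000000 00000000 00000000 00000011

// Strings become bytes through an encoding rule.
const asciiBytes = Buffer.from('A', 'ascii');
console.log(asciiBytes); // <Buffer 41>

const utf8Bytes = Buffer.from('你', 'utf8');
console.log(utf8Bytes); // <Buffer e4 bd a0>
console.log(utf8Bytes.toString('utf8')); // 你
```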

Why is there a Buffer module in Node.js

In the early JavaScript ecosystem, JavaScript ran only in the browser. It handled Unicode-encoded string data well, but not binary or non-Unicode data. On the server, however, TCP/HTTP operations and file I/O are a must. I think that is why Node.js provides the Buffer class: it handles binary data and can therefore handle all types of data.

A description of the Buffer module

In Node.js, important modules such as net, http and fs all use Buffer for data transfer and processing. Because these core modules depend on Buffer, it is already loaded when Node starts and is available globally, so you do not need to call require(). The size of a Buffer is determined at creation time and cannot be adjusted.
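A short sketch of both points, the global availability and the fixed size. The variable names are illustrative; Buffer.concat is one standard way to get a "bigger" Buffer, and it does so by creating a new one.

```javascript
// Buffer is a global; requiring the 'buffer' module returns the same class.
const { Buffer: BufferFromModule } = require('buffer');
console.log(Buffer === BufferFromModule); // true

// The length is fixed at creation time.
const fixed = Buffer.alloc(4);
fixed[10] = 0xff;          // writing past the end is silently ignored
console.log(fixed.length); // 4

// "Growing" a Buffer really means building a new, larger one.
const grown = Buffer.concat([fixed, Buffer.from([1, 2])]);
console.log(grown.length); // 6
```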

Creating a Buffer

Prior to Node.js v6.0.0, Buffer instances were created by calling the Buffer constructor with the new keyword, which returned different Buffers depending on the arguments provided. That form has been deprecated in later versions, and several alternative creation methods replace it.

1. Buffer.alloc and Buffer.allocUnsafe (create a Buffer of a fixed size)

Buffer.alloc and Buffer.allocUnsafe are called the same way. The argument is the length of the Buffer in bytes, and it must be a number.

// Creating Buffers with Buffer.alloc and Buffer.allocUnsafe
// Buffer.alloc creates an initialized (zero-filled) Buffer of 6 bytes
let buf1 = Buffer.alloc(6);

// Buffer.allocUnsafe creates an uninitialized Buffer of 6 bytes
let buf2 = Buffer.allocUnsafe(6);

console.log(buf1); // <Buffer 00 00 00 00 00 00>
console.log(buf2); // e.g. <Buffer 00 e7 8f a0 00 00> (contents vary from run to run)

Buffer.alloc and Buffer.allocUnsafe differ in what they return. The Buffer created by Buffer.alloc is initialized, with every byte set to 00. Buffer.allocUnsafe, on the other hand, returns an uninitialized Buffer: it takes whatever free memory is available and hands it over as is, without clearing it.

Because it skips initialization, Buffer.allocUnsafe allocates memory very quickly, but the allocated memory segment may contain old, potentially sensitive data. It has a clear performance advantage and is, as the name says, unsafe, so use it with caution.
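One common way to use Buffer.allocUnsafe safely is to overwrite every byte before reading any of them. The sketch below shows this with fill(); the variable names are illustrative.

```javascript
const size = 6;

// Zero-filled by Node.js.
const zeroed = Buffer.alloc(size);

// Uninitialized: may contain leftover data, so overwrite it completely before use.
const raw = Buffer.allocUnsafe(size);
raw.fill(0);
console.log(raw.equals(zeroed)); // true

// Buffer.alloc also accepts a fill value as its second argument.
const filled = Buffer.alloc(size, 0x61);
console.log(filled.toString('ascii')); // aaaaaa
```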

2. Buffer.from (create a Buffer directly from content)

Buffer.from supports three kinds of arguments:

The first argument is a string and the second is a character encoding, such as ASCII, UTF-8 or Base64.
Pass in an array. Each entry becomes one byte of the Buffer, displayed in hexadecimal.
Pass in a Buffer. Each of its bytes is copied into the new Buffer.

Note: the encodings Buffer currently supports:

ascii - supports 7-bit ASCII data only.
utf8 - multi-byte encoded Unicode characters.
utf16le - Unicode characters encoded as 2 or 4 bytes, little-endian.
base64 - Base64 string encoding.
binary - binary encoding (an alias for latin1).
hex - encodes each byte as two hexadecimal characters.
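A small sketch of how the same text looks under several of these encodings, using only Buffer.from, toString and Buffer.isEncoding. The sample string is arbitrary.

```javascript
const text = 'hi';

// Encode once as UTF-8, then present the same bytes in different string encodings.
const utf8Bytes = Buffer.from(text, 'utf8');
console.log(utf8Bytes.toString('hex'));    // 6869
console.log(utf8Bytes.toString('base64')); // aGk=

// Decoding reverses the process.
console.log(Buffer.from('aGk=', 'base64').toString('utf8')); // hi

// utf16le uses two bytes for each of these characters.
const utf16Bytes = Buffer.from(text, 'utf16le');
console.log(utf16Bytes.toString('hex')); // 68006900

// Buffer.isEncoding reports whether an encoding name is supported.
console.log(Buffer.isEncoding('utf8'));   // true
console.log(Buffer.isEncoding('gb2312')); // false
```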

Passing in the string and character encoding:

// Pass in the string and character encoding
let buf = Buffer.from("hello", "utf8");

console.log(buf); // <Buffer 68 65 6c 6c 6f>

Passing an array:

// The array members are decimal numbers
let buf = Buffer.from([1, 2, 3]);

console.log(buf); // <Buffer 01 02 03>

// The array members are hexadecimal numbers
let buf = Buffer.from([0xe4, 0xbd, 0xa0, 0xe5, 0xa5, 0xbd]);

console.log(buf); // <Buffer e4 bd a0 e5 a5 bd>
console.log(buf.toString("utf8")); // 你好

Node.js does not support GB2312 encoding and uses UTF-8 by default. In GB2312 a Chinese character takes two bytes, while in UTF-8 a common Chinese character takes three bytes, so the Buffer for "你好" above consists of six bytes.
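The difference between character count and byte count can be checked with Buffer.byteLength. A minimal sketch:

```javascript
const chinese = '你好';

// String length counts UTF-16 code units, not bytes.
console.log(chinese.length); // 2

// In UTF-8 each of these two characters takes three bytes.
const chineseBytes = Buffer.byteLength(chinese, 'utf8');
console.log(chineseBytes); // 6

// ASCII-range characters take one byte each in UTF-8.
const englishBytes = Buffer.byteLength('hello', 'utf8');
console.log(englishBytes); // 5
```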

// The array members are numeric strings
let buf = Buffer.from(["1", "2", "3"]);
console.log(buf); // <Buffer 01 02 03>

The array members passed in can be any numeric value; values outside the range 0 to 255 are truncated to fit into one byte. If a member is a numeric string, it is automatically converted to a number. If the value cannot be converted to a number, or the member is some other non-numeric data type, the corresponding byte is initialized to 00.
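The conversion rules can be seen in one array that mixes several kinds of members. This is an illustrative sketch; the sample values are arbitrary, and the results follow the standard Uint8Array element conversion that Buffer inherits.

```javascript
// 257 and -1 wrap around into one byte, 3.7 loses its fraction,
// '16' is converted to the number 16, and 'abc' and null end up as 0.
const mixed = Buffer.from([257, -1, 3.7, '16', 'abc', null]);

console.log(mixed); // <Buffer 01 ff 03 10 00 00>
```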

A Buffer can be converted back to a string with the toString method, which accepts the encoding as an argument. The default encoding is UTF-8.

Passing in a Buffer:

// Pass in a Buffer
let buf1 = Buffer.from("hello", "utf8");

let buf2 = Buffer.from(buf1);

console.log(buf1); // <Buffer 68 65 6c 6c 6f>
console.log(buf2); // <Buffer 68 65 6c 6c 6f>
console.log(buf1 === buf2); // false
console.log(buf1[0] === buf2[0]); // true
buf1[1] = 12;
console.log(buf1); // <Buffer 68 0c 6c 6c 6f>
console.log(buf2); // <Buffer 68 65 6c 6c 6f>

When a Buffer is passed in, a new Buffer is created and each of its members is copied.

A Buffer is a reference type. When one Buffer is created from another with Buffer.from, the bytes are copied, so changing a byte in one Buffer does not change the corresponding byte in the other. Creating a new Buffer this way is therefore a deep copy.
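Not every way of deriving one Buffer from another copies the bytes. As a contrast to the copy made by Buffer.from, the sketch below uses subarray(), which returns a view over the same memory. The variable names are illustrative.

```javascript
const original = Buffer.from('hello', 'utf8');

// Buffer.from(buffer) copies the bytes into new memory.
const copy = Buffer.from(original);

// subarray() returns a view that shares memory with the original.
const view = original.subarray(0, 5);

original[0] = 0x48; // change 'h' to 'H' in the original

console.log(copy.toString()); // hello
console.log(view.toString()); // Hello
```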

The memory allocation mechanism of Buffer

A buffer corresponds to a block of raw memory outside of V8 heap memory

Buffer is a typical module that combines JavaScript and C++. The performance-critical parts are implemented in C++, while JavaScript bridges to them and provides the interface. The memory a Buffer occupies is not V8 heap memory; it is independent of the V8 heap. The memory is requested at the C++ level and handed out in JavaScript (in other words, the real memory is provided by the C++ layer and the JavaScript layer just uses it). Buffer uses an ArrayBuffer object as the carrier for the memory it allocates. In simple terms, the Buffer module allocates a block of memory with v8::ArrayBuffer and reads and writes it through v8::Uint8Array, one of the TypedArray types.
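The relationship between Buffer, Uint8Array and ArrayBuffer is visible from JavaScript. The sketch below is illustrative only: the 32 MB size is arbitrary, the exact memoryUsage numbers vary by Node.js version and platform, and the arrayBuffers field is assumed to be present (it is reported by recent Node.js releases).

```javascript
const buf = Buffer.from('hello', 'utf8');

// A Buffer is a Uint8Array backed by an ArrayBuffer.
console.log(buf instanceof Uint8Array);         // true
console.log(buf.buffer instanceof ArrayBuffer); // true

// byteOffset and length locate this Buffer inside its backing ArrayBuffer,
// so a plain Uint8Array over the same range sees and changes the same bytes.
const sameBytes = new Uint8Array(buf.buffer, buf.byteOffset, buf.length);
sameBytes[0] = 0x48;
console.log(buf.toString()); // Hello

// Compare where a large allocation is accounted for: the V8 heap or array buffers.
const before = process.memoryUsage();
const big = Buffer.alloc(32 * 1024 * 1024);
const after = process.memoryUsage();
console.log('allocated bytes:', big.length);
console.log('heapUsed delta:', after.heapUsed - before.heapUsed);
console.log('arrayBuffers delta:', after.arrayBuffers - before.arrayBuffers);
```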

8K mechanism for memory allocation

Allocating small memory

When it comes to Buffer memory allocation, we have to talk about the 8KB pool. The corresponding code in buffer.js is as follows:

Buffer.poolSize = 8 * 1024;

function allocate(size) {
    if (size <= 0) {
        return new FastBuffer();
    }
    // Requests smaller than half the pool size (4KB) are served from the pool
    if (size < (Buffer.poolSize >>> 1)) {
        if (size > poolSize - poolOffset)
            createPool();
        var b = allocPool.slice(poolOffset, poolOffset + size);
        poolOffset += size;
        alignPool();
        return b;
    } else {
        // Larger requests get their own memory
        return createUnsafeBuffer(size);
    }
}

Reading the source directly, 8KB is the boundary. If the requested size is at least half of 8KB (4KB), memory is allocated for it directly. If it is smaller than 4KB, Node checks whether the current pool has enough free space left to hold the data. If not, a new 8KB pool is requested and the data is stored in the new pool. If there is enough space, the data is written into the current pool directly. The following figure shows the memory allocation strategy.

[Figure: memory allocation strategy; labels: 8KB storage unit, Buffer]
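The pooling behaviour can be observed through the backing ArrayBuffer of each Buffer. This is a sketch that assumes the default Buffer.poolSize of 8KB and the pooling rules shown in the source above; the variable names and the 16-byte size are illustrative. Because other code may already have used part of the current pool, the sketch takes one more slice if the first two requests happen to straddle two pools.

```javascript
const poolSize = Buffer.poolSize; // 8192 by default

let first = Buffer.allocUnsafe(16);
let second = Buffer.allocUnsafe(16);
if (first.buffer !== second.buffer) {
  // `second` opened a fresh pool, so the next small request comes from that same pool.
  first = second;
  second = Buffer.allocUnsafe(16);
}

// Small Buffers are slices of one shared 8KB ArrayBuffer.
console.log(first.buffer === second.buffer);       // true
console.log(first.buffer.byteLength === poolSize); // true

// A request of at least half the pool size gets its own memory.
const large = Buffer.allocUnsafe(poolSize * 2);
console.log(large.buffer === first.buffer); // false
console.log(large.buffer.byteLength);       // 16384

// Buffer.alloc never uses the pool, even for small sizes.
const zeroed = Buffer.alloc(16);
console.log(zeroed.buffer.byteLength); // 16
```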

Allocating large memory

If a Buffer object larger than 8KB is needed, a SlowBuffer object is allocated directly as its storage unit, and that unit is used exclusively by the large Buffer object.

// Big buffer, just alloc one
this.parent = new SlowBuffer(this.length);
this.offset = 0;

The SlowBuffer class here is defined in C++. Although it can be accessed by requiring the buffer module, operating on it directly is not recommended; use Buffer instead. The parent property points to a SlowBuffer object that comes from Node's own C++ definition and is not in the V8 heap.

Memory allocation limits

In addition, there is a limit to how large a Buffer can be allocated at one time, and it varies between operating systems. This limit can be seen in node_buffer.h:

    static const unsigned int kMaxLength =
        sizeof(int32_t) == sizeof(intptr_t) ? 0x3fffffff : 0x7fffffff;

The maximum size of a single allocation is about 1 GB on 32-bit operating systems and about 2 GB on 64-bit systems.
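The two constants in the excerpt can be converted to decimal, and the limit of the running Node.js can be read from buffer.constants.MAX_LENGTH. This is a sketch: the value reported by your Node.js version may be larger than the one in the excerpt above, so no expected value is shown for it.

```javascript
const { constants } = require('buffer');

// The two limits from the excerpt, in bytes.
console.log(0x3fffffff); // 1073741823 (about 1 GB)
console.log(0x7fffffff); // 2147483647 (about 2 GB)

// The limit of the Node.js version that is actually running.
console.log(constants.MAX_LENGTH);

// Asking for more than the limit is rejected before any memory is allocated.
let rejected = null;
try {
  Buffer.alloc(constants.MAX_LENGTH + 1);
} catch (err) {
  rejected = err;
}
console.log(rejected instanceof RangeError); // true
```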

Advantages of the buffer memory allocation mechanism

The real memory for Buffer is provided by Node's C++ layer; the JavaScript layer just uses it. For small, frequent Buffer operations, memory is requested in 8KB units in advance and then handed out as needed, so that allocating memory from JavaScript does not trigger too many system calls to the operating system. Large Buffers (larger than 8KB) use the memory provided by the C++ layer directly, with no need for fine-grained allocation.

Buffer and stream

Why use binary buffers for streams

According to the output of the first example, the data flowing through a stream is of type Buffer, that is, binary.

Reason 1:

The main purpose of stream is to optimize I/O operations (file I/O and network I/O). On the back end, the format of the data behind that I/O is unknown: it may be a string, audio, video, a network packet and so on. Even if it is a string, its encoding is unknown; it may be ASCII or UTF-8. For these unknown cases, it is better to use the most general format directly, which is binary.
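One practical consequence of this design: a chunk boundary can fall anywhere, even in the middle of a multi-byte character, so the bytes should be kept as Buffers until they can be decoded as a whole. The sketch below simulates this with Readable.from and two hand-made chunks; the split position is chosen for illustration.

```javascript
const { Readable } = require('stream');

// '你好' is 6 bytes in UTF-8. Split it after byte 4, in the middle of the second character.
const bytes = Buffer.from('你好', 'utf8');
const parts = [bytes.subarray(0, 4), bytes.subarray(4)];

// Decoding each chunk on its own corrupts the character that was cut in half.
const decodedPerChunk = parts.map((part) => part.toString('utf8')).join('');
console.log(decodedPerChunk === '你好'); // false

// Keeping the chunks as Buffers and decoding once at the end gives the right text.
const received = [];
Readable.from(parts)
  .on('data', (chunk) => received.push(chunk))
  .on('end', () => {
    console.log(Buffer.concat(received).toString('utf8')); // 你好
  });
```

When a stream is known to carry text, the stream's setEncoding method handles such split characters for you.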

Reason 2:

Buffer also provides a performance boost for HTTP requests.

Here’s an example:

const http = require('http');
const fs = require('fs');
const path = require('path');

const server = http.createServer(function (req, res) {
    const fileName = path.resolve(__dirname, 'buffer-test.txt');
    fs.readFile(fileName, function (err, data) {
        if (err) {
            // The file could not be read
            res.statusCode = 500;
            return res.end();
        }
        res.end(data);   // Test 1: return binary data directly
        // res.end(data.toString()); // Test 2: return string data
    });
});
server.listen(8000);

Make the buffer-test.txt file used in the code about 50KB, and then test performance with the ab tool. You will see that returning binary is much more efficient than returning a string, both in requests per second and in connection times. Why is returning a string less efficient? Because data sent over the network is transmitted in binary anyway. Even though the code returns a string response, it eventually has to be converted to binary for transmission, so there is an extra step, and of course efficiency is lower.

The role of a Buffer in a stream data flow

We can think of the cooperation between stream and Buffer as a bus station. At some bus stations, a bus does not leave until it is full of passengers or until a scheduled time. Passengers arrive at different times and in different numbers: sometimes there are many, sometimes few, and neither the passengers nor the station can control the flow of people.

Passengers who arrive early must wait until the bus is scheduled to leave. A passenger who arrives to find the bus full or already gone must wait for the next one.

In short, there is always a waiting area, and in Node.js that waiting area is the Buffer. Node.js cannot control when data arrives or how fast it travels, just as a bus station cannot control the flow of people. It can only decide when to send the data on (when the bus leaves). If it is not yet time, Node.js puts the data into the Buffer waiting area, an address in RAM, until it is sent out for processing.
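The waiting area can be observed on a writable stream. In the sketch below, the slow destination and the tiny highWaterMark of 4 bytes are assumptions made for illustration. write() returns false once the waiting area holds at least highWaterMark bytes, and 'drain' fires when it has emptied.

```javascript
const { Writable } = require('stream');

// A deliberately slow destination with a 4-byte waiting area.
const slow = new Writable({
  highWaterMark: 4,
  write(chunk, encoding, callback) {
    setTimeout(callback, 10); // pretend each chunk takes 10 ms to process
  },
});

// 2 bytes waiting: below the mark, so the producer may keep writing.
const firstAccepted = slow.write(Buffer.from('ab'));
console.log(firstAccepted); // true

// 6 bytes waiting: the area is full, so the producer should pause.
const secondAccepted = slow.write(Buffer.from('cdef'));
console.log(secondAccepted);      // false
console.log(slow.writableLength); // 6

// 'drain' signals that the waiting area is empty and writing can resume.
slow.once('drain', () => {
  console.log('drained, safe to write again');
  slow.end();
});
```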

Note:

Both Buffer and String can store string data. Unlike Buffer, however, String is allocated on the V8 heap rather than in C++ off-heap memory. In addition, Google has optimized String so that it is faster than Buffer in actual concatenation measurements. Buffer, however, is designed to process binary and other non-Unicode data, so you need Buffer when processing non-UTF-8 data.

That is all for today. If you are interested in this content, you can follow the public account "programmer growth refers to north", or join the technical exchange group so we can discuss together.
