This is day 10 of my participation in the November Gwen Challenge. For event details, see: The Last Gwen Challenge of 2021
Understanding Buffer
JavaScript is very friendly to strings, but server-side work such as file and network I/O also has to deal with binary data.
A Buffer is an array-like object that is used primarily to manipulate bytes.
The Buffer structure
Buffer is a typical module that combines JavaScript and C++: the performance-critical parts are implemented in C++, and the rest in JavaScript.
The memory occupied by a Buffer is not allocated by V8; it is off-heap memory. Because V8 garbage collection has a performance cost, it makes sense to manage such frequently used objects with a more efficient, dedicated memory allocation and reclamation policy.
Buffer is loaded when the Node process starts and is placed on the global object, so it can be used without require().
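The following is a quick way to confirm that Buffer is a global. It is written for a recent Node.js version, and ModuleBuffer is just a local name chosen for this example:

```javascript
// No require() is needed: Buffer is already on the global object.
console.log(typeof Buffer); // function

// It is the same class that the 'buffer' module exports.
const { Buffer: ModuleBuffer } = require('buffer');
console.log(Buffer === ModuleBuffer); // true
```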
Buffer object
Each element of a Buffer object is one byte. It is shown as two hexadecimal digits (00 to ff), which means an integer from 0 to 255.
let buf01 = Buffer.alloc(8);
console.log(buf01); // <Buffer 00 00 00 00 00 00 00 00>
You can use fill() to write values into the buffer (UTF-8 encoding by default):
- If the content is longer than the buffer, the excess is not written.
- If the content is shorter than the buffer, it is repeated until the buffer is full.
- To clear previously filled content, fill the buffer again, for example with fill(0).
buf01.fill('12345678910');
console.log(buf01); // <Buffer 31 32 33 34 35 36 37 38>
console.log(buf01.toString()); // 12345678
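Here is a minimal sketch of the three fill() behaviors listed above. It uses a fresh 8-byte buffer named buf, a name chosen for this example:

```javascript
const buf = Buffer.alloc(8);

// Content longer than the buffer is cut off after 8 bytes.
buf.fill('12345678910');
console.log(buf.toString()); // 12345678

// Content shorter than the buffer is repeated until the buffer is full.
buf.fill('abc');
console.log(buf.toString()); // abcabcab

// Filling with 0 resets every byte.
buf.fill(0);
console.log(buf); // <Buffer 00 00 00 00 00 00 00 00>
```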
For Chinese input, UTF-8 uses 3 bytes (elements) per Chinese character and 1 byte per ASCII letter or punctuation mark.
let buf02 = Buffer.alloc(18, '开始我们的新旅程', 'utf-8'); // "Start our new journey"
console.log(buf02.toString()); // 开始我们的新 ("Start our new"): only 18 bytes = 6 characters fit
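You can use Buffer.byteLength() to check byte counts rather than character counts. The sketch below uses sample strings as illustrations only:

```javascript
// Buffer.byteLength() counts bytes; String#length counts UTF-16 code units.
const sample = '开始我们的新';
console.log(sample.length);                     // 6
console.log(Buffer.byteLength(sample, 'utf8')); // 18: 3 bytes per Chinese character
console.log(Buffer.byteLength('a.', 'utf8'));   // 2: 1 byte for a letter, 1 for a period
console.log(Buffer.from(sample).toString('hex'));
// e5bc80e5a78be68891e4bbace79a84e696b0
```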
Buffer behaves much like an array. You can read its length property to get its size in bytes, access elements by index, and find where content starts with indexOf().
console.log(buf02); // <Buffer e5 bc 80 e5 a7 8b e6 88 91 e4 bb ac e7 9a 84 e6 96 b0>
console.log(buf02.length); // 18 bytes
console.log(buf02[6]); // 230: 0xe6 in decimal
console.log(buf02.indexOf('我')); // 6: '我' ("I") starts at the seventh byte
console.log(buf02.slice(6, 9).toString()); // 我 ("I")
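Buffer is in fact a subclass of Uint8Array, which explains the array-like access. One difference from Array deserves a sketch: on a Buffer, slice() (like subarray()) returns a view over the same memory rather than a copy. The names below are illustrative:

```javascript
const buf = Buffer.from('hello');

console.log(buf instanceof Uint8Array); // true: Buffer extends Uint8Array
console.log(Array.isArray(buf));        // false: it is not an Array
console.log(buf.length, buf[0]);        // 5 104 ('h' is 0x68)

// subarray() (and Buffer#slice) return a view over the same bytes, not a copy.
const view = buf.subarray(0, 1);
view[0] = 0x6a; // 'j'
console.log(buf.toString()); // jello
```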
If you assign an element a value that is not an integer between 0 and 255, it is converted as follows. A decimal has its fractional part discarded (truncated, not rounded). A value less than 0 has 256 added repeatedly until it falls between 0 and 255. A value greater than 255 has 256 subtracted repeatedly until it falls in that range.
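These rules amount to truncating toward zero and then taking the result modulo 256, which is the typed-array ToUint8 conversion. The following sketch exercises each case:

```javascript
const buf = Buffer.alloc(5);

buf[0] = -1;   // negative: -1 + 256 = 255
buf[1] = 256;  // above 255: 256 - 256 = 0
buf[2] = 300;  // 300 - 256 = 44
buf[3] = 3.7;  // fractional part discarded, not rounded: 3
buf[4] = -3.7; // truncated to -3 first, then -3 + 256 = 253

console.log([...buf]); // [ 255, 0, 44, 3, 253 ]
```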
Buffer memory allocation
The memory for Buffer objects is requested in Node's C++ layer rather than from V8's heap. When processing large amounts of byte data, asking the operating system for a little memory every time some is needed would cause a flood of system calls. Node therefore requests memory at the C++ level and allocates it at the JavaScript level.
Node uses the slab allocation mechanism, a dynamic memory management scheme that is widely used in *nix operating systems such as Linux.
A slab is a pre-allocated memory region of fixed size. A slab is always in one of three states:
- Full: fully allocated
- Partial: partially allocated
- Empty: nothing allocated
Node uses an 8KB threshold to distinguish between large and small buffers
console.log(Buffer.poolSize); // 8192
This 8KB value is the size of each slab, which is used as a unit of memory allocation at the JavaScript level
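Allocating small Buffer objects
If the requested Buffer size is less than 8KB, Node allocates it as a small object. Current Node.js versions narrow this in two ways. First, only Buffer.allocUnsafe(), Buffer.from(string), Buffer.from(array) and Buffer.concat() draw from the pool, and Buffer.alloc() never uses it. Second, the pool is used only for sizes below Buffer.poolSize >>> 1, which is 4KB by default.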
- A new slab unit is constructed. It is currently empty.
- A small 1KB (1024-byte) Buffer object is constructed. 1KB of the current slab is now occupied, and the Buffer records the position in this slab where it starts.
- Another Buffer object, 3KB (3072 bytes) in size, is created. The construction process checks whether the current slab has enough remaining space. If it does, the Buffer uses that space and the slab's allocation state is updated. After the 3KB is used, the slab has 4KB (4096 bytes) left.
- If a 6KB (6144-byte) Buffer is then created, the current slab does not have enough space, so a new slab is created. The remaining space of the original slab is wasted.
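The walkthrough can be modeled with a toy allocator. This is a hypothetical sketch that only tracks offsets inside 8KB slabs. ToySlabAllocator and slabState are names invented here. Node's real allocator differs in detail; for example, it aligns offsets to 8 bytes and uses a smaller size threshold.

```javascript
// Hypothetical toy model of the slab walkthrough; it is NOT Node's real
// allocator and allocates no memory, it only tracks offsets.
const SLAB_SIZE = 8 * 1024; // 8KB, the default Buffer.poolSize

function slabState(slab, slabSize) {
  if (slab.used === 0) return 'empty';
  return slab.used === slabSize ? 'full' : 'partial';
}

class ToySlabAllocator {
  constructor(slabSize = SLAB_SIZE) {
    this.slabSize = slabSize;
    this.slabs = [];
    this.newSlab(); // step 1: start with one empty slab
  }

  newSlab() {
    this.current = { id: this.slabs.length, used: 0 };
    this.slabs.push(this.current);
  }

  // Carve `size` bytes out of the current slab. If too little space is left,
  // open a new slab; the old slab's leftover space is simply wasted.
  allocate(size) {
    if (size > this.slabSize) {
      throw new RangeError('large buffers are not carved from a slab in this model');
    }
    if (this.slabSize - this.current.used < size) {
      this.newSlab();
    }
    const offset = this.current.used; // where this buffer starts in the slab
    this.current.used += size;
    return {
      slab: this.current.id,
      offset,
      remaining: this.slabSize - this.current.used,
      state: slabState(this.current, this.slabSize),
    };
  }
}

const allocator = new ToySlabAllocator();
console.log(allocator.allocate(1024)); // slab 0, offset 0, 7168 bytes left, partial
console.log(allocator.allocate(3072)); // slab 0, offset 1024, 4096 bytes left, partial
console.log(allocator.allocate(6144)); // slab 1, offset 0: slab 0's last 4096 bytes are wasted
```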
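For example:
Buffer.allocUnsafe(1)    // pooled; Buffer.alloc() would bypass the slab entirely
Buffer.allocUnsafe(8192)
Only the 1-byte Buffer object will live in the first slab. The 8192-byte Buffer does not fit into that slab's remaining space, so it is not placed there. In the scheme described above it builds a new slab, while current Node.js versions give it a dedicated allocation of its own.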
A slab may be shared by multiple Buffer objects, so its memory can be reclaimed only after all of those small Buffer objects have gone out of scope and can be collected. Even if only a 1-byte Buffer object is created from a slab, the slab's 8KB of memory is not freed while that Buffer is still alive.
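The sharing can be observed from JavaScript, assuming a current Node.js version. buf.buffer is the ArrayBuffer behind a Buffer, and for a pooled Buffer it is the whole slab. The variable names are illustrative, and the pool threshold is Buffer.poolSize >>> 1 (4096 bytes by default).

```javascript
const pooled = Buffer.allocUnsafe(1);    // small: carved from the shared pool
const unpooled = Buffer.alloc(1);        // Buffer.alloc() never uses the pool
const large = Buffer.allocUnsafe(5000);  // at or above the threshold: its own memory

// While `pooled` is reachable, the whole 8KB slab behind it stays alive.
console.log(pooled.buffer.byteLength);   // 8192 (Buffer.poolSize)
console.log(unpooled.buffer.byteLength); // 1
console.log(large.buffer.byteLength);    // 5000
```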
Summary:
The actual memory is provided by Node's C++ layer and is only used at the JavaScript level. For small, frequent Buffer operations, the slab mechanism requests memory in advance and hands it out afterwards. This means JavaScript does not have to make many memory-request system calls to the operating system. For large Buffers, the memory provided by the C++ layer is used directly, without this fine-grained allocation.
Concatenating Buffers
In practice, Buffers are usually transferred in segments (chunks).
const fs = require('fs');
// 静夜思.txt ("Quiet Night Thoughts") is a UTF-8 text file of Chinese poetry
let rs = fs.createReadStream('./静夜思.txt', { flags: 'r' });
let str = '';
rs.on('data', (chunk) => {
  str += chunk; // chunk is a Buffer
});
rs.on('end', () => {
  console.log(str);
});
The above is an example of reading a stream. The chunk objects received in the data event are Buffer objects.
Problems arise, however, when the input stream contains a wide-byte encoding (more than one byte per character). A toString() call is hidden in str += chunk, which is equivalent to str = str.toString() + chunk.toString().
The following limits each Buffer read by the stream to 11 bytes:
fs.createReadStream('./静夜思.txt', { flags: 'r', highWaterMark: 11 });
Each chunk is now decoded on its own. As a result, the output contains garbled characters (such as �) wherever a multi-byte character was split between two chunks.
Whatever the buffer length, a wide-byte character can be cut in two at a chunk boundary. A longer buffer only makes this less likely.
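You can reproduce the problem without a file. This sketch takes a UTF-8 sample string and cuts its bytes into 11-byte chunks, as highWaterMark: 11 would. text, bytes and chunks are illustrative names.

```javascript
// A UTF-8 sample (3 bytes per character here) cut into 11-byte chunks.
const text = '窗前明月光，疑是地上霜';
const bytes = Buffer.from(text);

const chunks = [];
for (let i = 0; i < bytes.length; i += 11) {
  chunks.push(bytes.subarray(i, i + 11));
}

// Naive joining: each `str += chunk` decodes that chunk on its own, so a
// character whose bytes straddle two chunks turns into U+FFFD characters.
let str = '';
for (const chunk of chunks) {
  str += chunk;
}

console.log(str === text);           // false
console.log(str.includes('\uFFFD')); // true
```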
encoding
If the stream's encoding is set to UTF-8, however, this problem does not occur:
fs.createReadStream('./静夜思.txt', { flags: 'r', highWaterMark: 11, encoding: 'utf-8' });
The data event fires the same number of times whatever the encoding. However, once an encoding is set (through the encoding option or by calling setEncoding()), the readable stream creates a decoder object internally. Each chunk is decoded from Buffer to string by this decoder before it is passed to the data handler.
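The same 11-byte chunks decode correctly once an encoding is set. This sketch uses an in-memory Readable from Node's stream module in place of the file, so the example runs without 静夜思.txt:

```javascript
const { Readable } = require('stream');

const text = '窗前明月光，疑是地上霜';
const bytes = Buffer.from(text);

// An in-memory stream standing in for the file stream.
const rs = new Readable({ read() {} });
// setEncoding() installs a decoder: 'data' handlers receive strings, and an
// incomplete multi-byte character is held back until its remaining bytes arrive.
rs.setEncoding('utf8');

// Push the data as 11-byte Buffers, as highWaterMark: 11 would deliver it.
for (let i = 0; i < bytes.length; i += 11) {
  rs.push(bytes.subarray(i, i + 11));
}
rs.push(null); // end of data

let str = '';
rs.on('data', (chunk) => {
  str += chunk; // chunk is already a correctly decoded string
});
rs.on('end', () => {
  console.log(str === text); // true
});
```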
The string_decoder module provides an API for decoding Buffer objects into strings in a manner that preserves encoded multi-byte UTF-8 and UTF-16 characters.
const { StringDecoder } = require('string_decoder');
// Bytes of 窗前明月光，\r\n and the first two bytes of the next character, in two 11-byte chunks
const s1 = Buffer.from([0xe7, 0xaa, 0x97, 0xe5, 0x89, 0x8d, 0xe6, 0x98, 0x8e, 0xe6, 0x9c]);
const s2 = Buffer.from([0x88, 0xe5, 0x85, 0x89, 0xef, 0xbc, 0x8c, 0x0d, 0x0a, 0xe7, 0x96]);
console.log(s1.toString()); // decoded separately: the split characters come out garbled
console.log(s2.toString());
console.log('------------------');
const decoder = new StringDecoder('utf8');
console.log(decoder.write(s1)); // 窗前明 (the trailing 2 bytes are held back)
console.log(decoder.write(s2)); // 月光， plus \r\n (the trailing 2 bytes are held back)
A StringDecoder created for UTF-8 knows that these wide characters take three bytes each. The first decoder.write() therefore outputs only the first nine bytes (窗前明) and keeps the last two bytes inside the StringDecoder. The second write() combines them with the start of the next chunk.
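The buffering becomes visible when you feed a StringDecoder one byte at a time. This small sketch uses '我' (e6 88 91 in UTF-8):

```javascript
const { StringDecoder } = require('string_decoder');

const decoder = new StringDecoder('utf8');
const parts = [
  decoder.write(Buffer.from([0xe6])), // '': incomplete, the byte stays inside the decoder
  decoder.write(Buffer.from([0x88])), // '': still incomplete
  decoder.write(Buffer.from([0x91])), // '我': all three bytes have arrived
];
console.log(parts); // [ '', '', '我' ]
```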
Buffer and performance
Buffer is widely used in file I/O and network I/O, especially in network transmission. Applications usually work with strings, but anything sent over the network has to be converted to a Buffer and transferred as binary data. In web applications, string-to-Buffer conversion happens all the time, so making it more efficient can greatly improve network throughput.
Sending a plain string to the client performs worse than sending a Buffer object. The string has to be converted on every response, while a Buffer does not. Pre-converting static content to Buffer objects effectively reduces CPU usage and saves server resources.
You can separate a page's dynamic and static content and pre-convert the static part to Buffer to improve performance.
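The following sketch shows one way to separate static and dynamic content, assuming Node's built-in http module. HEADER, FOOTER and renderPage are hypothetical names, and the page is a toy with no HTML escaping.

```javascript
const http = require('http');

// Static parts: converted from string to Buffer once, at startup.
const HEADER = Buffer.from('<html><body><h1>Hello, ');
const FOOTER = Buffer.from('</h1></body></html>');

// Only the dynamic part is converted per request (no HTML escaping in this toy).
function renderPage(name) {
  return Buffer.concat([HEADER, Buffer.from(name), FOOTER]);
}

const server = http.createServer((req, res) => {
  const body = renderPage('world');
  res.writeHead(200, {
    'Content-Type': 'text/html; charset=utf-8',
    'Content-Length': body.length, // byte length of the Buffer
  });
  res.end(body);
});

// server.listen(3000); // uncomment to serve the page on port 3000
```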
The highWaterMark setting is critical for performance when a file is read. Ideally, the length of each read is the highWaterMark specified by the user.
highWaterMark affects performance in two ways:
- It affects, to some extent, how Buffer memory is allocated and used.
- If the value is too small, the stream may make too many system calls.
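The following sketch counts read calls for two chunk sizes to show the cost of a small highWaterMark. It uses fs.readSync directly instead of a stream so that the count is exact; the chunk size stands in for highWaterMark. The temp file name is hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Count the read calls needed to consume a file with a given chunk size.
// Each fs.readSync() call is one read; the final call returns 0 at EOF.
function countReads(file, chunkSize) {
  const fd = fs.openSync(file, 'r');
  const buf = Buffer.alloc(chunkSize); // reused for every read
  let reads = 0;
  let bytesRead;
  try {
    do {
      bytesRead = fs.readSync(fd, buf, 0, chunkSize, null);
      reads += 1;
    } while (bytesRead > 0);
  } finally {
    fs.closeSync(fd);
  }
  return reads;
}

// Hypothetical temp file holding 64KB of data.
const file = path.join(os.tmpdir(), 'highwatermark-demo.txt');
fs.writeFileSync(file, Buffer.alloc(64 * 1024, 'a'));

const smallChunkReads = countReads(file, 11);        // roughly 65536 / 11 reads
const largeChunkReads = countReads(file, 64 * 1024); // one full read plus the EOF read
console.log(smallChunkReads, largeChunkReads);
fs.unlinkSync(file);
```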