The purpose of this series of articles is to gain an in-depth understanding of Node.js's stable modules, studied alongside the official documentation.
In the process of learning, we follow the idea of what -> why -> how: understand what the module is, why it exists (what problem it solves), and how to use its API.
What: What is a stream?
First, the official definition:
A stream is an abstract interface in Node.js that handles streaming data. The Stream module is used to build objects that implement the stream interface.
A stream object is used to manipulate streams of data in Node.js.
Why: Why streams?
Official definition:
The primary goal of the Stream API, especially stream.pipe(), is to limit data buffering to an acceptable level so that sources and destinations with inconsistent read and write speeds don’t overwhelm memory.
Here's an example that illustrates the goal stated above:
Suppose we want to read the contents of a large local file on the server and send it back to the client over a network request. Without streams, we would first have to read the entire contents of the file into program memory through disk I/O, and then return it to the client through network I/O.
Under normal circumstances, disk I/O (the data source) is much faster than network I/O (the transmission destination). This is exactly the case of "sources and destinations with inconsistent read and write speeds": once concurrency is high enough or the file is large enough, program memory is overwhelmed.
With streaming, the speed at which data is read can match the speed at which the consumer consumes it, so memory usage drops dramatically and the problem above is avoided.
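A minimal sketch of the contrast (the file path and port here are made up for illustration):

```js
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  // Without streams: buffer the whole file in memory, then respond.
  // fs.readFile('./big.file', (err, data) => res.end(data));

  // With streams: data flows to the client chunk by chunk, and
  // pipe() matches the disk read speed to the network write speed.
  fs.createReadStream('./big.file').pipe(res);
}).listen(8000);
```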
How: Using streams
At its core, a stream is just an object: we work with streaming data by listening for and emitting events on the stream object and by calling its instance methods.
Node.js classifies streams into four types:
- Readable stream (Readable)
- Writable stream (Writable)
- Duplex stream (Duplex)
- Transform stream (Transform)
Each type of stream object has its own events and instance methods; to learn to use streams, we only need to work through those events and methods one by one.
Before looking at each type's methods, let's go over some concepts common to all of them:
Concepts
The buffer pool
When we use a stream to read streaming data, we can picture the data source as a water pump and the readable stream as the pipe connected to it: data flows out of the pump and on toward the consumer.
However, the producer's data does not flow directly to the consumer; it first flows into a buffer pool, from which the consumer can consume it at any time.
Of course, the buffer pool is not infinite; it has a threshold parameter, highWaterMark.
As the parameter's name suggests, Node.js pictures the buffer pool as a water pool, with producers connected to the pipes that feed into it and consumers connected to the pipes that flow out of it.
When this threshold is exceeded, the producer should stop pushing data into the buffer pool and wait for the consumer to finish consuming before producing more. Otherwise, the data beyond the threshold keeps piling up in program memory, up to Node.js's memory limit.
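highWaterMark can be set when a stream is created; for example (the file path is made up, and 64 KiB also happens to be the default for fs read streams):

```js
const fs = require('fs');

// The buffer pool threshold for this readable stream is 64 KiB;
// the stream stops pulling from disk once the pool reaches it.
const rs = fs.createReadStream('./big.file', { highWaterMark: 64 * 1024 });
```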
Back pressure
The phenomenon of data in the buffer pool exceeding the threshold when consumption is slower than generation is called back pressure.
Object mode
By default, the data transferred through a stream is a string or a Buffer. When we want to transfer data of other types, we can turn on objectMode.
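A minimal sketch of an object-mode stream:

```js
const { Readable } = require('stream');

const objectStream = new Readable({
  objectMode: true, // allow arbitrary values, not just strings/Buffers
  read() {
    this.push({ id: 1 }); // an object flows through the stream
    this.push(null);      // null still signals end-of-stream
  },
});

objectStream.on('data', (obj) => console.log(obj)); // { id: 1 }
```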
Readable stream (Readable)
Flowing and paused modes
Readable streams have two modes: flowing mode and paused mode.
All readable streams start in paused mode and can be switched to flowing mode by:
- Calling the resume() method
- Adding a 'data' event listener
- Calling the pipe() method
Adding a 'data' event listener switches the stream to flowing mode by internally calling resume().
In turn, a 'data' event listener is automatically added to the readable stream when the pipe() method is called.
That is, all three approaches end up entering flowing mode via the resume method, which in turn calls an internal flow function.
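A simplified sketch of that flow function, based on Node.js's internals (not the exact source):

```js
function flow(stream) {
  const state = stream._readableState;
  // Keep reading synchronously while the stream stays in flowing
  // mode; read() returning null means the buffer pool is drained.
  while (state.flowing && stream.read() !== null);
}
```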
As you can see, after the switch to flowing mode, the flow function keeps calling the read method in a while loop to produce data. The read methods of different kinds of readable streams fetch data from different sources: a file readable stream reads from disk I/O, an HTTP readable stream reads from network I/O, and so on. See the section on custom streams below for details.
Readable streams can switch to paused mode in two ways:
- Calling the pause() method when there is no pipe destination (pipe() has never been called).
- Calling the unpipe() method to remove the pipe destination when one exists.
Note that calling pause() only stops the flow of data, not its generation. Developers should therefore call resume() as soon as possible after pausing to get the data flowing again.
Readable stream events
In addition to the above methods for switching a readable stream's mode, several events matter for readable streams: the 'data', 'end', 'error', and 'readable' events.
In newer versions of Node.js we can consume streaming data in a number of ways:
- Listening for 'data' events
- Listening for 'readable' events
- Calling the pipe() method
In general, the readable.pipe() and ‘data’ event mechanisms are easier to understand than the ‘readable’ event. Processing ‘readable’ events can result in throughput increases.
For most users, readable.pipe() is recommended, as it is the easiest way to consume streaming data. If developers need to have more control over data passing and generation, they can use EventEmitter, readable.on(‘readable’)/readable.read(), or readable.pause()/readable.resume().
For most scenarios, the official documentation quoted above recommends consuming data with 'data' events and pipe(): use pipe() when there is a pipe destination, and 'data' events when there is not.
Consuming data by listening for 'readable' events requires us to handle more cases manually, but in return we can process the data in a more fine-grained way.
Therefore, listening for 'readable' events to consume data is recommended only in scenarios that require finer control over data delivery.
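The three consumption styles side by side (the file path is made up; enable one style at a time):

```js
const fs = require('fs');
const rs = fs.createReadStream('./big.file');

// 1. 'data' events: attaching the listener switches to flowing mode.
rs.on('data', (chunk) => console.log('got %d bytes', chunk.length));

// 2. pipe(): the easiest way; back pressure is handled for us.
// rs.pipe(process.stdout);

// 3. 'readable' event: pull chunks manually for finer control.
// rs.on('readable', () => {
//   let chunk;
//   while ((chunk = rs.read()) !== null) {
//     // process chunk
//   }
// });
```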
Summary
After switching to flowing mode, the readable stream keeps calling the read method to fetch data from the data source and push it into the buffer pool; the consumer then consumes the data in the pool through events such as 'data'.
Writable stream (Writable)
Writable streams involve fewer concepts than readable streams. The usual pattern is to create a writable stream, call the write method repeatedly to write data, and then call the end method to finish the stream.
If, during writing, data in the buffer pool is consumed more slowly than it is pushed in, the amount of data in the pool will reach the highWaterMark threshold.
At that point the write method returns false, telling the producer to pause writing until the 'drain' event is emitted, which signals that the data in the buffer pool has been consumed.
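A producer that honors this contract might look like this (a minimal sketch; the file name and data are made up):

```js
const fs = require('fs');
const ws = fs.createWriteStream('./out.file');

let i = 0;
function writeChunks() {
  while (i < 1e6) {
    i++;
    // write() returns false once the pool reaches highWaterMark.
    if (!ws.write(`chunk ${i}\n`)) {
      // Pause producing; resume when the pool drains.
      ws.once('drain', writeChunks);
      return;
    }
  }
  ws.end(); // finish the stream after the last chunk
}
writeChunks();
```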
We can look at this process in the source code:
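A simplified sketch of writeOrBuffer, based on Node.js's internals (not the exact source):

```js
function writeOrBuffer(stream, state, chunk, encoding, callback) {
  state.length += chunk.length; // account for the newly written chunk

  // Writing can continue while the pool is below highWaterMark.
  const ret = state.length < state.highWaterMark;
  if (!ret)
    state.needDrain = true; // the producer should wait for 'drain'

  // ... buffer the chunk, or hand it to _write ...

  return ret; // becomes the return value of write()
}
```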
The return value of the writeOrBuffer function becomes the return value of write. While the buffer pool size is below highWaterMark, writeOrBuffer returns true, indicating that writing can continue.
Once highWaterMark is exceeded, writeOrBuffer returns false and sets needDrain to true.
You can then see, in the afterWrite check that runs after each write completes, that the 'drain' event is emitted when needDrain === true and state.length === 0 (that is, when the buffer pool data has been fully consumed).
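A simplified sketch of that check (again, not the exact source):

```js
function afterWrite(stream, state) {
  // Emit 'drain' only when a producer is waiting for it and the
  // buffer pool has been fully consumed.
  if (state.needDrain && state.length === 0) {
    state.needDrain = false;
    stream.emit('drain');
  }
}
```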
Summary
A writable stream keeps calling the write method to write data into the buffer pool, waits for the 'drain' event to continue writing when the pool is full, and calls the end method to finish the stream once all data has been written.
Duplex stream (Duplex)
Duplex is a stream that implements both Readable and Writable interfaces.
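A minimal sketch of a duplex stream (the data and logging are made up for illustration):

```js
const { Duplex } = require('stream');

const duplex = new Duplex({
  // Readable side: produces data.
  read() {
    this.push('data from the readable side');
    this.push(null); // end of the readable side
  },
  // Writable side: consumes data, independently of the readable side.
  write(chunk, encoding, callback) {
    console.log('writable side received:', chunk.toString());
    callback();
  },
});

duplex.on('data', (chunk) => console.log(chunk.toString()));
duplex.write('hello');
```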
As you can see, the Duplex class implements both Readable and Writable, so it can call both Readable and Writable methods.
Transform stream (Transform)
A transform stream is implemented on top of a duplex stream: on top of Duplex, Transform adds a transform function option, so data can be converted by that function as it is written.
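A minimal sketch of a transform stream (the uppercasing logic is made up for illustration):

```js
const { Transform } = require('stream');

const upper = new Transform({
  // transform() receives written data, converts it, and pushes the
  // result to the readable side.
  transform(chunk, encoding, callback) {
    callback(null, chunk.toString().toUpperCase());
  },
});

process.stdin.pipe(upper).pipe(process.stdout);
```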
Implementing custom streams
A custom stream inherits from one of the four basic stream classes and re-implements certain methods on top of that inheritance.
Most of the streams we use in Node.js are custom streams, such as fs file read streams, HTTP response streams, etc.
Node.js specifies that a new custom stream type must implement one or more methods corresponding to its inherited stream:
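Per the official documentation, the mapping is:
- Readable: _read()
- Writable: _write() (and optionally _writev(), _final())
- Duplex: _read() and _write()
- Transform: _transform() (and optionally _flush())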
Why does Node.js require these methods to be implemented?
For example, as mentioned above, a readable stream keeps calling the read method to read data after switching to flowing mode, and read in turn calls _read. Different custom streams can fetch data from different places in their _read implementations.
For example, fs.createReadStream’s _read method uses fs.read to read the contents of the file:
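A simplified sketch of the idea (not the actual fs internals):

```js
const { Readable } = require('stream');
const fs = require('fs');

class FileReadStream extends Readable {
  constructor(path, options) {
    super(options);
    this.fd = fs.openSync(path, 'r');
  }

  // Called by read() whenever the buffer pool needs more data.
  _read(size) {
    const buf = Buffer.alloc(size);
    fs.read(this.fd, buf, 0, size, null, (err, bytesRead) => {
      if (err) return this.destroy(err);
      // Push the bytes read from disk into the buffer pool;
      // pushing null signals end-of-stream.
      this.push(bytesRead > 0 ? buf.subarray(0, bytesRead) : null);
    });
  }
}
```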
Scenarios
The Stream module can be used in a wide variety of scenarios, such as proxy forwarding with pipe, streaming compression with zlib, and streaming reads of large files with fs.
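For example, gzip-compressing a large file without loading it into memory (the file names are made up):

```js
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');

pipeline(
  fs.createReadStream('./access.log'),
  zlib.createGzip(), // a Transform stream
  fs.createWriteStream('./access.log.gz'),
  (err) => {
    if (err) console.error('pipeline failed:', err);
  }
);
```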
In a word, we can consider using the Stream module to optimize any scenario that consumes a lot of memory during data transfer.