This is a study note for mobile engineers dabbling in front-end and back-end development. Please correct any mistakes or misunderstandings.

What is Node.js

Traditionally, JavaScript runs in the browser. The browser kernel is actually split into two parts: a rendering engine and a JavaScript engine. The former renders HTML + CSS, while the latter runs JavaScript. Chrome uses a JavaScript engine called V8, which is very fast.

Node.js is a runtime that runs on the server side and is built on the V8 engine. Just as Apache + PHP and Java servlets can be used to develop dynamic web pages, Node.js does the same, using JavaScript.

As a simple example, create a new app.js file and type the following:

var http = require('http');
http.createServer(function (request, response) {
    response.writeHead(200, {'Content-Type': 'text/plain'}); // HTTP Response header
    response.end('Hello World\n'); // Return data "Hello World"
}).listen(8888); // Listen on port 8888
// The terminal displays the following information
console.log('Server running at http://127.0.0.1:8888/');

That is all it takes to write a simple HTTP server. Run node app.js in a terminal, then visit the address to see the output.

Why Node.js

With any new technology, it is always good to ask a few whys. Why learn Node.js when PHP, Python, and Java can all be used for back-end development? At the very least, we should know when Node.js is the best choice.

In general, Node.js is suitable for the following scenarios:

  1. Real-time applications, such as online multiplayer collaboration tools and web chat applications.
  2. High-concurrency applications that focus on I/O, such as APIs that read a database on behalf of clients.
  3. Streaming applications, such as services whose clients frequently upload files.
  4. Applications with separated front ends and back ends.

In fact, the first two can be summed up as one: clients make heavy use of long-lived connections. The number of concurrent connections is high, but most of them are idle.

Node.js also has its limitations. It is not suitable for CPU intensive tasks, such as artificial intelligence computing, video and image processing, etc.

Of course, these strengths and weaknesses should not be taken on faith or memorized by rote. We need some understanding of how Node.js works under the hood in order to judge for ourselves.

Basic concepts

Before diving into Node.js, a few basic concepts will help you understand it more deeply.

Concurrency

Unlike on the client side, the number of concurrent requests a server can support is one of the figures server developers care about most. The early C10K problem discussed how to support 10K concurrent connections on a single server. With improvements in hardware and software, C10K is no longer an issue, and we are now trying to solve the C10M problem: how a single server can handle millions of concurrent connections.

When C10K was proposed, the Apache server was still dominant. It worked by forking a child process whenever a network request arrived and running the PHP script inside that child process; once the script finished, the result was sent back to the client.

This ensures that different processes do not interfere with each other; even if one process crashes, it does not bring down the entire server. The disadvantages are just as obvious: a process is a heavyweight construct with its own heap and stack, so it takes up a lot of memory, and a single server can only run a few thousand processes.

Apache later adopted FastCGI, but that was essentially a process pool: it reduced the cost of creating processes without significantly raising the number of concurrent connections.

Java servlets use a thread pool, meaning each servlet request runs on its own thread. Threads are lighter than processes, but only relatively so: each thread's stack typically takes around 1 MB, which is still far from efficient. In addition, multithreaded programming brings all kinds of problems, as programmers know all too well.

If threads are not used, two other solutions remain: coroutines and non-blocking I/O. Coroutines are lighter than threads; many coroutines can run in the same thread and are scheduled by the programmer, a technique used widely in the Go language. Non-blocking I/O is what Node.js uses to handle high-concurrency scenarios.

Non-blocking I/O

I/O comes in two main kinds, network I/O and file I/O, which behave very similarly. Either can be divided into two steps. First, the data is copied from the file (or the network) into a buffer that lives in memory owned exclusively by the operating system. Then the contents of that buffer are copied into the memory of the user program.

With blocking I/O, both steps block: the process blocks from the moment it issues the read request, first until the kernel buffer is ready and then until the data has been copied into the user process.

With non-blocking I/O, the process essentially polls the kernel to check whether the buffer is ready, and if not, it can carry on with other work. Once the buffer is ready, however, copying its contents into the user process still blocks.

I/O multiplexing means using a single thread to handle multiple network I/O operations; select and epoll, which often come up here, are the functions used to poll all the sockets. Apache uses the former, while Nginx and Node.js use the latter, which is more efficient. Because I/O multiplexing really is a single thread polling, it is also a non-blocking I/O solution.

Asynchronous I/O is the ideal I/O model, but truly asynchronous I/O is hard to come by. AIO on Linux passes data through signals and callbacks, but it is flawed. The existing libeio, and IOCP on Windows, essentially simulate asynchronous I/O using thread pools and blocking I/O.

Node.js threading model

Many articles state that Node.js is single-threaded, but that statement is imprecise, even irresponsible. We should at least ask the following questions:

  1. How does Node.js handle concurrent requests in one thread?
  2. How does Node.js perform asynchronous file I/O in one thread?
  3. How does Node.js make use of all the CPU cores on a server?

Network I/O

Node.js can indeed handle a large number of concurrent requests in a single thread, but this requires some programming skill. Look back at the code at the beginning of this article: the console prints its message as soon as we execute app.js, and we only see "Hello World" when we visit the page.

This is because Node.js is event-driven: the callback function is only executed when a network request event arrives. When multiple requests arrive, they are queued and wait their turn to be executed.

This may seem obvious, but failing to realize that Node.js runs on a single thread and executes callbacks synchronously, and then developing the program in the traditional fashion, can cause serious problems. For a simple example, the "Hello World" string here might be the result produced by some other module. If generating "Hello World" is time-consuming, it blocks the callback of the current network request and prevents the next network request from being answered.

The solution is simple: use an asynchronous callback mechanism. We can pass the response parameter used to generate the output to another module, generate the output asynchronously, and finally perform the actual output in the callback function. The advantage is that http.createServer's own callback does not block, so requests never go unanswered.

For example, we can modify the server entry point; if we wanted to implement routing ourselves, it would follow much the same idea:

var http = require('http');
var output = require('./string'); // a separate module (string.js)
http.createServer(function (request, response) {
    output.output(response); // Hand the response off to the module for output
}).listen(8888);

The module (string.js):

function sleep(milliSeconds) {  // Simulate a time-consuming, blocking operation
    var startTime = new Date().getTime();
    while (new Date().getTime() < startTime + milliSeconds);
}

function outputString(response) {
    sleep(10000);  // Block for 10 s
    response.end('Hello World\n'); // Perform the time-consuming operation first, then output
}

exports.output = outputString;

In summary, when programming with Node.js, any time-consuming operation must be performed asynchronously to avoid blocking the current function. After all, you are serving many clients, and all the code runs on a single thread, sequentially.
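For contrast, here is a minimal sketch of the same module written asynchronously; setTimeout merely stands in for any genuinely asynchronous time-consuming operation (a database query, a file read, and so on):

function outputStringAsync(response) {
    // Schedule the output for 10 s later without blocking the event loop,
    // so other requests can be served in the meantime
    setTimeout(function () {
        response.end('Hello World\n');
    }, 10000);
}

exports.output = outputStringAsync;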

For beginners who still find this confusing, I recommend reading "Nodejs 101" or the section on the event loop below.

File I/O

As I have emphasized in previous posts, asynchrony is about optimizing the experience and avoiding stalls. To genuinely save processing time and exploit the CPU's multiple cores, you still have to rely on multi-threaded parallel processing.

Node.js actually maintains a thread pool under the hood. As mentioned in the Basic concepts section, there is no true asynchronous file I/O, so it is usually simulated with a thread pool. By default, the pool contains four threads for file I/O.
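A quick sketch to make this concrete: fs.readFile dispatches the read to a thread-pool thread and returns immediately, so the main thread stays free (file.txt is just a placeholder name here):

var fs = require('fs');

// The read happens on a thread-pool thread; the callback runs
// back on the main thread once the data is ready
fs.readFile('file.txt', 'utf8', function (err, data) {
    if (err) throw err;
    console.log('File contents:', data);
});

console.log('This line prints before the file contents');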

Note that we cannot manipulate the underlying thread pool directly, and normally we do not need to care that it exists. The thread pool is only used to perform I/O operations, not CPU-intensive ones such as image or video processing and large-scale computation.

If we have a small number of CPU-intensive tasks to handle, we can start several Node.js processes and use IPC for inter-process communication, or call out to external C++/Java programs. If CPU-intensive tasks dominate, Node.js is a poor choice.
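As an illustration, a CPU-heavy task can be pushed off to a separate Node.js process with the built-in child_process module. This is only a sketch; worker.js is a hypothetical file that would do the heavy computation and report back via process.send:

var child_process = require('child_process');

// fork() starts a new Node.js process running worker.js and
// opens an IPC channel between parent and child
var worker = child_process.fork('./worker.js');

worker.send({ task: 'heavy-computation' });
worker.on('message', function (result) {
    console.log('Result from worker:', result);
});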

Drain the CPU

So far, we know that Node.js uses I/O multiplexing to handle network I/O on a single thread, and simulates asynchronous file I/O with a small thread pool. Does that mean that on a 32-core CPU, Node.js's single thread just looks lame?

The answer is no: we can start multiple Node.js processes. Unlike in the previous section, these processes do not need to communicate with each other; each one listens on its own port, with Nginx doing load balancing at the outermost layer.
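A minimal sketch of such a setup, assuming each process is told its port through an environment variable (so the same app.js can be launched once per core):

var http = require('http');

// Start one copy per port, e.g.:
//   PORT=3000 node app.js
//   PORT=3001 node app.js
var port = process.env.PORT || 3000;

http.createServer(function (request, response) {
    response.end('Hello from port ' + port + '\n');
}).listen(port);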

Nginx load balancing is very easy to implement by editing the configuration file:

http {
    upstream sampleapp {
        # Optional load-balancing directives, e.g. least_conn or ip_hash
        server 127.0.0.1:3000;
        server 127.0.0.1:3001;
        # ... more ports
    }
    ...
    server {
        listen 80;
        ...
        location / {
            # Listen on port 80 and forward to the upstream group
            proxy_pass http://sampleapp;
        }
    }
}

The default load-balancing rule is to hand network requests to each port in turn (round-robin). The least_conn directive instead forwards each request to the Node.js process with the fewest active connections, while ip_hash ensures that requests from the same IP address are always handled by the same Node.js process.
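For example, switching to the least-connections strategy takes only one extra directive in the upstream block (a sketch):

upstream sampleapp {
    least_conn;              # pick the backend with the fewest connections
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}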

Multiple Node.js processes can make full use of a multi-core CPU, and the approach scales out easily.

Event loop

There is an Event Loop in Node.js, which may feel familiar to those with iOS development experience. Yes, it is somewhat similar to a RunLoop.

A complete iteration of the Event Loop is divided into multiple phases, including timers, I/O callbacks, idle/prepare, poll, check, and close callbacks.

Since Node.js is event-driven, the callback for each event is registered with a particular phase of the Event Loop. For example, the fs.readFile callback is handled in the I/O callbacks phase, the setImmediate callback runs in the check phase, and process.nextTick() callbacks run at the end of the current phase, before the loop moves on to the next one.

It is important to understand that callbacks for different asynchronous methods execute in different phases; otherwise, logic errors will creep in because of the invocation order.
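A tiny sketch of the ordering (the expected output is noted in the comments; setTimeout is deliberately avoided here because its ordering relative to setImmediate in the main module is not guaranteed):

console.log('synchronous');        // 1: ordinary code runs first

process.nextTick(function () {
    console.log('nextTick');       // 2: runs before the loop enters its next phase
});

setImmediate(function () {
    console.log('setImmediate');   // 3: runs in the check phase
});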

The Event Loop is a continuous loop that synchronously executes all the callbacks registered for each phase. This is why I said in the network I/O section that you should never call blocking methods inside callbacks and should always handle time-consuming operations asynchronously: a callback that runs too long can pin the Event Loop in one phase, so that incoming network requests cannot be answered in time.

Since the purpose of this article is a preliminary but broad understanding of Node.js, I won't go into the details of each phase of the Event Loop; you can check the official documentation for them.

Node.js encapsulates EventEmitter to make it easier to use event-driven ideas.

var EventEmitter = require('events');
var util = require('util');

function MyThing() {
    EventEmitter.call(this);

    setImmediate(function (self) {
        self.emit('thing1');
    }, this);
    process.nextTick(function (self) {
        self.emit('thing2');
    }, this);
}
util.inherits(MyThing, EventEmitter);

var mt = new MyThing();

mt.on('thing1', function onThing1() {
    console.log("Thing1 emitted");
});

mt.on('thing2', function onThing2() {
    console.log("Thing2 emitted");
});

Although self.emit('thing2') is registered later, it executes first, so "Thing2 emitted" is printed before "Thing1 emitted". This follows the Event Loop's calling rules: process.nextTick callbacks run before the loop reaches the check phase, where setImmediate callbacks fire.

Many modules in Node.js inherit from EventEmitter, such as fs.createReadStream in the next section, which creates a readable file stream that emits events when the file is opened, when data is read, and when reading finishes.

Data streams

The benefits of using streams are clear and true to life. If a teacher assigns homework for the summer vacation, a student who does a little every day finishes easily; one who piles it all up will feel powerless facing a mountain of exercise books on the last day.

The same goes for server development. Suppose a user uploads a 1 GB file, or we read a 1 GB file locally. Without the concept of a stream, we would need to allocate a 1 GB buffer and process the content all at once when the buffer is full.

With a stream, we can define a very small buffer, say 1 MB. Each time the buffer fills up, the callback function is executed to process that small chunk of data, avoiding any backlog.

Both the HTTP request object and the fs module actually expose data as readable streams:

var fs = require('fs');
var readableStream = fs.createReadStream('file.txt');
var data = '';

readableStream.setEncoding('utf8');
// A small chunk of data is processed each time the buffer fills up
readableStream.on('data', function(chunk) {
    data += chunk;
});
// The file stream has been fully read
readableStream.on('end', function() {
    console.log(data);
});

With pipes, you can write content from one stream to another:

var fs = require('fs');
var readableStream = fs.createReadStream('file1.txt');
var writableStream = fs.createWriteStream('file2.txt');

readableStream.pipe(writableStream);

Different streams can also be chained, such as reading a compressed file, decompressing it as it reads, and writing the decompressed content to the file:

var fs = require('fs');
var zlib = require('zlib');

fs.createReadStream('input.txt.gz')
  .pipe(zlib.createGunzip())
  .pipe(fs.createWriteStream('output.txt'));

Node.js provides very concise stream operations; the above is only a brief introduction to their use.

Conclusion

For high concurrency over long-lived connections, the event-driven model is much lighter than threads, and multiple Node.js processes behind a load balancer scale out easily. Node.js is therefore well suited to I/O-intensive applications; the flip side is that it is not good at CPU-intensive tasks.

Node.js typically models data as streams and provides a clean encapsulation for them.

Node.js is developed using the front-end language (JavaScript) and is also a back-end server, so it provides a good idea for the separation of the front and back ends. I will examine this in the next article.

References

  1. Concurrent tasks in Node.js
  2. Using Nginx to add load balancing to Node.js
  3. Understanding the Node.js event loop
  4. The Node.js Event Loop
  5. The Basics of Node.js Streams