• Good Practices for High-performance and Scalable Node.js Applications [Part 1/3]
  • Virgafox
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: jianboy
  • Proofreader: UNICar9

In this three-part series, we’ll cover some good practices for developing Node.js Web back-end applications.

This series will not be a basic tutorial on Node; everything you will read is intended for developers already familiar with the basics of Node.js to help them improve their application architecture.

This article focuses on efficiency and performance to get the best results with fewer resources.

One way to increase the throughput of a web application is to scale it, running multiple instances that handle incoming requests in parallel, so the first article in this series will show you how to scale a Node.js application horizontally on multiple cores or machines.

When you scale, you have to be careful with different aspects of your application, such as state and authentication, so the second article will cover some considerations you must take into account when scaling Node.js applications.

Beyond these mandatory considerations, the third article will introduce some recommended practices, such as splitting API and worker processes, adopting priority queues, and managing periodic jobs such as cron processes, which are not meant to run N times when you scale out to N processes/machines.

Chapter 1 — Scaling Node.js applications horizontally

Horizontal scaling is about replicating application instances to handle a large number of incoming requests. This operation can be performed on a multi-core computer or on different computers.

Vertical scaling is about increasing the performance of a single machine, and it doesn’t involve specific work on the code side.

Multiple processes on the same machine

A common way to increase application throughput is to spawn one process for each core of the machine. In this way, the already efficient “concurrent” request management of Node.js (see “Event-driven, non-blocking I/O”) can be multiplied and parallelized.

Spawning more processes than there are cores is probably not a good idea, because at a lower level the operating system will likely have to balance CPU time between these processes.
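As a rough illustration of this rule of thumb, the sketch below caps a requested number of processes at the number of cores reported by the operating system. The helper name workerCount and its fallback behaviour are assumptions made for this example, not part of Node.js.

```javascript
const os = require('os');

// Number of logical cores reported by the OS (at least 1, in case the
// environment reports none).
const availableCores = Math.max(1, os.cpus().length);

// Hypothetical helper: never spawn more workers than there are cores.
// An invalid request (zero, negative, non-integer) falls back to one
// worker per core.
function workerCount(requested, cores = availableCores) {
  if (!Number.isInteger(requested) || requested < 1) {
    return cores;
  }
  return Math.min(requested, cores);
}

console.log(`cores: ${availableCores}, workers for a request of 64: ${workerCount(64)}`);
```

On a 4-core machine, a request for 64 workers would be reduced to 4, while a request for 2 would be left untouched.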

There are different strategies for scaling on a single machine, but a common one is to run multiple processes on the same port and use load balancing to distribute incoming requests across all processes/cores.

The strategies described below are the standard Node.js cluster mode and the automatic, higher-level PM2 cluster functionality.

Native clustering mode

The native Node.js cluster module is the basic way to scale a Node application on a single machine (nodejs.org/api/cluster…). One instance of your process (called the “master”) is responsible for spawning the other child processes (called “workers”), one for each instance of your application. Incoming requests are distributed among all workers with a round-robin policy and are served on the same port.
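To make the round-robin idea concrete, here is a minimal sketch of the policy itself: each new request goes to the next worker in the list, wrapping around at the end. It only illustrates the concept and is not how the cluster module is implemented internally; createRoundRobin and the worker names are hypothetical.

```javascript
// Hypothetical helper: returns a function that picks the next worker in turn.
function createRoundRobin(workers) {
  if (workers.length === 0) {
    throw new Error('at least one worker is required');
  }
  let index = 0;
  return function next() {
    const worker = workers[index];
    index = (index + 1) % workers.length; // wrap around after the last worker
    return worker;
  };
}

// Assign five incoming "requests" to three workers.
const next = createRoundRobin(['worker-1', 'worker-2', 'worker-3']);
const assigned = [];
for (let request = 0; request < 5; request++) {
  assigned.push(next());
}
console.log(assigned.join(', '));
// worker-1, worker-2, worker-3, worker-1, worker-2
```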

The main disadvantage of this approach is that you have to manage the difference between the master and worker processes within your code, usually with a classic if-else block, and you can’t easily change the number of processes on the fly.

The following example is taken from the official documentation:

const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection
  // In this case it is an HTTP server
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);

  console.log(`Worker ${process.pid} started`);
}
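The exit handler in the official example only logs that a worker died. If you want the pool to stay at full size, one possible approach is to fork a replacement from that handler. The sketch below assumes this is the desired behaviour; keepWorkersAlive and FakeCluster are hypothetical names, and the fake object only stands in for the cluster module so the snippet runs without spawning real processes.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: fork `count` workers and replace any worker that exits.
// `clusterLike` is any object with fork() and an 'exit' event, such as the
// `cluster` module when this is called from the master process.
function keepWorkersAlive(clusterLike, count) {
  for (let i = 0; i < count; i++) {
    clusterLike.fork();
  }
  clusterLike.on('exit', (worker, code, signal) => {
    console.log(`worker exited (code: ${code}, signal: ${signal}), forking a replacement`);
    clusterLike.fork();
  });
}

// Stand-in for the cluster module: it only counts fork() calls.
class FakeCluster extends EventEmitter {
  constructor() {
    super();
    this.forks = 0;
  }

  fork() {
    this.forks += 1;
  }
}

const fake = new FakeCluster();
keepWorkersAlive(fake, 2);
fake.emit('exit', {}, 1, null); // simulate one worker crashing
console.log(`forks so far: ${fake.forks}`); // forks so far: 3
```

In a real application the call would be keepWorkersAlive(cluster, numCPUs) inside the cluster.isMaster branch. Note that a worker which crashes immediately at startup would be re-forked in a loop, so a production setup would probably want a restart limit or a delay.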

PM2 cluster mode

If you use PM2 as a process manager (which I recommend), there is a magic cluster feature that lets you scale your process across all the cores without worrying about the cluster module. The PM2 daemon acts as the “master” and spawns N child processes as workers, then uses the round-robin algorithm for load balancing.

In this way, you can write your application as if it were meant for single-core usage (we’ll cover some of the considerations in the next article), and PM2 will take care of the multi-core part.

After starting the application in cluster mode, you can use the “pm2 scale” command to adjust the number of instances on the fly, and perform a “0-second-downtime” reload, in which processes are restarted one after another so that at least one process is always online.

As a process manager, PM2 also takes care of restarting your processes if they crash, along with many other things that are useful when running Node in production.

If you need to scale further, you may need to deploy more machines.

Multi-server network load balancing

Scaling across multiple machines is similar to scaling across multiple cores: there are multiple machines, each running one or more processes, and a load balancer that redirects traffic to each machine.

Once a request has been sent to a particular node, that machine’s internal balancer sends the traffic to a specific process, as described in the previous section.

A network load balancer can be deployed in different ways. If you use AWS to provision your infrastructure, a good choice is a managed load balancer such as ELB (Elastic Load Balancer), because it supports useful features such as auto-scaling and is easy to set up.

If you prefer to do it yourself, you can deploy your own machine and set up a load balancer with NGINX. Configuring an NGINX reverse proxy for load balancing is very simple. Here is an example configuration:

http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
        server srv3.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

In this way, the load balancer becomes the only entry point through which your application is exposed to the outside world. If you are concerned about it being a single point of failure, you can deploy multiple load balancers pointing to the same servers.

To distribute traffic between the load balancers (each with its own IP address), you can add multiple DNS “A” records to the primary domain, so that DNS resolution distributes traffic between the load balancers you configured, resolving to a different IP each time.

In this way, you can also achieve redundancy on load balancing servers.

The next step

We’ve seen here how to scale a Node.js application at different levels to get the best possible performance out of your system architecture, from a single node to multiple nodes and multiple load balancers. But be careful: an application must be ready to run in a multi-process environment, or you’ll run into a lot of problems.

In the next article, we’ll cover some considerations for getting your application ready to scale. You can find it here.


