Service development

Application scenarios of Node.js

General scenarios

  • Generic HTTP/RPC interfaces
  • Scheduled tasks and queue-consumer tasks
  • Most kinds of business logic

Scenarios where Node.js has an advantage

  • BFF (Backend for Frontend), the front end's “glue layer”
  • SSR (server-side page rendering)
  • Isomorphic web applications
  • Real-time communication services (WebSocket)

Node.js can handle most tasks closely related to the front end, and it has an advantage in many scenarios. At present, the WebSocket services underlying social and live-streaming products are built on the I/O library provided by Node.js.

So how do we guarantee stability and performance in complex business scenarios?

Stability and performance

The main stability indicator is the SLA (Service Level Agreement), expressed as a number of nines of availability. With 365 × 24 = 8760 hours in a year:

  • Three nines, 99.9%: 8760 × 0.1% = 8.76 hours
  • Four nines, 99.99%: 8760 × 0.01% = 0.876 hours ≈ 52.6 minutes
  • Five nines, 99.999%: 8760 × 0.001% = 0.0876 hours ≈ 5.26 minutes

In other words, the service may be down for at most 8.76 hours, 52.6 minutes, and 5.26 minutes per year respectively. Downtime means the service is broken and no business can run on it, so the shorter the downtime the better; five nines is the highest of these standards.
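The downtime arithmetic above can be checked with a few lines of JavaScript. This is a minimal sketch that assumes a 365-day year (8760 hours) and ignores leap years. `allowedDowntime` is a hypothetical helper name.

```javascript
// Allowed downtime per year for a given availability (SLA) percentage.
// Assumes a 365-day year (8760 hours) and ignores leap years.
const HOURS_PER_YEAR = 365 * 24

function allowedDowntime(availabilityPercent) {
  const unavailableFraction = 1 - availabilityPercent / 100
  const hours = HOURS_PER_YEAR * unavailableFraction
  return { hours, minutes: hours * 60 }
}

for (const sla of [99.9, 99.99, 99.999]) {
  const { hours, minutes } = allowedDowntime(sla)
  console.log(`${sla}%: ${hours.toFixed(4)} hours = ${minutes.toFixed(2)} minutes`)
}
```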

Performance indicators

  • Response time / RTT (time from sending a request to receiving the response)
  • QPS/TPS (maximum number of queries or transactions the service can process per second)
  • Concurrency (how many requests the service can handle at the same time)
  • Error rate (share of responses whose status code is not 200, or number of caught exceptions)
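Here is a sketch of how the request-level indicators above could be computed from a batch of request records. The record shape `{ startMs, endMs, status }` and the `summarize` function are assumptions for illustration. QPS here is the average throughput over the sampling window, not the service's maximum capacity, which is normally found with a load test.

```javascript
// Each record is assumed to look like { startMs, endMs, status }.
function summarize(requests, windowMs) {
  const n = requests.length
  if (n === 0) return { avgRtMs: 0, qps: 0, errorRate: 0, peakConcurrency: 0 }

  // Response time: time from request start to response received
  const totalRtMs = requests.reduce((sum, r) => sum + (r.endMs - r.startMs), 0)

  // Error rate: share of responses whose status is not 200
  const errors = requests.filter((r) => r.status !== 200).length

  // Peak concurrency: sweep start (+1) and end (-1) events in time order;
  // at equal timestamps, ends sort first so back-to-back requests don't overlap
  const events = []
  for (const r of requests) {
    events.push([r.startMs, 1], [r.endMs, -1])
  }
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1])
  let inFlight = 0
  let peak = 0
  for (const [, delta] of events) {
    inFlight += delta
    peak = Math.max(peak, inFlight)
  }

  return {
    avgRtMs: totalRtMs / n,
    qps: n / (windowMs / 1000), // average throughput over the window
    errorRate: errors / n,
    peakConcurrency: peak,
  }
}

const sample = [
  { startMs: 0, endMs: 100, status: 200 },
  { startMs: 50, endMs: 150, status: 500 },
  { startMs: 200, endMs: 260, status: 200 },
  { startMs: 900, endMs: 1000, status: 200 },
]
console.log(summarize(sample, 1000))
```

With this sample, the average response time is 90 ms and the throughput is 4 requests per second. The error rate is 0.25, and peak concurrency is 2 because only the first two requests overlap.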

Resource indicators

  • CPU load / CPU usage
  • Memory Usage
  • FD Count (number of file descriptors)
  • Disk Read/Write
  • Network Send/Recv (Network I/O throughput)

Stability guarantee

Ensuring service stability during development

Exception handling

Errors can come from careless code, from a third-party library we depend on, or from other causes. By catching process-level exceptions, we can record the cause of each error in the logs, which helps us track the problem down.

process
    .on('unhandledRejection', (reason) => {
        // send error logs
        // ...
    })
    .on('uncaughtException', (err) => {
        // send error logs
        // ...
        // the process may now be in an inconsistent state, so exit and let it be restarted
        process.exit(1)
    })
Automatic restart

With the cluster module, the primary (parent) process listens for worker (child) processes exiting and automatically forks a new worker to replace one that crashed. In a multi-process architecture, this means a single failed process is restarted quickly and the service avoids downtime.

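const cluster = require('cluster')

// in cluster mode, in the primary process
cluster.on('exit', (worker, code, signal) => {
    // exitedAfterDisconnect is true when the worker exited after an intentional
    // .disconnect(), so only unexpected exits trigger a restart
    if (!worker.exitedAfterDisconnect) {
        // send error logs
        cluster.fork()
    }
})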
Health check

A health check can also be implemented in code. It periodically opens a socket connection to check whether the service is still available, so that anomalies in a running process are discovered promptly.

import * as net from 'net'

// check whether the port is available
function connectToPort(port: number, host?: string, timeout?: number): Promise<void> {
    return new Promise((resolve, reject) => {
        const socket = new net.Socket()
        const onError = (message: string) => {
            socket.destroy()
            reject(new Error(message))
        }

        socket.setTimeout(timeout ?? 1000)
        socket.once('error', (e) => onError(e.message))
        socket.once('timeout', () => onError('TIMEOUT'))
        socket.connect({ port, host }, () => {
            socket.end()
            resolve()
        })
    })
}
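The port check above can be driven by a timer so that it runs periodically. The sketch below is a plain JavaScript variant that resolves to `true`/`false` instead of rejecting. `checkPort`, `startHealthCheck`, and the 5-second default interval are illustrative assumptions.

```javascript
const net = require('net')

// Resolves true if a TCP connection to host:port succeeds within timeoutMs.
function checkPort(port, host = '127.0.0.1', timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = new net.Socket()
    const finish = (ok) => {
      socket.destroy()
      resolve(ok)
    }
    socket.setTimeout(timeoutMs)
    socket.on('error', () => finish(false))
    socket.once('timeout', () => finish(false))
    socket.connect({ port, host }, () => finish(true))
  })
}

// Probes the port every intervalMs and passes each result to onResult.
// Returns a function that stops the periodic check.
function startHealthCheck({ port, host, intervalMs = 5000, onResult }) {
  const timer = setInterval(async () => {
    const ok = await checkPort(port, host)
    onResult({ ok, checkedAt: Date.now() })
  }, intervalMs)
  return () => clearInterval(timer)
}
```

In practice, `onResult` would log a failure or raise an alert. What to do after repeated failures is left to the caller: restart the process, alert, or remove the instance from the load balancer.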

Performance guarantee

Node.js metrics tools

Get metrics using the Node.js API

CPU Usage

  • User CPU time
  • System CPU Time

Memory Usage

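  • heapTotal / heapUsed (memory occupied by the V8 heap)
  • external (memory of C++ objects bound to JavaScript objects, outside the V8 heap)
  • arrayBuffers (memory allocated for ArrayBuffers and SharedArrayBuffers, including Buffers)
  • rss (resident set size: the memory the process occupies in RAM)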

IO Usage

  • fsRead/fsWrite
  • ipcSent/ipcReceived
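
const {
    cpuUsage,
    memoryUsage,
    resourceUsage,
} = require('process')

console.log(cpuUsage())
// Prints (example, in microseconds): { user: 38579, system: 6986 }
console.log(memoryUsage())
// Prints (example, in bytes):
// {
//   rss: 4935680,
//   heapTotal: 1826816,
//   heapUsed: 650472,
//   external: 48979,
//   arrayBuffers: 9386
// }

// I/O usage: file system reads/writes and IPC messages sent/received
const { fsRead, fsWrite, ipcSent, ipcReceived } = resourceUsage()
console.log({ fsRead, fsWrite, ipcSent, ipcReceived })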
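`cpuUsage()` returns cumulative microseconds, so a usage percentage only makes sense as the difference between two samples. This is a minimal sketch. `createSampler` is a hypothetical helper, and the MB conversion is only for readability.

```javascript
const { cpuUsage, memoryUsage, hrtime } = require('process')

const MB = 1024 * 1024

// Returns a sample() function; each call reports CPU usage since the previous call.
function createSampler() {
  let lastCpu = cpuUsage()
  let lastTime = hrtime.bigint()

  return function sample() {
    const cpu = cpuUsage()
    const now = hrtime.bigint()
    const elapsedUs = Number(now - lastTime) / 1000 // hrtime is in nanoseconds
    const usedUs = (cpu.user - lastCpu.user) + (cpu.system - lastCpu.system)
    lastCpu = cpu
    lastTime = now

    const mem = memoryUsage()
    return {
      // can exceed 100 because work on other threads (GC, libuv pool) is counted too
      cpuPercent: elapsedUs > 0 ? (usedUs / elapsedUs) * 100 : 0,
      rssMB: mem.rss / MB,
      heapUsedMB: mem.heapUsed / MB,
      heapTotalMB: mem.heapTotal / MB,
    }
  }
}

const sampler = createSampler()
setTimeout(() => console.log(sampler()), 100)
```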

Get service container metrics from system commands

The Linux task manager (such as top) shows a series of metrics for the current container.

You can also list all file descriptors that a process currently has open.
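On Linux, the same information is exposed under `/proc`, where every entry in `/proc/<pid>/fd` is one open file descriptor. Below is a minimal, Linux-only sketch that a service could report as its FD Count metric. `countOpenFds` is a hypothetical helper.

```javascript
const fs = require('fs')

// Counts open file descriptors of a process via /proc (Linux only).
// Returns null where /proc is unavailable (e.g. macOS, Windows) or not readable.
function countOpenFds(pid = process.pid) {
  try {
    // note: reading the directory itself briefly opens one extra descriptor,
    // which may be included in the count
    return fs.readdirSync(`/proc/${pid}/fd`).length
  } catch {
    return null
  }
}

console.log('open fds:', countOpenFds())
```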

Data collection and visualization

Build common real-time metric reporting into the server or container image, then collect the data and display it on visual dashboards. This helps us keep track of the current state of the service.

Example: a Grafana dashboard built from reported metrics
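Here is a minimal sketch of the reporting side: sample a few metrics on a timer and hand them to a `report` callback. The callback is a placeholder for whatever actually ships data to the collection backend, such as an HTTP endpoint or a UDP agent. `startMetricsReporter` and the field names are assumptions, not a real library API.

```javascript
const os = require('os')

// Periodically samples process/host metrics and passes them to report().
// Returns a function that stops reporting.
function startMetricsReporter(report, intervalMs = 10000) {
  const timer = setInterval(() => {
    const mem = process.memoryUsage()
    report({
      host: os.hostname(),
      pid: process.pid,
      timestamp: Date.now(),
      rssBytes: mem.rss,
      heapUsedBytes: mem.heapUsed,
      loadAvg1m: os.loadavg()[0], // always 0 on Windows
    })
  }, intervalMs)
  timer.unref() // reporting alone should not keep the process alive
  return () => clearInterval(timer)
}
```

In a long-running server, calling `startMetricsReporter((m) => console.log(JSON.stringify(m)))` once at startup would be enough. The dashboard then queries whatever store the reporter writes to.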

Release and deployment

Early deployment approach

  • Purchase or lease a server/public IP address

  • Install the operating system, then set up the internal network environment and a series of infrastructure tools

  • Upload the production code package via FTP or rsync

  • Run the startup command in the corresponding path

    • Earlier implementations ran the service as a daemon through nohup

      NODE_ENV=production nohup node test.js 2>&1 &
    • Run through PM2 or framework-specific startup scripts

      pm2 start app.js -i max
  • Purchase a domain name, configure DNS, and set up a reverse proxy to the service

Everything is done by hand, which is very unfriendly to developers.
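With PM2, the `pm2 start app.js -i max` options can also be kept in a config file, so the startup command does not have to be retyped by hand. Below is a sketch of such an `ecosystem.config.js`. It assumes an entry file named `app.js`, and the app name is hypothetical.

```javascript
// ecosystem.config.js: start with `pm2 start ecosystem.config.js --env production`
const config = {
  apps: [
    {
      name: 'my-app',         // hypothetical process name
      script: 'app.js',       // entry file, as in `pm2 start app.js`
      instances: 'max',       // one process per CPU core, like `-i max`
      exec_mode: 'cluster',   // run the instances with Node.js cluster mode
      env_production: {
        NODE_ENV: 'production', // applied when started with `--env production`
      },
    },
  ],
}

module.exports = config
```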

IaaS/PaaS/FaaS development path

IaaS: The early stage of cloud computing

Infrastructure as a Service (IaaS) is a model in which a cloud service vendor provides consumers with processing, storage, networking, and other basic computing resources. Consumers can deploy and run arbitrary software on these resources, such as operating systems or applications.

IaaS is the lowest layer of cloud services and mainly provides basic resources. Users obtain processing, storage, networking, and other basic computing resources without buying servers, network devices, or software themselves. They cannot manage or control the underlying infrastructure, but they do control the operating systems, storage, and deployed applications.

Representative products (virtual hosts / VPS)
  • AWS EC2
  • Aliyun ECS
  • Tencent Cloud server
Technical implementation
  • Virtual machines (KVM/OpenVZ/Hyper-V)
  • OpenStack
  • Docker

PaaS: Mainstream application hosting distribution mode

Platform as a Service (PaaS) is a cloud computing service that provides computing platforms and solution stacks.

PaaS provides a software deployment platform (runtime) that abstracts away hardware and operating system details and scales seamlessly. Developers only need to focus on their own business logic, not on the underlying layers.

Representative products
  • Google AppEngine
  • Heroku
  • AWS Elastic Beanstalk
  • Vercel
Technical implementation
  • Docker Swarm

  • Kubernetes

    • Service orchestration
    • Elastic scaling (scaling out and in)
    • ...
PaaS-based release process

Most PaaS platforms support running Node.js services.

We write the application according to the Node.js runtime specification provided by the PaaS platform. The specification covers the build script (npm install), the startup script (npm start), and the application configuration file (app.yml).
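Here is a minimal sketch of a Node.js app written to such a runtime contract. It assumes the platform passes the listening port through the `PORT` environment variable, as Heroku does, for example. `createApp`, `start`, and the 3000 fallback are illustrative.

```javascript
const http = require('http')

function createApp() {
  return http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' })
    res.end('hello from PaaS')
  })
}

// Listen on the port the platform injects; fall back to 3000 for local runs.
function start(port = Number(process.env.PORT) || 3000) {
  const server = createApp()
  server.listen(port, () => console.log(`listening on ${server.address().port}`))
  return server
}

module.exports = { createApp, start }
```

The `start` script in package.json (what `npm start` runs) would then point at an entry file that calls `start()`. For example, it could run `node index.js`, with `index.js` containing only `require('./server').start()`.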

With Vercel, for example, you can connect a Git repository and release directly from it.

When the basic runtime does not meet the application's requirements, PaaS platforms also allow custom features and startup commands to be defined through a Dockerfile.

Taking Heroku as an example, a container can be released from the CLI.

PaaS and DevOps

DevOps (a portmanteau of Development and Operations) is a culture, movement, or practice that values communication and collaboration between software developers and IT operations staff. It aims to build, test, and release software quickly, frequently, and reliably by automating software delivery and architecture changes.

Modern PaaS platforms provide a basic DevOps workflow. Binding a Git branch automatically integrates and releases it to a preview environment, which greatly simplifies taking a release from testing to production.

PaaS and auto-scaling

Thanks to Kubernetes and the common PaaS application runtime, modern PaaS services let you define instance specifications. They can rapidly scale out when requests surge or CPU/memory is tight, and scale in when requests are few and resources are plentiful, which reduces maintenance and service costs.

Rapid scaling depends on the runtime monitoring performance indicators and reporting the data.
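The following shows how reported metrics can turn into a scaling decision. Kubernetes' Horizontal Pod Autoscaler documents the rule `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`, and the sketch below applies that rule. The min/max bounds are illustrative, and the 0.1 tolerance mirrors the HPA default. Real platforms add further behavior, such as stabilization windows, that is left out here.

```javascript
// Scaling decision based on the Kubernetes HPA formula:
// desired = ceil(current * currentMetric / targetMetric), clamped to [min, max].
function desiredReplicas(
  currentReplicas,
  currentMetric,
  targetMetric,
  { min = 1, max = 10, tolerance = 0.1 } = {},
) {
  const ratio = currentMetric / targetMetric
  // Close enough to the target: keep the current count to avoid flapping
  if (Math.abs(ratio - 1) <= tolerance) return currentReplicas
  const desired = Math.ceil(currentReplicas * ratio)
  return Math.min(max, Math.max(min, desired))
}

console.log(desiredReplicas(4, 90, 60)) // → 6 (CPU at 90% vs. a 60% target: scale out)
console.log(desiredReplicas(4, 20, 60)) // → 2 (CPU at 20% vs. a 60% target: scale in)
```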

Serverless concept and products

“Serverless computing is a cloud computing execution model in which the cloud provider allocates machine resources on demand, taking care of the servers on behalf of their customers.”

The typical Serverless combination: FaaS + BaaS

  • FaaS (Lambda): Function as a Service, an event-driven execution model running in stateless containers, where functions manage server-side logic and state by using the provider's services. It allows developers to build, run, and manage application packages as functions without maintaining their own infrastructure.

    • AWS Lambda
    • Google Cloud Functions
    • Aliyun Function Compute (FC)
    • Tencent Cloud Function
  • BaaS: Backend as a Service, which lets developers focus on the front end of an application and use back-end capabilities without building or maintaining back-end services themselves

    • Google Firebase
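To make the FaaS model concrete: a function is just an exported handler that the platform invokes once per event. Below is a sketch in the style of an AWS Lambda Node.js handler. It assumes the function is triggered by an API Gateway proxy request, so the event carries `queryStringParameters` and the platform expects `{ statusCode, headers, body }` back. Other triggers and other vendors use different event shapes.

```javascript
// Lambda-style handler: the platform calls it with (event, context) per invocation.
// No server, port, or process management lives in the function itself.
const handler = async (event) => {
  const params = event.queryStringParameters || {}
  const name = params.name || 'world'
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message: `hello ${name}` }),
  }
}

module.exports = { handler }
```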
Limitations of the FaaS implementation (Lambda)

Pay-per-use billing, a core principle of FaaS, is hard to implement with traditional container deployment:

  • A container cold start plus service startup takes seconds
  • Keeping resident instances on standby makes the first request fast, but then pay-per-use billing can no longer be offered

Node.js can implement “high-density deployment” based on the vm module:

  • Isolation between functions
  • Recovery from infinite loops
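const vm = require('vm')

const runFunction = async (code) => {
    let timer = null
    const result = await new Promise((resolve, reject) => {
        // note: passing `require` gives the function full access to Node.js APIs;
        // vm separates contexts but is not a security boundary
        const sandbox = { require, console }
        try {
            // rejects if the result (e.g. a returned Promise) never settles
            timer = setTimeout(() => {
                reject(new Error('Execute function time out'))
            }, 1000)
            // a fresh context per call keeps functions isolated from each other
            vm.createContext(sandbox)
            // the `timeout` option interrupts synchronous infinite loops,
            // which the setTimeout above could never preempt
            const data = vm.runInContext(code, sandbox, { timeout: 1000 })
            resolve(data)
        } catch (error) {
            reject(error)
        }
    }).catch((err) => {
        // errors thrown inside the context come from another realm,
        // so `instanceof Error` can be false for them
        return err instanceof Error ? err : new Error(err && err.stack ? err.stack : String(err))
    })

    if (timer) {
        clearTimeout(timer)
        timer = null
    }
    return result
}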
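The two properties listed above, isolation between functions and recovery from infinite loops, can be seen on their own in a smaller sketch. Each call gets a fresh context, so globals written by one function are invisible to the next, and vm's `timeout` option stops a synchronous infinite loop. `runIsolated` is a hypothetical helper. The Node.js documentation states that the vm module is not a security mechanism, so this only protects functions from each other's mistakes, not from hostile code.

```javascript
const vm = require('vm')

// Runs a snippet in a brand-new context with a CPU-time budget.
function runIsolated(code, timeoutMs = 50) {
  const context = vm.createContext({}) // fresh, empty global object per call
  try {
    return { ok: true, value: vm.runInContext(code, context, { timeout: timeoutMs }) }
  } catch (err) {
    return { ok: false, error: (err && err.message) || String(err) }
  }
}

// Function A sets a global; function B cannot see it.
console.log(runIsolated('globalThis.counter = 1; counter'))
console.log(runIsolated('typeof counter'))

// An infinite loop is interrupted after timeoutMs instead of blocking the process.
console.log(runIsolated('while (true) {}'))
```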

Alternative: WASM/V8 Worker instead of Node.js

  • Deno Deploy
  • Cloudflare Workers
  • WasmEdge
FaaS vs PaaS

FaaS vs. PaaS in terms of development experience

  • The functional model is too simple
  • Writing and maintaining many separate cloud functions is not engineering-friendly
  • Application developers prefer to write/publish a finished Node.js WebApp

The Jamstack pattern and Vercel's exploration: building PaaS applications into FaaS functions and releasing them

Monitoring and operations

Logging, instrumentation, and monitoring alerts

  • Logs

    • process.stdout / process.stderr
    • Sent through a UDP socket
  • Instrumentation and alerts

    • Metrics
    • Span
    • Trace
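Here is a minimal sketch that ties the two lists together. A span records one timed unit of work and carries a trace ID, and finished spans are shipped as JSON log lines over a UDP socket. UDP is fire-and-forget, so logging does not block the request path, but packets can be lost. The collector address, the record fields, and the helper names `startSpan` and `createUdpLogger` are assumptions.

```javascript
const dgram = require('dgram')
const crypto = require('crypto')

// A span times one unit of work; spans sharing a traceId belong to one request.
function startSpan(name, traceId = crypto.randomBytes(16).toString('hex')) {
  const spanId = crypto.randomBytes(8).toString('hex')
  const startMs = Date.now()
  return {
    traceId,
    end(fields = {}) {
      return { traceId, spanId, name, startMs, durationMs: Date.now() - startMs, ...fields }
    },
  }
}

// Ships each record as one JSON datagram to a (hypothetical) log collector.
function createUdpLogger(port, host = '127.0.0.1') {
  const socket = dgram.createSocket('udp4')
  return {
    send(record) {
      socket.send(JSON.stringify(record), port, host)
    },
    close() {
      socket.close()
    },
  }
}
```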

Online troubleshooting

Before troubleshooting a production service, take the instance out of the cluster so that external users are not affected.

Node.js Inspector

Node.js provides the Inspector module, which enables debugging of running services

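const inspector = require('inspector')

// start the inspector server inside the running process
inspector.open()

// print the WebSocket URL that a debugger (e.g. Chrome DevTools) connects to
console.log(inspector.url())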

The inspector can also take heap snapshots and CPU profiles at runtime, which helps troubleshoot CPU and memory problems.

const inspector = require('inspector')
const fs = require('fs')
const session = new inspector.Session()

const fd = fs.openSync('profile.heapsnapshot', 'w')

session.connect()

// the snapshot is streamed in chunks; append each chunk to the file
session.on('HeapProfiler.addHeapSnapshotChunk', (m) => {
    fs.writeSync(fd, m.params.chunk)
})

session.post('HeapProfiler.takeHeapSnapshot', null, (err, r) => {
    console.log('HeapProfiler.takeHeapSnapshot done:', err, r)
    session.disconnect()
    fs.closeSync(fd)
})
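A CPU profile is taken the same way, through the `Profiler` domain of the inspector protocol: enable, start, run the code under suspicion, stop, and save the returned profile as a `.cpuprofile` file, which Chrome DevTools can load. The sketch below wraps these steps in a Promise. `captureCpuProfile`, the busy loop, and the output path are illustrative.

```javascript
const inspector = require('inspector')
const fs = require('fs')
const os = require('os')
const path = require('path')

// Profiles a synchronous workload and writes the result to outFile.
function captureCpuProfile(work, outFile) {
  return new Promise((resolve, reject) => {
    const session = new inspector.Session()
    session.connect()
    session.post('Profiler.enable', (enableErr) => {
      if (enableErr) return reject(enableErr)
      session.post('Profiler.start', (startErr) => {
        if (startErr) return reject(startErr)
        work() // the code being investigated
        session.post('Profiler.stop', (stopErr, result) => {
          session.disconnect()
          if (stopErr) return reject(stopErr)
          fs.writeFileSync(outFile, JSON.stringify(result.profile))
          resolve(outFile)
        })
      })
    })
  })
}

const pending = captureCpuProfile(() => {
  let sum = 0
  for (let i = 0; i < 1e7; i++) sum += i
  return sum
}, path.join(os.tmpdir(), 'demo.cpuprofile'))

pending.then((file) => console.log('CPU profile written to', file))
```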

strace and tcpdump: more general system diagnostic tools

  • tcpdump captures the data actually transmitted over the network, which is useful when developing web servers
  • strace prints the arguments and return value of every system call an application makes to the kernel, which makes it a general-purpose tool for understanding system calls

strace: a tool for inspecting syscalls

Write an HTTP server, start it, and then attach with strace -p <pid> to view and analyze its system calls.

const http = require('http')
// Print the pid so we can attach with `strace -p <pid>`
console.log(process.pid)

const server = http.createServer((req, res) => {
    res.end('hello')
})

server.listen(() => {
    const { port } = server.address()

    // send a request to ourselves every second so there are syscalls to observe
    setInterval(() => {
        // http.get() ends the request itself, so no req.end() call is needed
        http.get(`http://127.0.0.1:${port}`, (res) => {
            res.on('data', () => {})
        })
    }, 1000)
})

tcpdump: a general-purpose packet capture tool

tcpdump is a packet capture tool that lets you see every packet transmitted on a network device. It runs on Linux and macOS, and a Windows port (WinDump) is also available.

Common filter expressions

  • host / net: filter by a host (IP address) or a network
  • port: filter by port
  • src / dst: specify whether a rule applies to the source or the destination of a packet
  • and / or: logical AND / OR, used to combine multiple filter rules

tcpdump: viewing captured packets with Wireshark

tcpdump can write captured packets to a file (with -w), and the file can then be opened in Wireshark.