Writing is not easy; please do not reprint in any form without the author's permission! If you find the article helpful, you are welcome to follow, like, and share! I will keep sharing technical blog posts; follow the WeChat public account 👉🏻 front-end LeBron

What was the original purpose of creating Node?

Ryan Dahl created Node to build a lightweight, high-performance Web server based on V8 and to provide a set of supporting libraries

Why JavaScript?

Ryan Dahl is a veteran C/C++ programmer who worked on high-performance Web servers before creating Node

He found two characteristics of high-performance Web servers:

  • event-driven
  • Non-blocking I/O

Ryan Dahl also evaluated C, Lua, Haskell, Ruby, and other languages as candidate implementations and came to the following conclusions:

  • C has a high barrier to development; predictably, not many developers would be able to use it for business development
  • Ryan Dahl felt he was not proficient enough in Haskell, so he abandoned it
  • Lua already has many blocking I/O libraries, and building a non-blocking I/O library for Lua would not change developers' habits
  • Ruby's virtual machine performed poorly

Advantages of JavaScript:

  • Low barriers to development
  • No historical baggage on the back end
  • The second browser war is drawing to a close, with Chrome’s JavaScript engine V8 taking the top spot

What Node brings to JavaScript

Node's structure is very similar to Chrome's, except that it lacks UI-related technologies such as HTML, WebKit, and graphics. Both are event-driven asynchronous architectures:

  • The browser serves interface interactions through an event-driven mechanism
  • Node serves I/O through an event-driven mechanism

JavaScript is also given new capabilities in Node:

  • Access local files freely
  • Build WebSocket servers
  • Connect to databases and develop business logic
  • Use multiple processes, much like Web Workers

Node allows JavaScript to run in different places, no longer confined to browsers and DOM trees. If HTTP is horizontal, Node is the browser’s reflection on the other side of the stack.

Node doesn't handle the UI, but it runs on the same mechanisms and principles as the browser, breaking the rule that JavaScript runs only in the browser. A unified front-end and back-end programming environment greatly reduces the context-switching cost of moving between the two.

Characteristics of Node

Asynchronous I/O

  • Take reading files as an example

var fs = require('fs');

fs.readFile('/path', function (err, file) {
  console.log('Reading file completed');
});
console.log('Initiate file reading');

Anyone familiar with Node knows that "Reading file completed" is printed after "Initiate file reading"

The code after fs.readFile executes immediately, while the timing of "Reading file completed" cannot be predicted

We know that it will be executed after this asynchronous operation, but we don’t know when

The capture of the result value in an asynchronous call conforms to the “Don’t call me, I will call you” principle

This is also result-oriented: we care about the result, not the process

In Node, the vast majority of operations are called asynchronously. Ryan Dahl overcame great difficulties to build many asynchronous I/O APIs at the bottom layer, from file reading to network requests. This makes it natural for developers to perform parallel I/O at the language level: each call does not need to wait for the previous I/O call to finish, which greatly improves efficiency in the programming model
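As a minimal sketch of this parallelism (the file paths are hypothetical), two reads can be initiated back to back; neither waits for the other, so the total wall time is roughly that of the slower read rather than the sum of both:

var fs = require('fs');

// Both reads are initiated immediately; each callback fires when its own I/O completes
fs.readFile('/path/a', function (err, a) {
  console.log('Read of /path/a completed');
});
fs.readFile('/path/b', function (err, b) {
  console.log('Read of /path/b completed');
});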

Note: The asynchronous I/O mechanism is described in more detail below

Events and callback functions

The event

With the advent of Web 2.0, JavaScript took on more responsibility on the front end, and events became widely used. Bringing the event and callback patterns that are mature in front-end browsers to the back end, together with asynchronous I/O, nicely exposes the timing of events to the business logic.

  • Server example

The request event is bound to the server

For the request object, the Data and End events are bound

var http = require('http');
var querystring = require('querystring');

// Listen for server request events
http.createServer(function (req, res) {
  var postData = '';
  req.setEncoding('utf8');
  // Listen for the request's data event
  req.on('data', function (chunk) {
    postData += chunk;
  });

  // Listen for the request's end event
  req.on('end', function () {
    res.end(postData);
  });
}).listen(8080);

console.log('Server startup completed');
  • Front-end example

Once the request is issued, you only need to care about executing the appropriate business logic if the request succeeds

request({
  url: '/url',
  method: 'POST',
  data: {},
  success: function (data) {
    // Success
  }
});

The event-driven programming style is lightweight, loosely coupled, and focused only on transaction points. However, in scenarios with multiple asynchronous tasks, events are independent of one another, and coordinating them is a problem. A series of asynchronous programming solutions subsequently emerged (a sketch of the Promise-based approach follows this list):

  • Event publish/subscribe pattern
  • Promise, async/await
  • Process control library
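As a minimal sketch of the Promise and async/await approach (the file paths are hypothetical), Node's util.promisify can wrap a callback-style API so that multiple asynchronous steps read sequentially while remaining non-blocking:

const util = require('util');
const fs = require('fs');

const readFile = util.promisify(fs.readFile);

async function main() {
  // Each await suspends this function without blocking the event loop
  const a = await readFile('/path/a', 'utf8');
  const b = await readFile('/path/b', 'utf8');
  console.log(a.length + b.length);
}

main().catch(console.error);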

The callback function

  • In addition to asynchrony and events, Node is also characterized by callback functions
  • In general, a callback function is also the best way to receive data returned by an asynchronous call
    • But this style of programming can feel very unnatural to people used to thinking synchronously
    • The order in which code is written has little to do with the order in which it executes, which can make it hard to read
  • In terms of flow control, asynchronous methods and callbacks are interspersed, so it is less straightforward than the ordinary synchronous approach
    • After switching to an asynchronous mindset, and with proper division of business logic and refinement of events, the complexity of handling business in flow control is essentially the same as in the synchronous way

Single thread

Node maintains the single-threaded nature of JavaScript in the browser

JavaScript does not share any state with other threads. The big advantage is that you do not have to worry about state synchronization the way multithreaded programming does: there are no deadlocks and no performance overhead from thread context switches

  • Disadvantages of single threading
    • Unable to utilize multi-core CPUs
    • An error can cause the entire application to exit, resulting in poor robustness
    • Heavy computation occupies the CPU, preventing asynchronous I/O callbacks from being scheduled
  • The child_process and cluster modules were later introduced to alleviate these shortcomings, as sketched below
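A minimal sketch of the cluster module compensating for the multi-core shortcoming (the port and response text are illustrative): the master forks one worker per CPU core, and a crashing worker does not bring down the others:

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core
  os.cpus().forEach(() => cluster.fork());
} else {
  http.createServer((req, res) => {
    res.end('Handled by worker ' + process.pid);
  }).listen(8080);
}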

Cross-platform

At first, Node ran only on Linux; to learn and use Node on Windows, you had to go through Cygwin/MinGW. Later, with Microsoft's involvement, Node implemented a cross-platform architecture based on libuv

  • libuv

libuv builds a platform abstraction layer between the operating system and Node's upper module system

With this architecture, Node's third-party C++ modules can also be made cross-platform through libuv

Node module mechanism – CommonJS

Background:

Among other high-level languages, Java has class files, Python has the import mechanism, Ruby has require, and PHP has include and require. JavaScript, on the other hand, had only the messy practice of pulling in code through script tags, and people had to constrain code with namespaces and other conventions to make it safe and convenient to use.

Until CommonJS came along…

vision

I want JavaScript to run anywhere

The starting point

For JavaScript itself, the specification is still weak, with the following flaws:

  • No modular system
  • Fewer standard libraries
    • ECMAScript defines only part of the core library
    • There are no standard APIs for common requirements such as file systems and I/O streams
  • No standard interface
    • In JavaScript, there is almost no standard unified interface defined for Web servers or databases
  • Lack of a package management system
    • As a result, JavaScript applications basically do not have the ability to automatically load and install

CommonJS was designed to make up for JavaScript's lack of standards, so that it would have the basic capabilities for developing large applications that Python, Ruby, and Java already enjoy, rather than remaining at the stage of small scripts. For example:

  • Server-side JavaScript program
  • Command line tool
  • Desktop graphical interface applications
  • Hybrid application

The CommonJS specification covers:

  • The module
  • binary
  • Buffer
  • Character set encoding
  • I/O streams
  • Process environment
  • The file system
  • The socket
  • Unit testing
  • Gateway interface of the Web server
  • Package management

Node’s relationships with browsers, as well as the W3C, CommonJS, and ECMAScript organizations, make for a thriving ecosystem

The module specification

  • The module definition

The context provides an exports object for exporting the current module's methods or variables, and it is the module's only export exit

There is also a module object that represents the module itself, with exports as one of its properties

In Node, one file is one module; defining an export is as simple as mounting methods as properties on the exports object

// math.js
exports.add = function(a, b){
    return a + b;
}
  • The module reference
const math = require('./math');

const res = math.add(1, 1);
console.log(res);
// 2

In the CommonJS specification, there is the require method, which accepts a module identifier to introduce a module’s API into the current context

  • Module identifier

The module id is the argument passed to the require method, which can be:

  • A camelCase string naming a core module or package
  • A relative path beginning with ./ or ../, or an absolute path
  • The .js file extension may be omitted

The module definition is very simple, and the interface is very concise

Each module has its own scope; modules do not interfere with each other, and references stay clean

  • Meaning:

Methods, variables, and the like are restricted to a private scope, while import and export capabilities smoothly connect upstream and downstream dependencies, as the sketch below shows
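A minimal sketch of that private scope (file names are hypothetical): the same variable name in two modules does not clash, and only what is mounted on exports is visible outside:

// a.js
var name = 'module a'; // private to this file
exports.getName = function () { return name; };

// b.js
var name = 'module b'; // a different, unrelated variable
const a = require('./a');
console.log(a.getName()); // 'module a'
console.log(name); // 'module b'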

Module implements

Importing a module in Node goes through three steps

  • Path analysis
  • File location
  • Compilation and execution

Node modules fall into two categories:

  • The core module

Core modules are compiled into the binary executable when Node's source code is compiled

When the Node process starts, some core modules are loaded directly into memory, so when these modules are introduced, the file-location and compilation steps can be skipped, and they take priority in path analysis; this makes them the fastest to load.

  • User-written file modules

File modules are loaded dynamically at runtime, requiring the complete path analysis, file location, and compilation process, so they load more slowly than core modules

Preferentially load from cache

Just as browsers cache static script files to improve performance, Node caches imported modules to reduce overhead. The differences are:

  • The browser only caches files
  • Node caches objects after compilation and execution

Whether for a core module or a file module, the require method loads from the cache first on every subsequent load of the same module, as the sketch below demonstrates
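A minimal sketch demonstrating the cache (the module path is hypothetical): two require calls return the very same object, and deleting the require.cache entry forces a reload:

const m1 = require('./math');
const m2 = require('./math');
console.log(m1 === m2); // true: the second load hits the cache

// Removing the cached entry forces the next require to re-run the module
delete require.cache[require.resolve('./math')];
const m3 = require('./math');
console.log(m1 === m3); // false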

Path analysis and file location

Identifier Analysis (path)

As mentioned earlier, the require method takes a parameter as an identifier, which falls into the following categories:

  • The core module

Core modules are second in priority only to cache loading; since they are compiled to binary during Node's source compilation, their loading process is the fastest

Note: Loading a custom module with the same identifier as a core module will not succeed; the only way is to choose a different identifier or use a path

  • File module in the form of a path

Identifiers starting with /, ./, or ../ are treated as file modules

The require method converts the path to a real path, uses the real path as the index, and stores the compiled result in the cache to make subsequent loads faster

Because a path-style identifier gives Node the exact location of the file, the lookup saves a lot of time; such modules load more slowly only than core modules

  • Custom modules

A custom module is a special kind of file module, taking the form of a file or a package

This type of module lookup is the most time-consuming and the slowest

First, let me introduce the concept of module paths: the lookup strategy used when locating file modules. Concretely, it is an array of paths

  • console.log(module.paths)
  • You can get an array of paths like the following

[ '/home/bytedance/research/node_modules',
  '/home/bytedance/node_modules',
  '/home/node_modules',
  '/node_modules' ]

You can see the rules as follows:

  • The node_modules directory in the current file's directory
  • The node_modules directory in the parent directory
  • The node_modules directory in the grandparent directory
  • Recursing upward along the path to the node_modules directory in the root

Its generation rule is much the same as the lookup along JavaScript's prototype chain or scope chain

During loading, Node tries each path in the module path until it finds the target file

The deeper the file path, the more time module lookup takes, which is why custom modules are the slowest to load

File location

  • File name extension analysis

The identifier passed to require may omit the file extension

Node will try the .js, .json, and .node extensions, in that order

The fs module is called synchronously (blocking) to check whether the file exists, which can cause performance problems for single-threaded Node

If the identifier carries a .node or .json extension, writing it out speeds things up a little; together with the caching mechanism, this greatly alleviates the blocking defect of Node's single thread

  • Directory analysis and packages

Parsing the identifier may yield a directory rather than a file; the directory is then treated as a package

Node parses the package.json file and takes the file name specified by its main property

If the file referenced by main fails to resolve, or there is no package.json, Node falls back to index as the file name

That is, it tries index.js, index.json, and index.node in turn

If nothing is located in this directory, lookup moves on to the next module path

An exception is thrown only after the whole module path array has been traversed without finding the target file
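A minimal sketch of a package layout and the lookup it triggers (all names are hypothetical):

mylib/
├── package.json        // contains { "main": "lib/entry.js" }
└── lib/
    └── entry.js

// require('./mylib') resolves to ./mylib/lib/entry.js via the main field;
// without package.json (or if main cannot be resolved), Node would try
// ./mylib/index.js, ./mylib/index.json, then ./mylib/index.node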

Module compilation

In Node, each file module is an object

function Module(id, parent) { 
 this.id = id; 
 this.exports = {}; 
 this.parent = parent; 
 if (parent && parent.children) { 
     parent.children.push(this); 
 } 
 this.filename = null; 
 this.loaded = false; 
 this.children = []; 
} 
  • .js files
    • Read synchronously via the fs module, then compiled and executed
  • .node files
    • C/C++ extension files, loaded via the dlopen method from the build output generated at compile time
  • .json files
    • Read synchronously via the fs module, then parsed with JSON.parse and the result returned (see the sketch after this list)
  • Everything else
    • Loaded as .js files
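A minimal sketch of the .json case (the file is hypothetical): requiring a JSON file yields the already-parsed object, which is handy for configuration:

// config.json contains: { "port": 8080 }
const config = require('./config.json');
console.log(config.port); // 8080, parsed internally via JSON.parse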

Each compiled module stores its file path as an index on the Module._cache object to improve second-load performance

Package and NPM

Node organizes its core modules and allows third-party file modules to be written and used in an orderly fashion

In the third-party ecosystem, however, modules are still scattered everywhere and cannot directly reference one another

Outside of modules, packages and NPM are the mechanisms that link modules together

To some extent, this solves code-organization problems such as variable dependence and dependency relationships

Package structure

A package is actually an archive file: a directory packed into a .zip or .tar.gz file, which is decompressed on installation to restore the directory

  • A package directory that complies with the CommonJS specification should contain the following
    • package.json: the package description file
    • bin: directory for executable binaries
    • lib: directory for JavaScript code
    • doc: directory for documentation
    • test: code for unit test cases

Package description file

package.json

CommonJS defines the following required fields for package.json

  • name: the package name
  • description: a brief introduction to the package
  • version: the version number
  • keywords: an array of keywords used by NPM search
  • maintainers: the list of package maintainers
  • contributors: the list of contributors
  • bugs: a web page address or email address for reporting bugs
  • licenses: the list of licenses
  • repositories: the list of locations hosting the source code
  • dependencies: the packages the current package depends on
  • homepage: the website of the current package
  • os: list of supported operating systems
    • aix, freebsd, linux, macos, solaris, vxworks, windows
  • cpu: list of supported CPU architectures
    • arm, mips, ppc, sparc, x86, x86_64
  • builtin: whether the package is a standard component built into the underlying system
  • implements: the list of specifications implemented
  • scripts: the script description object (a sample file follows this list)
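A minimal illustrative package.json using a few of these fields (all values are hypothetical):

{
  "name": "mylib",
  "description": "A demo package",
  "version": "1.0.0",
  "keywords": ["demo", "example"],
  "maintainers": [{ "name": "lebron" }],
  "dependencies": { "lodash": "^4.17.0" },
  "scripts": { "test": "node test/index.js" }
}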

NPM is implemented based on the definition of the package specification, which helps Node solve the problem of dependent package installation

NPM common functions

The CommonJS package specification is the theory, and NPM is one of the practices

NPM for Node is equivalent to gem for Ruby and PEAR for PHP

It helps with publishing, installing, and managing dependencies of third-party modules

  1. Viewing help
  • Check the version: npm -v
  • List available commands: npm
  2. Installing dependency packages

npm install {packageName}

After this command executes, NPM creates a package directory under node_modules in the current directory and unpacks the corresponding package into it

  • Global Installation mode
npm install {packageName} -g

Global mode does not mean installing a module package as a global package, nor does it mean you can require it from anywhere

"Global mode" is actually imprecise: -g installs a package as a globally available executable command

It links the package's executable scripts into the same path as the Node executable, according to the bin field in the package description file

  • Local Installation

Some packages are not published to NPM, or cannot be installed directly for network reasons

These can be downloaded as archives and installed locally

npm install <tarball file>
npm install <tarball url>
npm install <folder>
  • Install from unofficial sources

If the official source is unavailable, use a mirror source

npm install --registry={urlResource}

If you use the mirror source almost exclusively, you can set it as the default source

npm config set registry {urlResource}
  3. NPM hook commands

The idea behind the scripts field in package.json is to let packages provide hooks during installation, uninstallation, and so on

"scripts": {"preinstall": "preinstall.js"."install": "install.js"."uninstall": "uninstall.js"."test": "test.js",}Copy the code
  • Install
    • When npm install <package> runs, the script pointed to by preinstall is loaded and executed first, then the script pointed to by install
  • Uninstall
    • When npm uninstall <package> runs, the script pointed to by uninstall executes and can do some cleanup
  • Test
    • npm test runs the script pointed to by test. A good package should include test cases and configure the test command in package.json, so users can run the tests to verify that the package is stable and reliable

LAN NPM

  • background

Enterprises face a tension: on one hand they want the low coupling and project-organization benefits of modular development, while on the other they must keep modules confidential. Sharing and publishing through public NPM is therefore potentially risky.

  • The solution

The solution is for enterprises to build their own NPM repository, so they can enjoy the many packages on NPM while keeping their own packages confidential and access-restricted. Both NPM's server and client are open source.

A local NPM repository is set up in much the same way as a mirror repository, except that you can choose not to synchronize the packages from the official source

  • role
    • Private reusable modules can be published to the local NPM repository, which keeps updates centralized and prevents each small project from maintaining its own copy of the same functionality
    • No more sharing code by copy and paste

Asynchronous I/O

Why asynchronous I/O?

  • The user experience

JavaScript executes on a single thread in the browser and shares a thread with UI rendering

High Performance JavaScript concluded that a page feels stuck if a script takes more than 100 ms to execute

If a web page needs to fetch a network resource synchronously, JavaScript can only continue after the resource has been fully retrieved from the server; during that time the UI stalls and stops responding to user interaction. You can imagine how bad the user experience would be.

With asynchronous requests, JavaScript and the UI are not left waiting on I/O, giving the user a responsive page; the sketch below contrasts the two styles
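A minimal browser-side sketch of that contrast (the URL is hypothetical): the synchronous XMLHttpRequest freezes the page until the response arrives, while the asynchronous fetch returns control immediately:

// Synchronous: blocks the UI thread until the server responds (long deprecated)
const xhr = new XMLHttpRequest();
xhr.open('GET', '/api/data', false); // third argument false means synchronous
xhr.send();
console.log(xhr.responseText);

// Asynchronous: the callbacks fire later; the UI stays responsive
fetch('/api/data')
  .then(function (res) { return res.text(); })
  .then(function (text) { console.log(text); });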

I/O is expensive, distributed I/O is even more expensive

The front-end experience can only be good if the back end responds quickly to resources

  • The allocation of resources

Over the development of computing, system components have been abstracted into I/O devices and computing devices

Assuming a business scenario has a set of unrelated tasks to complete, there are two main approaches:

  1. Multithreading in parallel

The cost of multithreading is the overhead of creating threads and performing thread context switches.

Complex services often run into problems such as locking and state synchronization. But multithreading can improve CPU utilization on multi-core CPUs

  2. A single thread executing sequentially

Single-threaded sequential execution of tasks is more consistent with the way programmers think sequentially, and is still the mainstream programming way

The disadvantage of serial execution is performance: any slightly slow task blocks all subsequent code

Since I/O and CPU computation can normally proceed in parallel, the problem with the synchronous programming model is that while I/O is in progress, subsequent tasks must wait, so resources are not well utilized

  3. Node gives its answer somewhere in between

Using single thread, away from multi-thread deadlock, state synchronization and other problems;

Using asynchronous I/O keeps the single thread away from blocking and puts the CPU to better use

To compensate for a single thread's inability to utilize multi-core CPUs, Node provides child processes, similar to Web Workers in the browser, so that CPU and I/O can be used efficiently through worker processes

The expectation is that an I/O call should not block subsequent operations, and the time spent waiting for I/O completion should be given to other business

Status of asynchronous I/O

Asynchronous I/O and non-blocking I/O

The operating system kernel has only two types of I/O: blocking and non-blocking

When calling blocking I/O, the application waits for the I/O to complete before returning the result

Features: The call must wait until all operations are completed at the kernel level before the call ends

Example: the call ends only after the system kernel has completed the disk seek, read the data, and copied the data into memory.

Non-blocking I/O differs from blocking I/O in that it is returned immediately after the call

After the non-blocking I/O returns, the CPU’s time slice can be used to process other transactions, where the performance improvement is significant

Existing problems:

  • Because the full I/O has not completed, what is returned immediately is not the data the business layer expects, but merely the state of the current call
  • To get the complete data, the application must repeatedly call the I/O operation to confirm completion, which is known as "polling"

The main polling technique

  • read

It is the most primitive and worst-performing technique: it repeatedly calls read to check the I/O state until the data is ready

CPU time is wasted on waiting until the final data is available

  • select

It is an improvement on read polling: it judges event states on file descriptors

Limitation: it uses an array of length 1024 to store state, so it can check at most 1024 file descriptors at a time

  • poll

An improvement over select: it uses a linked list to avoid the array length limit and can skip unnecessary checks

With a large number of file descriptors, its performance is still very low

  • epoll

This is the most efficient I/O event notification mechanism on Linux. If no I/O event is detected during polling, it sleeps until an event wakes it. It makes real use of event notifications and callbacks instead of traversal queries, so it does not waste CPU and executes more efficiently

Ideal non-blocking asynchronous I/O

Although epoll uses event notifications to reduce CPU consumption, the CPU is essentially idle during the sleep and is not fully utilized by the current thread

The perfect asynchronous I/O would be: the application makes a non-blocking call and, without polling by traversal or sleep/wake cycles, can move straight on to the next task; once the I/O completes, the data is simply passed to the application via a signal or callback

Linux does natively offer a kind of asynchronous I/O (AIO) that passes data through signals or callbacks

Disadvantages:

  • Only under Linux
  • Only O_DIRECT I/O is supported; as a result, the system cache cannot be used

Note: About O_DIRECT

Realistic asynchronous I/O

Realistically, asynchronous I/O is easily simulated: let a few threads perform blocking I/O (or non-blocking I/O plus polling), let one thread do the computation, and pass the data from the I/O threads to it through inter-thread communication

  • libeio essentially simulates asynchronous I/O with a thread pool and blocking I/O
  • Node initially used libeio and libev on *nix platforms to implement asynchronous I/O, and later implemented the thread pool itself
  • IOCP on Windows
    • Call an asynchronous method, wait for the notification that the I/O has completed, and run the callback; the user never needs to think about polling
    • Internally it is still a thread pool; the difference is that this thread pool is managed by the kernel
    • This is very similar to Node's asynchronous invocation model
  • Because of the differences between the Windows and *nix platforms, Node provides libuv as an abstraction layer to handle platform compatibility
    • It keeps upper-layer Node independent from the lower-layer custom thread pool and IOCP
  • We often say Node is single-threaded
    • The single thread here refers only to JavaScript executing in a single thread
    • On both *nix and Windows, internal I/O tasks have their own thread pool (see the sketch below)
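As a hedged illustration of that hidden thread pool: libuv's pool, which backs fs operations and dns.lookup among others, defaults to 4 threads and can be resized with the UV_THREADPOOL_SIZE environment variable before the process starts:

// Four fs reads can run truly in parallel on the default pool;
// a fifth queues until a pool thread frees up.
// Launching with: UV_THREADPOOL_SIZE=8 node app.js enlarges the pool
const fs = require('fs');
for (let i = 0; i < 5; i++) {
  fs.readFile(__filename, function () {
    console.log('read', i, 'done');
  });
}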

Node asynchronous I/O

Node completes the asynchronous I/O loop with event loops, observers, request objects, etc

Event loop

The emphasis here is on Node's own execution model: the event loop

When the Node process starts, it creates a loop similar to while(true)

Each pass through the loop body is called a tick, and each tick checks whether there are events waiting to be processed

If there are, the events and their associated callbacks are fetched and executed, as sketched below
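A minimal sketch of how callbacks relate to ticks (note that the relative order of setTimeout(fn, 0) and setImmediate can vary when not inside an I/O callback): registered callbacks do not run where they are registered, but on a later pass of the loop:

setImmediate(function () { console.log('check phase (setImmediate)'); });
setTimeout(function () { console.log('timers phase (setTimeout)'); }, 0);
process.nextTick(function () { console.log('nextTick runs before the loop proceeds'); });
console.log('synchronous code runs first');

// Typical output:
// synchronous code runs first
// nextTick runs before the loop proceeds
// timers phase (setTimeout)
// check phase (setImmediate)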

The observer

Each event loop has one or more observers, and determining whether there are events to process means asking those observers whether they have events

  • Browsers use a similar mechanism
    • Events occur when a user clicks or a file loads, and each event has a corresponding observer
  • Node's events mainly originate from network requests and file I/O
    • The corresponding observers are the file I/O observer, the network I/O observer, and so on; observers classify events
  • The event loop is a typical producer/consumer model
    • Asynchronous I/O, network requests, and the like are the producers of events
    • These events are passed to the corresponding observers, and the event loop fetches events from the observers and processes them, as sketched below
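A minimal sketch of that producer/consumer relationship using Node's events module (the event name is hypothetical): one side produces events, the registered observer consumes them:

const EventEmitter = require('events');

const bus = new EventEmitter();

// Consumer: the observer registers interest in an event
bus.on('job:done', function (result) {
  console.log('consumed:', result);
});

// Producer: asynchronous work emits the event when it completes
setTimeout(function () { bus.emit('job:done', 42); }, 100);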

Summary

The event loop, observers, request objects, and the I/O thread pool together constitute the basic elements of Node's asynchronous I/O model

Knowing that JavaScript is single-threaded, it is easy to conclude, mistakenly, that it cannot take full advantage of multi-core CPUs

In fact, apart from JavaScript executing in a single thread, Node itself is actually multi-threaded; it is just that the I/O threads use relatively little CPU

Another point to note: apart from user code, which cannot run in parallel, all I/O (disk I/O, network I/O, and so on) can be executed in parallel

Note: The picture shows the entire Node asynchronous I/O process

Event-driven and high-performance servers

The earlier explanation of asynchrony also outlines the essence of event-driven programming: running the program through a main loop plus event triggers

Here are some classic server models:

  • Synchronous
    • Only one request is processed at a time; all other requests wait
  • Process per request
    • Multiple requests can be handled, but it does not scale, because system resources are limited
  • Thread per request
    • Although threads are lighter-weight than processes, each thread occupies a certain amount of memory; when large numbers of concurrent requests arrive, memory is quickly exhausted and the server slows down
    • Better than process per request, but still not enough for large sites
  • Conclusion
    • The thread-per-request approach is still used by Apache
    • Node handles requests in an event-driven manner, eliminating the need to create a thread per request and saving the overhead of thread creation and destruction
    • At the same time, with fewer threads, the operating system's task-scheduling and context-switching cost is very low
      • This allows the server to process requests methodically even with a large number of connections, unaffected by context-switching overhead, which is one reason for Node's high performance

Event-driven efficiency is beginning to be valued by the industry

The well-known server Nginx has also abandoned the multi-threaded approach in favor of the same event-driven model that Node uses

The difference is that Nginx is written in pure C, which gives high performance, but it is suited only to Web serving: reverse proxying or load balancing. It is comparatively weak at handling business logic

Node is a high-performance platform that can be used to build the same functionality as Nginx and handle specific businesses

Node is not as specialized as Nginx at Web serving, but it suits a wider range of scenarios and performs well in its own right

In practical projects, their respective advantages can be combined to achieve the optimal performance of applications

JavaScript was almost a blank slate on the server side, leaving Node with no historical baggage, and Node's performance advantages made it instantly popular in the community

Final words

This article introduced why Node was created, its language choice, its characteristics, its module mechanism, its package management, asynchronous I/O, and related topics, hoping to give you a fresh understanding of Node. I have recently been planning to study Node and server-side knowledge; interested readers are welcome to learn and exchange ideas together

  • Nuggets: Front-end LeBron
  • Zhihu: Front-end LeBron

Continue to share technical blog posts, follow the wechat official account 👇🏻