Writing is not easy; please do not reprint in any form without the author's permission! If you find this article helpful, feel free to follow, like, and share. I regularly share technical blog posts on the WeChat official account 👉🏻 front-end LeBron
What was the original purpose of creating Node?
Ryan Dahl created Node to build a lightweight, high-performance Web server based on V8, and to provide a set of libraries around it
Why JavaScript?
Ryan Dahl is a veteran C/C++ programmer who worked mainly on high-performance Web servers before creating Node
He identified two keys to a high-performance Web server:
- event-driven
- Non-blocking I/O
Ryan Dahl also evaluated the use of C, Lua, Haskell, Ruby and other languages as alternative implementations and came to the following conclusions:
- C has a high barrier to development, and it is predictable that not many developers will be able to use it for business development
- Ryan Dahl felt he wasn’t proficient enough in Haskell, so he abandoned it
- Lua already has many blocking I/O libraries, and building a non-blocking I/O library for Lua doesn’t change developer habits
- Ruby virtual machines don’t perform well
Advantages of JavaScript:
- Low barriers to development
- There is no historical baggage in the backend
- The second browser war is drawing to a close, with Chrome’s JavaScript engine V8 taking the top spot
What Node brings to JavaScript
Node’s structure is very similar to Chrome’s, except that UI-related technologies such as HTML rendering, WebKit, and graphics are not included. Both are event-driven, asynchronous architectures:
- The browser serves interactions on the interface through an event driver
- Node services I/O with an event driver
JavaScript is also given new capabilities in Node:
- Freely access local files
- Build WebSocket servers
- Connect to databases and carry out business development
- Use multiple processes, much as a Web Worker does
Node lets JavaScript run in new places, no longer confined to the browser and the DOM tree. If the HTTP protocol stack is the waterline, Node is the browser’s reflection on the other side of it.
Node doesn’t handle the UI, but runs with the same mechanics and principles as the browser, breaking the rule that JavaScript only runs in the browser. Unified front-end and back-end programming environment can greatly reduce the context cost of front-end and back-end conversion.
The characteristics of the Node
Asynchronous I/O
- Take reading files as an example
var fs = require('fs');
fs.readFile('/path', function (err, file) {
  console.log('Reading file completed');
});
console.log('Initiate file reading');
Anyone familiar with Node knows that “Reading file completed” is printed after “Initiate file reading”
The code after fs.readFile executes immediately, while exactly when “Reading file completed” will run cannot be predicted
We know that it will be executed after this asynchronous operation, but we don’t know when
The capture of the result value in an asynchronous call conforms to the “Don’t call me, I will call you” principle
This is also a result – oriented, not caring about the process
In Node, the vast majority of operations are invoked asynchronously. Ryan Dahl went to great lengths to build many asynchronous I/O APIs at the bottom layer, from file reading to network requests. This makes it natural for developers to perform parallel I/O at the language level: each call does not wait for the previous I/O call to finish, which can greatly improve efficiency in the programming model
Note: The asynchronous I/O mechanism is described in more detail below
Events and callback functions
The event
With the advent of Web 2.0, JavaScript took on more responsibility on the front end, and events became widely used. Bringing the event and callback patterns, already mature in front-end browsers, to the back end along with asynchronous I/O is a good way to expose the timing of events to business logic.
- Server example
The request event is bound to the server
For the request object, the Data and End events are bound
var http = require('http');
var querystring = require('querystring');
// Listen for server request events
http.createServer(function (req, res) {
  var postData = '';
  req.setEncoding('utf8');
  // Listen for the request's data event
  req.on('data', function (chunk) {
    postData += chunk;
  });
  // Listen for the request's end event
  req.on('end', function () {
    res.end(postData);
  });
}).listen(8080);
console.log('Server startup completed');
- The front-end case
Once the request is issued, you only need to care about executing the appropriate business logic if the request succeeds
request({
  url: '/url',
  method: 'POST',
  data: {},
  success: function (data) {
    // success
  }
});
The event programming method has the advantages of lightweight, loose coupling and only focusing on transaction points. However, in the scenario of multiple asynchronous tasks, events are independent from each other, so how to cooperate is a problem. A series of asynchronous programming solutions have emerged subsequently:
- Event publish/subscribe pattern
- Promise, async/await
- Process control library
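As a sketch of the Promise-based solutions named above, assuming only core JavaScript: two otherwise independent asynchronous tasks are made to cooperate through Promise.all. taskA and taskB are hypothetical stand-ins for real asynchronous work.

```javascript
// Sketch: coordinating two independent asynchronous tasks with Promise.all.
// taskA and taskB are made-up stand-ins for real asynchronous operations.
function taskA() {
  return new Promise(function (resolve) {
    setImmediate(function () { resolve('A done'); });
  });
}

function taskB() {
  return new Promise(function (resolve) {
    setImmediate(function () { resolve('B done'); });
  });
}

// Neither task knows about the other; Promise.all supplies the cooperation,
// delivering both results together once both tasks have finished.
var all = Promise.all([taskA(), taskB()]).then(function (values) {
  return values; // ['A done', 'B done']
});
```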
The callback function
- In addition to asynchrony and events, Node also features callback functions
- In general, callback functions are also the best way to receive data returned by asynchronous calls
- But this kind of programming way for many people who are used to synchronous thinking of programming, may be very unaccustomed
- The order in which the code is written has nothing to do with the order in which it is executed, which can be difficult for them to read
- In terms of process control, asynchronous methods and callback functions are interspersed, making it less straightforward than the normal synchronous approach
- After switching to asynchronous programming thinking, the complexity of handling business in process control is virtually the same as that in synchronous way, through the division of business and the refinement of events
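A tiny sketch of the point about reading order, assuming only core JavaScript: the order the code is written is not the order it runs, which is exactly what trips up readers used to synchronous thinking.

```javascript
// Sketch: written order vs. execution order with a callback.
var order = [];

setTimeout(function () {
  order.push('callback'); // written first, but runs on a later loop pass
}, 0);

order.push('after the call'); // written second, but runs immediately
```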
Single thread
Node maintains the single-threaded nature of JavaScript in the browser
JavaScript does not share state with other threads. The big advantage is that you need not worry about state synchronization the way multithreaded programming does: there are no deadlocks and no performance overhead from thread context switching
- Disadvantages of single threading
- Unable to utilize multi-core CPUS
- Errors can cause the entire application to quit, resulting in poor robustness
- Heavy computation occupies the CPU, preventing asynchronous I/O callbacks from being scheduled
- The child_process and cluster modules were later introduced to alleviate these shortcomings
cross-platform
At first, Node ran only on Linux; to learn and use Node on Windows you had to go through Cygwin/MinGW. Later, with Microsoft's involvement, Node was rearchitected on top of libuv to run cross-platform
- libuv
A layer of platform architecture is built between the operating system and Node’s upper module system
With a good architecture, Node’s third-party C++ modules can also be cross-platform with libuv
Node module mechanism – CommonJS
Background:
In other high-level languages, Java has class files, Python has the import mechanism, Ruby has require, and PHP has include and require. JavaScript, on the other hand, has a messy way of introducing code through script tags. People have to artificially constrain code in namespaces and other ways to make it safe and easy to use.
Until CommonJS came along…
vision
I want JavaScript to run anywhere
The starting point
For JavaScript itself, the specification is still weak, with the following flaws:
- No modular system
- Fewer standard libraries
- ECMAScript defines only part of the core library
- There is no standard API for common requirements such as file system I/O streams
- No standard interface
- In JavaScript, there is almost no standard unified interface defined for Web servers or databases
- Lack of a package management system
- As a result, JavaScript applications basically do not have the ability to automatically load and install
CommonJS is designed to make up for the current lack of standard JavaScript, so that Python, Ruby and Java have the basic ability to develop large applications, rather than stay in the stage of small scripting programs.
- Server-side JavaScript program
- Command line tool
- Desktop graphical interface applications
- Hybrid application
The CommonJS specification covers:
- The module
- binary
- Buffer
- Character set encoding
- I/O streams
- Process environment
- The file system
- The socket
- Unit testing
- Gateway interface of the Web server
- Package management
Node’s relationships with browsers, as well as the W3C, CommonJS, and ECMAScript organizations, make for a thriving ecosystem
The module specification
- The module definition
The context provides an exports object for exporting the current module's methods or variables, and it is the module's only exit
There is also a module object that represents the module itself, and exports is a property of module
In Node, one file is one module; methods are mounted on the exports object as properties to define the module's exports
// math.js
exports.add = function(a, b){
return a + b;
}
- The module reference
const math = require('./math');
const res = math.add(1, 1);
console.log(res);
// 2
In the CommonJS specification, there is the require method, which accepts a module identifier to introduce a module’s API into the current context
- Module identifier
The module id is the argument passed to the require method, which can be:
- A camelCase string naming a core module or package
- A relative path starting with ./ or ../, or an absolute path
- The .js file name extension may be omitted
The module definition is very simple, and the interface is very concise
Each module has its own space, they do not interfere with each other, and they are clean when referencing
- Meaning:
Restricting methods, variables, and so on to a private scope, while supporting import and export to smoothly connect upstream and downstream dependencies
Module implements
To import modules into Node, you need to go through three steps
- Path analysis
- File location
- Compile implementation
Node's modules are divided into two types:
- The core module
Core modules are compiled into the binary executable when Node's source code is compiled
When the Node process starts, some core modules are loaded directly into memory, so introducing them skips the file-location and compile-and-execute steps, and they take priority in path analysis, which makes them the fastest to load
- User-written file modules
Dynamic loading at runtime requires a complete path analysis, file location, compilation and execution process, slower than the core module
Preferentially load from cache
Just as browsers cache static script files to improve performance, Node recaches imported modules to reduce overhead. The differences are:
- The browser only caches files
- Node caches objects after compilation and execution
The require method applies cache first to all secondary loads of the same module, regardless of whether it is a core module or a file module
Path analysis and file location
Identifier Analysis (path)
As mentioned earlier, the require method takes a parameter as an identifier, which falls into the following categories:
- The core module
It is second in priority to cache loading, and is compiled to binary during source compilation of Node, making it the fastest loading process
Note: a custom module with the same identifier as a core module will not load successfully; the only options are to choose a different identifier or to load it by path
- File module in the form of a path
Identifiers starting with /, ./, or ../ are treated as file modules
The require method converts the path to the real path, uses the real path as an index, and stores the compiled result in the cache so that a second load is faster
Because a path-style identifier tells Node the exact file location, lookup time is greatly reduced; loading speed is second only to core modules
- Custom modules
Is a special file module in the form of a file or package
This type of module lookup is the most time-consuming and the slowest
First, the concept of module paths: this is the search strategy Node uses when locating file modules, and concretely it is an array of paths
console.log(module.paths)
- This prints an array of paths, for example:
['/home/bytedance/research/node_modules',
'/home/bytedance/node_modules',
'/home/node_modules',
'/node_modules']
You can see the rules as follows:
- Node_modules directory in the current file directory
- Node_modules directory in the parent directory
- The node_modules directory in the grandparent directory
- Recurse up the path to the node_modules directory in the root directory
It is generated in much the same way that JavaScript prototype/scope chains look up
During loading, Node tries each path in the module path until it finds the target file
The deeper the file path, the more time it takes to find modules, which is why custom modules are the slowest to load
File location
- File name extension analysis
Require analysis identifiers do not contain file extensions
Node tries the extensions .js, .json, and .node in turn
Each attempt makes a synchronous (blocking) call to the fs module to check whether the file exists, which can cause performance problems for single-threaded Node
Writing the .node or .json extension explicitly speeds things up a little and, together with the caching mechanism, greatly alleviates the blocking defect of Node's single thread
- Directory analysis and packages
Parsing the identifier might not find a file, but get a directory, which is treated as a package
Parse the package.json file for the file name specified by the main property of the package
If the file named by main fails to parse, or there is no package.json file, Node falls back to index as the file name
It then tries index.js, index.json, and index.node in turn
If the directory is not located successfully, search for the next module path
An exception is thrown until the module path array has been traversed and no target file is found
Modules compiled
In Node, each file module is an object
function Module(id, parent) {
this.id = id;
this.exports = {};
this.parent = parent;
if (parent && parent.children) {
parent.children.push(this);
}
this.filename = null;
this.loaded = false;
this.children = [];
}
- .js files
- Read synchronously through the fs module, then compiled and executed
- .node files
- C/C++ extension files; the file generated at the end of compilation is loaded via the dlopen method
- .json files
- Read synchronously through the fs module, then parsed with JSON.parse and the result returned
- Other extensions
- All loaded as .js files
Each compiled module stores its file path as an index on the Module._cache object to make a second import faster
Package and NPM
Node organizes its core modules and allows third-party file modules to be written and used in an orderly fashion
However, third-party modules are still scattered everywhere and cannot directly reference one another
Beyond the module system itself, packages and npm are the mechanisms that link modules together
To some extent, it solves the problems of code organization such as variable dependence and dependency relationship
Package structure
A package is actually an archive: a directory packed into a .zip or .tar.gz file, which is decompressed on installation to restore the directory
- A package directory that complies with the CommonJS specification should contain the following files
- package.json: the package description file
- bin: the directory for executable binaries
- lib: the directory for JavaScript code
- doc: the directory for documentation
- test: the code for unit test cases
Package description file
package.json
CommonJS defines the following required fields for package.json
- name: the package name
- description: a brief introduction to the package
- version: the version number
- keywords: a keyword array used by npm search
- maintainers: the package maintainer list
- contributors: the contributor list
- bugs: a web page address or email address for reporting bugs
- licenses: the license list
- repositories: a list of locations hosting the source code
- dependencies: the packages the current package depends on
- homepage: the current package's website address
- os: the supported operating system list
- aix, freebsd, linux, macos, solaris, vxworks, windows
- cpu: the supported CPU architecture list
- arm, mips, ppc, sparc, x86, x86_64
- builtin: whether the current package is a standard component built into the underlying system
- implements: a list of implemented specifications
- scripts: a script description object
NPM is implemented based on the definition of the package specification, which helps Node solve the problem of dependent package installation
NPM common functions
The CommonJS package specification is the theory, and NPM is one of the practices
NPM for Node is equivalent to gem for Ruby and PEAR for PHP
Help with the release, installation and dependency of third-party modules
- See the help
- Check the version
npm -v
- List the available commands
npm
- Install a dependency package
npm install {packageName}
After this command is executed, NPM creates a package directory in the node_modules directory in the current directory, and then unzips the corresponding package to this directory
- Global Installation mode
npm install {packageName} -g
Global mode does not install a module as a "global package", and it does not mean you can require it from anywhere
The term "global mode" is actually imprecise: -g installs a package as a globally available executable command
npm links the package's actual scripts, according to the bin field in its package description file, into the same path as the Node executable
- Local Installation
For some packages that are not published to NPM, or cannot be installed directly because of network reasons
This can be done by pressing the package to local and then installing it locally
npm install <tarball file>
npm install <tarball url>
npm install <folder>
- Install from unofficial sources
If you cannot use the official source, use the image source
npm install --registry={urlResource}
If the mirror source is used almost exclusively, you can specify the default source
npm config set registry {urlResource}
- NPM hook command
The idea behind the scripts field in package.json is to have packages provide hooks during installation, uninstallation, etc
"scripts": {
  "preinstall": "preinstall.js",
  "install": "install.js",
  "uninstall": "uninstall.js",
  "test": "test.js"
}
- Install
- Running npm install <package> first executes the script pointed to by preinstall, then the script pointed to by install
- Uninstall
- Running npm uninstall <package> executes the script pointed to by uninstall, which can do some cleanup
- Test
- Running npm test executes the script pointed to by test. A good package should include test cases and configure the test command in package.json, so users can run the tests to verify that the package is stable and reliable
LAN NPM
- background
The limitation for an enterprise is that it needs to enjoy the low coupling and project organization benefits of module development on the one hand, while considering module confidentiality on the other. Therefore, sharing and publishing through NPM is potentially risky.
- The solution
The solution is for enterprises to build their own NPM repositories in order to enjoy the many packages available on NPM while keeping their own packages confidential and restricted. NPM is open source for both its servers and clients.
A local NPM repository is set up in much the same way as a mirror repository, except that you can choose not to synchronize the packages in the official source repository
- role
- Private reusable modules can be packaged into a local NPM repository, which keeps the update centralized and prevents small projects from maintaining modules with the same functionality
- Do not share code by copy and paste
Asynchronous I/O
Why asynchronous I/O?
- The user experience
JavaScript executes on a single thread in the browser and shares a thread with UI rendering
High Performance JavaScript has concluded that pages can get stuck if scripts take more than 100ms to execute
If a web page fetched a network resource synchronously, JavaScript could continue only after the resource had been fully returned from the server; during that time the UI would freeze and not respond to user interaction. You can imagine how bad the user experience would be.
With asynchronous requests, JavaScript and UI execution are not in a waiting state, giving the user a live page
I/O is expensive, distributed I/O is even more expensive
The front-end experience can only be good if the back end responds quickly to resources
- The allocation of resources
During the development of computer, the components are abstracted into I/O devices and computing devices
Assuming a business scenario has a set of unrelated tasks to complete, there are two main approaches:
- Multithreading in parallel
The cost of multithreading lies in the overhead of creating threads and performing thread context switches.
Complex services also frequently run into problems such as locking and state synchronization. But multithreading can improve CPU utilization on multi-core CPUs
- A single thread executes sequentially
Single-threaded sequential execution of tasks is more consistent with the way programmers think sequentially, and is still the mainstream programming way
The disadvantage of serial execution is performance, as any task that is slightly slower will cause the subsequent execution code to block
Among computer resources, I/O and CPU computation can normally proceed in parallel; the problem with the synchronous programming model is that waiting on I/O stalls subsequent tasks, so resources are not well utilized
- Node gives its answer somewhere in between
Using single thread, away from multi-thread deadlock, state synchronization and other problems;
Using asynchronous I/O allows single threads to get away from blocking and better use of the CPU
To compensate for the inability of a single thread to take advantage of a multi-core CPU, Node provides a child process similar to Web Workers in a front-end browser that uses CPU and I/O efficiently through worker processes
It is expected that the INVOCATION of I/O will not block subsequent operations, and the time waiting for I/O completion will be allocated to other services
Status of asynchronous I/O
Asynchronous I/O and non-blocking I/O
From the operating system kernel's perspective there are only two kinds of I/O: blocking and non-blocking
When calling blocking I/O, the application waits for the I/O to complete before the call returns the result
Feature: the call does not end until the kernel has completed every step of the operation
Example: a read call ends only after the system kernel has finished the disk seek, read the data, and copied the data into memory
Non-blocking I/O differs from blocking I/O in that it is returned immediately after the call
After the non-blocking I/O returns, the CPU’s time slice can be used to process other transactions, where the performance improvement is significant
Existing problems:
- Because the full I/O is not complete, the data immediately returned is not what the business layer expects but merely the state of the current invocation
- To get complete data, the application repeatedly calls I/O operations to confirm completion, known as “polling.”
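Polling can be sketched in miniature in JavaScript. nonBlockingRead below is a made-up stand-in for a non-blocking I/O call: it returns a status immediately, and the caller must keep re-checking until the data is ready.

```javascript
// Sketch: polling in miniature. The call returns at once with a status;
// the application repeatedly re-invokes it to confirm completion.
var attempts = 0;

function nonBlockingRead() {
  attempts++;
  if (attempts < 3) {
    return { status: 'EAGAIN', data: null }; // not ready yet
  }
  return { status: 'OK', data: 'file contents' };
}

var result = nonBlockingRead();
while (result.status !== 'OK') {
  // CPU time is burned on these repeated checks (the cost of polling)
  result = nonBlockingRead();
}
```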
The main polling technique
- read
It is the most primitive technique with the worst performance: the I/O state is checked through repeated calls until the data is ready
The CPU is spent entirely on waiting until the data finally becomes available
- select
It is an improvement on read, which makes judgments about event states on file descriptors
Limitations: It uses a 1024-length array to store state and can check up to 1024 file descriptors at the same time
- poll
An improvement over select: it uses a linked list to avoid the array-length limit, and it can skip unnecessary checks
With a large number of file descriptors, its performance is still very low
- epoll
This scheme is the most efficient I/O event notification mechanism in Linux. If no I/O event is detected during the polling, it will sleep until the event wakes it up. It takes full advantage of event notifications and performs callbacks instead of traversing queries, so it doesn’t waste CPU and is more efficient to execute
Ideal non-blocking asynchronous I/O
Although epoll uses events to reduce CPU consumption, the thread is essentially idle while it sleeps, so the CPU is still underutilized for the current thread
Perfect asynchronous I/O would let the application make a non-blocking call and, without polling by traversal or event wake-ups,
go straight on to the next task, with the data simply passed to the application via a signal or callback after the I/O completes
Linux natively provides one kind of asynchronous I/O (AIO), which passes data through signals or callbacks
Disadvantages:
- Only under Linux
- Only O_DIRECT in I/O is supported. As a result, the system cache cannot be used
Realistic asynchronous I/O
Realistic asynchronous I/O is easy to implement (though it is simulated): let a few threads perform blocking I/O, or non-blocking I/O plus polling, while one thread does the computation, and pass the data from the I/O threads through inter-thread communication
- libeio essentially simulates asynchronous I/O with a thread pool and blocking I/O
- On *nix platforms, Node initially implemented asynchronous I/O with libeio and libev, and later implemented the thread pool itself
- IOCP under Windows
- Call an asynchronous method, wait for a notification that the I/O has completed, and perform a callback without the user thinking about polling
- Inside, it’s still thread pools, but the difference is that these thread pools are managed by the kernel
- This is very similar to the Node asynchronous invocation model
- Because of the differences between the Windows and *nix platforms, Node provides libuv as an abstraction layer to handle platform compatibility
- It ensures that Node's upper layers are independent of the custom thread pool or IOCP underneath
- We often say that Node is single-threaded
- The "single thread" refers only to JavaScript executing in a single thread
- On both *nix and Windows platforms there is a separate thread pool for internal I/O tasks
Node asynchronous I/O
Node completes the asynchronous I/O loop with event loops, observers, request objects, etc
Event loop
Emphasis is placed on Node’s own execution model, the event loop
When the Node process starts, it creates a loop similar to while(true)
The process of each loop body is called Tick, and the process of each Tick is to check whether there is an event to be processed
Fetch the event and its associated callback functions, if any, and execute them
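The Tick behavior can be felt directly, assuming only core Node APIs: callbacks queued for the loop never run during the code that queues them; they wait for a later pass.

```javascript
// Sketch: queued callbacks wait for a later pass of the loop.
var order = [];

process.nextTick(function () { order.push('nextTick'); });
setTimeout(function () { order.push('timer'); }, 0);
order.push('sync');

// Only 'sync' has run so far; 'nextTick' fires once the current stack
// unwinds, and 'timer' on a later timer phase of the loop.
```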
The observer
There are one or more observers in each event loop, and the process of determining if there are events to be processed is to ask those observers if there are events to be processed
- Browsers use a similar mechanism
- Events can occur when a user clicks or loads a file, and each of these events has a corresponding observer
- Node events originate from network requests and file I/O
- The observers corresponding to these events are file I/O observers, network I/O observers, and so on; observers classify the events
- The event loop is a typical producer/consumer model
- Asynchronous I/O, network requests, and so on are producers of events
- These events are passed to the corresponding observer, and the event loop retrieves the event from the observer and processes it
summary
The event loop, observers, request objects, and the I/O thread pool together constitute the basic elements of Node's asynchronous I/O model
Knowing that JavaScript is single-threaded, it is easy to conclude that it cannot take full advantage of multi-core CPUs
In fact, although JavaScript executes in a single thread, Node itself is multi-threaded; the I/O threads simply consume very little CPU
Another point to note is that apart from user code, which cannot run in parallel, all I/O can execute in parallel
Note: The picture shows the entire Node asynchronous I/O process
Event-driven and high-performance servers
The explanation of asynchrony above also outlines the essence of event-driven programming: running the program through a main loop plus event triggers
Here are some classic server models:
- synchronous
- Only one request can be processed at a time, and the rest of the requests are in wait state
- Process/request
- This can handle multiple requests, but it is not scalable because there are only so many system resources
- Thread/request
- Although threads are lighter than processes, each thread occupies a certain amount of memory; when a large number of concurrent requests arrive, memory is quickly exhausted and the server slows down
- Better than process-per-request, but still not enough for large sites
- conclusion
- The thread/request approach is still used by Apache
- Node handles requests in an event-driven manner, eliminating the need to create additional threads for each request and eliminating the overhead of thread creation and thread destruction
- At the same time, the operating system scheduling tasks because of fewer threads, the cost of context is very low
- This allows the server to process requests methodically, even with a large number of connections, without being affected by context-switching overhead, which is one reason for Node’s high performance
Event-driven efficiency is beginning to be valued by the industry
The well-known server Nginx has also abandoned multi-threading in favor of the same event-driven approach that Node uses
The difference is that Nginx is written in pure C and performs very well, but it is suited only to Web serving, reverse proxying, and load balancing, and is relatively weak at handling business logic
Node is a high-performance platform that can be used to build the same functionality as Nginx and handle specific businesses
Node is not quite as specialized as Nginx for Web servers, but the scenarios are larger and perform well on their own
In practical projects, their respective advantages can be combined to achieve the optimal performance of applications
JavaScript is almost blank on the server side, leaving Node with no historical baggage, and Node’s performance optimization has made it instantly popular in the community
Write in the last
This article introduced the purpose behind Node's creation, its language choice, its characteristics, the module mechanism, the package management mechanism, asynchronous I/O, and other related topics, hoping to give you a fresh understanding of Node. I have recently been planning to study Node and server-side development; interested readers are welcome to learn and exchange ideas together
- Nuggets: Front-end LeBron
- Zhihu: Front-end LeBron