- Node.js module resolution order: core modules > file modules identified by path > custom modules (looked up through node_modules)
- Once a module is loaded, the result is cached in `Module._cache` to speed up subsequent loads; every load checks the cache first
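A minimal illustration of the cache behavior, assuming a hypothetical local module `./a.js`:

```js
const a1 = require('./a');                    // loaded, executed, and cached in Module._cache
const a2 = require('./a');                    // served from the cache, so a1 === a2
delete require.cache[require.resolve('./a')]; // drop the cache entry
const a3 = require('./a');                    // the file is loaded and executed again
```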
- If the identifier has no extension, Node tries the `.js` > `.json` > `.node` extensions in that order
- Registering custom extension loaders such as `require.extensions['.json']` is not recommended; the same applies to `Module._extensions`
- JS modules are wrapped by Node when compiled. Because `exports` is only a parameter of that wrapper, `exports = function() {}` cannot export anything; `module.exports` should be used instead. The wrapper looks like this:

```js
(function (exports, require, module, __filename, __dirname) {
  // module code
});
```
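A minimal sketch of why reassigning `exports` has no effect:

```js
// exports starts out pointing at module.exports;
// reassigning the parameter breaks that link, so nothing is exported
exports = function () {};         // has no effect outside this module
module.exports = function () {};  // this is what require() returns
```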
- C/C++ modules are loaded by calling `process.dlopen()`. `dlopen` is implemented differently on Windows and *nix; the difference is smoothed over by libuv's compatibility layer.
- JSON file parsing: the fs module reads the JSON file synchronously and parses it with `JSON.parse`
- Core modules are written in two parts: C/C++ and JavaScript
- For the JavaScript half of the core modules, the js2c.py tool bundled with V8 converts all built-in JS code into C++ arrays and generates the node_natives.h header. In this process the JS source is stored as strings in the node namespace and cannot be executed directly; it is loaded into memory when Node starts. These modules go through the same wrapping as file modules, but differ in how the source is obtained and where compilation results are cached: the source is fetched via `process.binding('natives')`, and successfully compiled modules are cached in `NativeModule._cache`
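As a rough illustration, in legacy Node versions only (`process.binding` is an internal API that has since been restricted):

```js
// in old Node releases, process.binding('natives') returned a map
// from built-in module name to its JS source string
const natives = process.binding('natives');
console.log(Object.keys(natives)); // names of the built-in JS modules
console.log(typeof natives.os);    // 'string' (the source, not an executable module)
```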
- The C/C++ side divides into modules written purely in C/C++ and modules with a C/C++ core wrapped by JS, where JS mainly provides the encapsulation. The parts written purely in C/C++ are called built-in modules.
- Built-in modules register themselves into the node namespace through the NODE_MODULE macro, and the node_extensions.h file places these built-in modules into the node_module_list array. A built-in module can be fetched with the `get_builtin_module()` method provided by Node.
- The process by which a core module is introduced:

```
require('os')
NativeModule.require('os')
process.binding('os')
get_builtin_module('node_os')
NODE_MODULE(node_os, reg_func)
```
- Blocking I/O does not return from the call until the operation has fully completed at the system kernel level
- Non-blocking I/O returns immediately after the call, improving CPU utilization. But since only the call status is returned, the application must repeatedly re-invoke the I/O operation to confirm completion before it can retrieve the complete data. This technique of repeated calls to check whether an operation has finished is called polling.
- Polling techniques include read, select, poll, epoll, kqueue, etc.
- Real-world asynchronous I/O is implemented by emulating AIO with a thread pool: blocking calls run on pool threads while the main thread continues
- Event loop: each iteration of the loop is called a Tick, and on every Tick the observers are asked whether there are events to process
- The event loop is a typical producer/consumer model: asynchronous I/O and network requests are the event producers, and the event loop pulls events from the observers and processes them.
- Windows asynchronous I/O process:
  1. An asynchronous call is initiated and a request object is assembled.
  2. The request object is sent to the I/O thread pool, where it waits to be executed.
  3. A pool thread executes the I/O operation and puts the result into the request object.
  4. The thread notifies IOCP that the call has completed and returns itself to the pool.
  5. On each Tick, the I/O observer checks the thread pool (via IOCP) for completed requests and moves them into its queue.
  6. The result is taken out of the request object and passed as an argument to the callback, which is then executed.
- On Linux the asynchronous I/O process is driven by epoll; on FreeBSD it is implemented via kqueue.
- Non-I/O asynchronous APIs: `setTimeout` and `setInterval` work much like asynchronous I/O, except no I/O is involved. Each call creates a timer and places it in a red-black tree held by the timer observer. On every Tick the timer objects are iteratively taken from the tree and checked for expiry; if a timer has expired, its callback is executed immediately. `process.nextTick` simply pushes the callback into a queue that is drained on the next Tick, which is more lightweight than a timer. `setImmediate` is implemented similarly to `process.nextTick`, except that immediate callbacks are kept in a linked list while nextTick callbacks are kept in an array; on each Tick the whole nextTick queue is drained but only one setImmediate callback is executed. `process.nextTick` also fires before `setImmediate`, because the idle observer comes before the I/O observer, which comes before the check observer.
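A small ordering demo; note that modern Node drains the whole setImmediate queue per loop iteration rather than one callback at a time, but the nextTick-before-setImmediate ordering still holds:

```js
setImmediate(() => console.log('setImmediate 1'));
setImmediate(() => console.log('setImmediate 2'));
process.nextTick(() => console.log('nextTick 1'));
process.nextTick(() => console.log('nextTick 2'));
console.log('sync');
// prints: sync, nextTick 1, nextTick 2, setImmediate 1, setImmediate 2
```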
- Nginx, like Node, uses an event-driven architecture as a server. Being written in pure C, Nginx performs better, but it is only suitable as a web server.
- In most backend languages there is no practical limit on basic memory usage, but JavaScript in Node can only use part of the machine's memory (about 1.4 GB on 64-bit systems, about 0.7 GB on 32-bit systems) due to V8's limits
- Memory usage can be inspected with `process.memoryUsage()`
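For example:

```js
// all values are reported in bytes
const { rss, heapTotal, heapUsed } = process.memoryUsage();
console.log('rss:', rss, 'heapTotal:', heapTotal, 'heapUsed:', heapUsed);
```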
- V8 uses a generational garbage collection strategy: no single algorithm suits every scenario, so different algorithms are applied where they work best. V8 divides memory into a new generation and an old generation according to object lifetime; new-generation objects are short-lived, old-generation objects long-lived. The sizes can be specified at startup with `--max-new-space-size` and `--max-old-space-size`; V8 does not grow them automatically.
- The new generation is collected with the Scavenge algorithm. Its memory is split into two halves called semispaces; only one is in use at a time while the other sits idle. The semispace in use is called the From space, the idle one the To space. Allocation happens in the From space; when a collection starts, live objects in the From space are copied to the To space, the From space is then cleared, and the roles of the two spaces are swapped. The algorithm is fast, but its drawback is poor space utilization, which makes it suitable only for the short-lived objects of the new generation. An object that survives several copies is considered long-lived; it is moved to the old generation and managed by a different algorithm. This move is called promotion.
- An object is also promoted when copying it would push the To space above 25% usage: since the To space becomes the From space after the collection, enough room must be left for subsequent allocations. Promoted objects are handled by the old generation's collection algorithms.
- In the old generation, live objects are numerous, so Scavenge would be inefficient and waste half the space. The old generation therefore mainly uses a combination of Mark-Sweep and Mark-Compact.
- Mark-Sweep has two phases: marking and sweeping. A traversal marks the live objects, and once marking finishes, the unmarked objects are swept away. This leaves memory fragmented, which is where Mark-Compact comes in: it moves all live objects to one end and then clears everything beyond the boundary. Because moving objects is slow, V8 mainly uses Mark-Sweep, falling back to Mark-Compact when memory is insufficient.
- To keep the JS application's view of memory consistent with the garbage collector's, all three basic collection algorithms must pause the application logic and resume it only after collection finishes, a behavior known as stop-the-world. To avoid long full pauses, V8 introduced incremental marking, incremental compaction, lazy sweeping, and so on, breaking collection into small steps that alternate with application logic until the work is done.
- Web servers often keep sessions in memory; under heavy traffic the number of live old-generation objects grows quickly, which not only makes the sweep/compact phases slow but also strains memory and can even cause an overflow.
- Add `--trace_gc` to print garbage collection logs. Add `--prof` to obtain V8 performance profiling data, which includes time spent in garbage collection; it can be analyzed with the linux-tick-processor tool shipped in the Node source tree.
- Global variables add references and, unless released, are held until the program ends. In general, JS code reaches outer variables through the scope chain, from the inside out. A method that lets an outer scope access an inner scope's variables is a closure, made possible by higher-order functions: functions can be arguments or return values. Closures enable many clever techniques, but the catch is that once a variable references the inner function, that function is not released, so its original scope is not released either, and the scope's memory stays occupied until the reference is gone. Watch out for global variables and closures; both increase memory usage.
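A minimal sketch of a closure pinning its scope in memory:

```js
function makeClosure() {
  const big = new Array(1e6).fill('*'); // allocated in makeClosure's scope
  return function () {
    return big.length; // the returned function references `big`
  };
}
const hold = makeClosure();
// as long as `hold` is reachable, the whole scope (including `big`) cannot be collected
console.log(hold());
```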
- Buffer memory does not go through V8's allocation mechanism (it lives outside the V8 heap), so it is not subject to the heap size limit.
- The causes of memory leaks are as follows:
- Caches (using plain object key/value pairs as a cache; a good fix is an external cache such as Redis, which both makes garbage collection more efficient and allows the cache to be shared; a size-capped in-process alternative is sketched after this list)
- Queues consumed too slowly (consumption slower than production; add a backlog alarm mechanism)
- Scope is not released
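A minimal sketch of a size-capped in-process cache, assuming a simple FIFO eviction policy:

```js
const LIMIT = 1000; // hypothetical cap
const cache = new Map();
function put(key, value) {
  if (cache.size >= LIMIT) {
    // evict the oldest entry; a Map iterates in insertion order
    cache.delete(cache.keys().next().value);
  }
  cache.set(key, value);
}
```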
- Tools for tracking down memory leaks: v8-profiler, node-heapdump, node-mtrace, dtrace, node-memwatch
- A Buffer object is an array-like object; each element is a byte, displayed as two hexadecimal digits, with a value between 0 and 255.
- When the requested Buffer is smaller than 8 KB, an 8 KB slab is allocated and held by a module-level variable; subsequent small Buffers share this slab, with its offset and state tracked. When the remaining space in the current slab is insufficient, a new slab is allocated. If the requested size exceeds 8 KB, a dedicated slab of exactly that length is allocated and used exclusively by that Buffer.
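A quick check of the pool; note that the exact pooling threshold varies across Node versions:

```js
// small Buffer allocations are carved out of a shared pool (slab)
console.log(Buffer.poolSize); // 8192 by default
const small = Buffer.allocUnsafe(1024);      // typically served from the shared slab
const large = Buffer.allocUnsafe(64 * 1024); // gets its own dedicated allocation
console.log(small.length, large.length);
```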
- A Buffer can be written with various encodings via `buf.write`, and converted to a string via `buf.toString`. Buffer itself supports a limited set of encodings, which can be checked with `Buffer.isEncoding()`. For unsupported encodings such as GBK, the iconv and iconv-lite packages from the Node ecosystem can be used.
- Concatenating Buffers with `+` implicitly calls `toString`. That is fine in most scenarios, but wide-byte characters split across chunk boundaries can come out garbled; instead, collect the Buffers in an array and merge them at the end with `Buffer.concat` (see the sketch below).
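A common pattern, sketched here with a hypothetical input file:

```js
const fs = require('fs');
const rs = fs.createReadStream('./input.txt'); // hypothetical path
const chunks = [];
let size = 0;
rs.on('data', chunk => {
  chunks.push(chunk); // keep the raw Buffer; do not stringify chunk by chunk
  size += chunk.length;
});
rs.on('end', () => {
  const buf = Buffer.concat(chunks, size); // merge once, then decode
  console.log(buf.toString('utf8'));
});
```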
- Sending Buffers instead of strings in HTTP responses is more efficient: it avoids re-encoding the text on every request, reducing CPU usage and saving server resources.
- `fs.createReadStream` works by preparing a Buffer in memory and progressively copying bytes from disk into it via `fs.read`. When a read completes, a slice of that Buffer is taken as a small Buffer and handed to the caller through a data event. Ideally each read is `highWaterMark` bytes, but near the end of the file, or if the file is smaller than that, part of the preallocated buffer goes unused. The buffer pool resides in memory, and when it runs out a new buffer object is allocated. `highWaterMark` affects performance: setting it too small leads to too many system calls.
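A sketch of tuning `highWaterMark`, with a hypothetical file path:

```js
const fs = require('fs');
// a larger highWaterMark means bigger reads and fewer system calls for large files
const rs = fs.createReadStream('./big.file', { highWaterMark: 64 * 1024 });
rs.on('data', chunk => {
  console.log('read', chunk.length, 'bytes'); // up to 64 KB per 'data' event
});
```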
- HTTP requests and responses:

```js
// server.js
var http = require('http');
http.createServer((req, res) => {
  res.writeHead(200, {
    'Content-Type': 'text/plain'
  });
  res.end('hello world\n');
}).listen(1337, '127.0.0.1');
```
```js
// client.js
var http = require('http');
var req = http.request({
  host: '127.0.0.1',
  port: 1337,
  path: '/',
  method: 'GET',
  agent: false // opt out of the shared pool; a separate agent is used for this request
}, res => {
  console.log('STATUS: ' + res.statusCode);
  console.log('HEADERS: ' + JSON.stringify(res.headers));
  res.setEncoding('utf8');
  res.on('data', chunk => {
    console.log(chunk);
  });
});
req.end();
```
- HTTP requests made through ClientRequest objects to the same server share a connection pool managed by `http.globalAgent`. By default at most five concurrent connections are created per server (the default in the Node versions the book covers; newer versions default to Infinity). The limit can be changed via `maxSockets`. `agent.sockets` represents the connections currently in use, and `agent.requests` represents the requests waiting for a free connection:
```js
const agent = new http.Agent({
  maxSockets: 3
});
```
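The capped agent is then passed into the request options; a sketch against the server and `agent` above:

```js
http.get({ host: '127.0.0.1', port: 1337, path: '/', agent }, res => {
  // agent.sockets is keyed by 'host:port'; at most 3 sockets are active here
  console.log(agent.sockets);
  res.resume(); // drain the response so the socket can be reused
});
```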
- WebSocket pairs well with Node: the WebSocket client's event-based programming model resembles Node's custom events. The WebSocket protocol is implemented directly on top of TCP rather than HTTP, but its handshake is carried over HTTP. The protocol has two parts: the handshake and data transfer. A WebSocket handshake request differs from an ordinary HTTP request mainly in the `Upgrade: websocket` and `Connection: Upgrade` headers, which ask the server to switch protocols to WebSocket; some verification headers are also included.
- Cookies can be used to store state, but keeping sensitive information directly on the client is a security risk, hence the session scheme: the cookie carries only a token, the server stores the session and looks it up by that token on every request. On a multi-core machine, requests may be spread across many processes, and in-memory sessions cannot be shared between them; sessions kept in memory also risk overflowing it. Efficient external caches such as Redis and memcached solve both problems. Since the token in the cookie might be guessed by enumeration, it can be signed to make forgery harder; but a signed token can still be stolen, so it can additionally be signed with client-specific information such as the user's IP and User-Agent, making a stolen token fail unless the attacker controls the original client.
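A minimal signing sketch, assuming a hypothetical `secret` kept on the server:

```js
const crypto = require('crypto');

// sign the session token together with client-specific information,
// so a stolen cookie fails when replayed from a different client
function sign(token, req, secret) {
  const ip = req.socket.remoteAddress || '';
  const ua = req.headers['user-agent'] || '';
  return crypto.createHmac('sha256', secret)
    .update(token + ip + ua)
    .digest('hex');
}
// the cookie would then carry token + '.' + sign(token, req, secret),
// and the server recomputes the signature on each request to verify it
```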
- Node executes JS in a single process with a single thread; to exploit multi-core CPUs, multiple processes can be started via `child_process`. For inter-process communication, messages are received by listening for message events and sent with `send()`.
- Handles can be sent between processes; a handle can identify a server socket, a UDP socket, a pipe, and so on. Multiple child processes cannot each listen on the same port, but if the main process listens on the port and passes the handle to the children, the children can all serve that port. Processes can only exchange messages over IPC; objects are never actually transferred. Sending a handle really sends a message describing it, which is serialized and then used to reconstruct an equivalent handle in the child process:
```js
// parent.js
const cp = require('child_process');
const child1 = cp.fork('child.js');
const child2 = cp.fork('child.js');
const server = require('net').createServer();
server.on('connection', (socket) => {
  socket.end('handled by parent\n');
});
server.listen(1337, () => {
  child1.send('server', server);
  child2.send('server', server);
  server.close();
});

// child.js
const http = require('http');
const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('handled by child, pid is ' + process.pid + '\n');
});
process.on('message', (m, tcp) => {
  if (m === 'server') {
    tcp.on('connection', (socket) => {
      server.emit('connection', socket);
    });
  }
});
```
- The reason multiple processes can serve the same port is that the handle restored in each process maps to the same file descriptor. A file descriptor can only be consumed by one process at a time, so this style of multi-process service is preemptive.
- The parent process can send signals such as `kill` to a child, and each process can listen for these signal events and react accordingly. We can listen for a worker's exit and fork a replacement. Exceptions are handled the same way: when a worker hits an unexpected error, it tells the master and stops accepting new requests, the master forks a fresh worker to take over, and the broken worker exits smoothly once its existing connections finish. This greatly improves the application's stability and robustness. As a final hardening step, cap the number of restarts within a short time window; beyond that cap, give up restarting and raise an alarm for monitoring instead.
```js
// master.js
const net = require('net');
const cp = require('child_process');
const server = net.createServer();
server.listen(1337);
const workers = {};
const createWorker = () => {
  const worker = cp.fork('./worker.js');
  worker.on('message', message => {
    if (message.act === 'suicide') {
      createWorker();
    }
  });
  worker.on('exit', () => {
    console.log('worker ' + worker.pid + ' exited.');
    // the replacement was already forked when the 'suicide' message arrived
    delete workers[worker.pid];
  });
  worker.send('server', server);
  workers[worker.pid] = worker;
  console.log('create worker, pid: ' + worker.pid);
};
for (let i = 0; i < 4; i++) {
  createWorker();
}
process.on('exit', () => {
  for (let pid in workers) {
    workers[pid].kill();
  }
});

// worker.js
const http = require('http');
const logger = console; // minimal stand-in logger
const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('handled by child, pid is ' + process.pid + '\n');
  throw new Error('throw exception');
});
let worker;
process.on('message', (m, tcp) => {
  if (m === 'server') {
    worker = tcp;
    worker.on('connection', (socket) => {
      server.emit('connection', socket);
    });
  }
});
process.on('uncaughtException', function (err) {
  // Record logs for troubleshooting
  logger.error(err);
  // Send a suicide signal so the master forks a replacement
  process.send({ act: 'suicide' });
  // Stop accepting new connections
  worker.close(() => {
    // Exit the process once all existing connections are closed
    process.exit(1);
  });
  // Long-lived connections (e.g. WebSocket) may take a while to drain, so set a timeout
  setTimeout(() => {
    process.exit(1);
  }, 5000);
});
```
- For load balancing, Node's cluster module provides a Round-Robin scheduling mechanism. It is toggled like this:

```js
cluster.schedulingPolicy = cluster.SCHED_RR;   // enable round-robin scheduling
cluster.schedulingPolicy = cluster.SCHED_NONE; // leave scheduling to the operating system
```
Or set the NODE_CLUSTER_SCHED_POLICY environment variable:

```sh
export NODE_CLUSTER_SCHED_POLICY=rr
export NODE_CLUSTER_SCHED_POLICY=none
```
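A minimal cluster sketch; the policy must be set before the first `fork()`:

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

cluster.schedulingPolicy = cluster.SCHED_RR; // distribute connections round-robin

if (cluster.isMaster) {
  // one worker per CPU core
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  http.createServer((req, res) => {
    res.end('handled by pid ' + process.pid + '\n');
  }).listen(1337);
}
```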
- The simplest way to share state across processes is to keep the data in a third-party store and have every worker read it into memory at startup. The problem is that when the data changes, some mechanism is needed to notify each child process so that their internal state is updated too. Having every child constantly poll the store for updates works, but with many processes the polling adds query overhead. The alternative is active notification: still polling, but through a single dedicated notification process that queries the store and notifies each worker. Signals do not work across multiple servers, so in that case a TCP or UDP based notification channel should be used instead.
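A rough sketch of the dedicated notification process, where `loadFromStore` and `hasChanged` are assumed helpers around an external store such as Redis:

```js
const cluster = require('cluster'); // assumes this runs in the cluster master

setInterval(async () => {
  const data = await loadFromStore();   // assumed helper: query the external store
  if (hasChanged(data)) {               // assumed helper: diff against the last snapshot
    for (const id in cluster.workers) {
      cluster.workers[id].send({ act: 'refresh', data }); // push the update over IPC
    }
  }
}, 10 * 1000);
```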