Background
A typical Node.js application is basically a collection of callbacks that run in response to various events: an incoming connection, I/O completion, timeout expiration, Promise resolution, and so on. A single main thread (the event loop) executes all of these callbacks, and because other pending callbacks are waiting their turn, each callback should complete as quickly as possible. This is a known and challenging limitation of Node.js, and it is well explained in this documentation.
I recently stumbled upon a real-life event loop blocking scenario at work, and as I tried to fix the problem, I realized how little I knew about the behavior of the event loop. This realization surprised me, and some of my fellow developers, at first — and I suspect many Node.js developers are in the same position, which is what prompted me to write this article.
A lot of valuable information already exists on this topic, but it took me a while to find the answers to the specific questions I wanted to ask. I'll do my best to share those questions and answers here, along with the great articles, interesting experiments, and investigations I found along the way.
First, let’s block the event loop!
Note:
The following examples were run on a 4-core Ubuntu 18.04 virtual machine with Node.js v10.9.0, hosted on a MacBook Pro 2017.
You can clone https://github.com/michael-go/node-async-block to play with the code yourself.
So, let’s start with a simple Express service:
const express = require('express');

const PID = process.pid;
function log(msg) {
  console.log(`[${PID}]`, new Date(), msg);
}

const app = express();

app.get('/healthcheck', function healthcheck(req, res) {
  log('they check my health');
  res.send('all good!\n');
});

const PORT = process.env.PORT || 1337;
let server = app.listen(PORT, () => log('server listening on: ' + PORT));
Now let’s add a pesky event loop blocking interface:
const crypto = require('crypto');
function randomString() {
return crypto.randomBytes(100).toString('hex');
}
app.get('/compute-sync', function computeSync(req, res) {
  log('computing sync!');
const hash = crypto.createHash('sha256');
for (let i=0; i < 10e6; i++) {
hash.update(randomString())
}
res.send(hash.digest('hex') + '\n');
});
So we expect requests to /healthcheck to stall while /compute-sync is executing. We can use this bash one-liner to probe /healthcheck:
while true; do date && curl -m 5 http://localhost:1337/healthcheck && echo; sleep 1; done
It probes /healthcheck every second and times out if there is no response within 5 seconds. Let's try it:
We call /compute-sync, and the health checks start to time out — none of them succeed. Note that the CPU usage of the Node.js process is at 100%.
After 43.2 seconds the computation completes, the 8 pending /healthcheck requests are answered, and the service becomes responsive again. This is a really bad situation: a single expensive request blocked the whole service. With the single-threaded event loop in mind, it's obvious why the /compute-sync code blocks all other requests: it never yields control back to the event loop until the request completes, so the /healthcheck handler never gets a chance to run.
Let’s not block it!
Let's add a new interface, /compute-async, where we'll break up the long blocking loop by pausing after each calculation step (too many context switches?) and returning control to the event loop. We'll tune this later to pause only once every N iterations, but let's start with the naive version:
app.get('/compute-async', async function computeAsync(req, res) {
  log('computing async!');
const hash = crypto.createHash('sha256');
const asyncUpdate = async () => hash.update(randomString());
for (let i = 0; i < 10e6; i++) {
await asyncUpdate();
}
res.send(hash.digest('hex') + '\n');
});
Let's try it… 🤔 Hmm — it still blocks the event loop?
What's going on here? Is Node.js somehow optimizing away our async/await usage? It takes even longer to complete than the sync version, so it doesn't look that way… If the async machinery is actually in play, a flame graph should give us a hint, and we can use this cool project to quickly produce one:
First, the flame graph captured while running /compute-sync:
[flame graph: 100% of samples inside the synchronous /compute-sync call stack, which contains the express handler]
We can see that 100% of the time is spent synchronously in the /compute-sync call stack.
Now, the flame graph captured while running /compute-async:
[flame graph: most samples inside V8's RunMicrotasks and AsyncFunctionAwaitResolveClosure]
It looks very different: most of the time is spent running the RunMicrotasks and AsyncFunctionAwaitResolveClosure functions in V8's context. So no — it doesn't look like async/await is being optimized away.
Before delving further, let’s try one last desperate thing we programmers sometimes do: add a sleep call.
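The repo exposes this as GET /compute-with-set-timeout; the handler isn't shown above, but it presumably looks something like this sketch — the same pattern as /compute-async, yielding via setTimeout(0) instead (the handler name here is my own guess):
function setTimeoutPromise(delay) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(), delay);
  });
}

app.get('/compute-with-set-timeout', async function computeWSetTimeout(req, res) {
  log('computing async with setTimeout!');
  const hash = crypto.createHash('sha256');
  for (let i = 0; i < 10e6; i++) {
    hash.update(randomString());
    await setTimeoutPromise(0); // yield to the event loop via the timers phase
  }
  res.send(hash.digest('hex') + '\n');
});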
Let’s run it:
Finally, the service stays responsive during the heavy computation 🎉. Unfortunately, this solution comes at a very high price: CPU usage hovers around 10%, so the process isn't working nearly hard enough on the computation, which now takes extremely long to complete (I didn't have the patience to watch it finish 😅). Only after lowering the number of iterations from 10⁷ to 10⁵ did it finish — after 2:23 minutes! (If you're wondering: passing 1 instead of 0 as the delay parameter to setTimeout makes no difference — more on that later.)
This raises several questions: why did the async code block until we added setTimeout()? Why does setTimeout() add such a huge performance overhead? And how can we keep our computation from blocking without such side effects?
Event loop phases
When something similar happened to me, I turned to Google for help. Searching for something like "node async block event loop", one of the first results was the official Node.js guide, and it really opened my eyes.
It explains that the event loop is not the scheduler magic I had imagined, but a fairly simple loop with several phases. Here's a nice ASCII diagram of the main phases, copied from the guide:
(I later found out that this loop is literally implemented like that in libuv, as can be seen here: github.com/libuv/libuv…) The guide explains the differences between the phases and gives the following overview:
- timers: this phase executes callbacks scheduled by setTimeout() and setInterval().
- pending callbacks: executes I/O callbacks deferred to the next loop iteration.
- idle, prepare: only used internally.
- poll: retrieves new I/O events and executes I/O-related callbacks (almost all of them, with the exception of close callbacks, callbacks scheduled by timers, and setImmediate()); node will block here when appropriate.
- check: setImmediate() callbacks are invoked here.
- close callbacks: some close callbacks, e.g. socket.on('close', …).
This begins to answer our question:
"Typically, when the event loop enters a given phase, it will perform any operations specific to that phase, then execute callbacks in that phase's queue until the queue has been exhausted… since any of those operations may schedule more operations…"
This (vaguely) suggests that when a callback enqueues another callback belonging to the same phase, the new one is processed before the loop moves on to the next phase. Node.js checks the network socket for new requests only during the poll phase. So does this mean that whatever we did in /compute-async effectively kept the event loop from leaving some phase, so it never cycled back through the poll phase to even receive the /healthcheck requests? We'll need to dig deeper for a definitive answer, but it's already clear why setTimeout() works: whenever we queue a timer callback while the current phase's queue is empty, the event loop must traverse all the phases to reach the timers phase — passing through the poll phase on the way and handling pending network requests.
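As a quick side-experiment (my own toy snippet, not from the original repo), the phase ordering is easy to observe: inside an I/O callback we're just past the poll phase, so a setImmediate() callback (check phase) always fires before a setTimeout(0) callback (timers phase of the next iteration):
const fs = require('fs');

fs.readFile(__filename, () => {
  // we are now inside an I/O callback, i.e. just past the poll phase
  setTimeout(() => console.log('timers phase'), 0);
  setImmediate(() => console.log('check phase'));
});
// always prints "check phase" first, then "timers phase"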
👋 Micro-tasks
We know that async/await in /compute-async is syntactic sugar for Promises, so what actually happens is that on each iteration of the loop we create a Promise for the hash update, and the loop only proceeds to the next iteration once that Promise resolves. The official guide quoted above doesn't specify how Promises are handled relative to the event loop phases, but our experiment shows that both the Promise callbacks and the resolution callbacks run without the event loop passing through the poll phase.
Through some more googling, I found Deepal Jayasekara's 5-part series of more detailed and better illustrated articles about the event loop, including information on how it handles Promises 🙏.
It contains this useful graph:
The boxes around the circle represent the phase queues we saw earlier, and the two boxes in the middle represent two special queues that must be drained before moving from one phase to the next: the next tick queue holds callbacks registered with process.nextTick(), and the other micro tasks queue is where Promise callbacks go.
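A tiny standalone snippet (again mine, not from the article's repo) makes these priorities visible — the next-tick queue drains first, then the micro-task queue, and only then can the loop move on to a phase like check:
setImmediate(() => console.log('3: setImmediate (check phase)'));
Promise.resolve().then(() => console.log('2: promise (micro-task queue)'));
process.nextTick(() => console.log('1: nextTick (next-tick queue)'));
// prints in the numbered order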
In our case, when a resolving Promise creates another Promise, will the new one be processed within the same phase, before moving on? The answer, as we saw, is yes! We can see how this happens in the V8 source code — this link points to the implementation of RunMicrotasks(). Here is a snippet of the relevant lines:
TF_BUILTIN(RunMicrotasks, InternalBuiltinsAssembler) {
  ...
  Label init_queue_loop(this);
  Goto(&init_queue_loop);
  BIND(&init_queue_loop);
  {
    TVARIABLE(IntPtrT, index, IntPtrConstant(0));
    Label loop(this, &index), loop_next(this);
    TNode<IntPtrT> num_tasks = GetPendingMicrotaskCount(microtask_queue);
    ReturnIf(IntPtrEqual(num_tasks, IntPtrConstant(0)), UndefinedConstant());
    ...
    Goto(&loop);
    BIND(&loop);
    {
      ...
      index = IntPtrAdd(index.value(), IntPtrConstant(1));
      ...
      BIND(&is_callable);
      {
        ...
        Node* const result = CallJS(...);
        ...
        Goto(&loop_next);
      }
      BIND(&is_callback);
      {
        ...
        Node* const result = CallRuntime(Runtime::kRunMicrotaskCallback, ...);
        Goto(&loop_next);
      }
      BIND(&is_promise_resolve_thenable_job);
      {
        ...
        Node* const result = CallBuiltin(Builtins::kPromiseResolveThenableJob, ...);
        ...
        Goto(&loop_next);
      }
      BIND(&is_promise_fulfill_reaction_job);
      {
        ...
        Node* const result = CallBuiltin(Builtins::kPromiseFulfillReactionJob, ...);
        ...
        Goto(&loop_next);
      }
      BIND(&is_promise_reject_reaction_job);
      {
        ...
        Node* const result = CallBuiltin(Builtins::kPromiseRejectReactionJob, ...);
        ...
        Goto(&loop_next);
      }
      ...
      BIND(&loop_next);
      // loops back: process the next task, or re-check the queue via the outer loop
      Branch(IntPtrLessThan(index.value(), num_tasks), &loop, &init_queue_loop);
    }
  }
}
This snippet — implementing loops through the magic of the low-level Goto() and Branch() functions and macros — looks strange because it's written in V8's CodeStubAssembler, "a custom, platform-agnostic assembler that provides low-level primitives as a thin abstraction over assembly".
At its core we can see an outer loop labeled init_queue_loop and an inner loop labeled loop. The outer loop checks the number of micro-tasks actually pending, while the inner loop processes them one by one. Once the inner loop is done — see the Branch() call at the bottom of the snippet — it jumps back to the outer loop, which re-checks the number of pending micro-tasks and returns only if no new tasks were added in the meantime.
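The same behavior can be demonstrated from JavaScript — this tiny sketch of mine chains promises so that each micro-task enqueues another, and all of them still run before the check phase is reached:
setImmediate(() => console.log('check phase'));

let p = Promise.resolve();
for (let i = 0; i < 3; i++) {
  p = p.then(() => console.log('micro-task', i)); // each .then enqueues a new micro-task
}
// prints: micro-task 0, micro-task 1, micro-task 2, check phase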
To learn more about how micro-tasks are handled around the event loop phases, I recommend another good article by Deepal Jayasekara. (B.T.W., remember the flame graph we created for /compute-async? If you scroll up and look at it again, you'll now recognize RunMicrotasks() in the call stack.) Also note that micro-tasks are a V8 thing, so this is just as relevant to Chrome, where Promises are handled the same way — spinning without letting the browser's event loop progress.
nextTick(), don't tick
The Node.js guide says:
"Callbacks passed to process.nextTick() will be processed before the event loop continues. This can create some bad situations because it allows you to starve your I/O by making recursive process.nextTick() calls."
The text is pretty clear this time, so I'll spare you another terminal screenshot — but you can try it yourself via GET /compute-with-next-tick in github.com/michael-go/… .
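For reference, here's a sketch of what that endpoint presumably looks like — the same pattern as /compute-async, but yielding via process.nextTick() (and blocking just the same):
function nextTickPromise() {
  return new Promise((resolve) => {
    process.nextTick(() => resolve());
  });
}

app.get('/compute-with-next-tick', async function computeWNextTick(req, res) {
  log('computing async with nextTick!');
  const hash = crypto.createHash('sha256');
  for (let i = 0; i < 10e6; i++) {
    hash.update(randomString());
    await nextTickPromise(); // stays in the next-tick queue — still blocks
  }
  res.send(hash.digest('hex') + '\n');
});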
The nextTick queue is processed here: github.com/nodejs/node…
function _tickCallback() {
  ...
  do {
    while (tock = queue.shift()) {
      ...
      Reflect.apply(callback, undefined, tock.args);
      ...
    }
    runMicrotasks();
  } while (!queue.isEmpty() || emitPromiseRejectionWarnings());
  ...
}
It's clear here that if a callback calls process.nextTick() again, the new callback is processed within the same loop, which exits only once the queue is empty.
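Here's a minimal demonstration of that starvation (my own snippet — don't run it anywhere you care about): the readFile callback below never gets a chance to run, because the next-tick queue never empties:
const fs = require('fs');

fs.readFile(__filename, () => console.log('I/O callback')); // never prints

(function tick() {
  process.nextTick(tick); // each callback enqueues another — the queue never drains
})();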
setImmediate() to the rescue?
So, Promises are red, timers are blue — what else can we do?
Both diagrams above mention the setImmediate() queue, which is processed during the check phase, right after the poll phase. The official Node.js guide says:
"setImmediate() and setTimeout() are similar, but behave in different ways depending on when they are called… The main advantage to using setImmediate() over setTimeout() is that setImmediate() will always be executed before any timers if scheduled within an I/O cycle, independently of how many timers are present."
Sounds good, but the most promising quote is:
"If the poll phase becomes idle and scripts have been queued with setImmediate(), the event loop may continue to the check phase rather than waiting."
Remember that when we used setTimeout(), the process was mostly idle (i.e. waiting) and used only about 10% CPU? Let's see whether setImmediate() solves that:
app.get('/compute-with-set-immediate', async function computeWSetImmediate(req, res) {
  log('computing async with setImmediate!');
  function setImmediatePromise() {
    return new Promise((resolve) => {
      setImmediate(() => resolve());
    });
  }
  const hash = crypto.createHash('sha256');
  for (let i = 0; i < 10e6; i++) {
    hash.update(randomString());
    await setImmediatePromise();
  }
  res.send(hash.digest('hex') + '\n');
});
Yay! It doesn't block the service, and CPU utilization is at 100% 🎉! Let's see how long the call takes to complete:
It completes in 1:07 minutes — about 50% longer than /compute-sync and 34% longer than /compute-async — but unlike /compute-with-set-timeout, it actually works. Given that this is a toy example in which we call setImmediate() on each of 10⁷ tiny iterations, forcing a full spin of the event loop every time, the overhead is understandable.
But why does each iteration take a full event loop spin — doesn't a setImmediate() callback queued during the check phase get processed within that same phase? No: the Node.js documentation explicitly states (nodejs.org/api/timers….):
If an immediate timer is enqueued from within an executing callback, the timer will not fire until the next iteration of the event loop
And when comparing setImmediate() to process.nextTick(), the Node.js guide says:
We recommend that developers use setImmediate() in all situations…
For those of you craving code references, don't despair — I believe this is the actual code that executes the setImmediate() callbacks: github.com/nodejs/node…
B.T.W., Deepal mentions an interesting fact in part 3 of his series (jsblog.insiderattack.net/promises-ne…): Bluebird, a popular non-native Promise library, also uses setImmediate() to schedule Promise callbacks.
Interim summary
(More or less — a bonus section about setTimeout(0) follows.) So setImmediate() seems to be the way to break up long-running synchronous code. But it doesn't come for free: setImmediate() introduces noticeable overhead here, and the more requests are pending, the longer the long-running task takes to complete. Granularity matters, too — splitting the code too finely wastes time spinning the event loop, while splitting it too coarsely blocks the event loop for too long. And in some cases the blocking code isn't a simple loop like in our example: when recursively traversing a complex data structure, finding the right place and condition to yield gets much trickier.
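When the blocking work is a plain loop like ours, the simplest mitigation is batching. A sketch (the endpoint name is hypothetical, and it assumes the setImmediatePromise() helper from above is hoisted to module scope) that spins the event loop only once every BATCH_SIZE iterations, amortizing the setImmediate() overhead:
const BATCH_SIZE = 1000; // tune: bigger batches = less overhead, but longer blocking

app.get('/compute-batched', async function computeBatched(req, res) {
  const hash = crypto.createHash('sha256');
  for (let i = 0; i < 10e6; i++) {
    hash.update(randomString());
    if (i % BATCH_SIZE === 0) await setImmediatePromise(); // yield once per batch
  }
  res.send(hash.digest('hex') + '\n');
});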
For trickier cases, a possible solution is to track how long we've been blocking before yielding: the snippet below calls setImmediate() again only once at least 10ms have passed since the last yield.
let blockingSince = Date.now();

async function crazyRecursion() {
  ...
  // yield only if we have been blocking the event loop for at least 10ms
  if (Date.now() - blockingSince > 10) {
    await setImmediatePromise();
    blockingSince = Date.now();
  }
  ...
}
In other situations you don't have full control over the code at all — third-party libraries can block, and even built-in functions like JSON.parse() can block. A more radical solution is to move potentially blocking code out of the main thread entirely: to a different (non-Node.js) service, to child processes (e.g. via libraries like tiny-worker), or to actual threads via packages like webworker-threads or the built-in (still experimental) worker_threads module: nodejs.org/api/worker_… In many cases this is the right solution, but all of these isolation approaches share the hurdle of serializing data back and forth, because the separated code has no access to the context of the main JS thread.
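For illustration, a minimal worker_threads sketch (on Node 10 this requires the --experimental-worker flag; the module is stable in later versions):
const { Worker, isMainThread, parentPort } = require('worker_threads');
const crypto = require('crypto');

if (isMainThread) {
  // main thread: the event loop stays free while the worker computes
  const worker = new Worker(__filename);
  worker.on('message', (digest) => console.log('hash ready:', digest));
} else {
  // worker thread: the same blocking loop, now off the main thread
  const hash = crypto.createHash('sha256');
  for (let i = 0; i < 10e6; i++) {
    hash.update(crypto.randomBytes(100).toString('hex'));
  }
  parentPort.postMessage(hash.digest('hex'));
}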
The common challenge with all these solutions is that the responsibility for avoiding blocking rests with each developer, rather than with the operating system (as in most threaded languages) or the runtime VM (as in Erlang). Not only is it hard to ensure that every developer is always aware of this — explicit handling in code is often suboptimal anyway, because the code lacks context and visibility into what other pending callbacks and I/O are waiting.
The high throughput Node.js enables can (if used incorrectly) leave many requests pending behind a single blocking operation. It's therefore important to signal the load balancer as quickly as possible that a server has a long backlog of requests, so they can be routed to other nodes — a plain health-check endpoint won't help while the loop is actually blocked, so a library like toobusy-js can be used instead (a sketch follows below). Monitoring event-loop lag at runtime, with libraries like blocked-at or solutions like N|Solid and New Relic, is just as important.
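Wiring toobusy-js (which measures event-loop lag) into the service as middleware might look like this sketch, based on the library's documented API:
const toobusy = require('toobusy-js');

app.use((req, res, next) => {
  if (toobusy()) {
    // event loop lag is above threshold — shed load so the LB can route elsewhere
    res.status(503).send('server too busy\n');
  } else {
    next();
  }
});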
Bonus: why setTimeout(0)?
Zero is not really zero, right?
I was really curious why setTimeout(0) left the process ~90% idle instead of laboring through the computation. An idle CPU suggests that Node.js' main thread made a system call that suspended its execution (i.e. put it to sleep) for relatively long stretches, until the kernel woke it up again — presumably something triggered by setTimeout(). Reading the libuv source that processes timers, it seems the actual sleep happens in uv__io_poll(), via the timeout argument it passes to the poll() system call: github.com/libuv/libuv… poll()'s man page says:
"The timeout argument specifies the number of milliseconds that poll() should block waiting for a file descriptor to become ready… Specifying a negative value in timeout means an infinite timeout. Specifying a timeout of zero causes poll() to return immediately."
So unless the timeout is exactly 0, the process will sleep. We suspect that although we pass 0 to setTimeout(), the timeout value actually handed to poll() is higher. We can try to verify this by debugging node itself with GDB, tracing all calls to uv__io_poll() with the dprintf command:
dprintf uv__io_poll, "uv__io_poll(timeout=%d)\n", timeout
Let’s also print the current time on each call trace:
dprintf uv__io_poll, "%d: uv__io_poll(timeout=%d)\n", loop->time, timeout
When our service starts and sits idle, uv__io_poll() is called with timeout == -1, meaning "infinite timeout" — which makes sense, as there's no knowing when the next HTTP request will arrive. Now let's hit the GET /compute-with-set-timeout endpoint:
As we can see, the timeout is sometimes 1 and sometimes 0, with usually at least 1 millisecond between two consecutive calls. Debugging GET /compute-with-set-immediate the same way always shows 0. One millisecond is a long time for a CPU: the tight /compute-sync-style loop did 10⁷ iterations in about 43 seconds, i.e. roughly 233 random-string hash updates per millisecond — so sleeping 1 millisecond on each of 10⁷ iterations would add up to 10⁷ ms = 10,000 seconds, almost 3 hours! Let's try to understand why the timeout passed to uv__io_poll() isn't 0. The timeout argument is computed by uv_backend_timeout() and then passed to uv__io_poll(), as can be seen in the main event loop: github.com/libuv/libuv… In some cases (e.g. when there are pending tasks) uv_backend_timeout() returns 0; otherwise it returns the result of uv__next_timeout(), which picks the nearest timer from the timer heap and returns the time remaining until it expires. Timers are added to the timer heap by uv_timer_start(), which is called from many places, so let's set a breakpoint there to find the caller in our scenario. Running the info stack command in GDB at the breakpoint yields:
#0 uv_timer_start (handle=0x2504cb0, cb=0x97c610 <node::(anonymous namespace)::TimerWrap::OnTimeout(uv_timer_s*)>, timeout=0x1, repeat=0x0) at ../deps/uv/src/timer.c:77
#1 0x000000000097c540 in node::(anonymous namespace)::TimerWrap::Start(v8::FunctionCallbackInfo<v8::Value> const&) ()
#2 0x0000000000b5996f in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#3 0x0000000000b5a4d9 in v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) ()
#4 0x00003cc0e2fdc01d in ?? ()
#5 0x00003cc0e3005ada in ?? ()
#6 0x00003cc0e2fdbf81 in ?? ()
#7 0x00007fffffff8140 in ?? ()
#8 0x0000000000000006 in ?? ()
... (more unresolved addresses)
Note the timeout=0x1 in frame #0 — a timer is being started with a timeout of 1, not 0. TimerWrap::Start is just a thin wrapper around uv_timer_start(), and the unresolved addresses in the call stack suggest that the code we're actually looking for is implemented in JavaScript. By searching the Node source for what calls into TimerWrap::Start, or by simply "stepping into" setTimeout() in a JS-level debugger, we can quickly locate the Timeout constructor: github.com/nodejs/node…
function Timeout(callback, after, args, isRepeat, isUnrefed) {
  after *= 1; // coalesce to number or NaN
  if (!(after >= 1 && after <= TIMEOUT_MAX)) {
    if (after > TIMEOUT_MAX) {
      process.emitWarning(`${after} does not fit into a 32-bit signed integer.` +
                          '\nTimeout duration was set to 1.',
                          'TimeoutOverflowWarning');
    }
    after = 1; // schedule on next tick, follows browser behavior
  }
  ...
}
And here, finally, we can see it 👆: a timeout of 0 is coerced to 1 🙀. I hope you found this article helpful — thanks for reading!
Resources
- nodejs.org/en/docs/gui…
- jsblog.insiderattack.net/event-loop-… (5-article series)
- jsblog.insiderattack.net/crossing-th…
- github.com/nodejs/node
- github.com/libuv/libuv (also vendored in nodejs/node)
- github.com/v8/v8 (also vendored in nodejs/node)
More information
- javabeginnerstutorial.com/node-js/eve…
- stackoverflow.com/questions/3…
- stackoverflow.com/questions/4…
- stackoverflow.com/questions/5…
- voidcanvas.com/setimmediat…
- webapplog.com/event-loop/
- dtrace.org/blogs/brend…
- developer.mozilla.org/en-US/docs/…
- jakearchibald.com/2015/tasks-… (browsers)
- www.codementor.io/simenli/dem…
- blog.risingstack.com/node-js-at-…
- gist.github.com/othiym23/61…
- humanwhocodes.com/blog/2013/0…
- runnable.com/blog/get-yo…
- www.dynatrace.com/news/blog/a…
- medium.com/airbnb-engi…