The most notable feature of Node is its use of libuv, an asynchronous, event-driven library that turned JavaScript, once dismissed as a toy language, into a viable backend language (V8's high performance helped too, of course). libuv's code is elegant and well worth studying. However, libuv as a whole is far too large to cover in a single article, so I chose the simplest module in Node, fs, and will trace how a synchronous and an asynchronous file read are carried out in order to get a general picture of how libuv works.
fs.readSync
fs.readSync is a method that everyone is surely familiar with, but many articles advise against it because it blocks Node's single thread, which is very unfriendly to a busy Node instance. We won't debate that today and will only look at the implementation. In lib/fs.js of the Node project you can see its code:
fs.readSync = function(fd, buffer, offset, length, position) {
  if (length === 0) {
    return 0;
  }
  return binding.read(fd, buffer, offset, length, position);
};
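Before following the call into C++, it may help to see the method from the caller's side. The snippet below is a minimal usage sketch, not part of Node's source: the scratch file name is made up for the illustration, and the arguments are buffer, offset within the buffer, length, and position within the file.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, used only for this illustration.
const file = path.join(os.tmpdir(), `readsync-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello libuv');

const fd = fs.openSync(file, 'r');
const buffer = Buffer.alloc(16);

// Read up to 5 bytes starting at file position 6 into buffer[0..4].
// The call blocks until the read finishes and returns the byte count.
const bytesRead = fs.readSync(fd, buffer, 0, 5, 6);

fs.closeSync(fd);
fs.unlinkSync(file);

const text = buffer.toString('utf8', 0, bytesRead);
console.log(bytesRead, text);
```

Nothing else can run on the main thread while fs.readSync is executing, which is exactly the blocking behavior discussed above.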
Here binding is process.binding('fs'), one of Node's built-in modules, so we go straight to src/node_file.cc. The node::InitFs method shows that read is bound to static void Read(const FunctionCallbackInfo<Value>& args):
static void Read(const FunctionCallbackInfo<Value>& args) {
  // Get the arguments that were passed in and process them
  ...
  char* buf = nullptr;
  Local<Object> buffer_obj = args[1]->ToObject(env->isolate());
  char* buffer_data = Buffer::Data(buffer_obj);
  size_t buffer_length = Buffer::Length(buffer_obj);
  ...
  buf = buffer_data + off;
  uv_buf_t uvbuf = uv_buf_init(const_cast<char*>(buf), len);
  req = args[5];
  if (req->IsObject()) {
    ASYNC_CALL(read, req, UTF8, fd, &uvbuf, 1, pos);
  } else {
    SYNC_CALL(read, 0, fd, &uvbuf, 1, pos)
    args.GetReturnValue().Set(SYNC_RESULT);
  }
}
From the code above we can see that the sixth argument is the critical one: if an object is passed in, the asynchronous path is taken. fs.readSync does not pass a sixth argument, so it takes the synchronous path and returns the result as soon as the read completes.
What does the SYNC_CALL macro do? It basically calls another macro:
#define SYNC_CALL(func, path, ...) \
SYNC_DEST_CALL(func, path, nullptr, __VA_ARGS__) \
Here __VA_ARGS__ stands for the remaining arguments passed to the macro besides func and path. Now let's look at the SYNC_DEST_CALL macro:
#define SYNC_DEST_CALL(func, path, dest, ...) \
  fs_req_wrap req_wrap; \
  env->PrintSyncTrace(); \
  int err = uv_fs_ ## func(env->event_loop(), \
                           &req_wrap.req, \
                           __VA_ARGS__, \
                           nullptr); \
  if (err < 0) { \
    return env->ThrowUVException(err, #func, nullptr, path, dest); \
  } \
env->PrintSyncTrace() takes effect when node is started with --trace-sync-io: it prints where synchronous I/O is used in your code. This helps when tuning code that blocks frequently (although blocking is not necessarily caused by synchronous I/O). uv_fs_ ## func expands to uv_fs_read, the libuv function that reads the file. It is defined in deps/uv/src/unix/fs.c:
int uv_fs_read(uv_loop_t* loop, uv_fs_t* req,
               uv_file file,
               const uv_buf_t bufs[],
               unsigned int nbufs,
               int64_t off,
               uv_fs_cb cb) {
  INIT(READ);
  if (bufs == NULL || nbufs == 0)
    return -EINVAL;
  req->file = file;
  req->nbufs = nbufs;
  req->bufs = req->bufsml;
  if (nbufs > ARRAY_SIZE(req->bufsml))
    req->bufs = uv__malloc(nbufs * sizeof(*bufs));
  if (req->bufs == NULL) {
    if (cb != NULL)
      uv__req_unregister(loop, req);
    return -ENOMEM;
  }
  memcpy(req->bufs, bufs, nbufs * sizeof(*bufs));
  req->off = off;
  POST;
}
First let’s look at the macro call INIT(READ):
#define INIT(subtype) \
  do { \
    if (req == NULL) \
      return -EINVAL; \
    req->type = UV_FS; \
    if (cb != NULL) \
      uv__req_init(loop, req, UV_FS); \
    req->fs_type = UV_FS_ ## subtype; \
    req->result = 0; \
    req->ptr = NULL; \
    req->loop = loop; \
    req->path = NULL; \
    req->new_path = NULL; \
    req->cb = cb; \
  } \
  while (0)
This is plainly an initialization step. Two things matter here: first, req->loop is pointed at Node's event_loop; second, fs_type is set to UV_FS_READ, an important flag for the work that follows. Back in uv_fs_read, the statements before the POST macro simply store the file, the buffers and the offset in the req and need no further comment, so let's focus on POST:
#define POST \
  do { \
    if (cb != NULL) { \
      uv__work_submit(loop, &req->work_req, uv__fs_work, uv__fs_done); \
      return 0; \
    } \
    else { \
      uv__fs_work(&req->work_req); \
      return req->result; \
    } \
  } \
  while (0)
From the POST macro we can see that uv__work_submit is called when cb is not NULL. That is the asynchronous path, which we will come to later. Without a callback, uv__fs_work is called directly, so let's look at uv__fs_work:
static void uv__fs_work(struct uv__work* w) {
  int retry_on_eintr;
  uv_fs_t* req;
  ssize_t r;
  req = container_of(w, uv_fs_t, work_req);
  retry_on_eintr = !(req->fs_type == UV_FS_CLOSE);
  do {
    errno = 0;
#define X(type, action) \
  case UV_FS_ ## type: \
    r = action; \
    break;
    switch (req->fs_type) {
    ...
    X(WRITE, uv__fs_buf_iter(req, uv__fs_write));
    X(OPEN, uv__fs_open(req));
    X(READ, uv__fs_buf_iter(req, uv__fs_read));
    ...
    }
#undef X
  } while (r == -1 && errno == EINTR && retry_on_eintr);
  if (r == -1)
    req->result = -errno;
  else
    req->result = r;
  if (r == 0 && (req->fs_type == UV_FS_STAT ||
                 req->fs_type == UV_FS_FSTAT ||
                 req->fs_type == UV_FS_LSTAT)) {
    req->ptr = &req->statbuf;
  }
}
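As a rough mental model, the interplay of the POST macro and uv__fs_work can be imitated in a few lines of JavaScript. This is only a sketch under loud assumptions: post, runWork and the handlers table are hypothetical names, setImmediate merely stands in for the thread pool (in libuv the asynchronous work runs on a worker thread, not on the main thread), and a thrown error carrying a numeric errno stands in for C's r == -1 plus errno.

```javascript
// Hypothetical per-operation handlers, standing in for the X(...) cases.
const handlers = {
  READ: (req) => {
    const available = Math.max(0, req.source.length - req.offset);
    const n = Math.min(req.length, available);
    req.source.copy(req.target, 0, req.offset, req.offset + n);
    return n;
  },
};

// Counterpart of uv__fs_work: pick the handler by request type and store
// either the result or a negative error number on the request.
function runWork(req) {
  try {
    const handler = handlers[req.type];
    // 22 is EINVAL on Linux; it is used here only as a placeholder.
    if (!handler) throw Object.assign(new Error('unsupported'), { errno: 22 });
    req.result = handler(req);
  } catch (err) {
    req.result = -(err.errno || 1);
  }
}

// Counterpart of POST: with a callback, defer the work and return 0;
// without one, do the work right now and return its result.
function post(req, cb) {
  if (cb) {
    setImmediate(() => {
      runWork(req);
      cb(req);
    });
    return 0;
  }
  runWork(req);
  return req.result;
}

const source = Buffer.from('hello libuv');

const syncReq = { type: 'READ', source, target: Buffer.alloc(5), offset: 6, length: 5 };
const syncReturn = post(syncReq);
console.log(syncReturn, syncReq.target.toString());

const asyncReq = { type: 'READ', source, target: Buffer.alloc(5), offset: 0, length: 5 };
const asyncReturn = post(asyncReq, (req) => console.log(req.result, req.target.toString()));
console.log(asyncReturn);
```

The point of the model is the return value: the synchronous path hands back the number of bytes read (or a negative error), while the asynchronous path only reports that the request was queued and delivers the result through the callback.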
uv__fs_work is shared by all the fs operations, so it runs a different function depending on the type of the req. As we just saw, fs_type was set to UV_FS_READ when the req was initialized, so the branch taken is uv__fs_buf_iter(req, uv__fs_read). uv__fs_buf_iter calls the function passed as its second argument, uv__fs_read, which is an ordinary read (or readv or pread) call, apart from this piece of code:
#if defined(_AIX)
  struct stat buf;
  if(fstat(req->file, &buf))
    return -1;
  if(S_ISDIR(buf.st_mode)) {
    errno = EISDIR;
    return -1;
  }
#endif
This code nicely explains the following note about fs.readFileSync in the Node documentation:
Note: Similar to fs.readFile(), when the path is a directory, the behavior of fs.readFileSync() is platform-specific.
// macOS, Linux, and Windows
fs.readFileSync('<directory>');
// => [Error: EISDIR: illegal operation on a directory, read <directory>]
// FreeBSD
fs.readFileSync('<directory>'); // => null, <data>
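To check which of these behaviors your own platform shows, a small probe such as the following can be used. It is only an illustrative sketch: tryReadDirectory is a hypothetical helper, and the temporary directory is used simply because it is a directory that certainly exists. No expected output is given because the result is platform-specific.

```javascript
const fs = require('fs');
const os = require('os');

// Hypothetical helper: try to read a directory as if it were a file and
// report what happened instead of throwing.
function tryReadDirectory(dir) {
  try {
    return { ok: true, data: fs.readFileSync(dir) };
  } catch (err) {
    return { ok: false, code: err.code };
  }
}

const outcome = tryReadDirectory(os.tmpdir());
console.log(outcome.ok ? 'the read returned data' : `the read failed with ${outcome.code}`);
```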
After uv__fs_read successfully reads the file, req->bufs holds the data. From static void Read(const FunctionCallbackInfo<Value>& args) in node_file.cc we know that req->bufs points at the memory of the buffer passed to binding.read(fd, buffer, offset, length, position), so at this point the buffer contains the content you wanted to read. The fs.readFileSync that we use more often first opens the file to get an fd, allocates a buffer, then calls fs.readSync to fill that buffer with the file content and returns it. It hides a lot of steps, which is why people prefer it. That is the end of the synchronous read. It is quite simple, but because the read blocks, it can become a performance bottleneck for a single-threaded Node process. Next, let's look at Node's asynchronous fs.read.
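The open, allocate, read, return sequence just described can be sketched with the public synchronous API. This is a simplified illustration rather than Node's actual implementation: readFileSyncSketch is a hypothetical name, and the sketch ignores encodings, flags, and files whose size is not known up front.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Simplified, hypothetical stand-in for fs.readFileSync.
function readFileSyncSketch(file) {
  const fd = fs.openSync(file, 'r');
  try {
    const size = fs.fstatSync(fd).size;
    const buffer = Buffer.alloc(size);
    let pos = 0;
    // readSync may return fewer bytes than requested, so loop until done.
    while (pos < size) {
      const bytesRead = fs.readSync(fd, buffer, pos, size - pos, pos);
      if (bytesRead === 0) break; // the file shrank while we were reading
      pos += bytesRead;
    }
    return buffer.subarray(0, pos);
  } finally {
    // Always release the descriptor, even if a read throws.
    fs.closeSync(fd);
  }
}

// Hypothetical scratch file, used only for this illustration.
const file = path.join(os.tmpdir(), `readfilesync-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello libuv');
const content = readFileSyncSketch(file).toString();
fs.unlinkSync(file);
console.log(content);
```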
fs.read
Asynchronous operations are much more complex than synchronous ones, so let's walk through them step by step, starting with the JS code of fs.read:
fs.read = function(fd, buffer, offset, length, position, callback) {
  if (length === 0) {
    return process.nextTick(function() {
      callback && callback(null, 0, buffer);
    });
  }
  function wrapper(err, bytesRead) {
    // Retain a reference to buffer so that it can't be GC'ed too soon.
    callback && callback(err, bytesRead || 0, buffer);
  }
  var req = new FSReqWrap();
  req.oncomplete = wrapper;
  binding.read(fd, buffer, offset, length, position, req);
};
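Seen from the caller's side, the important property is that fs.read returns before the data is available. The following is a minimal usage sketch, not taken from Node's source; the scratch file and the events array are made up for the illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, used only for this illustration.
const file = path.join(os.tmpdir(), `read-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello libuv');

const events = [];
const fd = fs.openSync(file, 'r');
const buffer = Buffer.alloc(5);

// Ask for 5 bytes starting at file position 6. The callback receives
// the error, the byte count, and the same buffer that was passed in.
fs.read(fd, buffer, 0, 5, 6, (err, bytesRead, buf) => {
  if (err) throw err;
  events.push('callback');
  console.log(bytesRead, buf.toString(), events);
  fs.closeSync(fd);
  fs.unlinkSync(file);
});

// fs.read has already returned; the callback has not run yet.
events.push('after fs.read');
```

The order recorded in events shows that the code following the call runs first and the callback runs later, once the read has completed elsewhere.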
Note the line var req = new FSReqWrap();. FSReqWrap here is binding.FSReqWrap, and in node::InitFs we can find the code that sets it up:
Local<FunctionTemplate> fst = FunctionTemplate::New(env->isolate(), NewFSReqWrap);
fst->InstanceTemplate()->SetInternalFieldCount(1);
AsyncWrap::AddWrapMethods(env, fst);
Local<String> wrapString =
FIXED_ONE_BYTE_STRING(env->isolate(), "FSReqWrap");
fst->SetClassName(wrapString);
target->Set(wrapString, fst->GetFunction());
The code above uses the V8 API to create the FSReqWrap constructor, whose body is void NewFSReqWrap(const FunctionCallbackInfo<Value>& args). The main thing that function does is a single call, object->SetAlignedPointerInInternalField(0, nullptr);, which concerns embedding C++ objects in JS objects and is not covered here. From static void Read(const FunctionCallbackInfo<Value>& args) we know that the ASYNC_CALL macro is invoked when a req object is passed in. ASYNC_CALL in turn expands to ASYNC_DEST_CALL(func, req, nullptr, encoding, __VA_ARGS__), which holds the actual logic, so let's go straight to the ASYNC_DEST_CALL code:
#define ASYNC_DEST_CALL(func, request, dest, encoding, ...) \
  Environment* env = Environment::GetCurrent(args); \
  CHECK(request->IsObject()); \
  FSReqWrap* req_wrap = FSReqWrap::New(env, request.As<Object>(), \
                                       #func, dest, encoding); \
  int err = uv_fs_ ## func(env->event_loop(), \
                           req_wrap->req(), \
                           __VA_ARGS__, \
                           After); \
  req_wrap->Dispatched(); \
  if (err < 0) { \
    uv_fs_t* uv_req = req_wrap->req(); \
    uv_req->result = err; \
    uv_req->path = nullptr; \
    After(uv_req); \
    req_wrap = nullptr; \
  } else { \
    args.GetReturnValue().Set(req_wrap->persistent()); \
  }
First, let's see how FSReqWrap::New creates req_wrap:
const bool copy = (data != nullptr && ownership == COPY);
const size_t size = copy ? 1 + strlen(data) : 0;
FSReqWrap* that;
char* const storage = new char[sizeof(*that) + size];
that = new(storage) FSReqWrap(env, req, syscall, data, encoding);
if (copy)
  that->data_ = static_cast<char*>(memcpy(that->inline_data(), data, size));
return that;
The key line is that = new(storage) FSReqWrap(env, req, syscall, data, encoding);. Before going further, let's take a look at the inheritance relationships of FSReqWrap:
The key properties and methods of the key objects are shown in the figure above, from which we can see what FSReqWrap gains from the objects it inherits from:
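1. It inherits ReqWrap's key member, uv_fs_t req_, and its key method, ReqWrap<T>::Dispatched, which sets req_.data = this so that the object travels with the request through libuv's functions.
2. It inherits MakeCallback from AsyncWrap, which runs the asynchronous read callback we passed in, in this case the wrapper function assigned in JS via req.oncomplete = wrapper.
3. It holds persistent_handle_, a Persistent<Object>. The V8 documentation describes the difference between Local and Persistent handles as follows: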
Local handles are held on a stack and are deleted when the appropriate destructor is called. These handles' lifetime is determined by a handle scope, which is often created at the beginning of a function call. When the handle scope is deleted, the garbage collector is free to deallocate those objects previously referenced by handles in the handle scope, provided they are no longer accessible from JavaScript or other handles.
Persistent handles provide a reference to a heap-allocated JavaScript Object, just like a local handle. There are two flavors, which differ in the lifetime management of the reference they handle. Use a persistent handle when you need to keep a reference to an object for more than one function call, or when handle lifetimes do not correspond to C++ scopes.
A Local handle lives on the stack and the object it refers to becomes eligible for garbage collection once the handle scope is destroyed, whereas a Persistent handle is not cleaned up this way. The relationship is a bit like stack allocation versus heap allocation: use a Persistent handle when a V8 object must stay alive across more than one function call. The Environment (env) that appears throughout this code is Node's execution environment and contains almost all the methods and properties Node needs to run. It is hard to explain in a sentence or two and is not directly related to the topic of this article, so we leave it aside.
Finally, the FSReqWrap constructor calls Wrap(object(), this) to associate the Persistent<Object> persistent_handle_ mentioned above (the persistent JS object) with the C++ FSReqWrap object. This is the most common technique in Node, and in embedding V8 generally. Returning to the ASYNC_DEST_CALL macro, we now know that FSReqWrap::New ties the FSReqWrap instance to the req object created with new in JS, and also ties libuv's uv_fs_t to that instance. The macro then calls uv_fs_read just as the synchronous path does, except that void After(uv_fs_t* req) is passed as the last parameter, cb. From the synchronous walkthrough we can see what difference passing a callback makes. First, in INIT, uv__req_init runs QUEUE_INSERT_TAIL(&(loop)->active_reqs, &(req)->active_queue);, which puts the req on the loop's active_reqs list. Second, in POST, uv__work_submit(loop, &req->work_req, uv__fs_work, uv__fs_done) is called. That function is in deps/uv/src/threadpool.c:
uv_once(&once, init_once);
w->loop = loop;
w->work = work;
w->done = done;
post(&w->wq);
On the first call, uv_once runs init_once, which starts several worker threads. Each of them executes static void worker(void* arg), whose core loop is:
for (;;) {
  uv_mutex_lock(&mutex);
  while (QUEUE_EMPTY(&wq)) {
    idle_threads += 1;
    uv_cond_wait(&cond, &mutex);
    idle_threads -= 1;
  }
  q = QUEUE_HEAD(&wq);
  if (q == &exit_message)
    uv_cond_signal(&cond);
  else {
    QUEUE_REMOVE(q);
    QUEUE_INIT(q);
  }
  uv_mutex_unlock(&mutex);
  if (q == &exit_message)
    break;
  w = QUEUE_DATA(q, struct uv__work, wq);
  w->work(w);
  uv_mutex_lock(&w->loop->wq_mutex);
  w->work = NULL;
  QUEUE_INSERT_TAIL(&w->loop->wq, &w->wq);
  uv_async_send(&w->loop->wq_async);
  uv_mutex_unlock(&w->loop->wq_mutex);
}
When there is no task, the worker thread blocks in uv_cond_wait. When a task arrives, the thread takes it off the queue and runs it via w->work(w). Afterwards it puts the item on loop->wq and calls uv_async_send(&w->loop->wq_async) to tell the main thread to take the task from loop->wq and run its callback.
Back in uv__work_submit, we can now see what its arguments are for. The first function registered is the work function; we pass in uv__fs_work, which was introduced above and needs no further explanation, except that in asynchronous mode it runs on a worker thread and therefore does not block the main thread. The second is the callback that runs back on the main thread, which is uv__fs_done:
req = container_of(w, uv_fs_t, work_req);
uv__req_unregister(req->loop, req);
if (status == -ECANCELED) {
  assert(req->result == 0);
  req->result = -ECANCELED;
}
req->cb(req);
As you can see, this function removes the task's req from the loop's active_reqs and then runs the callback that was passed to uv_fs_read. Finally, the last line of uv__work_submit, post(&w->wq), appends the current task to the wq queue and calls uv_cond_signal on the condition variable, so that a worker blocked in uv_cond_wait wakes up and goes through the steps described above.
So an asynchronous read uses a worker thread to do the actual reading, and the main thread runs the callback after the worker thread has finished. Now let's look at how the main thread is notified once the worker is done, that is, at the call uv_async_send(&w->loop->wq_async). First we need to go back to the loop initialization, the function uv_loop_init. Inside it there is a call uv_async_init(loop, &loop->wq_async, uv__work_done);. This call creates a pipe and registers it with the following statements:
uv__io_init(&loop->async_io_watcher, uv__async_io, pipefd[0]);
uv__io_start(loop, &loop->async_io_watcher, POLLIN);
loop->async_wfd = pipefd[1];
When data is written to pipefd[1], the main thread reads it from pipefd[0] and then invokes uv__async_io. The most important job of uv__async_io is to run the registered async_cb, and the async_cb registered at loop initialization is uv__work_done:
// Fetch data...
while (!QUEUE_EMPTY(&wq)) {
  q = QUEUE_HEAD(&wq);
  QUEUE_REMOVE(q);
  w = container_of(q, struct uv__work, wq);
  err = (w->work == uv__cancelled) ? UV_ECANCELED : 0;
  w->done(w, err);
}
Here we can see that every task placed in the loop->wq queue is taken out and its callback is run via w->done(w, err). The call to uv_async_send(&w->loop->wq_async) in the worker thread triggers this whole process by writing a byte to loop->async_wfd, the pipefd[1] mentioned above. w->done is uv__fs_done, which calls req->cb, and that is the void After(uv_fs_t* req) function we passed in. I won't post its code; the only thing worth pointing out is its first statement:
FSReqWrap* req_wrap = static_cast<FSReqWrap*>(req->data);
req->data is what ties everything back to the FSReqWrap instance. From it we can get the JS object that was created at the start and invoke its oncomplete function, which is the wrapper defined in fs.read, and that in turn finally runs the callback we passed in:
callback && callback(err, bytesRead || 0, buffer);
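The whole asynchronous path traced above can be condensed into a small model. It is deliberately single-threaded and only mirrors the data flow: every name is hypothetical, the mutexes and the condition variable are left out, a boolean flag stands in for the byte written to the pipe, and the worker step and the main-thread step are invoked by hand instead of running on separate threads.

```javascript
// Single-threaded model; every name below is hypothetical.
const wq = [];                                   // queue polled by the workers
const loop = { wq: [], asyncPending: false, activeReqs: new Set() };
const log = [];

// uv__work_submit: record the work/done pair and post it to the queue.
function workSubmit(w, work, done) {
  w.work = work;
  w.done = done;
  wq.push(w);
}

// One iteration of worker(): run the work, move the item to the loop's
// own queue, then flag the loop (the role of uv_async_send).
function workerStep() {
  const w = wq.shift();
  if (w === undefined) return false;
  w.work(w);
  loop.wq.push(w);
  loop.asyncPending = true;
  return true;
}

// uv__work_done: on the main thread, drain loop.wq and call done.
function workDone() {
  if (!loop.asyncPending) return;
  loop.asyncPending = false;
  while (loop.wq.length > 0) {
    const w = loop.wq.shift();
    w.done(w);
  }
}

// uv_fs_read with a callback: register the request, then submit it.
function fsRead(req, cb) {
  req.cb = cb;
  loop.activeReqs.add(req);                      // INIT -> uv__req_init
  const work = (r) => { r.result = r.input.length; log.push('work'); };  // uv__fs_work
  const done = (r) => { loop.activeReqs.delete(r); r.cb(r); };           // uv__fs_done
  workSubmit(req, work, done);
  return 0;
}

const req = { input: 'hello libuv' };
fsRead(req, (r) => log.push(`callback:${r.result}`));
log.push('fsRead returned');
workerStep();   // in libuv this runs on a pool thread
workDone();     // in libuv this runs on the main thread after the wake-up
console.log(log);
```

The log shows the ordering that matters: the submitting call returns first, the work runs next, and the user's callback runs last, from the draining step on the main thread.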
The fs.readFile that we use more often is more complicated than fs.read. If the file is too large to be read all at once (up to 8*1024 bytes at a time), the read callback keeps issuing further reads until the whole file has been read, then the file is closed asynchronously, and the callback we passed in is executed from the callback of that asynchronous close. Clearly, Node's developers have put in a lot of effort to make everyday development convenient for us.
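The read, read again, close, then call back chain can be sketched with the public callback API. This is a simplified illustration and not Node's actual implementation: readFileSketch is a hypothetical name, the chunk size is simply the figure quoted above, and the sketch skips the stat call, encodings and abort handling.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const CHUNK = 8 * 1024; // chunk size taken from the text above

// Hypothetical, simplified readFile: open, read chunk by chunk, close,
// and only then invoke the caller's callback.
function readFileSketch(file, callback) {
  fs.open(file, 'r', (openErr, fd) => {
    if (openErr) return callback(openErr);
    const chunks = [];
    let position = 0;

    // Close the descriptor asynchronously; the user callback is fired
    // from inside the close callback.
    const finish = (err) => {
      fs.close(fd, (closeErr) => {
        if (err || closeErr) return callback(err || closeErr);
        callback(null, Buffer.concat(chunks));
      });
    };

    // Each read callback either schedules the next read or finishes.
    const readNext = () => {
      const buffer = Buffer.alloc(CHUNK);
      fs.read(fd, buffer, 0, CHUNK, position, (err, bytesRead) => {
        if (err) return finish(err);
        if (bytesRead === 0) return finish(null); // end of file
        chunks.push(buffer.subarray(0, bytesRead));
        position += bytesRead;
        readNext();
      });
    };
    readNext();
  });
}

// Hypothetical scratch file, larger than one chunk.
const file = path.join(os.tmpdir(), `readfile-demo-${process.pid}.txt`);
const expected = Buffer.alloc(20000, 'a');
fs.writeFileSync(file, expected);
readFileSketch(file, (err, data) => {
  fs.unlinkSync(file);
  if (err) throw err;
  console.log(data.length, data.equals(expected));
});
```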
Conclusion
Looking at Node's synchronous and asynchronous implementations of file reading, you can appreciate how subtle libuv is. This matters especially for a single-threaded runtime such as Node, where a worker thread is used to process the task and a pipe is used to tell the main thread to run the callback. That, however, opens up another big topic, which will have to wait for a future article. This article stops here, and I hope it helps you understand a little more of the logic behind Node.