This is the 25th day of my participation in the August Genwen Challenge.

Module introduction

We’ll start with the basic principle of module loading and try reading the Node.js source code. First of all, the Node.js source code is mainly written in C++ and JavaScript. The JS part is mainly in the lib directory, while the C++ part is mainly in the src directory.

Module loading is mainly divided into four types of modules:

  1. C++ core modules: mainly in the src directory, such as node_file.cc
  2. Node.js built-in modules: distinct from C++ core modules, these live in the lib directory as JS source files, usually with the same name. A Node.js built-in module is in effect a higher-level wrapper around C++ core modules, for example fs and http
  3. User source modules
  4. C++ extension modules (addons): dynamically linked shared objects written in C++. The require() function loads an addon like a normal Node.js module. Addons provide an interface between JavaScript and C/C++ libraries.

This article analyzes the source code that loads C++ core modules.

C++ core module loading process

At the end of each C++ core module’s source file there is a macro call that registers the module in the C++ core module list, so that internalBinding can retrieve it later. Following this statement involves two steps: registering the module and getting it.

The overall process

  1. Entry file execution

As we know, Node.js execution begins in the Start function in node.cc:

int Start(int argc, char** argv) {}
  2. Branch 1: C++ modules are registered in the linked list

Let’s look at what the Start function does. It first calls InitializeOncePerProcess to initialize Node.js and V8. Following this function leads to the code that registers C++ modules in the linked list:

InitializationResult result = InitializeOncePerProcess(argc, argv);
  3. Branch 2: get C++ modules

Further down in the Start function, a NodeMainInstance is created and its Run method is called. Following this call leads to the process of getting C++ modules:

NodeMainInstance main_instance(...);
result.exit_code = main_instance.Run();

Branch 1: C++ modules are registered in the linked list

1. Leading call chain

Starting from InitializeOncePerProcess above, you can see that it calls InitializeNodeWithArgs, the function that initializes Node.js:

result.exit_code = InitializeNodeWithArgs(&(result.args), &(result.exec_args), &errors);

Further down, it calls RegisterBuiltinModules in node_binding.cc to register the C++ modules:

binding::RegisterBuiltinModules();

2. C++ module registration function call

RegisterBuiltinModules does two things: first it defines the macro V, which takes modname and expands to a call to _register_<modname>(); then it invokes NODE_BUILTIN_MODULES(V):

void RegisterBuiltinModules() {
#define V(modname) _register_##modname();
NODE_BUILTIN_MODULES(V)  // Module name can be found in this macro
#undef V
}

Then we can see that NODE_BUILTIN_MODULES is also a macro definition:

#define NODE_BUILTIN_MODULES(V)                                          \
  NODE_BUILTIN_STANDARD_MODULES(V)                                       \
  ...

NODE_BUILTIN_STANDARD_MODULES is defined as follows:

#define NODE_BUILTIN_STANDARD_MODULES(V)                                 \
  V(async_wrap)                                                          \
  V(buffer)                                                              \
  ...

In other words, after C++ preprocessing the function body becomes:

void RegisterBuiltinModules() {
  _register_async_wrap();
  _register_buffer();
  ...
}

That is, _register_async_wrap, _register_buffer and the rest of these functions end up being called. So where are these functions defined?
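
The trick RegisterBuiltinModules relies on is often called an X-macro: one list macro is expanded several times with different definitions of V. Here is a minimal sketch of it outside Node.js. The demo namespace, DEMO_BUILTIN_MODULES and RegisteredNames are names made up for this example, and the registration functions only record their own name instead of registering a real module.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical module list; the real NODE_BUILTIN_MODULES is much longer.
#define DEMO_BUILTIN_MODULES(V)                                          \
  V(async_wrap)                                                          \
  V(buffer)

namespace demo {

// Records which registration functions ran, in call order.
inline std::vector<std::string>& RegisteredNames() {
  static std::vector<std::string> names;
  return names;
}

// First expansion: define one _register_<modname>() function per list entry.
#define V(modname)                                                       \
  void _register_##modname() { RegisteredNames().push_back(#modname); }
DEMO_BUILTIN_MODULES(V)
#undef V

// Second expansion: call every registration function, as in the article.
void RegisterBuiltinModules() {
#define V(modname) _register_##modname();
  DEMO_BUILTIN_MODULES(V)
#undef V
}

}  // namespace demo
```

Calling demo::RegisterBuiltinModules() runs _register_async_wrap() and then _register_buffer(), so RegisteredNames() ends up holding the two names in list order. Adding a module only means adding one V(...) line to the list.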

3. C++ module registration function definition

Next to the macro definition above there is a comment: "The definitions are in each module’s implementation when calling the NODE_MODULE_CONTEXT_AWARE_INTERNAL." In other words, each module defines its own registration function when it invokes NODE_MODULE_CONTEXT_AWARE_INTERNAL.

So let’s look at node_file.cc, the C++ file of the fs module. Its last line invokes NODE_MODULE_CONTEXT_AWARE_INTERNAL:

NODE_MODULE_CONTEXT_AWARE_INTERNAL(fs, node::fs::Initialize)

Continuing to node_binding.h, we see that it expands to NODE_MODULE_CONTEXT_AWARE_CPP:

#define NODE_MODULE_CONTEXT_AWARE_INTERNAL(modname, regfunc)             \
  NODE_MODULE_CONTEXT_AWARE_CPP(modname, regfunc, nullptr, NM_F_INTERNAL)

Continuing in node_binding.h, this macro defines _register_##modname, which calls node_module_register when executed:

#define NODE_MODULE_CONTEXT_AWARE_CPP(modname, regfunc, priv, flags)    \ 
  void _register_##modname() { node_module_register(&_module); }

As the final step, node_module_register in node_binding.cc inserts the module it is given at the head of modlist_internal, where it will be looked up later:

extern "C" void node_module_register(void* m) {
  struct node_module* mp = reinterpret_cast<struct node_module* >(m);
  if (mp->nm_flags & NM_F_INTERNAL) {
    mp->nm_link = modlist_internal;
    modlist_internal = mp;
  }
}

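To sum up: each C++ built-in module ends its source file with a macro call. The macro defines a registration function for that module, and when RegisterBuiltinModules runs at startup, the module is inserted into modlist_internal, the linked list of C++ core modules. Remember the name of this list.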
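
To make the registration half concrete, here is a small sketch of the same idea, assuming a trimmed-down node_module struct with only three fields. DEMO_MODULE_INTERNAL is a hypothetical stand-in for NODE_MODULE_CONTEXT_AWARE_INTERNAL, and the flag value is picked for the example. The real macro also takes the module’s regfunc, which is left out here.

```cpp
#include <cassert>
#include <cstring>

namespace demo {

// Flag value chosen for this sketch (assumption).
constexpr unsigned int NM_F_INTERNAL = 0x04;

// Trimmed-down stand-in for node_module: only the fields the article uses.
struct node_module {
  unsigned int nm_flags;
  const char* nm_modname;
  node_module* nm_link;
};

// Head of the internal module list.
static node_module* modlist_internal = nullptr;

// Push an internal module onto the front of the list.
void node_module_register(void* m) {
  node_module* mp = reinterpret_cast<node_module*>(m);
  if (mp->nm_flags & NM_F_INTERNAL) {
    mp->nm_link = modlist_internal;
    modlist_internal = mp;
  }
}

// Hypothetical, simplified macro: defines the module record and the
// _register_<modname>() function that links it into the list.
#define DEMO_MODULE_INTERNAL(modname)                                    \
  static node_module modname##_module = {NM_F_INTERNAL, #modname,        \
                                         nullptr};                       \
  void _register_##modname() { node_module_register(&modname##_module); }

DEMO_MODULE_INTERNAL(fs)
DEMO_MODULE_INTERNAL(buffer)

}  // namespace demo
```

After demo::_register_fs() and then demo::_register_buffer() are called, the list reads buffer, then fs, because each new module becomes the new head. Nothing is allocated at registration time; the records are static objects that only get linked together.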

Branch two: get C++ modules

1. Leading call chain

This section starts from main_instance.Run(). Its definition in node_main_instance.cc calls RunBootstrapping:

if (env->RunBootstrapping().IsEmpty()) {
  *exit_code = 1;
}

RunBootstrapping calls BootstrapInternalLoaders in node.cc:

if (BootstrapInternalLoaders().IsEmpty()) {
  return MaybeLocal<Value>();
}

That in turn executes the file internal/bootstrap/loaders:

if (!ExecuteBootstrapper(this, "internal/bootstrap/loaders", &loaders_params, &loaders_args).ToLocal(&loader_exports))

2. Get C++ modules from linked lists

A comment at the top of this file explains how it is run: "This file is compiled as if it’s wrapped in a function with arguments passed by node::RunBootstrapping()", followed by the declaration "global process, getLinkedBinding, getInternalBinding, primordials". So this JS file is wrapped in a function that takes four parameters. Where do these four arguments come from? They are built on the C++ side:

  // Create binding loaders
  std::vector<Local<String>> loaders_params = {
      process_string(),
      FIXED_ONE_BYTE_STRING(isolate_, "getLinkedBinding"),
      FIXED_ONE_BYTE_STRING(isolate_, "getInternalBinding"),
      primordials_string()};
  std::vector<Local<Value>> loaders_args = {
      process_object(),
      NewFunctionTemplate(binding::GetLinkedBinding),
      NewFunctionTemplate(binding::GetInternalBinding),
      primordials()};
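
To picture what "compiled as if it’s wrapped in a function" means, here is a purely textual sketch. WrapInFunction is a hypothetical helper written for this article, and it is not how node.cc compiles the file; it only shows the shape of the wrapper that the four parameter names imply.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace demo {

// Builds "(function (a, b) {\n<body>\n})" from parameter names and a body.
// Illustration only: the names play the role of loaders_params.
std::string WrapInFunction(const std::vector<std::string>& params,
                           const std::string& body) {
  std::string source = "(function (";
  for (std::size_t i = 0; i < params.size(); ++i) {
    if (i > 0) source += ", ";
    source += params[i];
  }
  source += ") {\n" + body + "\n})";
  return source;
}

}  // namespace demo
```

With the four names from loaders_params, the result starts with (function (process, getLinkedBinding, getInternalBinding, primordials) {, and the values in loaders_args are what gets passed in, position by position, when the function is called.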

So there are two functions, getLinkedBinding and getInternalBinding, that handle the interaction between JS and C++. getInternalBinding is the one that fetches C++ core modules, and the other appears to be related to C++ extensions. Let’s look at GetInternalBinding: in node_binding.cc, it calls get_internal_module internally:

node_module* mod = get_internal_module(*module_v);

Also in node_binding.cc, get_internal_module calls FindModule, passing in modlist_internal, the module name, and a flag constant that marks internal modules:

node_module* get_internal_module(const char* name) {
  return FindModule(modlist_internal, name, NM_F_INTERNAL);
}

Finally, let’s look at FindModule

inline struct node_module* FindModule(struct node_module* list,
                                      const char* name,
                                      int flag) {
  struct node_module* mp;
  for (mp = list; mp != nullptr; mp = mp->nm_link) {
    if (strcmp(mp->nm_modname, name) == 0) break;
  }
  CHECK(mp == nullptr || (mp->nm_flags & flag) != 0);
  return mp;
}

And there it is: the module is found in the modlist_internal linked list and returned. Yay!
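
To try the lookup on its own, here is a sketch with a hand-built two-entry list. The node_module struct is trimmed to three fields, the flag value is an assumption made for the example, and assert stands in for Node’s CHECK macro.

```cpp
#include <cassert>
#include <cstring>

namespace demo {

// Flag value assumed for this sketch.
constexpr int NM_F_INTERNAL = 0x04;

// Trimmed-down stand-in for node_module.
struct node_module {
  int nm_flags;
  const char* nm_modname;
  node_module* nm_link;
};

// Hand-built list: buffer -> fs -> nullptr.
static node_module fs_module = {NM_F_INTERNAL, "fs", nullptr};
static node_module buffer_module = {NM_F_INTERNAL, "buffer", &fs_module};
static node_module* modlist_internal = &buffer_module;

// Linear scan by name; returns nullptr when no entry matches.
inline node_module* FindModule(node_module* list, const char* name, int flag) {
  node_module* mp;
  for (mp = list; mp != nullptr; mp = mp->nm_link) {
    if (std::strcmp(mp->nm_modname, name) == 0) break;
  }
  // A hit must carry the expected flag (assert replaces CHECK here).
  assert(mp == nullptr || (mp->nm_flags & flag) != 0);
  return mp;
}

node_module* get_internal_module(const char* name) {
  return FindModule(modlist_internal, name, NM_F_INTERNAL);
}

}  // namespace demo
```

demo::get_internal_module("fs") walks past buffer and returns the fs record, while an unknown name runs off the end of the list and yields nullptr. The scan is linear in the number of registered modules.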

Overall flow chart

Follow-up

The chain ends in loaders.js, which receives getInternalBinding. It defines process.binding and process._linkedBinding, which correspond to getInternalBinding and getLinkedBinding respectively, and it also defines internalBinding:

internalBinding = function internalBinding(module) {
    let mod = bindingObj[module];
    if (typeof mod !== 'object') {
      mod = bindingObj[module] = getInternalBinding(module);
      moduleLoadList.push(`Internal Binding ${module}`);
    }
    return mod;
  };

This function can then be used in fs.js, or when loading JS modules through CommonJS or ES modules.
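
The caching in internalBinding is plain memoization: look in bindingObj first, and only call getInternalBinding on a miss. Below is a sketch of the same logic in C++ rather than JS, so it is an analogy and not code from Node.js. Binding, BindingCache and the loader callback are hypothetical names; the loader stands in for getInternalBinding.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

namespace demo {

// Hypothetical placeholder for whatever a binding object holds.
struct Binding {
  std::string name;
};

class BindingCache {
 public:
  using Loader = std::function<Binding*(const std::string&)>;

  explicit BindingCache(Loader loader) : loader_(std::move(loader)) {}

  // Returns the cached binding, loading and recording it on first use.
  Binding* InternalBinding(const std::string& module) {
    auto it = cache_.find(module);
    if (it != cache_.end()) return it->second;
    Binding* mod = loader_(module);
    cache_[module] = mod;
    load_list_.push_back("Internal Binding " + module);
    return mod;
  }

  const std::vector<std::string>& load_list() const { return load_list_; }

 private:
  Loader loader_;                                    // plays getInternalBinding
  std::unordered_map<std::string, Binding*> cache_;  // plays bindingObj
  std::vector<std::string> load_list_;               // plays moduleLoadList
};

}  // namespace demo
```

Asking for the same module twice calls the loader only once, and the load list gets a single "Internal Binding fs" style entry, which matches what the JS version pushes to moduleLoadList.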
