Using Webpack cleverly for static resource dependency analysis of pages


Author: Teda

Preface:

The so-called “static resource dependency analysis” means analyzing the resources of a page and obtaining the dependency relationships between them in the form of JSON data or charts.

For example, the entry file entry.js of college-index (the home page of Kukler University) references banner.js, and banner.js in turn references utils.js, so after the analysis we hope to get data like this:

[
  {
    "type": "entry",
    "path": "/xx/xx/college-index/entry.js",
    "deps": [
      {
        "type": "module",
        "path": "/xx/xx/college-index/banner.js",
        "deps": [
          {
            "type": "module",
            "path": "/xx/xx/college-index/utils.js",
            "deps": []
          }
        ]
      }
    ]
  }
]
// "type" indicates whether the file is an entry or a module

What can you do once you have the resource dependency data? The author has several scenarios for reference:

In a multi-page repo, every time I publish, I want to get the changed files through git diff and then use dependency analysis to find the resources that need to be built, so that I can publish a single page.
I want to get the current resource dependencies and weed out unused resources in the repo.
I want a real-time preview of the resource dependencies of a front-end repo in a VS Code extension.
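The first scenario can be sketched in a few lines. This is only a minimal sketch, assuming the dependency data has the `type` / `path` / `deps` shape shown earlier; `findAffectedEntries`, `dependsOnChanged` and `exampleTree` are hypothetical names, and the list of changed files would in practice come from something like `git diff --name-only`.

```javascript
// Returns true if `node` or anything below it is one of the changed files.
function dependsOnChanged(node, changed) {
  if (changed.has(node.path)) return true;
  return node.deps.some(dep => dependsOnChanged(dep, changed));
}

// Hypothetical helper: given the dependency data and the paths of the
// changed files, return the paths of the entries that need a rebuild.
function findAffectedEntries(tree, changedFiles) {
  const changed = new Set(changedFiles);
  return tree
    .filter(node => node.type === "entry" && dependsOnChanged(node, changed))
    .map(node => node.path);
}

// Example data in the same shape as the JSON above.
const exampleTree = [
  {
    type: "entry",
    path: "/xx/xx/college-index/entry.js",
    deps: [
      {
        type: "module",
        path: "/xx/xx/college-index/banner.js",
        deps: [
          { type: "module", path: "/xx/xx/college-index/utils.js", deps: [] }
        ]
      }
    ]
  }
];

const affected = findAffectedEntries(exampleTree, [
  "/xx/xx/college-index/utils.js"
]);
```

Here `affected` contains the path of entry.js, because utils.js is reachable from that entry through banner.js.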

There may be many more uses. The key question is how to get this dependency analysis data quickly.

An initial idea

Here is an idea the author considered: traverse the page entries, match keywords such as import xx from 'xxx' and require to get the paths of the dependent modules, then analyze those module paths recursively, and finally combine the results into a dependency tree.
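To make the keyword-matching step concrete, here is a minimal sketch. It uses two regular expressions purely for illustration (the article does not say how the author's attempt was written), and `extractSpecifiers` is a hypothetical name. It only looks at one source string; recursion into the matched paths is left out.

```javascript
// Naive keyword matching: collect the module specifiers that appear in
// `import x from "y"`, `import "y"` and `require("y")`.
const IMPORT_RE = /import\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/g;
const REQUIRE_RE = /require\(\s*['"]([^'"]+)['"]\s*\)/g;

function extractSpecifiers(source) {
  // matchAll works on a copy of each regex, so the shared patterns
  // keep no state between calls.
  return [IMPORT_RE, REQUIRE_RE].flatMap(re =>
    Array.from(source.matchAll(re), match => match[1])
  );
}

const exampleSource = [
  "import utils from './utils';",
  'import "./style.scss";',
  'const lodash = require("lodash");'
].join("\n");

const specifiers = extractSpecifiers(exampleSource);
```

A matcher like this knows nothing about aliases, comments, or non-JS files, which is part of why the approach does not go far on its own.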

This idea seems feasible at first glance, and the analysis process can be made more efficient by using JavaScript parsers such as Acorn for keyword matching, and by processing the AST of the parsed file.

However, after a simple attempt the author gave up on this idea. The reason is that in today's engineered front-end projects a page depends not only on JS but also on many other kinds of resources, such as Sass or Less files handled by CSS preprocessors, so handling JS paths alone is not enough.

With the help of Webpack

Since the idea above is not feasible, we turn to Webpack for a solution. Developers familiar with Webpack will know how it processes dependencies, so here is only a brief summary: after obtaining the entry file, Webpack resolves the correct path of the resource, processes the file through loaders, finds the dependencies referenced in the module by traversing the AST, and then processes those dependencies recursively. In the end it has the whole dependency tree.

This is basically the same as our initial idea, and loaders solve the problem of resource types that could not be parsed. Readers might wonder: isn’t there already a tool like webpack-bundle-analyzer? After every build, you can get the file dependencies from the stats.

The reason for not using it directly is simple. First, a full build is too slow, especially with dozens of pages. Second, I only want the resource dependencies; I don’t want to build the entire front-end repo or generate any bundles.

Is there a tool that, like Webpack, can use loaders to find file dependencies, but without generating and compressing bundles?

No such tool exists out of the box, so we adapt Webpack ourselves. How? A Webpack plugin is enough.


The Webpack build process

Before explaining the changes, it’s worth taking a look at how Webpack processes a module and at its overall flow.

Module processing process:

When Webpack gets a path, it runs resolve module -> create module -> build module -> parse module. The steps are as follows:

– [resolve module]: obtain the real path of the module
– [create module]: create the module context
– [build module]: read the module content
– [parse module]: analyze the module content (mainly find the require keyword in the module and add each dependency to the module's dependency array)

The process is then repeated for each dependency that was found.
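The four steps and their repetition can be modelled with a toy example. This is not Webpack's implementation, only a sketch of the flow: the in-memory `files` object, the `.js`-appending resolver and all function names are assumptions made for illustration.

```javascript
const path = require("path");

// Hypothetical in-memory "file system" standing in for real files.
const files = {
  "/src/entry.js": 'require("./banner");',
  "/src/banner.js": 'require("./utils");',
  "/src/utils.js": "module.exports = {};"
};

// [resolve module]: turn a request into the real path of the module.
function resolveModule(context, request) {
  return path.posix.join(context, request) + ".js";
}

// [create module]: create the object that holds the module's state.
function createModule(resource) {
  return { resource, source: null, dependencies: [] };
}

// [build module]: read the module content.
function buildModule(mod) {
  mod.source = files[mod.resource];
}

// [parse module]: find require() calls and record them as dependencies.
function parseModule(mod) {
  const re = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
  for (const match of mod.source.matchAll(re)) {
    mod.dependencies.push(match[1]);
  }
}

// Run the four steps for one request, then repeat them for each dependency.
function processModule(context, request, modules = new Map()) {
  const resource = resolveModule(context, request);
  if (modules.has(resource)) return modules; // already processed
  const mod = createModule(resource);
  modules.set(resource, mod);
  buildModule(mod);
  parseModule(mod);
  for (const dep of mod.dependencies) {
    processModule(path.posix.dirname(resource), dep, modules);
  }
  return modules;
}

const processed = processModule("/src", "./entry");
```

After the call, `processed` maps each resolved path to its module object, in the order entry.js, banner.js, utils.js.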

Each step is quite complicated. Take [resolve module] as an example: its logic lives mainly in the file webpack/lib/NormalModuleFactory.js. The whole resolve process runs the beforeResolve, factory, resolver, and afterResolve steps in turn, and each step corresponds to a hook. Each hook receives descriptive information about the module. This information becomes more complete as the hooks run, until the full module information is available for the following [create module] and [build module] steps. We will intervene in this part of the process in [Concrete implementation] below.

This article does not discuss module processing in detail. There are many excellent articles online that readers can refer to.

The overall flow of Webpack:

In my opinion, the overall process of Webpack is divided into four steps, as shown below:

We only need the first three Webpack steps to implement dependency analysis, for the reasons given below.

The solution

As mentioned earlier, Webpack finds all dependencies by analyzing the entry recursively. The dependencies of a page include relative or absolute path references, alias references, and dependencies in node_modules. Since we only want the resource dependencies within the repo itself, we can simply mask the dependencies in node_modules, which makes the module recursion much shorter, as shown in the following figures:

Before:

After:

Of the four main Webpack steps mentioned above, once [Step3] has finished Webpack has all modules (every module generated by a single build), which is enough for dependency analysis. We terminate Webpack's subsequent steps right there, so chunk generation and chunk merging and optimization are never performed. This achieves the goal in the title of this article: page static resource dependency analysis in 1/10 of the build time.

Concrete implementation:

Write a Webpack plugin

The plugin is called FastDependenciesAnalyzerPlugin. For how to write a plugin, refer to the official documentation.

class FastDependenciesAnalyzerPlugin {
  // Runs before a module request is resolved (filled in below)
  beforeResolve = (resolveData, callback) => {}

  // Runs after a module request has been resolved (filled in below)
  afterResolve = (result, callback) => {}

  // Runs once all modules have been built (filled in below)
  handleFinishModules = (modules, callback) => {}

  apply(compiler) {
    compiler.hooks.normalModuleFactory.tap(
      "FastDependenciesAnalyzerPlugin",
      nmf => {
        nmf.hooks.beforeResolve.tapAsync(
          "FastDependenciesAnalyzerPlugin",
          this.beforeResolve
        );

        nmf.hooks.afterResolve.tapAsync(
          "FastDependenciesAnalyzerPlugin",
          this.afterResolve
        );
      }
    );

    compiler.hooks.compilation.tap(
      "FastDependenciesAnalyzerPlugin",
      compilation => {
        compilation.hooks.finishModules.tapAsync(
          "FastDependenciesAnalyzerPlugin",
          this.handleFinishModules
        );
      }
    );
  }
}

In the callback of the compiler.hooks.normalModuleFactory hook, we go on to tap the beforeResolve and afterResolve hooks of the normalModuleFactory.

In the callback of the compiler.hooks.compilation hook, we go on to tap the finishModules hook of the compilation.

Step into the beforeResolve process

beforeResolve(resolveData, callback) {
  const { context, contextInfo, request } = resolveData;
  const { issuer } = contextInfo;
}

context is the absolute path of the directory in which the request is resolved. issuer is the path of the module that depends on the current module, that is, where the reference comes from. request is the request path of the current module. For example, suppose /xx/xxx/banner.js references the utils module in one of the following three ways:

// 1: relative path
import utils from './utils'

// 2: alias
import utils from '@utils'

// 3: package in node_modules
import utils from 'utils'

For the utils module, the value of issuer is /xx/xxx/banner.js, and in the first case the value of request is './utils'.

At this point we only know where the current module is referenced from (issuer) and its request path (request). We cannot yet get the real path of the current module, so we cannot put it into our dependency tree. Therefore we do not build the dependency tree in beforeResolve. What we can do here is skip modules. For example, in the third case utils is an npm package that may contain a large number of small modules, none of which belong in the dependency tree, so we skip them directly. This relies on the following behavior in the Webpack source code:

When beforeResolve produces no result for a module, Webpack calls callback() directly. In the Webpack source code, a callback invoked without arguments usually means that the process ends early. So if beforeResolve ends without a result for a module, that module goes through none of the subsequent resolve and build steps, which is how a module gets skipped.
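A handler built on this behavior might look like the following sketch. `createBeforeResolve` and the `shouldSkip` predicate are hypothetical names. Handing back `null` as the result is an assumption based on Webpack 4, where a falsy result leads to the early `callback()` path described above, and it should be checked against the Webpack version in use.

```javascript
// Hypothetical factory: `shouldSkip(request, issuer)` decides which modules
// are left out of the dependency analysis.
function createBeforeResolve(shouldSkip) {
  return function beforeResolve(resolveData, callback) {
    const { contextInfo, request } = resolveData;
    // issuer may be missing or empty, for example for entry modules.
    const issuer = (contextInfo && contextInfo.issuer) || "";

    if (shouldSkip(request, issuer)) {
      // Assumption (Webpack 4): a null result makes the factory ignore
      // this module, so it is neither resolved nor built.
      return callback(null, null);
    }
    // Pass the data on unchanged so that resolving continues as usual.
    return callback(null, resolveData);
  };
}
```

The handler returned by `createBeforeResolve` can be passed to `nmf.hooks.beforeResolve.tapAsync` in place of the empty `beforeResolve` stub of the plugin.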

To do this, we can implement a skip function. A simple version, which takes request and issuer as parameters, is given below:

const { dependencies } = require("./package.json"); // path is an assumption
const ignoreDependenciesArr = Object.keys(dependencies);

function skip(request, issuer) {
  return (
    ignoreDependenciesArr.some(item => request.includes(item)) ||
    issuer.includes("node_modules")
  );
}

The current module is skipped if its request matches one of the dependencies defined in package.json, or if the issuer itself contains node_modules. Of course, this is only a simple version. A real implementation has to consider many special cases, which are not detailed here; interested readers can implement their own.
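One such special case: `request.includes(item)` also matches local files whose path merely contains a package name, for example './react-utils' when react is a dependency. A slightly stricter variant could compare whole package names. The sketch below is one possible refinement, not the author's implementation; `exampleDependencies`, `isPackageRequest` and `skipStrict` are hypothetical.

```javascript
// Hypothetical stand-in for the "dependencies" field of package.json.
const exampleDependencies = { react: "^16.8.0", lodash: "^4.17.0" };
const packageNames = Object.keys(exampleDependencies);

// True if `request` is the package itself ("lodash") or a file inside it
// ("lodash/debounce"), but not a local file that only contains the name.
function isPackageRequest(request) {
  return packageNames.some(
    name => request === name || request.startsWith(name + "/")
  );
}

function skipStrict(request, issuer) {
  // issuer may be empty or missing, for example for entry modules.
  return isPackageRequest(request) || (issuer || "").includes("node_modules");
}
```

With this variant, `skipStrict("lodash/debounce", "/repo/src/a.js")` is true, while `skipStrict("./react-utils", "/repo/src/a.js")` is false.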

With the help of afterResolve

afterResolve(result, callback) {
  const { resourceResolveData } = result;
  const { context: { issuer }, path } = resourceResolveData;
  // Add the dependency to the dependency tree here
}

Webpack uses enhanced-resolve to resolve a request path, passing in the module's contextInfo, context, and request. enhanced-resolve also has to be told things such as the package.json of the repo where the module resides and the alias information in the Webpack configuration. This article does not describe how enhanced-resolve works; interested readers can look into it themselves.

In afterResolve, we can get from Webpack the module path that enhanced-resolve resolved, together with the path of the parent module that depends on it. We add these two paths to the dependency tree, and after a simple recursive pass we have the full dependency tree. We can of course store all kinds of information in the tree: whether a node is a module, whether it is a JS file, whether it is a CSS file, and so on.
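The bookkeeping for those two paths can be sketched as follows. `DependencyCollector` is a hypothetical helper, not part of Webpack: it records each parent/child pair and later expands an entry into the nested `type` / `path` / `deps` shape used at the start of the article.

```javascript
// Records "issuer depends on path" pairs as they are seen in afterResolve
// and turns them into the nested { type, path, deps } structure.
class DependencyCollector {
  constructor() {
    this.children = new Map(); // parent path -> Set of child paths
  }

  // Call this from afterResolve with the two paths.
  add(issuer, path) {
    if (!issuer) return; // entries have no parent
    if (!this.children.has(issuer)) this.children.set(issuer, new Set());
    this.children.get(issuer).add(path);
  }

  // Build the tree for one entry; `seen` guards against circular imports.
  toTree(path, type = "entry", seen = new Set()) {
    const node = { type, path, deps: [] };
    if (seen.has(path)) return node;
    const nextSeen = new Set(seen).add(path);
    for (const child of this.children.get(path) || []) {
      node.deps.push(this.toTree(child, "module", nextSeen));
    }
    return node;
  }
}

const collector = new DependencyCollector();
collector.add("/xx/xx/college-index/entry.js", "/xx/xx/college-index/banner.js");
collector.add("/xx/xx/college-index/banner.js", "/xx/xx/college-index/utils.js");
const entryTree = collector.toTree("/xx/xx/college-index/entry.js");
```

Keeping a flat parent-to-children map while Webpack runs and expanding it only at the end means the order in which modules are resolved does not matter.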

Finishing in finishModules

In version 4.30, Webpack made a commit that changes finishModules from a SyncHook into an AsyncSeriesHook and adds a line of code where the finishModules hook is called, as shown in the figure below:

This lets us tap the finishModules hook and pass a value as err, which directly blocks the subsequent file merging and optimization steps. Hopefully Webpack will officially provide more suitable hooks in the future, so that some steps can be skipped more easily.
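A handler that stops the build this way might look like the sketch below. `AnalysisDone` and `createFinishModulesHandler` are hypothetical names, and reading `resource` from each module assumes normal modules that carry their resolved file path in that property.

```javascript
// Hypothetical marker error, used to tell "we stopped on purpose" apart
// from a real build failure.
class AnalysisDone extends Error {
  constructor() {
    super("Dependency analysis finished; skipping the rest of the build");
    this.name = "AnalysisDone";
  }
}

// Returns a finishModules handler. `onDone` receives the module paths.
function createFinishModulesHandler(onDone) {
  return function handleFinishModules(modules, callback) {
    // Assumption: `resource` holds the resolved file path of a module;
    // modules without it are dropped.
    const paths = Array.from(modules, m => m.resource).filter(Boolean);
    onDone(paths);
    // Passing an error keeps the later chunk and optimization steps
    // from running.
    callback(new AnalysisDone());
  };
}
```

Whoever starts the compiler then has to treat an `AnalysisDone` error as a successful run, assuming the error is reported back through the usual error channel.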

Some pitfalls

For CSS or SCSS files referenced from JS, the dependencies can be obtained through the normal resolve process. But if the @import syntax is used inside CSS, css-loader handles that syntax itself, so it does not go through Webpack's own resolve process. Please see this issue for details. For this part we have to do extra processing in beforeResolve: strip the loader description from the request by string slicing, pass in the @import path, and call enhanced-resolve ourselves to get the correct path of the module.
The usage and names of some hooks vary between Webpack versions, so developers should be careful when working with internal processes.
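The string slicing mentioned in the first pitfall can be sketched as follows. It relies on Webpack's inline loader syntax, in which loaders and the resource are separated by `!`. `stripLoaders` is a hypothetical helper, and dropping the query string as well is an extra assumption of this sketch.

```javascript
// Returns the resource part of a request that may carry inline loaders
// (separated by "!") and a query string.
function stripLoaders(request) {
  // The resource is always the last "!"-separated segment.
  const resource = request.split("!").pop();
  const queryIndex = resource.indexOf("?");
  return queryIndex === -1 ? resource : resource.slice(0, queryIndex);
}
```

For example, `stripLoaders("-!../node_modules/css-loader/index.js!./theme.css")` returns "./theme.css", which can then be resolved against the directory of the importing file.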

Conclusion

The principle described in this article is not profound. It mainly digs out one way of using Webpack, in the hope of inspiring developers who want to build more tools on top of Webpack's internals. Finally, thank you for reading. Interested readers are welcome to leave comments, advice, and corrections at the bottom of the article.