background

HaoHao business line is front end engineers (professional page boy), I am tool chain architecture group engineer (professional tools), HaoHao one day and said I he maintenance projects not to have too many modules, actually can be deleted, but now I don’t know what doesn’t, then can’t delete, asked me if I can be a tool to find all not cited module. After all, I am a professional tool man, so IT took me more than half a day to realize this tool.

This tool is a universal tool that node projects and front-end projects can use to find unused modules, and the idea of a module traverser can be applied to many other applications. So I sorted out the implementation ideas and wrote this article.

Thought analysis

The goal is to find all unused modules in the project. Projects always have several entry modules from which code is packaged or run. We need to know all the entry modules first.

Once there are entry modules, analyze which modules are used (dependent) by the entry module, and then analyze the dependencies from the used modules, and so on, recursively, until there are no new dependencies. In this process, all the modules that are iterated are used, and all the modules that are not iterated are not used, which are the modules that we are looking for that can be deleted.

We can save module information and the relationship between modules in the process of traversal as the relationship between objects, and construct a dependency graph (because there may be a module dependent on two modules, or even cycle dependent, so it is a graph). The subsequent analysis of the data structure of the dependency graph is the analysis of the dependency relationship between modules. For this requirement, we only need to save the module path traversed to, without generating dependency graph.

To find which modules it depends on, there are different ways to analyze dependencies for different modules:

  • Js, TS, JSX, TSX modules depend on es Module import or CommonJS require
  • CSS, LESS, and SCSS modules determine dependencies based on the syntax of @import and URL ()

There is also a layer of processing that needs to be done to get the dependency path. For example, WebPack can configure aliases, typescript can configure paths, and Monorepo has its own path resolution rules that need to be handled. After processing, we can find out what the real path of the module is.

After dependency analysis starting from the entry module, the module graph is traversed, the used module path is saved, and then the used module path is filtered out with all the modules, and the remaining modules are not used.

Here’s the idea. Let’s implement it:

Code implementation

Module traversal

We’ll write a module traverser that passes in the path to the current module and a callback function that handles the contents of the module as follows:

  • Try to complete the path because.js,.json,.tsx, etc can omit the suffix
  • Get the module type based on the path
  • If it is a JS module, process it by traversing the JS module
  • If it is a CSS module, proceed by traversing the CSS
const MODULE_TYPES = {
    JS: 1 << 0.CSS: 1 << 1.JSON: 1 << 2
};

function getModuleType(modulePath) {
    const moduleExt = extname(modulePath);
     if (JS_EXTS.some(ext= > ext === moduleExt)) {
         return MODULE_TYPES.JS;
     } else if (CSS_EXTS.some(ext= > ext === moduleExt)) {
         return MODULE_TYPES.CSS;
     } else if (JSON_EXTS.some(ext= > ext === moduleExt)) {
         returnMODULE_TYPES.JSON; }}function traverseModule (curModulePath, callback) {
    curModulePath = completeModulePath(curModulePath);

    const moduleType = getModuleType(curModulePath);

    if (moduleType & MODULE_TYPES.JS) {
        traverseJsModule(curModulePath, callback);
    } else if(moduleType & MODULE_TYPES.CSS) { traverseCssModule(curModulePath, callback); }}Copy the code

Js module traversal

Traversing the JS module requires parsing the import and require dependencies in it. We use Babel to do this:

  • Reading file contents
  • Determine whether to enable typescript and JSX parse plug-ins based on the.jsx,.tsx, etc
  • Convert code to AST using Babel Parser
  • Traverse the AST with Babel Traverse
  • Handles the AST of ImportDeclaration and CallExpression, from which dependency paths are extracted
  • After the dependent path is processed and becomes a real path, continue to traverse the module of that path

The code is as follows:

function traverseJsModule(curModulePath, callback) {
    const moduleFileContent = fs.readFileSync(curModulePath, {
        encoding: 'utf-8'
    });

    const ast = parser.parse(moduleFileContent, {
        sourceType: 'unambiguous'.plugins: resolveBabelSyntaxtPlugins(curModulePath)
    });

    traverse(ast, {
        ImportDeclaration(path) {
            const subModulePath = moduleResolver(curModulePath, path.get('source.value').node);
            if(! subModulePath) {return;
            }
            callback && callback(subModulePath);
            traverseModule(subModulePath, callback);
        },
        CallExpression(path) {
            if (path.get('callee').toString() === 'require') {
                const subModulePath = moduleResolver(curModulePath, path.get('arguments.0').toString().replace(/['"]/g.' '));
                if(! subModulePath) {return; } callback && callback(subModulePath); traverseModule(subModulePath, callback); }}})}Copy the code

CSS module traversal

Traversing CSS modules requires parsing @import and URL (). We use postCSS to do this:

  • Reading file contents
  • Determine whether to enable the syntax plug-in of less and SCSS according to the file path of.less and. SCSS
  • Use postcss.parse to convert the contents of the file to an AST
  • Iterate through the @import node to extract the dependency path
  • The style declaration is iterated to filter out the value of the URL () and extract the dependent path
  • After the dependent path is processed and becomes a real path, continue to traverse the module of that path

The code is as follows:

function traverseCssModule(curModulePath, callback) {
    const moduleFileConent = fs.readFileSync(curModulePath, {
        encoding: 'utf-8'
    });

    const ast = postcss.parse(moduleFileConent, {
        syntaxt: resolvePostcssSyntaxtPlugin(curModulePath)
    });
    ast.walkAtRules('import'.rule= > {
        const subModulePath = moduleResolver(curModulePath, rule.params.replace(/['"]/g.' '));
        if(! subModulePath) {return;
        }
        callback && callback(subModulePath);
        traverseModule(subModulePath, callback);
    });
    ast.walkDecls(decl= > {
        if (decl.value.includes('url(')) {
            const url = /.*url\((.+)\).*/.exec(decl.value)[1].replace(/['"]/g.' ');
            const subModulePath = moduleResolver(curModulePath, url);
            if(! subModulePath) {return; } callback && callback(subModulePath); }})}Copy the code

Module path processing

Both CSS and JS modules need to be processed after extracting the path:

  • Supports custom path resolution logic, allowing users to customize path resolution rules as required
  • Filter modules under node_modules without parsing
  • The suffix of the completion path
  • If traversed modules skip traversal, avoiding circular dependencies

The code is as follows:

const visitedModules = new Set(a);function moduleResolver (curModulePath, requirePath) {
    if (typeof requirePathResolver === 'function') {RequirePathResolver is user-defined path resolution logic
        const res = requirePathResolver(dirname(curModulePath), requirePath);
        if (typeof res === 'string') {
            requirePath = res;
        }
    }

    requirePath = resolve(dirname(curModulePath), requirePath);

    // Filter out third-party modules
    if (requirePath.includes('node_modules')) {
        return ' ';
    }

    requirePath =  completeModulePath(requirePath);

    if (visitedModules.has(requirePath)) {
        return ' ';
    } else {
        visitedModules.add(requirePath);
    }
    return requirePath;
}
Copy the code

This completes the transformation from the parsed dependency path to its real path.

Path to completion

When writing code, it is possible to omit some file suffixes (.js,.tsx,.json, etc.), we want to implement the logic of completion:

  • If the suffix already exists, skip it
  • If it is a directory, try to find the file in index.xxx and return to the directory if it is found
  • If it is a file, try to complete the suffix for.xxx and return to the path if it is found
  • Error: module not found
const JS_EXTS = ['.js'.'.jsx'.'.ts'.'.tsx'];
const JSON_EXTS = ['.json'];

function completeModulePath (modulePath) {
    const EXTS = [...JSON_EXTS, ...JS_EXTS];
    if (modulePath.match(/\.[a-zA-Z]+$/)) {
        return modulePath;
    }

    function tryCompletePath (resolvePath) {
        for (let i = 0; i < EXTS.length; i ++) {
            let tryPath = resolvePath(EXTS[i]);
            if (fs.existsSync(tryPath)) {
                returntryPath; }}}function reportModuleNotFoundError (modulePath) {
        throw chalk.red('module not found: ' + modulePath);
    }

    if (isDirectory(modulePath)) {
        const tryModulePath = tryCompletePath((ext) = > join(modulePath, 'index' + ext));
        if(! tryModulePath) { reportModuleNotFoundError(modulePath); }else {
            returntryModulePath; }}else if(! EXTS.some(ext= > modulePath.endsWith(ext))) {
        const tryModulePath = tryCompletePath((ext) = > modulePath + ext);
        if(! tryModulePath) { reportModuleNotFoundError(modulePath); }else {
            returntryModulePath; }}return modulePath;
}
Copy the code

According to the above ideas, we realized the module traversal, found all the modules used.

Filter out useless modules

Above we found all the modules used, then we just use all modules to filter out the modules used, that is, the modules not used.

We encapsulate a findUnusedModule method.

Passing in parameters:

  • Entries (Array of entry modules)
  • Includes (Glob expressions for all modules)
  • ResolveRequirePath (custom path resolution logic)
  • CWD (root path for parsing module)

Returns an object containing:

  • All (All modules)
  • Used (used module)
  • Unused (unused module)

Processing process:

  • Merge parameters with default parameters
  • Handle the module path of includes based on CWD
  • Find all modules according to the Glob expression of includes
  • Iterate through all entires entries and record the modules used
  • Filter out the modules that are used and find the modules that are not used
const defaultOptions = {
    cwd: ' '.entries: [].includes: ['* * / *'.'! node_modules'].resolveRequirePath: () = >{}}function findUnusedModule (options) {
    let {
        cwd,
        entries,
        includes,
        resolveRequirePath
    } = Object.assign(defaultOptions, options);

    includes = includes.map(includePath= > (cwd ? `${cwd}/${includePath}` : includePath));

    const allFiles = fastGlob.sync(includes).map(item= > normalize(item));
    const entryModules = [];
    const usedModules = [];

    setRequirePathResolver(resolveRequirePath);
    entries.forEach(entry= > {
        const entryPath = resolve(cwd, entry);
        entryModules.push(entryPath);
        traverseModule(entryPath, (modulePath) = > {
            usedModules.push(modulePath);
        });
    });

    const unusedModules = allFiles.filter(filePath= > {
        const resolvedFilePath = resolve(filePath);
        return! entryModules.includes(resolvedFilePath) && ! usedModules.includes(resolvedFilePath); });return {
        all: allFiles,
        used: usedModules,
        unused: unusedModules
    }
}
Copy the code

In this way, our wrapped findUnusedModule does the initial job of finding modules that are not used under the project.

Testing capabilities

Let’s test the effect, using this directory as a test project:

const { all, used, unused } = findUnusedModule({
    cwd: process.cwd(),
    entries: ['./demo-project/fre.js'.'./demo-project/suzhe2.js'].includes: ['./demo-project/**/*'],
    resolveRequirePath (curDir, requirePath) {
        if (requirePath === 'b') {
            return path.resolve(curDir, './lib/ssh.js');
        }
        returnrequirePath; }});Copy the code

The results are as follows:

Successfully found the unused module! (You can pull the code down and run it.)

thinking

We have implemented a module traverser, which can start from a module to traverse. Based on this traverser, we realized the need to find useless modules. In fact, we can also use it to do other analysis requirements. This traversal method is universal.

We know Babel can be used for two things:

  • Code translation: from es Next, typescript, etc., to js supported by the target environment
  • Static analysis: Analysis of code content, such as type checking, Lint, etc., without generating code

The module traverser can do the same thing:

  • Static analysis: analyze the dependency relationship between modules, construct dependency graph, and complete some analysis functions
  • Packaging: Print each module in the dependency diagram as object code with the corresponding code template

conclusion

We first analyzed the requirements: we identified modules that were not used in the project. This requires implementing a module traverser.

Module traversal treats JS modules and CSS modules differently: JS modules parse import and require, CSS parse URL () and @import.

The analyzed path is then processed into a real path. To deal with node_modules, webpack Alias, typescript types, etc., we expose a callback function for developers to extend.

Once module traversal is implemented, we can find out which modules are used and which are not by specifying all modules, entry modules.

After testing, it meets our needs.

The module traverser is general-purpose and can be used for various static analyses, as well as for subsequent code printing as a wrapper.

The github address of the code is here, you can pull down to run, learn to write module traverser is quite helpful.

eggs

When I introduced this function to Hao Hao, I wrote a document of the implementation idea and posted it here:

Haohao: Brother Light, what was the overall idea? The code was messy at the beginning

Me: A module is a graph structure that specifies the entry point from which to traverse. This is actually a DFS process, but with circular references, to be solved by recording processed modules. Recursively iterating through the graph, the resulting module is used.

Haohao: DFS is a module, how to determine the submodule?

Me: Different modules have different processing methods. For example, JS modules need to use import or require to identify sub-modules, while CSS needs to use @import and URL () to identify sub-modules. But these are just extraction paths, which are still not available, and need to be converted to a real path, with a resolve path procedure.

Haohao: What does Resolve Path do?

Me: That is to process alias, filter modules under node_modules because we don’t need them here, and then determine the absolute path of submodules based on the path of the current module. Also expose a hook function that allows users to customize the Resolve logic of the Require Path.

Haohao: Is that the requireRequirePath?

Me: Right, that’s the hook exposed for the user to customize path Resolve logic.

Hao hao: Do I understand the general process?

Me: Tell me

Haohao: The modules of the project constitute a dependency map. To determine the modules that are not used, we need to find the modules that are used and filter them out. The module used should start DFS with several entry modules. Traversing different modules has different ways to extract require path. After extracting the path, resolve should be performed to obtain the real path, and then recursively process the sub-modules. That way you can go through and determine which ones you’re using. You also have to deal with circular references, because a module is, after all, a graph, and DFS has loops in it.

Me: That’s right, awesome.