Webpack 5 optimization and working principles: designing a cache between build processes

Looking at the calendar, I noticed that today is the first solar term of 2020: Start of Spring. In ancient times, by the "bucket handle pointing" method, spring began when the handle of the Big Dipper pointed to Yin; in the ganzhi (stem-branch) calendar, the start of spring begins the year, meaning a new cycle has opened and all things start to revive. Although we still face the grave challenge of the epidemic, today represents warmth and growth. At this moment I would like to sort out this year's technical developments and do some spring sowing.

Let's first focus on Webpack 5, which is about to be released. There have already been some early-adopter experiments at home and abroad, and some good articles about the upgrade path and experience, such as "Webpack 5 upgrade experiments," but none analyze the design from the perspective of technical principles.

In this article, I will look at the history, technical background, and implementation of Webpack 5's most anticipated "persistent caching" feature, in the direction of the spec "A Module Disk Cache between Build Processes." At the same time, I hope to offer a glimpse of the overall Webpack build process. The article touches on a large number of Webpack implementation principles and system-design considerations, so reading it requires some prerequisite knowledge and effort. The bottom line: you will come to see caching as more than just "trading space for time," and Webpack engineers as among the most capable pieces of the front-end architecture puzzle.

Current pain points, existing solutions, and the direction forward

In this part, we introduce the pain points of existing Webpack builds and the solutions currently available. We analyze the shortcomings of these solutions one by one, and discuss whether a more feasible and elegant solution could come officially from Webpack itself.

Continuous processes and caches

For large, complex applications, developers customarily use the webpack --watch option or webpack-dev-server to start a continuous process during development, for the best build speed and efficiency. Both the --watch option and webpack-dev-server listen to the file system and trigger incremental builds when necessary. These details are worth studying: the source code involves file-system watching, in-memory reads and writes, compatibility across operating systems, plugin and layered cache design, and several well-known performance-optimization issues. We will not cover them one by one here, but the key points below are worth understanding first, to help digest the rest of this article:

  • A normal Webpack build starts by calling the Compiler.run method
  • A Webpack build started in watch mode calls the Compiler.watch method and starts a build watch service
  • webpack-dev-server is a small Node.js server that uses the webpack-dev-middleware package, which ultimately also calls Compiler.watch
  • Watch mode relies on caches at various levels to speed up subsequent builds
  • In watch mode, after the first build, Webpack monitors file changes via the Watching prototype method _done (Watching.prototype._done) and rebuilds in real time, so the build process does not have to be restarted repeatedly
  • The watch service process therefore runs in a "build -> listen for file changes -> trigger rebuild -> build" loop
  • Webpack uses graceful-fs to read and write files; it extends and wraps the fs module in Node.js to optimize its usage
  • Besides graceful-fs, the industry (for example webpack-dev-middleware) uses memory-fs: the Compiler's outputFileSystem can be set to a MemoryFileSystem so that compiled assets are kept in memory rather than written out, which greatly improves build performance
  • The file-watching logic centers on the watch method of the Compiler.watchFileSystem object, which is backed by Webpack's watchpack module and wired up by the NodeEnvironmentPlugin
  • At the bottom layer, file (and folder) watching is executed with the chokidar package
  • When a file (or folder) changes, in addition to firing instance events and updating file-change data, there is FS_ACCURENCY calibration logic; FS_ACCURENCY compensates for low-precision file-system timestamps, where a file's content changes yet it reports the same mtime
  • The features triggered by the low-level file (folder) watchers rely on inheriting from EventEmitter and ultimately drive the upper-layer Webpack rebuild process
  • Webpack's --watch option has a built-in batching capability called aggregateTimeout: after a change event fires for a watched file or folder, the changes are stored temporarily in the aggregatedChanges array and emitted to the upper layer 200 ms after the last change
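As a concrete illustration, the aggregation behavior described above can be tuned through Webpack's documented watchOptions; the specific values below are illustrative, not recommendations:

```javascript
// webpack.config.js -- tuning watch-mode batching. aggregateTimeout,
// ignored, and poll are documented webpack watchOptions; the values
// shown here are illustrative.
module.exports = {
  mode: 'development',
  watchOptions: {
    aggregateTimeout: 200,   // wait 200 ms after the last change before rebuilding
    ignored: /node_modules/, // skip watching third-party dependencies
    poll: false,             // use native file-system events rather than polling
  },
};
```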

This should give you a general idea of what happens behind the scenes in webpack --watch mode, and why the first build under --watch takes much longer while subsequent builds are much faster. The principle can be summed up as "caching," but it is worth understanding how the watch process leverages the event model, uses multiple logical layers, and decouples the trigger flow to achieve clean, reliable code. The details require analyzing the source one piece at a time, which is not our focus here, so we will not expand on them.

Surveying and analyzing community build-optimization solutions

However, not every use of Webpack warrants starting a continuous process, for example in the continuous integration (CI) stage and the production build stage. Builds in these stages are even more expensive, because developers add plugins such as code optimization and compression to the Webpack configuration for CI/CD pipelines and production build lines.

There are a number of great solutions to shorten Webpack build time and cost, including but not limited to:

  • cache-loader
  • DllReferencePlugin
  • auto-dll-plugin
  • thread-loader
  • happypack
  • hard-source-webpack-plugin

cache-loader can be added in front of loaders with a high performance overhead to cache their results to disk. DllPlugin and DllReferencePlugin split bundles apart, saving the cost of rebuilding them repeatedly and greatly improving build speed. thread-loader and happypack run loaders in separate worker pools across multiple processes or threads. Interestingly, however, vue-cli and create-react-app do not use the DLL technique, opting instead for a better alternative: hard-source-webpack-plugin.
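For example, cache-loader's documented usage is to place it at the front of an expensive loader chain; the babel-loader pairing below is illustrative:

```javascript
// webpack.config.js -- caching the output of an expensive loader chain
// to disk. Putting cache-loader first in the rule follows its documented
// usage; the specific chain shown here is illustrative.
module.exports = {
  module: {
    rules: [
      {
        test: /\.js$/,
        exclude: /node_modules/,
        use: ['cache-loader', 'babel-loader'],
      },
    ],
  },
};
```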

These community optimizations come at the cost of some file size and of future optimization headroom. They also carry a learning cost, to say nothing of the cost for an average developer to participate in implementing and maintaining them.

The unsafe cache

Above, we mostly discussed the concept of a module cache. Besides modules, there is another important caching target, which we call the resolver's unsafe cache.

What is the resolver's unsafe cache? Let's start with the concept of the resolver in Webpack. The big idea behind Webpack is that everything is a module. In a project we can write import './xxx/xxx' in the ESM style, or import 'some_pkg_in_node_modules', or even use an alias: import '@/xxx'. Webpack handles these references by finding the correct target file through the resolve process. In fact, not only import declarations in project code, but the overall processing inside Webpack, including locating loaders, depends on the resolve process whenever a file path is involved. So resolve can simply be understood as "file path lookup." Webpack also exposes the resolve configuration to the user, so that we can shape the path-lookup process, for example setting file extensions or the directories to search. The resolve process therefore also involves many time-consuming operations.
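The configuration surface mentioned above looks like the following; alias, extensions, and modules are documented resolve options, while the particular values are illustrative:

```javascript
// webpack.config.js -- shaping the path-lookup (resolve) process.
// alias, extensions, and modules are documented resolve options;
// the values shown are illustrative.
const path = require('path');

module.exports = {
  resolve: {
    alias: { '@': path.resolve(__dirname, 'src') }, // makes import '@/xxx' work
    extensions: ['.js', '.json'],  // tried in order for extension-less imports
    modules: ['node_modules'],     // directories searched for bare module requests
  },
};
```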

The resolver implementation in the Webpack source relies mainly on the ResolverFactory of enhanced-resolve, which creates three types of resolver:

  • NormalResolver: provides file path resolution for importing common files
  • ContextResolver: provides directory path resolution for dynamic file import
  • LoaderResolver: resolves file paths for importing Loader files

At build time, for each type of module, Webpack uses the resolver to check in advance that the path exists and to obtain the path's full address for the subsequent file loading. For all three resolver types, caching is also set up: Webpack caches resolve results via the UnsafeCachePlugin and returns the cached path result for the same reference. The principle of the UnsafeCachePlugin is simple: its apply method (UnsafeCachePlugin.prototype.apply) wraps the original resolver instance's resolve method; the wrapping layer caches path results, and the cache is updated by logic that runs when the wrapped original method completes.
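The wrapping idea can be sketched in a few lines. This is a simplified illustration, not the real enhanced-resolve source: a resolve function is wrapped so that identical requests return a cached path result without touching the file system again.

```javascript
// Simplified sketch (not the real enhanced-resolve source): wrap a resolve
// function so identical requests return a cached path result.
function withUnsafeCache(resolveFn) {
  const cache = new Map();
  return function cachedResolve(context, request) {
    const key = `${context}|${request}`;
    if (cache.has(key)) return cache.get(key); // "unsafe": no revalidation
    const result = resolveFn(context, request); // delegate to the original method
    cache.set(key, result); // update the cache once the original completes
    return result;
  };
}

// Usage with a fake resolver: the second identical lookup hits the cache.
let calls = 0;
const fakeResolve = (ctx, req) => {
  calls += 1;
  return `${ctx}/node_modules/${req}/index.js`;
};
const resolve = withUnsafeCache(fakeResolve);
resolve('/app', 'lodash');
resolve('/app', 'lodash'); // served from cache; calls stays at 1
```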

That sounds simple enough, but the design and implementation are tied to the "does this need a rebuild" decision, which is worth digging into. Let's analyze it in detail:

After the required file path lookup is done with the UnsafeCachePlugin's help, if the compilation process did not error, the current loader called this.cacheable(), and a result set from the previous build exists, then the rebuild decision is driven by two key pieces of state on the current module: this.fileDependencies, the file dependencies associated with the module, and this.contextDependencies, the folder dependencies associated with the module. Webpack compares the recorded fileTimestamps and contextTimestamps of these dependencies against the module's buildTimestamp: if any dependency's timestamp >= buildTimestamp, the module must be recompiled. If no recompilation is needed, the cached values on the compilation object are read instead.
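The decision above can be sketched as follows. This is a hedged simplification; the names are modeled loosely on webpack's NormalModule.needRebuild, and the module record is illustrative, not webpack's real data structure:

```javascript
// Hedged sketch of the rebuild decision: a module must be rebuilt if any of
// its file or folder dependencies has a timestamp at or after the module's
// last build, or has no recorded timestamp at all.
function needRebuild(module, fileTimestamps, contextTimestamps) {
  const stale = (deps, timestamps) =>
    [...deps].some((dep) => {
      const ts = timestamps.get(dep);
      return ts === undefined || ts >= module.buildTimestamp;
    });
  return (
    stale(module.fileDependencies, fileTimestamps) ||
    stale(module.contextDependencies, contextTimestamps)
  );
}

// Illustrative module record (not webpack's real data structure):
const mod = {
  buildTimestamp: 1000,
  fileDependencies: new Set(['/src/a.js']),
  contextDependencies: new Set(),
};
needRebuild(mod, new Map([['/src/a.js', 900]]), new Map());  // false: unchanged
needRebuild(mod, new Map([['/src/a.js', 1500]]), new Map()); // true: changed after build
```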

Why does the UnsafeCachePlugin carry an "unsafe" prefix in its name? The unsafe cache is in fact turned on by default in Webpack core, but it sacrifices a little resolution accuracy: a long-running build process must repeatedly re-run the resolution strategy, collecting changes that affect resolution results. Detecting whether resolutions have changed has a cost of its own, but for most applications, caching resolutions is the more cost-effective choice and significantly improves build performance.

New design proposal for Webpack 5

With that background, we move on to the shortcomings of existing solutions and the "behind the scenes" of Webpack 5's persistent cache design. Webpack 5's much-anticipated persistent cache optimizes the entire build process, and the principle remains the same: when a file change is detected, only the files reachable through the dependency tree are recompiled, dramatically speeding up builds. In official tests, builds of a single-page application consisting of 16,000 modules sped up by as much as 98%. One thing to note is that the persistent cache stores its data on disk.

For a persistent build process, the first build is a full build; it populates the disk module cache to benefit subsequent builds. A subsequent build then proceeds as: read the disk cache -> validate each module -> deserialize the module contents. Because the relationships between modules are not explicitly cached, they must still be validated during each build, with exactly the same logic Webpack normally uses to analyze dependencies. The resolver cache can also be stored persistently. Once a resolver cache entry passes validation as an exact match, it can be used to find the dependency quickly; if validation fails, the resolver's normal build logic runs instead. Normally, a resolver change will also fire a file-path-change hook during a continuous build.

Cache design and security validation

So how do you design such a persistent cache? In terms of data type and structure, JSON is a natural choice. With JSON data, we can read and write each module's cache data on disk, along with a status field per module that the validation phase uses to judge cache availability.

One of the most important aspects of this cache design is safety and availability: we need to guarantee that the persistent cache is a safe cache. Any cached data must come with logic to verify its availability, and the accuracy of that verification is a core issue. Concretely: for each module, we check availability against timestamps and content hashes. Timestamps alone cannot guarantee accuracy, because in practice a file's content can change while its timestamp stays the same, or the timestamp can even become smaller (for example, a file in the dependency context is deleted or renamed). So hashing the file contents (or using another content-comparison algorithm) makes a lot of sense. Note that relying on filesystem metadata alone is likewise unsafe, because metadata amounts to little more than file size and last-modified time.

Security of cache validation will be a key differentiator for Webpack 5 from previous releases, so let’s summarize:

  • Validating module content via timestamps or filesystem metadata needs to be replaced by a hash or another content-based comparison algorithm
  • Timestamp validation of file dependencies needs to be replaced by hashes of the file contents
  • Timestamp validation of context dependencies needs to be replaced by hashes of the file paths

File dependencies and context dependencies are important concepts in Webpack that we will not expand on here; the reader only needs to grasp the design idea behind cache validation.

In addition to the module cache, the resolver cache mentioned earlier also requires a similar cache validation process. These two validation processes also need to be optimized for better performance and build speed.

The cache set concept

Changes in file dependencies and changes in file contents trigger builds that validate and apply each module's cache. It is also worth considering the Webpack configuration itself: changes to the configuration should not directly invalidate module caches, but should instead map to different cache collections.

Webpack configurations change frequently during development, so we can use a hash of the configuration content to label the cache collection for each configuration. In a continuous build, each new build finds the matching cache set from the current configuration's hash and continues the build from there.

The diagram below:

Another concern is caching and updating third-party dependencies in the node_modules folder. For Yarn and npm 5 or later, the package manager itself can use a hash check of the lock file to guarantee consistency of dependency content and to detect updates. For npm versions below 5, we still need a caching and update mechanism of our own to guarantee consistency of dependency content and to detect changes. A common approach is to hash all the package.json files in the first level of the node_modules folder.

Cache eviction policy design

When designing any kind of cache architecture, we should consider not only cache validation but also cache capacity limits. The Webpack 5 persistent cache obviously cannot grow without bound, so sensible settings for disk usage and cache cleanup are essential.

Mzgoddard, the early Webpack 5 core developer, discussed the design: for a cache set, there should be no more than five caches, the maximum cumulative resource usage should not exceed 500MB, and the oldest cached content should be removed first when approaching or exceeding the 500MB threshold. The cache is also designed to be valid for two weeks.

This is similar to a classic LRU (least recently used) cache design: the algorithm evicts data based on its access history, on the core assumption that "data accessed recently is more likely to be accessed again." An LRU cache is typically implemented with a hash map plus a doubly linked list. We only sketch the idea here and will not expand on it.
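As a minimal illustration of the eviction idea, in JavaScript a Map's guaranteed insertion order gives a compact alternative to the hash map plus doubly linked list mentioned above:

```javascript
// Minimal LRU sketch using Map's insertion order (JS Maps iterate in
// insertion order, so the first key is always the least recently used).
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // evict the least recently used entry (the first inserted key)
      this.map.delete(this.map.keys().next().value);
    }
  }
}

const lru = new LRUCache(2);
lru.set('a', 1);
lru.set('b', 2);
lru.get('a');    // touch 'a', so 'b' becomes least recently used
lru.set('c', 3); // evicts 'b'
```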

Other considerations

There are still many points to consider for a robust and effective caching system, especially for a build tool as complex as Webpack. Here is a brief, non-exhaustive list, intended mainly to give readers a more complete picture:

  • Caching requirements for Webpack Plugin and Loader developers

For Webpack plugin and loader developers, the cache architecture needs to provide out-of-the-box tools or strategies for debugging and validating the cache. It also needs to expose to these developers the ability to turn caching on and off, as well as the ability to invalidate the full cache.

  • Caching requirements for the average developer

The core requirement for the average developer is, of course, the sheer magnitude of optimization a cache system can deliver. Builds that run without caching enabled should also produce a performance hint, and it should be possible to turn that hint off. Operating the cache should never require manual steps from ordinary developers; everything in the caching system should be automatic.

Developers may also use Webpack plugins and loaders that do not support persistent caching; naturally, the design of the cache architecture must not break the existing Webpack ecosystem.

  • For Webpack developers

For the official Webpack core developers, the caching architecture also needs to provide testing and debugging capabilities.

  • CI-oriented processes and cross-system scenarios

Front-end builds involved in CI/CD (Continuous Integration/Continuous Deployment) are always an interesting topic. For Webpack 5 persistent caching, there should also be proper controls and design for CI/CD processes and cross-system scenarios.

In general, the persistent cache at this stage should be "portable." Across clones of different CI instances, or the same project on different systems and devices, the cache should be reusable across environments: this is what cache portability means in CI-oriented processes and cross-system scenarios. For example, cache content should be available to different CI instances in different pipeline phases, or the latest cached content should be pullable from a common, centralized store, without having to produce it through a first full build.

  • Persistent cache information is written to Webpack Stats

Readers familiar with the Webpack core architecture will know Webpack Stats, the statistical record of a Webpack build. It is very important for analyzing the build process and optimizing the build setup. Disk-cache information associated with the persistent cache should be added to it as well: for example, the compile cache ID and the disk space occupied by the cache.

Conclusion

Rather than pasting source code to analyze the implementation of Webpack 5's persistent cache, this article has explained the existing Webpack build and caching process from a design and architecture standpoint. It touches on many Webpack core principles and basic concepts; readers can look them up while reading to fill any gaps. A cache system is simple to describe, but implementing one that is elegant, complete, safe, and sound engineering still requires careful thought from every developer.

At this point in time, I believe the future is as Albert Camus wrote in The Plague: "Spring is coming from all the remote regions towards the epidemic. Thousands of roses still languish in the market-places and in the florists' baskets, but the air is full of their scent." Another sentence in the book also struck me: "The true generosity towards the future is to devote everything to the present." The same is true of fighting the epidemic and of the path of learning and progress.

Happy coding!