Preface
Build tooling and bundling are key components of front-end engineering. As a front-end developer, there is no getting around build and bundling tools. They streamline and automate front-end workflows and have a profound influence on front-end frameworks, most of which rely heavily on compile-time tooling.
In this article we focus on the core functionality of bundling and develop a module bundler from scratch.
Outline
The main topics are as follows:
- Compilation
- Module parsing
- Target code generation
- Bundling
- Practical example
Compilation
Since this is a module bundler, we need to parse out of each source file which modules it depends on. We also need to handle the following:
- Some source files cannot run directly in the browser and need to be translated into target code with equivalent meaning, such as images and CSS imported from JS;
- Some JS supersets or dialects (TypeScript, CoffeeScript, etc.), some standard JS features not yet supported in every browser, and syntax extensions such as JSX all need to be translated into JS code the browser can run directly;
- There are other engineering requirements, such as converting SVG to a React component, inlining assets as base64, and so on.
Traditional compilation consists of five phases:
lexical analysis, syntax analysis, semantic analysis, code optimization, and target code generation.
This time, we will focus on phases 1, 2 and 5.
The third phase, semantic analysis, checks the static semantics of the language; in the front end this is the territory of tools such as tsc, ESLint, and Stylelint.
The fourth phase, code optimization, performs equivalence-preserving transformations such as extracting common subexpressions, deleting dead code, and loop optimization. In the front end this corresponds more to Babel's transform phase.
Some background knowledge on finite state machines:
A finite state automaton has a finite number of states; each state can transition to zero or more other states, and the input string determines which transitions are taken. A finite state automaton can be represented as a directed graph.
Parsing during compilation makes use of this concept.
Lexical analysis
Lexical analysis scans and decomposes the character stream of the input source program, identifying individual word symbols (tokens) such as identifiers, operators, strings, and numbers.
According to the ECMAScript® 2022 Language Specification (tc39.es), the following character categories can be distinguished:
- Control characters: zero-width joiner, zero-width non-joiner;
- White-space characters: tabs, spaces, etc.;
- Line terminators: LF, CR;
- Comments;
- Various punctuator characters: + - = * / and so on;
- Common tokens: identifiers, strings, numbers, Template, TemplateSubstitutionTail, regular expressions.
All the lexer needs to do is match against the enumerations the standard defines. Here are some interesting points:
Numbers
Note that the `.` and `e` characters are part of the numeric format, so naively splitting on special characters will not do the trick.
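As a rough sketch (the regex below is illustrative, not the exact production from the spec), one pattern must consume the digits together with the `.` and exponent parts:

```javascript
// Illustrative sketch: a single pattern covering decimal literals
// like 1, 1.5, .5, 1., 1e3, 1.5e-3 — the '.' and 'e' belong to the
// number token, so the lexer must not split on them.
const NUMBER = /^(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/;

function matchNumber(input) {
  const m = NUMBER.exec(input);
  return m ? m[0] : null;
}

console.log(matchNumber('1.5e3+x')); // '1.5e3'
console.log(matchNumber('.25 rest')); // '.25'
```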
Template string
Template strings are split into two groups of tokens. The first is Template, which belongs to CommonToken and is only responsible for matching:
- NoSubstitutionTemplate, which, unlike a string, may contain newlines;
- TemplateHead, which ends with the special characters `${`; note that these special characters also appear in the middle and tail tokens.
The other group covers the middle and tail parts of the template: TemplateSubstitutionTail, i.e. TemplateMiddle and TemplateTail.
For example, the template string `` `111${a}22${b}33` `` will be split into:
- `` `111${ ``
- `a`
- `}22${`
- `b`
- `` }33` ``
Splitting this way lets the later syntax and semantic analysis see the identifiers `a` and `b`.
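A minimal sketch of such a splitter (assuming no nested templates or braces inside `${}`; a real lexer tracks brace depth):

```javascript
// Naive sketch: split a template literal's source text into tokens.
function splitTemplate(src) {
  const tokens = [];
  let i = 0, start = 0;
  while (i < src.length) {
    if (src[i] === '$' && src[i + 1] === '{') {
      tokens.push(src.slice(start, i + 2)); // TemplateHead or TemplateMiddle
      i += 2;
      start = i;
      while (i < src.length && src[i] !== '}') i++; // the expression part
      tokens.push(src.slice(start, i)); // e.g. an identifier
      start = i; // '}' begins the next TemplateMiddle/TemplateTail
    }
    i++;
  }
  // TemplateTail (or the whole NoSubstitutionTemplate if no `${` was found)
  tokens.push(src.slice(start));
  return tokens;
}

console.log(splitTemplate('`111${a}22${b}33`'));
// [ '`111${', 'a', '}22${', 'b', '}33`' ]
```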
Why can't an emoji be an identifier?
var 😊 = 1;   // SyntaxError: invalid identifier
var 洋葱 = 1; // valid: CJK characters are in ID_Start
Apart from a few special characters, an identifier is made up of the characters below.
Searching for ID_Start in this specification (www.unicode.org/Public/UCD/…) shows the complete, finite set of allowed characters.
```
... (many ranges omitted)
4E00..9FFC    ; ID_Start # Lo [20989]  CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FFC
... (many ranges omitted)
AC00..D7A3    ; ID_Start # Lo [11172]  HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
D7B0..D7C6    ; ID_Start # Lo [23]     HANGUL JUNGSEONG O-YEO..HANGUL JUNGSEONG ARAEA-E
D7CB..D7FB    ; ID_Start # Lo [49]     HANGUL JONGSEONG NIEUN-RIEUL..HANGUL JONGSEONG PHIEUPH-THIEUTH
F900..FA6D    ; ID_Start # Lo [366]    CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA6D
... (many ranges omitted)
```
😊 is encoded as \ud83d\ude0a in Unicode, and D83D does not belong to ID_Start.
洋葱 ("onion") is encoded as \u6d0b\u8471; 6D0B belongs to ID_Start, and you can verify for yourself that the following character is in ID_Continue.
Can this code catch the error?
```js
try { 2.a } catch (e) { console.log(e); }
```
The answer is no. Because the numeric token is illegal, the JS interpreter reports the error during the lexical parsing phase and never reaches the execution phase.
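To see that the failure happens at parse time rather than at execution time, parsing can be deferred with `eval`, which makes the same error catchable (a small illustrative sketch):

```javascript
// '2.' is consumed as a numeric literal, so the following 'a' is an
// unexpected token. Because eval() parses its argument only when it
// runs, the parse-time failure becomes a catchable exception:
let caught = null;
try {
  eval('2.a'); // parsed only when eval executes
} catch (e) {
  caught = e;
}
console.log(caught instanceof SyntaxError); // true
```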
The parsing process
The lexer is first configured with the special-character tables described above, and then lexical analysis starts scanning the input.
Each conditional branch matches the most specific rule first: special characters are matched before anything else, and whatever remains is finally treated as an identifier.
White space and newlines are merged, and so are operators, because consecutive operator characters can carry a meaning of their own. For example, += is an independent token; splitting it into + and = would lose the original information and affect subsequent parsing.
Strings must be fully enclosed between matching ' or " quotes, with no newline inside and the quotes paired symmetrically.
After lexical analysis, we end up with a flat stream of typed tokens, something like the following:
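A hedged sketch of such a lexer and the token stream it produces (the rule names and token shape are illustrative, not the article's exact code):

```javascript
// Minimal lexer sketch: try the most specific rule first — whitespace,
// numbers, strings, multi-char operators like '+=' — and fall back to
// identifiers last.
const RULES = [
  ['whitespace', /^\s+/],
  ['number', /^(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/],
  ['string', /^'[^'\n]*'|^"[^"\n]*"/],
  ['operator', /^(?:\+=|-=|===|==|=|\+|-|\*|\/)/], // '+=' before '+'
  ['punctuator', /^[(){}\[\];,]/],
  ['identifier', /^[A-Za-z_$][\w$]*/],
];

function tokenize(src) {
  const tokens = [];
  while (src.length) {
    const rule = RULES.find(([, re]) => re.test(src));
    if (!rule) throw new SyntaxError('Unexpected token: ' + src[0]);
    const value = rule[1].exec(src)[0];
    if (rule[0] !== 'whitespace') tokens.push({ type: rule[0], value });
    src = src.slice(value.length);
  }
  return tokens;
}

console.log(tokenize('total += 1.5'));
// [ { type: 'identifier', value: 'total' },
//   { type: 'operator', value: '+=' },
//   { type: 'number', value: '1.5' } ]
```

Note that listing `+=` before `+` in the operator rule is what keeps compound operators intact.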
Syntax analysis
This is the part we care about most: the tokens produced by lexical analysis are parsed and translated into a structured abstract syntax tree.
Through parsing, we can finally distinguish declarations, expressions, statements, functions, and so on.
Let’s analyze an import statement.
In the figure we can see all the states of ImportDeclaration: the import keyword may be followed by a namespace import, named imports, or a default import. If none of these states match, an error must be reported immediately.
The remaining statements are parsed in a similar way. On this basis, we parse out a simplified abstract syntax tree.
Since for now we only need to parse ImportDeclaration and require call expressions, all other syntax is ignored.
Source code walkthrough
When the parser encounters a statement that is an ImportDeclaration, parsing begins. startWalk is the starting point of the traversal, i.e. the entry of the child state machine; it moves into different successor state nodes depending on the conditions, and when the walk finishes it produces an ImportDeclaration object containing the key information.
importsList holds the imported names, nameSpaceImport is the alias for importing the whole module, and fromClause is the specifier of the dependent module, which can be a relative path, an absolute path, or a module in node_modules. With the fromClause we can move on to the next step: module parsing.
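A hypothetical sketch of the ImportDeclaration sub-state-machine (not the repo's exact code; for brevity, tokens are plain strings and the import keyword is already consumed):

```javascript
// Walk the tokens after 'import' to the matching state — default
// import, named imports, or namespace import — erroring otherwise.
function parseImport(tokens) {
  let pos = 0;
  const next = () => tokens[pos++];
  const node = {
    type: 'ImportDeclaration',
    defaultImport: null,
    nameSpaceImport: null,
    importsList: [],
    fromClause: null,
  };
  let tok = next();
  // default import: a bare identifier, optionally followed by ','
  if (tok !== '*' && tok !== '{' && tok !== 'from') {
    node.defaultImport = tok;
    tok = next();
    if (tok === ',') tok = next();
  }
  if (tok === '*') {                  // namespace import: * as ns
    if (next() !== 'as') throw new SyntaxError('Expected "as"');
    node.nameSpaceImport = next();
    tok = next();
  } else if (tok === '{') {           // named imports: { a, b }
    while ((tok = next()) !== '}') {
      if (tok !== ',') node.importsList.push(tok);
    }
    tok = next();
  }
  if (tok !== 'from') throw new SyntaxError('Expected "from"');
  node.fromClause = next();           // the module specifier
  return node;
}

const importAst = parseImport(['{', 'a', ',', 'b', '}', 'from', "'./mod'"]);
console.log(importAst.importsList, importAst.fromClause);
```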
Module parsing
The previous chapter dealt with parsing a single file. Bundling is modular and starts from one entry: with that entry as the root node, we discover the entire dependency tree. This chapter focuses on the process of building that tree.
Compiling the entry file gives us its abstract syntax tree, from which we can parse out the sub-dependencies; then we simply walk the tree, and the traversal order does not really matter.
Key code: the parseDependencyGraph method resolves the entire dependency tree starting from an input node, which defaults to the entry file. For each module node, it compiles the node's content and resolves the modules that node depends on. A dependency specifier may be a relative path, an absolute path, or a package in node_modules, and the file-path lookup works the same way Node resolves a JS file. We perform a depth-first walk, resolving each node's content and dependencies.
A Module consists of the module path, the text content, and the modules it depends on.
Circular dependencies
Resolved modules can be stored in a Map keyed by the absolute path produced by module resolution. If we encounter a module that has already been loaded, we skip it.
So far we have only implemented dependency resolution for JS files, so let's consider other file types. For parsing other kinds of files, we use plug-ins so that the behavior can be customized externally.
Plug-in for images and other file resources
These are treated directly as files and have no dependencies.
Plug-in for CSS style resources
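A hypothetical plugin interface covering both cases (the names are illustrative, not from the article's repo): each plugin declares which files it handles (`test`) and how to turn their content into a JS module (`transform`).

```javascript
const imagePlugin = {
  test: /\.(png|jpe?g|gif)$/,
  // Images have no dependencies; inline the file as a base64 data URL.
  transform: (buf, filePath) => ({
    deps: [],
    code: `module.exports = "data:image/${filePath.split('.').pop()};base64,${buf.toString('base64')}";`,
  }),
};

const cssPlugin = {
  test: /\.css$/,
  // CSS has no JS dependencies either; the generated module injects a
  // <style> tag into <head> when the module is first required.
  transform: (buf) => ({
    deps: [],
    code:
      "var style = document.createElement('style');\n" +
      `style.textContent = ${JSON.stringify(buf.toString())};\n` +
      'document.head.appendChild(style);',
  }),
};

// The module parser picks the first plugin whose test matches the path:
function applyPlugins(plugins, filePath, buf) {
  const plugin = plugins.find((p) => p.test.test(filePath));
  return plugin ? plugin.transform(buf, filePath) : null;
}

const out = applyPlugins([imagePlugin, cssPlugin], 'logo.png', Buffer.from('x'));
console.log(out.deps); // []
```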
Target code generation
After module parsing we have a dependency tree rooted at the bundle entry, but it contains only source-file information. We need to traverse the whole tree, process each node, and convert it into target code; this per-node translation is also part of the compilation process described above.
For JS we need to convert the AST back into a runnable JS string. This step is relatively easy: no state backtracking is needed, since all the states are already known.
The code below focuses only on the import/export part.
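A hedged sketch of the import part of code generation (hypothetical node shape, using the importsList / nameSpaceImport / fromClause fields described earlier): an ImportDeclaration node is printed back as a CMD-style require call that the bundler runtime can execute.

```javascript
// Turn an ImportDeclaration AST node into runnable target code.
function generateImport(node) {
  const source = JSON.stringify(node.fromClause);
  if (node.nameSpaceImport) {
    // import * as ns from './mod'  →  var ns = require("./mod");
    return `var ${node.nameSpaceImport} = require(${source});`;
  }
  // import { a, b } from './mod'  →  var { a, b } = require("./mod");
  const names = node.importsList.join(', ');
  return `var { ${names} } = require(${source});`;
}

console.log(generateImport({
  importsList: ['a', 'b'],
  nameSpaceImport: null,
  fromClause: './mod',
}));
// var { a, b } = require("./mod");
```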
For image files, we simply emit a base64 string.
For a CSS file, all the target code needs to do is create a style tag, inline the file content, and insert the tag into the head.
Bundling
The target code for the entire tree has now been generated. Next we iterate over the tree, wrap each module in a function, use the CMD-specification keywords as the function parameters, and store all modules in one big map: the key is the file path relative to the project, the value is the module's wrapper function.
Insert the generated map string into the runtime code below, write the result to the target file, and the bundle is done!
In the runtime template, ROOT_MOD_HOLDER is replaced with the map of target code for all modules.
ROOT_PATH_HOLDER is replaced with the key of the entry module.
At module runtime, each module is executed only once, and the result is cached on its exports object for later use.
Practical example
Here we run a demo containing styles and a variety of JS modules as an overall functional demonstration.
The code is linked at the end of the article; if you are interested, download it and run it yourself.
Packaged code
Conclusion
Through this article we have gained a deeper understanding of front-end build and bundling tools: the stages of bundling and what each stage needs to do, as well as what a modern bundler requires, such as handling non-JS files.
Compared with a bundler like webpack, we are still missing some very important features, for example more powerful plug-ins, chunk splitting, and a more extensible architecture. The plug-in and loader mechanisms are the root of webpack's power, and an entire ecosystem is built on them. We will look at these core webpack capabilities in a later article.
The end.
See: github.com/Zenser/tiny…