Preface
In March, I gave a talk on the front-end transpiling and bundling toolchain at the Deeplang community of the Huawei Programming Language Lab (a programming language community for college students that maintains an IoT programming language called Deeplang). The slides were put together in a bit of a hurry, so I have since improved the content and organized it into this article to share with you.
Here is the full text:
Compilation and transpilation
Definitions
Compilation is the conversion of one programming language into another, usually from a high-level language to a low-level language.
A high-level language is written as human-readable text and offers features such as conditionals, branches, loops, and object orientation. It describes logic without concerning itself with execution details. Examples include JavaScript and C++.
A low-level language directly operates on specific hardware such as registers and is concerned with execution details. It lacks many of the features of a high-level language and is generally much harder for people to read. Examples include machine language and assembly language.
Transpilation is a special case of compilation: it compiles from one high-level language to another high-level language, such as C++ to Java, TypeScript to JavaScript, JavaScript to JavaScript, or CSS to CSS.
Why the front end needs transpilers
The front end consists mainly of HTML, CSS, and JS:
HTML and CSS are parsed from source code into the DOM and CSSOM, which are then combined into a render tree that the rendering engine draws. Interpretation starts from the source code.
JS is a scripting language: at runtime the engine parses the source code into an AST, converts it into bytecode, and interprets that. It, too, starts from the source code.
Because the target artifact is itself source code, the front end naturally needs a variety of source-to-source transpilers.
Which transpilers the front end needs
Transpilation takes source code, changes it, and generates source code again: source to source. Which transpilers does the front end need?
JavaScript
- Newer language features from ES2015, ES2016, ES2017, and so on may not be supported by the target environment, so code that uses them must be transpiled into something the target environment supports. Babel and TypeScript do this.
- JavaScript is a dynamically typed language: types are not known at compile time, so they cannot be checked ahead of time. To solve this, you can add type syntax and semantics to JavaScript, but the type information has to be removed again when compiling. This also requires a transpiler, such as TypeScript or Flow.
- Some frameworks benefit from syntactic sugar. React's React.createElement, for example, is too cumbersome to write by hand, so during development you write an XML-like syntax and the transpiler compiles that sugar into the underlying API calls. This is JSX.
- Code needs to be minified and optimized (dead code removal and so on) at compile time before the target code is generated. Terser does this.
- Code needs to be checked against coding conventions at compile time so that violations are reported as errors. ESLint does this.
CSS
- CSS needs extra capabilities such as variables, functions, loops, and nesting to make it easier to manage. These come from domain-specific languages (DSLs) such as SCSS, Less, and Stylus, or from cssnext. They are transpiled into the target CSS by the SCSS, Less, and Stylus compilers and by PostCSS, respectively.
- CSS also needs vendor prefixes for compatibility (autoprefixer), convention checking (stylelint), CSS modules, and so on. All of these are built on the PostCSS transpiler.
HTML
- As with CSS, HTML needs extra capabilities such as inheritance, composition, variables, and loops. Template engines such as Pug and Mustache provide these, each with its own transpiler that converts the template source into target code at compile time (the conversion may also happen at runtime).
- Other content formats need to be converted into HTML, such as Markdown to HTML. This can be done through PostHTML.
In short, the front end needs many transpilers.
So how do all these transpilers work?
Principles of transpilers
The compilation process
These transpilers all work in a similar way: each has a parse, a transform, and a generate phase.
(The names may vary: PostCSS calls the generate phase stringify, and the Vue template compiler calls the transform phase optimize.)
Why are these three phases needed?
To convert code, the transpiler first has to understand it. A computer understands code by organizing the information in the source into a data structure, and that data structure is the abstract syntax tree (AST).
It is abstract because delimiters such as commas and parentheses are dropped. It is a tree because code is nested, and the parent-child relationships of a tree represent that nesting. This makes the abstract syntax tree the most suitable data structure for a computer to understand code.
Once the code is understood (the AST has been generated), it can be transformed in various ways. Terser performs optimizations such as dead code removal, Babel converts ESNext into JS the target supports, the TypeScript compiler type-checks the AST, and PostCSS plugins do all sorts of things with its AST. All of these analyze, add to, delete from, or modify the AST.
Although different transpilers do different things with the AST, their overall compilation process is the same: parse the source into an AST, transform the AST, then generate target code from it. This is the general principle behind transpilers.
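The three phases can be sketched on a toy expression language. This is only an illustration under stated assumptions: the node shapes and the functions `parse`, `transform`, and `generate` are made up for this example and do not belong to any of the tools discussed here. The transform step folds constant sub-expressions, a very small cousin of the optimizations mentioned above.

```javascript
// Toy source-to-source pipeline for expressions such as "a + 2 * (3 + 4)".
// All names and node shapes are hypothetical and exist only in this sketch.

// parse: source text -> AST. Parentheses steer the parser but leave no node behind.
function parse(source) {
  const tokens = source.match(/\d+|[A-Za-z_]\w*|[+*()]/g) || [];
  let pos = 0;

  function parsePrimary() {
    const token = tokens[pos++];
    if (token === "(") {
      const inner = parseSum();
      pos++; // skip ")"
      return inner;
    }
    return /^\d/.test(token)
      ? { type: "NumericLiteral", value: Number(token) }
      : { type: "Identifier", name: token };
  }

  function parseProduct() {
    let left = parsePrimary();
    while (tokens[pos] === "*") {
      pos++;
      left = { type: "BinaryExpression", operator: "*", left, right: parsePrimary() };
    }
    return left;
  }

  function parseSum() {
    let left = parseProduct();
    while (tokens[pos] === "+") {
      pos++;
      left = { type: "BinaryExpression", operator: "+", left, right: parseProduct() };
    }
    return left;
  }

  return parseSum();
}

// transform: AST -> AST. Fold operations whose operands are both constants.
function transform(node) {
  if (node.type !== "BinaryExpression") return node;
  const left = transform(node.left);
  const right = transform(node.right);
  if (left.type === "NumericLiteral" && right.type === "NumericLiteral") {
    const value = node.operator === "+" ? left.value + right.value : left.value * right.value;
    return { type: "NumericLiteral", value };
  }
  return { ...node, left, right };
}

// generate: AST -> target text.
function generate(node, parentOperator) {
  if (node.type === "NumericLiteral") return String(node.value);
  if (node.type === "Identifier") return node.name;
  const text = `${generate(node.left, node.operator)} ${node.operator} ${generate(node.right, node.operator)}`;
  // Parentheses are not stored in the AST; they are re-derived from the tree shape.
  return node.operator === "+" && parentOperator === "*" ? `(${text})` : text;
}

const output = generate(transform(parse("a + 2 * (3 + 4)")));
console.log(output); // a + 14
```

Note how the parentheses from the input never appear as nodes, which is the "abstract" part of the AST, and how `generate` has to put them back where the tree shape requires them.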
Sourcemap
One thing transpilers have in common is that they all produce a sourcemap: a mapping from positions in the generated code back to positions in the source code. Because every transpiler is source to source, there is always a sourcemap.
{
  version: 3,
  file: "out.js",
  sourceRoot: "",
  sources: ["foo.js", "bar.js"],
  names: ["src", "maps", "are", "fun"],
  mappings: "AAgBC,SAAQ,CAAEA"
}
The format of a sourcemap is shown above. Its fields mean the following:
- version: the sourcemap version. Currently 3.
- file: the name of the generated file.
- sourceRoot: the directory containing the source files. It is empty when they are in the same directory as the generated file.
- sources: the source files. This is an array because several source files may be merged into one generated file.
- names: all variable and property names from the source. They are collected here so that the mappings can refer to them by index, which keeps the file small.
- mappings: the mappings between positions in the source and in the generated code. Semicolons separate the lines of the generated file, and commas separate the individual mappings within a line.
For the details, I recommend Ruan Yifeng's article on the subject.
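To make the mappings field less opaque, here is a small sketch of a decoder for it. It assumes the Base64 VLQ encoding used by version 3 sourcemaps; the function names `decodeSegment` and `decodeMappings` are made up for this example. In that format, each decoded segment holds up to five numbers (generated column, index into sources, source line, source column, index into names), each stored relative to the previous segment.

```javascript
const BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode one comma-separated segment of `mappings` into its numeric fields.
function decodeSegment(segment) {
  const fields = [];
  let value = 0;
  let shift = 0;
  for (const char of segment) {
    const digit = BASE64.indexOf(char);
    value += (digit & 31) << shift; // the low 5 bits carry data
    if (digit & 32) {
      shift += 5; // continuation bit set: the number goes on in the next character
    } else {
      // Last character of this number: the lowest bit is the sign.
      fields.push(value & 1 ? -(value >> 1) : value >> 1);
      value = 0;
      shift = 0;
    }
  }
  return fields;
}

// Semicolons separate generated lines, commas separate segments within a line.
function decodeMappings(mappings) {
  return mappings
    .split(";")
    .map((line) => (line ? line.split(",").map(decodeSegment) : []));
}

const decoded = decodeMappings("AAgBC,SAAQ,CAAEA");
console.log(JSON.stringify(decoded)); // [[[0,0,16,1],[9,0,0,8],[1,0,0,2,0]]]
```

The decoded numbers are still deltas. A consumer adds them up segment by segment to recover absolute positions, which is what the browser does when it resolves a breakpoint.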
Where is sourcemap used?
We use Sourcemap for two main purposes:
Locating source code while debugging
Browsers such as Chrome and Firefox support a comment at the end of a file that points to its sourcemap:
//# sourceMappingURL=http://example.com/path/to/your/sourcemap.map
The sourcemap can be referenced by URL or inlined as Base64. The browser parses it automatically and associates the generated code with the source code, so breakpoints, error stacks, and so on point to the corresponding source.
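The inline form can be sketched as follows. This is a minimal example that assumes a Node.js environment (it relies on the global `Buffer`); the helper name `inlineSourceMapComment` is hypothetical.

```javascript
// Build the trailing comment that inlines a sourcemap as a Base64 data URL.
function inlineSourceMapComment(map) {
  const base64 = Buffer.from(JSON.stringify(map), "utf8").toString("base64");
  return `//# sourceMappingURL=data:application/json;charset=utf-8;base64,${base64}`;
}

const map = {
  version: 3,
  file: "out.js",
  sourceRoot: "",
  sources: ["foo.js", "bar.js"],
  names: ["src", "maps", "are", "fun"],
  mappings: "AAgBC,SAAQ,CAAEA",
};

// A transpiler would append this line to the end of the generated file.
const comment = inlineSourceMapComment(map);
console.log(comment);
```

Inlining keeps the generated file self-contained, at the cost of making it larger, which is why it is mostly seen in development builds.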
Mapping production errors back to the source code
Sourcemaps are used for debugging in development but are not shipped to production; publishing them to production would be a serious incident. Yet when an error is reported in production, you still need to trace it back to the source code. For this, the sourcemap is usually uploaded separately to the error-collection platform.
For example, Sentry provides a webpack plugin that automatically uploads sourcemaps to the Sentry backend after bundling and then deletes the local copies. It also provides sentry-cli for uploading them manually.
Of course, it is not just Sentry; similar analysis platforms, such as Byte's Dynatrace, also support uploading sourcemaps.
So sourcemaps serve at least two scenarios: debugging source code during development and locating the source of errors in production.
The principle of sourcemap
Now that we know what sourcemaps are for, how are they generated?
The generation logic is handled by source-map, a package provided by Mozilla. We only need to supply each mapping: the line and column in the source code and the line and column in the generated code.
When the source code is parsed into an AST, each node keeps its position in the source (line, column).
Transforming the AST does not change these positions.
When the target code is generated, a new position (line, column) is computed for each node as it is printed.
Pairing the two positions yields one mapping, and collecting the mappings for all AST nodes yields the complete sourcemap.
This is how Sourcemap is generated.
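The steps above can be sketched without any library. In this illustration the AST is written by hand and `createPrinter` and `print` are hypothetical helpers; a real transpiler would get the `loc` values from its parser and hand the collected mappings to a package such as source-map for encoding.

```javascript
// Hand-written AST for the source line `const answer = 42;` (line 1, 0-based columns).
// In a real transpiler the parser attaches these `loc` objects.
const ast = {
  type: "VariableDeclaration",
  loc: { line: 1, column: 0 },
  id: { type: "Identifier", name: "answer", loc: { line: 1, column: 6 } },
  init: { type: "NumericLiteral", value: 42, loc: { line: 1, column: 15 } },
};

// The printer tracks the current output position while it writes text.
function createPrinter() {
  let code = "";
  let line = 1;
  let column = 0;
  const mappings = [];
  return {
    // Pair the node's source position with the position where its output starts.
    mark(node) {
      mappings.push({ generated: { line, column }, original: { ...node.loc } });
    },
    write(text) {
      for (const char of text) {
        if (char === "\n") {
          line++;
          column = 0;
        } else {
          column++;
        }
      }
      code += text;
    },
    result() {
      return { code, mappings };
    },
  };
}

// generate: print each node and record one mapping per node.
function print(node, printer) {
  printer.mark(node);
  switch (node.type) {
    case "VariableDeclaration":
      printer.write("var ");
      print(node.id, printer);
      printer.write("=");
      print(node.init, printer);
      printer.write(";");
      break;
    case "Identifier":
      printer.write(node.name);
      break;
    case "NumericLiteral":
      printer.write(String(node.value));
      break;
  }
}

const printer = createPrinter();
print(ast, printer);
const { code, mappings } = printer.result();
console.log(code); // var answer=42;
console.log(mappings[2]); // the literal: generated column 11, original column 15
```

Each recorded object pairs a generated position with an original one, which is the information a sourcemap generator needs, together with the source file name, to emit the encoded mappings string.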
Front-end transpilers
Having covered the general principles of transpilers and of sourcemaps, let's look at the specific transpilers.
babel
Babel is a transpiler that converts ESNext, TypeScript, Flow, JSX, and so on into JS that the target environment supports, and injects polyfills for the APIs the environment is missing.
Its compilation process follows the standard parse, transform, and generate steps.
It can be used through both an API and the command line.
The Babel API
Babel 7 contains these packages:
- @babel/parser: parses source code into an AST; TypeScript, JSX, Flow, and other syntaxes are enabled through plugins
- @babel/traverse: traverses the AST and calls visitor functions
- @babel/generator: prints the AST as target code and generates the sourcemap
- @babel/types: creates AST nodes and checks their types
- @babel/template: creates AST nodes in bulk from a code template
- @babel/core: runs the complete source-to-target pipeline and applies Babel's transform plugins
With the APIs of these packages, you can implement all kinds of JS code transformations.
Babel API demo
Let's use these APIs to implement a feature that inserts some extra arguments into calls to console.log and console.error.
The idea is to insert the extra arguments whenever a CallExpression node whose callee is console.* is encountered.
Let's implement it in code:
Running it produces the following result:
You can see that the arguments are inserted into console.log and console.error, and that a sourcemap is generated.
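As a rough, dependency-free sketch of the same idea: the code below does not use Babel's API. The AST is written by hand in an ESTree-like shape, and the `traverse` helper and the inserted "[inserted]" argument are assumptions made for illustration. With Babel, parsing, traversal, and printing would be handled by the packages listed above.

```javascript
// Hand-written, ESTree-like AST for:  console.log(1); foo(2); console.error(3);
const call = (callee, value) => ({
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee,
    arguments: [{ type: "NumericLiteral", value }],
  },
});
const member = (object, property) => ({
  type: "MemberExpression",
  object: { type: "Identifier", name: object },
  property: { type: "Identifier", name: property },
});
const program = {
  type: "Program",
  body: [
    call(member("console", "log"), 1),
    call({ type: "Identifier", name: "foo" }, 2),
    call(member("console", "error"), 3),
  ],
};

// Minimal depth-first traversal that calls visitors[node.type] for every node.
function traverse(node, visitors) {
  if (Array.isArray(node)) {
    node.forEach((child) => traverse(child, visitors));
    return;
  }
  if (node === null || typeof node !== "object" || typeof node.type !== "string") return;
  if (visitors[node.type]) visitors[node.type](node);
  Object.values(node).forEach((child) => traverse(child, visitors));
}

const targets = ["log", "error"];
traverse(program, {
  // The visitor only declares what to do for one node type.
  CallExpression(node) {
    const { callee } = node;
    const isConsoleCall =
      callee.type === "MemberExpression" &&
      callee.object.name === "console" &&
      targets.includes(callee.property.name);
    if (isConsoleCall) {
      // Hypothetical extra argument; a real plugin might insert file and line info.
      node.arguments.unshift({ type: "StringLiteral", value: "[inserted]" });
    }
  },
});

const argumentCounts = program.body.map((statement) => statement.expression.arguments.length);
console.log(argumentCounts); // [ 2, 1, 2 ]
```

Only the two console calls gain an argument; the call to foo is left untouched.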
For more information on the principles and examples of Babel, check out my forthcoming Babel Plugin Tutorial.
typescript compiler
TypeScript extends JavaScript with the syntax and semantics of types. It first performs type inference, then checks the AST against those types so that errors are caught at compile time, and finally generates the target code.
The TypeScript compiler is divided into five parts:
- Scanner: generates tokens from the source code (lexical analysis)
- Parser: generates the AST from the tokens
- Binder: generates symbols from the AST (semantic analysis: builds scopes for reference resolution, that is, checking whether a referenced variable has been declared)
- Checker: performs type checking (semantic analysis: type checking)
- Emitter: generates the final JS files (target code generation)
These parts also map onto the three broad phases of parse, transform, and generate. (Babel performs scope analysis too, similar to tsc's Binder.)
Scopes are usually built during traversal, so semantic analysis belongs to the transform phase. The AST transformation step is not shown in the diagram, but it does exist.
So the TypeScript compiler's process also corresponds to the three phases of a transpiler; it just does additional semantic analysis (type checking).
The typescript compiler API
The TypeScript compiler API is unstable and not even properly documented, but it is exposed by the typescript package and can be used.
Let's take a look at the TypeScript APIs for parse, transform, and generate:
parse
In TS, type information often has to be gathered from multiple files, so you first create a Program and then get the AST for a given file path from it, that is, a SourceFile object. (This differs from Babel, which parses source code directly into an AST; here it takes two steps.)
transform
The AST is traversed with ts.visitEachChild, nodes are created with ts.createXxx and replaced with ts.updateXxx, and node types are checked with ts.SyntaxKind.
These correspond to the @babel/traverse and @babel/types APIs. You will find that once you have learned one transpiler, the others are much the same.
generate
The AST is printed as target code by a printer.
tsc vs babel
Since Babel 7, @babel/parser can parse TypeScript syntax. So should we compile TS with Babel or with the official TypeScript compiler?
I think it is better to compile TS with Babel and run tsc --noEmit separately for type checking.
There are several reasons for this:
- Babel can compile almost all TS syntax, and the few unsupported cases can be worked around.
- Babel transforms syntax on demand and injects polyfills on demand based on the targets configuration, so the generated code is smaller. TypeScript only has a coarse-grained target setting such as ES5 or ES3 and cannot convert on demand; polyfills also have to be imported in full at the entry point, so the generated code is larger.
- Babel has a rich plugin ecosystem, whereas few people even know about TypeScript transform plugins, let alone an ecosystem of them.
- Babel compiles TS code without type checking, so it is faster. When you want type checking, run tsc --noEmit separately.
In conclusion, using tsc for type checking and Babel for code transformation is the better choice.
eslint
ESLint checks code against the configured rules, and violations of some rules can be fixed automatically.
It does so by checking the AST against the rules the user has configured.
It also offers both an API and a command line; the API is what you use when developing a toolchain.
ESLint plugin demo
Let's write an ESLint rule that reports an error when console.time is detected and removes it automatically with --fix.
The idea is similar to a Babel plugin: both use the visitor pattern, declaring what to do for each type of AST node, only in a slightly different form.
The meta section declares metadata, such as what is shown in the documentation, whether the rule is fixable, and which error messages it reports.
The create function returns the visitor. Inside it you can call APIs such as context.report to report an error; if a fix method is specified, the problem is fixed automatically when the user passes --fix. Here, for example, fixer.remove removes the AST node.
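A rule along these lines might look like the sketch below. It is not the original listing: the rule name `noConsoleTime`, the message text, and the choice to offer a fix only when the call is a standalone statement are assumptions of this example. The overall shape (`meta`, `create`, `context.report`, `fixer.remove`) follows ESLint's rule format.

```javascript
// Rule object in the shape ESLint expects; a plugin would export it under a rule name.
const noConsoleTime = {
  meta: {
    type: "suggestion",
    docs: { description: "disallow console.time" },
    fixable: "code", // needed for rules that provide a fix
    messages: { forbidden: "console.time is not allowed." },
    schema: [],
  },
  create(context) {
    // The returned object is the visitor: node type -> handler.
    return {
      CallExpression(node) {
        const { callee } = node;
        const isConsoleTime =
          callee.type === "MemberExpression" &&
          callee.object.type === "Identifier" &&
          callee.object.name === "console" &&
          callee.property.type === "Identifier" &&
          callee.property.name === "time";
        if (!isConsoleTime) return;

        const problem = { node, messageId: "forbidden" };
        // Only offer a fix when the call is a whole statement,
        // so that removing it leaves valid code behind.
        if (node.parent && node.parent.type === "ExpressionStatement") {
          problem.fix = (fixer) => fixer.remove(node.parent);
        }
        context.report(problem);
      },
    };
  },
};
```

Because the rule is a plain object, it can be exercised without running ESLint by passing a stub `context` whose `report` function records what the rule reports.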
terser
Terser is a transpiler for compile-time optimization: it minifies JS code, mangles names, and removes dead code. It is practically a must-have in the front-end toolchain.
This job was originally done by UglifyJS, but because it could not parse and optimize ES6+ code, Terser was created.
Terser supports a wide range of compress and mangle options, which are described in detail in its documentation.
It can also be used through both an API and the command line. The API works as follows:
swc
SWC is a JS transpiler written in Rust, and its main selling point is speed.
A parser written in JavaScript is comparatively slow. This is a disadvantage of an interpreted language: at runtime the source has to be parsed and then interpreted, which is slower than a compiled language.
Its goal is to replace Babel; we will have to see how that plays out.
postcss
PostCSS is a CSS transpiler. Like Babel, it supports plugins, and its plugin ecosystem is thriving.
It provides the process, walk, and stringify APIs, which correspond to parse, transform, and generate respectively.
For example, here is code that extracts all dependencies (url(), @import) from CSS:
PostCSS plugin
PostCSS, like Babel, has a strong plugin ecosystem.
A plugin takes the following form:
Plugins likewise add to, delete from, and modify the AST, but with one difference: instead of the visitor pattern used by ESLint and Babel plugins, you walk the AST yourself, find the target nodes, and then transform them. (The same is true of the TypeScript compiler API, for example ts.forEachChild.)
We can conclude that transpilers operate on the AST in two ways: the visitor pattern and manual lookup.
- The visitor pattern is used by Babel and ESLint
- Manual lookup is used by the TypeScript compiler, PostCSS, and PostHTML
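To illustrate the manual lookup style, here is a dependency-free sketch that collects url() and @import dependencies from a CSS AST. The tree is written by hand; its node shapes imitate PostCSS (root, atrule, rule, decl), but they, along with the `walk` and `collectDependencies` helpers, are assumptions of this example and not PostCSS's actual API. For brevity, only the quoted-string form of @import is handled.

```javascript
// Hand-written AST for:
//   @import "base.css";
//   .logo { background: url(logo.png); color: red; }
const root = {
  type: "root",
  nodes: [
    { type: "atrule", name: "import", params: '"base.css"' },
    {
      type: "rule",
      selector: ".logo",
      nodes: [
        { type: "decl", prop: "background", value: "url(logo.png)" },
        { type: "decl", prop: "color", value: "red" },
      ],
    },
  ],
};

// Manual lookup: visit every node ourselves and let the callback pick what it needs.
function walk(node, callback) {
  callback(node);
  (node.nodes || []).forEach((child) => walk(child, callback));
}

const unquote = (text) => text.trim().replace(/^["']|["']$/g, "");

function collectDependencies(cssRoot) {
  const dependencies = [];
  walk(cssRoot, (node) => {
    // There is no per-type handler table here; the checks are written inline.
    if (node.type === "atrule" && node.name === "import") {
      dependencies.push(unquote(node.params));
    }
    if (node.type === "decl") {
      for (const match of node.value.matchAll(/url\(\s*["']?([^"')]+)["']?\s*\)/g)) {
        dependencies.push(match[1].trim());
      }
    }
  });
  return dependencies;
}

const dependencies = collectDependencies(root);
console.log(dependencies); // [ 'base.css', 'logo.png' ]
```

Compared with the visitor pattern, the traversal and the filtering by node type are both in the caller's hands.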
posthtml
PostHTML, as the name suggests, is a transpiler for HTML, and it supports plugins.
Here is an example of a PostHTML plugin:
The traversal is a manual lookup, similar to PostCSS.
prettier
Prettier is a transpiler used to format code. Unlike the other transpilers, Prettier's work in the transform phase is to support friendlier formatting, such as wrapping lines that are too long.
Its functionality overlaps with ESLint and stylelint, so the lint tools' formatting rules are usually disabled, leaving them to check only for errors. This is done with configs such as eslint-config-prettier and stylelint-config-prettier.
Prettier is usually run from the command line, but it also has an API for toolchain development.
It can format not only JS and CSS but many other kinds of code as well.
Using transpilers in a project
We have described a series of transpilers, each with a different job. How are they used in a project?
There are three ways to use transpilers in a project:
- IDE plugins. Code is linted, type-checked, and formatted as it is written, for example with the common ESLint VS Code plugin or the TypeScript VS Code plugin (this one is built in).
- Git hooks. Execution is triggered by a Git commit hook set up through husky. Prettier, for example, only needs to format code when it is committed.
- Invocation by a bundler. A transpiler works on a single file, whereas a bundler works on many files. During bundling, each file is processed by the corresponding transpiler, for example through webpack loaders.
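The division of labor in the last item can be sketched as follows. Everything here is hypothetical: the two "loaders" are trivial stand-ins for real transpilers, and the `rules` table only imitates the general idea of matching files to loaders. The sketch applies loaders left to right, whereas webpack applies its loaders in the reverse order.

```javascript
// Each "loader" is a function from source text to source text, for a single file.
const stripTypeAnnotations = (source) =>
  source.replace(/:\s*(number|string|boolean)\b/g, "");
const removeBlankLines = (source) =>
  source.split("\n").filter((line) => line.trim()).join("\n");

// Hypothetical rule table: which loaders handle which file extension.
const rules = [
  { test: /\.ts$/, use: [stripTypeAnnotations, removeBlankLines] },
  { test: /\.js$/, use: [removeBlankLines] },
];

// The bundler side: go over many files and hand each one to the matching loaders.
function transformFiles(files) {
  const output = {};
  for (const [path, source] of Object.entries(files)) {
    const rule = rules.find((candidate) => candidate.test.test(path));
    output[path] = rule
      ? rule.use.reduce((code, loader) => loader(code), source)
      : source;
  }
  return output;
}

const result = transformFiles({
  "add.ts": "const add = (a: number, b: number) => a + b;\n\n",
  "main.js": "console.log(1);\n\nconsole.log(2);\n",
});
console.log(result["add.ts"]); // const add = (a, b) => a + b;
```

The loaders know nothing about the project as a whole; deciding which files exist and which loader applies to each is the bundler's job.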
Conclusion
We first clarified the difference between compilation and transpilation, and explored why the front end needs transpilers and which ones it needs.
Then we learned the general process of a transpiler (parse, transform, generate), what sourcemaps are used for, and how they are generated.
After that, we looked at specific transpilers: Babel, TypeScript, ESLint, Terser, SWC, PostCSS, PostHTML, and Prettier.
In the end, we summarized three ways to use transpilers in a project: IDE plugins, Git hooks, and bundler loaders.
Hopefully, this article has given you a comprehensive understanding of transpilers.
(This is the first half of the talk on the front-end transpiling and bundling toolchain that I gave at Huawei. There was too much content for one article, so it is split into two. The next article, the second half, will cover modularization, bundlers, interpreters, and the closed loop of front-end engineering.)