A Preliminary Study of the Abstract Syntax Tree (AST)

Definition

An abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. (Wikipedia)

In other words, the AST is a tree-shaped, abstract representation of the syntactic structure of source code. Each node corresponds to a construct in the source code, such as a statement or an expression. It is a platform-independent description of the program.

How to generate

The code we write is essentially structured text: instructions in a high-level programming language designed to be read and understood by people. A processor, however, can only execute binary machine code, not source text. The compiler is the bridge between the two: its job is to translate source code into semantically equivalent target code. Machine code is a typical final output of compilation, while the AST is a common intermediate result.

The process

In the process of converting source code into AST, there are three steps: lexical analysis, syntax analysis, and semantic analysis.

Lexical analysis

Lexical analysis, also called tokenization, converts a sequence of characters into a sequence of tokens. The work is done by a lexical analyzer (lexer, or scanner), which reads the characters one by one, splits the code text into an array of tokens according to the rules and keywords of the programming language, and assigns each token a type. An everyday example:

I walked home happily ==> I, walked, home, happily

In JavaScript, the following code is split into the token sequence shown after it:

const num = 1;
console.log(num);

[
    {type: 'keyword', value: 'const'},
    {type: 'whitespace', value: ' '},
    {type: 'identifier', value: 'num'},
    {type: 'whitespace', value: ' '},
    {type: 'operator', value: '='},
    {type: 'whitespace', value: ' '},
    {type: 'num', value: '1'},
    ...
]

Tokens generally fall into the following types:

keyword
identifier
operator
literal
punctuation
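
As a rough illustration of lexical analysis, here is a minimal tokenizer sketch. The rule table, the keyword set, and the `tokenize` function are assumptions made up for this example, not the lexer of any real JavaScript engine. Unlike the token list above, it drops whitespace tokens and labels numbers and strings as literal.

```javascript
// Hypothetical keyword set; a real lexer knows the full list of reserved words.
const KEYWORDS = new Set(['const', 'let', 'var', 'if', 'else', 'function', 'return']);

// Hypothetical rule table; the first rule that matches the remaining text wins.
const RULES = [
  { type: 'whitespace', regex: /^\s+/ },
  { type: 'number', regex: /^\d+(\.\d+)?/ },
  { type: 'string', regex: /^'[^']*'|^"[^"]*"/ },
  { type: 'word', regex: /^[A-Za-z_$][\w$]*/ },
  { type: 'operator', regex: /^(===|==|=|\+|-|\*|\/)/ },
  { type: 'punctuation', regex: /^[;,.(){}]/ },
];

function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    const rule = RULES.find((r) => r.regex.test(rest));
    if (!rule) throw new SyntaxError(`Unexpected character: ${rest[0]}`);
    const value = rest.match(rule.regex)[0];
    rest = rest.slice(value.length);
    if (rule.type === 'whitespace') continue; // whitespace is dropped in this sketch
    let type = rule.type;
    if (type === 'word') type = KEYWORDS.has(value) ? 'keyword' : 'identifier';
    if (type === 'number' || type === 'string') type = 'literal';
    tokens.push({ type, value });
  }
  return tokens;
}

const tokens = tokenize('const num = 1;\nconsole.log(num);');
console.log(tokens);
```

The first token comes out as `{ type: 'keyword', value: 'const' }`, followed by the identifier `num`, the operator `=`, and so on.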

Syntax analysis

Syntax analysis (also known as parsing) is the process of analyzing an input text made up of a sequence of words (for example, a sequence of English words) and determining its grammatical structure according to a given formal grammar.

An everyday example:

I, happily, walked, home ==> I (subject), happily (adverbial), walked (predicate), home (object): fits the subject-predicate-object sentence structure ✅

Statements in a programming language follow fixed syntax in the same way: a conditional statement consists of the if keyword, a condition, and a block to execute; a declaration statement consists of a declaration keyword, an identifier, and so on.

If the code has a syntax problem, a SyntaxError is thrown at this step and the compilation process is interrupted.
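
One way to observe this in JavaScript is to hand source text to `new Function`, which compiles the body without running it. The `parseError` helper below is a hypothetical name used only for this sketch.

```javascript
// Returns the error raised while parsing `source`, or null if it parses.
// `new Function(...)` compiles the body but does not execute it.
function parseError(source) {
  try {
    new Function(source);
    return null;
  } catch (err) {
    return err;
  }
}

const bad = parseError('const = 1;');      // the identifier is missing
const good = parseError('const num = 1;'); // well-formed declaration

console.log(bad instanceof SyntaxError); // true
console.log(good);                       // null
```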

Once the syntax has been verified, the compiler combines the flat token list into declaration nodes and expression nodes according to the grammar rules, and finally forms a syntax tree with a nested structure.

In the resulting syntax tree, many redundant tokens, such as = and ;, have been removed. This is where the "abstract" in AST comes from: many details of how the code was written are hidden, and node types alone convey the meaning of the corresponding code.
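
To make the step from tokens to tree concrete, here is a sketch of a parser that handles only a single statement of the form `const <identifier> = <number>;`. The function name and token shapes are assumptions for illustration; the node shapes loosely follow the ESTree convention. Note how `=` and `;` are checked and then discarded.

```javascript
// Hypothetical parser for exactly one statement: const <id> = <number>;
function parseVariableDeclaration(tokens) {
  let pos = 0;

  // Consume the next token, checking its type and (optionally) its value.
  function expect(type, value) {
    const token = tokens[pos];
    const wanted = value !== undefined ? value : type;
    if (!token || token.type !== type || (value !== undefined && token.value !== value)) {
      throw new SyntaxError(`Expected ${wanted} at token ${pos}`);
    }
    pos += 1;
    return token;
  }

  const kind = expect('keyword', 'const').value;
  const name = expect('identifier').value;
  expect('operator', '=');     // verified, then dropped from the tree
  const raw = expect('literal').value;
  expect('punctuation', ';');  // verified, then dropped from the tree

  return {
    type: 'VariableDeclaration',
    kind,
    declarations: [
      {
        type: 'VariableDeclarator',
        id: { type: 'Identifier', name },
        init: { type: 'Literal', value: Number(raw) },
      },
    ],
  };
}

// Token list for `const num = 1;` with whitespace already removed.
const ast = parseVariableDeclaration([
  { type: 'keyword', value: 'const' },
  { type: 'identifier', value: 'num' },
  { type: 'operator', value: '=' },
  { type: 'literal', value: '1' },
  { type: 'punctuation', value: ';' },
]);
console.log(JSON.stringify(ast, null, 2));
```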

Semantic analysis

Semantic analysis checks a syntactically correct program against its surrounding context to verify that its meaning is consistent. An everyday example:

// In a normal context
I happily walk home ✅
I happily roll home ❌

For programming languages, semantic analysis generally includes type checking, scope analysis, and so on. Most statically typed languages, such as Java and C, perform these checks. JavaScript, as a dynamically typed, interpreted language, performs no static type checking, so a complete AST is generally available once parsing finishes.
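
As a sketch of what scope analysis can look like, the function below walks a tiny ESTree-like AST and reports identifiers that are read but never declared. It is the kind of check a linter might run. The `findUndeclared` name, the single flat scope, and the small set of supported node types are all simplifying assumptions.

```javascript
// Hypothetical scope check: one flat scope, a handful of node types.
function findUndeclared(program, globals = ['console']) {
  const declared = new Set(globals);
  const problems = [];

  function visit(node) {
    switch (node.type) {
      case 'Program':
        node.body.forEach(visit);
        break;
      case 'VariableDeclaration':
        node.declarations.forEach((d) => {
          if (d.init) visit(d.init); // the initializer is checked before the name exists
          declared.add(d.id.name);
        });
        break;
      case 'ExpressionStatement':
        visit(node.expression);
        break;
      case 'CallExpression':
        visit(node.callee);
        node.arguments.forEach(visit);
        break;
      case 'MemberExpression':
        visit(node.object); // `log` in console.log is a property, not a variable
        break;
      case 'Identifier':
        if (!declared.has(node.name)) problems.push(node.name);
        break;
      default:
        break; // literals and other nodes need no check here
    }
  }

  visit(program);
  return problems;
}

// Hand-written AST for: const num = 1; console.log(num, total);
const program = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'num' },
          init: { type: 'Literal', value: 1 },
        },
      ],
    },
    {
      type: 'ExpressionStatement',
      expression: {
        type: 'CallExpression',
        callee: {
          type: 'MemberExpression',
          object: { type: 'Identifier', name: 'console' },
          property: { type: 'Identifier', name: 'log' },
        },
        arguments: [
          { type: 'Identifier', name: 'num' },
          { type: 'Identifier', name: 'total' },
        ],
      },
    },
  ],
};

const undeclared = findUndeclared(program);
console.log(undeclared); // [ 'total' ]
```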

AST Detailed Display

Astexplorer.net/#/gist/38a8…

Practical applications

Application of AST in Babel

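The Babel workflow is commonly summarized in three stages: parse (source code to AST), transform (AST to modified AST), and generate (AST back to code).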
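
Babel's pipeline is usually described as parse, transform, and generate. The sketch below imitates the last two stages on a hand-written AST without using Babel itself: it rewrites a `const` declaration to `var` and prints the result back as code. The `transform` and `generate` functions are hypothetical and cover only the node types in the example; they are not Babel's API.

```javascript
// Transform stage: rewrite every top-level declaration to use `var` (in place).
function transform(node) {
  if (node.type === 'Program') node.body.forEach(transform);
  if (node.type === 'VariableDeclaration') node.kind = 'var';
  return node;
}

// Generate stage: print the small node set used here back to source text.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generate).join('\n');
    case 'VariableDeclaration':
      return `${node.kind} ${node.declarations.map(generate).join(', ')};`;
    case 'VariableDeclarator':
      return `${generate(node.id)} = ${generate(node.init)}`;
    case 'Identifier':
      return node.name;
    case 'Literal':
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Hand-written AST for `const num = 1;` (the parse stage is skipped here).
const inputAst = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator',
          id: { type: 'Identifier', name: 'num' },
          init: { type: 'Literal', value: 1 },
        },
      ],
    },
  ],
};

const output = generate(transform(inputAst));
console.log(output); // var num = 1;
```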

Webpack DCE

A common Webpack scenario is building for a specific environment. DefinePlugin injects environment variables: at compile time, each matching variable token is replaced with the injected value, and dead code elimination (DCE) is then performed on the AST, so redundant development-only code is removed and the output is simplified.

// Original index.js
if (process.env.NODE_ENV === 'production') {
  // Production code
  console.log('Welcome to production');
}
if (process.env.DEBUG) {
  // Develop debug code
  console.log('Debugging output');
}
console.log('running');

// webpack.config.js
new webpack.DefinePlugin({
  'process.env.NODE_ENV': JSON.stringify(process.env.NODE_ENV),
  'process.env.DEBUG': false,
});

// After environment variable injection
if ('production' === 'production') {
  // Production code
  console.log('Welcome to production');
}
if (false) {
  // Develop debug code
  console.log('Debugging output');
}
console.log('running');

// After DCE processing
console.log('Welcome to production');

console.log('running');
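
The two steps in the example above (evaluate a constant condition, then drop the branch that can never run) can be sketched on an ESTree-like AST as follows. The `evaluate`, `eliminateDeadCode`, and `log` helpers are hypothetical and are not Webpack's implementation. The sketch also assumes every branch is a block statement and ignores block scoping when it inlines a branch.

```javascript
// Try to evaluate a test expression at build time.
// Handles literals and `literal === literal` only.
function evaluate(node) {
  if (node.type === 'Literal') return { known: true, value: node.value };
  if (node.type === 'BinaryExpression' && node.operator === '===') {
    const left = evaluate(node.left);
    const right = evaluate(node.right);
    if (left.known && right.known) return { known: true, value: left.value === right.value };
  }
  return { known: false };
}

// Replace each `if` whose test is statically known with the branch that runs.
function eliminateDeadCode(body) {
  return body.flatMap((node) => {
    if (node.type !== 'IfStatement') return [node];
    const test = evaluate(node.test);
    if (!test.known) return [node];
    if (test.value) return eliminateDeadCode(node.consequent.body);
    return node.alternate ? eliminateDeadCode(node.alternate.body) : [];
  });
}

// Helper that builds the AST for console.log('<message>').
const log = (message) => ({
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'console' },
      property: { type: 'Identifier', name: 'log' },
    },
    arguments: [{ type: 'Literal', value: message }],
  },
});
const block = (...statements) => ({ type: 'BlockStatement', body: statements });

// AST of the code after environment variable injection.
const injected = [
  {
    type: 'IfStatement',
    test: {
      type: 'BinaryExpression',
      operator: '===',
      left: { type: 'Literal', value: 'production' },
      right: { type: 'Literal', value: 'production' },
    },
    consequent: block(log('Welcome to production')),
    alternate: null,
  },
  {
    type: 'IfStatement',
    test: { type: 'Literal', value: false },
    consequent: block(log('Debugging output')),
    alternate: null,
  },
  log('running'),
];

const optimized = eliminateDeadCode(injected);
const messages = optimized.map((s) => s.expression.arguments[0].value);
console.log(messages);
```

Only the statements that can actually run survive, which mirrors the "After DCE processing" result above.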

Tree shaking likewise removes redundant code, but at a different granularity. DCE mostly removes code at the logic level (for example, branches that can never run), whereas tree shaking removes redundant code across module references. Tree shaking relies on the static import syntax of ES6 modules to analyze module dependencies and eliminate redundant modules.
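
Because ES6 imports and exports are static, a bundler can build a table of who imports what without running any code. The sketch below works on such a table and lists exports that nothing reachable from the entry imports. The module names, the graph shape, and `findUnusedExports` are made up for illustration, and the analysis is deliberately coarse: every import of a reachable module counts as used.

```javascript
// Hypothetical module graph: each module lists its exports and its imports.
const modules = {
  'index.js': { exports: [], imports: { 'math.js': ['add'] } },
  'math.js': { exports: ['add', 'subtract'], imports: {} },
  'unused.js': { exports: ['noop'], imports: {} },
};

function findUnusedExports(graph, entry) {
  const used = new Map(); // module name -> Set of export names that are imported
  const seen = new Set();
  const queue = [entry];

  // Walk the import graph starting from the entry module.
  while (queue.length > 0) {
    const name = queue.shift();
    if (seen.has(name)) continue;
    seen.add(name);
    for (const [dep, names] of Object.entries(graph[name].imports)) {
      if (!used.has(dep)) used.set(dep, new Set());
      names.forEach((n) => used.get(dep).add(n));
      queue.push(dep);
    }
  }

  // Anything exported but never imported is a candidate for removal.
  const unused = {};
  for (const [name, mod] of Object.entries(graph)) {
    const usedNames = used.get(name) || new Set();
    const dead = mod.exports.filter((e) => !usedNames.has(e));
    if (dead.length > 0) unused[name] = dead;
  }
  return unused;
}

const unused = findUnusedExports(modules, 'index.js');
console.log(unused);
```

Here `subtract` in math.js and everything in unused.js are reported, since index.js only imports `add`.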

H5 and mini programs

In recent years, as WeChat mini programs became popular, each platform launched its own mini program technology. Because there is no common standard, each platform's mini programs implement their own set of DSLs, which has become the "browser compatibility problem" of the new era. A closer look at the syntax and technical characteristics of each platform shows, however, that they all follow a template + business logic model that is very close to Vue's template language. WeChat mini program:

<!-- Data binding -->
<view> {{ message }} </view>
<!-- Conditional rendering -->
<view wx:if="{{condition}}"> nice day! </view>
<!-- List rendering -->
<view wx:for="{{[0, 1, 2, 3, 4]}}"> {{item}} </view>

Page({
  data: {
    condition: true,
    message: 'hello world',
  },
})

H5 (Vue 2.0):

<!-- Data binding -->
<div> {{ message }} </div>
<!-- Conditional rendering -->
<div v-if="condition"> nice day! </div>
<!-- List rendering -->
<ul>
  <li v-for="item in items" :key="item">
    {{ item }}
  </li>
</ul>

var example1 = new Vue({
  el: '#example-1',
  data: {
    items: [0, 1, 2, 3, 4],
    condition: true,
    message: 'hello world',
  },
})

Against this background, various cross-platform mini program frameworks have emerged. One category is the compile-time framework, such as Taro 2.0, which parses source code into an AST and then converts it into code supported by each platform.
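
To give a feel for AST-based conversion between template DSLs, the sketch below maps a mini program element node to its Vue counterpart, using the two templates above as the source and target. The node shape, the tag mapping, and `toVueNode` are hypothetical; this is not how Taro is implemented, only an illustration of the general idea.

```javascript
const TAG_MAP = { view: 'div' }; // assumed mapping, for illustration only

// Remove the surrounding {{ }} from a mini program attribute value.
function stripMustache(value) {
  const match = /^\{\{(.*)\}\}$/.exec(value.trim());
  return match ? match[1].trim() : value;
}

// Convert one template node (and its children) to a Vue-style node.
function toVueNode(node) {
  if (typeof node === 'string') return node; // text nodes pass through unchanged
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    if (name === 'wx:if') attrs['v-if'] = stripMustache(value);
    else if (name === 'wx:for') attrs['v-for'] = `item in ${stripMustache(value)}`;
    else attrs[name] = value;
  }
  return {
    tag: TAG_MAP[node.tag] || node.tag,
    attrs,
    children: node.children.map(toVueNode),
  };
}

// <view wx:if="{{condition}}"> nice day! </view>
const conditional = toVueNode({
  tag: 'view',
  attrs: { 'wx:if': '{{condition}}' },
  children: [' nice day! '],
});

// <view wx:for="{{[0, 1, 2, 3, 4]}}"> {{item}} </view>
const list = toVueNode({
  tag: 'view',
  attrs: { 'wx:for': '{{[0, 1, 2, 3, 4]}}' },
  children: [' {{item}} '],
});

console.log(conditional, list);
```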
