In the last article, we started from the packages/vue/src/index.ts entry and looked at the compilation process of a Vue object. There we mentioned that the baseCompile function generates an abstract syntax tree (AST) during execution, which is a key step: only once the AST exists can we traverse its nodes and apply transforms, such as processing v-if, v-for, and other directives, or analyzing the nodes so that those meeting the conditions can be statically hoisted. All of this depends on the AST generated beforehand. So today we’ll take a look at AST parsing and see how Vue parses templates.

Generate AST abstract syntax tree

Let’s start by reviewing how the AST is created and then used in the baseCompile function:

export function baseCompile(
  template: string | RootNode,
  options: CompilerOptions = {}
): CodegenResult {
  /* Earlier logic omitted */

  const ast = isString(template) ? baseParse(template, options) : template

  transform(
    ast,
    { /* Arguments omitted */ }
  )

  return generate(
    ast,
    extend({}, options, {
      prefixIdentifiers
    })
  )
}

Since I’ve commented out the logic we don’t need to focus on, it’s now very clear what the logic inside the function is:

  • Generate the AST object
  • Pass the AST object to the transform function, which transforms the AST nodes
  • Pass the AST object to the generate function, which returns the compiled result
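To make the three steps concrete, here is a minimal sketch of the same parse, transform, generate shape. Everything in it (ToyRoot, toyParse, toyTransform, toyGenerate, toyCompile) is a hypothetical stand-in invented for illustration, not Vue’s code; the only thing it borrows from baseCompile is the structure, including the string-or-AST check.

```typescript
// Toy stand-ins for the three compiler stages; none of this is Vue's real code.
interface ToyRoot {
  type: 'root'
  children: string[] // each child is just one word of the template
  transformed: boolean
}

// Stage 1: turn a template string into a (very small) AST.
function toyParse(template: string): ToyRoot {
  const children = template.split(/\s+/).filter(word => word.length > 0)
  return { type: 'root', children, transformed: false }
}

// Stage 2: mutate the AST in place.
function toyTransform(ast: ToyRoot): void {
  ast.children = ast.children.map(word => word.toUpperCase())
  ast.transformed = true
}

// Stage 3: produce the output from the transformed AST.
function toyGenerate(ast: ToyRoot): string {
  return ast.children.join(' ')
}

// Same shape as baseCompile: accept a string or a ready-made AST.
function toyCompile(template: string | ToyRoot): string {
  const ast = typeof template === 'string' ? toyParse(template) : template
  toyTransform(ast)
  return toyGenerate(ast)
}

const compiledFromString = toyCompile('hello  world')
const compiledFromAst = toyCompile({ type: 'root', children: ['a', 'b'], transformed: false })
console.log(compiledFromString, '|', compiledFromAst)
```

The point of the sketch is only the ordering: the later stages never see the template string, they work on the tree produced by the first stage.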

Here we focus on AST generation. As you can see, the AST is produced by a ternary expression: if the template argument passed in is a string, baseParse is called to parse the template string; otherwise template is treated as an AST object directly. What does baseParse do to generate an AST? Take a look at the source code:

export function baseParse(
  content: string,
  options: ParserOptions = {}
): RootNode {
  const context = createParserContext(content, options) // Create a parsed context object
  const start = getCursor(context) // Generate cursor information that records the parsing process
  return createRoot( // Generate and return the root node
    parseChildren(context, TextModes.DATA, []), // Parse the child as the children property of the root node
    getSelection(context, start)
  )
}

I have added comments to the baseParse function to make it easier to see what each call does. First, the parser context is created, and then the cursor information is obtained from the context. Since nothing has been parsed yet, the column, line, and offset properties of the cursor all point to the start of the template. After that, the children are parsed, and the root node is created and returned, which completes the AST.
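The cursor idea can be sketched in a few lines. This is a simplified illustration under my own assumptions: ToyPosition, ToyContext, createToyContext, getToyCursor, and advanceToyBy are hypothetical names, and the real parser context carries more state than shown here. The sketch starts at offset 0, line 1, column 1 and keeps the three values in sync as characters are consumed.

```typescript
// Hypothetical, simplified parser context: only the fields needed for the cursor.
interface ToyPosition { offset: number; line: number; column: number }
interface ToyContext extends ToyPosition { source: string }

function createToyContext(content: string): ToyContext {
  // Line and column are 1-based, offset is 0-based.
  return { source: content, offset: 0, line: 1, column: 1 }
}

// Take a snapshot of the current position.
function getToyCursor(context: ToyContext): ToyPosition {
  return { offset: context.offset, line: context.line, column: context.column }
}

// Consume n characters and keep offset / line / column in sync.
function advanceToyBy(context: ToyContext, n: number): void {
  const consumed = context.source.slice(0, n)
  for (let i = 0; i < consumed.length; i++) {
    if (consumed.charCodeAt(i) === 10) {
      // A newline moves the cursor to the start of the next line.
      context.line++
      context.column = 1
    } else {
      context.column++
    }
  }
  context.offset += consumed.length
  context.source = context.source.slice(consumed.length)
}

const cursorContext = createToyContext('<div>\n  <p>')
const cursorAtStart = getToyCursor(cursorContext)
advanceToyBy(cursorContext, 8) // consume '<div>', the newline, and two spaces
const cursorAfter = getToyCursor(cursorContext)
console.log(cursorAtStart, cursorAfter, JSON.stringify(cursorContext.source))
```

Because the snapshot taken before parsing is a plain copy, it can later be paired with the current position to describe the source range a node covers.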

Create the root node of the AST

export function createRoot(
  children: TemplateChildNode[],
  loc = locStub
): RootNode {
  return {
    type: NodeTypes.ROOT,
    children,
    helpers: [],
    components: [],
    directives: [],
    hoists: [],
    imports: [],
    cached: 0,
    temps: 0,
    codegenNode: undefined,
    loc
  }
}

Looking at the code for createRoot, we can see that it simply returns a root object of type RootNode, and the children argument we pass in becomes the children property of that root node. This is pretty straightforward if you think of the result as a tree data structure. The key to generating the AST is therefore the parseChildren function, which, as its name suggests, parses child nodes. Let’s take a look at parseChildren, the most critical function in AST parsing. As usual, I’ll simplify the logic inside the function to make it easier to follow.
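To illustrate the “tree data structure” view, here is a small sketch with trimmed-down node shapes. TreeSketchNode and collectTags are hypothetical names of my own; the real RootNode and ElementNode types carry many more fields. The sketch only shows that a root with a children array can be walked recursively.

```typescript
// Hypothetical, trimmed-down node shapes used only for this illustration.
type TreeSketchNode =
  | { type: 'root'; children: TreeSketchNode[] }
  | { type: 'element'; tag: string; children: TreeSketchNode[] }
  | { type: 'text'; content: string }

// Depth-first walk that lists every element tag in document order.
function collectTags(node: TreeSketchNode, out: string[] = []): string[] {
  if (node.type === 'element') out.push(node.tag)
  if (node.type !== 'text') {
    for (const child of node.children) collectTags(child, out)
  }
  return out
}

const sketchRoot: TreeSketchNode = {
  type: 'root',
  children: [
    {
      type: 'element',
      tag: 'div',
      children: [
        { type: 'element', tag: 'p', children: [{ type: 'text', content: 'Hello World' }] }
      ]
    }
  ]
}
const sketchTags = collectTags(sketchRoot)
console.log(sketchTags)
```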

Parse child node

function parseChildren(
  context: ParserContext,
  mode: TextModes,
  ancestors: ElementNode[]
): TemplateChildNode[] {
  const parent = last(ancestors) // Get the parent of the current node
  const ns = parent ? parent.ns : Namespaces.HTML
  const nodes: TemplateChildNode[] = [] // Store the parsed nodes

  // Keep parsing nodes until the end of the current level is reached
  while (!isEnd(context, mode, ancestors)) { /* Logic omitted */ }

  // Handle whitespace characters to improve output efficiency
  let removedWhitespace = false
  if (mode !== TextModes.RAWTEXT && mode !== TextModes.RCDATA) { /* Logic omitted */ }

  // Remove whitespace and return the parsed array of nodes
  return removedWhitespace ? nodes.filter(Boolean) : nodes
}

From the code above, you can see that parseChildren takes three arguments: context, the parser context; mode, the text mode; and ancestors, the array of ancestor nodes. The function first gets the parent of the current node from the ancestors, determines the namespace, and creates an empty array to store the parsed nodes. A while loop then checks whether the end of the current level (for example, the closing tag of the parent) has been reached; if not, the source template string is classified and parsed in the body of the loop. This is followed by some logic that handles whitespace, after which the parsed nodes array is returned. With this overview of how parseChildren executes, let’s look at the core of the function, the logic inside the while loop.
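The whitespace step is omitted from the simplified source, so here is a rough sketch of the idea. It is an assumption-laden simplification, not Vue’s actual rules: WsText, WsElement, WsChild, and condenseToyWhitespace are hypothetical names, and the policy shown (drop whitespace-only text at the edges, collapse every other whitespace run to a single space) is only one plausible reading of “process whitespace”. What it does share with the code above is the removedWhitespace flag followed by a filter.

```typescript
interface WsText { type: 'text'; content: string }
interface WsElement { type: 'element'; tag: string }
type WsChild = WsText | WsElement

// Simplified policy: drop whitespace-only text at either edge, condense the rest.
// Note: the content of the passed-in text nodes is updated in place.
function condenseToyWhitespace(nodes: WsChild[]): WsChild[] {
  let removedWhitespace = false
  const slots: (WsChild | null)[] = nodes.slice()
  for (let i = 0; i < slots.length; i++) {
    const node = slots[i]
    if (!node || node.type !== 'text') continue
    if (/^[\t\r\n\f ]*$/.test(node.content)) {
      if (i === 0 || i === slots.length - 1) {
        // Whitespace-only text at the edges is marked for removal.
        removedWhitespace = true
        slots[i] = null
      } else {
        node.content = ' '
      }
    } else {
      node.content = node.content.replace(/[\t\r\n\f ]+/g, ' ')
    }
  }
  // Only pay for the filter when something was actually removed.
  return removedWhitespace ? slots.filter((n): n is WsChild => n !== null) : nodes
}

const condensed = condenseToyWhitespace([
  { type: 'text', content: '\n  ' },
  { type: 'element', tag: 'p' },
  { type: 'text', content: '  \n ' },
  { type: 'element', tag: 'span' },
  { type: 'text', content: 'a   b' },
  { type: 'text', content: '\n' }
])
console.log(JSON.stringify(condensed))
```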

Inside the while loop, the parser first checks the text mode: interpolations are only parsed when the mode is DATA or RCDATA, and tags only when it is DATA.

The first check is whether “Mustache” syntax (double curly braces) from the Vue template syntax needs to be parsed. If the current context is not inside a v-pre directive (which skips expressions) and the source template string starts with the configured opening delimiter (by default, context.options.delimiters is a pair of double curly braces), the interpolation is parsed. This also shows that if you have special requirements and do not want double braces to mark interpolations, you only need to change the delimiters option before compiling.
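As a rough illustration of delimiter-driven interpolation parsing, consider the sketch below. ToyInterpolation and parseToyInterpolation are hypothetical names, and the function is a simplified stand-in for what the parser does, assuming the delimiters are passed as an [open, close] pair.

```typescript
interface ToyInterpolation {
  expression: string // trimmed text between the delimiters
  rest: string       // unparsed remainder of the source
}

// Parse one interpolation at the start of `source`.
// Returns undefined when the source does not start with the opening
// delimiter or when the closing delimiter is missing.
function parseToyInterpolation(
  source: string,
  delimiters: [string, string] = ['{{', '}}']
): ToyInterpolation | undefined {
  const [open, close] = delimiters
  if (source.slice(0, open.length) !== open) return undefined
  const closeIndex = source.indexOf(close, open.length)
  if (closeIndex === -1) return undefined
  return {
    expression: source.slice(open.length, closeIndex).trim(),
    rest: source.slice(closeIndex + close.length)
  }
}

const defaultDelims = parseToyInterpolation('{{ msg }} tail')
const customDelims = parseToyInterpolation('${ count }!', ['${', '}'])
const mismatched = parseToyInterpolation('{{ msg }}', ['${', '}'])
console.log(defaultDelims, customDelims, mismatched)
```

With the custom pair, the double-brace input is no longer recognized as an interpolation at all, which is the effect the delimiters option is meant to have.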

Next, in DATA mode, if the first character is “<” and the second is “!”, the parser tries to parse a comment. Input that starts with “<!” but is not a real comment falls back to a bogus comment or, outside the HTML namespace, to CDATA.

If the second character is “/”, the source starts with “</”, which already looks like an end tag, so the parser tries to match one. When the third character is “>”, the tag name is missing: an error is reported and the parser advances three characters, skipping “</>”.

If the source starts with “</” and the third character is an ASCII letter (the check is case-insensitive), the parser parses the end tag.

If the first character of the source template string is “<” and the second is an ASCII letter, the parseElement function is called to parse the element.

If none of these branches produced a node, the input is treated as text and parseText is called.
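How far a text node extends can be sketched as a search for the nearest “end token”. findToyTextEnd is a hypothetical helper of my own, and the default token list (the tag opener and the opening delimiter) is an assumption based on the branches described above. The search starts at index 1 so that the first character is always consumed, even when it is itself a “<” that no branch accepted.

```typescript
// Return the index at which the current text run ends.
function findToyTextEnd(source: string, endTokens: string[] = ['<', '{{']): number {
  let end = source.length
  for (const token of endTokens) {
    // Start at 1: the first character always belongs to the text node.
    const index = source.indexOf(token, 1)
    if (index !== -1 && index < end) end = index
  }
  return end
}

const endBeforeInterpolation = findToyTextEnd('Hello {{ name }}</p>')
const endBeforeTag = findToyTextEnd('Hello</p>')
const endOfInput = findToyTextEnd('plain')
const endAfterStrayBracket = findToyTextEnd('<1 and text<b>')
console.log(endBeforeInterpolation, endBeforeTag, endOfInput, endAfterStrayBracket)
```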

Finally, the generated node is added to the nodes array, which is returned at the end of the function.

This is the logic inside the while loop, and it is the most important part of parseChildren. Along the way we saw how the double-brace syntax is parsed, how comment nodes are parsed, how start and end tags are parsed, and how text content is parsed. The simplified code is shown below; you can compare it with the explanation above to understand the source. The comments in the source code itself are also very detailed.

while (!isEnd(context, mode, ancestors)) {
  const s = context.source
  let node: TemplateChildNode | TemplateChildNode[] | undefined = undefined

  if (mode === TextModes.DATA || mode === TextModes.RCDATA) {
    if (!context.inVPre && startsWith(s, context.options.delimiters[0])) {
      // Not inside v-pre and the source starts with the opening delimiter '{{': parse an interpolation
      node = parseInterpolation(context, mode)
    } else if (mode === TextModes.DATA && s[0] === '<') {
      if (s[1] === '!') {
        // The second character is '!'
        if (startsWith(s, '<!--')) {
          // Starts with '<!--': parse as a comment
          node = parseComment(context)
        } else if (startsWith(s, '<!DOCTYPE')) {
          // Starts with '<!DOCTYPE': parse as a bogus comment
          node = parseBogusComment(context)
        } else if (startsWith(s, '<![CDATA[')) {
          // Starts with '<![CDATA[': parse as CDATA outside the HTML namespace
          if (ns !== Namespaces.HTML) {
            node = parseCDATA(context, ancestors)
          }
        }
      } else if (s[1] === '/') {
        // The second character is '/'
        if (s[2] === '>') {
          // '</>' has no tag name: report an error and advance three characters
          emitError(context, ErrorCodes.MISSING_END_TAG_NAME, 2)
          advanceBy(context, 3)
          continue
        } else if (/[a-z]/i.test(s[2])) {
          // The third character is an ASCII letter: parse the end tag
          parseTag(context, TagType.End, parent)
          continue
        } else {
          // Anything else is parsed as a bogus comment
          node = parseBogusComment(context)
        }
      } else if (/[a-z]/i.test(s[1])) {
        // The second character is an ASCII letter: parse an element
        node = parseElement(context, ancestors)
      } else if (s[1] === '?') {
        // The second character is '?': parse as a bogus comment
        node = parseBogusComment(context)
      } else {
        // The first character after '<' is not a valid tag name character
        emitError(context, ErrorCodes.INVALID_FIRST_CHARACTER_OF_TAG_NAME, 1)
      }
    }
  }

  // If none of the branches above produced a node, parse the input as text
  if (!node) {
    node = parseText(context, mode)
  }

  // If node is an array, push its items one by one; otherwise push it directly
  if (isArray(node)) {
    for (let i = 0; i < node.length; i++) {
      pushNode(nodes, node[i])
    }
  } else {
    pushNode(nodes, node)
  }
}
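The branching in the loop can be condensed into a small classifier that only names the branch taken, without parsing anything. This is a sketch under assumptions: ToyBranch and classifyToySource are hypothetical names, the function assumes DATA mode, and it ignores the namespace check on the CDATA branch.

```typescript
type ToyBranch =
  | 'interpolation' | 'comment' | 'bogus-comment' | 'cdata'
  | 'missing-end-tag-name' | 'end-tag' | 'element'
  | 'invalid-tag-start' | 'text'

// Name the branch the loop would take for the start of `source` (DATA mode assumed).
function classifyToySource(source: string, inVPre = false, open = '{{'): ToyBranch {
  if (!inVPre && source.slice(0, open.length) === open) return 'interpolation'
  if (source.charAt(0) !== '<') return 'text'
  const second = source.charAt(1)
  const third = source.charAt(2)
  if (second === '!') {
    if (source.slice(0, 4) === '<!--') return 'comment'
    if (source.slice(0, 9) === '<![CDATA[') return 'cdata' // namespace check ignored here
    return 'bogus-comment'
  }
  if (second === '/') {
    if (third === '>') return 'missing-end-tag-name'
    if (/[a-z]/i.test(third)) return 'end-tag'
    return 'bogus-comment'
  }
  if (/[a-z]/i.test(second)) return 'element'
  if (second === '?') return 'bogus-comment'
  return 'invalid-tag-start'
}

const branchSamples = [
  '{{ msg }}', '<!-- note -->', '<!DOCTYPE html>', '</>', '</div>',
  '<DIV>', '<?xml ?>', '<1>', 'hello'
].map(sample => classifyToySource(sample))
console.log(branchSamples.join(', '))
```

Note that '<DIV>' is classified as an element: the letter test is case-insensitive, so uppercase tag names take the same branch as lowercase ones.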

Parse the template element

In the while loop, each branch assigns node the return value of the parse function for the corresponding node type. Here I’ll go into detail on parseElement, the function that parses elements, because elements are what we use most often in templates.

I’ll paste a simplified version of the parseElement source first, and then walk through its logic.

function parseElement(
  context: ParserContext,
  ancestors: ElementNode[]
): ElementNode | undefined {
  // Parse the start tag
  const parent = last(ancestors)
  const element = parseTag(context, TagType.Start, parent)

  // A self-closing tag or a void tag (for example <img>, <br>, <hr>) is returned directly
  if (element.isSelfClosing || context.options.isVoidTag(element.tag)) {
    return element
  }

  // Recursively parse the child nodes
  ancestors.push(element)
  const mode = context.options.getTextMode(element, parent)
  const children = parseChildren(context, mode, ancestors)
  ancestors.pop()

  element.children = children

  // Parse the end tag
  if (startsWithEndTagOpen(context.source, element.tag)) {
    parseTag(context, TagType.End, parent)
  } else {
    emitError(context, ErrorCodes.X_MISSING_END_TAG, 0, element.loc.start)
    if (context.source.length === 0 && element.tag.toLowerCase() === 'script') {
      const first = children[0]
      if (first && startsWith(first.loc.source, '<!--')) {
        emitError(context, ErrorCodes.EOF_IN_SCRIPT_HTML_COMMENT_LIKE_TEXT)
      }
    }
  }

  // Get the location object of the element
  element.loc = getSelection(context, element.loc.start)

  return element
}

First we get the parent of the current node and call parseTag to parse the start tag.

The parseTag function executes as follows:

  • Match the tag name
  • Parse the attributes of the element and store them in the props property
  • Check for the v-pre directive and, if it is present, set the inVPre property of the context to true
  • Detect a self-closing tag and, if the tag is self-closing, set the isSelfClosing property to true
  • Determine whether the tagType is ELEMENT, COMPONENT, or SLOT
  • Return the generated Element object

For space reasons, I will not post the source code of parseTag here; interested readers can look it up themselves.
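As a rough stand-in for the omitted source, the first four steps can be sketched as follows. This is not Vue’s parseTag: ToyAttr, ToyTag, and parseToyStartTag are hypothetical names, the regular expressions are simplified assumptions, only start tags are handled, and the tagType step is left out entirely.

```typescript
interface ToyAttr { name: string; value: string | undefined }
interface ToyTag {
  tag: string
  props: ToyAttr[]
  isSelfClosing: boolean
  hasVPre: boolean
  rest: string // unparsed remainder of the source
}

function parseToyStartTag(source: string): ToyTag | undefined {
  // 1. Match the tag name right after '<'.
  const nameMatch = /^<([a-z][^\t\r\n\f />]*)/i.exec(source)
  if (!nameMatch) return undefined
  const tag = nameMatch[1] || ''
  let rest = source.slice(1 + tag.length)

  // 2. Parse attributes until the tag is closed.
  const attrPattern = /^([^\t\r\n\f />=]+)(?:=(?:"([^"]*)"|'([^']*)'|([^\t\r\n\f >]+)))?/
  const props: ToyAttr[] = []
  for (;;) {
    const trimmed = rest.replace(/^[\t\r\n\f ]+/, '')
    rest = trimmed
    if (trimmed.length === 0 || trimmed.charAt(0) === '>' || trimmed.slice(0, 2) === '/>') break
    const m = attrPattern.exec(trimmed)
    if (!m) break // a character the sketch does not understand
    // The value is whichever of the quoted or unquoted groups matched, if any.
    const value = m[2] !== undefined ? m[2] : m[3] !== undefined ? m[3] : m[4]
    props.push({ name: m[1] || '', value })
    rest = trimmed.replace(attrPattern, '')
  }

  // 3. Check for the v-pre directive.
  const hasVPre = props.some(p => p.name === 'v-pre')

  // 4. Detect a self-closing tag and consume the closing characters.
  const isSelfClosing = rest.slice(0, 2) === '/>'
  if (!isSelfClosing && rest.charAt(0) !== '>') return undefined // malformed or unterminated
  rest = rest.slice(isSelfClosing ? 2 : 1)

  return { tag, props, isSelfClosing, hasVPre, rest }
}

const toyImg = parseToyStartTag('<img src="a.png" v-pre/>tail')
const toyDiv = parseToyStartTag("<div id=app class='box'>x")
const toyEnd = parseToyStartTag('</div>')
console.log(JSON.stringify(toyImg), JSON.stringify(toyDiv), toyEnd)
```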

Once the Element object has been obtained, parseElement checks whether the element is a self-closing tag or a void tag, such as <img>, <br>, or <hr>, and in that case returns the Element object directly.

Otherwise it parses the element’s child nodes: it pushes the element onto the ancestor stack and recursively calls parseChildren.

const parent = last(ancestors)

Looking back at this line of code, which appears in both parseChildren and parseElement, we can see that once element has been pushed onto the stack, the parent obtained in the recursive parseChildren call is exactly the element currently being parsed. When the children have been parsed, ancestors.pop() removes that element from the stack, and the parsed children array is assigned to element’s children property, which completes the parsing of the element’s child nodes. It’s a very clever design.

Finally, the end tag is matched, element’s loc location information is set, and the parsed Element object is returned.

Example: template element parsing

Please take a look at the template we want to parse below. The picture shows the storage condition of the stack that holds the parsed node during the parsing process.

<div>
  <p>Hello World</p>
</div>

The yellow rectangle in the figure is the stack. When parsing begins, parseChildren first encounters the div tag and calls parseElement. parseElement parses the div element with parseTag, pushes it onto the stack, and recursively parses its child nodes. In this second call to parseChildren, the parser encounters the p element, calls parseElement again, and pushes p onto the stack, which now holds both div and p. The children of p are parsed next, so parseChildren is called a third time. This time no tag is matched and no node is produced by the tag branches, so parseText is used to create a text node with the content Hello World, and that node is returned.

Once the text node has been assigned to the children property of p, the children of p are fully parsed and p is popped off the ancestor stack. After its end tag is parsed, the Element object for p is returned.

The node for the p tag is thus generated and returned from the second parseChildren call.

The div element receives the p node and stores it in its children property; div is then popped, leaving the ancestor stack empty. The end tag of div is parsed and its Element object is returned.

Finally, the first call to parseChildren returns its result, the node object for div. That result is passed as the children argument to createRoot, which creates the root node object and completes AST parsing.
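The walkthrough can be reproduced with a tiny recursive parser that records the ancestor stack at every push and pop. This is a toy under stated assumptions, not Vue’s parser: all names (StackTextNode, StackElementNode, StackParser, parseStackChildren, parseStackElement) are hypothetical, only bare tags without attributes are supported, and the template is written on one line so that no whitespace text nodes appear.

```typescript
interface StackTextNode { type: 'text'; content: string }
interface StackElementNode { type: 'element'; tag: string; children: StackAstNode[] }
type StackAstNode = StackTextNode | StackElementNode

interface StackParser {
  source: string
  stackLog: string[] // snapshot of the ancestor stack after each push and pop
}

function parseStackChildren(parser: StackParser, ancestors: StackElementNode[]): StackAstNode[] {
  const nodes: StackAstNode[] = []
  // Stop at the end of input or at any end tag (a simplification of isEnd).
  while (parser.source.length > 0 && parser.source.slice(0, 2) !== '</') {
    if (/^<[a-z]/i.test(parser.source)) {
      nodes.push(parseStackElement(parser, ancestors))
    } else {
      // Text runs until the next '<'.
      const next = parser.source.indexOf('<', 1)
      const end = next === -1 ? parser.source.length : next
      nodes.push({ type: 'text', content: parser.source.slice(0, end) })
      parser.source = parser.source.slice(end)
    }
  }
  return nodes
}

function parseStackElement(parser: StackParser, ancestors: StackElementNode[]): StackElementNode {
  // Start tag: only bare tags such as <div> are supported.
  const startTag = /^<([a-z][a-z0-9]*)>/i.exec(parser.source)
  if (!startTag) throw new Error('unsupported start tag')
  const tag = startTag[1] || ''
  parser.source = parser.source.slice(tag.length + 2)
  const element: StackElementNode = { type: 'element', tag, children: [] }

  // Push, recurse, pop: inside the recursion this element is the parent.
  ancestors.push(element)
  parser.stackLog.push(ancestors.map(a => a.tag).join(' > '))
  element.children = parseStackChildren(parser, ancestors)
  ancestors.pop()
  parser.stackLog.push(ancestors.map(a => a.tag).join(' > '))

  // End tag: must match the element that was just parsed.
  const endTag = '</' + tag + '>'
  if (parser.source.slice(0, endTag.length) !== endTag) throw new Error('missing end tag for ' + tag)
  parser.source = parser.source.slice(endTag.length)
  return element
}

const stackParser: StackParser = { source: '<div><p>Hello World</p></div>', stackLog: [] }
const stackAst = parseStackChildren(stackParser, [])
console.log(stackParser.stackLog)
console.log(JSON.stringify(stackAst))
```

The recorded snapshots follow the same sequence as the description above: the stack holds div, then div and p, then div again, and is empty once div has been popped.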

Afterword.

This article walked through the parser in detail: from the baseParse function that is called when the AST is generated, to createRoot, which wraps the result that baseParse returns, and into parseChildren and one of its specific parse functions, parseElement. Finally, a simple template example showed how the Vue parser works and how the ancestor stack changes along the way, giving a fairly complete picture of the parser workflow.

If this article helps you understand the parser workflow in Vue3, please give it a thumbs up. ❤️