vue-next/packages/compiler-core/src/parse.ts

Finally, it is time to write the parser. Thanks to the shortcuts taken in this series, parse is neither complicated nor hard to understand. Personally, I think it is mostly a matter of flow control.

A first look

The template is really just one long string, which means we cannot analyze tag nesting structurally in a direct way. That matters, because nesting tags and nesting content are extremely common, so it has to be handled with care. On second thought, though, there is only one case that requires a recursive call to parseChildren: a tag nested inside another tag.

This should have been a flow chart, but I could not draw it, so I will describe it in words instead.

  • Element nodes: the most complex to parse, because the tag, its attributes/directives, and its content all need to be parsed, and child nodes need to be mounted recursively
  • Text nodes: very simple to parse; just wrap the text content in a node and return it
  • Interpolation nodes: very similar to text nodes and just as simple; take the expression content, wrap it in a node, and return it
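
To make the three node kinds concrete, the sketch below hand-writes the node array one would expect for a tiny template. The NodeTypes values are placeholder strings assumed purely for illustration, and tagType is left out for brevity.

```javascript
// Placeholder node type constants (assumed values, for illustration only).
const NodeTypes = {
    ELEMENT: 'ELEMENT',
    TEXT: 'TEXT',
    INTERPOLATION: 'INTERPOLATION',
    SIMPLE_EXPRESSION: 'SIMPLE_EXPRESSION',
};

// Hand-written expected result for the template: <p>hi {{ name }}</p>
const expected = [
    {
        type: NodeTypes.ELEMENT,
        tag: 'p',
        props: [],
        directives: [],
        isSelfClosing: false,
        children: [
            // the text node keeps its trailing space
            { type: NodeTypes.TEXT, content: 'hi ' },
            // the interpolation wraps a trimmed expression
            {
                type: NodeTypes.INTERPOLATION,
                content: {
                    type: NodeTypes.SIMPLE_EXPRESSION,
                    isStatic: false,
                    content: 'name',
                },
            },
        ],
    },
];

console.log(JSON.stringify(expected, null, 2));
```

Only the element node has children; text and interpolation nodes are leaves, which is why the element branch is the only one that recurses.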

Writing the code

A lot of preparatory work has already been done above, so the implementation in this article is straightforward. Let's start directly with parseChildren.

parseChildren

This is where real compilation begins. We need to parse everything inside the template string (tags, attributes, directives, content, and interpolation expressions), wrap each piece in a node, and return the nodes in an array. As mentioned above, templates contain a lot of nesting, so the flow is controlled with a loop plus recursion. The implementation is as follows

const parseChildren = context => {
    const nodes = [];

    while (!isEnd(context)) {
        const s = context.source;

        let node;

        // This is simplified
        // The source code has a long if / else if / else chain here,
        // but most branches handle things such as
        // '<!--', '<!DOCTYPE', '<![CDATA['
        // plus a lot of error tolerance

        // starts with < : an element
        if (s[0] === '<') {
            node = parseElement(context);
        }

        // an interpolation expression begins with {{
        else if (startsWith(s, '{{')) {
            node = parseInterpolation(context);
        }

        // Otherwise text nodes
        else {
            node = parseText(context);
        }

        // In the source code it looks like this:
        // if none of the branches above produced a node, fall back to parseText
        // if (!node) {
        //     node = parseText(context);
        // }

        // The source code uses a pushNode helper here; a plain push is enough for us
        nodes.push(node);
    }
    return nodes;
}

The code here is easy to follow: until the string has been fully parsed, the loop keeps reading, decides the type of the next node from the characters at the start of the source, calls the matching parse function, and pushes the result into the array.
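
The dispatch rule can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper named nextNodeKind that only reports which branch parseChildren would take; it is not part of the article's parser.

```javascript
// Hypothetical helper: report which branch parseChildren would take next.
const nextNodeKind = source => {
    // an empty source or a closing tag ends the current loop
    if (!source || source.startsWith('</')) return 'end';
    if (source[0] === '<') return 'element';
    if (source.startsWith('{{')) return 'interpolation';
    return 'text';
};

console.log(nextNodeKind('<div>a</div>'));  // element
console.log(nextNodeKind('{{ a }}</div>')); // interpolation
console.log(nextNodeKind('a {{ b }}'));     // text
console.log(nextNodeKind('</div>'));        // end
```

The closing-tag check has to come before the element check, because a closing tag also starts with <.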

Parsing element nodes

The first node in a template is usually an element node. What we need to do here is parse out everything about it, including the tag name, whether it is self-closing, its attributes, its directives, and so on, and return all of that mounted on an ELEMENT node.

parseElement

Regardless of nesting, a simple tag looks like this

<div class="a" v-bind:b="c">parse {{ element }}</div>

We can write a parseTag function that parses the tag name and returns a node of ELEMENT type. parseTag also has to handle the tag's attributes and directives, so a parseAttributes function parses the attributes/directives and mounts them on the ELEMENT node to be returned. The text content is easily handled by recursively calling parseChildren.

Do we need to deal with the closing tag </div>?

Yes, it has to be handled. Otherwise context.source would still start with </div> and parsing could not continue. We do not need to parse it into anything, though, because a closing tag does not exist as a node of its own; it only needs to be consumed.

What about self-closing tags, such as <br />?

As mentioned earlier, an isSelfClosing flag indicates whether the node is a self-closing tag. A self-closing tag has neither child nodes nor a closing tag, so all it takes is one extra check.

The implementation is as follows

// Parse the element node
const parseElement = context => {
    const element = parseTag(context);

    // A self-closing tag has no children and no closing tag to parse
    // But <br /> is legal, and <br> is also legal,
    // so isVoidTag is checked as well
    if (element.isSelfClosing || isVoidTag(element.tag)) {
        return element;
    }

    element.children = parseChildren(context);

    // Only consume the closing tag </div>; the return value is not needed
    parseTag(context);

    return element;
}

parseTag

parseTag returns a node of ELEMENT type and mounts the required properties on that node

// Parse the tag content
// It takes like this
// <div class="a" v-bind:b="c">parse {{ element }}</div>
const parseTag = context => {
    // This regular expression is explained below
    const tagReg = /^<\/?([a-z][^\t\r\n\f />]*)/i;

    // match is ['<div', 'div']
    const match = tagReg.exec(context.source);
    const tag = match[1];

    advanceBy(context, match[0].length);
    advanceSpaces(context);

    // at this point context.source is
    // class="a" v-bind:b="c">parse {{ element }}</div>

    // parseAttributes is implemented below
    const { props, directives } = parseAttributes(context);

    // context.source is now
    // >parse {{ element }}</div>

    const isSelfClosing = startsWith(context.source, '/>');

    // Split "/>" or ">"
    advanceBy(context, isSelfClosing ? 2 : 1);

    // Determine whether it is a component or a native element
    const tagType = isHTMLTag(tag)
        ? ElementTypes.ELEMENT
        : ElementTypes.COMPONENT;

    return {
        type: NodeTypes.ELEMENT,
        tag,
        tagType,
        props,
        directives,
        isSelfClosing,
        children: [],
    };
}

The regular expression used to match the tag name is /^<\/?([a-z][^\t\r\n\f />]*)/i. It uses a capture group, ([a-z][^\t\r\n\f />]*), which matches one letter (the i flag makes it case-insensitive) followed by any number of characters that are not whitespace, / or >. The first item of the match array is the whole matched text, such as <div, and the second item is the captured tag name, div.
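
The behaviour of this regular expression is easy to check on its own. The inputs below are made-up examples chosen for illustration.

```javascript
const tagReg = /^<\/?([a-z][^\t\r\n\f />]*)/i;

// Opening tag: match[0] is the consumed text, match[1] the tag name.
const open = tagReg.exec('<div class="a">');
console.log(open[0], open[1]); // <div div

// Closing tag: the optional \/ lets the same expression consume it.
const close = tagReg.exec('</div>');
console.log(close[0], close[1]); // </div div

// Component names work too, because of the i flag.
console.log(tagReg.exec('<MyComp/>')[1]); // MyComp

// A space right after < is not a tag.
console.log(tagReg.exec('< div>')); // null
```

Because the closing tag matches as well, parseElement can reuse parseTag to consume it; the trailing > is removed by the later advanceBy call.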

parseAttributes

As the code above shows, parseAttributes returns an object with props and directives properties, and it consumes all of the attributes/directives from the source while parsing them

// Parse all attributes
// It takes like this
// class="a" v-bind:b="c">parse {{ element }}</div>
const parseAttributes = context => {
    const props = [];
    const directives = [];

    // loop parsing
    // stop parsing if ">" or "/>" or context.source is an empty string
    while (
        context.source.length > 0 &&
        !startsWith(context.source, '>') &&
        !startsWith(context.source, '/>')
    ) {
        // before the call
        // class="a" v-bind:b="c">parse {{ element }}</div>
        // parseAttribute is implemented below
        const attr = parseAttribute(context);
        // after the call
        // v-bind:b="c">parse {{ element }}</div>

        if (attr.type === NodeTypes.DIRECTIVE) {
            directives.push(attr);
        } else {
            props.push(attr);
        }
    }

    return { props, directives };
}

There may be several attributes and several directives, so they are parsed in a loop, and the type property of each parsed attr node decides whether it goes into directives or props

parseAttribute

From the previous step, parseAttribute parses a single attribute of the form a="b" and returns it as a node.

The most obvious difference between attributes and directives is the name: every directive name starts with v- or with a special symbol such as :, @, or #. So only the attribute name needs special handling, and that raises a question of order: parse the name and value first, then decide whether the result is a directive. The first step looks like this

// Parse a single attribute
// It takes like this
// class="a" v-bind:b="c">parse {{ element }}</div>
const parseAttribute = context => {
    // Matches the re of the attribute name
    const namesReg = /^[^\t\r\n\f />][^\t\r\n\f />=]*/;

    // match is ["class"]
    const match = namesReg.exec(context.source);
    const name = match[0];

    // Split the attribute name
    advanceBy(context, name.length);
    advanceSpaces(context);
    // context.source
    // ="a" v-bind:b="c">parse {{ element }}</div>

    let value;
    if (startsWith(context.source, '=')) {
        // Whitespace was already consumed above,
        // so the first character is =
        // The source code checks it like this:
        // if (/^[\t\r\n\f ]*=/.test(context.source)) {
        //     advanceSpaces(context);

        advanceBy(context, 1);
        advanceSpaces(context);

        // Parse the attribute value
        // parseAttributeValue is implemented later
        // before the call
        // "a" v-bind:b="c">parse {{ element }}</div>
        value = parseAttributeValue(context);
        advanceSpaces(context);
        // after the call
        // v-bind:b="c">parse {{ element }}</div>
    }

    // TODO

    // Attribute
    return {
        type: NodeTypes.ATTRIBUTE,
        name,
        value: value && {
            type: NodeTypes.TEXT,
            content: value,
        },
    };
}

As the code above shows, we parse the attribute, obtain its name and value, and consume them from the source. With the name and value in hand, handling directives is easy; the code goes where the TODO was left

// Parse a single attribute
const parseAttribute = context => {
    // Get the attribute name and the attribute value
    // ...

    if (/^(:|@|v-[A-Za-z0-9-])/.test(name)) {
        let dirName, argContent;

        // <div :a="b" />
        if (startsWith(name, ':')) {
            dirName = 'bind';
            argContent = name.slice(1);
        }

        // <div @click="a" />
        else if (startsWith(name, '@')) {
            dirName = 'on';
            argContent = name.slice(1);
        }

        // <div v-bind:a="b" />
        else if (startsWith(name, 'v-')) {
            [dirName, argContent] = name.slice(2).split(':');
        }

        // return the instruction node
        return {
            type: NodeTypes.DIRECTIVE,
            name: dirName,
            exp: value && {
                type: NodeTypes.SIMPLE_EXPRESSION,
                content: value,
                isStatic: false,
            },
            arg: argContent && {
                type: NodeTypes.SIMPLE_EXPRESSION,
                content: argContent,
                isStatic: true,
            },
        };
    }

    // ...
}

Here the regular expression /^(:|@|v-[A-Za-z0-9-])/ matches names that begin with :, @, or v-. A match means the attribute is a directive. The leading characters then decide how dirName and argContent are obtained, and a node of type DIRECTIVE is returned
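
The name handling can be exercised without the rest of the parser. This sketch assumes a hypothetical helper, splitDirectiveName, that mirrors the three branches above and returns null for plain attributes. Note that # is not covered by the regular expression used in this article, so a name starting with # is treated as a plain attribute here.

```javascript
// Hypothetical helper mirroring the name handling in parseAttribute.
const splitDirectiveName = name => {
    // not a directive: the caller would build an ATTRIBUTE node instead
    if (!/^(:|@|v-[A-Za-z0-9-])/.test(name)) return null;

    // shorthand for v-bind
    if (name.startsWith(':')) return { dirName: 'bind', arg: name.slice(1) };

    // shorthand for v-on
    if (name.startsWith('@')) return { dirName: 'on', arg: name.slice(1) };

    // full form: v-name or v-name:arg
    const [dirName, arg] = name.slice(2).split(':');
    return { dirName, arg };
};

console.log(splitDirectiveName('class'));    // null
console.log(splitDirectiveName(':a'));       // { dirName: 'bind', arg: 'a' }
console.log(splitDirectiveName('@click'));   // { dirName: 'on', arg: 'click' }
console.log(splitDirectiveName('v-bind:b')); // { dirName: 'bind', arg: 'b' }
console.log(splitDirectiveName('v-if'));     // { dirName: 'if', arg: undefined }
```

A directive without an argument, such as v-if, leaves arg undefined, which is why the article's code guards the arg node with argContent &&.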

parseAttributeValue

All that is left for parseElement is parseAttributeValue, which only needs to extract the attribute value. The implementation is as follows

// Get the attribute value
// It looks like this when it comes in
// "a" v-bind:b="c">parse {{ element }}</div>
const parseAttributeValue = context => {
    // Get the opening quote
    const quote = context.source[0];

    // Consume the opening quote
    // a" v-bind:b="c">parse {{ element }}</div>
    advanceBy(context, 1);

    // Find the matching closing quote
    const endIndex = context.source.indexOf(quote);

    // Get the attribute value
    const content = parseTextData(context, endIndex);

    // Consume the closing quote
    advanceBy(context, 1);

    return content;
}

That completes the parseElement flow. What remains is parseText and parseInterpolation
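
To watch context.source shrink step by step, the attribute-value logic can be run on its own. The snippet below copies the article's helpers into one self-contained sketch; the input strings are made up for illustration.

```javascript
// Helpers copied from this article so the snippet runs by itself.
const advanceBy = (context, n) => {
    context.source = context.source.slice(n);
};

const parseTextData = (context, length) => {
    const rawText = context.source.slice(0, length);
    advanceBy(context, length);
    return rawText;
};

const parseAttributeValue = context => {
    const quote = context.source[0];                // opening quote, " or '
    advanceBy(context, 1);                          // consume it
    const endIndex = context.source.indexOf(quote); // matching closing quote
    const content = parseTextData(context, endIndex);
    advanceBy(context, 1);                          // consume the closing quote
    return content;
};

const context = { source: '"a" v-bind:b="c">parse</div>' };
console.log(parseAttributeValue(context)); // a
console.log(context.source);               // ' v-bind:b="c">parse</div>' (leading space kept)

// Remembering the opening quote lets the other quote type appear inside the value.
const other = { source: `'c "d"'>text` };
console.log(parseAttributeValue(other)); // c "d"
console.log(other.source);               // >text
```

This version assumes the value is always quoted and the closing quote exists; an unquoted or unterminated value is not handled.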

Parsing text nodes

Next comes parseText. The source that parseText receives looks like one of these

parse {{ element }}</div>
parse</div>

In both cases the only thing to parse is the leading text parse, so the text ends where an interpolation or a closing tag begins. We can use two end tokens, < and {{, to mark the end of the text: parsing stops when either one is encountered, and it must stop at whichever comes first. The implementation looks like this

// Parse text nodes
// It looks like this when it comes in
// parse {{ element }}</div>
const parseText = context => {
    // Two end tokens
    const endTokens = ['<', '{{'];
    let endIndex = context.source.length;

    for (let i = 0; i < endTokens.length; i++) {
        // look for this end token
        const index = context.source.indexOf(endTokens[i]);

        // keep the earliest end token found
        if (index !== -1 && index < endIndex) {
            endIndex = index;
        }
    }

    // Consume everything before the end token
    const content = parseTextData(context, endIndex);

    return {
        type: NodeTypes.TEXT,
        content,
    };
}

That is all there is to it, but one detail is key: the text must end at the first end token, which means taking the smallest index and assigning it to endIndex. Otherwise the result is wrong
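
The effect of taking the smallest index can be seen in a small experiment. The helper name findTextEnd is hypothetical and exists only for this sketch.

```javascript
// Hypothetical helper: index at which the next text node ends.
const findTextEnd = source => {
    let endIndex = source.length;
    for (const token of ['<', '{{']) {
        const index = source.indexOf(token);
        // keep the smallest index among the tokens that were found
        if (index !== -1 && index < endIndex) endIndex = index;
    }
    return endIndex;
};

const source = 'parse {{ element }}</div>';

// Correct: the text stops at the interpolation, which comes first.
console.log(JSON.stringify(source.slice(0, findTextEnd(source)))); // "parse "

// Looking only at < would swallow the interpolation into the text node.
console.log(JSON.stringify(source.slice(0, source.indexOf('<')))); // "parse {{ element }}"

// With no end token at all, the whole source is text.
console.log(findTextEnd('just text')); // 9
```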

Parsing interpolation expressions

Parsing an interpolation is much like parseAttributeValue: take the B out of an "A B A" structure. Here is the code directly

// Parse the interpolation
// It looks like this when it comes in
// {{ element }}</div>
function parseInterpolation(context) {
    const [open, close] = ['{{', '}}'];

    advanceBy(context, open.length);
    // context.source is now
    //  element }}</div>

    // find the index of "}}"
    // the opening "{{" is already consumed, so search from the start
    const closeIndex = context.source.indexOf(close);

    const content = parseTextData(context, closeIndex).trim();
    advanceBy(context, close.length);
    // change to
    // </div>

    return {
        type: NodeTypes.INTERPOLATION,
        content: {
            type: NodeTypes.SIMPLE_EXPRESSION,
            isStatic: false,
            content,
        },
    };
}
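
The "A B A" extraction can also be written as a pure function over a string, which makes the index arithmetic easier to check. This is a sketch under its own assumptions: readInterpolation is a hypothetical name, and returning null for an unterminated interpolation is a choice made here, not something the article's parser does.

```javascript
// Hypothetical standalone version: returns the expression and the remaining source.
const readInterpolation = source => {
    const [open, close] = ['{{', '}}'];

    // The opening {{ is still in the string, so the search starts after it.
    const closeIndex = source.indexOf(close, open.length);
    if (closeIndex === -1) return null; // unterminated interpolation

    return {
        content: source.slice(open.length, closeIndex).trim(),
        rest: source.slice(closeIndex + close.length),
    };
};

console.log(readInterpolation('{{ element }}</div>')); // { content: 'element', rest: '</div>' }
console.log(readInterpolation('{{a}}'));               // { content: 'a', rest: '' }
console.log(readInterpolation('{{ oops'));             // null
```

The start offset passed to indexOf only makes sense while the opening delimiter is still part of the string; once it has been consumed, the search has to start at index 0, otherwise short expressions such as {{a}} are missed.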

Conclusion

What parse does is not hard to understand: it repeatedly parses a piece of information, removes it from the source, and parses the next piece. It may still feel confusing at first, and personally I doubt this article alone is enough to understand it completely. I strongly recommend copying the code, debugging it, and watching how context.source changes at each step
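
One possible way to watch context.source without a debugger is to record every assignment to it. The sketch below assumes a hypothetical createTracedContext that could stand in for createParseContext during debugging; it relies only on a plain getter/setter pair.

```javascript
// Hypothetical debugging aid: a context whose `source` setter records every change.
const createTracedContext = content => {
    let source = content;
    const history = [content];
    return {
        get source() {
            return source;
        },
        set source(next) {
            source = next;
            history.push(next); // remember each intermediate state
        },
        history,
    };
};

// Same shape as the article's advanceBy, so it works with the traced context.
const advanceBy = (context, numberOfCharacters) => {
    context.source = context.source.slice(numberOfCharacters);
};

const context = createTracedContext('<p>hi</p>');
advanceBy(context, 3); // consume '<p>'
advanceBy(context, 2); // consume 'hi'
console.log(context.history); // [ '<p>hi</p>', 'hi</p>', '</p>' ]
```

Because the parser only ever touches the source through context.source, nothing else has to change for the history to fill up.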

Because I wanted to show the changes to context.source as clearly as possible, the steps above carry a lot of comments. This part of the code is also one function calling another, which calls yet another, like nesting dolls, so the text is not organized as rigorously as it could be. The complete code for this part follows; debugging through it once should make everything clear

// compiler/parse.js
const createParseContext = content => {
    return {
        source: content,
    };
}

const baseParse = content => {
    const context = createParseContext(content);
    return createRoot(parseChildren(context));
}

const parseChildren = context => {
    const nodes = [];

    while (!isEnd(context)) {
        const s = context.source;
        let node;
        if (startsWith(s, '<')) {
            node = parseElement(context);
        } else if (startsWith(s, '{{')) {
            node = parseInterpolation(context);
        } else {
            node = parseText(context);
        }

        nodes.push(node);
    }
    return nodes;
}

const parseElement = context => {
    const element = parseTag(context);

    if (element.isSelfClosing || isVoidTag(element.tag)) return element;
    
    element.children = parseChildren(context);

    parseTag(context);

    return element;
}

const parseTag = context => {
    const match = /^<\/?([a-z][^\t\r\n\f />]*)/i.exec(context.source);
    const tag = match[1];
    advanceBy(context, match[0].length);
    advanceSpaces(context);

    const { props, directives } = parseAttributes(context);

    const isSelfClosing = startsWith(context.source, '/>');
    advanceBy(context, isSelfClosing ? 2 : 1);

    const tagType = isHTMLTag(tag)
        ? ElementTypes.ELEMENT
        : ElementTypes.COMPONENT;

    return {
        type: NodeTypes.ELEMENT,
        tag,
        tagType,
        props,
        directives,
        isSelfClosing,
        children: [],
    };
}

const parseAttributes = context => {
    const props = [];
    const directives = [];

    while (
        context.source.length > 0 &&
        !startsWith(context.source, '>') &&
        !startsWith(context.source, '/>')
    ) {
        const attr = parseAttribute(context);
        if (attr.type === NodeTypes.DIRECTIVE) {
            directives.push(attr);
        } else {
            props.push(attr);
        }
    }

    return { props, directives };
}

const parseAttribute = context => {
    const match = /^[^\t\r\n\f />][^\t\r\n\f />=]*/.exec(context.source);
    const name = match[0];
    advanceBy(context, name.length);
    advanceSpaces(context);

    let value;
    if (startsWith(context.source, '=')) {
        advanceBy(context, 1);
        advanceSpaces(context);

        value = parseAttributeValue(context);
        advanceSpaces(context);
    }

    if (/^(:|@|v-[A-Za-z0-9-])/.test(name)) {
        let dirName, argContent;
        if (startsWith(name, ':')) {
            dirName = 'bind';
            argContent = name.slice(1);
        } else if (startsWith(name, '@')) {
            dirName = 'on';
            argContent = name.slice(1);
        } else if (startsWith(name, 'v-')) {
            [dirName, argContent] = name.slice(2).split(':');
        }

        return {
            type: NodeTypes.DIRECTIVE,
            name: dirName,
            exp: value && {
                type: NodeTypes.SIMPLE_EXPRESSION,
                content: value,
                isStatic: false,
            },
            arg: argContent && {
                type: NodeTypes.SIMPLE_EXPRESSION,
                content: argContent,
                isStatic: true,
            },
        };
    }

    return {
        type: NodeTypes.ATTRIBUTE,
        name,
        value: value && {
            type: NodeTypes.TEXT,
            content: value,
        },
    };
}


const parseAttributeValue = context => {
    const quote = context.source[0];
    advanceBy(context, 1);

    const endIndex = context.source.indexOf(quote);

    const content = parseTextData(context, endIndex);

    advanceBy(context, 1);

    return content;
}

const parseText = context => {
    const endTokens = ['<', '{{'];
    let endIndex = context.source.length;

    for (let i = 0; i < endTokens.length; i++) {
        const index = context.source.indexOf(endTokens[i]);
        if (index !== -1 && index < endIndex) {
            endIndex = index;
        }
    }

    const content = parseTextData(context, endIndex);

    return {
        type: NodeTypes.TEXT,
        content,
    };
}

function parseInterpolation(context) {
    const [open, close] = ['{{', '}}'];
    advanceBy(context, open.length);

    // the opening "{{" is already consumed, so search from the start
    const closeIndex = context.source.indexOf(close);

    const content = parseTextData(context, closeIndex).trim();
    advanceBy(context, close.length);

    return {
        type: NodeTypes.INTERPOLATION,
        content: {
            type: NodeTypes.SIMPLE_EXPRESSION,
            isStatic: false,
            content,
        },
    };
}

// utils
const advanceBy = (context, numberOfCharacters) => {
    const { source } = context;
    context.source = source.slice(numberOfCharacters);
}

const advanceSpaces = context => {
    const spacesReg = /^[\t\r\n\f ]+/;
    const match = spacesReg.exec(context.source);
    if (match) {
        advanceBy(context, match[0].length);
    }
}

const startsWith = (source, searchString) => {
    return source.startsWith(searchString);
}

const isEnd = context => {
    const s = context.source;
    return !s || startsWith(s, '</');
}

const parseTextData = (context, length) => {
    const rawText = context.source.slice(0, length);
    advanceBy(context, length);
    return rawText;
}
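
The listing above refers to NodeTypes, ElementTypes, isHTMLTag, isVoidTag, and createRoot, none of which are defined in this section; they presumably come from the preparatory work mentioned at the start. To run the code in isolation, stand-ins along the following lines could be placed before it. Every value here is an assumption made for illustration, and the tag lists are deliberately tiny.

```javascript
// Stand-in definitions (assumptions, not the project's real ones).
const NodeTypes = {
    ROOT: 'ROOT',
    ELEMENT: 'ELEMENT',
    TEXT: 'TEXT',
    SIMPLE_EXPRESSION: 'SIMPLE_EXPRESSION',
    INTERPOLATION: 'INTERPOLATION',
    ATTRIBUTE: 'ATTRIBUTE',
    DIRECTIVE: 'DIRECTIVE',
};

const ElementTypes = {
    ELEMENT: 'ELEMENT',
    COMPONENT: 'COMPONENT',
};

// Deliberately short lists; extend them as needed.
const HTML_TAGS = new Set(['div', 'p', 'span', 'ul', 'li', 'a', 'img', 'br', 'hr', 'input']);
const VOID_TAGS = new Set(['img', 'br', 'hr', 'input']);

// Anything not in HTML_TAGS is treated as a component.
const isHTMLTag = tag => HTML_TAGS.has(tag);

// Void tags never have children or a closing tag.
const isVoidTag = tag => VOID_TAGS.has(tag);

// Wrap the top-level nodes in a root node.
const createRoot = children => ({ type: NodeTypes.ROOT, children });

console.log(isHTMLTag('div'), isHTMLTag('my-comp')); // true false
console.log(isVoidTag('br'), isVoidTag('div'));      // true false
```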