This article grew out of reading the parseHTML method in the Vue source code. Few articles online analyze the regular expressions it uses, and the ones I did find were vague in places. So I sat down and analyzed each expression one by one, for future reference.

Common regex rules are listed in Appendix 1, which covers the rules used by the regular expressions in Vue's parseHTML.

All the regular expressions used in Vue's parseHTML are as follows:

const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const dynamicArgAttribute = /^\s*((?:v-[\w-]+:|@|:|#)\[[^=]+\][^\s"'<>\/=]*)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/
const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeRegExp.source}]*`
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)
const startTagClose = /^\s*(\/?)>/
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)
const doctype = /^<!DOCTYPE [^>]+>/i
const comment = /^<!\--/
const conditionalComment = /^<!\[/

The sections below break these expressions down one by one.

attribute

const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

Analysis of its structure:

  1. ^\s* matches zero or more whitespace characters at the start of the string.

  2. Capture group ([^\s"'<>\/=]+) matches and captures one or more characters other than whitespace, ", ', <, >, / and =. This is the attribute name.

  3. Non-capture group (?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?

  • \s* matches zero or more whitespace characters
  • Capture group (=) matches and captures =
  • \s* matches zero or more whitespace characters
  • Non-capture group (?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)) matches one of three alternatives:
    • "([^"]*)"+
      • " matches the opening "
      • ([^"]*) matches and captures zero or more characters other than "
      • "+ matches one or more closing "
    • '([^']*)'+
      • ' matches the opening '
      • ([^']*) matches and captures zero or more characters other than '
      • '+ matches one or more closing '
    • ([^\s"'=<>`]+) matches and captures one or more characters other than whitespace, ", ', =, <, > and `
  • The trailing ? makes the whole non-capture group of item 3 optional: it matches zero or one time

Summary

The attribute expression matches:

  1. Starts with zero or more whitespace characters;

  2. Followed by one or more characters other than whitespace, ", ', <, >, / and = (the attribute name);

  3. Followed by zero or more whitespace characters;

  4. And then =;

  5. Followed by zero or more whitespace characters;

  6. Followed by one of:

    (1) " + zero or more characters other than " + one or more ";

    (2) or ' + zero or more characters other than ' + one or more ';

    (3) or one or more characters other than whitespace, ", ', =, <, > and `

Items 3 to 6 belong to a single optional group, so they match together or not at all. An attribute without a value, such as disabled, therefore matches on its name alone.

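Such as:

<div id="mydiv" class="myClass" style="color: #ff0000" >

Once the tag name <div has been consumed, applying the expression repeatedly to the rest of the string matches id="mydiv", class="myClass" and style="color: #ff0000" in turn.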

dynamicArgAttribute

const dynamicArgAttribute = /^\s*((?:v-[\w-]+:|@|:|#)\[[^=]+\][^\s"'<>\/=]*)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

Analysis of its structure:

  1. ^\s* matches zero or more whitespace characters at the start of the string.

  2. Capture group ((?:v-[\w-]+:|@|:|#)\[[^=]+\][^\s"'<>\/=]*)

  • Non-capture group (?:v-[\w-]+:|@|:|#) matches:

    (1) v- + one or more word characters (letters, digits, underscore) or hyphens + :

    (2) or @

    (3) or :

    (4) or #

  • \[[^=]+\] matches [ + one or more characters other than = + ]

  • [^\s"'<>\/=]* matches zero or more characters other than whitespace, ", ', <, >, / and =

  3. Non-capture group (?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?

This part is identical to the one already analyzed in the attribute section.

Summary

dynamicArgAttribute matches:

  1. Starts with zero or more whitespace characters;

  2. Followed by:

    (1) v- + one or more word characters or hyphens + :;

    (2) or @

    (3) or :

    (4) or #

  3. Followed by [ + one or more characters other than = + ];

  4. Followed by zero or more characters other than whitespace, ", ', <, >, / and =;

  5. Followed by zero or more whitespace characters;

  6. And then =;

  7. Followed by zero or more whitespace characters;

  8. Followed by one of:

    (1) " + zero or more characters other than " + one or more ";

    (2) or ' + zero or more characters other than ' + one or more ';

    (3) or one or more characters other than whitespace, ", ', =, <, > and `

As with attribute, items 5 to 8 belong to a single optional group and match together or not at all.

Such as:

<a v-bind:[attributeName]="url"> ... </a>
<a v-on:[eventName]="doSomething"> ... </a>

In Vue's parseHTML, this expression extracts dynamic-argument attributes such as v-bind:[attributeName]="url".
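As a rough check of the four prefix forms, the sketch below runs the article's regex against a few hand-written samples. The sample strings and the matchDynamic helper are assumptions made for illustration.

```javascript
// Regex copied from the article.
const dynamicArgAttribute = /^\s*((?:v-[\w-]+:|@|:|#)\[[^=]+\][^\s"'<>\/=]*)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/

// Hypothetical helper: return the attribute name and double-quoted value,
// or null when the text does not start with a dynamic-argument attribute.
function matchDynamic(text) {
  const m = text.match(dynamicArgAttribute)
  return m ? { name: m[1], value: m[3] } : null
}

console.log(matchDynamic(' v-bind:[attributeName]="url"'))   // name: 'v-bind:[attributeName]', value: 'url'
console.log(matchDynamic(' v-on:[eventName]="doSomething"')) // name: 'v-on:[eventName]', value: 'doSomething'
console.log(matchDynamic(' :[key]="value"'))                 // name: ':[key]'
console.log(matchDynamic(' @[event]="handler"'))             // name: '@[event]'
console.log(matchDynamic(' id="plain"'))                     // null: no dynamic argument
```

An ordinary attribute such as id="plain" fails here and is left for the attribute expression to handle.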

ncname

const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeRegExp.source}]*`

First, look at unicodeRegExp:

const unicodeRegExp = /a-zA-Z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-\u200D\u203F-\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD/

It defines the ranges of legal name characters across the Unicode character set.

unicodeRegExp.source returns the source text of the regular expression unicodeRegExp as a string, so that it can be spliced into the character class of ncname.

ncname therefore describes a legal name without a colon: a letter or underscore, followed by zero or more hyphens, dots, digits, underscores, letters or characters from the unicodeRegExp ranges.

qnameCapture

const qnameCapture = `((?:${ncname}\\:)?${ncname})`

Matches and captures a name of the form xxx:xxx or xxx.
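Because ncname and qnameCapture are plain strings rather than regular expressions, they only become usable once passed to new RegExp. The sketch below does that with a hypothetical wrapper, qnameOnly, which anchors the fragment at both ends so that whole names can be tested; the sample names are assumptions.

```javascript
// Definitions copied from the article.
const unicodeRegExp = /a-zA-Z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-\u200D\u203F-\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD/
const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeRegExp.source}]*`
const qnameCapture = `((?:${ncname}\\:)?${ncname})`

// Hypothetical wrapper: anchor the fragment so it must cover the whole input.
const qnameOnly = new RegExp(`^${qnameCapture}$`)

console.log(qnameOnly.test('div'))          // true: plain name
console.log(qnameOnly.test('svg:rect'))     // true: prefix:name
console.log(qnameOnly.test('my-component')) // true: hyphen allowed after the first character
console.log(qnameOnly.test('1abc'))         // false: must start with a letter or underscore
console.log(qnameOnly.test('a:b:c'))        // false: at most one colon
```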

startTagOpen

const startTagOpen = new RegExp(`^<${qnameCapture}`)

startTagOpen matches the start of an opening tag, that is, the pattern <xxx:xxx or <xxx.

<xxx:xxx is a tag with a namespace prefix. The prefix specifies the tag's namespace to avoid naming conflicts. Vue parses this type of tag as well.

Such as: <div or <svg:rect
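A small sketch of startTagOpen in use follows. The definitions are copied from the article; readTagName is a hypothetical helper and the input strings are made up for illustration.

```javascript
// Definitions copied from the article.
const unicodeRegExp = /a-zA-Z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-\u200D\u203F-\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD/
const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeRegExp.source}]*`
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const startTagOpen = new RegExp(`^<${qnameCapture}`)

// Hypothetical helper: return the tag name if the text starts an opening tag.
function readTagName(html) {
  const m = html.match(startTagOpen)
  return m ? m[1] : null
}

console.log(readTagName('<div id="app">'))        // 'div'
console.log(readTagName('<svg:rect width="10">')) // 'svg:rect'
console.log(readTagName('</div>'))                // null: an end tag, not a start tag
console.log(readTagName('< div>'))                // null: no space allowed after <
```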

startTagClose

const startTagClose = /^\s*(\/?)>/

^\s*(\/?)> matches a string that begins with zero or more whitespace characters, followed by an optional slash (which is captured), followed by >.

Such as: > or />
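The captured slash is what distinguishes a self-closing tag from an ordinary one. A minimal sketch, with readStartTagClose as a hypothetical helper:

```javascript
// Regex copied from the article.
const startTagClose = /^\s*(\/?)>/

// Hypothetical helper: report how many characters close the start tag
// and whether the tag is self-closing.
function readStartTagClose(rest) {
  const m = rest.match(startTagClose)
  if (!m) return null
  return { consumed: m[0].length, selfClosing: m[1] === '/' }
}

console.log(readStartTagClose('>text'))    // { consumed: 1, selfClosing: false }
console.log(readStartTagClose('  />'))     // { consumed: 4, selfClosing: true }
console.log(readStartTagClose(' id="x">')) // null: attributes still remain
```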

endTag

const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)

Matches a string that starts with </xxx:xxx or </xxx, followed by zero or more characters other than >, and then >.

Such as: </div>
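The following sketch exercises endTag on a few made-up inputs. The definitions come from the article; readEndTagName is a hypothetical helper.

```javascript
// Definitions copied from the article.
const unicodeRegExp = /a-zA-Z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-\u200D\u203F-\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD/
const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z${unicodeRegExp.source}]*`
const qnameCapture = `((?:${ncname}\\:)?${ncname})`
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`)

// Hypothetical helper: return the name of the end tag at the start of the text.
function readEndTagName(html) {
  const m = html.match(endTag)
  return m ? m[1] : null
}

console.log(readEndTagName('</div>'))       // 'div'
console.log(readEndTagName('</svg:rect >')) // 'svg:rect': [^>]* absorbs the stray space
console.log(readEndTagName('<div>'))        // null: a start tag
```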

doctype

const doctype = /^<!DOCTYPE [^>]+>/i

Matches a string that starts with <!DOCTYPE and a space, followed by one or more characters other than >, and then >. Note that this pattern is case insensitive.

Such as: <!DOCTYPE html>

comment

const comment = /^<!\--/

Matches a string that starts with <!--, the opening of an HTML comment.

Such as: <!-- a comment -->

conditionalComment

const conditionalComment = /^<!\[/

Matches a string that starts with <![, the opening of a conditional comment.
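The doctype, comment and conditionalComment expressions all test only the beginning of the input, so they can be tried together. The sketch below assumes the three patterns are written as shown; classify is a hypothetical helper, and the order of its checks is a choice made for this example rather than a statement about Vue's code.

```javascript
// Patterns as assumed for this sketch.
const doctype = /^<!DOCTYPE [^>]+>/i
const comment = /^<!\--/
const conditionalComment = /^<!\[/

// Hypothetical helper: name the kind of markup the text starts with.
function classify(html) {
  if (comment.test(html)) return 'comment'
  if (conditionalComment.test(html)) return 'conditionalComment'
  if (doctype.test(html)) return 'doctype'
  return 'other'
}

console.log(classify('<!DOCTYPE html>'))    // 'doctype'
console.log(classify('<!doctype html>'))    // 'doctype': the i flag ignores case
console.log(classify('<!-- a comment -->')) // 'comment'
console.log(classify('<![if !IE]>'))        // 'conditionalComment'
console.log(classify('<div>'))              // 'other'
```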

Conclusion

Taking the parseHTML method in the Vue source code as an example, this article analyzed the regular expressions it defines and summarized the common regex rules. These breakdowns can also serve as a reference when analyzing the parseHTML method itself.

Appendix 1 Common regex rules

Special characters in regular expressions

* ? + . [ ] ( ) { } | ^ $, 13 in total.
  1. * Matches the preceding character (or parenthesized expression, or bracketed character set) zero or more times;
  2. ? (1) Matches the preceding character (or parenthesized expression, or bracketed character set) zero or one time; (2) when it follows another quantifier (such as * or +), it makes the match non-greedy, so that it matches as little as possible;
  3. + Matches the preceding character (or parenthesized expression, or bracketed character set) one or more times;
  4. . Matches any single character except a newline;
  5. [] Character set; matches one of the characters inside the square brackets. Most special characters are treated as ordinary characters inside the brackets;
  6. () Capture group; matches the subexpression inside it and captures the match;
  7. {n,m} Matches the preceding expression at least n times and at most m times;
  8. | Alternation (or), often used inside groups;
  9. ^ Matches the start of the string;
  10. $ Matches the end of the string
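The rules above can be checked directly in JavaScript. The patterns and inputs in this sketch are made up for illustration; each row pairs a pattern with the result the rules predict.

```javascript
// Each row: [pattern, input, result predicted by the rules above]
const checks = [
  [/^ab*c$/, 'ac', true],       // * : zero or more b
  [/^ab+c$/, 'ac', false],      // + : at least one b required
  [/^ab?c$/, 'abbc', false],    // ? : at most one b
  [/^a.c$/, 'a-c', true],       // . : any single character
  [/^a.c$/, 'a\nc', false],     // . : but not a newline
  [/^a{2,3}$/, 'aaaa', false],  // {n,m} : between 2 and 3 repetitions
  [/^(ab|cd)$/, 'cd', true],    // | : alternation inside a group
  [/^[.*]$/, '*', true]         // inside [], . and * are literal
]
const allAsExpected = checks.every(([re, input, expected]) => re.test(input) === expected)
console.log(allAsExpected) // true

// A ? after a quantifier switches it from greedy to non-greedy.
const greedy = '<b>x</b>'.match(/<.+>/)[0]  // '<b>x</b>'
const lazy = '<b>x</b>'.match(/<.+?>/)[0]   // '<b>'
console.log(greedy, lazy)
```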

Common forms of the group expression ()

  1. (pattern) Matches pattern and captures the match;
  2. (?:pattern) Matches pattern but does not capture the match;
  3. pattern1(?=pattern2) Matches pattern1 only when it is followed by pattern2; pattern2 is not part of the match
  4. pattern1(?!pattern2) Matches pattern1 only when it is not followed by pattern2; pattern2 is not part of the match
  5. (?<=pattern2)pattern1 Matches pattern1 only when it is preceded by pattern2; pattern2 is not part of the match
  6. (?<!pattern2)pattern1 Matches pattern1 only when it is not preceded by pattern2; pattern2 is not part of the match
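Here is a short sketch of the six group forms, using made-up inputs. It assumes a JavaScript engine with lookbehind support, such as a current Node.js.

```javascript
// 1. Capture group: both parts are captured.
const captured = '2024-05'.match(/(\d+)-(\d+)/)      // captured[1] = '2024', captured[2] = '05'

// 2. Non-capture group: the first part is matched but not captured.
const nonCaptured = '2024-05'.match(/(?:\d+)-(\d+)/) // nonCaptured[1] = '05'

// 3. Lookahead: digits followed by USD; USD is not part of the match.
const beforeUsd = '100USD'.match(/\d+(?=USD)/)[0]    // '100'

// 4. Negative lookahead: foo not followed by bar.
const notBar = [/foo(?!bar)/.test('foobar'), /foo(?!bar)/.test('foobaz')] // [false, true]

// 5. Lookbehind: digits preceded by a dollar sign.
const afterDollar = 'cost: $42'.match(/(?<=\$)\d+/)[0] // '42'

// 6. Negative lookbehind: b not preceded by a.
const notAfterA = [/(?<!a)b/.test('ab'), /(?<!a)b/.test('cb')] // [false, true]

console.log(captured[1], nonCaptured[1], beforeUsd, notBar, afterDollar, notAfterA)
```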

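Common forms of the character set []

  1. x|y Matches x or y, where x and y may be multi-character strings. This is alternation and is written without square brackets; inside [], | is a literal character
  2. [xyz] Matches any one of the characters x, y or z
  3. [^xyz] Matches any one character other than x, y and z
  4. [a-z] Matches any lowercase letter from a to z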
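The difference between alternation and a character set is easy to get wrong, so here is a brief sketch with made-up inputs:

```javascript
const charset = {
  inSet: /^[xyz]$/.test('y'),            // true: y is in the set
  negated: /^[^xyz]$/.test('y'),         // false: y is excluded
  negatedOther: /^[^xyz]$/.test('a'),    // true: a is not x, y or z
  range: /^[a-z]$/.test('q'),            // true: lowercase letter
  rangeUpper: /^[a-z]$/.test('Q'),       // false: uppercase is outside a-z
  pipeInSet: /^[x|y]$/.test('|'),        // true: | is literal inside []
  alternation: /^(?:foo|bar)$/.test('bar') // true: | outside [] means "or"
}
console.log(charset)
```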

Common special characters

  1. \b Matches a word boundary
  2. \d Matches a digit
  3. \n Matches a newline character
  4. \r Matches a carriage return
  5. \t Matches a tab character
  6. \s Matches any whitespace character
  7. \w Matches any word character: a letter, a digit or an underscore
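A few quick checks of these escapes, with inputs made up for illustration:

```javascript
// \b: "cat" as a whole word, but not inside "concat".
const wordBoundary = [/\bcat\b/.test('a cat sat'), /\bcat\b/.test('concat')] // [true, false]

// \d: a string made only of digits.
const digits = /^\d+$/.test('2024') // true

// \s: tab, newline and space all count as whitespace.
const pieces = 'a\tb\nc d'.split(/\s/) // ['a', 'b', 'c', 'd']

// \r and \n: a Windows-style line ending.
const crlf = /\r\n/.test('line one\r\nline two') // true

// \w: letters, digits and underscore, but not a hyphen.
const word = [/^\w+$/.test('snake_case1'), /^\w+$/.test('kebab-case')] // [true, false]

console.log(wordBoundary, digits, pieces, crlf, word)
```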