Sizzle source code analysis (I): Regular expressions

Preface

This is my first article. I recently read Situ Zhengmei's JavaScript Framework Design, which covers Sizzle, and I wanted to look at its source code, so I am going to write a series of articles reading Sizzle from start to finish. As a front-end beginner I have surely misunderstood plenty, so corrections are very welcome. I am also annotating the source code on GitHub (still under construction). Okay, let's get to it.

Regular expressions

A regular expression can be written in two ways: as a literal, or with the RegExp constructor. With the constructor, the pattern is passed in as a string, and that string is parsed as a string literal first. For example:

// This is a literal
var reg1 = /\\/;
// This uses the constructor
var reg2 = new RegExp('\\\\');

Both regexes above match the same thing: a single \. The string passed to the constructor needs four backslashes because it is parsed twice: the string literal '\\\\' first becomes the two characters \\, and the regex engine then reads \\ as one escaped backslash. The regular expression chapter of the "Little Red Book" (Professional JavaScript for Web Developers) covers this in detail. Regexes built this way are particularly hard to read, and this is exactly how Sizzle builds its regexes.
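A quick way to confirm that the two forms are equivalent is to compare their source text and run them on the same input. A minimal sketch:

```javascript
// Literal form: the regex source is \\ (an escaped backslash)
var reg1 = /\\/;

// Constructor form: the string literal '\\\\' evaluates to the two
// characters \\, which the regex engine then reads as one escaped backslash
var reg2 = new RegExp('\\\\');

console.log(reg1.source);        // \\
console.log(reg2.source);        // \\
console.log(reg1.test('a\\b'));  // true ('a\\b' contains one backslash)
console.log(reg2.test('a\\b'));  // true
console.log('\\\\'.length);      // 2
```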

Sizzle's regular expressions

Sizzle's regexes are mainly used to match selectors, work out which type each selector is, and classify it. Sizzle starts from a few basic regex strings, combines them into larger ones, and finally instantiates them with new RegExp.

Since all of Sizzle's regexes use the constructor form, I add the equivalent literal below each one for easier reading.

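Let's start with the simplest one, booleans, which is just a list of names joined by |. They are the names of HTML boolean attributes, such as checked and disabled, which are true simply by being present and carry no value of their own. (Some of them also exist as value-less pseudo-classes, such as :checked and :disabled, as opposed to pseudo-classes that take an argument, such as :nth-child(3).)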

booleans = "checked|selected|async|autofocus|autoplay|controls|defer|disabled|hidden|" +
	    "ismap|loop|multiple|open|readonly|required|scoped";

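booleans by itself is only a string. Further down, the source wraps it as "^(?:" + booleans + ")$" with the i flag to build matchExpr.bool. The sketch below (isBooleanAttr is a hypothetical helper, not part of Sizzle) shows why the non-capturing group matters: without it, ^ binds only to the first name and $ only to the last.

```javascript
var booleans = "checked|selected|async|autofocus|autoplay|controls|defer|disabled|hidden|" +
	"ismap|loop|multiple|open|readonly|required|scoped";

// Same construction as matchExpr.bool: whole-name, case-insensitive match
var rbool = new RegExp("^(?:" + booleans + ")$", "i");

// Without the group, ^ applies only to "checked" and $ only to "scoped"
var rboolUngrouped = new RegExp("^" + booleans + "$", "i");

// Hypothetical helper, not part of Sizzle
function isBooleanAttr(name) {
	return rbool.test(name);
}

console.log(isBooleanAttr("DISABLED"));            // true
console.log(isBooleanAttr("disabledness"));        // false
console.log(rboolUngrouped.test("disabledness"));  // true: "disabled" matched unanchored
```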
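This is the whitespace class Sizzle defines for itself. Compared with \s, it leaves out \v (and \s also matches Unicode spaces such as U+00A0, which this class does not).

The reason is most likely the CSS Selectors specification, which defines whitespace as exactly five characters: space, tab, line feed, carriage return and form feed. Sizzle's source links to that part of the spec next to this definition.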

whitespace = "[\\x20\\t\\r\\n\\f]"

// literal
copy_whitespace = /[\x20\t\r\n\f]/

\x20 is the character with hexadecimal code 20 (decimal 32), the space
\t is the tab
\r is the carriage return
\n is the line feed (newline)
\f is the form feed
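The difference from \s is easy to check by testing a few characters against both classes. A minimal sketch; the samples table is only for illustration.

```javascript
// Sizzle's whitespace class, built from a string as in the source
var whitespace = "[\\x20\\t\\r\\n\\f]";
var rws = new RegExp(whitespace);

// Characters to compare against JavaScript's built-in \s
var samples = {
	"space": " ",
	"tab": "\t",
	"vertical tab": "\v",
	"no-break space": "\u00a0"
};

Object.keys(samples).forEach(function (label) {
	var ch = samples[label];
	console.log(label, "whitespace:", rws.test(ch), "\\s:", /\s/.test(ch));
});
// space whitespace: true \s: true
// tab whitespace: true \s: true
// vertical tab whitespace: false \s: true
// no-break space whitespace: false \s: true
```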

identifier is probably the core of Sizzle's regexes. It matches names, such as class in .class, or nth-child in :nth-child(3).

identifier = "(?:\\\\[\\da-fA-F]{1,6}" + whitespace +
    "?|\\\\[^\\r\\n\\f]|[\\w-]|[^\0-\\x7f])+"

// literal
copy_identifier = /(?:\\[\da-fA-F]{1,6}[\x20\t\r\n\f]?|\\[^\r\n\f]|[\w-]|[^\x00-\x7f])+/

identifier is a non-capturing group repeated one or more times, and each repetition matches one of four cases:

\\[\da-fA-F]{1,6}[\x20\t\r\n\f]? matches a CSS hexadecimal escape, such as \61 (the letter a). It allows {1,6} digits because a CSS escape may use up to six hex digits, enough for any Unicode code point; one optional whitespace character after the digits ends the escape.
\\[^\r\n\f] matches a backslash followed by any character other than \r, \n or \f, such as \. (an escaped dot).
[\w-] matches a letter, digit, underscore or -.
[^\x00-\x7f] matches any character outside the ASCII range.

This means that characters such as # . ( ) ~ = ! $ are not matched unless they are escaped.
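The four branches can be checked with a small sketch that rebuilds identifier exactly as in the source and anchors it the way ridentifier does. The test strings are made up for illustration.

```javascript
var whitespace = "[\\x20\\t\\r\\n\\f]";
var identifier = "(?:\\\\[\\da-fA-F]{1,6}" + whitespace +
	"?|\\\\[^\\r\\n\\f]|[\\w-]|[^\0-\\x7f])+";

// Anchored the same way as Sizzle's ridentifier
var ridentifier = new RegExp("^" + identifier + "$");

var cases = [
	"nth-child",  // [\w-] only
	"\\61 bc",    // hex escape \61, its terminating space, then "bc"
	"foo\\.bar",  // escaped dot via \\[^\r\n\f]
	"café",       // é via [^\x00-\x7f]
	"foo.bar",    // unescaped dot: not an identifier
	"a#b"         // unescaped #: not an identifier
];

cases.forEach(function (s) {
	console.log(JSON.stringify(s), ridentifier.test(s));
});
```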

attributes matches attribute selectors, such as [attr], [attr=val] and [attr $= 'val'].

attributes = "\\[" + whitespace + "*(" + identifier + ")(?:" + whitespace +
	"*([*^$|!~]?=)" + whitespace +
	"*(?:'((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\"|(" + identifier + "))|)" +
	whitespace + "*\\]";

// The full literal is too long, so identifier and whitespace are left as placeholders
// and the pattern is split across lines for easier reading
copy_attributes = `\[whitespace*(identifier)
    (?:whitespace*([*^$|!~]?=)whitespace*
        (?:'((?:\\.|[^\\'])*)'|"((?:\\.|[^\\"])*)"|(identifier))
    |)whitespace*\]`;

// Fully expanded
true_attributes = /\[[\x20\t\r\n\f]*((?:\\[\da-fA-F]{1,6}[\x20\t\r\n\f]?|\\[^\r\n\f]|[\w-]|[^\x00-\x7f])+)(?:[\x20\t\r\n\f]*([*^$|!~]?=)[\x20\t\r\n\f]*(?:'((?:\\.|[^\\'])*)'|"((?:\\.|[^\\"])*)"|((?:\\[\da-fA-F]{1,6}[\x20\t\r\n\f]?|\\[^\r\n\f]|[\w-]|[^\x00-\x7f])+))|)[\x20\t\r\n\f]*\]/

The attributes regex matches attribute selectors in two forms: with an operator, such as [attr=val], and without one, such as [attr]. The two are told apart by the outer non-capturing group (?: ... |): its empty alternative after | lets the whole operator-and-value part be skipped. The regex looks complicated, but it becomes clear if you read it by capture groups. There are five of them. The first, the attribute name, always has a value; the others only capture when certain conditions are met:

$1 (identifier): the attribute name.
$2 ([*^$|!~]?=): the operator. For [attr] it is undefined.
$3 '((?:\\.|[^\\'])*)': an attribute value in single quotes, such as val in [attr='val'].
$4 "((?:\\.|[^\\"])*)": an attribute value in double quotes, such as val in [attr="val"].
$5 (identifier): an unquoted value, such as val in [attr=val].

Quoted and unquoted values also accept different things. [class=".aa"] matches, but [class=.aa] does not: an unquoted value must be an identifier, and an unescaped . is not an identifier character.
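To see the five groups in action, the sketch below rebuilds attributes exactly as in the source, anchors it the way matchExpr.ATTR does, and prints the groups for a few selectors. attrGroups is a hypothetical helper, not part of Sizzle.

```javascript
var whitespace = "[\\x20\\t\\r\\n\\f]";
var identifier = "(?:\\\\[\\da-fA-F]{1,6}" + whitespace +
	"?|\\\\[^\\r\\n\\f]|[\\w-]|[^\0-\\x7f])+";
var attributes = "\\[" + whitespace + "*(" + identifier + ")(?:" + whitespace +
	"*([*^$|!~]?=)" + whitespace +
	"*(?:'((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\"|(" + identifier + "))|)" +
	whitespace + "*\\]";

// Anchored the same way as matchExpr.ATTR
var rattr = new RegExp("^" + attributes);

// Hypothetical helper (not part of Sizzle): returns [$1, $2, $3, $4, $5], or null on no match
function attrGroups(selector) {
	var m = rattr.exec(selector);
	return m ? m.slice(1, 6) : null;
}

console.log(attrGroups("[attr]"));           // [ 'attr', undefined, undefined, undefined, undefined ]
console.log(attrGroups("[attr $= 'val']"));  // [ 'attr', '$=', 'val', undefined, undefined ]
console.log(attrGroups("[class=\".aa\"]"));  // [ 'class', '=', undefined, '.aa', undefined ]
console.log(attrGroups("[attr=val]"));       // [ 'attr', '=', undefined, undefined, 'val' ]
console.log(attrGroups("[class=.aa]"));      // null
```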

pseudos matches a pseudo-class selector.

pseudos = ":(" + identifier + ")(?:\\((" +
	"('((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\")|" +
	"((?:\\\\.|[^\\\\()[\\]]|" + attributes + ")*)|" +
	".*" +
	")\\)|)";

// literal
copy_pseudos = `:(identifier)(?:\((('((?:\\.|[^\\'])*)'|"((?:\\.|[^\\"])*)")|((?:\\.|[^\\()[\]]|attributes)*)|.*)\)|)`

// fully expanded
true_pseudos = /:((?:\\[\da-fA-F]{1,6}[\x20\t\r\n\f]?|\\[^\r\n\f]|[\w-]|[^\x00-\x7f])+)(?:\((('((?:\\.|[^\\'])*)'|"((?:\\.|[^\\"])*)")|((?:\\.|[^\\()[\]]|\[[\x20\t\r\n\f]*((?:\\[\da-fA-F]{1,6}[\x20\t\r\n\f]?|\\[^\r\n\f]|[\w-]|[^\x00-\x7f])+)(?:[\x20\t\r\n\f]*([*^$|!~]?=)[\x20\t\r\n\f]*(?:'((?:\\.|[^\\'])*)'|"((?:\\.|[^\\"])*)"|((?:\\[\da-fA-F]{1,6}[\x20\t\r\n\f]?|\\[^\r\n\f]|[\w-]|[^\x00-\x7f])+))|)[\x20\t\r\n\f]*\])*)|.*)\)|)/

Pseudo-classes come in two forms: without an argument, such as :after, and with one, such as :nth-child(3). The pseudos regex has 11 capture groups, but $7 to $11 all come from the embedded attributes pattern, so only the first six are listed here:

$1 (identifier): the pseudo-class name, such as nth-child in :nth-child(3) or after in :after.
$2 \((...)\): the outermost group, which captures everything inside the parentheses, such as 3 in :nth-child(3), "3" in :nth-child("3"), '3' in :nth-child('3'), and ".className" in :not(".className").
$3 ('((?:\\.|[^\\'])*)'|"((?:\\.|[^\\"])*)"): a quoted string inside the parentheses, quotes included, such as "3" in :nth-child("3") or '3' in :nth-child('3').
$4 ((?:\\.|[^\\'])*): the value inside single quotes, such as 3 in :nth-child('3').
$5 ((?:\\.|[^\\"])*): the value inside double quotes, such as 3 in :nth-child("3").
$6 ((?:\\.|[^\\()[\]]|attributes)*): an unquoted argument made of escaped characters, characters other than \ ( ) [ ], and attribute selectors, such as 3 in :nth-child(3) or [href = 'aaa'] in :not([href = 'aaa']).

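When a pseudo-class has parentheses, usually one of $3 and $6 has a value. There is a special case, though: in :not(:nth-child(3)) the argument itself contains parentheses, so neither the quoted branch nor the simple branch can match and only the final .* branch does. $2 is then :nth-child(3), while both $3 and $6 are undefined. Sizzle uses whether $6 has a value to tell some of these cases apart.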
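The same approach works for pseudos. The sketch rebuilds the pattern, anchors it like matchExpr.PSEUDO, and reports $1, $2, $3 and $6 for the examples above; pseudoGroups is a hypothetical helper, not part of Sizzle. The last call shows the special case where only the catch-all .* branch matches.

```javascript
var whitespace = "[\\x20\\t\\r\\n\\f]";
var identifier = "(?:\\\\[\\da-fA-F]{1,6}" + whitespace +
	"?|\\\\[^\\r\\n\\f]|[\\w-]|[^\0-\\x7f])+";
var attributes = "\\[" + whitespace + "*(" + identifier + ")(?:" + whitespace +
	"*([*^$|!~]?=)" + whitespace +
	"*(?:'((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\"|(" + identifier + "))|)" +
	whitespace + "*\\]";
var pseudos = ":(" + identifier + ")(?:\\((" +
	"('((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\")|" +
	"((?:\\\\.|[^\\\\()[\\]]|" + attributes + ")*)|" +
	".*" +
	")\\)|)";

// Anchored the same way as matchExpr.PSEUDO
var rpseudoStart = new RegExp("^" + pseudos);

// Hypothetical helper (not part of Sizzle): picks out $1, $2, $3 and $6
function pseudoGroups(selector) {
	var m = rpseudoStart.exec(selector);
	return { name: m[1], inner: m[2], quoted: m[3], simple: m[6] };
}

console.log(pseudoGroups(":after"));                // only name is set
console.log(pseudoGroups(":nth-child(3)"));         // inner and simple are '3'
console.log(pseudoGroups(":not(\".className\")"));  // inner and quoted are '".className"'
console.log(pseudoGroups(":not([href = 'aaa'])"));  // simple is "[href = 'aaa']"
console.log(pseudoGroups(":not(:nth-child(3))"));   // inner is ':nth-child(3)'; quoted and simple undefined
```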

Generating the regex instances


var rwhitespace = new RegExp( whitespace + "+", "g" ),  // runs of whitespace
    rtrim = new RegExp( "^" + whitespace + "+|((?:^|[^\\\\])(?:\\\\.)*)" + whitespace + "+$", "g" ), // leading and unescaped trailing whitespace
    rcomma = new RegExp( "^" + whitespace + "*," + whitespace + "*" ),  // comma
    rcombinators = new RegExp( "^" + whitespace + "*([>+~]|" + whitespace + ")" + whitespace + "*" ),   // combinators
    rdescend = new RegExp( whitespace + "|>" ), // descendant or child combinator
    rpseudo = new RegExp( pseudos ), // pseudo-class
    ridentifier = new RegExp( "^" + identifier + "$" ), // a complete identifier
    matchExpr = {
		"ID": new RegExp( "^#(" + identifier + ")" ),
		"CLASS": new RegExp( "^\\.(" + identifier + ")" ),
		"TAG": new RegExp( "^(" + identifier + "|[*])" ),
		"ATTR": new RegExp( "^" + attributes ),
		"PSEUDO": new RegExp( "^" + pseudos ),
		// CHILD as a literal (W stands for whitespace):
		// /^:(only|first|last|nth|nth-last)-(child|of-type)(?:\(W*(even|odd|(([+-]|)(\d*)n|)W*(?:([+-]|)W*(\d+)|))W*\)|)/i
		// $1 only|first|last|nth|nth-last
		// $2 child|of-type
		// $3 everything inside the parentheses, such as 2n+1, 2n, 1, even, odd
		// $4 the "an" part, such as 2n in 2n+1
		// $5 the sign of a: + or - (empty if absent)
		// $6 the digits of a, such as 2 in 2n+1 (empty for n or -n)
		// $7 the operator before b: + or -
		// $8 the last number b, such as 1 in 2n+1
		"CHILD": new RegExp( "^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\(" +
			whitespace + "*(even|odd|(([+-]|)(\\d*)n|)" + whitespace + "*(?:([+-]|)" +
			whitespace + "*(\\d+)|))" + whitespace + "*\\)|)", "i" ),
		"bool": new RegExp( "^(?:" + booleans + ")$", "i" ),
		"needsContext": new RegExp( "^" + whitespace +
			"*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\(" + whitespace +
			"*((?:-\\d)?\\d*)" + whitespace + "*\\)|)(?=[^-]|$)", "i" )
	},
	rhtml = /HTML$/i,
	rinputs = /^(?:input|select|textarea|button)$/i,
	rheader = /^h\d$/i, // h1 h2 h3 h4 h5 h6 ...
	rnative = /^[^{]+\{\s*\[native \w/,
	rquickExpr = /^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/, // quickly find an ID, TAG or CLASS
	rsibling = /[+~]/;  // sibling combinators
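A few of these instances are easier to understand by running them. The sketch below copies rtrim, CHILD (named rchild here) and rquickExpr from the list above. parseNth is a hypothetical helper that turns the CHILD groups into the a and b of an+b; it is a simplified illustration written for this article, not Sizzle's own CHILD preFilter.

```javascript
var whitespace = "[\\x20\\t\\r\\n\\f]";

// Copied from the list above
var rtrim = new RegExp( "^" + whitespace + "+|((?:^|[^\\\\])(?:\\\\.)*)" + whitespace + "+$", "g" );
var rchild = new RegExp( "^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\(" +
	whitespace + "*(even|odd|(([+-]|)(\\d*)n|)" + whitespace + "*(?:([+-]|)" +
	whitespace + "*(\\d+)|))" + whitespace + "*\\)|)", "i" );
var rquickExpr = /^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/;

// Hypothetical helper (not part of Sizzle): reduce an nth argument to the a and b of an+b.
// Returns null when the selector has no argument in parentheses.
function parseNth( selector ) {
	var m = rchild.exec( selector );
	if ( !m || m[3] === undefined ) {
		return null;
	}
	var arg = m[3].toLowerCase();
	if ( arg === "even" ) {
		return { a: 2, b: 0 };
	}
	if ( arg === "odd" ) {
		return { a: 2, b: 1 };
	}
	// $4 is the "an" part; $5 is its sign and $6 its digits ("" means 1, as in n or -n)
	var a = m[4] ? Number( m[5] + ( m[6] || "1" ) ) : 0;
	// $7 is the operator before b and $8 its digits
	var b = m[8] ? Number( m[7] + m[8] ) : 0;
	return { a: a, b: b };
}

console.log( parseNth( ":nth-child(2n+1)" ) );    // { a: 2, b: 1 }
console.log( parseNth( ":nth-child(-n + 3)" ) );  // { a: -1, b: 3 }
console.log( parseNth( ":nth-of-type(odd)" ) );   // { a: 2, b: 1 }
console.log( parseNth( ":first-child" ) );        // null

// rtrim: replacing with $1 keeps whatever precedes unescaped trailing whitespace
console.log( JSON.stringify( "  div > p  ".replace( rtrim, "$1" ) ) );  // "div > p"
console.log( JSON.stringify( "a\\  ".replace( rtrim, "$1" ) ) );        // "a\\ " (escaped space kept)

// rquickExpr: $1 is an ID, $2 a tag name, $3 a class
console.log( rquickExpr.exec( "#main" )[1] );  // main
console.log( rquickExpr.exec( ".item" )[3] );  // item
console.log( rquickExpr.exec( "div.item" ) );  // null
```

Note how rtrim treats an escaped trailing space: in a\ followed by one more space, only the final unescaped space is removed.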

Those are the main regular expressions in Sizzle.
