To understand how a regular expression breaks down into parts, you must first know its structures and operators.
1. Structure
The common structures in JS regular expressions are as follows:
Structure | Description | Example |
---|---|---|
Literal character | Matches one specific character | `a`, `\n` |
Character class | Matches any one of a set of characters | `[a-z]`, `\w` |
Anchor | Matches a position | `^`, `$`, `\b`, `(?=...)` |
Quantifier | Specifies how many times in a row the preceding item occurs | `a{1,3}` means that `a` occurs one to three times in a row |
Group | Treats the parenthesized part as a single unit | `(123)+` means that "123" occurs one or more times in a row |
Backreference | Refers to what an earlier group matched | `\1` refers to the first group |
Branch | Chooses one of several subexpressions | `ab\|cd` matches the string "ab" or "cd" |
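As a quick illustration of the table, the sketch below runs one small check per structure. The object name `structureChecks` and the sample strings are made up for this example.

```javascript
// One small check per structure from the table above.
const structureChecks = {
  literal: /a/.test("cat"),               // a specific character
  characterClass: /[a-z]/.test("7x"),     // any one lowercase letter
  anchor: /^\d+$/.test("2024"),           // positions: start and end of string
  quantifier: /^a{1,3}$/.test("aa"),      // one to three consecutive "a"
  group: /^(123)+$/.test("123123"),       // "123" repeated as a unit
  backreference: /^(\d)\1$/.test("77"),   // the same digit twice
  branch: /^(?:ab|cd)$/.test("cd"),       // "ab" or "cd"
};

console.log(structureChecks); // every value is true
```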
A few notes on grouping and backreferences:
- With nested parentheses, groups are numbered by the order of their opening parentheses. For example:

    var reg = /(\d(\d))\2\1/;
    var string = "34434";
    console.log( reg.test(string) );
    // true
    console.log( RegExp.$1 );
    // "34"
    console.log( RegExp.$2 );
    // "4"

- When a replacement string refers to a group that does not exist, the reference is kept as literal text:

    var regex = /(\d)\d/;
    var string = "123";
    string.replace(regex, "$2"); // "$23"

- How `$10` is resolved (is it `$1` followed by `'0'`, or group 10?): it refers to group 10 when the pattern has at least ten groups, and otherwise to `$1` followed by a literal `0`.

    var regex = /(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)/;
    var string = "1234567890";
    string.replace(regex, "$10"); // "0"

    var regex = /(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)/;
    var string = "1234567890";
    string.replace(regex, "$10"); // "100"

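The notes above can also be checked without the legacy `RegExp.$1` properties. The following is a minimal sketch that reads groups from the array returned by `match` and uses a replacer function to sidestep the `$10` ambiguity; the variable names are illustrative only.

```javascript
// Groups are numbered by the position of their opening parenthesis.
const nestedPattern = /(\d(\d))\2\1/;
const nestedMatch = "34434".match(nestedPattern);
// nestedMatch[0] is the whole match, [1] the outer group, [2] the inner group.
const outerGroup = nestedMatch[1]; // "34"
const innerGroup = nestedMatch[2]; // "4"

// "$2" names a group that does not exist here, so it stays literal text.
const missingGroup = "123".replace(/(\d)\d/, "$2"); // "$23"

// A replacer function receives the groups as arguments, so "group 1
// followed by 0" can be stated explicitly instead of writing "$10".
const nineGroups = /(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)/;
const firstThenZero = "1234567890".replace(nineGroups, (match, g1) => g1 + "0"); // "100"

console.log(outerGroup, innerGroup, missingGroup, firstThenZero);
```

A replacer function is slightly more verbose than a replacement string, but it makes the intent independent of how many groups the pattern happens to contain.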
2. Operators
The basic operators, from highest to lowest priority, are:
Name | Symbols | Priority |
---|---|---|
Escape | `\` | 1 |
Parentheses and square brackets | `(...)`, `(?:...)`, `(?=...)`, `(?!...)`, `[...]` | 2 |
Quantifiers | `?`, `*`, `+`, `{m}`, `{m,n}`, `{m,}` | 3 |
Position and sequence | `^`, `$`, `\metacharacter`, ordinary characters | 4 |
Pipe | `\|` | 5 |
A brief note on metacharacters. The metacharacters used in regular expressions include:

    ^ $ . * + ? | \ / ( ) [ ] { } = ! : - ,

To match a metacharacter literally, escape it with `\`. Escaping an ordinary character that has no special escape sequence of its own (unlike `\d` or `\n`) generally yields the character itself.
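When a literal string has to be embedded in a pattern, the escaping rule can be applied mechanically. The helper below is a common sketch, not a built-in; the name `escapeRegExp` and the sample text are assumptions made for this example.

```javascript
// Hypothetical helper: put a backslash before each character that would
// otherwise act as a regex operator outside a character class.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const sampleText = "1+1=2 (really?)";
const escapedSource = escapeRegExp(sampleText);

// Without escaping, "+", "(", ")" and "?" are treated as operators.
const unescapedMatches = new RegExp("^" + sampleText + "$").test(sampleText);
const escapedMatches = new RegExp("^" + escapedSource + "$").test(sampleText);

console.log(escapedSource);    // 1\+1=2 \(really\?\)
console.log(unescapedMatches); // false
console.log(escapedMatches);   // true
```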
3. Case analysis and points to note
1. Example
Consider the following regular expression:

    /^a(b|cd?)+|e/

- In this expression the parentheses have the highest precedence, so `(b|cd?)` is a single unit.
- Inside the parentheses `(b|cd?)`, the quantifier has the highest precedence, so `d?` is a single unit.
- Inside the parentheses `(b|cd?)`, the branch `|` has the lowest precedence, so `b` is one unit and `cd?` is another.
- By the same reasoning, the whole expression splits into `^`, `a`, `(...)+`, and `e`. Because of the top-level branch, it divides into two parts: `^a(b|cd?)+` and `e`.
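The breakdown above can be checked by comparing the original expression with a version in which every unit is wrapped in an explicit non-capturing group. This is only a sketch; the sample strings are chosen for illustration.

```javascript
const original = /^a(b|cd?)+|e/;
// Same structure with the implicit units spelled out:
// (^a followed by one or more of (b | c then optional d)) | (e)
const explicit = /(?:^a(?:b|c(?:d?))+)|(?:e)/;

const sampleInputs = ["ab", "acd", "abcdc", "xe", "a", "xab"];
const originalResults = sampleInputs.map((s) => original.test(s));
const explicitResults = sampleInputs.map((s) => explicit.test(s));

console.log(originalResults); // [ true, true, true, true, false, false ]
console.log(explicitResults); // [ true, true, true, true, false, false ]
```

Note that "xe" matches through the second branch: the `^` anchor belongs only to the first branch, so `e` may appear anywhere in the string.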
2. Points to note
Structures and operators that look alike can produce very different results, so they must be used with care.
1. Pay attention to operator priority
Suppose the target string must be exactly ab or cd, anchored at both ends. If you are not careful, you might write `/^ab|cd$/`.
Because anchors and characters have a higher priority than the pipe `|`, that pattern is structured as the two units `^ab` and `cd$`, not as `^ab$` and `^cd$`.
It should be changed to `/^(ab|cd)$/`.
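The difference between the two patterns shows up as soon as the input has extra characters. A short sketch, with sample inputs made up for this example:

```javascript
const loosePattern = /^ab|cd$/;     // "starts with ab" OR "ends with cd"
const strictPattern = /^(ab|cd)$/;  // the whole string is "ab" or "cd"

const priorityInputs = ["ab", "cd", "abx", "xcd"];
const looseResults = priorityInputs.map((s) => loosePattern.test(s));
const strictResults = priorityInputs.map((s) => strictPattern.test(s));

console.log(looseResults);  // [ true, true, true, true ]
console.log(strictResults); // [ true, true, false, false ]
```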
2. Groups cannot be used inside a character class [...]
For example, to match ab or 0-9, you cannot simply write `/[(ab)0-9]/`: inside a character class, `(` and `)` are treated as ordinary characters, not as a group.
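A small sketch of what the mistaken class actually matches, next to one possible way to express "ab or a digit" with a branch instead. The names `classWithParens` and `branchPattern` are illustrative.

```javascript
// Inside [...], "(" and ")" are just two more members of the class.
const classWithParens = /^[(ab)0-9]$/;
const parenIsMember = classWithParens.test("(");   // true
const singleAMatches = classWithParens.test("a");  // true: "a" alone is accepted
const abRejected = classWithParens.test("ab");     // false: a class matches one character

// One possible fix: use a branch between the sequence and the class.
const branchPattern = /^(?:ab|[0-9])$/;
const branchResults = ["ab", "7", "a", "("].map((s) => branchPattern.test(s));

console.log(parenIsMember, singleAMatches, abRejected);
console.log(branchResults); // [ true, true, false, false ]
```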
3. Character classes cannot be nested inside a character class
For example, to match a or 0-9, some creative people write `/[a[0-9]]/`. The character class then runs from the first `[` to the first `]`, and the final `]` is a literal character. The tests are as follows:

    /[a[0-9]]/.test("9")
    // false
    /[a[0-9]]/.test("a")
    // false
    /[a[0-9]]/.test("a]")
    // true

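The intended pattern simply lists everything in one flat class. The sketch below assumes a regex without the `v` flag (which changes how brackets inside a class are parsed); the variable names are made up for this example.

```javascript
// Flat class: "a" or any digit.
const flatClass = /^[a0-9]$/;
const flatResults = ["9", "a", "a]"].map((s) => flatClass.test(s));

// The mistaken pattern is the class [a[0-9] (members: "a", "[", digits)
// followed by a literal "]", so it even accepts the string "[]".
const mistakenClass = /^[a[0-9]]$/;
const bracketPairMatches = mistakenClass.test("[]");

console.log(flatResults);        // [ true, true, false ]
console.log(bracketPairMatches); // true
```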
4. Quantifiers cannot be applied directly to another quantifier
Suppose you want to match a string like this:
- Each character is one of a, b, or c;
- The length of the string is a multiple of 2.
If you are not familiar with operator precedence, you might write `/[abc]{2}+/`, which is a syntax error:

    /[abc]{2}+/
    // Uncaught SyntaxError: Invalid regular expression: /[abc]{2}+/: Nothing to repeat

It should be written as:

    /([abc]{2})+/

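To test the two conditions against whole strings, the grouped pattern also needs anchors. The following sketch adds `^` and `$` (an addition for this example, not part of the pattern above) and builds the invalid pattern through the `RegExp` constructor so the error can be caught at run time.

```javascript
// Anchored version: the entire string must consist of pairs of a/b/c.
const evenLengthAbc = /^([abc]{2})+$/;
const evenResults = ["ab", "abca", "abc", "abde", ""].map((s) => evenLengthAbc.test(s));

// The stacked quantifier is rejected when the pattern is compiled.
let stackedQuantifierFails = false;
try {
  new RegExp("[abc]{2}+");
} catch (err) {
  stackedQuantifierFails = err instanceof SyntaxError;
}

console.log(evenResults);            // [ true, true, false, false, false ]
console.log(stackedQuantifierFails); // true
```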
5. Metacharacter escape
Even for metacharacters, whether escaping is necessary depends on the context.
Case 1:
Inside a character class [...], escape `^` and `-` when they should stand for themselves (and likewise `]` and `\`). The other metacharacters do not need to be escaped inside a character class. Note that special structures such as `\d` (a character class) can still be used inside [...], whereas `\b` inside a class matches a backspace character rather than acting as an anchor.
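The following sketch checks how a few characters behave inside a character class. One detail worth knowing in JavaScript: inside a class, `\b` matches the backspace character (U+0008) rather than a word boundary. The variable names are illustrative.

```javascript
// "^" and "-" are escaped so that they stand for themselves.
const caretOrHyphen = /^[\^\-]$/;
const caretHyphenResults = ["^", "-", "a"].map((s) => caretOrHyphen.test(s));

// Other metacharacters are literal inside a class: "." does not mean "any character".
const literalPunctuation = /^[.*+?()|]$/;
const punctuationResults = ["*", ".", "x"].map((s) => literalPunctuation.test(s));

// \d keeps its meaning inside a class.
const digitOrUnderscore = /^[\d_]$/.test("5");

// [\b] matches a backspace character, not a word boundary.
const backspaceMatches = /[\b]/.test("\b");
const boundaryIgnored = /[\b]/.test("a b");

console.log(caretHyphenResults);  // [ true, true, false ]
console.log(punctuationResults);  // [ true, true, false ]
console.log(digitOrUnderscore, backspaceMatches, boundaryIgnored); // true true false
```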
Case 2:
To match the string [abc]:
It can be written as `/\[abc\]/`, but the second escape is actually unnecessary. This can be tested as follows:

    /\[abc]/.test("[abc]")
    // true

The closing square bracket does not form a character class with anything, so the regular expression is unambiguous and the bracket does not need to be escaped.
However, it must not be written as `/[abc\]/`, because the escaped `]` no longer closes the character class:

    /[abc\]/
    // Uncaught SyntaxError: Invalid regular expression: missing /

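Both cases can be verified at run time by compiling the patterns with the `RegExp` constructor and catching the error. This is a sketch; the helper name `compiles` is hypothetical. It also shows that the unescaped closing bracket is only tolerated without the `u` flag.

```javascript
// Hypothetical helper: report whether a pattern source compiles.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    return false;
  }
}

// Escaping only the opening bracket is enough in a regex without the u flag.
const openEscapedOnly = /\[abc]/.test("[abc]");

// Escaping only the closing bracket leaves the class unterminated.
const closeEscapedOnlyCompiles = compiles("[abc\\]");

// With the u flag, a lone "]" is rejected, so both brackets must be escaped.
const unicodeLoneBracketCompiles = compiles("\\[abc]", "u");
const unicodeBothEscapedCompiles = compiles("\\[abc\\]", "u");

console.log(openEscapedOnly);            // true
console.log(closeEscapedOnlyCompiles);   // false
console.log(unicodeLoneBracketCompiles); // false
console.log(unicodeBothEscapedCompiles); // true
```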