To understand how a regular expression breaks down into its parts, you must first know its structures and operators.

1. Structures

The common structure of JS regular expressions is as follows:

Name             Meaning                                   Examples
Literal          Matches one specific character            a, \n
Character class  Matches any one character from a set      [a-z], \w
Anchor           Matches a position                        ^, $, \b, (?=...)
Quantifier       Number of consecutive repetitions         a{1,3} (the character a appears one to three times in a row)
Group            A unit delimited by parentheses           (123)+ ("123" appears one or more times in a row)
Backreference    A reference to an earlier group           \1 (refers to the first group)
Alternation      Chooses one of several subexpressions     ab|cd (matches the string "ab" or "cd")
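To make the table concrete, here is a minimal sketch with one small pattern per structure. The patterns and sample strings are illustrative choices, not taken from the original table.

```javascript
// One illustrative pattern per structure; `expected` is the result of regex.test(input).
const samples = [
  { name: "literal",         regex: /a/,           input: "cat",    expected: true  },
  { name: "character class", regex: /[a-z]/,       input: "123",    expected: false },
  { name: "anchor",          regex: /^cat$/,       input: "cat",    expected: true  },
  { name: "quantifier",      regex: /^a{1,3}$/,    input: "aaaa",   expected: false },
  { name: "group",           regex: /^(123)+$/,    input: "123123", expected: true  },
  { name: "backreference",   regex: /^(\d)\1$/,    input: "77",     expected: true  },
  { name: "alternation",     regex: /^(?:ab|cd)$/, input: "cd",     expected: true  },
];

// Run every pattern against its sample string and collect the outcome.
const results = samples.map(({ name, regex, input }) => ({
  name,
  matched: regex.test(input),
}));

for (const { name, matched } of results) {
  console.log(name + ": " + matched);
}
```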

A few notes on grouping and backreferencing:

  1. When parentheses are nested, groups are numbered by the position of their opening parenthesis, from left to right. For example:
var reg = /(\d(\d))\2\1/;
var string = "34434";
console.log( reg.test(string) );
// true
console.log( RegExp.$1 );
// "34"
console.log( RegExp.$2 );
// "4"
  2. When a group that does not exist is referenced, the reference is kept as literal text:
var regex = /(\d)\d/;
var string = "123";
string.replace(regex, "$2"); // "$23"
  3. How $10 is interpreted (is it $1 followed by '0', or group 10?). It refers to group 10 when the regex has at least ten groups; otherwise it is read as group 1 followed by a literal 0:
var regex = /(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)/;
var string = "1234567890";
string.replace(regex, "$10"); // "0"

var regex = /(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)/;
var string = "1234567890";
string.replace(regex, "$10"); // "100"
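The three notes above can be checked together in one short script. This is a sketch that reads captures from the array returned by String.prototype.match instead of the legacy static properties RegExp.$1 and RegExp.$2; that substitution is a choice made here, and the variable names are hypothetical.

```javascript
// Note 1: group numbers follow the order of the opening parentheses.
const nested = "34434".match(/(\d(\d))\2\1/);
console.log(nested[1]); // "34" (outer group, opens first)
console.log(nested[2]); // "4"  (inner group)

// Note 2: a reference to a group that does not exist stays as literal text.
const missingGroup = "123".replace(/(\d)\d/, "$2");
console.log(missingGroup); // "$23"

// Note 3: with ten groups "$10" is group 10; with nine it is group 1 plus "0".
const tenGroups = /(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)/;
const nineGroups = /(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)(\d)/;
const withTen = "1234567890".replace(tenGroups, "$10");
const withNine = "1234567890".replace(nineGroups, "$10");
console.log(withTen);  // "0"
console.log(withNine); // "100"
```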

2. Operators

The basic operators are:

Name                              Symbols                                   Precedence (1 = highest)
Escape                            \                                         1
Parentheses and square brackets   (...), (?:...), (?=...), (?!...), [...]   2
Quantifiers                       ?, *, +, {m}, {m,n}, {m,}                 3
Positions and sequences           ^, $, \metacharacter, ordinary characters 4
Alternation (pipe)                |                                         5

A brief note on metacharacters. The characters that can carry special meaning in a regex are:

^ $ . * + ? | \ / ( ) [ ] { } = ! : - ,

To match a metacharacter literally, escape it with \. For other punctuation characters that have no special escape meaning, the escaped form is simply the character itself.
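A common use of this rule is escaping arbitrary text before building a regex from it. The helper below is a hypothetical sketch, not part of the original article; it escapes only the characters that are special outside a character class.

```javascript
// Hypothetical helper: put a backslash in front of each syntax character in `text`.
// "$&" in the replacement string stands for the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escaped = escapeRegExp("a.b*c");
console.log(escaped); // a\.b\*c

// Unescaped, "+" is a quantifier, so the pattern does not match the literal text.
const unescapedMatches = new RegExp("^1+1=2$").test("1+1=2");
// Escaped, every character is matched literally.
const escapedMatches = new RegExp("^" + escapeRegExp("1+1=2") + "$").test("1+1=2");
console.log(unescapedMatches); // false
console.log(escapedMatches);   // true
```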

3. Case analysis and pitfalls

1. Example

Consider the following regex:

/^a(b|cd?)+|e/
  1. Parentheses have the highest precedence here, so (b|cd?) forms one unit;
  2. Inside (b|cd?), the quantifier binds most tightly, so d? forms one unit;
  3. Inside (b|cd?), the alternation | has the lowest precedence, so b is one unit and cd? is another;
  4. By the same reasoning, the whole regex splits into ^, a, (...)+ and e. Because of the top-level alternation, these regroup into the two branches ^a(b|cd?)+ and e.
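The breakdown above can be checked against a few inputs. This is a minimal sketch; the test strings are chosen here for illustration.

```javascript
const regex = /^a(b|cd?)+|e/;

// Each entry is [input, expected result of regex.test(input)].
const cases = [
  ["ab", true],      // ^a followed by b
  ["acd", true],     // ^a followed by cd
  ["ac", true],      // d? is optional, so c alone is enough
  ["abcdcb", true],  // (b|cd?) repeated: b, cd, c, b
  ["a", false],      // the group must occur at least once
  ["xab", false],    // ^ only applies to the first branch, and there is no e
  ["e", true],       // second branch
  ["xe", true],      // second branch is not anchored
];

const outcomes = cases.map(([input]) => regex.test(input));
cases.forEach(([input], i) => console.log(input + " -> " + outcomes[i]));
```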

2. Pitfalls

Careless use of structures and operators can lead to very different results.

1. Mind operator precedence

Suppose the target is a whole string that is exactly ab or cd, anchored at both ends. If you are not careful, you might write /^ab|cd$/.

Because anchors and ordinary characters bind more tightly than the pipe |, the regex splits into the two alternatives ^ab and cd$, not ^ab$ and ^cd$.

It should be changed to /^(ab|cd)$/.
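The difference shows up on strings that only partly match. A small sketch, with hypothetical variable names:

```javascript
const loose = /^ab|cd$/;     // parsed as (^ab) | (cd$)
const strict = /^(ab|cd)$/;  // the whole string must be ab or cd

const looseAbx = loose.test("abx");   // true: starts with ab
const looseXcd = loose.test("xcd");   // true: ends with cd
const strictAbx = strict.test("abx"); // false
const strictXcd = strict.test("xcd"); // false
const strictAb = strict.test("ab");   // true
const strictCd = strict.test("cd");   // true

console.log(looseAbx, looseXcd, strictAbx, strictXcd, strictAb, strictCd);
```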

2. Groups cannot be used inside a character class [...]

For example, to match ab or a digit 0-9, you cannot simply write /[(ab)0-9]/: inside a character class, ( and ) are treated as the ordinary characters ( and ), not as a group.
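One way to express "ab or a digit" is an alternation outside the character class. The sketch below compares the two; the anchors and the non-capturing group are additions made here so that the whole string is tested.

```javascript
const wrong = /^[(ab)0-9]$/;      // one character out of: ( a b ) 0-9
const right = /^(?:ab|[0-9])$/;   // the string ab, or one digit

const wrongParen = wrong.test("(");  // true: ( is just a member of the class
const wrongAb = wrong.test("ab");    // false: the class matches a single character
const rightAb = right.test("ab");    // true
const rightDigit = right.test("7");  // true
const rightParen = right.test("(");  // false

console.log(wrongParen, wrongAb, rightAb, rightDigit, rightParen);
```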

3. Character classes cannot be nested inside a character class

For example, to match a or 0-9, some creative people write /[a[0-9]]/. The character class then runs from the first [ to the first ], so the regex means one character out of a, [ or 0-9, followed by a literal ]. The tests are as follows:

/[a[0-9]]/.test("9")
// false
/[a[0-9]]/.test("a")
// false
/[a[0-9]]/.test("a]")
// true
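A flat character class does what the nested version was meant to do. A brief sketch, assuming the regex is used without the v flag:

```javascript
const nestedAttempt = /[a[0-9]]/; // class "a, [, 0-9" followed by a literal ]
const flat = /[a0-9]/;            // one character: a or a digit

const bracketPair = nestedAttempt.test("[]"); // true: [ is in the class, then ]
const flatDigit = flat.test("9");             // true
const flatLetter = flat.test("a");            // true
const flatOther = flat.test("b");             // false

console.log(bracketPair, flatDigit, flatLetter, flatOther);
```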

4. A quantifier cannot directly follow another quantifier

Suppose you want to match a string like this:

  1. Each character is one of a, b, or c;

  2. The length of the string is a multiple of 2.

If you are not familiar with operator precedence, you might write it as /[abc]{2}+/.

/[abc]{2}+/
// Uncaught SyntaxError: Invalid regular expression: /[abc]{2}+/: Nothing to repeat

It should be written as:

/([abc]{2})+/
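The sketch below reproduces the error through the RegExp constructor, since an invalid regex literal would stop the whole script from parsing. The ^ and $ anchors in the working pattern are an addition made here so that the entire string has to satisfy both conditions.

```javascript
// Stacking + directly on {2} is a syntax error.
let stackedError = null;
try {
  new RegExp("[abc]{2}+");
} catch (err) {
  stackedError = err;
}
console.log(stackedError instanceof SyntaxError); // true

// Grouping first gives the outer quantifier something to repeat.
const evenLength = /^([abc]{2})+$/;
const fourChars = evenLength.test("abca");  // true: two pairs
const threeChars = evenLength.test("abc");  // false: odd length
const badChar = evenLength.test("abcd");    // false: d is not allowed
console.log(fourChars, threeChars, badChar);
```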

5. Escaping metacharacters

Even for metacharacters, whether an escape is needed depends on the context.

Case 1:

Inside a character class [...], escape ^ and - when they should stand for themselves, and likewise ] and \. (Strictly, ^ is special only as the first character and - only between two characters, but escaping them is always safe.) The remaining metacharacters need no escape inside a character class. Note that escapes such as \d keep their meaning inside [...], whereas \b there stands for a backspace character rather than a word-boundary anchor.
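A few quick checks of how characters behave inside a character class. This is an illustrative sketch with hypothetical variable names.

```javascript
// ^ and - escaped so that they stand for themselves.
const caretOrHyphen = /[\^\-]/;
const matchesCaret = caretOrHyphen.test("^");  // true
const matchesHyphen = caretOrHyphen.test("-"); // true
const matchesLetter = caretOrHyphen.test("a"); // false

// Most other metacharacters are literal inside a class without any escape.
const punctuation = /[.*+?$|(){}]/;
const matchesQuestion = punctuation.test("?"); // true
const dotIsLiteral = punctuation.test("a");    // false: . does not mean "any character" here

// \d still means "digit" inside a class.
const digitInClass = /[\d]/.test("5");         // true

// \b inside a class is the backspace character (U+0008), not a word boundary.
const backspace = /[\b]/.test("\u0008");       // true
const wordBoundary = /[\b]/.test("a b");       // false

console.log(matchesCaret, matchesHyphen, matchesLetter, matchesQuestion,
  dotIsLiteral, digitInClass, backspace, wordBoundary);
```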

Case 2:

To match the string [abc]:

It could be written as /\[abc\]/, but the second escape is actually unnecessary. This can be tested as follows:

/\[abc]/.test("[abc]")
// true

The closing square bracket does not form a character class here and the regex is not ambiguous, so it does not need to be escaped.

However, it cannot be written as /[abc\]/, because \] is then an escaped literal ] and the character class is never closed:

/[abc\]/
// Uncaught SyntaxError: Invalid regular expression: missing /
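Both variants can be tried safely through the RegExp constructor, which reports a bad pattern as a catchable SyntaxError instead of a parse failure. The last check is an extra caveat not covered in the text above: with the u flag, an unescaped closing bracket outside a class is rejected, so escaping both brackets is the safer habit there.

```javascript
// Hypothetical helper: returns the error thrown while compiling, or null.
function compileError(source, flags) {
  try {
    new RegExp(source, flags);
    return null;
  } catch (err) {
    return err;
  }
}

// Escaping only the opening bracket is enough (without the u flag).
const openOnly = new RegExp("\\[abc]").test("[abc]");
console.log(openOnly); // true

// Escaping only the closing bracket leaves the class unterminated.
const unterminated = compileError("[abc\\]", "");
console.log(unterminated instanceof SyntaxError); // true

// In Unicode mode a lone ] is a syntax error.
const loneBracket = compileError("\\[abc]", "u");
console.log(loneBracket instanceof SyntaxError); // true
```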