Study material for this

1. How to write a regular expression

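Regular expressions in JS are written between two / characters; note that this is the forward slash, the same character used for comments. Do not wrap the pattern in quotation marks: a quoted pattern is just a string. The second / can be followed by modifiers, which are not part of the expression itself.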

const reg = /hi/;
const regI = /hi/i; // The trailing i is a modifier: it makes the preceding regular expression case-insensitive

1.1 Modifiers

  • g: global; find all matches, not just the first one.
  • i: ignore case.
  • m: multi-line mode; changes how ^ and $ match, which is covered later.
  • s: dotAll mode. It was formerly called single-line mode, hence the s. The name was changed to dotAll to avoid confusion, but the abbreviation stayed. It changes how . matches: with this modifier, . matches every character, including newlines.
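A minimal sketch of how each modifier changes the result. The sample string and variable names are made up for illustration.

```javascript
// Hypothetical sample text: two lines that differ only in case.
const text = "Hi\nhi";

const withoutG = text.match(/hi/i);      // no g: only the first match
const withG = text.match(/hi/gi);        // g: every match
const lineStarts = text.match(/^hi/gim); // m: ^ also matches right after a newline
const dotNoS = /Hi.hi/.test(text);       // without s, . does not match "\n"
const dotWithS = /Hi.hi/s.test(text);    // with s, . matches "\n" as well

console.log(withoutG[0]); // "Hi"
console.log(withG);       // ["Hi", "hi"]
console.log(lineStarts);  // ["Hi", "hi"]
console.log(dotNoS, dotWithS); // false true
```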

1.2 RegExp

/exp/ is literal shorthand for new RegExp(), but the constructor takes the pattern as a string, so every backslash has to be escaped a second time, which is easy to get wrong.
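To illustrate the double escaping, here is a small sketch that builds the same pattern both ways (the variable names are only for illustration):

```javascript
// Both lines build the same pattern: one or more digits, global.
const literal = /\d+/g;
const constructed = new RegExp("\\d+", "g"); // "\\d" in the string becomes \d in the pattern

console.log(literal.source === constructed.source); // true
console.log("a1b22".match(constructed)); // ["1", "22"]

// Forgetting the second backslash silently changes the pattern:
// "\d" in a string is just "d", so this matches the letter d instead.
const mistaken = new RegExp("\d+", "g");
console.log(mistaken.source); // "d+"
```

The constructor is mainly useful when the pattern has to be assembled at runtime; otherwise the literal form avoids the problem.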

1.2.1 RegExp instance methods

  1. exec(), the main method, takes the string to match against and returns an array describing the first match (only one match is returned per call, even with the global modifier g). The array has additional properties: index and input. Returns null if there is no match.
    const reg = /\b\w+\b/g;
    const input = "hello, marisa";
    let res = reg.exec(input);
    // The first match
    console.log(res[0]); // "hello"
    // With no capture groups the array has length 1. With capture groups, the following elements hold what each group captured.
    console.log(res[1]); // undefined because no capture group is set
    // The index of the match in the original string (for example, ",,hello,world" would give 2)
    console.log(res.index); // 0
    // The original input
    console.log(res.input); // "hello, marisa"

    If the g modifier is set, calling exec again returns the second match, because the regular expression keeps its position in its lastIndex property.

  2. test(), a weakened version of exec, returns only true or false. Use it to check whether a string contains a match.
  3. JS regular expressions lack a number of features found in other engines, such as inline comments.
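The repeated-exec behaviour can be sketched as a loop. This reuses the pattern and input from the example above; the other names are hypothetical.

```javascript
const wordPattern = /\b\w+\b/g;
const sentence = "hello, marisa";

const found = [];
let result;
// With g, each exec call resumes from wordPattern.lastIndex
// and returns null once there are no more matches.
while ((result = wordPattern.exec(sentence)) !== null) {
  found.push({ word: result[0], index: result.index });
}
console.log(found);
// [{ word: "hello", index: 0 }, { word: "marisa", index: 7 }]

// test() only answers yes or no.
const hasDigit = /\d/.test(sentence);
console.log(hasDigit); // false
```

Without the g modifier this loop would never end, because exec would keep returning the first match.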

1.2.2 String methods

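These are methods on the string.

  1. match(), returns the same value as exec when the regular expression has no g modifier, but it is a string method. With g, it returns an array of all matched substrings instead.
    // The receiver and the argument swap places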
    reg.exec("string");
    "string".match(reg);
  2. search(), returns the position of the first match, or -1 if there is none.
  3. replace()
    • The first argument can be a string or a regular expression
      • If it is a string, only the first occurrence is replaced.
      • If it is a regular expression with the global modifier g, all matches are replaced.
    • The second argument can be a string or a function
      • If it is a string, matches are replaced with that string. It can contain special values such as $n for the nth capture group.
      • If it is a function, it receives the match details as parameters, which gives finer control. Look up the details when you need them.
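A short sketch of the three string methods. The date string and variable names are invented for the example.

```javascript
const source = "2024-01-15 and 2025-12-31";
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;  // no g
const dateGlobal = /(\d{4})-(\d{2})-(\d{2})/g;  // with g

const firstMatch = source.match(datePattern); // like exec: full match, then groups
const allMatches = source.match(dateGlobal);  // with g: full matches only
const position = source.search(/and/);        // index of the first match, or -1

// String first argument: only the first occurrence is replaced.
const firstOnly = "a-b-c".replace("-", "+");
// String second argument: $n refers to the nth capture group.
const swapped = source.replace(dateGlobal, "$3/$2/$1");
// Function second argument: receives the whole match, then each group.
const nextYears = source.replace(dateGlobal, (whole, year) => String(Number(year) + 1));

console.log(firstMatch[1]); // "2024"
console.log(allMatches);    // ["2024-01-15", "2025-12-31"]
console.log(position);      // 11
console.log(firstOnly);     // "a+b-c"
console.log(swapped);       // "15/01/2024 and 31/12/2025"
console.log(nextYears);     // "2025 and 2026"
```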

2. Commonly used syntax

2.1 Using .*

const reg = /hi.*/;

* cannot be used alone; it means that the item before it is repeated any number of times. Here that item is ., and . matches any single character except a newline. So .* matches any run of non-newline characters, and the whole pattern matches from hi to the end of the line.

2.1.1 Using +

+ works like *, but + requires at least one repetition, while * also allows zero.

2.1.2 Using ?

? matches the preceding item 0 or 1 times.
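A quick sketch comparing the three quantifiers. The sample strings are made up, and the last pattern uses the ^ and $ anchors so the whole word has to match.

```javascript
const lines = "hi there\nsay hi";

// .* stops at the newline because . does not match "\n".
const toEndOfLine = lines.match(/hi.*/)[0];

const star = /ab*c/.test("ac"); // * allows zero b's
const plus = /ab+c/.test("ac"); // + needs at least one b

// u? makes the u optional: zero or one, but not two.
const spellings = ["color", "colour", "colouur"].filter((w) => /^colou?r$/.test(w));

console.log(toEndOfLine); // "hi there"
console.log(star, plus);  // true false
console.log(spellings);   // ["color", "colour"]
```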

2.2 Newline

A newline is written \n.

2.3 Using \d

\d matches a single digit.

2.4 Using -

It looks like a special character, but outside of [] it is not; it simply matches the - character itself.

2.5 Using \s

\s matches any whitespace character.

2.6 Using \w

\w matches any ASCII letter, digit, or underscore. Unlike in some other regular expression engines, in JS it does not match Chinese characters.
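A small sketch of these character classes on one made-up sample string. Note that in JS \w is ASCII-only, so the two Chinese characters in the sample are not matched by it.

```javascript
// Hypothetical sample containing digits, a hyphen, Chinese text, a tab and a newline.
const sample = "id_42 - 汉字\tend\n";

const digits = sample.match(/\d/g);         // each digit separately
const wordRuns = sample.match(/\w+/g);      // runs of letters, digits, underscore
const whitespaceCount = sample.match(/\s/g).length; // two spaces, a tab, a newline
const hyphens = sample.match(/-/g);         // - is an ordinary character here

console.log(digits);          // ["4", "2"]
console.log(wordRuns);        // ["id_42", "end"]
console.log(whitespaceCount); // 4
console.log(hyphens);         // ["-"]
```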

2.7 Using \b

\b is special: it does not match a character but a position, namely the beginning or end of a word. Suppose we have the string “hi, I’m chi\n”.

const reg = /\bhi\b/;  // Matches the initial hi (the match does not include the comma)
const reg2 = /hi\b/; // Matches the initial hi and the hi in chi (the match does not include the \n)
// Note that \b does not match characters such as punctuation or newlines; it matches the position between the word and the punctuation.

The "beginning or end of a word" has nothing to do with real English words: \b matches any position where the character on one side is a \w character and the character on the other side is not (or is missing).
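To check where these two patterns actually match, here is a sketch that collects the match positions with matchAll (the g modifier is added only so that all matches are listed):

```javascript
const greeting = "hi, I'm chi\n";

// Positions where hi stands alone as a word.
const wholeWord = [...greeting.matchAll(/\bhi\b/g)].map((m) => m.index);
// Positions where hi ends a word, whatever comes before it.
const wordEnding = [...greeting.matchAll(/hi\b/g)].map((m) => m.index);

console.log(wholeWord);  // [0]
console.log(wordEnding); // [0, 9]
```

The hi inside chi is rejected by the first pattern because the c before it is a \w character, so there is no word boundary at that position.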

2.8 Using {}

2.8.1 {a} Exact number

Matches exactly a consecutive repetitions of the preceding item. For example, const reg = /\d{5}/ matches five consecutive digits.

2.8.2 {a,b} Range

Matches between a and b consecutive repetitions of the preceding item. For example, const reg = /\d{5,12}/ matches at least 5 and at most 12 digits.

2.8.3 {a,} At least a

Same as above, except there is no upper limit.
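The three forms side by side, as a sketch. The ^ and $ anchors are added so that the whole string has to fit the count; without them, a longer run of digits would still contain a shorter match.

```javascript
const exactFive = /^\d{5}$/;
const fiveToTwelve = /^\d{5,12}$/;
const atLeastFive = /^\d{5,}$/;

console.log(exactFive.test("12345"));      // true
console.log(exactFive.test("123456"));     // false: six digits
console.log(fiveToTwelve.test("1234"));    // false: too short
console.log(fiveToTwelve.test("1234567")); // true
console.log(atLeastFive.test("1".repeat(20))); // true: no upper limit
```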

2.9 Using ^ and $

^ and $ denote the beginning and the end of the string, that is, of the entire input. With the m modifier for multi-line input, ^ and $ also match the beginning and end of each line. (This is consistent: the beginning and end of the string are also the beginning of the first line and the end of the last line.)
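A sketch of the difference the m modifier makes, using a made-up two-line string:

```javascript
const poem = "roses are red\nviolets are blue";

const firstWordOfString = poem.match(/^\w+/g);  // ^ only at the start of the input
const firstWordOfLines = poem.match(/^\w+/gm);  // ^ at the start of every line
const lastWordOfString = poem.match(/\w+$/g);   // $ only at the end of the input
const lastWordOfLines = poem.match(/\w+$/gm);   // $ at the end of every line

console.log(firstWordOfString); // ["roses"]
console.log(firstWordOfLines);  // ["roses", "violets"]
console.log(lastWordOfString);  // ["blue"]
console.log(lastWordOfLines);   // ["red", "blue"]
```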

3. Custom character sets

What if you want to match one of several characters, such as o, y, i, s, or h? There is no built-in shorthand that matches exactly these characters. You can use [] to list them, as in [oyish], which matches any one of the listed characters. You can also use - to indicate ranges; for example, [0-9a-zA-Z] matches a digit or a lowercase or uppercase letter. Most special characters can be used directly inside [] without escaping. However, if you want to match 0 or - or 9 rather than the range 0 to 9, you need to escape the -, as in [0\-9]. Always remember that [] matches only one character.
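A sketch of the three cases: a plain list, ranges, and an escaped hyphen. The input strings are invented.

```javascript
// Each listed character matches on its own; "!" is not in the set.
const listed = "oyish!".match(/[oyish]/g);

// Ranges: digits, lowercase and uppercase letters. "-" and "_" are not included.
const alphanumeric = "a-Z_9".match(/[0-9a-zA-Z]/g);

// Escaped hyphen: the set is the three characters 0, - and 9, not the range 0 to 9.
const literalHyphen = "0-9 5".match(/[0\-9]/g);

console.log(listed);        // ["o", "y", "i", "s", "h"]
console.log(alphanumeric);  // ["a", "Z", "9"]
console.log(literalHyphen); // ["0", "-", "9"]
```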

4. Conditions

| separates alternative conditions. The order of the conditions matters: if the first condition is satisfied, the second is not tested. There are many things regular expressions cannot express directly (such as numeric comparison), so sometimes many | alternatives are needed to cover all the cases.
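A sketch of both points: the effect of ordering, and enumerating cases to stand in for a numeric comparison. The 0 to 255 pattern is one possible way to write such a check, not the only one; it wraps the alternatives in () so that ^ and $ apply to all of them.

```javascript
// Order matters: the first alternative that succeeds wins.
const shortFirst = "high".match(/hi|high/)[0];
const longFirst = "high".match(/high|hi/)[0];
console.log(shortFirst); // "hi"
console.log(longFirst);  // "high"

// "Is this number between 0 and 255?" cannot be asked directly,
// so each digit pattern is listed as its own alternative:
// 250-255 | 200-249 | 100-199 | 0-99
const byteValue = /^(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$/;
console.log(byteValue.test("255")); // true
console.log(byteValue.test("256")); // false
```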

5. Groups

To repeat a sequence of several characters, enclose the sub-expression in () and follow it with {}; the whole group is then repeated that many times. ([]{} only repeats a single character chosen from the set.)
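The difference between repeating a group and repeating a character set, as a sketch:

```javascript
const groupTwice = /^(ab){2}$/; // the sequence "ab", twice: "abab"
const setTwice = /^[ab]{2}$/;   // any two characters, each a or b

console.log(groupTwice.test("abab")); // true
console.log(groupTwice.test("ba"));   // false
console.log(setTwice.test("abab"));   // false: four characters
console.log(setTwice.test("ba"));     // true
```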

6. Negation

The escape classes above (\d, \s, \w, \b) each have a negated form written with a capital letter. For example, \D matches anything that is not a digit, and \B matches a position that is not the beginning or end of a word. Inside [], a leading ^ expresses negation. For example, [^0-9a-zA-Z] matches a character that is neither a digit nor a letter.
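A sketch of the negated forms on small made-up inputs:

```javascript
const nonDigits = "a1 b2".match(/\D/g);    // everything except digits
const nonWordChars = "a1 b2".match(/\W/g); // everything except letters, digits, _
const nonSpaceRuns = "a1 b2".match(/\S+/g); // runs without whitespace
const notAlphanumeric = "a1_b2!".match(/[^0-9a-zA-Z]/g);
// \B: hi must NOT start at a word boundary, so it is found inside "chi".
const insideWord = "chi".search(/\Bhi/);

console.log(nonDigits);       // ["a", " ", "b"]
console.log(nonWordChars);    // [" "]
console.log(nonSpaceRuns);    // ["a1", "b2"]
console.log(notAlphanumeric); // ["_", "!"]
console.log(insideWord);      // 1
```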

7. Backreferences

When a group is captured with (), the text it matched can be referred to later in the same pattern with \1. With several groups, they are numbered from left to right: \1, \2, and so on. For example, ([0-9]),\1,\1 matches the same digit three times, separated by commas.
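A sketch using the pattern from the text, plus a second, made-up pattern that finds a doubled word:

```javascript
const sameDigitThrice = /([0-9]),\1,\1/;
console.log(sameDigitThrice.test("7,7,7")); // true
console.log(sameDigitThrice.test("7,8,7")); // false: \1 must repeat the captured 7

// \1 refers to whatever (\w+) captured, so this finds a repeated word.
const doubled = "it is is fine".match(/\b(\w+) \1\b/);
console.log(doubled[0]); // "is is"
console.log(doubled[1]); // "is"
```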

8. Zero-width assertion

Zero-width means that a position is matched, not characters.

8.1 (?=exp) syntax

For example, (?=ing) matches the position just before ing (not including ing). Using \b\w+(?=ing\b) on “I’m singing a song” matches sing.

8.2 (?<=exp) syntax

For example, (?<=un) matches the position just after un (not including un). Using (?<=\bun)\w+\b on “This is an unordered set” matches ordered.
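Both assertions can be tried directly, as in the sketch below. It assumes an engine with lookbehind support (ES2018 or later, such as current Node.js); older engines reject the (?<=...) syntax.

```javascript
// Lookahead: the word part that is followed by "ing" at a word end.
const stem = "I'm singing a song".match(/\b\w+(?=ing\b)/)[0];

// Lookbehind: the word part that follows "un" at a word start.
const withoutPrefix = "This is an unordered set".match(/(?<=\bun)\w+\b/)[0];

console.log(stem);          // "sing"
console.log(withoutPrefix); // "ordered"
```

In both cases the asserted text (ing, un) has to be present but is not part of the returned match.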

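9. Negative zero-width assertions

These are the negations of the zero-width assertions above: (?!exp) and (?<!exp) respectively. They match a position that is not followed by exp, or not preceded by exp.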
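A sketch of both negative assertions on invented inputs. As with (?<=exp), the negative lookbehind assumes an ES2018 or later engine.

```javascript
// Negative lookahead: a q that is NOT followed by u.
const qWithoutU = "Iraq quit".match(/q(?!u)/).index;

// Negative lookbehind: a number that is NOT preceded by "$".
// \d is also excluded so the match cannot start in the middle of "$10".
const notPrices = "$10 and 20".match(/(?<![$\d])\d+/g);

console.log(qWithoutU); // 3
console.log(notPrices); // ["20"]
```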

10. Greed and laziness

Regular expressions are greedy by default: when matching repetitions, they match as many as possible. For example, {5,12} will not settle for 11 repetitions if it can match 12, nor for 10 if it can match 11, and so on. * and + are unbounded greedy quantifiers: the longer the match, the better.

10.1 Lazy

Laziness, as the name suggests, is the opposite of greed: match as little as possible. The syntax is a repetition quantifier (*, +, {}) followed by ?. For example:

const reg = /[0-9]{2,5}?/; // Matches 2 digits
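A sketch comparing greedy and lazy versions of the same patterns. The HTML-like string is made up for the example.

```javascript
const digits = "123456";
const greedyDigits = digits.match(/[0-9]{2,5}/)[0]; // takes as many as allowed
const lazyDigits = digits.match(/[0-9]{2,5}?/)[0];  // stops at the minimum

const markup = "<b>one</b><b>two</b>";
const greedyTag = markup.match(/<b>.*<\/b>/)[0]; // runs to the LAST </b>
const lazyTag = markup.match(/<b>.*?<\/b>/)[0];  // stops at the FIRST </b>

console.log(greedyDigits); // "12345"
console.log(lazyDigits);   // "12"
console.log(greedyTag);    // "<b>one</b><b>two</b>"
console.log(lazyTag);      // "<b>one</b>"
```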