I am only collecting notes from existing documentation here. Omissions are hard to avoid in a summary like this, so if you spot a mistake, please point it out.

1. How to create one

  • Use the RegExp constructor
  • Use regular expression literals

Any regular expression defined with a literal can also be built with the constructor.

A regular expression is a sequence of characters that forms a search pattern, used to match combinations of characters in a string. It is basically a description, or search rule, for the text you want to operate on.

RegExp is an object in JS. It has the test and exec methods, and strings have the match, replace, search, split, and matchAll methods that accept regular expressions. You can create a RegExp object from a regular expression literal or with new RegExp.

MDN says that a regular expression literal is compiled when the script is loaded, so literals give better performance when the regular expression stays unchanged. A regular expression created with the constructor is compiled while the script runs. Use the constructor when the pattern changes or has to be generated dynamically, for example from user input. If you need to change a regular expression, you can recompile it with compile.
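
As a small sketch of the two creation styles described above: the literal and the constructor produce equivalent objects, and the constructor is the one that can take a pattern assembled at run time. The variable userWord is a hypothetical stand-in for real user input.

```javascript
// Literal form: the pattern is fixed in the source text.
const literal = /ab+c/i;

// Constructor form: the same pattern built from strings.
const constructed = new RegExp("ab+c", "i");

// Inside a string a backslash must be doubled: "\\d+" is the pattern \d+.
const digitsLiteral = /\d+/;
const digitsConstructed = new RegExp("\\d+");

// Building a pattern at run time (userWord is a hypothetical input value).
const userWord = "fox";
const dynamic = new RegExp("\\b" + userWord + "\\b", "g");
const found = "A quick fox, a slow fox".match(dynamic);

console.log(literal.source === constructed.source);
console.log(digitsLiteral.source === digitsConstructed.source);
console.log(found);
```

If the run-time text may contain characters such as + or ?, it has to be escaped before it is passed to the constructor.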

Why recompile? In my tests the output was the same without recompiling, and I found an answer on Juejin: juejin.cn/post/684490… If a regular expression is going to be used many times, compiling it improves the efficiency of the code; if it is executed only once or a few times, there is no significant effect. compile makes regular expressions more adaptable.

2. How to write

Format: /pattern/flags

2.1. Fixed strings

Just use /str/

2.2. Indeterminate strings

2.2.1. Assertion

2.2.1.1. Assertions

An assertion indicates that a match is possible only under certain conditions. Assertions include lookahead, lookbehind, and conditional expressions.

    const text = 'A quick fox';
    const regexpLastWord = /\w+$/;
    console.log(text.match(regexpLastWord));
    // expected output: Array ["fox"]
    const regexpWords = /\b\w+\b/g; // \b marks a word boundary
    console.log(text.match(regexpWords));
    // expected output: Array ["A", "quick", "fox"]
    const regexpFoxQuality = /\w+(?= fox)/; // Matches \w+ only if it is followed by " fox"; this is called a lookahead assertion.
    console.log(text.match(regexpFoxQuality));
    // expected output: Array ["quick"]
Characters   Meaning
^            Matches the beginning of input
$            Matches the end of input
\b           Matches a word boundary
\B           Matches a non-word boundary
x(?=y)       Matches “x” only if “x” is followed by “y”
x(?!y)       Matches “x” only if “x” is not followed by “y”
(?<=y)x      Matches “x” only if “x” is preceded by “y”
(?<!y)x      Matches “x” only if “x” is not preceded by “y”

Note: ^ has a different meaning when it appears at the start of a character class such as [^xyz].
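
The lookbehind and negative forms in the table have no example in these notes, so here is a minimal sketch. The sample string and variable names are made up for illustration, and lookbehind needs an engine with ES2018 support.

```javascript
const prices = "cost: $42, id: 42, tax: $7";

// Lookbehind: digits that are preceded by a dollar sign.
const dollarAmounts = prices.match(/(?<=\$)\d+/g);

// Negative lookbehind: digits NOT preceded by a dollar sign or another digit.
const plainNumbers = prices.match(/(?<![$\d])\d+/g);

// Negative lookahead: "quick" only when it is not followed by " fox".
const quickNotBeforeFox = /quick(?! fox)/.test("A quick fox");
const quickNotBeforeDog = /quick(?! dog)/.test("A quick fox");

console.log(dollarAmounts, plainNumbers, quickNotBeforeFox, quickNotBeforeDog);
```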

2.2.1.2. Additional: Groups and ranges

Characters     Meaning
x|y            Matches either “x” or “y”
[xyz] [a-c]    A character set
[^xyz] [^a-c]  A negated or complemented character set
(x)            Capturing group: matches x and remembers the match
\n             Where “n” is a positive integer: a backreference to the n-th capturing group
(?<Name>x)     Named capturing group
(?:x)          Non-capturing group

Note: (?<Name>x) matches “x” and stores it on the groups property of the returned matches under the name specified by <Name>.
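
A short sketch of the three group kinds from the table: named capturing groups, a numbered backreference, and a non-capturing group. The sample strings are illustrative only.

```javascript
// Named capturing groups end up on the groups property of the match.
const dateMatch = "2021-03-15".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(dateMatch.groups.year, dateMatch.groups.month, dateMatch.groups.day);

// \1 refers back to whatever the first capturing group matched.
const repeatedWord = /\b(\w+) \1\b/;
const hasRepeat = repeatedWord.test("it is is fine");
const noRepeat = repeatedWord.test("it is fine");

// (?:...) groups without remembering, so (bar) is still group 1.
const nonCapturing = "foofoo bar".match(/(?:foo)+ (bar)/);
console.log(hasRepeat, noRepeat, nonCapturing[1]);
```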

2.2.2. Boundaries

Represents the beginning and end of lines and words.

2.2.3. Character Classes

Distinguish between different types of characters, such as letters and numbers.

2.2.4. Groups and Ranges

Represents the grouping and range of expression characters.

2.2.5. Quantifiers

Represents the number of matched characters or expressions.

2.2.6. Unicode Property Escapes

Distinguish characters based on their Unicode character properties, for example uppercase and lowercase letters, mathematical symbols, and punctuation.
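
A minimal sketch of Unicode property escapes, assuming an engine with ES2018 support. The u flag is required for \p{...}; the sample string is made up for illustration.

```javascript
const sample = "Hello Wörld, π ≈ 3.14!";

// \p{Lu}: uppercase letters
const uppercase = sample.match(/\p{Lu}/gu);

// \p{Sm}: mathematical symbols
const mathSymbols = sample.match(/\p{Sm}/gu);

// \p{P}: punctuation of any kind
const punctuation = sample.match(/\p{P}/gu);

console.log(uppercase, mathSymbols, punctuation);
```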

2.3. Special Characters

I have read the explanations on W3school.com.cn and on MDN. I suggest looking up the meaning of special characters in MDN's regular expression guide.

Quantifier   Description
n+           Matches the preceding pattern n one or more times
n*           Matches the preceding pattern n zero or more times
n?           Matches the preceding pattern n zero or one time
n{X}         X is a positive integer; matches when the preceding pattern n occurs exactly X times in a row
n{X,Y}       Matches a sequence of X to Y occurrences of n
n{X,}        Matches a sequence of at least X occurrences of n
n$           Matches any string ending in n
^n           Matches any string starting with n
?=n          Matches any string immediately followed by the specified string n
?!n          Matches any string that is not immediately followed by the specified string n
x|y          Matches x or y
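
To make the quantifier rows concrete, here is a small sketch with one match per row. It also shows the greedy and non-greedy behavior of + and +?. All sample strings are illustrative.

```javascript
const counts = {
  plus: "caaandy".match(/a+/)[0],             // one or more
  star: "cndy".match(/ca*n/)[0],              // zero or more (here zero)
  optional: "colour color".match(/colou?r/g), // zero or one "u"
  exact: "aaaa".match(/a{2}/)[0],             // exactly two
  range: "aaaa".match(/a{1,3}/)[0],           // one to three, takes as many as it can
  atLeast: "aaaa".match(/a{2,}/)[0],          // two or more
};

// Quantifiers are greedy by default; a trailing ? makes them non-greedy.
const greedy = "<b>x</b>".match(/<.+>/)[0];
const lazy = "<b>x</b>".match(/<.+?>/)[0];

console.log(counts, greedy, lazy);
```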

2.4. Flags (modifiers)

2.4.1. Syntax

var re = /pattern/flags; or var re = new RegExp('pattern', 'flags');

Flag   Meaning                                                                                           Corresponding property
i      Case-insensitive search                                                                           RegExp.prototype.ignoreCase
g      Global search                                                                                     RegExp.prototype.global
m      Multi-line search                                                                                 RegExp.prototype.multiline
s      Allows . to match newline characters (added in ES2018; not supported by older Firefox versions)   RegExp.prototype.dotAll
u      Treats the pattern as a sequence of Unicode code points                                           RegExp.prototype.unicode
y      Performs a sticky search, matching from the current position in the target string                 RegExp.prototype.sticky
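
A brief sketch of the less familiar flags in the table, s and y, plus one way to get a regular expression with different flags by building a new object from source and flags. The variable names are illustrative.

```javascript
// s (dotAll): lets . match a newline as well.
const dotDefault = /a.b/.test("a\nb");
const dotAll = /a.b/s.test("a\nb");

// y (sticky): the match must start exactly at lastIndex.
const sticky = /foo/y;
const stickyAtStart = sticky.test("barfoo"); // lastIndex is 0, "foo" is not there
sticky.lastIndex = 3;
const stickyAtThree = sticky.test("barfoo"); // "foo" starts exactly at index 3

// Flags of an existing object are fixed, so a changed set needs a new object.
const original = /hui/gi;
const withMultiline = new RegExp(original.source, original.flags + "m");

console.log(dotDefault, dotAll, stickyAtStart, stickyAtThree, withMultiline.flags);
```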

2.4.2. Additional

  1. Flags are part of a regular expression and cannot be added or removed at a later time after the regular expression is created.
  2. The behavior associated with the 'g' flag is different when the .exec() method is used. exec() is a RegExp method and match() is a String method, so why do the two look the same without the g flag?

    The roles of owner and argument are reversed. With .exec(), the regular expression owns the method and the string is the argument; with .match(), the string class (or data type) owns the method and the regular expression is just an argument. Compare str.match(reg) with reg.exec(str). The 'g' flag is used together with exec() to step through the matches, because each call advances the lastIndex property, as described under the exec method later.
  3. How is m, the multi-line flag, used? The m flag specifies that a multi-line input string should be treated as multiple lines. If the m flag is used, ^ and $ match the beginning or end of each line in the input string, not only the beginning or end of the entire string.
    var reg = new RegExp("^.", "gm");
    var str = "hui \nhello \nfuck!";
    console.log(str.match(reg)); //(3) ["h", "h", "f"]

2.4.3. Example

    var names = "Orange Trump ; Fred Barney; Helen Rigby ; Bill Abel ; Chris Hand ";
    var output = ["---------- Original String\n", names + "\n"];
    // Now output = ["---------- Original String\n", "Orange Trump ; Fred Barney; Helen Rigby ; Bill Abel ; Chris Hand \n"]
    var pattern = /\s*;\s*/;
    var nameList = names.split(pattern);
    // nameList = ["Orange Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand "]
    pattern = /(\w+)\s+(\w+)/;
    var bySurnameList = [];
    output.push("---------- After Split by Regular Expression");
    // Now output also ends with "---------- After Split by Regular Expression"
    var i, len;
    for (i = 0, len = nameList.length; i < len; i++) {
        output.push(nameList[i]);
        bySurnameList[i] = nameList[i].replace(pattern, "$2, $1");
        console.log("xixixi:" + bySurnameList[i]);
    }

    // Outputs a new array
    output.push("---------- Names Reversed");
    for (i = 0, len = bySurnameList.length; i < len; i++) {
        output.push(bySurnameList[i]);
    }

    // Sort by last name and print the sorted array.
    bySurnameList.sort();
    output.push("---------- Sorted");
    for (i = 0, len = bySurnameList.length; i < len; i++) {
        output.push(bySurnameList[i]);
    }

    output.push("---------- End");
 
    console.log(output.join("\n"));

2.5. RegExp object methods

2.5.1. RegExpObject.compile(regexp, modifier)

compile() compiles a regular expression during script execution, and can also be used to change and recompile one. The modifier argument is given as i or g. Note that MDN marks compile() as deprecated.

    var str = "Every man in the world! Every woman on earth!";
    console.log(str);
    patt = /man/g;
    str2 = str.replace(patt, "person");
    console.log(str2);
    console.log(patt);

    patt = /(wo)?man/g;
    console.log(patt);

    patt.compile(patt); // Recompile after changing the regular expression
    // Why recompile? When I tested it, the output was the same without recompiling.
    // I found an answer on Juejin: https://juejin.cn/post/6844903653686378504#heading-22
    // If the regular expression is used many times, compiling it improves efficiency;
    // if it is used only once or a few times, compiling has no significant effect.
    console.log(patt);

    str2 = str.replace(patt, "person");
    console.log(str2);

2.5.2. RegExpObject.exec(string)

exec() searches the string by pattern and returns an array for the text it finds. Even if the string contains several matches, only the first matched text is returned per call.

exec() is more flexible and more complex. If it finds a match it returns a result array; otherwise it returns null. Element 0 of the array is the text matched by the whole expression, element 1 is the text matched by the first subexpression (capturing group) of RegExpObject, if any, element 2 is the text matched by the second subexpression, if any, and so on. Besides the array elements and the length property, the result carries two more properties: index is the position of the first character of the matched text, and input holds the string that was searched. For a non-global RegExp object, exec() therefore returns the same array as String.prototype.match().

When RegExpObject is a global regular expression, the behavior of exec() is a little more complicated. It starts searching the string at the position given by the lastIndex property of RegExpObject. When exec() finds text that matches the expression, it sets lastIndex to the position just after the last character of the matched text; in the example below that is the position after the i of each matched hui. You can therefore iterate through all the matching text in a string by calling exec() repeatedly. When exec() finds no more matching text, it returns null and resets lastIndex to 0.

    var reg = new RegExp(/hui/ig);
    var str = 'hui hui hui Like Summer';
    console.log(reg.lastIndex); // 0
    console.log(reg.exec(str)); // ["hui", index: 0, input: "hui hui hui Like Summer", groups: undefined]
    // The matched text ends at index 2, so lastIndex is 3
    console.log(reg.lastIndex); // 3
    console.log(reg.exec(str)); // ["hui", index: 4, input: "hui hui hui Like Summer", groups: undefined]
    console.log(reg.lastIndex); // 7
    console.log(reg.exec(str)); // ["hui", index: 8, input: "hui hui hui Like Summer", groups: undefined]
    console.log(reg.lastIndex); // 11
    console.log(reg.exec(str)); // null
    console.log(reg.lastIndex); // 0

Calling exec() in a loop walks through every match:

    var reg = /\w+\s/g;
    var str = "fee fi fo fum";
    var xArray;
    while (xArray = reg.exec(str)) console.log(xArray);
    // ["fee ", index: 0, input: "fee fi fo fum"]
    // ["fi ", index: 4, input: "fee fi fo fum"]
    // ["fo ", index: 7, input: "fee fi fo fum"]

To match a different string with the same reg afterwards, you need to set reg.lastIndex back to 0 manually. Otherwise, as shown below, the search starts from the stale position and returns null even though the new string contains a match.

    var reg = new RegExp(/hui/ig);
    var str = 'hui hui hui Like Summer';
    console.log("lastIndex:" + reg.lastIndex); // 0
    console.log(reg.exec(str));
    console.log("lastIndex:" + reg.lastIndex); // 3
    console.log(reg.exec('hui Like Summer')); // null, because the search starts at index 3
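
The snippet above shows the failure; a minimal sketch of the manual reset could look like this. The variable names are illustrative.

```javascript
const reg = /hui/gi;
const first = "hui hui hui Like Summer";
const second = "hui Like Summer";

reg.exec(first);                       // leaves lastIndex at 3
const withoutReset = reg.exec(second); // search starts at index 3: no match

reg.exec(first);                       // lastIndex is 3 again
reg.lastIndex = 0;                     // manual reset before switching strings
const withReset = reg.exec(second);    // search starts at index 0: match

console.log(withoutReset, withReset && withReset.index);
```

Creating a fresh regular expression for each string, or leaving out the g flag when only the first match is needed, avoids the stale position altogether.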

Exec also returns much more detailed information than match.

    var reg = new RegExp(/hui/ig);
    var str = ' HuiDT hui hui hui Like Summer';
    console.log(reg.exec(' HuiDT hui hui hui Like Summer'));
    console.log(reg.exec(str));
    console.log(/hui/ig.exec(' HuiDT hui hui hui Like Summer'));
    console.log(/hui/ig.exec(str));
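
To make the comparison with match concrete, here is a small sketch using the same sample string. With the g flag, match() returns only the matched text, while exec() (and matchAll(), mentioned at the top of these notes) also expose the position of each match.

```javascript
const str = " HuiDT hui hui hui Like Summer";

// match() with g: just the matched strings.
const allMatches = str.match(/hui/gi);

// exec(): one match per call, with index and input attached.
const firstExec = /hui/gi.exec(str);

// matchAll(): every match, each with the same detail as an exec() result.
const positions = Array.from(str.matchAll(/hui/gi), (m) => m.index);

console.log(allMatches, firstExec.index, firstExec.input === str, positions);
```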

2.5.3. test()

Searches the string by pattern and returns true if the string contains a match for the regular expression, false otherwise.

    var reg = /hui/;
    var str = ' huiDT Like Summer';
    console.log(reg.test('huiDT Like Summer'));
    console.log(reg.test(str));
    console.log(/hui/.test('huiDT Like Summer'));
    console.log(/hui/.test(str));
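
One thing the example above does not show: with the g flag, test() advances lastIndex just as exec() does, so repeated calls on the same object can alternate between true and false. A minimal sketch:

```javascript
const globalReg = /hui/g;
const firstCall = globalReg.test("hui");  // match found, lastIndex moves to 3
const secondCall = globalReg.test("hui"); // search starts at 3, nothing left
const thirdCall = globalReg.test("hui");  // lastIndex was reset to 0, match again

const plainReg = /hui/;
const plainFirst = plainReg.test("hui");
const plainSecond = plainReg.test("hui"); // no g flag: always starts at 0

console.log(firstCall, secondCall, thirdCall, plainFirst, plainSecond);
```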

2.6. RegExp instance properties

Each instance of RegExp has the following properties from which information can be obtained.

  • global: Boolean value indicating whether the g flag is set.
  • ignoreCase: Boolean value indicating whether the i flag is set.
  • lastIndex: Integer indicating the character position at which the next search for a match starts, counting from 0.
  • multiline: Boolean value indicating whether the m flag is set.
  • source: String representation of the regular expression, returned in literal form rather than as the string pattern that was passed to the constructor.
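
A quick sketch that reads these properties from one object. Note how source gives the literal form of the pattern, not the doubled backslashes that were needed in the constructor string.

```javascript
const re = new RegExp("\\[bc\\]at", "gi");

console.log(re.global);     // g flag is set
console.log(re.ignoreCase); // i flag is set
console.log(re.multiline);  // m flag is not set
console.log(re.lastIndex);  // nothing matched yet
console.log(re.source);     // literal form: \[bc\]at
```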

2.7. RegExp constructor properties

The RegExp constructor has properties that apply to all regular expressions in the current scope and change with the most recent regular expression operation. Each has a long and a short property name.

Long property name   Short property name   Description
input                $_                    The string that was last matched against
lastMatch            $&                    The last matched text
lastParen            $+                    The last matched capture group
leftContext          $`                    The text before lastMatch in the input string
multiline            $*                    Boolean value indicating whether all expressions are in multi-line mode
rightContext         $'                    The text after lastMatch in the input string
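
A sketch of reading these constructor properties right after a match. These are legacy properties, so treat this as an illustration that assumes an engine which still provides them (V8 and Node.js do), not as a recommended technique. The values are copied into a plain object immediately, because any later regular expression operation overwrites them.

```javascript
const text = "this has been a short summer";
const pattern = /(.)hort/g;

let snapshot = null;
if (pattern.test(text)) {
  // Copy the values at once; the next regex operation would change them.
  snapshot = {
    input: RegExp.input,
    leftContext: RegExp.leftContext,
    rightContext: RegExp.rightContext,
    lastMatch: RegExp.lastMatch,
    lastParen: RegExp.lastParen,
    shortLastMatch: RegExp["$&"], // short names need bracket notation
  };
}

console.log(snapshot);
```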

2.8. String methods that support regular expressions

  1. search(): Accepts a regular expression or a string as the search parameter; a string argument is converted to a regular expression. Returns the position where the pattern first appears, or -1 if there is no match.
    var reg = /hui/i;
    var str = 'DT hui Like Summer';
    console.log(str.search(reg)); // 3
    console.log(str.search("hui")); // 3
    console.log(str.search(/hui/i)); // 3
    console.log('DT hui Like Summer'.search(reg)); // 3
    console.log('DT hui Like Summer'.search("hui")); // 3
    console.log('DT hui Like Summer'.search(/hui/i)); // 3
  2. replace(): Replaces the substring that matches the regular expression.
    var reg = /hui/i;
    var str = 'DT hui Like Summer';
    var str1 = 'can';
    console.log(str.replace('DT', 'can'));
    console.log(str.replace(/hui/, 'can'));
    console.log(str.replace(/DT/, str1));
    console.log(str.replace(reg, str1));
    console.log('DT DT hui Like Summer'.replace(/DT/g, 'can'));
  3. match(): Finds one or more matches of a regular expression.
    var reg = new RegExp("hui", 'ig');
    var str = "Huihui Like Summer";
    console.log(str.match(reg));

  4. split(): Splits a string into an array of strings.

Regular expressions are composed of ordinary characters and special characters (metacharacters). Ordinary characters include both printable and non-printing characters that are not metacharacters.
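
split() is the only method in this list without an example, so here is a minimal sketch with a regular expression as the separator. The sample strings are made up for illustration.

```javascript
const csv = "apple, banana;cherry ,  date";

// Split on a comma or semicolon, swallowing the whitespace around it.
const parts = csv.split(/\s*[,;]\s*/);

// An optional second argument limits the number of pieces.
const limited = csv.split(/\s*[,;]\s*/, 2);

// A capturing group in the separator keeps the separators in the result.
const withSeparators = "a1b2c".split(/(\d)/);

console.log(parts, limited, withSeparators);
```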

2.9. Brackets

Brackets represent a range and are used to find characters in a specified range.

  1. [abc]: Finds the characters that appear in the square brackets, that is, in the given set.
    var reg = /[abc]/ig;
    var str = "A a b d hui";
    console.log(str.match(reg)); //["A", "a", "b"]
  2. [^abc]: Finds all characters that do not appear in the square brackets, that is, characters outside the given set.
    var reg = /[^abc]/ig;
    var str = "A a b d hui";
    console.log(str.match(reg)); // [" ", " ", " ", "d", " ", "h", "u", "i"]
  3. [0-9]: Finds any digit from 0 to 9.

  4. [a-z]: Finds any character from lowercase a to lowercase z.

  5. [A-Z]: Finds any character from uppercase A to uppercase Z.

  6. [A-z]: Finds any character from uppercase A to lowercase z.

  7. (red|blue|green): Finds any of the given alternatives.

    var str = "abchui";
    console.log(str.match(/[|a|b|c]/ig)); //["a", "b", "c"]
    console.log(str.match(/[^|a|b|c]/ig)); //["h", "u", "i"]
    console.log(str.match(/(^a|b|c)/ig)); //["a", "b", "c"]
    console.log(str.match(/(a|hui)/ig)); //["a", "hui"]
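
The range forms in the list have no example of their own, so here is a short sketch. It also shows why [A-z] is wider than it looks: the range covers a few symbols that sit between Z and a in the character table, including the underscore.

```javascript
const sample = "Room 42B, floor 3a";

const digits = sample.match(/[0-9]/g);
const lowercase = sample.match(/[a-z]/g).join("");
const uppercase = sample.match(/[A-Z]/g);

// [A-z] also covers [ \ ] ^ _ and the backtick.
const underscoreInRange = /[A-z]/.test("_");

// Alternation picks any of the listed words.
const colors = "green blue pink".match(/(red|blue|green)/g);

console.log(digits, lowercase, uppercase, underscoreInRange, colors);
```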

2.10. Metacharacters

Metacharacters are characters that have a special meaning in a pattern. They must be escaped when you want to match them literally.

Special character   Description
.                   Finds a single character, except newlines and line terminators.
\w                  Finds word characters (no symbols, no spaces).
\W                  Finds non-word characters.
\d                  Finds digits.
\D                  Finds non-digit characters.
\s                  Finds whitespace characters.
\S                  Finds non-whitespace characters.
\b                  Matches word boundaries.
\B                  Matches non-word boundaries.
\0                  Finds NUL characters.
\n                  Finds a newline character.
\f                  Finds a form feed character.
\r                  Finds a carriage return.
\t                  Finds a tab.
\v                  Finds a vertical tab.
\xxx                Finds the character specified by the octal number xxx.
\xdd                Finds the character specified by the hexadecimal number dd.
\uxxxx              Finds the Unicode character specified by the hexadecimal number xxxx.

[example]

    var str = "Like's 520";
    var reg1 = /./ig;
    console.log(str.match(reg1)); // ["L", "i", "k", "e", "'", "s", " ", "5", "2", "0"]
    var reg2 = /\w/ig;
    console.log(str.match(reg2)); //["L", "i", "k", "e", "s", "5", "2", "0"]
    var reg3 = /\W/ig;
    console.log(str.match(reg3)); // ["'", " "]
    var reg4 = /\d/ig;
    console.log(str.match(reg4)); // ["5", "2", "0"]
    var reg5 = /\D/ig;
    console.log(str.match(reg5)); //["L", "i", "k", "e", "'", "s", " "]
    var reg6 = /\s/ig;
    console.log(str.match(reg6)); // [" "]
    var reg7 = /\S/ig;
    console.log(str.match(reg7)); //["L", "i", "k", "e", "'", "s", "5", "2", "0"]
    var reg8 = /\b/ig; // Word boundaries: zero-width positions at the edges of words
    console.log(str.match(reg8)); //(6) ["", "", "", "", "", ""]

    var reg9 = /\B/ig; // Non-word boundaries: zero-width positions that are not at the edge of a word
    console.log(str.match(reg9)); //(5) ["", "", "", "", ""]
    console.log("1234567".match(reg9)); //(6) ["", "", "", "", "", ""]

    str = "Like's hui A ቐ W \0 \n \f \r \t \v 0b10 070 0xaa";
    var reg10 = /\0/ig;
    console.log(str.match(reg10));
    var reg11 = /\n/ig;
    console.log(str.match(reg11));
    var reg12 = /\f/ig;
    console.log(str.match(reg12));
    var reg13 = /\r/ig;
    console.log(str.match(reg13));
    var reg14 = /\t/ig;
    console.log(str.match(reg14));
    var reg15 = /\v/ig;
    console.log(str.match(reg15)); // one match: the vertical tab character
    var reg16 = /\150/g; // find octal numbers
    console.log(str.match(reg16));
    var reg17 = /\151/g; // find octal numbers
    console.log(str.match(reg17));
    console.log(str.match(/\x57/g)); // Find a hexadecimal number
    var reg18 = /\u0041/g; // Find Unicode characters specified in the hexadecimal number XXXX
    console.log(str.match(reg18));
    console.log(str.match(/\u1250/));

The characters that need escaping are:

    ( [ { \ ^ $ | ) ? * + . ] }

Usage reference: developer.mozilla.org/zh-CN/docs/…

Metacharacter   Description
$               Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches before '\n' or '\r'. To match the $ character itself, use \$.
( )             Marks the start and end of a subexpression. Subexpressions can be retrieved for later use. To match these characters, use \( and \).
*               Matches the preceding subexpression zero or more times. To match the * character, use \*.
+               Matches the preceding subexpression one or more times. To match the + character, use \+.
.               Matches any single character except the newline character \n. To match ., use \.
[               Marks the beginning of a bracket expression. To match [, use \[.
?               Matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character, use \?.
\               Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n" and '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^               Matches the start of the input string, unless it is used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{               Marks the beginning of a quantifier expression. To match {, use \{.
|               Indicates a choice between two items. To match |, use \|.
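
When the text to match comes from a variable, the escapes in this table have to be applied programmatically. A minimal sketch follows; escapeRegExp is a hypothetical helper name, not a built-in, and it uses the widely used replace pattern for this job.

```javascript
// Hypothetical helper: puts a backslash in front of every metacharacter.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const literalText = "1+1=2 (really?)";
const escaped = escapeRegExp(literalText);

// Without escaping, + ( ) and ? keep their special meaning and the match fails.
const unescapedResult = new RegExp("1+1=2").test("1+1=2");

// With escaping, the text is matched character for character.
const escapedResult = new RegExp(escaped).test("is 1+1=2 (really?)");

console.log(escaped, unescapedResult, escapedResult);
```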

2.11. Non-printing characters

Character   Description
\cx         Matches the control character specified by x. For example, \cM matches a Control-M or carriage return character. The value of x must be A-Z or a-z; otherwise c is treated as a literal 'c' character.
\f          Matches a form feed character. Equivalent to \x0c and \cL.
\n          Matches a newline character. Equivalent to \x0a and \cJ.
\r          Matches a carriage return. Equivalent to \x0d and \cM.
\s          Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S          Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t          Matches a tab character. Equivalent to \x09 and \cI.
\v          Matches a vertical tab character. Equivalent to \x0b and \cK.
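
The equivalences in the table can be checked directly. This small sketch tests each control-character and hexadecimal form against the matching JavaScript string escape.

```javascript
const equivalences = {
  carriageReturn: /\cM/.test("\r") && /\x0d/.test("\r"),
  newline: /\cJ/.test("\n") && /\x0a/.test("\n"),
  tab: /\cI/.test("\t") && /\x09/.test("\t"),
  formFeed: /\cL/.test("\f") && /\x0c/.test("\f"),
  verticalTab: /\cK/.test("\v") && /\x0b/.test("\v"),
};

// \s covers all of them, so it can split on any kind of whitespace.
const words = "a\tb\nc d".split(/\s/);

console.log(equivalences, words);
```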

2.12. Operator priority

2.13. Matching rules

    var str = "hui hello fuck! Orange Trump";
    str1 = str.replace(/(\w+)\s+(\w+)/g, "$2, $1");
    console.log(str1); // hello, hui fuck! Trump, Orange

3. Limitations

ECMAScript lacks some of the advanced regular expression features supported by other languages (especially Perl).