Preface

Regular expressions are patterns used to match combinations of characters in strings. In JavaScript, regular expressions are also objects. These patterns are used for the exec and test methods of RegExp, and for the match, matchAll, replace, search, and split methods of String.

This chapter introduces JavaScript regular expressions. By using regular expressions, you can:

  • Test for patterns within a string. For example, you can test an input string to see whether a phone number pattern or a credit card number pattern appears in it. This is called data validation.

  • Replace text. You can use a regular expression to identify specific text in a document and either remove it entirely or replace it with other text.

  • Extract substrings from a string based on a pattern match. You can find specific text within a document or an input field.
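As a rough illustration of the three uses listed above, here is a minimal sketch. The phone number format and the sample strings are assumptions made up for this example, not formats the chapter prescribes.

```javascript
// Validation: does the whole input look like a phone number?
// (hypothetical, simplified format: 3 digits, 3 digits, 4 digits)
const phonePattern = /^\d{3}-\d{3}-\d{4}$/;
const isPhone = phonePattern.test('555-123-4567');

// Replacement: collapse every run of whitespace into a single space
const collapsed = 'too    many   spaces'.replace(/\s+/g, ' ');

// Extraction: pull the first four-digit number out of a sentence
const yearMatch = 'Released in 2015, revised later'.match(/\d{4}/);
const year = yearMatch ? yearMatch[0] : null;

console.log(isPhone);   // true
console.log(collapsed); // too many spaces
console.log(year);      // 2015
```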

Basic usage

Create a regular expression

Use a regular expression literal, which consists of a pattern enclosed between slashes, as follows:

var re = /ab+c/;

Regular expression literals are compiled when the script is loaded. If the regular expression stays constant, using a literal can improve performance. Alternatively, call the constructor of the RegExp object, as follows:

var re = new RegExp("ab+c");

Regular expressions created with the constructor are compiled while the script is running. Use the constructor when the pattern will change, or when it is generated dynamically from a source such as user input.
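When the pattern comes from user input, the constructor is usually paired with an escaping step so that characters like . or * are treated literally. The sketch below assumes two hypothetical helpers, escapeRegExp and buildSearchRegex; neither is a built-in.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a pattern
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a global, case-insensitive regex from raw user input
function buildSearchRegex(userInput) {
  return new RegExp(escapeRegExp(userInput), 'gi');
}

const re = buildSearchRegex('a.b');
const found = 'A.B axb a.b'.match(re);

console.log(re.source); // a\.b
console.log(found);     // [ 'A.B', 'a.b' ]  ("axb" is skipped because the dot is literal)
```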

Methods that use regular expressions in JavaScript

Method Description
exec A RegExp method that searches for a match in a string. It returns an array (or null if no match is found).
test A RegExp method that tests for a match in a string. It returns true or false.
match A String method that searches for a match in a string. It returns an array, or null if there is no match.
matchAll A String method that finds all matches in a string. It returns an iterator.
search A String method that tests for a match in a string. It returns the index of the match, or -1 if the search fails.
replace A String method that searches for matches in a string and replaces the matched substrings with a replacement string.
split A String method that uses a regular expression or a fixed string to break a string apart and stores the resulting substrings in an array.
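  • Difference between exec and match

    • They are methods of the RegExp class and the String class, respectively.

    • exec returns at most one match per call. Without the g flag it always returns the first match; with the g flag, each call resumes from lastIndex, so repeated calls return successive matches. match returns an array of all matched substrings when the regular expression has the g flag; without g, it returns the same result as exec.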

// With the g flag
const str = 'd3aish hello world d5aisy';
const regGlobal = /\dai/g;

console.log(str.match(regGlobal));  // ['3ai', '5ai']: every match, without index or groups
console.log(regGlobal.exec(str));   // ['3ai', index: 1, ...]: one match per call

// Without the g flag
const regSingle = /\dai/;

console.log(str.match(regSingle));  // ['3ai', index: 1, ...]: same result as exec
console.log(regSingle.exec(str));   // ['3ai', index: 1, ...]
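One practical consequence of the difference above: match with the g flag drops capture groups and indices. If those details are needed for every match, matchAll is one option. A small sketch, reusing the sample string with a capture group added for illustration:

```javascript
const text = 'd3aish hello world d5aisy';

// match with g: only the matched substrings
const plain = text.match(/(\d)ai/g);

// matchAll (the regex must have the g flag): one full match object per match
const detailed = [...text.matchAll(/(\d)ai/g)].map((m) => ({
  match: m[0],    // the whole match
  digit: m[1],    // capture group 1
  index: m.index, // where the match starts
}));

console.log(plain);    // [ '3ai', '5ai' ]
console.log(detailed); // entries for '3ai' at index 1 and '5ai' at index 20
```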

Quantifiers

Characters Meaning
x* Matches the preceding item "x" 0 or more times.

For example, /bo*/ matches "boooo" in "A ghost booooed" and "b" in "A bird warbled", but nothing in "A goat grunted".
x+ Matches the preceding item "x" 1 or more times. Equivalent to {1,}.

For example, /a+/ matches the "a" in "candy" and all the "a"s in "caaaaaaandy".
x? Matches the preceding item "x" 0 or 1 times.

For example, /e?le?/ matches the "el" in "angel" and the "le" in "angle".

If used immediately after any of the quantifiers *, +, ?, or {}, it makes the quantifier non-greedy (matching the minimum number of times) instead of the default greedy (matching the maximum number of times).
x{n} Where "n" is a positive integer, matches exactly "n" occurrences of the preceding item "x".

For example, /a{2}/ does not match the "a" in "candy", but it matches both "a"s in "caandy" and the first two "a"s in "caaandy".
x{n,} Where "n" is a positive integer, matches at least "n" occurrences of the preceding item "x".

For example, /a{2,}/ does not match the "a" in "candy", but it matches all the "a"s in "caandy" and in "caaaaaaandy".
x{n,m} Where "n" is 0 or a positive integer, "m" is a positive integer, and m > n, matches at least "n" and at most "m" occurrences of the preceding item "x".

For example, /a{1,3}/ matches nothing in "cndy", the "a" in "candy", the two "a"s in "caandy", and the first three "a"s in "caaaaaaandy". Note that when matching "caaaaaaandy", the match is "aaa", even though the original string contains more "a"s.
x*?

x+?

x??

x{n}?

x{n,}?

x{n,m}?
By default, quantifiers such as * and + are "greedy", meaning that they try to match as much of the string as possible. A ? character after the quantifier makes it "non-greedy": it stops as soon as it finds a match. For example, given a string such as "some <foo> <bar> new </bar> </foo> thing":

/<.*>/ will match "<foo> <bar> new </bar> </foo>"

/<.*?>/ will match "<foo>"
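To see greedy and non-greedy quantifiers side by side, here is a short sketch. The sample string with nested tags is an assumption chosen for illustration.

```javascript
// Assumed sample input containing nested tags
const sample = 'some <foo> <bar> new </bar> </foo> thing';

// Greedy: .* grabs as much as possible, so the match runs to the last ">"
const greedy = sample.match(/<.*>/)[0];

// Non-greedy: .*? stops at the first ">" that completes the pattern
const lazy = sample.match(/<.*?>/)[0];

// The same contrast with a counted quantifier
const atMostThree = 'caaaaaaandy'.match(/a{1,3}/)[0];
const asFewAsPossible = 'caaaaaaandy'.match(/a{1,3}?/)[0];

console.log(greedy);          // <foo> <bar> new </bar> </foo>
console.log(lazy);            // <foo>
console.log(atMostThree);     // aaa
console.log(asFewAsPossible); // a
```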

Flags

Regular expressions support optional flags that enable global search, case-insensitive search, and so on. The six flags described here can be used alone or together, in any order, and are included as part of the regular expression.

Flag Description
g Global search.
i Case-insensitive search.
m Multi-line search.
s Allows . to match newline characters.
u Unicode mode: treats the pattern as a sequence of Unicode code points.
y Performs a "sticky" search that matches starting at the current position in the target string.

To include flags in a regular expression, use the following syntax:

var re = /pattern/flags;

or

var re = new RegExp("pattern", "flags");

It is worth noting that flags are part of a regular expression and they cannot be added or removed at a later time.
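Since the flags of an existing regular expression cannot be changed, a common workaround is to build a new one from the old pattern. A minimal sketch, using the source and flags properties of RegExp instances (the variable names are just for illustration):

```javascript
const original = /ab+c/i;

console.log(original.flags);  // i
console.log(original.global); // false

// Build a new regex that reuses the pattern and adds the g flag
const globalVersion = new RegExp(original.source, original.flags + 'g');

console.log(globalVersion.flags);             // gi
console.log('ABBC abc'.match(globalVersion)); // [ 'ABBC', 'abc' ]
console.log(original.flags);                  // i  (the original is unchanged)
```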

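The g flag

// With the g flag, test advances lastIndex after each successful match
const regGlobal = /abc/gi;
const str = 'helloabc';

regGlobal.test(str) // true  (lastIndex is now 8)
regGlobal.test(str) // false (the search starts at 8, fails, and lastIndex resets to 0)
regGlobal.test(str) // true
regGlobal.test(str) // false

// Without the g flag, every call starts from index 0
const regPlain = /abc/i;

regPlain.test(str) // true
regPlain.test(str) // true
regPlain.test(str) // true
regPlain.test(str) // true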

lastIndex is a property of regular expression instances that comes into play for global regular expressions: it stores the position of the first character after the most recently matched text. Both RegExp.prototype.exec() and RegExp.prototype.test() use the position stored in lastIndex as the starting point for the next search. Calling these methods repeatedly therefore iterates over all the matches in the string. The lastIndex property is readable and writable, and it is automatically reset to 0 when RegExp.prototype.exec() or RegExp.prototype.test() finds no further match. As a result, repeated calls cycle through the matches again from the start, indefinitely.
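The way lastIndex moves can be traced with a simple exec loop. The sketch below also shows a common pitfall, reusing a global regex on a different string without resetting lastIndex; the sample strings are made up for illustration.

```javascript
const wordRe = /\w+/g;
const line = 'one two three';
const seen = [];

let m;
while ((m = wordRe.exec(line)) !== null) {
  // After each successful match, lastIndex points just past the matched text
  seen.push({ word: m[0], start: m.index, next: wordRe.lastIndex });
}
// The final, failed exec call reset lastIndex to 0
const afterLoop = wordRe.lastIndex;

console.log(seen);      // one (0 to 3), two (4 to 7), three (8 to 13)
console.log(afterLoop); // 0

// Pitfall: a global regex remembers lastIndex across different strings
const hasDigit = /\d/g;
hasDigit.test('a1');               // true, lastIndex is now 2
const stale = hasDigit.test('1a'); // false, the search started at index 2
hasDigit.lastIndex = 0;            // reset manually before reusing
const fresh = hasDigit.test('1a'); // true

console.log(stale, fresh); // false true
```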

The y flag

The y flag performs a "sticky" search: the match must start at the current position (lastIndex) in the target string.

var searchStrings, stickyRegexp;

stickyRegexp = /foo/y;

searchStrings = [
    "foo",
    " foo",
    "  foo",
];

searchStrings.forEach(function(text, index) {
    // Force every test to start at index 1
    stickyRegexp.lastIndex = 1;
    console.log("found a match at", index, ":", stickyRegexp.test(text));
});

// found a match at 0 : false
// found a match at 1 : true
// found a match at 2 : false

// With g instead of y:
// found a match at 0 : false
// found a match at 1 : true
// found a match at 2 : true

In other words, with the y flag the match must begin exactly at lastIndex, as if the pattern were anchored at that position (here, like applying /^foo/ to the text starting at index 1). This allows more precise control over where a match may start.
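One place where this position control is handy is a small tokenizer, where each token must begin exactly where the previous one ended. The following is only a sketch under assumed token rules; tokenTypes and tokenize are hypothetical names invented for this example.

```javascript
// Hypothetical token rules; every regex is sticky, so it can only match at lastIndex
const tokenTypes = [
  { type: 'number', re: /\d+/y },
  { type: 'ident', re: /[a-z]+/y },
  { type: 'op', re: /[+*=-]/y },
  { type: 'space', re: /\s+/y },
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const { type, re } of tokenTypes) {
      re.lastIndex = pos; // the match must start exactly at pos
      const m = re.exec(input);
      if (m) {
        if (type !== 'space') tokens.push({ type, value: m[0] });
        pos = re.lastIndex; // continue right after this token
        matched = true;
        break;
      }
    }
    if (!matched) throw new SyntaxError(`Unexpected character at ${pos}`);
  }
  return tokens;
}

const tokens = tokenize('x = 42 + y');
console.log(tokens.map((t) => t.value)); // [ 'x', '=', '42', '+', 'y' ]
```

With the g flag instead of y, each regex would be free to skip ahead and find a token further along the input, so unexpected characters in between would go unnoticed.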

Advanced usage

Greedy mode and non-greedy mode

var str = 'aacbacbc';
var reg = /a.*b/;
var res = str.match(reg);
// Matches 'aacbacb' at index 0
console.log(res);

In the example above, the first a is matched and then .* takes over. Because .* is greedy, it consumes as many characters as possible and gives characters back only until the b in the pattern can match. That b lines up with the last b in the string, so the match result is aacbacb.

var str = 'aacbacbc';
var reg = /a.*?b/;
var res = str.match(reg);
// Matches 'aacb' at index 0
console.log(res);

The a in the pattern matches the first character (index 0). Because .*? is non-greedy, it first tries to match nothing and lets b try the next character. That character is a, so .*? takes one character and b tries again. This repeats (b fails on c, then succeeds on the b at index 3), so the match ends at the first b that completes the pattern, and the result is aacb.

Capture group

To repeat a single character, simply add a quantifier after the character. For example, a+ matches one or more a's, and a? matches zero or one a.

But what if we want to repeat several characters? We can use parentheses "()" to mark the subexpression and then apply a quantifier to it. For example, (abc)? matches zero or one occurrence of abc. Each pair of parentheses here forms a group, and by default a group captures the text it matches. Groups that do not capture take several forms, including zero-width assertions and mode modifiers.
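A quick sketch of quantifiers applied to a group. The patterns and inputs are made up for illustration.

```javascript
// (ab)+ repeats the whole group, not just the last character
const repeated = /(ab)+/.exec('xxababab!');
console.log(repeated[0]); // ababab
console.log(repeated[1]); // ab  (a repeated group keeps only its last iteration)

// (abc)? makes the whole group optional
const optional = /^(abc)?def$/;
console.log(optional.test('abcdef'));  // true
console.log(optional.test('def'));     // true
console.log(optional.exec('def')[1]);  // undefined, the group did not participate
```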

Backreferences

A backreference refers to the text matched by an earlier capture group, not to the group's pattern. That is, the text matched at the backreference must be identical to the text that the group captured. For example, in /(["'])(abc).*\1/, \1 refers to the group that matched the quotation mark. The pattern matches text enclosed in two double quotes or in two single quotes, such as "abc" or 'abc', but it does not match text with mismatched quotes, such as "abc' or 'abc". In day-to-day development, backreferences are also often used to match paired HTML tags.
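Here is a small sketch of backreferences in action. The two patterns below are simplified illustrations (not the exact pattern from the text), and the tag matcher assumes plain lowercase tag names without attributes.

```javascript
// \1 must repeat whichever quote character group 1 actually matched
const quoted = /(["'])(.*?)\1/;

console.log(quoted.exec(`say "hello" now`)[2]); // hello
console.log(quoted.exec(`say 'hi' now`)[2]);    // hi
console.log(quoted.test(`"mismatched'`));       // false

// Paired tags: the closing tag name must equal the captured opening name
const paired = /<([a-z]+)>(.*?)<\/\1>/;

console.log(paired.exec('<b>bold</b>')[2]); // bold
console.log(paired.test('<b>bold</i>'));    // false
```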

Named capture group

Capture groups come in two kinds: numbered capture groups and named capture groups. A named capture group is still a capture group, only with a different syntax. In JavaScript, the syntax is (?<name>group), where name is the name of the capture group and group is the pattern inside it. Some other regex flavors also accept (?'name'group), but JavaScript does not.
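A minimal sketch of named capture groups in JavaScript, assuming a date in YYYY-MM-DD form as the sample input:

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Named groups are exposed on the groups property of the match result
const result = dateRe.exec('Published on 2021-03-15.');
const { year, month, day } = result.groups;
console.log(year, month, day); // 2021 03 15

// They are still numbered as well
console.log(result[1]); // 2021

// In a replacement string, $<name> refers to a named group
const swapped = '2021-03-15'.replace(dateRe, '$<day>/$<month>/$<year>');
console.log(swapped); // 15/03/2021
```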

Non-capturing group

Syntax: (?:pattern)

For example, to match industry or industries,

we can use industr(y|ies) or industr(?:y|ies).

A group that begins with (?: is a pure non-capturing group: it does not capture text and is not counted when groups are numbered. In other words, if the parentheses begin with ?:, the group captures nothing and has no group number, so it cannot be the target of a backreference. Capture groups already let us get at the text we want, so why have non-capturing groups at all? The reason is that the text captured by a capture group is stored in memory for later use; a backreference, for example, reads text that a capture group stored. A non-capturing group does not capture text and does not store what it matches separately. Using non-capturing groups therefore saves memory compared with capture groups.

  • In practice, this lets you quickly extract just the information you want:
'https://www.toutiao.com'.match(/(?:https?:\/\/)(.*)/)
// ["https://www.toutiao.com", "www.toutiao.com"]
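The effect on group numbering is easy to check. A short sketch, where the URL with a path is an assumed variation on the example above:

```javascript
// A capturing group adds an entry to the result array
const withCapture = 'industries'.match(/industr(y|ies)/);
console.log(withCapture.length, withCapture[1]); // 2 ies

// A non-capturing group groups the alternatives but adds nothing
const withoutCapture = 'industries'.match(/industr(?:y|ies)/);
console.log(withoutCapture.length); // 1

// Non-capturing groups are skipped when numbering, so the host is group 1
const url = 'https://www.toutiao.com/path'.match(/(?:https?:\/\/)([^/]+)(\/.*)?/);
console.log(url[1]); // www.toutiao.com
console.log(url[2]); // /path
```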

Assertions

Zero-width assertions

x(?=y) Matches "x" only if "x" is followed by "y". This is called a lookahead assertion.

For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat".

/Jack(?=Sprat|Frost)/ matches "Jack" only if it is followed by "Sprat" or "Frost".

However, neither "Sprat" nor "Frost" is part of the match result.
(?<=y)x Matches "x" only if "x" is preceded by "y". This is called a lookbehind assertion.

For example, /(?<=Jack)Sprat/ matches "Sprat" only if it is preceded by "Jack".

/(?<=Jack|Tom)Sprat/ matches "Sprat" only if it is preceded by "Jack" or "Tom". However, neither "Jack" nor "Tom" is part of the match result.
x(?!y) Matches "x" only if "x" is not followed by "y". This is called a negative lookahead assertion.

For example, /\d+(?!\.)/ matches a number only if it is not followed by a decimal point.

/\d+(?!\.)/.exec("3.141") matches "141" but not "3".
(?<!y)x Matches "x" only if "x" is not preceded by "y". This is called a negative lookbehind assertion.

For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign.

/(?<!-)\d+/.exec("3") matches "3".

/(?<!-)\d+/.exec("-3") finds no match, because the number is preceded by the minus sign.

These four constructs do not capture: they constrain where x may match, but the text they assert is not included in the match.
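The four assertions can be tried out directly. The input strings below are made up for illustration.

```javascript
// Lookahead: "Jack" only when followed by Sprat or Frost
const ahead = 'JackSprat JackFrost JackBlack'.match(/Jack(?=Sprat|Frost)/g);

// Lookbehind: "Sprat" only when preceded by Jack or Tom
const behind = 'JackSprat TomSprat BobSprat'.match(/(?<=Jack|Tom)Sprat/g);

// Negative lookahead: digits not followed by a decimal point
const notBeforeDot = /\d+(?!\.)/.exec('3.141')[0];

// Negative lookbehind: digits not preceded by a minus sign
const positive = /(?<!-)\d+/.exec('3');
const negative = /(?<!-)\d+/.exec('-3');

console.log(ahead);        // [ 'Jack', 'Jack' ]  (the asserted names are not included)
console.log(behind);       // [ 'Sprat', 'Sprat' ]
console.log(notBeforeDot); // 141
console.log(positive[0]);  // 3
console.log(negative);     // null
```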

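Example

How can we convert a string of digits into a form separated by thousands, for example 10000000000 into 10,000,000,000?

Besides conventional approaches, a regular expression can solve this:

const str = "100000000000";
// Match every position that is followed by one or more complete groups of
// three digits up to the end; \B keeps a comma from appearing at the very start
const reg = /(?=(\B\d{3})+$)/g;
console.log(str.replace(reg, ",")); // 100,000,000,000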
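To check how the lookahead behaves on other lengths, the replacement can be wrapped in a small function. addThousandsSeparators is a hypothetical helper name, and the sketch assumes the input is a string of plain digits.

```javascript
// Hypothetical helper wrapping the lookahead-based replacement
function addThousandsSeparators(digits) {
  // Insert a comma at each position followed by whole groups of three digits
  return digits.replace(/(?=(\B\d{3})+$)/g, ',');
}

console.log(addThousandsSeparators('999'));          // 999
console.log(addThousandsSeparators('1000'));         // 1,000
console.log(addThousandsSeparators('1234567'));      // 1,234,567
console.log(addThousandsSeparators('100000000000')); // 100,000,000,000
```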

Backtracking

The original string

"Regex"

Analysis of the greedy matching process

The regular expression is ".*" (a double quote, then .*, then a double quote). The first " in the pattern takes control and matches the " at the start of the string. The match succeeds, and control passes to .*.

Once .* has control, it matches the characters that follow. The . matches any character, and * means the match may repeat zero or more times. Because * is a greedy quantifier, it tries to match first, so starting at position 1 it matches R, e, g, e, x in turn and then goes on to match the final ". At that point it has reached the end of the string, so .* stops and passes control to the last " in the pattern.

When the last " gets control, it fails to match because the position is already at the end of the string. The engine looks back for a state it can backtrack to and returns control to .*, which gives up one character (the "). Control then passes to the " again, and this time the match succeeds.

The whole regular expression has now matched, and the result is "Regex" (including the quotes). The matching process backtracked once.

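Step  Part of ".*" in control  Text matched so far
1  "  "
2  .*  "R
3  .*  "Re
4  .*  "Reg
5  .*  "Rege
6  .*  "Regex
7  .*  "Regex"
8  " (fails at the end of the string)  "Regex"
9  .* (backtracks and gives up the ")  "Regex
10  "  "Regex"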

The backtracking trap

The following example can drive your browser's CPU usage to 100% because of excessive backtracking.

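console.time('reg');
// Nested quantifiers: (a*)* can split a run of a's in a huge number of ways
var reg = /(a*)*b/;

// 28 a's and no b, so every attempt ultimately fails
var str = 'a'.repeat(28); // aaaaaaaaaaaaa...

reg.exec(str);
console.timeEnd('reg');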

Let's take a brief look at how regex engines are implemented. They fall mainly into two families: DFA and NFA.

The DFA and the NFA

Cause analysis

  1. Because a* is greedy, it can match the entire string in one go. The b that follows then fails to match, so the engine has to backtrack. No amount of backtracking can make the match succeed, but an NFA engine works mechanically and keeps backtracking through every alternative. Because (a*)* is effectively one quantifier nested inside another, the number of alternatives to try grows exponentially with the length of the string.

Because the engine is a finite state machine, it does not loop forever. It just consumes a lot of CPU and finishes after a certain amount of time.
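The growth can be observed on short inputs without freezing the page. The sketch below times the nested pattern against a flattened rewrite, /a*b/, which is offered here only as one possible alternative that accepts the same strings. timeExec is a hypothetical helper, and the actual timings depend on the engine and machine, so no specific numbers are claimed.

```javascript
// Hypothetical helper: run exec once and report the elapsed milliseconds
function timeExec(re, input) {
  const start = Date.now();
  const result = re.exec(input);
  return { result, ms: Date.now() - start };
}

const nested = /(a*)*b/; // a quantifier nested inside a quantifier
const flat = /a*b/;      // same matches, no nesting

// Inputs with no "b" force the engine to exhaust every alternative
for (const n of [10, 14, 18]) {
  const input = 'a'.repeat(n);
  const slow = timeExec(nested, input);
  const fast = timeExec(flat, input);
  console.log(`n=${n} nested: ${slow.ms} ms, flat: ${fast.ms} ms`);
}
```

Keeping the input lengths small is deliberate: each extra a roughly doubles the work for the nested pattern, so the trend shows up well before the 28 characters used in the example above.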

Tools

Online sites:

regex101.com/

regexr.com/

Paid software:

www.regexbuddy.com/buynow.html…