Regular expressions are patterns that match either characters or positions.

Basic syntax

Character Meaning
\ Escape
Matching positions:
^ Matches the start of the input
$ Matches the end of the input
\b Matches a word boundary: a position with a \w character on one side and a non-\w character (or the start or end of the input) on the other
x(?=y) Positive lookahead: matches x only if it is immediately followed by y
x(?!y) Negative lookahead: matches x only if it is not followed by y
Matching characters:
\w Letters, digits, and underscores
\d Digits
\s Whitespace characters, including spaces, tabs, form feeds, and line feeds
[xyz] Matches any one of the characters inside []
x|y Matches x or y
. Matches any single character except a newline by default
Quantifiers:
* Matches the previous expression 0 or more times
? Matches the previous expression 0 or 1 times
+ Matches the previous expression 1 or more times
{n} Matches the previous expression exactly n times
{n,} Matches the previous expression at least n times

Quantifiers are greedy by default, that is, they match as many characters as possible.

A ? placed after a quantifier makes it lazy, so it matches as few characters as possible.
Flags:
g Global matching
i Case-insensitive matching
m Multiline mode: ^ and $ also match at the start and end of each line
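
A few entries from the table tried out in JavaScript. This is only an illustrative sketch; the sample strings are made up for the example.

```javascript
// Lookahead: match "x" only when it is (or is not) followed by "y".
const positive = 'xy xz'.match(/x(?=y)/g); // ["x"], the x at index 0
const negative = 'xy xz'.match(/x(?!y)/g); // ["x"], the x at index 3

// \b is a position, not a character: it sits between \w and non-\w.
const words = 'cat concat'.match(/\bcat\b/g); // ["cat"], "concat" is skipped

// Greedy (default) versus lazy (quantifier followed by ?).
const greedy = '<a><b>'.match(/<.+>/)[0]; // "<a><b>"
const lazy = '<a><b>'.match(/<.+?>/)[0];  // "<a>"

// The m flag lets ^ and $ match at every line, not only at the input edges.
const lines = 'one\ntwo'.match(/^\w+$/gm); // ["one", "two"]

console.log(positive, negative, words, greedy, lazy, lines);
```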

Basic method

Create a regular expression

var reg = /\w+/; // Literal
// is equivalent to
var reg = new RegExp('\\w+'); // String argument to the constructor

Note: When you create a regular expression with the constructor, the pattern is an ordinary string, so string escaping rules apply: every backslash in the pattern must itself be escaped with a second backslash.

RegExp methods
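
The escaping rule can be checked through the source property. A minimal sketch; the unit variable is a hypothetical runtime value used only to show why the constructor form is useful.

```javascript
// Literal and constructor forms of the same pattern.
const literal = /\d+\.\d+/;
const built = new RegExp('\\d+\\.\\d+');
console.log(literal.source === built.source); // true

// With a single backslash, the string literal drops it before RegExp sees it.
const wrong = new RegExp('\d+');
console.log(wrong.source);      // d+
console.log(wrong.test('123')); // false
console.log(wrong.test('ddd')); // true

// The constructor helps when part of the pattern is only known at runtime.
const unit = 'px'; // hypothetical runtime value
const sized = new RegExp('^\\d+' + unit + '$');
console.log(sized.test('12px')); // true
```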

  • test

    /a/.test('a'); // true
    /a/.test('b'); // false
  • exec

    A RegExp with the g flag is a stateful object: calling exec repeatedly on the same string iterates over all of the matches.

    exec returns more than test does. On a successful match it returns an array and updates the lastIndex property; when there is no match it returns null.

    The returned array carries extra properties. The first item is the matched string, followed by the captured groups; the array also has the groups, index, and input properties.

    var reg = /(a|d)/g;
    var target = 'abcd';
    var arr;
    while(arr = reg.exec(target)) {
        console.log(arr);
    }
    console.log(arr);
    
    // Array of length 2
    // ["a", "a", index: 0, input: "abcd", groups: undefined]
    // ["d", "d", index: 3, input: "abcd", groups: undefined]
    // null
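
The same state affects test, which is a common source of surprises. A small sketch of how lastIndex moves between calls on a global regular expression (the sample string is made up):

```javascript
const reg = /a/g;
const target = 'aba';

// Record the result of each call together with lastIndex afterwards.
const seen = [];
for (let i = 0; i < 4; i++) {
  const ok = reg.test(target);
  seen.push([ok, reg.lastIndex]);
}
console.log(seen);
// [[true, 1], [true, 3], [false, 0], [true, 1]]

// Without the g flag there is no state: every call starts from index 0.
const plain = /a/;
console.log(plain.test(target), plain.test(target)); // true true
```

The third call fails because the search starts at index 3, and the failure resets lastIndex to 0, so the fourth call succeeds again.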

String object methods

  • split

    var values = '1,2,3,4,5'; // Values separated by Chinese or English commas
    var arr = values.split(/,|，/); // ['1', '2', '3', '4', '5']
  • match

    • If the regular expression has the g flag, match returns all matches, but not the capture groups.

    • Otherwise, it returns the first match with its capture groups, as well as the index, input, and groups properties, much like exec. If there is no match, null is returned.

    const str = 'Hello World';
    console.log(str.match(/[A-Z]/)); // ["H", index: 0, input: "Hello World", groups: undefined]
    console.log(str.match(/[A-Z]/g)); // ["H", "W"]
  • replace

    console.log('abcd'.replace(/ab/, 'AB')); // ABcd
  • search

    Returns the index of the first match, or -1 if there is no match.

    console.log('hello World'.search(/[A-Z]/)); // 6
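
Since match with the g flag drops the capture groups, matchAll (available since ES2020) is one way to get every match together with its groups. A minimal sketch with a made-up input string:

```javascript
const str = '2021-06-01 and 2022-12-31';
const re = /(\d{4})-(\d{2})-(\d{2})/g; // matchAll requires the g flag

// match with g: whole matches only, the groups are lost.
console.log(str.match(re)); // ["2021-06-01", "2022-12-31"]

// matchAll: one exec-style result per match, groups and index included.
const years = [];
for (const m of str.matchAll(re)) {
  years.push({ year: m[1], index: m.index });
}
console.log(years);
// [{ year: "2021", index: 0 }, { year: "2022", index: 15 }]
```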

Grouping and capture

Referencing groups

In regular expressions, parentheses make the structure clearer, but they also have a more important function: grouping and capturing. The text matched by a group can be extracted or used in a replacement. These features are used together with the related APIs, such as exec and match.

Suppose we want to match a year, month and day with a regular expression like this:

var reg = /\d{4}-\d{2}-\d{2}/;
reg.exec('2021-06-01'); // ["2021-06-01", index: 0, input: "2021-06-01", groups: undefined]

It matches, but what if we want to extract the year, month, and day separately? Consider the following regular expression:

var reg = /(\d{4})-(\d{2})-(\d{2})/;
reg.exec('2021-06-01'); // ["2021-06-01", "2021", "06", "01", index: 0, input: "2021-06-01", groups: undefined]

Starting with the second item, each item is the text matched by one of the three pairs of parentheses. These are the capture groups. After a successful match they can be read from the result of exec, and also from the legacy static properties RegExp.$1 through RegExp.$9.

console.log(RegExp.$1); // 2021
console.log(RegExp.$2); // 06
console.log(RegExp.$3); // 01
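
The captured groups can also be used without the static properties. The sketch below shows a few common ways; the output formats are arbitrary choices for the example, and the named-group variant is an addition that fills the groups property shown as undefined in the output above.

```javascript
const reg = /(\d{4})-(\d{2})-(\d{2})/;

// Destructure the exec result: skip the full match, take the three groups.
const [, year, month, day] = reg.exec('2021-06-01');
console.log(year, month, day); // 2021 06 01

// In replace, $1 to $9 in the replacement string refer to the groups.
const us = '2021-06-01'.replace(reg, '$2/$3/$1');
console.log(us); // 06/01/2021

// A replacer function receives the match and the groups as arguments.
const short = '2021-06-01'.replace(reg, (match, y, m, d) => `${d}.${m}.${y.slice(2)}`);
console.log(short); // 01.06.21

// Named groups populate the groups property of the result.
const named = /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/.exec('2021-06-01');
console.log(named.groups.y, named.groups.m, named.groups.d); // 2021 06 01
```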

Backreferences

Besides referencing groups through the API as above, a regular expression can refer to one of its own earlier groups from inside the pattern. This is called a backreference.

Consider the scenario where we want to match dates in three formats: YYYY-MM-DD, YYYY/MM/DD, and YYYY.MM.DD. We might write the regular expression like this:

var reg = /\d{4}(-|\/|\.)\d{2}(-|\/|\.)\d{2}/;

While it does match all three formats, this regular expression also matches mixed dates like '2021-06/02'. How do we keep the separators consistent? This is where backreferences come in.

var reg = /\d{4}(-|\/|\.)\d{2}\1\d{2}/;

Here \1 refers to the first group; further groups are referenced as \2, \3, and so on. Whatever the referenced group matched, the backreference matches that same text.

With nested parentheses, groups are numbered by the position of their opening parenthesis, from left to right.

Non-capturing groups

As in the example above, wrapping the separator alternatives in parentheses creates a group. When capturing, however, we do not care about the separator, only about the year, month, and day, so we can use a non-capturing group, (?:...):
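
A quick check of both points, as a sketch: the first part compares the date pattern with and without the backreference, and the second shows the numbering of nested groups. The names loose, strict, and nested are chosen only for this example.

```javascript
const loose = /\d{4}(-|\/|\.)\d{2}(-|\/|\.)\d{2}/;
const strict = /\d{4}(-|\/|\.)\d{2}\1\d{2}/;

console.log(loose.test('2021-06/02'));  // true, mixed separators slip through
console.log(strict.test('2021-06/02')); // false
console.log(strict.test('2021/06/02')); // true
console.log(strict.test('2021.06.02')); // true

// Nested groups are numbered by their opening parenthesis, left to right.
const nested = /((\d{4})-(\d{2}))-(\d{2})/.exec('2021-06-01');
console.log(nested.slice(1)); // ["2021-06", "2021", "06", "01"]
```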

var reg = /(\d{4})(?:-|\.|\/)(\d{2})(?:-|\.|\/)(\d{2})/;
console.log(reg.exec('2021-06-01')); // ["2021-06-01", "2021", "06", "01", index: 0, input: "2021-06-01", groups: undefined]

In this way, only the year, month, and day are captured.

Building regular expressions

Before building a regular expression, consider a few questions:

  • Whether a regular expression is needed at all

    Simple problems that the string APIs can solve do not need a regular expression.

  • Whether it is necessary to build one complex regular expression

    For password validation with many rules, you could build one large regular expression, but you can also use several small ones:

    function checkPassword(string) {
      if (!regex1.test(string)) return false;
      if (!regex2.test(string)) return false;
      if (!regex3.test(string)) return false;
      // ...
    }

    Regular expressions are matched by backtracking, so an overly complex pattern can hurt performance.
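
A runnable sketch of the two approaches. The four rules are hypothetical, chosen only for this example, and the combined pattern is one possible way to express them with lookaheads.

```javascript
// Hypothetical rules, chosen only for this example.
const rules = [
  /^.{8,}$/, // at least 8 characters
  /\d/,      // at least one digit
  /[a-z]/,   // at least one lowercase letter
  /[A-Z]/,   // at least one uppercase letter
];

// Many small expressions: each rule is easy to read and to change.
function checkPassword(string) {
  return rules.every((rule) => rule.test(string));
}

// The same rules as one large expression, using lookaheads.
const combined = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/;

console.log(checkPassword('Passw0rdX'), combined.test('Passw0rdX')); // true true
console.log(checkPassword('password'), combined.test('password'));   // false false
```

The small rules also make it possible to report which requirement failed, which the single expression cannot do.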

When building regular expressions, also note:

  • Matches the expected string
  • Does not match unexpected strings
  • Efficiency optimizations:
    • Use specific character classes instead of the wildcard
    • Use non-capturing groups
    • Factor out characters that are certain to appear
    • Extract the common part of the branches
    • Reduce the number of branches
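
Two of these optimizations, sketched on made-up patterns. Note that replacing the wildcard with a specific character class also changes what is matched, not only how much backtracking is needed.

```javascript
// Extract the common part of the branches and reduce their number.
const before = /^(?:abc|abd)$/;
const after = /^ab[cd]$/;
const samples = ['abc', 'abd', 'abe', 'ab'];
console.log(samples.map((s) => before.test(s))); // [true, true, false, false]
console.log(samples.map((s) => after.test(s)));  // [true, true, false, false]

// A specific character class instead of the wildcard: [^"]* cannot run
// past a quote, while the greedy .* runs to the end and then backtracks.
const wildcard = /"(.*)"/;
const specific = /"([^"]*)"/;
const text = 'say "hi" and "bye"';
console.log(wildcard.exec(text)[1]); // hi" and "bye
console.log(specific.exec(text)[1]); // hi
```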

Usage scenarios

Form validation/extraction

Verify that an input value matches a pattern with the match method: it returns null on failure and, on success, an array that carries the captured groups.

Splitting
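
A minimal sketch of validating and extracting in one step. parseDate is a hypothetical helper, and it checks only the shape of the input, not whether the date exists in the calendar.

```javascript
// Hypothetical helper: validate a YYYY-MM-DD field and extract its parts.
function parseDate(input) {
  // ^ and $ make sure the whole value matches, not just a part of it.
  const m = input.match(/^(\d{4})-(\d{2})-(\d{2})$/);
  if (m === null) return null; // validation failed
  const [, year, month, day] = m;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseDate('2021-06-01'));  // { year: 2021, month: 6, day: 1 }
console.log(parseDate('2021-6-1'));    // null
console.log(parseDate('x2021-06-01')); // null
```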

Extract data from a comma-separated string, compatible with Chinese and English commas.

const arr = value.split(/,|，/);

To be supplemented…
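
Real input often has spaces around the separators. One possible variant, assuming surrounding whitespace should be discarded (the sample value is made up):

```javascript
const value = ' 1, 2，3 ,4， 5 ';

// Trim the ends, then treat optional whitespace as part of the separator.
const arr = value.trim().split(/\s*[,，]\s*/);
console.log(arr); // ["1", "2", "3", "4", "5"]
```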

A site for testing regular expressions: regex101.com/

