Regular expression usage manual

Regular expressions (regex or regexp) are useful for extracting information from any text by finding one or more matches of a specific search pattern (that is, a specific sequence of ASCII or Unicode characters).

Typical applications include parsing, string replacement, data format conversion, and web scraping.

Conveniently, once you learn the syntax, you can use this tool in almost any programming language (JavaScript, Java, VB, C#, C/C++, Python, Perl, Ruby, Delphi, R, Tcl, and many others), with only minor differences between dialects.

Let's look at some examples with explanations.

Basic syntax

Boundary matching – ^ and $

^The matches any string that starts with The
end$ matches any string that ends with end
^The end$ matches the exact string The end (it must both start and end there)
roar matches any string that contains roar
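A quick way to try these anchors is RegExp.prototype.test() in JavaScript. This is a minimal sketch; the sample strings are made up for illustration.

```javascript
// Anchors: ^ matches the start of the input, $ matches the end.
const startsWithThe = /^The/;
const endsWithEnd = /end$/;
const exactlyTheEnd = /^The end$/;
const containsRoar = /roar/; // no anchors: can match anywhere

console.log(startsWithThe.test("The end"));         // true
console.log(startsWithThe.test("This is the end")); // false ("Thi" is not "The")
console.log(endsWithEnd.test("This is the end"));   // true
console.log(exactlyTheEnd.test("The end"));         // true
console.log(exactlyTheEnd.test("The end!"));        // false (extra "!" before the end)
console.log(containsRoar.test("Lions roar"));       // true
```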

Quantifiers – * + ? and {}

abc* matches ab followed by zero or more c [0, +∞]
abc+ matches ab followed by one or more c [1, +∞]
abc? matches ab followed by zero or one c [0, 1]
abc{2} matches ab followed by exactly 2 c
abc{2,} matches ab followed by 2 or more c
abc{2,5} matches ab followed by 2 to 5 c
a(bc)* matches a followed by zero or more copies of the sequence bc
a(bc){2,5} matches a followed by 2 to 5 copies of the sequence bc
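To see how much text each quantifier actually consumes, the sketch below prints the first match with String.prototype.match(). The helper name first and the inputs are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: returns the first match of `re` in `s`, or null.
const first = (re, s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

console.log(first(/abc*/, "abccc"));         // abccc (three c)
console.log(first(/abc*/, "ab"));            // ab (zero c is allowed)
console.log(first(/abc+/, "ab"));            // null (needs at least one c)
console.log(first(/abc?/, "abccc"));         // abc (at most one c)
console.log(first(/abc{2}/, "abccc"));       // abcc (exactly two c)
console.log(first(/abc{2,5}/, "abccccccc")); // abccccc (stops at five c)
console.log(first(/a(bc)*/, "abcbcbc"));     // abcbcbc (bc repeated)
console.log(first(/a(bc){2,5}/, "abc"));     // null (needs at least two bc)
```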

Or – | and []

a(b|c) matches a followed by b or c
a[bc] same as the previous one
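Both forms accept the same strings; the practical difference is that the parenthesized version also creates a capturing group. A small JavaScript sketch with made-up inputs:

```javascript
const alternation = /a(b|c)/; // captures the chosen alternative
const bracket = /a[bc]/;      // single character class, no capture

for (const s of ["ab", "ac", "ad"]) {
  console.log(s, alternation.test(s), bracket.test(s));
}
// ab true true
// ac true true
// ad false false

console.log("ac".match(alternation)[1]); // c (the captured alternative)
```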

Character classes – \d \w \s and .

\d matches a single digit character
\w matches a word character (in JavaScript, an ASCII letter, digit, or underscore)
\s matches a whitespace character (including tabs and line breaks such as \n)
. matches any character except line terminators such as newline \n
[\s\S] matches any character, including newlines

Use the . metacharacter with caution: character classes and negated character classes are usually faster and more precise.

\d, \w and \s have negated counterparts: \D, \W and \S respectively.

For example, \D matches a single character that is not a digit.
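The classes and their negations can be compared side by side on one string. This sketch uses the g flag to collect every match and JSON.stringify so that the tab character is visible in the output; the sample text is invented.

```javascript
const text = "Order #42\tshipped";
const show = (x) => console.log(JSON.stringify(x));

show(text.match(/\d/g));        // ["4","2"]
show(text.match(/\s/g));        // [" ","\t"]
show(text.match(/\W/g));        // [" ","#","\t"]
show(text.replace(/\D/g, ""));  // "42" (remove every non-digit)

// . does not cross a newline, but [\s\S] matches any character.
console.log(/a.b/.test("a\nb"));      // false
console.log(/a[\s\S]b/.test("a\nb")); // true
```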

To match the characters ^ . [ $ ( ) | * + ? { \ literally, you must escape them with a backslash, because they have special meaning.

\$\d matches a $ followed by one digit

Note that you can also match non-printable characters such as tab \t, newline \n, and carriage return \r.
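The effect of escaping, and of matching non-printable characters, is easy to check in JavaScript. A minimal sketch with hypothetical inputs:

```javascript
// Unescaped, $ is an end-of-input anchor; \$ matches a literal dollar sign.
console.log(/\$\d/.test("Price: $5")); // true
console.log(/$\d/.test("Price: $5"));  // false ($ asserts the end, nothing can follow)

// An unescaped dot matches any character; \. matches only a dot.
console.log(/^1.5$/.test("1x5"));  // true
console.log(/^1\.5$/.test("1x5")); // false

// Non-printable characters: split on tabs, and on \r\n or \n line endings.
console.log(JSON.stringify("a\tb".split(/\t/)));            // ["a","b"]
console.log(JSON.stringify("l1\r\nl2\nl3".split(/\r?\n/))); // ["l1","l2","l3"]
```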

Modifiers (flags)

We have been learning how to write regular expressions, but we have left out one basic concept: modifiers, also called flags.

In JavaScript, a regular expression literal has the form /abc/, where the pattern is delimited by two slashes. After the closing slash you can add one or more of the following flags, alone or in combination:

g (global): does not stop after the first match; each search continues from where the previous match ended, so all matches are found (global matching).
m (multiline): when enabled, ^ and $ match the start and end of each line instead of the whole string.
i (insensitive): makes the whole expression case-insensitive (for example, /aBc/i matches AbC).
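The three flags can be seen together on a three-line sample string. This is a minimal sketch; the flags, global, ignoreCase and multiline properties are standard RegExp properties.

```javascript
const text = "Cat\ncat\nCAT";

// i: case-insensitive; g: return every match instead of only the first.
console.log(JSON.stringify(text.match(/cat/gi))); // ["Cat","cat","CAT"]
console.log(JSON.stringify(text.match(/cat/g)));  // ["cat"]

// m: ^ and $ apply to each line; without m they apply to the whole string.
console.log(JSON.stringify(text.match(/^\w+$/gm))); // ["Cat","cat","CAT"]
console.log(text.match(/^\w+$/g));                  // null

// The flags can be inspected on the RegExp object.
const re = /cat/gim;
console.log(re.flags, re.global, re.ignoreCase, re.multiline); // gim true true true
```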

New in ES6 and later

y (sticky): like g, the y flag performs a global match in which each search starts right after the previous match. The difference is that g succeeds as long as a match exists anywhere in the remaining text, while y requires the match to begin exactly at the first remaining position; this is what "sticky" means.
```
var s = 'aaa_aa_a';

var r1 = /a+/g;
var r2 = /a+/y;

r1.exec(s); // ['aaa']
r2.exec(s); // ['aaa']

r1.exec(s); // ['aa']
r2.exec(s); // null

var r = /a+_/y;

// try again
r.exec(s); // ['aaa_']
r.exec(s); // ['aa_']

r.sticky // true
```
u (unicode): enables "Unicode" mode, which correctly handles characters above \uFFFF, that is, characters that UTF-16 encodes as four bytes (a surrogate pair).
```
var s = '𠮷';

/^.$/.test(s); // false
/^.$/u.test(s); // true
```
s (dotAll, added in ES2018): normally the dot (.) matches any character except line terminators (\n, \r, the line separator U+2028 and the paragraph separator U+2029). With the s flag, the dot matches every character, including line terminators; this is called dotAll mode.
```
var s = 'foo\nbar';

/foo.bar/.test(s); // false
/foo.bar/s.test(s); // true
```
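The sketch below checks each line terminator in turn and compares the s flag with the older [\s\S] workaround. It assumes an ES2018+ runtime (for example, a current Node.js), since both the s flag and the dotAll property were added in ES2018.

```javascript
// Line terminators that "." skips without the s flag.
const terminators = ["\n", "\r", "\u2028", "\u2029"];

for (const t of terminators) {
  const s = "a" + t + "b";
  // Without s the dot fails; with s, or with [\s\S], it succeeds.
  console.log(/a.b/.test(s), /a.b/s.test(s), /a[\s\S]b/.test(s)); // false true true
}

// The dotAll property reports whether the s flag is set.
console.log(/./s.dotAll, /./.dotAll); // true false
```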

Intermediate syntax

Grouping and capturing — ()

a(bc) parentheses create a capturing group with the value bc
a(?:bc)* using ?: disables capturing (a non-capturing group)

To understand ?:, you need to understand capturing and non-capturing groups:

(...) creates a capturing group that saves the text matched by each group; the content of the nth group can be referenced as $n (where n is the group number).

(?:...) creates a non-capturing group. The only difference is that the text it matches is not saved.
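A small JavaScript sketch contrasting the two kinds of group; the date string is a made-up sample.

```javascript
const date = "2024-05-17";

// Capturing groups: their text is saved and numbered from 1.
const captured = date.match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(captured[1], captured[2], captured[3]); // 2024 05 17

// Both patterns match the same text, but only (bc) keeps an extra slot.
console.log("abcbc".match(/a(bc)*/).length);  // 2 (full match + group 1)
console.log("abcbc".match(/a(?:bc)*/).length); // 1 (full match only)

// Numbered groups are available as $n in replacement strings.
console.log(date.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1")); // 05/17/2024
```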

New in ES2018

a(?<foo>bc) using ?<foo> gives the capturing group a name

If you name a capturing group (using ?<name>), you can look up its value in the groups property of the match result, using the group name as the key.

This is useful when extracting information from strings or data. When a pattern has several capturing groups, you can access their values by index ($n) and, for named groups, also by name through groups.

```
const string = '1999-12-31';
const matchObj = string.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
// ["1999-12-31", "1999", "12", "31", index: 0, input: "1999-12-31", groups: {day: "31", month: "12", year: "1999"}]
matchObj.groups.year; // '1999'

const newStr = string.replace(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/, '$<day>/$<month>/$<year>');
// 31/12/1999

const newStr2 = string.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1');
// 31/12/1999

// Named groups can also be referenced inside the pattern itself with \k<group name>
const RE_TWICE = /^(?<word>[a-z]+)!\k<word>$/;
RE_TWICE.test('abc!abc'); // true
RE_TWICE.test('abc!ab'); // false
```

Brackets — []

[abc] matches a single a, b or c (equivalent to a|b|c)
[a-c] same as the previous one
[a-fA-F0-9] matches a single hexadecimal digit, case-insensitively
[0-9]% matches a digit from 0 to 9 followed by a % sign
[^a-zA-Z] matches a character that is not a letter from a to z or from A to Z. Here ^ negates the bracket expression.

Note that inside a bracket expression most special characters (such as . * + ? ( ) | $) lose their special meaning, so they do not need escaping. In JavaScript the exceptions are the backslash \ (still an escape, as in [\s\S]), ] (closes the expression), ^ at the start (negation) and - between two characters (a range).
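The bracket expressions above can be tried directly in JavaScript, where most metacharacters are literal inside brackets while the backslash still escapes. A minimal sketch with invented inputs:

```javascript
const show = (x) => console.log(JSON.stringify(x));

show("a1 B2 c3".match(/[abc]/g));           // ["a","c"] (B is uppercase)
show("ff 1A zz 09".match(/[a-fA-F0-9]+/g)); // ["ff","1A","09"]
show("50% or 7%".match(/[0-9]%/g));         // ["0%","7%"] (one digit before %)
show("R2-D2!".match(/[^a-zA-Z]/g));         // ["2","-","2","!"]

// Inside brackets, . and * are literal, so no escaping is needed...
show("a.b*c".match(/[.*]/g));               // [".","*"]
// ...but \ still escapes: \] matches a literal ].
show("x]y".match(/[\]]/g));                 // ["]"]
```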

Greedy and lazy matching

The quantifiers * + {} are greedy: they extend the match as far as the provided text allows.

For example, when <.+> is used to match <div>simple div</div>, it returns the entire text <div>simple div</div> rather than just the first tag.

To match only one div tag, add ? after the quantifier to make it lazy:

<.+?> matches any character one or more times between < and >, expanding only as needed

Better still, avoid . altogether in favor of a stricter pattern:

<[^<>]+> matches one or more characters other than < and > between < and >
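The three patterns can be compared on one sample sentence (made up for illustration) to see where each match stops.

```javascript
const html = "This is a <div>simple div</div> test";

console.log(html.match(/<.+>/)[0]);     // <div>simple div</div> (greedy)
console.log(html.match(/<.+?>/)[0]);    // <div> (lazy)
console.log(html.match(/<[^<>]+>/)[0]); // <div> (negated class)

// With g, the negated class finds every tag.
console.log(JSON.stringify(html.match(/<[^<>]+>/g))); // ["<div>","</div>"]
```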

Advanced syntax

Boundaries – \b and \B

\babc\b performs a "whole words only" match: abc must not be preceded or followed by a word character

\b marks a word boundary, a position (like ^ and $) where one side is a word character (\w) and the other side is not, for example the start of the string or a space (as in \b123).

It also has a negation, \B, which matches every position where \b does not. It is useful when the pattern must be completely surrounded by word characters.

\Babc\B matches only if abc is surrounded by word characters on both sides
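Printing the start index of each match makes the difference between \b and \B visible. This sketch uses String.prototype.matchAll (ES2020, requires the g flag); indexesOf is a hypothetical helper.

```javascript
// Hypothetical helper: start index of every match of re (must have g) in s.
const indexesOf = (re, s) => [...s.matchAll(re)].map((m) => m.index);

const text = "abc xabcx abc_ abc";

console.log(JSON.stringify(indexesOf(/abc/g, text)));     // [0,5,10,15]
console.log(JSON.stringify(indexesOf(/\babc\b/g, text))); // [0,15] (_ is a word character)
console.log(JSON.stringify(indexesOf(/\Babc\B/g, text))); // [5] (inside xabcx)
```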

Back-references – \1

([abc])\1 using \1 matches the same text that was matched by the first capturing group
([abc])([de])\2\1 likewise, \2 (\3, \4, and so on) matches the same text that was matched by the second (third, fourth, and so on) capturing group
(?<foo>[abc])\k<foo> we give the group the name foo and reference it later with \k<foo>; the result is the same as the first regex
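A back-reference must repeat the exact captured text, not just any character from the same class. The sketch below shows this, plus one common practical use (finding a doubled word); the inputs are invented.

```javascript
// \1 repeats the exact text captured by group 1.
console.log(/^([abc])\1$/.test("aa"));      // true
console.log(/^([abc])\1$/.test("ab"));      // false
console.log(/^([abc])([abc])$/.test("ab")); // true (two independent classes)

// \2\1 mirrors the two groups in reverse order.
console.log(/^([abc])([de])\2\1$/.test("adda")); // true
console.log(/^([abc])([de])\2\1$/.test("adea")); // false

// Named version: \k<foo> refers back to the group named foo.
console.log(/^(?<foo>[abc])\k<foo>$/.test("cc")); // true

// Practical use: find a doubled word.
console.log("this is the the end".match(/\b(\w+)\s+\1\b/)[0]); // the the
```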

Lookahead and lookbehind – (?=) and (?<=)

Note that lookbehind is an ES2018 feature, so older browsers (for example, Firefox before version 78) do not support it.

d(?=r) matches d only if it is followed by r, but r will not be part of the overall match
(?<=r)d matches d only if it is preceded by r, but r will not be part of the overall match

You can also use the negation operator:

d(?!r) matches d only if it is not followed by r, but r will not be part of the overall match
(?<!r)d matches d only if it is not preceded by r, but r will not be part of the overall match
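The four assertions can be compared by listing where d matches in one sample string. This sketch assumes an ES2020+ runtime (matchAll, lookbehind); indexesOf is a hypothetical helper.

```javascript
// Hypothetical helper: start index of every match of re (must have g) in s.
const indexesOf = (re, s) => [...s.matchAll(re)].map((m) => m.index);
const text = "dr dx rd xd"; // d at indexes 0, 3, 7, 10

console.log(JSON.stringify(indexesOf(/d(?=r)/g, text)));  // [0] (followed by r)
console.log(JSON.stringify(indexesOf(/d(?!r)/g, text)));  // [3,7,10]
console.log(JSON.stringify(indexesOf(/(?<=r)d/g, text))); // [7] (preceded by r)
console.log(JSON.stringify(indexesOf(/(?<!r)d/g, text))); // [0,3,10]

// The asserted character is not part of the match.
console.log("dr".match(/d(?=r)/)[0]); // d
```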

Usage

Note: in the table below, regexp is a RegExp instance and str is a String.

| Method | Description | Return value |
| --- | --- | --- |
| `regexp.test(str)` | Checks whether `str` contains a match for `regexp` | `true` if it does, otherwise `false` |
| `regexp.exec(str)` | Runs `regexp` against `str` | An array describing the match, or `null` if there is none. Unlike `str.match` with the `g` flag, it always includes the capture groups and the `index` and `input` properties |
| `str.match(regexp)` | Runs `regexp` against `str` | An array of match results, or `null` if there is none |
| `str.replace(regexp \| substr, newSubStr \| function)` | Matches `regexp` (or the string `substr`) against `str` and replaces the match with `newSubStr` or the return value of `function` | The new string after replacement |
| `str.search(regexp)` | Runs `regexp` against `str` | The index of the first match, or `-1` if there is none |
| `str.split(regexp)` | Splits `str` into an array, using `regexp` as the delimiter | The resulting array |
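One call of each method on the same sample string (invented for illustration) shows the return values described in the table.

```javascript
const str = "a1b22c333";
const re = /\d+/;   // no g flag
const reG = /\d+/g; // with g flag

console.log(re.test(str));                      // true
const m = re.exec(str);
console.log(m[0], m.index);                     // 1 1
console.log(JSON.stringify(str.match(reG)));    // ["1","22","333"]
console.log(str.replace(reG, "#"));             // a#b#c#
console.log(str.replace(re, "#"));              // a#b22c333 (first match only without g)
console.log(str.search(/c/));                   // 5
console.log(str.search(/z/));                   // -1
console.log(JSON.stringify(str.split(/\d+/)));  // ["a","b","c",""]

// With g, exec returns one match per call until it returns null.
const loopRe = /\d+/g;
const starts = [];
let found;
while ((found = loopRe.exec(str)) !== null) starts.push(found.index);
console.log(JSON.stringify(starts));            // [1,3,6]
```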

test/exec pitfalls

If the regular expression has the global flag g, calling test() updates its lastIndex property, and the next call to test() starts searching at lastIndex, even if you pass a different string (exec() updates lastIndex in the same way).

The following example shows this behavior:

```
const digits = /\d+/g;

digits.test("Hello world! 123"); // true
digits.test("321"); // false
digits.test("321"); // true
```

One workaround is to reset lastIndex before each call (or to drop the g flag when test() only needs a yes/no answer):

```
const digits = /\d+/g;

digits.test("Hello world! 123"); // true

digits.lastIndex = 0;
digits.test("321"); // true

digits.lastIndex = 0;
digits.test("321"); // true
```

For details, please refer to MDN

replace

Syntax

```
str.replace(regexp|substr, newSubStr|function)
```

Parameters
- regexp (pattern)
  
  A RegExp object or literal. The text it matches is replaced by the second argument (or by its return value, if it is a function).
- substr (pattern)
  
  A string to be replaced by newSubStr. It is treated as a literal string, not as a regular expression, and only its first occurrence is replaced.
- newSubStr (replacement)
  
  The string that replaces the matched part of the original string. It may contain special patterns such as $& (the whole match) and $1, $2, ... (capturing groups).
- function(a, b, c, d) (replacement)
  
  A function that produces the replacement; its return value replaces the match. With exactly one capturing group, its arguments are:
  - a: the matched substring
  - b: the text matched by the capturing group
    
    If the pattern has no capturing group, this argument is absent. If it has several, one argument per group is passed in order (b, c, d, e, ...), which shifts the remaining arguments. If a group is repeated, for example (\d)+, its argument holds the match from the last repetition.
  - c: the index (offset) of the match in the original string
  - d: the original string
    
    When the pattern has no named groups, the last two arguments are always the match offset and the original string; with named groups, a groups object is passed after them.
If you are still unsure how replace works, see the examples in the appendix below.
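To see the callback parameters listed above in action, the sketch below logs them for a pattern with two capturing groups, then checks the "last repetition" rule and the trailing groups object. The key=value input format is a hypothetical example.

```javascript
// Hypothetical input: turn "key=value" pairs into "KEY: value".
const input = "a=1;bb=22";

const result = input.replace(/(\w+)=(\w+)/g, (match, key, value, offset, whole) => {
  // match: whole match; key/value: groups 1 and 2; offset: start index; whole: original string
  console.log(match, key, value, offset, whole === input);
  return key.toUpperCase() + ": " + value;
});
// logs "a=1 a 1 0 true", then "bb=22 bb 22 4 true"
console.log(result); // A: 1;BB: 22

// A repeated group only keeps its last repetition.
let lastDigit;
"123".replace(/(\d)+/, (m, d) => {
  lastDigit = d;
  return m;
});
console.log(lastDigit); // 3

// With named groups, the groups object is the last argument.
const swapped = "2024-05".replace(/(?<y>\d{4})-(?<m>\d{2})/, (...args) => {
  const groups = args[args.length - 1];
  return groups.m + "/" + groups.y;
});
console.log(swapped); // 05/2024
```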

Conclusion

As you can see, regular expressions have many applications, and you have probably come across them at least once in your career as a developer. Here is a list of common uses:

Data validation (for example, checking that a time string is well formed)
Data scraping (especially web scraping: finding all pages that contain a certain set of words and then ordering them in a specific way)
Data wrangling (converting data from a "raw" format into another format)
String parsing (for example, capturing all URL GET parameters, or capturing the text inside a set of parentheses)
String replacement (for example, converting a Java or C# class into the corresponding JSON object with an ordinary IDE during a coding session: replace ";" with ",", convert to lowercase, drop the type declarations, and so on)
Syntax highlighting, file renaming, packet sniffing, and many other applications involving strings (where the data need not be textual)


Appendix: Replace example

Fill in the two blanks below:

```
// define
(function (window) {
    function fn(str) {
        this.str = str;
    }

    fn.prototype.format = function () {
        var arg = ____;

        return this.str.replace(____, function (a, b) {
            return arg[b] || ' ';
        });
    };

    window.fn = fn;
})(window);

// use
(function () {
    var t = new fn('<p><a href="{0}">{1}</a><span>{2}</span></p>');
    console.log(t.format('http://www.yonyou.com', 'yonyou', 'Welcome'));
})();
// If you understand how replace works, this one is easy.
```
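One possible way to fill the blanks is arguments (the values passed to format) and /\{(\d+)\}/g (the {n} placeholders, capturing n). That answer is an assumption, not given in the original; the sketch drops the browser-only window wrapper so it also runs in Node.

```javascript
function fn(str) {
  this.str = str;
}

fn.prototype.format = function () {
  // Blank 1 (assumed answer): the arguments object holds the values passed to format().
  var arg = arguments;

  // Blank 2 (assumed answer): match {0}, {1}, ... and capture the digits as b.
  return this.str.replace(/\{(\d+)\}/g, function (a, b) {
    return arg[b] || ' ';
  });
};

var t = new fn('<p><a href="{0}">{1}</a><span>{2}</span></p>');
console.log(t.format('http://www.yonyou.com', 'yonyou', 'Welcome'));
// <p><a href="http://www.yonyou.com">yonyou</a><span>Welcome</span></p>
```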

Use a regex to convert the integer 87654321 to the currency string $87,654,321:

```
'87654321'.replace(/(\d)+?(?=(\d{3})+(?!\d))/g, function (a, b, c, d) {
  // d is the match offset; only the first match (offset 0) gets the "$" prefix
  return d === 0 ? ("$" + a + ",") : (a + ",");
});
// $87,654,321

// Another variant, to deepen your understanding of replace and regex
'87654321'.replace(/(\d{1,3})(?=(\d{3})+$)/g, function (a, b, c, d) {
  return d === 0 ? ("$" + a + ",") : (a + ",");
});
// $87,654,321

'87654321'.replace(/\d{1,3}(?=(\d{3})+$)/g, '$&,'); // $& is the whole match; $1, $2, ... are capture groups
// 87,654,321
```
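A commonly used alternative (not from the original) inserts the commas at zero-width positions instead of rewriting digit groups: \B keeps a comma from appearing before the first digit, and the lookahead requires a multiple of three digits up to the end. The function name toCurrency is hypothetical, and the sketch assumes a non-negative integer input (a decimal part would also receive commas).

```javascript
// Hypothetical helper: formats a non-negative integer as a dollar amount.
function toCurrency(n) {
  // Insert "," at each non-boundary position followed by 3, 6, 9, ... digits.
  return "$" + String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

console.log(toCurrency(87654321)); // $87,654,321
console.log(toCurrency(1234567));  // $1,234,567
console.log(toCurrency(123));      // $123
```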

Password validation: at least six characters, including at least one uppercase letter, one lowercase letter, and one digit

```
/(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])^[0-9A-Za-z]{6,}$/.test('w44Y4S'); // true
// In the regex above, the three leading lookaheads all constrain the position matched by ^
/^.*(?=.{6,})(?=.*\d)(?=.*[A-Z])(?=.*[a-z])/.test('w44sYw'); // true
```
