Regular expressions have always been a sore spot for me. First, I never found a systematic way to learn them, and second, I always felt I would not need them very often. I’m sure most people who don’t know much about regular expressions feel the same way. But when the time comes to use them, they turn out to be very important. Fortunately, I found an open source book called JavaScript Regular Expressions, which is short but very systematic. I highly recommend reading it. 😝

In this article I systematically go through the main aspects of regular expressions, so that you get an overall understanding of them, can use them in day-to-day business development, and can read other people’s projects and source code without frowning.

Overview (illustration)

Matching

A regular expression matches either characters or positions

Matching characters

Fuzzy matching

Horizontal fuzziness

The length of a match is not fixed; it can vary from case to case

Quantifier: {m,n} means that the preceding character appears at least m times and at most n times.

  • In other words, the regex is flexible along the horizontal axis: the number of characters that satisfy the pattern can vary
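As a small illustration of horizontal fuzziness, the sketch below lets b repeat between 2 and 5 times. The sample string and variable names are made up for this example.

```javascript
// Horizontal fuzziness: "b" may repeat 2 to 5 times between "a" and "c".
const horizontalRegex = /ab{2,5}c/g;
const horizontalInput = 'abc abbc abbbc abbbbc abbbbbc abbbbbbc';

const horizontalMatches = horizontalInput.match(horizontalRegex);
console.log(horizontalMatches);
// => ['abbc', 'abbbc', 'abbbbc', 'abbbbbc']
// 'abc' has too few b's and 'abbbbbbc' has too many, so neither matches.
```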

Vertical fuzziness

At a given position, the matched character is not fixed; it can be any one of several possibilities

  • For example, with [abc] the character at that position can be a, b, or c
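To make vertical fuzziness concrete, here is a minimal sketch in which the middle character may be any one of three digits. The input string is an illustrative assumption.

```javascript
// Vertical fuzziness: the middle position accepts "1", "2", or "3".
const verticalRegex = /a[123]b/g;
const verticalInput = 'a0b a1b a2b a3b a4b';

const verticalMatches = verticalInput.match(verticalRegex);
console.log(verticalMatches);
// => ['a1b', 'a2b', 'a3b']
```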

Character groups

Vertical fuzziness is expressed with a character group:

[abcdefg]: matches any one of the listed characters

Range notation: -

[a-g]: matches any one character from a to g

To match a literal - inside a character group, escape it as \-
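The following sketch shows a range and an escaped hyphen side by side. The sample strings are invented for the example.

```javascript
// [a-g] is a range: any single character from "a" to "g".
const rangeMatches = 'bighead'.match(/[a-g]/g);
console.log(rangeMatches);
// => ['b', 'g', 'e', 'a', 'd']

// [a\-z] is NOT a range: it lists the three characters "a", "-", and "z".
const hyphenMatches = 'a-z m'.match(/[a\-z]/g);
console.log(hyphenMatches);
// => ['a', '-', 'z']
```

Placing the hyphen first or last in the group, as in [-az], also makes it literal.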

Shorthand for common ranges

  • Digit: \d means [0-9]

  • Word: \w means [0-9a-zA-Z_], that is digits, upper- and lower-case letters, and the underscore

  • Whitespace: \s means whitespace characters such as space, horizontal tab, vertical tab, line feed, carriage return, and form feed

Capitalizing any of these shorthands negates it. For example, \D means a non-digit, [^0-9]
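A quick way to get a feel for these shorthands is to run them over one string. The sample text below is an assumption chosen to contain digits, letters, an underscore, a space, and a tab.

```javascript
const shorthandSample = 'Room 42_b\tok';

const digitMatches = shorthandSample.match(/\d/g);      // each digit
const wordRuns = shorthandSample.match(/\w+/g);         // runs of word characters
const whitespaceMatches = shorthandSample.match(/\s/g); // the space and the tab
const nonDigitRuns = shorthandSample.match(/\D+/g);     // runs of anything but digits

console.log(digitMatches);             // => ['4', '2']
console.log(wordRuns);                 // => ['Room', '42_b', 'ok']
console.log(whitespaceMatches.length); // => 2
console.log(nonDigitRuns);             // => ['Room ', '_b\tok']
```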

Negated character groups

  • ^: [^abc] matches any single character other than a, b, or c

Modifiers

Modifiers (flags) can be combined

  • g: global matching. Finds, in order, every substring of the target string that fits the pattern. The g modifier also affects some APIs; more on this later. Without it, matching stops after the first match.

  • m: multi-line matching

  • i: ignore letter case

  • u: Unicode mode, which enables Unicode matching

  • s: dotAll, in which the metacharacter . matches any character, including line terminators

let str = 'I\nLove\nYou' // Note the newline characters 👈

str.replace(/^|$/, '#') // Output: '#I\nLove\nYou' (not global, so only the first match is replaced)

str.replace(/^|$/g, '#') // Output: '#I\nLove\nYou#'

str.replace(/^|$/gm, '#') // Output: '#I#\n#Love#\n#You#'
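The i, s, and u modifiers can be checked the same way. This is a minimal sketch; the test strings and variable names are illustrative, and the s flag assumes an ES2018-capable engine.

```javascript
// i: letter case is ignored.
const caseInsensitive = /love/i.test('I LOVE You');

// s (dotAll): without it, "." does not match a line feed.
const dotWithoutS = /I.Love/.test('I\nLove');
const dotWithS = /I.Love/s.test('I\nLove');

// u (Unicode): without it, "." sees an emoji as two UTF-16 code units.
const emojiWithoutU = /^.$/.test('😝');
const emojiWithU = /^.$/u.test('😝');

console.log(caseInsensitive);           // => true
console.log(dotWithoutS, dotWithS);     // => false true
console.log(emojiWithoutU, emojiWithU); // => false true
```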

Matching positions

Everything so far has matched characters, but a regex can also match positions

What is a position?

A position is the gap between two adjacent characters, including the gaps at the very start and the very end of the string

Start and end boundaries

In multi-line mode (m), ^ and $ match the beginning and end of each line

  • ^: besides negation, this symbol also marks the beginning of the input

    Outside a character group, ^ matches the start position; inside one (vertical matching), it means "not".

    For example:

    /[^123]/ matches a character that is not 1, 2, or 3.

    /^123/ matches a string that starts with 123.

  • $: matches the end position

    For example, /123$/ requires the string to end with 123
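The anchors are easiest to verify with test. The inputs below are made-up examples.

```javascript
const startsWith123 = /^123/;
const endsWith123 = /123$/;
const exactly123 = /^123$/;

const anchorResults = [
  startsWith123.test('123abc'), // true: starts with 123
  startsWith123.test('abc123'), // false: 123 is not at the start
  endsWith123.test('abc123'),   // true: ends with 123
  exactly123.test('123'),       // true: the whole string is 123
  exactly123.test('1234'),      // false: extra character before the end
];
console.log(anchorResults);
// => [true, false, true, true, false]
```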

Word boundaries

b is short for boundary

\b: matches a word boundary, that is, the position between \w and \W, including between \w and ^ and between \w and $

Here’s an example. Recall that \w means [0-9a-zA-Z_]:

let str = `a*bc`

str.replace(/\b/g, '#') // replace every word boundary with #

// Output: "#a#*#bc#"

  • What we match here are positions, so # is inserted at every word boundary. The gap between b and c is not a word boundary because it has word characters on both sides, so it does not match

\B: matches a position that is not a word boundary

let str = `a*bc`

str.replace(/\B/g, '#') // replace every non-word-boundary with #

// Output: "a*b#c"

  • Notice where # is inserted, and it makes sense: here the only non-word-boundary is the gap between two word characters
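A common practical use of \b is replacing a whole word without touching longer words that contain it. The sentence below is an invented example.

```javascript
const petSentence = 'cat concat cats cat.';

// \bcat\b only matches "cat" when both neighbours are non-word characters
// (or the start/end of the string).
const wholeWordReplaced = petSentence.replace(/\bcat\b/g, 'dog');
console.log(wholeWordReplaced);
// => 'dog concat cats dog.'
```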

Specified positions (lookahead)

Both of the following assert a position relative to a pattern p, which can be a single character or a group

  • (?=p): the position before p

For example, insert the character # before a:

let str = `123abc`

str.replace(/(?=a)/g, '#')

// "123#abc"

  • (?!p): any position that is not before p

let str = `123abc`

str.replace(/(?!a)/g, '#')

// "#1#2#3a#b#c#"
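Lookaheads are handy for inserting text at computed positions. As a sketch, the hypothetical helper below adds thousands separators by combining (?=p) and (?!p); the function name is an assumption for this example.

```javascript
// Insert "," at every position that is followed by a multiple of three digits
// up to the end of the string, except at the very start.
function addThousandsSeparators(digits) {
  return digits.replace(/(?!^)(?=(\d{3})+$)/g, ',');
}

const separatedLong = addThousandsSeparators('12345678');
const separatedExact = addThousandsSeparators('123456');
console.log(separatedLong);  // => '12,345,678'
console.log(separatedExact); // => '123,456'
```

The (?!^) part is what keeps '123456' from becoming ',123,456'.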

Quantifiers

In horizontal matching, {m,n} sets how many times a match may repeat. There are also shorthands for common cases

  • ? (is it there?)

    Equivalent to {0,1}: present once or absent

  • + (at least one)

    Equivalent to {1,}: occurs at least once, with no upper limit

  • * (any number)

    Equivalent to {0,}: occurs any number of times, possibly not at all
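The three shorthands can be compared on a few made-up words:

```javascript
// ?  means {0,1}: the "u" is optional.
const optionalU = ['color', 'colour', 'colouur'].map((w) => /^colou?r$/.test(w));

// +  means {1,}: at least one "o" is required.
const atLeastOneO = ['ggle', 'gogle', 'google'].map((w) => /^go+gle$/.test(w));

// *  means {0,}: any number of "o", including none.
const anyNumberOfO = ['ggle', 'gogle', 'google'].map((w) => /^go*gle$/.test(w));

console.log(optionalU);    // => [true, true, false]
console.log(atLeastOneO);  // => [false, true, true]
console.log(anyNumberOfO); // => [true, true, true]
```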

Matching modes

Greedy matching

While still satisfying the pattern, match as much as possible

var regex = /\d{2,5}/g; // Match 2 to 5 consecutive digits
var string = "123, 1234, 12345, 123456";
console.log( string.match(regex) ); // => ["123", "1234", "12345", "12345"]

Lazy matching

Stop as soon as the pattern is satisfied. Adding a ? after a quantifier turns it lazy

  • Suppose we want to get the id of a node

let str = '<div id="container" class="main"></div>'

let reg = /id=".*"/

str.match(reg)[0] // Output: 'id="container" class="main"'

  • . is a wildcard that matches any character, and * lets it repeat any number of times. Because * is greedy, it matches as much as it can, so the match runs all the way to the last double quotation mark

  • Switch to lazy matching

let str = '<div id="container" class="main"></div>'

let reg = /id=".*?"/ // 👈 with a question mark

str.match(reg)[0] // Output: 'id="container"'

  • Because the double quotation mark right after container already satisfies the pattern, matching stops there
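Another way to stop at the first closing quote, shown here only as an alternative sketch, is to replace the wildcard with a negated character group so the match cannot run past a double quotation mark. The variable names are illustrative.

```javascript
const nodeHtml = '<div id="container" class="main"></div>';

// [^"]* matches anything except a double quote, so it cannot overshoot.
const idAttribute = nodeHtml.match(/id="[^"]*"/)[0];

// Adding a capture group returns just the value.
const idValue = nodeHtml.match(/id="([^"]*)"/)[1];

console.log(idAttribute); // => 'id="container"'
console.log(idValue);     // => 'container'
```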

Grouping

() treats the regular expression inside the parentheses as a single unit

Capturing parentheses

In a group, whatever the parenthesized subexpression matches is captured and stored; we call this a reference. So how do we get the captured content?

  • Through properties of the RegExp constructor

let str = 'abc-abcd-abcde'

str.match(/(\w{3})-(\w{4})-(\w{5})/g)

// Groups are numbered from left to right as they appear in the regex
RegExp.$1 // abc
RegExp.$2 // abcd
RegExp.$3 // abcde

Of course, it doesn’t have to be this way. In replace, you can reference captured groups directly in the replacement string. Besides the numbered properties, the RegExp constructor also has named properties with shorthands such as $_

Full name       Shorthand   Description
input           $_          The string searched most recently
lastMatch       $&          The text matched most recently
lastParen       $+          The capture group matched most recently
leftContext     $`          The text before lastMatch in the input string
rightContext    $'          The text after lastMatch in the input string

Note, however, that using these RegExp constructor properties is not recommended, according to the book JavaScript Advanced Programming
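One alternative that avoids the constructor properties, sketched here with illustrative variable names, is to read the captures from the match result itself. Without the g flag, match returns the captured groups after the full match; named groups assume an ES2018-capable engine.

```javascript
const dashedInput = 'abc-abcd-abcde';

// No g flag, so the result holds the full match followed by each capture.
const captureResult = dashedInput.match(/(\w{3})-(\w{4})-(\w{5})/);
const [wholeMatch, firstGroup, secondGroup, thirdGroup] = captureResult;
console.log(firstGroup, secondGroup, thirdGroup); // => abc abcd abcde

// Named groups make the intent clearer.
const dateResult = '2024-01-15'.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(dateResult.groups.year, dateResult.groups.month); // => 2024 01
```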

Non-capturing parentheses

(?:p): match but do not capture

const str = 'abc123abc'
const capturing = /(\d+)/g

str.match(capturing)
RegExp.$1 // 👈 '123'

const nonCapturing = /(?:\d+)/g // does not capture
// which here is equivalent to /\d+/g

str.match(nonCapturing)
RegExp.$1 // 👈 ''
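Non-capturing groups matter most when parentheses are needed only to apply a quantifier. The sketch below compares the two forms on an invented string.

```javascript
const repeatedPairs = 'ababab';

// Capturing: the result carries the full match plus the last "ab" captured.
const withCapture = repeatedPairs.match(/(ab)+/);
console.log(withCapture.length, withCapture[1]); // => 2 ab

// Non-capturing: same match, but nothing extra is stored.
const withoutCapture = repeatedPairs.match(/(?:ab)+/);
console.log(withoutCapture.length, withoutCapture[0]); // => 1 ababab
```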

Branches

A branch structure splits the pattern into alternatives with | (pipe); it can be read as "or", much like || in JavaScript.

For example, in (p1|p2|p3), p1, p2, and p3 are subpatterns
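One detail worth seeing in code: alternatives are tried from left to right, and the first one that succeeds wins, even if a later one would match more. The words below are illustrative.

```javascript
const farewell = 'goodbye';

// "good" is tried first and succeeds, so "goodbye" is never attempted.
const shortFirst = farewell.match(/good|goodbye/g);
console.log(shortFirst); // => ['good']

// Putting the longer alternative first changes the result.
const longFirst = farewell.match(/goodbye|good/g);
console.log(longFirst); // => ['goodbye']
```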

APIs

string.match(regex)

If the regex has the g flag, match returns a plain array of matched strings rather than the standard match format

const str = '123-ABC-123'
const reg1 = /ABC/
const reg2 = /ABC/g

str.match(reg1) // ['ABC', index: 4, input: '123-ABC-123', groups: undefined]
str.match(reg2) // ['ABC']
  • With a capture group
const str = '123-ABC-123'
const reg = /(ABC)/

str.match(reg) // ['ABC', 'ABC', index: 4, input: '123-ABC-123', groups: undefined]

Note: match converts a string argument into a regex, which may not be what you intended. In the example below, the . is treated as a wildcard rather than a literal dot.

let str = '12.4'

str.match('.') // ['1', index: 0, input: '12.4', groups: undefined]
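To match the dot literally, escape it. The sketch below also includes a small escapeRegExp helper; that helper is hypothetical (it is not a built-in) and is shown only as one common way to escape arbitrary text before building a regex from it.

```javascript
const versionText = '12.4';

const viaString = versionText.match('.');   // the string becomes the regex /./
const viaLiteral = versionText.match(/\./); // escaped dot: matches the real "."
console.log(viaString[0], viaString.index);   // => 1 0
console.log(viaLiteral[0], viaLiteral.index); // => . 2

// Hypothetical helper: backslash-escape every regex metacharacter in the text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const viaHelper = versionText.match(new RegExp(escapeRegExp('.')));
console.log(viaHelper.index); // => 2
```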

string.search(regex)

Returns the index of the first successful match, or -1 if there is no match

Note: like match, the search method implicitly converts a string argument into a regex, so be careful
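A short sketch of search, including the implicit conversion pitfall. The strings reuse the values from the match examples; indexOf is shown as the plain-string alternative.

```javascript
const searchFound = '123-ABC-123'.search(/ABC/);   // index of the first match
const searchMissing = '123-ABC-123'.search(/xyz/); // no match
const searchWildcard = '12.4'.search('.');         // '.' is converted to /./
const searchLiteralDot = '12.4'.search(/\./);      // escaped dot
const plainIndexOf = '12.4'.indexOf('.');          // no regex involved

console.log(searchFound, searchMissing);                     // => 4 -1
console.log(searchWildcard, searchLiteralDot, plainIndexOf); // => 0 2 2
```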

regex.exec(string)

Similar to match but more powerful: exec returns the standard match format even with the global flag g. With g, exec is also stateful (it remembers where the last match ended in lastIndex)

const str = '123-ABC-456-EFG'
const reg = /\d{3,}/g

reg.exec(str) // ['123', index: 0, input: '123-ABC-456-EFG', groups: undefined]
reg.lastIndex // 3
// Second call
reg.exec(str) // ['456', index: 8, input: '123-ABC-456-EFG', groups: undefined]
reg.lastIndex // 11
// Third call
reg.exec(str) // null
reg.lastIndex // 0
// Calling it again starts the next cycle
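Because exec remembers its position, it is usually called in a loop until it returns null. The sketch below collects every match with its index; matchAll is shown as an equivalent on engines that support it (ES2020). Variable names are illustrative.

```javascript
const execInput = '123-ABC-456-EFG';
const execRegex = /\d{3,}/g;

// Each call resumes from lastIndex; null signals that no matches remain.
const execFound = [];
let execMatch;
while ((execMatch = execRegex.exec(execInput)) !== null) {
  execFound.push({ value: execMatch[0], index: execMatch.index });
}
console.log(execFound);
// => [{ value: '123', index: 0 }, { value: '456', index: 8 }]

// matchAll yields the same standard match objects without manual state.
const matchAllIndexes = [...execInput.matchAll(/\d{3,}/g)].map((m) => m.index);
console.log(matchAllIndexes); // => [0, 8]
```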

regex.test(string)

The test method is also “stateful” when the regex has the global modifier g

let reg = /a/g
let str = 'abc-abc-abc'

console.log(reg.test(str), reg.lastIndex)
// true 1
console.log(reg.test(str), reg.lastIndex)
// true 5
console.log(reg.test(str), reg.lastIndex)
// true 9
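This statefulness is a frequent source of bugs when one global regex is reused for validation. A minimal sketch with made-up names:

```javascript
const sharedRegex = /a/g;

// The same input gives different answers because lastIndex carries over.
const repeatedTests = [
  sharedRegex.test('abc'), // true: "a" found at 0, lastIndex becomes 1
  sharedRegex.test('abc'), // false: search starts at 1, lastIndex resets to 0
  sharedRegex.test('abc'), // true again
];
console.log(repeatedTests); // => [true, false, true]

// Without the g flag, test always starts from the beginning.
const statelessRegex = /a/;
const statelessTests = [statelessRegex.test('abc'), statelessRegex.test('abc')];
console.log(statelessTests); // => [true, true]
```

If the g flag is needed elsewhere, setting lastIndex back to 0 before each test has the same effect.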

string.replace(regex, <function | string>)

The replace method is called substitution, but it is actually very powerful, because we can do a lot of things under the guise of substitution.

The second argument to replace can be a string or a callback function. A callback receives the following arguments

  • match: the matched content
  • $1 to $9: the captured groups (how many of these there are depends on the number of capture groups in the regex)
  • index: the index of the current match
  • input: the input text

The regex below has two capture groups, so the callback receives two such parameters, $1 and $2, that reference the captured content

const str = "1234,2345,3456"
const reg = /(\d)\d{2}(\d)/g

str.replace(reg, function (match, $1, $2, index, input) {
    console.log([match, $1, $2, index, input]);
    return match // keep the text unchanged; we only want to inspect the arguments
})
/*
['1234', '1', '4', 0, '1234,2345,3456']
['2345', '2', '5', 5, '1234,2345,3456']
['3456', '3', '6', 10, '1234,2345,3456']
*/
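For completeness, here is a sketch of both forms of the second argument actually producing a new string. The date and number strings are invented for the example.

```javascript
// String form: $1, $2, $3 in the replacement refer to the captured groups.
const isoDate = '2024-01-15';
const usDate = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1');
console.log(usDate); // => '01/15/2024'

// Callback form: the return value becomes the replacement for each match.
const doubledDigits = '1 2 3'.replace(/\d/g, (digit) => String(Number(digit) * 2));
console.log(doubledDigits); // => '2 4 6'
```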

How do you write a regular expression?

Suppose we want to match landline phone numbers such as the following.

055188888888 
0551-88888888 
(0551)88888888

1. Understand the pattern rules for each part

What these three strings have in common is that each consists of an area code followed by a number (here a 4-digit area code and an 8-digit number). We allow area codes of 3 to 4 digits and numbers of 7 to 8 digits, and we know that the area code begins with 0 and the number does not. So

  • The rule for the area code is: 0\d{2,3}
  • The rule for the number is: [1-9]\d{6,7}

Therefore, the regexes that match these three strings are

  1. /^0\d{2,3}[1-9]\d{6,7}$/

  2. /^0\d{2,3}-[1-9]\d{6,7}$/

  3. /^\(0\d{2,3}\)[1-9]\d{6,7}$/

2. After writing a regex for every form to be matched, determine the relationship between them.

The forms are alternatives, which gives the regex

/^0\d{2,3}[1-9]\d{6,7}$|^0\d{2,3}-[1-9]\d{6,7}$|^\(0\d{2,3}\)[1-9]\d{6,7}$/

Here the three regexes above are simply joined together with |

3. Extract the common parts, just as you would factor out shared functionality when writing code

Pull the number part out

/^(0\d{2,3}|0\d{2,3}-|\(0\d{2,3}\))[1-9]\d{6,7}$/

Optimize further by using -? for the optional hyphen

/^(0\d{2,3}-?|\(0\d{2,3}\))[1-9]\d{6,7}$/
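A quick check of the final regex against the three sample numbers, plus a few strings that should be rejected. The rejected samples are assumptions added for this sketch. Note that the quantifiers must be written without a space, as {2,3}; with a space JavaScript treats the braces as literal text.

```javascript
const landlineRegex = /^(0\d{2,3}-?|\(0\d{2,3}\))[1-9]\d{6,7}$/;

const validNumbers = ['055188888888', '0551-88888888', '(0551)88888888'];
const invalidNumbers = [
  '(0551-88888888', // unbalanced parenthesis
  '0551-08888888',  // number part starts with 0
  '1551-88888888',  // area code does not start with 0
];

const validResults = validNumbers.map((n) => landlineRegex.test(n));
const invalidResults = invalidNumbers.map((n) => landlineRegex.test(n));

console.log(validResults);   // => [true, true, true]
console.log(invalidResults); // => [false, false, false]
```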


🤭 That is basically everything we need for everyday business development, but there are many more details I have not covered. One example is the backtracking principle of regular expressions. It is not a difficult concept, and the original author explains it well, so I have left it out here. I hope you will read the original and understand regular expressions more deeply!

Thanks 😘


If you found the content helpful:

  • ❤️ Likes and follows are welcome! I will do my best to produce high-quality articles

Contact the author: Linkcyd 😁

Previous articles:

  • React Get started with 6 brain maps

  • Interviewer: Come on, hand write a promise

  • Talk about front-end performance optimization based on browser rendering mechanism

  • Prototype and Prototype Chain: How to implement Call, Bind, New yourself?