Regular expressions have always been a sore spot for me. First, I never found a systematic way to learn them, and second, I always feel that I rarely get to use them. I suspect most people who don't know much about regular expressions feel the same way. Yet when the time comes to use one, it turns out to be very important. Fortunately, I found an open source book called JavaScript regular Expressions, which is short but very systematic. I highly recommend you read it.😝
This article combs through regular expressions systematically, so that you get an overall understanding of them, can use them in everyday business development, and no longer frown when you meet them in other people's projects and source code.
Overview (illustration)
Matching
Regular expressions: Match either characters or positions
Matching characters
Fuzzy matching
Horizontal fuzziness
The length of a match is not fixed; several lengths are possible
Quantifier: {m,n} means the preceding character appears at least m times and at most n times. Write it without a space after the comma, otherwise the braces are not treated as a quantifier.
- The regex matches the qualifying characters horizontally, along the string
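As a minimal sketch of horizontal fuzziness, the snippet below lets b repeat between 2 and 5 times. The pattern and the test string are illustrative assumptions, not taken from the original figure.

```javascript
// b may repeat 2 to 5 times between a and c
const regex = /ab{2,5}c/g;
const text = 'abc abbc abbbc abbbbc abbbbbc abbbbbbc';

const matches = text.match(regex);
console.log(matches);
// => ['abbc', 'abbbc', 'abbbbc', 'abbbbbc']
```

The first candidate has too few b characters and the last has too many, so neither is matched.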
Vertical fuzziness
At a given position, the matched character is not fixed; it can be one of several possibilities
- For example, the character at that position can be a, b, or c
Character groups
Vertical fuzziness is expressed with character groups
[abcdefg]: matches any one of the listed characters
Range notation
[a-g]: matches any one character from a to g
To match the - symbol itself, escape it as \-
Shorthand for common ranges
- Digit: \d means [0-9]
- Word character: \w means [0-9a-zA-Z_], that is digits, upper and lower case letters, and the underscore
- Whitespace: \s matches whitespace characters such as space, horizontal tab, vertical tab, line feed, carriage return, and form feed
Capitalizing any of these shorthands negates it. For example, \D means non-digit, [^0-9]
Excluded character groups
- ^: [^abc] matches any one character except a, b, or c
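The character groups and shorthands above can be tried out with a short sketch like this one. The sample strings and variable names are made up for illustration.

```javascript
const sample = 'a1_ b-2';

const digits = sample.match(/\d/g);        // ['1', '2']
const wordChars = sample.match(/\w/g);     // ['a', '1', '_', 'b', '2']
const nonWord = sample.match(/\W/g);       // [' ', '-']

// Excluded group: every character except a, b, or c
const notAbc = 'abcd-e'.match(/[^abc]/g);  // ['d', '-', 'e']

// An escaped hyphen inside a group is a literal hyphen, not a range
const withHyphen = 'a-z'.match(/[a\-z]/g); // ['a', '-', 'z']

console.log(digits, wordChars, nonWord, notAbc, withHyphen);
```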
Modifiers
Modifiers can be combined
- g, global matching: finds, in order, all substrings of the target string that match the pattern. The g modifier also affects some APIs, more on this later. Without the global modifier, matching stops after the first match.
- m, multi-line matching: ^ and $ match at the beginning and end of every line
- i, ignore letter case
- u, Unicode mode: enables Unicode matching
- s, dotAll: the metacharacter . matches any character, including line terminators
let str = 'I\nLove\nYou' // Note the newline characters 👈
str.replace(/^|$/, '#') // Output: '#I\nLove\nYou', not global, so only the first match is replaced
str.replace(/^|$/g, '#') // Output: '#I\nLove\nYou#'
str.replace(/^|$/gm, '#') // Output: '#I#\n#Love#\n#You#'
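The i, s, and u modifiers can be illustrated with a small sketch as well. The inputs are arbitrary examples, and the s flag assumes an ES2018 or later runtime.

```javascript
// i: letter case is ignored
const ignoreCase = /hello/i.test('HeLLo');   // true

// s: without it, . does not match a line feed
const dotDefault = /a.b/.test('a\nb');       // false
const dotAll = /a.b/s.test('a\nb');          // true

// u: an emoji is two UTF-16 code units, but one code point in Unicode mode
const withoutU = /^.$/.test('😀');           // false
const withU = /^.$/u.test('😀');             // true

console.log(ignoreCase, dotDefault, dotAll, withoutU, withU);
```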
Matching positions
Everything so far matched characters, but a regex can also match positions
What is a position?
A position is the gap between two adjacent characters, including the gap before the first character and the gap after the last one
Regular boundaries
In multi-line mode (m), ^ and $ match the beginning and end of each line
- ^: this symbol means "not" inside a character group, and the beginning of the input elsewhere
Outside brackets, ^ anchors the match to the beginning; at the start of a character group (vertical matching), it means "not".
For example:
/[^123]/ matches one character that is not 1, 2, or 3.
/^123/ matches a string that starts with 123.
- $: matches the end of the input
For example, /123$/ matches a string that ends with 123
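A short sketch of the two anchors, including how combining them forces the whole string to match. The test strings are illustrative.

```javascript
const startsWith123 = /^123/.test('12345');   // true
const endsWith123 = /123$/.test('45123');     // true

// With both anchors the entire input must be exactly 123
const exactly123 = /^123$/.test('12345');     // false

// With the m modifier, ^ matches at the start of every line
const lineStarts = 'a1\nb2'.match(/^\w/gm);   // ['a', 'b']
const firstOnly = 'a1\nb2'.match(/^\w/g);     // ['a']

console.log(startsWith123, endsWith123, exactly123, lineStarts, firstOnly);
```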
Word boundaries
b is short for boundary
\b: a word boundary, specifically the position between \w and \W, including between \w and ^, and between \w and $
Here's an example. First, recall that \w means [0-9a-zA-Z_]:
let str = `a*bc`
str.replace(/\b/g, '#') // insert # at every word boundary
// Output: "#a#*#bc#"
- What we are matching here is a position, so # is inserted at every boundary
- The position between b and c is not a word boundary, because it has a word character on both sides, so it is not matched
\B: a non-word boundary, that is, every position that \b does not match
let str = `a*bc`
str.replace(/\B/g, '#') // insert # at every non-word boundary
// Output: "a*b#c"
- Notice where the # is inserted, and it makes sense: the non-word boundary here is the position between two word characters
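One common use of \b is matching a whole word rather than a substring. This is a minimal sketch with a made-up sentence.

```javascript
const sentence = 'cat concat category cat.';

// Without boundaries, cat is also found inside longer words
const loose = sentence.match(/cat/g);
// With \b on both sides, only the standalone word matches
const whole = sentence.match(/\bcat\b/g);

console.log(loose.length); // 4
console.log(whole.length); // 2
```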
Specifying the boundary
The following two assertions match positions relative to a pattern p, which can be a single character or a group
(?=p): the position before p
For example, insert the character # before a
let str = `123abc`
str.replace(/(?=a)/g, '#')
// "123#abc"
(?!p): any position that is not before p
let str = `123abc`
str.replace(/(?!a)/g, '#')
// "#1#2#3a#b#c#"
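A well-known application of these two assertions is inserting thousands separators. The sketch below is one way to do it; the helper name addCommas is hypothetical and it assumes the input is a string of digits only.

```javascript
// (?=(\d{3})+$) finds positions followed by a multiple of three digits up to the end
// (?!^) rules out the very start of the string
const addCommas = (digits) => digits.replace(/(?!^)(?=(\d{3})+$)/g, ',');

console.log(addCommas('12345678'));  // '12,345,678'
console.log(addCommas('123456789')); // '123,456,789'
console.log(addCommas('123'));       // '123'
```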
Quantifiers
In horizontal matching, {m,n} sets how many times something must match. The common cases have shorthands
- ? (optional): equivalent to {0,1}, present or absent
- + (at least one): equivalent to {1,}, one or more occurrences
- * (any number): equivalent to {0,}, any number of occurrences, possibly none
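The three shorthands can be compared side by side in a small sketch. The words used here are arbitrary examples.

```javascript
// ?  the u is optional
const optional = 'color colour'.match(/colou?r/g); // ['color', 'colour']

// +  one or more a
const oneOrMore = 'a aa b'.match(/a+/g);           // ['a', 'aa']

// *  zero or more a before b
const zeroOrMore = 'b ab aab'.match(/a*b/g);       // ['b', 'ab', 'aab']

console.log(optional, oneOrMore, zeroOrMore);
```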
Matching modes
Greedy matching
Match as much as possible while still satisfying the pattern
var regex = /\d{2,5}/g; // Match 2 to 5 consecutive digits
var string = "123, 1234, 12345, 123456";
console.log( string.match(regex) ); // => ["123", "1234", "12345", "12345"]
Lazy matching
Stop as soon as the pattern is satisfied. Putting a ? after a quantifier turns it into a lazy match
- Suppose we want to get the id of a node
let str = '<div id="container" class="main"></div>'
let reg = /id=".*"/
str.match(reg) // First element: 'id="container" class="main"'
. is a wildcard that matches any character, and * lets it appear any number of times. Because * is greedy, it matches as much as possible, so the match runs all the way to the last double quotation mark
- Switch to lazy matching
let str = '<div id="container" class="main"></div>'
let reg = /id=".*?"/ // 👈 with a question mark
str.match(reg) // First element: 'id="container"'
- The match stops at the double quotation mark right after container, because that already satisfies the pattern
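Besides a lazy quantifier, an excluded character group gives the same result here, since the value cannot contain a double quotation mark. This is an alternative sketch, not something the original article proposes.

```javascript
const html = '<div id="container" class="main"></div>';

const greedy = html.match(/id=".*"/)[0];     // runs to the last quotation mark
const lazy = html.match(/id=".*?"/)[0];      // stops at the first closing quotation mark
const negated = html.match(/id="[^"]*"/)[0]; // any characters except a quotation mark

console.log(greedy);  // 'id="container" class="main"'
console.log(lazy);    // 'id="container"'
console.log(negated); // 'id="container"'
```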
Grouping
() treats the regular expression inside the parentheses as a whole
Capturing parentheses
In a group, the text matched by the expression inside the parentheses is captured and stored; we call this a reference. So how do we get the captured content?
- Through properties of the RegExp constructor
let str = 'abc-abcd-abcde'
str.match(/(\w{3})-(\w{4})-(\w{5})/g)
// Groups are numbered from left to right, in the order they appear in the regex
RegExp.$1 // abc
RegExp.$2 // abcd
RegExp.$3 // abcde
Of course, it doesn't have to be done this way. In replace, the captured groups can be referenced directly. In addition to the numbered properties, the RegExp constructor also has shorthand properties such as $_
Full name | Shorthand | Description |
---|---|---|
input | $_ | The string that was searched last |
lastMatch | $& | The text that was matched last |
lastParen | $+ | The capture group that was matched last |
leftContext | $` | The text that appears before lastMatch in the input string |
rightContext | $' | The text that appears after lastMatch in the input string |
It is important to note, however, that using these RegExp constructor properties is not recommended, according to the book JavaScript Advanced Programming
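Captured groups can also be read without the constructor properties. The sketch below shows three options on a made-up date string: the match result array, $n references in a replacement string, and named groups (which assume an ES2018 or later runtime).

```javascript
const date = '2017-06-12';
const pattern = /(\d{4})-(\d{2})-(\d{2})/;

// 1. Without the g flag, the groups follow the full match in the result array
const [, year, month, day] = date.match(pattern);

// 2. In a replacement string, $1, $2, $3 refer to the groups
const reordered = date.replace(pattern, '$2/$3/$1');

// 3. Named groups are exposed on the groups property
const named = date.match(/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/).groups;

console.log(year, month, day); // 2017 06 12
console.log(reordered);        // '06/12/2017'
console.log(named.y);          // '2017'
```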
Non-capturing parentheses
(?:p): match but do not capture
const str = 'abc123abc'
const reg1 = /(\d+)/g
str.match(reg1)
RegExp.$1 // 👈 '123'
const reg2 = /(?:\d+)/g // non-capturing
// or
const reg3 = /\d+/g
str.match(reg2)
RegExp.$1 // 👈 ''
Branches
A branch structure splits the pattern into alternatives with | (pipe); read it as "or", just like in JavaScript.
For example, in (p1|p2|p3), p1, p2, and p3 are subpatterns
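One detail worth seeing in code is that alternatives are tried from left to right, and the first one that succeeds wins. A minimal sketch with illustrative words:

```javascript
// good succeeds first, so goodbye is never tried at that position
const shortFirst = 'goodbye'.match(/good|goodbye/g); // ['good']

// Putting the longer alternative first changes the result
const longFirst = 'goodbye'.match(/goodbye|good/g);  // ['goodbye']

// Parentheses limit the branch to part of the pattern
const grouped = /^I love (JavaScript|regex)$/.test('I love regex'); // true

console.log(shortFirst, longFirst, grouped);
```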
APIs
string.match(regex)
If the regex has the g flag, match returns an array of the matched strings rather than the standard match format
const str = '123-ABC-123'
const reg1 = /ABC/
const reg2 = /ABC/g
str.match(reg1) // ['ABC', index: 4, input: '123-ABC-123', groups: undefined]
str.match(reg2) // ['ABC']
- With a capture group
const str = '123-ABC-123'
const reg = /(ABC)/
str.match(reg) // ['ABC', 'ABC', index: 4, input: '123-ABC-123', groups: undefined]
Note: match converts a string argument into a regex, which may not be what you intended. In the example below, the . is treated as a wildcard rather than a literal dot.
let str = '12.4'
str.match('.') // ['1', index: 0, input: '12.4', groups: undefined]
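To look for a literal dot, the character has to be escaped, or a plain string method can be used instead. A small sketch comparing the options, which also applies to search:

```javascript
const version = '12.4';

const asWildcard = version.match('.')[0];   // '1', the dot matched any character
const escaped = version.match('\\.')[0];    // '.', the string is converted to /\./
const literalIndex = version.search(/\./);  // 2
const wildcardIndex = version.search('.');  // 0, same implicit conversion as match
const plainIndex = version.indexOf('.');    // 2, no regex involved

console.log(asWildcard, escaped, literalIndex, wildcardIndex, plainIndex);
```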
string.search(regex)
Returns the index of the first successful match, or -1 if there is none
Note: like match, the search method implicitly converts a string argument into a regex, so be careful
regex.exec(string)
Similar to match but more powerful: exec returns the standard match format even with the global flag g. The exec method is also stateful (it remembers the position of the last match in lastIndex)
const str = '123-ABC-456-EFG'
const reg = /\d{3,}/g
reg.exec(str) // ['123', index: 0, input: '123-ABC-456-EFG', groups: undefined]
reg.lastIndex // 3
// Second call
reg.exec(str) // ['456', index: 8, input: '123-ABC-456-EFG', groups: undefined]
reg.lastIndex // 11
// Third call
reg.exec(str) // null
reg.lastIndex // 0
// Calling again starts the next cycle
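Because exec remembers its position, it is usually driven by a loop. The sketch below collects every match with its index, and shows matchAll (ES2020) as an alternative; the variable names are illustrative.

```javascript
const source = '123-ABC-456-EFG';
const digitsRe = /\d{3,}/g;

// Each call continues from lastIndex; null ends the loop and resets lastIndex
const found = [];
let m;
while ((m = digitsRe.exec(source)) !== null) {
  found.push({ text: m[0], index: m.index });
}
console.log(found);
// => [{ text: '123', index: 0 }, { text: '456', index: 8 }]

// matchAll yields the same standard match objects without manual state
const indexes = [...source.matchAll(/\d{3,}/g)].map((x) => x.index);
console.log(indexes); // [0, 8]
```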
regex.test(string)
The test method is also "stateful" when the regex has the global flag g
let reg = /a/g
let str = 'abc-abc-abc'
console.log(reg.test(str), reg.lastIndex)
// true 1
console.log(reg.test(str), reg.lastIndex)
// true 5
console.log(reg.test(str), reg.lastIndex)
// true 9
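This statefulness is a common source of bugs when one global regex is reused for validation. A minimal sketch of the pitfall and two ways around it:

```javascript
const hasA = /a/g;

const firstCall = hasA.test('abc');  // true, lastIndex is now 1
const secondCall = hasA.test('abc'); // false, the search started at index 1

// Option 1: reset the state by hand before reusing the regex
hasA.lastIndex = 0;
const afterReset = hasA.test('abc'); // true

// Option 2: drop the g flag when only a yes/no answer is needed
const stateless = /a/;
const repeatable = stateless.test('abc') && stateless.test('abc'); // true

console.log(firstCall, secondCall, afterReset, repeatable);
```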
string.replace(regex, <function | string>)
The replace method is called replacement, but it is actually very powerful, because we can do a lot of things under the guise of replacing.
The second argument to replace can be a string or a callback function. A callback receives the following arguments
- match: the matched content
- $1 to $9: the captured groups (how many of these there are depends on the number of capture groups in the regex)
- index: the index of the current match
- input: the input text
Since I set up two capture groups, there are two such parameters here, $1 and $2, referring to the captured content
const str = "1234, 2345, 3456"
const reg = /(\d)\d{2}(\d)/g
str.replace(reg, function (match, $1, $2, index, input) {
  console.log([match, $1, $2, index, input]);
})
/*
['1234', '1', '4', 0, '1234, 2345, 3456']
['2345', '2', '5', 6, '1234, 2345, 3456']
['3456', '3', '6', 12, '1234, 2345, 3456']
*/
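The callback's return value becomes the replacement text, which is what makes replace so flexible. Two small sketches with made-up inputs:

```javascript
// Convert kebab-case to camelCase: the captured letter is upper-cased
const kebab = 'background-color-value';
const camel = kebab.replace(/-(\w)/g, (match, letter) => letter.toUpperCase());
console.log(camel); // 'backgroundColorValue'

// Compute each replacement from the matched text
const doubled = '1234, 2345'.replace(/\d+/g, (n) => String(Number(n) * 2));
console.log(doubled); // '2468, 4690'
```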
How do you write a regex?
Suppose we want to match landline phone numbers.
055188888888
0551-88888888
(0551)88888888
1. Understand the pattern of each part
What these three strings have in common is a 4-digit area code followed by an 8-digit number. The area code begins with a 0 and the number does not. The patterns below are slightly more general, allowing 3 or 4 digits for the area code and 7 or 8 digits for the number. So
- The area code is matched by:
0\d{2,3}
- The number is matched by:
[1-9]\d{6,7}
Therefore, the regexes that match these three strings are
- /^0\d{2,3}[1-9]\d{6,7}$/
- /^0\d{2,3}-[1-9]\d{6,7}$/
- /^\(0\d{2,3}\)[1-9]\d{6,7}$/
2. Once every form to be matched has a regex, determine the relationship between them.
This gives the regex
/^0\d{2,3}[1-9]\d{6,7}$|^0\d{2,3}-[1-9]\d{6,7}$|^\(0\d{2,3}\)[1-9]\d{6,7}$/
Here the three regexes are simply joined with the or operator |
3. Extract the common parts, just as you would factor out shared functionality in ordinary code
Pull the number part out
/^(0\d{2,3}|0\d{2,3}-|\(0\d{2,3}\))[1-9]\d{6,7}$/
Optimize further, using -? for the optional hyphen
/^(0\d{2,3}-?|\(0\d{2,3}\))[1-9]\d{6,7}$/
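The final pattern can be checked against the three sample numbers with a quick sketch. The two rejected inputs are extra cases chosen here for illustration.

```javascript
const landline = /^(0\d{2,3}-?|\(0\d{2,3}\))[1-9]\d{6,7}$/;

const samples = ['055188888888', '0551-88888888', '(0551)88888888'];
const allMatch = samples.every((s) => landline.test(s));

// A space separator and an unclosed parenthesis are both rejected
const withSpace = landline.test('0551 88888888');
const unclosed = landline.test('(0551-88888888');

console.log(allMatch);  // true
console.log(withSpace); // false
console.log(unclosed);  // false
```

The regex has no g flag, so calling test repeatedly on it carries no state between calls.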
🤭 This is basically all we need for business development, but there are many details I have not covered, for example the backtracking principle of regular expressions. It is not a difficult concept, but I deleted my own write-up of it because I think the book's author explains it well. So I hope you will read the original and understand regular expressions more deeply!
Thanks 😘
If you found the content helpful:
- ❤️ Likes and follows are welcome! I will do my best to produce high-quality articles
Contact the author: Linkcyd 😁
Previous articles:
- React Get started with 6 brain maps
- Interviewer: Come on, hand write a promise
- Talk about front-end performance optimization based on browser rendering mechanism
- Prototype and Prototype Chain: How to implement Call, Bind, New yourself?