A regular expression is a pattern that matches either characters or positions.
Basic syntax
character | meaning |
---|---|
\ | escape |
Position matching | |
^ | Matches the start of the input |
$ | Matches the end of the input |
\b | Matches a word boundary: a position where a \w character is next to a non-\w character or to the start/end of the input |
x(?=y) | Lookahead: matches x only if it is immediately followed by y |
x(?!y) | Negative lookahead: matches x only if it is not immediately followed by y |
Character matching | |
\w | Letters, digits, and underscores ([A-Za-z0-9_]) |
\d | Digits ([0-9]) |
\s | Whitespace characters, including spaces, tabs, form feeds, and line feeds |
[xyz] | Matches any one of the characters inside the brackets |
x|y | Matches x or y |
. | Matches any single character except newline by default |
quantifiers | |
* | Matches the previous expression 0 or more times |
? | Matches the previous expression 0 or 1 times. Quantifiers are greedy by default, matching as many characters as possible; a ? placed after a quantifier (for example *?, +?, {n,}?) makes that quantifier lazy, matching as few characters as possible |
+ | Matches the previous expression 1 or more times |
{n} | Matches the previous expression n times |
{n,} | Matches the previous expression at least n times |
flags | |
g | Global matching |
i | Case insensitive |
m | Multiline: ^ and $ also match at the start and end of each line |
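The snippet below runs several rows of the table so you can see what each one does. The input strings are made up for illustration. Each comment shows the value the expression evaluates to.

```javascript
// A quick tour of the syntax table; each comment is the resulting value.
const results = {
  // ^ and $ anchor to the start/end of the input.
  startAnchor: /^hello/.test("hello world"),        // true
  endAnchor: /world$/.test("hello world"),          // true
  // \b is the boundary between a \w character and a non-\w character.
  wordBoundaryHit: /\bcat\b/.test("a cat sat"),     // true
  wordBoundaryMiss: /\bcat\b/.test("concatenate"),  // false
  // x(?=y): "foo" only when followed by "bar"; "bar" is not part of the match.
  lookahead: "foobar".match(/foo(?=bar)/)[0],       // "foo"
  // x(?!y): "foo" only when NOT followed by "bar".
  negativeLookahead: /foo(?!bar)/.test("foobar"),   // false
  // Character classes.
  wordChars: "a_1 b".match(/\w+/g),                 // ["a_1", "b"]
  digits: "a1b22".match(/\d+/g),                    // ["1", "22"]
  // Greedy vs. lazy: .+ grabs as much as possible, .+? as little as possible.
  greedy: "<a><b>".match(/<.+>/)[0],                // "<a><b>"
  lazy: "<a><b>".match(/<.+?>/)[0],                 // "<a>"
  // Flags: i ignores case; m lets ^ and $ work per line.
  ignoreCase: /HELLO/i.test("hello"),               // true
  multiline: "line1\nline2".match(/^line\d$/gm),    // ["line1", "line2"]
  noMultiline: "line1\nline2".match(/^line\d$/g),   // null
};

console.log(results);
```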
Basic methods
Create a regular expression
var reg = /\w+/; // literal
// equivalent to
var reg = new RegExp('\\w+'); // constructor with a string argument
Note: when you create a regular expression with the constructor, the pattern is passed as a string, so the normal string escaping rules apply. Every backslash must itself be escaped; for example, \w is written as '\\w'.
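Here is a small sketch of that escaping rule. It compares the literal and constructor forms by their `source`, and shows what happens if the backslash is not doubled.

```javascript
// Literal and constructor forms of the same pattern.
const literal = /\w+/;
const fromString = new RegExp("\\w+"); // "\\w" in a string literal is the two characters \w

// Forgetting to double the backslash: in a string literal "\w" is just "w",
// so this pattern becomes /w+/ and only matches the letter w.
const forgotEscape = new RegExp("\w+");

console.log(literal.source);           // \w+
console.log(fromString.source);        // \w+
console.log(forgotEscape.source);      // w+
console.log(fromString.test("abc"));   // true
console.log(forgotEscape.test("abc")); // false
```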
RegExp methods
test
/a/.test('a'); // true
/a/.test('b'); // false
exec
A RegExp with the g flag is stateful. It stores the position where the last match ended in its lastIndex property, so calling exec on the same string again continues from there, and repeated calls iterate over all matches.
exec returns more than test. If the match succeeds, it returns an array and (with the g flag) updates lastIndex. If it fails, it returns null and, with g, resets lastIndex to 0.
The returned array holds the matched string as its first item, followed by the captured groups, plus extra index, input, and groups properties.
var reg = /(a|d)/g;
var target = 'abcd';
var arr;
while (arr = reg.exec(target)) {
  console.log(arr);
}
console.log(arr);
// Each match is an array of length 2:
// ["a", "a", index: 0, input: "abcd", groups: undefined]
// ["d", "d", index: 3, input: "abcd", groups: undefined]
// null
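Because the g flag makes a RegExp stateful, reusing one global regex with test can give surprising results. A minimal sketch of that behavior:

```javascript
// A regex with the g flag remembers where the last match ended (lastIndex).
const reg = /a/g;
const first = reg.test("a");  // true; lastIndex is now 1
const second = reg.test("a"); // false; search starts at index 1, fails, lastIndex resets to 0
const third = reg.test("a");  // true again

// Without g, lastIndex is ignored and every call searches from the start.
const plain = /a/;
const plainResults = [plain.test("a"), plain.test("a")]; // [true, true]

console.log(first, second, third, plainResults);
```

If the same global regex must be reused across unrelated strings, one option is to reset `reg.lastIndex = 0` before each use. Another is to drop the g flag when you only need a yes/no answer.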
String methods
split
var values = '1,2，3,4，5'; // values separated by Chinese (，) or English (,) commas
var arr = values.split(/[,，]/); // ['1', '2', '3', '4', '5']
match
- If the regular expression has the g flag, match returns all matches, but not the capture groups.
- Without g, it returns the first match and its capture groups, along with index, input, and groups properties, much like exec.
In both cases, null is returned if there is no match.
const str = 'Hello World';
console.log(str.match(/[A-Z]/)); // ["H", index: 0, input: "Hello World", groups: undefined]
console.log(str.match(/[A-Z]/g)); // ["H", "W"]
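The difference between the two modes is easiest to see with a pattern that has capture groups. This sketch uses a made-up date string:

```javascript
const text = "2021-06-01 and 2022-07-02";

// Without g: first match plus its capture groups (and index/input/groups properties).
const noG = text.match(/(\d{4})-(\d{2})/);
// With g: every match, but the capture groups are dropped.
const withG = text.match(/(\d{4})-(\d{2})/g);
// With or without g, no match gives null.
const none = "no dates here".match(/(\d{4})-(\d{2})/g);

console.log(Array.from(noG)); // ["2021-06", "2021", "06"]
console.log(noG.index);       // 0
console.log(withG);           // ["2021-06", "2022-07"]
console.log(none);            // null
```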
replace
console.log('abcd'.replace(/ab/, 'AB')); // ABcd
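As with match, the g flag changes how many matches replace touches. Without it, only the first match is replaced. A short sketch:

```javascript
const once = "abab".replace(/ab/, "AB");      // only the first match is replaced
const every = "abab".replace(/ab/g, "AB");    // g replaces every match
const caseless = "Abab".replace(/ab/gi, "x"); // g + i: every match, ignoring case

console.log(once);     // ABab
console.log(every);    // ABAB
console.log(caseless); // xx
```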
search
Returns the index of the first match, or -1 if there is no match.
console.log('hello World'.search(/[A-Z]/)); // 6
Grouping and capture
Referencing groups
In a regular expression, parentheses make the structure clearer, but they also serve a more important purpose: grouping and capturing, which lets you extract or replace the matched parts. These features are used together with APIs such as exec and match.
Suppose we want to match a year, month and day with a regular expression like this:
var reg = /\d{4}-\d{2}-\d{2}/;
reg.exec('2021-06-01'); // ["2021-06-01", index: 0, input: "2021-06-01", groups: undefined]
This matches, but what if we want to extract the year, month, and day separately? Consider the following regular expression:
var reg = /(\d{4})-(\d{2})-(\d{2})/;
reg.exec('2021-06-01'); // ["2021-06-01", "2021", "06", "01", index: 0, input: "2021-06-01", groups: undefined]
From the second item on, each item holds the text matched by one of the three parenthesized groups. These are the captured groups. They appear in the array returned by exec, and after a successful match they can also be read from the static properties RegExp.$1 to RegExp.$9.
console.log(RegExp.$1); // 2021
console.log(RegExp.$2); // 06
console.log(RegExp.$3); // 01
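MDN marks RegExp.$1 to RegExp.$9 as deprecated. They reflect the most recent successful match anywhere in the program, so reading the groups from the returned array is usually safer. Replacement strings can also refer to groups as $1, $2, and $3. The MM/DD/YYYY target format below is just an example:

```javascript
const reg = /(\d{4})-(\d{2})-(\d{2})/;
const result = reg.exec("2021-06-01");

// Index 0 is the whole match; indices 1-3 are the three captured groups.
const [, year, month, day] = result;
console.log(year, month, day); // 2021 06 01

// In a replacement string, $1, $2, $3 refer to the same groups,
// here used to rewrite YYYY-MM-DD as MM/DD/YYYY.
const usFormat = "2021-06-01".replace(reg, "$2/$3/$1");
console.log(usFormat); // 06/01/2021
```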
Backreferences
Besides referring to groups through an API, a regular expression can also refer to a group inside the pattern itself. This is a backreference: it refers back to a group that appears earlier in the same pattern.
Suppose we want to match dates in three formats: YYYY-MM-DD, YYYY/MM/DD, and YYYY.MM.DD. We might write the regular expression like this:
var reg = /\d{4}(-|\/|\.)\d{2}(-|\/|\.)\d{2}/;
It does match all three formats, but it also matches mixed dates like '2021-06/02'. How can we require both separators to be the same? This is where backreferences come in.
var reg = /\d{4}(-|\/|\.)\d{2}\1\d{2}/;
Here \1 refers to the first group. In general, \1, \2, \3, and so on refer to the first, second, third group. Whatever the group actually matched, the backreference must match exactly the same text.
With nested parentheses, groups are numbered by the order of their opening (left) parentheses.
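A quick check of both points, using the sample date strings from this section. The two-digit nested example is illustrative only.

```javascript
const loose = /\d{4}(-|\/|\.)\d{2}(-|\/|\.)\d{2}/;
const strict = /\d{4}(-|\/|\.)\d{2}\1\d{2}/; // \1 must repeat the first separator

const inputs = ["2021-06-01", "2021/06/01", "2021.06.01", "2021-06/01"];
const looseResults = inputs.map((s) => loose.test(s));   // [true, true, true, true]
const strictResults = inputs.map((s) => strict.test(s)); // [true, true, true, false]

// Nested groups are numbered by their opening parenthesis, left to right:
// group 1 = ((\d)(\d)), group 2 = the first (\d), group 3 = the second (\d).
const nested = /((\d)(\d))/.exec("42");

console.log(looseResults, strictResults);
console.log(nested[1], nested[2], nested[3]); // 42 4 2
```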
Non-capturing groups
In the example above, we group the separator alternatives with parentheses. When capturing, though, we don't care about the separator, only the year, month, and day. In that case we can use a non-capturing group, (?:...):
var reg = /(\d{4})(?:-|\.|\/)(\d{2})(?:-|\.|\/)(\d{2})/;
console.log(reg.exec('2021-06-01')); // ["2021-06-01", "2021", "06", "01", index: 0, input: "2021-06-01", groups: undefined]
This way, only the year, month, and day are captured.
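To see what the non-capturing groups save, compare the captured items of a capturing version and a non-capturing version on the same input:

```javascript
const capturing = /(\d{4})(-|\.|\/)(\d{2})(-|\.|\/)(\d{2})/;
const nonCapturing = /(\d{4})(?:-|\.|\/)(\d{2})(?:-|\.|\/)(\d{2})/;

const a = capturing.exec("2021.06.01");
const b = nonCapturing.exec("2021.06.01");

// slice(1) drops the whole match and keeps only the captured groups.
console.log(a.slice(1)); // ["2021", ".", "06", ".", "01"]
console.log(b.slice(1)); // ["2021", "06", "01"]
```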
Building a regular expression
Before building a regular expression, consider a few questions:
- Is a regular expression needed at all?
  Simple problems that string APIs can solve do not need one.
- Is a complex regular expression necessary?
  For password validation with many rules, you could build one large regular expression, but you could also use several small ones:
function checkPassword(string) {
  if (!regex1.test(string)) return false;
  if (!regex2.test(string)) return false;
  if (!regex3.test(string)) return false;
  // ...
}
JavaScript regular expressions are matched by backtracking, so an overly complex regular expression can hurt performance.
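Here is a runnable version of the "several small regular expressions" idea. The password rules (at least 8 characters, a digit, a lowercase letter, and an uppercase letter) are hypothetical and chosen only for illustration.

```javascript
// Hypothetical rules, for illustration only.
const rules = [
  /^.{8,}$/, // at least 8 characters
  /\d/,      // contains a digit
  /[a-z]/,   // contains a lowercase letter
  /[A-Z]/,   // contains an uppercase letter
];

function checkPassword(password) {
  // Each small regex checks one rule; every() stops at the first rule that fails.
  return rules.every((rule) => rule.test(password));
}

console.log(checkPassword("abc12345")); // false (no uppercase letter)
console.log(checkPassword("Abc12345")); // true
console.log(checkPassword("Ab1"));      // false (too short)
```

Each rule stays short and readable, and adding or removing a rule means editing one array entry rather than one large pattern.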
When building a regular expression, also make sure that it:
- Matches the strings you expect
- Does not match strings you don't expect
- Is efficient:
  - Use specific character classes instead of wildcards
  - Use non-capturing groups
  - Factor out the deterministic characters
  - Extract the common part of branches
  - Reduce the number of branches
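Two of these tips in code, as a sketch with made-up inputs. A specific class like `[^"]*` stops at the closing quote. A greedy `.*` first runs to the end of the input and then backtracks to the last quote, which costs extra steps and here even changes the result. Factoring out the shared prefix of `this|that` keeps the same matches with simpler branches.

```javascript
// Wildcard vs. specific character class for a quoted string.
const text = 'say "hi" and "bye"';
const wildcard = text.match(/".*"/)[0];    // '"hi" and "bye"' (greedy .* backtracks to the last quote)
const specific = text.match(/"[^"]*"/)[0]; // '"hi"'

// Extracting the common part of branches: the same strings match.
const branches = /this|that/;
const factored = /th(?:is|at)/;
const words = ["this", "that", "then"];
const same = words.every((w) => branches.test(w) === factored.test(w)); // true

console.log(wildcard, specific, same);
```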
Usage scenarios
Form validation/extraction
To verify that an input value matches a pattern, use the match method. It returns null on failure and, on success (without the g flag), an array that includes the captured groups.
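A sketch of that validation/extraction pattern. `parseDateInput` is a hypothetical helper, and it only checks the YYYY-MM-DD shape, not whether the month or day is in range.

```javascript
// Hypothetical helper: validate and parse a YYYY-MM-DD value from a form field.
function parseDateInput(value) {
  // ^ and $ require the whole input to match, not just a substring.
  const match = value.match(/^(\d{4})-(\d{2})-(\d{2})$/);
  if (match === null) return null; // validation failed
  const [, year, month, day] = match;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseDateInput("2021-06-01"));  // { year: 2021, month: 6, day: 1 }
console.log(parseDateInput("2021-6-1"));    // null
console.log(parseDateInput("x2021-06-01")); // null
```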
Splitting
Extract values from a comma-separated string, accepting both Chinese (，) and English (,) commas.
const arr = value.split(/[,，]/);
More to come…
A site for testing regular expressions: regex101.com/