This is the second article in the Mind Mapping Front-end series, and it covers regular expressions. First, a word about my starting point. I draw the mind maps myself because adding my own understanding makes the material easier to remember. I have read many other people's mind maps; they are useful, but I found I could absorb only a small fraction of what they contain. So I suggest that readers of this series, time permitting, organize the knowledge points themselves after reading each article, for better understanding and retention.
Other articles in the same series:
- Learn about JavaScript objects, prototypes, and inheritance
Many front-end novices are intimidated when they first meet regular expressions. When I was learning, I basically skipped the regular expressions chapter. Apart from copying a few common regular expressions from the Internet for form validation, I had almost no idea how to write one.
However, when I actually had to write a regular expression for a specific business requirement, I found my knowledge stretched very thin. So I have put together a mind map of the things you need to know about regular expressions.
What is a regular expression
A regular expression (often abbreviated as regex, regexp, or RE in code) is a computer science concept. Regular expressions are typically used to search for and replace text that conforms to a pattern (rule).
In software development we all come into contact with regular expressions sooner or later. On the front end, they are used not only for form validation and text search and replacement, but also in areas such as syntax parsers (AST) and editors.
Regular expression notation
Literal notation
A regular expression literal is written as follows:
/^\d+$/g
A regular expression literal evaluates to a RegExp object, which is why methods can be called on it directly, as in /^\d+$/.test('123').
When a loop uses the lastIndex property as its termination condition, do not write the regular expression literal inside the loop. Each time the loop evaluates the literal, a new RegExp object is created, and its lastIndex starts at 0 again.
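As a minimal sketch of this point (the names `text`, `digits`, and `found` are illustrative and not from the original article), keeping the regular expression in a variable means a single RegExp object is reused, so its lastIndex can advance between calls:

```javascript
// Illustrative names; only exec() and lastIndex come from the article.
const text = 'a1b22c333';
const digits = /\d+/g; // created once, outside the loop

const found = [];
let m;
while ((m = digits.exec(text)) !== null) {
  // lastIndex now points just past the current match
  found.push({ value: m[0], next: digits.lastIndex });
}
console.log(found);

// Anti-pattern (left commented out on purpose): a literal in the loop
// condition creates a fresh RegExp with lastIndex 0 on every evaluation,
// so the loop would never terminate.
// while (/\d+/g.exec(text) !== null) { ... }
```

After the loop ends, exec() has returned null and lastIndex is back at 0, so the same object can be reused on another string.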
RegExp object notation
var pattern = new RegExp(/^\d+$/, 'g')
The first argument can be either a regular expression or a string. When you pass a string, do not wrap it in slashes. If the pattern contains the special character \, write it as \\ so that the string literal does not consume the backslash as an escape.
"\s" === "s" // true
The string “\\s” correctly represents \s.
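A small sketch of the escaping rule, assuming a pattern that should match digits only (the names `wrong` and `right` are illustrative):

```javascript
// "\d" inside an ordinary string literal collapses to "d",
// so the backslash must be doubled to reach the RegExp constructor.
const wrong = new RegExp('^\d+$');  // behaves like /^d+$/
const right = new RegExp('^\\d+$'); // behaves like /^\d+$/

console.log(wrong.source); // the backslash is already gone here
console.log(right.source);
console.log(wrong.test('123'), right.test('123'));
```

Printing the source property is a quick way to check what pattern the constructor actually received.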
The second argument specifies the flags. Accepted flags include i, g, and m.
Flags
i
If the i flag is set, the regular expression matches case-insensitively.
For example, /abc/i.test('ABC') and /ABC/i.test('abc') both return true.
g
If the g flag is set, the regular expression performs a global match: after finding one match, it keeps searching until no more text matches the pattern.
m
If the m flag is set, the regular expression performs multi-line matching: ^ matches at the beginning of each line as well as the beginning of the whole string, and $ matches at the end of each line as well as the end of the whole string.
For example:
/^\d+$/.test('123\n456') // false
/^\d+$/m.test('123\n456') // true
With the m flag you can still match across the entire string:
/^\d+\n\d+$/m.test('123\n45') // true
Position qualifiers
^
Matches the start of the string. For example, to require that a string start with a digit, you could write:
/^\d/
$
Matches the end of the string. For example, to require that a string end with a digit:
/\d$/
Range matching
Square brackets [] are used for range matching, that is, matching one character from a set or range. For example, [0-9] matches a single digit, and [a-z] matches any one of the 26 lowercase letters from a to z.
To match characters that are not in the set, put ^ right after the opening bracket: [^0-9] matches any non-digit and is equivalent to \D.
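A short sketch of range matching and its negated form (the variable names and sample strings are illustrative):

```javascript
const isDigit = /^[0-9]$/;  // exactly one digit
const isLower = /^[a-z]$/;  // exactly one lowercase letter
const nonDigit = /[^0-9]/;  // any character that is not a digit, same set as \D

console.log(isDigit.test('7'), isLower.test('q'), isLower.test('Q'));
console.log(nonDigit.test('2024'), nonDigit.test('20x4'));

// A negated class is handy for stripping unwanted characters.
const digitsOnly = 'a1b2'.replace(/[^0-9]/g, '');
console.log(digitsOnly);
```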
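Main metacharacters
.
Matches any single character except line terminators such as the newline \n. To match any character at all, use /[\s\S]/ instead; inside square brackets . is just a literal dot, so /[.\n]/ matches only a dot or a newline.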
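The following sketch checks how . treats a newline. Note that inside square brackets . is only a literal dot, so a class like [.\n] does not mean "any character"; [\s\S] does. The s (dotAll) flag on the last line is an ES2018 addition that the article does not cover, and all variable names are illustrative.

```javascript
const twoLines = 'first\nsecond';

const dotOnly = /^.*$/.test(twoLines);        // . cannot cross the \n
const dotInClass = /^[.\n]*$/.test(twoLines); // the class holds only "." and "\n"
const anyChar = /^[\s\S]*$/.test(twoLines);   // \s plus \S covers every character
const dotAll = /^.*$/s.test(twoLines);        // with the s flag, . also matches \n

console.log(dotOnly, dotInClass, anyChar, dotAll);
```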
\s
Matches any whitespace character, including space, tab \t, vertical tab \v, newline \n, carriage return \r, and form feed \f. For these characters, \s is equivalent to [ \t\v\n\r\f]. Note the space in the first position inside the square brackets.
Here is the difference between a newline and a carriage return:
- A newline \n: the cursor moves down one line without returning to the beginning of the line.
- A carriage return \r: the cursor returns to the beginning of the line without moving down a line.
\S
\S is the inverse of \s and matches any non-whitespace character. Combining the two in one character class matches any character at all:
/[\s\S]/
\d
\d matches a digit and is equivalent to [0-9].
\D
\D is the inverse of \d: it matches any non-digit and is equivalent to [^0-9].
\w
\w matches word characters, that is 0-9, a-z, A-Z, and the underscore _, and is equivalent to [a-zA-Z0-9_].
\W
\W is the inverse of \w: it matches non-word characters and is equivalent to [^a-zA-Z0-9_].
\n
\n is the newline character that comes up often in development. The \s described above includes \n, so any character matched by \n is also matched by \s.
\b
\b matches a word boundary, that is, the position at the beginning or end of a word.
At first I didn’t really understand what \b was for in regular expressions, until I tried an example myself:
'I love you'.match(/love/)
'Iloveyou'.match(/love/)
Both of these expressions match “love”.
But sometimes we don’t want the string 'Iloveyou' to match, because there are no spaces between the words.
That is where \b comes in. Look at the following example:
'I love you'.match(/\blove\b/)
'Iloveyou'.match(/\blove\b/) // null
The first expression still matches, while the second does not, as expected.
Some people might say: I could just match the spaces instead.
'I love you'.match(/ love /)
A space and \b behave differently in this scenario, which shows in the result of match.
If we match with spaces, the first item in the result array of match is “ love ”, spaces included. Often we don’t want spaces in the result, which is why \b is useful: it matches a position rather than a character.
\B
\B is the opposite of \b: it matches a position that is not a word boundary. With \B, the position must sit between two word characters or between two non-word characters, so the target cannot begin or end a word at that position.
Suppose \B comes first, for example:
/\Babc/.test('111 abc') // false
Suppose \B comes last, for example:
/abc\B/.test('abc 111') // false
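The two examples above both return false. As a complement, here is a small sketch with cases where \B does match (sample strings and variable names are illustrative):

```javascript
// \B matches where \b does not: between two word characters,
// or between two non-word characters.
const inside = /\Bove\B/.test('Iloveyou'); // "ove" sits in the middle of a word
const atStart = /\Babc/.test('111 abc');   // "abc" begins a word, so \B fails
const glued = /\Babc/.test('111abc');      // "abc" follows a word character

console.log(inside, atStart, glued);
```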
The escape character \
Many characters have special meanings in regular expressions, such as (, ), \, [, ], and +. To match them literally, escape them with \. A / inside a regular expression literal must be escaped as well:
/\//.test('/'); // true
Or |
Implementing OR logic is relatively simple: regular expressions provide | for it.
Note that | splits the entire expression on either side of it, not just the single characters next to it.
So:
/^ab|cd|ef$/.test('ab') // true
/^ab|cd|ef$/.test('cd') // true
/^ab|cd|ef$/.test('ace') // false
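Because | splits the whole expression, /^ab|cd|ef$/ reads as "starts with ab", OR "contains cd", OR "ends with ef". A sketch of the difference once the alternatives are wrapped in a group (the names `loose` and `strict` are illustrative):

```javascript
const loose = /^ab|cd|ef$/;    // "^ab" OR "cd" OR "ef$"
const strict = /^(ab|cd|ef)$/; // the whole string is "ab", "cd", or "ef"

const looseResults = ['xxcdxx', 'abzzz', 'zzzef'].map((s) => loose.test(s));
const strictResults = ['xxcdxx', 'abzzz', 'cd'].map((s) => strict.test(s));

console.log(looseResults, strictResults);
```

Grouping keeps the anchors ^ and $ outside the alternation, so they apply to every alternative.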
Note also that | tries its alternatives from left to right. Once the left alternative matches, the right one is ignored, even if the right one looks like a more “perfect” match.
/a|ab/.exec('ab') returns:
["a", index: 0, input: "ab", groups: undefined]
Quantifiers
?
Matches the preceding character or subexpression zero or one time
+
Matches the preceding character or subexpression one or more times
*
Matches the preceding character or subexpression zero or more times
{n,m}
Matches the preceding character or subexpression at least n times and at most m times
{n,}
Matches the preceding character or subexpression at least n times
{n}
Matches the preceding character or subexpression exactly n times
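A quick sketch that exercises each quantifier once (the patterns and sample strings are illustrative):

```javascript
const optional = /^colou?r$/;  // ?     : "u" zero or one time
const oneOrMore = /^\d+$/;     // +     : one or more digits
const zeroOrMore = /^ab*$/;    // *     : "a" followed by zero or more "b"
const range = /^\d{2,4}$/;     // {n,m} : two to four digits
const atLeast = /^\d{2,}$/;    // {n,}  : at least two digits
const exact = /^\d{4}$/;       // {n}   : exactly four digits

console.log(optional.test('color'), optional.test('colour'), optional.test('colouur'));
console.log(oneOrMore.test(''), zeroOrMore.test('a'), zeroOrMore.test('abbb'));
console.log(range.test('1'), range.test('123'), range.test('12345'));
console.log(atLeast.test('12345'), exact.test('2024'), exact.test('202'));
```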
Greedy matching
A greedy match consumes as many characters as possible while still letting the overall pattern match.
Greedy matching is the default: /\d?/ matches one digit whenever it can, and /\d+/ and /\d*/ match as many digits as possible.
For example:
'123456789'.match(/^(\d+)(\d{2,})$/)
In the result above, the first capture group is “1234567” and the second is “89”.
Why is that? \d+ is greedy and matches as much as possible; without \d{2,}, the first capture group would simply be “123456789”. But because \d{2,} is present, \d+ gives back just enough for \d{2,} to meet its minimum of 2 digits, and keeps the remaining 7 digits for itself.
Non-greedy matching
A non-greedy match consumes as few characters as possible. It is written by appending ? to a quantifier such as ?, +, or *, which leaves as much as possible for the rules that follow.
Take the example from the greedy section and change \d+ to its non-greedy form \d+?:
'123456789'.match(/^(\d+?)(\d{2,})$/)
Now the first capture group is “1” and the second is “23456789”.
Why is that? In non-greedy mode, \d+? matches as little as possible, here a single digit, and leaves the rest to \d{2,}.
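A common place where the difference shows up is matching paired delimiters. The sketch below uses a made-up HTML fragment; `html`, `greedy`, and `lazy` are illustrative names:

```javascript
const html = '<b>one</b><b>two</b>';

// Greedy: .* runs to the end, then backtracks to the LAST </b>.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Non-greedy: .*? stops at the FIRST </b> it can reach.
const lazy = html.match(/<b>.*?<\/b>/)[0];

console.log(greedy);
console.log(lazy);
```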
Grouping
Grouping is a very useful tool in regular expressions. Anything wrapped in parentheses () is a group. It is written like this:
/(\d*)([a-z]*)/
Capture group ()
Using capture groups, we can capture the key parts of a match.
For example:
var group = '123456789hahaha'.match(/(\d*)([a-z]*)/)
Group 1 matches any number of digits and group 2 matches any number of lowercase letters.
We can then read each group's result from the array returned by match: group[1] is “123456789” and group[2] is “hahaha”.
We can also read the first nine group matches from the static properties RegExp.$1 to RegExp.$9: RegExp.$1 is “123456789” and RegExp.$2 is “hahaha”. However, RegExp.$1 to RegExp.$9 are legacy properties; although many browsers implement them, avoid them in production code.
Groups can be used with the string replace method too; there we refer to a group inside the replacement string as $1, $2, and so on up to $n.
"123456789hahaha".replace(/(\d*)([a-z]*)/, "$1") // "123456789"
With $1, the source string is replaced by the text matched by group 1, which is “123456789”.
Non-capturing group (?:)
A non-capturing group is a group that does not generate a reference. It is also wrapped in parentheses (), but the parentheses begin with ?:, as in /(?:\d*)/.
Let’s revisit the previous example:
var group = '123456789hahaha'.match(/(?:\d*)([a-z]*)/)
Because a non-capturing group does not generate a reference, group[1] is “hahaha”; likewise, RegExp.$1 is “hahaha”.
This made me wonder: if a non-capturing group cannot be referenced, what is it for?
After thinking about it for a while, I see these advantages and uses:
- A non-capturing group costs less memory than a capturing group because no reference has to be generated.
- Grouping makes it convenient to apply a quantifier. Even when we don’t need the reference, it would be awkward to apply a quantifier to a sequence of characters without grouping them.
'1a2b3c...'.match(/(?:\d[a-z]){2,3}(\.+)/)
Backreference \num
A regular expression can refer back to an earlier group, using the form \1, \2 to reference the text matched by that group.
For example, suppose I want to match strings that follow this rule:
The string starts and ends with single or double quotation marks and contains lowercase letters or digits in between.
To make sure the opening and closing quotation marks are the same kind, both single or both double, the pattern can be:
var pattern = /^(["'])[a-z\d]*\1$/
pattern.test("'perfect123'") // true
pattern.test('"1perfect2"') // true
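To see what the backreference buys us, the sketch below compares the pattern above with a version that simply repeats the character class (the names `quoted` and `naive` and the test strings are illustrative):

```javascript
const quoted = /^(["'])[a-z\d]*\1$/;    // closing quote must equal the captured one
const naive = /^["'][a-z\d]*["']$/;     // any quote at either end

const mismatched = '"abc\'';            // opens with " and closes with '

console.log(quoted.test(mismatched)); // \1 demands a second "
console.log(naive.test(mismatched));  // accepts the mismatched pair
console.log(quoted.test("'abc'"));
```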
Zero-width assertions
To be honest, when I first read the concept and explanation of zero-width assertions, I really had no idea what they were talking about.
- Zero-width positive lookahead assertion (?=)
- Zero-width negative lookahead assertion (?!)
- Zero-width positive lookbehind assertion (?<=)
- Zero-width negative lookbehind assertion (?<!)
After taking the terms apart and adding my own understanding, it slowly started to make sense.
- Zero-width: the assertion is a required condition for the match, but what it checks is not included in the match result.
- Positive: the pattern in the assertion must match.
- Negative: the pattern in the assertion must not match.
- Lookahead: the condition is checked ahead of the current position, that is, to its right.
- Lookbehind: the condition is checked behind the current position, that is, to its left.
Zero-width positive lookahead assertion (?=)
The specified pattern must appear to the right of the target.
/123(?=a)/.test('123a') // true
The example above requires an a to the right of 123.
Zero-width negative lookahead assertion (?!)
The specified pattern must not appear to the right of the target.
/123(?!a)/.test('123a') // false
The example above requires that 123 not be followed by a; otherwise the result is false.
Zero-width positive lookbehind assertion (?<=)
The specified pattern must appear to the left of the target.
/(?<=a)123/.test('a123') // true
The example above requires an a to the left of 123.
Lookbehind assertions are only supported from ES2018 onward; see the TC39 proposals.
Zero-width negative lookbehind assertion (?<!)
The specified pattern must not appear to the left of the target.
/(?<!a)123/.test('a123') // false
The example above requires that 123 not be preceded by a; otherwise the result is false.
Note: lookbehind assertions are only supported from ES2018 onward.
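A slightly more practical sketch of the assertions, assuming an engine with ES2018 lookbehind support. The sample strings and the names `withCommas`, `dollars`, and `notDollars` are illustrative:

```javascript
// Lookahead: insert a comma at every position that is followed by
// a multiple of three digits running to the end of the string.
// \B keeps a comma from being inserted at the very start.
const withCommas = '1234567'.replace(/\B(?=(\d{3})+$)/g, ',');

const prices = '$42 and €17';

// Positive lookbehind: digits that come right after a dollar sign.
const dollars = prices.match(/(?<=\$)\d+/)[0];

// Negative lookbehind: a whole number that is NOT preceded by a dollar sign.
const notDollars = prices.match(/(?<!\$)\b\d+/)[0];

console.log(withCommas, dollars, notDollars);
```

In all three cases the asserted text ($, or the trailing digits) constrains the match but is not part of it, which is what "zero-width" means.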
RegExp
Any discussion of regular expressions has to cover the RegExp object. Let’s look at it in terms of prototype methods, static properties, and instance properties.
Prototype methods
RegExp.prototype.test
test() is the regular expression method we use most. It searches the given string for a match of the regular expression and returns a Boolean, true or false.
If the regular expression has the global flag g set, test() updates the lastIndex property of the RegExp object to the index just past the end of the match. When test() is called repeatedly, each call starts searching at lastIndex. If a call finds no match, lastIndex is reset to 0.
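This behavior is a frequent source of surprises when one global regular expression is reused for validation. A minimal sketch (`hasDigit` and `results` are illustrative names):

```javascript
const hasDigit = /\d/g;

const results = [
  hasDigit.test('a1'), // finds "1"; lastIndex becomes 2
  hasDigit.test('a1'), // search starts at index 2, nothing left; lastIndex resets to 0
  hasDigit.test('a1'), // starts from 0 again and finds "1"
];

console.log(results);
```

If only a yes/no answer is needed, leaving out the g flag avoids the issue altogether.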
RegExp.prototype.exec
Compared with test(), exec() returns richer match information. Its result is an array (or null when there is no match): element 0 is the matched string, and elements 1 to n are the texts captured by the parenthesized groups.
Because arrays are also objects, the result array carries two extra properties, index and input:
- index: the zero-based index in the original string at which the match starts
- input: the original string
As with test(), if the regular expression has the g flag set, lastIndex is updated each time exec() runs.
Static properties
Static properties do not belong to any instance and must be accessed through the class name, as mentioned in the previous article of the “Mind Mapping Front-end” series.
RegExp.$1 to $9
RegExp.$1 is the first group's match, and RegExp.$9 is the ninth.
See the section Grouping – Capture group above for details.
Instance properties
lastIndex
lastIndex is the index at which the next match will start, which is the index just past the end of the previous match. Note that lastIndex only takes effect when the g flag is set.
Before any match has been made, lastIndex is 0, meaning that matching starts at index 0 of the string.
lastIndex is updated as exec() and test() are executed:
var reg = /\d/g
reg.lastIndex // 0
reg.test('123456')
reg.lastIndex // 1
reg.exec('123456')
reg.lastIndex // 2
lastIndex can also be set manually, which gives you fine control over where matching starts.
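For instance, setting lastIndex by hand makes the next exec() skip the beginning of the string. A minimal sketch with illustrative names and sample text:

```javascript
const word = /\w+/g;
const sentence = 'skip this part';

word.lastIndex = 5;               // start searching at index 5
const hit = word.exec(sentence);  // first word found at or after index 5

console.log(hit[0], hit.index, word.lastIndex);
```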
flags
The flags property returns a string listing the flags enabled on the regular expression instance.
var reg = /\d/ig
reg.flags; // "gi"
global
global is a Boolean that indicates whether the regular expression uses the g flag.
ignoreCase
ignoreCase is a Boolean that indicates whether the regular expression uses the i flag.
multiline
multiline is a Boolean that indicates whether the regular expression uses the m flag.
source
source is the string form of the regular expression's pattern, without the slashes on either side of the literal and without any flags.
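One use of source and flags together is deriving a new regular expression from an existing one. The sketch below, with illustrative names, builds a global copy of a non-global pattern:

```javascript
const original = /ab+c/i;

// Reuse the pattern text and add the g flag to the existing flags.
const globalCopy = new RegExp(original.source, original.flags + 'g');

console.log(globalCopy.source, globalCopy.flags);
console.log(original.global, globalCopy.global, globalCopy.ignoreCase, globalCopy.multiline);
```

The flags string is reported in a fixed order, so the copy shows "gi" even though "ig" was passed in.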
String methods that use regular expressions
String.prototype.search
The search() method matches a string against a regular expression and returns the index of the first match in the string. If there is no match, it returns -1.
The argument to search() should be a regular expression; if it is not, it is implicitly converted to one with new RegExp().
"123abc".search(/[a-z]/); // 3
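The implicit conversion matters when the argument is a string containing metacharacters. In this sketch (names are illustrative; indexOf is a standard string method brought in only for comparison), searching for a dot shows the difference:

```javascript
const dotted = 'abc.def';

const asPattern = dotted.search('.');   // "." becomes /./ and matches the first character
const asLiteral = dotted.search('\\.'); // "\\." becomes /\./ and matches the real dot
const plain = dotted.indexOf('.');      // indexOf treats its argument as plain text
const none = dotted.search(/\d/);       // no digit in the string

console.log(asPattern, asLiteral, plain, none);
```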
String.prototype.match
The match method of a string searches the string and is similar to the exec method of a regular expression. Its argument should also be a regular expression. It returns an array, or null if there is no match.
Unlike exec(), when match is given a regular expression with the g flag, it returns every substring that matches the whole regular expression, but not the capture groups.
"123abc456".match(/([a-z])/g);
// returns ["a", "b", "c"]
var reg = /([a-z])/g;
reg.exec('123abc456');
// returns ["a", "a", index: 3, input: "123abc456", groups: undefined]
reg.exec('123abc456');
// returns ["b", "b", index: 4, input: "123abc456", groups: undefined]
reg.exec('123abc456');
// returns ["c", "c", index: 5, input: "123abc456", groups: undefined]
If match() is given a regular expression without the g flag, it behaves like exec(), returning only the first match together with the captured groups.
If the expression contains parenthesized groups, their results are also available in the array returned by match(), as described in the capture group section.
"123abc456".match(/([a-z])/);
// returns ["a", "a", index: 3, input: "123abc456", groups: undefined]
RegExp.$1; // "a"
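When both all matches and their capture groups are needed, one option is String.prototype.matchAll (ES2020). It is not covered in the original article, so treat this as a side note; the name `letters` is illustrative:

```javascript
// matchAll requires the g flag and yields one exec()-style result per match.
const letters = Array.from('123abc456'.matchAll(/([a-z])/g), (m) => ({
  match: m[0],    // the full match
  group: m[1],    // the first capture group
  index: m.index, // where the match starts
}));

console.log(letters);
```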
String.prototype.replace
replace() is the string replacement method. Its first argument does not have to be a regular expression. If it is a regular expression that contains groups, the second argument of replace() can refer to the group matches as “$1”, “$2”, and so on.
"123456789hahaha".replace(/(\d*)([a-z]*)/, "$1") // "123456789"
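Group references can also be reordered in the replacement string. A small sketch with an illustrative date string (the names `iso` and `dmy` are not from the original):

```javascript
const iso = '2024-03-15';

// $1 = year, $2 = month, $3 = day; emit them in day/month/year order.
const dmy = iso.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1');

console.log(dmy);
```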
String.prototype.split
The split() method splits a string and is used a lot, but many people don’t know that it accepts a regular expression as its argument.
Say we get an irregular string like “1,2, 3 ,4, 5” and need to split it into an array of clean digit strings. Using split(",") alone won’t do, but a regular expression as the separator will.
var str = "1,2, 3 ,4, 5";
str.split(/\s*,\s*/);
// return ["1", "2", "3", "4", "5"]
Final words
Regular expressions are an important but easily overlooked topic, and they come up often in interviews, so they deserve real attention. Having worked through the knowledge points above, I believe I can approach real-world tasks with confidence rather than in a panic.