Introduction to regular expressions
1. What is a regular expression
A regular expression (often abbreviated to regex, regexp, or RE in code) is a concept from computer science. Regular expressions are typically used to search for and replace text that conforms to a pattern (rule).
In simple terms, a regular expression matches strings according to a rule.
2. Visual regular expression tools
Regexper: regexper.com/
The RegExp object
There are two ways to create a RegExp object: a literal or the constructor.
1. Literals
let reg1 = /[a-z]{3}/gmi; // all three flags combined
let reg2 = /[a-z]{3}/g;
let reg3 = /[a-z]{3}/m;
let reg4 = /[a-z]{3}/i;
Flags
- g: global. Searches for all matches; without it, the search stops at the first match.
- m: multi-line. Makes ^ and $ match at the start and end of each line.
- i: ignore case. Matching is case-sensitive by default.
2. Constructor
let reg = new RegExp('\\bis\\b', 'g');
Because \ is the escape character in JavaScript strings, each backslash in the pattern must itself be escaped as \\.
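As a minimal sketch of the two forms side by side (the names `literal`, `constructed`, `word`, and `dynamic` are illustrative and not from the original text), the following suggests that both produce the same pattern, and that the constructor is useful when part of the pattern is only known at run time:

```javascript
// The same pattern written as a literal and through the constructor.
const literal = /\bis\b/g;
const constructed = new RegExp('\\bis\\b', 'g');

console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

// The constructor takes a string, so a pattern can be assembled at run time.
const word = 'is'; // hypothetical input
const dynamic = new RegExp('\\b' + word + '\\b', 'g');
const replaced = 'This is it'.replace(dynamic, 'IS');
console.log(replaced); // This IS it
```

Only the standalone word is replaced, because \b requires a word boundary on both sides.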
Metacharacters
Metacharacters can be thought of as escape characters.
Regular expressions consist of two basic character types.
- Literal characters
- Metacharacters
1. Literal characters
A literal character matches itself.
2. Metacharacters
Metacharacters are non-alphabetic characters that have a special meaning in regular expressions: * + ? $ ^ . | \ ( ) { } [ ]
character | meaning |
---|---|
\t | Horizontal tab |
\v | Vertical tab |
\n | Newline |
\r | Carriage return |
\0 | Null character |
\f | Form feed |
\cX | Control character corresponding to Ctrl+X |
These behave much like escape sequences in ordinary strings.
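To make the difference between a metacharacter and its escaped, literal form concrete, here is a small sketch (variable names are illustrative only):

```javascript
// An unescaped . is a metacharacter: it matches any character except a line break.
const anyChar = /a.c/.test('abc');         // true
// Escaped with \, the dot is matched literally.
const dotVsLetter = /a\.c/.test('abc');    // false
const dotVsDot = /a\.c/.test('a.c');       // true
// \t inside a pattern matches a tab character.
const hasTab = /\t/.test('col1\tcol2');    // true

console.log(anyChar, dotVsLetter, dotVsDot, hasTab); // true false true true
```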
4. Character classes
A character class matches any one character that has a given property.
A simple class is built with the metacharacters [ ]. A class stands for any character with some property rather than for one specific character.
Example
The expression [abc] groups the characters a, b, and c into a class; it matches any one character in that class.
// The replace() method replaces characters in a string with other characters, or replaces substrings that match a regular expression.
'a1b2c3d4e5'.replace(/[abc]/g, '0'); // 010203d4e5
Negated character classes
Suppose we want to replace every character that is not a, b, or c.
// Inside [], a leading ^ creates a negated class
'abcdefg'.replace(/[^abc]/g, '0'); // abc0000
5. Range classes
A range class matches any character within a range.
If we wanted to match the digits 0-9, we might write [0123456789]. If we wanted to match the 26 lowercase letters, we might write [abcdefghijklmnopqrstuvwxyz]. That is tedious, which is why range classes exist.
Example
// Replace all digits
'a1c2d3e4f5'.replace(/[0-9]/g, 'x'); // axcxdxexfx
// Replace all lowercase letters
'a1c2d3e4f5'.replace(/[a-z]/g, 'x'); // x1x2x3x4x5
// Ranges can be combined inside []. Replace all uppercase and lowercase letters
'a1C2d3E4f5G6'.replace(/[a-zA-Z]/g, '*'); // *1*2*3*4*5*6
Question
What if I want to replace the digits and the - sign as well?
// Replace all digits and hyphens
'2018-5-21'.replace(/[0-9-]/g, '*'); // *********
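The hyphen example works because of where the - sits inside the brackets. A short sketch of the placement rules, with illustrative variable names:

```javascript
// At the start or end of a class, - is a literal hyphen.
const atEnd = '2018-5-21'.replace(/[0-9-]/g, '*');   // *********
const atStart = '2018-5-21'.replace(/[-0-9]/g, '*'); // *********

// Escaping the hyphen makes it literal in any position.
const escaped = 'a-b_c'.replace(/[a\-c]/g, '*');     // **b_*

// Between two characters, - defines a range (here a, b, and c).
const range = 'a-b_c'.replace(/[a-c]/g, '*');        // *-*_*

console.log(atEnd, atStart, escaped, range);
```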
Predefined classes
Some classes are already defined and can be used directly.
character | Equivalent class | meaning |
---|---|---|
. | [^\r\n] | Any character except carriage return and line feed |
\d | [0-9] | Digit |
\D | [^0-9] | Non-digit |
\s | [ \t\n\x0B\f\r] | Whitespace character (space, tab, line breaks, and similar) |
\S | [^ \t\n\x0B\f\r] | Non-whitespace character |
\w | [a-zA-Z_0-9] | Word character (letter, digit, underscore) |
\W | [^a-zA-Z_0-9] | Non-word character |
Example
Replace a string made of ab + a digit + any character.
// Version 1
'ab0c'.replace(/ab[0-9][^\r\n]/g, 'TangJinJian'); // TangJinJian
// Version 2
'ab0c'.replace(/ab\d./g, 'TangJinJian'); // TangJinJian
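A few more uses of the predefined classes, as a sketch on a made-up sample string (`text` and the other names are hypothetical):

```javascript
const text = 'Order #42 shipped to Bob_7 on 2018-05-21';

// \D matches every non-digit, so removing them keeps only the digits.
const digits = text.replace(/\D/g, '');

// \w+ matches runs of word characters (letters, digits, underscore).
const words = text.match(/\w+/g);

// \s+ matches runs of whitespace, which can be collapsed to one space.
const collapsed = 'a \t b\n c'.replace(/\s+/g, ' ');

console.log(digits);    // 42720180521
console.log(words);     // ['Order', '42', 'shipped', 'to', 'Bob_7', 'on', '2018', '05', '21']
console.log(collapsed); // a b c
```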
Boundaries
character | meaning |
---|---|
^ | Starts with XXX (this meaning applies when not inside brackets) |
$ | Ends with XXX |
\b | Word boundary |
\B | Non-word boundary |
Example
I want to replace the string only where it appears at the beginning.
'YuYan is a boy, YuYan'.replace(/^YuYan/g, 'TangJinJian'); // TangJinJian is a boy, YuYan
I want to replace the string only where it appears at the end.
'YuYan is a boy, YuYan'.replace(/YuYan$/g, 'TangJinJian'); // YuYan is a boy, TangJinJian
Word boundary examples.
// Replace every "is" with 0
'This is a man'.replace(/is/g, '0'); // Th0 0 a man
// Replace every "is" that is preceded by a word boundary
'This is a man'.replace(/\bis/g, '0'); // This 0 a man
// Replace every "is" that is not preceded by a word boundary but is followed by one
'This is a man'.replace(/\Bis\b/g, '0'); // Th0 is a man
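The boundary characters interact with the m flag described earlier. The following sketch, on a made-up three-line string, shows how ^ behaves with and without m, and how \b restricts a match to whole words:

```javascript
const lines = 'one cat\ntwo cats\ncat three';

// Without m, ^ matches only at the very start of the string.
const firstOnly = lines.replace(/^\w+/g, '#');

// With m, ^ also matches at the start of every line.
const everyLine = lines.replace(/^\w+/gm, '#');

// \b on both sides matches "cat" only as a whole word, so "cats" is untouched.
const wholeWord = lines.replace(/\bcat\b/g, 'dog');

console.log(JSON.stringify(firstOnly)); // "# cat\ntwo cats\ncat three"
console.log(JSON.stringify(everyLine)); // "# cat\n# cats\n# three"
console.log(JSON.stringify(wholeWord)); // "one dog\ntwo cats\ndog three"
```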
8. Quantifiers
Quantifiers specify how many times the preceding item may repeat.
character | meaning |
---|---|
? | Zero or one occurrence (at most once) |
+ | One or more occurrences (at least once) |
* | Zero or more occurrences (any number of times) |
{n} | Exactly n occurrences |
{n,m} | From n to m occurrences |
{n,} | At least n occurrences |
I want to replace 10 consecutive digits in the string with *.
'1234567890abcd'.replace(/\d{10}/, '*'); // *abcd
I want to replace the QQ number in the string.
'My QQ is: 10000'.replace(/[1-9][0-9]{4}/, '19216811'); // My QQ is: 19216811
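As a rough illustration of each quantifier in the table, the sketch below tests a few sample strings against anchored patterns (the sample strings and the `results` object are made up for this example):

```javascript
const results = {
  // ? : the u may appear zero or one time
  optional: ['color', 'colour', 'colouur'].map(s => /^colou?r$/.test(s)),
  // + : at least one b
  oneOrMore: ['a', 'ab', 'abbb'].map(s => /^ab+$/.test(s)),
  // * : any number of b, including none
  zeroOrMore: ['a', 'ab', 'abbb'].map(s => /^ab*$/.test(s)),
  // {3,5} : between three and five digits
  range: ['12', '123', '12345', '123456'].map(s => /^\d{3,5}$/.test(s)),
};

console.log(results.optional);   // [true, true, false]
console.log(results.oneOrMore);  // [false, true, true]
console.log(results.zeroOrMore); // [true, true, true]
console.log(results.range);      // [false, true, true, false]
```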
Greedy mode
Match as much as possible.
Consider the regular expression /\d{3,6}/: should it match three digits, six digits, or four or five?
// Greedy mode (the default) matches as many characters as possible
'123456789'.replace(/\d{3,6}/, 'x'); // x789
'123456789'.replace(/\d+/, 'x'); // x
'123456789'.replace(/\d{3,}/, 'x'); // x
10. Non-greedy mode
Match as little as possible.
What if we want the smallest possible match?
// Adding ? after a quantifier makes it non-greedy: it matches as few characters as possible
'12345678'.replace(/\d{3,6}?/g, 'x'); // xx78
'123456789'.replace(/\d{3,6}?/g, 'x'); // xxx
Because of the g flag, every substring that satisfies the pattern is replaced. In the first case, /\d{3,6}?/g finds two matches in 12345678: 123 and 456, so the result is xx78. In the second case, it finds three matches in 123456789: 123, 456, and 789, so the result is xxx.
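The difference between the two modes is often easiest to see with delimiters. A minimal sketch on a made-up string (`html` and the result names are illustrative):

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the last </b> it can reach.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Non-greedy: .*? stops at the first </b>.
const lazy = html.match(/<b>.*?<\/b>/)[0];

// With g, the non-greedy pattern finds each element separately.
const allLazy = html.match(/<b>.*?<\/b>/g);

console.log(greedy);  // <b>one</b> and <b>two</b>
console.log(lazy);    // <b>one</b>
console.log(allLazy); // ['<b>one</b>', '<b>two</b>']
```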
11. Grouping
Parentheses group part of a pattern so that it is treated as a unit.
I want to replace a letter followed by a digit when that pair appears three times in a row.
// Without grouping, the quantifier applies only to the digit: a letter followed by three digits.
'a1b2d3c4'.replace(/[a-z]\d{3}/g, '*'); // a1b2d3c4
// With grouping, the quantifier after the group applies to the whole group: the pair is matched three times.
'a1b2d3c4'.replace(/([a-z]\d){3}/g, '*'); // *c4
1. Or
A group can hold alternatives separated by |; it matches if any one of them matches.
// I want to replace both ijabxy and ijcdxy with *
'ijabxyijcdxy'.replace(/ij(ab|cd)xy/g, '*'); // **
2. Backreference
Captured groups can be referenced like variables.
// I want to change the separator between year, month, and day
'2018-5-22'.replace(/(\d{4})-(\d{1,2})-(\d{1,2})/g, '$1/$2/$3'); // 2018/5/22
// I want to change the order of the date parts as well
'2018-5-22'.replace(/(\d{4})-(\d{1,2})-(\d{1,2})/g, '$2/$3/$1'); // 5/22/2018
3. Non-capturing groups
To group without capturing, add ?: right after the opening parenthesis.
// With the year group non-capturing, the month group becomes $1 and the day group becomes $2; $3 no longer refers to a group and is left as is
'2018-5-22'.replace(/(?:\d{4})-(\d{1,2})-(\d{1,2})/g, '$1/$2/$3'); // 5/22/$3
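Groups are also available outside of replace(). The sketch below reads the captured groups from a match result, and then uses \1, a backreference written inside the pattern itself (a related technique that the text above does not cover). All variable names are illustrative.

```javascript
const datePattern = /(\d{4})-(\d{1,2})-(\d{1,2})/;

// match() without g returns the full match followed by each captured group.
const [, year, month, day] = '2018-5-22'.match(datePattern);
console.log(year, month, day); // 2018 5 22

// Inside a pattern, \1 refers to the text captured by group 1,
// so this finds a word that is immediately repeated.
const deduped = 'this is is a test'.replace(/\b(\w+) \1\b/g, '$1');
console.log(deduped); // this is a test
```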
12. Lookahead
A regular expression is parsed from the start of the text toward the end, and the direction toward the end is called "ahead". A lookahead checks, while the expression is matching, whether the text ahead satisfies an assertion; a lookbehind checks in the opposite direction. Lookbehind was not supported by older JavaScript engines; it was added to the language in ES2018. An assertion that must match is called positive, and one that must not match is called negative.
name | pattern | meaning |
---|---|---|
Positive lookahead | exp(?=assert) | exp must be followed by assert |
Negative lookahead | exp(?!assert) | exp must not be followed by assert |
Positive lookbehind | (?<=assert)exp | exp must be preceded by assert (ES2018 and later) |
Negative lookbehind | (?<!assert)exp | exp must not be preceded by assert (ES2018 and later) |
Example
Replace every word character that is followed by a digit.
'a1b2ccdde3'.replace(/\w(?=\d)/g, '*'); // *1*2ccdd*3
Replace every word character that is not followed by a digit.
'a1b2ccdde3'.replace(/\w(?!\d)/g, '*'); // a*b*****e*
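Two common uses of lookahead, shown as a sketch rather than a recommended recipe: inserting thousands separators, and checking several conditions on the same string. The pattern names and the sample rule (at least six characters, one digit, one lowercase letter) are assumptions made for this example.

```javascript
// Insert a comma at every position that is followed by groups of three digits
// up to the end of the string. \B avoids inserting one at the very start.
const grouped = '1234567'.replace(/\B(?=(\d{3})+$)/g, ',');
console.log(grouped); // 1,234,567

// Each lookahead checks a condition from the start without consuming characters.
const hasDigitAndLetter = /^(?=.*\d)(?=.*[a-z]).{6,}$/;
const checks = ['abc123', 'abcdef', '123456'].map(s => hasDigitAndLetter.test(s));
console.log(checks); // [true, false, false]
```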
RegExp object properties
- global: whether the search is global. The default is false.
- ignoreCase: whether the match ignores case. The default is false.
- multiline: whether the search is multi-line. The default is false.
- lastIndex: the position right after the last character of the current match.
- source: the text of the regular expression.
let reg1 = /\w/;
let reg2 = /\w/gim;
reg1.global; //false
reg1.ignoreCase; //false
reg1.multiline; //false
reg2.global; //true
reg2.ignoreCase; //true
reg2.multiline; //true
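The remaining properties, source and lastIndex, can be inspected in the same way. A short sketch (the `flags` property used here is a standard RegExp property that the list above does not mention; variable names are illustrative):

```javascript
const re = /\w+/gi;

console.log(re.source);    // \w+
console.log(re.flags);     // gi
console.log(re.lastIndex); // 0

// With the g flag, a successful match moves lastIndex past the matched text.
re.test('ab cd');          // matches "ab"
const afterFirst = re.lastIndex;
console.log(afterFirst);   // 2

// lastIndex is writable, so it can be reset by hand.
re.lastIndex = 0;
```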
RegExp object methods
1. RegExp.prototype.test()
Checks whether the regular expression matches the specified string. Returns true or false.
let reg1 = /\w/;
reg1.test('a'); // true
reg1.test('*'); // false
With the g flag, the behavior is slightly different.
let reg1 = /\w/g;
// 1st call
reg1.test('ab'); // true
// 2nd call
reg1.test('ab'); // true
// 3rd call
reg1.test('ab'); // false
// 4th call
reg1.test('ab'); // true
// 5th call
reg1.test('ab'); // true
// 6th call
reg1.test('ab'); // false
The cause is lastIndex, which changes after each match. lastIndex is a readable and writable integer property of a regular expression that specifies the index at which the next match starts.
let reg = /\w/g;
// After each match, lastIndex points to the index of the character following the matched text.
while (reg.test('ab')) {
  console.log(reg.lastIndex);
}
// 1
// 2
reg.lastIndex is 0 initially and becomes 1 after the first match, on a. After the second match, on b, reg.lastIndex is 2.
let reg = /\w\w/g;
while (reg.test('ab12cd')) {
  console.log(reg.lastIndex);
}
// 2
// 4
// 6
reg.lastIndex is 0 initially and becomes 2 after the first match, on ab. After the second match, on 12, reg.lastIndex is 4. After the third match, on cd, reg.lastIndex is 6.
let reg = /\w/g;
// When no match is found, lastIndex is reset to 0.
while (reg.test('ab')) {
  console.log(reg.lastIndex);
}
console.log(reg.lastIndex);
reg.test('ab');
console.log(reg.lastIndex);
// 1
// 2
// 0
// 1
That is why reg.test('ab') returns false on every third call: the search starts at index 2, finds nothing, and lastIndex is reset to 0.
let reg = /\w/g;
reg.lastIndex = 2;
reg.test('ab'); // false
Each match starts at lastIndex. In the example above, the match starts at index 2, and nothing from that position onward matches the pattern, so the result is false.
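Given the behavior described above, there are two common ways to get a stable answer from test(). The sketch below shows both; `testFromStart` is a hypothetical helper written for this example, not a built-in.

```javascript
// A regex with the g flag remembers lastIndex between calls.
const stateful = /\w/g;
const statefulResults = ['ab', 'ab', 'ab'].map(s => stateful.test(s));
console.log(statefulResults); // [true, true, false]

// Option 1: omit the g flag, so lastIndex is never advanced.
const stateless = /\w/;
const statelessResults = ['ab', 'ab', 'ab'].map(s => stateless.test(s));
console.log(statelessResults); // [true, true, true]

// Option 2: keep the g flag but reset lastIndex before each call.
function testFromStart(re, str) {
  re.lastIndex = 0;
  return re.test(str);
}
const resetResults = ['ab', 'ab', 'ab'].map(s => testFromStart(stateful, s));
console.log(resetResults); // [true, true, true]
```

Option 1 is usually the simpler choice when the regex is only used for a yes/no check.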
2. RegExp.prototype.exec()
Performs a search for a match in the specified string. Returns a result array, or null.
Non-global case
let reg = /\d(\w)\d/;
let ts = '*1a2b3c';
let ret = reg.exec(ts); // ret is the result array
// reg.lastIndex stays 0 because there is no g flag; without the g flag, lastIndex is not used.
console.log(reg.lastIndex + '\t' + ret.index + '\t' + ret.toString());
console.log(ret);
// 0 1 1a2,a
// ["1a2", "a"]
The returned array consists of the following elements:
- The first element is the text that matches the regular expression.
- The second element is the text matched by the first subexpression of reg, if any.
- The third element is the text matched by the second subexpression of reg, if any, and so on.
// Subexpressions are the groups.
let reg = /\d(\w)(\w)(\w)\d/;
let ts = '*1a2b3c';
let ret = reg.exec(ts);
console.log(reg.lastIndex + '\t' + ret.index + '\t' + ret.toString());
console.log(ret); // Outputs the result array
// 0 1 1a2b3,a,2,b
// ["1a2b3", "a", "2", "b"]
Global case
let reg = /\d(\w)(\w)(\w)\d/g;
let ts = '*1abc25def3g';
let ret;
while ((ret = reg.exec(ts)) !== null) {
  console.log(reg.lastIndex + '\t' + ret.index + '\t' + ret.toString());
}
// 6 1 1abc2,a,b,c
// 11 6 5def3,d,e,f
The first match is 1abc2. The character after its last character is at index 6, so reg.lastIndex is 6. Its first character is at index 1, so ret.index is 1.
The second match is 5def3. The character after its last character is at index 11, so reg.lastIndex is 11. Its first character is at index 6, so ret.index is 6.
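The loop above can be wrapped in a small function that collects every match together with its groups. This is only a sketch: `collectMatches` and the shape of the objects it returns are made up for illustration.

```javascript
// Collect every match of a global regex, with its index and captured groups.
function collectMatches(re, str) {
  if (!re.global) {
    throw new Error('collectMatches expects a regex with the g flag');
  }
  re.lastIndex = 0; // start from the beginning regardless of earlier use
  const found = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    found.push({ text: m[0], index: m.index, groups: m.slice(1) });
    // A zero-length match does not advance lastIndex; step forward to avoid an endless loop.
    if (m[0] === '') re.lastIndex++;
  }
  return found;
}

const found = collectMatches(/\d(\w)(\w)(\w)\d/g, '*1abc25def3g');
console.log(found);
// [ { text: '1abc2', index: 1, groups: ['a', 'b', 'c'] },
//   { text: '5def3', index: 6, groups: ['d', 'e', 'f'] } ]
```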
String object methods
1. String.prototype.search()
Searches a string for a match against a regular expression. search() returns the index of the first match, or -1 if none is found. It does not perform a global match: the g flag is ignored, and the search always starts from the beginning of the string.
I want to know where the substring Jin starts.
'TangJinJian'.search('Jin'); // 4
'TangJinJian'.search(/Jin/); // 4
search() accepts either a string or a regular expression as the pattern to look for.
2. String.prototype.match()
The match() method retrieves the matches of a regular expression in a string. Whether the RegExp argument has the g flag makes a big difference to the result.
Non-global call
If the RegExp does not have the g flag, match() performs only one match in the string. If no matching text is found, it returns null. Otherwise, it returns an array containing information about the match it found.
let reg = /\d(\w)\d/;
let ts = '*1a2b3c';
let ret = ts.match(reg);
console.log(ret.index + '\t' + reg.lastIndex);
console.log(ret);
// 1 0
// ["1a2", "a"]
The non-global case behaves the same as RegExp.prototype.exec().
Global call
I want to find all substrings in digit + word character + digit format.
let reg = /\d(\w)\d/g;
let ts = '*1a2b3c4e';
let ret = ts.match(reg);
console.log(ret.index + '\t' + reg.lastIndex);
console.log(ret);
// undefined 0
// ["1a2", "3c4"]
Unlike RegExp.prototype.exec(), a global match() returns no grouping information. If the grouping information is not needed, String.prototype.match() is the simpler choice, and there is no need to write a loop to collect the matches one by one.
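When all matches and their groups are both needed, String.prototype.matchAll() (added in ES2020, so it assumes a reasonably recent engine) is another option besides an exec() loop. A minimal sketch with illustrative names:

```javascript
const re = /\d(\w)\d/g; // matchAll requires the g flag
const ts = '*1a2b3c4e';

// matchAll returns an iterator of exec-style results, one per match.
const all = Array.from(ts.matchAll(re), m => ({
  text: m[0],
  group: m[1],
  index: m.index,
}));

console.log(all);
// [ { text: '1a2', group: 'a', index: 1 },
//   { text: '3c4', group: 'c', index: 5 } ]
```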
3. String.prototype.split()
Splits a string into an array of strings using the specified delimiter, which can be a string or a regular expression.
'a,b,c,d'.split(/,/); // ["a", "b", "c", "d"]
'a1b2c3d'.split(/\d/); // ["a", "b", "c", "d"]
'a1b-c|d'.split(/[\d|-]/); // ["a", "b", "c", "d"]
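A few further split() behaviors that may be useful with regular expressions, sketched with made-up inputs: a capturing group keeps the delimiters in the result, a second argument limits the number of pieces, and a pattern can absorb whitespace around a delimiter.

```javascript
// A capturing group in the delimiter keeps the delimiters in the output.
const withSeparators = 'a1b2c'.split(/(\d)/);
console.log(withSeparators); // ['a', '1', 'b', '2', 'c']

// The optional second argument limits the number of pieces returned.
const limited = 'a,b,c,d'.split(/,/, 2);
console.log(limited); // ['a', 'b']

// \s* on both sides swallows whitespace around each comma.
const trimmed = 'a , b,c ,d'.split(/\s*,\s*/);
console.log(trimmed); // ['a', 'b', 'c', 'd']
```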
4. String.prototype.replace()
Returns a new string in which some or all matches of a pattern are replaced by a replacement value. The pattern can be a string or a regular expression, and the replacement value can be a string or a function called for each match.
Basic usage
'TangJinJian'.replace('Tang', ''); // JinJian
'TangJinJian'.replace(/Ji/g, '*'); // Tang*n*an
These two forms are the most common, but they do not allow fine-grained control.
Advanced usage
I want to add one to every digit in a1b2c3d4 to get a2b3c4d5.
'a1b2c3d4'.replace(/\d/g, function(match, index, origin) {
  console.log(index);
  return parseInt(match, 10) + 1;
});
// 1
// 3
// 5
// 7
// a2b3c4d5
The callback function takes the following arguments:
- match: the first parameter. The matched string.
- group: the second parameter. A captured group. If the pattern has n groups, there are n group parameters, and the two parameters after them become parameters 2+n and 3+n. If the pattern has no groups, this parameter is absent.
- index: the third parameter when there are no groups. The index of the first character of the matched string.
- origin: the fourth parameter when there are no groups. The source string.
I want to remove the letter between two digits.
'a1b2c3d4e5f6'.replace(/(\d)(\w)(\d)/g, function(match, group1, group2, group3, index, origin) {
  console.log(match);
  return group1 + group3;
});
// 1b2
// 3d4
// 5f6
// a12c34e56
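Two more callback sketches, under the assumption that the pattern has no named groups (a pattern with named groups passes one extra argument at the end). The first converts a hyphenated name to camelCase; the second uses rest parameters so that the same callback works for any number of groups. `describe` is a hypothetical helper for this example.

```javascript
// The first group (the letter after each hyphen) is upper-cased; the hyphen is dropped.
const camel = 'background-color-top'.replace(/-([a-z])/g, (match, letter) => letter.toUpperCase());
console.log(camel); // backgroundColorTop

// With rest parameters, the last two arguments are the source string and the index;
// whatever remains in front of them are the captured groups.
function describe(match, ...rest) {
  const source = rest.pop(); // the original string (unused here)
  const index = rest.pop();  // index of the match
  return '[' + rest.join('|') + '@' + index + ']';
}
const described = 'a1b2'.replace(/([a-z])(\d)/g, describe);
console.log(described); // [a|1@0][b|2@2]
```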