Introduction to regular expressions

1. What is a regular expression

A regular expression (often abbreviated to regex, regexp, or RE in code) is a concept from computer science: a pattern that describes a set of strings. Regular expressions are commonly used to search for and replace text that conforms to a pattern (rule).

Put simply, a regular expression matches strings according to a rule.

2. Visual regular expression tools

Regexper: regexper.com/

The RegExp object

Two ways to instantiate a RegExp

A RegExp object can be defined in two ways: with a literal or with the constructor.

1. Literals

// All three flags at once
let reg1 = /[a-z]{3}/gmi;
// One flag at a time
let reg2 = /[a-z]{3}/g;
let reg3 = /[a-z]{3}/m;
let reg4 = /[a-z]{3}/i;

Flags

  • g (global): search globally. Without it, the search stops at the first match.
  • m (multiline): search across multiple lines.
  • i (ignoreCase): ignore case. Matching is case-sensitive by default.

2. Constructor

let reg = new RegExp('\\bis\\b', 'g');

Because \ is the escape character in JavaScript strings, every backslash in the pattern must itself be escaped (written as \\).
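
As a small sketch of the two forms, the snippet below builds the same pattern both ways and then uses the constructor for a case a literal cannot cover: a pattern assembled at run time. The variable name word and the sample sentence are illustrative assumptions.

```javascript
// The literal and the constructor describe the same pattern.
const fromLiteral = /\bis\b/g;
const fromConstructor = new RegExp('\\bis\\b', 'g');

console.log(fromLiteral.source === fromConstructor.source); // true
console.log(fromLiteral.flags === fromConstructor.flags);   // true

// The constructor helps when part of the pattern is only known at run time.
const word = 'man'; // hypothetical input
const dynamic = new RegExp('\\b' + word + '\\b', 'g');
console.log('This is a man'.replace(dynamic, '0')); // This is a 0
```

If the run-time text may itself contain metacharacters, they would need escaping before being placed in the pattern.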

Metacharacters

Metacharacters work much like escape characters.

Regular expressions are made up of two basic kinds of characters:

  • Literal characters
  • Metacharacters

1. Literal characters

A literal character stands for itself.

2. Metacharacters

Metacharacters are non-alphabetic characters that have a special meaning in regular expressions: * + ? $ ^ . | \ ( ) { } [ ]

Character   Meaning
\t          Horizontal tab
\v          Vertical tab
\n          Newline
\r          Carriage return
\0          Null character
\f          Form feed
\cX         Control character corresponding to X (Ctrl + X)

These behave like the escape sequences used in strings.
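
To make the difference between a metacharacter and a literal concrete, here is a minimal sketch using the dot. The sample string is an assumption chosen for illustration.

```javascript
const sample = '1.5 and 125';

// An unescaped dot is a metacharacter: it matches any character except line breaks.
const anyChar = sample.replace(/1.5/g, 'x');
console.log(anyChar); // x and x

// An escaped dot is a literal: it matches only the "." character.
const literalDot = sample.replace(/1\.5/g, 'x');
console.log(literalDot); // x and 125

// \t inside a pattern matches a tab character.
const hasTab = /\t/.test('a\tb');
console.log(hasTab); // true
```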

4. Character classes

A character class matches any one character that has a given feature.

A simple class is built with the metacharacters [ ]. A class stands for anything that has a certain property: it is a general reference, not one specific character.

Example

The expression [abc] groups the characters a, b, and c into a class, and the expression matches any one character in that class.

// The replace() method replaces some characters in a string with other characters, or replaces the substrings that match a regular expression.
'a1b2c3d4e5'.replace(/[abc]/g, '0');  // 010203d4e5

Negated character classes

Suppose we want to replace every character that is not a, b, or c.

// The metacharacter ^ at the start of a class creates a negated class
'abcdefg'.replace(/[^abc]/g, '0');  // abc0000

5. Range classes

A range class matches any character within a range.

If we wanted to match the digits 0-9, we might write [0123456789]. If we wanted to match the 26 lowercase letters, we might write [abcdefghijklmnopqrstuvwxyz]. That is tedious, which is why range classes exist.

Example

// Replace all digits
'a1c2d3e4f5'.replace(/[0-9]/g, 'x');  // axcxdxexfx
// Replace all lowercase letters
'a1c2d3e4f5'.replace(/[a-z]/g, 'x');  // x1x2x3x4x5
// Ranges can be combined inside []. Replace all uppercase and lowercase letters
'a1C2d3E4f5G6'.replace(/[a-zA-Z]/g, '*');  // *1*2*3*4*5*6

Question

What if I want to replace the digits and the - sign as well?

// Replace all digits and hyphens. A hyphen at the end of the class is a literal character.
'2018-5-21'.replace(/[0-9-]/g, '*');  // *********
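
The position of the hyphen decides whether it forms a range or is matched literally. The following sketch compares three classes on the same date string; the variable names are illustrative.

```javascript
const date = '2018-5-21';

// At the end of the class the hyphen is a literal character.
const maskedAll = date.replace(/[0-9-]/g, '*');
console.log(maskedAll); // *********

// An escaped hyphen is literal wherever it appears in the class.
const maskedEscaped = date.replace(/[\-0-9]/g, '*');
console.log(maskedEscaped); // *********

// Between two characters the hyphen forms a range, so "-" itself is not matched.
const maskedDigitsOnly = date.replace(/[0-9]/g, '*');
console.log(maskedDigitsOnly); // ****-*-**
```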

6. Predefined classes

Some classes are already defined and can be used directly.

Character   Equivalent class     Meaning
.           [^\r\n]              Any character except carriage return and line feed
\d          [0-9]                Digit
\D          [^0-9]               Non-digit
\s          [ \t\n\x0B\f\r]      Whitespace character
\S          [^ \t\n\x0B\f\r]     Non-whitespace character
\w          [a-zA-Z_0-9]         Word character (letter, digit, or underscore)
\W          [^a-zA-Z_0-9]        Non-word character

Example

Replace a string of the form ab + digit + any character.

// Version 1
'ab0c'.replace(/ab[0-9][^\r\n]/g, 'TangJinJian');  // TangJinJian
// Version 2
'ab0c'.replace(/ab\d./g, 'TangJinJian');  // TangJinJian
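
A quick way to see what each predefined class covers is to apply it to one string. This is a minimal sketch; the sample string is an assumption.

```javascript
const sample = 'id_42 = 3.14';

// \d matches digits only.
const digitsMasked = sample.replace(/\d/g, '#');
console.log(digitsMasked); // id_## = #.##

// \W matches everything that is not a letter, digit, or underscore.
const nonWordMasked = sample.replace(/\W/g, '_');
console.log(nonWordMasked); // id_42___3_14

// \s matches whitespace, so this removes the spaces.
const noSpaces = sample.replace(/\s/g, '');
console.log(noSpaces); // id_42=3.14
```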

7. Boundaries

Character   Meaning
^           Starts with xxx (when not inside square brackets)
$           Ends with xxx
\b          Word boundary
\B          Non-word boundary

Example

I want to replace the string only where it appears at the beginning.

'YuYan is a boy, YuYan'.replace(/^YuYan/g, 'TangJinJian');  // TangJinJian is a boy, YuYan

I want to replace the string only where it appears at the end.

'YuYan is a boy, YuYan'.replace(/YuYan$/g, 'TangJinJian');  // YuYan is a boy, TangJinJian

Word boundary examples.

// Replace every is with 0
'This is a man'.replace(/is/g, '0');  // Th0 0 a man
// Replace every is that is preceded by a word boundary
'This is a man'.replace(/\bis/g, '0');  // This 0 a man
// Replace every is that is not preceded by a word boundary and is followed by one
'This is a man'.replace(/\Bis\b/g, '0');  // Th0 is a man
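
One common use of \b is counting whole words rather than substrings. The sketch below contrasts the two on a made-up sentence.

```javascript
const text = 'This is a man, and this island is his';

// Without boundaries, "is" is also found inside This, this, island, and his.
const looseCount = text.match(/is/g).length;
console.log(looseCount); // 6

// With \b on both sides, only the standalone word "is" is counted.
const wholeWordCount = text.match(/\bis\b/g).length;
console.log(wholeWordCount); // 2
```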

8. Quantifiers

Quantifiers handle consecutive repetitions.

Character   Meaning
?           Zero or one occurrence (at most once)
+           One or more occurrences (at least once)
*           Zero or more occurrences (any number of times)
{n}         Exactly n occurrences
{n,m}       Between n and m occurrences
{n,}        At least n occurrences
I want to replace 10 consecutive digits in the string with *.

'1234567890abcd'.replace(/\d{10}/, '*');  // *abcd

I want to replace the QQ number in the string.

'My QQ is: 10000'.replace(/[1-9][0-9]{4}/, '19216811');  // My QQ is: 19216811
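
The quantifiers in the table can be compared side by side by running each against the same small word list. This is only a sketch; the word list is made up, and ^ and $ are used so that each pattern must cover the whole word.

```javascript
const words = ['color', 'colour', 'colouur'];

// ? : the u may appear zero times or once.
const optionalU = words.filter(w => /^colou?r$/.test(w));
console.log(optionalU); // [ 'color', 'colour' ]

// + : the u must appear at least once.
const atLeastOneU = words.filter(w => /^colou+r$/.test(w));
console.log(atLeastOneU); // [ 'colour', 'colouur' ]

// * : the u may appear any number of times, including zero.
const anyNumberOfU = words.filter(w => /^colou*r$/.test(w));
console.log(anyNumberOfU); // [ 'color', 'colour', 'colouur' ]

// {2} : the u must appear exactly twice.
const exactlyTwoU = words.filter(w => /^colou{2}r$/.test(w));
console.log(exactlyTwoU); // [ 'colouur' ]
```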

9. Greedy mode

Match as much as possible.

Consider the regular expression /\d{3,6}/: should it match three digits, six digits, or four or five?

// Greedy mode matches as many characters as possible
'123456789'.replace(/\d{3,6}/, 'x');  // x789
'123456789'.replace(/\d+/, 'x');  // x
'123456789'.replace(/\d{3,}/, 'x');  // x

10. Non-greedy mode

Match as little as possible.

What if we want each match to be as short as possible?

// A ? after the quantifier switches it to non-greedy mode, which matches as few characters as possible
'12345678'.replace(/\d{3,6}?/g, 'x');  // xx78
'123456789'.replace(/\d{3,6}?/g, 'x');  // xxx

Because of the g flag, every substring that satisfies the rule is replaced. In the first case, /\d{3,6}?/g finds two matches in 12345678, namely 123 and 456, so the result is xx78. In the second case, /\d{3,6}?/g finds three matches in 123456789, namely 123, 456, and 789, so the result is xxx.
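
The difference between the two modes is easiest to see with delimiters. In this sketch (the HTML-like sample string is an assumption), the greedy pattern runs to the last closing tag, while the non-greedy one stops at the first.

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy: .* extends as far as possible, up to the last </b>.
const greedy = html.match(/<b>.*<\/b>/)[0];
console.log(greedy); // <b>one</b> and <b>two</b>

// Non-greedy: .*? stops as soon as the rest of the pattern can match.
const lazy = html.match(/<b>.*?<\/b>/)[0];
console.log(lazy); // <b>one</b>
```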

11. Grouping

Parentheses collect several rules into a group.

I want to replace a letter followed by a digit, repeated three times in a row.

// Without a group, the quantifier applies only to the digit, which must appear 3 times.
'a1b2d3c4'.replace(/[a-z]\d{3}/g, '*');  // a1b2d3c4
// With a group, the quantifier applies to the whole group, which must match 3 times.
'a1b2d3c4'.replace(/([a-z]\d){3}/g, '*');  // *c4

1. Or

A group can hold alternative rules, and the match succeeds if any one of them is satisfied.

// I want to replace both ijabxy and ijcdxy with *
'ijabxyijcdxy'.replace(/ij(ab|cd)xy/g, '*');  // **

2. Backreference

Captured groups can be referenced like variables.

// I want to change the separator between year, month, and day
'2018-5-22'.replace(/(\d{4})-(\d{1,2})-(\d{1,2})/g, '$1/$2/$3');  // 2018/5/22
// I want to change the separator and the order of the date parts
'2018-5-22'.replace(/(\d{4})-(\d{1,2})-(\d{1,2})/g, '$2/$3/$1');  // 5/22/2018

3. Non-capturing groups

To group without capturing, add ?: right after the opening parenthesis.

// When the year group is not captured, the month group becomes $1 and the day group becomes $2
'2018-5-22'.replace(/(?:\d{4})-(\d{1,2})-(\d{1,2})/g, '$1/$2/$3');  // 5/22/$3
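
The effect of ?: can also be observed directly in the match result. This sketch compares the captured groups of the two date patterns; the variable names are illustrative.

```javascript
const capturing = '2018-5-22'.match(/(\d{4})-(\d{1,2})-(\d{1,2})/);
const nonCapturing = '2018-5-22'.match(/(?:\d{4})-(\d{1,2})-(\d{1,2})/);

// Element 0 is the whole match; the captured groups follow it.
const capturedGroups = capturing.slice(1);
const remainingGroups = nonCapturing.slice(1);

console.log(capturedGroups);  // [ '2018', '5', '22' ]
console.log(remainingGroups); // [ '5', '22' ]
```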

12. Lookahead

A regular expression is parsed from the start of the text toward the end, and the direction toward the end of the text is called ahead. A lookahead checks, while the expression is matching, whether the text ahead satisfies an assertion; a lookbehind checks in the opposite direction. JavaScript did not support lookbehind before ES2018. An assertion that must hold is called positive, and an assertion that must not hold is called negative.

Name                  Syntax            Note
Positive lookahead    exp(?=assert)
Negative lookahead    exp(?!assert)
Positive lookbehind   (?<=assert)exp    Not supported in JavaScript before ES2018
Negative lookbehind   (?<!assert)exp    Not supported in JavaScript before ES2018

Example

Given a string, replace every word character that is followed by a digit.

'a1b2ccdde3'.replace(/\w(?=\d)/g, '*');  // *1*2ccdd*3

Given a string, replace every word character that is not followed by a digit.

'a1b2ccdde3'.replace(/\w(?!\d)/g, '*');  // a*b*****e*
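
A widely used application of lookahead is inserting thousands separators, because the assertion checks the digits ahead without consuming them. The helper name addThousandsSeparators is hypothetical, and the sketch assumes the input is a string of digits only.

```javascript
// Hypothetical helper: expects a string that contains only digits.
function addThousandsSeparators(digits) {
  // \B rules out the position before the first digit.
  // (?=(\d{3})+(?!\d)) requires that the digits ahead form complete
  // groups of three that run to the end of the number.
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(addThousandsSeparators('1234567')); // 1,234,567
console.log(addThousandsSeparators('1000'));    // 1,000
console.log(addThousandsSeparators('123'));     // 123
```

The pattern matches empty positions rather than characters, so nothing is removed from the input; commas are only inserted.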

RegExp object properties

  • global: whether to search the whole text. The default is false.
  • ignoreCase: whether to ignore case. The default is false.
  • multiline: whether to search across multiple lines. The default is false.
  • lastIndex: the position right after the last character of the current match.
  • source: the text of the regular expression.

let reg1 = /\w/;
let reg2 = /\w/gim;

reg1.global;  //false
reg1.ignoreCase;  //false
reg1.multiline;  //false

reg2.global;  //true
reg2.ignoreCase;  //true
reg2.multiline;  //true
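
The remaining two properties, source and lastIndex, can be inspected the same way. A minimal sketch, with an arbitrary pattern and sample string:

```javascript
const re = /\bis\b/gi;

console.log(re.source);    // \bis\b
console.log(re.global);    // true
console.log(re.lastIndex); // 0

// After a successful match with the g flag, lastIndex points just past the match.
re.test('This is');
const afterMatch = re.lastIndex;
console.log(afterMatch);   // 7

// lastIndex is writable, so it can be reset by hand.
re.lastIndex = 0;
```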

RegExp object methods

1. RegExp.prototype.test()

Checks whether the regular expression matches the given string. Returns true or false.

let reg1 = /\w/;
reg1.test('a');  // true
reg1.test('*');  // false

With the g flag, the behavior is slightly different.

let reg1 = /\w/g;
// 1st call
reg1.test('ab');  // true
// 2nd call
reg1.test('ab');  // true
// 3rd call
reg1.test('ab');  // false
// 4th call
reg1.test('ab');  // true
// 5th call
reg1.test('ab');  // true
// 6th call
reg1.test('ab');  // false

The cause is the lastIndex property of the RegExp object, which changes after every match. lastIndex is a readable and writable integer property of the regular expression that specifies the index at which the next match starts.

let reg = /\w/g;
// After each match, lastIndex points to the index of the character following the matched string.
while (reg.test('ab')) {
    console.log(reg.lastIndex);
}
// 1
// 2

reg.lastIndex is 0 initially and becomes 1 after the first match, a. After the second match, b, reg.lastIndex is 2.

let reg = /\w\w/g;
while (reg.test('ab12cd')) {
  console.log(reg.lastIndex);
}
// 2
// 4
// 6

reg.lastIndex is 0 initially and becomes 2 after the first match, ab. After the second match, 12, reg.lastIndex is 4. After the third match, cd, reg.lastIndex is 6.

let reg = /\w/g;
// When no match is found, lastIndex is reset to 0.
while (reg.test('ab')) {
    console.log(reg.lastIndex);
}
console.log(reg.lastIndex);
reg.test('ab');
console.log(reg.lastIndex);
// 1
// 2
// 0
// 1

That is why repeated calls to reg.test('ab') eventually return false.

let reg = /\w/g;
reg.lastIndex = 2;
reg.test('ab');  //false

Every match starts at lastIndex. In the example above, matching starts at position 2, and nothing from that position onward matches the regular expression, so the result is false.
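
This behavior becomes a real pitfall when one global regular expression is reused across different strings. The sketch below shows the problem and one possible workaround, resetting lastIndex before each call; the input list is made up.

```javascript
const hasWordChar = /\w/g;
const inputs = ['ab', 'cd', 'ef'];

// Pitfall: lastIndex carries over from one string to the next.
const naive = inputs.map(s => hasWordChar.test(s));
console.log(naive); // [ true, true, false ]

// Workaround: reset lastIndex before every call.
const fixed = inputs.map(s => {
  hasWordChar.lastIndex = 0;
  return hasWordChar.test(s);
});
console.log(fixed); // [ true, true, true ]
```

When only a yes or no answer is needed, leaving out the g flag avoids the issue altogether, since lastIndex is then ignored.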

2. RegExp.prototype.exec()

Performs a search for a match in the given string. Returns a result array, or null if there is no match.

Non-global case

let reg = /\d(\w)\d/;
let ts = '*1a2b3c';
let ret = reg.exec(ts);  // ret is the result array
// reg.lastIndex stays 0 because there is no g flag. Without the g flag, lastIndex is ignored.
console.log(reg.lastIndex + '\t' + ret.index + '\t' + ret.toString());
console.log(ret);
// 0 1 1a2,a
// ["1a2", "a"]

The returned array consists of the following elements:

  • The first element is the text that matches the regular expression.
  • The second element is the text matched by the first subexpression of reg, if any.
  • The third element is the text matched by the second subexpression of reg, if any, and so on.

// Subexpressions are groups.
let reg = /\d(\w)(\w)(\w)\d/;
let ts = '*1a2b3c';
let ret = reg.exec(ts);
console.log(reg.lastIndex + '\t' + ret.index + '\t' + ret.toString());
console.log(ret);  // Outputs the result array
// 0 1 1a2b3,a,2,b
// ["1a2b3", "a", "2", "b"]

Global case

let reg = /\d(\w)(\w)(\w)\d/g;
let ts = '*1abc25def3g';
let ret;
while (ret = reg.exec(ts)) {
    console.log(reg.lastIndex + '\t' + ret.index + '\t' + ret.toString());
}
// 6 1 1abc2,a,b,c
// 11 6 5def3,d,e,f

The first match is 1abc2. The character after its last character is at position 6, so reg.lastIndex is 6. Its first character is at position 1, so ret.index is 1.

The second match is 5def3. The character after its last character is at position 11, so reg.lastIndex is 11. Its first character is at position 6, so ret.index is 6.
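
The loop above can be wrapped into a small function that returns every match together with its groups. This is a sketch only: collectMatches is a hypothetical helper, and it assumes the caller passes a pattern with the g flag.

```javascript
// Hypothetical helper: gathers all matches of a global pattern with their groups.
function collectMatches(pattern, text) {
  if (!pattern.global) {
    // Without the g flag, exec() would return the same match forever.
    throw new Error('pattern must use the g flag');
  }
  pattern.lastIndex = 0; // start from the beginning of the text
  const results = [];
  let m;
  while ((m = pattern.exec(text)) !== null) {
    results.push({ index: m.index, match: m[0], groups: m.slice(1) });
    // An empty match does not advance lastIndex, so move it forward by hand.
    if (m[0] === '') pattern.lastIndex++;
  }
  return results;
}

const found = collectMatches(/\d(\w)(\w)(\w)\d/g, '*1abc25def3g');
console.log(found.map(r => r.match));  // [ '1abc2', '5def3' ]
console.log(found.map(r => r.index));  // [ 1, 6 ]
console.log(found[1].groups);          // [ 'd', 'e', 'f' ]
```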

String object methods

1. String.prototype.search()

Performs a search for a match between a regular expression and a string. The search() method returns the index of the first match, or -1 if nothing is found. It does not perform a global match: the g flag is ignored, and the search always starts from the beginning of the string.

I want to know where the string Jin starts.

'TangJinJian'.search('Jin');  // 4
'TangJinJian'.search(/Jin/);  // 4

search() accepts either a string or a regular expression; a string argument is converted to a regular expression.

2. String.prototype.match()

The match() method retrieves the matches found when a string is matched against a regular expression. Whether the RegExp argument has the g flag makes a big difference to the result.

Non-global calls

If the RegExp does not have the g flag, match() performs only one match in the string. If no matching text is found, it returns null. Otherwise, it returns an array with information about the matching text it found.

let reg = /\d(\w)\d/;
let ts = '*1a2b3c';
let ret = ts.match(reg);
console.log(ret.index + '\t' + reg.lastIndex);
console.log(ret);
// 1  0
// ["1a2", "a"]

In the non-global case the result is the same as that of RegExp.prototype.exec().

Global calls

I want to find all strings of the form digit + word character + digit.

let reg = /\d(\w)\d/g;
let ts = '*1a2b3c4e';
let ret = ts.match(reg);
console.log(ret.index + '\t' + reg.lastIndex);
console.log(ret);
// undefined 0
// ["1a2", "3c4"]

Compared with RegExp.prototype.exec(), a global match() returns no grouping information. If the groups are not needed, String.prototype.match() is the more efficient choice, and there is no need to write a loop to collect the matches one by one.
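
If both the full list of matches and their groups are wanted, one option in newer environments is String.prototype.matchAll(), which was added in ES2020 and is not covered by the text above. It requires the g flag and returns an iterator of exec-style results. A minimal sketch, assuming an environment that supports it:

```javascript
const ts = '*1a2b3c4e';

// Each item produced by matchAll has the same shape as an exec() result.
const all = Array.from(ts.matchAll(/\d(\w)\d/g), m => ({
  index: m.index,
  text: m[0],
  group: m[1],
}));

console.log(all.map(item => item.text));  // [ '1a2', '3c4' ]
console.log(all.map(item => item.group)); // [ 'a', 'c' ]
console.log(all.map(item => item.index)); // [ 1, 5 ]
```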

3. String.prototype.split()

Splits a string into an array of strings using the specified separator, which can be a string or a regular expression.

'a,b,c,d'.split(/,/);  // ["a", "b", "c", "d"]
'a1b2c3d'.split(/\d/);  // ["a", "b", "c", "d"]
'a1b-c|d'.split(/[\d|-]/);  // ["a", "b", "c", "d"]
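
A few further variations on split() with a regular expression may be useful. The sample strings below are assumptions for illustration.

```javascript
// A quantifier treats a run of mixed separators as a single separator.
const fields = 'a, b;c  d'.split(/[,;\s]+/);
console.log(fields); // [ 'a', 'b', 'c', 'd' ]

// If the separator pattern contains a group, the captured text is kept in the result.
const withSeparators = 'a1b2c'.split(/(\d)/);
console.log(withSeparators); // [ 'a', '1', 'b', '2', 'c' ]

// The optional second argument limits the number of items returned.
const firstTwo = 'a1b2c3d'.split(/\d/, 2);
console.log(firstTwo); // [ 'a', 'b' ]
```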

4. String.prototype.replace()

Returns a new string in which some or all matches of a pattern are replaced by a replacement value. The pattern can be a string or a regular expression, and the replacement value can be a string or a function that is called for each match.

Basic usage

'TangJinJian'.replace('Tang', '');  // JinJian
'TangJinJian'.replace(/Ji/g, '*');  // Tang*n*an

These two forms are the most common, but they offer no fine-grained control.

Fine-grained usage

I want to add one to every digit in a1b2c3d4 so that it becomes a2b3c4d5.

'a1b2c3d4'.replace(/\d/g, function(match, index, origin) {
    console.log(index);
    return parseInt(match, 10) + 1;
});
// 1
// 3
// 5
// 7
// a2b3c4d5

The callback function takes the following parameters:

  • match: the first parameter. The matched string.
  • group: the second parameter. The text matched by a group. If there are n groups, there are n group parameters, and the two parameters below move to positions 2+n and 3+n. If the pattern has no groups, this parameter is absent.
  • index: the third parameter. The index of the first character of the matched string.
  • origin: the fourth parameter. The source string.

I want to remove the letter between the two numbers.

'a1b2c3d4e5f6'.replace(/(\d)(\w)(\d)/g, function(match, group1, group2, group3, index, origin) {
  console.log(match);
  return group1 + group3;
});
// 1b2
// 3d4
// 5f6
// a12c34e56
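
As one more sketch of the callback form, the function below converts a hyphenated name to camel case by using the captured letter. The name toCamelCase and the sample inputs are hypothetical.

```javascript
// Hypothetical helper: turns names such as "background-color" into "backgroundColor".
function toCamelCase(name) {
  // The callback receives the whole match ("-c") and the first group ("c").
  return name.replace(/-([a-z])/g, function (match, letter) {
    return letter.toUpperCase();
  });
}

console.log(toCamelCase('background-color'));       // backgroundColor
console.log(toCamelCase('border-top-left-radius')); // borderTopLeftRadius
console.log(toCamelCase('color'));                  // color
```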