Every time I saw a regular expression in a project, it felt like reading an indecipherable script, and I was almost afraid of them. This time I made up my mind to understand regular expressions, put a lot of effort into learning them, and can finally say with confidence that I know regex. The goal of this article is to enable you to read regular expressions and to solve some problems in your own projects with hand-written regexes.

Symbols in regular expressions

Most people think of a regular expression as a string of symbols. These special symbols have special meanings; they are:

^ $ . * + ? = ! < : | \ / ( ) [ ] { }

This section starts with these symbols and gives you a quick guide to regular expressions. Some symbols have special meanings only in certain contexts of regular expressions, and are treated as ordinary characters at other times.

Backslash notation

character meaning
\ Placed before a non-special character, \ indicates that the next character is special. For example, /d/ matches the lowercase letter "d", but /\d/ matches a digit 0-9.

Placed before a special character, \ indicates that the next character is not special but stands for the character itself. This is called character escaping. For example, to match a decimal point in a string, write /\./. Because . is a special character, /./ matches any character.

Example 1: Match the decimal point in a string. Because the decimal point . is a special character, it needs to be escaped.

// Not escaped: . represents any character (more on that later).
const regExp = /./;
regExp.test('20'); // true
regExp.test('20.1'); // true

// Escaped: \. represents the decimal point itself
const regExp1 = /\./;
regExp1.test('20'); // false
regExp1.test('20.1'); // true

Character groups

character meaning
[...] Matches any one character inside the square brackets

For example, [123] matches any one of the digits 1, 2, 3

It can also express a range: [a-z] matches any one character between a and z
[^...] Negated set: matches any one character that is not inside the square brackets

For example, [^123] matches any one character other than 1, 2, 3

[^a-z] matches any one character other than those between a and z
\d Equivalent to [0-9], any digit
\D Equivalent to [^0-9], any character other than a digit
\w Equivalent to [0-9a-zA-Z_], any digit, letter, or underscore
\W Equivalent to [^0-9a-zA-Z_]
\s Roughly [ \t\v\n\r\f]. Whitespace, including spaces, horizontal tabs, vertical tabs, line feeds, carriage returns, and form feeds (JavaScript also includes other Unicode whitespace characters).
\S Roughly [^ \t\v\n\r\f]. Any non-whitespace character.
. Equivalent to [^\n\r\u2028\u2029]. Wildcard, representing almost any character. Line feed, carriage return, line separator, and paragraph separator are excluded.

Example 1: /[abc]/ tests whether the string contains any of the characters a, b, c

We can verify this by using the test method on the RegExp object:

const regExp = /[abc]/;
regExp.test('1a'); // true
regExp.test('bc'); // true
regExp.test('1'); // false

Example 2: /a[^bc]d/ tests whether the string contains a substring axd, where x is any character other than b and c

const regExp = /a[^bc]d/;
regExp.test('caed'); // true
regExp.test('abd'); // false
regExp.test('abcd'); // false
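The shorthand classes in the table above have no example of their own, so here is a minimal sketch of how they behave. The sample strings and variable names are made up for illustration.

```javascript
// Shorthand character classes checked against small, made-up sample strings.
const hasDigit = /\d/.test('room 42');            // "4" is a digit
const onlyWordChars = /^\w+$/.test('user_01');    // letters, digits, underscore only
const hasSpace = /\s/.test('no-spaces');          // the string contains no whitespace
const dotMatchesNewline = /a.b/.test('a\nb');     // . does not match a line feed
const digitsOnly = 'a1b2c3'.replace(/[^0-9]/g, ''); // strip every non-digit

console.log(hasDigit);          // true
console.log(onlyWordChars);     // true
console.log(hasSpace);          // false
console.log(dotMatchesNewline); // false
console.log(digitsOnly);        // "123"
```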

Quantifiers

character meaning
x{m,n} x occurs m to n times
x{m,} x occurs at least m times
x{m} x occurs exactly m times, equivalent to {m,m}
x? x occurs 0 or 1 times, equivalent to {0,1}
x+ x occurs 1 or more times, equivalent to {1,}
x* x occurs 0 or more times, equivalent to {0,}

In x{m,n}, x is the item to be matched, which can be a single character or a group treated as a whole. m is 0 or a positive integer, n is a positive integer, and n >= m.

Example 1: /a{2,4}bc/, where a{2,4} means that a occurs 2 to 4 times in a row

const regExp = /a{2,4}bc/;
regExp.test('aaabc'); // true
regExp.test('aabc'); // true
regExp.test('abc'); // false

Example 2: /a{2,}b{3}c?d+e*/

const regExp = /a{2,}b{3}c?d+e*/;
regExp.test('aaabbbcdee'); // true
regExp.test('aabbbdd'); // true
regExp.test('aaabc'); // false
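The quantifier examples above repeat a single character. As a small sketch of the "whole" case, a quantifier placed after parentheses (covered in detail later in this article) repeats the entire group. The patterns below are illustrative only.

```javascript
// (ab){2,3} repeats the whole group "ab" two or three times,
// while ab{2,3} repeats only the character "b".
const groupRepeat = /^(ab){2,3}$/;
const charRepeat = /^ab{2,3}$/;

console.log(groupRepeat.test('abab')); // true
console.log(groupRepeat.test('abb'));  // false
console.log(charRepeat.test('abb'));   // true
console.log(charRepeat.test('abab'));  // false
```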

Non-greedy match

character meaning
x{m,n}? x occurs m to n times, as few times as possible
x{m,}? x occurs at least m times, as few times as possible
x?? x occurs 0 or 1 times, preferring 0; the lazy form of {0,1}
x+? x occurs 1 or more times, as few as possible; the lazy form of {1,}
x*? x occurs 0 or more times, as few as possible; the lazy form of {0,}

All the quantifiers described above match as many times as possible, which is called greedy matching. There is also non-greedy matching (also known as lazy matching), which matches as few times as possible.

Syntax: a quantifier followed by "?".

Example 1: Take "aaa" as the string to match. The regular expression /a+/ matches "aaa", while /a+?/ matches only "a".

'aaa'.match(/a+/)
// ["aaa", index: 0, input: "aaa", groups: undefined]

'aaa'.match(/a+?/)
// ["a", index: 0, input: "aaa", groups: undefined]
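A common place where the greedy/lazy difference matters is matching delimited text. The sketch below uses a made-up HTML-like string; it is only an illustration, not a recommended way to parse HTML.

```javascript
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ runs on to the last ">" in the string.
const greedy = html.match(/<.+>/)[0];
// Lazy: .+? stops at the first ">" that lets the match succeed.
const lazy = html.match(/<.+?>/)[0];

console.log(greedy); // "<b>bold</b> and <i>italic</i>"
console.log(lazy);   // "<b>"
```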

Multiple branches

character meaning
| Alternation between branches. For example, /x|y/ matches the character x or y

Syntax: p1|p2|p3 matches if any one of p1, p2, p3 matches. Branches are tried from left to right, and once a branch on the left matches, the branches after it are ignored.

Example 1: /scss|sass/ matches either scss or sass
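The left-to-right rule has a practical consequence: when one branch is a prefix of another, the order of the branches changes the result. A minimal sketch with made-up strings:

```javascript
// "good" is tried first and succeeds, so "goodbye" is never attempted.
const firstWins = 'goodbye'.match(/good|goodbye/)[0];
// Putting the longer branch first lets it match in full.
const longerFirst = 'goodbye'.match(/goodbye|good/)[0];
// Alternation inside parentheses, anchored to a file extension.
const isStyleFile = /\.(scss|sass)$/.test('main.scss');

console.log(firstWins);   // "good"
console.log(longerFirst); // "goodbye"
console.log(isStyleFile); // true
```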

Positions

Some regular expression elements match positions between characters rather than actual characters, which is often used when inserting characters into strings. There are several positions in a regular expression:

character meaning
^ The beginning of the string. In multi-line mode it also matches the beginning of each line

For example, /^A/ matches an "A" at the beginning of a string: it does not match the "A" in "an A", but it does match the first "A" in "An A"
$ The end of the string. In multi-line mode it also matches the end of each line

For example, /t$/ matches a "t" at the end of a string: it does not match the "t" in "eater", but it does match the "t" in "eat"
\b A word boundary position
\B A non-word-boundary position
x(?=p) Matches x only if x is followed by p; (?=p) itself matches the position before p

For example, with /Jack(?=Sprat)/, "Jack" is matched only if it is followed by "Sprat"
x(?!p) Matches x only if x is not followed by p; (?!p) itself matches a position that is not before p

(?<=p)x Matches x only if x is preceded by p; (?<=p) itself matches the position after p

For example, with /(?<=Jack)Sprat/, "Sprat" is matched only if it follows "Jack"
(?<!p)x Matches x only if x is not preceded by p; (?<!p) itself matches a position that is not after p

Example 1: We use simple examples to understand these positions.

// Insert "#" at the beginning and end of the string (^ and $)
'javascript css html'.replace(/^|$/g, '#'); // "#javascript css html#"

// Insert "#" at word boundaries (\b)
'javascript css html'.replace(/\b/g, '#'); // "#javascript# #css# #html#"
// Insert "#" at non-word boundaries (\B)
'javascript css html'.replace(/\B/g, '#'); // "j#a#v#a#s#c#r#i#p#t c#s#s h#t#m#l"

// Insert "#" before each space
'javascript css html'.replace(/(?=\s)/g, '#'); // "javascript# css# html"
// Insert "#" at every position that is not before a space
'javascript css html'.replace(/(?!\s)/g, '#'); // "#j#a#v#a#s#c#r#i#p#t #c#s#s #h#t#m#l#"

// Insert "#" after each space
'javascript css html'.replace(/(?<=\s)/g, '#'); // "javascript #css #html"
// Insert "#" at every position that is not after a space
'javascript css html'.replace(/(?<!\s)/g, '#'); // "#j#a#v#a#s#c#r#i#p#t# c#s#s# h#t#m#l#"

Example 2: With /Jack(?=Sprat|Frost)/, "Jack" is matched only when it is followed by "Sprat" or "Frost", and the match result does not include "Sprat" or "Frost".

'JackFrost'.match(/Jack(?=Sprat|Frost)/g); // ["Jack"]

Example 3: With /\d+(?!\.)/, a number is matched only if it is not followed by a decimal point. For "3.141" the result is "141" rather than "3".

/\d+(?!\.)/.exec('3.141');
// ["141", index: 2, input: "3.141", groups: undefined]
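The examples above use lookahead; the lookbehind forms from the table work the same way in the other direction. The sketch below uses a made-up price string and assumes an engine with lookbehind support (it was added to the language in ES2018, so very old browsers lack it).

```javascript
const text = 'apple $12, pear 7, plum $30';

// (?<=\$)\d+ : digits directly preceded by "$"; the "$" is not part of the match.
const prices = text.match(/(?<=\$)\d+/g);
// (?<!\$)\b\d+ : whole numbers that are not preceded by "$".
const counts = text.match(/(?<!\$)\b\d+/g);

console.log(prices); // ["12", "30"]
console.log(counts); // ["7"]
```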

Example 4: Based on what we learned earlier, we can add thousands separators to a number, for example turning 123456789 into 123,456,789. This example is a bit more complicated and can be done in 3 steps:

  1. Produce the last comma: find the position before the last three digits with /(?=(\d{3})$)/g, then insert a comma with the replace() method
'123456789'.replace(/(?=(\d{3})$)/g, ','); // "123456,789"
  2. Produce all the commas: use the + quantifier we learned above
'123456789'.replace(/(?=(\d{3})+$)/g, ','); // ",123,456,789"
  3. Handle the special case. As shown above, when the number of digits is a multiple of 3, the result starts with a comma. We do not want a comma at the very beginning, so we exclude that position with (?!^). The complete form is:
'123456789'.replace(/(?!^)(?=(\d{3})+$)/g, ','); // "123,456,789"
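The three steps above can be wrapped in a small helper. This is only a sketch: formatThousands is a hypothetical name, and it assumes a non-negative integer as input (a minus sign or a decimal part would need extra handling).

```javascript
// Hypothetical helper built from the final regex of the three steps.
// Assumption: value is a non-negative integer (number or digit string).
function formatThousands(value) {
  // (?!^)         : never at the very start of the string
  // (?=(\d{3})+$) : a position followed by a multiple of three digits up to the end
  return String(value).replace(/(?!^)(?=(\d{3})+$)/g, ',');
}

console.log(formatThousands(123456789)); // "123,456,789"
console.log(formatThousands(1234));      // "1,234"
console.log(formatThousands(123));       // "123"
```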

The function of parentheses

character meaning
(x) Capturing group: matches x and remembers the match. The regular expression inside the parentheses defines a subexpression, which can be used by a backreference or to retrieve what the subexpression matched, either through the elements of the result array ([1], ..., [n]) or through the predefined RegExp object properties ($1, ..., $9).

Capturing groups carry a performance cost, so if you do not need to store the match and your parentheses are only for grouping, you can use the non-capturing group below.
(?:x) Non-capturing group: matches x but does not remember the match; it only provides the basic grouping function of parentheses
\n Backreference: n is a positive integer referring to an earlier group; \1 refers to the subexpression that starts at the first opening parenthesis in the regex
(?<Name>x) Named capturing group: matches x and stores it in the groups property of the returned match

Parentheses in regular expressions have several functions, which are described in more detail below.

1. Grouping

Parentheses in a regex group individual items, so that the contents of the parentheses are treated as a whole.

Example 1: In /(ab|cd)+|ef/, + applies to (ab|cd) as a whole; (ab|cd)+ means that the string "ab" or "cd" can be repeated one or more times.

const regExp = /(ab|cd)+|ef/;
regExp.test('abab'); // true
regExp.test('cdef'); // true
regExp.test('acf'); // false
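When parentheses are needed only for grouping, the non-capturing form (?:x) from the table keeps the result array smaller. A minimal sketch with a made-up year-month string:

```javascript
// Both patterns match the same text; only the number of stored groups differs.
const capturing = '2021-01'.match(/(\d{4})-(\d{2})/);
const nonCapturing = '2021-01'.match(/(?:\d{4})-(\d{2})/);

console.log(capturing.length);    // 3: full match plus two captured groups
console.log(nonCapturing.length); // 2: full match plus one captured group
console.log(nonCapturing[1]);     // "01"
```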

2. Define a subexpression

When a regular expression matches a string, it is possible to extract a string of characters from the string that matches the parenthesized subexpression.

Example 1: We can extract the year, month and day of the date and wrap the subexpressions that match the year, month and day in parentheses

const regExp = /(\d{4})-(\d{1,2})-(\d{1,2})/;
"2021-1-20".match(regExp);
// ["2021-1-20", "2021", "1", "20", index: 0, input: "2021-1-20", groups: undefined]

In the example above, match returns an array: the first element is the full matched string, followed by what each subexpression matched. index is the position of the match in the original string, and input is the original string itself. The exact use of match is explained later.

Example 2: We can also get what the subexpressions matched from the static properties RegExp.$1 to RegExp.$9

const regExp = /(\d{4})-(\d{1,2})-(\d{1,2})/;
regExp.test("2021-1-20");

console.log(RegExp.$1); // 2021
console.log(RegExp.$2); // 1
console.log(RegExp.$3); // 20

RegExp.$1-$9: This feature is non-standard, so try not to use it in production!

Example 3: Combining the above example, we can use the string replace method to convert the date format, such as “2021-1-20” to “2021/1/20”

const regExp = /(\d{4})-(\d{1,2})-(\d{1,2})/;
"2021-1-20".replace(regExp, "$1/$2/$3"); // "2021/1/20"

3. Backreferences

Parentheses also allow a later part of the same regular expression to refer back to an earlier subexpression.

Example 1: We know that the date can be “2021-1-20”, “2021/1/20” or “2021.1.20”. Now we need to implement a regular expression that matches all three forms of date. We can see that the year, month and day delimiters must be consistent. The regular expression we implemented should not match the form “2021-1/20”

const regExp = /\d{4}(-|\/|\.)\d{1,2}\1\d{1,2}/;
regExp.test("2021-1-20"); // true
regExp.test("2021/1/20"); // true
regExp.test("2021.1.20"); // true
regExp.test("2021-1/20"); // false

In the regular expression above, \1 refers to (-|\/|\.), and \1 must match exactly the same text that (-|\/|\.) matched.

\1 refers to the content of the first pair of parentheses, \2 to the content of the second pair, and so on. The same holds when parentheses are nested, except that groups are numbered by the position of their left parenthesis.

Example 2: To better understand the use of parentheses, let’s use another example where parentheses are nested

const regExp = /^(((\d)\d)(\d))(\d)\1\2\3\4\5$/;
regExp.test('123412312134'); // true
console.log(RegExp.$1); // 123
console.log(RegExp.$2); // 12
console.log(RegExp.$3); // 1
console.log(RegExp.$4); // 3
console.log(RegExp.$5); // 4

$n returns the content of group number n
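Two more small sketches of backreferences, using made-up patterns and strings: requiring a closing quote to be the same character as the opening one, and finding a word that is immediately repeated.

```javascript
// (['"]) captures the opening quote; \1 requires the same character at the end.
const quoted = /^(['"]).*\1$/;
console.log(quoted.test('"hello"'));  // true
console.log(quoted.test("'hello'"));  // true
console.log(quoted.test('"hello\'')); // false: opens with " but closes with '

// \b(\w+)\s+\1\b finds a word followed by whitespace and the same word again.
const doubled = 'this is is a test'.match(/\b(\w+)\s+\1\b/);
console.log(doubled[0]); // "is is"
console.log(doubled[1]); // "is"
```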

Modifiers

Regular expressions accept optional flags (modifiers) that enable global search, case-insensitive search, and so on; six of them are covered here. They can be used alone or combined in any order, and they are part of the regular expression instance.

character meaning
i Case-insensitive matching
g Global matching: finds all matches; without the g modifier, matching stops at the first match
m Multi-line mode: ^ and $ match the beginning and end of each line as well as of the whole string
s Allows . to match newline characters
u Unicode mode: the pattern is treated as a sequence of Unicode code points
y Sticky matching: matches only starting at the current position (lastIndex) in the target string

Example 1: Case-insensitive string matching

const regExp = /html/i;
regExp.test("html"); // true
regExp.test("HTML"); // true
regExp.test("Html"); // true

Example 2: Replace matched text in a string with 'text'

// Without the g modifier
const regExp = /html|css/;
"javascript html css".replace(regExp, 'text'); // "javascript text css"

// With the g modifier
const regExpG = /html|css/g;
"javascript html css".replace(regExpG, 'text'); // "javascript text text"

Example 3: In multi-line mode, insert the character ‘#’ at the beginning and end of each line (\n in the string is a newline character)

// Without the m modifier
const regExp = /^|$/g;
"javascript\nhtml\ncss".replace(regExp, '#'); // "#javascript\nhtml\ncss#"

// With the m modifier
const regExpM = /^|$/mg;
"javascript\nhtml\ncss".replace(regExpM, '#'); // "#javascript#\n#html#\n#css#"
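The s, u, and y modifiers from the table have no example above, so here is a minimal sketch of each. The sample strings are made up; '\u{1F600}' is an emoji written as an escape so that the snippet does not depend on file encoding.

```javascript
// s: lets . match newline characters as well.
const withoutS = /a.b/.test('a\nb');
const withS = /a.b/s.test('a\nb');

// u: . matches a whole code point instead of a single UTF-16 code unit.
const withoutU = /^.$/.test('\u{1F600}');
const withU = /^.$/u.test('\u{1F600}');

// y: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyHit = sticky.test('barfoo');  // "foo" starts exactly at index 3
sticky.lastIndex = 1;
const stickyMiss = sticky.test('barfoo'); // index 1 starts with "arf", not "foo"

console.log(withoutS, withS);       // false true
console.log(withoutU, withU);       // false true
console.log(stickyHit, stickyMiss); // true false
```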

String methods that handle regex

Regular expressions are a powerful, convenient, and efficient tool for text processing. With them you can modify text, delete parts of it, and so on, but in practice this is done through a handful of JavaScript methods. The most commonly used ones are described in detail below.

There are four common methods on strings named search, replace, match, and split, and regular expressions are usually passed in as arguments.

1. search()

Performs a search match between a regular expression and a String.

Syntax: str.search(regexp)

Parameters: regexp is a regular expression object. If a non-RegExp value is passed in, it is implicitly converted with new RegExp(regexp)

Return value: the index of the first match in the string if the match succeeds; otherwise -1

Example 1:

const regExp = /script/i;
"JavaScript".search(regExp); // 4
"I Love Java".search(regExp); // -1

2. replace()

Replaces the text in the string.

Syntax: str.replace(regexp|substr, newSubStr|function)

Parameters:

  • regexp is a regular expression object or literal; the text it matches is replaced by the second argument (or by its return value if it is a function);
  • substr is a string. It is searched for literally, and only the first occurrence is replaced;
  • newSubStr is the string used to replace the matched text. The following special sequences have special meaning inside it:
character meaning
$$ Inserts a "$"
$& Inserts the matched substring
$` Inserts the content to the left of the matched substring
$' Inserts the content to the right of the matched substring
$n If the first argument is a RegExp object and n is a positive integer less than 100, inserts the string matched by the nth capturing group. If group n does not exist, the sequence is inserted literally; for example, if there is no third group, "$3" is inserted as-is.
  • function is a function called for each match; its return value is used as the replacement text. Its first argument is the string that matched the pattern, followed by zero or more strings matched by the subexpressions, then index, the position of the match, and input, the original string itself.

Return value: a new string in which some or all matches of the pattern have been replaced by the replacement

Example 1: Replace every occurrence of javascript (case-insensitive) in the text with JavaScript

const text = "I love javascript, and I am learning javaScript.";
const regExp = /javascript/ig;
text.replace(regExp, "JavaScript"); 
// I love JavaScript, and I am learning JavaScript.

Example 2: The earlier date example is now easier to understand. Use the replace method to convert the date format, such as "2021-1-20" to "2021/1/20"

const regExp = /(\d{4})-(\d{1,2})-(\d{1,2})/;
"2021-1-20".replace(regExp, "$1/$2/$3"); // "2021/1/20"

Example 3: We often need to convert camelCase to hyphen-separated notation, such as converting the string "helloWorld" to "hello-world", as follows

function formatStr(str) {
  // Prefix every uppercase letter with "-" and convert it to lowercase
  return str.replace(/[A-Z]/g, (match) => {
    return '-' + match.toLowerCase();
  });
}
formatStr("helloWorld"); // "hello-world"
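The special replacement sequences listed in the parameter table, and the argument order of the replacement function, can be sketched as follows. All sample strings are made up for illustration.

```javascript
// $1, $2: swap two captured words.
const swapped = 'John Smith'.replace(/(\w+)\s(\w+)/, '$2, $1');
// $&: reuse the whole match.
const wrapped = 'price: 42'.replace(/\d+/, '[$&]');
// $$ followed by $&: a literal "$" and then the match.
const dollars = 'cost 5'.replace(/\d+/, '$$$&');
// $`: the text to the left of the match; $': the text to the right.
const leftOf = 'abc'.replace(/b/, '$`');
const rightOf = 'abc'.replace(/b/, "$'");
// Function form with no capturing groups: the arguments are (match, index, input).
const indexed = 'a-b-c'.replace(/[a-z]/g, (match, index) => match + index);

console.log(swapped); // "Smith, John"
console.log(wrapped); // "price: [42]"
console.log(dollars); // "cost $5"
console.log(leftOf);  // "aac"
console.log(rightOf); // "acc"
console.log(indexed); // "a0-b2-c4"
```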

3. match()

Matches a string against a regular expression and returns the match result.

Syntax: str.match(regexp)

Parameters: regexp is a regular expression object. If a non-RegExp value is passed in, it is implicitly converted with new RegExp(regexp). If no argument is passed, [""] is returned

Return value:

  • With the g modifier: an array of all results that match the full regular expression.
  • Without the g modifier: an array whose first element is the matched string and whose remaining elements are the matches of the parenthesized subexpressions, plus three properties: index is the position where the matched text starts in the string, input is a reference to the string itself, and groups is an object of named capture groups, or undefined if no named capture group is defined.
  • If there is no match at all, null is returned.

Example 1: Extracting the time from a string

const regExp = /(\d{4})-(\d{1,2})-(\d{1,2})/;
"date:2021-1-20".match(regExp);
// ["2021-1-20", "2021", "1", "20", index: 5, input: "date:2021-1-20", groups: undefined]

Example 2: with g modifier, return all matches

const regExp = /\d{1,4}/g;
"date:2021-1-20".match(regExp); // ["2021", "1", "20"]

Example 3: Named capture groups were mentioned earlier; here they are applied to the same date example

const regExp = /(?<year>\d{4})-(?<month>\d{1,2})-(?<date>\d{1,2})/;
const result = "date:2021-1-20".match(regExp);
console.log(result.groups);
// { year: "2021", month: "1", date: "20" }

4. split()

Splits a string into an array. General usage is not covered here, only the case where the separator is a regular expression.

Syntax: str.split(separator, limit)

Parameters:

  • separator: a string or regular expression that specifies where each split should occur.
  • limit: an integer that limits the number of fragments returned.

Return value: a new array containing the pieces of the source string, split at each position where the separator occurs.

Example 1: Split the string to get an array of names

const names = "Harry Trump ; Fred Barney; Helen Rigby ; Bill Abel ; Chris Hand ";
const regExp = /\s*(?:;|$)\s*/;
const nameList = names.split(regExp);
console.log(nameList);
// ["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand", ""]

const nameList2 = names.split(regExp, 4);
console.log(nameList2);
// ["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel"]

RegExp methods that handle regex

Before introducing the RegExp method, consider the following properties of the RegExp object:

  • source: the text of the regular expression
  • global: read-only Boolean, whether the g modifier is set
  • ignoreCase: read-only Boolean, whether the i modifier is set
  • multiline: read-only Boolean, whether the m modifier is set
  • sticky: read-only Boolean, whether the y modifier is set
  • unicode: read-only Boolean, whether the u modifier is set
  • lastIndex: a readable and writable integer. If the pattern has the g modifier, it stores the position right after the last successful match, which is where the next search starts; it is used by the exec() and test() methods

The first six properties are straightforward; lastIndex is the one that needs special attention. Its value is used as the starting position of a search only by the exec and test methods, and only if the regular expression has the g (or y) modifier.

1. exec()

Similar to the String method match() described above, except that the string is passed as the argument, and the return value has the same form whether or not the regular expression has the g modifier.

Syntax: regexObj.exec(str)

Parameter: str is the string to match

Return value: If the match succeeds, exec() returns an array with the properties index and input, and (when the g or y modifier is set) updates the lastIndex property of the regular expression object. The fully matched text is the first item of the array, and each subsequent item is the text matched by the corresponding parenthesized subexpression. If the match fails, exec() returns null.

Example 1:

const regExp = /(\d{4})-(\d{1,2})-(\d{1,2})/;
regExp.exec("2021-1-20");
// ["2021-1-20", "2021", "1", "20", index: 0, input: "2021-1-20", groups: undefined]

Example 2: When the regular expression uses the g flag, exec can be called repeatedly to find successive matches in the same string. Each search starts at the position given by the lastIndex property of the regular expression.

const myRe = /ab*/g;
const str = 'abbcdefabh';
let myArray = myRe.exec(str);
while (myArray !== null) {
  console.log(myArray, myRe.lastIndex);
  myArray = myRe.exec(str);
}
// ["abb", index: 0, input: "abbcdefabh", groups: undefined] 3
// ["ab", index: 7, input: "abbcdefabh", groups: undefined] 9
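The exec loop above can be packaged as a reusable helper. This is a sketch only: findAll is a hypothetical name, and copying the regex is just one way to avoid changing the caller's lastIndex.

```javascript
// Hypothetical helper: collect every match and its start index with an exec() loop.
function findAll(pattern, str) {
  // Work on a copy that is guaranteed to have the g flag,
  // so the caller's regex and its lastIndex are left untouched.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const results = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    results.push({ text: m[0], index: m.index });
    // An empty match does not advance lastIndex; step forward to avoid looping forever.
    if (m[0] === '') re.lastIndex++;
  }
  return results;
}

console.log(findAll(/ab*/, 'abbcdefabh'));
// [ { text: "abb", index: 0 }, { text: "ab", index: 7 } ]
```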

2. test()

The test method is much simpler: it checks whether the regular expression matches the string.

Syntax: regexObj.test(str)

Parameter: str is the string to match against the regular expression

Return value: true if the regular expression matches the specified string; false otherwise.

Example 1:

const regex = /foo/g;

// regex.lastIndex is 0
regex.test('foo'); // true

// regex.lastIndex is now 3, so the following match is false
regex.test('foo'); // false
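The lastIndex carry-over shown above is a frequent source of bugs when one g regex is reused across several strings. Below is a sketch of the problem and two possible ways around it; the variable names are made up.

```javascript
const globalRe = /foo/g;
const inputs = ['foo', 'foo', 'foo'];

// Reusing one g regex: lastIndex carries over between calls, so results alternate.
const carried = inputs.map((s) => globalRe.test(s));

// Option 1: reset lastIndex before every call.
const reset = inputs.map((s) => {
  globalRe.lastIndex = 0;
  return globalRe.test(s);
});

// Option 2: use a regex without the g modifier, which ignores lastIndex.
const plainRe = /foo/;
const plain = inputs.map((s) => plainRe.test(s));

console.log(carried); // [true, false, true]
console.log(reset);   // [true, true, true]
console.log(plain);   // [true, true, true]
```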

For more information about RegExp objects, see RegExp(Regular Expressions).

References

  • JS regular expressions complete tutorial (slightly longer): Lao Yao's summary is easy to understand, and the examples are just right
  • Chapter 10 is the "distilled essence" of regex.
  • Regular expressions: the authoritative regex reference on MDN