What is a regular expression?

Regular expressions? Here apply the concept of Baidu

Regular expressions are also called regular expressions. Regular Expression (often abbreviated to regex, regexp, or RE in code) is a term used in computer science. Regular expressions are often used to retrieve and replace text that conforms to a pattern (rule). Many programming languages support string manipulation using regular expressions. For example, there is a powerful regular expression engine built into Perl. The concept of regular expressions was first popularized by Unix tools such as sed and grep. Regular expressions are usually abbreviated to “regex”. There are regexp and regex in the singular and regexps, regexes, and regexen in the plural.

Regular expression constructor

In ES5, there are two cases for arguments to the RegExp constructor.

In the first case, the argument is a string, in which case the second argument represents the regular expression modifier (flag).

var regex = new RegExp('xyz'.'i'); Var regex = /xyz/ I;Copy the code

In the second case, if the argument is a regular expression, a copy of the original regular expression is returned.

var regex = new RegExp(/xyz/i); Var regex = /xyz/ I;Copy the code

However, ES5 does not allow the second argument to be used at this time, adding modifiers, otherwise an error will be reported.

var regex = new RegExp(/xyz/, 'i');
// Uncaught TypeError: Cannot supply flags when constructing one RegExp from another
Copy the code

ES6 changes this behavior. If the first argument to the RegExp constructor is a regular object, the second argument can be used to specify the modifier. Furthermore, the returned regular expression ignores the modifier of the original regular expression and uses only the newly specified modifier.

new RegExp(/abc/ig, 'i').flags
// "i"In the code above, the modifier of the original re object is ig, which is overridden by the second argument I.Copy the code

The basic syntax of regular expressions

Before we begin, let’s briefly review a basic concept of regular expressions in JS. This part is very basic, and it is also easy to overlook, or omit, but it is a very important concept related to regular development.

Basic metacharacter

writing meaning
. Matches any single character except a newline character
\ A backslash before a non-special character indicates that the next character is special and cannot be interpreted literally. For example, ‘b’ without pre-\ usually matches a lowercase ‘b’, no matter where they appear. If “is added, the character becomes a special meaning character, and the backslash can also escape the special character that follows it to a literal
| Logic or operator
[] Defines a character set that matches a character in the character set, in which characters like., \ represent themselves
(^) Take not from the set above
Defines an interval, such as [a-z], where the first and last characters are in the ASCII character set

Quantitative metacharacter

writing meaning
{m,n} Matches the preceding character at least m times and at most n times, and {m} means m times and {m,} means at least m times
+ Match the previous expression once or more, equivalent to {1,}, memory mode append (+), at least once
* Matches the preceding expression zero or more times, equivalent to {0,}, memory mode multiplication (*), may not be once
? Match the preceding expression zero or once, equivalent to {0,1}, memorization, ok? , with (1) or without (1), if followed by any quantifier *,+,? The following {} will change the quantifier to non-greedy mode (match as few characters as possible). Greedy mode is used by default. For example, applying /\d+/ to “123abc” will return “123” if /\d+? /, then only “1” will be matched.

Special metacharacter

writing meaning
\d [0-9], represents a digit, memory mode digit
\D [^0-9], representing a non-digit digit
\s [\t\v\n\r\ F], represents the space character, including space, horizontal TAB character (\ T), vertical TAB character (\ V), newline character (\n), carriage return character (\ R), page feed character (\ F), memory mode space character
\S [^\t\v\n\r\f], indicating non-blank character
\w [0-9A-zA-z], which represents alphanumeric, uppercase and lowercase letters and underscores, memorized in Word
\W [^ 0-9A-za-z], which represents a non-word character

Mark character

writing meaning
g Global search memory mode Global
i Case insensitive memory mode ignore
m Line search

Js methods related to regex

When you think of regex, the first thing that comes to mind is strings, so it’s natural to think of some of the methods we use when dealing with strings:

StringThe methods in

Search accepts a re as an input parameter, and if the input parameter is not a regular expression, it implicitly converts it to a re using new RegExp(obj)! Returns a match to the start of the substring, -1 if not matched (this is similar to the indexOf method in arrays)

Match accepts the same parameters as the above method. The return value depends on whether the re contains g. If there is no G, the match method matches the string once. If no matching text is found, match returns null. The remaining elements hold the text captured by the re, and the array also contains two objects, index representing the position of the matched text in the string, and input representing the original string being parsed. If g is present, it returns an array containing the results of each match.

Replace takes two arguments. The first is the text to be replaced, which can be either a re or a string. The second is the text to be replaced, which can be a string or a function. The string can replace the previously captured substring with some special variable.

? Insert a ‘$’.

$& inserts the matching substring.

$’ inserts the content to the left of the currently matched substring.

$’ inserts the content to the right of the currently matched substring.

$n If the first argument is a RegExp object and n is a non-negative integer less than 100, insert the NTH parenthes-matched string.

If the second argument is a function, the function takes the following input argument:

Match: matching substring. (Corresponding to the $& above.)

p1,p2,… If the first argument to the replace() method is a RegExp object, it represents the NTH parenthes-matched string. (Corresponding to the above $1, $2, etc.)

Offset: offset of the matched substring in the original string. (For example, if the original string is “abcd” and the matched substring is “BC”, this parameter will be 1.)

String Indicates the matched string.

RegExpThe methods in

Test takes a string argument and returns true if the regular expression matches the specified string, false otherwise

Exec also takes a string as an argument and returns an array containing the matching results. If no match is found, the return value is null. In a match, the return value is the same as if the match method had no G flag. The 0th array represents the text that matches the re, the next n are the corresponding n captured text, and the last two are the index and input objects and it starts retrieving the string at the character specified in the lastIndex attribute of the re instance. When exec() finds the text that matches the expression, after the match, it sets the lastIndex property of the re instance to the position next to the last character of the matched text. The presence or absence of a G flag has no effect on the word exec method, except that the exec() method can be repeatedly called with the G flag to iterate over all the matching text in the string. When exec() finds no more matching text, it returns NULL and resets the lastIndex attribute to 0.

Multiple number matching

{m,n} indicates that m to N entries are matched

letReg = / abcd efg/g {1, 3}let str = 'abcddefg abcddddefg abcdefg abcdddefg'Str.match (reg)"abcddefg"."abcdefg"."abcdddefg"]
Copy the code

Greedy mode is turned on by default in all matches of regex:

Greed and inertia are easy to understand, and we can see from the literal meaning that “greed” means that if you meet the requirements, you keep matching until you don’t, and that’s the greedy pattern. The opposite is the lazy pattern (also known as the non-greedy pattern), which stops once a match is made.

Greedy mode identifier: +,? , *,{n},{n,},{n,m}. Lazy mode: +? And?????? , *?? , {n}? , {n,}? ,{n,m}? ;

Non-greedy mode:

Var regex = / \ d {2, 5}? /g; var string ="123, 1234, 12345, 123456"; console.log( string.match(regex) ); / / = > ["12"."12"."34"."12"."34"."12"."34"."56"]
Copy the code

The implication of this case is that if you match two, you don’t match any more.

Multiple case matching

1. Use [] metacharacter for processing, for example:

let reg = /demo[1234].js/g
let filesName = 'demo1.js demo2.js demo3.js demo4.js demo5.js'
filesName.match(reg)
// => ["demo1.js"."demo2.js"."demo3.js"."demo4.js"]
Copy the code

For example, [123456abcdefGHIJKLM] can be written as [1-6a-fg-m]. [^0-9] can be used to indicate ‘not ‘, that is, any character other than a number

3. A variety of conditions can also be a variety of branches, with a pipe to connect |, for example

var regex = /good|goodbye/g;
var string = "goodbye"; console.log( string.match(regex) ); / / /"good"]
Copy the code
/ / the number of up to 2 decimal places / ^ (\ [1-9] d * | 0) (\. \ d {1, 2})? $/ Phone number /(\+86)? 1 \ d {10} / id / ^ (\ d {15} | \ d {and}) ([xX] | \ d) $/Copy the code

Position related

For example, /^kingdee/ matches a string position at the beginning of the string, followed by k, then I,n,g,d,e,e, and finally at the end of the string. To verify that ^ and $are placeholders in nature, we can use an example:

let str = 'kingdee';
str.replace(/^|$/g,The '@'); The result is :@kingdee@ // We'll do it one at a time'String'It's replaced by the @ signCopy the code

/b, /b matches word boundary and non-word boundary. Word boundary refers specifically to the position between \w([a-za-z0-9_]) and \w, including the position between \w and ^ and $, for example:

let str = 'kingdee fly [js]_reg.exp-01'.replace(/\b/g, The '@') the result is @kingdee @@fly @[@js@]@_reg@.@exp@-@01@Copy the code

(? = p), (? ! P) matches the position before p (the position of the character followed by p) and the position not before p (the position of the character not followed by p), for example:

'hello'.replace(/(? =l)/g,The '#') The result is he#l#lo
'hello'.replace(/(? ! l)/g,The '#'The result is#h#ell#o#
Copy the code

The property of string positions is that the ‘space’ between strings can be thought of as infinitely many, for example, ‘ ‘

The positions between characters can be multiple, or even thought of as infinite, empty strings, for example, hello can be generic ” + ‘H’ + ‘e’ + ‘L’ + ‘L’ + ‘O’ + ”, Can also Be a ‘+’ a ‘+’ ‘+’ ‘+’ h ‘+’ e ‘+’ l ‘+’ l ‘+’ o ‘+’, so / ^ h \ Be/Bl/Bl/Bo $/. The test (‘ hello ‘), the result is true, /^^^^^^h\B\B\Be\Bl\Bl\Bo? $/.test(‘hello’) is also true.

Example: thousandth conversion Although it is possible to convert a string of numbers directly to thousandth format using methods such as toLocaleString, it is always better to have more than one solution to the same problem.

let num = '1008610086';
letreg = /(? =(\d{3})+$)/g str.replace(reg,', ') The results are:"1008610086"
Copy the code

The use of parentheses

Use parentheses to represent the concept of a local whole in a re. Use parentheses primarily.

let str = 'hello word';
letReg = / ^ hello (world | today) $/ STR. The match (reg) results output: ["hello world"."world", index: 0, input: "hello world", groups: undefined] // The second element in the array is the matched groupCopy the code

Use of groups

Grouping can play an important role in the use of re, adapting to multiple scenarios. In the previous example of thousandth conversion, grouping was actually used to deal with ‘recurring’ string fragments.

Match and capture

Date, for example, the extraction of regular expressions can be written as / \ (\ d {4})/(\ d {1, 2}) \ / (\ d {1, 2}), extraction of date 2020/4/15 (date) (month) (year).

let str = '2020/4/15'
letReg = / \ (\ d {4})/(\ d {1, 2})/(\ d {1, 2})/STR. The match (reg) results output: ["2020/4/15"."2020"."4"."15", index: 0, input: "2020/4/15", groups: undefined] // return the second item of the array, which is year, month and day respectivelyCopy the code

For grouping, we can also use the test method of the RegExp object itself:

let str = '2020/4/15'
letReg = / \ (\ d {4})/(\ d {1, 2})/(\ d {1, 2}) / / / / STR. The match (reg) reg. Test (STR) / / get digital RegExp (date) (month) (year).The $1
"2020"
RegExp.$2
"4"
RegExp.$3
"15"
Copy the code

Match the change

For example, if you want to replace YYYY-MM-DD with MM/DD/YYYY, what do you do?

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
var result = string.replace(regex, "$2/$3/The $1"); console.log(result); / / = >"06/12/2017"In replace, the second parameter is usedThe $1,$2,$3Refers to the corresponding grouping. = /(\d{4})-(\d{2})-(\d{2})/; var string ="2017-06-12";
var result = string.replace(regex, function() {
	return RegExp.$2 + "/" + RegExp.$3 + "/" + RegExp.The $1; }); console.log(result); / / = >"06/12/2017"Var regex = /(\d{4})-(\d{2})-(\d{2})/; var string ="2017-06-12";
var result = string.replace(regex, function(match, year, month, day) {
	return month + "/" + day + "/"+ year; }); console.log(result); / / = >"06/12/2017"
Copy the code

Of course, ES6 provides a more convenient date matching method Named Capture Groups, which allows you to assign a name to each group match, making it easy to read the code and reference it.

const RE_DATE = /(? <year>\d{4})-(? <month>\d{2})-(? <day>\d{2})/; const matchObj = RE_DATE.exec('2020-4-15'); const year = matchObj.groups.year; // 2020 const month = matchObj.groups.month; // 4 const day = matchObj.groups.day; / / 15Copy the code

Group nesting problem

Often encountered in our development group of groups, namely the parenthesis in parentheses, like this: / (a) ((a | b) +) c /, so this kind of regular, how to understand, regular will be how to match? Add RegExp.$1,RegExp.$2,RegExp.$3. RegExp.$1 ($1, $1, $1, $1, $1, $1, $1, $1, $1, $1, $1, $1) Look at this simple example:

var reg = /(A+)((B|C|D)+)(E+)/gi; // This regular expression has four groups // correspondence //RegExp.The $1<-> (A+) // Grouping 1 //RegExp.$2< - > ((B) | | C D +) / / group / 2 / RegExp.$3< - > C (B | | D) / / group 2-1 (note that this is a group of 2 son branch) / / RegExp.$4<-> (E+) //'i'); Result output: ["ABCDE"."A"."BCD"."D"."E"]
Copy the code

Summary: The nesting rule of groups is to match the large group first, if there are other groups under the large group, then match the group first.

backreferences

Take the case from the previous date. In our daily work, we often encounter the problem of date conversion format, specifically XXXX – XX-xx or XXXX /xx/xx. If you have a requirement at this point, you need to determine whether the returned date format is correct or matches the date. Then, the regularity as a basis for matching needs to be more rigorous. For example, the date format returned by the background might be 2020-4-15 or 2020/4/15. So how to unify match? May be we can first think of/(\ d {4}) \ / – (\ d {1, 2}) \ / – (\ d {1, 2}) /, at first glance, no problem, but if the background one day return to 2020-4/15, this format, does this also can match? Here’s a bug:

letReg = / (\ d {4}) \ / - (\ d {1, 2}) [\ / -)/(\ d {1, 2})let string1 = "2020-04-15";
let string2 = "2020/04/15";
let string3 = "2020/04-15";
let string4 = "2020-04/15";
console.log( regex.test(string1) ); // true
console.log( regex.test(string2) ); // true
console.log( regex.test(string3) ); // true
console.log( regex.test(string4) ); // true
Copy the code

In fact, we need to be very simple, is’ year ‘and’ month ‘,’ month ‘and’ day ‘delimiter must be the same and we expect! So that’s where the backreference comes in.

letReg = (\ d {4})/(\ / -) \ 2 (\ d {1, 2})/(\ d {1, 2})let string1 = "2020-04-15";
let string2 = "2020/04/15";
let string3 = "2020/04-15";
let string4 = "2020-04/15";
console.log( regex.test(string1) ); // true
console.log( regex.test(string2) ); // true
console.log( regex.test(string3) ); // false
console.log( regex.test(string4) ); // false
Copy the code

Uncaptured matching

The groups that appear in the previous article capture the data they match for subsequent reference, so they are also called captured groups. If you just want the primitive functionality of parentheses, you don’t refer to them, that is, you don’t refer to them in the API or back reference them in the re. At this point you can use non-capture grouping (? :p), for example, the first example of this article can be modified to:

var reg = /(? :ab)+/g; var string ="ababa abbb ababab"; console.log( string.match(reg) ); / / = > ["abab"."ab"."ababab"]
Copy the code

Positive predictive

expression The name of the describe
(? =exp) Positive predictive Matches the following position that satisfies the expression exp
(? ! exp) Negative predictive Matches the following position that does not satisfy the expression exp
(? <=exp) To look back Matches the previous position that satisfies the expression exp (JS not supported)
(? <! exp) Negative backward deep Matches previous positions that do not satisfy the expression exp (JS not supported)
var str = 'Hello, Hi, I am Hilary.'; var reg = /H(? =i)/g; var newStr = str.replace(reg,"T"); console.log(newStr); //Hello, Ti, I am Tilary.Copy the code

In this DEMO we can see the effect of forward lookforward, again the character “H”, but only matches “H” followed by “I”. To be matched. What about negative foresight? The principle is the same:

var str = 'Hello, Hi, I am Hilary.'; var reg = /H(? ! i)/g; var newStr = str.replace(reg,"T"); console.log(newStr); //Tello, Hi, I am Hilary.Copy the code

In this DEMO, we have replaced the previous positive outlook with a negative one. This re means that it matches “H” and cannot be followed by an “I”. In fact, it is negative forward-looking negative mode.

Common regular expressions

Having covered some of the basics, let’s now list some common front-end regular expressions

Mobile phone number

Var mPattern = / ^ ([0-9] (13) | | (14 [5] | 7) (15 ([0, 3] | [5-9])) | (18 [0, 5-9])) \ d {8} $/;Copy the code

Id number (18 digits) is regular

var cP = /^[1-9]\d{5}(18|19|([23]\d))\d{2}((0[1-9])|(10|11|12))(([0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]$/;
Copy the code

The above regularization can only be used to determine the basic passability of the ID number. For more information about the determination of the citizenship number, please refer to the document: Correctness determination of the Citizenship Number and Program Implementation

Mobile phone number

Var mPattern = / ^ ([0-9] (13) | | (14 [5] | 7) (15 ([0, 3] | [5-9])) | (18 [0, 5-9])) \ d {8} $/;Copy the code

url

var urlP= /^((https? |ftp|file):\/\/)? ([\ \. Da - z -] +) \. ([a-z \.] {2, 6}) (/ \ \ / \ w - *) * \ /? $/;Copy the code

The IP address

Var ipP = /^(? : (? : 25 [0 to 5] | 2 [0 to 4] [0-9] | [01]? [0-9] [0-9]?) \.) {3} (? : 25 [0 to 5] | 2 [0 to 4] [0-9] | [01]? [0-9] [0-9]?) $/; // IPv6 address regular var pattern = /(([0-9a-fA - F] {1, 4} {7, 7}) [0-9 A-fA - F] {1, 4} | ([0-9 A-fA - F] {1, 4} {1, 7}) : | ([0-9 A-fA - F] {1, 4} {1, 6}) : [0-9 A-fA - F] {1, 4} | ([0-9 A-fA - F] {1, 4} {1, 5}) (: [0-9 A-fA - F] {1, 2} {1, 4}) | ([0-9 A-fA - F] {1, 4} {1, 4}) (: [0-9 A-fA - F] {1, 3} {1, 4}) | ([0-9 A-fA - F] {1, 4} {1, 3}) (: [0-9 A-fA - F] {1, 4} {1, 4} | ([0-9 A-fA - F] {1, 4}} {1, 2) (: [0-9 A-fA - F] {1, 4} {1, 5} | [0-9 A-fA - F] {1, 4} : ((: [0-9 A-fA - F] {1, 4} {1, 6}) | : ((: [0-9 A-fA - F] {1, 4} {1, 7} | :) | fe80: : [0-9 A-fA - F] {0, 4} {0, 4} % [0-9 A zA - Z] {1} | : : (FFFF (: 0 {1, 4}) {0, 1} {0, 1}) ((25 [0 to 5) | (2 [0 to 4] | 1 {0, 1} [0-9]) {0, 1} [0-9]) \.) {3} (25 [0 to 5) | (2 [0 to 4] | 1 {0, 1} [0-9]) {0, 1} [0-9]) | ([0-9 a-fA - F] {1, 4} {1, 4}) : ((25 [0 to 5) | (2 [0 to 4] | 1 {0, 1} [0-9]) {0, 1} [0-9]) \.) {3} (25 [0 to 5) | (2 | 1 [0-4] [0-9] {0, 1}) [0-9] {0, 1})) /;Copy the code

QQ number regular, 5 to 11 digits

Var qqPattern = / ^ (1-9] [0-9] {4, 10} $/;Copy the code

Micro signal regular, 6 to 20 characters, beginning with a letter, letter, number, minus sign, underscore

Var wxPattern = / ^ [a zA - Z] ([- _a - zA - Z0-9] {5} 3) + $/;Copy the code

The license plate is regular

Var cPattern = /^[a-z]{1}[a-z]{1}[a-z]{1}[a-z0-9]{4}[a-z0-9 hang student police Hong Kong and Macau]{1}$/;Copy the code

Include Chinese regular

var cnPattern = /[\u4E00-\u9FA5]/;
Copy the code

China Postcode (China postcode is 6 digits)

var pattern = /^[1-9]\d{5}(? ! \d)$/Copy the code

Regular expressions for blank lines

Var pattern = /\n\s*\r/Copy the code

Whether the call is fixed

Var pattern = / 0 \ d {2, 3} - \ d {7, 8} /Copy the code

Legitimate email

var pattern = /^([a-zA-Z0-9]+[-_\.]?) +@[a-zA-Z0-9]+\.[a-z]+$/Copy the code

The domain name

var pattern =[a-zA-Z0-9][-a- zA - Z0-9] on conversion {0} (/. [a - zA - Z0-9] [-a- zA - Z0-9] on conversion {0}) + /.?Copy the code