Preface: In the normal development of the regular is probably not need to master completely, but with the technology of in-depth excavation or to go into a framework source code parsing in or find some daily processing strings of better processing, then the regular expression can be said to be essential. Below, I begin to demystify regular expressions step by step, from shallow to deep.

We need to grasp a few concepts before we begin

  • How many ways can regular expressions be rendered?
  • What are metacharacters? What are the common ones?
  • What are the common modifiers?
  • What is the regular greedy mode and non-greedy mode
  • .

Regular expression rendering

We can use an example to see the difference between the following two approaches.

🌰 [whether the text contains the test field] :

let str = 'this is test';
console.log(str.indexOf('test')! = = -1) // true
console.log(str.includes('test')) // true
Copy the code

shorthand

// : two slashes. [] Here are a few simple words:

🌰 regular:

// Whether the text contains the test field
let str = 'this is test';
let res = /test/
console.log(res.test(str))  // true
Copy the code

The constructor

New RegExp(pattern[, flags]): constructor. It is not used in many scenarios, but dynamic parameter transfer is supported. This is useful for scenarios where rules need to be configured dynamically.

🌰 regular:

 // Find the text containing test
let str = 'this is test';

function foo(r, s) {
    const re = new RegExp(r)
    return re.test(s)
}
let res1 = foo('test', str);  // true
let res2 = foo('test1', str); // false
Copy the code

metacharacters

Metacharacters are special characters with specific functions. Most of them need to be identified with backslashes to distinguish them from common characters. A few metacharacters need backslashes to escape them into common characters.

A list of common metacharacters

character describe
\ Marks the next character as a special character, or a literal character, or a backreference, or an octal escape. For example, ‘n’ matches the character “n”. ‘\n’ matches a newline character. The sequence ‘\’ matches “” and “(” matches “(“).
^ Matches the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after ‘\n’ or ‘\r’.
$ Matches the end of the input string. If the Multiline property of the RegExp object is set, $also matches the position before ‘\n’ or ‘\r’.
* Matches the preceding subexpression zero or more times. For example, zo* matches “z” and “zoo”. * is equivalent to {0,}.
+ Matches the previous subexpression one or more times. For example, ‘zo+’ matches “zo” and “zoo”, but not “z”. + is equivalent to {1,}.
? Matches the preceding subexpression zero or once. For example, “do (es)?” Can match “do” or “does”. ? Equivalent to {0,1}.
{n} N is a non-negative integer. Match certain n times. For example, ‘o{2}’ does not match the ‘O’ in “Bob”, but does match two o’s in “food”.
{n,} N is a non-negative integer. At least n times. For example, ‘o{2,}’ does not match the ‘O’ in “Bob”, but matches all o’s in “foooood”. ‘o{1,}’ is equivalent to ‘o+’. ‘o{0,}’ is equivalent to ‘o*’.
{n,m} Both m and n are non-negative integers, where n <= m. At least n times and at most m times are matched. For example, “o{1,3}” will match the first three o’s in “fooooood”. ‘o{0,1}’ is equivalent to ‘o? ‘. Note that there can be no Spaces between commas and numbers.
. Matches any single character except newline characters (\n, \r). Match any character including ‘\n’

To view more


How about a couple of examples of metacharacters

🌰 chestnuts:

Find the special character of the string

let str = '- & *'
let re = / [\. \ \ & \ *] + /
console.log(re.test(str)) // true
Copy the code

Using escape characters to match special characters such as \-, \&, \. And so on is consistent with native JavaScript escape principles.

Above:


🌰 chestnuts:

Checks whether the string starts with a number and ends with a letter

let str = 'abc123'
let re = /^[a-z]+[0-9]+$/;
re.test(str) // true
Copy the code

Explanation: ^ begins with XXX, & ends with XXX, in [a-z] denotes either of the a-z characters, + one or more times.

Above:

🌰 chestnuts:

Replace the format of the date

let str = '2019-10-1'
let re = /(\d+)\-(\d+)\-(\d+)/;
let result = str.replace(re, '$1 year $2 month $3'); 
console.log(result) // October 1, 2019
Copy the code

Use the metacharacter () grouping concept to get the matching group name, and then local substitution.

Above:

🌰 chestnuts:

Matches hexadecimal color values

let str = '#ffbbad #Fc01DF #FFF #ffE'
let regex = / # [0-9 a zA - Z] {3, 6} / g;
console.log(str.match(regex));
Copy the code

Explanation: Find the rule of hexadecimal characters, use the {3,6} form to match at least three times, up to six times.

Above:

The above is how to use metacharacters have a simple introduction, want to master more detailed words also need to read more practice thanks. Let’s look at some common modifiers.


A commonly used modifier

g: global mode, which looks for the entire content of the string rather than ending with a match.

🌰 chestnuts:

 // Find the text containing test
let str = 'this is test test';

str.match(/test/); // Query the first match
str.match(/test/g);// Find all matches ['test','test']
Copy the code

On stringsmatchMethod, please check for yourself.


i: case insensitive: ignores the case of pattern and the string.

🌰 chestnuts:

// Find the text containing test
let str = 'this is TEST TEST';

str.match(/test/); // null
str.match(/test/i);// select * from TEST;
str.match(/test/ig);// Query globally for a case-insensitive test string
Copy the code

m: Multi-line mode, which indicates that the search continues after reaching the end of a line of text.

By default, ^ and $match the beginning and end of a string. With the m modifier, ^ and $also match the beginning and end of a line. That is, ^ and $recognize newline \n.

PS: all metacharacters are covered later.

  • ^: Matches the start of the input string.
  • $: Matches the end of the input string.

🌰 chestnuts:

// Find the text containing test
let str = 'this is test\n'
/test$/.test(str) // false
/test$/m.test(str) // true
Copy the code

y: Adhesion mode, which only looks for strings starting from lastIndex and after.

The y modifier is similar to the g modifier. Why is it similar? Because the G modifier is ok as long as there is any remaining position, and the Y modifier must start at the first remaining position, which is what adhesion means.

🌰 chestnuts:

// Find the text containing a
let str = 'aaa_aa_a'
let r1 = /a+/g;
let r2 = /a+/y;

r1.exec(str)   // ['aaa']
r2.exec(str)   // ['aaa']

r1.exec(str) // ['aa']
r2.exec(str) // null
Copy the code

Explanation: One uses the G modifier and the other uses the Y modifier. The two regular expressions are executed twice each, and on the first execution, they behave the same, with the remaining strings being _aa_A. Since the G modifier has no position requirement, the second execution returns the result, while the Y modifier requires that the match must start at the head, so null is returned. More details


m:Unicode mode, enabling Unicode matching.

Will correctly handle four bytes of UTF-16 encoding. More details

🌰 chestnuts:

let str = 'ð ®·';
$/ / ^.test(str) // false
/^.$/u.test(str) // true
Copy the code

s: Indicates metacharacters in dotAll mode. Matches any character (including \n or \r).

In plain English: actually in the meta string. Can represent any character except terminators such as \n \r, for which the s modifier is used.

🌰 chestnuts:

let str = 'hello\nworld'
/hello.world/.test(str)    // false
/hello.world/s.test(str)   // true
Copy the code

Regular greedy mode and non-greedy mode

Before saying greedy mode and non-greedy mode, we need to grasp what quantifier is, as shown in the following figure:

Quantifiers are metacharacters, so you don’t have to worry about all the new nouns, but at least you should know how to use them.

Here’s an example of what greedy mode and non-greedy mode are:

🌰 chestnuts:

Greed mode

var regex = / \ d {2, 5} / g;
var string = "123, 1234, 12345, 123456";
console.log(string.match(regex)); // ["123", "1234", "12345", "12345"]
Copy the code

Above:

Non-greedy model

var regex = / \ d {2, 5}? /g;
var string = "123, 1234, 12345, 123456";
console.log(string.match(regex)); / / / "12", "12", "34", "12", "34", "12", "34", "56"]
Copy the code

Above:

Notice anything different? Oh, that’s one more, right? Just add? After the quantifier. It doesn’t match down as long as it meets the requirements. The non-greedy model is also called an inert quantifier,

For details, you can refer to Lao Yao’s regular book, which explains in detail some concepts and methodology of regular, and suggests that you often read and write.


The last

In fact, there are many concepts about regular, I just help you here in the complex regular world pave a turn, into the entry. After that, you need to explore by yourself. If you have time, you might as well read Lao Yao’s regular Expression book recommended by me. If you also need other books to consolidate your foundation, you can also download the list of books ARRANGED by me to download for free.