Preface

This article introduces and summarizes the basic use of regular expressions, in the hope that the next time I need one I will no longer be programming by search engine 😃.

First, two good websites for learning regular expressions:

  1. Edit regular expressions online
  2. Capsule programming regular expression learning

RegExp object methods

Method  Description
exec    Searches a string for a match against the regular expression and returns an array describing the first match (or null if there is no match)
test    Searches a string for a match against the regular expression and returns true if one is found, otherwise false

exec

Usage:

const regExp = /h/;

console.log(regExp.exec('hello'));

//[0: "h", groups: undefined, index: 0, input: "hello"]

test

Usage is the same as for exec:

const regExp = /h/;

console.log(regExp.test('hello'));

// true

String methods that accept regular expressions

Method   Description
search   Returns the index of the first match of the regular expression, or -1 if no match is found. Note: search ignores the g flag, so it does not support global search
replace  Finds matches of the regular expression and replaces them with a replacement string. Returns a new string; the original string is unchanged
match    Returns an array of the matches found by the regular expression (or null if there is no match)
split    Finds matches of the regular expression, splits the original string at each match, and returns an array of the pieces

search

Syntax:

str.search(regexp) // regexp: a regular expression

Usage:

const regExp = /h/;
const str = 'elho';

console.log(str.search(regExp)); // return the matching index: 2

replace

Syntax:

str.replace(regexp|substr, newSubStr|function) // regexp: a regular expression

Usage:

const regExp = /h/;
const str = 'elho';

console.log(str.replace(regExp, 'k')); //elko

match

Syntax:

str.match(regexp) // regexp: a regular expression

Usage:

const regExp = /h/;
const str = 'elho';

console.log(str.match(regExp)); 

// [0: "h", groups: undefined, index: 2, input: "elho"]

Without the global flag, match returns the same result as the exec method of the RegExp object. With the global flag, match returns an array of all matched substrings, whereas exec still returns the details of a single match per call: the first match on the first call, then continuing from the regular expression's lastIndex on later calls.

const regExp = /h/g;
const str = 'elhoh';

console.log(str.match(regExp)); // ['h', 'h']
console.log(regExp.exec(str)); // [0: "h", groups: undefined, index: 2, input: "elhoh"]
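To make the difference concrete, here is a minimal sketch of calling exec repeatedly on a global regular expression. The variable names indices and result are only illustrative.

```javascript
// With the g flag, each exec call resumes from regExp.lastIndex.
const regExp = /h/g;
const str = 'elhoh';

const indices = [];
let result;
while ((result = regExp.exec(str)) !== null) {
  indices.push(result.index); // position of the current match
}

console.log(indices); // [2, 4]
console.log(regExp.lastIndex); // 0, reset once exec has returned null
```

Each call returns one match, so collecting every match with exec needs a loop, while match with the g flag returns all matched substrings at once.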

split

Syntax:

str.split(separator|regexp, limit)

regexp: a regular expression

limit: an integer that caps the number of pieces returned. For example, if the split would produce [1, 2, 3] and limit is 2, only the first two elements of the result array are returned, so the final result is [1, 2].

Usage:

const regExp = /\s?;\s?/g;
const str = 'Harry ; Fred ; Rigby ; ';

console.log(str.split(regExp)); // ['Harry', 'Fred', 'Rigby', '']

console.log(str.split(regExp, 1)); // ['Harry']

Regular expression flags

This article covers six optional flags, which enable global search, case-insensitive search, and so on. Flags can be used singly or combined in any order.

Flag  Description
g     Global search
i     Case-insensitive search
m     Multi-line search: ^ and $ also match at the start and end of each line
s     Allows . to match a newline character
u     Uses Unicode mode for matching
y     Performs a "sticky" search that matches only from the current position in the target string (the regular expression's lastIndex)

Usage

Single use:

const regExp = /h/g;
const str = 'helho';

console.log(str.match(regExp)); // ['h', 'h']

Multiple combinations:

const regExp = /h/gi;
const str = 'helHo';

console.log(str.match(regExp)); // ['h', 'H']
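The examples above cover g and i. As a rough sketch of the m, s, and y flags, the following uses a made-up two-line string; all variable names are illustrative.

```javascript
const text = 'one\ntwo';

// m: ^ and $ also match at line breaks
const lineStarts = text.match(/^\w+/gm);
console.log(lineStarts); // ['one', 'two']

// s: . also matches a newline
const withoutS = /one.two/.test(text);
const withS = /one.two/s.test(text);
console.log(withoutS, withS); // false true

// y: match only at lastIndex
const sticky = /two/y;
const atStart = sticky.test(text); // 'two' does not start at index 0
sticky.lastIndex = 4;
const atFour = sticky.test(text); // 'two' starts at index 4
console.log(atStart, atFour); // false true
```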

Character classes

Character groups

A character group ([]) matches any one character from a set of possible characters.

For example:

Place the alternative characters inside []: [Jj]. The pattern [Jj]avaScript then matches either JavaScript or javaScript.
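A minimal sketch of a character group in use. The third test string, lavaScript, is a made-up counterexample.

```javascript
// [Jj] accepts exactly one character: either J or j.
const regExp = /[Jj]avaScript/;

const results = ['JavaScript', 'javaScript', 'lavaScript'].map((s) => regExp.test(s));

console.log(results); // [true, true, false]
```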

Intervals

Some character groups are very large. For example, to match any digit, the character group would look like this: [0123456789]. To match any letter, we would have to write out every letter from a to z in both lowercase and uppercase, which is cumbersome, so inside [] we can use the - symbol to express an interval.

  1. If we need to match digits, we can write the interval like this: [0-9].
  2. If we need lowercase letters, we can write the interval like this: [a-z].
  3. If we need uppercase letters, we can write the interval like this: [A-Z].
  4. To match all three, write: [0-9a-zA-Z].
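The four intervals listed above can be tried out like this; the sample string is invented for illustration.

```javascript
const str = 'a1-B2_c3';

const digits = str.match(/[0-9]/g);
const lower = str.match(/[a-z]/g);
const upper = str.match(/[A-Z]/g);
const alnum = str.match(/[0-9a-zA-Z]/g);

console.log(digits); // ['1', '2', '3']
console.log(lower);  // ['a', 'c']
console.log(upper);  // ['B']
console.log(alnum);  // ['a', '1', 'B', '2', 'c', '3']
```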

Negation

Sometimes we want to match every character except certain ones. For example, suppose we want to match all digits except 2 and 4. With a normal character group we would write: [01356789]. For large character groups this becomes tedious, so regular expressions let us invert a character group by adding ^ at the beginning of the group. The example above can be written as: [^24].
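A short sketch of a negated group. Note that [^24] matches any character other than 2 and 4, not only digits, so the sample string here contains digits only.

```javascript
const str = '0123456789';

// Keep every character that is not 2 or 4.
const kept = str.match(/[^24]/g).join('');

console.log(kept); // '01356789'
```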

Matching special characters: escaping

Many punctuation marks have special meanings in regular expressions, for example:

  1. []: delimits a character group
  2. -: denotes an interval inside a character group

When we need to match [, ], - or another special character literally, we escape it with a backslash: \-, \[, \].
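As an illustration of escaping, the sketch below matches the brackets and the hyphen in a made-up string literally.

```javascript
const str = '[1-2]';

// Inside a character group, escape [, ] and - to match them literally.
const specials = str.match(/[\[\]\-]/g);
console.log(specials); // ['[', '-', ']']

// Outside a character group, [ and ] must be escaped as well.
const literal = /\[1\-2\]/.test(str);
console.log(literal); // true
```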

Special characters

Shortcut characters

Shortcut characters are shorthand for commonly used character groups.

Character  Meaning
.          Matches any single character except a newline. The character can be a digit, a symbol, a Chinese or English character, and so on
\w         (lowercase w) Matches a single word character (letter, digit, or underscore), equivalent to [A-Za-z0-9_]
\W         (uppercase W) Matches a single non-word character, equivalent to [^A-Za-z0-9_]
\s         (lowercase s) Matches a whitespace character, including space, tab, form feed, and line feed
\S         (uppercase S) Matches a non-whitespace character
\d         Matches a digit, equivalent to [0-9]
\D         Matches a non-digit, equivalent to [^0-9]
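To see how the shortcut characters in the table divide up a string, this sketch counts the matches of each one in a small made-up string. The helper count is hypothetical.

```javascript
// Five characters: 'a', '1', a space, '_', and a newline.
const str = 'a1 _\n';

// Number of matches of a global regular expression in str.
const count = (re) => (str.match(re) || []).length;

const counts = {
  dot: count(/./g), // everything except the newline
  w: count(/\w/g),  // a, 1, _
  W: count(/\W/g),  // space, newline
  s: count(/\s/g),  // space, newline
  S: count(/\S/g),  // a, 1, _
  d: count(/\d/g),  // 1
  D: count(/\D/g),  // a, space, _, newline
};

console.log(counts); // { dot: 4, w: 3, W: 2, s: 2, S: 3, d: 1, D: 4 }
```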

Anchor characters

Anchor characters constrain the position at which a match may occur.

Character  Meaning
^          Matches the beginning of the string or, with the multi-line flag (m), the beginning of a line. It matches a position, not a character
$          Matches the end of the string or, with the multi-line flag (m), the end of a line. It matches a position, not a character
\b         Matches a word boundary: a position where a word character (\w) is adjacent to a non-word character, such as a space or punctuation mark, or to the start or end of the string. It matches a position, not a character
\B         Matches a position that is not a word boundary
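The anchors in the table can be compared on one made-up, two-line string. The names used here are illustrative only.

```javascript
const text = 'cat concat\ncat';
const count = (re) => (text.match(re) || []).length;

const startOnly = count(/^cat/g);   // only at the start of the string
const eachLine = count(/^cat/gm);   // at the start of every line
const endOnly = count(/cat$/g);     // only at the end of the string
const wholeWord = count(/\bcat\b/g); // 'cat' as a separate word
const insideWord = count(/\Bcat/g);  // 'cat' preceded by a word character

console.log(startOnly, eachLine, endOnly, wholeWord, insideWord); // 1 2 1 2 1
```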

Repetition

When we expect a character to match several times in a row, such as in a mobile phone number or an ID card number, the only way we know so far is to repeat the shortcut character: /\d\d\d\d/. Writing \d eleven times to match an 11-digit number is cumbersome and inflexible, so we use the repetition syntax to match elements that occur more than once.

Repetition syntax

Character  Meaning
{n,m}      Matches the preceding character at least n times, but not more than m times
{n,}       Matches the preceding character n or more times
{n}        Matches the preceding character exactly n times
?          Matches the preceding character 0 or 1 times, that is, the preceding character is optional; equivalent to {0,1}
+          Matches the preceding character 1 or more times, equivalent to {1,}
*          Matches the preceding character 0 or more times, equivalent to {0,}

{n,m} syntax: repetition interval

Sometimes the number of repetitions is not fixed. For example, suppose we need to check both Chinese and Russian landline numbers: a Chinese number looks like 86-10086 and a Russian one like 7-10000. In this case we use the repetition interval {n,m}.

Here the area code may be 1 or 2 digits and is followed by 5 digits, so we write: \d{1,2}\-\d{5}.

The pattern first matches 1 or 2 digits, then a -, and then 5 digits.
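A small sketch of this pattern in use. The ^ and $ anchors are an addition for this example so that each whole string is validated, and the last two inputs are made-up counterexamples. The hyphen is written without a backslash because it needs no escaping outside a character group.

```javascript
const regExp = /^\d{1,2}-\d{5}$/;

const results = ['86-10086', '7-10000', '123-10086', '86-1008'].map((s) => regExp.test(s));

console.log(results); // [true, true, false, false]
```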

{n,} syntax: open-ended interval

Sometimes we do not know the upper bound on the number of repetitions. For example, suppose landline numbers from many countries need to be checked: the lengths of the area code and of the number are not fixed, and only the general format XXX-XXXX is known. In this case we use the open-ended interval {n,}.

For example, \d{1,}-\d{5,} matches an area code of at least 1 digit, then a - symbol, and finally at least 5 digits.
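A sketch of the open-ended interval, assuming the pattern \d{1,}-\d{5,} described above. The anchors and the sample inputs are additions for illustration.

```javascript
const regExp = /^\d{1,}-\d{5,}$/;

const results = ['1-12345', '0086-1234567', '86-1234', '-12345'].map((s) => regExp.test(s));

// '86-1234' has only 4 digits after the hyphen; '-12345' has no area code.
console.log(results); // [true, true, false, false]
```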

{n} syntax: exact repetition

This syntax is relatively simple: it matches the preceding character exactly n times.

For example:

\d matches a digit; followed by the repetition syntax {3}, \d{3} matches three consecutive digits.
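For instance, on a made-up string of digit runs, \d{3} picks out only complete groups of three:

```javascript
// '12' is too short; '6789' yields '678' and leaves a lone '9'.
const matches = '12 345 6789'.match(/\d{3}/g);

console.log(matches); // ['345', '678']
```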

? syntax: optional

The ? syntax indicates that the preceding character matches 1 or 0 times.

For example:

Here ? is placed after an l, making that l optional: if the l is present, the match is hello; if it is absent, the match is helo.
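A minimal sketch of the optional character, assuming the pattern hell?o (the exact pattern from the original figure is not available). The third word, heo, is a made-up counterexample.

```javascript
// The second l is optional; the first one is still required.
const matches = 'hello helo heo'.match(/hell?o/g);

console.log(matches); // ['hello', 'helo']
```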

+ syntax and * syntax

+ syntax: matches the preceding character 1 or more times, equivalent to {1,}.

For example:

/a\d+/g means that an a must be followed by at least one digit to be matched.

A string consisting of a alone does not meet this requirement, so it is not matched.

Building on that example, if we add one or more digits after the a, the string satisfies the regular expression and is matched.

* syntax: matches the preceding character 0 or more times, equivalent to {0,}.

Unlike the + syntax, the * syntax means that an a followed by zero or more digits is matched, so an a with no digits after it matches as well.
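The difference between + and * shows up on a bare a. The sample string below is made up for illustration.

```javascript
const str = 'a a1 a123';

const plus = str.match(/a\d+/g); // at least one digit required
const star = str.match(/a\d*/g); // digits are optional

console.log(plus); // ['a1', 'a123']
console.log(star); // ['a', 'a1', 'a123']
```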

Grouping

Syntax for grouping:

Character  Meaning
(x)        Matches x and remembers the match; $n refers to the remembered matches
(?:x)      Matches x but does not remember the match
x|y        Matches either x or y
\n         Back-reference to the nth capture group

Capture groups

A capture group matches the pattern inside () and remembers what was matched, so that the result can be retrieved through $n. The result of the first capture group is $1, the result of the second is $2, and the result of the nth is $n.

Such as:

// (\d{3}) is the first grouping corresponding to $1
// (\d{4}) is the second grouping corresponding to $2
const regexp = /(\d{3})-(\d{4})/g;
const str = '123-4567';
regexp.exec(str);

console.log(RegExp.$1); // 123
console.log(RegExp.$2); // 4567
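RegExp.$1 to RegExp.$9 are legacy static properties that current documentation marks as deprecated. As an alternative sketch, the captured groups can also be read from the array that exec returns, and $1 and $2 can be used in a replacement string. The variable names here are illustrative.

```javascript
const regexp = /(\d{3})-(\d{4})/;
const str = '123-4567';

// Index 0 is the whole match; indexes 1 and 2 are the capture groups.
const [whole, first, second] = regexp.exec(str);
console.log(whole, first, second); // 123-4567 123 4567

// In a replacement string, $1 and $2 refer to the same groups.
const swapped = str.replace(regexp, '$2-$1');
console.log(swapped); // 4567-123
```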

Non-capturing groups

A non-capturing group matches in the same way as a capture group, except that the matched result is not recorded and cannot be retrieved through $n.

Such as:

// (?:\d{3}) and (?:\d{4}) are non-capturing groups,
// so nothing is stored in $1 or $2
const regexp = /(?:\d{3})-(?:\d{4})/g;
const str = '123-1234';
regexp.exec(str);

console.log(RegExp.$1); // ""
console.log(RegExp.$2); // ""

Back-references

A back-reference is used when the text matched by a group must appear again later in the same pattern.

For example:

<font>Text</font>

We need to match the HTML tag and extract its content.

The pattern for this uses two groups:

  1. (\w+): the group for the HTML tag name, number 1
  2. (.+): the group for the tag content, number 2

Finally, \1 in the closing tag refers back to group 1, which ensures that the closing tag has the same name as the opening tag. After a match, RegExp.$2 gives the content of the tag.
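The original figure with the full pattern is not available, so the following is a sketch assuming the pattern <(\w+)>(.+)<\/\1>, which combines the two groups and the \1 back-reference described above. The mismatched input is a made-up counterexample.

```javascript
const regexp = /<(\w+)>(.+)<\/\1>/;

const ok = regexp.exec('<font>Text</font>');
console.log(ok[1]); // 'font' (group 1: tag name)
console.log(ok[2]); // 'Text' (group 2: tag content)

// The closing tag must repeat whatever group 1 matched.
const mismatched = regexp.exec('<font>Text</span>');
console.log(mismatched); // null
```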

Alternation (or)

Alternation uses the | symbol and, like a character group, matches one of several options. The difference is that a character group can only choose between single characters, while each option of an alternation can consist of several characters.

For example:

/[abc]/g // character group

/abc|asd|dfg/g // alternation
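Applied to a made-up string, the two patterns behave as follows:

```javascript
const str = 'abc asd dfg';

const singleChars = str.match(/[abc]/g);     // one character per match
const words = str.match(/abc|asd|dfg/g);     // one whole option per match

console.log(singleChars); // ['a', 'b', 'c', 'a']
console.log(words);       // ['abc', 'asd', 'dfg']
```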

Lookahead assertions

Positive lookahead

Positive lookahead: (?=expression), such as x(?=y). Reading from left to right, x must be followed by y to satisfy the assertion.

For example:

I like you I like I like I like I like you

To pick out only the occurrences of like that are followed by you, write: like(?= you). The lookahead includes the space that separates the two words.

Negative lookahead

Negative lookahead: (?!expression), such as x(?!y). Reading from left to right, x must not be followed by y to satisfy the assertion.

For example:

I like you I like I like I like I like you

To pick out only the occurrences of like that are not followed by you, write: like(?! you).
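Both kinds of lookahead can be checked against the sample sentence. This sketch records the index of each match; the helper indexesOf is hypothetical, and the patterns include the space before you because the English sample separates words with spaces.

```javascript
const str = 'I like you I like I like I like I like you';

// Start index of every match of a global regular expression.
const indexesOf = (re) => Array.from(str.matchAll(re), (m) => m.index);

const followed = indexesOf(/like(?= you)/g);    // like followed by " you"
const notFollowed = indexesOf(/like(?! you)/g); // like not followed by " you"

console.log(followed);    // [2, 34]
console.log(notFollowed); // [13, 20, 27]
```

In both cases the matched text is just like: the lookahead checks what follows without including it in the match.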

Lookbehind assertions

Positive lookbehind

The only difference between a lookahead and a lookbehind is the direction: a lookahead checks what follows the match (looking from left to right), while a lookbehind checks what precedes it (looking from right to left).

Positive lookbehind: (?<=expression), such as (?<=y)x. x must be preceded by y to satisfy the assertion.

For example, to pick out like only where it has I in front of it and you behind it, write: (?<=I )like(?= you).

Negative lookbehind

Negative lookbehind: (?<!expression), such as (?<!y)x. x must not be preceded by y to satisfy the assertion.

For example, to pick out like only where it has no I in front of it and no you behind it, write: (?<!I )like(?! you).
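A sketch of both lookbehind forms on a made-up sentence (not the sample text above). Lookbehind was added to JavaScript in ES2018, so this assumes a reasonably recent engine.

```javascript
const str = 'I like you, we like you';

const indexesOf = (re) => Array.from(str.matchAll(re), (m) => m.index);

const afterI = indexesOf(/(?<=I )like/g);    // like preceded by "I "
const notAfterI = indexesOf(/(?<!I )like/g); // like not preceded by "I "

console.log(afterI);    // [2]
console.log(notAfterI); // [15]
```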

Practice

Now that we have covered the basics of regular expressions, let's work through some exercises to test ourselves.

Use a regular expression to determine whether a string is a decimal

Task:

Match: 1.2 11.22 0.1 192.123456

Do not match: 1 12 12.12.12.1.1.1.1 999

First, analyze the characteristics of a decimal:

  1. It starts with a digit
  2. It contains exactly one decimal point
  3. It ends with a digit

Answer: /^\d+\.\d+$/g

Let's examine this regular expression:

  1. ^\d+: matches one or more digits at the start of the string
  2. \.: uses an escape to match the decimal point literally
  3. \d+$: matches one or more digits at the end of the string
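The answer can be checked with a small helper. isDecimal is a hypothetical name, and the g flag is left out of this sketch because a global regular expression that is reused across test calls keeps its lastIndex between calls.

```javascript
const isDecimal = (s) => /^\d+\.\d+$/.test(s);

const shouldMatch = ['1.2', '11.22', '0.1', '192.123456'].map(isDecimal);
const shouldNotMatch = ['1', '12', '12.12.12', '999'].map(isDecimal);

console.log(shouldMatch);    // [true, true, true, true]
console.log(shouldNotMatch); // [false, false, false, false]
```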

Verify a mobile phone number format with a regular expression

Task:

Verify that a mobile phone number is correctly formatted.

Rules:

  1. It must be 11 digits long;
  2. The first digit must be 1, the second digit can be any one of 3, 4, 5, 7, 8, and each of the remaining 9 digits can be any digit in [0-9].

Answer: /^1[34578]\d{9}$/g

Let's examine this regular expression:

  1. ^1: matches a 1 at the start of the string
  2. [34578]: matches one of 3, 4, 5, 7, 8 as the second character
  3. \d{9}: matches nine digits
  4. $: matches the end of the string, so that nothing may follow the 11 digits
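A sketch that checks the rules above. Two details are choices made for this example: the character group is written without commas, since a comma inside [] would itself be accepted as a second character, and the pattern ends with $ so that strings longer than 11 digits are rejected. isMobile and the sample numbers are hypothetical.

```javascript
const isMobile = (s) => /^1[34578]\d{9}$/.test(s);

const results = [
  '13812345678',  // valid: 11 digits, starts with 1, second digit 3
  '12812345678',  // second digit 2 is not allowed
  '1381234567',   // only 10 digits
  '138123456789', // 12 digits
].map(isMobile);

console.log(results); // [true, false, false, false]
```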