This article was published exclusively on guolin_blog, an official WeChat account.

When reprinting, please credit the source: juejin.cn/post/692093…

This article is from Rong Hua Xie Hou’s blog

Previous articles in this series:

Learning Regular Expressions Together (1): Those Dizzying Metacharacters

Learning Regular Expressions Together (2): Quantifiers and Greed

Learning Regular Expressions Together (3): Grouping and Referencing

Learning Regular Expressions Together (4): Four Common Matching Modes

Learning Regular Expressions Together (5): Assertion Matching

Learning Regular Expressions Together (6): How Regular Expression Matching Works

Learning Regular Expressions Together (7): Backtracking Traps

0. Preface

In development, regular expressions are often used to validate email addresses and mobile phone numbers, and to search and replace text in bulk.

When most developers get such a requirement, the first thing they do is open a browser and search for "email regular expression", then copy and paste whatever they find, test a few cases, and submit it. When something goes wrong later, they have no idea how to fix it and can only turn to helpful strangers online.

This article walks you through the basic usage of regular expressions, so that you gain a working understanding and are no longer baffled when you see one.

Take, for example, this regular expression for IPv4 addresses:

^([1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(\.(0|[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}$

If you haven't read this article yet, this expression may look confusing at first glance. Don't worry: by the end, you'll see that this seemingly complicated expression is actually quite simple.

The following mind map summarizes the main content of this article for quick reference later:

1. Special single characters

In regular expressions, ordinary characters keep their literal meaning. For example, the expression 1 matches the digit 1, and the expression A matches the letter A.

However, when we want to match a whole class of characters, listing every character one by one would be tedious and time-consuming. This is where metacharacters come into play.

  • . The dot wildcard matches any single character except a newline

  • \d The digit wildcard matches any digit from 0 to 9

  • \D With an uppercase D, it matches any non-digit; it is the opposite of \d

  • \w The word-character wildcard matches any letter, digit, or underscore

  • \W With an uppercase W, it matches any character that is not a letter, digit, or underscore

  • \s The whitespace wildcard matches any whitespace character, including space, carriage return, line feed, form feed, and tab

  • \S With an uppercase S, it matches any non-whitespace character
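To see these wildcards in action, here is a small sketch using Python's built-in re module. Python is only our choice for illustration (the article itself is language-neutral), and the sample strings are made up for the demo. One Python-specific assumption: by default Python 3 lets \w match non-ASCII letters too, so re.ASCII is passed where we want exactly letters, digits, and underscore.

```python
import re

text = "Tel: 010-1234\tEmail_ok!"

# \d matches a single digit 0-9; \D+ matches runs of non-digits
digits = re.findall(r"\d", text)
non_digit_runs = re.findall(r"\D+", "a1b22c")

# \w matches a letter, digit, or underscore; \W matches anything else.
# re.ASCII limits \w to [A-Za-z0-9_] (Python 3 otherwise accepts non-ASCII letters).
words = re.findall(r"\w+", text, flags=re.ASCII)
non_words = re.findall(r"\W", text, flags=re.ASCII)

# \s matches one whitespace character; \S+ grabs runs of non-whitespace
spaces = re.findall(r"\s", text)
chunks = re.findall(r"\S+", text)

# . matches any character except a newline
dot_hits = re.findall(r"a.c", "abc a\nc a c")

print(digits)          # ['0', '1', '0', '1', '2', '3', '4']
print(non_digit_runs)  # ['a', 'b', 'c']
print(words)           # ['Tel', '010', '1234', 'Email_ok']
print(non_words)       # [':', ' ', '-', '\t', '!']
print(spaces)          # [' ', '\t']
print(chunks)          # ['Tel:', '010-1234', 'Email_ok!']
print(dot_hits)        # ['abc', 'a c']
```

Note how "a\nc" is skipped by a.c: the newline is the one character the dot refuses to match.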

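That covers the special single characters. In summary: . matches any character except a newline; \d, \w, and \s match a digit, a word character (letter, digit, or underscore), and a whitespace character respectively; and their uppercase forms \D, \W, and \S match the opposite.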

2. Whitespace

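Whitespace characters fall into the following categories, and \s matches any one of them:

  • a space character

  • \t tab

  • \n line feed (newline)

  • \r carriage return

  • \f form feed

  • \v vertical tab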
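A quick Python sketch (again using the standard re module as our illustration language) confirms that \s covers each of these characters, and shows a common practical use: splitting text on runs of any whitespace. The sample sentence is invented for the demo.

```python
import re

# One sample of each whitespace character listed above
samples = {
    "space": " ",
    "tab": "\t",
    "line feed": "\n",
    "carriage return": "\r",
    "form feed": "\f",
    "vertical tab": "\v",
}

# fullmatch succeeds only if \s matches the whole one-character string
matched = {name: re.fullmatch(r"\s", ch) is not None for name, ch in samples.items()}
print(matched)

# Split on runs of any whitespace, however mixed
parts = re.split(r"\s+", "one  two\tthree\nfour")
print(parts)  # ['one', 'two', 'three', 'four']
```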

3. Ranges

  • | means "or", just as you would expect: ab|bc matches either ab or bc

  • […] matches any one of the characters listed inside the brackets. For example, [abc] matches the letter a, b, or c

  • [a-z] matches any single character from a to z; the wildcard \w can be written as [A-Za-z0-9_]

  • [^…] is the negated form: it matches any single character that is not listed inside the brackets

Note: each bracket expression above matches exactly one character at a time.
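The following Python sketch (our choice of language; the test strings are made up) exercises each range form. It also shows the "one character at a time" rule: [abc] applied to "abc" yields three separate matches, not one. The [a-z]+ line borrows the + quantifier, which is introduced in the next section.

```python
import re

# ab|bc : either "ab" or "bc"
alternation = re.findall(r"ab|bc", "ab bc ac")

# [abc] : exactly one of a, b, c; each match is a single character
one_of = re.findall(r"[abc]", "abc")

# [a-z] : one lowercase letter; + (covered next) repeats it
lowercase_runs = re.findall(r"[a-z]+", "Hello World 42")

# [^aeiou] : one character that is NOT a lowercase vowel
not_vowels = re.findall(r"[^aeiou]", "regex")

# For ASCII input, \w and [A-Za-z0-9_] pick out the same characters
sample = "a_Z9-! "
same = re.findall(r"\w", sample, flags=re.ASCII) == re.findall(r"[A-Za-z0-9_]", sample)

print(alternation)     # ['ab', 'bc']
print(one_of)          # ['a', 'b', 'c']
print(lowercase_runs)  # ['ello', 'orld']
print(not_vowels)      # ['r', 'g', 'x']
print(same)            # True
```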

4. Quantifiers

  • * The asterisk means zero or more occurrences: the preceding element may be absent, or may repeat any number of times

  • + The plus sign means one or more occurrences, that is, at least once

  • ? The question mark means zero or one occurrence. For example, https? matches both http and https

  • {m} means exactly m occurrences. For example, a{1} means the letter a must appear exactly once

  • {m,} means at least m occurrences; {0,} is equivalent to the asterisk, and {1,} to the plus sign

  • {m,n} means m to n occurrences (inclusive); {0,1} is equivalent to the question mark
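Here is a minimal Python sketch of each quantifier, using re.fullmatch so that the whole string must match. The helper name full is our own invention for brevity, not part of any library. The final loop checks the equivalences stated above: {0,} behaves like *, {1,} like +, and {0,1} like ?.

```python
import re

def full(pattern: str, s: str) -> bool:
    """True if the entire string s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

# Each line checks one quantifier against strings it accepts and rejects
assert full(r"ab*", "a") and full(r"ab*", "abbb")              # * : zero or more b
assert full(r"ab+", "abb") and not full(r"ab+", "a")           # + : at least one b
assert full(r"https?", "http") and full(r"https?", "https")    # ? : optional s
assert full(r"a{3}", "aaa") and not full(r"a{3}", "aa")        # {m} : exactly m
assert full(r"a{2,}", "aaaaa") and not full(r"a{2,}", "a")     # {m,} : at least m
assert full(r"a{1,3}", "aa") and not full(r"a{1,3}", "aaaa")   # {m,n} : m to n

# The brace forms behave exactly like their symbol shortcuts
for s in ["", "a", "aa", "ab"]:
    assert full(r"a{0,}", s) == full(r"a*", s)
    assert full(r"a{1,}", s) == full(r"a+", s)
    assert full(r"a{0,1}", s) == full(r"a?", s)

print("all quantifier checks passed")
```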

5. Putting it into practice

Now let’s go back to the regular expression at the beginning of the article:

^([1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(\.(0|[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}$

Do you have some idea of how it works now? Let's build it together, starting with the rules:

  • Here we define the range of IPv4 addresses as 1.0.0.0-255.255.255.255. Stricter definitions of valid IPv4 addresses certainly exist, but we won't get tangled up in them here

  • From this range, we get the basic rule [1-255].[0-255].[0-255].[0-255], where [1-255] is informal shorthand for "a number from 1 to 255", not a regex character class

  • The last three parts all use [0-255], so once we write a rule for 0-255, the rest is simple

  • ^ and $ mark the beginning and end of a line; we'll cover them in the next article
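The next article covers ^ and $ properly; for now, this Python sketch shows why the IPv4 expression needs them. The pattern \d+\.\d+ is a made-up simplification for the demo. Without anchors, re.search finds a match anywhere inside a longer string; with them, the whole string has to fit.

```python
import re

# Hypothetical simplified pattern: digits, a dot, digits
core = r"\d+\.\d+"

unanchored = re.search(core, "abc 12.34 xyz")              # found inside the text
anchored = re.search(r"^" + core + r"$", "abc 12.34 xyz")   # whole string must fit
exact = re.search(r"^" + core + r"$", "12.34")

print(unanchored is not None)  # True
print(anchored is not None)    # False
print(exact is not None)       # True

# Python quirk: $ also matches just before a trailing newline,
# while fullmatch() requires the entire string to match
trailing_dollar = re.search(r"^" + core + r"$", "12.34\n") is not None
trailing_full = re.fullmatch(core, "12.34\n") is not None
print(trailing_dollar, trailing_full)  # True False
```

Because of that trailing-newline behavior, re.fullmatch is usually the stricter choice for validation in Python.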

Let’s start:

1. How to express a two-digit range

From the above, we know that a single digit can be written as \d or [0-9]. But what if we want to express a multi-digit range, such as 0-99?

The numbers from 0 to 99 have at least one digit and at most two, so two digit wildcards are enough. For clarity, we write them as [0-9] rather than \d.

Write it like this:

0|[1-9][0-9]?

Here, 0 stands for the number 0 by itself. We don't use [0-9][0-9]? for the whole range, because it would also accept strings with a leading zero, such as 00 or 05. After the | (or), [1-9][0-9]? covers 1 to 99; remember that ? means zero or one occurrence.
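We can check this claim by brute force. The Python sketch below (our own test harness, not from the original article) accepts every number 0-99 with the pattern above and shows that the naive [0-9][0-9]? lets leading zeros slip through. Wrapping the pattern in a non-capturing group (?:...) is a precaution we added: it keeps the | from splitting the pattern when it is later combined with other pieces.

```python
import re

two_digit = r"0|[1-9][0-9]?"   # pattern from the text
naive = r"[0-9][0-9]?"         # the rejected alternative

def accepts(pattern: str, s: str) -> bool:
    # Group the alternatives with (?:...), then require the whole string to match
    return re.fullmatch(f"(?:{pattern})", s) is not None

# Every number 0-99 written without leading zeros is accepted
all_ok = all(accepts(two_digit, str(n)) for n in range(100))
print(all_ok)                                       # True

# Leading zeros and out-of-range values are rejected
rejected = [s for s in ["00", "05", "100", ""] if not accepts(two_digit, s)]
print(rejected)                                     # ['00', '05', '100', '']

# The naive pattern lets leading zeros through
print(accepts(naive, "00"), accepts(naive, "05"))   # True True
```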

2. How to express a three-digit range

With two digits done, three digits are easy. Let's write the range 0 to 255.

Note the following:

  • With three digits, the hundreds digit can only be 1 or 2

  • When the hundreds digit is 2, the tens digit can only be 0-5; with a tens digit of 0-4, the ones digit can be anything from 0 to 9

  • When the number starts with 25, the ones digit can only be 0-5

Putting that together:

0|[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5]

Dropping the standalone 0 alternative turns the 0-255 range into 1-255:

[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5]
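Both ranges are small enough to verify exhaustively. This Python sketch (a test harness of our own, assuming the standard re module) runs every number from 0 to 999 through each pattern and checks that exactly 0-255 and 1-255 are accepted. The constant names OCTET and FIRST_OCTET are our own labels.

```python
import re

# The two patterns from the text; (?:...) groups the alternatives without capturing
OCTET = re.compile(r"(?:0|[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])")      # 0-255
FIRST_OCTET = re.compile(r"(?:[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])")  # 1-255

def accepted_numbers(pattern):
    """Every number from 0 to 999 whose decimal form the pattern matches in full."""
    return [n for n in range(1000) if pattern.fullmatch(str(n))]

print(accepted_numbers(OCTET) == list(range(256)))           # True
print(accepted_numbers(FIRST_OCTET) == list(range(1, 256)))  # True

# Leading zeros are rejected: no alternative starts a multi-digit number with 0
print(OCTET.fullmatch("007"), OCTET.fullmatch("00"))         # None None
```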

3. Combining the parts

Finally, we combine the parts. Remember that {3} means the preceding character or group occurs exactly three times:

Note: don't forget to escape the dot as \., because an unescaped . matches any character.

^([1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(\.(0|[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}$
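Why the escape matters is easiest to see with a stripped-down pattern. In this Python sketch, ^\d+(\.\d+){3}$ is a hypothetical simplification of the IPv4 expression (any digits, four parts). The group followed by {3} repeats the dot and the digits together three times; forgetting the backslash makes the "dot" accept any separator.

```python
import re

# Hypothetical simplified pattern: four runs of digits joined by dots.
# (\.\d+){3} repeats the whole group, dot included, exactly three times.
escaped = r"^\d+(\.\d+){3}$"
unescaped = r"^\d+(.\d+){3}$"   # bug: an unescaped . matches ANY character

for s in ["1.2.3.4", "1x2x3x4"]:
    print(s, re.match(escaped, s) is not None, re.match(unescaped, s) is not None)
# 1.2.3.4 True True
# 1x2x3x4 False True
```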

Done! Doesn't it all suddenly make sense? Let's verify it:
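Besides an online tester, the final expression can be checked in a few lines of Python. This is a sketch: the name is_ipv4 and the sample addresses are our own, and "valid" here follows the article's simplified range 1.0.0.0-255.255.255.255, so 0.1.2.3 is rejected on purpose even though other definitions of IPv4 might accept it.

```python
import re

# The final pattern from the text, split across two raw strings for readability
IPV4 = re.compile(
    r"^([1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"
    r"(\.(0|[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}$"
)

def is_ipv4(s: str) -> bool:
    """True if s lies in the article's range 1.0.0.0-255.255.255.255."""
    return IPV4.fullmatch(s) is not None

valid = ["1.0.0.0", "10.0.0.254", "192.168.1.1", "255.255.255.255"]
invalid = ["0.1.2.3", "256.1.1.1", "1.2.3", "1.2.3.4.5", "01.2.3.4", "1.2.3.04", "a.b.c.d"]

for s in valid + invalid:
    print(f"{s:<16} {is_ipv4(s)}")
```

The first four addresses print True and the rest print False: too few or too many parts, a value above 255, and leading zeros are all rejected.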

6. Closing words

That wraps up the basic usage of regular expressions. If you have any questions, feel free to leave a comment. Thanks for reading.

To test regular expressions online, you can use regex101.com/

In the next article, we will learn about assertions in regular expressions. Stay tuned!