Reprint please indicate the source: juejin.cn/post/698903…

This article is from Rong Hua Xie Hou’s blog

Past review:

Learning regular Expressions together (1) Those Dizzying Metacharacters

Learning regular Expressions together (2) Quantifiers and Greed

Learning regular Expressions together (3) Grouping and Referencing

Learning regular Expressions together (4) Four Common Matching patterns

Learning regular Expressions together (5) Predicate Matching

Learning regular Expressions together (6) Principle of Regular Matching

Learning regular Expressions together (7) Backtracking Traps

0. Write first

Today let’s learn about the matching pattern in the re. The so-called matching pattern refers to some ways in the re that change the metacharacter matching behavior, such as the matching of English letters.

Remember when we were in the second article middle school of greed, not greed and exclusive mode, the mode will change regular matching quantifiers in behavior, today to see some has nothing to do with quantifiers matching mode, there are four, respectively is case-insensitive, dots wildcard, multi-line matching model, the annotation model.

1. Case insensitive mode

As the name implies, case insensitive mode is I want to match Cat in the target string, I don’t care if it’s Cat Cat, Cat Cat Cat, just match me.

Mode modifiers are passed by (? We just need to put the pattern modifier before the corresponding re to use the specified pattern.

The English language is case-insensitive, and mode identifiers are written in lower Case Insensitive letters (Insensitive). I), the chestnut regular mentioned above can be written like this (? I)cat, take a look:

In the last article, we learned about grouping and referencing. I) (cat), 1:

The corresponding Python code is as follows:

import re result = re.findall(r"(? I) (cat) (1) \ ", "the cat cat cat cat") print (result) output: [(' cat ', 'cat'), (' cat ', 'cat')]Copy the code

If we want to match a cat with the same case, we can add a layer of parentheses ((? I)cat) \1, look:

Test link: regex101.com/r/tPXuGX/1

Note: In Python, calling the above re with the regex library will raise the following exception, but will not exactly match two cat cases.

DeprecationWarning: Flags not at the start of the expression
Copy the code
import regex result = regex.findall(r"((? I) cat) (1) \ ", "the cat cat cat cat") print (result) output: [(' cat ', 'cat'), (' cat ', 'cat')]Copy the code

2. Dot wildcard mode

In the first article, we learned about metacharacters and memorized English dots. What does it mean? It can match any character, but it can’t match a newline. When we need to match truly arbitrary characters, we can use [\s\ s] or [\d\ d] or [\w\ w].

But this isn’t elegant enough, so regex provides a pattern that allows the English. To match all characters, including newlines. This pattern is called dot wildcard.

Dot wildcard mode, in many places known as Single Line mode, English stands for Single Line, take the first letter, so the corresponding single-line mode modifier is (? S), for example:

3. Multi-line matching mode

In re ^ matches the beginning of the entire target string, and $user matches the end of the entire target string:

What if we wanted the expression to match the beginning and end of each line? Multiline matches the beginning and end of each line. M), take a look at the effect:

4. Comment mode

When we write a long list of expressions, only you and God will know what it means at the time, and after six months, only God will know what it means.

The English of the Comment is Comment, so the corresponding modifier of the Comment mode is (? #comment), note that there is no initial letter, but also a #, take the IPv4 address matching re as an example:

^ (? : 1 - [9] [0-9]? | 1 [0-9] [0-9] | 2 [0 to 4] [0-9] | [0 to 5]) (25? #comment IP address first value (? : \. (? : 0 | [1-9] [0-9]? | 1 [0-9] [0-9] | 2 [0 to 4] [0-9] | 25 [0 to 5]) {3} (? #comment specifies the IP addressCopy the code

X mode is also provided in many programming languages to write the re, which also acts as a comment. Python for example:

import re regex = r'''(? Mx) # start (\d{4}) # year (\d{2}) # month $# end "result = re.findall(regex, '202006 \ n202106) print (result) output: [(' 2020', '6'), (' 2021 ', '6')]Copy the code

In X mode, all line breaks and Spaces are ignored and can be escaped or placed in character groups if a match is needed:

import re regex = r'''(? Mx) # start (\d{4}) # year [] # space (\d{2}) # month $# end "result = re.findall(regex, '06 \ n2021 06' 2020) print (result) output: [(' 2020 ', '6'), (' 2021 ', '6')]Copy the code

5. Write at the end

Finally, in the summary of the above mentioned content:

Here, the 4 common matching patterns of regular expressions are finished. If you have any questions, please leave a comment.

The online verification tool for regular expressions is regex101.com/