Preface

Whether you’re writing code or scripts, regular expressions are useful for manipulating strings and extracting important information.

Regular expressions (regex for short) look like the following expression for matching email addresses:

\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

I don’t know if it made you dizzy, but the first time I saw it, my reaction was: what on earth is this?!

So when we need a regular expression, our first choice is of course to copy an existing one from the internet. If nothing suitable exists online, sorry, we’re left with clumsy brute-force methods.

Take this maven-metadata.xml as an example:

<metadata>
    <groupId>QDReader.QDAarCenter</groupId>
    <artifactId>QDUI_Component</artifactId>
    <versioning>
        <latest>0.0.10</latest>
        <release>0.0.10</release>
        <versions>
            <version>0.0.1</version>
            <version>0.0.2</version>
            <version>0.0.3</version>
            <version>0.0.4</version>
            <version>0.0.5</version>
            <version>0.0.6</version>
            <version>0.0.7</version>
            <version>0.0.8</version>
            <version>0.0.9</version>
            <version>0.0.10</version>
        </versions>
        <lastUpdated>20210817095518</lastUpdated>
    </versioning>
</metadata>

Previously, I would first find the line containing <latest>, split it on < and >, and then take the third piece. That takes quite a few steps.

With a regular expression, (?<=latest>)[\d\.]+ gets what I want in one go!
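The Python sketch below puts the two approaches side by side using the standard re module. It assumes the XML has already been loaded into a string, which is trimmed here for brevity. The function names are hypothetical.

```python
import re

# A trimmed copy of the maven-metadata.xml shown above.
METADATA = """<metadata>
    <groupId>QDReader.QDAarCenter</groupId>
    <versioning>
        <latest>0.0.10</latest>
        <release>0.0.10</release>
    </versioning>
</metadata>"""


def latest_by_split(text):
    """Old approach: find the <latest> line, split on < and >, take the third piece."""
    for line in text.splitlines():
        if "<latest>" in line:
            # "        <latest>0.0.10</latest>" splits into
            # ["        ", "latest", "0.0.10", "/latest", ""]
            return re.split(r"[<>]", line)[2]
    return None


def latest_by_regex(text):
    """Regex approach: digits and dots preceded by "latest>"."""
    match = re.search(r"(?<=latest>)[\d\.]+", text)
    return match.group() if match else None


print(latest_by_split(METADATA))  # 0.0.10
print(latest_by_regex(METADATA))  # 0.0.10
```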

Here’s a treasure trove for learning regular expressions:

On this website you can enter a regular expression and the text you want to test. It highlights the matches and even explains why they match. Of course, it offers some other tips as well:

Website: regex101.com/r/lL3C8c/2


One, how to match characters

Square brackets [] are used very frequently. The characters inside the brackets are the candidates to match. They have an "or" relationship, so any one of them matches.

For example, with [World], W matches, o matches, and so on. W, o, r, l, and d are alternatives, and any single one of them matches. Tested against Hello World, it matches the individual characters l, l, o, W, o, r, l, and d, which is every character except H, e, and the space.

Note that the gray dots in the regex101 match result are not periods but spaces. The site probably draws them this way to make spaces easier to see.
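The same check can be sketched in Python with the standard re module. re.findall returns every non-overlapping match as a list.

```python
import re

# [World] matches ONE character that is W, o, r, l, or d.
world_matches = re.findall(r"[World]", "Hello World")
print(world_matches)  # ['l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']
```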

Common forms are as follows:

Symbol             Explanation
[World]            Matches W, o, r, l, or d; you can put any characters you want inside
[0-9]              Matches a digit
[a-z]              Matches a lowercase letter; use [A-Z] for uppercase letters
[\s\S]             Matches any character: \s matches any whitespace character (including newlines) and \S matches any non-whitespace character, so together they cover everything
[\u4e00-\u9fa5]    Matches Chinese characters in the basic CJK range

That covers matching the characters inside the brackets. What if I want to match any character that is NOT in [World]?

Put ^ at the very start of the brackets: [^World] matches any single character that is not W, o, r, l, or d.
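Here is a quick Python check of the table above and of negation. The sample text is made up for illustration. Using \uXXXX escapes inside a pattern assumes Python 3.3 or later.

```python
import re

text = "Room 42b:\n你好 ok"  # hypothetical sample text

digits = re.findall(r"[0-9]", text)              # ['4', '2']
lowercase = re.findall(r"[a-z]", text)           # ['o', 'o', 'm', 'b', 'o', 'k']
chinese = re.findall(r"[\u4e00-\u9fa5]", text)   # ['你', '好']
anything = re.findall(r"[\s\S]", text)           # every character, newline included
not_world = re.findall(r"[^World]", "Hello World")  # ['H', 'e', ' ']

print(digits, lowercase, chinese, not_world)
print(len(anything) == len(text))  # True
```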

In many cases, matching just one kind of character isn’t enough. So how do you match uppercase letters, lowercase letters, and digits together?

Just combine the ranges in one pair of brackets: [a-zA-Z0-9].

If you also want the underscore, [a-zA-Z0-9_] can be written simply as \w. In addition, \d matches any digit. The handy . matches any single character except a line break (\n, and in some engines also \r).

Symbol         Explanation
[a-zA-Z0-9]    Matches a single uppercase letter, lowercase letter, or digit
\w             Equivalent to [a-zA-Z0-9_]: matches a single letter, digit, or underscore (Unicode-aware engines such as Python 3's re also match non-ASCII letters)
.              Matches any single character except a newline
\d             Matches any single digit

In the examples below, we’ll use the text I am JiuXinDev.

Since [] can only match one character, let’s look at how to match quantities.

Two, how to match quantities

Regular expressions often use {} for quantity matching:

Symbol    Explanation
{n}       Matches exactly n times; for example, a{2} matches aa but not a
{n,m}     Matches n to m times; for example, a{1,2} matches aa and also a
{n,}      Matches at least n times (n to infinity); for example, a{2,} matches any run of two or more consecutive a's, but not a single a

Back to the text I am JiuXinDev. If you only want to match am, the literal characters am will do the job.

That’s too easy. What if we want to match both the ma in mac and am? Literal characters alone won’t do it, but we can add another alternative separated by |, like ma|am.

What if we also need to match aa and mm? With {}, the expression [am]{2} matches am, ma, aa, and mm.
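A Python sketch comparing alternation with a repeated character class. The sample sentence is made up for illustration.

```python
import re

text = "I am JiuXinDev, mac, aa, mm"  # hypothetical sample text

# Alternation: either the literal "ma" or the literal "am".
alternation = re.findall(r"ma|am", text)
# Character class repeated twice: any two characters from {a, m}.
pairs = re.findall(r"[am]{2}", text)

print(alternation)  # ['am', 'ma']
print(pairs)        # ['am', 'ma', 'aa', 'mm']
```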

Let’s do an exercise

Question 1: How do you find words of 2 to 5 letters in a sentence? Hint: the boundary between a word and a space can be denoted by \b.

Answer: \b[a-zA-Z]{2,5}\b
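The answer can be checked in Python with a made-up sentence. The \b on both sides stops the pattern from matching the first few letters of a longer word such as learning.

```python
import re

sentence = "I am learning how to use regular expressions"  # hypothetical

short_words = re.findall(r"\b[a-zA-Z]{2,5}\b", sentence)
print(short_words)  # ['am', 'how', 'to', 'use']
```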

Symbols that limit how many times something can match are called quantifiers. Besides the {} family above, there are some shorthand quantifiers:

Symbol    Explanation
?         Matches 0 or 1 times; equivalent to {0,1}
+         Matches 1 or more times; equivalent to {1,}
*         Matches 0 or more times; equivalent to {0,}

Extracting tags is a common task.

Question 2: Given <groupId>QDReader.QDAarCenter</groupId>, how do you extract <groupId> and </groupId>?

Remember the wildcard . introduced above? We just need a pattern that starts with < and ends with >, so let’s try <.*>.

That’s not the right result. It matches the whole line <groupId>QDReader.QDAarCenter</groupId> instead of the individual tags I was looking for.

Don’t panic. Add one more ?, writing it as <.*?>.

That extra ? is about greedy versus non-greedy matching. In a nutshell:

  • Greedy matching: match as much as possible
  • Non-greedy matching: match as little as possible


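Both <groupId> and <groupId>QDReader.QDAarCenter</groupId> start with < and end with >. A greedy match takes the longest candidate, so it returns the whole <groupId>QDReader.QDAarCenter</groupId>.

A non-greedy match takes the shortest candidates, so it produces two results: <groupId> and </groupId>.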

By default, + and * (like the other quantifiers) are greedy. Adding a ? after them, as in +? and *?, makes them non-greedy.
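Greedy and non-greedy matching side by side in Python, applied to the groupId line from the XML above.

```python
import re

line = "<groupId>QDReader.QDAarCenter</groupId>"

greedy = re.findall(r"<.*>", line)   # .* grabs as much as it can
lazy = re.findall(r"<.*?>", line)    # .*? stops at the first possible >

print(greedy)  # ['<groupId>QDReader.QDAarCenter</groupId>']
print(lazy)    # ['<groupId>', '</groupId>']
```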

Three, how to match positions

The commonly used anchors (locators) are:

Symbol    Explanation
^         Matches the position where the text begins; in multiline mode, it also matches the beginning of each line
$         Matches the position where the text ends; in multiline mode, it also matches the end of each line
\b        Word boundary: the position between a word character and a non-word character, such as a space
\B        Non-word boundary: any position that is not a word boundary, such as between two letters of a word

We used \b earlier to locate the beginning and end of a word. Note that ^ means something different at the start of []: there it negates the character set.

Go back to the maven-metadata.xml file at the beginning of this article:

Question 3: How do you get the full content of every line that starts with <version>?

In the maven-metadata.xml above, each <version> tag is preceded by several spaces, which can be expressed as \s*. Combined with the anchor ^, the regular expression can be written as ^\s*<version>.*. Multiline mode must be on, so that ^ matches at the start of every line rather than only at the start of the text.

The same problem can also be solved with $. Every such line ends with </version>, so .*</version>$ matches them.

In some cases, ^ and $ are used together. For example, ^JiuXin$ matches only lines whose entire content is JiuXin.
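A Python sketch of Question 3 and of ^JiuXin$. It assumes a trimmed version of the XML and uses re.MULTILINE so that ^ and $ apply to every line. The sample strings are illustrative.

```python
import re

XML = """<versions>
    <version>0.0.1</version>
    <version>0.0.2</version>
</versions>"""

# Lines that start (after optional whitespace) with <version>
starts = re.findall(r"^\s*<version>.*", XML, flags=re.MULTILINE)
# Lines that end with </version>
ends = re.findall(r".*</version>$", XML, flags=re.MULTILINE)

print(starts)          # ['    <version>0.0.1</version>', '    <version>0.0.2</version>']
print(ends == starts)  # True

# ^...$ together: only lines consisting of exactly "JiuXin"
only_jiuxin = re.findall(r"^JiuXin$", "JiuXin\nI am JiuXinDev\nJiuXin", flags=re.MULTILINE)
print(only_jiuxin)     # ['JiuXin', 'JiuXin']
```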

Four, how to match one of several options

Matching one of several options is usually done with alternation.

1. The parentheses

Pay attention here: the most important tool for alternation is the parentheses ():

Symbol    Explanation
(x|y)     Put the options inside the parentheses, separated by |; matching any one of them counts as success. Parentheses also capture a group: what each pair of parentheses matches is stored in a cache, and each cache can be referred to by its number

That may be a little confusing, so let’s compare () with []:

  • []: each option is a single character, and the whole bracket matches only one character.
  • (): each option can be a character or an entire expression. Much more powerful!

For example, (am|\d+) matches either am or a run of digits of any length.
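A Python sketch contrasting () with [] on a made-up sentence. Because the pattern contains one group, re.findall returns that group's text, which here equals the whole match.

```python
import re

text = "I am 25, not 7"  # hypothetical sample text

# (): each option can be a whole expression
alternatives = re.findall(r"(am|\d+)", text)
# []: each option is one character, and only one character is matched
single_chars = re.findall(r"[am]", text)

print(alternatives)  # ['am', '25', '7']
print(single_chars)  # ['a', 'm']
```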

Practice:

Question 4: Match the words that begin with a or b in a given sentence.

A word is delimited by \b, and the letters after its first one can be written as [a-zA-Z]*. To choose between a and b, use the parentheses from above. Putting it together: \b(a[a-zA-Z]*|b[a-zA-Z]*)\b.

Look closely at the match result and you’ll notice something a little different.

Besides the match itself (Match), regex101 also displays Group information. That is the cache introduced above. You can refer to the nth cache with \n, for example \1 for the first one.

We can use this cache for some interesting tricks. For example, ([a-zA-Z])\1 matches the same letter appearing twice in a row.
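A Python sketch of Question 4 and of the \1 trick. The sentence is made up. m.group(0) is the whole match and m.group(1) is the first cached group.

```python
import re

sentence = "a bird and a bee sat by the cabin"  # hypothetical sample
pattern = re.compile(r"\b(a[a-zA-Z]*|b[a-zA-Z]*)\b")

whole = [m.group(0) for m in pattern.finditer(sentence)]   # Match
group1 = [m.group(1) for m in pattern.finditer(sentence)]  # Group 1
print(whole)   # ['a', 'bird', 'and', 'a', 'bee', 'by']
print(group1 == whole)  # True: the group wraps the whole word here

# \1 repeats whatever group 1 captured: the same letter twice in a row.
doubled = re.findall(r"([a-zA-Z])\1", "bee ball")
print(doubled)  # ['e', 'l']
```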

Practice:

Question 5: Match two consecutive identical words in a given sentence.

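A word can be written as \b\w+\b. Since we need to refer back to it with \1, we wrap it in parentheses. Putting it together gives (\b\w+\b)\s\1. Adding a trailing \b, as in (\b\w+\b)\s\1\b, keeps it from also matching cases such as "the theory".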

Referring back to a cached group like this is called a backreference.
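A Python sketch of Question 5 on a made-up sentence. It shows why the trailing \b matters. Since the pattern has one group, re.findall returns the repeated word.

```python
import re

sentence = "This is is a test, the theory is fine fine"  # hypothetical

# Without a trailing \b, "the theory" is wrongly reported as a repeat.
loose = re.findall(r"(\b\w+\b)\s\1", sentence)
# With a trailing \b, the repeated text must be a whole word.
strict = re.findall(r"(\b\w+\b)\s\1\b", sentence)

print(loose)   # ['is', 'the', 'fine']
print(strict)  # ['is', 'fine']
```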

2. Non-capturing constructs

As mentioned above, whatever () matches is cached. Writing ?: right after the opening parenthesis turns the cache off. There are several of these non-capturing constructs:

Symbol           Explanation
(?:exp)          Groups exp like () but does not cache the match
exp1(?=exp2)     Matches exp1 only if it is followed by exp2
exp1(?!exp2)     Matches exp1 only if it is not followed by exp2
(?<=exp2)exp1    Matches exp1 only if it is preceded by exp2
(?<!exp2)exp1    Matches exp1 only if it is not preceded by exp2

When introducing anchors, we saw that ^ and $ match the beginning and end of a line or of the whole text. With the constructs above, we can also use literal text to mark where a match begins or ends.

So what’s the difference between ?= and ?<=?

As their names suggest, ?= requires the matched part to be followed by the specified content (the match ends right before it). ?<= requires it to be preceded by the specified content (the match begins right after it). In both cases the specified content itself is not part of the result. For example, (?<=he)\w+ matches the word characters that come right after he, and the result does not contain he.
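A Python sketch of the four lookaround constructs on made-up strings. Python's re requires lookbehind patterns to have a fixed width, which all of these do.

```python
import re

# exp1(?=exp2): word followed by ".com"; ".com" is not in the result
ahead = re.findall(r"\w+(?=\.com)", "qq.com mail.org")
# exp1(?!exp2): words starting with Jiu that are NOT followed by Xin
not_ahead = re.findall(r"\bJiu(?!Xin)\w*", "JiuXin Jiumei")
# (?<=exp2)exp1: word characters right after "he"; "he" is not in the result
behind = re.findall(r"(?<=he)\w+", "hello helium")
# (?<!exp2)exp1: numbers NOT preceded by "$"
not_behind = re.findall(r"(?<!\$)\b\d+", "$5 and 7")

print(ahead)       # ['qq']
print(not_ahead)   # ['Jiumei']
print(behind)      # ['llo', 'lium']
print(not_behind)  # ['7']
```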

Remember the problem from the beginning of the article?

Question 6: How do you get the latest version of the Maven library, that is, the 0.0.10 inside <latest>0.0.10</latest>?

Some version numbers contain letters and -, so a version number can be described as [\w\.-]+. It is preceded by <latest>, so the expression can be written as (?<=<latest>)[\w\.-]+.

Now let’s practice the lookahead, which checks what comes after the match:

Question 7: How do you get the QQ number from a QQ email address?

A QQ number here is 4 to 10 digits followed by @qq.com, so the expression is \d{4,10}(?=@qq\.com).
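A Python sketch of Question 7 with made-up addresses. The second part shows a caveat. Without a leading \b, a digit run longer than 10 still yields its last 10 digits. Adding \b rejects it.

```python
import re

emails = "contact: 12345678@qq.com, other: 987654@163.com"  # hypothetical
qq_numbers = re.findall(r"\d{4,10}(?=@qq\.com)", emails)
print(qq_numbers)  # ['12345678']

too_long = "123456789012@qq.com"  # hypothetical 12-digit prefix
print(re.findall(r"\d{4,10}(?=@qq\.com)", too_long))    # ['3456789012']
print(re.findall(r"\b\d{4,10}(?=@qq\.com)", too_long))  # []
```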

At this point, you should be able to read most regular expressions.

Five, how to match according to precedence

A regular expression is evaluated from left to right, but its operators have precedence, listed here from highest to lowest:

Symbol                                           Explanation
\                                                Escape character
(), (?:), (?=), []                               Parentheses and square brackets
*, +, ?, {n}, {n,}, {n,m}                        Quantifiers
^, $, \ plus any metacharacter, any character    Anchors and sequences
|                                                Alternation ("or")

Just like operators in the code we write, regex operators have precedence levels. It’s not that difficult!
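A Python sketch of two practical consequences of precedence. | binds loosest, and quantifiers bind tighter than a sequence of characters. The test strings are made up.

```python
import re

# | has the lowest precedence, so ^ab|cd$ means (^ab) OR (cd$) ...
loose = bool(re.search(r"^ab|cd$", "abxx"))     # True: "^ab" alone is enough
# ... while parentheses make ^ and $ apply to both options.
strict = bool(re.search(r"^(ab|cd)$", "abxx"))  # False

# Quantifiers bind tighter than sequences: ab+ repeats only b, (ab)+ repeats ab.
b_repeated = re.fullmatch(r"ab+", "abbb") is not None     # True
ab_repeated = re.fullmatch(r"(ab)+", "abab") is not None  # True
ab_plus_fails = re.fullmatch(r"ab+", "abab") is None      # True

print(loose, strict, b_repeated, ab_repeated, ab_plus_fails)
```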

Let’s practice:

Question 8: Can the regular expression ((\d)([a-zA-Z]{2}(\d)))\1\2\3\4 match the string 7ac87ac87ac88? If so, what are \1, \2, \3, and \4?

This regular expression looks complicated, but fortunately it doesn’t use the * or + quantifiers, so the length of the match is fixed. Let’s analyze it by precedence.

The outermost layer is a pair of parentheses. Because its opening parenthesis is the first one from left to right, its match is \1.

Removing those parentheses leaves (\d)([a-zA-Z]{2}(\d)). Going from left to right again, the (\d) on the left comes first, so it is \2.

After removing the (\d) on the left, what remains is ([a-zA-Z]{2}(\d)). Its outermost layer is again a pair of parentheses, so, as in the first step, it is \3.

Finally, the (\d) left on the right is \4.

Now look at (\d)([a-zA-Z]{2}(\d)) as a whole: a digit on each side and two letters in the middle. \1 is what this whole expression matches, \2 + \3 equals \1, and \4 is its last digit.

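Applied to 7ac87ac87ac88, the first four characters give \1 = 7ac8, \2 = 7, \3 = ac8, and \4 = 8. The rest of the string must then be 7ac8 + 7 + ac8 + 8 = 7ac87ac88, which is exactly what follows, so the string matches.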
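A quick Python check of Question 8 using the standard re module. The second call shows that backreferences repeat the captured text exactly, case included. A mixed-case string such as 7ac87AC87AC88 therefore does not match unless re.IGNORECASE is used.

```python
import re

PATTERN = r"((\d)([a-zA-Z]{2}(\d)))\1\2\3\4"

m = re.fullmatch(PATTERN, "7ac87ac87ac88")
print(m.groups())  # ('7ac8', '7', 'ac8', '8')

# Group 1 captures "7ac8", so \1 must be "7ac8" again, not "7AC8".
mixed = re.fullmatch(PATTERN, "7ac87AC87AC88")
print(mixed)  # None
```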

Wrapping up

Before learning regular expressions, I always thought they were hard. After working through them, I found there isn’t that much to them.

I hope that after reading this article, you can master regular expressions quickly too!

Thanks for reading. See you next time 👋
