Before we start

Reading the title, you might wonder: why not 30 minutes? Because this article is packed with diagrams, so you can finish it in under 30 minutes. You might think I'm bragging, but when you're done, check your watch and you'll see I'm not. And why the .22? As a science student, keeping two decimal places is an article of faith. I just like the number 2, that's all.

Regular expression

A regular expression (often abbreviated to regex, regexp, or RE in code) is a concept from computer science. Regular expressions are commonly used to search for, replace, and validate text that matches a pattern (a rule).
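To make "search, replace, and validate" concrete, here is a minimal sketch using JavaScript's standard `match`, `replace`, and `test` methods. The sample text and the phone-number-like pattern are made up for illustration; the syntax (`\d`, `{n}`, `^`, `$`) is explained in the sections below.

```javascript
// Sample text, invented for this example
const text = "Call 555-1234 or 555-5678";

// Search: collect every substring that fits "3 digits, a dash, 4 digits"
const found = text.match(/\d{3}-\d{4}/g);

// Replace: mask every match
const masked = text.replace(/\d{3}-\d{4}/g, "XXX-XXXX");

// Validate: ^ and $ force the WHOLE string to fit the pattern
const isValid = (s) => /^\d{3}-\d{4}$/.test(s);

console.log(found);               // [ '555-1234', '555-5678' ]
console.log(masked);              // Call XXX-XXXX or XXX-XXXX
console.log(isValid("555-1234")); // true
console.log(isValid("5551234"));  // false
```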

The RegExp object

In JavaScript, the RegExp object represents a regular expression and is a powerful tool for pattern matching on strings. So how do you create one? There are two ways: a literal or the constructor.

// Literal: the pattern goes between slashes, in the form /regular expression/flags
// \b is a word boundary, so this matches the "hello" in "hello world" but not in "helloworld", where the letters run together with no boundary
var reg = /\bhello\b/g;
// Constructor: the flag g (global match) is passed separately, the pattern is a string not wrapped in /, and each \ must be doubled
var reg = new RegExp('\\bhello\\b', 'g');
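A quick way to convince yourself that the two forms are equivalent is to compare their `source` and `flags` properties. This sketch also shows the case where the constructor is the natural choice: building a pattern from a value only known at runtime (the `word` variable is a hypothetical input).

```javascript
// Literal form: pattern between slashes, flags after the closing slash
const literal = /\bhello\b/g;

// Constructor form: the pattern is a string, so "\\b" in source code becomes \b in the pattern
const constructed = new RegExp("\\bhello\\b", "g");

// Same pattern text, same flags
console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

// \b at work: "hello" is found in "hello world" but not inside "helloworld"
const withSpace = "hello world".replace(literal, "X");    // "X world"
const joined = "helloworld".replace(constructed, "X");    // "helloworld" (no match)

// Building a pattern at runtime (word is a hypothetical input)
const word = "hello";
const dynamic = new RegExp("\\b" + word + "\\b");
console.log(dynamic.test("say hello")); // true
```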

Regular visualization tool

Regulex's visualizations are great for understanding regex. Without further ado, go try the site first; this article uses it to check the examples.

Metacharacters

Regular expressions consist of two basic kinds of characters:

  • Literal characters
  • Metacharacters

A literal character stands for itself: for example, hello in the regex above matches the string hello. A metacharacter does not stand for itself, like the \b above.

Since a metacharacter does not represent itself, what if I want to match the character literally? For example, to match + or *, escape it with a backslash: \+ and \*.

Just skim the metacharacters below for now; you don't need to memorize them to keep reading.

  • $ matches the end of the input string. If the RegExp object's multiline property is set (the m flag), $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
  • ( ) mark the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
  • * matches the preceding subexpression zero or more times. To match the * character, use \*.
  • + matches the preceding subexpression one or more times. To match the + character, use \+.
  • . matches any single character except line terminators such as the newline \n. To match a literal ., use \.
  • [ marks the start of a bracket expression. To match [, use \[.
  • { marks the start of a quantifier expression. To match {, use \{.
  • | indicates a choice between two alternatives. To match |, use \|.
  • ? matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match the ? character, use \?.
  • \ marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a newline. The sequence '\\' matches '\', and '\(' matches '('.
  • ^ matches the start of the input string, unless it is used inside a bracket expression, where it negates the set of characters. To match the ^ character itself, use \^.
  • \cX matches the control character specified by X. For example, \cM matches Control-M, i.e. a carriage return. X must be a letter in A-Z or a-z; otherwise, c is treated as a literal 'c' character.
  • \f matches a form feed character. Equivalent to \x0c and \cL.
  • \n matches a newline character. Equivalent to \x0a and \cJ.
  • \r matches a carriage return character. Equivalent to \x0d and \cM.
  • \s matches any whitespace character, including spaces, tabs, form feeds, and so on. Roughly equivalent to [ \f\n\r\t\v]; in JavaScript it also matches Unicode spaces such as the full-width space.
  • \S matches any non-whitespace character. It is the inverse of \s, i.e. [^\s].
  • \t matches a tab character. Equivalent to \x09 and \cI.
  • \v matches a vertical tab character. Equivalent to \x0b and \cK.
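Here is a small sketch of escaping in practice. The first part shows why `+` must be escaped to be matched literally. The second part uses `escapeRegExp`, a hypothetical helper of our own (not a standard library function) that puts a backslash before every metacharacter so arbitrary text can be matched literally. The last lines show `\s` and `\S`.

```javascript
// Unescaped, + is a quantifier: /1+1/ means "one or more 1s, then a 1"
const unescaped = /1+1/.test("1+1=2"); // false: there is no "11" in the string
// Escaped, \+ matches a literal plus sign
const escaped = /1\+1/.test("1+1=2");  // true

// Hypothetical helper: prefix every metacharacter with a backslash ($& inserts the matched character)
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
const pattern = new RegExp(escapeRegExp("(1+1)*2"));
console.log(pattern.source);              // \(1\+1\)\*2
console.log(pattern.test("x = (1+1)*2")); // true

// \s matches whitespace, including the tab and the full-width space U+3000
const underscored = "a b\tc\u3000d".replace(/\s/g, "_"); // "a_b_c_d"
// \S is the inverse: every non-whitespace character
const hashed = "a b".replace(/\S/g, "#");                // "# #"
```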

Boundaries

From the first example, we know that \b (not b, but \b) means a word boundary. We all know f*ck can be used in many ways: on its own, or with -ing attached. Suppose we only want the standalone f*ck. Let's look at the code:

// As a glorious successor of socialism, how could I use f*ck as an example? Let's use "is" instead.
var reg = /\bis\b/g;
var str = "this is me";
str.replace(reg, 'X');
// "this X me"

var reg = /is/g;
var str = "this is me";
str.replace(reg,'X')
// "thX X me"

The difference between the two is so clear that I need say no more, gentlemen.

Another question: what if I only want the A at the beginning, not an A in the middle or at the end of the text? This is all it takes:

var reg = /^A/g;
var str = "ABA";
str.replace(reg,'X');
// "XBA"

A regex that matches an A at the end looks like this:

var reg = /A$/g;
var str = "ABA";
str.replace(reg,'X');
// "ABX"

Note that, matching the positions they stand for, ^ goes at the beginning of the regular expression and $ goes at the end.
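The metacharacter list above mentions that the multiline setting changes how `^` and `$` behave. This sketch, using a made-up three-line string, shows the difference: without the `m` flag, `^` and `$` only match at the start and end of the whole string; with `m`, they also match at each line.

```javascript
const lines = "A1\nA2\nA3";

// Without m, ^ only matches at the very start of the whole string
const startOnly = lines.replace(/^A/g, "X");  // "X1\nA2\nA3"
// With m, ^ also matches right after each line break
const everyLine = lines.replace(/^A/gm, "X"); // "X1\nX2\nX3"

// Same idea for $: without m only the final digit is replaced ...
const endOnly = lines.replace(/\d$/g, "#");   // "A1\nA2\nA#"
// ... with m, $ also matches just before each line break
const lineEnds = lines.replace(/\d$/gm, "#"); // "A#\nA#\nA#"
```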

Character classes

In general, each character in a regular expression corresponds to one character in the string. For example, the expression \bhello matches a word boundary followed by the characters h, e, l, l, o.

What if we want to match any one of a class of characters? For example, to match a, b, or c, we can use the metacharacters [] to build a simple class, [abc], which groups a, b, and c together and matches any one of them. Even if your English is rusty, the visualization is easy to read: one of a, b, c, i.e. any one of abc.
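A short sketch of a character class in action, with made-up sample strings. It also shows a common pitfall: inside brackets a comma is an ordinary character, so writing the class as `[a,b,c]` quietly adds `,` to the set.

```javascript
// [abc] matches ONE character that is a, b or c
const replaced = "abcd".replace(/[abc]/g, "X"); // "XXXd"

// Inside brackets, "," is just another member of the class
const withComma = "a,d".replace(/[a,b,c]/g, "X"); // "XXd": the comma is replaced too
const noComma = "a,d".replace(/[abc]/g, "X");     // "X,d"

// Classes are case-sensitive: [abc] does not match "A"
const upper = /[abc]/.test("A"); // false
```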

Range classes

With that, if we want to match a digit from 0 through 9, we would write [0123456789].

But what if I want to match even more characters, say every letter? Wouldn't the keyboard break? Isn't regex a bit dumb here? As you'd expect, the people who designed regex thought of this too, so we can write a range instead: [0-9].

Okay, that's a lot easier. But then you might ask: what about the dash - itself? What if I want to match 0-9 and the dash? Don't panic, just add it back at the end: [0-9-]. The visualization clearly shows two branches: a character from 0 to 9, or the dash.
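The range discussion can be checked directly. This sketch (sample strings invented) confirms that `[0123456789]` and `[0-9]` behave the same, that a trailing `-` is a literal dash, and that several ranges can share one class.

```javascript
// Listing every digit by hand ...
const longForm = "a1b2".replace(/[0123456789]/g, "X");
// ... gives the same result as the range form
const rangeForm = "a1b2".replace(/[0-9]/g, "X"); // "aXbX"

// A dash at the end of the class is literal: digit OR dash
const phone = "tel: 555-1234".replace(/[0-9-]/g, "X"); // "tel: XXXXXXXX"

// Ranges combine: lowercase, uppercase and digits in one class
const mixed = "Ab3_!".replace(/[a-zA-Z0-9]/g, "X"); // "XXX_!"
```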

Predefined classes

With what we've learned so far, we can write a regex that matches a digit: [0-9].

Is there an even shorter way?

As it happens, regex is powerful enough to provide one.

You may have already spotted it in the metacharacters section above.

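Here are the predefined classes:

  • . is equivalent to [^\r\n]: any character except line terminators such as carriage return and newline
  • \d is equivalent to [0-9]: a digit
  • \D is equivalent to [^0-9]: a non-digit
  • \s is roughly equivalent to [ \t\n\x0B\f\r]: a whitespace character
  • \S is equivalent to [^\s]: a non-whitespace character
  • \w is equivalent to [a-zA-Z_0-9]: a word character (letter, digit, or underscore)
  • \W is equivalent to [^a-zA-Z_0-9]: a non-word character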

We can remember these predefined classes by their English words: d for digit, s for space, w for word. Notice that the uppercase version is the inverse of the lowercase one! To invert a class of your own, put ^ at its start: [^abc] matches any character that is none of a, b, c.
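To see each predefined class and its uppercase inverse side by side, here is a sketch that runs them over one made-up string, plus a hand-built negated class.

```javascript
const s = "id_7 = 42;";

const digits = s.replace(/\d/g, "D");    // "id_D = DD;"
const nonDigits = s.replace(/\D/g, "."); // "...7...42."
const words = s.replace(/\w/g, "W");     // "WWWW = WW;" (the underscore counts as a word character)
const nonWords = s.replace(/\W/g, "#");  // "id_7###42#"
const spaces = s.replace(/\s/g, "_");    // "id_7_=_42;"

// \d and [0-9] agree here
const sameAsRange = s.replace(/[0-9]/g, "D") === digits; // true

// [^...] negates a class you build yourself: anything except a, b, c
const negated = "abcd".replace(/[^abc]/g, "X"); // "abcX"
```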

Quantifiers

What if you had to write a regex that matches 10 digits? What would you write? You probably have it figured out already:

\d\d\d\d\d\d\d\d\d\d

Surprise! Even though you’ve been single for more than 20 years, your right hand still feels a little weak!

Sometimes that's just what overwork does.

To save some people's right arms, regex has quantifiers.

To meet the requirement above, all we need is \d{10}.

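For the convenience of those who don't speak English well, like me, I even used the little-known Baidu Translate. The quantifiers are:

  • ? zero or one time (at most once)
  • + one or more times (at least once)
  • * zero or more times (any number of times)
  • {n} exactly n times
  • {n,m} between n and m times
  • {n,} at least n times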

But what if I don't know exactly how many digits to match, only that it's somewhere between 100 and 1000? Then we write \d{100,1000}.

Let’s look at the results of the visualization tool to make it easier to understand

Notice that {n,m} is a closed interval: both n and m are included.
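The quantifiers can be checked with small inputs. In this sketch the strings are invented and `^...$` anchors are added so `test` checks the whole string; it confirms that `{n,m}` includes both ends and shows how `?`, `+`, and `*` relate to the brace forms.

```javascript
// Exactly 10 digits
const exactlyTen = /^\d{10}$/;
const ten = exactlyTen.test("1234567890"); // true
const nine = exactlyTen.test("123456789"); // false

// {3,5} is inclusive at both ends: 2 fails, 3 and 5 pass, 6 fails
const threeToFive = /^\d{3,5}$/;
const results = ["12", "123", "12345", "123456"].map((s) => threeToFive.test(s));
console.log(results); // [ false, true, true, false ]

// ? is {0,1}: the u is optional
const optional = [/^colou?r$/.test("color"), /^colou?r$/.test("colour")]; // [true, true]
// + is {1,} and * is {0,}: only * accepts the empty string
const plusVsStar = [/^a+$/.test(""), /^a*$/.test("")]; // [false, true]
// {n,} means at least n
const atLeastTwo = [/^\d{2,}$/.test("1"), /^\d{2,}$/.test("12345")]; // [false, true]
```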

Greedy versus non-greedy

From the above, we know that to match 100 to 1000 digits, we write \d{100,1000}. But what if the string contains 1000 digits and I only want to match the first 100?

If we write it as above, here is what happens (with a smaller range, to keep it readable):

var reg = /\d{3,6}/;
var str = "123456789";
str.replace(reg, 'X');
// "X789"

As we can see, this matches and replaces six digits, even though the regex allows anywhere from three to six.

Yes, it's greedy! It matches as much as it can. Greedy matching is the default. If we don't want it to be greedy and would rather it settle for less, just add ? after the quantifier:

var reg = /\d{3,6}?/;
var str = "123456789";
str.replace(reg, 'X');
// "X456789"

Clearly, the regex now matches only the first three digits. This is the non-greedy (lazy) mode.
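Using `match` instead of `replace` makes the difference easier to see, because it returns what was actually matched. This sketch repeats the example above, adds the global flag, and shows the classic HTML-tag case (the tag string is made up).

```javascript
const str = "123456789";

// Greedy (default): take as many digits as {3,6} allows
const greedy = str.match(/\d{3,6}/)[0];  // "123456"
// Non-greedy: stop as soon as the minimum is reached
const lazy = str.match(/\d{3,6}?/)[0];   // "123"

// With g, the difference repeats across the whole string
const greedyAll = str.match(/\d{3,6}/g);  // ["123456", "789"]
const lazyAll = str.match(/\d{3,6}?/g);   // ["123", "456", "789"]

// Classic case: greedy .+ runs to the LAST ">", lazy .+? stops at the first one
const html = "<b>bold</b>";
const greedyTag = html.match(/<.+>/)[0]; // "<b>bold</b>"
const lazyTag = html.match(/<.+?>/)[0];  // "<b>"
```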

Branch conditions

What if I need to match exactly 100 digits or exactly 1000 digits? There are only two possibilities, 100 and 1000, not any count in between. How do we handle that? This is what branch conditions (alternation) are for:

\d{100}|\d{1000}

Note that | splits the whole expression into everything on its left and everything on its right, not just the characters immediately next to the symbol. For example, abc|def means abc or def, not ab, then c or d, then ef.

Sometimes only part of the pattern branches while the rest is shared; in that case, wrap just the branching part in (), e.g. gr(a|e)y.

Note: at each position, the branches are tried from left to right. If the left branch matches, the right branch is never tried!

var reg = /\d{4}|\d{2}/;
var str = "12345";
str.replace(reg, 'X');
// "X5"

var reg = /\d{2}|\d{4}/;
var str = "12345";
str.replace(reg, 'X');
// "X345"
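This sketch, with invented strings, shows how far `|` reaches and why the parentheses matter: without them, `gra|ey` splits into "gra" or "ey", not "gray" or "grey".

```javascript
// | splits the WHOLE pattern: "gray" or "grey"
const whole = "gray grey".replace(/gray|grey/g, "X"); // "X X"

// Group only the part that differs; "gr" and "y" stay shared
const grouped = "gray grey gruy".replace(/gr(a|e)y/g, "X"); // "X X gruy"

// Without the group, the branches are "gra" and "ey"
const ungrouped = "gray grey".replace(/gra|ey/g, "X"); // "Xy grX"

// Left-to-right order: the shorter branch wins if it comes first
const order = "12345".match(/\d{2}|\d{4}/)[0]; // "12"
```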

Lookahead/lookbehind

Sometimes whether a character should match depends on the characters around it. For example, I want to replace two consecutive digits, but only when they are followed by two letters. You might wonder: isn't that easy?

\d{2}\w{2}

That matches 2 digits followed by 2 more characters: 4 characters in total. If I replace the match, all 4 characters get replaced, but we only want to replace the 2 digits! This is where assertions come in. First, we need to understand a couple of things:

  • A regular expression is parsed from the start of the text toward the end. The direction toward the end of the text is called "forward" (ahead).
  • Lookahead means that after the regex matches a rule ("2 digits" here), it looks ahead to check whether the assertion also holds ("followed by 2 letters" here). Lookbehind does the opposite and checks the text before the match. (Older JavaScript engines did not support lookbehind; it was added in ES2018.)

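Here's the table:

  • exp1(?=exp2): positive lookahead, exp1 followed by exp2
  • exp1(?!exp2): negative lookahead, exp1 not followed by exp2
  • (?<=exp2)exp1: positive lookbehind, exp1 preceded by exp2
  • (?<!exp2)exp1: negative lookbehind, exp1 not preceded by exp2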

With the table, we can solve the problem. Note that \w also includes digits, so we use [a-zA-Z] for the letters:

var reg = /\d{2}(?=[a-zA-Z]{2})/;
var str = "1a23bc456def";
str.replace(reg, 'X');
// "1aXbc456def"

Only the digits are replaced; the text checked by the assertion is left alone.

While we're at it, let's look at the negative form.

(?!...) means "not followed by". I think you can work out how to use it.
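Here is a sketch of all four assertions on small, made-up strings. The lookbehind lines assume an ES2018+ engine (modern browsers and Node.js support it).

```javascript
const str = "1a23bc456def";

// Positive lookahead: two digits that ARE followed by two letters
const ahead = str.replace(/\d{2}(?=[a-zA-Z]{2})/, "X");    // "1aXbc456def"
// Negative lookahead: two digits that are NOT followed by two letters
// ("23" is skipped because "bc" follows; "45" matches because "6d" follows)
const notAhead = str.replace(/\d{2}(?![a-zA-Z]{2})/, "X"); // "1a23bcX6def"

// Positive lookbehind (ES2018+): two digits preceded by two letters
const behind = "ab12cd34".replace(/(?<=[a-z]{2})\d{2}/g, "X"); // "abXcdX"
// Negative lookbehind: two digits NOT preceded by two letters
const notBehind = "ab12 34".replace(/(?<![a-z]{2})\d{2}/g, "X"); // "ab12 X"
```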

Grouping

What do we do when we want to match a whole word repeated three times, rather than a single character? You might write it like this:

hello{3}

Then you open the visualization tool

Oh my God! It only repeats my o! What a letdown.

In fact, we can create a group simply with (), so that the quantifier applies to the whole group: (hello){3}. These are the same parentheses used in the branch conditions above.
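This sketch (strings invented, `^...$` anchors added so the whole string is checked) shows the quantifier applying to one character versus a whole group. The last line previews a backreference: `\1` matches whatever group 1 actually captured, which the next example relies on.

```javascript
// Without parentheses, {3} applies only to the "o"
const oOnly = /^hello{3}$/;
const ooo = oOnly.test("hellooo");             // true
const tripleWord = oOnly.test("hellohellohello"); // false

// With parentheses, {3} applies to the whole group
const wholeWord = /^(hello){3}$/.test("hellohellohello"); // true

// \1 refers back to the text group 1 captured: here, a doubled character
const doubled = "hello".match(/(\w)\1/)[0]; // "ll"
```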

How do we use what a group captured? First, how would we match 8 digits with no digit repeated? Without groups, there's no way to even start, because you can't tell whether a digit repeats! Let's reveal the answer first, (?!.*(\d).*\1)\d{8}, and then analyze it piece by piece:

  • First, the overall shape is (?!A)B, a negative lookahead assertion: at the position where expression B starts matching, assertion A must not match. The design is: assertion A says "a repeated digit occurs", expression B is "8 digits", so the 8 digits must not satisfy "a repeated digit occurs".
  • Next, .*(\d).* finds a digit at any position. Why any position? Because the repetition we're checking for can occur anywhere; as the visualization shows, \d has 0 to N characters before and after it, so its position is arbitrary.
  • Here's the key: what does \1 mean? Look closely and you'll see that \d is wrapped in parentheses, which form a group. What is its group number? The first capturing parenthesis is group 1 by default; a second one would be group 2. The lookahead's own parentheses do not count. You may wonder: is \1 just another \d written at this position? Nope, it's more than that: it refers to the digit that \d actually matched, so it must be the same value! Isn't that exactly repetition? So .*(\d).*\1 means the same digit appears twice, at any positions.
  • Finally, putting it all together: match 8 digits in which no digit repeats. Because (?!...) is a negative lookahead, the 8 digits must not satisfy "a repeated digit occurs"... emmm... so that makes sense.
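To try the answer out, here is a sketch that wraps it in `^...$`. The anchors are our own assumption, added so that `test` requires the whole string to be exactly 8 digits; without them the lookahead would inspect everything after the match start, which makes the behavior on longer strings harder to reason about.

```javascript
// The pattern from the text, plus ^ and $ (our assumption) so the whole string is checked
const noRepeat8 = /^(?!.*(\d).*\1)\d{8}$/;

const distinct = noRepeat8.test("12345678"); // true: all 8 digits differ
const repeated = noRepeat8.test("12345671"); // false: 1 appears twice
const tooShort = noRepeat8.test("1234567");  // false: only 7 digits

// The assertion part on its own: find a digit that appears again later
const pair = "1213".match(/(\d).*\1/)[0];     // "121"
const noPair = /(\d).*\1/.test("12345678");   // false
```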

There is much more to say about groups, but space is limited and the 30 minutes are almost up, so I've only picked the most valuable, commonly used parts.


That's it for regex. The next article will cover the properties and methods of the RegExp object in JavaScript. If you have any comments or suggestions, please leave them in the comments section. Thank you!

