Front-end advances: regular expressions

High-quality regular expressions for the front end!

After reading this article you will not be far from mastering regular expressions, so start scrolling!

Two ways to create a regex

  /pig/  // regex literal

  const reg = new RegExp('pig', 'g')  // create a regex with the RegExp constructor
  // or
  const r = '\\d'
  const reg2 = new RegExp(r, 'g')
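
As a quick check, the two forms describe the same pattern. The sketch below compares their source and flags; the variable names and the run-time value are chosen only for this illustration.

```javascript
// A literal and a constructor call that describe the same pattern.
const fromLiteral = /pig/g
const fromConstructor = new RegExp('pig', 'g')

console.log(fromLiteral.source === fromConstructor.source) // true
console.log(fromLiteral.flags === fromConstructor.flags)   // true

// The constructor is the form to reach for when the pattern is only known at run time.
const word = 'dog' // hypothetical run-time value
const dynamic = new RegExp(word, 'i')
console.log(dynamic.test('DOG')) // true
```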


Syntax

Metacharacters

\d matches a digit, the same as [0-9]. Here 0-9 is a range from 0 to 9; a narrower range such as 2-8 works too. \D matches anything that is not a digit. In general the uppercase form is the negation of the lowercase one.

\b matches a word boundary, that is, the position between a word character and a non-word character such as a space; \B is the negation

\w matches a letter, digit, or underscore; \W is the negation

\s matches whitespace: space, tab, form feed, newline, etc.

/\s/.test('\n') // true

[a-z] lowercase letters, [A-Z] uppercase letters, [A-Za-z] both lowercase and uppercase letters

. matches any character except a newline
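
The metacharacters above can be tried side by side. This is a minimal sketch in which each row pairs a test with the result I would expect; the sample strings are made up for the illustration.

```javascript
// Each row: [actual result, expected result].
const checks = [
  [/\d/.test('a1'), true],               // contains a digit
  [/\D/.test('123'), false],             // no non-digit character
  [/\w/.test('_'), true],                // underscore counts as a word character
  [/\W/.test('pig_1'), false],           // only word characters here
  [/\s/.test('\t'), true],               // tab is whitespace
  [/\bpig\b/.test('a pig here'), true],  // pig stands alone as a word
  [/\bpig\b/.test('pigs'), false],       // no boundary between g and s
  [/^.$/.test('\n'), false],             // . does not match a newline
]
const allAsExpected = checks.every(([actual, expected]) => actual === expected)
console.log(allAsExpected) // true
```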

Quantifiers

A quantifier says how many times the preceding item must occur for a match.

+ one or more times. For example, re+ matches re, ree, etc., but not r: e must appear at least once

* zero or more times. For example, re* matches r, re, ree, etc.

? zero times or once (it can also be appended to another quantifier, see the section on greed)

{n} exactly n times. /p{2}/ matches pp, two consecutive occurrences of p

{n,} at least n times

{n,m} from n to m times
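
A small sketch of the quantifiers in action. The patterns are wrapped in ^ and $ (covered in the next section) so that the whole string has to fit; the object name is only for this illustration.

```javascript
const quantifierResults = {
  plusMatchesRee: /^re+$/.test('ree'),      // true: one or more e
  plusMatchesR: /^re+$/.test('r'),          // false: e is required
  starMatchesR: /^re*$/.test('r'),          // true: zero e is allowed
  optionalMatchesRee: /^re?$/.test('ree'),  // false: at most one e
  exactlyTwo: /^p{2}$/.test('pp'),          // true: exactly two p
  atLeastTwo: /^p{2,}$/.test('pppp'),       // true: two or more p
  twoToThree: /^p{2,3}$/.test('pppp'),      // false: more than three p
}
console.log(quantifierResults)
```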

Boundaries

^ matches the position at the beginning of the string. It matches a position, not a character: nothing is consumed

$ matches the position at the end of the string

Take this requirement: validate that a string is exactly pig, a string of length 3.

/pig/.test('pig') // true

At first glance it looks fine

/pig/.test('pigg') // true

But pigg passes too. I want exactly pig, of length 3, so the pattern needs a leading ^ and a trailing $.

/^pig$/.test('pigg') // false
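
One way to see that ^ and $ match positions rather than characters is to replace them: text is inserted, and nothing is removed. The helper name isExactlyPig is made up for this sketch.

```javascript
// Replacing the two positions inserts a marker at each end of the string.
const marked = 'pig'.replace(/^|$/g, '#')
console.log(marked) // "#pig#"

// Hypothetical helper: true only when the whole string is pig.
const isExactlyPig = (s) => /^pig$/.test(s)
console.log(isExactlyPig('pig'))  // true
console.log(isExactlyPig('pigg')) // false
```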

Alternation

| matches either the whole expression on its left or the whole expression on its right (not just the single characters next to it).
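
A short sketch of how far | reaches. Without parentheses it splits the whole pattern; inside parentheses it only chooses between the alternatives in that group. The sample strings are invented for the illustration.

```javascript
const eitherWord = /pig|dog/       // "pig" or "dog"
const groupedChoice = /pi(g|d)og/  // "pigog" or "pidog"

const found = 'I have a dog'.match(eitherWord)[0]
console.log(found)                       // "dog"
console.log(groupedChoice.test('pidog')) // true
console.log(groupedChoice.test('pigog')) // true
console.log(groupedChoice.test('pig'))   // false: "og" must still follow
```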

Groups

() A group: the parentheses turn what they enclose into a single unit.

I want to match '2021-05-16' or '2021/05/16', where both separators are the same, but not a mixed form such as '2021/05-16'.

const str = '2021-05-16'
const str2 = '2021/05/16'
/\d{4}([\/-])\d{2}\1\d{2}/.test(str)   // true
/\d{4}([\/-])\d{2}\1\d{2}/.test(str2)  // true

\1 stands for whatever the first group, ([\/-]), matched; \2 stands for the second group, and so on.
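
The effect of the backreference shows most clearly on a string with mixed separators. This sketch adds ^ and $ to the pattern, which is my own addition so that the whole string is checked.

```javascript
// \1 must repeat exactly what ([\/-]) captured, so both separators have to agree.
const datePattern = /^\d{4}([\/-])\d{2}\1\d{2}$/

console.log(datePattern.test('2021-05-16')) // true
console.log(datePattern.test('2021/05/16')) // true
console.log(datePattern.test('2021/05-16')) // false: the group captured "/", then "-" follows
```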

Here is a regex puzzle:

Should match: aaaa-aa=aa, aaaaa-a=aaaa, aaa-a=aa, aaaa-aaa=a
Should not match: aaaa-a=aa, -=a

The answer is at the end.

match returns an array: index 0 holds the whole matched text, and indexes 1, 2, 3… hold what each group matched.

'pig'.match(/p((i)(g))/)  // ["pig","ig","i","g"]

Sometimes you get a result like this:

'www.xxx.cn'.match(/\w+\.\w+\.(cn|com|org)/)  // ["www.xxx.cn","cn"]
// There is no need to keep the trailing group (the cn part). A non-capturing group (?:) leaves it out:
'www.xxx.cn'.match(/\w+\.\w+\.(?:cn|com|org)/)  // ["www.xxx.cn"]

'pigpig'.match(/(pig)+/) // ["pigpig","pig"]

Named groups: the result of match also has a groups property, an object that stores each group name and its matched text as a key-value pair.

const str = 'name=pig'
str.match(/(?<key>.+)=(?<value>.+)/)
// result.groups:
// {
//   key: "name",
//   value: "pig"
// }
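
Named groups pair well with destructuring. The sketch below reads the groups object directly and then, as an assumption about how one might extend the idea, uses a hypothetical parsePairs helper to read several key=value pairs separated by semicolons.

```javascript
const pair = 'name=pig'.match(/(?<key>.+)=(?<value>.+)/)
const { key, value } = pair.groups
console.log(key, value) // name pig

// Hypothetical helper: turn "a=1;b=2" into an object, one named match per pair.
function parsePairs(text) {
  const result = {}
  for (const m of text.matchAll(/(?<key>[^=;]+)=(?<value>[^;]*)/g)) {
    result[m.groups.key] = m.groups.value
  }
  return result
}

console.log(parsePairs('name=pig;age=3')) // { name: 'pig', age: '3' }
```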


Character classes

[] The alternation | was covered above. When I want to match one character that is p, i, or g, I can write /p|i|g/. When there are more options, a character class is the better choice, such as /[pig]/: any single character listed in the class satisfies the pattern. [^] negates the class: /[^pig]/ matches any character other than p, i, and g. Inside a class, most metacharacters lose their special meaning: /[.+]/ matches a literal . or +.

/[\d]/    // matches a digit

/[^\d]/   // matches anything except a digit, same as \D

/[^\dpig]/  // matches any character except digits and the letters p, i, g

const str = ` p i g `  // capture every run of non-whitespace characters
str.match(/[^\s]+/g)  // ["p", "i", "g"]
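
A few more checks on character classes, as a sketch with made-up inputs: a class stands for exactly one character, ^ negates it, and . and + are literal inside it.

```javascript
const oneOfPig = /^[pig]$/.test('i')       // true: one character from the set
const wholeWord = /^[pig]$/.test('pig')    // false: a class matches a single character
const notInPig = /^[^pig]$/.test('x')      // true: x is not p, i, or g
const literals = '1+1=2.0'.match(/[.+]/g)  // the literal + and . characters

console.log(oneOfPig, wholeWord, notInPig)
console.log(literals) // ["+", "."]
```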


Escaping

\ The escape character deserves its own section, because it behaves differently in the two ways of creating a regex. Take this requirement: match a positive number with a decimal point, such as 1.23, using the metacharacters described above.

/^\d+.\d+$/.test('1.23')    // true, seems OK. Try something else

/^\d+.\d+$/.test('1@23')  // true, which is wrong

The problem is the . in the regex: as described above, it stands for any character. What I want is a literal ., so it has to be escaped with \, as follows

/^\d+\.\d+$/.test('1.23')    // true

/^\d+\.\d+$/.test('1@23')  // false

Another important point: escapes are written differently in the two ways of creating a regex. In a literal they are written as shown above. With the RegExp constructor the pattern is passed as a string, as shown below

const reg = new RegExp('\d') // the string '\d' is passed here, so what does it match?
reg.test(1)  // false
reg.test('\d') // true
// In a string, \ is itself the escape character, and '\d' === 'd' is true: d escaped with \ is still d. So the pattern has to be written like this
const reg2 = new RegExp('\\d')
reg2.test(1) // true
// Printing reg on the console shows /d/ rather than /\d/, so double the backslash for \d, \w, etc. when creating a regex with the constructor.
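
When part of the pattern comes from plain text, every metacharacter in that text needs escaping before it reaches the constructor. The sketch below uses a commonly seen helper; the name escapeRegExp is an assumption for this illustration, not a built-in function.

```javascript
// Hypothetical helper: put a backslash in front of every regex metacharacter.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Build the decimal-number pattern from pieces; the dot is escaped by the helper.
const literalDot = new RegExp('^\\d+' + escapeRegExp('.') + '\\d+$')

console.log(literalDot.source)       // ^\d+\.\d+$
console.log(literalDot.test('1.23')) // true
console.log(literalDot.test('1@23')) // false
```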

Having read this far, why not try one? I want to match the div tag and the text inside it, which spans several lines:

const str = `dog🐶 can't catch me
<div>
  Pig 🐷 catch me
</div>
la la la la`
str.match(/<div>[\s\S]+<\/div>/)


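[\s\S] is used here because . does not match newlines. \s matches whitespace, newlines included, and \S matches everything that is not whitespace, so together they match any character at all. [\d\D] works the same way.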
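
The difference is easy to see on a short multi-line string. The sketch also shows the s flag (dotAll, added in ES2018), which is an alternative not covered above; the sample string is made up.

```javascript
const multiLine = '<div>\n  pig\n</div>'

const withDot = /<div>.+<\/div>/.test(multiLine)       // false: . stops at the newline
const withClass = /<div>[\s\S]+<\/div>/.test(multiLine) // true: the class covers every character
const withDotAll = /<div>.+<\/div>/s.test(multiLine)    // true: with s, . matches newlines too

console.log(withDot, withClass, withDotAll)
```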

Pattern modifiers

i, g, m

i: case insensitive, /pig/i.test('PIG') // true

g: global match, /pig/g.test('pig') // true

'digdig'.replace(/d/, 'p')  // pigdig, only the first d is replaced
'digdig'.replace(/d/g, 'p')  // pigpig
'Digdig'.replace(/d/ig, 'p') // pigpig
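
One thing to watch with g: a regex object with this flag remembers where the last match ended in its lastIndex property, so reusing the same object with test can give alternating results. A minimal sketch, with variable names chosen for the illustration:

```javascript
const globalPig = /pig/g

const first = globalPig.test('pig')  // true, and lastIndex becomes 3
const second = globalPig.test('pig') // false: the search resumed at index 3
const third = globalPig.test('pig')  // true: lastIndex was reset to 0 after the failure

console.log(first, second, third)
```

Leaving out g, or creating a fresh regex for each check, avoids the surprise.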

m: multi-line match. With m, ^ and $ also match at the start and end of each line of a multi-line string, so every line is treated separately.
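
A small sketch of what m changes, using an invented three-line string:

```javascript
const lines = 'pig\ndog\ncat'

// Without m, ^ and $ only match at the two ends of the whole string.
const withoutM = lines.match(/^\w+$/g)
// With m, they match around every line.
const withM = lines.match(/^\w+$/gm)

console.log(withoutM) // null
console.log(withM)    // ["pig", "dog", "cat"]
```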

I mostly use i and g, and rarely the others.

u: Unicode mode. The pattern is matched by Unicode code point, and \p{…} property escapes become available.

'pig'.match(/\p{L}/u)   // \p{L} matches any character with the Letter property; it only works together with u
'p,i.g;'.match(/\p{P}/gu)   // matches punctuation: [",", ".", ";"]
// For more property names, see the documentation on Unicode property escapes.
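
The u flag also changes how characters outside the basic plane are counted. A minimal sketch with an emoji and a Chinese character, both chosen only as examples:

```javascript
const pigEmoji = '🐷'
console.log(pigEmoji.length) // 2: one code point stored as two UTF-16 code units

const withoutU = /^.$/.test(pigEmoji) // false: . sees a single code unit
const withU = /^.$/u.test(pigEmoji)   // true: . sees the whole code point
const hanLetter = /\p{L}/u.test('猪') // true: letters are not limited to a-z

console.log(withoutU, withU, hanLetter)
```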

y: sticky matching, which improves efficiency. Here I quote directly from Ruan Yifeng's ES6 tutorial.

The y modifier is similar to the g modifier: both are global, and each subsequent match starts at the position right after the last successful match. The difference is that g succeeds as long as there is a match anywhere in the remaining text, while y requires the match to start at the very first remaining position. That is what "sticky" means.

const str = `1234pig,pig,22223332232131231231231231`
const reg = /pig/y
reg.lastIndex = 4  // let the match start at index 4
reg.exec(str)  // ["pig"], and lastIndex moves to 7
reg.exec(str)  // null: index 7 holds ",", so the match fails at once and the rest of the string is never scanned
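
Putting g and y next to each other on the same text makes the difference concrete. This is a sketch with a shortened sample string and illustrative variable names.

```javascript
const text = '1234pig,pig'
const globalRe = /pig/g
const stickyRe = /pig/y

// From index 0, g scans forward until it finds pig; y insists on a match at index 0.
const globalIndex = globalRe.exec(text).index // 4
const stickyAtZero = stickyRe.exec(text)      // null

// Started at index 4, y succeeds once and then fails at the comma.
stickyRe.lastIndex = 4
const stickyIndex = stickyRe.exec(text).index // 4
const afterMatch = stickyRe.lastIndex         // 7
const stickyAtComma = stickyRe.exec(text)     // null: index 7 holds ","

console.log(globalIndex, stickyAtZero, stickyIndex, afterMatch, stickyAtComma)
```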


? Disabling greed

Regexes are greedy by default: a quantifier matches as much as it can.

'piggg'.match(/pig+/) // ["piggg"]
'piggg'.match(/pig+?/) // ["pig"]
'piggg'.match(/pig*?/) // ["pi"]
'piggg'.match(/pig{1,}?/) // ["pig"]  a trailing ? makes the quantifier match as little as possible

Extract each span tag and its contents, one at a time:

const str = `<span>pig1</span><span>pig2</span>dog<span>pig3</span>`
str.match(/<span>.+<\/span>/g)
// ["<span>pig1</span><span>pig2</span>dog<span>pig3</span>"]
// Because the regex is greedy, .+ runs from the first <span> all the way to the last </span>, so greed has to be disabled
str.match(/<span>.+?<\/span>/g)
// ["<span>pig1</span>", "<span>pig2</span>", "<span>pig3</span>"]  matches as little as possible
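
A lazy quantifier is one way to stop at the first closing tag. Another, shown here only as an alternative sketch, is a negated character class that cannot run past a <. A group with matchAll then pulls out just the inner text.

```javascript
const html = '<span>pig1</span><span>pig2</span>dog<span>pig3</span>'

const lazy = html.match(/<span>.+?<\/span>/g)      // lazy quantifier
const negated = html.match(/<span>[^<]+<\/span>/g) // negated class, same result here

// Capture only the text between the tags.
const inner = [...html.matchAll(/<span>(.+?)<\/span>/g)].map((m) => m[1])

console.log(lazy.length, negated.length) // 3 3
console.log(inner)                       // ["pig1", "pig2", "pig3"]
```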

Assertions

(?=xx) zero-width positive lookahead: asserts what comes after

(?<=xx) zero-width positive lookbehind: asserts what comes before

(?!xx) zero-width negative lookahead: asserts what does not come after
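
Because assertions are zero-width, the asserted text is checked but never becomes part of the match. A minimal sketch with invented sample strings:

```javascript
const sample = 'pig1 pigs'

// (?=\d): pig only when a digit follows; the digit itself is left untouched.
const followedByDigit = sample.replace(/pig(?=\d)/g, 'PIG')    // "PIG1 pigs"
// (?!\d): pig only when no digit follows.
const notFollowedByDigit = sample.replace(/pig(?!\d)/g, 'PIG') // "pig1 PIGs"
// (?<=\$): digits only when a dollar sign comes right before them.
const prices = '$30 and 40'.match(/(?<=\$)\d+/g)               // ["30"]

console.log(followedByDigit, notFollowedByDigit, prices)
```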

A regex for a simple password, which may contain only digits, lowercase letters, and underscores and must include at least one of each: /^(?=.*\d)(?=.*[a-z])(?=.*_)[\da-z_]*$/
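
Each lookahead in the password pattern checks one requirement from the start of the string, and the final class restricts which characters are allowed. A sketch with made-up passwords:

```javascript
const simplePassword = /^(?=.*\d)(?=.*[a-z])(?=.*_)[\da-z_]*$/

console.log(simplePassword.test('pig_1')) // true: digit, letter, and underscore all present
console.log(simplePassword.test('pig1'))  // false: no underscore
console.log(simplePassword.test('pig_'))  // false: no digit
console.log(simplePassword.test('123_'))  // false: no lowercase letter
console.log(simplePassword.test('Pig_1')) // false: uppercase P is not in [\da-z_]
```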

That is the end. Regular expressions are quite fun, so go find some regex puzzles to try.

The answer to the puzzle from the groups section:

/^(a+)(a+)-\1=\2$/
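
To check the pattern against sample strings from the puzzle, a sketch like the following can be run. The strings are written in lowercase without spaces, which is my assumption about the intended input, since the pattern only matches that form.

```javascript
// The first group is the number subtracted, the second group is the difference.
const subtraction = /^(a+)(a+)-\1=\2$/

const shouldMatch = ['aaaa-aa=aa', 'aaaaa-a=aaaa', 'aaa-a=aa', 'aaaa-aaa=a']
const shouldFail = ['aaaa-a=aa', '-=a']

const allMatch = shouldMatch.every((s) => subtraction.test(s))
const anyFalseMatch = shouldFail.some((s) => subtraction.test(s))

console.log(allMatch)      // true
console.log(anyFalseMatch) // false
```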

Now that you have read this far, why not answer a question?

String: haha (123 pig 456) 456 789 (cat 234) ABCC (pig 123) a (dog) LL (95 monkey 27)

Extract the parenthesized contents that do not contain pig, giving ["cat 234", "dog", "95 monkey 27"], and leave your answer in the comments section.