When you ask for help and the senior engineer just says “Use a regex,” do you feel helpless?

Do you blame yourself when you open a Google search for basic regex syntax in the middle of writing a validation check?

When a few lines of regex catch your eye during code review, are you lost for words?

“You don’t need to learn regex, just look it up when you run into it.”

?????

Wake up, folks. Our goal is the sea of stars, not tightening screws.

From tomorrow on, be a happy person: feed the horses, chop wood, learn regex.

A regular expression is a formal pattern for operating on strings. It uses a descriptive syntax to express a matching strategy, which can then be used to search, validate, extract, and modify strings.

Want to look at the basic syntax?

“Every vow you make ends in a regex study session that never gets past the basic syntax,” says the Buddha.

Maybe only you understand me, so you already know there are two ways to create a regex.

Literal

The literal form is a pattern wrapped between two slashes.

const reg = /ab+c/

Regular expression literals are compiled after the JS script is loaded.

Constructor

Call the constructor of the RegExp object.

const reg = new RegExp("ab+c")

Regular expressions created with the constructor are compiled at runtime, so use the constructor when the pattern is generated dynamically.
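To make the "generated dynamically" case concrete, here is a minimal sketch. The helper names escapeRegExp and buildKeywordRegExp are my own, not part of any library, and the sample sentence is made up for illustration.

```javascript
// Hypothetical helper: escape regex metacharacters in arbitrary text.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

// Hypothetical helper: build a global, case-insensitive matcher for a keyword.
const buildKeywordRegExp = (keyword) => new RegExp(escapeRegExp(keyword), 'gi')

// A literal and a constructor call can describe the same pattern.
const sameSource = /ab+c/.source === new RegExp('ab+c').source
console.log(sameSource) // true

// Inside a string, the backslash itself must be escaped: '\\d+' is the pattern \d+.
const digits = new RegExp('\\d+')
console.log(digits.test('room 101')) // true

// The keyword only becomes known at runtime, so a literal would not work here.
const found = 'C++ and c++ are the same language'.match(buildKeywordRegExp('c++'))
console.log(found) // ['C++', 'c++']
```

Escaping matters because a raw 'c++' passed to the constructor would be read as a pattern with two quantifiers in a row, which is a syntax error.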

Past, present, or future, the structure of a regex is always the same:

slash + pattern + slash + flags

/pattern/flags

Let’s start with simple modifiers.

Modifiers

Modifiers, also known as flags, select a specific matching strategy.

Common modifiers are:

Flag  Description
i     Ignore case: makes matching case-insensitive
g     Global: finds all matches instead of stopping at the first
m     Multi-line: makes the anchors ^ and $ match the start and end of each line
s     dotAll: makes the dot operator . (more on that later) match any character, including newlines
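A quick sketch of the four flags in action. The three-line sample string is made up for illustration.

```javascript
const text = 'Cat\ncat\nCAT'

const firstOnly = text.match(/cat/)        // no g: stops at the first match
const allExact = text.match(/cat/g)        // g: every case-sensitive match
const anyCase = text.match(/cat/gi)        // i: case is ignored
const wholeString = text.match(/^cat$/gi)  // without m, ^ and $ only match the ends of the input
const perLine = text.match(/^cat$/gim)     // m: ^ and $ match at every line
const dotNoS = /Cat.cat/.test(text)        // . does not match \n
const dotWithS = /Cat.cat/s.test(text)     // s: . matches \n as well

console.log(firstOnly.index) // 4
console.log(allExact)        // ['cat']
console.log(anyCase)         // ['Cat', 'cat', 'CAT']
console.log(wholeString)     // null
console.log(perLine)         // ['Cat', 'cat', 'CAT']
console.log(dotNoS, dotWithS) // false true
```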

As for the pattern itself, apart from ordinary characters such as 'ABC', 'China', and '123', we have to mention the special characters that really shine in a regex. Her name is Xiao Wei... no, wait, they are called metacharacters.

Metacharacters

Metacharacters do not stand for themselves literally; each has a special meaning. Some metacharacters take on a different meaning when written inside square brackets.

Metacharacter  Description
.              Matches any single character except a newline
[ ]            Character set: matches any one character listed inside the square brackets
[^ ]           Negated character set: matches any one character not listed inside the square brackets
*              Matches the preceding subpattern 0 or more times
+              Matches the preceding subpattern 1 or more times
?              Matches the preceding subpattern 0 or 1 times
{n,m}          Matches the preceding character or character set num times (n ≤ num ≤ m)
(xyz)          Group: matches the characters xyz in exactly that order
|              Or operator: matches the expression before or after the symbol
\              Escape character: matches the reserved characters [ ] ( ) { } . * + ? ^ $ \ | literally
^              Caret: matches at the start of the input
$              Dollar: matches at the end of the input

. The dot operator

Matches any single character other than a line break (\n, \r). For example:

'The car parked in the garage.'.match(/.ar/g)
// .ar matches any one character followed by a and r
// ['car', 'par', 'gar']

If you need to match any character including \n, you can use the pattern /(.|\n)/, or add the s modifier (see the modifiers section above).

Character set

Square brackets [] specify a character set. A hyphen - specifies a range inside the set ([12345abcd] → [1-5a-d]). For example:

'The car parked in the garage.'.match(/[tT]he/g)
// [tT]he matches T or t, followed by he
// ['The', 'the']

Note that most special characters lose their special meaning inside square brackets. For example, [(a)+] matches the characters (, a, ) and +.

But if the special character ^ appears right after the opening bracket, the character set is negated.

'The car parked in the garage.'.match(/[^c]ar/g)
// [^c]ar matches any character except c, followed by ar
// ['par', 'gar']

Regex also provides shorthands for commonly used character sets:

Shorthand  Description
\d         Matches a digit: [0-9]
\D         Matches a non-digit: [^\d]
\w         Matches a letter, digit, or underscore: [a-zA-Z0-9_]
\W         Matches any character that is not a letter, digit, or underscore, i.e. symbols: [^\w]
\s         Matches a whitespace character, roughly [\t\n\f\r\p{Z}]
\S         Matches a non-whitespace character: [^\s]

You may never have seen a village lit by the aurora, or someone setting off fireworks late at night, but you have probably seen people match any character at all with a combination like /[\d\D]/.
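Here is a small sketch of the shorthands on a made-up sample line, including the "a class plus its complement matches everything" trick.

```javascript
// Sample input invented for illustration: contains a space, a tab, and a newline.
const line = 'Order #42:\t3 items\n'

const numbers = line.match(/\d+/g)              // runs of digits
const words = line.match(/\w+/g)                // runs of letters, digits, underscores
const whitespaceCount = line.match(/\s/g).length // space, tab, space, newline
const anyChar = /^[\d\D]+$/.test(line)          // a class plus its complement matches everything
const dotOnly = /^.+$/.test(line)               // . stops at the newline, so $ is never reached

console.log(numbers)         // ['42', '3']
console.log(words)           // ['Order', '42', '3', 'items']
console.log(whitespaceCount) // 4
console.log(anyChar)         // true
console.log(dotOnly)         // false
```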

Quantifiers

Quantifiers (the “qualifiers” of some translations) limit how many times the regex matches a subpattern. *, +, ?, and {n,m} are all quantifiers.

Put simply, for a match to succeed, the subpattern has to occur the stated number of times.

The brace form comes in three variants: {n}, {n,}, and {n,m}.

  • {n} matches exactly n times
'The number was 9.9997 but we rounded it off to 10.0.'.match(/[\d]{3}/g)
// [\d]{3} matches exactly 3 digits
// ['999']
  • {n,} matches at least n times
'The number was 9.9997 but we rounded it off to 10.0.'.match(/[\d]{1,}/g)
// [\d]{1,} matches at least 1 digit and is equivalent to \d+; similarly, {0,} is equivalent to *
// ['9', '9997', '10', '0']
  • {n,m} matches at least n times and at most m times
'The number was 9.9997 but we rounded it off to 10.0.'.match(/[\d]{2,3}/g)
// [\d]{2,3} matches at least 2 and at most 3 digits
// ['999', '10']

() Groups

A group is a set of subpatterns written inside parentheses () and treated as a single unit.

How well someone handles () is, in a sense, a yardstick for how well they have mastered regex: parentheses provide grouping, and many of the fancier regex features are built on it.

Grouping

The expression (ab)* matches zero or more consecutive occurrences of ab. Without (), the expression ab* matches an a followed by zero or more b.

'ababa abbb'.match(/(ab)*/)
// (ab)* matches consecutive occurrences of ab; the second element of the result, 'ab', is what the group (...) matched last
// ['abab', 'ab', index: 0, input: 'ababa abbb', groups: undefined]

There are generally two scenarios for referencing groups:

Reference in JS

In addition to validating data, another big scenario where we use regex is data extraction and replacement.

  • Data extraction
const reg = /(\d{4})-(\d{2})-(\d{2})/
// 1. Use match with a regex that has no g modifier
'2021-12-31'.match(reg)
// The first element is the whole match, followed by what each capture group (pair of parentheses) matched
// ['2021-12-31', '2021', '12', '31', index: 0, input: '2021-12-31', groups: undefined]

// 2. Use the constructor's global properties $1 to $9 (a deprecated legacy feature; prefer the match result)
reg.test('2021-12-31')
// RegExp.$1 -> '2021', RegExp.$2 -> '12', RegExp.$3 -> '31'
  • Data replacement

'2021-12-31'.replace(/(\d{4})-(\d{2})-(\d{2})/, function(match, year, month, day) {
	return month + "/" + day + "/" + year
})
// year, month, and day are the strings matched by the 1st, 2nd, and 3rd pair of parentheses
// '12/31/2021'
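The same extraction and replacement can also be sketched without a callback and without the legacy RegExp.$1 properties. The names isoDate and usDate are mine, for illustration only.

```javascript
const isoDate = /(\d{4})-(\d{2})-(\d{2})/

// Extraction: destructure the match result, skipping the whole-match element.
const [, year, month, day] = '2021-12-31'.match(isoDate)
console.log(year, month, day) // 2021 12 31

// Replacement: inside a replacement string, $1, $2, $3 refer to the captured groups.
const usDate = '2021-12-31'.replace(isoDate, '$2/$3/$1')
console.log(usDate) // '12/31/2021'
```

The callback form is still the one to reach for when the replacement needs real logic, for example converting a month number into a month name.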

Reference in a regular expression

Besides referring to groups through the API, you can refer to a group inside the regular expression itself. This is called a backreference.

For example, to write a regex that matches any one of the following three formats:

  • 2021-12-31
  • 2021/12/31
  • 2021.12.31
const reg = /\d{4}(-|\/|\.)\d{2}\1\d{2}/
// \1 refers to the earlier group (-|\/|\.); / and . need escaping, so each is preceded by \
// Whatever the group matched (e.g. -), \1 must match that same character
// reg.test('2021-12-31') // true
// reg.test('2021-12.31') // false

Note that in a regex without the u flag, referencing a nonexistent group does not raise an error; the escape is read as a character code instead. For example, /\1\2/ matches the string '\x01\x02'.

Non-capturing groups

The groups produced by () record what each () matched, whether you read them through match or through the RegExp constructor's properties, so they are usually called capturing groups. This bookkeeping has some cost and can affect performance to a greater or lesser degree.

So if we only want the basic grouping behaviour of parentheses and never refer to the group, neither through the API nor inside the regex, we can use a non-capturing group (?:p).

'ababa abbb'.match(/(?:ab)*/)
// (?:ab)* still matches consecutive occurrences of ab, just like (ab)*, but the result no longer contains a captured group
// ['abab', index: 0, input: 'ababa abbb', groups: undefined]

Branching structure

Another important and common use of () is to combine it with the symbol | to build a branching structure, which is why | is also called the or operator.

'The car is parked in the garage.'.match(/(T|t)he|car/g)
// (T|t)he|car matches The, the, or car
// ['The', 'car', 'the']

\ Escaping special characters

To match any of the special characters [ ] { } ( ) / \ + * . $ ^ | ? literally, put a backslash \ in front of it.

'The car is parked in the garage.'.match(/\w+e\.?/g)
// \w+e\.? matches a run of word characters ending in e, optionally followed by a literal .
// ['The', 'parke', 'the', 'garage.']

Anchors

Anchors match the start or the end of a string: ^ marks the start and $ marks the end.

When we validate a string such as a mobile phone number, an ID number, or a password, we need to add ^ and $ so that the whole string, from start to end, is checked.

For search and replace they are usually unnecessary, because the data we want can appear anywhere in a long stretch of text.
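A minimal sketch of why validation needs both anchors. The "exactly six digits" rule is a made-up example, not a real format.

```javascript
// Hypothetical rule for illustration: a verification code is exactly 6 digits.
const looseCode = /\d{6}/     // finds 6 digits anywhere in the input
const strictCode = /^\d{6}$/  // the whole input must be 6 digits

const looseOnNoise = looseCode.test('abc123456xyz')   // true: not what a validator wants
const strictOnNoise = strictCode.test('abc123456xyz') // false
const strictOnCode = strictCode.test('123456')        // true

// For extraction the unanchored version is the useful one.
const extracted = 'Your code is 123456, valid for 5 minutes'.match(looseCode)[0]

console.log(looseOnNoise, strictOnNoise, strictOnCode) // true false true
console.log(extracted) // '123456'
```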

Position, position, position

A regular expression is a matching pattern, and it matches either characters or positions.

Old Yao says I am a thief who stole his memories and stuffed them into my own head.

I have to say that understanding positions helps enormously when we use regex to find and replace.

In addition to ^ and $, there are more position anchors:

Symbol   Description
\b       Matches a word boundary
\B       Matches a non-word boundary
(?=p)    Positive lookahead
(?!p)    Negative lookahead
(?<=p)   Positive lookbehind
(?<!p)   Negative lookbehind

\b and \B

Go on, translate it for me: what is a word boundary?

  • The position between \w and \W
  • The position between ^ and \w
  • The position between \w and $
'[Regex] Lesson_01.mp4'.replace(/\b/g, '#')
// \w matches [a-zA-Z0-9_]
// '[#Regex#] #Lesson_01#.#mp4#'

Once you understand \b, \B is easy.

'[Regex] Lesson_01.mp4'.replace(/\B/g, '#')
// The result is the exact opposite of the \b example above
// '#[R#e#g#e#x]# L#e#s#s#o#n#_#0#1.m#p#4'

Zero-width assertions

Lookahead and lookbehind assertions (collectively called lookaround) are non-capturing groups (hence the ()). Use them when the pattern being matched must, or must not, have another specific pattern before or after it.

(?=pattern) Positive lookahead

Matches the position in front of pattern; that is, for the match to succeed, the position must be followed by pattern.

'The fat cat sat on the mat.'.match(/(T|t)he(?=\sfat)/g)
// (T|t)he(?=\sfat) matches The or the when it is followed by \sfat, where \s is a whitespace character
// ['The']

(?!pattern) Negative lookahead

Matches a position that is not in front of pattern; that is, for the match to succeed, the position must not be followed by pattern.

'The fat cat sat on the mat.'.match(/(T|t)he(?!\sfat)/g)
// (T|t)he(?!\sfat) matches The or the when it is not followed by \sfat, where \s is a whitespace character
// ['the']

(?<=pattern) Positive lookbehind

Matches the position right after pattern; that is, for the match to succeed, the position must be preceded by pattern.

'The fat cat sat on the mat.'.match(/(?<=(T|t)he\s)(fat|mat)/g)
// (?<=(T|t)he\s)(fat|mat) matches fat or mat when it is preceded by The\s or the\s, where \s is a whitespace character
// ['fat', 'mat']

(?<!pattern) Negative lookbehind

Matches a position that is not right after pattern; that is, for the match to succeed, the position must not be preceded by pattern.

'The cat sat on cat.'.match(/(?<!(T|t)he\s)(cat)/g)
// (?<!(T|t)he\s)(cat) matches cat when it is not preceded by The\s or the\s
// ['cat']

Here’s a rule of thumb I’ve found:

  • “Positive” means the pattern must appear
  • “Negative” means the pattern must not appear
  • “Lookahead” means the position being matched is in front of the pattern
  • “Lookbehind” means the position being matched is after the pattern
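The four rules can be lined up side by side on one made-up sentence. Using replace makes it easy to see which occurrence each assertion picked.

```javascript
// Sample text invented for illustration.
const menu = 'apple pie, apple tart, cherry pie'

// Positive lookahead: apple that IS followed by " pie".
const positiveAhead = menu.replace(/apple(?= pie)/g, 'APPLE')
// Negative lookahead: apple that is NOT followed by " pie".
const negativeAhead = menu.replace(/apple(?! pie)/g, 'APPLE')
// Positive lookbehind: a word that IS preceded by "apple ".
const positiveBehind = menu.replace(/(?<=apple )\w+/g, 'X')
// Negative lookbehind: pie that is NOT preceded by "apple ".
const negativeBehind = menu.replace(/(?<!apple )pie/g, 'PIE')

console.log(positiveAhead)  // 'APPLE pie, apple tart, cherry pie'
console.log(negativeAhead)  // 'apple pie, APPLE tart, cherry pie'
console.log(positiveBehind) // 'apple X, apple X, cherry pie'
console.log(negativeBehind) // 'apple pie, apple tart, cherry PIE'
```

Notice that the asserted text itself is never replaced: lookarounds only check the neighbourhood of a position and consume nothing.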

Don’t be greedy (or I’m a dog)

As every investor knows, a stop-loss matters, and so does taking profit: getting out in time when gains start to slip, so as to keep some of them. That, of course, requires us not to be too greedy.

So how exactly do we take profit?

(A bit off topic 🤦♂️)

Greedy matching

Note that regex matching is greedy by default, which means it matches the longest substring it can.

+, ?, *, {n}, {n,}, {n,m}

Each of the quantifiers above matches greedily, for example:

'The fat cat sat on the mat.'.match(/(.*at)/g)
// Greedy: matches all the way to the at of the last word, mat
// ['The fat cat sat on the mat']

Non-greedy matching

As the name says, “I’m not greedy 🤷♂️”: it matches as short a substring as possible, which is why it is also called lazy matching.

+?, ??, *?, {n}?, {n,}?, {n,m}?

Adding a ? after a quantifier switches it to non-greedy mode, for example:

'The fat cat sat on the mat.'.match(/(.*?at)/g)
// Non-greedy: each match stops at the first at it reaches
// ['The fat', ' cat', ' sat', ' on the mat']
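Where greedy versus lazy tends to bite in practice is delimited text. A small sketch with a made-up sentence:

```javascript
// Sample text invented for illustration.
const sentence = 'He said "yes" and then "no".'

const greedy = sentence.match(/".*"/g)      // runs from the first quote to the last one
const lazy = sentence.match(/".*?"/g)       // stops at the nearest closing quote
const negated = sentence.match(/"[^"]*"/g)  // same result, expressed with a negated set

console.log(greedy)  // ['"yes" and then "no"']
console.log(lazy)    // ['"yes"', '"no"']
console.log(negated) // ['"yes"', '"no"']
```

The negated character set is a common alternative to the lazy quantifier when the closing delimiter is a single character.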

Is this it?

I hear you think you got it all figured out?

Capitalize the first letter

Why?

We first have to know how to find the first letter of each word.

Got it: a position, so use \b.

const titleize = (str) => {
	return str.toLowerCase().replace(/\b\w/g, (matched) => matched.toUpperCase())
}
// \b\w matches the first letter of each word
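A quick usage sketch, plus one caveat worth knowing: an apostrophe is not a word character, so \b also sits right after it. The variant titleizeWords is my own illustration and assumes words are separated by whitespace.

```javascript
const titleize = (str) =>
  str.toLowerCase().replace(/\b\w/g, (matched) => matched.toUpperCase())

// Hypothetical variant: only capitalize after the start of the string or whitespace.
const titleizeWords = (str) =>
  str.toLowerCase().replace(/(^|\s)\w/g, (matched) => matched.toUpperCase())

const plain = titleize('my name is epeli')
const apostrophe = titleize("don't stop")
const apostropheFixed = titleizeWords("don't stop")

console.log(plain)           // 'My Name Is Epeli'
console.log(apostrophe)      // "Don'T Stop"
console.log(apostropheFixed) // "Don't Stop"
```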

It’s really simple, it’s really natural: the love of two people is carried by two people.

Matching paired tags

What is a paired tag?

Are you sure

Regular Expression is love?

Regular Expression

Got it: the tag names inside the < > must be identical, so use a backreference.

const pairedTags = (str) => {
	return /<([^>]+)>.*?<\/\1>/gs.test(str)
}

It’s really not that hard, you are just too pessimistic, hiding behind a wall and sharing with no one.

  1. <([^>]+)> matches the opening tag. < and > are not special characters, so they need no \ escape. ([^>]+) matches one or more characters that are not >, which also rules out the empty tag <>
  2. The first pair of parentheses () works hand in hand with \1
  3. . appears together with the s modifier, so it matches any character, line breaks included. As mentioned above, [\w\W] and the like would work too
  4. * together with ? means lazy matching, so the nearest closing tag is found
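The four points above can be checked against a few inputs. The tag strings below are hypothetical samples, and the last one shows a limit of this simple pattern: because group 1 captures everything up to >, a tag with attributes no longer pairs with its closing tag.

```javascript
// Same pattern as described above, without the g flag since test only needs one match.
const pairedTags = (str) => /<([^>]+)>.*?<\/\1>/s.test(str)

const sameName = pairedTags('<title>regular expression</title>')
const multiLine = pairedTags('<p>line one\nline two</p>') // the s flag lets . cross the newline
const mismatch = pairedTags('<title>wrong!</p>')
const withAttribute = pairedTags('<div class="x">text</div>') // group 1 is 'div class="x"'

console.log(sameName, multiLine, mismatch, withAttribute) // true true false false
```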

Thousands separators

What is a thousands separator?

Convert 1234567890 to 1,234,567,890.

Observe: a comma goes at a position, namely in front of every group of three digits counted from the right.

A position in front of a pattern: use a positive lookahead.

const division = (str) => {
	return str.replace(/\B(?=(\d{3})+(?!\d))/g, ',')
}

In fact, we all know their moves, there is no difference.

  1. \B matches the positions between characters; with no further restriction, a comma would be inserted at every position between two digits.
  2. (\d{3})+ matches digits in groups of three, so 3 consecutive digits, 6 consecutive digits, and so on, such as 123 or 123456
  3. (?!\d) is a negative lookahead that matches a position not followed by a digit
  4. (?=(\d{3})+(?!\d)) uses (\d{3})+(?!\d) as the pattern of a positive lookahead: it matches a position followed by a multiple of 3 digits, where (\d{3})+ in turn must not be followed by another digit
  5. /\B(?=(\d{3})+(?!\d))/g finds, across the whole string, every \B position that is followed by a multiple of 3 consecutive digits with no further digit after them, and replaces it

I had planned to stop while I was ahead and leave the 5 points above for you to savor, but I am afraid you are too lazy to take them apart carefully, and besides, I got dizzy writing them.

All right, I’ll do it for you.

// Take 1234567.00 as an example
const str = '1234567.00'
str.replace(/\B/g, ',')
// '1,2,3,4,5,6,7.0,0'
// \B matches the positions between characters; with no restriction, a comma is inserted between every pair of adjacent digits, but not next to the non-word character .

str.replace(/\B(?=\d{3})/g, ',')
// '1,2,3,4,567.00'
// 1, 2, 3, and 4 are each followed by 3 consecutive digits, so each gets a comma after it

str.replace(/\B(?=(\d{3})+)/g, ',')
// '1,2,3,4,567.00'
// The intent is to restrict matches to groups of three, but the result is the same as before, because 3, for example, can still be seen as followed by the 3 consecutive digits 456

str.replace(/\B(?=\d{3}(?!\d))/g, ',')
// '1234,567.00'
// What we really want is the position in front of 3 consecutive digits that end the number, so a negative lookahead forbids any further digit after them; close, but not quite there yet

str.replace(/\B(?=(\d{3})+(?!\d))/g, ',')
// '1,234,567.00'
// Congratulations, you’re rich. A comma goes in front of any run of digits whose length is a multiple of 3 and that is not followed by another digit.
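One caveat the walkthrough does not hit, because its sample has only two decimal places: the pattern treats the digits after the decimal point like any other run of digits. A hedged sketch follows; addThousands repeats the pattern discussed above, while formatNumber is my own illustrative variant that assumes a plain decimal string with at most one dot.

```javascript
const addThousands = (str) => str.replace(/\B(?=(\d{3})+(?!\d))/g, ',')

// Hypothetical variant: group only the integer part and leave the fraction untouched.
const formatNumber = (str) => {
  const [integerPart, fractionPart] = str.split('.')
  const grouped = integerPart.replace(/\B(?=(\d{3})+(?!\d))/g, ',')
  return fractionPart === undefined ? grouped : `${grouped}.${fractionPart}`
}

const integerOnly = addThousands('1234567890')
const longFraction = addThousands('1234.5678')
const longFractionFixed = formatNumber('1234.5678')

console.log(integerOnly)       // '1,234,567,890'
console.log(longFraction)      // '1,234.5,678'
console.log(longFractionFixed) // '1,234.5678'
```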

Check the password

Seeing only the heading, someone has surely just given a like and closed the tab 😭 (yes, you 👉).

“This is easy. I won’t bother reading it.”

Really?

Let me add a constraint: the password is 6-12 characters long, consists of digits, lowercase letters, and uppercase letters, and must contain at least two of these character types.

Come on then: the blackboard is yours, the pen is yours, the spotlight is yours. Write it.

const checkPassword = (str) => {
	// No g modifier is needed, because the pattern is anchored
	return /^[a-zA-Z0-9]{6,12}$/.test(str)
}

If that’s your answer, give me my pen back.

Ignoring the “at least two character types” constraint, the answer above is not wrong.

So how do we enforce “at least two types”?

Well, a digit, a lowercase letter, or an uppercase letter has to appear somewhere. Somewhere means a position, and a position means a zero-width assertion.

(I think building it up step by step is more fun.)

Contains one type of character

For example, to require that the password contains a digit, use (?=.*[0-9]).

Does that make sense?

The pattern is .*[0-9], so the assertion matches a position that is followed by any number of non-newline characters and then a digit.

The regex can therefore be rewritten for now as

let reg = /^(?=.*[0-9])[a-zA-Z0-9]{6,12}$/

I don’t know whether you have any questions, but I have two:

  1. Why add .*?
  2. Why does (?=.*[0-9]) come first?

Fine, I’ll answer them myself:

  1. If the password were required to start with a digit, leaving out .* would be fine.
  2. Remember that a zero-width assertion does not move the match position, so [a-zA-Z0-9]{6,12}$ still applies to the whole string.
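Both answers can be checked with a short sketch. The names withDigit and digitFirst are mine, and the sample passwords are made up.

```javascript
const withDigit = /^(?=.*[0-9])[a-zA-Z0-9]{6,12}$/ // a digit anywhere
const digitFirst = /^(?=[0-9])[a-zA-Z0-9]{6,12}$/  // without .*, the digit must come first

const noDigit = withDigit.test('abcdef')            // false: the lookahead finds no digit
const digitAtEnd = withDigit.test('abcde1')         // true
const digitAtEndStrict = digitFirst.test('abcde1')  // false: first character is not a digit
const digitAtStart = digitFirst.test('1abcde')      // true

// The assertion consumes nothing: the match is an empty string at index 0.
const probe = 'abc1'.match(/(?=.*[0-9])/)

console.log(noDigit, digitAtEnd, digitAtEndStrict, digitAtStart) // false true false true
console.log(probe.index, probe[0].length) // 0 0
```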

At least two

For example, to require that the password contains both a digit and a lowercase letter, use (?=.*[0-9])(?=.*[a-z]). All that remains is to list every combination of two types.

reg = /((?=.*[0-9])(?=.*[a-z])|(?=.*[0-9])(?=.*[A-Z])|(?=.*[a-z])(?=.*[A-Z]))^[0-9A-Za-z]{6,12}$/

(Seems a bit long!!)

Of course, I’m sure you are clever enough (if not, “clever me” it is) to notice that “at least two types” simply means “not all of a single type”.

reg = /(?!^[0-9]{6,12}$)(?!^[a-z]{6,12}$)(?!^[A-Z]{6,12}$)^[0-9A-Za-z]{6,12}$/
// That looks a little shorter
// (?!^[0-9]{6,12}$) rejects strings made up entirely of digits

So the final answer could be

const checkPassword = (str) => {
	return /(?!^[0-9]{6,12}$)(?!^[a-z]{6,12}$)(?!^[A-Z]{6,12}$)^[0-9A-Za-z]{6,12}$/.test(str)
}
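A small table of made-up sample passwords is a cheap way to convince yourself the final pattern does what the requirement says. This is only a sketch; the cases are mine.

```javascript
const checkPassword = (str) =>
  /(?!^[0-9]{6,12}$)(?!^[a-z]{6,12}$)(?!^[A-Z]{6,12}$)^[0-9A-Za-z]{6,12}$/.test(str)

// [input, expected result]
const cases = [
  ['123456', false],        // digits only
  ['abcdef', false],        // lowercase only
  ['ABCDEF', false],        // uppercase only
  ['abc123', true],         // two types
  ['ABCdef', true],         // two types
  ['aB3dE6', true],         // three types
  ['ab1', false],           // too short
  ['abc1234567890', false], // too long (13 characters)
  ['abc 123', false],       // contains a character outside the allowed set
]

const failures = cases.filter(([input, expected]) => checkPassword(input) !== expected)
console.log(failures.length) // 0
```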

I’m just standing on the shoulders of giants

In fact, most of this article comes from the following two sources; I just stood on their shoulders and polished things a little.

  • Learn Regex the Easy Way
  • Full tutorial on regular expressions

Stranger, I also wish for you

May you have a bright future

May all be well, Jack shall have Jill

May you find happiness in this earthly world