Understand Martian (2)

Acknowledgements

This article draws on the Regular Expressions Mini-Book.

Position matching

What is a position

A position is the empty string between adjacent characters. A character itself is not a position. Positions in a regex can be understood as follows:


// Each empty string is a position
"hello" === "" + "h" + "" + "e" + "" + "l" + "" + "l" + "" + "o" + "";

"hello" === "" + "" + "hello"
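One way to make these positions visible, as a minimal sketch: an empty pattern matches the empty string, so replacing it globally drops a marker at every position. The "#" marker and the variable names are only illustrative.

```javascript
// An empty non-capturing group matches the empty string at every position.
const everyPosition = /(?:)/g;

const marked = "hello".replace(everyPosition, "#");
console.log(marked); // "#h#e#l#l#o#"

// A string of n characters has n + 1 positions.
const positionCount = "hello".match(everyPosition).length;
console.log(positionCount); // 6
```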

How to match positions

position	meaning
^	Matches the start of the string. With the multiline flag, matches the start of each line.
$	Matches the end of the string. With the multiline flag, matches the end of each line.
\b	Word boundary: the position between \w and \W, between \w and ^, or between \w and $.
\B	Non-word boundary: the position between \w and \w, between \W and \W, between ^ and \W, or between \W and $.
(?=p)	p is a subpattern. (?=p) matches the position before p; for example, (?=l) matches the position before the character l. Put another way, whatever follows the position must match p.
(?!p)	The inverse of (?=p): every position that (?=p) does not match. For example, (?!^) matches every position except the start of the string.
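The six anchors in the table can be compared side by side on one short string. This is only a sketch: `mark` is a hypothetical helper that puts "#" at every position a pattern matches.

```javascript
// Hypothetical helper: put "#" at every position matched by `re`.
function mark(str, re) {
  return str.replace(re, "#");
}

console.log(mark("ab cd", /^/g));     // "#ab cd"
console.log(mark("ab cd", /$/g));     // "ab cd#"
console.log(mark("ab cd", /\b/g));    // "#ab# #cd#"
console.log(mark("ab cd", /\B/g));    // "a#b c#d"
console.log(mark("ab cd", /(?=c)/g)); // "ab #cd"
console.log(mark("ab cd", /(?!c)/g)); // "#a#b# c#d#"
```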

^ and $

Single-line matching

// "#Hello World#"
console.log("Hello World".replace(/^|$/g, '#'))

Multi-line matching

When the string contains line breaks (\n), ^ and $ can match the start and end of each line. This requires adding the m (multiline) flag to the regex.

// "#Hello#
// #World#"

console.log('Hello\nWorld'.replace(/^|$/gm, '#'))
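To see what the m flag changes, the same replacement can be run with and without it. A small sketch, with illustrative variable names; JSON.stringify is used only so the line break is printed as \n.

```javascript
const text = "Hello\nWorld";

// Without m: ^ and $ match only the start and end of the whole string.
const withoutM = text.replace(/^|$/g, "#");

// With m: ^ and $ also match around every line break.
const withM = text.replace(/^|$/gm, "#");

console.log(JSON.stringify(withoutM)); // "#Hello\nWorld#"
console.log(JSON.stringify(withM));    // "#Hello#\n#World#"
```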

\b and \B

\b

\b matches every word boundary.


// "[]#I# #L#[] [#ove#] #you# #Fang#[]#Fang#!!!!"

"[]I L[] [ove] you Fang[]Fang!!!!".replace(/\b/g, '#')

The word boundaries in this string are:

between ] and I
between I and the space
between the space and L
between L and [
between [ and o
between e and ]
between the space and y
between u and the space
between the space and F
between g and [
between ] and F
between g and !

\B

\B matches every non-word boundary; it is the inverse of \b.

// "#[#]I L[#]# #[o#v#e]# y#o#u F#a#n#g!#!#!#!#"

"[]I L[] [ove] you Fang!!!!".replace(/\B/g, '#')
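The definition of \b can also be checked by hand: a position is a word boundary when exactly one of its two neighbours is a \w character (the start and end of the string count as non-word). The sketch below compares that definition with what the regex engine reports; `wordBoundaries` is a hypothetical helper written for this illustration.

```javascript
// Hypothetical helper: list the indices of all word boundaries by definition,
// i.e. positions where exactly one neighbour is a \w character.
function wordBoundaries(str) {
  const isWord = (ch) => ch !== undefined && /\w/.test(ch);
  const indices = [];
  for (let i = 0; i <= str.length; i++) {
    if (isWord(str[i - 1]) !== isWord(str[i])) indices.push(i);
  }
  return indices;
}

const sample = "[]I L[]";
const byDefinition = wordBoundaries(sample);
const byRegex = [...sample.matchAll(/\b/g)].map((m) => m.index);

console.log(byDefinition); // [2, 3, 4, 5]
console.log(byRegex);      // [2, 3, 4, 5]
```

Every index from 0 to the string length that is not in this list is a \B position.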

(?=p) and (?!p)

p is a subpattern; it can itself be any regex.

(?=p): positive lookahead

// Insert # before every whitespace character
// The string has three spaces before p, so three # are added
// '# # # p'
'   p'.replace(/(?=[\s])/g, '#')

(?!p): negative lookahead

Matches every position that (?=p) does not match.

// Insert # at every position that is not followed by a whitespace character
// '   #p#'
'   p'.replace(/(?![\s])/g, '#')
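Because a lookahead only matches a position, the text it inspects is not part of the match. A short sketch with made-up sample strings:

```javascript
// The lookahead checks what follows "foo" but does not consume it.
const found = "foobar".match(/foo(?=bar)/);
console.log(found[0]);        // "foo"
console.log(found[0].length); // 3

// The negative form succeeds only when "bar" does NOT follow.
const followedByBar = /foo(?!bar)/.test("foobar");
const followedByBaz = /foo(?!bar)/.test("foobaz");
console.log(followedByBar); // false
console.log(followedByBaz); // true
```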

Examples

A regex that never matches


// . matches any character, but ^ then requires the start of the string to come after that character, which is impossible

var reg = /.^/
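A quick check of that claim, as a sketch over a few arbitrary sample strings (without the m or s flags):

```javascript
const neverMatches = /.^/;
const samples = ["", "a", "abc", "^", "line1\nline2"];

// No sample can satisfy "one character, then the start of the string".
const anyMatch = samples.some((s) => neverMatches.test(s));
console.log(anyMatch); // false
```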

Thousands separator

We can use a positive lookahead (?=p) with \d{3} as the subpattern: (?=\d{3}$) matches the position before the last three digits.

Because we need to match every group of three, we apply the quantifier + to the group: (?=(\d{3})+$)


"position" + 123 + "position" + 456 + $ (end)


var reg = /(?=(\d{3})+$)/g

// ",123,456,789"
"123456789".replace(reg, ',')

This also puts a comma at the start of the string (^), because that position satisfies the regex too. How do we remove it? There are two ways. The first uses \B, because a separator only ever belongs at a non-word boundary. The second uses the negative lookahead (?!^), which requires that the matched position is not the start of the string.


var reg1 = /\B(?=(\d{3})+$)/g

// "123,456,789"
"123456789".replace(reg1, ',')

var reg2 = /(?!^)(?=(\d{3})+$)/g
// "123,456,789"
"123456789".replace(reg2, ',')
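Wrapped in a function, the \B variant might look like the sketch below. `formatThousands` is a hypothetical name, and the sketch assumes the input is a non-negative integer or a string of digits.

```javascript
// Hypothetical helper; assumes a non-negative integer or a digit string.
function formatThousands(value) {
  // \B skips the start of the string; the lookahead requires that the
  // digits remaining up to the end come in complete groups of three.
  return String(value).replace(/\B(?=(\d{3})+$)/g, ",");
}

console.log(formatThousands(123456789)); // "123,456,789"
console.log(formatThousands(1234567));   // "1,234,567"
console.log(formatThousands(123));       // "123"
console.log(formatThousands("12"));      // "12"
```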

As an aside, I tried this problem myself before reading the solution. My attempt was as follows:


var reg = /[^^](?=(\d{3})+$)/g

// "12,45,789"

"123456789".replace(reg, ',')

The result was "12,45,789". After thinking about it, I realized my mistake: what we need to match here is a position, not a character, and that distinction matters.

This regex matches a character that is followed by groups of three digits, so the character itself is consumed and replaced. It does not match a position. It is important to keep positions and characters mentally distinct.

In a regex, only ^, $, \b, \B, (?=p), and (?!p) match positions.


// ^ is a position: the comma is inserted and nothing is removed
// ",123456789"
"123456789".replace(/^/g, ',')

// [^^] is a character: every character is replaced
// ",,,,,,,,"
"23456789".replace(/[^^]/g, ',')
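The difference shows up directly in what each pattern matches. In this sketch (variable names are illustrative), the position-based regex matches empty strings, while the character-based attempt matches the digits that end up being lost:

```javascript
const digits = "123456789";

// Position-based: each match is an empty string, so nothing is consumed.
const positionMatches = digits.match(/\B(?=(\d{3})+$)/g);

// Character-based: each match is a real digit, which replace() overwrites.
const characterMatches = digits.match(/[^^](?=(\d{3})+$)/g);

console.log(positionMatches);  // ["", ""]
console.log(characterMatches); // ["3", "6"]
```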

If the string becomes "123456789 123456789" and we still need thousands separators in each number, what do we do?

Let's analyze the situation. The string now has a space in the middle, so we can no longer use $: the digits before the space would not match, because the (\d{3})+$ subpattern cannot reach the end of the string across the space.

We can replace $ with \b, so that each group of three digits is counted back from a word boundary.

// ",123,456,789 ,123,456,789"

"123456789 123456789".replace(/(?=(\d{3})+\b)/g, ',')

A comma must not be inserted at a word boundary, so add \B (a non-word-boundary position) or (?!\b) (a position that is not a word boundary).


// "123,456,789 123,456,789"

"123456789 123456789".replace(/\B(?=(\d{3})+\b)/g, ',')

"123456789 123456789".replace(/(?!\b)(?=(\d{3})+\b)/g, ',')
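The same regex works on numbers embedded in ordinary text. A minimal sketch, where `formatNumbersInText` is a hypothetical helper and the input is assumed to contain only whole numbers:

```javascript
// Hypothetical helper; assumes the text contains whole numbers only.
function formatNumbersInText(text) {
  return text.replace(/\B(?=(\d{3})+\b)/g, ",");
}

console.log(formatNumbersInText("123456789 123456789"));
// "123,456,789 123,456,789"
console.log(formatNumbersInText("total 1234567 items, 89012 sold"));
// "total 1,234,567 items, 89,012 sold"
```

One caveat of this sketch: digits after a decimal point are grouped as well, so "0.12345" becomes "0.12,345". Handling decimals would need an extra condition.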

Currency formatting

"1888" ==> "$1888.00"

var money = '1888'
var reg1 = /(?=(^))/g
var reg2 = /$/g

// Insert the currency sign at the start position: "$1888"
money = money.replace(reg1, '$')

// Append the decimals at the end position: "$1888.00"
money = money.replace(reg2, '.00')
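Currency formatting and the thousands separator can be combined. The sketch below is one possible arrangement, not the article's code: `formatCurrency` is a hypothetical helper, it assumes a non-negative number, and it relies on toFixed(2) to produce exactly two decimals so the grouping regex never touches the fraction.

```javascript
// Hypothetical helper; assumes a non-negative number.
function formatCurrency(amount) {
  return amount
    .toFixed(2)                        // "1888.00"
    .replace(/\B(?=(\d{3})+\b)/g, ",") // "1,888.00"
    .replace(/^/, "$$");               // "$$" in a replacement is a literal "$"
}

console.log(formatCurrency(1888));      // "$1,888.00"
console.log(formatCurrency(1234567.5)); // "$1,234,567.50"
```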

Password validation

Password rule: the password is 6 to 12 characters long, consists of digits, lowercase letters, and uppercase letters, and must contain at least two of those character types.

First, match the first condition: 6 to 12 characters.


// Note: \w also accepts "_"; use [0-9A-Za-z] to exclude it
var reg = /^\w{6,12}$/

Must contain a digit

// Matches a position that is followed, somewhere later in the string, by a digit
var reg = /(?=.*[0-9])/

Requiring a lowercase letter or an uppercase letter works the same way.


// Matches a position that is followed, somewhere later in the string, by a lowercase letter
var reg = /(?=.*[a-z])/

Now combine the regexes above:


var reg = /((?=.*[0-9])(?=.*[a-z])|(?=.*[0-9])(?=.*[A-Z])|(?=.*[a-z])(?=.*[A-Z]))^\w{6,12}$/
// true
console.log(reg.test('123dde121'))
// true
console.log(reg.test('1231a311'))
// false
console.log(reg.test('12z'))
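A few more inputs help confirm the "two character types" rule, including the single-type cases that must be rejected. This sketch reuses the combined regex from above; `isValidPassword` and the sample passwords are hypothetical.

```javascript
// Any two of: digit, lowercase letter, uppercase letter; 6 to 12 characters.
const twoTypes =
  /((?=.*[0-9])(?=.*[a-z])|(?=.*[0-9])(?=.*[A-Z])|(?=.*[a-z])(?=.*[A-Z]))^\w{6,12}$/;

// Hypothetical wrapper around the regex.
function isValidPassword(password) {
  return twoTypes.test(password);
}

console.log(isValidPassword("abc123"));        // true  (lowercase + digits)
console.log(isValidPassword("abcDEF"));        // true  (lowercase + uppercase)
console.log(isValidPassword("123456"));        // false (digits only)
console.log(isValidPassword("abcdef"));        // false (lowercase only)
console.log(isValidPassword("ABCDEF"));        // false (uppercase only)
console.log(isValidPassword("a1b2"));          // false (too short)
console.log(isValidPassword("abc123abc123a")); // false (13 characters)
```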

Another way to think about it

Containing at least two character types is equivalent to: not all digits, not all lowercase letters, and not all uppercase letters. (?!p) is the inverse of (?=p).


// The position must not be followed by only digits up to the end
var reg1 = /(?![0-9]{6,12}$)/
// The position must not be followed by only lowercase letters up to the end
var reg2 = /(?![a-z]{6,12}$)/
// The position must not be followed by only uppercase letters up to the end
var reg3 = /(?![A-Z]{6,12}$)/

// All three conditions must hold at once, so the lookaheads are chained, not joined with |
var reg = /(?![0-9]{6,12}$)(?![a-z]{6,12}$)(?![A-Z]{6,12}$)^\w{6,12}$/
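The negative formulation can be sketched as three negative lookaheads placed one after another, so that all of them must hold at the start of the string. The names `notSingleType` and `isValidPasswordNegative` and the sample passwords are hypothetical.

```javascript
// Reject strings that are all digits, all lowercase, or all uppercase.
const notSingleType =
  /(?![0-9]{6,12}$)(?![a-z]{6,12}$)(?![A-Z]{6,12}$)^\w{6,12}$/;

// Hypothetical wrapper around the regex.
function isValidPasswordNegative(password) {
  return notSingleType.test(password);
}

console.log(isValidPasswordNegative("abc123")); // true
console.log(isValidPasswordNegative("abcDEF")); // true
console.log(isValidPasswordNegative("123456")); // false
console.log(isValidPasswordNegative("abcdef")); // false
console.log(isValidPasswordNegative("ABCDEF")); // false
console.log(isValidPasswordNegative("a1b2"));   // false
```

Because \w also accepts "_", this version accepts a string such as "______", which the positive-lookahead version rejects. Replacing \w with [0-9A-Za-z] removes that difference.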