A look at the ES6 regular expression u modifier
ES6 added the u modifier to regular expressions, which makes JavaScript regular expressions more powerful and more accurate.
You can use the RegExp.prototype.unicode property to check whether a regular expression uses the u modifier.
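As a quick check of the property mentioned above, here is a minimal sketch. The variable names are illustrative only; the unicode and flags properties are standard RegExp properties.

```javascript
// Two patterns for the same character, one with and one without the u modifier.
const withU = /\u{20BB7}/u;        // code point escape, only valid with u
const withoutU = /\uD842\uDFB7/;   // the same character written as a surrogate pair

console.log(withU.unicode);    // true
console.log(withoutU.unicode); // false
console.log(withU.flags);      // "u"
```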
Let's look at what the u modifier does in the following scenarios:
-
Match any character
- The old approach is to use [\d\D], [\s\S], [\w\W] or [^] to match any character:
/[\d\D]/.test("\r"); // true
- Of course, you can also start from the meaning of ., i.e. [^\n\r\u2028\u2029], and then use .|[\n\r\u2028\u2029] to match any character:
/^(.|[\n\r\u2028\u2029])*$/.test("1a\t\n\n\r\f-_("); // true
- Another approach is to use the s modifier, which was added after ES6 (in ES2018). It lets the special character . match any single character in the Unicode Basic Multilingual Plane (BMP), line terminators included:
"\n".match(/./); // null
"\n".match(/./s); // ["\n", index: 0, input: "\n", groups: undefined]
Whether the s modifier is used can be checked through the RegExp.prototype.dotAll property:
/./.dotAll; // false
/./s.dotAll; // true
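To make the effect of the s modifier concrete, the following sketch tests the four line terminators that . normally skips. The variable names are illustrative, and it assumes a runtime that supports the s modifier (ES2018 or later).

```javascript
// "." without s skips these four line terminators; with s it matches them.
const plainDot = /^.$/;
const dotAllDot = /^.$/s;
const lineTerminators = ["\n", "\r", "\u2028", "\u2029"];

const plainResults = lineTerminators.map((ch) => plainDot.test(ch));
const dotAllResults = lineTerminators.map((ch) => dotAllDot.test(ch));

console.log(plainResults);      // [false, false, false, false]
console.log(dotAllResults);     // [true, true, true, true]
console.log(plainDot.dotAll);   // false
console.log(dotAllDot.dotAll);  // true
```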
However, this is not the end of the story. When you try to match the following example directly with the regular expressions above, you will find that the character is not recognized properly; only the first half of its surrogate pair is matched:
"𠮷".match(/[\d\D]/); // ["\ud842", index: 0, input: "𠮷", groups: undefined]
To correctly identify Unicode characters with code points greater than U+FFFF, the u modifier comes in handy. It correctly handles characters that take four bytes (a surrogate pair) in UTF-16:
"𠮷".match(/[\d\D]/u); // ["𠮷", index: 0, input: "𠮷", groups: undefined]
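The difference between matching a code unit and matching a whole code point can be checked with a short sketch. The names are illustrative; the character is "𠮷" (U+20BB7), written as an escape to keep the snippet encoding-independent.

```javascript
const ji = "\u{20BB7}";            // "𠮷", stored as two UTF-16 code units
console.log(ji.length);            // 2

// Without u, the class matches a single code unit (the high surrogate).
const withoutU = ji.match(/[\d\D]/);
console.log(withoutU[0] === "\uD842"); // true

// With u, the class matches the whole code point.
const withU = ji.match(/[\d\D]/u);
console.log(withU[0] === ji);          // true

// The s modifier alone has the same limitation; combine it with u.
const dotAllOnly = /^.$/s.test(ji);    // false
const dotAllWithU = /^.$/su.test(ji);  // true
console.log(dotAllOnly, dotAllWithU);
```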
-
Match any position
We know that \b matches the boundary between a word and a non-word, and \B matches every position that is not such a boundary. Someone who has just learned the u modifier might write /[\b\B]/ug. Although it looks reasonable, it is still not the correct solution: inside a character class \b does not mean a word boundary, and with the u modifier \B inside a character class is an invalid escape.
The correct approach is /\b|\B/ug. Besides using the u modifier, you also need to know what /[\b]/ means: it matches the escape character \b itself, i.e. the backspace character \u0008.
In addition, for the word boundary \b, a "word" in both JavaScript and Java regular expressions is a string of \w characters, i.e. [a-zA-Z0-9_]. In .NET, a regular expression word is a string of [a-zA-Z0-9_] plus Unicode characters (Chinese characters, full-width characters, etc.).
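The behaviour described above can be sketched as follows. The names are illustrative; the snippet marks every position in a short string, checks what [\b] matches, and builds the character-class version through the RegExp constructor so that the failure can be caught at run time.

```javascript
// \b|\B matches at every position, so a 4-character string has 5 matches.
const anyPosition = /\b|\B/gu;
const marked = "ab c".replace(anyPosition, "|");
const positionCount = "ab c".match(anyPosition).length;
console.log(marked);        // "|a|b| |c|"
console.log(positionCount); // 5

// Inside a character class, \b is the backspace character U+0008.
const matchesBackspace = /[\b]/.test("\u0008"); // true
const matchesLetterB = /[\b]/.test("b");        // false
console.log(matchesBackspace, matchesLetterB);

// With u, \B inside a character class is rejected when the pattern is compiled.
let classWithBigB = "compiled";
try {
  new RegExp("[\\b\\B]", "u");
} catch (err) {
  classWithBigB = err.name;
}
console.log(classWithBigB); // "SyntaxError"
```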
-
Correct identification of quantifiers
With the u modifier, every quantifier applies to a whole Unicode character even when its code point is greater than 0xFFFF. Without it, the quantifier applies only to the second half of the surrogate pair.
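A small sketch of the quantifier behaviour, using "𠮷" (U+20BB7) again. The names are illustrative, and the patterns are built with the RegExp constructor only to keep the source free of non-ASCII literals.

```javascript
const ji = "\u{20BB7}";        // "𠮷"
const doubled = ji + ji;       // "𠮷𠮷"

// Pattern source is "^𠮷{2}$" in both cases; only the modifier differs.
const twiceWithoutU = new RegExp("^" + ji + "{2}$").test(doubled);
const twiceWithU = new RegExp("^" + ji + "{2}$", "u").test(doubled);
console.log(twiceWithoutU); // false: {2} repeats only the low surrogate
console.log(twiceWithU);    // true:  {2} repeats the whole character

// The same applies to "." with a quantifier.
const twoUnits = /^.{2}$/.test(ji);   // true:  two code units
const twoPoints = /^.{2}$/u.test(ji); // false: one code point
console.log(twoUnits, twoPoints);
```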
-
Correct recognition of predefined patterns
In regular expressions, \d, \w, \s and their negations \D, \W, \S are predefined classes. Similarly, they can match a Unicode character with a code point greater than 0xFFFF as a single character only when the u modifier is used.
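For the predefined classes, the difference shows up when a negated class such as \S or \W has to cover a character outside the BMP. A minimal sketch with illustrative names:

```javascript
const ji = "\u{20BB7}"; // "𠮷", a non-whitespace, non-word character above U+FFFF

// Without u, \S and \W each consume one code unit, so "^...$" cannot match.
const nonSpaceWithoutU = /^\S$/.test(ji); // false
const nonWordWithoutU = /^\W$/.test(ji);  // false

// With u, they consume the whole code point.
const nonSpaceWithU = /^\S$/u.test(ji);   // true
const nonWordWithU = /^\W$/u.test(ji);    // true

console.log(nonSpaceWithoutU, nonWordWithoutU, nonSpaceWithU, nonWordWithU);
```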
-
Standardize the writing of escape characters
This helps you learn which characters are really regex metacharacters, because with the u modifier an error is reported when a character that does not need escaping is escaped:
/\a/ // /\a/
/\a/u // Uncaught SyntaxError: Invalid regular expression: /\a/: Invalid escape
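Because an invalid regex literal fails when the script is parsed, the check is easier to experiment with through the RegExp constructor. The helper compiles below is a hypothetical name introduced for this sketch, not a built-in.

```javascript
// Hypothetical helper: reports whether a pattern source compiles with the given flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

const looseA = compiles("\\a", "");     // true:  \a is tolerated without u
const strictA = compiles("\\a", "u");   // false: "a" is not a metacharacter
const strictDot = compiles("\\.", "u"); // true:  "." is a metacharacter
const strictEnd = compiles("\\$", "u"); // true:  "$" is a metacharacter

console.log(looseA, strictA, strictDot, strictEnd);
```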
-
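Recognize non-canonical characters with the i modifier
Some Unicode characters have different code points but nearly identical glyphs; for example, \u004B is the letter K and \u212A is the Kelvin sign. With the u modifier, the i modifier also recognizes the non-canonical one:
/[a-z]/iu.test('\u212A') // true
/[a-z]/iu.test('\u004B') // true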
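To see what the u modifier changes here, the sketch below runs the same test with and without it. The names are illustrative.

```javascript
const kelvin = "\u212A"; // KELVIN SIGN, looks like "K"
const letterK = "\u004B"; // LATIN CAPITAL LETTER K

// Without u, case-insensitive matching does not relate the Kelvin sign to "k".
const kelvinWithoutU = /[a-z]/i.test(kelvin);  // false
// With u, Unicode case folding maps it to "k".
const kelvinWithU = /[a-z]/iu.test(kelvin);    // true

// The ordinary letter K matches either way.
const letterWithoutU = /[a-z]/i.test(letterK); // true
const letterWithU = /[a-z]/iu.test(letterK);   // true

console.log(kelvinWithoutU, kelvinWithU, letterWithoutU, letterWithU);
```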
-
Unicode property classes \p{…} and \P{…}
This newer kind of class matches all characters that have a given Unicode property; \P{…} is the negation of \p{…}. Both require the u modifier. There are two ways to write it:
- Specify the property name and value
\p{UnicodePropertyName=UnicodePropertyValue}
// example:
/\p{Script=Greek}/u // matches a Greek character
- Write only the property name or the property value
\p{UnicodePropertyName}
\p{UnicodePropertyValue}
// examples:
/\p{White_Space}/u // matches all whitespace characters
/\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu // matches Emoji
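The property classes above can be tried out with a short sketch. It assumes a runtime that supports Unicode property escapes (ES2018 or later); the variable names and sample strings are illustrative.

```javascript
// Property name and value.
const greek = /^\p{Script=Greek}+$/u;
const isGreek = greek.test("\u03B1\u03B2\u03B3");       // "αβγ" -> true
const latinIsGreek = greek.test("abc");                 // false

// \P{...} negates the class.
const notGreek = /^\P{Script=Greek}+$/u.test("abc");    // true

// Property name only: U+3000 (ideographic space) has the White_Space property.
const wideSpace = /\p{White_Space}/u.test("\u3000");    // true

// The Emoji pattern from the text: thumbs up (U+1F44D) followed by a
// skin-tone modifier (U+1F3FD) is matched as one unit.
const emoji = /\p{Emoji_Modifier_Base}\p{Emoji_Modifier}?|\p{Emoji_Presentation}|\p{Emoji}\uFE0F/gu;
const thumbs = "\u{1F44D}\u{1F3FD}";
const emojiMatches = thumbs.match(emoji);
console.log(isGreek, latinIsGreek, notGreek, wideSpace);
console.log(emojiMatches.length, emojiMatches[0] === thumbs); // 1 true
```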