If you need to handle complex string operations, consider using regular expressions.
Regular expressions (regex) are used to find and replace text.
Before we look at the formal concepts, let's take a quick look at regex through a series of examples, and then dive into some of the concepts.
I recommend the website regexr.com/
Use this website to quickly learn the basics of regex by following the examples below.
Introductory examples
A regular expression is written like this:
let reg = /.../
The ... is the content that you want to match. For example, I want to match my name, qiuyanxi, in a series of English words. How do I do that?
RegExr was created by gskinner.com,
qiuyanxi and yanxi is proudly qiuyanxi hosted qiu yan xi by Media QiuYanxi Temple.
I just write /qiuyanxi/ and it matches correctly.
g modifier: global
This only matches the first occurrence. To match every qiuyanxi in the whole text, use the g modifier for global matching. It is written like this: /qiuyanxi/g
i modifier: ignore case
As you can see, the patterns above match exactly, including letter case, so QiuYanxi is not matched. To match uppercase characters as well, use the i modifier: /qiuyanxi/gi
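The effect of the two modifiers can be checked directly in JavaScript. The following is a minimal sketch using String.prototype.match on the sample text; the variable names are only illustrative.

```javascript
// Sample text from the example above.
const sampleText =
  "RegExr was created by gskinner.com,\n" +
  "qiuyanxi and yanxi is proudly qiuyanxi hosted qiu yan xi by Media QiuYanxi Temple.";

// Without g, match() stops at the first occurrence.
const firstOnly = sampleText.match(/qiuyanxi/);
console.log(firstOnly[0]); // "qiuyanxi"

// With g, every exact (lowercase) occurrence is returned.
const allExact = sampleText.match(/qiuyanxi/g);
console.log(allExact); // ["qiuyanxi", "qiuyanxi"]

// With g and i, letter case is ignored as well.
const allAnyCase = sampleText.match(/qiuyanxi/gi);
console.log(allAnyCase); // ["qiuyanxi", "qiuyanxi", "QiuYanxi"]
```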
\d and [0-9]
Now let's change the text:
My name is QiuYanxi,my skill is 666.
My name is QiuYanxi,my skill is 66.
My name is QiuYanxi,my skill is 6.
I want to match the numbers inside. Digits run from 0 to 9. If you write 10-100, the regex will not find the numbers between 10 and 100, because a regex matches character by character and does not understand numeric magnitude. So remember that the digits in a regex are 0 through 9.
That is, I want to match any character between 0 and 9. This kind of matching, called vertical matching, is done with a character group []. It is written like this:
/[0-9]/g
This produces six matches, one per digit. In other words, the pattern matches the digits, but each number is broken up into single digits.
We don't want that, so I need to tell the regex how many characters to match, and for that I need quantifiers.
Quantifiers are written with {}.
With a quantifier, 6, 66, and 666 are each matched in full.
If you want to match everything except these digits, you can use [^…].
Inside brackets, ^ means "not".
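A short sketch of the three ideas above: a character group, a quantifier, and a negated group. The quantifier {1,3} is an assumption (it is one choice that fits numbers of one to three digits), and the sample strings are made up for illustration.

```javascript
// Illustrative text containing 666, 66 and 6.
const skills = "skill is 666. skill is 66. skill is 6.";

// A bare character group matches one digit at a time: six separate matches.
const singleDigits = skills.match(/[0-9]/g);
console.log(singleDigits.length); // 6

// Adding a quantifier lets each number match as a whole.
const wholeNumbers = skills.match(/[0-9]{1,3}/g);
console.log(wholeNumbers); // ["666", "66", "6"]

// A leading ^ inside the brackets negates the group: anything except digits.
const nonDigits = "a1b2".match(/[^0-9]/g);
console.log(nonDigits); // ["a", "b"]
```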
Shorthand characters
When using character groups, we can write [0-9a-zA-Z] to represent any digit or any upper- or lowercase letter.
[0-9] shorthand: \d
[0-9a-zA-Z_] shorthand: \w
[^0-9] shorthand: \D
[^0-9a-zA-Z_] shorthand: \W
The following character sets are commonly used
Shorthand | Description |
---|---|
. | Any character except a newline |
\w | Letters, digits, and the underscore: [a-zA-Z0-9_] |
\W | Non-word characters (symbols), equivalent to [^\w] |
\d | Digits: [0-9] |
\D | Non-digits: [^\d] |
\s | Whitespace characters, equivalent to [\t\n\f\r\p{Z}] |
\S | Non-whitespace characters: [^\s] |
\f | Form feed |
\n | Newline |
\r | Carriage return |
\t | Tab |
\v | Vertical tab |
\p | CR/LF (equivalent to \r\n), the DOS line terminator; this meaning does not apply in JavaScript |
For example, . matches every character except a newline.
What if you want to match a literal dot? Just as in JS strings, you use an escape character, and the escape character in a regex is the backslash \, so the pattern is \.
That matches only the dot character itself.
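The difference between the wildcard dot and the escaped dot can be seen in a small sketch; the sample string is made up for illustration.

```javascript
// "a.b", a newline, then "c".
const dotSample = "a.b\nc";

// An unescaped dot matches every character except the newline.
const anyChar = dotSample.match(/./g);
console.log(anyChar); // ["a", ".", "b", "c"]

// An escaped dot matches only a literal ".".
const literalDot = dotSample.match(/\./g);
console.log(literalDot); // ["."]
```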
Starting position and ending position
In the following text, I want to match My. What should I do?
My name is QiuYanxi,My skill is Awesome!
My name is QiuYanxi,My skill is 66.
My name is QiuYanxi,My skill is 6.
That's easy: /My/g matches it. But what if I only want the My at the start? Again, use the ^ metacharacter, which means "not" inside brackets and "beginning" outside them.
With ^ in front of the pattern, only the My at the very start of the text is matched; the My at the start of the other lines is not. This is because a newline is just another character to the regex, and we need to tell it that we want to match across multiple lines. Use the m modifier, which, like g, is a modifier.
With the m modifier, ^ matches at the start of every line.
The $ metacharacter works the same way for the end.
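The behaviour of ^, $, and the m modifier can be sketched as follows, counting matches on the three-line sample text. The variable names are illustrative.

```javascript
const threeLines =
  "My name is QiuYanxi,My skill is Awesome!\n" +
  "My name is QiuYanxi,My skill is 66.\n" +
  "My name is QiuYanxi,My skill is 6.";

// Every "My" anywhere in the text.
const everyMy = threeLines.match(/My/g).length;
console.log(everyMy); // 6

// ^ without m: only the very start of the whole text.
const startOfText = threeLines.match(/^My/g).length;
console.log(startOfText); // 1

// ^ with m: the start of every line.
const startOfLines = threeLines.match(/^My/gm).length;
console.log(startOfLines); // 3

// $ without m: only the end of the whole text; with m: the end of every line.
const dotAtEndOfText = threeLines.match(/\.$/g).length;
const dotAtEndOfLines = threeLines.match(/\.$/gm).length;
console.log(dotAtEndOfText, dotAtEndOfLines); // 1 2
```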
Metacharacters
Shorthand characters let us replace [0-9] with the simpler \d. What about quantifiers? Quantifiers also have metacharacters that abbreviate them.
As already introduced, quantifiers are written in braces as {minimum,maximum}, for example:
{0,1}: zero or one
{1,}: at least one, with no upper limit
{0,}: zero or more, with no upper limit
Using metacharacters instead:
{0,1} ==> ?
{1,} ==> +
{0,} ==> *
For example, \d{1,} means at least one digit with no upper limit. We can write it as \d+
The other two metacharacters are used in the same way. A list of commonly used metacharacters is attached.
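To check that each metacharacter is equivalent to its brace form, the pairs can be compared on the same input. This is a small sketch; the sample strings and the helper name sameResult are made up for illustration.

```javascript
// Hypothetical helper: true when two patterns return identical match lists.
function sameResult(text, braceForm, shortForm) {
  return JSON.stringify(text.match(braceForm)) === JSON.stringify(text.match(shortForm));
}

// {1,} is the same as +
const plusMatches = "a1b22c333".match(/\d+/g);
console.log(plusMatches); // ["1", "22", "333"]
const plusSame = sameResult("a1b22c333", /\d{1,}/g, /\d+/g);

// {0,1} is the same as ?
const optionalMatches = "color colour".match(/colou?r/g);
console.log(optionalMatches); // ["color", "colour"]
const optionalSame = sameResult("color colour", /colou{0,1}r/g, /colou?r/g);

// {0,} is the same as *
const starMatches = "ac abc abbc".match(/ab*c/g);
console.log(starMatches); // ["ac", "abc", "abbc"]
const starSame = sameResult("ac abc abbc", /ab{0,}c/g, /ab*c/g);

console.log(plusSame, optionalSame, starSame); // true true true
```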
Metacharacter | Description |
---|---|
. | Matches any single character except a newline. |
[ ] | Character class. Matches any one character inside the square brackets. |
[^ ] | Negated character class. Matches any character except those inside the square brackets. |
* | Matches 0 or more repetitions of the character before the * sign. |
+ | Matches 1 or more repetitions of the character before the + sign. |
? | Makes the character before the ? optional. |
{n,m} | Matches num repetitions of the character or character set before the braces (n <= num <= m). |
(xyz) | Group. Matches the characters xyz in exactly that order. |
| | The or operator. Matches either the expression before or the expression after the symbol. |
\ | Escape character. Lets you match the reserved characters [ ] ( ) { } . * + ? ^ $ \ | |
^ | Matches at the beginning of the input (or line). |
$ | Matches at the end of the input (or line). |
The most important of these are the bracket character set [], the parenthesized group (), and the pipe operator |.
The bracket character set is easy to understand and is used for vertical matching. It matches one of the listed characters; for example, [tT] matches either t or T, and the order inside the brackets does not matter.
The same example can also be written with parentheses () and the pipe |, as (t|T).
Parentheses () form a group that is treated as a whole.
The pipe | means "or".
Perhaps harder to understand is the parenthesized group (), which represents a whole in which the order of the characters is strictly defined.
It is harder because it is usually combined with $ references in a replacement.
My name is QiuYanxi,QiuYanxi's skill is 666.
How do I convert QiuYanxi => 666?
This is where you need to reference and then replace. A reference is a group of characters in parentheses, and $ is used to get the reference.
First wrap the required characters in parentheses, then reference them with $ plus a sequence number.
For example, (QiuYanxi) is referenced by $1.
(\d+) is referenced by $2.
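The ideas above can be sketched in JavaScript. The replacement shown is only one possible reading of the task (swapping the name and the number); the exact pattern used in the original figure is not known, so treat this one as an assumption.

```javascript
// [tT] and (t|T) describe the same set of single characters.
const withBrackets = "The tree".match(/[tT]/g);
const withPipe = "The tree".match(/(t|T)/g);
console.log(withBrackets, withPipe); // ["T", "t"] ["T", "t"]

// Two capture groups: $1 is the name, $2 is the number.
const sentence = "My name is QiuYanxi,QiuYanxi's skill is 666.";
const swapped = sentence.replace(/(QiuYanxi)'s skill is (\d+)/, "$2's skill is $1");
console.log(swapped); // "My name is QiuYanxi,666's skill is QiuYanxi."
```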
In-depth concept
Regular expressions are matching patterns that match either characters or positions.
Character match
1. Horizontal fuzzy matching: how many to match
Horizontal fuzzy matching means matching a variable number of characters.
This is done mainly with quantifiers, such as {m,n}, meaning m to n consecutive occurrences.
var regex = /ab{2,5}c/g;
var string = "abc abbc abbbc abbbbc abbbbbc abbbbbbc";
console.log( string.match(regex) );
// => ["abbc", "abbbc", "abbbbc", "abbbbbc"]
The trailing g in the example stands for the global matching pattern and is a modifier.
That is, all substrings that meet the matching pattern are found in order in the target string, emphasizing "all" rather than just "the first". g is the first letter of the word global.
2. Vertical fuzzy matching: what to match
Vertical fuzzy matching means that the character at a given position can be any one of several possibilities rather than one fixed character.
This is done with character groups, such as [abc], which means the character can be any of "a", "b", or "c".
var regex = /a[123]b/g;
var string = "a0b a1b a2b a3b a4b";
console.log( string.match(regex) );
// => ["a1b", "a2b", "a3b"]
The character in the middle can be 1, 2, or 3.
Summary
Horizontal fuzzy matching is used to match quantities and vertical fuzzy matching is used to match multiple possibilities.
Quantifiers are used for horizontal fuzzy matching and character groups are used for vertical fuzzy matching.
3. Quantifiers: used to indicate the number of characters
In layman's terms, a quantifier is the number of occurrences of the character.
Shorthand
{m,} means at least m occurrences.
{m,n} means at least m and at most n occurrences.
{m} is equivalent to {m,m}, meaning exactly m occurrences.
? is equivalent to {0,1}, meaning present or absent. How to remember: a question mark asks, is it there?
+ is equivalent to {1,}, meaning at least one occurrence. How to remember: a plus sign means adding, and you have to have one first before you can add more.
* is equivalent to {0,}, meaning any number of occurrences, including none. How to remember: look at the stars in the sky. There may be none, there may be a few scattered ones, and there may be too many to count.
3.1 Greedy matching and lazy matching
Greedy matching means matching as many characters as possible.
Lazy matching means matching as few characters as possible.
var regex = /\d{2,5}/g;
var string = "123, 1234, 12345, 123456";
console.log( string.match(regex) );
// => ["123", "1234", "12345", "12345"]
This is greedy matching: it takes as much as it is given.
In the regex above, \d asks for digits and {2,5} asks for 2 to 5 of them; if 5 are available, it takes all 5.
Lazy matching, by contrast, is satisfied as soon as it has two.
Lazy matching is written like this:
var regex = /\d{2,5}?/g;
var string = "123, 1234, 12345, 123456";
console.log( string.match(regex) );
// => ["12", "12", "34", "12", "34", "12", "34", "56"]
Lazy matching is achieved by placing a question mark after the quantifier. All the lazy forms are:
{m,n}?
{m,}?
??
+?
*?
The way to remember lazy matching is to put a question mark after the quantifier and ask are you satisfied? Are you greedy?
4. Character group: indicates the range of characters
It is important to note that a character group (character class) matches only one character. For example, [abc] matches a single character, which can be a, b, or c.
4.1 What Do I Do if the Range of Matched Characters is Too Large
If the range of characters to match is too large to write out, range notation can be used, with a hyphen -.
For example, [123456abcdefGHIJKLM] can be written as [1-6a-fG-M].
For example, the 26 lowercase letters can be written as [a-z].
Because the hyphen is special, how do you match any of the characters "a", "-", or "z"? It cannot be written as [a-z], because that means any lowercase letter. Write it as [-az], [az-], or [a\-z]: put the hyphen at the beginning, at the end, or escape it, so that the engine does not read it as a range.
4.2 What if I don’t need a Character
For example, the character can be anything except "a", "b", and "c". We can use the exclusion character ^, which inverts the group.
This is the negated character group. For example, [^abc] is any character except "a", "b", and "c". A ^ (caret) in the first position of the group means negation.
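The range, literal-hyphen, and negation rules above can be sketched as follows; the sample strings are made up for illustration.

```javascript
// Inside brackets, a-z is a range: the hyphen itself is not matched.
const asRange = "a-z".match(/[a-z]/g);
console.log(asRange); // ["a", "z"]

// Hyphen first, last, or escaped: it is matched as a literal character.
const hyphenFirst = "a-z".match(/[-az]/g);
const hyphenLast = "a-z".match(/[az-]/g);
const hyphenEscaped = "a-z".match(/[a\-z]/g);
console.log(hyphenFirst, hyphenLast, hyphenEscaped); // each is ["a", "-", "z"]

// A leading ^ negates the group: everything except a, b and c.
const notAbc = "abcdef".match(/[^abc]/g);
console.log(notAbc); // ["d", "e", "f"]
```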
Common shorthand form
\d is [0-9]. Represents a digit. How to remember: Digit.
\D is [^0-9]. Represents any character except a number.
\w is [0-9a-zA-Z_]. Represents digits, uppercase and lowercase letters, and the underscore. How to remember: w is short for word; these are also called word characters.
\W is [^0-9a-zA-Z_]. Non-word characters.
\s is [ \t\v\n\r\f]. Represents whitespace, including spaces, horizontal tabs, vertical tabs, line feeds, carriage returns, and form feeds. How to remember: s is the first letter of space.
\S is [^ \t\v\n\r\f]. Non-whitespace characters.
. is [^\n\r\u2028\u2029]. Wildcard, representing almost any character. Newline, carriage return, line separator, and paragraph separator are excluded. How to remember: think of an ellipsis…, where each dot is a placeholder for anything.
What if I want to match any character at all? Use any of [\d\D], [\w\W], [\s\S], or [^].
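A brief sketch of the difference between the dot and the "match anything" groups, plus a check that \w includes the underscore. The sample strings are made up for illustration.

```javascript
const twoLines = "a\nb";

// The dot skips the newline.
const withDot = twoLines.match(/./g);
console.log(withDot); // ["a", "b"]

// [\s\S] and [^] match every character, including the newline.
const withComplement = twoLines.match(/[\s\S]/g);
const withEmptyNegation = twoLines.match(/[^]/g);
console.log(withComplement.length, withEmptyNegation.length); // 3 3

// \w covers letters, digits and the underscore, but not the hyphen.
const wordChars = "a_1-b".match(/\w/g);
console.log(wordChars); // ["a", "_", "1", "b"]
```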
5. Multiple selection mode
Everything above matches a single pattern, but sometimes we need several alternatives. For example, I may want to match one of a, b, c, or one of x, y, z. Use alternation, separating the sub-patterns with the pipe |.
var reg = /[abc]|[xyz]/;
var string = 'xyz abc ';
var string2 = 'abc xyz';
console.log(string.match(reg));
console.log(string2.match(reg));
// => ["x", index: 0, input: "xyz abc "]
// => ["a", index: 0, input: "abc xyz"]
For example, to match "good" and "nice" you can use /good|nice/. The tests are as follows:
var regex = /good|nice/g;
var string = "good idea, nice try.";
console.log( string.match(regex) );
// => ["good", "nice"]
Case analysis
With regular expressions, analysis matters most and writing comes second. For example, take the following string:
var string = "#ffbbad #Fc01DF #FFF #ffE abc";
We need to match the hexadecimal color values.
Analysis:
1. Hexadecimal characters range over 0 to 9, a to f, and A to F
2. There is a # in front
3. The length is 3 to 6 characters
The range is the first condition, using a character group
The quantity is the third condition, using a quantifier
var reg = /#[0-9a-fA-F]{3,6}/g;
var string = "#ffbbad #Fc01DF #FFF #ffE abc";
console.log(string.match(reg));
// => ["#ffbbad", "#Fc01DF", "#FFF", "#ffE"]
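Note that {3,6} also accepts values with 4 or 5 hex digits. If only the 3-digit and 6-digit forms should be accepted (an assumption, not something the original requires), a stricter variant could look like the sketch below; the extra "#abcd" in the sample string is added for illustration.

```javascript
const colors = "#ffbbad #Fc01DF #FFF #ffE #abcd abc";

// The pattern from the analysis: 3 to 6 hex digits.
const loosePattern = /#[0-9a-fA-F]{3,6}/g;
const looseMatches = colors.match(loosePattern);
console.log(looseMatches); // ["#ffbbad", "#Fc01DF", "#FFF", "#ffE", "#abcd"]

// Stricter sketch: exactly 6 or exactly 3 hex digits, followed by a word boundary.
const strictPattern = /#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b/g;
const strictMatches = colors.match(strictPattern);
console.log(strictMatches); // ["#ffbbad", "#Fc01DF", "#FFF", "#ffE"]
```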
Match the time
23:59
24:00
Analysis:
1. The first digit is between 0 and 2
2. The second digit is between 0 and 9
3. The third digit is between 0 and 5
4. The fourth digit is between 0 and 9
5. If the first digit is 2, the second digit is between 0 and 4
6. If the first and second digits are 24, then the third and fourth digits must be 00
var reg = /(([01][0-9]|[2][0-3]):[0-5][0-9])|24:00/;
console.log(reg.test("01:09")); // true
console.log(reg.test("24:01")); // false
console.log(reg.test("00:60")); // false
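Because the pattern above has no ^ or $, test() returns true whenever a valid time appears anywhere inside the input. If the whole string must be a time, the pattern can be anchored. This is a sketch of that variant; the names looseTime and strictTime are illustrative.

```javascript
// Unanchored: a valid time anywhere in the input is enough.
const looseTime = /(([01][0-9]|2[0-3]):[0-5][0-9])|24:00/;
// Anchored: the whole input must be a valid time.
const strictTime = /^(?:(?:[01][0-9]|2[0-3]):[0-5][0-9]|24:00)$/;

const looseAcceptsExtra = looseTime.test("123:45");   // "23:45" is found inside
const strictAcceptsExtra = strictTime.test("123:45");
console.log(looseAcceptsExtra, strictAcceptsExtra); // true false

const strictResults = ["23:59", "24:00", "24:01"].map((t) => strictTime.test(t));
console.log(strictResults); // [true, true, false]
```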
If you want to allow a leading zero to be omitted, you can write it like this:
var reg = /^((0?[0-9]|1[0-9]|[2][0-3]):(0?[0-9]|[1-5][0-9])|24:00)$/;
console.log(reg.test("7:9")); // true
console.log(reg.test("24:01")); // false
console.log(reg.test("23:9")); // true
Match the date
Take the yyyy-mm-dd format as an example.
We need to match 2017-06-10.
Analysis:
What range do I need to match?
Year: digits, each between 0 and 9: [0-9]
Month: digits, either 01 to 09 or 10 to 12: (0[1-9]|1[0-2])
Day: digits, either 01 to 09, 10 to 29, or 30 to 31: (0[1-9]|[12][0-9]|3[01])
How many digits do I need to match?
The year has four digits, the month two, and the day two
const reg = /[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])/;
console.log(reg.test("2017-06-10")); // true
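A few more inputs make the limits of this pattern visible. The sketch below uses an anchored version (the ^ and $ are an addition, so that the whole string must be a date); note that the pattern checks only the shape of the date, not the number of days in each month.

```javascript
const isoDate = /^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/;

const dateChecks = ["2017-06-10", "2017-13-10", "2017-06-32", "2017-6-10"].map(
  (candidate) => isoDate.test(candidate)
);
console.log(dateChecks); // [true, false, false, false]

// Shape only: February 31st still passes.
const impossibleDate = isoDate.test("2017-02-31");
console.log(impossibleDate); // true
```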
Position matching
What is a position
A position is the gap between adjacent characters. For example, in "hello" there is a position before h, one between each pair of neighbouring letters, and one after o.
How to match positions
In a regex, there are six anchors:
^ $ \b \B (?=p) (?!p)
Match the beginning and end
The beginning and end are matched with ^ and $.
^ (caret) matches the beginning of the string; in multi-line mode it matches the beginning of each line.
$ (dollar sign) matches the end of the string; in multi-line mode it matches the end of each line.
For example, we can put a "#" at the beginning and end of a string:
var result = "hello".replace(/^|$/g, '#');
console.log(result);
// => "#hello#"
In multi-line mode, the two anchors apply to each line, which deserves attention:
var result = "I\nlove\njavascript".replace(/^|$/gm, '#');
console.log(result);
/*
#I#
#love#
#javascript#
*/
Matches word boundaries and non-word boundaries
\b is word boundary
\B Non-word boundary
\b is the position between \w and \W, including the position between \w and ^ and between \w and $.
\w is [0-9a-zA-Z_]: digits, uppercase and lowercase letters, and the underscore.
\W is everything except digits, letters, and the underscore.
For example, the \b positions in the file name "[JS] Lesson_01.mp4" look like this:
var result = "[JS] Lesson_01.mp4".replace(/\b/g, '#');
console.log(result);
// => "[#JS#] #Lesson_01#.#mp4#"
The regex above adds # at every word boundary.
What are the word boundaries? JS, Lesson_01, and mp4 all consist of \w characters.
So what is \W here? The characters [ and ], the space, and the dot.
So let’s analyze it:
The first #: there is a word boundary between [ and J.
The second #: there is a word boundary between S and ].
The third #: there is a word boundary between the space and L.
The fourth #: there is a word boundary between 1 and the dot.
The fifth #: there is a word boundary between the dot and m.
The last #: because 4 belongs to \w, there is a word boundary between 4 and the end $.
Now that \b is understood, \B is relatively easy.
\B means the opposite of \b: a non-word boundary. If you remove the \b positions from all positions in a string, everything that is left is a \B.
var result = "[JS] Lesson_01.mp4".replace(/\B/g, '#');
console.log(result);
// => "#[J#S]# L#e#s#s#o#n#_#0#1.m#p#4"
Positive and negative lookahead
(?=p) and (?!p) stand for the position in front of pattern p and every position that is not in front of p, respectively. For example:
var result = "hello".replace(/(?=l)/g, '#');
console.log(result);
// => "he#l#lo"
The code above inserts # in front of every l.
(?!p) is the opposite of (?=p), for example:
var result = "hello".replace(/(?!l)/g, '#');
console.log(result);
// => "#h#ell#o#"
Property of position
You can think of a position as an empty string.
For example, a "hello" string is equivalent to the following:
"hello" == "" + "h" + "" + "e" + "" + "l" + "" + "l" + "" + "o" + "";
That is, there can be any number of positions between characters.
A very effective way to understand positions is to think of them as empty strings.
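Since a position behaves like an empty string, stacking several anchors on the same position does not change the result. A small sketch:

```javascript
// Repeating ^ and $ still matches: each anchor consumes no characters.
const repeatedAnchors = /^^hello$$$/.test("hello");
console.log(repeatedAnchors); // true

// Lookaheads and word boundaries can be stacked on positions in the same way.
const stackedPositions = /(?=he)^^he(?=\w)llo$\b\b$/.test("hello");
console.log(stackedPositions); // true
```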
Related examples
The thousands separator representation of a number
For example, change "12345678" to "12,345,678".
You need to insert a comma in front of every group of three digits counted from the end, so it becomes
const reg = /(?=(\d{3})+$)/g
console.log('12345678'.replace(reg, ','))
// => "12,345,678"
However, if the string is 123456789, the result is ",123,456,789".
So we need to exclude the first position, which is represented by ^.
"Not at the start" can be expressed with the (?!p) form as (?!^), so it becomes
const reg = /(?!^)(?=(\d{3})+$)/g
console.log('123456789'.replace(reg, ','))
// => "123,456,789"
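Wrapped in a function, the two lookaheads can be tried on several inputs. The name formatThousands is hypothetical and the function assumes its input is a string of digits only.

```javascript
// Hypothetical helper: inserts commas into a string of digits.
function formatThousands(digits) {
  // (?!^) excludes the start; (?=(\d{3})+$) requires a multiple of three digits up to the end.
  return digits.replace(/(?!^)(?=(\d{3})+$)/g, ",");
}

const formatted = ["123", "1234", "12345678", "123456789"].map(formatThousands);
console.log(formatted); // ["123", "1,234", "12,345,678", "123,456,789"]

// Without (?!^), a comma is also inserted at the very start.
const withoutStartGuard = "123456789".replace(/(?=(\d{3})+$)/g, ",");
console.log(withoutStartGuard); // ",123,456,789"
```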
The function of regular expression parentheses
The function of parentheses, in fact, can be explained in a few words, the parentheses provide grouping, so that we can refer to them.
There are two ways to refer to a group: in JavaScript, or in regular expressions.
Grouping and branching structures
We know that /a+/ matches consecutive occurrences of “a”, and to match consecutive occurrences of “ab”, we need to use /(ab)+/.
Where parentheses provide grouping function, so that the quantifier + applies to the whole “ab”, test as follows:
var regex = /(ab)+/g;
var string = "ababa abbb ababab";
console.log( string.match(regex) );
// => ["abab", "ab", "ababab"]
In the alternation structure (p1|p2), the role of the parentheses is self-evident: they delimit the set of alternatives.
For example, to match the following strings:
var regex = /^I love (JavaScript|Regular Expression)$/;
console.log( regex.test("I love JavaScript") );
console.log( regex.test("I love Regular Expression") );
// => true
// => true
Reference group
This is an important function of parentheses, which allows us to do data extraction, as well as more powerful substitution operations.
To take advantage of its benefits, you must use the API of the implementation environment.
Take dates, for example. Assuming the format is YYYY-MM-DD, we can start with a simple re:
var regex = /\d{4}-\d{2}-\d{2}/;
Then add parentheses:
var regex = /(\d{4})-(\d{2})-(\d{2})/;
Why use this re?
Extract the data
For example, to extract the year, month, and day, you can do this:
var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
console.log( string.match(regex) );
// => ["2017-06-12", "2017", "06", "12", index: 0, input: "2017-06-12"]
match returns an array. The first element is the overall match, followed by the matches for each group (in parentheses), then the index of the match, and finally the input text. (Note: the array returned by match has a different format depending on whether the regex has the g modifier.)
Alternatively, we can use the exec method of the regex object:
var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
console.log( regex.exec(string) );
// => ["2017-06-12", "2017", "06", "12", index: 0, input: "2017-06-12"]
The groups can also be read from the constructor's global properties $1 to $9:
var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
regex.test(string); // any regex operation will do, for example
//regex.exec(string);
//string.match(regex);
console.log(RegExp.$1); // "2017"
console.log(RegExp.$2); // "06"
console.log(RegExp.$3); // "12"
Replace
For example, if you want to replace yyyy-mm-dd with mm/dd/yyyy, what do you do?
var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
var result = string.replace(regex, "$2/$3/$1");
console.log(result);
// => "06/12/2017"
This is equivalent to:
var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
var result = string.replace(regex, function() {
  return RegExp.$2 + "/" + RegExp.$3 + "/" + RegExp.$1;
});
console.log(result);
// => "06/12/2017"
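Two further ways to write the same replacement, shown here as alternatives rather than as the original author's method: the replacement callback receives the groups as arguments, and named groups (ES2018) let the replacement string refer to groups by name. The group names year, month, and day are chosen for illustration.

```javascript
const isoString = "2017-06-12";

// The callback receives the full match followed by each captured group.
const viaCallback = isoString.replace(
  /(\d{4})-(\d{2})-(\d{2})/,
  (match, year, month, day) => `${month}/${day}/${year}`
);
console.log(viaCallback); // "06/12/2017"

// Named groups are referenced as $<name> in the replacement string.
const viaNamedGroups = isoString.replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  "$<month>/$<day>/$<year>"
);
console.log(viaNamedGroups); // "06/12/2017"
```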
backreferences
In addition to referring to groups using the corresponding API, you can also refer to groups within the re itself. But you can only refer to the previous grouping, which is called a backreference.
Again, take dates.
For example, to write a re that matches one of the following three formats:
2016-06-12
2016/06/12
2016.06.12
The first regular that might come to mind is:
var regex = /\d{4}(-|\/|\.)\d{2}(-|\/|\.)\d{2}/;
var string1 = "2017-06-12";
var string2 = "2017/06/12";
var string3 = "2017.06.12";
var string4 = "2016-06/12";
console.log( regex.test(string1) ); // true
console.log( regex.test(string2) ); // true
console.log( regex.test(string3) ); // true
console.log( regex.test(string4) ); // true
Here / and . need to be escaped. Although the required formats are matched, data such as "2016-06/12" is matched as well.
What if we want the two separators to be the same? Use a backreference:
var regex = /\d{4}(-|\/|\.)\d{2}\1\d{2}/;
var string1 = "2017-06-12";
var string2 = "2017/06/12";
var string3 = "2017.06.12";
var string4 = "2016-06/12";
console.log( regex.test(string1) ); // true
console.log( regex.test(string2) ); // true
console.log( regex.test(string3) ); // true
console.log( regex.test(string4) ); // false
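The backreference \1 always repeats whatever the first group actually captured. The sketch below reads the captured separator with exec and then applies the same idea to a different, made-up task (finding a doubled word).

```javascript
const sameSeparator = /\d{4}(-|\/|\.)\d{2}\1\d{2}/;

// Group 1 holds the separator that \1 then has to repeat.
const capturedSeparator = sameSeparator.exec("2017/06/12")[1];
console.log(capturedSeparator); // "/"

// Mixed separators fail because \1 must equal the first separator.
const mixedSeparators = sameSeparator.test("2016-06/12");
console.log(mixedSeparators); // false

// Same technique, different task: a word immediately repeated.
const doubledWord = "this is is a test".match(/\b(\w+) \1\b/);
console.log(doubledWord[0]); // "is is"
```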
Four operations on regular expressions
Validate
var regex = /\d/;
var string = "abc123";
console.log( regex.test(string) );
// => true
Split
Once we can match, we can do things like splitting.
Splitting means cutting the target string into segments. In JS, split is used.
For example, if the target string is "html,css,javascript", split by commas:
var regex = /,/;
var string = "html,css,javascript";
console.log( string.split(regex) );
// => ["html", "css", "javascript"]
You can use split to “cut out” year month day:
var regex = /\D/;
console.log( "2017/06/26".split(regex) );
console.log( "2017.06.26".split(regex) );
console.log( "2017-06-26".split(regex) );
// => ["2017", "06", "26"]
// => ["2017", "06", "26"]
// => ["2017", "06", "26"]
Extract
Although the whole match is made, it is sometimes necessary to extract partial matched data.
In this case, the regex usually uses the grouping reference (grouping capture) function, along with the related API.
Here, again, I’m taking the date as an example and extracting the year, month and day. Note the parentheses in the re below:
var regex = /^(\d{4})\D(\d{2})\D(\d{2})$/;
var string = "2017-06-26";
console.log( string.match(regex) );
// =>["2017-06-26", "2017", "06", "26", index: 0, input: "2017-06-26"]
Replace
Finding is often not the goal; usually the next step is to replace. In JS, replace is used.
For example, change the date format from yyyy-mm-dd to yyyy/mm/dd:
var string = "2017-06-26";
var today = new Date( string.replace(/-/g, "/") );
console.log( today );
// => Mon Jun 26 2017 00:00:00 GMT+0800
References
Full tutorial on regular expressions
learn-regex