Do you really know how to use regular expressions? If you do, treat this article as a refresher. If you don't, now is the time to learn. I think regular expressions are one indirect measure of how good a programmer is, so I suggest getting to know them well.
Today I will cover three topics:
- Why learn regular expressions?
- How to create regular expressions, and their syntax rules
- How to use regular expressions
1. Why use regular expressions?
A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a certain substring, to replace the matched substring, or to extract from a string the substrings that satisfy some condition.
Here is a simple example to show the benefit; the code says more than any description:
// Find all the numbers in the string
var str = 'Code more than 123 Nuggets 45 non-stop 6789!! 111 ';
// Method 1: scan the string character by character
function findNum(str) {
  var tmp = '', arr = [];
  for (var i = 0; i < str.length; i++) {
    var cur = str[i];
    // Compare with '0'..'9' directly: isNaN(' ') is false, so isNaN would treat spaces as digits
    if (cur >= '0' && cur <= '9') {
      tmp += cur;
    } else {
      if (tmp) {
        arr.push(tmp);
        tmp = '';
      }
    }
  }
  if (tmp) {
    arr.push(tmp);
  }
  return arr;
}
console.log(findNum(str)); // ["123", "45", "6789", "111"]
// Method 2: use a regular expression
var reg = /\d+/g;
console.log(str.match(reg)) // ["123", "45", "6789", "111"]
After looking at the examples above, you can probably see why regular expressions are worth learning: they cut down the amount of code and make this kind of string handling much quicker to write. So although a regular expression may look like a pile of gibberish, it is worth the effort.
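The three uses named earlier (checking, replacing and extracting) can be sketched in a few lines. This is only an illustration; the sample text is made up and does not come from the example above.

```javascript
// Hypothetical sample text, used only for this illustration.
const text = 'Order 66 shipped, order 77 pending';

// 1. Check: does the text contain at least one digit?
const hasDigit = /\d/.test(text);

// 2. Replace: mask every run of digits with "#".
const masked = text.replace(/\d+/g, '#');

// 3. Extract: collect every run of digits.
const numbers = text.match(/\d+/g);

console.log(hasDigit); // true
console.log(masked);   // "Order # shipped, order # pending"
console.log(numbers);  // ["66", "77"]
```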
2. Creating regular expressions and syntax rules
Regular expressions can be created in either of the following ways:
- Literal creation
- Instance creation (with the RegExp constructor)
// Method 1: literal creation
var reg = /pattern/flags
// Method 2: instance creation
var reg = new RegExp(pattern,flags);
/**
 * pattern: the regular expression; flags: the flags
 * 1. i: ignore case when matching
 * 2. m: multi-line matching (^ and $ also match at the start and end of each line)
 * 3. g: global matching (find every match instead of stopping at the first)
 */
Now that we know how to create them, let's look at the differences between the two methods:
2.1 Differences between the two methods
- A literal cannot be built by concatenating strings or variables; instance creation can
var regParam = 'More code, more nuggets! ';
var r1 = new RegExp(regParam + 'Ollie to');
var r2 = /regParam/; // the literal treats regParam as plain text, not as a variable
console.log(r1); // /More code, more nuggets! Ollie to/
console.log(r2); // /regParam/
- In a literal, escape sequences such as \d are written as they are. With instance creation the pattern is a string, so the backslash itself has to be escaped
var r1 = new RegExp('\d'); // /d/
var r2 = new RegExp('\\d') // /\d/
var r3 = /\d/; // /\d/
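Both differences meet when a pattern is built from text that is only known at run time. Below is a minimal sketch under stated assumptions: escapeRegExp and countKeyword are hypothetical helper names, not part of any library, and the escaping character class is one common choice rather than an official list.

```javascript
// Hypothetical helper: put a backslash in front of every character
// that has a special meaning in a pattern, so the text matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: count how often a keyword occurs, ignoring case.
// Only instance creation works here, because the pattern is a variable.
function countKeyword(source, keyword) {
  const reg = new RegExp(escapeRegExp(keyword), 'gi');
  const found = source.match(reg);
  return found ? found.length : 0;
}

console.log(escapeRegExp('a.b'));                  // a\.b
console.log(countKeyword('a.b a-b A.B', 'a.b'));   // 2
```

Without the escaping step, the dot in 'a.b' would match any character and 'a-b' would be counted as well.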
2.2 Syntax rules
With creation covered, we come to the main part: the syntax. To use regular expressions you have to learn their syntax, just as you did with HTML and JS. Enough talk, let's get to work.
2.2.1 Metacharacters
- Metacharacters with a special meaning
\d : any digit from 0 to 9; one \d takes up exactly one position
\w : a digit, a letter or an underscore: 0-9 A-Z a-z _
\s : a space or other whitespace character
\D : any character except \d
\W : any character except \w
\S : any character except \s
. : any single character except \n
\ : the escape character
| : or
() : grouping
\n : matches a newline
\b : matches a boundary; the start and end of the string and spaces all count as boundaries => takes up no character position
^ : anchors the start position => takes up no position itself
$ : anchors the end position => takes up no position itself
[a-z] : any one letter; any single character listed inside [] may match
[^a-z] : any character that is not a letter; ^ inside [] means "anything except"
- Quantifier metacharacters, which express a number of repetitions
* : 0 or more times
+ : 1 or more times
? : 0 times or 1 time (optional)
{n} : exactly n times
{n,} : n or more times
{n,m} : n to m times
A quantifier follows a metacharacter, as in \d+, and limits how many times the preceding metacharacter may occur
var str = '1223334444';
var reg = /\d{2}/g;
var res = str.match(reg);
console.log(res) //["12", "23", "33", "44", "44"]
var str = '  I am the space king  ';
var reg = /^\s+|\s+$/g; // matches the leading and trailing spaces
var res = str.replace(reg, '');
console.log('(' + res + ')') // (I am the space king)
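Each quantifier from the list can be tried on its own. The patterns and test strings below are made up purely to show one quantifier at a time.

```javascript
// Hypothetical patterns, one per quantifier.
const optionalU = /^colou?r$/;    // ?     : "u" appears 0 or 1 time
const zeroOrMore = /^ab*$/;       // *     : "b" appears 0 or more times
const fourDigits = /^\d{4}$/;     // {n}   : exactly 4 digits
const twoOrMoreA = /^a{2,}$/;     // {n,}  : at least 2 "a"
const threeToFive = /^\w{3,5}$/;  // {n,m} : 3 to 5 word characters

console.log(optionalU.test('color'), optionalU.test('colour'));  // true true
console.log(zeroOrMore.test('a'), zeroOrMore.test('abbb'));      // true true
console.log(fourDigits.test('2017'), fourDigits.test('20170'));  // true false
console.log(twoOrMoreA.test('a'), twoOrMoreA.test('aaa'));       // false true
console.log(threeToFive.test('ab'), threeToFive.test('ab_12'));  // false true
```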
Square brackets [], parentheses () and repeated sub-items (backreferences) deserve a closer look:
- Characters inside [] generally lose their special meaning; for example, + simply means a literal +. Escapes such as \w keep their special meaning
var str1 = 'abc';
var str2 = 'dbc';
var str3 = '.bc';
var reg = /[ab.]bc/; // inside [] the . stands for a literal dot
reg.test(str1) //true
reg.test(str2) //false
reg.test(str3) //true
- Inside [] there are no multi-digit numbers
/** [12] means 1 or 2, not twelve; [0-9] means 0 to 9 and [a-z] means a to z. Example: match every age from 18 to 65 */
var reg = /[18-65]/; // Is this right? No:
reg.test('50')
// Uncaught SyntaxError: Invalid regular expression: /[18-65]/: Range out of order in character class
// [18-65] is read as 1, the range 8-6, and 5; the range 8-6 runs backwards, hence the error.
// [16-85] throws no error, but it only means 1, 6-8 or 5, so it cannot match the ages 16 to 85 either.
// To really match 18 to 65, split the range into three parts: 18-19, 20-59 and 60-65.
// ^ and $ make sure the whole string is the age, so '165' does not pass.
reg = /^((18|19)|([2-5]\d)|(6[0-5]))$/;
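The split into 18-19, 20-59 and 60-65 can be checked against the edge values of the range. A minimal sketch: isAgeInRange is a hypothetical helper name, and the ^ and $ anchors are an addition here so that inputs such as 165 are rejected.

```javascript
// The three parts 18-19, 20-59 and 60-65, anchored to the whole string.
const ageReg = /^(?:1[89]|[2-5]\d|6[0-5])$/;

// Hypothetical helper: accepts a number or a string.
function isAgeInRange(age) {
  return ageReg.test(String(age));
}

const candidates = [17, 18, 19, 20, 59, 60, 65, 66, 165];
const accepted = candidates.filter(isAgeInRange);
console.log(accepted); // [18, 19, 20, 59, 60, 65]
```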
- () raises priority: whenever | appears, check whether () is needed to raise the priority;
- () also forms groups that can be repeated as sub-items (backreferences)
/** Grouping: as soon as a regular expression contains groups, the results of exec (match) and replace change (the regex methods are covered later). Every pair of parentheses in the expression forms a group, and \n refers back to it (n is the group number), so \1 stands for the first group. Example: find the letter or letters that occur most often in the string below, ignoring case, and how many times they occur. */
var str = 'abbbbAAbcBCCccdaACBDDabcccddddaab';
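// Lowercase the string and sort it so that identical letters end up next to each other
str = str.toLowerCase().split('').sort(function(a, b){
  return a.localeCompare(b);
}).join('');
var reg = /(\w)\1+/ig;
var maxStr = '';
var maxLen = 0;
str.replace(reg, function($0, $1){
  var regLen = $0.length; // length of one run of identical letters
  if (regLen > maxLen) {
    maxLen = regLen;
    maxStr = $1;
  } else if (maxLen == regLen) {
    maxStr += $1; // a tie: remember this letter as well
  }
});
console.log(`The most frequent letter is ${maxStr}, appearing ${maxLen} times`);
// The most frequent letter is bc, appearing 9 times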
- When () is added only to raise priority and we do not want to capture the group, we can put ?: at the start of the () to cancel the capture
var str = 'aaabbb';
var reg = /(a+)(?:b+)/;
var res =reg.exec(str);
console.log(res)
//["aaabbb", "aaa", index: 0, input: "aaabbb"]
// Capture only the contents of the first small group
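Backreferences and non-capturing groups can be seen side by side in a small experiment. The sentence and variable names below are made up for illustration.

```javascript
// (\w+) captures a word and \1 demands the very same text again,
// so this pattern finds accidentally doubled words.
const doubled = /\b(\w+)\s+\1\b/gi;
const sentence = 'This is is a test test sentence';

const pairs = sentence.match(doubled);
const cleaned = sentence.replace(doubled, '$1'); // keep one copy of each word

console.log(pairs);   // ["is is", "test test"]
console.log(cleaned); // "This is a test sentence"

// (?:ab) groups "ab" for the + quantifier but is not captured,
// so the only captured group is (c).
const res = /(?:ab)+(c)/.exec('ababc');
console.log(res[0], res[1], res.length); // ababc c 2
```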
2.2.2 Operator precedence in regular expressions
- A regular expression is evaluated from left to right and follows an order of precedence, much like an arithmetic expression.
- Operators of equal precedence are evaluated from left to right; operators of different precedence are evaluated from higher to lower.
// The order of precedence of the common operators, from highest to lowest:
// \                                        : escape character
// (), (?:), (?=), []                       : parentheses and square brackets
// *, +, ?, {n}, {n,}, {n,m}                : quantifiers
// ^, $, \any metacharacter, any character  : anchors and character sequences
// |                                        : alternation, the "or" operation
// Ordinary characters bind more tightly than |, so we often use () to raise the priority of |.
// For example, to match food or foot, write reg = /foo(t|d)/
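The effect of precedence is easiest to see by comparing a pattern with and without parentheses. The patterns and inputs below are made up for illustration.

```javascript
// | has the lowest priority, so this reads as (^foo) OR (bar$).
const loose = /^foo|bar$/;
// Parentheses make the anchors apply to both alternatives.
const strict = /^(?:foo|bar)$/;

console.log(loose.test('foobaz'));  // true: the string starts with "foo"
console.log(strict.test('foobaz')); // false: the whole string must be foo or bar
console.log(strict.test('bar'));    // true

// A quantifier binds more tightly than a character sequence:
// in /fo+/ the + applies only to "o", in /(?:fo)+/ to the whole "fo".
const single = 'fofo'.match(/fo+/)[0];
const grouped = 'fofo'.match(/(?:fo)+/)[0];
console.log(single);  // "fo"
console.log(grouped); // "fofo"
```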
2.2.3 Characteristics of regular expressions
- Greediness
Greediness means that each time the regular expression captures, it captures as much as it can. If we want the shortest string that still satisfies the pattern, we can add ? after the quantifier metacharacter.
- Laziness
Laziness means that once the regular expression has captured successfully, it stops, whether or not the rest of the string also matches. To capture every matching string in the target, we can use the g flag to request a global capture.
var str = '123aaa456';
var reg = /\d+/; // greedy: captures as many digits as possible, but only once
var res = str.match(reg)
console.log(res)
// ["123", index: 0, input: "123aaa456"]
reg = /\d+?/g; // ? removes the greediness, g removes the laziness
res = str.match(reg)
console.log(res)
// ["1", "2", "3", "4", "5", "6"]
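Greediness matters most when a pattern contains a wildcard. The following sketch uses a made-up snippet of markup to compare a greedy and a non-greedy quantifier.

```javascript
// Hypothetical input string.
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ runs to the last ">" in the string.
const greedy = html.match(/<.+>/)[0];
// Non-greedy: .+? stops at the first ">" it can.
const lazy = html.match(/<.+?>/)[0];
// Non-greedy plus g: every tag is captured separately.
const allTags = html.match(/<.+?>/g);

console.log(greedy);  // "<b>bold</b> and <i>italic</i>"
console.log(lazy);    // "<b>"
console.log(allTags); // ["<b>", "</b>", "<i>", "</i>"]
```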
3. Using regular expressions
We will cover only test, exec, match and replace here
- reg.test(str) checks whether a string matches the regular expression: it returns true if it does and false otherwise
var str = 'abc';
var reg = /\w+/;
console.log(reg.test(str)); //true
- reg.exec(str) captures the string that matches the rule
var str = 'abc123cba456aaa789';
var reg = /\d+/;
console.log(reg.exec(str))
// ["123", index: 3, input: "abc123cba456aaa789"];
console.log(reg.lastIndex)
// lastIndex : 0
/** The array returned by reg.exec looks like [0: "123", index: 3, input: "abc123cba456aaa789"]
 0: "123" is the string we captured
 index: 3 is the index at which the capture starts
 input is the original string */
When we capture with exec and the regular expression has no g flag, every call to exec returns the same result. With the g flag the result differs from call to call. Let's go back to the example above.
var str = 'abc123cba456aaa789';
var reg = /\d+/g; // add the identifier g
console.log(reg.lastIndex)
// lastIndex : 0
console.log(reg.exec(str))
// ["123", index: 3, input: "abc123cba456aaa789"]
console.log(reg.lastIndex)
// lastIndex : 6
console.log(reg.exec(str))
// ["456", index: 9, input: "abc123cba456aaa789"]
console.log(reg.lastIndex)
// lastIndex : 12
console.log(reg.exec(str))
// ["789", index: 15, input: "abc123cba456aaa789"]
console.log(reg.lastIndex)
// lastIndex : 18
console.log(reg.exec(str))
// null
console.log(reg.lastIndex)
// lastIndex : 0
/** With the g flag, every call to exec captures a different string, because lastIndex records the position where the next capture starts. lastIndex is 0 before capturing begins. When a capture returns null, lastIndex is reset to 0 and the next capture starts from the beginning again. The lastIndex property can also be assigned manually. */
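lastIndex is shared by exec and test, which is a common source of surprises when one global regular expression is reused. A small sketch with a made-up string shows both that behaviour and the manual assignment mentioned above.

```javascript
const reg = /\d+/g;
const str = 'abc123cba456';

const first = reg.test(str);   // finds "123"
const afterFirst = reg.lastIndex;
const second = reg.test(str);  // continues after "123" and finds "456"
const third = reg.test(str);   // nothing left: false, lastIndex goes back to 0
const afterThird = reg.lastIndex;

console.log(first, afterFirst);  // true 6
console.log(second, third);      // true false
console.log(afterThird);         // 0

// Manual assignment: start the next capture at index 6, skipping "123".
reg.lastIndex = 6;
const skipped = reg.exec(str);
console.log(skipped[0], skipped.index); // 456 9
```

If such a regular expression is only meant to answer yes or no, leaving out the g flag avoids the moving lastIndex altogether.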
The result captured by exec is also affected by grouping ()
var str = '2017-01-05';
var reg = /-(\d+)/g;
console.log(reg.exec(str));
// ["-01", "01", index: 4, input: "2017-01-05"]
// "-01": the content captured by the whole regular expression
// "01": the content captured by the group inside it
- str.match(reg) returns an array of matches if it succeeds, or null if nothing matches
// match works much like exec
var str = 'abc123cba456aaa789';
var reg = /\d+/;
console.log(reg.exec(str));
//["123", index: 3, input: "abc123cba456aaa789"]
console.log(str.match(reg));
//["123", index: 3, input: "abc123cba456aaa789"]
What is the difference between the two results logged above? None: the two results are identical. The difference only shows up when we do a global match.
var str = 'abc123cba456aaa789';
var reg = /\d+/g;
console.log(reg.exec(str));
// ["123", index: 3, input: "abc123cba456aaa789"]
console.log(str.match(reg));
// ["123", "456", "789"]
When a global match is performed, the match method captures all the matching strings into the array at once. If you want to use exec to achieve the same effect, you need to execute exec multiple times.
We can try using exec to simply simulate the implementation of the match method.
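String.prototype.myMatch = function (reg) {
  var arr = [];
  var res = reg.exec(this);
  if (reg.global) {
    // With the g flag, keep calling exec until it returns null
    while (res) {
      arr.push(res[0]);
      res = reg.exec(this);
    }
  } else if (res) {
    // Without the g flag, exec only ever returns the first match
    arr.push(res[0]);
  }
  // Like match, return null when nothing matched
  return arr.length ? arr : null;
};
var str = 'abc123cba456aaa789';
var reg = /\d+/;
console.log(str.myMatch(reg))
// ["123"]
var str = 'abc123cba456aaa789';
var reg = /\d+/g;
console.log(str.myMatch(reg))
// ["123", "456", "789"]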
In addition, both match and exec are affected by grouping (). match, however, shows the contents of the groups only when there is no g flag. With the global g flag, match captures all the full matches into an array at once and leaves the groups out.
var str = 'abc';
var reg = /(a)(b)(c)/;
console.log( str.match(reg) );
// ["abc", "a", "b", "c", index: 0, input: "abc"]
console.log( reg.exec(str) );
// ["abc", "a", "b", "c", index: 0, input: "abc"]
// With the global flag g:
var str = 'abc';
var reg = /(a)(b)(c)/g;
console.log( str.match(reg) );
// ["abc"]
console.log( reg.exec(str) );
// ["abc", "a", "b", "c", index: 0, input: "abc"]
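Since match drops the groups when the g flag is set, one way to collect the group contents of every match is to loop with exec. A minimal sketch: captureAll is a hypothetical helper name, and the guard against empty matches is an added precaution that the simple simulation above does not have.

```javascript
// Hypothetical helper: return the first group of every match.
function captureAll(source, reg) {
  if (!reg.global) {
    // Without g, exec would return the same match forever.
    throw new TypeError('captureAll expects a regular expression with the g flag');
  }
  const groups = [];
  reg.lastIndex = 0; // always start from the beginning
  let res;
  while ((res = reg.exec(source)) !== null) {
    groups.push(res[1]);
    // An empty match does not move lastIndex; step forward to avoid an endless loop.
    if (res[0] === '') reg.lastIndex++;
  }
  return groups;
}

const parts = captureAll('2017-01-05', /-(\d+)/g);
console.log(parts); // ["01", "05"]
```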
- You are probably no stranger to the str.replace() method; here we look at how it works together with regular expressions.
// The regular expression is matched against the string, and every successful match is replaced with the new string
// str.replace(reg, newStr);
var str = 'a111bc222de';
var res = str.replace(/\d/g, 'Q');
console.log(res)
// "aQQQbcQQQde"
// The second argument of replace can also be a function: str.replace(reg, fn);
var str = '2017-01-06';
str = str.replace(/-\d+/g, function(){
  console.log(arguments);
});
console.log(str);
/** Console output:
 ["-01", 4, "2017-01-06"]
 ["-06", 7, "2017-01-06"]
 "2017undefinedundefined"
 So it seems the contents of groups can be printed as well */
var str = '2017-01-06';
str = str.replace(/-(\d+)/g, function(){
  console.log(arguments);
});
console.log(str);
/**
 ["-01", "01", 4, "2017-01-06"]
 ["-06", "06", 7, "2017-01-06"]
 "2017undefinedundefined"
*/
// The results confirm the guess.
// Also note: to actually replace what the regular expression found, the function must return the replacement value. Without a return value every match is replaced with "undefined", as in the output above.
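The two forms of the second argument can be compared on the same date string. In the sketch below the target formats are made up; $1, $2 and $3 in a replacement string refer to the captured groups, and the function version returns the replacement explicitly.

```javascript
const date = '2017-01-06';

// String replacement: $1, $2, $3 stand for the three groups.
const swapped = date.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1');

// Function replacement: the return value replaces each match.
// Here every "-NN" becomes "/N" without the leading zero.
const compact = date.replace(/-(\d+)/g, function (match, group) {
  return '/' + Number(group);
});

console.log(swapped); // "01/06/2017"
console.log(compact); // "2017/1/6"
```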
Getting the parameters of a URL with the replace method
(function(pro){
  function queryString(){
    var obj = {},
        reg = /([^?&#+]+)=([^?&#+]+)/g;
    // replace is used only to visit every key=value pair; its return value is ignored
    this.replace(reg, function($0, $1, $2){
      obj[$1] = $2;
    });
    return obj;
  }
  pro.queryString = queryString;
}(String.prototype));
// For example, if the URL is https://www.baidu.com?a=1&b=2
// window.location.href.queryString();
// {a: "1", b: "2"}
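The same idea also works as a plain function that does not touch String.prototype. This is a sketch under stated assumptions: parseQuery is a hypothetical name, the pattern is the one used above, and the decodeURIComponent step is an addition that the original method does not perform.

```javascript
// Hypothetical standalone variant of queryString.
function parseQuery(url) {
  const params = {};
  const reg = /([^?&#+]+)=([^?&#+]+)/g;
  // replace is used only to visit each key=value pair.
  url.replace(reg, function (match, key, value) {
    params[decodeURIComponent(key)] = decodeURIComponent(value);
    return match; // leave the string itself unchanged
  });
  return params;
}

const query = parseQuery('https://www.baidu.com?a=1&b=hello%20world#top');
console.log(query); // { a: "1", b: "hello world" }
```

Note that all values come back as strings, so "1" has to be converted before it is used as a number.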
3.1 Zero-width assertions
Zero-width assertions are used to find what comes before or after certain content, without including that content. Like \b, ^ and $, they specify a position that must satisfy a condition (the assertion), which is why they are called zero-width assertions.
They come in handy when we do not want the captured result to include specific content that comes before or after it.
- Zero-width positive lookahead assertion (?=exp)
- Zero-width negative lookahead assertion (?!exp)
- Zero-width positive lookbehind assertion (?<=exp)
- Zero-width negative lookbehind assertion (?<!exp)
Let's take a look at each of them and see what they do.
- (?=exp) means that what follows this position must match the expression exp.
var str = "i'm singing and dancing";
var reg = /\b(\w+(?=ing\b))/g
var res = str.match(reg);
console.log(res)
// ["sing", "danc"]
// Note that we are talking about positions, not characters.
var str = 'abc';
var reg = /a(?=b)c/;
console.log(reg.test(str)); // false
/** This looks as if it should match, but the result is false. In reg, a(?=b) matches the a in 'abc' because b is to its right, so that part is fine. But (?=b) is only a position and consumes no characters, so the next thing to match right after a is c. Matching /a(?=b)c/ against 'abc' is therefore equivalent to matching /ac/ against 'abc', which fails. */
- (?!exp) means that what follows this position must not match the expression exp.
var str = 'nodejs';
var reg = /node(?!js)/;
console.log(reg.test(str)) // false
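Because a lookahead only describes a position, it is well suited to inserting text without consuming anything. A classic illustration is adding thousands separators; addThousands is a hypothetical helper name, and the sketch assumes the input is a string of digits only.

```javascript
// Hypothetical helper: insert "," at every position that is
//  - not at a word boundary (\B), so nothing is added at the very start, and
//  - followed by groups of exactly three digits up to the end: (?=(\d{3})+(?!\d))
function addThousands(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(addThousands('1234567')); // "1,234,567"
console.log(addThousands('1000'));    // "1,000"
console.log(addThousands('123'));     // "123"
```

The pattern matches empty positions only, so the digits themselves are never replaced.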
- (?<=exp) means that this position must be preceded by exp.
var str = '¥998 $888';
var reg = /(?<=\$)\d+/; // $ is escaped so that it means a literal dollar sign
console.log(reg.exec(str)) // ["888", index: 6, input: "¥998 $888"]
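- (?<!exp) means that this position must not be preceded by exp.
var str = '¥998 $888';
var reg = /(?<!\$)\d+/; // digits that do not come right after a dollar sign
console.log(reg.exec(str)) // ["998", index: 1, input: "¥998 $888"]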
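Combined with the g flag, the two lookbehind forms can split a list of prices by currency. The price list below is made up, and lookbehind is a newer addition to JavaScript (ES2018), so older engines may not support it.

```javascript
// Hypothetical input string.
const priceList = 'apple $3, pear ¥5, plum $12';

// Positive lookbehind: digits that come right after "$".
const dollars = priceList.match(/(?<=\$)\d+/g);

// Negative lookbehind: digits preceded by neither "$" nor another digit.
// Excluding digits as well prevents a match from starting at the "2" of "$12".
const others = priceList.match(/(?<![$\d])\d+/g);

console.log(dollars); // ["3", "12"]
console.log(others);  // ["5"]
```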