I've been studying regular expressions recently. In the spirit of "the faintest ink beats the best memory," I'm shamelessly sharing a summary with the community~
The warmth we leave on the keyboard will be carried into the future along with the times.
What is a regular expression
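So what is a regular expression, and when do we need one? In short, a regular expression is a pattern that describes a set of strings. We reach for one whenever we need to search, validate, replace, or extract text that follows a rule.
Many JavaScript features are borrowed from other languages: the syntax from Java, functions from Scheme, prototypal inheritance from Self, and regular expressions from Perl.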
Regular expression principle
Step 1: Compile
After you create a regular expression object (with a regex literal or the RegExp constructor), the browser checks your pattern for errors and converts it into a native code routine that performs the matching. If you assign the regex to a variable and reuse it, you avoid repeating this step.
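Here is a minimal sketch of what reuse looks like. The two helpers below are hypothetical and do the same job. The first compiles its pattern once, as a literal outside the loop. The second builds a new RegExp on every iteration. How much the second one actually costs depends on the engine, since some engines may cache compiled patterns.

```javascript
// Hypothetical helpers: count how many strings contain at least one digit.

// Literal created once, outside the loop: compiled once and reused.
// No g flag, so test() keeps no state between calls.
const hasDigit = /\d/;
function countWithDigitsReused(items) {
  let count = 0;
  for (const item of items) {
    if (hasDigit.test(item)) count++;
  }
  return count;
}

// RegExp constructor called on every iteration: a new object each time.
function countWithDigitsRebuilt(items) {
  let count = 0;
  for (const item of items) {
    // Inside a string literal the backslash must be doubled: '\\d' is \d.
    if (new RegExp('\\d').test(item)) count++;
  }
  return count;
}

console.log(countWithDigitsReused(['a1', 'b', 'c22']));  // 2
console.log(countWithDigitsRebuilt(['a1', 'b', 'c22'])); // 2
```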
Step 2: Set the starting position
When a regular expression is put to use, the first step is to decide where in the target string to start searching. This is the start of the string or, for a regex with the g or y flag, the position given by its lastIndex property. When the engine comes back here from step 4 because an attempt failed, the new start position is one character after where the previous attempt began.
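A small sketch of how lastIndex sets the starting position. The string and patterns are made up for illustration. With the g flag the search begins at lastIndex. Without g (or y), lastIndex is ignored.

```javascript
const text = 'ab12cd34';

// With the g flag, exec starts searching at lastIndex.
const globalDigits = /\d+/g;
globalDigits.lastIndex = 4;               // skip past 'ab12'
const fromFour = globalDigits.exec(text);
console.log(fromFour[0], fromFour.index); // 34 6

// Without g (or y), lastIndex is ignored and the search starts at 0.
const plainDigits = /\d+/;
plainDigits.lastIndex = 4;
const fromStart = plainDigits.exec(text);
console.log(fromStart[0], fromStart.index); // 12 2
```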
Step 3: Match each token of the regular expression
Once the start position is set, the engine walks through the target text and the pattern token by token. When a character fails to match, the engine backtracks to the last point where it made a choice and tries the other possible paths through the pattern.
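To see this backtracking on a tiny input, consider the sketch below. The comments describe the conceptual steps, not engine internals.

```javascript
// \d+ first grabs all five digits, leaving nothing for the literal 5.
// The engine then backtracks: \d+ gives back one digit ('5'),
// and the trailing 5 in the pattern matches it.
const m = /\d+5/.exec('12345');
console.log(m[0]); // 12345

// When no path works, every backtracking option is tried before giving up.
console.log(/\d+5/.test('1234')); // false
```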
Step 4: The match succeeds or fails
If a complete match is found at the current position in the string, the regex reports success. If every possible path through the pattern has been tried without a match, the engine goes back to step 2 and tries again from the next character in the string. If every character in the string (and the position after the last character) has gone through this process without a match, the regex reports failure.
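Steps 2 to 4 can be mimicked in a few lines. Treat this as a simplified model, not as how real engines are built, since they add many optimizations. The hypothetical findFirst helper uses the sticky y flag so that each attempt may only match at one start position. After every failure it moves the start one character forward, and it also tries the position after the last character.

```javascript
// Hypothetical helper: a simplified model of steps 2 to 4.
function findFirst(pattern, str) {
  // Sticky (y) copy: each exec attempt may only match exactly at lastIndex.
  const flags = pattern.flags.replace(/[gy]/g, '') + 'y';
  const sticky = new RegExp(pattern.source, flags);

  // Step 2: choose a start position; step 4: on failure, move one character on.
  // <= str.length so the position after the last character is tried too.
  for (let start = 0; start <= str.length; start++) {
    sticky.lastIndex = start;
    const m = sticky.exec(str);        // step 3: try every path from here
    if (m) return { match: m[0], index: start };
  }
  return null;                         // complete failure
}

console.log(findFirst(/b+/, 'aabbb')); // { match: 'bbb', index: 2 }
console.log(findFirst(/$/, 'abc'));    // { match: '', index: 3 }
console.log(findFirst(/x/, 'abc'));    // null
```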
Getting started
In daily development we constantly search, replace, and extract key information from strings. Regular expressions are built for this, but their notation and complexity make them a bit of a headache. Let's go straight to code. Suppose we want a regex that matches a URL. Something like the following line is likely to appear:
let reg = /(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:[0-9]*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?/
You're in the middle of a project, reading someone else's code, and everything is going smoothly, when suddenly you hit a blob like this. You're afraid to change it in case you break something, but if you leave it alone you can't meet the requirements. All that's left is rm -rf, and then...
Even so, we still have to use them, and we have to admit that in many cases a regex is much simpler than the plain string API.
Now let's get on with learning regular expressions. Before writing any, let's look at what the characters in the code above mean. After all, sharpening the axe doesn't delay cutting the wood. There are plenty of explanations online, so I won't spend space summarizing them here; instead I'll leave a link for you to explore. Click it to open up a new world.
------- divider -------
Common methods of regular expressions
If you're reading past the divider, I assume you now understand most of these characters, or have the link open in another tab (if I guessed right, please give me a like). The common regex methods are regexp.exec, regexp.test, string.match, string.replace, string.search, and string.split.
regexp.exec(string)
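regexp.exec(string) is the most powerful (and slowest) of these methods. If the regex matches, it returns an array: index 0 holds the entire matched text, index 1 the text captured by group 1, index 2 the text captured by group 2, and so on. The array also has an index property giving the position of the match. If there is no match, exec returns null. With the g flag, the search starts at regexp.lastIndex, so calling exec repeatedly walks through every match.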
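A quick sketch of exec, with made-up input strings. It shows the result array, the index property, null on failure, and the usual loop over a g regex.

```javascript
const re = /(\d{4})-(\d{2})/;
const result = re.exec('Due: 2021-11');
console.log(result[0]);    // 2021-11  (whole match)
console.log(result[1]);    // 2021     (group 1)
console.log(result[2]);    // 11       (group 2)
console.log(result.index); // 5        (where the match starts)
console.log(re.exec('no date here')); // null

// With g, repeated calls walk through the string using lastIndex.
const digits = /\d+/g;
const found = [];
let m;
while ((m = digits.exec('a1b22c333')) !== null) {
  found.push(m[0]);
}
console.log(found); // [ '1', '22', '333' ]
```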
regexp.test(string)
regexp.test(string) is the simplest and fastest method: it returns true if the regex matches and false otherwise. Be careful not to use the g flag with this method. Why? We'll find out later.
string.match(regexp)
string.match(regexp): if the regex has no g flag, the result is the same as calling exec. If it has the g flag, the result is an array of all the matched texts (capture groups are not included), or null if there is no match.
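A short sketch comparing match with and without g, on a made-up string. With g you get every full match, but the groups are dropped.

```javascript
const str = 'a1b22c333';

// Without g: same result as exec (first match, groups, index).
const first = str.match(/[a-z](\d+)/);
console.log(first[0], first[1], first.index); // a1 1 0

// With g: every full match, but the groups are dropped.
console.log(str.match(/[a-z](\d+)/g)); // [ 'a1', 'b22', 'c333' ]

// No match gives null, with or without g.
console.log(str.match(/x/g)); // null
```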
string.replace(searchValue, replaceValue)
string.replace(searchValue, replaceValue) searches for and replaces text, returning a new string; the original string is not modified. The first parameter can be a string or a regular expression.
For example, let r = 'replace_result_str'.replace('_', '-') // 'replace-result_str'. A string search value replaces only the first occurrence.
If the first argument is a regular expression with the g flag, every match is replaced. Without g, only the first match is replaced.
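The three cases side by side, reusing the string from the example above:

```javascript
const s = 'replace_result_str';

console.log(s.replace('_', '-'));  // replace-result_str  (string: first occurrence only)
console.log(s.replace(/_/, '-'));  // replace-result_str  (regex without g: first match only)
console.log(s.replace(/_/g, '-')); // replace-result-str  (regex with g: every match)
console.log(s);                    // replace_result_str  (the original is unchanged)
```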
Now for the second argument, which can be a string or a function. If it's a plain string, it is used as the replacement text for the match. But if the string contains $, it's no longer an ordinary string. These $ sequences have special meanings:
- $& : the entire matched text
- $number : the text captured by that group, e.g. $1
- $` : the text before the match
- $' : the text after the match
- $$ : a literal $ character
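Each of these patterns in action, on made-up inputs:

```javascript
// $& inserts the whole match
console.log('price 42'.replace(/\d+/, '($&)'));            // price (42)
// $1, $2 insert captured groups
console.log('John Smith'.replace(/(\w+)\s(\w+)/, '$2 $1')); // Smith John
// $` inserts the text before the match, $' the text after it
console.log('abc-def'.replace(/-/, '[$`]'));               // abc[abc]def
console.log('abc-def'.replace(/-/, "[$']"));               // abc[def]def
// $$ inserts a literal dollar sign
console.log('cost 5'.replace(/\d/, '$$$&'));                // cost $5
```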
If it's a function, the string it returns is used as the replacement text. The function's first argument is the entire matched text, the second is the text captured by group 1, the third the text captured by group 2, and so on. After the groups come the offset of the match and the whole string.
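A sketch of the callback's arguments, with made-up inputs. The offset lands in a different position depending on how many groups the regex has.

```javascript
// Without groups the callback gets (match, offset, wholeString).
const marked = 'a-b-c'.replace(/-/g, (match, offset) => `[${offset}]`);
console.log(marked); // a[1]b[3]c

// With groups, the captured texts come between the match and the offset.
const swapped = '3x4'.replace(/(\d)x(\d)/, (match, p1, p2, offset) =>
  `${p2}x${p1} at ${offset}`);
console.log(swapped); // 4x3 at 0
```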
string.search(regexp)
string.search(regexp) is similar to indexOf, except that it takes a regular expression as its argument. It returns the index of the first match, or -1 if there is no match. This method ignores the g flag.
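A quick comparison with indexOf, on a made-up string:

```javascript
const text = 'hello world';
console.log(text.search(/o/));  // 4   (index of the first match)
console.log(text.search(/o/g)); // 4   (g is ignored: still the first match)
console.log(text.search(/z/));  // -1  (no match)
console.log(text.indexOf('o')); // 4   (indexOf searches for a plain string)
```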
string.split(separator, limit)
string.split(separator, limit) splits a string into an array. The first parameter can also be a regular expression, but that's rarely used in development, so I won't go into detail here.
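For reference, a few one-liners showing a regex as the separator. Note that capturing groups in the separator are kept in the result.

```javascript
// Split on any run of commas, semicolons, or whitespace.
console.log('a, b;c  d'.split(/[,;\s]+/)); // [ 'a', 'b', 'c', 'd' ]
// Capturing groups in the separator are kept in the result.
console.log('1-2+3'.split(/([-+])/));      // [ '1', '-', '2', '+', '3' ]
// limit caps the number of items returned.
console.log('a,b,c'.split(/,/, 2));        // [ 'a', 'b' ]
```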
Now that we know these methods, let's practice on a few simple problems. Enough talk, let's get to it.
Verify that the user entered the correct zip code
let zipCodeReg = /^\d{6}$/
Before writing a regex, we need to know what we're matching and what rules it follows. Zip codes (here, Chinese postal codes) are six digits long. The slashes at each end delimit the regex the way quotes delimit a string. \d matches any digit and {6} means exactly six of them; {0,6} would mean zero to six, and {6,} six or more. ^ anchors the match to the start of the input and $ to the end, so the whole input must be exactly six digits. That's all we need to validate a zip code. Let's test it:
let guangzhou = '510000'; // use strings: a number would lose any leading 0
zipCodeReg.test(guangzhou) // true
let shenzhen = '518000';
zipCodeReg.test(shenzhen) // true
let foshan = '528000';
zipCodeReg.test(foshan) // true
Verify that the date format is correct
Take a date such as 2018-12-14: four digits, a hyphen, two digits, a hyphen, two digits. A first attempt at the regex might be:
let dateReg = /^\d{4}-\d{2}-\d{2}$/;
At first glance this looks fine, and it does match dates. But dates follow rules, and \d matches any digit without distinction. It would also accept 2021-99-99, even though no month or day starts with 9. So we need to refine it:
let dateReg = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-9]|3[0-1])$/
Breaking this down: \d{4} matches the four-digit year. Next comes the month. Months starting with 0 run from 01 to 09 and months starting with 1 run from 10 to 12, which gives (0[1-9]|1[0-2]). Then the day. Days starting with 0 run from 01 to 09, with 1 from 10 to 19, with 2 from 20 to 29, and with 3 only the remaining two, 30 and 31. Writing out each case gives (0[1-9]|1[0-9]|2[0-9]|3[0-1]).
let currentDate = '2021-11-05'
let currentDateNext = '2021-11-06'
let currentDatePre = '2021-11-04'
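console.log(dateReg.exec(currentDate))     // [ '2021-11-05', '11', '05', index: 0, ... ]
console.log(dateReg.exec(currentDateNext)) // [ '2021-11-06', '11', '06', index: 0, ... ]
console.log(dateReg.exec(currentDatePre))  // [ '2021-11-04', '11', '04', index: 0, ... ]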
If you look at the output and wonder why the month and day appear separately in the result, it's because we wrapped the month and the day in parentheses in the expression. Each parenthesized group is captured and shown at its own index in the result array.
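One caveat: the refined pattern only checks the shape of each field, so it still accepts impossible dates such as 2021-02-31. Below is a minimal sketch of an extra calendar check using the built-in Date in UTC. isRealDate is a hypothetical helper, not part of the code above, and it reads the month and day from the capture groups we just discussed.

```javascript
const dateReg = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-9]|3[0-1])$/;

// Hypothetical helper: the regex checks the format, Date checks the calendar.
function isRealDate(str) {
  const m = dateReg.exec(str);
  if (m === null) return false;          // wrong format
  const year = Number(str.slice(0, 4));  // note: Date.UTC maps years 0-99 to 1900-1999
  const month = Number(m[1]);            // group 1: month
  const day = Number(m[2]);              // group 2: day
  const d = new Date(Date.UTC(year, month - 1, day));
  // Date.UTC rolls invalid days over (Feb 31 becomes Mar 3), so compare back.
  return d.getUTCMonth() === month - 1 && d.getUTCDate() === day;
}

console.log(isRealDate('2021-11-05')); // true
console.log(isRealDate('2021-02-31')); // false
console.log(isRealDate('2021-13-01')); // false
```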
Replace the middle four digits of a phone number with asterisks
Let's try replace next. The task: replace the middle four digits of a phone number with *.
Without a regex, we might slice the string and concatenate the pieces. With a regex, there's no need for that:
let phoneNum = '12345678900'
let phoneNumReg = /(\d{4})(\d{4})(\d{3})/
let r = phoneNum.replace(phoneNumReg, '$1****$3') // 1234****900
You may be asking: what are these $1 and $3? Recall from earlier that each parenthesized part of an expression is shown at its own index, starting from 1, while index 0 holds the entire matched text. Let's check it with exec to make it clear.
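That exec check on the phone-number regex looks like this. Each index lines up with the $number used in the replacement string.

```javascript
const phoneNumReg = /(\d{4})(\d{4})(\d{3})/;
const parts = phoneNumReg.exec('12345678900');
console.log(parts[0]); // 12345678900  (index 0: the whole match)
console.log(parts[1]); // 1234         (group 1, used as $1)
console.log(parts[2]); // 5678         (group 2, used as $2)
console.log(parts[3]); // 900          (group 3, used as $3)
```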
Following the description of the replace method above: when the replacement string contains $number, it inserts the text captured by that group. Here $1 is the text matched by the first \d{4}, $2 the text matched by the second \d{4}, and $3 the text matched by \d{3}.
Knowing this, the code above is easy to understand: the replacement $1****$3 means group 1, then four asterisks, then group 3. If you don't believe me, replace $3 with $2 and see what you get. Isn't doing this in one line of code much easier?
If that usage still isn't clear, try a simpler one: convert a date in YYYY/MM/DD format to DD/MM/YYYY. Today is 2021/11/05, which flipped becomes 05/11/2021.
let dateStr = '2021/10/11'
let dateReg = /(\d{4})\/(\d{2})\/(\d{2})/
let r = dateStr.replace(dateReg, '$3/$2/$1') // 11/10/2021
We simply grab the groups and splice them back together in a new order. There's no need to convert to an array, reverse it, and join it. That's the beauty of $ in replace.
Converting a string to camelCase
A simple example: convert a hyphenated string to camelCase, e.g. abc-def-gh to abcDefGh.
Here the splicing trick from before doesn't help. Some of you might say you could convert it to an array... (ten thousand words omitted) and then join it back together. But replace has another use that saves those ten thousand words:
let str = 'abc-def-gh'
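let strReg = /-(\w)/g // a hyphen followed by one word character, captured as group 1
let r = str.replace(strReg, function ($, $1) {
  // $ is the whole match (e.g. '-d'), $1 is the captured letter (e.g. 'd')
  return $1.toUpperCase()
}) // abcDefGh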
Amazing, right? We just take $1, which is passed into the function, process it, and return it, and the result is exactly what we want. That about covers everyday usage. Beyond this, practice makes perfect: just as in math class, knowing the formula doesn't mean you know how to use it until you've practiced.
Regular expression branching
A regular expression branch consists of one or more sequences separated by the vertical bar character |. The branch matches if any of its sequences matches, and the sequences are tried in order, from left to right. For example:
'masang'.match(/ma|mas/) matches only ma, never mas, because ma comes before mas and already matches.
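Swapping the order of the alternatives shows that the first sequence that succeeds wins:

```javascript
console.log('masang'.match(/ma|mas/)[0]); // ma   (ma is tried first and succeeds)
console.log('masang'.match(/mas|ma/)[0]); // mas  (put the longer alternative first)
```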
Regular expression escape
In a regex, the backslash is the escape character. In a literal such as /pattern/g, the slashes delimit the pattern, so to match a literal / character you must escape it with a backslash: \/.
The backslash also introduces backreferences. Suppose you need to match the repeated string abcabcabc. You might write let reg = /(\w)+/. But if you want to capture the abc in the middle separately, you have to match each copy on its own, e.g. let reg = /(abc)(abc)(abc)/, and then process the groups.
Here the repeated string is only abc, so it's quick to type. But what if it were much longer than three characters? Would I have to copy it every time? As a good copy-paste engineer, I can't allow that, and this is where the backslash helps. The code above only needs let reg = /(abc)(\1)(\1)/ or let reg = /(abc)(\1)(\2)/. \1 is a backreference to the text captured by group 1, so that same text is matched again. You can also point to different groups, so instead of copying the same code over and over, you just change the reference.
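A short sketch of backreferences in practice. One detail worth seeing in code: a backreference repeats the exact text the group captured, not the group's pattern.

```javascript
const triple = /(abc)(\1)(\2)/;
const m = triple.exec('abcabcabc');
console.log(m.slice(1)); // [ 'abc', 'abc', 'abc' ]

// A backreference matches the same *text* the group captured,
// not the group's pattern again: (\w)\1 finds a doubled letter.
console.log('hello'.match(/(\w)\1/)[0]); // ll
console.log(/(\w)\1/.test('abc'));       // false
```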
Regular expression quantifiers
In the first exercise we saw that curly braces set how many times something is matched, for example 0 to 6 times with {0,6} or 6 or more times with {6,}. This quantifier suffix determines how many times the preceding factor should match. ? is equivalent to {0,1}, * to {0,}, and + to {1,}.
A quantifier on its own is greedy: it matches as many repetitions as possible, up to its upper limit. If the quantifier is followed by an extra ?, the match is non-greedy (lazy): it matches only as many repetitions as necessary. In general it's best to stick with greedy matching. As I write this, I can't help thinking of two kinds of people: those who try to do as much as possible, and those who do only as much as necessary. I can't say which is better; it varies from person to person.
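Greedy versus lazy on the same inputs, which are made up for illustration:

```javascript
const html = '<b>bold</b>';
console.log(html.match(/<.+>/)[0]);  // <b>bold</b>  (greedy: as much as possible)
console.log(html.match(/<.+?>/)[0]); // <b>          (lazy: as little as necessary)

console.log('aaaa'.match(/a{2,3}/)[0]);  // aaa  (greedy: up to the upper limit)
console.log('aaaa'.match(/a{2,3}?/)[0]); // aa   (lazy: just the minimum)
```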
Use the g flag with regexp.test with caution
If the expression has the g flag, things are different: test starts searching at regexp.lastIndex. If a match succeeds, lastIndex is set to the position just after the end of the match; if the match fails, lastIndex is reset to 0. Its initial value is, of course, 0. This is a bit abstract, so let's look at some code.
let str = '1234';
let reg = /^\d+$/g;
console.log(reg.test(str)); //true
If I run console.log(reg.test(str)) again, what's the result? It's false. And if I run console.log(reg.test(str)) a third time? It's true.
let str = '1234';
let reg = /^\d+$/g;
console.log(reg.test(str)); //true
console.log(reg.test(str)); //false
console.log(reg.test(str)); //true
console.log(reg.test(str)); //false
What? When the first call returns true, regexp.lastIndex is no longer 0: it is 4, the end of the match. The second call starts searching at index 4, where ^\d+$ cannot match, so it returns false and resets lastIndex to 0. That's why the third call succeeds again. Let's print out the culprit and see its ugly face.
let str = '1234';
let reg = /^\d+$/g;
console.log(reg.test(str)); //true
console.log(reg.lastIndex) // 4
console.log(reg.test(str)); //false
console.log(reg.lastIndex) // 0
console.log(reg.test(str)); //true
console.log(reg.lastIndex) // 4
Now that we know the cause, the fix is easy: reset lastIndex to 0 with reg.lastIndex = 0 before calling test again.
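Both ways out, sketched on the same '1234' example. The first resets lastIndex as just described. The second simply drops the g flag, which the test section earlier already advised.

```javascript
const str = '1234';
const reg = /^\d+$/g;

// Option 1: reset lastIndex before each call.
reg.lastIndex = 0;
console.log(reg.test(str)); // true
reg.lastIndex = 0;
console.log(reg.test(str)); // true

// Option 2: drop the g flag; test() then ignores lastIndex entirely.
const noG = /^\d+$/;
console.log(noG.test(str), noG.test(str)); // true true
```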
Regular expressions can cause performance problems
Avoid having a single regular expression do too much work. Complex search problems that require conditional logic are easier to solve, and often more efficient, when split into two or more regular expressions, each searching only within the result of the previous match. Monster regexes that do everything in one pattern are hard to maintain and prone to backtracking problems. In short: keep them short and clear.
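As a small illustration of splitting one do-everything pattern into simpler checks, here is a hypothetical password rule written both ways: at least 8 characters, at least one letter, at least one digit. The rule itself is made up for the example. For ordinary single-line ASCII input, both versions agree.

```javascript
// One pattern doing everything, using lookaheads.
const allInOne = /^(?=.*[a-zA-Z])(?=.*\d).{8,}$/;

// The same hypothetical rule as three small, readable checks.
function isValidPassword(pw) {
  return pw.length >= 8 && /[a-zA-Z]/.test(pw) && /\d/.test(pw);
}

console.log(allInOne.test('abc12345'), isValidPassword('abc12345')); // true true
console.log(allInOne.test('abcdefgh'), isValidPassword('abcdefgh')); // false false
```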
Intensive string manipulation and carelessly written regular expressions can be major performance bottlenecks. Backtracking is a fundamental part of regex matching and is a large part of what makes regular expressions so powerful, but it is also a common reason they run slowly. Backtracking is expensive, so to write efficient regular expressions we need to understand how it works and how to minimize it.
Runaway backtracking occurs when a regex that should find a match quickly slows to a crawl, or even hangs the browser, because of how it tries to match certain strings. Ways to avoid it include, but are not limited to: making adjacent tokens mutually exclusive, avoiding nested quantifiers that can match the same part of a string in more than one way, and eliminating unnecessary backtracking by exploiting the atomic nature of lookahead.
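Here is a sketch of these techniques on small inputs. /^(a+)+$/ is the classic nested-quantifier case. On a long run of a followed by a character that fails, the number of ways to split the a's explodes, so that pattern only appears in a comment here rather than being run. The last pattern uses the (?=(...))\1 idiom. A lookahead is atomic in JavaScript, so once it captures, the engine cannot backtrack into it, and \1 then consumes exactly what was captured.

```javascript
// 1. Make adjacent tokens mutually exclusive: [^"]* cannot run past a quote,
//    so there is less to backtrack than with .*
const quoted = '"a" and "b"';
console.log(quoted.match(/"[^"]*"/)[0]); // "a"
console.log(quoted.match(/".*"/)[0]);    // "a" and "b"

// 2. Avoid nested quantifiers over the same text.
//    Risky: /^(a+)+$/ can take exponential time on inputs like 'aaaa...a!'.
const flat = /^a+$/;                           // accepts the same strings, no nesting
console.log(flat.test('a'.repeat(30) + '!')); // false, and fails quickly

// 3. Emulate an atomic group with a lookahead plus a backreference.
const atomic = /^(?=(a+))\1b$/;
console.log(atomic.test('aaab'));  // true
console.log(atomic.test('aaaa!')); // false (no backtracking into the lookahead)
```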
Final words
This article doesn't cover everything, and if anything is wrong, corrections are welcome. Thank you all~~~ There's a lot I couldn't list here, for example greedy mode, non-greedy mode, lookahead, the backtracking mentioned at the end of this article, runaway backtracking, and so on~ Summarizing those would take too long (I'm feeling lazy), so I may cover them in a follow-up~~
May EDG win tomorrow