Let’s say we’re looking for a number with 15 or 18 digits. From what we have learned so far, quantifiers express the number of occurrences and the pipe symbol expresses a choice between alternatives, so you can quickly write \d{15}|\d{18}. But if you test it, you’ll find that this regex doesn’t do the job well: on an 18-digit number it matches only the first 15 digits.
The reason is that in most regex implementations, alternation is tried from left to right, and the first branch that matches wins. If you reverse the order of the branches and write \d{18}|\d{15}, it works as expected.
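The effect of branch order can be checked with a minimal sketch using Python’s re module. The sample number below is made up for illustration.

```python
import re

# Hypothetical 18-digit sample, used only for illustration.
id_18 = "123456789012345678"

# The left branch is tried first, so only the first 15 digits are matched.
short_first = re.search(r"\d{15}|\d{18}", id_18).group()

# With the longer branch first, all 18 digits are matched.
long_first = re.search(r"\d{18}|\d{15}", id_18).group()

print(short_first)  # 123456789012345
print(long_first)   # 123456789012345678
```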
Alternatively, you can think of the problem as 15 digits that may or may not be followed by three more digits, which can be written as \d{15}(\d{3})? instead.
Here we put \d{3} in parentheses. Without them, the ? would apply to the quantifier {3} rather than to the three digits as a whole, and most implementations then treat {3}? as a lazy quantifier that still requires exactly three digits. The function of parentheses in a regex is grouping: when a part made up of several metacharacters should be treated as a whole, parentheses mark that whole. This is an important function of parentheses. They also serve another purpose: reuse.
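As a rough illustration in Python (the sample numbers are made up), the following sketch compares the pattern with and without the parentheses. In Python’s re module, \d{3}? without parentheses is read as a lazy “exactly three” quantifier, so the three extra digits stay mandatory.

```python
import re

# Hypothetical sample numbers, used only for illustration.
id_15 = "123456789012345"
id_18 = "123456789012345678"

with_group = re.compile(r"\d{15}(\d{3})?")    # the last three digits are optional
without_group = re.compile(r"\d{15}\d{3}?")   # {3}? is a lazy {3}: still three digits

print(with_group.fullmatch(id_15) is not None)     # True
print(with_group.fullmatch(id_18) is not None)     # True
print(without_group.fullmatch(id_15) is not None)  # False
print(without_group.fullmatch(id_18) is not None)  # True
```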
Grouping and numbering
Parentheses group part of a regex, and the text matched by the subexpression inside them is saved as a subgroup. What are the rules for numbering these groups? They are simple: the number of a group is the position of its opening parenthesis, counted from the left. This may be a little abstract, so let’s look at an example. Here is a timestamp: 2020-05-10 20:23:05. Suppose we want to use a regex to extract the date and the time from it.
We can write the regex as (\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}), enclosing the date and the time in parentheses. There are two groups in this regex: the date is group 1 and the time is group 2.
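A minimal Python sketch of this extraction follows. The pattern is one plausible way to write the date and time groups described above, not necessarily the exact regex from the original figure.

```python
import re

timestamp = "2020-05-10 20:23:05"

# Group 1 wraps the date, group 2 wraps the time (assumed pattern).
m = re.search(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})", timestamp)

date_part = m.group(1)
time_part = m.group(2)

print(date_part)  # 2020-05-10
print(time_part)  # 20:23:05
```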
Save the subgroups
What’s inside parentheses is saved as a subgroup. In some cases, however, you only want the parentheses to treat some parts as a whole and will not refer to them again, so there is no need to keep the subgroup. In that case we can write ?: right after the opening parenthesis, as in (?:...), so that the subgroup is not saved.
When parentheses occur in a regex, the engine assumes the subexpression may be referenced later and saves what it matched. Not saving the subgroup avoids that work, which can improve performance, and having fewer numbered subgroups makes miscounting them less likely.
So what exactly does not saving a subgroup mean? The parentheses are used only for grouping: they treat a part as a “single element”, no number is assigned to it, and that part cannot be referred to later.
Behavior | Regex | Example |
---|---|---|
Save the subgroup | (xx) | \d{15}(\d{3})? |
Do not save the subgroup | (?:xx) | \d{15}(?:\d{3})? |
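The difference between the two forms can be observed with a short Python sketch (the sample number is made up): both patterns match the same text, but only the first one stores a subgroup.

```python
import re

# Hypothetical 18-digit sample, used only for illustration.
id_18 = "123456789012345678"

saved = re.fullmatch(r"\d{15}(\d{3})?", id_18)      # capturing group
unsaved = re.fullmatch(r"\d{15}(?:\d{3})?", id_18)  # non-capturing group

print(saved.groups())    # ('678',)
print(unsaved.groups())  # ()
```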
Nested parentheses
So far we have covered subgroups and numbering, but some cases are a little more complicated. With nested parentheses, how do we work out the number of each group? Don’t worry, it is simple: count the opening parentheses from left to right. The group that starts at the N-th opening parenthesis is group N.
In the Aliyun Simple Log Service, we can use a regex to match the beginning of a log line. Suppose the time format is 2020-05-10 20:23:05.
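Suppose the regex wraps the date, the time, and each of their fields in parentheses, as in ((\d{4})-(\d{2})-(\d{2})) ((\d{2}):(\d{2}):(\d{2})). The date is group 1 and the time is group 5; year, month, and day are groups 2, 3, and 4; hour, minute, and second are groups 6, 7, and 8.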
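The numbering can be verified with a small Python sketch. The nested pattern below is an assumed reconstruction that produces the group numbers listed above.

```python
import re

timestamp = "2020-05-10 20:23:05"

# Assumed pattern: groups are numbered by the position of their opening parenthesis.
pattern = r"((\d{4})-(\d{2})-(\d{2})) ((\d{2}):(\d{2}):(\d{2}))"
m = re.match(pattern, timestamp)

# Print each group number next to the text it captured.
for number, value in enumerate(m.groups(), start=1):
    print(number, value)
# 1 2020-05-10
# 2 2020
# 3 05
# 4 10
# 5 20:23:05
# 6 20
# 7 23
# 8 05
```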
Grouping reference
Once we know the number of a group, in most cases we can refer to it with “backslash + number”, i.e. \number, such as \1. In JavaScript, the replacement string refers to a group with $number instead, such as $1.
Group references and lookup
Having covered the basics of subgroups and references, let’s now look at how to use group references in a regex search. For example, suppose we want to find a word that appears twice in a row. How do we express “the word that appeared before appears again”? We can use \w+ to represent a word, wrap it in parentheses, and refer back to it with \1, which gives the regex (\w+) \1 or (\w+)\s\1.
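Here is a minimal Python sketch of this search. The \b word boundaries are an addition not present in the expression above; they keep the pattern from matching inside longer words, as in “this is”.

```python
import re

text = "the little cat cat is in the hat"

# (\w+) captures a word; \1 requires the same text to appear again.
# The \b anchors are an added safeguard against partial-word matches.
m = re.search(r"\b(\w+)\s\1\b", text)

print(m.group())   # cat cat
print(m.group(1))  # cat
```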
Use of grouped references in substitution
As in searches, we can use group references in the replacement to assemble the result we want. Using the date and time example again, we can easily rewrite the date in a format like May 10, 2020.
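A sketch of such a replacement in Python follows. Group references alone can reorder the captured digits; producing a month name such as May needs a little extra logic, so the second half uses a replacement function. The MONTHS list and the to_long_date helper are hypothetical names introduced for this example.

```python
import re

# Hypothetical lookup table for month names.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

stamp = "2020-05-10 20:23:05"
pattern = r"(\d{4})-(\d{2})-(\d{2})"

# Group references in the replacement string: reorder to month/day/year.
reordered = re.sub(pattern, r"\2/\3/\1", stamp)

def to_long_date(m):
    """Hypothetical helper: build 'May 10, 2020' from the captured groups."""
    year, month, day = m.group(1), int(m.group(2)), int(m.group(3))
    return f"{MONTHS[month - 1]} {day}, {year}"

long_form = re.sub(pattern, to_long_date, stamp)

print(reordered)  # 05/10/2020 20:23:05
print(long_form)  # May 10, 2020 20:23:05
```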
Use in a text editor
Using Sublime Text 3 as an example, I’ll show you how to use regex find and replace. Sublime Text 3 is a small but powerful cross-platform editor. Although it is paid software, you can download and install it yourself and try it out. As you get comfortable with the editor, you’ll find that you can use it for a variety of tasks without having to write code.
Problems in practice
Finally, let’s do a little exercise. In the English sentence below, some words appear more than once in a row.
the little cat cat is in the hat hat hat, we like it.
Here cat and hat are repeated consecutively. After processing, the result should be:
the little cat is in the hat, we like it.
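One possible solution, sketched in Python: capture a word, match one or more repeats of it with a non-capturing group, and replace the whole run with \1. This is only one way to solve the exercise, and the \b anchors are an added safeguard against partial-word matches.

```python
import re

passage = "the little cat cat is in the hat hat hat, we like it."

# \b(\w+) captures a word; (?:\s+\1\b)+ matches one or more repeats of that word.
# Replacing the whole run with \1 keeps a single copy.
deduped = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", passage)

print(deduped)  # the little cat is in the hat, we like it.
```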