Introduction to the

Going back to the last section, we outlined some of the syntax and concepts of regular expressions. Starting with this section, I will focus on how regular expressions work and backtracking. I will also insert some simple examples to help you understand them better.

The working principle of

compile

The NFA expression engine can create an instance of a regular expression using the direct quantity of an expression or the constructor of a RegExp. The JS interpreter can check the syntax for errors during parsing. This is then translated into a native code routine that can be executed. If you loop a regular expression literal in previous versions of ES5, it will reference the same expression. The ES5 specification explicitly states that regular expression literals must be created in line with constructors, creating a new instance each time. The sample is as follows

let reg = null
for (leti = 0; i < 10; I++) {reg = /\d/g // reg instances before ES5 will be reused, meaning that the lastIndex does not start at 0 reg.test('${i}`)}Copy the code

The lastIndex attribute is mentioned here, which represents the position of the current regular expression in the matching target.

back

The traceback of the NFA engine, which processes each subexpression in turn and, when faced with a choice between two or more success possibilities, selects one first and records the current re position in the alternate branch.

Need to make a choice, in general, including the condition of the quantifiers and multiple structure, which is divided into priority matching quantification, ignore the first match, gave priority matching quantification (temporary does not support JavaScript), as long as one of the branches is successful, the entire expression is successful, if a branch fails, will spare branch try back into the record just now, Until all branches fail to trace back, the entire expression fails to trace back. In addition to the alternate branch of the regular expression, the position of the matching string also needs to be recorded for easy traceback.

There is an important principle for choosing which state to save when backtracking occurs

1. For priority matching quantifiers, the engine attempts, and for ignoring priority quantifiers, the engine skips the attempt. 2. The nearest storage branch is LIFO, a stack structureCopy the code

Here I will add the definition of ignoring priority quantifiers and priority quantifiers for your understanding

Priority matching quantifiers: +,? , *, {0,} ignore the matching quantifier :(*? And???? , +,?Copy the code

Match quantifiers first

With the basic backtracking principle in mind, let’s look at a simple example

let reg = /\w+\d/
let str = 'abcdefg123'
reg.exec(str)Copy the code

\w+ is the preferred match quantifier, so the engine chooses to try all possibilities, recording the alternate state of each match up to the end of the string

After matching to the end of the string, it found that there was still a subexpression that had not been verified, so it started backtracking to the last point of its own record. At this time, string 3 was handed over, and the engine transferred control to the next seat, expression \d, and found that the subexpression succeeded, and the whole expression matched successfully



Note that backtracking requires not only recalculation of the regular expression and text, but also constant maintenance of the matching text within the parenthesized (grouped) subexpression, as in another example

let reg = /.*([0-9][0-9])/
let str = 'CA95472**USA'
reg.exec(str) //72Copy the code

. * used to represent a group of any character, because the dot can match any character, and an asterisk any number, is also a priority matching quantifiers, pay attention to this expression, we joined a group of parentheses, the corresponding regular $1 attribute, plus the, towards the end of the string matching quantifiers started back, handing over control, each time back, in addition to move back in the states, It also maintains the values within the group, affecting the matching efficiency

Ignore priority matching quantifiers

Ignoring the concept of priority quantifiers we’ve already mentioned above, let’s go straight to a practical example to help us understand

let reg = /<b>(.*)<\/b>/
let str = ' I'm bold  I'm bold 'Reg. Exec (STR) // I am a bold font </b><b> I am a bold fontCopy the code

The above expression is intended to intercept the text inside the B tag, but the principle of intercepting other tags of the same kind is the same as the backtracking we described above. How do we intercept the first paragraph of text, which requires an ignore priority quantifier

letreg = /<b>(.*?) <\/b>/let str = ' I'm bold  I'm bold 'Reg. Exec (STR) // I'm in boldCopy the code

Here we can see that the string has successfully intercepted the result we want. How does ignoring the priority quantifier work



In the figure above, we can see that the second step is bypassed. This expression, went straight to the back of the < / b > expression, the reason is that ignore the first match, the default rules is not match, even through the back door of back late, it is also a full-on, every step will store the standby state, and in the next match, preferred to ignore yourself, and then continue to match fails, continue to back, to make the whole expression of failure, The engine must try all possibilities.

Is there a way for engines to fail quickly? The answer is definitely yes, and that is the cure grouping and possessive quantifiers. Cure grouping is not well supported, so we will cover it in the next section.

summary

This section focuses on some common cases of backtracking, and there are some examples of branching judgment that are not covered, which are actually the same as quantifiers. If you are interested, you can go down and explore them. In the next section, we will focus on sequential loop, solidified grouping, and the use of sequential loop to simulate solidified grouping