Regular expression – Syntax

A regular expression describes a pattern for matching strings. It can be used to check whether a string contains a certain substring, to replace matched substrings, or to extract substrings that meet a certain condition.

Special characters

Special characters are characters that have special meanings, such as the * in runoo*b, which means that the preceding character may occur any number of times. If you want to look for the * symbol itself in a string, you need to escape the * by prefacing it with a \: runo\*ob matches the string runo*ob.

Many metacharacters require special treatment when trying to match them. To match these special characters, the characters must first be “escaped”, that is, preceded by the backslash character \. The following table lists special characters in regular expressions:

Special character   Description
$ Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before ‘\n’ or ‘\r’. To match the $ character itself, use \$.
( ) Marks the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
* Matches the preceding subexpression zero or more times. To match the * character, use \*.
+ Matches the preceding subexpression one or more times. To match the + character, use \+.
. Matches any single character except the newline character \n. To match the . character itself, use \. (a backslash followed by a period).
[ Marks the beginning of a bracket expression. To match [, use \[.
? Matches the preceding subexpression zero or one time, or marks a qualifier as non-greedy. To match the ? character, use \?.
\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, ‘n’ matches the character ‘n’. ‘\n’ matches a newline character. The sequence ‘\\’ matches “\”, while ‘\(’ matches “(”.
^ Matches the start of the input string. When it is the first character inside a bracket expression, it instead negates the set, so the characters listed in the brackets are not accepted. To match the ^ character itself, use \^.
{ Marks the beginning of a qualifier expression. To match {, use \{.
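
The escaping rule above can be tried out with a short JavaScript sketch. The variable names and the `escapeRegExp` helper are illustrative assumptions, not part of the original text; the helper simply puts a backslash in front of every special character so that arbitrary text can be matched literally.

```javascript
// Unescaped: * is a qualifier that applies to the preceding "o".
const unescaped = /runo*b/;
// Escaped: \* matches a literal asterisk.
const escaped = /runo\*ob/;

console.log(unescaped.test("runb"));    // true: zero "o" characters is allowed
console.log(unescaped.test("runooob")); // true: three "o" characters
console.log(escaped.test("runo*ob"));   // true: the * is matched literally
console.log(escaped.test("runoob"));    // false: there is no * in the string

// Hypothetical helper: prefixes each special character with a backslash.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const literal = new RegExp(escapeRegExp("1+1=2 (really?)"));
console.log(literal.test("1+1=2 (really?)")); // true
```

Besides the characters in the table, the helper also escapes |, ], and }, which is harmless and keeps the escaped text safe in more positions.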

Qualifiers

Qualifiers are used to specify how many times a given component of a regular expression must occur to satisfy a match. The qualifiers are *, +, ?, {n}, {n,}, and {n,m}.

Qualifiers for regular expressions are:

Character   Description
* Matches the preceding subexpression zero or more times. For example, ‘zo*’ matches “z” and “zoo”. * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, ‘zo+’ matches “zo” and “zoo”, but not “z”. + is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. For example, ‘do(es)?’ matches “do”, “does”, and the “do” in “doxy”. ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, ‘o{2}’ does not match the ‘o’ in “Bob”, but does match the two o’s in “food”.
{n,} n is a non-negative integer. Matches at least n times. For example, ‘o{2,}’ does not match the ‘o’ in “Bob”, but matches all the o’s in “foooood”. ‘o{1,}’ is equivalent to ‘o+’. ‘o{0,}’ is equivalent to ‘o*’.
{n,m} Both m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, ‘o{1,3}’ matches the first three o’s in “fooooood”. ‘o{0,1}’ is equivalent to ‘o?’. Note that there can be no spaces between the comma and the numbers.
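
The examples in the table can be checked with a small JavaScript sketch. The `allMatches` helper is a hypothetical convenience function introduced here for illustration; it collects every match of a pattern in a string.

```javascript
// Hypothetical helper: returns every match of `pattern` in `text`.
// matchAll requires the g flag, so the pattern is compiled with it.
function allMatches(pattern, text) {
  return [...text.matchAll(new RegExp(pattern, "g"))].map((m) => m[0]);
}

console.log(allMatches("zo*", "z zo zoo"));         // "z", "zo", "zoo"
console.log(allMatches("zo+", "z zo zoo"));         // "zo", "zoo" (not the bare "z")
console.log(allMatches("do(es)?", "do does doxy")); // "do", "does", "do"
console.log(allMatches("o{2}", "Bob food"));        // "oo" (from "food" only)

// Five o's in a row: {2,} takes all of them as a single match.
const fiveOs = "f" + "o".repeat(5) + "d";
console.log(allMatches("o{2,}", fiveOs));           // one match of five o's

// {1,3} stops after three o's; match() without g returns the first match.
console.log("fooooood".match(/o{1,3}/)[0]);         // "ooo"
```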

The following regular expression matches a positive integer: [1-9] ensures that the first digit is not 0, and [0-9]* stands for any number of further digits:

/[1-9][0-9]*/

Notice that the qualifier comes after the range expression. Therefore, it applies to the whole range expression, which in this case specifies only the digits from 0 to 9 (both 0 and 9 included).

The + qualifier is not used here, because a digit in the second or a later position is not required. The ? character is not used either, because ? would limit the integer to at most two digits.
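
A brief JavaScript sketch of how this pattern behaves. The variable names are illustrative, and the anchored variant using ^ and $ is an added assumption for validating a whole string; it is not part of the original expression.

```javascript
const positiveInteger = /[1-9][0-9]*/;

console.log("Chapter 120".match(positiveInteger)[0]); // "120"
console.log("007".match(positiveInteger)[0]);         // "7": the leading zeros are skipped
console.log(positiveInteger.test("0"));               // false: the first digit cannot be 0

// Assumption: anchor the pattern with ^ and $ to validate the entire string.
const wholePositiveInteger = /^[1-9][0-9]*$/;
console.log(wholePositiveInteger.test("120")); // true
console.log(wholePositiveInteger.test("007")); // false
```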

If you want to match a one- or two-digit number from 0 to 99, you can use the following expression, which specifies at least one digit and at most two digits.

/[0-9]{1,2}/

The disadvantage of the above expression is that it matches at most two digits: it accepts 0, 00, 01, and 10 as well as 99, and for a longer number it still matches only the first two digits.

The expression for matching positive integers from 1 to 99 is as follows:
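
The drawback can be seen in a quick JavaScript sketch (the variable names are illustrative):

```javascript
const upToTwoDigits = /[0-9]{1,2}/;

// Every one of these is accepted, including the zero-padded forms.
const samples = ["0", "00", "01", "10", "99"];
console.log(samples.every((s) => upToTwoDigits.test(s))); // true

// On a longer number, only the first two digits are matched.
console.log("12345".match(upToTwoDigits)[0]); // "12"
```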

/[1-9][0-9]?/

or

/[1-9][0-9]{0,1}/
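
As a rough check, the two forms can be compared in JavaScript. The variable names and sample strings are assumptions chosen for illustration.

```javascript
const withQuestionMark = /[1-9][0-9]?/;
const withBraces = /[1-9][0-9]{0,1}/;

for (const input of ["7", "42", "05", "100"]) {
  const a = input.match(withQuestionMark)[0];
  const b = input.match(withBraces)[0];
  console.log(input, a, a === b); // both forms always return the same match
}
// "7"   -> "7"
// "42"  -> "42"
// "05"  -> "5"  (the leading 0 is skipped)
// "100" -> "10" (at most two digits are taken)

console.log(withQuestionMark.test("00")); // false: no digit from 1 to 9 is present
```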

The * and + qualifiers are greedy because they match as much text as possible. Adding a ? after them makes the match non-greedy, or minimal.
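
The difference is easiest to see side by side. The following JavaScript sketch uses a made-up sample sentence with two quoted words; the names are illustrative.

```javascript
const sentence = 'say "a" and "b"';

// Greedy: .* runs to the last quotation mark it can reach.
console.log(sentence.match(/".*"/)[0]);  // "a" and "b"

// Non-greedy: .*? stops at the first quotation mark that completes a match.
console.log(sentence.match(/".*?"/)[0]); // "a"

// The same applies to +: o+ takes every o, while o+? takes just one.
console.log("fooooood".match(/o+/)[0].length);  // all of the o's
console.log("fooooood".match(/o+?/)[0]);        // "o"
```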

For example, you might search an HTML document to find content within an H1 tag. The HTML code is as follows:

<h1>123-number</h1>