Foreword: regular expressions are a must-know interview topic, so take a look. Additions are welcome, and if you spot a problem, please point it out! Reference blog: www.cnblogs.com/myitnews/p/…
1. What is a regular expression?
A regular expression is a text pattern made up of ordinary characters (such as the letters a to z) and special characters (called "metacharacters"). It uses a single string to describe, and match, a series of strings that conform to a syntactic rule. Uses of regular expressions:
1. Extract substrings from a string based on pattern matching;
2. Replace text;
3. Test whether a pattern occurs in a string.
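The three uses listed above can be sketched as follows. This is a minimal illustration that assumes Python's built-in `re` module (the article does not name a language); the sample text and variable names are made up for the example.

```python
import re

text = "Order 1042 shipped on 2024-03-15"

# 1. Extract a substring that fits a pattern (here, a date written as YYYY-MM-DD).
date_match = re.search(r"\d{4}-\d{2}-\d{2}", text)
extracted = date_match.group(0) if date_match else None

# 2. Replace text: every run of digits becomes "#".
masked = re.sub(r"\d+", "#", text)

# 3. Test whether a pattern occurs anywhere in the string.
has_order_number = re.search(r"Order \d+", text) is not None

print(extracted)         # 2024-03-15
print(masked)            # Order # shipped on #-#-#
print(has_order_number)  # True
```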
2. Basic knowledge of regular expressions
Construction: a regular expression is built the way an arithmetic expression is, by using metacharacters and operators to combine small expressions into larger ones. Components of a regular expression: a single character, a set of characters, a range of characters, a choice between characters, or any combination of these.
2.1 Ordinary Characters
[ABC]: matches any one of the characters inside [...];
[^ABC]: matches any character except those inside [...];
[a-z]: a range; matches any lowercase letter;
.: matches any single character other than the line terminators (\n, \r), similar to [^\n\r];
[\s\S]: matches any character at all. \s matches every whitespace character, including newlines; \S matches every non-whitespace character;
\w: matches letters, digits, and the underscore, equivalent to [A-Za-z0-9_].
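A short sketch of these character classes, assuming Python's `re` module. The sample string is invented for the example. Note that Python's `.` excludes only `\n` by default, and that `\w` on `str` patterns also matches non-ASCII word characters unless `re.ASCII` is passed.

```python
import re

sample = "Cab_9 z\n!"

in_set = re.findall(r"[abc]", sample)       # characters that are a, b, or c
not_in_set = re.findall(r"[^abc]", sample)  # everything else, newline included
lowercase = re.findall(r"[a-z]", sample)    # lowercase letters only
dot = re.findall(r".", sample)              # every character except the newline
everything = re.findall(r"[\s\S]", sample)  # every character, newline included
word_chars = re.findall(r"\w", sample)      # letters, digits, underscore

print(in_set)      # ['a', 'b']
print(lowercase)   # ['a', 'b', 'z']
print(word_chars)  # ['C', 'a', 'b', '_', '9', 'z']
```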
2.2 Non-printing Characters
Escape sequences for non-printing characters:
1. \f: matches a form feed, equivalent to \x0c and \cL;
2. \n: matches a newline, equivalent to \x0a and \cJ;
3. \r: matches a carriage return, equivalent to \x0d and \cM;
4. \s: matches any whitespace character, including spaces, tabs, form feeds, and so on, equivalent to [ \f\n\r\t\v]. Unicode-aware regular expressions also match the full-width space;
5. \S: matches any non-whitespace character, equivalent to [^ \f\n\r\t\v];
6. \t: matches a tab, equivalent to \x09 and \cI.
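The equivalences above can be checked with a small sketch, again assuming Python's `re` module. Only the hexadecimal forms are compared here; to my knowledge the `\cX` control notation is not accepted by Python's `re`, so it is left out.

```python
import re

# Each short escape names the same character as its hexadecimal form.
form_feed_ok = re.fullmatch(r"\f", "\x0c") is not None
newline_ok = re.fullmatch(r"\n", "\x0a") is not None
carriage_return_ok = re.fullmatch(r"\r", "\x0d") is not None
tab_ok = re.fullmatch(r"\t", "\x09") is not None

line = "a \tb\r\nc"
whitespace = re.findall(r"\s", line)      # the space, tab, CR, and LF
non_whitespace = re.findall(r"\S", line)  # the three letters
fields = re.split(r"\s+", line)           # split on runs of whitespace

print(whitespace)      # [' ', '\t', '\r', '\n']
print(non_whitespace)  # ['a', 'b', 'c']
print(fields)          # ['a', 'b', 'c']
```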
2.3 Special Characters
Some characters have a special meaning. To match one of them literally, "escape" it first, that is, put a backslash \ in front of it.
1. $: matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match $ itself, use \$;
2. (): marks the start and end of a subexpression, which can be retrieved for later use. To match these characters, use \( and \);
3. *: matches the preceding subexpression zero or more times. To match *, use \*;
4. +: matches the preceding subexpression one or more times. To match +, use \+;
5. .: matches any single character except the newline \n. To match ., use \.;
6. [: marks the beginning of a bracket expression. To match [, use \[;
7. ?: matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match ?, use \?;
8. \: marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', '\n' matches a newline, the sequence '\\' matches '\', and '\(' matches '(';
9. ^: matches the start of the input string. Exception: inside a bracket expression it negates the set, meaning the characters in the brackets are not accepted. To match ^ itself, use \^;
10. {: marks the beginning of a quantifier expression. To match {, use \{;
11. |: indicates a choice between two alternatives. To match |, use \|.
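To illustrate why escaping matters, here is a small sketch assuming Python's `re` module. The sample strings are invented. `re.escape` is a standard-library helper that adds the backslashes for arbitrary literal text.

```python
import re

price_text = "Total: $5.00 (approx.)"

# Unescaped, "." matches any character, so the pattern also accepts "5x00".
loose = re.search(r"5.00", "5x00") is not None

# Escaped, each special character is matched literally.
literal_price = re.search(r"\$5\.00", price_text).group(0)
literal_paren = re.search(r"\(approx\.\)", price_text).group(0)

# re.escape inserts the backslashes automatically.
escaped = re.escape("a+b?")
auto = re.search(escaped, "is a+b? valid") is not None

print(loose)          # True
print(literal_price)  # $5.00
print(literal_paren)  # (approx.)
print(escaped)        # a\+b\?
```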
2.4 Qualifiers (Quantifiers)
Definition: a qualifier, more commonly called a quantifier, specifies how many times a given component of a regular expression must occur for the match to succeed. The available forms are *, +, ?, {n}, {n,}, and {n,m}.
1. *: matches the preceding subexpression zero or more times. To match *, use \*;
2. +: matches the preceding subexpression one or more times. To match +, use \+;
3. ?: matches the preceding subexpression zero or one time, or marks a quantifier as non-greedy. To match ?, use \?;
4. {n}: n is a non-negative integer; matches exactly n times. For example, 'o{2}' cannot match the o in 'Tomy', but it matches the two o's in 'mood';
5. {n,}: n is a non-negative integer; matches at least n times. For example, 'o{2,}' does not match the o in 'Tomy' and matches all the o's in 'moooood'. 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*';
6. {n,m}: m and n are non-negative integers with n <= m; matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in 'moooodddoood'. 'o{0,1}' is equivalent to 'o?'.
Note: there must be no space between the comma and the numbers.
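The quantifier examples above, plus the greedy versus non-greedy distinction, can be sketched like this, assuming Python's `re` module. The HTML-like string is invented for the example. The last line shows how Python in particular treats a space inside the braces; other engines may behave differently.

```python
import re

exactly_two_missing = re.search(r"o{2}", "Tomy")               # only one "o"
exactly_two = re.search(r"o{2}", "mood").group(0)              # exactly two
at_least_two = re.search(r"o{2,}", "moooood").group(0)         # as many as possible
one_to_three = re.search(r"o{1,3}", "moooodddoood").group(0)   # at most three

# Greedy: .+ takes as much as it can. Non-greedy: .+? takes as little as it can.
greedy = re.search(r"<.+>", "<b>bold</b>").group(0)
lazy = re.search(r"<.+?>", "<b>bold</b>").group(0)

# With a space after the comma, Python reads the braces as literal text.
spaced = re.search(r"o{1, 3}", "ooo")

print(exactly_two_missing)  # None
print(at_least_two)         # ooooo
print(one_to_three)         # ooo
print(greedy)               # <b>bold</b>
print(lazy)                 # <b>
```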
2.5 Anchors (Locators)
An anchor fixes a regular expression to the beginning or end of a line, or to a word boundary. ^ and $ refer to the beginning and end of the string, \b describes the boundary at the start or end of a word, and \B describes a non-word boundary.
1. ^: matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after \n or \r;
2. $: matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before \n or \r;
3. \b: matches a word boundary, that is, the position between a word and a space;
4. \B: matches a non-word boundary.
Note: quantifiers cannot be combined with anchors. Because there cannot be more than one position immediately before or after a newline or word boundary, expressions such as ^* are not allowed. To match text at the beginning of a line, put ^ at the start of the regular expression; to match text at the end of a line, put $ at the end of the regular expression.
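A brief sketch of the anchors, assuming Python's `re` module, where the multiline behavior is switched on with the `re.MULTILINE` flag. The sample text is invented for the example.

```python
import re

text = "cat concat\ncatalog"

# Without the multiline flag, ^ matches only at the start of the string.
start_only = re.findall(r"^cat", text)
# With it, ^ also matches right after each newline.
each_line = re.findall(r"^cat", text, flags=re.MULTILINE)
# $ with the multiline flag matches before each newline and at the very end.
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \b requires a word boundary; \B requires that there is none.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]
word_suffix = [m.start() for m in re.finditer(r"\Bcat\b", text)]

print(start_only)   # ['cat']
print(each_line)    # ['cat', 'cat']
print(line_ends)    # ['concat', 'catalog']
print(whole_word)   # [0]
print(word_suffix)  # [7], the "cat" at the end of "concat"
```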
2.6 Alternation
Alternation: enclose all the options in parentheses () and separate adjacent options with |. () denotes a capture group, and the value matched by each group is saved. The saved values can be referred to by number n, where n is the number of the n-th capture group. The drawback of () is that the related matches are cached; this can be avoided by placing ?: before the first option. ?: is one of the non-capturing elements, along with ?= and ?!. ?= is a positive lookahead: it matches the search string at any point where the pattern inside the parentheses begins to match. ?! is a negative lookahead: it matches the search string at any point where the pattern inside the parentheses does not match.
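The difference between a capturing group and a non-capturing group can be sketched as follows, assuming Python's `re` module. The file name and the CSS-like string are invented for the example.

```python
import re

# Capturing group: the chosen alternative is saved as group 1.
capturing = re.search(r"\.(jpg|png|gif)$", "photo.png")
extension = capturing.group(1)

# Non-capturing group: same alternation, but nothing is saved.
non_capturing = re.search(r"\.(?:jpg|png|gif)$", "photo.png")
saved_groups = non_capturing.re.groups  # number of capture groups in the pattern

# Only the capturing group shows up in the findall result.
numbers = re.findall(r"(\d+)(?:px|em)", "10px 2em")

print(extension)     # png
print(saved_groups)  # 0
print(numbers)       # ['10', '2']
```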
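The differences between ?=, ?<=, ?!, and ?<! are listed below:
1. exp1(?=exp2): finds exp1 that is followed by exp2;
2. (?<=exp2)exp1: finds exp1 that is preceded by exp2;
3. exp1(?!exp2): finds exp1 that is not followed by exp2;
4. (?<!exp2)exp1: finds exp1 that is not preceded by exp2.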
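The four lookaround forms can be compared side by side in a small sketch, assuming Python's `re` module (which requires the lookbehind pattern to have a fixed width). The helper `starts` and the sample sentence are hypothetical, made up for this example; the helper returns the index where each match begins.

```python
import re

text = "apple pie, apple tart, cherry pie"

def starts(pattern: str) -> list[int]:
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

followed = starts(r"apple(?= pie)")       # "apple" followed by " pie"
not_followed = starts(r"apple(?! pie)")   # "apple" not followed by " pie"
preceded = starts(r"(?<=apple )pie")      # "pie" preceded by "apple "
not_preceded = starts(r"(?<!apple )pie")  # "pie" not preceded by "apple "

print(followed)      # [0]
print(not_followed)  # [11]
print(preceded)      # [6]
print(not_preceded)  # [30]
```

In every case the lookaround text itself is not part of the match; only exp1 is consumed.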
2.7 Backreferences
Putting parentheses () around a regular expression pattern, or around part of a pattern, causes the related match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it appears in the pattern, from left to right. The buffers are numbered from 1 up to 99, and each one can be accessed with \n, where n is a one- or two-digit decimal number identifying a particular buffer. The non-capturing metacharacters ?:, ?=, and ?! can be used to override capturing, so that the related match is not saved. A typical application of backreferences is finding two identical adjacent words in a text.
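Finding two identical adjacent words can be sketched like this, assuming Python's `re` module. The sentence is invented for the example; `\1` refers back to whatever the first group captured.

```python
import re

text = "Is is the cost of of gasoline going up up?"

# ([a-z]+) captures a word; \1 then requires the same word again after a space.
pattern = re.compile(r"\b([a-z]+) \1\b", re.IGNORECASE)

repeated = [m.group(0) for m in pattern.finditer(text)]
# In the replacement string, \1 inserts the captured word once.
collapsed = pattern.sub(r"\1", text)

print(repeated)   # ['Is is', 'of of', 'up up']
print(collapsed)  # Is the cost of gasoline going up?
```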
2.8 Modifiers
Flags, also called modifiers, specify additional matching strategies. They are not written inside the regular expression itself. Common modifiers:
i: ignore case. Makes the match case-insensitive, so there is no difference between A and a;
g: global. Finds all matches instead of stopping at the first one;
m: multiline. Makes the anchors ^ and $ match the beginning and end of each line, not just the beginning and end of the entire string;
s: dot matches newline. By default, the dot . matches any character except the newline \n; with the s modifier, . also matches \n.
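A rough Python counterpart to these modifiers, as a sketch: Python's `re` module offers `re.IGNORECASE`, `re.MULTILINE`, and `re.DOTALL` for i, m, and s. It has no g flag; the choice between `re.search` (first match) and `re.findall` (all matches) plays that role. The sample text is invented.

```python
import re

text = "Apple\napple pie\nAPPLE"

case_sensitive = re.findall(r"apple", text)
ignore_case = re.findall(r"apple", text, flags=re.IGNORECASE)          # like i, all matches
first_only = re.search(r"apple", text, flags=re.IGNORECASE).group(0)   # first match only

# Like m: ^ matches at the start of every line.
line_starts = re.findall(r"^apple", text, flags=re.IGNORECASE | re.MULTILINE)

# Like s: the dot is allowed to match the newline.
dot_default = re.search(r"Apple.apple", text)
dot_all = re.search(r"Apple.apple", text, flags=re.DOTALL)

print(case_sensitive)  # ['apple']
print(ignore_case)     # ['Apple', 'apple', 'APPLE']
print(first_only)      # Apple
print(line_starts)     # ['Apple', 'apple', 'APPLE']
print(dot_default)     # None
```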
2.9 Metacharacters
1. \: marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", '\n' matches a newline, the sequence '\\' matches "\", and '\(' matches "(";
2. ^: matches the start of the input string;
3. $: matches the end of the input string;
4. *: matches the preceding subexpression zero or more times;
5. +: matches the preceding subexpression one or more times;
6. ...: many more are omitted here. For the full list, see https://www.cnblogs.com/myitnews/p/13818615.html, a blog post I personally find very good.
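2.10 Operator Precedence
Regular expressions are evaluated from left to right and follow an order of precedence, much like arithmetic expressions. Operators of equal precedence are evaluated from left to right, and operators of different precedence are evaluated from highest to lowest. From highest to lowest:
1. \ (escape);
2. (), (?:), (?=), [] (parentheses and brackets);
3. *, +, ?, {n}, {n,}, {n,m} (quantifiers);
4. ^, $, \metacharacter, any character (anchors and sequences);
5. | (alternation).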
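Two consequences of precedence are easy to check in a short sketch, assuming Python's `re` module: a quantifier binds more tightly than a sequence of characters, and alternation binds most loosely of all. The sample words are invented.

```python
import re

# The quantifier applies only to "b", not to the sequence "ab".
quantifier_on_b = re.fullmatch(r"ab+", "abbb") is not None
sequence_repeat = re.fullmatch(r"ab+", "abab") is not None
# Parentheses raise the precedence of the sequence.
grouped_repeat = re.fullmatch(r"(ab)+", "abab") is not None

# Alternation is applied last: "m|food" means "m" or "food".
alt_m = re.fullmatch(r"m|food", "m") is not None
alt_mood = re.fullmatch(r"m|food", "mood") is not None
# With parentheses, the choice is limited to the first letter.
grouped_mood = re.fullmatch(r"(m|f)ood", "mood") is not None

print(quantifier_on_b, sequence_repeat, grouped_repeat)  # True False True
print(alt_m, alt_mood, grouped_mood)                     # True False True
```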
The above is a summary version for everyone's reference.