Blog: bugstack.cn

Accumulate, share, and grow, so that both you and others can gain something! 😄

One, foreword

Programming always pays off in practice!

Regular expressions, also called rule expressions (Regular Expression, often abbreviated to regex, regexp, or RE in code), are a concept from computer science. They are commonly used to search for and replace text that conforms to a pattern (rule).

Regex engines fall into two main categories: DFA and NFA. Both have a long history (more than 20 years now) and have spawned many variants. POSIX was introduced to curb unnecessary variation. As a result, mainstream regex engines are usually grouped into three categories: DFA, traditional NFA, and POSIX NFA.

Regex is an interesting technology, but in day-to-day programming it is easy to forget how its symbols are used. This article summarizes them so that it can serve as a reference whenever you need to work with regular expressions.

Two, the rules

1. Common symbols

  • x The character x
  • \\ The backslash character
  • \0n The character with octal value 0n (0 <= n <= 7)
  • \0nn The character with octal value 0nn (0 <= n <= 7)
  • \0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
  • \xhh The character with hexadecimal value 0xhh
  • \uhhhh The character with hexadecimal value 0xhhhh
  • \t The tab character ('\u0009')
  • \n The newline (line feed) character ('\u000A')
  • \r The carriage-return character ('\u000D')
  • \f The form-feed character ('\u000C')
  • \a The alert (bell) character ('\u0007')
  • \e The escape character ('\u001B')

2. Character classes

  • [abc] a, b, or c (simple class)
  • [^abc] Any character except a, b, or c (negation)
  • [a-zA-Z] a through z or A through Z, inclusive (range)
  • [a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
  • [a-z&&[def]] d, e, or f (intersection)
  • [a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
  • [a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
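
The union, intersection, and subtraction forms are easier to remember once you have seen them accept and reject a few characters. The sketch below is only an illustration: the class name CharClassDemo and the helper accepts are made up for this example and are not part of any library.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Illustrative patterns built from the constructs listed above.
    static final Pattern UNION = Pattern.compile("[a-d[m-p]]");
    static final Pattern INTERSECTION = Pattern.compile("[a-z&&[def]]");
    static final Pattern SUBTRACTION = Pattern.compile("[a-z&&[^bc]]");

    // Hypothetical helper: true when the whole input is one character accepted by the class.
    static boolean accepts(Pattern charClass, String input) {
        return charClass.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(UNION, "n"));        // true: n is in m-p
        System.out.println(accepts(UNION, "f"));        // false: f is in neither range
        System.out.println(accepts(INTERSECTION, "e")); // true: e is in both a-z and def
        System.out.println(accepts(SUBTRACTION, "b"));  // false: b is excluded
        System.out.println(accepts(SUBTRACTION, "d"));  // true: d is kept
    }
}
```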

3. Predefined characters

  • . Any character (may or may not match line terminators)
  • \d A digit: [0-9]
  • \D A non-digit: [^0-9]
  • \s A whitespace character: [ \t\n\x0B\f\r]
  • \S A non-whitespace character: [^\s]
  • \w A word character: [a-zA-Z_0-9]
  • \W A non-word character: [^\w]
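
As a rough illustration of where the predefined classes show up in everyday code, the sketch below splits on whitespace, strips non-digits, and checks for word characters. The class PredefinedClassDemo and its method names are hypothetical; only String.split, String.replaceAll, and String.matches come from the JDK.

```java
class PredefinedClassDemo {
    // Split on runs of whitespace (\s+), ignoring leading and trailing blanks.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // Remove every non-digit (\D).
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // True when the input consists only of word characters (\w).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", words("  a  b\tc\n"))); // a|b|c
        System.out.println(digitsOnly("Tel: 010-1234"));             // 0101234
        System.out.println(isWord("a_8"));                           // true
        System.out.println(isWord("a-8"));                           // false
    }
}
```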

4. POSIX character classes (US-ASCII only)

  • \p{Lower} A lowercase alphabetic character: [a-z]
  • \p{Upper} An uppercase alphabetic character: [A-Z]
  • \p{ASCII} All ASCII: [\x00-\x7F]
  • \p{Alpha} An alphabetic character: [\p{Lower}\p{Upper}]
  • \p{Digit} A decimal digit: [0-9]
  • \p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
  • \p{Punct} Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
  • \p{Graph} A visible character: [\p{Alnum}\p{Punct}]
  • \p{Print} A printable character: [\p{Graph}\x20]
  • \p{Blank} A space or a tab: [ \t]
  • \p{Cntrl} A control character: [\x00-\x1F\x7F]
  • \p{XDigit} A hexadecimal digit: [0-9a-fA-F]
  • \p{Space} A whitespace character: [ \t\n\x0B\f\r]
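
A few of the POSIX classes in use, as a minimal sketch. The class PosixClassDemo and its helpers are invented for illustration. The last call shows that, without extra flags, these classes cover US-ASCII only, so an accented letter is not matched by \p{Alpha}.

```java
class PosixClassDemo {
    // True when the input is made of hexadecimal digits only.
    static boolean isHex(String s) {
        return s.matches("\\p{XDigit}+");
    }

    // Remove ASCII punctuation.
    static String stripPunctuation(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    // True when the input is made of ASCII letters only.
    static boolean isAsciiLetters(String s) {
        return s.matches("\\p{Alpha}+");
    }

    public static void main(String[] args) {
        System.out.println(isHex("CAFE01"));                   // true
        System.out.println(isHex("xyz"));                      // false
        System.out.println(stripPunctuation("Hello, world!")); // Hello world
        System.out.println(isAsciiLetters("abc"));             // true
        System.out.println(isAsciiLetters("\u00e9"));          // false: e with acute accent is not ASCII
    }
}
```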

5. java.lang.Character classes

  • \p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase()
  • \p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
  • \p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace()
  • \p{javaMirrored} Equivalent to java.lang.Character.isMirrored()

6. Classes for Unicode blocks and categories

  • \p{InGreek} A character in the Greek block (simple block)
  • \p{Lu} An uppercase letter (simple category)
  • \p{Sc} A currency symbol
  • \P{InGreek} Any character except one in the Greek block (negation)
  • [\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
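
The block and category classes work on Unicode text, not just ASCII. The following sketch uses hypothetical helper names (UnicodeClassDemo, isGreek, containsCurrency, dropNonUppercaseLetters) and Unicode escapes so that it does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

class UnicodeClassDemo {
    // True when every character belongs to the Greek block.
    static boolean isGreek(String s) {
        return s.matches("\\p{InGreek}+");
    }

    // True when the input contains at least one currency symbol (category Sc).
    static boolean containsCurrency(String s) {
        return Pattern.compile("\\p{Sc}").matcher(s).find();
    }

    // Remove every letter that is not an uppercase letter.
    static String dropNonUppercaseLetters(String s) {
        return s.replaceAll("[\\p{L}&&[^\\p{Lu}]]", "");
    }

    public static void main(String[] args) {
        System.out.println(isGreek("\u03b1\u03b2\u03b3"));             // true: alpha, beta, gamma
        System.out.println(isGreek("abc"));                            // false
        System.out.println(containsCurrency("price: \u20ac5"));        // true: euro sign
        System.out.println(containsCurrency("price: 5"));              // false
        System.out.println(dropNonUppercaseLetters("Hello World 42")); // H W 42
    }
}
```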

7. Boundary matchers

  • ^ The beginning of a line
  • $ The end of a line
  • \b A word boundary
  • \B A non-word boundary
  • \A The beginning of the input
  • \G The end of the previous match
  • \Z The end of the input but for the final terminator, if any
  • \z The end of the input
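
The boundary matchers consume no characters, which makes them hard to picture. One way to see the differences is to count matches, as in this sketch. The class BoundaryDemo and its count helper are illustrative names only; Pattern.MULTILINE is the standard flag that lets ^ and $ match at every line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: number of matches of regex (compiled with flags) in input.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // \b: only the standalone words match, not the tail of "concat".
        System.out.println(count("\\bcat\\b", 0, "cat concat cat."));          // 2
        // Without MULTILINE, ^ and $ refer to the whole input.
        System.out.println(count("^\\w+$", 0, "one\ntwo"));                    // 0
        // With MULTILINE, they match at each line.
        System.out.println(count("^\\w+$", Pattern.MULTILINE, "one\ntwo"));    // 2
        // \Z tolerates a final line terminator, \z does not.
        System.out.println(count("line\\Z", 0, "line\n"));                     // 1
        System.out.println(count("line\\z", 0, "line\n"));                     // 0
    }
}
```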

8. Greedy quantifiers

  • X? X, once or not at all
  • X* X, zero or more times
  • X+ X, one or more times
  • X{n} X, exactly n times
  • X{n,} X, at least n times
  • X{n,m} X, at least n times, but no more than m times

9. Reluctant quantifiers

  • X?? X, once or not at all
  • X*? X, zero or more times
  • X+? X, one or more times
  • X{n}? X, exactly n times
  • X{n,}? X, at least n times
  • X{n,m}? X, at least n times, but no more than m times

10. Possessive quantifiers

  • X?+ X, once or not at all
  • X*+ X, zero or more times
  • X++ X, one or more times
  • X{n}+ X, exactly n times
  • X{n,}+ X, at least n times
  • X{n,m}+ X, at least n times, but no more than m times
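
The three quantifier families read the same in the tables, so a side-by-side run may help. The sketch below applies .*foo in greedy, reluctant, and possessive form to one input. The class QuantifierDemo and the helper firstMatch are made-up names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: takes everything, then backtracks just enough to find the last foo.
        System.out.println(firstMatch(".*foo", input));  // xfooxxxxxxfoo
        // Reluctant: takes as little as possible, so it stops at the first foo.
        System.out.println(firstMatch(".*?foo", input)); // xfoo
        // Possessive: takes everything and never gives it back, so foo cannot match.
        System.out.println(firstMatch(".*+foo", input)); // null
    }
}
```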

11. Logical operators

  • XY X followed by Y
  • X|Y Either X or Y
  • (X) X, as a capturing group

12. Back references

  • \n Whatever the nth capturing group matched

13. Quotation

  • \ Nothing, but quotes the following character
  • \Q Nothing, but quotes all characters until \E
  • \E Nothing, but ends quoting started by \Q
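
Quoting matters whenever the text you want to match contains regex metacharacters. The sketch below contrasts an unquoted pattern with \Q...\E and with Pattern.quote, which is the JDK method for quoting an arbitrary string. The class QuotingDemo and the helper equalsLiterally are illustrative names.

```java
import java.util.regex.Pattern;

class QuotingDemo {
    // Hypothetical helper: treats the whole of literal as plain text via Pattern.quote.
    static boolean equalsLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unquoted, 1+ means "one or more 1", so the literal text does not match itself.
        System.out.println("1+1=2".matches("1+1=2"));         // false
        System.out.println("11=2".matches("1+1=2"));          // true
        // A single backslash quotes the next character.
        System.out.println("1+1=2".matches("1\\+1=2"));       // true
        // \Q ... \E quotes everything in between.
        System.out.println("1+1=2".matches("\\Q1+1=2\\E"));   // true
        System.out.println(equalsLiterally("1+1=2", "1+1=2")); // true
    }
}
```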

14. Special constructs (non-capturing)

  • (?:X) X, as a non-capturing group
  • (?idmsux-idmsux) Nothing, but turns match flags i d m s u x on - off
  • (?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on - off
  • (?=X) X, via zero-width positive lookahead
  • (?!X) X, via zero-width negative lookahead
  • (?<=X) X, via zero-width positive lookbehind
  • (?<!X) X, via zero-width negative lookbehind
  • (?>X) X, as an independent, non-capturing group
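
A short sketch of a few of these constructs, under the assumption that you want the matched text without the surrounding context. The class SpecialConstructDemo and the helper firstMatch are invented for the example. Matcher.groupCount is used to show that (?:X) does not create a capturing group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecialConstructDemo {
    // Hypothetical helper: text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Number of capturing groups declared by the pattern.
    static int groups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Positive lookbehind: digits preceded by a dollar sign, without the sign itself.
        System.out.println(firstMatch("(?<=\\$)\\d+", "cost: $42, qty: 7"));   // 42
        // Positive lookahead: digits followed by " items", without that suffix.
        System.out.println(firstMatch("\\d+(?= items)", "3 boxes, 12 items")); // 12
        // (?:...) groups without capturing, so only (c) is counted.
        System.out.println(groups("(?:ab)+(c)"));                              // 1
        System.out.println(groups("(ab)+(c)"));                                // 2
        // Flags can be limited to a non-capturing group.
        System.out.println("JaVa".matches("(?i:java)"));                       // true
    }
}
```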

Three, cases

1. Character matching

"a".matches(".")
  • Result: true
  • Description: . matches any single character

"a".matches("[abc]")
  • Result: true
  • Description: matches any one of a, b, or c; a character class matches exactly one character by default

"a".matches("[^abc]")
  • Result: false
  • Description: matches any character except a, b, or c (negation)

"A".matches("[a-zA-Z]")
  • Result: true
  • Description: a to z or A to Z, inclusive (range)

"A".matches("[a-z]|[A-Z]")
  • Result: true
  • Description: a to z or A to Z, inclusive, written as an alternation of two ranges

"A".matches("[a-z(A-Z)]")
  • Result: true
  • Description: same letter ranges as above. Note that parentheses inside a character class are literal characters rather than a capturing group, so this class also accepts ( and )

"R".matches("[A-Z&&[RFG]]")
  • Result: true
  • Description: matches the intersection of A-Z and the set R, F, G
"a_8".matches("\\w{3}")
  • Result: true
  • Description: \w is a word character, equivalent to [a-zA-Z_0-9]; {3} matches it exactly three times

"\\".matches("\\\\")
  • Result: true
  • Description: in a Java string literal, "\\\\" is the regex \\, which matches a single backslash

"hello sir".matches("h.*")
  • Result: true
  • Description: . is any character, and * matches it zero or more times

"hello sir".matches(".*ir$")
  • Result: true
  • Description: .* matches any characters, and ir$ requires the line to end with ir

"hello sir".matches("^h[a-z]{1,3}o\\b.*")
  • Result: true
  • Description: ^h matches h at the beginning, and [a-z]{1,3}o matches one to three lowercase letters followed by the letter o. \b does not consume any character; it only matches a position, here the word boundary right after o

"hellosir".matches("^h[a-z]{1,3}o\\b.*")
  • Result: false
  • Description: o is followed by s, which is a letter rather than a space, so there is no word boundary after o and \b fails

" \n".matches("^[\\s&&[^\\n]]*\\n$")
  • Result: true
  • Description: ^[\\s&&[^\\n]]* matches leading whitespace that is not a newline, and \\n$ requires the input to end with a newline

System.out.println("java".matches("(?i)JAVA"));
  • Result: true
  • Description: (?i) is an embedded flag that captures nothing and turns on case-insensitive matching

2. Pattern matching

2.1 Verifying matches

Pattern p = Pattern.compile("[a-z]{3,}");
Matcher m = p.matcher("fgha");
System.out.println(m.matches()); // true: the input consists of three or more lowercase letters
  • Result: true
  • Description: Pattern works together with Matcher. The Matcher class provides group support and multiple-match support for regular expressions. Pattern used on its own only offers Pattern.matches(String regex, CharSequence input), the simplest one-off match.
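
To make the division of labour concrete, the sketch below contrasts the one-off Pattern.matches call with a Pattern that is compiled once and reused for many inputs. The class PatternReuseDemo, the constant LOWERCASE_WORD, and the method countMatching are names assumed for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once; each input gets its own Matcher.
    private static final Pattern LOWERCASE_WORD = Pattern.compile("[a-z]{3,}");

    // Counts the inputs that consist of three or more lowercase letters.
    static long countMatching(List<String> inputs) {
        return inputs.stream()
                .filter(s -> LOWERCASE_WORD.matcher(s).matches())
                .count();
    }

    public static void main(String[] args) {
        // One-off check: compiles the regex on every call.
        System.out.println(Pattern.matches("[a-z]{3,}", "fgha"));                     // true
        // Reused pattern: "fgha" and "hello" qualify, "ab" and "ABC" do not.
        System.out.println(countMatching(Arrays.asList("fgha", "ab", "hello", "ABC"))); // 2
    }
}
```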

2.2 Matching Function

Pattern p = Pattern.compile("\\d{3,5}");
Matcher m = p.matcher("123-4536-89789-000");
System.out.println(m.matches());                // the whole input must match
m.reset();                                      // reset the matcher so that find() starts from the beginning
System.out.println(m.find());
System.out.println(m.start() + "-" + m.end());  // start and end of the match (only valid after a successful find())
System.out.println(m.find());
System.out.println(m.start() + "-" + m.end());
System.out.println(m.find());
System.out.println(m.start() + "-" + m.end());
System.out.println(m.find());
System.out.println(m.lookingAt());              // always matches from the beginning of the input

The test results

false
true
0-3
true
4-8
true
9-14
true
true
  • m.matches() attempts to match the entire input
  • m.reset() resets the matcher so that the next find() starts from the beginning of the input
  • m.find() looks for the next match
  • m.start() returns the start index of the current match
  • m.end() returns the index just past the end of the current match
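
In practice the find/start/end calls are usually wrapped in a loop rather than written out one by one. A minimal sketch, assuming you want every match together with its position; the class FindAllDemo and the method findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Collects every match as "start-end:text".
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [0-3:123, 4-8:4536, 9-14:89789, 15-18:000]
        System.out.println(findAll("\\d{3,5}", "123-4536-89789-000"));
    }
}
```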

2.3 Matching Common Replacement

Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("java_Java_jAva_jAVa_IloveJava");
System.out.println(m.replaceAll("JAVA"));
  • Result: JAVA_JAVA_JAVA_JAVA_IloveJAVA
  • Description: every occurrence of java, in any letter case, is replaced with uppercase JAVA

2.4 Matching Logical Replacement

Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("java_Java_jAva_jAVa_IloveJava fdasfas");
StringBuffer sb = new StringBuffer();
int i = 0;
while (m.find()) {
    i++;
    if (i % 2 == 0) {
        m.appendReplacement(sb, "java");
    } else {
        m.appendReplacement(sb, "JAVA");
    }
}
m.appendTail(sb);
System.out.println(sb);
  • Result: JAVA_java_JAVA_java_IloveJAVA fdasfas
  • Description: the program logic i % 2 alternates the replacement: odd-numbered matches become JAVA and even-numbered matches become java
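
The same appendReplacement/appendTail loop can also compute the replacement from the matched text itself. The sketch below capitalizes the first letter of each word; the class AppendReplacementDemo and the method capitalizeWords are illustrative names. Matcher.quoteReplacement is used so that a computed replacement is always treated as literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // Uppercases every lowercase letter that starts a word.
    static String capitalizeWords(String s) {
        Matcher m = Pattern.compile("\\b[a-z]").matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // The replacement is derived from the current match.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("hello regex world")); // Hello Regex World
        // The underscore is a word character, so there is no boundary before the second "java".
        System.out.println(capitalizeWords("java_java"));         // Java_java
    }
}
```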

2.4 Group Matching

Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
Matcher m = p.matcher("123bb_78987dd_090po");
while(m.find()){
    System.out.println(m.group(1));
}
  • Results:

    123
    78987
    090
  • Description: parentheses define groups. group(0) is the whole match, group(1) is the group opened by the first left parenthesis, and group(2) is the group opened by the second. Only group(1), the digits, is printed here.
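
To see all three group numbers at once, the following sketch prints group(0), group(1), and group(2) for the same pattern and input. The class GroupDemo and the method describe are names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Formats each match as "whole -> digits + letters".
    static List<String> describe(String input) {
        Matcher m = Pattern.compile("(\\d{3,5})([a-z]{2})").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(0) + " -> " + m.group(1) + " + " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        // 123bb -> 123 + bb
        // 78987dd -> 78987 + dd
        // 090po -> 090 + po
        for (String line : describe("123bb_78987dd_090po")) {
            System.out.println(line);
        }
    }
}
```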

2.5 Greedy and reluctant matching

Pattern p = Pattern.compile("(.{3,10}?)[0-9]");
Matcher m = p.matcher("aaaa5dddd8");
while (m.find()) {
    System.out.println(m.start() + "-" + m.end());
}
  • Results:

    0-5
    5-10
  • Description: .{3,10} without a question mark is greedy and matches as much as possible. With the question mark, .{3,10}? is reluctant (lazy) and matches as few characters as possible, starting from three. If the while loop is replaced with if (m.find()) { ... }, only the first match is printed.
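
Running the greedy and reluctant forms of this pattern over the same input makes the difference visible. This is only a sketch; the class GreedyVsReluctantDemo and the method spans are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctantDemo {
    // Returns the "start-end" span of every match.
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "aaaa5dddd8";
        // Greedy: one long match that ends at the last digit.
        System.out.println(spans(".{3,10}[0-9]", input));  // [0-10]
        // Reluctant: stops at the first digit it can reach, so there are two matches.
        System.out.println(spans(".{3,10}?[0-9]", input)); // [0-5, 5-10]
    }
}
```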

2.6 Common Capture

Pattern p = Pattern.compile(".{3}");
Matcher m = p.matcher("ab4dd5");
while(m.find()){
    System.out.println(m.group());
}
  • Results:

    ab4
    dd5
  • Description: matches three arbitrary characters at a time and prints each match with m.group().

2.7 Non-capture group (?=a)

Pattern p = Pattern.compile(".{3}(?=a)");
Matcher m = p.matcher("ab4add5");
while (m.find()) {
    System.out.println("Followed by a: " + m.group());
}
  • Result: Followed by a: ab4
  • Description: (?=a) is a non-capturing, zero-width lookahead. The three characters must be followed by a, but the a itself is not part of the match. Placing (?=a) at the front of the pattern would behave differently.

Pattern p = Pattern.compile("(?!a).{3}");
Matcher m = p.matcher("abbsab89");
while (m.find()) {
    System.out.println("Does not start with a: " + m.group());
}
  • Results: bbs, b89
  • Description: (?!a) at the front is a negative lookahead, so a match cannot begin with a. The matches are bbs and b89.

2.8 Matching the content between > and <

Pattern p = Pattern.compile("(?!>).+(?=<)");
Matcher m = p.matcher(">Little Fu Ge<");
while (m.find()) {
    System.out.println(m.group());
}
  • Result: Little Fu Ge
  • Description: this kind of pattern can be used to extract the content between special delimiters, for example text inside the tags of a web page.

2.9 Back reference

Pattern p = Pattern.compile("(\\d\\d)\\1");
Matcher m = p.matcher("1212");
System.out.println(m.matches());
  • Result: true
  • Description: \1 is a back reference to group 1. The group first matches 12, and \1 then requires the same text 12 to appear again, so the result is true.
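
A common practical use of back references is collapsing accidentally repeated words. The sketch below is one possible way to do it, not a complete text-cleaning routine; the class BackReferenceDemo and the method collapseRepeatedWords are invented names. In the replacement string, $1 refers to the first capturing group.

```java
class BackReferenceDemo {
    // Replaces a word followed by one or more repetitions of itself with a single copy.
    static String collapseRepeatedWords(String s) {
        return s.replaceAll("\\b(\\w+)(\\s+\\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println("1212".matches("(\\d\\d)\\1"));              // true
        System.out.println("1213".matches("(\\d\\d)\\1"));              // false
        System.out.println(collapseRepeatedWords("this is is a test")); // this is a test
        System.out.println(collapseRepeatedWords("the the the end"));   // the end
    }
}
```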

Four, summary

Regular expressions involve many symbols, character types, matching ranges, match counts, and matching strategies, such as greedy and possessive quantifiers and back references. None of them is hard to use: as long as you follow the rules, you can combine them to match and extract whatever string content you need.

Five, recommended reading

  • Whoa! You actually poisoned the code!
  • Why do programmers who build wheels get promotions and pay raises?
  • BATJTMD: what kind of Java programmers are the big tech companies hiring?
  • Three years into the job: what should you read to earn 30K a month?
  • Math: how close is it to a programmer?