He was writing a web crawler, so I summarized regular expression usage for him!

Blog: bugstack.cn

Accumulate, share, and grow, so that both you and others can gain something! 😄

One, foreword

Programming always pays off in practice!

A regular expression (often abbreviated to regex, regexp, or RE in code) is a concept from computer science. Regular expressions are typically used to find and replace text that conforms to a pattern (rule).

Regex engines fall into two main categories: DFA and NFA. Both have a long history (more than 20 years now), and many variants have been derived from them. The POSIX standard was introduced to rein in unnecessary variation. As a result, mainstream regex engines can be divided into three categories: DFA, traditional NFA, and POSIX NFA.

Regular expressions are an interesting technology, but in day-to-day programming it is often hard to remember how to use all the symbols. This article summarizes them so that it can serve as a reference whenever you need to process text with regular expressions.

Two, rules

1. Common symbols

x The character x
\\ The backslash character
\0n The character with octal value 0n (0 <= n <= 7)
\0nn The character with octal value 0nn (0 <= n <= 7)
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\f The form-feed character ('\u000C')
\a The alert (bell) character ('\u0007')
\e The escape character ('\u001B')
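
As a minimal sketch of the escapes listed above, the following checks that several notations denote the same character. The class name EscapeDemo and the helper full are hypothetical, introduced only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class EscapeDemo {
    // True when the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("\\0101", "A"));  // true: octal 0101 is 65, the letter A
        System.out.println(full("\\x41", "A"));   // true: hexadecimal 0x41 is also A
        System.out.println(full("\\u0041", "A")); // true: four-digit hexadecimal form
        System.out.println(full("\\t", "\t"));    // true: the regex escape matches a real tab
        System.out.println(full("\\x41", "B"));   // false
    }
}
```

Note that each regex backslash has to be doubled inside a Java string literal.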

2. Character classes

[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
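
The union, intersection, and subtraction forms above can be tried out with a short sketch like the one below. CharClassDemo and full are hypothetical names, and the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class CharClassDemo {
    // True when the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("[a-d[m-p]]", "n"));    // true: union, n lies in m-p
        System.out.println(full("[a-z&&[def]]", "e"));  // true: intersection, e is in both sets
        System.out.println(full("[a-z&&[^bc]]", "b"));  // false: subtraction removes b
        System.out.println(full("[a-z&&[^m-p]]", "q")); // true: q is outside m-p
    }
}
```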

3. Predefined characters

. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
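
A small sketch of the predefined classes in use, assuming the hypothetical helper full below. The last two lines illustrate the "may or may not match line terminators" remark: by default . does not match a newline, but it does once the s (DOTALL) flag is switched on.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class PredefinedDemo {
    // True when the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("\\d+", "2024"));               // true: digits only
        System.out.println(full("\\D+", "20a4"));               // false: input contains digits
        System.out.println(full("\\w+\\s\\w+", "hello world")); // true: word, whitespace, word
        System.out.println(full("\\S+", "hello world"));        // false: the space is whitespace
        System.out.println(full(".", "\n"));                    // false: no line terminators by default
        System.out.println(full("(?s).", "\n"));                // true: DOTALL flag enabled
    }
}
```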

4. POSIX character classes (US-ASCII only)

\p{Lower} A lowercase letter: [a-z]
\p{Upper} An uppercase letter: [A-Z]
\p{ASCII} All ASCII: [\x00-\x7F]
\p{Alpha} A letter: [\p{Lower}\p{Upper}]
\p{Digit} A decimal digit: [0-9]
\p{Alnum} An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct} Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph} A visible character: [\p{Alnum}\p{Punct}]
\p{Print} A printable character: [\p{Graph}\x20]
\p{Blank} A space or a tab: [ \t]
\p{Cntrl} A control character: [\x00-\x1F\x7F]
\p{XDigit} A hexadecimal digit: [0-9a-fA-F]
\p{Space} A whitespace character: [ \t\n\x0B\f\r]
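
The POSIX classes can be combined like any other class. The following is only a sketch with made-up inputs; PosixClassDemo and full are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class PosixClassDemo {
    // True when the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("\\p{Upper}\\p{Lower}+", "Java")); // true: one capital, then lowercase
        System.out.println(full("\\p{XDigit}+", "CAFEbabe"));      // true: hexadecimal digits only
        System.out.println(full("\\p{Punct}+", "!?#"));            // true: punctuation only
        System.out.println(full("\\p{Alnum}+", "abc_123"));        // false: underscore is not alphanumeric
        System.out.println(full("\\p{Blank}+", " \t"));            // true: space and tab
    }
}
```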

5. java.lang.Character classes (simple Java character types)

\p{javaLowerCase} Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase} Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace} Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored} Equivalent to java.lang.Character.isMirrored()

6. Classes for Unicode blocks and categories

\p{InGreek} A character in the Greek block (simple block)
\p{Lu} An uppercase letter (simple category)
\p{Sc} A currency symbol
\P{InGreek} Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)
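
A minimal sketch of the block and category classes above. UnicodeClassDemo and full are hypothetical names; the Greek letter alpha is written as a Unicode escape so the example does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class UnicodeClassDemo {
    // True when the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String alpha = "\u03B1"; // Greek small letter alpha
        System.out.println(full("\\p{InGreek}", alpha));       // true: inside the Greek block
        System.out.println(full("\\P{InGreek}", "a"));         // true: Latin a is outside it
        System.out.println(full("\\p{Lu}", "A"));              // true: uppercase letter category
        System.out.println(full("\\p{Sc}", "$"));              // true: currency symbol category
        System.out.println(full("[\\p{L}&&[^\\p{Lu}]]", "A")); // false: uppercase letters are subtracted
    }
}
```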

7. Boundary matchers

^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input, except for the final line terminator (if any)
\z The end of the input
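
Boundary matchers consume no characters; they only assert a position. The sketch below counts matches to make that visible. BoundaryDemo and count are hypothetical names, and the inputs are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class BoundaryDemo {
    // Counts how many times the regex is found in the input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("\\bcat\\b", "cat concat cat")); // 2: the cat inside concat has no leading boundary
        System.out.println(count("^\\w+", "one\ntwo"));           // 1: ^ only matches at the start of the input by default
        System.out.println(count("(?m)^\\w+", "one\ntwo"));       // 2: with the m flag ^ matches at every line start
        System.out.println(count("line\\Z", "line\n"));           // 1: \Z tolerates the final line terminator
        System.out.println(count("line\\z", "line\n"));           // 0: \z requires the very end of the input
    }
}
```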

8. Greedy quantifiers

X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n times, but no more than m times

9. Reluctant quantifiers

X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n times, but no more than m times

10. Possessive quantifiers

X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+ X, exactly n times
X{n,}+ X, at least n times
X{n,m}+ X, at least n times, but no more than m times
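
The three quantifier families accept the same repetition counts and differ only in how much they try to consume. A minimal sketch, assuming a hypothetical helper first that returns the first match or null:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class QuantifierDemo {
    // Returns the first substring found, or null when nothing matches.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(first(".*foo", input));  // xfooxxxxxxfoo: greedy takes everything, then backs off
        System.out.println(first(".*?foo", input)); // xfoo: reluctant stops at the earliest foo
        System.out.println(first(".*+foo", input)); // null: possessive takes everything and never gives it back
    }
}
```

Possessive quantifiers are mainly useful when you know backtracking cannot help and want the match to fail fast.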

11. Logical operators

XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group

12. Back references

\n Whatever the nth capturing group matched
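
A back reference requires the same text that a group captured earlier, not just the same pattern. The following sketch uses hypothetical names (BackReferenceDemo, full, dedupe) and made-up inputs.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class BackReferenceDemo {
    // True when the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Collapses an immediately repeated word ("the the") into one occurrence.
    static String dedupe(String text) {
        return text.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(full("(\\w)\\1", "aa"));          // true: second character repeats the first
        System.out.println(full("(\\w)\\1", "ab"));          // false: b is not what group 1 captured
        System.out.println(full("(\\d\\d)-\\1", "42-42"));   // true: same two digits on both sides
        System.out.println(dedupe("hello hello world"));     // hello world
    }
}
```

In the replacement string, $1 refers to group 1 in the same way that \1 does inside the pattern.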

13. Quotation

\ Nothing, but quotes the following character
\Q Nothing, but quotes all characters until \E
\E Nothing, but ends the quoting started by \Q
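
Quoting matters whenever the text to match contains regex metacharacters. A minimal sketch with made-up inputs; QuoteDemo and full are hypothetical names, while Pattern.quote is the standard library method that wraps a string in \Q and \E.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class QuoteDemo {
    // True when the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("a.c", "abc"));                   // true: unquoted . matches any character
        System.out.println(full("a\\.c", "abc"));                 // false: quoted . matches only a literal dot
        System.out.println(full("1+1=2", "1+1=2"));               // false: + is read as a quantifier
        System.out.println(full("\\Q1+1=2\\E", "1+1=2"));         // true: everything between \Q and \E is literal
        System.out.println(full(Pattern.quote("1+1=2"), "1+1=2")); // true: same effect via Pattern.quote
    }
}
```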

14. Special constructs (non-capturing)

(?:X) X, as a non-capturing group
(?idmsux-idmsux) Nothing, but turns the match flags i d m s u x on or off
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on or off
(?=X) X, via zero-width positive lookahead
(?!X) X, via zero-width negative lookahead
(?<=X) X, via zero-width positive lookbehind
(?<!X) X, via zero-width negative lookbehind
(?>X) X, as an independent, non-capturing group
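
Lookahead and lookbehind check the surrounding text without including it in the match. The sketch below collects all matches to show what is and is not returned; LookaroundDemo and all are hypothetical names, and the inputs are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class LookaroundDemo {
    // Returns every substring that the regex finds in the input.
    static List<String> all(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Positive lookbehind: digits preceded by USD; USD itself is not returned.
        System.out.println(all("(?<=USD)\\d+", "USD100 EUR200 USD300"));            // [100, 300]
        // Positive lookahead: names followed by .java; the suffix is not returned.
        System.out.println(all("\\w+(?=\\.java)", "Main.java Util.java notes.txt")); // [Main, Util]
        // Negative lookahead: whole numbers that are not followed by a percent sign.
        System.out.println(all("\\b\\d+\\b(?!%)", "10% 20 30%"));                   // [20]
    }
}
```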

Three, examples

1. Character matching

"a".matches(".")

Result: true
Description: . matches any single character.

"a".matches("[abc]")

Result: true
Description: Matches any one of a, b, or c. Without a quantifier, the class matches exactly one character.

"a".matches("[^abc]")

Result: false
Description: Matches any character except a, b, or c (negation).

"A".matches("[a-zA-Z]")

Result: true
Description: a through z or A through Z, inclusive (range).

"A".matches("[a-z]|[A-Z]")

Result: true
Description: The same range as above, written as an alternation of two classes.

"A".matches("[a-z(A-Z)]")

Result: true
Description: Matches the same letters as [a-zA-Z]. Inside a character class the parentheses are literal characters rather than a capturing group, so this class also matches ( and ).

"R".matches("[A-Z&&[RFG]]")

Result: true
Description: Matches the intersection of A-Z and the set R, F, G.

"a_8".matches("\\w{3}")

Result: true
Description: \w is a word character, equivalent to [a-zA-Z_0-9]; {3} matches exactly three times.

"\\".matches("\\\\")

Result: true
Description: The Java literal "\\\\" is the regex \\, which matches a single backslash.

"hello sir".matches("h.*")

Result: true
Description: . matches any character, and * repeats it zero or more times.

"hello sir".matches(".*ir$")

Result: true
Description: .* matches any run of characters, and ir$ requires the line to end with ir.

"hello sir".matches("^h[a-z]{1,3}o\\b.*")

Result: true
Description: ^h matches h at the beginning, and [a-z]{1,3}o matches one to three lowercase letters followed by the letter o. \b does not consume any character; it only matches a position, here the word boundary right after o.

"hellosir".matches("^h[a-z]{1,3}o\\b.*")

Result: false
Description: o is followed by s, which is a letter rather than a space, so there is no word boundary after o and \b fails.

" \n".matches("^[\\s&&[^\\n]]*\\n$")

Result: true
Description: The input must start with zero or more whitespace characters other than a newline (^[\\s&&[^\\n]]*) and must end with a newline (\\n$).

System.out.println("java".matches("(?i)JAVA"));

Result: true
Description: (?i) is an embedded flag (a non-capturing construct) that turns on case-insensitive matching.

2. Pattern matching

2.1 Verifying matches

Pattern p = Pattern.compile("[a-z]{3,}");
Matcher m = p.matcher("fgha");
System.out.println(m.matches()); // true: the input is three or more lowercase letters

Result: true
Description: Pattern is used together with Matcher. The Matcher class adds support for groups and for matching the same input repeatedly. Pattern on its own only offers the simplest one-shot check, Pattern.matches(String regex, CharSequence input).
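
The three ways of running a full match can be put side by side as follows. This is only a sketch: MatchModesDemo, LOWER_WORD, and isLowerWord are hypothetical names. Compiling the pattern once and reusing it is a common choice when the same expression is applied to many inputs.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class MatchModesDemo {
    // Compiled once and reused for every call.
    private static final Pattern LOWER_WORD = Pattern.compile("[a-z]{3,}");

    // True when the whole input is three or more lowercase letters.
    static boolean isLowerWord(String input) {
        return LOWER_WORD.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("[a-z]{3,}", "fgha")); // true: one-shot static helper
        System.out.println("fgha".matches("[a-z]{3,}"));          // true: String convenience method
        System.out.println(isLowerWord("fgha"));                  // true: precompiled pattern
        System.out.println(isLowerWord("fg"));                    // false: fewer than three letters
    }
}
```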

2.2 Matching Function

Pattern p = Pattern.compile("\\d{3,5}");
Matcher m = p.matcher("123-4536-89789-000");
System.out.println(m.matches());                // false: the whole input is not 3 to 5 digits
m.reset();                                      // reset the matcher so that find() starts from the beginning
System.out.println(m.find());
System.out.println(m.start() + "-" + m.end());  // position of the match (only valid after a successful find())
System.out.println(m.find());
System.out.println(m.start() + "-" + m.end());
System.out.println(m.find());
System.out.println(m.start() + "-" + m.end());
System.out.println(m.find());
System.out.println(m.lookingAt());              // lookingAt() always matches from the start of the input

Test results:

false
true
0-3
true
4-8
true
9-14
true
true

m.matches(): attempts to match the entire input
m.reset(): resets the matcher so that the next find() starts from the beginning of the input
m.find(): looks for the next match in the input
m.start(): start index of the current match
m.end(): end index of the current match (exclusive)
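
The usual way to combine find(), start(), and end() is a loop that runs until find() returns false. A minimal sketch, assuming the sample input shown in the code; FindLoopDemo and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class FindLoopDemo {
    // Collects every match as "start-end:text".
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "123-4536-89789-000"; // assumed sample input
        // Prints [0-3:123, 4-8:4536, 9-14:89789, 15-18:000]
        System.out.println(findAll("\\d{3,5}", input));

        Matcher m = Pattern.compile("\\d{3,5}").matcher(input);
        System.out.println(m.matches());   // false: the whole input must match
        System.out.println(m.lookingAt()); // true: only a prefix (123) has to match
    }
}
```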

2.3 Matching Common Replacement

Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("java_Java_jAva_jAVa_IloveJava");
System.out.println(m.replaceAll("JAVA"));

Result: JAVA_JAVA_JAVA_JAVA_IloveJAVA
Description: Every occurrence of java, in any mix of upper and lower case, is replaced with the uppercase JAVA.

2.4 Matching Logical Replacement

Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("java_Java_jAva_jAVa_IloveJava fdasfas");
StringBuffer sb = new StringBuffer();
int i = 0;
while (m.find()) {
    i++;
    if (i % 2 == 0) {
        m.appendReplacement(sb, "java");   // even-numbered matches become lowercase
    } else {
        m.appendReplacement(sb, "JAVA");   // odd-numbered matches become uppercase
    }
}
m.appendTail(sb);                          // append the text after the last match
System.out.println(sb);

Result: JAVA_java_JAVA_java_IloveJAVA fdasfas
Description: The program logic i % 2 decides the replacement, so odd-numbered and even-numbered matches are replaced differently.

2.4 Group Matching

Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
Matcher m = p.matcher("123bb_78987dd_090po");
while (m.find()) {
    System.out.println(m.group(1));
}

Results:

123
78987
090

Description: group(1) returns only the digits captured by the first pair of parentheses. Group 0 is the whole match, group 1 belongs to the first left parenthesis, and group 2 belongs to the second left parenthesis.
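
To see all three groups at once, the same pattern can be run as in the sketch below. GroupDemo and describe are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
class GroupDemo {
    // Returns one line per match: "whole match -> group 1, group 2".
    static List<String> describe(String input) {
        Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
        Matcher m = p.matcher(input);
        List<String> rows = new ArrayList<>();
        while (m.find()) {
            rows.add(m.group(0) + " -> " + m.group(1) + ", " + m.group(2));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Prints:
        // 123bb -> 123, bb
        // 78987dd -> 78987, dd
        // 090po -> 090, po
        for (String row : describe("123bb_78987dd_090po")) {
            System.out.println(row);
        }
    }
}
```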

2.5 Greedy match and ungreedy match

Pattern p = Pattern.compile("(.{3,10}?)[0-9]");
Matcher m = p.matcher("aaaa5dddd8");
while (m.find()) {
    System.out.println(m.start() + "-" + m.end());
}

Results:

0-5
5-10

Description: Without the question mark, .{3,10} is greedy and matches as many characters as possible. With the question mark, .{3,10}? is reluctant (lazy): it starts from the minimum of three characters and only takes more when it has to. If you replace the while loop with if (m.find()) { ... m.start() + "-" + m.end() ... }, only the first match is printed.

2.6 Common Capture

Pattern p = Pattern.compile(".{3}");
Matcher m = p.matcher("ab4dd5");
while (m.find()) {
    System.out.println(m.group());
}

Results:

ab4
dd5

Description: Matches three arbitrary characters at a time and prints each match with m.group().

2.7 Non-capturing group (?=a)

Pattern p = Pattern.compile(".{3}(?=a)");
Matcher m = p.matcher("ab4add5");
while (m.find()) {
    System.out.println("Followed by a: " + m.group());
}

Result: Followed by a: ab4
Description: (?=a) is a non-capturing lookahead. The three characters must be followed by a, but the a itself is not part of the match. The behavior is different when (?=a) is written at the front of the pattern.

Pattern p = Pattern.compile("(?!a).{3}");
Matcher m = p.matcher("abbsab89");
while (m.find()) {
    System.out.println("Does not start with a: " + m.group());
}

Results: bbs, b89
Description: (?!a) at the front is a negative lookahead, so a match may not begin with a. The matches are therefore bbs and b89.

2.8 Remove >< sign matching

Pattern p = Pattern.compile("(?!>).+(?=<)");
Matcher m = p.matcher(">小傅哥<");
while (m.find()) {
    System.out.println(m.group());
}

Result: 小傅哥
Description: This kind of pattern is typically used to extract the content between special markers in a web page.

2.9 Back reference

Pattern p = Pattern.compile("(\\d\\d)\\1");
Matcher m = p.matcher("1212");
System.out.println(m.matches());

Result: true
Description: \1 is a back reference to group 1. The group first matches 12, and \1 then requires the same text 12 again, so the result is true.

Four, summary

Regular expressions involve many symbols and concepts: character types, matching ranges, repetition counts, and matching strategies such as greedy and possessive quantifiers and back references. None of them is hard to use. As long as you follow the rules of the regex syntax, you can combine them to match and extract the string content you need.

Five, series recommendation

Hold the grass! You poisoned the code!
Why do programmers build wheels and get promotions and pay raises?
BATJTMD, large factory recruitment, are recruiting what kind of Java programmers?
Work 3 years, see what data can monthly salary 30K?
Math, how close is it to a programmer?
