Regular expressions define patterns for strings.
Regular expressions can be used to search, edit, or process text.
Regular expressions are not tied to any single language, but each language implements them with subtle differences.
Regular expression examples
A plain string is itself a regular expression. For example, the regular expression Hello World matches the string "Hello World".
. (dot) is also a regular expression. It matches any single character, such as "a" or "1".
The following table lists a few example regular expressions and what they match:
Regular expression | Description |
---|---|
this is text | Matches the string "this is text". |
this\s+is\s+text | Note the \s+ in the pattern. The \s+ after the word "this" matches one or more whitespace characters. It is followed by the string "is", then another \s+ that matches one or more whitespace characters, then the string "text". It matches, for example: this is text |
^\d+(\.\d+)? | ^ anchors the match at the start of the string. \d+ matches one or more digits. The parenthesized group followed by ? is optional, and \. inside it matches a literal ".". Example matches: "5", "1.5", and "2.21". |
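To try the table's patterns from Java, here is a minimal sketch. The class name TableExamples and the helper isNumber are hypothetical. It uses Matcher.matches(), which requires the whole input to fit the pattern, so the leading ^ is redundant there but harmless.

```java
import java.util.regex.Pattern;

public class TableExamples {
    // Pattern from the table: digits, optionally followed by "." and more digits
    private static final Pattern NUMBER = Pattern.compile("^\\d+(\\.\\d+)?");

    // matches() succeeds only if the entire string fits the pattern
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        // \s+ accepts one or more whitespace characters (spaces, tabs, ...) between the words
        System.out.println("this   is\ttext".matches("this\\s+is\\s+text")); // true
        for (String s : new String[] {"5", "1.5", "2.21", "1.", ".5"}) {
            System.out.println(s + " -> " + isNumber(s));
        }
    }
}
```

"1." and ".5" are rejected: the optional group needs at least one digit after the dot, and the pattern must start with a digit.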
Java regular expressions are most similar to Perl’s.
The java.util.regex package mainly includes the following three classes:
- The Pattern class: A Pattern object is a compiled representation of a regular expression. The Pattern class has no public constructor. To create a Pattern object, call one of its public static compile methods, which returns a Pattern object. These methods take a regular expression as their first argument.
- The Matcher class: A Matcher object is the engine that interprets the pattern and performs match operations on an input string. Like Pattern, Matcher has no public constructor; you obtain a Matcher object by calling the matcher method of a Pattern object.
- PatternSyntaxException: A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern.
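The following sketch shows how the three classes fit together: Pattern.compile builds the Pattern, matcher() produces the Matcher, and a malformed pattern surfaces as a PatternSyntaxException. The class name ThreeClasses and the helper firstMatch are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class ThreeClasses {
    // Returns the first match of regex in input, or null if there is none.
    // Throws PatternSyntaxException (unchecked) if regex is malformed.
    static String firstMatch(String regex, String input) {
        Pattern p = Pattern.compile(regex); // static factory: Pattern has no public constructor
        Matcher m = p.matcher(input);       // a Matcher is always obtained from a Pattern
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d+", "order 66 shipped")); // 66
        try {
            firstMatch("(unclosed", "anything");
        } catch (PatternSyntaxException e) {
            System.out.println("Bad pattern: " + e.getDescription());
        }
    }
}
```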
The following example uses the regular expression .*runoob.* to check whether a string contains the substring "runoob":
```java
import java.util.regex.*;

class RegexExample1 {
    public static void main(String[] args) {
        String content = "I am noob " +
                "from runoob.com.";

        String pattern = ".*runoob.*";

        // Pattern.matches compiles the pattern and matches it against the whole input
        boolean isMatch = Pattern.matches(pattern, content);
        System.out.println("Does the string contain a 'runoob' substring? " + isMatch);
    }
}
```
The example prints:
Does the string contain a 'runoob' substring? true
Capturing groups
A capturing group treats multiple characters as a single unit. It is created by placing the characters to be grouped inside a set of parentheses.
For example, the regular expression (dog) creates a single group containing "d", "o", and "g".
Capturing groups are numbered by counting their opening parentheses from left to right. For example, the expression ((A)(B(C))) contains four groups:
- ((A)(B(C)))
- (A)
- (B(C))
- (C)
To find out how many groups an expression has, call the groupCount method of a Matcher object. groupCount returns an int giving the number of capturing groups in the matcher's pattern.
There is also a special group, group 0 (group(0)), which always represents the entire match. It is not included in the count returned by groupCount.
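Here is a minimal sketch that applies the ((A)(B(C))) expression from above, prints groupCount(), and then prints every group, including group 0. The class GroupCountDemo and the helper countGroups are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Number of capturing groups in the pattern; group 0 is not counted.
    // groupCount() is available before any match attempt.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher("ABC");
        System.out.println("groupCount() = " + m.groupCount()); // 4
        if (m.matches()) {
            // Group 0 is the whole match; groups 1-4 follow the opening parentheses
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println("group(" + i + ") = " + m.group(i));
            }
        }
    }
}
```

On the input "ABC", groups 0 and 1 are both "ABC", group 2 is "A", group 3 is "BC", and group 4 is "C".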
The following example shows how to find a string of digits in a given string:
RegexMatches.java:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        // The string to search
        String line = "This order was placed for QT3000! OK?";
        // Group 1: leading non-digits, group 2: digits, group 3: the rest
        String pattern = "(\\D*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create a Matcher object
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
```
Compiling and running the above example produces the following output:
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
Regular expression syntax
In other languages, \\ means: "I want to insert a plain (literal) backslash into the regular expression; don't give it any special meaning."
In Java, \\ means: "I am inserting a regular expression backslash, so the character after it has special meaning."
So in other languages (such as Perl), a single backslash \ is enough to escape a character, whereas in Java a regular expression written as a string literal needs two backslashes to get the same escaping effect, because the Java compiler processes one level of backslash escapes before the regex engine sees the string. Put simply, in a Java regular expression two backslashes \\ stand for one backslash \ in other languages. That is why the regular expression for a digit is written \\d, and a plain backslash is written \\\\.
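To see the two levels of escaping, the sketch below compares the Java literal with what the regex engine receives. The class BackslashDemo and its helpers isDigit and isBackslash are hypothetical.

```java
public class BackslashDemo {
    // The Java literal "\\d" is the two characters \d, which the regex engine reads as "a digit"
    static boolean isDigit(String s) {
        return s.matches("\\d");
    }

    // The Java literal "\\\\" is the two characters \\, which the regex engine reads as one literal backslash
    static boolean isBackslash(String s) {
        return s.matches("\\\\");
    }

    public static void main(String[] args) {
        System.out.println("\\d".length());    // 2: a backslash followed by 'd'
        System.out.println(isDigit("7"));      // true
        System.out.println(isBackslash("\\")); // true: "\\" in source is a single backslash character
    }
}
```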
Character | Description |
---|---|
\ | Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\", and "\(" matches "(". |
^ | Matches the position at the beginning of the input string. If the MULTILINE flag is set (Pattern.MULTILINE or the inline flag (?m)), ^ also matches the position after a line terminator such as "\n" or "\r". |
$ | Matches the position at the end of the input string. If the MULTILINE flag is set, $ also matches the position before a line terminator such as "\n" or "\r". |
* | Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}. |
+ | Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
? | Matches the preceding character or subexpression zero times or once. For example, "do(es)?" matches "do" and "does". ? is equivalent to {0,1}. |
{n} | n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but does match the two o's in "food". |
{n,} | n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*". |
{n,m} | m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note: do not insert spaces between the comma and the numbers. |
? (after a quantifier) | When this character immediately follows another quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, in the string "oooo", "o+?" matches only a single "o", while "o+" matches all the o's. |
. | Matches any single character except a line terminator such as "\n" or "\r". To match any character including line terminators, use a pattern such as "[\s\S]", or enable the DOTALL flag. |
(pattern) | Matches pattern and captures the match. The captured text can be retrieved with Matcher.group(n), or referenced as $n in a replacement string. To match a literal parenthesis, use "\(" or "\)". |
(?:pattern) | Matches pattern but does not capture the match; that is, it is a non-capturing group, and the match is not stored for later use. This is useful when combining parts of a pattern with the "or" character (\|). For example, "industr(?:y\|ies)" is a more economical expression than "industry\|industries". |
(?=pattern) | Positive lookahead: a non-capturing subexpression that matches at any position where a string matching pattern begins. For example, "Windows(?=95\|98\|NT\|2000)" matches the "Windows" in "Windows2000" but not the "Windows" in "Windows3.1". Lookahead does not consume characters: after a match, the search for the next match starts immediately after the previous match, not after the characters examined by the lookahead. |
(?!pattern) | Negative lookahead: a non-capturing subexpression that matches at any position where a string matching pattern does not begin. For example, "Windows(?!95\|98\|NT\|2000)" matches the "Windows" in "Windows3.1" but not the "Windows" in "Windows2000". Like positive lookahead, it does not consume characters. |
x\|y | Matches either x or y. For example, "z\|food" matches "z" or "food", while "(z\|f)ood" matches "zood" or "food". |
[xyz] | Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain". |
[^xyz] | Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches "p", "l", "i", and "n" in "plain". |
[a-z] | Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" through "z". |
[^a-z] | Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" through "z". |
\b | Matches a word boundary: the position between a word character and a non-word character (or the start or end of the input). For example, "er\b" matches the "er" in "never", but not the "er" in "verb". |
\B | Matches a non-word boundary. For example, "er\B" matches the "er" in "verb", but not the "er" in "never". |
\cx | Matches the control character indicated by x. For example, \cM matches Control-M, a carriage return. x should be a letter in the range A-Z or a-z. |
\d | Matches a digit character. Equivalent to [0-9]. |
\D | Matches a non-digit character. Equivalent to [^0-9]. |
\f | Matches a form feed. Equivalent to \x0c and \cL. |
\n | Matches a newline. Equivalent to \x0a and \cJ. |
\r | Matches a carriage return. Equivalent to \x0d and \cM. |
\s | Matches any whitespace character, including spaces, tabs, and form feeds. In Java this is equivalent to [ \t\n\x0B\f\r]. |
\S | Matches any non-whitespace character. Equivalent to [^ \t\n\x0B\f\r]. |
\t | Matches a tab. Equivalent to \x09 and \cI. |
\v | In Java 8 and later, matches any vertical whitespace character, equivalent to [\n\x0B\f\r\x85\u2028\u2029]. To match only the vertical tab (\x0b, \cK), use \x0B. |
\w | Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_]. |
\W | Matches any non-word character. Equivalent to [^A-Za-z0-9_]. |
\xn | Matches n, where n is a hexadecimal escape value. The hexadecimal escape must be exactly two digits long. For example, "\x41" matches "A", and "\x041" is equivalent to "\x04" followed by "1". This lets you use ASCII codes in regular expressions. |
\num | Matches num, where num is a positive integer: a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters. |
\n | A backreference. In Java, \1 through \9 are always interpreted as backreferences; unlike some other regex flavors, Java does not treat them as octal escapes. |
\nm | A two-digit backreference. In Java, \nm refers to group nm only if at least nm capturing groups exist at that point in the pattern; otherwise the parser drops digits until the number refers to an existing group, so \nm is read as backreference \n followed by the literal character m. |
\0n, \0nn, \0mnn | An octal escape. Java requires a leading zero: n is an octal digit (0-7) and m is 0-3. For example, \0101 matches "A". |
\un | Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). |
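The sketch below exercises a few rows of the table (greedy versus non-greedy quantifiers, a non-capturing group, lookahead, and a backreference) by collecting every match with Matcher.find(). The class SyntaxDemo and the helper findAll are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxDemo {
    // Collects every match of regex in input, in order of appearance
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("o+", "oooo"));  // greedy: [oooo]
        System.out.println(findAll("o+?", "oooo")); // non-greedy: [o, o, o, o]
        // Non-capturing group used only to group the alternatives
        System.out.println(findAll("industr(?:y|ies)", "industry industries")); // [industry, industries]
        // Lookahead: "Windows" only when followed by 95/98/NT/2000; the version is not consumed
        System.out.println(findAll("Windows(?=95|98|NT|2000)", "Windows2000 Windows3.1")); // [Windows]
        // Backreference: (.)\1 finds two identical characters in a row
        System.out.println(findAll("(.)\\1", "bookkeeper")); // [oo, kk, ee]
    }
}
```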
According to the Java Language Specification, backslashes in Java source code strings are interpreted as Unicode escapes or other character escapes. You must therefore write two backslashes in a string literal to keep a regular expression backslash from being interpreted by the Java compiler. For example, when interpreted as a regular expression, the string literal "\b" matches a single backspace character, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and causes a compile-time error; to match the string (hello), you must use the string literal "\\(hello\\)".
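Here is a minimal sketch of the string-literal rules just described. The class LiteralEscapes and its helper methods are hypothetical.

```java
import java.util.regex.Pattern;

public class LiteralEscapes {
    // "\\bcat\\b" reaches the regex engine as \bcat\b: "cat" as a whole word
    private static final Pattern WORD_CAT = Pattern.compile("\\bcat\\b");

    static boolean containsWordCat(String text) {
        return WORD_CAT.matcher(text).find();
    }

    // "\\(hello\\)" reaches the regex engine as \(hello\): the literal text (hello)
    static boolean isParenthesizedHello(String text) {
        return text.matches("\\(hello\\)");
    }

    public static void main(String[] args) {
        // "\b" in Java source is one backspace character (code 8), not a word boundary
        System.out.println((int) "\b".charAt(0));             // 8
        System.out.println(containsWordCat("a cat sat"));    // true
        System.out.println(containsWordCat("concatenate"));  // false: "cat" is inside a longer word
        System.out.println(isParenthesizedHello("(hello)")); // true
    }
}
```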
Common regular expressions
Description | Regular expression |
---|---|
Chinese characters | [\u4e00-\u9fa5] |
Double-byte characters (including Chinese characters) | [^\x00-\xff] |
Chinese characters, enhanced (usable for matching names) | ^[\u4e00-\u9fa5]+(·[\u4e00-\u9fa5]+)*$ |
Email address | [\w!#$%&'*+/=?^_`{\|}~-]+(?:\.[\w!#$%&'*+/=?^_`{\|}~-]+)*@(?:\w(?:[\w-]*\w)?\.)+\w(?:[\w-]*\w)? |
URL | [a-zA-Z]+://[^\s]* |
Domestic (mainland China) phone number | \d{3}-\d{8}\|\d{4}-\d{7,8} |
QQ number | [1-9][0-9]{4,} |
Chinese postal code | [1-9]\d{5}(?!\d) |
18-digit ID card number | ^(\d{6})(\d{4})(\d{2})(\d{2})(\d{3})([0-9]\|X)$ |
Positive integer | ^[1-9]\d*$ |
Non-positive integer | ^-[1-9]\d*\|0$ |
Positive floating-point number | ^[1-9]\d*\.\d*\|0\.\d*[1-9]\d*$ |
External video URL | ([hH][tT]{2}[pP]\|[hH][tT]{2}[pP][sS])+(:\/\/)+([\w-]+\.)+[\w-]+(/[\w- ./?%&=])+(?=mp4\|rmvb\|flv\|avi)+(mp4\|rmvb\|flv\|avi) |
img tag | <img\\b[^>]*\\bsrc\\b\\s*=\\s*('\|\")?([^'\"\n\r\f>]+(\.jpg\|\.bmp\|\.eps\|\.gif\|\.mif\|\.miff\|\.png\|\.tif\|\.tiff\|\.svg\|\.wmf\|\.jpe\|\.jpeg\|\.dib\|\.ico\|\.tga\|\.cut\|\.pic)\\b)[^>]*> |
script tag | <script[^>]*?>[\\s\\S]*?<\/script> |
style tag | <style[^>]*?>[\\s\\S]*?<\/style> |
HTML tag | <[^>]+> |
Spaces, carriage returns, and line feeds | \\s\|\t\|\r\|\n |
All w tags | <w[^>]*?>[\\s\\S]*?<\/w[^>]*?> |
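As a rough illustration, the sketch below applies two patterns from the table as written. Most table entries carry no ^...$ anchors, so the result depends on whether you call matches() (the whole string must fit) or find() (any substring may match). The class CommonRegexDemo and its helpers are hypothetical, and the patterns are used as given rather than as validated, production-grade rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommonRegexDemo {
    // QQ number from the table: no leading zero, at least 5 digits.
    // String.matches requires the whole string to fit, which suits validation.
    static boolean isQQ(String s) {
        return s.matches("[1-9][0-9]{4,}");
    }

    // Postal code from the table: 6 digits not starting with 0 and not followed by another digit.
    // find() searches inside longer text; returns null if nothing matches.
    static String findPostalCode(String text) {
        Matcher m = Pattern.compile("[1-9]\\d{5}(?!\\d)").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isQQ("10001")); // true
        System.out.println(isQQ("01234")); // false: leading zero
        System.out.println(findPostalCode("Postal code: 100080, Beijing")); // 100080
    }
}
```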
Methods of the Matcher class
Index methods
Index methods provide useful index values that show precisely where a match was found in the input string:
No. | Method and description |
---|---|
1 | public int start(): returns the start index of the previous match. |
2 | public int start(int group): returns the start index of the subsequence captured by the given group during the previous match operation. |
3 | public int end(): returns the offset after the last character matched. |
4 | public int end(int group): returns the offset after the last character of the subsequence captured by the given group during the previous match operation. |
Study methods
Study methods examine the input string and return a boolean indicating whether the pattern was found:
No. | Method and description |
---|---|
1 | public boolean lookingAt(): attempts to match the input sequence, starting at the beginning of the region, against the pattern. |
2 | public boolean find(): attempts to find the next subsequence of the input sequence that matches the pattern. |
3 | public boolean find(int start): resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index. |
4 | public boolean matches(): attempts to match the entire region against the pattern. |
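The sketch below illustrates how find() resumes after the previous match, while find(int) resets the matcher first. The class StudyMethodsDemo and the helper indexOfMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StudyMethodsDemo {
    // Index of the first match at or after `from`, or -1 if there is none.
    // find(int) resets the matcher before searching, so earlier calls do not affect it.
    static int indexOfMatch(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("cat").matcher("cat dog cat");
        System.out.println(m.find());  // true: "cat" at index 0
        System.out.println(m.find());  // true: resumes after the previous match, finds index 8
        System.out.println(m.find());  // false: no more matches
        System.out.println(m.find(0)); // true: reset, then search again from index 0
        System.out.println(m.start()); // 0
        System.out.println(indexOfMatch("cat", "cat dog cat", 4)); // 8
    }
}
```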
Replacement methods
Replacement methods replace text in an input string:
No. | Method and description |
---|---|
1 | public Matcher appendReplacement(StringBuffer sb, String replacement): implements a non-terminal append-and-replace step. |
2 | public StringBuffer appendTail(StringBuffer sb): implements a terminal append-and-replace step. |
3 | public String replaceAll(String replacement): replaces every subsequence of the input sequence that matches the pattern with the given replacement string. |
4 | public String replaceFirst(String replacement): replaces the first subsequence of the input sequence that matches the pattern with the given replacement string. |
5 | public static String quoteReplacement(String s): returns a literal replacement string for the specified string. The returned string works as a literal replacement s in the appendReplacement method of the Matcher class. |
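quoteReplacement matters when the replacement text may contain $ or \, which replaceAll and appendReplacement otherwise treat as group references and escapes. Here is a minimal sketch with the hypothetical class QuoteReplacementDemo and helper replaceLiterally:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    // Replaces every match of regex with `text`, taken literally even if it contains $ or \
    static String replaceLiterally(String regex, String input, String text) {
        return Pattern.compile(regex).matcher(input).replaceAll(Matcher.quoteReplacement(text));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiterally("PRICE", "Total: PRICE", "$1.99")); // Total: $1.99
    }
}
```

Passing "$1.99" to replaceAll without quoting would throw an IndexOutOfBoundsException here, because "$1" is read as a reference to group 1 and this pattern has no groups.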
Start and end methods
Here is an example that counts the number of times the word "cat" appears in the input string:
RegexMatches.java:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT = "cat cat cat cattie cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        int count = 0;

        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}
```
Compiling and running the above example produces the following output:
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
As you can see, this example uses word boundaries to ensure that the letters "c", "a", "t" are not merely a substring of a longer word. It also gives useful information about where in the input string each match occurred.
The start method returns the start index of the subsequence captured by the given group during the previous match operation, and end returns the index of the last matched character plus one.
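To see start(int group) and end(int group) from the index-method table in action, here is a small sketch. The input string, the class GroupIndexDemo, and the helper groupSpan are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // {start, end} of the given group in the first match, or null if there is no match
    static int[] groupSpan(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] {m.start(group), m.end(group)};
    }

    public static void main(String[] args) {
        String input = "mail alice@example now";
        Matcher m = Pattern.compile("(\\w+)@(\\w+)").matcher(input);
        if (m.find()) {
            // end() is exclusive, so substring(start, end) returns exactly the captured text
            System.out.println(m.start(1) + "-" + m.end(1) + " " + input.substring(m.start(1), m.end(1))); // 5-10 alice
            System.out.println(m.start(2) + "-" + m.end(2) + " " + input.substring(m.start(2), m.end(2))); // 11-18 example
        }
    }
}
```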
The matches and lookingAt methods
The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference is that matches requires the entire input sequence to match, while lookingAt does not.
lookingAt does not need to match the whole input, but the match must start at the first character.
Both methods always start at the beginning of the input string.
The following example illustrates the difference:
RegexMatches.java:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static final String INPUT2 = "ooooofoooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;
    private static Matcher matcher2;

    public static void main(String[] args) {
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);
        matcher2 = pattern.matcher(INPUT2);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);
        System.out.println("Current INPUT2 is: " + INPUT2);

        System.out.println("lookingAt(): " + matcher.lookingAt());
        System.out.println("matches(): " + matcher.matches());
        System.out.println("lookingAt(): " + matcher2.lookingAt());
    }
}
```
Compiling and running the above example produces the following output:
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
Current INPUT2 is: ooooofoooooooooooo
lookingAt(): true
matches(): false
lookingAt(): false
The replaceFirst and replaceAll methods
The replaceFirst and replaceAll methods replace text that matches a regular expression. The difference is that replaceFirst replaces only the first match, while replaceAll replaces all matches.
The following example illustrates this behavior:
RegexMatches.java:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " + "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}
```
Compiling and running the above example produces the following output:
The cat says meow. All cats say meow.
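The section names replaceFirst as well, but the example above uses only replaceAll. Here is a minimal sketch of replaceFirst on the same input; the class ReplaceFirstDemo and helper replaceFirstDog are hypothetical.

```java
import java.util.regex.Pattern;

public class ReplaceFirstDemo {
    // Replaces only the first occurrence of "dog" with "cat"
    static String replaceFirstDog(String input) {
        return Pattern.compile("dog").matcher(input).replaceFirst("cat");
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstDog("The dog says meow. All dogs say meow."));
        // The cat says meow. All dogs say meow.
    }
}
```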
The appendReplacement and appendTail methods
The Matcher class also provides appendReplacement and appendTail methods for text substitution:
The following example illustrates this feature:
RegexMatches.java:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoob";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // copy the text before this match, then append the replacement
            m.appendReplacement(sb, REPLACE);
        }
        // copy the rest of the input after the last match
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}
```
Compiling and running the above example produces the following output:
-foo-foo-foo-
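Because appendReplacement lets you choose the replacement text for each match, it is handy when the replacement depends on what was matched. The sketch below upper-cases every match; the class UpperCaseWords and helper upperCaseMatches are hypothetical, and Matcher.quoteReplacement keeps any $ or \ in the matched text from being interpreted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UpperCaseWords {
    // Upper-cases every match of regex, copying the text in between unchanged
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("cat", "cat, dog, cattle")); // CAT, dog, CATtle
    }
}
```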
Methods of the PatternSyntaxException class
PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular expression pattern.
The PatternSyntaxException class provides the following methods to help you see what went wrong:
No. | Method and description |
---|---|
1 | public String getDescription(): returns the description of the error. |
2 | public int getIndex(): returns the index of the error. |
3 | public String getPattern(): returns the erroneous regular expression pattern. |
4 | public String getMessage(): returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern. |
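Here is a minimal sketch that triggers a syntax error with an unclosed group and prints what each method reports. The class SyntaxErrorDemo and helper compileError are hypothetical; the exact description and message text come from the JDK and are not reproduced here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxErrorDemo {
    // Returns the exception thrown while compiling regex, or null if it compiles
    static PatternSyntaxException compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        PatternSyntaxException e = compileError("abc(def"); // unclosed group
        if (e != null) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Pattern: " + e.getPattern());
            System.out.println("Message:\n" + e.getMessage());
        }
    }
}
```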
Notes
Donnie’s little slave
A regular expression is a logical formula for manipulating strings. It uses predefined special characters, and combinations of them, to form a "rule string" that expresses filtering logic for strings.
Given a regular expression and another string, we can:
1. check whether the given string satisfies the filtering logic of the regular expression (this is called "matching");
2. use the regular expression to extract the specific parts of the string that we want.
Regular expressions are characterized by:
1. great flexibility, logic, and power;
2. the ability to express complex string handling quickly and concisely;
3. being hard to understand for newcomers.
Note: once a regular expression is written, matching it does not explain what is right or wrong about the input; it only returns true or false.
Example: validate a QQ number. Requirements: it must be 5 to 15 digits long and must not start with 0. Without a regular expression:
```java
public class Regex {
    public static void main(String[] args) {
        checkQQ("0123134");
    }

    public static void checkQQ(String qq) {
        int len = qq.length();
        if (len >= 5 && len <= 15) {
            if (!qq.startsWith("0")) {
                // Every character must be a digit 0-9 (Long.parseLong would also accept a sign)
                for (char c : qq.toCharArray()) {
                    if (c < '0' || c > '9') {
                        System.out.println("invalid character");
                        return;
                    }
                }
                System.out.println("qq:" + qq);
            } else {
                System.out.println("can't start with 0");
            }
        } else {
            System.out.println("QQ id length error");
        }
    }
}
```
With a regular expression:
```java
public class Regex {
    public static void main(String[] args) {
        checkQQ2("0123134");
    }

    public static void checkQQ2(String qq) {
        // First digit 1-9, then 4 to 14 more digits: 5 to 15 digits in total
        String reg = "[1-9][0-9]{4,14}";
        System.out.println(qq.matches(reg) ? "legal QQ" : "illegal QQ");
    }
}
```
yimkong
Count the opening parentheses "(" from left to right: the n-th one starts group n. A later part of the pattern can reuse what an earlier parenthesized group matched:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d{2})([a-z]{2,3})");
        Matcher m = p.matcher("33aa-32sdy-29ssc");
        while (m.find()) {
            System.out.println(m.group(2)); // prints aa, sdy, ssc
        }

        Pattern p2 = Pattern.compile("(\\d(\\d))\\2");
        Matcher matcher = p2.matcher("211");
        System.out.println(matcher.matches());
        // Result: true
        // Explanation: "\\2" refers to the value matched by group 2 (here "1")
    }
}
```