
Author homepage: Java Li Yangyong resume template, learning materials, interview question bank, technical assistance [pay attention to me, all for you]

Regular expressions come up often in everyday development, for example when validating email addresses, phone numbers, domain names, and IP addresses.

A plain string is itself a simple regular expression. For example, the regular expression Hello World matches the string “Hello World”.

The . (dot) is also a regular expression; it matches any single character, such as “a” or “1”.

The following table lists some examples and descriptions of regular expressions:

Regular expression    Description
this is text          Matches the string “this is text” exactly.
this\s+is\s+text      Note the \s+ in the pattern: it matches one or more whitespace characters. The pattern matches the word “this”, then one or more spaces, then “is”, then one or more spaces, then “text”. Example that matches: “this is text”.
^\d+(\.\d+)?          ^ anchors the match at the start of the input. \d+ matches one or more digits. The ? makes the group in parentheses optional. \. matches a literal “.”. Examples that match: “5”, “1.5”, and “2.21”.
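
To make the table concrete, here is a minimal sketch that tries the last two patterns in Java. The class name TableExamples and its helper methods are made up for this illustration, and the decimal pattern is written with an escaped dot (\.) so that it only accepts a literal decimal point.

```java
import java.util.regex.Pattern;

class TableExamples {
    // Patterns from the table, written as Java string literals
    // (every regex backslash is doubled).
    static final String WORDS = "this\\s+is\\s+text";
    static final String NUMBER = "^\\d+(\\.\\d+)?";

    // True if the whole input is "this", "is", "text" separated by whitespace.
    static boolean matchesWords(String input) {
        return Pattern.matches(WORDS, input);
    }

    // True if the whole input is an integer or a decimal number.
    static boolean matchesNumber(String input) {
        return Pattern.matches(NUMBER, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWords("this is text"));    // true
        System.out.println(matchesWords("this   is  text")); // true
        System.out.println(matchesWords("thisistext"));      // false
        System.out.println(matchesNumber("5"));              // true
        System.out.println(matchesNumber("2.21"));           // true
        System.out.println(matchesNumber("1."));             // false
    }
}
```

Pattern.matches only returns true when the entire input matches, which is why “1.” is rejected.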

Java regular expressions are most similar to Perl’s.

The java.util.regex package mainly includes the following three classes:

  • The Pattern class:

    A Pattern object is a compiled representation of a regular expression. The Pattern class has no public constructor. To create a Pattern object, you call its public static compile method, which returns a Pattern object. The method takes a regular expression as its first argument.

  • The Matcher class:

    A Matcher object is the engine that interprets the pattern and matches it against an input string. Like the Pattern class, Matcher has no public constructor. You call the matcher method of a Pattern object to get a Matcher object.

  • PatternSyntaxException:

    PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular expression pattern.

The following example uses the regular expression .*runoob.* to check whether a string contains the substring runoob:

import java.util.regex.*;

class RegexExample1 {
    public static void main(String[] args) {
        String content = "I am noob " + "from runoob.com.";
        // The .* on both sides lets the pattern cover the whole input.
        String pattern = ".*runoob.*";
        // Pattern.matches requires the entire input to match the pattern.
        boolean isMatch = Pattern.matches(pattern, content);
        System.out.println("Does the string contain a 'runoob' substring? " + isMatch);
    }
}

The program prints: Does the string contain a 'runoob' substring? true
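
The three classes listed earlier can also be used together explicitly. The following is only a sketch: the class name RegexClassesSketch and the two helper methods are invented for illustration, while Pattern.compile, Pattern.matcher, Matcher.find, and PatternSyntaxException are the standard java.util.regex API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexClassesSketch {
    // Returns true if the input contains the substring "runoob".
    static boolean containsRunoob(String input) {
        Pattern pattern = Pattern.compile("runoob"); // static factory; there is no public constructor
        Matcher matcher = pattern.matcher(input);    // a Matcher is obtained from a Pattern
        return matcher.find();                       // find() looks for a match anywhere in the input
    }

    // Returns true if the regular expression compiles without a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsRunoob("I am noob from runoob.com.")); // true
        System.out.println(containsRunoob("I am noob"));                  // false
        System.out.println(isValidRegex("(runoob"));                      // false: unclosed group
    }
}
```

Because find() searches for a match anywhere in the input, the pattern does not need the surrounding .* that Pattern.matches required.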

Regular expression syntax

In other languages, \\ means: I want to insert a plain (literal) backslash into the regular expression; please don’t give it any special meaning.

In Java, \\ means: I’m inserting a regular expression backslash, so the character after it has special meaning. In other languages (such as Perl), a single backslash \ is enough to escape a character, whereas in a Java string literal two backslashes are needed to produce that one escaping backslash. Put simply, two \ in a Java regular expression string stand for one \ in other languages, which is why the regular expression for a digit is written \\d and a literal backslash is written \\\\.

System.out.print("\\");   // Output is \
System.out.print("\\\\"); // Output is \\
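
As a rough illustration of the doubling rule, the sketch below checks how many characters each string literal really contains and what it matches as a pattern. The class name BackslashSketch and its constants are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class BackslashSketch {
    // Two characters at run time: a backslash and the letter d.
    static final String DIGIT = "\\d";
    // Two characters at run time: two backslashes, i.e. the regex for one literal backslash.
    static final String BACKSLASH = "\\\\";

    static boolean isDigit(String input) {
        return Pattern.matches(DIGIT, input);
    }

    static boolean isSingleBackslash(String input) {
        return Pattern.matches(BACKSLASH, input);
    }

    public static void main(String[] args) {
        System.out.println(DIGIT.length());          // 2
        System.out.println(BACKSLASH.length());      // 2
        System.out.println(isDigit("7"));            // true
        System.out.println(isSingleBackslash("\\")); // true: the input is one backslash character
    }
}
```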
Character     Description
\             Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, “n” matches the character “n”, while “\n” matches a newline character. The sequence “\\” matches “\”, and “\(” matches “(”.
^             Matches the position at the beginning of the input string. In multiline mode (Pattern.MULTILINE in Java), ^ also matches the position after a line terminator such as “\n” or “\r”.
$             Matches the position at the end of the input string. In multiline mode (Pattern.MULTILINE in Java), $ also matches the position before a line terminator such as “\n” or “\r”.
*             Matches the preceding character or subexpression zero or more times. For example, “zo*” matches “z” and “zoo”. * is equivalent to {0,}.
+             Matches the preceding character or subexpression one or more times. For example, “zo+” matches “zo” and “zoo”, but not “z”. + is equivalent to {1,}.
?             Matches the preceding character or subexpression zero times or once. For example, “do(es)?” matches “do” and “does”. ? is equivalent to {0,1}.
{n}           n is a non-negative integer. Matches exactly n times. For example, “o{2}” does not match the “o” in “Bob”, but does match the two “o”s in “food”.
{n,}          n is a non-negative integer. Matches at least n times. For example, “o{2,}” does not match the “o” in “Bob”, but matches all the “o”s in “foooood”. “o{1,}” is equivalent to “o+”. “o{0,}” is equivalent to “o*”.
{n,m}         m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, “o{1,3}” matches the first three “o”s in “fooooood”. “o{0,1}” is equivalent to “o?”. Note: you cannot insert spaces between the comma and the numbers.
?             When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching is non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, in the string “oooo”, “o+?” matches only a single “o”, while “o+” matches all the “o”s.
.             Matches any single character except line terminators such as “\r” and “\n”. To match any character including line terminators, use a pattern such as “[\s\S]”.
(pattern)     Matches pattern and captures the matched subexpression. The captured match can be retrieved from the match result by its number, $0…$9 (in Java, Matcher.group(n)). To match literal parenthesis characters, use “\(” or “\)”.
(?:pattern)   Matches pattern but does not capture the match; that is, it is a non-capturing group and the match is not stored for later use. This is useful for combining parts of a pattern with the “or” character (|). For example, “industr(?:y|ies)” is more economical than “industry|industries”.
(?=pattern)   Positive lookahead: matches at a position where the text that follows matches pattern. It is non-capturing, so the lookahead text cannot be retrieved later. For example, “Windows (?=95|98|NT|2000)” matches the “Windows ” in “Windows 2000” but not the “Windows ” in “Windows 3.1”.
(?!pattern)   Negative lookahead: matches at a position where the text that follows does not match pattern. It is non-capturing, so the lookahead text cannot be retrieved later. For example, “Windows (?!95|98|NT|2000)” matches the “Windows ” in “Windows 3.1” but not the “Windows ” in “Windows 2000”.
x|y           Matches x or y. For example, “z|food” matches “z” or “food”, and “(z|f)ood” matches “zood” or “food”.
[xyz]         Character set. Matches any one of the characters it contains. For example, “[abc]” matches the “a” in “plain”.
[^xyz]        Negated character set. Matches any character that is not listed. For example, “[^abc]” matches the “p”, “l”, “i”, and “n” in “plain”.
[a-z]         Character range. Matches any character in the specified range. For example, “[a-z]” matches any lowercase letter from “a” through “z”.
[^a-z]        Negated character range. Matches any character that is not in the specified range. For example, “[^a-z]” matches any character that is not in the range “a” through “z”.
\b            Matches a word boundary, that is, the position between a word and a space. For example, “er\b” matches the “er” in “never”, but not the “er” in “verb”.
\B            Matches a non-word boundary. “er\B” matches the “er” in “verb”, but not the “er” in “never”.
\cx           Matches the control character indicated by x. For example, \cM matches Control-M, a carriage return. x must be in the range A-Z or a-z. If it is not, c is assumed to be the literal “c” character.
\d            Matches a digit character. Equivalent to [0-9].
\D            Matches a non-digit character. Equivalent to [^0-9].
\f            Matches a form feed character. Equivalent to \x0c and \cL.
\n            Matches a newline character. Equivalent to \x0a and \cJ.
\r            Matches a carriage return. Equivalent to \x0d and \cM.
\s            Matches any whitespace character, including spaces, tabs, form feeds, and so on. In Java, equivalent to [ \t\n\x0B\f\r].
\S            Matches any non-whitespace character. In Java, equivalent to [^ \t\n\x0B\f\r].
\t            Matches a tab character. Equivalent to \x09 and \cI.
\v            Matches a vertical tab in many regular expression flavors, where it is equivalent to \x0b and \cK. In Java 8 and later, \v instead matches any vertical whitespace character, [\n\x0B\f\r\x85\u2028\u2029].
\w            Matches any word character, including the underscore. Equivalent to “[A-Za-z0-9_]”.
\W            Matches any non-word character. Equivalent to “[^A-Za-z0-9_]”.
\xn           Matches n, where n is a hexadecimal escape code. The hexadecimal escape code must be exactly two digits long. For example, “\x41” matches “A”. “\x041” is equivalent to “\x04” followed by “1”. This allows ASCII codes to be used in regular expressions.
\num          Matches num, where num is a positive integer. A backreference to a captured match. For example, “(.)\1” matches two consecutive identical characters.
\n            Identifies an octal escape code or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape code.
\nm           Identifies an octal escape code or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal character m. If neither case applies, \nm matches the octal value nm, where n and m are octal digits (0-7).
\nml          When n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape code nml.
\un           Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
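
A few of the constructs in the table are easier to follow when run. The sketch below is illustrative only: SyntaxSketch, firstMatch, and firstGroup are hypothetical names, and the lookahead patterns are shortened examples written for this demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyntaxSketch {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns capture group 1 of the first match, or null if there is no match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("o+", "oooo"));                      // oooo (greedy)
        System.out.println(firstMatch("o+?", "oooo"));                     // o (non-greedy)
        System.out.println(firstMatch("o{1,3}", "fooooood"));              // ooo
        System.out.println(firstGroup("do(es)?", "does"));                 // es
        System.out.println(firstMatch("(.)\\1", "hello"));                 // ll (backreference)
        System.out.println(firstMatch("er\\b", "never"));                  // er
        System.out.println(firstMatch("er\\b", "verb"));                   // null
        System.out.println(firstMatch("Windows (?=95|98)", "Windows 98")); // "Windows " with a trailing space
        System.out.println(firstMatch("Windows (?!95|98)", "Windows 98")); // null
    }
}
```

Note that the text inside a lookahead is not part of the returned match: the positive lookahead example returns only “Windows ” even though “98” had to follow it.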

Note: According to the Java Language Specification, backslashes in string literals in Java source code are interpreted as Unicode escapes or other character escapes. So you must use two backslashes in a string literal to pass a regular expression backslash through unchanged, protecting it from interpretation by the Java compiler. For example, when interpreted as a regular expression, the string literal “\b” matches a single backspace character, while “\\b” matches a word boundary. The string literal “\(hello\)” is illegal and results in a compile-time error; to match the string (hello), you must use the string literal “\\(hello\\)”.
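
The note can be checked with a small sketch like the one below. The class LiteralEscapeSketch and the helper found are names made up for this example; the sample inputs are arbitrary.

```java
import java.util.regex.Pattern;

class LiteralEscapeSketch {
    // Returns true if regex matches somewhere inside input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The Java literal "\b" is one character: backspace (code 8), not a word boundary.
        System.out.println("\b".length());                       // 1
        System.out.println((int) "\b".charAt(0));                // 8
        // The doubled form reaches the regex engine as a word boundary.
        System.out.println(found("\\bcat\\b", "a cat sat"));     // true
        System.out.println(found("\\bcat\\b", "concatenate"));   // false
        // Escaped parentheses match literal ( and ) characters.
        System.out.println(found("\\(hello\\)", "say (hello)")); // true
        System.out.println(found("\\(hello\\)", "say hello"));   // false
        // Unescaped parentheses form a capture group, so plain hello matches.
        System.out.println(found("(hello)", "say hello"));       // true
    }
}
```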


Well, that’s it for today. See you next time.
