How much do you know about Java regular expressions?


Preface

Regular expressions are commonly used for string matching, searching, and substitution. Don't underestimate them: using regular expressions flexibly to process strings, at work and in study, can greatly improve efficiency. The joy of programming can be that simple.

Here’s a step-by-step guide to using regular expressions.

A simple introduction: `.`

package test;

public class Test01 {

    public static void main(String[] args) {
        // "abc" is matched against the regular expression "...", where each "." stands for one character,
        // so "..." means exactly three characters
        System.out.println("abc".matches("..."));

        System.out.println("abcd".matches("..."));
    }
}

Output result:

true
false

The String class has a matches(String regex) method, which returns a boolean indicating whether the entire string matches the given regular expression.

In this example, the regular expression is "...", where each . stands for one character (by default, any character except a line terminator), so the whole expression means exactly three characters. The result is therefore true for "abc" and false for "abcd".

Support for regular expressions in Java

There are two classes for regular expressions in the java.util.regex package, one is Matcher and the other is Pattern.

A typical use of these two classes is given in the Official Java documentation as follows:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test02 {

    public static void main(String[] args) {
        // [a-z] matches any one character from a to z, and {3} repeats it three times,
        // so the pattern matches a string of exactly three lowercase letters
        Pattern p = Pattern.compile("[a-z]{3}");
        Matcher m = p.matcher("abc");
        System.out.println(m.matches());
    }
}

Output: true

A Pattern can be understood as a template that strings are matched against. In Test02, the pattern we define is a string of length 3 in which every character is one of a to z.

The compile method of the Pattern class creates the Pattern object: the regular expression we pass in is compiled into a Pattern. A compiled pattern makes repeated matching more efficient and, because it is immutable, it can safely be used concurrently by multiple threads.

A Matcher can be understood as the state and result of matching a pattern against one particular string. Matching a string against a pattern can produce many results, as later examples show. Unlike a Pattern, a Matcher is not safe for use by multiple threads at once.

Finally, calling m.matches() returns whether the entire string matches the pattern.

The three lines above can be reduced to one line:

System.out.println("abc".matches("[a-z]{3}"));

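However, String.matches compiles the regular expression on every call, so if the same expression needs to be matched repeatedly, this is less efficient than compiling a Pattern once and reusing it.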
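To make the reuse concrete, here is a minimal sketch. The class name `LowercaseTripleChecker` and its `filter` method are hypothetical helpers invented for this illustration; only `Pattern.compile`, `matcher`, and `matches` come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LowercaseTripleChecker {
    // Compiled once. Pattern instances are immutable and can be shared between threads.
    private static final Pattern THREE_LOWERCASE = Pattern.compile("[a-z]{3}");

    // Returns only the inputs that consist of exactly three lowercase letters.
    static List<String> filter(List<String> inputs) {
        List<String> accepted = new ArrayList<>();
        for (String input : inputs) {
            // A Matcher is not thread-safe, so a fresh one is created for every input.
            if (THREE_LOWERCASE.matcher(input).matches()) {
                accepted.add(input);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList("abc", "abcd", "xyz", "Ab1"))); // prints [abc, xyz]
    }
}
```

Keeping the Pattern in a static final field is one common arrangement; any scope that outlives the individual calls would serve the same purpose.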

Quantifiers: how many times to match

Symbol	Number of occurrences
*	Zero or more times
+	One or more times
?	Zero or one time
{n}	Exactly n times
{n,m}	Between n and m times (inclusive)
{n,}	At least n times

Code examples:

package test;

public class Test03 {

    private static void p(Object o){
        System.out.println(o);
    }
    
    public static void main(String[] args) {
        // "X*" means zero or more X
        p("aaaa".matches("a*"));
        p("".matches("a*"));
        // "X+" means one or more X
        p("aaaa".matches("a+"));
        // "X?" means zero or one X
        p("a".matches("a?"));
        // \\d matches a digit: [0-9]
        p("2345".matches("\\d{2,5}"));
        // \\. matches a literal "."
        p("192.168.0.123".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"));
        // [0-2] matches one digit in the range 0 to 2
        p("192".matches("[0-2][0-9][0-9]"));
    }
}

Output: all true.
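The examples above only show inputs that match. As a complementary sketch, the following checks the edges of each quantifier, including {n} and {n,}, which the example does not exercise. The class `QuantifierDemo` and its helper `full` are hypothetical names used only for this illustration.

```java
class QuantifierDemo {
    // Thin wrapper so that each call reads as "input, pattern".
    static boolean full(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(full("", "a+"));             // false: + needs at least one a
        System.out.println(full("aa", "a?"));           // false: ? allows at most one a
        System.out.println(full("aaa", "a{3}"));        // true: exactly three
        System.out.println(full("aaaa", "a{3}"));       // false: four is one too many
        System.out.println(full("aaaa", "a{2,}"));      // true: at least two
        System.out.println(full("a", "a{2,}"));         // false: fewer than two
        System.out.println(full("123456", "\\d{2,5}")); // false: six digits exceed the upper bound
    }
}
```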

Character ranges: `[]`

[] is used to describe a range of characters. Here are some examples:

package test;

public class Test04 {

    private static void p(Object o){
        System.out.println(o);
    }

    public static void main(String[] args) {
        // [abc] matches one of the characters a, b, or c
        p("a".matches("[abc]"));
        // [^abc] matches any one character other than a, b, or c
        p("1".matches("[^abc]"));
        // a to z or A to Z; all three forms below accept "A".
        // Note that | is a literal character inside [], so the second form also matches "|".
        p("A".matches("[a-zA-Z]"));
        p("A".matches("[a-z|A-Z]"));
        p("A".matches("[a-z[A-Z]]"));
        // [A-Z&&[REQ]] matches a character that is in A to Z and is also one of R, E, Q
        p("R".matches("[A-Z&&[REQ]]"));
    }
}

Output: all true.
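A short sketch of how union, intersection, and subtraction behave inside brackets, and of the fact that | is an ordinary character there. The class `CharClassDemo` and its helper `full` are hypothetical names for this illustration.

```java
class CharClassDemo {
    static boolean full(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // Inside [] the | is a literal character, not alternation.
        System.out.println(full("|", "[a-z|A-Z]"));       // true
        System.out.println(full("|", "[a-zA-Z]"));        // false
        // Union by nesting one class inside another.
        System.out.println(full("7", "[a-c[0-9]]"));      // true
        // Intersection with &&.
        System.out.println(full("R", "[A-Z&&[REQ]]"));    // true
        System.out.println(full("B", "[A-Z&&[REQ]]"));    // false
        // Subtraction: a to z except the vowels.
        System.out.println(full("e", "[a-z&&[^aeiou]]")); // false
        System.out.println(full("t", "[a-z&&[^aeiou]]")); // true
    }
}
```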

\s \w \d \S \W \D

About \

In a Java string literal, special characters must be escaped by preceding them with \.

For example, consider the text: The teacher shouted: "Class, hand in your homework!". Written naively as a string literal, the opening double quote would be closed by the quote right after "shouted: ", but we need double quotes inside the string itself, so we have to escape them.

With escape characters, the literal becomes "The teacher shouted: \"Class, hand in your homework!\"", and our intended meaning is recognized correctly.

Similarly, if we want a \ in a string, we must put another \ in front of it, so it is written as "\\" in the string literal.

So how do we match a \ in a regular expression? The answer is "\\\\".

Take it in two steps. A \ is also the escape character in regular expressions, so the regular expression that matches a single \ is \\. Each of those two backslashes must in turn be escaped in the Java string literal, which gives "\\\\": the first pair produces the escape character \ of the regular expression, and the second pair produces the \ that is being escaped.
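The two layers of escaping can be checked by looking at string lengths at run time. The sketch below also uses `Pattern.quote`, a JDK method that builds a regular expression matching its argument literally, as an alternative to escaping by hand. The class `BackslashDemo` and its two helper methods are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // True only when s is exactly one backslash character.
    static boolean isSingleBackslash(String s) {
        return s.matches("\\\\");
    }

    // Matches input against literal, treating every character of literal as plain text.
    static boolean matchesLiterally(String input, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println("\\".length());            // 1: one backslash at run time
        System.out.println("\\\\".length());          // 2: the two-character regex \\
        System.out.println(isSingleBackslash("\\"));  // true
        // "a.b\\c" is the five-character text a.b\c
        System.out.println(matchesLiterally("a.b\\c", "a.b\\c")); // true
        System.out.println(matchesLiterally("axb\\c", "a.b\\c")); // false: the dot is literal here
    }
}
```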

Let’s start with a code example:

package test;

public class Test05 {

    private static void p(Object o){
        System.out.println(o);
    }

    public static void main(String[] args) {
        // \s{4} matches four whitespace characters
        p(" \n\r\t".matches("\\s{4}"));
        // \S matches one non-whitespace character
        p("a".matches("\\S"));
        // \w{3} matches three word characters (letters, digits, or underscores)
        p("a_8".matches("\\w{3}"));
        p("abc888&^%".matches("[a-z]{1,3}\\d+[%^&*]+"));
        // match a single backslash
        p("\\".matches("\\\\"));
    }
}

Symbol	Meaning
\d	[0-9], a digit
\D	[^0-9], a non-digit
\s	[ \t\n\x0B\f\r], a whitespace character
\S	[^\s], a non-whitespace character
\w	[a-zA-Z_0-9], a word character (letter, digit, or underscore)
\W	[^\w], a non-word character
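The predefined classes are also handy outside of matches(). The following sketch uses them with `String.replaceAll` and `String.split`; the class `PredefinedClassDemo` and its three methods are hypothetical helpers invented for this illustration.

```java
import java.util.Arrays;

class PredefinedClassDemo {
    // Removes every non-digit character, for example to normalise a phone number.
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (spaces, tabs, line breaks).
    static String[] words(String input) {
        return input.trim().split("\\s+");
    }

    // Replaces every non-word character with an underscore.
    static String underscored(String input) {
        return input.replaceAll("\\W", "_");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(010) 555-0199"));                        // 0105550199
        System.out.println(Arrays.toString(words(" hello \t regex\nworld ")));  // [hello, regex, world]
        System.out.println(underscored("a-b c.d"));                              // a_b_c_d
    }
}
```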

Boundary matching: `^`

Inside brackets, ^ means negation, as in [^abc]; outside brackets, it matches the beginning of the input.

Code examples:

package test;

public class Test06 {

    private static void p(Object o){
        System.out.println(o);
    }

    public static void main(String[] args) {
        /*
         * ^  the beginning of a line
         * $  the end of a line
         * \b a word boundary
         */
        p("hello sir".matches("^h.*"));
        p("hello sir".matches(".*r$"));
        p("hello sir".matches("^h[a-z]{1,3}o\\b.*"));
        p("hellosir".matches("^h[a-z]{1,3}o\\b.*"));
    }
}

Output result:

true
true
true
false
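Word boundaries are most useful with searching rather than whole-string matching. As a sketch, the hypothetical helper `countWholeWord` below (not part of the original article) counts occurrences of a word only where it stands alone, using \b on both sides and `Pattern.quote` so the word is treated as plain text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Counts occurrences of word that are not part of a longer word.
    static int countWholeWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "concat" and "category" contain "cat" but not as a whole word.
        System.out.println(countWholeWord("cat concat cat, category", "cat")); // 2
        // matches() already requires the whole string to match, so ^ and $ add nothing here.
        System.out.println("hello sir".matches("h.*r")); // true
    }
}
```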

The Matcher class

The matches() method matches the entire string against the pattern.
The find() method looks for the next match starting from the current position. For a freshly created matcher, the current position is the beginning of the string, as the following code example shows.
The lookingAt() method always matches from the beginning of the string, but unlike matches() it does not require the whole string to match.

Code examples:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test07 {
    private static void p(Object o){
        System.out.println(o);
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\d{3,5}");
        String s = "123-34345-234-00";
        Matcher m = pattern.matcher(s);

        // First demonstrate matches(), which matches against the entire string
        p(m.matches());
        // The result is false: matching 3 to 5 digits fails at the first -

        // Then demonstrate find(), using reset() to set the current position back to the beginning of the string
        m.reset();
        p(m.find()); // true, matches 123
        p(m.find()); // true, matches 34345
        p(m.find()); // true, matches 234
        p(m.find()); // false, 00 is too short to match

        // Now demonstrate what happens when reset() is not called after matches().
        // This relies on a side effect of a failed matches() call that the Matcher
        // documentation does not specify, so call reset() before find() when in doubt.
        m.reset(); // start from a clean state
        p(m.matches()); // false, the whole string does not match, but the current position has moved to the first -
        p(m.find()); // true, matches 34345
        p(m.find()); // true, matches 234
        p(m.find()); // false, 00 is too short to match
        p(m.find()); // false, nothing left to match

        // Demonstrate lookingAt(), which always starts from the beginning
        p(m.lookingAt()); // true, matches 123
    }
}
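To compare the three methods without any carried-over state, one option is to create a fresh Matcher for each call. The sketch below does that; the class `MatchModesDemo` and its helper methods `whole`, `prefix`, and `anywhere` are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

class MatchModesDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d{3,5}");

    // The entire input must be 3 to 5 digits.
    static boolean whole(String s) {
        return DIGITS.matcher(s).matches();
    }

    // The input must start with 3 to 5 digits; the rest may be anything.
    static boolean prefix(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // The input must contain 3 to 5 digits somewhere.
    static boolean anywhere(String s) {
        return DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123-34345", "ab-34345", "12345"}) {
            System.out.println(s + ": matches=" + whole(s)
                    + " lookingAt=" + prefix(s) + " find=" + anywhere(s));
        }
        // 123-34345: matches=false lookingAt=true find=true
        // ab-34345: matches=false lookingAt=false find=true
        // 12345: matches=true lookingAt=true find=true
    }
}
```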

If a match succeeded, start() returns the index at which the match began, and end() returns the index just after the last matched character.

Code examples:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test08 {

    private static void p(Object o) {
        System.out.println(o);
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\d{3,5}");
        String s = "123-34345-234-00";
        Matcher m = pattern.matcher(s);

        p(m.find());//true Matches 123 successfully
        p("start: " + m.start() + " - end:" + m.end());
        p(m.find());//true Matches 34345 successfully
        p("start: " + m.start() + " - end:" + m.end());
        p(m.find());//true Matches 234 successfully
        p("start: " + m.start() + " - end:" + m.end());
        p(m.find());//false Fails to match 00
        try {
            p("start: " + m.start() + " - end:" + m.end());
        } catch (Exception e) {
            System.out.println("Error reported...");
        }
        p(m.lookingAt());
        p("start: " + m.start() + " - end:" + m.end());
    }
}

Output result:

true
start: 0 - end:3
true
start: 4 - end:9
true
start: 10 - end:13
false
Error reported...
true
start: 0 - end:3
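Because start() is inclusive and end() is exclusive, the pair can be passed straight to substring. A minimal sketch, with `MatchSpanDemo` and `spans` as hypothetical names for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpanDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d{3,5}");

    // Lists every match in the form text@start-end.
    static List<String> spans(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            // substring(start, end) yields exactly the matched text, the same as m.group().
            String text = input.substring(m.start(), m.end());
            result.add(text + "@" + m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(spans("123-34345-234-00")); // [123@0-3, 34345@4-9, 234@10-13]
    }
}
```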

Replacing strings

A method in the Matcher class, group(), returns the matched string.

Code example: Convert Java in a string to uppercase

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test09 {

    private static void p(Object o){
        System.out.println(o);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("java");
        Matcher m = p.matcher("java I love Java and you");
        p(m.replaceAll("JAVA")); // replaceAll() replaces every matched substring
    }
}

Output result:

JAVA I love Java and you

Case-insensitive find and replace

We specify case insensitivity when creating the Pattern.

public static void main(String[] args) {
    Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);// Specify case insensitive
    Matcher m = p.matcher("java I love Java and you");
    p(m.replaceAll("JAVA"));
}

Output result:

JAVA I love JAVA and you
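The same effect can be had with the embedded flag (?i) written inside the expression itself, which is convenient with String methods that take only a regex string. A sketch comparing the two forms, with `InlineFlagDemo` and its methods as hypothetical names for this illustration:

```java
import java.util.regex.Pattern;

class InlineFlagDemo {
    // (?i) turns on case-insensitive matching for the rest of the expression.
    static String withInlineFlag(String input) {
        return input.replaceAll("(?i)java", "JAVA");
    }

    // Equivalent form using the compile-time flag from the example above.
    static String withCompileFlag(String input) {
        return Pattern.compile("java", Pattern.CASE_INSENSITIVE).matcher(input).replaceAll("JAVA");
    }

    public static void main(String[] args) {
        System.out.println(withInlineFlag("java I love Java and you"));  // JAVA I love JAVA and you
        System.out.println(withCompileFlag("java I love Java and you")); // JAVA I love JAVA and you
    }
}
```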

Replacing specific matches, case insensitive

This example converts the odd-numbered matches (1st, 3rd, and so on) to uppercase and the even-numbered matches to lowercase.

It introduces a powerful method of the Matcher class, appendReplacement(StringBuffer sb, String replacement), which takes a StringBuffer used to build up the resulting string.

public static void main(String[] args) {
    Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher("java Java JAVA JAva I love Java and you ?");
    StringBuffer sb = new StringBuffer();
    int index = 1; // 1-based number of the current match
    while (m.find()) {
        // Odd-numbered matches become "JAVA", even-numbered matches become "java"
        m.appendReplacement(sb, (index & 1) == 0 ? "java" : "JAVA");
        index++;
    }
    m.appendTail(sb); // append the rest of the string after the last match
    p(sb);
}

Output result:

JAVA java JAVA java I love JAVA and you ?
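One detail worth knowing about appendReplacement: in the replacement string, $ and \ have special meaning (for example, $1 refers to group 1). When the replacement is computed text that may contain those characters, `Matcher.quoteReplacement` makes it literal. A minimal sketch, with `PriceTagDemo` and `doubleAndTag` as hypothetical names, assuming the numbers are small enough to fit in an int:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceTagDemo {
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Doubles every number in the input and prefixes it with a dollar sign.
    static String doubleAndTag(String input) {
        Matcher m = NUMBER.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Without quoteReplacement, "$6" would be read as a reference to group 6.
            m.appendReplacement(sb, Matcher.quoteReplacement("$" + doubled));
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleAndTag("tea 3, cake 12")); // tea $6, cake $24
    }
}
```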

Grouping

Let’s start with an example:

package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test10 {

    private static void p(Object o) {
        System.out.println(o);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\d{3,5}[a-z]{2}");
        String s = "005aa-856zx-1425kj-29";
        Matcher m = p.matcher(s);
        while (m.find()) {
            p(m.group());
        }
    }
}

Output result:

005aa
856zx
1425kj

The regular expression “\\d{3,5}[a-z]{2}” indicates three to five digits followed by two letters, and each matching string is printed.

What if I wanted to print the number in each matching string?

The grouping mechanism lets us define groups inside a regular expression with parentheses: (\\d{3,5})([a-z]{2})

Then pass the group number to m.group(int group).

Note: group numbers start at 0, which stands for the entire match. After 0, groups are numbered by their left parentheses, from left to right. In this expression, group 1 is the digits and group 2 is the letters.

public static void main(String[] args) {
    Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})"); // three to five digits followed by two lowercase letters
    String s = "005aa-856zx-1425kj-29";
    Matcher m = p.matcher(s);
    while (m.find()) {
        p(m.group(1));
    }
}

Output result:

005
856
1425
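When a pattern has several groups, numbers become hard to read. Java also supports named groups, written (?<name>...) and read with m.group("name"). The sketch below applies them to the same input; the group names `digits` and `letters` and the class `NamedGroupDemo` are choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Same expression as above, with each group given a name.
    private static final Pattern CODE =
            Pattern.compile("(?<digits>\\d{3,5})(?<letters>[a-z]{2})");

    // Returns each match as digits/letters.
    static List<String> describe(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CODE.matcher(input);
        while (m.find()) {
            // Named groups are still numbered, so m.group(1) equals m.group("digits").
            result.add(m.group("digits") + "/" + m.group("letters"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe("005aa-856zx-1425kj-29")); // [005/aa, 856/zx, 1425/kj]
    }
}
```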

A collection of common regular expressions

The patterns below are written in plain regular expression syntax; in a Java string literal, every \ must be doubled.

Chinese characters: [\u4e00-\u9fa5]
Double-byte characters (including Chinese characters): [^\x00-\xff]
Blank lines: \n[\s| ]*\r
HTML tags: <(.*)>.*<\/\1>|<(.*) \/>
Leading and trailing whitespace: (^\s*)|(\s*$) (like the VBScript Trim function)
IP address (also usable for extracting addresses from text): (\d+)\.(\d+)\.(\d+)\.(\d+)
Email address (also usable for extracting addresses from text): \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
URL: http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
SQL statement: ^(select|drop|delete|create|update|insert).*$
Non-negative integer: ^\d+$
Positive integer: ^[0-9]*[1-9][0-9]*$
Non-positive integer: ^((-\d+)|(0+))$
Negative integer: ^-[0-9]*[1-9][0-9]*$
Integer: ^-?\d+$
Non-negative floating-point number: ^\d+(\.\d+)?$
Positive floating-point number: ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$
Non-positive floating-point number: ^((-\d+(\.\d+)?)|(0+(\.0+)?))$
Negative floating-point number: ^(-(positive floating-point pattern))$
English letters: ^[A-Za-z]+$
Uppercase letters: ^[A-Z]+$
Lowercase letters: ^[a-z]+$
English letters and digits: ^[A-Za-z0-9]+$
Letters, digits, and underscores: ^\w+$
E-mail address (whole string): ^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$
URL (whole string): ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\s*)?$ or: ^http:\/\/[A-Za-z0-9]+\.[A-Za-z0-9]+[\/=\?%\-&_~@\[\]':+!]*([^<>"])*$
Zip code (China): ^[1-9]\d{5}$
Chinese text (loose range): ^[\u0391-\uFFE5]+$
Telephone number: ^((\(\d{2,3}\))|(\d{3}\-))?(\(0\d{2,3}\)|0\d{2,3}-)?[1-9]\d{6,7}(\-\d{1,4})?$
Mobile phone number: ^((\(\d{2,3}\))|(\d{3}\-))?13\d{9}$
Extract hyperlinks (href) from text: (h|H)(r|R)(e|E)(f|F) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
Extract image links (src) from text: (s|S)(r|R)(c|C) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
Extract Chinese mobile phone numbers from text: (86)*0*13\d{9}
Extract Chinese landline numbers from text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{8}
Extract Chinese phone numbers (mobile and landline) from text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{7,14}
Extract Chinese zip codes from text: [1-9]{1}(\d+){5}
Extract floating-point numbers (decimals) from text: (-?\d*)\.?\d+
Extract any number from text: (-?\d*)(\.\d+)?
Area code: ^0\d{2,3}$
Positive integer without a leading zero: ^[1-9]*[1-9][0-9]*$
Account name (starts with a letter, 5 to 16 characters, letters, digits, and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
Chinese characters, English letters, digits, and underscores: ^[\u4e00-\u9fa5_a-zA-Z0-9]+$
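As a sketch of how entries from this collection might be used in Java, the following compiles the zip code, account name, and integer patterns once and wraps each in a small validation method. The class `CommonPatternDemo` and its method names are hypothetical; note how every \ from the list is doubled in the string literals.

```java
import java.util.regex.Pattern;

class CommonPatternDemo {
    private static final Pattern ZIP_CODE = Pattern.compile("^[1-9]\\d{5}$");
    private static final Pattern ACCOUNT = Pattern.compile("^[a-zA-Z][a-zA-Z0-9_]{4,15}$");
    private static final Pattern INTEGER = Pattern.compile("^-?\\d+$");

    static boolean isZipCode(String s) {
        return ZIP_CODE.matcher(s).matches();
    }

    static boolean isAccount(String s) {
        return ACCOUNT.matcher(s).matches();
    }

    static boolean isInteger(String s) {
        return INTEGER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isZipCode("100080"));  // true
        System.out.println(isZipCode("010000"));  // false: must not start with 0
        System.out.println(isAccount("user_01")); // true
        System.out.println(isAccount("1user"));   // false: must start with a letter
        System.out.println(isAccount("abcd"));    // false: shorter than 5 characters
        System.out.println(isInteger("-42"));     // true
        System.out.println(isInteger("4.2"));     // false
    }
}
```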

Conclusion

The above is a summary and usage instructions for regular expressions. May regular expressions bring you a more pleasant programming experience.

At the end

I am a coder who is being beaten and still trying to move on. If this article is helpful to you, remember to like and follow yo, thanks!
