
Contents

  • Regular expressions
  • Related methods
    • matches(regex)
    • replaceAll(regex, replacement)
    • split(regex)
  • java.util.regex.Pattern and java.util.regex.Matcher
  • Greedy and non-greedy matching
  • Anchors ^ $ \b \B
  • Substitution with regular expressions
    • Replacement practice
    • Regex replacement in an editor
  • How escaping works
  • What some symbol combinations mean

Regular expressions

A regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain substring, to replace matched substrings, or to extract the substrings that satisfy a condition.

Syntax

Regular expression    Matches
k                     k
abc                   abc
[abc]                 [...] is a character set: one of a, b, c
[abc][123]            a1, a2, a3, b1, b2, b3, c1, c2, c3
[a-z]                 a, b, c, d, ... z
[a-zA-Z0-9]           a, A, z, Z, 0, 9, ...
[^a-zA-Z]             ^ excludes a range: any character except English letters
[\u4e00-\u9fa5]       Chinese character range
\d                    Digit: [0-9]
\D                    Non-digit: [^0-9]
\w                    Word character (letter, digit, or underscore): [a-zA-Z_0-9]
\W                    Non-word character: [^a-zA-Z_0-9]
\s                    Whitespace character: space, carriage return, line feed, tab, ...
\S                    Non-whitespace character
.                     Any character (by default, except line terminators)
[abc]?                ? means zero or one occurrence
[abc]?[123]           1, 2, 3, a1, a2, ...
[abc]*                * means zero or more occurrences
[abc]*[123]           1, 2, 3, a1, aaabccba1, ...
[abc]+                + means one or more occurrences
[abc]+[123]           a1, ab2, abcbcac1, ...
[abc]{3}              {n} means exactly n occurrences: aaa, bca, cbc, ...
[abc]{3,5}            3 to 5 occurrences: acc, abca, abcab, ...
[abc]{3,}             3 or more occurrences: abc, abcccccaaaa, ...
|                     Or: matches either the left or the right expression

The quantifiers are *, +, ?, {n}, {n,}, and {n,m}. For example, zo+ matches zo and zooo but not zozo, because + applies only to the preceding character and requires it to appear at least once. To match zozo, group the characters so that the quantifier applies to them as a whole: (zo)+.

When a quantifier follows a character set, it applies to that single set. For example, an expression that matches the positive integers from 1 to 99 is [1-9][0-9]?: [1-9] ensures the first digit is not 0, and [0-9]? means that a digit from 0 to 9 occurs zero times or once.
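
The two rules above can be checked with a short sketch. The class name QuantifierDemo and the helper fullMatch are made up for this illustration; Pattern.matches is the standard library call that tests a whole input against a regex.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true only when the WHOLE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("zo+", "zooo"));        // true: + repeats only the o
        System.out.println(fullMatch("zo+", "zozo"));        // false
        System.out.println(fullMatch("(zo)+", "zozo"));      // true: + repeats the group zo
        System.out.println(fullMatch("[1-9][0-9]?", "7"));   // true
        System.out.println(fullMatch("[1-9][0-9]?", "99"));  // true
        System.out.println(fullMatch("[1-9][0-9]?", "07"));  // false: first digit cannot be 0
        System.out.println(fullMatch("[1-9][0-9]?", "100")); // false: at most two digits
    }
}
```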

Related methods

matches(regex)

Determines whether the entire string matches the regular expression.

An ID card number has either 15 digits (first-generation ID card) or 18 digits. The last character of an 18-digit number is a check code, which can be 0-9 or X.

import java.util.Scanner;

public class Test {
	public static void main(String[] args) {
		System.out.println("Enter ID number:");
		String s = new Scanner(System.in).nextLine();

		/*
		 * Valid inputs: 123456789012345, 123456789012345678,
		 * 12345678901234567x, 12345678901234567X
		 * Pattern: \d{15}|\d{17}[\dxX]
		 * In a Java string literal each "\" must itself be escaped: \ -> \\
		 */
		String regex = "\\d{15}|\\d{17}[\\dxX]";

		if (s.matches(regex)) {
			System.out.println("Well formed");
		} else {
			System.out.println("Formatting error");
		}
	}
}


Example: matching a landline phone number

This is just an exercise; assume that landline numbers come in the formats listed in the code comment below.

import java.util.Scanner;

public class Test {
	public static void main(String[] args) {
		System.out.println("Enter landline number:");
		String s = new Scanner(System.in).nextLine();

		/*
		 * Accepted formats: 123456, 1234567, 12345678, (010)12345678,
		 * (0102)12345678, 010-123456, 0102-1234567
		 * Pattern: (\d{3,4}-|\(\d{3,4}\))?\d{6,8}
		 */
		String regex = "(\\d{3,4}-|\\(\\d{3,4}\\))?\\d{6,8}";

		if (s.matches(regex)) {
			System.out.println("Well formed");
		} else {
			System.out.println("Formatting error");
		}
	}
}

replaceAll(regex, replacement)

Replaces every substring that matches the regular expression with the replacement string.

String regex = "store";
String s = "http://store.store.com";
System.out.println(s.replaceAll(regex, "www"));
System.out.println(s);

Output (the original string is not modified):

http://www.www.com
http://store.store.com

replace and replaceAll both replace every occurrence. The difference is:

replace: the parameters are target and replacement. The target is treated as a literal string, not as a regular expression.

replaceAll: the parameters are regex and replacement. The second parameter plays the same role, but the first is a regular expression, so replaceAll supports regex-based replacement. For example:

String str = "www.google.com";
System.out.print("Match successful return value :");
// (.*)google(.*) matches the whole string, so the whole string is replaced
System.out.println(str.replaceAll("(.*)google(.*)", "baidu"));

Running results:

Match successful return value :baidu

replaceFirst: takes the same parameters as replaceAll, but replaces only the first match.
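
To make the difference concrete, here is a minimal sketch. The class name ReplaceDemo and the helper variants are invented for this example; replace, replaceAll, and replaceFirst are the standard String methods. The input uses a dot on purpose, because a dot is a literal for replace but means "any character" for replaceAll.

```java
class ReplaceDemo {
    // Hypothetical helper: applies the replacement methods to the same input.
    static String[] variants(String s) {
        return new String[] {
            s.replace(".", "-"),        // literal target: only the dots change
            s.replaceAll(".", "-"),     // regex "." matches every character
            s.replaceAll("\\.", "-"),   // escaped dot: only the dots change
            s.replaceFirst("\\.", "-")  // only the first dot changes
        };
    }

    public static void main(String[] args) {
        for (String v : variants("a.b.c")) {
            System.out.println(v);
        }
        // Prints:
        // a-b-c
        // -----
        // a-b-c
        // a-b.c
    }
}
```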

split(regex)

Splits the string around the substrings that match the regular expression.

import java.util.Scanner;

public class Test {
	public static void main(String[] args) {
		System.out.println("Enter a list of keywords, separated by commas, semicolons, or spaces:");
		String s = new Scanner(System.in).nextLine();

		// One or more consecutive commas, semicolons, or spaces form a single separator
		String regex = "[,; ]+";

		String[] a = s.split(regex);
		for (int i = 0; i < a.length; i++) {
			System.out.println(a[i]);
		}
	}
}


java.util.regex.Pattern and java.util.regex.Matcher

Pattern encapsulates a compiled regular expression. Matcher encapsulates a regular expression together with the string to match against.

Creating instances: Pattern p = Pattern.compile(regex); Matcher m = p.matcher(input);

find() searches forward for the next matching substring and returns a boolean indicating whether one was found.

find(int from) resets the matcher and searches forward starting at the specified index.

group() returns the substring that was just matched.

start() and end() return the start index of the substring that was just matched and the index just past its end.

Example: finding runs of three or more digits in a string

        // Requires java.util.Scanner, java.util.regex.Matcher, java.util.regex.Pattern
        System.out.println("Input:");
        String s = new Scanner(System.in).nextLine();

        // Three or more consecutive digits
        String regex = "\\d{3,}";

        Matcher m = Pattern.compile(regex)
                .matcher(s);

        // Keep searching forward until find() returns false
        while (m.find()) {
            String s2 = m.group();
            int start = m.start();
            int end = m.end();
            System.out.println(start + "-" + end + ":" + s2);
        }

The output

Input:
ABCD1234efg56Higk789
4-8:1234
17-20:789
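
The find(int from) variant is not used in the example above, so here is a small sketch of it under the same digit pattern. The class name FindFromDemo and the helper firstRunFrom are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromDemo {
    // Hypothetical helper: first run of 3+ digits found at or after index `from`,
    // or null when there is none. find(int) resets the matcher before searching.
    static String firstRunFrom(String input, int from) {
        Matcher m = Pattern.compile("\\d{3,}").matcher(input);
        return m.find(from) ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "ABCD1234efg56Higk789";
        System.out.println(firstRunFrom(s, 0));  // 1234
        System.out.println(firstRunFrom(s, 5));  // 234 (the search starts inside the first run)
        System.out.println(firstRunFrom(s, 8));  // 789 (56 is too short to match)
    }
}
```

Note that find(int from) throws IndexOutOfBoundsException when the index is negative or greater than the input length, so callers may want to validate it first.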

Greedy and non-greedy matching

The * and + quantifiers are greedy: they match as much text as possible. Adding a ? after them makes the match non-greedy, also called minimal matching.

For example, using <.*>:

        String s = "<H1>Chapter 1 - Introduction to regular expressions</H1>";
        String regex = "<.*>";
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            String s2 = m.group();
            System.out.println(s2);
        }

The result of the match is

<H1>Chapter 1 - Introduction to regular expressions</H1>

With <.*?>, the match result is

<H1>
</H1>

To match only the opening H1 tag, use the expression <\\w+?>. The match result is

<H1>

Placing a ? after the *, +, or ? quantifier converts the expression from a "greedy" expression into a "non-greedy" (minimal-match) expression.

Example. Source string:

aa<div>test1</div>bb<div>test2</div>cc

Regular expression 1:

<div>.*</div>

Match result 1:

<div>test1</div>bb<div>test2</div>

Regular expression 2:

<div>.*?</div>

Match result 2:

<div>test1</div>
<div>test2</div>

From a purely practical point of view, greedy mode matches as much as possible while still letting the whole expression succeed, and non-greedy mode matches as little as possible while still letting the whole expression succeed.
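
The three tag patterns discussed in this section can be compared side by side. This is only a sketch: the class name GreedyDemo, the helper findAll, and the shortened sample string are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "<H1>Chapter 1</H1>";
        System.out.println(findAll("<.*>", s));    // [<H1>Chapter 1</H1>]  greedy
        System.out.println(findAll("<.*?>", s));   // [<H1>, </H1>]         non-greedy
        System.out.println(findAll("<\\w+?>", s)); // [<H1>]                / is not a word character
    }
}
```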

Anchors ^ $ \b \B

An anchor restricts where in the input a match may occur.

The regular expression anchors are:

Character    Description
^            Matches the start of the input string. In multiline mode (the Multiline property of the RegExp object in JavaScript, Pattern.MULTILINE in Java), ^ also matches the position after \n or \r.
$            Matches the end of the input string. In multiline mode, $ also matches the position before \n or \r.
\b           Matches a word boundary, that is, the position between a word and a space.
\B           Matches a position that is not a word boundary.
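
The table describes multiline behavior in terms of a RegExp property; in Java the corresponding switch is the Pattern.MULTILINE flag. The sketch below shows its effect on ^. The class name MultilineDemo, the helper countMatches, and the sample text are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Hypothetical helper: counts how many times the pattern is found in the input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "apple\navocado\nbanana";
        Pattern single = Pattern.compile("^a\\w*");
        Pattern multi = Pattern.compile("^a\\w*", Pattern.MULTILINE);
        System.out.println(countMatches(single, text)); // 1: ^ matches only at the start of the input
        System.out.println(countMatches(multi, text));  // 2: ^ also matches after each line break
    }
}
```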

^

For example, the regular expression ^a uses the ^ anchor, so the string must start with a. Strings that match it include abc, absolute, and other strings that start with a, while back and 123 do not match.

When ^ is followed directly by a literal string, it matches input that begins with that whole string. When it is followed by an expression, it matches input that begins with that expression. Here are some examples.

^123[0-9]*[3]: matches a string that starts with 123, followed by any number of digits from 0 to 9, followed by 3. It can match 123999123, 1235673, etc.

^[123][0-9]*[3]: matches a string that starts with 1, 2, or 3, followed by any number of digits from 0 to 9, followed by 3. It can match 29993, 303, etc.

^(123).*[3]: matches a string that starts with 123, followed by any number of arbitrary characters, followed by 3. It can match 123AABBCC3, 1233, etc.

The difference between ^a and [^a]: the caret has two uses, (1) as an anchor and (2) inside the [^...] metacharacter.

Many beginners confuse the two uses of the caret. One way to keep them apart: the caret has a single special case, inside a character set such as [^...], where a leading caret means "not". Everywhere else, the caret is an anchor.
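
A quick way to see the two meanings of the caret is to run both forms against the same inputs. The class name CaretDemo and the helper found are hypothetical names for this sketch.

```java
import java.util.regex.Pattern;

class CaretDemo {
    // Hypothetical helper: true when the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Caret as an anchor: the input must START with a.
        System.out.println(found("^a", "abc"));   // true
        System.out.println(found("^a", "back"));  // false
        // Caret inside a character set: some character that is NOT a.
        System.out.println(found("[^a]", "back")); // true (b, c, and k qualify)
        System.out.println(found("[^a]", "aaa"));  // false (every character is a)
    }
}
```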

$

In regular expressions, the $ anchor restricts a match to the end position. For example, the regular expression a$ uses the $ anchor, so the string must end with a. Strings that match it include panda, nana, and other strings ending in a, while abc and helicopter do not match.

When $ is preceded by a literal string, it matches input that ends with that whole string. When it is preceded by an expression, it matches input that ends with that expression. For example:

[0-9]+123$: matches a string that has one or more digits followed by 123 at the end, such as 002244123.

[0-9]+[123]$: matches a string that has one or more digits and ends with 1, 2, or 3, such as 002244123.

^[456][123]$: matches a string that starts with 4, 5, or 6 and ends with 1, 2, or 3, such as 43.

\b

The \b anchor matches a word boundary, that is, the position between a word and a space. For example, the regular expression er\\b matches the er in "order to" but not the er in "verb".

\b also matches at the start and end of the target string. The regular expression \\ba[a-z]{7}\\b matches any 8-letter word that starts with the letter "a". In this way \b marks both the beginning and the end of a word.

It is very common to use two \b anchors to match a whole word. When you see two \b anchors in a regular expression, you can expect that it is matching a word.

The position of \b in the expression matters. At the beginning of the expression, it looks for the match at the start of a word. At the end of the expression, it looks for the match at the end of a word. For example, the regular expression ter\\b matches the ter in the word Chapter, because it appears before a word boundary.

\B

The \B anchor matches at a position that is not a word boundary. For example, the regular expression er\\B matches the er in verb but not the er in order.
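
The \b and \B examples above translate directly into code. The class name BoundaryDemo, the helper firstMatch, and the sample sentence "an absolute win" are assumptions made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: returns the first match of the regex in the input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("er\\b", "order to")); // er   (followed by a space)
        System.out.println(firstMatch("er\\b", "verb"));     // null (followed by b)
        System.out.println(firstMatch("er\\B", "verb"));     // er   (inside the word)
        System.out.println(firstMatch("er\\B", "order"));    // null (er ends the word)
        // An 8-letter word starting with a, delimited by two \b anchors.
        System.out.println(firstMatch("\\ba[a-z]{7}\\b", "an absolute win")); // absolute
    }
}
```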

Substitution with regular expressions

Replacement practice

        String str = "2013hello04world20";
        // Replace each digit with *
        System.out.println(str.replaceAll("\\d", "*"));
        // Replace each run of consecutive digits with *
        System.out.println(str.replaceAll("\\d+", "*"));
        // Replace the last four digits of a phone number with ****
        str = "15200001111";
        System.out.println(str.replaceAll("\\d{4}$", "****"));
        // Replace the middle four digits of a phone number with ****
        System.out.println(str.replaceAll("(\\d{3})(\\d{4})(\\d{3})", "$1****$3"));
        // Wrap each link address in an <a> tag to turn it into a hyperlink
        str = "http://www.baidu.com,http://www.google.com";
        System.out.println(str.replaceAll("(http://www\\..*?\\.com)", "<a href='$1'>$1</a>"));

The results

****hello**world**
*hello*world*
1520000****
152****1111
<a href='http://www.baidu.com'>http://www.baidu.com</a>,<a href='http://www.google.com'>http://www.google.com</a>

Note that $1, $2, and so on each refer to the text captured by one pair of parentheses, numbered in the order in which the opening parentheses appear.
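
Group numbering is easier to see when the groups are nested. The following sketch uses an invented date pattern and the hypothetical names GroupDemo, maskMiddle, and parts; Matcher.group(int) and the $n replacement syntax are standard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: keeps groups 1 and 2, masks the four digits between them.
    static String maskMiddle(String phone) {
        return phone.replaceAll("(\\d{3})\\d{4}(\\d{4})", "$1****$2");
    }

    // Hypothetical helper: returns groups 0..4 of a yyyy-MM-dd string.
    // Groups are numbered by the position of their opening parenthesis.
    static String[] parts(String date) {
        Matcher m = Pattern.compile("((\\d{4})-(\\d{2}))-(\\d{2})").matcher(date);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a yyyy-MM-dd date: " + date);
        }
        return new String[] { m.group(0), m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        System.out.println(maskMiddle("15200001111")); // 152****1111
        for (String p : parts("2021-08-24")) {
            System.out.println(p);
        }
        // Prints 2021-08-24, 2021-08, 2021, 08, 24 (one per line)
    }
}
```

Group 0 is always the whole match, which is why the outer pair of parentheses is group 1 and the year inside it is group 2.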

Regex replacement in an editor

Suppose we are editing some text in EditPlus and need to replace "programmer" with "engineer" in "I am a programmer". A direct match is enough for this. More complex cases can use a regular expression.



Press Ctrl+H to open the Replace dialog.



Here $1 and $2 stand for the text matched by the first and second pair of parentheses, respectively. Click Replace All.



Replacing from the specified content to the end of the line

Whenever abc is encountered, change the content after it, up to the end of the line, to def.





Replacing numbers

Enclose each run of consecutive digits in brackets.





Deleting specified characters at the end of each line

Delete 345 at the end of each line.





Replacing multi-line content that contains half-width parentheses



Use the following regular expression:

<SCRIPT LANGUAGE=JavaScript1.1>\n<!--\nhtmlAdWH.'93163607','728','90'.;\n//-->\n</SCRIPT>\n

Because "(" and ")" are used as the markers of a subexpression, each literal "(" and ")" in the search text is replaced here with the any-character marker, that is, the half-width period ".".

How escaping works

Requirement: format a monetary amount. Because the amount format may change, the server returns the format string. The app replaces "{0}" in the format string returned by the server with the amount. For example, if the server returns ¥{0}, the amount is displayed as ¥123. If the server returns ${0}, the amount is displayed as $123.

        String unformattedMoney = "12.00";
        String s = "${0}";
        // Match the literal text {0}: both braces are escaped
        String regex = "\\{0\\}";
        s = s.replaceAll(regex, unformattedMoney);
        System.out.println(s);

The execution result

$12.00

Characters like { and } have special meaning in regular expressions. The first table shows [abc]{3}, where the curly braces indicate a fixed count. That expression matches aaa, bca, cbc, and so on.

Here, to match a literal brace, the regular expression needs \{. But \ is also the escape character in Java string literals, so \{ has to be written as \\{ in the source code: the \ itself must be escaped first.

Special characters are characters with special meanings, for example the * in roo*t. This regular expression matches root, roooot, and roooooot, because * means that the preceding character appears zero or more times. To find a literal * in a string, escape the * by preceding it with a \. For example, ro\\**ot matches ro****ot, root, etc.
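
When the text to match is entirely literal, as with the {0} placeholder, the standard library can do the escaping. The sketch below is one possible approach, not the method used in the example above: Pattern.quote escapes the regex side and Matcher.quoteReplacement escapes $ and \ on the replacement side. The class name EscapeDemo and the helper fill are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: fills every {0} placeholder in the template with the amount.
    static String fill(String template, String amount) {
        return template.replaceAll(Pattern.quote("{0}"), Matcher.quoteReplacement(amount));
    }

    public static void main(String[] args) {
        System.out.println(fill("${0}", "12.00"));      // $12.00
        System.out.println(fill("Total: {0}", "$5"));   // Total: $5 (the $ in the amount stays literal)
        // Manual escaping of * as described above: \* followed by the * quantifier.
        System.out.println("ro****ot".matches("ro\\**ot")); // true
        System.out.println("root".matches("ro\\**ot"));     // true (zero literal stars)
    }
}
```

For a purely literal placeholder, String.replace("{0}", amount) gives the same result without involving regular expressions at all.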

Many metacharacters require special treatment when you want to match them literally. To match these special characters, they must first be "escaped", that is, preceded by the backslash character \. The following table lists the special characters in regular expressions.

Special character    Description
$                    Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( )                  Marks the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*                    Matches the preceding subexpression zero or more times. To match the * character, use \*.
+                    Matches the preceding subexpression one or more times. To match the + character, use \+.
.                    Matches any single character except the newline character \n. To match ., use \.
[                    Marks the beginning of a bracket expression. To match [, use \[.
?                    Matches the preceding subexpression zero times or once, or marks a quantifier as non-greedy. To match the ? character, use \?.
\                    Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a newline character. The sequence '\\' matches "\", and '\(' matches "(".
^                    Matches the start of the input string, unless it is used inside a bracket expression, where it means that the listed characters are not accepted. To match the ^ character itself, use \^.
{                    Marks the beginning of a quantifier expression. To match {, use \{.
|                    Indicates a choice between two items. To match |, use \|.

What some symbol combinations mean

(?:pattern): a non-capturing group. It matches the content after the colon but does not capture the result or store it for later use.

Suppose you want to match the two words "program" and "project". The regular expression can be written as program|project, or as pro(gram|ject). But parentheses () capture the content they match and store a copy (see the replacement examples above), so "gram" or "ject" would be stored. That stored content is of no use here, so writing the expression as pro(?:gram|ject) is, first, more concise and, second, avoids storing meaningless content.

If you need to find a word that is repeated back to back, such as lost lost, you can use the regex \\b(\\w+)\\b\\s+\\1\\b. Here \b matches the start of the word. (\w+) matches the word and stores a copy of it so that the later backreference can refer to it. The next \b matches the end of the word, \s+ matches one or more whitespace characters, and \1 matches the stored word again. If ?: is added to the group, the expression no longer works: (?:\w+) matches the word but does not store a copy, so the subsequent \1 has no captured word to refer to.
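
Both ideas, the backreference and the non-capturing group, can be tried out in a few lines. This is a sketch with made-up names (BackrefDemo, firstRepeated, groupCount) and made-up sample sentences; Matcher.groupCount() is the standard way to ask how many capturing groups a pattern has.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // A word, whitespace, then the same word again (\1 refers back to group 1).
    static final Pattern REPEATED = Pattern.compile("\\b(\\w+)\\b\\s+\\1\\b");

    // Hypothetical helper: the first word that is immediately repeated, or null.
    static String firstRepeated(String text) {
        Matcher m = REPEATED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical helper: number of capturing groups in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(firstRepeated("I lost lost my keys")); // lost
        System.out.println(firstRepeated("no repeats here"));     // null
        System.out.println(groupCount("pro(gram|ject)"));         // 1: the alternative is stored
        System.out.println(groupCount("pro(?:gram|ject)"));       // 0: nothing is stored
        System.out.println("project".matches("pro(?:gram|ject)")); // true: matching still works
    }
}
```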

Refer to www.runoob.com/regexp/rege…