This is the 21st day of my participation in the First Challenge 2022

Regular expression syntax

\

Marks the next character as one of the following:
- A special character
- A literal character. There are 12 such characters: ^, $, (, ), *, +, ?, ., [, \, {, |
- A backreference
- An octal escape
Example:
- n – matches the letter n, while \n matches a newline character
- \\ – matches \
- \( – matches (
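In Java the pattern is written as a string literal, so every regex backslash has to be doubled: the regex \+ becomes "\\+". The sketch below (class and method names are illustrative only) contrasts an escaped and an unescaped metacharacter, and uses Pattern.quote, which makes a whole string literal:

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Regex a\+b: the plus is escaped, so it matches a literal '+'
    static boolean literalPlus(String s) {
        return Pattern.matches("a\\+b", s);
    }

    // Regex a+b: the plus is a quantifier (one or more 'a', then 'b')
    static boolean quantifierPlus(String s) {
        return Pattern.matches("a+b", s);
    }

    // Regex \(\\\): a literal '(', a literal backslash and a literal ')'
    static boolean parenBackslashParen(String s) {
        return Pattern.matches("\\(\\\\\\)", s);
    }

    // Pattern.quote makes every character of the given text literal
    static boolean quotedEquals(String literal, String s) {
        return Pattern.matches(Pattern.quote(literal), s);
    }

    public static void main(String[] args) {
        System.out.println(literalPlus("a+b"));               // true
        System.out.println(quantifierPlus("aaab"));           // true
        System.out.println(quantifierPlus("a+b"));            // false
        System.out.println(parenBackslashParen("(\\)"));      // true
        System.out.println(quotedEquals("1+1=2?", "1+1=2?")); // true
    }
}
```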

^

Matches the start of the input string
If the Multiline property of the RegExp object is set, ^ also matches the position after \n or \r

$

Matches the end of the input string
If the Multiline property of the RegExp object is set, $ also matches the position before \n or \r
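The Multiline property mentioned above belongs to the JScript/VBScript RegExp object; in Java the same behaviour is switched on with the Pattern.MULTILINE flag. A small sketch, with a hypothetical count helper, that counts how often the ^ and $ anchors fire with and without the flag:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Counts how many matches Matcher.find() reports in the input
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        // Default mode: ^ and $ only match at the start and end of the whole input
        System.out.println(count(Pattern.compile("^\\w+"), text)); // 1
        System.out.println(count(Pattern.compile("\\w+$"), text)); // 1
        // MULTILINE: ^ also matches after each line terminator, $ before each one
        System.out.println(count(Pattern.compile("^\\w+", Pattern.MULTILINE), text)); // 3
        System.out.println(count(Pattern.compile("\\w+$", Pattern.MULTILINE), text)); // 3
    }
}
```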

*

Matches the preceding subexpression zero or more times
Equivalent to {0,}
Example:
- zo* – matches z, zo, or zoo

+

Matches the preceding subexpression one or more times
Equivalent to {1,}
Example:
- zo+ – matches zo or zoo, but not z

?

Matches the preceding subexpression zero times or once
Equivalent to {0,1}
Example:
- do(es)? – do or does
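A quick way to check these three quantifiers in Java is Pattern.matches, which succeeds only when the whole input matches. The helper name whole is made up for this sketch:

```java
import java.util.regex.Pattern;

class BasicQuantifierDemo {
    // Pattern.matches succeeds only if the entire input matches the regex
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // *: zero or more o's after z
        System.out.println(whole("zo*", "z"));          // true
        System.out.println(whole("zo*", "zoo"));        // true
        // +: at least one o after z
        System.out.println(whole("zo+", "z"));          // false
        System.out.println(whole("zo+", "zo"));         // true
        // ?: the group (es) may appear once or not at all
        System.out.println(whole("do(es)?", "do"));     // true
        System.out.println(whole("do(es)?", "does"));   // true
        System.out.println(whole("do(es)?", "doeses")); // false
    }
}
```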

{n}

Matches exactly n times. n is a non-negative integer
Example:
- o{2} – does not match the o in Bob, but matches the two o's in food

{n,}

Matches at least n times. n is a non-negative integer
o{0,} is equivalent to o*
o{1,} is equivalent to o+
Example:
- o{2,} – does not match the o in Bob, but matches all the o's in looooog

{n,m}

Matches at least n and at most m times. n and m are non-negative integers, and n <= m. There must be no space between the comma and the numbers
o{0,1} is equivalent to o?
Example:
- o{1,3} – matches the first three o's in loooooog
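The bounded quantifiers can be checked the same way. This sketch uses Matcher.find to return the first matching substring; firstMatch is a hypothetical helper, and loooog (four o's) is just a sample word:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedQuantifierDemo {
    // Returns the first substring matched by regex, or null if nothing matches
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("o{2}", "Bob"));      // null
        System.out.println(firstMatch("o{2}", "food"));     // oo
        System.out.println(firstMatch("o{2,}", "loooog"));  // oooo
        System.out.println(firstMatch("o{1,3}", "loooog")); // ooo
    }
}
```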

?

Non-greedy quantifier: when this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy
- A non-greedy match consumes as few characters as possible
- By default, quantifiers are greedy and consume as many characters as possible
Example:
- o+? – matches a single o in loooooog
- o+ – matches all the o's in loooooog
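The difference between greedy and non-greedy matching is easiest to see by printing what each pattern actually matched. A minimal sketch with the same kind of hypothetical firstMatch helper:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the first substring matched by regex, or null if nothing matches
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("o+", "loooog"));    // oooo (greedy)
        System.out.println(firstMatch("o+?", "loooog"));   // o (non-greedy)
        System.out.println(firstMatch("<.+>", "<a><b>"));  // <a><b> (greedy)
        System.out.println(firstMatch("<.+?>", "<a><b>")); // <a> (non-greedy)
    }
}
```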

.

Matches any single character except \r and \n
To match any character including \r and \n, use a pattern such as (.|\r|\n)
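In Java the dot likewise excludes line terminators by default (Java's list of terminators also includes \u0085, \u2028 and \u2029). The sketch below compares the default behaviour, the Pattern.DOTALL flag, and two flag-free workarounds; the method names are illustrative:

```java
import java.util.regex.Pattern;

class DotDemo {
    // Default mode: . does not match a line terminator
    static boolean defaultDot(String s) {
        return Pattern.matches("a.b", s);
    }

    // DOTALL mode: . matches any character, including line terminators
    static boolean dotAll(String s) {
        return Pattern.compile("a.b", Pattern.DOTALL).matcher(s).matches();
    }

    // Without flags: spell out the line terminators in an alternation
    static boolean alternation(String s) {
        return Pattern.matches("a(.|\\r|\\n)b", s);
    }

    // Or use a character class that contains every character
    static boolean anyCharClass(String s) {
        return Pattern.matches("a[\\s\\S]b", s);
    }

    public static void main(String[] args) {
        String input = "a\nb";
        System.out.println(defaultDot(input));   // false
        System.out.println(dotAll(input));       // true
        System.out.println(alternation(input));  // true
        System.out.println(anyCharClass(input)); // true
    }
}
```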

(pattern)

Matches pattern and captures the matched substring, which can then be used in a backreference
The captured substring can be retrieved from the resulting Matches collection
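In Java, captured substrings are read with Matcher.group(n): group 0 is the whole match, and the other groups are numbered by their opening parenthesis. A sketch assuming a hypothetical yyyy-MM input format:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureDemo {
    // Two capturing groups: group 1 is the year, group 2 the month (assumed yyyy-MM format)
    static final Pattern YEAR_MONTH = Pattern.compile("(\\d{4})-(\\d{2})");

    // Returns {year, month} for the first match, or null if there is none
    static String[] yearAndMonth(String input) {
        Matcher m = YEAR_MONTH.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = yearAndMonth("Released 2022-02");
        System.out.println(parts[0] + " / " + parts[1]); // 2022 / 02
    }
}
```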

(?:pattern)

Matches pattern but does not capture the matched substring
This is a non-capturing match: the substring is not stored for later backreferences
It is useful when combining parts of a pattern with the or character |
Example:
- industr(?:y|ies) is equivalent to industry|industries, but more concise
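One way to confirm that (?:...) does not capture is Matcher.groupCount(), which reports how many capturing groups a pattern defines. A small sketch (the helper name is illustrative):

```java
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Number of capturing groups defined by the pattern
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("industr(y|ies)"));                      // 1
        System.out.println(groupCount("industr(?:y|ies)"));                    // 0
        System.out.println(Pattern.matches("industr(?:y|ies)", "industries")); // true
    }
}
```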

(?=pattern)

Positive lookahead: matches the search string at any position where a string matching pattern begins
This is a non-capturing match: the matched text is not captured for later use
Lookahead does not consume characters. After a match occurs, the search for the next match starts immediately after the previous match, not after the characters examined by the lookahead
Example:
- Windows(?=95|98|NT|2000) – matches the Windows in Windows2000, but not the Windows in Windows10

(?!pattern)

Negative lookahead: matches the search string at any position where a string that does not match pattern begins
This is a non-capturing match: the matched text is not captured for later use
Lookahead does not consume characters. After a match occurs, the search for the next match starts immediately after the previous match, not after the characters examined by the lookahead
Example:
- Windows(?!95|98|NT|2000) – matches the Windows in Windows10, but not the Windows in Windows2000
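A sketch of both lookaheads in Java, using the Windows examples above. It also shows that the lookahead text is not part of the match: the match ends right after Windows. Class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    static final Pattern POSITIVE = Pattern.compile("Windows(?=95|98|NT|2000)");
    static final Pattern NEGATIVE = Pattern.compile("Windows(?!95|98|NT|2000)");

    static boolean positiveFinds(String s) {
        return POSITIVE.matcher(s).find();
    }

    static boolean negativeFinds(String s) {
        return NEGATIVE.matcher(s).find();
    }

    // End index of the positive-lookahead match; the lookahead text is not included
    static int matchEnd(String s) {
        Matcher m = POSITIVE.matcher(s);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(positiveFinds("Windows2000")); // true
        System.out.println(positiveFinds("Windows10"));   // false
        System.out.println(negativeFinds("Windows10"));   // true
        System.out.println(negativeFinds("Windows2000")); // false
        System.out.println(matchEnd("Windows2000"));      // 7, just after "Windows"
    }
}
```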

(?<=pattern)

Positive lookbehind: like positive lookahead, but in the opposite direction; matches at any position immediately preceded by a string that matches pattern
Example:
- (?<=95|98|NT|2000)Windows – matches the Windows in 2000Windows, but not the Windows in 10Windows

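(?<!pattern)

Negative lookbehind: like negative lookahead, but in the opposite direction; matches at any position not immediately preceded by a string that matches pattern
Example:
- (?<!95|98|NT|2000)Windows – matches the Windows in 10Windows, but not the Windows in 2000Windows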
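The lookbehind examples can be checked the same way. Java requires a lookbehind to have a bounded maximum length; a fixed list of alternatives such as 95|98|NT|2000 satisfies that. Names in this sketch are illustrative:

```java
import java.util.regex.Pattern;

class LookbehindDemo {
    static final Pattern POSITIVE = Pattern.compile("(?<=95|98|NT|2000)Windows");
    static final Pattern NEGATIVE = Pattern.compile("(?<!95|98|NT|2000)Windows");

    static boolean positiveFinds(String s) {
        return POSITIVE.matcher(s).find();
    }

    static boolean negativeFinds(String s) {
        return NEGATIVE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(positiveFinds("2000Windows")); // true
        System.out.println(positiveFinds("10Windows"));   // false
        System.out.println(negativeFinds("10Windows"));   // true
        System.out.println(negativeFinds("2000Windows")); // false
    }
}
```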

x|y

Matches either x or y
Without parentheses, the alternation spans the entire regular expression; inside parentheses, it applies only to the group
Example:
- z|food – matches z or food
- (z|f)oo – matches zoo or foo
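The scope rule is easy to verify with Pattern.matches, which requires the whole input to match. A minimal sketch (the helper name is made up):

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Whole-input match, so the scope of | is visible
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unparenthesized: either "z" or "food" as the entire pattern
        System.out.println(whole("z|food", "z"));    // true
        System.out.println(whole("z|food", "food")); // true
        System.out.println(whole("z|food", "zood")); // false
        // Parenthesized: the choice is limited to the first character
        System.out.println(whole("(z|f)oo", "zoo")); // true
        System.out.println(whole("(z|f)oo", "foo")); // true
    }
}
```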

[xyz]

Character set. Matches any one of the enclosed characters
Inside the brackets, only the backslash \ keeps its special meaning as an escape character. Other symbols such as *, +, ( and ) are ordinary characters
- The caret ^ denotes a negated set if it appears first; anywhere else it is an ordinary character
- The hyphen - denotes a character range when it appears between two characters; at the beginning or end of the set it is an ordinary character
- The closing bracket ] is also an ordinary character if it appears first
Example:
- [abc] – matches the a in plain

[^xyz]

Negated character set. Matches any character that is not listed
Example:
- [^abc] – matches p, l, i and n in plain

[a-z]

Character range. Matches any character in the specified range
Example:
- [a-z] – matches any lowercase letter from a to z

[^a-z]

Negated character range. Matches any character that is not in the specified range
Example:
- [^a-z] – Matches any character not in the range from a to z
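The sketch below collects every match of a bracket expression so the effect of negation, ranges and literal symbols inside brackets is visible. allMatches is a hypothetical helper; the last call relies on a trailing hyphen being treated as a literal:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Concatenates every substring of input that matches regex
    static String allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(allMatches("[abc]", "plain"));        // a
        System.out.println(allMatches("[^abc]", "plain"));       // plin
        System.out.println(allMatches("[a-z]", "Hello World"));  // elloorld
        System.out.println(allMatches("[^a-z]", "Hello World")); // H W
        System.out.println(allMatches("[*+()]", "(1+2)*3"));     // (+)*
        System.out.println(allMatches("[a-]", "a-b"));           // a-
    }
}
```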

[:name:]

Includes the characters of a named character class, such as [:alpha:] or [:digit:], in the expression. Can only be used inside a bracket expression
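Java's java.util.regex does not use the [:name:] spelling; its POSIX character classes (US-ASCII only) are written as \p{Name}, for example \p{Alpha}, \p{Digit} or \p{Upper}, and can be combined inside square brackets. A short sketch with illustrative method names:

```java
import java.util.regex.Pattern;

class PosixClassDemo {
    // \p{Alpha}: ASCII letters only
    static boolean isLetters(String s) {
        return Pattern.matches("\\p{Alpha}+", s);
    }

    // Two POSIX classes combined in one bracket expression
    static boolean isUpperOrDigits(String s) {
        return Pattern.matches("[\\p{Upper}\\p{Digit}]+", s);
    }

    public static void main(String[] args) {
        System.out.println(isLetters("Regex"));        // true
        System.out.println(isLetters("Regex1"));       // false
        System.out.println(isUpperOrDigits("ABC123")); // true
        System.out.println(isUpperOrDigits("abc123")); // false
    }
}
```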

[=elt=]

Includes the characters that are equivalent to elt in the current locale (an equivalence class). Can only be used inside a bracket expression

[.elt.]

Includes the collating element elt in the expression. Can only be used inside a bracket expression
This syntax is needed when a collating element consists of more than one character. For example, in the traditional 29-letter Spanish alphabet, ch is a single letter that sorts after c, giving the order cinco, credo, chispa

\b

Matches a word boundary, that is, the position between a word character and a non-word character such as a space
Example:
- er\b – matches the er in never, but not the er in verb

\B

Matches non-word boundaries
Example:
- er\B – matches the er in verb, but not the er in never
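A minimal Java check of the two boundary assertions, using the never/verb example (the class and method names are illustrative):

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    static final Pattern ER_AT_BOUNDARY = Pattern.compile("er\\b");
    static final Pattern ER_NOT_AT_BOUNDARY = Pattern.compile("er\\B");

    static boolean erAtBoundary(String word) {
        return ER_AT_BOUNDARY.matcher(word).find();
    }

    static boolean erNotAtBoundary(String word) {
        return ER_NOT_AT_BOUNDARY.matcher(word).find();
    }

    public static void main(String[] args) {
        System.out.println(erAtBoundary("never"));    // true
        System.out.println(erAtBoundary("verb"));     // false
        System.out.println(erNotAtBoundary("verb"));  // true
        System.out.println(erNotAtBoundary("never")); // false
    }
}
```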

\cx

Matches the control character specified by x
x must be one of A-Z or a-z; otherwise, c is treated as a literal c character
The value of the control character is the lowest 5 bits of the value of x (that is, the remainder when it is divided by 32)
Example:
- \cM – matches Control-M, i.e. a carriage return
- \ca – matches \u0001
- \cb – matches \u0002
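Java also supports \cx, but its implementation is not guaranteed to follow the lowercase-letter rule described above, so this sketch sticks to uppercase letters whose control characters are unambiguous. The helper name is illustrative:

```java
import java.util.regex.Pattern;

class ControlCharDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\cM", "\r")); // true: Control-M is carriage return
        System.out.println(matches("\\cJ", "\n")); // true: Control-J is line feed
        System.out.println(matches("\\cI", "\t")); // true: Control-I is tab
    }
}
```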

\d

Matches a digit character
Equivalent to [0-9]

\D

Matches a non-digit character
Equivalent to [^0-9]

\f

Matches a form feed character
Equivalent to \x0c and \cL

\n

Matches a newline character
Equivalent to \x0a and \cJ

\r

Matches a carriage return
Equivalent to \x0d and \cM

\s

Matches any whitespace character, including spaces, tabs, form feeds, etc
Equivalent to [ \f\n\r\t\v]

\S

Matches any non-whitespace character
Equivalent to [^ \f\n\r\t\v]

\t

Matches a tab character
Equivalent to \x09 and \cI

\v

Matches a vertical tab character
Equivalent to \x0b and \cK

\w

Matches any word character, including the underscore
Equivalent to [A-Za-z0-9_]

\W

Matches any non-word character
Equivalent to [^A-Za-z0-9_]
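A few of these shorthand classes in Java, applied through String.replaceAll, String.split and Pattern.matches. The sample values and method names are chosen for illustration:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ShorthandClassDemo {
    // \D matches every non-digit, so removing it keeps only the digits
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \d matches every digit
    static String withoutDigits(String s) {
        return s.replaceAll("\\d", "");
    }

    // \w covers letters, digits and the underscore, but not '-'
    static boolean isWord(String s) {
        return Pattern.matches("\\w+", s);
    }

    // \s+ splits on any run of spaces, tabs and newlines
    static String[] splitOnWhitespace(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("a1b22c333"));     // 122333
        System.out.println(withoutDigits("a1b22c333"));  // abc
        System.out.println(isWord("user_42"));           // true
        System.out.println(isWord("user-42"));           // false
        System.out.println(Arrays.toString(splitOnWhitespace("a \t b\nc"))); // [a, b, c]
    }
}
```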

\xnn

Hexadecimal escape sequence. Matches the character whose code is given by exactly two hexadecimal digits nn
This allows ASCII codes to be used in regular expressions
Because the escape is exactly two digits long, \x041 is equivalent to \x04 followed by the character 1
Example:
- \x41 – matches A
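A Java sketch of the two-digit rule, including the \x041 case. Java additionally accepts a braced form such as \x{41}, which allows more digits. The helper name is illustrative:

```java
import java.util.regex.Pattern;

class HexEscapeDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\x41", "A"));                // true: 0x41 is 'A'
        // \x041 is \x04 followed by the character 1, not a three-digit code
        System.out.println(matches("\\x041", (char) 0x04 + "1")); // true
        System.out.println(matches("\\x041", "A"));               // false
        // Java's braced form
        System.out.println(matches("\\x{41}", "A"));              // true
    }
}
```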

\num

Matches the substring captured by the num-th parenthesized subexpression of the regular expression (a backreference)
num is a positive decimal integer starting from 1; depending on the engine, the maximum may be 9, 31, 99, or unlimited
Example:
- (.)\1 – matches two consecutive identical characters
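Backreferences work the same way in Java. In this sketch (names illustrative), (.)\1 finds the first doubled character, and $1 refers to the same group inside a replacement string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    static final Pattern DOUBLED = Pattern.compile("(.)\\1");

    // First pair of identical adjacent characters, or null
    static String firstDoubled(String input) {
        Matcher m = DOUBLED.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Collapses each doubled character into one; $1 is group 1 in the replacement
    static String collapseDoubles(String input) {
        return DOUBLED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(firstDoubled("bookkeeper")); // oo
        System.out.println(firstDoubled("abc"));        // null
        System.out.println(collapseDoubles("aabbcc"));  // abc
    }
}
```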

\n

Identifies an octal escape value or a backreference:
- If \n is preceded by at least n captured subexpressions, n is a backreference
- Otherwise, if n is an octal digit (0-7), n is an octal escape value

\nm

Identifies an octal escape value or a backreference:
- If \nm is preceded by at least nm captured subexpressions, nm is a backreference
- If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m
- Otherwise, if both n and m are octal digits (0-7), \nm matches the octal escape value nm

\nml

If n is an octal digit (0-3) and m and l are both octal digits (0-7), \nml matches the octal escape value nml
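Java's java.util.regex does not apply the \n, \nm and \nml rules above: an octal escape there must start with 0 (\0n, \0nn or \0mnn), and \1 through \9 are always read as backreferences. A short sketch with an illustrative helper:

```java
import java.util.regex.Pattern;

class OctalEscapeDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\0101", "A"));  // true: octal 101 = 65 = 'A'
        System.out.println(matches("\\012", "\n"));  // true: octal 12 = 10 = line feed
        // Without the leading 0, \1 is a backreference to group 1
        System.out.println(matches("(a)\\1", "aa")); // true
    }
}
```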

\un

Unicode escape sequence
n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol ©
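In a Java string literal the backslash is doubled, so the regex engine receives the six characters of the escape and decodes it itself. A minimal check with an illustrative helper:

```java
import java.util.regex.Pattern;

class UnicodeEscapeDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The regex escape below is decoded by java.util.regex, not by the compiler
        System.out.println(matches("\\u00A9", "\u00A9")); // true: the copyright sign
        System.out.println(matches("\\u0041", "A"));      // true
    }
}
```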

\p{P}

Used for Unicode property matching in regular expressions:
- The lowercase p stands for property, i.e. a Unicode property
- The P in braces stands for punctuation, one of the seven top-level character categories of the Unicode character set. The other six are:
  - L: letters
  - M: marks, which usually do not appear on their own
  - Z: separators, such as spaces and the line and paragraph separators
  - S: symbols, such as mathematical symbols and currency symbols
  - N: numbers, such as Arabic numerals and Roman numerals
  - C: other characters
Note: older JavaScript engines do not support this syntax; it was added in ES2018 and requires the u flag
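java.util.regex supports these category escapes, including the single-letter forms listed above. A sketch that strips punctuation and tests a few other categories (method names are illustrative):

```java
import java.util.regex.Pattern;

class UnicodePropertyDemo {
    // Removes every character in the Unicode punctuation category P
    static String stripPunctuation(String s) {
        return s.replaceAll("\\p{P}", "");
    }

    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("Hello, world!")); // Hello world
        System.out.println(matches("\\p{L}+", "caf\u00E9"));   // true: the accented e is a letter
        System.out.println(matches("\\p{N}+", "123"));         // true
        System.out.println(matches("\\p{S}", "$"));            // true: currency symbol
    }
}
```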

\< \>

Matches the beginning (\<) and the end (\>) of a word
Example:
- \<the\> – matches the word the in "for the wise", but not the letters the inside "otherwise"
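Java has no \< or \> anchors: a backslash before a non-alphabetic character such as < simply makes it literal. The usual Java substitute is \b on both sides of the word, as in this sketch (names are illustrative):

```java
import java.util.regex.Pattern;

class WholeWordDemo {
    // \b on both sides plays the role of \< and \>
    static final Pattern WORD_THE = Pattern.compile("\\bthe\\b");

    static boolean containsWordThe(String s) {
        return WORD_THE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWordThe("for the wise")); // true
        System.out.println(containsWordThe("otherwise"));    // false
        // In Java, the escaped angle brackets are just literal '<' and '>'
        System.out.println(Pattern.matches("\\<the\\>", "<the>")); // true
    }
}
```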

( )

Defines the expression between ( and ) as a group and saves the characters it matches to a temporary area. Traditionally, up to nine such areas are available, referenced with \1 through \9

|

Performs a logical OR on two matching conditions
Example:
- (him|her) – matches "it belongs to him" and "it belongs to her", but not "it belongs to them"

Regular expression usage scenarios

String substitution

Date format conversion example
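A minimal Java sketch of such a conversion, assuming dates written as yyyy-MM-dd that should become dd/MM/yyyy (both formats and the method name are assumptions). Capturing groups split the date, and $1, $2, $3 reorder the parts in the replacement:

```java
class DateFormatDemo {
    // Assumed input format yyyy-MM-dd; $1, $2, $3 refer to the captured year, month, day
    static String toDayMonthYear(String text) {
        return text.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(toDayMonthYear("Due 2022-02-21, paid 2022-03-01"));
        // Due 21/02/2022, paid 01/03/2022
    }
}
```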

Matcher

The Matcher class provides four methods to replace a matching string with a specified string:
- replaceAll()
- replaceFirst()
- appendReplacement()
- appendTail()
This section focuses on the appendReplacement() and appendTail() methods
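For comparison, a short sketch of the two simpler methods on the string fatcatfatcatfat; the class and method names are illustrative:

```java
import java.util.regex.Pattern;

class ReplaceDemo {
    static final Pattern CAT = Pattern.compile("cat");

    // Replaces every match
    static String replaceAll(String s) {
        return CAT.matcher(s).replaceAll("dog");
    }

    // Replaces only the first match
    static String replaceFirst(String s) {
        return CAT.matcher(s).replaceFirst("dog");
    }

    public static void main(String[] args) {
        System.out.println(replaceAll("fatcatfatcatfat"));   // fatdogfatdogfat
        System.out.println(replaceFirst("fatcatfatcatfat")); // fatdogfatcatfat
    }
}
```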

appendReplacement()

appendReplacement(StringBuffer sb, String replacement):
- Replaces the current matching substring with the specified string
- Appends the replacement, together with the text between the previous match and the current one, to the StringBuffer object

appendTail()

appendTail(StringBuffer sb):
- Appends the characters remaining after the last match to the StringBuffer object

Example

Given the string fatcatfatcatfat and the regular expression pattern cat:
- After the first match, appendReplacement(sb, "dog") is called and the StringBuffer contains fatdog. That is, the cat in fatcat is replaced with dog and appended to sb together with the text before the match
- After the second match, appendReplacement(sb, "dog") is called again, and sb becomes fatdogfatdog
- Finally, appendTail(sb) is called, and sb becomes fatdogfatdogfat
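The walkthrough above corresponds to the usual find/appendReplacement/appendTail loop. A minimal sketch (the class and method names are illustrative; since Java 9 the same methods also accept a StringBuilder):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    static String replaceCats(String input) {
        Matcher m = Pattern.compile("cat").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Appends the text since the previous match, then "dog" in place of "cat"
            m.appendReplacement(sb, "dog");
            System.out.println("after match: " + sb);
        }
        // Appends whatever follows the last match
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceCats("fatcatfatcatfat"));
        // after match: fatdog
        // after match: fatdogfatdog
        // fatdogfatdogfat
    }
}
```

Note that $ and \ have special meaning in the replacement string; Matcher.quoteReplacement can be used to escape a replacement that should be inserted literally.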

String validation

Example of validating input with a regular expression
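A minimal sketch of validation with Matcher.matches(), which succeeds only when the whole input fits the pattern. Both rules here are hypothetical examples, not requirements from any real system:

```java
import java.util.regex.Pattern;

class ValidationDemo {
    // Hypothetical rule: starts with a letter, then 2 to 15 word characters (3-16 in total)
    static final Pattern USERNAME = Pattern.compile("[A-Za-z]\\w{2,15}");
    // Hypothetical rule: exactly six digits
    static final Pattern SIX_DIGITS = Pattern.compile("\\d{6}");

    static boolean isValidUsername(String s) {
        return USERNAME.matcher(s).matches();
    }

    static boolean isSixDigitCode(String s) {
        return SIX_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidUsername("regex_fan")); // true
        System.out.println(isValidUsername("9lives"));    // false: starts with a digit
        System.out.println(isValidUsername("ab"));        // false: too short
        System.out.println(isSixDigitCode("100080"));     // true
        System.out.println(isSixDigitCode("10008"));      // false
    }
}
```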

Regular expressions in detail: syntax analysis and usage in Java