Regex Tutorial — A quick cheatsheet by examples[1]
Regular expressions are useful for extracting the content of a particular pattern from text.
They have a wide range of applications, including validation, parsing and replacing strings, data format conversion, and web scraping.
Once you learn its syntax, you can use it in almost any programming language (JavaScript, Java, VB, C#, C/C++, Python, Perl, Ruby, Delphi, R, Tcl, and so on).
Let’s look at some examples and explanations first.
Basics
Anchors: ^ and $
• ^The – matches any string that starts with The
• end$ – matches any string that ends with end
• ^The end$ – exact match: the whole string must be The end
• roar – matches any string that contains roar
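As a quick check of the four anchor patterns above, here is a minimal sketch using Python's re module. The sample strings and variable names are made up for illustration; re.search is used so that only the anchors decide where a match may occur.

```python
import re

# Hypothetical sample strings, chosen only to exercise the anchors.
samples = ["The end", "The roaring end", "At the end", "uproar"]

# re.search scans the whole string, so positioning is left to ^ and $.
starts_with_the = [s for s in samples if re.search(r"^The", s)]
ends_with_end = [s for s in samples if re.search(r"end$", s)]
exact_match = [s for s in samples if re.search(r"^The end$", s)]
contains_roar = [s for s in samples if re.search(r"roar", s)]

print(starts_with_the)  # strings beginning with "The"
print(ends_with_end)    # strings ending with "end"
print(exact_match)      # only the string that is exactly "The end"
print(contains_roar)    # strings with "roar" anywhere
```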
Quantifiers: * + ? and {}
• abc* – matches ab followed by zero or more c
• abc+ – matches ab followed by one or more c
• abc? – matches ab followed by zero or one c
• abc{2} – matches ab followed by exactly two c
• abc{2,} – matches ab followed by two or more c
• abc{2,5} – matches ab followed by two to five c
• a(bc)* – matches a followed by zero or more copies of the sequence bc
• a(bc){2,5} – matches a followed by two to five copies of the sequence bc
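The quantifiers can be compared side by side with a small Python sketch. The candidate strings and the helper full_matches are hypothetical; re.fullmatch is used here (rather than re.search) so that upper bounds such as {2,5} are visible in the results.

```python
import re

# Hypothetical candidates: "ab" followed by 0, 1, 2 and 6 copies of "c".
candidates = ["ab", "abc", "abcc", "abcccccc"]

def full_matches(pattern):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

zero_or_more = full_matches(r"abc*")
one_or_more = full_matches(r"abc+")
zero_or_one = full_matches(r"abc?")
exactly_two = full_matches(r"abc{2}")
two_or_more = full_matches(r"abc{2,}")
two_to_five = full_matches(r"abc{2,5}")

# Quantifiers applied to a group repeat the whole sequence "bc".
group_ok = re.fullmatch(r"a(bc){2,5}", "abcbc") is not None
group_too_few = re.fullmatch(r"a(bc){2,5}", "abc") is not None

print(two_to_five, group_ok, group_too_few)
```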
OR operator: | or []
• a(b|c) – matches a followed by b or c (that is, ab or ac)
• a[bc] – same as above
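A short Python sketch, with made-up input words, suggests that the two spellings accept the same strings. One difference worth noting: the parenthesized form also creates a capture group, while the bracket form does not.

```python
import re

# Hypothetical inputs: two that should match and one that should not.
words = ["ab", "ac", "ad"]

with_alternation = [w for w in words if re.fullmatch(r"a(b|c)", w)]
with_char_class = [w for w in words if re.fullmatch(r"a[bc]", w)]

print(with_alternation)
print(with_char_class)
```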
Character classes: \d \w \s and .
• \d – matches a single digit (equivalent to [0-9])
• \w – matches a word character: a letter, a digit or an underscore (equivalent to [a-zA-Z0-9_])
• \s – matches a whitespace character, including tabs and line breaks (equivalent to [\r\n\t\f\v ])
• . – matches any character (in most engines, any character except a line break unless a dot-all flag is set)
Use the . operator with care: a character class (\d, \s, \w) or a negated character class is often faster and more precise.
\d, \w and \s also have negated forms, written in uppercase: \D, \W and \S.
For example, \D performs the inverse match of \d.
• \D – matches a single non-digit character
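The shorthand classes can be tried out with a minimal Python sketch. The sample text is invented. One assumption to be aware of: for Python str patterns, \d and \w are Unicode-aware by default, so the ASCII equivalences above hold exactly only with the re.ASCII flag; for this ASCII-only sample the results are the same either way.

```python
import re

# Hypothetical sample containing letters, digits, an underscore, a space and a tab.
text = "Room 42,\tfloor_3"

digits = re.findall(r"\d", text)        # every single digit
words = re.findall(r"\w+", text)        # runs of letters, digits and underscores
whitespace = re.findall(r"\s", text)    # the space and the tab
digits_only = re.sub(r"\D", "", text)   # delete everything that is NOT a digit

print(digits, words, whitespace, digits_only)
```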
To match any of the characters ^ . [ $ ( ) | * + ? { \ literally, escape them with a backslash.
• \$\d – matches a $ followed by a digit
Tip: Regular expressions can also match non-printable characters, such as \t for a tab, \n for a newline and \r for a carriage return.
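The effect of escaping can be seen in a small Python sketch. The sample strings are made up; re.escape is a standard-library helper that adds the backslashes for you when the literal text comes from elsewhere.

```python
import re

# Hypothetical sample text containing a literal dollar sign.
price_text = "Total: $5 or 5$"

# Unescaped, $ is the end-of-string anchor, so a digit can never follow it.
unescaped = re.findall(r"$\d", price_text)

# Escaped, \$ is a literal dollar sign followed by a digit.
escaped = re.findall(r"\$\d", price_text)

# re.escape builds a pattern that matches arbitrary text literally.
literal = "1+1=2?"
matches_itself = re.fullmatch(re.escape(literal), literal) is not None

print(unescaped, escaped, matches_itself)
```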
Flags
A regex is usually written in the form /abc/, where the search pattern is delimited by two slashes. The following flags can be appended after the closing slash:
• g (global) – returns all matches instead of stopping after the first one
• m (multi-line) – ^ and $ match at the start and end of each line instead of the whole string
• i (insensitive) – makes the whole expression case-insensitive (for example, /aBc/i also matches AbC)
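The /abc/g notation above is the slash-delimited style used in languages such as JavaScript. As a rough sketch of the same ideas in Python (an illustrative choice, with an invented sample text): Python has no g flag, so re.search versus re.findall plays that role, while re.MULTILINE and re.IGNORECASE correspond to m and i.

```python
import re

# Hypothetical three-line sample.
text = "abc first\nABC second\nabc third"

first_only = re.search(r"abc", text).group()   # stops at the first match
all_matches = re.findall(r"abc", text)         # behaves like the g flag

# Without MULTILINE, ^ only matches at the very start of the string.
start_of_string = re.findall(r"^abc", text)
start_of_lines = re.findall(r"^abc", text, flags=re.MULTILINE)

any_case = re.findall(r"abc", text, flags=re.IGNORECASE)

print(first_only, all_matches, start_of_string, start_of_lines, any_case)
```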
Intermediate
Grouping and capturing: ()
• a(bc) – parentheses create a capturing group with the value bc
• a(?:bc)* – ?: disables capturing for the group
• a(?<foo>bc) – ?<foo> gives the group a name
This operator is very useful for data extraction: multiple captured groups are exposed as an array-like list, so their values can be retrieved by index.
If we name the groups with (?<foo>…), we can also retrieve the values by name, just like keys in a dictionary.
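Here is a minimal Python sketch of the three kinds of group. Note one syntax difference: Python's re module spells a named group (?P<foo>…) rather than (?<foo>…). The input strings are invented for illustration.

```python
import re

# Capturing group: the value is available by index.
indexed = re.search(r"a(bc)", "xabcx")
by_index = indexed.group(1)

# Non-capturing group: it still groups for the quantifier, but stores nothing.
plain = re.search(r"a(?:bc)*", "abcbc")
whole_match = plain.group()
captured_groups = plain.groups()

# Named group (Python spelling): the value is available by key.
named = re.search(r"a(?P<foo>bc)", "abc")
by_name = named.group("foo")
as_dict = named.groupdict()

print(by_index, whole_match, captured_groups, by_name, as_dict)
```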
Bracket expressions: []
• [abc] – matches a single a, b or c (same as a|b|c)
• [a-c] – same as above
• [a-fA-F0-9] – matches a single hexadecimal digit, in either case
• [0-9]% – matches a digit from 0 to 9 followed by a % sign
• [^a-zA-Z] – matches a single character that is not a letter from a to z or from A to Z; here ^ negates the expression
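The bracket expressions above can be exercised with a short Python sketch. All input strings are hypothetical.

```python
import re

in_range = re.findall(r"[a-c]", "cab d")             # only a, b and c qualify
hex_digits = re.findall(r"[a-fA-F0-9]", "0x1F, g")   # x and g are not hex digits
percents = re.findall(r"[0-9]%", "5% of 120%")       # one digit, then a % sign
non_letters = re.findall(r"[^a-zA-Z]", "a1-B2")      # ^ inside [] negates the set

print(in_range, hex_digits, percents, non_letters)
```

Because [0-9]% consumes exactly one digit, the second hit in the sample is 0% rather than 120%; adding a quantifier ([0-9]+%) would capture the whole number.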
Greedy and lazy matching
The * + {} quantifiers are greedy operators. They match as much content as possible.
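For example, in a text such as This is a <div>simple div</div> test, <.+> matches everything from the first < to the last >, that is, <div>simple div</div>. To capture only the div tag, add a ? after the quantifier to make it lazy:
• <.+?> – matches one or more of any character between < and >, expanding only as far as needed
Tip: It is better to avoid . altogether in favor of a stricter pattern:
• <[^<>]+> – matches one or more of any character except < and >, enclosed between < and >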
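The difference between the greedy, lazy and strict patterns shows up clearly in a small Python sketch. The sample sentence is an assumption chosen for illustration.

```python
import re

# Hypothetical sample text with one pair of tags.
text = "This is a <div>simple div</div> test"

greedy = re.findall(r"<.+>", text)       # runs from the first < to the last >
lazy = re.findall(r"<.+?>", text)        # stops at the nearest >
strict = re.findall(r"<[^<>]+>", text)   # cannot cross another < or >

print(greedy)
print(lazy)
print(strict)
```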
Advanced
Boundaries: \b and \B (also known as word boundaries)
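• \babc\b – performs a whole-word search for abc
\b is an anchor like ^ and $: it matches a position where one side is a word character (\w) and the other side is not (for example, the start of the string or a space).
Its negation is \B, which matches every position where \b does not.
• \Babc\B – matches abc only when it is fully surrounded by word characters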
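To see where \b and \B place their matches, the following Python sketch records the start offset of each hit. The sample text is invented so that abc appears as a whole word, as a prefix, and inside a longer word.

```python
import re

# Hypothetical sample: "abc" alone, as a prefix of "abcd", and inside "xabcx".
text = "abc abcd xabcx"

# Offsets of "abc" with a word boundary on both sides.
whole_word_at = [m.start() for m in re.finditer(r"\babc\b", text)]

# Offsets of "abc" with word characters on both sides.
inside_word_at = [m.start() for m in re.finditer(r"\Babc\B", text)]

print(whole_word_at, inside_word_at)
```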
Back-references: \1
• ([abc])\1 – \1 matches the same text that was matched by the first capturing group
• ([abc])([de])\2\1 – \2 (\3, \4, etc.) matches the same text that was matched by the second (third, fourth, etc.) capturing group
• (?<foo>[abc])\k<foo> – names the group foo and references it later with \k<foo>; the result is the same as in the first example
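A minimal Python sketch of back-references follows. Numbered references work as written above; for named references Python's re module uses (?P<foo>…) and (?P=foo) instead of (?<foo>…) and \k<foo>. The input strings are hypothetical.

```python
import re

# \1 must repeat exactly what group 1 captured, so "ab" is rejected.
doubled = [m.group() for m in re.finditer(r"([abc])\1", "aa ab cc")]

# \2\1 replays the two groups in reverse order.
mirrored_ok = re.fullmatch(r"([abc])([de])\2\1", "adda") is not None
mirrored_bad = re.fullmatch(r"([abc])([de])\2\1", "adea") is not None

# Named back-reference, in Python's spelling.
doubled_named = [m.group() for m in re.finditer(r"(?P<foo>[abc])(?P=foo)", "aa ab cc")]

print(doubled, mirrored_ok, mirrored_bad, doubled_named)
```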
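Look-ahead and look-behind: (?=) and (?<=)
• d(?=r) – matches a d only if it is followed by r; the r is not part of the match
• (?<=r)d – matches a d only if it is preceded by r; the r is not part of the match
You can also use the negated forms:
• d(?!r) – matches a d only if it is not followed by r; the r is not part of the match
• (?<!r)d – matches a d only if it is not preceded by r; the r is not part of the match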
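The four look-around forms can be compared in a short Python sketch that records where each d is matched. The sample text and the helper name positions are assumptions made for illustration.

```python
import re

# Hypothetical sample; d appears at offsets 0, 4, 6 and 10.
text = "dr rd dx xd"

def positions(pattern):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

followed_by_r = positions(r"d(?=r)")
preceded_by_r = positions(r"(?<=r)d")
not_followed_by_r = positions(r"d(?!r)")
not_preceded_by_r = positions(r"(?<!r)d")

# The look-around text is only checked, never consumed.
matched_text = re.search(r"d(?=r)", text).group()

print(followed_by_r, preceded_by_r, not_followed_by_r, not_preceded_by_r, matched_text)
```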
Conclusion
As you can see, regular expressions apply to many fields, and you have probably already met some of these tasks in your own development work, such as validation, parsing and replacing strings, data format conversion, and web scraping.