Regular expressions (RE)

A regular expression is a computer science concept
It uses a single pattern string to describe and match every string that follows a given rule
It is often used to find and replace text that fits a pattern

Regular expression syntax

.(dot): matches any single character except \n
[]: matches any one of the characters listed inside the brackets, e.g. [LY0] matches each L, Y, or 0 in LLY, Y0, and LIU
\d: any digit
\D: anything except a digit
\s: whitespace, such as a space or a tab
\S: anything except whitespace
\w: word characters, namely a-z, A-Z, 0-9, and _
\W: anything except what \w matches
*: the preceding item is repeated zero or more times, e.g. \w*
+: the preceding item appears at least once
?: the preceding item appears zero times or once
{m,n}: the preceding item appears at least m times and at most n times
^: matches the beginning of the string
$: matches the end of the string
\b: matches a word boundary

(): groups part of the regular expression; groups are numbered from 1 in the order of their opening parentheses
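The notation above can be tried out in a few lines. This is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

# Made-up sample text for illustration
text = "Order A-17 shipped 3 items on 2024-05-09"

# \d+ : one or more digits in a row
digits = re.findall(r"\d+", text)

# \b[A-Z]\w* : a capital letter at a word boundary, then zero or more word characters
capitalized = re.findall(r"\b[A-Z]\w*", text)

# Three groups, numbered 1 to 3 by the order of their opening parentheses
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)

print(digits)         # ['17', '3', '2024', '05', '09']
print(capitalized)    # ['Order', 'A']
print(date.groups())  # ('2024', '05', '09')
```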

Verify a single digit: ^\d$
Must be a number with at least one digit: ^\d+$
Digits only, with 5 to 10 of them: ^\d{5,10}$
Registration age, 16 or older and 99 or younger: ^(1[6-9]|[2-9]\d)$ (a character class such as [16,99] matches only one character, so the range needs alternation)
Only English letters and digits can be entered: ^[A-Za-z0-9]+$
Verify a QQ number: ^[0-9]{5,12}$
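One way to apply validation rules like these is `fullmatch`, which succeeds only when the entire string fits the pattern, so the ^ and $ anchors can be left out. The sketch below is an illustration only: the names `AGE`, `ALNUM`, `QQ`, and `is_valid` are hypothetical, and the age rule is written as an alternation because a character class cannot express a numeric range.

```python
import re

# Hypothetical names; each pattern encodes one validation rule
AGE = re.compile(r"1[6-9]|[2-9]\d")   # 16 to 99
ALNUM = re.compile(r"[A-Za-z0-9]+")   # English letters and digits only
QQ = re.compile(r"[0-9]{5,12}")       # 5 to 12 digits

def is_valid(pattern, value):
    """Return True only if the whole string matches the pattern."""
    return pattern.fullmatch(value) is not None

print(is_valid(AGE, "16"))         # True
print(is_valid(AGE, "100"))        # False
print(is_valid(ALNUM, "abc 123"))  # False, the space is not allowed
print(is_valid(QQ, "12345"))       # True
```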

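\A: matches only at the beginning of the string, e.g. \Aabcd matches abcd at the start
\Z: matches only at the end of the string, e.g. abcd\Z matches abcd at the end
|: alternation, matches either the expression on its left or the one on its right
(?P<name>...): a named group, which gets an alias in addition to its number, e.g. (?P<name>12345){2} matches 1234512345
(?P=name): refers back to the text matched by the group called name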
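Named groups and back-references are easier to see in a small example. The sketch below uses a made-up group name, `word`, and a made-up sentence to find a word that is accidentally repeated.

```python
import re

# (?P<word>...) names the group; (?P=word) must match the same text again
doubled = re.compile(r"\b(?P<word>\w+) (?P=word)\b")

m = doubled.search("this is is a test")
print(m.group("word"))  # is   (looked up by alias)
print(m.group(1))       # is   (the same group by number)
print(m.span())         # (5, 10)

# | matches either alternative
pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)             # ['cat', 'dog']
```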

Rough steps for using RE

Use compile to compile the string form of the regular expression into a Pattern object
The Pattern object provides a series of methods for searching text; a successful search returns its result as a Match object
Finally, use the attributes and methods of the Match object to get the information you need

RE common functions

group(): gets one or more matched substrings; group() or group(0) returns the whole match
start(): gets the start position, within the whole string, of the substring matched by a group; the argument defaults to 0
end(): gets the end position, within the whole string, of the substring matched by a group; the argument defaults to 0
span(): returns the tuple (start(group), end(group))
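As a quick illustration of these methods, here is a minimal sketch that calls them on one Match object. The pattern and sample string are made up.

```python
import re

# Made-up sample: two numbers separated by a hyphen
m = re.search(r"(\d+)-(\d+)", "range 10-250 units")

print(m.group())      # 10-250, the same as m.group(0)
print(m.group(1, 2))  # ('10', '250'), several groups returned as a tuple
print(m.span())       # (6, 12), i.e. (m.start(), m.end())
print(m.span(2))      # (9, 12), the same pair for group 2
```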

# import the required package
import re

# find a number
# the r prefix makes a raw string, so backslashes are not treated as escapes
p = re.compile(r'\d+')
# search the string "one12twothree33456four78" using the rule compiled into p
# match returns a Match object on success and None otherwise
# match only tries at the start of the string, which is "o" here, so the result is None
m = p.match("one12twothree33456four78")

print(m)

None

# import the required package
import re

# find a number
# the r prefix makes a raw string, so backslashes are not treated as escapes
p = re.compile(r'\d+')
# search the string "one12twothree33456four78" using the rule compiled into p
# match returns a Match object on success and None otherwise
# the arguments 3 and 26 give the range of the string to search
m = p.match("one12twothree33456four78", 3, 26)

print(m)

# Notes on the code above
# 1. match accepts arguments that set the start position of the search
# 2. only one result is returned: the first successful match

<_sre.SRE_Match object; span=(3, 5), match='12'>

print(m[0])
print(m.start(0))
print(m.end(0))

12
3
5

import re
# re.I makes matching case-insensitive
p = re.compile(r'([a-z]+) ([a-z]+)', re.I)

m = p.match("I am really love you")
print(m)

<_sre.SRE_Match object; span=(0, 4), match='I am'>

print(m.group(0))
print(m.start(0))
print(m.end(0))

I am
0
4

print(m.group(1))
print(m.start(1))
print(m.end(1))

I
0
1

print(m.group(2))
print(m.start(2))
print(m.end(2))

am
2
4

print(m.groups())

('I', 'am')

Searching

search(str[, pos[, endpos]]): looks for the first match anywhere in the string; pos and endpos give the start and end positions of the search
findall: finds all matches and returns them as a list
finditer: finds all matches and returns an iterator of Match objects

import re

p = re.compile(r'\d+')

m = p.search("one12two34three567four")

print(m.group())

12

rst = p.findall("one12two34three567four")
print(type(rst))

print(rst)

<class 'list'>
['12', '34', '567']
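finditer can be sketched in the same way. Unlike findall, it yields Match objects one at a time, so the position of each match is available as well as its text. The variable name `found` is only for illustration.

```python
import re

p = re.compile(r"\d+")

# Collect the text and position of every match
found = [(m.group(), m.span()) for m in p.finditer("one12two34three567four")]

for text, span in found:
    print(text, span)
# 12 (3, 5)
# 34 (8, 10)
# 567 (15, 18)
```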

Replacing with sub

sub(repl, str[, count]): replaces each match in str with repl; count limits the number of replacements

# sub replacement case
import re

# \w matches letters, digits, and the underscore
p = re.compile(r'(\w+) (\w+)')

s = "hello 123 wang 456, i love you"

rst = p.sub(r'Hello world', s)
print(rst)

Hello world Hello world, Hello world you
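The replacement does not have to be fixed text. As a minimal sketch on the same pattern and string, the replacement can refer to captured groups, be limited with count, or be a function that receives each Match object. The variable names are made up.

```python
import re

p = re.compile(r"(\w+) (\w+)")
s = "hello 123 wang 456, i love you"

# \2 and \1 refer to the captured groups, so each pair is swapped
swapped = p.sub(r"\2 \1", s)

# count=1 limits the replacement to the first match
first_only = p.sub(r"\2 \1", s, count=1)

# The replacement may also be a function called with each Match object
upper = p.sub(lambda m: m.group(0).upper(), s)

print(swapped)     # 123 hello 456 wang, love i you
print(first_only)  # 123 hello wang 456, i love you
print(upper)       # HELLO 123 WANG 456, I LOVE you
```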

Matching Chinese

Most Chinese characters fall in the Unicode range [\u4e00-\u9fa5], which does not include full-width punctuation

import re

# the sample must contain Chinese characters for the pattern to find anything
title = '世界 你好, hello moto'

p = re.compile(r'[\u4e00-\u9fa5]+')
rst = p.findall(title)

print(rst)

['世界', '你好']

Greedy and non-greedy matching

Greedy: match as much as possible; quantifiers such as * are greedy
Non-greedy: match the smallest content that fits the rule; adding ? after a quantifier, as in *?, makes it non-greedy
The re module uses greedy matching by default

import re

title = u'<div>name</div><div>age</div>'

p1 = re.compile(r'<div>.*</div>')
p2 = re.compile(r'<div>.*?</div>')

m1 = p1.search(title)
print(m1.group())

m2 = p2.search(title)
print(m2.group())

<div>name</div><div>age</div>
<div>name</div>
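The same difference shows up with findall and with other quantifiers. This is a minimal sketch; the digit string and variable names are made up for illustration.

```python
import re

title = "<div>name</div><div>age</div>"

# Greedy: a single match that runs to the last </div>
greedy = re.findall(r"<div>(.*)</div>", title)

# Non-greedy: each match stops at the nearest </div>
lazy = re.findall(r"<div>(.*?)</div>", title)

print(greedy)  # ['name</div><div>age']
print(lazy)    # ['name', 'age']

# ? after any quantifier makes it take as little as possible
shortest_plus = re.match(r"\d+?", "12345").group()
shortest_range = re.match(r"\d{2,4}?", "12345").group()

print(shortest_plus)   # 1
print(shortest_range)  # 12
```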
