import re

1 Find the first matching string

s = 'i love python very much'
pat = 'python' 
r = re.search(pat,s)
print(r.span())  # (7, 13)
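
re.search returns None when nothing matches, so calling .span() directly on the result would raise an AttributeError in that case. A minimal sketch of a guarded lookup follows; the helper name first_span is hypothetical and not part of the re module.

```python
import re

def first_span(pattern, text):
    """Return the (start, end) span of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.span() if m is not None else None

s = 'i love python very much'
print(first_span('python', s))  # (7, 13)
print(first_span('java', s))    # None
```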

2 Find every occurrence of 1

s = 'Class 1, Senior 3, Qingzhou No.1 Middle School, Weifang City, Shandong Province'
pat = '1'
r = re.finditer(pat, s)
for i in r:
    print(i)

# <re.Match object; span=(6, 7), match='1'>
# <re.Match object; span=(31, 32), match='1'>
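
Each item yielded by re.finditer is a match object, so its start(), end() and group() methods are available inside the loop. A small sketch that collects only the start positions; the sample string is an assumption chosen for illustration.

```python
import re

s = 'room 1, floor 11'
# Collect the start index of every '1' in the string.
positions = [m.start() for m in re.finditer('1', s)]
print(positions)  # [5, 14, 15]
```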

3 \d matches a digit [0-9]

s = 'total 20 lines of code running time 13.59s'
pat = r'\d+'  # + means the preceding \d (the generic class for a digit) matches 1 or more times
r = re.findall(pat, s)
print(r)
# ['20', '13', '59']

We want 13.59 kept whole rather than split into two numbers; see 4

4 ? means the preceding character matches 0 or 1 times

s = 'total 20 lines of code running time 13.59s'
pat = r'\d+\.?\d+'  # ? means the decimal point (\.) matches 0 or 1 times
r = re.findall(pat, s)
print(r)
# ['20', '13.59']
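
One limitation of \d+\.?\d+ is that it needs at least two digits, so a single-digit number is skipped. A possible alternative, shown here as a sketch rather than the author's method, makes the whole fractional part optional with a non-capturing group. The sample string is an assumption.

```python
import re

s = 'total 7 lines of code running time 13.59s'
# \d+\.?\d+ requires at least two digits, so the lone '7' is missed.
print(re.findall(r'\d+\.?\d+', s))      # ['13.59']
# (?:\.\d+)? makes the fractional part, dot included, optional as a unit.
print(re.findall(r'\d+(?:\.\d+)?', s))  # ['7', '13.59']
```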

5 ^ matches the beginning of the string

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'^[emrt]'
r = re.findall(pat, s)
print(r)  # [] because the string begins with 'T', which is not in the character class [emrt]

6 re.I ignores case

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'^[emrt]'
r = re.compile(pat, re.I).search(s)
print(r)
# <re.Match object; span=(0, 1), match='T'>: with case ignored, the first character is in the character class
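
The flag can also be passed straight to re.search, or written inline as (?i) at the start of the pattern. The sketch below compares the three cases on a shortened version of the sentence.

```python
import re

s = 'This module provides regular expression matching operations'
print(re.search(r'^[emrt]', s))                # None: 'T' is uppercase
print(re.search(r'^[emrt]', s, re.I).group())  # T
print(re.search(r'(?i)^[emrt]', s).group())    # T, using the inline flag
```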

7 Extract words with a regular expression. This version is inaccurate; see 9

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s[a-zA-Z]+'  
r = re.findall(pat,s)
print(r)  # [' module', ' provides', ' regular', ' expression', ' matching', ' operations', ' similar', ' to', ' those', ' found', ' in', ' Perl']

8 Capture only the words, without the spaces, by using a () group. This version is still inaccurate; see 9

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s([a-zA-Z]+)'  
r = re.findall(pat,s)
print(r)  # ['module', 'provides', 'regular', 'expression', 'matching', 'operations', 'similar', 'to', 'those', 'found', 'in', 'Perl']

9 Building on 8, include the first word. The words extracted in 8 leave out the first word, so use ? to let the preceding \s match 0 or 1 times. Note that ? is also the marker that switches a quantifier between greedy and non-greedy matching, so use it with caution.

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s?([a-zA-Z]+)'
r = re.findall(pat, s)
print(r)  # ['This', 'module', 'provides', 'regular', 'expression', 'matching', 'operations', 'similar', 'to', 'those', 'found', 'in', 'Perl']

10 Splitting words with the method above is not concise and is shown only for demonstration. The easiest way to split words is the split function.

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s+'  
r = re.split(pat,s)
print(r)  # ['This', 'module', 'provides', 'regular', 'expression', 'matching', 'operations', 'similar', 'to', 'those', 'found', 'in', 'Perl']
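
Another option, sketched here as an alternative rather than as the author's approach, is to match the runs of letters directly, which avoids the optional \s altogether. For plain whitespace-separated text, str.split() with no arguments gives the same list.

```python
import re

s = 'This module provides regular expression matching'
# Match each maximal run of ASCII letters.
words = re.findall(r'[a-zA-Z]+', s)
print(words)  # ['This', 'module', 'provides', 'regular', 'expression', 'matching']
print(s.split() == words)  # True
```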

11 Extract words beginning with m or t, ignoring case. The result below is not what we want, and the cause is the ?: because \s is optional, a match can start in the middle of a word.

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s?([mt][a-zA-Z]*)'
r = re.findall(pat, s)
print(r)  # ['module', 'matching', 'tions', 'milar', 'to', 'those']

12 Use ^ to find the word at the beginning of the string. Together with the matches from 11, this covers the words beginning with m or t, although 11 still returns the fragments 'tions' and 'milar'; 13 shows a cleaner approach.

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'^([mt][a-zA-Z]*)\s'
r = re.compile(pat, re.I).findall(s)
print(r)  # ['This']

13 Split the string into words first, then keep the words that meet the requirement. re.match tests whether a word matches at its start

s = 'This module provides regular expression matching operations similar to those found in Perl'
pat = r'\s+'  
r = re.split(pat,s)
res = [i for i in r if re.match(r'[mMtT]', i)]
print(res)  # ['This', 'module', 'matching', 'to', 'those']
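
The stray fragments in 11 appear because nothing forces a match to begin at the start of a word. A single-pattern alternative to splitting first, offered as a sketch, uses \b, the zero-width word-boundary assertion, together with re.I.

```python
import re

s = 'This module provides regular expression matching operations similar to those found in Perl'
# \b anchors the match at the start of a word, so 'tions' and 'milar' no longer qualify.
res = re.findall(r'\b[mt][a-zA-Z]*', s, re.I)
print(res)  # ['This', 'module', 'matching', 'to', 'those']
```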

14 Greedy matching: match as many characters as possible

content = '<h>ddedadsad</h><div>graph</div>bb<div>math</div>cc'
pat = re.compile(r"<div>(.*)</div>")
m = pat.findall(content)
print(m)  # ['graph</div>bb<div>math']

15 Non-greedy matching: the pattern has just one more question mark (?) than in 14, yet the result is completely different.

content = '<h>ddedadsad</h><div>graph</div>bb<div>math</div>cc'
pat = re.compile(r"<div>(.*?)</div>")
m = pat.findall(content)
print(m)  # ['graph', 'math']

The difference is that a greedy quantifier consumes as much as it can while still letting the whole pattern match, whereas a non-greedy quantifier stops at the first point where the rest of the pattern can match.
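
The contrast is easy to see by running both quantifiers with re.search on the same input and comparing the spans. This is a small sketch on a shortened version of the string.

```python
import re

content = '<div>graph</div>bb<div>math</div>'
greedy = re.search(r'<div>(.*)</div>', content)
lazy = re.search(r'<div>(.*?)</div>', content)
print(greedy.group(1))             # graph</div>bb<div>math
print(lazy.group(1))               # graph
print(greedy.span(), lazy.span())  # (0, 33) (0, 16)
```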

16 Split a string that contains several kinds of separators with the split function

content = 'graph math,,english; chemistry'
pat = re.compile(r"[\s\,\;]+")
m = pat.split(content)
print(m)  # ['graph', 'math', 'english', 'chemistry']
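
One caveat worth knowing: when the string starts or ends with a separator, split returns empty strings at the edges. A sketch that filters them out; the sample input is an assumption.

```python
import re

content = ',graph math,,english;chemistry;'
parts = re.split(r'[\s,;]+', content)
print(parts)  # ['', 'graph', 'math', 'english', 'chemistry', '']
# Drop the empty strings produced by the leading and trailing separators.
cleaned = [p for p in parts if p]
print(cleaned)  # ['graph', 'math', 'english', 'chemistry']
```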

17 The sub function replaces matched substrings

content = "hello 12345, hello 456321"
pat = re.compile(r'\d+')
m = pat.sub("666", content)
print(m)  # hello 666, hello 666
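
The replacement passed to sub may also be a function that receives each match object, and subn additionally reports how many substitutions were made. A brief sketch of both:

```python
import re

content = 'hello 12345, hello 456321'
# The replacement function is called once per match; here it masks each digit.
masked = re.sub(r'\d+', lambda m: '*' * len(m.group()), content)
print(masked)  # hello *****, hello ******
# subn returns the new string together with the number of substitutions.
result = re.subn(r'\d+', '666', content)
print(result)  # ('hello 666, hello 666', 2)
```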

18 Scrape the title of the Baidu home page

import re
from urllib import request

data = request.urlopen("http://www.baidu.com/").read().decode()
# inspect the page source to decide on the regular expression
pat = r'<title>(.*?)</title>'

result = re.search(pat, data)
print(result)  # a re.Match object with span=(1358, 1382) when this was written; result.group(1) is the title text
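
To try a title pattern without making a network request, it can be run against an HTML string held in memory. The HTML below is a made-up sample, not Baidu's actual page, and the None check covers pages that have no title element.

```python
import re

# Made-up HTML for illustration; a real page would come from urlopen(...).read().decode().
html = '<html><head><title>Example Page</title></head><body></body></html>'
# re.S lets . match newlines in case the title spans several lines.
m = re.search(r'<title>(.*?)</title>', html, re.S)
title = m.group(1) if m else None
print(title)  # Example Page
```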

19 Summary of common metacharacters

.      matches any character (except a newline, by default)
^      matches at the beginning of the string
$      matches at the end of the string
*      the preceding atom repeats 0 or more times
?      the preceding atom repeats 0 or 1 times
+      the preceding atom repeats 1 or more times
{n}    the preceding atom occurs exactly n times
{n,}   the preceding atom occurs at least n times
{n,m}  the preceding atom occurs between n and m times
()     grouping; captures the part that should be returned
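
The quantifiers can be checked one at a time with re.fullmatch, which succeeds only when the entire string matches the pattern. The test strings here are assumptions chosen for illustration.

```python
import re

checks = [
    (r'ab*', 'a'),       # * allows zero b's
    (r'ab+', 'a'),       # + needs at least one b
    (r'ab?', 'abb'),     # ? allows at most one b
    (r'a{2,3}', 'aaa'),  # between 2 and 3 a's
    (r'a{2,}', 'a'),     # at least 2 a's required
]
results = [bool(re.fullmatch(p, t)) for p, t in checks]
print(results)  # [True, False, False, True, False]
```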

20 Summary of common characters

\s     matches whitespace characters
\w     matches any letter, digit, or underscore
\W     the opposite of \w: matches any character other than a letter, digit, or underscore
\d     matches a decimal digit
\D     the opposite of \d: matches any character other than a decimal digit
[0-9]  matches a digit between 0 and 9
[a-z]  matches a lowercase letter
[A-Z]  matches an uppercase letter
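
A quick way to see what each class picks up is to run re.findall with several of them over the same string. The sample string is an assumption.

```python
import re

s = 'id_42 costs $9.50'
word_runs = re.findall(r'\w+', s)
digit_runs = re.findall(r'\d+', s)
non_digit_runs = re.findall(r'\D+', s)
lower_runs = re.findall(r'[a-z]+', s)
print(word_runs)       # ['id_42', 'costs', '9', '50']
print(digit_runs)      # ['42', '9', '50']
print(non_digit_runs)  # ['id_', ' costs $', '.']
print(lower_runs)      # ['id', 'costs']
```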

This summarizes the basic usage of Python's re module, refining the patterns step by step along the way. The intermediate, imperfect versions are kept on purpose, because working through them is essential to understanding the module. My own understanding of regular expressions is still shallow, so if anything in this summary falls short, corrections are welcome.
