Using Python's re library

The re library is Python's standard library module for working with regular expressions. This post introduces the re library along with a brief overview of regular expression syntax. For a deeper study of regular expressions themselves, you will need more comprehensive material.

Regular expression syntax

Regular expression syntax is made up of characters and operators.

The basic operators, each with a description and an example:

  • . (dot): any single character except a newline (unless re.S is set, see below).
  • [ ]: character set, giving the allowed values for a single character. [abc] matches a, b or c; [a-z] matches any single character from a to z.
  • [^ ]: negated character set, giving the excluded values for a single character. [^abc] matches any single character other than a, b or c.
  • *: the preceding character repeated 0 or more times. abc* matches ab, abc, abcc, abccc, and so on.
  • +: the preceding character repeated 1 or more times. abc+ matches abc, abcc, abccc, and so on.
  • ?: the preceding character repeated 0 or 1 times. abc? matches ab and abc.
  • |: either the left or the right expression. abc|def matches abc or def.
  • {m}: the preceding character repeated exactly m times. ab{2}c matches abbc.
  • {m,n}: the preceding character repeated m to n times (inclusive). ab{1,2}c matches abc and abbc.
  • ^: matches the beginning of the string. ^abc matches abc only at the start of a string.
  • $: matches the end of the string. abc$ matches abc only at the end of a string.
  • ( ): grouping; treats the enclosed expression as a unit and captures the text it matches. (abc|def) matches abc or def.
  • \d: a digit, equivalent to [0-9] for ASCII text.
  • \w: a word character, equivalent to [A-Za-z0-9_] for ASCII text (in Python 3, str patterns also match Unicode letters and digits by default).
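The operators above can be checked quickly with re.fullmatch, which succeeds only when the whole string matches the pattern. This is a minimal sketch; the helper name matches() is hypothetical and exists only for this example.

```python
import re


def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """Return True if the whole of `text` is matched by `pattern`."""
    return re.fullmatch(pattern, text, flags) is not None


# Quantifiers: *, +, ?, {m}, {m,n}
assert matches(r"abc*", "ab") and matches(r"abc*", "abccc")
assert matches(r"abc+", "abcc") and not matches(r"abc+", "ab")
assert matches(r"abc?", "ab") and not matches(r"abc?", "abcc")
assert matches(r"ab{2}c", "abbc")
assert matches(r"ab{1,2}c", "abc") and matches(r"ab{1,2}c", "abbc")

# Character sets, alternation and grouping
assert matches(r"[abc]", "b") and matches(r"[a-z]", "q")
assert not matches(r"[^abc]", "b")
assert matches(r"(abc|def)", "def")

# ^ and $ anchor the pattern; re.search otherwise looks anywhere in the string
assert re.search(r"^abc", "abcxyz") and not re.search(r"^abc", "xyzabc")
assert re.search(r"abc$", "xyzabc")

# \d and \w shorthand classes
assert matches(r"\d+", "2024") and matches(r"\w+", "snake_case1")

# . does not match a newline unless re.S (re.DOTALL) is given
assert not matches(r"a.c", "a\nc")
assert matches(r"a.c", "a\nc", re.S)
print("all examples behave as described")
```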

The list above covers only the most basic parts of regular expression syntax. For an in-depth study, consult more comprehensive material; this post is only meant as a primer.

Basic usage of the re library

The main functions of the re library are:

  • Compiling: compile;
  • Matching and processing: search, match, findall, split, finditer, sub.

Before diving in, let's look at raw strings.

In Python, a raw string is written by putting r in front of the string literal. To see why it is needed, consider my_str = 'I'm xiangpica': this raises a SyntaxError, because the apostrophe ends the string early. To make it run, escape the quote with the escape character \: my_str = 'I\'m xiangpica'. This escaping clashes with regular expression operators, because \ also has a special meaning in regular expressions. To match a single literal \ with the re library using an ordinary string, you need four backslashes: Python turns '\\\\' into two backslashes, and the regex engine reads those two as one literal backslash. Raw strings avoid this: Python does not treat \ as an escape character inside a raw string, so r'\\' is enough.

'\\\\'   # ordinary string: four backslashes to match one literal \
r'\\'    # raw string: two backslashes are enough

There are practical applications for this later.
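A small sketch showing that both spellings produce the same two-character pattern and find the same backslash. The sample path C:\temp is made up for illustration.

```python
import re

path = "C:\\temp"  # the string C:\temp, with one literal backslash

# Both literals below are the same two-character string \\ (an escaped
# backslash), which the regex engine reads as "one literal backslash".
plain = re.search("\\\\", path)  # ordinary string: four backslashes in the source
raw = re.search(r"\\", path)     # raw string: two backslashes are enough

print("\\\\" == r"\\")           # True
print(plain.span(), raw.span())  # (2, 3) (2, 3)
```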

Let's look at an example:

my_str = 'C:\number'
print(my_str)

Here \n is parsed as a newline, so the output is:

C:
umber

To suppress the escape, add the r prefix:

my_str = r'C:\number'
print(my_str)

The output is now C:\number.

Functions of the re library

The re.search function

This function scans the string and returns a match object for the first position where the regular expression matches, or None if there is no match. The function prototype is as follows:

re.search(pattern,string,flags=0)

Requirement: find eraser in the string Dream eraser is not 猹 good good.

import re
my_str = 'Dream eraser is not 猹 good good'
pattern = r'eraser'
ret = re.search(pattern, my_str)
print(ret)

Result: <re.Match object; span=(6, 12), match='eraser'>

flags, the third argument of re.search, holds control flags that change how the regular expression is applied. Common flags:

  • re.I / re.IGNORECASE: ignore case when matching.
  • re.M / re.MULTILINE: the ^ operator matches at the start of every line of the string, not only at the start of the whole string.
  • re.S / re.DOTALL: the . operator matches every character, including a newline.
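A short sketch of each flag on a made-up two-line string. Flags can be combined with the | operator.

```python
import re

text = "first line\nSecond line"

# re.I (re.IGNORECASE): case-insensitive matching
print(re.search(r"second", text, flags=re.I).group(0))  # Second

# re.M (re.MULTILINE): ^ matches at the start of every line
print(re.findall(r"^\w+", text))              # ['first']
print(re.findall(r"^\w+", text, flags=re.M))  # ['first', 'Second']

# re.S (re.DOTALL): . also matches the newline character
print(re.search(r"line.Second", text))                             # None
print(repr(re.search(r"line.Second", text, flags=re.S).group(0)))  # 'line\nSecond'

# Flags can be combined with the | operator
print(re.findall(r"^s\w+", text, flags=re.I | re.M))  # ['Second']
```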

To print just the matched text, check that a match was found and then call group(0):

import re
my_str = 'Dream eraser is not 猹 good good'
pattern = r'eraser'
ret = re.search(pattern, my_str)
if ret:
    print(ret.group(0))

The re.match function

This function tries to match the regular expression only at the beginning of the target string. It returns a match object on success, or None if the start of the string does not match.

re.match(pattern,string,flags=0)

Note that matching is attempted only at the start of the target string.

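import re
my_str = 'Dream eraser is not 猹 good good'
pattern = r'猹'        # 猹 is not at the start of my_str, so re.match returns None
# pattern = r'Dream'   # uncomment to get a match, because my_str starts with Dream
ret = re.match(pattern, my_str)
if ret:
    print(ret.group(0))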
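To make the difference from re.search concrete, here is a minimal side-by-side sketch on the same sample string.

```python
import re

my_str = 'Dream eraser is not 猹 good good'

# re.match only tries the beginning of the string...
print(re.match(r'eraser', my_str))            # None
print(re.match(r'Dream', my_str).group(0))    # Dream
# ...while re.search scans the whole string for the first match
print(re.search(r'eraser', my_str).group(0))  # eraser
```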

re.match and re.search each return at most one match object. To pull several values out of that one match, wrap parts of the pattern in parentheses to create capture groups; each group can then be read from the match object.
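A small sketch of capture groups on the sample string; the pattern is made up for illustration. group(n) returns the text of the n-th group, and groups() returns all of them as a tuple.

```python
import re

my_str = 'Dream eraser is not 猹 good good'
# Two capture groups: the word before "eraser" and the token after "is not"
ret = re.search(r'(\w+) eraser is not (\S+)', my_str)
if ret:
    print(ret.group(1))  # Dream
    print(ret.group(2))  # 猹
    print(ret.groups())  # ('Dream', '猹')
```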

The re.findall function

This function searches the string and returns all matched substrings as a list. The function prototype is as follows:

re.findall(pattern,string,flags=0)

The test code is as follows:

import re
my_str = 'Dream eraser is not 猹 good good'
pattern = r'good'
ret = re.findall(pattern, my_str)
print(ret)  # ['good', 'good']
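The shape of the list depends on the capture groups in the pattern: with no groups, each element is the whole match; with one group, it is that group's text; with several groups, it is a tuple of them. A quick sketch on a made-up string:

```python
import re

text = 'a=1, b=22, c=333'
print(re.findall(r'\d+', text))         # ['1', '22', '333']
print(re.findall(r'(\w)=\d+', text))    # ['a', 'b', 'c']
print(re.findall(r'(\w)=(\d+)', text))  # [('a', '1'), ('b', '22'), ('c', '333')]
```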

The re.split function

This function splits a string at every match of the regular expression and returns the pieces as a list. The function prototype is as follows:

re.split(pattern, string, maxsplit=0, flags=0)

When re.split splits a string and the regular expression matches at the very beginning or end of the string, the returned list starts or ends with an empty string, which has to be removed manually. For example:

import re
my_str = '1Dream eraser is not 猹1good1good1'
pattern = r'\d'
ret = re.split(pattern, my_str)
print(ret)

Running results:

['', 'Dream eraser is not 猹', 'good', 'good', '']

Splitting on a pattern that occurs only in the middle of the string avoids these empty strings:

import re
my_str = '1Dream eraser is not 猹1good1good1'
pattern = r'good'
ret = re.split(pattern, my_str)
print(ret)  # ['1Dream eraser is not 猹1', '1', '1']

If the pattern contains capturing parentheses, the text matched by each group is also returned in the list.

import re
my_str = '1Dream eraser is not 猹1good1good1'
pattern = r'(good)'
ret = re.split(pattern, my_str)
print(ret)

Running this shows the difference parentheses make; the separators now appear in the list:

['1Dream eraser is not 猹1', 'good', '1', 'good', '1']

The maxsplit parameter sets the maximum number of splits, and the rest of the string is returned as the last element of the list. For example, with the pattern good and maxsplit=1, the result is ['1Dream eraser is not 猹1', '1good1'].
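A minimal sketch of the manual clean-up mentioned above, plus the maxsplit call passed as a keyword argument. Filtering with a list comprehension is just one way to drop the empty strings.

```python
import re

my_str = '1Dream eraser is not 猹1good1good1'
parts = re.split(r'\d', my_str)
# Drop the empty strings produced by the separators at both ends
cleaned = [p for p in parts if p]
print(cleaned)  # ['Dream eraser is not 猹', 'good', 'good']

# maxsplit=1: split once, keep the rest of the string as the last element
print(re.split(r'good', my_str, maxsplit=1))  # ['1Dream eraser is not 猹1', '1good1']
```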

The re.finditer function

This function searches the string and returns an iterator over all matches; each element it yields is a match object. The function prototype is as follows:

re.finditer(pattern,string,flags=0)

The test code is as follows:

import re
my_str = '1Dream eraser is not 猹1good1good1'
pattern = r'good'
ret = re.finditer(pattern, my_str)
for match in ret:
    print(match.group(0), match.span())  # good (23, 27), then good (28, 32)

The re.sub function

This function replaces every substring matched by the regular expression and returns the resulting string. The function prototype is as follows:

re.sub(pattern,repl,string,count=0,flags=0)

Here repl is the string that replaces each match, and count is the maximum number of replacements; the default, 0, replaces every match.

import re
my_str = '1Dream eraser is not 猹1good1good1'
pattern = r'good'
ret = re.sub(pattern, 'nice', my_str)
print(ret)

After running, we get the replaced string:

1Dream eraser is not 猹1nice1nice1
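A few more sketches of re.sub on the same string: limiting replacements with count, referring to a captured group in repl with \1, and passing a function as repl, which is called with each match object.

```python
import re

my_str = '1Dream eraser is not 猹1good1good1'

# count=1 replaces only the first match
print(re.sub(r'good', 'nice', my_str, count=1))  # 1Dream eraser is not 猹1nice1good1

# \1 in repl inserts the text captured by the first group (the digit)
print(re.sub(r'(\d)good', r'\1nice', my_str))    # 1Dream eraser is not 猹1nice1nice1

# repl can be a function that receives each match object
print(re.sub(r'good', lambda m: m.group(0).upper(), my_str))  # 1Dream eraser is not 猹1GOOD1GOOD1
```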

Other functions in the re library

Other commonly used functions include re.fullmatch(), re.subn() and re.escape(). See the official documentation for details.
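A brief sketch of these three functions on made-up inputs: fullmatch requires the whole string to match, subn also reports how many replacements were made, and escape backslash-escapes characters that are special in regular expressions.

```python
import re

# fullmatch: the whole string must match
print(re.fullmatch(r'\d+', '2024'))   # <re.Match object; span=(0, 4), match='2024'>
print(re.fullmatch(r'\d+', '2024a'))  # None

# subn: like sub, but returns (new_string, number_of_replacements)
print(re.subn(r'good', 'nice', 'good good'))  # ('nice nice', 2)

# escape: makes a literal string safe to use inside a pattern
print(re.escape('1+1=2?'))  # 1\+1=2\?
```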

Object-oriented usage of the re library

All of the above use the functional style. The re library also supports an object-oriented style: compile the regular expression once, then use the compiled object for multiple operations. The core function is re.compile.

The function prototype is as follows:

regex = re.compile(pattern,flags=0)
Here pattern is the regular expression as a string, usually written as a raw string.

The test code is as follows:

import re
my_str = '1Dream eraser is not 猹1good1good1'
regex = re.compile(r'good')
ret = regex.sub('nice', my_str)
print(ret)

The code above compiles the regular expression into a pattern object, so the regular expression does not have to be passed again to regex.sub. To switch from the functional style to a compiled object, call the method of the same name on the object instead of on re, and leave out the pattern argument.
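A minimal sketch of reusing one compiled pattern for several operations, on a made-up string. The search method of a compiled pattern also accepts an optional start position (pos), which is where the .pos attribute of the match object below comes from.

```python
import re

regex = re.compile(r'\d+')
text = 'order 66, room 101'

print(regex.search(text).group(0))     # 66
print(regex.findall(text))             # ['66', '101']
print(regex.sub('#', text))            # order #, room #
# Start searching at index 9, after "66,"
print(regex.search(text, 9).group(0))  # 101
```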

The match object in the re library

When the re library finds a match, it returns a match object, which has the following attributes and methods.

Attributes of the match object

  • .string: the text that was searched;
  • .re: the compiled pattern object used for the match;
  • .pos: the position in the text where the search started;
  • .endpos: the position in the text where the search ended.

The test code is as follows:

import re
my_str = '1Dream eraser is not 猹1good1good1'
regex = re.compile(r'g\w+d')
ret = regex.search(my_str)
print(ret)
print(ret.string)
print(ret.re)
print(ret.pos)
print(ret.endpos)

Output (\w+ is greedy, so the match runs on to the last d it can reach and covers good1good rather than just good):

<re.Match object; span=(23, 32), match='good1good'>
1Dream eraser is not 猹1good1good1
re.compile('g\\w+d')
0
33

Methods of the match object

  • .group(0): returns the matched string;
  • .start(): the position in the original string where the match starts;
  • .end(): the position in the original string where the match ends;
  • .span(): returns the tuple (.start(), .end()).

Because these methods are straightforward, they are not covered in further detail here.
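For quick reference, a short sketch of these methods on the sample string used earlier. Slicing the original string with start() and end() gives back the matched text.

```python
import re

my_str = '1Dream eraser is not 猹1good1good1'
ret = re.search(r'good', my_str)
if ret:
    print(ret.group(0))  # good
    print(ret.start())   # 23
    print(ret.end())     # 27
    print(ret.span())    # (23, 27)
    # Slicing with start/end recovers the matched text
    print(my_str[ret.start():ret.end()] == ret.group(0))  # True
```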

Summary of this blog post

This post covered Python's re library, focusing on the functions it provides; it does not go deeply into regular expression syntax itself.