When writing a crawler, you often need to search the crawled content. Pattern matching with regular expressions is an elegant way to do this.

Go's standard library package, regexp, provides regular expression support.

Regexp syntax rules

To match a pattern directly, you can call the package-level functions regexp.Match and regexp.MatchString, or use the Regexp type and its methods.

regexp.Match

  • Definition:
func Match(pattern string, b []byte) (matched bool, err error)

This function reports whether the byte slice b contains any match of the regular expression pattern. It returns the match result and an error, which is non-nil if the pattern fails to compile.

  • Example:
matched, _ := regexp.Match("b", []byte("hello golang"))
fmt.Println(matched) // false
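
The example above discards the error. As a minimal sketch of handling it, the hypothetical helper containsPattern (not part of regexp) passes both return values through, so a malformed pattern can be told apart from a plain non-match:

```go
package main

import (
	"fmt"
	"regexp"
)

// containsPattern is a hypothetical helper: it reports whether data
// contains a match for pattern and passes the compile error through.
func containsPattern(pattern string, data []byte) (bool, error) {
	return regexp.Match(pattern, data)
}

func main() {
	ok, err := containsPattern("go(lang)?", []byte("hello golang"))
	fmt.Println(ok, err) // true <nil>

	// "a(b" has an unclosed group, so the pattern fails to compile.
	ok, err = containsPattern("a(b", []byte("ab"))
	fmt.Println(ok, err != nil) // false true
}
```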

regexp.MatchString

  • Definition:
func MatchString(pattern string, s string) (matched bool, err error)

This function reports whether the string s contains any match of the regular expression pattern. It returns the match result and an error, which is non-nil if the pattern fails to compile.

  • Example:
matched, _ := regexp.MatchString("b", "hello golang")
fmt.Println(matched) // false

Regexp type

  • Type definition:
type Regexp struct {
	// contains unexported fields
}

A Regexp is a compiled regular expression. It is safe for concurrent use by multiple goroutines.
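
As a rough illustration of concurrent use, the sketch below shares one compiled Regexp between several goroutines. The helper countAll, the pattern, and the inputs are made up for this example:

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// word is compiled once and shared by every goroutine below.
var word = regexp.MustCompile(`go\w*`)

// countAll is a hypothetical helper: it counts the matches in each input
// concurrently. Each goroutine writes only to its own slice element.
func countAll(inputs []string) []int {
	counts := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, s := range inputs {
		wg.Add(1)
		go func(i int, s string) {
			defer wg.Done()
			counts[i] = len(word.FindAllString(s, -1))
		}(i, s)
	}
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(countAll([]string{"go gopher golang", "nothing here", "gogo"})) // [3 0 1]
}
```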

A Regexp instance is usually obtained with the Compile or MustCompile function.

Compile

This function compiles the regular expression and returns the compiled Regexp, or an error if the expression is invalid. It is defined as:

func Compile(expr string) (*Regexp, error)
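
Compile is a natural fit when the pattern is not known in advance, for instance when it comes from user input. A minimal sketch, where firstMatch is a hypothetical helper written for this example:

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch is a hypothetical helper: it compiles expr and returns the
// first match in s, or an error if expr is not a valid pattern.
func firstMatch(expr, s string) (string, error) {
	re, err := regexp.Compile(expr)
	if err != nil {
		return "", fmt.Errorf("compile %q: %w", expr, err)
	}
	return re.FindString(s), nil
}

func main() {
	m, err := firstMatch(`g[a-z]+`, "hello golang")
	fmt.Println(m, err) // golang <nil>

	// The character class is never closed, so compilation fails.
	_, err = firstMatch(`[a-z`, "hello golang")
	fmt.Println(err != nil) // true
}
```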

MustCompile

This function is like Compile, but if the expression cannot be compiled it panics instead of returning an error.
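
MustCompile is commonly used for fixed patterns that are compiled once, for example in a package-level variable. The sketch below shows that usage and, purely for illustration, uses a hypothetical helper compilePanics to observe the panic for an invalid pattern:

```go
package main

import (
	"fmt"
	"regexp"
)

// A fixed pattern is typically compiled once at package level.
var digits = regexp.MustCompile(`[0-9]+`)

// compilePanics is a hypothetical helper that reports whether
// MustCompile panics for expr.
func compilePanics(expr string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	_ = regexp.MustCompile(expr)
	return false
}

func main() {
	fmt.Println(digits.FindString("page 42 of 100")) // 42
	fmt.Println(compilePanics(`[0-9]+`))             // false
	fmt.Println(compilePanics(`[0-9`))               // true
}
```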

Regexp Common matching methods

Regexp has a number of matching methods. Some of the more common ones are listed below.

Find

This method returns the leftmost match in b as a []byte, or nil if there is no match.

  • Defined as:
func (re *Regexp) Find(b []byte) []byte
  • Example:
re := regexp.MustCompile("a")
match := re.Find([]byte("hello golang"))
fmt.Println(string(match)) // a

FindAll

This method is the All version of Find. It returns a slice of all successive matches.

  • Defined as:
func (re *Regexp) FindAll(b []byte, n int) [][]byte
  • The n argument limits the length of the returned slice:

    • If n < 0, all matches are returned.

    • If n >= 0 and n <= the total number of matches, n results are returned.

    • If n > the total number of matches, all results are returned.

  • Example:

re := regexp.MustCompile("l[a-z]")
match := re.FindAll([]byte("hello world, hello golang"), -1)
for _, m := range match {
	fmt.Println(string(m))
}
// ll
// ld
// ll
// la
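
To make the effect of n concrete, the sketch below counts the matches returned for several values of n on the same input. The helper matchCount is hypothetical and exists only for this example:

```go
package main

import (
	"fmt"
	"regexp"
)

var pair = regexp.MustCompile("l[a-z]")

// matchCount is a hypothetical helper: it returns how many matches
// FindAll yields for the sample text with the given n.
func matchCount(n int) int {
	return len(pair.FindAll([]byte("hello world, hello golang"), n))
}

func main() {
	fmt.Println(matchCount(-1)) // 4: all matches
	fmt.Println(matchCount(2))  // 2: limited to n
	fmt.Println(matchCount(10)) // 4: fewer matches than n
	fmt.Println(matchCount(0))  // 0: FindAll returns nil
}
```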

FindString

This method returns the leftmost matching string, or an empty string if there is no match.

  • Defined as:
func (re *Regexp) FindString(s string) string
  • Example:
re := regexp.MustCompile("l[a-z]")
match := re.FindString("hello world, hello golang")
fmt.Println(match) // ll

FindAllString

  • Defined as:
func (re *Regexp) FindAllString(s string, n int) []string

This method is the All version of FindString. It returns a slice of matches, limited by the n argument (see FindAll), or nil if there is no match.

  • Example:
re := regexp.MustCompile("l[a-z]")
match := re.FindAllString("hello world, hello golang", -1)
for _, m := range match {
	fmt.Println(m)
}
// ll
// ld
// ll
// la

FindIndex

This method returns the location of the leftmost match in b.

  • Defined as:
func (re *Regexp) FindIndex(b []byte) (loc []int)

loc[0] is the start position of the match and loc[1] is the end position plus one, so the match itself is b[loc[0]:loc[1]]. If no match is found, nil is returned.

  • Example:
re := regexp.MustCompile("l[a-z]")
match := re.FindIndex([]byte("hello world, hello golang"))
fmt.Println(match) // [2 4]

FindAllIndex

This method is the All version of FindIndex and determines the number of results to return based on the n argument.

  • Defined as:
func (re *Regexp) FindAllIndex(b []byte, n int) [][]int

See FindAll for the usage of n.

  • Example:
re := regexp.MustCompile("l[a-z]")
match := re.FindAllIndex([]byte("hello world, hello golang"), -1)
for _, m := range match {
	fmt.Println(m)
}
// [2 4]
// [9 11]
// [15 17]
// [21 23]

FindStringIndex

This method does the same thing as FindIndex, except that it takes a string instead of a byte slice.

  • Defined as:
func (re *Regexp) FindStringIndex(s string) (loc []int)
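
A short sketch of how the returned location might be used to slice the original string. The helper locate is hypothetical; it also shows the nil check for the no-match case:

```go
package main

import (
	"fmt"
	"regexp"
)

var pair = regexp.MustCompile("l[a-z]")

// locate is a hypothetical helper: it returns the text of the first
// match in s, and false when FindStringIndex reports no match (nil).
func locate(s string) (string, bool) {
	loc := pair.FindStringIndex(s)
	if loc == nil {
		return "", false
	}
	return s[loc[0]:loc[1]], true
}

func main() {
	fmt.Println(pair.FindStringIndex("hello world, hello golang")) // [2 4]
	fmt.Println(locate("hello world, hello golang"))               // ll true
	fmt.Println(locate("xyz"))                                     // empty string and false
}
```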

FindAllStringIndex

This method is the All version of FindStringIndex.

  • Defined as:
func (re *Regexp) FindAllStringIndex(s string, n int) [][]int
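
As an illustration, the sketch below collects every match location in a string and uses each location to slice out the matched text. The helper spans is hypothetical:

```go
package main

import (
	"fmt"
	"regexp"
)

var pair = regexp.MustCompile("l[a-z]")

// spans is a hypothetical helper returning the [start end) location
// of every match in s.
func spans(s string) [][]int {
	return pair.FindAllStringIndex(s, -1)
}

func main() {
	s := "hello world, hello golang"
	fmt.Println(spans(s)) // [[2 4] [9 11] [15 17] [21 23]]

	// Each location can be used to slice the original string.
	for _, loc := range spans(s) {
		fmt.Println(s[loc[0]:loc[1]])
	}
}
```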

FindStringSubmatch

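This method returns a slice of strings holding the leftmost match followed by the text matched by each capture group (subexpression). It returns nil if there is no match.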

  • Defined as:
func (re *Regexp) FindStringSubmatch(s string) []string

It may be difficult to understand how this method works from the description alone. Here is an example:

re := regexp.MustCompile(`(aaa)bb(c)`)
fmt.Printf("%q\n", re.FindStringSubmatch("aaabbc"))

The return result is:

["aaabbc" "aaa" "c"]

FindAllStringSubmatch

This method is the All version of FindStringSubmatch.

  • Defined as:
func (re *Regexp) FindAllStringSubmatch(s string, n int) [][]string
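
A sketch of one possible use: extracting every key=value pair from a string. The pattern, the input, and the helper parsePairs are made up for this example:

```go
package main

import (
	"fmt"
	"regexp"
)

var kv = regexp.MustCompile(`(\w+)=(\d+)`)

// parsePairs is a hypothetical helper: for every match, m[0] is the
// whole match, m[1] the key group and m[2] the value group.
func parsePairs(s string) map[string]string {
	out := make(map[string]string)
	for _, m := range kv.FindAllStringSubmatch(s, -1) {
		out[m[1]] = m[2]
	}
	return out
}

func main() {
	s := "a=1, b=22, c=333"
	fmt.Printf("%q\n", kv.FindAllStringSubmatch(s, -1))
	// [["a=1" "a" "1"] ["b=22" "b" "22"] ["c=333" "c" "333"]]
	fmt.Println(parsePairs(s)["b"]) // 22
}
```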

Match

This method reports whether the byte slice b contains any match of the regular expression.

  • Definition:
func (re *Regexp) Match(b []byte) bool
  • Example:
re := regexp.MustCompile(`hello`)
match := re.Match([]byte("hello everyone"))
fmt.Println(match) // true

MatchString

This method reports whether the string s contains any match of the regular expression.

  • Definition:
func (re *Regexp) MatchString(s string) bool
  • Example:
re := regexp.MustCompile(`hello`)
match := re.MatchString("hello everyone")
fmt.Println(match) // true

ReplaceAll

  • Defined as:
func (re *Regexp) ReplaceAll(src, repl []byte) []byte

This method returns a copy of src in which all matches are replaced by repl.

  • Example:
re := regexp.MustCompile(`hello`)
match := re.ReplaceAll([]byte("hello everyone"), []byte("hi!"))
fmt.Println(string(match)) // hi! everyone

ReplaceAllString

  • Defined as:
func (re *Regexp) ReplaceAllString(src, repl string) string

This method returns a copy of src in which all matches are replaced by repl.

  • Example:
re := regexp.MustCompile(`hello`)
match := re.ReplaceAllString("hello everyone", "hi!")
fmt.Println(match) // hi! everyone

Split

  • Defined as:
func (re *Regexp) Split(s string, n int) []string

This method splits s into substrings, using each match of the expression as a separator, and returns a slice of the substrings between those matches.

  • Example:
re := regexp.MustCompile(`a`)
s := re.Split("abacadaeafff", -1)
fmt.Printf("%q\n", s) // ["" "b" "c" "d" "e" "fff"]
  • The n argument controls the length of the returned slice:

    • n > 0: returns at most n substrings; the last one is the unsplit remainder

    • n == 0: returns nil

    • n < 0: returns all substrings
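
The three cases can be sketched on the same input as in the example above. The helper splitOnA is hypothetical and only wraps Split for this illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

var sep = regexp.MustCompile(`a`)

// splitOnA is a hypothetical helper: it splits the sample string on
// every "a", passing n through to Split.
func splitOnA(n int) []string {
	return sep.Split("abacadaeafff", n)
}

func main() {
	fmt.Printf("%q\n", splitOnA(2)) // ["" "bacadaeafff"]
	fmt.Printf("%q\n", splitOnA(3)) // ["" "b" "cadaeafff"]
	fmt.Println(splitOnA(0) == nil) // true
	fmt.Println(len(splitOnA(-1)))  // 6
}
```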

Regular expression syntax rules

Character   Description
^           Matches the beginning of the string
$           Matches the end of the string
*           Matches the preceding subexpression zero or more times
+           Matches the preceding subexpression one or more times
?           Matches the preceding subexpression zero or one time
{n}         Matches exactly n times
{n,}        Matches at least n times
{n,m}       Matches at least n times and at most m times
?           When it follows *, +, ?, {n}, {n,} or {n,m}, makes the match non-greedy
.           Matches any single character except "\n"
x|y         Matches x or y
[xyz]       Matches any one of the listed characters
[^xyz]      Matches any character that is not listed
[a-z]       Matches any character in the specified range
[^a-z]      Matches any character that is not in the specified range
\b          Matches a word boundary
\B          Matches a non-word boundary
\d          Matches a digit character
\D          Matches a non-digit character
\f          Matches a form feed character
\n          Matches a newline character
\r          Matches a carriage return
\s          Matches any whitespace character
\S          Matches any non-whitespace character
\t          Matches a tab character
\v          Matches a vertical tab character
\w          Matches any word character, including the underscore
\W          Matches any non-word character
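
A few of these rules can be checked directly in Go. The sketch below compares greedy and non-greedy matching, anchors with a repeat count, and word boundaries. The patterns and inputs are made up for this example:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	greedy = regexp.MustCompile(`a.*b`)          // * takes as much as possible
	lazy   = regexp.MustCompile(`a.*?b`)         // *? takes as little as possible
	phone  = regexp.MustCompile(`^\d{3}-\d{4}$`) // anchors, \d and {n}
	wordGo = regexp.MustCompile(`\bgo\b`)        // "go" as a whole word
)

func main() {
	fmt.Println(greedy.FindString("aXbYb")) // aXbYb
	fmt.Println(lazy.FindString("aXbYb"))   // aXb

	fmt.Println(phone.MatchString("555-1234"))  // true
	fmt.Println(phone.MatchString("x555-1234")) // false

	fmt.Println(wordGo.FindAllString("go golang", -1)) // [go]
}
```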