Yesterday a colleague who was reading the k8s source code suddenly asked a seemingly simple question: what does the official documentation at golang.org/pkg/regexp/… actually mean in its description of ReplaceAllString, and how does the function work?

Official English text:

func (re *Regexp) ReplaceAllString(src, repl string) string

ReplaceAllString returns a copy of src, replacing matches of the Regexp with the replacement string repl. Inside repl, $ signs are interpreted as in Expand, so for instance $1 represents the text of the first submatch.


Chinese document:

ReplaceAllString returns a copy of src in which every match of re has been replaced with repl. During replacement, '$' signs in repl are interpreted and expanded according to the rules of the Expand method; for example, $1 is replaced by the text matched by the first group.

After reading both versions I was still confused about how this function works.

So let's go back to the official example:

Example:
re := regexp.MustCompile("a(x*)b")
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))

Output:

-T-T-
--xx-
---
-W-xxW-

The first substitution is easy enough to follow: every part of -ab-axxb- that matches the regular expression is replaced with T. But what does the $ in the second call mean? $1 looks like it refers to the first group of the regular expression, but then what about $1W? And ${1}W? With these questions in mind, I started digging into how this function works.

First, the $ sign is explained in the documentation of the Expand function:

func (re *Regexp) Expand(dst []byte, template []byte, src []byte, match []int) []byte

Expand appends template to dst and returns the resulting slice. While appending, Expand replaces the variables in the template with the corresponding matches drawn from src. The match slice should hold the start and end indexes returned by FindSubmatchIndex.

In the template, a variable is written as $name or ${name}, where name is a non-empty sequence of letters, digits, and underscores. A purely numeric name such as $1 refers to the capture group with that index; any other name refers to a capture group declared with the (?P<name>...) syntax. A reference to an out-of-range index, to a group that did not match any text, or to a name that does not appear in the regular expression is replaced with an empty slice.

In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and $10 is equivalent to ${10}, not ${1}0. So $name is only safe when it is followed by a character that cannot be part of a name, such as a space or a newline, while ${name} works in every case. To insert a literal '$' into the output, use $$ in the template.
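The rules above can be checked directly against the string variant of Expand. The following is only a minimal sketch: the helper name expand and the sample input "-axxb-" are chosen here for illustration and are not part of the regexp package or of the original example.

```go
package main

import (
	"fmt"
	"regexp"
)

// expand is a hypothetical helper: it applies tmpl to the first match of
// a(x*)b in "-axxb-" and returns only the expanded template text.
func expand(tmpl string) string {
	re := regexp.MustCompile(`a(x*)b`)
	src := "-axxb-"
	// FindStringSubmatchIndex returns the index pairs that ExpandString expects.
	match := re.FindStringSubmatchIndex(src)
	return string(re.ExpandString(nil, tmpl, src, match))
}

func main() {
	// "$1"    -> group 1
	// "$1W"   -> a group named "1W", which does not exist, so empty
	// "${1}W" -> group 1 followed by a literal W
	// "$$1"   -> a literal dollar sign followed by 1
	for _, tmpl := range []string{"$1", "$1W", "${1}W", "$$1"} {
		fmt.Printf("%-6s -> %q\n", tmpl, expand(tmpl))
	}
}
```

This is why the third line of the official example prints ---: $1W is read as a reference to a group called 1W, and an unknown group expands to nothing.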

That is a lot of text, but the essentials can be summarized in a few points:

  1. $ followed only by a number refers to the capture group of the regular expression with that index.

Capture groups are numbered by counting their opening parentheses from left to right. For example, the expression (A)(B(C)) contains four groups:

0 (A)(B(C))
1 (A)
2 (B(C))
3 (C)

Group zero always represents the entire expression.

Capture groups are so named because, during a match, each subsequence of the input that matches such a group is saved. In Go, the captured text can then be referenced from a replacement template (for example as $1) or retrieved after the match with the FindSubmatch family of methods; Go's regexp syntax does not support back references inside the pattern itself.

In other words, $1 stands for the text captured by group 1: that text is kept and the rest of the match is discarded.

  2. $ followed by a string, i.e. $name, refers to the capture group given that name with the (?P<name>...) syntax.
  3. ${number} followed by a string, e.g. ${1}xxx, refers to group 1 of the regular expression: the part of the match in src captured by group 1 is kept, the rest of the match is dropped, and xxx is appended. This is the hardest case to understand, and the code examples later on walk through it.
  4. In the simplest case, the repl parameter is a plain string, and every match of re in src is replaced with repl.
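The named-group form is the one case the later examples do not exercise, so here is a rough sketch of it. The group names year, month, and day and the helper rewrite are made up for this illustration; only (?P<name>...) and ReplaceAllString come from the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRe names its three groups; they can still be referenced as $1, $2, $3.
var dateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// rewrite is a hypothetical helper that applies repl to a fixed sample input.
func rewrite(repl string) string {
	return dateRe.ReplaceAllString("2019-12-01,test", repl)
}

func main() {
	// Numeric reference to a named group.
	fmt.Println(rewrite("$3")) // 01,test
	// Named references; "/" cannot be part of a name, so it ends each one.
	fmt.Println(rewrite("$day/$month/$year")) // 01/12/2019,test
	// Braces separate the name from the literal "s" that follows.
	fmt.Println(rewrite("${year}s")) // 2019s,test
	// Without braces the name is read as "years", which does not exist.
	fmt.Println(rewrite("$years")) // ,test
	// A plain string simply replaces the whole match.
	fmt.Println(rewrite("DATE")) // DATE,test
}
```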

The following code illustrates the cases above:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	s := "Hello World, 123 Go!"
	// reg matches either "Hello" or "Go"; group 1 captures "Hell" or "G"
	reg := regexp.MustCompile(`(Hell|G)o`)

	s2 := "2019-12-01,test"
	// reg2 matches a date in YYYY-MM-DD format; groups 1 to 3 capture year, month, and day
	reg2 := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)

	// Simplest case: replace every match of "a(x*)b" in "-ab-axxb-" with "T"
	reg3 := regexp.MustCompile("a(x*)b")
	fmt.Println(reg3.ReplaceAllString("-ab-axxb-", "T"))

	// ${1} keeps the part of each match captured by (Hell|G), drops the
	// trailing "o" of "Hello" and "Go", and appends "ddd"
	rep1 := "${1}ddd"
	fmt.Printf("%q\n", reg.ReplaceAllString(s, rep1))

	// The part of s2 that matches reg2 is "2019-12-01"; ${1} keeps the "2019"
	// captured by the first group (\d{4}) and drops the rest of the match
	rep2 := "${1}"
	fmt.Printf("%q\n", reg2.ReplaceAllString(s2, rep2))

	// Same match; ${2} keeps the "12" captured by the second group (\d{2})
	// and drops the rest of the match
	rep3 := "${2}"
	fmt.Printf("%q\n", reg2.ReplaceAllString(s2, rep3))

	// Same match; ${3} keeps the "01" captured by the third group (\d{2}),
	// drops the rest of the match, and appends ":13:30:12"
	rep4 := "${3}:13:30:12"
	fmt.Printf("%q\n", reg2.ReplaceAllString(s2, rep4))
}

The output of the above code is, in order:

$ go run main.go
-T-T-
"Hellddd World, 123 Gddd!"
"2019,test"
"12,test"
"01:13:30:12,test"


Conclusion

The design of ReplaceAllString in Go's regexp package is rather unintuitive, and it feels awkward both to understand and to use. If you have a better explanation or better sample code, let me know!