Background

In day-to-day development, string concatenation is unavoidable: joining the elements of an array into a sentence with some punctuation, combining two fields into one sentence, and so on.

Go offers many ways to concatenate strings. The most common are concatenating directly with the “+” operator, joining a slice with strings.Join, and assembling data with fmt.Sprintf.

This raises a question: how can we concatenate strings efficiently, and how do the different methods affect performance in high-concurrency production scenarios?

Below, I’ll test several string concatenation methods that are common in the Go language to see how well each one performs.

0 Preparations

To measure the actual effect of each method, this article uses Go benchmarks. Only a brief introduction is given here; a separate article will cover benchmarks in detail.

Benchmarking is built into Go's testing package. With it we can quickly measure the performance of a function in serial and parallel settings: given a time budget (1 second by default), it reports how many times the function under test ran and how much memory was allocated.

Common benchmark APIs include the following:

// Start the timer
b.StartTimer() 
// Stop the timer
b.StopTimer() 
// Reset the timer
b.ResetTimer()
b.Run(name string, f func(b *B))
b.RunParallel(body func(*PB))
b.ReportAllocs()
b.SetParallelism(p int)
b.SetBytes(n int64)
testing.Benchmark(f func(b *B)) BenchmarkResult

This article mainly uses the following three:

b.StartTimer()   // Start the timer
b.StopTimer()    // Stop the timer
b.ResetTimer()   // Reset the timer

After writing the test file, run go test -bench=. -benchmem to execute the benchmarks and display memory allocation statistics.
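As a rough illustration of how these timer calls fit together, the sketch below runs a benchmark function programmatically with testing.Benchmark, so it works as a standalone program. The helper buildInput and the name benchJoin are made up for this example and are not part of the test file used in this article.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// buildInput is a hypothetical setup helper whose cost should not be measured.
func buildInput(n int) []string {
	parts := make([]string, n)
	for i := range parts {
		parts[i] = "go"
	}
	return parts
}

// benchJoin has the same shape as a BenchmarkXxx function in a _test.go file.
func benchJoin(b *testing.B) {
	parts := buildInput(100) // setup work, excluded by ResetTimer below
	b.ReportAllocs()
	b.ResetTimer() // discard the time and allocations spent so far
	for i := 0; i < b.N; i++ {
		_ = strings.Join(parts, "")
	}
	b.StopTimer() // anything after this point is not timed
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	res := testing.Benchmark(benchJoin)
	fmt.Printf("iterations=%d ns/op=%d allocs/op=%d\n",
		res.N, res.NsPerOp(), res.AllocsPerOp())
}
```

In a real test file the same body would live in a function named BenchmarkXxx and be run with the go test command shown above.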

1 Build test cases

The test file has a global slice that serves as the source data set for concatenation.

var StrData = []string{"Efficient Concatenation of Strings in Go"}

The init function then enlarges this global slice, so we can compare how concatenating a large slice differs from concatenating a small one.

func init() {
    for i := 0; i < 200; i++ {
        StrData = append(StrData, "Efficient Concatenation of Strings in Go")
    }
}

1.1 “+” direct splicing

func StringsAdd() string {
    var s string
    for _, v := range StrData {
        s += v
    }
    return s
}
// Test method
func BenchmarkStringsAdd(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        StringsAdd()
    }
    b.StopTimer()
}

1.2 Use the fmt package for assembly

func StringsFmt() string {
    var s string = fmt.Sprint(StrData)
    return s
}

// Test method
func BenchmarkStringsFmt(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        StringsFmt()
    }
    b.StopTimer()
}

1.3 Use the Join method of the strings package

func StringsJoin() string {
    var s string = strings.Join(StrData, "")
    return s
}

// Test method
func BenchmarkStringsJoin(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        StringsJoin()
    }
    b.StopTimer()
}

1.4 Using bytes.Buffer stitching

func StringsBuffer() string {
    var s bytes.Buffer
    for _, v := range StrData {
        s.WriteString(v)
    }
    return s.String()
}
// Test method
func BenchmarkStringsBuffer(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        StringsBuffer()
    }
    b.StopTimer()
}

1.5 Use strings.Builder for stitching

func StringsBuilder() string {
    var b strings.Builder
    for _, v := range StrData {
        b.WriteString(v)
    }
    return b.String()
}
// Test method
func BenchmarkStringsBuilder(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        StringsBuilder()
    }
    b.StopTimer()
}

2 Test results and analysis

2.1 Run the test using the Benchmark and view the result

Next, run go test -bench=. -benchmem to get the benchmark results.

From the test results, we can draw some preliminary conclusions:

  • Direct “+” concatenation is the most time-consuming and memory-consuming operation. Memory is allocated on every iteration within the b.N loop.
  • strings.Join completes the most iterations, which means it takes the least time per operation. It also has the fewest allocations and the least memory allocated per iteration.
  • Concatenating with strings.Builder allocates memory 13 extra times per iteration and shows no significant performance advantage, even though Go added this type in 1.10 and has been adopting it gradually.

These conclusions seem plausible, but are they correct? Let's take our time and go through the five methods one by one, checking whether each benchmark figure matches expectations.

2.2 Result analysis of “+” splicing

In the test case above, the init loop appends 200 elements to a slice that already holds one element, giving a slice of length 201, so each call concatenates 201 strings in a loop. The last column of the benchmark output shows 200 “extra” memory allocations, excluding the initial one, which matches what we know: concatenating strings with “+” reallocates memory each time.

With “+” concatenation, the average memory allocated and the time taken per operation are the largest, so the total number of executions is the lowest.
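The allocation count described above can also be checked without a full benchmark run. The sketch below uses testing.AllocsPerRun for that; makeData, concatPlus, and plusAllocs are names invented for this example, and the exact counts may vary with the Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result alive so the compiler cannot discard the work.
var sink string

// makeData returns n copies of the sample sentence used in this article.
func makeData(n int) []string {
	data := make([]string, n)
	for i := range data {
		data[i] = "Efficient Concatenation of Strings in Go"
	}
	return data
}

// concatPlus joins the elements with the + operator, like StringsAdd.
func concatPlus(data []string) string {
	var s string
	for _, v := range data {
		s += v
	}
	return s
}

// plusAllocs reports the average number of heap allocations per call.
func plusAllocs(n int) float64 {
	data := makeData(n)
	return testing.AllocsPerRun(100, func() {
		sink = concatPlus(data)
	})
}

func main() {
	for _, n := range []int{2, 50, 201} {
		fmt.Printf("len=%d allocs/op=%.0f\n", n, plusAllocs(n))
	}
}
```

The count should grow roughly in step with the slice length, because every += builds a new, longer string.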

So, what’s the difference if we do large text concatenation versus small text concatenation? I’ll test small text concatenation later.

2.3 Result analysis of fmt package stitching

We see that fmt concatenation is second only to “+” concatenation, but it allocates memory more than 200 times, which is a little strange. What causes the additional allocations? Is it 3 extra per iteration? Let’s do an experiment:

Let’s change the slice size to 1 to see if additional memory allocation exists, again using benchmark:

We then change the slice length to 2 and check the benchmark result:

After several tests, we found that each iteration does have 3 extra memory allocations. So where do these 3 extra allocations come from?

We write the memory profile of the benchmark run to a file and view it with pprof, using the following commands:

# Collect data with a 3-second benchmark run and generate the profile file
go test -bench=. -benchmem -benchtime=3s -memprofile=mem_profile.out
go tool pprof -http="127.0.0.1:8080" mem_profile.out

After execution, the default browser opens a web interface showing the collected data. Click through the items marked by the red boxes in the picture.

The final URL is: http://127.0.0.1:8080/ui/top?si=alloc_space

At this point, we see the following:

As we can see, these three memory allocations occur inside the fmt.Sprint method.
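The experiment with slices of different lengths can be repeated with a small standalone program. The sketch below is one possible way to do it, using testing.AllocsPerRun; the helper names are invented here, and the counts it prints depend on the Go version. It also prints what fmt.Sprint returns for a slice, which is worth knowing when reading the fmt numbers: the slice is formatted as a whole, with brackets and spaces, so the output is not the plain concatenation the other methods produce.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results alive so the measured calls are not optimized away.
var sink string

// makeData returns n copies of the sample sentence used in this article.
func makeData(n int) []string {
	data := make([]string, n)
	for i := range data {
		data[i] = "Efficient Concatenation of Strings in Go"
	}
	return data
}

// sprintShape shows how fmt.Sprint renders a slice of strings.
func sprintShape(parts []string) string {
	return fmt.Sprint(parts)
}

// sprintAllocs reports the average heap allocations of one fmt.Sprint
// call on a slice of length n.
func sprintAllocs(n int) float64 {
	data := makeData(n)
	return testing.AllocsPerRun(100, func() {
		sink = fmt.Sprint(data)
	})
}

func main() {
	fmt.Println(sprintShape([]string{"a", "b", "c"})) // [a b c]
	for _, n := range []int{1, 2, 201} {
		fmt.Printf("len=%d allocs/op=%.0f\n", n, sprintAllocs(n))
	}
}
```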

2.4 Result analysis of strings.Join stitching

In the tests above, strings.Join performs best, with the lowest time and memory consumption and only one extra memory allocation. Let’s look at the implementation of strings.Join.

// Join concatenates the elements of its first argument to create a single string. The separator
// string sep is placed between elements in the resulting string.
func Join(elems []string, sep string) string {
    switch len(elems) {
    case 0:
        return ""
    case 1:
        return elems[0]
    }
    n := len(sep) * (len(elems) - 1)
    for i := 0; i < len(elems); i++ {
        n += len(elems[i])
    }

    var b Builder
    b.Grow(n)
    b.WriteString(elems[0])
    for _, s := range elems[1:] {
        b.WriteString(sep)
        b.WriteString(s)
    }
    return b.String()
}

As line 15 shows, Join itself uses the Builder type from the strings package. Later, I will separately compare Join with a strings.Builder method that I wrote myself.

2.5 Result analysis of bytes.Buffer stitching

bytes.Buffer is a variable-sized buffer of bytes. Its structure is defined as follows:

// A Buffer is a variable-sized buffer of bytes with Read and Write methods.
// The zero value for Buffer is an empty buffer ready to use.
type Buffer struct {
    buf      []byte // contents are the bytes buf[off : len(buf)]
    off      int    // read at &buf[off], write at &buf[len(buf)]
    lastRead readOp // last read operation, so that Unread* can work correctly.
}

Before Go 1.10, bytes.Buffer was definitely the more efficient option. A var b bytes.Buffer holds the final concatenated content, which avoids reallocating an intermediate string on every concatenation. However, the []byte -> string conversion and the memory copy it requires remain a cost.
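A small sketch of that remaining cost, assuming only the standard bytes and testing packages: it fills a buffer once and then measures the allocations caused by String() alone. The function names are invented for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// sink keeps the converted string alive so the call is not optimized away.
var sink string

// bufferConcat writes every element into a bytes.Buffer, like StringsBuffer.
func bufferConcat(data []string) string {
	var buf bytes.Buffer
	for _, v := range data {
		buf.WriteString(v)
	}
	return buf.String() // copies the buffered bytes into a new string
}

// stringCallAllocs measures the allocations of String() on a filled buffer,
// leaving out the cost of the writes themselves.
func stringCallAllocs() float64 {
	var buf bytes.Buffer
	for i := 0; i < 201; i++ {
		buf.WriteString("Efficient Concatenation of Strings in Go")
	}
	return testing.AllocsPerRun(100, func() {
		sink = buf.String()
	})
}

func main() {
	fmt.Println(bufferConcat([]string{"a", "b"}))
	fmt.Printf("allocs per String() call: %.0f\n", stringCallAllocs())
}
```

Each String() call has to allocate, because the returned string must not share memory with a buffer that can still be written to.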

2.6 Result analysis of strings.Builder stitching

Go 1.10 officially introduced strings.Builder, which can greatly improve the efficiency of string concatenation. Part of its code is shown below:

// A Builder is used to efficiently build a string using Write methods.
// It minimizes memory copying. The zero value is ready to use.
// Do not copy a non-zero Builder.
type Builder struct {
    addr *Builder // of receiver, to detect copies by value
    buf  []byte
}

func (b *Builder) copyCheck() {
    if b.addr == nil {
        // This hack works around a failing of Go's escape analysis
        // that was causing b to escape and be heap allocated.
        // See issue 23382.
        // TODO: once issue 7921 is fixed, this should be reverted to
        // just "b.addr = b".
        b.addr = (*Builder)(noescape(unsafe.Pointer(b)))
    } else if b.addr != b {
        panic("strings: illegal use of non-zero Builder copied by value")
    }
}

// String returns the accumulated string.
func (b *Builder) String() string {
    return *(*string)(unsafe.Pointer(&b.buf))
}


// WriteString appends the contents of s to b's buffer.
// It returns the length of s and a nil error.
func (b *Builder) WriteString(s string) (int, error) {
    b.copyCheck()
    b.buf = append(b.buf, s...)
    return len(s), nil
}

// grow copies the buffer to a new, larger buffer so that there are at least n
// bytes of capacity beyond len(b.buf).
func (b *Builder) grow(n int) {
    buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
    copy(buf, b.buf)
    b.buf = buf
}

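To avoid the []byte -> string conversion and memory copy seen with bytes.Buffer, strings.Builder uses an unsafe.Pointer conversion to turn buf []byte directly into a string, which also avoids an extra memory allocation. The library also implements a copyCheck method, which detects copies of a non-zero Builder and contains a fairly hacky workaround that keeps b from escaping to the heap.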
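The effect of that pointer conversion can be seen in a small sketch. The helper bytesToString below is written for this example, modeled on the String method shown above, and is not a library function. The bytes are modified afterwards only to make the sharing visible; real code must never do this, since strings are expected to be immutable.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets the slice header as a string header without
// copying, the same trick Builder.String uses in the code above.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// demo returns a zero-copy view and an ordinary copy of the same bytes,
// after the underlying bytes have been modified.
func demo() (shared, copied string) {
	b := []byte("hello")
	shared = bytesToString(b) // shares b's backing array
	copied = string(b)        // allocates and copies
	b[0] = 'J'                // change the original bytes
	return shared, copied
}

func main() {
	shared, copied := demo()
	fmt.Println("zero-copy view:", shared)
	fmt.Println("regular copy:  ", copied)
}
```

The zero-copy view follows the change to the byte slice while the regular copy does not, which is why Builder only hands out its buffer this way and forbids copying a non-zero Builder. Newer Go releases provide unsafe.String and unsafe.SliceData for the same kind of conversion.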

As mentioned earlier, concatenating with strings.Join is essentially strings.Builder under the hood.

3 Comparison between strings.Builder and strings.Join

To compare the efficiency of the two, here is the code of both methods again.

Key code of strings.Join:

func Join(elems []string, sep string) string {
    switch len(elems) {
    case 0:
        return ""
    case 1:
        return elems[0]
    }
    n := len(sep) * (len(elems) - 1)
    for i := 0; i < len(elems); i++ {
        n += len(elems[i])
    }

    var b Builder
    b.Grow(n)
    b.WriteString(elems[0])
    for _, s := range elems[1:] {
        b.WriteString(sep)
        b.WriteString(s)
    }
    return b.String()
}

My own strings.Builder concatenation method:

func StringsBuilder() string {

    var b strings.Builder
    for _, v := range StrData {
        b.WriteString(v)
    }

    return b.String()
}

Here I noticed the b.Grow(n) call on line 14 of Join, which allocates the initial capacity; the n computed above it is the total length of the strings to be concatenated. So I added the same preallocation to my own method for comparison.

func StringsBuilder() string {
    
    n := len("") * (len(StrData) - 1)
    for i := 0; i < len(StrData); i++ {
        n += len(StrData[i])
    }

    var b strings.Builder
    b.Grow(n)
    b.WriteString(StrData[0])
    for _, s := range StrData[1:] {
        b.WriteString("")
        b.WriteString(s)
    }
    return b.String()
}

Run benchmark again to check the test results
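The effect of the Grow call on allocation counts can also be checked in isolation. The following is a minimal sketch using testing.AllocsPerRun; buildPlain, buildPresized, and allocsOf are names chosen for this example and stand in for the two StringsBuilder variants above.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps results alive so the measured calls are not optimized away.
var sink string

// makeData returns n copies of the sample sentence used in this article.
func makeData(n int) []string {
	data := make([]string, n)
	for i := range data {
		data[i] = "Efficient Concatenation of Strings in Go"
	}
	return data
}

// buildPlain appends without reserving capacity first.
func buildPlain(data []string) string {
	var b strings.Builder
	for _, v := range data {
		b.WriteString(v)
	}
	return b.String()
}

// buildPresized reserves the final size with Grow before appending.
func buildPresized(data []string) string {
	n := 0
	for _, v := range data {
		n += len(v)
	}
	var b strings.Builder
	b.Grow(n)
	for _, v := range data {
		b.WriteString(v)
	}
	return b.String()
}

// allocsOf reports the average heap allocations per call of f on data.
func allocsOf(f func([]string) string, data []string) float64 {
	return testing.AllocsPerRun(100, func() { sink = f(data) })
}

func main() {
	data := makeData(201)
	fmt.Printf("plain:    %.0f allocs/op\n", allocsOf(buildPlain, data))
	fmt.Printf("presized: %.0f allocs/op\n", allocsOf(buildPresized, data))
}
```

The presized variant should report fewer allocations, since Grow reserves the whole buffer once instead of letting append enlarge it step by step.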

The result was not what I had expected at first: strings.Builder came out ahead. Why? On reflection, strings.Join() receives its data as parameters. Could parameter passing cause the difference? Next, I pass parameters to the StringsBuilder method as well.

func StringsBuilder(strData []string, sep string) string {
    
    n := len(sep) * (len(strData) - 1)
    for i := 0; i < len(strData); i++ {
        n += len(strData[i])
    }

    var b strings.Builder
    b.Grow(n)
    b.WriteString(strData[0])
    for _, s := range strData[1:] {
        b.WriteString(sep)
        b.WriteString(s)
    }
    return b.String()
}

Run the benchmark again to check the results. As you can see, the two now perform almost the same.

4 Small string test and analysis

The tests above all concatenate a fairly large slice. What if the slice is small? Which of the five methods is more efficient then? I chose a slice of length 2. The results show that the strings package methods still have a visible edge, but direct “+” concatenation comes close.
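For readers who want to rerun the small-slice comparison, here is one possible standalone sketch built on testing.Benchmark. The two-element input and all function names are assumptions made for this example; only three of the five methods are included to keep it short, and the numbers it prints depend on the machine and Go version.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps results alive so the measured calls are not optimized away.
var sink string

// pair is a hypothetical two-element input for the small-slice case.
var pair = []string{"Efficient Concatenation", " of Strings in Go"}

// plusTwo concatenates two strings with the + operator.
func plusTwo(a, b string) string { return a + b }

// joinTwo concatenates the slice with strings.Join and no separator.
func joinTwo(parts []string) string { return strings.Join(parts, "") }

// builderTwo concatenates the slice with a strings.Builder.
func builderTwo(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

// report benchmarks f and prints time and allocations per operation.
func report(name string, f func() string) {
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = f()
		}
	})
	fmt.Printf("%-8s %6d ns/op %3d allocs/op\n",
		name, res.NsPerOp(), res.AllocsPerOp())
}

func main() {
	report("plus", func() string { return plusTwo(pair[0], pair[1]) })
	report("join", func() string { return joinTwo(pair) })
	report("builder", func() string { return builderTwo(pair) })
}
```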

5 Conclusion

The above compares the efficiency of five common string concatenation methods. The officially recommended one is strings.Builder, but the right choice depends on the business scenario. For heavy string concatenation, prefer strings.Builder. When strings.Join is used to concatenate a slice, its efficiency is affected to some degree by the parameter passing involved.

Therefore, in the case of large string concatenation, the efficiency of the five ways is in descending order:

strings.Builder ≈ strings.Join > bytes.Buffer > “+” > fmt

This article is published by OpenWrite!