How to efficiently concatenate strings in Go
Preface
Hello, everyone, I’m Asong
String concatenation is an everyday part of business development, and every language has its own ways of doing it. In Go there are six ways to concatenate strings. Which of them is the most efficient? In this article we will analyze them together.
This article uses Go language version 1.17.1
The string type
Let’s first look at how string is defined in Go. Here is the official definition:
// string is the set of all strings of 8-bit bytes, conventionally but not
// necessarily representing UTF-8-encoded text. A string may be empty, but
// not nil. Values of string type are immutable.
type string string
A string is a collection of 8-bit bytes, usually but not necessarily representing UTF-8 encoded text. A string can be empty, but it cannot be nil. The value of a string cannot be changed.
At runtime, a string is essentially a struct, defined as follows:
type stringStruct struct {
	str unsafe.Pointer
	len int
}
stringStruct is similar to a slice: str points to the beginning of an array and len holds the length of that array. What kind of array does str point to? Let’s look at the function that is called when a string is instantiated:
//go:nosplit
func gostringnocopy(str *byte) string {
	ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
	s := *(*string)(unsafe.Pointer(&ss))
	return s
}
The input parameter is a pointer to a byte, which tells us that a string is backed by an array of bytes.
In Go, strings are designed to be immutable, as they are in many other languages. In a concurrent scenario, this means the same string can be used many times without locking, so it can be shared efficiently without worrying about safety.
The contents of a string cannot be modified, but a string variable can be replaced: the str pointer in a stringStruct can be pointed at a different byte array, while the bytes it points to cannot be changed. In other words, every time a string is changed, new memory has to be allocated, and the previously allocated space is reclaimed by the GC.
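As a small illustration of this immutability, here is a minimal sketch. The helper name upperFirst is hypothetical and not part of the original article. It shows that "changing" a string really means copying its bytes and building a new string, while the original value stays untouched.

```go
package main

import "fmt"

// upperFirst is a hypothetical helper: it returns a copy of s whose first
// byte is upper-cased (ASCII only). The original string is never modified.
func upperFirst(s string) string {
	if s == "" {
		return s
	}
	b := []byte(s) // the conversion copies the bytes of s
	if b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A'
	}
	return string(b) // the conversion back allocates a new string
}

func main() {
	s := "asong"
	// s[0] = 'A' // does not compile: string bytes cannot be assigned to
	t := upperFirst(s)
	fmt.Println(s, t) // prints "asong Asong"
}
```

Both conversions allocate, which is the cost the rest of the article tries to keep under control.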
That is all we need to know about the string type in order to analyze string concatenation.
Six ways to concatenate strings and how they work
Native concatenation with “+”
The Go language supports direct concatenation of two strings using the + operator, as shown in the following example:
var s string
s += "asong"
s += "Handsome"
This is the easiest way to concatenate and is available in almost every language. Concatenation with the + operator walks over the operand strings, computes the total length, and allocates a new block of memory to hold the two original strings.
String formatting function fmt.Sprintf
Go uses the fmt.Sprintf function for string formatting, so string concatenation can also be done this way:
str := "asong"
str = fmt.Sprintf("%s%s", str, str)
fmt.Sprintf is implemented mainly with reflection. A detailed walk through the source is left out here for reasons of length, but wherever reflection is involved there is a performance cost.
strings.Builder
Go provides the strings package for manipulating strings. strings.Builder can be used to concatenate strings through its WriteString method, as follows:
var builder strings.Builder
builder.WriteString("asong")
builder.String()
The implementation of strings.Builder is very simple. Its structure is as follows:
type Builder struct {
	addr *Builder // of receiver, to detect copies by value
	buf  []byte
}
The WriteString method appends the data to the buf byte slice:
func (b *Builder) WriteString(s string) (int, error) {
	b.copyCheck()
	b.buf = append(b.buf, s...)
	return len(s), nil
}
The String method converts the []byte to a string. To avoid copying memory, it uses an unsafe cast instead of a regular conversion:
func (b *Builder) String() string {
	return *(*string)(unsafe.Pointer(&b.buf))
}
bytes.Buffer
Since a string is an array of bytes underneath, we can also concatenate with Go’s bytes.Buffer. A bytes.Buffer is a buffer that holds bytes. The usage is as follows:
buf := new(bytes.Buffer)
buf.WriteString("asong")
buf.String()
Underneath, bytes.Buffer is also a []byte slice, with the following structure:
type Buffer struct {
	buf      []byte // contents are the bytes buf[off : len(buf)]
	off      int    // read at &buf[off], write at &buf[len(buf)]
	lastRead readOp // last read operation, so that Unread* can work correctly.
}
Because bytes.Buffer can keep writing data to the tail of the buffer while reading data from its head, the off field records the read position and the length of the slice, len(buf), marks the write position. That is not the focus of this article; what matters here is how the WriteString method concatenates strings:
func (b *Buffer) WriteString(s string) (n int, err error) {
	b.lastRead = opInvalid
	m, ok := b.tryGrowByReslice(len(s))
	if !ok {
		m = b.grow(len(s))
	}
	return copy(b.buf[m:], s), nil
}
No memory is allocated when the Buffer is created; memory is allocated only when data is first written. The first block is sized to the data being written, except that a write of less than 64 bytes still requests 64 bytes. From then on the buffer grows through the slice’s dynamic expansion mechanism, and each string is appended to the tail with copy, the built-in copy function, which keeps memory allocations down.
However, converting the []byte to a string still uses the standard conversion, so a memory allocation occurs:
func (b *Buffer) String() string {
	if b == nil {
		// Special case, useful in debugging.
		return "<nil>"
	}
	return string(b.buf[b.off:])
}
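The copy made by bytes.Buffer’s String method can be observed with a minimal sketch like the one below. The function name snapshot is hypothetical and used only for illustration. The string obtained from String keeps its value even after the buffer is reset and its backing array is overwritten.

```go
package main

import (
	"bytes"
	"fmt"
)

// snapshot is a hypothetical helper. It takes a string from a buffer, then
// reuses the buffer, and returns both the earlier and the later string.
func snapshot() (string, string) {
	var buf bytes.Buffer
	buf.WriteString("asong")
	first := buf.String() // string(b.buf[b.off:]) copies the bytes

	buf.Reset()              // keeps the backing array for reuse
	buf.WriteString("ASONG") // overwrites the same backing array
	return first, buf.String()
}

func main() {
	first, second := snapshot()
	fmt.Println(first, second) // prints "asong ASONG"
}
```

The copy is what makes the earlier string safe to keep, and it is also the extra allocation that shows up in the benchmarks.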
strings.Join
The strings.Join function concatenates a string slice into a single string, and lets you specify the separator placed between the elements:
baseSlice := []string{"asong", "Handsome"}
strings.Join(baseSlice, "")
strings.Join is also implemented on top of strings.Builder. The code is as follows:
func Join(elems []string, sep string) string {
	switch len(elems) {
	case 0:
		return ""
	case 1:
		return elems[0]
	}
	n := len(sep) * (len(elems) - 1)
	for i := 0; i < len(elems); i++ {
		n += len(elems[i])
	}
	var b Builder
	b.Grow(n)
	b.WriteString(elems[0])
	for _, s := range elems[1:] {
		b.WriteString(sep)
		b.WriteString(s)
	}
	return b.String()
}
The only difference is that Join calls b.Grow(n) to allocate the capacity up front; the n computed beforehand is the total length of the string to be produced. Because the slice passed in has a fixed length, allocating the capacity in advance reduces memory allocations, which is very efficient.
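The same pre-allocation idea can be applied when using strings.Builder directly. The following is a minimal sketch, assuming the pieces to concatenate are already known; the helper name concatWithGrow is hypothetical and not taken from the article’s repository.

```go
package main

import (
	"fmt"
	"strings"
)

// concatWithGrow is a hypothetical helper that mirrors what strings.Join
// does internally: compute the final length, reserve it once with Grow,
// then append every piece.
func concatWithGrow(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}

	var b strings.Builder
	b.Grow(n) // reserve the full capacity up front
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	fmt.Println(concatWithGrow([]string{"asong", "Handsome"})) // prints "asongHandsome"
}
```

When the total length is not known ahead of time, a rough estimate passed to Grow may still save some reallocations, though that depends on the workload.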
Slice append
Since a string is also a byte array underneath, we can declare a byte slice and concatenate strings into it with append, as follows:
buf := make([]byte, 0)
base := "asong"
buf = append(buf, base...)
string(buf)
If you want to reduce memory allocations, consider using an unsafe cast instead of the standard conversion when turning the []byte into a string.
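A minimal sketch of such a cast is shown below. It reuses the same pointer trick that strings.Builder’s String method uses; the helper name bytesToString is hypothetical. The important assumption is that the byte slice is never modified after the cast, because the string shares its memory.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString is a hypothetical helper that reinterprets buf as a string
// without copying. The caller must not modify buf afterwards, otherwise the
// "immutable" string would change underneath its users.
func bytesToString(buf []byte) string {
	return *(*string)(unsafe.Pointer(&buf))
}

func main() {
	buf := make([]byte, 0, 16)
	buf = append(buf, "asong"...)
	buf = append(buf, "Handsome"...)
	fmt.Println(bytesToString(buf)) // prints "asongHandsome"
}
```

This trades safety for one saved allocation, so the plain string(buf) conversion is usually the better default unless profiling says otherwise.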
Benchmark comparison
We have now covered six methods and know roughly how each one works, so let’s use Go benchmarks to find out which string concatenation method is more efficient. We analyze two cases:
- Small string concatenation
- Large string concatenation
Because there is rather a lot of code, only the results are posted below. The full code has been uploaded to GitHub: github.com/asong2020/G…
Let’s start by defining a base string:
var base = "123456789qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASFGHJKLZXCVBNM"
For the small-concatenation test, we concatenate only once, joining base with base, and get the following benchmark result:
goos: darwin
goarch: amd64
pkg: asong.cloud/Golang_Dream/code_demo/string_join/once
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSumString-16           21338802    49.19 ns/op    128 B/op    1 allocs/op
BenchmarkSprintfString-16        7887808    140.5 ns/op    160 B/op    3 allocs/op
BenchmarkBuilderString-16       27084855    41.39 ns/op    128 B/op    1 allocs/op
BenchmarkBytesBuffString-16      9546277    126.0 ns/op    384 B/op    3 allocs/op
BenchmarkJoinstring-16          24617538    48.21 ns/op    128 B/op    1 allocs/op
BenchmarkByteSliceString-16     10347416    112.7 ns/op    320 B/op    3 allocs/op
PASS
ok      asong.cloud/Golang_Dream/code_demo/string_join/once    8.412s
For the large-concatenation test, let’s build a string slice of length 200:
var baseSlice []string
for i := 0; i < 200; i++ {
	baseSlice = append(baseSlice, base)
}
We then iterate over this slice, concatenating each element in turn, and get the following benchmark result:
goos: darwin
goarch: amd64
pkg: asong.cloud/Golang_Dream/code_demo/string_join/muliti
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkSumString-16              7396    163612 ns/op    1277713 B/op    199 allocs/op
BenchmarkSprintfString-16          5946    202230 ns/op    1288552 B/op    600 allocs/op
BenchmarkBuilderString-16        262525      4638 ns/op      40960 B/op      1 allocs/op
BenchmarkBytesBufferString-16    183492      6568 ns/op      44736 B/op      9 allocs/op
BenchmarkJoinstring-16           398923      3035 ns/op      12288 B/op      1 allocs/op
BenchmarkByteSliceString-16      144554      8205 ns/op      60736 B/op     15 allocs/op
PASS
ok      asong.cloud/Golang_Dream/code_demo/string_join/muliti    10.699s
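The benchmark code itself is not reproduced in the article. As a rough idea of how such a comparison might be set up, here is a minimal sketch covering just two of the six methods. The function names sumString and builderString and the shortened base string are assumptions made for illustration, not the code from the author’s repository, and the numbers it prints will differ from the results above.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// base is a shortened stand-in for the article's base string.
var base = "123456789qwertyuiop"

// sumString concatenates with the + operator (hypothetical helper).
func sumString(parts []string) string {
	var s string
	for _, p := range parts {
		s += p
	}
	return s
}

// builderString concatenates with strings.Builder (hypothetical helper).
func builderString(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	// Build a slice of 200 strings, as in the large-concatenation test.
	parts := make([]string, 200)
	for i := range parts {
		parts[i] = base
	}

	cases := []struct {
		name string
		fn   func([]string) string
	}{
		{"SumString", sumString},
		{"BuilderString", builderString},
	}
	for _, c := range cases {
		// testing.Benchmark runs a benchmark function outside of "go test".
		r := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				_ = c.fn(parts)
			}
		})
		fmt.Printf("%-14s %s %s\n", c.name, r.String(), r.MemString())
	}
}
```

In a real project the same loops would more commonly live in a _test.go file as BenchmarkXxx functions and be run with go test -bench=. -benchmem.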
Conclusions
When concatenating a small number of strings, the + operator is efficient. When concatenating a large number of strings, however, its performance is poor.
fmt.Sprintf is not suited to string concatenation: its overhead is high regardless of how many strings are concatenated.
strings.Builder performs consistently whether the number of strings is small or large, which is why it is the officially recommended way to concatenate strings in Go. When using strings.Builder, it is best to call Grow first to pre-allocate capacity. The strings.Join benchmark shows the effect: because Grow allocates the memory in advance, the buffer never has to be reallocated and copied during concatenation, which gives the best performance and the lowest memory consumption.
bytes.Buffer is slower than strings.Builder. When a bytes.Buffer is converted to a string, new space is allocated to hold the resulting string, whereas strings.Builder returns its underlying []byte directly as a string, so bytes.Buffer uses more memory.
Final conclusions from the analysis:
strings.Builder is efficient in every case, but to get the most out of it, remember to call Grow to pre-allocate capacity. strings.Join performs about as well as strings.Builder; use it when the strings are already in a slice, but it is not recommended when they are not. For a small number of strings, the + operator is the most convenient and performs well, so strings.Builder can be skipped.
Comprehensive comparison performance ranking:
strings.Join ≈ strings.Builder > bytes.Buffer > []byte conversion to string > “+” > fmt.Sprintf
Summary
In this article we introduced six ways to concatenate strings and compared their efficiency with benchmarks. Whatever the situation, using strings.Builder is never wrong.
The code has been uploaded to github: github.com/asong2020/G…
Well, that’s the end of this article. I am asong, and we’ll see you next time.
Welcome to follow the public account: Golang Dream Factory