Preface
When using Go for data serialization or deserialization, we often need to convert between strings and byte slices. For example:
if str, err := json.Marshal(from); err != nil {
	panic(err)
} else {
	return string(str)
}
JSON is serialized into a []byte, which then has to be converted to a string. When the amount of data is small, the cost of the conversion is negligible, but with large amounts of data it can become a performance bottleneck. A more efficient conversion method reduces that cost.
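For reference, here is a self-contained version of the snippet above. It is only a sketch: the user type and the toJSONString helper are hypothetical names introduced for this illustration, and the error is returned instead of causing a panic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// user is a hypothetical payload type, used only for this illustration.
type user struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// toJSONString marshals from and converts the resulting []byte to a string.
// The string(raw) conversion copies the serialized bytes.
func toJSONString(from interface{}) (string, error) {
	raw, err := json.Marshal(from)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	s, err := toJSONString(user{Name: "gopher", Age: 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"name":"gopher","age":3}
}
```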
Data structures
Before looking at how the conversion works, we need to understand the underlying data structures.
This article is based on Go 1.13.12
String:
type stringStruct struct {
	str unsafe.Pointer
	len int
}
Slice:
type slice struct {
	array unsafe.Pointer
	len   int
	cap   int
}
In contrast to the slice structure, string lacks the cap field that represents capacity, so the built-in cap() function cannot be used on a string.
So why doesn't string need cap? Because strings are designed to be immutable in Go (as in many other languages).
Since a string cannot have elements appended the way a slice can, there is no need for a cap field to check whether the underlying array is full and has to grow.
Having only the len field does not affect reads such as for-range, because the loop relies only on len to decide when to stop.
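The difference between the two headers can be checked with unsafe.Sizeof. The following is a minimal sketch; headerWords is a hypothetical helper name, and the result is expressed in machine words so that it does not depend on a 64-bit platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports how many machine words the string header and the
// []byte slice header occupy on the current platform.
func headerWords() (stringWords, sliceWords uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	sw, bw := headerWords()
	fmt.Println(sw, bw) // 2 3

	// Reading a string only needs the pointer and the length.
	n := 0
	for range "hello" {
		n++
	}
	fmt.Println(n) // 5
}
```

The string header is one word shorter than the slice header, which is exactly the missing cap field.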
So why must strings be immutable? Immutability guarantees that the underlying array of a string never changes.
For example, consider a map that uses string keys. If the underlying bytes of a key could change, its hash value would change as well, and the value stored earlier could no longer be found in the map. Immutable strings rule this situation out. In addition, immutability keeps string data safe to share between threads.
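A small sketch of the map case: because string(b) produces an independent, immutable copy, later changes to the byte slice cannot disturb a key already stored in the map. The function name lookupAfterMutation is hypothetical.

```go
package main

import "fmt"

// lookupAfterMutation builds a map key from a byte slice, mutates the slice
// afterwards, and then looks the original key up again.
func lookupAfterMutation() (int, bool) {
	b := []byte("key")
	m := map[string]int{string(b): 1} // the key is an independent copy of b
	b[0] = 'K'                        // mutating b does not touch the key
	v, ok := m["key"]
	return v, ok
}

func main() {
	v, ok := lookupAfterMutation()
	fmt.Println(v, ok) // 1 true
}
```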
Conventional implementation
Immutable strings have many advantages. To preserve immutability, conversions between string and []byte are normally implemented by copying the data:
var a string = "Hello world"
var b []byte = []byte(a) // string to []byte
a = string(b)            // []byte to string
This approach is simple, but it works by copying the underlying data: the compiler turns the two conversions into calls to the runtime functions stringtoslicebyte and slicebytetostring, respectively.
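The copy is easy to observe from user code. In this minimal sketch (roundTrip is a hypothetical helper), the slice obtained from the string is modified, and the original string stays unchanged:

```go
package main

import "fmt"

// roundTrip converts s to []byte, mutates the copy, and returns both the
// untouched original and the string built from the modified bytes.
func roundTrip(s string) (original, modified string) {
	b := []byte(s) // copies the bytes of s into a new backing array
	if len(b) > 0 {
		b[0] = 'J'
	}
	return s, string(b) // string(b) copies the bytes once more
}

func main() {
	orig, mod := roundTrip("Hello world")
	fmt.Println(orig) // Hello world
	fmt.Println(mod)  // Jello world
}
```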
string to []byte
func stringtoslicebyte(buf *tmpBuf, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		*buf = tmpBuf{}
		b = buf[:len(s)]
	} else {
		// Allocate memory
		b = rawbyteslice(len(s))
	}
	// Copy data
	copy(b, s)
	return b
}
The function decides whether to use buf or to call rawbyteslice to allocate a new slice, depending on whether the return value escapes to the heap and whether buf is large enough. Either way, copy is called to copy the underlying data.
[]byte to string
func slicebytetostring(buf *tmpBuf, b []byte) (str string) {
	l := len(b)
	if l == 0 {
		return ""
	}
	if l == 1 {
		stringStructOf(&str).str = unsafe.Pointer(&staticbytes[b[0]])
		stringStructOf(&str).len = 1
		return
	}
	var p unsafe.Pointer
	if buf != nil && len(b) <= len(buf) {
		p = unsafe.Pointer(buf)
	} else {
		p = mallocgc(uintptr(len(b)), nil, false)
	}
	// Assign the underlying pointer
	stringStructOf(&str).str = p
	// Assign length
	stringStructOf(&str).len = len(b)
	// Copy data
	memmove(p, (*(*slice)(unsafe.Pointer(&b))).array, uintptr(len(b)))
	return
}
The function first handles lengths 0 and 1, then decides whether to use buf or to allocate new memory through mallocgc. Either way, the data ends up being copied.
The len field of the resulting string is also set here.
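The allocation made by the copying conversion can be observed with testing.AllocsPerRun from the standard library. This is a rough sketch under stated assumptions: sink and allocsForLen are hypothetical names, and the result is stored in a package-level variable so that it escapes to the heap and the stack buffer cannot be used.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable, so every converted string escapes to
// the heap and the runtime cannot place it in a stack buffer.
var sink string

// allocsForLen reports the average number of heap allocations made by one
// string(b) conversion of an n-byte slice.
func allocsForLen(n int) float64 {
	b := make([]byte, n)
	return testing.AllocsPerRun(1000, func() {
		sink = string(b)
	})
}

func main() {
	for _, n := range []int{1, 100} {
		fmt.Printf("len=%d allocs per conversion: %v\n", n, allocsForLen(n))
	}
}
```

The 100-byte conversion should report at least one allocation per call. The one-byte case is expected to report none, since the listing above serves it from a static table rather than from mallocgc.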
Efficient implementation
If the program guarantees that no changes will be made to the underlying data, can it improve performance by simply converting the type and not copying the data?
unsafe.Pointer, int, and uintptr all occupy the same amount of memory (8 bytes on a 64-bit platform):
var v1 unsafe.Pointer
var v2 int
var v3 uintptr
fmt.Println(unsafe.Sizeof(v1)) // 8
fmt.Println(unsafe.Sizeof(v2)) // 8
fmt.Println(unsafe.Sizeof(v3)) // 8
Therefore, from the perspective of the underlying structure, a string can be viewed as a [2]uintptr and a []byte slice can be viewed as a [3]uintptr. Converting a string to a []byte then only requires building the following value:
[3]uintptr{ptr, len, len}
Here the cap field of the slice is filled in explicitly, using len. Leaving cap unset would not affect reads, but appending to the converted slice could go wrong: cap would then hold whatever happens to be in memory and might be larger than len, so append would not allocate a new backing array and would instead write past the end of the original data. If that memory is not writable, the program crashes.
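Converting a []byte to a string needs no such step: the string header matches the first two words of the slice header, so the cap field is simply ignored.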
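The effect of setting cap to len can be checked with a short sketch. It uses the stringTobyteSlice function that this section defines; appendToConverted is a hypothetical helper added for the illustration. Because cap equals len, append has to allocate a new backing array and never writes into the memory of the string.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringTobyteSlice is the conversion from this section: the slice header is
// built as {ptr, len, len}, so cap equals len.
func stringTobyteSlice(s string) []byte {
	tmp1 := (*[2]uintptr)(unsafe.Pointer(&s))
	tmp2 := [3]uintptr{tmp1[0], tmp1[1], tmp1[1]}
	return *(*[]byte)(unsafe.Pointer(&tmp2))
}

// appendToConverted appends one byte to the converted slice and returns the
// capacity before the append together with the appended result.
func appendToConverted(s string, c byte) (capBefore int, out []byte) {
	b := stringTobyteSlice(s)
	capBefore = cap(b)
	out = append(b, c) // cap == len, so append allocates a new array
	return capBefore, out
}

func main() {
	s := "abc"
	capBefore, out := appendToConverted(s, 'd')
	fmt.Println(capBefore)   // 3
	fmt.Println(string(out)) // abcd
	fmt.Println(s)           // abc
}
```

Writing to an element of the converted slice directly (for example b[0] = 'x') is still not allowed, since the slice shares the string's memory.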
The implementation is as follows:
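func stringTobyteSlice(s string) []byte {
	// View the string header as two words: data pointer and length.
	tmp1 := (*[2]uintptr)(unsafe.Pointer(&s))
	// Build a slice header {ptr, len, cap} with cap set to len.
	tmp2 := [3]uintptr{tmp1[0], tmp1[1], tmp1[1]}
	return *(*[]byte)(unsafe.Pointer(&tmp2))
}
func byteSliceToString(bytes []byte) string {
	// Reinterpret the slice header as a string header; cap is ignored.
	return *(*string)(unsafe.Pointer(&bytes))
}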
unsafe.Pointer is used here to convert between pointers of different types, so the underlying data is never copied.
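On newer toolchains the same zero-copy conversion can be written without hand-built headers. The sketch below assumes Go 1.20 or later, which is newer than the Go 1.13.12 this article is based on, and uses unsafe.StringData, unsafe.Slice, unsafe.SliceData, and unsafe.String from the standard library. The function names stringToBytes and bytesToString are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes returns a []byte that shares memory with s.
// The result must never be written to.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// bytesToString returns a string that shares memory with b.
// b must not be modified while the string is in use.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := stringToBytes("hello")
	fmt.Println(len(b), cap(b)) // 5 5

	buf := []byte("abc")
	s := bytesToString(buf)
	buf[0] = 'x'   // the string shares buf's memory, so it changes too
	fmt.Println(s) // xbc
}
```

The second half of main shows why the guarantee about unmodified data matters: changing the slice silently changes the "immutable" string.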
Performance test
Next we measure the performance of the efficient implementation by converting a string or byte slice of length 100.
The following four functions are benchmarked:
func stringTobyteSlice(s string) []byte {
	tmp1 := (*[2]uintptr)(unsafe.Pointer(&s))
	tmp2 := [3]uintptr{tmp1[0], tmp1[1], tmp1[1]}
	return *(*[]byte)(unsafe.Pointer(&tmp2))
}
func stringTobyteSliceOld(s string) []byte {
	return []byte(s)
}
func byteSliceToString(bytes []byte) string {
	return *(*string)(unsafe.Pointer(&bytes))
}
func byteSliceToStringOld(bytes []byte) string {
	return string(bytes)
}
The test results are as follows:
BenchmarkStringToByteSliceOld-12      28637332      42.0 ns/op
BenchmarkStringToByteSliceNew-12    1000000000      0.496 ns/op
BenchmarkByteSliceToStringOld-12      32595271      36.0 ns/op
BenchmarkByteSliceToStringNew-12    1000000000      0.256 ns/op
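The difference is large: roughly 42 ns versus 0.5 ns per operation for string to []byte, and 36 ns versus 0.26 ns for []byte to string. Because the copying versions scale with the input length while the pointer conversion does not, the gap widens as the string or byte slice gets longer.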
Conclusion
This article introduced the underlying data structures of strings and byte slices, together with an efficient way to convert between them. Note that the technique only applies when the program can guarantee that the underlying data will not be modified. If that cannot be guaranteed, and modifying the data could cause errors, the copying conversion should still be used.