Preface

When using Go for data serialization or deserialization, we often need to convert between strings and byte slices. For example:

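// toJSON is an illustrative wrapper; the conversion is the interesting part
func toJSON(from interface{}) string {
    if str, err := json.Marshal(from); err != nil {
        panic(err)
    } else {
        return string(str) // copies the []byte into a new string
    }
}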

json.Marshal returns a []byte, which must then be converted to a string. With small amounts of data the cost of this conversion is negligible, but with large amounts of data it can become a performance bottleneck, which a more efficient conversion method can relieve.

The data structure

Before looking at how the conversion works, we need to understand the underlying data structures.

This article is based on Go 1.13.12

String:

type stringStruct struct {
    str unsafe.Pointer
    len int
}

Slice:

type slice struct {
    array unsafe.Pointer
    len   int
    cap   int
}

Compared with slice, string has no cap field for capacity, which is why the built-in cap() function cannot be applied to a string.

So why doesn't string need cap? Because strings are immutable in Go (as in many other languages). A string cannot have elements appended to it the way a slice can, so there is no need for a cap field to decide whether the underlying array has run out of room and must grow.

Having only len does not affect read operations such as for-range, because for-range relies only on len to decide when to exit the loop.

So why must strings be immutable? Immutability guarantees that the underlying array of a string never changes.

For example, suppose a map uses strings as keys. If a key's underlying bytes could change, its hash would change too, and the previously stored value could no longer be found. Immutability rules this out. It also makes strings safe to share between goroutines without synchronization.
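As a small illustration of the map-key argument, the sketch below stores a key converted from a byte slice and then mutates the slice. Because string(key) copies the bytes, the stored key keeps its original value and lookups still succeed. The key names are arbitrary.

```go
package main

import "fmt"

func main() {
	m := make(map[string]int)
	key := []byte("alpha")

	// string(key) copies the bytes, so the map owns an independent,
	// immutable key.
	m[string(key)] = 1

	// Mutating the byte slice afterwards does not touch the stored key.
	key[0] = 'A'

	fmt.Println(m["alpha"]) // 1
	fmt.Println(m["Alpha"]) // 0 (not present)
}
```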

Conventional implementation

Immutability has many advantages. To preserve it, conversions between string and []byte are normally implemented by copying the data:

var a string = "hello world"
var b []byte = []byte(a) // string to []byte
a = string(b)            // []byte to string

This approach is simple, but it works by copying the underlying data. The compiler turns these conversions into calls to the runtime functions stringtoslicebyte and slicebytetostring, respectively.
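The copying behavior can be observed directly: after each conversion, writing to the byte slice leaves the other value untouched. This is a minimal demonstration, not a benchmark.

```go
package main

import "fmt"

func main() {
	a := "hello world"

	b := []byte(a) // copies a's bytes into a new array
	b[0] = 'H'
	fmt.Println(a) // hello world (unchanged)

	c := string(b) // copies b's bytes into new string memory
	b[0] = 'J'
	fmt.Println(c) // Hello world (unaffected by the later write)
}
```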

string to []byte

func stringtoslicebyte(buf *tmpBuf, s string) []byte {
    var b []byte
    if buf != nil && len(s) <= len(buf) {
        *buf = tmpBuf{}
        b = buf[:len(s)]
    } else {
        // Allocate memory
        b = rawbyteslice(len(s))
    }
    // Copy data
    copy(b, s)
    return b
}

It decides whether to use buf or to call rawbyteslice to allocate a new slice, based on whether the result escapes to the heap (the compiler passes a non-nil buf only when it does not) and whether buf is large enough. Either way, the underlying data is copied.
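To illustrate that logic outside the runtime, here is a user-space sketch. The function name stringToBytesSketch is hypothetical, make stands in for the runtime's internal rawbyteslice, and the 32-byte buffer size is assumed to match the runtime's tmpBuf.

```go
package main

import "fmt"

// tmpBufSize mirrors the runtime's 32-byte temporary buffer; treat the
// exact size as an assumption of this sketch.
const tmpBufSize = 32

// stringToBytesSketch imitates stringtoslicebyte: reuse the caller's buffer
// when one is provided and large enough, otherwise allocate. Both paths copy.
func stringToBytesSketch(buf *[tmpBufSize]byte, s string) []byte {
	var b []byte
	if buf != nil && len(s) <= len(buf) {
		*buf = [tmpBufSize]byte{} // clear stale contents
		b = buf[:len(s)]
	} else {
		b = make([]byte, len(s)) // stands in for rawbyteslice
	}
	copy(b, s)
	return b
}

func main() {
	var buf [tmpBufSize]byte

	short := stringToBytesSketch(&buf, "hi")
	fmt.Println(&short[0] == &buf[0]) // true: the buffer was reused

	long := stringToBytesSketch(&buf, string(make([]byte, 100)))
	fmt.Println(&long[0] == &buf[0]) // false: a new array was allocated
}
```

In the runtime, the compiler supplies a stack buffer only when the result does not escape, which is what makes reusing it safe; in this sketch the caller simply decides.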

[]byte to string

func slicebytetostring(buf *tmpBuf, b []byte) (str string) {
    l := len(b)
    if l == 0 {
        return ""
    }
    if l == 1 {
        stringStructOf(&str).str = unsafe.Pointer(&staticbytes[b[0]])
        stringStructOf(&str).len = 1
        return
    }

    var p unsafe.Pointer
    if buf != nil && len(b) <= len(buf) {
        p = unsafe.Pointer(buf)
    } else {
        // Allocate new memory
        p = mallocgc(uintptr(len(b)), nil, false)
    }
    // Assign the underlying pointer
    stringStructOf(&str).str = p
    // Assign length
    stringStructOf(&str).len = len(b)
    // Copy data
    memmove(p, (*(*slice)(unsafe.Pointer(&b))).array, uintptr(len(b)))
    return
}

The function first handles lengths 0 and 1 (a 1-byte string points into the static staticbytes table, so no allocation is needed). It then either uses buf or allocates new memory with mallocgc, sets the resulting string's pointer and len fields, and in every case copies the data with memmove.

Efficient implementation

If the program can guarantee that the underlying data will never be modified, can we improve performance by converting only the type, without copying the data?

unsafe.Pointer, int, and uintptr all occupy the same amount of memory (8 bytes each on a 64-bit platform):

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    var v1 unsafe.Pointer
    var v2 int
    var v3 uintptr
    fmt.Println(unsafe.Sizeof(v1)) // 8 on 64-bit platforms
    fmt.Println(unsafe.Sizeof(v2)) // 8 on 64-bit platforms
    fmt.Println(unsafe.Sizeof(v3)) // 8 on 64-bit platforms
}

Therefore, in terms of memory layout, a string can be viewed as a [2]uintptr and a []byte slice as a [3]uintptr. Converting a string to []byte then amounts to building:

[3]uintptr{ptr, len, len}

Here we supply the slice's cap field explicitly. Leaving it out would not affect reads, but appending to the converted slice could go wrong. If the string's two words were simply reinterpreted as a three-word slice header, cap would be whatever value happens to follow in memory and could be larger than len. append would then not allocate new memory; it would write past the end of the string's bytes into the original array, and if that memory is not writable the program crashes.

So the slice's len and cap are both set to the string's length. With cap equal to len, any append must allocate a new array and copy the data, leaving the string's memory untouched.
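The role of cap in append can be seen with ordinary, safe slices. In the sketch below, a slice with spare capacity is appended to in place, overwriting the next byte of the shared array, while a slice whose cap equals its len forces append to allocate a fresh array. With a slice built over string memory, the first case is the one that would write into the string.

```go
package main

import "fmt"

func main() {
	backing := []byte("abcXYZ")

	// cap > len: append writes into the existing array, in place.
	roomy := backing[:3] // len 3, spare capacity behind it
	_ = append(roomy, '!')
	fmt.Println(string(backing)) // abc!YZ: the 'X' was overwritten

	// cap == len: append must allocate a new array and copy.
	tight := backing[:3:3] // full slice expression sets cap to 3
	grown := append(tight, '?')
	fmt.Println(string(backing)) // abc!YZ: unchanged this time
	fmt.Println(string(grown))   // abc?
}
```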

The implementation is as follows:

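import "unsafe"

func stringTobyteSlice(s string) []byte {
    // View the string header as [2]uintptr{data, len}
    tmp1 := (*[2]uintptr)(unsafe.Pointer(&s))
    // Build a slice header {data, len, cap} with cap = len
    tmp2 := [3]uintptr{tmp1[0], tmp1[1], tmp1[1]}
    return *(*[]byte)(unsafe.Pointer(&tmp2))
}

func byteSliceToString(bytes []byte) string {
    // The first two words of a slice header match a string header
    return *(*string)(unsafe.Pointer(&bytes))
}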

unsafe.Pointer is used to convert between pointer types, so the underlying data is never copied.

The performance test

Next we measure the performance of the efficient implementation by converting a string or byte slice of length 100.

We benchmark the following four functions:

func stringTobyteSlice(s string) []byte {
    tmp1 := (*[2]uintptr)(unsafe.Pointer(&s))
    tmp2 := [3]uintptr{tmp1[0], tmp1[1], tmp1[1]}
    return *(*[]byte)(unsafe.Pointer(&tmp2))
}

func stringTobyteSliceOld(s string) []byte {
    return []byte(s)
}

func byteSliceToString(bytes []byte) string {
    return *(*string)(unsafe.Pointer(&bytes))
}

func byteSliceToStringOld(bytes []byte) string {
    return string(bytes)
}

The test results are as follows:

BenchmarkStringToByteSliceOld-12    28637332      42.0 ns/op
BenchmarkStringToByteSliceNew-12    1000000000     0.496 ns/op
BenchmarkByteSliceToStringOld-12    32595271      36.0 ns/op
BenchmarkByteSliceToStringNew-12    1000000000     0.256 ns/op

As the results show, the difference is large. It grows further as the string or byte slice gets longer, because the copying conversions do work proportional to the data length, while the unsafe conversions only build a new header.

Conclusion

This article introduced the underlying data structures of strings and slices, along with an efficient way to convert between them. Note that this approach applies only when the program can guarantee that the underlying data will not be modified. If that cannot be guaranteed, and modifying the data could cause errors, use the copying conversion instead.
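The following sketch shows what goes wrong when that guarantee is broken. After a zero-copy conversion, writing to the byte slice silently changes the "immutable" string, and a map key built from it can no longer be found. Writing through a slice obtained from a string literal is worse: such bytes typically live in read-only memory and the program crashes with a fault, so that case is not demonstrated here.

```go
package main

import (
	"fmt"
	"unsafe"
)

// byteSliceToString is the zero-copy conversion from the article.
func byteSliceToString(bytes []byte) string {
	return *(*string)(unsafe.Pointer(&bytes))
}

func main() {
	buf := []byte("hello")

	safe := string(buf)              // independent copy
	shared := byteSliceToString(buf) // shares buf's backing array

	buf[0] = 'j'
	fmt.Println(safe)   // hello
	fmt.Println(shared) // jello: the "immutable" string changed

	// Using the aliased string as a map key is just as fragile.
	m := map[string]int{shared: 1} // key stored while it reads "jello"
	buf[0] = 'y'                   // the stored key now reads "yello"
	_, ok := m["jello"]
	fmt.Println(ok) // false: the entry can no longer be found
}
```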