Strings in Go deserve special attention because they are implemented differently than in many other languages.

string

In Go, use double quotation marks to declare strings:

s := "Hello world"
fmt.Println("len(s):", len(s))
fmt.Println(s)

Output:

len(s): 11
Hello world

The code above declares the string s. len returns the number of bytes in s (spaces included). In Go, strings are actually read-only slices of bytes.

s := "Hello world"
for i := 0; i < len(s); i++ {
	fmt.Print(s[i], " ")
}

What do you think this code will print, letters? Not really:

72 101 108 108 111 32 119 111 114 108 100

The output is the decimal value of each letter in the ASCII table. Go uses UTF-8 encoding, which is compatible with ASCII: every ASCII character takes a single byte, whereas UTF-8 in general needs 1 to 4 bytes to represent a symbol.
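To make the 1 to 4 byte range concrete, here is a minimal sketch that measures a few symbols with len. The helper name byteLen and the sample symbols are assumptions chosen for illustration; the symbols are written as escapes so the example does not depend on how this page is encoded.

```go
package main

import "fmt"

// byteLen (hypothetical helper) reports how many bytes s occupies.
func byteLen(s string) int {
	return len(s)
}

func main() {
	symbols := []string{
		"H",          // ASCII letter
		"\u00f5",     // õ, Latin small letter o with tilde
		"\u4e2d",     // 中, a CJK ideograph
		"\U0001F600", // an emoji outside the Basic Multilingual Plane
	}
	for _, s := range symbols {
		fmt.Printf("%q takes %d byte(s)\n", s, byteLen(s))
	}
}
```

The four symbols take 1, 2, 3, and 4 bytes respectively, which covers every width UTF-8 can produce.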

s := "Hello world"
for i := 0; i < len(s); i++ {
	fmt.Printf("%c ", s[i])
}
fmt.Println("")
for i := 0; i < len(s); i++ {
	fmt.Printf("%v ", s[i])
}
fmt.Println("")
for i := 0; i < len(s); i++ {
	fmt.Printf("%x ", s[i])
}
fmt.Println("")
for i := 0; i < len(s); i++ {
	fmt.Printf("%T ", s[i])
}

Output:

H e l l o   w o r l d 
72 101 108 108 111 32 119 111 114 108 100 
48 65 6c 6c 6f 20 77 6f 72 6c 64 
uint8 uint8 uint8 uint8 uint8 uint8 uint8 uint8 uint8 uint8 uint8 

In the code above, %c prints each byte as a character, %v prints its decimal value, %x prints it in hexadecimal, and %T prints the type of the value. As the result shows, each value has type uint8, that is, byte, which is an alias for uint8. The byte type is introduced in another article of mine. Now let’s look at what happens when a string contains non-ASCII characters.

s := "Hellõ World"
fmt.Println("len(s):", len(s))
for i := 0; i < len(s); i++ {
	fmt.Printf("%c ", s[i])
}
fmt.Println("")
for i := 0; i < len(s); i++ {
	fmt.Printf("%v ", s[i])
}
fmt.Println("")
for i := 0; i < len(s); i++ {
	fmt.Printf("%x ", s[i])
}

Output:

len(s): 12
H e l l Ã µ   W o r l d 
72 101 108 108 195 181 32 87 111 114 108 100 
48 65 6c 6c c3 b5 20 57 6f 72 6c 64

In the example above, o is replaced with õ. The result shows that the string is now 12 bytes long, which means õ takes up two bytes. The printed õ, however, becomes Ã µ. The Unicode code point for õ is U+00F5, and its UTF-8 encoding takes up two bytes, c3 and b5. The for loop reads one byte at a time, and %c treats each byte as a code point of its own: c3 (decimal 195) is the character Ã and b5 (decimal 181) is the character µ. Familiarity with ASCII, UTF-8, and Unicode helps with these topics, which this article does not cover in detail.

In UTF-8, a code point occupies at least one byte. If we print, byte by byte, a character whose code point occupies more than one byte, we are bound to run into trouble, as in the example above. Is there a way to solve this problem? Fortunately, Go provides rune.
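As a quick check of the claim that c3 and b5 together encode one character, the sketch below builds a string from those two bytes and then prints each byte on its own. The helper name fromBytes is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// fromBytes (hypothetical helper) builds a string from raw bytes.
func fromBytes(b ...byte) string {
	return string(b)
}

func main() {
	// Read together, the two bytes form a single UTF-8 sequence.
	both := fromBytes(0xc3, 0xb5)
	fmt.Println(both, len(both))

	// Printed one at a time with %c, each byte is treated as its own code point.
	fmt.Printf("%c %c\n", byte(0xc3), byte(0xb5))
}
```

The first line shows the single character õ with a length of 2 bytes; the second shows the two unrelated characters that the byte-by-byte loop produced.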

Rune

rune is a built-in data type in Go. It is an alias for int32 and represents a Unicode code point. With the rune type, developers don’t have to worry about how many bytes a code point takes.

s := "Hellõ World"
r := []rune(s)

fmt.Println("len(r):", len(r))
for i := 0; i < len(r); i++ {
	fmt.Printf("%c ", r[i])
}
fmt.Println("")
for i := 0; i < len(r); i++ {
	fmt.Printf("%v ", r[i])
}
fmt.Println("")
for i := 0; i < len(r); i++ {
	fmt.Printf("%x ", r[i])
}
fmt.Println("")
for i := 0; i < len(r); i++ {
	fmt.Printf("%T ", r[i])
}

Output:

len(r): 11
H e l l õ   W o r l d 
72 101 108 108 245 32 87 111 114 108 100 
48 65 6c 6c f5 20 57 6f 72 6c 64 
int32 int32 int32 int32 int32 int32 int32 int32 int32 int32 int32 

In the code above, the string s is converted to a slice of runes. The Unicode code point for õ is U+00F5, which is 245 in decimal. The length of the slice r is 11, and %T prints int32 because rune is an alias for int32.
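One practical consequence of the conversion is that indexing a rune slice selects a whole character, while indexing the string selects a single byte. The sketch below contrasts the two; runeAt is a hypothetical helper, and it assumes the index is in range.

```go
package main

import "fmt"

// runeAt (hypothetical helper) returns the i-th code point of s.
// It assumes 0 <= i < number of code points in s.
func runeAt(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	s := "Hell\u00f5 World" // "Hellõ World", written with an escape

	fmt.Printf("%c\n", s[4])            // byte 4 only: the first half of õ
	fmt.Printf("%c\n", runeAt(s, 4))    // code point 4: the whole õ
	fmt.Println(len(s), len([]rune(s))) // byte count vs. code point count
}
```

Note that []rune(s) allocates and copies on every call, so for long strings or repeated lookups it may be worth converting once and reusing the slice.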

for range string

The example above solves the problem nicely, but there is a better way: for range over the string. Looping over a string with range yields, for each character, the byte index at which it starts and the character itself as a value of type rune.

s := "HellõWorld"
for index, char := range s {
	fmt.Printf("%c starts at byte index %d \n", char,index)
}

Output:

H starts at byte index 0 
e starts at byte index 1 
l starts at byte index 2 
l starts at byte index 3 
õ starts at byte index 4 
W starts at byte index 6 
o starts at byte index 7 
r starts at byte index 8 
l starts at byte index 9 
d starts at byte index 10

As you can see from the output, õ takes up two bytes: indexes 4 and 5.
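To see where those byte indexes come from, the following sketch reproduces the same walk by hand with utf8.DecodeRuneInString from the standard library, which returns a code point together with its encoded width. The helper name runeStarts is an assumption for illustration; this is not how range is implemented internally, only an equivalent traversal.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts (hypothetical helper) returns the byte index at which each
// code point of s begins, decoding manually instead of using range.
func runeStarts(s string) []int {
	var starts []int
	for i := 0; i < len(s); {
		_, size := utf8.DecodeRuneInString(s[i:])
		starts = append(starts, i)
		i += size // advance by the encoded width of this code point
	}
	return starts
}

func main() {
	// "HellõWorld", written with an escape for the non-ASCII letter.
	fmt.Println(runeStarts("Hell\u00f5World"))
}
```

The index jumps from 4 to 6 because the decoder reports a width of 2 for õ.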

Having read this far, you might wonder: how do I get the length of a string in characters rather than bytes?

Length of the string

We can use the RuneCountInString() function from the unicode/utf8 package, which has this signature:

func RuneCountInString(s string) (n int)

It returns the number of runes in the string.

s := "Hello 中国"
length := utf8.RuneCountInString(s)
fmt.Println(length)

Output: 8
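For comparison, here is a small sketch that reports both measures side by side: len for bytes and utf8.RuneCountInString for code points. The helper name lengths and the sample strings are assumptions made for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengths (hypothetical helper) returns the number of bytes and the
// number of code points in s.
func lengths(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	samples := []string{
		"Hello",        // ASCII only
		"Hell\u00f5",   // ends in õ (2 bytes)
		"\u4e2d\u56fd", // 中国 (3 bytes each)
	}
	for _, s := range samples {
		b, r := lengths(s)
		fmt.Printf("%q: %d bytes, %d code points\n", s, b, r)
	}
}
```

The two numbers agree only for the ASCII string. RuneCountInString also avoids the allocation that len([]rune(s)) would need to build a slice.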

Strings are immutable

As we said earlier, strings are read-only slices of bytes; once created, they are immutable. Attempting to modify one produces a compile-time error:

s := "Hello World"
s[0] = 'h'

cannot assign to s[0]
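When a changed string is needed, a common workaround is to convert to a slice, modify the copy, and convert back. The sketch below does this with a rune slice so that multi-byte characters stay intact; replaceFirst is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// replaceFirst (hypothetical helper) returns a copy of s whose first
// code point is replaced by r. An empty s is returned unchanged.
func replaceFirst(s string, r rune) string {
	rs := []rune(s) // the conversion copies; s itself is left untouched
	if len(rs) == 0 {
		return s
	}
	rs[0] = r
	return string(rs)
}

func main() {
	s := "Hello World"
	t := replaceFirst(s, 'h')
	fmt.Println(s) // the original string is unchanged
	fmt.Println(t) // the new string carries the modification
}
```

The original string never changes; the function produces a new string, which is consistent with strings being immutable.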

This article makes several important points:

  1. Strings are read-only slices of bytes;
  2. rune represents a Unicode code point in Go;
  3. Go uses UTF-8 encoding, which is one way of implementing Unicode;
  4. Familiarity with ASCII, UTF-8, and Unicode makes this article easier to understand.

I hope this article answers your questions about strings in Go. If anything is unclear, please leave a comment and we can discuss it!


This is an original article; if you reproduce it, please credit the source. Check out “Golang is coming” or go to seekload.net for more articles.
