This is the 14th day of my participation in the August More Text Challenge
This article is about strings and byte slices. We use a LeetCode problem to show the usage scenarios of, and the differences between, strings, byte slices, and runes.
string
Basic usage
A string is an immutable sequence of bytes that can contain arbitrary data, including zero-valued bytes. By convention, text strings are interpreted as UTF-8 encoded sequences of Unicode code points (runes).
The built-in len function returns the number of bytes in a string (not the number of runes), and the index expression s[i] retrieves the i-th byte, where 0 <= i < len(s):
s := "hello, world"
fmt.Println(len(s))     // "12"
fmt.Println(s[0], s[7]) // "104 119" (the bytes 'h' and 'w')
The i-th byte of a string is not necessarily the i-th character, because the UTF-8 encoding of a non-ASCII code point requires two or more bytes. The substring operation s[i:j] produces a new string made of the bytes of the original string from index i (inclusive) up to index j (exclusive), so the result contains j-i bytes.
fmt.Println(s[:5]) // "hello"
fmt.Println(s[7:]) // "world"
fmt.Println(s[:])  // "hello, world"
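To make the byte-versus-character distinction concrete, here is a minimal sketch. The helper name byteAndRuneCounts and the sample string are assumptions chosen for illustration; len, utf8.RuneCountInString, and utf8.ValidString are standard library functions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts (hypothetical helper) returns the byte length and the
// rune count of s.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "h\u00e9llo" // "héllo": U+00E9 takes two bytes in UTF-8

	nBytes, nRunes := byteAndRuneCounts(s)
	fmt.Println(nBytes, nRunes) // 6 5

	// Slicing works on byte offsets, so a slice can cut a character in half.
	fmt.Println(utf8.ValidString(s[:1])) // true: "h"
	fmt.Println(utf8.ValidString(s[:2])) // false: ends in the middle of U+00E9
}
```

For pure ASCII text the two counts are equal, which is why the "hello, world" examples behave as if bytes were characters.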
The plus (+) operator can be used to concatenate two strings to produce a new string
fmt.Println("goodbye" + s[5:]) // "goodbye, world"
Although a new value can be assigned to a string variable, a string value itself cannot be changed: the byte sequence it contains is immutable. To append one string to another, write:
s := "left root"
t := s
s += ", right root"
fmt.Println(s) // "left root, right root"
fmt.Println(t)// "left root"
This does not change the string value that s originally held; it simply assigns the new string produced by the += statement to s. Meanwhile, t still holds the old string value. Because strings are immutable, the data inside a string cannot be modified. (Immutability also means that two strings can safely share the same underlying memory, which makes copying a string of any length cheap.)
s[0] = 'L' // compile error: cannot assign to s[0]
Strings and byte slices
Four standard packages are particularly important for string manipulation: bytes, strings, strconv, and unicode.
- The strings package provides many functions to search, replace, compare, trim, split, and join strings.
- The bytes package has similar functions that operate on byte slices (type []byte, which shares some properties with strings). Because strings are immutable, building a string incrementally causes many memory allocations and copies. In that case it is more efficient to use the bytes.Buffer type.
- The strconv package has functions that convert booleans, integers, and floating-point numbers to their string forms and back, as well as functions that add or remove quotes around strings.
- The unicode package provides functions that classify runes, such as IsDigit, IsLetter, IsUpper, and IsLower. Each takes a single rune as its argument and returns a boolean. Conversion functions such as ToUpper and ToLower convert a rune to the specified case if it is a letter. All of these functions follow the Unicode standard's classification of letters, digits, and so on. The strings package has similarly named functions, ToUpper and ToLower, that apply the transformation to every character of a string and return a new string.
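As a rough illustration of two of the points above, the following sketch builds a string incrementally in a bytes.Buffer and upper-cases each rune with unicode.ToUpper. The function upperJoin is a hypothetical helper written for this example, not part of any package.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode"
)

// upperJoin (hypothetical helper) upper-cases every rune of each word and
// joins the words with sep, accumulating the result in a bytes.Buffer
// instead of repeatedly concatenating strings.
func upperJoin(words []string, sep string) string {
	var buf bytes.Buffer
	for i, w := range words {
		if i > 0 {
			buf.WriteString(sep)
		}
		for _, r := range w { // range over a string yields runes
			buf.WriteRune(unicode.ToUpper(r))
		}
	}
	return buf.String()
}

func main() {
	fmt.Println(upperJoin([]string{"left", "right"}, ", ")) // LEFT, RIGHT
}
```

In everyday code, strings.ToUpper and strings.Join would do the same job; the buffer is spelled out here only to show the incremental-building pattern.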
Strings and byte slices can be converted to each other:
s := "abc"
b := []byte(s)
s2 := string(b)
fmt.Println(s2)
Conceptually, the []byte(s) conversion allocates a new byte array, copies the bytes of s into it, and yields a slice that references the whole array. An optimizing compiler may avoid the allocation and copy in some cases, but in general the copy is required to guarantee that the bytes of s stay unchanged even if b is modified after the conversion. Conversely, converting a byte slice to a string with string(b) also makes a copy, which guarantees that s2 is immutable as well.
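The copy semantics can be checked with a small sketch. The helper replaceFirstByte is hypothetical and exists only for this illustration: it modifies the byte slice produced by the conversion, and the original string is unaffected.

```go
package main

import "fmt"

// replaceFirstByte (hypothetical helper) returns a copy of s whose first
// byte is replaced by c. The input string itself is never modified.
func replaceFirstByte(s string, c byte) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // copies the bytes of s
	b[0] = c       // changes only the copy
	return string(b)
}

func main() {
	s := "left root"
	t := replaceFirstByte(s, 'L')
	fmt.Println(s) // left root
	fmt.Println(t) // Left root
}
```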
To avoid conversions and unnecessary memory allocation, many utility functions in the bytes and strings packages come in matching pairs. For example, the strings package has the following six functions:
func Contains(s, substr string) bool
func Count(s, sep string) int
func Fields(s string) []string
func HasPrefix(s, prefix string) bool
func Index(s, sep string) int
func Join(a []string, sep string) string
The corresponding functions in the bytes package are:
func Contains(b, subslice []byte) bool
func Count(s, sep []byte) int
func Fields(s []byte) [][]byte
func HasPrefix(s, prefix []byte) bool
func Index(s, sep []byte) int
func Join(s [][]byte, sep []byte) []byte
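A short sketch of how the paired functions line up. The helper countBoth is an assumption made for this example; it calls strings.Count and bytes.Count on the same data and should return the same answer from both.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// countBoth (hypothetical helper) counts occurrences of sep in s twice:
// once with the strings package and once with the bytes package.
func countBoth(s, sep string) (int, int) {
	return strings.Count(s, sep), bytes.Count([]byte(s), []byte(sep))
}

func main() {
	s := "hello, world"
	b := []byte(s)

	fmt.Println(countBoth(s, "o")) // 2 2

	// The other pairs behave the same way on strings and byte slices.
	fmt.Println(strings.Contains(s, "world"), bytes.Contains(b, []byte("world"))) // true true
	fmt.Println(strings.Index(s, "o"), bytes.Index(b, []byte("o")))               // 4 4
	fmt.Println(strings.HasPrefix(s, "he"), bytes.HasPrefix(b, []byte("he")))     // true true
}
```

When the data already lives in a []byte, calling the bytes version directly avoids converting to a string first.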
Conversion between strings and numbers
Besides conversions between strings, runes, and bytes, it is often necessary to convert between numeric values and their string representations. The strconv package provides functions for this.
To convert an integer to a string, one option is fmt.Sprintf and another is strconv.Itoa:
x := 123
y := fmt.Sprintf("%d", x)
fmt.Println(y, strconv.Itoa(x)) // "123 123"
FormatInt and FormatUint format a number in a given base:
fmt.Println(strconv.FormatInt(int64(x), 2)) // "1111011" (x in base 2)
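As a small sketch of base formatting, the hypothetical helper inBases below formats the same value in bases 2, 8, and 16 with strconv.FormatInt, and the fmt verbs %b, %o, and %x are shown for comparison.

```go
package main

import (
	"fmt"
	"strconv"
)

// inBases (hypothetical helper) returns x written in base 2, 8, and 16.
func inBases(x int64) (bin, oct, hex string) {
	return strconv.FormatInt(x, 2), strconv.FormatInt(x, 8), strconv.FormatInt(x, 16)
}

func main() {
	bin, oct, hex := inBases(123)
	fmt.Println(bin, oct, hex) // 1111011 173 7b

	// The fmt verbs %b, %o, and %x produce the same digits.
	fmt.Printf("%b %o %x\n", 123, 123, 123) // 1111011 173 7b
}
```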
The Atoi and ParseInt functions in the strconv package parse strings that represent integers; ParseUint does the same for unsigned integers:
x, err := strconv.Atoi("123")             // x is an int
y, err := strconv.ParseInt("123", 10, 64) // base 10, result must fit in 64 bits
The third argument of ParseInt specifies the size of the integer type that the result must fit into; for example, 16 means int16, and the special value 0 means int. The return type of ParseInt is always int64, which can then be converted to a smaller type.
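The effect of the size argument is easiest to see when a value does not fit. The following is a minimal sketch; parseInt16 is a hypothetical wrapper written for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseInt16 (hypothetical helper) parses a base-10 string that must fit in
// an int16. ParseInt always returns an int64, so the result is converted
// after the range check has succeeded.
func parseInt16(s string) (int16, error) {
	v, err := strconv.ParseInt(s, 10, 16)
	if err != nil {
		return 0, err
	}
	return int16(v), nil
}

func main() {
	v, err := parseInt16("123")
	fmt.Println(v, err) // 123 <nil>

	_, err = parseInt16("40000") // larger than the int16 maximum of 32767
	fmt.Println(err != nil)      // true

	_, err = parseInt16("12a") // not a valid integer
	fmt.Println(err != nil)    // true
}
```

Checking the returned error matters in both failure cases: out-of-range input and malformed input.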
Worked example
Find the longest substring of a string that contains no repeated characters, and return its length. Source: LeetCode
Examples
Input: "abcabcbb"
Output: 3
Explanation: The longest substring without repeating characters is "abc", so the length is 3.

Input: "bbbbb"
Output: 1
Explanation: The longest substring without repeating characters is "b", so the length is 1.

Input: "pwwkew"
Output: 3
Explanation: The longest substring without repeating characters is "wke", so the length is 3. Note that the answer must be the length of a substring; "pwke" is a subsequence, not a substring.
Approach
Scan the string once from left to right. Keep a variable start that marks where the current substring without repeated characters begins. When the scan reaches a character x at index i, check whether x already occurs between start and i-1.
How can we check whether x occurs between start and i-1? Use a map, lastOccurred, that records the last position of each character seen during the scan. If x is not in the map, or lastOccurred[x] is before start, nothing needs to be done. If lastOccurred[x] lies between start and i-1, move start to lastOccurred[x] + 1.
To summarize:
For each character x at index i:
- If lastOccurred[x] does not exist or is smaller than start, no adjustment is needed
- If lastOccurred[x] is greater than or equal to start, set start to lastOccurred[x] + 1
- Set lastOccurred[x] to i and update the length of the longest substring
Solution
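package main

import "fmt"

// lengthOfNonrepeatingSubStr returns the length of the longest substring of s
// that contains no repeated bytes.
func lengthOfNonrepeatingSubStr(s string) int {
	lastOccurred := make(map[byte]int) // last index at which each byte was seen
	start := 0                         // start of the current repeat-free window
	maxLength := 0
	for i, ch := range []byte(s) {
		// If ch was already seen inside the window, move the window past it.
		if lastId, ok := lastOccurred[ch]; ok && lastId >= start {
			start = lastId + 1
		}
		if i-start+1 > maxLength {
			maxLength = i - start + 1
		}
		lastOccurred[ch] = i
	}
	return maxLength
}

func main() {
	fmt.Println(lengthOfNonrepeatingSubStr("abcabcbb")) // 3
	fmt.Println(lengthOfNonrepeatingSubStr("bbbbb"))    // 1
	fmt.Println(lengthOfNonrepeatingSubStr("pwwkew"))   // 3
	fmt.Println(lengthOfNonrepeatingSubStr(""))         // 0
	fmt.Println(lengthOfNonrepeatingSubStr("b"))        // 1
	fmt.Println(lengthOfNonrepeatingSubStr("abcdefg"))  // 7
}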
This solution is accepted on LeetCode, but if the input string contains Chinese characters the result is wrong, because the function counts bytes rather than characters. The program needs to be modified to support Chinese.
Supporting Chinese strings
The key to supporting Chinese is Go's rune type. Look at the following example:
S := "Yes, I am a test!" For _, b := range []byte(s) {for _, b := range []byte(s) {for _, b := range []byte(s) {FMT.Printf("%X ", b) // Print the result in hexadecimal format: 59 65 73 E6 88 91 E6 98 AF E4 B8 80 E4 B8 AA E6 B5 8B E8 AF 95 21} fmt.Println() for I, ch := range s { Printf("(%d %X)", I, ch)// (0 59)(1 65)(2 73)(3 6211)(6 662F)(9 4E00)(12 4E2A)(15 6D4B)(18 8BD5)(21 21) } fmt.Println() fmt.Println("Rune count:", Utf8.runecountinstring (s))//10 For I, ch := range []rune(s) {//rune is an alias for int32, Each represents four bytes FMT. Printf (" (% d % c) ", I, ch) / / print the results (0), Y (1) e (2 s) (3 I) (4) (5) (6) (7) (8) (9!) } this traversal mode, can be a string normal traversal, and do not care about the ChineseCopy the code
Modify the longest non-repeating substring solution above to use runes instead of bytes:
func newNonRepeat(s string) int {
	lastOccurred := make(map[rune]int) // last index at which each rune was seen
	start := 0
	maxLength := 0
	for i, ch := range []rune(s) {
		lastId, ok := lastOccurred[ch]
		if ok && lastId >= start {
			start = lastId + 1
		}
		if i-start+1 > maxLength {
			maxLength = i - start + 1
		}
		lastOccurred[ch] = i
	}
	return maxLength
}

fmt.Println(newNonRepeat("I am a test")) // 5
fmt.Println(newNonRepeat("1 2 2 2 1"))   // 3
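To see why the byte-keyed and rune-keyed versions disagree on Chinese input, here is a minimal sketch that runs the same sliding-window logic over both []byte(s) and []rune(s). The generic function longestUnique is a hypothetical helper written for this comparison (it assumes Go 1.18 or later for type parameters), and the two-character test string is an assumption chosen for illustration.

```go
package main

import "fmt"

// longestUnique (hypothetical helper) returns the length of the longest run
// of xs that contains no repeated element, using the same sliding-window
// idea as the solutions above.
func longestUnique[T comparable](xs []T) int {
	lastSeen := make(map[T]int) // last index at which each element was seen
	start, best := 0, 0
	for i, x := range xs {
		if j, ok := lastSeen[x]; ok && j >= start {
			start = j + 1
		}
		if n := i - start + 1; n > best {
			best = n
		}
		lastSeen[x] = i
	}
	return best
}

func main() {
	s := "我是" // UTF-8 bytes: E6 88 91 E6 98 AF

	fmt.Println(longestUnique([]byte(s))) // 5: counts bytes, and E6 repeats
	fmt.Println(longestUnique([]rune(s))) // 2: counts characters

	ascii := "pwwkew"
	fmt.Println(longestUnique([]byte(ascii)), longestUnique([]rune(ascii))) // 3 3
}
```

For ASCII-only input the two results are identical, which is why the byte version passes tests that contain only ASCII characters.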