This is the 14th day of my participation in the August More Text Challenge

This article is about strings and byte slices. Using a LeetCode problem as an example, we look at the usage scenarios of string, []byte, and rune, and at the differences between them.

string

Introduction to Basic Use

A string is an immutable sequence of bytes that can contain arbitrary data, including zero-valued bytes. By convention, text strings are interpreted as UTF-8 encoded sequences of Unicode code points (runes).

The built-in len function returns the number of bytes in a string (not the number of runes), and the index expression s[i] retrieves the i-th byte, where 0 <= i < len(s).

s := "hello, world"
fmt.Println(len(s))     // "12"
fmt.Println(s[0], s[7]) // "104 119"

The i-th byte of a string is not necessarily the i-th character, since the UTF-8 encoding of a non-ASCII code point requires two or more bytes. The substring operation s[i:j] produces a new string made of the bytes of the original string from index i (inclusive) up to index j (exclusive), so the result contains j-i bytes.

fmt.Println(s[:5]) // "hello"
fmt.Println(s[7:]) // "world"
fmt.Println(s[:])  // "hello, world"
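To make the difference between bytes and characters concrete, here is a minimal sketch using a string with one non-ASCII character. The sample string "héllo" and the helper name byteAndRuneLen are illustrative choices and do not come from the example above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneLen (hypothetical helper) returns the byte length of s and
// the number of Unicode code points it contains.
func byteAndRuneLen(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "h\u00e9llo" // "héllo": 'é' is encoded as two bytes (0xC3 0xA9) in UTF-8

	nBytes, nRunes := byteAndRuneLen(s)
	fmt.Println(nBytes, nRunes) // 6 5

	fmt.Println(s[1])          // 195: the first byte of 'é', not a character
	fmt.Println(s[:1] + s[3:]) // "hllo": slicing works on byte offsets
}
```

Slicing at an offset that falls inside a multi-byte character would produce a string that is no longer valid UTF-8, which is why the offsets above are chosen on character boundaries.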

The plus (+) operator can be used to concatenate two strings to produce a new string

fmt.Println("goodbye" + s[5:]) // "goodbye, world"

Although a new value can be assigned to a string variable, a string value itself cannot be changed: the byte sequence it contains is immutable. To append one string to another, write:

s := "left root"
t := s
s += ", right root"
fmt.Println(s) // "left root, right root"
fmt.Println(t)// "left root"

This does not modify the string value that s originally held; it simply assigns the new string produced by the += statement to s. Meanwhile, t still holds the old string value. Because strings are immutable, the data inside a string cannot be modified. (Immutability also means that two strings can safely share the same underlying memory, which makes copying a string of any length cheap.)

s[0] = 'L' // compile error: cannot assign to s[0]

String and byte slice

Four standard packages are particularly important for string manipulation: bytes, strings, strconv, and unicode

  • The strings package provides many functions to search, replace, compare, trim, split, and join strings
  • The bytes package has similar functions for operating on byte slices (type []byte, which shares some properties with strings). Because strings are immutable, building a string incrementally can cause many memory allocations and copies. In that case it is more efficient to use the bytes.Buffer type
  • The strconv package has functions that convert booleans, integers, and floating-point numbers to their string forms and back, as well as functions that add quotes to and remove quotes from strings
  • The unicode package provides functions that classify runes, such as IsDigit, IsLetter, IsUpper, and IsLower. Each takes a single rune as its argument and returns a boolean. Conversion functions such as ToUpper and ToLower convert a rune to the given case if it is a letter. All of these functions follow the Unicode standard's classification of letters, digits, and so on. The strings package has similarly named functions, ToUpper and ToLower, which apply the transformation to every character of the original string and return a new string
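As a rough illustration of two of the points in this list, the sketch below builds a string with bytes.Buffer and classifies runes with the unicode package. The helper names joinWithBuffer and countUpper are made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"unicode"
)

// joinWithBuffer (hypothetical helper) concatenates words with sep using a
// bytes.Buffer, so the result grows in one buffer instead of producing a
// new string on every +=.
func joinWithBuffer(words []string, sep string) string {
	var buf bytes.Buffer
	for i, w := range words {
		if i > 0 {
			buf.WriteString(sep)
		}
		buf.WriteString(w)
	}
	return buf.String()
}

// countUpper (hypothetical helper) counts the upper-case letters in s.
func countUpper(s string) int {
	n := 0
	for _, r := range s { // ranging over a string yields runes
		if unicode.IsUpper(r) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(joinWithBuffer([]string{"a", "b", "c"}, ", ")) // a, b, c
	fmt.Println(countUpper("Hello, World"))                    // 2
	fmt.Println(string(unicode.ToUpper('q')))                  // Q
	fmt.Println(strings.ToUpper("hello"))                      // HELLO
}
```

Note the two levels: unicode.ToUpper works on a single rune, while strings.ToUpper works on a whole string.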

Strings and byte slices can be converted to each other:

s := "abc"
b := []byte(s)
s2 := string(b)
fmt.Println(s2)

Conceptually, the []byte(s) conversion allocates a new byte array, copies the bytes of s into it, and yields a slice that references the entire array. An optimizing compiler may be able to avoid the allocation and the copy in some cases, but in general the copy is necessary to ensure that the bytes of s remain unchanged even if the bytes of b are modified after the conversion. Conversely, converting a byte slice to a string with string(b) also produces a copy, which ensures that s2 is immutable as well.
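The copy semantics can be checked with a small sketch: modify the byte slice and confirm that the original string is untouched. The helper upperFirst is a hypothetical name, and it assumes the string starts with an ASCII letter.

```go
package main

import "fmt"

// upperFirst (hypothetical helper) returns a copy of s whose first byte is
// upper-cased. It only handles an ASCII lower-case first letter.
func upperFirst(s string) string {
	b := []byte(s) // b holds a copy of the bytes of s
	if len(b) > 0 && b[0] >= 'a' && b[0] <= 'z' {
		b[0] -= 'a' - 'A' // allowed: byte slices are mutable
	}
	return string(b) // converts back to an immutable string
}

func main() {
	s := "left root"
	t := upperFirst(s)
	fmt.Println(s) // left root (unchanged)
	fmt.Println(t) // Left root
}
```

This is the usual way to "edit" a string in Go: convert to a mutable slice, change it, and convert back.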

To avoid conversions and unnecessary memory allocation, many utility functions in the bytes and strings packages come in matching pairs. The strings package has the following six functions:

func Contains(s, substr string) bool
func Count(s, sep string) int
func Fields(s string) []string
func HasPrefix(s, prefix string) bool
func Index(s, sep string) int
func Join(a []string, sep string) string

The corresponding functions in the bytes package are:

func Contains(b, subslice []byte) bool
func Count(s, sep []byte) int
func Fields(s []byte) [][]byte
func HasPrefix(s, prefix []byte) bool
func Index(s, sep []byte) int
func Join(s [][]byte, sep []byte) []byte
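The following sketch asks the same questions of a string and of a byte slice, using the paired functions from both packages. The helper agree and the sample inputs are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// agree (hypothetical helper) reports whether the strings and bytes versions
// of Contains, Index, and Count give the same answers for s and sub.
func agree(s, sub string) bool {
	b, bsub := []byte(s), []byte(sub)
	return strings.Contains(s, sub) == bytes.Contains(b, bsub) &&
		strings.Index(s, sub) == bytes.Index(b, bsub) &&
		strings.Count(s, sub) == bytes.Count(b, bsub)
}

func main() {
	s := "hello, world"
	b := []byte(s)

	fmt.Println(strings.Index(s, "o"), bytes.Index(b, []byte("o")))           // 4 4
	fmt.Println(strings.Count(s, "l"), bytes.Count(b, []byte("l")))           // 3 3
	fmt.Println(strings.HasPrefix(s, "he"), bytes.HasPrefix(b, []byte("he"))) // true true
	fmt.Println(agree(s, "world"))                                            // true

	// Fields splits on white space; Join glues the pieces back together.
	fmt.Println(strings.Join(strings.Fields(" a  b c "), "-")) // a-b-c
	joined := bytes.Join(bytes.Fields([]byte(" a  b c ")), []byte("-"))
	fmt.Println(string(joined)) // a-b-c
}
```

If the data already lives in a []byte (for example, after reading a file), staying in the bytes package avoids converting to a string just to search it.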

Conversion between strings and numbers

In addition to conversions between strings, runes, and bytes, it is often necessary to convert between numeric values and their string representations. The strconv package provides functions for this.

To convert an integer to a string, one option is fmt.Sprintf; another is strconv.Itoa.

x := 123
y := fmt.Sprintf("%d", x)
fmt.Println(y, strconv.Itoa(x)) // "123 123"

FormatInt and FormatUint format numbers in a given base:

fmt.Println(strconv.FormatInt(int64(x), 2)) // "1111011" (x in base 2)

The Atoi and ParseInt functions in the strconv package parse strings that represent integers; ParseUint does the same for unsigned integers:

x, err := strconv.Atoi("123")             // x is an int
y, err := strconv.ParseInt("123", 10, 64) // base 10, at most 64 bits

The third argument of ParseInt specifies the integer size that the result must fit into: for example, 16 for int16, and the special value 0 for int. The result y always has type int64, whatever size is given.
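A short sketch of how the size argument and the returned error behave in practice. The helper parseInt8 is a hypothetical wrapper written for this example; only the strconv and errors calls are standard library functions.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parseInt8 (hypothetical helper) parses a base-10 string that must fit
// into an int8.
func parseInt8(s string) (int8, error) {
	v, err := strconv.ParseInt(s, 10, 8) // v is an int64 even though bitSize is 8
	if err != nil {
		return 0, err
	}
	return int8(v), nil // safe: ParseInt already checked the range
}

func main() {
	fmt.Println(strconv.FormatInt(255, 16)) // ff

	v, err := parseInt8("123")
	fmt.Println(v, err) // 123 <nil>

	_, err = parseInt8("300")
	fmt.Println(errors.Is(err, strconv.ErrRange)) // true: 300 does not fit in 8 bits

	_, err = strconv.Atoi("12a")
	fmt.Println(err != nil) // true: "12a" is not a valid integer
}
```

Checking the error before using the value matters here, because a failed parse still returns a number alongside the error.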

Worked example

Find the longest substring without repeating characters in a string and return its length. Source: LeetCode

Examples

Input: "abcabcbb"
Output: 3
Explanation: The longest substring without repeating characters is "abc", so the length is 3.

Input: "bbbbb"
Output: 1
Explanation: The longest substring without repeating characters is "b", so the length is 1.

Input: "pwwkew"
Output: 3
Explanation: The longest substring without repeating characters is "wke", so the length is 3. Note that the answer must be the length of a substring; "pwke" is a subsequence, not a substring.

Approach

Suppose we scan the string from left to right, and we only need to scan it once. We keep a variable start, which is the start position of the non-repeating substring currently being extended. When we encounter a character x at index i, we need to check whether x already occurs between start and i-1.

How do we check whether the range from start to i-1 contains x? A map, lastOccurred, can record the last position of each character seen during the scan. If x is not in the map, or lastOccurred[x] is before start, nothing needs to be done. If lastOccurred[x] lies between start and i-1, start must move to lastOccurred[x] + 1.

Summary of the approach:

For each character x at index i:

  • If lastOccurred[x] does not exist or is smaller than start, nothing needs to be done
  • If lastOccurred[x] is greater than or equal to start, update start to lastOccurred[x] + 1
  • Update lastOccurred[x] to i, and update the length of the longest substring
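To see how start moves, the steps above can be traced on a small input. The sketch below is not the solution itself: windows is a hypothetical helper that records the current non-repeating window after each character, and it assumes the input is ASCII only.

```go
package main

import "fmt"

// windows (hypothetical helper) returns the current non-repeating window
// after each scanned byte. It assumes s contains only ASCII characters.
func windows(s string) []string {
	lastOccurred := make(map[byte]int)
	start := 0
	var out []string
	for i := 0; i < len(s); i++ {
		ch := s[i]
		if last, ok := lastOccurred[ch]; ok && last >= start {
			start = last + 1 // ch repeats inside the window: move start past it
		}
		lastOccurred[ch] = i
		out = append(out, s[start:i+1])
	}
	return out
}

func main() {
	fmt.Println(windows("pwwkew")) // [p pw w wk wke kew]
}
```

The longest window in the trace, "wke" (or equally "kew"), has length 3, which matches the third example.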

Solution

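package main

import "fmt"

// lengthOfNonrepeatingSubStr returns the length of the longest substring
// of s without repeating characters, treating every byte as one character.
func lengthOfNonrepeatingSubStr(s string) int {
	lastOccurred := make(map[byte]int) // last index at which each byte was seen
	start := 0                         // start of the current non-repeating window
	maxLength := 0
	for i, ch := range []byte(s) {
		// If ch was already seen inside the window, move start past it.
		if lastId, ok := lastOccurred[ch]; ok && lastId >= start {
			start = lastId + 1
		}
		if i-start+1 > maxLength {
			maxLength = i - start + 1
		}
		lastOccurred[ch] = i
	}
	return maxLength
}

func main() {
	fmt.Println(lengthOfNonrepeatingSubStr("abcabcbb")) // 3
	fmt.Println(lengthOfNonrepeatingSubStr("bbbbb"))    // 1
	fmt.Println(lengthOfNonrepeatingSubStr("pwwkew"))   // 3
	fmt.Println(lengthOfNonrepeatingSubStr(""))         // 0
	fmt.Println(lengthOfNonrepeatingSubStr("b"))        // 1
	fmt.Println(lengthOfNonrepeatingSubStr("abcdefg"))  // 7
}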

This solution is accepted on LeetCode, but if the string contains Chinese characters the result is no longer accurate: the code walks the string byte by byte, and each Chinese character takes several bytes in UTF-8. The program therefore needs to be modified to support Chinese.

Supporting Chinese strings

The key to supporting Chinese is the rune type in Go. Look at the following example:

S := "Yes, I am a test!" For _, b := range []byte(s) {for _, b := range []byte(s) {for _, b := range []byte(s) {FMT.Printf("%X ", b) // Print the result in hexadecimal format: 59 65 73 E6 88 91 E6 98 AF E4 B8 80 E4 B8 AA E6 B5 8B E8 AF 95 21} fmt.Println() for I, ch := range s { Printf("(%d %X)", I, ch)// (0 59)(1 65)(2 73)(3 6211)(6 662F)(9 4E00)(12 4E2A)(15 6D4B)(18 8BD5)(21 21) } fmt.Println() fmt.Println("Rune count:", Utf8.runecountinstring (s))//10 For I, ch := range []rune(s) {//rune is an alias for int32, Each represents four bytes FMT. Printf (" (% d % c) ", I, ch) / / print the results (0), Y (1) e (2 s) (3 I) (4) (5) (6) (7) (8) (9!) } this traversal mode, can be a string normal traversal, and do not care about the ChineseCopy the code

Modify the longest non-repeating substring solution above to use runes:

func newNonRepeat(s string) int {
	lastOccurred := make(map[rune]int)
	start := 0
	maxLength := 0
	for i, ch := range []rune(s) {
		lastId, ok := lastOccurred[ch]
		if ok && lastId >= start {
			start = lastId + 1
		}
		if i-start+1 > maxLength {
			maxLength = i - start + 1
		}
		lastOccurred[ch] = i
	}
	return maxLength
}

func main() {
	fmt.Println(newNonRepeat(" I am a test ")) // 5
	fmt.Println(newNonRepeat(" 1 2 2 2 1 "))   // 3
}
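To see the difference between the two versions on actual Chinese input, the sketch below runs a byte-based and a rune-based variant side by side. The names longestByBytes and longestByRunes and the sample string are assumptions made for this comparison; the logic follows the two solutions above.

```go
package main

import "fmt"

// longestByBytes (hypothetical name) treats every byte as one character.
func longestByBytes(s string) int {
	last := make(map[byte]int)
	start, best := 0, 0
	for i, ch := range []byte(s) {
		if j, ok := last[ch]; ok && j >= start {
			start = j + 1
		}
		if i-start+1 > best {
			best = i - start + 1
		}
		last[ch] = i
	}
	return best
}

// longestByRunes (hypothetical name) treats every Unicode code point as
// one character.
func longestByRunes(s string) int {
	last := make(map[rune]int)
	start, best := 0, 0
	for i, ch := range []rune(s) {
		if j, ok := last[ch]; ok && j >= start {
			start = j + 1
		}
		if i-start+1 > best {
			best = i - start + 1
		}
		last[ch] = i
	}
	return best
}

func main() {
	// "一二三二一": each of these characters takes three bytes in UTF-8.
	s := "\u4e00\u4e8c\u4e09\u4e8c\u4e00"
	fmt.Println(longestByBytes(s)) // 5: a count of bytes, not characters
	fmt.Println(longestByRunes(s)) // 3: the substring "一二三"

	// For ASCII-only input the two versions agree.
	fmt.Println(longestByBytes("pwwkew"), longestByRunes("pwwkew")) // 3 3
}
```

The byte-based answer is wrong here because the three characters share their first UTF-8 byte (0xE4), so the window is cut in the middle of characters.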