“This is the 15th day of my participation in the November Gwen Challenge. See details of the event: The Last Gwen Challenge 2021”.

preface

No more nonsense, data structure must learn! I will update a chapter every day. If I can’t finish one, I will divide it into two chapters

Five, the string

5.1 Definition of strings

A string is a finite sequence of zero or more characters, also called a string.

Generally written as s = “a1a2a3a4…… An “(n>0), where s is the name of the string, the sequence of characters enclosed in double quotation marks is the value of the string, and the single quotation marks do not belong to the content of the string

The number of characters in a string, n, is called the length of the string, and a string of zero characters is called an empty string, which can be represented by “” or an empty set

5.2 Comparison of strings

Two numbers are nice to compare, but how do strings compare to each other?

For example, which is bigger, silly or stupid?

The first letter is both S, so we don’t think there’s any difference in size, and the second letter, I comes before t, so we say, “silly” > “stupid”.

In fact, strings are compared by the encoding between the characters that make up the string, and the encoding refers to the serial number of the characters in the corresponding character set

The most common characters in computers are the standard ASCII code, or more accurately, the 7-bit binary code for a character, which can represent up to 128 characters. Later, the extended ASCII code is the 8-bit binary code for a character, which can represent up to 256 characters. There are enough characters for the input, storage, and output operations of english-dominated languages and special symbols.

Obviously, 256 characters were not enough, so Unicode encodings were developed, which are more commonly used to represent 16-bit binary numbers — one character. This gives a total of 216 characters, which is more than 650,000 characters, enough to represent all characters in all the languages of the world. Of course, in order to be compatible with ASCII, the first 256 characters of Unicode are exactly the same as ASCII.

5.3 Abstract data types for strings

The basic operation on strings is quite different from that on linear tables. Linear tables are more concerned with operations on individual elements, such as finding an element, inserting or deleting an element, but strings are more concerned with finding a substring position, getting a substring at a given position, replacing a substring, and so on.

ADT string (string) Data string elements consist of only one character. Adjacent elements have a precursor and a successor relationship. Operation StrAssign (T,*chars) : Generates a string T whose value is equal to the string constant chars. StrCopy(T,s): string S exists and T is copied from string S. ClearString(s) : the string s exists and clears the string. StringEmpty(s) : Returns true if the string s is empty, false otherwise. StrLength(s) : Returns the number of elements of string S, that is, the length of the string. StrCompare (S,T) : if S>T, return 0, if S=T, return 0, if S<T, return <0. Concat (T,S1,S2) : use T to return a new string joined by S1 and S2. SubString (Sub,s,pos,len) : string s exists, 1≤pos≤StrLength(s), and 0≤len≤StrLength (s) -pos+1. Index (S,T,pos) : string S and T exist,T is a non-empty string, 1≤pos≤StrLength(S). If there is a substring in S with the same value as T, return the position where it first appeared after the pos character in S, otherwise return 0. Replace (S,T,v): Strings S,T, and v exist, and T is a non-empty string. Replace all non-overlapping substrings that are equal to T in primary string s with V. StrInsert (s,pos,t) : string s and t exist, 1≤pos≤StrLength(s) +1. Insert string T before pos of string S. StrDelete (S,pos,len) : string S exists, 1≤pos≤StrLength (S) -len+1. Removes substrings of length len from string S at pos. endADTCopy the code

In the next section we introduce the string storage structure and some common pattern matching algorithms