Author: Greg Heo (@gregheo on Twitter). Original article: Swift Substrings

Almost every programming language adds features or syntactic sugar for text strings. Conceptually, a string is an array of characters, but the compiler lets you type "hello" instead of ["h", "e", "l", "l", "o"]. Higher-level languages such as Swift treat strings not just as character arrays but as a full type with its own features. Let’s start with one feature of strings: substrings.

A quick look at String

First, take a quick look at the string implementation. The following code comes from the standard library, String.swift:

public struct String {
  public var _core: _StringCore
}

There are also a number of initializers, but this is the only stored property in the declaration! The secrets must all be in StringCore.swift:

public struct _StringCore {
  public var _baseAddress: UnsafeMutableRawPointer?
  var _countAndFlags: UInt
  public var _owner: AnyObject?
}

There is a lot more in this type, but let’s focus on the stored properties:

  • Base Address – a pointer to the internal storage.
  • Count – the length of the string. It lives in _countAndFlags, a UInt, so on a 64-bit system 62 (64 - 2) bits are available to represent the length. That is a very large number, so the length is unlikely to overflow.
  • Flags – two bits are used for flags. The first bit indicates whether the storage is held by a _StringBuffer; the second bit indicates whether the encoding is ASCII or UTF-16.

The real _StringCore is a bit more complicated than this, but this simplified view makes the basics easier to understand: a string has some internal storage and a size for that storage.
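
To make the idea of sharing one UInt between a count and two flags more concrete, here is a minimal sketch. The type name, the flag names, and the exact bit positions are assumptions chosen for illustration; they are not taken from the standard library source.

```swift
// Hypothetical layout, for illustration only: the two highest bits are flags
// and the remaining bits hold the count. The real _StringCore may differ.
struct PackedCountAndFlags {
    private static let countMask: UInt = UInt.max >> 2   // low 62 bits on a 64-bit system
    private static let ownedBit: UInt = 1 << UInt(UInt.bitWidth - 1)
    private static let wideBit: UInt = 1 << UInt(UInt.bitWidth - 2)

    private var raw: UInt

    init(count: Int, isOwned: Bool, isWide: Bool) {
        // The count must fit in the bits that are left after the flags.
        precondition(count >= 0 && UInt(count) <= PackedCountAndFlags.countMask)
        raw = UInt(count)
        if isOwned { raw |= PackedCountAndFlags.ownedBit }
        if isWide { raw |= PackedCountAndFlags.wideBit }
    }

    // Mask the flags away to read the count back.
    var count: Int { return Int(raw & PackedCountAndFlags.countMask) }
    var isOwned: Bool { return raw & PackedCountAndFlags.ownedBit != 0 }
    var isWide: Bool { return raw & PackedCountAndFlags.wideBit != 0 }
}

let packed = PackedCountAndFlags(count: 12, isOwned: true, isWide: false)
print(packed.count, packed.isOwned, packed.isWide)
```

The count and both flags fit in a single machine word, which is the point of storing them together.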

Substrings

How do I create a substring in Swift? The easiest way to do this is to get a segment from a string by subscripting it:

let str = "Hello Swift!"
let slice = str[str.startIndex..<str.index(str.startIndex, offsetBy: 5)]
// "Hello"

It is simple, but the code is not very elegant 😄. A String index is not a plain integer, so positions have to be obtained with startIndex and index(_:offsetBy:). If the slice starts at the beginning of the string, a partial range lets us omit the lower bound:

let withPartialRange = str[..<str.index(str.startIndex, offsetBy: 5)]
// still "Hello"

Or use this Collection method:

let slice = str.prefix(5)
// still "Hello"

Keep in mind that strings are collections, so you can use Collection methods such as prefix(), suffix(), dropFirst(), and so on.
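
As a small illustration of these Collection methods on a string, the sketch below uses variable names chosen here for the example. Note that each call returns a Substring rather than a String.

```swift
let text = "Hello Swift!"

let head = text.prefix(5)      // first five characters
let tail = text.suffix(6)      // last six characters
let rest = text.dropFirst(6)   // everything after "Hello "

print(head)            // Hello
print(tail)            // Swift!
print(rest)            // Swift!
print(type(of: head))  // Substring
```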

The internals of Substring

The magic of substrings is that they reuse the parent string’s memory. You can think of a Substring as a slice of its parent string.

For example, if you slice 100 characters out of an 8,000-character string, no new storage has to be allocated for those 100 characters. It also means you can accidentally extend the lifetime of the parent string: if you take a small slice of a very large string, the large string cannot be freed as long as the small slice is still alive.

How does this work inside Substring?

public struct Substring {
  internal var _slice: RangeReplaceableBidirectionalSlice<String>

The internal _slice property holds all information about the parent string:

// Still inside Substring
internal var _wholeString: String {
  return _slice._base
}
public var startIndex: Index { return _slice.startIndex }
public var endIndex: Index { return _slice.endIndex }

The computed property _wholeString (which returns the entire parent string), startIndex, and endIndex all simply forward to the internal _slice. You can also see how the slice keeps a reference to the parent string.
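
The same relationship can be observed from outside with public API. The sketch below assumes the public base property of Substring and uses example names chosen here; it shows that a slice reports its positions as indices into the parent string.

```swift
let parentText = "Hello Swift!"
let tailSlice = parentText.dropFirst(6)   // the "Swift!" part, as a Substring

// The slice still knows its whole parent string.
let sameParent = tailSlice.base == parentText

// Its bounds are indices into the parent, not indices starting from zero.
let startsAtOffset6 =
    tailSlice.startIndex == parentText.index(parentText.startIndex, offsetBy: 6)
let endsWithParent = tailSlice.endIndex == parentText.endIndex

print(sameParent, startsAtOffset6, endsWithParent)   // true true true
```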

Converting a Substring to a String

In the end, you may have many substrings in your code while the functions you call expect a String. Converting a Substring to a String is simple:

let string = String(substring)
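
This conversion is also the usual way to avoid the lifetime problem mentioned earlier. The following is a minimal sketch with made-up names and sizes: keep the converted String for long-term use and let the slice go.

```swift
// Hypothetical scenario: only a short prefix of a large text is needed later.
let bigText = String(repeating: "Swift ", count: 10_000)

// A Substring is cheap to create, but it may keep bigText's storage alive.
let sliceView: Substring = bigText.prefix(5)

// Converting produces a String that no longer depends on bigText.
let ownedCopy = String(sliceView)

print(ownedCopy)        // Swift
print(ownedCopy.count)  // 5
```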

Because a substring shares memory with its parent string, you would guess that creating a new String has to allocate new storage. So what does this String initializer actually look like?

extension String {
  public init(_ substring: Substring) {
    // 1
    let x = substring._wholeString
    // 2
    let start = substring.startIndex
    let end = substring.endIndex
    // 3
    let u16 = x._core[start.encodedOffset..<end.encodedOffset]
    // 4A
    if start.samePosition(in: x.unicodeScalars) != nil
    && end.samePosition(in: x.unicodeScalars) != nil {
      self = String(_StringCore(u16))
    }
    // 4B
    else {
      self = String(decoding: u16, as: UTF16.self)
    }
  }
}

  1. Creates a reference to the original parent string.
  2. Gets the start and end positions of the substring within the parent string.
  3. Gets the substring’s contents as UTF-16. _core is an instance of _StringCore.
  4. Checks whether both positions fall on Unicode scalar boundaries and creates the new string instance accordingly: directly from the UTF-16 storage (4A), or by decoding it as UTF-16 (4B).

Converting a Substring to a String is simple, but consider whether you need to do it in the first place. Does every operation on a substring really require a String? If every operation meant converting to a String first, lightweight substrings would be pointless. 🤔

StringProtocol

Enter StringProtocol! StringProtocol is a good example of protocol-oriented programming. It abstracts the functionality common to strings, such as uppercased() and lowercased(), along with conformances such as Comparable and Collection. Both String and Substring conform to StringProtocol. That means you can compare a Substring and a String with == directly, without converting:

let helloSwift = "Hello Swift"
let swift = helloSwift[helloSwift.index(helloSwift.startIndex, offsetBy: 6)...]
// comparing a substring to a string 😱
swift == "Swift"  // true

You can also iterate over a Substring, or take a substring of a substring. A small number of functions in the standard library take a StringProtocol value as an argument. For example, the failable integer initializer init?(_:radix:) accepts any type that conforms to StringProtocol. Even when your own code does not care whether it receives a String or a Substring, using StringProtocol as the parameter type is friendlier to callers because they no longer need to convert.
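
The sketch below shows what this looks like in practice. The shout function and the sample values are hypothetical names made up for this example; Int(_:) is the standard library initializer mentioned above.

```swift
// Hypothetical helper: generic over StringProtocol, so it accepts
// both String and Substring without any conversion at the call site.
func shout<S: StringProtocol>(_ text: S) -> String {
    return text.uppercased() + "!"
}

let greeting = "Hello Swift"
let word = greeting.prefix(5)                // Substring "Hello"

print(shout(greeting))                       // HELLO SWIFT!
print(shout(word))                           // HELLO!

// A substring of a substring is still a Substring.
let inner = word.dropFirst(1).prefix(3)      // "ell"
print(inner)

// Int's initializer takes any StringProtocol value, so a Substring works directly.
let digits = "Room 42".dropFirst(5)          // Substring "42"
let number = Int(digits)
print(number as Any)                         // Optional(42)
```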

Conclusion

  • A String is the same familiar string as always.
  • A Substring is a slice of a string. It shares memory with its parent string and records its own start and end positions.
  • Both String and Substring conform to StringProtocol, which contains the basic properties and functions of a string.

Are you thinking that you could define your own string type and conform it to StringProtocol?

/// Do not declare new conformances to `StringProtocol`. Only the `String` and
/// `Substring` types in the standard library are valid conforming types.
public protocol StringProtocol

But Apple says no.


  • Weibo: @Zhuo without a story