The picture is from Unsplash

String flipping as an algorithm problem has been a basic problem, nothing more than reverse traversal, double pointer traversal, recursion, code can be written in minutes:

void strrev(char *str) {
    size_t start = 0;
    size_t end = start + strlen(str) - 1;
    
    while (start < end) {
        charch = str[start]; str[start++] = str[end]; str[end--] = ch; }}Copy the code

OK, the above code can definitely be AC in LeetCode, but can it be AC in actual situation? The answer is definitely no! A good string flipping algorithm is at least Medium in LeetCode.

First of all, we know that strings have encoding rules, such as utF-8, UTF-16 used in the early days of Windows, etc… In the case of ASCII characters such as English letters, both UTF-8 and ASCII encodings are one byte, so the above approach is not too problematic. However, in the case of Chinese characters, a Chinese character will take up 3 bytes in UTF-8, which will be garbled if simply flipped by byte.

So what’s the solution?

The easiest way to do this is to use the mbstowcs function to convert a string of type CHAR * to a wide string of type wchar_t, which is 4 bytes on Linux and UNIX systems and 2 bytes on Windows. Four bytes means that characters will be encoded in UTF-32 and can be stored in both Chinese and Emoji. However, for utF-16 characters with 2 bytes, Chinese characters can be represented, but Emoji characters in auxiliary plane code points need two codes, so the method in this paper is not applicable for the time being.

First let’s take a look at the improved string flipping:

static void strrev2(char *str) {
    setlocale(LC_CTYPE, "UTF-8");
    size_t len = mbstowcs(NULL, str, 0);
    wchar_t *wcs = (wchar_t *) calloc(len + 1.sizeof(wchar_t));
    mbstowcs(wcs, str, len + 1);
    
    size_t start = 0;
    size_t end = start + len - 1;
    
    while (start < end) {
        wchar_t wc = wcs[start];
        wcs[start++] = wcs[end];
        wcs[end--] = wc;
    }
    
    wcstombs(str, wcs, wcstombs(NULL, wcs, 0));
    free(wcs);
}
Copy the code

To use conversion functions such as MBstowcs, you first need to set the system encoding of the string, otherwise the function cannot determine what you pass in char *. In this article, both the source code and the STD I/O in the system environment use UTF-8 encoding.

Next we call mbstowcs once without passing in the destination address or character length, which allows the function to directly calculate the number of wchar_t required and return it so that we can allocate memory.

Then a regular string based on wchar_t is flipped, and finally converted back to free up memory.

Bonus: String flipping in Cocoa development

As an iOS developer, of course, there are workarounds in OC to consider.

Plan 1:

Iterate through the substring through the API, and then insert forward into the new NSMutableString.

- (NSString *)my_stringByReversing {
    NSMutableString *reversed = [NSMutableString stringWithCapacity:self.length];
    NSRange range = NSMakeRange(0, self.length);
    [self enumerateSubstringsInRange:range
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString * _Nullable substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL * _Nonnull stop) {
                              [reversed insertString:substring atIndex:0];
                          }];
    return [reversed copy];
}
Copy the code

This method works best and extracts Composed Emoji (such as 👨👩👧👧) as well, because they are Composed of multiple Unicode characters that wouldn’t fit even a 4-byte wchar_t. The downside of this approach, however, is that it can be expensive, as we’ll see in a moment.

Scheme 2:

Get the C String from the API, process it with the method described at the beginning of this article, and then reconstruct the NSString from the processed C String.

- (NSString *)my_stringByReversing2 {
    NSUInteger length = [self lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
    char *buf = calloc(length + 1, 1);
    [self getCString:buf maxLength:length + 1 encoding:NSUTF8StringEncoding];
    strrev2(buf);
    NSString *reversed = [NSString stringWithCString:buf encoding:NSUTF8StringEncoding];
    free(buf);
    return reversed;
}
Copy the code

The advantage of this method is that it is efficient, with more than 100 times better performance than the traversal method in tests, but the problem is that it cannot handle complex emojis.

Two methods, in the use of a good measure.

Solution 3:

Swift. The basic unit of Swift’s String is Character, which is a collection of Unicode Scalar and represents a renderable Character, including Composed Emoji. In addition, strings implement BidirectionalCollection with the reversed method, which allows easy String flipping. As a reminder, Swift String is based on Character, so I can reuse the Index before I can retrieve a Character. I have seen many people like to write

str.index(str.startIndex, offsetBy: i)
Copy the code

This is costly, because the Index move requires traversing the string from the starting point, and traversing the string this way is approximately O(n!). .

If you are interested, you can try Swift performance ~