Abstract: An interesting crash investigation: the crash occurs under Clang but not under GCC.

This article was originally published on Nebula Graph’s official blog: nebula-graph.com.cn/posts/troub…

If someone told you that the following C++ function would cause a program to crash, what would come to mind?

std::string b2s(bool b) {
    return b ? "true" : "false";
}

What if you were given a few more details, such as:

  • The crash reproduces only with a certain probability.
  • The cause of the crash is a segmentation fault (SIGSEGV).
  • Backtraces from the crash site are often incomplete or missing entirely.
  • It reproduces (more easily) only at optimization level -O2 or above.
  • It reproduces only under Clang, not under GCC.

Some veterans may already have a clue. Here is a minimal reproduction program, along with the steps to reproduce:

// file crash.cpp
#include <iostream>
#include <string>

std::string __attribute__((noinline)) b2s(bool b) {
    return b ? "true" : "false";
}

union {
    unsigned char c;
    bool b;
} volatile u;

int main() {
    u.c = 0x80;
    std::cout << b2s(u.b) << std::endl;
    return 0;
}
$ clang++ -O2 crash.cpp
$ ./a.out
truefalse,d$x4DdzRx
Segmentation fault (core dumped)
$ gdb ./a.out core.3699
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000012cfffff0d4 in ?? ()
(gdb) bt
#0  0x0000012cfffff0d4 in ?? ()
#1  0x00000064fffff0f4 in ?? ()
#2  0x00000078fffff124 in ?? ()
#3  0x000000b4fffff1e4 in ?? ()
#4  0x000000fcfffff234 in ?? ()
#5  0x00000144fffff2f4 in ?? ()
#6  0x0000018cfffff364 in ?? ()
#7  0x0000000000000014 in ?? ()
#8  0x0110780100527a01 in ?? ()
#9  0x0000019008070c1b in ?? ()
#10 0x0000001c00000010 in ?? ()
#11 0x0000002ffffff088 in ?? ()
#12 0xe2ab001010074400 in ?? ()
#13 0x0000000000000000 in ?? ()

The incomplete backtrace suggests that the program did not crash at the point where things first went wrong. To quickly find that first point of failure, try AddressSanitizer (ASan):

$ clang++ -g -O2 -fno-omit-frame-pointer -fsanitize=address crash.cpp
$ ./a.out
=================================================================
==3699==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000000552805 at pc 0x0000004ff83a bp 0x7ffd7610d240 sp 0x7ffd7610c9f0
READ of size 133 at 0x000000552805 thread T0
    #0 0x4ff839 in __asan_memcpy (a.out+0x4ff839)
    #1 0x5390a7 in b2s[abi:cxx11](bool) crash.cpp:6
    #2 0x5391be in main crash.cpp:16:18
    #3 0x7faed604df42 in __libc_start_main (/usr/lib64/libc.so.6+0x23f42)
    #4 0x41c43d in _start (a.out+0x41c43d)

0x000000552805 is located 59 bytes to the left of global variable '<string literal>' defined in 'crash.cpp:6:25' (0x552840) of size 6
  '<string literal>' is ascii string 'false'
0x000000552805 is located 0 bytes to the right of global variable '<string literal>' defined in 'crash.cpp:6:16' (0x552800) of size 5
  '<string literal>' is ascii string 'true'
SUMMARY: AddressSanitizer: global-buffer-overflow (/home/dutor.hou/Wdir/nebula-graph/build/bug/a.out+0x4ff839) in __asan_memcpy
Shadow bytes around the buggy address:
…
...

From the ASan report, we can tell that a “global buffer overflow” occurred when b2s(bool) read the string constant “true”. Looking at the problem function and the reproduction program again with the benefit of hindsight, it seems reasonable to conclude that because the bool parameter b of b2s was never properly initialized, b holds a value other than 0 or 1 [1]. So the question is: why does such a value of b cause a buffer overflow? If you are curious, try changing the type of b from bool to char or int; the problem goes away.
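As a minimal sketch of that experiment, the variant below takes the flag as an unsigned char. The comparison with zero is then an ordinary integer comparison, so every byte value from 0x00 to 0xFF has well-defined behavior. The names b2s_from_byte and check_b2s_from_byte are hypothetical and do not appear in the original program.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of b2s: the flag arrives as a raw byte instead of a
// bool, so the compiler cannot assume that the value is only 0 or 1.
std::string b2s_from_byte(unsigned char raw) {
    return raw != 0 ? "true" : "false";
}

// Small self-check using the byte value from the reproduction program.
void check_b2s_from_byte() {
    assert(b2s_from_byte(0x80) == "true");
    assert(b2s_from_byte(0x00) == "false");
}
```

This only sidesteps the symptom; the underlying defect is still that a bool object was given a byte pattern it must never hold.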

To answer this question, we’ll have to look at what clang++ is generating for b2s (we mentioned earlier that there was no crash under GCC, so the problem may have something to do with code generation). Before we do, we should know:

  • In the sample program, the return value of b2s is a temporary std::string object, which is stored on the stack.
  • Since C++11, GCC’s default std::string implementation uses SBO (Small Buffer Optimization) and is defined roughly as std::string { char *ptr; size_t size; union { char buf[16]; size_t capacity; }; }. For strings with a length less than 16, no extra memory is required.
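The small-buffer behavior described above can be observed from the outside. The sketch below assumes a standard library whose std::string keeps short strings inside the object itself (as the layout above does); stored_inline and demo_small_buffer are hypothetical helper names, and the pointer comparison is done on integer values purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns true when the character data of `s` lives inside the std::string
// object itself (the small buffer) rather than in a separate heap block.
bool stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= object_begin && data < object_end;
}

// Assumption: the implementation uses a small buffer of roughly 16 bytes.
void demo_small_buffer() {
    const std::string small = "true";   // 4 characters, fits in the buffer
    const std::string large(100, 'x');  // too long, must be heap-allocated
    assert(stored_inline(small));
    assert(!stored_inline(large));
}
```

For b2s this means that both "true" and "false" are copied straight into the buffer inside the returned object, which is why the memcpy destination ends up on the caller's stack.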

OK, so let’s take a look at the disassembly of b2s, with the key instructions annotated:

(gdb) disas b2s
Dump of assembler code for function b2s[abi:cxx11](bool):
   0x00401200 <+0>:     push   %r14
   0x00401202 <+2>:     push   %rbx
   0x00401203 <+3>:     push   %rax
   0x00401204 <+4>:     mov    %rdi,%r14              # save the starting address of the return value to r14
   0x00401207 <+7>:     mov    $0x402010,%ecx         # save the starting address of "true" to ecx
   0x0040120c <+12>:    mov    $0x402015,%eax         # save the starting address of "false" to eax
   0x00401211 <+17>:    test   %esi,%esi              # test whether parameter b is non-zero
   0x00401213 <+19>:    cmovne %rcx,%rax              # if b is non-zero, save the address of "true" to rax
   0x00401217 <+23>:    lea    0x10(%rdi),%rdi        # save the starting address of buf in the string to rdi, the first parameter of memcpy
   0x0040121b <+27>:    mov    %rdi,(%r14)            # save rdi to the ptr field of the string, i.e. SBO
   0x0040121e <+30>:    mov    %esi,%ebx              # save b to ebx
   0x00401220 <+32>:    xor    $0x5,%rbx              # compute 0x5 ^ b, i.e. the length of the string
   0x00401224 <+36>:    mov    %rax,%rsi              # save the starting address of the string to rsi, the second parameter of memcpy
   0x00401227 <+39>:    mov    %rbx,%rdx              # save the length of the string to rdx, the third parameter of memcpy
   0x0040122a <+42>:    callq  <memcpy@plt>           # call memcpy
   0x0040122f <+47>:    mov    %rbx,0x8(%r14)         # save the string length to string::size
   0x00401233 <+51>:    movb   $0x0,0x10(%r14,%rbx,1) # terminate the string with '\0'
   0x00401239 <+57>:    mov    %r14,%rax              # save the string address to rax, the return value
   0x0040123c <+60>:    add    $0x8,%rsp
   0x00401240 <+64>:    pop    %rbx
   0x00401241 <+65>:    pop    %r14
   0x00401243 <+67>:    retq
End of assembler dump.

At this point, the problem becomes crystal clear:

  1. clang++ assumes that a value of type bool is either 0 or 1.
  2. The lengths of "true" and "false" are known at compile time.
  3. An xor instruction (0x5 ^ false == 5, 0x5 ^ true == 4) computes the length of the string to be copied.
  4. When the bool value does not match the assumption, the computed length is wrong.
  5. Because the destination of memcpy is on the stack (in this example only), the buffer on the stack may overflow as well, sending the program off into the weeds and destroying the backtrace.
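The length computation in step 3 can be modeled in a few lines. This is a sketch that assumes the caller passes the byte holding b zero-extended in the register, as the disassembly suggests; modeled_copy_length and check_against_asan_report are hypothetical names. With the 0x80 byte from the reproduction program, the model yields 133, which matches the "READ of size 133" line in the ASan report.

```cpp
#include <cassert>
#include <cstddef>

// Models `xor $0x5,%rbx`: the copy length is 0x5 xor the raw byte of b.
// The result is a valid string length only when that byte is 0 or 1.
constexpr std::size_t modeled_copy_length(unsigned char raw_bool_byte) {
    return static_cast<std::size_t>(0x5) ^ raw_bool_byte;
}

static_assert(modeled_copy_length(0x00) == 5, "false: length of \"false\"");
static_assert(modeled_copy_length(0x01) == 4, "true: length of \"true\"");

// 0x80 is neither 0 nor 1, so the "length" becomes 0x85 == 133 bytes.
void check_against_asan_report() {
    assert(modeled_copy_length(0x80) == 133);
}
```

Copying 133 bytes into a 16-byte buffer is what overruns the stack in step 5.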

Note:

  1. The C++ standard requires the bool type to be able to represent at least two states, true and false, but it does not specify the value of sizeof(bool). On almost all compiler implementations, however, bool occupies one addressable unit, i.e. a byte. From a storage perspective, therefore, the value range is 0x00-0xFF, i.e. 256 states.
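To make the note concrete, the sketch below inspects the byte behind a bool without reading it as a bool. It assumes the common ABI in which bool occupies one byte, false is stored as 0x00 and true as 0x01; raw_byte_of, is_canonical and check_flag are hypothetical helper names.

```cpp
#include <cassert>
#include <cstring>

static_assert(sizeof(bool) == 1, "this sketch assumes a one-byte bool");

// Copies out the object representation of a bool via memcpy, so the byte
// can be inspected even when it is not a valid bool value.
unsigned char raw_byte_of(const bool& b) {
    unsigned char raw;
    std::memcpy(&raw, &b, sizeof raw);
    return raw;
}

// Of the 256 possible byte values, only 0 and 1 are valid (assumed ABI).
bool is_canonical(unsigned char raw) {
    return raw <= 1;
}

// Debug-time guard: catch a corrupted flag before it is used as a bool.
void check_flag(const bool& b) {
    assert(is_canonical(raw_byte_of(b)));
}
```

A guard like check_flag, placed where a flag is read from untrusted or possibly uninitialized memory, fails at the point of corruption rather than later inside an unrelated memcpy.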
