We use crash log collection libraries to locate and troubleshoot crashes in production, but some crash stacks carry very little information, and the crashes cannot be reproduced reliably, which makes direct troubleshooting difficult. Here I share a technique for analyzing such crash logs: register assignment tracing. Without further ado, let's look at a case:

Date/Time: 2021-03-25 04:35:38.211 +0800
OS Version: iOS 10.3.2 (14F89)
Report Version: 104
Monitor Type: Unix Signal
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x00000001808f1014
Crashed Thread: 27

Pthread id: 1313098
Thread 27 Crashed:
0   libsystem_kernel.dylib    __pthread_kill + 8
1   libsystem_pthread.dylib   pthread_kill + 112
2   libsystem_c.dylib         abort + 140
3   libsystem_malloc.dylib    szone_error + 420
4   libsystem_malloc.dylib    free_list_checksum_botch.295 + 36
5   libsystem_malloc.dylib    tiny_free_list_remove_ptr + 288
6   libsystem_malloc.dylib    tiny_free_no_lock + 684
7   libsystem_malloc.dylib    free_tiny + 472
8   CoreFoundation            _CFRelease + 1228
9   testApp                   __99-[XXX fn:queue:]_block_invoke + 384
10  libdispatch.dylib         _dispatch_call_block_and_release + 24
11  libdispatch.dylib         _dispatch_client_callout + 16
12  libdispatch.dylib         _dispatch_queue_serial_drain + 928
13  libdispatch.dylib         _dispatch_queue_invoke + 884
14  libdispatch.dylib         _dispatch_root_queue_drain + 540
15  libdispatch.dylib         _dispatch_worker_thread3 + 124
16  libsystem_pthread.dylib   _pthread_wqthread + 1096

Pthread id: 1313098
Thread 27 Crashed:
0   libsystem_kernel.dylib    0x00000001808f1014 __pthread_kill + 8
1   libsystem_pthread.dylib   0x00000001809bb264 pthread_kill + 112
2   libsystem_c.dylib         0x00000001808659c4 abort + 140
3   libsystem_malloc.dylib    0x0000000180931828 szone_error + 420
4   libsystem_malloc.dylib    0x000000018093b74c 0x180924000 + 96076
5   libsystem_malloc.dylib    0x0000000180928994 0x180924000 + 18836
6   libsystem_malloc.dylib    0x000000018093ba00 0x180924000 + 96768
7   libsystem_malloc.dylib    0x000000018093c0c8 0x180924000 + 98504
8   CoreFoundation            0x00000001818a701c 0x1817ca000 + 905244
9   testApp                   0x0000000103b9e5b8 __99-[XXX fn:queue:]_block_invoke + 384
10  libdispatch.dylib         0x00000001807ae9e0 0x1807ad000 + 6624
11  libdispatch.dylib         0x00000001807ae9a0 0x1807ad000 + 6560
12  libdispatch.dylib         0x00000001807bcad4 0x1807ad000 + 64212
13  libdispatch.dylib         0x00000001807b22cc 0x1807ad000 + 21196
14  libdispatch.dylib         0x00000001807bea50 0x1807ad000 + 72272
15  libdispatch.dylib         0x00000001807be7d0 0x1807ad000 + 71632
16  libsystem_pthread.dylib   0x00000001809b7100 _pthread_wqthread + 1096

Binary Images:
0x1000b8000 - 0x108263fff +testApp arm64 <cb8cc0075ed4352b802c6c586b8a93d5> /var/containers/Bundle/Application/0A1F4541-8749-4F9D-B60A-813FFEE69CA6/testApp.app/testApp

From the crash information above, it can be seen that the crash occurred on a GCD queue thread. Frame 9 of the crash stack shows that the crash happened in the code of a block defined in the method -[XXX fn:queue:], but it tells us nothing more. The source code of that method is as follows:

@interface TestObj : NSObject
- (NSString *)testString;
- (NSInteger)length;
@end

// Method of class XXX
- (void)fn:(TestObj *)testObj queue:(dispatch_queue_t)queue
{
    dispatch_async(queue, ^{
        @autoreleasepool {
            if ([testObj length] != 0) {
                NSString *suffix = [testObj testString];
                const static int len = 4;
                if (suffix.length > len) {
                    suffix = [suffix substringToIndex:len];
                }
            }
        }
    });
}

The method -[XXX fn:queue:] makes a dispatch_async call, and the block is defined inside the method. From the source alone there is still no way to pinpoint which line crashed, and the crash does not occur every time the code runs.

So to find the cause, we need to go down to the assembly level. The steps are as follows:

  1. Download the executable file locally, or obtain the app package from the CI distribution platform and unpack it.

  2. Disassemble the code with otool, which ships with the system. The following otool command format displays the disassembly starting at a specific function or method:

otool "<executable file path>" -p "<function or method name>" -V -t

In this command, -p is followed by a method name, function name, or symbol name, and disassembly starts from that symbol. Be careful here: the compiler prepends an underscore to C-style function and symbol names, so you need to add one extra underscore to the name shown in the crash log. -t prints the contents of the (__TEXT,__text) section, that is, the code. -V prints the disassembly with symbolic operands (it implies -v, which prints the section as disassembled instructions rather than raw bytes).

For this example, use otool as follows:

  otool   "/Users/apple/Downloads/Payload/testApp.app/testApp" -p "___99-[XXX fn:queue:]_block_invoke" -V -t 
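The command construction above can be sketched in Python. This is only an illustration: the helper names `otool_symbol_name` and `build_otool_command` are made up for this example, and the rule "Objective-C method names are used as-is, everything else gets one extra underscore" is a heuristic assumed here, not something otool documents.

```python
import shlex


def otool_symbol_name(crash_log_symbol):
    """Return the name to pass to `otool -p` for a symbol taken from a crash log.

    Heuristic (an assumption of this sketch): Objective-C method names such as
    "-[Class method]" are used unchanged, while C-style symbols, including
    block invoke functions, get one extra leading underscore.
    """
    if crash_log_symbol.startswith(("-[", "+[")):
        return crash_log_symbol
    return "_" + crash_log_symbol


def build_otool_command(executable_path, crash_log_symbol):
    """Build the argument list for the otool invocation described above."""
    return ["otool", executable_path,
            "-p", otool_symbol_name(crash_log_symbol),
            "-V", "-t"]


cmd = build_otool_command(
    "/Users/apple/Downloads/Payload/testApp.app/testApp",
    "__99-[XXX fn:queue:]_block_invoke",
)
# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd)` on a machine where otool is installed; printing it with `shlex.join` is just a convenient way to check the quoting of the symbol name, which contains spaces and brackets.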

Part of the disassembly output is as follows:

/Users/apple/Downloads/Payload/testApp.app/testApp:
(__TEXT,__text) section
___99-[XXX fn:queue:]_block_invoke:
......... (some code omitted)
0000000103ae6554<+284>	mov	x20, x0
0000000103ae6558<+288>	ldr	x0, [x20, #0x20]
0000000103ae655c<+292>	adrp	x8, 26695 ; 0x10a32d000
0000000103ae6560<+296>	ldr	x1, [x8, #0x940] ; Objc selector ref: testString
0000000103ae6564<+300>	bl	0x107fac9e8 ; Objc message: -[x0 testString]
0000000103ae6568<+304>	mov	x29, x29
0000000103ae656c<+308>	bl	0x107faca48 ; symbol stub for: _objc_retainAutoreleasedReturnValue
0000000103ae6570<+312>	mov	x25, x0
0000000103ae6574<+316>	mov	x0, x26
0000000103ae6578<+320>	bl	0x107faca18 ; symbol stub for: _objc_release
0000000103ae657c<+324>	mov	x0, x25
0000000103ae6580<+328>	mov	x1, x22
0000000103ae6584<+332>	bl	0x107fac9e8 ; Objc message: -[x0 testString] (mislabeled by otool: the selector loaded into x1 here is length)
0000000103ae6588<+336>	cmp	x0, #0x5
0000000103ae658c<+340>	b.lo	0x103ae65bc
0000000103ae6590<+344>	adrp	x8, 26674 ; 0x10a318000
0000000103ae6594<+348>	ldr	x1, [x8, #0xb30] ; Objc selector ref: substringToIndex:
0000000103ae6598<+352>	mov	x0, x25
0000000103ae659c<+356>	mov	w2, #0x4
0000000103ae65a0<+360>	bl	0x107fac9e8 ; Objc message: -[x0 substringToIndex:]
0000000103ae65a4<+364>	mov	x29, x29
0000000103ae65a8<+368>	bl	0x107faca48 ; symbol stub for: _objc_retainAutoreleasedReturnValue
0000000103ae65ac<+372>	mov	x22, x0
0000000103ae65b0<+376>	mov	x0, x25
0000000103ae65b4<+380>	bl	0x107faca18 ; symbol stub for: _objc_release
0000000103ae65b8<+384>	mov	x25, x22

From the crash information above, the address recorded for frame 9 is 0x0000000103b9e5b8. This address corresponds to the penultimate line of our disassembly, 0000000103ae65b4<+380>.

The two addresses are different, so how do we reach that conclusion?

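  1. In production, the program is loaded at a random base address. The Binary Images entry at the bottom of the raw crash stack shows that the testApp image was loaded at 0x1000b8000. The addresses printed by otool are static addresses; assuming the default static base address of 0x100000000 for a 64-bit executable (which is consistent with the addresses in the disassembly), the load offset is 0x1000b8000 - 0x100000000 = 0xb8000. Therefore:
0x0000000103b9e5b8 - 0xb8000 = 0x0000000103ae65b8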
  2. In addition, a crash address generally falls into one of three cases:

A. For any frame other than the top one, the recorded address is the address of the instruction that follows the function call, that is, the return address saved in LR. The instruction actually involved in the crash is therefore the result of step 1 minus 4 (every arm64 instruction is 4 bytes long), namely 0x0000000103ae65b4.

B. If the crash is in the top frame, the crashing instruction is usually one that accesses memory. For example, if the crash had occurred at the second instruction of the listing above, ldr x0, [x20, #0x20], it would most likely have been caused by access to an invalid memory address.

C. If the crash is in the top frame and the instruction is neither a memory access nor a function call, the crash was usually triggered by a brk breakpoint instruction, or it has some other cause that is not immediately identifiable. The former is easy to locate; the latter is harder.
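The address arithmetic from steps 1 and 2A can be written out as a short Python sketch. The constant and function names are invented for this illustration, and the static base address 0x100000000 is an assumption (the usual default for a 64-bit executable, and consistent with the addresses otool prints above).

```python
# Values copied from the crash report above.
RUNTIME_FRAME_ADDRESS = 0x0000000103B9E5B8  # frame 9, testApp
RUNTIME_LOAD_ADDRESS = 0x1000B8000          # testApp entry under "Binary Images"

# Assumption: the executable's __TEXT segment is linked at this static address.
STATIC_TEXT_BASE = 0x100000000

# Every arm64 instruction is 4 bytes long.
ARM64_INSTRUCTION_SIZE = 4


def static_address(runtime_address, load_address, static_base=STATIC_TEXT_BASE):
    """Map a runtime address back to the address that otool prints."""
    slide = load_address - static_base
    return runtime_address - slide


def call_site(return_address):
    """For a non-top frame, step back from the return address to the call."""
    return return_address - ARM64_INSTRUCTION_SIZE


return_addr = static_address(RUNTIME_FRAME_ADDRESS, RUNTIME_LOAD_ADDRESS)
print(hex(return_addr))             # 0x103ae65b8, the instruction at <+384>
print(hex(call_site(return_addr)))  # 0x103ae65b4, the bl at <+380>
```

If the executable were linked at a different static base, only `STATIC_TEXT_BASE` would need to change; the load commands of the binary are the authoritative source for that value.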

The penultimate line <+380> of the assembly code above is a call to _objc_release, so the crash happened while an object was being released. This agrees with the crash stack, where the frame directly above __99-[XXX fn:queue:]_block_invoke + 384 is _CFRelease.

In other words, releasing an object through _objc_release triggered a memory-free error. The next step is to find out which object was passed to _objc_release.

Under the ARM function-call ABI, the first argument is passed in x0. The third-to-last line <+376> of the assembly code performs x0 = x25, so the object being released comes from x25. At this point we can use register assignment tracing to find where x25 was assigned. Looking upward through the code, the instruction at <+312> performs x25 = x0, and x0 at that point holds the return value of the _objc_retainAutoreleasedReturnValue call in the preceding instruction. The argument passed to _objc_retainAutoreleasedReturnValue is in turn the return value of the [x0 testString] call at <+300>. From this and the source code, I can infer that the object returned by [testObj testString] caused the crash when it was released.

We can then take a closer look at the source of the [testObj testString] method to see where the problem is, and finally locate the cause of the crash.

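Here is the trace derivation using the register tracing technique:

x0 at <+380> (the argument of _objc_release)
  <- x25, by mov x0, x25 at <+376>
  <- x0, by mov x25, x0 at <+312>
  <- return value of _objc_retainAutoreleasedReturnValue at <+308>
  <- return value of [x0 testString] at <+300>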
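The same backward walk can be sketched as a small Python script. This is a rough illustration under several assumptions, not a general tool: the listing is retyped by hand from the disassembly above (offsets only, with the call at <+332> labeled as length), the code is treated as straight-line (branches are ignored), only mov, ldr and bl are interpreted, and w/x register aliasing is not handled. The names `LISTING`, `PASS_THROUGH_CALLS` and `trace_register` are made up for this example.

```python
# (offset, mnemonic, operands), retyped from the otool output above.
LISTING = [
    (300, "bl", "-[x0 testString]"),
    (304, "mov", "x29, x29"),
    (308, "bl", "_objc_retainAutoreleasedReturnValue"),
    (312, "mov", "x25, x0"),
    (316, "mov", "x0, x26"),
    (320, "bl", "_objc_release"),
    (324, "mov", "x0, x25"),
    (328, "mov", "x1, x22"),
    (332, "bl", "-[x0 length]"),
    (336, "cmp", "x0, #0x5"),
    (340, "b.lo", "0x103ae65bc"),
    (344, "adrp", "x8, 26674"),
    (348, "ldr", "x1, [x8, #0xb30]"),
    (352, "mov", "x0, x25"),
    (356, "mov", "w2, #0x4"),
    (360, "bl", "-[x0 substringToIndex:]"),
    (364, "mov", "x29, x29"),
    (368, "bl", "_objc_retainAutoreleasedReturnValue"),
    (372, "mov", "x22, x0"),
    (376, "mov", "x0, x25"),
    (380, "bl", "_objc_release"),
]

# Calls that return their argument unchanged, so tracing continues through them.
PASS_THROUGH_CALLS = {"_objc_retainAutoreleasedReturnValue"}


def trace_register(listing, start_offset, register):
    """Walk backward from start_offset and report where `register` came from."""
    steps = []
    tracked = register
    start = next(i for i, ins in enumerate(listing) if ins[0] == start_offset)
    for offset, mnemonic, operands in reversed(listing[:start]):
        if mnemonic == "mov":
            dst, src = [part.strip() for part in operands.split(",")]
            if dst == tracked and dst != src:
                steps.append(f"<+{offset}> {tracked} = {src}")
                tracked = src
        elif mnemonic == "ldr":
            dst = operands.split(",")[0].strip()
            if dst == tracked:
                steps.append(f"<+{offset}> {tracked} loaded from memory")
                return steps
        elif mnemonic == "bl" and tracked == "x0":
            # A call leaves its return value in x0.
            steps.append(f"<+{offset}> x0 = return value of {operands}")
            if operands not in PASS_THROUGH_CALLS:
                return steps
    return steps


steps = trace_register(LISTING, 380, "x0")
for step in steps:
    print(step)
```

Registers such as x25 are callee-saved, which is why the walk can skip over intermediate bl instructions while x25 is being tracked; only x0 is treated as overwritten by a call.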

Tip: On arm64, the x0 register holds the first argument of a function, the receiver object of an Objective-C method call, and the return value of a function or method.