Issues and Background

Among mobile application performance problems, crashes are the most severe: they interrupt whatever the user is doing and lead to business-critical disruptions, lower user retention, damage to brand reputation, and reduced lifetime value. Many companies treat the crash rate as their highest-priority technical metric, so monitoring and collecting crashes has become essential work. Currently, the 58.com App uses Tencent Bugly to collect abnormal App data in the release environment.

We have been optimizing our crash rate continuously, and for each version there are engineers responsible for monitoring online crashes and fixing problems. As a result, the crash rate of the 58.com iOS App stays at a relatively good level, and most of the crashes still collected on Bugly are wild-pointer crashes and other hard-to-diagnose crashes. However, the ways to fix these remaining difficult crashes are limited. One of the main reasons is that some crashes on Bugly are not symbolicated correctly, so the real cause cannot be located. Let’s take a simple example to illustrate.

RN HOOK function problem

0 CoreFoundation  0x00000001804f504c 	0x000000018045c000 + 626764
1 Foundation      0x0000000181dae6cc 	0x0000000181c7e000 + 1246924
2 UIKit           0x0000000198e5cf30 	0x0000000198e57000 + 24368
3 AppName         0xe622388106d79fcc 	RCTFBQuickPerformanceLoggerConfigureHooks + 16244
4 CoreTelephony   0x0000000198e5e628 	0x0000000198e57000 + 30248
5 CoreTelephony   0x0000000108f68fe4 	0x0000000198e57000 + 78455260
6 CoreTelephony   0x00000001061ed870 	0x0000000198e57000 + 30763624
7 CoreTelephony   0x0000000108f657ec 	0x0000000198e57000 + 78440932
8 AppName         0x0000000108f67024 	_ZN6tflite19AcquireFlexDelegateEv + 78447132
9 Foundation      0x0000000108f67024 	_NSGetUsingKeyValueGetter + 88

In the past few releases, we found that a large number of crash logs on Bugly carried a call-stack frame from RN: RCTFBQuickPerformanceLoggerConfigureHooks, which is an RN HOOK function. Multiple crash stacks pointed to this function, yet it is an empty function with no implementation, which puzzled us. Anyone who has used Bugly knows that every Bugly crash log comes with trace data recording the pages visited before the crash. From these page traces we found that most of the pages users browsed before these crashes did not involve RN services at all, and that the page trace differed from one crash to another. Since the business browsed before the crash did not involve RN while the stack on Bugly pointed to RN, we suspected that these were not crashes in RN’s HOOK function but crashes with different causes. With that in mind, we set out to test the conjecture.

How to verify Bugly’s parsing errors

Because we cannot obtain from Bugly the .ips file generated after the app crashes, we cannot symbolicate the log with tools such as symbolicatecrash. So we used the atos command to verify our suspicion.

1. atos validation

The atos tool outputs the source statement at the crash location, together with the file it is in and the line number. The prerequisites are to obtain the dSYM file, determine whether the device architecture is arm64 or armv7, and obtain the load address and address that atos needs to locate the problem. The atos command format is as follows:

atos -o yourAppName.app.dSYM/Contents/Resources/DWARF/yourAppName -arch arm64/armv7 -l <load-address> <address>

How to obtain the dSYM file and the architecture is not covered in detail here. Let’s look at how to get the load address and address from a Bugly crash log.

For example, a normal crash log format is:

0x0000000103ef6970 0x0000000102728000 + 30252

Here 0x0000000103ef6970 is the address required by atos (the runtime address of the frame), 0x0000000102728000 is the load address required by atos (the start address of the image), and 30252 is the offset, written in decimal. In general, load address + offset = address.
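The relation between the three fields can be checked mechanically. The following is a minimal sketch, assuming unsymbolicated frames look like the first line of the stack shown earlier (index, image name, hexadecimal address, hexadecimal load address, decimal offset). The regular expression and the parse_frame helper are hypothetical names introduced for illustration; they are not part of Bugly or atos.

```python
import re

# Hypothetical pattern for an unsymbolicated frame such as
# "0 CoreFoundation  0x00000001804f504c  0x000000018045c000 + 626764".
FRAME = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<image>\S+)\s+"
    r"(?P<address>0x[0-9a-fA-F]+)\s+"
    r"(?P<load>0x[0-9a-fA-F]+)\s*\+\s*(?P<offset>\d+)\s*$"
)

def parse_frame(line):
    """Return the fields of an unsymbolicated frame, or None if the line
    has a different shape (for example, a frame that carries a symbol name)."""
    m = FRAME.match(line)
    if m is None:
        return None
    return {
        "index": int(m.group("index")),
        "image": m.group("image"),
        "address": int(m.group("address"), 16),    # hexadecimal in the log
        "load_address": int(m.group("load"), 16),  # hexadecimal in the log
        "offset": int(m.group("offset")),          # decimal in the log
    }

frame = parse_frame("0 CoreFoundation  0x00000001804f504c \t0x000000018045c000 + 626764")
# load address + offset reproduces the frame address
print(hex(frame["load_address"] + frame["offset"]))  # 0x1804f504c
```

Frames that Bugly has already attached a symbol name to do not match this pattern and come back as None, so they can be routed to a separate check.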

Having introduced the load address and the address that atos needs, let’s look at the crash in RCTFBQuickPerformanceLoggerConfigureHooks. In the stack above, the runtime address of this frame is 0xe622388106d79fcc. This address is clearly wrong: a valid address is generally smaller than 0xFFFFFFFFFF, and this one is far larger. We therefore need to strip the high bits. After cleaning, the address is 0x106d79fcc, and this is the address we pass to atos.
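Stripping the high bits amounts to a bit mask. The sketch below is only an illustration: the article does not say how many bits to keep, so the 36-bit width is an assumption, chosen because it reproduces the cleaned address given above. ADDRESS_MASK and strip_high_bits are hypothetical names.

```python
# Assumption: keep the low 36 bits of the address. The width is not stated
# in the article; it is chosen here because it reproduces the article's result.
ADDRESS_MASK = (1 << 36) - 1

def strip_high_bits(address):
    """Clear everything above the low 36 bits of a frame address."""
    return address & ADDRESS_MASK

raw = 0xe622388106d79fcc
print(hex(strip_high_bits(raw)))  # 0x106d79fcc

# An address that is already in the normal range is left unchanged.
print(hex(strip_high_bits(0x0000000108f68fe4)))  # 0x108f68fe4
```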

Then we open the “other information” column on Bugly and find App base addr (base address): 0x0000000102604000. This is the load address that atos needs. With this information, we use RCTFBQuickPerformanceLoggerConfigureHooks as an example to verify whether Bugly’s parsing result is correct:

➜ atos -o AppName.app.dSYM/Contents/Resources/DWARF/AppName -arch arm64 -l 0x0000000102604000 0x0000000106d79fcc
-[NSMutableDictionary(YJKit) yjKit_setObject:forKey:] (in AppName) (YJKit.m:432)

The result symbolicated by atos is indeed inconsistent with the one Bugly gave us. Checking it against Bugly’s page trace data, we confirmed that the atos result was the correct one, which matched our suspicion.
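When many frames need checking, the command line above can be assembled programmatically. The following is a minimal sketch that only builds the argument list, using the same -o, -arch and -l options as the command shown earlier. atos_command is a hypothetical helper name, and the dSYM path is the placeholder from this article.

```python
def atos_command(dsym_binary, load_address, addresses, arch="arm64"):
    """Build the argument list for an atos invocation (does not run it)."""
    return (
        ["atos", "-o", dsym_binary, "-arch", arch, "-l", hex(load_address)]
        + [hex(address) for address in addresses]
    )

cmd = atos_command(
    "AppName.app.dSYM/Contents/Resources/DWARF/AppName",
    0x102604000,    # App base addr from Bugly's other information
    [0x106d79fcc],  # frame address after stripping the high bits
)
print(" ".join(cmd))
```

On a Mac with the dSYM available, the list could be passed to subprocess.run to execute it; that step is left out here because it depends on the local environment.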

Since the stacks that Bugly parsed incorrectly point to an empty RN function, let’s look at how RCTFBQuickPerformanceLoggerConfigureHooks appears in the source code.

2. The RCTFBQuickPerformanceLoggerConfigureHooks function in the source code

RCTFBQuickPerformanceLoggerConfigureHooks is declared in the source as follows:

In the source code, this function has no implementation at all; it is a completely empty function. RCT_EXTERN expands to __attribute__((visibility("default"))), which makes RCTFBQuickPerformanceLoggerConfigureHooks visible to the outside world; with only that, an external function with the same name would cause a symbol conflict error. Here, __attribute__((weak)) is used to turn RCTFBQuickPerformanceLoggerConfigureHooks into a weak symbol: if an external function with the same name exists, the SDK calls the external function, otherwise it calls the internal empty function. In RN this weak symbol serves as a HOOK. Next, we will look at weak symbols in detail.

3. Weak symbol __attribute__ ((weak))

In a program, every variable or function name is just a symbol to the compiler, and symbols are divided into strong symbols and weak symbols. In C/C++, by default the compiler treats functions and initialized global variables as strong symbols, and uninitialized global variables as weak symbols. During compilation and linking, strong and weak symbols generally follow three rules:

  1. Strong symbols are not allowed to be defined more than once. If there are multiple strong symbols, a symbol redefinition error is reported

  2. If there is a strong symbol and all other definitions are weak, the strong symbol is selected

  3. If a symbol is weak in all files, select the one that takes up the most space

The rules above are taken from: “Strong Symbols and Weak Symbols, Strong References and Weak References”.
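The three rules can be expressed as a small decision procedure. The following is a toy model for illustration only, not how a real linker is implemented; Definition and resolve are hypothetical names, and each definition is reduced to the module that provides it, whether it is strong, and its size.

```python
from collections import namedtuple

# A candidate definition of one symbol name, as seen by this toy "linker".
Definition = namedtuple("Definition", ["module", "strong", "size"])

def resolve(name, definitions):
    """Pick the definition that wins according to the three rules."""
    if not definitions:
        raise LookupError(f"undefined symbol '{name}'")
    strong = [d for d in definitions if d.strong]
    if len(strong) > 1:
        # Rule 1: more than one strong definition is an error.
        raise ValueError(f"duplicate symbol '{name}'")
    if strong:
        # Rule 2: a single strong definition beats every weak one.
        return strong[0]
    # Rule 3: all definitions are weak, so take the largest.
    return max(definitions, key=lambda d: d.size)

sdk_hook = Definition("sdk", strong=False, size=4)   # weak default inside the SDK
app_hook = Definition("app", strong=True, size=64)   # strong override from the app

print(resolve("func", [sdk_hook, app_hook]).module)  # app
print(resolve("func", [sdk_hook]).module)            # sdk
```

The second call shows the HOOK pattern: when nobody supplies a strong definition, the weak default is used.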

In development we often run into the error duplicate symbol ‘_OBJC_CLASS_$_XXX’. At compile time, the compiler outputs each global symbol to the assembler. If two or more global symbols (function or variable names) have the same name and are all strong symbols, a symbol redefinition error occurs. If at most one of them is strong and the rest are weak, there is no problem.

When both a strong and a weak definition of a symbol exist in a program, the linker ignores the weak one and resolves all references to the strong (ordinary global) definition; only when no ordinary global definition is available does the linker use the weak symbol. So when a function or variable may be overridden by the user, it can be declared as a weak symbol. Weak symbols are defined with __attribute__((weak)).

4. Use of weak symbols

In development, suppose we are not sure whether an external module provides a function func, but our module’s code must call it:

extern int func(void);
...
int a = func();
...

We don’t know whether func is defined, which leads to two possible outcomes:

  1. The external module defines func, so our module can call func without any problem.
  2. The external module does not define func, so using func in our module fails: the link breaks or the program crashes.

This is where __attribute__((weak)) comes in handy. We define func in our own module:

int __attribute__((weak)) func(void)
{
     return 0;
}

This turns our module’s func into a weak symbol. If a strong symbol is found (that is, the external module defines func), then the func our module executes is the one defined by the external module. If the external module does not define it, our own weak definition is called.

We found that for some of the crashes Bugly fails to parse correctly, the stack is attributed to a weak symbol in the project. In the 58.com App, Bugly attributes crashes not only to the weak symbol RCTFBQuickPerformanceLoggerConfigureHooks; a large number of crashes are attributed to other weak symbols as well.

Above, we restored the correct log with atos and found that the wrongly reported frames are weak symbols. Next, let’s use the symbol table to look at how log symbolication works.

How to handle logs that Bugly parses incorrectly

Crash logs are unreadable until they are symbolicated, meaning that the addresses in the stack are translated into readable function or method names from the source code, known as symbols. Only after successful symbolication can crash logs really help developers locate problems. Logs can be parsed using dSYM files; dSYM stands for debug symbols.

DWARF is a debugging file format used by many compilers and debuggers to support source-level debugging. Its data layout is fixed, and a dSYM is a file that stores debugging information in DWARF format, often referred to as a symbol table file.

Logs can be symbolicated in a variety of ways, such as Xcode, symbolicatecrash, atos, and dwarfdump. In essence they all find out which function in the symbol table contains the crashing instruction. Here we focus on how to find the correct stack in the symbol table for logs that Bugly fails to parse.

1. How to restore the correct stack from a Bugly log

We take the weak symbol RCTFBQuickPerformanceLoggerConfigureHooks as an example to explain how the log is restored.

0 CoreFoundation  0x00000001804f504c 	0x000000018045c000 + 626764
1 Foundation      0x0000000181dae6cc 	0x0000000181c7e000 + 1246924
2 UIKit           0x0000000198e5cf30 	0x0000000198e57000 + 24368
3 AppName         0xe622388106d79fcc 	RCTFBQuickPerformanceLoggerConfigureHooks + 16244
4 CoreTelephony   0x0000000198e5e628 	0x0000000198e57000 + 30248
5 CoreTelephony   0x0000000108f68fe4 	0x0000000198e57000 + 78455260
6 CoreTelephony   0x00000001061ed870 	0x0000000198e57000 + 30763624
7 CoreTelephony   0x0000000108f657ec 	0x0000000198e57000 + 78440932
8 AppName         0x0000000108f67024 	_ZN6tflite19AcquireFlexDelegateEv + 78447132
9 Foundation      0x0000000108f67024 	_NSGetUsingKeyValueGetter + 88
  1. In the stack above, the virtual memory address of the RCTFBQuickPerformanceLoggerConfigureHooks frame is abnormal. A valid address is generally smaller than 0xFFFFFFFFFF, and this one is significantly larger. We strip the high bits to get a valid address; after the adjustment, the address is 0x106d79fcc. Not every log that Bugly parses incorrectly has an abnormal address; if the address is normal, leave it unchanged.
  2. In Bugly’s other information, find the base address, App base addr; here it is 0x102604000. If the crash occurred in another dynamic library, look up the address of that library further down.
  3. After step 1 and step 2, we have 0x106d79fcc and 0x102604000.
  4. The instruction offset is 0x4775FCC = 0x106d79fcc (step 1) - 0x102604000 (step 2).
  5. Find the Bugly symbol table for this package and open it as text.
  6. Find which symbol interval contains 0x4775FCC.
  7. We finally find that 0x4775fb4 <= 0x4775FCC < 0x4775fd0, that is, the symbol on line 3997407. Symbol intervals are closed on the left and open on the right.

Through the steps above we found the actual crash location behind the RCTFBQuickPerformanceLoggerConfigureHooks frame. It is consistent with the result from the atos tool, which shows that this result is correct.
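The lookup in the steps above can be sketched as a binary search over half-open intervals. This is a minimal sketch under stated assumptions: the table is modelled as (start, end, symbol) rows sorted by start, which is a simplification and not the actual Bugly symbol table format. Only the middle row uses values from this article (the interval from step 7 and the symbol reported by atos); the neighbouring rows and all names in the code are hypothetical.

```python
import bisect

# Simplified symbol table: (start offset, end offset, symbol), sorted by start.
# Only the middle row comes from the article; the other two are made up.
SYMBOLS = [
    (0x4775f00, 0x4775fb4, "hypothetical_previous_symbol"),
    (0x4775fb4, 0x4775fd0, "-[NSMutableDictionary(YJKit) yjKit_setObject:forKey:]"),
    (0x4775fd0, 0x4776000, "hypothetical_next_symbol"),
]
STARTS = [row[0] for row in SYMBOLS]

def lookup(offset):
    """Return the symbol whose interval [start, end) contains offset, or None."""
    i = bisect.bisect_right(STARTS, offset) - 1  # last row with start <= offset
    if i < 0:
        return None
    start, end, name = SYMBOLS[i]
    return name if start <= offset < end else None

# Step 4: frame address (high bits stripped) minus the App base address.
offset = 0x106d79fcc - 0x102604000
print(hex(offset))     # 0x4775fcc
print(lookup(offset))  # -[NSMutableDictionary(YJKit) yjKit_setObject:forKey:]
```

Because the intervals are closed on the left and open on the right, an offset equal to 0x4775fd0 belongs to the next row, not to this one.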

Above, we looked up the correct stack in the symbol table for logs that Bugly parsed incorrectly. What if there is no symbol table? Then the symbol table has to be extracted.

2. How to extract the symbol table

If the symbol table is missing but the code has not changed, you can try to recompile in the same environment and extract the symbol table. This step has two prerequisites: 1. The code has not changed. 2. The compile and link environments are the same, so that Debug/Release differences do not affect the final package. For a Debug package, dsymutil xxx.app/xxx -o xxx.dSYM can be used to extract the symbol table.

With these two prerequisites satisfied, the symbol table can be extracted from the dSYM file. At present, we have implemented extraction of a lightweight Bugly symbol table whose file size is reduced to 60% of that of the Bugly symbol table. We also pushed ICI (the 58 project management platform) to output symbol tables according to agreed rules. The matching symbol table can now be downloaded directly by the UUID of a crash log, which greatly improves the efficiency of log parsing and troubleshooting.

3. Symbolicating logs without a symbol table

If you cannot find the symbol table (dSYM file or symbol file) and cannot roll back to the original code to regenerate it, consider using WBBlades, a tool that symbolicates logs without a symbol table, to restore the log: github.com/wuba/WBBlad…

WBBlades is a toolset based on Mach-O file parsing. It includes unused code detection (supporting ObjC and Swift), application size analysis, and log recovery without dSYM files.

Due to the limitations of the approach itself, crash logs outside OC methods, such as crashes in blocks or in custom C functions, cannot be resolved at present. How to symbolicate block crash logs still needs to be worked out.

Optimization results and benefits

Now we know that when Bugly’s parsing fails, we can use the other information Bugly provides to find the correct answer in the symbol table. Building on this research, we re-symbolicated Bugly’s logs with our own analysis tools. With these tools we solved not only the RN HOOK function problem but also several long-standing crashes from historical versions within the group. Here we briefly introduce some representative ones.

1. The base address is missing

In the RCTFBQuickPerformanceLoggerConfigureHooks crash introduced above, we could get the base address from the other information on Bugly, and with it restore the correct stack either through atos or by looking it up manually in the symbol table. But what if Bugly’s other information contains no base address? Let’s look at the crash log below.

0 CoreFoundation 0x00000001835891b8 0x0000000183459000 + 1245624
5 UIKit 0x000000018963a660 0x000000018942f000 + 2143840
6 AppName 0x00000001075c9904 str_to_integral_8ExpectedIT_NS_14Conversion + 1950052
7 AppName 0x000000010627f94c RCTFBQuickPerformanceLoggerConfigureHooks + 3098344
8 AppName 0x00000001062015a0 RCTFBQuickPerformanceLoggerConfigureHooks + 2581308
9 AppName 0x00000001061fe498 RCTFBQuickPerformanceLoggerConfigureHooks + 2568756
10 AppName 0x00000001061fed38 RCTFBQuickPerformanceLoggerConfigureHooks + 2570964
11 AppName 0x00000001061ed900 RCTFBQuickPerformanceLoggerConfigureHooks + 2500252
12 AppName 0x0000000105231bd8 _ZZGetAppIdTableEvE12arAppIdTable + 57325128
13 libdispatch.dylib 0x00000001824121fc 0x0000000182411000 + 4604
21 UIKit 0x00000001894a4534 UIApplicationMain + 208
22 AppName 0x00000001085b73e8 _ZN15CTXAppidConvert13GetAppIdTableEv + 8521236
23 libdyld.dylib 0x00000001824455b8 0x0000000182441000 + 17848

In the stack above, the crash again lands on the weak symbol RCTFBQuickPerformanceLoggerConfigureHooks. The difference from the earlier example is that the additional information column on Bugly is empty for this crash, so there is no base address. We therefore cannot use the atos command directly and have to look the frame up in the symbol table, but for that we first need the base address. Let’s see how to obtain it in this case.

  1. First, look at the line 22 AppName 0x00000001085b73e8 _ZN15CTXAppidConvert13GetAppIdTableEv + 8521236. It sits directly below UIApplicationMain in the stack, and anyone familiar with Bugly and crash logs knows that such a frame is very likely the main function, so this is where we find the breakthrough.
  2. The symbol Bugly shows for the main frame is _ZN15CTXAppidConvert13GetAppIdTableEv, and the frame’s runtime address is 0x00000001085b73e8.
  3. So the runtime start address of this symbol is 0x00000001085b73e8 - 8521236 (0x820614 in hexadecimal) = 0x107D96DD4.
  4. Open the symbol table and find _ZN15CTXAppidConvert13GetAppIdTableEv. The offset address of this symbol is 0x7d86dd4.
  5. The App base addr (base address) is 0x107D96DD4 - 0x7d86dd4 = 0x100010000.

We now have the base address of the log and can use the method described earlier to find the correct stack in the symbol table, so that the log can be parsed properly.
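The arithmetic in steps 3 to 5 is easy to get wrong because the log mixes hexadecimal addresses with a decimal offset. The following is a small sketch of the same calculation using the values from this log; base_address is a hypothetical helper name.

```python
def base_address(frame_address, frame_offset, symbol_table_offset):
    """Recover the image base address from one frame.

    frame_address:       runtime address of the frame (hexadecimal in the log)
    frame_offset:        the "+ N" value after the symbol name (decimal in the log)
    symbol_table_offset: offset of that symbol in the symbol table
    """
    symbol_start = frame_address - frame_offset  # runtime start of the symbol
    return symbol_start - symbol_table_offset    # image base address

base = base_address(0x00000001085b73e8, 8521236, 0x7d86dd4)
print(hex(base))  # 0x100010000

# Any other AppName frame can now be converted to a symbol table offset,
# for example frame 7 of the same log:
print(hex(0x000000010627f94c - base))  # 0x626f94c
```

The calculation works even though the symbol name Bugly shows is wrong, because the reported offset is still measured from the start of that reported symbol.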

2. Crash attributed to the Baidu Map SDK

In addition to the RN HOOK function problem, we also found a large number of crash logs whose call stacks pointed to the Baidu Map SDK. At first we took the stack shown by Bugly at face value and believed the crash was in the Baidu Map SDK. In some versions this crash accounted for about 40% of all crashes in the 58.com App, making it the module with the highest crash rate in the App, and the crash rate did not drop after we switched to a newer SDK. We had never encountered the crash in development or testing. The trace data on Bugly showed that the last page visited was in the financial business, which has nothing to do with Baidu Map. So this crash had to be another parsing error like the ones described above. We obtained the base address and the runtime address and recovered the correct stack with our own tools. The crash turned out to be in a facial recognition SDK used by the financial business, and its file name matched the trace log on Bugly.

3. Anjuke IM login problem

We wrote a script and used it to locate a long-standing problem in Anjuke. The crash ranked near the top on Bugly and accounted for a high share of crashes. Because the stack shown on Bugly was abnormal, its specific cause had never been located. With the help of the script, we traced it to Anjuke’s IMSDK.

The above are some representative logs that Bugly parsed incorrectly. Through research and analysis, we restored the correct stacks and solved the problems.

Currently we support automatic checking of the top 200 Bugly crash logs for each version, and can automatically re-symbolicate abnormal logs into correct ones. Provided that the symbol table has been downloaded and analyzed in advance, the whole process takes only about 10 seconds, which greatly improves the efficiency of our daily development and troubleshooting. By dealing with difficult crashes on Bugly in this way, we have so far fixed about 70% of them, greatly reducing the crash rate of the 58.com App.

In addition to the Bugly logs above, we also support other kinds of abnormal logs. For example, after segment migration is applied to the App, crash logs show garbled library names, lose the start address of the process, and report wrong offsets. In such cases we can correct the log automatically and resolve the correct stack information.

Summary and Outlook

This article first introduced the RN HOOK function problem we ran into when using Bugly. That problem led us to suspect that Bugly might be parsing some logs incorrectly, and we then found the correct answer using the atos command and by searching the symbol table. Along the way we also discovered the role of weak symbols. Following this direction, we developed a series of tools within the group and solved historical problems across multiple versions, greatly reducing the crash rate of the 58.com iOS App and improving the efficiency of daily development.

App performance optimization is very important to user experience, and crashes, as one of its most important aspects, require continuous research and exploration. In the future, we will keep optimizing the performance of the App to bring users the best experience.

“Kill” the difficult crashes in the App!