APM Monitoring System: Crash (part 1)

7. Crash monitoring

1. Review of knowledge related to exceptions

1.1 Mach layer handling of exceptions

Mach implements its own exception handling mechanism on top of its messaging facility. It was designed with the following goals in mind:

A single exception handling facility with consistent semantics: Mach provides only one exception handling mechanism for all types of exceptions (including user-defined, platform-independent, and platform-specific exceptions). Exceptions are grouped by type, and specific platforms can define their own subtypes.
Clarity and simplicity: The exception handling interface relies on Mach's existing, well-defined message and port architecture, so it is elegant without sacrificing efficiency. This makes it possible to extend debuggers and external handlers and even, in theory, to handle exceptions over the network.

In Mach, exceptions are handled through the kernel's basic infrastructure, the messaging mechanism. An exception is not much more complicated than a message: it is thrown by the faulting thread or task (via msg_send()) and caught by a handler (via msg_recv()). A handler can handle the exception, clear it (mark it as handled and let execution continue), or decide to terminate the thread.

Unlike other exception handling models, where the handler runs in the context of the faulting thread, a Mach exception handler runs in a different context: the faulting thread sends a message to a pre-designated exception port and then waits for a reply. Each task can register an exception port that applies to all threads in the task. In addition, an individual thread can register its own exception port with thread_set_exception_ports(thread_act_t thread, exception_mask_t exception_mask, mach_port_t new_port, exception_behavior_t behavior, thread_state_flavor_t new_flavor). Typically, the exception ports of both tasks and threads are NULL, meaning that exceptions are not handled. Once created, these ports are just like any other port in the system and can be handed off to other tasks or even other hosts. (In principle, with a port you could let an application on another host handle exceptions over the network, for example over UDP.)

When an exception occurs, the kernel first tries to deliver it to the thread's exception port, then to the task's exception port, and finally to the host's exception port (the default port registered by the host). If none of the ports returns KERN_SUCCESS, the entire task is terminated. In other words, Mach does not provide exception handling logic itself, only a framework for delivering exception notifications.
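The lookup order described above can be sketched in portable C. This is only a simulation under stated assumptions: the types exception_ports and handler_result and the function deliver_exception are hypothetical names invented for illustration, and plain function pointers stand in for Mach ports. The real kernel delivers a Mach message to each port and waits for the reply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for this sketch; the real kernel works with Mach ports. */
typedef enum { HANDLER_FAILED = 0, HANDLER_SUCCESS = 1 } handler_result;
typedef handler_result (*exception_handler)(int exception_type);

typedef struct {
    exception_handler thread_port; /* NULL means no port is registered */
    exception_handler task_port;
    exception_handler host_port;
} exception_ports;

/* Tries the thread, task, and host handlers in that order.
 * Returns 0, 1, or 2 for the level that reported success,
 * or -1 if no level did (the task would then be terminated). */
int deliver_exception(const exception_ports *ports, int exception_type)
{
    assert(ports != NULL);
    const exception_handler order[3] = {
        ports->thread_port, ports->task_port, ports->host_port
    };
    for (int level = 0; level < 3; level++) {
        if (order[level] != NULL &&
            order[level](exception_type) == HANDLER_SUCCESS) {
            return level;
        }
    }
    return -1;
}

/* Two trivial handlers for exercising the lookup order. */
handler_result always_fails(int exception_type)
{
    (void)exception_type;
    return HANDLER_FAILED;
}

handler_result always_succeeds(int exception_type)
{
    (void)exception_type;
    return HANDLER_SUCCESS;
}
```

With no thread-level port and a task-level handler that succeeds, deliver_exception reports level 1; if every registered handler fails, it reports -1, which corresponds to the task being terminated.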

Exceptions originate as processor traps. To handle traps, every modern kernel installs trap handlers. These low-level functions are set up by the assembly-language part of the kernel.

1.2 BSD layer handling of exceptions

The BSD layer, the main user-mode interface to XNU, presents a POSIX-compliant interface. Developers can use all the features of a UNIX system without knowing the details of the Mach layer implementation.

Mach already provides low-level trap handling through its exception mechanism, and BSD builds signal handling on top of it. Hardware-generated signals are caught by the Mach layer and then converted to the corresponding UNIX signals. To keep the mechanism uniform, signals generated by the operating system and by the user are also first converted to Mach exceptions and then to signals.

Mach exceptions are converted to the appropriate UNIX signals at the host layer by ux_exception and posted to the offending thread via threadsignal.
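As a rough illustration of this conversion, the sketch below maps a few Mach exception types to UNIX signals. It is not the kernel's code: the function name sketch_signal_for_mach_exception is hypothetical, the SKETCH_ constants are local copies of the values I understand <mach/exception_types.h> and <mach/kern_return.h> to define (treat them as assumptions), and only a subset of cases is covered.

```c
#include <assert.h>
#include <signal.h>

/* Local copies of Mach constants (assumed values), prefixed with SKETCH_
 * so that the example also compiles on non-Apple platforms. */
enum {
    SKETCH_EXC_BAD_ACCESS      = 1,
    SKETCH_EXC_BAD_INSTRUCTION = 2,
    SKETCH_EXC_ARITHMETIC      = 3,
    SKETCH_EXC_SOFTWARE        = 5,
    SKETCH_EXC_BREAKPOINT      = 6
};

enum {
    SKETCH_KERN_INVALID_ADDRESS = 1,
    SKETCH_EXC_UNIX_BAD_PIPE    = 0x10001,
    SKETCH_EXC_UNIX_ABORT       = 0x10002
};

/* Returns the UNIX signal for a Mach exception type and code,
 * or 0 when this simplified table has no mapping. */
int sketch_signal_for_mach_exception(int exception, long code)
{
    switch (exception) {
    case SKETCH_EXC_BAD_ACCESS:
        /* An invalid address becomes SIGSEGV; other access faults become SIGBUS. */
        return code == SKETCH_KERN_INVALID_ADDRESS ? SIGSEGV : SIGBUS;
    case SKETCH_EXC_BAD_INSTRUCTION:
        return SIGILL;
    case SKETCH_EXC_ARITHMETIC:
        return SIGFPE;
    case SKETCH_EXC_SOFTWARE:
        if (code == SKETCH_EXC_UNIX_ABORT)    return SIGABRT;
        if (code == SKETCH_EXC_UNIX_BAD_PIPE) return SIGPIPE;
        return 0;
    case SKETCH_EXC_BREAKPOINT:
        return SIGTRAP;
    default:
        return 0;
    }
}
```

The point of the sketch is that the signal depends on both the exception type and its code, which is why a crash report carries both.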

2. Crash collection approaches

Apple's Crash Reporter on iOS records crash logs, which can be viewed in Settings. Let's look at a crash log first:

Incident Identifier: 7FA6736D-09E8-47A1-95EC-76C4522BDE1A
CrashReporter Key:   4e2d36419259f14413c3229e8b7235bcc74847f3
Hardware Model:      iPhone7,1
Process:             APMMonitorExample [3608]
Path:                /var/containers/Bundle/Application/9518A4F4-59B7-44E9-BDDA-9FBEE8CA18E5/APMMonitorExample.app/APMMonitorExample
Identifier:          com.wacai.APMMonitorExample
Version:             1.0 (1)
Code Type:           ARM-64
Parent Process:      ? [1]

Date/Time:           2017-01-03 11:43:03.000 +0800
OS Version:          iOS 10.2 (14C92)
Report Version:      104

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x0000000000000000
Crashed Thread:  0

Application Specific Information:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[__NSSingleObjectArrayI objectForKey:]: unrecognized selector sent to instance 0x174015060'

Thread 0 Crashed:
0   CoreFoundation       0x0000000188f291b8 0x188df9000 + 1245624 (<redacted> + 124)
1   libobjc.A.dylib      0x000000018796055c 0x187958000 + 34140 (objc_exception_throw + 56)
2   CoreFoundation       0x0000000188f30268 0x188df9000 + 1274472 (<redacted> + 140)
3   CoreFoundation       0x0000000188f2d270 0x188df9000 + 1262192 (<redacted> + 916)
4   CoreFoundation       0x0000000188e2680c 0x188df9000 + 186380 (_CF_forwarding_prep_0 + 92)
5   APMMonitorExample    0x000000010004c618 0x100044000 + 34328 (-[MakeCrashHandler throwUncaughtNSException] + 80)

Note that the Exception Type entry in the crash log consists of two parts: a Mach exception and a Unix signal.

So Exception Type: EXC_CRASH (SIGABRT) indicates that EXC_CRASH occurred at the Mach layer and was converted to SIGABRT at the host layer for delivery to the faulting thread.
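To make the two-part structure concrete, here is a small sketch that splits such a line into its Mach exception and Unix signal parts using standard C only. The struct exception_type_fields and the function parse_exception_type are hypothetical names for this illustration and are not part of any crash reporting library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the two parts of an "Exception Type" line. */
typedef struct {
    char mach_exception[32]; /* e.g. "EXC_CRASH" */
    char unix_signal[32];    /* e.g. "SIGABRT"   */
} exception_type_fields;

/* Parses a line such as "Exception Type: EXC_CRASH (SIGABRT)".
 * Returns 1 if both parts were found, 0 otherwise. */
int parse_exception_type(const char *line, exception_type_fields *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    /* %31s reads the Mach exception name; %31[^)] reads up to the ')'. */
    return sscanf(line, "Exception Type: %31s (%31[^)])",
                  out->mach_exception, out->unix_signal) == 2;
}
```

For the log above, the parsed fields are EXC_CRASH and SIGABRT; a line without the parenthesized signal is rejected.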

Q: Both catching Mach layer exceptions and registering Unix signal handlers can capture a crash. How do you choose between the two?

A: Prefer intercepting exceptions at the Mach layer. As described in 1.2 above, the Mach layer handles the exception first, and if the Mach exception handler makes the process exit, the Unix signal is never generated.

There are many open source projects in the industry for collecting crash logs, including KSCrash, PLCrashReporter, Bugly, and Umeng. We built a crash collection tool on top of an open source project to meet our company's internal needs. After comparing them, we chose KSCrash. Why KSCrash was chosen is outside the scope of this article.

KSCrash is full-featured and can capture the following types of crashes:

Mach kernel exceptions
Fatal signals
C++ exceptions
Objective-C exceptions
Main thread deadlock (experimental)
Custom crashes (e.g. from scripting languages)

Therefore, analyzing the crash collection scheme on iOS comes down to analyzing how KSCrash implements crash monitoring.

2.1. Mach layer exception handling

The general idea is as follows: create an exception handling port, request rights for the port, install it as the exception port, create a thread, and wait for exceptions in a loop. However, to keep your Mach layer exception handling from overriding logic installed by other SDKs or by product developers, you need to save the previously registered exception ports at the start and, once your own logic has finished, hand exception handling back to the logic behind those ports. After the crash information has been collected, the data is assembled and written to a JSON file.

The flow chart is as follows:

For Mach exception catching, you can register an exception port that listens on all threads of the current task.

Here’s the key code:

static bool installExceptionHandler()
{
    KSLOG_DEBUG("Installing mach exception handler.");

    bool attributes_created = false;
    pthread_attr_t attr;

    kern_return_t kr;
    int error;

    // Get the current task
    const task_t thisTask = mach_task_self();
    exception_mask_t mask = EXC_MASK_BAD_ACCESS |
    EXC_MASK_BAD_INSTRUCTION |
    EXC_MASK_ARITHMETIC |
    EXC_MASK_SOFTWARE |
    EXC_MASK_BREAKPOINT;

    KSLOG_DEBUG("Backing up original exception ports.");
    // Get the exception ports already registered on the task
    kr = task_get_exception_ports(thisTask,
                                  mask,
                                  g_previousExceptionPorts.masks,
                                  &g_previousExceptionPorts.count,
                                  g_previousExceptionPorts.ports,
                                  g_previousExceptionPorts.behaviors,
                                  g_previousExceptionPorts.flavors);
    // Bail out on failure
    if(kr != KERN_SUCCESS)
    {
        KSLOG_ERROR("task_get_exception_ports: %s", mach_error_string(kr));
        goto failed;
    }

    // If KSCrash's exception port has not been created yet, create it
    if(g_exceptionPort == MACH_PORT_NULL)
    {
        KSLOG_DEBUG("Allocating new port with receive rights.");
        // Allocate the exception port with a receive right
        kr = mach_port_allocate(thisTask,
                                MACH_PORT_RIGHT_RECEIVE,
                                &g_exceptionPort);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("mach_port_allocate: %s", mach_error_string(kr));
            goto failed;
        }

        KSLOG_DEBUG("Adding send rights to port.");
        // Add a send right (MACH_MSG_TYPE_MAKE_SEND) to the exception port
        kr = mach_port_insert_right(thisTask,
                                    g_exceptionPort,
                                    g_exceptionPort,
                                    MACH_MSG_TYPE_MAKE_SEND);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("mach_port_insert_right: %s", mach_error_string(kr));
            goto failed;
        }
    }

    KSLOG_DEBUG("Installing port as exception handler.");
    // Install the port as the task's exception port
    kr = task_set_exception_ports(thisTask,
                                  mask,
                                  g_exceptionPort,
                                  EXCEPTION_DEFAULT,
                                  THREAD_STATE_NONE);
    if(kr != KERN_SUCCESS)
    {
        KSLOG_ERROR("task_set_exception_ports: %s", mach_error_string(kr));
        goto failed;
    }

    KSLOG_DEBUG("Creating secondary exception thread (suspended).");
    pthread_attr_init(&attr);
    attributes_created = true;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    // Create the secondary thread that handles exceptions
    error = pthread_create(&g_secondaryPThread,
                           &attr,
                           &handleExceptions,
                           kThreadSecondary);
    if(error != 0)
    {
        KSLOG_ERROR("pthread_create_suspended_np: %s", strerror(error));
        goto failed;
    }
    g_secondaryMachThread = pthread_mach_thread_np(g_secondaryPThread);
    ksmc_addReservedThread(g_secondaryMachThread);

    KSLOG_DEBUG("Creating primary exception thread.");
    error = pthread_create(&g_primaryPThread,
                           &attr,
                           &handleExceptions,
                           kThreadPrimary);
    if(error != 0)
    {
        KSLOG_ERROR("pthread_create: %s", strerror(error));
        goto failed;
    }
    pthread_attr_destroy(&attr);
    g_primaryMachThread = pthread_mach_thread_np(g_primaryPThread);
    ksmc_addReservedThread(g_primaryMachThread);

    KSLOG_DEBUG("Mach exception handler installed.");
    return true;

failed:
    KSLOG_DEBUG("Failed to install mach exception handler.");
    if(attributes_created)
    {
        pthread_attr_destroy(&attr);
    }
    // Restore the previously registered exception ports and hand control back
    uninstallExceptionHandler();
    return false;
}

Handle exception logic and assemble crash information

/** Our exception handler thread routine.
 * Wait for an exception message, uninstall our exception port, record the
 * exception information, and write a report.
 */
static void* handleExceptions(void* const userData)
{
    MachExceptionMessage exceptionMessage = {{0}};
    MachReplyMessage replyMessage = {{0}};
    char* eventID = g_primaryEventID;

    const char* threadName = (const char*) userData;
    pthread_setname_np(threadName);
    if(threadName == kThreadSecondary)
    {
        KSLOG_DEBUG("This is the secondary thread. Suspending.");
        thread_suspend((thread_t)ksthread_self());
        eventID = g_secondaryEventID;
    }

    // Loop, reading from the registered exception port
    for(;;)
    {
        KSLOG_DEBUG("Waiting for mach exception");

        // Wait for a message.
        kern_return_t kr = mach_msg(&exceptionMessage.header,
                                    MACH_RCV_MSG,
                                    0,
                                    sizeof(exceptionMessage),
                                    g_exceptionPort,
                                    MACH_MSG_TIMEOUT_NONE,
                                    MACH_PORT_NULL);
        // A message arrived: leave the loop and process it
        if(kr == KERN_SUCCESS)
        {
            break;
        }

        // Loop and try again on failure.
        KSLOG_ERROR("mach_msg: %s", mach_error_string(kr));
    }

    KSLOG_DEBUG("Trapped mach exception code 0x%x, subcode 0x%x",
                exceptionMessage.code[0], exceptionMessage.code[1]);
    if(g_isEnabled)
    {
        // Suspend all threads
        ksmc_suspendEnvironment();
        g_isHandlingCrash = true;
        // Notify that a fatal exception was captured
        kscm_notifyFatalExceptionCaptured(true);

        KSLOG_DEBUG("Exception handler is installed. Continuing exception handling.");

        // Switch to the secondary thread if necessary, or uninstall the handler
        // to avoid a death loop.
        if(ksthread_self() == g_primaryMachThread)
        {
            KSLOG_DEBUG("This is the primary exception thread. Activating secondary thread.");
            // TODO: This was put here to avoid a freeze. Does secondary thread ever fire?
            restoreExceptionPorts();
            if(thread_resume(g_secondaryMachThread) != KERN_SUCCESS)
            {
                KSLOG_DEBUG("Could not activate secondary thread. Restoring original exception ports.");
            }
        }
        else
        {
            KSLOG_DEBUG("This is the secondary exception thread. Restoring original exception ports.");
            // restoreExceptionPorts();
        }

        // Assemble the crash information
        KSLOG_DEBUG("Fetching machine state.");
        KSMC_NEW_CONTEXT(machineContext);
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        crashContext->offendingMachineContext = machineContext;
        kssc_initCursor(&g_stackCursor, NULL, NULL);
        if(ksmc_getContextForThread(exceptionMessage.thread.name, machineContext, true))
        {
            kssc_initWithMachineContext(&g_stackCursor, 100, machineContext);
            KSLOG_TRACE("Fault address 0x%x, instruction address 0x%x",
                        kscpu_faultAddress(machineContext),
                        kscpu_instructionAddress(machineContext));
            if(exceptionMessage.exception == EXC_BAD_ACCESS)
            {
                crashContext->faultAddress = kscpu_faultAddress(machineContext);
            }
            else
            {
                crashContext->faultAddress = kscpu_instructionAddress(machineContext);
            }
        }

        KSLOG_DEBUG("Filling out context.");
        crashContext->crashType = KSCrashMonitorTypeMachException;
        crashContext->eventID = eventID;
        crashContext->registersAreValid = true;
        crashContext->mach.type = exceptionMessage.exception;
        crashContext->mach.code = exceptionMessage.code[0];
        crashContext->mach.subcode = exceptionMessage.code[1];
        if(crashContext->mach.code == KERN_PROTECTION_FAILURE && crashContext->isStackOverflow)
        {
            // A stack overflow should return KERN_INVALID_ADDRESS, but
            // when a stack blasts through the guard pages at the top of the stack,
            // it generates KERN_PROTECTION_FAILURE. Correct for this.
            crashContext->mach.code = KERN_INVALID_ADDRESS;
        }
        crashContext->signal.signum = signalForMachException(crashContext->mach.type,
                                                             crashContext->mach.code);
        crashContext->stackCursor = &g_stackCursor;

        kscm_handleException(crashContext);

        KSLOG_DEBUG("Crash handling complete. Restoring original handlers.");
        g_isHandlingCrash = false;
        ksmc_resumeEnvironment();
    }

    KSLOG_DEBUG("Replying to mach exception message.");
    // Send a reply saying "I didn't handle this exception".
    replyMessage.header = exceptionMessage.header;
    replyMessage.NDR = exceptionMessage.NDR;
    replyMessage.returnCode = KERN_FAILURE;

    mach_msg(&replyMessage.header,
             MACH_SEND_MSG,
             sizeof(replyMessage),
             0,
             MACH_PORT_NULL,
             MACH_MSG_TIMEOUT_NONE,
             MACH_PORT_NULL);

    return NULL;
}

Restore the exception handling port and transfer control

/** Restore the original mach exception ports.
 */
static void restoreExceptionPorts(void)
{
    KSLOG_DEBUG("Restoring original exception ports.");
    if(g_previousExceptionPorts.count == 0)
    {
        KSLOG_DEBUG("Original exception ports were already restored.");
        return;
    }

    const task_t thisTask = mach_task_self();
    kern_return_t kr;

    // Loop over the exception ports saved before KSCrash was installed
    // and register each one again
    for(mach_msg_type_number_t i = 0; i < g_previousExceptionPorts.count; i++)
    {
        KSLOG_TRACE("Restoring port index %d", i);
        kr = task_set_exception_ports(thisTask,
                                      g_previousExceptionPorts.masks[i],
                                      g_previousExceptionPorts.ports[i],
                                      g_previousExceptionPorts.behaviors[i],
                                      g_previousExceptionPorts.flavors[i]);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("task_set_exception_ports: %s", mach_error_string(kr));
        }
    }
    KSLOG_DEBUG("Exception ports restored.");
    g_previousExceptionPorts.count = 0;
}

2.2. Signal Exception Handling

The operating system converts Mach exceptions to the corresponding Unix signals, so developers can also handle them by registering a signal handler.

The processing logic for KSCrash is shown below:

Take a look at the key code:

Set the signal handler function

static bool installSignalHandler()
{
    KSLOG_DEBUG("Installing signal handler.");

#if KSCRASH_HAS_SIGNAL_STACK
    // Allocate a block of memory on the heap for the alternate signal stack
    if(g_signalStack.ss_size == 0)
    {
        KSLOG_DEBUG("Allocating signal stack area.");
        g_signalStack.ss_size = SIGSTKSZ;
        g_signalStack.ss_sp = malloc(g_signalStack.ss_size);
    }

    // sigaltstack(): the first argument points to a stack_t that describes the
    // location and attributes of the new alternate signal stack. The second
    // argument may be NULL; if it is not NULL, information about the previous
    // alternate signal stack (if any) is stored in it.
    // Returns 0 on success, -1 on failure.
    KSLOG_DEBUG("Setting signal stack area.");
    if(sigaltstack(&g_signalStack, NULL) != 0)
    {
        KSLOG_ERROR("signalstack: %s", strerror(errno));
        goto failed;
    }
#endif

    const int* fatalSignals = kssignal_fatalSignals();
    int fatalSignalsCount = kssignal_numFatalSignals();

    if(g_previousSignalHandlers == NULL)
    {
        KSLOG_DEBUG("Allocating memory to store previous signal handlers.");
        g_previousSignalHandlers = malloc(sizeof(*g_previousSignalHandlers)
                                          * (unsigned)fatalSignalsCount);
    }

    // Set up the action that will be installed for every fatal signal
    struct sigaction action = {{0}};
    // SA_ONSTACK tells the kernel to run the signal handler's stack frames
    // on the alternate signal stack.
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;
#if KSCRASH_HOST_APPLE && defined(__LP64__)
    action.sa_flags |= SA_64REGSET;
#endif
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = &handleSignal;

    for(int i = 0; i < fatalSignalsCount; i++)
    {
        // Bind each fatal signal to the action declared above and save
        // the previous handler in g_previousSignalHandlers
        KSLOG_DEBUG("Assigning handler for signal %d", fatalSignals[i]);
        if(sigaction(fatalSignals[i], &action, &g_previousSignalHandlers[i]) != 0)
        {
            char sigNameBuff[30];
            const char* sigName = kssignal_signalName(fatalSignals[i]);
            if(sigName == NULL)
            {
                snprintf(sigNameBuff, sizeof(sigNameBuff), "%d", fatalSignals[i]);
                sigName = sigNameBuff;
            }
            KSLOG_ERROR("sigaction (%s): %s", sigName, strerror(errno));
            // Try to reverse the damage
            for(i--; i >= 0; i--)
            {
                sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
            }
            goto failed;
        }
    }
    KSLOG_DEBUG("Signal handlers installed.");
    return true;

failed:
    KSLOG_DEBUG("Failed to install signal handlers.");
    return false;
}

Context information such as thread state is recorded while the signal is handled

static void handleSignal(int sigNum, siginfo_t* signalInfo, void* userContext)
{
    KSLOG_DEBUG("Trapped signal %d", sigNum);
    if(g_isEnabled)
    {
        ksmc_suspendEnvironment();
        kscm_notifyFatalExceptionCaptured(false);

        KSLOG_DEBUG("Filling out context.");
        KSMC_NEW_CONTEXT(machineContext);
        ksmc_getContextForSignal(userContext, machineContext);
        kssc_initWithMachineContext(&g_stackCursor, 100, machineContext);

        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        memset(crashContext, 0, sizeof(*crashContext));
        crashContext->crashType = KSCrashMonitorTypeSignal;
        crashContext->eventID = g_eventID;
        crashContext->offendingMachineContext = machineContext;
        crashContext->registersAreValid = true;
        crashContext->faultAddress = (uintptr_t)signalInfo->si_addr;
        crashContext->signal.userContext = userContext;
        crashContext->signal.signum = signalInfo->si_signo;
        crashContext->signal.sigcode = signalInfo->si_code;
        crashContext->stackCursor = &g_stackCursor;

        kscm_handleException(crashContext);
        ksmc_resumeEnvironment();
    }

    KSLOG_DEBUG("Re-raising signal for regular handlers to catch.");
    // This is technically not allowed, but it works in OSX and iOS.
    raise(sigNum);
}

Restore the previous signal handlers after KSCrash has finished processing the signal

static void uninstallSignalHandler(void)
{
    KSLOG_DEBUG("Uninstalling signal handlers.");

    const int* fatalSignals = kssignal_fatalSignals();
    int fatalSignalsCount = kssignal_numFatalSignals();

    // Loop over the fatal signals and restore the handlers saved earlier
    for(int i = 0; i < fatalSignalsCount; i++)
    {
        KSLOG_DEBUG("Restoring original handler for signal %d", fatalSignals[i]);
        sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
    }

    KSLOG_DEBUG("Signal handlers uninstalled.");
}

Notes:

A block of memory is allocated on the heap to serve as the "alternate signal stack". The purpose is to move the signal handler off the regular stack and run it on this heap-allocated area, so that it does not share a stack with the rest of the process.

Why do this? A process may have n threads, each doing its own work, and if one thread fails the whole process crashes. For the signal handler to run reliably, it therefore needs a separate area to run in. Another case is a recursive function that exhausts the default stack: because the signal handler uses the stack allocated on the heap rather than the default stack, it can still run.
int sigaltstack(const stack_t * __restrict, stack_t * __restrict): both parameters are pointers to a stack_t structure, which stores information about an alternate signal stack (start address, length, and state). The first parameter describes the location and attributes of the new alternate signal stack. The second parameter returns information about the previously established alternate signal stack, if any.
```
_STRUCT_SIGALTSTACK
{
	void            *ss_sp;         /* signal stack base */
	__darwin_size_t ss_size;        /* signal stack length */
	int             ss_flags;       /* SA_DISABLE and/or SA_ONSTACK */
};
typedef _STRUCT_SIGALTSTACK     stack_t; /* [???] signal stack */
```
ss_flags must be set to 0 for a newly created alternate signal stack. The system defines the SIGSTKSZ constant, which is large enough for most alternate signal stacks.
```
/* * Structure used in sigaltstack call. */

#define SS_ONSTACK      0x0001  /* take signal on signal stack */
#define SS_DISABLE      0x0004  /* disable taking signals on alternate stack */
#define MINSIGSTKSZ     32768   /* (32K)minimum allowable stack */
#define SIGSTKSZ        131072  /* (128K)recommended stack size */
```
The sigaltstack system call notifies the kernel that an alternate signal stack has been established.

If ss_flags is SS_ONSTACK, the process is currently executing on the alternate signal stack, and an attempt to establish a new alternate signal stack fails with EPERM. If ss_flags is SS_DISABLE, no alternate signal stack is currently established.
int sigaction(int, const struct sigaction * __restrict, struct sigaction * __restrict);

The first parameter is the number of the signal to handle. It cannot be SIGKILL or SIGSTOP, whose handling may not be overridden, because they give the superuser a guaranteed way to terminate or stop a program (SIGKILL and SIGSTOP cannot be caught, blocked, or ignored).

The second and third parameters are pointers to a sigaction structure. If the second parameter is not NULL, it describes the new handling for the signal. If the third parameter is not NULL, the previous handling is stored through it. If the second parameter is NULL and the third is not, the call only retrieves the current handling.
```
/* * Signal vector "template" used in sigaction call. */
struct  sigaction {
	union __sigaction_u __sigaction_u;  /* signal handler */
	sigset_t sa_mask;               /* signal mask to apply */
	int     sa_flags;               /* see signal options below */
};
```
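The three uses of sigaction described above (install, query, restore) can be sketched with standard POSIX calls. The function names install_handler, handler_is_installed, and restore_handler and the variable g_previous_action are hypothetical names for this illustration; KSCrash itself keeps one saved action per fatal signal, as shown in its installSignalHandler.

```c
#define _POSIX_C_SOURCE 200809L /* expose sigaction under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* The action that was in place before ours (hypothetical single slot). */
static struct sigaction g_previous_action;

/* A handler that does nothing; only its address matters in this sketch. */
static void sketch_handler(int signum)
{
    (void)signum;
}

/* Installs sketch_handler and saves the previous action (third argument). */
int install_handler(int signum)
{
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = sketch_handler;
    sigemptyset(&action.sa_mask);
    return sigaction(signum, &action, &g_previous_action);
}

/* Queries the current action by passing NULL as the second argument. */
int handler_is_installed(int signum)
{
    struct sigaction current;
    if (sigaction(signum, NULL, &current) != 0) {
        return 0;
    }
    return current.sa_handler == sketch_handler;
}

/* Puts back whatever action was saved by install_handler. */
int restore_handler(int signum)
{
    return sigaction(signum, &g_previous_action, NULL);
}
```

Calling install_handler(SIGKILL) fails with -1, which matches the rule that SIGKILL and SIGSTOP cannot be caught.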
The sa_flags member of struct sigaction must include the SA_ONSTACK flag, which tells the kernel to run the signal handler's stack frames on the alternate signal stack.
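Putting sigaltstack and SA_ONSTACK together, the following minimal sketch installs a heap-backed alternate stack and lets the handler check that it really runs there. The names install_alt_stack_handler, ran_on_alt_stack, and on_signal are hypothetical; only sigaltstack, sigaction, SIGSTKSZ, and SA_ONSTACK come from POSIX. It is a simplified single-threaded illustration, not KSCrash's implementation.

```c
#define _XOPEN_SOURCE 700 /* expose sigaltstack under strict -std= modes */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static stack_t g_alt_stack;                      /* heap-backed alternate stack */
static volatile sig_atomic_t g_ran_on_alt_stack; /* set by the handler */

static void on_signal(int signum)
{
    (void)signum;
    /* The address of a local variable shows which stack this frame lives on. */
    volatile char marker = 0;
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t base = (uintptr_t)g_alt_stack.ss_sp;
    g_ran_on_alt_stack = (here >= base && here < base + g_alt_stack.ss_size);
}

/* Registers the alternate stack, then installs on_signal with SA_ONSTACK.
 * Returns 0 on success, -1 on failure. */
int install_alt_stack_handler(int signum)
{
    if (g_alt_stack.ss_sp == NULL) {
        g_alt_stack.ss_size = SIGSTKSZ;
        g_alt_stack.ss_flags = 0; /* must be 0 for a new alternate stack */
        g_alt_stack.ss_sp = malloc(g_alt_stack.ss_size);
        if (g_alt_stack.ss_sp == NULL) {
            return -1;
        }
    }
    if (sigaltstack(&g_alt_stack, NULL) != 0) {
        return -1;
    }

    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = on_signal;
    action.sa_flags = SA_ONSTACK; /* run the handler on the alternate stack */
    sigemptyset(&action.sa_mask);
    return sigaction(signum, &action, NULL);
}

/* Reports whether the last delivered signal was handled on the alternate stack. */
int ran_on_alt_stack(void)
{
    return g_ran_on_alt_stack;
}
```

After install_alt_stack_handler(SIGUSR1) succeeds, raise(SIGUSR1) runs on_signal on the heap block; without SA_ONSTACK the same handler would run on the thread's regular stack and the check would report 0.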

2.3. C++ exception handling

C++ exception handling relies on the standard library's std::set_terminate(CPPExceptionTerminate) function.

Parts of an iOS project may be implemented in C or C++. When a C++ exception is thrown, it goes through the Objective-C exception catching mechanism if it can be converted to an NSException. Otherwise it continues through the C++ exception flow and ends up in default_terminate_handler. This default terminate function calls abort_message internally, which eventually calls abort and generates a SIGABRT signal.

When a C++ exception is thrown, the system wraps it in a try...catch layer to decide whether the exception can be converted to an NSException, and rethrows the C++ exception if it cannot. By that point the original stack of the exception is gone, so the upper layer cannot reconstruct the scene of the exception by capturing the SIGABRT signal. In other words, the exception stack is missing.

Why is that? Inside the try...catch statement, __cxa_rethrow() is called to rethrow the exception, and __cxa_rethrow() in turn performs an unwind. Unwinding is, roughly, the reverse of the function calls: it cleans up the local variables created by each function along the call chain, all the way up to the function containing the outermost catch statement, and then hands control to that catch statement. This is why the stack of the C++ exception is lost.

static void setEnabled(bool isEnabled)
{
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            initialize();

            ksid_generate(g_eventID);
            g_originalTerminateHandler = std::set_terminate(CPPExceptionTerminate);
        }
        else
        {
            std::set_terminate(g_originalTerminateHandler);
        }
        g_captureNextStackTrace = isEnabled;
    }
}

static void initialize()
{
    static bool isInitialized = false;
    if(!isInitialized)
    {
        isInitialized = true;
        kssc_initCursor(&g_stackCursor, NULL, NULL);
    }
}

void kssc_initCursor(KSStackCursor *cursor,
                     void (*resetCursor)(KSStackCursor*),
                     bool (*advanceCursor)(KSStackCursor*))
{
    cursor->symbolicate = kssymbolicator_symbolicate;
    cursor->advanceCursor = advanceCursor != NULL ? advanceCursor : g_advanceCursor;
    cursor->resetCursor = resetCursor != NULL ? resetCursor : kssc_resetCursor;
    cursor->resetCursor(cursor);
}

static void CPPExceptionTerminate(void)
{
    ksmc_suspendEnvironment();
    KSLOG_DEBUG("Trapped c++ exception");
    const char* name = NULL;
    std::type_info* tinfo = __cxxabiv1::__cxa_current_exception_type();
    if(tinfo != NULL)
    {
        name = tinfo->name();
    }

    if(name == NULL || strcmp(name, "NSException") != 0)
    {
        kscm_notifyFatalExceptionCaptured(false);
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        memset(crashContext, 0, sizeof(*crashContext));

        char descriptionBuff[DESCRIPTION_BUFFER_LENGTH];
        const char* description = descriptionBuff;
        descriptionBuff[0] = 0;

        KSLOG_DEBUG("Discovering what kind of exception was thrown.");
        g_captureNextStackTrace = false;
        try
        {
            throw;
        }
        catch(std::exception& exc)
        {
            strncpy(descriptionBuff, exc.what(), sizeof(descriptionBuff));
        }
#define CATCH_VALUE(TYPE, PRINTFTYPE) \
catch(TYPE value)\
{ \
    snprintf(descriptionBuff, sizeof(descriptionBuff), "%"#PRINTFTYPE, value); \
}
        CATCH_VALUE(char,                 d)
        CATCH_VALUE(short,                d)
        CATCH_VALUE(int,                  d)
        CATCH_VALUE(long,                ld)
        CATCH_VALUE(long long,          lld)
        CATCH_VALUE(unsigned char,        u)
        CATCH_VALUE(unsigned short,       u)
        CATCH_VALUE(unsigned int,         u)
        CATCH_VALUE(unsigned long,       lu)
        CATCH_VALUE(unsigned long long, llu)
        CATCH_VALUE(float,                f)
        CATCH_VALUE(double,               f)
        CATCH_VALUE(long double,         Lf)
        CATCH_VALUE(char*,                s)
        catch(...)
        {
            description = NULL;
        }
        g_captureNextStackTrace = g_isEnabled;

        // TODO: Should this be done here? Maybe better in the exception handler?
        KSMC_NEW_CONTEXT(machineContext);
        ksmc_getContextForThread(ksthread_self(), machineContext, true);

        KSLOG_DEBUG("Filling out context.");
        crashContext->crashType = KSCrashMonitorTypeCPPException;
        crashContext->eventID = g_eventID;
        crashContext->registersAreValid = false;
        crashContext->stackCursor = &g_stackCursor;
        crashContext->CPPException.name = name;
        crashContext->exceptionName = name;
        crashContext->crashReason = description;
        crashContext->offendingMachineContext = machineContext;

        kscm_handleException(crashContext);
    }
    else
    {
        KSLOG_DEBUG("Detected NSException. Letting the current NSException handler deal with it.");
    }
    ksmc_resumeEnvironment();

    KSLOG_DEBUG("Calling original terminate handler.");
    g_originalTerminateHandler();
}

2.4. Objective-C exception handling

Handling NSException at the Objective-C level is relatively easy: register an NSUncaughtExceptionHandler to capture the exception, collect the crash information from the NSException parameter, and hand it to the data reporting component.

static void setEnabled(bool isEnabled)
{
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            KSLOG_DEBUG(@"Backing up original handler.");
            // Record the previous OC exception handler
            g_previousUncaughtExceptionHandler = NSGetUncaughtExceptionHandler();

            KSLOG_DEBUG(@"Setting new handler.");
            // Set the new OC exception handler
            NSSetUncaughtExceptionHandler(&handleException);
            KSCrash.sharedInstance.uncaughtExceptionHandler = &handleException;
        }
        else
        {
            KSLOG_DEBUG(@"Restoring original handler.");
            NSSetUncaughtExceptionHandler(g_previousUncaughtExceptionHandler);
        }
    }
}

2.5. Main thread deadlock

Detecting a main thread deadlock is similar to detecting an ANR.

Create a thread that runs its logic in a do...while loop, wrapped in an @autoreleasepool to keep memory usage from growing.

There is an awaitingResponse property and a watchdogPulse method. The main logic of watchdogPulse is to set awaitingResponse to YES and then dispatch to the main thread, where watchdogAnswer sets awaitingResponse back to NO:

- (void) watchdogPulse
{
    __block id blockSelf = self;
    self.awaitingResponse = YES;
    dispatch_async(dispatch_get_main_queue(), ^
                   {
                       [blockSelf watchdogAnswer];
                   });
}

The thread's run method loops continuously: after sleeping for the configured g_watchdogInterval, it checks whether awaitingResponse has returned to its initial value (NO). If it has not, the main thread is judged to be deadlocked.

- (void) runMonitor
{
    BOOL cancelled = NO;
    do
    {
        // Only do a watchdog check if the watchdog interval is > 0.
        // If the interval is <= 0, just idle until the user changes it.
        @autoreleasepool {
            NSTimeInterval sleepInterval = g_watchdogInterval;
            BOOL runWatchdogCheck = sleepInterval > 0;
            if(!runWatchdogCheck)
            {
                sleepInterval = kIdleInterval;
            }
            [NSThread sleepForTimeInterval:sleepInterval];
            cancelled = self.monitorThread.isCancelled;
            if(!cancelled && runWatchdogCheck)
            {
                if(self.awaitingResponse)
                {
                    [self handleDeadlock];
                }
                else
                {
                    [self watchdogPulse];
                }
            }
        }
    } while (!cancelled);
}
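The pulse/answer protocol above boils down to a one-flag state machine, sketched here in plain C. This is a deliberately simplified, single-threaded model: the type watchdog and the functions watchdog_pulse, watchdog_answer, and watchdog_check are hypothetical names, and the real monitor runs on its own thread, sleeps between checks, and dispatches the answer to the main queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the watchdog state. */
typedef struct {
    bool awaiting_response; /* true between a pulse and the main thread's answer */
} watchdog;

/* Monitor side: mark a request as outstanding. In KSCrash this is also
 * where the answer is dispatched asynchronously to the main queue. */
void watchdog_pulse(watchdog *w)
{
    assert(w != NULL);
    w->awaiting_response = true;
}

/* Main-thread side: runs only if the main thread is still processing work. */
void watchdog_answer(watchdog *w)
{
    assert(w != NULL);
    w->awaiting_response = false;
}

/* One iteration of the monitor loop, performed after the sleep interval.
 * Returns true if the previous pulse was never answered (deadlock suspected);
 * otherwise sends the next pulse and returns false. */
bool watchdog_check(watchdog *w)
{
    assert(w != NULL);
    if (w->awaiting_response) {
        return true;
    }
    watchdog_pulse(w);
    return false;
}
```

If watchdog_answer runs between two checks, the second check sends a new pulse and reports no deadlock; if it never runs, the next check finds the flag still set and reports a suspected deadlock.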

2.6 Generating and saving Crash

2.6.1 How crash logs are generated

The previous sections described the various kinds of crash monitoring in iOS development. Next, we look at how the crash information is recorded once a crash has been captured, that is, how it is saved to the application sandbox.

Take the main thread deadlock as an example to see how KSCrash records crash information.

// KSCrashMonitor_Deadlock.m
- (void) handleDeadlock
{
    ksmc_suspendEnvironment();
    kscm_notifyFatalExceptionCaptured(false);

    KSMC_NEW_CONTEXT(machineContext);
    ksmc_getContextForThread(g_mainQueueThread, machineContext, false);
    KSStackCursor stackCursor;
    kssc_initWithMachineContext(&stackCursor, 100, machineContext);
    char eventID[37];
    ksid_generate(eventID);

    KSLOG_DEBUG(@"Filling out context.");
    KSCrash_MonitorContext* crashContext = &g_monitorContext;
    memset(crashContext, 0, sizeof(*crashContext));
    crashContext->crashType = KSCrashMonitorTypeMainThreadDeadlock;
    crashContext->eventID = eventID;
    crashContext->registersAreValid = false;
    crashContext->offendingMachineContext = machineContext;
    crashContext->stackCursor = &stackCursor;
    
    kscm_handleException(crashContext);
    ksmc_resumeEnvironment();

    KSLOG_DEBUG(@"Calling abort()");
    abort();
}

The other crash types work the same way: the exception information is packed into a context and handed to kscm_handleException(). This function is called by each of the other monitors after it captures a crash.


/** Start general exception processing.
 *
 * @param context Contextual information about the exception.
 */
void kscm_handleException(struct KSCrash_MonitorContext* context)
{
    context->requiresAsyncSafety = g_requiresAsyncSafety;
    if(g_crashedDuringExceptionHandling)
    {
        context->crashedDuringCrashHandling = true;
    }
    for(int i = 0; i < g_monitorsCount; i++)
    {
        Monitor* monitor = &g_monitors[i];
        // Check whether crash monitoring is enabled
        if(isMonitorEnabled(monitor))
        {
            // Add supplementary information for each crash type
            addContextualInfoToEvent(monitor, context);
        }
    }

    // Actually process the crash information: save it in JSON format
    g_onExceptionEvent(context);

    if(g_handlingFatalException && !g_crashedDuringExceptionHandling)
    {
        KSLOG_DEBUG("Exception is fatal. Restoring original handlers.");
        kscm_setActiveMonitors(KSCrashMonitorTypeNone);
    }
}

g_onExceptionEvent is a function pointer, declared as static void (*g_onExceptionEvent)(struct KSCrash_MonitorContext* monitorContext); it is assigned in KSCrashMonitor.c

void kscm_setEventCallback(void (*onEvent)(struct KSCrash_MonitorContext* monitorContext))
{
    g_onExceptionEvent = onEvent;
}

The kscm_setEventCallback() function is called in the KSCrashC.c file

KSCrashMonitorType kscrash_install(const char* appName, const char* const installPath)
{
    KSLOG_DEBUG("Installing crash reporter.");

    if(g_installed)
    {
        KSLOG_DEBUG("Crash reporter already installed.");
        return g_monitoring;
    }
    g_installed = 1;

    char path[KSFU_MAX_PATH_LENGTH];
    snprintf(path, sizeof(path), "%s/Reports", installPath);
    ksfu_makePath(path);
    kscrs_initialize(appName, path);

    snprintf(path, sizeof(path), "%s/Data", installPath);
    ksfu_makePath(path);
    snprintf(path, sizeof(path), "%s/Data/CrashState.json", installPath);
    kscrashstate_initialize(path);

    snprintf(g_consoleLogPath, sizeof(g_consoleLogPath), "%s/Data/ConsoleLog.txt", installPath);
    if(g_shouldPrintPreviousLog)
    {
        printPreviousLog(g_consoleLogPath);
    }
    kslog_setLogFilename(g_consoleLogPath, true);
    
    ksccd_init(60);
    // Set the callback function when crash occurs
    kscm_setEventCallback(onCrash);
    KSCrashMonitorType monitors = kscrash_setMonitoring(g_monitoring);

    KSLOG_DEBUG("Installation complete.");
    return monitors;
}

/** Called when a crash occurs.
 *
 * This function gets passed as a callback to a crash handler.
 */
static void onCrash(struct KSCrash_MonitorContext* monitorContext)
{
    KSLOG_DEBUG("Updating application state to note crash.");
    kscrashstate_notifyAppCrash();
    monitorContext->consoleLogPath = g_shouldAddConsoleLogToReport ? g_consoleLogPath : NULL;

    // A second crash occurred while processing the crash
    if(monitorContext->crashedDuringCrashHandling)
    {
        kscrashreport_writeRecrashReport(monitorContext, g_lastCrashReportFilePath);
    }
    else
    {
        // 1. Create a new file path based on the current time
        char crashReportFilePath[KSFU_MAX_PATH_LENGTH];
        kscrs_getNextCrashReportPath(crashReportFilePath);
        // 2. Save the path of the newly generated file to g_lastCrashReportFilePath
        strncpy(g_lastCrashReportFilePath, crashReportFilePath, sizeof(g_lastCrashReportFilePath));
        // 3. Pass the newly generated file path to the function that writes the crash report
        kscrashreport_writeStandardReport(monitorContext, crashReportFilePath);
    }
}

The following functions implement the actual writing of the log file. Both do similar work: they format the data as JSON and write it to a file. If a second crash occurred while the first one was being handled, the minimal kscrashreport_writeRecrashReport() is used; otherwise the standard write logic, kscrashreport_writeStandardReport(), is used.

bool ksfu_openBufferedWriter(KSBufferedWriter* writer, const char* const path, char* writeBuffer, int writeBufferLength)
{
    writer->buffer = writeBuffer;
    writer->bufferLength = writeBufferLength;
    writer->position = 0;
    /*
     open() flags:
     #define O_RDONLY   0x0000   open for reading only
     #define O_WRONLY   0x0001   open for writing only
     #define O_RDWR     0x0002   open for reading and writing
     #define O_ACCMODE  0x0003   mask for above modes
     #define O_CREAT    0x0200   create if nonexistent
     #define O_TRUNC    0x0400   truncate to zero length
     #define O_EXCL     0x0800   error if already exists

     Permission modes:
     0755: the owner has read/write/execute permission; the group and other users have read/execute permission.
     0644: the owner has read/write permission; the group and other users have read-only permission.

     open() returns the file descriptor on success, or -1 on error.
     */
    writer->fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
    if(writer->fd < 0)
    {
        KSLOG_ERROR("Could not open crash report file %s: %s", path, strerror(errno));
        return false;
    }
    return true;
}

/**
 * Write a standard crash report to a file.
 *
 * @param monitorContext Contextual information about the crash and environment.
 *                       The caller must fill this out before passing it in.
 *
 * @param path The file to write to.
 */
void kscrashreport_writeStandardReport(const struct KSCrash_MonitorContext* const monitorContext,
                                       const char* path)
{
    KSLOG_INFO("Writing crash report to %s", path);
    char writeBuffer[1024];
    KSBufferedWriter bufferedWriter;

    if(!ksfu_openBufferedWriter(&bufferedWriter, path, writeBuffer, sizeof(writeBuffer)))
    {
        return;
    }

    ksccd_freeze();
    
    KSJSONEncodeContext jsonContext;
    jsonContext.userData = &bufferedWriter;
    KSCrashReportWriter concreteWriter;
    KSCrashReportWriter* writer = &concreteWriter;
    prepareReportWriter(writer, &jsonContext);

    ksjson_beginEncode(getJsonContext(writer), true, addJSONData, &bufferedWriter);

    writer->beginObject(writer, KSCrashField_Report);
    {
        writeReportInfo(writer,
                        KSCrashField_Report,
                        KSCrashReportType_Standard,
                        monitorContext->eventID,
                        monitorContext->System.processName);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeBinaryImages(writer, KSCrashField_BinaryImages);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeProcessState(writer, KSCrashField_ProcessState, monitorContext);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeSystemInfo(writer, KSCrashField_System, monitorContext);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writer->beginObject(writer, KSCrashField_Crash);
        {
            writeError(writer, KSCrashField_Error, monitorContext);
            ksfu_flushBufferedWriter(&bufferedWriter);
            writeAllThreads(writer,
                            KSCrashField_Threads,
                            monitorContext,
                            g_introspectionRules.enabled);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        writer->endContainer(writer);

        if(g_userInfoJSON != NULL)
        {
            addJSONElement(writer, KSCrashField_User, g_userInfoJSON, false);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        else
        {
            writer->beginObject(writer, KSCrashField_User);
        }
        if(g_userSectionWriteCallback != NULL)
        {
            ksfu_flushBufferedWriter(&bufferedWriter);
            g_userSectionWriteCallback(writer);
        }
        writer->endContainer(writer);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeDebugInfo(writer, KSCrashField_Debug, monitorContext);
    }
    writer->endContainer(writer);
    
    ksjson_endEncode(getJsonContext(writer));
    ksfu_closeBufferedWriter(&bufferedWriter);
    ksccd_unfreeze();
}

/** Write a minimal crash report to a file.
 *
 * @param monitorContext Contextual information about the crash and environment.
 *                       The caller must fill this out before passing it in.
 *
 * @param path The file to write to.
 */
void kscrashreport_writeRecrashReport(const struct KSCrash_MonitorContext* const monitorContext,
                                      const char* path)
{
    char writeBuffer[1024];
    KSBufferedWriter bufferedWriter;
    static char tempPath[KSFU_MAX_PATH_LENGTH];
    // Take the path of the last crash report that was passed in
    // (/var/mobile/Containers/Data/Application/******/Library/Caches/KSCrash/Test/Reports/Test-report-******.json),
    // strip the .json extension and append .old, giving the new path
    // /var/mobile/Containers/Data/Application/******/Library/Caches/KSCrash/Test/Reports/Test-report-******.old

    strncpy(tempPath, path, sizeof(tempPath) - 10);
    strncpy(tempPath + strlen(tempPath) - 5, ".old", 5);
    KSLOG_INFO("Writing recrash report to %s", path);

    if(rename(path, tempPath) < 0)
    {
        KSLOG_ERROR("Could not rename %s to %s: %s", path, tempPath, strerror(errno));
    }
    // Open a buffered writer on the file at the path that was passed in
    if(!ksfu_openBufferedWriter(&bufferedWriter, path, writeBuffer, sizeof(writeBuffer)))
    {
        return;
    }

    ksccd_freeze();
    // JSON encoding, implemented in C
    KSJSONEncodeContext jsonContext;
    jsonContext.userData = &bufferedWriter;
    KSCrashReportWriter concreteWriter;
    KSCrashReportWriter* writer = &concreteWriter;
    prepareReportWriter(writer, &jsonContext);

    ksjson_beginEncode(getJsonContext(writer), true, addJSONData, &bufferedWriter);

    writer->beginObject(writer, KSCrashField_Report);
    {
        writeRecrash(writer, KSCrashField_RecrashReport, tempPath);
        ksfu_flushBufferedWriter(&bufferedWriter);
        if(remove(tempPath) < 0)
        {
            KSLOG_ERROR("Could not remove %s: %s", tempPath, strerror(errno));
        }
        writeReportInfo(writer,
                        KSCrashField_Report,
                        KSCrashReportType_Minimal,
                        monitorContext->eventID,
                        monitorContext->System.processName);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writer->beginObject(writer, KSCrashField_Crash);
        {
            writeError(writer, KSCrashField_Error, monitorContext);
            ksfu_flushBufferedWriter(&bufferedWriter);
            int threadIndex = ksmc_indexOfThread(monitorContext->offendingMachineContext,
                                                 ksmc_getThreadFromContext(monitorContext->offendingMachineContext));
            writeThread(writer,
                        KSCrashField_CrashedThread,
                        monitorContext,
                        monitorContext->offendingMachineContext,
                        threadIndex,
                        false);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        writer->endContainer(writer);
    }
    writer->endContainer(writer);

    ksjson_endEncode(getJsonContext(writer));
    ksfu_closeBufferedWriter(&bufferedWriter);
    ksccd_unfreeze();
}

2.6.2 Logic of reading Crash logs

When the app crashes, KSCrash saves the data to the app's sandbox directory. The next time the app starts, we read the stored crash files, process the data, and upload it.

Function call after App startup:

[KSCrashInstallation sendAllReportsWithCompletion:] -> [KSCrash sendAllReportsWithCompletion:] -> [KSCrash allReports] -> [KSCrash reportWithIntID:] -> [KSCrash loadCrashReportJSONWithID:] -> kscrs_readReport

sendAllReportsWithCompletion reads the crash data from the sandbox.

static int getReportCount()
{
    int count = 0;
    DIR* dir = opendir(g_reportsPath);
    if(dir == NULL)
    {
        KSLOG_ERROR("Could not open directory %s", g_reportsPath);
        goto done;
    }
    struct dirent* ent;
    while((ent = readdir(dir)) != NULL)
    {
        if(getReportIDFromFilename(ent->d_name) > 0)
        {
            count++;
        }
    }

done:
    if(dir != NULL)
    {
        closedir(dir);
    }
    return count;
}

// Use the report count and the directory entries to get the file names (the last part of each
// file name is the reportID), then read the content of each crash report and add it to the array
- (NSArray*) allReports
{
    int reportCount = kscrash_getReportCount();
    int64_t reportIDs[reportCount];
    reportCount = kscrash_getReportIDs(reportIDs, reportCount);
    NSMutableArray* reports = [NSMutableArray arrayWithCapacity:(NSUInteger)reportCount];
    for(int i = 0; i < reportCount; i++)
    {
        NSDictionary* report = [self reportWithIntID:reportIDs[i]];
        if(report != nil)
        {
            [reports addObject:report];
        }
    }

    return reports;
}

// Look up the crash information by reportID
- (NSDictionary*) reportWithIntID:(int64_t) reportID
{
    NSData* jsonData = [self loadCrashReportJSONWithID:reportID];
    if(jsonData == nil)
    {
        return nil;
    }

    NSError* error = nil;
    NSMutableDictionary* crashReport = [KSJSONCodec decode:jsonData
                                                   options:KSJSONDecodeOptionIgnoreNullInArray |
                                                           KSJSONDecodeOptionIgnoreNullInObject |
                                                           KSJSONDecodeOptionKeepPartialObject
                                                     error:&error];
    if(error != nil)
    {
        KSLOG_ERROR(@"Encountered error loading crash report %" PRIx64 ": %@", reportID, error);
    }
    if(crashReport == nil)
    {
        KSLOG_ERROR(@"Could not load crash report");
        return nil;
    }
    [self doctorReport:crashReport];

    return crashReport;
}

// Read the crash content for reportID and convert it to NSData
- (NSData*) loadCrashReportJSONWithID:(int64_t) reportID
{
    char* report = kscrash_readReport(reportID);
    if(report != NULL)
    {
        return [NSData dataWithBytesNoCopy:report length:strlen(report) freeWhenDone:YES];
    }
    return nil;
}

// Read the crash data for reportID into a char buffer
char* kscrash_readReport(int64_t reportID)
{
    if(reportID <= 0)
    {
        KSLOG_ERROR("Report ID was %" PRIx64, reportID);
        return NULL;
    }

    char* rawReport = kscrs_readReport(reportID);
    if(rawReport == NULL)
    {
        KSLOG_ERROR("Failed to load report ID %" PRIx64, reportID);
        return NULL;
    }

    char* fixedReport = kscrf_fixupCrashReport(rawReport);
    if(fixedReport == NULL)
    {
        KSLOG_ERROR("Failed to fixup report ID %" PRIx64, reportID);
    }

    free(rawReport);
    return fixedReport;
}

// Take the mutex, call the C function getCrashReportPathByID() to fill in path from reportID,
// then read the whole file at that path into result
char* kscrs_readReport(int64_t reportID)
{
    pthread_mutex_lock(&g_mutex);
    char path[KSCRS_MAX_PATH_LENGTH];
    getCrashReportPathByID(reportID, path);
    char* result;
    ksfu_readEntireFile(path, &result, NULL, 2000000);
    pthread_mutex_unlock(&g_mutex);
    return result;
}

int kscrash_getReportIDs(int64_t* reportIDs, int count)
{
    return kscrs_getReportIDs(reportIDs, count);
}

int kscrs_getReportIDs(int64_t* reportIDs, int count)
{
    pthread_mutex_lock(&g_mutex);
    count = getReportIDs(reportIDs, count);
    pthread_mutex_unlock(&g_mutex);
    return count;
}

// Loop over the directory entries and call getReportIDFromFilename() on ent->d_name
// to get each reportID, filling the array as it goes
static int getReportIDs(int64_t* reportIDs, int count)
{
    int index = 0;
    DIR* dir = opendir(g_reportsPath);
    if(dir == NULL)
    {
        KSLOG_ERROR("Could not open directory %s", g_reportsPath);
        goto done;
    }

    struct dirent* ent;
    while((ent = readdir(dir)) != NULL && index < count)
    {
        int64_t reportID = getReportIDFromFilename(ent->d_name);
        if(reportID > 0)
        {
            reportIDs[index++] = reportID;
        }
    }

    qsort(reportIDs, (unsigned)count, sizeof(reportIDs[0]), compareInt64);

done:
    if(dir != NULL)
    {
        closedir(dir);
    }
    return index;
}

// sprintf(arg1, format) writes the formatted string into arg1. sscanf(arg1, arg2, arg3) then
// parses the string arg1 according to the format arg2 and stores the result in arg3.
// The scan format built here is "<appName>-report-<hex reportID>.json"
static int64_t getReportIDFromFilename(const char* filename)
{
    char scanFormat[100];
    sprintf(scanFormat, "%s-report-%%" PRIx64 ".json", g_appName);

    int64_t reportID = 0;
    sscanf(filename, scanFormat, &reportID);
    return reportID;
}

2.7 Monitoring of front-end JS related Crash

2.7.1 JavaScriptCore Exception Monitoring

This part is straightforward: exceptions are monitored directly through the exceptionHandler property of the JSContext object, as shown in the following code

jsContext.exceptionHandler = ^(JSContext *context, JSValue *exception) {
    // Handle the JavaScriptCore exception information
};

2.7.2 H5 Page Exception Monitoring

When JavaScript throws a runtime error in an H5 page, the window object fires an error event (an ErrorEvent) and window.onerror() is executed.

window.onerror = function (msg, url, lineNumber, columnNumber, error) {
   // Handle exception information
};
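As a minimal sketch of what "handle exception information" might look like, the five arguments can be folded into one report object before uploading. buildErrorReport is a hypothetical helper, not part of any library, and console.log stands in for the real upload. The error argument may be missing, for example for errors from cross-origin scripts, so the sketch falls back to an empty stack.

```javascript
// Hypothetical helper: turns the five window.onerror arguments into one report object.
function buildErrorReport(msg, url, lineNumber, columnNumber, error) {
  return {
    message: String(msg),
    url: url || '',
    line: lineNumber || 0,
    column: columnNumber || 0,
    // `error` can be undefined, so fall back to an empty stack.
    stack: error && error.stack ? String(error.stack) : '',
  };
}

// Install the handler only when running inside a page.
// console.log is a placeholder for the real reporting call.
if (typeof window !== 'undefined') {
  window.onerror = function (msg, url, lineNumber, columnNumber, error) {
    console.log(JSON.stringify(buildErrorReport(msg, url, lineNumber, columnNumber, error)));
    return false; // keep the browser's default error logging
  };
}
```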

2.7.3 React Native Exception Monitoring

The following is an RN demo project: an onPress handler is added to the Debug Text control so that a crash can be triggered manually

<Text style={styles.sectionTitle} onPress={() => { 1 + qw; }}>Debug</Text>

Comparison Group 1:

Condition: iOS project debug mode. Exception handling code has been added to RN.

Press Command + D in the simulator to bring up the developer menu and select Debug. Chrome opens; press Command + Option + J to open its developer tools, and you can debug the RN code just as you would debug React.

Click a frame in the crash stack to jump to the source located through the sourceMap.

Tips: building a Release package for an RN project

Create a folder (release_iOS) in the project root directory as the output folder for the resources

In the terminal, switch to the project directory and run the following command

react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_iOS/main.jsbundle --assets-dest release_iOS --sourcemap-output release_iOS/index.ios.map;

Drag the .jsbundle file and the contents of the assets folder from the release_iOS folder into the iOS project

Comparison Group 2:

Condition: iOS project release mode. No exception handling code is added on the RN side

Operation: Run iOS project, click the button to simulate crash

What you see: the iOS app crashes. The log is as follows

[info][tid:main][RCTRootView.m:294] Running application todos ({
    initialProps = {};
    rootTag = 1;
})
2020-06-22 22:26:03.490 [info][tid:com.facebook.react.JavaScript] Running "todos" with {"rootTag":1,"initialProps":{}}
2020-06-22 22:27:38.673 [error][tid:com.facebook.react.JavaScript] ReferenceError: Can't find variable: qw
2020-06-22 22:27:38.675 [fatal][tid:com.facebook.react.ExceptionsManagerQueue] Unhandled JS Exception: ReferenceError: Can't find variable: qw
2020-06-22 22:27:38.691300+0800 todos[16790:314161] *** Terminating app due to uncaught exception 'RCTFatalException: Unhandled JS Exception: ReferenceError: Can't find variable: qw', reason: 'Unhandled JS Exception: ReferenceError: Can't find variable: qw, stack:
onPress@397:1821
<unknown>@203:3896
_performSideEffectsForTransition@210:9689
_performSideEffectsForTransition@(null):(null)
_receiveSignal@210:8425
_receiveSignal@(null):(null)
touchableHandleResponderRelease@210:5671
touchableHandleResponderRelease@(null):(null)
onResponderRelease@203:3006
b@97:1125
S@97:1268
w@97:1322
R@97:1617
M@97:2401
forEach@(null):(null)
U@97:2201
<unknown>@97:13818
Pe@97:90199
Re@97:13478
Ie@97:13664
receiveTouches@97:14448
value@27:3544
<unknown>@27:840
value@27:2798
value@27:812
value@(null):(null)
'
*** First throw call stack:
(
	0   CoreFoundation          0x00007fff23e3cf0e __exceptionPreprocess + 350
	1   libobjc.A.dylib         0x00007fff50ba89b2 objc_exception_throw + 48
	2   todos                   0x00000001017b0510 RCTFormatError + 0
	3   todos                   0x000000010182d8ca -[RCTExceptionsManager reportFatal:stack:exceptionId:suppressRedBox:] + 503
	4   todos                   0x000000010182e34e -[RCTExceptionsManager reportException:] + 1658
	5   CoreFoundation          0x00007fff23e43e8c __invoking___ + 140
	6   CoreFoundation          0x00007fff23e41071 -[NSInvocation invoke] + 321
	7   CoreFoundation          0x00007fff23e41344 -[NSInvocation invokeWithTarget:] + 68
	8   todos                   0x00000001017e07fa -[RCTModuleMethod invokeWithBridge:module:arguments:] + 578
	9   todos                   0x00000001017e2a84 _ZN8facebook5reactL11invokeInnerEP9RCTBridgeP13RCTModuleDatajRKN5folly7dynamicE + 246
	10  todos                   0x00000001017e280c ___ZN8facebook5react15RCTNativeModule6invokeEjON5folly7dynamicEi_block_invoke + 78
	11  libdispatch.dylib       0x00000001025b5f11 _dispatch_call_block_and_release + 12
	12  libdispatch.dylib       0x00000001025b6e8e _dispatch_client_callout + 8
	13  libdispatch.dylib       0x00000001025bd6fd _dispatch_lane_serial_drain + 788
	14  libdispatch.dylib       0x00000001025be28f _dispatch_lane_invoke + 422
	15  libdispatch.dylib       0x00000001025c9b65 _dispatch_workloop_worker_thread + 719
	16  libsystem_pthread.dylib 0x00007fff51c08a3d _pthread_wqthread + 290
	17  libsystem_pthread.dylib 0x00007fff51c07b77 start_wqthread + 15
)
libc++abi.dylib: terminating with uncaught exception of type NSException
(lldb)

Tips: how to debug RN in release mode (to see console output from the JS side)

In AppDelegate.m, add #import <React/RCTLog.h>
In - (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions, add RCTSetLogThreshold(RCTLogLevelTrace);

Comparison Group 3:

Condition: iOS project release mode. Add exception handling code to RN.

global.ErrorUtils.setGlobalHandler((e) => {
  console.log(e);
  let message = {
    name: e.name,
    message: e.message,
    stack: e.stack
  };
  axios.get('http://192.168.1.100:8888/test.php', {
    params: { 'message': JSON.stringify(message) }
  }).then(function (response) {
    console.log(response)
  }).catch(function (error) {
    console.log(error)
  });
}, true)

Operation: Run iOS project, click the button to simulate crash.

What you see: the iOS app does not crash. The error is logged instead, and the positions in the log can be compared against the JS in the bundle.

Conclusion:

In RN projects, a crash on the JS side surfaces on the Native side. If the RN side has code that catches the exception, the Native side does not crash. If the exception on the RN side is not caught, the Native side crashes directly.

After crash monitoring was added to the RN project and the stack information was printed, it turned out that the corresponding JS had been processed (bundled and minified) by the bundler, which makes crash analysis very difficult. Therefore, we need to write monitoring code on the RN side and report what it captures. In addition, we need a dedicated tool to restore the reported information to source positions, that is, sourceMap parsing.

2.7.3.1 JS Logic Error

Anyone who has written RN knows that in DEBUG mode a problem in the JS code produces a red screen, while in RELEASE mode it produces a white screen or a crash. For the sake of user experience and quality control, exception monitoring is required.

While reading RN's source code I found ErrorUtils, which lets you set a handler for errors.

/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 *
 * @format
 * @flow strict
 * @polyfill
 */

let _inGuard = 0;

type ErrorHandler = (error: mixed, isFatal: boolean) => void;
type Fn<Args, Return> = (...Args) => Return;

/**
 * This is the error handler that is called when we encounter an exception
 * when loading a module. This will report any errors encountered before
 * ExceptionsManager is configured.
 */
let _globalHandler: ErrorHandler = function onError(
  e: mixed,
  isFatal: boolean,
) {
  throw e;
};

/**
 * The particular require runtime that we are using looks for a global
 * `ErrorUtils` object and if it exists, then it requires modules with the
 * error handler specified via ErrorUtils.setGlobalHandler by calling the
 * require function with applyWithGuard. Since the require module is loaded
 * before any of the modules, this ErrorUtils must be defined (and the handler
 * set) globally before requiring anything.
 */
const ErrorUtils = {
  setGlobalHandler(fun: ErrorHandler): void {
    _globalHandler = fun;
  },
  getGlobalHandler(): ErrorHandler {
    return _globalHandler;
  },
  reportError(error: mixed): void {
    _globalHandler && _globalHandler(error, false);
  },
  reportFatalError(error: mixed): void {
    // NOTE: This has an untyped call site in Metro.
    _globalHandler && _globalHandler(error, true);
  },
  applyWithGuard<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    context?: ?mixed,
    args?: ?TArgs,
    // Unused, but some code synced from www sets it to null.
    unused_onError?: null,
    // Some callers pass a name here, which we ignore.
    unused_name?: ?string,
  ): ?TOut {
    try {
      _inGuard++;
      // $FlowFixMe: TODO T48204745 (1) apply(context, null) is fine. (2) array -> rest array should work
      return fun.apply(context, args);
    } catch (e) {
      ErrorUtils.reportError(e);
    } finally {
      _inGuard--;
    }
    return null;
  },
  applyWithGuardIfNeeded<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    context?: ?mixed,
    args?: ?TArgs,
  ): ?TOut {
    if (ErrorUtils.inGuard()) {
      // $FlowFixMe: TODO T48204745 (1) apply(context, null) is fine. (2) array -> rest array should work
      return fun.apply(context, args);
    } else {
      ErrorUtils.applyWithGuard(fun, context, args);
    }
    return null;
  },
  inGuard(): boolean {
    return !!_inGuard;
  },
  guard<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    name?: ?string,
    context?: ?mixed,
  ): ?(...TArgs) => ?TOut {
    // TODO: (moti) T48204753 Make sure this warning is never hit and remove it - types
    // should be sufficient.
    if (typeof fun !== 'function') {
      console.warn('A function must be passed to ErrorUtils.guard, got ', fun);
      return null;
    }
    const guardName = name ?? fun.name ?? '<generated guard>';
    function guarded(...args: TArgs): ?TOut {
      return ErrorUtils.applyWithGuard(
        fun,
        context ?? this,
        args,
        null,
        guardName,
      );
    }

    return guarded;
  },
};

global.ErrorUtils = ErrorUtils;

export type ErrorUtilsT = typeof ErrorUtils;

So RN exceptions can be handled by setting an error handler through global.ErrorUtils. For example

global.ErrorUtils.setGlobalHandler(e => {
   // e.name e.message e.stack
}, true);
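Because setGlobalHandler replaces whatever handler was installed before, it can be useful to keep a reference to the previous one via getGlobalHandler. The following is a minimal sketch, assuming only the two ErrorUtils methods shown in the source above; installCrashReporter, report, forwardToPrevious, and fakeErrorUtils are hypothetical names, and the fake object only exists so the sketch runs outside React Native.

```javascript
// Hypothetical helper: installs a reporting handler and optionally forwards
// the error to the handler that was installed before.
function installCrashReporter(errorUtils, report, forwardToPrevious) {
  const previousHandler = errorUtils.getGlobalHandler();
  errorUtils.setGlobalHandler((error, isFatal) => {
    try {
      report({
        name: error && error.name,
        message: error && error.message,
        stack: error && error.stack,
        isFatal: Boolean(isFatal),
      });
    } catch (reportingError) {
      // Swallow reporter failures so the global handler itself never throws.
    }
    if (forwardToPrevious && typeof previousHandler === 'function') {
      previousHandler(error, isFatal);
    }
  });
}

// Stand-in for global.ErrorUtils so the sketch runs outside React Native.
let previousCalls = 0;
let currentHandler = () => { previousCalls++; };
const fakeErrorUtils = {
  getGlobalHandler: () => currentHandler,
  setGlobalHandler: fn => { currentHandler = fn; },
};

const reports = [];
installCrashReporter(fakeErrorUtils, r => reports.push(r), false);
fakeErrorUtils.getGlobalHandler()(new ReferenceError("Can't find variable: qw"), true);
console.log(reports[0].name, reports[0].isFatal, previousCalls);
```

Whether to forward is a design choice. Without forwarding, the previously installed handler never runs, which matches Comparison Group 3, where the app no longer crashes once a custom handler is set. With forwarding, the earlier behavior is kept in addition to the report.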

2.7.3.2 Component Faults

Another thing to note about RN crash handling is React Error Boundaries.

In the past, JavaScript errors inside components would corrupt React's internal state and produce errors on the next render that could be hard to trace. These errors are essentially caused by earlier errors in other code (non-React component code), but React did not provide a way to handle them gracefully in a component or to recover from them.

A JavaScript error in part of the UI should not crash the entire app, so React 16 introduced a new concept called error boundaries to address this issue.

An error boundary is a React component that catches JavaScript errors anywhere in its child component tree, logs them, and renders a fallback UI instead of the broken component tree. Error boundaries catch errors during rendering, in lifecycle methods, and in the constructors of the whole tree below them.

In other words, an error boundary catches exceptions thrown in the lifecycle methods of its child components, including constructors and render functions

The following exceptions cannot be caught:

Event handlers
Asynchronous code (e.g. setTimeout, Promise, etc.)
Server-side rendering
Errors thrown in the error boundary itself (rather than its children)

Therefore, an error boundary component can be used to catch all exceptions thrown during the component life cycle and render a fallback component, which prevents the app from crashing and improves the user experience. It can also guide users to report the problem, which makes troubleshooting and fixing easier
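A minimal sketch of such a boundary follows. It uses the two React lifecycle hooks for this purpose, static getDerivedStateFromError and componentDidCatch. The factory createErrorBoundary, the report callback, and the fallback prop are hypothetical names, and React is passed in as a parameter only so that the sketch can run without the react package installed; JSX is avoided for the same reason.

```javascript
// Hypothetical factory: `React` is injected so the sketch runs without installing react.
function createErrorBoundary(React, report) {
  return class ErrorBoundary extends React.Component {
    constructor(props) {
      super(props);
      this.state = { hasError: false };
    }

    // Called while rendering, after a descendant has thrown: switch to the fallback UI.
    static getDerivedStateFromError(error) {
      return { hasError: true };
    }

    // Called after the error has been recorded: the place to report it.
    componentDidCatch(error, info) {
      report({
        message: error.message,
        stack: error.stack,
        componentStack: info && info.componentStack,
      });
    }

    render() {
      return this.state.hasError ? this.props.fallback : this.props.children;
    }
  };
}
```

In an RN project the factory would be called once with the real React module and a reporting function, and the resulting component would wrap the root component with a fallback element passed as a prop.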

So far, RN crashes fall into two categories, JS logic errors and component JS errors, and both have now been monitored and handled. Next, how do we solve these problems at the engineering level?

2.7.4 Restoring RN Crash

The SourceMap file is very important for parsing front-end logs. For the meaning of each field in a SourceMap file and how the values are computed, see this article.

With SourceMap files, RN crash logs can be restored with the help of Mozilla's source-map project.

I have written a Node.js script with the following code

var fs = require('fs');
var sourceMap = require('source-map');
// Command-line arguments: the line and column numbers taken from the crash log
var args = process.argv.slice(2);

function parseJSError(aLine, aColumn) {
    fs.readFile('./index.ios.map', 'utf8', function (err, data) {
        if (err) {
            console.log(err);
            return;
        }
        sourceMap.SourceMapConsumer.with(data, null, consumer => {
            // Look up the original position for the line and column from the crash log
            let parseData = consumer.originalPositionFor({
                line: parseInt(aLine, 10),
                column: parseInt(aColumn, 10)
            });
            // Output to the console
            console.log(parseData);
            // Output to a file (writeFileSync takes no callback; it throws on failure)
            fs.writeFileSync('./parsed.txt', JSON.stringify(parseData) + '\n', 'utf8');
        }).catch(function (error) {
            console.log(error);
        });
    });
}

var line = args[0];
var column = args[1];
parseJSError(line, column);

Here is an experiment, again with the todos project.

Simulate crash on the click event of Text

<Text style={styles.sectionTitle} onPress={() => { 1 + qw; }}>Debug</Text>

Bundle the RN project and produce the sourceMap file by running the following command

react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_iOS/main.jsbundle --assets-dest release_iOS --sourcemap-output release_iOS/index.ios.map;

Because this command is used frequently, add an alias for it by editing the .zshrc file

alias RNRelease='react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_iOS/main.jsbundle --assets-dest release_iOS --sourcemap-output release_iOS/index.ios.map;' # Build the RN release bundle

Copy the JS bundle and image resources into the Xcode project
Click to simulate the crash and copy the line and column numbers from the log. In the Node project, run the following command
```
node index.js 397 1822
```
The line number, column number, and file information parsed by the script were compared with the source file, and the result was correct.
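Copying line and column numbers by hand does not scale, so the frames can be extracted from the stack string first. The sketch below assumes the frame format seen in the crash log earlier (name@line:column, with frames such as name@(null):(null) carrying no position); parseStackFrames is a hypothetical helper, not part of RN or source-map.

```javascript
// Hypothetical helper: extracts { name, line, column } from frames such as "onPress@397:1821".
// Frames without a numeric position, e.g. "value@(null):(null)", are skipped.
function parseStackFrames(stack) {
  const framePattern = /^(.*)@(\d+):(\d+)$/;
  return stack
    .split(/\s+/)
    .map(frame => framePattern.exec(frame))
    .filter(match => match !== null)
    .map(match => ({
      name: match[1],
      line: Number(match[2]),
      column: Number(match[3]),
    }));
}

const frames = parseStackFrames(
  'onPress@397:1821 <unknown>@203:3896 _receiveSignal@(null):(null) b@97:1125'
);
console.log(frames.length, frames[0]);
```

Each resulting line and column pair can then be passed to a lookup like the one in the script above.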

2.7.5 SourceMap parsing system design

Purpose: through the platform, a crash from an RN project in production can be restored to the specific file, code line, and code column. You can see the actual code and the RN stack trace, and you can download the source files.

Managed on the server side by the packaging system:
- The source map file is generated only when packaging for the production environment
- All files are saved before packaging (install)
Develop an RN analysis interface on the product side. Click a collected RN crash, and the detail page shows the specific file, line number, and column number, along with the actual code, the RN stack trace, and the Native stack trace. (The specific technical implementation has been described above)
Because source map files are large and RN parsing is slow and consumes computing resources, an efficient way of reading them needs to be designed
SourceMap files differ between iOS and Android, so SourceMap storage needs to be separated by OS.
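One possible way to meet the last two points is to keep recently used source maps in memory and to key them by platform and version. The following is a minimal sketch under that assumption, not the design actually used by the platform: SourceMapCache, its load callback, and the platform/version key format are all hypothetical.

```javascript
// Hypothetical cache: keeps the most recently used parsed source maps in memory,
// keyed by platform and app version so iOS and Android maps never mix.
class SourceMapCache {
  constructor(maxEntries, load) {
    this.maxEntries = maxEntries;
    this.load = load;         // (platform, version) => parsed source map
    this.entries = new Map(); // Map keeps insertion order, used here as the LRU order
  }

  get(platform, version) {
    const key = `${platform}/${version}`;
    if (this.entries.has(key)) {
      const cached = this.entries.get(key);
      this.entries.delete(key); // re-insert to mark the entry as most recently used
      this.entries.set(key, cached);
      return cached;
    }
    const loaded = this.load(platform, version);
    this.entries.set(key, loaded);
    if (this.entries.size > this.maxEntries) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    return loaded;
  }
}

let loads = 0;
const cache = new SourceMapCache(2, (platform, version) => {
  loads++; // stands in for reading and parsing a large .map file
  return { platform, version };
});
cache.get('ios', '1.0.0');
cache.get('ios', '1.0.0');     // served from the cache
cache.get('android', '1.0.0');
cache.get('ios', '1.1.0');     // evicts the least recently used entry
console.log(loads, [...cache.entries.keys()]);
```

The cache bounds memory use while avoiding a re-parse of the same map for every crash from the same release.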
