Symptoms

After the latest version of the app was released, Sentry started reporting a large number of exceptions, affecting an alarming 100,000+ users (10W+). The reported error was a SIGPIPE signal.

Investigation

First, I looked up the cause of SIGPIPE and found the relevant explanation in the official documentation:

If the client keeps sending data after the socket connection has been closed, a SIGPIPE signal is generated. If the signal is not handled, its default action terminates the process, and the app crashes.
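A minimal sketch of this scenario, assuming a POSIX system: a Unix-domain socketpair stands in for the real client/server connection, and the hypothetical helper write_to_closed_peer closes one end and then writes to the other. With SIGPIPE ignored, write() fails with EPIPE instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: writes one byte to a socket whose peer is already
 * closed and returns the resulting errno (0 if the write succeeded).
 * If SIGPIPE is NOT ignored, the default action terminates the process
 * inside write() and this function never returns. */
int write_to_closed_peer(void) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return errno;
    }
    close(fds[1]);                      /* the peer closes the connection */
    ssize_t n = write(fds[0], "x", 1);  /* the client keeps sending */
    int err = (n < 0) ? errno : 0;      /* save errno before close() can change it */
    close(fds[0]);
    return err;
}
```

Calling signal(SIGPIPE, SIG_IGN) before write_to_closed_peer() makes it return EPIPE; without that call, the same program is terminated by SIGPIPE.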

The documentation says SIGPIPE can be ignored process-wide with signal(SIGPIPE, SIG_IGN). I confirmed that the client already does this, yet the exception was still being reported to Sentry. signal() associates a signal with a handler that is executed when the signal is received, and SIG_IGN is the system-provided disposition that tells the system to ignore the signal. It is defined as follows:

#define SIG_IGN         (void (*)(int))1
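To see what SIG_IGN does in practice, here is a small sketch (the helper name is hypothetical): it installs SIG_IGN for SIGPIPE, sends SIGPIPE to its own process, and returns normally because the signal is discarded. signal() also returns the disposition that was in place before the call, which lets the helper check that SIG_IGN really is installed.

```c
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: ignores SIGPIPE, then sends SIGPIPE to this process.
 * Returns 1 if SIG_IGN is still in effect afterwards; merely reaching the
 * return statement shows the signal did not terminate the process. */
int survive_ignored_sigpipe(void) {
    signal(SIGPIPE, SIG_IGN);
    kill(getpid(), SIGPIPE);  /* discarded: the disposition is SIG_IGN */
    /* signal() returns the previous disposition; re-install SIG_IGN while reading it */
    void (*previous)(int) = signal(SIGPIPE, SIG_IGN);
    return previous == SIG_IGN;
}
```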

To test this, I registered a custom handler and triggered SIGPIPE manually. After running, the handler's output is printed as expected:

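#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void signalHandler(int sig) {
    printf("bingo\n");  // fine for a demo, although printf is not async-signal-safe
}

int main(int argc, char * argv[]) {
    signal(SIGPIPE, signalHandler);  // associate SIGPIPE with our handler
    kill(getpid(), SIGPIPE);         // send SIGPIPE to this process
    return 0;
}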

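Registering several handlers for the same signal, as below, makes the console print only 333: only the most recently registered handler runs. This is expected, since a signal can be associated with only one handler at a time, and each call to signal() replaces the previous one.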

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void signalHandler(int sig) {
    printf("111");
}

void signalHandler2(int sig) {
    printf("222");
}

void signalHandler3(int sig) {
    printf("333");
}

int main(int argc, char * argv[]) {
    signal(SIGPIPE, signalHandler);
    signal(SIGPIPE, signalHandler2);  // replaces signalHandler
    signal(SIGPIPE, signalHandler3);  // replaces signalHandler2
    kill(getpid(), SIGPIPE);          // prints only "333"
    return 0;
}
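The replacement behaviour can also be checked without reading console output. In this sketch (handler, counter and helper names are hypothetical), the second signal() call returns the handler it replaced, and raising the signal afterwards runs only the most recent handler. That return value is what makes manual chaining possible: whoever installs a handler is responsible for keeping the old one.

```c
#include <signal.h>

static volatile sig_atomic_t g_calls1 = 0;  /* times handler1 ran */
static volatile sig_atomic_t g_calls2 = 0;  /* times handler2 ran */

static void handler1(int sig) { (void)sig; g_calls1++; }
static void handler2(int sig) { (void)sig; g_calls2++; }

/* Registers handler1, then handler2, for SIGPIPE and raises it once.
 * Returns 1 if the second signal() call handed back handler1, i.e. the
 * earlier registration was replaced rather than chained. */
int replace_handler_demo(void) {
    signal(SIGPIPE, handler1);
    void (*previous)(int) = signal(SIGPIPE, handler2);  /* replaces handler1 */
    raise(SIGPIPE);                                      /* only handler2 runs */
    return previous == handler1;
}
```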

Yet Sentry was still catching and reporting this signal, so I suspected that Sentry was overriding the client's handling.

Sentry installs its handler with sigaction. Like signal, sigaction sets the action associated with sig: act specifies the new action for the signal, and when oact is not a null pointer, the previous action is stored in the location oact points to. Sentry installs its own handler, handleSignal, and saves the previous handlers in the array g_previousSignalHandlers.

int sigaction(int sig, const struct sigaction *act, struct sigaction *oact);

// In Sentry: install handleSignal and save the previous action for each fatal signal
action.sa_sigaction = &handleSignal;
sigaction(fatalSignals[i], &action, &g_previousSignalHandlers[i]);

In handleSignal, Sentry records the exception by calling sentrycrashcm_handleException and then re-raises the signal with raise.

static void handleSignal(int sigNum, siginfo_t *signalInfo, void *userContext)
{
    SentryCrashLOG_DEBUG("Trapped signal %d", sigNum);
    if (g_isEnabled) {
        sentrycrashcm_handleException(); // simplified: records the crash
    }
    SentryCrashLOG_DEBUG("Re-raising signal for regular handlers to catch.");
    // This is technically not allowed, but it works in OSX and iOS.
    raise(sigNum);
}

Here is the simplified call chain of sentrycrashcm_handleException:

void sentrycrashcm_handleException(struct SentryCrash_MonitorContext *context)
{
    sentrycrashcm_setActiveMonitors(SentryCrashMonitorTypeNone);
}

void sentrycrashcm_setActiveMonitors(SentryCrashMonitorType monitorTypes)
{
    // for each monitor: isEnabled = false, since monitorTypes is None
    setMonitorEnabled(monitor, isEnabled);
}

static inline void setMonitorEnabled(Monitor *monitor, bool isEnabled) {
    // disabling the signal monitor ends up here
    uninstallSignalHandler();
}

static void uninstallSignalHandler(void) {
    // for each fatal signal i: restore the action saved at install time
    sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
}

So sentrycrashcm_handleException ends by re-installing the handlers saved in g_previousSignalHandlers, which in our case is the client's SIG_IGN. When handleSignal then re-raises the signal, it goes to SIG_IGN and is ignored. Sentry therefore does not override the client's handling: it reports the signal and then hands it back to the client's SIG_IGN.

In conclusion, the client's SIG_IGN setting still takes effect: Sentry only reports an exception, and the app does not actually crash. To verify this, I triggered SIGPIPE manually in the app: the Sentry report showed up in a Charles capture, and the app kept running.

Cause and Fix

Several business teams confirmed that this release contained no socket-related changes, so why did it suddenly produce so many exception reports?

Diffing the code later revealed that the point at which Sentry is initialized had changed. Previously, Sentry was initialized first and signal(SIGPIPE, SIG_IGN) was called afterwards, so SIG_IGN overwrote Sentry's handleSignal; Sentry never caught SIGPIPE, and a signal Sentry does not catch is never reported. The current version reversed this order: SIG_IGN was set first, Sentry then installed handleSignal on top of it, and every SIGPIPE was reported before being ignored, which produced the flood of reports. Attempts to locate the specific socket involved were unsuccessful, so we moved the SIG_IGN call back to after Sentry initialization, and the reports no longer appeared in later versions.