Symptoms
After the latest version of the app was released, Sentry reported a large amount of abnormal data, and the number of affected users was alarming: more than 100,000 (10W+). The reported error was SIGPIPE.
Investigation
First, I looked up the cause of the SIGPIPE error and found the relevant information in the official documentation:
If the client keeps sending data after the socket connection has been closed, a SIGPIPE signal is generated. If the signal is not handled, the app crashes.
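A minimal sketch of this behavior, assuming a POSIX system such as Linux or macOS. A local socketpair stands in for the app's network connection, and the helper name `writer_killed_by_sigpipe` is hypothetical. A forked child keeps the default SIGPIPE action and writes to a socket whose peer is already closed; the parent then checks whether the child was terminated by SIGPIPE.

```c
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a process that writes to a socket whose
 * peer has already been closed is terminated by SIGPIPE, 0 otherwise. */
int writer_killed_by_sigpipe(void) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return 0;
    close(fds[1]);                          /* the peer closes the connection */

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        return 0;
    }
    if (pid == 0) {
        signal(SIGPIPE, SIG_DFL);           /* no handling: default action */
        ssize_t n = write(fds[0], "x", 1);  /* the client keeps sending */
        (void)n;
        _exit(0);                           /* reached only if SIGPIPE did not kill us */
    }

    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return 0;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGPIPE;
}
```

Forking keeps the crash confined to the child, so the caller can observe it through waitpid instead of being terminated itself.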
The documentation says SIGPIPE can be ignored globally with signal(SIGPIPE, SIG_IGN). I confirmed that the client already does this, yet the exception was still reported to Sentry. signal() associates a signal with a handler, which is executed when the signal is received. SIG_IGN is the system-provided disposition that ignores the signal; it is defined as follows:
#define SIG_IGN (void (*)(int))1
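With the ignore in place, the same kind of write no longer terminates the process; it simply fails, and errno reports EPIPE. A sketch under the same assumptions (a socketpair instead of a real connection; `write_to_closed_peer_errno` is a hypothetical helper):

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: writes to a socket whose peer is already closed while
 * SIGPIPE is ignored. Returns the errno of the failed write (EPIPE expected),
 * 0 if the write unexpectedly succeeded, or -1 if setup failed. */
int write_to_closed_peer_errno(void) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;
    close(fds[1]);                        /* the peer goes away */

    signal(SIGPIPE, SIG_IGN);             /* global ignore, as the client does */
    ssize_t n = write(fds[0], "x", 1);    /* no crash: the write just fails */
    int err = (n < 0) ? errno : 0;
    close(fds[0]);
    return err;
}
```

Code that ignores SIGPIPE therefore has to check write results for EPIPE itself, since no signal will interrupt it.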
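Next, I triggered SIGPIPE manually. After running, the program prints bingo as expected.
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void signalHandler(int sig) {
    printf("bingo");  // demo only: printf is not async-signal-safe
}

int main(int argc, char *argv[]) {
    signal(SIGPIPE, signalHandler);
    kill(getpid(), SIGPIPE);  // send SIGPIPE to this process
    return 0;
}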
If several handlers are registered for the same signal, as below, the console prints only 333: only the most recently registered handler runs, because a signal can be associated with only one handler at a time.
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void signalHandler(int sig) {
    printf("111");
}

void signalHandler2(int sig) {
    printf("222");
}

void signalHandler3(int sig) {
    printf("333");
}

int main(int argc, char *argv[]) {
    // each call replaces the previous handler for SIGPIPE
    signal(SIGPIPE, signalHandler);
    signal(SIGPIPE, signalHandler2);
    signal(SIGPIPE, signalHandler3);
    kill(getpid(), SIGPIPE);  // only signalHandler3 runs
    return 0;
}
Since Sentry was able to catch and report this exception, I suspected that Sentry was overriding the client's handling.
Sentry installs its handler with sigaction. Like signal, this function sets the action associated with the signal sig: act specifies the new action, and if oact is not a null pointer, the previous action is stored in it. Sentry registers its own handler, handleSignal, and saves the previous actions in the array g_previousSignalHandlers.
int sigaction(int sig, const struct sigaction *act, struct sigaction *oact);

// install handleSignal (through action) and save the previous action
sigaction(fatalSignals[i], &action, &g_previousSignalHandlers[i]);
In handleSignal, Sentry records the exception by calling sentrycrashcm_handleException, then re-raises the signal with raise.
static void handleSignal(int sigNum, siginfo_t *signalInfo, void *userContext)
{
    SentryCrashLOG_DEBUG("Trapped signal %d", sigNum);
    if (g_isEnabled) {
        // record and report the exception (simplified)
        sentrycrashcm_handleException();
    }
    SentryCrashLOG_DEBUG("Re-raising signal for regular handlers to catch.");
    // This is technically not allowed, but it works in OSX and iOS.
    raise(sigNum);
}
Here is the simplified call chain of sentrycrashcm_handleException:
void sentrycrashcm_handleException(struct SentryCrash_MonitorContext *context)
{
    // ... (other steps omitted) ...
    // turn off all monitors
    sentrycrashcm_setActiveMonitors(SentryCrashMonitorTypeNone);
}

void sentrycrashcm_setActiveMonitors(SentryCrashMonitorType monitorTypes)
{
    // for the signal monitor, isEnabled = false
    setMonitorEnabled(monitor, isEnabled);
}

static inline void setMonitorEnabled(Monitor *monitor, bool isEnabled)
{
    // disabling the signal monitor uninstalls its handler
    uninstallSignalHandler();
}

static void uninstallSignalHandler(void)
{
    // for each fatal signal i: restore the action saved at install time
    sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
}
So sentrycrashcm_handleException ultimately reinstalls the actions saved in g_previousSignalHandlers, which in this case is the SIG_IGN disposition set by the client. When handleSignal re-raises the signal after reporting, the signal reaches SIG_IGN and is ignored. There is therefore no override: Sentry does not break the client's default ignoring of SIGPIPE.
In short, the client's SIG_IGN setting still takes effect. Sentry only reports an exception; it is not a crash. To verify, I triggered SIGPIPE manually in the app: capturing traffic with Charles showed the Sentry report being sent, and the app did not crash.
Cause and fix
I confirmed with several business teams that this release contained no socket-related changes, so why did it suddenly produce so many exception reports?
Diffing the code later showed that this release had changed when Sentry is initialized. Previously, Sentry was initialized first and the client called signal(SIGPIPE, SIG_IGN) afterwards, so SIG_IGN replaced Sentry's handler; Sentry cannot report a signal it never catches. The current release reversed this order: SIG_IGN was set first, and Sentry then installed handleSignal on top of it, which caused the flood of exception reports. Attempts to locate the specific socket were unsuccessful, so the fix was to move the SIG_IGN registration back to after Sentry initialization, and later versions no longer reported this exception.