iOS in practice: continuous crash detection and self-repair

Background

In one of our recent iOS releases, the updated version of the Umeng SDK started reporting user crashes automatically by default.

That would have been tolerable if it had only been an annoyance during development. However, when we shipped the release, I found that the Umeng code did not validate the data it was about to report, so reading that data crashed every time.

Crash detection was causing the App to crash :)

The stack is as follows:

As the method names [UMCrash initUMCrash:channel:] and [WPKSetup sendAllReports] make clear, crash detection runs during initialization and then reports, and the failure happens while the report data is being processed.

The console shows the direct cause:

[NSTaggedPointerString objectForKey:]: unrecognized selector sent to instance

Umeng is initialized very early (we usually do it in the startup phase), so the App crashes before most services are up. Once this happens, the App crashes at launch every time I open it, for the same reason.

The aftermath of successive crashes

So what are the consequences of successive crashes like this?

It can be summarized as the following three points:

Developers cannot see it. Because the App crashes right at startup, nothing reaches crash collection platforms such as Bugly or Umeng, so there is nothing to base a fix on.
Users cannot give feedback. The App crashes every time they open it, so they cannot reach customer service through it.
Zero experience for new users. If an App crashes on three consecutive launches, I personally would not keep using an App that I cannot even try.

The solution

Given the situation above, and knowing how to reproduce it, we could indeed hook the SDK's upload method and make the crash go away.

In this case, for example, we could intercept [WPKSetup sendAllReports] on the crashing path and skip it. Then of course the App no longer crashes.

But there are three problems with the above solution:

Poor cost trade-off: a whole feature is disabled outright because of one occasional crash scenario.
The SDK loses functionality, and the hook may even introduce new problems.
It only fixes the current scenario and offers nothing for other continuous crashes. In other words, it protects us for a while, but not for good.

Perform continuous crash detection

As mentioned earlier, one of the big problems with continuous crashes is that developers are unaware of them.

In other words, we don’t even know there’s a problem, so the first thing to do is find it.

The first idea that usually comes to mind is to observe each crash by catching exceptions, as crash reporting frameworks do.

But catching exceptions has two disadvantages:

It duplicates, and is coupled with, existing code that handles exceptions.
It conflicts with third-party crash collection frameworks, so crashes can go undetected.

The second point is the more serious one, because the code of those frameworks is usually invisible to us.

The WeChat Reading team's article on protecting against continuous crashes at iOS startup offers a good idea:

Persist a crashCount variable.
On each startup, crashCount = crashCount + 1.
After x seconds, crashCount = 0.

crashCount represents the number of crashes. It is incremented by 1 each time the App starts. If the App survives for a period of time, no continuous crash has occurred and the count is reset. Once the count reaches the threshold we set, it shows that the App did not survive the time window on any of those consecutive launches, so an abnormal crash is occurring.

Of course, there are also false positives, such as the user actively killing the App within the time window. These can be controlled by tuning both the count threshold and the time threshold.
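A minimal sketch of such a counter in Objective-C. It assumes the count is kept in a small file that is written synchronously, so the value survives a crash immediately afterwards. The class name LaunchCrashCounter, the threshold of 3, and the file path are illustrative assumptions, not part of the WeChat Reading scheme.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical threshold; tune it together with the survival window.
static const NSInteger kMaxLaunchCrashes = 3;

@interface LaunchCrashCounter : NSObject
- (instancetype)initWithPath:(NSString *)path;
- (NSInteger)crashCount;
- (NSInteger)recordLaunch;                              // call first thing at startup
- (void)reset;                                          // call once the App has survived
- (BOOL)needsRepair;
- (void)resetAfterSurviving:(NSTimeInterval)seconds;
@end

@implementation LaunchCrashCounter {
    NSString *_path;
}

- (instancetype)initWithPath:(NSString *)path {
    if ((self = [super init])) {
        _path = [path copy];
    }
    return self;
}

- (NSInteger)crashCount {
    // A missing or unreadable file yields nil, and messaging nil returns 0.
    NSString *text = [NSString stringWithContentsOfFile:_path
                                               encoding:NSUTF8StringEncoding
                                                  error:NULL];
    return text.integerValue;
}

- (void)store:(NSInteger)count {
    NSString *text = [NSString stringWithFormat:@"%ld", (long)count];
    [text writeToFile:_path atomically:YES encoding:NSUTF8StringEncoding error:NULL];
}

- (NSInteger)recordLaunch {
    NSInteger count = [self crashCount] + 1;
    [self store:count];
    return count;
}

- (void)reset {
    [self store:0];
}

- (BOOL)needsRepair {
    return [self crashCount] >= kMaxLaunchCrashes;
}

- (void)resetAfterSurviving:(NSTimeInterval)seconds {
    // If the process is still alive after the window, treat the launch as healthy.
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(seconds * NSEC_PER_SEC)),
                   dispatch_get_main_queue(), ^{
                       [self reset];
                   });
}
@end
```

At startup one would call recordLaunch before anything else, branch on needsRepair, and call resetAfterSurviving: with the chosen window, for example 5 seconds.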

Control of false positives

On top of the original scheme, we can reduce false positives further by trying to detect when the user actively kills the App:

The user kills the App in the foreground
The user kills the App in the background

Most false positives are of the first kind: the user launches the App and kills it from the foreground within a few seconds. On iOS we listen for UIApplicationWillTerminateNotification and, on receiving it, reset the count to zero.
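A small sketch of that listener, assuming the reset logic is passed in as a block. ResetCrashCountOnUserKill is a hypothetical helper name; the notification itself is the UIKit one named above.

```objective-c
#import <UIKit/UIKit.h>

// Registers a block that clears the crash count when the App is about to terminate.
// Returns the observer token; pass it to -removeObserver: to stop listening.
static id<NSObject> ResetCrashCountOnUserKill(void (^resetCrashCount)(void)) {
    return [[NSNotificationCenter defaultCenter]
        addObserverForName:UIApplicationWillTerminateNotification
                    object:nil
                     queue:nil   // nil queue: the block runs synchronously on the posting thread
                usingBlock:^(NSNotification *note) {
                    resetCrashCount();
                }];
}
```

Because the block runs synchronously when the notification is posted, the reset is done before the handler returns. This only covers the foreground case; an App that is killed while suspended in the background does not receive the notification.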

Automatic repair of continuous crashes

To fix crashes, you first need to know the common causes of such problems.

A code bug that breaks a fixed entry path will generally be exposed during testing. Of course, crashes caused by code cannot be ruled out completely.

Clear data

For a problem in production to crash continuously, there must be some “variable” behind it:

The database
Stored files
Server data

To repair the database and stored files, we clear the local data so that the App can run its normal flow.

Important data can be backed up to the cloud first.
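As a rough illustration of the clean-up step, the function below empties a directory while keeping the directory itself. ClearDirectoryContents is a hypothetical helper; which directories are safe to clear (caches, database files, and so on) depends on the App and is not specified here.

```objective-c
#import <Foundation/Foundation.h>

// Deletes everything inside `directory` but keeps the directory itself.
// Returns the number of top-level items that were removed.
static NSUInteger ClearDirectoryContents(NSString *directory) {
    NSFileManager *fileManager = [NSFileManager defaultManager];
    // A missing directory yields nil, so the loop below simply does nothing.
    NSArray<NSString *> *names = [fileManager contentsOfDirectoryAtPath:directory error:NULL];
    NSUInteger removed = 0;
    for (NSString *name in names) {
        NSString *path = [directory stringByAppendingPathComponent:name];
        if ([fileManager removeItemAtPath:path error:NULL]) {
            removed++;
        }
    }
    return removed;
}
```

Anything that must survive the repair should be uploaded or copied elsewhere before this runs.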

The Umeng crash we hit this time was also a continuous crash caused by reading problematic local data.

Re-request data or run a hotfix package

If the server returned bad data, work with the server side so that it returns normal data, which solves the problem. We can also give users an entry point to report the issue or contact us directly.

We could even consider introducing dynamic fixes for code bugs: request a hotfix package and run it.

The specific process

The WeChat Reading team hooks the didFinishLaunching stage. When the crash limit is reached, the App enters the repair flow, and when the repair is complete it calls the didFinishLaunching method to follow the original flow into the App.

In our project, the automatic repair process differs in the details:

Log initialization already starts in the AppDelegate's initialize method.
In the willFinishLaunching phase, some data and services are initialized.
applicationDidBecomeActive contains logic, too.
Entering the home page involves network requests and modules with complex logic.

So hooking the didFinishLaunching stage is not enough: each of the cases above also has to be handled separately and intercepted when the number of crashes exceeds the limit.

Problems that appear along the way, such as crashes because a service cannot be found, also have to be resolved and integrated.

Now add a temporary method:

+ (BOOL)needFixCrashes
{
    NSInteger launchCrashes = [self crashCount];
    
    if (launchCrashes >= kContinuousCrashOnLaunchNeedToReport) {
        return YES;
    }
    return NO;
}


Guard each of those places with needFixCrashes:

// CrashMonitor stands in for whichever class declares +needFixCrashes
if ([CrashMonitor needFixCrashes]) { return; }
// ... the original logic runs as normal

After the repair, we invoke the skipped steps ourselves: [A], [B], … one by one.

A better idea

There is in fact a better way to implement the process above, but business deadlines limited what we could do.

We can hook the objects and methods involved in the process and store them, for example in an NSMapTable.

When the repair finishes, we call the stored objects and methods in order, running through the complete startup process.

The nice thing is that this is reusable: you register a selector with it and run the repair, without hard-coding each call.
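A rough sketch of that idea, assuming the deferred startup steps take no arguments. It stores NSInvocation objects in an array rather than in an NSMapTable, because the replay order matters and NSMapTable does not preserve insertion order. DeferredLaunchSteps is a hypothetical name.

```objective-c
#import <Foundation/Foundation.h>

@interface DeferredLaunchSteps : NSObject
// Remembers an argument-free call so it can be made after the repair.
- (void)deferTarget:(id)target selector:(SEL)selector;
// Invokes the remembered calls in the order they were deferred; returns how many ran.
- (NSUInteger)replay;
@end

@implementation DeferredLaunchSteps {
    NSMutableArray<NSInvocation *> *_steps;
}

- (instancetype)init {
    if ((self = [super init])) {
        _steps = [NSMutableArray array];
    }
    return self;
}

- (void)deferTarget:(id)target selector:(SEL)selector {
    NSMethodSignature *signature = [target methodSignatureForSelector:selector];
    // Only self and _cmd are allowed: this sketch does not capture extra arguments.
    if (signature == nil || signature.numberOfArguments != 2) {
        return;
    }
    NSInvocation *invocation = [NSInvocation invocationWithMethodSignature:signature];
    invocation.target = target;
    invocation.selector = selector;
    [invocation retainArguments];   // also retains the target until replay
    [_steps addObject:invocation];
}

- (NSUInteger)replay {
    NSArray<NSInvocation *> *steps = [_steps copy];
    [_steps removeAllObjects];
    for (NSInvocation *step in steps) {
        [step invoke];
    }
    return steps.count;
}
@end
```

Each guarded place would call deferTarget:selector: instead of returning silently, and the repair flow would finish with a single replay call.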

The final process

The final detection process is as follows:

Start the App and increment the count: crash = crash + 1
Check whether crash < maxCrash
If crash < maxCrash, enter the normal startup process and reset the count after a period of time
If crash >= maxCrash, enter the repair startup

The repair process is designed as follows:

The root controller is set to a new controller, and a repair dialog is displayed with the prompt “The application may be damaged. Do you want to repair it?”
If the user selects “Cancel”, we report the information to the platform, and then the App exits to the background.
If the user selects “Repair”, we clean up the data (important data should be backed up to the cloud first) and then report the information.
After the repair is complete, reinitialize all services and go to the home page.
In the worst case, clearing the data still does not help. Keep track of the number of “repairs” over a period of time, and provide a way to contact the platform directly so that the crash can be resolved where conditions allow.

In practice there is a lot of business logic to sort out. The check has to run before all services; if there is no dedicated class that gathers the startup handling in one place, we need to spend time building one, or else add checks in many places.

In general, the main idea is as follows:

Crash detection: start it first in the entire App, and keep its code clean and simple.
During repair: categorize your data, back up what is important, and delete what needs deleting.
After repair: the path into the App should be complete enough to ensure smooth entry.

Conclusion

A continuous crash is arguably the most serious problem an App can have, although it does not usually occur.

And as the number of users grows, any problem is magnified.

Therefore, as engineers, we need a solid fallback strategy that eliminates such problems as far as possible and ensures a good user experience. Every user retained through technical safeguards counts.

Besides, it pays to take precautions: who can say that a continuous crash will never happen to their own App?

References

A protection scheme for continuous crashes at iOS startup

How to report crash logs when an iOS App crashes continuously