Author: An Qing

Background

Over the past fiscal year, the Alipay client high-availability team has continued to safeguard the stability of the Alipay client in production. However, online emergency response alone is not enough. Stability risks also need to be discovered in advance, which means building offline risk-detection capabilities that round out the client's overall high-availability assurance system. A review of the online crashes from the past one to two years shows that null pointer exceptions (NPE), RPC data type mismatches, and Config changes caused a large share of them, and that such problems can be detected offline ahead of time with a suitable mechanism.

Based on this, we classified the factors that affect stability and, in order of priority, built a client stability “mine clearance” system. It mainly covers function interface mine clearance, rpc&config&jsapi mine clearance, scheme&broadcast&notification mine clearance, and lottie&antmation&nest template change mine clearance. By building out these capabilities, we hope to avoid the online stability failures described above and to drive code fixes by filing issues with developers, so that client stability keeps improving and the client high-availability assurance system is strengthened on the prevention side.

Technical solution

According to our empirical analysis, crashes caused by insufficient validation of function parameters account for about 20% of the crashes in client function calls. At the present stage, Fuzz tests are run on the Android side against public and private static interfaces, and on the iOS side against all public interfaces. The installation package is scanned to obtain the full set of interface function information, and fuzzed function parameters are then constructed to test the stability of the client.

Compared with traditional static code scanning, this approach runs on the real client, so it stays close to real scenarios and produces real exception data. The overall technical implementation covers the following points:

  • Code scanning that produces detailed interface files
  • Non-intrusive batch interface reflection on the client
  • Construction of abnormal parameters through Fuzz
  • Automated scripts that drive test case execution on real devices and recover the client from exceptions
  • Issue report analysis and processing

Code function scan

The scan is built on the open source framework github.com/androguard/…
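The source does not describe the format of the interface file that the scanner emits. As a minimal sketch, assuming each scanned method carries a JVM/Dalvik-style descriptor such as `(ILjava/lang/String;[B)V`, the hypothetical helper `parse_descriptor` below splits it into parameter types, so that a later stage knows what kind of value to fuzz in each position.

```python
from typing import List, Tuple


def parse_descriptor(descriptor: str) -> Tuple[List[str], str]:
    """Split a JVM/Dalvik method descriptor into (parameter types, return type).

    Hypothetical helper: the real scanner output format is not given in the
    article, so the descriptor input is an assumption.
    """
    # Some tools print a space between parameter types; drop them first.
    descriptor = descriptor.replace(" ", "")
    if not descriptor.startswith("(") or ")" not in descriptor:
        raise ValueError(f"not a method descriptor: {descriptor!r}")

    close = descriptor.index(")")
    params: List[str] = []
    i = 1
    while i < close:
        start = i
        # Leading '[' characters mark array dimensions.
        while descriptor[i] == "[":
            i += 1
        # 'L...;' is an object type; anything else is a one-letter primitive.
        if descriptor[i] == "L":
            i = descriptor.index(";", i)
        i += 1
        params.append(descriptor[start:i])
    return params, descriptor[close + 1:]
```

For example, `parse_descriptor("(ILjava/lang/String;[B)V")` yields an `int`, a `String`, and a `byte[]` parameter with a `void` return type.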

Client function execution module

Using Alipay's dynamic bundle capability, function interface stability tests can be run without intruding into the Alipay client. The test process is divided into the following stages:

1. Scan execution mode

Because executing the full set of interfaces takes too long, the tool currently supports testing only the interfaces that differ between versions, including between major versions. Computing the interfaces that changed between versions makes it possible to obtain stability data quickly. It also supports scanning the code of a specific bundle, so that one business team's code can be scanned and the function interface stability results delivered to that team.
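The article does not say how the version difference is computed. One possible sketch, assuming each version's scan is reduced to a mapping from method signature to a hash of the method body (both the mapping and the name `changed_interfaces` are hypothetical), is a simple dictionary comparison with an optional bundle filter:

```python
from typing import Dict, List, Optional


def changed_interfaces(
    old: Dict[str, str],
    new: Dict[str, str],
    bundle_prefix: Optional[str] = None,
) -> List[str]:
    """Return signatures that are new or whose body hash changed.

    `old` and `new` map a method signature to a hash of its body (assumed
    format). `bundle_prefix` restricts the result to one bundle or package.
    """
    changed = [sig for sig, digest in new.items() if old.get(sig) != digest]
    if bundle_prefix is not None:
        changed = [sig for sig in changed if sig.startswith(bundle_prefix)]
    return sorted(changed)
```

Comparing body hashes rather than signatures alone also picks up interfaces whose signature is unchanged but whose implementation was modified.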

2. Parameter Fuzz anomaly construction

Constructing abnormal parameters is a specialized subject, and many approaches use machine learning to generate anomalous data that covers more code logic. At present we have only built a set of abnormal test values based on experience and business semantics, and we traverse the function code through permutations and combinations of those values. Depending on its number of parameters, one interface currently expands into 5-10 mutated function calls. Code coverage results show that 60% of an interface's logic can be reached.
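The permutation-and-combination step can be sketched as follows. The value pools, the type names, and the cap of 10 calls per interface are illustrative assumptions chosen to match the 5-10 calls mentioned above; they are not the team's actual test set.

```python
import itertools
import random
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical pools of abnormal values, keyed by a simplified type name.
ABNORMAL_VALUES: Dict[str, List[Any]] = {
    "int": [0, -1, 2**31 - 1, -(2**31)],
    "string": [None, "", "A" * 4096, "%s%n", "\u0000"],
    "list": [None, [], [None]],
}


def build_cases(
    param_types: Sequence[str], max_cases: int = 10, seed: int = 0
) -> List[Tuple[Any, ...]]:
    """Combine abnormal values across parameters, capped at `max_cases`."""
    # Unknown types fall back to a single null argument.
    pools = [ABNORMAL_VALUES.get(t, [None]) for t in param_types]
    combos = list(itertools.product(*pools))
    if len(combos) > max_cases:
        # Sample with a fixed seed so a failing case can be regenerated.
        combos = random.Random(seed).sample(combos, max_cases)
    return combos
```

A fixed seed keeps the selected combinations reproducible between runs, which matters when a crash has to be reproduced later.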

3. Calling the function under test

The main problems to solve when calling the function under test are execution efficiency, data recording, and resuming from a breakpoint. Execution efficiency relies on grouping interfaces and running them on multiple threads. Strictly isolating each function's execution ensures that the results are reliable. During an interface mine clearance run, the large number of crashes and ANR freezes will stop the test. To complete the whole test automatically, the test process data must be recorded in detail: the key data at the moment of a crash is captured, saved, and written to the client's storage space, which provides what is needed to resume from the breakpoint.

4. Crash data replay

Once the executed test cases have been saved locally, the program supports replaying the abnormal cases directly to verify that a code fix works.
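One way to make a run resumable is a write-ahead journal: record a case before invoking it and again after it returns, so that a case with a "begin" record but no "end" record can be blamed for killing the process. The sketch below is an assumption about how such recording could work; the `Journal` class, the JSON-lines file, and `run_all` are hypothetical names, not the team's implementation.

```python
import json
import os
from typing import Callable, Iterable, List, Set, Tuple


class Journal:
    """Append-only JSON-lines log of test case progress (hypothetical)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def _append(self, record: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # survive an abrupt process death

    def begin(self, case_id: str) -> None:
        self._append({"case": case_id, "state": "begin"})

    def end(self, case_id: str, outcome: str) -> None:
        self._append({"case": case_id, "state": "end", "outcome": outcome})

    def load(self) -> Tuple[Set[str], List[str]]:
        """Return (finished case ids, case ids that began but never ended)."""
        started: List[str] = []
        finished: Set[str] = set()
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    try:
                        rec = json.loads(line)
                    except json.JSONDecodeError:
                        continue  # ignore a line truncated by a crash
                    if rec["state"] == "begin":
                        started.append(rec["case"])
                    else:
                        finished.add(rec["case"])
        return finished, [c for c in started if c not in finished]


def run_all(cases: Iterable[Tuple[str, Callable, tuple]], journal: Journal) -> None:
    """Run cases, skipping finished ones and ones that crashed the process."""
    finished, crashed = journal.load()
    skip = finished | set(crashed)
    for case_id, func, args in cases:
        if case_id in skip:
            continue
        journal.begin(case_id)
        try:
            func(*args)
            outcome = "ok"
        except Exception as exc:  # a caught exception is a result, not a crash
            outcome = type(exc).__name__
        journal.end(case_id, outcome)
```

After a restart, calling `run_all` with the same case list continues with the first case that has no journal entry, and the second value returned by `load()` lists the cases to report as crashes.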

Automatic script execution

The automation script launches the client to execute the test cases. It must also detect client exceptions, pull the test data to determine the breakpoint, and restart the client, so that the entire test process runs fully automatically without human intervention.
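The supervising loop can be sketched as below. It assumes the client run is started by some command `cmd` and that a caller-supplied `is_complete` check reads the pulled test data. Both are placeholders, since the article does not name the actual launch command or data format. A timeout stands in for ANR detection.

```python
import subprocess
from typing import Callable, Sequence


def supervise(
    cmd: Sequence[str],
    is_complete: Callable[[], bool],
    timeout_s: float = 60.0,
    max_launches: int = 100,
) -> int:
    """Relaunch `cmd` until `is_complete()` is true; return the launch count."""
    launches = 0
    while not is_complete():
        if launches >= max_launches:
            raise RuntimeError("test run did not complete within the launch budget")
        try:
            # A non-zero exit code (crash) is expected, so do not raise on it.
            subprocess.run(cmd, timeout=timeout_s, check=False)
        except subprocess.TimeoutExpired:
            # run() kills the child on timeout; treat it as a hang and restart.
            pass
        launches += 1
    return launches
```

The launch budget prevents an endless restart loop when a case crashes the client before any progress can be recorded.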

Summary and Outlook

Function interface mine clearance has now been run on more than 10 major versions of the Alipay client, and nearly a thousand valid problems have been found across the two platforms. As the tool has iterated, its false positive rate has kept falling, and the mutation capability of Fuzz has been improved to cover more risks. The problems found by function interface mine clearance have also been added to the client attack and defense drills, where they serve as real crash scenarios for attacking the business side. Function interface mine clearance still has many shortcomings, such as intelligent mutation of fuzzed parameters and the construction of parameters with business semantics, and these need to be improved. In the future, we will continue to extend the client stability testing capability and strive to solve all problems before release.
