Introduction:

As more products move online, online systems grow larger and more complex, and the growing product volume demands ever-higher quality. Meeting these requirements means increasing test intensity, but the traditional approach of manually writing test cases and running automated regressions is too costly. In recent years, AI technology has played an increasingly important role in a growing number of fields. At Tencent, we have always been curious about new technologies, actively learning them and applying them to our daily work. The author of this article, Lin Junke, is a senior system test engineer in Tencent's Security Department, with 16 years of software testing experience and extensive research into applying AI technology to testing.

 

This article uses security products as its example, but the methodology is suitable for any deep bug hunting that involves many interacting factors. The graph below shows a typical traffic-attack protection flow: hackers attack a business server over the Internet; traffic-detection equipment identifies the attack; protection starts automatically and diverts the traffic destined for the attacked IP to the defense equipment; after the defense equipment cleans the traffic, the normal traffic is forwarded back to the business server.

 

01 Pain-point analysis for security product testing

 

Characteristics of security protection products:

1. Black-market attack methods are diverse and evolve quickly. The product must respond rapidly to new attack methods seen on the live network without mistakenly blocking normal users, so it carries a very large number of protection policies. The table below shows the number of configuration items in the main policy files; they add up to two or three hundred and are growing rapidly. Each iteration adds many new configuration items, and the processing logic is very complicated.

Main policy file        Number of policy items
anti_*******.conf       50
anti_*******.conf       147
********.conf           11
********.conf           11
********.conf           10

Even with careful development, it is virtually impossible to ensure that every function is highly cohesive and loosely coupled, and sometimes otherwise unrelated configuration items inevitably interact. If switches that should not affect each other do, protection may become uncontrollable after a policy change. In one real failure of this kind, a configuration item protecting UDP traffic affected HTTPS traffic, even though the two configurations had no relationship. We therefore need to test the stability and reliability of product functions under many combinations of policies.

 

2. For any specific flow, the vast majority of the traffic is ultimately handled by one specific protection module. This property can be used to simplify the model: we can focus the modeling on the main features and ignore the other protection details for now.

 

To handle this kind of parameter-combination problem, the industry mainly uses the all-pairs (pairwise) algorithm. The resulting test set covers every value combination of any two variables with a near-minimal number of combinations; in theory, this set of cases should expose any defect caused by the interaction of two variables. Although the algorithm generates few combinations, adding a new parameter produces a completely new set of combinations unrelated to the previous one. So when the parameter count is small, we often use it to reduce the number of cases while keeping good test coverage. But once there are many parameters, the process becomes very painful: each addition regenerates all combinations, and all expected results must be recalculated each time. It does not solve the problem of "hundreds of switches, in different configurations, governing how specific traffic is protected."
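To make the all-pairs idea concrete, here is a minimal greedy pairwise generator in Python. It is only an illustrative sketch, not the Pairwise tool used in this project, and the parameter names are invented:

```python
from itertools import combinations, product

def pairwise_suite(params):
    """Greedy all-pairs generator: pick full combinations until every
    value pair of every two parameters is covered at least once."""
    names = list(params)
    uncovered = set()
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        for va, vb in product(params[a], params[b]):
            uncovered.add((i, j, va, vb))
    # Candidate pool: the full cartesian product (fine at toy scale).
    candidates = [dict(zip(names, combo))
                  for combo in product(*params.values())]
    suite = []
    while uncovered:
        gain = lambda c: sum(1 for (i, j, va, vb) in uncovered
                             if c[names[i]] == va and c[names[j]] == vb)
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break
        suite.append(best)
        uncovered = {p for p in uncovered
                     if not (best[names[p[0]]] == p[2]
                             and best[names[p[1]]] == p[3])}
    return suite

# Three binary switches: the full product has 8 combinations,
# but all 12 value pairs are covered by far fewer cases.
suite = pairwise_suite({"A": [0, 1], "B": [0, 1], "C": [0, 1]})
```

The greedy loop keeps picking the combination that covers the most still-uncovered pairs, which is why the suite stays much smaller than the full cartesian product.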

 

Our project team currently adds use cases manually and executes them automatically. Under this approach it is difficult to keep the suite all-pairs-complete every time a configuration item is added. For example, suppose the existing cases are already pairwise-complete and a new configuration item that can only take the values 0 and 1 is added. To pair it with every other parameter, the item must be added on top of all the original cases, each run once with value 0 and once with value 1, so the case count balloons with every new item. If instead the all-pairs set is regenerated from scratch each time, the 150 configuration switches produce only about 130 combinations, while the full space of combinations grows to as many as 2^150.

 

02 How the industry automates use-case generation

 

Is there an industry solution that can generate fewer combinations without manually redoing the expected results? The answer is yes: UML modeling. The model is updated and maintained alongside the version under test, and new use cases are generated for each test run. The core value of this technique is automated use-case generation, maximizing functional coverage with a minimal number of cases, so versions can be tested faster and more thoroughly. Its disadvantages are that the model is complex to maintain, design flaws are hard to detect (the cases are just mechanical traversals), and the cases are not designed from the user's perspective.

 

 

03 How AI is applied in front-end page testing

 

In recent years, AI technology has developed very rapidly, and AI shares a trait with UML: a love of building models. So can AI bypass complex modeling, plan the use cases holistically so that a minimal number of cases achieves maximum coverage, and also avoid calculating the expected results manually?

 

To explore the application of new technology in the testing field, I quickly got up to speed on AI basics, and further study showed that AI's future in testing had already arrived. A number of tools in the industry already use AI for automated testing, even designing use cases automatically. For front-end pages there are even tools claiming that, given a URL, testers can sit back and wait for the results: Eggplant, Appvance IQ, Sauce Labs, and so on.

 

Through analysis, we found that these technologies mainly use AI computer-vision techniques to identify all the buttons on a page, build a traversal tree from the buttons on each page, and then automatically traverse the possible paths (user journeys) according to the tree, thereby achieving automatic case design and automatic testing.

   

Colleagues at Tencent previously published a book called AI Automated Testing, which details AI testing on graphics and data in games.

The industry's existing technologies are excellent, but they mainly target front-end page testing; there is no corresponding technology for backend testing. So we began to study how to apply AI to backend testing. After multiple attempts, and taking AI's strengths into account, we arrived at a bold idea: without human involvement a machine cannot understand business logic designed by humans, and UML-style modeling is too heavy, but AI is very good at processing and classifying data. If it cannot calculate the expected results, why calculate them at all? The test suite only records how each flow is handled; the AI then classifies the flows and protection results, the typical configurations are analyzed per category, and finally a human reviews whether the flow disposition under each typical configuration is reasonable.

 

04 Exploring the application of AI in backend testing

 

Based on these ideas, we quickly developed an implementation plan. Our goal: improve the coverage of multi-factor combinations at the lowest cost and dig out deep bugs. The scheme rests on two pillars working together: test theory, to cover the most scenarios with the fewest cases, and AI, to categorize the results and gain insight into the responses in each scenario.

The implementation steps are as follows:

Step1: every time a new configuration item is added, regenerate the configurations with the all-pairs algorithm.

Step2: send typical attack traffic under each configuration and record the defense method used by the system under test.

Step3: use AI to analyze the associations between protections and configurations, finding the configuration items most important to each defense mode.

Step4: check whether the N configurations most relevant to each protection mode match the expected design.

 

The first part, generating the all-pairs combinations based on test theory, is very simple; it took me half a day to implement. To combine configuration items across multiple configuration files, I named each item as "configuration item name@file name", generated the combinations with a Pairwise tool, and then used a script to convert them back into configuration files.
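The "item@file" naming step can be sketched as follows; this is a minimal illustration, and the file and item names are hypothetical:

```python
def flatten_configs(files):
    """Merge configuration items from several policy files into one
    parameter dict, naming each as '<item>@<file>' so items with the
    same name in different files stay distinct."""
    params = {}
    for fname, items in files.items():
        for item, values in items.items():
            params[f"{item}@{fname}"] = values
    return params

# Two hypothetical policy files sharing an item name.
params = flatten_configs({
    "anti_a.conf": {"drop_os": [0, 1]},
    "anti_b.conf": {"drop_os": [0, 1], "limit": [0, 1, 2]},
})
```

The flattened dict can then be fed directly to a pairwise generator, and a reverse script splits each chosen combination back into per-file configuration entries.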

 

A total of 250 combinations were generated by the all-pairs algorithm. We selected traffic with 27 typical characteristics and sent 'GET', 'POST', 'PUT', 'DELETE', 'HEAD', 'OPTIONS', 'TRACE' and 'CONNECT' requests for each, giving 27 × 8 = 216 traffic types. The 216 flows were sent under each of the 250 configurations and the protection techniques recorded, yielding protection records for 250 × 216 = 54,000 scenarios. Each record has three parts: the first part is the configuration item combination, the second is the name of the traffic sent, and the last column is the protection method used by the system under test.
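A minimal sketch of that three-part record layout, written as CSV; all column names, traffic names, and protection labels below are hypothetical:

```python
import csv
import io

# Each row: configuration values, then the traffic name, then the
# protection method observed on the system under test.
header = ["drop_os@anti_a.conf", "limit@anti_b.conf",
          "traffic", "protection"]
records = [
    ([0, 1], "GET_linux", "forward"),
    ([1, 1], "GET_linux", "dropos_x"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
for config, traffic, protection in records:
    writer.writerow(config + [traffic, protection])
csv_text = buf.getvalue()
```

A flat table like this is exactly what classification components consume: configuration columns as features, the last column as the label.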

 

The data was ready to hand over to the AI, but the team had only test experts, no AI experts. We consulted an AI expert at Tencent, who considered our requirements feasible, but the concrete implementation still puzzled us, because AI knowledge differs so much from testing knowledge that learning it from scratch felt like reading a foreign book.

However, as long as you are willing to think and keep learning, there are always more methods than difficulties. I found a data-mining tool with a gentle learning curve, and after repeated study and practice I concluded that its components could be applied to our plan. My model is as follows:

PCA, whose full name is principal component analysis, can help us find the N configuration items that most strongly influence the result. The ranking of configuration items by influence is a one-dimensional list, while the switch-processing order in the development design is definitely a mesh, so the ranking is useful as a reference. The classification tree quantifies the impact of each configuration item; personally, I think the information output by this component is the most valuable.
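The idea of "which configuration item is most relevant to a result" can be illustrated with an information-gain calculation, the quantity classification trees commonly use to choose split columns. This is a toy sketch with made-up data, not the actual tool's internals:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(rows, labels, col):
    """How much knowing config column `col` reduces uncertainty about
    the recorded protection result; higher means more relevant."""
    n = len(labels)
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[col], []).append(lab)
    return entropy(labels) - sum(len(g) / n * entropy(g)
                                 for g in by_value.values())

# Hypothetical records: 'drop_os' fully determines the protection,
# 'unrelated' carries no information about it.
rows = [{"drop_os": 0, "unrelated": 0}, {"drop_os": 0, "unrelated": 1},
        {"drop_os": 1, "unrelated": 0}, {"drop_os": 1, "unrelated": 1}]
labels = ["forward", "forward", "dropos_x", "dropos_x"]
```

Ranking columns by this score singles out the configuration items that actually drive the protection decision, which is the behavior the RANK and tree components exploit.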

 

The PCA analysis produced the result shown. In our case the curve is fairly smooth, indicating that there is no single dominant configuration item.

The ordering of configuration-item influence produced by the AI's RANK component can be roughly matched against the development design's flow chart, giving preliminary confirmation that the scheme is reliable.

The following figure shows how the classification tree categorizes the run results:

Let's take a typical example to illustrate how to find problems under the AI's guidance. After processing the data, the AI produced a large classification-tree graph, marking each result in the data with a color. As shown in the figure, the yellow, purple, white, and green areas display the data related to the four results. The root node in the yellow area indicates 74 data rows whose defense method is dropos_***********.

The most relevant configuration item is drop_*******@anti_*****.conf.

The left leaf node represents: when drop_*******@anti_*****.conf is configured for android, ios, or linux, the defense method is dropos_***********.

The right leaf node represents: when drop_*******@anti_*****.conf is set to 0, the defense method is ************************_trans.

 

Combined with the defense logic of the system under test, I could see there is a real problem here. This feature drops traffic that matches a particular OS fingerprint. Because I only sent Linux traffic when running the cases, traffic should normally be dropped only when linux is in the configuration; instead, traffic was dropped whenever drop_*******@anti_*****.conf was configured for android, ios, win, or linux. In other words, the OS identification is inaccurate. Let's note this point here.

 

The configuration item in the box below is the secondary configuration item related to the result. We continue observing its leaf nodes, paying special attention to the ratio at each leaf. In this example, when the configuration item takes different values, the ratios are close and the result bias is obvious, which is a signal of low coupling.

 

Opening the original table based on the information displayed in the classification tree, hiding the irrelevant columns, and putting the associated configuration items together makes the problem visible.
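That table-inspection step can be mimicked by projecting the recorded rows onto the columns the tree flagged and keeping only the unexpected dispositions. A minimal sketch with hypothetical names and data:

```python
def suspicious_rows(records, key_cols, expected, result_col="protection"):
    """Hide irrelevant columns and keep only the rows whose protection
    method differs from the expectation, for manual review."""
    return [{c: r[c] for c in key_cols + [result_col]}
            for r in records if r[result_col] != expected]

# Hypothetical recorded scenarios: the first row was dropped even
# though its traffic was expected to be forwarded.
records = [
    {"drop_os": "linux", "limit": 0,
     "traffic": "GET_linux", "protection": "dropos_x"},
    {"drop_os": "0", "limit": 1,
     "traffic": "GET_linux", "protection": "forward"},
]
bad = suspicious_rows(records, ["drop_os"], expected="forward")
```

The surviving rows point straight at the configurations to reproduce in the test environment.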

Locate the corresponding configuration from the line number of the faulty scenario and reproduce the problem in the environment. The configuration is as follows:

Expected: the traffic is sent on Linux and does not match the policy, so it should be forwarded. Measured: the traffic was dropped as os_**********:

This example shows that, under the AI's guidance, we found that the OS fingerprint function can indeed misidentify traffic in certain scenarios, and it proves that the AI-driven data-analysis method is reliable. I think AI's core value for testing lies in visualizing complex data to make analysis easier.

 

To sum up, this method addresses the pain point that "deep bugs caused by the coupling of multiple parameters are few but real, and the parameter-combination testing needed to find them is very costly." The coupling between multiple factors was verified at a small cost: test cases for 54,000 scenarios were generated automatically and executed in 3.5 days, and after analyzing the results with AI we confirmed 2 bugs with the developers. Writing those 54,000 scenarios manually, at the current rate of 30 use cases per person per day, would take 4.9 years. With this method the combinations are generated in only a few minutes and executed in 3.5 days, and at present the full analysis completes in about 10 days, which greatly improves test efficiency.

 

About Tencent WeTest

Tencent WeTest is a one-stop quality open platform officially launched by Tencent. With more than ten years of quality-management experience, it is committed to building quality standards and improving product quality. Tencent WeTest provides mobile developers with excellent R&D tools such as compatibility testing, cloud real devices, performance testing, and security protection, and offers solutions for more than 100 industries, covering testing needs across all stages of product development and operation. It has been proven by thousands of products. A gold expert team guarantees your product quality across 5 dimensions and 41 indicators, 360 degrees.

 

Follow Tencent WeTest for more practical testing knowledge

WeTest, Tencent's quality open platform – focused on improving the quality of games