This article is the full text of the talk “ByteDance Server-Side Unit Testing: ATG and SmartUnit Exploration and Practice”, given by Liu Guancheng of the ByteDance Quality Lab team at the MTSC China Test and Development Conference. The complete slides and the QR code of the technical exchange group are shared in the “MTSC” comment section.
Unit testing is an important part of R&D quality assurance. However, the high cost of writing unit tests and developers' low motivation to write them keep unit tests from reaching their full effectiveness. The industry has many good solutions for building unit tests efficiently, but few of them comprehensively solve the problem of intelligently generating unit test code. To this end, ByteDance Quality Lab worked through a number of difficulties, such as:
[1] How to understand the semantics of a code project;
[2] How to automate code generation;
[3] How to generate test cases that cover more code branches;
[4] How to make the most of code metadata.
In the end, we built the intelligent unit test product [SmartUnit], which automatically produces unit test cases and runs them as regression tests, with an average coverage of about 35% and accurate assertions. In this article, we share our practical experience with ATG (automatic test generation) for unit tests.
Project background
According to statistics, most errors are introduced in the early stage of a project, and the cost of correcting them rises as the project iterates. The defining feature of unit testing is that it limits the scope and state under test, so problems are found faster and more precisely. Because it works on modular units, it copes well with project changes and keeps an accumulation of bugs from driving a project out of control. It is a very cost-effective quality assurance tool.
Unit tests have many benefits but are expensive to write. To keep up with the schedule and save time, early development often skips unit tests, and the work of building them slips to the middle and late stages of the project, by which time users have already hit the bugs and testers have already stepped into the pitfalls. The project development cycle and the unit test quality assurance cycle are then mismatched, and much of the protection that unit tests should provide is wasted.
Project objectives
Unit Testing difficulties
As the analysis above shows, the dilemma of unit testing is that it is highly effective but costly. The cost of writing unit tests comes down mainly to the following aspects:
Dimension | Difficulty |
---|---|
Business knowledge | There is a barrier to entry: developers must understand the relevant business chain and the role of the target function. Only then can they write meaningful unit tests that validate the business logic; otherwise the tests may fail to detect business exceptions. |
Labor cost | Very labor-intensive: developers have to pause business development and write a large amount of code around business functions to verify their behavior. |
Quality standard | Output is hard to quantify: the results of writing unit tests are less visible than those of business development, the output is hard to measure, and the work is not easily recognized. |
SmartUnit breakthrough point
To address these problems, SmartUnit's approach can be summarized in three words: intelligent, integrated, standardized. SmartUnit tackles labor cost through intelligence, automating the writing of unit test code. We embed SmartUnit into the acceptance process (such as a CI pipeline), so that the automatically written unit tests are automatically executed in the acceptance environment and function-level regression verification happens without intruding on the project. This lets unit tests take effect early in the project.
In addition, SmartUnit standardizes unit testing by defining criteria such as line coverage, branch coverage, and branch distance.
To sum up, intelligence frees up labor, integration lets unit tests take effect quickly, and standardization solves the measurement problem. Together they address the main obstacles to applying unit tests.
Project objectives
Currently, the unit test code that SmartUnit generates automatically reaches an average coverage of around 35%, and by combining static code analysis with runtime capture it can detect 1 or 2 bugs and catch 8 panics for a project.
The goal of SmartUnit is to automatically generate unit tests with 40% to 60% coverage, to build a fully automated code quality assurance system, and to explore further uses of the automatically generated unit tests, such as automatic bug fixing and verification and data flow tracking analysis.
Design scheme
Use case generation analysis
With the project background and entry point introduced, the design concept of SmartUnit is clear: an intelligent closed loop. The overall workflow of SmartUnit is to analyze the code repository, construct test cases, and finally select the best test cases through runtime analysis. The following examples are based on Go.
For example, SSA/AST analysis (static single assignment form and abstract syntax tree) parses the code structure of the function under test. The result is combined with a preset test template to build unit test code for the function under test that lacks only its input. GA (genetic algorithm) and fuzzing techniques then construct a large number of test case inputs, and instrumentation is used to screen out the final use cases.
Engineering module
In order to realize these processes, the engineering path is mainly divided into three modules:
Module | Key points |
---|---|
Code generation | Similar to the GoTests tool, there is a code template that generates the basic framework of the test functions. SmartUnit must also make assertions in the unit tests it constructs, so that they can serve as regression verification. In addition, to handle external dependencies such as network calls, downstream dependencies need to be mocked. |
Data generation | To construct enough inputs and outputs for the function under test, the raw corpus extracted from the code is analyzed and more test inputs are generated from it by mutation and combination. |
Run analysis | At run time, the generated unit test data is filtered: first the cases are compiled and orchestrated, then test cases that do not run are discarded, and finally the remaining use cases are rerun. |
The project framework
To build such a project, the architecture places the main code module, the storage layer, and the file execution module side by side. Following the execution flow, the main module is divided into a data parsing flow, a code assembly flow, and a use case screening flow.
A working scenario makes this easier to follow. When generating a unit test, the [Data Extraction module] produces a corpus from historical unit tests, and the assignment module converts this corpus into real function input parameters. The [Test Case Production module] then combines the mutated input parameters with the code template into a runnable test case. Finally, the [Test Case Screening module] selects the final test cases.
Instrumentation principle
AST-based code instrumentation transforms the code into a tree structure, which allows structured access to code snippets and the insertion of code at specific locations. Go's official test coverage tool likewise instruments the original source code to obtain the coverage of the executed use cases.
Take the following original code block as an example:
    func sum(a, b int) bool {
        if a+b > 80 {
            return true
        }
        return false
    }
【 Original function 】
    func sum(a, b int) (result bool) {
        branchVector := map[string]int{}
        hitMap := [3]uint32{}
        // the closure reads hitMap and result when sum returns, not when defer is declared
        defer func() { Save(branchVector, hitMap, result) }()
        hitMap[0]++
        if a+b > 80 {
            branchVector["branch-1"]++
            hitMap[2]++
            return true
        }
        hitMap[1]++
        ....
[Function after instrumentation]
After instrumentation, the branchVector and hitMap probe code is embedded in the function. When an input enters a branch, the corresponding probe updates the map key and the hit counter, recording that execution entered that code block. In general, this approach, similar to that of the official tool, lets SmartUnit know how each input runs inside the code, and we evaluate the test case against that state.
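To make the recorded state concrete, here is a minimal runnable sketch of the same idea. The `Trace` type, the `traces` slice, and the `save` helper are hypothetical stand-ins for whatever the real Save hook does; only the probe layout follows the example above.

```go
package main

import "fmt"

// Trace is a hypothetical record of one execution of the instrumented function.
type Trace struct {
	Branches map[string]int // named branches that were entered
	Hits     [3]uint32      // per-block hit counters
	Result   bool           // value returned by the function
}

// traces collects one Trace per call; a stand-in for the real Save hook.
var traces []Trace

func save(branches map[string]int, hits [3]uint32, result bool) {
	traces = append(traces, Trace{Branches: branches, Hits: hits, Result: result})
}

// sum is the example function with probes inserted by hand.
func sum(a, b int) (result bool) {
	branchVector := map[string]int{}
	hitMap := [3]uint32{}
	// The closure runs at return time, so it sees the final counters and result.
	defer func() { save(branchVector, hitMap, result) }()
	hitMap[0]++
	if a+b > 80 {
		branchVector["branch-1"]++
		hitMap[2]++
		return true
	}
	hitMap[1]++
	return false
}

func main() {
	sum(50, 50)
	sum(1, 2)
	for _, tr := range traces {
		fmt.Println(tr.Result, tr.Hits, tr.Branches)
	}
}
```

Comparing the traces of two inputs shows directly whether the second input reached any block the first one missed.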
Function analysis
The following use case generation process explains how SmartUnit analyzes a function. The following figure shows an anti-addiction verification function. If the user's nationality is “CN”, the function needs to query the user's age and run the anti-addiction check; otherwise, no check is needed.
For users who need the anti-addiction check:
- If the user's age is between 18 and 120, the user is certified as an adult and passes the anti-addiction check.
- If the user is younger than 18, the user is certified as a minor and does not pass the anti-addiction check.
- If the user's age is less than 0 or more than 120, the age is invalid.
How does SmartUnit generate test cases for this function?
SmartUnit scans the AST and finds the function Nationality(string) string in the code under test, which is used to look up the user's nationality. The call to Nationality is on the left side of the if expression; on the right side is the constant “CN”, the country code of China. This constant and the constants in conditions such as “age > 18” and “age < 120” are recognized and extracted by SmartUnit to construct the corresponding return values and inputs.
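A minimal sketch of this kind of scan, using only the standard `go/parser` and `go/ast` packages. The source string and the helper name `ExtractLiterals` are hypothetical; the sketch collects literals that appear on either side of a comparison, which is one simple way to find seed values and is not claimed to be SmartUnit's actual analysis.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a hypothetical function under test, kept as text so it can be parsed.
const src = `package demo

func check(uid string, age int) bool {
	if Nationality(uid) == "CN" {
		if age > 18 && age < 120 {
			return true
		}
	}
	return false
}
`

// ExtractLiterals returns the literal operands of comparison expressions.
func ExtractLiterals(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		return nil, err
	}
	var lits []string
	ast.Inspect(file, func(n ast.Node) bool {
		be, ok := n.(*ast.BinaryExpr)
		if !ok {
			return true
		}
		switch be.Op {
		case token.EQL, token.NEQ, token.LSS, token.GTR, token.LEQ, token.GEQ:
			for _, side := range []ast.Expr{be.X, be.Y} {
				if lit, ok := side.(*ast.BasicLit); ok {
					lits = append(lits, lit.Value)
				}
			}
		}
		return true
	})
	return lits, nil
}

func main() {
	lits, err := ExtractLiterals(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(lits)
}
```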
Unit test snippets built by SmartUnit
Through a series of transformations, SmartUnit constructs the final test case from the analyzed data. In the final use case, a mock replaces the Nationality function and sets its return value to “CN”, which gives a non-invasive mock. For details, see Golang Monkey. The parameter “age” is mutated to the left and right of the boundary values [18] and [120], to explore as many code branches as possible.
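The two ideas in this paragraph can be sketched with plain Go. Note the substitution: instead of run-time monkey patching, the sketch replaces a function variable, which is a simpler technique that only works when the dependency is declared as a variable. The names `nationality`, `isAdultCN`, and `BoundaryInputs` are hypothetical.

```go
package main

import "fmt"

// nationality is a hypothetical downstream dependency held in a variable,
// so a test can swap it without a patching library.
var nationality = func(userID string) string { return "" }

// isAdultCN is a hypothetical function under test.
func isAdultCN(userID string, age int) bool {
	return nationality(userID) == "CN" && age >= 18 && age <= 120
}

// BoundaryInputs returns each boundary together with its two neighbours,
// without duplicates and in input order.
func BoundaryInputs(boundaries []int) []int {
	seen := map[int]bool{}
	var out []int
	for _, b := range boundaries {
		for _, v := range []int{b - 1, b, b + 1} {
			if !seen[v] {
				seen[v] = true
				out = append(out, v)
			}
		}
	}
	return out
}

func main() {
	orig := nationality
	nationality = func(string) string { return "CN" } // stubbed return value
	defer func() { nationality = orig }()

	for _, age := range BoundaryInputs([]int{18, 120}) {
		fmt.Println(age, isAdultCN("u1", age))
	}
}
```

Values on both sides of each boundary are the cheapest inputs that can flip a comparison, which is why they make good starting points before any random mutation.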
Accurate assertion
Above all, a unit test asserts results. For SmartUnit to make precise assertions, there is a “contract”: when you connect a repository to SmartUnit, you select a branch as the base branch, such as the master branch, and all functions on the base branch are assumed to behave correctly (their results become the expected values of the assertions).
SmartUnit generates intermediate test code on the base branch and executes it, then uses the results of that execution as assertions when generating the actual test cases. SmartUnit's assertions therefore check whether a function's result is the same as on the base branch.
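The contract can be illustrated in a few lines. In this sketch, `Case`, `RecordBaseline`, and `Regressions` are hypothetical names, and two closures stand in for the same function on the base branch and on a changed branch.

```go
package main

import "fmt"

// Case pairs an input with the result recorded on the base branch.
type Case struct {
	A, B int
	Want bool
}

// RecordBaseline runs the base-branch implementation and stores its results
// as the expected values of future assertions.
func RecordBaseline(base func(a, b int) bool, inputs [][2]int) []Case {
	cases := make([]Case, 0, len(inputs))
	for _, in := range inputs {
		cases = append(cases, Case{A: in[0], B: in[1], Want: base(in[0], in[1])})
	}
	return cases
}

// Regressions returns the cases where the candidate disagrees with the baseline.
func Regressions(candidate func(a, b int) bool, cases []Case) []Case {
	var failed []Case
	for _, c := range cases {
		if candidate(c.A, c.B) != c.Want {
			failed = append(failed, c)
		}
	}
	return failed
}

func main() {
	base := func(a, b int) bool { return a+b > 80 }
	changed := func(a, b int) bool { return a+b >= 80 } // behaviour change on another branch
	cases := RecordBaseline(base, [][2]int{{1, 2}, {40, 40}, {50, 50}})
	fmt.Println(Regressions(changed, cases)) // prints [{40 40 false}]
}
```

The assertion says nothing about whether the base behaviour is right in an absolute sense; it only flags differences, which is exactly what the contract promises.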
Break through the difficulties
Genetic algorithm practice
SmartUnit uses a genetic algorithm to create and filter use cases, with two focal points: the mutation strategy and the screening strategy. The mutation strategy creates more test cases, while the fitness-based screening strategy selects the best sample population of use cases, from which the mutation strategy generates descendants. The two strategies both counteract and complement each other, helping SmartUnit find the set of use cases with the highest coverage and the most branches.
- Mutation strategy
The mutation strategy has two key points: how to select the initial values for mutation, and how to mutate them.
The selection of initial values is shown below. SmartUnit uses a number of analysis tools to mine information about the function itself, primarily literal expressions in the AST. In the anti-addiction example above, mining the “CN” constant from the existing expression greatly reduces the number of mutations needed to obtain effective input.
The mutation rules are easiest to understand by data type:
Data type | Mutation strategy | Description |
---|---|---|
int, float | Gaussian distribution | Numeric values are mutated to the left and right of the current value according to a Gaussian distribution. |
bool | Associated state inversion | A bool can only be true or false, but the code may combine many conditions. It is hard to hit every if/else branch of a multi-condition check by mutating each bool independently, so we record the group of bools involved in a branch and explore the combinations that have not yet been reached. |
string | Corpus padding + string-specific mutation | For string values, a matching corpus pool is selected based on the parameter name. For example, if the input parameter is named IP, the string is mutated in the XXX.XXX.XXX.XXX format. |
function | Construction via reflection | Inputs and outputs are parsed through reflection, and the functions are constructed at run time. |
array | Compound mutation + array-specific mutation | Code that handles arrays usually contains traversal statements and error handling for arrays whose length does not meet requirements, so the array length is mutated, and each element is mutated according to its type. |
- Use case screening
How does the fitness function screen use cases? As described in the instrumentation discussion above, instrumentation lets SmartUnit observe how a test case runs and judge whether it is a good one. From the instrumentation data, SmartUnit computes the following metrics to evaluate a use case.
Evaluation criterion | Role |
---|---|
Line coverage | Guided by the number of covered lines. Although line coverage is a common standard for unit testing, it is not a complete measure of test coverage. A check for a bad condition may be only one or two lines long yet very important; a test case covering that branch contributes little line coverage but still matters. |
Branch coverage | Evaluates test cases in the direction of covering more branches. |
Branch distance | Selects use cases that are more likely to cover more branches after parameter mutation. |
Branch distance details
As mentioned above, the criteria for evaluating a unit test case are line coverage, branch coverage, and branch distance. Branch coverage and branch distance both steer the search toward covering more branches, but branch distance is a “smoother” measure than branch coverage. Among test cases with the same branch coverage, it further selects those that are more likely to cover additional branches after parameter mutation. This is a core part of the genetic algorithm.
In the genetic algorithm, data is generated by mutation. Take a condition from the anti-addiction function above, if age > 18, to understand branch distance: when age > 18, the corresponding code block is executed; when age <= 18, another code block is hit. When age == 18, the branch distance is 0 according to the branch distance formula. Because age mutates smoothly according to a Gaussian distribution, a use case at this boundary mutates into use cases for both [age > 18] and [age <= 18] more easily than other use cases do. The smaller the branch distance, the more likely a mutation produces inputs that hit different blocks of code.
[Fitness of use case set T on a branch]
[Use case set T fitness function]
Other data types are handled similarly. For data whose branch distance is not 0, we always apply the constant K after calculating the branch distance, to distinguish an infinitesimal branch distance from 0, because the two differ significantly in terms of branch coverage.
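Since the formulas themselves are not reproduced here, the following is only one possible reading of the description, stated as an assumption: the raw distance for a comparison against a threshold is the absolute difference, and the constant K is added whenever that difference is non-zero. The names `BranchDistance` and `Closest` and the value of `K` are hypothetical.

```go
package main

import "fmt"

// K separates any non-zero distance from an exact boundary hit.
// The value 1 is an assumption for illustration.
const K = 1.0

// BranchDistance scores how far value is from the boundary of a comparison
// such as "value > threshold": 0 on the boundary, |value-threshold|+K elsewhere.
func BranchDistance(value, threshold int) float64 {
	d := value - threshold
	if d < 0 {
		d = -d
	}
	if d == 0 {
		return 0
	}
	return float64(d) + K
}

// Closest returns the candidate with the smallest branch distance,
// that is, the one most likely to flip the branch after a small mutation.
// It assumes candidates is non-empty.
func Closest(candidates []int, threshold int) int {
	best := candidates[0]
	for _, c := range candidates[1:] {
		if BranchDistance(c, threshold) < BranchDistance(best, threshold) {
			best = c
		}
	}
	return best
}

func main() {
	ages := []int{3, 40, 19, 75}
	for _, age := range ages {
		fmt.Println(age, BranchDistance(age, 18))
	}
	fmt.Println("closest:", Closest(ages, 18))
}
```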
The whole process
SmartUnit production process
In the SmartUnit production process, the analyzer extracts the call graph of the target function and the data corpus (the initial values for input mutation). The call graph determines which downstream calls of the function under test need to be mocked. The input and output parameters of the function under test are also resolved, including those required by the mocked functions, and the algorithm generates suitable data to fill them in.
By combining the call graph and data corpus with code templates, SmartUnit generates executable test cases. This initial set of use cases then enters an iterative loop driven by the genetic algorithm, which searches for better use cases. The GA keeps iterating and “converges” good test cases into the final set.
After the final generation of the GA is obtained, use cases that fail to compile or that panic still need to be removed to obtain the final set of runnable test cases.
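The panic half of this filter can be sketched with `recover`; detecting compile failures needs the toolchain and is left out. The names `RunsCleanly` and `FilterRunnable` are hypothetical, and the cases are plain closures standing in for generated tests.

```go
package main

import (
	"fmt"
	"sort"
)

// RunsCleanly reports whether the case finishes without panicking.
func RunsCleanly(tc func()) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			ok = false // the case panicked, so it is not kept
		}
	}()
	tc()
	return true
}

// FilterRunnable returns the names of the cases that do not panic, sorted.
func FilterRunnable(cases map[string]func()) []string {
	var kept []string
	for name, tc := range cases {
		if RunsCleanly(tc) {
			kept = append(kept, name)
		}
	}
	sort.Strings(kept)
	return kept
}

func main() {
	cases := map[string]func(){
		"ok": func() {},
		"nil-map-write": func() {
			var m map[string]int
			m["x"] = 1 // writing to a nil map panics at run time
		},
	}
	fmt.Println(FilterRunnable(cases))
}
```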
The SmartUnit consumption process
After the unit test production process is complete, a quality gate acts on the code submission process. With automatic unit test generation and precise assertions, SmartUnit provides a powerful function-level regression verification system, which places it at the very front of the quality assurance process.
First, SmartUnit generates a full set of test cases for the master branch of the repository and then maintains them through diff-based updates. When a developer wants to merge into the master branch, the SmartUnit test cases are pulled and executed, and the results are checked. If the execution results are abnormal (assertion failure or panic), there is an unexpected difference between the branch being merged and the master branch.
For functions whose code structure has changed, we analyze them before running and remove the test cases that can no longer run, to prevent compilation failures of unit test cases caused by large version changes.
Project showcase
SmartUnit visual interface
SmartUnit pulls the remotely and independently maintained unit tests at the MR pipeline gate, executes them, and presents the results in a report view. The report shows the overall coverage of SmartUnit's unit tests on the project, whether each test case passed, and the execution details of each unit test.
Click a unit test case to see its coverage details. Yellow lines mark code covered by SmartUnit's unit tests and blue lines mark code covered by developers' hand-written use cases, showing that SmartUnit can bring a substantial coverage improvement to a project.
For a failed unit test, you can click into the details to check the reason. In this case, we found that a function in one repository did not guard against concurrent operations, which led to a data race on a map, and the developers had not written test cases for that function. After the repository was connected to SmartUnit, the problem was discovered and the developers fixed it.
Project summary
Currently, nearly 10,000 code repositories are connected to SmartUnit, serving Feishu, Douyin, Toutiao and other business lines. According to statistics from one business line, SmartUnit regression detection intercepted anomalies in 1076 MRs, 21.06% (1076/5110) of all MRs submitted, and the MR repair rate was 11.04% (133/1076). By November 20, SmartUnit had detected and reported 9825 bugs (including statically detected bugs), and the bug consumption rate was about 19.16% (1882/9825).
Join us and do something fun!
We are the Quality Lab team in Quality Engineering. We focus on putting cutting-edge technology into practice and provide the whole company with engineering efficiency products and infrastructure. The company attaches great importance to quality and efficiency, our team is growing, and we have mature products in R&D intelligence (such as SmartUnit, ByteQI, Fastbot), engineering efficiency (such as ByteFuzz, Fatal platform), and AIOps (such as ByHunter, SmartEye).
Our current projects are challenging, the team strongly encourages and supports technical innovation, and the atmosphere is lively and youthful. If you join us, you will have many opportunities each year to attend top industry conferences and exchanges, and we collaborate frequently with universities. Here you can expect to produce at least three patents a year and to take part in top-tier papers. There is plenty of room to develop, and growth is fast and never boring. Whether you are a backend engineer or an algorithm engineer, we look forward to having you join us: [email protected] [email protected].