Original link: tech.youzan.com/thanos/?utm…
Youzan's early business ran on a single PHP project. As the business grew, that project could no longer be scaled to meet performance demands, so the underlying services were split into microservices and moved to the Dubbo framework as a whole. Moving from a monolith to a distributed architecture also created several testing problems:
For most applications in a distributed system, the application's own code grows more complex as the business develops, so determining the impact of a code change accurately and comprehensively becomes increasingly important.
In domains where the business architecture is poorly designed, a change to one application's interface can affect many other applications. During testing, a change to the application's own code may significantly alter the logic of the interfaces it exposes, and the tester then has to determine how much each exposed interface affects the upstream applications.
Rapid business iteration keeps compressing test time, which makes full regression very hard. Developers and testers therefore have to narrow the test scope based on their familiarity with the code and the business, and the risk easily gets out of control.
Against this background, we developed a precision testing tool and integrated it into the testing tool platform. It is available to everyone in the technical department and serves as one of the reference dimensions for the release quality of an application.
Overall scheme design
The pain points above can be addressed in three steps: first, identify the modified code; second, analyze which of the application's interfaces are affected; third, determine the impact on upstream business callers. The design points are as follows:
Identifying changed code: the online code and the master code are parsed into abstract syntax trees. After noise is removed, comparing method bodies yields the added, modified, and deleted methods.
Analyzing the impact on the application's own exposed interfaces: a combination of dynamic and static analysis is used. Static analysis relies on bytecode analysis, with bridging to solve part of the polymorphism problem. Dynamic analysis weaves code with a Java agent, the same technique used by mainstream call chain tools. To avoid the performance degradation caused by heavy weaving, weaving is done only in the QA environment.
Querying links between applications: the company has long had an internal call chain system that shows call details between application interfaces in real time. Using this system, offline big data jobs (Spark or MR) collect and process the link information between all applications, so the affected upstream business scope can be obtained simply by querying the links. (PS: companies that do not have a call chain system in place can refer to SkyWalking, a mature open source tool, on GitHub: github.com/apache/skyw…)
The overall plan
Key modules
The key modules are code comparison, static analysis, dynamic analysis, combined dynamic and static analysis, and application impact analysis.
Code comparison design idea: impact analysis first needs to determine which methods have changed. A traditional Git/SVN diff treats added comments, whitespace characters, blank lines, and other non-business edits as code changes, although such edits have no effect on the business. Comparing the compiled class files avoids these false positives, but identifying the changed methods then means comparing the instructions in each method body and handling the various inner classes, which is difficult. To solve this, we use syntax tree analysis. The flow is as follows:
Code comparison scheme
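The classification into added, modified, and deleted methods can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes a parser has already extracted each method body into a map keyed by method signature, and it replaces the syntax tree comparison with a much cruder regex normalization. The class name `MethodDiff` and the map layout are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class MethodDiff {
    enum Change { ADDED, MODIFIED, DELETED }

    // Strips block comments, line comments, and all whitespace.
    // Simplification: a "//" inside a string literal is also stripped, and
    // removing whitespace can merge adjacent tokens. A syntax tree has
    // neither problem, which is why the article compares trees instead.
    static String normalize(String body) {
        return body
                .replaceAll("(?s)/\\*.*?\\*/", "")
                .replaceAll("//[^\\n]*", "")
                .replaceAll("\\s+", "");
    }

    // Both maps go from method signature to method body source:
    // online = code currently deployed, master = code about to be released.
    static Map<String, Change> diff(Map<String, String> online, Map<String, String> master) {
        Map<String, Change> changes = new TreeMap<>();
        for (Map.Entry<String, String> entry : master.entrySet()) {
            String oldBody = online.get(entry.getKey());
            if (oldBody == null) {
                changes.put(entry.getKey(), Change.ADDED);
            } else if (!normalize(oldBody).equals(normalize(entry.getValue()))) {
                changes.put(entry.getKey(), Change.MODIFIED);
            }
        }
        for (String signature : online.keySet()) {
            if (!master.containsKey(signature)) {
                changes.put(signature, Change.DELETED);
            }
        }
        return changes;
    }
}
```

A method whose only difference is an added comment or reformatted whitespace does not appear in the result, which is the noise removal the design calls for.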
Static analysis logic design ideas:
For Java code, bytecode analysis shows that a method is always called through one of five instructions: invokestatic, invokespecial, invokeinterface, invokevirtual, or invokedynamic. Scanning the invoke instructions in each method body yields a series of parent-child nodes of the intra-application call chain. Each interface the application exposes can be treated as the root node of an internal call chain, and the chain is generated by traversing all nodes reachable from that root. The main points are as follows:
Bytecode analysis: there are many bytecode manipulation tools, such as ASM, BCEL, and Javassist, and they are used in similar ways.
invokedynamic: the bytecode instruction only points to a bootstrap method, so the method that is really executed has to be resolved before the real call chain can be obtained.
To speed up processing and reduce post-processing, call nodes that are of no interest are removed, such as calls to third-party packages, JDK APIs, and get/set methods.
invokeinterface calls an interface method, but what actually runs is the code of a class implementing the interface. If the interface has only one implementation class, that class must be the one executed, so the interface method can be bridged to it.
Anonymous inner classes: compilation generates a class file with a name like A$1. The EnclosingMethod attribute in the class file identifies the class name and method name of the enclosing caller, which makes it possible to bridge the enclosing method to the methods of the anonymous inner class.
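The anonymous inner class bridge can be illustrated on a running JVM, where the EnclosingMethod attribute is exposed through the reflection call `Class.getEnclosingMethod()`. The sketch below is only an illustration of that attribute: the article's tool reads it from class files during static analysis, and the class name `EnclosingDemo` and the edge format are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Optional;

final class EnclosingDemo {
    // The anonymous Runnable is compiled into its own class file,
    // with a name like EnclosingDemo$1.
    static Runnable task() {
        return new Runnable() {
            @Override
            public void run() { }
        };
    }

    // Returns an edge "enclosing method -> anonymous class" when the object's
    // class was declared inside a method, and an empty Optional otherwise.
    static Optional<String> bridge(Object instance) {
        Class<?> cls = instance.getClass();
        Method enclosing = cls.getEnclosingMethod();
        if (enclosing == null) {
            return Optional.empty();
        }
        return Optional.of(enclosing.getDeclaringClass().getName()
                + "." + enclosing.getName() + " -> " + cls.getName());
    }
}
```

Calling `EnclosingDemo.bridge(EnclosingDemo.task())` yields an edge from `task` to the generated anonymous class, while an ordinary top-level class such as `String` yields no edge.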
Static solution
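Once the invoke instructions have been scanned, building a call chain is a graph traversal. The sketch below assumes the caller-to-callee edges and the interface-to-implementation pairs were already extracted by a bytecode tool; it shows only the traversal from a root node and the single-implementation bridge. `StaticCallGraph` and the plain string method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StaticCallGraph {
    // caller -> callees, as found by scanning invoke instructions
    private final Map<String, List<String>> callees = new HashMap<>();
    // interface method -> methods of the classes implementing it
    private final Map<String, List<String>> implementations = new HashMap<>();

    void addCall(String caller, String callee) {
        callees.computeIfAbsent(caller, k -> new ArrayList<>()).add(callee);
    }

    void addImplementation(String interfaceMethod, String implMethod) {
        implementations.computeIfAbsent(interfaceMethod, k -> new ArrayList<>()).add(implMethod);
    }

    // Direct callees plus the bridge target: an interface method is bridged
    // only when exactly one implementation exists, so the target is unambiguous.
    private List<String> targetsOf(String method) {
        List<String> targets = new ArrayList<>(
                callees.getOrDefault(method, Collections.<String>emptyList()));
        List<String> impls = implementations.get(method);
        if (impls != null && impls.size() == 1) {
            targets.add(impls.get(0));
        }
        return targets;
    }

    // Breadth-first traversal: every method reachable from the exposed
    // interface (the root) belongs to that interface's internal call chain.
    Set<String> reachableFrom(String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String method = queue.poll();
            if (seen.add(method)) {
                queue.addAll(targetsOf(method));
            }
        }
        return seen;
    }
}
```

With two or more implementations the bridge is skipped and the chain stops at the interface method, which is the gap that dynamic analysis is meant to fill.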
Dynamic analysis & dynamic and static combination:
AOP and polymorphism in the code cannot be handled well by static analysis, but dynamic analysis handles them well. A Java agent weaves code into internal methods, so that when automated or functional tests run, all internal methods that a request passes through are recorded. The resulting internal call chain records the methods actually executed under AOP and polymorphism, which largely compensates for the weakness of static analysis. The main points are as follows:
Performance: heavy weaving causes performance loss, so the agent first checks whether the current environment is the QA environment and weaves only there, leaving production unaffected.
Weaving range: only the company's own packages (Youzan.*) are woven; all get/set methods and private methods are excluded (a subclass cannot override a private method of its parent class). Excluding these greatly speeds up weaving and has no effect on the analysis.
From the start of a request until it returns, the whole call process can be viewed as a sequence of stack pushes and pops: calling a method pushes it onto the stack, and returning from the method pops it. When the stack is empty, the request has ended. The order of pushes and pops reflects the call logic of the code and therefore forms an internal call chain.
The dynamic plan
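The push and pop bookkeeping can be sketched as a per-thread recorder. This assumes the woven code calls `enter` at the start of each instrumented method and `exit` when it returns; the class `CallRecorder`, both method names, and the "parent -> child" edge format are hypothetical, and the weaving itself is not shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

final class CallRecorder {
    // One stack and one edge list per thread, so concurrent requests
    // handled by different threads do not mix their call chains.
    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);
    private static final ThreadLocal<List<String>> EDGES =
            ThreadLocal.withInitial(ArrayList::new);

    // Called on method entry: the method on top of the stack is the caller.
    static void enter(String method) {
        Deque<String> stack = STACK.get();
        String parent = stack.peek();
        if (parent != null) {
            EDGES.get().add(parent + " -> " + method);
        }
        stack.push(method);
    }

    // Called on method return. While the request is still running the result
    // is empty; when the stack becomes empty the request has ended and the
    // recorded edges are returned and cleared for the next request.
    // An exit without a matching enter throws NoSuchElementException.
    static Optional<List<String>> exit() {
        Deque<String> stack = STACK.get();
        stack.pop();
        if (!stack.isEmpty()) {
            return Optional.empty();
        }
        List<String> chain = new ArrayList<>(EDGES.get());
        EDGES.get().clear();
        return Optional.of(chain);
    }
}
```

A real agent would also have to call `exit` on exceptional returns, for example from a finally block, or the stack would never empty for a failed request.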
Dynamic and static combination:
Dynamic analysis suffers from insufficient samples, so its internal call chains cannot reflect every internal method call. Static analysis suffers from polymorphism and AOP, which leave isolated nodes that cannot be linked together. To cover as much of the affected range as possible and offset the weaknesses of both, dynamic and static analysis are combined. The main points are as follows:
Dynamic analysis and static analysis each produce a series of internal call chains. The nodes of these chains are split apart and recombined into internal call chains that contain both dynamic and static data.
The internal call chains that contain a new or modified method are matched by method name and parameter types. The root node of each matching chain is an externally exposed interface affected by the change.
Dynamic and static combination scheme
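Recombining the nodes and matching a changed method can be sketched as merging both edge sets into one reverse index and walking upward from the change point. This assumes each analysis delivers its chains as caller-to-callee edges and that a method is identified by name plus parameter types; `ImpactAnalyzer` and the key format are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ImpactAnalyzer {
    // Reverse index: callee -> callers, merged from every source added so far.
    private final Map<String, Set<String>> callersOf = new HashMap<>();

    // Accepts caller -> callees edges. Static and dynamic results are added
    // through the same method, which is what merges the two data sets.
    void addEdges(Map<String, List<String>> callerToCallees) {
        for (Map.Entry<String, List<String>> entry : callerToCallees.entrySet()) {
            for (String callee : entry.getValue()) {
                callersOf.computeIfAbsent(callee, k -> new HashSet<>()).add(entry.getKey());
            }
        }
    }

    // Walks upward from the changed method and collects every exposed
    // interface that can reach it.
    Set<String> affectedInterfaces(String changedMethod, Set<String> exposedInterfaces) {
        Set<String> affected = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(changedMethod);
        while (!pending.isEmpty()) {
            String method = pending.pop();
            if (!visited.add(method)) {
                continue;
            }
            if (exposedInterfaces.contains(method)) {
                affected.add(method);
            }
            pending.addAll(callersOf.getOrDefault(method, Collections.<String>emptySet()));
        }
        return affected;
    }
}
```

An edge that only one analysis can see, such as a call routed through an AOP proxy, connects chains that would otherwise stay isolated, so the merged graph can reach roots that neither source reaches alone.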
Application impact analysis: links between applications are collected with an SDK plus a Java agent. The overall scheme is similar to SkyWalking, which can serve as a reference for secondary development. This section describes only the offline analysis. Each application reports inter-application call chain data in batches, and the data a node reports for a request contains information about the upper-layer caller, the upper-layer caller's interface, and the current interface. (PS: because of occasional anomalies, link data reported in real time may be incomplete, so each call relationship tree must be checked for completeness before the offline statistics are stored in the database.)
Interprocess link
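The offline aggregation can be sketched in memory as follows. The article runs this step as Spark or MR jobs over reported call chain data; the sketch replaces that with a plain in-process index and assumes each report has been reduced to a pair of caller interface and called interface. `LinkIndex` and the `application#interface` naming are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LinkIndex {
    // called interface -> interfaces that call it directly
    private final Map<String, Set<String>> callersOf = new HashMap<>();

    // One reported link: callerInterface invoked calleeInterface.
    void report(String callerInterface, String calleeInterface) {
        callersOf.computeIfAbsent(calleeInterface, k -> new TreeSet<>()).add(callerInterface);
    }

    Set<String> directCallers(String iface) {
        return callersOf.getOrDefault(iface, Collections.<String>emptySet());
    }

    // Entry callers are the upstream interfaces that nothing else calls,
    // that is, the points where a request enters the system.
    Set<String> entryCallers(String iface) {
        Set<String> entries = new TreeSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(iface);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!visited.add(current)) {
                continue;
            }
            Set<String> upstream = directCallers(current);
            if (upstream.isEmpty() && !current.equals(iface)) {
                entries.add(current);
            }
            pending.addAll(upstream);
        }
        return entries;
    }
}
```

Given an affected exposed interface from the intra-application analysis, `directCallers` and `entryCallers` give the upstream business scope that needs regression.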
Effect
The UI for intra-application impact is shown below. It covers summary information, the code comparison page, and the interfaces corresponding to each impact point.
In-app general display
An interface may be called by more than one caller. What developers and testers care about most are the direct callers of the interface, the entry callers, and the overall topology, as shown below:
Cross-application topology
Entry and direct caller list
In some cases, the details of a single call chain also need to be displayed:
Single link details
Shortcomings
1. The impact surface of newly added code relies mostly on bytecode analysis, which is inherently weak at polymorphism and AOP, so part of the impact surface can be missed.
2. In-application link tracing involves large-scale code weaving, which costs some performance and memory, especially for projects with a large code base.
3. For large-scale refactoring or changes to low-level common methods, impact analysis will cover many interfaces, so whether the test scope can be reduced still has to be evaluated manually.