Background
After an information system goes live, it usually needs iterative upgrades or even refactoring, and it is essential that the modified system still handles its existing business correctly. For simple business systems, conventional automated testing tools and manual testing are enough, but for very complex business systems regression testing becomes a huge undertaking. A practical example: for Alibaba, a group with e-commerce at its core, the importance of the transaction system and its stability is self-evident. Over many years of development the trading system has gone through many business changes, and its maintainers have turned over again and again, so almost no one can fully explain the business logic and the code. When the system had to undergo a full upgrade, regression testing was extremely difficult. Conventional automated test tools require test data to be prepared and scripts to be written, and their coverage is low, so they cannot meet the needs of regression verification after a refactoring. The Doom platform solved this problem by replicating real online traffic for automated regression, which uncovered many refactoring bugs and sped up the release. In addition, recorded traffic now serves as test cases for daily automated regression, replacing traditional script-based regression and greatly improving regression efficiency and coverage. Because the solution is general, we are sharing it here and also offering it as a cloud service for users who need it.
Platform introduction
What is the Doom platform
The Doom automated regression platform replicates part of the real online traffic and uses it for automated regression testing. Its automatic mock mechanism supports regression validation not only of read interfaces but also of write interfaces (such as order placement and payment interfaces). Underneath, it uses Java instrumentation to implement AOP, so only Java applications can be integrated. Its schematic diagram is as follows:
Differences from TCPCopy and Diffy: TCPCopy and Diffy record and replay traffic at the network layer, outside the application, so they can only verify some read-only pages. Doom records and replays traffic inside the application using AOP (aspect-oriented programming), so it can perform regression validation at the level of in-application interfaces as well as at the service or HTTP level. With its middleware-level mocks and custom in-application mocks, it can also perform regression validation of write traffic and cross-environment regression validation (diverting online traffic to a test environment).
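To make the idea of in-application recording concrete, here is a minimal sketch. It is not Doom's implementation: Doom weaves its aspects in through bytecode instrumentation, whereas this sketch uses a JDK dynamic proxy as a simpler stand-in for the same interception idea. `PriceService`, `RecordedCall`, and `Recorder` are hypothetical names invented for the illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical business interface, used only for this illustration.
interface PriceService {
    int priceOf(String itemId, int quantity);
}

// One recorded call: method name, arguments, and return value.
final class RecordedCall {
    final String method;
    final List<Object> args;
    final Object result;

    RecordedCall(String method, List<Object> args, Object result) {
        this.method = method;
        this.args = args;
        this.result = result;
    }
}

// Assumption: a dynamic proxy stands in for bytecode-level AOP.
final class Recorder {
    final List<RecordedCall> calls = new ArrayList<>();

    // Wraps a target so that every interface call is recorded before its result is returned.
    @SuppressWarnings("unchecked")
    <T> T wrap(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result;
            try {
                result = method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the business exception unchanged
            }
            List<Object> argList = args == null ? List.of() : Arrays.asList(args.clone());
            calls.add(new RecordedCall(method.getName(), argList, result));
            return result;
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}

final class RecordingDemo {
    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        PriceService real = (itemId, quantity) -> 10 * quantity;
        PriceService recorded = recorder.wrap(PriceService.class, real);
        recorded.priceOf("sku-1", 3);
        RecordedCall call = recorder.calls.get(0);
        System.out.println(call.method + call.args + " -> " + call.result);
    }
}
```

Because the interception happens inside the application, the recorder sees typed method arguments and return values rather than raw network packets, which is what makes interface-level comparison possible.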
Application scenarios
- During system refactoring, traffic from the real online environment is copied to the environment under test for regression. In effect, the system is exercised with production traffic before it goes live, so potential problems are detected without affecting services.
- Recorded traffic can be managed as test cases for daily automated regression.
Advantages
- Low cost: there is no need to write test cases; a rich set of test cases is built up through traffic recording.
- High coverage: on the one hand, the large volume of real online traffic ensures coverage; on the other hand, the platform supports verification of intermediate steps, such as full-object comparison of the content of sent messages and of intermediate calculation results, which traditional manually written verification points can hardly achieve.
- Support for write traffic verification (write traffic is traffic that may change data): there is no need to worry about write-traffic playback contaminating application data. The platform supports diverting online traffic to the test environment and mocks write traffic automatically.
- Low application intrusion: isolated-container techniques, bytecode-level AOP, and middleware-level mocks avoid class conflicts during integration and reduce integration cost.
How to use
The Doom platform is widely used within Alibaba, especially in some core systems. We therefore decided to open up the product and provide it free of charge as a cloud service. Doom supports direct onboarding of applications deployed on the cloud, as well as any application that can access the public network. Platform documentation: Access User Guide. Platform link: Doom platform.
Principles
- How is regression validation implemented? For web applications, user requests ultimately arrive as HTTP requests. We assume that the production application responds to user requests correctly, and use AOP to save each request's input and returned result together with snapshot data captured during execution, such as the inputs and results of database accesses and of calls to remote servers. The snapshot data is then sent to the test machine (the machine running the changed code) to carry out a playback. The database data, the requests sent to backend services, and the returned results during playback are compared in full against the data from the real online request; any differences point to problems in the system under test. The same applies to backend applications, except that they are generally invoked through RPC requests, so it is enough to record the RPC input parameters, RPC return values, and intermediate snapshot data for playback.
- How is the database protected from contamination? Mocking is a common unit-testing technique, used to stand in for interfaces that are unfinished or hard to call. The platform extends this technique to online traffic: the database writes and external service calls made by the real request are recorded, and during playback the database operations and backend service calls are mocked. Playback therefore neither accesses the real database nor issues real calls to backend services, so it does not affect business data. It can even be run in an offline environment, and because the mock data comes from real requests, there is no need to fabricate data.
- How are external system requests mocked? Applications make RPC requests through various middleware. Which middleware to isolate is set in the platform configuration, and the platform client applies AOP to that middleware to mock it automatically, with no need to configure individual RPC interfaces by hand. If your middleware is not supported, please contact us and we will develop an adapter.
- How do we handle the execution flow during playback diverging from the real online flow? The state of some in-memory data during execution in production often differs from its state on the test server during playback, and these differences lead to different execution flows. Examples include local caches, in-memory switches, and session lookups. To solve this, the platform provides a custom mock mechanism that mocks the code that causes the inconsistency. For example, mock the cache's get method: if the online request read data from the cache, the mock returns the same cached data during playback, so the playback follows the same flow as the real online execution.
- How is noise handled during comparison? Playback and recording inevitably differ in some fields, such as server IP, time, and random numbers. There are two ways to handle this:
  - Exclusion: the platform supports excluding specified fields from the comparison, so fields that do not need checking are skipped.
  - Specified comparison: compare only the business data you care about.
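The questions above can be tied together in a small sketch: a playback that answers backend calls from recorded results instead of a real database, followed by a comparison that either excludes noisy fields or checks only specified fields. This is an illustration under stated assumptions, not the platform's code. `ReplaySketch`, `placeOrder`, the `stock:` key format, and the field names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

final class ReplaySketch {
    // Playback-side stand-in for a database or RPC call: it answers from
    // recorded results and never touches a real backend.
    static Function<String, Object> mockFrom(Map<String, Object> recordedSubCalls) {
        return key -> {
            if (!recordedSubCalls.containsKey(key)) {
                throw new IllegalStateException("no recorded result for " + key);
            }
            return recordedSubCalls.get(key);
        };
    }

    // Hypothetical code under test: builds an order response from a stock lookup.
    static Map<String, Object> placeOrder(String sku, Function<String, Object> backend) {
        int stock = (Integer) backend.apply("stock:" + sku);
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("sku", sku);
        response.put("accepted", stock > 0);
        response.put("serverTime", System.currentTimeMillis()); // noise: differs on every run
        return response;
    }

    // Exclusion method: compare every field except the excluded ones.
    static Set<String> diffExcluding(Map<String, Object> recorded,
                                     Map<String, Object> replayed,
                                     Set<String> excluded) {
        Set<String> fields = new TreeSet<>(recorded.keySet());
        fields.addAll(replayed.keySet());
        fields.removeAll(excluded);
        return diffOnly(recorded, replayed, fields);
    }

    // Specified comparison method: compare only the listed fields.
    static Set<String> diffOnly(Map<String, Object> recorded,
                                Map<String, Object> replayed,
                                Set<String> fields) {
        Set<String> differing = new TreeSet<>();
        for (String field : fields) {
            if (!Objects.equals(recorded.get(field), replayed.get(field))) {
                differing.add(field);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        // Data captured from the (assumed correct) online request.
        Map<String, Object> recordedSubCalls = Map.of("stock:sku-1", 5);
        Map<String, Object> recordedResponse =
                Map.of("sku", "sku-1", "accepted", true, "serverTime", 1L);

        // Playback on the test machine, with the backend mocked from the recording.
        Map<String, Object> replayed = placeOrder("sku-1", mockFrom(recordedSubCalls));

        System.out.println(diffExcluding(recordedResponse, replayed, Set.of("serverTime")));
        System.out.println(diffOnly(recordedResponse, replayed, Set.of("accepted")));
    }
}
```

Both comparisons report no differing fields when the code under test behaves as it did online. Comparing without the exclusion would flag `serverTime` on every run, which is the kind of noise the two methods are meant to filter out.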
System architecture
Deployment diagram
As shown in the figure, the cloud service provides configuration management, and users can either plug in their own data storage or use Alibaba Cloud OSS directly to store test cases and traffic. Centralized configuration management makes platform upgrades easier, and customizable data storage gives users more storage options. Enterprise A in the figure uses the platform's functions as a whole. Enterprise B, for which the platform's functions do not meet its requirements, implements its own stability/regression test platform on top of the recording and playback capabilities the platform provides.
Client-server architecture
The diagram above shows the client-server architecture. The client is a functional module embedded in the integrated application. It is responsible for traffic recording, traffic playback, middleware mocking, middleware isolation, traffic comparison and analysis, and so on. The server provides the client with its configuration, such as which traffic to record, the proportion to record, and the IP addresses of the servers to record on. The client also reports some status information to the server for display and management. In addition, the client sends the abnormal data to the server for analysis only when a comparison finds a difference. Because different enterprises use different middleware, the client is extended with a mock plug-in for each middleware, and the plug-ins are managed by the middleware plug-in manager. The platform supports some common middleware out of the box and also supports extensions. Besides mocks, a middleware isolation mechanism is needed: isolation at the bottom layer of the middleware ensures that the database is not accessed even if a mock fails, which keeps business data safe during playback. This risk can also be avoided by running playback tests in a non-production environment.
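The relationship between mock plug-ins and isolation can be sketched as follows. This is a hypothetical illustration, not the platform's plug-in API: `MiddlewareMock`, `MiddlewarePluginManager`, and `invokeDuringReplay` are invented names. The point it shows is the fail-closed behavior described above: when no mock can answer a call during playback, the call is blocked and never falls through to the real backend.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical plug-in contract: one implementation per middleware type.
interface MiddlewareMock {
    // Returns the recorded result for a call, or empty if nothing was recorded.
    Optional<Object> replay(String call);
}

final class MiddlewarePluginManager {
    private final Map<String, MiddlewareMock> plugins = new HashMap<>();

    void register(String middleware, MiddlewareMock mock) {
        plugins.put(middleware, mock);
    }

    // During playback a call is answered by its plug-in. If there is no plug-in
    // or no recorded result, the call is blocked (isolation) instead of reaching
    // the real database or service.
    Object invokeDuringReplay(String middleware, String call) {
        MiddlewareMock mock = plugins.get(middleware);
        if (mock == null) {
            throw new IllegalStateException("isolated: no mock plug-in for " + middleware);
        }
        return mock.replay(call).orElseThrow(
                () -> new IllegalStateException("isolated: no recorded result for " + call));
    }
}

final class PluginDemo {
    public static void main(String[] args) {
        Map<String, Object> recorded = Map.of("select stock where sku='sku-1'", 5);
        MiddlewarePluginManager manager = new MiddlewarePluginManager();
        manager.register("database", call -> Optional.ofNullable(recorded.get(call)));

        System.out.println(manager.invokeDuringReplay("database", "select stock where sku='sku-1'"));
        try {
            manager.invokeDuringReplay("database", "delete from orders");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // the unrecorded write never runs
        }
    }
}
```

Failing closed trades a failed playback for data safety: an unrecorded write shows up as an error to investigate rather than as a silent change to business data.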
Open platform
- Where is recorded data stored? By default the platform saves recorded data to OSS, and users can plug in their own data storage service through an extension.
- Can you build your own test case management and execution platform on top of the Doom platform? The Doom platform exposes APIs for traffic recording, playback, and comparison. Users who need these capabilities can quickly build their own automated regression testing platform.
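As an illustration of what a pluggable storage extension could look like, here is a minimal sketch. The platform's actual extension interface is not shown in this article, so `TrafficStore` and its methods are assumptions made for the example, and the in-memory implementation merely stands in for a user's own storage service.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical extension point for storing recorded traffic or test cases.
interface TrafficStore {
    void save(String caseId, byte[] recordedTraffic);

    Optional<byte[]> load(String caseId);
}

// Stand-in for a user-provided storage service; keeps everything in memory.
final class InMemoryTrafficStore implements TrafficStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override
    public void save(String caseId, byte[] recordedTraffic) {
        data.put(caseId, recordedTraffic.clone()); // defensive copy
    }

    @Override
    public Optional<byte[]> load(String caseId) {
        byte[] stored = data.get(caseId);
        return stored == null ? Optional.empty() : Optional.of(stored.clone());
    }
}

final class StoreDemo {
    public static void main(String[] args) {
        TrafficStore store = new InMemoryTrafficStore();
        store.save("case-1", "GET /order/1".getBytes(StandardCharsets.UTF_8));
        String loaded = new String(store.load("case-1").orElseThrow(), StandardCharsets.UTF_8);
        System.out.println(loaded);
    }
}
```

Keeping recording and playback behind a narrow interface like this is one way the default OSS storage and a custom store could be swapped without touching the rest of the client.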