Industry-level mobile application testing faces operating system fragmentation, a heavy manual testing burden, complex functional scenarios, and low efficiency of automated test exploration. The Taobao (Tao Department) technical quality team, together with Professor Xie Tao's team at Peking University, proposed MonkeyBot, the industry's first automated testing framework based on computer vision and multimodal reinforcement learning. It addresses the low efficiency of traversal testing and the difficulty of using automated testing tools across platforms, providing a general and efficient solution for mobile application testing. The technical solution was presented to the industry at the QECon Global Software Quality and Effectiveness Conference in late September 2021, and Professor Xie Tao subsequently shared MonkeyBot's production results at the Computing Conference in October.

MonkeyBot has been deployed in more than ten major applications within Alibaba Group, including Taobao and Qianniu, supporting their daily testing and quality assurance, especially for critical occasions such as the Double 11 shopping festival. Through the Green Alliance, it also provides the industry with capabilities such as user-experience testing, intelligent exploratory testing, and intelligent validation. It has been applied on Android, iOS, HarmonyOS (Hongmeng), and other operating systems, and has improved test efficiency by more than 3 times. Let's see how this is done.

Research background

With billions of users, rapid product iteration and complex, diversified functions bring great challenges to quality assurance. Faced with high regression costs for functional testing and the ease of missing tests, an efficient and intelligent automated testing scheme is urgently needed to ensure software quality.

Solving the problem of missed tests in complex cross-business scenarios

As Taobao's business has expanded, mobile shopping is no longer just about placing orders; it now includes many other business functions such as messaging, live streaming, browsing, and interactive games. Different businesses are owned by different teams, so each business's releases and regressions are relatively independent. From the user's perspective, however, it is hard to see the boundaries between businesses: using the product inevitably combines multiple scenarios. For example, a user buying something may want to ask the merchant about size and quantity. If the merchant does not reply in time, the user may later find the session from the message tab and continue chatting; after browsing the merchant's shop, they may return to the chat again from a new-message notification, then go back to shopping, and later think of something else to ask the merchant and reach the chat session again through a quick entry into messages. Even for a simple, commonly used chat function, users may take many different paths to complete it, and those paths involve many businesses beyond messaging. It is difficult for testers to comprehensively cover such combined scenarios, and cross-scenario tests are easily missed, leading to online problems.

Solving the problem of high regression cost under rapid iteration

For every iteration release, each business must perform regression validation on the client. Mobile Taobao (Handtao) has at least one formal release per month and a gray release every week; each release requires at least two rounds of regression, and each round must cover at least Android and iOS. A conservative estimate is at least 16 functional regression runs per month, each with significant human cost. To reduce regression cost, UI automation testing has been widely adopted; tools such as Selenium, Appium, Airtest, and RXT have emerged one after another, and automation through maintained test scripts has achieved good results. But with rapid product iteration, maintaining automated test scripts still requires significant labor and time. To further improve test efficiency, we want the machine to execute the test process automatically, as close to the user's perspective as possible, and to cover as many business scenarios as possible. Inspired by traversal testing and Monkey testing, we let the machine explore a given app autonomously, combined with an intelligent exploration strategy, giving it human-like testing ability and realizing fully automated testing with no human involvement in the process.

Approach analysis

To give machines human-like testing ability, we set the following goals. First, at the implementation level, we want a test tool that supports verification across multiple systems and platforms. Second, the test tool must be free from dependence on test scripts and able to actively understand the business. Finally, on top of that business understanding, it should make intelligent decisions and execute operations, effectively covering more user operation paths. These goals are clearly difficult to achieve, but we pressed ahead along this thorny road. After a long period of technical research and trials, we arrived at the solution shown in the figure below. First, UI analysis is based purely on vision and no longer depends on system information, so it adapts better to different systems. Second, the UI state is modeled from image features, text information, and UI structure information to obtain multi-modal information, and the screen is then abstracted at multiple granularities so that the business meaning of the image is better understood. Third, a reinforcement learning exploration strategy is introduced, combining exploration history with intelligent exploration strategies to achieve more effective path coverage.
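To make the multi-granularity idea concrete, the sketch below shows one hypothetical way to derive two state keys from the recognized elements of a screen: a fine-grained key that includes element types, text, and positions, and a coarse-grained key that keeps only element types. The element schema, features, and hashing scheme here are illustrative assumptions, not MonkeyBot's actual state representation.

```python
# Illustrative sketch of multi-granularity state abstraction (assumed schema).
import hashlib

def state_keys(elements):
    """elements: list of dicts like {"type": "button", "text": "Buy", "bbox": (x, y, w, h)}.

    Returns a (fine, coarse) pair of state keys: the fine key distinguishes
    screens by layout and text, while the coarse key groups screens that share
    the same kinds of elements (e.g. all product detail pages).
    """
    ordered = sorted(elements, key=lambda e: e["bbox"])
    fine = "|".join(f'{e["type"]}:{e["text"]}:{e["bbox"]}' for e in ordered)
    coarse = "|".join(sorted(e["type"] for e in elements))  # ignores text and position
    return (hashlib.md5(fine.encode()).hexdigest(),
            hashlib.md5(coarse.encode()).hexdigest())
```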

MonkeyBot technology framework

MonkeyBot's test execution is carried out through the RXT platform. Because RXT is compatible with different systems, execution itself is free from system dependence. The state abstraction layer analyzes the image features, text features, and image structure of the UI; its multi-modal, multi-granularity state abstraction compresses UI information and lays a good foundation for the decision-making intelligence that follows. The exploration strategy part mainly covers reward design, action selection, and parameter updates, making behavior decisions based on historical state visits and the designed reward strategy.

The specific execution process is as follows:

  1. Enter the app's initial state.
  2. Abstract the current UI state: analyze scene characteristics through element recognition, text extraction, and image structure analysis, then perform hierarchical state abstraction.
  3. Compare the abstracted state information with the history of visited states; combining the curiosity strategy, the upper-confidence-bound exploration strategy, and the depth-breadth balance strategy, obtain a reward value distribution and update the parameters. The next action is selected based on this reward distribution.
  4. Execute the action through RXT and obtain the new state.
  5. Repeat this loop until the specified number of exploration rounds or the specified exploration time is reached; a minimal sketch of this loop is shown after the list.
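As a concrete illustration, the sketch below shows one way such an exploration loop could be wired together. The names `driver`, `policy`, and their methods are hypothetical placeholders standing in for the RXT execution layer, the state abstraction layer, and the exploration strategy; they are not MonkeyBot's actual APIs.

```python
# A minimal, hypothetical sketch of the exploration loop described above.
import time

def explore(driver, policy, max_rounds=500, max_seconds=3600):
    """Run the loop until the round budget or the time budget is exhausted."""
    start = time.time()
    screenshot = driver.launch_app()                 # 1. enter the app's initial state
    state = policy.abstract_state(screenshot)        # 2. multi-modal, multi-granularity abstraction
    for _ in range(max_rounds):
        action = policy.select_action(state)         # 3. reward distribution -> action choice
        screenshot = driver.execute(action)          # 4. execute via the device driver and observe
        next_state = policy.abstract_state(screenshot)
        policy.update(state, action, next_state)     # 3. parameter update with the new state
        state = next_state
        if time.time() - start > max_seconds:        # 5. stop when the time budget runs out
            break
```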

Technical challenges and solutions in UI element recognition

Computer vision has made remarkable progress on natural-image tasks over the past decade. Whether based on statistical learning or on data-driven deep neural networks, vision models now approach human performance on tasks such as image classification, semantic segmentation, and object detection. Despite this rapid development on natural images, research treating the UI as its own kind of scene is relatively scarce. Our large-scale empirical study on Taobao shows that applying existing computer vision techniques to UI analysis tasks without modification gives poor results, and accurately segmenting and recognizing all element information on an entire UI screen is very difficult:

  1. Traditional image segmentation methods are hard to tune: it is difficult to set good thresholds, and they easily merge multiple elements into one or split a single element into several smaller pieces.
  2. Relying solely on deep learning to train element recognition does not work either. For a large, popular application such as Mobile Taobao, most UI elements are designed around the business, with varied, non-uniform styles and fast iteration, so it is almost impossible to collect a dataset covering all elements, making this approach hard to land in practice.

To this end, we developed comprehensive visual analysis techniques specifically for UI images.

First, our technique has low data dependence. For mobile applications without historical test data, it relies on dynamic image filtering, combining a variety of filtering algorithms with bottom-up gradient segmentation and region aggregation based on visual statistics, which achieves high analysis accuracy without any training data. At the same time, our UI analysis automatically collects data about the application under test as it runs, and uses this historical test feedback to drive fine-tuning of pre-trained deep learning and machine learning models, further improving the UI analysis results.
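To make the bottom-up idea concrete, here is a minimal sketch of gradient-based segmentation and region aggregation for a UI screenshot, written with generic OpenCV calls. The thresholds, kernel size, and minimum area are arbitrary assumptions for illustration; MonkeyBot's actual filtering algorithms and parameters are not shown here.

```python
# Illustrative bottom-up segmentation of a UI screenshot into candidate element
# regions; a generic OpenCV sketch, not MonkeyBot's actual algorithm.
import cv2

def segment_ui_elements(screenshot_path, low=50, high=150, min_area=400):
    img = cv2.imread(screenshot_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)                 # gradient-based edge map
    # Dilate edges so fragments of the same widget merge into one region.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    merged = cv2.dilate(edges, kernel, iterations=2)
    contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Keep sufficiently large regions as candidate UI elements.
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]         # (x, y, w, h) boxes
```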

Second, we borrow the idea of AutoML: through automatic parameter combination and verification, we select the most suitable parameter combination for the specific application and device under test.
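A simple way to picture this is a search over parameter combinations scored against a small labeled sample, as in the hypothetical sketch below. The grid-search approach, the `segment_fn`/`score_fn` interfaces, and the use of mean IoU as the score are assumptions for illustration; MonkeyBot's actual AutoML procedure is not described in this article.

```python
# Hypothetical AutoML-style parameter selection for the application under test.
from itertools import product

def tune_parameters(segment_fn, labeled_samples, param_grid, score_fn):
    """Pick the parameter combination with the best average score.

    segment_fn(path, **params) -> predicted element boxes
    labeled_samples: list of (screenshot_path, ground_truth_boxes)
    score_fn(predicted, truth) -> float, e.g. mean IoU against the ground truth
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = sum(score_fn(segment_fn(path, **params), truth)
                    for path, truth in labeled_samples) / len(labeled_samples)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```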

In addition, our technique can segment the UI based on both image semantics and business semantics, and apply heterogeneous visual analysis to different UI components, further improving the stability and robustness of UI analysis.

The figure above shows the effect of our UI analysis technique. On Mobile Taobao's core scenarios, control localization, control classification, and semantic segmentation all achieve accuracy above 95%, and UI hierarchy extraction and business logic protocol recognition also reach very high precision.

Problems and solutions encountered during exploration

There are three common problems encountered during traversal and exploration testing:

  1. Repeated testing of similar or identical business scenarios. For example, ordering different products on Taobao exercises the same functionality, so there is no need to repeat the test on every product detail page.

  2. When scenes are interleaved, loops can occur: scene A can click an element to enter scene B, and scene B can click an element to return to scene A. For example, as shown in the figure below, the chat-with-merchant scene and the shop home page can be switched back and forth, which may trap the machine in a loop of invalid exploration.

  3. In some scenes, very few elements respond to clicks, and many elements do nothing when clicked; continuing to explore there hurts efficiency. In the chat page shown in the figure below, each chat message is recognized as an independent element, but none of them is clickable.

Our exploration process is built on reinforcement learning, and we designed several exploration-strategy optimizations for the problems above. Both repeated exploration and cyclic exploration are closely related to historical states, so we introduce a curiosity-driven exploration mechanism that remembers historical states and encourages the agent to choose actions that are more likely to trigger new states. By balancing depth and breadth, the agent decides whether to keep exploring the current page or go back to a previous page and try another operation path. For the problem of invalid clicks, we first compress the analyzed interface information during state abstraction, using the distribution of elements and text, and then estimate a value for each abstract state, i.e. whether the state is important and worth further exploration, which improves the effectiveness of our exploration.
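As a concrete illustration of how curiosity and visit history can be combined, the sketch below implements a simple count-based curiosity reward with an upper-confidence-bound action selector. The formula, constants, and class interface are assumptions for illustration; MonkeyBot's actual reward design, depth-breadth balancing, and state-value estimation are more elaborate than this.

```python
# Count-based curiosity reward + UCB action selection (illustrative, not MonkeyBot's design).
import math
from collections import defaultdict

class CuriosityUcbPolicy:
    def __init__(self, exploration_weight=1.0):
        self.state_visits = defaultdict(int)      # N(s): times an abstract state was visited
        self.action_tries = defaultdict(int)      # N(s, a): times an action was tried in a state
        self.action_value = defaultdict(float)    # running mean curiosity reward for (s, a)
        self.total_steps = 0
        self.c = exploration_weight

    def curiosity_reward(self, next_state):
        # Rarely visited states yield higher reward, pushing the agent toward new screens.
        return 1.0 / math.sqrt(1 + self.state_visits[next_state])

    def select_action(self, state, candidate_actions):
        # UCB: prefer actions whose outcome in this state is still uncertain.
        self.total_steps += 1
        def ucb(a):
            n = self.action_tries[(state, a)]
            bonus = self.c * math.sqrt(math.log(self.total_steps + 1) / (n + 1))
            return self.action_value[(state, a)] + bonus
        return max(candidate_actions, key=ucb)

    def update(self, state, action, next_state):
        # Reward the transition by the novelty of the state it reached.
        r = self.curiosity_reward(next_state)
        key = (state, action)
        self.action_tries[key] += 1
        self.action_value[key] += (r - self.action_value[key]) / self.action_tries[key]
        self.state_visits[next_state] += 1
```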

Results

In tests on the Mobile Taobao (Handtao) app, compared with the random strategy of Monkey testing, our designed exploration strategy increased activity coverage by 120% and effective functional path coverage by more than 2 times, greatly improving the effectiveness of path coverage.

Summary and Outlook

For applications with huge user bases or complex business scenarios, functional regression testing is costly and tests are easily missed, so comprehensively covering user operation paths becomes a major quality assurance problem. This article introduced an intelligent exploratory testing solution based on fused AI technologies, jointly researched by the Tao Department technical quality team and Professor Xie Tao's team at Peking University, and analyzed some of the technical and business problems the solution addresses. To build a universal human-like testing tool, we use the RXT robot testing tool to perform execution operations and built a purely visual analysis solution on top of it: the UI can be segmented based on image semantics and business semantics, and control localization, control classification, and semantic segmentation accuracy on Mobile Taobao's core scenarios all exceed 95%. The UI state is modeled from image features, text information, and UI structure, with multi-granularity state abstraction. Multi-modal information improves the accuracy and reliability of state abstraction, while multi-granularity abstraction lets business semantics and various testing requirements be expressed at the same time, increasing the flexibility of intelligent exploratory testing. The reinforcement learning exploration strategy, combining exploration history with intelligent exploration strategies, solves problems such as repeated exploration of the same business scene, exploration getting stuck in loops, and invalid clicks on the interface. Compared with the random strategy, activity coverage increased by 120% and effective functional path coverage increased by more than 2 times, achieving more effective path coverage. There is still room to improve the exploration strategy. We hope that, through better state abstraction and exploration capability, the machine can explore a given app more efficiently and, combined with intelligent verification capability, actively discover bugs encountered during the process, acquire human-like testing ability, and realize fully unattended automated testing.

Welcome to contact: Ju Fan, email: [email protected]