One, Preface
Image recognition is probably not unfamiliar to most of us, and many of us have encountered it in some form, yet few testers seem to apply it directly to their testing work. The formal definition of image recognition is “the technology of using computers to process, analyze, and understand images in order to identify targets and objects of various patterns.” In practice, the key word is “recognition”: in testing, we let the computer “recognize” what is on the screen so that it can assist us with testing, or even test in our place.
Two, Image recognition and algorithm introduction
The following sections explain how this “recognition” works by introducing some of the algorithms and operators the author has used when applying image recognition as an auxiliary testing tool.
2.1 SIFT algorithm
The SIFT algorithm works mainly by extracting feature descriptors and feature directions at local points of interest (keypoints) on the objects in an image. These descriptors and directions do not depend on the image size or rotation angle, and they are quite tolerant of changes in lighting and small changes in viewpoint. SIFT also keeps a high detection rate when objects are partially occluded, and on current computer hardware its recognition speed is close to real time.
The main characteristic of SIFT is its invariance, most notably to scale: enlarging the image does not affect matching, rotating it by any angle does not affect matching, and changes in brightness or moderate changes in the horizontal shooting angle have little effect on matching.
2.1.1 SIFT algorithm calculation process
Because the SIFT calculation involves a fair amount of mathematics, and explaining that mathematics is not the purpose of this article, only the key steps are given below, as the author briefly understands them. More detailed material is available online.
1) Image denoising: the input image is first smoothed with a Gaussian blur.
2) Construction of the Gaussian pyramid: the Gaussian pyramid is a set of images obtained by repeatedly blurring the original image with increasing scale parameters.
3) Construction of the difference-of-Gaussians pyramid: within each level (octave) of the blurred Gaussian pyramid, every two adjacent images are subtracted to build the difference pyramid.
4) Extracting scale-space extrema: extreme points are found by comparing each point of the difference pyramid with its neighbors in the same image and in the adjacent scales.
5) Obtaining the dominant gradient direction: the dominant gradient direction is the direction in which the image changes most strongly around the extreme point.
After these steps, the feature points of an image at different scales, together with their feature directions, are obtained.
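The following pure-Python sketch illustrates steps 2) to 4) on a small grayscale image stored as a list of lists: one octave of progressively blurred images, their differences, and the 26-neighbor extremum test. It is not the full SIFT algorithm (OpenCV's `cv2.SIFT_create()` also downsamples between octaves, refines and filters keypoints, and builds descriptors), and the helper names and the defaults `sigma0=1.6`, `k=√2`, and `threshold=0.01` are illustrative assumptions.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian weights over [-3*sigma, 3*sigma], normalized to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(img, sigma):
    """Separable Gaussian blur of a 2-D list of floats; borders are clamped."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    # Horizontal pass, then vertical pass on the intermediate result.
    rows = [[sum(k[i] * img[y][min(max(x + i - r, 0), w - 1)] for i in range(len(k)))
             for x in range(w)] for y in range(h)]
    return [[sum(k[i] * rows[min(max(y + i - r, 0), h - 1)][x] for i in range(len(k)))
             for x in range(w)] for y in range(h)]

def dog_stack(img, sigma0=1.6, k=math.sqrt(2), levels=5):
    """Blur with sigma0 * k**i (one octave of the Gaussian pyramid), then
    subtract adjacent levels to get the difference-of-Gaussians layers."""
    gaussians = [blur(img, sigma0 * k ** i) for i in range(levels)]
    return [[[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(g_a, g_b)]
            for g_a, g_b in zip(gaussians, gaussians[1:])]

def dog_extrema(dogs, threshold=0.01):
    """Return (x, y, layer) where a DoG value is larger or smaller than all
    26 neighbors in its own layer and the two adjacent layers."""
    found = []
    for s in range(1, len(dogs) - 1):
        h, w = len(dogs[s]), len(dogs[s][0])
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dogs[s][y][x]
                if abs(v) < threshold:
                    continue  # weak responses are skipped as noise
                neighbors = [dogs[s + ds][y + dy][x + dx]
                             for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                             if (ds, dy, dx) != (0, 0, 0)]
                if v > max(neighbors) or v < min(neighbors):
                    found.append((x, y, s))
    return found
```

A flat image produces all-zero DoG layers and therefore no extrema, which is why feature points appear only around textured regions such as icons and text.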
The following figure shows the positions and directions of the feature points computed from the KingRoot home page. The radius of each circle corresponds to the feature point's scale, and the direction of the radius is its feature direction.
2.2 Canny operator
There are many operators for contour recognition. The Canny operator aims to: mark as many of the actual edges in the image as possible; place each marked edge as close as possible to the actual edge in the image; and mark each edge only once, without marking noise points as edges.
2.2.1 Operation process of Canny operator
As with SIFT, this article does not describe the Canny computation in detail. The calculation steps below are given according to the author's own understanding.
1) Image denoising: Canny uses the same denoising method as SIFT, a Gaussian blur.
2) Gradient computation: compute the gradient of each point in each direction to obtain its brightness gradient magnitude and direction.
3) Edge detection: a double-threshold scheme is used. The high threshold filters out most noise points, and the low threshold preserves most of the image information; points between the two thresholds are kept only when they connect to a point above the high threshold.
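A minimal sketch of steps 2) and 3): Sobel gradients, then the double threshold with hysteresis. It assumes a grayscale list-of-lists image and, for brevity, skips the non-maximum suppression that the standard Canny algorithm applies between these two steps; in practice OpenCV's `cv2.Canny(image, threshold1, threshold2)` performs the whole pipeline. The function names are hypothetical.

```python
import math
from collections import deque

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradients(img):
    """Sobel gradient magnitude and direction (radians) for interior pixels."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)
    return mag, ang

def hysteresis(mag, low, high):
    """Double threshold: pixels >= high are strong edges; pixels >= low are
    kept only if they are 8-connected to a strong edge."""
    h, w = len(mag), len(mag[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if mag[y][x] >= high)
    for y, x in queue:
        edges[y][x] = True
    # Grow strong edges into neighboring weak pixels.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not edges[ny][nx] and mag[ny][nx] >= low:
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges
```

The low threshold alone would keep isolated weak responses; tying them to strong edges is what lets the low threshold preserve detail without letting noise through.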
The following figure shows the KingRoot home page and the contour information computed by the Canny operator. As the figure shows, most of the contours in the image are captured.
Three, Image recognition applications
3.1 Control click method based on image recognition
When using Uiautomator, we often find that some phone pop-ups are not recognized as buttons, as shown in the following figure. If the test needs to click the “General Settings” button, it cannot actually be clicked through Uiautomator.
Uiautomator cannot recognize the pop-up in the figure above, as shown below: the pop-up does not appear in the Uiautomator XML at all, and the entire home page is identified as a single View. We therefore often work around this by clicking coordinates, but that introduces an adaptation problem: on a different phone with a different resolution, the script no longer works.
To solve this, the Guangzhou test group used image recognition to find the position of “General Settings” and click it. SIFT is suitable for this matching because it is scale-invariant and robust to changes in size and illumination. The flow chart of the algorithm is shown below.
The algorithm has the following 7 steps, which make it possible to find and click controls that Uiautomator cannot identify.
1. Save the template images to be matched in advance
2. Capture the current screen
3. Use SIFT to compute the feature points and feature descriptors of the template image and the screenshot
4. Match the descriptors with a KNN (k-nearest-neighbor) algorithm
5. Remove noise points that deviate severely
6. Find the center point of the remaining matched points
7. Perform the click
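Steps 4 to 6 can be sketched as follows, assuming descriptors are plain float vectors. The KNN step finds the two nearest screen descriptors for each template descriptor and keeps the match only when the nearest is clearly closer than the second nearest (Lowe's ratio test, the filter commonly applied to the output of `cv2.BFMatcher().knnMatch(des1, des2, k=2)`). The 0.75 ratio and the median-distance outlier rule are assumptions, since the article does not say how noise points are removed.

```python
import math
import statistics

def knn_ratio_match(template_desc, screen_desc, screen_pts, ratio=0.75):
    """k=2 nearest-neighbor matching with a ratio test; returns the screen
    coordinates of the accepted matches."""
    matched = []
    for d in template_desc:
        dists = sorted((math.dist(d, s), i) for i, s in enumerate(screen_desc))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matched.append(screen_pts[dists[0][1]])
    return matched

def drop_outliers(points, max_dev=2.0):
    """Drop points farther than max_dev * (median distance) from the median point."""
    mx = statistics.median(p[0] for p in points)
    my = statistics.median(p[1] for p in points)
    dist = [math.hypot(x - mx, y - my) for x, y in points]
    cutoff = max_dev * statistics.median(dist)
    return [p for p, d in zip(points, dist) if d <= cutoff]

def click_target(points):
    """Center of the remaining matched points, rounded to pixel coordinates."""
    xs, ys = zip(*points)
    return round(sum(xs) / len(xs)), round(sum(ys) / len(ys))
```

`drop_outliers` needs at least one point, so a caller should treat an empty match list as “template not found on this screen” before computing the click position.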
Taking a click on “General Settings” as an example, SIFT computes the features of the template image and of the screenshot, as shown in the figure below. The feature points and feature descriptors of the two are quite similar.
The matching process is shown below; most of the matched points fall on the “General Settings” position.
Through these steps, the position of “General Settings” on the screen is obtained, and clicking that position clicks “General Settings”.
3.2 Depth traversal test tool
A depth traversal test tool based on the Canny operator and Uiautomator
When using Monkey for random operations, we found that Monkey often does a poor job of simulating human clicks on useful key points, which wastes a lot of time and keeps testing efficiency low. In addition, when a crash occurs during clicking and the path that led to it is unknown, the bug is particularly hard to reproduce. The author therefore considered building a tool that simulates a novice user clicking through the app, records the operation path, and covers as much of the application as possible.
The depth traversal test tool based on the Canny operator and Uiautomator solves these problems well. First, the tool's environment is shown in the figure below: the main control program runs on the PC, the controlled object is the phone, and the two communicate through ADB commands.
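Since the PC drives the phone through ADB, the tool's device operations reduce to a few commands. A minimal sketch, assuming Python's `subprocess` on the PC; the helper names are hypothetical, while `exec-out screencap -p`, `shell input tap`, and `shell uiautomator dump` are standard adb and Android shell commands.

```python
import subprocess

def adb(*args, serial=None):
    """Build an adb argument list, optionally targeting one device with -s."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]
    return cmd + list(args)

def screenshot_cmd(serial=None):
    # Writes a PNG of the current screen to stdout.
    return adb("exec-out", "screencap", "-p", serial=serial)

def tap_cmd(x, y, serial=None):
    # Injects a tap at screen coordinates (x, y).
    return adb("shell", "input", "tap", str(x), str(y), serial=serial)

def dump_ui_cmd(path="/sdcard/window_dump.xml", serial=None):
    # Writes the current layout hierarchy to an XML file on the device.
    return adb("shell", "uiautomator", "dump", path, serial=serial)

def run(cmd):
    """Run a command on the PC and return its raw stdout bytes."""
    return subprocess.run(cmd, capture_output=True, check=True).stdout
```

For example, `run(screenshot_cmd())` returns PNG bytes of the current screen, and `run(tap_cmd(x, y))` clicks a position found by image matching.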
3.2.1 Tool Functions
The main functions of the depth traversal tool are as follows:
1. Random clicking, with key points clicked more than once;
2. Continuously check whether the tool is still in the app under test, and if it has jumped out (inter-app jumps can happen while clicking), return and continue;
3. Record the complete click path so that it can be replayed;
4. Take a screenshot for every step so that the tester can review it afterwards;
5. Raise an alarm and stop promptly when the app crashes.
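Functions 2 and 5 need to know which app is in the foreground and whether the app has crashed. A hedged sketch: it assumes the `mCurrentFocus=Window{... <package>/<activity>}` line printed by `adb shell dumpsys window` (its exact format varies across Android versions) and the `FATAL EXCEPTION` header followed by a `Process: <package>, PID: ...` line that Java crashes leave in logcat. The function names and the sample package name are hypothetical.

```python
import re

FOCUS_RE = re.compile(r"mCurrentFocus=Window\{\S+ \S+ ([\w.]+)/")

def foreground_package(dumpsys_window_output):
    """Package name from the mCurrentFocus line, or None if it is missing
    or has an unexpected format."""
    match = FOCUS_RE.search(dumpsys_window_output)
    return match.group(1) if match else None

def left_app(dumpsys_window_output, package):
    """True if the focused window is not the app under test (also True
    when the line could not be parsed), so the tool should go back."""
    return foreground_package(dumpsys_window_output) != package

def has_crashed(logcat_text, package):
    """True if logcat contains a Java crash report for `package`."""
    lines = logcat_text.splitlines()
    for i, line in enumerate(lines):
        # The "Process:" line follows the "FATAL EXCEPTION" header.
        if "FATAL EXCEPTION" in line and any(
                f"Process: {package}," in nxt for nxt in lines[i + 1:i + 3]):
            return True
    return False
```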
3.2.2 Test tool workflow
The workflow of the test tool is shown in the figure below.
The general process of the test tool consists of the following 8 steps.
1. Extract contour key points from the screenshot
2. Dump the Uiautomator layout file and add the points with clickable=true to the list of key points
3. Click the key points in turn and check whether a page jump occurs
4. If there is a jump, return to step 1 and extract key points again
5. If there is no jump, mark the point just clicked as useless and continue clicking
6. If there is no jump and every point on the current page has been clicked, check whether the tool can return to an unfinished page
7. If it can, jump back to the unfinished page
8. If it cannot, exit the tool
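Step 2 relies on the layout file produced by `uiautomator dump`, in which every `node` element carries `clickable` and `bounds` attributes such as `bounds="[100,200][300,260]"`. A minimal sketch that turns the clickable nodes into tap points (the center of each bounding box); the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")

def clickable_points(dump_xml):
    """Centers of all nodes with clickable="true" in a uiautomator dump."""
    points = []
    for node in ET.fromstring(dump_xml).iter("node"):
        if node.get("clickable") != "true":
            continue
        match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
        if match:
            x1, y1, x2, y2 = map(int, match.groups())
            points.append(((x1 + x2) // 2, (y1 + y2) // 2))
    return points
```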
3.2.3 Definition of key points
Testers have different opinions on what counts as a key point. The author believes that a key point must meet one of the following two conditions:
1. Anything clickable must be a key point;
2. Places that look clickable and meaningful are key points.
Based on these two conditions, the author includes all points with the clickable=true attribute in Uiautomator as key points, and also includes points with clearly meaningful contours identified by Canny, because in the author's view, what a contour encloses is generally a complete image or a word with complete meaning.
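The article does not specify how contour key points are derived from the Canny output. One plausible sketch, offered as an assumption: group edge pixels into 8-connected components, discard tiny ones as noise, take the center of each component's bounding box, and merge these with the clickable points while skipping near-duplicates. `min_size` and `radius` are hypothetical tuning parameters.

```python
from collections import deque

def contour_points(edges, min_size=4):
    """Bounding-box centers of 8-connected edge components that have at
    least min_size pixels (smaller components are treated as noise)."""
    h, w = len(edges), len(edges[0])
    seen = [[False] * w for _ in range(h)]
    points = []
    for sy in range(h):
        for sx in range(w):
            if not edges[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, pixels = deque([(sy, sx)]), []
            while queue:  # flood-fill one component
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and edges[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(pixels) >= min_size:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                points.append(((min(xs) + max(xs)) // 2, (min(ys) + max(ys)) // 2))
    return points

def merge_key_points(clickable, contours, radius=20):
    """Keep all clickable points; add contour points not within radius of one already kept."""
    merged = list(clickable)
    for cx, cy in contours:
        if all(abs(cx - x) > radius or abs(cy - y) > radius for x, y in merged):
            merged.append((cx, cy))
    return merged
```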
3.2.4 Traversal Rules
The test tool traverses clicks with a depth-first strategy: after every click, it checks whether a page jump occurred. If there is a jump and the new page differs from the previous ones, it creates a new set of key points and traverses them starting from the first point. If not, it continues clicking from the next point until no new page appears and all key points on the current page have been clicked, and then goes back to the previous page. Whether a jump occurred is judged by comparing the screenshots before and after the click: if their similarity is below a threshold (0.8 in the program), a jump is considered to have occurred.
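The traversal rule can be sketched as a recursive depth-first search. The 0.8 threshold comes from the article; the pixel-tolerance similarity metric, the `device` interface (`screenshot`, `key_points`, `tap`, `back`), and the `FakeDevice` stand-in used to exercise it are assumptions for illustration only.

```python
SIMILARITY_THRESHOLD = 0.8  # value given in the article

def similarity(a, b, tol=10):
    """Fraction of pixels whose grey levels differ by at most tol (assumed metric)."""
    same = total = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            same += abs(pa - pb) <= tol
    return same / total

def jumped(before, after):
    return similarity(before, after) < SIMILARITY_THRESHOLD

def traverse(device, path=None, known=None):
    """Depth-first click traversal; returns the (page id, point) click path."""
    path = [] if path is None else path
    known = [] if known is None else known
    current = device.screenshot()
    known.append(current)
    page_id = len(known) - 1
    for point in device.key_points():
        device.tap(point)
        path.append((page_id, point))
        after = device.screenshot()
        if not jumped(current, after):
            continue  # no jump: the point is useless, click the next one
        if any(not jumped(seen, after) for seen in known):
            device.back()  # an already-visited page: go back and continue
            continue
        traverse(device, path, known)  # a new page: traverse it first
        device.back()  # then roll back to this page
    return path

class FakeDevice:
    """Hypothetical stand-in for a phone: each page is drawn as a flat 2x2
    image, and `links` maps (page, point) to the page that a tap opens."""

    def __init__(self, links, points):
        self.stack, self.links, self.points = ["home"], links, points

    def screenshot(self):
        shade = sorted(self.points).index(self.stack[-1]) * 60
        return [[shade, shade], [shade, shade]]

    def key_points(self):
        return self.points[self.stack[-1]]

    def tap(self, point):
        target = self.links.get((self.stack[-1], point))
        if target is not None:
            self.stack.append(target)

    def back(self):
        if len(self.stack) > 1:
            self.stack.pop()
```

The returned list is the ordered click path, so it doubles as the material that the path record has to store.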
3.2.5 Return rule
If all points on the current page have been clicked and no new page appears, the tool checks whether the current run still has unfinished pages and whether it can navigate to one of them. The search proceeds breadth-first from the current page: if a page one level down is unfinished, the tool jumps to it; if no first-level page qualifies, it searches the second-level sub-pages, and so on, until it finds a path it can backtrack along.
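The return rule amounts to a breadth-first search over the recorded pages. A sketch under the assumption that the tool keeps a `tree` dict mapping each page to its child pages and a set of finished pages (both hypothetical structures); it returns the chain of pages leading to the nearest unfinished one, or `None` when everything is done and the tool can exit.

```python
from collections import deque

def find_unfinished(tree, finished, start):
    """Check every page one level below `start`, then two levels below, and
    so on; return the route to the first unfinished page, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        route = queue.popleft()
        page = route[-1]
        if page != start and page not in finished:
            return route
        for child in tree.get(page, []):
            if child not in seen:  # guard against cycles in the page graph
                seen.add(child)
                queue.append(route + [child])
    return None
```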
3.2.6 Path Record
The click path record contains the following key information.
1. Current page information: the page ID, completion status, and name of the corresponding screenshot
2. Key point information: the point ID and status (not clicked or clicked)
XML is easy to read, write, and display, so this tool uses XML files to store the recorded path. The specific record format is shown in the following figure.
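The actual record format is only shown in the article's figure, so the schema below (a `path` root holding `page` elements with `id`, `pic`, and `status` attributes and nested `point` elements) is a hypothetical example built with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

def page_element(page_id, pic, done, points):
    """A <page> record: page ID, completion status, screenshot name, and one
    <point> child per key point with its click status."""
    page = ET.Element("page", id=str(page_id), pic=pic,
                      status="done" if done else "unfinished")
    for point_id, clicked in points:
        ET.SubElement(page, "point", id=str(point_id),
                      status="clicked" if clicked else "not_clicked")
    return page

def to_xml(pages):
    """Serialize all page records under a single <path> root."""
    root = ET.Element("path")
    root.extend(pages)
    return ET.tostring(root, encoding="unicode")

def unclicked(xml_text):
    """(page id, point id) pairs that still have to be clicked."""
    root = ET.fromstring(xml_text)
    return [(page.get("id"), point.get("id"))
            for page in root.iter("page")
            for point in page.iter("point")
            if point.get("status") == "not_clicked"]
```

Keeping the records in memory and serializing them once per page, rather than reloading the file for every update, is one way to limit the repeated XML I/O.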
3.2.7 Implementation effect
Using Mobile Butler as the test object, the author traversed more than 100 screenshots with this tool in one night, all of them distinct pages. Moreover, 70% to 80% of Mobile Butler's first- and second-level pages were covered, which basically achieves automated traversal.
3.2.8 Deficiencies and prospects
Although the tool can cover 70% to 80% of Mobile Butler's first- and second-level pages, it still has the following deficiencies to optimize and solve.
1. A page that needs to load is treated as two pages, before and after loading. Loading takes time, and although the tool waits before each screenshot, a page that loads for too long produces two different screenshots, one before and one after loading.
2. The running time is too long. For Mobile Butler, traversing the 100+ pages took 11 to 12 hours, because every screenshot takes time, the tool must wait for each page transition, and each page contains many key points.
3. Some pages are one-time pages: clicking the same button at the same position opens a different page the first time than on later visits. Because the path table records the page that was jumped to, a mismatch occurs when the tool returns.
4. The image recognition method can only supplement Uiautomator and Monkey: it removes some of Uiautomator's limitations but cannot completely replace Uiautomator and Monkey.
Looking ahead, image-recognition-based functions remain promising, and several areas can still be optimized:
1. Reduce the running time by checking whether the interface has become stable before taking a screenshot. This avoids many useless screenshots during loading and improves efficiency.
2. Speed up the image similarity algorithm. Comparing two images currently takes about 0.5 s, which wastes time. One option is to shrink the images without affecting the result; another is to adopt a faster similarity calculation.
3. Optimize XML reading and writing. Under the current code architecture, the XML file is reloaded and rewritten many times, and the repeated I/O costs a lot of time, so reducing the number of reads and writes would shorten the running time.
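Optimizations 1 and 2 can be combined: shrink each screenshot before comparing it, and keep taking screenshots until two consecutive frames agree, so that a loading page is captured only once it has settled. A sketch under assumptions: `grab` is any function returning a grayscale list-of-lists screenshot, and the 0.95 stability threshold, 0.5 s interval, 10 s timeout, and 4x downscale factor are hypothetical values, not measurements.

```python
import time

def downscale(img, factor=4):
    """Average each factor x factor block of a 2-D grey image."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor)) / factor ** 2
             for x in range(w)]
            for y in range(h)]

def similarity(a, b, tol=10):
    """Fraction of pixels whose grey levels differ by at most tol."""
    pairs = [(p, q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(abs(p - q) <= tol for p, q in pairs) / len(pairs)

def wait_until_stable(grab, threshold=0.95, interval=0.5, timeout=10.0, factor=4):
    """Return the first full-size screenshot whose downscaled version matches
    the previous frame, or the latest one once the timeout has passed."""
    deadline = time.monotonic() + timeout
    prev = downscale(grab(), factor)
    while True:
        time.sleep(interval)
        frame = grab()
        small = downscale(frame, factor)
        if similarity(prev, small) >= threshold or time.monotonic() >= deadline:
            return frame
        prev = small
```

Comparing a 4x-downscaled frame touches 16 times fewer pixels, which is the kind of saving optimization 2 is after.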
Four, Conclusion
This article is only a brief introduction to testing with image recognition. The author hopes to bring currently popular techniques such as machine learning and neural networks into image recognition and apply them to the development of testing tools, and hopes this article will inspire more colleagues to think about image recognition and apply it more widely in their work.
Copyright reserved. Reprinting is prohibited.