1. Background

Large Internet companies in China and abroad already have many app and game traversal testing tools, and many UI automation tasks run inside the company every day. As these automation tools mature, every daily task produces a large number of screenshots along its traversal path. Checking all of them for anomalies by hand is impractical: the labor cost is high, yet the people doing the checking rarely find an abnormal image. An automatic assertion capability for image anomalies is therefore needed to decide whether the screenshots of an automation task contain any kind of anomaly.

By providing a variety of image anomaly detection capabilities, we aim to explore and solve the automatic assertion of UI image anomalies across UI automation tasks, and to offer tools and services that are easy to integrate through several access modes.

2. Overall Plan

The solution consists of three main parts: algorithm capability access, an algorithm training framework, and an image database.

The solution offers two access modes: an SDK and a service. The SDK is currently available for Python and mainly suits confidential projects, or businesses that do not want their screenshots to leave the production environment; each image quality detection algorithm has its own API so that integrators can pick the ones they need. The HTTP service offers a general-purpose server-side access mode. In addition, an automation task plug-in can call image anomaly detection so that, once an automated case finishes, its process screenshots are asserted for image quality anomalies.

The training framework currently supports only image classification, covering image preprocessing, data augmentation, adversarial sample generation, model training, and so on. Frameworks for more image tasks, such as image segmentation and object detection, will be added later.

The image database draws on two main sources: abnormal images from real business data and an abnormal image generator. Because real abnormal images are scarce, we use algorithms to generate artificial abnormal images from normal ones. Business teams can also produce abnormal images by deliberately corrupting certain image resources.

The detection process has three parts: general quality detection algorithms, scene classification, and scene-specific algorithms. Scene classification is needed because many apps define some image quality anomalies differently for different scenes (such as a detail page versus a video page), so the algorithms applied have to be chosen per scene.
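The three-part flow can be sketched as a small dispatcher. This is a minimal illustration, not the actual service: `run_detection`, the check names, and the scene labels are hypothetical, and each check is assumed to be a function that takes a screenshot array and returns `True` when it finds an anomaly.

```python
from typing import Callable, Dict, List

import numpy as np

# A check takes a screenshot and returns True if it flags an anomaly (hypothetical interface).
Check = Callable[[np.ndarray], bool]


def run_detection(
    image: np.ndarray,
    general_checks: Dict[str, Check],
    classify_scene: Callable[[np.ndarray], str],
    scene_checks: Dict[str, Dict[str, Check]],
) -> List[str]:
    """Return the names of all checks that flag the image."""
    # 1. General quality checks run on every screenshot.
    flagged = [name for name, check in general_checks.items() if check(image)]
    # 2. Scene classification decides which customized checks apply.
    scene = classify_scene(image)
    # 3. Scene-specific checks (and thresholds) for that scene only.
    for name, check in scene_checks.get(scene, {}).items():
        if check(image):
            flagged.append(f"{scene}/{name}")
    return flagged
```

Keeping the scene classifier separate from the checks means a new scene only needs a new entry in `scene_checks`, not changes to the general detectors.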

3. Core Algorithms

Because abnormal data usually follows a long-tail distribution, most of our algorithms rely on traditional image processing. Where image processing cannot solve the problem, we combine abnormal image generation with deep learning.

Traditional image processing needs relatively little data but generalizes poorly. As abnormal data accumulates over time, we will gradually move to a two-stage mode: traditional methods for recall, followed by deep learning for precise classification.

3.1 Core Metrics

Besides accuracy and recall, we define two metrics for evaluating image anomaly detection algorithms: the detection rate and the false detection rate. Let X be the total number of images, Y the number of abnormal images, C the number of abnormal images correctly detected, and W the number of normal images incorrectly flagged as abnormal.

Detection rate:

$$M_{\text{detection}} = \frac{C}{Y}$$

False detection rate:

$$M_{\text{false}} = \frac{W}{X - Y}$$

The detection rate measures how well the algorithm picks out abnormal images from a large pool of images. The false detection rate measures the probability that a normal image is wrongly flagged as abnormal. For abnormal image detection we set a high bar on the detection rate and a looser one on the false detection rate: we do not want to miss any abnormal image, while a small number of misidentified images can be filtered out during manual review.
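The two formulas translate directly into code. The sketch below plugs in the white block test-set figures from Section 3.3.2, assuming that "15W" there means 150,000 normal images, so that X = 4836 + 150,000.

```python
def detection_rate(c: int, y: int) -> float:
    """M_detection = C / Y: share of abnormal images that were flagged."""
    return c / y


def false_detection_rate(w: int, x: int, y: int) -> float:
    """M_false = W / (X - Y): share of normal images that were wrongly flagged."""
    return w / (x - y)


def precision(c: int, w: int) -> float:
    """Share of flagged images that are truly abnormal."""
    return c / (c + w)


# White block test set (assumption: 150,000 normal images).
Y, C, W = 4836, 4826, 209
X = Y + 150_000
print(round(detection_rate(C, Y), 4))            # 0.9979
print(round(false_detection_rate(W, X, Y), 4))   # 0.0014
print(round(precision(C, W), 2))                 # 0.96
```

Because the denominator of the false detection rate counts only normal images, a detector can have a tiny false detection rate and still produce many false alarms in absolute terms when normal screenshots vastly outnumber abnormal ones.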

3.2 Black-and-white screen detection

(1) overview

Detects whether the screen is entirely black or entirely white. An example is shown below:

(2) principle

The screen is divided into several regions from top to bottom. Working on the grayscale image, the mean and standard deviation of each 32 × 32 pixel block in a region are computed and then averaged over the region; these averages decide whether the current screen is black or white. Each service can customize its own black-screen and white-screen criteria.
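A minimal NumPy sketch of the region test follows. It assumes each region is tiled into 32 × 32 blocks (partial edge blocks are dropped); the number of regions and all thresholds are placeholders that a service would tune, not values from the original system.

```python
import numpy as np


def block_stats(gray: np.ndarray, block: int = 32):
    """Average, over all block x block tiles, of each tile's mean and std."""
    h, w = gray.shape
    h2, w2 = h - h % block, w - w % block  # drop partial tiles at the edges
    if h2 == 0 or w2 == 0:
        raise ValueError("region is smaller than one block")
    tiles = (gray[:h2, :w2].astype(np.float64)
             .reshape(h2 // block, block, w2 // block, block)
             .swapaxes(1, 2))  # shape: (tile_rows, tile_cols, block, block)
    return tiles.mean(axis=(2, 3)).mean(), tiles.std(axis=(2, 3)).mean()


def classify_regions(gray, n_regions=4, block=32,
                     black_max_mean=20.0, white_min_mean=235.0, max_std=5.0):
    """Label each horizontal region 'black', 'white' or 'normal' (placeholder thresholds)."""
    bounds = np.linspace(0, gray.shape[0], n_regions + 1).astype(int)
    labels = []
    for top, bottom in zip(bounds[:-1], bounds[1:]):
        mean, std = block_stats(gray[top:bottom], block)
        if std <= max_std and mean <= black_max_mean:
            labels.append("black")
        elif std <= max_std and mean >= white_min_mean:
            labels.append("white")
        else:
            labels.append("normal")
    return labels
```

A low average standard deviation says the region is flat; the average mean then says whether that flat region is dark or bright. Per-region labels let a service decide, for example, that only a fully black screen counts, or that a black lower half already does.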

3.3 Damage Detection

Damage detection targets the damage anomalies that can occur in different service scenarios, and each service can choose which detection algorithms to invoke. The most common damage anomalies are purple blocks, white blocks, and screen glitches.

3.3.1 Abnormal purple block

(1) overview

Purple block anomalies are common in game scenes, and UI automation can detect them during traversal or case execution. They are usually caused by a damaged or missing texture or model image.

(2) principle

We compute the proportion of purple pixels in the screenshot to decide whether a purple block anomaly is present. Because the purple blocks that appear when a texture or model image is damaged are also affected by shading, purple is defined as a range rather than a single color: R >= 220, G <= 60, and B >= 220.
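The pixel test is a vectorized mask over the RGB channels, using the thresholds given above. The `min_ratio` cut-off that turns the proportion into a verdict is a hypothetical placeholder; the text does not state the value used.

```python
import numpy as np


def purple_ratio(rgb: np.ndarray) -> float:
    """Fraction of pixels inside the purple range R >= 220, G <= 60, B >= 220."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mask = (r >= 220) & (g <= 60) & (b >= 220)
    return float(mask.mean())


def has_purple_block(rgb: np.ndarray, min_ratio: float = 0.01) -> bool:
    """Flag the screenshot when purple pixels exceed a (hypothetical) share of the image."""
    return purple_ratio(rgb) >= min_ratio
```

Because the R and B conditions are identical, the same code works unchanged on OpenCV's BGR channel order.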

3.3.2 Abnormal white block

(1) overview

White block anomalies usually occur in game scenes, and UI automation can detect them during traversal or case execution. They are commonly caused by a corrupted or missing UI image.

(2) principle

We decide whether a white block anomaly is present by detecting white rectangular areas in the picture; the principle is shown in the figure above. When computing the pixel mean, the mean threshold can be relaxed somewhat so that white rectangle anomalies like the one shown below are also caught.
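One way to realize "detect white rectangles via the pixel mean" is sketched below, under stated assumptions: the screenshot is tiled into 16 × 16 blocks, a tile counts as white when its mean is high and its standard deviation low, and a connected group of white tiles is reported when it nearly fills its bounding box. Tile size, thresholds, and the fill test are all hypothetical; lowering `min_mean` is the "relaxed mean threshold" mentioned above.

```python
from collections import deque

import numpy as np


def white_tile_mask(gray, tile=16, min_mean=240.0, max_std=4.0):
    """True for tiles that are nearly uniform white; lower min_mean to relax the test."""
    h, w = gray.shape
    h2, w2 = h - h % tile, w - w % tile
    tiles = (gray[:h2, :w2].astype(np.float64)
             .reshape(h2 // tile, tile, w2 // tile, tile).swapaxes(1, 2))
    return (tiles.mean(axis=(2, 3)) >= min_mean) & (tiles.std(axis=(2, 3)) <= max_std)


def white_rectangles(mask, min_tiles=4, min_fill=0.9):
    """Bounding boxes (top, left, bottom, right), in tile units, of near-rectangular white groups."""
    rows, cols = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, cells = deque([(r0, c0)]), []
        while queue:  # breadth-first search over 4-connected white tiles
            r, c = queue.popleft()
            cells.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        rs, cs = [r for r, _ in cells], [c for _, c in cells]
        top, left, bottom, right = min(rs), min(cs), max(rs), max(cs)
        box_area = (bottom - top + 1) * (right - left + 1)
        # Keep groups that are large enough and fill most of their bounding box.
        if len(cells) >= min_tiles and len(cells) / box_area >= min_fill:
            boxes.append((int(top), int(left), int(bottom), int(right)))
    return boxes
```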

(3) effect

On the test set, the detection rate was 99.8% (4826/4836), the false detection rate was 0.1% (209/150,000), precision was 96%, and recall was 99.8%.

3.3.3 Screen glitch

(1) overview

Detects screen glitches (garbled or corrupted frames) in games or videos.

(2) principle

We built a classification dataset of abnormal images, obtained mainly through image-processing-based generation and manual collection, and trained a binary CNN classifier on it.

(3) effect

Test-set accuracy was 98.7% (961/974): 99.4% (484/487) on abnormal images and 98.1% (478/487) on normal images. Predicting a single image takes about 42 ms of CPU time, and the model is 4 MB.

3.4 Black border detection

(1) overview

Detects whether the image has a black border.

(2) principle

Checks whether the image is surrounded by a black area wider than a threshold.
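A minimal sketch of the border check on a grayscale image: count how many fully dark rows or columns run in from each edge, and flag the image when any run exceeds the threshold width. The darkness threshold and `min_width` are hypothetical defaults.

```python
import numpy as np


def black_border_widths(gray, dark_thr=10):
    """Widths (top, bottom, left, right) of fully dark rows/columns along each edge."""
    dark = np.asarray(gray) <= dark_thr
    rows, cols = dark.all(axis=1), dark.all(axis=0)

    def leading(flags):
        # Index of the first non-dark line, or the full length if every line is dark.
        return len(flags) if flags.all() else int(np.argmin(flags))

    return leading(rows), leading(rows[::-1]), leading(cols), leading(cols[::-1])


def has_black_border(gray, min_width=8, dark_thr=10):
    """Flag the image when any edge has a dark band wider than min_width pixels."""
    return any(w > min_width for w in black_border_widths(gray, dark_thr))
```

An entirely black image reports full-size widths on every side; in practice such images are better handled by the black screen check in Section 3.2.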

3.5 Overexposure detection

(1) overview

Detects whether a special effect in the image is overexposed.

(2) principle

  1. Binarize the image to keep the highlighted areas.
  2. Remove noise with a morphological opening.
  3. Filter the remaining highlighted areas by area and shape.
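The three steps can be sketched in NumPy alone. This is an illustration under assumptions: the brightness threshold, the 3 × 3 opening kernel, the minimum area, and the shape test (the blob must fill at least half of its bounding box, which rejects thin streaks) are hypothetical choices, not the original parameters.

```python
from collections import deque

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _morph(mask, k, reduce):
    """Erosion (reduce=np.all) or dilation (reduce=np.any) with a k x k square kernel."""
    p = k // 2
    padded = np.pad(mask, p, constant_values=False)
    return reduce(sliding_window_view(padded, (k, k)), axis=(2, 3))


def open_mask(mask, k=3):
    """Step 2: morphological opening = erosion followed by dilation."""
    return _morph(_morph(mask, k, np.all), k, np.any)


def components(mask):
    """Yield 4-connected components as lists of (row, col) pixels."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, pixels = deque([(r0, c0)]), []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        yield pixels


def is_overexposed(gray, bright_thr=250, min_area=200, min_fill=0.5):
    mask = open_mask(np.asarray(gray) >= bright_thr)  # steps 1 and 2
    for pixels in components(mask):                    # step 3: area and shape filter
        rows = [r for r, _ in pixels]
        cols = [c for _, c in pixels]
        box = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
        if len(pixels) >= min_area and len(pixels) / box >= min_fill:
            return True
    return False
```

The opening removes isolated bright pixels and one-pixel lines before the area test, so scattered sparkle effects do not count as overexposure.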

3.6 Text overlap detection

(1) overview

Checks whether text overlaps within the text areas of a page.

(2) Abnormal data construction

In real scenes, normal screenshots are plentiful. We use OCR to locate text areas, extract the background and text colors algorithmically, and superimpose offset text of a chosen size, font, and color to construct abnormal text-overlap images. This yields a dataset of text-overlap regions with more than 20,000 abnormal regions.
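The sketch below is a simplified stand-in for this construction, not the original generator. Instead of rendering a new string in a chosen font, it re-paints the crop's own glyph pixels at an offset, which also produces overlapping text. It assumes the OCR crop and its text color are already known (the color-extraction step is not shown); the offset and color tolerance are hypothetical.

```python
import numpy as np


def shifted_text_overlay(crop, text_color, dy=3, dx=4, tol=40):
    """Return (abnormal_crop, shifted_mask): glyph pixels copied again at offset (dy, dx)."""
    crop = np.asarray(crop)
    diff = np.abs(crop.astype(np.int16) - np.asarray(text_color, dtype=np.int16))
    glyph = diff.max(axis=-1) <= tol  # pixels close to the text colour
    h, w = glyph.shape
    shifted = np.zeros_like(glyph)
    # Move every glyph pixel (r, c) to (r + dy, c + dx), dropping pixels that leave the crop.
    shifted[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        glyph[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    out = crop.copy()
    out[shifted] = text_color
    return out, shifted
```

Randomizing `dy` and `dx` per sample gives a spread of overlap severities from the same normal crop.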

(3) principle

Text areas are small relative to the full screenshot, and our experiments showed that using the whole image as detector input was not feasible, so we finally adopted the following scheme:

OCR extracts the positions of text regions, and each region is fed to a trained text-overlap classifier that decides whether it contains overlapping text. On the test set, accuracy is 88% and recall is 62%.
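The crop-then-classify scheme can be sketched as follows. The OCR engine and the trained classifier are not shown: `boxes` is assumed to be a list of `(x, y, w, h)` rectangles from a hypothetical OCR step, and `overlap_prob` is a hypothetical callable returning the classifier's overlap probability for a crop. The padding and threshold are placeholders.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x, y, w, h) from a hypothetical OCR step


def find_overlapping_text(
    image: np.ndarray,
    boxes: Sequence[Box],
    overlap_prob: Callable[[np.ndarray], float],
    threshold: float = 0.5,
    pad: int = 2,
) -> List[Box]:
    """Classify each OCR text region separately instead of the whole screenshot."""
    h, w = image.shape[:2]
    flagged = []
    for x, y, bw, bh in boxes:
        # Small padding keeps strokes that the OCR box cuts off at the edge.
        x0, y0 = max(x - pad, 0), max(y - pad, 0)
        x1, y1 = min(x + bw + pad, w), min(y + bh + pad, h)
        crop = image[y0:y1, x0:x1]
        if crop.size and overlap_prob(crop) >= threshold:
            flagged.append((x, y, bw, bh))
    return flagged
```

Classifying crops keeps small text large relative to the classifier's input, which is the reason the whole-image approach was dropped.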

3.7 Unsupervised Clustering

(1) overview

In abnormal image detection, different scenes call for different detection algorithms, and even the same algorithm needs different criteria in different scenes. An algorithm is therefore needed to cluster the many scenes of the same app so that each scene can get customized detection. Here we derive a scene partition from massive numbers of automation screenshots in an unsupervised way, which avoids manually annotating large amounts of data and the long training and tuning cycle of supervised learning.

(2) principle

  1. SIFT keypoints and descriptors are extracted at multiple scales with SIFT + Spatial Pyramid Pooling; three scales are used here.
  2. A codebook is obtained by KMeans clustering of the descriptors extracted at all scales from all images in the dataset.
  3. Using the codebook, each image's SIFT features are turned into histogram vectors, and the 14 histograms from the three scales (1 + 4 + 9 cells) are concatenated into the image's global representation.
  4. Bisecting KMeans clusters the global vectors adaptively, without manually specifying the number of clusters.
  5. After visualizing the final clusters, clusters that belong to the same scene are merged to obtain the final scene classifier.
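Steps 2 to 4 can be sketched in NumPy, assuming SIFT keypoint positions and descriptors are already extracted (for example with OpenCV) and a codebook has already been fit with KMeans. The stopping rule for the adaptive bisecting KMeans (split any cluster whose mean squared distance to its centroid exceeds `max_spread`) is a hypothetical choice; the text does not say how the number of clusters is decided.

```python
import numpy as np


def assign_words(descriptors, codebook):
    """Step 3a: index of the nearest codebook vector (visual word) for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def spm_vector(points, words, width, height, vocab_size, grids=(1, 2, 3)):
    """Step 3b: concatenate word histograms of 1x1, 2x2, 3x3 cells (1 + 4 + 9 = 14)."""
    hists = []
    for g in grids:
        col = np.minimum((points[:, 0] * g / width).astype(int), g - 1)
        row = np.minimum((points[:, 1] * g / height).astype(int), g - 1)
        cell = row * g + col
        for c in range(g * g):
            hists.append(np.bincount(words[cell == c], minlength=vocab_size))
    vec = np.concatenate(hists).astype(np.float64)
    return vec / max(vec.sum(), 1.0)


def _two_means(x, rng, iters=20):
    """Plain 2-means used for each bisection."""
    centers = x[rng.choice(len(x), 2, replace=False)].copy()
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels


def adaptive_bisecting_kmeans(x, max_spread=1.0, min_size=2, seed=0):
    """Step 4: keep bisecting clusters whose spread exceeds max_spread (hypothetical rule)."""
    x = np.asarray(x, dtype=np.float64)
    rng = np.random.default_rng(seed)
    pending, done = [np.arange(len(x))], []
    while pending:
        idx = pending.pop()
        pts = x[idx]
        spread = ((pts - pts.mean(axis=0)) ** 2).sum(axis=1).mean()
        if len(idx) < 2 * min_size or spread <= max_spread:
            done.append(idx)
            continue
        labels = _two_means(pts, rng)
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) == 0 or len(right) == 0:
            done.append(idx)
            continue
        pending.extend([left, right])
    out = np.empty(len(x), dtype=int)
    for k, idx in enumerate(done):
        out[idx] = k
    return out
```

Each keypoint is counted once per pyramid level, so the normalized vector weights the three levels equally; weighting finer levels more heavily is a common variant.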

(3) effect

4. Technical Difficulties

4.1 Data

For every anomaly detection task, obtaining abnormal data is the hardest part. To address this, we used three ways of creating abnormal images artificially: the GrabCut algorithm (purple block anomalies), random image overlay (screen glitch anomalies), and text area overlay (text overlap anomalies).

GrabCut segments the foreground of an image from its background using Gaussian mixture models (GMMs) and graph cuts. After extracting foreground objects with GrabCut, we fill them with purple to form artificial abnormal images.
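A sketch of this generator using OpenCV's `cv2.grabCut`, initialized from a bounding box around the object. The box, the iteration count, and the pure magenta fill are assumptions for illustration; pure magenta falls inside the purple range used by the detector in Section 3.3.1. OpenCV is imported inside the segmentation function so the fill step can be used on its own.

```python
import numpy as np

PURPLE = np.array([255, 0, 255], dtype=np.uint8)  # same value in RGB and BGR order


def grabcut_foreground(img_bgr, rect, iterations=5):
    """Boolean foreground mask from GrabCut initialized with rect = (x, y, w, h)."""
    import cv2  # only the segmentation step needs OpenCV

    mask = np.zeros(img_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img_bgr, mask, rect, bgd_model, fgd_model, iterations, cv2.GC_INIT_WITH_RECT)
    # Definite and probable foreground both count as the object.
    return (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)


def fill_purple(img, fg_mask):
    """Paint the foreground purple to imitate a missing texture or model image."""
    out = img.copy()
    out[fg_mask] = PURPLE
    return out
```

Usage, for an 8-bit BGR screenshot `img` and an object box: `fake = fill_purple(img, grabcut_foreground(img, (x, y, w, h)))`.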

The random overlay method is mainly used to generate screen-glitch images for games. The few real screen-glitch images we had showed that these anomalies are usually caused by content from the previous and next frames being mixed together, so randomly superimposing frames produces a batch of abnormal images very close to real ones.
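The text does not specify how frames are superimposed, so the sketch below assumes one plausible variant: paste randomly sized rectangles from an adjacent frame onto the current frame. The patch count and maximum patch size are hypothetical parameters.

```python
import numpy as np


def random_frame_overlay(current, other, n_patches=6, max_frac=0.4, seed=None):
    """Paste random rectangles from `other` (e.g. the next frame) onto `current`."""
    if current.shape != other.shape:
        raise ValueError("frames must have the same shape")
    rng = np.random.default_rng(seed)
    out = current.copy()
    h, w = current.shape[:2]
    for _ in range(n_patches):
        # Patch size between 1 pixel and max_frac of each dimension.
        ph = int(rng.integers(1, max(1, int(h * max_frac)) + 1))
        pw = int(rng.integers(1, max(1, int(w * max_frac)) + 1))
        top = int(rng.integers(0, h - ph + 1))
        left = int(rng.integers(0, w - pw + 1))
        out[top:top + ph, left:left + pw] = other[top:top + ph, left:left + pw]
    return out
```

Using real consecutive frames keeps colors and textures in the pasted areas realistic, which is what makes the synthetic glitches close to real ones.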

Text area overlay works as described in Section 3.6: OCR locates text areas, the background and text colors are extracted algorithmically, and offset text of a chosen size, font, and color is superimposed, yielding a text-overlap dataset with more than 20,000 abnormal regions.

4.2 Overconfidence

Classifying artificial abnormal images is too easy a task for a deep convolutional neural network, so the final model's prediction scores are usually close to 1 (typically around 0.99), which is very unhelpful in practice. To mitigate this overconfidence, we introduced label smoothing as a regularizer.

4.2.1 Label Smoothing

Label smoothing replaces the hard one-hot label (1 for the target class, 0 for all others) with a soft label (close to 1 for the target class, close to 0 for the others) when computing the loss, which discourages overconfident predictions.

This regularization has been shown to be effective in many classification tasks.
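In its standard form, label smoothing mixes the one-hot target with a uniform distribution: y_soft = (1 - eps) * y_onehot + eps / K for K classes. The sketch uses eps = 0.1 as an illustrative value; the text does not state the one used. It shows why smoothing penalizes overconfidence: with the soft target, a prediction of 0.99 costs more than one of 0.95, while with the hard target the reverse holds.

```python
import numpy as np


def smooth_labels(one_hot, eps=0.1):
    """Soft targets: (1 - eps) * one_hot + eps / K."""
    one_hot = np.asarray(one_hot, dtype=np.float64)
    k = one_hot.shape[-1]
    return (1.0 - eps) * one_hot + eps / k


def cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and (possibly soft) targets."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=-1).mean())


soft = smooth_labels([1.0, 0.0])  # [0.95, 0.05] for a binary normal/abnormal task
print(cross_entropy(np.log([0.95, 0.05]), soft) < cross_entropy(np.log([0.99, 0.01]), soft))  # True
```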

4.3 Overly Simple Samples

Because the artificial samples are too simple, the model may misclassify images whose class is obvious to a human. We therefore added adversarial examples built from the artificial samples and from real normal images, to make the classification boundary clearer and more robust.

4.3.1 Adversarial Examples

In classification tasks, adding tiny noise that is invisible to the naked eye to a correctly classified image can sometimes make the classifier completely wrong, even though the image looks unchanged. Such inputs are called adversarial examples.

Training with adversarial examples makes the boundary of the data distribution learned by the classifier clearer, and keeps the model from relying on visual cues that we cannot observe.
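The text does not say how its adversarial examples were generated. One widely used method is the fast gradient sign method (FGSM): x_adv = x + eps * sign(dL/dx). The sketch below applies FGSM to a logistic regression model standing in for the CNN, so the input gradient has the closed form (p - y) * w; the model, eps, and the [0, 1] pixel range are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic regression model on a flattened image x."""
    p = sigmoid(x @ w + b)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fgsm(w, b, x, y, eps=0.01):
    """FGSM step; for logistic regression dL/dx = (p - y) * w."""
    p = sigmoid(x @ w + b)
    grad = (p - y) * w
    # Each pixel moves by at most eps, then stays inside the valid [0, 1] range.
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)


rng = np.random.default_rng(0)
w, b = rng.normal(0, 0.1, 100), 0.0
x, y = rng.uniform(0.2, 0.8, 100), 1.0
x_adv = fgsm(w, b, x, y, eps=0.05)
print(bce_loss(w, b, x_adv, y) > bce_loss(w, b, x, y))  # True: the tiny change raises the loss
```

Adding such perturbed images, with their original labels, to the training set is what pushes the decision boundary away from the training samples.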

4.4 Computing Resources

During testing, the volume of image analysis requests is large, so the algorithms must respond in near real time. Of the algorithms above, screen-glitch detection demands the most computing resources. Using MobileNet, a lightweight network designed for mobile devices, together with TensorFlow Lite uint8 quantization, gives a large speedup and saves computing resources with little loss of accuracy.
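The arithmetic behind uint8 quantization is an affine mapping, real ≈ scale × (q − zero_point). The sketch below quantizes a float array per tensor and shows that the round-trip error stays within half a quantization step; it illustrates the scheme only and is not TensorFlow Lite's converter code.

```python
import numpy as np


def quantize_uint8(x):
    """Affine (asymmetric) quantization: real ~= scale * (q - zero_point)."""
    x = np.asarray(x, dtype=np.float64)
    lo, hi = min(float(x.min()), 0.0), max(float(x.max()), 0.0)  # range must contain 0
    scale = (hi - lo) / 255.0 or 1.0  # avoid division by zero for an all-zero tensor
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float64) - zero_point)


weights = np.random.default_rng(0).normal(0, 0.5, 1000)
q, s, z = quantize_uint8(weights)
print(np.abs(dequantize(q, s, z) - weights).max() <= s / 2 + 1e-9)  # True
```

Storing one byte per value instead of a 32-bit float cuts weight storage by a factor of four, and integer arithmetic is what yields the speedup on CPU.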

5. Deployment Results

Deployment experiments were run only for the three algorithms used in game scenes: purple block, white block, and screen glitch.

5.1 Test-Set Results

| Algorithm | Detection rate | False detection rate |
|---|---|---|
| Purple block | 96.4% (53/55) | 0% (0/150,000) |
| White block | 99.8% (4826/4836) | 0.1% (209/150,000) |

On the test set, the screen-glitch algorithm reached 98.7% accuracy (961/974): 99.4% (484/487) on abnormal images and 98.1% (478/487) on normal images, with about 42 ms of CPU time per image and a 4 MB model.

5.2 Results in Production

| Screenshots | Total | Abnormal | Normal |
|---|---|---|---|
| All screenshots | 13856 | 1 | 13855 |
| Flagged as purple block | 1 | 1 | 0 |
| Flagged as white block | 70 | 0 | 70 |
| Flagged as screen glitch | 140 | 0 | 140 |

Purple block

| Actual (manual check) | Predicted abnormal | Predicted normal |
|---|---|---|
| Abnormal | 1 | 0 |
| Normal | 0 | 13855 |

White block

| Actual (manual check) | Predicted abnormal | Predicted normal |
|---|---|---|
| Abnormal | 0 | 0 |
| Normal | 70 | 13786 |

Screen glitch

| Actual (manual check) | Predicted abnormal | Predicted normal |
|---|---|---|
| Abnormal | 0 | 0 |
| Normal | 140 | 13716 |

As the results show, the image quality detection plug-in greatly reduces the number of screenshots that need manual assertion (13856 -> 211). As for the algorithms themselves, the purple block algorithm needs no further optimization, while the white block and screen-glitch algorithms need to collect their false detections for further tuning.