Background
As an application's user base grows and the business becomes more complex, performance problems become more prominent, especially on low-end phones, to the point of affecting important metrics such as user activity and session duration, so improving the application's performance on low-end phones is urgent. To evaluate the optimization work of R&D engineers reasonably, we need to consider the following two points:
- Avoid one-off, campaign-style performance optimization: many teams invest a lot of time and energy in dedicated performance pushes, but without routine monitoring and governance measures, performance eventually fluctuates and degrades again;
- Online instrumentation (logging) data cannot fully reflect users' real experience and perception of the application.
One of the indicators that most affects user experience is startup time, especially for newly acquired users. There are generally two directions for measuring it. One is technical instrumentation, but it is difficult to gauge the user's real perceived experience from the data it records (are the online statistics that look good actually real?), and instrumentation cannot produce data for competitor products. The other is screen recording with frame-by-frame analysis, but manually stepping through recordings frame by frame is prone to inconsistent judgments about where startup ends, and the ROI of continuously running such manual performance tests is low: recording ten runs and extracting key frames to compute an average takes more than half an hour, and the more samples, the longer it takes. Since I have been reading machine learning books recently, I wondered whether I could put them into practice on this case.
Before this, I also investigated similar schemes in the industry, including OCR text recognition and image comparison. If image comparison compares the whole frame, the advertisements and home-page posters shown during startup change between runs, so frames cannot be identified accurately. If only part of the frame is compared, the partially rendered first screen may not appear in the same place every time the app finishes launching. So, after referring to the various schemes and combining them with my own ideas, I implemented the whole scheme, which I will introduce in detail below.
The overall process
- Phase one: collect data, convert the recorded videos into images, and generate training and test data
- Phase two: train the model and evaluate its quality
- Phase three: use the trained model to predict each frame and calculate the startup time
Environment preparation
Since the whole solution is implemented in Python, a local Python environment is required. I am using a Mac, which ships with Python by default, but to use Python 3 you need to install it yourself; this also provides the pip3 tool:
brew install python3
Install scikit-learn, a simple machine learning framework, together with the scientific computing package NumPy and the algorithm library SciPy that it depends on:
pip3 install scikit-learn
pip3 install numpy
pip3 install scipy
Install the image processing libraries OpenCV and imutils:
pip3 install opencv-contrib-python
pip3 install imutils
Install ffmpeg for extracting frames from video files:
brew install ffmpeg
Install the Airtest framework (a cross-platform UI automation framework from NetEase):
pip3 install -U airtest
Install the Poco framework (a cross-platform UI automation framework from NetEase):
pip3 install pocoui
Note: You need to turn on the touch feedback switch in the Android phone’s developer options so that you can identify exactly when you click the app icon.
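As an alternative to tapping through the settings UI, the "Show taps" option can usually also be enabled via adb:
adb shell settings put system show_touches 1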
Phase one
First installation
Since various permission dialogs appear when the application is installed for the first time, to avoid affecting the accuracy of the test we need to dismiss the first-install pop-ups, then kill the application and restart it to measure the cold startup time.
In addition, to reflect the user's real perception, the real process of tapping the application icon to launch it should be simulated; the application cannot simply be brought up directly via adb. I click the desktop application icon through the Poco framework.
poco = AndroidUiautomationPoco()
poco.device.wake()
poco(text='Application name').click()
poco(text='Next step').click()
poco(text='allow').click()
poco(text='allow').click()
poco(text='allow').click()
os.system("adb shell am force-stop {}".format(package_name))
Start screen recording
Run the adb command to start the screen recording service. --time-limit 20 means the recording lasts 20 seconds; startup and home-page rendering can generally be completed within 20 seconds.
Screen recording is started in a separate process so that it does not block the test.
subprocess.Popen("adb shell screenrecord --time-limit 20 /sdcard/sample.mp4", shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
Start the application
Install the application under test before the test. Then, after clicking through the permission pop-ups, kill the process and click the desktop icon again to start the application.
os.system("adb install -r {}".format(apk_path))
poco(text="Application name").click()
After the screen recording finishes, kill the process and then repeat the above startup process; the desired sampling rate determines how many times to repeat (a rough sketch of the loop follows the snippet below).
os.system("adb shell am force-stop {}".format(package_name))
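Putting the pieces above together, the sampling loop might look roughly like this (a sketch only: the per-sample file naming and the count of 10 are assumptions, and poco and package_name are set up as in the earlier snippets):
for i in range(10):                                   # number of samples; adjust to the desired sampling rate
    # start the 20-second screen recording in the background
    subprocess.Popen("adb shell screenrecord --time-limit 20 /sdcard/sample.mp4",
                     shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    poco(text="Application name").click()             # launch the app from the desktop icon
    time.sleep(20)                                    # wait for the recording to finish
    os.system("adb pull /sdcard/sample.mp4 sample/sample_{:02d}.mp4".format(i))  # assumed local naming
    os.system("adb shell am force-stop {}".format(package_name))                 # kill before the next run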
Frame extraction
The recorded video is pulled from the phone to the local machine and then split into frames with ffmpeg.
os.system("adb pull /sdcard/sample.mp4 {}".format(video_local_path))
os.system("ffmpeg -i {} -r 60 {}%d.jpeg"Format (video_LOCAL_PATH, test_path)) -r specifies the frame rate to extract, that is, the number of pictures to extract from the video per second. 60 means 60 frames per second.Copy the code
Extract training set and test set data
We generally split the data into a training set and a test set in an 80/20 proportion. Here we can record 10 groups of data, with 8 groups used as the training set and 2 groups as the test set.
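A minimal sketch of such a split (file and directory names are assumptions; the frames placed under train_raw would then be hand-labeled into the stage folders described in phase two):
import os

recordings = ["sample_{:02d}.mp4".format(i) for i in range(1, 11)]   # 10 recorded startups
train_videos, test_videos = recordings[:8], recordings[8:]           # 80/20 split

os.makedirs("dataset/train_raw", exist_ok=True)   # staging area for frames to be labeled
os.makedirs("dataset/test", exist_ok=True)

for video in train_videos:
    name = os.path.splitext(video)[0]
    os.system("ffmpeg -i {} -r 60 dataset/train_raw/{}_%d.jpeg".format(video, name))
for video in test_videos:
    name = os.path.splitext(video)[0]
    os.system("ffmpeg -i {} -r 60 dataset/test/{}_%d.jpeg".format(video, name))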
Phase two
Manual annotation of training set data
Since we identify each stage of startup with an image classification algorithm, we first need to define the stages of startup. It is divided into five stages:
- 0_desk: desktop stage
- 1_start: stage of clicking the app icon
- 2_splash: stage when the splash screen appears
- 3_loading: loading stage of the home page
- 4_stable: stage when the home page has rendered and is stable
The five stages are pictured below:
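In practice, labeling means sorting the extracted training frames into one folder per stage. The training code below reads this layout from dataset/train/ (the exact layout is my assumption based on that code, with the five stage names as folder names):
dataset/
├── train/
│   ├── 0_desk/
│   ├── 1_start/
│   ├── 2_splash/
│   ├── 3_loading/
│   └── 4_stable/
└── test/          # unlabeled frames used for validation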
Feature extraction and descriptor generation
SIFT features are chosen here. SIFT features are invariant to scale, rotation and illumination, and are also somewhat robust to geometric distortion and deformation of the image. Using the SIFT interface in OpenCV's contrib extension module for Python, the SIFT keypoints and descriptors of each image can be extracted.
Bag-of-words generation
Bag-of-words generation means producing a vector representation of each image from its descriptors. The most common approach is to first run K-Means clustering on the descriptor data, typically into 100 clusters, and take the centre of each cluster to form a vocabulary of 100 visual words. Each descriptor is then assigned to the cluster whose centre is nearest, and the per-cluster counts form the image's histogram representation.
SVM classification training and model generation
SVM is used to train the classifier and produce the output model. Here sklearn's linear SVM is used to train the classification model and export it.
import cv2
from imutils import paths
import numpy as np
import os
from sklearn.svm import LinearSVC
from sklearn.externals import joblib
from scipy.cluster.vq import *
from sklearn.preprocessing import StandardScaler
# Get the training classes names and store them in a list
train_path = "dataset/train/"
training_names = os.listdir(train_path)
# Get all the paths to the images and save them in a list
# image_paths, and the corresponding labels in image_classes
image_paths = []
image_classes = []
class_id = 0
for training_name in training_names:
    class_dir = os.path.join(train_path, training_name)
    class_path = list(paths.list_images(class_dir))   # list the images of this class
    image_paths += class_path
    image_classes += [class_id] * len(class_path)
    class_id += 1
# Create SIFT feature extractor
sift = cv2.xfeatures2d.SIFT_create()
# Feature extraction and descriptor generation
des_list = []
for image_path in image_paths:
    im = cv2.imread(image_path)
    im = cv2.resize(im, (300, 300))
    kpts = sift.detect(im)
    kpts, des = sift.compute(im, kpts)
    des_list.append((image_path, des))
    print("image file path : ", image_path)
# Stack all descriptors vertically into one array
descriptors = des_list[0][1]
for image_path, descriptor in des_list[1:]:
    descriptors = np.vstack((descriptors, descriptor))
# K-Means clustering into 100 clusters to build the visual vocabulary
k = 100
voc, variance = kmeans(descriptors, k, 1)
# Generate a feature histogram for each image
im_features = np.zeros((len(image_paths), k), "float32")
for i in range(len(image_paths)):
    words, distance = vq(des_list[i][1], voc)
    for w in words:
        im_features[i][w] += 1
# Tf-Idf weighting: count in how many images each visual word occurs
nbr_occurences = np.sum((im_features > 0) * 1, axis=0)
idf = np.array(np.log((1.0 * len(image_paths) + 1) / (1.0 * nbr_occurences + 1)), 'float32')
# Standardize the feature vectors
stdSlr = StandardScaler().fit(im_features)
im_features = stdSlr.transform(im_features)
# Train the Linear SVM
clf = LinearSVC()
clf.fit(im_features, np.array(image_classes))
# Save the SVM
print("training and save model...")
joblib.dump((clf, training_names, stdSlr, k, voc), "startup.pkl", compress=3)
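Note: newer versions of scikit-learn no longer ship sklearn.externals.joblib; if that import fails, installing the standalone joblib package (pip3 install joblib) and using import joblib instead works the same way.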
Prediction and validation
Load the trained model and use it to predict on the test-set data. The test results show that classifying the startup-stage images in this way works well.
Here is the code implementation of the prediction method:
import cv2 as cv
import numpy as np
from imutils import paths
from scipy.cluster.vq import *
from sklearn.externals import joblib
def predict_image(image_path, pkl):
    # Load the classifier, class names, scaler, number of clusters and vocabulary
    clf, classes_names, stdSlr, k, voc = joblib.load(pkl)
    # Create feature extraction and keypoint detector objects
    sift = cv.xfeatures2d.SIFT_create()
    # List where all the descriptors are stored
    des_list = []
    im = cv.imread(image_path, cv.IMREAD_GRAYSCALE)
    im = cv.resize(im, (300, 300))
    kpts = sift.detect(im)
    kpts, des = sift.compute(im, kpts)
    des_list.append((image_path, des))
    descriptors = des_list[0][1]
    for image_path, descriptor in des_list[1:]:
        descriptors = np.vstack((descriptors, descriptor))
    test_features = np.zeros((1, k), "float32")
    words, distance = vq(des_list[0][1], voc)
    for w in words:
        test_features[0][w] += 1
    # Perform Tf-Idf vectorization
    nbr_occurences = np.sum((test_features > 0) * 1, axis=0)
    idf = np.array(np.log((1.0 + 1) / (1.0 * nbr_occurences + 1)), 'float32')
    # Scale the features
    test_features = stdSlr.transform(test_features)
    # Perform the predictions
    predictions = [classes_names[i] for i in clf.predict(test_features)]
    return predictions
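A quick usage sketch (the frame file name is a hypothetical example):
prediction = predict_image("dataset/test/123.jpeg", "startup.pkl")
print(prediction)  # e.g. ['2_splash']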
Phase three
Collect a new startup video
The same approach as in phase one is used.
Use the model to make predictions
Prediction is done in the same way as the validation in phase two.
Calculate startup time
Based on the prediction results, we locate the first image classified as the icon-click stage and the first image classified as the stable home-page stage, and take the frame-number difference between the two. If frames were extracted at 60 per second, the total time = frame difference * 1/60 seconds. The code for this calculation is as follows:
import os
import shutil
import subprocess
import time

from airtest.core.api import *
from dingtalkchatbot.chatbot import DingtalkChatbot
from imutils import paths
from poco.drivers.android.uiautomation import AndroidUiautomationPoco

webhook = 'https://oapi.dingtalk.com/robot/send?access_token='
robot = DingtalkChatbot(webhook)
def calculate(package_name, apk_path, pkl, device_name, app_name, app_version):
    sample = 'sample/screen.mp4'
    test_path = "dataset/test/"
    if not os.path.isdir('sample/'):
        os.makedirs('sample/')
    if not os.path.isdir(test_path):
        os.makedirs(test_path)
    try:
        os.system("adb uninstall {}".format(package_name))
        os.system("adb install -r {}".format(apk_path))
        poco = AndroidUiautomationPoco()
        poco.device.wake()
        time.sleep(2)
        # First launch: click through the permission pop-ups, then kill the process
        poco(text='Application name').click()
        poco(text='Next step').click()
        poco(text='allow').click()
        poco(text='allow').click()
        poco(text='allow').click()
        os.system("adb shell am force-stop {}".format(package_name))
        # Record the screen in the background and launch the app from the desktop icon
        subprocess.Popen("adb shell screenrecord --time-limit 20 /sdcard/sample.mp4", shell=True,
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        poco(text="Application name").click()
        time.sleep(20)
        os.system("adb pull /sdcard/sample.mp4 {}".format(sample))
        os.system("adb uninstall {}".format(package_name))
        # Extract frames at 60 fps and classify each one
        os.system("ffmpeg -i {} -r 60 {}%d.jpeg".format(sample, test_path))
        image_paths = []
        class_path = list(paths.list_images(test_path))
        image_paths += class_path
        start = []
        stable = []
        for image_path in image_paths:
            predictions = predict_image(image_path, pkl)
            if predictions[0] == '1_start':
                start += [str(image_path.split('/')[2]).split('.')[0]]
            elif predictions[0] == '4_stable':
                stable += [str(image_path.split('/')[2]).split('.')[0]]
        # Sort frame numbers numerically to find the first frame of each stage
        start_time = int(sorted(start, key=int)[0])
        stable_time = int(sorted(stable, key=int)[0])
        print("Time :%.2f seconds" % ((stable_time - start_time) / 60))
        robot.send_text(
            msg="Startup time automatic test results:\n Device under test: {}\n Application under test: {}\n Version under test: {}\n".format(
                device_name, app_name, app_version) + "Startup time: %.2f seconds" % (
                (stable_time - start_time) / 60),
            is_at_all=True)
    except:
        shutil.rmtree(test_path)
        if os.path.exists(sample):
            os.remove(sample)


if __name__ == "__main__":
    calculate("package_name", "app/app-release.apk", "startup.pkl", "Xiaomi MIX3", "Application name", "10.1.1")
Continuous integration
Based on the parameters taken by the test method above, a Jenkins job can be configured: the three phases, including model training, are wrapped as Python scripts, and a webhook ties the job to the packaging platform, so the first-screen loading time of the latest build is verified, analysed and calculated automatically.
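For example, the Jenkins job's build step could be as simple as the following shell commands (the script and requirements file names are assumptions; the article's script takes its parameters in its __main__ block):
pip3 install -r requirements.txt
python3 startup_test.py   # wraps the three phases and posts the result to DingTalk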
The effect
Compared with manually recording the screen and then stepping through the timeline frame by frame in QuickTime, the first-screen loading time calculated by this scheme differs by no more than roughly 100 ms. The manual process takes about 15 minutes per measurement, while this scheme takes about 3 minutes, which significantly improves efficiency and also avoids inconsistent judgment criteria between different people.