This three-part series is based on a talk given by Fei Tai, senior development engineer on the Mobile Taobao client infrastructure team, at the Android Green Alliance Developer Conference. It introduces the design principles and implementation of EMAS-MOTU, the systematic solution the Mobile Taobao technology team built to improve app performance and stability.

This article focuses on how the mobile high-availability platform carries out performance and stability governance.

Performance management

Main thread jank

Main-thread jank happens when processing a message on the main thread takes longer than a threshold, causing the page to drop frames. By taking over the main thread's message dispatching mechanism, Mobile Taobao can obtain the dispatch time and type of each message, locate the specific business that caused the delay, and govern it in a targeted way.
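
One widely used way to take over main-thread dispatching, not necessarily the exact mechanism Mobile Taobao uses, is to give the main Looper a logging Printer through Looper.setMessageLogging(). Looper prints a line starting with ">>>>> Dispatching to" before each message and one starting with "<<<<< Finished to" after it, so timing the gap between the pair exposes slow messages, and the log text names the target Handler. The sketch below assumes that log format; `LinePrinter` (a stand-in for android.util.Printer so the code compiles off-device), `SlowMessageMonitor`, and the threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Stand-in with the same shape as android.util.Printer, so this compiles off-device.
interface LinePrinter {
    void println(String line);
}

// Times each Looper message by pairing its "Dispatching" and "Finished" log lines.
final class SlowMessageMonitor implements LinePrinter {
    private static final String BEGIN = ">>>>> Dispatching to ";
    private static final String END = "<<<<< Finished to ";

    private final long thresholdMs;
    private final LongSupplier clockMs;
    private final List<String> slowMessages = new ArrayList<>();
    private long startMs = -1;
    private String current = "";

    SlowMessageMonitor(long thresholdMs, LongSupplier clockMs) {
        this.thresholdMs = thresholdMs;
        this.clockMs = clockMs;
    }

    @Override
    public void println(String line) {
        if (line.startsWith(BEGIN)) {
            // Logged just before dispatch: names the target Handler, callback and msg.what.
            startMs = clockMs.getAsLong();
            current = line.substring(BEGIN.length());
        } else if (line.startsWith(END) && startMs >= 0) {
            long costMs = clockMs.getAsLong() - startMs;
            if (costMs > thresholdMs) {
                slowMessages.add(costMs + "ms " + current);
            }
            startMs = -1;
        }
    }

    List<String> slowMessages() {
        return slowMessages;
    }
}
```

On a device, `Looper.getMainLooper().setMessageLogging(monitor::println)` would wire it in, because android.util.Printer declares a single println(String) method. Log-based timing only sees whole messages; it cannot break down what happened inside one.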

In addition, the team found that page transitions could cause ANRs when the system SharedPreferences was used. Reading the system source code showed that when the framework stops an Activity or handles a Service or BroadcastReceiver, it waits until every pending SharedPreferences apply() write has been flushed to disk, and this wait can trigger an ANR. To address this, Mobile Taobao rewrote SharedPreferences, improving performance and reducing this class of ANR.
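
Mobile Taobao's rewritten SharedPreferences is not described in detail here, so the following is only a minimal sketch of the idea under stated assumptions: an apply()-style write updates memory at once and queues the disk write on a private background thread that no lifecycle callback ever waits on. `AsyncPrefs` and its methods are hypothetical and are not the Android SharedPreferences API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical store: apply() never makes a caller (or the framework) wait for disk I/O.
final class AsyncPrefs {
    private final Path file;
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    // One daemon writer keeps disk writes in order; unlike apply() work flushed
    // through the framework's QueuedWork, nothing in a lifecycle path waits on it.
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "async-prefs-writer");
        t.setDaemon(true);
        return t;
    });

    AsyncPrefs(Path file) {
        this.file = file;
    }

    String get(String key, String defValue) {
        return memory.getOrDefault(key, defValue);
    }

    // apply()-style write: visible in memory immediately, persisted later.
    void apply(String key, String value) {
        memory.put(key, value);
        Properties snapshot = new Properties();
        snapshot.putAll(memory);
        writer.execute(() -> writeToDisk(snapshot));
    }

    private void writeToDisk(Properties snapshot) {
        try {
            // Write a sibling temp file, then move it over the target, so readers
            // are less likely to see a half-written file.
            Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
            try (Writer w = Files.newBufferedWriter(tmp)) {
                snapshot.store(w, null);
            }
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Explicit flush for shutdown or tests; never called from lifecycle callbacks.
    void awaitPendingWrites() {
        try {
            writer.submit(() -> { }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    static Properties load(Path file) {
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(file)) {
            props.load(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

The trade-off is durability: if the process dies before the queued write runs, the last change is lost, which is the guarantee the framework's flush was protecting.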

Memory leaks

The Mobile Taobao team has invested a lot of time in governing memory leaks. On the Java side, it takes over the lifecycle of the system's underlying components: when a component is destroyed, it is tracked through a WeakReference, and whether the object has leaked is determined after a GC event. On the native side, it hooks the operating system's underlying malloc and free functions to tally the memory usage of each SO, and compares the difference between malloc and free against a whitelist to determine whether memory is leaking.
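
The Java-side check can be sketched with the standard java.lang.ref API: a destroyed component is registered through a WeakReference tied to a ReferenceQueue, references the collector has cleared turn up in the queue, and anything still reachable after a GC is a suspected leak. `LeakWatcher` and its methods are hypothetical; a real detector would also wait for, or trigger, a GC and re-check before reporting, because one GC pass is not conclusive.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical watcher for components that should be garbage after destruction.
final class LeakWatcher {
    // Weak reference tagged with a label so a report names the component.
    private static final class KeyedRef extends WeakReference<Object> {
        final String label;

        KeyedRef(Object referent, String label, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.label = label;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Map<String, KeyedRef> watched = new ConcurrentHashMap<>();

    // Call from the destroy callback (for example after Activity.onDestroy).
    void watch(Object destroyed, String label) {
        watched.put(label, new KeyedRef(destroyed, label, queue));
    }

    // Call after a GC: drop collected entries, report the ones still reachable.
    List<String> suspectedLeaks() {
        KeyedRef ref;
        while ((ref = (KeyedRef) queue.poll()) != null) {
            watched.remove(ref.label, ref); // collected normally, not a leak
        }
        List<String> leaks = new ArrayList<>();
        for (KeyedRef r : watched.values()) {
            if (r.get() != null) {
                leaks.add(r.label);
            }
        }
        return leaks;
    }
}
```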

Memory misuse

What is memory misuse? For example, you develop a 100×100 view but back it with a 200×200 or larger Bitmap. Or you place an image in the system's drawable directory and display it on a high-density device, and the system scales it up according to its own density rules. In that case, an image that needs only 1 unit of memory (100×100) may end up occupying 16 units (400×400), a waste rate of more than 90%. To catch such cases, Mobile Taobao built a plug-in that detects improper memory use.
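
The arithmetic behind the 16-unit figure, assuming the image sits in the unqualified drawable/ folder: Android treats that folder as mdpi (160 dpi), so on a 640 dpi (xxxhdpi) screen the bitmap is scaled by 640 / 160 = 4 in each dimension, turning 100×100 into 400×400 and leaving 15/16, or 93.75%, of the pixels unused. The hypothetical `BitmapWasteCheck` helper below shows the kind of comparison a misuse plug-in could run; it is not Mobile Taobao's plug-in.

```java
// Hypothetical helper comparing what a view can show with what was decoded.
final class BitmapWasteCheck {
    // Per-dimension scale Android applies to a density-bucketed resource.
    static double scaleFactor(int resourceDpi, int deviceDpi) {
        return (double) deviceDpi / resourceDpi;
    }

    // Fraction of decoded pixels the view can never display.
    static double wasteRatio(int viewW, int viewH, int bitmapW, int bitmapH) {
        long needed = (long) viewW * viewH;
        long decoded = (long) bitmapW * bitmapH;
        if (decoded <= needed) {
            return 0.0;
        }
        return 1.0 - (double) needed / decoded;
    }
}
```

With ARGB_8888 (4 bytes per pixel), that is 640,000 bytes decoded where 40,000 would do.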

The same applies to video: playing high-quality video on a low-resolution device does not improve the user experience and can even make the device stutter. Image retention is another issue: once a page has sunk to the bottom of the back stack, it should not keep holding its images. Releasing them ensures there is enough memory for the foreground page; otherwise OOM becomes likely as the page stack grows deeper.

Resource leaks

Mobile Taobao governs resource leaks mainly by taking over the two underlying native functions open and close. If open and close calls are not paired, and the file is not one that is expected to stay open for the entire application lifecycle (such files are whitelisted), the operation may be leaking resources. The platform then notifies the responsible developer to investigate and fix the behavior. Database governance uses the same approach.
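
How the open and close hooks are installed is platform specific and not covered here. The sketch below, written in Java for consistency with the other examples, shows only the bookkeeping such hooks could feed: descriptors that are still open and not whitelisted are reported. `FdLeakTracker`, its methods, and the paths in the usage are hypothetical; a production tracker would also record the call stack at open time so the report points at the responsible business.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping fed by open()/close() hooks.
final class FdLeakTracker {
    private final Map<Integer, String> openFds = new HashMap<>(); // fd -> path
    private final Set<String> whitelist; // files expected to stay open for the app's lifetime

    FdLeakTracker(Set<String> whitelist) {
        this.whitelist = whitelist;
    }

    // Called by the open() hook with the returned descriptor (negative means failure).
    synchronized void onOpen(int fd, String path) {
        if (fd >= 0) {
            openFds.put(fd, path);
        }
    }

    // Called by the close() hook.
    synchronized void onClose(int fd) {
        openFds.remove(fd);
    }

    // Unpaired opens that are not whitelisted.
    synchronized List<String> suspectedLeaks() {
        List<String> leaks = new ArrayList<>();
        for (Map.Entry<Integer, String> e : openFds.entrySet()) {
            if (!whitelist.contains(e.getValue())) {
                leaks.add(e.getKey() + " -> " + e.getValue());
            }
        }
        return leaks;
    }
}
```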

Threading issues

Threading problems are more complicated. Thread creation can raise unexpected errors such as OutOfMemoryError, because a failed thread creation is itself reported as an OutOfMemoryError. Mobile Taobao therefore takes over thread creation: when a business creates a thread, its method call stack can be clustered to show how many threads each business creates and whether that is reasonable. Application developers are advised to give their threads meaningful names when creating them, so that the responsible business can be located quickly.
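
A minimal sketch of taking over thread creation with java.util.concurrent.ThreadFactory. It gives each thread a descriptive name and counts creations per call site, clustering on only the first application stack frame rather than the full stack. `TrackedThreadFactory` and the name prefix are hypothetical, and threads created with `new Thread(...)` outside the factory are invisible to it; catching those would need instrumentation that is not shown.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical factory that names threads and clusters creations by call site.
final class TrackedThreadFactory implements ThreadFactory {
    // Creation site (first application frame) -> number of threads created there.
    static final Map<String, LongAdder> THREADS_BY_SITE = new ConcurrentHashMap<>();

    private final String namePrefix;
    private final AtomicInteger seq = new AtomicInteger();

    TrackedThreadFactory(String namePrefix) {
        this.namePrefix = namePrefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        THREADS_BY_SITE.computeIfAbsent(creationSite(), k -> new LongAdder()).increment();
        // A descriptive name such as "feed-image-1" points straight at the owning business.
        return new Thread(task, namePrefix + "-" + seq.incrementAndGet());
    }

    // First stack frame outside the JDK and this class, as "Class.method".
    private static String creationSite() {
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            String cls = frame.getClassName();
            if (cls.startsWith("java.") || cls.startsWith("jdk.")
                    || cls.equals(TrackedThreadFactory.class.getName())) {
                continue;
            }
            return cls + "." + frame.getMethodName();
        }
        return "unknown";
    }
}
```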

Traffic monitoring

By taking over the Socket layer, Mobile Taobao parses the size information in protocol headers to obtain the volume of request and response data for traffic monitoring and governance. When an anomaly is found, the responsible developer is asked to locate and fix it. The platform also monitors background traffic, checking whether the app still makes a large number of network requests after it has been switched to the background.
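
The original approach parses protocol headers inside the Socket layer; the simpler technique swapped in below counts bytes as they pass through a wrapped input stream and splits the total by foreground and background state. `TrafficMeter`, `CountingInputStream`, and the `IN_BACKGROUND` flag are hypothetical; on Android the flag would be driven by app lifecycle callbacks, and output streams would be wrapped the same way.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical byte counter split by app visibility.
final class TrafficMeter {
    static final AtomicBoolean IN_BACKGROUND = new AtomicBoolean(false);
    static final AtomicLong FOREGROUND_BYTES = new AtomicLong();
    static final AtomicLong BACKGROUND_BYTES = new AtomicLong();

    static void record(long n) {
        (IN_BACKGROUND.get() ? BACKGROUND_BYTES : FOREGROUND_BYTES).addAndGet(n);
    }

    // Wraps a socket's input stream and counts every byte read through it.
    // read(byte[]) in FilterInputStream delegates to read(byte[], int, int), so it is counted once.
    static final class CountingInputStream extends FilterInputStream {
        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                record(1);
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                record(n);
            }
            return n;
        }
    }
}
```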

Device rating

Android devices are extremely fragmented, so Mobile Taobao rates devices with a scoring method and applies different strategies according to the score, deciding which images, videos, and business features to show, so that each user gets the best performance experience their device allows. The rating scheme also gives developers guidance on how best to present their business.
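
The scoring formula is not given in the talk, so everything numeric below is hypothetical: the weights for RAM, CPU cores, and OS version, the tier cut-offs, and the image-quality labels are placeholders that only show the shape of a score-then-strategy scheme. `DeviceRating` and its methods are illustrative names.

```java
// Hypothetical score-then-strategy scheme; all weights and thresholds are placeholders.
final class DeviceRating {
    enum Tier { LOW, MID, HIGH }

    // Hypothetical weights: RAM up to 50 points, cores up to 30, OS version up to 20.
    static int score(double ramGb, int cpuCores, int sdkInt) {
        int ram = (int) Math.min(50, ramGb * 10);
        int cores = Math.min(30, cpuCores * 4);
        int os = Math.max(0, Math.min(20, (sdkInt - 19) * 2));
        return ram + cores + os;
    }

    static Tier tier(int score) {
        if (score >= 70) {
            return Tier.HIGH;
        }
        if (score >= 40) {
            return Tier.MID;
        }
        return Tier.LOW;
    }

    // A strategy lookup a business could consult before loading heavy content.
    static String imageQuality(Tier t) {
        switch (t) {
            case HIGH:
                return "original";
            case MID:
                return "q75";
            default:
                return "q50";
        }
    }
}
```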

Layout performance

During development, HierarchyViewer is commonly used to check whether a layout structure is reasonable. Mobile Taobao wrote a set of algorithms that check whether a page's structure is reasonable, whether its view hierarchy is too deep, and whether the hierarchy can be optimized further. The team also implemented an OverDraw algorithm that tells developers which levels can be optimized, how to flatten the hierarchy, and how to resolve OverDraw problems.
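
A sketch of two structural checks of the kind described, run over a minimal hypothetical `Node` tree rather than real Android View objects: the maximum hierarchy depth, and containers that have exactly one child, which are often wrappers that can be removed to flatten the page. The OverDraw analysis is not sketched here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical layout checks over a simplified view tree (not the Android View API).
final class LayoutCheck {
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Depth of the deepest leaf, counting the root as 1.
    static int depth(Node n) {
        int max = 0;
        for (Node c : n.children) {
            max = Math.max(max, depth(c));
        }
        return max + 1;
    }

    // Pre-order list of containers with exactly one child: candidates for flattening.
    static void findSingleChildContainers(Node n, List<String> out) {
        if (n.children.size() == 1) {
            out.add(n.name);
        }
        for (Node c : n.children) {
            findSingleChildContainers(c, out);
        }
    }
}
```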

User experience optimization

Mobile Taobao cares a great deal about user experience, including startup time and the time to open each page. By monitoring the startup time of each subtask and analyzing this information quickly, the team can identify the specific cause of any quality change between releases.
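
A minimal sketch of per-subtask startup timing with a regression check against the previous release. `StartupTracer`, the task names, and the tolerance are hypothetical; the clock is injected so the example is deterministic, and on a device it could be backed by SystemClock.uptimeMillis().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical tracer recording how long each startup subtask takes.
final class StartupTracer {
    private final LongSupplier clockMs;
    private final Map<String, Long> started = new LinkedHashMap<>();
    private final Map<String, Long> durations = new LinkedHashMap<>();

    StartupTracer(LongSupplier clockMs) {
        this.clockMs = clockMs;
    }

    void begin(String task) {
        started.put(task, clockMs.getAsLong());
    }

    void end(String task) {
        Long t0 = started.remove(task);
        if (t0 != null) {
            durations.put(task, clockMs.getAsLong() - t0);
        }
    }

    Map<String, Long> durations() {
        return durations;
    }

    // Tasks that got slower than the baseline release by more than toleranceMs.
    Map<String, Long> regressionsAgainst(Map<String, Long> baseline, long toleranceMs) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : durations.entrySet()) {
            Long before = baseline.get(e.getKey());
            if (before != null && e.getValue() - before > toleranceMs) {
                out.put(e.getKey(), e.getValue() - before);
            }
        }
        return out;
    }
}
```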

Optimizing the experience on 4.x devices

As the product gains features and grows in size, MultiDex loading on Android 4.x devices becomes slower and slower. In response, Mobile Taobao rebuilt Google's support MultiDex library. With the rebuilt scheme, performance on 4.x devices improved even as the number of Dex files to load kept growing.

Memory disaster recovery

Mobile Taobao has long paid attention to memory disaster recovery. When memory runs short while the user is on the current page, background pages are expected to release their memory quickly so that it can serve the foreground page. Mobile Taobao developed a memory disaster-recovery plug-in that monitors JVM GC events and polls the available physical memory to calculate the real memory situation. Based on that calculation, it sends corresponding memory-level events to business parties; on a high-risk memory event, background pages can quickly release cached resources to better serve the visible page.
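
A sketch of the level-event part of such a plug-in: a usage reading is classified into a memory level and listeners hear only level changes, so background pages can drop caches on a critical event. The 75% and 90% thresholds and the `MemoryGuard` names are hypothetical, and the Java heap figures from Runtime stand in for the physical-memory polling described above, which on Android would need something like ActivityManager.getMemoryInfo().

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical memory-level dispatcher; thresholds are placeholders.
final class MemoryGuard {
    enum Level { NORMAL, WARNING, CRITICAL }

    private final List<Consumer<Level>> listeners = new CopyOnWriteArrayList<>();
    private Level last = Level.NORMAL;

    // Background pages register here and release caches on CRITICAL.
    void addListener(Consumer<Level> listener) {
        listeners.add(listener);
    }

    static Level classify(long usedBytes, long maxBytes) {
        double ratio = (double) usedBytes / maxBytes;
        if (ratio >= 0.9) {
            return Level.CRITICAL;
        }
        if (ratio >= 0.75) {
            return Level.WARNING;
        }
        return Level.NORMAL;
    }

    // Call after each observed GC or on a polling timer; notifies only on level changes.
    synchronized void check(long usedBytes, long maxBytes) {
        Level now = classify(usedBytes, maxBytes);
        if (now != last) {
            last = now;
            for (Consumer<Level> l : listeners) {
                l.accept(now);
            }
        }
    }

    // Java-heap reading from the standard Runtime API.
    void checkJavaHeap() {
        Runtime rt = Runtime.getRuntime();
        check(rt.totalMemory() - rt.freeMemory(), rt.maxMemory());
    }
}
```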

Stability management

Stability governance mainly consists of two parts: Java Crash and Native Crash.

Java Crash governance

Java Crash governance works by taking over the UncaughtExceptionHandler to capture the specific Java Crash information and its stack. A common class of Java Crash is OutOfMemoryError, which can be caused either by the virtual machine running out of memory or by a failed thread creation.
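
A minimal sketch of taking over the UncaughtExceptionHandler: the collector records the stack trace, tags OutOfMemoryError by likely cause, and then chains to the previous handler so normal crash handling still runs. Matching on "pthread_create" reflects the message ART commonly reports when a thread cannot be created; treat it as a heuristic. `CrashCollector` and the category labels are hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical crash collector installed as the default uncaught-exception handler.
final class CrashCollector implements Thread.UncaughtExceptionHandler {
    final List<String> reports = new CopyOnWriteArrayList<>();
    private final Thread.UncaughtExceptionHandler previous;

    CrashCollector(Thread.UncaughtExceptionHandler previous) {
        this.previous = previous;
    }

    // Installs the collector while keeping the old handler so default behaviour survives.
    static CrashCollector install() {
        CrashCollector c = new CrashCollector(Thread.getDefaultUncaughtExceptionHandler());
        Thread.setDefaultUncaughtExceptionHandler(c);
        return c;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw));
        reports.add("[" + classify(e) + "] thread=" + t.getName() + "\n" + sw);
        if (previous != null) {
            previous.uncaughtException(t, e); // let the platform finish handling the crash
        }
    }

    // Separates heap exhaustion from thread-creation failure by the error message.
    static String classify(Throwable e) {
        if (e instanceof OutOfMemoryError) {
            String msg = String.valueOf(e.getMessage());
            return msg.contains("pthread_create") ? "OOM_THREAD_CREATE" : "OOM_HEAP";
        }
        return "JAVA_CRASH";
    }
}
```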

Native Crash governance

Mobile Taobao governs Native Crashes by capturing signals. When a native exception occurs, a child process is created that uses ptrace to dump the thread information from the Native Crash context. For OOM-related Native Crashes, the malloc and free hooks from the performance governance described above can be used to locate which SO caused the problem and fix it.
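
Installing the malloc and free hooks requires native code and is not shown. The Java sketch below models only the accounting such hooks could feed: each allocation is attributed to the SO that requested it, frees are subtracted, and SOs whose outstanding balance exceeds a limit and are not whitelisted become suspects. `NativeAllocLedger`, the limit, and the library names in the usage are hypothetical, and a real hook would need some way to attribute each call to its SO, such as inspecting the caller's address.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical per-SO ledger fed by malloc/free hooks.
final class NativeAllocLedger {
    private final Map<Long, String> ownerByAddress = new HashMap<>();
    private final Map<Long, Long> sizeByAddress = new HashMap<>();
    private final Map<String, Long> liveBytesBySo = new TreeMap<>();

    // Called by the malloc hook: requesting SO, returned address, requested size.
    synchronized void onMalloc(String so, long address, long size) {
        ownerByAddress.put(address, so);
        sizeByAddress.put(address, size);
        liveBytesBySo.merge(so, size, Long::sum);
    }

    // Called by the free hook; addresses allocated before hooking are ignored.
    synchronized void onFree(long address) {
        String so = ownerByAddress.remove(address);
        Long size = sizeByAddress.remove(address);
        if (so != null && size != null) {
            liveBytesBySo.merge(so, -size, Long::sum);
        }
    }

    // SOs whose malloc-minus-free balance exceeds the limit and are not whitelisted.
    synchronized Map<String, Long> suspects(long limitBytes, Set<String> whitelist) {
        Map<String, Long> out = new TreeMap<>();
        liveBytesBySo.forEach((so, bytes) -> {
            if (bytes > limitBytes && !whitelist.contains(so)) {
                out.put(so, bytes);
            }
        });
        return out;
    }
}
```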

The next installment will focus on the EMAS-MOTU hotfix solution and development process. Stay tuned!

Previous installment

Exploring the Mobile Taobao high-availability platform (I): metrics and data platform