Author: Yang Jin, senior mobile client development engineer at Tencent
For commercial reprints, please contact Tencent WeTest for authorization; for non-commercial reprints, please credit the source.
WeTest takeaway
The most mature memory monitoring tool on iOS today is Instruments' Allocations, but it can only be used during development. This article describes how to implement an offline memory monitoring tool that finds memory problems after the app has shipped.
FOOM (Foreground Out Of Memory) means that the app is forcibly killed by the system because it consumes too much memory while in the foreground. To the user, it looks just like a crash. As early as August 2015, Facebook proposed a FOOM detection method. The general principle is elimination: after every other cause of termination has been ruled out, whatever remains is counted as a FOOM. For details, see https://code.facebook.com/posts/1146930688654547/reducing-fooms-in-the-facebook-ios-app/.
WeChat began reporting FOOM data at the end of 2015. According to the initial data, the ratio of daily FOOM occurrences to logged-in users was nearly 3%, while the crash rate in the same period was below 1%. At the beginning of 2016, a certain executive reported that WeChat frequently quit unexpectedly. After painstakingly pulling more than 2 GB of logs, we found that the KV reporting module was writing logs so frequently that it caused FOOM. Then in August 2016, many external users reported that WeChat quit unexpectedly shortly after launch. Even after analyzing a large number of logs, we could not find the cause of the FOOM. WeChat urgently needed an effective memory monitoring tool to find such problems.
one
Implementation principles
The initial version of WeChat's memory monitoring used Facebook's FBAllocationTracker to monitor Objective-C (OC) object allocation, and used fishhook to hook malloc/free and related interfaces to monitor heap allocation. Every second, it wrote the current count of every kind of OC object, plus the 200 largest heap allocations and their allocation stacks, to a local text log. The solution was simple to implement and was finished within a day. By distributing a TestFlight build to affected users, we found that the FOOM was caused by the contacts module loading a large number of contacts while migrating its DB.
However, there are a number of disadvantages to this scheme:
1. The monitoring granularity is not fine enough. For example, a qualitative change caused by a large number of small allocations cannot be detected. In addition, fishhook can only hook C interface calls made by the app's own code; it has no effect on calls made inside system libraries.
2. The logging interval is hard to tune. Too long an interval may miss a peak in between, and too short an interval causes power consumption, frequent I/O, and other performance problems.
3. The reported raw logs rely on manual analysis, and there is no good page tool to display and classify problems.
So the second version, modeled on Allocations, focuses on four areas: data collection, storage, reporting, and presentation.
1. Data collection
While working on the iOS 10 nano malloc crash at the end of September 2016, I studied the source code of libmalloc and happened to find several interfaces, among them the function pointers malloc_logger and __syscall_logger.
When the malloc_logger and __syscall_logger function pointers are not NULL, memory allocation and release operations such as malloc/free and vm_allocate/vm_deallocate notify the upper layer through these pointers. This is also how the memory debugging tool Malloc Stack works. With these two function pointers, it is easy to record the allocation information (including allocation size and allocation stack) of every currently live object. The allocation stack can be captured with the backtrace function, but the captured addresses are runtime virtual memory addresses, which cannot be resolved directly against the dSYM symbol table. So the load offset (slide) of each image must also be recorded, so that symbol table address = stack address - slide.
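The address arithmetic above can be sketched as follows. This is only an illustration in Python: the image names, load ranges, and slide values are made up, and `to_symbol_address` is a hypothetical helper, not part of the tool described in this article.

```python
from typing import List, NamedTuple, Optional, Tuple

class Image(NamedTuple):
    name: str    # image (executable or dylib) name
    start: int   # runtime load address of the image
    end: int     # exclusive end of the image's runtime address range
    slide: int   # offset added to link-time addresses when the image was loaded

def to_symbol_address(images: List[Image], stack_address: int) -> Optional[Tuple[str, int]]:
    """Map a runtime stack address to (image name, symbol table address)."""
    for image in images:
        if image.start <= stack_address < image.end:
            # symbol table address = stack address - slide
            return image.name, stack_address - image.slide
    return None  # the address does not belong to any recorded image

# Hypothetical values, for illustration only.
images = [
    Image("App", 0x1000A4000, 0x1020A4000, 0xA4000),
    Image("libSystem", 0x180002000, 0x181000000, 0x2000),
]
result = to_symbol_address(images, 0x1000B4010)
```

Recording the slide once per image is enough, because every address inside the same image is shifted by the same amount.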
In addition, to classify the data better, each memory object should belong to a Category, as shown in the figure above. For heap memory objects, the Category name is "Malloc " plus the allocation size, such as "Malloc 48.00KiB". For virtual memory objects, the flags parameter passed to vm_allocate indicates what kind of virtual memory is being created, and it corresponds to the first parameter, type, of __syscall_logger; the meaning of each flag can be found in the <mach/vm_statistics.h> header file. For OC objects, the Category name is the OC class name, which can be obtained by hooking the OC method +[NSObject alloc]:
Hooking +[NSObject alloc] does not capture every object, however. The answer was finally found in Apple's open source code CF-1153.18: when __CFOASafe is true and __CFObjectAllocSetLastAllocEventNameFunction is not NULL, CoreFoundation uses this function pointer to tell the upper layer what type the current object is.
In this way, our monitoring data sources are basically the same as those of Allocations, although they rely on private APIs. Without enough "tricks", private APIs will not make it into the App Store, and we would have to settle for the next best thing: replacing malloc, free, and the other function pointers in the malloc_zone_t structure returned by malloc_default_zone also monitors heap allocation, with the same effect as malloc_logger, while virtual memory allocation can only be monitored through fishhook.
2. Data storage
Live Object Management
The app allocates and frees a great deal of memory at runtime. In the picture above, within 10 seconds of WeChat's launch, 800,000 objects had been created and 500,000 had been released, so performance is a challenge. In addition, the storage process itself should allocate and free as little memory as possible. So SQLite was abandoned in favor of a more lightweight balanced binary tree for storage.
A splay tree is a kind of binary search tree. It does not guarantee that the tree is balanced, but the amortized time complexity of its operations is O(log N), so it can be treated approximately as a balanced binary tree. Compared with other balanced binary trees (such as red-black trees), it has a smaller memory footprint because it does not need to store any extra information in each node. The splay tree is motivated by the principle of locality: a node that was just accessed is likely to be accessed again soon, and a node that has been accessed many times is likely to be accessed again. To reduce the overall search time, frequently queried nodes are moved closer to the root through the "splay" operation. In most cases, memory that is allocated is freed again very soon, as with autoreleased objects and temporary variables. An OC object, moreover, updates its Category immediately after its memory is allocated. So a splay tree is the best fit for managing live objects.
A traditional binary tree is implemented with linked nodes, so memory is allocated or freed every time a node is added or deleted. To reduce memory operations, the binary tree can be implemented with an array instead. To do this, the left and right children of a parent node change from pointers to integers that hold the child's index in the array. When a node is deleted, its slot stores the array index of the previously freed node, so the freed slots form a list that can be reused.
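A minimal sketch of the array-backed idea follows, in Python for brevity. It uses a plain binary search tree without the splay step, and the names `NodePool`, `insert`, and `NIL` are illustrative assumptions rather than the article's actual implementation. Freed slots are chained through the `left` array, which is one possible way to store "the index of the previously freed node".

```python
NIL = -1  # index value standing in for a null pointer

class NodePool:
    """Tree nodes kept in parallel arrays; children are array indices, not pointers."""

    def __init__(self):
        self.key = []
        self.left = []
        self.right = []
        self.free_head = NIL  # index of the most recently freed slot

    def alloc(self, key):
        if self.free_head != NIL:
            index = self.free_head
            # A freed slot keeps the index of the previously freed slot in `left`.
            self.free_head = self.left[index]
            self.key[index] = key
            self.left[index] = NIL
            self.right[index] = NIL
        else:
            index = len(self.key)
            self.key.append(key)
            self.left.append(NIL)
            self.right.append(NIL)
        return index

    def free(self, index):
        # Chain the slot into the free list instead of releasing memory.
        self.key[index] = None
        self.left[index] = self.free_head
        self.right[index] = NIL
        self.free_head = index

def insert(pool, root, key):
    """Insert key into the search tree rooted at index `root`; return the root index."""
    new = pool.alloc(key)
    if root == NIL:
        return new
    cur = root
    while True:
        if key < pool.key[cur]:
            if pool.left[cur] == NIL:
                pool.left[cur] = new
                return root
            cur = pool.left[cur]
        else:
            if pool.right[cur] == NIL:
                pool.right[cur] = new
                return root
            cur = pool.right[cur]

pool = NodePool()
root = NIL
for address in (0x30, 0x10, 0x50):
    root = insert(pool, root, address)

# Freeing a slot and allocating again reuses the same index; the arrays do not grow.
scratch = pool.alloc(0x99)
pool.free(scratch)
reused = pool.alloc(0x70)
```

In a C implementation the arrays would typically be preallocated (or memory-mapped), so adding or deleting a node touches no allocator at all.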
Stack storage
According to statistics, WeChat produces millions of distinct backtrace stacks while running. With stacks captured up to the maximum depth, the average stack length is 35. If each address is stored in 36 bits (ARMv8 virtual addresses are at most 48 bits, but 36 bits is sufficient in practice), a stack takes 157.5 bytes on average, and 1M stacks require 157.5 MB of storage. But inspecting stacks at breakpoints shows that most stacks actually share common suffixes. For example, the last seven addresses of the following two stacks are the same:
To exploit this, the stacks can be stored in a hash table. The idea is that each stack is inserted into the table as a linked list, where each list node stores the current address and the table index of the previous address. For each address, its hash value is computed and used as its index in the table. If the slot at that index is empty, the list node is recorded there. If the slot holds data identical to the list node, it is a hash hit, and insertion moves on to the next address. If the data differs, it is a hash conflict, and the hash value must be recomputed until a slot that meets one of these conditions is found. Here is an example (with a simplified hash):
1) Insert G, F, E, D, C, A of Stack1 into the hash table in sequence. The nodes at indices 1 to 6 are (G, 0), (F, 1), (E, 2), (D, 3), (C, 4), (A, 5) respectively. The index entry of Stack1 is 6.
2) Insert Stack2. The nodes G, F, E, D, C are identical to the first 5 nodes of Stack1, so they are hash hits. B is inserted at the new position 7 as (B, 5). The index entry of Stack2 is 7.
3) Finally, insert Stack3. The nodes G, F, E, D are hash hits. The address before A in Stack3 is D, whose index is 4, so the node (A, 4) does not match the existing (A, 5) and the hash misses; the next blank position, 8, is found and node (A, 4) is inserted there. The address before B is A, whose index is 8, so the node (B, 8) does not match the existing (B, 5) and the hash misses again; the next blank position, 9, is found and node (B, 8) is inserted there. The index entry of Stack3 is 9.
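The insertion procedure can be sketched in Python as below. This is an illustrative sketch, not the article's implementation: `StackTable` is a hypothetical name, conflicts are resolved by simple linear probing, and the toy hash is chosen by assumption so that the three stacks above end up with the entry indices 6, 7, and 9.

```python
class StackTable:
    """Open-addressing hash table storing stacks as chains of (address, parent index)."""

    def __init__(self, capacity, hash_fn):
        # Slot 0 is reserved: a parent index of 0 means "no previous address".
        self.slots = [None] * capacity
        self.hash_fn = hash_fn

    def _insert_node(self, address, parent):
        node = (address, parent)
        capacity = len(self.slots)
        index = self.hash_fn(address) % capacity
        for _ in range(capacity):
            if index != 0:
                if self.slots[index] is None:
                    self.slots[index] = node   # empty slot: record the node
                    return index
                if self.slots[index] == node:
                    return index               # hash hit: node is already stored
            index = (index + 1) % capacity     # conflict: probe the next slot
        raise MemoryError("stack table is full")

    def insert_stack(self, addresses):
        """Insert a stack listed from the shared end first; return its entry index."""
        parent = 0
        for address in addresses:
            parent = self._insert_node(address, parent)
        return parent

    def read_stack(self, entry):
        """Rebuild a stack in insertion order by following parent indices."""
        frames = []
        while entry != 0:
            address, entry = self.slots[entry]
            frames.append(address)
        return frames[::-1]

# Simplified hash (an assumption) that places each address as in the example.
simple_hash = {"G": 1, "F": 2, "E": 3, "D": 4, "C": 5, "A": 6, "B": 7}.get

table = StackTable(16, simple_hash)
entry1 = table.insert_stack(list("GFEDCA"))
entry2 = table.insert_stack(list("GFEDCB"))
entry3 = table.insert_stack(list("GFEDAB"))
used_slots = sum(slot is not None for slot in table.slots)
```

The three stacks hold 18 addresses in total but occupy only 9 slots, because the shared G, F, E, D (and C) nodes are stored once. Each live object then only needs to remember one integer, the entry index of its stack.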
With this suffix-compressed storage, the average number of nodes per stack drops from 35 to fewer than 5. Each node takes 64 bits (36 bits for the address and 28 bits for the parent index), and the hash table's space utilization is above 60%. A stack therefore needs only about 66.7 bytes on average, roughly 42% of the original 157.5 bytes.
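The figures quoted here can be rechecked with a few lines of arithmetic. The sketch below assumes exactly 5 nodes per stack and exactly 60% table utilization, which are the boundary values of "less than 5" and "60%+"; the variable names are illustrative.

```python
ADDRESS_BITS = 36        # bits used to store one address
PARENT_INDEX_BITS = 28   # bits used to store the parent index
AVG_STACK_DEPTH = 35     # average number of addresses per stack

# Uncompressed: every stack stores all of its addresses.
raw_bytes = AVG_STACK_DEPTH * ADDRESS_BITS / 8

# Compressed: about 5 new nodes per stack, 64 bits each, in a table that is 60% full.
node_bytes = (ADDRESS_BITS + PARENT_INDEX_BITS) / 8
avg_new_nodes = 5
utilization = 0.6
compressed_bytes = avg_new_nodes * node_bytes / utilization

ratio = compressed_bytes / raw_bytes
print(f"raw: {raw_bytes} bytes, compressed: {compressed_bytes:.1f} bytes, ratio: {ratio:.0%}")
```

Dividing by the utilization accounts for the empty slots the table must keep, which is why the per-stack cost is higher than 5 x 8 = 40 bytes.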
Performance data
After the above optimizations, the memory monitoring tool uses less than 13% CPU on an iPhone 6 Plus. This depends on the amount of data, of course, and heavy users (with many groups, frequent messages, and so on) may see slightly higher CPU usage. The stored data occupies about 20 MB of memory and is kept in files mapped into memory with mmap; the benefits of mmap are well documented elsewhere.
3. Data reporting
Because memory monitoring stores the allocation information of all live objects, the amount of data is very large. So when a FOOM occurs, the data cannot be reported in full; it is reported selectively according to certain rules.
First, all objects are grouped by Category, and the object count and allocated size of each Category are computed. This list is small and can be reported in full. Then, within each Category, identical stacks are merged, and the object count and memory size of each stack are computed. For certain Categories, such as the TOP N by allocated size or the UI-related ones (UIViewController, UIView, and so on), only the TOP M stacks by allocated size are reported. The report format looks roughly like this:
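The selection rules above can be sketched as a small aggregation routine. This is a hedged illustration in Python: the function `build_report`, its parameters, and the dictionary layout of the output are assumptions made for the example, not the actual report format.

```python
from collections import defaultdict

def build_report(live_objects, top_n_categories=2, top_m_stacks=1,
                 always_report=("UIViewController", "UIView")):
    """live_objects: iterable of (category, size, stack_id), one per live object."""
    by_category = defaultdict(lambda: {"count": 0, "size": 0})
    by_stack = defaultdict(lambda: defaultdict(lambda: {"count": 0, "size": 0}))
    for category, size, stack_id in live_objects:
        by_category[category]["count"] += 1
        by_category[category]["size"] += size
        # Identical stacks under the same Category are merged here.
        by_stack[category][stack_id]["count"] += 1
        by_stack[category][stack_id]["size"] += size

    # Stacks are reported only for the TOP N Categories by size and the UI-related ones.
    ranked = sorted(by_category, key=lambda c: by_category[c]["size"], reverse=True)
    selected = set(ranked[:top_n_categories]) | (set(always_report) & set(by_category))

    report = []
    for category in ranked:
        entry = {"category": category, **by_category[category]}
        if category in selected:
            stacks = by_stack[category]
            top = sorted(stacks, key=lambda s: stacks[s]["size"], reverse=True)[:top_m_stacks]
            entry["stacks"] = [{"stack": s, **stacks[s]} for s in top]
        report.append(entry)
    return report

# Made-up live objects: (Category, size in bytes, stack entry index).
live = [
    ("Malloc 48.00KiB", 49152, 6),
    ("Malloc 48.00KiB", 49152, 6),
    ("Malloc 48.00KiB", 49152, 7),
    ("UIView", 512, 9),
    ("NSString", 64, 7),
]
report = build_report(live, top_n_categories=1, top_m_stacks=1)
```

Every Category appears in the report with its count and size, but only the selected ones carry stacks, which keeps the payload small.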
4. Page presentation
The page presentation is modeled on Allocations: it shows which Categories exist, the allocated size and object count of each Category, and, for some Categories, the allocation stacks.
To highlight problems and make them faster to solve, the backend first identifies the Categories that may have caused the FOOM (the Suspect Categories above) according to the following rules:
● Whether the number of UIViewControllers is abnormal
● Whether the number of UIViews is abnormal
● Whether the number of UIImages is abnormal
● Whether the size or the object count of any other Category is abnormal
Then a feature value is computed for each suspicious Category; this is the OOM reason. A feature value consists of Caller1, Caller2, Category, and Reason. Caller1 is the memory allocation point and Caller2 is the specific scenario or business; both are extracted from the stack with the largest allocated size under the Category. Caller1 is chosen to be as meaningful as possible, rather than simply the address immediately above the allocation function. For example:
After the feature values of all reports have been computed, the reports can be classified. The primary classification can be by Caller1 or by Category, and the secondary classification aggregates the features associated with a given Caller1 or Category. The result looks like this:
The primary classification
The secondary classification
5. Operational strategy
As mentioned above, memory monitoring incurs some performance cost, and each report is about 300 KB, so full reporting would put considerable pressure on the backend. Monitoring is therefore enabled by sampling for production users and enabled 100% for gray-release users, internal company users, and whitelisted users. Only the most recent data is kept locally.
two
Reduce misjudgment
Let's first review how Facebook determines whether the previous launch ended in a FOOM:
1. The app was not upgraded
2. The app did not call exit() or abort()
3. The app did not crash
4. The user did not force-quit the app
5. The system was not upgraded or restarted
6. The app was not running in the background
7. Otherwise, the app experienced a FOOM
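The elimination logic in this list can be sketched as follows. The Python below is only an illustration: `LastRun`, its field names, and the returned labels are hypothetical, and how each fact is actually recorded on the device is outside the sketch.

```python
from dataclasses import dataclass

@dataclass
class LastRun:
    """Facts recorded about the previous launch (all field names are hypothetical)."""
    app_upgraded: bool = False
    called_exit_or_abort: bool = False
    crashed: bool = False
    user_force_quit: bool = False
    os_upgraded_or_rebooted: bool = False
    was_in_background: bool = False

def classify_last_exit(run: LastRun) -> str:
    """Rule out every other explanation in order; whatever remains is a FOOM."""
    checks = [
        (run.app_upgraded, "app upgrade"),
        (run.called_exit_or_abort, "exit() or abort()"),
        (run.crashed, "crash"),
        (run.user_force_quit, "user force quit"),
        (run.os_upgraded_or_rebooted, "system upgrade or restart"),
        (run.was_in_background, "terminated in background"),
    ]
    for happened, reason in checks:
        if happened:
            return reason
    return "FOOM"

verdict = classify_last_exit(LastRun())
```

Because FOOM is the default verdict, any termination cause that goes unrecorded is silently counted as a FOOM, which is why the accuracy of each individual check matters.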
Items 1, 2, 4, and 5 are relatively easy to determine. Item 3 depends on the crash callback of our own CrashReport component, and items 6 and 7 depend on ApplicationState and the foreground/background switch notifications. Since WeChat started reporting FOOM data, there have been many misjudgments. The main cases are as follows:
ApplicationState is inaccurate
Some systems briefly launch the app in the background. ApplicationState is Active, yet the launch is not a BackgroundFetch. After didFinishLaunchingWithOptions returns, the app even receives the BecomeActive notification, but it exits soon afterwards; the whole launch lasts 5 to 8 seconds. The workaround is to treat a launch as a normal foreground launch only once one second has passed after the BecomeActive notification. This only reduces the probability of misjudgment; it does not eliminate it.
Group control plug-ins
This kind of plug-in is software that controls iPhones remotely. Usually one computer controls multiple phones, with the computer screen and the phone screens operating in real-time sync. It can, for example, open WeChat, add friends automatically, post to Moments, and force-quit WeChat. Such misjudgments can only be reduced by having the security backend crack down on these plug-ins.
The CrashReport component crashes without calling the upper layer
At the end of May 2017, WeChat suffered a large outbreak of GIF crashes caused by out-of-bounds memory access. When the component received the crash signal and tried to write the crash log, the corrupted memory pool prevented it from writing the log properly and even triggered a second crash. The upper layer never received the crash notification, so the restart was misjudged as a FOOM. The rule now is that if a crash log from the last run exists locally (complete or not), the restart is attributed to a crash.
A foreground freeze causes the system watchdog to kill the app
This is the well-known 0x8badf00d, usually caused by too many foreground threads, deadlocks, or persistently high CPU usage; this kind of forced kill cannot be caught by the app. To handle it, we combine FOOM detection with our existing lag monitoring system: if a foreground freeze was captured at the last moment of the previous run, we consider that run to have been killed by the watchdog. We also split this restart cause out of FOOM into a new category, "app restart caused by a foreground freeze", and treat it as a key focus.
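The corrections described in this section can be viewed as a second pass over an initial FOOM verdict. The Python sketch below is a hypothetical illustration: the function name, its parameters, the returned labels, and the order of the checks are all assumptions, and the group control plug-in case is left out because it is handled on the backend.

```python
from typing import Optional

def refine_foom_verdict(verdict: str,
                        crash_log_exists: bool,
                        freeze_captured_at_exit: bool,
                        seconds_since_become_active: Optional[float]) -> str:
    """Re-examine a "FOOM" verdict using the extra signals described above."""
    if verdict != "FOOM":
        return verdict
    # A crash log from the last run, complete or not, means the app crashed.
    if crash_log_exists:
        return "crash"
    # A foreground freeze captured at the last moment points to a watchdog kill.
    if freeze_captured_at_exit:
        return "app restart caused by a foreground freeze"
    # Count the launch as a foreground launch only if the app survived for
    # one second after the BecomeActive notification.
    if seconds_since_become_active is None or seconds_since_become_active < 1.0:
        return "not a normal foreground launch"
    return "FOOM"

refined = refine_foom_verdict("FOOM", crash_log_exists=False,
                              freeze_captured_at_exit=True,
                              seconds_since_become_active=42.0)
```

Each rule can only move a case out of the FOOM bucket, so the refined FOOM rate is never higher than the raw one.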
three
results
Since memory monitoring went live in March 2017, WeChat has solved more than 30 memory problems large and small, involving chat, search, Moments, and other features. The FOOM rate dropped from 3% at the beginning of 2017 to the current 0.67%, and the foreground freeze rate dropped from 0.6% to 0.3%. The effect is particularly obvious.
four
Q&A
UIGraphicsEndImageContext
UIGraphicsBeginImageContext and UIGraphicsEndImageContext must be called in pairs; otherwise the context leaks. Xcode's Analyze can also detect this kind of problem.
UIWebView
UIWebView consumes a lot of the app's memory whether it opens a web page or merely executes a simple piece of JS code. WKWebView not only has excellent rendering performance but also runs in its own separate process, so part of the web-related memory consumption moves out of the app's process. It is the best replacement for UIWebView.
autoreleasepool
Autoreleased objects are usually released at the end of the current runloop iteration. If a loop generates a large number of autoreleased objects, peak memory can spike high enough to trigger OOM. Adding @autoreleasepool blocks where appropriate releases memory promptly and lowers the peak.
Retain cycles
The most common source of a retain cycle is a block that uses self while self holds the block; this can only be avoided through coding conventions. In addition, NSTimer's target and CAAnimation's delegate are strong references to the object. WeChat currently works around these problems with its own MMNoRetainTimer and MMDelegateCenter.
Large image processing
For example, the old image scaling interface was written like this:
However, OOM occurs easily when processing high-resolution images, because -[UIImage drawInRect:] first decodes the image and then generates a bitmap at the original resolution while drawing, which consumes a lot of memory. The solution is to use the lower-level ImageIO interfaces, which avoid generating the intermediate bitmap:
Large view
A large view is a View whose size is too large because it contains all of the content to be rendered. Extra-long text is a common kind of group-bombing message in WeChat, usually thousands or even tens of thousands of lines long. Drawing it into a single View consumes a great deal of memory and causes severe lag. The best approach is to split the text across multiple Views and use TableView's reuse mechanism to reduce unnecessary rendering and memory use.
Recommended reading
Finally, here are a few recommended links about iOS memory:
● Memory Usage Performance Guidelines
https://developer.apple.com/library/content/documentation/Performance/Conceptual/ManagingMemory/ManagingMemory.html#//apple_ref/doc/uid/10000160-SW1
● No pressure, Mon!
http://www.newosxbook.com/articles/MemoryPressure.html