Hello, everyone. My name is Zhang Shaowen. I am currently in charge of Android performance optimization and the terminal quality platform at WeChat. First of all, I apologize if my preparation is insufficient; we had a version release this week. If you have any questions, please feel free to ask them at the end of the talk.
Now let’s start today’s talk. Hot patching is currently a very popular Android development technique; the better-known schemes are Alipay’s AndFix and Qzone’s super hot patch scheme. WeChat started experimenting with hot patching in June 2015. After studying and trying the existing schemes, we found that each has its own limitations. We finally adopted a different technical solution: Tinker, WeChat’s open source hot patch framework.
Let’s start with some of the limitations of the existing frameworks:
AndFix is an open source framework launched by Alibaba. It is available on GitHub at github.com/alibaba/And…
Its technical principle is shown in the figure below. It uses a native hook: the scheme directly calls dalvik_replaceMethod to replace the implementation of a method in a class.
Its disadvantages mainly include the following:
- Poor compatibility: because it replaces methods natively, there are many crash reports in its GitHub issues;
- Low success rate: it does not support modifying inlined methods, methods with more than eight parameters, or methods involving long, double, or float. I have talked to some products that use AndFix, and their success rate is no more than 40%;
- Not transparent to development: since it does not support adding fields, we would have to write patches purely for the sake of patching, so we cannot ship features with this technology.
The nice thing about AndFix is that it takes effect immediately, but it supports a very limited set of patch scenarios and can only be used to fix specific problems. So we did not consider this option.
Now let’s talk about the Qzone super patch solution.
This solution uses the ClassLoader approach, which gives friendlier class replacement. It is similar to how MultiDex loading works, which basically guarantees stability and compatibility. It faces two main problems:
1. The “unexpected DEX problem” on the Dalvik platform. To avoid it, the scheme instruments every class (inserts a stub into it). The instrumentation prevents all classes from being pre-verified, so the verify and optimize operations shown in the figure above are triggered when a class is loaded. This causes a certain performance loss. WeChat ran two tests, with and without instrumentation: one continuously loads 700 classes of about 50 lines each, and the other measures the time WeChat takes to complete its whole startup.
2. On the ART platform, if a patched class has field, method, or interface changes, memory addresses may be thrown out of order. To solve this problem, the final patch must contain the following classes:
A. Modified and added classes;
B. If the number of fields, methods, or interfaces of a class changes, all of its subclasses;
C. If the number of fields, methods, or interfaces of a class changes, the classes that call it or any of its subclasses.
If the ClassN mode is used, multiple dex files need to be processed together.
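Rules A to C above can be read as a small closure computation over the class hierarchy and the call graph. The following is only a sketch of that reading, not code from Qzone or Tinker: the class name `ArtPatchClassSelector`, its parameters, and the idea of passing the hierarchy and callers in as plain maps are all assumptions made for illustration.

```java
import java.util.*;

// Hypothetical helper; none of these names come from Qzone or Tinker.
final class ArtPatchClassSelector {
    /**
     * @param changed       classes that were modified or added (rule A)
     * @param layoutChanged classes whose field/method/interface count changed
     * @param superOf       class name -> direct superclass name (assumed acyclic)
     * @param callersOf     class name -> classes that call into it
     * @return every class that must go into the patch on ART
     */
    static Set<String> select(Set<String> changed,
                              Set<String> layoutChanged,
                              Map<String, String> superOf,
                              Map<String, Set<String>> callersOf) {
        Set<String> result = new TreeSet<>(changed);          // rule A
        Set<String> affected = new TreeSet<>(layoutChanged);
        // Rule B: walk up each class's superclass chain; if it reaches a
        // layout-changed class, the subclass is affected too.
        for (String cls : superOf.keySet()) {
            for (String s = superOf.get(cls); s != null; s = superOf.get(s)) {
                if (layoutChanged.contains(s)) {
                    affected.add(cls);
                    break;
                }
            }
        }
        result.addAll(affected);
        // Rule C: callers of the layout-changed classes and of their subclasses.
        for (String cls : affected) {
            result.addAll(callersOf.getOrDefault(cls, Collections.emptySet()));
        }
        return result;
    }
}
```

For example, if class A gains a field, B extends A, C extends B, and E and F call A and C respectively, the selected set is A, B, C, E and F. This illustrates why the patch grows quickly once a widely used base class changes.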
Qzone’s solution is the simplest and most transparent, and it has a very high patch success rate. However, because WeChat is sensitive to performance and patch size, we did not adopt it.
So what kind of hot patch framework does WeChat want? We think the main goals are as follows:
- Transparent to development: developers do not need to care whether a change ships in a patch; they can change code freely without thinking about the framework;
- No performance impact: the patch framework should not impose performance costs on the application;
- Full support: it supports fixing code, .so libraries and resources, so that features can be released through it;
- Small patch size: the patch should be as small as possible to improve the upgrade rate;
- Stable, with good compatibility: it must serve WeChat’s hundreds of millions of users, and it should use as little reflection as possible.
Now let’s talk about the implementation of Tinker, WeChat’s hot patch framework.
It takes its name from the goblin Tinker in Dota, and we hope it will make our releases as unlimited as he is.
Tinker’s solution is inspired by Instant Run in the Gradle build and Exopackage in the Buck build. Their idea is full replacement: we use the new dex or resources in their entirety, so the ART address disorder problem does not arise, and no instrumentation is needed on Dalvik.
But Instant Run works at build time, and it pushes all the resulting changes directly to the phone. For an online solution this is definitely not feasible. The core problem is therefore to find a suitable diff algorithm that makes the patch smaller.
WeChat first used BSDiff in the demo. BSDiff is independent of the file format, but its results are not particularly good for dex files, and they are very unstable. Currently WeChat still uses the BSDiff algorithm for .so files, and partially for resources.
Then we thought of DexMerge: merge the modified and newly added classes with the original dex via dexmerge to obtain the final complete dex. In practice, DexMerge has two core issues:
- It cannot delete classes. As a result, duplicate classes would be loaded on the Dalvik platform, which can only be avoided by using the miniloader loading scheme;
- Large memory footprint during synthesis. The DexMerge library was designed for PC use, where memory is not much of a concern. Its peak memory can be four to six times the size of the input dex: with a 12 MB dex, peak memory may exceed 70 MB.
Finally, we decided to develop our own DexDiff algorithm based on the dex format, with the following goals:
- The diff result is small;
- The synthesis process uses little memory;
- Classes in the dex can be deleted, added, and modified.
The main principle is to make deep use of the information in the original dex and to process each section of the dex. I won’t go into depth on this today; those who are interested can talk to us or read the source code.
In terms of memory, DexDiff’s peak memory is about twice the size of the dex, which meets our expectation.
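To make the third goal concrete (delete, add, and modify in one diff, which DexMerge could not do), here is a deliberately simplified sketch. It is not the DexDiff algorithm: the real algorithm works on each section of the dex format, while this toy treats a dex as a map from class name to an opaque class body. The name `ToyClassDiff` and the map representation are assumptions made for illustration.

```java
import java.util.*;

// Toy model only: a "dex" is a map of class name -> class body (non-null).
final class ToyClassDiff {
    final Set<String> deleted = new TreeSet<>();
    final Map<String, String> addedOrModified = new TreeMap<>();

    // Compute what must change to turn oldDex into newDex.
    static ToyClassDiff diff(Map<String, String> oldDex, Map<String, String> newDex) {
        ToyClassDiff d = new ToyClassDiff();
        for (String name : oldDex.keySet()) {
            if (!newDex.containsKey(name)) {
                d.deleted.add(name);                           // class removed
            }
        }
        for (Map.Entry<String, String> e : newDex.entrySet()) {
            if (!e.getValue().equals(oldDex.get(e.getKey()))) {
                d.addedOrModified.put(e.getKey(), e.getValue()); // new or changed
            }
        }
        return d;
    }

    // Rebuild the new dex on the device from the old dex plus the diff.
    static Map<String, String> apply(Map<String, String> oldDex, ToyClassDiff d) {
        Map<String, String> out = new TreeMap<>(oldDex);
        out.keySet().removeAll(d.deleted);
        out.putAll(d.addedOrModified);
        return out;
    }
}
```

Unchanged classes never enter the diff, which is what keeps the patch small; the synthesized result is still a complete replacement for the old dex.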
For more information on WeChat hot patching, you can read my previous post, “The evolution of WeChat Android hot patch practice”.
Next, let’s look at Tinker’s framework design, which mainly includes the following parts:
- Patch synthesis: this all runs in a separate patch process and covers dex, .so files and resources; its main job is to synthesize and upgrade the patch package;
- Patch loading: loading the dex, .so files and resources we synthesized, by reflecting on the system;
- Monitoring callbacks: problems that occur during synthesis and loading are reported promptly through callbacks;
- Version management: Tinker supports patch upgrades, and even multiple patches. Here we need to keep the version consistent across all processes;
- Security checks: both patch synthesis and patch loading need the necessary security checks.
In WeChat, we have added more than 100 real-time reports to the Tinker framework to monitor possible problems at each step:
Let’s take a look at some of the problems we encountered while developing Tinker:
1. Vendor OTAs. On the ART platform, dex2oat takes a long time. In particular, after a vendor OTA, dex2oat has to be re-run on all dynamically loaded code. This is because the boot image has changed, but during the upgrade the system only re-runs dex2oat on the ClassN.dex files.
For the patch dex, the main process then performs dex2oat synchronously. This takes a very long time and is likely to cause an ANR, especially on the development ROMs of some vendors such as Xiaomi.
This is why we are now working on platform-specific synthesis: on the ART platform, we synthesize only the classes required by the rules described earlier. The time to re-run dex2oat is acceptable as long as the dex is not a full replacement.
2. Android N mixed compilation makes the patch mechanism fail. We spent some time going back through the Android N ART code for this; a detailed analysis can be found in my previous post, “Android N hybrid compilation and hot patch impact analysis”.
3. Dex reflection succeeds but does not take effect. In the beginning, we loaded the patch dex through makeDexElements. But on hundreds of thousands of devices, the patch loaded successfully and yet the old version of the code was used. On some devices, such as the Samsung S6 on the 5.0.2 system, reflecting pathList succeeds but base.apk is still searched first.
The solution here is similar to Instant Run: reflect on the parent ClassLoader instead. I have to say that Instant Run’s incremental ClassLoader implementation is pretty neat.
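Why the parent ClassLoader helps can be shown with a toy model of parent-first delegation, which is the default lookup order for Java class loaders. This is only a sketch: `ToyLoader` is a hypothetical stand-in and does not touch real `ClassLoader` internals, and a "dex" here is just a map from class name to a label for its implementation.

```java
import java.util.*;

// Toy stand-in for a class loader; not the Android or Tinker implementation.
final class ToyLoader {
    private final ToyLoader parent;                  // asked first, may be null
    private final List<Map<String, String>> dexList; // own entries, searched in order

    ToyLoader(ToyLoader parent, List<Map<String, String>> dexList) {
        this.parent = parent;
        this.dexList = dexList;
    }

    // Parent-first lookup: the parent's answer always wins over our own list.
    String find(String className) {
        if (parent != null) {
            String hit = parent.find(className);
            if (hit != null) {
                return hit;
            }
        }
        for (Map<String, String> dex : dexList) {
            String impl = dex.get(className);
            if (impl != null) {
                return impl;
            }
        }
        return null;
    }
}
```

If a device insists on searching base.apk first within the loader’s own list, a patched class placed in that list loses. Placing the patch in a parent loader sidesteps the ordering inside the list entirely, because the parent is consulted before the list is touched.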
4. Xposed and other WeChat plugins. There are a variety of WeChat plugins on the market that load WeChat’s classes before WeChat starts, which causes two problems:
A. On the Dalvik platform, it directly crashes with “Class ref in pre-verified class resolved to unexpected implementation”;
B. On the ART platform, some classes use the old code, which may make the patch ineffective or cause address errors.
The root cause of both is that Xposed calls into our classes through reflection and so loads some of them ahead of time.
In fact, we need a safe mode anyway, because a patch may be applied improperly or other issues may arise. That is, when the application repeatedly fails to start or crashes, it enters a patch cleanup or upgrade flow.
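The safe mode described above boils down to counting launches that never reach a healthy state. Below is a minimal sketch under stated assumptions: the class `SafeModeCounter`, its method names, and the threshold are all hypothetical, and the counter lives in memory here, whereas a real app would have to persist it across process restarts.

```java
// Hypothetical sketch of a safe-mode counter; not Tinker's implementation.
final class SafeModeCounter {
    private final int maxFailures;   // assumed threshold, chosen by the app
    private int consecutiveFailures; // would be persisted in a real app

    SafeModeCounter(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    // Call at the very start of launch, before loading the patch.
    // Returns true when the patch should be cleaned instead of loaded.
    boolean onLaunchStarted() {
        // Count this launch as failed until onLaunchSucceeded() says otherwise.
        consecutiveFailures++;
        return consecutiveFailures > maxFailures;
    }

    // Call once startup has completed normally.
    void onLaunchSucceeded() {
        consecutiveFailures = 0;
    }
}
```

With a threshold of 3, three launches in a row that crash before `onLaunchSucceeded()` is reached make the fourth launch report that the patch should be cleaned; any successful launch resets the count.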
Some might find Tinker too bloated and complex. This is because hot patching is not just loading a dex or .so file; it has to take care of a lot of details: process consistency, control over the scope of modifiable classes, version management, extensibility, and so on.
Tinker’s plan for the future is to become truly open source. All the code related to platform-specific synthesis and resources will be delivered in the next week or so. Then, after the company’s open source audit, we will open source it on GitHub, including the synthesis method for .so files and resources and the technical details of DexDiff. If you are interested, you can read the source code or talk to us.
Time is limited, so that is all for today. Once again, I apologize for the hasty preparation.
Q0: How does the patch process communicate with the main process?
Q0: The main process has an IntentService that receives the patch result. The patch process is an IntentService that receives patch requests.
Q1: Platform-specific synthesis means that on the Dalvik platform we synthesize the full dex, which avoids the need for instrumentation, while on the ART platform we synthesize only the classes that meet the three conditions above. The difficulty here is that the same diff must be synthesizable in different ways.
Q2: Are there any good solutions to patch failures caused by insufficient internal storage?
Q2: Our solution does take up a lot of space. We do two things: 1. Before patching, we check the user’s remaining space; if it is too small, we do not attempt the patch. 2. If the patch fails, we get a callback and then retry periodically, up to three times. You can also prompt the user.
B0: Is the code completely open source?
B0: Yes, all code will be open source, from the build step to every module.
B1: How many activities does the Android version of WeChat have now? Is it a single activity with multiple fragments?
B1: I’m sorry, this question is unrelated to this talk. I suggest you decompile the WeChat code and study it.
B2: Do those Xposed plugins work by replacing values through reflection? What is the general way to ensure security and protect the app’s data?
B2: They only need to reflect on some of WeChat’s classes to tamper with certain functions. In fact, pure protection is difficult on a rooted device.
B3: Why add a result callback when the patch succeeds? Is it to restart the program, or to report the result in real time?
B3: The callback gives the user a hook in which they can do all kinds of work. For example, I can pop up a dialog saying the upgrade is done, or have the process kill itself when the screen locks or after the application goes to the background, which makes the patch take effect sooner.
B4: Since it can load .so files and resources, can Tinker be used for plugin frameworks?
B4: Tinker does not currently support the four major components, but it will definitely have this capability in the future.
B7: Tinker is currently being developed and maintained by three people
B8: Are the resources compiled into an arsc file, or loaded as a binary stream through reflection?
B8: I don’t quite understand your question. We use full replacement for resources, that is, we use the new resource package in its entirety.
Q6: After changing the ClassLoader, how does patchCoreSDK avoid class access errors when loading classes across dex files? Is there mandatory access isolation for patchCoreSDK?
Q6: Yes. The Tinker framework is divided into two parts. The core loading code, which we call the loader classes, is about a dozen classes, and they are not allowed to change. Most of the other Tinker classes can themselves be modified through patches. The Tinker framework handles this: in the newly synthesized dex we delete the loader-related classes, which avoids the problem completely.
Q7: How are the other processes restarted after a patch succeeds?
Q7: To keep every process on the same version, we have a version management file that records the current patch version. It has two fields, old and new. Only the patch process can modify the new field, only the main process can modify the old field, and every other process only loads the patch version in the old field when it starts. The main process can then initiate a version upgrade by assigning the new field to the old field, at which point the main process kills all other processes to ensure uniformity.
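The old/new rule in the Q7 answer can be sketched as a small state holder. This is an in-memory illustration only: the class `PatchVersionInfo`, the `Role` enum and the method names are hypothetical, and the real mechanism is a file shared between processes, which also needs file locking that is not shown here.

```java
// Hypothetical in-memory model of the version management file.
final class PatchVersionInfo {
    enum Role { MAIN, PATCH, OTHER }

    private String oldVersion = ""; // version every process actually loads
    private String newVersion = ""; // version most recently synthesized

    // Only the patch process may write the "new" field.
    void onPatchSynthesized(Role caller, String version) {
        if (caller != Role.PATCH) {
            throw new IllegalStateException("only the patch process writes 'new'");
        }
        newVersion = version;
    }

    // Only the main process may promote "new" to "old".
    // Returns true when an upgrade happened, i.e. when the main process
    // should now kill all other processes so they restart on the new version.
    boolean upgradeIfNeeded(Role caller) {
        if (caller != Role.MAIN) {
            throw new IllegalStateException("only the main process writes 'old'");
        }
        if (newVersion.equals(oldVersion)) {
            return false;
        }
        oldVersion = newVersion;
        return true;
    }

    // Every process loads the version recorded in the "old" field at startup.
    String versionToLoad() {
        return oldVersion;
    }
}
```

Because other processes only ever read the old field, a patch that finishes synthesizing while they are running cannot leave them on mixed versions; the switch happens at a single point that the main process controls.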
The above is from the DEV CLUB WeChat live group, collated and published on Diycode to share with you. Follow us at WeMobileDev.