The Evolution of WeChat’s Android Hot Patch Practice

Following plug-in frameworks, hot patching took off in 2015 and has become a very popular Android development technique. Well-known solutions include Taobao’s Dexposed, Alipay’s AndFix, and QZone’s super hot patch scheme. WeChat did not start early on hot patching; our research began around June 2015. After studying and trying the existing schemes, we found that each has its own limitations. WeChat eventually adopted a different technical solution and worked out its own path of practice.

Technology, however, is only one part of a hot patch solution. Through many trials and deployments of hot patches, WeChat has established its own process specifications and has kept extending the scenarios in which patches are used. Through this article, I hope you will gain a thorough understanding not only of the strengths and weaknesses of each hot patch technology but also of where hot patching applies. With that, it should be easier to decide whether and how to use hot patching in your own projects.

Why do we need hot patches

Hot patch: Enables applications to be updated without reinstallation, helping applications quickly build dynamic repair capabilities.

By this definition, hot patches save the time that a release across the many Android app markets takes. Users also do not need to reinstall: as long as they are online, the update can be applied without their noticing. That looks great, but does it mean we can use patches in place of releases? In fact, hot patch technology has its own limitations, mainly the following:

A patch applies to only one client version, and the patch grows as the version difference grows.
A patch cannot carry every kind of change, for example changes to AndroidManifest.
Patches do not apply with a 100% success rate, for either code or resources.

Since patching cannot fully replace upgrading, where does it fit?

1. Lightweight and rapid upgrades

Hot patching can also be understood as a channel for dynamically modifying code and resources, and it suits cases where the change is small. Taking many of WeChat’s releases as an example, the patch size stayed within 300K, a large advantage over a traditional release.

Given Android users’ upgrade habits, even an app as actively used as WeChat needs more than 10 days for a new version to reach 50% of users. With patch technology, we can reach more than 70% in one day. This also depends on the patch being small enough to download directly over the mobile network.

For this reason, patching is well suited to the grayscale (staged rollout) phase. In the past, we had to make sure every serious issue was fixed before release, which usually took three or more grayscale rounds, and we could not quickly verify the fixes on the same users. With hot patches, we can quickly validate fixes on the same users, which greatly shortens our release process.

When a release has a problem or an urgent bug, the traditional approach requires a separate grayscale round to validate the change, followed by a new release. With patches, we only need to roll out to a small number of users first to verify the effect of the change and then roll out to everyone. However, this kind of release has a large impact on online users, so we must be cautious. Out of responsibility to our users, releasing a patch is treated the same as releasing a version, and it must strictly follow the full testing and rollout process.
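
The staged rollout described above can be sketched as a deterministic bucketing of users. This is only an illustration in Python, not WeChat’s implementation: the function name in_rollout, the patch identifier, and the hash-based bucketing are all assumptions made for the example. Because a user’s bucket never changes, raising the percentage keeps every user who already received the patch, which is what allows a fix to be verified on the same users.

```python
import hashlib


def in_rollout(user_id: str, patch_id: str, percent: float) -> bool:
    """Return True if this user falls inside the first `percent` of buckets.

    Hypothetical helper: the bucket is derived from a hash of the patch id
    and the user id, so the same user always lands in the same bucket.
    """
    digest = hashlib.sha256(f"{patch_id}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10000  # 0..9999
    return bucket < percent * 100  # percent=1.0 selects buckets 0..99


# Example: start with a small group, then widen to everyone.
users = [f"user{i}" for i in range(1000)]
first_wave = {u for u in users if in_rollout(u, "patch-1", 5.0)}
everyone = {u for u in users if in_rollout(u, "patch-1", 100.0)}
```

Every user in first_wave is also in everyone, so the second stage only adds users and never drops one.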

Overall, patching can lower development cost, shorten the development cycle, and deliver lightweight, rapid upgrades.

2. Remote debugging

Another pain point in Android development is fragmentation. We have probably all run into problems that “cannot be reproduced locally”, where “the logs show nothing” and “the user ignores you when contacted”. The patch mechanism is therefore well suited to remote debugging. For this we need the ability to deliver a patch only to specific users, which helps a great deal in tracking down problems.

With patches, we avoid bothering users and can quietly solve their problems. Of course, this also requires very strict permission management to prevent malicious or arbitrary use.

3. Data statistics

Data statistics also play a very important role in WeChat, and we hope to combine hot patches with them more closely. Hot patches have clear advantages for both ordinary statistics and A/B testing. For example, to run two tests on the same group of users, the traditional approach fails because the same users cannot install both versions. With patches, we can simply keep switching patches for the same group of users.

How to combine data statistics with patch technology more closely, and how to control sample size and proportion more precisely, is one of the directions WeChat is currently working on.

4. Others

In fact, Android itself uses hot patch technology to implement Instant Run. It works in three modes: Hot Swap, Warm Swap, and Cold Swap. You can refer to the English introduction or to the translation in the reference articles. The latest Instant Apps should adopt a similar principle, but Google Play does not allow apps to deliver code, which apps distributed overseas need to keep in mind.

Now that we know what patching can and cannot do, let us return to the technology itself. Because Dexposed cannot support every platform version, it is not suitable for commercial products. So here we only briefly introduce the implementations of AndFix, QZone, and WeChat, along with the problems each faces. You can also refer to the articles in the references that analyze and compare the major hot patch solutions.

1. AndFix

AndFix uses a native hook. It directly replaces the implementation of a method within a class using dalvik_replaceMethod. Because it does not replace the class as a whole, and the relative addresses of the fields in a class are fixed when the class is loaded, AndFix cannot support adding or removing fields (replacing init and clinit only allows field values to be changed).
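
As a loose analogy only, the idea of replacing a single method while leaving the class and its existing objects in place can be mimicked in Python by rebinding a method on a class. This is not how AndFix works internally (AndFix swaps native method structures), and the class and function names below are invented for the example. The failure mode also differs: Python raises an AttributeError where native code would read a wrong address. What carries over is the shape of the limitation: a new method body takes effect at once, but objects that already exist do not gain new fields.

```python
class Greeter:
    def __init__(self):
        self.name = "world"  # the only field an existing object has

    def greet(self):
        return "helo " + self.name  # bug: typo in the greeting


def fixed_greet(self):
    # Same fields, new method body: this kind of fix is expressible.
    return "hello " + self.name


def greet_with_title(self):
    # Needs a field that existing objects were never given.
    return "hello " + self.title + " " + self.name


g = Greeter()                  # created before any patch
before = g.greet()
Greeter.greet = fixed_greet    # replace only the method
after = g.greet()              # takes effect on the live object at once

Greeter.greet = greet_with_title
try:
    g.greet()
    new_field_ok = True
except AttributeError:
    new_field_ok = False       # the existing object has no such field
```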

As a result, AndFix supports a fairly limited set of patch scenarios and can only be used to fix specific problems. Given our existing release process, we would prefer patching to be transparent to developers, that is, a developer should not need to know whether a change is going into a patch or a release (in practice, we also use Git branch management plus cherry-pick). In addition, native replacement brings complex compatibility issues.

AndFix’s biggest advantage over other solutions is that a patch takes effect immediately. Its implementation is in fact somewhat similar to Instant Run’s Hot Swap, but because of its limited usage scenarios, WeChat ruled it out at the very beginning.

2. QZone

The QZone solution is not open source, but Nuwa on GitHub takes the same approach. It uses the ClassLoader to replace classes in a friendlier way. This is similar to how MultiDex is loaded, which largely guarantees stability and compatibility. The principle is not detailed here; you can refer to the reference article.
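
The ClassLoader approach relies on lookup order: classes are searched through an ordered list of Dex files and the first match wins, so a patch Dex placed at the front shadows the original class. The following Python model is a minimal sketch of that ordering only. DexFile, PathList, install_patch, and find_class are illustrative names for this example, not Android APIs, and real class loading involves far more than a dictionary lookup.

```python
class DexFile:
    """A named bag of class definitions (class name -> implementation tag)."""

    def __init__(self, name, classes):
        self.name = name
        self.classes = dict(classes)


class PathList:
    """Ordered list of Dex files searched front to back."""

    def __init__(self, dex_files):
        self.dex_elements = list(dex_files)

    def install_patch(self, patch_dex):
        # Put the patch first so its classes are found before the originals.
        self.dex_elements.insert(0, patch_dex)

    def find_class(self, class_name):
        for dex in self.dex_elements:
            if class_name in dex.classes:
                return dex.name, dex.classes[class_name]
        raise LookupError(class_name)


base = DexFile("classes.dex", {"com.example.Foo": "old Foo",
                               "com.example.Bar": "old Bar"})
patch = DexFile("patch.dex", {"com.example.Foo": "new Foo"})

path_list = PathList([base])
found_before = path_list.find_class("com.example.Foo")
path_list.install_patch(patch)
found_after = path_list.find_class("com.example.Foo")
```

Classes that the patch does not contain, such as com.example.Bar here, still resolve to the original Dex.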

To avoid the unexpected DEX problem, this scheme uses instrumentation (stub insertion). In fact, these checks in the Android system are there for good reason, and working around them causes problems for the QZone scheme on both Dalvik and ART.

Dalvik: during dexopt, a class that passes verification is marked with the pre-verified flag, and the odex file is written after optimization. The optimizations include inlining and quick-instruction optimization.

If instrumentation leaves every class non-pre-verified, verification and optimization are triggered when each class is loaded, which costs some performance. WeChat ran two tests, with and without instrumentation: one loaded 700 classes of about 50 lines each in succession, and the other measured the time for WeChat to complete its full startup.

The average time to verify and optimize one class is not long (it depends on class size), and it happens only once per class. However, because a large number of classes are loaded at startup, the impact in that case is significant.

ART: ART takes a new approach, under which instrumentation has no effect on the efficiency of code execution. However, if the patch modifies a class’s fields or methods, memory address errors may occur. To solve this, the parent class of every class whose fields, methods, or interfaces changed, along with all classes that call such a class, must also be added to the patch package. This can sharply increase the size of the patch package.
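
The rule above, that a changed class drags its parent class and its callers into the patch, can be sketched as a small set computation. This is a simplified model and not the QZone tooling: the function name and the two input maps (parent_of and callers_of) are assumptions, and only the direct parent and direct callers are included, following the wording of the text.

```python
def classes_for_art_patch(changed, parent_of, callers_of):
    """Return the classes a patch must contain on ART under the rule above.

    changed:    classes whose fields, methods, or interfaces were modified
    parent_of:  hypothetical map, class name -> parent class name
    callers_of: hypothetical map, class name -> classes that call it
    """
    result = set(changed)
    for cls in changed:
        parent = parent_of.get(cls)
        if parent is not None:
            result.add(parent)
        result.update(callers_of.get(cls, ()))
    return result


# One modified class pulls in its parent and both of its callers.
patch_classes = classes_for_art_patch(
    changed={"Model"},
    parent_of={"Model": "BaseModel"},
    callers_of={"Model": {"ListActivity", "DetailActivity"}},
)
```

Even in this tiny example a single modified class becomes four classes in the patch, which illustrates why the package size can grow quickly.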

This is because during dex2oat, fast* has already hard-coded every address that can be determined for a class. If the addresses in the patch package differ at run time, calls from the original classes go to the wrong addresses. This explanation may not be detailed enough. In fact, WeChat spent a fair amount of time working through the Dalvik and ART processes in order to identify these two problems. If you are interested, I will write about them in a separate post.

Overall, the QZone scheme has the advantages of being transparent to development and simple, and it currently has the highest application success rate. However, it has limitations in patch package size and performance loss. In particular, whether or not a patch is actually applied, instrumentation affects the application’s run-time performance. WeChat has high performance requirements, so we did not adopt this scheme.

3. WeChat hot patch scheme

Is there a solution that is transparent to development but does not have the drawbacks of the QZone scheme? Instant Run’s Cold Swap and Buck’s Exopackage offer some inspiration: both replace the whole Dex. If we use the new Dex in full, there is no ART address confusion and no need for instrumentation on Dalvik. Of course, given the size of the patch package, we cannot simply put the new Dex in it. We can, however, put the difference between the old and new Dex into the patch package, and the simplest option is the BsDiff algorithm.

Put simply, at compile time we generate the difference between the old and new Dex, patch.dex. At run time, patch.dex is combined with the old Dex from the original installation package to restore the new Dex. This process can be costly in time and memory, so we run it in a separate background process, the patch process. To keep the patch package as small as possible, WeChat developed the DexDiff algorithm, which makes deep use of the Dex format to reduce the size of the difference. Its granularity is each item in the Dex format, which lets it make full use of the information in the original Dex, whereas the granularity of BsDiff is the file and that of AndFix/QZone is the class.
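
The generate-then-restore flow can be illustrated with a toy item-level diff. This is not the DexDiff algorithm, whose details are not given here. It is a minimal sketch that treats a Dex as a plain list of items and uses Python’s difflib to find the parts that can be copied from the old file, so that the patch only has to carry the new items. The names make_patch and apply_patch and the item strings are invented for the example.

```python
from difflib import SequenceMatcher


def make_patch(old_items, new_items):
    """Build-time step: describe new_items as copies from old plus additions."""
    ops = []
    matcher = SequenceMatcher(None, old_items, new_items, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))             # reuse items of the old file
        elif tag in ("replace", "insert"):
            ops.append(("add", list(new_items[j1:j2])))  # ship only new items
        # "delete": nothing to emit, the old items are simply not copied
    return ops


def apply_patch(old_items, ops):
    """Run-time step: rebuild the new item list from the old one and the patch."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old_items[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


old_dex = ["string:Foo", "string:bar", "type:LFoo;", "method:Foo.bar", "class:Foo"]
new_dex = ["string:Foo", "string:bar", "string:baz", "type:LFoo;",
           "method:Foo.bar", "method:Foo.baz", "class:Foo"]

patch_ops = make_patch(old_dex, new_dex)
restored = apply_patch(old_dex, patch_ops)
shipped = sum(len(op[1]) for op in patch_ops if op[0] == "add")
```

Here restored equals new_dex, while the patch carries only the items that do not already exist in the old file.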

I hope to cover this part in a separate article; this is only a preview. The general effect is shown in the picture below. In the most extreme case, completely replacing a 13M Dex, our patch size was only 6.6M because we made use of the information in the original Dex.

[Figure: effect of the DexDiff algorithm on patch size]

But this scheme is not without drawbacks. It brings two problems:

ROM footprint: roughly 1.5 times the size of the Dex you modified (the dexopt output plus the Dex compressed into a JAR).
An additional synthesis step: although we run it in a separate process, the time and memory that synthesis consumes still affect the final success rate.

WeChat’s hot patch solution is called Tinker, a nod to the goblin Tinker in Dota, in the hope that it too can be limitless.

Due to space limitations, the technical details for Dex, libraries, and resources are not covered here; they will be described in separate articles later. Finally, let us compare these schemes as a whole:

[Figure: overall comparison of the hot patch schemes]

If performance loss and patch package size are set aside, the QZone scheme is the simplest and has the highest success rate (it has no separate synthesis step). It also has a smaller ROM footprint than Tinker. The difference between QZone’s success rate and Tinker’s is about 3%.

A complete framework should also be easy to use. Tinker has good support for patch version management, process management, security verification, and so on. We also support both Gradle and the command line. Hopefully, it will be available to everyone soon.

In the previous chapter, we briefly compared the various hot patch solutions, which address how to generate and load a patch package. A complete hot patch system does not stop there, however. It also needs to cover the following aspects:

Network channel: deciding how, and to which users, a patch is pushed.
Release and backend management platform: mainly hot patch release management, history management, report analysis, alarm monitoring, and so on.

1. Status quo of network channels

The network channel is responsible for delivering patch packages to users, whether to specific users or to all users. WeChat currently has the following three channels for hot patch updates:

Pull channel: at login and every 24 hours, the client pulls from the backend to check whether a matching patch package update exists. This is also the most common method.
Push channel for a specified version: in an emergency, we can deliver a patch package update to all users of that version within an hour.
Push channel for specified users: used for remote debugging with specific users or groups of users.
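
The pull channel can be sketched as two small checks on the client side: when to ask the backend, and whether the backend has a patch for the exact version that is installed. This is a hypothetical model, not WeChat’s protocol. The function names, the mapping from version to patch id, and the version strings are all assumptions; only the login trigger and the 24-hour interval come from the text.

```python
PULL_INTERVAL_SECONDS = 24 * 60 * 60  # "every 24 hours" from the text


def should_pull(now, last_pull, just_logged_in):
    """Pull at login, or when 24 hours have passed since the last pull."""
    if just_logged_in or last_pull is None:
        return True
    return now - last_pull >= PULL_INTERVAL_SECONDS


def find_patch(published, installed_version, applied_patch_id=None):
    """Return the patch id published for this exact client version, if new.

    published: hypothetical backend table, client version -> patch id.
    A patch is tied to one client version, so there is no fallback to
    patches built for other versions.
    """
    patch_id = published.get(installed_version)
    if patch_id is None or patch_id == applied_patch_id:
        return None
    return patch_id


published = {"6.3.13": "patch-2"}  # made-up version and patch id
to_download = find_patch(published, "6.3.13")
other_version = find_patch(published, "6.3.15")
```

A client on a version with no published patch, or one that has already applied the current patch, simply gets nothing back.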

For most applications that have not implemented a push channel, a CDN plus a pull channel is relatively easy to implement.

2. Release and backend management platform

WeChat is in fact very cautious about releasing hot patches. The whole release process matches that of a version upgrade: the version number must be changed, the complete testing process must be carried out rigorously, and so on. We also roll out by grayscale and monitor every indicator for the patch version. To monitor patches thoroughly, we do the following:

Cumulative user counts for each version at 1-minute granularity, per hour and per day, so that the number and activity of patch version users can be monitored promptly.
Crash statistics at 3-minute granularity, comparing the benchmark version and the patch version on both the hourly and the daily crash dimensions.
Patch monitoring information is reported within 10 minutes.

Application success rate = number of users on the patch version / number of users of that version before the patch was released. Because users of the benchmark or patch version may since have installed other versions, this figure may read slightly low, but it realistically reflects the online coverage of the patch.
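
As a small worked example of the formula above, the following sketch computes the rate from two user counts. The function name and the counts are made up for illustration; they are not WeChat’s data.

```python
def application_success_rate(patch_version_users, base_users_before_release):
    """Users now on the patch version divided by users of that version
    before the patch was released."""
    if base_users_before_release <= 0:
        raise ValueError("base_users_before_release must be positive")
    return patch_version_users / base_users_before_release


# Made-up counts: 955,000 of 1,000,000 base-version users ended up patched.
rate = application_success_rate(955_000, 1_000_000)
```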

With the QZone scheme, the success rate of WeChat’s patches was about 98.5% after 10 days. With Tinker it is only about 95.5%, mostly because of insufficient storage space and the background process being killed. We are trying retries and reductions in synthesis time and memory to improve the success rate.

Hot patching is moving fast, and Android’s Instant Apps are something to look forward to. Back in China, though, relying on ourselves seems more dependable. Every application’s needs differ somewhat; what is described here is roughly WeChat’s practical experience, and I hope it helps.

Future work

As the WeChat department evolves from a “single app” to “multiple apps”, WeChat is also moving toward open source development practices. We want to componentize each capability so that it can be reused and applied quickly. “Tinker”, WeChat’s hot patch framework, is likewise going through the process of being separated from WeChat and then integrated back into it. Hopefully, in the near future, we can open source “Tinker” and some other WeChat components.

We also hope to find apps for an internal trial that can give us valuable feedback. If you are interested in WeChat’s Tinker solution, you can send us a message or leave a comment at the end of the article. Please state your name, company, and the app you are responsible for. We hope to select some products for the internal trial.

This article is exclusive content from Tencent Bugly. When reprinting, please credit the author and source at the beginning of the article: “Tencent Bugly (http://bugly.qq.com)”.
