QFix is a new hot-patch solution for Android. It avoids the “unexpected DEX” exception on Dalvik without hurting the app's runtime performance (no instrumentation is needed to defeat preverify), and it is lightweight: the whole fix comes down to calling one very simple method.

Hot-patch schemes and their use in Mobile QQ

Since hot-patch technology for Android first appeared in 2015, solutions and frameworks have emerged one after another. The main original approaches are the following:

Mobile QQ (“Hand Q” for short) started studying patch schemes last year. WeChat's Tinker had not been released at that time. For the sake of compatibility and stability, the team chose the approach of hacking the ClassLoader through Java reflection, which is similar in principle to the multi-dex approach that was already mature at the time. The main difficulty was the “unexpected DEX” exception on Dalvik that Qzone had discovered. Since no other fix had been worked out, Qzone's original solution was adopted: instrumenting classes so that preverify fails. Hot patching has been used in release builds of Mobile QQ since January 2016. So far it has fixed more than ten problems, with clear results and very good stability.

Performance cannot improve, so the approach has to change

The instrumentation approach hurts runtime performance because every class in the app is made to reference an empty class that lives in a separate dex. This makes preverify fail during dexopt at install time, so each class has to be verified and optimized again at runtime. Recently, when we tried to optimize the startup performance of Mobile QQ with ReDex, we found:

  • With the existing instrumentation kept, startup performance did not improve
  • With the instrumentation removed and the dex layout of startup-related classes optimized, startup performance improved by 30%

In addition, even a release of Mobile QQ that never ends up needing a patch still has to ship with the instrumentation built in, which is unreasonable in itself. It was therefore necessary to explore a new direction that keeps the ability to patch while removing the negative effects of instrumentation.

Re-analyzing the “unexpected DEX” exception

To find a new solution, we need to go back and analyze the conditions under which this exception occurs:

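This piece of Dalvik source code is called the first time a class from the patch is used after the patch is installed. The exception is thrown only when the three conditions marked in the figure all hold at the same time. The three conditions mean the following:

  1. fromUnverifiedConstant is false, that is, the reference does not come from a const-class/instance-of instruction
  2. The referring class carries the preverify mark
  3. The referring class and the referenced class are not in the same dex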
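The three conditions can be modelled with a few lines of C. This is a simplified sketch and not the Dalvik source: `ModelClass` and `hits_unexpected_dex` are hypothetical names introduced only for illustration, and the real check works on Dalvik's own `ClassObject` structures.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields of a class that matter to the check. */
typedef struct {
    int  dex_id;       /* which dex file the class was loaded from      */
    bool preverified;  /* true if dexopt set the preverify mark on it   */
} ModelClass;

/* Returns true when resolving `resolved` from `referrer` would be rejected
 * with the "unexpected DEX" exception in this model. */
bool hits_unexpected_dex(const ModelClass *referrer,
                         const ModelClass *resolved,
                         bool from_unverified_constant)
{
    if (from_unverified_constant)               /* condition 1 not met */
        return false;
    if (!referrer->preverified)                 /* condition 2 not met */
        return false;
    if (referrer->dex_id == resolved->dex_id)   /* condition 3 not met */
        return false;
    return true;                                /* all three hold */
}
```

Breaking any single condition is enough to avoid the exception, which is why the different patch schemes can each target a different one.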

Qzone's instrumentation scheme breaks condition 2 (the preverify mark is removed from every referring class), while WeChat Tinker's incremental dex synthesis breaks condition 3 (the patch is merged with the app's dex and the whole dex is replaced, so two classes that were originally in the same dex are still in the same dex after one of them is patched). Is there a way to start from condition 1 instead? If fromUnverifiedConstant can be made true, the check is bypassed as well. This has already been tried:

Blog.csdn.net/xwl198937/a…

The main idea is to use a native hook to intercept this system method whenever the system calls it and change the parameter to true. Like AndFix, however, native hooking has a range of compatibility and stability problems, and here the intercepted method is a core Dalvik function that is called very frequently, so the risk is undoubtedly much greater.

Find a new “continent”

The method that contains this logic is dvmResolveClass, which is called when one class references another. Its parameters are the ClassObject of the referring class, the classIdx of the referenced class, and a flag saying whether the Dalvik instruction that makes the reference is const-class/instance-of. It returns the ClassObject of the referenced class. After reading and analyzing the code repeatedly, I finally found a detail that can be exploited:

dvmResolveClass starts by looking up the referenced class in the current dex's cache of resolved classes. If the class is found there, it is returned directly. Only if it is not found, which means the reference has not been resolved yet, does the method load the class and run the check. So once a patch class has been resolved successfully and stored in the cache, all later references to it never reach the “unexpected DEX” exception logic. The get/set logic for the resolved classes of a dex is as follows:

Based on this analysis, I came up with an idea: it is enough to break one of the three conditions the first time the patch class is referenced. Breaking condition 2 as Qzone does, or condition 3 as Tinker does, is too heavy and has lasting effects, while condition 1 is easy to work with. After the patch is installed, the patch class is actively referenced in advance through a const-class/instance-of instruction. This reference triggers loading of the patch class and puts it into the dex's resolved class cache. When the app's real business logic later references the patch class, the class is served directly from that cache. The “unexpected DEX” exception is bypassed simply by executing one lightweight statement, with no other side effects.
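The effect of the cache can be shown with a small model. This is a sketch under simplifying assumptions, not Dalvik code: `ModelDex`, `ModelClass` and `model_resolve` are hypothetical names, and the class that the class loader would return is simply passed in as `loaded`.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TYPES 16

typedef struct ModelDex ModelDex;

typedef struct ModelClass {
    ModelDex *dex;        /* dex the class was loaded from */
    bool      preverified;
} ModelClass;

struct ModelDex {
    /* resolved class cache of this dex, indexed by classIdx */
    const ModelClass *resolved[MODEL_MAX_TYPES];
};

/* Model of the resolve step: cache lookup first, check only on a miss. */
const ModelClass *model_resolve(const ModelClass *referrer, unsigned class_idx,
                                const ModelClass *loaded,
                                bool from_unverified_constant)
{
    if (class_idx >= MODEL_MAX_TYPES)
        return NULL;

    ModelDex *dex = referrer->dex;
    if (dex->resolved[class_idx] != NULL)
        return dex->resolved[class_idx];   /* cache hit: the check is skipped */

    if (!from_unverified_constant && referrer->preverified &&
        referrer->dex != loaded->dex)
        return NULL;                       /* models the "unexpected DEX" failure */

    dex->resolved[class_idx] = loaded;     /* remember the result for next time */
    return loaded;
}
```

In this model, a first call with `from_unverified_constant` set to true fills the cache, and an ordinary reference to the same `class_idx` afterwards returns the cached class even though all three conditions would otherwise hold.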

With multiple dex files, a patch class is likely to be referenced by classes in several different dex files, so do we need to find a referring class in each dex and pre-reference the patch class from it? If the referring class and the patch class were originally in the same dex in the app, the referring class may be preverified, and pre-referencing is required. If they were not originally in the same dex, the referring class is certainly not preverified, because it depends on a class in another dex. Condition 2 is then not met, so no pre-reference is needed. It follows that a patch class only has to be pre-referenced from the dex it belonged to in the original app.

After sorting out the ideas, I immediately verified them in a simple demo:

The patch in the demo contains the class BugObject. A comparison shows that without the pre-reference logic marked by the red box in the figure, the “unexpected DEX” exception occurs. With that line of code added, the demo runs normally and the patch takes effect. dexdump confirms that the patch class is indeed first referenced by a const-class instruction.

It’s not that simple. The initial plan won’t work

In the demo above, the classes contained in the patch were known in advance, but in practice we cannot know beforehand which classes will be patched. In a dex, a const-class/instance-of reference to a class is fixed at compile time, while the classes to patch can only be determined dynamically at run time, so this cannot be done dynamically. The original idea was therefore to instrument a const-class reference for every class in the app, but this has obvious problems:

1) The app contains a very large number of classes, so pre-referencing all of them in one place is unrealistic. The references have to be spread over several partitions, and only the few partitions that contain patch classes perform the pre-reference. The partition granularity is hard to get right, however, and the classes in the app and their number keep changing. We made some attempts but did not find a satisfactory scheme.

2) Pre-referencing all classes increases both the loading time of the referring class and the execution time of the reference statements themselves. The execution time can be reduced by adding a condition: a reference statement is executed only if the class is in the list of patched class names. The initial test results are as follows (a partition contains about 500 classes, the test patch package contains two classes, and the preverified and non-preverified cases are distinguished):

The test data show that the added loading time is considerable. Which classes will be patched cannot be predicted, and if they happen to be spread across several partitions, the cumulative time cost becomes much more severe.

3) Implementing this scheme is particularly tedious and not practical.

Determine the final plan

Since the new solution cannot be implemented in the Java layer, we tried starting from the native layer: the patch class is pre-resolved by calling Dalvik's dvmResolveClass method directly through JNI. Unlike the native hook approach, nothing is intercepted. The method is simply called once with fromUnverifiedConstant set to true. The steps are:

  1. The dvmResolveClass method lives in Dalvik's system library /system/lib/libdvm.so; a handle to this library can be obtained with dlopen
  2. Get the address of the dvmResolveClass method with dlsym
  3. Call dvmResolveClass with the following parameters:
     1) The ClassObject of the referrer: a referring class has to be chosen and its ClassObject obtained
     2) classIdx: the classIdx of the patch class in the app's original dex. Through this classIdx the method looks up the resolved class in the dex or obtains the name of the class
     3) fromUnverifiedConstant: in the C/C++ layer this value can be fixed to 1 or true

The key is obtaining the values of the first two parameters. The first parameter is the ClassObject of the referring class. The first candidate for getting it was dvmFindClassNoInit, the method that dvmResolveClass itself calls. That method needs two arguments to find the ClassObject of a class: the class name is easy to construct, but getting the address of the referring class's ClassLoader object requires extra work. We then found a more convenient method, dvmFindLoadedClass:

This method only needs the descriptor of the class, but the class must already have been loaded successfully. After patch injection it is not hard to find a fixed, successfully loaded referring class in each dex. For the main dex, the XXXApplication class is used directly. For the other dex files, Mobile QQ's multi-dex scheme has the following logic: whenever a dex has been injected, Mobile QQ tries to load a fixed empty class in that dex to verify that the injection succeeded, so this fixed empty class can serve as the referring class for the patch. The second parameter, classIdx, can be obtained with dexdump -h:

This process can be done automatically with a small program:

Input: all dex files of the original APK and the names of all classes in the patch package

Output: for each class in the patch package, the number of the dex it belongs to and its classIdx value

Note 1: If the patch adds a class that does not exist in the original app, the new class will only be referenced by classes in the patch dex, that is, by classes in the same dex, so a newly added patch class does not need to be pre-resolved.

Note 2: The “unexpected DEX” exception comes from Dalvik's implementation and does not exist under ART, so the pre-reference logic for patch classes should only be applied on systems below 5.0.
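The lookup at the core of such a tool could look like the sketch below. It is not the author's program: `find_class_idx` is a hypothetical helper that scans the type_ids table of one dex image held in memory and returns the index of a given class descriptor, which is the class_idx value that dexdump -h reports. It assumes the standard dex header layout and an ASCII-only descriptor, and it does not validate the header beyond basic bounds checks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a little-endian 32-bit value. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the type index (classIdx) of `descriptor`, e.g. "Lcom/example/Foo;",
 * in the dex image, or -1 if it is not present. */
long find_class_idx(const uint8_t *dex, size_t len, const char *descriptor)
{
    if (len < 0x70)                       /* smaller than a dex header */
        return -1;

    uint32_t string_ids_size = read_u32le(dex + 0x38);
    uint32_t string_ids_off  = read_u32le(dex + 0x3C);
    uint32_t type_ids_size   = read_u32le(dex + 0x40);
    uint32_t type_ids_off    = read_u32le(dex + 0x44);
    size_t want = strlen(descriptor);

    for (uint32_t t = 0; t < type_ids_size; t++) {
        size_t type_pos = (size_t)type_ids_off + 4u * (size_t)t;
        if (type_pos + 4 > len)
            return -1;
        uint32_t str_idx = read_u32le(dex + type_pos);   /* type -> string id */
        if (str_idx >= string_ids_size)
            return -1;
        size_t sid_pos = (size_t)string_ids_off + 4u * (size_t)str_idx;
        if (sid_pos + 4 > len)
            return -1;
        size_t data = read_u32le(dex + sid_pos);         /* string data offset */

        /* Skip the ULEB128 length prefix in front of the string bytes. */
        while (data < len && (dex[data] & 0x80))
            data++;
        data++;

        if (data + want + 1 > len)
            continue;
        if (memcmp(dex + data, descriptor, want) == 0 && dex[data + want] == 0)
            return (long)t;
    }
    return -1;
}
```

Running this over every dex of the original APK for every patch class name gives the dex number (the first dex in which the class is found) and the classIdx to ship alongside the patch.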

The overall implementation process of the final new scheme is shown in the figure below:

As you can see, the new solution is a very lightweight implementation. A single, very simple JNI method call solves the problem, with no instrumentation at build time to defeat preverify and no full dex synthesis after the patch is downloaded.
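The native side of this flow can be sketched as follows. This is an illustration under assumptions, not the QFix source: the two function pointer types are assumed shapes of dvmFindLoadedClass and dvmResolveClass with the ClassObject treated as an opaque pointer, `pre_resolve` and the result codes are hypothetical names, and the two `fake_` functions exist only so the flow can be dry-run off a device. On a device the pointers would come from dlsym.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed shapes of the two Dalvik entry points (ClassObject kept opaque). */
typedef void *(*FindLoadedClassFn)(const char *descriptor);
typedef void *(*ResolveClassFn)(const void *referrer, unsigned int class_idx,
                                bool from_unverified_constant);

enum PreResolveResult { PRE_OK, PRE_NO_SYMBOL, PRE_NO_REFERRER, PRE_RESOLVE_FAILED };

/* Pre-resolves one patch class from an already loaded referring class. */
enum PreResolveResult pre_resolve(FindLoadedClassFn find_loaded,
                                  ResolveClassFn resolve,
                                  const char *referrer_descriptor,
                                  unsigned int class_idx)
{
    if (find_loaded == NULL || resolve == NULL)
        return PRE_NO_SYMBOL;              /* symbol lookup failed earlier */

    void *referrer = find_loaded(referrer_descriptor);
    if (referrer == NULL)
        return PRE_NO_REFERRER;            /* referring class is not loaded */

    /* fromUnverifiedConstant is fixed to true, which breaks condition 1. */
    if (resolve(referrer, class_idx, true) == NULL)
        return PRE_RESOLVE_FAILED;
    return PRE_OK;
}

/* Off-device stand-ins, used only to dry-run the flow. */
static int fake_class_object;

void *fake_find_loaded(const char *descriptor)
{
    return descriptor[0] == 'L' ? &fake_class_object : NULL;
}

void *fake_resolve(const void *referrer, unsigned int class_idx,
                   bool from_unverified_constant)
{
    (void)class_idx;
    return (referrer != NULL && from_unverified_constant) ? &fake_class_object : NULL;
}
```

Returning a distinct code for each failure makes it possible to report why a patch did not take effect on a given device.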

Compatibility issues and resolution

Because this solution lives in the native layer, we verified its compatibility thoroughly through crowdtesting:

  1. Exported symbols differ across system versions: In 2.x versions Dalvik is written in C, while in the 4.x versions above 2.3 it is written in C++. Because of C++ name mangling, dvmFindLoadedClass becomes _Z18dvmFindLoadedClassPKc after compilation. Disassembling libdvm.so with IDA shows, however, that the dvmResolveClass symbol did not change

  2. YunOS ROM compatibility: 446 users took part in the first crowdtesting task, and 6 of them reported that the patch did not take effect. According to the result codes they sent back, in every case libdvm.so was loaded successfully but the exported symbol came back as NULL. It later turned out that all 6 users were running a YunOS ROM.

On YunOS, the Dalvik implementation actually lives in a second library, libvmkid_lemur.so. Disassembly shows that the exported symbol names have changed as well, although the internal implementation logic has not:

dvmResolveClass -> vResolveClass
_Z18dvmFindLoadedClassPKc -> _Z18kvmFindLoadedClassPKc

Trying both possible symbol names in the dlsym call solves the problem. This was verified locally and confirmed again by the affected users.
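A minimal sketch of this fallback is shown below. `lookup_first` and the two name tables are hypothetical, and the lookup function is passed in as a parameter with the same shape as dlsym so that the fallback order can be exercised off a device. `fake_yunos_lookup` is a stand-in that behaves like a library exporting only the renamed YunOS symbols.

```c
#include <stddef.h>
#include <string.h>

/* Same shape as dlsym(handle, name). */
typedef void *(*SymbolLookupFn)(void *handle, const char *name);

/* Returns the address of the first candidate name that resolves, or NULL. */
void *lookup_first(SymbolLookupFn lookup, void *handle,
                   const char *const *candidates, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        void *addr = lookup(handle, candidates[i]);
        if (addr != NULL)
            return addr;
    }
    return NULL;
}

/* Candidate names taken from the cases described in this article. */
static const char *const RESOLVE_CLASS_NAMES[] = {
    "dvmResolveClass",              /* stock Dalvik */
    "vResolveClass",                /* YunOS        */
};
static const char *const FIND_LOADED_CLASS_NAMES[] = {
    "_Z18dvmFindLoadedClassPKc",    /* stock Dalvik, C++ builds */
    "_Z18kvmFindLoadedClassPKc",    /* YunOS                    */
    "dvmFindLoadedClass",           /* 2.x builds written in C  */
};

/* Stand-in for dlsym on a library that only exports the YunOS names. */
static int fake_symbol_target;

void *fake_yunos_lookup(void *handle, const char *name)
{
    (void)handle;
    if (strcmp(name, "vResolveClass") == 0 ||
        strcmp(name, "_Z18kvmFindLoadedClassPKc") == 0)
        return &fake_symbol_target;
    return NULL;
}
```

On a device, `dlsym` itself would be passed as the lookup function together with the handle returned by dlopen. A NULL result for either table means the patch cannot be pre-resolved on that ROM and should be reported as such.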

  3. x86 platform compatibility: After the YunOS problem was fixed, 1884 users took part in the second crowdtesting task, and 3 of them reported abnormal results. All of the affected users turned out to be on the x86 platform. Because no x86 build had been provided at first, the ARM dynamic library ran on x86 phones and failed in two ways:

A) Some phones stay stuck on a black screen. The logs show that these phones all have the third-party Houdini library installed, which automatically translates ARM .so files to run on x86, so loading the library and looking up the symbols work without problems. The phone hangs after the dvmResolveClass symbol address has been obtained successfully.

B) Some phones run normally, but the exported symbols are all NULL.

After an x86 build of the .so was provided, both problems were solved.

Conclusion

This article offers a new way to solve the “unexpected DEX” exception that Java-based patch solutions run into on Dalvik. It covers only one link in the larger technical framework of Android patching. If you have any questions, you are welcome to discuss them with us.