Meituan Dianping is China's largest O2O transaction platform, with nearly 600 million users, 4.32 million merchants, and 11.5 million orders at peak. The Meituan App is one of the platform's main entry points, and the complexity of O2O transaction scenarios places very strict requirements on its stability. A user who has gone to a store to buy a coupon cannot place the order, or cannot select a red envelope that is clearly available: scenes like these are painful to imagine. The biggest disadvantage of a client compared with the Web is the concept of a release. It is hard to respond immediately and effectively to online incidents, and every release feels like walking on thin ice, because no development and testing process, however thorough, can guarantee that bugs will not reach production.

Since last year, several excellent hot update solutions have appeared on the Android platform. They fall into two categories: hot update frameworks based on Multidex, including Nuwa and Tinker, and native hook schemes, such as Alibaba's open-source AndFix and Dexposed. These make it possible for clients to fix online problems in real time. However, after investigation we found that each of them has some problems. Schemes based on native hooks must adapt to both the Dalvik VM and the ART VM, must take instruction-set compatibility into account, and require native code support, all of which affect compatibility. Schemes based on Multidex modify dexElements by reflection to change the loading order of the Dex files, so a patch can only take effect at the next startup, which hurts real-time performance. These schemes may also have problems under Android N's [speed-profile] compilation mode. For details, see "Android N Mixed Compilation and Analysis of hot patches".

Given the fragmentation of Meituan's Android user base, it is hard for any single one of these solutions to cover all users. At the Android Dev Summit at the end of last year, Google announced Android Studio 2.0 with much fanfare. One of its most important new features is Instant Run, which lets code changes take effect in real time (hot swap). After studying how Instant Run works, we implemented a more compatible hot update scheme: the production-ready hotpatch framework Robust.

Principle

During the compilation and packaging stage, the Robust plug-in automatically inserts a piece of code into every function of the production code. The insertion is completely transparent to business development. For example, take the getIndex function in State.java:

public long getIndex() {
    return 100;
}

It is processed into the following implementation:

public static ChangeQuickRedirect changeQuickRedirect;

public long getIndex() {
    if (changeQuickRedirect != null) {
        if (PatchProxy.isSupport(new Object[0], this, changeQuickRedirect, false)) {
            return ((Long) PatchProxy.accessDispatch(new Object[0], this, changeQuickRedirect, false)).longValue();
        }
    }
    return 100L;
}

Robust adds a static member of type ChangeQuickRedirect to each class and inserts logic that uses changeQuickRedirect at the start of each method. When changeQuickRedirect is not null, execution may be routed to accessDispatch, which replaces the old logic and so achieves the fix. If the return value of getIndex is changed to return 106, the generated patch contains two classes: PatchesInfoImpl.java and StatePatch.java. PatchesInfoImpl.java:

public class PatchesInfoImpl implements PatchesInfo {
    public List getPatchedClassesInfo() {
        List patchedClassesInfos = new ArrayList();
        PatchedClassInfo patchedClass = new PatchedClassInfo("com.meituan.sample.d", StatePatch.class.getCanonicalName());
        patchedClassesInfos.add(patchedClass);
        return patchedClassesInfos;
    }
}

StatePatch.java:

public class StatePatch implements ChangeQuickRedirect {
    @Override
    public Object accessDispatch(String methodSignature, Object[] paramArrayOfObject) {
        String[] signature = methodSignature.split(":");
        if (TextUtils.equals(signature[1], "a")) {
            // Boxed as a Long so that the (Long) cast at the call site in getIndex succeeds.
            return 106L;
        }
        return null;
    }

    @Override
    public boolean isSupport(String methodSignature, Object[] paramArrayOfObject) {
        String[] signature = methodSignature.split(":");
        if (TextUtils.equals(signature[1], "a")) {
            return true;
        }
        return false;
    }
}
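The redirect mechanism can be tried out on a plain JVM with a small stand-alone sketch. This is not Robust's actual code: DemoRedirect, DemoState and DemoStatePatch are hypothetical, simplified stand-ins for ChangeQuickRedirect, State and StatePatch, and the method name is passed directly instead of going through PatchProxy and an obfuscated signature.

```java
/** Hypothetical, simplified stand-in for the ChangeQuickRedirect interface. */
interface DemoRedirect {
    boolean isSupport(String methodName, Object[] args);

    Object accessDispatch(String methodName, Object[] args);
}

/** Stand-in for an instrumented class: the check at the top of getIndex() is the inserted code. */
class DemoState {
    public static DemoRedirect redirect;

    public long getIndex() {
        DemoRedirect current = redirect;
        if (current != null && current.isSupport("getIndex", new Object[0])) {
            // A patch is installed and claims this method: run the patched logic instead.
            return ((Long) current.accessDispatch("getIndex", new Object[0])).longValue();
        }
        // No patch installed: fall through to the original logic.
        return 100L;
    }
}

/** Stand-in for a generated patch class that replaces the body of getIndex(). */
class DemoStatePatch implements DemoRedirect {
    @Override
    public boolean isSupport(String methodName, Object[] args) {
        return "getIndex".equals(methodName);
    }

    @Override
    public Object accessDispatch(String methodName, Object[] args) {
        // Return a Long so that the cast in DemoState.getIndex() succeeds.
        return "getIndex".equals(methodName) ? Long.valueOf(106L) : null;
    }
}
```

Assigning a DemoStatePatch instance to DemoState.redirect switches getIndex() to the patched value, and resetting the field to null restores the original behaviour, which is why a patch of this kind can take effect without a restart.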

After the client obtains patch.dex, which contains PatchesInfoImpl.java and StatePatch.java, it loads patch.dex with DexClassLoader, obtains the PatchesInfoImpl class by reflection, and creates an object of that class. Through the getPatchedClassesInfo function of this object, it learns that the class to be patched is com.meituan.sample.d (the obfuscated name of com.meituan.sample.State). It then obtains the com.meituan.sample.d class in the current runtime environment by reflection and assigns to its changeQuickRedirect field an object created with new from the StatePatch class in patch.dex. This is the main patching process. As the analysis shows, Robust only uses DexClassLoader in the normal way, so it can be said that this framework has no compatibility problems.
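The installation step described above boils down to a few reflection calls. The following is a minimal sketch under stated assumptions, not Robust's implementation: Redirect, InstallTarget, InstallTargetPatch, PatchEntry and PatchInstaller are hypothetical names, and both class loaders are ordinary JVM class loaders so that the example runs off-device. On Android, the patch loader would be a DexClassLoader created over patch.dex.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.List;

/** Hypothetical marker type standing in for ChangeQuickRedirect. */
interface Redirect {}

/** Hypothetical app class to which the plug-in has added the static field. */
class InstallTarget {
    public static Redirect changeQuickRedirect;
}

/** Hypothetical patch class; on Android it would come from patch.dex. */
class InstallTargetPatch implements Redirect {}

/** One (class to patch, patch class) pair, like the two constructor arguments of PatchedClassInfo. */
final class PatchEntry {
    final String targetClassName;
    final String patchClassName;

    PatchEntry(String targetClassName, String patchClassName) {
        this.targetClassName = targetClassName;
        this.patchClassName = patchClassName;
    }
}

final class PatchInstaller {
    private PatchInstaller() {}

    /** Installs every entry it can and returns how many were applied. */
    static int install(ClassLoader appLoader, ClassLoader patchLoader, List<PatchEntry> entries) {
        int applied = 0;
        for (PatchEntry entry : entries) {
            try {
                // Find the class to patch in the running app and the patch class in the patch.
                Class<?> target = Class.forName(entry.targetClassName, true, appLoader);
                Class<?> patchClass = Class.forName(entry.patchClassName, true, patchLoader);

                // Create the patch object.
                Constructor<?> constructor = patchClass.getDeclaredConstructor();
                constructor.setAccessible(true);
                Object patch = constructor.newInstance();

                // Assign it to the static field; null receiver because the field is static.
                Field field = target.getDeclaredField("changeQuickRedirect");
                field.setAccessible(true);
                field.set(null, patch);
                applied++;
            } catch (ReflectiveOperationException | RuntimeException e) {
                // This sketch simply skips an entry that cannot be applied.
            }
        }
        return applied;
    }
}
```

Skipping a failed entry instead of aborting is a design choice of this sketch only. The source does not say how Robust reacts when a class or field cannot be found.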

The general process is as follows:



Problems with plug-ins

OK, that is the principle behind Robust. Simple, right? It also worked in the sample. But did everything really go that smoothly? Not quite. When we applied this implementation to Meituan's main App, a problem appeared:

Conversion to Dalvik format failed:Unable to execute dex: method ID not in [0, 0xffff]: 65536

We could not even build the package! In principle, apart from the aar introduced for the patch process, this implementation adds no other methods, and that aar contains only about 100 methods. So how could it push Meituan's mainDex past 65536? Further analysis showed that we had processed over 70,000 functions and that the total method count had increased by 7,661. Why?

Take a look at the dex comparison before and after patch:



Analyzing com.meituan.android.order.adapter.OrderCenterListAdapter.java, we found that the following six methods had been added after hotpatch processing:

public boolean isEditMode() {
    return isEditMode;
}

private int incrementDelCount() {
    return delCount.incrementAndGet();
}

private boolean isNeedDisplayRemainingTime(OrderData orderData) {
    return null != orderData.remindtime && getRemainingTimeMillis(orderData.remindtime) > 0;
}

private boolean isNeedDisplayUnclickableButton(OrderData orderData) {
    return null != orderData.remindtime && getRemainingTimeMillis(orderData.remindtime) <= 0;
}

private boolean isNeedDisplayExpiring(boolean expiring) {
    return expiring && isNeedDisplayExpiring;
}

View getViewByTemplate(int template, View convertView, ViewGroup parent) {
    View view = null;
    switch (template) {
        case TEMPLATE_DEFALUT:
        default:
            ... null);
    }
    return view;
}

However, these extra functions were in the original production code all along. Why did they disappear when Robust was not used, and then appear in the final classes once the plug-in was applied? The only possibility is that ProGuard's inlining was affected: with the Robust plug-in, functions that ProGuard could previously inline can no longer be inlined. Take a look at ProGuard's Optimizer.java:

if (methodInliningUnique) {
    // Inline methods that are only invoked once.
    programClassPool.classesAccept(
        new AllMethodVisitor(
        new AllAttributeVisitor(
        new MethodInliner(configuration.microEdition,
                          configuration.allowAccessModification,
                          true,
                          methodInliningUniqueCounter))));
}
if (methodInliningShort) {
    // Inline short methods.
    programClassPool.classesAccept(
        new AllMethodVisitor(
        new AllAttributeVisitor(
        new MethodInliner(configuration.microEdition,
                          configuration.allowAccessModification,
                          false,
                          methodInliningShortCounter))));
}

As you can see from the comments, functions that are called only once or that are small enough can be inlined. Digging deeper into the code confirms this: private functions that are called only once, and functions whose body is a single line (such as getters and setters), are the most likely to be inlined. The six extra functions in com.meituan.android.order.adapter.OrderCenterListAdapter.java shown earlier bear this out.

Once the cause is known, there is a way to solve the problem. Does the plug-in really need to handle one-line functions that might be inlined? The chance that a one-line function has a bug is small, and even if it does, the bug can be fixed by patching the function it was inlined into or a function that the patch calls. The same applies to functions that are called only once. So the plug-in does not actually need to handle such functions. With this in mind, we added a check to the plug-in so that it skips functions that ProGuard is likely to inline. We tried packaging the group-buying App again, and this time the APK was built successfully. Analysis of the dex in the built APK showed that the optimized plug-in still affects inlining, but it now adds fewer than 1,000 methods, so this is a simple, temporary solution to the problem.
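The check described above might look roughly like the following sketch. The source does not show the plug-in's real criteria, so everything here is an assumption: MethodSummary and InstrumentationFilter are hypothetical types, and the call-site and statement counts are assumed to come from an earlier analysis pass over the compiled classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of one method, produced by an earlier analysis pass. */
final class MethodSummary {
    final String name;
    final boolean isPrivate;
    final int callSites;
    final int statementCount;

    MethodSummary(String name, boolean isPrivate, int callSites, int statementCount) {
        this.name = name;
        this.isPrivate = isPrivate;
        this.callSites = callSites;
        this.statementCount = statementCount;
    }
}

final class InstrumentationFilter {
    private InstrumentationFilter() {}

    /** Heuristic: private methods called exactly once, and one-statement methods, are likely to be inlined. */
    static boolean likelyInlined(MethodSummary method) {
        boolean privateAndCalledOnce = method.isPrivate && method.callSites == 1;
        boolean singleStatementBody = method.statementCount <= 1;
        return privateAndCalledOnce || singleStatementBody;
    }

    /** Returns the names of the methods that should still receive the inserted redirect code. */
    static List<String> methodsToInstrument(List<MethodSummary> methods) {
        List<String> selected = new ArrayList<>();
        for (MethodSummary method : methods) {
            if (!likelyInlined(method)) {
                selected.add(method.name);
            }
        }
        return selected;
    }
}
```

A heuristic of this kind can only approximate ProGuard's own decisions, which is consistent with the observation above that the optimized plug-in still adds some methods.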

Impact

In principle, Robust inserts a piece of logic into every function and adds a ChangeQuickRedirect field to every class, so it inevitably increases the size of the final APK. Taking Meituan's main App as an example, each function grows by 17.47 bytes on average, and we processed more than 60,000 functions across the App, which increased the package size from 19.71 MB to 20.73 MB. Some classes do not need a ChangeQuickRedirect field, so this can be optimized later by filtering those classes out.
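As a rough consistency check on these figures, the per-function overhead can be multiplied out. The sketch below assumes exactly 60,000 functions (the text says "more than") and 1 MB = 1024 × 1024 bytes. SizeEstimate is a hypothetical helper, not part of Robust.

```java
/** Hypothetical helper for a back-of-the-envelope size estimate. */
final class SizeEstimate {
    private SizeEstimate() {}

    /** Estimated size increase in MB (1 MB = 1024 * 1024 bytes, an assumption). */
    static double addedMegabytes(long functionCount, double averageBytesPerFunction) {
        return functionCount * averageBytesPerFunction / (1024.0 * 1024.0);
    }
}
```

With these assumptions, addedMegabytes(60000, 17.47) comes out at about 1.0 MB, which is in line with the reported growth of 1.02 MB (from 19.71 to 20.73).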

Robust adds extra logic to each method. What is the effect on performance?



As the figure shows, for a function that performs only memory operations, the time for 100,000 executions increases by 128 ms after processing. This result was measured on a Huawei 4A.

Impact on startup speed:



The results on the same machine showed a 5ms difference in startup time before and after processing.











Patch issues

Take a look at the patch itself. To make a patch, we might face two problems:

1. How do we handle obfuscation?
2. What if the function being patched uses super-related calls?
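For the first question, the patch has to refer to classes by the names they carry in the released, obfuscated build. One way to obtain those names is to read the class-level entries of the ProGuard mapping file, which have the form `original.Name -> obfuscated.name:`, with member entries indented underneath. The following is only a sketch: MappingReader is a hypothetical helper, it ignores member mappings, and the source does not describe how Robust itself processes the file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for the class-level entries of a ProGuard mapping file. */
final class MappingReader {
    private MappingReader() {}

    /** Maps original class names to obfuscated class names. */
    static Map<String, String> classMappings(String mappingText) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : mappingText.split("\\r?\\n")) {
            // Member entries are indented and comment lines start with '#': skip both.
            if (line.isEmpty() || line.startsWith("#") || Character.isWhitespace(line.charAt(0))) {
                continue;
            }
            String trimmed = line.trim();
            int arrow = trimmed.indexOf(" -> ");
            if (arrow < 0 || !trimmed.endsWith(":")) {
                continue;
            }
            String original = trimmed.substring(0, arrow);
            String obfuscated = trimmed.substring(arrow + 4, trimmed.length() - 1);
            result.put(original, obfuscated);
        }
        return result;
    }

    /** Returns the obfuscated name, or the input unchanged if the class was not renamed. */
    static String obfuscate(String className, Map<String, String> mappings) {
        return mappings.getOrDefault(className, className);
    }
}
```

Given a mapping file containing the entry `com.meituan.sample.State -> com.meituan.sample.d:`, obfuscate("com.meituan.sample.State", ...) yields com.meituan.sample.d, the name used in PatchesInfoImpl above.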

Obfuscation is actually the easier one to deal with. First generate patch.class from the code before obfuscation, then use the class mapping in the mapping file produced when the release package was built to perform string processing on patch.class, so that it uses the obfuscated classes of the online runtime environment.

What if the function being patched uses a super-related call? For example, suppose an Activity's onCreate method calls super.onCreate, and the badMethod of bad.class is this onCreate method. How can patchedMethod in patched.class call the parent class's onCreate through the Activity object? Analyzing how Instant Run handles this, we found that it adds a proxy function to every class specifically to deal with super. Adding a function to every class inevitably increases the total method count, which is a sure way to run into the 65536 problem, so using Instant Run's approach directly is clearly not a good idea.

super is a keyword in Java and cannot be accessed through another object, so it seems impossible to call the parent class's onCreate directly from the patched Java code via an Activity object. However, analysis of the class file shows that a normal function call uses the invokevirtual instruction, while the super.onCreate call uses the invokesuper instruction (written invoke-super in Dalvik bytecode; the corresponding instruction in JVM class files is invokespecial). What if we change the instruction for this call in the class file to invokesuper? Look at the following example. The production code SuperClass.java:

public class SuperClass {
    String uuid;

    public void setUuid(String id) {
        uuid = id;
    }

    public void thisIsSuper() {
        Log.d("SuperClass", "thisIsSuper " + uuid);
    }
}

Production code TestSuperClass.java:

public class TestSuperClass extends SuperClass {
    String subUuid;

    public void setSubUuid(String id) {
        subUuid = id;
    }

    @Override
    public void thisIsSuper() {
        Log.d("TestSuperClass", "thisIsSuper no call");
    }
}

TestSuperPatch.java is the code that DexClassLoader will load:

public class TestSuperPatch {
    public static void testSuperCall() {
        TestSuperClass testSuperClass = new TestSuperClass();
        String t = UUID.randomUUID().toString();
        Log.d("TestSuperPatch", "UUID " + t);
        testSuperClass.setUuid(t);
        testSuperClass.thisIsSuper();
    }
}

In TestSuperPatch.class, we replace the testSuperClass.thisIsSuper() call with an invokesuper call, so that invokesuper is applied to the testSuperClass object, and then load and run it:

Caused by: java.lang.NoSuchMethodError: No super method thisIsSuper()V in class Lcom/meituan/sample/TestSuperClass; or its super classes (declaration of 'com.meituan.sample.TestSuperClass' appears in /data/app/com.meituan.robust.sample-3/base.apk)

The thisIsSuper()V function cannot be found in TestSuperClass or its parent classes! Yet thisIsSuper()V clearly exists in both TestSuperClass and SuperClass, and decompiling the APK confirms it, so why is it not found? Analyzing the implementation of the invokesuper instruction shows that the system looks for the method in the parent class of the class in which the instruction is executed. So TestSuperPatch, like TestSuperClass, should be a subclass of SuperClass. Modify it as follows:

public class TestSuperPatch extends SuperClass {
    ...
}

Then try again:

08-11 09:12:03.012 1787-1787/? D/TestSuperPatch: UUID c5216480-5c3a-4990-896d-58c3696170c5
08-11 09:12:03.012 1787-1787/? D/SuperClass: thisIsSuper c5216480-5c3a-4990-896d-58c3696170c5

Looking at the testSuperCall implementation, setUuid assigns the result of UUID.randomUUID().toString() to the uuid field that the testSuperClass object inherits from its parent class. The log shows that, after processing, testSuperClass.thisIsSuper() does call the super thisIsSuper function on the testSuperClass object. OK, the super problem seems to be solved, and this approach does not increase the method count.
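A similar rule can be observed on a standard JVM without editing bytecode. This is not the technique used above, which rewrites the call instruction inside the patch class. It is an analogous illustration using MethodHandles.Lookup.findSpecial, which performs a non-virtual call as if it were issued from inside a given class. DemoParent and DemoChild are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class DemoParent {
    public String describe() {
        return "DemoParent";
    }
}

class DemoChild extends DemoParent {
    @Override
    public String describe() {
        return "DemoChild";
    }

    /** Calls DemoParent.describe() on the given instance, bypassing the override. */
    static String describeViaSuper(DemoChild target) {
        try {
            // The lookup is created inside DemoChild, so DemoChild may act as the
            // "special caller": the call behaves like super.describe() written in DemoChild.
            MethodHandle handle = MethodHandles.lookup().findSpecial(
                    DemoParent.class,
                    "describe",
                    MethodType.methodType(String.class),
                    DemoChild.class);
            return (String) handle.invoke(target);
        } catch (Throwable t) {
            throw new IllegalStateException("super call failed", t);
        }
    }
}
```

Here an ordinary call to describe() on a DemoChild object returns "DemoChild", while describeViaSuper returns "DemoParent". As with invokesuper in the patch class, the super call is only permitted from a class that stands in the right place in the hierarchy.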

The effect after going online

Is Robust reliable?



We put the patch online at about 17:00 on 07.14. Over the following days, until the patch was taken offline on 07.17, the online problem was clearly fixed. After the patch was taken offline, the problem rose noticeably on 07.18, and the patch was then brought back online.



What about patch compatibility and success rate? The theoretical analysis above suggests that this implementation has basically no compatibility problems. The actual online data is as follows:



The indicators are defined as follows:

Patch list pull success rate = users who successfully pull the patch list / users who attempt to pull the patch list
Patch download success rate = users who successfully download the patch / users who successfully pull the patch list
Patch application success rate = users who successfully apply the patch / users who successfully download the patch

The table shows that pulling the patch information has the lowest success rate, above 97% on average, which is due to real-world network conditions. Once a patch has been downloaded, the success rate of applying it has always been above 99.8%. Moreover, we delivered patches to all users without distinction, and the server did not filter by device model or version. The online results again demonstrate Robust's high compatibility.

Conclusion

At present, the existing hot update schemes for Android Apps in the industry, both Multidex-based and native hook, have some compatibility problems. For this reason, we borrowed the principle of Instant Run to implement a more compatible hot update scheme, Robust. Besides high compatibility, Robust has the advantage of taking effect in real time. Replacement of so libraries and resources has not been implemented yet, but the framework is fully capable of supporting it in the future. Of course, although the solution is transparent to developers, the plug-in does intrude on production code at compile time, which has side effects on performance, method count, and package size. These are what we are working on next.
