At the recent Cloud Computing Conference, Mobile Taobao announced that its mobile container framework, Atlas, would be open-sourced in early 2017. The team has shared parts of the framework publicly before, and it has drawn a lot of outside interest; now it is finally going open source.

This article introduces the design ideas behind Atlas and Mobile Taobao's thinking on containerization, componentization, and dynamic deployment. The content is based on a talk given at the Cloud Computing Conference by Ni Shenghua (Xuanli), a senior technical expert at Alibaba.

What is Atlas

In 2013, the rollout of the company's mobile strategy multiplied both the business and the development staff. The number of client developers grew four- to five-fold from fewer than 100, and the number of business lines grew sharply as well, which put great pressure on the architecture and release pace of the whole client. Atlas, the base framework of the earlier Mobile Taobao client, underwent a major rewrite at that time and became today's Atlas.

Atlas is an Android client container framework that mainly provides support for componentization, dynamic deployment, and decoupling. It addresses the problems engineers face in three phases: engineering (coding), APK runtime, and subsequent operations and maintenance.

In the engineering phase, each project can be developed and debugged independently, and modules are maintained as independent projects.

At runtime, it implements complete component lifecycle mapping, class isolation, and other mechanisms.

In the operations and maintenance phase, it provides fast incremental updates and fixes, so upgrades are quick.

Atlas is a framework that spans both the engineering phase and the runtime phase. Its distinguishing feature is that it pushes as much work as possible into the engineering phase, which keeps the runtime simple and stable.

At present, Atlas is widely used in the Taobao app. More than 60 business components, 20 collaborating teams, and millions of lines of code all run on Atlas. Its rapid iteration capability has moved the app's release cycle from monthly to weekly to any time, with 446 releases in the past six months.

In addition, Atlas itself is very lightweight, with just over 90 classes, and it supports apps both large and small: it is used everywhere from the large Mobile Taobao app to the relatively small Ali Health. Its stability has also been proven, and it is compatible with Android 4.x and above. The overall crash rate of Mobile Taobao has stayed at about 5 per 10,000, of which crashes caused by the container account for less than 1%.

Seen this way, Atlas first has to solve the problem of large-scale team collaboration, which calls for parallel development, rapid iteration, and engineering decoupling, and then the problem of dynamic client updates. The solution we arrived at internally is componentization.

Atlas componentization implementation

Componentization is often called pluginization in the industry, but Atlas componentization differs from today's pluginization in some ways. With componentization, the function of each component must be known in advance, and the design is more standardized.

APK package directory structure


This is a Mobile Taobao APK package. Its top-level directory is exactly the same as that of a standard APK, but the package contains many .so files. If you unpack one of them, its structure resembles a complete APK, yet it cannot run independently. The big difference from plug-ins is at runtime: it runs inside the container. Each component is a separate Bundle.

bundle

In terms of modules, the Mobile Taobao APK can be divided into two layers. The upper layer consists of the split business Bundles, such as code scanning, reviews, and item details. Businesses can call each other's functions and can be dispatched to other businesses through routing. The lower layer is the shared middleware, which exposes capabilities such as the network library and the image library to the business side and is managed uniformly in the container. This has two advantages: it keeps the package as small as possible, and it improves performance.

layered

This is the overall design of Atlas, which is divided into five layers:

The first layer is the Hack layer, which includes the OS Hack Toolkit & Verifier. Here we extend system capabilities and then perform security verification.

The second layer is the Bundle Framework, our container infrastructure, which provides basic capabilities such as Bundle management, loading, lifecycle, and security.

The third layer is the runtime management layer. It includes the manifest, where all bundles and their capabilities are listed so that they are easy to find when making a call. It also includes version management, which tracks the versions of all bundles. Then there is the proxy, which is similar to the mechanism in some plug-in frameworks in the industry: we proxy the system's runtime environment so that bundles run on our container framework. Finally, there are debugging and monitoring tools, which make development and debugging easier during the engineering phase.

The fourth layer is the business layer, where we expose interfaces to the business side, such as the framework lifecycle, configuration files, tool libraries, and so on.

The top layer is the application access layer, which is our business code.

As a framework, Atlas therefore provides fairly complete capabilities. Business-layer code can perform custom actions at each stage of the framework lifecycle, and can freely call the capabilities exposed by the system, the framework, and even other components.

Componentization technical details

This was a brief overview at the container level, but we’ll go into some specifics below.

The Bundle lifecycle is divided into fine-grained nodes. From load to run, a Bundle goes through the following states:

StartInstall: Loading starts. At this point, the framework copies files, extracts native libraries, and loads the Bundle.

Installed: Loading is complete. The framework injects the resource path and creates a class loader.

Resolved: Resolution is complete. The framework checks whether the component configuration is valid and can be resolved.

Active: The component Bundle is being started.

Started: The Bundle has started successfully.
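As a rough illustration of the lifecycle nodes above, the following Python sketch models them as a linear state machine. The `Bundle` class, `advance`, and `start_bundle` are hypothetical names invented for this example and are not the Atlas API; the sketch only assumes that the states are visited strictly in the listed order.

```python
from enum import Enum


class BundleState(Enum):
    # State names follow the lifecycle nodes listed in the text.
    START_INSTALL = 1
    INSTALLED = 2
    RESOLVED = 3
    ACTIVE = 4
    STARTED = 5


# Assumption: each state may only advance to the next one in the list.
NEXT_STATE = {
    BundleState.START_INSTALL: BundleState.INSTALLED,
    BundleState.INSTALLED: BundleState.RESOLVED,
    BundleState.RESOLVED: BundleState.ACTIVE,
    BundleState.ACTIVE: BundleState.STARTED,
}


class Bundle:
    """Hypothetical model of a bundle; not the Atlas class."""

    def __init__(self, name):
        self.name = name
        self.state = BundleState.START_INSTALL
        self.history = [self.state]

    def advance(self):
        """Move to the next lifecycle node, or fail if already started."""
        next_state = NEXT_STATE.get(self.state)
        if next_state is None:
            raise RuntimeError(f"{self.name} is already started")
        self.state = next_state
        self.history.append(next_state)
        return next_state


def start_bundle(bundle):
    """Drive a bundle through every remaining node until Started."""
    while bundle.state is not BundleState.STARTED:
        bundle.advance()
    return bundle.history
```

Keeping the transitions in a single table makes it easy to attach a hook to each node, which is the kind of per-stage customization the framework lifecycle is described as offering.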

The first issue in componentization is Manifest processing. Manifests come from many sources, such as the host Manifest, AAR Manifests, and component Manifests, and the Manifests of different components change often, so the processing has to be flexible.

Our practice is to merge all Manifest files at build time. The important point is that each Bundle's dependencies are merged separately, because dependency mediation is involved. Finally, the merged Bundle Manifests are parsed to obtain the BundleInfoList, which is the Bundle list mentioned above.

The second is class loading, where a Delegate ClassLoader is used to load the component's classes dynamically. The Delegate ClassLoader first looks in the host Bundle's PathClassLoader and then finds the corresponding BundleClassLoader based on the BundleList described earlier.
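The lookup order described above can be sketched as follows. This is a simplified Python model rather than Android code: `SimpleLoader` and the dictionaries passed to `DelegateClassLoader` are hypothetical stand-ins for the host PathClassLoader, the per-bundle class loaders, and the bundle list, and classes are represented by plain strings.

```python
class SimpleLoader:
    """Hypothetical stand-in for a class loader: class name -> class object."""

    def __init__(self, classes):
        self.classes = dict(classes)

    def find_class(self, class_name):
        # Returns None when this loader does not own the class.
        return self.classes.get(class_name)


class DelegateClassLoader:
    """Tries the host loader first, then the bundle that owns the class."""

    def __init__(self, host_loader, class_to_bundle, bundle_loaders):
        self.host_loader = host_loader
        self.class_to_bundle = class_to_bundle  # class name -> bundle name
        self.bundle_loaders = bundle_loaders    # bundle name -> loader

    def load_class(self, class_name):
        # Step 1: look in the host's loader.
        found = self.host_loader.find_class(class_name)
        if found is not None:
            return found
        # Step 2: use the bundle list to pick the owning bundle's loader.
        bundle_name = self.class_to_bundle.get(class_name)
        if bundle_name is None:
            raise LookupError(f"no bundle declares {class_name}")
        found = self.bundle_loaders[bundle_name].find_class(class_name)
        if found is None:
            raise LookupError(f"{bundle_name} does not contain {class_name}")
        return found
```

Because each bundle has its own loader, two bundles can contain classes with unrelated implementations without seeing each other, which is one simple way to picture the class isolation mentioned earlier.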

Class loading

The third is resources. We replace the system Resources with our own DelegateResources, and each bundle's resources are added to the asset path one by one during installation.

In addition, the resource lookup order differs between Dalvik and ART, and vendors such as Xiaomi rewrite their resource classes, so we adapt to different models by appending the asset path either at the end or at the front. AssetManager is a singleton and appends paths at the end by default; to insert a path at the front, the AssetManager object has to be created again. Also, so that the original resources can be replaced when the main dex is dynamically deployed, the insertion order must be consistent with the lookup order.

Note that every update of the resource table must ensure that the APK resources, the runtime system resources (such as WebView), and the bundle resources have all been added successfully and in the correct order.

Resources of different bundles may have naming conflicts. We used a relatively simple method: each bundle is assigned a different package ID, which ensures that business resources never conflict. We tried to solve this problem in the engineering phase. Much existing code looks up resources by reflection; on Android 5.0 and above this is not a problem, because the lookup only returns the first match, so business code can be written today the same way it was written before.
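To make the ID-based isolation concrete, here is a small Python sketch. It relies on the standard Android resource ID layout `0xPPTTEEEE` (package, type, entry), where `0x7f` is the default package ID for an application and `0x01` belongs to the framework. The starting value `0x30` and the helper names are arbitrary assumptions for illustration, not the values or functions Atlas uses.

```python
HOST_PACKAGE_ID = 0x7F    # default package ID aapt assigns to an app
SYSTEM_PACKAGE_ID = 0x01  # package ID of Android framework resources


def assign_package_ids(bundle_names, first_id=0x30):
    """Give every bundle its own package ID (first_id is an arbitrary choice)."""
    ids = {}
    next_id = first_id
    for name in sorted(bundle_names):  # sorted for a deterministic result
        if next_id >= HOST_PACKAGE_ID:
            raise ValueError("ran out of package IDs below the host ID")
        ids[name] = next_id
        next_id += 1
    return ids


def make_resource_id(package_id, type_id, entry_id):
    """Compose a resource ID in the 0xPPTTEEEE layout."""
    return (package_id << 24) | (type_id << 16) | entry_id


def package_of(resource_id):
    """Extract the package byte, which identifies the owning bundle."""
    return (resource_id >> 24) & 0xFF
```

Two bundles can then both define a first entry of the same type without colliding, because the top byte of the resulting IDs differs.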

In terms of componentization performance, we introduced on-demand loading: the Mobile Taobao APK has over 70 bundles, and a given user needs only 5 or 10 of them, so there is no need to load them all. Bundles are isolated from one another and interact through Android's four native component types, which keeps them decoupled.

Every call enters through the BundleInfoList. From this list we find the Bundle that contains the requested component; if that Bundle still needs to be loaded, we perform operations such as install and dexopt.
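A minimal Python sketch of this on-demand path follows. The `Container` class and its method names are hypothetical, and `_install` is only a placeholder for the real install and dexopt work; the point is that a bundle is installed the first time one of its components is requested and never again.

```python
class Container:
    """Hypothetical model of on-demand bundle loading; not the Atlas API."""

    def __init__(self, bundle_info_list):
        # Maps a component name to the bundle that contains it.
        self.bundle_info_list = dict(bundle_info_list)
        self.installed = set()
        self.install_log = []

    def _install(self, bundle_name):
        # Placeholder for the real work (install, dexopt, and so on).
        self.install_log.append(bundle_name)
        self.installed.add(bundle_name)

    def resolve_component(self, component):
        """Return the owning bundle, installing it first if necessary."""
        bundle_name = self.bundle_info_list.get(component)
        if bundle_name is None:
            raise LookupError(f"{component} is not declared by any bundle")
        if bundle_name not in self.installed:
            self._install(bundle_name)
        return bundle_name
```

With this shape, bundles a user never touches are never installed, which matches the observation that a typical user needs only a handful of them.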

In addition, to solve component dependency problems, two new artifact formats are defined: Awb (business Bundle) and SOLib (SO library). The former is consistent with AAR, except that it does not include local libs and undergoes dependency arbitration during the build. The latter expresses dependencies on native SO libraries. An Awb is simply an AAR with a different suffix: use AAR if your package belongs to the host Bundle and Awb if it is a component Bundle.

For business Bundle dependencies, at build time we handle the host Bundle, each business Bundle, and their dependencies separately, then perform tree-based arbitration according to the shortest-path and first-declaration principles to obtain the dependencies each Bundle needs. During packaging, the dependency libraries are placed into their respective bundles.
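The shortest-path and first-declaration rules can be approximated with a breadth-first walk of the dependency tree, as in the Python sketch below. The graph format and the `mediate` function are assumptions made for this example; the actual build plugin may represent and resolve dependencies differently.

```python
from collections import deque


def mediate(root, graph):
    """Pick one version per library using nearest-wins, then first-declared.

    graph maps a (name, version) node to its direct dependencies, each a
    (name, version) tuple, listed in declaration order.
    """
    chosen = {}
    queue = deque(graph.get(root, []))
    while queue:
        name, version = queue.popleft()
        if name in chosen:
            # A nearer or earlier declaration already won for this library.
            continue
        chosen[name] = version
        # Only the winning version's own dependencies are explored further.
        queue.extend(graph.get((name, version), []))
    return chosen
```

Breadth-first order visits nodes level by level, so a library closer to the root is always seen before a deeper one, and within one level the declaration order is preserved.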

Dependencies

Finally, there is the APK build, to which we made some major adjustments. In the figure above, the left part is the standard APK build process, including processing, compilation, and signing.

Awb resources are built on top of the host's resource .ap_ package, and the R file is merged from the Bundle's R resources and the host's R resources. We also modified aapt to assign a different package ID to each Awb. After unified obfuscation, the Dex of each Awb is produced and packaged as an APK; after signing, it is copied to the libs directory, renamed as an SO file, and finally merged into the Taobao APK. That is the whole componentization process.

Atlas dynamic deployment

In a container framework, componentization and dynamic deployment complement each other. Componentization only solves decoupling; to release at any time, the container framework must also be dynamic. We added dynamic support after completing Atlas componentization. One benefit is a smaller package, since some bundles can be downloaded after the app is running. Another is the ability to release and fix dynamically.

Incremental dynamic scenario

Atlas provides dynamic deployment, whose main goals are dynamic business releases and problem fixing. It is based on Mobile Taobao's differential algorithm: the main Bundle is updated through the ClassLoader mechanism and business Bundles through differential merging, and all business types are supported.

In addition, Atlas supports AndFix as a plug-in, aimed at quick fault repair. It works through native hooks and mainly modifies methods. In practice the two can be used together; after adaptation at build time, one set of code can serve both schemes.

plan

First, the Dex patch is generated at the bytecode level: the Dex file is converted into Smali, and the ClassDef and ClassDataMethod structures in it are analyzed, so classes can be deleted, added, and modified. A diff then produces differential files, which are merged to generate the patch.

The second is the generation of the resource patch, which is divided into two parts. The first part is the business Bundle. Business Bundles are loaded dynamically to begin with, so this is relatively simple to implement, and the patch can be obtained through MD5 diff or BSDiff. The second part is the main Bundle. Android has a limitation here: all resources must be in the base Bundle, and a newly added resource does not take effect. One way around this is to reserve many empty resources at packaging time. Updating existing resources is done by resource overwriting.

Finally, a new business feature may add a new Activity. Our practice is to embed a StubActivity in the Manifest in advance and replace it during the Instrumentation.execStartActivity phase. The Intent flags are set according to the Activity's launch mode, and startActivity is called. The system_server process then updates the ActivityStack and creates the binder. After that, the ActivityThread handler is intercepted to update the ActivityInfo and create the target Activity.
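The MD5 comparison step for a business Bundle can be sketched in Python as follows. The function names and the in-memory `path -> bytes` representation are assumptions for illustration; the sketch only decides which files belong in a patch and does not implement BSDiff or the actual patch format.

```python
import hashlib


def md5_of(data):
    """Hex MD5 digest of a file's bytes."""
    return hashlib.md5(data).hexdigest()


def diff_bundle_files(base, new):
    """Compare two versions of a bundle given as {path: bytes}.

    Returns the paths that were added, modified, or removed, each sorted.
    """
    base_sums = {path: md5_of(data) for path, data in base.items()}
    new_sums = {path: md5_of(data) for path, data in new.items()}
    added = sorted(p for p in new_sums if p not in base_sums)
    removed = sorted(p for p in base_sums if p not in new_sums)
    modified = sorted(
        p for p in new_sums
        if p in base_sums and new_sums[p] != base_sums[p]
    )
    return {"added": added, "modified": modified, "removed": removed}
```

Only the added and modified files need to be shipped; a binary diff tool could then shrink each modified file further before the patch is published.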

In addition, in engineering practice, patch generation depends on baselines of the Dex and resources, so every time we release an APK we also publish an AP (baseline package) to Maven. The AP contains all the files that affect the baseline: first the Android APK, second mapping.txt, and finally dependency.txt. This makes the whole build very fast.

With this approach, version upgrades work differently. For example, if the Mobile Taobao details page is updated today, a version is released, but it may not go to the app market; it may be a patch package instead. With dynamic deployment, business versions advance in sequence, from 5.3.0 to 5.3.1 to 5.3.2. The advantage is that, as long as the container version does not change, patches can keep being shipped whenever there is demand, and the differential upgrade is imperceptible to users.

Peripheral optimizations

Finally, we turn to the peripheral optimizations. One reason we are only open-sourcing now is that we ran into many problems along the way.

The first is merging duplicated resources across bundles. Because of the host, conflicts were inevitable, including in image resources, so we moved those shared resources into the host.

The second is dependency verification for bundles. Previously, when bundles depended on source code, such problems were caught at compile time; now that dependencies are binary, they would only surface in production, so we check whether an API change affects any Bundle.

The third is slimming the class libraries. Mobile Taobao uses too many middleware libraries, which makes the app bloated with a very large method count, so packaging includes a step that trims the class libraries and reduces the method count.

The fourth, prompted by dependency problems, is a dependency query library.

The fifth is work on Dex files and obfuscation mapping.

Finally, as part of preparing for open source, we will open-source both the engineering-phase and runtime parts and provide the mechanism through cloud services. Ali Baichuan will provide R&D support for Atlas, including quick generation, release, rollback, monitoring, and so on.