Author: Liu Tianyu (Modest Wind)

This article is the fifth in the series “Opening Fire on Engineering Corruption”, following the installments on proguard governance, manifest governance, Java code governance, and resources governance. It focuses on a niche area: the dynamic link library (SO). Open fire on engineering corruption!

In Android development, dynamic link libraries (SO) are generally written in C/C++. In recent years, with the rise of Rust, Rust code can also be found both at the AOSP system level and at the app level. Whatever the development language, the end result in the APK and at runtime is an SO file in ELF format. This article focuses on the SO itself: the corruption items of ABI incompatibility, duplication, conflict, and useless exported symbols, together with the tooling and governance practices built around them.

Basic knowledge

This chapter does not explain how to write an SO in C/C++ or any other language. Instead, from the perspective of an “outsider” (not a C/C++ developer) looking at the app as a whole, it shares some interesting knowledge picked up during Android architecture work in recent years.

1.1 C++ standard template library (STL)

When developing an SO in C++, if you use the C++ standard template library, you need to specify which implementation to use. The following options are available:

  • libc++. LLVM’s libc++ is an implementation of the STL specification. It has been used by the Android OS since version 5.0, and as of NDK r18 it is the only STL available in the NDK. libc++ is therefore also the official STL for Android;
  • gnustl and stlport. gnustl is the GNU project’s implementation and STLport is a separate third-party implementation. Both were supported in older NDK versions and, as mentioned above, are no longer available as of NDK r18. Avoid them in current development;
  • system. The STL implementation built into the Android system. It only provides new and delete and is generally not used. It was likewise deprecated after NDK r18.

After selecting a specific STL, there are two linking options available:

  • Static linking. The STL code that is used is linked (copied) into the SO;
  • Dynamic linking. The STL code is not copied into the SO at link time. Instead, the STL symbols used are recorded in the SO’s dynamic symbol table, and at run time they are bound to and invoked from the STL’s own SO.

When the app has only one SO, static linking is recommended to reduce package size. When the app contains multiple SOs and all of them link the STL statically, the STL implementation is copied into each SO, which greatly increases package size, so dynamic linking should be chosen instead. Note, however, that both multiple SOs statically linking the same STL and multiple SOs dynamically linking different STLs can lead to abnormal runtime behavior and even crashes. The best approach is therefore to use a single link mode and a single STL throughout the app.
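The rule above (one STL, one link mode, and no STL statically linked into several SOs) can be expressed as a small consistency check. The following is a minimal sketch in Python, not the article's tool: the function name, the input mapping, and the way STL information is obtained for each SO are all assumptions made for illustration.

```python
def check_stl_consistency(so_stl):
    """Check the "one STL, one link mode" rule.

    `so_stl` is a hypothetical mapping from an SO name to a
    (stl_name, link_mode) pair, where link_mode is "static" or "shared".
    Returns a list of human-readable warnings (empty when consistent).
    """
    warnings = []
    stls = sorted({stl for stl, _ in so_stl.values()})
    modes = sorted({mode for _, mode in so_stl.values()})
    if len(stls) > 1:
        warnings.append("multiple STLs in one app: " + ", ".join(stls))
    if len(modes) > 1:
        warnings.append("mixed STL link modes: " + ", ".join(modes))
    # The same STL copied into more than one SO is the risky static case.
    static_users = sorted(
        name for name, (_, mode) in so_stl.items() if mode == "static"
    )
    if len(static_users) > 1:
        warnings.append(
            "STL statically linked into several SOs: " + ", ".join(static_users)
        )
    return warnings
```

An empty result means every SO in the (hypothetical) input uses the same STL in the same link mode.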

1.2 SO dynamic linking (dependencies)

For a module developed in C/C++, referencing functionality provided by another module offers the same two options as the STL: dynamic linking and static linking. Note that if the module being depended on already exists in the APK as an SO, dynamic linking should be chosen; otherwise, use static linking. With dynamic linking, the resulting SO records this dynamic dependency. Specifically, the information is stored in the “.dynamic” section of the SO file and can be read with the readelf tool (among others). For example:

Dynamic section at offset 0x2d18 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [liblog.so]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc++_shared.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so]
 0x0000000000000001 (NEEDED)             Shared library: [libslimlady_core.so]
 0x000000000000000e (SONAME)             Library soname: [libslimlady.so]

As the output above shows, the entry of type SONAME records the name of the SO. This name is only used when other SOs depend on this SO and it is looked up in the search path. It does not have to match the file name of the SO exactly, but in the Android environment the two are generally kept the same. libslimlady.so dynamically links against (depends on) six SOs, listed in the entries of type NEEDED:

  • libdl.so. The dynamic linker library, which provides the ability to dynamically load other SOs. Essentially every SO on Android depends on it;
  • libc.so, libm.so. These can be regarded as the basic C runtime libraries, and likewise as used by every SO on Android;
  • liblog.so. The Android platform logcat logging library. C/C++ code that needs to print to logcat must dynamically link this library and call its functions;
  • libc++_shared.so. This is the dynamically linked form of libc++, the LLVM standard template library;
  • libslimlady_core.so. This is another SO in the APK. libslimlady.so depends on it through dynamic linking in order to call the methods it defines.
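To collect this information programmatically, the text printed by readelf can be parsed. The sketch below is a minimal Python illustration that extracts the SONAME and NEEDED entries from the sample output shown above; the function name and the regular expression are assumptions, and only the two entry types seen in the sample are handled.

```python
import re

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [liblog.so]
#  0x000000000000000e (SONAME)   Library soname: [libslimlady.so]
_ENTRY = re.compile(
    r"\((NEEDED|SONAME)\)\s+(?:Shared library|Library soname): \[(.+?)\]"
)

def parse_dynamic_section(readelf_output):
    """Return (soname, needed_list) from the text printed by `readelf -d`."""
    soname, needed = None, []
    for line in readelf_output.splitlines():
        match = _ENTRY.search(line)
        if not match:
            continue
        kind, name = match.groups()
        if kind == "SONAME":
            soname = name
        else:
            needed.append(name)
    return soname, needed

SAMPLE = """\
Dynamic section at offset 0x2d18 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [liblog.so]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc++_shared.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so]
 0x0000000000000001 (NEEDED)             Shared library: [libslimlady_core.so]
 0x000000000000000e (SONAME)             Library soname: [libslimlady.so]
"""

soname, needed = parse_dynamic_section(SAMPLE)
```

For a real file, the input text could be obtained by running `readelf -d` on the SO, for example through `subprocess.run(["readelf", "-d", path], capture_output=True, text=True)`.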

In fact, systems that support the dynamic linking described above also support a more flexible way to load an SO: explicit runtime linking. With this approach the dependency is not recorded in the SO file. Instead, the code loads the other SO on demand at run time (dlopen), obtains the address of the target symbol (dlsym), and calls it. This is not covered in detail here.
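As a rough illustration of the dlopen/dlsym pattern only, the sketch below uses Python's ctypes module, which wraps the same loader calls on POSIX systems. It is a stand-in for native C/C++ code, not Android code, and it assumes a POSIX host where the C library's strlen is already loaded into the process.

```python
import ctypes

# ctypes.CDLL(None) asks the loader for a handle covering the symbols already
# loaded into the current process (the dlopen step; POSIX systems only).
process = ctypes.CDLL(None)

# Attribute access resolves the symbol by name at run time (the dlsym step).
strlen = process.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

# The call goes through the address obtained above; nothing about this
# dependency was recorded at link time.
length = strlen(b"libslimlady.so")
```

Because the symbol is looked up by name at run time, a static scan of NEEDED entries or imported symbols cannot see this kind of dependency.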

1.3 Analysis of the SO loading process

Next, let’s look at what the basic loading process of an SO looks like.

[Figure: SO loading process analysis]

When code calls System.loadLibrary to load an SO, the Java API framework layer first looks up the absolute path of the SO file. The search paths are stored in the following places:

  • On OS versions below 6.0: the nativeLibraryDirectories member of the DexPathList instance held by the BaseDexClassLoader object;
  • On OS 6.0 and above: the nativeLibraryPathElements member of the DexPathList instance.

After the absolute path of the target SO file is found, the Java virtual machine checks whether the SO has already been loaded and returns immediately if so. If not, processing continues in the nativeloader and linker layer, where the actual loading takes place. The SO file header is parsed first and the set of SOs it dynamically links against is collected. If that set is empty or every SO in it is already loaded, the linker checks once more whether the target SO itself is loaded (concurrency makes this second check in the native layer necessary) and, if not, loads it directly. Note that this description is simplified: it omits the check of whether each dynamically linked SO is already loaded. In practice, all levels of dependencies are loaded one by one in a breadth-first traversal. During the traversal, the absolute path of each SO file must also be resolved from its name. The search path comes from the following sources:

  • On OS versions below 7.0, the search path held by the Java layer is used;
  • On OS 7.0 and above, native SO loading introduces the concept of a namespace. When a BaseDexClassLoader instance is created, a namespace is created in the nativeloader layer and receives a copy of the Java layer search path.
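The breadth-first traversal of dependencies described above can be sketched as follows. This is a simplified Python illustration, not the Android linker's implementation: the function name and the dependency map are hypothetical, and path lookup, namespaces, and locking are left out.

```python
from collections import deque

def breadth_first_dependencies(target, needed):
    """Visit `target` and everything it links against, level by level.

    `needed` maps an SO name to the list of SO names in its NEEDED entries.
    Each SO is visited only once, mirroring the "already loaded" check.
    """
    order, seen, queue = [], {target}, deque([target])
    while queue:
        current = queue.popleft()
        order.append(current)
        for dep in needed.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order

# Hypothetical dependency map for illustration.
graph = {
    "libslimlady.so": ["liblog.so", "libc++_shared.so", "libslimlady_core.so"],
    "libslimlady_core.so": ["libc++_shared.so", "libc.so"],
}
order = breadth_first_dependencies("libslimlady.so", graph)
```

The returned list is the order in which the SOs are discovered; libc++_shared.so appears once even though two SOs depend on it.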

The loading process is not fully consistent across OS versions. What is described above is an abstract, simplified version of the SO loading process, and the real situation is much more complicated. In addition, SO loading is thread-safe, so one SO is never loaded into memory more than once. For the same reason, concurrent loading of SOs may lead to blocking waits, which deserves special attention. Furthermore, to load an SO from outside the APK, one solution is to add the external path at the Java layer. If dynamic links (dependencies) between several SOs are involved, the inconsistency between the Java layer and native layer search paths must not be overlooked: if the target SO is not in the APK, it may not be found; if it is in the APK, both the external copy and the built-in copy may end up loaded into memory.

For an Android developer, even one who never needs to write C/C++ code at work, knowing more about the dynamic link library SO is a great help in fully understanding how the app runs. I recommend one of my favorite books: A Programmer’s Self-Cultivation: Linking, Loading, and Libraries.

Governance practices

As engineering modules and features grow, SO corruption gradually accumulates: C++ standard template library usage is inconsistent, and widespread static linking of the STL inflates package size unnecessarily; when an SO is added or updated, a required ABI may be missing, causing crashes on the affected devices because the SO cannot be found; occasional duplicate SOs also hurt package size and stability; and the accumulation of useless exported symbols increases package size further. These are all real problems encountered in Youku’s past struggle against SO “corruption”. The approach is to build effective detection with dedicated tools, turn that detection into a gate (checkpoint) in the daily development workflow, and gradually work off the existing problems while guaranteeing that no new ones are added.

When locating and investigating problems, a natural basic need is to find out quickly which module an SO comes from. The large number of second-party and third-party modules, together with the increasing modularization of the app project, makes this information more and more costly to obtain. A “modules containing SO” listing was therefore developed first, to quickly check which module a target SO lives in (the app project, a subproject, a flat AAR, or an external dependency). Example result:

com.youku.android:YNativer:1.2.20210119.2
|-- libPdora.so
|   |-- armeabi
|   |-- arm64-v8a
|-- libaua.so
|   |-- armeabi
|   |-- arm64-v8a
com.alient.media:Alier:2.20210202.16
|-- libalier.so
|   |-- armeabi-v7a
|   |-- arm64-v8a
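A listing like this can be built by indexing the archive entries of each module. The following is a minimal Python sketch under stated assumptions: the function name and input shape are hypothetical, and native libraries are assumed to sit under `jni/<abi>/` (the AAR layout) or `lib/<abi>/` (the APK layout).

```python
import re
from collections import defaultdict

# <dir>/<abi>/<name>.so, where <dir> is "jni" (AAR) or "lib" (APK).
_SO_ENTRY = re.compile(r"^(?:jni|lib)/([^/]+)/([^/]+\.so)$")

def index_so_by_module(module_entries):
    """Build {module: {so_name: [abi, ...]}}.

    `module_entries` maps module coordinates to the list of entry names in
    that module's archive (for example from zipfile.ZipFile.namelist()).
    Modules without any SO are omitted from the result.
    """
    index = {}
    for module, entries in module_entries.items():
        so_abis = defaultdict(list)
        for entry in entries:
            match = _SO_ENTRY.match(entry)
            if match:
                abi, so_name = match.groups()
                so_abis[so_name].append(abi)
        if so_abis:
            index[module] = dict(so_abis)
    return index
```

Looking up the module of a given SO is then a scan over this index for the SO name.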

In addition, as the basic knowledge chapter mentioned, dynamic dependencies can exist between SOs. Checking what a single SO depends on is easy with the relevant tools, but quickly finding out which SOs depend on a given SO across the whole APK is not. As an auxiliary tool, an SO dependency detection function was therefore developed to analyze the dynamic link (dependency) relations among all SOs in the APK. The result lists only level 1 dependencies: if A -> (depends on) B -> (depends on) C, the list contains only C <- B and B <- A. Example analysis result:

* libc++_shared.so  # SO name
|-- armeabi-v7a ()  # ABI; the parentheses list the modules that this ABI/SO comes from; empty parentheses mean it does not exist in the APK
|   |-- libhnd.so (com.youku.arch:Hnd:2.8.15)  # armeabi-v7a/libhnd.so, in the Hnd module, depends on armeabi-v7a/libc++_shared.so
|   |-- libslimlady.so (project.extaar.app:slimlady:1.0)
|-- arm64-v8a ()
|   |-- libhnd.so (project:app:1.0, com.youku.arch:Hnd:2.8.15)
|   |-- libslimlady.so (project.extaar.app:slimlady:1.0)
* libusb100.so
|-- armeabi-v7a (project:library-aar-2:1.0)
|   |-- libUVCCamera.so (project:library-aar-2:1.0)
|   |-- libuvc.so (project:library-aar-2:1.0)
|   |-- libUSBAudioDevice.so (project:library-aar-2:1.0)
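The level 1 reverse dependency list amounts to inverting each SO's NEEDED entries. A minimal Python sketch follows; the function name and input shape are assumptions, and the module information shown in the sample output is omitted for brevity.

```python
from collections import defaultdict

def reverse_dependencies(needed_by_so):
    """Invert level 1 dependencies.

    `needed_by_so` maps (abi, so_name) to the list of names in that SO's
    NEEDED entries. Returns {dependency: {abi: [dependent so names]}}.
    """
    dependents = defaultdict(lambda: defaultdict(list))
    # Sorting keeps the output stable regardless of input order.
    for (abi, so_name), needed in sorted(needed_by_so.items()):
        for dep in needed:
            dependents[dep][abi].append(so_name)
    return {dep: dict(per_abi) for dep, per_abi in dependents.items()}
```

Only direct edges are recorded, so a chain A -> B -> C yields B under C and A under B, but never A under C.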

Next, the governance practices of each SO “corruption” item are explained one by one.

2.1 ABI incompatibility

ABI stands for Application Binary Interface. Different Android devices use different CPUs, and different CPUs support different instruction sets. Each combination of CPU and instruction set has its own ABI. On the Android platform, ABIs differ mainly in two respects:

  • Available CPU instruction sets, and extended instruction sets;
  • Specifications for passing data between applications and systems (including alignment restrictions), and how stacks and registers are used when the system calls functions.

The current Android ecosystem is dominated by CPUs with the ARM instruction set, which further divides into 32-bit and 64-bit ARM. Most new mobile devices now use 64-bit CPUs, but for historical reasons many apps still support only 32-bit ARM. Upgrading an app to support 64-bit ARM has clear advantages:

  • Performance. The instruction set of a 64-bit ARM CPU executes more efficiently. Making full use of it can effectively improve the app experience;
  • Memory. A 32-bit app process has at most 2^32 bytes of virtual memory, that is, 4 GB, and because the OS and other components take part of it, less than 4 GB is actually available. As screen resolution, CPU power, and other hardware improve, apps carry more and more complex features, so their demand for virtual memory grows and OOM problems caused by insufficient virtual memory get worse. With 64-bit support, the virtual memory limit on 64-bit devices exceeds 4 GB. The theoretical limit is 2^48 bytes, which greatly alleviates OOM problems caused by virtual memory.

Of course, 64-bit is not without downsides, and package size is one of them. For the same C/C++/Rust code, the 64-bit SO file is noticeably larger because instructions and data occupy more bytes. This has not stopped the Android ecosystem from moving to 64-bit apps: since August 1, 2019, Google Play has required all new apps and app updates that include an SO to support 64-bit ARM, otherwise they are not approved. App stores in China have followed suit. For example, Samsung and Huawei each introduced related restrictions or promotion measures in 2020, and more app stores have since joined the push for 64-bit support.

As for how to deliver 64-bit support, Google Play provides the app bundle, a componentization technology in which the set of SOs for each ABI is treated as a feature module and the store assembles an APK per device. In China, however, app stores do not (and cannot) support this uniformly. Besides the app bundle, a second mode is “split packaging”: one 32-bit APK and one 64-bit APK, with the app store automatically delivering the matching APK based on the CPU information of the device. This also depends on app store support and faces the same problem of inconsistent support. A third mode is “combined packaging”: a single APK that contains both the 32-bit and the 64-bit SOs. It needs no extra support, but the APK size grows sharply.

[Figure: Comparison of 64-bit support schemes]

For non-app-store channels, only two options keep the APK usable on all devices: a 32-bit APK or a combined APK. In these channels traffic usually costs real money, and a large package greatly reduces the download-to-install conversion rate. The combined APK therefore rarely meets the need, and the user experience benefits of a 64-bit APK are usually sacrificed in favor of a 32-bit APK. In the past two years, leading apps in the industry have tended to launch independent “ultra-small packages” for non-app-store channels. Business and operations aside, the technical implementation is an interesting topic, but it is outside the scope of this article.

In 2020, Youku adopted the split-package mode (one 32-bit APK and one 64-bit APK) to support 64-bit. During the migration, many existing dynamic link libraries contained only a 32-bit SO, which made it impossible for the app as a whole to support 64-bit devices. Keeping 32-bit and 64-bit compatibility during feature iteration is no small challenge either: both adding a new SO and upgrading an existing one can leave a 32-bit or 64-bit SO missing, and it is impossible to verify every project or code change on both the 32-bit and the 64-bit APK.

A 32/64-bit ABI compatibility detection tool was developed for this purpose. For SOs of the same name, if a 32-bit ARM variant (armeabi or armeabi-v7a) and a 64-bit ARM variant (arm64-v8a) are not both present, the ABI is judged incompatible. Going a step further, the tool offers an option to terminate the build when the check fails, which forms a gate (checkpoint) mechanism. A sample detection result follows; it also shows which module each SO comes from:

libUVCCamera.so
|-- armeabi-v7a ([project:library-aar-2:1.0])
libjpeg-turbo1500.so
|-- armeabi-v7a ([project:library-aar-2:1.0])
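The compatibility rule itself is short enough to sketch. The following Python illustration is not Youku's tool: the function name and input shape are assumptions, and only the three ARM ABIs named above are considered.

```python
ARM32 = {"armeabi", "armeabi-v7a"}
ARM64 = {"arm64-v8a"}

def find_abi_incompatible(so_abis):
    """Return the SOs that lack either a 32-bit or a 64-bit ARM variant.

    `so_abis` maps an SO name to the collection of ABIs it is packaged for.
    The result maps each offending SO name to its sorted list of ABIs.
    """
    result = {}
    for name, abis in so_abis.items():
        abis = set(abis)
        has_32 = bool(abis & ARM32)
        has_64 = bool(abis & ARM64)
        if not (has_32 and has_64):
            result[name] = sorted(abis)
    return result

incompatible = find_abi_incompatible({
    "libUVCCamera.so": {"armeabi-v7a"},
    "libalier.so": {"armeabi-v7a", "arm64-v8a"},
})
```

A build gate could then fail the build whenever the result is non-empty, for example by raising an error that lists the offending SOs.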

Youku launched the SO ABI incompatibility gate in January 2021. It has blocked builds 11 times in total, effectively guaranteeing compatibility with both 32-bit and 64-bit devices. In fact, this detection capability and gate mechanism work as expected regardless of which mode is used to support 64-bit APKs.

[Figure: ABI incompatibility governance status]

2.2 Duplicate SO

A duplicate SO means that, under the same ABI, SO files with different names have the same MD5 value. Generally this results from renaming a copy of the same SO file. During the APK build, every duplicate is packaged into the APK and increases package size. In addition, once they are all loaded into memory, they pose runtime risks, for reasons similar to those described in Chapter 1 for an STL statically linked into multiple SOs. An example check result:

[armeabi-v7a] md5: c0598ed0b87843147152e14bba2b036f
|-- libmitaec.so (com.youku.android:DQI4Android:1.2.0.10)
|-- libNlsAEC.so (com.youku.android:DQI4Android:1.2.0.10)
[armeabi] md5: 66a9cf3fcd1739ad01d637418e97ebc5
|-- libwxjst.so (com.tb.android:ws:0.26.4.45-youku)
|-- libwxjsb.so (com.tb.android:ws:0.26.4.45-youku)
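Duplicate detection reduces to grouping files by ABI and content hash. The sketch below is a minimal Python illustration with a hypothetical function name and input shape; it hashes file contents passed in as bytes rather than reading from an APK.

```python
import hashlib
from collections import defaultdict

def find_duplicate_so(files):
    """Group SOs that share an ABI and an MD5 but have different names.

    `files` is an iterable of (abi, so_name, content_bytes) tuples.
    Returns {(abi, md5_hex): sorted list of names} for groups with
    more than one distinct name.
    """
    groups = defaultdict(set)
    for abi, name, content in files:
        digest = hashlib.md5(content).hexdigest()
        groups[(abi, digest)].add(name)
    return {
        key: sorted(names) for key, names in groups.items() if len(names) > 1
    }
```

The same file name appearing under a different ABI is not a duplicate in this sense, because the ABI is part of the grouping key.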

The duplicate SO check also offers an option to terminate the build when the check fails, forming a gate (checkpoint) mechanism. In actual iteration this should rarely trigger, since normal development does not deliberately rename an SO file. In more than a year of practice at Youku, only two existing duplicate SOs were found, and no build has been intercepted for this reason since the gate went live in February 2021. Such a check and its gate are still necessary, because an uncommon problem that leaves no “clues” is hard to notice in time once it occurs and may persist for a long time. This is a typical deeply hidden “engineering corruption” problem and must be prevented.

2.3 Conflicting SO

A conflicting SO means that, under the same ABI, files with the same name have different MD5 values. During an APK build, same-named SOs under one ABI either fail the build (the default behavior, which does not make it easy to tell which modules the SOs come from) or cause the first one encountered to be chosen (pickFirsts, which is effectively “random” and carries uncertain risk), depending on the build configuration (packagingOptions). The conflict detection function was developed mainly to make it easy to locate the modules that the conflicting SOs come from, because the Android Gradle Plugin does not report them after a failed build. Sample check output:

[armeabi-v7a] libaceManager.so
|-- com.youku.arch:Hnd:2.8.16-SNAPSHOT (43392841f299f7b2e35df4bd85703272)
|-- com.youku.android:ALib:1.0.2 (b7f8d6fc7ba25073e8743c061ed9e92a)
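Conflict detection is the mirror image of duplicate detection: group by ABI and file name, then compare hashes. A minimal Python sketch follows, with a hypothetical function name and input shape.

```python
from collections import defaultdict

def find_conflicting_so(entries):
    """Find same-named SOs under one ABI whose MD5 values differ.

    `entries` is an iterable of (abi, so_name, module, md5_hex) tuples.
    Returns {(abi, so_name): {module: md5_hex}} for every file name that
    is provided with more than one distinct MD5.
    """
    by_file = defaultdict(dict)
    for abi, name, module, digest in entries:
        by_file[(abi, name)][module] = digest
    return {
        key: modules
        for key, modules in by_file.items()
        if len(set(modules.values())) > 1
    }

conflicts = find_conflicting_so([
    ("armeabi-v7a", "libaceManager.so", "com.youku.arch:Hnd:2.8.16-SNAPSHOT",
     "43392841f299f7b2e35df4bd85703272"),
    ("armeabi-v7a", "libaceManager.so", "com.youku.android:ALib:1.0.2",
     "b7f8d6fc7ba25073e8743c061ed9e92a"),
])
```

Keeping the module next to each hash is the point of the check: it answers the question that a failed build leaves open.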

Since the Android Gradle Plugin already intercepts this kind of problem by default (the packaging fails), no gate is deployed for this check; it serves as an auxiliary function in daily work.

2.4 Useless export symbols

Exported symbols are the objects, methods, and global variables defined in an SO that are made available for external code to reference. To find useless exported symbols, every SO in the APK is searched for an import (reference) of each exported symbol. A symbol that is imported nowhere is a useless exported symbol, and it can be removed with linker options during the link stage of the SO build. Useless exported symbols must be analyzed globally across the APK, because when an individual SO is compiled it is hard to know which symbols are not used externally, unless that SO sits at the top of the call chain. The analysis results are organized by module, SO name, and ABI. An example:

* project:library-aar-2:1.0
|-- libuvc.so
|   |-- armeabi-v7a
|   |   |-- _uvc_status_callback
|   |   |-- uvc_print_format_desc_one
|   |   |-- uvc_find_frame_desc_stream
|   |   |-- uvc_any2iyuv420SP
|   |   |-- uvc_print_configuration_desc
|   |   |-- uvc_get_bus_number
|   |   |-- uvc_parse_vc_extension_unit
|   |   |-- uvc_get_stream_ctrl_format_size
|   |   |-- uvc_yuyv2yuv420P
|   |-- arm64-v8a
|   |   |-- _uvc_status_callback
|   |   |-- uvc_print_format_desc_one
|   |   |-- uvc_find_frame_desc_stream

The following two points should be paid attention to:

  • JNI methods are ignored. When binding JNI methods, the OS looks up JNI_OnLoad/JNI_OnUnload and all symbols starting with “Java_”. No other SO imports these symbols, so the detection algorithm above would wrongly flag them as useless. They are therefore removed from the detection results to avoid false positives;
  • Symbols that are looked up and called through dlsym are wrongly flagged as useless, so the final judgment must be made against what the code actually does.
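The global analysis and the JNI exclusion can be sketched as a set difference. This is a minimal Python illustration for a single ABI, not the article's implementation: the function names and input shapes are assumptions, and symbols used only through dlsym would still be reported, as noted above.

```python
def is_jni_symbol(symbol):
    """JNI entry points are bound by the runtime, not imported by other SOs."""
    return symbol in ("JNI_OnLoad", "JNI_OnUnload") or symbol.startswith("Java_")

def find_unused_exports(exports, imports):
    """Report exported symbols that no SO in the APK imports.

    `exports` and `imports` each map an SO name to a set of symbol names.
    Returns {so_name: sorted list of unused exported symbols}.
    """
    # Every symbol imported by any SO in the APK.
    imported_anywhere = set().union(*imports.values())
    unused = {}
    for so_name, symbols in exports.items():
        candidates = sorted(
            s for s in symbols
            if s not in imported_anywhere and not is_jni_symbol(s)
        )
        if candidates:
            unused[so_name] = candidates
    return unused
```

The symbol sets could come from the dynamic symbol table of each SO, for example from the output of `nm -D --defined-only` (exports) and `nm -D --undefined-only` (imports).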

Because false positives are theoretically possible, and because a small number of useless exported symbols is reasonable in the short term, no gate has been set up for useless exported symbols. Instead, they are reported as a slimming item in the package size analysis results. When the detection capability was completed in December 2021, Youku had about 35,000 useless exported symbols in stock. After a round of centralized issue assignment, the number has dropped to about 28,000.

2.5 Overview of Governance

So far, a fairly comprehensive and effective anti-corruption capability has been built for the dynamic link library SO, and governance has been carried out on that basis. Finally, a panorama:

[Figure: Overview of dynamic link library SO governance]

What else can be done

In fact, as the binary form of program code, an SO contains a great deal of information. For example, Youku’s package size analysis tool treats static linking of the STL and linking of a non-standard STL as slimming check items, providing effective guidance for package slimming. At the same time, analyzing SOs is much harder than analyzing Java bytecode, and there is still plenty of room for further exploration, such as missing exported symbols and JNI method mismatches.

In Android development, the upper technology stack (Java/Kotlin) differs greatly from the lower technology stack (C/C++/Rust) in source compilation, debugging, and the analysis of runtime errors. On the one hand, the source code behind an SO is often a single codebase compiled into libraries for multiple platforms (Android/iOS), so both the source and the compilation options may lack in-depth optimization for Android. On the other hand, the code where Java/Kotlin and SOs call each other is also prone to “rot” because of the gap between the technology stacks.

This requires Android developers to broaden their language and technology stack so that they can write good code from a more holistic view. The fight against engineering corruption needs developers from different technology stacks to each take a step forward and break down the corruption caused by this technical boundary.

[References]

  • [Book] A Programmer’s Self-Cultivation: Linking, Loading, and Libraries
  • [Google] C++ Library Support: developer.android.com/ndk/guides/…
