The "shell" (packer) discussed here is the dex shell, not the .so shell. The code analysis is based on AOSP 8.0.


One. Dex shell basics:

Shells are divided into several types, or generations:

  1. Whole-dex shell: hides the classes*.dex files of the original APK. A static decompiler such as jadx shows only the shell dex, not the dex that contains the app logic. The shell dex is responsible for decrypting the app dex and loading it back into the virtual machine, either from a file or directly from memory.
  2. Function-granularity protection, i.e. the function-extraction shell: protection drops to the method level. The method bodies in the app dex are stripped out in some way, so a static decompiler such as jadx shows method bodies that consist entirely of nop instructions. The shell code restores each method body at the right moment, which obviously has to happen before the method executes. Function-extraction shells need to disable the dex2oat process.
  3. Instruction-granularity protection, the strongest of the three: mainly VMP shells and dex2C. The telltale sign is that Java methods have become native. A VMP shell lowers the protection granularity to individual instructions and executes smali instructions through a custom interpreter. dex2C converts Java methods into equivalent native functions.

The whole-dex shell is the baseline protection: mainstream hardening vendors almost always apply it to the dex. The whole-dex shell must therefore be removed first, before you can determine whether a function-extraction shell, a VMP shell, or dex2C is also present.

Many of the existing open-source shell projects are experimental, because a production hardening product has to solve performance and stability problems, which takes a lot of engineering effort. Their code is still worth reading to learn how packing works. 1. Whole-dex shell: github.com/guanchao/ap… 2. dex2C: github.com/amimo/dcc 3. VMP: github.com/chago/ADVMP


Two. Determine the type of shell:

It is easy to determine whether an app is packed. Open the APK with jadx and look for the app's four major component classes. If they cannot be found, the app has a whole-dex shell. For example, only the Qihoo shell code is visible here, and the app code is hidden:

To determine whether a function-extraction shell is present, check whether the method bodies contain meaningful code. For example, the following is a function-extraction shell; the method body consists entirely of nop instructions:
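This check can be automated once a method's instruction bytes have been extracted from the dex. A minimal sketch in Python, assuming `insns` holds the raw instruction bytes of one method; the helper name `is_nop_stub` is hypothetical. It relies on the fact that a Dalvik nop is the 16-bit code unit 0x0000:

```python
def is_nop_stub(insns: bytes) -> bool:
    """Return True if a method body consists only of Dalvik nop code units."""
    # Dalvik instructions are built from 16-bit code units; nop is 0x0000.
    if len(insns) == 0 or len(insns) % 2 != 0:
        return False
    return all(b == 0 for b in insns)


# A body of four nop code units, as an extraction shell would leave behind.
print(is_nop_stub(bytes(8)))
# const/4 v0, 0 followed by return v0 (opcodes 0x12 and 0x0f).
print(is_nop_stub(bytes([0x12, 0x00, 0x0F, 0x00])))
```

A single all-nop method proves little on its own; the pattern is only suspicious when most methods of the app look like this.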

To determine whether it is a dex2C or VMP shell, check whether Java methods have been turned into native methods. Telling dex2C and VMP apart requires a further check, such as whether the native functions are registered at the same address:


Three. The whole-dex shell from the packer's point of view:

A whole-dex shell needs to solve the following problems:

  1. How to encrypt and hide the original dex file
  2. How the shell code gains control before the app code runs, and how it reaches the hidden, encrypted dex so that it can decrypt it
  3. How to dynamically load the decrypted dex file and make the VM recognize the four components it contains

Problem 1: The encrypted original dex can be appended to the end of the shell dex file, after which the checksum, signature, and file_size fields of the shell dex must be updated to reflect the change. These values live in the dex header and are used to check that the dex file is well-formed:

As long as these three values are kept consistent, data unrelated to the dex format can be appended to the end of the shell dex file, such as the number and sizes of the original dex files and their encrypted contents. Then replace the original dex file with the shell dex and repackage the APK.
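The header fix-up can be sketched in Python. The field offsets come from the dex format (checksum at offset 8, signature at 12, file_size at 32), but the trailer layout used here, payload bytes followed by a 4-byte length, is only an assumption for illustration, and `append_payload` is a hypothetical helper. Encryption of the payload is left out:

```python
import hashlib
import struct
import zlib


def append_payload(shell_dex: bytes, payload: bytes) -> bytes:
    """Append a payload to a dex image and repair the three header fields."""
    # Assumed trailer layout: payload bytes, then a little-endian uint32 length.
    data = bytearray(shell_dex + payload + struct.pack("<I", len(payload)))
    # file_size: uint32 at offset 32, the size of the whole file.
    struct.pack_into("<I", data, 32, len(data))
    # signature: SHA-1 over everything after the signature field.
    data[12:32] = hashlib.sha1(bytes(data[32:])).digest()
    # checksum: Adler-32 over everything after the checksum field.
    checksum = zlib.adler32(bytes(data[12:])) & 0xFFFFFFFF
    struct.pack_into("<I", data, 8, checksum)
    return bytes(data)


# Demo on a fake dex that is nothing but a zeroed 0x70-byte header.
fake_dex = b"dex\n035\x00" + bytes(0x70 - 8)
packed = append_payload(fake_dex, b"encrypted original dex")
print(len(packed), struct.unpack_from("<I", packed, 32)[0])
```

The order matters: file_size is covered by the signature, and the signature is covered by the checksum, so the fields are written from the innermost outwards.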

Problem 2: The shell replaces the app's original Application class with its own, so the attachBaseContext() and onCreate() methods of the shell's Application class are the first to gain control when the app process starts. To reach the hidden, encrypted dex, the shell obtains the APK path from getApplicationInfo().sourceDir, unzips the APK to get the shell dex file, and decrypts the original dex appended to it.
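A real shell performs these steps in Java inside its Application class. The logic itself can be sketched offline in Python. Everything specific here is an assumption for illustration: the trailer layout (payload followed by a 4-byte length), the single-byte XOR stand-in cipher, and the helper names.

```python
import io
import struct
import zipfile


def read_shell_dex(apk_bytes: bytes) -> bytes:
    """Unzip an APK image and return its classes.dex (the shell dex)."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return apk.read("classes.dex")


def split_payload(shell_dex: bytes) -> bytes:
    """Return the payload appended to the shell dex (assumed layout)."""
    # Assumed trailer: payload bytes, then a little-endian uint32 length.
    (length,) = struct.unpack_from("<I", shell_dex, len(shell_dex) - 4)
    if length > len(shell_dex) - 4:
        raise ValueError("trailer length exceeds file size")
    return shell_dex[len(shell_dex) - 4 - length:len(shell_dex) - 4]


def xor_cipher(data: bytes, key: int) -> bytes:
    """Stand-in cipher for illustration only; real shells use their own."""
    return bytes(b ^ key for b in data)


# Demo with an in-memory APK whose classes.dex carries an XOR-ed payload.
original = b"original dex bytes"
trailer = xor_cipher(original, 0x5A) + struct.pack("<I", len(original))
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as apk:
    apk.writestr("classes.dex", b"shell dex body" + trailer)

shell_dex = read_shell_dex(buf.getvalue())
print(xor_cipher(split_payload(shell_dex), 0x5A))
```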

Problem 3: The decrypted dex file is loaded dynamically through an Android class loader. For the lifecycle methods of the four components to run normally, the class loader hierarchy has to be modified. There are roughly two ways to do this, described in detail below.


Four. Android class loader:

The class loader plays an important role throughout dex packing and unpacking. There are plenty of articles on the web about Java class loaders, so I won't repeat them here and will focus on the Android class loaders used for dex loading. A class loader can load, at run time, classes that were unknown at compile time; dlopen and dlsym serve a similar purpose for C/C++. BaseDexClassLoader in Android can load, at run time, dex files that were unknown at compile time. Once a dex is loaded by this loader, the corresponding data structures are created in the memory of the ART virtual machine, and the dex file is mapped into virtual memory with mmap. A Class object can then be obtained through the loader's loadClass(String className, boolean resolve) method, and the class can be used. For a shell, the dex file it decrypts is likewise unknown at compile time, so it has to be loaded through a ClassLoader. The BaseDexClassLoader provided by the system meets this requirement, or a custom ClassLoader can be written.

A print test shows that JDK API classes such as java.lang.String and core Android classes such as android.app.Activity (in /system/framework/framework.jar) are all loaded by the same class loader, java.lang.BootClassLoader. Code written by the application developer is loaded by dalvik.system.PathClassLoader; the printed information is as follows:

dalvik.system.PathClassLoader[DexPathList[[directory "."],nativeLibraryDirectories=[/system/lib64, /vendor/lib64, /system/lib64, /vendor/lib64]]]

PathClassLoader inherits from BaseDexClassLoader and has two siblings, InMemoryDexClassLoader and DexClassLoader, which many shells use to load the decrypted dex files. To summarize, the main class loaders are:

  1. BootClassLoader: loads the core classes of the system and is created by the system
  2. BaseDexClassLoader: the base class loader for loading dex files
  3. PathClassLoader: loads the code written by the application developer, such as the four component classes; inherits from BaseDexClassLoader
  4. DexClassLoader: the source shows almost no difference from BaseDexClassLoader, which it inherits from; generally used to implement plug-ins and shells
  5. InMemoryDexClassLoader: loads dex from an array of ByteBuffer objects; inherits from BaseDexClassLoader
  6. Custom ClassLoader: can implement whatever behavior you want

If the shell loads the original dex itself, the ClassLoader has to be fixed up; otherwise a ClassNotFoundException is thrown when a component class is loaded. This comes down to how component objects are created. For example, an Activity object is created in the performLaunchActivity() method of ActivityThread by calling mInstrumentation.newActivity(), whose logic is:

As you can see, the Activity class is first loaded through the ClassLoader, and the object is then instantiated with newInstance().

The ClassLoader passed into this method is the one that loaded the application. The dex it loaded is the app's main dex, and its DexPathList does not contain the path of the original dex, hence the ClassNotFoundException. You can also see here that a BaseDexClassLoader is essentially a list of dex files; asking it to load a class that is not in this list raises ClassNotFoundException.

To solve the above problem, there are two possible solutions:

  1. Replace the system component class loader with our DexClassLoader, and set the parent of the DexClassLoader to the system component class loader
  2. Break the original parent relationship and insert our DexClassLoader between the system component class loader and BootClassLoader, so that the DexClassLoader that loads the original dex becomes the parent of the PathClassLoader

The first solution replaces the system component class loader, so that the class can be found when mInstrumentation.newActivity() tries to load it. The second solution changes the ClassLoader parent chain to BootClassLoader -> DexClassLoader -> PathClassLoader. Because of the parent delegation mechanism, a load request made to the PathClassLoader is first delegated to its parent, the DexClassLoader, which finds the classes of the dynamically loaded original dex, so loading succeeds. Other shells instead merge the new dex Elements into the dexElements array of the PathClassLoader. The advantage is that enumerating ClassLoader objects with Frida shows no extra ClassLoader, which makes it harder to find where the dex was loaded from.
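The effect of the second solution can be shown with a toy model of parent delegation. This is plain Python, not Android code; the `Loader` class and the class names `com.shell.StubApp` and `com.app.MainActivity` are made up for the example.

```python
class Loader:
    """Toy model of a class loader with parent delegation (not Android code)."""

    def __init__(self, name, classes, parent=None):
        self.name = name
        self.classes = set(classes)
        self.parent = parent

    def load_class(self, class_name):
        # Parent delegation: ask the parent first, then search locally.
        if self.parent is not None:
            try:
                return self.parent.load_class(class_name)
            except LookupError:
                pass
        if class_name in self.classes:
            return f"{class_name} loaded by {self.name}"
        raise LookupError(class_name)


boot = Loader("BootClassLoader", {"java.lang.String"})
path = Loader("PathClassLoader", {"com.shell.StubApp"}, parent=boot)

# Second solution: splice a loader for the original dex between the two.
dex = Loader("DexClassLoader", {"com.app.MainActivity"}, parent=boot)
path.parent = dex

print(path.load_class("com.app.MainActivity"))
```

Before the splice, the same request to `path` would fail, because neither `boot` nor `path` knows the class. That is the ClassNotFoundException case described above.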


Five. A general whole-dex unpacking method:

As shown above, there are many ways to load a dex, and each shell may do it differently. A universal whole-dex unpacking method has to target something that no shell can bypass. The key data structure is the art::DexFile class, defined in art/runtime/dex_file.h in the ART virtual machine. This class has two member variables, begin_ and size_, which hold the start address and the size of the dex file loaded in memory. With these two values the dex file can easily be dumped from memory. The underlying fact is that a decrypted dex must exist in memory at some point in time.

No dex can be loaded without art::DexFile: whether a BaseDexClassLoader or a custom class loader is used, an art::DexFile is eventually needed. This can be confirmed in class_linker.cc, where heavyweight functions such as ClassLinker::DefineClass, ClassLinker::LoadMethod, and ClassLinker::LoadClassMembers all take a DexFile object as an argument:

When a class is loaded by BaseDexClassLoader, the DexFile_defineClassNative function in art/runtime/native/dalvik_system_DexFile.cc is executed. It iterates over the native art::DexFile objects represented by the mCookie of the Java-layer DexFile. Because an art::DexFile represents the structure of a dex file in memory, the DexFile::ClassDef for the class name can be looked up in that art::DexFile list. It then calls ClassLinker::DefineClass(), which finally yields the mirror::Class* that represents the Class object in the virtual machine implementation. So this process also depends on the art::DexFile class.

This raises a question. The ART virtual machine can run code by interpreting smali instructions, which is called interpreter mode (and obviously requires the dex file to be in memory), or by executing native machine instructions from the ELF file produced by dex2oat, which is called quick code mode. In quick code mode, the machine instructions compiled from the dex are contained in the oat file generated by dex2oat (for apps, oat files use the .odex extension). So is the dex file still needed in quick code mode? If it is not, would dumping the dex from memory become impossible?

In fact, the original dex file is also in memory in quick code mode. The execution logic is in the runtime's GetOatFileManager().OpenDexFilesFromOat() function, which calls oat_file_assistant.MakeUpToDate(). If the dex2oat path is taken, the dex2oat command generates the odex and vdex files, and an OatFile object is created. This class represents the in-memory mapping of the oat file produced by dex2oat. An oat file is actually an executable file that only has a few special symbols: oatdata, oatlastword, oatbss, oatbsslastword, and so on. The OatFile member variable oat_dex_files_storage_ is a std::vector<const OatDexFile*>, where each OatDexFile holds the information about a dex file that corresponds to the oat file. Through the dex_file_pointer_ of an OatDexFile, the address of the corresponding art::DexFile data can be found. Since Android 8.0, the oat file holds the executable instructions compiled from the dex file, while the original dex content is actually stored in the vdex file, as the dump program shows later. Quick code mode still needs the original dex because the dex file also stores class-related information, such as class_def_item and method_id_item, which is necessary for executing class methods.

Six. Whole-dex unpacking in practice:

Since the art::DexFile class in the virtual machine represents the dex in memory, the dex file can be dumped once this object is obtained. The next step is to find a suitable point at which to get an art::DexFile object; from there the dex can be dumped either with a hook or by modifying the source code. Every exported function in libart.so that takes an art::DexFile as an argument or returns one is a candidate unpacking point. I wrote a command to find the functions that satisfy this condition:

arm64-readelf -s libart.so -W | tr -s ' ' | cut -f9 -d ' '| c++filt  | grep "art::DexFile"

For example, adding the following code to DexFile::OpenCommon or the DexFile::DexFile constructor in art/runtime/dex_file.cc dumps the dex:

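  // Needs <fcntl.h>, <stdio.h> and <unistd.h>.
  // base and size are the start address and length of the dex in memory.
  pid_t pid = getpid();
  char dexfilepath[100] = {0};
  snprintf(dexfilepath, sizeof(dexfilepath), "/sdcard/drdump_%d_%d_DexFile.dex", (int) size, (int) pid);
  // The mode must be octal: 0666, not decimal 666.
  int fd = open(dexfilepath, O_CREAT | O_RDWR, 0666);
  if (fd >= 0) {  // 0 is a valid descriptor; only negative values mean failure
    ssize_t number = write(fd, base, size);
    if (number != (ssize_t) size) {
      // Short or failed write: the dump on disk is incomplete.
    }
    close(fd);
  }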

That is the approach of unpacking by modifying the source code.


The other way to unpack is a Frida hook, which is simple and efficient and needs no ROM rebuild. It works as follows: for dex loaded through BaseDexClassLoader, the mCookie field of the DexFile.java class is, in the native layer, an array of jlong values that hold pointers. The first element is an OatFile*, and the remaining elements are the corresponding art::DexFile* pointers. The art::DexFile* list can therefore be recovered from the mCookie variable, and each dex can be dumped using the begin_ and size_ of its art::DexFile. The code is as follows:

function hasOwnProperty(obj, name) {
    try {
        return obj.hasOwnProperty(name) || name in obj;
    } catch (e) {
        return obj.hasOwnProperty(name);
    }
}

// Return the raw JNI handle of a Frida Java wrapper.
// Depending on the Frida version it is exposed as $handle or $h.
function getHandle(object) {
    var result = null;
    if (hasOwnProperty(object, "$handle")) {
        result = object.$handle;
        if (result) {
            return result;
        }
    }
    if (hasOwnProperty(object, "$h")) {
        return object.$h;
    }
    return null;
}

// Write dexfilesize bytes starting at dexfilebegin to a file on /sdcard.
function dump_dex(packagename, dexfilebegin, dexfilesize) {
    var dexfile_path = "/sdcard/my_frida_dump_" + packagename + "_" + dexfilesize + ".dex";
    var dexfile_handle = new File(dexfile_path, "w");
    if (dexfile_handle && dexfile_handle != null) {
        var dex_buffer = ptr(dexfilebegin).readByteArray(dexfilesize);
        dexfile_handle.write(dex_buffer);
        dexfile_handle.flush();
        dexfile_handle.close();
    }
}

// Walk classloader -> pathList -> dexElements -> dexFile -> cookie and dump every art::DexFile.
function dealwithClassLoader(classloaderobj, packagename) {
    if (Java.available) {
        Java.perform(function () {
            try {
                var dexfileclass = Java.use("dalvik.system.DexFile");
                var BaseDexClassLoaderclass = Java.use("dalvik.system.BaseDexClassLoader");
                var DexPathListclass = Java.use("dalvik.system.DexPathList");
                var Elementclass = Java.use("dalvik.system.DexPathList$Element");
                var basedexclassloaderobj = Java.cast(classloaderobj, BaseDexClassLoaderclass);
                var tmpobj = basedexclassloaderobj.pathList.value;
                var pathlistobj = Java.cast(tmpobj, DexPathListclass);
                console.log("pathlistobj->" + pathlistobj);
                var dexElementsobj = pathlistobj.dexElements.value;
                console.log("dexElementsobj->" + dexElementsobj);
                for (var i in dexElementsobj) {
                    var obj = dexElementsobj[i];
                    var elementobj = Java.cast(obj, Elementclass);
                    console.log("elementobj->" + elementobj);
                    tmpobj = elementobj.dexFile.value;
                    var dexfileobj = Java.cast(tmpobj, dexfileclass);
                    // mInternalCookie holds the same cookie object as mCookie.
                    var mCookie = dexfileobj.mInternalCookie.value;
                    if (mCookie != null) {
                        var jnienv = Java.vm.tryGetEnv();
                        var cookiePtr = getHandle(mCookie);
                        var arrayLength = jnienv.getArrayLength(cookiePtr);
                        var long_data = jnienv.getLongArrayElements(cookiePtr, 0);
                        console.log("arrayLength:" + arrayLength + ",long_data:" + long_data);
                        // Element 0 is the OatFile*; elements 1..n are art::DexFile*.
                        // Use j here so the outer loop variable i is not clobbered.
                        for (var j = 1; j < arrayLength; j++) {
                            var dexfileptr = ptr(long_data).add(8 * j).readPointer();
                            // begin_ and size_ follow the vtable pointer in art::DexFile.
                            var dexfilebegin = ptr(dexfileptr).add(Process.pointerSize * 1).readPointer();
                            var dexfilesize = ptr(dexfileptr).add(Process.pointerSize * 2).readU32();
                            console.log("pointer:" + dexfileptr + ",dexfilebegin:" + dexfilebegin + ",dexfilesize:" + dexfilesize);
                            dump_dex(packagename, dexfilebegin, dexfilesize);
                        }
                    }
                }
            } catch (e) {
                console.log(e);
            }
        });
    }
}

// Entry point: enumerate every class loader except BootClassLoader and dump its dex files.
function tuoke(packagename) {
    if (Java.available) {
        Java.perform(function () {
            console.log("go into enumerateClassLoaders!");
            Java.enumerateClassLoadersSync().forEach(function (loader) {
                if (loader.toString().indexOf("BootClassLoader") >= 0) {
                    console.log("this is a BootClassLoader!");
                } else {
                    try {
                        console.log("classloader : " + loader);
                        dealwithClassLoader(loader, packagename);
                    } catch (e) {
                        console.log(e);
                    }
                }
            });
        });
    }
}

The advantage of this approach is that it is simple and efficient and needs no time spent building a ROM, although it may fail on shells that use a custom ClassLoader.


There is also a Frida-based method that searches memory for dex signatures and dumps what it finds, written by hluwa: github.com/hluwa/FRIDA…
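The general idea of a signature search can be sketched in Python over a plain byte buffer. This is not the linked tool's code: the function name and the sanity checks are assumptions, and a shell that wipes the dex magic in memory would defeat this simple version.

```python
import struct

DEX_MAGIC_PREFIX = b"dex\n"
HEADER_SIZE = 0x70


def find_dex_images(memory: bytes):
    """Yield (offset, size) for each plausible dex image in a memory buffer."""
    pos = memory.find(DEX_MAGIC_PREFIX)
    while pos != -1:
        if pos + HEADER_SIZE <= len(memory):
            version = memory[pos + 4:pos + 8]
            # file_size is at offset 32 and header_size at offset 36.
            file_size, header_size = struct.unpack_from("<II", memory, pos + 32)
            # Sanity checks: a version like "035\0", the standard header size,
            # and a file size that fits inside the buffer.
            if (version[:3].isdigit() and version[3] == 0
                    and header_size == HEADER_SIZE
                    and HEADER_SIZE <= file_size <= len(memory) - pos):
                yield pos, file_size
        pos = memory.find(DEX_MAGIC_PREFIX, pos + 1)


# Demo: one valid header with 16 bytes of body, surrounded by filler.
header = bytearray(HEADER_SIZE)
header[0:8] = b"dex\n035\x00"
struct.pack_into("<II", header, 32, HEADER_SIZE + 16, HEADER_SIZE)
memory = bytes(100) + bytes(header) + bytes(16) + bytes(50)
for offset, size in find_dex_images(memory):
    print(offset, size)
```

Each hit can then be carved out as `memory[offset:offset + size]` and opened in jadx.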