Source of the problem
Starting with Android 7.0, the system prevents apps from using dlopen(), dlsym(), and related functions to open non-public system dynamic libraries. However, large apps often need the dl functions to open system libraries when doing performance monitoring and optimization, so a way around this restriction is needed.
How the system restricts app access to system libraries
Let's look at the Android 11 source and see how the dlopen() function works:
// bionic/libdl/libdl.cpp
__attribute__((__weak__))
void* dlopen(const char* filename, int flag) {
const void* caller_addr = __builtin_return_address(0);
return __loader_dlopen(filename, flag, caller_addr);
}
Before __loader_dlopen() is actually called, __builtin_return_address(0) is called to obtain caller_addr. __builtin_return_address is a compiler built-in (provided by GCC and Clang), and __builtin_return_address(0) returns the return address of the current function. On ARM, the LR register holds the return address when a function is entered, so __builtin_return_address(0) yields the LR value at the time dlopen was called.
Following the __loader_dlopen source further, the call eventually reaches do_dlopen:
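To make the builtin concrete, here is a minimal sketch in C. The function names current_return_address and demo_call_site are hypothetical and exist only for this illustration; they are not part of bionic.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the address at which the caller will resume, i.e. the return
 * address of this call. noinline keeps a real call so that the builtin
 * refers to this function and not to whatever it was inlined into. */
__attribute__((noinline))
const void* current_return_address(void) {
    return __builtin_return_address(0);
}

/* Hypothetical caller: the value obtained here points just past the call
 * instruction inside demo_call_site itself. */
const void* demo_call_site(void) {
    const void* ret = current_return_address();
    assert(ret != NULL);
    return ret;
}
```

dlopen() uses the same builtin, so the address it sees is a location inside whichever library issued the call.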
// bionic/linker/linker.cpp
void* do_dlopen(const char* name, int flags,
                const android_dlextinfo* extinfo,
                const void* caller_addr) {
  std::string trace_prefix = std::string("dlopen: ") + (name == nullptr ? "(nullptr)" : name);
  ScopedTrace trace(trace_prefix.c_str());
  ScopedTrace loading_trace((trace_prefix + " - loading and linking").c_str());
  soinfo* const caller = find_containing_library(caller_addr);
  android_namespace_t* ns = get_caller_namespace(caller);
  ...
caller_addr is passed to find_containing_library to obtain the dynamic library that contains this address. The implementation is simple: it walks the list of all loaded dynamic libraries and, for each one, checks its PT_LOAD segments to see whether caller_addr falls inside one of them. If it does, that library's soinfo is returned.
// bionic/linker/linker.cpp
soinfo* find_containing_library(const void* p) {
  // Addresses within a library may be tagged if they point to globals. Untag
  // them so that the bounds check succeeds.
  ElfW(Addr) address = reinterpret_cast<ElfW(Addr)>(untag_address(p));
  for (soinfo* si = solist_get_head(); si != nullptr; si = si->next) {
    if (address < si->base || address - si->base >= si->size) {
      continue;
    }
    ElfW(Addr) vaddr = address - si->load_bias;
    for (size_t i = 0; i != si->phnum; ++i) {
      const ElfW(Phdr)* phdr = &si->phdr[i];
      if (phdr->p_type != PT_LOAD) {
        continue;
      }
      if (vaddr >= phdr->p_vaddr && vaddr < phdr->p_vaddr + phdr->p_memsz) {
        return si;
      }
    }
  }
  return nullptr;
}
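The same containment check can be tried from ordinary user code. The sketch below is not the linker's implementation: it swaps the linker's private soinfo list for the public dl_iterate_phdr() API, and the names lookup, match_object, and address_is_in_loaded_object are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* exposes dl_iterate_phdr() in <link.h> on glibc */
#endif
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

struct lookup {
    uintptr_t address;  /* address being searched for */
    const char* name;   /* path of the containing object, if found */
    int found;
};

/* Called once per loaded object; mirrors the PT_LOAD bounds check above. */
static int match_object(struct dl_phdr_info* info, size_t size, void* data) {
    struct lookup* q = data;
    (void)size;
    assert(q != NULL);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)* phdr = &info->dlpi_phdr[i];
        if (phdr->p_type != PT_LOAD) continue;
        uintptr_t start = (uintptr_t)info->dlpi_addr + phdr->p_vaddr;
        if (q->address >= start && q->address - start < phdr->p_memsz) {
            q->name = info->dlpi_name;
            q->found = 1;
            return 1;  /* a non-zero return stops the iteration */
        }
    }
    return 0;
}

/* Returns 1 if p lies inside a PT_LOAD segment of any loaded object. */
int address_is_in_loaded_object(const void* p, const char** name_out) {
    struct lookup q = { (uintptr_t)p, NULL, 0 };
    dl_iterate_phdr(match_object, &q);
    if (q.found && name_out != NULL) *name_out = q.name;
    return q.found;
}
```

Passing the value of __builtin_return_address(0) to such a helper shows which object a call came from, which is the information do_dlopen derives from caller_addr.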
From the analysis above, the system restricts an app's dlopen calls by checking which library the return address (the LR value on entry to dlopen) belongs to: the caller's library determines the namespace used for the lookup. So how can this be bypassed? Here is a simple way.
Bypass method
According to the analysis above, if the LR register holds an address inside a system library when dlopen is called, the system's check should be fooled. However, if LR is simply changed to an arbitrary system library address, execution cannot return to the code that follows the dlopen call. On 32-bit ARM, LR stores the return address of a subroutine: BL and BLX automatically place the return address in LR when they jump, and the subroutine returns by copying LR to PC.
Therefore, when changing LR we must also make sure the function can still return to the original LR address. The original LR value has to be saved before LR is modified and restored after the function has executed.
The following uses dlopen as an example to describe the implementation in detail.
Assembly implementation
To change the value of the LR register, we cannot call dlopen directly. Instead, we call it through a springboard function, and the jump from the springboard to dlopen must itself leave LR unchanged.
In ARM32, there are two ways to implement a jump:
- Use a dedicated branch instruction: B, BX, BL, BLX
- Write the target address directly to the PC register: MOV PC, R0; POP {R4, PC}; etc.
A branch instruction with an immediate offset is a short jump: it can only reach 32MB forward or backward, so it is generally used for local jumps within a module. Writing the target address to the program counter (PC) is a long jump that can reach anywhere in the 4GB address space, and it does not modify the LR register. The register forms BX Rm and BLX Rm can also jump anywhere in the address space, but BLX overwrites LR with its own return address.
Therefore, we choose to jump to dlopen by modifying the value of the PC register.
If we know an address sys_addr inside a system library, the assembly for changing LR to sys_addr and jumping to dlopen is:
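The 32MB figure follows from the A32 encoding of B and BL: a signed 24-bit word offset, applied relative to PC, which reads as the instruction address plus 8. A small C sketch of that arithmetic, where arm_b_can_reach is a hypothetical helper name and 32-bit address wraparound is ignored:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Checks whether an A32 "B target" placed at insn_addr can encode the jump.
 * The offset field is 24 bits, signed, counted in 4-byte words, so the
 * reachable byte range is [-2^25, 2^25 - 4] around insn_addr + 8. */
bool arm_b_can_reach(uint32_t insn_addr, uint32_t target) {
    /* A32 instructions are word aligned. */
    assert((insn_addr & 3u) == 0 && (target & 3u) == 0);
    int64_t offset = (int64_t)target - ((int64_t)insn_addr + 8);
    return offset >= -(INT64_C(1) << 25) && offset <= (INT64_C(1) << 25) - 4;
}
```

An app's own code and a system library are normally mapped much further apart than this, which is why the springboard needs a jump that takes a full 32-bit address.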
mov lr, sys_addr  // change the value of the LR register to the system library address
mov pc, dlopen    // jump to the dlopen function
One problem with this approach is that once dlopen finishes, execution cannot return to the call site and continue with the code that follows. We therefore need to save the original LR value, restore it after dlopen has executed, and jump to that address.
In ARM assembly, saved registers and local variables are generally kept on the stack. So we use the push instruction to save LR on the stack, and after dlopen has executed we use the pop instruction to load the saved LR value into the PC register, which returns execution to the original position. The instructions are as follows:
push {r4, lr}     // save the original LR on the stack
mov lr, sys_addr  // change the LR register value to the system library address
mov pc, dlopen    // jump to the dlopen function
With this springboard, when dlopen returns it resumes execution at the address held in LR, that is, sys_addr. So we want the instruction at sys_addr to be:
pop {r4, pc}  // restore the original LR value saved on the stack into the PC register
In this way, the original LR value is popped off the stack into the PC register, so execution returns to the call site and continues with the instructions that follow.
In fact, those familiar with ARM assembly will recognize that instructions such as push {r4-r7, lr} and the matching pop {r4-r7, pc} are the first and last instructions of most functions. They correspond to the function's prologue and epilogue. The prologue preserves the state from before the function runs (by storing LR and the callee-saved registers on the stack). The epilogue restores the register values saved in the prologue and returns to the state before the call.
Getting the system library address
In the assembly code above, the sys_addr value placed in the LR register is still unknown. If such an address can be obtained, the problem is solved. It must satisfy two conditions: the address lies inside a system library, and the instruction at that address is pop {r4, pc}.
Here, there are two ways to get such an address:
- Find an existing pop {r4, pc} instruction in the code section of a system library's so file and use its address;
- Modify the instruction at a known address in a system library to pop {r4, pc}.
Here, we use the first method, searching. It is divided into the following steps:
- Traverse the /proc/self/maps file to find the base address of the so file in memory;
- Map the so file into memory via mmap;
- From the mapped ELF header, read the section header offset (e_shoff), the size of each section header (e_shentsize), and the number of section headers (e_shnum);
- Iterate through the section headers using the offset, size, and count, and find the section named .text, which contains the program's executable instructions;
- Walk through all instructions in the .text section and find the offset of the pop {r4, pc} instruction (0xBD10);
- The offset plus the base address of the so file is the memory address of the instruction.
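The middle steps (locating .text and scanning it) could be sketched in C as follows. This is an illustration under stated assumptions, not the code from the author's repository: find_text_section and find_thumb_halfword are hypothetical names, the image is assumed to be a 32-bit ELF, and the scan assumes Thumb code stored little-endian, so 0xBD10 appears as the bytes 10 BD.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Locates .text in a 32-bit ELF image already mapped at `image`.
 * Returns 0 and fills the file offset and size on success, -1 otherwise. */
int find_text_section(const uint8_t* image, size_t image_size,
                      size_t* text_offset, size_t* text_size) {
    assert(image != NULL && text_offset != NULL && text_size != NULL);
    Elf32_Ehdr eh;
    if (image_size < sizeof eh) return -1;
    memcpy(&eh, image, sizeof eh);  /* memcpy avoids alignment assumptions */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) return -1;
    if (eh.e_ident[EI_CLASS] != ELFCLASS32) return -1;
    if (eh.e_shentsize != sizeof(Elf32_Shdr) || eh.e_shstrndx >= eh.e_shnum) return -1;
    if ((uint64_t)eh.e_shoff + (uint64_t)eh.e_shnum * sizeof(Elf32_Shdr) > image_size) return -1;

    /* The section name string table is itself a section. */
    Elf32_Shdr strtab;
    memcpy(&strtab, image + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof strtab, sizeof strtab);
    if ((uint64_t)strtab.sh_offset + strtab.sh_size > image_size) return -1;
    const char* names = (const char*)image + strtab.sh_offset;

    for (size_t i = 0; i < eh.e_shnum; ++i) {
        Elf32_Shdr sh;
        memcpy(&sh, image + eh.e_shoff + i * sizeof sh, sizeof sh);
        /* Require room for ".text" plus its terminating NUL. */
        if ((uint64_t)sh.sh_name + sizeof ".text" > strtab.sh_size) continue;
        if (memcmp(names + sh.sh_name, ".text", sizeof ".text") != 0) continue;
        if ((uint64_t)sh.sh_offset + sh.sh_size > image_size) return -1;
        *text_offset = sh.sh_offset;
        *text_size = sh.sh_size;
        return 0;
    }
    return -1;
}

/* Returns the byte offset within `text` of the first halfword-aligned
 * occurrence of `opcode`, or -1 if there is none. */
long find_thumb_halfword(const uint8_t* text, size_t size, uint16_t opcode) {
    for (size_t i = 0; i + 1 < size; i += 2) {
        uint16_t hw = (uint16_t)(text[i] | (text[i + 1] << 8));
        if (hw == opcode) return (long)i;
    }
    return -1;
}
```

Two caveats apply to this sketch. A matching halfword may be the second half of a 32-bit Thumb-2 instruction rather than a real pop {r4, pc}, and adding a file offset to the base address is only valid when the section's file offset equals its virtual address.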
Substitute the address found for sys_addr in the assembly code.
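The first and last steps of the search (finding the base address and forming the final address) might look like the C sketch below. The helper names parse_maps_line, find_module_base, and thumb_return_address are hypothetical. Setting bit 0 of the result is an addition of this sketch, not something the article states: ARM interworking uses that bit to select Thumb state, and the 0xBD10 encoding is a Thumb instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parses one line of /proc/self/maps. If it describes the file-offset-0
 * mapping of a path containing lib_name, stores the start address in
 * *base and returns 1; otherwise returns 0 and leaves *base untouched. */
int parse_maps_line(const char* line, const char* lib_name, uintptr_t* base) {
    unsigned long start = 0, end = 0, offset = 0;
    int path_pos = 0;
    assert(line != NULL && lib_name != NULL && base != NULL);
    /* Fields: start-end perms offset dev inode path */
    if (sscanf(line, "%lx-%lx %*s %lx %*s %*s %n",
               &start, &end, &offset, &path_pos) < 3) {
        return 0;
    }
    if (offset != 0 || path_pos == 0) return 0;
    if (strstr(line + path_pos, lib_name) == NULL) return 0;
    *base = (uintptr_t)start;
    return 1;
}

/* Returns the load base of the first matching library, or 0 if not found. */
uintptr_t find_module_base(const char* lib_name) {
    FILE* fp = fopen("/proc/self/maps", "r");
    if (fp == NULL) return 0;
    char line[512];
    uintptr_t base = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_maps_line(line, lib_name, &base)) break;
    }
    fclose(fp);
    return base;
}

/* base + offset of the instruction, with bit 0 set to mark Thumb state
 * (an assumption of this sketch, see the note above). */
uintptr_t thumb_return_address(uintptr_t base, uintptr_t insn_offset) {
    return (base + insn_offset) | (uintptr_t)1;
}
```

With these helpers, sys_addr would be thumb_return_address(find_module_base("libc.so"), offset), where offset is the position of pop {r4, pc} found by the scan and "libc.so" is only an example library name.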
Finally
Following the same approach, the full implementation has been uploaded to GitHub: bypass_dlFunctions. Stars are welcome.