Android PLT Hook Overview

Get code and resources

You can always access the latest version of this article here.

Sample code used in this article can be obtained here. The XHook open source project mentioned in this article is available here.

Getting started

New dynamic library

We have a new dynamic library: libtest.so.

The header file test.h

#ifndef TEST_H
#define TEST_H 1

#ifdef __cplusplus
extern "C" {
#endif

void say_hello(void);

#ifdef __cplusplus
}
#endif

#endif

The source file test.c

#include <stdlib.h>
#include <stdio.h>

void say_hello(void)
{
    char *buf = malloc(1024);
    if (NULL != buf) {
        snprintf(buf, 1024, "%s", "hello\n");
        printf("%s", buf);
    }
}

say_hello prints the six characters "hello\n" (including the trailing \n) to the terminal.

We need a test program: main.

The source file main.c

#include <test.h>

int main(void)
{
    say_hello();
    return 0;
}

Compile them to generate libtest.so and main, respectively. Run it:

caikelun@debian:~$ adb push ./libtest.so ./main /data/local/tmp
caikelun@debian:~$ adb shell "chmod +x /data/local/tmp/main"
caikelun@debian:~$ adb shell "export LD_LIBRARY_PATH=/data/local/tmp; /data/local/tmp/main"
hello
caikelun@debian:~$

That’s great! The libtest.so code looks silly, but it works correctly, so what’s to complain about? Start using it in the new version of the APP!

Unfortunately, as you may have noticed, libtest.so has a serious memory leak: every call to say_hello leaks 1024 bytes. After the new version of the APP was released, its crash rate began to rise, and all sorts of strange crash reports started coming in.

Problems faced

Fortunately, we fixed the problem with libtest.so. But what about the future? We face two problems:

  1. When test coverage is insufficient, how can we promptly discover and accurately locate such problems in apps that are already released?
  2. If libtest.so is a system library on some device models, or a closed-source third-party library, how can we fix it? How can we monitor its behavior?

What can we do?

If we could hook (replace, intercept, eavesdrop on, or whatever word you prefer) function calls in dynamic libraries, we could do a lot of what we want. For example, by hooking malloc, calloc, realloc and free, we can count how much memory each dynamic library allocates and which allocations are never released.

Is this really possible? The answer: hooking your own process is perfectly fine. Hooking other processes requires root privileges (without root you cannot modify another process's memory space or inject code into it). Fortunately, we only need to hook our own process.

ELF

Overview

ELF (Executable and Linkable Format) is an industry-standard binary file format, mainly used for executables, dynamic libraries, object files and core dump files.

Compiling and linking source code with the Google NDK produces dynamic libraries or executables in ELF format. readelf shows the basic information of an ELF file, and objdump shows its disassembly.

An overview of the ELF format can be found here, and a complete definition can be found here. The most important parts are: ELF file headers, SHT (Section Header Table), and PHT (Program Header Table).

The ELF file header

An ELF file starts with a fixed-length header in a fixed format (52 bytes on 32-bit architectures, 64 bytes on 64-bit architectures). The header starts with the magic number 0x7F 0x45 0x4C 0x46 (the last three bytes are the visible characters E, L and F).
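As a quick illustration of the header check, here is a minimal sketch, assuming the <elf.h> header shipped with glibc and the NDK (it provides ELFMAG, SELFMAG, EI_CLASS and the Elf32_Ehdr/Elf64_Ehdr types). The helper names is_elf and elf_class_bits are hypothetical.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// True if buf (len bytes) begins with the magic number 0x7F 'E' 'L' 'F'.
static bool is_elf(const unsigned char *buf, size_t len)
{
    return len >= SELFMAG && 0 == memcmp(buf, ELFMAG, SELFMAG);
}

// Returns 32 or 64 according to the EI_CLASS byte, or 0 if unknown / not ELF.
static int elf_class_bits(const unsigned char *buf, size_t len)
{
    if (!is_elf(buf, len) || len <= EI_CLASS) return 0;
    switch (buf[EI_CLASS]) {
    case ELFCLASS32: return 32;   // header is sizeof(Elf32_Ehdr) == 52 bytes
    case ELFCLASS64: return 64;   // header is sizeof(Elf64_Ehdr) == 64 bytes
    default:         return 0;
    }
}
```

The class byte tells the caller whether to interpret the rest of the header as Elf32_Ehdr or Elf64_Ehdr.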

ELF header for libtest.so:

caikelun@debian:~$ arm-linux-androideabi-readelf -h ./libtest.so

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          12744 (bytes into file)
  Flags:                             0x5000200, Version5 EABI, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         8
  Size of section headers:           40 (bytes)
  Number of section headers:         25
  Section header string table index: 24

The ELF header records where the SHT and PHT start in the file, the size of each entry and the number of entries. For libtest.so, the SHT starts at offset 12744 and has 25 entries of 40 bytes each; the PHT starts at offset 52 and has 8 entries of 32 bytes each.
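The byte range each table occupies follows directly from these header fields. A minimal sketch, assuming <elf.h> and a 32-bit ELF; file_range, sht_range and pht_range are hypothetical names. With the values printed above, the SHT spans bytes 12744 to 13744 and the PHT spans bytes 52 to 308 (0x134), which is exactly where the first note section begins.

```c
#include <elf.h>
#include <stdint.h>

// Half-open byte range [start, end) inside the ELF file.
typedef struct { uint32_t start; uint32_t end; } file_range;

// Section header table: e_shnum entries of e_shentsize bytes at e_shoff.
static file_range sht_range(const Elf32_Ehdr *eh)
{
    file_range r = { eh->e_shoff,
                     eh->e_shoff + (uint32_t)eh->e_shnum * eh->e_shentsize };
    return r;
}

// Program header table: e_phnum entries of e_phentsize bytes at e_phoff.
static file_range pht_range(const Elf32_Ehdr *eh)
{
    file_range r = { eh->e_phoff,
                     eh->e_phoff + (uint32_t)eh->e_phnum * eh->e_phentsize };
    return r;
}
```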

SHT (Section Header table)

ELF organizes and manages information in sections. ELF uses SHT to record basic information for all sections. It mainly includes: section type, offset in file, size, relative address of virtual memory after loading into memory, alignment of bytes in memory, etc.

SHT of libtest.so:

caikelun@debian:~$ arm-linux-androideabi-readelf -S ./libtest.so
There are 25 section headers, starting at offset 0x31c8:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .note.android.ide NOTE            00000134 000134 000098 00   A  0   0  4
  [ 2] .note.gnu.build-i NOTE            000001cc 0001cc 000024 00   A  0   0  4
  [ 3] .dynsym           DYNSYM          000001f0 0001f0 0003a0 10   A  4   1  4
  [ 4] .dynstr           STRTAB          00000590 000590 0004b1 00   A  0   0  1
  [ 5] .hash             HASH            00000a44 000a44 000184 04   A  3   0  4
  [ 6] .gnu.version      VERSYM          00000bc8 000bc8 000074 02   A  3   0  2
  [ 7] .gnu.version_d    VERDEF          00000c3c 000c3c 00001c 00   A  4   1  4
  [ 8] .gnu.version_r    VERNEED         00000c58 000c58 000020 00   A  4   1  4
  [ 9] .rel.dyn          REL             00000c78 000c78 000040 08   A  3   0  4
  [10] .rel.plt          REL             00000cb8 000cb8 0000f0 08  AI  3  18  4
  [11] .plt              PROGBITS        00000da8 000da8 00017c 00  AX  0   0  4
  [12] .text             PROGBITS        00000f24 000f24 0015a4 00  AX  0   0  4
  [13] .ARM.extab        PROGBITS        000024c8 0024c8 00003c 00   A  0   0  4
  [14] .ARM.exidx        ARM_EXIDX       00002504 002504 000100 08  AL 12   0  4
  [15] .fini_array       FINI_ARRAY      00003e3c 002e3c 000008 04  WA  0   0  4
  [16] .init_array       INIT_ARRAY      00003e44 002e44 000004 04  WA  0   0  1
  [17] .dynamic          DYNAMIC         00003e48 002e48 000118 08  WA  4   0  4
  [18] .got              PROGBITS        00003f60 002f60 0000a0 00  WA  0   0  4
  [19] .data             PROGBITS        00004000 003000 000004 00  WA  0   0  4
  [20] .bss              NOBITS          00004004 003004 000000 00  WA  0   0  1
  [21] .comment          PROGBITS        00000000 003004 000065 01  MS  0   0  1
  [22] .note.gnu.gold-ve NOTE            00000000 00306c 00001c 00      0   0  4
  [23] .ARM.attributes   ARM_ATTRIBUTES  00000000 003088 00003b 00      0   0  1
  [24] .shstrtab         STRTAB          00000000 0030c3 000102 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  y (noread), p (processor specific)

The sections that are more important and have a greater relationship with hook are:

  • .dynstr: the string table; holds all string constants (such as symbol names).
  • .dynsym: symbol information (type, start address, size, index of the symbol name in .dynstr, etc.). A function is also a symbol.
  • .text: the machine instructions generated when the program code is compiled.
  • .dynamic: information for the dynamic linker, recording the ELF's external dependencies and the start positions of other important sections.
  • .got: Global Offset Table. Holds the entry addresses used for external calls. When the dynamic linker performs relocation, it fills in the real absolute addresses of the external functions.
  • .plt: Procedure Linkage Table. A trampoline for external calls, mainly used to support lazy binding when relocating external calls. (Android currently supports lazy binding only on the MIPS architecture.)
  • .rel.plt: relocation information for direct calls to external functions.
  • .rel.dyn: relocation information for external references other than those in .rel.plt (for example, calling an external function through a global function pointer).

PHT (Program Header Table)

ELF is loaded into memory in segments. A segment contains one or more sections. ELF uses PHT to record basic information for all segments. It includes the type of segment, offset in the file, size, relative address of virtual memory after loading into memory, and alignment of bytes in memory.

PHT of libtest.so:

caikelun@debian:~$ arm-linux-androideabi-readelf -l ./libtest.so 

Elf file type is DYN (Shared object file)
Entry point 0x0
There are 8 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00000034 0x00000034 0x00100 0x00100 R   0x4
  LOAD           0x000000 0x00000000 0x00000000 0x02604 0x02604 R E 0x1000
  LOAD           0x002e3c 0x00003e3c 0x00003e3c 0x001c8 0x001c8 RW  0x1000
  DYNAMIC        0x002e48 0x00003e48 0x00003e48 0x00118 0x00118 RW  0x4
  NOTE           0x000134 0x00000134 0x00000134 0x000bc 0x000bc R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  EXIDX          0x002504 0x00002504 0x00002504 0x00100 0x00100 R   0x4
  GNU_RELRO      0x002e3c 0x00003e3c 0x00003e3c 0x001c4 0x001c4 RW  0x4

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .note.android.ident .note.gnu.build-id .dynsym .dynstr .hash .gnu.version .gnu.version_d .gnu.version_r .rel.dyn .rel.plt .plt .text .ARM.extab .ARM.exidx 
   02     .fini_array .init_array .dynamic .got .data 
   03     .dynamic 
   04     .note.android.ident .note.gnu.build-id 
   05     
   06     .ARM.exidx 
   07     .fini_array .init_array .dynamic .got

All segments of type PT_LOAD are mmapped into memory by the dynamic linker.
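Because mmap works on whole pages, each PT_LOAD segment ends up occupying a page-aligned address range. A minimal sketch, assuming 4 KiB pages and <elf.h>; load_segment_range and the PG_* macros are hypothetical. Applied to libtest.so with the base address b6ec6000 from the /proc/self/maps listing later in this article, the two PT_LOAD segments come out as b6ec6000-b6ec9000 and b6ec9000-b6ecb000, matching the libtest.so lines there (the second segment is split by GNU_RELRO into an r--p page and an rw-p page).

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ     0x1000u   /* assumption: 4 KiB pages */
#define PG_START(x) ((x) & ~(uintptr_t)(PAGE_SZ - 1))
#define PG_END(x)   PG_START((x) + PAGE_SZ - 1)

// Computes the page-aligned [start, end) range occupied by the i-th
// PT_LOAD segment (counting from 0) once the ELF is mapped at `bias`.
// Returns 0 on success, -1 if there is no such segment.
static int load_segment_range(const Elf32_Phdr *ph, size_t phnum, size_t i,
                              uintptr_t bias, uintptr_t *start, uintptr_t *end)
{
    for (size_t k = 0; k < phnum; k++) {
        if (ph[k].p_type != PT_LOAD) continue;
        if (i-- == 0) {
            *start = PG_START(bias + ph[k].p_vaddr);
            *end   = PG_END(bias + ph[k].p_vaddr + ph[k].p_memsz);
            return 0;
        }
    }
    return -1;
}
```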

Linking View and Execution View

  • Linking view: data organized in sections, before the ELF is loaded into memory.
  • Execution view: data organized in segments, after the ELF has been loaded into memory.

Hooking modifies the memory of a running process, so we are mainly concerned with the execution view: how the ELF's data is organized and stored after the ELF has been loaded into memory.

.dynamic section

This is a very important and special section: it records, among other things, the memory locations of the other ELF sections. In the execution view there is always a segment of type PT_DYNAMIC that contains the contents of the .dynamic section.

When hooking, or when performing dynamic linking, the PT_DYNAMIC segment is used to find the memory location of the .dynamic section, from which the other sections are then read.

The .dynamic section of libtest.so:

caikelun@debian:~$ arm-linux-androideabi-readelf -d ./libtest.so 

Dynamic section at offset 0x2e48 contains 30 entries:
  Tag        Type                         Name/Value
 0x00000003 (PLTGOT)                     0x3f7c
 0x00000002 (PLTRELSZ)                   240 (bytes)
 0x00000017 (JMPREL)                     0xcb8
 0x00000014 (PLTREL)                     REL
 0x00000011 (REL)                        0xc78
 0x00000012 (RELSZ)                      64 (bytes)
 0x00000013 (RELENT)                     8 (bytes)
 0x6ffffffa (RELCOUNT)                   3
 0x00000006 (SYMTAB)                     0x1f0
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000005 (STRTAB)                     0x590
 0x0000000a (STRSZ)                      1201 (bytes)
 0x00000004 (HASH)                       0xa44
 0x00000001 (NEEDED)                     Shared library: [libc.so]
 0x00000001 (NEEDED)                     Shared library: [libm.so]
 0x00000001 (NEEDED)                     Shared library: [libstdc++.so]
 0x00000001 (NEEDED)                     Shared library: [libdl.so]
 0x0000000e (SONAME)                     Library soname: [libtest.so]
 0x0000001a (FINI_ARRAY)                 0x3e3c
 0x0000001c (FINI_ARRAYSZ)               8 (bytes)
 0x00000019 (INIT_ARRAY)                 0x3e44
 0x0000001b (INIT_ARRAYSZ)               4 (bytes)
 0x0000001e (FLAGS)                      BIND_NOW
 0x6ffffffb (FLAGS_1)                    Flags: NOW
 0x6ffffff0 (VERSYM)                     0xbc8
 0x6ffffffc (VERDEF)                     0xc3c
 0x6ffffffd (VERDEFNUM)                  1
 0x6ffffffe (VERNEED)                    0xc58
 0x6fffffff (VERNEEDNUM)                 1
 0x00000000 (NULL)                       0x0
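The .dynamic section is an array of Elf32_Dyn tag/value pairs terminated by a DT_NULL entry. A minimal sketch, assuming <elf.h> and a 32-bit ELF, of collecting the entries a PLT hook needs; dyn_info and parse_dynamic are hypothetical names. The d_ptr values are relative addresses, so in memory they must be added to the base address. With the values above, PLTRELSZ / sizeof(Elf32_Rel) = 240 / 8 = 30 entries in .rel.plt.

```c
#include <elf.h>
#include <stdint.h>

// Values taken from .dynamic (addresses are relative to the base address).
typedef struct {
    uint32_t symtab, strtab, jmprel, pltrelsz, rel, relsz;
} dyn_info;

// Walks the Elf32_Dyn entries until DT_NULL and records the ones needed
// to locate .dynsym, .dynstr, .rel.plt and .rel.dyn.
static dyn_info parse_dynamic(const Elf32_Dyn *d)
{
    dyn_info info = {0};
    for (; d->d_tag != DT_NULL; d++) {
        switch (d->d_tag) {
        case DT_SYMTAB:   info.symtab   = d->d_un.d_ptr; break;
        case DT_STRTAB:   info.strtab   = d->d_un.d_ptr; break;
        case DT_JMPREL:   info.jmprel   = d->d_un.d_ptr; break;  // .rel.plt
        case DT_PLTRELSZ: info.pltrelsz = d->d_un.d_val; break;
        case DT_REL:      info.rel      = d->d_un.d_ptr; break;  // .rel.dyn
        case DT_RELSZ:    info.relsz    = d->d_un.d_val; break;
        default: break;
        }
    }
    return info;
}
```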

Dynamic linker

The dynamic linker on Android is linker. Its source code is here.

The general steps for dynamic linking (such as executing dlopen) are:

  1. Check the list of already loaded ELFs. (If libtest.so is already loaded, it is not loaded again; its reference count is simply incremented by one and the call returns.)
  2. Read the list of external dependencies of libtest.so from its .dynamic section, remove the ELFs that are already loaded, and obtain the complete list of ELFs to load (including libtest.so itself).
  3. Load the ELFs in the list one by one. For each ELF:
    • Use mmap (with MAP_PRIVATE) to reserve a chunk of memory large enough for the subsequent ELF mapping.
    • Read the ELF's PHT and use mmap to map every segment of type PT_LOAD into memory in turn.
    • Read the virtual address of each section from the .dynamic segment, then compute and save the absolute virtual address of each section.
    • Perform relocation, the most critical step. Relocation information may live in one or more of the following sections: .rel.plt, .rela.plt, .rel.dyn, .rela.dyn, .rel.android, .rela.android. The dynamic linker processes the relocation entries in these .relxxx sections one by one: based on the information of the ELFs already loaded, it looks up the address of the required symbol (for libtest.so, for example, malloc), then writes that address to the "target address" specified by the .relxxx entry. These "target addresses" are generally located in .got or .data.
    • Increment the ELF's reference count by one.
  4. Call the constructors of the ELFs in the list one by one. Their addresses were read earlier from the .dynamic segment (entries of type DT_INIT and DT_INIT_ARRAY). Constructors are called layer by layer following the dependencies: the constructors of the dependencies run first, and libtest.so's own constructor runs last. (An ELF can also define destructors, which are called automatically when the ELF is unloaded.)

Wait a minute! We seem to be on to something! Look at the relocation step again. Do we just need to take the "target address" from .relxxx and write the address of a new function into it to complete the hook? Maybe.
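To make the relocation step concrete, here is a toy model of it (not the Android linker's actual code): it walks Elf32_Rel entries, resolves each R_ARM_JUMP_SLOT symbol through a caller-supplied function, and writes the 32-bit result to image + r_offset, where image stands in for the mapped ELF. It assumes <elf.h>; apply_jump_slots, resolve_fn and demo_resolve are hypothetical. The hook idea is simply to repeat the last write with our own function's address.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Maps a .dynsym index to the 32-bit address of the symbol in some
// already-loaded ELF; 0 means "not found".
typedef uint32_t (*resolve_fn)(uint32_t sym_idx);

// Hypothetical resolver for illustration: pretends symbol i lives at 0x10000000 + i.
static uint32_t demo_resolve(uint32_t sym_idx) { return 0x10000000u + sym_idx; }

// Toy relocation pass over R_ARM_JUMP_SLOT entries of a 32-bit ELF.
// Returns the number of slots filled.
static size_t apply_jump_slots(unsigned char *image, const Elf32_Rel *rel,
                               size_t nrel, resolve_fn resolve)
{
    size_t filled = 0;
    for (size_t i = 0; i < nrel; i++) {
        if (ELF32_R_TYPE(rel[i].r_info) != R_ARM_JUMP_SLOT) continue;
        uint32_t sym_addr = resolve(ELF32_R_SYM(rel[i].r_info));
        if (sym_addr == 0) continue;
        // Write to the "target address"; memcpy avoids alignment issues.
        memcpy(image + rel[i].r_offset, &sym_addr, sizeof(sym_addr));
        filled++;
    }
    return filled;
}
```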

Tracing

Static analysis makes this easy to verify. Take libtest.so built for the armeabi-v7a architecture as an example and look at the assembly code of the say_hello function.

caikelun@debian:~/$ arm-linux-androideabi-readelf -s ./libtest.so

Symbol table '.dynsym' contains 58 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 FUNC    GLOBAL DEFAULT  UND __cxa_finalize@LIBC (2)
     2: 00000000     0 FUNC    GLOBAL DEFAULT  UND snprintf@LIBC (2)
     3: 00000000     0 FUNC    GLOBAL DEFAULT  UND malloc@LIBC (2)
     4: 00000000     0 FUNC    GLOBAL DEFAULT  UND __cxa_atexit@LIBC (2)
     5: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf@LIBC (2)
     6: 00000f61    60 FUNC    GLOBAL DEFAULT   12 say_hello
...

Got it! say_hello is at address 0xf61 and its code is 60 (decimal) bytes long. (The lowest bit of 0xf61 is set because say_hello is Thumb code; the function itself starts at 0xf60.) Let's view the disassembly of say_hello with objdump.

caikelun@debian:~$ arm-linux-androideabi-objdump -D ./libtest.so
...
00000f60 <say_hello@@Base>:
     f60:   b5b0        push    {r4, r5, r7, lr}
     f62:   af02        add r7, sp, #8
     f64:   f44f 6080   mov.w   r0, #1024   ; 0x400
     f68:   f7ff ef34   blx dd4 <malloc@plt>
     f6c:   4604        mov r4, r0
     f6e:   b16c        cbz r4, f8c <say_hello@@Base+0x2c>
     f70:   a507        add r5, pc, #28 ; (adr r5, f90 <say_hello@@Base+0x30>)
     f72:   a308        add r3, pc, #32 ; (adr r3, f94 <say_hello@@Base+0x34>)
     f74:   4620        mov r0, r4
     f76:   f44f 6180   mov.w   r1, #1024   ; 0x400
     f7a:   462a        mov r2, r5
     f7c:   f7ff ef30   blx de0 <snprintf@plt>
     f80:   4628        mov r0, r5
     f82:   4621        mov r1, r4
     f84:   e8bd 40b0   ldmia.w sp!, {r4, r5, r7, lr}
     f88:   f001 ba96   b.w 24b8 <_Unwind_GetTextRelBase@@Base+0x8>
     f8c:   bdb0        pop {r4, r5, r7, pc}
     f8e:   bf00        nop
     f90:   7325        strb    r5, [r4, #12]
     f92:   0000        movs    r0, r0
     f94:   6568        str r0, [r5, #84]   ; 0x54
     f96:   6c6c        ldr r4, [r5, #68]   ; 0x44
     f98:   0a6f        lsrs    r7, r5, #9
     f9a:   0000        movs    r0, r0
...

The call to malloc is the instruction blx dd4, which jumps to address 0xdd4. Let's see what is at that address:

caikelun@debian:~$ arm-linux-androideabi-objdump -D ./libtest.so
...
00000dd4 <malloc@plt>:
 dd4:   e28fc600    add ip, pc, #0, 12
 dd8:   e28cca03    add ip, ip, #12288  ; 0x3000
 ddc:   e5bcf1b4    ldr pc, [ip, #436]! ; 0x1b4
...

Sure enough, it jumps into .plt, and after a few address calculations it finally jumps to the address stored at 0x3f90, which holds a function pointer.

A quick explanation: because ARM processors use a three-stage pipeline, an instruction that reads PC (in ARM state) gets the address of the current instruction + 8. So 0xdd4 + 8 + 0x3000 + 0x1b4 = 0x3f90.

Where is address 0x3f90?
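The same calculation can be done mechanically by decoding the immediates of the three stub instructions. A minimal sketch, assuming ARM (A32) encodings: a data-processing immediate is an 8-bit value rotated right by twice a 4-bit field, and the LDR offset is a 12-bit immediate that is added here because the U bit is set in this stub. arm_dp_imm and plt_got_slot are hypothetical names.

```c
#include <stdint.h>

// Decodes the rotated 8-bit immediate of an ARM data-processing instruction.
static uint32_t arm_dp_imm(uint32_t insn)
{
    uint32_t imm8 = insn & 0xffu;
    uint32_t rot  = ((insn >> 8) & 0xfu) * 2u;          // rotate-right amount
    return rot ? (imm8 >> rot) | (imm8 << (32u - rot)) : imm8;
}

// Follows "add ip, pc, #a; add ip, ip, #b; ldr pc, [ip, #c]!" and returns
// the address of the GOT slot that supplies the jump target.
static uint32_t plt_got_slot(uint32_t stub_addr, const uint32_t insn[3])
{
    uint32_t ip = stub_addr + 8 + arm_dp_imm(insn[0]);  // pc reads as addr + 8
    ip += arm_dp_imm(insn[1]);
    return ip + (insn[2] & 0xfffu);                     // LDR imm12, U bit set
}
```

For the malloc@plt stub above, plt_got_slot(0xdd4, {0xe28fc600, 0xe28cca03, 0xe5bcf1b4}) evaluates to 0x3f90.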

caikelun@debian:~$ arm-linux-androideabi-objdump -D ./libtest.so
...
00003f60 <.got>:
    ...
    3f70:   00002604    andeq   r2, r0, r4, lsl #12
    3f74:   00002504    andeq   r2, r0, r4, lsl #10
    ...
    3f88:   00000da8    andeq   r0, r0, r8, lsr #27
    3f8c:   00000da8    andeq   r0, r0, r8, lsr #27
    3f90:   00000da8    andeq   r0, r0, r8, lsr #27
...

Sure enough, it is in .got.

Now look at .rel.plt again:

caikelun@debian:~$ arm-linux-androideabi-readelf -r ./libtest.so

Relocation section '.rel.plt' at offset 0xcb8 contains 30 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00003f88  00000416 R_ARM_JUMP_SLOT   00000000   __cxa_atexit@LIBC
00003f8c  00000116 R_ARM_JUMP_SLOT   00000000   __cxa_finalize@LIBC
00003f90  00000316 R_ARM_JUMP_SLOT   00000000   malloc@LIBC
...
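Putting .rel.plt and .dynsym together: each Elf32_Rel packs the symbol index (r_info >> 8) and the relocation type (the low byte) into r_info. For 0x316 that is symbol 3 (malloc in the .dynsym listing above) and type 0x16 = 22 = R_ARM_JUMP_SLOT. A minimal sketch, assuming <elf.h>, of finding the slots to patch; find_reloc_slots is a hypothetical name.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

// Scans a REL table (.rel.plt or .rel.dyn) for entries that refer to
// symbol index sym_idx with relocation type `type`, and stores each
// entry's r_offset (the relative address of the slot to patch) in out.
// Returns the number of matches written (at most max_out).
static size_t find_reloc_slots(const Elf32_Rel *rel, size_t nrel,
                               uint32_t sym_idx, uint32_t type,
                               uint32_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < nrel && n < max_out; i++) {
        if (ELF32_R_SYM(rel[i].r_info) == sym_idx &&
            ELF32_R_TYPE(rel[i].r_info) == type)
            out[n++] = rel[i].r_offset;
    }
    return n;
}
```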

It is no coincidence that malloc's address is stored at 0x3f90. What are you waiting for? Let's change the code. Our main.c should look like this:

#include <stdio.h>
#include <stdlib.h>
#include <test.h>

void *my_malloc(size_t size)
{
    printf("%zu bytes memory are allocated by libtest.so\n", size);
    return malloc(size);
}

int main(void)
{
    void **p = (void **)0x3f90;
    *p = (void *)my_malloc; // do hook

    say_hello();
    return 0;
}

Compile and run:

caikelun@debian:~$ adb push ./main /data/local/tmp
caikelun@debian:~$ adb shell "chmod +x /data/local/tmp/main"
caikelun@debian:~$ adb shell "export LD_LIBRARY_PATH=/data/local/tmp; /data/local/tmp/main"
Segmentation fault
caikelun@debian:~$

The idea is right, but it still fails, because the code has three problems:

  1. 0x3f90 is a relative memory address; it must be converted to an absolute address.
  2. The absolute address corresponding to 0x3f90 probably has no write permission, so assigning to it directly causes a segmentation fault.
  3. Even if the new function address is written successfully, my_malloc may not be executed, because the processor has an instruction cache.

We need to address these issues.

Memory

Base address

In a process's memory space, each ELF is loaded at a random address, and this load address, the base address, is only known at runtime. We need the ELF's base address to convert relative addresses into absolute ones.

If you are familiar with Linux development, you may already be thinking of calling dl_iterate_phdr directly. See here for its detailed definition.

Well, wait. After years of being burned by Android development, let's take another look at the link.h header in the NDK:

#if defined(__arm__)

#if __ANDROID_API__ >= 21
int dl_iterate_phdr(int (*__callback)(struct dl_phdr_info*, size_t, void*), void* __data) __INTRODUCED_IN(21);
#endif /* __ANDROID_API__ >= 21 */

#else
int dl_iterate_phdr(int (*__callback)(struct dl_phdr_info*, size_t, void*), void* __data);
#endif

Why?! On ARM, dl_iterate_phdr is only available from Android 5.0 (API level 21) onward, but our APP has to support every version from Android 4.0 up. Why is ARM, of all architectures, the one left out? How is anyone supposed to write code like this?!

Fortunately, we realized that we can also parse /proc/self/maps:

root@android:/ # ps | grep main
ps | grep main
shell     7884  7882  2616   1016  hrtimer_na b6e83824 S /data/local/tmp/main

root@android:/ # cat /proc/7884/maps
cat /proc/7884/maps
address           perms offset  dev   inode      pathname
---------------------------------------------------------------------
...
b6e42000-b6eb5000 r-xp 00000000 b3:17 57457      /system/lib/libc.so
b6eb5000-b6eb9000 r--p 00072000 b3:17 57457      /system/lib/libc.so
b6eb9000-b6ebc000 rw-p 00076000 b3:17 57457      /system/lib/libc.so
b6ec6000-b6ec9000 r-xp 00000000 b3:19 753708     /data/local/tmp/libtest.so
b6ec9000-b6eca000 r--p 00002000 b3:19 753708     /data/local/tmp/libtest.so
b6eca000-b6ecb000 rw-p 00003000 b3:19 753708     /data/local/tmp/libtest.so
b6f03000-b6f20000 r-xp 00000000 b3:17 32860      /system/bin/linker
b6f20000-b6f21000 r--p 0001c000 b3:17 32860      /system/bin/linker
b6f21000-b6f23000 rw-p 0001d000 b3:17 32860      /system/bin/linker
b6f25000-b6f26000 r-xp 00000000 b3:19 753707     /data/local/tmp/main
b6f26000-b6f27000 r--p 00000000 b3:19 753707     /data/local/tmp/main
becd5000-becf6000 rw-p 00000000 00:00 0          [stack]
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]
...

maps shows the mmap mappings of the specified process's memory space, including dynamic libraries, executables (such as linker), stack space, heap space, and even font files. A detailed explanation of the maps format is available here.

Our libtest.so has 3 lines in maps. The start address of the first line with offset 0, b6ec6000, is in most cases the base address we are looking for.
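Parsing a maps line takes a single sscanf call. A minimal sketch, assuming the standard "start-end perms offset dev inode pathname" layout; parse_maps_line is a hypothetical name. Lines read with fgets keep their trailing newline, which the caller would strip before comparing pathnames.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Parses one /proc/<pid>/maps line. On success returns 1 and fills in the
// start/end addresses, the 4-character permission string, the file offset
// and a pointer to the pathname inside `line`; returns 0 otherwise.
static int parse_maps_line(const char *line, uintptr_t *start, uintptr_t *end,
                           char perms[5], uintptr_t *offset, const char **path)
{
    int pos = 0;
    if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s %" SCNxPTR " %*s %*s %n",
               start, end, perms, offset, &pos) != 4)
        return 0;
    *path = line + pos;   // first non-blank character after the inode
    return 1;
}
```

A base-address search would keep the first line whose pathname ends in the target library and whose offset is 0.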

Memory access

The information returned by maps already includes access permissions. To perform the hook we need write permission, which can be added with mprotect:

#include <sys/mman.h>

int mprotect(void *addr, size_t len, int prot);

Note that mprotect changes access permissions in units of whole pages, and addr must be page-aligned. Detailed documentation of mprotect is available here.
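A minimal sketch of rounding an address down to its page before calling mprotect, assuming the page size is queried with sysconf(_SC_PAGESIZE); make_writable is a hypothetical name. The new protection applies to every page in the range, not only to the bytes being patched.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

// Adds read+write permission to every page overlapping [addr, addr + len).
// Returns mprotect's result: 0 on success, -1 on error.
static int make_writable(void *addr, size_t len)
{
    uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(page - 1);                     // round down
    uintptr_t end   = ((uintptr_t)addr + len + page - 1) & ~(page - 1);  // round up
    return mprotect((void *)start, end - start, PROT_READ | PROT_WRITE);
}
```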

Instruction cache

Note that the section type of .got and .data is PROGBITS. Strictly speaking, PROGBITS only means that the section holds program-defined content, and a GOT slot is read as data rather than executed; nevertheless, after modifying the memory we clear the processor's instruction cache as a precaution, so that the processor re-reads the contents from memory. This is done by calling __builtin___clear_cache:

void __builtin___clear_cache (char *begin, char *end);

__builtin___clear_cache takes an arbitrary [begin, end) address range; for simplicity, the code in this article flushes the whole page that contains the hook address. __builtin___clear_cache is explained here.

Verification

Modify main.c

Let’s change main.c to:

#include <inttypes.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <test.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096  // fallback in case the system headers do not define it
#endif
#ifndef PAGE_MASK
#define PAGE_MASK (~(PAGE_SIZE - 1))
#endif

#define PAGE_START(addr) ((addr) & PAGE_MASK)
#define PAGE_END(addr)   (PAGE_START(addr) + PAGE_SIZE)

void *my_malloc(size_t size)
{
    printf("%zu bytes memory are allocated by libtest.so\n", size);
    return malloc(size);
}

void hook(void)
{
    char       line[512];
    FILE      *fp;
    uintptr_t  base_addr = 0;
    uintptr_t  addr;

    //find base address of libtest.so (the line whose offset is 00000000)
    if(NULL == (fp = fopen("/proc/self/maps", "r"))) return;
    while(fgets(line, sizeof(line), fp))
    {
        int n = 0;  //%n is only stored if the whole pattern, offset included, matched
        if(NULL != strstr(line, "libtest.so") &&
           sscanf(line, "%" SCNxPTR "-%*lx %*4s 00000000 %n", &base_addr, &n) == 1 &&
           n > 0)
            break;
        base_addr = 0;
    }
    fclose(fp);
    if(0 == base_addr) return;

    //the absolute address
    addr = base_addr + 0x3f90;

    //add write permission
    mprotect((void *)PAGE_START(addr), PAGE_SIZE, PROT_READ | PROT_WRITE);

    //replace the function address
    *(void **)addr = (void *)my_malloc;

    //clear instruction cache
    __builtin___clear_cache((char *)PAGE_START(addr), (char *)PAGE_END(addr));
}

int main(void)
{
    hook();

    say_hello();
    return 0;
}

Recompile and run:

caikelun@debian:~$ adb push ./main /data/local/tmp
caikelun@debian:~$ adb shell "chmod +x /data/local/tmp/main"
caikelun@debian:~$ adb shell "export LD_LIBRARY_PATH=/data/local/tmp; /data/local/tmp/main"
1024 bytes memory are allocated by libtest.so
hello
caikelun@debian:~$

Yes, it worked! We didn’t modify the libtest.so code, or even recompile it. We just changed the main program.

The source code for libtest.so and main is available on GitHub. (Depending on the compiler or compiler version you use, malloc's slot in the generated libtest.so may not be at 0x3f90; check it with readelf first and then update main.c accordingly.)

Using xhook

Of course, we have open-sourced a tool library called xhook. With xhook, you can hook libtest.so more gracefully, without worrying about the compatibility problems caused by hard-coding 0x3f90.

#include <stdlib.h>
#include <stdio.h>
#include <test.h>
#include <xhook.h>

void *my_malloc(size_t size)
{
    printf("%zu bytes memory are allocated by libtest.so\n", size);
    return malloc(size);
}

int main(void)
{
    xhook_register(".*/libtest\\.so$", "malloc", (void *)my_malloc, NULL);
    xhook_refresh(0);

    say_hello();
    return 0;
}

xhook supports armeabi, armeabi-v7a and arm64-v8a, and Android 4.0 and later (API level >= 14). Its stability and compatibility have been verified in production. You can get xhook here.

To summarize the process of implementing PLT hooks in XHook:

  1. Read maps to get the ELF's start address in memory.
  2. Verify the ELF header.
  3. In the PHT, find the segment whose type is PT_LOAD and whose offset is 0, and compute the ELF's base address.
  4. In the PHT, find the segment of type PT_DYNAMIC to locate the .dynamic section, and get the memory addresses of the other sections from .dynamic.
  5. Find the index of the symbol to hook (its name is stored in .dynstr).
  6. Walk all .relxxx sections, find the entries whose symbol index and relocation type match, and perform the hook for each of these relocation entries. The hook process is:
    • Read maps to confirm the access permissions of the current hook address.
    • If the permission is not read + write, use mprotect to change it to read + write.
    • If the caller needs it, save the current value at the hook address so it can be returned.
    • Replace the value at the hook address with the new value. (This performs the hook.)
    • If mprotect was used to change the access permissions, restore the previous permissions.
    • Clear the processor instruction cache for the memory page that contains the hook address.
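The per-slot steps at the end of this list can be sketched as follows. This is not xhook's code: it assumes the caller has already read the page's original protection from /proc/self/maps and passes it in as orig_prot, and replace_slot is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

// Replaces the pointer stored at `slot` (an absolute GOT/.data address)
// with new_func. orig_prot is the page protection read from maps by the
// caller. Returns 0 on success, -1 if mprotect fails.
static int replace_slot(void **slot, void *new_func, void **old_func, int orig_prot)
{
    uintptr_t pg   = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t page = (uintptr_t)slot & ~(pg - 1);
    int rw = PROT_READ | PROT_WRITE;
    int need_mprotect = (orig_prot & rw) != rw;

    // Make the page readable and writable if it is not already.
    if (need_mprotect && 0 != mprotect((void *)page, pg, orig_prot | rw))
        return -1;
    if (old_func != NULL) *old_func = *slot;  // keep the old value for the caller
    *slot = new_func;                         // the hook itself
    if (need_mprotect) mprotect((void *)page, pg, orig_prot);  // restore
    __builtin___clear_cache((char *)page, (char *)(page + pg));
    return 0;
}
```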

FAQ

Can ELF information be read directly from the file?

Yes. For format parsing, reading the file is the most reliable approach, because at runtime many sections do not need to stay in memory and could in principle be discarded after loading to save a little memory. In practice, however, the dynamic linkers and loaders of all platforms keep them, probably judging that the added complexity is not worth it. So reading the ELF information from memory is sufficient, and reading the file would only add a performance cost. In addition, an APP may not have permission to read the files of some system libraries.

What is the exact way to compute the base address?

As you may have noticed, when obtaining the base address of libtest.so earlier we said "in most cases", to keep the concepts and the code simple. For hooking, the accurate base address calculation is:

  1. In maps, find the line whose offset is 0 and whose pathname is the target ELF. Save the start address of that line as p0.
  2. In the ELF's PHT, find the first segment whose type is PT_LOAD and whose offset is 0. Save the relative virtual address of this segment (p_vaddr) as p1.
  3. p0 - p1 is the current base address of the ELF.

For most ELFs, p_vaddr of this PT_LOAD segment is 0.

We also look for the line with offset 0 in maps because, before hooking, we want to check the ELF header in memory to make sure we are operating on a valid ELF, and the ELF header can only appear in a mapping whose file offset is 0.

You can search for “load_bias” in the Android Linker source code to find many detailed comments, as well as refer to linker’s assignment logic for the load_bias_ variable.
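The p0 - p1 rule can be written as a small helper. A minimal sketch, assuming <elf.h> and a 32-bit ELF; elf_load_bias is a hypothetical name. For libtest.so, p0 = b6ec6000 and p1 = 0, so the base is b6ec6000 and the malloc slot at 0x3f90 lands at b6ec9f90, inside the b6ec9000-b6eca000 r--p mapping shown earlier, which is why write permission must be added before the hook.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

// p0: start address of the maps line with offset 0 for the target ELF.
// Returns p0 - p1, where p1 is p_vaddr of the first PT_LOAD segment with
// p_offset == 0. Returns 0 (treated as failure) if no such segment exists
// or p0 is smaller than p1.
static uintptr_t elf_load_bias(uintptr_t p0, const Elf32_Phdr *ph, size_t phnum)
{
    for (size_t i = 0; i < phnum; i++) {
        if (ph[i].p_type == PT_LOAD && ph[i].p_offset == 0) {
            if (p0 < ph[i].p_vaddr) return 0;
            return p0 - ph[i].p_vaddr;
        }
    }
    return 0;
}
```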

How do the compile options used for the target ELF affect hooking?

They have some effect.

Calls to external functions can be divided into three situations:

  1. Direct call. It can be hooked regardless of compile options; the external function's address is always stored in .got.
  2. Call through a global function pointer. It can be hooked regardless of compile options; the external function's address is always stored in .data.
  3. Call through a local function pointer. With -O2 (the default), the call is optimized into a direct call (same as case 1). With -O0, a pointer to an external function that was assigned to a local (temporary) variable before the hook was installed cannot be hooked by a PLT hook; one assigned after the hook was installed can.
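The three call styles look like this in C. A minimal sketch: the comments restate where, according to the list above, each address is stored; the exact code generated depends on the compiler and options, so the disassembly is the real authority. g_alloc and the alloc_* functions are hypothetical.

```c
#include <stdlib.h>

// Case 2: a global function pointer, filled by the dynamic linker through
// a .rel.dyn relocation; per the list above, the address lives in .data.
void *(*g_alloc)(size_t) = malloc;

// Case 1: direct call, which goes through malloc@plt and a .got slot.
static void *alloc_direct(size_t n) { return malloc(n); }

static void *alloc_via_global(size_t n) { return g_alloc(n); }

// Case 3: a local function pointer. At -O2 this usually becomes a direct
// call. At -O0 the address is copied into f when it is assigned; if that
// copy was made before the hook was installed, calls through f keep
// reaching the original function.
static void *alloc_via_local(size_t n)
{
    void *(*f)(size_t) = malloc;
    return f(n);
}
```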

In general, production ELFs are rarely compiled with -O0, so there is no need to worry too much. Conversely, if you want your ELF to be as hard to PLT-hook as possible, you could compile with -O0, assign external functions to local function pointers as early as possible, and then always call the external functions through those local pointers.

In short, reading the C/C++ source alone does not help with this question. You need to compare the disassembly of the ELF built with different compile options to see which cases cannot be PLT-hooked, and why.

What causes occasional segmentation faults during hooking, and how should they be handled?

We sometimes run into problems like these:

  • /proc/self/maps reports a memory region as readable, yet a segmentation fault (signal SIGSEGV, code SEGV_ACCERR) occurs when we read that region to verify the ELF header.
  • mprotect() has changed a memory region to writable and reported success, and re-reading /proc/self/maps confirms the region is writable, yet a segmentation fault (SIGSEGV, SEGV_ACCERR) occurs on the write that replaces the function pointer (that is, performs the hook).
  • The ELF header is read and verified successfully, but reading the PHT or the .dynamic section through the relative addresses in the header causes a segmentation fault (SIGSEGV, SEGV_ACCERR or SEGV_MAPERR).

Possible reasons:

  • A process's memory space is shared by multiple threads. While we are hooking, another thread (or even the linker) may be running dlclose(), or using mprotect() to change the access permissions of this memory region.
  • Android ROMs from different vendors, models and versions may have undocumented behavior, such as write-protection or read-protection mechanisms for certain memory regions in some situations, and these mechanisms are not reflected in /proc/self/maps.

Problem analysis:

  • Segmentation faults while reading memory are actually harmless.
  • During the hook, the only place where we write data directly to a computed memory address is the one critical line that replaces the function pointer. As long as the logic elsewhere is correct, a failed write here does not damage any other memory area.
  • When an Android APP is loaded, the loader has already registered signal handlers so that, if the APP crashes, it communicates with the system's debuggerd daemon; debuggerd uses ptrace to debug the crashing process, collects the crash-site information, records it in a tombstone file, and then the APP kills itself.
  • The system delivers the segmentation fault signal precisely to the thread in which the fault occurred.
  • We want a covert and controllable way to keep segmentation faults from crashing the APP.

Let's be clear: don't look at segmentation faults only from an application developer's perspective. A segmentation fault is not a monster; it is just an ordinary way for the kernel to communicate with a user process. When a user process accesses a virtual memory address it has no permission for, or that is not mapped, the kernel sends it a SIGSEGV signal as notification, and that's all. As long as the location where the fault occurs is under our control, we can handle it in the user process.

Solution:

  • Before the hook logic enters what we consider a danger zone (reading or writing memory directly at computed addresses), set a global flag to mark it, and reset the flag when leaving the danger zone.
  • Register our own signal handler that catches only segmentation faults. In the handler, check the flag to determine whether the current thread is inside the danger zone. If it is, use siglongjmp to jump out of the handler directly to the "next line of code outside the danger zone" that we set up in advance. If it is not, restore the signal handler that the loader originally installed and return directly; the faulting instruction then runs again, the system sends the segmentation fault signal to our thread once more, and this time the original handler runs its normal logic.
  • We call this mechanism SFP (Segmentation Fault Protection).
  • Note: SFP needs a switch so it can be turned on and off at any time. During APP development and debugging, SFP should always be off, so that segmentation faults caused by coding errors are not missed and get fixed. After release, SFP should be on, so that the APP does not crash. (Turning SFP off for a sampled subset of users is also worth considering, to observe and analyze crashes caused by the hook mechanism itself.)

For the concrete code, refer to the xhook implementation: search its source for siglongjmp and sigsetjmp.
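A minimal, single-threaded sketch of the SFP idea, assuming POSIX sigaction and sigsetjmp/siglongjmp. It is not xhook's implementation: a production version would at least need per-thread state and the on/off switch described above. The names sfp_install, sfp_handler and sfp_read_ptr are hypothetical.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static struct sigaction g_old_sa;          // handler installed before ours
static volatile sig_atomic_t g_in_danger;  // 1 while touching computed addresses
static sigjmp_buf g_jmpenv;                // recovery point outside the danger zone

static void sfp_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)info;
    (void)ucontext;
    if (g_in_danger) {
        g_in_danger = 0;
        siglongjmp(g_jmpenv, 1);   // skip the faulting access
    }
    // Not our fault: restore the previous handler and return. The faulting
    // instruction runs again and the signal reaches the original handler.
    sigaction(sig, &g_old_sa, NULL);
}

// Installs the SIGSEGV handler; returns 0 on success, -1 on error.
static int sfp_install(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = sfp_handler;
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &sa, &g_old_sa);
}

// Reads a pointer-sized value from addr; returns false instead of crashing
// when the read raises SIGSEGV.
static bool sfp_read_ptr(const void *addr, uintptr_t *out)
{
    volatile bool ok = false;
    if (0 == sigsetjmp(g_jmpenv, 1)) {
        g_in_danger = 1;                           // enter the danger zone
        *out = *(const volatile uintptr_t *)addr;  // may fault
        ok = true;
    }
    g_in_danger = 0;                               // leave the danger zone
    return ok;
}
```

Writing the function pointer during the hook would be wrapped the same way as the read in sfp_read_ptr.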

Can calls between functions inside an ELF be hooked?

The hooking method introduced here is the PLT hook, which cannot hook calls between functions inside the same ELF.

An inline hook can do this, as long as you know the symbol name or address of the internal function you want to hook.

There are many open-source and closed-source inline hook implementations, such as:

  • Substrate: http://www.cydiasubstrate.com/
  • Frida: https://www.frida.re/

Inline hook schemes are powerful, but they can cause the following problems:

  • Because the machine instructions (assembly code) in the ELF have to be parsed and modified directly, there may be compatibility and stability issues across processor architectures, instruction sets, compiler optimization options and operating system versions.
  • When problems occur they can be hard to analyze and locate, and some well-known inline hook schemes are closed source.
  • The implementation is relatively complex and difficult.
  • There are relatively more unknown pitfalls; a quick search turns up plenty of them.

It is recommended not to try inline hooks if PLT hooks are sufficient.

Contact the author

caikelun#qiyi.com

License

Copyright (C) 2018, IQiyi, Inc. All Rights Reserved.

This article is licensed under a Creative Commons license.