Series of articles: OC Basic Principles series. OC Basic Knowledge series. Swift Low-Level Exploration series. iOS Advanced series

Preface

I originally planned for this article to cover practical applications of scripts, but when I sorted through my material I found that several of those scripts had already been written in the previous article. With more than half of the article drafted, there was too little left to say, so I decided to write about dyld and LLDB instead.

Mach-O analysis

The earlier article "Mach-O, the Linker and Symbols" already introduced Mach-O: we went through its structure and described its header.

Code analysis

  • We create a test.m file that we will compile into an executable. First, here are the contents of test.m

  • We compile test.m into an executable

  • Look at the contents of the executable

  • Look at the .o file

Unlike the executable above, the addresses in the left-hand column are no longer virtual memory addresses; they have become offsets.

  • The difference is not obvious yet, so let us add more code to test.m

  • Do the same thing and read the .o file again

With more content, the offsets become more obvious. The contents of the .o file are laid out in the order in which the code was written.

  • Now analyze the .o file
    • Look at the red box: the calls to the two functions, test and test_1, both start with e8. e8 is fixed machine code (an opcode) that stands for callq.
    • What are the 00 00 00 00 after e8? This is a call instruction with a static relative displacement: the four bytes are the offset, and the target is that offset plus the address of the next instruction. Here that would be 00 00 00 00 + 48, which should be the address of test.
    • But test is actually at address 0, and the address computed from the red box is not test's address. This tells us that the address here is only provisional, not the real address. The real address is only known once the executable is built, so the correct offset has to be filled into the instruction later before it can reach the real address. That is why test is added to the relocation symbol table: it tells the linker that this location needs to be relocated.
  • The relocation symbol table

  • Analysis
    • Here, 49 is the location that needs relocating for test_1: the e8 opcode sits at 48, and the 00 00 00 00 placeholder starts at 49. The entry tells the linker to relocate the bytes at that location.
    • Now look at global: its relocation address is 3B. The instruction starts at 39: c7 is at 39, 05 is at 3A, and the fc ff ff ff starting at 3B is the part to be relocated.
  • Compile the executable

Looking at main, we see that the addresses are no longer offsets: virtual memory addresses have been assigned.
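As a rough illustration of what a relocation entry asks the linker to do, here is a minimal C sketch of patching a 32-bit PC-relative call. It is not ld's actual code: the function names are hypothetical, and the example fix-up address 0x100003fa4 is an assumption chosen so that the next instruction sits at 0x100003fa8, matching the executable's disassembly of the call to test.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement for a callq (e8 + rel32). fixup_addr is the address of the
 * 4 placeholder bytes; the CPU adds the displacement to the address of the
 * next instruction, which is fixup_addr + 4. */
static int32_t rel32_for_target(uint64_t fixup_addr, uint64_t target)
{
    uint64_t next_ip = fixup_addr + 4;
    int64_t disp = (int64_t)(target - next_ip);
    assert(disp >= INT32_MIN && disp <= INT32_MAX); /* must fit in 32 bits */
    return (int32_t)disp;
}

/* Hypothetical fix-up step: overwrite the 00 00 00 00 placeholder with the
 * displacement, stored little-endian as x86_64 expects. */
static void patch_rel32(uint8_t placeholder[4], uint64_t fixup_addr, uint64_t target)
{
    uint32_t u = (uint32_t)rel32_for_target(fixup_addr, target);
    for (int i = 0; i < 4; i++)
        placeholder[i] = (uint8_t)(u >> (8 * i));
}
```

With test at 0x100003f60, the displacement comes out as -0x48, which is written into the placeholder as b8 ff ff ff.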

  • Analysis
    • Look at the call to test: the disassembly says that b8 ff ff ff + 100003fa8 is the memory address of test. The bytes are stored little-endian, so the displacement is ffffffb8. Because it starts with ff, it is a negative number in two's complement; to recover its magnitude, invert the bits and add 1. After inversion the leading ffffff bytes all become 0, so we only need to look at b8.

    • b8 is 10111000 in binary. Inverting it gives 01000111; adding 1 gives 01001000.

    • Convert the result to hexadecimal
    • The magnitude is 0x48, so the displacement is -0x48. As the disassembly says, it is added to 100003FA8 to get the original address: 0x100003fa8 - 0x48
    • So 0x0000000100003f60 is the original address of test
  • View the value of global (this displays the entire binary contents of the current file)

Look at __data at the bottom: 0a000000 is the initial value of global (10, stored little-endian), and 100008000 is its address.

  • Calculate its address

The result equals the address printed above.
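Both calculations follow the same rule: target = address of the next instruction + signed 32-bit displacement. A small C sketch of that rule, assuming little-endian x86_64 encoding; the helper names are hypothetical. For global's c7 05 instruction the same formula applies, except that the next instruction starts after the 4-byte immediate that follows the displacement.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 32-bit displacement, e.g. the bytes b8 ff ff ff. */
static int32_t read_rel32(const uint8_t b[4])
{
    uint32_t u = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
                 (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    return (int32_t)u; /* two's complement: 0xffffffb8 -> -0x48 */
}

/* The manual method from the text: invert the bits, then add 1. */
static uint32_t negate_twos_complement(uint32_t u)
{
    return ~u + 1u;
}

/* RIP-relative target = address of the next instruction + displacement. */
static uint64_t rip_relative_target(uint64_t next_ip, const uint8_t b[4])
{
    return next_ip + (uint64_t)(int64_t)read_rel32(b);
}
```

Feeding in b8 ff ff ff with the next instruction at 0x100003fa8 reproduces 0x100003f60, the address of test.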

Conclusion

Now you should have a clear understanding of how to find data inside a Mach-O file: the original address of a call target is found by adding the offset to the address of the next instruction.

Debugging information

dSYM files

  • dSYM: a file that stores debugging information in the DWARF format
  • DWARF: a debugging file format used by many compilers and debuggers to support source-level debugging

How dSYM debugging information is generated

  • 1. Read the debug map
  • 2. Load __DWARF from the .o files
  • 3. Relocate all addresses
  • 4. Finally, package all the DWARF data into a dSYM bundle

Walkthrough

  • 1. Generate debugging information

At this point the debugging information is in the __DWARF segment of the .o file.

  • 2. Generate an executable file

Searching for __DWARF now finds nothing: during linking, __DWARF is removed and the information is placed somewhere else.

  • 3. Find where the __DWARF information went in the executable

It was put into the symbol table, as debug map entries that point back to the .o files. DWARF is only a file format, so writing the symbols into a file in that format is all that is needed.

  • 4. Generate a dSYM file

Running clang -g1 test.m -o test generates a dSYM file.

  • 5. View the dSYM file

It holds the complete symbol information: the file each symbol is in, the symbol's address, and its name.

Practical application in a project

How dSYM files are used in day-to-day development

  • Let us look at the view controller code

This code calls test_dwarf after 2 seconds, and test_dwarf accesses an array out of bounds.

  • The crash information

The crash information is very clear: it gives the name of the crashing code, because the symbols have already been restored. If you do not want them restored, you have to strip the symbols.

  • The symbol-stripping setting

Once this is set, Xcode strips the symbols; run the app again.

This time we are no longer given the specific name of the crashing code, so let us restore the address to a symbol ourselves.

Restoring addresses to symbols

The address in the screenshot above is an offset address. When we checked the header earlier, we saw that the first address is 0x0000000100000000, so the offset is also measured from 0x0000000100000000 in the Mach-O.

To get the real virtual address, we subtract the offset address of the image itself from the offset address of the method. This is necessary because the dSYM stores real virtual addresses.

  • Get the real virtual address

Now we have the offset address

  • Restore the address to a symbol
    • I wrote a script that copies the dSYM and the app package into the same folder
    • Running the app generates the dSYM automatically
    • Look up the symbol

dSYM files store virtual addresses without the runtime offset applied.
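The lookup step can be modeled as a binary search over symbols sorted by start address. This is a simplified sketch, not how a dSYM or atos actually stores or searches DWARF data: the symbol table is hypothetical, TEXT_VMADDR assumes the 0x100000000 base seen in the header, and the last symbol's range is treated as open-ended instead of checking an end address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical symbol table, sorted by start address. */
struct sym {
    uint64_t start;   /* unslid virtual address, as stored in the dSYM */
    const char *name;
};

/* __TEXT vmaddr from the Mach-O header of a 64-bit executable. */
#define TEXT_VMADDR 0x100000000ull

/* Return the name of the last symbol whose start is <= vmaddr, or NULL. */
static const char *lookup(const struct sym *syms, size_t n, uint64_t vmaddr)
{
    assert(syms != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].start <= vmaddr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : syms[lo - 1].name;
}

/* A crash log gives "image + offset"; the dSYM wants TEXT_VMADDR + offset. */
static const char *symbolicate_offset(const struct sym *syms, size_t n, uint64_t offset)
{
    return lookup(syms, n, TEXT_VMADDR + offset);
}
```

For example, with test starting at 0x100003f60, an offset of 0x3f64 resolves to test.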

ASLR

Continuing with the code above, we add the following. Let us run it to see what the real virtual address is and whether it can be found in the dSYM.

  • Look up the address in the dSYM file

The address is found, which shows that ASLR is an offset applied to the image relative to the addresses in the Mach-O binary. So what is this good for?
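Expressed as arithmetic, the ASLR slide is the distance between where __TEXT was actually loaded and its vmaddr in the file; subtracting it from a runtime address gives the address stored in the dSYM. A portable sketch: on Apple platforms, _dyld_get_image_vmaddr_slide() in <mach-o/dyld.h> can report an image's slide at runtime, but it is not called here, and the load address in the usage example is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Slide = actual load address of __TEXT - __TEXT vmaddr in the Mach-O file. */
static int64_t aslr_slide(uint64_t text_load_addr, uint64_t text_vmaddr)
{
    int64_t slide = (int64_t)(text_load_addr - text_vmaddr);
    assert(slide % 0x1000 == 0); /* images are mapped at page granularity */
    return slide;
}

/* Undo the slide: the result is the address to look up in the dSYM. */
static uint64_t unslid_address(uint64_t runtime_addr, int64_t slide)
{
    return runtime_addr - (uint64_t)slide;
}
```

If __TEXT were loaded at 0x100a3c000, the slide would be 0xa3c000, and a runtime address of 0x100a3ff60 would map back to 0x100003f60.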

ASLR application

I wrote a project that imports TestFramework. What is inside TestFramework?

It contains only one method, which just prints something.

  • How TestFramework is imported

If Source=1 is set, the .h and .m files are imported; otherwise, the framework is imported.

  • Import with a plain pod install

  • Import with Source=1

  • The component’s methods are called in the project

  • Now run the project, add a breakpoint at lj_test, and see whether we can step into the component

  • Click next

We do jump into the component's source. Why can it jump here?

  • Look at the binary information and see why

This path is the path to TestFramework's source, so as long as the source code can be found there, the debugger can jump into it.

  • Uses:

Keeping debugging information for components is useful: by reading the recorded path, the debugger can find the source. When we componentize a project or ship components as binaries, the debugging information lets us step into the corresponding source.

dyld

How to debug dyld

  • 1. The first way: to debug the dyld source code itself, you need builds of dyld, libdyld.dylib and libclosured.dylib that include debugging information, and you must replace the system's copies with them. This is quite risky.
  • 2. The second way: LLDB maintains a list of libraries in which it avoids setting breakpoints by name, to prevent problems, and dyld and libdyld.dylib are on this list. There are two ways to force a breakpoint in dyld:
    • br set -n dyldbootstrap::start -s dyld
    • settings set target.breakpoints-use-platform-avoid-list 0
  • 3. The third way does not require reading any code or binaries: dyld's environment variables make it print useful information while it runs
    • DYLD_PRINT_APIS: prints almost every call made inside dyld
    • DYLD_PRINT_LIBRARIES: prints all the dynamic libraries loaded while the application starts up
    • DYLD_PRINT_WARNINGS: prints auxiliary information while dyld runs
    • DYLD_*_PATH: determine the directories dyld searches for dynamic libraries, and the order in which it searches them
    • DYLD_PRINT_ENV: prints the environment variables dyld was initialized with
    • DYLD_PRINT_SEGMENTS: prints the segment information of the current program
    • DYLD_PRINT_STATISTICS: prints the time spent before main
    • DYLD_PRINT_INITIALIZERS: prints every initializer that runs

Debugging dyld

In a test project, we try setting breakpoints in dyld.

  • Set a breakpoint with b dyldbootstrap::start

It does not work.

  • Set a breakpoint with br set -n dyldbootstrap::start -s dyld

This time the breakpoint is set successfully.

  • 3. Set the breakpoint by disabling the avoid list

Now b dyldbootstrap::start also sets the breakpoint successfully. The command that disables the avoid list ends with 0; to restore the default, change the 0 back to 1. This setting only applies to the current LLDB session and no longer takes effect once you exit LLDB.

Let us try out a few of the environment variables introduced above.

  • Print almost every call made inside dyld

This prints out what dyld does for our project.

  • Print all the dynamic libraries loaded while the application starts up

The test program uses this many dynamic libraries while it runs.

  • Print the segment information for the current program

In this excerpt, you can see the address ranges of __TEXT and __DATA.

Using dyld for debugging

We use dyld to debug our test program and print debugging information.

Before main, the output is the same as before, but some extra information is printed when main executes.

dyld loading process

This part was covered in the OC Basic Principles article "App Startup Process (dyld Loading Process)"; here I add some supplementary content.

Execution process

  • Before dyld runs, some system process functions perform the following initialization:
    • Load the file from disk into memory (1. later reads are faster; 2. the next launch is faster)
    • Parse the Mach header of the current executable to determine whether the Mach-O file is usable
    • If it is usable, parse the load commands according to the Mach header, load each part of the program into the specified address space according to the result, and set the protection flags at the same time


      r-x and rw- are examples of these protection flags

    • Load dyld as specified by LC_LOAD_DYLINKER
    • dyld starts working
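The protection flags are bit masks: in a Mach-O segment command they appear as initprot/maxprot, typically r-x for __TEXT and rw- for __DATA. A small sketch that renders the bits as such strings; the constant names are hypothetical, but their values match VM_PROT_READ, VM_PROT_WRITE and VM_PROT_EXECUTE.

```c
#include <assert.h>

/* Same values as VM_PROT_READ / VM_PROT_WRITE / VM_PROT_EXECUTE in <mach/vm_prot.h>. */
enum { VMP_READ = 0x1, VMP_WRITE = 0x2, VMP_EXECUTE = 0x4 };

/* Render a segment's protection bits as a string such as "r-x" or "rw-". */
static void prot_string(int prot, char out[4])
{
    assert((prot & ~(VMP_READ | VMP_WRITE | VMP_EXECUTE)) == 0);
    out[0] = (prot & VMP_READ) ? 'r' : '-';
    out[1] = (prot & VMP_WRITE) ? 'w' : '-';
    out[2] = (prot & VMP_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}
```

So a segment with initprot 5 (read | execute) prints as r-x, and one with initprot 3 (read | write) prints as rw-.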

What exactly does dyld do?

dyld: the dynamic linker

libdyld.dylib: provides dynamic linking services to our program at runtime

  • 1. Initialize dyld's own loading environment; LC_DYLD_INFO_ONLY
  • 2. Load all the dynamic libraries the current program links against into the specified memory; LC_LOAD_DYLIB
  • 3. Search all the dynamic libraries and bind the symbols that must be resolved before the program runs (non-lazy symbols); LC_DYSYMTAB
  • 4. Replace the import symbols that need binding in the indirect symbol table with their real addresses; LC_DYSYMTAB (indirect symbol table)
  • 5. Provide interface functions through which the program can use dyld at runtime: libdyld.dylib, provided via LC_LOAD_DYLIB;
  • 6. Configure the runtime and execute all the global constructors used in the dynamic libraries/images;
  • 7. dyld calls the program's entry function and the program starts executing; LC_MAIN

The address of main for test is 16256 (0x3F80 in hexadecimal). Let us check whether main is really there.
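To connect 16256 with a memory address: LC_MAIN is an entry_point_command whose entryoff field holds the offset of main() from the start of __TEXT. A sketch, assuming 16256 is the entryoff value shown and that __TEXT starts at file offset 0 with vmaddr 0x100000000, as in a typical 64-bit executable. The struct mirrors the field layout of entry_point_command in <mach-o/loader.h>, and LC_MAIN is defined there as 0x28 | LC_REQ_DYLD; the names below are hypothetical so they do not clash with that header.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors struct entry_point_command from <mach-o/loader.h>. */
struct entry_point_cmd {
    uint32_t cmd;       /* LC_MAIN */
    uint32_t cmdsize;   /* 24 */
    uint64_t entryoff;  /* file (__TEXT) offset of main() */
    uint64_t stacksize; /* if not zero, initial stack size */
};

#define MY_LC_REQ_DYLD 0x80000000u
#define MY_LC_MAIN (0x28u | MY_LC_REQ_DYLD)

/* Address of main(), assuming __TEXT starts at file offset 0:
 * __TEXT vmaddr + entryoff, plus the ASLR slide at runtime. */
static uint64_t main_address(const struct entry_point_cmd *ep,
                             uint64_t text_vmaddr, uint64_t slide)
{
    assert(ep->cmd == MY_LC_MAIN);
    return text_vmaddr + ep->entryoff + slide;
}
```

With no slide, 0x100000000 + 16256 gives 0x100003f80, the unslid address at which main should be found.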

dyld execution

The section above covered what happens before dyld starts, but what does dyld do once it is running? I have sorted out the process dyld follows. It begins with a call to __dyld_start(), which marks the point where dyld starts working, so let us look at this method.

  • First, set a breakpoint on __dyld_start()

The breakpoint cannot be set, because this method is implemented in assembly.

  • Set the breakpoint with a regular expression

The breakpoint appears to be set, but its address looks bogus, so it does not actually take effect. Still, regular-expression breakpoints are very useful when exploring low-level code.

  • Cache hit

The cache here is the shared cache discussed earlier with dyld. If what we need is found in the shared cache, it is returned directly, dyld's setup is complete, and control is handed over to the executable's entry function, main. The address of main is then returned to libdyld. Why return it?

As the output shows, after the address of main is returned to libdyld, libdyld calls start.

  • Cache miss
    • 1. Insert dynamic libraries (libraries our application does not link against)
    • 2. Link the dynamic libraries (libraries the program needs to link against)
    • 3. Link the inserted libraries
    • 4. Apply the interposed (inserted) functions
    • 5. Bind symbols
    • 6. Run libSystem_initializer(), then read LC_MAIN to find the entry function's address
    • 7. Use LC_MAIN to find and set up the program's entry function: the glue address is set to the entry function's address; otherwise it is 0, which means failure. (Before this, libSystem's libSystem_initializer() has already run; it executes before main.)

Inserting dynamic libraries

  • Preparation

The project has an inject.m file that defines a constructor, and the constructor prints a single line. We have built it into a dynamic library.
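A constructor like the one in inject.m can be sketched in plain C with __attribute__((constructor)), which both clang and GCC support: the function runs when the image is loaded, before main. printf stands in for the single print line, and inject_init and inject_loaded are hypothetical names, the flag being added only so the effect can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag recording that the constructor has run. */
static int inject_loaded = 0;

/* Runs automatically when the image containing it is loaded, before main(). */
__attribute__((constructor))
static void inject_init(void)
{
    inject_loaded = 1;
    printf("inject: constructor ran\n");
}
```

Built into a dynamic library and injected into another process, the line is printed during loading even though the host program never calls inject_init.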

Dynamic library details

Let’s take a look at another project

This project does not import anything.

Set DYLD_INSERT_LIBRARIES to the dynamic library so that it is injected.

  • Run TestInject

The dynamic library has been injected through the environment variable.

Interposing functions

  • Preparation

I wrote a macro for hooking functions; line 24 replaces NSLog with my_NSLog. We did something similar earlier with method swizzling, but this is the interposing mechanism (insert functions) provided by dyld.

The macro above looks messy, so let us break it down:

  • 1. __attribute__((used)) tells the compiler that this variable is used even though nothing references it, so it is kept and no warning is issued
  • 2. struct { const void* replacement; const void* replacee; } declares a struct with two fields
  • 3. _interpose_NSLog is the name of the struct variable
  • 4. __attribute__ ((section("__DATA,__interpose"))) places this variable in the __interpose section of the __DATA segment
  • 5. { (const void*) (unsigned long) &my_NSLog, (const void*) (unsigned long) &NSLog }; initializes the struct with the addresses of my_NSLog and NSLog

Interposed functions are declared when you write the code and replaced when dyld loads the image (dyld checks whether the __interpose section of __DATA has any content, and if so installs the hooks). The method swizzling we usually use, by contrast, replaces methods at runtime.
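Put back together, the five parts above form a macro like the following C sketch, modeled on the DYLD_INTERPOSE macro in <mach-o/dyld-interposing.h>. The macro name INTERPOSE and the function my_puts are hypothetical, puts stands in for NSLog so the sketch stays plain C, and the section attribute is only applied on Apple platforms, because only dyld reads __DATA,__interpose.

```c
#include <assert.h>
#include <stdio.h>

#ifdef __APPLE__
/* dyld scans this section of each loaded image for interpose tuples. */
#define INTERPOSE_SECTION __attribute__((section("__DATA,__interpose")))
#else
/* Without dyld the tuple still compiles, but it is just ordinary data. */
#define INTERPOSE_SECTION
#endif

/* Hypothetical macro: one { replacement, replacee } tuple per hooked function. */
#define INTERPOSE(replacement_fn, replacee_fn)                  \
    __attribute__((used)) static struct {                       \
        const void *replacement;                                \
        const void *replacee;                                   \
    } _interpose_##replacee_fn INTERPOSE_SECTION = {            \
        (const void *)(unsigned long)&replacement_fn,           \
        (const void *)(unsigned long)&replacee_fn               \
    };

/* C stand-in for my_NSLog: add a prefix, then call the original. dyld does
 * not rebind references inside the image that declares the tuple, so this
 * puts reaches the real function. */
static int my_puts(const char *s)
{
    fputs("[hooked] ", stdout);
    return puts(s);
}

INTERPOSE(my_puts, puts)
```

On macOS, code like this would be built into a dynamic library and injected with DYLD_INSERT_LIBRARIES, as TestInject does; on other platforms the tuple is inert data.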

  • Inject this dynamic library into TestInject

  • Run TestInject

NSLog now produces our custom output. This is function interposing.

Notes

The techniques above cannot be used in apps released on the App Store, but they are a good way to explore other people's code.

Final words

This article took a long time to write, because I kept thinking through problems along the way. My articles record my own explorations: I explore something first, then write down the process, and when I run into a problem I also want to record how I solved it, so writing is very slow. I have planned a lot of content, and at this pace I will never finish it. So from now on I will stop writing up my own explorations and instead write articles about applying techniques in projects, such as basic technical articles like this one. Thank you all for your support.