Series of articles: OC Basic Principles series, OC Basic Knowledge series, Swift Under-the-Hood Exploration series, iOS Advanced series
Preface
I had planned to cover the practical application of scripts in this article, but when I sorted through my notes I found that some of those scripts had already been written up in the previous article. With more than half of this article drafted, I realized there was too little left to say on that topic, so I decided to write about dyld and LLDB instead.
Mach-O analysis
The earlier articles on Mach-O and the linker and on symbols already covered Mach-O: we went through its structure and described its header.
Code analysis
- We create a test.m file and compile it into an executable. First, let's take a look at the contents of test.m
- We compile test.m into an executable
- We look at the contents of the executable file
- Then we look at the .o file
Unlike the executable file above, the .o file shows offsets in the left-hand column instead of virtual memory addresses.
- This isn't obvious yet, so let's change the test.m file
- Same operation: read the .o file
With more content the offsets are easier to see. The contents of the .o file are laid out in the order they were written
- Analyze the .o file at this point
- Let's look at what's in the red box
The calls to the two methods, test and test_1, both begin with e8. This e8 is a fixed piece of machine code: it is the opcode for callq.
- What do we call the 00 00 00 00 after the e8? Here's the idea: e8 is a call instruction that takes a displacement relative to a static address. The 00 00 00 00 is that offset, and 00 00 00 00 plus the 48 that follows gives the address of test.
- Computed this way, the address of test is effectively 0, yet the method in the red box is not at that address. That tells us the address is a virtual (placeholder) address, not a real one. What to do? We know the executable works with virtual addresses, so the instruction has to be given the proper offset before it can reach the real address. At this point test is put into the relocation symbol table, which tells the linker that this location needs to be relocated.
- Let's look at what's in the red box
- Relocation symbol table
- Analysis
- Here 49 is the position that needs relocating for test_1. The e8 before it sits at position 48, and the 00 00 00 00 starts at position 49. In other words, the entry tells the linker that this is the location to relocate.
- Look at global: its relocation address is 3b. Counting along the row, the position before it is 39, c7 is 3a, and 05 is 3b, so the fc ff ff ff that follows is the position to be relocated.
- Now compile the executable
When we look at our main function, we see that instead of offsets, virtual memory addresses have been assigned
- Analysis
- Now look at the call to test. It reads b8 ff ff ff, and that value plus 100003fa8 is the memory address of test. Read as a little-endian number the value is ffffffb8. It begins with ff, so it is a negative number stored in two's complement. To recover its magnitude from two's complement, invert the bits and add 1. The leading ffffff can be ignored; we only need to look at b8.
- b8 is 10111000 in binary; inverted, it becomes 01000111
- Add 1 and convert to hexadecimal
- The magnitude is 0x48, so the displacement is -0x48. As noted above, it still has to be added to 100003fa8 to get the original address
- 0x0000000100003f60 is the original address of test
- Now we view the value of global
(This displays the entire binary contents of the current file)
Look at __data at the bottom, where 0a000000 is the initial value and 100008000 is its position
- Calculate its position
The result is equal to the position printed above
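The displacement arithmetic for the call to test can be checked with a few lines of Python. This is only a sketch of the calculation described above: the function names are made up for illustration, and the only inputs are the bytes b8 ff ff ff and the address 100003fa8 quoted in the text.

```python
def rel32_from_bytes(raw):
    """Decode a 4-byte little-endian two's-complement displacement."""
    return int.from_bytes(raw, byteorder="little", signed=True)


def call_target(next_instruction, raw_disp):
    """Target of an e8 call: address after the instruction plus the displacement."""
    return next_instruction + rel32_from_bytes(raw_disp)


def encode_rel32(target, next_instruction):
    """The inverse step: the 4 bytes that must be patched in after the e8."""
    return (target - next_instruction).to_bytes(4, byteorder="little", signed=True)


raw = bytes.fromhex("b8ffffff")

# Manual route from the text: invert the low byte and add 1.
magnitude = ((~0xB8) & 0xFF) + 1          # 0x47 + 1 = 0x48

displacement = rel32_from_bytes(raw)      # -0x48
target = call_target(0x100003FA8, raw)    # 0x100003f60

print(hex(magnitude), displacement, hex(target))
print(encode_rel32(0x100003F60, 0x100003FA8).hex())
```

The last line goes the other way: given the final address of test and the address after the call, it produces the bytes b8 ff ff ff that appear in the linked executable in place of the 00 00 00 00 placeholder from the .o file.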
Conclusion
Now you know how to locate data inside Mach-O and should have a clear picture of how it works: a call finds its original address by way of an offset.
Debugging information
dSYM file
dSYM: a file that stores debugging information in DWARF format
DWARF: a debugging file format used by many compilers and debuggers to support source-level debugging
How dSYM debugging information is generated
- 1. Read the debug map
- 2. Load __DWARF from the .o files
- 3. Relocate all addresses
- 4. Finally, package all the DWARF into a dSYM bundle
Explanation
- 1. Generate debugging information
At this point the debugging information is in the __DWARF segment
- 2. Generate an executable file
Searching for __DWARF now finds nothing: at link time the __DWARF content is removed and put somewhere else
- 3. Check where __DWARF went in the executable
It was put into the symbol table. DWARF is a file format; writing the symbols out to a file in that format is all that is needed
- 4. Generate a dSYM file
Use the command: clang -g1 test.m -o test
This generates a dSYM file
- 5. View the dSYM file
It holds the complete symbol information: the file each symbol is in, the symbol address, and the name.
Practical application in a project
How to use dSYM files in daily development
- Let's look at the VC code
This code calls test_dwarf after 2 seconds, and the array access inside test_dwarf goes out of bounds
- The crash information
The crash information is very clear and names the code that crashed, because the address has already been restored to a symbol. If you don't want it restored, you have to strip the symbols
- Strip setting
Once this is set, Xcode strips the symbols. Run again
We are no longer given a specific name for the crash, so let's restore the address to a symbol ourselves
Restoring an address to a symbol
The address in the screenshot above is an offset address. When we inspect the headers we see that the first address is 0x0000000100000000, so even when the Mach-O is shifted, it is shifted relative to the address 0x0000000100000000.
We need to subtract the offset that the system applied to the image from the offset address of the method to get the real virtual address. The reason is that the dSYM stores real virtual addresses
- Get the real virtual address
Now we have the real virtual address
- Restore it to a symbol
- I wrote a script that copies the dSYM and the app package into the same folder
- Running the build generates the dSYM automatically
- Look up the symbol
dSYM files store virtual addresses without the offset
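The subtraction described above can be sketched in Python. The load address and crash address below are hypothetical values chosen for illustration (the real ones were in the screenshots); the only number taken from the text is the first address 0x0000000100000000.

```python
LINKED_BASE = 0x100000000  # first address seen in the headers


def slide_from_load_address(load_address, linked_base=LINKED_BASE):
    """Offset applied to the image at launch: where it is minus where it was linked."""
    return load_address - linked_base


def unslid_address(runtime_address, slide):
    """Address as stored in the dSYM: the runtime address with the offset removed."""
    return runtime_address - slide


# Hypothetical values, for illustration only.
load_address = 0x104A30000    # where the image was actually loaded
crash_address = 0x104A35E30   # address shown in the unsymbolicated crash

slide = slide_from_load_address(load_address)
file_address = unslid_address(crash_address, slide)

print(hex(slide), hex(file_address))
```

With these assumed inputs the offset is 0x4a30000 and the address to look up in the dSYM is 0x100005e30.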
ASLR
Continuing with the code above, we add the following, then run it to get the real virtual address and see whether it can be found in the dSYM
- Let's take a look at the dSYM file and look up the address
The address is found, which shows that ASLR is an offset applied to the image of the Mach-O binary. What's the use of this?
ASLR application
I wrote a project that pulls in TestFramework. So what's in TestFramework?
It has only this one method, which prints something
- How is TestFramework brought in?
If Source=1 is used, the .h and .m files are imported; otherwise the framework is imported
- Running pod install directly imports the framework
- Using Source=1 imports the source files
- The component's method is called in the project
- Now let's run the project, add a breakpoint on lj_test, and see whether we can step into the component
- Click step
We find that it jumps into the component. Why does it jump here?
- Look at the binary information to see why
This path is the path of TestFramework, so if the source code is found there, the debugger can jump into it
- Use:
Keeping the debugging information for components is useful: by reading the path, the source can be found. When we componentize or ship binaries, the debugging information lets us step into the corresponding source
dyld
How to debug dyld
- 1. The first way: if you want to debug the dyld source code, you need to prepare builds of dyld, libdyld.dylib, and libclosured.dylib that carry debugging information and replace the system's copies with them, which is fairly risky.
- 2. The second way: LLDB maintains a list of libraries in which it avoids setting breakpoints by name, to keep such breakpoints from causing problems, and dyld and libdyld.dylib are on this list. There are two ways to force a breakpoint on dyld:
br set -n dyldbootstrap::start -s dyld
settings set target.breakpoints-use-platform-avoid-list 0
- Another way needs neither the source code nor the binaries: dyld's environment variables can be used to make dyld print useful information while it runs
DYLD_PRINT_APIS: prints almost all the calls that occur inside dyld
DYLD_PRINT_LIBRARIES: prints all dynamic libraries that are loaded during application startup
DYLD_PRINT_WARNINGS: prints auxiliary information while dyld runs
DYLD_*_PATH: the directories, in order, that dyld searches for dynamic libraries
DYLD_PRINT_ENV: displays the environment variables dyld was initialized with
DYLD_PRINT_SEGMENTS: prints the segment information of the current program
DYLD_PRINT_STATISTICS: prints the pre-main time
DYLD_PRINT_INITIALIZERS: displays all the initializers that are run
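As a rough sketch of how these variables might be switched on from a script, the helper below builds an environment with the chosen DYLD_PRINT_* flags set to 1. The helper name is hypothetical, and the assumption that a value of 1 enables a flag is mine; whether a given flag produces output depends on the dyld version in use.

```python
import os


def dyld_debug_env(flags, base_env=None):
    """Return a copy of the environment with DYLD_PRINT_<flag>=1 for each flag."""
    env = dict(os.environ if base_env is None else base_env)
    for flag in flags:
        env["DYLD_PRINT_" + flag] = "1"
    return env


# Built from an empty base here so the result is easy to inspect.
env = dyld_debug_env(["LIBRARIES", "STATISTICS"], base_env={})
print(env)
```

On a Mac, the resulting dictionary could be passed as the env argument of subprocess.run together with the path to the test executable; in Xcode the same variables go into the scheme's environment variables.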
Debugging dyld
Here we open a test project and set breakpoints on dyld
- 1. Set a breakpoint with: b dyldbootstrap::start
We find that it doesn't work
- 2. Set a breakpoint with: br set -n dyldbootstrap::start -s dyld
The breakpoint is set successfully
- 3. Set a breakpoint by disabling the avoid list
With the list disabled, b dyldbootstrap::start now sets the breakpoint. Note that the disabling command ends with 0; to restore the default, change the 0 to 1. The setting only applies to the current LLDB session and is lost once you exit LLDB
Several environment variables were introduced above, so let's try them out
- Print almost all the calls inside dyld
This prints what our project is doing inside dyld
- Print all the dynamic libraries loaded during application startup
test uses this many dynamic libraries when it runs
- Print the segment information for the current program
The output is truncated here, but you can see the address ranges of __TEXT and __DATA
Using dyld for debugging
We use dyld to debug our test and print debugging information
Everything before main is the same as before, but some extra information is printed when main executes
dyld loading process
This part was covered in "App startup process (dyld loading process)" in the OC underlying principles series; some supplementary content is needed here
Execution process
- Before dyld runs, system process functions perform the following initialization:
Load the file from disk into memory (1. later reads are faster; 2. the next launch is faster)
Parse the Mach header of the current executable to determine whether the Mach-O file is usable
- If it is usable, parse the load commands according to the Mach header and, based on the result, load the parts of the program into the specified address space while setting their protection flags (r-x and rw- are the protection flags)
- Load dyld from LC_LOAD_DYLINKER
dyld goes to work
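The "is this Mach-O usable" check in the steps above starts with the header. The sketch below parses a 64-bit Mach header with Python's struct module; it is not how the kernel does it, only an illustration of the layout. The sample bytes are built in the script itself, and the ncmds, sizeofcmds, and flags values in it are made up.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF       # magic number of a 64-bit Mach-O file
MH_EXECUTE = 0x2               # filetype of a main executable
CPU_TYPE_X86_64 = 0x01000007

# mach_header_64: magic, cputype, cpusubtype, filetype,
# ncmds, sizeofcmds, flags, reserved (little-endian)
HEADER_FORMAT = "<IiiIIIII"


def parse_mach_header_64(data):
    """Unpack the header and reject anything that is not 64-bit Mach-O."""
    names = ("magic", "cputype", "cpusubtype", "filetype",
             "ncmds", "sizeofcmds", "flags", "reserved")
    header = dict(zip(names, struct.unpack_from(HEADER_FORMAT, data)))
    if header["magic"] != MH_MAGIC_64:
        raise ValueError("not a 64-bit little-endian Mach-O header")
    return header


# Made-up sample header, for illustration only.
sample = struct.pack(HEADER_FORMAT, MH_MAGIC_64, CPU_TYPE_X86_64, 3,
                     MH_EXECUTE, 16, 1368, 0x00200085, 0)

header = parse_mach_header_64(sample)
print(header["filetype"] == MH_EXECUTE, header["ncmds"])
```

The ncmds and sizeofcmds fields tell the loader how many load commands follow the header and how many bytes they occupy, which is what the next step walks through.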
What exactly does dyld do
dyld: the dynamic linker
libdyld.dylib: provides dynamic linking for our programs at runtime
- 1. Initialize its own loading environment; LC_DYLD_INFO_ONLY
- 2. Load all the dynamic libraries that the current program links against into the specified memory; LC_LOAD_DYLIB
- 3. Search all the dynamic libraries and bind the symbols that are needed before the program is called (non-lazy symbols); LC_DYSYMTAB
- 4. Replace the imported symbols that need binding in the indirect symbol table with their real addresses; LC_DYSYMTAB (indirect symbol table)
- 5. Provide the interface functions through which the program uses dyld at runtime: libdyld.dylib, supplied via LC_LOAD_DYLIB
- 6. Configure the runtime and execute all global constructors used in the dynamic libraries/images
- 7. dyld calls the program's entry function and the program starts executing; LC_MAIN
LC_MAIN gives the address of main for test as 16256 (0x3f80 in hexadecimal). Let's see whether main is there
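To relate 16256 to the addresses seen earlier, here is a small sketch that unpacks an LC_MAIN command and converts its entry offset to a virtual address. It assumes the __TEXT segment starts at file offset 0 and is mapped at 0x100000000 (the first address seen in the headers); the command bytes are built in the script, with only the 16256 taken from the text.

```python
import struct

LC_MAIN = 0x80000028           # load command that carries the entry point
ENTRY_FORMAT = "<IIQQ"         # cmd, cmdsize, entryoff, stacksize

TEXT_VMADDR = 0x100000000      # assumed base address of __TEXT
TEXT_FILEOFF = 0               # assumed file offset of __TEXT


def entry_vmaddr(command):
    """Turn the entryoff of an LC_MAIN command into a virtual address."""
    cmd, cmdsize, entryoff, stacksize = struct.unpack_from(ENTRY_FORMAT, command)
    if cmd != LC_MAIN:
        raise ValueError("not an LC_MAIN command")
    return TEXT_VMADDR + (entryoff - TEXT_FILEOFF)


command = struct.pack(ENTRY_FORMAT, LC_MAIN, 24, 16256, 0)
main_address = entry_vmaddr(command)
print(hex(main_address))
```

Under those assumptions main lands at 0x100003f80, which sits just after test at 0x100003f60 and before the 0x100003fa8 that followed the call to test.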
dyld execution
Above we covered what happens before dyld starts, but what does dyld do after it starts? Here I have sorted out the process that follows. One of the things called is the __dyld_start method, which indicates that dyld has started work. Let's look at this method
- First we set a breakpoint on __dyld_start
We find that the breakpoint is not set successfully, because the method is implemented in assembly
- Set the breakpoint with a regular expression
The breakpoint appears to be set, but the address looks like a dummy address, so it does not actually break. Still, I want to say that regex breakpoints are very useful for exploring the lower layers
- Cache hit
The cache here is the shared cache that came up earlier for dyld. If the result is found in the shared cache, it is returned directly and dyld's setup is complete. Control is then handed to the executable's entry function main, and the address of main is returned to libdyld. Why return it?
As we can see from the printout, once the address of main has been returned to libdyld, libdyld calls start
- Cache miss
- 1. Insert dynamic libraries (these are not libraries our application links against)
- 2. Link the dynamic libraries (the ones the program needs to link)
- 3. Link the inserted libraries
- 4. Apply the interposing functions
- 5. Bind symbols
- 6. Run libSystem_initializer(), then read LC_MAIN to find the address of the entry function
- 7. Find the program's entry function through LC_MAIN and set the glue address to the entry function's address; otherwise, on failure, the glue address is 0
(Before that: libSystem's libSystem_initializer() method is executed before the main function.)
- 1. Insert a dynamic library
- Preparation
The project has an inject.m file that defines a constructor, and the constructor only prints one line. We have built it into a dynamic library
Dynamic library details
Let's take a look at another project
Nothing has been imported into this project
Set DYLD_INSERT_LIBRARIES to inject the dynamic library into it
- Run TestInject
The dynamic library is injected through the environment variable
Interposing functions
- Preparation
I wrote a macro that is used to hook a function, and line 24 replaces NSLog with my_NSLog. Earlier we did method replacement with method swizzling; this is the interposing facility provided by dyld
The macro above looks messy, so we can expand it:
- 1. __attribute__((used)) tells the compiler that the variable is used behind its back, so it should not warn about it
- 2. struct { const void* replacement; const void* replacee; } declares a structure
- 3. _interpose_NSLog is the name of the structure variable
- 4. __attribute__ ((section("__DATA,__interpose"))) places this variable into the named section
- 5. { (const void*) (unsigned long) &my_NSLog, (const void*) (unsigned long) &NSLog }; initializes the structure with the addresses of my_NSLog and NSLog
Interposed functions are swapped in when dyld loads the image (dyld checks whether the __interpose section of __DATA has any content, that is, whether hooks are present), whereas the method swizzling we usually use replaces methods at runtime
- Bring this dynamic library into TestInject
- Run TestInject
We find that NSLog has been replaced by our custom print. This is function interposing
Note
The approach above cannot be used in an app submitted to the App Store, but it is a useful way to explore other people's code
Final words
This article took a long time to write, because I kept running into questions along the way. My articles record my own explorations: I explore first, then write down the process, solve the problems that come up, and write down how I solved them too, which is very slow. I have a lot of topics planned, and at this rate, when would I ever finish them? So I have decided not to write up my own explorations in the future. Instead I will write articles about applying things in projects, along with basic technical articles like this one. Thank you all for your support