Author: Mi Guang, likes iOS development and building things. WeChat official account: Chop finger north. Bilibili channel: YZ06276
Reviewers:
Backgammon, editor of Old Driver Technology Weekly, focuses on MNN Workbench, a one-stop machine learning platform that can be downloaded from www.mnn.zone
Damonwong, iOS developer, editor of Old Driver Technology Weekly, works in the Taobao technology department
Preface
Symbolication helps us locate bugs, crashes, and performance bottlenecks by tracing runtime logs and stacks back to the code that caused them. You are probably familiar with common symbolication tooling such as atos and dSYM files, but how do these tools work? This article covers the definition, principles, practice, and techniques of symbolication to give you a deeper understanding of it. It is based on Session 10211 – Symbolication: Beyond the Basics. The session speaker is Alejandro Lucena, an engineer on Apple's Performance Tools team.
What is symbolication?
In short, symbolication means "map App runtime information to source". It is a mechanism that converts the memory addresses and associated instruction information of an App running on a device into the specific file names, method names, and line numbers of the source. Put differently, it translates how the machine sees our App at runtime into how we developers see it (as source code). Without this layer of translation, locating a bug is difficult even in an App with only a few lines of code.
Demo
To help you understand how symbolication works, this article uses a simple demo App of only a few lines; all of its code is as follows. The logic of the demo is simple:
- randomValue() generates a random number between 1 and 100
- numberChoices() generates an array of 10 such random numbers
- selectMagicNumber(choices: numbers) fetches the element at a specified index from the numbers array
- generateMagicNumber() performs the steps above in order and returns the element at index MAGIC_CHOICE, where MAGIC_CHOICE is a random value
Symbolication of everyday crash logs
The first crash log produced by this App contained nothing intuitive, just a pile of memory addresses; all I could tell was that the App crashed on the main thread.
I tried to debug my App directly, but the problem did not reproduce, so it seemed the debugger might not help. After several attempts it finally reproduced, but the program stopped in assembly, which again offers nothing intuitive; raw assembly is too hard to read.
Neither the raw crash log nor the assembly stack solves the problem directly, but with symbolication we do not have to dig errors out of raw memory addresses. As you probably know, once the App's dSYM file is loaded in Xcode Organizer, the crash log is reprocessed, and we get a readable crash log with call information, file names, and line numbers. The crash log now tells us directly that an array was accessed out of bounds, which is very intuitive. With this information it is easy to trace back to the code and see that the random value MAGIC_CHOICE can easily index past the end of an array of only 10 elements.
Using the atos command line tool, we can obtain the same information.
Symbolication of everyday Instruments stacks
Another example comes from performance work: Instruments shows that the App periodically performs a large number of writes, producing alternating high-load and low-load intervals. By default, however, the stack shown in the lower right corner only indicates that the App is writing files, and it is the same stack in both the high-load and low-load intervals. This is because the Instruments stack is only partially symbolicated; in general, a stack without file names or line numbers is incompletely symbolicated. Here too we can manually load the dSYM file in Instruments. Looking at the high-load interval again, we now clearly see an extra piece of debug code, addDebugLog(), which is not called in the low-load interval. A dSYM not only makes crash logs that contain only memory addresses readable, it also makes Instruments stacks useful; both help us find the code behind a problem.
How symbolication works
Since symbolication tools help us locate problems in code, you will naturally ask:
- Why can a dSYM help with symbolication, and how does it do so?
- Can a dSYM symbolicate everything?
- Besides crash logs and Instruments, where else can a dSYM be loaded?
- What do the -o, -i, and -l flags of atos each do?
- Why does Instruments not provide a fully symbolicated stack directly?
- How do Xcode build settings affect symbolication?
With these questions in mind, let's look into how symbolication works.
To this end, we first break symbolication into two steps. Step 1: trace back from the memory address to the file. Step 2: recover the debug information.
Step 1 – Address translation
Tracing back from a memory address to a file address means converting an arbitrary runtime memory address into a stable address within the binary file on disk. Just as memory addresses live in a memory address space, a binary has its own address space on disk. The two address spaces cannot be mapped onto each other directly, so an address translation mechanism is needed.
On-disk address space and binary addresses
Addresses in the on-disk address space are assigned to the binary by the linker at build time. The linker groups the binary's contents into segments. Each segment contains data and has attributes such as its name, size, and address. For example, the __TEXT segment of a binary contains its methods and functions, and the __DATA segment contains the program's global state, such as global variables. Each segment is assigned a distinct starting address so that segments do not overlap.
Specifically, the linker records the segment information at the start of the executable, as part of the Mach-O header. Mach-O is the file format of executables and libraries. The Mach-O header contains a number of load commands that describe the attributes of the segments, and the operating system kernel loads the corresponding segments into memory by reading these load commands. If the App is built as a Universal 2 binary, each architecture has its own Mach-O header and segment information.
Now that we know a little about segments and load commands, let's look at the load commands of the demo App in practice. We can print the load commands with otool -l and use grep (a text filtering tool) to keep only the LC_SEGMENT_64 load commands, as shown in the figure below. The output shows that the __TEXT segment starts at the address given by vmaddr and that its length is the number of bytes given by vmsize.
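To make the layout behind that otool output concrete, here is a minimal Python sketch that reads LC_SEGMENT_64 load commands out of a Mach-O header. It does not read a real App: build_sample_macho() fabricates a tiny header with one __TEXT segment, and the vmsize value 0x8000 is an illustrative assumption (only the vmaddr 0x100000000 comes from the article). The struct layouts follow the 64-bit mach_header_64 and segment_command_64 definitions as I understand them.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF   # magic number of a 64-bit Mach-O file
LC_SEGMENT_64 = 0x19       # load command id of a 64-bit segment


def build_sample_macho():
    """Build a synthetic header with a single __TEXT segment (illustrative values)."""
    segname = b"__TEXT".ljust(16, b"\0")
    # segment_command_64: cmd, cmdsize, segname, vmaddr, vmsize,
    # fileoff, filesize, maxprot, initprot, nsects, flags (72 bytes)
    segment = struct.pack("<II16sQQQQiiII", LC_SEGMENT_64, 72, segname,
                          0x100000000, 0x8000, 0, 0x8000, 5, 5, 0, 0)
    # mach_header_64: magic, cputype, cpusubtype, filetype,
    # ncmds, sizeofcmds, flags, reserved (32 bytes)
    header = struct.pack("<IiiIIIII", MH_MAGIC_64, 0x0100000C, 0, 2,
                         1, len(segment), 0, 0)
    return header + segment


def read_segments(data):
    """Return {segment name: (vmaddr, vmsize)} for every LC_SEGMENT_64."""
    magic, _, _, _, ncmds, _, _, _ = struct.unpack_from("<IiiIIIII", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O header")
    segments = {}
    offset = 32  # load commands start right after the header
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd == LC_SEGMENT_64:
            name, vmaddr, vmsize = struct.unpack_from("<16sQQ", data, offset + 8)
            segments[name.rstrip(b"\0").decode()] = (vmaddr, vmsize)
        offset += cmdsize
    return segments


for name, (vmaddr, vmsize) in read_segments(build_sample_macho()).items():
    print(name, hex(vmaddr), hex(vmsize))
```

A real Universal 2 file additionally starts with a fat header that selects the per-architecture slice; the sketch skips that step.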
Load the binary file into memory
From the above we know that a load command contains the address and size of what is to be loaded. So why, after the kernel has actually processed the load commands, do the in-memory addresses of the segments differ from the addresses generated by the linker? In the figure below, how do the memory addresses relate to the linker's addresses A, B, and C? From here on, the addresses generated by the linker are abbreviated as A, B, and C.
Address Space Layout Randomization (ASLR)
Address space layout randomization is a computer security technique that makes memory corruption vulnerabilities harder to exploit. By placing the key data areas of a process at random positions in the address space, ASLR prevents an attacker from reliably jumping to a specific location in memory to attack a specific function. Modern operating systems generally include this mechanism to defend against attacks that rely on known addresses, such as return-to-libc attacks. In short, before loading the binary's segments, the kernel initializes a random value called the ASLR slide (the random offset of the memory layout), abbreviated as S. The kernel then adds the offset S to the addresses A, B, and C in the load commands generated by the linker. So when the kernel executes the load commands, it does not load the segments at the original linker addresses A, B, and C, but at A+S, B+S, and C+S. We call these actual addresses the load address, abbreviated as L from here on.
With ASLR understood, the difference between the linker address and the load address is clear: it is the ASLR slide. This gives the formula ASLR Slide = Load Address - Linker Address, or simply S = L - A.
How to get the actual linker address and load address
As mentioned earlier, otool shows the load commands of a binary, from which we get the linker address (which can also be regarded as the file address). The runtime load address can be obtained from the Binary Images list in a crash log, from the stack provided by Instruments, or with the vmmap command line tool; the use of vmmap is explained later.
Calculate the ASLR slide
In practice, we need to know the ASLR slide before we can subtract it from the memory addresses in a crash log or an Instruments stack to get file addresses, so we have to work it out first. The ASLR slide is usually computed from the load address and the linker address of one particular segment (for example __TEXT). As described above, the crash log tells us that the load address of the __TEXT segment is 0x10045c000, and otool tells us that its linker address is 0x100000000. Subtracting the two gives ASLR Slide = 0x45c000.
With the ASLR slide, we can convert a runtime memory address from the crash log into a file address in the on-disk address space. As shown in the figure below, the file address of the crashing frame in our demo is 0x100003b70. With the file address we can look up the source code, which is discussed later. First, let's look at some other ways to work out the ASLR slide.
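The two subtractions described above can be sketched in a few lines of Python. The load address and linker address are the values quoted in this section; the crash address 0x10045fb70 is the one the article uses later when it processes the unsymbolicated crash log. The function names are made up for illustration.

```python
def aslr_slide(load_address, linker_address):
    """S = L - A: the random offset the kernel added when loading the segment."""
    return load_address - linker_address


def file_address(runtime_address, slide):
    """Undo the slide to get back to the address the linker assigned on disk."""
    return runtime_address - slide


TEXT_LOAD_ADDRESS = 0x10045C000    # __TEXT load address from the crash log
TEXT_LINKER_ADDRESS = 0x100000000  # __TEXT vmaddr reported by otool -l
CRASH_ADDRESS = 0x10045FB70        # runtime address of the crashing frame

slide = aslr_slide(TEXT_LOAD_ADDRESS, TEXT_LINKER_ADDRESS)
print(hex(slide))                                # 0x45c000
print(hex(file_address(CRASH_ADDRESS, slide)))   # 0x100003b70
```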
As shown in the figure below, otool can also be used to view the instruction at which the crash occurred. The -tV flags print the disassembly, and -arch arm64 lets otool handle a Universal 2 build correctly by selecting the right architecture. The output at the file address above shows a brk instruction; in assembly, brk generally means that the App hit an exception or another problem.
The atos command line tool can also work out the ASLR slide for us. Its -o flag names the binary (or dSYM) from which the segment's file address is read, and its -l flag supplies the load address.
Besides atos and otool, the vmmap command line tool can also give us the load address, and we can use it to verify the calculation above. vmmap prints the load address of the __TEXT segment of the crashed process, here 0x104d14000; with the previous formula and the linker address 0x100000000, the ASLR slide is 0x4d14000. Subtracting this slide from the runtime address in the crash log gives the file address 0x100003b70, the same file address we calculated before.
These two different runs, with different crash logs and different ASLR slides, yield the same file address, and this is no coincidence. The kernel picks a different ASLR slide on each launch, so the memory addresses in crash logs vary across devices and across runs, but the linker address stays the same. As a result, although the memory address changes every time, we can always locate the same file address. So far, we have found a mechanism that lets us determine, from random runtime memory addresses, what happened in our App at the source level; this mapping lets us trace runtime stack information back into the App's source.
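To illustrate why different launches lead back to the same file address, the following Python sketch simulates several launches with random slides and recovers the file address each time. The page size, the slide range, and the helper names are assumptions chosen for the simulation, not values used by any real kernel.

```python
import random

PAGE_SIZE = 0x4000  # assumed 16 KiB pages, so every slide is page aligned


def simulate_launch(linker_text_address, file_addr, rng):
    """Pretend the kernel picked a random slide for this launch."""
    slide = rng.randrange(0, 0x10000) * PAGE_SIZE  # hypothetical slide range
    load_address = linker_text_address + slide     # what a crash log would list
    runtime_address = file_addr + slide            # what the stack frame shows
    return load_address, runtime_address


def recover_file_address(load_address, runtime_address, linker_text_address):
    """Derive the slide from the load address, then undo it."""
    slide = load_address - linker_text_address
    return runtime_address - slide


rng = random.Random(0)
LINKER_TEXT_ADDRESS = 0x100000000
CRASH_FILE_ADDRESS = 0x100003B70

recovered = set()
for _ in range(5):
    load, runtime = simulate_launch(LINKER_TEXT_ADDRESS, CRASH_FILE_ADDRESS, rng)
    recovered.add(recover_file_address(load, runtime, LINKER_TEXT_ADDRESS))

print([hex(address) for address in recovered])  # a single address for all launches
```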
Summary – Trace back from memory address to file address
This is the first of the two symbolication steps: tracing back from the memory address to the file. The content and tools of this step are summarized below:
- Apps and libraries are Mach-O binaries. The Mach-O header contains segment information and load commands; the segments are created by the linker, and their address information is the linker address
- otool -l prints the address and attribute information of the segments in a Mach-O file, including the linker address
- The Binary Images list in a crash log gives the load address at the time of the crash
- vmmap can also report the load address of a running App
- ASLR Slide + Linker Address = Load Address
Step 2 – Analyze the debug information
With that in place, we can move on to the second step of symbolication: analyzing debug information. Debug information generally records the relationship between file addresses and source code. Xcode generates it at build time and either stores it in a dSYM file or embeds it directly in the built binary.
There are three types of debug information, each providing a different level of detail associated with a file address:
- Function starts
- Nlist symbol table
- DWARF
The following figure shows the level of debug information that each of the three provides.
Function Starts
As the figure above shows, function starts offer the least of the three: only the starting addresses of functions. Specifically, function starts give the address at which each function begins. They do not tell us which function is which, only that some function exists and starts at that address.
Function starts are encoded as a list of linker addresses in the __LINKEDIT segment. They are embedded directly in the App's build product and are described by the LC_FUNCTION_STARTS load command of the Mach-O file.
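The article does not go into how that address list is encoded. As far as I know, the LC_FUNCTION_STARTS payload stores each address as a ULEB128-encoded delta from the previous one, with the first delta relative to the start of the __TEXT segment; treat that as an assumption of this sketch. The Python below encodes and decodes such a list. Only 0x100003a68 is an address from the article; the other two are hypothetical.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 (7 bits per byte, low bits first)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data, offset):
    """Decode one ULEB128 value; return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, offset
        shift += 7


def decode_function_starts(payload, text_vmaddr):
    """Turn a run of ULEB128 deltas into absolute file addresses."""
    addresses = []
    address = text_vmaddr
    offset = 0
    while offset < len(payload):
        delta, offset = decode_uleb128(payload, offset)
        if delta == 0:  # a zero delta is treated as the end of the list
            break
        address += delta
        addresses.append(address)
    return addresses


TEXT_VMADDR = 0x100000000
starts = [0x100003A68, 0x100003C00, 0x100003D40]  # last two are hypothetical

payload = b""
previous = TEXT_VMADDR
for start in starts:
    payload += encode_uleb128(start - previous)
    previous = start

decoded = decode_function_starts(payload, TEXT_VMADDR)
print([hex(address) for address in decoded])
```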
In practice, the symbols command line tool with -onlyFuncStartsData prints the function starts, as shown in the figure below. Because function starts carry no function names, null is used as a placeholder for each name.
Using function starts, we can process an unsymbolicated crash log. Subtracting the ASLR slide we calculated, 0x45c000, from the crash log's memory address 0x10045fb70 gives the file address 0x100003b70. Comparing it with the function starts output, we find that only the first address, 0x100003a68, is less than our file address 0x100003b70, so only the function starting there can contain the address where the error occurred. The difference between the two addresses is 0x108, or 264 in decimal; that is, the error occurred at an offset of 264 bytes from the start of that function.
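The lookup just described, finding the largest function start that does not exceed the file address, is a sorted-list search. Here is a minimal Python sketch; only the first start address comes from the article, and the later ones are hypothetical stand-ins for the rest of the symbols output.

```python
from bisect import bisect_right


def containing_function(function_starts, file_addr):
    """Return (start address, byte offset) of the function containing file_addr.

    function_starts must be sorted in ascending order.
    """
    index = bisect_right(function_starts, file_addr) - 1
    if index < 0:
        return None  # the address lies before the first function
    start = function_starts[index]
    return start, file_addr - start


# 0x100003a68 is from the article; the other entries are hypothetical.
FUNCTION_STARTS = [0x100003A68, 0x100003C00, 0x100003D40]

start, offset = containing_function(FUNCTION_STARTS, 0x100003B70)
print(hex(start), hex(offset), offset)  # 0x100003a68 0x108 264
```

Note that this sketch attributes every address after the last start to the last function; a real tool would also bound the search by the end of the __TEXT segment.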
Function starts therefore let us inspect, at the machine code level, how the crashing function is set up and which registers it modifies. But because function starts carry no function names, we can only analyze the crash at that low level. This is useful when debugging and developing an App, but we need other tools for a better analysis.
Nlist symbol table
nlist is a struct, whose layout is shown in the figure below. Like function starts, the nlist symbol table is encoded in the __LINKEDIT segment, and it has its own load command. Unlike function starts, an nlist entry encodes more than an address: as the figure shows, the struct contains a name and several other attributes, and the kind of entry is determined by its n_type field.
There are three kinds of n_type that matter for symbolication, and here we focus on two of them. The first is the direct symbol. Direct symbols are the methods and functions that are fully defined within an App or library; a direct symbol stores the function's name and file address in an nlist_64 struct.
Nlist direct symbol
The kind of an nlist entry is indicated by specific bits of n_type. Specifically, when the second, third, and fourth lowest bits of n_type are all 1, the entry is a direct symbol; this combination of three bits is also called N_SECT.
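As a small illustration of that bit test, the Python sketch below checks whether an n_type byte marks a direct symbol. The mask values are the ones I know from the Mach-O nlist.h header (N_STAB 0xe0, N_TYPE 0x0e, N_EXT 0x01, N_SECT 0x0e); the function name and the sample bytes are made up for the example.

```python
N_STAB = 0xE0  # if any of these bits is set, the entry is a debugging symbol
N_TYPE = 0x0E  # mask for the type bits (second to fourth lowest bits)
N_EXT = 0x01   # external symbol bit (lowest bit)
N_SECT = 0x0E  # type value: the symbol is defined in a section of this binary


def is_direct_symbol(n_type):
    """True when the three type bits are all set and it is not a debugging entry."""
    if n_type & N_STAB:
        return False
    return (n_type & N_TYPE) == N_SECT


# Sample bytes: defined+external, defined+local, undefined+external, debugging
for value in (0x0F, 0x0E, 0x01, 0x24):
    print(f"{value:#04x} {value:08b} direct={is_direct_symbol(value)}")
```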
We can view N_SECT symbols with nm --defined-only --numeric-sort. Here nm walks the symbols of the MagicNumbers App and lists them in address order; see the output in the figure below. Note that we also pass the output through xcrun swift-demangle to recover readable function names from Swift's mangled names.
As the figure above shows, the output already gives us the function name numberChoices(), the module name MagicNumbers, and main, because these are all defined directly in the App. The symbols command line tool is similar to nm for viewing direct symbols: it can also print nlist data, and it demangles automatically, as shown in the following figure.
Both approaches let us go from a memory address in the crash log to a specific function name in the source, which makes the symbolicated crash log considerably richer. We can now match the function start and offset obtained from function starts against the direct symbols to find a named function: putting this together, the crash occurred 264 bytes past the start of main. But main is not the only function involved in the crash, so there is more to discover. For example, we still have no file names or line numbers, and while some frames are now symbolicated, parts of the stack and crash log are not.
We run into a similar situation in Instruments stacks, where some function names are symbolicated and readable while others remain memory addresses. The reason is that the direct symbol table is limited to the functions that take part in linking; it does not cover everything in binaries loaded at runtime, such as dynamic libraries. The unsymbolicated frames are calls into dynamic libraries across module boundaries, and we need other means to symbolicate them.
This restriction keeps the build product small; it would make little sense to store every function in the symbol table when packaging. For frameworks and libraries, the table needs to record the functions that are called by clients and strip the unused ones. Of course, if the functions of the main executable are stripped from the direct symbol table, almost nothing is left in it.
Effects of Xcode build settings on nlist direct symbols
Xcode's build settings include three strip-related options: Strip Linked Product, Strip Style, and Strip Swift Symbols. These settings control how redundant symbol table entries are stripped when the App is linked. Specifically, when Strip Linked Product is YES, the binary's symbol table is stripped according to the value of Strip Style. With All Symbols, the most aggressive strategy is applied and the final symbol table contains only the most essential symbols. Non-Global Symbols strips the direct symbols that are only shared between modules inside the App, but keeps the symbols that are exported for use by other Apps. Debugging Symbols removes a third kind of nlist entry, which is discussed later together with DWARF, but keeps the direct symbols.
For example, here is a framework that defines two public interfaces and one internal shared implementation function. Since all of these functions take part in linking, they all have direct symbol entries.
If we strip with Non-Global Symbols, only the two interfaces remain: the shared implementation function is used only inside the framework, so it is not global and is not kept in the symbol table. Similarly, with All Symbols, the two interfaces are still kept as long as they are called from outside the framework.
symbols -onlyNListData prints the direct symbols with the function starts entries interleaved between them; the output also indicates whether a function exists in the direct symbol table or has been stripped. You can use the strip settings to get the symbol table visibility you need. This also tells us when direct symbols are in play: in practice, a symbolicated frame sometimes has a function name but no file name or line number, or consists of a function name plus an offset from the function's start, as in this example of the symbols command run on the framework.
Indirect symbols
The second kind is the indirect symbol. As with direct symbols, it is identified by n_type: its lowest bit, N_EXT, is 1. The command nm -m -arch arm64 --undefined-only --numeric-sort MagicNumbers prints the indirect symbols. Here --undefined-only replaces --defined-only in order to list indirect symbols, and -m shows which framework or library each function comes from. The output in the figure below shows that the MagicNumbers App depends on a series of basic Swift functions from libswiftCore, such as print().
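To show how a tool like nm can separate the two kinds of entries, here is a minimal Python sketch that unpacks nlist_64 records and splits them into direct (defined) and indirect (undefined, external) symbols. The record layout (n_strx, n_type, n_sect, n_desc, n_value, 16 bytes) follows nlist.h as I understand it. The symbol table is synthetic: _main with address 0x100003a68 mirrors the article, and the imported symbol name is hypothetical.

```python
import struct

NLIST_64 = struct.Struct("<IBBHQ")  # n_strx, n_type, n_sect, n_desc, n_value


def read_symbols(symtab, strtab):
    """Unpack every nlist_64 record and resolve its name in the string table."""
    symbols = []
    for offset in range(0, len(symtab), NLIST_64.size):
        n_strx, n_type, _, _, n_value = NLIST_64.unpack_from(symtab, offset)
        end = strtab.index(b"\0", n_strx)
        symbols.append((strtab[n_strx:end].decode(), n_type, n_value))
    return symbols


def split_symbols(symbols):
    """Return (direct symbols sorted by address, names of indirect symbols)."""
    direct, indirect = [], []
    for name, n_type, n_value in symbols:
        if n_type & 0xE0:                # debugging entry, ignored here
            continue
        if (n_type & 0x0E) == 0x0E:      # N_SECT: defined in this binary
            direct.append((n_value, name))
        elif (n_type & 0x0E) == 0 and (n_type & 0x01):  # undefined + external
            indirect.append(name)
    return sorted(direct), indirect


# Build a synthetic string table and symbol table (illustrative data only).
names = ["_main", "_hypothetical_imported_function"]
strtab = b"\0"
string_offsets = {}
for symbol_name in names:
    string_offsets[symbol_name] = len(strtab)
    strtab += symbol_name.encode() + b"\0"

symtab = (
    NLIST_64.pack(string_offsets["_main"], 0x0F, 1, 0, 0x100003A68)
    + NLIST_64.pack(string_offsets["_hypothetical_imported_function"], 0x01, 0, 0, 0)
)

direct, indirect = split_symbols(read_symbols(symtab, strtab))
print(direct, indirect)
```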
Summary – Function starts and nlist symbol table
At the start of this part we set out to discuss three kinds of debug information: function starts, nlist, and DWARF. The first two have now been covered, so let's review them:
- Function starts provide a list of addresses without function names; they help compute the offset of the crashing file address within its function
- The nlist symbol table stores details associated with an address in a struct. nlist symbols provide function names and describe both direct symbols defined within the App and indirect symbols provided by libraries. The direct symbol table usually retains the functions involved in linking, and the Strip Style build setting in the Xcode project affects its contents
- Both are embedded directly in the App's Mach-O binary, in the __LINKEDIT segment
DWARF
So far we have not seen symbol information such as file names and line numbers, including the line of the crash. This information comes from DWARF, which we discuss in detail here. Whereas the nlist symbol table keeps only part of the information about functions, DWARF records almost all of a function's context. To recap, function starts provide one dimension, addresses; nlist, encoded as nlist_64 structs, raises this to two dimensions, addresses and function names; DWARF adds a third dimension, relationships. In real projects functions do not exist in isolation: they are called by other functions, they call other functions, and they have parameters and return values. By recording this context, DWARF unlocks the most complete form of symbolication.
When we talk about analyzing DWARF, we generally mean analyzing a dSYM bundle. Besides metadata such as a plist, the bundle contains a DWARF binary, in which the DWARF information is recorded in the __DWARF segment. Three data streams in this segment deserve our attention: debug_info, debug_abbrev, and debug_line. debug_info contains the raw data, debug_abbrev gives the raw data its structure, and debug_line contains file names and line numbers. DWARF also defines two vocabulary terms we need to discuss: the compile unit and the subprogram. A third term, the inlined subroutine, comes up later.
Compile unit
A compile unit represents a single source file compiled in the project; specifically, every Swift file in the project has a corresponding compile unit. DWARF assigns attributes to each compile unit, such as the file name, the module name, and the portion of the __TEXT segment that its functions occupy. The compile unit for main.swift stores these attributes in the debug_info stream, as shown on the left. Correspondingly, the debug_abbrev stream contains a matching entry that tells us what the values mean, as shown on the right: we see the file name, the language, and a low/high pair that denotes a range of the __TEXT segment.
Subprogram
A subprogram represents a defined function. We have already found defined functions in the nlist symbol table, but subprograms can also describe static and local functions. A subprogram also has a name and a corresponding address range in the __TEXT segment.
The DWARF relationship tree
The basic relationship between a compile unit and a subprogram is that a subprogram is defined in a compile unit. DWARF expresses this relationship as a tree: the compile unit is the root node, the subprograms are its child nodes, and the child nodes can be searched by their address ranges.
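The address-range search over this tree can be sketched as follows. This is a simplified in-memory model, not a DWARF parser: the class names are made up, and every address range except the start of main (0x100003a68) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Subprogram:
    name: str
    low_pc: int
    high_pc: int  # exclusive end of the address range


@dataclass
class CompileUnit:
    file_name: str
    low_pc: int
    high_pc: int  # exclusive end of the address range
    subprograms: list = field(default_factory=list)


def find_subprogram(units, address):
    """Pick the compile unit by range, then the subprogram inside it."""
    for unit in units:
        if unit.low_pc <= address < unit.high_pc:
            for subprogram in unit.subprograms:
                if subprogram.low_pc <= address < subprogram.high_pc:
                    return unit, subprogram
    return None


units = [
    CompileUnit("main.swift", 0x100003A68, 0x100003E00, [
        Subprogram("main", 0x100003A68, 0x100003C00),
        Subprogram("hypotheticalHelper()", 0x100003C00, 0x100003E00),
    ]),
]

unit, subprogram = find_subprogram(units, 0x100003B70)
print(unit.file_name, subprogram.name)  # main.swift main
```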
We can verify this with the dwarfdump command line tool. First, we look at a compile unit, which carries the same attributes as before (file name, language, and so on). dwarfdump combines the contents of debug_info and debug_abbrev to show the data structure and contents of a dSYM file.
The output is long. Further down we find a subprogram; its address range lies within the address range of the compile unit, and its function name is visible. As mentioned earlier, DWARF describes symbols and their relationships in great detail. We will not dig much deeper into DWARF, but knowing these details helps us understand the logic behind symbolication.
Further down still, the output also contains parameter information: DWARF has its own vocabulary for describing the names and types of parameters, and a parameter is a child node of its subprogram. The figure below shows the information for the choices parameter of the numberofChoice function.
File name and line number information
In addition, the debug_line stream stores the file names and line numbers associated with functions. The debug_line stream is not a tree, however; instead, it defines a line table program that maps file addresses in the linked binary to specific lines of the source files. We can use this line table program to find the source file and line number associated with a file address.
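The real debug_line stream is a compact program that, when executed, produces rows of (address, file, line). The Python sketch below skips that decoding step and only shows the final lookup over already-expanded rows. All rows are hypothetical; they are merely arranged so that the demo's crash address falls inside one of them.

```python
from bisect import bisect_right

# Hypothetical, already-expanded line table rows: (file address, file, line).
# Each row covers the addresses from its own address up to the next row's.
LINE_TABLE = [
    (0x100003A68, "main.swift", 30),
    (0x100003B00, "main.swift", 36),
    (0x100003B60, "main.swift", 22),
    (0x100003C00, "main.swift", 40),
]


def lookup_line(table, address):
    """Return (file, line) of the last row whose address is <= the given address."""
    addresses = [row[0] for row in table]
    index = bisect_right(addresses, address) - 1
    if index < 0:
        return None  # the address lies before the first row
    _, file_name, line = table[index]
    return file_name, line


print(lookup_line(LINE_TABLE, 0x100003B70))  # ('main.swift', 22)
```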
To sum up, combining the tree from debug_info with the line table program from debug_line gives us the following structure. By walking the tree we can find the file address we want: start at the compile unit, iterate through its children, pick the child node whose range contains the address, and look up its debug_line entry.
DWARF and compile-time function inlining
We can do this with the atos command line tool. This time we first omit the -i flag, and the output is short: just the function name, file name, and line number. Since the result includes a line number, we can conclude that DWARF is being used for symbolication; but apart from the file name and line number, the output is not much different from what the nlist symbol table gave us. Then we add the -i flag to atos, which produces the second output below. Compare the two; the commands differ only by -i:
atos -o MagicNumbers.dSYM/Contents/Resources/DWARF/MagicNumbers -arch arm64 -l 0x10045c000 0x10045fb70
atos -o MagicNumbers.dSYM/Contents/Resources/DWARF/MagicNumbers -arch arm64 -l 0x10045c000 -i 0x10045fb70
You may have guessed what -i means. In atos, -i stands for inlined functions. Inlining is a common optimization performed by the compiler: at compile time, the body of a function directly replaces the call to that function. This substitution makes both the call and the function itself seem to "disappear" from the compiled code. In our demo, the implementation of numberOfChoice() replaces the code that calls it, so the call to numberOfChoice() is gone.
Inlined subroutines
DWARF uses inlined subroutines to represent this compile-time inlining; this is the third vocabulary term we set out to discuss. An inlined subroutine is a kind of subprogram, so it is also a function: one that has been inlined into another subprogram. In the DWARF relationship tree, an inlined subroutine is therefore a child node of a subprogram. The definition is recursive: an inlined subroutine can have other inlined subroutines as child nodes.
Using dwarfdump again, we can inspect the inlined subroutines in DWARF. They are listed as children of other nodes and have attributes similar to those of subprograms, such as a name and addresses. In a DWARF file, however, these attributes are usually reached through a shared node called the abstract origin: if a function has many inlined copies, its common attributes are stored once in the abstract origin rather than duplicated in every copy. Inlined subroutines also have a unique attribute, the call site. It records where in the source the function was actually called, that is, the call that the optimizer replaced. For example, generateANumber() is called on line 36 of main.swift, so a child node is added to the tree to record that call.
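The recursive structure described above can be walked to recover the full chain of inlined calls for one address, which is roughly what asking atos for inlined frames yields. The Python sketch below is a simplified model with made-up class names. The function names and the main.swift line 36 call site come from the article; the nesting, the second call site, and all address ranges other than the start of main are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Node:
    name: str
    low_pc: int
    high_pc: int                                  # exclusive end of the range
    call_site: Optional[Tuple[str, int]] = None   # (file, line), inlined nodes only
    children: list = field(default_factory=list)


def inline_chain(node, address):
    """Return frames from the innermost inlined function out to the subprogram."""
    if not (node.low_pc <= address < node.high_pc):
        return []
    for child in node.children:
        chain = inline_chain(child, address)
        if chain:
            return chain + [node]
    return [node]


inner = Node("numberOfChoice()", 0x100003B40, 0x100003B80,
             call_site=("main.swift", 20))        # hypothetical call site
outer = Node("generateANumber()", 0x100003B00, 0x100003B90,
             call_site=("main.swift", 36), children=[inner])
main = Node("main", 0x100003A68, 0x100003C00, children=[outer])

for frame in inline_chain(main, 0x100003B70):
    print(frame.name, frame.call_site)
```

Without the inlined child nodes, the same lookup would report only main, which matches the shorter atos output seen without -i.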
We now have a more complete picture of DWARF-based symbolication and, as shown in the figure below, a broader view of the App's call logic. Understanding how inlining works is the key to fully symbolicating a crash log. The -i flag asks atos to take inlined functions into account during symbolication. Information about inlined functions was also what was missing from the Instruments stack. For both the crash log and the Instruments stack we relied on the dSYM file, because the dSYM contains exactly the three kinds of information discussed above: compile units, subprograms, and the DWARF relationship tree.
Get DWARF from libraries and object files
Besides dSYM files, DWARF can also be found in static libraries and object files. This means that even without a dSYM file, you can still obtain DWARF for functions linked from static libraries or object files. In this case you will find nlist entries of the debugging symbol kind, one of the kinds that strip may remove. These nlist entries do not contain DWARF themselves; instead, they associate functions with the files they came from. If a library was built with debug information, these nlist entries can lead us to its DWARF.
nlist entries of this type can be printed and inspected in detail with the dsymutil -dump-debug-map command. The output lists the individual functions and where each one came from. These locations can then be scanned and processed to retrieve the required DWARF information.
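As a rough illustration of how such a debug map can be used, the sketch below models entries that tie functions to the object file they came from and looks up the object file that covers a given address. The `DebugMapSymbol` and `DebugMapObject` types, the helper name, the paths, the symbol names, and the addresses are all hypothetical; they do not reproduce the exact output format of dsymutil.

```swift
// Hypothetical model of a debug map: each object file lists the functions it contributed.
struct DebugMapSymbol {
    let name: String
    let binaryAddress: UInt64   // address of the function in the linked binary
    let size: UInt64            // size of the function in bytes
}

struct DebugMapObject {
    let objectFile: String      // the object file (or library member) holding the DWARF
    let symbols: [DebugMapSymbol]
}

/// Finds the object file and function that cover `address`, if any.
func objectFile(containing address: UInt64,
                in map: [DebugMapObject]) -> (file: String, symbol: String)? {
    for object in map {
        for symbol in object.symbols
        where address >= symbol.binaryAddress
            && address < symbol.binaryAddress + symbol.size {
            return (object.objectFile, symbol.name)
        }
    }
    return nil
}

// Made-up entries for illustration only.
let debugMap = [
    DebugMapObject(objectFile: "/tmp/build/main.o", symbols: [
        DebugMapSymbol(name: "_main", binaryAddress: 0x100003e00, size: 0x80)
    ]),
    DebugMapObject(objectFile: "/tmp/build/Numbers.o", symbols: [
        DebugMapSymbol(name: "_numberChoices", binaryAddress: 0x100003f00, size: 0x40)
    ])
]

if let hit = objectFile(containing: 0x100003f20, in: debugMap) {
    print("\(hit.symbol) comes from \(hit.file)")
}
```

Once the object file is known, its DWARF can be read to recover file names, line numbers, and inlining details for that function.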
Summary – DWARF
- DWARF is an important source of in-depth symbolication data;
- DWARF describes the important relationships between functions and files;
- DWARF properly handles compile-time inlining optimizations;
- Both dSYM files and static libraries can contain DWARF;
- In practice, obtaining DWARF from a dSYM is recommended, because a dSYM can easily be used with other tools and many of Xcode's built-in tools support it.
Development tools and symbolication in practice
Xcode build setting: Debug Information Format
- For the local development build configuration, building with plain DWARF is recommended
- For the release build configuration, make sure the build produces a dSYM file containing DWARF (DWARF with dSYM File)
- For an App submitted to App Store Connect, you can download its dSYM from there
- Even if the App uses bitcode, you can still download the dSYM file from App Store Connect
Finding and verifying the dSYM file
As shown below, on your local Mac you can use the mdfind command-line tool to look for a dSYM file. The alphanumeric string is the UUID of the compiled binary, a unique identifier that is also recorded in one of the binary's load commands. You can also use symbols -uuid to view the UUID of a dSYM file.
In rare cases the build process produces invalid DWARF. You can check the validity of the DWARF with the dwarfdump -verify command. If this check reports any errors, file a bug under Developer Tools at feedbackassistant.apple.com.
DWARF is limited to 4 GB per binary. If the verification above reports errors about exceeding 4 GB, consider splitting the project into components so that each component has its own, smaller dSYM.
In practice, a dSYM is matched to the binary image in a crash log by comparing their UUIDs. Besides reading the UUID of the App's binary image from the crash log, you can also obtain it with the symbols command-line tool, as shown in the figure below. For symbolication to work, the UUID of the dSYM must match the UUID in the crash log.
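The UUID comparison above can be sketched as follows. This is a minimal illustration that assumes the two UUIDs may be printed in different styles (upper or lower case, with or without hyphens or surrounding brackets); the function names and the UUID values are made up for the example.

```swift
/// Keeps only the hexadecimal digits of a UUID string and uppercases them,
/// so that differently formatted copies of the same UUID compare equal.
func normalizedUUID(_ raw: String) -> String {
    return raw.filter { $0.isHexDigit }.uppercased()
}

/// Returns true when both strings describe the same 128-bit UUID (32 hex digits).
func uuidsMatch(dsym: String, crashLog: String) -> Bool {
    let lhs = normalizedUUID(dsym)
    let rhs = normalizedUUID(crashLog)
    return lhs.count == 32 && lhs == rhs
}

// Hypothetical values, not taken from a real build.
let dsymUUID = "B1D2F3A4-0000-3C1E-9F5A-1234567890AB"
let crashLogUUID = "<b1d2f3a400003c1e9f5a1234567890ab>"

print(uuidsMatch(dsym: dsymUUID, crashLog: crashLogUUID))  // true
```

If the check fails, the dSYM belongs to a different build of the binary, and symbolicating with it would give misleading results.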
Other symbolication details
The symbols command-line tool can also help you check which debug information is available in your App's build product. The tags in square brackets in its output indicate the source of the debug information. Use this command when you are not sure which debug information is available to you.
If you are sure a dSYM file is available but the stacks in Instruments are still not symbolicated, check the project's entitlements and code signing configuration. Specifically, use the codesign command-line tool to verify that the App has the correct code signing configuration.
You should also check that the entitlements of your local development build contain get-task-allow. This entitlement grants tools such as Instruments permission to symbolicate the App being debugged. Normally Xcode sets get-task-allow automatically, but if Instruments cannot symbolicate, check it. If the entitlements do not contain get-task-allow, make sure that Build Settings -> Code Signing -> Code Signing Inject Base Entitlements is enabled to solve the problem.
Finally, for Universal 2 Apps, you can specify an architecture when using the command-line tools mentioned in this article. For example, symbols, otool, and dwarfdump all accept an -arch option, so that they operate only on the specified architecture.
Conclusion
As the “Beyond the Basics” in its title suggests, this Session goes deeper into symbolication, and it can be summed up with the following key points:
- The UUID and file address are a consistent and reliable way to identify App problems at runtime, because they are not affected by the ASLR slide; obtaining them is the key first step in symbolicating runtime information
- In practice, use a dSYM for symbolication whenever possible; a dSYM records the most detailed debug information in DWARF form and is well supported by Xcode and Instruments
- Several command-line tools help with symbolication: otool, vmmap, nm, symbols, dwarfdump, and atos. These tools are included in the Xcode Command Line Tools and provide powerful diagnostics and a way to inspect the symbolication process in detail. If necessary, you can integrate them into your own workflow.
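The first key point can be made concrete with a little arithmetic: the slide is the difference between where a binary was actually loaded and where it was linked to load, and subtracting it from a runtime address gives the file address. The sketch below assumes this relationship; the `BinaryImage` type, its field names, and all values are hypothetical.

```swift
// Hypothetical description of one loaded binary image.
struct BinaryImage {
    let uuid: String            // identifies exactly which build this is
    let linkedAddress: UInt64   // address the binary was linked to load at
    let loadAddress: UInt64     // address it was actually loaded at in this run

    /// The ASLR slide: how far the image was shifted in this run.
    var slide: UInt64 { loadAddress - linkedAddress }
}

/// Converts a runtime address into a file address by removing the slide.
func fileAddress(forRuntimeAddress runtime: UInt64, in image: BinaryImage) -> UInt64 {
    return runtime - image.slide
}

// Made-up values: the slide changes between launches, the file address does not.
let firstLaunch = BinaryImage(uuid: "B1D2F3A4-0000-3C1E-9F5A-1234567890AB",
                              linkedAddress: 0x100000000,
                              loadAddress: 0x104f10000)
let secondLaunch = BinaryImage(uuid: "B1D2F3A4-0000-3C1E-9F5A-1234567890AB",
                               linkedAddress: 0x100000000,
                               loadAddress: 0x102a40000)

let a = fileAddress(forRuntimeAddress: 0x104f13a50, in: firstLaunch)
let b = fileAddress(forRuntimeAddress: 0x102a43a50, in: secondLaunch)
print(String(a, radix: 16), String(b, radix: 16))  // 100003a50 100003a50
```

Two launches with different slides produce different runtime addresses for the same instruction, yet both map back to the same file address, which is why the UUID plus the file address identify a location reliably.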
If you are interested in learning more about linking and symbolication, I recommend two earlier WWDC sessions: Optimizing App Startup Time (WWDC16) and App Startup Time: Past, Present, and Future (WWDC17).
Support the author
I recommend the WWDC21 Insider column, which contains 102 articles about WWDC21 and is the source of this article. If you are interested in the rest of the content, please follow the link to read more.
The WWDC Insider series is a collection of high-quality original content led by the Old Driver organization. We have been producing it for several years and it has been well received. Each year we select content from WWDC and invite a group of front-line iOS developers to write about it, combining their own development experience with Apple's documentation and session videos.