In this article, we’ll walk through the compile and link process using examples.
The compile and link process
A program is compiled in stages, mainly the following:
- Preprocessing
- Compilation
- Assembly
- Linking
The whole process is illustrated as follows:
Next, let’s use a very simple C file to illustrate how source code becomes an executable.
From source code to executable
There are many ways to compile source code on macOS, manually or automatically; here we use Clang to demonstrate the process. To keep things simple, we’ll just write a small hello.c program:
#include <stdio.h>
#define MAX_AGE 120
int main() {
    printf("%s\n", "hello world~");
    printf("%d\n", MAX_AGE); // Simple output
    return 0;
}
Normally you can run clang directly on hello.c; it compiles and links the source in one step and produces the executable a.out.
$ clang hello.c   # compile and link
$ ./a.out         # run a.out
hello world~
120
The phases of the whole compilation can be listed with the following command:
$ clang -ccc-print-phases hello.c
+- 0: input, "hello.c", c
+- 1: preprocessor, {0}, cpp-output
+- 2: compiler, {1}, ir
+- 3: backend, {2}, assembler
+- 4: assembler, {3}, object
+- 5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image
Compiling directly with clang and no extra options hides all of these details; today we’ll explore them step by step.
Clang options
Since we’ll be using clang, which supports a large number of options, let’s start with a brief overview. The excerpt below comes from man clang; for the full list of options, see the official documentation: Clang – The Clang C, C++, and Objective-C Compiler
$ man clang
NAME
clang - the Clang C, C++, and Objective-C compiler
DESCRIPTION
clang is a C, C++, and Objective-C compiler which encompasses preprocessing, pars-
ing, optimization, code generation, assembly, and linking. Depending on which
high-level mode setting is passed, Clang will stop before doing a full link. While
Clang is highly integrated, it is important to understand the stages of compila-
tion, to understand how to invoke it. These stages are:
...
to use the static analyzer.
As the man page shows, the process is divided into the following parts:
- Driver
- Preprocessing
- Parsing and Semantic Analysis
- Code Generation and Optimization
- Assembler
- Linker
- Clang Static Analyzer
What we care about here is the OPTIONS section; these options are the key parameters in the demonstrations below.
OPTIONS
Stage Selection Options
-E Run the preprocessor stage.
-fsyntax-only
Run the preprocessor, parser and type checking stages.
-S Run the previous stages as well as LLVM generation and optimization stages
and target-specific code generation, producing an assembly file.
-c Run all of the above, plus the assembler, generating a target ".o" object
file.
no stage selection option
If no stage selection option is specified, all stages above are run, and the
linker is run to combine the results into an executable or shared library.
Other common options
-o <file>  Write output to file (specifies the output file).
-v         Show commands to run and use verbose output.
Let’s get started
The compilation process in depth
1. Preprocessing
The preprocessing stage mainly handles the preprocessor directives in the code, which start with #. For example, it:
- removes comments
- removes #define directives and expands the macros they define
- inserts the contents of each file named by #include at the location of the directive
In short, it replaces macros, removes comments, expands header files, and produces a .i file.
Preprocess the hello.c file above:
$ clang -E hello.c -o hello.i
The resulting hello.i file looks like this. Because the expanded headers make it very long, only an excerpt is shown here:
# 1 "hello.c"
extern int __vsnprintf_chk (char * restrict, size_t, int, size_t, const char * restrict, va_list);
# 408 "/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/stdio.h" 2 3 4
# 2 "hello.c" 2
int main() {
printf("%s\n", "hello world~");
printf("%d\n", 120);
return 0;
}
As you can see, the comment is gone and the MAX_AGE macro we defined has been replaced directly with 120, but otherwise our code is essentially unchanged.
2. Compilation
During the compilation stage, the preprocessed file (hello.i) typically goes through the following steps:
- Lexical analysis
  - Cut the code into tokens, such as parentheses and braces, and record their positions
- Syntax analysis
  - Verify the syntax by combining sequences of tokens into phrases, building an abstract syntax tree (AST)
- Intermediate code (IR) generation
  - Traverse the AST generated in the previous step and translate it into LLVM IR
- Bitcode generation (optional)
  - With bitcode enabled (available since Xcode 7), Apple applies further optimization, and the optimized IR can be emitted as .bc intermediate code
- Assembly code generation
  - Optimize through a series of passes and finally generate assembly code
Once the AST has been built, the compiler front end emits IR, its intermediate code. With the -S option, the .i file can be compiled directly into assembly.
$ clang -fmodules -fsyntax-only -Xclang -ast-dump hello.c # output AST
$ clang -S -emit-llvm hello.c # Generate an intermediate IR file
$ clang -S hello.i -o hello.s # Direct compilation generates assembly
The output assembly code hello.s reads as follows:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 11, 0 sdk_version 11, 3
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl $0, -4(%rbp)
leaq L_.str(%rip), %rdi
leaq L_.str.1(%rip), %rsi
movb $0, %al
callq _printf
leaq L_.str.2(%rip), %rdi
movl $120, %esi
movl %eax, -8(%rbp) ## 4-byte Spill
movb $0, %al
callq _printf
xorl %ecx, %ecx
movl %eax, -12(%rbp) ## 4-byte Spill
movl %ecx, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc
## -- End function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "%s\n"
L_.str.1: ## @.str.1
.asciz "hello world~"
L_.str.2: ## @.str.2
.asciz "%d\n"
.subsections_via_symbols
3. Assembly
In this stage, the assembly code hello.s generated in the previous stage is converted into a platform-specific object file (also called a target file), with the .o extension.
$ clang -c hello.s -o hello.o
The object file has now been produced. Its contents begin as follows:
cffa edfe 0700 0001 0300 0000 0100 0000
0400 0000 0802 0000 0020 0000 0000 0000
1900 0000 8801 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
c000 0000 0000 0000 2802 0000 0000 0000
c000 0000 0000 0000 0700 0000 0700 0000
0400 0000 0000 0000 5f5f 7465 7874 0000
0000 0000 0000 0000 5f5f 5445 5854 0000
As you can see, the contents are binary, shown here in hexadecimal. Does the leading cffa edfe look familiar? 😆 You guessed it: this object file has a specific structure. It is a typical Mach-O file, which you can inspect by dragging it into MachOView.
At this point the object file has been generated. Although it contains machine code, it cannot be executed directly; everything it depends on must first be linked in.
4. Linking
In the linking stage, object files are linked into an executable. Multiple object files are merged into a single executable or dynamic library file; the output is typically a.out for an executable, or a .so file (.dylib on macOS) for a dynamic library.
We usually develop across multiple files or modules and share code through libraries, so different object files may reference each other’s variables or call each other’s functions. The linker is the program that links different object files (.o files) together. For example, we often call methods and use variables from the Foundation and UIKit frameworks, but these frameworks are not in the same object file as our code, so the linker has to link them with our own code.
Recall that hello.c above uses the system library function printf, which cannot be used without linking. Let’s try linking hello.o on its own:
$ ld hello.o
Undefined symbols for architecture x86_64:
"_printf", referenced from:
_main in hello.o
ld: symbol(s) not found for architecture x86_64
# _printf is an external symbol, so we must tell the linker which library defines it
$ ld hello.o /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib/libc.tbd
# The SDK path can be obtained on the command line, which simplifies the command above
$ ld hello.o `xcrun --show-sdk-path`/usr/lib/libc.tbd
# Run the generated a.out file
$ ./a.out
hello world~
120
At this point our object file has been linked into a.out, an executable program. It is also a Mach-O file; drop a.out into MachOView and have a look.
To summarize, the compiler does two important things when it compiles code:
- Compile the source code into assembly
- Categorize and summarize symbols
In the next article we’ll talk about symbols.