Main contents:
- Understand the relationship between C, C++ and OC
- Compiled versus interpreted languages
- The compilers LLVM and Clang
- Understand the iOS compilation process
  - Preprocessing
  - Compilation
  - Assembly
  - Linking
I. Understand the relationship between C, C++ and OC
1. The C language
C is a procedural programming language that can be used for both system software and application software development. C compilers are available on a wide variety of operating systems, for example Microsoft Windows, Mac OS X, Linux and Unix. The design of C influenced many later programming languages, such as C++, Objective-C, Java and C#.
2. The C++ language
- Compatible with the procedural style of C, which it extends and improves;
- As an object-oriented language, it supports encapsulation, multiple inheritance and polymorphism.
3. The Objective-C language
- Extends C with object-oriented capabilities and is a superset of C;
- OC code can contain C and C++ statements: it can call C functions and can access C++ objects through their methods (mixing in C++ requires compiling the file as Objective-C++, typically with the .mm extension);
4. Comparison between OC and C++
OC and C++ both evolved from C into object-oriented languages, and both remain largely compatible with standard C; but they belong to different object-oriented schools:
- The biggest difference between the two: OC provides a dynamic binding mechanism at run time, while C++ binds statically at compile time and obtains run-time polymorphism through classes and virtual functions;
- OC relaxes compile-time checking in exchange for flexibility, while C++ finds more potential errors during compilation so that they are corrected before running, at the cost of flexibility;
Take the following code: a C++ compiler would reject the equivalent assignment between unrelated class pointers, whereas OC accepts it at compile time:
NSString *test =(id) [[NSArray alloc] init];
The differences between OC and C++ in detail are as follows:
- Typing: OC is dynamically typed; it allows methods and classes to be looked up by string name, and classes to be linked and added dynamically;
- Inheritance: OC does not support multiple inheritance; C++ does;
- Function calls: OC implements method calls through message passing, while C++ calls functions directly;
- Interfaces: OC defines interfaces with a Protocol, while C++ defines them with virtual functions;
- Overloading: OC does not allow two methods in the same class to have the same name (even if the argument types differ), but C++ does;
II. Compiled and interpreted languages
Objective-C is a compiled language, which helps keep the iPhone efficient;
1. Compiled languages
- Before the program can run, a compiler must translate it into machine code; the CPU executes the machine code directly, with no further translation at run time;
- Execution efficiency is high, but the program depends on a compiler, the debugging cycle is long, and cross-platform portability is poor;
- Representative languages: C, C++, OC and so on;
2. Interpreted languages
- The program is not compiled before it runs; the code is stored as text and translated by an interpreter as it runs;
- Execution efficiency is low, but the program is dynamic: code can be added and updated to change the program logic at any time, even after the program has started;
- Representative languages: JavaScript, Python and so on;
III. LLVM and Clang compilers
1. The compiler
Concept: a program that converts one programming language (the source language) into another programming language (the target language);
Most compilers have a front end and a back end:
- Front end: responsible for lexical analysis, syntax analysis and generating intermediate code;
- Back end: takes the intermediate code as input, performs architecture-independent optimizations, and then generates machine code for each target architecture;
Supplement:
- With the intermediate code as the medium between them, the front end and back end can be independent of each other;
- The advantage is that adding a new language only requires a new front end, and adding a new CPU architecture only requires a new back end;
2. LLVM and Clang
LLVM is the compiler currently used by Apple:
- LLVM is a free-software compiler infrastructure project written in C++; it contains a series of modular compiler components and a toolchain for developing compiler front ends and back ends;
- LLVM has spawned some powerful sub-projects, such as Clang and LLDB.
Clang, based on LLVM, is a lightweight and highly modular compiler:
- Clang is mainly backed by Apple and supports C, Objective-C and C++;
- Clang replaced the GCC used before Xcode 5, and compilation speed improved about 3 times;
3. Understand the compiler in iOS
- In iOS development, LLVM is usually regarded as the back end of the compiler, while Clang is the front end;
- The two use IR (intermediate code) as the medium between them, which separates the front end from the back end so that each can change independently without affecting the other;
- The front end for the C language family is clang and the front end for Swift is swiftc, but the back end of both is LLVM;
IV. Understand the iOS compilation process
1. Compile flow chart
The LLVM compilation process is quite complex; before iOS code can run it goes through four key stages: preprocessing, compilation, assembly and linking.
[Figure: compile flow chart]
2. Prepare test files
Taking OC as an example, we analyze the compilation process in detail. Prepare a main.m file as follows:
#import <Foundation/Foundation.h>
/// add comment: macro defines Name
#define Name "Wu Yu Bei Chen"
int main(int argc, const char * argv[]) {
NSLog(@"Hello, %s", Name);
return 0;
}
V. Preprocessing
1. Main functions
- Replace macros: expand the macro definitions in the code, such as defined constants and function-like macros;
- Import header files: insert the contents of each file named by #include at the position of the directive;
- Delete all comments: //, /* */ and so on;
- Conditional compilation: process #if, #ifdef, #endif and similar directives;
- Add line numbers and file name markers so that at compile time the compiler can display the line numbers of warnings and errors.
2. View the preprocessing result
Using the xcrun command, perform preprocessing at the terminal:
xcrun clang -E main.m
The terminal display is as follows:
# 1 "main.m"
# 1 "<built-in>" 1
...
# 1 "/Applications/Xcode13.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Foundation.framework/Headers/FoundationLegacySwiftCompatibility.h" 1 3
# 193 "/Applications/Xcode13.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Foundation.framework/Headers/Foundation.h" 2 3
# 2 "main.m" 2
int main(int argc, const char * argv[]) {
NSLog(@"Hello, %s", "Wu Yu Bei Chen");
return 0;
}
Results analysis:
- In the preprocessed file, comments have been removed and macros expanded;
- The preprocessed file is many lines long because the header file (Foundation.h) has been imported, and the import is recursive;
VI. Compilation
1. Lexical Analysis
Main function: the scanner splits the source code into tokens and identifies them (such as parentheses and braces, =, strings);
Using the xcrun command, perform lexical analysis at the terminal:
xcrun clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m
The terminal display is as follows:
annot_module_include '#import <Foundation/Foundation.h>
/' Loc=<main.m:1:1>
int 'int' [StartOfLine] Loc=<main.m:4:1>
identifier 'main' [LeadingSpace]
...
r_brace '}' [StartOfLine] Loc=<main.m:7:1>
eof '' Loc=<main.m:10:1>
Results analysis:
- The location of each token is recorded, which makes it easier to locate errors later;
- For example, Loc=<main.m:4:1> means that the token 'int' starts at line 4, character 1 of the source file main.m;
2. Syntax Analysis
Main function: parse the tokens, verify that the syntax is correct, and finally generate the AST (abstract syntax tree);
Using the xcrun command, view the parsing results:
xcrun clang -fsyntax-only -Xclang -ast-dump main.m | open -f
AST:
- An abstract syntax tree is structurally more concise than the source code and faster to traverse;
- It makes static checking faster, and the IR intermediate code is generated from it;
3. Static Analysis
Main function: traverse and analyze the AST, including type checking and checking that methods are implemented; any errors found are reported;
4. Intermediate Code Generation
Main function: CodeGen traverses the AST from top to bottom and translates it step by step into IR intermediate code;
IR intermediate code:
- This is a language closer to machine code; it allows the compiler to be divided into a front end and a back end, so that each platform can use its own back end to convert the intermediate code into machine code for that platform;
- On iOS, the final product generated from the IR is a Mach-O executable file;
- IR is the output of the front end and the input of the back end;
VII. Assembly
The output of the intermediate code marks the completion of the front-end work; processing then moves on to the back end.
1. LLVM optimizes intermediate code
The IR is handed to the back end, where LLVM optimizes it:
Optimization Level
bitcode
2. Generate assembly code
LLVM optimizes the IR and then generates assembly code for each target architecture.
Purpose of the assembly stage:
- Assemble the code and classify symbols;
- Place externally imported symbols into the relocation symbol table;
- Finally, generate one or more .o object files;
Using the xcrun command, generate the assembly file:
xcrun clang -S main.m -o main.s
Open the .s file; an excerpt follows:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 11.0 sdk_version 11.3
.globl _main ## -- Begin function main
//...
callq _NSLog
//...
.subsections_via_symbols
As you can see, the NSLog call has been converted into a call instruction in the assembly file, namely callq _NSLog;
3. Generate the object file
In this stage, the assembler converts the assembly code into machine code and outputs the object file, that is, the .o file.
Using the xcrun command, generate the object file:
xcrun clang -fmodules -c main.m -o main.o
Use the file command to view the object file type:
% file main.o
main.o: Mach-O 64-bit object x86_64
As you can see, the assembler generates a file in Mach-O format of type object:
- Mach-O is the file format used on the iOS and OS X platforms;
- As a replacement for the a.out format, Mach-O provides better extensibility and faster access to the information in the symbol table.
Using the xcrun command, look at the symbols in main.o:
xcrun nm -nm main.o
The terminal display is as follows:
(undefined) external _NSLog
(undefined) external ___CFConstantStringClassReference
0000000000000000 (__TEXT,__text) external _main
As you can see, the NSLog function used here corresponds to the _NSLog symbol:
- undefined: the symbol _NSLog cannot be found in the current file for the time being;
- external: the symbol is externally accessible; its counterpart is non-external, which marks a symbol that is private to the file;
VIII. Linking
Main functions: symbol resolution, relocation and merging object files, finally generating the executable file;
1. Execute the link using the xcrun command to obtain the executable file
xcrun clang main.o -o main
2. Run the file command to view the file type
% file main
main: Mach-O 64-bit executable x86_64
% ./main
2021-10-01 19:06:41.846 main[5663:660299] Hello, Wu Yu Bei Chen
The file is a Mach-O executable, and running it prints the expected result;
3. Run the xcrun command again to view the symbol table of the executable file
% xcrun nm -nm main
(undefined) external _NSLog (from Foundation)
(undefined) external ___CFConstantStringClassReference (from CoreFoundation)
(undefined) external dyld_stub_binder (from libSystem)
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100003f40 (__TEXT,__text) external _main
0000000100008008 (__DATA,__data) non-external __dyld_private
The _NSLog symbol is still undefined, but it now carries some additional information, namely (from Foundation), which means that the symbol comes from Foundation and will be bound dynamically at run time.
4. Main tasks of the link stage
1. Symbol resolution
Associate each symbol reference with the corresponding symbol definition;
- When the linker links multiple files, it creates a symbol table that records all defined and undefined symbols:
  - If the same symbol is defined more than once, an error is reported: "ld: duplicate symbols";
  - If a symbol cannot be found in any other object file, an error is reported: "Undefined symbols";
- In addition, while working out the call relationships between symbols, the linker can identify functions that are never called and remove them automatically;
2. Relocation
Associate symbol definitions, such as variable names and function names, with memory locations;
- Only after this binding does the machine know which memory address it needs to operate on;
- Otherwise, we would have to specify the memory address for each instruction when writing the code, which is both tedious and error-prone;
3. Merge the object files
The .o object files generated by compiling multiple .m files, together with other library files (such as .dylib, .a and .tbd), are combined into one executable file in Mach-O format.
- A project usually consists of multiple files, and the variables and interface functions in different files depend on each other;
- Before the program runs, the linker must bind symbols to addresses across the files to ensure that variables and interfaces can be used correctly throughout the program.
5. Understand static and dynamic linking
Static linking: performed at build time. The linked file may still contain "undefined" symbols; these symbols are recorded and resolved at run time through dynamic linking, for example via dlopen and dlsym;
Dynamic linking: performed at run time. The advantage is that shared libraries such as UIKit do not have to be included in every App package. For example, the UIKit system library we use is linked before the App actually starts to run, and the App runs once linking is complete.