Main Contents:

  1. The relationship between C, C++, and OC
  2. Compiled versus interpreted languages
  3. The LLVM and Clang compilers
  4. The iOS compilation process
  5. Preprocessing
  6. Compilation
  7. Assembly
  8. Linking

I. The relationship between C, C++, and OC

1. The C language
  1. C is a procedural programming language that can be used for both system software and application software development;
  2. C compilers are available on nearly every operating system, for example Microsoft Windows, Mac OS X, Linux, and Unix;
  3. The design of C influenced many later programming languages, such as C++, Objective-C, Java, and C#;
2. The C++ language
  1. It is compatible with C and keeps C's procedural features, while extending and improving on them;
  2. As an object-oriented language, it supports encapsulation, inheritance (including multiple inheritance), and polymorphism;
3. The Objective-C language
  1. It extends C with object-oriented capabilities and is a superset of C;
  2. OC code can contain C and C++ statements: it can call C functions and can also call methods on C++ objects (mixing in C++ requires compiling the file as Objective-C++, typically with a .mm extension);
4. Comparison between OC and C++
  1. Both OC and C++ evolved from C into object-oriented languages and both remain largely compatible with standard C, but they belong to different object-oriented schools;
  2. The biggest difference between the two: OC provides dynamic binding at run time, whereas C++ binds statically at compile time and simulates dynamic behavior through embedded classes and virtual functions;
  3. OC relaxes compile-time checking, which increases flexibility, whereas C++ catches more potential errors at compile time so they can be fixed before the program runs, at the cost of flexibility;

For example, the OC compiler accepts the following line even though it assigns an NSArray to an NSString pointer; an equivalent assignment between unrelated class types is a compile-time error in C++:

NSString *test = (id)[[NSArray alloc] init];

The detailed differences between OC and C++ are as follows:

  1. Typing: OC is dynamically typed; methods and classes can be looked up by string name, and classes can be linked and added dynamically;
  2. Inheritance: OC does not support multiple inheritance, whereas C++ does;
  3. Function calls: OC invokes methods through message passing, whereas C++ calls functions directly;
  4. Interfaces: OC defines interfaces with a Protocol, whereas C++ uses virtual functions;
  5. Overloading: OC does not allow two methods in the same class to have the same name (even if the argument types differ), whereas C++ does;

II. Compiled and interpreted languages

Objective-C is a compiled language, which helps keep iPhone apps efficient;

1. Compiled languages
  1. Before it can run, the program must be turned into machine code by a compiler; the machine code is executed directly by the CPU, with no further translation at run time;
  2. Execution efficiency is high, but the program depends on the compiler, the debugging cycle is long, and cross-platform portability is poor;
  3. Representative languages: C, C++, OC, and so on;
2. Interpreted languages
  1. The program is not compiled ahead of time; its code is stored as text and is translated by an interpreter when it runs;
  2. Execution efficiency is low, but the program is dynamic: code can be added and updated at any time after it starts running to change the program logic;
  3. Representative languages: JavaScript, Python, and so on;

III. The LLVM and Clang compilers

1. The compiler

Concept: a program that converts one programming language (the source language) into another programming language (the target language);

Most compilers have a front end and a back end:

  • Front end: responsible for lexical analysis, syntax analysis, and intermediate code generation;
  • Back end: takes the intermediate code as input, performs architecture-independent optimizations, and then generates machine code for each target architecture;

Supplement:

  1. With intermediate code as the medium, the front end and the back end can be independent of each other;
  2. The advantage is that supporting a new language only requires changing the front end, and supporting a new CPU architecture only requires changing the back end;
2. LLVM and Clang

LLVM is the compiler infrastructure currently used by Apple:

  1. LLVM is a free-software compiler infrastructure project written in C++; it contains a series of modular compiler components and a toolchain for building compiler front ends and back ends;
  2. LLVM has spawned several powerful sub-projects, such as Clang and LLDB.

Clang is a lightweight, highly modular compiler front end built on LLVM;

  1. Clang is mainly backed by Apple and supports C, Objective-C, and C++;
  2. Clang replaced GCC, which Xcode used before version 5, and compiles about 3 times faster;
3. The compiler in iOS development
  1. In iOS development, LLVM is usually regarded as the compiler back end and Clang as the compiler front end;
  2. The two communicate through IR (intermediate code), which separates the front end from the back end so that each can change independently without affecting the other;
  3. The front end for the C language family is clang and the front end for Swift is swiftc, but both use LLVM as the back end;

IV. The iOS compilation process

1. Compilation flow

The LLVM compilation process is fairly complex. Before iOS code can run, it goes through four key stages, in order: preprocessing, compilation, assembly, and linking.

2. Prepare the test file

Taking OC as an example, we analyze the compilation process in detail. Prepare a main.m file as follows:

#import <Foundation/Foundation.h>
/// Comment: the macro below defines Name
#define Name "Wu Yu Bei Chen"
int main(int argc, const char * argv[]) {
    NSLog(@"Hello, %s", Name);
    return 0;
}

V. Preprocessing

1. Main functions
  1. Macro replacement: replace the macros defined in the code, such as constants and function-like macros;
  2. Header import: insert the contents of each file named by #include at the location of the directive;
  3. Comment removal: delete all comments, such as // and /* */;
  4. Conditional compilation: process #if, #ifdef, #endif, and similar directives;
  5. Add line numbers and file name markers so that the compiler can report the line numbers of warnings and errors.
2. View the preprocessing result

Use the xcrun command to run the preprocessor in the terminal:

xcrun clang -E main.m

The terminal display is as follows:

# 1 "main.m"
# 1 "<built-in>" 1

...

# 1 "/Applications/Xcode13.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Foundation.framework/Headers/FoundationLegacySwiftCompatibility.h" 1 3
# 193 "/Applications/Xcode13.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Foundation.framework/Headers/Foundation.h" 2 3
# 2 "main.m" 2


int main(int argc, const char * argv[]) {
 NSLog(@"Hello, %s", "Wu Yu Bei Chen");
 return 0;
}

Analysis of the result:

  1. In the preprocessed file, the comments have been removed and the macro has been replaced by its definition;
  2. The preprocessed file has many lines because the header file (Foundation.h) is expanded in place, and the expansion is recursive;

VI. Compilation

1. Lexical Analysis

Main function: a scanner splits the source code into tokens and identifies each one (such as parentheses, braces, =, and string literals);

Use the xcrun command to run lexical analysis in the terminal:

xcrun clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m

The terminal display is as follows:

annot_module_include '#import <Foundation/Foundation.h>
/'		Loc=<main.m:1:1>
int 'int'	 [StartOfLine]	Loc=<main.m:4:1>
identifier 'main'	 [LeadingSpace]

...

r_brace '}'	 [StartOfLine]	Loc=<main.m:7:1>
eof ''		Loc=<main.m:10:1>

Analysis of the result:

  1. Each token is recorded together with its location, which makes it easier to report errors later;
  2. For example, Loc=<main.m:4:1> means that the token 'int' starts at line 4, column 1 of the source file main.m;
2. Syntax analysis

Main function: parse the tokens, verify that the syntax is correct, and finally generate the AST (abstract syntax tree);

Use the xcrun command to view the parsing result:

xcrun clang -fsyntax-only -Xclang -ast-dump main.m | open -f

The AST:

  1. It is an abstract syntax tree; its structure is more concise than the source code and it is faster to traverse;
  2. It makes static checking faster and is the input from which the IR (intermediate code) is generated;
3. Static analysis

Main function: traverse the AST and analyze it, including type checking and checking that methods are implemented; any errors found are reported;

4. Intermediate code generation

Main function: CodeGen traverses the AST from top to bottom and translates it step by step into IR (intermediate code);

IR (intermediate code):

  1. IR is a language closer to machine code. It lets the compiler be divided into a front end and a back end, so that each platform's back end can convert the same intermediate code into machine code for that platform;
  2. On iOS, the IR is ultimately turned into a Mach-O executable file;
  3. IR is the output of the front end and the input of the back end;

VII. Assembly

Emitting the intermediate code marks the end of the front end's work; processing then moves on to the back end.

1. LLVM optimizes the intermediate code

The IR enters the back end, where LLVM optimizes it:

  1. Optimization Level
  2. bitcode
2. Generate assembly code

After optimizing the IR, LLVM generates assembly code for each target architecture.

Purpose of the assembly stage:

  1. Assemble the code and classify the symbols;
  2. Place externally imported symbols into the relocation symbol table;
  3. Finally, generate one or more .o object files;

Use the xcrun command to generate the assembly file:

xcrun clang -S main.m -o main.s

Open the .s file; an excerpt follows:

	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 11.0	sdk_version 11.3
	.globl	_main                           ## -- Begin function main
    
    // ...

	callq	_NSLog

    // ...
.subsections_via_symbols

As you can see, the NSLog call in the source file has become an assembly instruction, namely callq _NSLog;

3. Generate the object file

In this stage, the assembler converts the assembly code into machine code and outputs the object file, that is, the .o file.

Use the xcrun command to generate the object file:

xcrun clang -fmodules -c main.m -o main.o

Use the file command to view the object file type:

% file main.o
main.o: Mach-O 64-bit object x86_64

As you can see, the assembler generates a Mach-O file of type object, that is, an object file:

  1. Mach-O is the file format used on iOS and macOS;
  2. Mach-O replaced the a.out format; it is more extensible and gives faster access to the information in the symbol table.

Use the xcrun command to view the symbols in main.o:

xcrun nm -nm main.o

The terminal display is as follows:

                 (undefined) external _NSLog
                 (undefined) external ___CFConstantStringClassReference
0000000000000000 (__TEXT,__text) external _main

As you can see, the NSLog function used here corresponds to the _NSLog symbol:

  1. undefined: the symbol _NSLog cannot yet be found in the current file;
  2. external: the symbol is externally accessible; its counterpart, non-external, marks a symbol that is private to the file;

VIII. Linking

Main functions: symbol resolution, relocation, and merging object files to produce the final executable;

1. Use the xcrun command to link and obtain the executable file
xcrun clang main.o -o main
2. Use the file command to view the file type, then run the executable
% file main
main: Mach-O 64-bit executable x86_64
% ./main
2021-10-01 19:06:41.846 main[5663:660299] Hello, Wu Yu Bei Chen

The file is a Mach-O executable, and running it prints the expected result;

3. Use the xcrun command again to view the symbol table of the executable file
% xcrun nm -nm main
                 (undefined) external _NSLog (from Foundation)
                 (undefined) external ___CFConstantStringClassReference (from CoreFoundation)
                 (undefined) external dyld_stub_binder (from libSystem)
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100003f40 (__TEXT,__text) external _main
0000000100008008 (__DATA,__data) non-external __dyld_private

The _NSLog symbol is still undefined, but it now carries additional information, (from Foundation), which means that the symbol comes from Foundation and will be bound dynamically at run time.

4. Main tasks of the link stage

1. Symbol resolution

Associate each symbol reference with the corresponding symbol definition;

  • When the linker links multiple files, it creates a symbol table that records all defined and undefined symbols.
    1. If the same symbol is defined more than once, an error is reported: "ld: duplicate symbols";
    2. If a symbol cannot be found in any of the other object files, an error is reported: "Undefined symbols";
  • In addition, while working out the call relationships between symbols, the linker can identify functions that are never called and strip them automatically;

2. Relocation

Associate each symbol definition, such as a variable name or a function name, with a memory location;

  • Only after this binding does the machine know which memory address it needs to operate on;
  • Otherwise, we would have to specify the memory address for every instruction when writing code, which is both tedious and error-prone;

3. Merging object files

The .o object files generated by compiling multiple .m files are combined with other library files (such as .dylib, .a, and .tbd) into a single executable file in Mach-O format.

  • A project usually consists of multiple files, and the variables and interface functions in different files depend on one another;
  • Before the program runs, the linker must bind the symbols in these files to addresses so that variables and interfaces can be used correctly throughout the program.
5. Static and dynamic linking

Static linking: performed at build time. The linked file may still contain “undefined” symbols, but these symbols are recorded and are resolved dynamically at run time, for example through dlopen and dlsym;

Dynamic linking: performed at run time. Its advantage is that widely shared libraries such as UIKit do not have to be bundled into every app package. For example, the UIKit system library that an app uses is linked before the app actually starts to run, and the app runs once linking is complete.