I originally bought the book “Self-cultivation of Programmers” only because my JD.com order was a little short of the amount needed to use a coupon. For a long time afterwards I assumed the title was just programmers' self-deprecating humor. After reading the first chapter, I realized how wrong I was. I now think that even programmers who do not use C/C++, including those who write interpreted languages such as JS, should take the time to read it. For an iOS developer working in OC or Swift, I consider this book a must-read.

This article gives a brief overview of “Self-cultivation of Programmers”. If you have limited time and want to get through the book quickly, you can start here; the page numbers given point to the detailed treatment in the book. To keep things simple, some details of the original text are omitted so that the reader can grasp the book quickly.

For programmers who do not specialize in C or low-level development, I think reading the whole book is impractical and unnecessary. Two questions in the book matter most for beginners:

  1. How does a piece of source code end up as an executable program?
  2. What does a process look like in memory?

Reading with these two questions in mind will be more rewarding. Below I summarize some of the relevant content and give some background in language as simple as I can manage. Even if you cannot follow all of it, it will help when you read the book itself.

From source to program

A program starts out as source code, that is, a set of .c files. The following steps turn it into an executable program:

  1. Preprocessing (P39): This step is handled by the preprocessor. It expands all #define macros and evaluates the conditional directives such as #if and #endif. It then processes #include directives recursively, replacing each directive with the contents of the included file. A .c file is preprocessed into a .i file.

  2. Compiling (P42): The compiler is responsible for this step, which mainly consists of five parts: lexical analysis, syntax analysis, semantic analysis, intermediate language generation (with optimization), and object code generation:

    • Lexical analysis: Splits the source code into tokens such as parentheses, numbers, and punctuation. Characters that cannot form a valid token are reported at this step.
    • Parsing: This step generates a syntax tree. For example, 2 + 4 becomes a tree whose root node is + and whose left and right leaf nodes are 2 and 4. If you write only 2 +, or a ( without a matching ), an error is reported at this step.
    • Semantic analysis: This step checks type declarations, type matching, and conversions. For example, writing 2 * "3" produces an error here.
    • Intermediate language generation: This step generates platform-independent three-address code. For example, 2 + 3 is written as t1 = 2 + 3. Expressions whose value can be determined at compile time are also optimized here.
    • Object code generation: Based on the three-address code, the compiler generates code that depends on the target machine, that is, assembly language.

    The .i file is compiled into an assembly file with the .s suffix.

  3. Assembly (P40): This step is performed by the assembler, which converts assembly language into instructions the machine can execute (consisting entirely of zeros and ones). The assembly file is assembled into an object file with the .o suffix.

  4. Linking (P41): This step is the focus of the book. The preceding steps each work on a single .c file: one .c source file eventually produces one object file. This step links multiple object files together.

    Consider a .c file that uses a variable or a function from another .c file. When this file is compiled on its own, the address of that variable or function cannot be determined. It is known only after all the object files have been linked together. The linker is primarily responsible for address and space allocation, symbol resolution (binding), and relocation.

There is much more to getting from source code to a running program than compilation, so saying “compile the program” is often inaccurate. Compilation is, however, the most complicated part of the process.
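As a rough illustration of the steps above, the small file below gives each stage something to do. The file name demo.c, the function names, and the commands in the comment are assumptions for this sketch; -E, -S, and -c are the usual GCC/Clang driver options for stopping after preprocessing, compilation, and assembly.

```c
/* demo.c (hypothetical file name)
 *
 * One way to watch each stage with GCC or Clang:
 *   cc -E demo.c -o demo.i    preprocess: macros and #include expanded
 *   cc -S demo.i -o demo.s    compile:    C to assembly
 *   cc -c demo.s -o demo.o    assemble:   assembly to object file
 *   cc demo.o main.o -o demo  link:       needs another object that defines main
 */
#include <assert.h>

#define SIDE 4
#define SQUARE(x) ((x) * (x))

#if SIDE > 0
#define HAS_AREA 1
#else
#define HAS_AREA 0
#endif

/* After preprocessing, the body reads: return ((4) * (4)); */
int square_area(void) {
    return SQUARE(SIDE);
}

/* HAS_AREA was chosen by #if before the compiler proper ever ran. */
int has_area(void) {
    return HAS_AREA;
}

/* 2 + 3 can be folded to 5 at compile time; the result is the same either way. */
int folded_constant(void) {
    int t1 = 2 + 3;
    assert(t1 == 5);
    return t1;
}
```

Opening demo.i after the first command should show that no trace of SIDE, SQUARE, or the #if block remains, which is the point of step 1.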

Software call hierarchy

We divide the entire computer call structure into four layers:

  1. The top layer is the application layer: browsers, games, and the development tools we use, such as Xcode, VS, and the assembler itself.
  2. The second layer is the operating system's runtime library. When we call system APIs, such as file reads and writes, we invoke services provided by this layer. The call goes through the API, which connects the application layer to the runtime library. That is why, whether we program on a Mac or on Windows, we can call functions such as printf() or fread(). Although the runtime libraries of different operating systems provide different underlying implementations, the APIs offered to the application layer are always the same.
  3. The third layer is the operating system kernel. The runtime library uses system calls to invoke functions provided by the kernel. For example, fread is an API; on Linux it is implemented with the read() system call, and on Windows with ReadFile(). An application can make system calls directly, but then it has to account for the different system calls of each operating system, and system calls are harder to use because they are lower-level. More importantly, system calls are made through interrupts and involve saving and restoring the stack, so frequent system calls can hurt performance.
  4. The fourth layer is the hardware layer. Programs cannot access this layer directly; only the kernel of the operating system can access it through an interface provided by the hardware vendor.
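To make the difference between layer 2 and layer 3 concrete, here is a minimal sketch that reads the same file once through the C library API (fread) and once through the POSIX read() wrapper. It assumes a POSIX system; the function names and the temporary path passed in by the caller are made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read up to cap bytes through the C library API (buffered, portable). */
long read_with_api(const char *path, char *buf, size_t cap) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    size_t n = fread(buf, 1, cap, fp);
    fclose(fp);
    return (long)n;
}

/* Read up to cap bytes with the POSIX read() system call wrapper. */
long read_with_syscall(const char *path, char *buf, size_t cap) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    ssize_t n = read(fd, buf, cap);
    close(fd);
    return (long)n;
}

/* Write a small file, read it back both ways, return 1 if the results agree. */
int both_paths_agree(const char *path) {
    assert(path != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return 0;
    fputs("hello", fp);
    fclose(fp);

    char a[16] = {0}, b[16] = {0};
    long na = read_with_api(path, a, sizeof a - 1);
    long nb = read_with_syscall(path, b, sizeof b - 1);
    remove(path);
    return na == 5 && nb == 5 && memcmp(a, b, 5) == 0;
}
```

Only read_with_api would compile unchanged on Windows; read_with_syscall is tied to the POSIX interface, which is the portability cost the list above describes.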

The relationship between these four layers is shown in the figure below:

Virtual address space

The most important concept in running a program is the virtual address space: the address space that the application itself believes it lives in. It differs from the physical address space, which is real; for example, on a computer with 8 GB of memory, the physical address space runs from 0 to 8 GB. The CPU's MMU is responsible for translating virtual addresses into physical addresses.

The first advantage of virtual addresses is that the programmer no longer needs to care what the real physical memory looks like. In theory the programmer has an almost unlimited virtual memory space available; all that is needed in the end is a mapping between virtual addresses and physical addresses. In addition, the operating system hides the details of physical memory, so a process cannot access physical addresses that the operating system forbids, nor the address space of other processes, which greatly improves security.
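The isolation between processes can be observed directly. In the sketch below (POSIX only; the variable and function names are invented for this example), a child process created with fork() overwrites a variable at the same virtual address the parent uses, yet the parent's value is unaffected because the two address spaces map to different physical memory.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_looking_value = 1;

/* Returns the parent's value after the child has overwritten its own copy.
 * Returns -1 if fork() or waitpid() fails. */
int value_after_child_writes(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: same virtual address, but a private physical copy. */
        shared_looking_value = 99;
        _exit(shared_looking_value == 99 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    assert(WIFEXITED(status));
    return shared_looking_value;   /* still 1 in the parent */
}
```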

Paging, which builds on the virtual address space, greatly improves the efficiency of memory usage. To run a program we no longer need to load the entire program into memory; we only need the pages currently being executed to be in memory. If a needed page is not in memory, a page fault occurs.
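A virtual address can be split into a page number and an offset within that page. The helpers below are a small sketch of that arithmetic, assuming a POSIX system where sysconf(_SC_PAGESIZE) reports the page size; the helper names are made up here.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Page size reported by the OS (commonly 4096, but not guaranteed). */
uintptr_t page_size(void) {
    long sz = sysconf(_SC_PAGESIZE);
    assert(sz > 0);
    return (uintptr_t)sz;
}

/* Virtual page number that contains addr. */
uintptr_t page_number(uintptr_t addr) { return addr / page_size(); }

/* Offset of addr inside its page. */
uintptr_t page_offset(uintptr_t addr) { return addr % page_size(); }
```

The MMU translates only the page number; the offset is carried over unchanged into the physical address.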

Understanding the address space is very important, because the book talks a lot about memory and addresses, and we need to work out for ourselves whether a given address is virtual or physical. Getting this wrong makes the material hard to follow.

Link and relocate

Suppose we define foo in another file and call it in main.c. Compiling main.c on its own produces something like this:

...
0000000000000024	callq	0x29
0000000000000029	xorl	%ecx, %ecx
...

As you can see, where foo should be called, the call simply targets the next instruction. After main.o and foo.o are linked together, it becomes:

0000000100000f30	pushq	%rbp
0000000100000f31	movq	%rsp, %rbp
0000000100000f34	movl	$0x7b, %eax
0000000100000f39	movl	%edi, -0x4(%rbp)
0000000100000f3c	movl	%esi, -0x8(%rbp)
0000000100000f3f	popq	%rbp
// The instructions above are the implementation of the foo function
...
0000000100000f74	callq	0x100000f30
0000000100000f79	xorl	%ecx, %ecx
...

At this point the address of foo is set correctly. When main.c was compiled on its own, the compiler could not determine the location of foo, so it temporarily used the address of the next instruction as a placeholder.

The linker relocates such symbols during linking. main.o records that it needs the decorated symbol name of the function foo, and foo.o defines a symbol with the same name, so the linker connects the two: the temporary call address 0x29 is updated to 0x100000f30. The process resembles a jigsaw puzzle. Linking deals with many problems of this kind, and once all compiled modules have been joined according to their symbol names, the program is ready to run.
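The patch the linker applies can be worked out by hand. The sketch below assumes the call in the listings is the 5-byte x86-64 form (opcode E8 followed by a 32-bit displacement measured from the next instruction), which is consistent with a zero placeholder at 0x24 pointing to 0x29. The function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

enum { CALL_REL32_LEN = 5 };  /* x86-64 callq: opcode E8 + 4-byte displacement */

/* Address a call at call_addr actually reaches, given its stored displacement. */
uint64_t call_target(uint64_t call_addr, int32_t rel32) {
    return call_addr + CALL_REL32_LEN + (int64_t)rel32;
}

/* Displacement the linker must store so that the call reaches target. */
int32_t rel32_for(uint64_t call_addr, uint64_t target) {
    int64_t delta = (int64_t)(target - (call_addr + CALL_REL32_LEN));
    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    return (int32_t)delta;
}
```

With the addresses from the listings, call_target(0x24, 0) gives 0x29, and rel32_for(0x100000f74, 0x100000f30) gives -73 (0x100000f30 minus 0x100000f79), the value that relocation would write into the instruction under this assumption.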

Much of the book is devoted to the structure of object files, and much of that structure exists to support relocation. Once you understand what relocation is for and how it works, those chapters are much easier to read.

Knowledge summary

Finally, here is a brief summary of key points and where they appear in the book, for readers' reference:

Static link

This section focuses on how multiple .c files can be statically linked to create a static library.

  • P58

    The object file is divided into several sections. For example, the .text section holds code, the .data section holds initialized global variables and local static variables, and the .bss section holds uninitialized global variables and local static variables.

  • P70

    Linux object files also have an ELF header that summarizes information about the object file, including the ELF magic number, machine word size, byte order, version, running platform, ABI version, ELF relocation type, hardware platform and its version, entry address, section table location, number of sections, etc.

  • P74

    The segment table is actually an array in which each element is a structure. The structure contains the segment name, type, load address, offset from the header, segment size, link information, and so on.

  • P79

    There is also a relocation table in the object file, which records everything that needs to be relocated. All relocation information for the .text section is placed in the .rel.text section.

  • P81

    When linking, we refer to both functions and variables as symbols. Each function and variable has its own unique symbol name so that they can be linked to each other. Different languages have their own rules for decorating symbol names. In C on UNIX, the compiled symbol name is traditionally prefixed with “_”. For example, the function foo compiles to the symbol _foo.

  • P86

    Namespaces in C++ are designed to avoid symbol name conflicts. C++ has its own set of name decoration (mangling) rules, and decorated names can be demangled with the c++filt command. Once you understand these rules, undefined symbol and duplicate symbol errors encountered during iOS development become much easier to track down.

  • P92

    There are strong symbols and weak symbols. Strong symbols must not share a name, whereas weak symbols (uninitialized global variables) may. References to symbols are likewise divided into strong references and weak references. With a strong reference, an error is reported if the symbol definition is not found. With a weak reference, no error is reported and the symbol defaults to 0 or a special value.

  • P99

    The link process is usually divided into two steps: first address and space allocation, then symbol resolution and relocation.

    Because different object files contain sections of the same kind, the linker merges similar sections during linking; this is the address and space allocation step.

    Once the merge is complete, the positions of all symbols are uniquely determined and the relocation can begin. When the link is complete, we have the static library.

  • P118

    A static library can be viewed as a collection of object files. Different object files in the same static library may depend on each other, and different static libraries can also depend on each other.

  • P127

    Link control scripts control the operation of the linker as it converts object files and library files into executable files. They are written in a linker scripting language and can set the program entry point, merge some sections, discard others, and so on.
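To connect the section names from P58 to ordinary C code, the sketch below marks where each object would typically end up. The placement comments describe common compiler behavior rather than a guarantee, and the names are invented for this example; tools such as nm or size can be run on the resulting object file to check.

```c
#include <assert.h>

int initialized_global = 42;      /* typically placed in .data */
int uninitialized_global;         /* typically placed in .bss, reads as 0 */

/* The function body itself is machine code and typically lands in .text. */
int next_ticket(void) {
    static int issued;            /* local static, uninitialized: .bss */
    static int first = 100;       /* local static, initialized: .data */
    assert(first == 100);
    return first + issued++;
}
```

Because .bss only records a size and is zero-filled at load time, uninitialized_global and issued take no space in the object file itself.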

Dynamic loading

This section focuses on how the executable is loaded into memory after being linked.

  • P153

    There are two typical methods of dynamic loading: overlay loading and page mapping. Overlay loading lets two modules that do not depend on each other share the same memory and replace each other as needed. It is slower, trading time for space. The common solution today is page mapping, which divides the virtual memory space of a program into pages, with a dedicated load manager responsible for the mapping between virtual pages and pages in physical memory.

  • P157

    Creating a process has three steps. First, create an independent virtual address space for the process, that is, the structures that map pages in the virtual space to pages in the physical space (the actual mapping can be deferred until a page fault occurs). Second, establish the mapping between the virtual space and the executable file. On Linux, each segment of the executable has its own region in virtual memory, called a Virtual Memory Area (VMA), which records where in virtual memory the segment is loaded. Third, set the instruction register to the entry point of the executable file.

  • P159

    After the process is created, only the mappings exist; the real instructions and data have not yet been put into physical pages, and no physical memory has been allocated for them. The first access to such a page triggers a page fault.

    When a page fault occurs, the operating system searches the VMA of each segment to find the one that contains the faulting virtual page, and from it computes the offset of the page in the executable file. It then allocates a physical page, copies the data and instructions from the executable file into it, and establishes the mapping between the physical page and the virtual page. The process then resumes execution from where the page fault occurred.

  • P169

    Executables have many sections of varying sizes, and some are smaller than a page. Giving each its own pages wastes space, and sections cannot simply be stored contiguously, because two sections with different permissions might then share a page. Since the operating system does not care about the function of each section, only about its access permissions (readable, writable, executable), sections with the same permissions are combined into one segment.

  • P172

    After the process starts, the operating system initializes the process stack, which holds environment variables and command-line arguments. The arguments are passed to the main function as argc and argv (the argument count and the argument array).
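The page-fault handling described under P159 boils down to a lookup plus some address arithmetic. The following toy model sketches that lookup; struct toy_vma and both functions are hypothetical simplifications for illustration and do not mirror the real Linux data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified VMA: one loaded segment of the executable. */
struct toy_vma {
    uintptr_t start;        /* first virtual address of the area */
    size_t    length;       /* size of the area in bytes */
    size_t    file_offset;  /* where the area's contents begin in the file */
};

/* Find the VMA that contains addr, or NULL if the access is invalid. */
const struct toy_vma *toy_find_vma(const struct toy_vma *vmas, size_t count,
                                   uintptr_t addr) {
    assert(vmas != NULL);
    for (size_t i = 0; i < count; i++) {
        if (addr >= vmas[i].start && addr - vmas[i].start < vmas[i].length)
            return &vmas[i];
    }
    return NULL;
}

/* File offset of the page containing addr; this is what would be read in
 * to fill the freshly allocated physical page. Returns -1 for a bad address. */
long toy_fault_file_offset(const struct toy_vma *vmas, size_t count,
                           uintptr_t addr, size_t page_size) {
    const struct toy_vma *vma = toy_find_vma(vmas, count, addr);
    if (vma == NULL) return -1;
    uintptr_t page_start = addr - (addr % page_size);
    if (page_start < vma->start) page_start = vma->start;
    return (long)(vma->file_offset + (page_start - vma->start));
}
```

An address that falls in no VMA corresponds to an invalid access, which a real kernel reports to the process instead of loading a page.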

Dynamic link

  • P181

    Dynamic linking splits a program into relatively independent modules and defers linking between the modules until run time. ELF dynamically linked files are called Dynamic Shared Objects (DSO) and use the “.so” suffix. The linking itself is done by the dynamic linker. Dynamic linking saves memory (multiple processes share a single copy of a module in memory) and eases upgrades (with static linking, a change to any one module affects the entire executable).

  • P188

    Because a shared object can be used by multiple programs, its position in the virtual address space is hard to fix in advance. If different modules have the same target load address, loading them together causes a conflict. Giving every module a different address does not work either, because there may be too many modules for the available address space. So shared objects need to be relocated at load time.

  • P191

    Load-time relocation modifies instructions, so the relocated code cannot be shared across multiple processes. The current solution is position-independent code (PIC). Address references in a shared object fall into four kinds along two axes: references inside the module versus references to other modules, and instruction references versus data references. References to instructions or data inside the module use relative offsets.

  • P195

    The address-dependent parts that need relocation are moved into the data segment, where a global offset table (GOT) is built. The .got and .got.plt tables are used for data references and function references, respectively.

  • P200

    Binding a function only when it is first used speeds up program startup. This method is called lazy binding. Linux maintains a Procedure Linkage Table (PLT) to hold the mapping between symbol names and real addresses.

  • P208

    Dynamic linking has two relocation tables, .rel.dyn and .rel.plt, which correspond to .rel.text and .rel.data in static linking. The former fixes up data references (.got), and the latter fixes up function references (.got.plt).

  • P214

    The dynamic linker is itself a special shared object: it does not depend on any other shared object and performs its own relocation. It does this without using any global or static variables, in a special piece of code called the bootstrap.
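Lazy binding (P200) can be imitated in plain C with a function pointer that is patched on first use. This is only a toy analogy: got_add stands in for a GOT entry, add_stub for the resolver path taken on the first call, and every name is invented for this sketch. The real mechanism is implemented by the dynamic linker and small PLT stubs.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*binary_fn)(int, int);

static int real_add(int a, int b) { return a + b; }  /* the "library" function */

static int resolve_count;                /* how many times the resolver ran */
static int add_stub(int a, int b);       /* first-call trampoline */

/* Toy GOT slot: starts out pointing at the stub, not the real function. */
static binary_fn got_add = add_stub;

/* Stands in for the dynamic linker's symbol lookup and GOT patching. */
static int add_stub(int a, int b) {
    resolve_count++;
    got_add = real_add;                  /* patch the slot once */
    return got_add(a, b);
}

/* Callers always go through the slot, like a call through the PLT. */
int lazy_add(int a, int b) {
    assert(got_add != NULL);
    return got_add(a, b);
}

int times_resolved(void) { return resolve_count; }
```

Only the first lazy_add call pays for resolution; later calls jump straight to real_add through the patched slot, and a function that is never called is never resolved at all.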

Memory and the library

  • P286

    On the i386 processor, the top of the stack is located by the esp register. Because the stack grows downward, pushing onto the stack decreases the address of the top of the stack.

  • P287

    The stack holds the bookkeeping information needed for a function call, known as a stack frame or activation record. It contains the function's return address and arguments, temporary variables, and the saved context. ebp is the frame pointer, which points to a fixed position in the activation record.

  • P294

    The caller and the callee of a function follow the same “calling convention.” The default cdecl convention requires function arguments to be pushed from right to left, and the caller is responsible for popping the arguments off the stack.

  • P301

    Function return values: a value of up to four bytes is returned in eax. A value of 5 to 8 bytes is returned jointly in eax (low half) and edx (high half). For a value larger than 8 bytes, the value is placed on the stack and its address is returned in eax.

  • P306

    Data on the stack is released when the function returns, so memory that must outlive a function or be allocated dynamically comes from the heap. If the operating system managed the heap directly, the constant system calls would make it expensive, so the program typically requests a large chunk of memory “wholesale” and then manages it itself.

  • P311

    The heap does not always grow upward (for example, the Windows HeapCreate family). A call to malloc may or may not result in a system call, depending on whether the process has already reserved enough space. The heap is reclaimed by the operating system when the process ends. Heap memory is contiguous in the virtual address space but may be discontiguous in physical memory.

  • P314

    There are three algorithms for heap allocation: free lists (simple, but the bytes that record block length are easily corrupted by out-of-bounds array writes), bitmaps (fast because they are cache-friendly, stable because they are less exposed to out-of-bounds writes, and easy to manage, but prone to fragmentation, and the bitmap itself can grow large), and object pools (for allocations of a fixed size).

  • P319

    After the process is created, the operating system hands control to the entry function of the runtime library, which initializes the heap and I/O, sets up threads, constructs global variables, and so on. It then calls main. When main returns, the runtime performs the reverse of these steps and makes a system call to terminate the process.
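Of the heap allocation strategies listed under P314, the object pool is the easiest to show in a few lines. The sketch below is a minimal fixed-size pool, not the allocator of any particular runtime; the slot count, slot size, and all names are arbitrary choices for illustration. It also reflects the “wholesale” idea from P306: one block is set aside up front and then handed out without further system calls.

```c
#include <assert.h>
#include <stddef.h>

enum { POOL_SLOTS = 4, SLOT_SIZE = 32 };

/* A free slot stores a pointer to the next free slot in its own bytes. */
union slot {
    union slot *next_free;
    unsigned char bytes[SLOT_SIZE];
};

struct pool {
    union slot slots[POOL_SLOTS];  /* the block obtained "wholesale" */
    union slot *free_head;
};

/* Thread every slot onto the free list. */
void pool_init(struct pool *p) {
    assert(p != NULL);
    p->free_head = NULL;
    for (int i = POOL_SLOTS - 1; i >= 0; i--) {
        p->slots[i].next_free = p->free_head;
        p->free_head = &p->slots[i];
    }
}

/* Hand out one fixed-size slot, or NULL when the pool is exhausted. */
void *pool_alloc(struct pool *p) {
    assert(p != NULL);
    union slot *s = p->free_head;
    if (s != NULL) p->free_head = s->next_free;
    return s;
}

/* Return a slot to the pool; ptr must have come from pool_alloc on p. */
void pool_free(struct pool *p, void *ptr) {
    assert(p != NULL && ptr != NULL);
    union slot *s = ptr;
    s->next_free = p->free_head;
    p->free_head = s;
}
```

Allocation and release are both constant time, and because every slot has the same size there is no external fragmentation, at the cost of supporting only one allocation size.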