To C and C++ programmers, compiling and linking are both familiar and strange: familiar because every piece of code goes through them, strange because most people never deliberately study how they work. This article explores C++ compilation and linking on 64-bit Linux through four typical problems encountered during development.

Compilation principle:

Turning even the simplest C++ program (main.cpp) into an executable can be divided into four steps: preprocessing, compilation, assembly, and linking.

Running g++ main.cpp -v shows the detailed process, although current compilers combine preprocessing and compilation into a single step.

**Preprocessing:** g++ -E main.cpp -o main.ii, where -E means preprocess only. Preprocessing mainly expands macros and #include directives, adds line numbers and file identifiers so the compiler can generate debugging information, deletes comments, and preserves the directives the compiler itself needs (such as #pragma).

**Compilation:** g++ -S main.ii -o main.s, where -S means compile only. Compilation turns the preprocessed file into assembly code through lexical analysis, syntax analysis, and optimization.

**Assembly:** g++ -c main.s -o main.o. Assembly converts the assembly code into instructions that the machine can execute.

**Linking:** g++ main.o. Linking is needed because real code is never as simple as a single main.cpp. Modern software runs to millions of lines, and a single main.cpp of that size would be hard to work on and maintain, so a program is usually split into many .cpp files. A .cpp file often refers to functions or global variables defined in another module, and their addresses cannot be known while that one file is being compiled. After compilation, the linker therefore sets every symbol (function, variable, and so on) whose address was left open to its correct value, so that the pieces together form a complete executable.
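To watch what each stage contributes, a small file can be pushed through the commands above one step at a time. The sketch below is hypothetical: the file name stages.cpp, the SQUARE macro, and the board_cells function are illustrative and are not the article's main.cpp.

```cpp
// stages.cpp: a hypothetical file for observing each stage.
//
//   g++ -E stages.cpp -o stages.ii   // preprocess: SQUARE(8) becomes ((8) * (8)), comments vanish
//   g++ -S stages.ii  -o stages.s    // compile: assembly text for board_cells
//   g++ -c stages.s   -o stages.o    // assemble: relocatable object file stages.o
//
// This file defines no main(), so the link step needs a second object file
// that defines main and calls board_cells.

#define SQUARE(x) ((x) * (x))

// Number of squares on an 8 x 8 board, written with the macro so that the
// effect of preprocessing is visible in stages.ii.
int board_cells() { return SQUARE(8); }
```

Because stages.o only defines board_cells, it also illustrates why the link step exists: another object file has to supply main before an executable can be produced.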

Problem 1: Header file shadowing

One of the weirdest problems during compilation is header file shadowing. In the following code, main.cpp includes the header common.h. The header we want is the one on the far right of the figure, which defines the name member.

During compilation, however, the common.h in ./include1 is found first, so the compiler reports that the Test structure has no member called name. A programmer who has clearly defined name and is told it does not exist may, the first time this happens, start to doubt everything. To track the problem down, use the -E option to look at the preprocessed output, as shown below.

The preprocessed output contains marker lines of the form # linenum filename flags, which mean that the content that follows was expanded from line linenum of the file named filename. flags can be one or more of 1, 2, 3, and 4, separated by spaces. 1 marks the start of a new file; 2 marks the return to a file after an included file has ended; 3 means the following content comes from a system header file; 4 means the following content should be treated as if it were wrapped in an extern "C" block.
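When the preprocessed output is long, it can help to pick out these marker lines programmatically. The following is a minimal sketch, assuming the marker format described above; the LineMarker struct and the parse_linemarker function are hypothetical names introduced only for illustration.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// One marker line from preprocessed output: # linenum "filename" flags...
struct LineMarker {
    int line = 0;
    std::string file;
    std::vector<int> flags;
};

// Parses a marker line into `out`. Returns false for any other line,
// for example "#include <vector>" or ordinary source text.
bool parse_linemarker(const std::string& text, LineMarker& out) {
    std::istringstream in(text);
    char hash = 0;
    LineMarker marker;
    if (!(in >> hash) || hash != '#') return false;
    if (!(in >> marker.line)) return false;               // a number must follow '#'
    if (!(in >> std::quoted(marker.file))) return false;  // file name in double quotes
    for (int flag = 0; in >> flag;) marker.flags.push_back(flag);
    out = marker;
    return true;
}
```

Feeding each line of main.ii to parse_linemarker and printing the file field of every marker whose flags contain 1 lists the headers in the order they were actually opened, which makes a shadowed common.h easy to spot.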

The expanded output shows clearly that the Test structure has no name member and that this Test structure is defined in common.h under ./include1. The problem can be fixed by adjusting the order of the -I options, or by writing a partial path in the #include directive to pin down which header is meant.

Object files:

On Linux, the object file format is ELF (Executable and Linkable Format), defined in detail in the /usr/include/elf.h header. Common object file types are: relocatable object files, that is, files ending in .o (a static library is an archive of such files, so it falls into this category as well); executable files, such as the a.out produced by default; shared object files (.so); and core dump files, which are generated when a process dumps core. The file command shows the format of a file on Linux.
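The same classification can be read directly from the ELF header with the definitions in elf.h. The sketch below assumes a 64-bit ELF file with the host's byte order, matching the 64-bit Linux setting of this article; elf_type_name and describe_elf are hypothetical helper names.

```cpp
#include <elf.h>

#include <cstring>
#include <fstream>
#include <string>

// Maps the e_type field of an ELF header to the categories listed above.
const char* elf_type_name(unsigned e_type) {
    switch (e_type) {
        case ET_REL:  return "relocatable object file";
        case ET_EXEC: return "executable file";
        case ET_DYN:  return "shared object file";
        case ET_CORE: return "core dump file";
        default:      return "unknown ELF type";
    }
}

// Reads the ELF header of `path` and reports its type.
std::string describe_elf(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    Elf64_Ehdr header{};
    if (!in.read(reinterpret_cast<char*>(&header), sizeof header)) return "not readable";
    // Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if (std::memcmp(header.e_ident, ELFMAG, SELFMAG) != 0) return "not an ELF file";
    return elf_type_name(header.e_type);
}
```

Two details are worth knowing when trying this: a position-independent executable is reported as a shared object file because its header type is ET_DYN, and a .a static library is reported as not an ELF file because the archive wrapper around its .o members has its own format.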

A typical ELF file, as shown in the figure below, has two views: the compile-time (linking) view, which organizes the program into sections around the section header table, and the run-time (execution) view, which organizes the program into segments around the program header table. The second view exists mainly to save memory: a program has many sections, and mapping each one separately would waste a lot of memory at run time because of alignment requirements, so sections with similar permissions are usually loaded together as one segment.

You can run the objdump and readelf commands to view the contents of ELF files.

Common sections for relocatable object files are:

Symbol resolution:

The linker rewrites each reference so that it points to the correct address of the referenced symbol. If no definition of the referenced symbol can be found, the linker reports an undefined reference to XXXX error. The other case is that more than one definition is found, and for that the linker has a set of rules. To describe them we first need the concepts of strong and weak symbols. Simply put, functions and initialized global variables are strong symbols, while uninitialized global variables are weak symbols. (This classic description applies to C code compiled with common symbols enabled; in C++ an uninitialized global variable is an ordinary strong definition, and GCC 10 and later default to -fno-common for C as well.)

The linker handles multiply defined symbols according to the following rules:

1. Multiple strong definitions of the same symbol are not allowed; the linker reports a multiple definition error.

2. If one strong symbol has the same name as several weak symbols, the strong symbol is chosen.

3. If the symbol is weak in every object file, the definition that takes up the most space is chosen.
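Rule 2 can be tried out from C++ by marking a definition as weak explicitly. This is a minimal sketch that assumes GCC or Clang on an ELF platform, since `__attribute__((weak))` is a compiler extension; plugin_version and reported_version are hypothetical names.

```cpp
// A weak default definition. If another object file in the link provides a
// strong definition of plugin_version, the linker picks that one instead
// (rule 2) and reports no multiple definition error.
extern "C" __attribute__((weak)) int plugin_version() { return 0; }

// Callers do not know or care which definition was selected at link time.
int reported_version() { return plugin_version(); }
```

Linked on its own, this file makes reported_version return the weak default. Adding a second file that defines `extern "C" int plugin_version()` without the attribute should replace the default silently, whereas two such strong definitions would trigger the error from rule 1.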

With that in mind, let’s take a look at the static linking process:

1. The linker scans the object files and static libraries from left to right, in command line order.

2. The linker maintains a set E of object files, a set U of unresolved symbols, and a set D of symbols defined by the files in E. Initially E, U, and D are all empty.

3. For each file f on the command line, the linker determines whether f is an object file or a static library. If it is an object file, f is added to E, the undefined symbols in f are added to U, the symbols it defines are added to D, and the linker moves on to the next file.

4. If f is a static library, the linker tries to match the unresolved symbols in U against the symbols defined by the member object files of the library. If a member m defines a symbol in U, m is handled like the object file f in the previous step. The members are processed in turn until U and D no longer change; members that did not end up in E are simply discarded.

5. After all input files have been processed, the link fails if U still contains symbols. Otherwise the link succeeds and the executable file is written.

Problem 2: Static library order

Suppose main.cpp depends on liba.a, and liba.a in turn depends on libb.a. According to the static linking algorithm, the order g++ main.cpp liba.a libb.a links correctly: while liba.a is processed, the undefined symbol FunB is added to U, and its definition is then found in libb.a. With the order g++ main.cpp libb.a liba.a, the definition of FunB cannot be found. When libb.a is scanned, U contains no symbol that libb.a defines, so nothing needs to be resolved and libb.a is simply discarded. liba.a is resolved next, and FunB turns out to be undefined, so U is not empty at the end and the link fails.

When linking statically, therefore, pay special attention to the order of the libraries: a static library that references other libraries must come before them on the command line. When many libraries are involved, some restructuring may be needed to make the dependencies clearer.
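The effect of library order follows mechanically from the E/U/D algorithm, which the sketch below simulates. It is a simplified model and not the behavior of any particular linker: ObjectFile, LinkInput, and link_unresolved are hypothetical names, and object files are identified by name only.

```cpp
#include <set>
#include <string>
#include <vector>

struct ObjectFile {
    std::string name;
    std::set<std::string> defined;    // symbols this file defines
    std::set<std::string> undefined;  // symbols this file references but does not define
};

// One command line entry: a plain object file (exactly one member) or a
// static library (any number of members).
struct LinkInput {
    bool is_archive;
    std::vector<ObjectFile> members;
};

// Runs the scan and returns U; an empty result means the link succeeds.
std::set<std::string> link_unresolved(const std::vector<LinkInput>& command_line) {
    std::set<std::string> E, U, D;
    auto add_object = [&](const ObjectFile& obj) {
        E.insert(obj.name);
        for (const auto& s : obj.defined) { D.insert(s); U.erase(s); }
        for (const auto& s : obj.undefined) if (D.count(s) == 0) U.insert(s);
    };
    for (const auto& input : command_line) {          // left to right
        if (!input.is_archive) {
            for (const auto& obj : input.members) add_object(obj);
            continue;
        }
        bool changed = true;
        while (changed) {                             // repeat until U and D are stable
            changed = false;
            for (const auto& member : input.members) {
                if (E.count(member.name) != 0) continue;
                bool needed = false;
                for (const auto& s : member.defined) {
                    if (U.count(s) != 0) { needed = true; break; }
                }
                if (needed) { add_object(member); changed = true; }
            }
        }                                             // unused members are discarded
    }
    return U;
}
```

Modeling main.o as referencing a function in liba.a (say a hypothetical FunA), liba.a as defining FunA and referencing FunB, and libb.a as defining FunB, the order main.o, liba.a, libb.a leaves U empty, while main.o, libb.a, liba.a leaves FunB in U.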

Dynamic linking:

Most of the material so far concerns static linking, but static linking has several drawbacks. It makes updates hard: whenever a library changes, every program that uses it has to be rebuilt. It also prevents sharing: every executable keeps its own copy of the library, which wastes a great deal of memory and disk space.

To build a dynamic library, pass the options -shared -fPIC, which ask for a shared object file made of position-independent code (PIC). With static linking, the whole linking process is finished by the time the executable is produced. Dynamic linking instead splits the program into relatively independent modules that are linked into a complete program only when the program runs. To let different processes share the same code, that code must also be position-independent: a shared object is loaded at a different virtual address in each process, so it has to work wherever it is loaded. Position independence rests on one premise: the distance between the code and the data segment always stays the same.

For this purpose a Global Offset Table (GOT) is placed at the start of the data segment, so that code can reach its data through the GOT at a fixed distance no matter where the module is loaded in memory. The compiler also generates a relocation record for each GOT entry. Because the data segment is writable, the dynamic linker relocates each GOT entry at load time.

That is the general principle, but in practice functions and global variables are treated differently. A large program contains thousands of functions and is likely to call only a small part of them, so there is no need to relocate every function at load time; an address only has to be fixed up when it is first needed. For this the compiler introduces the PLT (Procedure Linkage Table), which implements lazy binding. On the first call to a function, the PLT entry jumps through the corresponding GOT entry, which initially points back to the next instruction in the PLT. That code calls the dynamic linker, which writes the real address of the function into the GOT entry. On every later call, the PLT entry jumps through the GOT straight to the function.
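The control flow of lazy binding can be modeled in plain C++ with a function pointer that plays the role of one GOT entry. This is only an analogy: a real PLT consists of small assembly stubs and the resolver lives in the dynamic linker. All names below (got_entry, resolve_then_call, call_add_via_plt, resolver_runs) are hypothetical, and the two-int signature of Add is an assumption.

```cpp
using BinaryFn = int (*)(int, int);

// Stands in for a function that lives in a shared object.
int Add(int a, int b) { return a + b; }

int resolve_then_call(int a, int b);

// The "GOT entry": at first it points at the resolver, not at Add.
BinaryFn got_entry = &resolve_then_call;

// Counts how often the slow path runs.
int resolver_runs = 0;

// Stands in for the dynamic linker: look the symbol up once, patch the GOT
// entry, then complete the call that triggered the lookup.
int resolve_then_call(int a, int b) {
    ++resolver_runs;
    got_entry = &Add;
    return got_entry(a, b);
}

// The "PLT entry": always an indirect jump through the GOT entry.
int call_add_via_plt(int a, int b) { return got_entry(a, b); }
```

The first call to call_add_via_plt goes through the resolver and later calls reach Add directly, so resolver_runs stays at 1 however often the function is called. A function that is never called never pays for symbol lookup at all, which is the point of lazy binding.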

For shared object files, there are several sections to watch:

With that in mind, let’s look at the dynamic linking process:

1. While the program is being loaded, execution first jumps to the dynamic linker.

2. The dynamic linker relocates itself, using its own global offset table and the information in the .dynamic section.

3. The shared object files are loaded: the symbol tables of the executable and of the dynamic linker itself are merged into the global symbol table, then the shared objects are traversed breadth-first and their symbol tables are merged into the global symbol table in turn. If several shared objects define the same symbol, the one loaded first takes precedence and shadows the later ones.

4. Relocation and initialization

Problem 3: Global symbol interposition

Step 3 is the most critical step of the dynamic linking process. It shows that when several shared objects contain the same symbol, the one loaded first occupies the global symbol table and the same symbol in later shared objects is ignored. If naming in the code is not handled carefully, this leads to very strange errors. With luck the program dumps core immediately; without luck it dumps core much later, at a point that makes no sense, or never dumps core at all and just produces wrong results.

In the following code, main.cpp uses symbols from two dynamic libraries, libadd.so and libadd1.so; we focus on how the symbol Add is handled.

When the program is built with g++ main.cpp libadd.so libadd1.so, Add is bound to the definition in libadd.so (add.cpp). When it is built with g++ main.cpp libadd1.so libadd.so, Add is bound to the definition in libadd1.so (add1.cpp) instead. That is a serious problem: the caller in main.cpp assumes one signature for Add, while add1.cpp says Add has three parameters. If a program contains code like this, you can expect it to cause a huge mess. Running LD_DEBUG=all ./a.out prints the symbol resolution process, which shows Add being bound to libadd.so or to libadd1.so depending on the link order.
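The first-definition-wins rule behind this behavior can be sketched as a small model of the global symbol table. Module and build_global_symbol_table are hypothetical names, and the model ignores details such as symbol versioning and hidden visibility.

```cpp
#include <map>
#include <string>
#include <vector>

struct Module {
    std::string name;                   // for example "libadd.so"
    std::vector<std::string> exported;  // symbols the module defines and exports
};

// Merges the modules' symbols in load order and returns, for each symbol,
// the module it ends up bound to.
std::map<std::string, std::string> build_global_symbol_table(
        const std::vector<Module>& load_order) {
    std::map<std::string, std::string> table;
    for (const auto& module : load_order) {
        for (const auto& symbol : module.exported) {
            // emplace does nothing when the key already exists, so an earlier
            // definition shadows every later one.
            table.emplace(symbol, module.name);
        }
    }
    return table;
}
```

With the load order a.out, libadd.so, libadd1.so the entry for Add names libadd.so; swapping the two libraries makes it name libadd1.so, with no warning in either case.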

Dynamic library loading at runtime:

Building on dynamic linking and shared object files, Linux offers an even more flexible way to load modules: the dlopen, dlsym, dlclose, and dlerror APIs load modules dynamically at run time, which makes plug-ins possible.

In the following code, add.cpp is compiled into libadd.so with g++ -fPIC -shared -o libadd.so add.cpp, and main.cpp is compiled into a.out with g++ main.cpp -ldl. main.cpp first obtains a handle (void *handle) through dlopen, then looks up the symbol Add in that handle with dlsym. Once found, the result is converted to the type of the Add function and can be called like a normal function. Any error along the way can be retrieved with dlerror.
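The sequence just described might look roughly as follows. This is a sketch under assumptions, not the article's main.cpp: it assumes Add takes two ints and returns an int, and that add.cpp exports it under the unmangled name Add (for example via extern "C"). The call_add wrapper and the AddFn alias are hypothetical names.

```cpp
#include <dlfcn.h>

#include <string>

using AddFn = int (*)(int, int);  // assumed signature of Add

// Loads `library_path`, calls Add(a, b) from it and stores the sum in `result`.
// On failure returns false and puts the dlerror() text into `error`.
bool call_add(const char* library_path, int a, int b, int& result, std::string& error) {
    void* handle = dlopen(library_path, RTLD_LAZY);
    if (handle == nullptr) {
        const char* msg = dlerror();
        error = msg != nullptr ? msg : "dlopen failed";
        return false;
    }

    dlerror();  // clear any stale error before the lookup
    void* symbol = dlsym(handle, "Add");
    const char* msg = dlerror();
    if (msg != nullptr || symbol == nullptr) {
        error = msg != nullptr ? msg : "symbol Add resolved to a null address";
        dlclose(handle);
        return false;
    }

    // POSIX systems allow converting the void* from dlsym to a function pointer.
    AddFn add = reinterpret_cast<AddFn>(symbol);
    result = add(a, b);
    dlclose(handle);
    return true;
}
```

A caller would pass a path such as "./libadd.so". If add.cpp exports Add with C++ linkage instead, the name given to dlsym has to be the mangled one, which nm libadd.so will show.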

Problem 4: Static global variables and dynamic libraries cause double free

In the following code, foo.cpp has a static global object named foo_ and is compiled into libfoo.a. bar.cpp, which is built into libbar.so, depends on libfoo.a. main.cpp depends on both libfoo.a and libbar.so.

The makefile used for the build looks like this:

Running a.out produces a double free error, which is caused by the destructor being called twice on the same object. The reason is that at link time the symbol foo_ is first resolved to the global variable from the static library. When libbar.so is then loaded, the global symbol already exists, so by global symbol interposition the reference to foo_ inside the dynamic library also points to the version from the static library. The result is that the same object ends up being destructed twice.

The solutions are as follows:

1. Do not use global objects

2. Switch the order of libraries at compile time and put the dynamic libraries first so that there is only one foo_ object globally

3. Use all dynamic libraries

4. Control symbol visibility through compiler arguments.
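For solution 4, GCC and Clang provide a visibility attribute alongside the -fvisibility=hidden option. The sketch below shows the idea in a single file, which is a simplification: in the scenario above foo_ is defined in foo.cpp, so that is where the attribute (or the option) would have to be applied. The macro names, the Foo struct, and the Bar function are hypothetical.

```cpp
// Export control for a shared library, assuming GCC or Clang on ELF.
#define BAR_EXPORT __attribute__((visibility("default")))
#define BAR_HIDDEN __attribute__((visibility("hidden")))

struct Foo {
    int uses = 0;
};

// Hidden: the object is not placed in the library's dynamic symbol table, so
// references from inside the library cannot be redirected to another module's
// foo_, and other modules cannot bind to this one.
BAR_HIDDEN Foo foo_;

// Exported: this is the only symbol callers of the library are meant to see.
BAR_EXPORT int Bar() { return ++foo_.uses; }
```

With foo_ hidden, the library and the executable should each keep their own copy of the object, constructed and destructed once each. Checking the result with nm -D libbar.so is a quick way to confirm that foo_ no longer appears among the exported symbols.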

Conclusion:

By working through four problems encountered during compilation and linking, this article has covered the basics of both. With this foundation, the compile and link problems that come up in everyday work should be manageable. Because space is limited, the article omits many details and concentrates on the overall framework. To dig deeper, consult the references below and read elf.h and the related header files.

References:

1. Linkers and Loaders

2. Computer Systems: A Programmer's Perspective

3. Self-cultivation of a Programmer

4. www.gnu.org/software/bi…

Note 1: The tools covered in this article are available from www.gnu.org/software/bi…

Note 2: In the sample code pictures in this article, the white area below each window shows the file name of that code; match it against the corresponding description in the text.
