An object file is an intermediate file that has been compiled from source code but not yet linked. It has the same structural format as an executable: ELF (Executable and Linkable Format) on Linux, and PE (Portable Executable) on Windows.
1. Object file structure
An object file contains compiled machine code, data, symbol tables, string tables, and so on. It groups this information by attribute into sections (often loosely called segments).
Main parts:
Section | Content |
---|---|
.text | Code |
.data | Initialized global variables and local static variables |
.bss | Uninitialized global variables and local static variables, and global/local static variables initialized to 0; only their total size is recorded |

Why are program instructions and program data stored in two separate sections?
- The data and code sections are mapped to two different virtual memory regions.
- The code region is read-only for the process, while the data region can be read and written. Keeping them separate lets each have its own permissions, which prevents instructions from being modified, whether by accident or maliciously.
- CPUs have separate data and instruction caches, so keeping code and data apart improves the cache hit ratio.
- When several instances of the same program run, their instructions are identical, so only one copy of the read-only instructions (and of other read-only resources) needs to be kept in memory.
Object file structure details
int printf(const char* format, ...);

int global_init_var = 84;
int global_uninit_var;

void func1(int i)
{
    printf("%d\n", i);
}

int main(void)
{
    static int static_init_var = 85;
    static int static_uninit_var;
    int a = 1;
    int b;
    func1(static_init_var + static_uninit_var + a + b);
    return a;
}
objdump -h source.o displays the section headers of the object file:
source.o:     file format elf64-x86-64
Sections:
Idx Name               Size      VMA               LMA               File off  Algn
  0 .text              0000005f  0000000000000000  0000000000000000  00000040  2**0
                       CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data              00000008  0000000000000000  0000000000000000  000000a0  2**2
                       CONTENTS, ALLOC, LOAD, DATA
  2 .bss               00000004  0000000000000000  0000000000000000  000000a8  2**2
                       ALLOC                                    # BSS section
  3 .rodata            00000004  0000000000000000  0000000000000000  000000a8  2**0
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .comment           0000002b  0000000000000000  0000000000000000  000000ac  2**0
                       CONTENTS, READONLY
  5 .note.GNU-stack    00000000  0000000000000000  0000000000000000  000000d7  2**0
                       CONTENTS, READONLY                       # stack hint
  6 .note.gnu.property 00000020  0000000000000000  0000000000000000  000000d8  2**3
                       CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .eh_frame          00000058  0000000000000000  0000000000000000  000000f8  2**3
                       CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
CONTENTS means that the section has content in the object file. The .bss section has no CONTENTS flag, so it takes up no space in the file: the object file records only its size, which is the total size of the uninitialized global and local static variables and of the global/local static variables initialized to 0. When the program is loaded, memory for these variables is allocated according to the size recorded for .bss.
size source.o displays the size of each section in the ELF file:
text data bss dec hex filename
219 8 4 231 e7 source.o
Code section, data section, and read-only data section
- .data section: initialized global and local static variables
- .rodata section: string constants (here, the format string "%d\n")
- const-qualified global variables are also stored in the read-only data section
To place a variable or function in a specific section:
__attribute__((section("FOO"))) void foo(void)
{
}
2. ELF file structure description
1. The file header
readelf -h source.o    # -h: display the ELF file header
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)          # ELF file type
  Machine:                           Advanced Micro Devices X86-64   # hardware platform
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          1184 (bytes into file)          # offset of the section header table
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         14                              # number of sections in the file
  Section header string table index: 13                              # index of the section holding the section names
The ELF header describes basic information about the whole file, most importantly the file offset of the section header table (the segment table) and the number of sections in the file. The section header table is like an array: each element is a section descriptor that records the basic attributes of one section. An object file has no program headers. More information about program headers is in Chapter-6.
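As a rough illustration of how the header fields above are used, the following sketch validates an ELF64 header held in memory and computes where a given section descriptor lives in the file. It assumes a Linux system that provides <elf.h>, and that the file's byte order matches the host's. The function names read_elf64_header and section_header_offset are hypothetical helpers made up for this example, not part of any library.

```c
#include <elf.h>      /* Elf64_Ehdr, ELFMAG, SELFMAG, EI_CLASS, ELFCLASS64 */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: check that buf starts with an ELF64 header and copy
 * it out. Returns 0 on success, -1 if the buffer is too short, lacks the
 * ELF magic (7f 45 4c 46), or is not a 64-bit ELF file. */
static int read_elf64_header(const unsigned char *buf, size_t len, Elf64_Ehdr *out)
{
    if (len < sizeof(Elf64_Ehdr))
        return -1;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;
    if (buf[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(out, buf, sizeof *out);  /* assumes host and file byte order agree */
    return 0;
}

/* Hypothetical helper: file offset of section descriptor number idx,
 * or 0 if idx is out of range. The table is a plain array starting at
 * e_shoff with e_shnum entries of e_shentsize bytes each. */
static Elf64_Off section_header_offset(const Elf64_Ehdr *eh, unsigned idx)
{
    if (idx >= eh->e_shnum)
        return 0;
    return eh->e_shoff + (Elf64_Off)idx * eh->e_shentsize;
}
```

With the values shown in the header above (e_shoff = 1184, e_shentsize = 64, e_shnum = 14), descriptor 13 would start at byte 1184 + 13 * 64 = 2016.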
Note: the ELF file types are:
Type | Description | Example |
---|---|---|
REL (Relocatable file) | Contains code and data; can be linked into an executable or a shared object file | .o files (.obj is the Windows counterpart) |
DYN (Shared object file) | Shared object file containing code and data | .so files (DLLs are the Windows counterpart) |
EXEC (Executable file) | Executable file | /bin/bash |
CORE (Core file) | Core dump file | |
2. The segment table
The section header table (the segment table) is an array of section descriptors that record information about the ELF sections (name, length, offset in the file, read and write permissions, etc.).
readelf -S source.o    # -S: display the section headers
There are 14 section headers, starting at offset 0x4a0:
Section Headers:
[Nr] Name Type Address Offset Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040 000000000000005f 0000000000000000 AX 0 0 1
[ 2] .rela.text RELA 0000000000000000 00000380 0000000000000078 0000000000000018 I 11 1 8
[ 3] .data PROGBITS 0000000000000000 000000a0 0000000000000008 0000000000000000 WA 0 0 4
[ 4] .bss NOBITS 0000000000000000 000000a8 0000000000000004 0000000000000000 WA 0 0 4
[ 5] .rodata PROGBITS 0000000000000000 000000a8 0000000000000004 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 000000ac 000000000000002b 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 000000d7 0000000000000000 0000000000000000 0 0 1
[ 8] .note.gnu.propert NOTE 0000000000000000 000000d8 0000000000000020 0000000000000000 A 0 0 8
[ 9] .eh_frame PROGBITS 0000000000000000 000000f8 0000000000000058 0000000000000000 A 0 0 8
[10] .rela.eh_frame RELA 0000000000000000 000003f8 0000000000000030 0000000000000018 I 11 9 8
[11] .symtab SYMTAB 0000000000000000 00000150 00000000000001b0 0000000000000018 12 12 8
[12] .strtab STRTAB 0000000000000000 00000300 0000000000000080 0000000000000000 0 0 1
[13] .shstrtab STRTAB 0000000000000000 00000428 0000000000000074 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
l (large), p (processor specific)
- Section descriptors have a fixed size, so the Name field (sh_name) holds only the offset of the name within the section header string table (.shstrtab).
- Offset (sh_offset) is the section's offset within the ELF file. The .bss offset is the same as the .rodata offset because .bss occupies no space in the file, so its offset is meaningless.
- The type NOBITS indicates that the section has no content in the file, as is the case for .bss.
- The flags describe the section's attributes in the process's virtual address space. A (alloc) means that space must be allocated for the section in the process address space.
- Sections of type RELA are relocation tables. The Link field is the index of the symbol table they use, and the Info field is the index of the section the relocations apply to.
- Address is the virtual address. Virtual addresses are not assigned until linking, so in an object file they are all zero.
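The flag letters and the NOBITS rule described above can be sketched in a few lines of C. This is only an illustration, assuming a Linux <elf.h>; section_flags_to_string and section_file_size are hypothetical names invented for the example, and only the W, A and X bits are handled.

```c
#include <elf.h>     /* SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR, SHT_NOBITS */
#include <stdint.h>

/* Hypothetical helper: render the W, A and X bits of sh_flags using the
 * letters from the "Key to Flags" legend. out must hold at least 4 bytes. */
static void section_flags_to_string(uint64_t sh_flags, char out[4])
{
    int n = 0;
    if (sh_flags & SHF_WRITE)     out[n++] = 'W';
    if (sh_flags & SHF_ALLOC)     out[n++] = 'A';
    if (sh_flags & SHF_EXECINSTR) out[n++] = 'X';
    out[n] = '\0';
}

/* Hypothetical helper: number of bytes a section occupies in the file.
 * A NOBITS section such as .bss has a size but stores no bytes. */
static uint64_t section_file_size(uint32_t sh_type, uint64_t sh_size)
{
    return sh_type == SHT_NOBITS ? 0 : sh_size;
}
```

For the table above, the .text flags would come out as "AX" and the .data flags as "WA", while .bss (NOBITS, size 4) would contribute 0 bytes to the file.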
3. Relocation table
objdump -r source.o displays the relocation records:
source.o: file format elf64-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
0000000000000017 R_X86_64_PC32 .rodata-0x0000000000000004
0000000000000021 R_X86_64_PLT32 printf-0x0000000000000004
000000000000003d R_X86_64_PC32 .data
0000000000000043 R_X86_64_PC32 .bss-0x0000000000000004
0000000000000056 R_X86_64_PLT32 func1-0x0000000000000004
RELOCATION RECORDS FOR [.eh_frame]:
OFFSET TYPE VALUE
0000000000000020 R_X86_64_PC32 .text
0000000000000040 R_X86_64_PC32 .text+0x0000000000000028
Compare this with the disassembly of main in the object file:
0000000000000028 <main>:
0000000000000028: f3 0f 1e fa endbr64
000000000000002c: 55 push %rbp
000000000000002d: 48 89 e5 mov %rsp,%rbp
0000000000000030: 48 83 ec 10 sub $0x10,%rsp
0000000000000034: c7 45 f8 01 00 00 00 movl $0x1,-0x8(%rbp) # initialize variable a
000000000000003b: 8b 15 00 00 00 00 mov 0x0(%rip),%edx # 41 <main+0x19>
0000000000000041: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 47 <main+0x1f>
0000000000000047: 01 c2 add %eax,%edx
0000000000000049: 8b 45 f8 mov -0x8(%rbp),%eax
000000000000004c: 01 c2 add %eax,%edx
000000000000004e: 8b 45 fc mov -0x4(%rbp),%eax
0000000000000051: 01 d0 add %edx,%eax
0000000000000053: 89 c7 mov %eax,%edi
0000000000000055: e8 00 00 00 00 callq 5a <main+0x32>
000000000000005a: 8b 45 f8 mov -0x8(%rbp),%eax
000000000000005d: c9 leaveq
000000000000005e: c3 retq
Offset 0x56 is the reference to func1 inside the call instruction at 0x55. Before linking, the four bytes there are left as 00 00 00 00 and must be fixed up at link time. The same applies to the references to printf and to data in .rodata, .data and .bss; stack variables such as a and b are addressed relative to %rbp and need no relocation.
When linking, the linker has to fix up these address references in the object file (e.g., printf, func1). The information it needs is recorded in the corresponding relocation section (relocation table). The relocation section for .text is .rela.text.
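To make the fix-up concrete, here is a minimal sketch of the arithmetic for an R_X86_64_PC32 record, which the x86-64 ABI defines as S + A - P (S is the symbol's final address, A the addend, P the address of the 4-byte field being patched). The function name reloc_pc32 and the load addresses used in the note below are assumptions chosen for illustration; a real linker decides the addresses. R_X86_64_PLT32 is computed the same way when the call ends up going directly to the function rather than through a PLT entry.

```c
#include <stdint.h>

/* Hypothetical helper: the 32-bit value a linker would store for an
 * R_X86_64_PC32 relocation.
 *   S = final address of the referenced symbol (or section)
 *   A = addend from the relocation record (e.g. -4)
 *   P = final address of the 4-byte field being patched
 * The CPU adds the stored value to the address of the NEXT instruction,
 * which is P + 4 here; that is why the addend for a call is -4. */
static int32_t reloc_pc32(uint64_t S, int64_t A, uint64_t P)
{
    int64_t value = (int64_t)(S - P) + A;
    return (int32_t)value;
}
```

Suppose, purely as an assumption, that .text ends up at 0x401000 and .data at 0x404000. The record at offset 0x56 (func1-0x4) then yields 0x401000 - 4 - 0x401056 = -0x5a, which is the distance from the instruction after the call (0x40105a) back to func1. The record at offset 0x3d (.data, addend 0) yields 0x2fc3, and 0x401041 + 0x2fc3 = 0x404004, the address of static_init_var.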
4. String table and section header string table
All strings in an ELF file (variable names, function names, section names, etc.) are stored together in string tables. Other structures, such as the section header table, refer to a string by its offset within the string table.
Hex dump of section '.strtab':
0x00000000 00736f75 7263652e 63007374 61746963 .source.c.static
0x00000010 5f696e69 745f7661 722e3139 32320073 _init_var.1922.s
0x00000020 74617469 635f756e 696e6974 5f766172 tatic_uninit_var
0x00000030 2e313932 3300676c 6f62616c 5f696e69 .1923.global_ini
0x00000040 745f7661 7200676c 6f62616c 5f756e69 t_var.global_uni
0x00000050 6e69745f 76617200 61646472 0066756e nit_var.addr.fun
0x00000060 6331005f 474c4f42 414c5f4f 46465345 c1._GLOBAL_OFFSE
0x00000070 545f5441 424c455f 00707269 6e746600 T_TABLE_.printf.
0x00000080 6d61696e 0076616c 756500 main.value.
Hex dump of section '.shstrtab':
0x00000000 002e7379 6d746162 002e7374 72746162 ..symtab..strtab
0x00000010 002e7368 73747274 6162002e 72656c61 ..shstrtab..rela
0x00000020 2e746578 74002e64 61746100 2e627373 .text..data..bss
0x00000030 002e7265 6c612e64 6174612e 72656c2e ..rela.data.rel.
0x00000040 6c6f6361 6c002e72 6f646174 61002e63 local..rodata..c
0x00000050 6f6d6d65 6e74002e 6e6f7465 2e474e55 omment..note.GNU
0x00000060 2d737461 636b002e 6e6f7465 2e676e75 -stack..note.gnu
0x00000070 2e70726f 70657274 79002e72 656c612e .property..rela.
0x00000080 65685f66 72616d65 00 eh_frame.
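Looking up a name by offset can be sketched as follows. The helper strtab_lookup is a hypothetical function written for this example; the sample table reproduces the first 0x26 bytes of the .shstrtab dump above. Note how ".text" needs no entry of its own: it is simply the tail of ".rela.text".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the NUL-terminated string stored at byte
 * offset off in a string table of size bytes, or NULL if the offset is
 * out of range or the string is not terminated inside the table. */
static const char *strtab_lookup(const char *tab, size_t size, uint32_t off)
{
    if (off >= size)
        return NULL;
    if (memchr(tab + off, '\0', size - off) == NULL)
        return NULL;
    return tab + off;
}

/* First 0x26 bytes of the .shstrtab dump (the literal's implicit
 * terminating NUL is the byte at offset 0x25). */
static const char shstrtab_prefix[] =
    "\0.symtab\0.strtab\0.shstrtab\0.rela.text";
```

With this table, offset 0x1b gives ".rela.text" and offset 0x20 gives ".text"; offset 0 is the empty string, which is why the NULL section at index 0 has no name.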
5. The symbol table
readelf -s source.o displays the symbol table:
Symbol table '.symtab' contains 18 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS source.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5
6: 0000000000000004 4 OBJECT LOCAL DEFAULT 3 static_init_var.1920
7: 0000000000000000 4 OBJECT LOCAL DEFAULT 4 static_uninit_var.1921
8: 0000000000000000 0 SECTION LOCAL DEFAULT 7
9: 0000000000000000 0 SECTION LOCAL DEFAULT 8
10: 0000000000000000 0 SECTION LOCAL DEFAULT 9
11: 0000000000000000 0 SECTION LOCAL DEFAULT 6
12: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 global_init_var
13: 0000000000000004 4 OBJECT GLOBAL DEFAULT COM global_uninit_var
14: 0000000000000000 40 FUNC GLOBAL DEFAULT 1 func1
15: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
16: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf
17: 0000000000000028 55 FUNC GLOBAL DEFAULT 1 main
The symbol table contains:
- Global symbols defined in this object file, which other object files can reference. For example, global_init_var, main and func1.
- Global symbols referenced in this file but defined elsewhere, also called external symbols. For example, printf.
- Local symbols, such as static_init_var.
- Section names.
Description:
- Value is the value of the symbol. For a variable or function in an object file, it is the symbol's offset within its section. For example, global_init_var is the first item in the .data section, so its Value is 0x0 and its size is 4 bytes; static_init_var is the second item in .data, so its Value is 0x4.
  Hex dump of section '.data':
  0x00000000 54000000 55000000 T...U...
- In an executable, Value is the symbol's final virtual address.
  54: 0000000000001129 24 FUNC GLOBAL DEFAULT 14 func
  55: 0000000000001170 101 FUNC GLOBAL DEFAULT 14 __libc_csu_init
  56: 0000000000004018 0 NOTYPE GLOBAL DEFAULT 24 _end
  57: 0000000000001040 47 FUNC GLOBAL DEFAULT 14 _start
  58: 0000000000004010 0 NOTYPE GLOBAL DEFAULT 24 __bss_start
  59: 0000000000001141 37 FUNC GLOBAL DEFAULT 14 main
- Type is the type of the symbol, such as SECTION, OBJECT (data) or FUNC (function).
- Ndx is the index of the section in which the symbol is defined. The initialized global variable (global_init_var) and the initialized local static variable (static_init_var) are in the .data section (index 3). printf is an undefined external symbol (UND). The uninitialized global variable (global_uninit_var) is placed in the COMMON block (COM); space for it is allocated in .bss only when the final executable (also an ELF file) is linked.
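The Type, Bind and Ndx columns are all decoded from two fields of each symbol table entry. A small sketch, assuming a Linux <elf.h>; symbol_ndx_kind and symbol_is_global_func are hypothetical names used only here:

```c
#include <elf.h>   /* Elf64_Sym, ELF64_ST_BIND, ELF64_ST_TYPE, SHN_*, STB_*, STT_* */

/* Hypothetical helper: classify a symbol by its section index (Ndx column). */
static const char *symbol_ndx_kind(const Elf64_Sym *sym)
{
    switch (sym->st_shndx) {
    case SHN_UNDEF:  return "UND";      /* referenced here, defined elsewhere */
    case SHN_ABS:    return "ABS";      /* absolute value, e.g. the FILE symbol */
    case SHN_COMMON: return "COM";      /* COMMON block, placed at final link */
    default:         return "section";  /* index of the defining section */
    }
}

/* Hypothetical helper: st_info packs the binding in its high 4 bits and
 * the type in its low 4 bits; the ELF64_ST_* macros unpack them. */
static int symbol_is_global_func(const Elf64_Sym *sym)
{
    return ELF64_ST_BIND(sym->st_info) == STB_GLOBAL &&
           ELF64_ST_TYPE(sym->st_info) == STT_FUNC;
}
```

For the entry describing main above (FUNC, GLOBAL, Ndx 1), symbol_is_global_func would return 1 and symbol_ndx_kind would return "section"; for global_uninit_var it would return "COM".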
6. Strong and weak symbols
extern int ext;                      // external declaration (a reference): neither strong nor weak
int weak;                            // uninitialized global variable: weak symbol
int strong = 1;                      // initialized global variable: strong symbol
__attribute__((weak)) int weak2 = 2; // explicitly marked as a weak symbol
int main(void)
{
    return 0;
}
- A strong symbol may not be defined more than once; otherwise the linker reports a duplicate-definition error.
- If a symbol is strong in one object file and weak in all the others, the strong definition is chosen.
- If a symbol is weak in all object files, the definition that takes up the most space is chosen.
- A strong reference to a symbol that is still undefined at link time causes an undefined-symbol error.
- A weak reference to an undefined symbol is not an error; the linker gives it the value 0 or some other special value.
- In the first example, the variables a and b live on the stack; b is uninitialized, so its value is indeterminate.
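The weak-reference rule above is what makes optional hooks possible. The following is a minimal sketch, assuming GCC or Clang on an ELF target such as Linux; optional_hook and call_hook_if_present are hypothetical names invented for this example.

```c
/* Weak reference (GCC/Clang extension): if no object file defines
 * optional_hook, the link still succeeds and the symbol's address is 0.
 * optional_hook is a hypothetical name used only in this sketch. */
extern void optional_hook(void) __attribute__((weak));

/* Returns 1 if the hook was defined and called, 0 if it was left undefined. */
static int call_hook_if_present(void)
{
    if (optional_hook) {      /* address is 0 when nothing defines it */
        optional_hook();
        return 1;
    }
    return 0;
}
```

Linked on its own, call_hook_if_present returns 0. If another object file in the link defines void optional_hook(void), the same code calls it and returns 1, with no change to this file.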