My earlier plan was to relearn the compile-and-link process. This time it is actually on the schedule, but I was still too naive about it. As with the previous article, while preparing this one I found that I still have a lot to learn, far more than I expected, which was a bit discouraging.
Still, the flag has been planted, so I'm going to stick with it. Here we go...
2.1 Object File
As in the last section, we compile step by step: the preprocessed file (.i), the assembly file (.s), and the object file (.o).
We covered preprocessed files in the last article. Assembly files consist entirely of assembly statements, and we'll get to those later. This article focuses on the .o object file.
I added some variables and a function call to the previous hello_world to make the analysis more complete.
#include <stdio.h>

int g_a = 0;
int g_b = 84;

int func1(int i)
{
    printf("i = %d\n", i);
    return 0;
}

int main(int argc, char **argv)
{
    static int s_a = 0;
    static int s_b = 84;
    int a = 1;
    int b;

    func1(s_a+s_b+a+b);
    printf("hello world %d %d %d\n", g_a, a, b);
    return 0;
}
This hello_world defines global variables, static variables, local variables, and a function call.
As in the previous section, we compile it straight to a .o file.
We can use the file command to see the format of this file:
root@ubuntu:~/c_test/02# file hello_world.o
hello_world.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
hello_world.o is clearly an ELF file, but it is not an executable. It is a relocatable file, waiting for the linker to combine it with other object files. I'll cover that in the next article.
2.2 objdump command
Linux gives us two commands made for people who like to poke around. One is objdump, which disassembles, and the other is readelf, which parses ELF. With them we don't have to decode ELF files byte by byte, although you can still do that if you are interested. I'm not going to, because it is a huge task.
Commonly used objdump options:
objdump -h hello_world.o   # display section header information
objdump -x hello_world.o   # display all header information
objdump -d hello_world.o   # disassemble the executable sections
objdump -s hello_world.o   # display the full contents of all sections in hexadecimal
objdump -t hello_world.o   # display the symbol table, similar to nm
2.3 readelf command
Next, take a look at the readelf arguments:
readelf -S hello_world.o   # display the ELF section header table
readelf -s hello_world.o   # display the symbol table
readelf -r hello_world.o   # display relocations
readelf -d hello_world.o   # display the dynamic section
To be honest, I am not very familiar with these commands yet; I'll add more here as I learn them.
2.4 hello_world.o analysis
Now for the key part of this chapter. We will analyze hello_world.o first and the hello_world executable next time, and then see whether anything else deserves the same treatment.
2.4.1 Header information
Either command can show the header information:
2.4.1.1 objdump -f
root@ubuntu:~/c_test/02# objdump -f hello_world.o
hello_world.o: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x0000000000000000
This shows the target architecture of the compiled file, the file format, and the file's flags. The flags are defined in BFD as follows:
/* BFD contains relocation entries. */
#define HAS_RELOC 0x01
/* BFD is directly executable. */
#define EXEC_P 0x02
...
/* BFD has symbols. */
#define HAS_SYMS 0x10
2.4.1.2 readelf -h
root@ubuntu:~/c_test/02# readelf -h hello_world.o
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 1168 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 13
Section header string table index: 10
root@ubuntu:~/c_test/02#
readelf decodes the first part of the ELF binary, the ELF header, and prints it field by field, which is why the output looks like this. In the past we would have had to analyze the raw bytes ourselves; now one command does it, and there is no need to dig through the binary data.
2.4.2 Section information
Section information is an important topic, and two commands can read it.
readelf -S is saved for the next article, where we analyze hello_world; this one uses objdump -h.
First, an overview:
root@ubuntu:~/c_test/02# size hello_world.o
text data bss dec hex filename
245 8 8 261 105 hello_world.o
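size gives a quick look at the size of each part: text, data, and bss. Note that the text column (245) is larger than .text alone: in the objdump -h output below, .text (0x7f = 127), .rodata (0x1e = 30), and .eh_frame (0x58 = 88) add up to exactly 245, so size appears to count those read-only sections as text.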
root@ubuntu:~/c_test/02# objdump -h hello_world.o
hello_world.o: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0000007f 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
1 .data 00000008 0000000000000000 0000000000000000 000000c0 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .bss 00000008 0000000000000000 0000000000000000 000000c8 2**2
ALLOC
3 .rodata 0000001e 0000000000000000 0000000000000000 000000c8 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .comment 00000036 0000000000000000 0000000000000000 000000e6 2**0
CONTENTS, READONLY
5 .note.GNU-stack 00000000 0000000000000000 0000000000000000 0000011c 2**0
CONTENTS, READONLY
6 .eh_frame 00000058 0000000000000000 0000000000000000 00000120 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
That's the key output. Next we go through these sections one by one.
2.4.2.1 .text section
This is where the code is stored, so let's see what the code ends up looking like.
objdump is our trusty tool again:
root@ubuntu:~/c_test/02# objdump -s hello_world.o
hello_world.o: file format elf64-x86-64
Contents of section .text:
 0000 554889e5 4883ec10 897dfc8b 45fc89c6  UH..H....}..E...
 0010 bf000000 00b80000 0000e800 000000b8  ................
 0020 00000000 c9c35548 89e54883 ec20897d  ......UH..H.. .}
 0030 ec488975 e0c745f8 01000000 8b150000  .H.u..E.........
 0040 00008b05 00000000 01c28b45 f801c28b  ...........E....
 0050 45fc01d0 89c7e800 0000008b 05000000  E...............
 0060 008b4dfc 8b55f889 c6bf0000 0000b800  ..M..U..........
 0070 000000e8 00000000 b8000000 00c9c3    ...............
... (only the .text section is kept here)
root@ubuntu:~/c_test/02#
-s prints the contents of every section in hexadecimal, which is exactly how the code is stored in the file. Raw hexadecimal is hard to read, so we also disassemble the machine instructions; comparing the two makes things much clearer.
root@ubuntu:~/c_test/02# objdump -d hello_world.o
hello_world.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <func1>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 89 7d fc mov %edi,-0x4(%rbp)
b: 8b 45 fc mov -0x4(%rbp),%eax
e: 89 c6 mov %eax,%esi
10: bf 00 00 00 00 mov $0x0,%edi
15: b8 00 00 00 00 mov $0x0,%eax
1a: e8 00 00 00 00 callq 1f <func1+0x1f>
1f: b8 00 00 00 00 mov $0x0,%eax
24: c9 leaveq
25: c3 retq
0000000000000026 <main>:
26: 55 push %rbp
27: 48 89 e5 mov %rsp,%rbp
2a: 48 83 ec 20 sub $0x20,%rsp
2e: 89 7d ec mov %edi,-0x14(%rbp)
31: 48 89 75 e0 mov %rsi,-0x20(%rbp)
35: c7 45 f8 01 00 00 00 movl $0x1,-0x8(%rbp)
3c: 8b 15 00 00 00 00 mov 0x0(%rip),%edx # 42 <main+0x1c>
42: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 48 <main+0x22>
48: 01 c2 add %eax,%edx
4a: 8b 45 f8 mov -0x8(%rbp),%eax
4d: 01 c2 add %eax,%edx
4f: 8b 45 fc mov -0x4(%rbp),%eax
52: 01 d0 add %edx,%eax
54: 89 c7 mov %eax,%edi
56: e8 00 00 00 00 callq 5b <main+0x35>
5b: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 61 <main+0x3b>
61: 8b 4d fc mov -0x4(%rbp),%ecx
64: 8b 55 f8 mov -0x8(%rbp),%edx
67: 89 c6 mov %eax,%esi
69: bf 00 00 00 00 mov $0x0,%edi
6e: b8 00 00 00 00 mov $0x0,%eax
73: e8 00 00 00 00 callq 78 <main+0x52>
78: b8 00 00 00 00 mov $0x0,%eax
7d: c9 leaveq
7e: c3 retq
The first column on the left is the offset of each instruction. The uneven steps between offsets show that x86 machine instructions are variable length.
The second column is the hexadecimal content, that is, the machine code.
On the right are the corresponding assembly instructions.
Compare this hexadecimal with the objdump -s dump: they are exactly the same. This is how code is stored in the file.
2.4.2.2 .data section
This section holds initialized data: global variables and local static variables whose initial values are not zero.
The section is 8 bytes. Why 8? Because the program has exactly two such variables, g_b and s_b, each a 4-byte int.
Both are stored in the .data section. Don't believe it?
Let's verify with objdump -s:
Contents of section .data:
 0000 54000000 54000000                    T...T...
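The bytes 54 00 00 00 are the little-endian encoding of 0x00000054 = 84, and there are two 84s in there.
The lowest byte comes first because x86-64 is little endian, as the Data field in the ELF header also reports.
As for the order, it presumably follows the order in which the variables are defined.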
2.4.2.3 .bss
This section is for global and static variables that are initialized to 0 or have no initializer at all; variables without an initializer default to 0.
.bss exists to save space in the file. The compiler adds up the sizes of all variables that belong here and records the total so that the space can be reserved when the program runs, but since every value is 0 there is nothing to store in the file itself. Here that is 8 bytes, for g_a and s_a.
The CONTENTS flag means the section has contents stored in the .o file. .bss has none, so its flags show only ALLOC. This is also why .bss and .rodata have the same file offset (0xc8) in the objdump -h output: .bss takes up no room in the file.
2.4.2.4 .rodata
The read-only data section, which usually holds const data such as string constants.
Because the compiler keeps read-only data together here, it can flag direct modifications: code that assigns to such data is reported as an error at compile time.
It is still possible to get past the compiler by taking the address with a pointer and writing through it. On Linux, however, although the data does live in RAM, .rodata is typically mapped into read-only pages, so such a write usually crashes at run time.
On something like an 8051 ("51") microcontroller, RAM is so scarce that read-only data is kept directly in flash. Flash cannot be changed by an ordinary memory write, so the pointer trick does not work there either.
Surely this code has no read-only data. How could it?
This is where our C knowledge falls short: the format strings passed to printf are string constants, and string constants live in the read-only data section. Don't believe it? Let's dump the section:
Contents of section .rodata:
 0000 69203d20 25640a00 68656c6c 6f20776f  i = %d..hello wo
 0010 726c6420 25642025 64202564 0a00      rld %d %d %d..
The layout is clear: the two format strings sit back to back, each terminated by a 00 byte.
2.4.2.5 .comment
Holds information about the compiler that produced the file.
Contents of section .comment:
 0000 00474343 3a202855 62756e74 7520352e  .GCC: (Ubuntu 5.
 0010 342e302d 36756275 6e747531 7e31362e  4.0-6ubuntu1~16.
 0020 30342e31 32292035 2e342e30 20323031  04.12) 5.4.0 201
 0030 36303630 3900                        60609.
2.4.2.6 .note.GNU-stack
A note section about the stack. I don't really understand what it means yet.
2.4.2.7 .eh_frame
Holds the call frame information used for stack unwinding, for example by C++ exception handling.
Contents of section .eh_frame:
 0000 14000000 00000000 017a5200 01781001  .........zR..x..
 0010 1b0c0708 90010000 1c000000 1c000000  ................
 0020 00000000 26000000 00410e10 8602430d  ....&....A....C.
 0030 06610c07 08000000 1c000000 3c000000  .a..........<...
 0040 00000000 59000000 00410e10 8602430d  ....Y....A....C.
 0050 0602540c 07080000                    ..T.....
I don't understand this section yet; I'll fill it in later. There is still a lot I don't understand.
2.4.2.8 Custom Section
__attribute__((section("FOO"))) int global = 42;
__attribute__((section("BAR"))) void foo()
{
}
When I first read the U-Boot code, it used a lot of these custom sections. As I understand it, U-Boot places initialization functions of the same kind in the same section, and at startup it runs the contents of that section one after another, in order.