Less code, more hair
This article is also collected in my GitHub repository; stars and issues are welcome.
Github.com/midou-tech/…
Interviewers love to ask "Hello World" questions, especially in campus recruitment; I have run into it three times
In fact, many things that seem natural and simple have a complex body of knowledge behind them
I remember very clearly my first interview at Alibaba: the interviewer asked me to write a Hello World program
I asked three times to confirm I had heard correctly, and each time the interviewer calmly said yes
After I wrote it, I was asked to talk about Hello World, and we talked about Hello World for an hour
It was an internship interview, and after that conversation I seriously began to doubt myself
This question is a great test of a candidate's computer fundamentals, self-study ability, and ability to dig into a problem
A good answer requires computer fundamentals, operating systems, compiler principles, and more
A quick aside: if you have not followed me yet, remember to follow, and like, bookmark, and share while you are at it
The code I have written here is so simple that you would not think anything could go wrong with it, right?
I am not ashamed to say that the first time Uncle Long wrote this code, this simple program took three or four tries
When I finally hit Run, I found that the header file was missing
I added it and ran again, only to find the trailing semicolon (;) was missing
And once I added that, I was missing return 0
After several iterations, the console printed Hello World! I was so excited that I laughed out loud
Feeling proud, I struck while the iron was hot and wrote the following version
Both versions are written in C. A C course is part of general education at most universities, and C lets us see clearly what happens underneath
Running results:
My nephew was very curious about how Hello World gets printed to the screen
Uncle Long was curious about this too, but only after finishing the C language course
As the von Neumann architecture shows, the basic components of a computer work together as follows:
- A program is first entered through the input devices: mouse and keyboard
- When you write code in a text file it has to be stored, which is where storage comes in: the code is saved on disk
- When you click Run, your code is read into memory and the compiler compiles it into an executable
- The operating system starts a user process for the compiled file, and that process executes the user's program
- The CPU processes the program logic and sends the results to the output device, the display
Each part does its own job. In system design this is called clear modules and complete functions
Let's look at Hello World from several angles and leave your interviewer stunned
Code entry procedure
- Start the IDE software
- Pounding away at the code on the keyboard
- After checking the code, click Run
Code entry is such a simple matter; does Uncle Long really need to explain it?
The figure above summarizes the main components involved in the input process: keyboard, host (CPU, memory, disk), and display
The code entry process looks simple: it starts with opening an editor or IDE
Beginners are advised to use an IDE, though of course you can write code without one
Any text editor can be used to enter code
An IDE (Integrated Development Environment) bundles a code editor, compiler, debugger, and graphical user interface
For example, to write C or C++ ("C with classes"), you might download VC++, Dev-C++, Visual Studio, CLion, and so on. They are great, and good tools improve productivity
I used to use CLion. Choose an IDE according to your own needs; whatever feels comfortable is fine
What does it mean to start an IDE?
An IDE is a highly integrated piece of software, and starting it means the operating system starts a process, which we can call the IDE process
Since it is integrated, it runs many threads, each responsible for one of the integrated modules
Processes and threads will be covered in detail in a later article and are not expanded here
The IDE process is managed and scheduled by the operating system
How does the code get into the IDE as you type on the keyboard?
To understand this question, let’s talk about how the keyboard works
The basic principle of the keyboard is to monitor keys in real time and send key information into the computer
Inside the keyboard, a key-scanning circuit locates the pressed key. When any key is pressed, an encoding circuit generates a code, and these codes are fed into the interface circuit, which is called the keyboard control circuit
By working principle, keyboards can be divided into encoded and non-encoded keyboards
Encoded keyboard: the keyboard control circuit is implemented entirely in hardware, which automatically identifies the key and produces its code
Non-encoded keyboard: the keyboard control circuit is implemented by hardware and software together
Keys are detected by scanning signal levels (potential scanning), which is divided into progressive (row-by-row) scan and row scan
That is how the keyboard works; from now on I will feel more powerful when hammering the keyboard at full speed
So far this is only the keyboard driver getting input from the keyboard. How does the application get the input data?
The keyboard daemon retrieves the results and stores them in its own shared memory, from which the application reads the keyboard input
As the figure above shows, keyboard input goes through I/O. I/O as a whole is not expanded here and will be covered in a later article
At this point the IDE has received the code typed on the keyboard, and your Hello World code finally shows up on the monitor
How does the code run while it is sitting in the IDE?
Compiling code into an executable program
Once the code is typed in, you are excited to run it and cannot wait to see the results
But wait: the program we write is called source code, the CPU executes machine code, and a program that contains machine code is called an executable
How does source code become an executable program
IDEs are integrated environments, which makes it easy for beginners to think the source code is executed directly by the CPU
It’s not
Source code must be compiled by a compiler to become a binary executable program
An IDE integrates a compiler and a debugger. The main C compilers are the GNU Compiler Collection (GCC), Microsoft C (MS C), and Borland Turbo C
Compilation is a complex process, so let's talk about it
Compilation is a general term for the whole process, which has several stages: preprocessing, compilation and optimization, assembly, and linking
Preprocessing stage
The preprocessor handles directives (lines beginning with #) and special symbols, removes all comments, and finally generates a .i file
Directives include:
- Macro definition directives, such as #define Name TokenString, #undef, etc.
- Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, #endif, etc.
- Header file inclusion directives, such as #include "FileName" or #include <FileName>
- Special symbols: the preprocessor recognizes some special symbols, such as __FILE__ and __LINE__
You can produce the .i file with the following GCC command
gcc -E helloWorld.cpp -o helloWorld.i
At this point, the .i file has the comments removed, the macros expanded, and the header files pasted in, so it is much larger than the source file
There is too much content to paste here, so try it out yourself
Compilation and optimization stage
After confirming through lexical analysis, syntax analysis, and semantic analysis that the code conforms to the language rules, the compiler translates it into equivalent intermediate code or assembly code
Do not confuse lexical analysis with syntax analysis; in a campus recruitment interview, the interviewer grilled me on this for a long time
- Lexical analysis
The lexical analyzer recognizes tokens: it converts the stream of characters into a sequence of tokens
Tokens include keywords, identifiers, literals, operators, and delimiters
Why do this? Classifying the words in the code makes it easier for the later stages of the compiler to understand the code
- Syntax analysis
In the syntax analysis stage, the token stream is converted into a tree data structure that reflects the grammar rules, namely the abstract syntax tree (AST)
The AST reflects the syntactic structure of the program
For example, parsing the Hello World code produces an AST
Many people wonder why the program is converted into an AST
Because the compiler cannot understand the meaning of a statement directly the way a human does. The AST is more structured, and later stages can run various analyses on the tree
- Semantic analysis
Semantic analysis is, literally, understanding semantics: understanding what the program is supposed to do
For example, understand that the “+” sign performs addition, the “=” sign performs assignment, the “for” structure performs loop, and so on
So how do you understand that?
This stage performs context analysis, including reference resolution, type identification, and type checking
Reference resolution: determine the scope of a variable, that is, whether it is global or local
Type identification: for example, to execute a = 3, the type of variable a must be identified, because floating-point numbers and integers are operated on differently
Type checking: for example, whether int b = 3 is a valid assignment. The expression to the right of the equals sign must yield an integer, or a value that can be automatically converted to one, before it is assigned to the int variable b
The information obtained from semantic analysis (reference resolution information, type information) is attached to the AST, forming an annotated syntax tree, so that the compiler can better understand the semantics of the program
Syntax analysis gives us the program's abstract syntax tree, and semantic analysis gives us the annotated AST and the symbol table. You can then traverse the AST depth-first and execute the semantic rules of each node as you go
For an interpreted language, this traversal is itself the process of executing the code
Conceptually, an interpreted language such as Python can start executing once it has the annotated AST and symbol table (in practice CPython first compiles the AST to bytecode and runs that on its virtual machine)
Compiled languages such as C and C++ must go on to generate object code, whereas interpreted languages only need the interpreter to execute the semantics
In my campus recruitment interview, the interviewer saw how well I explained Hello World and asked whether executing Hello World in Java and in Python works the same way
I froze; I knew it was not the same but could not explain it clearly
- Code optimization
Different CPU architectures need different assembly code, and optimizing each kind of assembly code separately would make the process quite complicated
Therefore, a step is added before object code generation: the compiler first produces an intermediate representation (IR), optimizes it in one common place, and only then translates it into object code
Optimization falls mainly into local optimization, global optimization, and interprocedural optimization
Local optimization: for example, available expression analysis and liveness analysis
Global optimization: optimization based on the control flow graph (CFG)
Interprocedural optimization: optimization across function boundaries, among multiple functions
That was a bit dry, so here is an example to show how optimization works
Liveness analysis finds code whose results are never used, such as unused variables, so that it can be removed
- Object code generation
Object code generation is the translation of optimized IR code into assembly code
The main steps in translating into assembly code are
- Select suitable instructions so that the generated code performs as well as possible
- Optimize register allocation so that frequently used variables stay in registers
- Reorder instructions, without changing the program's result, to make full use of the CPU's parallelism
Command used for the compilation stage
gcc -S helloWorld.cpp -o helloWorld.s
Generated assembly code:
The GCC version information is as follows
Assembly stage
The assembly code generated in the compilation stage is still human-readable and cannot be executed directly by the machine. The assembler translates it into what the machine does execute, which is called machine code
Machine code is stored in object files
There are several types of object files in Unix environments:
- A relocatable file, which holds code and data suitable for linking with other object files to create an executable or a shared object file
- A shared object file, which holds code and data suitable for linking in two contexts: the link editor can combine it with other object files at build time, and the dynamic linker can combine it with an executable at run time
- An executable file, which holds a program that can be run by a process the operating system creates
Different operating systems have different executable file formats
- PE files on Windows
- ELF files on Linux
- Mach-O files on macOS
The assembler actually produces the first kind, a relocatable object file; the executable is produced only after linking completes
Linking stage
Linking combines the object files generated in the assembly stage into an executable file
Many people do not understand why linking is needed when the assembly stage has already produced object code
Think of system development, where we care about modularizing system functions; these days it is all microservices
A complex system is often divided into several subsystems, which are in turn broken down into functional modules
Linking is similar: a complex piece of software is broken into modules, each compiled independently
The process of assembling these modules as needed is called linking
For example, if the main function calls printf, main does not know the address of printf at compile time (each module is compiled separately)
But a call must know the address of the function in order to happen
The address is left as a placeholder at compile time and is fixed up at link time
When linking is complete, the result is an executable file, which on Linux is an ELF file
ELF and other file formats are a big enough topic on their own; we will come back to them later when we talk about file systems
How the program is loaded
Loading means reading an executable program into memory so that the CPU can then execute it
We often execute an executable program like this on the Linux command line
./a.out
This loads the program into memory and executes it right after loading
You can actually watch this happen with
strace ./a.out
This command shows every system call the program makes
You can see that the first system call executed is execve
You can see the description of this function in man execve
execve() executes the program pointed to by filename. filename must be either a binary executable, or a script starting with a line of the form:
#! interpreter [optional-arg]
In other words, the program file passed to execve() must be either a binary executable or a script that begins with a shebang line
A shebang is the #! at the very start of the file
View the source code for Linux execve as follows
The main execution falls on do_execve, so keep looking at the source code for do_execve
After computing parameters such as argv and env and copying the data, the loader calls search_binary_handler
The list_for_each_entry macro is important here: it iterates over the list of all registered binary formats to find one that can load the file
As mentioned earlier, the executable file format under Linux is ELF files
retval = fmt->load_binary(bprm);
For ELF files, load_binary is what loads the ELF file
If you look closely at where load_binary is initialized, you will see that for the ELF format it is assigned load_elf_binary
Now that you have the general flow, how does the loader tell that the file being loaded is an ELF file?
You can read the source to see how (it is too long to paste here, so I will just tell you where it is; those interested can look for themselves)
Source location:
static int load_elf_binary(struct linux_binprm *bprm)
in /fs/binfmt_elf.c, line 820
Run readelf -l a.out to see the executable's program header information
The INTERP entry in the Program Headers names the program interpreter (the dynamic linker); whether it is present tells the loader if the executable is dynamically linked
CPU executes the program
The CPU executes a program in the following steps:
- The CPU reads the instruction that the program counter (PC) points to into the instruction register, called fetch.
- The CPU analyzes the instruction in the instruction register to determine its type and parameters, called decode.
- If it is a computational instruction, it is handed to the arithmetic logic unit (ALU) to compute; if it is a storage instruction, it is carried out by the control unit. This is called execute.
- The result is written back to a register, or register data is stored to memory, called store.
- The PC is incremented, ready to fetch the next instruction
The steps above form one cycle, known as the CPU instruction cycle; the CPU works cycle after cycle.
For more about CPU execution, see Kobayashi's article "Aren't you curious about how the CPU performs tasks?"
Or stay tuned, and I will write about CPU execution and scheduling later
Results output
On Unix systems, each process has three standard I/O streams open by default: stdin, stdout, and stderr
printf source
This is only the top layer of the source. If you look at the vfprintf implementation, you will see that underneath it uses buffered output
Output is an I/O operation: the data is transferred out to an external device through the file system
Conclusion
That is basically the end of Hello World, though not necessarily the end of the story
For example, file systems, I/O, CPU scheduling, process management, memory management, and so on cannot be explained thoroughly in one article
To be honest, a tiny Hello World hides a great deal of what is taught at university
Today I only covered the big picture; the details will follow one by one in the operating system series
I’m Uncle Long, and I’ll see you next time