Alibaba: the interviewer asked me to talk about Hello World

Less code, more hair

This article is included in my GitHub repository; stars and issues are welcome.

Github.com/midou-tech/…

Interviewers love to ask about "Hello World", especially in campus recruitment; I have run into it three times myself.

Many things that seem natural and simple are backed by a whole body of complex knowledge.

I remember it clearly: the first time I interviewed at Alibaba, the interviewer asked me to write a Hello World program.

I asked three times to make sure, and each time the interviewer calmly said yes.

Once I had written it, I was asked to explain Hello World, and we spent an hour on that one Hello World.

It was an internship interview, and afterwards I seriously questioned my life choices.

This question is a great test of a candidate's grasp of computer fundamentals, ability to learn independently, and ability to dig into a problem.

To answer it well, you need a solid command of computer fundamentals, operating systems, compiler principles, and more.

Before we go on: if you haven't followed me yet, please follow, like, and share.

The code itself is so simple that you would think nothing could possibly go wrong with it, right?

I'm not ashamed to admit it: the first time Uncle Long wrote this code, it took me three or four attempts to get such a simple program right.

I hit Run and found that I had left out the header file.

I added it and ran again, only to find a missing semicolon (;) at the end of a statement.

When I added that, I was missing return 0.

After several rounds, the console finally printed "hello World!!", and I was so excited that I laughed out loud.

Feeling proud, I struck while the iron was hot and wrote a second version.

Both versions are written in C. C is part of almost every university's introductory curriculum, and it lets us see clearly what happens underneath.

Running results:

My nephew was very curious about how "hello world" gets printed to the screen.

Uncle Long was curious about it too, but only after finishing the C language course.

In the von Neumann architecture, a computer is made of five basic components: input devices, memory, the arithmetic/logic unit, the control unit, and output devices. Running a program uses them like this:

A program is first entered through input devices such as the mouse and keyboard.
The code you write in a text file has to be saved: it sits in memory while you edit it and is stored on disk when you save it.
When you click Run, your code is read into memory and compiled into an executable by the compiler.
The operating system then starts a user process that runs the compiled executable.
The CPU carries out the program logic and sends the results to the output device, the display.

Each part has its own job and does it well; in system design this is called clear modularity and functional completeness.

Let's walk through Hello World from several angles, enough to leave your interviewer stunned.

Entering the code

Start the IDE
Type the code on the keyboard
After checking the code, click Run

Something as simple as typing in code, and Uncle Long still has something to say about it??

The figure above summarizes the main components involved in input: the keyboard, the host (CPU, memory, disk), and the display.

Entering code seems simple enough: it starts with an editor or an IDE.

An IDE is recommended when you are starting out, but of course you can write code without one.

Any text editor can enter code

An IDE (Integrated Development Environment) bundles a code editor, compiler, debugger, and graphical user interface.

For example, to write C/C++ for class you might download VC++, Dev-C++, Visual Studio, CLion, and so on. Good tools improve productivity.

I used to use CLion. Pick an IDE according to your needs; whatever feels good to use is fine.

What does it mean to start an IDE?

An IDE is a piece of highly integrated software, and starting it means the operating system starts a process for it: the IDE process.

Because it integrates many tools, it typically runs many threads, each responsible for one of the integrated modules.

Processes and threads will be covered in detail in a later article, so I won't expand on them here.

IDE processes are managed and scheduled by the operating system

When you pound away at the keyboard, how does the code get into the IDE?

To understand this question, let’s talk about how the keyboard works

The basic job of the keyboard is to monitor the keys in real time and send key information to the computer.

Inside the keyboard, a key-scanning circuit locates which key is pressed. When a key is pressed, an encoding circuit generates a code for it, and that code is fed into the interface circuit, known as the keyboard control circuit.

By working principle, keyboards are divided into encoded and non-encoded keyboards.

Encoded keyboard: the keyboard control circuit works entirely in hardware, automatically identifying the pressed key and producing its code.

Non-encoded keyboard: the keyboard control function is split between hardware and software.

Keys are detected by scanning the voltage levels on the key matrix; the scanning is done either row by row or by row-column scanning.

That's how the keyboard works; from now on I'll feel even more powerful hammering away at full speed.

But this only gets the input as far as the keyboard driver. How does an application get the input data?

A keyboard daemon retrieves the results and stores them in its own shared memory, and the application reads the keyboard input from there.

As the figure above shows, keyboard input is a form of I/O. I won't expand on I/O here; it will be covered in a later article.

At this point the IDE has received the code typed on the keyboard, and your Hello World code finally shows up on the monitor.

How does the code sitting in the IDE get run?

Compiling the code into an executable program

Once the code is finally typed in, you're excited to run it and can't wait to see the result.

Wait: the code we write is called source code, while the CPU executes machine code, and a file containing machine code is called an executable.

How does source code become an executable program

Because IDEs are integrated environments, beginners easily assume the source code is executed directly by the CPU.

It’s not

Source code must be compiled by a compiler to become a binary executable program

IDEs integrate a compiler and a debugger. The main C compilers include GCC from the GNU Compiler Collection, Microsoft C (MS C), and Borland Turbo C.

Compilation is a complex process, so let's go through it.

"Compilation" is an umbrella term for a process with several stages: preprocessing, compilation and optimization, assembly, and linking.

Preprocessing stage

The preprocessor handles the directives (lines beginning with #) and special symbols, removes all comments, and finally produces a .i file.

Directives include:

Macro definition directives, such as #define Name TokenString, #undef, etc.
Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, #endif, etc.
File inclusion directives, such as #include "FileName" or #include <FileName>
Special symbols: the preprocessor also recognizes some special symbols, such as __FILE__ and __LINE__

You can produce the .i file with this GCC command:

gcc -E helloWorld.cpp -o helloWorld.i

The resulting .i file has the comments removed, the macros replaced, and the contents of the header files pasted in, so it is much larger than the source file.

It is far too long to paste here, so try it yourself.

Compilation and optimization stage

The compiler performs lexical analysis, syntax analysis, and semantic analysis; after confirming that every statement follows the rules of the language, it translates them into equivalent intermediate code or assembly code.

Don't confuse lexical analysis with syntax analysis: in a campus recruitment interview, an interviewer grilled me on exactly this for a long time.

Lexical analysis

The lexical analyzer (lexer) scans the character stream and converts it into tokens.

Tokens include keywords, identifiers, literals, operators, and delimiters

Why do this? Once the characters are grouped and classified into words, the later stages of the compiler can work with the code much more easily.

Syntax analysis

In the syntax analysis stage, the token stream is converted into a tree that reflects the grammar rules: the abstract syntax tree (AST).

The AST reflects the syntactic structure of the program.

For example, parsing the Hello World code produces an AST.

Many people wonder why the program needs to be converted into an AST at all.

Because a compiler cannot grasp the meaning of a statement directly the way a human does. The AST is a structured representation, and later stages can run all kinds of analyses on the tree.

Semantic analysis

Semantic analysis, as the name says, is about understanding meaning: what the program actually does.

For example, understanding that "+" performs addition, "=" performs assignment, a "for" structure performs a loop, and so on.

So how do you understand that?

This stage performs context analysis, including reference resolution, type identification, and type checking.

Reference resolution: work out which declaration a name refers to and what its scope is, for example whether a variable is global or local.

Type identification: for example, when executing a = 3, the compiler must know the type of variable a, because integer and floating-point arithmetic use different operations.

Type checking: for example, for int b = 3 to be valid, the expression on the right of the equals sign must yield an integer, or a value that can be implicitly converted to an integer, before it is assigned to the int variable b.

The information gathered during semantic analysis (reference resolution and type information) is attached to the AST, producing an annotated syntax tree that lets the compiler better understand the program's semantics.

With the AST from syntax analysis, plus the annotations and symbol table from semantic analysis, you can traverse the AST depth-first and apply each node's semantic rules as you go.

For an interpreted language, this traversal can itself be the execution of the code.

A tree-walking interpreter starts executing by traversing the annotated AST and the symbol table. (CPython, the standard Python implementation, actually compiles the AST into bytecode first and then interprets the bytecode.)

Compiled languages such as C and C++, by contrast, must generate object code, whereas an interpreted language only needs the interpreter to carry out the semantics.

In a campus recruitment interview, seeing how well I explained Hello World, the interviewer asked whether Java and Python execute Hello World the same way.

I froze for a moment: I knew they were different but couldn't explain it clearly.

Code optimization

Different CPU architectures need different assembly code, and optimizing each kind of assembly code separately would be very complicated.

So an extra step is added before object code generation: the compiler first produces an intermediate representation (IR), optimizes it in one architecture-independent place, and then translates it into object code.

Optimizations fall mainly into local, global, and interprocedural optimization.

Local optimization: optimization within a basic block, for example using expression analysis and liveness analysis.

Global optimization: optimization across a whole function, based on its control flow graph (CFG).

Interprocedural optimization: optimization across multiple functions.

That's a bit dry, so here is an example of how optimization works.

Liveness analysis finds values that are never used afterwards, so the code that computes them, such as assignments to unused variables, can be removed.

Object code generation

Object code generation is the translation of optimized IR code into assembly code

The main steps in translating to assembly code are:

Instruction selection: choose suitable instructions to generate the most efficient code
Register allocation: keep frequently used variables in registers
Instruction scheduling: reorder instructions without changing the program's result, to make full use of the CPU's parallelism

The command for this step:

gcc -S helloWorld.cpp -o helloWorld.s

Generated assembly code:

The GCC version information is as follows

Assembly stage

The assembly code generated above is still human-readable text and cannot be executed directly by the machine; the assembler translates it into binary machine code.

Machine code is stored in object files.

In Unix environments there are several types of object files:

A relocatable file, holding code and data suitable for linking with other object files to create an executable or a shared object file
A shared object file, holding code and data suitable for linking in two contexts: by the static linker with other object files, and by the dynamic linker with an executable at run time
An executable file, holding a program that the operating system can run in a newly created process

Different operating systems have different executable file formats

Windows: PE files
Linux: ELF files
macOS: Mach-O files

The assembler actually produces the first type, a relocatable object file; the executable is produced only after linking.

Linking stage

Linking combines the object files produced by the assembly stage into an executable file.

Many people don't understand why linking is needed at all, since the assembly stage has already produced object code.

Consider system development: we care about modularizing system functionality, and nowadays everything is microservices.

A complex system is usually split into subsystems, which are in turn broken down into functional modules.

Linking follows the same idea: a complex program is split into modules, each compiled independently.

Putting these modules together as needed is called linking.

For example, if main calls printf, main does not know the address of printf at compile time (each module is compiled separately).

But a call needs the target function's address to actually happen.

So the address is left as a placeholder at compile time and filled in (relocated) at link time.

When linking is complete, the result is an executable file; on Linux, this is an ELF file.

ELF and other file formats are a big topic in themselves; I'll come back to them when I talk about file systems.

How the program is loaded

Loading means bringing an executable program into memory so the CPU can execute it.

On the Linux command line we usually run an executable like this:

./a.out

This loads the program into memory and runs it immediately.

You can also run

strace ./a.out

to see every system call the program makes.

You can see that the first system call executed is execve

The description of this function is in man execve:

execve() executes the program pointed to by filename. filename must be either a binary executable, or a script starting with a line of the form:

#! interpreter [optional-arg]

In other words, the file passed to execve() must be either a binary executable or a script that begins with a shebang line.

The shebang is the #! at the very beginning of the file.

Here is the Linux source for execve:

The main work is done in do_execve, so let's keep reading the source of do_execve.

After computing parameters such as argv and envp and copying that data, the loader calls search_binary_handler.

The list_for_each_entry macro is important here: it iterates over the list of all registered binary formats to find one that can load the file.

As mentioned earlier, the executable file format under Linux is ELF files

retval = fmt->load_binary(bprm);

For ELF files, load_binary points to load_elf_binary: if you look closely at the source, you will see that the elf_format structure initializes its load_binary member with load_elf_binary.

Now that you've got the idea, how exactly is an ELF file loaded?

You can read the source to see how it is done. It is too long to paste here, so here is where to find it if you're interested.

Source location:

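static int load_elf_binary(struct linux_binprm *bprm)

in fs/binfmt_elf.c, line 820 (the exact line depends on the kernel version)

Run readelf -l a.out to see the executable's program headers.

The INTERP entry in the program headers names the program interpreter, i.e. the dynamic linker (for example /lib64/ld-linux-x86-64.so.2 on x86-64 glibc systems), which the kernel loads to finish loading a dynamically linked executable.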

How the CPU executes the program

The CPU executes a program in the following steps:

The CPU reads the instruction that the PC points to into the instruction register, called fetch.
It analyzes the instruction in the instruction register to determine its type and operands, called decode.
A computational instruction is handed to the arithmetic logic unit (ALU); a storage instruction is carried out by the control unit. This is called execute.
The result is written back to a register, or register data is stored to memory, called store.
The PC is incremented, ready to fetch the next instruction.

These steps form one cycle, known as the CPU instruction cycle; the CPU works cycle after cycle.

For more on how the CPU executes programs, see Kobayashi's article "Aren't you curious about how the CPU performs tasks?"

Or stay tuned: I'll write about CPU execution and scheduling later.

Output of the result

On Unix systems, every process has three standard I/O streams open by default: stdin, stdout, and stderr, on file descriptors 0, 1, and 2.

The printf source:

This is only the top layer; if you dig into the vfprintf implementation, you will see that it uses buffered output underneath.

In the end, output is a transfer of data out to an external file; on Unix, the terminal is itself a file, reached through the stdout descriptor.

Conclusion

That basically wraps up Hello World, though it is not the end of the story.

File systems, I/O, CPU scheduling, process management, memory management, and more cannot be explained thoroughly in a single article.

Honestly, a tiny Hello World hides a great deal of knowledge.

Today I've only covered the big picture; I'll go through the operating system details one by one in later articles.

I’m Uncle Long, and I’ll see you next time