Method call:
Methods are the basic unit of program composition and the first level of encapsulation over atomic instructions, so a computer must support method calls. The atomic instructions of the Java language are bytecodes, which are encapsulated in Java methods, so the JVM must support calls to Java methods.
Instruction fetch:
A method packages atomic instructions. Once the computer has entered a method, it must fetch these instructions and execute them one by one. Likewise, once the JVM has entered a Java method, it must emulate a hardware CPU and fetch the method's bytecode instructions one by one.
Operation:
Having fetched an instruction, the computer performs the logic operation that the instruction specifies and so carries out its function.
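The fetch and operate steps can be sketched as a small interpreter loop. This is only an illustration, not how a real JVM is implemented: the function name `interpret`, the fixed-size operand stack, and the `-1` error value are assumptions made for the sketch. The four opcode values are the ones the JVM specification assigns to `iconst_3`, `iconst_5`, `iadd` and `ireturn`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values of four JVM instructions. */
enum {
    OP_ICONST_3 = 0x06, /* push the int constant 3 */
    OP_ICONST_5 = 0x08, /* push the int constant 5 */
    OP_IADD     = 0x60, /* pop two ints, push their sum */
    OP_IRETURN  = 0xac  /* return the int on top of the operand stack */
};

enum { MAX_OPERANDS = 16 }; /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: runs one method body and returns its int result.
   Returns -1 for an unknown opcode, a stack error, or a missing ireturn
   (so -1 is ambiguous with a genuine result of -1 in this toy version). */
int32_t interpret(const uint8_t *code, size_t length)
{
    int32_t operands[MAX_OPERANDS];
    size_t depth = 0; /* number of values on the operand stack */
    size_t pc = 0;    /* index of the next bytecode to fetch */

    assert(code != NULL);
    while (pc < length) {
        uint8_t opcode = code[pc++];          /* fetch */
        switch (opcode) {                     /* decode and operate */
        case OP_ICONST_3:
        case OP_ICONST_5:
            if (depth == MAX_OPERANDS) return -1;
            operands[depth++] = (opcode == OP_ICONST_3) ? 3 : 5;
            break;
        case OP_IADD:
            if (depth < 2) return -1;
            operands[depth - 2] += operands[depth - 1];
            depth--;
            break;
        case OP_IRETURN:
            if (depth < 1) return -1;
            return operands[depth - 1];
        default:
            return -1;
        }
    }
    return -1;
}
```

Feeding it the four bytes `0x08, 0x06, 0x60, 0xac` pushes 5 and 3, adds them, and returns 8.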
Method calls
As a virtual machine, the JVM must be able to execute Java programs in full, so it must be able to execute individual Java functions and to make function calls. What the JVM ultimately calls is not a Java function as such but a sequence of machine instructions. (See Note 1: Java bytecode instructions are dynamically translated into native machine instructions at runtime.)
1. Real machine calls
First, some background on how function calls work on a real machine. Quite a lot is involved: saving the execution context, allocating stack space, passing parameters, and so on. Start with the routine given in the book:
Summation in assembly (I could only follow part of it; I have not studied assembly language yet, embarrassingly…)
main:
    pushl %ebp
    movl %esp, %ebp
    subl $32, %esp
    movl $5, 20(%esp)
    movl $3, 24(%esp)
    movl 24(%esp), %eax
    movl %eax, 4(%esp)
    movl 20(%esp), %eax
    movl %eax, (%esp)
    call add
    movl %eax, 28(%esp)
    movl $0, %eax
    leave
    ret
add:
    pushl %ebp
    movl %esp, %ebp
    subl $16, %esp
    movl 12(%ebp), %eax
    movl 8(%ebp), %edx
    addl %edx, %eax
    movl %eax, -4(%ebp)
    movl -4(%ebp), %eax
    leave
    ret
Analysis of the program: this assembly listing defines two labels, main and add. A label is similar to a function in C, so treat each one as a function. Pay particular attention to how memory changes.
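For orientation, here is a rough C counterpart of the two labels. It is a sketch rather than the source the listing was compiled from: the name `main_equivalent` and the `result_out` parameter are hypothetical, added only so that the value main keeps at 28(%esp) can be observed from outside.

```c
#include <assert.h>
#include <stddef.h>

/* Counterpart of the add label: sums its two parameters through a local,
   the way the listing stores the sum at -4(%ebp) before returning it. */
int add(int a, int b)
{
    int sum = a + b;
    return sum;
}

/* Hypothetical stand-in for the main label. */
int main_equivalent(int *result_out)
{
    int a = 5;              /* the value stored at 20(%esp) */
    int b = 3;              /* the value stored at 24(%esp) */
    int result;

    assert(result_out != NULL);
    result = add(a, b);     /* call add, then save EAX at 28(%esp) */
    *result_out = result;
    return 0;               /* movl $0, %eax */
}
```

Calling `main_equivalent(&r)` returns 0 and leaves 8 in `r`.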
(1) Detailed explanation of main function
main:
    // Save the caller's stack base address and allocate new stack space for main()
    pushl %ebp
    movl %esp, %ebp
    subl $32, %esp          // allocate the new stack, 32 bytes in total
    // Initialize two values, 5 and 3
    movl $5, 20(%esp)
    movl $3, 24(%esp)
    // Push the arguments
    movl 24(%esp), %eax
    movl %eax, 4(%esp)
    movl 20(%esp), %eax
    movl %eax, (%esp)
    // Call add() and save its return value
    call add
    movl %eax, 28(%esp)
    // Return
    movl $0, %eax
    leave
    ret
The process above consists of five steps: save the caller's stack base address and allocate a new stack, initialize the data, push the arguments, call the function, and return.
1. Save the stack base and allocate a new stack
// Save the caller's stack base address and allocate new stack space for main()
pushl %ebp
movl %esp, %ebp
The pushl %ebp instruction saves the caller's stack base address on the stack; here the caller is the operating system. movl %esp, %ebp then sets %ebp, the base of main's own stack, to the current top of the stack.
Next, subl $32, %esp allocates the stack space: it subtracts 32 bytes from the current top of the stack. On Linux the stack grows downward, from high memory addresses toward low ones, and stack space must be allocated each time a new function is called. The top of the new function's stack therefore lies at a lower address than the top of the caller's stack, which is why it is always computed by subtracting from the caller's stack top.
A byte contains 8 bits and an int occupies 4 bytes, so the 32 bytes allocated for main can hold 8 ints in total, as shown in the figure below:
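The arithmetic behind the allocation can be written out in a few lines. This is a minimal sketch: the helper names `allocate_frame` and `int_slots` are hypothetical, and any starting address used with them (such as 0x1000) is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the new stack top after reserving frame_bytes,
   mirroring subl $32, %esp on a stack that grows toward lower addresses. */
uint32_t allocate_frame(uint32_t esp, uint32_t frame_bytes)
{
    assert(frame_bytes <= esp); /* the stack cannot grow below address 0 */
    return esp - frame_bytes;
}

/* Number of 4-byte int slots that fit in a frame of frame_bytes. */
uint32_t int_slots(uint32_t frame_bytes)
{
    return frame_bytes / 4u;
}
```

With a stack top of 0x1000, reserving 32 bytes moves the top down to 0x0FE0, and `int_slots(32)` gives the 8 slots mentioned above.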
2. Initialize data
The next two instructions for main are:
movl $5, 20(%esp)
movl $3, 24(%esp)
These two instructions store the integers 5 and 3 on main()'s stack. 20(%esp) means the address 20 bytes above the current top of the stack (that is, 20 bytes toward higher addresses from the memory address the ESP register currently points to); the value 5 is saved there. Similarly, the integer 3 is stored 24 bytes above the top of main()'s stack. As shown in the figure:
The top of main's stack is written (%esp). Main's method stack is 32 bytes, divided into 4-byte units, and each unit is labeled with its offset from (%esp). The figure shows where 5 and 3 sit in main's stack:
Note: The slot at the bottom of the stack, 28(%esp), is reserved for the return value of the call to add().
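Since the layout is easier to follow with something concrete, the sketch below models main's 32-byte stack as a byte array in which byte 0 is the stack top, (%esp). The `Frame` type and the helpers `store32`, `load32` and `init_main_locals` are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 32

/* Hypothetical model of main's stack: bytes[0] is the stack top, (%esp). */
typedef struct {
    uint8_t bytes[FRAME_BYTES];
} Frame;

/* movl $value, offset(%esp): store a 32-bit value offset bytes above the top. */
void store32(Frame *frame, size_t offset, int32_t value)
{
    assert(offset + sizeof value <= FRAME_BYTES);
    memcpy(frame->bytes + offset, &value, sizeof value);
}

/* movl offset(%esp), reg: load the 32-bit value stored at that offset. */
int32_t load32(const Frame *frame, size_t offset)
{
    int32_t value;
    assert(offset + sizeof value <= FRAME_BYTES);
    memcpy(&value, frame->bytes + offset, sizeof value);
    return value;
}

/* The two initialization instructions of main, applied to a zeroed frame. */
void init_main_locals(Frame *frame)
{
    memset(frame->bytes, 0, FRAME_BYTES);
    store32(frame, 20, 5); /* movl $5, 20(%esp) */
    store32(frame, 24, 3); /* movl $3, 24(%esp) */
}
```

After `init_main_locals`, loading from offsets 20 and 24 yields 5 and 3, while the reserved slot at offset 28 still holds 0.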
3. Push the arguments
The main function then executes:
movl 24(%esp), %eax
movl %eax, 4(%esp)
movl 20(%esp), %eax
movl %eax, (%esp)
These four instructions push the arguments. movl 24(%esp), %eax moves the value stored at 24(%esp), namely 3, into the EAX register. movl %eax, 4(%esp) then moves the value in EAX to 4(%esp). The remaining two instructions copy the value 5 from 20(%esp) to (%esp) in the same way. The movl instruction cannot move data directly from one memory location to another, so a register must serve as the intermediary.
Generally speaking, placing data at the top of the stack is called "pushing". The purpose of pushing here is to pass parameters: on a real physical machine, a function call must be preceded by pushing its arguments.
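The four moves can be mimicked with an array and a temporary that plays the role of EAX. This is a sketch under the assumption that `frame[i]` stands for the int at offset 4*i from %esp; the function name `push_arguments` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { SLOT_COUNT = 8 }; /* 32 bytes / 4 bytes per int */

/* Copies the locals at 20(%esp) and 24(%esp) into the argument slots at
   (%esp) and 4(%esp), always going through a register-like temporary. */
void push_arguments(int32_t frame[SLOT_COUNT])
{
    int32_t eax; /* stands in for the EAX register */

    assert(frame != 0);
    eax = frame[24 / 4]; /* movl 24(%esp), %eax */
    frame[4 / 4] = eax;  /* movl %eax, 4(%esp)  */
    eax = frame[20 / 4]; /* movl 20(%esp), %eax */
    frame[0] = eax;      /* movl %eax, (%esp)   */
}
```

Starting from a frame whose slots 5 and 6 hold 5 and 3, the call leaves 5 in slot 0 and 3 in slot 1, and the original locals are unchanged.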
4. Function calls
After pushing the arguments, main makes the function call with the call add instruction. When add() finishes, its result is in the EAX register, so to obtain the return value main simply reads EAX. It therefore executes movl %eax, 28(%esp) next. The method stack memory then looks as follows:
The compiler assigns local variables within a method near the bottom of the stack and passed parameters near the top.
5. Return
movl $0, %eax
leave
ret
To return, main saves its return value (0) in the EAX register and then executes the two return instructions, leave and ret.
(2) Detailed explanation of the add() function
The add() call generally has four steps: save the caller’s stack base address, read the parameters, perform the operation, and return.
1. Save the caller stack base address
pushl %ebp
movl %esp, %ebp
These two instructions save the caller's stack base address. On a physical machine, the callee must always do this when a function is called. The ESP and EBP registers originally hold the caller's stack top and stack base, and they are about to be changed to point at the callee's stack top and stack base. If the caller's values were not saved, then when the callee finished and the program returned to the caller, the physical machine could not restore the caller's stack base and stack top and so could not continue execution.
Next, 16 bytes of stack space are allocated for the add() function with subl $16, %esp. The memory structure is shown as follows:
In the figure, the slot labeled EIP arises because when main executes the call add instruction, the physical machine automatically pushes a value, EIP, onto the top of the stack. The address of the instruction the CPU executes is determined jointly by the CS:IP register pair, and EIP is that IP register. It is pushed so that execution can continue with main's subsequent instructions after the call returns. The next slot, EBP, is pushed when the add function starts executing.
The conclusions are as follows:
- When the physical machine executes a call instruction, it automatically pushes EIP onto the stack.
- When the physical machine makes a function call, the callee must explicitly push EBP onto the stack.
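The two conclusions can be traced on a toy model of the machine. Everything here is an assumption made for illustration: the `Machine` struct, its 64-word memory, and the helpers `push32`, `read32`, `do_call` and `callee_prologue` are hypothetical, and the return address is just a number rather than a real instruction address.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

/* Hypothetical 32-bit machine: memory holds 4-byte words, and esp and ebp
   are byte addresses into it. */
typedef struct {
    uint32_t memory[STACK_WORDS];
    uint32_t esp;
    uint32_t ebp;
    uint32_t eip;
} Machine;

/* pushl: move the stack top down one word, then store the value there. */
void push32(Machine *m, uint32_t value)
{
    assert(m->esp >= 4 && m->esp <= STACK_WORDS * 4 && m->esp % 4 == 0);
    m->esp -= 4;
    m->memory[m->esp / 4] = value;
}

/* Read the word at a byte address such as 8(%ebp). */
uint32_t read32(const Machine *m, uint32_t address)
{
    assert(address % 4 == 0 && address / 4 < STACK_WORDS);
    return m->memory[address / 4];
}

/* call: the machine itself pushes the return address (EIP), then jumps. */
void do_call(Machine *m, uint32_t target)
{
    push32(m, m->eip);
    m->eip = target;
}

/* The callee's own first two instructions: pushl %ebp; movl %esp, %ebp. */
void callee_prologue(Machine *m)
{
    push32(m, m->ebp);
    m->ebp = m->esp;
}
```

If the arguments 3 and 5 are placed on the stack in that order before `do_call` and `callee_prologue` run, the saved EBP ends up at (%ebp), the return address at 4(%ebp), and the two arguments at 8(%ebp) and 12(%ebp). The listing stores its arguments with movl into space that is already allocated rather than with pushl, but the resulting layout is the same.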
2. Read the parameters
movl 12(%ebp), %eax
movl 8(%ebp), %edx
The first instruction uses two registers, EBP and EAX. The EBP register is similar to ESP, except that it marks the bottom of the stack. The instruction fetches the data at an offset of 12 bytes from the bottom of add's stack and passes it to the EAX register. Similarly, the second instruction fetches the data at an offset of 8 bytes from the bottom of add's stack and places it in the EDX register.
3. Perform the operation
addl %edx, %eax
This instruction adds the value in the EDX register to the value in the EAX register and puts the result in EAX. Before add executes it, the two arguments passed by main have been read into EAX and EDX respectively, so the sum of the two registers is the sum of the two arguments.
After the summation, add executes movl %eax, -4(%ebp), which moves the value in EAX to the location four bytes below the stack base address. This is the first slot of add's method stack.
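The body of add can be replayed in C by treating a pointer as EBP. This is a sketch only: the function name `add_body` is hypothetical, and `ebp[i]` is assumed to stand for the word at offset 4*i from %ebp, so `ebp[-1]` is -4(%ebp).

```c
#include <assert.h>
#include <stdint.h>

/* Replays the instructions of add that read the arguments, sum them, and
   store the sum in add's first local slot. */
int32_t add_body(int32_t *ebp)
{
    int32_t eax;
    int32_t edx;

    assert(ebp != 0);
    eax = ebp[3];       /* movl 12(%ebp), %eax */
    edx = ebp[2];       /* movl 8(%ebp), %edx  */
    eax = edx + eax;    /* addl %edx, %eax     */
    ebp[-1] = eax;      /* movl %eax, -4(%ebp) */
    eax = ebp[-1];      /* movl -4(%ebp), %eax */
    return eax;         /* the return value travels in EAX */
}
```

The caller must pass a pointer into the middle of an array so that both `ebp[-1]` and `ebp[3]` are valid. With 5 at `ebp[2]` and 3 at `ebp[3]`, the function returns 8 and leaves 8 in `ebp[-1]`.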
4. Return
The next step is to return. If there is a result, it is placed in the EAX register, here by movl -4(%ebp), %eax, and then the leave and ret instructions are executed. A function without a return value executes only leave and ret.
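What leave and ret do to the registers can be sketched as follows. On x86, leave behaves like movl %ebp, %esp followed by popl %ebp, and ret pops the return address into EIP. The `Registers` struct and the function names `do_leave` and `do_ret` are hypothetical, and memory is again modeled as an array of 4-byte words indexed by byte address / 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file for the sketch. */
typedef struct {
    uint32_t esp;
    uint32_t ebp;
    uint32_t eip;
} Registers;

/* leave: discard the callee's locals, then restore the caller's EBP. */
void do_leave(Registers *r, const uint32_t *memory)
{
    assert(r->ebp % 4 == 0);
    r->esp = r->ebp;              /* movl %ebp, %esp */
    r->ebp = memory[r->esp / 4];  /* popl %ebp ...   */
    r->esp += 4;                  /* ... pops one word */
}

/* ret: pop the return address that call pushed and jump to it. */
void do_ret(Registers *r, const uint32_t *memory)
{
    assert(r->esp % 4 == 0);
    r->eip = memory[r->esp / 4];
    r->esp += 4;
}
```

For example, with a saved EBP of 256 stored at address 240 and a return address of 100 stored at address 244, a callee whose EBP is 240 and ESP is 224 ends up, after `do_leave` and `do_ret`, with EBP restored to 256, EIP set to 100, and ESP at 248, where the caller's arguments begin.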
From the analysis above, a physical machine performs the following operations when making a function call:
- The parameters are pushed. As many arguments as the function takes are pushed, and different platforms push them in different orders.
- The code pointer (EIP) is pushed so that the physical machine can come back and continue executing the caller's instructions after the called function has finished.
- The caller's stack base address is pushed. This prepares the physical machine to return from the callee to the caller.
- Stack space is allocated for the callee. Each function has its own stack space.
When a physical machine executes a program, the program is divided into functions, each corresponding to a piece of machine code. A program's machine code is stored in a contiguous region of memory called the code segment. The physical machine allocates a method stack for each function, and the method stack's addresses have no connection to the code segment. A function is allocated a method stack only when the physical machine executes it.
Ps: The next section will look at function calls in C