The best time to plant a tree was ten years ago, followed by now.
0x00 several concepts interpreter instruction register stack local variable table operand stack 0x01 register or stack scenario register scheme stack scheme mixed scheme, Combine register and stack scheme (i.e., JVM scheme) simple comparison 0x02 interpreter core 0x03 simple code implementation register stack mix about the above code 0x04 summary 0x05 preview 0x06 FAQ
0x00 several concepts
The interpreter
An interpreter is a program that interprets a programming language line by line. The interpreter acts like a middleman. Every time you run a program, you have to switch to another language, so the interpreter’s programs run slowly. It does not translate the entire program at once, but runs it as soon as it translates a line of program description, then translates the next line, runs it again, and so on.
instruction
Some related concepts, assembly instructions, JVM bytecode instructions. Instructions are usually simple and describe a specific operation. For example, the assembly instruction mov &ex, 1 => puts the integer 1 in register EX. The bytecode instruction bpush 1 => puts Byte 1 at the top of the operand stack.
register
Simply put, a register is a Map. It can be accessed and valued by its key. Main operations get(key), PUT (key,value)
The stack
Push: pushes a value to the top of the stack and increases the stack size by 1. Pop: removes a value from the top of the stack and decreases the stack size by 1.
Local variable scale
The data structure inside the stack frame is an array. LLDB Array [0], array[1]. From another perspective, you can actually think of it as a special register. It’s just that key is a subscript.
The operand stack
Stack frame internal data structure, same stack.
0x01 Register or stack
Technology design outside of the business scenario is rogue. — Nicholas. Zhao four
scenario
The pseudocode is shown below.
int a = 1 + 1;
int b = 2 + 2;
int c = 0;
int d = b - a;
int d = d - c;
println(d)
Copy the code
2. The premise of automation is to be manual. So let’s do it.
Register scheme
Generate the instruction format, inst [value | address] + um participant
mov &0 1= > the number1Put in register0 里
add &3 &0 &1 &2=> Add the register0Register,1Register,2Add the values in, and place the result in a register3In the water.
sub &3 &0 &1 &2=> Add the register0In minus the register1Register,2And put the result into a register3In the water.
println &3=> Take out the register3Value of and output.
Copy the code
The command code generated is as follows. For ease of understanding, a comment follows a double slash. Indicates the state of a register or stack after an operation
mov &0 1 // {0:1}
mov &1 1 // {0:1, 1:1}
add &2 &0 &1 // {0:1, 1:1, 2:2}
mov &3 2 // {0:1, 1:1, 2:2, 3:2}
mov &4 2 // {0:1, 1:1, 2:2, 3:2, 4:2}
add &5 &3 &4 // {0:1, 1:1, 2:2, 3:2, 4:2, 5:4}
mov &6 0 // {0:1, 1:1, 2:2, 3:2, 4:2, 5:4, 6:0}
sub &7 &5 &2 // {0:1, 1:1, 2:2, 3:2, 4:2, 5:4, 6:0, 7:2}
sub &8 &7 &6 // {0:1, 1:1, 2:2, 3:2, 4:2, 5:4, 6:0, 7:2, 8:2}
println &8 // {0:1, 1:1, 2:2, 3:2, 4:2, 5:4, 6:0, 7:2, 8:2}
Copy the code
Stack solution
Generate command format inst [value]{0,1} LLDB
push 1= > will value1Push to the top of the stack. - > (.. .1)
Add => add the two values at the top of the stack and place the result at the top of the stack. ,v1,v2) -> (.. ,v1+v2)
Sub => assume the top stack value is v2, (.. ,v1,v2) -> (.. ,v1-v2)
Swap => swap the top two elements (v1,v2) -> (v2,v1)
Swap1 => second and third positions on switch stack (v1,v2,x) -> (v2,v1,x)
Println => the top value of the stack is sent out and the result is printed. .1) - > (..)
Copy the code
The command code generated is as follows. For ease of understanding, a comment follows a double slash. Indicates the state of a register or stack after an operation
push 1 // (1)
push 1 // (1, 1)
add // (2)
push 2 // (2, 2)
push 2 // (2, 2, 2)
add // (2, 4)
push 0 // (2, 4, 0)
swap // (2, 0, 4)
swap1 // (0, 2, 4)
swap // (0, 4, 2)
sub // (0, 2)
swap // (2, 0)
sub // (2)
println // ()
Copy the code
Hybrid schemes, combining registers and stacks (i.e., JVM schemes)
e.g:
load=> Takes the value from the register and pushes it to the operand stack.
store=> The top value of the operand stack is removed from the stack and stored in a register.
Copy the code
The command code generated is as follows. For ease of understanding, a comment follows a double slash. Indicates the state of a register or stack after an operation
push 1 // (1) {}
push 1 // (1, 1) {}
add // (2) {}
store &0 // () {0:2}
push 2 // (2) {0:2}
push 2 // (2, 2) {0:2}
add // (4) {0:2}
store &1 // () {0:2, 1:4}
push 0 // (0) {0:2, 1:4}
store &2 // () {0:2, 1:4, 2:0}
load &1 // (4) {0:2, 1:4, 2:0}
load &0 // (4, 2) {0:2, 1:4, 2:0}
sub // (2) {0:2, 1:4, 2:0}
store &3 // () {0:2, 1:4, 2:0, 3:2}
load &3 // (2) {0:2, 1:4, 2:2, 3:2}
load &2 // (2, 0) {0:2, 1:4, 2:2, 3:2}
sub // (2) {0:2, 1:4, 2:2, 3:2}
store $4 // () {0:2, 1:4, 2:2, 3:2, 4:2}
load $4 // (2) {0:2, 1:4, 2:2, 3:2, 4:2}
println // () {0:2, 1:4, 2:2, 3:2, 4:2}
Copy the code
Simple comparison
In simple scenarios, all three solutions meet requirements. The register scheme corresponds to the computer physical implementation. CPU processing is register-based. Advantages High performance, less data handling times, disadvantages of complex instructions. A purely stack-based scheme does not seem to have any, because there are only two operations, push POP, and data needs to be moved frequently in the case of a large number of local variables. The advantage is simple instructions. Easy to transplant. Hybrid scheme, set the strengths of the two, in the portability and efficiency of the compromise scheme.
0x02 Interpreter core
As foretold, the core of the interpreter is a loop.
do {
Get the next instruction
Explain instructions
} while(and instructions);
Copy the code
The architecture diagram is as follows
0x03 Simple code implementation
INTERPRETER-DEMO
register
// Core loop
public void run(a) {
List<Inst> insts = genInsts();
int size = insts.size();
int pc = 0;
while (pc < size) {
Inst inst = insts.get(pc);
inst.execute();
pc++;
}
}
// The Add directive is implemented
class AddInst implements Inst {
public final Integer targetAddress;
public final Integer[] sourceAddresses;
AddInst(Integer targetAddress, Integer... sourceAddresses) {
this.targetAddress = targetAddress;
this.sourceAddresses = sourceAddresses;
}
@Override
public void execute(a) {
int sum = 0;
for (Integer sourceAddress : sourceAddresses) {
sum += RegisterDemo.REGISTER.get(sourceAddress);
}
RegisterDemo.REGISTER.put(targetAddress, sum);
}
}
Copy the code
Code address: register DEMO
The stack
// Core loop
public void run(a) {
List<Inst> insts = genInsts();
int size = insts.size();
int pc = 0;
while (pc < size) {
Inst inst = insts.get(pc);
inst.execute();
pc++;
}
}
// The Add directive is implemented
class AddInst implements Inst {
@Override
public void execute(a) {
Integer v2 = StackDemo.STACK.pop();
Integer v1 = StackDemo.STACK.pop();
StackDemo.STACK.push(v1 + v2);
}
}
Copy the code
Address: stack DEMO
hybrid
// Core loop
public void run(a) {
List<Inst> insts = genInsts();
int size = insts.size();
int pc = 0;
while (pc < size) {
Inst inst = insts.get(pc);
inst.execute();
pc++;
}
}
// The Add directive is implemented
class AddInst implements Inst {
@Override
public void execute(a) {
Integer v2 = HybridDemo.STACK.pop();
Integer v1 = HybridDemo.STACK.pop();
HybridDemo.STACK.push(v1 + v2);
}
}
Copy the code
Address: mixed DEMO
About the code above
Implemented for a specific scenario, it just works. Three scenarios, each with less than 100 lines of code. Back to the previous question, is 10 minutes enough to implement a simple interpreter? Nature is enough, have interest might as well try to write.
0 x04 summary
This article discusses three options for implementing the interpreter, and implements the corresponding interpreter for a simple case. The Mini-JVM is slowly expanding from this simple core. Since the JVM chooses a hybrid solution, the focus will be solely on the hybrid solution.
!!!!!!!!! The core implementation of the interpreter is particularly important, if this article does not make the reader understand, it must be a problem of the article, welcome to submit your questions, have iterated this article.
0 x05 forecast
- The current interpreter has nothing to do with a javac compiled class. It should.
0x06 FAQ
- Why don’t we start with the interpreter when, by convention, we should start with classfile parsing? Classfile parsing is not difficult, manual work, parsed data is for the interpreter service. Essentially a static data provider. Non-core logic. Therefore, the interpreter comes first.