Allison is an engineer at Dropbox, where she helps maintain one of the world’s largest networks of Python clients. Before Dropbox, she was a facilitator at the Recurse Center,… She has spoken about Python internals at PyCon North America, and she loves strange bugs. She blogs at akaptur.com.
Introduction
Byterun is a Python interpreter implemented in Python. As I worked on Byterun, I was surprised and pleased to discover that the basic structure of the Python interpreter fits within the 500-line limit. In this chapter we will walk through the structure of the interpreter to give you enough context to explore it further. Our goal is not to explain every detail of the interpreter; like many interesting areas of programming and computer science, you could spend years understanding it.
Byterun was written by Ned Batchelder and me, building on the work of Paul Swartz. Its structure is similar to that of Python’s primary implementation, CPython, so understanding Byterun will help you understand interpreters in general and the CPython interpreter in particular. (If you don’t know which Python you’re using, chances are it’s CPython.) Despite its small size, Byterun can execute most simple Python programs.
A Python Interpreter
Before we begin, let’s narrow down the meaning of “Python interpreter.” The term “interpreter” can be used in many different ways when discussing Python. Sometimes “interpreter” refers to the REPL, the interactive environment you get when you type python at the command line. Sometimes people use “the Python interpreter” and “Python” interchangeably to refer to the whole process of executing Python code. In this chapter, “interpreter” has a more precise meaning: the last step in the process of executing a Python program.
Before the interpreter takes over, Python performs three other steps: lexical analysis, parsing, and compilation. Together, these three steps convert the source code into a code object, which contains instructions that the interpreter can understand. The interpreter’s job is to interpret the instructions in a code object.
You may be surprised to learn that there is a compile step involved in executing Python code. Python is often described as an interpreted language and grouped with languages like Ruby and Perl, as opposed to compiled languages like C and Rust. However, the terminology here is not as precise as it seems. Most interpreted languages, including Python, do have a compile step. The reason Python is called interpreted is that it does less work at the compile step (and the interpreter does more work) than compiled languages do. As you will see later in this chapter, the Python compiler needs much less information about program behavior than the C compiler does.
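You can see this compile step directly from Python itself, using the built-in compile function. This is a quick sketch; the filename "<example>" is just a placeholder string, and "eval" tells compile to expect a single expression.

```python
# compile() runs only the first three steps -- lexing, parsing, compiling --
# and hands back a code object without executing it
code = compile("7 + 5", "<example>", "eval")
print(type(code).__name__)  # code

# the interpreter step happens later, e.g. when we pass the code object to eval
print(eval(code))           # 12
```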
A Python Python Interpreter
Byterun is a Python interpreter written in Python. This may strike you as odd, but it’s no stranger than writing a C compiler in C. (Indeed, the widely used GCC compiler is itself written in C.) You could write a Python interpreter in almost any language.
Writing Python in Python has both advantages and disadvantages. The biggest disadvantage is speed: executing code in Byterun is much slower than executing it in CPython, which is implemented in carefully optimized C. However, Byterun is designed for learning, so speed doesn’t matter to us. The biggest advantage of using Python is that we can implement just the interpreter and not worry about the rest of the Python runtime, particularly the object system. For example, when Byterun needs to create a class, it falls back on “real” Python. Another advantage is that Byterun is easy to understand, partly because it is written in a high-level language (Python!) that many people find easy to read. (We also won’t be optimizing the interpreter; once again, clarity and simplicity matter more than speed.)
Building an Interpreter
Before we look at the Byterun code, we need some high-level perspective on the interpreter structure. How does the Python interpreter work?
The Python interpreter is a virtual machine: software that emulates a physical computer. This particular virtual machine is a stack machine, which manipulates several stacks to perform its operations (as opposed to a register machine, which reads and writes data in particular memory locations).
The Python interpreter is a bytecode interpreter: its input is a set of instructions called bytecode. When you write Python code, the lexer, parser, and compiler generate code objects for the interpreter to operate on. Each code object contains a set of instructions to be executed, the bytecode, plus other information the interpreter needs. Bytecode is an intermediate representation of Python code: it expresses the source code in a way the interpreter can understand, much as assembly language serves as an intermediate representation between C and machine language.
A Tiny Interpreter
To make the illustration more concrete, let’s start with a very small interpreter. It can only compute the sum of two numbers and understand only three instructions. All the code it executes is just a different combination of these three instructions. Here are the three instructions:
LOAD_VALUE
ADD_TWO_VALUES
PRINT_ANSWER
We don’t care about lexing, parsing, or compiling here, so it doesn’t matter how these instructions are produced. You can imagine writing 7 + 5 and having a compiler emit a combination of these three instructions for you. Or, if you have the right compiler, you could even write Lisp syntax, as long as it compiles to the same instructions.
Suppose that

7 + 5

generates this instruction set:
what_to_execute = {
    "instructions": [("LOAD_VALUE", 0),  # the first number
                     ("LOAD_VALUE", 1),  # the second number
                     ("ADD_TWO_VALUES", None),
                     ("PRINT_ANSWER", None)],
    "numbers": [7, 5] }
The Python interpreter is a stack machine, so it must do this addition by manipulating a stack (Figure 1.1). The interpreter begins by executing the first instruction, LOAD_VALUE, which pushes the first number onto the stack. Then it pushes the second number onto the stack. For the third instruction, ADD_TWO_VALUES, it pops both numbers off the stack, adds them together, and pushes the result back onto the stack. Finally, it pops the answer off the stack and prints it.
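The sequence of stack operations just described can be sketched with a plain Python list standing in for the data stack:

```python
# mimic the interpreter's stack manipulation for 7 + 5 with a plain list
stack = []
stack.append(7)               # LOAD_VALUE 0: push the first number
stack.append(5)               # LOAD_VALUE 1: push the second number

second = stack.pop()          # ADD_TWO_VALUES pops both operands...
first = stack.pop()
stack.append(first + second)  # ...and pushes their sum

result = stack.pop()          # PRINT_ANSWER pops the answer and prints it
print(result)                 # 12
```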
The LOAD_VALUE instruction tells the interpreter to push a number onto the stack, but the instruction alone doesn’t specify which number. Each instruction needs an extra piece of information telling the interpreter where to find its number. So our instruction set has two pieces: the instructions themselves, plus a list of constants the instructions need. (In Python, the “instructions” are the bytecode, and the what_to_execute object below plays the role of the code object.)
Why not just embed the numbers directly in the instructions? Imagine if, instead of numbers, we were adding strings. We wouldn’t want to stuff a string into an instruction, since a string can be arbitrarily long. This design also means we need only one copy of each object; for example, for the addition 7 + 7, the constants list "numbers" needs to contain only a single 7.
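You can check this sharing claim against CPython itself: identical constants are stored only once in a code object’s constants tuple. The function f below is just an illustration.

```python
def f():
    a = "spam"
    b = "spam"
    return a + b

# the constant "spam" appears twice in the source, but is stored
# only once in the code object's tuple of constants
print(f.__code__.co_consts.count("spam"))  # 1
```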
You might be wondering why we need instructions other than ADD_TWO_VALUES at all. Indeed, for the simple case of adding two numbers, the example is somewhat contrived. However, these instructions are building blocks for more complex programs. For example, with just the three instructions we’ve defined so far, we can already add together three values, or any number of values, given the right combination of instructions. At the same time, the stack provides a clean way to keep track of the state of the interpreter, and it will support more complexity as we go along.
Now let’s finish our interpreter. The interpreter object needs a stack, which can be represented by a list. It also needs a method that describes how each instruction is executed. For example, LOAD_VALUE pushes a value onto the stack.
class Interpreter:
    def __init__(self):
        self.stack = []

    def LOAD_VALUE(self, number):
        self.stack.append(number)

    def PRINT_ANSWER(self):
        answer = self.stack.pop()
        print(answer)

    def ADD_TWO_VALUES(self):
        first_num = self.stack.pop()
        second_num = self.stack.pop()
        total = first_num + second_num
        self.stack.append(total)
These three methods implement the three instructions our interpreter understands. The interpreter needs one more piece: a way to tie everything together and execute it. This method, run_code, takes the what_to_execute dictionary defined above as an argument, loops over each instruction, fetches the instruction’s argument if it has one, and then calls the corresponding method on the interpreter object.
    def run_code(self, what_to_execute):
        instructions = what_to_execute["instructions"]
        numbers = what_to_execute["numbers"]
        for each_step in instructions:
            instruction, argument = each_step
            if instruction == "LOAD_VALUE":
                number = numbers[argument]
                self.LOAD_VALUE(number)
            elif instruction == "ADD_TWO_VALUES":
                self.ADD_TWO_VALUES()
            elif instruction == "PRINT_ANSWER":
                self.PRINT_ANSWER()
To test, we create an interpreter object and call run_code with the 7 + 5 instruction set defined earlier.
interpreter = Interpreter()
interpreter.run_code(what_to_execute)
Sure enough, it prints 12.
Despite the limitations of our interpreter, this addition process is almost identical to that of a real Python interpreter. There are a few other things to note here.
First, some instructions require arguments. In real Python bytecode, about half of the instructions have arguments. As in our example, the arguments are packaged together with the instructions. Note that an instruction’s argument is different from the argument passed to the corresponding method.
Second, the instruction ADD_TWO_VALUES does not require any arguments; it pops the required values from the interpreter stack. This is the nature of stack-based interpreters.
Remember our claim that, given the right instruction set, we can add more than two numbers together without any changes to the interpreter. Consider the instruction set below. What do you think will happen? If you had a friendly compiler, what code could compile to this instruction set?
what_to_execute = {
    "instructions": [("LOAD_VALUE", 0),
                     ("LOAD_VALUE", 1),
                     ("ADD_TWO_VALUES", None),
                     ("LOAD_VALUE", 2),
                     ("ADD_TWO_VALUES", None),
                     ("PRINT_ANSWER", None)],
    "numbers": [7, 5, 8] }
From this point on, we begin to see the extensibility of this structure: we can describe more operations by adding methods to the interpreter object (as long as we have a compiler that generates a well-organized instruction set for us).
Variables
Next, let’s add variable support to our interpreter. We need an instruction for storing the value of a variable, STORE_NAME; an instruction for retrieving it, LOAD_NAME; and a mapping from variable names to values. For now, we’ll ignore namespaces and scoping, so we can store the variable-to-value mapping on the interpreter object itself. Finally, we have to make sure that what_to_execute has a list of the variable names in addition to its list of constants.
>>> def s():
...     a = 1
...     b = 2
...     print(a + b)

# a friendly compiler transforms `s` into:
what_to_execute = {
    "instructions": [("LOAD_VALUE", 0),
                     ("STORE_NAME", 0),
                     ("LOAD_VALUE", 1),
                     ("STORE_NAME", 1),
                     ("LOAD_NAME", 0),
                     ("LOAD_NAME", 1),
                     ("ADD_TWO_VALUES", None),
                     ("PRINT_ANSWER", None)],
    "numbers": [1, 2],
    "names": ["a", "b"] }
Our new implementation is below. To keep track of which names are bound to which values, we add an environment dictionary in the __init__ method. We also add the STORE_NAME and LOAD_NAME methods, which take a variable name and set or fetch its value in the environment dictionary.
An instruction’s argument can now mean two different things: it can be an index into the "numbers" list or an index into the "names" list. The interpreter knows which it is by checking which instruction it is executing. We factor this logic, the mapping from an instruction to what its argument means, out into a separate method.
class Interpreter:
    def __init__(self):
        self.stack = []
        self.environment = {}

    def STORE_NAME(self, name):
        val = self.stack.pop()
        self.environment[name] = val

    def LOAD_NAME(self, name):
        val = self.environment[name]
        self.stack.append(val)

    def parse_argument(self, instruction, argument, what_to_execute):
        """ Understand what the argument to each instruction means."""
        numbers = ["LOAD_VALUE"]
        names = ["LOAD_NAME", "STORE_NAME"]

        if instruction in numbers:
            argument = what_to_execute["numbers"][argument]
        elif instruction in names:
            argument = what_to_execute["names"][argument]

        return argument

    def run_code(self, what_to_execute):
        instructions = what_to_execute["instructions"]
        for each_step in instructions:
            instruction, argument = each_step
            argument = self.parse_argument(instruction, argument, what_to_execute)

            if instruction == "LOAD_VALUE":
                self.LOAD_VALUE(argument)
            elif instruction == "ADD_TWO_VALUES":
                self.ADD_TWO_VALUES()
            elif instruction == "PRINT_ANSWER":
                self.PRINT_ANSWER()
            elif instruction == "STORE_NAME":
                self.STORE_NAME(argument)
            elif instruction == "LOAD_NAME":
                self.LOAD_NAME(argument)
With just five instructions, the run_code method is already getting tedious. If we keep this structure, we’ll need a new branch of the if statement for every instruction. Here we can take advantage of Python’s dynamic method lookup. Since we always define a method called FOO for the instruction named FOO, we can use Python’s getattr function to look the method up on the fly instead of using the big branching structure. The loop, now called execute, looks like this:
    def execute(self, what_to_execute):
        instructions = what_to_execute["instructions"]
        for each_step in instructions:
            instruction, argument = each_step
            argument = self.parse_argument(instruction, argument, what_to_execute)
            bytecode_method = getattr(self, instruction)
            if argument is None:
                bytecode_method()
            else:
                bytecode_method(argument)
Real Python Bytecode
Now let’s abandon our toy instruction set and look at real Python bytecode. The structure of bytecode is similar to our toy interpreter’s instruction set, except that it uses a single byte instead of a name to identify each instruction. To understand its structure, we’ll examine the bytecode of a function. Consider the following example:
>>> def cond():
...     x = 3
...     if x < 5:
...         return 'yes'
...     else:
...         return 'no'
...
Python exposes a great deal of its internals at run time, and we can access them right from the REPL. For the function object cond, cond.__code__ is the code object associated with it, and cond.__code__.co_code is its bytecode. You almost never want to use these attributes directly when writing Python code, but they allow us to get up to all sorts of mischief, and to look at the internals.
>>> cond.__code__.co_code  # the bytecode as raw bytes
b'd\x01\x00}\x00\x00|\x00\x00d\x02\x00k\x00\x00r\x16\x00d\x03\x00Sd\x04\x00Sd\x00\x00S'
>>> list(cond.__code__.co_code)  # the bytecode as numbers
[100, 1, 0, 125, 0, 0, 124, 0, 0, 100, 2, 0, 107, 0, 0, 114, 22, 0, 100, 3,
 0, 83, 100, 4, 0, 83, 100, 0, 0, 83]
When we print this bytecode directly, it looks completely unintelligible; all we can tell is that it’s a series of bytes. Luckily, we have a powerful tool for making sense of it: the dis module in the Python standard library.
dis is a bytecode disassembler. A disassembler takes low-level code written for machines, like assembly code or bytecode, and prints it in a human-readable way. When we run dis.dis, it outputs an explanation of each bytecode instruction.
>>> dis.dis(cond)
  2           0 LOAD_CONST               1 (3)
              3 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               2 (5)
             12 COMPARE_OP               0 (<)
             15 POP_JUMP_IF_FALSE       22

  4          18 LOAD_CONST               3 ('yes')
             21 RETURN_VALUE

  6     >>   22 LOAD_CONST               4 ('no')
             25 RETURN_VALUE
             26 LOAD_CONST               0 (None)
             29 RETURN_VALUE
What does all this mean? Let’s take the first instruction, LOAD_CONST, as an example. The number in the first column (2) is the line number in the source code. The number in the second column is the index of the instruction within the bytecode, telling us that LOAD_CONST appears at position 0. The third column is the instruction’s human-readable name. The fourth column, when present, is the instruction’s argument. The fifth column, when present, is a hint about what the argument means.
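As an aside, the dis module can also produce these same fields programmatically rather than as printed text. This sketch assumes Python 3.4 or later, where dis.Bytecode yields one Instruction object per bytecode instruction:

```python
import dis

def cond():
    x = 3
    if x < 5:
        return 'yes'
    else:
        return 'no'

# each Instruction carries the fields dis.dis prints as columns:
# the offset, the opname, the raw argument, and the resolved argument value
for instr in dis.Bytecode(cond):
    print(instr.offset, instr.opname, instr.arg, instr.argval)
```

The exact instructions vary between Python versions, but the shape of the output is the same everywhere.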
Consider the first few bytes of this bytecode: [100, 1, 0, 125, 0, 0]. These six bytes represent two instructions together with their arguments. We can use dis.opname, a mapping from bytes to readable instruction names, to find out what instructions 100 and 125 stand for:
>>> dis.opname[100]
'LOAD_CONST'
>>> dis.opname[125]
'STORE_FAST'
The second and third bytes, 1 and 0, are the argument to LOAD_CONST, while the fifth and sixth bytes, 0 and 0, are the argument to STORE_FAST. Just as in our toy example, LOAD_CONST needs to know where to find its constant, and STORE_FAST needs to know which name to store. (Python’s LOAD_CONST is like our toy interpreter’s LOAD_VALUE, and LOAD_FAST is like LOAD_NAME.) So these six bytes represent the first line of source code, x = 3. (Why use two bytes for the argument? If Python used only one byte, each code object could contain only 256 constants/names, but with two bytes the limit rises to 256 squared, or 65,536.)
Conditionals and Loops
So far, our interpreter has only been able to execute instructions one by one, in order. The problem with this is that we often want to execute certain instructions many times, or skip them under certain conditions. To write loops and branches, the interpreter must be able to jump around in the instruction set. In a sense, Python handles loops and branches with GOTO statements in its bytecode! Look at the disassembly of cond again:
>>> dis.dis(cond)
  2           0 LOAD_CONST               1 (3)
              3 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               2 (5)
             12 COMPARE_OP               0 (<)
             15 POP_JUMP_IF_FALSE       22

  4          18 LOAD_CONST               3 ('yes')
             21 RETURN_VALUE

  6     >>   22 LOAD_CONST               4 ('no')
             25 RETURN_VALUE
             26 LOAD_CONST               0 (None)
             29 RETURN_VALUE
The conditional if x < 5 on the third line of the source is compiled into four instructions: LOAD_FAST, LOAD_CONST, COMPARE_OP, and POP_JUMP_IF_FALSE. x < 5 generates code to load x, load 5, and compare the two values. The instruction POP_JUMP_IF_FALSE completes the if statement. It pops the value off the top of the stack; if the value is true, nothing happens and execution continues with the next instruction. If the value is false, the interpreter jumps to a different instruction.
The instruction to jump to is called the jump target, and it is given as the argument to POP_JUMP_IF_FALSE. Here the jump target is 22, and the instruction at index 22 is LOAD_CONST, corresponding to line 6 of the source code. (dis marks jump targets with >>.) If the result of x < 5 is false, the interpreter skips straight over line 4 (return 'yes') to line 6 (return 'no'). So the interpreter uses jump instructions to selectively execute parts of the program.
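You can verify that every if statement compiles down to some kind of jump. The exact opcode names vary across Python versions (e.g. POP_JUMP_IF_FALSE in some releases, POP_JUMP_FORWARD_IF_FALSE in others), so this sketch only checks for a jump of any kind:

```python
import dis

def cond():
    x = 3
    if x < 5:
        return 'yes'
    else:
        return 'no'

# some conditional jump instruction always implements the if statement,
# whatever it is called in this Python version
names = [instr.opname for instr in dis.Bytecode(cond)]
print(any('JUMP' in name for name in names))  # True
```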
Python’s loops also rely on jumps. In the bytecode below, the line while x < 5 generates almost the same bytecode as if x < 10 might: in both cases, the comparison is performed and then POP_JUMP_IF_FALSE controls which instruction executes next. The last instruction of the loop body on line 4, JUMP_ABSOLUTE (at the end of the loop body), always sends the interpreter back to instruction 9 at the start of the loop. When x < 5 becomes false, POP_JUMP_IF_FALSE makes the interpreter jump to instruction 34, past the end of the loop.
>>> def loop():
...     x = 1
...     while x < 5:
...         x = x + 1
...     return x
...
>>> dis.dis(loop)
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (x)

  3           6 SETUP_LOOP              26 (to 35)
        >>    9 LOAD_FAST                0 (x)
             12 LOAD_CONST               2 (5)
             15 COMPARE_OP               0 (<)
             18 POP_JUMP_IF_FALSE       34

  4          21 LOAD_FAST                0 (x)
             24 LOAD_CONST               1 (1)
             27 BINARY_ADD
             28 STORE_FAST               0 (x)
             31 JUMP_ABSOLUTE            9
        >>   34 POP_BLOCK

  5     >>   35 LOAD_FAST                0 (x)
             38 RETURN_VALUE
Explore Bytecode
I encourage you to try out your own functions with dis.dis. Some interesting questions to explore:
- What’s the difference between a for loop and a while loop to the interpreter?
- Can you write two different functions that produce the same bytecode?
- How does elif work to the interpreter? What about a list comprehension?
Frames
So far, we’ve seen that the Python virtual machine is a stack machine: it steps through instructions in order, jumping between them, and pushing and popping values on a stack. But that’s still a long way from the full interpreter. In the example above, the last instruction is RETURN_VALUE, which corresponds to the return statement. But where does it return to?
To answer this question, we must add one more layer of complexity: the frame. A frame is a collection of information and context for a chunk of code. Frames are created and destroyed on the fly as Python code executes. There is one frame for each call of a function, so while each frame has exactly one code object associated with it, a code object can have many frames. If you had a function that called itself recursively ten times, you’d have eleven frames, one for each call. In general, there’s a frame for each scope in a Python program: each module, each function call, and each class definition has a frame.
Frames live on the call stack, a completely different stack from the ones we’ve discussed so far. (The call stack is the stack you’re already most familiar with: you see it in the exception tracebacks you get, where each line starting with "File 'program.py'" corresponds to one frame.) The stack the interpreter manipulates while executing bytecode is called the data stack. There is also a third stack, the block stack, used for certain kinds of control flow, particularly loops and exception handling. Each frame on the call stack has its own data stack and block stack.
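You can observe the call stack of frames from within Python itself. This sketch uses the CPython-specific helper sys._getframe, which returns the current frame; each frame’s f_back attribute points to the frame of its caller:

```python
import sys

def outer():
    return inner()

def inner():
    # each call gets its own frame; f_back links a frame to its caller's frame
    frame = sys._getframe()
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names

chain = outer()
print(chain[:2])  # ['inner', 'outer']
```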
Let’s make this concrete with an example. Suppose the Python interpreter is currently executing the line marked (3) below: it is in the middle of a call to foo, which is itself in the middle of a call to bar. The diagram shows a schematic of the call stack of frames, the block stacks, and the data stacks. We’re interested in the moment when the interpreter, starting from the foo() call at the bottom, works through the body of foo and then reaches bar.
>>> def bar(y):
...     z = y + 3     # <--- (3) ... and the interpreter is here.
...     return z
...
>>> def foo():
...     a = 1
...     b = 2
...     return a + bar(b) # <--- (2) ... which is returning a call to bar ...
...
>>> foo()             # <--- (1) We're in the middle of a call to foo ...
3
The interpreter is now in the middle of the call to bar. There are three frames on the call stack: one for the module level, one for the function foo, and one for the function bar (Figure 1.2). Once bar returns, its frame is popped off the call stack and discarded.
The bytecode instruction RETURN_VALUE tells the interpreter to pass a value between frames. First it pops the top value off the data stack of the top frame on the call stack. Then it pops the whole frame off the call stack and discards it. Finally, it pushes the value onto the data stack of the next frame down, the calling frame.
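Those three steps can be sketched as a toy model, with plain dictionaries standing in for frames. The names here are illustrative, not Byterun’s actual attribute names:

```python
# a toy sketch of RETURN_VALUE semantics (names are made up for illustration)
caller = {"data_stack": []}
callee = {"data_stack": [42]}   # the callee is about to return 42
call_stack = [caller, callee]

def do_return(call_stack):
    value = call_stack[-1]["data_stack"].pop()  # 1. pop the return value
    call_stack.pop()                            # 2. discard the finished frame
    call_stack[-1]["data_stack"].append(value)  # 3. push it onto the caller's stack

do_return(call_stack)
print(caller["data_stack"])  # [42]
```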
When Ned Batchelder and I were writing Byterun, for a long time our implementation had a significant bug: instead of giving each frame its own data stack, we had just one data stack for the entire virtual machine. We ran lots of tests, on both Byterun and real Python, to make sure the results matched, and we passed nearly all of them. The one thing we couldn’t get working was generators. Finally, a close reading of the CPython source code revealed the mistake, and moving the data stack into each frame fixed it.
Looking back at this bug, I was surprised at how little Python really depends on each frame having its own data stack. Nearly every operation in the Python interpreter empties the data stack, so having all frames share one stack mostly doesn’t matter. In the example above, when bar finishes executing, its data stack is empty; even if foo shared that stack, foo’s values would be unaffected. However, a key feature of generators is the ability to pause a frame, return to some other frame, and then later resume the original frame in exactly the same state it was left in.
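A tiny generator makes this pausing behavior concrete:

```python
def counter():
    x = 0
    while True:
        x += 1
        yield x  # execution pauses here; the frame and its data stack persist

g = counter()
print(next(g))  # 1
print(next(g))  # 2 -- the frame resumed exactly where it left off
```

If all frames shared one data stack, resuming the generator’s frame could find its stack clobbered by whatever ran in between.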
Byterun
We now have enough Python interpreter background to examine Byterun.
There are four kinds of objects in Byterun:

- A VirtualMachine class, which manages the higher-level structure: the call stack of frames and the mapping of instructions to operations. It is a more complex version of the Interpreter object above.
- A Frame class. Every Frame instance has one code object and manages some other necessary state, particularly the global and local namespaces, a reference to the frame that called it, and the last bytecode instruction executed.
- A Function class, which is used in place of real Python functions. Recall that calling a function creates a new frame; we implement Function ourselves so that we control the creation of new frames.
- A Block class, which just wraps the three attributes of a block. (The details of blocks aren’t central to the interpreter, and we won’t spend much time on them; they’re listed here because Byterun needs them.)
The VirtualMachine Class
Only one VirtualMachine instance is created each time the program runs, because we only have one Python interpreter. The VirtualMachine stores the call stack, the exception state, and return values as they’re passed between frames. Its entry point is the run_code method, which takes a compiled code object as an argument; it starts by setting up and running a frame. That frame may create other frames; the call stack grows and shrinks as the program executes. When the first frame eventually returns, execution is finished.
class VirtualMachineError(Exception):
    pass

class VirtualMachine(object):
    def __init__(self):
        self.frames = []   # The call stack of frames.
        self.frame = None  # The current frame.
        self.return_value = None
        self.last_exception = None

    def run_code(self, code, global_names=None, local_names=None):
        """ An entry point to execute code using the virtual machine."""
        frame = self.make_frame(code, global_names=global_names,
                                local_names=local_names)
        self.run_frame(frame)
The Frame Class
Next we write the Frame object. A frame is a collection of attributes with no methods. As mentioned above, the attributes include the code object created by the compiler; the local, global, and builtin namespaces; a reference to the previous frame; a data stack; a block stack; and the last instruction executed. (We have to do a little extra work to get to the builtin namespace because Python treats it differently in different circumstances; this detail isn’t important to our virtual machine.)
class Frame(object):
    def __init__(self, code_obj, global_names, local_names, prev_frame):
        self.code_obj = code_obj
        self.global_names = global_names
        self.local_names = local_names
        self.prev_frame = prev_frame
        self.stack = []
        if prev_frame:
            self.builtin_names = prev_frame.builtin_names
        else:
            self.builtin_names = local_names['__builtins__']
            if hasattr(self.builtin_names, '__dict__'):
                self.builtin_names = self.builtin_names.__dict__

        self.last_instruction = 0
        self.block_stack = []
Next, we’ll add frame manipulation to the virtual machine. There are three helper functions for frames: one to create a new frame, and two to push and pop frames on and off the call stack. A fourth function, run_frame, does the main work of executing a frame; we’ll come back to it shortly.
class VirtualMachine(object):
    [... snip ...]

    # Frame manipulation
    def make_frame(self, code, callargs={}, global_names=None, local_names=None):
        if global_names is not None and local_names is not None:
            local_names = global_names
        elif self.frames:
            global_names = self.frame.global_names
            local_names = {}
        else:
            global_names = local_names = {
                '__builtins__': __builtins__,
                '__name__': '__main__',
                '__doc__': None,
                '__package__': None,
            }
        local_names.update(callargs)
        frame = Frame(code, global_names, local_names, self.frame)
        return frame

    def push_frame(self, frame):
        self.frames.append(frame)
        self.frame = frame

    def pop_frame(self):
        self.frames.pop()
        if self.frames:
            self.frame = self.frames[-1]
        else:
            self.frame = None

    def run_frame(self):
        pass
        # we'll come back to this shortly
The Function Class
The implementation of Function is somewhat twisty, and most of the details aren’t crucial to understanding the interpreter. The important thing is that calling a function, i.e. invoking the __call__ method, creates a new Frame object and starts running it.
class Function(object):
    """
    Create a realistic function object, defining the things the interpreter expects.
    """
    __slots__ = [
        'func_code', 'func_name', 'func_defaults', 'func_globals',
        'func_locals', 'func_dict', 'func_closure',
        '__name__', '__dict__', '__doc__',
        '_vm', '_func',
    ]

    def __init__(self, name, code, globs, defaults, closure, vm):
        """You don't need to follow this closely to understand the interpreter."""
        self._vm = vm
        self.func_code = code
        self.func_name = self.__name__ = name or code.co_name
        self.func_defaults = tuple(defaults)
        self.func_globals = globs
        self.func_locals = self._vm.frame.f_locals
        self.__dict__ = {}
        self.func_closure = closure
        self.__doc__ = code.co_consts[0] if code.co_consts else None

        # Sometimes, we need a real Python function.  This is for that.
        kw = {
            'argdefs': self.func_defaults,
        }
        if closure:
            kw['closure'] = tuple(make_cell(0) for _ in closure)
        self._func = types.FunctionType(code, globs, **kw)

    def __call__(self, *args, **kwargs):
        """When calling a Function, make a new frame and run it."""
        callargs = inspect.getcallargs(self._func, *args, **kwargs)
        # Use callargs to provide a mapping of arguments: values to pass into the new
        # frame.
        frame = self._vm.make_frame(
            self.func_code, callargs, self.func_globals, {}
        )
        return self._vm.run_frame(frame)

def make_cell(value):
    """Create a real Python closure and grab a cell."""
    # Thanks to Alex Gaynor for help with this bit of twistiness.
    fn = (lambda x: lambda: x)(value)
    return fn.__closure__[0]
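The heavy lifting of mapping passed-in values onto parameter names is done by inspect.getcallargs, which you can try on its own. The function f here is just a made-up example:

```python
import inspect

def f(a, b=2, *args):
    return a

# getcallargs binds the call's arguments to the function's parameter names,
# which is exactly the mapping the new frame's local namespace needs
mapping = inspect.getcallargs(f, 1, 3, 4)
print(mapping == {'a': 1, 'b': 3, 'args': (4,)})  # True
```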
Next, back on the VirtualMachine object, we’ll add some helper methods for manipulating the data stack. The bytecode instructions always operate on the data stack of the current frame. These helpers make the implementations of POP_TOP, LOAD_FAST, and the other instructions that touch the stack more readable.
class VirtualMachine(object):
    [... snip ...]

    # Data stack manipulation
    def top(self):
        return self.frame.stack[-1]

    def pop(self):
        return self.frame.stack.pop()

    def push(self, *vals):
        self.frame.stack.extend(vals)

    def popn(self, n):
        """Pop a number of values from the value stack.
        A list of `n` values is returned, the deepest value first.
        """
        if n:
            ret = self.frame.stack[-n:]
            self.frame.stack[-n:] = []
            return ret
        else:
            return []
Before we can run a frame, we need two more methods.
The first method, parse_byte_and_args, takes one bytecode instruction, checks whether it has an argument, and parses the argument if so. This method also updates the frame’s last_instruction attribute, which points at the last instruction executed. A single instruction is one byte long if it doesn’t have an argument, and three bytes long if it does. The meaning of the argument depends on which instruction it is. For example, as mentioned above, the argument to POP_JUMP_IF_FALSE is the jump target; the argument to BUILD_LIST is the number of elements in the list; and the argument to LOAD_CONST is an index into the list of constants.
Some instructions use simple numbers as their arguments. For others, the virtual machine has to do a little work to discover what the argument means. The dis module in the standard library exposes a cheat sheet explaining what the arguments mean, which allows our code to be more compact. For example, the list dis.hasname tells us that the arguments to LOAD_NAME, IMPORT_NAME, LOAD_GLOBAL, and nine other instructions all have the same meaning: an index into the list of names.
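These cheat sheets are easy to poke at from the REPL. dis.opmap maps opcode names to their byte values, and the has* lists group the opcodes by what their argument means:

```python
import dis

# dis.opmap maps opcode names to numbers; dis.hasname lists the opcodes
# whose argument is an index into the code object's tuple of names,
# and dis.hasconst lists those whose argument indexes the constants
print(dis.opmap['LOAD_NAME'] in dis.hasname)    # True
print(dis.opmap['LOAD_CONST'] in dis.hasconst)  # True
```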
class VirtualMachine(object):
    [... snip ...]

    def parse_byte_and_args(self):
        f = self.frame
        opoffset = f.last_instruction
        byteCode = f.code_obj.co_code[opoffset]
        f.last_instruction += 1
        byte_name = dis.opname[byteCode]
        if byteCode >= dis.HAVE_ARGUMENT:
            # index into the bytecode
            arg = f.code_obj.co_code[f.last_instruction:f.last_instruction+2]
            f.last_instruction += 2   # advance the instruction pointer
            arg_val = arg[0] + (arg[1] * 256)
            if byteCode in dis.hasconst:    # Look up a constant
                arg = f.code_obj.co_consts[arg_val]
            elif byteCode in dis.hasname:   # Look up a name
                arg = f.code_obj.co_names[arg_val]
            elif byteCode in dis.haslocal:  # Look up a local name
                arg = f.code_obj.co_varnames[arg_val]
            elif byteCode in dis.hasjrel:   # Calculate a relative jump
                arg = f.last_instruction + arg_val
            else:
                arg = arg_val
            argument = [arg]
        else:
            argument = []

        return byte_name, argument
The next method is dispatch, which looks up a given instruction and executes the corresponding operation. In CPython, this dispatch is done in a giant switch statement spanning over 1,500 lines of code. Fortunately, since we're writing Python, our code can be more compact. We define a method for each byte name and use getattr to look it up. Just as in our little interpreter earlier, if the instruction is named FOO_BAR, its corresponding method is byte_FOO_BAR. For the moment, we'll treat these methods as black boxes. Each instruction method returns either None or a string called why, and in some cases the virtual machine needs this extra why information. These return values are only internal indicators of interpreter state, and should not be confused with the return value from executing a frame.
class VirtualMachine(object):
    [... snip ...]

    def dispatch(self, byte_name, argument):
        """ Dispatch by bytename to the corresponding methods.
        Exceptions are caught and set on the virtual machine."""

        # When later unwinding the block stack,
        # we need to keep track of why we are doing it.
        why = None
        try:
            bytecode_fn = getattr(self, 'byte_%s' % byte_name, None)
            if bytecode_fn is None:
                if byte_name.startswith('UNARY_'):
                    self.unaryOperator(byte_name[6:])
                elif byte_name.startswith('BINARY_'):
                    self.binaryOperator(byte_name[7:])
                else:
                    raise VirtualMachineError(
                        "unsupported bytecode type: %s" % byte_name
                    )
            else:
                why = bytecode_fn(*argument)
        except:
            # deal with exceptions encountered while executing the op.
            self.last_exception = sys.exc_info()[:2] + (None,)
            why = 'exception'

        return why

    def run_frame(self, frame):
        """Run a frame until it returns (somehow).
        Exceptions are raised, the return value is returned.
        """
        self.push_frame(frame)
        while True:
            byte_name, arguments = self.parse_byte_and_args()

            why = self.dispatch(byte_name, arguments)

            # Deal with any block management we need to do
            while why and frame.block_stack:
                why = self.manage_block_stack(why)

            if why:
                break

        self.pop_frame()

        if why == 'exception':
            exc, val, tb = self.last_exception
            e = exc(val)
            e.__traceback__ = tb
            raise e

        return self.return_value
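The getattr-based lookup is the heart of this dispatch scheme. Here it is in miniature, with illustrative names (MiniDispatcher is not part of Byterun):

```python
# The getattr-based dispatch pattern in miniature.
class MiniDispatcher:
    def byte_FOO_BAR(self):
        return "ran FOO_BAR"

    def dispatch(self, byte_name):
        # Look up the method named after the instruction, if any.
        fn = getattr(self, 'byte_%s' % byte_name, None)
        if fn is None:
            raise ValueError("unsupported bytecode type: %s" % byte_name)
        return fn()

d = MiniDispatcher()
print(d.dispatch('FOO_BAR'))  # ran FOO_BAR
```

Adding support for a new instruction is then just a matter of defining another `byte_*` method; no central switch statement needs to change.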
The Block Class
Before we implement each bytecode method, let's briefly discuss blocks. A block is used for certain kinds of control flow, specifically exception handling and looping. The block is responsible for making sure the data stack is in the proper state when the operation completes. For example, in a loop, a special iterator object is kept on the stack while the loop is running, and popped off when the loop completes. The interpreter must keep track of whether the loop is continuing or has finished.
To keep track of this extra information, the interpreter sets a flag to indicate its state. We implement this flag as a variable called why, which can be None or one of the strings "continue", "break", "exception", or "return". It indicates what should be done with the block stack and the data stack. To return to our iterator example: if the block at the top of the block stack is a loop block and why is "continue", the iterator should stay on the data stack, but not if why is "break".
We won’t spend time on the details of block manipulation, but it’s worth a closer look for interested readers.
Block = collections.namedtuple("Block", "type, handler, level")

class VirtualMachine(object):
    [... snip ...]

    # Block stack manipulation
    def push_block(self, b_type, handler=None):
        level = len(self.frame.stack)
        self.frame.block_stack.append(Block(b_type, handler, level))

    def pop_block(self):
        return self.frame.block_stack.pop()

    def unwind_block(self, block):
        """Unwind the values on the data stack corresponding to a given block."""
        if block.type == 'except-handler':
            # The exception itself is on the stack as type, value, and traceback.
            offset = 3
        else:
            offset = 0

        while len(self.frame.stack) > block.level + offset:
            self.pop()

        if block.type == 'except-handler':
            traceback, value, exctype = self.popn(3)
            self.last_exception = exctype, value, traceback

    def manage_block_stack(self, why):
        """Manage the block stack and data stack for looping,
        exception handling, or returning."""
        frame = self.frame
        block = frame.block_stack[-1]
        if block.type == 'loop' and why == 'continue':
            self.jump(self.return_value)
            why = None
            return why

        self.pop_block()
        self.unwind_block(block)

        if block.type == 'loop' and why == 'break':
            why = None
            self.jump(block.handler)
            return why

        if (block.type in ['setup-except', 'finally'] and
                why == 'exception'):
            self.push_block('except-handler')
            exctype, value, tb = self.last_exception
            self.push(tb, value, exctype)
            self.push(tb, value, exctype)  # yes, twice
            why = None
            self.jump(block.handler)
            return why

        elif block.type == 'finally':
            if why in ('return', 'continue'):
                self.push(self.return_value)
            self.push(why)

            why = None
            self.jump(block.handler)
            return why

        return why
The Instructions
All that remains is to implement the instruction methods: byte_LOAD_FAST, byte_BINARY_MODULO, and so on. The implementations of these instructions are not very interesting, so we show only a small sample here, enough to execute all the code we have described above; the rest can be found in the full Byterun implementation.
class VirtualMachine(object):
    [... snip ...]

    ## Stack manipulation

    def byte_LOAD_CONST(self, const):
        self.push(const)

    def byte_POP_TOP(self):
        self.pop()

    ## Names

    def byte_LOAD_NAME(self, name):
        frame = self.frame
        if name in frame.f_locals:
            val = frame.f_locals[name]
        elif name in frame.f_globals:
            val = frame.f_globals[name]
        elif name in frame.f_builtins:
            val = frame.f_builtins[name]
        else:
            raise NameError("name '%s' is not defined" % name)
        self.push(val)

    def byte_STORE_NAME(self, name):
        self.frame.f_locals[name] = self.pop()

    def byte_LOAD_FAST(self, name):
        if name in self.frame.f_locals:
            val = self.frame.f_locals[name]
        else:
            raise UnboundLocalError(
                "local variable '%s' referenced before assignment" % name
            )
        self.push(val)

    def byte_STORE_FAST(self, name):
        self.frame.f_locals[name] = self.pop()

    def byte_LOAD_GLOBAL(self, name):
        f = self.frame
        if name in f.f_globals:
            val = f.f_globals[name]
        elif name in f.f_builtins:
            val = f.f_builtins[name]
        else:
            raise NameError("global name '%s' is not defined" % name)
        self.push(val)

    ## Operators

    BINARY_OPERATORS = {
        'POWER':        pow,
        'MULTIPLY':     operator.mul,
        'FLOOR_DIVIDE': operator.floordiv,
        'TRUE_DIVIDE':  operator.truediv,
        'MODULO':       operator.mod,
        'ADD':          operator.add,
        'SUBTRACT':     operator.sub,
        'SUBSCR':       operator.getitem,
        'LSHIFT':       operator.lshift,
        'RSHIFT':       operator.rshift,
        'AND':          operator.and_,
        'XOR':          operator.xor,
        'OR':           operator.or_,
    }

    def binaryOperator(self, op):
        x, y = self.popn(2)
        self.push(self.BINARY_OPERATORS[op](x, y))

    COMPARE_OPERATORS = [
        operator.lt,
        operator.le,
        operator.eq,
        operator.ne,
        operator.gt,
        operator.ge,
        lambda x, y: x in y,
        lambda x, y: x not in y,
        lambda x, y: x is y,
        lambda x, y: x is not y,
        lambda x, y: issubclass(x, Exception) and issubclass(x, y),
    ]

    def byte_COMPARE_OP(self, opnum):
        x, y = self.popn(2)
        self.push(self.COMPARE_OPERATORS[opnum](x, y))

    ## Attributes and indexing

    def byte_LOAD_ATTR(self, attr):
        obj = self.pop()
        val = getattr(obj, attr)
        self.push(val)

    def byte_STORE_ATTR(self, name):
        val, obj = self.popn(2)
        setattr(obj, name, val)

    ## Building

    def byte_BUILD_LIST(self, count):
        elts = self.popn(count)
        self.push(elts)

    def byte_BUILD_MAP(self, size):
        self.push({})

    def byte_STORE_MAP(self):
        the_map, val, key = self.popn(3)
        the_map[key] = val
        self.push(the_map)

    def byte_LIST_APPEND(self, count):
        val = self.pop()
        the_list = self.frame.stack[-count]  # peek
        the_list.append(val)

    ## Jumps

    def byte_JUMP_FORWARD(self, jump):
        self.jump(jump)

    def byte_JUMP_ABSOLUTE(self, jump):
        self.jump(jump)

    def byte_POP_JUMP_IF_TRUE(self, jump):
        val = self.pop()
        if val:
            self.jump(jump)

    def byte_POP_JUMP_IF_FALSE(self, jump):
        val = self.pop()
        if not val:
            self.jump(jump)

    ## Blocks

    def byte_SETUP_LOOP(self, dest):
        self.push_block('loop', dest)

    def byte_GET_ITER(self):
        self.push(iter(self.pop()))

    def byte_FOR_ITER(self, jump):
        iterobj = self.top()
        try:
            v = next(iterobj)
            self.push(v)
        except StopIteration:
            self.pop()
            self.jump(jump)

    def byte_BREAK_LOOP(self):
        return 'break'

    def byte_POP_BLOCK(self):
        self.pop_block()

    ## Functions

    def byte_MAKE_FUNCTION(self, argc):
        name = self.pop()
        code = self.pop()
        defaults = self.popn(argc)
        globs = self.frame.f_globals
        fn = Function(name, code, globs, defaults, None, self)
        self.push(fn)

    def byte_CALL_FUNCTION(self, arg):
        lenKw, lenPos = divmod(arg, 256)  # KWargs not supported here
        posargs = self.popn(lenPos)
        func = self.pop()
        retval = func(*posargs)
        self.push(retval)

    def byte_RETURN_VALUE(self):
        self.return_value = self.pop()
        return "return"
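To make binaryOperator concrete, here is one dispatch of BINARY_MODULO in miniature, with the data stack as a plain list: pop two operands (x is the deeper one), apply the operator, and push the result.

```python
import operator

# The data stack just before BINARY_MODULO runs on 19 % 5,
# with an unrelated value (7) deeper in the stack.
stack = [7, 19, 5]

# popn(2): the deepest of the two popped values comes first.
x, y = stack[-2:]
stack[-2:] = []

# Apply the operator from the BINARY_OPERATORS table and push the result.
stack.append(operator.mod(x, y))
print(stack)  # [7, 4]
```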
Dynamic Typing: What the Compiler Doesn’t Know
You’ve probably heard that Python is a dynamic language, and in particular that it is dynamically typed. The work we've done building the interpreter sheds some light on this description.
One meaning of "dynamic" in this context is that a lot of work is done at run time. We saw earlier that the Python compiler doesn't have much information about what the code actually does. As an example, consider the simple function mod below. It takes two arguments and returns their modulus. In its bytecode, we see that the variables a and b are loaded first, then the bytecode BINARY_MODULO performs the modulo operation.
>>> def mod(a, b):
... return a % b
>>> dis.dis(mod)
2 0 LOAD_FAST 0 (a)
3 LOAD_FAST 1 (b)
6 BINARY_MODULO
7 RETURN_VALUE
>>> mod(19, 5)
4
Calculating 19 % 5 yields 4 — no surprise. What happens if we call it with different kinds of arguments?
>>> mod("by%sde", "teco")
'bytecode'
What just happened? You may have seen this syntax before: it's string formatting.
>>> print("by%sde" % "teco")
bytecode
Formatting a string with the % symbol also invokes the bytecode BINARY_MODULO. It takes the modulus of the two values at the top of the stack, whether they're strings, numbers, or instances of a class you defined yourself. The same bytecode is generated when the function is compiled (or defined), regardless of the kinds of arguments it is later used with.
The Python compiler knows very little about what the bytecode will do. It is up to the interpreter to determine what types of objects BINARY_MODULO is applied to and to do the right thing for them. This is why Python is described as dynamically typed: you don't know the types of a function's arguments until you run it. By contrast, in a statically typed language, the programmer tells the compiler what the types of the parameters are (or the compiler infers them itself).
The compiler's ignorance is one of the challenges of optimizing Python: just looking at the bytecode, without actually running it, you don't know what each instruction will do! You can define a class that implements a __mod__ method, which Python will automatically call when you use % on an instance of that class. So BINARY_MODULO can actually run any code.
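A small sketch makes this concrete. The class Chatty below is hypothetical, invented only to show that % on its instances runs arbitrary code via __mod__:

```python
# A hypothetical class whose __mod__ has a side effect and an
# arbitrary return value: on its instances, '%' is no longer arithmetic.
class Chatty:
    def __mod__(self, other):
        self.called = True  # a side effect static analysis can't rule out
        return 42

c = Chatty()
result = c % "anything"  # compiles to the same BINARY_MODULO as 19 % 5
print(result)  # 42
```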
Looking at the code below, the first a % b appears to be useless.
def mod(a, b):
    a % b
    return a % b
Unfortunately, a static analysis of this code — one done without running it — can't be certain that the first a % b does nothing. Calling __mod__ via % might write a file, interact with another part of the program, or do literally anything else that can be done in Python. It's hard to optimize a function when you don't know what it will do. In Russell Power and Alex Rubinsteyn's excellent paper, "How fast can we make interpreted Python?", they note: "In the general absence of type information, each instruction must be treated as an INVOKE_ARBITRARY_METHOD."
Conclusion
Byterun is a compact Python interpreter that is easier to understand than CPython. Byterun replicates CPython's primary structure: a stack-based interpreter operating on instructions called bytecode, which execute one after another or jump between each other, pushing data to and popping data from a stack. The interpreter dynamically creates, destroys, and jumps between frames as functions and generators are called and return. Byterun also shares the limitations of the real interpreter: because Python uses dynamic typing, the interpreter must determine the correct behavior of each instruction at run time.
I encourage you to disassemble your own programs and run them in Byterun. You will quickly encounter instructions that this shortened version of Byterun does not implement. By filling in the full Byterun implementation, or by perusing the real CPython interpreter in ceval.c, you can implement them yourself!
Acknowledgements
Thanks to Ned Batchelder for originating this project and guiding my contributions, Michael Arntzenius for his help debugging the code and editing the prose, Leta Montopoli for her edits, and the entire Recurse Center community for their support and interest. Any errors are my own.