• A Bit About Bytes: Understanding Python Bytecode - PyCon 2018
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: cdpath
  • Proofreader: HCMY

PyCon 2018 James Bennett – A Bit About Bytes: Understanding Python Bytecode

0:07 Welcome to “A Bit About Bytes”

0:11 Today we’re going to talk about Python bytecode

0:13 The title is more than a play on words

0:17 there is real meaning behind it

0:20 Without further ado

0:22 please welcome Django core developer James Bennett

0:25 to start his talk

0:36 I want to start with a slightly existential question

0:38 Why are we at PyCon?

0:41 Because we love Python

0:45 Right?

0:52 Why do we love Python?

0:55 Because we all understand

0:57 that we spend more time reading code than writing it

1:03 So we try to make our code more readable

1:05 And of course, we love Python

1:08 because Python was built around a simple idea:

1:12 code should be easy to read

1:19 Python is clear, readable, and easy to understand

1:24 Even if you’re not a programmer

1:25 you can still look at Python code

1:28 and understand the logic

1:30 Right?

1:36 That’s Python

1:41 At least, that’s the CPython you download from python.org

1:50 Now I’m going to show you where it comes from

1:53 how it works

1:56 what use it is to understand it

1:59 and finally how to apply it in practice

2:01 or at least in theory

2:05 But before we do that

2:07 we’re going to learn a little bit about how computers work

2:08 and how programming languages work

2:12 I love this tweet

So beautiful and so true

2:19 But we do need to understand how computers work

The CPU inside a computer is a silicon chip

2:26 etched with carefully arranged circuits

2:32 Feed in a specific pattern of electrical current

2:35 and you get another pattern of current out

2:37 And the pattern is predictable

We give these patterns names and meanings

2:45 We can say that this current pattern represents addition

2:49 That’s how computers work

The names that we have chosen

2:53 are called CPU instructions

2:56 or sometimes machine code

3:00 Presented in a form that is friendlier to humans

3:01 it becomes assembly code

3:05 But even assembly language isn’t that easy to understand

3:08 Have you ever seen assembly code?

3:11 How many of you want to code in assembly all the time?

We’d rather write source code

Beautiful, clear, easy to read, easy to understand

3:24 But computers only accept binary instructions

How do you build a bridge between them?

3:33 Several approaches have been tried over the years

3:35 Some languages use compilers, first invented by Grace Murray Hopper

which compile the source code directly to machine code

3:45 These are the compiled languages

3:47 Some languages rely on interpreters

which translate source code into machine code directly at runtime

3:57 These are interpreted languages

Python is often called an interpreted language

People also talk a lot about the Python interpreter

4:03 But there is a third kind of language

4:06 In some languages, the compiled instructions

4:10 do not target any real, physical CPU

4:16 I mean, you could build a CPU like this, but at least it doesn’t exist right now

4:20 These languages compile to instructions for a CPU that doesn’t exist

4:25 and an interpreter, a program that simulates that CPU, executes the instructions

4:28 The interpreter understands these instructions

4:30 and translates them into binary code that the real CPU accepts

4:36 These intermediate instructions are called bytecode

There are many languages that fall into this category

4:41 Anyone using Java?

4:45 Java compiles to bytecode that runs on the Java virtual machine

4:47 Does anyone use .NET?

Then there is C#

4:51 C# compiles to bytecode that runs on the .NET virtual machine

4:55 And of course Python

4:58 Python compiles to bytecode that runs on the Python virtual machine

Let’s take a closer look at how it works

5:04 This is a Python function that computes Fibonacci numbers

5:11 Very easy to understand

5:13 First it checks whether n is less than 2; if so, it returns n directly

5:18 Otherwise the Fibonacci number is computed with a loop

5:23 How does Python actually execute this function?

5:25 Has anyone seen a file with the extension .pyc?

5:33 If you use Python 2, you know that Python 2

5:35 places a .pyc file of the same name next to the source file

5:40 With Python 3, the .pyc file goes in the __pycache__ directory

5:47 You may have heard that these .pyc files are compiled Python

5:50 You’ve probably heard that a .pyc file saves the time of recompiling

5:52 This is Python bytecode

5:55 A .pyc file holds the bytecode produced by compiling the source code

So the next time you run this code

6:01 or the next time you import this module

6:03 Python does not need to compile it from scratch again

6:08 Python needs bytecode in this format in order to execute
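A minimal sketch of where the compiled file ends up on Python 3, using the standard library’s py_compile and importlib.util.cache_from_source. The module name fib_demo.py, its contents, and the temporary directory are assumptions made purely for illustration; they are not from the talk.

```python
import importlib.util
import os
import py_compile
import tempfile

# Hypothetical module written to a throwaway directory for the demo.
SOURCE = "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)\n"

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "fib_demo.py")
    with open(source_path, "w") as handle:
        handle.write(SOURCE)

    # Compile the source to bytecode; the return value is the path of the .pyc file.
    pyc_path = py_compile.compile(source_path)

    # Where the import system expects the cached bytecode for this source file.
    expected_path = importlib.util.cache_from_source(source_path)
    pyc_exists = os.path.exists(pyc_path)

print(pyc_path)
```

The printed path should sit inside a __pycache__ directory and carry an interpreter tag in the file name, which is how several Python versions can keep their own bytecode for the same source file.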

6:13 So how do you understand how it works?

6:15 Suppose you start the Python interpreter

6:17 and type in the Fibonacci function

6:20 You get a function object

This object has a special attribute, __code__

6:27 which is a Python code object

6:32 Did anyone hear Emily Morehouse’s talk yesterday about parsing and the AST (abstract syntax tree)?

6:36 A very good talk

6:38 You can learn something there about code objects

6:40 and how Python uses them

We will look at a different aspect today

6:45 from another angle:

6:46 what happens after the syntax is parsed

6:48 The code object contains everything Python needs to execute the function

6:54 It has some attributes, so we can see what’s in it

6:56 and how it works

There is an attribute called co_consts

7:02 It is a tuple of all the literals and constants referenced in the function body

7:06 You can see that it holds

7:09 the numbers 2, 0, and 1

7:11 the tuple (0, 1)

and, indeed, None

None here looks strange

After all, None is not written in the function body

7:22 But Python puts None here for a reason

A Python function that does not explicitly return

7:33 returns None

7:36 So None is in the tuple

7:45 because when Python compiles the function

7:47 it cannot tell whether an explicit return statement will be reached

7:52 In fact, it’s impossible to know

7:55 Those are the literals

7:59 There is another attribute, co_varnames

8:01 Its elements are the local variable names

8:06 here: n, current, and next

8:12 Another attribute is co_names

8:15 Its elements are the non-local names referenced in the function body

8:18 This function does not use any non-local names

8:20 So it’s an empty tuple

8:22 And finally, the most interesting attribute

8:25 co_code

8:30 This is the bytecode of the function
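The attributes above can be inspected directly. The sketch below reconstructs the Fibonacci function from the description in the talk (the slide itself is not part of this transcript, so the exact body is an assumption). The contents of co_consts and co_code depend on the Python version, so no output is shown for them.

```python
def fib(n):
    # Reconstructed from the talk's description; the original slide may differ.
    if n < 2:
        return n
    current, next = 0, 1
    while n:
        current, next = next, current + next
        n -= 1
    return current


code = fib.__code__

print(code.co_consts)      # literals and constants; exact contents vary by version
print(code.co_varnames)    # ('n', 'current', 'next')
print(code.co_names)       # () because fib references no non-local names
print(type(code.co_code))  # <class 'bytes'>
```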

8:33 It is not a string, but a bytes object

8:36 this being Python 3

8:42 Some of the bytes can be displayed as ASCII characters

8:47 That is just the default way Python displays bytes objects

8:49 But it’s not a string, and it can’t be treated as one

8:51 It’s just a sequence of bytes

8:55 If we want to know what this long sequence of bytes means

8:57 we might as well start with the first byte

which looks like a pipe symbol, |

9:06 I don’t know if you have memorized the ASCII table

9:08 I certainly can’t recite it

So I don’t know which decimal number corresponds to the pipe symbol |

9:15 But I can ask Python to tell me

Python says the decimal number corresponding to | is 124

9:24 So the value of the first byte of the bytecode is 124

9:26 That still tells us nothing useful

9:30 Luckily there’s the dis module in the standard library

Its opname list contains the names of all Python bytecode instructions

indexed by the numeric value of the opcode

9:46 The instruction corresponding to 124 is LOAD_FAST

9:48 OK, we know that the value of the first byte is 124

9:54 which means the LOAD_FAST instruction
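The manual decoding just described can be sketched as follows. The fib function is a reconstruction (an assumption, as the slide is not in the transcript). The talk was given on Python 3.6, where the first opcode is 124 (LOAD_FAST); opcode numbers and the first instruction differ in other versions, so the code looks the name up rather than hard-coding it.

```python
import dis


def fib(n):
    # Reconstructed from the talk's description; the original slide may differ.
    if n < 2:
        return n
    current, next = 0, 1
    while n:
        current, next = next, current + next
        n -= 1
    return current


print(ord("|"))  # 124, the byte value behind the pipe character

raw = fib.__code__.co_code
first_opcode = raw[0]                  # indexing a bytes object yields an int
first_name = dis.opname[first_opcode]  # name registered for that opcode number
print(first_opcode, first_name)
```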

9:57 The second byte in the bytecode is 0

10:00 Together they make LOAD_FAST 0

10:02 I don’t know if you noticed the first slide

10:05 but this is exactly what it showed

10:08 LOAD_FAST 0 is a Python bytecode instruction

10:12 Exactly

10:15 This instruction means: look up the name at index 0 in the tuple of local variable names

10:21 which is the local variable n

and push its value onto the top of the stack

10:29 We’ll cover the stack later

10:31 But now I have to show you a shortcut

10:35 The way I just showed you to read bytecode is very tedious

10:38 There’s an easier way

10:41 Import dis, then call dis.dis()

10:44 You can pass it almost anything

10:47 a function

10:48 or a string of source code

10:50 or almost any other kind of Python object

10:52 dis.dis() will disassemble it

10:56 and print the bytecode in readable form

11:00 The result of passing in the Fibonacci function

11:02 is what was on the first slide

11:05 This is the bytecode of the Fibonacci function
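A short sketch of the shortcut, again using a reconstructed fib (an assumption). dis.dis() prints to standard output by default; its file parameter lets the listing be captured as text. The listing will not match the slide exactly on interpreters other than Python 3.6.

```python
import dis
import io


def fib(n):
    # Reconstructed from the talk's description; the original slide may differ.
    if n < 2:
        return n
    current, next = 0, 1
    while n:
        current, next = next, current + next
        n -= 1
    return current


dis.dis(fib)  # prints the human-readable listing to stdout

# The same listing captured into a string for further inspection.
buffer = io.StringIO()
dis.dis(fib, file=buffer)
listing = buffer.getvalue()
```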

11:11 A few points worth noting:

11:12 These numbers on the left

11:17 2, 3, 4, 5, 6, 7, 8

11:18 are the line numbers of the corresponding source code

11:20 and also mark where each block of instructions starts

11:22 You must have noticed

11:25 that each line of source code corresponds to several bytecode instructions

11:30 There is a number next to each instruction

11:32 and this number is always even

11:34 Would anyone like to guess why it’s even?

11:38 This is new in Python 3.6

11:41 These numbers are offsets into the bytecode

11:44 If you look at __code__.co_code

11:46 and index into it

11:49 at, say, 6

11:53 you get POP_JUMP_IF_FALSE

11:57 The offsets are even

11:59 because of a change in Python 3.6

12:02 Not all bytecode instructions need an argument

12:04 but Python 3.6 gives every instruction an argument

12:07 whether it uses one or not

12:08 so each bytecode instruction takes up two bytes

12:10 which is also easier to implement

12:16 Some instructions have arguments that are too large

12:19 to fit in one byte

12:21 Those are split across multiple bytes

12:22 but always a multiple of two bytes

12:24 In Python 3.5 or earlier

12:28 for the same input

12:29 the bytecode you get might have odd offsets

12:31 because not all instructions in Python 3.5 have arguments
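The even offsets can be checked with dis.get_instructions(), which yields one Instruction per bytecode instruction. This is a sketch on a reconstructed fib (an assumption) and presumes Python 3.6 or later, where every instruction occupies a multiple of two bytes.

```python
import dis


def fib(n):
    # Reconstructed from the talk's description; the original slide may differ.
    if n < 2:
        return n
    current, next = 0, 1
    while n:
        current, next = next, current + next
        n -= 1
    return current


instructions = list(dis.get_instructions(fib))
offsets = [ins.offset for ins in instructions]

# Show the first few instructions with their byte offsets and raw arguments.
for ins in instructions[:4]:
    print(ins.offset, ins.opname, ins.arg)

all_even = all(offset % 2 == 0 for offset in offsets)
print(all_even)
```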

12:33 Another point worth noting

12:37 these >> markers

For example, at line 4 of the source code, offset 12

12:42 the LOAD_CONST here

12:44 and at line 5 of the source code, offset 22

12:47 These are jump targets

12:50 This is Python’s way of telling you that other instructions might jump to these places

12:57 Remember the loop in the Fibonacci function?

12:59 It begins with a test

13:01 Each time through the loop

13:04 execution jumps back to that earlier instruction

These >> arrows indicate that an instruction may be the target of a jump from elsewhere
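The offsets that dis marks with >> can also be listed programmatically with dis.findlabels(), which scans raw bytecode for jump targets. A sketch on a reconstructed fib (an assumption); the specific offsets depend on the Python version, so none are shown.

```python
import dis


def fib(n):
    # Reconstructed from the talk's description; the original slide may differ.
    if n < 2:
        return n
    current, next = 0, 1
    while n:
        current, next = next, current + next
        n -= 1
    return current


# Offsets that some jump instruction in the function may transfer control to.
targets = dis.findlabels(fib.__code__.co_code)
print(sorted(targets))
```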

13:12 OK, we have looked at some bytecode

13:17 We also know how to decode raw bytecode:

13:19 get the bytecode first

13:22 then manually decode the instructions these bytes correspond to

13:24 or use dis.dis()

13:26 We have actually already touched on how Python works

13:29 and how Python uses bytecode

The Python virtual machine implemented by CPython is stack-oriented

In other words, its underlying data structure is the stack

13:40 If you haven’t used stacks before, here’s a brief introduction

13:43 Stacks are kind of like lists

13:45 but support just two important operations

13:48 A stack has two ends; let’s call them top and bottom

One operation is push

13:52 which puts a value on the top of the stack

13:55 The other operation is pop

13:57 which takes the value off the top of the stack, removing and returning it
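In Python terms, a plain list already behaves like the stack just described, with the end of the list acting as the top. A minimal illustration (the values pushed are arbitrary):

```python
stack = []

# Push: put values on the top of the stack.
stack.append("n")
stack.append(2)

# Pop: remove the value on top and return it.
top = stack.pop()

print(top)    # 2
print(stack)  # ['n']
```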

14:01 Each call to a Python function pushes a call frame onto the top of the call stack

14:07 The call stack keeps track of every function call in progress

14:09 Once the function returns, the corresponding call frame is popped off the call stack

14:17 and the return value is pushed onto the caller’s frame

So if I call the Fibonacci function

14:21 more on that later

14:23 I get the return value back

14:24 While the code inside a call frame is executing

14:31 it uses two other stacks as well

14:34 One is the evaluation stack, also known as the data stack

14:40 Python uses it to hold the data being worked on

14:43 Most of the computation in a Python function happens here

14:46 and most instructions operate on the top of this stack

14:53 The other is the block stack

14:55 used to keep track of the currently active code blocks

15:00 Code blocks are things like loops, try/except, and with blocks

Python needs this because statements like break and continue apply to the current code block

15:11 Python needs to know what the current code block is

15:13 which it does by maintaining a stack of code blocks

So every time it encounters one of these structures

Python pushes it onto the block stack

15:21 and pops it off when the block ends

Let’s see how a function call is executed

15:27 Suppose we want to find the eighth Fibonacci number

15:31 We call the Python Fibonacci function to compute it

The call compiles to three bytecode instructions

15:39 LOAD_GLOBAL, LOAD_CONST, and CALL_FUNCTION

15:42 Look closely

At first the stack is empty

15:46 The first instruction is LOAD_GLOBAL

It loads the global name fib, our Fibonacci function

15:54 The name is found in the co_names tuple of non-local names

Once the function is found, the function object is pushed onto the top of the stack

Next up is LOAD_CONST

It takes the element at index 1 of the constants tuple

16:10 Remember,

16:12 the element at index 0 is None

So we get the integer 8

16:17 which is the argument to the function

16:19 and push it onto the top of the stack

Next comes the CALL_FUNCTION instruction

16:26 Its argument is 1

16:29 When only positional arguments are used, the way Python calls a function is:

16:34 push the function onto the top of the stack

then push the positional arguments on top of it (above the function object)

Then, when the function is called,

16:42 pop all the positional arguments

The next element on the stack is then the function object, which is popped as well

16:48 Push a new call frame onto the call stack

Execute the Fibonacci function in the new call frame

16:56 Obtain the return value, 21

17:00 Then pop the call frame off the call stack

17:03 and push the return value back onto the caller’s stack

17:10 Those are the details of how Python calls the Fibonacci function, step by step
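The three steps above can be mimicked with a toy stack machine. This is an illustrative sketch only and not CPython’s implementation: the run() helper and the tuple encoding of instructions are hypothetical, and the real interpreter creates a call frame where this sketch simply calls the Python function.

```python
def fib(n):
    # Reconstructed from the talk's description; the original slide may differ.
    if n < 2:
        return n
    current, next = 0, 1
    while n:
        current, next = next, current + next
        n -= 1
    return current


def run(instructions, names, constants, global_namespace):
    """Toy stack machine covering only the three instructions discussed."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_GLOBAL":
            # Find the name at index `arg`, then push the object it refers to.
            stack.append(global_namespace[names[arg]])
        elif opname == "LOAD_CONST":
            stack.append(constants[arg])
        elif opname == "CALL_FUNCTION":
            # Pop `arg` positional arguments (last pushed comes off first),
            # then pop the function object that sits beneath them.
            args = [stack.pop() for _ in range(arg)][::-1]
            func = stack.pop()
            stack.append(func(*args))  # push the return value
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    return stack


program = [("LOAD_GLOBAL", 0), ("LOAD_CONST", 1), ("CALL_FUNCTION", 1)]
result = run(program, names=("fib",), constants=(None, 8),
             global_namespace={"fib": fib})
print(result)  # [21]
```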

17:14 The CALL_FUNCTION instruction here applies only to positional arguments

17:18 If there are keyword arguments

17:20 the CALL_FUNCTION_KW instruction is used

17:26 If argument unpacking is used

17:30 with the * or ** operator

17:33 the CALL_FUNCTION_EX instruction is used

This is how function calls work

17:42 If you’re interested

you can refer to the dis module in the Python standard library documentation

17:47 The dis module documentation is very useful

17:53 It lists all the bytecode instructions

It also explains what these instructions do, the arguments they accept, whatever you want to know

18:00 about the technical details of Python bytecode

18:03 Here are a few more interesting things

18:07 The dis module has a function called distb

18:12 Have you ever run into a strange exception

18:15 and wondered where exactly it was raised?

18:18 dis.distb() can help

You can call it directly after an exception has occurred

18:29 or pass in a captured traceback object

18:33 distb disassembles the call frame that was active on the call stack

18:39 and prints its bytecode

with an arrow pointing straight at the instruction that raised the exception

Let me give you an example

Let me divide a number by 0

18:51 Python raises an exception

18:54 import dis; dis.distb()

18:57 and the bytecode that was executing is printed

19:00 If you still want to dig into the details

see the references I give at the end of the slides

19:04 You can take a look at the Python interpreter, written in C

This is the C source code of the Python bytecode interpreter as it was on GitHub two hours ago

19:16 It is essentially a huge switch statement

19:19 that takes the incoming numeric opcode and finds the operation that instruction represents

19:27 Ok, now we know a little bit about bytecode

19:31 But what’s the use of bytecode?

19:34 What are the benefits of knowing bytecode?

19:40 Have you heard or used the Forth language?

19:46 or the newer Factor language?

Both Forth and Factor are stack-oriented programming languages

19:57 Python virtual machines are also stack-oriented

We talked about that a moment ago

20:01 It’s basically all about pushing things onto the top of the stack

20:03 doing something with the top of the stack

20:05 and finally popping the result off

20:08 This process is a little different from the way we’re used to programming

But a lot of programming languages are designed around this idea

And it’s good to understand the idea

You may never actually use it in practice

But you can learn it

and expand your programming horizons

And stack-oriented programming languages or virtual machines,

20:34 with very few instructions

20:37 and a limited number of stack operations, can do amazing things

20:39 Very clever indeed

Of course, knowing bytecode is also practical

Everyone likes to joke about C

Everyone regards C as halfway to assembly language

because when you read or write C code you can see what machine code it will turn into

20:59 Python is the same to some extent

21:03 We can learn Python bytecode

21:05 learn how to read it

21:07 and find out what bytecode our Python source code is translated into

21:11 and how the Python interpreter executes it

21:16 All this gives you insight

21:19 You will also learn how Python works

21:26 and the thing everyone wants to know: how to improve the performance of Python code

21:30 Look at these two functions

21:33 They both do the same thing

21:35 compute the number of seconds in a week

21:37 But one way of writing it is faster

21:40 Can you see which one?

21:46 I want you to think about

21:49 why one function is faster than the other

21:52 and how to find out which

21:55 The way to do it is to look at the bytecode

21:56 First get the bytecode with the dis module

22:02 The bytecode of the two functions is quite different

You can see that the first function stores the number of seconds in a day in a variable

22:09 which means it has to load a constant

22:12 store it in the variable

22:15 read the variable’s value back

load another constant and multiply

22:19 and return the result

The bytecode of the second function just multiplies two constants

22:24 When Python compiles it

22:27 it sees that this is a multiplication of two constants

22:32 whose result is never going to change

22:35 7 * 86400 is always the same

22:41 Python optimizes this

by doing the multiplication at compile time

22:45 so the function actually returns 604800 directly

All the other superfluous operations are omitted
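The compile-time multiplication is visible in the constants stored on each code object. The two functions below are a reconstruction of the ones described (their names and exact bodies are assumptions).

```python
import dis


def seconds_per_week_variable():
    # Stores the per-day figure in a local first, so the multiply runs at call time.
    seconds_per_day = 86400
    return seconds_per_day * 7


def seconds_per_week_constant():
    # A product of two literals, which CPython folds into one constant when compiling.
    return 7 * 86400


print(seconds_per_week_variable.__code__.co_consts)
print(seconds_per_week_constant.__code__.co_consts)

dis.dis(seconds_per_week_variable)
dis.dis(seconds_per_week_constant)
```

The second function’s constants should already contain 604800, while the first only holds the two factors.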

22:51 This optimization is really clever

22:53 Python does it whenever it encounters operations on constants

23:00 This is not the only optimization Python makes

23:03 Have you heard of Spectre and Meltdown?

23:05 Know anything about them?

23:08 These two vulnerabilities mainly stem from branch prediction

23:11 which is when the processor tries to guess which way an if statement will go

23:18 Python also predicts bytecode operations

23:22 Some bytecode instructions tend to come in pairs

For example, a comparison is often followed by a jump instruction

23:29 The Python bytecode interpreter is optimized for this

23:31 It tries to guess the next operation

23:33 so as to make full use of the CPU’s branch prediction and speed up execution

23:37 So that’s pretty good

23:41 You can also answer some frequently asked performance questions

People always ask

23:46 why a list or dictionary literal is faster than calling list() or dict()

23:51 Well, here’s why

Creating a dictionary with the literal {}

23:57 takes only two instructions

24:00 Calling dict()

24:02 takes three instructions

24:04 one of which is CALL_FUNCTION

24:06 That means pushing a call frame onto the call stack

24:07 executing the function, and popping the result back
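The difference can be seen by disassembling the two expressions as source strings. The instruction counts quoted in the talk are for Python 3.6; other versions add or rename instructions (the call opcode, for example, is no longer named CALL_FUNCTION in recent releases), so this sketch only checks which kind of instruction appears.

```python
import dis

# dis.get_instructions() accepts a source string and compiles it first.
literal_ops = [ins.opname for ins in dis.get_instructions("{}")]
call_ops = [ins.opname for ins in dis.get_instructions("dict()")]

print(literal_ops)  # includes BUILD_MAP and no call instruction
print(call_ops)     # includes a name lookup and a call instruction
```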

24:10 Let’s take some real code as an example

24:12 It’s a very simple example

24:15 that collects the first ten perfect squares

24:17 The complete bytecode is not shown here

24:20 just the bytecode for the while loop

24:22 It consists of 15 bytecode instructions

24:25 This code can be optimized

24:28 for example by replacing the while loop with a for loop

24:30 and counting with range

24:34 Now the bytecode of the loop body is much shorter

24:36 only nine instructions

24:39 What if it is written more idiomatically,

24:42 for example with a list comprehension?

24:43 What would the bytecode look like?

Now the bytecode of the entire function body is only nine instructions

24:48 But don’t be fooled by appearances

24:54 I put this bytecode here for a reason

24:57 Notice that although there are only nine instructions

25:00 they include instructions that create and call a function

25:02 So an extra call frame has to be pushed onto the call stack

25:03 the function body executed there

25:05 and the frame popped off on return

25:08 This costs more

25:10 even though there are fewer bytecode instructions

25:15 because not all instructions cost the same
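The three variants might look like the sketch below. The function names, and the choice to count squares from 1, are assumptions, since the slides are not in the transcript. Instruction counts vary between Python versions, and newer interpreters may compile a comprehension inline rather than as a separate function, so the printed counts will not necessarily match the 15 and 9 quoted for Python 3.6.

```python
import dis


def squares_while():
    squares = []
    i = 1
    while i <= 10:
        squares.append(i * i)
        i += 1
    return squares


def squares_for():
    squares = []
    for i in range(1, 11):
        squares.append(i * i)
    return squares


def squares_comprehension():
    return [i * i for i in range(1, 11)]


# Count the instructions of each whole function (not only the loop body).
for func in (squares_while, squares_for, squares_comprehension):
    count = sum(1 for _ in dis.get_instructions(func))
    print(func.__name__, count)
```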

25:18 We are now talking about performance differences between bytecode instructions

25:24 Everybody wants to know about micro-optimizations like this

25:28 First of all, I want to emphasize:

25:30 Python is slow

25:32 If you’re struggling to speed up individual Python bytecode instructions

25:35 then you can’t see the forest for the trees

25:37 Python is much slower than C

25:39 There is little point in obsessing over micro-optimizations

25:43 If you want to write lightning-fast Python code

25:52 go over the Python standard library first

25:56 Look at the built-in functions and built-in classes

25:58 Learn which are implemented in C

26:01 and which are implemented in Python

26:03 because when it comes to speed

26:05 the improvement from optimizing bytecode instructions may be tiny

26:10 whereas switching to a version implemented in C

26:12 brings a performance gain so large there is no comparison

26:16 Even so, you might want some rules of thumb

26:18 Here are a few

26:22 If you’ve read any Python performance tuning guides

26:24 you’ve probably heard the advice not to reference global names inside loops

26:27 Instead, you create a local alias first and then use it in the loop

26:30 This is why (pointing to the slide)

26:32 The LOAD instructions differ in performance

26:35 LOAD_CONST and LOAD_FAST are faster

26:38 LOAD_NAME and LOAD_GLOBAL are slower

26:40 And why?

Looking up non-local names can be complicated

26:47 It may require searching several namespaces

26:52 If you look at the source code of the interpreter

26:56 you will see that the implementation of these instructions is quite involved
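The aliasing advice can be checked in the bytecode. In this sketch the two functions and the use of math.sqrt are hypothetical examples: the first looks up the global name math on every iteration, the second receives sqrt as a local through a default argument. Whether the difference matters in practice should be measured, as the talk cautions.

```python
import dis
import math


def total_sqrt_global(values):
    total = 0.0
    for value in values:
        total += math.sqrt(value)  # global lookup plus attribute lookup each time
    return total


def total_sqrt_local(values, sqrt=math.sqrt):
    total = 0.0
    for value in values:
        total += sqrt(value)       # plain local variable access
    return total


global_ops = {ins.opname for ins in dis.get_instructions(total_sqrt_global)}
local_ops = {ins.opname for ins in dis.get_instructions(total_sqrt_local)}

print("LOAD_GLOBAL" in global_ops)  # True
print("LOAD_GLOBAL" in local_ops)   # False
```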

26:57 Also, loops and code blocks are slow

27:01 Avoid them where you can

They use SETUP_LOOP, SETUP_WITH, SETUP_EXCEPT

27:10 Each time you enter or exit a loop or code block

27:13 several instructions are needed to enter it

27:18 set up the context and push it onto the block stack

27:19 execute the body

27:22 and, on exit, jump out

27:24 and finally pop it off

27:26 and do some cleanup

27:27 These are expensive instructions, best avoided where possible

Attribute access, dictionary lookups, and list indexing also deserve attention

27:38 That’s LOAD_ATTR and BINARY_SUBSCR here

27:42 You hear this a lot:

when getting an element from a dictionary or list

27:45 if I am going to loop

27:47 and reference it every time

it is better to bind it to a local variable beforehand

Otherwise every pass through the loop repeats the lookup; dict lookups are efficient,

27:55 but the instruction is still relatively expensive

28:01 There are many similar optimization tips in the dis module documentation

28:05 The documentation describes the various instructions for your reference

28:08 There are some other materials worth reading

28:10 Here are three recommendations

28:13 First up is a free online ebook, Inside the Python Virtual Machine

28:20 Tipping the author is of course welcome

28:21 This book is a complete introduction to the inner workings of the Python interpreter

28:28 all the internal mechanisms

28:30 the stacks

28:32 the various bytecode instructions

28:36 Next is A Python Interpreter Written in Python by Allison Kaptur

28:39 She explains the implementation in detail

28:40 Oh, she also has a PyCon talk

28:43 She explains how to use

28:48 suitable data structures

28:50 to write a Python interpreter, in Python, that handles the various bytecode operations

28:52 Finally, read the source code of the CPython bytecode interpreter

28:57 Part of it is the huge switch statement I just showed you

It is about a thousand lines long

29:02 at least in the version I looked at

29:05 A few hundred lines at least

29:08 but it’s not hard to read

29:09 It is very well written C code

29:11 CPython’s C source code is relatively easy to read

29:18 These are good references

29:21 You can also find me on Twitter

29:24 I can answer a few questions

29:28 You can find me online

29:32 And finally, thank you for listening

29:36 I hope you got something out of it
