1. An in-depth understanding of the GIL

1. Preface

When I was six, I had a music box. When I wound it up, the ballerina on top spun while the mechanism inside plinked out "Twinkle, Twinkle, Little Star." It was tacky, but I loved that music box, and I wanted to know how it worked. When I took it apart, I found a simple device inside: a thumb-sized metal cylinder embedded in the body that produced the notes by plucking the teeth of a steel comb as it rotated.

Of all the qualities a programmer can have, this curiosity about how things work is essential. When I opened my music box and looked inside, I proved that I was, if not a brilliant programmer, at least a curious one.

Oddly enough, I had been writing Python programs for years while holding the wrong idea about the global interpreter lock (GIL), because I was never curious enough about how it works. I have met others who were equally hesitant and equally ignorant. It's time to open the box and take a look. Let's read the CPython interpreter source to find out what the GIL is, why Python has one, and how it affects multithreaded programs. I'll show some examples to help you understand the GIL better. You'll learn how to write fast, thread-safe Python code, and how to choose between threads and processes.

(I describe only CPython in this article, not Jython, PyPy, or IronPython. CPython is still the interpreter most programmers use.)

2. Behold, the global interpreter lock (GIL)

Here:

static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */

This line of code comes from ceval.c, the source of the CPython 2.7 interpreter. Guido van Rossum's comment "This is the GIL" was added in 2003, but the lock itself dates back to his first multithreaded Python interpreter in 1997. On Unix systems, PyThread_type_lock is an alias for the standard C lock, mutex_t. It is initialized when the Python interpreter starts:

void
PyEval_InitThreads(void)
{
    interpreter_lock = PyThread_allocate_lock();
    PyThread_acquire_lock(interpreter_lock);
}

All C code in the interpreter must hold this lock while executing Python. Guido originally added the lock because it was simple, and every attempt since to remove the GIL from CPython has cost single-threaded programs too much performance to be worth the gains for multithreaded ones. (Preserving single-threaded performance is Guido's chief concern and the most important reason the GIL stays; one attempt in 1999 slowed single-threaded programs down nearly twofold.)

The GIL's effect on the threads in your application is simple enough to write on the back of your hand: "One thread runs Python while N others sleep or wait for I/O." Python threads can also wait for a threading.Lock or other synchronization objects from the threading module; threads in that state are also said to be "sleeping."

When do threads switch? Whenever a thread starts sleeping or waiting on network I/O, another thread gets a chance to grab the GIL and execute Python code. This is cooperative multitasking. CPython also has preemptive multitasking: if a thread runs 1000 bytecode instructions without stopping in Python 2, or runs for about 5 milliseconds without stopping in Python 3, it gives up the GIL so other threads can run. Think of this as time slicing in the old days, when there were multiple threads but only one CPU. I'll discuss both kinds of multitasking in detail below.
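In Python 3 you can inspect and tune this switch interval through the standard sys module. A minimal sketch (the default value shown reflects current CPython and could differ in other builds):

```python
import sys

# The interval, in seconds, after which a running thread is asked to
# give up the GIL so another thread can be scheduled (0.005 by default).
default = sys.getswitchinterval()
print(default)

# The interval is tunable, e.g. for experiments with CPU-bound threads.
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())

sys.setswitchinterval(default)  # restore the original value
```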

Think of Python as an old mainframe, with many tasks sharing a single CPU.

3. Cooperative multitasking

When a task such as network I/O starts and there is no need to run any Python code for a long or unknown period of time, a thread cedes the GIL so that another thread can acquire it and run Python. This polite behavior is called cooperative multitasking, and it allows concurrency: multiple threads can wait for different events at the same time.

Here, two threads each connect a socket:

import socket
import threading

def do_connect():
    s = socket.socket()
    s.connect(('python.org', 80))  # drop the GIL

for i in range(2):
    t = threading.Thread(target=do_connect)
    t.start()

Only one of the two threads can execute Python at a time, but once a thread starts connecting, it drops the GIL so the other can run. This means both threads can wait for their socket connections concurrently, which is a good thing: they do more work in the same amount of time.
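You can observe this concurrency with a quick timing experiment. The sketch below substitutes time.sleep() for the socket connection (sleeping also drops the GIL), so it needs no network; the exact numbers are illustrative:

```python
import threading
import time

def wait_a_bit():
    time.sleep(0.3)  # drops the GIL, like a blocking connect()

start = time.monotonic()
threads = [threading.Thread(target=wait_a_bit) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# Both threads waited at the same time, so the total is about
# 0.3 seconds rather than 0.6.
print('elapsed: %.2f seconds' % elapsed)
```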

Let's open the box and see how a thread actually drops the GIL when it makes a connection, in socketmodule.c:

/* s.connect((host, port)) method */
static PyObject *
sock_connect(PySocketSockObject *s, PyObject *addro)
{
    sock_addr_t addrbuf;
    int addrlen;
    int res;

    /* convert (host, port) tuple to C address */
    getsockaddrarg(s, addro, SAS2SA(&addrbuf), &addrlen);

    Py_BEGIN_ALLOW_THREADS
    res = connect(s->sock_fd, addr, addrlen);
    Py_END_ALLOW_THREADS

    /* error handling and so on .... */
}

It is at the Py_BEGIN_ALLOW_THREADS macro that the thread drops the GIL; the macro is simply defined as:

PyThread_release_lock(interpreter_lock);

And of course Py_END_ALLOW_THREADS reacquires the lock. A thread might block at that spot, waiting for another thread to release the lock; once that happens, the waiting thread grabs the lock back and resumes executing your Python code. In short: while N threads are blocked on network I/O or waiting to reacquire the GIL, one thread runs Python.

Below we'll look at a complete example that uses cooperative multitasking to fetch many URLs quickly. But first, let's contrast cooperative multitasking with the other kind.

4. Preemptive multitasking

A Python thread can release the GIL voluntarily, or have it seized preemptively.

Let's review how Python works. Your program runs in two phases. First, the Python text is compiled into a simpler binary format called bytecode. Second, the Python interpreter's main loop, a function named PyEval_EvalFrameEx(), reads the bytecode and executes its instructions one by one.
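Both phases are visible from Python itself: compile() performs the first, and the standard dis module decodes the bytecode that the eval loop would then execute. A minimal sketch:

```python
import dis

# Phase one: compile source text to a code object holding bytecode.
code = compile('a + b * 2', '<example>', 'eval')
print(type(code.co_code))  # the raw bytecode is just bytes

# Phase two is the interpreter's job, but dis lets us read the
# instructions the eval loop would execute one by one.
dis.dis(code)
```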

While the interpreter steps through your bytecode, it periodically gives up the GIL, without asking permission of the thread whose code it is executing, so other threads can run:

for (;;) {
    if (--ticker < 0) {
        ticker = check_interval;

        /* Give another thread a chance */
        PyThread_release_lock(interpreter_lock);

        /* Other threads may run now */

        PyThread_acquire_lock(interpreter_lock, 1);
    }

    bytecode = *next_instr++;
    switch (bytecode) {
        /* execute the next instruction ... */
    }
}

By default the check interval is 1000 bytecode instructions. All threads run the same code and have the lock taken from them periodically in the same way. In Python 3 the GIL's implementation is more complex, and the check interval is not a fixed number of bytecodes but about 5 milliseconds. For your code, however, these differences are not significant.

5. Thread safety in Python

Weaving multiple threads together requires skill.

If a thread can lose the GIL at any time, you must make code thread-safe. Python programmers, however, have a very different view of thread safety than C or Java programmers, because many Python operations are atomic.

Calling sort() on a list is an example of an atomic operation. The thread cannot be interrupted mid-sort, and other threads never see a partially sorted list, nor stale data from before the sort. Atomic operations simplify our lives, but there are surprises. For example, += seems simpler than sort(), yet += is not atomic. How do you know which operations are atomic and which are not?

Take a look at this code:

n = 0

def foo():
    global n
    n += 1

We can see the bytecode this function compiles to using Python's standard dis module:

>>> import dis
>>> dis.dis(foo)
LOAD_GLOBAL 0 (n)
LOAD_CONST 1 (1)
INPLACE_ADD
STORE_GLOBAL 0 (n)

One line of code, n += 1, is compiled into four bytecodes that perform four basic operations:

  1. Load the n value onto the stack
  2. Load the constant 1 onto the stack
  3. Add the two values at the top of the stack
  4. Store the sum back to n

Remember that every 1000 bytecode instructions (or, in Python 3, every few milliseconds), the interpreter interrupts the running thread and takes away the GIL. With bad luck, this can happen between the thread loading n onto the stack and storing it back to n. It's easy to see how that loses updates:

threads = []
for i in range(100):
    t = threading.Thread(target=foo)
    threads.append(t)

for t in threads:
    t.start()

for t in threads:
    t.join()

print(n)

Normally this code prints 100, because each of the 100 threads increments n. But sometimes you'll see 99 or 98, when one thread's update is overwritten by another's.

So, despite the GIL, you still need locks to protect shared mutable state:

n = 0
lock = threading.Lock()

def foo():
    global n
    with lock:
        n += 1

What if we used an atomic operation such as sort()?

lst = [4, 1, 3, 2]

def foo():
    lst.sort()

The bytecode of this function shows that sort() cannot be interrupted, because it is atomic:

>>> dis.dis(foo)
LOAD_GLOBAL 0 (lst)
LOAD_ATTR 1 (sort)
CALL_FUNCTION 0

This one line compiles to three bytecodes:

  1. Load the lst value onto the stack
  2. Load its sort method onto the stack
  3. Call the sort method

Even though the line lst.sort() takes several steps, the sort call itself is a single bytecode, so there is no opportunity for another thread to grab the GIL during the call. We can conclude that locking is not needed around sort(). Or, to avoid having to reason about which operations are atomic, follow a simple rule: always lock around reads and writes of shared mutable state. After all, acquiring a threading.Lock in Python is cheap.

While the GIL does not excuse you from needing locks, it does mean there is no need for fine-grained locking (locks the programmer acquires and releases around the smallest possible regions to keep threads safe, typical in Java; in CPython, by contrast, the language layer itself maintains one coarse global lock). In a free-threaded language such as Java, programmers make an effort to lock shared data for the shortest possible time, to reduce thread contention and maximize parallelism. But because Python threads cannot run in parallel anyway, fine-grained locking has no advantage. As long as no thread holds a lock while sleeping, doing I/O, or performing any other GIL-dropping operation, use the coarsest, simplest locks you can. Other threads could not have run in parallel regardless.
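As a sketch of the coarse-grained style, here is a hypothetical shared counter that guards all of its operations with a single lock. Since Python threads cannot run bytecode in parallel anyway, the one lock costs little:

```python
import threading

class SharedCounter:
    """Hypothetical example: one coarse lock guards every operation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, key):
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def total(self):
        with self._lock:
            return sum(self._counts.values())

counter = SharedCounter()

def work():
    for _ in range(1000):
        counter.increment('hits')

threads = [threading.Thread(target=work) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.total())  # always 10000, no matter how the threads interleave
```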

6. Concurrency can complete faster

I bet you really want to optimize your program through multithreading. By waiting for many network operations at the same time, your task will complete faster, so multithreading helps, even if only one thread can execute Python at a time. This is concurrency, and threads work well in this case.

Code runs faster in threads

import threading
import requests

urls = [...]

def worker():
    while True:
        try:
            url = urls.pop()
        except IndexError:
            break  # Done.

        requests.get(url)

for _ in range(10):
    t = threading.Thread(target=worker)
    t.start()

As we can see, while fetching each URL over HTTP these threads drop the GIL during every socket operation, so they finish the work faster than a single thread could.
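The same worker-pool pattern is available ready-made as concurrent.futures.ThreadPoolExecutor in the standard library. A sketch, using a stand-in fetch() that sleeps instead of calling requests.get() so it runs without a network (the URLs are hypothetical):

```python
import time
from concurrent.futures import ThreadPoolExecutor

urls = ['https://example.com/page/%d' % i for i in range(20)]  # hypothetical

def fetch(url):
    time.sleep(0.05)  # stand-in for requests.get(url); sleeping drops the GIL too
    return url

start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.monotonic() - start

# 20 fetches of 0.05s each would take 1 second serially;
# 10 threads overlap the waits.
print('%d urls in %.2f seconds' % (len(results), elapsed))
```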

7. Parallelism

What if you want to finish tasks faster by actually running Python code simultaneously? That is parallelism, and the GIL forbids it. You have to use multiple processes, which is more complicated and takes more memory than threads, but it can use multiple CPUs.

This example forks 10 processes, which is faster to complete than just one, because the processes are running in parallel in multiple cores. But 10 threads do not complete any faster than one thread, because only one thread can execute Python at a time:

import os
import sys

nums = [1 for _ in range(1000000)]
chunk_size = len(nums) // 10
readers = []

while nums:
    chunk, nums = nums[:chunk_size], nums[chunk_size:]
    reader, writer = os.pipe()
    if os.fork():
        readers.append(reader)  # Parent.
    else:
        subtotal = 0
        for i in chunk:  # Intentionally slow code.
            subtotal += i

        print('subtotal %d' % subtotal)
        os.write(writer, str(subtotal).encode())
        sys.exit(0)

# Parent.
total = 0
for reader in readers:
    subtotal = int(os.read(reader, 1000).decode())
    total += subtotal

print("Total: %d" % total)

Because each fork process has a separate GIL, the program can delegate work and run multiple computations at once.
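The standard library wraps this fork-and-pipe plumbing for you in the multiprocessing module. Here is a sketch of the same computation with multiprocessing.Pool; each worker process gets its own interpreter and its own GIL:

```python
import multiprocessing

def subtotal(chunk):
    total = 0
    for i in chunk:  # intentionally slow, CPU-bound code
        total += i
    return total

def parallel_sum(nums, procs=10):
    chunk_size = max(1, len(nums) // procs)
    chunks = [nums[i:i + chunk_size] for i in range(0, len(nums), chunk_size)]
    with multiprocessing.Pool(procs) as pool:
        # map distributes the chunks across worker processes,
        # which really do run in parallel on multiple cores.
        return sum(pool.map(subtotal, chunks))

if __name__ == '__main__':
    print('Total: %d' % parallel_sum([1] * 1000000))
```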

(Jython and IronPython offer single-process parallelism, but they are far from full CPython compatibility. PyPy with software transactional memory may one day run fast. Try these interpreters if you're curious.)

Conclusion

Now that you've opened the music box and seen its simple mechanism, you know everything you need to write fast, thread-safe Python code: use threads for concurrent I/O, and processes for parallel computation. The principle is simple enough that you don't even need to write it on your hand.

2. A personal understanding of the GIL (global interpreter lock) and Lock (mutex)

1. Questions about GIL and Lock

First, let's be clear: the GIL is not a feature of the Python language; it is an implementation detail of CPython, and Python can exist without it.

GIL: ensures that within one process, only one thread at a time executes Python bytecode on the CPU.

Lock (mutex): guarantees that a piece of code cannot be executed by another thread until the current thread has finished executing it.

The GIL does not guarantee absolute data safety; it simply prevents Python threads from executing bytecode in parallel. So what is the point of it? Below is my understanding of the GIL and the mutex. It may not be exactly right, but I hope it helps you understand the difference.

2. The difference between global interpreter locks and mutex locks

Global interpreter lock

The global interpreter lock is like one big lock inside the Python interpreter. It protects most routine modifications to data (atomic-like operations). It also cooperates with the operating system's scheduling by being released periodically: in Python 3, a thread that has acquired the GIL automatically releases it after roughly 5 milliseconds of execution, reenters the ready queue, and waits to run again. At that point other threads can acquire the GIL, get CPU time, and make progress; when the first thread comes back, it continues from where it left off. This mode lets multiple threads take turns making progress, rather than letting one thread monopolize the CPU until it finishes, which would obviously be unreasonable.

The mutex

A mutex is like a small lock protecting a specific piece of code. Mutexes exist because of the GIL's limitations: the global lock keeps most operations safe, but it cannot protect the few non-atomic (divisible) operations such as +=, -=, *=, and /=. For example, a += 1 actually fetches the value of a, computes the sum with 1, and stores the result back. If thread 1 has computed the sum but not yet stored it back when the scheduler switches threads, thread 2 may add 1 to a and store a = 1; when the CPU returns to thread 1, it overwrites a with its own stale result of 1, and an update is lost. A mutex guarantees that for these non-atomic operations, only one thread at a time can execute the protected code: a thread either runs the whole block without interference or blocks and waits its turn. This makes the few pieces of code that modify shared data absolutely safe, without sacrificing efficiency elsewhere.
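This lost-update scenario can be demonstrated in code. A sketch: shrinking the switch interval makes preemption very frequent, which makes the unprotected race easier to hit on some CPython versions, while the mutex-protected version is always exact:

```python
import sys
import threading

sys.setswitchinterval(0.000001)  # preempt very often, to provoke the race

counter = 0
lock = threading.Lock()

def unsafe_increment(times):
    global counter
    for _ in range(times):
        counter += 1  # non-atomic: load, add, store -- updates can be lost

def safe_increment(times):
    global counter
    for _ in range(times):
        with lock:  # the mutex makes load-add-store indivisible
            counter += 1

def run(target):
    global counter
    counter = 0
    threads = [threading.Thread(target=target, args=(10000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

# The unlocked version may print less than 40000 (depending on the
# CPython version and scheduling); the locked version never does.
print('without lock:', run(unsafe_increment))
print('with lock:   ', run(safe_increment))
sys.setswitchinterval(0.005)  # restore the default
```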

The problem

Some people ask: why not drop the GIL and rely entirely on mutexes? Consider what that would mean. Every operation would need an explicit mutex, an enormous amount of code to get right, and even if it were all correct, the constant switching between acquiring and releasing locks would be very inefficient. By combining a global lock with mutexes, the global interpreter lock covers the relatively safe operations that must not be modified by multiple threads at once, such as append and pop (atomic, indivisible operations), while mutexes guard the few relatively unsafe ones (non-atomic, divisible operations). This keeps data safe while keeping the CPU efficient.