In this paper, the author: Du Yuyang drops | senior experts and engineers – the Linux kernel

Deadlock is a common and serious problem in multithreaded and distributed programs. Deadlocks are devastating, and once they occur, they are difficult or nearly impossible to recover from; Deadlocks are random and only occur if certain conditions are met, and if conditions are complex, the probability of occurrence is low, but once they occur, they are very difficult to reproduce and debug. Deadlock caused by using locks is a common condition in deadlock. The Linux kernel uses the Lockdep tool to detect and specifically predict deadlock scenarios for locks. However, Lockdep currently supports only mutex and does not support more complex read-write locks, especially recursive-read locks. Therefore, Lockdep can have false positive and false negative prediction errors caused by read/write locks. This work first decrypts the Lockdep tool, then proposes a universal deadlock prediction algorithm design and implementation (mutex can be regarded as only using the write lock in the read-write lock), and proves that the algorithm is correct and comprehensive solution.

Earlier this year, we have solved the drops have been severely affected by the foundation platform of large-scale server cluster of three kernel fault, when we solve these problems, a lot of time and energy is spent on where to find who make up the deadlock, the troubleshooting are delayed, so at that time just want to have any generic methods can help us to deal with the deadlock problem. However, due to time constraints, we can only explore and deal with these specific issues. With these kernel failures finally fixed, there was some quiet time to think deeply about why deadlocks occur and how to detect and predict them. With further research on this issue, I have made some algorithm optimization and algorithm design work on kernel deadlock prediction, some of which have been accepted by Linux kernel, and others are still under review. Here I share with you one of the more important work: a generic deadlock prediction algorithm for read-write locks. This work presents a generic deadlock prediction algorithm for locks that supports all Linux kernels reading and writing locks, while proving that the algorithm is correct and comprehensive solution. The core problem solved by this algorithm has been around for more than 10 years (currently under community review). Before introducing this work I will first give a brief introduction to the deadlock problem and the Linux kernel deadlock tool Lockdep.

1. Deadlock

Deadlocks are not uncommon in everyday life. Everyone who lives in a big city has experienced the scene shown below to some extent. A deadlock occurs at a roundabout or intersection. Some of them may be broken, but the vast majority of them work. But because each car had to wait for the car in front to move, none of the cars could move, or more generally, they could not Make Forward Progress. This happens because the waiting of vehicles creates a cycle in which each car is waiting for the car in front, so none of the cars can get to where it is waiting. This kind of vehicle deadlock can continue to worsen and have serious consequences: first, it causes traffic jams at intersections, and if the congestion is further expanded, it can lead to a large area of traffic paralysis. Vehicle deadlocks are difficult to heal by themselves. It is very difficult or takes a long time to get out of the deadlock by themselves and can only be solved by human intervention (such as traffic police).

Deadlocks can also occur in multithreaded or distributed system applications. Its essence is the same as the traffic jam at the intersection mentioned above, because participants constitute a cycle of waiting, so that all participants can not wait for the desired result, so that they can never make progress there. The Linux kernel, of course, is also subject to deadlocks. If Core parts, such as the scheduler and memory management, or subsystems, such as the file system, are deadlocked, the entire system becomes unavailable.

Deadlocks are random. As in the case of the roundabout above, the roundabout is there and deadlocks don’t always happen. However, roundabout itself is a potential deadlock, especially at the intersection with high traffic pressure, roundabout is more likely to produce deadlocks. It would be much better if they were designed as traffic lights, and much better if they were designed as overpasses. In our programs, a Potential Deadlock Scenario is referred to as a Potential Deadlock, and an imminent or ongoing Deadlock is referred to as a Concrete Deadlock instance.

How to deal with deadlocks has always been an active research and solution problem in academic and application fields. Solutions to deadlocks can be roughly divided into deadlock Detection, deadlock Prevention, and deadlock Prediction. Deadlock discovery refers to the discovery of deadlock instances during program execution. Deadlock avoidance further prevents deadlock instances when they are found to be about to be created; Deadlock prediction is to find out the potential deadlocks in the program through static or dynamic methods so as to eliminate the potential deadlocks fundamentally.

2. Deadlock and Lockdep of locks

In deadlocks, deadlock caused by improper use of locks is an important source of deadlocks. Lock is one of the main means of synchronization, and it is inevitable to use lock. For complex synchronization relationships, the use of locks can be complicated. If used improperly, it is easy to cause deadlock of the lock. From the perspective of waiting, locks are deadlocked due to participating threads waiting for the lock to be released, and this waiting constitutes a waiting loop, such as an ABBA deadlock:

Where, the black arrow in the thread represents the current execution statement of the thread, and the red arrow represents the wait relationship between the thread statements. As you can see, the red arrows form a circle (or loop). Again, reviewing potential deadlocks and deadlock instances, it is very likely that deadlock instances would not occur if the execution times of the two threads were slightly different, such as if Thread1 were allowed to finish executing before Thread2 started executing. However, such Locking Behavior is undoubtedly a potential deadlock.

Further, it is possible to predict or identify deadlocks if we can track and analyze the locking behavior of programs, rather than waiting for deadlocks to occur to detect deadlock instances. The Lockdep tool for the Linux kernel characterizes the locking behavior of the kernel to predict and report potential deadlocks.

Lockdep can characterize the behavior of A Class of locks, mainly by recording the Locking Order of all Lock instances in A Class of locks. That is, if A thread holds Lock A and takes Lock B again before releasing it, then Lock A and Lock B have an A->B Locking sequence. This locking sequence is called Lock Dependency in Lockdep. Similarly, for ABBA type deadlocks, we do not need Thread1 and Thread2 to produce exactly one instance of the deadlock, as long as one thread produces the sequence behavior of A->B, and another thread produces the sequence behavior of B->A, then this would constitute A potential deadlock, as shown in the following figure:

As a generalization, we can record and save all locking sequences (i.e. lock dependencies) to form a locking sequence Graph. If there is A lock dependency A->B and A lock dependency B->C, then the Relation is Transitive, so we can also obtain A lock dependency A->C. A->B and B->C are referred to as Direct dependencies, while A->C are referred to as Indirect dependencies. For each new direct lock dependency, we check to see if the dependency loops with the existing lock dependency in the diagram, and if so, we can predict a potential deadlock.

3. Read-write Lock

All locks we have just referred to are Exclusive locks. A read-write Lock is a more complex type of Lock, or a General Lock. We can think of a mutex as a read-write Lock that uses only write locks. A read lock allows readers to hold it simultaneously as long as there is no write lock or contention for the write lock. There are various read and write locks in the Linux kernel, including rwSEM, rwlock, and qrwlock. The problem is that read-write locks complicate deadlock prediction. Lockdep does not support these types of read-write locks. Therefore, Lockdep generates False Positive and False Negative deadlock predictions when used. This problem has existed for more than 10 years. We propose a general deadlock prediction algorithm for locks and prove that this algorithm solves the deadlock prediction problem for read/write locks.

4. General Deadlock Prediction For Locks

In the process of describing this algorithm, we propose several lemmas to explain or prove the validity of our proposed deadlock prediction.

Lemma 1: In the introduction of read and write lock, lock sequence cycle is a potential deadlock necessary conditions, but not sufficient conditions. Also, a potential deadlock can and can only be predicted as soon as the last lock order (or lock dependency) is about to generate a deadlock cycle.

Based on lemma 1, to solve the deadlock prediction problem is to calculate whether the waiting circle constitutes a potential deadlock through some method when the last lock order (i.e. lock dependence) forms a waiting circle (cycle), and our task is to find this method (algorithm).

Lemma 2: Two virtual threads T1 and T2 can be used to represent all deadlock scenarios.

For any deadlock instance, assuming that n threads are involved in the deadlock instance, the n threads are represented as:

T1, T2,... ,TnCopy the code

Consider the case of n:

If n=1: This Deadlock, in which a thread waits on itself, is called a Recursion Deadlock in Lockdep. Because checking for such deadlocks is simple, this special case is ignored in the algorithm below. If n>1: this Deadlock is called an Inversion Deadlock in Lockdep. For this case, we divide the n threads into two groups, namely T1,T2… Tn-1 and Tn, and then combine all the lock dependencies in the previous group and assume that all these dependencies exist in a virtual thread, resulting in two virtual threads T1 and T2.

These are the two virtual threads described in Lemma 2. Based on Lemma 2, we propose a deadlock checking two-thread Model to represent the locking behavior of the kernel:

T1: Currently checks all previous lock dependencies, which form a lock dependency graph. T2: the current direct lock dependency to check.

Based on Lemma 2 and the deadlock-checking two-thread model, we can obtain the following lemma:

Lemma 3: Any deadlock can be converted into the ABBA type.

Based on the above three lemmas, we can further describe the deadlock prediction problem as, when we get A new direct lock dependency B->A, we assume this new dependency as T2, and all previous lock dependencies exist in A lock dependency graph generated by an assumed T1. Therefore, deadlock prediction is to check whether there is A lock dependency of A->B in T1, if there is A deadlock, otherwise there is no deadlock and merge T2 into T1. As shown below:

With the introduction of read-write locks, lock dependencies also depend on the type of lock in the lock, that is, the read or write type. According to the design characteristics of mutex and read-write lock in Linux kernel, we introduce a lock mutex table to represent the mutex relationship between locks:

A recursive-read Lock is a special type of read Lock that can be accessed recursively by the same thread. First, we propose a Simple Algorithm. Based on the two-threaded model, given T1 and T2, and ABBA locks:

The steps of the simple algorithm are as follows:

If x1.a and x1.b are mutually exclusive and X2.a and x2.b are mutually exclusive, then T1 and T2 form A potential deadlock.

Otherwise, T1 and T2 do not constitute potential deadlocks.

It can be seen from the simple algorithm that the lock type determines the mutual exclusion between locks, and the mutual exclusion is the key information to check deadlocks. For read-write locks, lock types may change during program execution, so how do you keep track of all lock types? We propose Lock Type Promotion based on the mutual exclusion of Lock types, that is, the mutual exclusion of Lock types goes from low to high: recursive read Lock < read Lock < write Lock (mutex). During the execution of the program, if we encounter a high mutex lock type, we upgrade the lock type from the lock dependency to the high mutex lock dependency. Lock type upgrade as shown below:

RRn stands for Recursive read Lock N, Rn stands for read Lock N, Wn stands for Write Lock or mutex Lock n. The following Xn represents any lock n (that is, a recursive read, read, or write lock).

However, the above simple algorithm can not handle all the deadlock prediction cases, such as the following case will avoid the simple algorithm, but in fact it is a potential deadlock:

In this case, X1 and X3 are mutually exclusive and therefore this case constitutes a potential deadlock. However, when the simple algorithm checks RR2->X1 (that is, T2 is RR2->X1), it can find X1->RR2 in T1 according to the simple algorithm. However, since RR and RR are not mutually exclusive, it mistakenly determines that this case is not a deadlock. The reason why this case is incorrect is that X3->X1 in the true deadlock X1X3X3X1 is an indirect lock dependency, which is missed by the simple algorithm.

The deeper reason for this problem is that mutexes are mutually exclusive, so any ABBA is a potential deadlock and there is no need to check for T2’s indirect lock dependencies. In the case of read locks, this condition does not exist, so we have to consider indirect lock dependencies in T2.

Lemma 4: For direct dependence on the introduction of indirect lock dependence, if the indirect lock dependence constitute a deadlock, then direct lock dependence is still the key.

Lemma 4 is an extension of Lemma 1. According to Lemma 1, the direct lock dependency must be the last lock dependency that forms the lock cycle, and lemma 4 states that there must be some way to determine whether the lock cycle is potentially deadlocked. In other words, the new algorithm is bound to solve this problem by modifying and enhancing the simple algorithm previously proposed. However, the problem is that the original T2 direct lock dependency may generate many indirect lock dependencies. How do we find the indirect lock dependency that ultimately leads to a potential deadlock? Further, we first need to redefine T2, and then find all the indirect lock dependencies in this T2, so what is the boundary of T2? If T2 were extended to cover the entire lock dependency graph, the algorithm complexity would increase so much that it might even exceed the processing power of Lockdep, making Lockdep virtually unavailable.

Lemma 5: T2 only needs to be extended to the current thread to take the Lock Stack.

According to Lemma 5, we first modify the previously proposed dual-thread model as follows:

T1: Currently checks all lock dependencies prior to direct lock dependencies, which form a graph. T2: stack lock of the current thread to be checked.

According to Lemma 5 and the new two-thread model, we propose the following Final Algorithm based on the simple Algorithm:

Continuing to search the lock dependency graph T1 looks for a new lock dependency loop. In this new loop, if there are other locks in T2, the lock and the direct lock dependency in T2 form an indirect lock dependency, and the indirect lock dependency is checked for potential deadlocks. If a potential deadlock is found, then the algorithm ends, if not until the algorithm turns to 1 until the entire lock dependency graph is searched.

Will the final algorithm solve the previous cases of vulnerability? The answer is yes, as shown in the figure:

However, is lemma 5 true for all other cases? Why does the final algorithm work? We prove that the indirect lock dependence in the final algorithm is necessary and sufficient by the following two lemmas.

Lemma 6: Check T2 among the indirect lock dependence is necessary, otherwise the simple algorithm has solved all the problems.

Lemma 6 states that direct lock dependencies cannot be checked only because of read-write locks.

Lemma 7: T2 boundary is the current thread of the stack, which is sufficient.

According to Lemma 2 and 3, any deadlock can be converted to a two-threaded ABBA deadlock, and T1 can only contribute AB, and T2 must contribute BA. Here, T2 is not only a virtual thread, but also an actual physical thread, so T2 needs to check the current thread only.

At this point, the description of a generic read-write deadlock prediction algorithm is not formally proven. The algorithm is implemented in Lockdep community for review and submitted to Linux memory (the latest version of the current see https://lkml.org/lkml/2019/8/29/167). Due to the limitation of relevance and space, some key details of the algorithm are not all shown here. Readers who are interested can go to the link above to find it. Meanwhile, comments and suggestions are welcome.

It reviews the process from dealing with several serious system failures that broke out in a large cluster of Didi basic platform, to learning and researching the kernel deadlock prediction tool, and then to designing and implementing a new universal read-write deadlock prediction algorithm. There was a lot of uncertainty and even drama, but the whole process and the end result was rewarding. I think this experience is just like the forrest Gump running in the movie forrest Gump: when you reach a destination, you want to run a little more, and when you reach the next destination, you set a new and further goal. I also think that the difference between ordinary work and world-class work is not the starting point, but the end point, is whether to run a few farther goals.

At the same time, welcome to pay attention to didi technology official account, we will release the latest open source information and technical information in a timely manner!