There is usually a tension between safety and liveness:

We usually use locking to ensure thread safety, but if locking is used indiscriminately, it can lead to lock-ordering deadlocks.

Similarly, we use thread pools and semaphores to bound resource consumption, but those very bounds can lead to resource deadlocks.

1. Deadlock

The conditions for deadlock are best illustrated by the classic "dining philosophers" problem.

Five philosophers sit around a round table, with one chopstick between each pair of them. The philosophers alternate between thinking and eating, and each needs two chopsticks to eat. Suppose each philosopher must pick up the chopstick on his left before the one on his right. If all five philosophers pick up their left chopsticks at the same time, a deadlock occurs: each holds a resource that his neighbor needs while waiting for a resource his other neighbor already holds, and none gives up what he has until he obtains everything he needs.

When one thread holds a lock forever, other threads that try to acquire it block forever. This is the simplest form of deadlock (sometimes called a deadly embrace), in which multiple threads wait forever because of a cyclic locking dependency. Think of each thread as a node in a directed graph, where an edge from A to B means "thread A is waiting for a resource held by thread B." If that graph contains a cycle, there is a deadlock.

1.1 Lock-Ordering Deadlocks

A deadlock occurs when two threads attempt to acquire the same pair of locks in different orders. If locks are always requested in the same order, there can be no cyclic locking dependencies and therefore no deadlock: if every thread that needs locks L and M acquires them in the same order, deadlock cannot occur.

If all threads acquire locks in a fixed order, there will be no lock order deadlocks in the program.
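The hazard can be sketched in a few lines; the shape below follows the classic LeftRightDeadlock example referenced later in this article (field and method names are illustrative, and the method bodies are placeholders):

```java
// Sketch of a lock-ordering deadlock. If one thread calls leftRight()
// while another calls rightLeft(), each can end up holding one lock
// while waiting forever for the other.
public class LeftRightDeadlock {
    private final Object left = new Object();
    private final Object right = new Object();
    int work = 0;   // placeholder state so the methods do something observable

    public void leftRight() {
        synchronized (left) {
            synchronized (right) {
                work++;   // placeholder for real work
            }
        }
    }

    public void rightLeft() {
        synchronized (right) {     // opposite order: deadlock-prone
            synchronized (left) {
                work++;
            }
        }
    }
}
```

A single thread can call both methods safely; the danger appears only when two threads interleave the two orderings.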

1.2 Dynamic Lock-Ordering Deadlocks

Consider the code below, which transfers funds from one account to another. Before transferring, it acquires the locks on both Account objects so that the two balances are updated atomically, without violating invariants such as "an account balance cannot be negative."

    public void transferMoney(Account fromAccount, Account toAccount, DollarAmount amount)
            throws InsufficientFundsException {
        synchronized (fromAccount) {
            synchronized (toAccount) {
                if (fromAccount.getBalance().compareTo(amount) < 0)
                    throw new InsufficientFundsException();
                else {
                    fromAccount.debit(amount);
                    toAccount.credit(amount);
                }
            }
        }
    }

All threads seem to acquire locks in the same order, but in fact the order in which they are locked depends on the order of arguments passed to transferMoney, which in turn depends on external inputs. If two threads call transferMoney at the same time, and one thread transfers money from X to Y and the other thread transfers money from Y to X, a deadlock occurs:

A: transferMoney(myAccount, yourAccount, 10);

B: transferMoney(yourAccount, myAccount, 20);

With unlucky timing, A may acquire the myAccount lock and wait for the yourAccount lock, while B holds the yourAccount lock and waits for the myAccount lock.

To induce an ordering on the locks, you can use the System.identityHashCode method, which returns the value that the default Object.hashCode would return. The version of transferMoney below uses System.identityHashCode to define the lock order. It involves a few extra lines of code, but it eliminates the deadlock.

    private static final Object tieLock = new Object();

    // InsufficientFundsException is a custom exception (extending Exception),
    // thrown when the amount to withdraw exceeds the balance.
    public void transferMoney(final Account fromAcct, final Account toAcct,
                              final DollarAmount amount)
            throws InsufficientFundsException {
        class Helper {
            public void transfer() throws InsufficientFundsException {
                if (fromAcct.getBalance().compareTo(amount) < 0)
                    throw new InsufficientFundsException();
                else {
                    fromAcct.debit(amount);   // subtract amount from fromAcct
                    toAcct.credit(amount);    // add amount to toAcct
                }
            }
        }
        // System.identityHashCode returns the hash code that the default
        // Object.hashCode would return, even if the class overrides hashCode().
        int fromHash = System.identityHashCode(fromAcct);
        int toHash = System.identityHashCode(toAcct);
        if (fromHash < toHash) {
            synchronized (fromAcct) {
                synchronized (toAcct) {
                    new Helper().transfer();
                }
            }
        } else if (fromHash > toHash) {
            synchronized (toAcct) {
                synchronized (fromAcct) {
                    new Helper().transfer();
                }
            }
        } else {
            // The two hash codes are equal: acquire the tie-breaking lock first.
            synchronized (tieLock) {
                synchronized (fromAcct) {
                    synchronized (toAcct) {
                        new Helper().transfer();
                    }
                }
            }
        }
    }

In rare cases, two objects may have the same hash code, and the lock order would then have to be determined in some other arbitrary way, which could reintroduce deadlock. To prevent this, a tie-breaking lock is used: by acquiring the tie-breaking lock before acquiring either Account lock, we ensure that only one thread at a time performs the risky task of acquiring two locks in arbitrary order, eliminating the possibility of deadlock (as long as this mechanism is used consistently).

If hash collisions were frequent, this technique could become a concurrency bottleneck (similar to having only a single, program-wide lock, since threads would often have to wait for the tie-breaking lock). But because hash collisions with System.identityHashCode are vanishingly rare, this technique provides that last bit of safety at little cost.

If Account has a unique, immutable, comparable key, such as an account number, inducing a lock ordering is even easier: order the objects by their keys, eliminating the need for the tie-breaking lock.
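A minimal, self-contained sketch of that key-based ordering, using a stand-in Account class with a hypothetical accountNumber field as the key:

```java
public class KeyOrderedTransfer {
    // Stand-in Account: accountNumber is the hypothetical unique,
    // immutable, comparable key that induces the lock order.
    static class Account {
        final long accountNumber;
        long balance;
        Account(long number, long balance) {
            this.accountNumber = number;
            this.balance = balance;
        }
    }

    static void transfer(Account from, Account to, long amount) {
        // Always lock the account with the smaller key first; since keys
        // are unique, no tie-breaking lock is needed.
        Account first  = from.accountNumber < to.accountNumber ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }
}
```

Whatever order the arguments arrive in, the locks are acquired in the same (key) order, so the dynamic lock-ordering deadlock cannot occur.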

1.3 Deadlocks Between Cooperating Objects

Some multiple-lock acquisitions are less obvious than the ones above: the two locks need not be acquired in the same method. Consider the two cooperating classes below, which might be used in a taxi dispatching application. Taxi represents an individual taxi with a location and a destination; Dispatcher represents a fleet of taxis.

    class Taxi {
        private Point location, destination;   // a taxi has a location and a destination
        private final Dispatcher dispatcher;

        public Taxi(Dispatcher dispatcher) {
            this.dispatcher = dispatcher;
        }

        public synchronized Point getLocation() {
            return location;
        }

        public synchronized void setLocation(Point location) {
            this.location = location;
            if (location.equals(destination))
                dispatcher.notifyAvailable(this);   // external call made while holding the Taxi lock
        }
    }

    class Dispatcher {
        private final Set<Taxi> taxis;
        private final Set<Taxi> availableTaxis;

        public Dispatcher() {
            taxis = new HashSet<Taxi>();
            availableTaxis = new HashSet<Taxi>();
        }

        public synchronized void notifyAvailable(Taxi taxi) {
            availableTaxis.add(taxi);
        }

        public synchronized Image getImage() {
            Image image = new Image();
            for (Taxi t : taxis)
                image.drawMarker(t.getLocation());  // acquires each Taxi lock while holding the Dispatcher lock
            return image;
        }
    }

Although no method explicitly acquires two locks, callers of setLocation and getImage can acquire two locks just the same. A thread calling setLocation in response to an update from a GPS receiver first updates the taxi's location and then checks whether it has reached its destination; if so, it informs the Dispatcher that it needs a new destination. Because setLocation and notifyAvailable are both synchronized methods, the thread calling setLocation acquires the Taxi lock and then the Dispatcher lock. Similarly, a thread calling getImage acquires the Dispatcher lock and then each Taxi lock (one at a time). Just as in LeftRightDeadlock, two threads acquire two locks in different orders, and deadlock can result.

In LeftRightDeadlock and transferMoney, finding the deadlock is fairly easy: just look for methods that acquire two locks. Spotting the deadlock potential in Taxi and Dispatcher is harder: the warning sign is that an external method is being called while holding a lock, which is when you need to be alert to deadlocks.

Calling an external method while holding a lock invites liveness problems. The external method might acquire other locks (risking deadlock), or it might block for an unexpectedly long time, stalling other threads that need the currently held lock.

1.4 Open Calls

Method calls act as an abstraction barrier: you needn't know anything about what happens inside the called method. But precisely because you don't know what the called method does, calling an external method while holding a lock is difficult to analyze, and it can lead to deadlock.

If you do not need to hold a lock when calling a method, the Call is called an Open Call.

Classes that rely on open calls generally behave better, and are easier to write, than classes that call methods while holding locks. Avoiding deadlock through open calls is analogous to providing thread safety through encapsulation: while it is possible to build a thread-safe program without encapsulation, it is much easier to analyze the thread safety of a program that uses it. Similarly, the liveness of a program that relies exclusively on open calls is far easier to analyze than that of one that does not. Using open calls wherever possible makes it easier to find the code paths that acquire multiple locks, and therefore easier to ensure that locks are acquired in a consistent order.

Eliminating the risk of deadlocks by changing the above code to open calls requires that the synchronized code block be used only to protect operations that involve shared state.

Very often, the cause of problems like those in the code above is using synchronized methods instead of smaller synchronized blocks, merely for compact syntax or simplicity rather than because the entire method must be guarded by a lock.

    class Taxi {
        private Point location, destination;
        private final Dispatcher dispatcher;

        public Taxi(Dispatcher dispatcher) {
            this.dispatcher = dispatcher;
        }

        public synchronized Point getLocation() {
            return location;
        }

        public void setLocation(Point location) {
            boolean reachedDestination;
            synchronized (this) {
                this.location = location;
                reachedDestination = location.equals(destination);
            }
            if (reachedDestination)
                dispatcher.notifyAvailable(this);   // open call: no lock held
        }
    }

    class Dispatcher {
        private final Set<Taxi> taxis;
        private final Set<Taxi> availableTaxis;

        public Dispatcher() {
            taxis = new HashSet<Taxi>();
            availableTaxis = new HashSet<Taxi>();
        }

        public synchronized void notifyAvailable(Taxi taxi) {
            availableTaxis.add(taxi);
        }

        public Image getImage() {
            Set<Taxi> copy;
            synchronized (this) {
                copy = new HashSet<Taxi>(taxis);   // copy the set while holding the lock
            }
            Image image = new Image();
            for (Taxi t : copy)
                image.drawMarker(t.getLocation()); // open call: iterate without the Dispatcher lock
            return image;
        }
    }

Use open calls wherever possible in your programs; it is easier to perform deadlock analysis on programs that rely on open calls than on programs that call external methods while holding the lock.

Rewriting synchronized blocks of code to use open calls sometimes produces unexpected results because it makes an atomic operation non-atomic. In many cases, it is acceptable for an operation to lose its atomicity. For example, updating the location of a taxi and notifying the scheduler that the taxi is ready to go to a new destination do not need to be implemented as an atomic operation.

In other cases, however, the loss of atomicity is a problem, and another technique is needed to achieve it, such as ensuring that only one thread at a time executes the code path guarded by the open call. Consider shutting down a service: you may want to wait until in-progress operations complete and then release the resources the service holds. Holding the service lock while waiting for the operations to complete invites deadlock, but releasing the lock before the service is marked shut down lets other threads start new operations. The solution is to hold the lock just long enough to update the service state to "shutting down," so that other threads wanting to start new operations, including other threads wanting to shut down the service, see that the service is unavailable and do not try. You can then wait for shutdown to complete, knowing that once the open call completes, only the thread performing shutdown has access to the service's state. So instead of locking, this technique relies on a protocol to keep other threads out of the critical section of code.
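The shutdown protocol just described might be sketched as follows; Service, tryStartOperation, and releaseResources are illustrative names invented for this sketch, not an API from the text:

```java
// Sketch of a shutdown protocol: the state is flipped to "shutting down"
// while holding the lock, so no new operations can start, but the actual
// release of resources happens as an open call, with no lock held.
public class Service {
    private boolean shutdown = false;   // guarded by this
    private int active = 0;             // in-flight operations, guarded by this

    // Returns false once shutdown has begun, so callers do not start new work.
    public synchronized boolean tryStartOperation() {
        if (shutdown) return false;
        active++;
        return true;
    }

    public synchronized void operationFinished() {
        active--;
        notifyAll();
    }

    public void shutdown() throws InterruptedException {
        synchronized (this) {
            shutdown = true;            // from now on, no new operations can start
            while (active > 0)
                wait();                 // wait() releases the lock while waiting
        }
        releaseResources();             // open call: lock no longer held
    }

    void releaseResources() { /* placeholder */ }
}
```

The protocol, not the lock, is what keeps other threads out: any thread that respects tryStartOperation cannot race with the shutdown.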

1.5 Resource Deadlocks

Just as threads can deadlock when each holds a lock the other is waiting for and neither releases its own, they can also deadlock while waiting for resources.

Say you have two pooled resources, such as connection pools for two different databases. Resource pools are typically implemented with semaphores (see 5.5.3) to enforce blocking when the pool is empty. If a task requires connections to both databases, and the two resources are not always requested in the same order, thread A could be holding a connection to database D1 while waiting for a connection to D2, while thread B holds a connection to D2 and waits for one to D1. The larger the pools, the less likely this is to occur: if each pool holds N connections, a deadlock requires N threads in a cyclic wait plus particularly unlucky timing.
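A minimal sketch of such a semaphore-bounded pool (the Pool class and its method names are illustrative):

```java
import java.util.Collection;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

// Sketch of a semaphore-bounded resource pool: acquire() blocks while the
// pool is empty, which is exactly what makes resource deadlock possible
// when two such pools are drained in inconsistent order.
public class Pool<T> {
    private final Semaphore available;
    private final Queue<T> items = new ConcurrentLinkedQueue<T>();

    public Pool(Collection<T> initial) {
        items.addAll(initial);
        available = new Semaphore(initial.size());
    }

    public T acquire() throws InterruptedException {
        available.acquire();   // blocks if no items are free
        return items.poll();
    }

    public void release(T item) {
        items.offer(item);
        available.release();
    }
}
```

If thread A calls poolD1.acquire() then poolD2.acquire() while thread B does the reverse, and both pools are exhausted, each thread blocks inside the other's needed acquire: a resource deadlock.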

Another form of resource-based deadlock is thread-starvation deadlock.

An example: in a single-threaded Executor, one task submits another task and waits for the submitted task to complete. The first task will wait forever, permanently stalling itself and all other tasks waiting to execute in that Executor. Tasks that wait for the results of other tasks are the primary source of thread-starvation deadlock; bounded thread pools and interdependent tasks do not mix.
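The single-threaded Executor case can be demonstrated directly; StarvationDemo and the 500 ms timeout are illustrative choices for this sketch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of thread-starvation deadlock: in a single-threaded executor,
// a task that waits for a subtask submitted to the same executor can
// never complete, because the only worker thread is the one waiting.
public class StarvationDemo {
    public static boolean outerTaskStarves() throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<String> outer = exec.submit(() -> {
                Future<String> inner = exec.submit(() -> "sub-result");
                return inner.get();   // blocks the only worker thread forever
            });
            try {
                outer.get(500, TimeUnit.MILLISECONDS);
                return false;         // would mean the outer task completed (it cannot)
            } catch (TimeoutException expected) {
                return true;          // outer task is stuck: starvation deadlock
            }
        } finally {
            exec.shutdownNow();       // interrupt the stuck worker so the demo exits
        }
    }
}
```

With a pool of two or more threads the inner task would get its own worker and both tasks would complete, which is why this hazard is specific to bounded pools with interdependent tasks.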

2. Deadlock Avoidance and Diagnosis

If a program must acquire multiple locks, lock ordering must be part of the design: minimize the number of potential lock interactions, document the protocol for acquiring locks, and always follow that protocol.

In programs that use fine-grained locking, audit the code for deadlock freedom with a two-part strategy: first, identify where multiple locks could be acquired (try to keep this set small), and then perform a global analysis of all such instances to ensure that lock ordering is consistent across the entire program. Using open calls wherever possible greatly simplifies this analysis: if all calls are open, instances of acquiring multiple locks are easy to identify, whether by code review or by automated source-code analysis tools.

2.1 Timed Lock Attempts

With intrinsic (built-in) locks, a thread that cannot acquire the lock waits forever; explicit locks let you specify a timeout, and tryLock returns failure if the lock is not acquired within that time.

If the timeout is much longer than lock acquisition usually takes, you can regain control when something unexpected happens.

When a timed lock attempt fails, you do not necessarily know why. Maybe there was a deadlock; maybe a thread erroneously entered an infinite loop while holding that lock; or maybe some activity is running much more slowly than expected. Still, you can at least record that the attempt failed, log any useful information about what you were trying to do, and restart the computation more gracefully than by killing the whole process.

Using timed lock acquisition to acquire multiple locks can be effective against deadlock even when it is not used consistently throughout the program. If a lock acquisition times out, you can release the locks you hold, back off and wait for a while, and try again, possibly clearing the deadlock condition and allowing the program to recover. (This technique works only when the two locks are acquired together; if multiple locks are acquired through nested method calls, you cannot just release the outer lock, even if you know you hold it.)
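The acquire-with-timeout-then-back-off idiom might be sketched with java.util.concurrent.locks.ReentrantLock like this (the TimedLocking class, its parameters, and the 50 ms per-lock timeout are illustrative):

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the "try both locks, back off on failure, retry" idiom.
public class TimedLocking {
    private static final Random rnd = new Random();

    // Tries to acquire both locks; on failure releases whatever it holds,
    // sleeps a short random time, and retries until the overall deadline.
    static boolean runWithBothLocks(ReentrantLock from, ReentrantLock to,
                                    Runnable action, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (from.tryLock(50, TimeUnit.MILLISECONDS)) {
                try {
                    if (to.tryLock(50, TimeUnit.MILLISECONDS)) {
                        try {
                            action.run();
                            return true;
                        } finally {
                            to.unlock();
                        }
                    }
                } finally {
                    from.unlock();     // release the first lock before backing off
                }
            }
            Thread.sleep(rnd.nextInt(10));   // random backoff reduces repeated collisions
        }
        return false;                  // could not acquire both locks in time
    }
}
```

Because a thread never sleeps while holding one of the two locks, a thread that fails to get the second lock cannot keep another thread waiting forever.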

2.2 Deadlock analysis based on thread dump information

When deadlocks do happen, the JVM can help identify them with thread dumps.

Thread dumps include stack trace information for each running thread, which is similar to stack trace information when an exception occurs.

Thread dumps also include locking information, such as which locks are held by each thread, in which stack frame each was acquired, and which lock a blocked thread is waiting to acquire. Before generating a thread dump, the JVM searches the is-waiting-for graph for cycles to find deadlocks. If it finds one, it includes deadlock information identifying which locks and threads are involved and where in the program the offending lock acquisitions are.

To trigger a thread dump on a UNIX platform, send the JVM process a SIGQUIT signal (kill -3), or press Ctrl-\ on UNIX platforms or Ctrl-Break on Windows platforms; the JDK's jstack tool can also print one. Many IDEs (integrated development environments) can request a thread dump as well.

When a deadlock occurs, the thread dump includes a section that begins: Found one Java-level deadlock:

Intrinsic locks are associated with the stack frame in which they were acquired; explicit Locks are associated only with the thread that acquired them.

3. Other Liveness Hazards

Deadlock is the most common liveness hazard, but concurrent programs face other liveness hazards too, including starvation, missed signals, and livelock.

3.1 Starvation

Starvation occurs when a thread cannot continue execution because it does not have access to the resources it needs.

The most commonly starved resource is CPU cycles. Starvation can also be caused by misusing thread priorities in Java applications, or by executing nonterminating constructs while holding a lock (such as infinite loops or unbounded waits for a resource), because other threads that need that lock will never acquire it.

The thread priorities defined in the Thread API are merely scheduling hints. The Thread API defines ten priority levels, which the JVM maps to operating-system scheduling priorities as it sees fit; this mapping is platform-specific. On an operating system with fewer than ten priorities, several Java priorities may map to the same OS priority.

Avoid the temptation to use thread priorities, since they increase platform dependence and can cause liveness problems. Most concurrent applications can use the default priority for all threads.

3.2 Poor responsiveness

Poor responsiveness is common when background threads are used in GUI applications.

In a GUI framework, if a CPU-intensive background task competes with the main event thread for CPU cycles, you can lower the priority of the background thread to keep the application responsive.

Poor lock management can also cause poor responsiveness. If a thread holds a lock for a long time (for example, while iterating over a large container and performing expensive processing on each element), other threads that need to access that container may have to wait a very long time.

3.3 Livelock

Livelock is another form of liveness failure: although the thread is not blocked, it cannot make progress, because it keeps retrying an operation that always fails.

Livelock often occurs in applications that process transactional messages: if a message cannot be processed successfully, the messaging infrastructure rolls back the entire transaction and puts the message back at the head of the queue. If a bug in the handler for a particular type of message causes processing to fail, the transaction is rolled back every time the message is dequeued and passed to the faulty handler. Since the message goes back to the head of the queue, the handler is called over and over with the same result (such a message is sometimes called a poison message).

Although the message-handling thread is not blocked, it can never make progress. This form of livelock usually comes from overeager error-recovery code that mistakenly treats an unrecoverable error as a recoverable one.

Livelock can also occur when multiple cooperating threads change their state in response to one another in a way that makes it impossible for any of them to make progress. It is like two overly polite people who meet head-on in a corridor: each steps aside to let the other pass, they collide again on the new path, and they repeat the dance indefinitely.

The solution to this kind of livelock is to introduce some randomness into the retry mechanism. For example, when two machines on a network try to send a packet over the same carrier at the same time, the packets collide. Both machines detect the collision and both retransmit later. If each retries exactly 0.1 seconds later, they collide again and again, and the packets never go out even though there is plenty of idle bandwidth. To avoid this, each waits a random amount of time before retrying (the Ethernet protocol defines exponential backoff after repeated collisions, reducing the risk of congestion and repeated failure among the colliding machines).

Retrying with random waits and backoffs can be similarly effective for avoiding livelock in concurrent applications.
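Such a randomized exponential backoff might be sketched as follows (the Backoff class and its parameters are illustrative):

```java
import java.util.Random;
import java.util.function.BooleanSupplier;

// Sketch of retry with randomized exponential backoff, the standard way
// to break the symmetry that causes livelock.
public class Backoff {
    private static final Random rnd = new Random();

    // Runs `attempt` up to maxRetries times; after each failure it sleeps
    // for a random duration within a window that doubles on every retry.
    public static boolean retryWithBackoff(BooleanSupplier attempt,
                                           int maxRetries, long baseMillis)
            throws InterruptedException {
        long window = baseMillis;
        for (int i = 0; i < maxRetries; i++) {
            if (attempt.getAsBoolean())
                return true;
            Thread.sleep((long) (rnd.nextDouble() * window));  // random jitter breaks symmetry
            window *= 2;   // exponential growth spreads retries further apart
        }
        return false;
    }
}
```

Because each retrying party sleeps a different random duration, two parties that collided once are unlikely to collide on every subsequent attempt.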