Why do we need biased locks? When multiple processors are working at the same time, it is often necessary to deal with mutual exclusion. In general, acquire and release operations are included in the solution, which ensures that after acquire is executed by one thread, no other thread can complete acquire until release is executed. This process often involves locking. L. Lamport A Fast Mutual Execlusion Algorithm shows that this can be done through the FAST LOCKS algorithm, where the time required for lock and unlock operations is independent of the number of potential competing processors. Java has monitor built in to handle multithreaded contention.

  1. One optimization is to use lightweight locks to avoid the use of weight locks in most cases. The main mechanism for lightweight locks is the use of atomic operations on monitor entry, as well as certain exit operations that fall back to using the operating system’s mutex if a race occurs

    Lightweight locks are considered non-competitive in most cases

    There are several atomic instructions commonly used in the use of locks:

    • CAS: checks whether the value at the given pointer position is the same as the value passed in, and if so, modifies it
    • SWAP: Replaces the value of the original position of the pointer and returns the old value
    • Membar: Memory barriers restrict reordering of instructions by processors, such as preventing read-write operations from being reordered after write operations

    Two-word object headers are used in Java

    1. Mark Word, which contains synchronization information, garbage collection information, hash code information
    2. A pointer object to an object

    These instructions are expensive because their implementation often depletes the processor’s reorder buffer, limiting the processor’s ability to process instructions like a pipeline. The research data found that Eliminating_synchronization- related_atomic_OPERATIONs_with_biased_locking_and_bulk_rebiasing atomic operation in the real application, Javac, for example, can result in a 20% performance drop.

    CAS and fence are serialized in the operating system, and serialization instructions cause the CPU to almost stop, terminating and disabling any unnecessary instructions, and waiting for local storage to run out. On multi-core processors, this process can result in a considerable performance penalty

  2. Another way to optimize is to use bias locking, which not only assumes that there is no contention in most cases, but that only one thread will perform enter and exit throughout the lifetime of the monitor, making it suitable for the monitor to bias this thread. Of course, if another thread attempts to enter the biased lock, the biased lock cancellation operation will be performed even if there is no contention

Lightweight lock

  1. When a lightweight lock acquires a lock through the Monitorenter instruction, the lock is logged to the thread’s stack to indicate the lock acquisition. The lock record holds the mark Word of the original object and some necessary metadata to identify the locked object. At the time of acquiring a lock, mark word will be copied a record to lock (this operation is called displaced mark word) then try yes CAS operation object mark word pointer to lock the record. If the CAS succeeds, the current thread holds the lock; if it fails, another thread acquires the lock, which is then “inflated”, using the operating system’s mutexes and conditions instead. During the “inflated” process, the object’s own Mark word is manipulated by the CAS to point to the data structure containing mutex and condition.
  2. When unlock is executed, a mark word is thrown through the CAS. If the CAS succeeds, there is no contest and the lightweight lock remains. If it fails, the lock is in contention, and when held, the lock is properly released in a “very slow” manner and other waiting threads are notified to acquire it
  3. The same thread reprocesses the object in a straightforward way. If the light lock finds that the lock it is trying to acquire is already held by the current thread, it stores a 0 and does nothing with the mark word. In the same way, it does not update the mark word of the object if it sees a 0. And every time you reenter, count is explicitly recorded.

Implementation of bias locking

The thread pointer is NULL(0) indicating that no threads are currently biased towards this object

When an object is allocated that can perform bias and has no bias, CAS is executed and the current thread ID is placed in the Thread ID area of the Mark Word.

  1. If successful, the object itself is biased to the current thread, and the current thread becomes the bias owner

    The thread ID points directly to the thread represented within the JVM; In Java virtual machines, the last three bits are filled with 0x5 to indicate bias mode.

  2. If the CAS fails, i.e. another thread has become the owner of the bias, this means that the bias of this thread must be undone. The state of the object becomes a lightweight lock mode, and in order to achieve this, the thread trying to bias the object to itself must be able to manipulate the stack favoring the owner, requiring that the global safety point has been reached (no thread is executing bytecode). At this point, the bias owner will act like a lightweight lock operation, its stack will fill in the lock record, the object’s own Mark word will be updated to point to the oldest lock record on the stack, and the thread’s own blocking at the safe point will be released

    If it is not held by the original biased lock holder, the object is destroyed to return to the biased but not yet biased state, and an attempt is made to regain the lock. If the object is currently locked, it enters the lightweight lock, if not locked, it enters the unlocked, and cannot bias the object

If the object is biased and the owner of the biased thread is the current thread, the lock will be acquired immediately without any additional action.

The stack favoring the holder of the lock will not initialize the lock record, because the lock record will never be checked when the object is favoring

When you unlock, it tests the state of the Mark word to see if it’s still biased. If so, no more tests are done, and it doesn’t even matter if the thread ID is the current thread ID

Here the interpreter guarantees that the Monitorexit operation will only execute on the current thread, so this is another reason not to check

Biased locking mode does not apply

  1. Production-consumer mode, there will be a thread to compete;
  2. One thread allocates multiple objects, performs the initial synchronization for each object, and other threads handle the subprocesses

Batch back to deflatable or undo deflatable?

Experience has found that it is reasonable to selectively disable biased locks (Store-Fremm Biased Lock SFBL) for specific data structures to avoid inappropriate situations. To do this, we need to consider whether each data structure is less expensive to perform the undo bias or to return to the biased state. A heuristic is used to determine which way to perform bulk rebias. The metadata for each class contains a counter and a timestamp that increases each time an instance of a bias lock performs a bias undo. The timestamp is used to record the last time that the bulk rebias was performed.

Undo counts and counts undo operations that are in a biased but unbiased state, requiring only one CAS to undo

Counter itself has two thresholds, the bulk rebias threshold and bulk Revocation threshold. Initially, this heuristic algorithm can independently decide whether to perform REbias or REVOKE. Bulk Rebias will be performed when the threshold of bulk rebias is reached. The shift to the Rebiasable state time threshold is used to reset the counter counter, which will occur if the time elapsed since the last time bulk bias was performed has exceeded this threshold.

This means that there have not been many undo operations since the last time bulk Rebias was executed, which means that bias is still a good option

However, if the number of revocation increases within the time threshold after bulk rebias is implemented, once the Bulk revocation threshold is reached, the object of this class will not be allowed to use biased locks.

The threshold in Hotspot is Bulk rebias Threshold 20 Bulk Revoke Threshold 40 Decay time 25s

Undoing the bias itself is a costly exercise because it must suspend the thread and traverse the stack to find and modify lock Records.

The most obvious way to find all object instances of a data structure is to traverse the heap, which is fine when the heap is small, but performs poorly when the heap is large. To solve this problem for the class, use epoch. Epoch is a timestamp used to indicate the validity of a bias. As long as the data interface is biased, there will be a corresponding EPOCH bit on the Mark Word

At this point, an object is considered to have been biased to the thread T must satisfy two conditions: 1: the mark in the Mark Word that is biased to all this must be the thread, 2: The epoch of the instance must be equal to the epoch of the data structure. The size of the epoch itself is limited, that is, loops may occur, but this does not affect the correctness of the scheme

In this way, the bulk Rebiasing operation of class C will cost much less. Specific operations are as follows

  1. Increment the epoch of class C, which is itself a fixed-length INTEGER with the same number of bits as the epoch in the object header
  2. Scan all thread stacks to locate the locked ones in the current instance of class C, update their epoch to the new epoch of class C, or undo bias according to heuristic policies

Instead of scanning the heap, instances that have not had their epochals changed (unlike the class’s epochals) are automatically treated as being biased but not yet biased

The state can be called rebiaseable

Bloat with bias source code

The current HotSpot virtual machine implementation

Batch undo itself has performance problems, the general solution is as follows

  1. Add epoch, as previously described
  2. The thread does not bias when it first acquires, but after a certain number of executions, the same thread acquires again
  3. Allows locks to have fixed bias threads that change forever (or very little), and allows non-bias threads to acquire locks rather than revoke them.

    This method must ensure that the thread acquiring the lock must ensure that no other thread holds the lock before entering the critical section, and that no read-modify-write instructions can be used, only read and write

Current Hotspot JVM has different form 64bit for 32-bit and 64-bit

Thin locks, as described earlier. It is implemented in HotSpot using the product header, also known as stack lock

Mark’s complete state transition is as follows

  1. The object has just been allocated and is biased and unbiased

  2. The object favors thread T and takes note of the epoch

  3. There are new threads competing

    • 3.1 One strategy is for T to execute the corresponding UNLOCK and reassign it to a new thread so that the undo operation does not need to be performed
    • 3.2 If the biased object is handled by another thread via wait or notify, it enters the expansion mode and uses the weight lock
  4. With new threads competing, one strategy is to use heuristics to count the number of cancellations

    • 4.1 When the bulk rebias threshold is set, the system performs Bulk rebias
    • 4.2 When bulk Revoke is achieved and the product is still being held (the original partial lock holder), switch to lightweight lock (the calculation of hashcode relies on bloat to support modifying product Mark Word)
    • 4.3 No Hashcode calculation is performed when Bulk Revoke is reached and the lock is not held (the original biased lock holder) and is switched to the unlocked unbiased state
  5. For an object that has undergone bulk rebias, an instance that is not locked during the check will have its epoch become stale, unlike the class’s, but can be biased

    • 5.1 If garbage collection occurs, the Lock will be initialized to a biased but unbiased state (which also reduces the impact of the epoch cycle)
    • 5.2 If a biased lock is acquired by a thread again, the thread returns to the biased lock acquisition state
  6. In the lightweight lock state, it might not have a Hashcode calculation, it might, depending on inflat

    • 6.1 Without HashCode, the unlock returns to the unbiased state without HashCode calculation
    • 6.2 Is occupied by other threads and moved to weight locks (such as Mutex and Condition using POXIS)
  7. An unbiased state that is not locked and does not calculate hashcode after locking is transferred to a lightweight lock

  8. In weight lock condition

    • 8.1 8.2 If there is no contention during stop-the-word, it can be de-ballooned (no other threads acquire and release locks during STW, which is safe) and revert to The corresponding state (i.e. revert to using bias locks) depending on whether hashcode exists.
    • 8.3 The Lock/UNLOCK is still in a Weight Lock
  9. After calculating HashCode, add lock and unlock corresponding state transition (9.10)

The appendix

Quickly Reacquirable Locks Dave Dice Mark Moir Bill Scherer

Eliminating_synchronization-related_atomic_operations_with_biased_locking_and_bulk_rebiasing

Evaluating and improving biased locking in the HotSpot virtual machine

biased-locking-in-hotspot