Notes on Understanding the Java Virtual Machine in Depth

Reread Understanding the Java Virtual Machine and organize your notes in question and answer format.

How are Java memory regions allocated?

During the execution of a program, the Java virtual machine divides the memory it manages into several areas, each with its own purpose, creation time, and destruction time: the program counter, the virtual machine stack, the native method stack, the heap, the method area, and the run-time constant pool.

Program counter: A small memory area that serves as an indicator of the line number of the bytecode being executed by the current thread. Multithreading is implemented by switching between threads and allocating slices of processor time; a core can execute only one instruction at any given moment, so for a thread to resume from the correct position after being switched out, each thread needs its own program counter. This memory is therefore private to each thread. If the thread is executing a Java method, the counter records the address of the current bytecode instruction; if it is executing a native method, the counter value is undefined. This is the only area for which the Java Virtual Machine Specification does not specify any OutOfMemoryError condition.

Virtual machine stack: Thread-private, with the same lifetime as its thread. For each method invocation the virtual machine creates a stack frame that stores the local variable table, operand stack, dynamic linking, method return address, and so on. Each method call corresponds to a stack frame being pushed; each method completion corresponds to that frame being popped.

Local variable table: Stores the Java virtual machine primitive types known at compile time, object references, and returnAddress types (addresses pointing to a bytecode instruction). The virtual machine stack can raise StackOverflowError and OutOfMemoryError.
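As a small illustration of the stack-frame mechanism, the sketch below (class name invented) recurses without a base case; each call pushes a new frame with its own local variable table until the virtual machine stack is exhausted and a StackOverflowError is raised:

```java
public class StackDepthDemo {
    static int depth = 0;

    // each invocation pushes a new stack frame holding its own local variable table
    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // the exact depth depends on the -Xss stack size setting
            System.out.println("StackOverflowError at depth " + depth);
        }
    }
}
```

The depth reached before the error varies with the configured stack size.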

Native method stack: Very similar to the virtual machine stack, except that the virtual machine stack serves Java methods and the Native method stack serves Native methods.

Heap: The largest chunk of memory managed by the virtual machine, shared by all threads and created when the virtual machine is started. The only purpose of a heap is to hold object instances, and almost all object instances in Java are allocated memory in the heap.

Method area: Stores type information loaded by the virtual machine, constants, static variables, code compiled by the just-in-time compiler, and so on. The method area also contains the run-time constant pool. Besides descriptions of the class version, fields, methods, and interfaces, a Class file contains the constant pool table, which stores the various literals and symbolic references generated at compile time; this content is placed into the run-time constant pool of the method area after the class is loaded.

How to understand the JMM?

Java Memory Model: A specification that defines how the JVM uses computer memory. Broadly speaking, it has two parts: the JVM memory structure, and the JMM and threading specification.

The JMM is designed to control communication between Java threads, determining when a thread’s write to a shared variable becomes visible to another thread (it defines an abstract relationship between threads and main memory).

The JMM assures developers that if programs are properly synchronized, their execution will be sequentially consistent (sequence-consistent memory model)

On the basis of ensuring sequential consistency (execution results remain unchanged), the compiler and processor are given maximum freedom to optimize (improve program parallelism).

Externally (for multithreading): various synchronization mechanisms (volatile, locks, final, synchronized, etc.)
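A minimal sketch of one of these mechanisms (class and field names are invented): a volatile stop flag guarantees that a write made by one thread is visible to another, which a plain boolean field does not:

```java
public class StopFlagDemo {
    // volatile establishes a happens-before edge between the writing and reading threads
    private volatile boolean running = true;

    void stop() {
        running = false;          // visible to the spinning thread without any locking
    }

    long spinUntilStopped() {
        long iterations = 0;
        while (running) {         // without volatile, this read could be hoisted and loop forever
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        StopFlagDemo demo = new StopFlagDemo();
        Thread worker = new Thread(() ->
                System.out.println("stopped after " + demo.spinUntilStopped() + " iterations"));
        worker.start();
        Thread.sleep(10);
        demo.stop();
        worker.join();            // terminates promptly because the write is visible
    }
}
```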

See PRIK’s BLOG for more details on this in another article: An In-depth Understanding of the Java Memory Model

What is the process of creating Java objects?

  1. When the virtual machine reaches a bytecode new instruction, it checks to see if the instruction’s argument can locate a symbolic reference to a class in the constant pool, and determines whether the class represented by the reference has been loaded, parsed, and initialized. If not, the corresponding class loading process must be performed first.
  2. After the class-loading check passes, it is time to allocate memory. The amount of memory an object needs can be fully determined once its class is loaded, so allocating memory for an object amounts to carving a block of a determined size out of the Java heap. There are two situations:
    1. If the heap memory is perfectly regular, all that is needed is to move the pointer that marks the boundary between used and free memory by the object's size. This allocation method is called pointer collision (bump-the-pointer).
    2. If the heap memory is not completely tidy, the virtual machine needs to maintain a list of what memory is available and how big it is. When allocating memory, you need to find a chunk of memory in the list that is large enough to allocate to the object instance and update the list. This approach is called the free list.

    The choice between the two depends on whether the heap is regular, which in turn depends on whether the garbage collector compacts. Serial, ParNew, and other collectors with a compaction step can use pointer collision; CMS, which is based on the mark-sweep algorithm, can in theory only use the free list.

  3. There is also the issue of thread safety to consider. In concurrent cases, many operations are thread-unsafe. There are two solutions:
    1. CAS plus retry-on-failure, which makes the update of the allocation pointer atomic.
    2. Thread Local Allocation Buffers (TLAB). Each thread gets its own chunk of memory to allocate from; synchronization and locking are only needed when a thread's buffer runs out and a new one must be handed out.
  4. The allocated memory space (excluding the object header) is then initialized to zero values, which ensures that an object's instance fields can be used in Java code without explicitly assigning initial values.
  5. Set the object header. The object header contains information about which class the object is an instance of, how to find the metadata of the class, the object’s hashCode(computed only when the hashCode() method is called), and the object’s GC generation age.
  6. At this point, from the virtual machine’s point of view, an object has been created. From the programmer’s point of view, object creation has only just begun, because the constructor has not yet run. The <init>() method in the Class file must still be executed so the object is initialized the way the program intends; only then is a fully usable object created.
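The pointer-collision allocation of step 2 combined with the CAS retry of step 3 can be sketched as follows. This is a toy model over integer offsets, not real heap memory; all names are invented for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

// toy bump-the-pointer allocator: the "heap" is just a range of offsets [0, capacity)
public class BumpAllocator {
    private final AtomicLong top = new AtomicLong(0);  // boundary between used and free space
    private final long capacity;

    public BumpAllocator(long capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of the allocated block, or -1 if the space is exhausted. */
    public long allocate(long size) {
        while (true) {
            long oldTop = top.get();
            long newTop = oldTop + size;
            if (newTop > capacity) {
                return -1;                       // a real VM would trigger GC or a slow path here
            }
            if (top.compareAndSet(oldTop, newTop)) {
                return oldTop;                   // CAS succeeded; on failure, retry the loop
            }
        }
    }
}
```

Several threads can call allocate concurrently; the CAS loop guarantees no two of them are handed overlapping blocks.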

What are the parts of a Java object? What information is stored separately?

A Java object consists of three parts: the object header, the instance data, and the alignment padding.

  1. Object header. Stores two parts of information. The first is the object’s own runtime data, such as its hash code, GC generational age, lock status flags, the lock held by a thread, the biased thread ID, and the bias timestamp. This data is 32 or 64 bits long, depending on whether the VM is 32- or 64-bit. To increase space utilization, it is designed as a dynamic data structure that packs as much data as possible into a very small space; the meaning of the stored bits varies with the object’s state. The second part is the type pointer: a pointer to the object’s type metadata, which the virtual machine uses to determine which class the object is an instance of.
  2. Instance data. That is, the information about the object we actually store, the various fields defined in the code, and so on.
  3. Alignment padding. A placeholder. HotSpot’s automatic memory management requires that object sizes be an integer multiple of 8 bytes. The object header is deliberately designed to be exactly one or two times 8 bytes, so if the instance data portion does not end on an 8-byte boundary, alignment padding fills the gap.
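The 8-byte alignment rule amounts to rounding the header-plus-instance-data size up to the next multiple of 8. A sketch of the arithmetic (class and method names invented):

```java
public class ObjectAlignment {
    /** Rounds size up to the next multiple of 8 bytes, as HotSpot does for object sizes. */
    static long align8(long size) {
        return (size + 7) & ~7L;   // adding 7 then clearing the low 3 bits rounds up
    }

    public static void main(String[] args) {
        // e.g. a 12-byte header plus a 1-byte field needs 3 bytes of padding
        System.out.println(align8(13));  // 16
        System.out.println(align8(16));  // 16, already aligned
    }
}
```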

How does the virtual machine find the location of Java objects? How many ways are there? What are their strengths and weaknesses?

Java programs manipulate concrete objects on the heap through reference data on the stack. There are two main approaches: handles and direct pointers.

Handle: The Java heap sets aside a block of memory as a handle pool; the reference stores the address of the object’s handle, and the handle stores the addresses of both the object’s instance data and its type data. The advantage is that the reference holds a stable handle address: when objects are moved (as is common during garbage collection), only the instance-data pointer inside the handle changes, and the reference itself never needs to be modified.

Direct pointer: The reference stores the address of the object’s instance data directly, and the object layout must then provide a way to reach the type data. The advantage is speed: each object access saves the time cost of one pointer indirection.

How do you determine if an object is garbage that needs to be collected? What is the garbage collection process like?

Garbage collection needs to answer three questions first:

  1. What memory needs to be reclaimed?
  2. When is it recycled?
  3. How to recycle?

To determine which memory needs to be reclaimed is to determine which objects are dead (no longer needed). There are two main methods:

  1. Reference counting: Attach a counter to each object, incremented each time a reference to it is created and decremented each time a reference becomes invalid; an object whose counter is zero is no longer in use. Although simple and efficient, this approach has not been adopted by mainstream Java virtual machines. The reason is that this deceptively simple algorithm has many corner cases and requires a lot of extra work to be correct — for example, plain reference counting cannot easily handle circular references between objects.
  2. Reachability analysis: Starting from a set of root objects called “GC Roots”, search downward along reference relationships. If there is no path from GC Roots to an object, the object is unreachable, which proves that it is no longer in use.
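The circular-reference weakness mentioned in point 1 can be seen in a small sketch. The counting here is done by hand, since the JVM itself does not reference-count; all names are invented:

```java
// toy manual reference counting, to illustrate why cycles defeat the scheme
public class RefCounted {
    int count = 1;            // one count for the reference held by the creator
    RefCounted partner;       // may form a cycle

    void retain()  { count++; }
    void release() { count--; }   // a real system would free the object at count == 0

    public static void main(String[] args) {
        RefCounted a = new RefCounted();
        RefCounted b = new RefCounted();
        a.partner = b; b.retain();    // a -> b
        b.partner = a; a.retain();    // b -> a: a cycle
        a.release();                  // drop both external references:
        b.release();                  // the pair is now unreachable from outside...
        System.out.println(a.count + " " + b.count);  // ...yet both counts are still 1
    }
}
```

Neither counter ever reaches zero, so a pure reference-counting collector would leak both objects; reachability analysis from GC Roots has no such problem.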

Objects found unreachable by the reachability analysis are not immediately garbage collected. An unreachable object is first marked, then filtered according to whether it needs to execute its finalize() method: if the object has not overridden finalize(), or the virtual machine has already invoked it once, execution is considered unnecessary. Objects that do need to run finalize() are placed on a queue, and a low-priority thread set up by the virtual machine executes their finalize() methods. The virtual machine only promises to start the method; it does not promise to wait for it to finish. Note that finalize() has been officially deprecated; try-finally is a better mechanism for cleanup.

What reference types are available in Java?

Since JDK 1.2, Java divides references into strong, soft, weak, and phantom references.

Strong references: The most traditional reference, simply new an object reference. In any case, as long as a strong reference relationship exists, the garbage collector will never reclaim the referenced object.

Soft references: Weaker than strong references; they describe objects that are useful but not required. When memory is about to run out, softly referenced objects are included in a second round of collection, and an OutOfMemoryError is thrown only if memory is still insufficient afterwards. Useful for memory-sensitive caches. Implemented by SoftReference.

Weak reference: a reference weaker than a soft reference. Weakly referenced objects only survive until the next garbage collection occurs. When the garbage collector starts working, it reclaims weak reference objects regardless of whether there is currently sufficient memory. It can also be used for memory-sensitive and less important caches. WeakReference implementation.

Phantom reference: Also known as a virtual reference, the weakest reference relationship. A phantom reference neither affects an object’s lifetime nor can it be used to obtain an object instance; the sole purpose of setting one on an object is to receive a system notification when the object is about to be collected. Implemented by PhantomReference.
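The four reference strengths correspond to classes in java.lang.ref. A deterministic sketch of their APIs: PhantomReference.get() always returns null by design, while the soft and weak referents are still retrievable here because a strong reference to the object is held throughout:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class ReferenceKindsDemo {
    public static void main(String[] args) {
        Object strong = new Object();                    // strong reference: never collected while held

        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);

        // phantom references are registered with a queue; their get() is defined to return null
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        System.out.println(soft.get() == strong);   // true: referent is still strongly reachable
        System.out.println(weak.get() == strong);   // true: same reason
        System.out.println(phantom.get());          // null: the instance cannot be obtained
    }
}
```

Once the strong reference is dropped, the garbage collector may clear the weak (and, under memory pressure, the soft) reference, and will enqueue the phantom reference before reclaiming the object.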

What is the method area garbage collection like?

The Java Virtual Machine Specification permits not implementing garbage collection in the method area. Moreover, the cost-effectiveness of method area garbage collection is low: there is not much memory to reclaim, and determining what can be collected is complicated.

Method area garbage collection mainly reclaims obsolete constants and types that are no longer used. Deciding whether a constant is obsolete is similar to deciding whether an object is dead. Deciding whether a type is no longer used is more troublesome; three conditions must all hold:

  1. All instances of the class have been reclaimed — no instance of the class or of any of its subclasses exists in the Java heap
  2. The classloader that loaded the class has been reclaimed
  3. The java.lang.Class object corresponding to the class is not referenced anywhere, so the class cannot be accessed via reflection.

Even when all three conditions are met, the type is only allowed to be collected; whether it actually is collected is controlled by virtual machine parameters.

In scenarios that make heavy use of reflection, dynamic proxies, and bytecode frameworks such as CGLib, or that frequently define custom class loaders (such as dynamic JSP generation), the Java virtual machine usually needs type-unloading capability so that the method area does not come under excessive memory pressure.

What is generational collection?

Most garbage collectors of commercial virtual machines are designed according to the theory of “generational collection”. The theory of generational collection is based on two hypotheses:

  1. Weak generational hypothesis: most objects are ephemeral — they die young
  2. Strong generational hypothesis: the more garbage collections an object has survived, the harder it is for it to die.

Generational collection theory: The collector should divide the Java heap into different regions and place objects in different regions according to their age (the number of garbage collections they have survived).

  1. If a region consists mostly of soon-to-die objects, it can be collected by focusing on the few objects that will survive rather than marking the many objects to be reclaimed, reclaiming a large amount of space at low cost.
  2. If a region consists mostly of hard-to-die objects, it can be collected at a low frequency, balancing the time cost of garbage collection against efficient use of memory space.

There is one remaining problem: cross-generational references. During a Minor GC, objects in the new generation may be referenced from the old generation, so the entire old generation would have to be traversed as well to guarantee correct reachability results. This obviously has a significant performance impact.

A third hypothesis can be inferred from the first two: cross-generational references are rare compared with same-generation references. If two objects reference each other, they tend to live and die together. For example, if an old-generation object references a new-generation object, the new-generation object will keep surviving collections and eventually be promoted into the old generation as it ages, at which point the cross-generational reference disappears.

Given this hypothesis, we only need a global data structure in the new generation — the Remembered Set — which divides the old generation into small chunks and records which chunks may contain cross-generational references. When a Minor GC occurs, only the old-generation objects in those marked chunks need to be added to GC Roots for the reachability scan.

What are the common garbage collection algorithms? How does each work?

Mark-sweep algorithm. The algorithm has two stages: mark and sweep. First all objects to be reclaimed are marked, and after marking, all marked objects are collected; alternatively, live objects can be marked and all unmarked objects collected. The marking process is exactly the process of deciding whether an object is garbage. This algorithm has two drawbacks:

  1. Execution efficiency is unstable. The execution time increases as the number of objects increases.
  2. Memory fragmentation. Sweeping leaves behind a large number of discontiguous memory fragments. Too much fragmentation can force another garbage collection to be triggered early when a large object later cannot find a sufficiently large contiguous block.

Mark-copy algorithm. The semispace-copy form divides available memory into two equal blocks and uses only one at a time: when that block is exhausted, the surviving objects are copied to the other block and the first block is cleaned in one pass. For the old generation, where most objects survive, the copying overhead is high; for the new generation, it is simple to implement and efficient to run. The obvious drawback is low memory utilization. Most commercial virtual machines today use an evolved version for the new generation: divide it into one large Eden space and two smaller Survivor spaces, and allocate from Eden plus one Survivor. When garbage collection occurs, the surviving objects in Eden and that Survivor are copied into the other Survivor space, and Eden and the old Survivor are cleaned. HotSpot’s default size ratio is Eden : Survivor = 8 : 1. To handle the case where a Survivor cannot accommodate all the surviving objects, other memory (the old generation) is relied upon as an allocation guarantee.
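Under the default Eden : Survivor = 8 : 1 split (there are two Survivors, so 8 : 1 : 1 overall), only one Survivor space is held in reserve at any time, meaning 90% of the new generation is usable. A sketch of the arithmetic (names invented):

```java
public class YoungGenLayout {
    /** Usable bytes under the default Eden:Survivor:Survivor = 8:1:1 split. */
    static long usable(long youngGenBytes) {
        long survivor = youngGenBytes / 10;   // each Survivor is 1/10 of the young generation
        long eden = youngGenBytes - 2 * survivor;
        return eden + survivor;               // Eden plus the one active Survivor
    }

    public static void main(String[] args) {
        System.out.println(usable(100));  // 90: only 10% is held in reserve
    }
}
```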

Mark-compact algorithm. For the old generation, the mark-copy algorithm is generally not chosen, because the copying overhead is high and an extra allocation guarantee is required. Given the survival characteristics of old objects, the mark-compact algorithm was introduced: the marking phase is the same as in mark-sweep, but afterwards all surviving objects are moved toward one end of the memory space, and the memory beyond the boundary is cleaned directly. Moving objects is itself a heavy operation, yet not moving them leaves fragmentation, which must then be handled by more complex memory allocators and accessors. One compromise is to use mark-sweep most of the time and perform a mark-compact pass only when memory fragmentation becomes severe enough to affect the allocation of large objects.

How is the problem of cross-generational references solved in generational collection? What is a remembered set? What is a card table? How is the card table maintained?

To solve the problem of cross-generational references, the garbage collector creates a data structure called a remembered set in the new generation, so that the entire old generation does not have to be added to GC Roots.

A remembered set is a data structure that records the set of pointers from a non-collected region into the collected region. Considering storage and maintenance costs, it need not be precise down to every pointer; the common choice is card precision: each record corresponds to a small block of memory in which one or more objects contain cross-generational pointers. This implementation is called a Card Table. The underlying data structure is a byte array; each element corresponds to a fixed-size block — a card page — of the memory region it represents. A card page holds multiple objects, and if any object in it contains a cross-generational pointer, the corresponding element is marked 1 (dirty), otherwise 0. At garbage collection time, simply select the elements marked 1 in the card table to find the card pages containing cross-generational pointers, and add them to GC Roots to be scanned together.

So how is the card table state maintained?

The HotSpot virtual machine maintains the card table using write barrier technology. A write barrier can be thought of as an AOP-style aspect, at the virtual machine level, around the “assign to a reference-typed field” operation: every reference assignment generates an around-advice style notification, and this hook is used to update the card table.
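The card-table maintenance described above can be sketched as follows. The 512-byte card page matches HotSpot's default, but the class, the flat int addressing, and the dirty value of 1 (following the description above) are simplifications for illustration:

```java
// toy card table: one byte per 512-byte card page of a flat "heap"
public class CardTable {
    static final int CARD_SHIFT = 9;          // 2^9 = 512-byte card pages
    final byte[] cards;

    CardTable(int heapBytes) {
        cards = new byte[(heapBytes >> CARD_SHIFT) + 1];
    }

    // the write barrier: run on every reference-field store, it dirties the page
    void onReferenceStore(int fieldAddress) {
        cards[fieldAddress >>> CARD_SHIFT] = 1;
    }

    boolean isDirty(int pageIndex) {
        return cards[pageIndex] == 1;
    }

    public static void main(String[] args) {
        CardTable table = new CardTable(4096);     // covers 8 card pages
        table.onReferenceStore(1000);              // a store at offset 1000 lands in page 1
        System.out.println(table.isDirty(1) + " " + table.isDirty(0));
    }
}
```

At Minor GC time, the collector only scans objects in pages whose byte is dirty, instead of the whole old generation.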

How can accessibility analysis be correct in a concurrent environment?

In reachability analysis, the object graph must be traversed on the basis of a consistent snapshot. Otherwise, objects that should be alive may be marked dead — the “object disappearance” problem. For example: while the scan is in progress, object B is reachable only through a reference that has not yet been scanned, so B is still unmarked; the user thread then deletes that reference and instead makes C — an already-scanned object marked alive — reference B. Because C will not be scanned again, B, which should now be alive, is garbage collected.

To solve the object disappearance problem, there are two solutions.

Incremental update: When an already-scanned object marked alive inserts a new reference to an object not yet marked alive, that reference is recorded. After the concurrent scan completes, the live objects in the recorded references are scanned again as roots.

Original snapshot (snapshot at the beginning, SATB): When an object that has been visited but not fully scanned (not all of its references traversed) is about to have a reference to a not-yet-scanned object deleted, the deleted reference is recorded. After the concurrent scan completes, those objects are scanned again as roots.

What are the common garbage collectors? How does it all work?

Serial collector: the most basic and oldest collector, using the mark-copy algorithm; it was once the only choice for new-generation collection. It works single-threaded, and while it collects, all other worker threads must be suspended until collection finishes. Its advantages are simplicity, efficiency, and the smallest extra memory consumption of any garbage collector; it is the default new-generation collector in client mode. On a single-core processor, a single thread avoids thread-switching overhead, so collection efficiency is actually higher. It suits client-mode (desktop) applications: with a small new generation, garbage collection pauses can be kept to tens of milliseconds, at most around a hundred.

Serial Old collector: the old-generation version of the Serial collector. Single-threaded, using the mark-compact algorithm. It serves VMs running in client mode. It is also used on the server side: as the partner of the Parallel Scavenge collector in JDK 5 and earlier, and as the fallback for CMS when a Concurrent Mode Failure occurs.

ParNew collector: the multithreaded version of the Serial collector, clearly superior to Serial on multi-core processors.

Parallel Scavenge collector: a new-generation collector using the mark-copy algorithm. Multithreaded. Its focus is achieving a controllable throughput, so it is also known as the throughput-first collector. Parameters can be set so that, based on how the system is running, it automatically adjusts the size of the new generation, the ratio between Eden and the Survivor spaces, the promotion threshold for old-generation objects, and other parameters to hit a target pause time or maximum throughput (the adaptive adjustment policy). This mode is a good choice when manually tuning the collector is difficult.

Parallel Old collector: the old-generation version of the Parallel Scavenge collector. Multithreaded, using the mark-compact algorithm, and likewise focused on throughput.

CMS collector: aims for the shortest possible collection pauses, keeping system pause time as short as possible to give users the best interactive experience. The collection process has four steps: 1. initial mark -> 2. concurrent mark -> 3. remark -> 4. concurrent sweep. The initial mark and remark phases must Stop The World. The initial mark only marks objects directly reachable from GC Roots, which is fast. Concurrent marking then traverses the entire object graph, starting from the objects directly associated with GC Roots. Remarking corrects the portion of the marks that changed because user threads kept running during concurrent marking (via incremental update), with a slightly longer pause. Finally comes the concurrent sweep phase. CMS has three notable drawbacks:

  1. The CMS collector is very sensitive to processor resources. It takes up a portion of the CPU’s computing power, resulting in a decrease in overall throughput.
  2. Floating garbage that it fails to collect can lead to a Full GC. During the concurrent mark and concurrent sweep phases the program is still running normally, so a portion of memory must be reserved for it. When the reserved memory cannot satisfy the program’s need for newly allocated memory, a Concurrent Mode Failure occurs. At that point the virtual machine falls back, freezing the user threads and temporarily using the Serial Old collector to redo the old-generation collection, which causes a much longer pause.
  3. Because it uses the mark-sweep algorithm, a large amount of fragmented space remains at the end of collection; sometimes a Full GC has to be triggered early as a result.

Garbage First (G1) collector: a milestone collector that pioneered collection oriented toward local regions and a region-based memory layout, aiming for the highest achievable throughput under a controllable pause target. G1 divides the contiguous Java heap into many independent Regions of equal size; each Region can serve as Eden space, Survivor space, or old-generation space of the moment as needed. There are also Humongous regions dedicated to storing large objects, treated essentially as part of the old generation. G1 tracks the collection value of each Region — how much space collecting it would reclaim versus how long it would take — and maintains a priority list based on that value. Guided by the user-specified pause time, it preferentially reclaims the Regions with the highest value, which lets G1 achieve the highest collection efficiency in the limited time. Each Region maintains its own remembered set to handle cross-Region references, which costs additional memory (typically 10% to 20% of the heap). Unlike CMS, which uses the incremental update algorithm for concurrent marking, G1 uses the original snapshot (SATB) algorithm. Collection process:

  • Initial mark: marks the objects directly reachable from GC Roots
  • Concurrent mark: performs the reachability analysis concurrently
  • Final mark: a short pause to process the small number of objects left over at the end of concurrent marking (original snapshot)
  • Filtered collection: update Region statistics, sort Regions by value and collection cost, and make a collection plan according to the user’s expected pause time. Select several Regions to form the collection set, copy their surviving objects into an empty Region, and then clear the old Regions entirely. Moving objects requires pausing user threads, but is executed in parallel by multiple collector threads.
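G1's value-driven Region selection can be sketched as a priority queue consulted until the pause budget runs out. All names and the cost model here are invented for illustration; G1's real prediction model is far more elaborate:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class RegionSelector {
    static class Region {
        final String name;
        final long reclaimableBytes;     // how much space collecting this Region would free
        final long predictedCostMs;      // how long collecting it is predicted to take
        Region(String name, long reclaimableBytes, long predictedCostMs) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.predictedCostMs = predictedCostMs;
        }
        double value() { return (double) reclaimableBytes / predictedCostMs; }
    }

    /** Greedily picks the highest-value Regions that fit within the pause budget. */
    static List<Region> planCollection(List<Region> regions, long pauseBudgetMs) {
        PriorityQueue<Region> byValue =
                new PriorityQueue<>(Comparator.comparingDouble(Region::value).reversed());
        byValue.addAll(regions);
        List<Region> plan = new ArrayList<>();
        long spent = 0;
        while (!byValue.isEmpty()) {
            Region r = byValue.poll();
            if (spent + r.predictedCostMs <= pauseBudgetMs) {
                plan.add(r);                  // most garbage per unit of pause time first
                spent += r.predictedCostMs;
            }
        }
        return plan;
    }
}
```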

G1 vs CMS: empirically, CMS tends to do better below roughly 6-8 GB of heap, and G1 above that. G1 is expected to supersede CMS over time, though it has a higher memory footprint and processor load than CMS, so for now it cannot completely replace it.

Collector comparison

Collector | Parallelism | Generation | Algorithm | Goal | Typical scenario
Serial | serial | new generation | mark-copy | response time | client mode on a single CPU
Serial Old | serial | old generation | mark-compact | response time | client mode on a single CPU / CMS fallback
ParNew | parallel | new generation | mark-copy | response time | server mode on multiple CPUs, paired with CMS
Parallel Scavenge | parallel | new generation | mark-copy | throughput | background computation with little interaction
Parallel Old | parallel | old generation | mark-compact | throughput | background computation with little interaction
CMS | concurrent | old generation | mark-sweep | response time | Java applications on Internet sites or B/S servers
G1 | concurrent | both | mark-compact + copy | response time | server applications, replacing CMS

Common collector combinations:

  1. Serial + Serial Old implements single-threaded low-latency garbage collection
  2. ParNew + CMS implements multi-threaded low latency garbage collection
  3. Parallel Scavenge + Parallel Old implements multi-threaded high-throughput garbage collection

How do I choose the right garbage collector

More attempts should be made according to the actual situation. The guiding principles are as follows:

  1. If the system is throughput first and CPU resources are used to maximize processing, Parallel GC is used
  2. If your system is considering low latency, keep each GC as short as possible and use CMS GC
  3. If the system heap memory is large and you want the average GC time to be manageable overall, use G1 GC

Memory considerations:

  1. Above 4G, using G1 GC is relatively cost-effective
  2. G1 GC is highly recommended for heaps larger than 8 GB, especially in the 16-64 GB range

What is the default garbage collector for each JDK version?

Java 8 defaults to the Parallel GC; in Java 9 the default was changed to G1 GC.

How does class loading work? What was done at each stage?

Class loading goes through five stages: loading, verification, preparation, resolution, and initialization.

1. Loading

  1. Obtain the binary byte stream that defines the class by its fully qualified name; a NoClassDefFoundError is raised if it cannot be found
  2. Transform the static storage structure represented by this byte stream into the runtime data structure of the method area
  3. Generate an in-memory java.lang.Class object representing the class, serving as the access entry to the class’s various data in the method area.

2. Verification

  1. File format. Whether the stream begins with the magic number 0xCAFEBABE and whether the class file version is supported by this VM
  2. Metadata. Compliance with Java language specifications
  3. Bytecode. Semantic validity
  4. Symbolic reference. Whether access to required external classes, methods, and so on is missing or blocked.

(Possible errors: VerifyError, ClassFormatError, UnsupportedClassVersionError)

3. Preparation: allocate memory for the variables defined in the class (static variables) and set them to their zero values.

4. Resolution: replace symbolic references in the constant pool with direct references.

5. Initialization: execute the class constructor, the <clinit>() method, which the compiler generates by collecting, in textual order, the assignments to all class variables and the statements in static blocks. (Note the sequential collection: a statement in a static block may assign to, but not read, a variable defined after the block.) The JVM specifies that class initialization must be performed on the first active use of a class.
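The textual-order rule in step 5 produces a result that surprises many people. A small sketch (class name invented):

```java
public class InitOrderDemo {
    static {
        value = 1;                       // legal: a static block may ASSIGN to a later variable
        // System.out.println(value);    // illegal forward reference: it may not READ it
    }
    static int value = 0;                // collected after the block, so this runs second

    public static void main(String[] args) {
        // <clinit>() ran the static block first (value = 1), then the initializer (value = 0)
        System.out.println(InitOrderDemo.value);  // 0
    }
}
```

Because <clinit>() executes the collected statements in source order, the field initializer overwrites the static block's assignment.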

What is the parental delegation model? What are the class loaders in Java? What are their roles?

The parental delegation model works like this: when a class loader receives a class-loading request, it does not try to load the class itself first, but delegates the request to its parent loader, so every loading request eventually reaches the top-level loader. Only if the parent loader cannot complete the request (it did not find the class, signalled by ClassNotFoundException) does the child loader attempt to load the class itself.

The advantage is that classes gain a hierarchy with priorities. For example, java.lang.Object is always loaded by the bootstrap class loader at the top, so a developer cannot write a new Object class to replace it, which keeps the program safe.
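A custom loader keeps the delegation model intact by overriding findClass rather than loadClass: java.lang.ClassLoader.loadClass asks the parent first and calls findClass only if the parent fails. A minimal sketch (the class name `DiskClassLoader` and the directory are illustrative):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DiskClassLoader extends ClassLoader {
    private final Path dir;

    public DiskClassLoader(Path dir) {
        super(DiskClassLoader.class.getClassLoader()); // parent = application class loader
        this.dir = dir;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try {
            // Only reached when every parent loader has failed to find the class.
            byte[] bytes = Files.readAllBytes(dir.resolve(name.replace('.', '/') + ".class"));
            return defineClass(name, bytes, 0, bytes.length);
        } catch (Exception e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    public static void main(String[] args) throws Exception {
        DiskClassLoader loader = new DiskClassLoader(Paths.get("/tmp/classes"));
        // java.lang.String is delegated upward and never reaches findClass:
        Class<?> s = loader.loadClass("java.lang.String");
        System.out.println(s.getClassLoader()); // null: loaded by the bootstrap loader
    }
}
```

Because of delegation, asking this loader for java.lang.String still yields the bootstrap-loaded class, never a class from the custom directory.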

From the virtual machine's point of view, there are only two kinds of class loaders: the bootstrap class loader and all other class loaders.

Bootstrap class loader: implemented in C++ and part of the virtual machine itself. Responsible for loading the class libraries stored in the <JAVA_HOME>/lib directory, or in the path specified by the -Xbootclasspath parameter, that the virtual machine recognizes by name (such as rt.jar; libraries with unrecognized names are not loaded even if placed there). It cannot be referenced directly by Java programs; a custom class loader that needs to delegate to the bootstrap loader uses null in place of a parent reference.

Extension class loader: responsible for loading the class libraries in the <JAVA_HOME>/lib/ext directory, or in the paths specified by the java.ext.dirs system variable.

Application class loader: responsible for loading all class libraries on the user classpath. It can be obtained via ClassLoader.getSystemClassLoader(), so it is also called the system class loader. If no custom class loader is used, it is the default loader for all user-defined classes.

Custom class loaders: user-defined loaders that extend java.lang.ClassLoader, typically to load classes from non-standard sources.
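The loader hierarchy can be inspected by walking the parent chain (on JDK 8 this prints the application and extension loaders; on JDK 9+ the extension loader is replaced by the platform class loader):

```java
public class LoaderChain {
    public static void main(String[] args) {
        ClassLoader cl = LoaderChain.class.getClassLoader(); // application class loader
        while (cl != null) {
            System.out.println(cl);   // each loader in the chain, bottom to top
            cl = cl.getParent();
        }
        // The bootstrap loader is represented as null and cannot be printed directly.
        System.out.println("null = bootstrap class loader");
    }
}
```

Consistent with the note above, core classes report null as their loader: `String.class.getClassLoader()` returns null.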

How does virtual machine lock optimization work? What types of locks are available?

Adaptive spin: JDK 6 replaced the fixed spin count with adaptive spinning, where the spin time is determined by the result of the previous spin on the same lock and the state of the lock owner (described in detail under the spin-lock question below).

Lock elimination: the virtual machine removes locks on data that it detects can never be contended, even though the compiled code requests synchronization. The main input to this decision is escape analysis.
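A classic lock-elimination candidate: StringBuffer's methods are synchronized, but when the buffer is a local object that never escapes the method, escape analysis lets the JIT elide those locks entirely (the class and method names here are illustrative).

```java
public class LockElimination {
    // sb is local to concat() and never escapes, so the monitor operations
    // implied by StringBuffer's synchronized methods can be removed by the JIT.
    public static String concat(String a, String b) {
        StringBuffer sb = new StringBuffer();
        sb.append(a); // synchronized, but the lock can be elided
        sb.append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("foo", "bar")); // foobar
    }
}
```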

Lock coarsening: as a rule, you should write synchronized blocks with the smallest possible scope, keeping the number of operations inside the critical section to a minimum so that waiting threads can acquire the lock as quickly as possible when there is contention. However, if a series of consecutive operations repeatedly locks and unlocks the same object, for example when the locking operation sits inside a loop body, the frequent mutex operations cause unnecessary performance loss even when there is no thread contention. In that case the virtual machine coarsens the lock by expanding its scope, for example by moving the lock from inside the loop body to outside it.
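The loop case above is exactly the textbook coarsening pattern (the class and method names are illustrative):

```java
public class Coarsening {
    // Each append() locks and unlocks sb once per iteration; the JIT may
    // coarsen these into a single lock held across the whole loop.
    public static String join(String[] parts) {
        StringBuffer sb = new StringBuffer();
        for (String p : parts) {
            sb.append(p); // repeated lock/unlock on the same object
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[]{"a", "b", "c"})); // abc
    }
}
```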

Lightweight locking is implemented using the Mark Word in the object header.

To save space, the Mark Word stores different information depending on the object's state, such as the hash code and GC generational age. Two bits store the lock flag, and one additional bit indicates whether the object is in biased mode (0 means it is not).

When code is about to enter a synchronized block and the synchronization object is not locked (lock flag 01), the virtual machine first creates a space called a Lock Record in the current thread's stack frame to store a copy of the object's current Mark Word, then uses CAS to update the Mark Word to a pointer to the Lock Record. If the update succeeds, the thread owns the lock on the object, and the lock flag bits in the Mark Word change to 00, indicating the lightweight-locked state.

If the update fails, another thread has already acquired the lock. The current thread spins, repeatedly trying to acquire the lightweight lock. If it still fails after a certain time, the lightweight lock is inflated to a heavyweight lock (the object header is changed to a pointer to the heavyweight lock, with flag bits 10), and the thread is suspended to wait.

When a lightweight lock is released, CAS is again used to write the saved Mark Word copy back. If that succeeds, synchronization is complete. If it fails, another thread has meanwhile tried to acquire the lock and inflated it to a heavyweight lock, so the releasing thread must also wake up the suspended threads when it releases the lock.

Note that lightweight locks improve performance based on the empirical rule that "for most locks, there is no contention during the entire synchronization period": CAS avoids the overhead of a mutex. If contention is the common case, however, the CAS operations become pure extra cost on top of the mutex that is eventually needed anyway.

Biased locking aims to improve performance by eliminating even the CAS synchronization in the uncontended case. Once a thread acquires a biased lock, that thread never needs to synchronize again as long as no other thread competes for the lock. The biased mode ends as soon as another thread attempts to acquire the lock: if the object is currently unlocked, the bias is revoked (bias bit set to 0) and the object reverts to the unlocked state (flag bits 01); if it is locked, the bias is revoked and the lock becomes a lightweight lock (flag bits 00). If most locks are in fact accessed by multiple threads, biased mode is pure overhead; it can be disabled with a VM parameter (-XX:-UseBiasedLocking).

What is a spin lock? What is adaptive spin?

In most cases, the locked state of shared data lasts only a short time, so it is not worth suspending and resuming a thread that briefly fails to acquire a lock. Instead of giving up its processor time, the thread can execute a busy loop (that is, spin) and wait for the thread holding the lock to release it. If the lock is still not acquired after a fixed number of spins, the thread is then suspended.
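The busy-loop idea can be sketched at the user level (the JVM's own spin locks live inside the VM; this `SpinLock` class is only an illustration of the concept, not how HotSpot implements it):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        // Busy-wait instead of suspending: keep retrying the CAS until it wins.
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint that we are in a spin loop (JDK 9+)
        }
    }

    public void unlock() {
        locked.set(false);
    }

    public static void main(String[] args) {
        SpinLock lock = new SpinLock();
        lock.lock();
        System.out.println("critical section");
        lock.unlock();
    }
}
```

A real implementation would bound the number of spins and fall back to parking the thread, exactly as the paragraph above describes.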

Adaptive spin: while spinning avoids the overhead of thread switching, spinning too long consumes processor time for nothing. JDK 6 optimized spin locking by introducing adaptive spin: the spin time is no longer fixed, but is determined by the previous spin time on the same lock and the state of the lock owner. If a previous spin on the same lock succeeded and the thread holding the lock is still running, the virtual machine assumes another spin is likely to succeed and allows a relatively long spin. If spinning rarely succeeds for a given lock, the spin phase may simply be skipped when the lock is requested later.

What is escape analysis? What’s the use?

The basic principle of escape analysis is to analyze an object's dynamic scope. An object defined inside a method may be referenced by other methods, for example when it is passed as a call argument to another method; this is called method escape. It may also become accessible to other threads; this is called thread escape. No escape, method escape, and thread escape are the escape degrees, from low to high.
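The three escape degrees can be shown side by side (the class, field, and method names here are made up for illustration):

```java
public class Escape {
    static StringBuilder shared;

    // No escape: sb never leaves this method.
    static int noEscape(String s) {
        StringBuilder sb = new StringBuilder(s);
        return sb.length();
    }

    // Method escape: the object escapes to the caller via the return value.
    static StringBuilder methodEscape(String s) {
        return new StringBuilder(s);
    }

    // Thread escape: the object becomes reachable from any thread via a static field.
    static void threadEscape(String s) {
        shared = new StringBuilder(s);
    }

    public static void main(String[] args) {
        System.out.println(noEscape("jvm")); // 3
    }
}
```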

If it can be proved that an object cannot escape the method or thread (that is, no other method or thread can access it in any way), or if its escape degree is low (it escapes the method but not the thread), different levels of optimization can be applied:

  1. Stack allocation: allocating objects on the heap is costly for both the allocation and the later reclamation and compaction work. If an object is certain not to escape the thread, it can instead be allocated on the stack, and the memory it occupies is destroyed automatically as the stack frame pops. This greatly reduces pressure on the garbage collector.
  2. Scalar replacement: data that cannot be decomposed into smaller pieces, such as the primitive types and references in the Java virtual machine, is called a scalar; a Java object, which aggregates such data, is an aggregate. If escape analysis proves that an object cannot be accessed from outside the method and the object can be disassembled, the program may not actually create the object at all, but instead directly create the member variables the method uses, accessing them as their original primitive types. This is scalar replacement, and it also creates conditions for further optimization.
  3. Synchronization elimination: thread synchronization is an expensive operation. If escape analysis proves that a variable cannot escape the thread, there can be no contention on reads and writes of that variable, and synchronization on it can be safely eliminated.
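Scalar replacement in one picture (the `Point` class and `manhattan` method are illustrative):

```java
public class Scalar {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // p cannot be accessed outside manhattan(), so after escape analysis the
    // JIT may never allocate a Point at all, keeping x and y as two scalars
    // in registers or on the stack.
    static int manhattan(int x, int y) {
        Point p = new Point(x, y); // candidate for scalar replacement
        return Math.abs(p.x) + Math.abs(p.y);
    }

    public static void main(String[] args) {
        System.out.println(manhattan(-3, 4)); // 7
    }
}
```

The behavior is unchanged either way; the optimization only removes the allocation, which in turn enables stack allocation and synchronization elimination on the replaced fields.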

Follow my blog

trzoey.github.io/blog-prik/