As we know, the JVM performs garbage collection by marking all reachable objects and then cleaning up unreachable objects to free up memory. So, how to quickly find all reachable objects?

The simplest and most crude implementation is to scan all objects in the entire heap for all living objects every time garbage collection is done. The logic is simple, but the performance is poor.

Simple and crude implementations are usually undesirable. So how does the JVM implement fast marking of reachable objects?

GC Roots.

GC Roots are the starting point for the garbage collector to look for reachable objects, and through these starting references, live objects can be quickly traversed. GC Roots are most commonly used for static references and locally referenced variables of the stack. However, that’s not the point here 🙂

In modern JVMS, the heap space is typically divided into new generation and old generation. Since Cenozoic garbage collections are often frequent, if the old age objects refer to the new age objects, then all references from the old age to the new age need to be tracked to avoid scanning the entire old age at each YGC and reduce overhead.

For HotSpot JVM, Card Marking technology is used to solve the reference problem from old generation to new generation. Specifically, Card tables and Write barriers are used to mark and speed up GC Roots scanning.

Card Table

Based on the design of Card Table, the heap space is usually divided into a series of 2 power Card pages.

Card Table is used to mark the status of Card pages. Each Card entry corresponds to one Card page.

The HotSpot JVM Card Page is 512 bytes in size and the Card Table is implemented as a simple byte array with 1 byte each tag item in the Card Table.

When an object reference is written (object reference changes), the write barrier logic marks the card page on which the object is located as dirty.

OpenJDK/Oracle 1.6/1.7/1.8 JVM default card markers simplify the logic as follows:

CARD_TABLE [this address >> 9] = 0;Copy the code

First, the card table index number of the card page that the object references is computed. Moving the address 9 bits to the right is the same as dividing the address by 512 (2 to the ninth). It can be understood that the start address of the card page of the fake card table is 0, then the start address of the corresponding card page of the card entries 0, 1 and 2 are 0, 512 and 1024 respectively (the index number of the card entry multiplied by the 512 bytes of the card page).

Second, set the card identifier to dirty based on the card table index number.

This brings two problems

1. Performance overhead caused by unconditional write barriers

Every update to a reference, whether or not it updates a reference from an older generation to a new generation object, performs a write barrier operation. Obviously, this adds some extra overhead. However, this overhead is much lower than when YGC scans the entire old age.

However, in high-concurrency environments, write barriers introduce false sharing issues.

2. Performance overhead caused by virtual sharing with high concurrency

Under high concurrency, frequent write barriers can easily lead to false sharing, resulting in performance overhead.

Assume that the CPU cache row size is 64 bytes. Since one card entry is 1 byte, this means that 64 card entries will share the same cache row.

HotSpot is 512 bytes per card page, so 64*512=32KB per cache row.

If different threads update object references in the same 32KB region, this will cause the same cache row of the card table to be updated at the same time, resulting in cache row write back, invalidation, or synchronization, which indirectly affects program performance.

A simple solution is not to use an unconditional write barrier, but to check the card table markup first and mark the card entry as dirty only if it is not already marked.

This was the workaround introduced in JDK 7, which introduced a new JVM parameter -xx :+UseCondCardMark to do a quick check before performing a write barrier. If the card page has already been identified, it is no longer identified.

Simple understanding is as follows:

if(CARD_TABLE [this address >> 9] ! = 0) CARD_TABLE [this address >> 9] = 0;Copy the code

Compared with the original implementation, it simply adds a judgment operation.

Although -xx :+UseCondCardMark has some judgment overhead, it can avoid concurrent write to card tables that can occur in high concurrency situations. Avoid false sharing by reducing concurrent write operations.

Also used for CMS GC

CMS in the concurrent marking phase, the application thread and the GC thread execute concurrently, so new objects may be generated or object relationships may change, for example:

  • The objects of the new generation rise to the old;
  • Assign objects directly in the old age;
  • The reference relationship of the old object is changed.
  • And so on.

These objects need to be re-marked to prevent them from being missed. In order to improve the efficiency of re-marking, the concurrent marking stage will mark the cards of the changed objects as Dirty, so that the subsequent stages only need to scan the Dirty Card objects, thus avoiding scanning the whole old age.