This morning I came across a post introducing "false sharing". It reminded me of the same topic covered in my reading on Java virtual machine principles, so I am writing down what I know and chewing it over.

False Sharing

Between the CPU and main memory sit the CPU caches, divided into L1, L2 and L3. The lower the level number, the smaller the capacity, the closer to the CPU, and the faster the access. L1 and L2 are typically private to each CPU core, while L3 is shared across cores (the borrowed diagram that illustrated this is omitted here).

The cache line is the smallest unit of data in the CPU cache: a run of contiguous bytes whose size is a power of two, most commonly 64 bytes. If multiple variables sit on the same cache line and are modified simultaneously in a concurrent environment, memory barriers and the cache coherence protocol allow only one thread to operate on that cache line at a time, so the threads contend and performance degrades. This is called "false sharing" (often rendered "pseudo-sharing"), a low-level detail in high-concurrency scenarios.
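
To make the 64-byte granularity concrete, here is a minimal sketch (not a rigorous benchmark: no JIT warm-up, single run, all names are mine) that walks a long array with different strides. Striding by 8 longs (64 bytes) still touches every cache line, so it costs roughly as much as touching every element; only larger strides start skipping lines and getting cheaper.

public class CacheLineDemo {
    // 8M longs = 64 MB, large enough that the array does not fit in the caches
    static final long[] data = new long[8 * 1024 * 1024];

    static long touch(int stride) {
        long start = System.nanoTime();
        for (int i = 0; i < data.length; i += stride) {
            data[i]++; // each access pulls one 64-byte cache line into L1
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // expect roughly flat timings up to stride 8 (one long = 8 bytes,
        // so stride 8 = 64 bytes = one cache line), then a drop beyond it
        for (int stride : new int[]{1, 2, 4, 8, 16, 32}) {
            System.out.printf("stride %2d: %d ms%n", stride, touch(stride));
        }
    }
}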

False Sharing in the JVM

During GC, reachability analysis determines whether an instance should be marked for collection. The JVM abstracts two data structures, the OopMap and the remembered set, to reduce the cost of traversing from GC Roots. The OopMap records where ordinary object pointers (oops) live; remembered sets record cross-generational references. False sharing in the HotSpot virtual machine occurs in the remembered set.

A remembered set can record data at three levels of precision:

  • Word precision: each record is precise to one machine word (the processor's addressing width, 32 or 64 bits), and that word holds a cross-generational pointer;
  • Object precision: each record is precise to one object, and that object has fields holding cross-generational pointers;
  • Card precision: each record is precise to one memory region, identified by a card page; the region covered by the card contains a cross-generational pointer (in other words, at least one field in the region holds a cross-generational pointer).

A remembered set implemented at card precision is called a "card table", and the card table is the most mainstream remembered set implementation. In the HotSpot virtual machine, each card-table element is one byte per card page, so 64 card-table entries fit in one 64-byte cache line. Since one card page identifies 512 bytes of memory, one cache line of the card table covers 64 × 512 bytes = 32 KB of JVM heap. Whenever a cross-generational reference is created anywhere in that 32 KB region, one of the 64 card bytes on that cache line must be marked, which produces false sharing on the cache line. Since JDK 7, HotSpot provides the -XX:+UseCondCardMark flag to mitigate this: before marking a card, it first checks whether the card is already marked as containing a cross-generational reference, avoiding redundant writes and reducing the frequency of false sharing. This optimizes the problem rather than eliminating it; the conditional marking looks roughly like this:

if (card_table[this_address >> 9] != 0)   // shift by 9 because 2^9 = 512 bytes per card
    card_table[this_address >> 9] = 0;    // only write the dirty mark (0) if not already dirty

False Sharing in High-Concurrency Scenarios

Combining the concept with the JVM example above, it is easy to imagine the impact in high-concurrency scenarios. Suppose two variables a and b sit on the same cache line, and two threads t1 and t2 need to modify a and b respectively. Because the cache line is the smallest unit of coherence, the two threads contend for it. To complicate things a little: if t1 and t2 run on two different cores, both cores hold a copy of the cache line containing a and b. After t1 modifies a, the cache coherence protocol invalidates t2's copy of the line, and t2 must reload the data from main memory down through L3 -> L2 -> L1. Repeated at high frequency, this inevitably drags down the application's concurrent performance and throughput, and it shows up mainly under concurrent modification of variables. Locking can guarantee atomicity and consistency of the data, but the time cost can only be optimized by understanding this underlying layer.
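
Here is a minimal sketch of that t1/t2 scenario, assuming a 64-byte cache line (class and field names are mine, not from any library). The two volatile fields are declared next to each other, so they will almost certainly land on the same cache line; on a multi-core machine this typically runs several times slower than the padded variants shown in the next section.

public class FalseSharingDemo {
    static class Shared {
        volatile long a; // written only by t1
        volatile long b; // written only by t2, but likely on the same cache line as a
    }

    public static void main(String[] args) throws InterruptedException {
        final Shared s = new Shared();
        final long iterations = 100_000_000L;

        // each thread hammers its own field, yet they invalidate each other's cache line
        Thread t1 = new Thread(() -> { for (long i = 0; i < iterations; i++) s.a++; });
        Thread t2 = new Thread(() -> { for (long i = 0; i < iterations; i++) s.b++; });

        long start = System.nanoTime();
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("elapsed ms: " + (System.nanoTime() - start) / 1_000_000);
    }
}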

How to Mitigate False Sharing

C and C++, with their manual memory management, have to pay even more attention to false sharing; in Java it is a concurrency optimization detail worth mastering. In Java it can be addressed with byte padding or the @Contended annotation.

Byte padding

Byte padding means taking a variable that is prone to false sharing in concurrent scenarios and padding it out to the size of a cache line. For example, if the variable and its neighbors occupy 32 bytes, you define extra fields (say, four long fields) to fill out the rest of the cache line. It is more robust to pad on both sides of the variable, and to pad to twice the cache-line size, i.e. 128 bytes, as JEP 142 does. Because CPUs also prefetch adjacent cache lines, padding on both sides at twice the line size defends against the prefetcher dragging a neighboring line into the same contention.
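
A minimal sketch of both-sides padding, assuming a 64-byte cache line (all class names are illustrative). The JVM may reorder fields within a single class, so a common trick, used by the LMAX Disruptor among others, is to put the padding in superclasses and subclasses, whose fields the JVM does not interleave:

class PadBefore {
    long p1, p2, p3, p4, p5, p6, p7, p8; // 64 bytes before the hot field
}

class HotField extends PadBefore {
    volatile long value; // the contended field
}

class PaddedValue extends HotField {
    long q1, q2, q3, q4, q5, q6, q7, q8; // 64 bytes after the hot field
}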

@Contended annotation

sun.misc.Contended is an annotation introduced in JDK 8; see JEP 142 for details. It can be applied to fields and classes and does the byte padding for us automatically. On JDK 8 and above this is the recommended approach: it is more robust than hand-rolled padding, and it pays to stand on the shoulders of giants. The value of @Contended defines a contention group: fields sharing a group name are laid out together, while the group as a whole is isolated from other fields. For example, if two fields in an object are both annotated with @Contended and you want them isolated from each other rather than padded as one unit, give them different group names.
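
A minimal sketch on JDK 8 (the annotation moved to jdk.internal.vm.annotation.Contended in JDK 9+); class and field names are mine. Remember the -XX:-RestrictContended flag described in the tip below:

import sun.misc.Contended;

public class ContendedDemo {
    @Contended          // no group: this field is padded away from everything else
    volatile long a;

    @Contended("pair")  // fields in the same group stay together...
    volatile long x;

    @Contended("pair")  // ...but the group as a whole is isolated from other fields
    volatile long y;
}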

Tips: @Contended must be used together with a JVM flag. By default the restriction is on (-XX:+RestrictContended), which limits the annotation to JDK internal classes; pass -XX:-RestrictContended to make it take effect in application code. If the annotation does not seem to work, check the JDK version and JVM flags.

Tangents

Associations triggered by the keyword “prefetch”

Prefetching brings to mind the read-ahead mechanism, one of the four classic features of the InnoDB engine, which comes in two kinds: linear read-ahead and random read-ahead. Linear read-ahead works across extents; innodb_read_ahead_threshold defines how many pages of an extent must be read sequentially before the next extent is fetched. Random read-ahead loads the remaining pages within an extent once enough of its pages are already in the buffer pool. Random read-ahead adds uncertainty, which is dangerous in high-concurrency scenarios, and it has been gradually abandoned since MySQL 5.5, surviving only behind the innodb_random_read_ahead switch, which is off by default.

Associations triggered by the keyword “padding”

The in-memory layout of a class instance can be divided into three parts: object header, instance data, and padding. This padding is not cache-line alignment; rather, HotSpot's automatic memory management requires every object's starting address to be an integer multiple of 8 bytes, which in effect means every object's size must be a multiple of 8 bytes. The object header is already a multiple of 8 bytes, so when the instance data does not satisfy the requirement, padding automatically rounds the object up to the next multiple of 8 bytes.
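
A rough layout sketch, assuming 64-bit HotSpot with compressed oops, where the header is 12 bytes (8-byte mark word plus 4-byte class pointer); a tool such as JOL (openjdk jol-core) can print the real layout. The class names are illustrative:

class Tiny {
    int x;   // 12-byte header + 4 bytes of instance data = 16 bytes: already a multiple of 8
}

class Odd {
    byte b;  // 12-byte header + 1 byte = 13 bytes -> padded up to 16 (next multiple of 8)
}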

References

  • [1] A real ByteDance interview question: what is false sharing
  • [2] JEP 142: Reduce Cache Contention on Specified Fields
  • [3] The @Contended annotation in Java 8