Locality principle
When a program accesses data, the accesses tend to cluster in contiguous regions; this is known as the principle of locality.
It comes in two forms:
- Temporal locality: if a piece of data is accessed, it is likely to be accessed again in the near future.
- Spatial locality: if data at one location is accessed, data near that location is likely to be accessed soon.
Both the CPU and the operating system implement optimizations based on the locality principle.
This article summarizes how these mechanisms matter for Java backend development.
CPU spatial locality
As the Java memory model illustrates, the CPU sits behind three levels of cache (L1, L2, and L3) that speed up reads from main memory.
Guided by the locality principle, the CPU does not fetch a single data item in isolation: when it reads an item from memory into the cache, it also loads the adjacent block of memory. This process is called prefetching.
Cache line
The CPU cache is made up of cache lines, each typically 64 bytes.
When the CPU pulls data from main memory, the data adjacent to the target is loaded into the same cache line.
As a result, reading contiguous memory is faster than random access, which the following Java program demonstrates:
public static void main(String[] args) {
    int[][] arr = new int[10000][10000];
    int sum = 0;

    // Row-by-row traversal: walks memory sequentially, one cache line at a time
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < arr.length; i++) {
        for (int j = 0; j < arr[0].length; j++) {
            sum += arr[i][j];
        }
    }
    System.out.println("Array sequential access time: " + (System.currentTimeMillis() - startTime) + "ms");

    // Column-by-column traversal: each read jumps 10000 ints ahead, defeating the cache
    sum = 0;
    startTime = System.currentTimeMillis();
    for (int i = 0; i < arr.length; i++) {
        for (int j = 0; j < arr[0].length; j++) {
            sum += arr[j][i];
        }
    }
    System.out.println("Non-sequential array access time: " + (System.currentTimeMillis() - startTime) + "ms");
}
This program reads the same two-dimensional array twice.
The first loop iterates along the second dimension, so the array is accessed row by row through contiguous memory.
The second loop iterates along the first dimension, reading column by column, so access is non-sequential.
Across 10,000 × 10,000 elements, the sequential traversal finishes markedly faster.
Thus sequential memory access outperforms random access.
False sharing of cache lines
Take ArrayBlockingQueue as an example. It has three frequently updated member variables:
- takeIndex: the index of the next element to remove
- putIndex: the index at which the next element can be inserted
- count: the number of elements in the queue
These three variables easily end up in the same cache line, yet their updates are largely independent. Every time one of them changes on one core, the whole cache line is invalidated on the other cores, so the cached data cannot actually be shared.
This failure to make full use of a cache line is called false sharing.
There are two ways to eliminate false sharing (a sketch of both follows this list):
- Padding: trade space for time. Pad the data so that frequently changed variables do not share a cache line.
- The @Contended annotation: tells the JVM to place the annotated fields on separate cache lines.
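Below is a minimal sketch of both approaches. The class and field names are illustrative, not from the JDK; on Java 8 the annotation is sun.misc.Contended, later JDKs moved it to jdk.internal.vm.annotation, and for application classes it only takes effect with the -XX:-RestrictContended JVM flag (plus the matching --add-exports on Java 9+).

// Approach 1: manual padding. Seven longs (56 bytes) after the hot field
// push any neighbouring field out of its 64-byte cache line.
class PaddedCounter {
    volatile long value;             // frequently updated by one thread
    long p1, p2, p3, p4, p5, p6, p7; // padding, never read or written
}

// Approach 2: let the JVM pad for us. Only honoured outside the JDK
// when running with -XX:-RestrictContended.
class ContendedCounters {
    @jdk.internal.vm.annotation.Contended
    volatile long a; // updated by thread A

    @jdk.internal.vm.annotation.Contended
    volatile long b; // updated by thread B
}

The padding trick is the same one the LMAX Disruptor uses around its sequence counters (see the reference at the end).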
Disk spatial locality
In day-to-day Java development, much middleware has to work with disk files, and high-performance access to disk data also relies on the locality principle, for example:
- MySQL log files
- MQ message data
We know that MySQL data ultimately lives on disk. To reduce disk I/O and improve performance, the InnoDB engine relies on the BufferPool plus redo log mechanism to speed up reads and writes (see the MySQL principles summary for details). The redo log, undo log, and binlog cannot avoid disk I/O entirely, so they write to disk sequentially and lean on the operating system's PageCache, which brings disk I/O performance close to memory performance.
Kafka and RocketMQ are often described as high-performance messaging middleware, and part of that performance comes from sequential reads and writes of disk files: sequential writes to the commit log, sequential reads and writes of partitions in Kafka, and of the ConsumeQueue in RocketMQ. Both also rely on the operating system's PageCache.
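To illustrate the idea, here is a minimal append-only log writer; the record format and class name are made up for this sketch, and real commit logs (Kafka, RocketMQ) layer batching, memory-mapped files, and index files on top of it. Because every record lands at the current end of the file, the OS sees a purely sequential write pattern that PageCache can absorb.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal append-only log: single writer, records are length-prefixed
// so they can be replayed sequentially on recovery.
class AppendOnlyLog {
    private final FileChannel channel;

    AppendOnlyLog(Path file) throws IOException {
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    // Appends one message and returns its offset in the log.
    long append(byte[] message) throws IOException {
        long offset = channel.size();   // appends always land at the end
        ByteBuffer buf = ByteBuffer.allocate(4 + message.length);
        buf.putInt(message.length);
        buf.put(message);
        buf.flip();
        while (buf.hasRemaining()) {
            channel.write(buf);         // sequential write, buffered by PageCache
        }
        return offset;
    }

    // Forces buffered pages out of PageCache onto the physical disk.
    void flush() throws IOException {
        channel.force(false);
    }
}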
PageCache
PageCache is the operating system's cache of file pages, used to speed up file reads and writes. Sequential file access from a program can come close to memory speed, largely because the OS dedicates part of memory to PageCache and routes read and write operations through it.
For writes, the OS first puts the data in PageCache and later flushes it to the physical disk asynchronously via the pdflush kernel threads.
For reads, if a lookup misses PageCache, the OS reads the requested block from the physical disk and also reads ahead, pulling the adjacent blocks of the file into the cache.
PageCache is thus a direct implementation of the locality principle.
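The read-ahead effect can be observed from Java with a simple comparison; the file name and block size below are assumptions for the sketch, and the file should be large and not yet cached for the difference to show. Sequential reads let the kernel prefetch the next pages, while random seeks miss PageCache far more often.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

// Reads the same file once sequentially and once at random offsets,
// in 4 KB blocks, and prints the elapsed time of each pass.
public class PageCacheDemo {
    public static void main(String[] args) throws IOException {
        byte[] block = new byte[4096];
        try (RandomAccessFile file = new RandomAccessFile("data.bin", "r")) {
            long blocks = file.length() / block.length;

            long start = System.currentTimeMillis();
            for (long i = 0; i < blocks; i++) {
                file.read(block);       // sequential: kernel read-ahead helps
            }
            System.out.println("Sequential read: " + (System.currentTimeMillis() - start) + "ms");

            Random random = new Random(42);
            start = System.currentTimeMillis();
            for (long i = 0; i < blocks; i++) {
                long target = (long) (random.nextDouble() * blocks) * block.length;
                file.seek(target);
                file.read(block);       // random: each read may go to disk
            }
            System.out.println("Random read: " + (System.currentTimeMillis() - start) + "ms");
        }
    }
}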
Temporal locality
Temporal locality may be even more visible in everyday business development.
An LRU cache is a concrete implementation of it: recently accessed data is kept close at hand because it is likely to be accessed again.
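In Java, a minimal LRU cache falls out of LinkedHashMap's access-order mode; the capacity below is an arbitrary example.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: in access-order mode LinkedHashMap moves every
// accessed entry to the tail, so the head is always the least recently used.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = order by access, not insertion
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the head once over capacity
    }
}

A cache created as new LruCache<String, String>(100) then keeps the 100 most recently touched entries, which is exactly temporal locality at work.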
CPU instruction reordering is related as well: for a computation on some piece of data, the instructions touching that data can be scheduled close together.
References
- Zhihu: How to understand the principle of locality in operating systems
- GitHub: RocketMQ design documents
- Meituan Technical Team: High-performance queue Disruptor