Hello, I am Xiao Huang, a Java development engineer at a unicorn company. Thank you for finding me in the vast sea of people. As the saying goes: when your talent and ability are not yet enough to support your dream, calm down and study. I hope you will study with me and work hard together to realize your own dreams.
1. Introduction
Java developers usually treat the underlying layers of the computer as a black box that never has to be opened.
But as the programming industry develops, it becomes important to open this black box and explore what is inside.
This series of articles walks you through the mysteries of that underlying black box.
2. Recommended books
The principle of reading: get the big picture first, without insisting on understanding every detail
If you walk into Mount Lushan and immediately squat down to study a single tree or blade of grass, without first getting a clear view of the whole mountain, your learning will be extremely inefficient and painful.
Worse, it slowly wears down your enthusiasm: you start asking why learning is so joyless and boring. The reason is that the method is wrong. Read for a general understanding first, then use what you learned; as you keep using it, much of the underlying reasoning will become clear.
- Code: The Hidden Language of Computer Hardware and Software
- Computer Systems: A Programmer's Perspective (not recommended)
- "Introduction to Algorithms", "Java Data Structures and Algorithms", "Sword Offer"
- 30 Days of Homemade Operating System
- TCP/IP Illustrated, Volume 1
- Principles of Compilation
3. Hardware basics
1. How a CPU is manufactured
How is a CPU made?
I believe everyone has asked this question. Here is the answer: every CPU starts out as sand.
If you are interested in the full production process, see the article: How is the CPU made
If you are not interested in the details, here is the summary:
- Step 1: Extract monocrystalline silicon from sand
- Step 2: Cut the silicon crystal into thin slices to create wafers
- Step 3: Bombard the wafer with particles and then electroplate it
- Step 4: Apply photolithography to the wafer to complete the wire interconnections between the transistors
- Step 5: Test the quality and discard defective CPUs
2. How the CPU works
The computer's first problem: how to represent numbers?
The original computers used light bulbs: the on/off state of each bulb represents a 1 or a 0. To calculate 0100 + 1010 we use eight bulbs, four per operand, and with that our first version of the computer is working.
- The ENIAC weighed 27 tons and covered 1,800 square feet. It was developed during World War II as a tool to help the artillery calculate the trajectories of shells.
The first version of the computer had an awkward flaw.
During high-speed calculation the bulbs switched on and off very frequently, which could burn them out. Staff had to keep replacing bulbs, and this hurt efficiency.
Still, as the first computer, it was enough to shock the world.
Today most computers calculate with transistors, combining AND gates, OR gates, NOT gates, and NOR gates to express the different operations.
We often hear about 32-bit and 64-bit CPUs in daily life. Put simply, the difference is how many bits the CPU reads at a time.
Any calculation can be built out of logical operations. Let's look at the following logical operation: 0 && 1 = 0.
First, let's take a look at the circuit diagram: this is an AND gate.
In this circuit, A and B are the inputs and Q is the output.
For example, if A receives a low level and B receives a high level, Q outputs a low level. In binary terms, A is 0 and B is 1, so Q is 0. The corresponding logical expression is 0 && 1 = 0.
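To make the idea that calculations are built from gates more concrete, here is a minimal sketch in Java. The class and method names (`GateDemo`, `add4`) are made up for illustration, and the 4-bit ripple-carry adder is just one common way to wire gates into an adder; it is not taken from the original circuit diagram.

```java
class GateDemo {
    // Basic gates on single bits (0 = low level, 1 = high level).
    static int and(int a, int b) { return a & b; }
    static int or(int a, int b)  { return a | b; }
    static int not(int a)        { return a ^ 1; }

    // XOR built only from the AND, OR, and NOT gates above.
    static int xor(int a, int b) { return and(or(a, b), not(and(a, b))); }

    // Adds two 4-bit numbers bit by bit (ripple-carry adder).
    static int add4(int x, int y) {
        int carry = 0;
        int result = 0;
        for (int i = 0; i < 4; i++) {
            int a = (x >> i) & 1;
            int b = (y >> i) & 1;
            int sum = xor(xor(a, b), carry);               // sum bit for position i
            carry = or(and(a, b), and(carry, xor(a, b)));  // carry into position i + 1
            result |= sum << i;
        }
        return result; // the carry out of the highest bit is dropped
    }

    public static void main(String[] args) {
        System.out.println("0 && 1 = " + and(0, 1));
        System.out.println("0100 + 1010 = " + Integer.toBinaryString(add4(0b0100, 0b1010)));
    }
}
```

The adder only ever calls the three basic gates, which is the point: addition needs no dedicated "plus" circuit, only enough gates wired together.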
Here is a little story about the origin of the word "bug": once upon a time, someone found that a computer kept producing wrong numbers. After a long search, the cause turned out to be an insect (a bug) inside the machine, which kept a component from switching between low level and high level. Ever since, errors in our programs have been called bugs.
3. How assembly language works
Let's think about it. When an event like 0 && 1 = 0 occurs in the circuit above, how does the user know that it happened?
We can't just take the machine apart and watch the levels change inside.
This is where assembly language comes in. Assembly language is essentially a set of mnemonics for machine language.
For example, when we ask the computer to calculate 1 + 2, the computer has to switch between high and low levels internally and then output the result.
Assembly language gives these operations names: MOV, ADD, SUB, ...
With them we can see at a glance what the computer is currently doing.
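As a rough illustration of how mnemonics describe what the machine is doing, the sketch below interprets a tiny, made-up subset of assembly in Java. The syntax (`MOV reg, n`), the register name `AX`, and the class `ToyAssembly` are assumptions for the example; real instruction sets have far more instructions and operand forms.

```java
import java.util.HashMap;
import java.util.Map;

class ToyAssembly {
    // Runs lines of the form "MOV reg, n", "ADD reg, n", or "SUB reg, n"
    // and returns the final register values.
    static Map<String, Integer> run(String... program) {
        Map<String, Integer> regs = new HashMap<>();
        for (String line : program) {
            String[] parts = line.trim().split("[ ,]+"); // e.g. ["MOV", "AX", "1"]
            String op = parts[0].toUpperCase();
            String reg = parts[1];
            int value = Integer.parseInt(parts[2]);
            switch (op) {
                case "MOV": regs.put(reg, value); break;
                case "ADD": regs.put(reg, regs.getOrDefault(reg, 0) + value); break;
                case "SUB": regs.put(reg, regs.getOrDefault(reg, 0) - value); break;
                default: throw new IllegalArgumentException("unknown mnemonic: " + op);
            }
        }
        return regs;
    }

    public static void main(String[] args) {
        // 1 + 2: load 1 into AX, then add 2 to it.
        Map<String, Integer> regs = run("MOV AX, 1", "ADD AX, 2");
        System.out.println("AX = " + regs.get("AX"));
    }
}
```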
Diagram of the computer:
Let’s take a look at the whole process of computer calculation:
Here are the differences between Java and C:
- C: the source is compiled directly into instructions that the CPU executes
- Java: the code must first be translated by the JVM before the CPU can execute it, and this is exactly the key to Java being cross-platform
4. Quantum computers
Quantum computers are still being explored around the world, and there are no definitive results yet.
In our ordinary computers, one bit stands for either 1 or 0, and 32 bits can stand for any one of 2^32 numbers.
The most interesting thing about a qubit is that it can represent both a 1 and a 0 at the same time.
- One qubit: 0, 1
- Two qubits: 00, 01, 10, 11
- Three qubits: 000, 001, 011, 111, ...
- Thirty-two qubits: 2^32 numbers represented at once
This may not be intuitive, so let's look at an example:
Suppose there is an unknown number and we only know that it lies between 1 and 2^32. How can we find it quickly?
Ordinary bits can only represent one candidate at a time, so in the worst case we need to loop 2^32 times to find the number.
With qubits, in this simplified picture, 32 qubits cover all candidates in a single operation.
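The classical half of this comparison can be sketched in a few lines of Java. This only illustrates the one-candidate-at-a-time loop; it does not simulate a quantum computer, and the names `BruteForceSearch` and `guesses` are made up for the example. The demo uses 16 bits so it finishes instantly.

```java
class BruteForceSearch {
    // Tries 1, 2, 3, ... up to 2^bits and returns how many guesses were needed.
    static long guesses(long secret, int bits) {
        long max = 1L << bits;
        long tries = 0;
        for (long candidate = 1; candidate <= max; candidate++) {
            tries++;
            if (candidate == secret) {
                return tries;
            }
        }
        throw new IllegalArgumentException("secret is outside 1.." + max);
    }

    public static void main(String[] args) {
        int bits = 16;
        long worstCase = 1L << bits; // the secret is the very last candidate
        System.out.println("guesses needed: " + guesses(worstCase, bits));
    }
}
```

The number of guesses grows with the size of the range, so each extra bit doubles the worst case.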
5. Basic components of the CPU
- PC (Program Counter): holds the address of the current instruction
- Registers: temporarily store the data the CPU needs for computing
- ALU (Arithmetic & Logic Unit): performs arithmetic and logic operations
- CU (Control Unit): the control unit
- MMU (Memory Management Unit): the memory management unit
- Cache: the cache (see 5.3)
5.1 ALU
Earlier CPUs were single-core, so there was only one set of Registers. Our PC constantly had to switch to point to a new thread and store that thread's data into the Registers. This context switching seriously affects efficiency.
Nowadays CPUs are usually multi-core. During computation there are two or more sets of Registers, so the PC does not need to switch as frequently, and the ALU switches between them to do the computing.
5.2 Registers
5.3 Cache
As can be seen from the above figure, the computer adds three levels of cache to make fetching data faster. Each level of cache has a different access time.
For multi-core CPUs, this is shown below:
- L1 and L2 are private to each core
- L3 is shared by the cores within the same CPU
5.3.1 Locality principle
To put it simply, when our CPU reads data, it reads a whole block rather than a single byte, as shown in the following figure.
Suppose the CPU currently needs X. To get this target value, the steps are as follows:
- Step 1: Look for X in the registers
- Step 2: Look for X in the L1, L2, and L3 caches
- Step 3: Look for X in memory, on disk, and so on
- Step 4: Once found, take the 64-byte block that contains X
- Step 5: Store this block in L3, L2, and L1 so that it can be fetched from the cache next time
- Step 6: Write X into a register for data processing
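The effect of reading in blocks can be sketched with a little address arithmetic. The Java snippet below assumes 64-byte blocks aligned to 64-byte boundaries, which is typical but still an assumption here; the addresses and the names `CacheLineDemo`, `lineStart`, and `sameLine` are hypothetical.

```java
class CacheLineDemo {
    static final int LINE_SIZE = 64; // bytes per block, the size used in the text

    // Start address of the 64-byte block that contains addr
    // (assumes blocks are aligned to multiples of 64).
    static long lineStart(long addr) {
        return addr & ~(long) (LINE_SIZE - 1);
    }

    // True if both addresses are loaded together as part of one block.
    static boolean sameLine(long a, long b) {
        return lineStart(a) == lineStart(b);
    }

    public static void main(String[] args) {
        long x = 0x1008; // hypothetical address of X
        System.out.println("block starts at 0x" + Long.toHexString(lineStart(x)));
        System.out.println("0x1030 comes along with X: " + sameLine(x, 0x1030));
        System.out.println("0x1040 comes along with X: " + sameLine(x, 0x1040));
    }
}
```

Data that sits close to X is already in the cache after the first access, which is why sequential access patterns tend to be fast.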
5.3.2 MESI cache coherence protocol
The MESI cache coherence protocol keeps the copies of the same data held in different caches consistent with each other.
Each cache line in a CPU is marked with one of four states:
- M (Modified): the cache line is valid; the data has been modified and is inconsistent with the data in memory; the data exists only in the local cache.
- E (Exclusive): the cache line is valid; the data is the same as the data in memory; the data exists only in the local cache.
- S (Shared): the cache line is valid; the data is consistent with the data in memory; the data exists in multiple caches.
- I (Invalid): the cache line is invalid.
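The four states can be read as a small state machine. The sketch below is a simplified textbook model in Java for a single cache line in a single core's cache; the event names and the class `MesiDemo` are made up for illustration, and real hardware adds bus messages, write-backs, and vendor-specific variations that are left out here.

```java
class MesiDemo {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    // LOCAL_* events come from this core, REMOTE_* events from another core.
    enum Event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE }

    // Next state of the line. othersHaveCopy only matters when an
    // INVALID line is read locally and has to be loaded.
    static State next(State current, Event event, boolean othersHaveCopy) {
        switch (event) {
            case LOCAL_READ:
                if (current == State.INVALID) {
                    return othersHaveCopy ? State.SHARED : State.EXCLUSIVE;
                }
                return current; // a valid line is read without a state change
            case LOCAL_WRITE:
                return State.MODIFIED; // copies in other caches are invalidated
            case REMOTE_READ:
                // A valid line becomes shared (a MODIFIED line is written back first).
                return current == State.INVALID ? State.INVALID : State.SHARED;
            case REMOTE_WRITE:
                return State.INVALID; // the other core now owns the latest data
            default:
                throw new IllegalArgumentException("unknown event: " + event);
        }
    }

    public static void main(String[] args) {
        State s = State.INVALID;
        s = next(s, Event.LOCAL_READ, false);   // loaded, nobody else has it
        System.out.println("after local read:   " + s);
        s = next(s, Event.LOCAL_WRITE, false);  // modified locally
        System.out.println("after local write:  " + s);
        s = next(s, Event.REMOTE_READ, false);  // another core reads it
        System.out.println("after remote read:  " + s);
        s = next(s, Event.REMOTE_WRITE, false); // another core writes it
        System.out.println("after remote write: " + s);
    }
}
```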
Intel: cache lines
- The larger the cache line, the better the spatial locality, but the slower the read
- The smaller the cache line, the worse the spatial locality, but the faster the read
- Based on practical experience, Intel settled on a cache line size of 64 bytes
Bus lock (the bus must be locked if the data does not fit in a cache line)
Cache locking is one way to implement a lock, but data that cannot be cached, or that spans more than one cache line, must still use a bus lock.
How do we test that our guess is correct?
We test two programs.
Space is limited, so the source code is not shown here for now. If you are interested, follow the official account and reply: algorithm source code
- First program: the values being changed are in the same cache line
- Second program: the values being changed are not in the same cache line
These programs verify our conjecture: when two threads frequently modify data in the same cache line, each write invalidates the other core's copy of the line, resulting in a longer run time.
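For readers who want to try this themselves, here is a minimal sketch of such an experiment in Java. It is not the author's original program: the class `FalseSharingDemo`, the use of `AtomicLongArray`, and the slot indices are choices made for this example. Slots 0 and 1 are 8 bytes apart and will very likely share a cache line; slots 0 and 16 are 128 bytes apart and cannot share a 64-byte line.

```java
import java.util.concurrent.atomic.AtomicLongArray;

class FalseSharingDemo {
    // Two threads each write repeatedly to their own slot.
    // Returns { elapsedNanos, final value of slotA, final value of slotB }.
    static long[] run(int slotA, int slotB, long iterations) {
        AtomicLongArray data = new AtomicLongArray(32);
        Thread a = new Thread(() -> {
            for (long i = 1; i <= iterations; i++) data.set(slotA, i); // volatile write
        });
        Thread b = new Thread(() -> {
            for (long i = 1; i <= iterations; i++) data.set(slotB, i); // volatile write
        });
        long start = System.nanoTime();
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        long elapsed = System.nanoTime() - start;
        return new long[] { elapsed, data.get(slotA), data.get(slotB) };
    }

    public static void main(String[] args) {
        long n = 20_000_000L;
        long adjacent = run(0, 1, n)[0];   // slots likely in the same cache line
        long separated = run(0, 16, n)[0]; // slots in different cache lines
        System.out.println("adjacent slots:  " + adjacent / 1_000_000 + " ms");
        System.out.println("separated slots: " + separated / 1_000_000 + " ms");
    }
}
```

On a multi-core machine the adjacent case is usually noticeably slower, but the exact numbers depend on the hardware and the JVM, so treat a single run as a rough indication rather than a benchmark.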
5.3.3 Cache Line alignment
Some particularly sensitive values are accessed by many threads under high contention. To make sure that false sharing does not occur for them, we generally use cache line alignment.
In short, when we fetch X, we don't want Y to come along in the same cache line.
Padding a cache line with long fields is a technique used in both JDK 7 and Disruptor:
```java
public long p1, p2, p3, p4, p5, p6, p7; // cache line padding
private volatile long cursor = INITIAL_CURSOR_VALUE;
public long p8, p9, p10, p11, p12, p13, p14; // cache line padding
```
This way, whenever the cache line holding cursor is loaded, only the padding longs before or after it are loaded along with it, so cursor never shares a cache line with other frequently modified data.
In JDK 8, we can instead annotate the field with @Contended (the padding is then chosen for the underlying CPU to make sure that two fields do not share a cache line). The JVM option -XX:-RestrictContended must be added for it to take effect.