i++ made the CPU cores go on strike; atomic operations fixed it, then the cache caused trouble again

Previously

I was chatting with Xiao Hei when Old K from our workshop suddenly appeared at the door.

“Ah Q, there you are! I’ve been looking everywhere for you. Hurry back: Huzi from the No. 2 workshop next door says we changed their data, and he’s come to make trouble…”

i++ causes trouble

“Ah Q, please hurry back. Huzi from the No. 2 workshop next door says we changed their data, and he’s come to make trouble.”

Old K’s sudden arrival cut my chat with Xiao Hei short, and I rushed back to the CPU’s No. 1 workshop.

As soon as I got back, Huzi shouted at me: “What’s wrong with you people? In just a few nanoseconds you changed our data. Tell me, how are we supposed to work like this?”

I was dumbfounded and kept saying: “Huzi, don’t get worked up. I’ve only just got back. Let me first figure out what actually happened, all right?”

Old K then explained what had happened. Threads in both of our workshops were executing i++ on the same variable i. Each workshop read i from memory, incremented it in its own cache and wrote it back, without telling the other. So i had been incremented twice, but its value went up by only one: one of the two updates was simply lost, and the data became inconsistent.

Atomic operation

After understanding what had happened, I said to Huzi, “Everyone is executing the same code. It’s not our fault.”

“How is it not your fault? We got to i in memory a step ahead of you, so you should have waited until we were done with it. If you don’t believe me, ask the memory guy whether our No. 2 workshop got there first.”

“All right, all right, calm down. We didn’t know you had taken it first, so surely that’s understandable. Now that it has happened, we should sit down together and work out how to avoid this kind of problem in the future, don’t you think?”

Huzi sighed: “So what do you suggest?”

“Look,” I said, “when one of us is doing something like i++, nobody should disturb us.”

“Undisturbed?”

“Yes. For example, while you in workshop 2 are accessing i, we in workshop 1 may not touch it; we have to wait until you are done. It’s simple, but it works.”

Huzi was taken aback: “Isn’t that just locking? Are you going to make programmers take a lock before every i++?”

“It is a lock, yes. But making programmers lock around such a simple operation is too much hassle. It would be better if we handled it inside the CPU.”

“Handle it inside? How are you going to do that?” Huzi asked.

“Um, let me think…” Huzi had asked how it would actually be implemented, and I hadn’t thought that far yet.

At this point Old K stepped forward: “I have an idea. Let’s go to the bus director. He coordinates how each workshop uses the system bus to access memory.”

So we went to the bus director and worked out a solution together. We defined something called an atomic operation: an action that cannot be split apart. Whenever a workshop performs an atomic operation, the bus director asserts the LOCK# signal on the system bus, and every other workshop that wants to access memory must wait until the atomic instruction has completed.

We reported the plan to the leadership, and it was approved quickly. From then on, all eight of our workshops followed it. Once programmers replaced i++ with an atomic operation, the problem was easily solved.

After a while, however, every workshop started to suffer. Whenever one workshop performed an atomic operation, the bus director locked the system bus, and no other workshop could access memory. They simply sat idle, which seriously hurt everyone’s efficiency.

Grumbling aside, life had to go on until a better alternative came along.

Caching issues

It didn’t take long, however, for data inconsistencies to emerge again.

This time the problem had nothing to do with addition. Our two workshops each had the same variable in their own caches. When one of us modified it, the other did not find out in time and kept using the stale value from its own cache, which led to a serious error.

Huzi came to see me again: “Ah Q, your fix last time worked, but it can’t solve this problem.”

“You’re just in time. I was just going to talk to you about this.”

“Oh? Have you figured something out?”

“Just some preliminary ideas. The core of the problem is that there’s no communication mechanism. Each workshop has its own private cache, and when one modifies data and later writes it back to memory, it doesn’t tell the others.”

Huzi nodded: “Right. So we need a communication mechanism that manages the contents of every workshop’s cache in a unified way, is that it?”

“Exactly! But this isn’t something the two of us can settle alone. I suggest calling a meeting of representatives from all eight core workshops to discuss it in detail. Oh, and bring the bus director too. He has plenty of experience and may have some ideas.”

Cache coherence protocol: MESI

Soon afterwards, the eight core workshops of our CPU held a meeting on the issue, and it produced some very important results.

First, we laid a new dedicated line connecting the 8 core workshops so they could exchange information with each other. Unlike the system bus outside the CPU, it runs inside the chip, so it is called the on-chip bus.

With this line in place, the workshops could communicate instantly. We also drew up a set of rules to address the problem, called the cache coherence protocol.

The rules say that every cache line, the basic unit of storage in each workshop’s cache, is always in one of four states:

Modified (M)

The cache line has been modified and now differs from main memory; no other core holds a valid copy. Before another CPU core is allowed to read that memory block, this core writes the line back to main memory, and the line’s state becomes Shared (S).

Exclusive (E)

The line is present only in the current CPU core’s cache and matches main memory. If another core reads it, the state becomes Shared; if the current core modifies it, the state becomes Modified.

Shared (S)

The line may be held in the caches of several CPU cores, and every copy matches main memory.

Invalid (I)

The cache line is invalid: it holds no usable data and must be fetched again before it can be used.

[Figure: transitions between the four MESI states]

Under these rules, the workshops can no longer read and write their own caches as freely as before. Every read and write must respect the line’s state, so nobody ends up using stale data.

In addition, if a block of memory is cached by more than one workshop, they may no longer modify it at the same time: a workshop that wants to write must first have all the other copies invalidated.

The meeting also solved the problem of locking the bus for every atomic operation. Thanks to the cache coherence protocol, the bus director no longer needs to lock the system bus: a workshop performing an atomic operation only needs to hold the relevant cache line exclusively while it works on it.

Since then, the problem of data inconsistencies has been eradicated and the eight workshops are working happily again.
