Preface

As expected, the end of the year has not been peaceful. We recently received an operations alert indicating that some servers were under very high load, and we were asked to locate the problem.

It really is a case of whatever you worry about coming true: just a few days earlier I had deliberately increased the load on some servers myself (yes, the boss asked me to write a BUG!). Fortunately, the different environments did not affect each other.

Locating the problem

When the problem landed on me, I first logged onto the server and found that only our Java application was running, so I used the ps command to get the application's PID.

The threads of that process were then displayed with top -Hp pid. Pressing a capital P sorts the threads by CPU usage, which gave the following result.

Sure enough, some threads had very high CPU utilization.

To locate the problem, I immediately dumped the thread stacks to a log file with jstack pid > pid.log.

I picked one of the 100%-CPU threads above at random, pid=194283, converted it to hexadecimal (2f6eb), and searched for it in the thread snapshot:

This is because thread IDs in the thread snapshot are recorded in hexadecimal.
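The conversion itself is a one-liner; a quick sketch in Java (the class name is just for illustration):

    public class PidToHex {
        public static void main(String[] args) {
            long pid = 194283;                          // the busy thread's pid reported by top -Hp
            System.out.println(Long.toHexString(pid));  // prints 2f6eb, the nid to search for in the jstack dump
        }
    }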

Disruptor stack (Disruptor queue)

I didn’t expect it to happen again.

To get a more intuitive view of the thread state information, I uploaded the snapshot to a dedicated analysis platform:

fastthread.io/

The platform has a menu that lists all the CPU-consuming threads; looking closely, they were almost identical to the stack above.

That is, they are all stacks from Disruptor queues, and they are all executing java.lang.Thread.yield.

As is well known, the yield function makes the current thread give up the CPU so that other threads can compete for it.

According to the thread snapshot, about 30 threads were in the RUNNABLE state, and all of them were calling yield.


To solve the problem

I then looked at the code and found that, internally, two Disruptor queues are created for each business type for decoupling.

Assuming there are seven business types, that amounts to 2*7=14 Disruptor queues, each with one consumer, so 14 consumer threads in total (and more in production).

We also found that the configured consumer wait strategy was YieldingWaitStrategy, which does indeed call yield to give up the CPU.

The code is as follows:
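Since the original snippet is not reproduced here, the following is only a minimal sketch of a Disruptor configured with YieldingWaitStrategy in the way described above; the event class, handler, and names are illustrative, not the production code:

    import com.lmax.disruptor.YieldingWaitStrategy;
    import com.lmax.disruptor.dsl.Disruptor;
    import com.lmax.disruptor.dsl.ProducerType;
    import java.util.concurrent.Executors;

    public class BusinessQueueConfig {

        // Illustrative event carrying a single payload field.
        public static class BizEvent {
            private Object data;
            public void setData(Object data) { this.data = data; }
            public Object getData() { return data; }
        }

        public static Disruptor<BizEvent> buildQueue() {
            Disruptor<BizEvent> disruptor = new Disruptor<>(
                    BizEvent::new,
                    1 << 16,                             // ring buffer size (must be a power of two)
                    Executors.defaultThreadFactory(),
                    ProducerType.MULTI,
                    new YieldingWaitStrategy());         // the wait strategy in question

            // One EventHandler means one consumer thread for this queue; with two
            // such queues per business type, the consumer threads quickly outnumber
            // the CPU cores.
            disruptor.handleEventsWith((event, sequence, endOfBatch) -> {
                // business handling would go here
            });
            return disruptor;
        }
    }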

At first glance, this wait strategy seemed to have a lot to do with the problem.

Local simulation

To verify this, I created 15 Disruptor queues locally and monitored the CPU usage.

Fifteen Disruptor queues were created, and a thread pool was used to publish 1,000,000 (100W) events to each queue.

The consumer simply prints each event.
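A rough sketch of that simulation (the names and sizes here are mine, not necessarily the exact demo code on GitHub):

    import com.lmax.disruptor.YieldingWaitStrategy;
    import com.lmax.disruptor.dsl.Disruptor;
    import com.lmax.disruptor.dsl.ProducerType;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class DisruptorCpuTest {

        public static class LongEvent {
            long value;
        }

        public static void main(String[] args) {
            int queues = 15;                    // number of Disruptor queues to create
            int events = 1_000_000;             // 100W events per queue
            ExecutorService producers = Executors.newFixedThreadPool(queues);

            for (int i = 0; i < queues; i++) {
                Disruptor<LongEvent> disruptor = new Disruptor<>(
                        LongEvent::new, 1 << 16, Executors.defaultThreadFactory(),
                        ProducerType.SINGLE, new YieldingWaitStrategy());

                // One consumer per queue that simply prints the value.
                disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                        System.out.println(event.value));
                disruptor.start();

                // Publish 1,000,000 events to this queue from the thread pool.
                producers.execute(() -> {
                    for (long v = 0; v < events; v++) {
                        disruptor.getRingBuffer().publishEvent(
                                (event, sequence, val) -> event.value = val, v);
                    }
                });
            }
        }
    }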

After running for a while, CPU usage was indeed very high.


The thread dump showed the same phenomenon as in production: the consumer threads were all RUNNABLE and calling yield.

Consulting the official Disruptor documentation:

YieldingWaitStrategy is a strategy that squeezes the CPU to its limit, using spin + yield to improve performance. It is recommended when the number of event handler threads is smaller than the number of CPU cores.
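Conceptually, the wait loop behaves like the sketch below: spin a bounded number of times, then keep calling Thread.yield until data arrives. This is a simplification, not the library source; publishedSequence is a hypothetical stand-in for the ring buffer cursor:

    public class SpinYieldSketch {

        // Hypothetical stand-in for the ring buffer's published cursor.
        static volatile long publishedSequence = -1;

        static long waitFor(long wantedSequence) {
            int spinsLeft = 100;                  // busy-spin a bounded number of times first
            while (publishedSequence < wantedSequence) {
                if (spinsLeft > 0) {
                    spinsLeft--;                  // spinning: lowest latency, burns CPU
                } else {
                    Thread.yield();               // the thread stays RUNNABLE and keeps competing for CPU
                }
            }
            return publishedSequence;
        }
    }

This is why the dump shows RUNNABLE threads sitting in yield: once there are more such loops than cores, they mostly just hand the CPU back and forth to each other.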


The other wait strategy, BlockingWaitStrategy (which is also the default), uses a lock and has low CPU usage.

Therefore, keeping everything else the same as before, I changed the wait strategy to BlockingWaitStrategy.
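The change is a one-line swap in the simulation sketch above (reusing the LongEvent class from there):

    import com.lmax.disruptor.BlockingWaitStrategy;
    import com.lmax.disruptor.dsl.Disruptor;
    import com.lmax.disruptor.dsl.ProducerType;
    import java.util.concurrent.Executors;

    // Same construction as before; only the wait strategy differs.
    Disruptor<LongEvent> disruptor = new Disruptor<>(
            LongEvent::new, 1 << 16, Executors.defaultThreadFactory(),
            ProducerType.SINGLE, new BlockingWaitStrategy());  // lock + condition wait instead of spin/yield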


Comparing the CPU usage with the previous run, the latter is noticeably lower; and after dumping the threads, most of them are now in the WAITING state.

Optimizing the solution

It looks like changing the wait strategy to BlockingWaitStrategy can bring CPU usage down.

But note that the YieldingWaitStrategy description recommends it only when the number of consuming event handler threads is smaller than the number of CPU cores.

In our usage scenario, the number of consumer threads clearly far exceeds the number of CPU cores. Since each Disruptor queue has only a single consumer, I reduced the test to a single Disruptor queue and tried again (still with YieldingWaitStrategy).
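In the simulation sketch above, that amounts to changing a single constant:

    int queues = 1;   // one Disruptor queue, so one consumer thread, well below the number of cores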

After running it for a minute, I found that CPU utilization stayed stable and low.

Conclusion

So the investigation reached a conclusion: to fundamentally solve this problem, we need to split up the existing business. Right now a single application handles N types of business, and each of them uses several Disruptor queues.

They all run on a single server and share its CPU resources, which leads to the high CPU utilization.

So our adjustment method is as follows:

  • To quickly alleviate the problem, change the wait strategy to BlockingWaitStrategy, which effectively reduces CPU usage (and is acceptable for this business).
  • As a second step, split the application by business (just like the Disruptor queues simulated above), so that one application handles one type of business, and then deploy them separately so they are isolated from each other.

There are other things to optimize as well: this is an old system, and the thread dump showed that 800+ threads had been created.

The thread pools are created with the core pool size equal to the maximum pool size, so idle threads are never reclaimed, which causes a lot of pointless resource consumption.

So we will also adjust how the thread pools are created for each business and reduce the number of threads, to make the best use of the resources we have.
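As an example of the kind of adjustment meant here (the pool sizes are illustrative, not our actual configuration): keep the core pool smaller than the maximum, or allow core threads to time out, so that idle threads can be reclaimed.

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class PoolConfig {
        public static ThreadPoolExecutor businessPool() {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    4,                                  // core pool size, sized to the business load (illustrative)
                    16,                                 // maximum pool size for bursts (illustrative)
                    60, TimeUnit.SECONDS,               // idle non-core threads are reclaimed after 60 seconds
                    new LinkedBlockingQueue<>(1024));   // bounded work queue
            // Optionally let even core threads time out when the pool is idle.
            pool.allowCoreThreadTimeOut(true);
            return pool;
        }
    }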

The demo code has been uploaded to GitHub:

Github.com/crossoverJi…

Your likes and shares are the biggest support for me