Troubleshooting a MySQL database that suddenly became slow

Last week, a client reported that the system had suddenly become very slow, with 404 and 500 errors popping up from time to time, which was quite embarrassing. When the problem occurred I was on a business trip in Shenzhen and had no time to look into it; only today can I say I've finally found the cause.


Locating the problem

When the slowness was first reported, I didn't suspect the database at all; I spent a long time checking whether the application services were healthy (we had run into a Dubbo memory leak before).

A review of the application service logs turned up no exceptions, only a few warnings.

Checking the service's runtime logs, I saw entries reporting Lock wait timeout exceeded; try restarting transaction.

At this point I still wasn't focused on the database; I believed the code was at fault, with some transaction never being committed.

But more and more Lock wait timeout exceeded; try restarting transaction entries kept piling up in the logs.

Suspecting the problem was at the database level after all, I started checking the database.


Looking for the cause

Our database is not a managed cloud RDS instance; it is a self-installed MySQL on an 8-core, 32 GB AWS server.

Using the top command, I could see MySQL's CPU usage running as high as 90%.

My heart sank: with CPU utilization that high, the server could easily go down.

So I dug out the only basic MySQL knowledge I have; this time I essentially needed just the following statements:

  • Check all MySQL processes

show processlist;

  • Check MySQL's maximum allowed packet size

show global variables like 'max_allowed_packet';

  • View transactions currently in progress

select * from information_schema.INNODB_TRX;

  • Check the current number of MySQL connections

show status like 'thread%';


Solving the problem

Following the statements above, I tracked the problem down step by step.

show processlist; shows all the current threads, letting us pick out the most time-consuming ones.
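
As a side note, the same information can be filtered through information_schema.PROCESSLIST, and a runaway statement can be terminated by its thread id. A minimal sketch (the id 12345 below is a placeholder):

-- list the longest-running statements first, ignoring idle connections
SELECT id, user, host, db, time, state, info
FROM information_schema.PROCESSLIST
WHERE command <> 'Sleep'
ORDER BY time DESC;

-- terminate a runaway statement by its thread id
KILL 12345;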

In the process list we saw a large number of connections sitting in the Sleep state, which we judged to be a significant contributor to the load. There were far too many sleeping threads, so our fix was to set wait_timeout to 600 seconds so that idle connections get closed.

Setting it to 600 seconds means an idle connection is kept for at most 600 seconds, i.e. 10 minutes; the right value depends on the service.
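
For reference, this is a minimal sketch of applying the timeout at runtime. It only affects new connections, so it should also go into my.cnf to survive a restart; note that interactive clients (such as the mysql shell) use interactive_timeout instead:

SET GLOBAL wait_timeout = 600;
SET GLOBAL interactive_timeout = 600;
-- verify the new values
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';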

select * from information_schema.INNODB_TRX;

To my surprise, most of the statements stuck waiting for locks were UPDATEs, and there was also a SELECT with a single WHERE condition that had been running for 4 minutes.
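
On MySQL 5.6/5.7, INNODB_TRX can be joined with INNODB_LOCK_WAITS to see exactly which transaction is blocking which (in MySQL 8.0 this information moved to performance_schema.data_lock_waits). A minimal sketch:

-- map each waiting transaction to the transaction blocking it
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;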

So I took a look at that table.

As soon as I opened the table structure, I could barely keep from cursing: there was no index at all, the table held more than 3 million rows, and an unindexed query over that much data easily takes 4 minutes or more.
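
An EXPLAIN makes the problem visible before touching anything. A hypothetical example, since the post doesn't name the table or column (orders and customer_id are made up here):

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- type: ALL with rows in the millions indicates a full table scan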

So I set about adding the index, and after a long wait it was finally in place.
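
With the same hypothetical names, adding the index looks roughly like this; on MySQL 5.6+ the online-DDL clauses keep the table readable and writable while the index builds:

-- build the index without blocking reads or writes
ALTER TABLE orders ADD INDEX idx_customer_id (customer_id), ALGORITHM=INPLACE, LOCK=NONE;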

show status like 'thread%';

After adding the index, I checked the current MySQL connection count again; it was no longer as high as before. My guess is that connections had simply been piling up behind the slow queries.
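
When reading that output, the two counters worth watching are Threads_connected (open connections) and Threads_running (statements actually executing), checked against the configured ceiling:

SHOW STATUS LIKE 'Threads%';
SHOW GLOBAL VARIABLES LIKE 'max_connections';
-- Threads_connected creeping toward max_connections means connections are piling up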

Then I looked at the server's CPU usage again. It was a little better this time, bouncing between 1% and 80% instead of sitting near 90%.


Conclusion

MySQL is a very heavily used database, and SQL tuning for it is real technical work. Some of the SQL in this project makes me want to gag, and tuning that kind of code is genuinely difficult.

In fact, the diagnostics MySQL exposes under information_schema tell you far more than the business logs do.