Case Study 1: Users report the system is badly stuck, the background log shows "Broken Pipe", and the database CPU has soared to 80%

Background

At 3:00 p.m., users reported that the system was seriously stuck, and the background log showed "Broken Pipe". The database CPU had reached 80%. After the service was restarted, the same situation recurred within 10 minutes.

Process

At 3:30 p.m. I connected to the production database and ran SHOW FULL PROCESSLIST to check the state of the current threads. Scanning the Command column, I found one statement that had been sitting in the Query state for a long time, so I pulled its SQL from the Info column of that record. It was querying the project stage-point table. My mind raced: that table should hold only about 3,000 rows, and I remembered that the query conditions were indexed, unless a table was locked. Three things to verify:

Verification 1: check whether any tables in the database were locked.

Verification 2: use EXPLAIN to check whether the query failed to hit the index.

Verification 3: check whether the row count had suddenly reached the single-table capacity limit (5 million rows).

For verification 1, I use three tables from the information_schema database: INNODB_TRX, INNODB_LOCKS, and INNODB_LOCK_WAITS.

-- All transactions currently running

SELECT * FROM information_schema.INNODB_TRX;

Key fields: trx_state: the transaction state (RUNNING, LOCK WAIT, ROLLING BACK, COMMITTING). trx_query: the SQL statement the transaction is executing. trx_rows_locked: the number of rows the transaction has locked; this may include rows marked DELETED that are still physically present on disk but not visible to the transaction.

-- The locks currently held

SELECT * FROM information_schema.INNODB_LOCKS;

Key fields: lock_trx_id: the ID of the transaction holding the lock; it can be JOINed to the INNODB_TRX table for transaction details. lock_table: the name of the table that is locked or that contains locked rows.

-- Lock wait relationships

SELECT * FROM information_schema.INNODB_LOCK_WAITS;

Key fields: requested_lock_id: the ID of the lock the transaction is waiting for; it can be JOINed to the INNODB_LOCKS table. blocking_lock_id: the ID of the lock held by the transaction that is blocking others; it can also be JOINed to the INNODB_LOCKS table.
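The three tables can be joined into a single diagnostic query that shows, one row per wait, which transaction is blocked and which one is blocking it. A sketch (these information_schema tables exist in MySQL 5.7 and earlier; in 8.0 the lock tables moved to performance_schema):

```sql
-- Who is waiting, and who is blocking them?
SELECT
  r.trx_id              AS waiting_trx,
  r.trx_mysql_thread_id AS waiting_thread,
  r.trx_query           AS waiting_query,
  b.trx_id              AS blocking_trx,
  b.trx_mysql_thread_id AS blocking_thread,
  b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;
```

An empty result set means there are no lock waits at all, which is exactly what this check turned up here.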

These queries found no locked tables. On to the second guess: verify whether the query missed the index and fell back to a full table scan.

EXPLAIN select * from sys_xm_jd t where t.projname = 'XXX' and t.projphase in ('XXX');

The key column value UK_XM_period showed that the index was being hit. That left the scenario I had considered least likely as the probable truth: the table's row count had blown past 5 million. Verification 3 confirmed it: the project stage-point table held 23 million rows, far beyond the recommended capacity of a single MySQL table.
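The original does not show the SQL used for verification 3; a minimal sketch, assuming the table name from the EXPLAIN above:

```sql
-- Exact count (can itself be slow on a table this large):
SELECT COUNT(*) FROM sys_xm_jd;

-- Fast approximate count from InnoDB table statistics:
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'sys_xm_jd';
```

Note that TABLE_ROWS is only an estimate for InnoDB tables, but it returns instantly, which matters when COUNT(*) on the real table is part of the problem.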

Appendix: key fields in EXPLAIN output

  1. table: the table referenced by the output row.

  2. type: the access (join) type. Common values, listed left to right from worst to best performance: ALL, index, range, ref, eq_ref, const, system, NULL. ALL means a full table scan: MySQL traverses the entire table to find matching rows.

  3. possible_keys: the indexes MySQL could use because they cover columns involved in the query; an index listed here may still not be chosen. If this column is NULL, there is no relevant index; consider creating an appropriate one and checking the query again with EXPLAIN.

  4. key: the index MySQL actually decided to use.

  5. rows: the number of rows MySQL estimates it must examine, based on table statistics and the chosen index. Smaller is better.
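Putting these fields together, here is how the plan for the query in question would be read (the output values named in the comments are illustrative, not captured from the actual incident):

```sql
EXPLAIN SELECT * FROM sys_xm_jd t
WHERE t.projname = 'XXX' AND t.projphase IN ('XXX');
-- Healthy plan:  type = ref or range, key = UK_XM_period,
--                rows in the thousands  -> the index is being used.
-- Warning signs: type = ALL, key = NULL,
--                rows close to the table size -> full table scan.
```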

Conclusion

At 4:00 p.m. I found the responsible developer. The table was used by a back-end job that periodically synchronizes stage-point data, and due to a misoperation the program had been writing the current data in an infinite loop. In just one week, 20 million rows were written into a table that should have held 3,000. As a result, user searches against it produced "Broken Pipe" in the background log and drove the database CPU to 80%. After the data was cleaned up (itself a painful job: a plain DELETE on data this big simply times out), the problem was solved and the system ran normally again.

MySQL: deleting 25 million rows of big data

A regular DELETE is fine for small data sets, but deleting tens of millions of rows is very slow and does not release disk space. The strategy: create a new table, copy over only the rows that should be kept, drop the old table, and rename the new table to the old name. The steps are as follows.
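Before walking through the copy-and-rename steps: when most rows must be kept and only a fraction deleted, batched deletes are the usual alternative (not what was done here). A sketch, where the WHERE condition is an assumption based on the periodid range kept below:

```sql
-- Delete in small batches; re-run until zero rows are affected.
-- Small batches keep undo log volume and lock time bounded.
DELETE FROM sys_xm_jd
WHERE periodid <= 1000 OR periodid > 3000
LIMIT 10000;
```

Even done this way, InnoDB does not return the freed pages to the operating system, which is why the copy-and-rename approach was chosen for this incident.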

1. Create a new table with the same structure as the old one

create table sys_xm_jd_cp like sys_xm_jd;

2. Insert the data to keep (tens of millions of rows must be inserted in batches, capped at 500,000 per batch; MySQL's processing capacity is limited after all). Batches can be ranged by ID:

insert into sys_xm_jd_cp select * from sys_xm_jd where periodid > 1000 and periodid <= 3000;

The new table keeps only the useful rows, which frees disk space.

3. Drop the old table

drop table sys_xm_jd;

4. Rename the new table to sys_xm_jd

alter table sys_xm_jd_cp rename to sys_xm_jd;
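Steps 3 and 4 leave a brief window in which sys_xm_jd does not exist, so queries arriving in that window fail. An alternative sketch that swaps the tables atomically in a single statement, and keeps the old table around until the copy has been verified:

```sql
-- RENAME TABLE with multiple renames is atomic in MySQL:
-- no query ever sees a missing sys_xm_jd.
RENAME TABLE sys_xm_jd TO sys_xm_jd_old,
             sys_xm_jd_cp TO sys_xm_jd;

-- Only after verifying the new table:
DROP TABLE sys_xm_jd_old;
```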

We will continue to cover other scenarios.