preface

I’m sorry to post this article so late. I have a lot of things to do these days, so I have to write this on Thursday when I finish the task. Please look down ↓↓↓ for more details

Detailed requirements

System, which has a special traffic library (no primary key), one of the history table data volume is too big, table space occupying too much, every day there are 500 w writing data, and then the boss arranged to have a task to me, let me write a table according to the days of timing task, each time the data is transferred to the day of a day generated in the table, and delete the data in the table, The main purpose is not to grow the tablespace any more, to maintain a balance, because 500W is added every day

Table space and data volume:

Implementation approach

Implementation process, as shown in the figure:Implementation pseudocode (truncated part of code) :

** ** Scheduled(cron = "0 0 3 **? ) public void dataTransfer () throws the Exception {System. Out. Println (" timer starts running -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- "); String tabaleName = "XXX"; String isTable = getTableName(tabaleName); If (objectutils.isNULL (isTable)){createHistoryDate(names); if(objectutils.isnull (isTable)){createHistoryDate(names); Map<String, Object> orderTime = orderByTime(); If (sysfun.isnotempty (orderTime) && OrderTime.size () > 0){orderTime.put("startTime",startTime); orderTime.put("endTime",endTime); orderTime.put("tableName",tabaleName); int i=0; for (;;) {system.out.println (" enter loop "); Int RST = dataTransfer(orderTime); Int delt = deleteDataTransfer(orderTime); If (RST <=0 &&delt <=0){break; if(RST <=0 &&delt <=0){break; } i++; System.out.println(" tabaleName: "+ "tabaleName: "+ I); }} System. Out. Println (" timer finishes running -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- "); }Copy the code

journey

After the method was completed, we went to the server for formal test last Friday. When the method was measured, it was modified with @postconstruct. It would run when the server loaded the Servlet and be executed by the server only once. When the console printed:

(” Timer started “) stuck, go to the library to see the table has been successfully created

At first, I thought there was something abnormal, but later I found that the longest time was stuck, that is, I was stuck by an SQL query (I directly used this SQL to query 300s+ in the database, but failed to find it). Then the guy who maintained this database told me: Limit 1 = 1 limit 1 = 1 limit 1 = 1

Limit 1 does not work, so we can only use query optimization. Before we talk about query optimization, let’s talk about why we use itorder byWhy is it so slow?

In-depth analysis

MySql has two ways to implement ORDER BY:

  1. Generate ordered results through index scans

For example, if the history table has a primary key index on the id field, and the id is currently in the range of 1001-1006, then the id index B+Tree is as follows:

Now when we want to fetch data in order of id from smallest to largest, execute the following SQL

select * from history order by id

Mysql traverses the linked list of leaf nodes indexed by ID in the image above without additional sorting operations. This is sort by index scan.

  1. Using filesort (filesort)

If the id field does not have any index and the above B+Tree structure does not exist, Mysql can only scan the table to filter the data that meets the criteria, and then sort the filter results by ID. The sorting process is called filesort. The ORDER BY clause must be indexed to avoid filesort. The ORDER BY clause must be indexed to avoid filesort


Before optimization we also need to learn to see SQL execution plan (EXPLAIN) respectively (here focus on type, rows, Extra, other here do not EXPLAIN, can be their own private understanding) :

Id, select_Type, table, type, possible_keys, key, key_len, ref, rows, Extra

  1. Type: access mode of the table. This mode is also called access type.
  • Mysql > system > const > eq_ref > ref > range > index > All

  • Range only retrives rows in a given range and uses an index to select rows. Generally, between, <, >, in and other queries appear in where. Range scanning is better than full table scanning

  • Index Full Index Scan: The index type is different from All. It is usually faster than All because index files are usually smaller than data files. That is, both all and index are read from the full table, but index is read from the index, while all is read from the hard disk

  • ALL Full Table Scan, which traverses the Table to find matching rows

  1. Rows: Approximate the number of rows to read to find the desired record based on table statistics and index selection, that is, as few as possible

  2. Extra: Contains important additional information that is not appropriate to be explicit in other columns

  • Using Index: indicates that the corresponding select operation Using a Covering Index, avoiding access to the table row, efficiency is good. Using WHERE indicates that the index is used to perform a key lookup. If using WHERE is not present at the same time, the index is used to read data rather than perform lookup actions.

  • Using filesort: when a Query contains an order by operation that cannot be done Using an index, the operation is called “filesort”.

Let’s analyze it with a real data graph

SQL: EXPLAIN SELECT updateTime FROM historyData ORDER BY updateTime LIMIT 1

There is no way to hit the index with our type of query, followThe principle of leftCreate a new one for updateTimeNormal index(index)NORMAL

In the process of adding indexes to carry a mouth, because the history repository data to there all the time, so I worry about the lock table indexed, at that time, the query related information to know, after Mysql5.6 version does not affect the reading and writing, can’t lock table, the premise for the InnoDB storage engine, MyISAM indexed table lock, speaking, reading and writing will be all blocked.

If there is too much data in the table, it may take too long to build the index. For example, I built the index in 4 hours with 6000W. The following is the index effect picture:

Combining the analysis of the data with the execution plan, the optimization came to this. The abnormal stability of the timing program in these days after the optimization, and the transfer and deletion of 500W data every day were also solved.

At the end

Actually this is some very basic things, welcome to point out problems, may be, as you know, but there is no chance to go to the actual contact with so many data, actual to optimize such things, I also contact them for the first time, write this article simply just want to share, to deepen your impression, by the way, writing is not good, please forgive me!!!!! Mysql – ORDER BY mysql