A few words first: as the saying goes, without enduring the biting cold, how could the plum blossom smell so sweet? Learning can be dull, so please stick with it! I learned the material in this article from Mr. Dinky. If something is unclear, look it up yourself, ha! This article takes about 35 minutes to read.

Hi, everybody. Earlier we got a rough idea of how to choose an index and of the pros and cons of each choice. This article is about MySQL selecting the wrong index. With the right index, you get twice the result for half the work!

Let's simulate some data as preparation: create a table and insert 100,000 rows.

CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`),
  KEY `b` (`b`)
) ENGINE=InnoDB;
delimiter ;;
create procedure idata()
begin
  declare i int;
  set i=1;
  while(i<=100000)do
    insert into t values(i, i, i);
    set i=i+1;
  end while;
end;;
delimiter ;
call idata();

Next, let's analyze an SQL statement.

select * from t where a between 10000 and 20000;

Since column a is indexed, this statement should naturally use index a.

Figure 1 Using the Explain command to view statement execution

As Figure 1 shows, the result is as expected.

Next, let's perform the following operations.

Figure 2 Execution flow of Session A and Session B

Here, session A starts a transaction with a consistent snapshot; session B deletes the data, reinserts it, and checks the execution plan; then session A commits its transaction.
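Because Figure 2 is an image, the same flow can be sketched as plain statements. This is a reconstruction from the description above, not a copy of the figure: the delete from t and call idata() steps are assumptions about how session B removes and reinserts the rows.

```sql
-- Session A: open a transaction with a consistent snapshot and keep it open
start transaction with consistent snapshot;

-- Session B: remove all rows, reinsert them, then look at the plan
delete from t;
call idata();
explain select * from t where a between 10000 and 20000;

-- Session A: finally end the transaction
commit;
```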

At this point, session B's query select * from t where a between 10000 and 20000; may no longer choose index a. We can use the slow query log to see what actually happens.

To test this guess, set up the slow query log first:

  1. Change the slow query threshold. A value of 0 means that every query taking longer than 0 seconds, that is, every query, is written to the slow query log. One pitfall: a session-level set, as used here, takes effect immediately, but if you use set global instead you must reconnect, otherwise the new value does not apply to your session: set long_query_time=0;
  2. Check that the setting took effect: show variables like 'long%';
  3. Check whether the slow query log is enabled: show variables like '%slow_query_log%';

That completes the preparation. We now run two statements as a controlled comparison and observe the results.

select * from t where a between 10000 and 20000; /*Q1*/
select * from t force index(a) where a between 10000 and 20000; /*Q2*/

For the record, the slow log figure above is not mine (the other figures are). My own slow query log did not show these entries, which is very strange.

As you can see, Q1 scans 100,000 rows, which is obviously a full table scan, and takes 40 milliseconds. Q2 scans 10,001 rows and takes 21 milliseconds. In other words, when we did not use force index, MySQL chose the wrong index, and the statement took longer to execute.

This example corresponds to a common scenario: constantly deleting historical data and adding new data. Isn't it strange that MySQL picks the wrong index here? Today, let's start from this strange result.

What the optimizer does

In the previous article, we looked at the SQL execution flow and explained that the optimizer is responsible for selecting indexes. So what does the optimizer actually do here? Step by step, we dig deep into the underlying principles.

The optimizer chooses indexes to find an optimal execution plan and to execute statements with minimal cost. In a database, the number of rows scanned is one of the factors affecting the execution cost. Fewer rows scanned means less disk data is accessed and less CPU resources are consumed.

Of course, the number of rows scanned is not the only criterion. The optimizer also takes into account factors such as whether to use temporary tables, whether to sort, and so on.

Our simple query involves neither temporary tables nor sorting, so MySQL must have chosen the wrong index because it misjudged the number of rows to scan.

How does MySQL estimate the number of rows scanned?

Before a statement is actually executed, the database cannot know exactly how many rows match, and it cannot read all the data just to decide which index to use. It can only estimate the number from statistics. So how does that work?

Index selectivity

Obviously, the more distinct values an index contains, the more selective it is. The number of distinct values in an index is called its cardinality. That is, the larger the cardinality, the more selective the index. We can view the cardinality of an index with show index from followed by the table name.

Note that the values in the Cardinality column below do not represent the number of rows.

Cardinality of index
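For the table in this article, the command would look like the sketch below. The Cardinality values are sampled estimates and vary from run to run, so no expected output is shown.

```sql
-- One row per index (PRIMARY, a, b); compare the Cardinality column across them
show index from t;
```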

That was a brief introduction to what cardinality is and what it affects. Next, let's look at how MySQL obtains index cardinality internally.

Sampling statistics

When sampling, InnoDB selects N data pages by default, counts the distinct values on each of those pages, takes the average, and multiplies it by the number of pages in the index. The result is the cardinality of the index.
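Written as a formula, the estimate looks roughly like this, where d_i is the number of distinct values on the i-th sampled page and P is the number of pages in the index. The symbols and the numbers in the example are illustrative assumptions, not values measured on table t.

```latex
\text{cardinality} \approx \left( \frac{1}{N} \sum_{i=1}^{N} d_i \right) \times P
% Hypothetical example: N = 20 sampled pages averaging 400 distinct values each,
% in an index of P = 250 pages:
% \text{cardinality} \approx 400 \times 250 = 100000
```

Because only N pages are sampled, the result is an estimate and can be off when the sampled pages are not representative.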

Tables are constantly updated, so index statistics are not fixed. When the number of changed rows exceeds 1/M of the table, InnoDB automatically recalculates the index statistics. This is similar to running analyze table on the table by hand.

MySQL can store index statistics in two ways, selected by the innodb_stats_persistent parameter. Its current value can be checked as follows.

show variables like 'innodb_stats_persistent';

  • If this parameter is set to on, statistics will be stored persistently. By default, N is 20 and M is 10
  • If this parameter is set to off, statistics are stored only in memory. By default, N is 8 and M is 16.
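To see which sampling settings are in effect and what InnoDB has stored, something like the following can be used. It is a sketch that assumes table t lives in the current database; innodb_stats_persistent_sample_pages and innodb_stats_transient_sample_pages are the server variables that correspond to N in the two modes.

```sql
-- N for persistent statistics and for in-memory (transient) statistics
show variables like 'innodb_stats_persistent_sample_pages';
show variables like 'innodb_stats_transient_sample_pages';

-- Stored per-index statistics for table t (only populated when statistics are persistent)
select index_name, stat_name, stat_value, sample_size
from mysql.innodb_index_stats
where database_name = database() and table_name = 't';
```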

How this affects our SQL

Figure 3 Execution plans shown by explain

Figure 3 shows a large gap between the estimated numbers of scanned rows. We have already analyzed where the row estimate comes from, so let's look at the plan from another angle.

possible_keys: the indexes that could be used for the query

key: the index actually used

These two columns show that the index that could be used differs from the one actually chosen. The index goes unused, which hurts query performance.

So why is that? The operations that session A and session B performed on the test data left the statistics inaccurate. But even so, why did the optimizer choose a plan that scans 100,000 rows over a plan that scans 37,000 rows?

This is because, with index a, every value fetched from index a requires going back to the primary key index to look up the entire row, and the optimizer counts this cost as well.

If it scans the 100,000 rows instead, the scan runs directly on the primary key index with no such extra cost.

The optimizer estimates the cost of both choices and, judging from the result, decides that scanning the primary key index directly is faster. In terms of actual execution time, of course, this choice is not optimal.

Using a secondary index means factoring in the cost of going back to the table. That cost was also considered for the plan in Figure 1, where the choice was correct, so the strategy itself is not the problem. The blame belongs where it is due: MySQL chose the wrong index because it failed to estimate the number of scanned rows accurately.
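The two plans the optimizer is weighing can be compared side by side with explain. This is a sketch of how a comparison like Figure 3 could be reproduced; the rows values you see depend on the state of the statistics at that moment.

```sql
-- Plan chosen by the optimizer on its own
explain select * from t where a between 10000 and 20000;
-- Plan when index a is forced; compare the key and rows columns
explain select * from t force index(a) where a between 10000 and 20000;
```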

Reasons for choosing the wrong index

  • Inaccurate index selectivity statistics (cardinality)
  • The cost of going back to the table tips the choice toward the wrong plan

Solutions

  • Use force index to force a specific index
  • Use analyze table t to recalculate the index statistics
  • Add a more suitable index
  • Rewrite the SQL so that it guides the optimizer toward the right index
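As a minimal sketch of the second option, the statistics can be recalculated and the plan checked again. Whether the estimate fully recovers depends on the state of the table, so treat this as something to try, not a guaranteed fix.

```sql
-- Recalculate index statistics for t
analyze table t;
-- Check whether the optimizer now picks index a
explain select * from t where a between 10000 and 20000;
```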

Why does this depend on session A's transaction?

Why does session B scan about 10,000 rows when there is no session A?

The preparation at the top probably confused many of you. I also hit a wall when I first encountered it. Why do session B's delete, insert, and explain depend on session A's consistent-snapshot transaction?

Without session A, can session B alone reproduce the behavior in this article? No, it cannot.

Here is how delete is implemented.

delete: a delete does not physically remove the record from disk. It only marks the record as deleted.

Because of this behavior, if the interval between the delete and the inserts in the stored procedure is short, the purge thread has not yet had time to clean up the delete-marked records. When an inserted row has the same primary key, MySQL automatically reuses the space of the previous record.

Because the amount of data and the table size stay the same, the index statistics do not change, so the estimated number of scanned rows stays around 10,000.

Why does session B scan about 30,000 rows when session A is open?

Session A holds a consistent snapshot, also called a consistent read, so that its reads stay repeatable, which means the deleted rows must be kept. The inserts therefore cannot reuse the old space and have to place the data somewhere else. Only later, once session A has finished, does the purge thread clean up all the delete-marked data.

Because purge has not yet run, the table size and tablespace differ from before, which ultimately makes the index statistics incorrect.
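One way to check whether an open transaction is holding purge back is sketched below. It assumes information_schema.innodb_trx and the trx_rseg_history_len counter in information_schema.innodb_metrics are available in your MySQL version.

```sql
-- Open transactions, oldest first; a long-running one (like session A) can hold purge back
select trx_id, trx_state, trx_started, trx_mysql_thread_id
from information_schema.innodb_trx
order by trx_started;

-- Length of the undo history list still waiting to be purged
select name, `count`
from information_schema.innodb_metrics
where name = 'trx_rseg_history_len';
```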

What does the purge thread do?

innodb_purge_batch_size

set global innodb_purge_batch_size=X;

Sets the number of undo log pages that purge processes in one batch. The default value is 300, meaning 300 pages are cleaned per batch. It can be changed dynamically.

A larger setting means that more pages are reclaimed at a time and more undo pages become available for reuse, which reduces disk storage and allocation overhead. However, if the value is too high, purge has to process more undo pages per batch, so CPU and disk I/O become too concentrated on undo log processing and performance degrades. Most users are advised not to adjust this parameter.

innodb_purge_threads

# In my.cnf under [mysqld]; this variable is not dynamic, so it cannot be changed with set at runtime
innodb_purge_threads=X

Increasing innodb_purge_threads can improve the efficiency of purge when DML operations are spread across many tables.

In current versions of MySQL, the purge thread has been separated from the master thread. Running purge in its own threads improves scalability.

Starting from MySQL 5.7.8, this parameter defaults to 4 and can be set to a maximum of 32.

The following two parameters generally do not need to be changed, so I will not elaborate on them here.

set global innodb_max_purge_lag = X;
set global innodb_max_purge_lag_delay = X;

Role of the purge thread

  • Removes records that carry the delete mark
  • Removes old row versions from the undo log
  • Truncates the undo tablespace when needed
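The current values of the purge-related settings mentioned in this section can be inspected in one go. This only reads the values and changes nothing.

```sql
show variables like 'innodb_purge%';
show variables like 'innodb_max_purge_lag%';
```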


In this article, we covered why MySQL can select the wrong index, what the optimizer does, how it chooses an index, how to guide it toward a better one, how delete works, and how a consistent read affects the result.
