To be honest, this question can relate to a lot of the core knowledge of MySQL, can pull out a lot of, like when you are asked to test your computer network knowledge, “after entering the URL enter, what on earth happened”, see how many you can say.
Tencent interview before the truth, also asked this question, but the answer is very bad, before did not think about the relevant reasons, resulting in a moment between the pull out. So today, I take you to detail what are the reasons, I believe you will have a harvest after reading, or you beat me.
Start to pretend: classification discussion
If a SQL statement is executed slowly, is it executed slowly every time? Or is it normal most of the time and slow occasionally? So I think there are two other things we can talk about.
1, most of the situation is normal, only occasionally appear very slow.
2. This SQL statement has been executing very slowly without changing the amount of data.
In view of these two cases, let’s analyze the possible causes.
For the occasional slow case
A SQL most normal, occasionally can appear very slow, in view of this situation, I think the WRITING of the SQL statement itself is no problem, but other reasons, what will be the reason?
I have no choice but to refresh dirty pages in the database
When we want to insert into the database, or a data to update the data, we know that the database will update the corresponding field data in the memory, but after the update, the update of the field will not be immediately synchronized persisted to disk, but to write the record of these updates to the redo log diary, when the free, Update data to disk via redo log.
However, the capacity is limited in the redo log, if the database has been very busy, and it’s frequently updated, at this time of the redo log wrote will soon be over, this time can’t wait until the data synchronization to disk when idle, can only suspend other operations, devote to the data synchronization to the disk, and this time, This will cause our normal SQL statement to suddenly execute slowly, so that the database in the synchronization of data to disk, it may cause our SQL statement to execute slowly.
What am I supposed to do without the lock
We can’t get the lock, so we have to wait for someone else to release the lock. Or, if the table is not locked, but one of the rows I want to use is locked, I can’t help it.
To determine if you are actually waiting for a lock, you can use the show processList command to check the current status. Some commands are better to record.
Let’s come and analyze the second case, which I think is the most important one
For being this slow all the time
If the same amount of data in the case of this SQL statement every time so slow, it is necessary to consider your SQL writing, let’s analyze the reasons why our SQL statement execution is not ideal.
Let’s first assume that we have a table with the following two fields: primary key ID and two common fields C and D.
mysql> CREATE TABLE `t` (
`id` int(11) NOT NULL,
`c` int(11) DEFAULT NULL,
`d` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
Copy the code
I got it. I didn’t use the index
I don’t use the index, which I think is one of the reasons that many people can think of, for example, if you want to query this statement
select * from t where 100 <c and c < 100000;
Copy the code
Field has no index
If you don’t have an index on your c column, you’ll have to do a full table scan, and you’ll experience the fun of not having an index, so this will cause the query to be slow.
The field has an index, but it is not indexed
Ok, so now you index c, and then you query a statement
select * from t where c - 1 = 1000;
Copy the code
I want to ask you a question, do you use indexes when you do queries like this?
A: No, if we did the operation on the left side of the field, then sorry, in the query, the index will not be used, so we should pay attention to this field has an index, but due to their negligence, resulting in the system did not use the index situation.
The correct query would be as follows
select * from t where c = 1000 + 1;
Copy the code
One might say, well, if I have an operation on the right hand side, I can use the index, right? Wouldn’t the database automatically optimize for us, automatically convert c-1 =1000 to c =1000 +1?
I’m sorry, but I can’t help you, so, be careful.
The index was not used because of the function operation
If we perform functional operations on fields in the query, it will also cause that indexes are not used, for example
select * from t where pow(c,2) = 1000;
Copy the code
I’m just doing an example here, but if pow is c to the NTH power, there’s probably no pow of c,2. And this is actually very similar to what we did on the left hand side.
So, when a statement is executed slowly, it may be that the statement does not use the index, but the specific reason why the index is not used, you will have to analyze, I listed above three reasons, should be more appeared.
Ha ha, the database itself chose the wrong index
When we perform a query operation, for example
select * from t where 100 < c and c < 100000;
Copy the code
As we know, there is a difference between a primary key index and a non-primary key index. A primary key index stores the value of an entire row of fields, whereas a non-primary key index stores the value of an entire row of fields. MySQL > alter table select * from primary key; alter table select * from primary key; alter table select * from primary key
In other words, if we go to the index of c, we will find the corresponding primary key. Then, we will go to the primary key index according to the primary key value, and return the whole row.
If you have an index in c, the system may scan the entire table for 100 < C and c < 100000. If you have an index in C, the system may scan the table for 100 < C and C < 100000.
Why is that?
This statement is used to predict whether there will be fewer rows in the c index scan or fewer rows in the full table scan. Obviously, fewer rows scanned is better, because fewer rows scanned means fewer I/O operations.
If we were to scan the entire table, then the number of scans would be the total number of rows in the table, let’s say n; If we go to index C, after we find the primary key through index C, we have to go to the primary key index again to find our entire row, that is, we need to go through the index twice. Also, we don’t know how many rows are in the table that match 100 c < and C < 10000. What if the table matches all rows? This means that not only does the c index scan n rows, but it also has to go through the index twice per row.
Therefore, it is possible for the system to perform a full table scan without using the index. So how does the system determine that?
The judgment comes from the system’s prediction, that is, the system will predict how many rows will need to be scanned to go through the C field index. If it expects a large number of rows to be scanned, it may skip the index and scan the entire table.
So the question is, how does the system predict and judge? ** Here I tell you how the system is judging it, although this time I have written neck a little sore.
The system is judged by index differentiation. The more different values on an index, the fewer indexes with the same value, and the higher the index differentiation. We also call the degree of differentiation cardinal number, that is, the higher the degree of differentiation, the larger the cardinal number. So, a larger cardinality means fewer rows that meet the conditions 100 < c and c < 10000.
Therefore, the larger the cardinality of an index, the more advantageous it is to query through the index.
So how do you know the cardinality of this index?
Of course, the system does not traverse the whole data to obtain the cardinality of an index. It is too expensive. The index system predicts the cardinality of an index by traversing part of the data, that is, by sampling.
The index c has a very large base, but unfortunately the base of the index c was predicted to be very small. For example, if you sample a piece of data that happens to have a very small base, you mistakenly think the index has a very small base. Then ha ha, the system does not go to the C index, directly go to the full scan.
So, having said so much, we come to the conclusion that due to the error of statistics, the system did not go to the index, but went to the full table scan, and this is also the cause of our SQL statement execution is very slow.
Here I declare that the prediction of the number of rows scanned by the system is only one of the reasons. Whether the query statement needs to use temporary tables, whether it needs to sort, etc., will also affect the system’s choice.
However, we can sometimes force indexes to query, for example
select * from t force index(a) where c < 100 and c < 100000;
Copy the code
We can also pass
show index from t;
Copy the code
To check whether the cardinality of the index matches the actual value, if it does not match the actual value, we can reset the index cardinality, we can use this command
analyze table t;
Copy the code
To reanalyze the statistics.
Since the cardinality of the index is predicted wrong, it also means that the system may select the wrong index when the query has multiple indexes, which may also be a reason for the slow SQL execution.
All right, that’s it, I think it’s pretty cool that you’ve got that, so let’s just conclude.
conclusion
The above is my summary and understanding, the last part, I am afraid that many people do not understand the database will choose the wrong index, so I explained in detail, I will make a summary of the above.
An SQL execution is slow, we will discuss two cases:
1, most of the cases are very normal, occasionally very slow, there are the following reasons
(1) The database is updating dirty pages. For example, the redo log is full and needs to be synchronized to disk.
(2) When executing, encounter lock, such as table lock, row lock.
2. This SQL statement has been executed slowly for the following reasons.
(1) No index is used: for example, this field does not have an index; Index unavailable due to field operations, function operations.
(2) The database selected the wrong index.
If you have a supplement, you can also add a wave in the message area.
Finally, promote my public number: helpless and painful code farmers: Stamp I can pay attention to, the article will be first in my public number, looking forward to the attention of all heroes exchange.