Tencent interview: a SQL statement execution is slow what are the reasons? - No regrets series

To be honest, this question can relate to a lot of the core knowledge of MySQL, can pull out a lot of, like when you are asked to test your computer network knowledge, “after entering the URL enter, what on earth happened”, see how many you can say.

Tencent interview before the truth, also asked this question, but the answer is very bad, before did not think about the relevant reasons, resulting in a moment between the pull out. So today, I take you to detail what are the reasons, I believe you will have a harvest after reading, or you beat me.

Start to pretend: classification discussion

If a SQL statement is executed slowly, is it executed slowly every time? Or is it normal most of the time and slow occasionally? So I think there are two other things we can talk about.

1, most of the situation is normal, only occasionally appear very slow.

2. This SQL statement has been executing very slowly without changing the amount of data.

In view of these two cases, let’s analyze the possible causes.

For the occasional slow case

A SQL most normal, occasionally can appear very slow, in view of this situation, I think the WRITING of the SQL statement itself is no problem, but other reasons, what will be the reason?

I have no choice but to refresh dirty pages in the database

When we want to insert into the database, or a data to update the data, we know that the database will update the corresponding field data in the memory, but after the update, the update of the field will not be immediately synchronized persisted to disk, but to write the record of these updates to the redo log diary, when the free, Update data to disk via redo log.

However, the capacity is limited in the redo log, if the database has been very busy, and it’s frequently updated, at this time of the redo log wrote will soon be over, this time can’t wait until the data synchronization to disk when idle, can only suspend other operations, devote to the data synchronization to the disk, and this time, This will cause our normal SQL statement to suddenly execute slowly, so that the database in the synchronization of data to disk, it may cause our SQL statement to execute slowly.

What am I supposed to do without the lock

We can’t get the lock, so we have to wait for someone else to release the lock. Or, if the table is not locked, but one of the rows I want to use is locked, I can’t help it.

To determine if you are actually waiting for a lock, you can use the show processList command to check the current status. Some commands are better to record.

Let’s come and analyze the second case, which I think is the most important one

For being this slow all the time

If the same amount of data in the case of this SQL statement every time so slow, it is necessary to consider your SQL writing, let’s analyze the reasons why our SQL statement execution is not ideal.

Let’s first assume that we have a table with the following two fields: primary key ID and two common fields C and D.

mysql> CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Copy the code

I got it. I didn’t use the index

I don’t use the index, which I think is one of the reasons that many people can think of, for example, if you want to query this statement

select * from t where 100 <c and c < 100000;
Copy the code

Field has no index

If you don’t have an index on your c column, you’ll have to do a full table scan, and you’ll experience the fun of not having an index, so this will cause the query to be slow.

The field has an index, but it is not indexed

Ok, so now you index c, and then you query a statement

select * from t where c - 1 = 1000;
Copy the code

I want to ask you a question, do you use indexes when you do queries like this?

A: No, if we did the operation on the left side of the field, then sorry, in the query, the index will not be used, so we should pay attention to this field has an index, but due to their negligence, resulting in the system did not use the index situation.

The correct query would be as follows

select * from t where c = 1000 + 1;
Copy the code

One might say, well, if I have an operation on the right hand side, I can use the index, right? Wouldn’t the database automatically optimize for us, automatically convert c-1 =1000 to c =1000 +1?

I’m sorry, but I can’t help you, so, be careful.

The index was not used because of the function operation

If we perform functional operations on fields in the query, it will also cause that indexes are not used, for example

select * from t where pow(c,2) = 1000;
Copy the code

I’m just doing an example here, but if pow is c to the NTH power, there’s probably no pow of c,2. And this is actually very similar to what we did on the left hand side.

So, when a statement is executed slowly, it may be that the statement does not use the index, but the specific reason why the index is not used, you will have to analyze, I listed above three reasons, should be more appeared.

Ha ha, the database itself chose the wrong index

When we perform a query operation, for example

select * from t where 100 < c and c < 100000;
Copy the code

As we know, there is a difference between a primary key index and a non-primary key index. A primary key index stores the value of an entire row of fields, whereas a non-primary key index stores the value of an entire row of fields. MySQL > alter table select * from primary key; alter table select * from primary key; alter table select * from primary key

In other words, if we go to the index of c, we will find the corresponding primary key. Then, we will go to the primary key index according to the primary key value, and return the whole row.

If you have an index in c, the system may scan the entire table for 100 < C and c < 100000. If you have an index in C, the system may scan the table for 100 < C and C < 100000.

Why is that?

This statement is used to predict whether there will be fewer rows in the c index scan or fewer rows in the full table scan. Obviously, fewer rows scanned is better, because fewer rows scanned means fewer I/O operations.

If we were to scan the entire table, then the number of scans would be the total number of rows in the table, let’s say n; If we go to index C, after we find the primary key through index C, we have to go to the primary key index again to find our entire row, that is, we need to go through the index twice. Also, we don’t know how many rows are in the table that match 100 c < and C < 10000. What if the table matches all rows? This means that not only does the c index scan n rows, but it also has to go through the index twice per row.

Therefore, it is possible for the system to perform a full table scan without using the index. So how does the system determine that?

The judgment comes from the system’s prediction, that is, the system will predict how many rows will need to be scanned to go through the C field index. If it expects a large number of rows to be scanned, it may skip the index and scan the entire table.

So the question is, how does the system predict and judge? ** Here I tell you how the system is judging it, although this time I have written neck a little sore.

The system is judged by index differentiation. The more different values on an index, the fewer indexes with the same value, and the higher the index differentiation. We also call the degree of differentiation cardinal number, that is, the higher the degree of differentiation, the larger the cardinal number. So, a larger cardinality means fewer rows that meet the conditions 100 < c and c < 10000.

Therefore, the larger the cardinality of an index, the more advantageous it is to query through the index.

So how do you know the cardinality of this index?

Of course, the system does not traverse the whole data to obtain the cardinality of an index. It is too expensive. The index system predicts the cardinality of an index by traversing part of the data, that is, by sampling.

The index c has a very large base, but unfortunately the base of the index c was predicted to be very small. For example, if you sample a piece of data that happens to have a very small base, you mistakenly think the index has a very small base. Then ha ha, the system does not go to the C index, directly go to the full scan.

So, having said so much, we come to the conclusion that due to the error of statistics, the system did not go to the index, but went to the full table scan, and this is also the cause of our SQL statement execution is very slow.

Here I declare that the prediction of the number of rows scanned by the system is only one of the reasons. Whether the query statement needs to use temporary tables, whether it needs to sort, etc., will also affect the system’s choice.

However, we can sometimes force indexes to query, for example

select * from t force index(a) where c < 100 and c < 100000;
Copy the code

We can also pass

show index from t;
Copy the code

To check whether the cardinality of the index matches the actual value, if it does not match the actual value, we can reset the index cardinality, we can use this command

analyze table t;
Copy the code

To reanalyze the statistics.

Since the cardinality of the index is predicted wrong, it also means that the system may select the wrong index when the query has multiple indexes, which may also be a reason for the slow SQL execution.

All right, that’s it, I think it’s pretty cool that you’ve got that, so let’s just conclude.

conclusion

The above is my summary and understanding, the last part, I am afraid that many people do not understand the database will choose the wrong index, so I explained in detail, I will make a summary of the above.

An SQL execution is slow, we will discuss two cases:

1, most of the cases are very normal, occasionally very slow, there are the following reasons

(1) The database is updating dirty pages. For example, the redo log is full and needs to be synchronized to disk.

(2) When executing, encounter lock, such as table lock, row lock.

2. This SQL statement has been executed slowly for the following reasons.

(1) No index is used: for example, this field does not have an index; Index unavailable due to field operations, function operations.

(2) The database selected the wrong index.

If you have a supplement, you can also add a wave in the message area.

Finally, promote my public number: helpless and painful code farmers: Stamp I can pay attention to, the article will be first in my public number, looking forward to the attention of all heroes exchange.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Tencent interview: a SQL statement execution is slow what are the reasons? — No regrets series

Start to pretend: classification discussion

For the occasional slow case

I have no choice but to refresh dirty pages in the database

What am I supposed to do without the lock

For being this slow all the time

I got it. I didn’t use the index

Ha ha, the database itself chose the wrong index

conclusion

Tencent interview: a SQL statement execution is slow what are the reasons? — No regrets series

Start to pretend: classification discussion

For the occasional slow case

I have no choice but to refresh dirty pages in the database

What am I supposed to do without the lock

For being this slow all the time

I got it. I didn’t use the index

Ha ha, the database itself chose the wrong index

conclusion

Related Posts

How does a Web page appear when a URL is entered into the address bar of Web browsing? (a)

Java Concurrent Programming — initial JUC

Redis master/Slave Architecture Deep digging configuration files (2)