Tag: article of public account
I do not know when the Internet began to circulate such a saying:
MySQL > select ‘WHERE’ from ‘IS NULL’; = index query cannot be used, only full table scan can be used.
This argument has become so strong that it is even regarded as the truth by many students. We don’t say anything. For example. Suppose we have a table s1 with the following structure:
CREATE TABLE s1 (
id INT NOT NULL AUTO_INCREMENT,
key1 VARCHAR(100),
key2 VARCHAR(100),
key3 VARCHAR(100),
key_part1 VARCHAR(100),
key_part2 VARCHAR(100),
key_part3 VARCHAR(100),
common_field VARCHAR(100),
PRIMARY KEY (id),
KEY idx_key1 (key1),
KEY idx_key2 (key2),
KEY idx_key3 (key3),
KEY idx_key_part(key_part1, key_part2, key_part3)
) Engine=InnoDB CHARSET=utf8;
Copy the code
There are 10,000 entries in this table:
mysql> SELECT COUNT(*) FROM s1;
+----------+
| COUNT(*) |
+----------+
| 10000 |
+----------+
1 row in set (0.00 sec)
Copy the code
Below we directly paste a few pictures:
SQL > select * from ‘WHERE’ WHERE ‘IS NOT NULL’; = these conditions, but as you can see from their execution plans, these statements use the corresponding secondary index to execute the query, instead of using a so-called full table scan. Of course, it’s not the goal of this article to debunk these rumors, but to take a closer look at how these queries are actually executed.
How are NULL values stored in records
In MySQL, each record has its own fixed format. Let’s use the Compact row format of InnoDB storage engine as an example to see how NULL values are stored. In Compact row format, a record consists of the following parts:
To keep the story going, let’s create a new table called record_format_demo:
CREATE TABLE record_format_demo (
c1 VARCHAR(10),
c2 VARCHAR(10) NOT NULL,
c3 CHAR(10),
c4 VARCHAR(10)
) CHARSET=ascii ROW_FORMAT=COMPACT;
Copy the code
Since our focus is on how NULL values are stored in the record, we will focus on the NULL values list in format, and the rest can be viewed in the booklet. The procedure for storing NULL values is as follows:
-
First, count the columns in the table that are allowed to store NULL.
As mentioned earlier, primary key columns and columns decorated with NOT NULL are NOT allowed to store NULL values, so these columns are NOT counted in the statistics. Record_format_demo (c1, C3, C4) is NULL. Record_format_demo (C2) is NOT NULL.
-
If there are no null-allowed columns in the table, then the NULL value list will not exist. Otherwise, each null-allowed column will have a binary bit. The binary bits are arranged in reverse order of the column.
- The binary bit value is
1
, represents the value of the columnNULL
. - The binary bit value is
0
, represents that the value of the column is notNULL
.
Since the record_format_demo table has three columns whose values are allowed to be NULL, the mapping between the three columns and the binary bits looks like this:
Again, the bits are arranged in reverse order of the columns, so the first column C1 corresponds to the last bit.
- The binary bit value is
-
The uncle who designed InnoDB specifies that the list of NULL values must be represented by integer byte bits. If the number of binary bits used is not an integer byte, 0 will be added at the top of the byte.
Record_format_demo only has 3 columns that are allowed to be NULL, corresponding to 3 binary bits, less than one byte, so fill the high byte with 0, and the effect is like this:
Similarly, if there are nine null-allowed entries in a table, then the NULL value list portion of that entry needs two bytes to represent.
Suppose we now insert a record into the record_format_demo table:
INSERT INTO record_format_demo(c1, c2, c3, c4)
VALUES('eeee'.'fff', NULL, NULL);
Copy the code
The values of c3 and c4 in c1, C3 and C4 are NULL, so the corresponding binary bits of the three columns are:
So the list of NULL values for this record is 0x06 in hexadecimal.
How are records with NULL keys stored in B+ trees
For the InnoDB storage engine, records are stored in pages (the default size of a page is 16KB), and these pages can be used as nodes of a B+ tree to form an index, something like this (just use the following figure for an example of a B+ tree, not related to the table above) :
Both clustered indexes and secondary indexes correspond to B+ trees like the one above, but:
-
For clustered index indexes, records on a page are sorted by primary key value; For secondary indexes, the records on the page are sorted by the value of a given index column.
-
For clustered indexes, nodes (pages) at each level of the B+ tree are sorted by the size of the primary key recorded in the page; For secondary indexes, each level of the B+ tree node (page) is sorted by the value of a given index column recorded in the page.
-
For clustered indexes, the page corresponding to the leaf node of the B+ tree stores the complete user record (i.e. a record containing all the column values defined by InnoDB, as well as some hidden columns added by InnoDB itself). For secondary indexes, only the index column value + primary key value is stored in the page corresponding to the leaf node of the B+ tree.
The primary key of a record is not allowed to store NULL values, so the WHERE clause must be FALSE:
SELECT * FROM tbl_name WHERE primary_key IS NULL;
Copy the code
A statement optimizer like this can determine by itself that the WHERE clause must be NULL, so it doesn’t execute at all. (The Extra information indicates that the WHERE clause is not valid at all.)
For secondary indexes, the value of the index column may be NULL. Where are secondary index records placed in the B+ tree if the index column value is NULL? The answer is: on the far left of the B+ tree. Let’s say we have the following query statement:
SELECT * FROM s1 WHERE key1 IS NULL;
Copy the code
Its query diagram looks like this:
InnoDB’s secondary index idx_KEY1 is placed at the left of the B+ tree. InnoDB’s secondary index idx_KEY1 is placed at the left of the B+ tree.
We define the SQL null to be the smallest possible value of a field.
That is, they consider a NULL value in SQL to be the smallest value in a column.
After quickly locating the leftmost record in the leaf node through the B+ tree corresponding to the secondary index IDx_KEY1, that is, the record with the ID value of 521 in this case, records can be obtained along the one-way linked list of records by following the next_record property that each record has. Until the key1 column of a record is not NULL.
Tip: The process of quickly locating records to leaf nodes through a B+ tree is done using a so-called Page Directory, but this is not the focus of this article. You can refer to the brochure for detailed explanation.
What is the basis for not using indexes?
IS NULL, IS NOT NULL,! When is a full table scan used?
The answer is simple: cost. Of course, we spend a lot of time in the small volume talking about how to quantify the cost of executing a query using an index. However, due to the limited space, we will only prepare a qualitative analysis here. There are two main cost components for queries using secondary indexes:
-
The cost of reading secondary index records
-
The cost of performing a back-table operation on a secondary index record, that is, the operation to find the complete user record in the clustered index.
Article clearly, to scan a secondary index record number, the more you need to perform the back to the table the more the number of operation, achieved a certain scale, the cost of using secondary index performs the query is more than the cost of a full table scan (for an extreme example, for example to scan all the secondary index record, that is about to implementation for each record again back to the table, Naturally not as fast as scanning the cluster index directly).
So the MySQL optimizer precalculates the number of secondary index records that need to be scanned for each possible index before actually executing the query. For example, for this query:
SELECT * FROM s1 WHERE key1 IS NULL;
Copy the code
Optimizer will analyze the query only need to look up key1 value is NULL, then visit the secondary indexes idx_key1, look at the value to NULL record how many (if eligible secondary index record number is less, then the statistical result is accurate, if too much, will use a certain method of calculating the value of a fuzzy, This is called an Index dive, in which the optimizer first accesses the index to calculate the number of index records to scan before the query is actually executed. Of course, for some queries, such as WHERE clause with IN condition and IN condition with many parameters, for example:
SELECT * FROM s1 WHERE key1 IN ('a'.'b'.'c'. .'zzzzzzz');
Copy the code
So there are too many key1 values that need to be counted, so that index dive cannot be used to actually access the secondary index idx_KEY1. Instead, you need to estimate the number of matched secondary index records using statistics previously generated in the background (obviously counting records based on statistics is much less accurate than index Dive).
Regardless of whether you use Index Dive or statistical estimation, you end up with a number of secondary index entries that need to be scanned. If the number of entries is too large as a percentage of the total number of entries, you tend to use the full table scan, otherwise you tend to use the index.
IS NULL, IS NOT NULL, IS NOT NULL! = These conditions can still use indexes, essentially the optimizer calculates the ratio of the corresponding number of secondary indexes to the total number of records.
Do not believe rumors, do not spread rumors
As you can see, the decision not to perform a query using an index in MySQL is based on a simple matter of cost. Instead of using IS NULL, IS NOT NULL,! = these conditions. Everyone later also more refute rumors, not so complicated, just a cost.
digression
Writing articles is tiring, and sometimes you feel that the reading is smooth, but it is actually the result of countless revisions behind. If you think it is good, please help to forward it, thanks a million ~ here is my public account “we are all small frog”, there are more technical dry goods, occasionally pull a calf, welcome to pay attention to: