preface
What does MySQL data look like on disk? In this article we talk about how data in MySQL is stored on disk. Those of you who are interested can review it.
However, there is still a bit of a tail to it: the most used joint index and the left-most prefix principle in the work.
define
An index consisting of multiple fields.
We’ve been talking about single-valued indexes, where you only have one field, and now you have multiple fields.
The SQL statement
Table name (1, 2…); table name (1, 2…); USING BTREE
CREATE TABLE t_employee (
id bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
create_time datetime NOT NULL,
update_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
name varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'name',
age int(10) UNSIGNED NOT NULL COMMENT 'age',
dept varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'department'.PRIMARY KEY (id) USING BTREE,
INDEX idx_name_age_dept(name, age, dept) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci COMMENT = 'Staff list' ROW_FORMAT = Compact;
Copy the code
To create a joint index on an existing table, execute the following statement: create index index name on table name (field name 1, field name 2…) ;
create index idx_name_age_dept on t_employee(name, age, dept);
Copy the code
If you want to create a unique index, use unique before index, otherwise it is normal.
UNIQUE INDEX uk_name_age_dept(name, age, dept) USING BTREE
Copy the code
create unique index uk_name_age_dept on t_employee(name, age, dept);
Copy the code
The form of storage on disk
Insert 10 records
As mentioned earlier, MySQL’s data structure on disk is stored as B+tree.
And all indexes except the primary key index are called secondary indexes: the data on the leaf node stores only the primary key value except the index itself.
The name field of the bottom leaf node is orderly, increasing from left to right.
Globally, the AGE and DEPT fields are unordered;
But locally, when the first field name is equal, the value of the age field is sorted. If age is still equal, sort by the third field.
Underlying lookup process
The first field (name) is compared, the second field (age) is compared, and the third field (dept) is compared.
In the data page directory (data node), the binary method is used to quickly locate the data, and then the specific data item can be found by traversing the records in the corresponding interval.
After the data is found, the primary key value (1) stored on the leaf node is returned to the table to find the complete record in the clustered index tree and read it out.
Left-most prefix rule
The leftmost prefix principle is the way data is searched in the associative index B+tree.
If the leftmost index column is not used, the index is invalid, but if it is used, it may be valid.
Idx_name_age_dept (name, age, dept)
statements | The effect | instructions |
---|---|---|
select * from t_employee | ❌ | No query conditions, direct full table lookup |
~ where name = ‘HanMeimei’ | ✔ | The index uses the name field |
~ where name = ‘HanMeimei’ and age = 18 | ✔ | The index uses the name and age fields |
~ where age = 18 and name = ‘HanMeimei’ | ✔ | The index uses the name and age fields. The query optimizer analyzes the search criteria and decides which search criteria to use first and which to use later, based on the order of columns in the available indexes. |
~ where name = ‘HanMeimei’ and age < 18 | ✔ | The index uses the name and age fields |
~ where name = ‘HanMeimei’ and age = 18 and dept = ‘finance’ | ✔ | The index uses the name, age, and DEPT fields |
~ where name = ‘HanMeimei’ and dept = ‘finance’ | ✔ | The index uses only the name field; throughB+tree Select * from dept where name = ‘HanMeimei |
~ where age = 18 | ❌ | Query condition does not bring the leftmost name field, direct full table lookup |
~ where name < ‘WangWu’ and age = 18 | ✔ | The index uses only the name field; throughB+tree Find the node whose name < ‘WangWu’ and compare the age field from it |
~ where name like ‘%Han%’ | ❌ | On the far left is %. The exact value is ambiguous and cannot be foundB+tree Is used to compare values. |
~ where name like ‘Han%’ | ✔ | The index uses the name field; throughB+tree Find data where name starts with ‘Han’ |
~ where name = ‘HanMeimei’ or age = 22 | ❌ | SQL: index(name) index(age); |
~ where substring(name, 3) = ‘hhh’ | ❌ | Indexes are not used, and implicit conversions, calculations, or function operations on index columns are invalid |
advantages
Reduce index building overhead
For example, idx_name_AGe_DEPT (name, age, dept) creates 3 indexes at the same time
- idx(name)
- idx(name, age)
- idx(name, age, dept)
Cover index
An override index is a select column of data that can be retrieved only from the index, not from the rows.
That is, the field information we want to query can be directly obtained from the joint index tree structure, instead of the primary key value after the query table.
Select * from T_employee where XXX; select * from T_employee where XXX;
Select name, age, dept from T_employee where name =’HanMeimei’ and age = 18 select dept from T_employee where name =’HanMeimei’ and age = 18
The name, age, and DEPT fields can be retrieved directly from their own federated index structures.
This is an important means of SQL optimization, sometimes also artificially constructed overwrite index, this query method, is called overwrite index.
Effectively narrow down the screening field
Select * from T_employee where name = ‘HanMeimei’ and age = 18 and dept = ‘finance’ select * from T_employee where name = ‘HanMeimei’ and age = 18 and dept = ‘finance’ Suppose 10% of the data can be filtered out for each condition.
- When there is no joint index, only single value index, then filter 10% of the data through the single value index, and then query the table from the 100W data.
- When there is a joint index, it is much more efficient to select 10% × 10% × 10% data from the index, and then query the table from 1W data.
Sorting faster
SELECT * FROM T_employee WHERE name = ‘HanMeimei’ ORDER BY create_time;
Sort the results by create_time field.
Execute the plan using Explain to see the results
SELECT * FROM T_employee WHERE name = ‘HanMeimei’ ORDER BY age; SELECT * FROM T_employee WHERE name = ‘HanMeimei’ ORDER BY age;
Using index condition: The query uses the index, but the data needs to be retrieved from the table.
Using filesort: Use file sorting.
You can see that the latter sort by one step less than the formerfilesort
Operation.
Filesort sort is a way to sort data in memory or on disk after a query. Sometimes, if the result set is too large to be sorted in memory, it is possible to temporarily use disk space to store intermediate results, and then return the sorted result set to the client.
When you need to sort the fields in the index, because the index B+tree is already in order, eliminating file sorting will also improve efficiency.
Pay attention to
The order of the syndicated index columns
The order of the columns in the federated index is important
Index (name, age) is not the same as index(name) + index(age)
Index (name, age) is not the same as index(age, name)
The value of the indexed field is NULL
Read a lot of information, say what all have… Experiment is the sole criterion for testing truth
Now change the name and DEPT fields to null
And add an HHH field, the union index and the single-value index are set to unique index.
- insert
null
Data, see the effect 👇
As a result, null values can be inserted into both unique indexes.
-
Query null data
Use the Explain execution plan to verify that the index is valid when the query is executed.
- Validate federated index
uk_name_age_dept
The validity of
EXPLAIN SELECT * FROM t_employee WHERE ISNULL(name); Copy the code
Here you can see the use of the union index;
- Validate a single-valued index
idx_hhh
The validity of
EXPLAIN SELECT * FROM t_employee WHERE ISNULL(hhh); Copy the code
Here you can see that single-valued indexes are also used;
Now let’s make a change: set all the first 10 rows to NULL
Go through the plan again
Select idX_hhh, idx_hHH, idx_hHH, idx_hHH, idx_hHH Here’s why:
In MySQL, there is something called the query optimizer, which calculates statistics on the records in the table and then uses those statistics to calculate the number of records that need to be returned to the table based on the query criteria.
The more records need to be returned to the table, the more inclined to use full table scan;
Instead, the secondary index + back table approach is preferred.
Of course the optimizer does more analysis than that, but it’s basically the process.
In general, limiting the number of records retrieved by a query makes the optimizer more inclined to use the secondary index + back table approach, because the fewer records returned to the table, the higher the performance improvement.
Select * from HHH; select * from HHH; select * from HHH;
SQL > alter table HHH; alter table HHH; alter table HHH;
Actually,
MySQL
There are also explanations on the official websiteUNIQUE index allows columns that can contain NULL to use multiple NULL values.
MySQL can use indexes to search for NULL data.
But when we design the database table, it is better to have a field set to
NOT NULL
As for the habit ofNULL
Another article today will tell you the answer. - Validate federated index
The more indexes, the better
The index itself is also data, and also needs to be stored.
It takes time to create an index, and it takes time to maintain the index as the table inserts or updates data, both of which increase with the volume of data.
So when every field is indexed, it’s not necessary.
The resources
- How MySQL Works: Understanding MySQL from the Root
- www.jb51.net/article/157…
- Dev.mysql.com/doc/refman/…
- Dev.mysql.com/doc/refman/…
The last
Index is a good thing, but do not greedy ah, to consider the big picture ~ the author’s ability is limited, if the wrong place to write welcome in the comments section to communicate.
If this article helped you, please like 👍.
Share technology, hold on, we can win 💪!