MySQL index
Large companies generally hold very large volumes of data, and without indexes queries against that data become seriously slow.
Have you ever found yourself creating an index that you don’t use?
Do you know what an index is, and what data structure it uses?
Is there a difference between a clustered index and a non-clustered index when querying data?
Anyone who has interviewed at large companies such as Alibaba or JD.com (Jingdong) will have noticed that they place great weight on this kind of underlying knowledge, which most of us seldom stop to think about.
Today, we will talk about MySQL indexes.
What is an index
An index is a data structure that helps us quickly retrieve data in a database. MySQL mainly uses two index structures: the hash index and the B+ tree index. This article uses the InnoDB engine, whose default index structure is the B+ tree.
B+ Tree index and Hash index
A hash index is backed by a hash table, which stores data as key-value pairs, so entries have no ordering relationship in storage. Hash indexes are therefore good for equality queries but cannot serve range queries, cannot be used for sorting, and do not support the leftmost matching rule of multi-column joint indexes. A B+ tree is a multiway balanced search tree whose nodes are naturally ordered (keys in a left child are smaller than the parent key, which in turn is smaller than keys in the right child), so a range query can walk the ordered keys instead of performing a full table scan.
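To make the contrast concrete, here is a minimal sketch. The table and index names are made up for illustration, and the MEMORY engine is used only because it accepts both USING HASH and USING BTREE; its BTREE index stands in here for an ordered index and is not InnoDB's B+ tree.

```sql
-- Hypothetical table: one hash-indexed column, one ordered-index column.
CREATE TABLE demo_hash_vs_btree (
  id    INT NOT NULL,
  score INT NOT NULL,
  INDEX idx_id_hash (id) USING HASH,
  INDEX idx_score_btree (score) USING BTREE
) ENGINE=MEMORY;

INSERT INTO demo_hash_vs_btree (id, score) VALUES (1, 50), (2, 70), (3, 90);

-- Equality lookup: the hash index can go straight to the bucket for id = 2.
EXPLAIN SELECT * FROM demo_hash_vs_btree WHERE id = 2;

-- Range on the hash-indexed column: hashed keys carry no order,
-- so idx_id_hash cannot narrow this range.
EXPLAIN SELECT * FROM demo_hash_vs_btree WHERE id BETWEEN 1 AND 2;

-- Range on the ordered index: the keys are sorted, so the range can be walked.
EXPLAIN SELECT * FROM demo_hash_vs_btree WHERE score BETWEEN 60 AND 100;
```

With only three rows the optimizer may still prefer a scan, so treat the EXPLAIN output as something to inspect rather than a guaranteed result.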
Clustered index and non-clustered index
The two most common storage engines in MySQL are MyISAM and InnoDB, which use non-clustered and clustered indexes respectively.
In InnoDB, the leaf nodes of the primary key index's B+ tree store the entire row; this index is the clustered index. The leaf nodes of a non-primary-key index's B+ tree store only the primary key value; such an index is a non-clustered index.
Do you know why InnoDB non-primary-key indexes are generally slower than the primary key index? Because InnoDB uses a clustered index, a lookup by primary key searches one B+ tree, while a lookup through a non-primary-key index generally searches two.
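The one-search versus two-search difference can be sketched as follows. The table demo_user, its columns and the index idx_name are assumptions made up for this example.

```sql
-- Hypothetical InnoDB table: id is the clustered (primary key) index,
-- idx_name is a secondary index whose leaves hold (name, id).
CREATE TABLE demo_user (
  id   INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(32) NOT NULL,
  city VARCHAR(32) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_name (name)
) ENGINE=InnoDB;

INSERT INTO demo_user (name, city) VALUES ('ann', 'Hangzhou'), ('bob', 'Beijing');

-- One search: the clustered index leaf already holds the whole row.
SELECT * FROM demo_user WHERE id = 1;

-- Two searches: idx_name yields the matching id, then the clustered
-- index is searched by that id to fetch city.
SELECT * FROM demo_user WHERE name = 'bob';
```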
Why should a non-primary key index be queried twice? Let’s see what happens next.
Primary and secondary indexes
First, the basic concepts. Indexes can be divided into primary and secondary indexes according to whether the index key is the primary key. The index built on the primary key is the primary index; every other index is a secondary index. A table can therefore have only one primary index but many secondary indexes.
Why do we need secondary indexes? A query can use an index only if its conditions satisfy the leftmost matching rule for that index. Queries often do not filter on the primary key column, so we need to create indexes on other columns; these are the secondary indexes.
Non-clustered index
With non-clustered indexes, the primary index and the secondary indexes are almost the same, except that the primary index allows neither duplicate nor NULL values. In both, the leaf nodes store the physical address of the row that corresponds to the key.
In a table with non-clustered indexes, the data is stored separately from the indexes, in the order in which rows were inserted; the insertion order is not affected by key values. Non-clustered indexes are therefore better suited to single-row lookups.
Clustered index
In a clustered layout, the leaf nodes of the primary index store the row data for the key, while the leaf nodes of a secondary index store the primary key value of the matching row. Because every secondary index entry carries a copy of the primary key, the primary key should be as short as possible and its type as simple as possible.
The data for the clustered index is stored together with the primary key index.
The data in a clustered index is stored in primary key order, so it suits range searches on the primary key, which need less disk I/O and run faster. For the same reason, rows are best inserted in primary key order; otherwise inserts frequently cause page splits (an operation performed during B+ tree insertion), which seriously hurts performance.
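The two insert patterns can be sketched like this. Both tables are hypothetical, and three rows will not split a page by themselves; the point is only the order in which keys arrive.

```sql
-- Keys arrive in increasing order, so each new row is appended at the
-- right-hand edge of the clustered index.
CREATE TABLE demo_seq_pk (
  id      BIGINT NOT NULL AUTO_INCREMENT,
  payload VARCHAR(100) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO demo_seq_pk (payload) VALUES ('a'), ('b'), ('c');

-- Keys arrive out of order, so later rows must be placed between existing
-- ones; at scale this is the pattern that leads to page splits.
CREATE TABLE demo_unordered_pk (
  id      BIGINT NOT NULL,
  payload VARCHAR(100) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO demo_unordered_pk (id, payload) VALUES (500, 'a');
INSERT INTO demo_unordered_pk (id, payload) VALUES (10, 'b');
INSERT INTO demo_unordered_pk (id, payload) VALUES (250, 'c');
```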
In InnoDB, if a query only needs the indexed columns, try not to select other columns. This improves query efficiency.
[Figure: the difference between clustered and non-clustered indexes]
Many readers assume the following: because the leaf nodes of a non-primary-key index hold primary key values, a query must first find the primary key value and then query again with it. On this view, a primary key index is searched only once, whereas a non-primary-key index always needs more than one search.
This is not always the case. A covering index means that a query can be answered from the index alone, without reading the table. When a query meets the conditions for a covering index, MySQL returns the required data from the index only, which avoids going back to the table after the index lookup, reduces I/O, and improves efficiency. For example, the table covering_index_SAMPLE has an ordinary index IDx_KEY1_KEY2 (key1, key2). The query select key2 from covering_index_SAMPLE where key1 = 'keytest'; can be answered from the covering index without going back to the table.
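Here is a minimal sketch of that example. The table and index names come from the text; the column types, the id column and the remark column are assumptions added so the statements can run.

```sql
-- Column types, id and remark are hypothetical additions.
CREATE TABLE covering_index_SAMPLE (
  id     INT NOT NULL AUTO_INCREMENT,
  key1   VARCHAR(32) NOT NULL,
  key2   VARCHAR(32) NOT NULL,
  remark VARCHAR(32) NOT NULL,
  PRIMARY KEY (id),
  KEY IDx_KEY1_KEY2 (key1, key2)
) ENGINE=InnoDB;

-- Covered: key1 and key2 are both stored in the index, so the Extra column
-- of EXPLAIN typically reports "Using index".
EXPLAIN SELECT key2 FROM covering_index_SAMPLE WHERE key1 = 'keytest';

-- Not covered: remark lives only in the clustered index, so each match
-- needs a lookup back to the table.
EXPLAIN SELECT key2, remark FROM covering_index_SAMPLE WHERE key1 = 'keytest';
```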
Here is a more detailed description of covering indexes and joint indexes.
Covering index
If the data a query needs can be read directly from a secondary index, without accessing the clustered index (the auto-increment primary key index), the query uses a covering index.
This is extremely efficient.
select a from xxx where b = 2;
For the statement above, if only column b is indexed, the SQL can use that index, but because the values of column a are not in the index on b, MySQL has to go back to the clustered index to fetch them.
If you create a joint index such as (b, a), the index contains the values of both columns, so the query can be answered by the covering index.
Joint index
A joint index is an index over multiple columns; it exists to improve query performance.
CREATE TABLE `xxxx` (
  `id` int(10) NOT NULL AUTO_INCREMENT COMMENT 'on the',
  `code` int(10),
  `age` int(10),
  PRIMARY KEY (`id`),
  KEY `code` (`code`)
) ENGINE=InnoDB;
Some readers may wonder why a joint index is needed when a single-column index would do. As the earlier example shows, a single-column index is still used, but the query often has to go back to the clustered index, which costs performance, and operations such as sorting make it slower still. Here is another example, a paging query:
select id, code from xxxx order by age limit 50000, 10;
This query is inefficient if only the age column is indexed, but with a joint index on (age, code) it can be answered without going back to the clustered index, because an InnoDB secondary index also stores the primary key id.
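As a rough sketch of that joint index, the statements below recreate the xxxx table in simplified form and add the index. The index name idx_age_code is made up for this example.

```sql
-- Simplified copy of the xxxx table from the article.
CREATE TABLE IF NOT EXISTS `xxxx` (
  `id`   INT NOT NULL AUTO_INCREMENT,
  `code` INT,
  `age`  INT,
  PRIMARY KEY (`id`),
  KEY `code` (`code`)
) ENGINE=InnoDB;

-- Hypothetical index name. The index leaves hold (age, code) plus the
-- primary key id, which is everything the paging query selects.
ALTER TABLE `xxxx` ADD INDEX `idx_age_code` (`age`, `code`);

-- The index is already sorted by age, and no lookup back to the clustered
-- index is needed; Extra typically reports "Using index".
EXPLAIN SELECT id, code FROM xxxx ORDER BY age LIMIT 50000, 10;
```

Even with a covering index, the server still has to step over the first 50000 index entries, so a deep offset is not free.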
Do not index columns with simple enumerated values
A secondary index on a column that holds only the values 0 and 1 is unnecessary, because the values are barely distinguishable. Looking up 0, for example, still pulls a large share of the rows out of the B+ tree, which performs poorly.
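One way to judge how distinguishable a column is before indexing it is to compare its number of distinct values with the row count. The table demo_orders and its columns are assumptions for illustration.

```sql
-- Hypothetical table: is_paid holds only 0 or 1, order_no is unique per row.
CREATE TABLE demo_orders (
  id       INT NOT NULL AUTO_INCREMENT,
  is_paid  TINYINT NOT NULL,
  order_no VARCHAR(32) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO demo_orders (is_paid, order_no) VALUES
  (0, 'A001'), (1, 'A002'), (0, 'A003'), (1, 'A004');

-- Ratio of distinct values to rows: close to 1 means nearly every row is
-- distinct (a good index candidate); on a large table a 0/1 column would
-- score close to 0.
SELECT
  COUNT(DISTINCT is_paid)  / COUNT(*) AS is_paid_selectivity,
  COUNT(DISTINCT order_no) / COUNT(*) AS order_no_selectivity
FROM demo_orders;
```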
Do not use indexed columns in calculations
If a condition computes on the indexed column, every value in the tree has to be calculated first and then compared, which is expensive and extremely slow. Likewise, when a function is applied to the indexed column, MySQL cannot use an ordinary index on that column to locate rows.
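A small sketch of the difference, assuming a hypothetical table demo_calc with an ordinary index on age:

```sql
-- Hypothetical table with an ordinary index on age.
CREATE TABLE demo_calc (
  id   INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(32) NOT NULL,
  age  INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_age (age)
) ENGINE=InnoDB;

-- The column is wrapped in an expression, so idx_age cannot be searched
-- by value; every row has to be evaluated.
EXPLAIN SELECT * FROM demo_calc WHERE age + 1 = 20;

-- Same condition with the arithmetic moved to the constant side,
-- which lets idx_age be searched directly.
EXPLAIN SELECT * FROM demo_calc WHERE age = 19;
```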
Expand indexes if you can, and try not to create new ones
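If the table already has an index on (a) and now needs one on (a, b), extend the existing index to (a, b) instead of creating a new one: by the leftmost matching rule, (a, b) still serves queries that filter on a alone. The reverse does not hold in general. A query that filters only on b, such as b = 11 or b in (11, 22), does not match the leftmost column of (a, b), so check its plan with EXPLAIN before relying on that index.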
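As a rough sketch of the leftmost matching rule behind this advice, assume a hypothetical table demo_ab that starts with an index on (a), which is then extended to (a, b). All names here are made up for illustration.

```sql
-- Hypothetical table that starts with a single-column index on a.
CREATE TABLE demo_ab (
  id INT NOT NULL AUTO_INCREMENT,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_a (a)
) ENGINE=InnoDB;

-- Extend the existing index instead of adding a second one.
ALTER TABLE demo_ab DROP INDEX idx_a, ADD INDEX idx_a_b (a, b);

-- Both conditions start from the leftmost column a, so idx_a_b applies.
EXPLAIN SELECT * FROM demo_ab WHERE a = 1;
EXPLAIN SELECT * FROM demo_ab WHERE a = 1 AND b IN (11, 22);

-- This condition skips a, so it does not match the leftmost prefix
-- of idx_a_b.
EXPLAIN SELECT * FROM demo_ab WHERE b = 11;
```

What the optimizer does with the last query can vary by MySQL version and data distribution, so the EXPLAIN output is the thing to trust.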
Advantages and disadvantages of indexes
Advantages: fast retrieval with fewer I/O operations; grouping and sorting on indexed columns are also faster.
Disadvantages: an index is itself a stored structure and takes up storage space. Creating and maintaining indexes costs time, and the cost grows with the volume of data. Indexes also slow down table modifications (delete, insert, update), because the index must be updated whenever the table is.
If you have any questions about the MySQL index, please feel free to comment or send me a message