MySQL B tree index concept, use, optimization and use scenarios

0 foreword

This article will not explain the basic knowledge of index, mainly about the MySQL database b-tree index related principles, some of the knowledge in the reference to MySQL Technology Insider book, is also a summary of these knowledge. For more information about b-trees and b-trees, check out this blog post: Throw this article at the interviewer who asks you about B-trees and B-trees

1 Index management

There are many types of indexes: normal indexes, unique indexes, primary key indexes, composite indexes, and full-text indexes. Let’s look at how to create and drop these types of indexes.

1.1 Creation methods of Indexes

Index creation can be done in a number of situations.

  • Create index directly
CREATE [UNIQUE|FULLLTEXT] INDEX index_name ON table_name(column_name(length))
Copy the code

[UNIQUE | FULLLTEXT] : it means the choice of index type, the only index or a full-text index, without words is common indexes. Table_name: the name of the table to which the index is to be added. Column_name (length) : Column_name is the column name of the table, and length indicates that the index is added to the previous length row of the column.

  • Add indexes by modifying the table structure
ALTER TABLE table_name ADD [UNIQUE|FULLLTEXT] INDEX index_name (column(length))
Copy the code
  • Create an index when you create a table
CREATE TABLE `table` (
    `id` int(11) NOT NULL AUTO_INCREMENT ,
    `title` char(255) CHARACTER NOT NULL ,
    PRIMARY KEY (`id`),
    [UNIQUE|FULLLTEXT] INDEX index_name (title(length))
)
Copy the code

1.2 Primary key index and composite index creation method

All the above mentioned are normal index, unique index and full-text index creation method, but, the primary key index and composite index creation method is a little different, so take out separately to talk about.

Composite index creation mode

  • Create an index when you create a table
CREATE TABLE `table` (
    `id` int(11) NOT NULL AUTO_INCREMENT ,
    `title` char(255) CHARACTER NOT NULL ,
    PRIMARY KEY (`id`),
    INDEX index_name(id,title)
)
Copy the code
  • Add indexes by modifying the table structure
ALTER TABLE table_name ADD INDEX name_city_age (name,city,age); 
Copy the code

Primary key Index Creation Method A primary key index is a special unique index. Each table can have only one primary key and no empty value is allowed. The primary key index is typically created while the table is being built.

CREATE TABLE `table` (
    `id` int(11) NOT NULL AUTO_INCREMENT ,
    `title` char(255) CHARACTER NOT NULL ,
    PRIMARY KEY (`id`)
)
Copy the code

1.3 Deleting an Index

An INDEX can be dropped using the ALTER TABLE or DROP INDEX statement. Similar to the CREATE INDEX statement, DROP INDEX can be handled as a statement inside ALTER TABLE. The syntax is as follows.

(1)DROP INDEX index_name ON talbe_name (2)ALTER TABLE table_name DROP INDEX index_name (3)ALTER TABLE table_name DROP PRIMARY KEY

The third statement is used only when the PRIMARY KEY index is dropped, because a table can have only one PRIMARY KEY index, so you do not need to specify the index name.

1.4 Index Instance

The above talked about the basic knowledge, next, or through a specific example to experience.

  • Step1: create the table
 create table table_index(
    id int(11) not null auto_increment,
    title char(255) not null,
    primary key(id)
);
Copy the code
  • Step2: Add an index

First, we add a plain index using the direct add index method.

CREATE INDEX idx_a ON table_index(title);
Copy the code

Next, we add the index when we modify the table structure.

ALTER TABLE table_index ADD UNIQUE INDEX idx_b (title(100));
Copy the code

Finally, we add another composite index.

 ALTER TABLE table_index ADD INDEX idx_id_title (id,title);
Copy the code

Now that we’ve used all the previous indexing methods, I’m sure you’re familiar with them.

  • Step3:SHOW INDEXCommand to view index information

If you want to view the INDEX information in a table, you can use the SHOW INDEX command.

 SHOW INDEX FROM table_index\G;
Copy the code

Get the information up here. What does the information up here mean? Let’s go one by one!

field explain
Table The table where the index resides
Non_unique Non-unique index. If 0, it is unique, that is, 0 if the column index does not contain duplicate values and 1 otherwise
Key_name The name of the index, or PRIMARY if it is the PRIMARY key
Seq_in_index The position of the column in the index, starting at 1, in the order in which the fields were indexed if it is a composite index
Collation How columns are stored in an index. Can be A or NULL, B tree index is always A, sorted,
Sub_part Whether part of a column is indexed, 100 if only the first 100 rows, NULL if the entire column
Packed Whether the keyword is compressed. If not, NULL
Index_type Index type, for InnoDB only supports B tree index, so all display BTREE
  • Step4: delete the index

Delete the index directly

DROP INDEX idx_a ON table_index;
Copy the code

Drop indexes when modifying table structures

ALTER TABLE table_index DROP INDEX idx_b;
Copy the code

1.5 Cardinality Keyword parsing

The Cardinality keyword is so critical that the optimizer uses this value to determine whether to use the index. In a B-tree index, only highly selective fields are meaningful. High selectivity means that the field has a wide range of values, such as the name field, which has many names and is highly selective.

In general, you can use the Cardinality keyword to determine whether an index is needed. If it is very close to 1, it is necessary to use the index. If it is very small, it is necessary to use the index.

Note that this keyword is not up-to-date. To update the ANALYZE TABLE, use the ANALYZE TABLE, for example.

analyze table table_index;
Copy the code

Because there’s no data, you can see that this value is always 0, it doesn’t change.

InoDB storage engine Cardinality strategy

In the InnoDB storage engine, updates to this keyword occur in two operations: INSERT and update. However, it is not always updated, which can increase the load, so there is a policy for updating this keyword:

  • In the table1/16Changes in the data
  • InnoDB stores counters for the enginestat_modified_conter> 2000000000

By default InnoDB storage engine samples 8 leaf nodes as follows:

  • The number of leaf nodes in the index of B tree is denoted asA
  • randomGet the index of the B tree8One leaf node. Count the number of different records on each page, respectively p1-P8
  • Estimated Cardinality based on sampling information:(p1 p2 p3 ... p8)*A/8

Because of random sampling, the Cardinality value is different every time, and only one case is the same, that is, the leaves in the table are less than or equal to 8, in which case, the random sampling is all 8, so it is the same.

1.6 Fast Index Creation

Before MySQL 5.5, adding or deleting an index required creating a temporary table each time, importing data into the temporary table, and then deleting the original table. If a large table did this operation, it would be very time-consuming, which was a big disadvantage.

InnoDB storage engine has added a Fast Index Creation method since version 1.0.x.

The strategy of this approach is to add an S lock (shared lock) to each indexed table. At creation time, there is no need to rebuild the table. Deleting the secondary index only requires updating the internal view and marking the secondary index space as available, so this efficiency is greatly improved.

1.7 Online Data Definition

Online data definition operations were initially supported in MySQL5.6, allowing secondary index creation along with other DM operations such as INSERT, UPDATE, and DELETE, which greatly improved database availability.

So, we can use the new syntax to create an index:

ALTER TABLE table_name ADD [UNIQUE|FULLLTEXT] INDEX index_name (column(length))
[ALGORITHM = {DEFAULT|INPLACE|COPY}]
[LOCK = {DEFAULT|NONE|SHARED|EXLUSIVE}]
Copy the code

ALGORITHM Specifies the ALGORITHM to create or delete an index

  • COPY: method of creating temporary tables
  • INPLACE: No temporary tables need to be created
  • DEFAULT: Based on parametersold_alter_tableParameter check if yesOFF, the use ofINPLACEThe way of

LOCK represents the condition in which a LOCK is added to the table

  • NONE: No lock is added
  • SHARE: add an S lock, concurrent read can take place, write operations need to wait
  • EXCLUSIVE: Add an X lock to prevent concurrent read and write operations
  • DEFAULT: Check whether it can be usedNONE, if not, check whether it can be usedSHARE, if not, judge whether it can be usedEXCLUSIVEMode.

2. Use of b-tree indexes

2.1 Joint Index

A joint index is an index of multiple columns in a table. In this part, we will use several examples to explain the relevant knowledge points of a joint index.

First, we create a table and a federated index for that table.

create table t_index(
a char(2) not null default ' ',
b char(2) not null default ' ',
c char(2) not null default ' ',
d char(2) not null default ' '
)engine myisam charset utf8;
Copy the code

Create a federated index

alter table t_index add index abcd(a,b,c,d);
Copy the code

Insert some test data

insert into t_index values('a'.'b'.'c'.'d'),
('a2'.'b2'.'c2'.'d2'),
('a3'.'b3'.'c3'.'d3'),
('a4'.'b4'.'c4'.'d4'),
('a5'.'b5'.'c5'.'d5'),
('a6'.'b6'.'c6'.'d6');
Copy the code

At this point, we have basically prepared the data we need, and we can further explore the joint index.

When do we need to create a federated index

Index to establish the main purpose is to improve the efficiency of the query, so the purpose of the joint index is also similar, the purpose of the joint index is to improve the efficiency of the presence of multiple queries condition, established as the above table, there are multiple fields, when we need to use multiple fields for a query, we will need to use the combined index.

When will a joint index come into play

Sometimes we use federated indexes, but we don’t really know how they work, when they work and when they don’t.

With that in mind, let’s look at the leftmost matching principle for federated indexes.

Left-most matching principle: This principle means that the composite index is created with the leftmost column as the criterion, and the query will use the index as long as the leftmost column is in the criterion.

Let’s look at this principle with a few examples.

EXPLAIN SELECT * FROM t_index WHERE a = 'a' \G;
Copy the code

Let’s look at the result of this statement. First, we see that the index is used, because the query has the leftmost column A in it. How many indexes are used? We need to look at key_len, we know that utF8 encodes a character of 3 bytes, and we use the data type char(2), which is 2 bytes, and the index is 2*3 = 6 bytes, so only one index works.

EXPLAIN SELECT * FROM t_index WHERE b = 'b2' \G;
Copy the code

As can be seen from this statement, possible_keys is not used because it is empty. In addition, rows can be seen from the number of queried rows is 6 (there are 6 rows in total in our test data), indicating that it has been completely scanned, indicating that this situation does not comply with the left-most matching principle, so index query will not be used.

EXPLAIN SELECT * FROM t_index WHERE a = 'a2' AND b = 'b2' ORDER BY d \G;
Copy the code

Select * from Extra where key_len = 12 and select * from Extra where len = 12 and select * from Extra where len = 12 and select * from Extra where len = 12 and select * from Extra where len = 12 and select * from Extra where len = 12 and select * from Extra where len = 12 and select * from Extra where len = 12. The above query using the index of a and b, but when we use d field to sort (a, d) or (b, d) the two indexes are not sorted, the use of combined index has a benefit, is the index of the next field is automatically sort, here in this case, the c field is sort of, but d is not, If we sort by c we get a different answer.

EXPLAIN SELECT * FROM t_index WHERE a = 'a2' AND b = 'b2' ORDER BY c \G;
Copy the code

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

At this point, I believe that through the above several examples, the knowledge of the joint index has been very thorough and clear. Finally, let’s talk about some common questions.

Q1: Why not create an index for each column in the table

First, creating and maintaining indexes takes time, which increases with the volume of data. Second, the index needs to occupy physical space, in addition to the data table occupies data space, each index also needs to occupy a certain physical space, if the establishment of clustered index, so the space will be larger. Third, indexes are maintained dynamically when adding, deleting, or modifying data in a table, which slows down data maintenance.

Q2: Why do you need to use federated indexes

Reduce expenses. Create a joint index (col1,col2,col3), which is equivalent to (col1),(col1,col2),(col1, col3). Each additional index increases the overhead of write operations and disk space. For tables with a lot of data, using a federated index can greatly reduce overhead!

Overwrite the index. Select col1,col2,col3 from test where col1=1 and col2=2. MySQL can then retrieve data directly by traversing the index without returning to the table, which reduces a lot of random I/O operations. Reducing IO operations, especially random IO, is a major optimization strategy for DBAs. Therefore, in real application, overwriting index is one of the main optimization methods to improve performance.

High efficiency. The more indexed columns, the less data is filtered through the index. Select * from table where col1=1 and COL2 =2 and COL3 =3 select * from table where col1=1 and COL2 =2 Then the index can filter 1000W10%= 100W data, and then go back to the table from 100W data to find col2=2 and COL3 = 3 data, and then sort, and then paging; If it is a joint index, through the index screening 1000W10% 10% *10%= 1W, efficiency improvement can be imagined!

Overwrite index An overwrite index is a record that can be queried from a secondary index, rather than from a clustered index. One advantage of using an overwrite index is that the secondary index does not contain all the information for the entire row, so it is much smaller than the clustered index, and therefore, IO operations can be greatly reduced. Another benefit of overwriting indexes is that they are optimized for statistical problems, as shown in the following example.

explain select count(*) from t_index \G;
Copy the code

Select Tables Optimized Away (myISam) Select Tables Optimized Away (myISam)

If the InnoDB engine is used, the Extra column outputs a Using index statement indicating that the InnoDB engine optimizer uses an overwrite index operation.

2.2 Index Prompt

The MySQL database supports INDEX hints, which can be displayed to tell the optimizer which INDEX to use. There are two ways to use INDEX hints:

  • The MySQL database optimizer selected an index incorrectly, causing SQL to run slowly
  • When an SQL statement has a large number of indexes to choose from, the optimizer’s selection of scheduled execution time may be more expensive than the SQL statement itself.

Select * from t_index; select * from t_index;

alter table t_index add index a (a);
alter table t_index add index b (b);
alter table t_index add index c (c);
Copy the code

Next, we execute the following statement;

EXPLAIN SELECT * FROM t_index WHERE a = 'a' AND b = 'b' AND c = 'c' \G;
Copy the code

You can see that this statement can use three indexes. At this point, we can display the use index prompt to use a, as follows:

EXPLAIN SELECT * FROM t_index USE INDEX(a) WHERE a = 'a' AND b = 'b' AND c = 'c' \G;
Copy the code

If the optimizer still does not select the INDEX you want, you can use another method: FORCE INDEX.

EXPLAIN SELECT * FROM t_index FORCE INDEX(a) WHERE a = 'a' AND b = 'b' AND c = 'c' \G;
Copy the code

This way, you must select the index you want.

2.3 Index Optimization

Multi – Range Read optimization

MySQL5.6 is now supported. This optimization is intended to reduce random access to disks and convert random access to more sequential data access. This optimization applies to queries of the range, REF, and Eq_REF types.

Multi-range Read optimization benefits:

  • Make data access sequential.
  • Reduces the number of times pages in the buffer are replaced.
  • Batch processing of key query operations.

We can use the flag in the optimizer_switch parameter to control whether multi-range Read optimization is turned on. The following way will set the state to always on:

SET @@optimizer_switch='mrr=on,mrr_cost_based=off';
Copy the code
Index Condition Pushdown (ICP) optimization

This optimization has been supported since MySQL5.6. Before this optimization, when performing index queries, we first looked up records by index and then filtered records by WHERE criteria. However, when ICP optimization is supported, the MySQL database will judge whether the WHERE condition filtering can be carried out when the index is retrieved, that is, the WHERE filtering part is placed in the storage engine layer, which greatly reduces the record demand of the upper SQL.

ICP supports range, REF, Eq_ref, ref_OR_NULL queries and currently supports MyISAM and InnoDB storage engines.

We can turn ICP on with the following statement:

set @@optimizer_switch = "index_condition_pushdown=on"
Copy the code

Or close:

set @@optimizer_switch = "index_condition_pushdown=off"
Copy the code

When ICP is enabled, you can see the Using Index Condition prompt in plan Extra.

3 Index features, advantages, disadvantages, and application scenarios

Index features

  • Can speed up the database retrieval speed
  • Reduce the maintenance speed of database insertion, modification, and deletion
  • Can only be created on tables, not views
  • It can be created either directly or indirectly

Advantages of indexes

  • Create unique indexes to ensure the uniqueness of each row in a database table
  • Greatly speed up the retrieval of data
  • Speeding up joins between database tables is of particular interest, especially in achieving referential integrity of data
  • When grouping and sorting terms are used for data retrieval, the query time can also be significantly reduced
  • By using indexes, you can use optimization hiders in queries to improve system performance

Disadvantages of indexes

  • First, creating and maintaining indexes takes time, which increases with the volume of data.
  • Second, the index needs to occupy physical space, in addition to the data table occupies data space, each index also needs to occupy a certain physical space, if the establishment of clustered index, so the space will be larger.
  • Third, indexes are maintained dynamically when adding, deleting, or modifying data in a table, which slows down data maintenance.

Application scenarios of indexes

  • Match the whole value

Specifying a specific value for all columns in the index is a condition that all columns in the index have an equivalent match.

  • Range query for matched values

Range lookups can be performed on the value of an index.

  • Matches the leftmost prefix

Using only the leftmost column of the index, for example, a union index on col1 col2 col3 can be used by an equivalent query containing col1, (col1 col2), (col1 col2 col3), But cannot be used by col2, (COL2, COL3) equivalent query. The left-most matching rule is the first rule used in MySQL b-tree indexes.

  • Only indexes are queried

Queries are more efficient when the columns in the query are in the fields of the index, so you should avoid using SELECT * and only look up the fields that you need.

  • Matching column prefix

Lookup using only the first column in the index and containing only the beginning of the first column of the index.

  • The ability to achieve accurate index matching in part and range matching in other parts
  • If the column name is an index, then using column_name is null would use the index, as in the following:
explain select * from t_index where a is null \G
Copy the code
  • Fields that often appear after the keywords ORDER by, group by, distinct
  • Result set field for operations on sets such as union
  • Often used as a table join field
  • Consider using index overwrites, where the data is rarely updated. If users frequently query your fields, consider creating indexes on those fields to change a table scan to an index scan

Index failure condition

  • A like query starting with % cannot use the B-tree index. If the key value in the execution plan is null, no index is used
  • Indexes are not used for implicit conversions of data types, for example,where 'age' 10=30
  • Perform a function on an indexed column for the same reason
  • Regular expressions do not use indexes
  • Strings and data comparisons do not use indexes
  • In the case of a compound index, if the query condition does not contain the leftmost part of the index column, that is, the leftmost principle is not met, the compound index is not used
  • If MySQL estimates that using an index is slower than a full table scan, do not use the index
  • A condition split with or. If the column in the condition before or has an index and the column behind it has no index, the index involved will not be used
  • Use negative queries (not, not in, not like, <>,! =,! >,! <) does not use indexes

Refer to the article

  • www.yuanrengu.com/index.php/2…
  • Segmentfault.com/a/119000000…
  • www.jb51.net/article/157…
  • www.cnblogs.com/xiaoxi/p/82…
  • MySQL Tech Insider

If there are any improper articles, please correct them. If you like reading on wechat, you can also follow my wechat official account: Learn Java well and get quality learning resources.