Learn more about Mysql indexes

Please follow my wechat official account [Mflyyou] for continuous updates.

Github.com/zhangpanqin… Collect technical articles and my series of articles, welcome to Star.

preface

The indexes in the database are intended to improve query efficiency and will be like the contents of a dictionary.

Once we understand how indexing works, there is no need to memorize the so-called Mysql catch-all.

In this paper, the content

Index type: UNIQUE, FULLTEXT, SPATIAL, NORMAL
Why the index is a B+ tree structure, why not a binary tree, B- tree
Mysql > select B+ tree index and Hash index
Why should index use followLeftmost matching principle
Joint index,Clustering index 和 Cover indexWhat are they
What are the criteria for index addition

The index

Mysql > create index ();

Normal index
The only index
The full text indexing
Spatial index

Mysql > create index ();

B+Tree, storage engineInnoDB 和 MyISAMAll support. Because we usually use storage enginesInnoDB 和 MyISAMWe all useB+TreeIndex of data structures.
HASH, storage engineMEMORYSupport, storage engineInnoDB 和 MyISAMYou cannot manually define HASH indexes.

So, let’s look at B+Tree in detail.

Let’s take a look at the differences between the data structures of the two indexes and get a feel for their respective usage scenarios.

Index of Hash data structure

Hash data structures, which we can understand as Java maps, store key-value pairs of data, but there is no way to do range lookup. However, its equivalent search is faster and the time complexity is O(1).

Create a table with id and description fields, and add a HASH index to description. Storage engines use MEMORY

DROP TABLE IF EXISTS `index_hash_test`;
CREATE TABLE `index_hash_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `description` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL.PRIMARY KEY (`id`) USING BTREE,
  KEY `inx_descrption` (`description`(100)) USING HASH
) ENGINE=MEMORY AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ROW_FORMAT=DYNAMIC;
INSERT INTO `index_hash_test` VALUES (1.'a');
INSERT INTO `index_hash_test` VALUES (3.'b');
INSERT INTO `index_hash_test` VALUES (2.'c');
INSERT INTO `index_hash_test` VALUES (4.'d');
INSERT INTO `index_hash_test` VALUES (5.'e');
Copy the code

HASH index range lookup does not take effect

SQL > alter table scan
EXPLAIN SELECT * FROM index_hash_test WHERE description > = 'b'
Copy the code

HASH equivalent lookup takes effect

EXPLAIN SELECT * FROM index_hash_test WHERE description = 'b'
Copy the code

B+Tree Indicates the index of the data structure

B+TreeIndexes of data structures can be queried in a range.

DROP TABLE IF EXISTS `index_test`;
CREATE TABLE `index_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `description` varchar(255) DEFAULT NULL.PRIMARY KEY (`id`),
  KEY `inx_descrption` (`description`)USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `index_test` VALUES (1.'a');
INSERT INTO `index_test` VALUES (3.'b');
INSERT INTO `index_test` VALUES (2.'c');
INSERT INTO `index_test` VALUES (4.'d');
INSERT INTO `index_test` VALUES (5.'e');
Copy the code

Execute the query plan on the index table of the B+Tree data structure, and you can see that the index is available at the time of query.

EXPLAIN SELECT * FROM index_test WHERE description > = 'b';
EXPLAIN SELECT * FROM index_test WHERE description > 'b';
EXPLAIN SELECT * FROM index_test WHERE description = 'b';
Copy the code

Looking at the execution plan separately, you can see that both equivalent lookup and range lookup use indexes, but there are performance differences between the three (more on this later).

B+Tree Indicates the data structure

In the actual development, we use InnoDB and MyISAM storage engines, so we mainly study B+Tree index

If you're interested, you can go to this website and see how the data structure works. Such as B + tree insert, delete, and query https://www.cs.usfca.edu/~galles/visualization/Algorithms.htmlCopy the code

Compared to binary trees, B+ trees have more child nodes, lower tree height, and reduce the number of traversals to reduce I/O times (loading data from disk into memory) when searching for data.

Compared to a B-tree, the leaf nodes are ordered and only the leaf nodes store data.

For example, when querying data greater than 3, if you find 3, you can directly traverse the linked list on 3 to query data greater than 3.

More indexes on a table is not always better; no more than five indexes are generally recommended. Because when we modify the data, the database also generates computations and IO to maintain the data structure of the index, which affects the performance of the data. This part of the impact can be seen when indexes are large and data volumes are large.

When more than five indexes are added for service needs, the performance of the database is affected by too many indexes. Consider a vertical split of the table to split some business fields into another table.

Index related nouns

CREATE TABLE `my_test` (
  `id` int(11) NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  `age` varchar(200) DEFAULT NULL,
  `phone` varchar(200) DEFAULT NULL.PRIMARY KEY (`id`) USING BTREE,
  KEY `index_a_b` (`name`(20),`age`(5)) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `my_test` VALUES (1.'a'.'1');
INSERT INTO `my_test` VALUES (2.'a'.'2');
INSERT INTO `my_test` VALUES (3.'a'.'3');
INSERT INTO `my_test` VALUES (4.'b'.'1');
INSERT INTO `my_test` VALUES (5.'b'.'2');

Copy the code

The diagram of the index is just a schematic demonstration, focusing on understanding.

Clustering index

A leaf node of a clustered index (also known simply as a primary key) stores the entire row of data, whereas a leaf node of a non-clustered index stores the index data and the primary key.

Non-leaf nodes are primary keys, and leaf nodes have rows corresponding to primary keys.

When you query data using a primary key, you are actually querying the clustered index and then reading the data out of it.

Joint index

Joint index: Multiple columns form an index. For example, name and age form an index

We see that the union index is sorted by name, age, and when the name is the same, it is sorted by age, and the leaf node will have the id data.

Query this index to get the primary key of the corresponding row, and then query the cluster index based on the primary key ID to get the entire row.

Cover index

Overwrite index, also a federated index. But the data we query is in the index, so we don’t have to go back to the table to find the data.

SELECT `name`,age FROM my_test WHERE `name` = b;
Copy the code

The above SQL execution makes use of an overwrite index, where the results of the query reside.

SELECT * FROM my_test WHERE `name` = b;
Copy the code

The index contains only ID,name,age, and phone. This data can only be queried through the table.

Select * from (name,age); select * from (age); select * from (name,age);

Do not use select * when querying the required field. The advantage of this method is to reduce IO, and another is to avoid the table back.

The smallest unit of Mysql interaction with disk is Page. The B+ tree actually stores pages of data, which can be approximated by the following figure.

Figure from how MySQL Works: Understanding MySQL from the root

Use and addition of indexes

Now that we know the B+ tree index data structure, we have a pretty good idea of how to use indexes and understand some of the rules for using indexes.

Index failures, as they are commonly called, can be partly deduced from data structures. On the other hand, mysql determines that the index performance will be higher if the index is not used.

Indexes are used for retrieving small amounts of data from tables. If you add an index and need to scan most of the data in the table, you do not need to add an index in this field, because it does not improve your query, but rather because the data changes need to maintain the index, and may reduce query performance.

Usually, we will select a highly differentiated field to index (if the field is not relevant to the query business, there is no need to index).

For example, in the field of gender, only male and female options, when you have 1000 W of data, 700W is male, 300W is female, there is no need to add index, meaningless. Gender is a less discriminative field.

SHOW INDEX FROM index_test;
Copy the code

After we create the index, we can look at Cardinality to see if the index was added properly. The closer the Cardinality/ table rows value is to 1, the better the query performance.

Cardinality represents the number of unique values of the data in this index. There are now five rows of data in the table, and there are no duplicate values of description.

The database also optimizes queries based on Cardinality, but this value is not updated in real time, so we need to update this value every once ina while when business is not busy.

ANALYZE [NO_WRITE_TO_BINLOG | LOCAL] TABLE tbl_name [, tbl_name] ...

mysql> analyze table index_test;
+-----------------------+---------+----------+----------+
| Table                 | Op      | Msg_type | Msg_text |
+-----------------------+---------+----------+----------+
| index_blog.index_test | analyze | status   | OK       |
+-----------------------+---------+----------+----------+
1 row in set (0.00 sec)
Copy the code

Leftmost matching rule

When we use the index, we only need to contain the left of the index to match the index (name,age).

SELECT * FROM my_test WHERE `name` = 'b';

SELECT * FROM my_test WHERE `name` = 'b' AND age = 1;
Copy the code

The index (name,age) is not used when we execute the following SQL

See the performance of a query by viewing the execution plan: full table scan
EXPLAIN SELECT * FROM my_test WHERE age=1;
Copy the code

The leftmost matching principle is determined by the data structure of the Mysql index.

The leaf nodes in the B+Tree data structure of the joint index (name,age) are sorted by name and then by age. Age is actually out of order and there is no way to do a range lookup.

If you also want to index lookups in age, you need to create a new index on age.

-- Full table scan
EXPLAIN SELECT * FROM my_test WHERE NAME LIKE '%a';

-- Index range lookup
EXPLAIN SELECT * FROM my_test WHERE NAME LIKE 'a%';
Copy the code

The length of a field (from the beginning of the value) is specified to create an index.

When you need a keyword search, use a full-text index, or add an ES for the search.