Hello everyone, MY name is Li Tu, this issue brings the index of Mysql again, last issue we talked about the personal experience of learning Mysql from the general direction, interested can take a look: learning Mysql hundred million points of opinion

Later can be bold in the work of writing technical articles, hee hee hee, of course, is in the work of interface development under the premise of the end, but with my development speed, the development of interface is not a breeze.

Why can I write technical questions at work? See below:

Because of blogging is one of the kpi, friends will ask is there so great, return true have, because the internal to learn and share together, will be the RD were asked to write a blog, as one of the kpi, then good article will share to internal team, mutual learning, so I work even in the blog had been seen by leadership, can also be aboveboard.

Ok, so without further discussion, let’s jump right into today’s business, and first let’s talk about what indexes are.

The index concept

Concept: Indexes are data structures that improve the efficiency of mysql queries. In a word, index is a data structure to improve query efficiency.

Database query is one of the most important functions of a database. Designers want to query data as fast as possible, so the designers of database systems will optimize from the perspective of query algorithm.

Of course, the most basic query algorithm is linear Search. This algorithm with the complexity of O(n) is obviously bad when the amount of data is large. Fortunately, the development of computer science provides many better search algorithms, such as sequential search, split search, fast search and so on.

However, each search algorithm can only be applied to a specific data structure. For example, sequential search relies on sequential structure, and split search implements binary search through binary search tree or red-black tree.

Therefore, in addition to the data, the database system maintains data structures that satisfy a particular search algorithm. This data structure is called an index.

Index Performance Analysis

At present, most database systems and file systems use b-Tree or its variant B+Tree as the index structure. B+ tree index is an implementation of B+ tree in the database. It is the most common and the most frequently used index in the database.

From the earliest balanced binary trees. B+ Tree is gradually optimized by binary search Tree, balanced binary Tree (AVLTree) and balanced multipath search Tree (B-tree).

So why does mysql select B+?

Ordered array, Hash index, red-black tree, binary search tree and AVL tree can also be used as data structure or to implement index. However, B tree or B+ tree are commonly used in file system and database system. Here, the characteristics of each index and the composition principle of calculation are combined for in-depth analysis.

However, for Mysql, the best query is suitable for it. On the one hand, to achieve efficient query, in addition to simple conditional query, but also support ordered efficient index range query, grouping.

Ordered arrays perform very well in both equivalent and range queries, so why not use ordered arrays as indexes? Ordered arrays are also not used as the underlying implementation of indexes because it is too expensive to update arrays as indexes, and new data moves all subsequent data back one bit.

Hash is stored in the form of key-value, which is suitable for the equivalent query scenario. The time complexity of the query is O(1). Because the hash storage is not orderly, it may be necessary to traverse all the data for the range query, and hash conflicts may occur when different values are calculated. So hashes are not suitable for Mysql indexes.

On the other hand, in addition to high query efficiency, there should be high data read efficiency (IO), we all know that the computer’s random disk IO efficiency is very low.

So why is hard drive access so slow?

This is about the hard disk read and write principle, there are many kinds of hard disk, but are composed of disk, head, disk spindle, control motor, head controller, data converter, interface, cache and other parts.

All the platters are fixed on one axis, which is called the main axis of the platters, and all the platters are absolutely parallel, also forming a column, and each of the platters has a head, and each head is on the same axis, so when you look down from the top, the heads are absolutely overlapping.

All the controller head even in a head, the head controller is responsible for the movement of each head, head can move along the direction of the radius of the disc, is actually the oblique movement, each head at the same time must be coaxial platter with thousands to tens of thousands of RPM speed in high speed, so that the head can specify the location of platter data read and write operations:

Disk data read and write principles

The platter is divided into a series of concentric rings, the center of which is the center of the platter. Each concentric ring is called a track, and all tracks of the same radius form a cylinder.

Tracks are divided into small segments along the radius. Each segment is called a sector. Each sector is the smallest storage unit on the disk. For the sake of simplicity, let’s assume that the disk has only one platter and one head.

When the disk reads the data, the system will transmit the logical address of the data to the disk, and the disk’s control circuit will translate the logical address into the physical address according to the addressing logic, that is, determine the data to be read in which track, which sector.

To read the sector data, need to put the head in the above sectors, in order to achieve this, the head need to move on the corresponding track, this process is called seek, called the time-consuming seek time, and then rotating disk will rotate to the target sectors, this process is called the rotation time.

That is, a read/write operation on a disk consists of three actions:

  • Seek (time) : the head moves to the specified track.
  • Rotation delay (time) : waits for the specified sector to rotate past the head.
  • Data transfer (time) : The actual transfer of data between disk and memory

Additional knowledge:

  • Disk: Each disk has two disk surfaces. Generally, each disk can be used and can store data. Disk number is also called head number because each valid disk has a corresponding read/write head.
  • Tracks: When a disk is formatted, it is divided into concentric circles called tracks. Tracks are numbered from 0 to 0 and information is recorded in these tracks in the form of impulse trains. These concentric circles do not record data continuously, but are divided into segments.
  • The same tracks on all disk surfaces form a cylinder, usually called a cylinder. The same tracks on all disk surfaces form a cylinder, usually called a cylinder. Data is read/written on the disk surface instead of the cylinder surface. When a track is full of data, data is written on the disk surface next to the same cylinder. When a cylinder is full, data is moved to the next sector to write data.

Improved disk data read and write principles

Principle of locality and disk prefetch. Due to the characteristics of the storage medium, the disk itself access is much slower than the main memory, coupled with the cost of mechanical movement, disk access speed is often a few hundred cents of the main memory, so in order to improve efficiency, to minimize disk I/O.

To achieve this, instead of reading strictly on demand, the disk often prereads each time. Even if only one byte is needed, the disk reads a certain length of data backward from this position into memory.

This is based on the well-known principle of locality in computer science: when one piece of data is used, nearby data is usually used immediately.

Therefore, the data required for the duration of the program should usually be centralized. Because sequential disk reads are very efficient (no seek time and very little rotation time), prereads can improve I/O efficiency for local programs.

The preread length is usually an integer multiple of 4K page. A page is a logical block of computer managed memory. Hardware and operating systems tend to divide main storage and disk storage into contiguous equally sized blocks. Each block of storage is called a page (in many operating systems, the size of a page is usually 4K).

When the program to read the data is not in the main memory, will trigger a missing page exception, at this time the system will be sent to disk read disk signal, disk will find the starting position of data and backward read a page or several pages loaded into memory, and then abnormal return, the program continues to run.

Therefore, because of the mechanical motion involved in the hard disk, so the disk IO consumption time is very large, compared to the memory read speed, like the speed of light compared to the speed of sound.

Therefore, if memory conditions allow, Mysql is willing to load all data into memory at once for reading and writing. In general, the indexes themselves are too large to be stored entirely in memory, so they are often stored on disk as index files.

Server memory size is also limited, a server may not only run Mysql a, more or less may be twenty or thirty, each need to allocate memory operating system.

In this case, the index lookup generates disk I/O consumption, which is several orders of magnitude higher than the memory access consumption, the number of disk I/O accesses during the lookup.

Some large tables in Mysql, a table may be several G, the index structure is very large, that server memory is not bursting.

Therefore, we must make a trade-off between memory and disk to measure, data in memory as far as possible, and in a small amount of data in the disk, the number of times to read disk control to a minimum, that is, the Mysql performance has a minimum impact, plus disk data read and write principle to improve data read efficiency.

So in the condition of many trees, B+ tree is how to stand out? Now let’s talk about B tree, B- tree, B+ tree, red black tree performance.

Binary tree, red-black tree, AVL tree, B tree, B+ tree performance analysis

B tree performance analysis: B tree is a binary search balanced tree, but B tree only has one keyword per node. In case of a large amount of data, THE tree height of B tree is very large and the performance is low:

Even in extreme cases, because there is no binary search tree balancing algorithm, so in some special cases, the binary search tree is equal to linear, poor case, designers found that reducing the height of the tree can naturally improve the search efficiency:

Red-black trees and AVL trees are based on binary trees with balancing algorithm. Red-black trees make sure that no path is twice as long as any other path. It is a weakly balanced tree while AVL is strictly balanced.

However, there is also the problem of a very tall tree due to a large amount of data, so the goal now is to compress the height of the tree.

B tree based on the height of the reduced tree, B tree is a multi-way search tree, each node can have more than two children, is not binary:

The biggest difference between B tree and B+ tree is that non-leaf nodes of B can store data, while only leaf nodes of B+ tree can store data. B tree is a multi-path search tree, and a node can store a lot of data, so the height of B tree is greatly reduced.

However, compared with B+ tree, B tree always searches down from the root node because each node may contain the target data when searching for data. This feature will bring a large number of random IO.

In the B+ tree, because the leaf node stores data (InnoDB), it stores more index data per page size than the B tree (16K), and the leaf node is connected to the adjacent node by bidirectional pointer.

And the adjacent nodes are ordered, so it is very convenient for the scope search, to get the first one that meets the condition, and then obtain the data through the pointer until the last one does not meet the condition.

So in summary: B+ tree is a multi-fork tree, the size of a data page is 16KB, the tree height of 1-3 can store more than 1 billion levels of data, that is, only 1-3 times of access to disk is enough, and the leaf node of B+ tree has a pointer to the next leaf node, which is convenient for range query:

Mysql index B+ tree query

B+ tree index principle

In B+Tree, all data record nodes are stored on leaf nodes of the same layer according to the order of key value, instead of only key value information stored on leaf nodes, which can greatly increase the number of key values stored on each node and reduce the height of B+Tree.

In B + tree structure, only on a leaf node storing data, only store the index in the leaf node, in the leaf node can have more space to store more index, such a B + tree can significantly increase the degree of d, thus reducing the B + tree height h, B the size of a node in the tree for the size of a page, or read a IO, The smaller the h, the less the number of IO:

dmax=floor(pagesize/(keysize+datasize+pointsize))

Floor means round down. Since the data field is removed from the nodes in the B+Tree, it can have greater outperformance and better performance.

Let’s look at the B+ tree search process. Mysql’s InnoDB index structure is shown below. Suppose we want to search for data with ID 15:

  1. Disk block 1 is found according to the root node and read into memory. Generally, the root node is also resident in memory and can even omit a disk I/O operation. [Disk I/O operation 1]
  2. Compare id 15 to the left of interval 28, so find disk 2 according to P1.
  3. Read disk 2 into memory and find 15 between (10,17). [Disk I/O operation # 2]
  4. Then locate disk block 5 according to the pointer p2 of disk 2 and read it into the memory. [Disk I/O operation for the third time]
  5. Finally, according to the id=15 to find the corresponding data, return the result.

Therefore, it only takes up to three disk IO attempts to find the corresponding data. From the schematic diagram of the B+ tree above, non-leaf nodes are formed like a catalog, also called an index page, and the data of the leaf nodes is found.

In MySQL, indexes are implemented differently by different storage engines. MySQL has two storage engines: MyISAM and InnoDB.

MyISAM

In the MyISAM storage engine, data and index files are stored separately. MyISAM stores three files named.frm,.myd, and MYI. FRM is the table definition file,.myd is the data file, and.myi is the index file.

Myisam only supports table locking and does not support transactions. Myisam is very good at reading data because it has a separate index file.

Myisam is also a B+ tree structure, but the leaf node of Myisam index holds the address of the row data. Therefore, the index retrieval algorithm in MyISAM first finds the address of the row data in the index tree, and then finds the corresponding row data according to the address.

As you can see, MyISAM’s index file only holds the address of the data record. Primary key index and secondary index, only primary index requires the key to be unique, while secondary index key can be repeated. If we create a secondary index on Col2, the index looks like this:

MyISAM indexes are also called “non-clustered” to distinguish them from InnoDB’s clustered indexes.

InnoDB

In InnoDB, data and index files are stored together. As shown in the figure, InnoDB has two files, named.frm and.idb, where.frm is the table definition file and IDb is the data file.

In InnoDB, the index file and data file are the same. In InnoDB, the key in the index file is the primary key index in the data table. Therefore, InnoDB’s index file is also the primary index file. As shown in the figure below:

If another field is specified as a normal index, the structure of the normal index tree looks like the figure below:

Therefore, when the query is not based on the primary key, it will first find the value of the primary key in the secondary index tree, and then find the value of the corresponding row data in the primary index tree. This is called back table, which reduces the query efficiency of the table.

Mysql Index type

There are not many types of indexes in Mysql. Different types of indexes have different functions, and there is a cross relationship between the functions of indexes. Indexes in Mysql are mainly divided into the following categories:

  1. PRIMARY KEY index: A PRIMARY KEY index is usually specified when a table is created. A table has only one PRIMARY KEY index. The PRIMARY KEY index is unique and non-null.
  2. UNIQUE: A UNIQUE index can be specified when a table is created or created after the table is created.
  3. “INDEX” : The only purpose of an INDEX is to speed up queries.
  4. INDEX: A composite INDEX creates a “multi-field INDEX”. This concept is based on the “left-most prefix” principle of the composite INDEX query as compared to the single-column INDEX above.
  5. FULLTEXT: A full-text index is an index created for large text fields. It is also called a full-text search.
  6. “Cluster index” and “non-cluster index” : The concepts of cluster index and non-cluster index are larger than the above concepts and belong to the relationship of include and be included. For example, InnoDB uses clustered indexes for primary key indexes.

If you want to view all indexes of a table, you can perform the following SQL:

Show index from table nameCopy the code

For example, to view the index in my test table, Key_name indicates the name of the index, Column_name indicates the column of the index:

The primary key index

Primary key index is the most common index type in InnoDB storage engine. Every table will have a primary key index, which is unique and cannot be null.

When creating a table, you can specify the primary KEY index by RIMARY KEY. In the InnoDB storage engine, if there is no subjective primary KEY index created, Mysql will see if there is a unique index in the table. If there is, Mysql will specify “non-empty unique index” as the primary KEY index

If there is no unique index, an auto-growing primary key of 6 bytes is generated as the primary key index by default. You can query the corresponding primary key by selecting _rowid from table name.

The MyISAM storage engine may not have a primary key index, but there are significant differences between the structure of data stored in MyISAM and InnoDB, which will be explained in detail in a later section.

The only index

Unique indexes differ from primary key indexes in that they are allowed to be null, as long as the column values created in a composite index are unique

Unique indexes are used to ensure that the data is unique in practice. If you just want the data to be queried quickly, you can also use a normal index, so the unique index is important to show that it is unique.

In actual business scenarios, some library table fields require unique, so unique indexes can be used. There are three ways to create unique indexes.

Select * from SQL where SQL > create table;

CREATE TABLE user( 
 id INT PRIMARY KEY NOT NULL, 
 name VARCHAR(16) NOT NULL, 
 UNIQUE unique_name (name(10)) 
);
Copy the code

(2) Create table after creating table:

CREATE UNIQUE INDEX unique_name ON user(name(10));Copy the code

(3) Create a table by modifying the structure of the table as follows:

ALTER user ADD UNIQUE unique_name ON (name(10))
Copy the code

One detail to note here is that the name field is created with a specified length of 16 characters, while the index is created with a specified length of 10 characters. Since no one’s name is longer than 10 characters, reducing the index length will reduce the size of the index space.

Normal index

The only purpose of a normal index is to speed up the query of data. It is generally used to create a normal index on the fields following the query statement WHERE and ORDER BY.

There are three ways to create a common INDEX, which are basically the same as creating a UNIQUE INDEX, but replace UNIQUE with INDEX, as shown below:

CREATE TABLE user(ID INT PRIMARY KEY NOT NULL, name VARCHAR(16) NOT NULL, INDEX index_name (name(10))); CREATE TABLE user(ID INT PRIMARY KEY NOT NULL, name VARCHAR(16) NOT NULL, INDEX index_name (name(10)); CREATE INDEX INDEX index_name ON user(name(10)); ALTER user ADD INDEX index_name ON (name(10)) ALTER user ADD INDEX index_name ON (name(10))Copy the code

If you want to delete an index, you can do so by executing the following SQL:

DROP INDEX index_name ON user;
Copy the code

Composite index

A composite index uses multiple fields to create an index. A composite index can avoid “back table query” and is more efficient than a single column index with multiple fields.

Select * from index where index = index; select * from index where index = index; select * from index where index = index;

ALTER TABLE employee ADD INDEX name_age_sex (name(10),age,sex) ALTER TABLE Employee ADD INDEX name_age_sex (name(10),age,sex);Copy the code

Back to the table query

What is a back table query? The backtable query simply says “query data through secondary index, can not obtain the complete data row, need to query the primary key index again to obtain the data row”.

InnoDB storage engine indexes are divided into “cluster indexes” and “secondary indexes”. The primary key index is the cluster index and the other indexes are secondary indexes.

The leaf node in the cluster index holds the complete data rows, while the leaf node in the secondary index does not hold the complete data rows.

As mentioned above, InnoDB table must have primary key index, although the index takes up space, the index conforms to binary search algorithm, which is very fast to find data.

If the employee table has a primary key id and a normal index name, InnoDB will have two B+ trees, one for the primary key:

The leaf node in the primary key index tree stores the complete data rows, and the second tree is the secondary index tree for the name field, as shown in the figure below:

Select name, age, sex from Employee where id =’as’; When name=’as’ is queried, the primary key is 50. Then, the primary key index tree is queried according to the primary key, and the complete data row is obtained. The specific execution process is as follows:

This is the backtable query. The backtable query will be queried twice, which will reduce the efficiency of the query. In order to avoid the backtable query, only query once can get the complete data?

Indexes cover

A common way to do this is to “create composite indexes (federated indexes) and” do “index coverage. What is index coverage? Index overwrite is “the leaf node of the index already contains the query data, there is no need to query back to the table.”

Select name, age, sex from Employee where name =’as’; Since only the name field is indexed for a normal index, this will inevitably lead to a backtable query.

In order to improve query efficiency, (name) “single-column index upgrade to the federated index” (name, age, sex) is different.

Because of the established joint index, there will be three values of name, age and sex in the leaf stage of the secondary node at the same time, and the required data will be obtained at one time, so as to avoid back table, but all the schemes are not perfect.

If the name value of the federated index changes or the age of a row changes on any given day, I need to maintain both the primary key index and the federated index tree, which is high maintenance cost and performance overhead.

Compared to previous data changes, I only need to maintain the primary key index. The creation of a federated index causes two trees to be maintained at the same time, which affects the operation of inserting and updating data, so no solution is perfect.

Leftmost prefix rule

(B+Tree) (c +Tree) (c +Tree) (c +Tree) (c +Tree) (c +Tree) (d +Tree) (C +Tree) (D +Tree) (D +Tree) (D +Tree) (D +Tree) (d +Tree)

The left-most prefix of an index is the same as that of an index created on the left. For example, if a table has the following data:

As shown in the figure above, the name field is placed first, so the name field is fully ordered, but the age field is not. Only when the names are the same, for example, name=’ BC ‘, then the age field is fully ordered.

So you will find that in a federated index you will only use the index if you use the following rules:

  • name,age,sex
  • name,age
  • name

Because Mysql has a query optimizer at the bottom, it determines that full table scans are more efficient than indexes when executing SQL.

If age>=23,sex=’ male ‘; Two fields are used as the query criteria, but the name field is not used because it is unordered for age without name knowing.

For age>=23, there is no way to use the index. This is why the index is implemented. It is important to follow the “find order, make full use of the order of the index.”

If you create three single-column indexes in the name, age, and sex columns, it is equivalent to creating three index trees, so it is more efficient than using one index tree query efficiency.

WHERE name like ‘%d%’ WHERE name like ‘%d%’; Fuzzy queries with the like condition invalidate the index.

We can understand that “the query string follows the left-most prefix principle”, and that the query of a string is to match each character in the string one by one. “If the left-most % of the string indicates an indeterminable string, then the order of the index cannot be taken advantage of.”

WHERE name like ‘d%’; You can use an index because the leftmost string is deterministic, which is called a “match column prefix.”

In the actual business scenario, when creating a joint index, “We should put the fields with high recognition in front to improve the hit ratio of the index and make full use of the index”.

An index pushdown

Mysql5.6 proposes the principle of indexing down, “for query optimization, mainly for queries with the like keyword.” what is indexing down?

SELECT * from user where name like ‘%’ and age=40; SELECT * from user where name like’ %’ and age=40;

If there is no index push down, the execution process is as shown in the figure below:

Mysql > select * from ‘age’ where ‘id’ =5 and ‘id’ =7 from ‘name’ where ‘id’ =5 and ‘id’ =7 from ‘name’ where ‘id’ =5

If the above query operation uses index push, the execution process is as follows:

Mysql will pass the query condition with age=40 to the storage engine, and filter out the data rows with age=50 again. In this way, the number of times to return to the table becomes once, improving the query efficiency.

In summary, index pushing means that when SQL queries are executed, some judgment conditions of index columns will be passed to the storage engine. The storage engine will judge whether the conditions are met, and only the data that meets the conditions will be returned to the Mysql server.

The full text indexing

Full-text index is also called full-text retrieval. You can create a full-text index by using the following SQL: ALTER TABLE EMPLOYEE ADD FULLTEXT FULLTEXT_NAME (name); Or CREATE INDEX.

Full-text indexes are mainly valid for TEXT fields such as CHAR, VARCHAR, or TEXT. You can also query TEXT using the like keyword.

A plain index (single-column index) query can only speed up the retrieval of the first string in the field content, not for a query consisting of multiple words.

DROP INDEX fullTEXT_name ON Employee; DROP INDEX fullTEXt_name ON Employee;

Clustered indexes and non-clustered indexes

Clustered and non-clustered indexes are a concept relative to a storage engine that is larger in scope and includes the index types mentioned above.

In a clustered index, the entire row data is stored in the leaf node, and the index and data are stored together; in a non-clustered index, the index file and the data file are separate, so the query data will be queried one more time.

In Mysql’s storage engine, InnoDB supports clustered indexes, MyISAM does not support clustered indexes, MyISAM supports non-clustered indexes.

Clustering index

InnoDB has a primary key, which is used to support a clustered index. The structure of a clustered index is as follows:

The best primary key selection in InnoDB is to give an AUTO_INCREMENT column as the primary key increment, and some people might use UUID as the random primary key.

Because indexes need to be ordered, the cost of maintaining a primary key index tree becomes higher when primary key inserts need to be placed in the right place with random primary keys.

In contrast, when primary keys are incrementing, the cost of maintaining the primary key index tree becomes smaller, and random primary keys should be avoided as much as possible.

Non-clustered index

MyISAM uses a non-clustered index. When new data is inserted, it is written to the disk sequentially, and each row of data is marked with a row number, increasing from small to large.

When MyISAM creates the primary key index, the structure of the primary key index tree is as follows:

In the primary key index, data is also non-empty and unique. The primary key index tree stores the row number of data rows. When querying data, the primary key index needs to obtain the row number, and then obtain data through the row number.

The only difference is that a non-primary key index does not have to be non-null and unique.

If InnoDB(clustered index) and MyISAM(non-clustered index) are not clustered, the index data layout is as follows:

** InnoDB(clustered index) ** and MyISAM(non-clustered index) ** have a very clear understanding. The following is a talk about index optimization, which is the most closely related to our daily development.

The index optimization

Based on my years of experience, Mysql index optimization is nothing more than the following three points:

  1. The best use of indexes for queries
  2. Avoid full table scans
  3. Complex queries to avoid scanning invalid data

To optimize the premise

The Explain keyword is a common “keyword” for SQL optimization in Mysql. It is often used to “view the execution plan of THE SQL without executing the SQL” to quickly identify problems with the SQL.

Before explaining Explain, create the user table user, role table role, and user role table roLE_user as test table:

DROP TABLE IF EXISTS 'user'; CREATE TABLE `user` ( `id` int(11) NOT NULL, `name` varchar(25) DEFAULT NULL, `age` int(11) NOT NULL DEFAULT 0, `update_time` datetime DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; INSERT INTO ` user ` (` id `, ` name `, ` age `, ` update_time `) VALUES (1, 'zhang, 23,' the 2020-12-22 15:27:18 '), (2, 'li si, 24,' 2020-06-21 15:27:18 '), (3, 'Cathy', 25, '2020-07-20 15:27:18'); DROP TABLE IF EXISTS `role`; CREATE TABLE `role` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`), KEY `index_name` (`name`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; INSERT INTO ` role ` (` id `, ` name `) VALUES (1, 'a product manager), (2,' technical manager), (3, 'project director); DROP TABLE IF EXISTS `role_user`; CREATE TABLE `role_user` ( `id` int(11) NOT NULL, `role_id` int(11) NOT NULL, `user_id` int(11) NOT NULL, PRIMARY KEY (`id`), KEY `index_role_user_id` (`role_id`,`user_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; INSERT INTO ` role_user ` (` id `, ` role_id `, ` user_id `) VALUES (1, 2, 1), (2,1,2), (3) filling;Copy the code

Explain select * from user where id =2; , the execution result can be seen as follows:

This is the explain execution plan. If you can understand this execution plan, you will be on the way to becoming proficient in SQL optimization. Here is a detailed description of what the 12 fields mean.

Id field

The ID indicates the sequence number of the SELECT query statement. It is the id of the SQL execution order. The SQL is executed from top to bottom according to the ID.

What does that mean? For example, run the following SQL: explain select * from user where id in (select user_id from role_user);

+----+-------------+-----------+------------+-------+---------------+--------------------+---------+------+------+------ ----+-----------------------------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+-------+---------------+--------------------+---------+------+------+------ ----+-----------------------------------------------------------------------------------+ | 1 | SIMPLE | user | NULL | ALL | PRIMARY | NULL | NULL | NULL 100.00 | | 3 | NULL | | 1 | SIMPLE | role_user | NULL | index | NULL | 8 index_role_user_id | | NULL | 3 | | 33.33 Using the where; Using index; FirstMatch(user); Using join buffer (Block Nested Loop) | +----+-------------+-----------+------------+-------+---------------+--------------------+---------+------+------+------ ----+-----------------------------------------------------------------------------------+Copy the code

The first record corresponds to the user table, and the second record corresponds to the roLE_user table. In this case, the SQL is executed from top to bottom.

If the ids are different, run the following SQL: explain select (select 1 from user limit 1) from role; :

+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+------- ------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |  +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+------ -- -- -- -- -- -- -- + | 1 | PRIMARY | role | NULL | index | NULL | index_name | | NULL 33 | 3 | | 100.00 Using index | | 2 | SUBQUERY | user | NULL | index | NULL | PRIMARY 100.00 | | | NULL | 3 | 4 Using the index | +----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+------- ------+Copy the code

You will see that there are two records, and the two records will have different ids. The larger the id is, the more it will be executed first. You can see that the user table is executed with id=2, which is the subquery part, and the outermost part is executed last.

In a complex query, there are multiple records. In a simple query, there is only one record. In a complex query, those with the same ID are in a group, and the execution order is from the top down. Optimization for subqueries exists in Mysql 8, so sometimes even complex queries have only one record.

Select_type field

Select_type Indicates the query type, that is, whether the query corresponds to a simple query or a complex query. If the query is a complex query, it can be simple subquery, subquery with the FROM clause, or union query. Let’s take a look at all query types in select_type.

  • simple

Simple Indicates a simple query without any complex query.

  • PRIMARY

SQL > select * from ‘PRIMARY’ where ‘PRIMARY’ = ‘PRIMARY’; explain select * from role where id = (select id from role_user where role_id = (select id from user where id = 2));Select * from role where id =? Will be marked as PRIMARY.

  • SUBQUERY

“SUBQUERY included in SELECT or WHERE” is represented as SUBQUERY type. For example, two subqueries in the previous SQL statement are SUBQUERY.

  • DERIVED

“DERIVED stands for DERIVED table or DERIVED table and is represented as DERIVED in subqueries contained in FROM.” Mysql executes these subqueries recursively and places the results in temporary tables. Run SQL: explain select * from (select name from user union select name from role) a where A.name = ‘zhang SAN ‘;Mysql has been optimized since 5.7 with the addition of derived_merge to speed up query efficiency.

  • UNION

In the “UNION query, the second select query is represented as a UNION” :

  • UNION RESULT

“The RESULT of the UNION query statement is marked as UNION RESULT”, as in the SQL executed above: Explain select * from (select name from user union select name from role) a where a.name = ‘zhang SAN ‘;As you can see from the table field, the fourth row is from the second and third rows <union2,3>, so the RESULT of a UNION query is marked as the UNION RESULT

  • DEPENDENT UNION: Also refers to the second or subsequent statement in the UNION query, but depends on the outside query.
  • DEPENDENT SUBQUERY: The first select statement in a SUBQUERY, which is also DEPENDENT on the external query.
  • UNCACHEABLE SUBQUERY: The result of the SUBQUERY cannot be cached. The first line of the external connection must be reevaluated.

The table field

The table field indicates which table is being queried. The table field indicates an existing table. For example, the user and role fields above are created by ourselves.

For example, the table field of the UNION RESULT is represented as <union2,3>, which means that the second and third row RESULT records are queried.

The type field

The type field represents the type associated with the SQL or the type accessed. From this field we can determine when this SQL search database table, the search record is about the scope of what, directly can reflect the efficiency of SQL.

System, const, eq_ref, ref, range, index, and ALL System > const > eq_ref > ref > range > index > ALL

system

System is a special case of const, “indicating that there is only one row in the table.”

const

Const indicates that the data is found once by the index. The common const appears in the “equivalent query for a unique index or primary key index.” The lookup is fast because only one data match exists in the table. Example: explain select * from user where id =2;

eq_ref

Eq_ref indicates that a unique index or primary key index scan is used as a table link matching condition. For each index key, only one record in the table matches it. Such as: explain select * from user left join role_user on user.id = role_user.user_id left join role on role_user.role_id=role.id;

ref

Ref performance is worse than eq_REF performance. It also indicates the table’s link matching conditions, that is, which table columns are used as the value of the query index. The difference between ref and eq_REF is that eQ_REF uses a unique index or a primary key index.

The result of the ref scan may be to find multiple rows that match the condition, which is essentially an index access that returns the matching rows. For example: explain select * from user where name = ‘zhang SAN ‘;

range

Select * from user where id > 2; select * from user where id > 2;

index

“Index” indicates that the index tree is traversed. “index” is faster than “ALL”, but “index” indicates that you need to check whether your index is correctly used: explain select ID from user;

ALL

“The difference between ALL and index is that ALL is read from disk, while index is read from index file.” ALL means that Mysql will scan the table from head to tail. This means that Mysql usually needs to add indexes to optimizes queries, or that queries do not use indexes as criteria: explain select * from user;

Possible_keys field

Possible_keys means that the list of possible indexes can be used, just possible, but not necessarily true.

When NULL is not used, it indicates that the index needs to be added to optimize the query. If the table has less data, the database thinks that the full table scan is faster, and NULL is also possible.

The key fields

The difference between the key field and possible_keys is that it means the actual index used, meaning that possible_keys contains the value of key.

To make Mysql USE or IGNORE the INDEX in possible_Keys, USE FORCE INDEX, USE INDEX or IGNORE INDEX.

Key_len field

Indicates the number of bytes used by the index in the SQL query statement. The number of bytes is not the actual length, but is obtained by calculating the length of the index used in the query. It shows the maximum possible length of the index field.

Select * from user where id =2; select * from user where id =2;

Key_len has its own calculation rules for different types. The specific calculation rules are as follows:

If the index is a string type, the actual stored string is too long to be stored (768 bytes), mysql will use a similar left prefix index.

Ref field

Ref indicates the comparison between the column and the index, the matching condition of the table join, and which columns or constants are used to query the value of the index column.

Rows field

Rows indicates the estimated number of rows to scan. Mysql will estimate the number of rows to scan to find the record based on table information and index selection. Note that this is not the actual number of rows in the result set.

Partitions, Filtered fields

Partitions represent matched partitions; Filtered represents the percentage of the table that the query table rows represent.

Extra fields

This field displays additional information about the SQL query, mainly in the following cases:

Using index

Explain select id from user where id =2; select id from user where id =2;

If using WHERE is present, the index is used to perform a lookup of the index key; If using WHERE is not present, the index is used to read data, not to perform the query action.

Using where

In contrast to Using index, the column in the query is not overwritten by the index; the where condition is followed by a non-indexed leading column; it is simply a where condition: explain select user.* from user,role,role_user where user.id = role_user.user_id and role.id=role_user.role_id;

Using temporary

“Using Temporary means that a temporary table is used to store intermediate results, and is usually used when sorting results.” For example, when sorting order by and querying group by, a temporary table is used. Example: explain select * from (select name from user union select name from role) a where a.name = ‘zhang SAN ‘;

Using filesort

Explain select * from user order by name; select * from user order by name;

Using join buffer

Using join Buffer explain select user.* from user,role,role_user where user.id = role_user.user_id and role.id=role_user.role_id;

It emphasizes that the connection buffer is used to store the intermediate result instead of the index when the join condition is obtained, and the presence of this value generally indicates that the index needs to be added for optimization.

Impossible where

Impossible select * from user where name = ‘hah’ and name = ‘SFSD ‘; Impossible select * from user where name = ‘hah’ and name =’ SFSD ‘;

Select tables optimized away

Explain select min(id) from user;

There are other attributes in the Extra field, but almost none of them have been seen before, so I will explain them. Those who are interested can learn about them by themselves. Here are only the common ones.

conclusion

Let me summarize some rules for using indexes correctly. Indexes should be created correctly, queried correctly, and not invalidated. Therefore, the design and optimization of indexes should follow the following principles:

  1. The index column should not appear in the expression. This will invalidate the index. For example, SELECT…… WHERE id+1=5;
  2. Do not use index columns as arguments to functions.
  3. Try not to use the like keyword in index columns. For example, SELECT…… WHERE name like ‘%d%’. You can use the Mysql built-in function INSTR(STR,substr) to match, query the string appears in the following table; You can also use a FullText index, using match Against search; If the amount of data is very large, it is recommended to use ES or Solr instead.
  4. Numeric indexed columns should not be queried as strings. For example, SELECT…… WHERE id = ’35’;
  5. Try not to use the conditions not in, <>,! =, or, in which the keyword should be used carefully. The number of in conditions should be controlled within 1000. If there are too many conditions after in, it will lead to full table scan, which is also explained in the Development manual of Alibaba. It is recommended to use between instead of in or subquery EXISTS instead. The or keyword suggests using union instead of optimization.
  6. It is not recommended that the default value of a table be null. Generally, a table is created with a default value, such as 0 or an empty string, to prevent the occurrence of null Pointers.
  7. Do not have NULL values in index columns. NULL values invalidate indexes. Replace NULL values with special characters such as the empty string “” or 0.
  8. Queries for federated indexes should follow the left-most prefix rule.
  9. Generally, indexes are built for the distinctively large fields. In the joint index, the distinctively large (highly identifiable) fields are put in the first place to improve the hit ratio of the index. In the XML file of Mybatis with multiple conditions where query, the condition that hits the index is put in the first place, and the single column index is not used if the joint index can be used.
  10. The conditions following WHERE, Order BY, and group BY should be optimized as much as possible using the index. Exlpain can be used to optimize the query.
  11. Inner on is recommended for complex queries, because in the mysql optimizer inner on is limited to using small tables to associate with large tables. The conditions after ON should be indexed. Left and right associations should follow the rule of small with large table.
  12. The design principle of library table under microservice distributed system is that cross-library join operation should not be carried out. If the data amount of a library is relatively large, join operation should be reduced or not carried out as far as possible. If join operation is carried out, invalid data of join should be avoided, and the principle of single interface should be maintained. For some fields that will not change for a long time, redundancy can be considered to reduce join operations. It is also mentioned in alibaba’s manual that more than three joins are not allowed, and multiple queries are preferred.
  13. Business optimization considerations: Read more write less scene or less write more scenes, general Internet is to read more writing less scenarios, for some the need for statistical fields, such as some front page reports, panel data, need a lot of statistics interface, to minimize the Mysql to the statistics, avoid slow SQL, optimization method, can sacrifice part of writing to improve the performance of the new can read, In advance, when writing, statistics are carried out in an asynchronous way (such as statistics of the current month’s points, a year’s performance and other data). Some data volume involves large data itself. If you use Mysql for query statistics, it will inevitably lead to slow SQL.
  14. Query optimization can be used asynchronously to reduce the response time of interfaces that call services multiple times synchronously.
  15. The size of the index must be moderate to avoid excessive index redundancy.

Brother, I see your bones surprised, I will teach you my life skills, to carry forward this unique martial arts, it is up to everyone, just kidding.

Well, that’s it for this installment, I’m Leidu, and we’ll continue to dig deeper into Mysql in the next installment, see you next time.