Any developer, and back-end developers in particular, needs to be familiar with MySQL indexes. If you work with MySQL a lot, performance optimization comes up constantly, and it usually involves indexes. But are you really using MySQL indexes correctly?

Before the formal explanation, a short informal introduction is in order.

I have many years of back-end development experience and have worked at several large Internet companies. I focus on studying software architecture, and I hope to keep contributing what I can by summarizing the problems and techniques from my work and sharing them with more people.

As you can see, my avatar looks like a spinning top. It stands for spiraling upward, so that both my skills and I keep improving.

Next, I will take a real MySQL query problem from my work as the background and analyze it step by step, grounded in theory, in the hope of clarifying how indexes are applied and getting to the bottom of the problem.

Main principle: problem-driven.

Main process: observed behavior – problem analysis – following up on the doubts – digging deeper layer by layer – clear conclusion.

Let's take about 15 minutes and let our minds wander freely for a while.

Introducing the problem

We have a data table named cities that holds basic data about cities, such as each city's code and name.

One day I ran the following two SQL statements (one selecting the id column explicitly, the other using *), and found that the two returned different rows!

Case1: select id from cities limit 1;

Query result:

Id: 2

Case2: select * from cities limit 1;

Query result:

That caught my attention, so let's keep going!

Problem analysis

Experience has taught me not to panic when something odd happens: run explain first and look at the execution plan.

Case1: explain select id from cities limit 1;

Execution Result:

Case2: explain select * from cities limit 1;

Execution Result:

The execution plans show that the SQL in Case1 uses an index named 'uniq_city_code', while the SQL in Case2 does a full table scan.

In other words, because the two statements select different columns, MySQL chooses different index strategies when executing them, which leads to different query results.

Following up on the doubts

The analysis above still leaves a few questions open:

  • Why does the Case1 query use the index on city_code when that column does not appear in the query?

  • Why does the Case2 query not use the uniq_city_code index?

The Extra field in the Case1 query plan is Using index, which means the query is served by a covering index (the index contains all the data the query needs, so there is no need to read from the table). But uniq_city_code is an index on city_code, so how can it cover the id column?

With that question in mind, let's review how the MySQL storage engines implement indexes.

The figures below illustrate the index implementations of the InnoDB and MyISAM engines.

1. Comparison diagram of the InnoDB clustered index and secondary index (non-clustered index)

To make this easier to follow, I marked the data lookup paths of the two engines with a yellow line and a red line respectively; compare them against the figures.

InnoDB stores its data in the clustered index itself, so its data layout is quite different from MyISAM's.

1) Each leaf node in the clustered index contains the primary key value, the transaction ID and the rollback pointer (both used for transactions and MVCC), and the remaining columns (e.g. col2).

2) Unlike in MyISAM, secondary indexes are quite different from the clustered index. The leaves of an InnoDB secondary index contain primary key values rather than row pointers. This reduces the cost of maintaining secondary indexes when rows move or data pages split, because InnoDB does not need to update row pointers in the index.
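The two points above can be sketched with a toy model. This is only an illustration in Python, not how InnoDB is implemented: the rows, the trx_id and roll_ptr values, and the helper find_by_city_code are all made up, and plain dicts and sorted lists stand in for B+trees.

```python
import bisect

# Toy model of the InnoDB layout (illustrative only; all values are made up).
# Clustered index: primary key -> full row, including the hidden
# transaction id and rollback pointer used for transactions and MVCC.
clustered_index = {
    1: {"trx_id": 101, "roll_ptr": 0x1A, "city_code": "SH", "name": "Shanghai"},
    2: {"trx_id": 102, "roll_ptr": 0x1B, "city_code": "BJ", "name": "Beijing"},
    3: {"trx_id": 103, "roll_ptr": 0x1C, "city_code": "GZ", "name": "Guangzhou"},
}

# Secondary index on city_code: (city_code, primary key) pairs in key order.
# Each entry carries the primary key value, not a physical row pointer.
secondary_index = sorted(
    (row["city_code"], pk) for pk, row in clustered_index.items()
)


def find_by_city_code(code):
    """Fetch a full row via the secondary index: two lookups in total."""
    # Lookup 1: binary search in the secondary index for the key.
    i = bisect.bisect_left(secondary_index, (code,))
    if i == len(secondary_index) or secondary_index[i][0] != code:
        return None
    pk = secondary_index[i][1]
    # Lookup 2: go back to the clustered index with the primary key.
    return {"id": pk, **clustered_index[pk]}


row = find_by_city_code("GZ")
```

Note that if a row moved to another page, only the clustered index would change in this model; the (city_code, id) pairs in the secondary index would stay valid, which is the maintenance saving described in point 2.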

2. MyISAM engine index diagram

MyISAM does not support clustered indexes. Each leaf node in an index holds a row number pointing at the row in the data file, and the leaf nodes are stored in the order of the indexed column (col1 in the figure).

In MyISAM, the primary key is structurally no different from any other index. It is simply a unique, non-nullable index named PRIMARY.
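For comparison, here is the same kind of toy model for MyISAM. Again this is a rough Python sketch under made-up data, with a list position standing in for the row number; build_index and fetch are hypothetical helpers, not MyISAM functions.

```python
# Toy model of the MyISAM layout (illustrative only; all values are made up).
# Rows live in a separate data file, in no particular key order.
# The list position stands in for the row number.
data_file = [
    {"id": 3, "city_code": "GZ", "name": "Guangzhou"},  # row number 0
    {"id": 1, "city_code": "SH", "name": "Shanghai"},   # row number 1
    {"id": 2, "city_code": "BJ", "name": "Beijing"},    # row number 2
]


def build_index(column):
    """Build (key value, row number) pairs sorted by key value."""
    return sorted((row[column], pos) for pos, row in enumerate(data_file))


# The primary key index and any other index share the same structure.
primary_index = build_index("id")
city_code_index = build_index("city_code")


def fetch(index, key):
    """Find the key in an index, then follow the row number to the data file."""
    for value, row_number in index:  # a real index would do a tree search
        if value == key:
            return data_file[row_number]
    return None


by_pk = fetch(primary_index, 2)
by_code = fetch(city_code_index, "BJ")
```

Both lookups end at the same row in the data file, and neither index holds the row itself, which is the contrast with the InnoDB clustered index.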

All right, let’s go back to the problem.

In fact, we can draw the following preliminary conclusion:

Case1: select id from cities limit 1;

Because the leaves of the uniq_city_code index also store the primary key, the index contains the id column. The query can therefore be answered directly from the uniq_city_code index, so the optimizer chooses it.

Case2: select * from cities limit 1;

The select * in this query asks for columns that are not in the uniq_city_code index, so the index cannot cover the query and is not used.

To verify this preliminary conclusion, let's try Case3.
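To see why the two plans can return different first rows, here is a small Python model of the reasoning so far. It is a sketch, not MySQL's code: the rows are made up (only "Case1 returns id 2" mirrors what I observed), and select_limit_1 is a hypothetical helper that simply hard-codes the rule "use uniq_city_code when it covers the query, otherwise scan the table".

```python
# Toy model of the queries (illustrative only; the rows are made up).
rows = [
    {"id": 1, "city_code": "SH", "name": "Shanghai"},
    {"id": 2, "city_code": "BJ", "name": "Beijing"},
    {"id": 3, "city_code": "GZ", "name": "Guangzhou"},
]

# Clustered index: full rows kept in primary key order.
clustered = sorted(rows, key=lambda r: r["id"])

# uniq_city_code leaves: (city_code, id) pairs kept in city_code order.
uniq_city_code = sorted((r["city_code"], r["id"]) for r in rows)
COVERED_COLUMNS = {"city_code", "id"}


def select_limit_1(columns):
    """Return the first row a scan meets for `select <columns> ... limit 1`."""
    wanted = list(rows[0]) if columns == ["*"] else list(columns)
    if set(wanted) <= COVERED_COLUMNS:
        # Covered: scan the secondary index and never touch the full rows.
        city_code, pk = uniq_city_code[0]
        first = {"city_code": city_code, "id": pk}
    else:
        # Not covered: scan the clustered index (a full table scan).
        first = clustered[0]
    return {col: first[col] for col in wanted}


only_id = select_limit_1(["id"])                    # first entry in city_code order
all_columns = select_limit_1(["*"])                 # first row in id order
id_and_code = select_limit_1(["id", "city_code"])   # covered as well
```

In this model the id-only query returns id 2 because "BJ" sorts first in the secondary index, while select * returns the row with id 1. Without an ORDER BY, SQL makes no promise about which row limit 1 returns, so both answers are legal.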

Case3: select id, city_code from cities limit 1;

Execution Result:

According to the theory above, Case1 (selecting id) and Case3 (selecting id and city_code) should produce the same query plan.

The experiment confirms the conclusion: the Case1 query is indeed served by a covering index.

Let's ask one more question: why use a covering index at all? What happens if we don't?

Let's take a look at MySQL's official explanation…

After all those words, the essence is in the last sentence: this can make the query faster!

Now let's think about one more thing:

"Why doesn't the optimizer choose the primary key index, since it contains all the data columns and could cover the query just as well?"

This is a classic question about how MySQL chooses an index.

When a query would otherwise scan the whole table, MySQL calls find_shortest_key() to pick the shortest index that covers the query and scans that index instead.

For the find_shortest_key() function, the official explanation is as follows:

So the answer to the question above is: when several indexes could be used, MySQL picks the one with the shortest key length.

A secondary index leaf holds only the indexed columns plus the primary key, whereas the primary key (clustered) index holds every column of the row. A secondary index is therefore a subset of the primary key index and is smaller, so scanning it in place of a full table scan saves IO.
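The selection rule can be sketched roughly as follows. This is a simplified Python model of the behavior described above, not the server's actual source: the Index class, the key lengths, and the idx_name_code index are assumptions made up for illustration, and the real function handles more cases than this.

```python
from dataclasses import dataclass


@dataclass
class Index:
    name: str
    key_length: int              # bytes per key; the numbers below are made up
    is_clustered_pk: bool = False


def find_shortest_key(covering_indexes):
    """Pick the covering secondary index with the smallest key length.

    The clustered primary key is set aside during the comparison: its key
    is short, but scanning it means reading every full row. It is used
    only when no secondary index covers the query.
    """
    best = None
    clustered_pk = None
    for index in covering_indexes:
        if index.is_clustered_pk:
            clustered_pk = index
            continue
        if best is None or index.key_length < best.key_length:
            best = index
    return best if best is not None else clustered_pk


# Hypothetical indexes that all cover `select id from cities`.
candidates = [
    Index("PRIMARY", 4, is_clustered_pk=True),
    Index("uniq_city_code", 32),
    Index("idx_name_code", 160),  # hypothetical wider index
]
chosen = find_shortest_key(candidates)
fallback = find_shortest_key([Index("PRIMARY", 4, is_clustered_pk=True)])
```

With these made-up candidates the sketch picks uniq_city_code, which matches the plan seen in Case1; when only the primary key is usable, as in Case2, it falls back to scanning the clustered index.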

Conclusion

OK, let's wrap up together.

1) First, we ran into a query problem: selecting different columns returned different rows;

2) We investigated and found that MySQL chooses different index strategies depending on the selected columns, which is why the results differ;

3) To check whether a covering index was involved, we ran Case3, which confirmed it;

4) As for why MySQL chooses indexes this way, we found that a secondary index is a subset of the primary key index and therefore smaller, so to save IO MySQL prefers a covering secondary index over a full table scan.

Key takeaways:

How different engines implement indexes and queries, covering indexes, and MySQL's index selection principle.

How different engines implement indexes and queries, covering indexes, and MySQL's index selection principle.

How different engines implement indexes and queries, covering indexes, and MySQL's index selection principle.

Important things are worth saying three times, hahaha ~

Falling into pits is also a way to grow!

When you face a problem, don't rush to a conclusion. Try to analyze it in depth and understand its essence before looking for a solution. I hope you will build the ability to analyze and solve problems and to reason by analogy, rather than learning a single trick and copying it blindly.

If you still have questions, you can follow my WeChat public account "The Road to Architecture Improvement" to get my personal WeChat contact. Let's keep in touch; I look forward to growing together with you.

– END –

Author: The Road to Architecture Improvement, focusing on software architecture research, technical learning, and personal growth. Follow me and send me the private message "01" to receive a growth and advancement gift pack for programmers. Feel free to get in touch.

Thanks for reading!