Contents

  • Preface
  • What is an index
  • What can an index do for us?
  • What types of indexes are there?
  • B-tree index and B+ tree index
    • B-tree
    • B+ tree
  • How do you create a high-performance index?
    • Prefix index and index selectivity
    • Composite index
      • The leftmost prefix principle
    • Clustered index
    • Covering index
    • Delete redundant and duplicate indexes
  • How do I view information about an index?
    • Index information
    • Index size
    • Index fragmentation
  • References

Preface

People online often say that learning MySQL comes down to two topics: indexes and transactions. After studying MySQL recently, I think there are three: indexes, queries, and transactions. By queries I mainly mean query optimization, that is, writing efficient SQL statements.

This article documents what I learned about MySQL indexes. It is mainly my understanding of, and extensions to, the book High Performance MySQL.

What is an index

An index is a data structure used by a storage engine to quickly find records.

This is the official MySQL definition of an index. It tells us that an index is a data structure, but how should we understand it? A common analogy is the table of contents of a book. We are all used to it: when we pick up a book, we look at the table of contents first, and when we want to find a particular topic, we look it up there, find the page number of the section, and turn to that page. Without the table of contents, we would have to search page by page.

In MySQL, suppose we have a table with the following records:

id  name       age
1   huyan      10
2   huiui      18
3   lumingfei  20
4   chuzihang  15
5   nono       21

Suppose we want to find the name of the person aged 15. Without an index, we can only go through all the rows and compare them one by one, so the time complexity is O(n).

Now suppose that, while inserting data, we also maintain an extra array that stores the age values in sorted order. We get the following array:

[10, 15, 18, 20, 21]
  |   |   |   |   |
[x1, x4, x2, x3, x5]

Each x below an age stands for the simulated location of that row on disk. Now, to look up the name of the 15-year-old, we can do a binary search on the sorted array. As we all know, the time complexity of binary search is O(log n). Once the search finds the entry, the real row can be fetched from its location.
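The lookup described above can be sketched in a few lines of Python. This is only an illustration of the sorted-array idea, not how MySQL stores anything: the names ROWS, AGES, POSITIONS, and find_name_by_age are made up for this sketch, and the positions x1 to x5 are the simulated disk locations from the example.

```python
from bisect import bisect_left

# Rows from the example table, keyed by their simulated disk position.
ROWS = {
    "x1": (1, "huyan", 10),
    "x2": (2, "huiui", 18),
    "x3": (3, "lumingfei", 20),
    "x4": (4, "chuzihang", 15),
    "x5": (5, "nono", 21),
}

# The "index": ages kept in sorted order, plus the position of each row.
AGES = [10, 15, 18, 20, 21]
POSITIONS = ["x1", "x4", "x2", "x3", "x5"]


def find_name_by_age(age):
    """Binary-search the sorted ages, then fetch the row by its position."""
    i = bisect_left(AGES, age)
    if i == len(AGES) or AGES[i] != age:
        return None  # no row with this age
    _id, name, _age = ROWS[POSITIONS[i]]
    return name


print(find_name_by_age(15))
```

The search touches only about log2(n) entries of AGES before it reads a single row, which is where the O(log n) cost comes from.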

A real MySQL index is not an array but a B+ tree (see below).

What can an index do for us?

As mentioned above, an index helps us find data quickly. Second, because the values in an index are stored in order, it also helps with ORDER BY operations. Finally, the index stores the actual column values, so some queries can be answered from the index alone (this is the idea of a covering index, described below).

To summarize the advantages of indexes (as summarized in High Performance MySQL):

  • They reduce the amount of data a query needs to scan (which speeds up the query).
  • They help the server avoid sorting and temporary tables (which speeds up operations such as GROUP BY and ORDER BY).
  • They turn random I/O into sequential I/O (which speeds up the query).

What are the disadvantages of indexes?

First, an index is also data and has to be stored, so it takes up extra storage space. Second, the index has to be maintained on every insert, update, and delete, which adds time overhead.

To sum up:

  • Indexes take up disk or memory space.
  • Indexes slow down insert, update, and delete operations.

In practice, as long as the data stays within a reasonable range and there are not too many indexes, the cost of an index is far smaller than its benefit. We still need to guard against overusing indexes.

What types of indexes are there?

In MySQL, indexes are implemented not at the server layer but by the storage engine, so different storage engines support different types of indexes. InnoDB, the most widely used storage engine, uses B+ tree indexes, so most of the time that is the kind of index we are talking about.

MySQL has the following index types:

  • B-tree index / B+ tree index
  • Hash index
  • Spatial data index
  • Full-text index

This article only looks at B-tree and B+ tree indexes.

B-tree index and B+ tree index

The data structures behind B-trees and B+ trees are not explained in detail here. Interested readers can consult the references at the end of this article or look them up on Google.

B-tree

A B-tree is a balanced multiway search tree. A B-tree of order m has the following properties:

  1. The root node has at least two children (unless it is itself a leaf).
  2. Each internal node contains k-1 elements and k children, where m/2 (rounded up) <= k <= m.
  3. Each leaf node contains k-1 elements, where m/2 (rounded up) <= k <= m.
  4. All leaf nodes are on the same level.
  5. The elements in each node are sorted in ascending order, so the k-1 elements exactly divide the value ranges covered by the k children.

If this is a little confusing, think of a B-tree as a shorter, fatter binary search tree.
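To make the properties above more concrete, here is a minimal Python sketch of a search in a B-tree. It only covers lookup in a hand-built tree of order 3; insertion and node splitting are left out, and BTreeNode and btree_search are names invented for this sketch.

```python
from bisect import bisect_left


class BTreeNode:
    """A node holding sorted keys; internal nodes have len(keys) + 1 children."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # an empty list means this is a leaf


def btree_search(node, key):
    """Return True if key is stored anywhere in the subtree rooted at node."""
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True  # in a B-tree a key can be found in an internal node
        if not node.children:
            return False  # reached a leaf without finding the key
        node = node.children[i]  # descend into the child covering this range


# A hand-built B-tree of order 3: two keys in the root split the values
# into three ranges, and all leaves sit on the same level.
root = BTreeNode(
    [15, 20],
    [BTreeNode([10]), BTreeNode([18]), BTreeNode([21, 25])],
)

print(btree_search(root, 18), btree_search(root, 19))
```

Each step picks one of k children instead of one of two, so the tree stays shallow and a lookup needs only a few node visits.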

B+ tree

A B+ tree is an enhanced version of the B-tree. It adds the following restrictions on top of the B-tree:

  1. Internal nodes do not hold data and are used only for navigation, which means every value in a non-leaf node also has a copy in a leaf node.
  2. The leaf nodes are linked to one another in key order.

What are the benefits?

  1. Internal nodes do not hold data, so each node can hold more index entries, which reduces the number of disk I/Os.
  2. Because internal nodes do not hold data, every search ends at a leaf node, and all leaf nodes are on the same level, so query performance is more stable.
  3. All leaf nodes are linked in order into a linked list, which makes range queries easy.
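The two restrictions and the range-query benefit can be sketched as follows. This is a toy, hand-built two-level tree in Python, not InnoDB's page format; Leaf, Internal, find_leaf, and range_scan are names made up for the illustration, and the data is the age column from the earlier example table.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys        # sorted keys
        self.values = values    # the data stored for each key
        self.next = None        # link to the next leaf in key order


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no row data
        self.children = children  # len(keys) + 1 children


def find_leaf(node, key):
    """Every search walks from the root all the way down to a leaf."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, low, high):
    """Collect values with low <= key <= high by following the leaf links."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return result
            if k >= low:
                result.append(v)
        leaf = leaf.next
    return result


# Leaves hold the data; the separator keys 18 and 21 in the root are
# copies of keys that also appear in the leaves.
l1 = Leaf([10, 15], ["huyan", "chuzihang"])
l2 = Leaf([18, 20], ["huiui", "lumingfei"])
l3 = Leaf([21], ["nono"])
l1.next, l2.next = l2, l3
root = Internal([18, 21], [l1, l2, l3])

print(range_scan(root, 15, 20))
```

The range scan descends the tree once to find the starting leaf and then only follows next pointers, which is the reason linked leaves make range queries cheap.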

How do you create a high-performance index?

Since index tuning and query tuning are generally inseparable, this section also touches on some query tuning.

Prefix index and index selectivity

If you want to index a long string column, consider using a prefix index. Before introducing prefix indexes, let’s look at how an index lookup works. When the database uses an index to search, it generally takes the following steps:

  1. Find the value in the index B+ tree, for example the school name Cassel College, and get the address of the row on disk.
  2. Read the row from disk at that address to get all of its column values.

So if Cassel alone uniquely identifies this row among all school name values, can an index on Cassel achieve the same effect as an index on Cassel College?

The answer is yes, and indexing only Cassel shrinks the index entry by about 60%. That is what prefix indexes are for.

Prefix index: when you index a long string column, you can index only the first part of the string. This saves a lot of index space and makes the index more efficient, but it also lowers index selectivity.

Index selectivity: the number of distinct values divided by the total number of values. Selectivity lies between 0 and 1, and it is highest when the column is unique with no duplicate values. That is why a unique index is the most efficient.

In general, though, a long string column has a prefix that is selective enough, and we can work out how long it needs to be. Use the following statement:

select 
    count(distinct left(school_name,3)) /count(*) as sch3, 
    count(distinct left(school_name,4)) /count(*) as sch4,
    count(distinct left(school_name,5)) /count(*) as sch5,
    count(distinct school_name)/count(*) as original
from 
    user;

Here original is the selectivity of the full column, and sch3, sch4, and sch5 are the selectivities when the first 3, 4, and 5 characters of the column are used as the index. Keep increasing the prefix length; once the selectivity is close to the original value, that length is suitable for the prefix index. (This holds in general, but there are exceptions: when the data is extremely skewed, such a prefix index can perform poorly for particular values.)
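The same calculation can be sketched outside the database. The Python below mirrors the selectivity formula (distinct values divided by all values) and searches for the shortest prefix that comes close to the full column. The sample school names and the tolerance of 0.05 are assumptions made up for this sketch; what counts as "close enough" is a judgment call on real data.

```python
def selectivity(values, prefix_len=None):
    """Distinct values divided by total values, optionally on a prefix."""
    if prefix_len is not None:
        values = [v[:prefix_len] for v in values]
    return len(set(values)) / len(values)


def shortest_good_prefix(values, tolerance=0.05):
    """Shortest prefix whose selectivity is within `tolerance` of the full column."""
    full = selectivity(values)
    longest = max(len(v) for v in values)
    for n in range(1, longest + 1):
        if full - selectivity(values, n) <= tolerance:
            return n
    return longest


# Made-up sample data, for illustration only.
schools = ["cassel", "cassel", "castle", "kyoto", "kyushu", "keio", "oxford", "osaka"]

for n in (3, 4, 5):
    print(n, selectivity(schools, n))
print("original", selectivity(schools))
print("suggested prefix length", shortest_good_prefix(schools))
```

On real tables the SQL statement is the better tool, since it runs where the data lives; the sketch is only meant to show what the numbers mean.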

Once the length is chosen, create the prefix index:

alter table user add index sch_pre3(school_name(3));

Note: a prefix index is hard to combine with a covering index. I tried exactly that this morning while tuning an index and failed. The reason is explained after covering indexes are introduced below.

Composite index

Because queries vary, we often need to index more than one column. We can either create several independent indexes or create one composite (multi-column) index. Most of the time a composite index is the better choice.

Take the query select * from user where school_name = 'kassel' and age > 20. If we create two separate indexes on school_name and age, we would expect the query to use both of them, but the explain command shows that this is not necessarily the case. How the optimizer decides is something of a black art, and I have not fully figured it out.

In theory, MySQL 5.0 and later support index merge, which uses two indexes at the same time. The optimizer may see it differently, though: it may estimate that searching two B+ trees costs more than searching one index and then filtering the rows fetched from the table, so it chooses only one index. (I ran tests like this on 5 of my own tables, and the result was that only one index was used.)

In this case we can create a composite index instead:

alter table user add index school_age(school_name, age);

Note that a query that filters only on age, such as select * from user where age = 20, cannot use the composite index above.

Leaving specific queries aside, the most selective column should come first in a composite index. In practice, however, we usually work backwards from the queries to the index, so that a given query can use the index as fully as possible. After all, the purpose of an index is to speed up queries.

So a composite index is usually tuned for one particular statement or a few of them; there is no universal rule.

The leftmost prefix principle

MySQL can use the school_age index on a column only where the values of that column are in order. Suppose the index contains the following entries (school is short for school_name):

school  age
a       12
b       12
b       14
b       15
c       1

In this data, the school column is ordered across the whole index, so a condition on school can use the index.

The age column is not ordered across the whole index, so a condition on age alone cannot use the index directly. Looking at the table, when is age ordered? Once school has been matched. For example, when school = b, the ages of those three entries are in order, so the age part of the index can be used. That is how the leftmost prefix works.

In addition, a leftmost-prefix match can use at most one range condition. For example, in select * from user where school > 'a' and age > 12, only the school condition can use the index. This also follows from the data above: when school is matched by a range, MySQL cannot be sure that age is ordered within that range (for school > 'a' the ages run 12, 14, 15, 1), so the index columns after the range column cannot be used.
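The reasoning above can be checked with a small Python model in which the composite index is just a sorted list of (school, age) tuples. This is a simplification (a list instead of a B+ tree), and the function names are invented for the sketch, but the ordering argument is the same.

```python
from bisect import bisect_left, bisect_right

# Entries of the (school, age) index from the table above, in sorted order.
INDEX = [("a", 12), ("b", 12), ("b", 14), ("b", 15), ("c", 1)]
INF = float("inf")


def seek_school(school):
    """school = ?  -> one contiguous slice, found by binary search."""
    lo = bisect_left(INDEX, (school,))
    hi = bisect_left(INDEX, (school, INF))
    return INDEX[lo:hi]


def seek_school_age_gt(school, age):
    """school = ? AND age > ?  -> still one contiguous slice."""
    lo = bisect_right(INDEX, (school, age))
    hi = bisect_left(INDEX, (school, INF))
    return INDEX[lo:hi]


def seek_school_gt(school):
    """school > ?  -> a contiguous slice, but age is unordered inside it."""
    return INDEX[bisect_right(INDEX, (school, INF)):]


def ages_sorted(entries):
    ages = [age for _school, age in entries]
    return ages == sorted(ages)


print(ages_sorted(INDEX))                # age alone is not ordered
print(ages_sorted(seek_school("b")))     # ordered once school is fixed
print(seek_school_age_gt("b", 12))       # both columns narrow the slice
print(ages_sorted(seek_school_gt("a")))  # a range on school breaks the order of age
```

Within the slice returned by seek_school_gt the ages are not sorted, so a condition on age can only be applied by checking every entry in that slice, which matches the rule that columns after a range column cannot be used for seeking.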

Clustered index

A clustered index is not a separate type of index but a way of storing data. InnoDB’s clustered index stores the index and the row data in the same structure.

Because the actual rows can be physically ordered in only one way, a table can have only one clustered index. InnoDB uses the primary key as the clustered index. If there is no primary key, InnoDB picks a unique index whose columns are all non-null. If there is no such index either, InnoDB generates a hidden primary key and clusters on that. InnoDB insists on a clustered index because the rows of a table always have exactly one physical order on disk, so one is always needed.

This is also why auto-increment primary keys are recommended for InnoDB: the keys increase monotonically, so each insert simply appends to the end of the data. Now imagine using a UUID as the primary key. Each insert has to find where the new key belongs among the sorted keys, insert it there, and move the data that follows so that rows stay in primary key order, which is very expensive.
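The difference between the two kinds of key can be illustrated with a rough Python experiment. It models the clustered data as one flat sorted list and counts how many existing entries sit behind each insertion point. Real InnoDB works with pages and page splits rather than shifting a whole list, so this is only a sketch of the idea; insert_positions is a name made up here.

```python
import uuid
from bisect import bisect_left


def insert_positions(keys):
    """Insert keys one by one into a sorted list and record, for each insert,
    how many existing entries come after the insertion point (0 = pure append)."""
    ordered = []
    shifted = []
    for key in keys:
        pos = bisect_left(ordered, key)
        shifted.append(len(ordered) - pos)
        ordered.insert(pos, key)
    return shifted


sequential = insert_positions(range(1, 1001))
random_keys = insert_positions(str(uuid.uuid4()) for _ in range(1000))

print(sum(sequential))   # every auto-increment style key lands at the end
print(sum(random_keys))  # random keys mostly land somewhere in the middle
```

With increasing keys nothing ever has to move, while random keys keep landing in the middle of already ordered data.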

For this reason, the “data” stored in the leaf nodes of the other (secondary) indexes is not the physical address of the row but its primary key. After the primary key is found, a second lookup in the clustered index by primary key fetches the row.
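The two-step lookup can be sketched with plain Python dictionaries standing in for the two B+ trees. CLUSTERED, AGE_INDEX, and names_by_age are illustrative names only, and the rows are taken from the example table at the start of the article.

```python
# Clustered index: primary key -> full row (the row data lives here).
CLUSTERED = {
    1: {"id": 1, "name": "huyan", "age": 10},
    2: {"id": 2, "name": "huiui", "age": 18},
    3: {"id": 3, "name": "lumingfei", "age": 20},
    4: {"id": 4, "name": "chuzihang", "age": 15},
    5: {"id": 5, "name": "nono", "age": 21},
}

# Secondary index on age: age -> primary keys, not physical addresses.
AGE_INDEX = {10: [1], 15: [4], 18: [2], 20: [3], 21: [5]}


def names_by_age(age):
    primary_keys = AGE_INDEX.get(age, [])  # lookup 1: the secondary index
    # lookup 2: the clustered index, once per primary key found
    return [CLUSTERED[pk]["name"] for pk in primary_keys]


print(names_by_age(15))
```

Because the secondary index stores primary keys rather than addresses, rows can move inside the clustered index without the secondary index having to be updated.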

The difference between clustered and non-clustered indexes can be illustrated with a simple example:

When we pick up a book, the table of contents plays the role of the primary key, a clustered index: chapters that are consecutive in the table of contents are also consecutive in the body. To read the chapter The Great Escape into the Sun, we only need to find its page number in the table of contents, say 459, and then turn to that page.

A non-clustered index, on the other hand, is like the index of proper nouns in the appendix at the back of the book (an ordinary secondary index). When you look up Bundalev, the appendix tells you that the name appears in the chapter The Great Escape into the Sun, and you then have to go back to the table of contents (the primary key index) to find the page number.

Covering index

When an index contains (or covers) the values of all the columns a query needs, it is called a covering index.

Imagine a query statement like this:

select 
  school_name,age
from  
  user
where 
  school_name = 'Oriole''s Tail Academy';

As described above, a lookup through a secondary index normally finds the primary key first and then goes back to the table to fetch the row by primary key. But if the index already contains every column the query returns, as the school_age index does here, there is no need to go back to the table. Besides, an index is usually much smaller than the row data, so a covering index can greatly reduce the amount of data loaded from disk.

Why can’t prefix indexes and covering indexes be used together?

A prefix index uses the prefix to stand in for the real value. Even when the two differ very little in selectivity, MySQL still cannot tell from the prefix what the real value is. For example, alibaba and alimama are identical when the prefix length is 2. To make sure that a query for alibaba does not return rows for alimama, MySQL has to go back to the table, read the full value, and filter on an exact match.

Therefore a covering index cannot be built on a column prefix index, which is what a morning of testing taught me.
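A small Python model makes the difference visible by counting how often the table has to be read. Everything here is a stand-in made up for the sketch: TABLE plays the clustered index, FULL_INDEX an index on the full (school_name, age) values, and PREFIX_INDEX an index on the first two characters of school_name. The indexes are scanned linearly for brevity; a real index would seek.

```python
TABLE = {
    1: {"school_name": "alibaba", "age": 20},
    2: {"school_name": "alimama", "age": 22},
    3: {"school_name": "cassel", "age": 18},
}

# Index on the full (school_name, age) values: entry -> primary key.
FULL_INDEX = [(("alibaba", 20), 1), (("alimama", 22), 2), (("cassel", 18), 3)]

# Prefix index on school_name(2): only the first two characters are stored.
PREFIX_INDEX = [("al", 1), ("al", 2), ("ca", 3)]


def query_with_covering_index(name):
    """Answer SELECT school_name, age from the index entries alone."""
    table_reads = 0
    rows = [key for key, _pk in FULL_INDEX if key[0] == name]
    return rows, table_reads


def query_with_prefix_index(name):
    """The prefix only narrows the candidates; each one must be checked in the table."""
    table_reads = 0
    rows = []
    for prefix, pk in PREFIX_INDEX:
        if prefix == name[:2]:
            table_reads += 1
            row = TABLE[pk]
            if row["school_name"] == name:  # exact match needs the real value
                rows.append((row["school_name"], row["age"]))
    return rows, table_reads


print(query_with_covering_index("alibaba"))
print(query_with_prefix_index("alibaba"))
```

Both functions return the same row, but the prefix version reads the table for every candidate that shares the prefix, including the alimama row it then discards.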

Delete redundant and duplicate indexes

Some indexes are never used by any query, yet they still add overhead every time data is inserted. Such indexes should be deleted promptly.

For example, creating an ordinary index on the primary key is of no use.

Likewise, because of the leftmost prefix principle, the composite index school_age already serves as an index on school, so a separate index on school alone is redundant.
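A check for this kind of redundancy can be sketched in Python. The sketch treats an index as nothing more than an ordered list of column names and flags any index whose columns are a leftmost prefix of another index. It deliberately ignores uniqueness, index type, and prefix lengths, which matter in practice, and the function names and the sample index definitions are hypothetical.

```python
def is_leftmost_prefix(short, long):
    """True if the columns of `short` are the leading columns of `long`."""
    return len(short) <= len(long) and list(long[: len(short)]) == list(short)


def redundant_indexes(indexes):
    """Names of indexes whose columns are a leftmost prefix of another index.
    For exact duplicates, only the later name (alphabetically) is reported."""
    redundant = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if name == other:
                continue
            if is_leftmost_prefix(cols, other_cols) and (
                len(cols) < len(other_cols) or name > other
            ):
                redundant.append(name)
                break
    return redundant


# Hypothetical index definitions on the user table.
indexes = {
    "school": ["school"],
    "school_age": ["school", "age"],
    "age": ["age"],
}
print(redundant_indexes(indexes))
```

The index on age is not reported, because age is not the leftmost column of school_age and the composite index cannot serve lookups on age alone.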

How do I view information about an index?

Index information

Run show index from table_name to view the indexes on a table.

Or run show create table table_name and look at the table definition, which includes the clauses that create the indexes.

Index size

In versions after 5.0, more detailed data is available in the information_schema.tables table.

The fields of this table have the following meanings:

Field            Meaning
Table_catalog    Catalog the table belongs to
Table_schema     Name of the database the table belongs to
Table_name       Name of the table
Table_type       Table type (for example, system view)
Engine           Storage engine used (for example, MyISAM)
Version          Version, default value 10
Row_format       Row format (for example, Compact)
Table_rows       Number of rows stored in the table
Avg_row_length   Average row length
Data_length      Length of the data, in bytes
Max_data_length  Maximum data length
Index_length     Length of the indexes, in bytes
Data_free        Allocated but unused space (fragmentation)
Auto_increment   Next value of the auto-increment primary key
Create_time      Creation time of the table
Update_time      Last update time of the table
Check_time       Last check time of the table
Table_collation  Collation of the table
Checksum         Checksum
Create_options   Creation options
Table_comment    Table comment

We can use some query statements to obtain detailed information, such as:

-- Total size of all indexes on the current MySQL server (in MB; the raw values are in bytes)
SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024), 2), ' MB') AS 'Total Index Size'
FROM information_schema.TABLES;

-- Total index size of one database
SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024), 2), ' MB') AS 'Total Index Size'
FROM information_schema.TABLES
WHERE table_schema = 'XXX';

-- Index size of one table
SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024), 2), ' MB') AS 'Total Index Size'
FROM information_schema.TABLES
WHERE table_schema = 'yyyy' AND table_name = 'xxxxx';

-- Data size and index size of each table in a database
SELECT CONCAT(table_schema, '.', table_name) AS 'Table Name',
       CONCAT(ROUND(table_rows/1000000, 4), 'M') AS 'Number of Rows',
       CONCAT(ROUND(data_length/(1024*1024*1024), 4), 'G') AS 'Data Size',
       CONCAT(ROUND(index_length/(1024*1024*1024), 4), 'G') AS 'Index Size',
       CONCAT(ROUND((data_length+index_length)/(1024*1024*1024), 4), 'G') AS 'Total'
FROM information_schema.TABLES
WHERE table_schema LIKE 'xxxxx';

The tables view holds other information about the tables themselves as well, but since that is not relevant to this article, I will not give examples here.

Note: the values in the table above are cached statistics. After changing indexes, it is best to run analyze table XXXX before checking them. MySQL refreshes them on its own only when the table data changes significantly (the size changes by more than 1/16, or 2 billion rows are inserted).

Index fragmentation

Creating and deleting index entries inevitably produces index fragmentation, and data fragmentation too. We can clean it up by running optimize table XXXX, or by rebuilding the table with alter table XXXX engine=innodb.

References

  • High Performance MySQL (3rd edition)
  • B-tree
  • B+ tree














All of the above is my personal understanding. If you find any mistakes, corrections in the comments are welcome.

Reprints are welcome; please credit the author and keep the original link.

Contact email: [email protected]

For more study notes, see my personal blog: HuYan ten