Is it really hard to get started with MySQL indexing?

  • Is it really hard to get started with MySQL indexing?
    • Why indexes exist
    • Types of indexes
      • The hash index
      • Binary tree
      • Skip list
      • B+Tree
    • Classification of indexes
      • By function
        • Normal index
        • The primary key index
        • The unique index
        • The prefix index
        • The full-text index
      • By the number of columns
        • Joint index
        • The leftmost prefix
      • From the disk point of view
        • Clustered index, non-clustered index
          • Back to the table
          • Index pushdown
    • Conclusion

During development, when colleagues complain that a query is slow, the first reaction is often to add an index to the table. So what exactly is the index we talk about so often? And does adding one really solve every database query problem?

With these questions in mind, let's explore what an index is in MySQL and what it can do for us.

Why indexes exist

In today's applications the database is an indispensable part of storage, and what we do with it comes down to inserting, deleting, updating, and querying. As the amount of data grows, database performance becomes the top concern. Queries must not be slow, because a slow query means a poor user experience.

How do we keep inserts, deletes, updates, and queries efficient as data accumulates? That question makes indexing an essential part of the design.

In a data store like MySQL, one thing always comes up: the index. An index is similar to a book's table of contents. The table of contents helps you quickly find the page a topic is on, and an index serves the same purpose: it helps the database quickly locate data.

So the purpose of an index can be summed up as improving the speed of data retrieval.

Types of indexes

Since indexes speed up retrieval, should we add an index for every query to make everything faster? Not so fast. There are many kinds of indexes, and if they are used incorrectly, they can drag the whole database down with slow queries instead of speeding it up.

Many data structures can improve read and write efficiency. Let's look at the ones commonly used for database indexes.

[Figure: index types: hash index, binary tree, skip list, B+Tree]

The hash index

A hash index is essentially a key-value model: given a key, you can find the corresponding value directly, which is fast and convenient.

Under the hood, a hash index runs the key through a hash function that converts it into a storage location. As the amount of data grows, different keys will inevitably be mapped to the same storage location (a hash collision).

How is this handled? The common approach is chaining: entries that land in the same location are appended to a linked list hanging off that location.

When more and more entries pile up in the same location, a lookup has to walk that linked list, which becomes slow. At that point the linked list can be converted into a balanced binary search tree, and tree lookups remain fast.


Hash indexes are only suitable for equality lookups, not for range queries or sorting. A common place you meet hash-based lookup is Redis.
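To make the chaining idea and the equality-only limitation concrete, here is a minimal sketch of a hash index with separate chaining. It is a toy, not how any database implements its hash indexes. The class name `ChainedHashIndex` and the small bucket count of 8 are assumptions chosen so that collisions are easy to see. Equality lookups hash once and walk one short chain. A range query gets no help from the hash, so it has to visit every bucket.

```python
class ChainedHashIndex:
    """Toy hash index with separate chaining (hypothetical, for illustration only)."""

    def __init__(self, num_buckets=8):
        # each bucket is a list that acts as the collision chain
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # the hash function maps a key to a storage location
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # same key: overwrite in place
                return
        chain.append((key, value))  # collision or new key: append to the chain

    def get(self, key):
        # equality lookup: hash once, then walk a single chain
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def range(self, low, high):
        # range query: hashing destroys order, so every bucket must be scanned
        hits = [(k, v) for chain in self.buckets for k, v in chain if low <= k <= high]
        return sorted(hits)


idx = ChainedHashIndex()
for user_id in range(20):
    idx.put(user_id, f"user-{user_id}")

print(idx.get(7))       # user-7
print(idx.range(3, 5))  # [(3, 'user-3'), (4, 'user-4'), (5, 'user-5')]
```

With 20 keys and 8 buckets, every chain holds more than one entry. This is the situation where a real implementation might convert a long chain into a tree.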

Binary tree

Trees are a family of data structures, and among the many kinds of tree the most commonly used is the binary tree. In a binary tree each node has at most two children, a left child and a right child. Trees with more branches also exist, such as the quadtree and the octree, and you can imagine what an octree looks like.

In a binary search tree, the value of the left child < the parent < the value of the right child. To find a value, you start at the root and move to the left or right child depending on the comparison, until you reach the value or run out of nodes.
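The lookup rule above can be sketched in a few lines. The `Node`, `insert`, and `search` names are hypothetical. `search` also counts how many nodes it visits, which matters in the next paragraph: if each node lived on disk, every visit would be one disk read.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None   # subtree with smaller keys
        self.right = None  # subtree with larger (or equal) keys


def insert(root, key):
    # recursive insert that preserves left < parent <= right
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def search(root, key):
    """Return (found, number_of_nodes_visited)."""
    visits, node = 0, root
    while node is not None:
        visits += 1
        if key == node.key:
            return True, visits
        node = node.left if key < node.key else node.right
    return False, visits


root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)

print(search(root, 60))  # (True, 3): 50 -> 70 -> 60
print(search(root, 65))  # (False, 3): 50 -> 70 -> 60 -> no right child
```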

As the amount of data grows, so does the height of the binary tree. Database data is not kept in memory but on disk, and disk access is orders of magnitude slower than memory access.

Suppose the tree has a height of 30 and each disk access takes 10 ms. Finding a record can then take up to 30 disk accesses, one per level: 30 × 10 ms = 300 ms.

With even more data, for example a tree of height 100, a single lookup becomes very expensive.

To solve this, you can use an N-ary (multi-way) tree. It lowers the height of the tree, which reduces the number of disk accesses and improves efficiency.
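The height argument can be checked with a little arithmetic. The sketch below uses a rough model in which a tree with L levels reaches about fan-out^L entries. It assumes one billion entries and a 1000-way tree; both numbers are illustrative, and only the 10 ms per access comes from the text. `min_levels` is a hypothetical helper.

```python
ROWS = 10**9          # assumption: one billion index entries
DISK_ACCESS_MS = 10   # cost per disk access, as in the example above


def min_levels(num_keys, fanout):
    """Rough model: a tree with L levels reaches about fanout**L entries."""
    levels, reachable = 1, fanout
    while reachable < num_keys:
        reachable *= fanout
        levels += 1
    return levels


for fanout in (2, 1000):  # binary tree vs. an assumed 1000-way tree
    levels = min_levels(ROWS, fanout)
    print(f"fan-out {fanout}: {levels} levels, about {levels * DISK_ACCESS_MS} ms per lookup")
# fan-out 2: 30 levels, about 300 ms per lookup
# fan-out 1000: 3 levels, about 30 ms per lookup
```

With the same data, widening each node from 2 to 1000 branches cuts the number of levels, and so the number of disk reads, by a factor of ten.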

Skip list

A skip list is a data structure built on multiple levels of linked lists. A search starts on the sparsest top level and drops down one level at a time, which makes retrieval much faster than walking a single linked list.
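Here is a minimal skip list sketch that shows the level-by-level search. The class names, the maximum of 8 levels, and the 1/2 promotion probability are assumptions for illustration. The random generator is seeded only to make runs repeatable; the search result does not depend on the random levels.

```python
import random


class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i]: next node on level i


class SkipList:
    MAX_LEVEL = 8

    def __init__(self, seed=0):
        self.head = SkipNode(None, self.MAX_LEVEL)  # sentinel present on every level
        self.level = 1                               # levels currently in use
        self.rng = random.Random(seed)

    def _random_level(self):
        # each extra level is kept with probability 1/2
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def _descend(self, key):
        # move right while the next key is smaller, then drop one level down
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node  # last node before `key` on level i
        return update

    def insert(self, key):
        update = self._descend(key)
        new_level = self._random_level()
        self.level = max(self.level, new_level)
        node = SkipNode(key, new_level)
        for i in range(new_level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def contains(self, key):
        candidate = self._descend(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def keys(self):
        # the bottom level is an ordinary sorted linked list
        out, node = [], self.head.forward[0]
        while node is not None:
            out.append(node.key)
            node = node.forward[0]
        return out


sl = SkipList(seed=42)
for k in [9, 3, 7, 1, 11, 5]:
    sl.insert(k)

print(sl.keys())                       # [1, 3, 5, 7, 9, 11]
print(sl.contains(7), sl.contains(8))  # True False
```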

B+Tree

MySQL indexes are implemented by the storage engine. MySQL supports several storage engines, and the most commonly used one is InnoDB.

InnoDB implements its indexes with the B+Tree model. There is also a B-Tree model, and B+Tree evolved from it.

A B+Tree can be seen as an improved B-Tree:

Note that child nodes and leaf nodes are different concepts. Nodes that have no children are called leaf nodes.

  • In a B+Tree, the non-leaf (internal) nodes store only index keys, whereas in a B-Tree they also store data.
  • The leaf nodes of a B+Tree are connected by a linked list; those of a B-Tree are not.
  • In a B+Tree, all data is stored in the leaf nodes.
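The three differences above can be illustrated with a small, static B+Tree sketch. This is not InnoDB's implementation. The tree is bulk-loaded from sorted data with a tiny assumed order of 4 and supports no inserts, splits, or merges. All names (`Leaf`, `Internal`, `build`, `get`, `range_scan`) are hypothetical. Internal nodes hold only separator keys, rows live only in the leaves, and a range scan walks the leaf chain instead of going back up the tree.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, rows):
        self.keys = keys  # sorted keys
        self.rows = rows  # row data is stored only in leaves
        self.next = None  # leaves are chained left to right


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no row data
        self.children = children  # len(children) == len(keys) + 1


def first_key(node):
    # smallest key under a subtree: follow the leftmost child down to a leaf
    while isinstance(node, Internal):
        node = node.children[0]
    return node.keys[0]


def build(items, order=4):
    """Bulk-load a static B+Tree from a non-empty list of (key, row) sorted by key."""
    leaves = [Leaf([k for k, _ in items[i:i + order]], [r for _, r in items[i:i + order]])
              for i in range(0, len(items), order)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right  # link neighbouring leaves
    level = leaves
    while len(level) > 1:
        # each parent stores the first key of every child except the leftmost one
        level = [Internal([first_key(c) for c in level[i + 1:i + order]], level[i:i + order])
                 for i in range(0, len(level), order)]
    return level[0]


def find_leaf(node, key):
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def get(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return leaf.rows[i] if i < len(leaf.keys) and leaf.keys[i] == key else None


def range_scan(root, low, high):
    # descend once to the first leaf, then follow the leaf links
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k, r in zip(leaf.keys, leaf.rows):
            if k > high:
                return out
            if k >= low:
                out.append(r)
        leaf = leaf.next
    return out


root = build([(k, f"row-{k}") for k in range(1, 21)])
print(get(root, 13))            # row-13
print(range_scan(root, 6, 10))  # ['row-6', 'row-7', 'row-8', 'row-9', 'row-10']
```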

Each index in the database corresponds to its own B+Tree, so a table can have several B+Trees.

Both B+Tree and B-Tree are multi-way trees. The number of branches per node is determined by the page size, and inserts and deletes can cause pages to split or merge. Because the tree is read page by page, the index data does not all have to be held in memory, and the number of disk accesses stays low, which speeds up data access.
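To see how page size drives the number of branches, here is a back-of-the-envelope estimate. InnoDB's default page size is 16KB. The 8-byte BIGINT key, the 6-byte child-page pointer, and rows of about 1KB are assumptions commonly used for this kind of estimate, and page headers are ignored. Treat the result as an order of magnitude, not a measurement.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size (16KB)
KEY_BYTES = 8          # assumption: BIGINT primary key
POINTER_BYTES = 6      # assumption: size of a child-page pointer
ROW_BYTES = 1024       # assumption: about 1KB per row

fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)  # entries per internal page
rows_per_leaf = PAGE_SIZE // ROW_BYTES             # rows per leaf page

for height in (1, 2, 3):
    capacity = fanout ** (height - 1) * rows_per_leaf
    print(f"height {height}: about {capacity:,} rows")
# height 1: about 16 rows
# height 2: about 18,720 rows
# height 3: about 21,902,400 rows
```

Under these assumptions, a tree only three pages deep can address roughly twenty million rows, so a lookup touches about three pages.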

Classification of indexes

InnoDB is the most commonly used engine in MySQL, and it organizes indexes as B+Trees.

InnoDB tables are index-organized tables. So how many kinds of B+Tree indexes are there?

That depends on the angle you classify them from.

By function

By function, the main types are the normal index, primary key index, unique index, prefix index, full-text index, and hash index.

Normal index

Create and drop a normal index:

-- create a normal index
alter table table_name add index index_name(column);
-- drop it (either statement works)
drop index index_name on table_name;
ALTER TABLE table_name DROP INDEX index_name;

The primary key index

A primary key index is an index with two constraints: its values must be unique and cannot be NULL. InnoDB uses the primary key index to organize the table's data, so when using InnoDB it is recommended that every table has a primary key.

You can specify the primary key when creating a table, e.g. PRIMARY KEY(id), or create a composite primary key such as PRIMARY KEY(id, name).

SQL related to the primary key:

-- add a primary key to a table that does not have one yet
ALTER TABLE table_name ADD PRIMARY KEY (id);
-- drop the primary key
ALTER TABLE table_name DROP PRIMARY KEY;

The unique index

A unique index adds a uniqueness constraint on top of a normal index: whenever data is inserted, the database checks whether the indexed value already exists.

Create and drop it with the following statements:

-- create a unique index
ALTER TABLE table_name ADD UNIQUE INDEX index_name (column);
-- drop it
DROP INDEX index_name ON table_name;
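The insert-time check can be observed with Python's built-in `sqlite3` module. SQLite is used here only as a convenient stand-in for MySQL, and the `users` table and `uk_email` index are hypothetical. The principle is the same: a second insert of an existing value is rejected, and the row is not written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE UNIQUE INDEX uk_email ON users (email)")  # unique index on email
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

try:
    # same email again: the unique index check fails
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 1
```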

The prefix index

Strings such as email addresses come up constantly in programming, and some business scenarios need to match on the prefix of a string.

This raises a problem: if no index can be used, the only option is a full table scan, which becomes a performance bottleneck as the data grows.

A prefix index is the database's answer to string prefix matching: it indexes only the first few characters of the column.

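-- index only the first 6 characters of the column
ALTER TABLE table_name ADD INDEX index_name (column_name(6));
DROP INDEX index_name ON table_name;
-- compare the selectivity of the full column with candidate prefix lengths
SELECT COUNT(DISTINCT column_name) / COUNT(*) AS a,
       COUNT(DISTINCT LEFT(column_name, 100)) / COUNT(*) AS b,
       COUNT(DISTINCT LEFT(column_name, 110)) / COUNT(*) AS c
FROM table_name;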
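The selectivity query above compares how many distinct values survive when a column is cut to a given prefix length. A prefix whose ratio matches the full column distinguishes rows just as well. The same calculation in Python, on a small hypothetical sample of emails, looks like this; `prefix_selectivity` is an illustrative helper.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes / total rows: closer to 1.0 means the prefix separates rows better."""
    return len({v[:length] for v in values}) / len(values)


emails = [  # hypothetical sample data
    "alice@example.com", "alina@example.com", "alice.w@example.org",
    "bob@example.com", "bobby@example.net", "carol@example.com",
]

full = prefix_selectivity(emails, max(map(len, emails)))
for n in (3, 4, 5, 6):
    print(n, round(prefix_selectivity(emails, n), 2), "vs full", round(full, 2))

# shortest prefix that is as selective as the whole column
shortest = next(n for n in range(1, 20) if prefix_selectivity(emails, n) == full)
print(shortest)  # 6
```

On this sample a 6-character prefix is already as selective as the whole column, so a shorter, cheaper index would serve the same lookups.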

A disadvantage of prefix indexes is that they cannot be used as covering indexes: because the index holds only part of the value, the query must always go back to the table.

The full-text index

A full-text index addresses slow text matching in MySQL. It is typically used for fuzzy searches of the LIKE '%content%' kind, which cannot use any of the indexes described above.

The related SQL 👇

create fulltext index index_name
    on table_name (column1, column2);
alter table table_name
    add fulltext index index_name (column1, column2);

Note that full-text indexing has its own matching syntax, using the match and against keywords 👇.

select * from table_name where match(column1, column2) against('xxx xxx');
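A full-text index is built as an inverted index: a map from each word to the rows that contain it. A search therefore looks up words directly instead of scanning every row the way LIKE '%content%' does. The sketch below is a simplified model with hypothetical rows. It ignores stopwords, minimum word length, and relevance ranking, all of which MySQL's real implementation handles.

```python
import re
from collections import defaultdict

docs = {  # hypothetical rows: id -> text column
    1: "MySQL index basics",
    2: "B+Tree index structure in InnoDB",
    3: "Redis hash tables",
}


def tokenize(text):
    return re.findall(r"\w+", text.lower())


# inverted index: word -> set of row ids containing that word
inverted = defaultdict(set)
for row_id, text in docs.items():
    for word in tokenize(text):
        inverted[word].add(row_id)


def match_against(query):
    """Rows containing any query word (no ranking, unlike MATCH ... AGAINST)."""
    hits = set()
    for word in tokenize(query):
        hits |= inverted.get(word, set())
    return sorted(hits)


print(match_against("index"))         # [1, 2]
print(match_against("innodb redis"))  # [2, 3]
```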

By the number of columns

An index consists of one or more columns; an index with several columns is a joint index.

  • Single-column index: an index created on only one column.
  • Joint index: an index made up of several columns combined.

Single-column indexes need no further introduction, so this section focuses on joint indexes.

Joint index

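A B+Tree index lets us locate records using the leftmost prefix of the index. A joint index on (a, b) is sorted by a first and then by b, so it can serve queries on a alone or on a and b together, but not on b alone. Well-designed joint indexes are needed to take full advantage of this rule.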

The leftmost prefix

This raises a question: when creating a joint index, how should the fields be ordered?

If the table is new, design its indexes around the business queries. If the table already has indexes, choosing the column order carefully can often save you from creating another index.

Every new index takes extra storage, and that storage grows with the data, so space usage should also be considered when creating indexes.

When a large field and a small field form a joint index, put the large field first. If the small field also needs to be queried on its own, give it a separate single-column index.

For example, suppose you need to query by email and age together, and also by email alone and by age alone.

The usual first reaction is to create three indexes: (email), (age), and (email, age) or (age, email).

An email is generally much longer than an age. Under the leftmost prefix rule, the first field of a joint index can also be queried on its own through that index, so (email, age) already covers queries on email alone. The indexes chosen are therefore (email, age) plus a separate (age) index, and the small age index costs far less space than a separate email index would.
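The leftmost prefix rule can be simulated by treating an index as a sorted list of tuples. The rows and names below are hypothetical, and a real index is a B+Tree rather than a Python list, but the ordering argument is the same. On (email, age), a query on email, or on email and age, is a binary search over one contiguous range. A query on age alone finds no such range and must check every entry, which is why the separate (age) index is added.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")

# Hypothetical rows: (id, email, age)
rows = [(1, "amy@x.com", 30), (2, "bob@x.com", 25), (3, "amy@y.com", 25), (4, "cid@x.com", 41)]

# Joint index (email, age): sorted by email first, then age; the primary key comes last
idx_email_age = sorted((email, age, pk) for pk, email, age in rows)
# Small single-column index (age)
idx_age = sorted((age, pk) for pk, _, age in rows)


def seek(index, prefix):
    """Binary-search the contiguous range of entries whose leading columns equal `prefix`."""
    lo = bisect_left(index, prefix)
    hi = bisect_right(index, prefix + (INF,))
    return [entry[-1] for entry in index[lo:hi]]


def scan_age(index, age):
    """Age is not the leftmost column of (email, age): every entry has to be checked."""
    return sorted(entry[-1] for entry in index if entry[1] == age)


print(seek(idx_email_age, ("amy@x.com",)))     # [1]  leftmost column: range seek
print(seek(idx_email_age, ("amy@y.com", 25)))  # [3]  both columns: range seek
print(seek(idx_age, (25,)))                    # [2, 3] separate age index: range seek
print(scan_age(idx_email_age, 25))             # [2, 3] same answer, but a full scan
```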

From the disk point of view

Database material keeps mentioning clustered and non-clustered indexes. How do these relate to primary key indexes and normal indexes?

Clustered index, non-clustered index

They are actually the same things, just named according to a different classification. Clustered and non-clustered describe how the data is organized on disk.

A clustered index means the actual rows are stored on disk in the same order as the index, so rows whose index values are adjacent are also stored next to each other.

Sequential reads are much faster than random reads. Because a clustered index determines the physical order of the rows, a table can have only one clustered index; in InnoDB it is the primary key index. Of course, a primary key index can contain multiple columns.

All other indexes are non-clustered indexes.

When you query through a non-clustered index, the goal is to find the corresponding clustered index entry, that is, the primary key, and then fetch the row through the primary key index. Two techniques are involved in this process: going back to the table and index pushdown.

Back to the table

Going back to the table means using another index to find the corresponding primary key value, then using that primary key value to look up the row in the table (the clustered index) a second time.

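Is there a way to avoid going back to the table? Yes. If the query goes through the primary key, the clustered index already contains the whole row, so no second lookup is needed. The same holds when every column the query needs is already stored in the secondary index (a covering index).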

-- userid is assumed to be the primary key of user: the clustered index returns the whole row directly
select * from user where userid = 1;

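Here is a small simulation of the three cases: a primary key lookup, a secondary index lookup that goes back to the table, and a covered query. The `user` rows and the secondary index on age are hypothetical. Each function returns its result together with the number of back-to-table lookups it performed.

```python
# Hypothetical user table; the clustered index maps primary key -> full row
clustered = {
    1: {"userid": 1, "name": "amy", "age": 30},
    2: {"userid": 2, "name": "bob", "age": 25},
    3: {"userid": 3, "name": "cid", "age": 25},
}
# Secondary index on age: entries hold only (age, primary key)
idx_age = sorted((row["age"], pk) for pk, row in clustered.items())


def select_star_by_pk(pk):
    """SELECT * FROM user WHERE userid = ?: the clustered index already has the row."""
    return clustered[pk], 0


def select_star_by_age(age):
    """SELECT * FROM user WHERE age = ?: every match needs a second lookup by primary key."""
    rows, back_to_table = [], 0
    for entry_age, pk in idx_age:  # a real B+Tree would seek instead of scanning
        if entry_age == age:
            back_to_table += 1
            rows.append(clustered[pk])
    return rows, back_to_table


def select_id_by_age(age):
    """SELECT userid FROM user WHERE age = ?: covered by the secondary index."""
    return [pk for entry_age, pk in idx_age if entry_age == age], 0


rows, lookups = select_star_by_age(25)
print([r["name"] for r in rows], lookups)  # ['bob', 'cid'] 2
print(select_id_by_age(25))                # ([2, 3], 0)
```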
Index pushdown

Before MySQL 5.6, every index entry matching the usable part of the index had to go back to the table. With index pushdown (index condition pushdown), conditions on other columns of the index can be checked directly on the index entries. Unneeded records are filtered out early, which reduces the number of back-to-table lookups.

Note that this example relies on a joint index on (age, sex).

select * from user where age > 10 and age < 20 and sex = 1;

First, the index is used to find users whose age is greater than 10 and less than 20. While scanning those index entries, MySQL can also check sex = 1 (male) directly from the index column. Only matching entries go back to the table; the rest are filtered out.

Before 5.6, every index entry with an age between 10 and 20 had to go back to the table by its primary key, and the sex condition was checked only on the full row. Naturally, that version goes back to the table more often than the one with index pushdown.
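The difference can be counted with a small simulation. The `user` rows below are hypothetical, and the joint index (age, sex) is modelled as a sorted list of (age, sex, primary key) tuples. A binary search stands in for the index range seek on age. The `pushdown` flag decides whether sex is checked on the index entry or only after the full row has been fetched.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")

# Hypothetical user rows: primary key -> (age, sex), with sex 1 = male
table = {1: (12, 1), 2: (15, 0), 3: (15, 1), 4: (18, 0), 5: (19, 1), 6: (25, 1)}
# Joint index on (age, sex); each entry also carries the primary key
idx_age_sex = sorted((age, sex, pk) for pk, (age, sex) in table.items())


def query(low, high, sex, pushdown):
    """Simulates: SELECT * FROM user WHERE age > low AND age < high AND sex = ?"""
    start = bisect_right(idx_age_sex, (low, INF))  # first entry with age > low
    stop = bisect_left(idx_age_sex, (high,))       # first entry with age >= high
    result, back_to_table = [], 0
    for _, entry_sex, pk in idx_age_sex[start:stop]:
        if pushdown and entry_sex != sex:
            continue  # rejected from the index entry alone; the table is never read
        back_to_table += 1
        _, row_sex = table[pk]  # fetch the full row through the primary key
        if row_sex == sex:      # without pushdown, sex is only checked here
            result.append(pk)
    return result, back_to_table


print(query(10, 20, 1, pushdown=False))  # ([1, 3, 5], 5)
print(query(10, 20, 1, pushdown=True))   # ([1, 3, 5], 3)
```

Both versions return the same rows. With pushdown, the two entries with sex = 0 are discarded before the table is touched.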

Conclusion

Starting from a small question, this article explored indexes. It introduced several index structures commonly used in today's databases and explained why MySQL chose B+Tree for index retrieval. Along the way, it walked through the index types available in MySQL.

I hope this article helps you understand indexes.