MySQL index
The MySQL index is a data structure used by the storage engine to quickly find records
-
The index type
There are several indexes we often refer to:
-
Single index
An index contains only a single column, but a table can have multiple single-column indexes.
Single-column indexes can be further subdivided into the following categories
- Plain index: a basic index type in MySQL that has no restrictions. It allows duplicate and null values to be inserted in the column where the index is defined, purely to make querying data faster.
- Unique index: Values in index columns must be unique, but null values are allowed
- Primary key index: a special unique index that cannot be null
We will not cover the creation and modification of these single-column indexes, as we are familiar with these operations.
-
Joint index
Indexes created by combining multiple columns, following the left-most prefix principle, such as (c1,c2,c3,c4…. CN), where conditions are used in the order in which the index is created (conditions can be written out of order), if a middle column is not used in the condition, or if like is used, the following column cannot be used in the index.
Indexes can also be used for grouping and sorting, sorting before grouping, averaging and so on. So in grouping and sorting, if the field order can be in accordance with the index of the field order, you can use the index of the ordered feature
-
The full text indexing
Prior to MySQL 5.6, only the MyISAM storage engine supported full-text indexing;
MySQL 5.6 and later, MyISAM and InnoDB storage engines support full-text indexing;
The fields used are CHAR, vARCHar, and text
Creating a full-text index
create table fulltext_test ( id int(11) NOT NULL, content text NOT NULL.PRIMARY KEY (id), FULLTEXT KEY content_tag_fulltext(content) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; Copy the code
Create a federated full-text index
create table fulltext_test ( id int(11) NOT NULL, content text NOT NULL, content1 varchar(255), PRIMARY KEY (id), FULLTEXT KEY content_tag_fulltext(content,content1) //Create a federated full-text index column) ENGINE=MyISAM DEFAULT CHARSET=utf8; Copy the code
Create full-text indexes on existing tables
//throughcreatecreatecreate fulltext index content_tag_fulltext on fulltext_test(content); //throughaltercreatealter table fulltext_test add fulltext index content_tag_fulltext(content); Copy the code
Delete full text Index
//throughdropdeletedrop index content_tag_fulltext on fulltext_test; //throughalterdeletealter table fulltext_test drop index content_tag_fulltext; Copy the code
Using full-text indexes
select * from fulltext_test where match(content) against('xxx'); Copy the code
-
Spatial index
MySQL’s spatial data type is a new function added after 5.7, so MySQL’s spatial data type is not used by many people. If GIS development is generally not used as data storage, so I will not mention these for the moment, and I will find time to introduce them later
There are four spatial data types in MySQL: GEOMETRY, POINT, LINESTRING, and POLYGON. When creating a SPATIAL index, use the SPATIAL keyword. The engine is MyISAM. The columns in the SPATIAL index must be declared NOT NULL.
But in terms of MySQL indexes as a whole these are just the types of indexes, and the types are just the form that we use directly in MySQL for presentation. Under these types there are different algorithms to support the realization of these indexes. There are four ways to support these types: BTREE, RTREE, HASH, and FULLTEXT.
-
-
Index method
In InnoDB, all data is stored in the form of pages as a basic unit. Each data page is connected by a two-way linked list. The records in the page form a one-way linked list in ascending order of primary key size. If you want to find the data you need on a page, InnoDB has two ways to find it, depending on what you’re looking for:
- Search by primary key is simple. InnoDB uses dichotomy to quickly locate the corresponding slot in the page directory, and then iterates through the records in the corresponding slot group to quickly find the specified record. (If there is no primary key, InnoDB creates a virtual primary key, which we can’t see.)
- The search for non-primary key columns using other columns as search criteria is not so lucky, because there is no so-called page directory for non-primary key columns in the data page, so we cannot quickly locate the corresponding slot using dichotomy. In this case, each record in the single linked list can only be traversed from the smallest record in turn, and then compare whether each record meets the search conditions. Obviously, this search is very inefficient.
However, this search is only within one page. If we want to search for records in multiple pages, we do not have the basis to locate records quickly. Therefore, we can only search from the first page down the bidirectional linked list, traversing the data of each page until the end. If it’s a small amount of data it doesn’t feel slow, but what if it’s 100 million? Billions? So to solve this problem, he was born in the opening sentence of our article.
-
B-tree indexes
InnoDB has designed an index scheme and introduced the concept of directory entry records, which are no different from our ordinary data pages. The physical analogy is that if the data in a table is viewed as a book, then each data entry is a section, the page in the database corresponds to a chapter in the book, and the table of contents entry is the table of contents in front of the book, which clearly records the page number of each chapter.
The directory entry record in the figure stores the minimum primary key value of each data page and the page number of each data page. When we look up data through the primary key, we can first check the table of contents record page to determine which record we are looking for in the page, and then go to the specific data page to retrieve the data. Let’s say we look for the record with ID 4. If 4 is greater than 1 and less than 5 then the record is stored on the data page 10.
When a data page cannot hold these entries, it is necessary to allocate a new page to store the entries, and the new page and the previous page form a bidirectional linked list
But at this time, if the directory entry record page is very much, we will find it very difficult. In fact, at this point InnoDB will use the original directory entry record page as the data page and generate a directory entry record page on it. In this way, we can quickly locate the data we need through layer upon layer search.
In this diagram, we can see that the data structure formed by this design is very much like what we call a tree, and yes, it is a B+ tree that is often referred to in InnoDB. Both the data pages that hold user records and the data pages that hold directory entry records are stored in the B+ tree data structure, so we also call these data pages nodes. As can be seen from the figure, our actual user records are stored in the lowest nodes of the B+ tree, which are also called leaf nodes or leaf nodes. The other nodes used to store directory items are called non-leaf nodes or internal nodes, among which the node at the top of the B+ tree is also called the root node.
Some people may ask, if the amount of data stored is very large, the number of layers of B+ tree will also be very high, resulting in a slow index efficiency, or even worse than no index.
It’s a bit of a stretch, you know, the people who designed this data structure didn’t think of it. Suppose we can store 100 pieces of data per user record page and 1000 pieces of data per directory record page (why is the directory record page more than the user record? That’s because each entry in the catalog record page only holds the minimum primary key and page number.)
So if you have a directory record page you have 1000 x 100= 100,000 records
If there are two levels of directory records page is 1000×1000×100=100000000 records
If there are three layers of directory record pages, 1000×1000×1000×100=100000000000 records
If you have a three-level directory record page, that is, a tree with a height of 4, you can record 100,000 million data. In a real scenario, you would record this much data in a single table.
-
Clustering index
A clustered index is a B+ tree with the following characteristics:
- Sort records and pages using the size of the record primary key value:
- The records in the page are arranged in a one-way linked list in order of the size of the primary key.
- Each page storing user records is also arranged in a bidirectional linked list according to the primary key size of user records in the page.
- The pages that store directory entry records are divided into different levels. The pages in the same level are also arranged in a bidirectional linked list according to the primary key size of directory entry records in the page.
B+
The leaf node of the tree stores the complete user record, which means that the value of all columns (including hidden columns) is stored in the record.
The InnoDB storage engine automatically creates clustered indexes for us instead of explicitly using INDEX statements in MySQL statements.
- Sort records and pages using the size of the record primary key value:
-
Secondary indexes
A secondary index is when we create a primary key index and then want to create an index for another column. InnoDB regenerates into a B+ tree, but the leaves of this B+ tree are not user records. Let’s say we have a table with three attribute columns: C1, C2, and C3. C1 is the primary key. We created an index for the primary key, and then for C2. Then the B+ tree formed by C2 is shown in the figure below:
Why is the value of the C1 column stored in the resulting tree? That’s because when we find c2 from this index, the primary key (c1) is used to locate the specific record in the page where the actual user records are stored (the primary key index is used here, not traversed). This operation is called table back
-
Joint index
In essence, it is also a secondary index, but its index column is no longer a single column, but a combination of multiple columns. For example, if a federated index uses c2 and C3 columns as index columns, it will first sort records and pages by C2 column. In the case that the C2 columns of the records are the same, the C3 column is used for sorting. In addition, instead of carrying the primary key in each record like a single-column secondary index, he puts the primary key in the leaf node.
-
-
A Hash index
Compared with a B-tree index, a Hash index can locate data more efficiently because of its key-value method, which requires only one retrieval, unlike a B-tree index, which requires multiple I/O access from the root node to the branch node and finally to the page node.
So why didn’t Hash indexes replace B-tree indexes? That’s because the specificity of Hash indexes also brings limitations and drawbacks.
- Hash indexes only support equivalent comparison queries, including =, IN, <=> (note that <> and <=> are different operations). No range queries, such as WHERE price > 100, are supported. Because a Hash index compares Hash values after Hash operation, it can only be used for equivalent filtering, not for range-based filtering. The size of Hash values processed by the corresponding Hash algorithm cannot be the same as before the Hash operation.
- Hash indexes cannot be used to avoid sorting of data. Since the Hash index stores the Hash value after the Hash calculation, and the size of the Hash value is not necessarily the same as the key value before the Hash operation, the database cannot use the index data to avoid any sort operation.
- Hash indexes cannot be queried using partial index keys. For a combined index, the Hash index calculates the Hash value after the combined index keys are combined, rather than separately. Therefore, the Hash index cannot be used when the first one or more index keys are used for query.
- Hash indexes cannot avoid table scans at any time. As previously known, Hash index is to store the Hash value of the Hash operation result and the corresponding row pointer information in a Hash table after the Hash operation of the index key. Since different index keys have the same Hash value, even if the number of records of the data satisfying a certain Hash key value is taken, You can’t do the query directly from the Hash index, but you still need to access the actual data in the table to compare and get the corresponding result.
- Hash indexes do not necessarily perform better than BTree indexes when a large number of Hash values are equal. For index keys with low selectivity, if a Hash index is created, a large number of record pointer information will be associated with the same Hash value. This can be cumbersome when trying to locate a single record, wasting multiple accesses to the table data and resulting in poor overall performance.
In addition, InnoDB, our common storage engine, does not support Hash index, only Memory and NDB support it, which is the main reason why we seldom use Hash index in normal development.
-
R – Tree indexes
At present, r-tree index is mainly used for the index of spatial data, but the spatial data type of MySQL was released after 5.7, and it is not widely used, so I don’t know much about this index method at present, and I will make up this content after learning.
How MySQL works: From the root understand MySQL