What is the index?

If you want to find a particular section of a book, you usually read the “index” of the book first and then find the corresponding page number. ‘Index’ can be understood as’ table of contents’.

In MYSQL, the storage engine uses indexes in a similar way, first finding the corresponding value in the index, and then finding the corresponding row based on the matching index record.

What’s an index for?

Speed up the retrieval of data in the table, that is, to help information searchers to find the record ID that meets the restriction conditions as soon as possible.

How does the index work?

Indexes are used when the where condition is an indexed field at query time and when the usage rules are met.

Classification of indexes (by type)

B-tree indexes

The characteristics of

Values on the B-tree are stored in order, and each leaf page is the same distance from the root node.

Pointers to leaf nodes point to indexed data rather than to other node pages.

Retrieve the principle

Binary search is carried out from the root node first. If it is found, data of the corresponding node is returned; otherwise, the node to which the pointer of the corresponding interval points is recursively searched until the node is found or null pointer is returned.

disadvantages

1. Inserting and deleting new data records will damage the properties of the B-tree. Therefore, you need to split, merge, and transfer the Tree to maintain the properties of the B-tree. I/O operations are frequent.

2. Interval search may require repeated traversal of upper-layer nodes, resulting in tedious I/O operations.

B + Tree index

The characteristics of

Compared with b-tree, B+Tree has the following differences:

Non-leaf nodes do not store data, but only index keys. Only leaf nodes store data.

MySQL B + Tree

On the basis of classical B+Tree, sequential access pointer optimization is added.

A B+Tree with sequential access Pointers is formed by adding a pointer to each leaf node of a B+Tree that points to adjacent leaves.

Improved range query performance:

If you want to query all data records with keys from 10 to 50, when 10 is found, you can access all data nodes at once by following the sequence of nodes and Pointers.

Interval query efficiency is greatly mentioned.

Because there is no need to go back to the upper parent node and repeat the search to reduce I/O operations as with b-tree. The structure is as follows:

Mysql > select * from B+TREE; What are the benefits of a B+TREE index?

Indexes themselves are too large to be stored entirely in memory, so indexes are often stored on disk as index files.

Index lookups incur disk I/O costs that are orders of magnitude higher than memory accesses,

Therefore, the index structure should minimize the number of disk I/O accesses during the search process to improve the index efficiency.

Compared with b-tree, the index of B+Tree is optimized.

B-/+Tree index performance advantages

Generally, the disk I/O times are used to evaluate indexes.

1. Optimized processing combined with the storage structure of the operating system: mysql skillfully uses the storage structure of the operating system (allocating a node to a storage page -> minimizing IO times)

& Disk prefetch (cache prefetch -> Speed up prefetch of data to be used soon).

2.B+Tree A node can store multiple child nodes with the same I/O count to retrieve more information.

3.B+TREE stores data only on leaf nodes & all leaf nodes contain a chain pointer & all other inner non-leaf nodes store index data only.

Only use index to quickly locate the data index range, locate the index first and then quickly locate the data efficiently through the index.

Principle of high speed positioning

Mysql design makes use of the disk prefetch principle, set the size of a B+Tree node to a page, and apply for a page space when creating a new node.

This ensures that each Node is physically stored on a page, and computer storage allocation is page-aligned, so that each Node requires only one I/O operation.

4. B-tree index, B+Tree index: a single node can hold multiple child nodes, and the I/O query times are the same (mysql can query THE I/O at most 3-5 times – so each node needs to store a lot of data)

5.B+Tree is more suitable for storing external indexes. From the above analysis, it can be seen that the larger d is, the better the index performance is, and the upper limit of the outgo degree depends on the size of key and data in the node:

6. The nodes in B+Tree are removed from the data field, so they have a larger output and better performance. Only use index to quickly locate the data index range, locate the index first and then quickly locate the data efficiently through the index.

dmax=floor(pagesize/(keysize+datasize+pointsize))
Copy the code

The B-tree index is applicable to the query of the full key value, key value range, and key prefix

Queries that can use indexes

There are the following tables and indexes

Table structure

The index

data

All values match

Matches all columns in an index

SELECT * from `user` t  where t.`user`='1' and t.age= 1;
Copy the code

Matches the leftmost prefix

Use only the first column of the index

SELECT * from `user` t  where t.`user`='1';
Copy the code

Matching column prefix

Matches only the beginning of a column’s value.

SELECT * from `user` t  where  t.`user` like 'd%';

Copy the code

Matching range value

It can be a range value

SELECT * from `user` t  where  t.age >=1 ;
Copy the code

Queries that access only indexes

Only the index is accessed, not the data row

SELECT * from `user` t  where id=1 ;
Copy the code

Description The B-tree index is unavailable

1. Do not start the search in the leftmost column of the index.

2. Columns in the index cannot be skipped.

3. If there is a range query for a column in the query, all columns to the right of it cannot be found using index optimization.

The hash index

define

Implemented based on hash tables, only queries that exactly match all columns of the index are valid.

Hash indexes are only displayed in the Memory engine.

Realize the principle of

For each row of data, the storage engine computes a Hash code for all index columns,

The hash code is a small value, and the computed hash code is different for rows with different key values.

A hash index stores all hash codes in the index while maintaining a pointer to each row of data in the hash table.

Note: If multiple columns have the same hash value, the index stores multiple record Pointers in a linked list into the same hash entry.

Advantages and disadvantages of hash indexes

advantages

The index structure is compact and the search speed is very fast.

limit

1. A hash index contains only hash values and row Pointers, not field values, so you cannot use values in the index to avoid reading rows.

However, accessing rows in memory is fast, and in most cases this doesn’t have a noticeable impact on performance.

2. Hash index data is not stored in microcosm order and therefore cannot be used for sorting.

3. Hash indexes also do not support partial index match lookups for federated indexes because hash indexes always use the entire contents of the index column to calculate the hash value.

4. Hash index does not support range search, only support equivalent comparison query, including =, IN(), <=>.

5. Accessing data in a hash index is very fast unless there are many hash conflicts (different index column values have the same hash value).

In the case of hash collisions, the storage engine must compare all row Pointers in the variable list, row by row, until it finds all rows that match the criteria.

6. If there are many hash conflicts, the corresponding index maintenance is more expensive.

An adaptive hash index for the InnoDB engine

When InnoDB notices that some index values are being used very frequently, it creates a hash index in memory based on the B-tree index, which gives the B-tree index some of the benefits of hash indexes. This is completely automatic, internal behavior that the user has no control over

Spatial Data Index (R-tree)

MyISAM tables support spatial indexes and can be used as geographic data stores.

Spatial indexes index data from all dimensions.

The full text indexing

Is a special type of index that looks for keywords in text rather than directly comparing values in the index.

Index of other categories

For example, TokuDB uses fractal tree indexes. This is a newly developed data structure, which has both the advantages of B-tree and avoids some disadvantages of B-tree.

Classification of indexes (by category)

The only index

Accelerated query + column value unique (can have NULL)

Primary key index (cluster index)

Accelerated query + unique column value (no NULL) + only one table

Normal index (non-clustered index)

Accelerated query only

The full text indexing

Segmentation of text content, search

Composite index

An index composed of multiple column values is designed for combined search, which is more efficient than index merging

Clustered index and non-clustered index (also called clustered index and non-clustered index)

Cluster index definition

Index and data files are the same file.

Not a consistent single index type, but a way to store data.

The details depend on how it is implemented, but InnoDB’s clustered index actually holds the B-tree index and rows in the same structure.

When a table has a clustered index, its rows are actually stored in the leaf pages of the index.

The term “clustering” refers to the compact storage of rows of data with adjacent key values.

The physical order of data rows is the same as the logical order of column values (typically the primary key column), and only one clustered index can exist in a table.

The characteristics and advantages of clustering index

The leaf node of the index is the corresponding data node (except for MySQL MyISAM, which only has a unique constraint on clustered and non-clustered indexes).

Instead of a clustered index, the index may need to perform a secondary query (back to the table) if the index does not cover the corresponding column, as discussed below.

Therefore, the speed of clustered indexes tends to be more advantageous in terms of queries.

Cluster index page splitting problem

Tables based on clustered indexes can suffer from “page splitting” when new rows are inserted, or when the primary key is updated so that rows need to be moved.

When the primary key of a row requires that the row must be inserted into a full page, the storage engine splits the page into two pages to accommodate the row, a page split.

Page splitting causes tables to take up more disk space.

Definition of a non-clustered index

Index The index that is separate from the data file.

The logical order of the indexes in the index is different from that of the physical storage on the disk. A table can have multiple non-clustered indexes. Similar to the Chinese dictionary partial search, out of order

Secondary Query problem for non-clustered Indexes (back table)

A non-clustered index leaf node is still an index node, but it has a pointer to the corresponding data block. If a non-clustered index query is used,

If the query column contains other columns that are not covered by the index, a second query is made for the corresponding row on the node.

A method to solve the second query

Composite index (overlay index)

Create index(a, b); create index(a, b)

select a, b from t1 where col1 = 'dq';

Note that the left-most index principle should be followed when using a composite index. If the WHERE condition does not contain one or more left-most columns, the index will not work.

There is also the use of include in SQL Server, which allows you to include columns that are not in a clustered index without creating a composite index.

Myisam and InnoDB storage engine for index implementation

MyISAM

Data stores data addresses. An index is an index, and data is data. The index is in xx.myi and the data is in xx.myd, so it is also called a non-clustered index.

InnoDB

Data stores the data itself. Indexes are also data. Data and indexes are stored in a xx.idb file, so they are also called clustered indexes.

Suggestions on the use of clustered index and non-clustered index

1. The query efficiency of a clustered index is higher than that of a non-clustered index. However, if the value of a clustered index needs to be changed frequently, the write performance is not high, because the physical location of the corresponding data needs to be moved.

2. The non-clustered index can avoid the second query when the query is possible, so that the performance can be greatly improved.

3. Not all tables are suitable for index creation. Only tables with large data volumes are suitable for index creation.

Advantages and disadvantages of indexes

advantages

1. Indexes greatly reduce the amount of data that the server needs to scan.

2. Indexes help the server avoid sorting and temporary tables.

3. Indexes can change random I/ OS into sequential I/ OS.

disadvantages

1. Some disk space will be occupied.

2. Maintain corresponding indexes for inserts and modifications, consuming certain performance

Samsung index decision

One-star index: Indexes put related records together.

Two-star index: The order of the data in the index is the same as the order in the lookup.

Three-star index: The columns in the index contain all the columns needed in the query.

Several cases of index failure

1. Like starts with %, index invalid; The index is valid when the like prefix does not have % and the suffix has %.

2. The index is not used before or after the OR statement. If only one of the left and right query fields is an index, the index is invalid. The index takes effect only when both the left and right query fields are indexes.

3. The combined index does not use the first column index. The index is invalid.

4. Implicit data type conversion occurs. Varchar without single quotes may be automatically converted to int, invalidating the index and causing a full table scan.

5. Run the IS NULL or IS NOT NULL operation on index columns.

Indexes do not index null values, so such operations can not use indexes, can be handled by other methods, such as: number type, check greater than 0, string type set a default value, check whether the default value.

6. Use not,<>,! On index fields =. Does not mean that the index is never used, so processing it will only result in a full table scan.

Key <>0 or key<0.

7. The index is invalid when a function is used to calculate the index field.

8. If the full table scan speed is faster than the index speed, the mysql uses the full table scan, and the index becomes invalid.

Refer to the MYSQL -b +TREE index principle

Reference High-performance MySQL(Version 3)