Know the index

It is very important to know what an index is. A very appropriate analogy is the relationship between the contents page of a book and the body content of a book. In order to facilitate the search for the contents of a book, the index is formed through the contents of the book. So, the first thing you need to understand is that an index is also a file, and it takes up physical space.

For example, for MyISAM storage engine:

Files with the.frm suffix store table structures.

Files with the.myd suffix store table data.

Files with the.myi suffix store index files.

As shown below:

For InnoDB storage engine:

.frm

Ibd files store index files and data (innodb_file_per_table parameter required)

As shown below:

Therefore, when you index a table, the size of the index file changes, and when the data in your table changes, the index file also changes, but MySQL automatically maintains the index, and this process does not require your intervention, which is why improper indexes can affect MySQL performance.

Conclusion:

1. Index is to put the data in the data table in the index file according to the specific data structure for quick search;

2. Indexes exist on disks and occupy physical space.

Type of index

B-tree indexes

B-tree index is the most common index type. For example, InnoDB and MyISAM both use b-tree index structure. In fact, b-tree index structure is B+Tree. B+ Tree adds sequential access Pointers to leaf nodes to facilitate the range traversal of leaf nodes. Here is a brief introduction to InnoDB and MyISAM.

InnoDB

InnoDB support clustering index, index of clustering and the clustering index is not a strictly index, but a way of data storage, the name has something to do with its own storage, “cluster” said data line and the adjacent keys stored together, in short, is stored in the leaf node is actually the real data. InnoDB aggregates data by primary key, so a table can only have one clustered index and must have a primary key. If no primary key is defined and there is no non-empty index to replace it, InnoDB implicitly defines a primary key as the clustered index.

The secondary index of the cluster index stores the primary key value of the row rather than the pointer to the physical position of the row. Therefore, if a row is searched through the secondary index, it needs to find the leaf node of the secondary index to obtain the corresponding primary key value, and then search for the corresponding row. With InnoDB, an adaptive hash index can reduce such rework.

The lock

InnoDB uses row locks and therefore supports transactions, while MyISAM uses table locks and does not support transactions.

Scope of application

B-tree index is suitable for interval query, because the leaf nodes after b-tree storage are themselves ordered, and THE B+ Tree structure also adds sequential Pointers to leaf nodes, which is more convenient for interval query.

The hash index

Hash indexes are implemented based on hash tables, and only queries that exactly match all columns of the index are valid. The method is to compute a Hash code for all index columns, which acts as an index and holds a pointer to each row in the hash table.

advantages

The index itself only stores hash code, so the structure is compact and lookup is fast

limit

The hash codes in the index are stored sequentially, but the data corresponding to the hash code is not sequential and therefore cannot be used for sorting

Partial index column match lookups are not supported because a hash index uses the entire contents of the index column to compute a hash code

Only equivalence comparison is supported, but range query is not supported

If hash conflicts are serious, all row Pointers in the linked list must be traversed

If hashing conflicts are severe, index maintenance operations can be costly

InnoDB’s adaptive hash index

First, note that adaptive hash indexing is insensitive to the user; it is a completely automatic, internal behavior that the user cannot control or configure, but can turn off.

When InnoDB notices that an index value is being used very frequently, it creates a hash index in memory based on the B-tree index, so that the B-tree can have some of the benefits of hash indexes, such as fast hash lookups.

Of course, if the storage engine does not support the hash index, users can also customize the hash index, so that the performance will be relatively high, the defect is that you need to maintain the hash value, if this method is adopted, do not use SHA1() and MD5() as the hash function, because these two are strong encryption functions, the design goal is to minimize the conflict. The resulting Hash code is a very long string that wastes a lot of space, and there are less conflicting requirements for indexes in a hash index.

Advantages of indexes

Using indexes reduces the amount of data that the server needs to scan

Using indexes helps the server avoid sorting and temporary tables

You can use indexes to turn random I/ OS into sequential I/ OS

However, indexes are not always the best solution. For very small tables, a simple full table scan is more efficient in most cases, for medium to large tables, indexes are more efficient, and for very large tables, partitioning is more efficient.

Common optimization methods

The left-most prefix principle for federated indexes

Compound indexes comply with the left-most prefix rule. The index is used only when the field before the compound index is used. If the search condition is not based on the left-most column of the index, the index cannot be used.

As in (a, b, c) established on the three fields of joint index, so it can speed up a | | (a, b) (a, b, c) of three groups of query speed, not speed up | b (b, a) the query sequence.

In addition, when building a federated index, the most discriminating field is at the far left.

Do not use functions and operations on columns

Do not use functions on columns as this will cause index invalidation and a full table scan.

For example, the following SQL statement:

select * from artile where YEAR(create_time) <= ‘2018’; Copy the code

Even if an index is created on date, a full table scan is performed, which allows computing to be placed in the business layer. This not only saves database CPU, but also optimizes query caching.

Negative conditional queries cannot use indexes

The negative conditions are:! =, <>, not in, not exists, not like, etc.

select * from artile where status ! = 1 and status ! = 2; Copy the code

This can be optimized using in:

Select * from artile where status in (0,3

Using overwrite indexes

A covered index is a column that is being queried. Data can be retrieved from the index instead of the row locator and retrieved from the table, which can greatly improve performance.

You can define an additional column that the index contains, even if the column is useless to the index.

Avoid casts

If the types on the left and right sides of the query condition do not match, the cast may cause index failure and full table scan.

If the phone field is of type VARCHAR, the following SQL cannot match the index:

select * from user where phone=12345678901; Copy the code

Can be optimized as:

select * from user where phone=’12345678901′; Copy the code

Range columns can use indexes

Range conditions include <, <=, >, >=, and between.

A range column can use the index, but the column following the range column cannot use the index. An index can be used for a maximum of one range column. If there are two range columns in a query condition, the index cannot be used for all of them.

Indexes should not be created on fields with frequent updates and low data differentiation

Updates change the B+ tree, and indexing frequently updated fields can significantly degrade database performance.

“Gender” is an attribute with little distinction. It is meaningless to build an index and cannot effectively filter data. The performance is similar to that of full table scan.

The distinction can be calculated using count(distinct(column name))/count(*), and indexes can be created at least 80% of the time.

Index columns cannot be null

Single-column indexes do not store null values, and compound indexes do not store all null values. If columns are allowed to be NULL, you may get an unexpected result set.

Avoid using OR to join conditions

You should avoid using OR to join conditions in the WHERE clause, because this can lead to index invalidation and full table scans. Although newer versions of MySQL can hit indexes, query optimization costs more CPU than in.

Fuzzy query

Leading fuzzy queries cannot use indexes, non-leading queries can.