MySQL index (2021 latest edition)

Hi guys, I’m Courson.

Today we are going to show you how many questions you can answer correctly

This article appears in the Interview Cheat Sheet series on Github at github.com/cosen1024/J… Domestic Gitee(downloadable PDF) : gitee.com/cosen1024/J…

This is the MySQL index interview question directory, will not quickly check the missing ~

1. What is the index?

Indexes are special files (indexes on InnoDB tables are part of the table space) that contain Pointers to all the records in the table.

An index is a data structure. A database index is a sorted data structure in a database management system to help query and update data in a database table quickly. Indexes are usually implemented using B trees and their variant B+ trees. More generally, an index is a table of contents. In order to facilitate the search of the contents of the book, through the content of the index to form a catalog. And the index is a file that takes up physical space.

The establishment of MySQL index is very important for the efficient operation of MySQL, index can greatly improve the speed of MySQL retrieval. For example, when we look up the dictionary, there are search pinyin and partial, strokes, and so on in front, and then find the corresponding dictionary page number, so that we can open the dictionary page number can know that we need to search for a key of all the value of the information.

2. What are the advantages and disadvantages of indexes?

Advantages of indexes

It can greatly speed up the retrieval of data, which is the main reason for creating indexes.
By using indexes, you can improve the performance of the system by using optimization hiders during the query process.

Disadvantages of indexes

Time: It takes time to create and maintain indexes. To be specific, indexes need to be maintained dynamically when data in a table is added, deleted, or modified, which reduces the efficiency of adding, changing, or deleting.
Spatial: Indexes need to occupy physical space.

3. What types of indexes does MySQL have?

1. Storage structure: BTree index (B-tree or B+Tree index), Hash index, full-index, r-tree index. What is described here is the form in which the index is stored,

2, from the application level to divide: common index, unique index, composite index.

Normal index: that is, an index contains only a single column. A table can have multiple single-column indexes
Unique index: The value of the indexed column must be unique, but empty values are allowed
Composite index: An index composed of multiple column values that is specifically used for combined searches and is more efficient than index merges
Clustered index (clustered index) : not a separate type of index, but a way of storing data. The details depend on the implementation, InnoDB’s clustered index is actually a b-tree index (technically B+Tree) and rows in the same structure.
Non-clustered index: if it is not a clustered index, it is a non-clustered index

3, according to the physical order of the data and the logical (index) order relationship of the key value: clustered index, non-clustered index.

4. What is the underlying implementation of indexing?

A Hash index

Based on hash table implementation, only queries that accurately match all columns of the index are valid. For each row of data, the storage engine computes a Hash code for all columns of the index. Hash indexes store all hash codes in the index and store Pointers to each row in the index table.

Image source: www.javazhiyin.com/40232.html

MySQL > select * from B+Tree;

B-tree speeds up data access because the storage engine does not need to perform a full table scan to obtain data, which is distributed among nodes.

B + Tree index

B-tree is an improved version of the B-tree and is the storage structure used by the database index index. The data is on the leaf nodes, and sequential access Pointers are added, with each leaf node pointing to the address of the adjacent leaf node. Compared with B-tree, only two nodes need to be searched and traversed. B-tree obtains all nodes, which is more efficient than B+Tree.

B + tree properties:

N subtree nodes contain n keywords and do not store data but indexes of the data.
All leaf nodes contain information of all keywords and Pointers to records containing these keywords, and leaf nodes themselves are linked in large order according to the size of keywords.
All non-terminal nodes can be considered as index parts, which contain only the maximum (or minimum) keyword in the subtree.
In a B+ tree, data objects are inserted and deleted only on leaf nodes.
B+ trees have two head Pointers, one for the root node of the tree and one for the leaf node of the minimum key code.

5. Why the default index structure uses B+Tree instead of B-tree, Hash, binary, or red-black?

B-tree: Answer in two ways

B+ trees have lower disk read and write costs: B + tree internal nodes and no pointer to keyword specific information, so the internal nodes relative smaller B (B -) tree, if you put all the same internal nodes keyword in the same disk block, then disk blocks can hold is, the more the number of keywords, one-time read into memory to find the key words, the more, the relative number of IO, speaking, reading and writing was reduced.
Because of B + tree data is stored in the leaf node, branch node for the index, convenient sweep library, you just need to sweep the leaf node again, but because of the B tree branch node stores the same data, we need to find the specific data, need to be in a sequence traversal to sweep in order, so the B + tree is more suitable for in the case of range query, So usually B+ trees are used for database indexes.

Hash:

Fast location, but no sequence, high I/O complexity.

Hash table based implementation, only Memory storage engine explicitly support Hash index;
Suitable for equivalent query, such as =, in(), <=>, does not support range query;
Because the index is not stored in order, it cannot be sorted using indexes like B+Tree.
Hash indexes are very fast when looking up equivalents;
Because the Hash index always contains all the contents of all the columns in the index, it does not support match lookup of partial index columns.
Hash indexes are inefficient if there are a large number of duplicate keys because of hash collisions.

Binary tree: the height of the tree is not uniform, cannot self-balance, search efficiency is related to data (tree height), and IO cost is high.

Red-black tree: The tree height increases with the data volume, and the I/O cost is high.

6. Talk about clustered indexes and non-clustered indexes.

In InnoDB, the leaf node of index B+ Tree stores the entire row of data. The primary key index is also known as the clustered index. In this way, the data store and index are put together.

The leaf node of index B+ Tree stores the primary key value of the non-primary key index, also known as the non-clustered index, secondary index.

Differences between clustered index and non-clustered index:

The difference between a non-clustered index and a clustered index is that the leaf node of a non-clustered index does not store the data in the table, but instead stores the primary key (row number) corresponding to that column
For InnoDB, to look up data, we need to look up data in the clustered index based on the primary key. This process of looking up data based on the clustered index is called table back. The first index is usually sequential I/O, and the return operation is random I/O. The more times we need to return to the table, i.e. the more random I/OS, the more we tend to use full table scans.
Typically, a primary key index (clustered index) query is queried only once, whereas a non-primary key index (non-clustered index) needs to be queried multiple times back to the table. Of course, if you are overwriting the index, you can look it up once
Note: Both MyISAM primary key index and secondary index are non-clustered indexes, while InnoDB primary key index is clustered index and secondary index is non-clustered index. Our own indexes are mostly non-clustered indexes.

7. Will non-clustered indexes be queried back into the table?

Not necessarily. This involves whether all the fields required by the query match the index. If all the fields match the index, then there is no need to perform the query back to the table. An index that contains (overwrites) the values of all the fields to be queried is called a “overwrite index”.

To take a simple example, if we create an index on the age of the employee table, then when we select score from student where score > 90, the leaf node of the index already contains the score information, and the table will not be queried again.

8. What is a federated index? Why do I care about the order in a federated index?

MySQL can use multiple fields to create an index at the same time, called a federated index. If you want to match an index in a joint index, you need to match the index one by one in the order of the fields when the index is created. Otherwise, the index cannot be matched.

Specific reasons are as follows:

MySQL > create index (name, age, school); MySQL > create index (name, age, school); MySQL > create index (school);

When the query is performed, the indexes are only strictly ordered according to name, so the name field must be used for equivalent query first. Then, the matched columns are strictly ordered according to age field, and the age field can be used for index search, and so on. Therefore, when establishing a joint index, we should pay attention to the order of index columns. In general, the columns with frequent query requirements or high field selectivity should be placed first. Additional adjustments can be made individually, depending on the specific query or table structure.

9. What is MySQL’s left-most prefix rule?

The left-most prefix principle is left-most first. When creating a multi-column index, the most frequently used column in the WHERE clause is placed on the left-most according to business requirements. Mysql > select * from (a,b,c,d); mysql > select * from (b, C,d); mysql > select * from (a,b,c,d); If the index (a,b,d,c) can be used, the order of a,b,d can be arbitrarily adjusted.

= and in can be out of order, such as a = 1 and b = 2 and c = 3. Create (a,b,c) indexes in any order. Mysql’s query optimizer will help you optimize them into a form that can be recognized by the indexes.

10. What about prefix indexes?

Because the fields we index may be very long, this takes up both memory and maintenance. So we thought it would be super double if we only indexed the public portion of the front of a long field. However, we need to note that order by does not support prefix indexes.

The process is:

Select count(distinct COL_1)/count(1) from table_1

Select count(distinct left(COL_1,4))/count(1) from table_1

Create index idx_front on table_1 (col_1(4))

11. Understand index push-down?

MySQL 5.6 introduced index push-down optimization. SET optimizer_switch = ‘index_condition_PUSHdown =off’; You can turn it off.

With index push-down optimization, you can reduce the number of times back to the table
Valid only for secondary indexes in InnoDB

Examples and explanations given in the official documentation are as follows:

There is a secondary index (zipcode, lastName, address) in people_table, SELECT * FROM people WHERE zipcode= ‘95054’ AND lastname LIKE ‘%etrunia%’ AND address LIKE ‘%Main Street%’;

If index push-down is not used, MySQL will query the data from the storage engine with zipcode= ‘95054’ and return it to the MySQL server. Then the MySQL server determines whether the data meets the criteria based on lastName LIKE ‘%etrunia%’ and address LIKE ‘%Main Street%’
If index push-down is used, MYSQL will first return an index that matches zipcode= ‘95054’, and then determine if the index matches lastName LIKE ‘%etrunia%’ and address LIKE ‘%Main Street%’. If yes, the data is located based on the index. If no, reject the data.

MySQL > select * from index;

Through Explain, such as the following example:

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND title='Senior Engineer' AND from_date='1986-06-26';

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	filtered	rows	Extra
1	SIMPLE	titles	null	const	PRIMARY	PRIMARY	59	const,const,const	10	1

Id: In a large query, each SELECT keyword corresponds to a unique ID, for example explain SELECT * from s1 where id = (SELECT ID from s1 where name = ‘egon1’); The id of the first select is 1 and the id of the second select is 2. Sometimes there are two SELECT’s with an ID of 1 because the optimizer turns the subquery into a join query.
Select_type: select corresponding to the type of the query keywords, such as SIMPLE, PRIMARY, SUBQUERY, DEPENDENT, SNION.
Table: indicates the table name of each query.
Type: The type field is important because it provides an important basis for determining whether a query is efficient. Through the type field, we can determine whether the query is a full table scan or an index scan, etc. Such as const(primary key index or unique secondary index for equivalence matching),ref(ordinary secondary index column for equivalence matching constant),index(scan full index overwrite index).

In general, the performance relationship between different types of type is as follows: ALL < index < range ~ index_merge < ref < eq_ref < const < System ALL Type Because it is a full table scan, it is the slowest in the same query conditions. An index query is not a full table scan, but it scans ALL indexes and is therefore slightly faster than an ALL query.
Possible_key: Possible index *(it can be deleted to reduce optimizer optimization time)*
Key: This field is the actual index used by MySQL in the current query.
Filtered: indicates the percentage that the finder predicts meets the next query criteria.
Rows is also an important field. Based on the statistics, the MySQL query optimizer estimates the number of rows of data that the SQL needs to scan and read to find the result set. This value is a very intuitive indication of how efficient SQL is, and in principle the fewer rows the better.
Extra: Indicates additional information, such as Using WHERE,Start temporary,End TEMPORARY, and Using temporary.

13. Why is it officially recommended to use self-growing primary keys as indexes?

In combination with the characteristics of B+Tree, the auto-added primary key is continuous, and the page splitting is minimized during the insertion process. Even if the page splitting is required, only a small part of the page splitting will be performed. And it can reduce the movement of data, every insert is inserted to the end. The idea is to reduce the frequency of splitting and movement.

Insert continuous data:

Pictures from: www.javazhiyin.com/40232.html

Insert discontinuous data:

14. How to create an index?

There are three ways to create an index.

1. CREATE index when execute CREATE TABLE

CREATE TABLE user_index2 (
	id INT auto_increment PRIMARY KEY,
	first_name VARCHAR (16),
	last_name VARCHAR (16),
	id_card VARCHAR (18),
	information text,
	KEY name (first_name, last_name),
	FULLTEXT KEY (information),
	UNIQUE KEY (id_card)
);

Copy the code

ALTER TABLE ALTER TABLE ALTER TABLE

ALTER TABLE table_name ADD INDEX index_name (column_list);
Copy the code

ALTER TABLE creates a normal, UNIQUE, or PRIMARY KEY index.

Table_name indicates the name of the table to which the index is to be added. Column_list indicates the column to which the index is to be added. If there are multiple columns, the columns are separated by commas.

The index name index_name is self-naming. By default, MySQL assigns a name based on the first index column. In addition, ALTER TABLE allows multiple tables to be changed in a single statement, so multiple indexes can be created at the same time. 3. Run the CREATE INDEX command.

CREATE INDEX index_name ON table_name (column_list);
Copy the code

15. What should I pay attention to when creating an index?

Non-empty fields: Columns should be specified NOT NULL unless you want to store NULL. Columns with null values are difficult to query optimize in mysql because they complicate indexes, index statistics, and comparison operations. You should replace null values with 0, a special value, or an empty string;
The columns of fields with large value dispersion (the difference between values of variables) are placed before the joint index. You can view the difference value of the field by using the count() function. The larger the returned value is, the more unique values of the field the higher the dispersion degree of the field is.
The smaller the index field, the better: Database data is stored in pages. More data is stored on a page. More data is obtained in one I/O operation, the more efficient it is.

16. What are the principles for building an index?

Mysql > select * from (a,b,c,d); mysql > select * from (b,c,d); mysql > select * from (a,b,c,d); D (a,b,d,c); d (a, B,d,c); d (a, B,d);

2, = and in can be in random order, such as a = 1 and b = 2 and c = 3 to create (a,b,c) index can be in any order, mysql query optimizer will help you optimize the form to be recognized by the index.

3. Try to select columns with high distinction as indexes. The formula of distinction is count(DISTINCT Col)/count(*), which indicates the proportion of fields that are not duplicated. So one might ask, is there any empirical value to this ratio? This value is also difficult to determine in different application scenarios. Generally, we require join fields to be above 0.1, that is, 10 records are scanned per item on average.

Select froM_unixtime (create_time) = ‘2014-05-29’ from ‘b’; select from_unixtime(create_time) = ‘2014-05-29’ from ‘b’; Obviously the cost is too great. The statement should be create_time = unix_TIMESTAMP (‘ 2014-05-29 ‘).

5, as far as possible to expand the index, do not create a new index. For example, if you want to add (a,b) to a table that already has an index of A, you only need to modify the original index.

17. Do indexed queries always improve query performance?

Querying data through an index is usually faster than a full table scan. But we must also be aware of the costs.

Indexes require space to store and need regular maintenance, and the index itself is modified whenever a record is added or subtracted from the table or an index column is modified. This means that each record of I* NSERT, DELETE, UPDATE will incur 4 or 5 additional disk I/ OS. Because indexes require extra storage and processing, unnecessary indexes can slow query response times. Using INDEX queries may not improve query performance. INDEX RANGE SCAN queries are applicable to two situations:

Based on a range retrieval, a typical query returns a result set less than 30% of the number of records in the table.
Retrieval based on non-unique indexes.

18. When does an index not walk (index failure)?

1, use! = or <> invalidates the index

2. Indexes fail due to inconsistent types

3. Index invalidation caused by function

Such as:

SELECT * FROM `user` WHERE DATE(create_time) = '2020-09-03';
Copy the code

If you use a function in an index column, it does not walk the index.

4. Index invalidation caused by operator

SELECT * FROM `user` WHERE age - 1 = 20;
Copy the code

If you perform (+, -, *, /,!) , then all will not go to the index.

5. Index invalidation caused by OR

SELECT * FROM 'user' WHERE 'id' = 0 and 'height' = 0;Copy the code

OR causes the index to be invalidated under certain circumstances. Not all OR invalidates the index. If an OR joins the same field, the index will not be invalidated; otherwise, the index will be invalidated.

6. Index invalidation caused by fuzzy search

SELECT * FROM 'user' WHERE 'name' = '% ';Copy the code

If % is placed in front of a matching field, the index is not removed.

7. NOT IN or NOT EXISTS causes the index to become invalid

End

It’s not easy to organize, please give it a thumbs up! See you guys next time

Shoulders of giants

Blog.csdn.net/ThinkWon/ar…

www.javazhiyin.com/40232.html

Juejin. Cn/post / 684490…