How much do you know about frequent interview questions about database indexing?

The MySQL index is a popular interview question. Here are some of the most popular interview questions you can ask:

Search the public account zhang on wechat, reply to the interview manual, and get the PDF version of this document and more interview materials.

Recommended reading:

Understand all basic Java knowledge questions
Understand all the computer network interview questions
Understand all the Java collection interview questions
Learn all the HashMap interview questions

What is an index?

An index is a structure that sorts the values of one or more columns of a database table. Using an index, you can quickly access specific information in the data table.

Pros and cons of indexes?

Advantages:

Greatly speed up data retrieval.
Turn random I/O into sequential I/O(because the leaves of a B+ tree are joined together)
Accelerometers join to each other

Disadvantages:

From a spatial perspective, building an index requires physical space
In terms of time, it takes time to create and maintain indexes. For example, indexes need to be maintained when data is added, deleted, or modified.

Index data structure?

Index data structure mainly includes B+ tree index and hash table, corresponding indexes are B+ tree index and hash index respectively. InnoDB engine index types are B+ tree index and hash index. The default index type is B+ tree index.

B + tree index

Those familiar with data structure know that B+ tree, balanced binary tree and red-black tree are all classical data structures. In a B+ tree, all record nodes are placed on leaf nodes in order of key-value size, as shown in the figure below.

As you can see from the figure above, because B+ trees are ordered and all the data is stored in leaf nodes, lookup is very efficient and supports sorting and range lookup.

The index of B+ tree can be divided into primary index and secondary index. The primary index is clustered index, and the secondary index is non-clustered index. Cluster index is a B+ tree index composed of the primary key as the key value of the B+ tree index. The leaf nodes of the cluster index store complete data records. A non-clustered index is a B+ tree index consisting of columns with non-primary keys as the key values of the B+ tree index. The leaf nodes of the non-clustered index store primary key values. Therefore, when using a non-clustered index to perform a query, the primary key value will be found first, and then the data field corresponding to the primary key will be found according to the clustered index. In the figure above, the leaf node stores data records and is the structure diagram of clustered index. The structure diagram of non-clustered index is as follows:

The letters in the figure above are the column values of the non-primary key of the data. If you want to query the information of the column value B, you need to find primary key 7 and query the data field corresponding to primary key 7 in the cluster index.

The hash index

Hash index is based on the hash table, for each row of data, the storage engine will be on the indexed column by hash algorithm calculated hash hash code, and the hash algorithm to keep different column values computed hash code value is different, the hash code value as the key value of the hash table, pointer will point to the data line as the value of a hash table values. The time complexity of finding a data in this way is O (1), which is usually used for accurate searching.

What’s the difference between a Hash index and a B+ tree?

Because of their differences in data structure, hash indexes are generally used for accurate equivalent lookup, while B+ indexes are mostly used for other searches besides accurate equivalent lookup. In most cases, you will choose to use B+ tree indexes.

Hash indexes do not support sorting because hash tables are unordered.
Hash indexes do not support range lookups.
Hash indexes do not support fuzzy queries and left-most prefix matching of multi-column indexes.
Since hash conflicts occur in hash tables, the performance of hash indexes is unstable, whereas the performance of B+ tree indexes is relatively stable, with each query going from root to leaf nodes

What are the types of indexes?

The main index types of MySQL are FULLTEXT, HASH, BTREE, and RTREE.

FULLTEXT

FULLTEXT is a full-text index. MyISAM storage engine and InnoDB storage engine support full-text index in MySQL5.6.4 and above. It is generally used to find keywords in text, rather than directly compare whether they are equal or not. Full-text index is mainly used to solve the problem of low efficiency of fuzzy query for text such as WHERE name LIKE “%zhang%”.
HASH

HASH is a HASH index. HASH indexes are mostly used for equivalent queries with complex time of O (1) and high efficiency. However, they do not support sorting, range query, and fuzzy query.
BTREE

BTREE is a B+ tree index, INnoDB storage engine default index, support sorting, grouping, range query, fuzzy query, and so on, and stable performance.
RTREE

RTREE is a spatial data index, which is mainly used to store geographical data. Compared with other indexes, the spatial data index has the advantage of range search

What are the types of indexes?

Primary key index: Data columns cannot be duplicate or NULL, and a table can have only one primary key index
Composite index: An index consisting of multiple column values.
Unique index: Data columns cannot be duplicated and can be NULL. The value of the index column must be unique. If it is a combined index, the combination of column values must be unique.
Full-text indexing: Searches for the content of text.
Normal index: Basic index type, which can be NULL

What’s the difference between a B tree and a B+ tree?

There are two main differences between B trees and B+ trees:

The inner node and leaf node of B tree store keys and values, while the inner node of B+ tree only has keys and no values. The leaf node stores all keys and values.
The leaf nodes of the B+ tree are linked together to facilitate sequential retrieval.

The structure of both is shown below.

Why does the database use B+ trees instead of B trees?

B trees are good for random retrieval, while B+ trees are good for both random and sequential retrieval
The space utilization of the B+ tree is higher because each node of the B+ tree stores keys and values, while the internal nodes of the B+ tree only store keys. In this way, one node of the B+ tree can store more indexes, which lowers the height of the tree, reduces I/O times, and speeds up data retrieval.
The leaf nodes of a B+ tree are all connected together, so the range search and sequence search are more convenient
The performance of B+ trees is more stable because in B+ trees, each query goes from the root node to the leaf node, whereas in B trees, the value to be queried may not be at the leaf node but already found at the internal node.

When is it appropriate to use a B-tree? Because the internal nodes of a B-tree can also store values, you can put some frequently accessed values closer to the root node, which can improve query efficiency. In summary, the performance of B+ trees is more suitable as database indexes.

What is a clustered index and what is a non-clustered index?

The main difference between clustered indexes and non-clustered indexes is whether data and indexes are stored separately.

Clustered indexes: Data and indexes are stored together, and the leaves of the index structure retain rows of data.
Non-clustered indexes: Data forward and index are stored separately. Index leaf nodes store addresses pointing to data rows.

In InnoDB storage engine, the default index is B+ tree index. Indexes created using primary keys are primary indexes and clustered indexes. Indexes created on primary indexes are secondary indexes and non-clustered indexes. The secondary index is created on top of the primary index because the leaf node in the secondary index stores the primary key.

In MyISAM storage engine, the default index is also B+ tree index, but the primary and secondary indexes are non-clustered indexes, that is, the leaf node of the index structure stores an address to the data row. And use secondary indexes to retrieve indexes that do not require access to the primary key.

To see the difference, here are two classic images (courtesy of the Internet) :

Does a non-clustered index have to do a table-back query?

Is said above the leaf node of the cluster index storage is a primary key, that is to find the primary key, through the clustering index through clustering index data of the primary key, the latter to find primary key corresponding data through clustering index of query process is back to the table, then the clustering index table query will certainly to back?

The answer is not necessarily, there is an index coverage problem, if the query data is fully available on the secondary index, there is no need to query back to the table. For example, a table stores personal information including id, name, age, and other fields. Select ID,name from user where name = ‘zhangsan’; select ID,name from user where name = ‘zhangsan’; This query does not need to be queried back to the table because all the data is already retrieved through the non-clustered index, which is what index coverage is. Select id,name,age from user where name = ‘zhangsan’; A back table query is required because the value of age cannot be retrieved through a non-clustered index. So how do you solve that? Select id,name,age from user where name = ‘zhangsan’; select id,name,age from user where name = ‘zhangsan’; Query.

So index coverage can solve the problem of non-clustered index query back to the table.

What are the use scenarios for indexes?

Indexing is very effective for medium and large tables, but for very small tables, full table scans are generally faster.
For very large tables, where the cost of creating and maintaining indexes can become high, consider partitioning.
If the number of changes to the table is very large and the number of queries is very small, there is no need to create an index, because maintaining the index is also costly.
Fields that do not normally appear in a WHERE condition do not need to be indexed.
Consider a federated index if multiple fields are frequently queried.
Consider unique indexes when there are many fields and their values are not duplicated.
Consider normal indexes when there are multiple and duplicate fields.

Index design principles?

The best columns for an index are those that appear after WHERE or specified in a join sentence, not in a selection list that appears after the SELECT keyword.
The larger the cardinality of the index column is, the better the index effect is. In other words, the higher the differentiation of the index column, the better the index effect is. For example, using a poorly differentiated column, such as gender, as an index, would be bad, because the column has a maximum of three cardinals, most of which are either male or female.
Use shorter indexes whenever possible. You should specify a shorter prefix length when indexing longer strings because smaller indexes involve less disk I/O and because blocks in the index cache can hold more key values, making queries faster.
Use left-most prefixes whenever possible.
Don’t over-index, each index requires extra physical space and takes time to maintain, so more indexes is not always better.

How can indexes be optimized?

The key to index optimization is to conform to the index design principles and application scenarios. The indexes that do not meet the requirements are optimized to conform to the index design principles and application scenarios.

In addition to the design principles and application scenarios of the index, it can also be considered from the following two aspects.

Indexed columns cannot be part of an expression or arguments to a function when querying, because indexes cannot be used. For example,select * from table_name where a + 1 = 2
Place the most differentiated index first
Use select* as little as possible

The usage scenarios of indexes, the design principles of indexes and how to optimize indexes can be regarded as a problem.

How do I create/drop an index?

Create index:

Use the CREATE INDEX statement

CREATE INDEX index_name ON table_name (column_list);

Created when CREATE TABLE is created

	CREATE TABLE user(
	id INT PRIMARY KEY,
	information text,
	FULLTEXT KEY (information)
);
Copy the code

Create index with ALTER TABLE

ALTER TABLE table_name ADD INDEX index_name (column_list);

Delete index:

Delete the primary key index

Alter table TABLE name drop primary key
Delete other indexes

Alter TABLE TABLE name DROP Key index name

Does performance improve when using indexed queries?

Not necessarily. How to use indexes properly has been mentioned in index usage scenarios and index design principles. Because creating and maintaining indexes costs space and time, improper use of indexes will degrade query performance.

What is a prefix index?

Prefix index is used to index the first few characters of text or string. In this way, the index length is shorter and the query speed is faster.

Application scenario: Prefixes are highly distinguishable.

The way to build a prefix index

ALTER TABLE table_name ADD KEY(column_name(prefix_length));

There’s a prefix_length argument that’s hard to determine, and that’s what prefix length means. This can usually be determined using the following method, where the distinction of the entire column is calculated first

SELECT COUNT(DISTINCT column_name) / COUNT(*) FROM table_name;

And then the distinction between the prefix length and the entire column is the most similar.

SELECT COUNT(DISTINCT LEFT(column_name, prefix_length)) / COUNT(*) FROM table_name;

Constantly adjust the value of prefix_length until it is close to the distinction calculated for the entire column.

What is the leftmost matching principle?

Left-most matching rule: Continuous matching starts from the left-most and stops when a range query (<, >, between, or like) occurs.

For example, to create an index (a,b,c), you can guess whether an index is used in the following cases.

The first kind of

select * from table_name where a = 1 and b = 2 and c = 3 
select * from table_name where b = 2 and a = 1 and c = 3
Copy the code

SQL > select * from ‘where’; SQL > select * from ‘where’; SQL > select * from ‘;

The second,

select * from table_name where a = 1
select * from table_name where a = 1 and b = 2  
select * from table_name where a = 1 and b = 2 and c = 3
Copy the code

The answer is that all three queries use indexes, because all three matches from the far left.

The third kind of

select * from table_name where  b = 1 
select * from table_name where  b = 1 and c = 2 
Copy the code

The answer is that neither query uses the index because the match is not from the left

A fourth
```
select * from table_name where a = 1 and c = 2 
Copy the code
```
This query only uses the index for column A, but does not use the index for column C because column B is skipped.
The fifth
```
select * from table_name where  a = 1 and b < 3 and c < 1
Copy the code
```
In this query, only columns A and B use indexes, and column C does not because a range query is stopped according to the left-most matching query principle.

6 kinds of

select * from table_name where a like 'ab%'; 
select * from table_name where  a like '%ab'
select * from table_name where  a like '%ab%'
Copy the code

In the case of column strings, only prefix matches can use indexes, and infix matches and suffix matches can only perform full table scans.

Under what circumstances does an index fail?

In addition to the several cases that failed to comply with the leftmost matching principle described above, the following cases can also cause index invalidation.

The condition has or, for exampleselect * from table_name where a = 1 or b = 3
Evaluation on an index will invalidate the index, for exampleselect * from table_name where a + 1 = 2
An implicit conversion of the data type to the type of the index will invalidate the index. For example, the string must be quoted, assumingselect * from table_name where a = '1' I’m going to use the index, if I writeselect * from table_name where a = 1 The index is invalidated.
Using a function in an index will invalidate the index, for exampleselect * from table_name where abs(a) = 1
Starting with % in a like query invalidates the index
Used on index! , =, <> Will result in index invalidation, for exampleselect * from table_name where a ! = 1
If is NULL/IS not NULL is used on an index field, the index is invalid, for exampleselect * from table_name where a is null

How much do you know about frequent interview questions about database indexing?

What is an index?

Pros and cons of indexes?

Index data structure?

What’s the difference between a Hash index and a B+ tree?

What are the types of indexes?

What are the types of indexes?

What’s the difference between a B tree and a B+ tree?

Why does the database use B+ trees instead of B trees?

What is a clustered index and what is a non-clustered index?

Does a non-clustered index have to do a table-back query?

What are the use scenarios for indexes?

Index design principles?

How can indexes be optimized?

How do I create/drop an index?

Does performance improve when using indexed queries?

What is a prefix index?

What is the leftmost matching principle?

Under what circumstances does an index fail?

Related Posts

Scala series 4 – Array Array

JDK15 is officially released, and the landmark ZGC is officially announced

Python Programming Tutorial 12: Arguments to functions