The index

What is an index?

2. What are the advantages and disadvantages of indexes?

3. Index Usage Scenarios (key points)

4. What are the types of indexes?

5. Index data structure (B tree, hash)

6. Basic principles of indexing

7. What are the indexing algorithms?

8, index design principles?

9. Principles of index creation (important)

10, Three ways to create an index, drop an index

What do I need to pay attention to when creating an index?

12. Do indexed queries always improve query performance? why

13. How to delete data of millions level or above

Prefix index

What is the leftmost prefix rule? What is the leftmost matching principle

16, The difference between B tree and B+ tree

17. Advantages of using B trees

18. Benefits of using B+ trees

19. What’s the difference between a Hash index and a B+ tree?

20. Why do databases use B+ trees instead of B trees

21. B+ tree does not need to query data back to the table when it meets the requirements of clustered index and overwritten index.

22, What is clustering index? When to use clustered and non-clustered indexes

23, Non-clustered index must be back to the table query?

What is a federated index? Why do I care about the order in a federated index?

What is an index?

Indexes are special files (indexes on InnoDB tables are part of the table space) that contain Pointers to all the records in the table.

An index is a data structure. A database index is a sorted data structure in a database management system to help query and update data in a database table quickly. Indexes are usually implemented using B trees and their variant B+ trees.

More generally, an index is a table of contents. In order to facilitate the search of the contents of the book, through the content of the index to form a catalog. An index is a file that occupies physical space.

2. What are the advantages and disadvantages of indexes?

Advantages of indexes

(1) It can greatly accelerate the speed of data retrieval, which is also the main reason for creating indexes.

(2) Through the use of index, can be in the query process, the use of optimization hide, improve the performance of the system.

Disadvantages of indexes

(1) Time: it takes time to create and maintain indexes. Specifically, when adding, deleting and modifying data in tables, indexes also need to be maintained dynamically, which will reduce the efficiency of adding, changing, and deleting data.

(2) Space: the index needs to occupy physical space.

3. Index Usage Scenarios (key points)

where


In the figure above, the record is queried by ID. Because the ID field is only the primary key index, this SQL execution can select only the primary key index. If there are more than one, it will eventually select the better one as the basis for the retrieval.

Altertable innodb1 add sex char(1); altertable innodb1 add sex char(1); NullEXPLAINSELECT *from Innodb1where sex='male';Copy the code

Alter table add index(alter table name add index(alter table name add index(alter table name add index(alter table name add index(alter table name add index(alter table name add index(alter table name add index(alter table name add index)))

order by

When we use order BY to sort the query results by a field, if the field is not indexed, then the execution plan will use external sort for all the queried data. This operation can affect performance. Because all the data involved in the query needs to be read from disk to memory (if a single data is too large or too much data will reduce efficiency), let alone the sorting after reading to memory.

However, if we create an index for this field alter table add index(field name), then because the index itself is ordered, so directly according to the order of the index and mapping relationship can be fetched one by one. And if paging, then only the index in a range of the index table corresponding to the data, rather than fetching all the data to sort and return a range of data as described above. (Fetching data from disk is the most performance critical)

join

It is efficient to index the fields involved in the join statement matching relation (ON)

Indexes cover

If the fields to be queried are all indexed, the engine will query directly in the index table without accessing the raw data (otherwise, it will do a full table scan whenever a field is not indexed), which is called index overwrite. Therefore, we need to write only the necessary query fields after the select as much as possible to increase the chance of index coverage.

It’s worth noting here that you don’t want to index every field, because the advantage of using indexes in preference is their small size.

4. What are the types of indexes?

Primary key index: 

Columns cannot duplicate or be NULL, and a table can have only one primary key.

Unique index: 

Data columns are not allowed to duplicate and NULL values are allowed. A table allows multiple columns to create unique indexes.

ALTER TABLE table_name ADD UNIQUE (column); Create a unique index

ALTER TABLE table_name ADD UNIQUE (column1,column2); Create a unique composite index

Common index: 

The basic index type, which has no restriction on uniqueness, can be NULL.

ALTER TABLE table_name ADD INDEX index_name (column); Creating a normal index

ALTER TABLE table_name ADD INDEX index_name(column1, column2, column3); Create composite indexes

Full-text index: 

It is a key technology used by search engines at present.

ALTER TABLE table_name ADD FULLTEXT (column); Creating a full-text index

5. Index data structure (B tree, hash)

The data structure of index is related to the implementation of specific storage engine. Indexes used in MySQL include Hash index, B+ tree index, etc. The default index of InnoDB storage engine we often use is B+ tree index. For hash index, the underlying data structure is hash table, so in the vast majority of requirements for a single record query, you can choose hash index, query performance is the fastest; In most scenarios, you are advised to select the BTree index.

(1) B-tree index

Mysql uses storage engine to fetch data, and almost 90% of people use InnoDB. According to the implementation, InnoDB has only two index types: BTREE index and HASH index. B-tree index is the most frequently used index type in Mysql database. Almost all storage engines support BTree index. Mysql > select * from BTREE; select * from BTREE; select * from BTREE;

Query mode:

Primary key index area :PI(address of associated saved data) press the primary key to query,

Common index area :si(address of the associated ID, and then to the address above). So press the primary key query, the fastest

B + tree properties:

1) N subtree nodes contain n keywords, not to store data but to store data index.

2) All leaf nodes contain information of all keywords and Pointers to records containing these keywords, and leaf nodes themselves are linked in large order according to the size of keywords.

3) All non-terminal nodes can be regarded as index parts, which only contain the maximum (or minimum) keyword in the sub-tree.

4) In B+ tree, data objects are inserted and deleted only on leaf nodes.

5) B+ tree has two head Pointers, one is the root node of the tree and the other is the leaf node of the minimum key code.

(2) Hash index

Briefly said, simple implementation of a HASH table is similar to the data structures (HASH), when we use HASH index in mysql, mainly through the HASH algorithm (common HASH algorithm is directly addressing method, in the square method and folding method, the divisor residual method, random number method), puts the data into a database field long HASH value, The row pointer to this data is stored in the Hash table; If a Hash collision occurs (two different keywords have the same Hash value), they are stored in a linked list under the corresponding Hash key. Of course, this is just a rough simulation.

6. Basic principles of indexing

Indexes are used to quickly find records that have specific values. If there is no index, the query will generally traverse the entire table.

The principle of indexing is simple: it turns unordered data into ordered queries

(1) Sort the contents of the column where the index is created

(2) Generate an inversion list of sorting results

(3) Spell the data address chain on the inverted list

(4) In the query, first get the inversion list content, and then take out the data address chain, so as to get specific data

7. What are the indexing algorithms?

Index algorithms include BTree algorithm and Hash algorithm

BTree algorithm

BTree is the most common mysql database indexing algorithm and the default mysql algorithm. Because it can be used not only on the =,>,>=,<,<= and between comparison operators, but also on the like operator, as long as its query condition is a constant that does not begin with a wildcard, for example:

Select * fromuserWHERE name like from fromuserWhere where name like'jack%'; -- If it starts with a wildcard character, or if no constant is used, the index will not be used, for example: select* fromuserWhere name like'%jack';Copy the code

The Hash algorithm

Hash Hash indexes can only be used for peer comparison, such as the =,<=> (equivalent to the =) operator. Because it is a positioning data, unlike the BTree index, which needs to access the page node from the root node to the branch node for many IO visits, the retrieval efficiency is much higher than that of the BTree index.

8, index design principles?

(1) Columns that are suitable for indexing are those that appear in the WHERE clause, or those specified in the join clause

(2) It is not necessary to create an index in this column because the index effect is poor for the class with small cardinality

(3) Use a short index. If you index a long string column, you should specify a prefix length, which can save a lot of index space

(4) Don’t over-index. Indexes require additional disk space and reduce write performance. When table contents are modified, the index is updated or even reconstructed, and the more index columns, the longer this takes. So keep only the indexes you need to help the query.

9. Principles of index creation (important)

Indexing is good, but it is not unlimited. It is best to comply with the following principles

Mysql will keep matching to the right until it hits a range query (>, <, between, like). A = 1 and b = 2 and c > 3 and d = 4 a = 1 and b = 2 and c > 3 and d = 4 a = 1 and b = 2 and C > 3 and d = 4

2) create an index for a field that is frequently used as a query condition

3) Frequently updated fields are not suitable for creating indexes

4) If the column cannot effectively distinguish data, it is not suitable for index column (such as gender, male and female unknown, there are only three at most, the distinction is too low).

5) Expand indexes as much as possible, do not create new indexes. For example, if you want to add (a,b) to a table that already has an index of A, you only need to modify the original index.

6) foreign key columns must be indexed.

7) For those columns that are rarely involved in the query, do not create indexes for those columns with a high number of duplicate values.

8) Do not index columns of data types defined as text, image, and bit.

10, Three ways to create an index, drop an index

The first method: CREATE an index when executing CREATE TABLE

CREATETABLE user_index2 (Copy the code

Second: use the ALTER TABLE command to add an index

ALTERTABLE table_name ADDINDEX index_name (column_list);Copy the code

ALTER TABLE creates a normal, UNIQUE, or PRIMARY KEY index.

Table_name indicates the name of the table to which the index is to be added. Column_list indicates the column to which the index is to be added. If there are multiple columns, the columns are separated by commas.

The index name index_name is self-naming. By default, MySQL assigns a name based on the first index column. In addition, ALTER TABLE allows multiple tables to be changed in a single statement, so multiple indexes can be created at the same time.

Third method: Run the CREATE INDEX command to CREATE the INDEX

CREATEINDEX index_name ON table_name (column_list);Copy the code

CREATE INDEX Can add a normal or UNIQUE INDEX to a table. (However, you cannot create a PRIMARY KEY index.)

Remove the index

Delete normal index, unique index, full-text index by index name:

ALTER TABLE 表名 DROP KEY 索引名 altertable user_index dropKEY NAME;
altertable user_index dropKEY id_card;
altertable user_index dropKEY information;Copy the code

Alter table table name drop primary key; It is worth noting here that this operation cannot be performed directly if the primary key grows (self-growth depends on the primary key index) :

Need to cancel self-growth and then delete:

Altertable user_index -- redefine field MODIFY id int,dropPRIMARYKEYCopy the code

However, primary keys are usually not removed because design primary keys must be independent of business logic.

What do I need to pay attention to when creating an index?

(1) Non-empty field:

The column should be specified NOT NULL unless you want to store NULL. Columns with null values are difficult to query optimize in mysql because they complicate indexes, index statistics, and comparison operations. You should replace null values with 0, a special value, or an empty string;

(2) Fields with large discrete values:

(the degree of difference between variable values) is placed before the joint index, and the difference value of the field can be viewed through the count() function. The greater the return value, the more unique values of the field the higher the dispersion degree of the field.

(3) The smaller the index field, the better:

Database data is stored in pages. More data is stored on a page. More data is obtained during an I/O operation.

12. Do indexed queries always improve query performance? why

In general, querying data through an index is faster than a full table scan. But we must also be aware of the costs.

(1) The index needs space for storage and regular maintenance. Whenever records are added or deleted in the table or index columns are modified, the index itself will also be modified. This means that each INSERT, DELETE, and UPDATE record will cost 4 or 5 more disk I/ OS. Because indexes require extra storage and processing, unnecessary indexes can slow query response times. Using INDEX queries may not improve query performance. INDEX RANGE SCAN queries are applicable to two situations:

(2) Based on a range of retrieval, a typical query returns a result set less than 30% of the number of records in the table

(3) Retrieval based on non-unique index

13. How to delete data of millions level or above

About index: Because index needs extra maintenance cost, index file is a separate file, so when we add, modify and delete data, there will be extra operations on index file, these operations need to consume extra IO, which will reduce the efficiency of add/change/delete. So, when we delete millions of database data, check the MySQL official manual to see that the speed of deleting data is proportional to the number of indexes created.

(1) So when we want to delete millions of data, we can delete the index first (at this time, about three minutes).

(2) Then delete the useless data (this process takes less than two minutes)

(3) Create index after delete (less data at this time) create index is also very fast, about 10 minutes.

(4) With the previous direct delete is absolutely much faster, not to mention in case of delete interruption, all delete will be rolled back. That’s even worse.

Prefix index

Syntax: index(field(10)) : uses the first 10 characters of a field value to create an index. The default value is to use the entire content of a field to create an index.

Prerequisite: The identifier of the prefix is high. Passwords, for example, are good for prefix indexing because they are almost always different.

Practical difficulty: the length of the prefix cut.

Select count(*)/count(distinct left(password,prefixLen)); By adjusting the prefixLen value (incremented from 1) to see an average match for different prefix lengths near 1 (the prefixLen characters representing a password almost determine a single record)

What is the leftmost prefix rule? What is the leftmost matching principle

When creating a multi-column index, the most frequently used column in the WHERE clause is placed on the left.

Mysql > select * from left prefix; mysql > select * from left prefix; mysql > select * from left prefix; A = 1 and b = 2 and c > 3 and d = 4 a = 1 and b = 2 and c > 3 and d = 4 a = 1 and b = 2 and C > 3 and d = 4

(3) = and in can be in random order, such as a = 1 and b = 2 and c = 3, create (a,b,c) index can be in any order, mysql query optimizer will help you optimize the form to be recognized by the index

16, The difference between B tree and B+ tree

(1) In a B-tree, you can store keys and values in internal nodes and leaf nodes; But in a B+ tree, the inner nodes are all keys and have no values, and the leaf nodes hold both keys and values.

(2) The leaf nodes of B+ tree are connected by a chain, while the leaf nodes of B tree are independent.

17. Advantages of using B trees

B-trees can store both keys and values in internal nodes, so placing frequently accessed data near the root node greatly improves the efficiency of hot data queries. This feature makes b-trees more efficient in scenarios where a particular data is queried repeatedly.

18. Benefits of using B+ trees

Since the internal nodes of the B+ tree only store keys, not values, a single read can fetch more keys in the memory page, which helps to narrow down the search more quickly. The leaf nodes of B+ tree are connected by a chain. Therefore, when a full data traversal is needed, B+ tree only needs O(logN) time to find the smallest node, and then O(N) sequential traversal through the chain is enough. B trees, on the other hand, need to traverse each level of the tree, which requires more memory replacement times and therefore more time

19. What’s the difference between a Hash index and a B+ tree?

First, understand the underlying implementation of Hash indexes and B+ tree indexes:

The underlying hash index is a hash table. When searching for data, you can call the hash function once to obtain the corresponding key value, and then query back to the table to obtain the actual data. The underlying implementation of a B+ tree is a multi-path balanced lookup tree. For each query, it starts from the root node, and the key value can be obtained when the leaf node is found, and then it is judged whether it is necessary to query data back to the table according to the query.

So you can see that they have the following differences:

  • Hash indexes are faster for equivalent queries (in general), but not for range queries.

After the hash function is used to create indexes in the hash index, the index order cannot be the same as the original order, and range query cannot be supported. All nodes of a B+ tree follow the rules (the left node is smaller than the parent node, the right node is larger than the parent node, and the same is true for multi-fork trees), which naturally supports the range.

  • Hash indexes do not support sorting by indexes.
  • Hash indexes do not support fuzzy query and left-most prefix matching of multi-column indexes. It also works because hash functions are unpredictable. The indexes of AAAA and AAAAB have no correlation.
  • Hash indexes can always be used to query data back to the table, whereas B+ trees can use indexes only when certain conditions (clustered indexes, overwriting indexes, etc.) are met.
  • Hash indexes, while fast for equivalent queries, are not stable. Performance is unpredictable. When there are a large number of duplicate key values, hash collisions occur, and the efficiency may be very poor. The query efficiency of B+ tree is relatively stable. All queries are from the root node to the leaf node, and the height of the tree is relatively low.

Therefore, in most cases, choosing B+ tree indexes directly can achieve stable and good query speed. Instead of using hash indexes.

20. Why do databases use B+ trees instead of B trees

(1) B tree is only suitable for random retrieval, while B+ tree supports both random and sequential retrieval;

(2) The B+ tree space utilization is higher, which can reduce I/O times and lower disk read and write costs. Generally, indexes themselves are too large to be stored in memory, so indexes are often stored on disk as index files. In this case, disk I/O consumption is incurred during index lookups. The internal nodes of the B+ tree do not have Pointers to the specific information about keywords, but are used as indexes. The internal nodes of the B+ tree are smaller than those of the B tree. The number of keywords in the nodes that can be contained in the disk block is larger, and the number of keywords that can be searched in the memory at a time is larger. IO read and write times are the biggest factor affecting index retrieval efficiency.

(3) THE query efficiency of B+ tree is more stable. B-tree search may end at non-leaf nodes, and the closer it is to the root node, the shorter the record search time is. As long as the key word is found, the existence of the record can be determined, and its performance is equivalent to a binary search in the whole set of keywords. However, in B+ tree, sequential retrieval is more obvious. In random retrieval, any keyword must be searched from the root node to the leaf node. All keyword search paths have the same length, resulting in the same query efficiency of each keyword.

(4) B-tree improves disk IO performance but does not solve the problem of low efficiency of element traversal. The leaf nodes of a B+ tree are connected together sequentially using Pointers, and the entire tree can be traversed simply by traversing the leaf nodes. Moreover, range-based queries are very frequent in the database, and B trees do not support such operations.

(5) Higher efficiency in adding and deleting files (nodes). Because the leaf node of B+ tree contains all keywords and is stored in an ordered linked list structure, the efficiency of addition and deletion can be greatly improved.

21. B+ tree does not need to query data back to the table when it meets the requirements of clustered index and overwritten index.

In the index of a B+ tree, the leaf node may store the current key value, or it may store the current key value as well as the entire row of data. This is the clustered index and the non-clustered index. In InnoDB, only primary key indexes are clustered indexes. If there is no primary key, a unique key is selected to create a clustered index. If there is no unique key, a key is implicitly generated to build the cluster index.

When a query uses a clustered index, the entire row of data can be retrieved at the corresponding leaf node, so there is no need to run a query back to the table.

22, What is clustering index? When to use clustered and non-clustered indexes

(1) Clustered index: put the data storage and index together, find the index will find the data

(2) Non-clustering index: Myisam uses key_buffer to cache the index in memory. When it needs to access data (through the index), myISam directly searches the index in memory, and then finds the corresponding data on disk through the index. This is why indexes are slow when they are not hit by the key buffer

To clarify a concept: innodb, above the clustering index created index called secondary index, auxiliary index access data is always need a second search, the clustering index is auxiliary index, as a composite index, the prefix index, the only index, auxiliary index leaf node storage is no longer the physical location, but the primary key

When to use clustered and non-clustered indexes

23, Non-clustered index must be back to the table query?

Not necessarily. This involves whether all the fields required by the query match the index. If all the fields match the index, then there is no need to perform the query back to the table.

Select age from employee where age < 20; select age from employee where age < 20; select age from employee where age < 20;

What is a federated index? Why do I care about the order in a federated index?

MySQL can use multiple fields to create an index at the same time, called a federated index. If you want to match an index in a joint index, you need to match the index one by one in the order of the fields when the index is created. Otherwise, the index cannot be matched.

Specific reasons are as follows:

MySQL > create index (name, age, school); MySQL > create index (name, age, school); MySQL > create index (school);

When the query is performed, the indexes are only strictly ordered according to name, so the name field must be used for equivalent query first. Then, the matched columns are strictly ordered according to age field, and the age field can be used for index search, and so on. Therefore, when establishing a joint index, we should pay attention to the order of index columns. In general, the columns with frequent query requirements or high field selectivity should be placed first. Additional adjustments can be made individually, depending on the specific query or table structure.

The last

Welcome to pay attention to the public number: programmers chasing the wind, receive Java knowledge learning mind map summary + a large factory Java interview summary + a 300 page PDF document Java core knowledge summary!