Index related algorithms

Binary search

The search process of binary search method is: the records are arranged in order, and the midpoint position of the sequence is the comparison object when searching. If the element value to be found is less than the midpoint element, the search scope is reduced to the left half; If the element you are looking for has a value greater than the midpoint element, narrow the query to the right half. And so on until you find the value you want.

Binary search tree

In a binary search tree, the key value of the left subtree is always less than the key value of the root, and the key value of the right subtree is always greater than the key value of the root, and there are at most two subtrees per node.

Balanced binary trees

Let’s look at the definition of a balanced binary tree: it satisfies the definition of a binary search tree, and it must satisfy the height difference of the two subtrees of any node is at most 1.

The disadvantage is that each node only has two branches at most. If the amount of data is large, the data in leaf nodes can only be queried through multiple nodes.

B tree

B tree can be understood as a multi-fork search tree in which a node can have more than 2 child nodes. The same key does not appear more than once in a B tree, either on a leaf node or an inner node.

Compared with balanced binary trees, B trees utilize multiple branching nodes (balanced binary trees have only two branching nodes) to reduce the number of nodes experienced in obtaining records.

Disadvantages: Because each node contains key and data values, if data is large, the number of keys stored on each page will be small. When there is a large amount of data, there will also be the problem of “the data in the leaf node can be queried only after going through multiple nodes”.

B + tree

B+ tree is a variant of B tree. The definition is basically the same as that of B tree, but different from B tree:

  • All leaf nodes contain information about all keywords
  • Each leaf node is connected with a pointer
  • Non-leaf nodes store only key information, which can increase the number of keys stored in each page compared to B trees.
  • A B tree grows vertically and eventually becomes a “tall and thin” tree, while a B+ tree grows horizontally and eventually becomes a “dumpy” tree.

In a B+ tree, all the record nodes are stored on the leaf nodes in the same layer in order of key values. A B ina B+ tree is not a binary tree but a balance.

The main difference from B-tree is that the key must appear on the leaf node, but it may also repeat in the non-leaf node. The same key doesn’t appear more than once in a B tree.

B + tree index

In a database, B+ trees are usually between two and four levels high, so it only takes two to four IO at most to find a row. In the case of no index, row by row scanning is obviously much less efficient, which is why adding indexes can improve query speed.

A B+ tree index does not find a specific row for a given key value, only the page on which the row is searched. The database then reads the page into the buffer pool and searches the page in memory by binary lookup method to obtain the required data.

InnoDB B+ tree index is divided into clustered index and secondary index.

Clustered index

InnoDB’s data is stored in primary key order, whereas a clustered index is a B+ tree based on the primary key of each table, and its leaf nodes hold entire rows of data.

InnoDB’s primary key must be a clustered index. If no primary key is defined, the clustered index may be the first unique index not allowed to be null, or it may be a row ID.

Since the actual data pages can only be sorted by a B+ tree, there can only be one clustered index per table (except for the TokuDB engine). Query optimizers tend to use clustered indexes because clustered indexes can find data directly on the leaf nodes of a B+ tree index.

Clustered indexes are very fast in sorting and range lookups for primary keys.

Key information:

  • A B+ tree structure is created based on primary key values
  • Each leaf node contains an entire row of data

Secondary index

The InnoDB leaf node that stores the engine’s secondary index does not hold entire rows of data, but keys and primary key ids.

When looking for data by secondary index, InnoDB storage engine traverses the secondary index tree to find the primary key of the corresponding record, and then finds the corresponding row by primary key index.

Key information:

  • A B+ tree structure is created based on the values of the normal indexes
  • Each leaf node holds its own key and primary key ID for the normal index

Scenarios where indexes need to be added

Data retrieval

When data is retrieved, indexes are added to conditional fields.

Aggregation function

Indexes can improve the efficiency of Max () function, as well as min() function. It has also been analyzed that indexes also have optimization effects on count(*).

The sorting

  • If you sort a single field, you can optimize the sort statement by adding an index to the sorted field;
  • If the sort is multiple fields, you can add joint indexes on multiple sort fields to optimize the sort statement.
  • If the query is equivalent and then sort, you can optimize the sort statement by adding joint indexes to the condition and sort fields.

To avoid back to the table

select a,d from t9_1 where a=90000;
/* select * from d; /* select * from d
Copy the code

Because of the structure of the secondary index, the primary key of the corresponding record is found through the secondary index, and then the corresponding row data is found through the primary key index back to the table.

If the condition field and the field to be queried have a joint index, this step is actually saved, because the joint index contains the values of the two fields. An index like this already covers the scenario for our query requirements, which we call an overwriting index.

Associated query

Make BNL NLJ or BKA by adding indexes to associated fields.

Joint index

Index is the index of multiple columns on a table. Suitable for multiple column combinations in WHERE conditions, which can avoid back to the table in some scenarios. The federated index is shown in the figure.

For a, B two fields as a condition, the query can go index; For individual A field queries, it is also possible to go indexed. However, the index of field b cannot be queried alone, because in the figure above, the corresponding values of field b are 1,2,3,1,2,3, obviously not in order, so it cannot be indexed for field b.

Joint Index recommendations:

  • In the WHERE condition, frequently co-occurring columns are placed in the federated index.
  • Place the most selective column at the far left of the union index.