Database index

Talk about common data structures

Basic data structure

An array of 1.

(1) is a data node consisting of a set of elements of the same type. (2) ** allocates a contiguous chunk of memory to store **. (3) The corresponding storage address of the element can be calculated by using the index of the element.Copy the code

2. List

(1) The objects are sorted in linear order. (2) The order in the linked list is determined by the Pointers in each object.Copy the code

3. Stack/queue

(1) stacks and queues are dynamic collections, in which the delete operation removes predefined elements; (2) stacks implement LIFO(Last in,first out) policies; (3) queues implement FIFO(First in,first out) policiesCopy the code

4. The binary tree

P. left,right holds Pointers to parent, left child, and right child nodesCopy the code

5. A root tree

(1) on the base of binary tree, add left and right attributes child1,child2... Each node contains two Pointers, <1> pointing to the leftmost child node <2> pointing to its adjacent sibling node to the rightCopy the code

6. A hash table

(1) Generalization of the concept of ordinary array. (2) Keyword set + hash function + conflict resolution (chain address method/open address method/double probe/double hash)Copy the code

Binary search tree

(1) Organize by binary tree (2) Binary search tree properties: let x be a node in the binary search tree. If y is a node in the left subtree of X, y.ey <= X. ey; if y is a node in the right subtree of X, y.ey >= X. eyCopy the code

8. Avl tree:

(1) is a binary search tree. (2) Properties of a balanced search tree: <1> is an empty tree or the absolute value of the difference in height between the left and right subtrees is not more than 1.Copy the code

9. The red-black tree

(1) Red black tree is binary search tree (2) Property of red black tree <1> Every node is red or black <2> Root node is black <3> leaf node (NIL) is black <4> If a node is red, its children are black <5> For each node, the simple path from that node to all its descendant leaf nodes contains the same number of black nodes. (3) A red-black tree with N nodes is at most 2lg(n+1) in height.Copy the code

Advanced data structure

1. b-tree

Btree is a root tree. 2. Properties of btree: <1> each node x has the following attributes: a. x.n, the number of keywords currently stored on node x b. x.n keywords x.key1,x.key2... ,x.keyn The value of x.key1 <= X. key2 <=... <= x.keyn c.x.leyaf, if x is a leaf, x.leyaf = True; If x is an internal node, xleaf = False <2> Each internal node x also includes x.n+1 Pointers to its children, x.c1, x.c2... . The leaf node has no sons, so the <3> keyword x. keyyi of the leaf node is not defined to divide the range of keywords stored in each subtree: if ki is any keyword stored in the subtree rooted in X. key1, then k1 <= X.key1 <= <= k2 <= X.ey2 <= X.eyx.n <= kx.n+1 <4> Each leaf node has the same depth, that is, the height of the tree H <5> The number of keywords contained in each node has upper and lower bounds. A is represented by a minimum degree called b tree and a fixed integer T >=2. Every node other than the root node must have at least T-1 keywords, so every internal node other than the root node must have at least T children. If the tree is not empty, the root node has at least one keyword b. Each node can contain 2T-1 keywords. So an internal node can have at most 2T children. When a node has exactly 2T-1 keywords, the node is said to be full. In particular, every place in a tree where data is stored should have both keys and values.Copy the code

2. b+tree

The root node contains at least one element. 2. B +tree has two types of nodes: internal node and leaf node. Internal nodes do not store data (value), but store key values and Pointers to child nodes. The keys in the inner node are sorted in ascending order. For a key in the inner node, all the keys in the calculation are smaller than it and all the keys in the right subtree are larger than it. The records in the leaf node are also arranged according to the size of the key. 4. The parent node stores the index of the first element of the child node.Copy the code

Indexes and data structures

(1) The nature of index: it is a data structure designed to speed up the search. Its essence is to associate a keyword with its corresponding record.

(2) Index implementation logic = specific data structure (storage) + algorithm (logic)

(3) Contradictions in index implementation:

1. Search speed 2. Modify/Delete/increase speed 3Copy the code

Back to database storage

Possible data structures that support indexing:

1. Array: stored in an ordered array by keyword, retrieved by binary search and other methods. Linked list through the transformation of skip list according to the keyword ordered storage, search through binary search. 3. The hash table supports the direct search of a single keyCopy the code

First, a few status quo:

1. Reading data from the memory the disk is read in pages (4kB by default). In addition to the search of a single record, but also need to orderby, greater than, less than and other batch data search. The total amount of data is so large that it is generally considered that it cannot be fully loaded into memory.Copy the code

We try to build the index from 0

On disk we first construct a record store

Of course, we can’t store only one piece of data, we can associate multiple pieces of data with Pointers

Our disk storage is scheduled and read using pages, so we also put our data on a page, increasing the page number, the previous page pointer and the next page pointer

At this point, we have multiple data in a page. To retrieve the data in a page more quickly, we add an index in a page

Ok, so now we have a page that looks reasonable, and in fact we’ll have a lot of pages, like this

At this time the important question comes, the underlying data already exists, how to conduct efficient search?

We tried adding an index to the page

This structure does not solve the problem.1. It is not stored in page format,2. The number of indexes increases proportional to the number of pages

Try multi-level indexing.

Wait, isn’t that a B + tree?

References:

  1. Introduction to algorithms 2nd edition
  2. wiki
  3. Github.com/roy2220/pla…
  4. www.bilibili.com/video/BV1Aa…
  5. www.bilibili.com/video/BV1sQ…