Physical storage structure

The physical storage structure of MySQL is shown in the figure above. It is a doubly linked list in which each node is a data page.

Rows of data are stored one after another inside each data page.

Indexes

Why do indexes exist? The reason is simple: to speed up queries.

An index is like the table of contents of a book: both exist so that the needed data can be found faster.

What does the index of the database look like?

Much like the table of contents of a book: the smallest primary key of each data page is placed in a directory, and a lookup runs repeated binary searches over those values.

Page splitting

Purpose: to keep the primary keys across all data pages sorted from smallest to largest

Page splitting is actually simple. A database holds far more data than one data page can store. When an insert would overflow a data page, the new row and the rows already in that page are sorted together, and the rows with the larger primary keys are moved into a new data page.
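
As a rough illustration of the split described here, the sketch below treats a page as a sorted Python list of primary keys. The capacity of 4 rows and the name insert_with_split are made up for the example; a real storage engine tracks bytes, not row counts.

```python
PAGE_CAPACITY = 4  # hypothetical: rows per page, tiny for illustration


def insert_with_split(page, new_key, capacity=PAGE_CAPACITY):
    """Insert new_key into a sorted page; split the page if it overflows.

    Returns (left_page, right_page); right_page is None when no split happens.
    """
    rows = sorted(page + [new_key])   # existing rows and the new row, ordered by key
    if len(rows) <= capacity:
        return rows, None
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]     # rows with the larger keys go to the new page


left, right = insert_with_split([10, 20, 30, 40], 25)
print(left, right)   # [10, 20] [25, 30, 40]
```

Every key in the left page stays smaller than every key in the right page, which is the ordering the split is meant to preserve.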

Composition

In fact, we can infer what an index entry must contain from what the index is used for

  1. First, it must hold the minimum primary key value of the corresponding data page. The purpose is simple: it is the value that binary search compares against, which speeds up the query
  2. The previous step tells us which data page holds the data we want, but not where that page is, so the entry must also hold the data page number
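
The two parts of an entry can be sketched as (minimum primary key, page number) pairs searched with binary search. The directory contents and the name find_page below are invented for illustration only.

```python
from bisect import bisect_right

# Hypothetical directory: one (min_primary_key, page_number) entry per data page,
# kept sorted by min_primary_key.
directory = [(1, 100), (50, 101), (120, 102)]
min_keys = [min_key for min_key, _ in directory]


def find_page(primary_key):
    """Return the number of the page whose key range should contain primary_key."""
    pos = bisect_right(min_keys, primary_key) - 1   # last entry with min_key <= primary_key
    if pos < 0:
        return None                                  # smaller than every key in the table
    return directory[pos][1]


print(find_page(75))   # 101
```

The search only says which page to read; the row itself is then located inside that page.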

B+ tree

It is well known that MySQL uses the B+ tree data structure. Why this structure?

There are three reasons

  • A single node stores more elements, so a query needs fewer I/O operations, which makes the B+ tree well suited as the underlying data structure of MySQL.
  • Every query has to reach a leaf node, so query performance is stable. In a B-tree, data can be found at any node, so lookup cost varies from query to query.
  • All the leaf nodes form an ordered linked list, which makes range scans easier.
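
To get a feel for the first reason, here is a back-of-the-envelope calculation. All three sizes are assumptions chosen for the example (16 KB pages, 14 bytes per directory entry, about 1 KB per row), not measurements of any particular table.

```python
PAGE_SIZE = 16 * 1024      # assumption: 16 KB pages
ENTRY_SIZE = 8 + 6         # assumption: 8-byte key + 6-byte page pointer per directory entry
ROW_SIZE = 1024            # assumption: about 1 KB per row

fanout = PAGE_SIZE // ENTRY_SIZE        # children per non-leaf node
rows_per_leaf = PAGE_SIZE // ROW_SIZE   # rows per leaf page


def max_rows(height):
    """Rows reachable by a tree with the given number of levels (leaf level included)."""
    return fanout ** (height - 1) * rows_per_leaf


for height in (1, 2, 3):
    print(height, max_rows(height))
```

Under these assumptions a tree only three levels deep already reaches tens of millions of rows, so a lookup touches very few pages.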

Principles

Leftmost matching principle

Say you have created a joint index (a, b). There is no need to create a separate index on a, because (a, b) already serves queries on a. A separate index on b is still needed, because (a, b) helps only when the condition involves a, or a and b together, and cannot serve a condition on b alone.

Of course, when you do need both a joint index and a separate single-field index, consider the size of the fields: try to make the smaller field the separate index, because a smaller index is relatively faster

The first principle is that if adjusting the field order lets you maintain one fewer index, that order is usually the one to prefer.

Equality matching principle

When the query field is the indexed field itself and is compared with =, the index can be used

Leftmost prefix matching rule

For fuzzy queries with LIKE, a pattern such as '1%' can use the index, but '%1' cannot
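
Why the position of % matters can be seen with a sorted list standing in for the index. This is only a sketch: the values and the names prefix_scan and suffix_scan are invented, and a real index is a tree rather than a list.

```python
from bisect import bisect_left

names = sorted(["1a", "10", "21", "1z", "a1", "31"])   # stand-in for a sorted index


def prefix_scan(sorted_values, prefix):
    """LIKE 'prefix%': binary search to the first match, then read while the prefix holds."""
    start = bisect_left(sorted_values, prefix)
    matches = []
    for value in sorted_values[start:]:
        if not value.startswith(prefix):
            break                       # sorted order: no later value can match
        matches.append(value)
    return matches


def suffix_scan(sorted_values, suffix):
    """LIKE '%suffix': matches can sit anywhere, so every entry has to be checked."""
    return [value for value in sorted_values if value.endswith(suffix)]


print(prefix_scan(names, "1"))   # ['10', '1a', '1z']
print(suffix_scan(names, "1"))   # ['21', '31', 'a1']
```

Values sharing a prefix sit next to each other in sorted order, while values sharing a suffix are scattered across the whole list.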

Range lookup principle

If the WHERE clause contains a range condition, only the range on the leftmost field of the joint index can use the index for range lookup
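
The leftmost matching and range lookup principles can be combined into one simplified rule of thumb. The function usable_columns below is a hypothetical helper written for this note; the real optimizer is more nuanced, so treat it as a mental model only.

```python
def usable_columns(index_columns, equality, ranges):
    """Return the leading index columns that can narrow the search.

    index_columns: field order of the joint index, e.g. ["a", "b", "c"]
    equality: set of fields compared with =
    ranges: set of fields used in a range condition (>, <, BETWEEN)
    """
    used = []
    for column in index_columns:
        if column in equality:
            used.append(column)     # equality: keep matching further to the right
        elif column in ranges:
            used.append(column)     # a range is usable, but matching stops here
            break
        else:
            break                   # gap in the leftmost prefix: stop
    return used


print(usable_columns(["a", "b", "c"], {"a", "b"}, {"c"}))   # ['a', 'b', 'c']
print(usable_columns(["a", "b", "c"], {"a", "c"}, {"b"}))   # ['a', 'b']
print(usable_columns(["a", "b", "c"], {"b"}, set()))        # []
```

The second call shows why a range field in the middle is costly: the equality condition on c can no longer narrow the search.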

Equality matching + range matching principle

Types

Primary key index

A primary key index is also called a clustered index

The leaf nodes of the primary key index are the data pages themselves, holding complete rows

In a key-value scenario, where the table has only one index and that index must be unique, it is suitable to use the business field as the primary key.

A primary key lookup first searches the primary key directory, which maintains the page number and minimum primary key value of each data page

Non-primary key index

Aliases: secondary index, normal index

The leaf nodes of a non-primary key index store primary key values rather than complete rows

Each leaf entry stores the indexed field values plus the primary key

When doing so does not affect the sort order of the result, the primary keys fetched from the index are sorted before going back to the table

  • If a primary key is defined, it is used as the clustered index.

  • If no primary key is defined, the first unique index whose columns are all NOT NULL is used as the clustered index.

  • If there is no primary key and no suitable unique index, InnoDB internally generates a hidden primary key as the clustered index. The hidden primary key is a six-byte ROW_ID column whose value increases as rows are inserted.

Ways to reduce back-to-table lookups

Covering index

In general, a SQL statement uses only one secondary index, but it is also possible to search several index trees, take the intersection of their results, and then go back to the primary key index for the rows

The difference

A query by primary key only needs to search the primary key B+ tree. A query through a normal index first searches that index tree to get the primary key value, and then searches the primary key index tree again.
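
The two lookup paths can be mimicked with plain dictionaries. The table contents, the age index, and the function names are all made up for illustration; each dictionary access stands in for one tree search.

```python
# Hypothetical table: primary key -> full row (the clustered index).
clustered = {
    1: {"id": 1, "name": "ann", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "cat", "age": 30},
}
# Hypothetical secondary index on age: age -> primary keys (no full rows here).
age_index = {25: [2], 30: [1, 3]}


def by_primary_key(pk):
    """Primary key query: a single search of the clustered index."""
    return clustered[pk]


def by_age(age):
    """Normal index query: search the secondary index, then the clustered index."""
    pks = age_index.get(age, [])            # first search: get primary key values
    return [clustered[pk] for pk in pks]    # second search for each key: back to the table


print(by_primary_key(2)["name"])             # bob
print([row["name"] for row in by_age(30)])   # ['ann', 'cat']
```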

So should we use a business field as the primary key, or an auto-increment field?

  • Inserting with an auto-increment primary key matches the incremental insert pattern. Every new record is an append, which does not move other records and does not trigger leaf node splits. A primary key built from a field with business logic, however, rarely guarantees ordered insertion, so the cost of writing data is relatively high.
  • Besides performance, we can also look at it from a storage perspective. Suppose your table does have a unique field, such as an ID number stored as a string. Should the ID number or an auto-increment field be the primary key? Every leaf entry of a non-primary key index stores the primary key value. With the ID number as the primary key, each secondary index leaf entry spends about 20 bytes on it, compared with only 4 bytes for an integer key and 8 bytes for a bigint.

Obviously, the smaller the primary key length, the smaller the leaf nodes of the normal index, and the smaller the space taken up by the normal index.
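
A quick calculation makes the difference concrete. The row count of ten million is a hypothetical figure, and the function only counts the bytes spent on primary key values, ignoring every other kind of overhead.

```python
def secondary_index_pk_bytes(rows, pk_bytes, index_count=1):
    """Bytes spent storing primary key values in secondary index leaf entries."""
    return rows * pk_bytes * index_count


rows = 10_000_000   # hypothetical table size
for label, size in (("id number string", 20), ("int", 4), ("bigint", 8)):
    print(label, secondary_index_pk_bytes(rows, size))
```

The cost is paid again for every secondary index on the table, which is what the index_count parameter models.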

Therefore, an auto-increment primary key is often the more reasonable choice for both performance and storage

If a data page is full, a new data page is allocated according to the B+ tree algorithm and part of the data is moved into it. This is called page splitting, and it hurts performance. It also lowers space utilization by about 50%. Conversely, when two adjacent data pages have very low utilization, they are merged. Merging is the reverse of splitting.
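
The effect of insert order on splitting can be simulated with a toy model. Everything here is an assumption made for the sketch: pages hold 4 keys, an insert past the end of the last full page opens a new page, and any other insert into a full page splits it in half.

```python
from bisect import bisect_right, insort

CAPACITY = 4  # hypothetical rows per page, tiny for illustration


def count_splits(keys, capacity=CAPACITY):
    """Insert keys one at a time into sorted pages; return how many pages were split."""
    pages = []      # sorted pages, ordered by their smallest key
    splits = 0
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        mins = [page[0] for page in pages]
        idx = max(bisect_right(mins, key) - 1, 0)   # page whose key range fits
        page = pages[idx]
        if len(page) < capacity:
            insort(page, key)                       # room left: plain insert
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])                     # append at the end: open a new page
        else:
            insort(page, key)
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]   # split: each half about 50% full
            splits += 1
    return splits


print(count_splits(range(1, 101)))        # increasing keys
print(count_splits(range(100, 0, -1)))    # decreasing keys keep landing in full pages
```

In this model increasing keys never split a page, while keys that arrive out of order repeatedly split pages and leave them about half full.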

Covering index

If the query condition uses a normal index (or the leftmost field of a joint index) and the requested result is a field of that index or the primary key, the data can be read directly from the index tree and returned without going back to the table

For a covering index to apply, the index must contain every field in the WHERE condition and every field returned by SELECT
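
The rule boils down to a set comparison. The helper is_covering and the field names are hypothetical; the check relies on the point made earlier that secondary index leaves also carry the primary key.

```python
def is_covering(index_columns, primary_key_columns, where_columns, select_columns):
    """True when every field the query touches is stored in the secondary index."""
    available = set(index_columns) | set(primary_key_columns)   # leaves also carry the primary key
    needed = set(where_columns) | set(select_columns)
    return needed <= available


# Joint index (name, age) on a table whose primary key is id.
print(is_covering(["name", "age"], ["id"], ["name"], ["id", "age"]))   # True
print(is_covering(["name", "age"], ["id"], ["name"], ["email"]))       # False
```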

It is best to use LIMIT or WHERE conditions to limit the number of times the query goes back to the clustered index

The only index

There is only one index in a table. If that index is not a primary key index, then it can only be a non-primary key index.

Joint index

Binary search is carried out on each field in turn. First, the value of the first field determines which page to look in. Then, if many entries in that page share the same first value, the search continues on the second field, and so on

The leaf nodes of a joint index are also pages, but they hold only the fields of the joint index rather than the entire row data
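
Python tuples compare field by field, which makes them a convenient stand-in for joint index entries. The (last_name, age) index and the sample entries are invented for this sketch.

```python
from bisect import bisect_left

# Hypothetical joint index on (last_name, age): entries sorted field by field.
entries = sorted([("li", 30), ("wang", 22), ("li", 25), ("zhao", 40), ("li", 41)])


def find_exact(last_name, age):
    """Binary search on the first field, then the second, via tuple comparison."""
    pos = bisect_left(entries, (last_name, age))
    if pos < len(entries) and entries[pos] == (last_name, age):
        return pos
    return None


print(entries)                 # [('li', 25), ('li', 30), ('li', 41), ('wang', 22), ('zhao', 40)]
print(find_exact("li", 30))    # 1

ages = [age for _, age in entries]
print(ages == sorted(ages))    # False
```

The second field is ordered only within entries that share the same first field, which is why the index cannot be binary searched on the second field alone.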

Design indexes

  1. Try to index fields with large cardinality, that is, fields with many distinct values, to make full use of the advantages of B+ trees

    For example, if a field only takes the values 0 and 1 across all rows, there is no need to index that field

    Binary search cannot narrow anything down when the indexed field only contains 1 and 0, so there is little point in indexing it

  2. Try to index fields with smaller field types

  3. Try not to apply a function or calculation to an indexed field in the query, as the index will then not be used

  4. Do not use a UUID or anything similar as the primary key. With an auto-increment primary key, pages fill up in order and split naturally; with a non-sequential key, inserts land in the middle of existing pages and cause frequent page splitting

  5. Do not create too many indexes. Generally two or three joint indexes can cover all queries of a table

    With too many indexes on a table, queries are convenient, but every insert, delete, and update has to maintain all of those index trees, and performance deteriorates rapidly

  6. Try to place range-query fields last in the joint index so that all the fields before them can use the index

In fact, in most cases WHERE and ORDER BY cannot both use an index at the same time

In this case, it is recommended to let WHERE use the index. The index narrows the result down to a small amount of data, which is loaded into memory first and then sorted according to the ORDER BY condition. Sorting in memory is much faster than sorting directly on disk

Key point

Try to use one or two multi-field joint indexes to support more than 80% of the queries, and then one or two secondary indexes to support the remaining 20% of atypical queries. The goal is for more than 99% of queries to make full use of an index, which keeps query performance under control.

Tips

  1. For example, if you often need to query users who have logged in within the last 7 days, you can turn that condition into a field. Add a field in which 1 means the user logged in within the last 7 days and 0 means they did not, converting the condition directly into an enumeration value

Things to watch out for

Applying a function to an indexed field makes the optimizer give up the tree search
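
The reason is that a function can destroy the ordering the tree search depends on. In the sketch below, a list of (year, month) pairs stands in for an indexed date field and extracting the month plays the role of the function; the data is invented.

```python
# Hypothetical indexed field: dates kept in sorted order as (year, month) pairs.
index_values = [(2022, 11), (2022, 12), (2023, 1), (2023, 2), (2023, 11)]

# What a "month of the date" function would produce for each entry.
months = [month for _, month in index_values]

print(index_values == sorted(index_values))   # True: the raw field is ordered
print(months == sorted(months))               # False: the function output is not
```

Without a sorted sequence there is nothing to binary search, so every entry has to be examined.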

If a field is matched by a range condition, the index fields after it can no longer be used, so it is generally recommended to place the range field last

Sorting

Usage

For a multi-field sort that is used often, you can create a joint index on those fields in the same order. The index itself is sorted, so the query is fast: it simply reads along the index tree

MySQL: ORDER BY + DESC

WHERE and ORDER BY

When only one of WHERE and ORDER BY can use an index and the data volume is small, you can let WHERE find the data first, and then sort and paginate it at a relatively low cost

Principle

MySQL allocates a block of memory (sort_buffer) to each thread for sorting. Its size is set by sort_buffer_size

  • If the amount of data to sort is smaller than sort_buffer_size, the sort is done in memory
  • If the amount of data is too large to fit in memory, temporary files on disk are used to assist sorting, which is known as external sorting
  • With external sorting, MySQL splits the data into several parts, sorts each part into its own temporary file, and then merges the files into one large ordered file
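
The in-memory and external paths can be sketched in a few lines. This is a simplification: buffer_size counts rows rather than bytes, and each sorted chunk is kept in a Python list where a real server would write a temporary file.

```python
import heapq


def external_sort(values, buffer_size):
    """Sort in memory when the data fits; otherwise sort chunks and merge them."""
    if len(values) <= buffer_size:
        return sorted(values)                         # fits in the sort buffer
    # Each sorted chunk stands in for one temporary file on disk.
    chunks = [sorted(values[i:i + buffer_size])
              for i in range(0, len(values), buffer_size)]
    return list(heapq.merge(*chunks))                 # merge the ordered runs into one


print(external_sort([5, 3, 9, 1, 7, 2, 8], buffer_size=3))   # [1, 2, 3, 5, 7, 8, 9]
```

The merge step only ever compares the heads of the sorted runs, so it never needs all of the data in memory at once.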

MySQL traverses the index, reads the rows that meet the criteria into sort_buffer, and does a quick sort on the sort fields

  • If a queried field is not included in the secondary index, MySQL has to go back to the clustered index, using the primary key stored in the secondary index record, to retrieve the required field
  • This causes random I/O. MySQL 5.6 introduced MRR (Multi-Range Read), which collects the primary keys of the matching secondary index records, sorts them in memory, and then goes back to the table
  • Create joint indexes case by case to avoid the performance cost of sorting, or create covering indexes where possible to avoid going back to the table
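
The idea behind sorting the primary keys first can be shown with a toy example. The (indexed value, primary key) pairs and the function name are hypothetical; the point is only the order in which the clustered index would be visited.

```python
def back_to_table_order(secondary_matches, use_mrr):
    """Return primary keys in the order the clustered index would be visited."""
    pks = [pk for _, pk in secondary_matches]   # order produced by the secondary index scan
    return sorted(pks) if use_mrr else pks      # sorting first makes the visits sequential


# Hypothetical (indexed value, primary key) pairs, in secondary index order.
matches = [(25, 907), (26, 12), (30, 455), (31, 3)]

print(back_to_table_order(matches, use_mrr=False))   # [907, 12, 455, 3]
print(back_to_table_order(matches, use_mrr=True))    # [3, 12, 455, 907]
```

Visiting primary keys in ascending order means neighbouring rows tend to be read from the same or adjacent pages instead of jumping around the table.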