MySQL mass storage in practice: index and table-sharding design

Internet services often use MySQL as their backend storage, with InnoDB as the storage engine. Based on the characteristics of Internet services and of MySQL, this article describes how to design tables and split them into sub-tables (table sharding) in specific business scenarios. It starts with the relevant MySQL fundamentals and then uses real production cases to present practical techniques for sharding and index design.

What is InnoDB record storage?

InnoDB stores records in primary key order, and it builds the clustered primary key index of a table on top of this property.

How does InnoDB implement "sequential storage" of records? The first thing to know is that "sequential" covers both order within a page and order between pages. The page is InnoDB's basic unit of exchange between disk and memory.

Order between pages: the pages of a data file are connected by a doubly linked list. The pages may be physically ordered, but in most cases they are only logically ordered;

Order within a page: the records in a page are connected by a singly linked list, so the page is logically ordered. A slot data structure brings in-page lookups close to the efficiency of binary search.
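The in-page lookup can be sketched roughly as follows. This is a minimal Python model, not InnoDB's actual page format: a plain list stands in for the linked list of records, and SLOT_STEP (an assumed value) is the number of records each slot covers.

```python
import bisect

SLOT_STEP = 4  # assumed number of records covered by each slot


def build_slots(records):
    """Record the position and key of every SLOT_STEP-th record."""
    positions = list(range(0, len(records), SLOT_STEP))
    keys = [records[p] for p in positions]
    return keys, positions


def find(records, slot_keys, slot_positions, key):
    """Binary-search the slots, then walk at most SLOT_STEP records."""
    i = bisect.bisect_right(slot_keys, key) - 1
    if i < 0:
        return None  # key is smaller than every record in the page
    start = slot_positions[i]
    for pos in range(start, min(start + SLOT_STEP, len(records))):
        if records[pos] == key:
            return pos
    return None


records = [3, 8, 15, 21, 34, 40, 57, 62, 70, 88]  # keys in logical order
slot_keys, slot_positions = build_slots(records)
```

The binary search only touches the small slot array; the short walk that follows is the part that behaves like a linked-list traversal.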

[Figure: InnoDB page space distribution (Page Header, …)]

Based on the above characteristics, let’s analyze the impact of using different primary keys on storage:

Auto-increment primary key: the primary key value keeps increasing, so rows are inserted in order and are physically contiguous within a page. When a page fills up, the next page is allocated in sequence. In the absence of deletes, the records of the whole table are stored in the data file in the order they were written. This storage pattern gives high disk utilization, little random I/O, and high insert efficiency.

Business primary key: for example, the user table uses uid as its primary key and the commodity table uses infoId. Such meaningful primary keys are called business primary keys. Business primary key values obviously do not arrive in increasing order, so inserted records cannot be kept physically contiguous, and an insert can also trigger a page split, which leaves fragmentation inside pages. For example, suppose a full page stores the records with primary key values from 0 to 99. If a record whose key falls within that range has to be inserted, the page has no room for it and must be split into two pages to complete the insert. The two resulting pages are then hard to fill up, which causes in-page fragmentation. Therefore, the write performance and disk utilization of business primary keys are inferior to those of auto-increment primary keys.
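To get a feel for the difference, here is a toy simulation in Python. It is only a sketch under simplifying assumptions: a page holds PAGE_CAPACITY keys (a made-up number), a full page is split in half when a key lands in its middle, and an insert at the very end of the table simply starts a new page. It does not model InnoDB's real split heuristics.

```python
import bisect
import random

PAGE_CAPACITY = 16  # hypothetical number of records per page


def insert(pages, key):
    """Insert key into a sorted list of sorted pages, splitting a full page."""
    if not pages:
        pages.append([key])
        return
    # Find the page whose key range should hold the key.
    firsts = [p[0] for p in pages]
    idx = max(bisect.bisect_right(firsts, key) - 1, 0)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)
    elif idx == len(pages) - 1 and key > page[-1]:
        # Append at the tail: start a new page and leave the old one full.
        pages.append([key])
    else:
        # The key lands inside a full page: split it in half first.
        mid = len(page) // 2
        left, right = page[:mid], page[mid:]
        bisect.insort(left if key < right[0] else right, key)
        pages[idx:idx + 1] = [left, right]


def fill_ratio(keys):
    """Return the average page fill after inserting keys in the given order."""
    pages = []
    for k in keys:
        insert(pages, k)
    return sum(len(p) for p in pages) / (len(pages) * PAGE_CAPACITY)


keys = list(range(4000))
sequential_fill = fill_ratio(keys)   # auto-increment style inserts
random.Random(42).shuffle(keys)
random_fill = fill_ratio(keys)       # business-key style inserts
```

In this model the sequential run leaves every page full, while the shuffled run leaves pages only partly filled, which is the in-page fragmentation described above.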

From the above analysis, can we conclude that it is always good to use auto-increment primary keys? It’s too early to tell until we’ve analyzed InnoDB’s indexes.

What is a primary key index?

InnoDB automatically creates an index on the primary key of a table, using a B+Tree data structure. Because of how it is stored, the primary key index is also called the clustered index. The index structure of the clustered index is stored together with the actual data: the leaf nodes of the B+Tree hold the actual records, as shown in the figure:

Clustered index

What is a non-primary key index?

Since records are stored in the primary key index structure, how do indexes created on other columns find records? The natural idea is that an index on a non-primary-key column first finds the primary key through its own index structure, and then uses the primary key to find the record in the clustered index. This is exactly what InnoDB does, so indexes on non-primary-key columns are also called secondary indexes (because a query through one has to search two index trees).

Secondary indexes have the following characteristics:

They are all indexes other than the primary key index;
The leaf nodes of the index structure store the primary key value;
A query searches both the secondary index itself and the primary key index.
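The two-step lookup can be sketched in Python as follows. Plain dictionaries stand in for the two B+Trees, and the table contents (uid, mobile, nickname values) are invented for illustration.

```python
# Clustered index: primary key -> full row (the leaf stores the record).
clustered = {
    1001: {"uid": 1001, "mobile": "13800000001", "nickname": "alice"},
    1002: {"uid": 1002, "mobile": "13800000002", "nickname": "bob"},
}

# Secondary index on mobile: indexed value -> primary key only.
mobile_index = {row["mobile"]: uid for uid, row in clustered.items()}


def find_by_mobile(mobile):
    """Search the secondary index first, then the clustered index."""
    uid = mobile_index.get(mobile)  # first tree: mobile -> uid
    if uid is None:
        return None
    return clustered[uid]           # second tree: uid -> row
```

A lookup by uid touches only `clustered`, while a lookup by mobile touches both structures.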

What is a composite index?

A composite index is also called a joint index or multi-column index. The key of the index structure contains multiple columns. When entries are sorted, the first column is compared first, and the second column is compared only if the first columns are equal. The structure of a composite index is shown in the figure below:

Composite index

A query on a composite index must follow these rules:

The query must use the index columns from the left, starting with the first one; otherwise the index cannot be used.
If a column in the middle is skipped, the columns after it cannot use the index.
If a column is used in a range query, the columns after it cannot use the index.

According to this leftmost-prefix rule, the composite index (a, b, c) can serve three kinds of query: (a), (a, b), and (a, b, c).
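The three rules can be expressed as a small Python function. This is a simplified model of the rules as stated here, not of the MySQL optimizer, and the function name and parameters are made up for illustration.

```python
def usable_prefix(index_columns, equality_columns, range_columns=()):
    """Return the index columns a query can use under the leftmost-prefix rule."""
    used = []
    for col in index_columns:
        if col in equality_columns:
            used.append(col)
        elif col in range_columns:
            used.append(col)  # a range column is usable, but ends the prefix
            break
        else:
            break             # a skipped column ends the prefix
    return used


index = ("a", "b", "c")
```

For example, `usable_prefix(index, {"a", "c"})` returns only `["a"]`, because skipping b prevents c from using the index.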

V. Summary

Based on the InnoDB storage and index characteristics above, the two kinds of primary key compare as follows:

Auto-increment primary key: high write efficiency, query efficiency, and disk utilization. However, every query has to go through two index levels, because online services do not query by the auto-increment primary key column directly.

Business primary key: lower write efficiency, query efficiency, and disk utilization. However, queries can be answered with a single index level, the primary key index, and thanks to covering indexes, some queries on non-primary-key indexes can also be completed within a single index (as described in the cases below).

The I/O efficiency advantage of auto-increment primary keys over business primary keys is almost negligible on SSDs, whereas business primary keys have a clear advantage in business query performance. Therefore, our online business databases use business primary keys.

VI. Table-sharding design and practice for the e-commerce business

Based on the characteristics of MySQL and of our own business, we have developed a set of database usage guidelines that effectively guide developers (RD) in designing tables and indexes during project development. The following are the key table and index design principles for the e-commerce business, followed by two practical cases.

1. Table design principles

Primary key selection: the advantages and disadvantages of business primary keys and auto-increment primary keys were compared above. The conclusion is that business primary keys fit business query requirements better, and most Internet services are read-heavy and write-light. Therefore, all of our online services use business primary keys.

Number of indexes: a table should have no more than five indexes, because too many indexes make the index files large.

Column type selection: in general, the smaller and simpler the type, the better. For example, use TINYINT for BOOL fields and for enumeration fields, and use LONG (a 64-bit integer, BIGINT in MySQL) for transaction amounts. TINYINT makes BOOL and enumeration fields easy to extend. MySQL does provide the DECIMAL type, which supports exact arithmetic, but DECIMAL is a storage format rather than a type the CPU computes natively, so calculations on it are less efficient. We therefore simply convert decimal amounts to integers and store them in LONG.
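As a rough illustration of the decimal-to-integer conversion, here is a Python sketch. It assumes amounts have two decimal places and are stored as a count of the smallest unit (for example, cents); the scale factor and function names are assumptions, not part of the original guidelines.

```python
from decimal import Decimal

CENTS_PER_UNIT = 100  # assumption: amounts have two decimal places


def to_minor_units(amount: str) -> int:
    """Convert a decimal amount string such as '19.99' to integer cents."""
    scaled = Decimal(amount) * CENTS_PER_UNIT
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more than two decimal places")
    return int(scaled)


def to_display(cents: int) -> str:
    """Format integer cents back into a decimal string."""
    return str((Decimal(cents) / CENTS_PER_UNIT).quantize(Decimal("0.01")))


total = to_minor_units("0.10") + to_minor_units("0.20")
```

All arithmetic on the stored values is then plain integer arithmetic, and the conversion back to a decimal string happens only at display time.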

Sharding strategy: first of all, be clear that database performance problems generally appear only after the data volume reaches a certain size. We are therefore required to estimate the volume in advance instead of waiting until a split becomes unavoidable; as a rule, the data volume of each table is kept at the ten-million-row level. There are two kinds of sharding strategy: by key modulo, which spreads reads and writes evenly; and by time, which separates hot and cold data cleanly.

2. Actual cases

Case 1: User table design

The user table has the columns uid, nickname, mobile, addr, image, …, switch; uid is the primary key. The business has two kinds of query, by uid and by mobile, so an index must also be created on mobile.

The switch column is of type BIGINT and stores the user's BOOL-type attributes. Each bit stores one attribute of the user: for example, we use the first bit to store whether the user receives push notifications, the second bit to store whether offline messages are stored, and so on.

This design is very extensible (BIGINT has 64 bits and can hold 64 states, which is hard to use up in practice), but it also brings a problem. The switch column is queried very frequently, and since InnoDB is a row store, reading switch requires fetching the entire row.
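Storing one boolean attribute per bit of the switch value can be sketched in Python as follows. The two flags follow the examples given in the text; the constant and function names are made up for illustration.

```python
RECEIVE_PUSH = 1 << 0       # first bit: whether the user receives push
STORE_OFFLINE_MSG = 1 << 1  # second bit: whether offline messages are stored


def set_flag(switch: int, flag: int, enabled: bool) -> int:
    """Return switch with the given flag bit turned on or off."""
    return (switch | flag) if enabled else (switch & ~flag)


def has_flag(switch: int, flag: int) -> bool:
    """Report whether the given flag bit is set in switch."""
    return (switch & flag) != 0


switch = set_flag(0, RECEIVE_PUSH, True)
switch = set_flag(switch, STORE_OFFLINE_MSG, True)
```

Adding a new attribute only means assigning the next unused bit, with no change to the table schema.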

What optimizations can we make in table design for this scenario? A common approach, which we will not discuss in detail here, is to split the table vertically.

Another option is to take advantage of InnoDB's covering indexes by creating a composite index on the uid and switch columns, so that the secondary index contains the values of both columns. When switch is queried by uid, it can then be found in the secondary index alone, without accessing the full record, which is very efficient.

Another thing to consider is that switch values change quite frequently. Would changing a switch value change the structure of the composite index (for example, by splitting or reordering index nodes)?

The answer is no! Because uid, the first column of the composite index, is unique and never changes, it alone determines the order of the index entries. Changing the switch column only changes the value of the second key in an index entry, not the index structure.
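The ordering argument can be checked with a toy model in Python. It only models the sort order of (uid, switch) entries, with invented values; it says nothing about how InnoDB physically applies an update to a secondary index.

```python
# Hypothetical secondary index entries, kept sorted by (uid, switch).
index_entries = [(1001, 0b01), (1002, 0b11), (1003, 0b00)]


def update_switch(entries, uid, new_switch):
    """Overwrite the switch value of the entry for uid in place."""
    for i, (entry_uid, _) in enumerate(entries):
        if entry_uid == uid:
            entries[i] = (uid, new_switch)
            return i
    raise KeyError(uid)


position = update_switch(index_entries, 1002, 0b00)
still_sorted = index_entries == sorted(index_entries)
```

Because every uid appears exactly once, the comparison between two entries is always decided by uid, so the updated entry keeps its position.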

Case 2: Sharding scheme for the IM subsystem

The IM subsystem consists of four business tables: user, contact, cloud message, and system message. The database is split by business, with a separate instance for each business. Except for the system message table, each table is split into 128 tables using uid as the key. Because the system message business is special, its sharding scheme differs from the others.
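Routing a uid to one of the 128 tables can be sketched in Python like this. The modulo rule follows the key-modulo strategy described earlier; the table naming pattern is hypothetical.

```python
SHARD_COUNT = 128


def shard_table(base_name: str, uid: int) -> str:
    """Return the name of the table that holds the rows for uid."""
    # Hypothetical naming: <base_name>_<uid mod 128>, zero-padded.
    return f"{base_name}_{uid % SHARD_COUNT:03d}"
```

For example, `shard_table("user", 1000)` yields `user_104`, and every query for that uid goes to the same table.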

Let's first look at the business characteristics of system messages: the system message table stores notification-type messages sent by the server. Because they are notifications, they are only valid for a limited time, and we set the validity period of a system message to 30 days. Given these characteristics, we adopt the following sharding scheme:

The system message table is split by month, and the data of each month is further split into 128 tables.

Consider this: because the table is split by month, querying one person's system messages takes two database round trips, since most queries span two months (you need to look up the messages of the last 30 days). Can this be optimized?

We can store the data redundantly. The specific optimization is as follows:

When a system message is inserted, write it to two tables: the current month's and the previous month's;
When reading, read from the previous month's table.

Redundant storage mode

With this scheme, all of a user's system messages within the validity period can be found with a single query. The price is extra storage space and lower write efficiency, so it may not be the optimal scheme, but it is a reasonable choice in business scenarios where the total data volume is small and query performance matters most.
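The redundant write and single-table read can be sketched in Python as follows. An in-memory dictionary stands in for the physical tables, and the table naming pattern and function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

SHARD_COUNT = 128
VALID_DAYS = 30
tables = defaultdict(list)  # in-memory stand-in for the physical tables


def previous_month(day: date) -> date:
    """Return the first day of the month before day's month."""
    first = day.replace(day=1)
    return (first - timedelta(days=1)).replace(day=1)


def table_name(month: date, uid: int) -> str:
    # Hypothetical naming: sys_msg_<YYYYMM>_<uid mod 128>
    return f"sys_msg_{month:%Y%m}_{uid % SHARD_COUNT:03d}"


def write_message(uid: int, sent_on: date, body: str) -> None:
    """Write the message to both the current and the previous month's table."""
    for month in (sent_on, previous_month(sent_on)):
        tables[table_name(month, uid)].append((uid, sent_on, body))


def read_messages(uid: int, today: date):
    """Read all still-valid messages from the previous month's table only."""
    cutoff = today - timedelta(days=VALID_DAYS)
    rows = tables[table_name(previous_month(today), uid)]
    return [body for row_uid, sent_on, body in rows
            if row_uid == uid and sent_on >= cutoff]


write_message(7, date(2024, 1, 5), "expired")
write_message(7, date(2024, 1, 20), "january notice")
write_message(7, date(2024, 2, 5), "february notice")
inbox = read_messages(7, date(2024, 2, 10))
```

Here a read on 10 February touches only the January table and still sees both valid messages. One caveat of this sketch: because February is shorter than 30 days, a read in the first days of March would not see messages sent at the very end of January.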

VII. Conclusion

Auto-increment primary keys do not necessarily give the best performance; analyze the actual business scenario before choosing;
In most scenarios, use simple column types;
More indexes are not always better; too many indexes make the index files too large;
If the data being queried can be found in the index itself, the storage engine does not go to the primary key index to access the actual record.

Original: www.tuicool.com/articles/Uv…