MySQL: Internet companies commonly used sub-database sub-table solution summary!

1. Database bottlenecks

Either an IO bottleneck or a CPU bottleneck will eventually lead to an increase in the number of active connections to the database, approaching or even reaching the threshold for the number of active connections the database can handle. In the case of a business Service, there are few or no database connections available. Then you can imagine (concurrency, throughput, crashes).

1. IO bottlenecks

The first type: disk read IO bottleneck, too much hotspot data, database caching, each query will produce a large number of I/O, reduce the query speed -> branch library and vertical branch table. Second: network IO bottleneck, request too much data, network bandwidth is not enough -> branch library.

2. CPU bottlenecks

The first kind: SQL problem, such as SQL contains JOIN, group BY, order by, non-index field condition query, etc., increase THE OPERATION of CPU operation -> SQL optimization, establish appropriate indexes, and carry out business calculation in the business Service layer. Second: single table data volume is too large, query scan too many rows, SQL efficiency is low, CPU bottleneck -> horizontal table.

Two, sub-database sub-table

1. Horizontal branch library

Concept: Split data from one library into multiple libraries based on fields and policies (hash, range, etc.). Results:

Each library has the same structure;
Each library has different data and no intersection;
The union of all libraries is full data;

Scenario: The absolute concurrency of the system is up, it is difficult to fundamentally solve the problem of table splitting, and there is no obvious business ownership to divide the database vertically. Analysis: library more, IO and CPU pressure can naturally be doubled relief.

2. Level table

Concept: Split data in one table into multiple tables based on fields and policies (hash, range, etc.).

Results:

Each table has the same structure;
The data in each table is different and there is no intersection;
The union of all tables is full data;

Scenario: The absolute concurrency of the system does not increase, but the amount of data in a single table is too large, affecting SQL efficiency and increasing the CPU burden, and thus becoming a bottleneck. Recommended: Analysis of the principle of a SINGLE SQL query Analysis: The amount of data in a table is small, and the efficiency of a single SQL query is high, which naturally reduces the CPU burden.

3. Vertical branch library

Concept: Split different tables into different libraries based on service ownership.

Results:

Each library has a different structure;
The data in each library is also different and there is no intersection;
The union of all libraries is full data;

Scenario: The absolute concurrency of the system is up and individual business modules can be abstracted out. Analysis: At this point, it’s basically servitization. For example, with the development of services, there are more and more common configuration tables and dictionary tables. At this time, these tables can be separated into a separate library, or even service-oriented. Furthermore, as the business develops and incubates a business model, relevant tables can be separated into a separate library or even servitized.

4, vertical table

Concept: Separate the fields in a table into different tables (main table and extended table) according to the activity of the fields.

Results:

Each table has a different structure;
The data of each table is also different. Generally speaking, each table has at least one intersection column of fields, which is usually the primary key used to associate data.
The union of all tables is full data;

Scenario: The absolute amount of concurrent data does not increase. The table has a few records but many fields. In addition, hotspot data and non-hotspot data are stored together, and a single row of data requires large storage space. As a result, the number of data rows in the database cache is reduced. A large number of random read I/OS are generated when disk data is read during query, resulting in I/O bottlenecks. Analysis: You can use list pages and detail pages to help you understand. The split principle of vertical split tables is to put hot data (which may be redundant and often queried together) together as the primary table, and non-hot data together as the extended table. This allows more hotspot data to be cached, reducing random read IO. After the split, to get all the data needs to associate the two tables to fetch the data. Keep in mind, however, that you should never use joins, because joins not only increase the CPU burden but also couple the two tables together (on the same database instance). Associated data should be done in the business Service layer, fetching primary and extended table data separately and then associating all data with associated fields.

Three, database and table tools

Sharding-sphere: JAR, formerly sharding-JDBC;
TDDL: JAR, Taobao Distribute Data Layer;
Mycat: middleware.

Note: the pros and cons of the tool, please do your own research, official website and community priority.

Four, the steps of database and table

Evaluate the number of branches or tables based on capacity (current capacity and growth)-> Select key(uniform)-> table rules (hash or range)-> execute (double write)-> Capacity expansion (minimize data movement).

Fifth, the problem of database and table

1. Query non-partition keys

Based on the horizontal database and table, the split strategy is the common hash method. Only one non-partition key can be used for conditional query

Mapping method

Genetic method

Note: When writing, the genetic method generates user_id, as shown in the figure. For xBIT gene, for example, there are 8 tables, 23=8, so x is 3, that is, 3bit gene. When querying according to user_id, modules can be directly routed to the corresponding sub-database or sub-table.

When querying according to user_name, user_name_code is generated using the user_name_code generation function, and then the user_name_code is modelled to the corresponding sub-database or sub-table. Id Generates the common Snowflake algorithm.

More than one non-partition key is used as a conditional query

Mapping method

Redundancy method

Note: queries as order_id or buyer_id are routed to db_o_buyer, and queries as seller_id are routed to db_o_seller. Feels like putting the cart before the horse! Is there another good way? What about changing the technology stack?

In addition to partition keys, the background can also query the combination of non-partition keys

No method

Redundancy method

2. Non-partition key cross-library cross-table paging query problem

Based on the horizontal database and table, the split strategy is the common hash method.

Note: use NoSQL method to solve (ES, etc.).

3. Capacity expansion

Based on the horizontal database and table, the split strategy is the common hash method.

Horizontal expansion of library

(Upgrade slave library method)

Note: Expansion is multiplied.

Horizontal expansion table (double write migration method)

Step 1 :(synchronize double-write) modify the application configuration and code, add double-write, and deploy;
Step 2 :(synchronous double write) copy the old data from the old library to the new library;
Step 3 :(synchronous double-write) proofread the old data in the new database according to the old database;
Step 4 :(synchronize double-write) modify the application configuration and code, remove double-write, and deploy the application.

Note: Double-write is common.

Vi. Summary of database and table

Split database and table, first of all, we have to know where the bottleneck is, and then can reasonably split (split database or table? Horizontal or vertical? Several points? . And can not be divided in order to separate table and split.
Key selection is important to consider both split evenly and non-partition key query.
As long as it meets the requirements, the split rules should be as simple as possible.

MySQL: Internet companies commonly used sub-database sub-table solution summary!

1. Database bottlenecks

1. IO bottlenecks

2. CPU bottlenecks

Two, sub-database sub-table

1. Horizontal branch library

2. Level table

3. Vertical branch library

4, vertical table

Three, database and table tools

Four, the steps of database and table

Fifth, the problem of database and table

1. Query non-partition keys

2. Non-partition key cross-library cross-table paging query problem

3. Capacity expansion

Vi. Summary of database and table

Related Posts

JVM Notes [1] — Runtime data area

How does SAP WebIDE load a custom XML view in SAP UI5

Top 10 sorting algorithms with animation