Why split databases and tables?

With the rapid spread of digitization in recent years, paperless offices in every industry have produced large amounts of data, and more and more of it is stored in databases. With MySQL, a performance watershed tends to appear once a single table exceeds roughly 20 million rows. In addition, a physical server has limited resources such as CPU, memory, storage, and connections. When a large number of connections are active at the same time, the database can become a performance bottleneck. To solve this, the industry applies the idea of divide and conquer: split large tables, manage and control the pieces separately, and use the CPU, memory, and storage of multiple machines to get better performance. There are two ways to divide and conquer: vertically and horizontally.

Vertical split

Vertical splitting comes in two forms: vertical database splitting and vertical table splitting. Let's start with vertical database splitting. It is really just a logical split along business lines. For example, suppose our database has a Products table, an Orders table, and a Scores table. We can then create three databases: one for products, one for orders, and one for points.

[Figure: the Products, Orders, and Scores tables each moved into its own database]

The advantage of vertical database splitting is that databases can be divided by business scenario. For example, a single scenario may use only 2-3 tables, so the application and its database can be split together into a corresponding service.

Now for vertical table splitting. Let's say we have a table with 100 fields. After analyzing the SQL statements the current business executes, we find that 20 fields are used frequently while the other 80 are rarely used. We can keep the 20 hot fields in a main table and create a secondary table for the other 80. Both tables carry the same primary key, so joining them on that key reassembles the full 100-field row.
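The main/secondary split described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names product_main and product_detail and their columns are hypothetical, and SQLite stands in for MySQL so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: hot fields stay in the main table,
# rarely used fields move to a secondary table with the same primary key.
conn.executescript("""
    CREATE TABLE product_main (
        id    INTEGER PRIMARY KEY,
        name  TEXT,
        price REAL
    );
    CREATE TABLE product_detail (
        id          INTEGER PRIMARY KEY,
        description TEXT,
        manual      TEXT
    );
""")
conn.execute("INSERT INTO product_main VALUES (1, 'pen', 1.5)")
conn.execute(
    "INSERT INTO product_detail VALUES (1, 'blue ballpoint', 'remove cap before use')"
)


def load_hot(product_id):
    """Common path: read only the narrow main table."""
    return conn.execute(
        "SELECT id, name, price FROM product_main WHERE id = ?", (product_id,)
    ).fetchone()


def load_full(product_id):
    """Rare path: join on the shared primary key to rebuild the wide row."""
    return conn.execute(
        "SELECT m.id, m.name, m.price, d.description, d.manual "
        "FROM product_main AS m JOIN product_detail AS d ON d.id = m.id "
        "WHERE m.id = ?",
        (product_id,),
    ).fetchone()
```

Most queries would go through something like load_hot and never touch the wide secondary table; load_full pays for the join only when the rarely used fields are actually needed.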

Vertical table splitting can solve the problem of rows that span pages, which Oracle calls row chaining. What does that mean? With fewer fields, a row of data fits in a single page; with many fields it no longer fits and has to span several pages. This causes additional addressing and a performance overhead. In addition, loading such a long row into memory often takes several pages even though we usually access only a few fields, which is a large cost in memory. To cache more data in memory and reduce disk I/O, vertical table splitting is a good method.

In general, vertical splitting has the following advantages:

  • It follows business boundaries, similar to the recently popular concept of microservices, which makes management and scaling easier after decoupling.
  • In high-concurrency scenarios, spreading load across the CPU, I/O, and memory of multiple servers improves performance and eases the connection-count and other resource limits of a single server.
  • It makes it possible to separate hot and cold data.

Disadvantages of vertical splitting:

  • Some business tables can no longer be joined in the database, so the application layer has to be reworked substantially and the data can only be combined through aggregation in the application. This makes development harder.
  • It does not solve the problem of a single table whose data keeps growing inside one database.
  • Distributed transactions are also a challenge.
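The first disadvantage, replacing a database join with aggregation in the application, can be sketched as follows. This is an illustration only: two in-memory SQLite databases stand in for the separate products and orders databases, and the column names and the function orders_with_product_names are hypothetical.

```python
import sqlite3

# Two independent databases: a SQL JOIN across them is not available.
products_db = sqlite3.connect(":memory:")
orders_db = sqlite3.connect(":memory:")

products_db.execute(
    "CREATE TABLE Products (product_id INTEGER PRIMARY KEY, name TEXT)"
)
orders_db.execute(
    "CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, product_id INTEGER)"
)
products_db.executemany("INSERT INTO Products VALUES (?, ?)", [(1, "pen"), (2, "ink")])
orders_db.executemany("INSERT INTO Orders VALUES (?, ?)", [(100, 2), (101, 1)])


def orders_with_product_names():
    """Application-level join: query each database, then merge in memory."""
    orders = orders_db.execute(
        "SELECT order_id, product_id FROM Orders ORDER BY order_id"
    ).fetchall()
    product_ids = sorted({product_id for _, product_id in orders})
    if not product_ids:
        return []
    placeholders = ",".join("?" * len(product_ids))
    names = dict(
        products_db.execute(
            f"SELECT product_id, name FROM Products "
            f"WHERE product_id IN ({placeholders})",
            product_ids,
        ).fetchall()
    )
    # Merge step that the database would have done with a JOIN.
    return [(order_id, names.get(product_id)) for order_id, product_id in orders]
```

The merge logic, the second round trip, and the handling of missing products all move into application code, which is where the extra development effort comes from.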

Horizontal split

When the amount of data in a table reaches a certain level (as noted above, the performance watershed for a single MySQL table appears at around 20 million rows), you can no longer split the database any further along business lines, and the read and write performance of a single database becomes the bottleneck. That's when you have to split horizontally.

Horizontal splitting comes in two forms: splitting tables within one database, and splitting across both databases and tables. Let's start with splitting tables within a database. Suppose the Orders table has reached 50 million rows and this is hurting the read and write efficiency of our database. We can consider range partitioning on the order number order_id: orders numbered 1 to 10 million go in Order1, orders numbered 10 to 20 million go in Order2, and so on, with each table holding 10 million rows.

[Figure: the Orders table split by order_id range into Order1, Order2, and so on]
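The range rule above can be written as a small routing function. This is a minimal sketch: the function name range_table_for is hypothetical, and it assumes order numbers start at 1 and that each table holds exactly 10 million consecutive IDs.

```python
ROWS_PER_TABLE = 10_000_000  # capacity of each physical table, as in the example


def range_table_for(order_id: int) -> str:
    """Map an order_id to its range table: 1..10M -> Order1, 10M+1..20M -> Order2."""
    if order_id < 1:
        raise ValueError("order_id must be a positive integer")
    table_number = (order_id - 1) // ROWS_PER_TABLE + 1
    return f"Order{table_number}"
```

One property of range partitioning worth noting is that new orders always land in the highest-numbered table, so adding capacity means adding a table rather than moving existing rows.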

Although in-database table splitting lets us cap each table at 10 million rows, the data of all these tables is still stored in one database and uses the CPU, I/O, and memory of one host. A single database also has a limit on its number of connections. So this does not fully relieve the pressure on the system. At this point we need to consider the other technique: splitting across both databases and tables. Building on the in-database split, the tables are moved to different hosts and databases, so the CPU, memory, and I/O resources of the other hosts can be fully used. And once the databases are separated, the connection limit of a single database is no longer a bottleneck.

However, a query that does not include the shard key has to be run against every database. Suppose we have split the data into 5 databases by order_id. When we query for name='AAA' without an order_id, the system cannot tell which shard to look in, so it creates 5 connections and searches each database. This broadcast query increases the connection count, because a connection has to be created on every database. On a highly concurrent system, running broadcast queries like this can quickly trigger thread-count alerts.
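The difference between a routed query and a broadcast query can be sketched as follows. The assumptions here are mine: five in-memory SQLite databases stand in for the five hosts, the shard is chosen by the same 10-million-row order_id ranges used earlier, and the helper names are hypothetical. Each function also returns how many shards it had to touch.

```python
import sqlite3

ROWS_PER_SHARD = 10_000_000
SHARD_COUNT = 5

# One in-memory database per shard stands in for five separate hosts.
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for db in shards:
    db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, name TEXT)")


def shard_index(order_id):
    """Range routing on the shard key: 1..10M -> shard 0, 10M+1..20M -> shard 1."""
    index = (order_id - 1) // ROWS_PER_SHARD
    if order_id < 1 or index >= SHARD_COUNT:
        raise ValueError(f"order_id {order_id} is outside the sharded range")
    return index


def insert_order(order_id, name):
    shards[shard_index(order_id)].execute(
        "INSERT INTO orders VALUES (?, ?)", (order_id, name)
    )


def find_by_order_id(order_id):
    """Routed query: the shard key is known, so exactly one shard is touched."""
    rows = shards[shard_index(order_id)].execute(
        "SELECT order_id, name FROM orders WHERE order_id = ?", (order_id,)
    ).fetchall()
    return rows, 1


def find_by_name(name):
    """Broadcast query: no shard key, so every shard has to be asked."""
    rows, touched = [], 0
    for db in shards:
        touched += 1
        rows.extend(
            db.execute(
                "SELECT order_id, name FROM orders WHERE name = ?", (name,)
            ).fetchall()
        )
    return sorted(rows), touched


insert_order(1, "AAA")
insert_order(10_000_001, "BBB")
insert_order(25_000_000, "AAA")
```

With this data, find_by_order_id(10_000_001) touches one shard, while find_by_name("AAA") touches all five even though only two of them hold matching rows. In a real deployment each touched shard costs a connection, which is why broadcast queries multiply the connection count under high concurrency.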

In general, the advantages of horizontal splitting are as follows:

  • Horizontal scaling is effectively unlimited, and no single database ends up holding an oversized table.
  • It can cope with high concurrency and spreads hot data across shards.
  • The changes on the application side are minor, and no split along business lines is needed.

Disadvantages of horizontal splitting:

  • Routing is a problem: an additional layer of routing calculation is required, and, as mentioned earlier, queries without the shard key produce broadcast SQL.
  • Cross-database joins perform poorly.
  • Consistency issues for distributed transactions need to be addressed.

Using both together

In our current system, both vertical and horizontal splitting are in use. Vertical splitting mainly handles business segmentation: all subsystems of the business are planned out so that they can be decoupled, and then the data is split vertically. After that we apply horizontal splitting across databases and tables, distributing the data of large tables over several databases with a modulo algorithm.
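Modulo routing can be sketched in a few lines. The database count of 4 and the orders_db_N naming are assumptions for illustration; the article does not say how many databases the real system uses.

```python
from collections import Counter

DATABASE_COUNT = 4  # hypothetical; not stated in the article


def database_for(shard_key: int) -> str:
    """Modulo routing: the remainder of the key selects the database."""
    return f"orders_db_{shard_key % DATABASE_COUNT}"


# Sequential keys spread evenly over the databases.
distribution = Counter(database_for(order_id) for order_id in range(1, 1001))
```

Compared with range partitioning, modulo routing spreads consecutive IDs across all databases, so new writes do not pile up on one of them. The trade-off is that changing DATABASE_COUNT changes the remainder for most keys, so growing the cluster generally means migrating data.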

Logical and physical databases

Having introduced database and table splitting, we need to explain a few concepts. The first is the logical database versus the physical database. Take horizontal splitting across databases and tables as the example. At the physical level, we split the data of one database into five databases. These five databases are the physical databases, but to the upper-layer application they appear as a single database. The database presented to the upper layer is called the logical database. The split is transparent to the application layer: applications don't need to understand the underlying layout, they just use the logical database.

Logical tables and physical tables

For example, the Orders table is split into five parts, orders_1 to orders_5, at the bottom layer. These five tables are the physical tables. At the application level, however, there is only one Orders table. This is the logical table.
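The mapping from a logical table to its physical tables can be sketched as a small name-rewriting step. The modulo rule and the {table} placeholder convention are assumptions for illustration; real sharding middleware parses and rewrites the SQL itself.

```python
PHYSICAL_TABLE_COUNT = 5  # orders_1 .. orders_5, as in the example above


def physical_table(logical_table: str, shard_key: int) -> str:
    """Pick the physical table for a key (assumed rule: key modulo 5, plus 1)."""
    suffix = shard_key % PHYSICAL_TABLE_COUNT + 1
    return f"{logical_table}_{suffix}"


def rewrite(sql_template: str, logical_table: str, shard_key: int) -> str:
    """Fill the {table} placeholder with the physical table name."""
    return sql_template.format(table=physical_table(logical_table, shard_key))


# The application only ever names the logical table "orders".
sql = rewrite("SELECT * FROM {table} WHERE order_id = ?", "orders", 7)
```

Here the application asks for the logical table orders with key 7, and the statement that actually runs targets orders_3. Application code never has to mention a physical table name.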

Conclusion: This article mainly described some of the concepts behind database and table splitting. We still need to deepen our own understanding, since our project has only just started splitting, so comments and corrections are welcome.

Reference Documents:

  • Introduction to high-performance database clusters: database and table splitting: https://www.jianshu.com/p/9eadfba9cdaf
  • Database and table splitting: https://www.cnblogs.com/butterfly100/p/9034281.html