Can not carry, the boss let me on the sub-table plan

This article github.com/smileArchit… Has been included. JavaMap is aMap of Java knowledge designed to help developers learn and not get lost! Java learning please look for JavaMap.

Start with a few questions:

(1) When do we need the sub-database sub-table? What are our criteria?

(2) How much data is stored in a table?

(3) The speed of data growth is very fast. How much data are generated every day before we need to consider making sub-database sub-table?

Have you figured out all these questions? I believe the answer will come after reading this article.

Why do we need to divide the tables?

First, to answer the question of why the database is divided into tables, the answer is simple: the database has a performance bottleneck. In plain English, the database is running out of time.

Database performance bottleneck, external performance has several aspects:

Heavy request blocking

In a high concurrency scenario, a large number of requests need to operate the database. As a result, the number of connections is insufficient and the requests are blocked.
SQL operations are slow

If there is a table in the database with hundreds of millions of data volumes, a full table scan will be performed if the SQL fails to hit the index. This query will take a long time.
Storage problem

The service volume increases rapidly, and the data volume in a single database becomes larger and larger, which causes great pressure on storage.

From the point of view of the machine, the performance bottleneck is nothing more than CPU, memory, disk, network, etc. The simplest way to solve the performance bottleneck is to improve the machine performance, but the cost/benefit ratio of this method is often too high to be cost-effective, so the focus is on the software perspective.

Database optimization scheme

There are many database optimization schemes, mainly divided into two categories: software level, hardware level.

Software level includes: SQL tuning, table structure optimization, read and write separation, database cluster, database and table;

The hardware layer mainly increases machine performance.

SQL tuning

SQL tuning is often the first step in solving a database problem, and often a little effort can yield a lot of benefits.

The main goal of SQL tuning is to make slow SQL as fast as possible by simply making the SQL execution hit the index as fast as possible.

Enable slow SQL recording

If you are using Mysql, you need to set a few parameters in the Mysql configuration file.

slow_query_log=on
long_query_time=1
slow_query_log_file=/path/to/log
Copy the code

Tuning tools

The explain command is often used to look at the execution plan of an SQL statement. It is easy to see whether the SQL statement is a full table scan and whether an index is hit.

Select id, age, gender from user where name = 'laughing architect ';Copy the code

Returns a column called “type”. Common values are:

ALL, index, range, ref, eq_ref, const, system, NULL

ALL indicates that the SQL statement scans the entire table and needs to be optimized. Generally, the range level and above is required.

Table structure optimization

Take a scenario as an example:

“User” table has user_id, nickname and other fields, “order” table has order_ID, user_ID and other fields, if you want to get the user nickname? In general, the user table is queried when the order table is queried by joining the associated table operation, so as to obtain the nickname of the guide user.

But with the increase of business volume, order table and user table is certainly a surge, this time through the two tables associated data is more laborious, in order to take a nickname field and have to query tens of millions of user table, its speed can be guessed.

At this time, you can try adding the nickname field to the order table (order_ID, user_ID, nickname), which is usually called the database table redundant field. The benefit of this is that you no longer need to query the user table associated when displaying the order list.

The downside of having redundant fields is that if this field update involves updating multiple tables at the same time, try to select a field that is infrequently updated.

Structure optimization

When a single database instance cannot hold, we can add instances to form a cluster for external services.

When we find that the number of read requests is significantly higher than the number of write requests, we can make the primary instance responsible for the write, and the secondary instance provides the external read capability.

If reading the instance is still very stressful, you can add a cache such as Redis in front of the database to allow requests to reduce database access by caching the data first.

After the cache has shared some of the burden, the database is still the bottleneck. In this case, it is time to consider dividing the database and table, which will be discussed in more detail later.

The hardware optimization

Hardware costs are very high, and it is generally not possible to upgrade hardware when you run into database performance bottlenecks.

When the initial service volume is small, the hardware database performance can be greatly improved by upgrading. But at a later stage, the benefits of upgrading hardware are less obvious.

Detailed explanation of sub-table

Let’s take a mall system as an example to explain how the database evolves step by step.

One application, one database

In the early start-up stage, I want to make a mall system, which is basically a system containing several basic functional modules, and then packaged into a WAR package to be deployed. This is a typical single architecture application.

As shown in the figure above, the mall system includes homepage Portal template, user module, order module, inventory module and so on. All modules share a database, and there are usually many tables in the database.

Since the user base is small, this architecture works perfectly well in the early days, and developers can take demos to investors.

Once you get the money from the investors, the business needs to be promoted on a large scale, and the system architecture needs to match the rapid development of the business.

Multiple applications with single database

In the early stage, in order to seize the market, this system is constantly updated, more and more code, the architecture has become more and more bloated, now with the system access pressure gradually increased, system splitting is imperative.

In order to ensure smooth service, system architecture reconfiguration is carried out in several stages.

In the first stage, the single architecture of the mall system is divided into sub-services according to functional modules, such as Portal service, user service, order service, inventory service, etc.

Multiple applications with single database

As shown in the figure above, multiple services share a database so that the underlying database access logic can be left untouched with minimal impact.

Multiple applications and multiple databases

As the promotion of the business increased, the database finally became a bottleneck, and at this point it was hardly feasible for multiple services to share a database. We need to separate the tables related to each service to create a separate database, which is called “sub-database”.

There is a limit to the amount of concurrency a single database can support. Splitting into multiple libraries can improve the performance of services without competition.

As shown in the figure above, a large data is divided into several small databases, and each service corresponds to a database. This is the necessary “database” operation when the system develops to a certain stage.

The same is true of microservice architecture, which is very popular today. If you break down the application without breaking down the database, you can’t solve the fundamental problem, and the whole system can easily reach a bottleneck.

table

After that, when will the table be divided?

If the system is in a rapid development stage, take the mall system, a day order volume may be hundreds of thousands, the order table in the database growth is particularly fast, growth to a certain stage of the database query efficiency will be significantly reduced.

Therefore, when the single table data increment is too fast, the industry circulation is more than 5 million data volume to consider sub-table. Of course, 5 million is just an empirical point, you can make a decision based on the actual situation.

Then how to divide the table?

Sub-table has several dimensions, one is horizontal segmentation and vertical segmentation, the other is single database sub-table and multi-database sub-table.

Horizontal split and vertical split

Take the users table (user), the table has seven fields: id, name, age, sex, the nickname, the description, if the nickname and the description is not commonly used, we can be broken down into another table: User details table, so that a user table is split into user basic information table + user details table, the two tables are not the same structure and independent of each other. But from this point of view, vertical splitting does not fundamentally solve the problem of large amount of data in a single table, so we still need to do a horizontal splitting.

Another way to split a table is to split it into two tables with an odd number of ids: 1,3,5,7… Put in user1, id even: 2,4,6,8… In user2, such a split is a horizontal split.

There are many ways to split horizontally. In addition to the above table split by ID, it can also be split according to the time dimension, such as the order table, which can be split by day, month and so on.

Daily table: Stores only the data of the day.
Monthly table: You can start a scheduled task to migrate all the data of the previous day to the current month table.
History table: You can also use a scheduled task to migrate data older than 30 days to the History table.

To summarize the characteristics of horizontal and vertical split:

Vertical sharding: Partitioning based on tables or fields, with different table structures.
Horizontal sharding: Based on data partition, table structure is the same, data is different.

Single library split and multiple library split

Take horizontal splitting as an example, where each table is split into multiple child tables that exist in the same database. For example, the following user table is split into user 1 table and user 2 table.

Splitting a table into several sub-tables in a database can solve the single-table query performance problem to some extent, but there is also a problem: single-database storage bottleneck.

So the industry is more used to split child tables into multiple databases. For example, in the figure below, the user table is split into two child tables, each of which exists in a different database.

In a word, sub-table is mainly to reduce the size of a single table, to solve the performance problems caused by the amount of data in a single table.

The complexity of dividing libraries and tables

Since the sub-scale is so good, shouldn’t we use it at the beginning of the project? Do not get excited, calm down, the sub-database sub-table does solve a lot of problems, but also brought a lot of complexity to the system, the following briefly explained.

(1) Cross-library associated query

Before the single database is not divided into tables, we can easily use join operation to associate multiple tables to query data, but after the database is divided into tables, the two tables may not be in the same database, how to use join?

There are several solutions:

Field redundancy: Put the fields to be associated into the main table to avoid the join operation.
Data abstraction: gather data through ETL to generate new tables;
Global tables: For example, some basic tables can be placed in each database.
Application layer assembly: the basic data can be detected and calculated through the application program.

(2) Distributed transactions

A single database can be handled with a local transaction, while multiple databases can only be handled with a distributed transaction.

Common solutions include reliable messaging (MQ) based solutions, two-phase transaction commit, flexible transactions, etc.

(3) Sorting, paging, function calculation problems

When using SQL, keywords such as order by and LIMIT need special processing. Generally, the idea of sharding is adopted:

The corresponding function is executed on each shard, and then the result set of each shard is summarized and re-computed to obtain the final result.

(4) Distributed ID

If you use Mysql database, you can use id autoincrement as primary key for a single table in a single database. If you use Mysql database, you can use id autoincrement as primary key.

Common distributed ID solutions are:

UUID
Maintain a separate ID table based on database increment
Them roughly mode
Redis cache
Snowflake algorithm
Baidu uid – the generator
Meituan Leaf
Drops Tinyid

These solutions will be covered in a later article and will not be expanded here.

(5) Multiple data sources

After dividing the database into tables, it may be faced with obtaining data from multiple databases or subtables. The general solution is: client adaptation and agent layer adaptation.

Common middleware in the industry include:

Shardingsphere (formerly Sharding-JDBC)
Mycat

conclusion

If there is a database problem do not rush to divide the database table, first see if the use of conventional means can be solved.

Dividing the database and dividing the table will bring great complexity to the system, and it is not recommended to use it in advance unless it is absolutely necessary. Being a system architect can make your system flexible and extensible, but don’t overdesign and design ahead of time. At this point, the architect must be proactive and anticipate in advance. Did you get it?

— END —

Do not white whao, point a praise cough up, each picture is carefully drawn, the pen refill ~

About the author: ☕ read a few years: Huazhong University of Science and Technology master graduate; 😂 wave over several big factories: Huawei, netease, Baidu…… 😘 has always believed that technology can change life, willing to maintain the original aspiration, refueling technology people!

Wechat search public account [love laughing architect], pay attention to this technology and the pursuit of life.

Finally recommend a treasure open source project, github.com/smileArchit… JavaMap is aMap of Java knowledge that allows developers to learn without getting lost! Java learning please look for JavaMap. JAVA core knowledge point arrangement (283 pages, super detailed) free.