MySQL Database Sharding: Splitting Databases and Tables

When the data volume grows too large, the usual remedy is to split databases and tables. Splitting databases raises relatively few concerns, but splitting tables raises many more.

In recent years at work I have not dealt with a business that has a particularly large data volume. Even tables with more than 100 million rows performed fine as single tables because their indexes were well designed, so I have never used table splitting in practice. Recently I took over a project whose data volume is expected to become very large, and table splitting is one of the options under consideration, so I am taking this opportunity to organize what I know about it.

This article mainly covers horizontal database and table splitting; the other kinds of splitting are easier to understand. Unless stated otherwise, "splitting" below always means horizontal database and table splitting.

1. Basic knowledge

1.1 Definitions of database splitting and table splitting

1.1.1 Database splitting

Vertical database splitting: tables belonging to different business modules are placed in different databases.

For example, an e-commerce system with a single e-commerce database can split it by business module into a user database, a product database and an order database. Each can be treated as an independent database; they do not need to live together. The advantage is that each can change independently, and the modules are isolated from one another.

1.1.2 Table splitting

Vertical table splitting: splitting a "big table into small tables" along its columns. This is usually needed because the table design is unreasonable.

For example, if one table stores student, teacher, course and grade information, it is better to split it into a student table, a course schedule table and a grades table.

Horizontal table splitting: a single table with a large data volume (such as an order table) is divided into multiple tables according to some rule (RANGE, HASH modulo, etc.). However, these tables still live in the same database, so database-level operations still face an IO bottleneck. This approach is not recommended.

Horizontal database and table splitting: the data of a single table is distributed across multiple servers. Each server has the corresponding databases and tables, but each holds a different subset of the data. This effectively relieves the performance bottleneck and load of a single machine and a single database, and breaks through the limits on IO, connection count, hardware resources, etc.
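As a minimal sketch of horizontal database and table splitting, the Python below maps a key to a database (server) index and a table index on that server. The layout of 2 databases with 4 tables each, the slot formula and the `order_db_<n>.t_order_<m>` naming are all assumptions for illustration, not a convention of any particular middleware.

```python
def route_db_and_table(order_id: int, db_count: int = 2, tables_per_db: int = 4):
    """Map a key to (database index, table index).

    Assumed convention: compute a global slot, then split it into a
    database part and a table part so every slot is a distinct table."""
    slot = order_id % (db_count * tables_per_db)  # global slot, 0..7 by default
    db_index = slot // tables_per_db              # which server/database
    table_index = slot % tables_per_db            # which table on that server
    return db_index, table_index


def physical_name(order_id: int) -> str:
    db_index, table_index = route_db_and_table(order_id)
    # Hypothetical naming scheme for the physical table.
    return f"order_db_{db_index}.t_order_{table_index}"


print(physical_name(13))
```

With this layout each of the 8 slots maps to a different physical table, so rows are spread over two servers instead of piling up in one database.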

1.2 Differences between partitioning and sharding

When splitting tables you often see two terms: partition and shard. Both refer to dividing a large table into smaller pieces, but they differ fundamentally.

The idea of sharding comes from partitioning, but database partitioning is essentially processing at the level of data objects, such as partitioning tables and indexes. Each partition can have different physical storage attributes, and everything operates within a single database. Database sharding, by contrast, can span databases and even physical machines.

The partitioning feature introduced in MySQL 5.1 does let you partition a table, but it is limited to a single database and does not cross server boundaries.

When we split tables, we usually use sharding, that is, the data is stored on multiple physical machines.

1.3 Sharding strategies

Common sharding rules are as follows:

1.3.1 Sharding by hash

mod-long: hash sharding on the value of the sharding column

Shard ID = sharding column value mod number of shards

mod-long-by-hash: hash sharding for string values

Shard ID = hash(sharding column value) mod number of shards
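The two hash rules can be sketched as follows. This is an illustration, not Mycat's implementation: CRC32 stands in for whatever string hash a given middleware uses, chosen here because, unlike Python's built-in `hash()` for strings, it returns the same value in every process.

```python
import zlib


def shard_by_mod(value: int, shard_count: int) -> int:
    """mod-long style: shard ID = column value mod number of shards."""
    return value % shard_count


def shard_by_string_hash(value: str, shard_count: int) -> int:
    """String variant: shard ID = hash(column value) mod number of shards.

    CRC32 is an assumed stand-in hash; it is deterministic across processes."""
    return zlib.crc32(value.encode("utf-8")) % shard_count


for order_id in (1001, 1002, 1003, 1004):
    print(order_id, "-> shard", shard_by_mod(order_id, 4))
print("user_abc -> shard", shard_by_string_hash("user_abc", 4))
```

Hash sharding spreads consecutive keys evenly, but changing the shard count changes the result of the modulo for most keys, so data has to be migrated when shards are added.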

1.3.2 Sharding by range

Range: partition rules are defined when the table is created, and from these rules you can determine which partition a given value of the sharding column belongs to.

Generally, the sharding column is a time or a numeric value, for example:
```
date_range:
	0: 1000000
	1: 2000000
	2: 3000000
	3: 4000000
	4: maxvalue
```
If the partition column value is 1500000, the data is placed on shard 1.
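A lookup over the boundaries in the `date_range` example can be done with a binary search. Treating each boundary as the exclusive upper bound of its shard is an assumption; check how your middleware handles a value that sits exactly on a boundary.

```python
from bisect import bisect_right

# Upper bounds from the date_range example; treating them as exclusive is an
# assumption. Shard 4 ("maxvalue") receives every value >= 4,000,000.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000, 4_000_000]


def shard_by_range(value: int, upper_bounds=RANGE_UPPER_BOUNDS) -> int:
    """Return the index of the first shard whose upper bound exceeds value."""
    return bisect_right(upper_bounds, value)


print(shard_by_range(1_500_000))
print(shard_by_range(9_999_999))
```

Range sharding keeps adjacent values together, which makes range scans cheap and lets you add a shard for a new range without moving old data; the trade-off is that the newest range tends to receive most of the writes.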

2. Sharding middleware

To let users work with a sharded table exactly as with a normal table, without being aware that it is sharded, you generally need to introduce middleware.

There are generally three ways to operate on sharded tables:

2.1 Client-side sharding

Client-side sharding means the sharding logic runs directly in the application layer that uses the database. The sharding rules must be kept in sync across all nodes of the same application, and each application contains its own implementation of the sharding logic. An example is Dangdang's Sharding-JDBC.
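To make client-side sharding concrete, here is a hypothetical router that lives inside the application and picks a connection per key. It is not Sharding-JDBC's API; in-memory SQLite databases stand in for separate MySQL instances so the sketch runs on its own.

```python
import sqlite3


class ClientShardRouter:
    """Hypothetical client-side router: the application itself picks the shard.

    Each in-memory SQLite connection stands in for a separate MySQL instance."""

    def __init__(self, shard_count: int):
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for conn in self.shards:
            conn.execute(
                "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER)"
            )

    def _conn_for(self, order_id: int) -> sqlite3.Connection:
        # Same mod rule as mod-long: shard ID = order_id mod shard count.
        return self.shards[order_id % len(self.shards)]

    def insert_order(self, order_id: int, amount: int) -> None:
        conn = self._conn_for(order_id)
        with conn:  # commits on success, rolls back on error, within ONE shard
            conn.execute(
                "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
                (order_id, amount),
            )

    def get_amount(self, order_id: int):
        row = self._conn_for(order_id).execute(
            "SELECT amount FROM orders WHERE order_id = ?", (order_id,)
        ).fetchone()
        return row[0] if row else None


router = ClientShardRouter(shard_count=2)
router.insert_order(10, 500)
router.insert_order(11, 700)
print(router.get_amount(11))
```

Because the routing rule lives in application code, every application node must run the same rule (here `order_id % shard count`); changing the shard count means redeploying all of them, which is the synchronization burden described above.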

2.2 Proxy-based sharding

Proxy-based sharding adds a proxy layer between the application layer and the database layer, and the sharding routing rules are configured on the proxy. The proxy exposes a JDBC-compatible interface to the application layer, so once the service is implemented, only the routing rules on the proxy need to be configured. Mycat, for example, is based on this approach.

2.3 Distributed databases with transaction support

Current examples are OceanBase and TiDB. They build scalability and distributed transactions into the distributed database itself, so these features are transparent to users, who do not need to control them directly. However, their transaction support is not as good as that of a traditional relational database. They suit big-data log systems, statistics systems, query systems, social networking sites, etc.

2.4 Notes

A distributed database with transaction support is a separate route altogether, not a way of sharding MySQL itself.

As for client-side versus proxy-based sharding, both companies I have worked for used the proxy mode, one with MyCAT and the other with Dbatman; I have not used client-side sharding. The differences between the two are as follows:

3. Distributed transactions

Sharding distributes data across multiple physical machines, which introduces the problem of distributed transactions. We slice the data of a single table and store it in multiple databases or even multiple database instances, so the database's own transaction mechanism is no longer sufficient, and distributed transactions are needed. See my article "Distributed systems and consistency protocols" for more on distributed transactions.

I won't go into how distributed transactions are handled here; I will write a separate article later. Instead, let's look at how they affect the way we operate MySQL.

Once distributed transactions come into play, you can no longer operate MySQL as if everything were in a single table. Middleware products differ in capability, so each must be analyzed on its own. I will use Dbatman as an example to illustrate the differences in use.

The sharded version does not maintain auto-increment or unique primary keys; the service has to keep keys unique itself.

This means different shards can end up with the same primary key IDs.

Cross-shard transactional writes are not supported, but cross-shard transactional reads are.

If everything a transaction touches lives in one shard, it is not a distributed transaction and behaves as it would on a single machine.
A transaction that touches more than one shard is a cross-node transaction; for writes, only single-shard transactions are supported.

UPDATE and INSERT statements must include the sharding column.
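The restrictions above can be pictured as a check the middleware performs before executing a write transaction. The sketch below is hypothetical and does not reproduce Dbatman's code: it assumes a mod-4 rule on a column named `code`, rejects writes that lack the sharding column, and rejects write transactions whose rows map to more than one shard.

```python
from typing import Dict, Iterable

SHARD_COUNT = 4        # assumed number of shards
SHARD_COLUMN = "code"  # hypothetical sharding column


class CrossShardWriteError(Exception):
    """Raised when a write transaction would span several shards."""


def shard_of(row: Dict[str, int]) -> int:
    if SHARD_COLUMN not in row:
        # Mirrors the rule that UPDATE/INSERT must carry the sharding column.
        raise ValueError(f"write is missing sharding column {SHARD_COLUMN!r}")
    return row[SHARD_COLUMN] % SHARD_COUNT


def check_write_transaction(rows: Iterable[Dict[str, int]]) -> int:
    """Return the single shard a write transaction touches, or raise if it
    would need a cross-shard (distributed) transaction."""
    shards = {shard_of(row) for row in rows}
    if not shards:
        raise ValueError("empty write transaction")
    if len(shards) > 1:
        raise CrossShardWriteError(f"write touches shards {sorted(shards)}")
    return shards.pop()


print(check_write_transaction([{"code": 1}, {"code": 5}]))
```

Rows with codes 1 and 5 both map to shard 1, so they can share an ordinary local transaction; codes 1 and 2 would map to different shards and be rejected.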

In short, operations within a single shard are unaffected; whether operations across shards work depends on whether the middleware supports them.

When using middleware, avoid unusual SQL even within a single shard, because some middleware may not support it, for example INSERT ... NOT EXISTS.

4. Deciding whether to split databases and tables

When deciding whether to split databases and tables, consider the following factors:

Space: a single physical instance cannot meet the storage requirement, and the physical server cannot be expanded further by adding disks
Primary database performance: when the CPU, memory or disk IOPS of the primary database approach or reach their limits, it needs to be split
Disaster recovery: reduce the impact that an outage of a single primary has on writes

For each of these three points, it is also worth asking whether a more suitable alternative exists:

Space:

Delete historical data to free up space
Change the storage model to reduce MySQL disk usage
Switch to a storage engine with a higher space compression ratio

Primary database performance:

Use read/write splitting to reduce the read requests that hit the primary (write) database, leaving it more capacity for writes
Optimize the data write model to reduce bulk writes (peak shaving)
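In its simplest form, read/write splitting sends writes to the primary and spreads reads over replicas. The sketch below is a hypothetical illustration with placeholder server names; a real proxy parses SQL far more carefully than looking at the first keyword.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Hypothetical read/write-splitting router: writes go to the primary,
    reads are spread round-robin over the replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        self._replica_cycle = itertools.cycle(list(replicas)) if replicas else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        verb = sql.lstrip().split(None, 1)[0].upper()
        # Writes, anything inside a transaction, or a setup with no replicas
        # all go to the primary.
        if in_transaction or verb in self.WRITE_VERBS or self._replica_cycle is None:
            return self.primary
        return next(self._replica_cycle)


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders WHERE order_id = 1"))
print(router.route("UPDATE orders SET amount = 1 WHERE order_id = 1"))
```

Keeping reads inside a transaction on the primary is one common way to avoid reading stale data from a replica that has not caught up yet.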

Disaster recovery:

If the service has high read-availability requirements, use read/write splitting and route important requests to the read replicas. There are usually N more read replicas than write primaries, and automatic switchover is performed at the proxy layer.
From the perspective of the cluster as a whole, splitting databases and tables actually increases the failure rate. Suppose the SLA of a single physical machine is 99.99%. If the service needs every machine to be up, two machines give an SLA of about 99.98% and 10 machines only about 99.90%, so the expected downtime grows from about 52 minutes to about 525 minutes per year. In some setups, the failure of a single node can even make the whole proxy unavailable, magnifying the impact of the failure.
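These SLA figures follow from multiplying availabilities, assuming machines fail independently and the service needs every machine to be up. A quick check of the arithmetic:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def combined_sla(single_sla: float, machines: int) -> float:
    # Assumes independent failures and that all machines must be up at once.
    return single_sla ** machines


for n in (1, 2, 10):
    sla = combined_sla(0.9999, n)
    downtime = (1 - sla) * MINUTES_PER_YEAR
    print(f"{n:>2} machines: SLA {sla:.4%}, about {downtime:.1f} min downtime/year")
```

For 10 machines this comes to roughly 525 minutes of downtime a year, in line with the figure above.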

5. Design

The current project's requirements are as follows:

Generate unique codes with integer values
Code values must be inserted into the database in batches
Updates to a code are processed one at a time, and every operation on a code value must be recorded
The final volume is uncertain, but in the long run the data will be very large

Based on these requirements, the design is:

The code value is the primary key, and the application itself guarantees its uniqueness
The code table is range-sharded, e.g. codes 0 to 100 million in one shard and 100 million to 200 million in the next
The operation record table of the code table is also range-sharded, with the same ranges as the code table
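The design above can be sketched under stated assumptions: 100 million codes per shard, a hypothetical in-process `CodeAllocator` that hands out codes in batches (a real one would need to persist its high-water mark and coordinate across processes), and an operation record table sharded on the same code value. Under that last assumption, a code and its operation records always land on the same shard, so updating a code and writing its record stays a single-shard transaction.

```python
SHARD_SIZE = 100_000_000  # assumed: 100 million codes per range shard


def code_shard(code: int) -> int:
    """Range sharding: 0..99,999,999 -> shard 0, 100,000,000.. -> shard 1, ..."""
    if code < 0:
        raise ValueError("codes are non-negative integers")
    return code // SHARD_SIZE


class CodeAllocator:
    """Hypothetical allocator: the application, not MySQL, guarantees unique,
    increasing codes. Persisting the high-water mark is omitted here."""

    def __init__(self, start: int = 0):
        self._next = start

    def next_batch(self, size: int) -> range:
        if size <= 0:
            raise ValueError("batch size must be positive")
        batch = range(self._next, self._next + size)
        self._next += size
        return batch


def group_by_shard(codes):
    """Group a batch of codes so each shard gets one batched INSERT."""
    groups = {}
    for code in codes:
        groups.setdefault(code_shard(code), []).append(code)
    return groups


alloc = CodeAllocator(start=99_999_998)
batch = alloc.next_batch(4)
print(group_by_shard(batch))
```

A batch that straddles a range boundary is split into one insert per shard, and the operation record for any code is routed with the same `code_shard` call.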

This design meets the requirements.

However, calculations showed that a single table can hold billions of rows; with well-designed indexes, relatively simple business logic and no highly concurrent requests, a single table appears to be sufficient.

Conclusion

In practice, what we usually need is horizontal database and table splitting, which brings in distributed transactions. We must check whether the middleware meets our needs and whether the SQL statements we want to use are supported, and consider whether other solutions would work instead.

My understanding of how the middleware is implemented is still shallow; I may study it later when I have time.


Finally

If you like my articles, you can follow my public account (Programmer Malatang).

My personal blog is shidawuhen.github.io/
