If the amount of data is too large, people usually divide the database into separate tables. Sub – library needs to pay attention to less content, but sub – table needs to pay attention to more content.

Work in recent years have not encountered a particularly large amount of data business, those over 100 million of data, because the index set reasonable, single table performance has no impact, so the actual combat has not used the table. Recently I have a project in my hand, and it is estimated that the amount of data will be very large. The plan of table division is one of the options. Take this opportunity to sort out the contents of table division.

What we want to talk about this time is mainly the level sub – library sub – table, other kinds of sub – library, sub – table is easier to understand. If the following is not specified, all refer to the horizontal sub – library sub – table.

1. Basic knowledge

1.1 Definition of sub-database and sub-table

1.1.1 depots

Vertical segmentation: The system divides the tables of different modules into different databases by service modules.

For example, the e-commerce system has an e-commerce database, which can be divided into user database, commodity database and order database according to business modules. These can be regarded as independent databases and do not need to be put together. The advantage is that the change can be independent, and can be isolated from the interaction.

1.1.2 table

Vertical split table: that is, “big table split small table”, based on the column field. Usually because the table design is not reasonable, need to be split.

If a table to store students, teachers, courses, grades information, it is best to split into student table, class schedule, grades table.

Horizontal split table: for a single table with a large amount of data (such as order table), according to a certain rule (RANGE,HASH module, etc.), divided into multiple tables. But these tables are still in the same library, so library-level database operations still have IO bottlenecks. Not recommended.

Horizontal database and table: the data of a single table is divided into multiple servers, each server has corresponding libraries and tables, but the data set in the table is different. Horizontal library table can effectively relieve the performance bottleneck and pressure of single and single library, and break through the bottleneck of IO, connection number, hardware resources, etc.

1.2 Differences between Partitions and sharding

When dividing tables, you often see two terms: partition and shard. Both terms refer to the partitioning of a large table into chunks, but there are essential differences between the two.

Sharding (fragmentation) thoughts comes from the idea of partition, but the database partition is basically a data object level processing, such as tables and indexes of partitions, each data set can have different physical storage properties, and operating within the scope of a single database, and database Sharding is able to cross, even across the physical machine.

The Partition feature provided by MySQL5.1 does allow table partitioning, but it is limited to a single database and does not cross server boundaries.

When we divide tables, we usually use a sharding scheme, that is, data is stored on multiple physical machines.

1.3 Sharding Policy

Sharding rules are as follows:

1.3.1 Slice according to hash

  1. Mod-long: Hash partition to partition column values

    Fragment column ID = Partition column value mod Number of fragments

  2. Mod-long-by-hash: Hash partition for strings

    Fragment column ID =hash(partition column value) Number of mod fragments

1.3.2 Slice according to scope

  1. Range: Create a partition rule when creating a table. According to the partition rule, you can determine which partition the values of the partitioned columns are on

    Generally, partitions are listed as time or value, as in

    date_range:
    	0: 1000000
    	1: 2000000
    	2: 3000000
    	3: 4000000
    	4: maxvalue
    
    Copy the code

    If the partition column value is 1500000, the data is placed on shard 1.

2. Sub-database sub-table middleware

The user is not aware that this is a shard table, the use of the same as a normal table, generally need to introduce middleware.

There are generally three ways to manipulate a shard table:

2.1 Client Fragmentation

The so-called client sharding refers to the direct operation of sharding logic in the application layer that uses the database. Sharding rules need to be synchronized among multiple nodes of the same application, and each application layer has a logical implementation of operation slice. Such as Dangdang Sharding JDBC.

2.2 Proxy Sharding

Proxy sharding is to add a proxy layer between the application layer and database layer and configure routing rules for sharding on the proxy layer. The proxy layer provides JDBC-compatible interfaces to the application layer. After services are implemented, configure routing rules on the proxy layer. Mycat, for example, is based on this solution.

2.3 Distributed database that supports transactions

At present, there are OceanBase and TiDB frameworks. These frameworks package the implementation of scalable specific and distributed transactions into the internal implementation of distributed database, which is transparent to users. Users do not need to directly control these features, but the support for transactions is not as good as that of relational data. Suitable for big data log system, statistics system, query system, social networking sites, etc.

2.4 illustrates

A distributed database that supports transactions is another option that has nothing to do with MySQL.

For client sharding and proxy sharding, the two companies I have worked with are using proxy mode, one is MyCAT, the other is Dbatman. The client sharding mode is not touched. The differences between the two are as follows:

3. Distributed transactions


Sharding means that data is distributed across multiple physical machines, introducing the problem of distributed transactions. We slice up the data of a single table and store it in multiple databases or even multiple database instances. Therefore, the transaction mechanism of the database itself cannot meet the needs, so we need to use distributed transaction to solve the problem. See distributed systems and consistency protocols for more information about distributed transactions.

I won’t go into details about how distributed transactions are handled here, but I’ll write a separate article later. Let’s talk about how distributed transactions affect operating MySQL.

Knowing that distributed transaction issues are introduced, you can’t operate MySQL as if it were a single table. Different middleware has different capabilities, so it needs to be analyzed separately. I take Dbatman as an example to illustrate the differences in use.

  1. The fragment version does not maintain autoincrement and unique primary keys. Services can maintain unique keys by themselves
  • This means that the primary key IDS of different shards are the same
  1. Cross-shard transaction write is not supported, but cross-shard transaction read is supported
  • If you ensure that the contents of the transaction operation are in a shard, it is not a distributed transaction, as in a single machine

  • A transaction involving more than one shard is called a cross-node transaction and single-shard transaction support

  1. Update and INSERT must have sharded columns

To sum up, operating the same shard has no impact. Operating different shards depends on whether the middleware supports them.

If you use middleware, even if the same shard, try not to use special SQL, some middleware may not support, such as INSERT Not exists.

4. Determine whether to select the database and table

Choose to do sub-database sub-table, consider the following factors:

  1. Space: A single physical instance cannot support data storage requirements, and a single physical server cannot be expanded by adding disks

  2. Primary library performance: The CPU, memory, and disk IOPS of a primary library are affected. When the CPU, memory, and disk IOPS of a primary library approach or reach the upper limit, it needs to be split

  3. Disaster recovery: Reduce the impact of a single master library outage on writes.

In view of the above three points, we can also consider whether there is a more appropriate plan

  1. Space:
  • Delete historical data. Clear space

  • Modify the storage model to reduce MySQL disk usage

  • Switch to a storage engine with a higher space compression ratio

  1. Main library performance:
  • Read/write separation can be used to reduce the amount of read requests to the write library, thus improving the support for writing.

  • Optimize data write model to reduce batch write (peak clipping)

  1. Disaster recovery:
  • If services have high requirements on read availability, it is recommended to perform read/write separation and route important requests to read libraries. The number of read libraries is generally N more than that of write libraries. Automatic SWITCHOVER is performed on the agent layer.

  • From the perspective of the cluster as a whole, database and table partitioning actually increases the failure rate. Assume that the SLA of a single physical machine is 99.99%, then the SLA of two physical machines is 99.98(approximate number), and the SLA of 10 physical machines is only 99.90%. The average outage time increased from 52 minutes to 525 minutes per year. So in some scenarios, the failure of a single node may render the entire agent unavailable, magnifying the impact of the failure.

Design of 5.

Current project requirements are as follows:

  1. Generates a unique code with an integer value

  2. The code values need to be inserted into the database in batches

  3. The code update operation is a single processing, and the code value operation needs to be recorded

  4. The final number is variable, but in the long run the data will be very large

Based on the above requirements, do the following design:

  1. Code value primary key, their own control primary key unique

  2. The code table uses range to fragment, such as the fragment range of 01 billion, 120 million

  3. The operation record table of the code table also uses range to fragment, and the fragment range is the same as the code table

The requirements can be implemented through this design.

But after calculation, it is found that a single table can store billions of data, and the index design is reasonable, the business logic is relatively simple, no high concurrent requests, a single table seems to be possible.

conclusion

Under normal circumstances, we generally need to do horizontal sub-database sub-table, which involves distributed transactions, we must consider whether we can meet their needs, whether the SQL statement we want to use can support, consider whether there are other schemes.

About the realization principle of middleware, understanding is not very deep, if there is time behind, you can learn.

data

  1. MySQL sub – database sub – table scheme, summed up very good!

  2. MySQL database subtable (MyCAT implementation)

  3. Mysql database 分 析 表 现 : Mysql database 分 析 表 现

  4. MySql table, library, shard and partition knowledge

  5. The difference between Sharding and Partition

  6. Database sub-database sub-table middleware comparison (full)

  7. Sub-database sub-table middleware

  8. Sub-library and sub-table: comparison of middleware schemes

  9. XA distributed transaction principle

The last

If you like my article, you can follow my public account (Programmer Malatang)

My personal blog is shidawuhen.github. IO /

Review of previous articles:

recruitment

  1. Bytes to beat | push big 24:00

  2. Bytes to beat | headlines today guangzhou server push r&d engineers

  3. Bytes to beat | trill electricity now hiring front-end development project in Shanghai

  4. Bytes to beat | trill electricity senior server-side development engineer – trading in Shanghai

  5. Bytes to beat | trill electric ShangWuHan server-side development engineer (senior)

  6. Bytes to beat | fly book big customer push product manager

  7. Bytes to beat | trill electricity service side technical posts vacant

  8. Bytedance recruitment special

Design patterns

  1. Go Design Mode (15)- Facade mode

  2. Go Design Pattern (14)- Adapter pattern

  3. Go Design Mode (13)- Decorator mode

  4. Go Design Mode (12)- Bridge mode

  5. Go Design Pattern (11)- Proxy pattern

  6. Go Design Mode (10)- Prototype mode

  7. Go Design Mode (9)- Builder mode

  8. Go Design Pattern (8)- Abstract Factory

  9. Go Design Mode (7)- Factory Mode

  10. Go Design Pattern (6)- Singleton pattern

  11. Go Design Pattern (5)- Class diagram symbolic representation

  12. Go Design Pattern (4)- Code writing optimization

  13. Go Design Pattern (4)- Code writing

  14. Go Design Patterns (3)- Design principles

  15. Go Design Pattern (2)- Object-oriented analysis and design

  16. Go Design Pattern (1)- Syntax

language

  1. No more fear of not getting Gin request data

  2. Understand pprof

  3. Go tool generate

  4. Go singleton implementation scheme

  5. Implementation principle of Go channel

  6. Implementation principle of Go timer

  7. Beego framework use

  8. Golang source BUG tracking

  9. Gin framework concise version

  10. Gin source code analysis

architecture

  1. The paging check pit is designed

  2. Payment access general issues

  3. Current limiting 2

  4. Seconds kill system

  5. Distributed systems and consistency protocols

  6. Service framework and registry for microservices

  7. Discussion on Micro-service

  8. Current limiting implementation 1

  9. CDN request process details

  10. Common Cache tips

  11. How to effectively connect with third-party payment

  12. Algorithm is summarized

storage

  1. MySQL development specification

  2. Redis implements distributed locking

  3. The implementation principle of atomicity, consistency and persistence of transactions

  4. InnoDB locks and transactions

network

  1. HTTP2.0 basics tutorial

  2. HTTPS Configuration Combat

  3. HTTPS Connection Process

  4. TCP Performance Optimization

tool

  1. GoLand Practical skills

  2. Automatically generate go struct from mysql table

  3. Markdown editor recommends – Typora

Reading notes

  1. Selected by MAO

  2. The principle of

  3. History As A Mirror

  4. Agile revolution

  5. How to exercise your memory

  6. Simple Logic – After reading

  7. Hot Wind – After reading

  8. Analects of Confucius – After reading

  9. Sun Tzu’s Art of War – Reflections from reading

thinking

  1. Some thoughts on blogging

  2. The experience of calling 119 at night

  3. Struggle to mobilize all forces for victory

  4. Anti-liberalism

  5. practical

  6. The standard by which you judge yourself

  7. 2020 Blog Summary

  8. Service team holiday shift plan

  9. Project process management

  10. Some thoughts on project management

  11. Some thoughts on product manager

  12. Thinking about programmer career development

  13. Thinking about code review