MySQL database and partition

preface

Relational database is easy to become a system bottleneck, single storage capacity, connection number, processing capacity are limited, when the amount of data and concurrency up, it must be on the database segmentation.

The method of data sharding is to divide database and table. There are two aspects to the sub-table, which may be optical sub-table or optical sub-table.

The core content of database distribution is nothing more than data segmentation, and the positioning and integration of data after segmentation.

Why separate database and table

table

A large amount of data in a single table seriously affects SQL performance. When the general single table reaches millions, the performance will be relatively poor, then the score table.

Split table is to put the data of one table into multiple tables, and then look up one table when querying. For example, divide the table by project ID: put a fixed number of project data into a table, so that the amount of data in each table can be controlled within a manageable range.

depots

As a rule of thumb, a library needs to be expanded up to 2000 concurrency, and a healthy single-library concurrency value should be kept around 1000. Then you can split the data from one library into multiple libraries and access only one library.

This is called a separate list. Why do we have a separate list?

  • Improve concurrency support
  • Reducing disk Usage
  • Improves SQL execution performance

How to divide database and table

Look directly at the picture:

For vertical split, it is recommended to do a good table design at the beginning of the system design, to avoid vertical split table.

Horizontal splits can be divided by range or by a field hash. According to the range to divide, the advantage is easy to expand, prepare a new table or library can be. However, hot-spot problems may occur easily. Therefore, you need to use it based on service scenarios. The advantage of hash partitioning is that the request load of each library or table can be evenly distributed. The disadvantage is that the capacity expansion is troublesome, and the previous data needs to be rehash, and there is a process of data migration.

Problems brought by separate database and separate table

It can effectively relieve the pressure of network IO, hardware resources and connection number brought by single machine and single library. But it also brings some problems.

  • Transaction consistency problem

    This can be resolved by distributed transactions or by ensuring ultimate consistency.

  • Query the join problem associated across nodes

    Global tables, field redundancy, data assembly, ER sharding

  • Cross-node paging, sorting, aggregation function issues

    Firstly, queries are performed on different shard nodes, and finally the results are summarized or merged

  • Global primary key avoidance problem

    Various distributed ID generation algorithms

  • Data migration and capacity expansion problems

    For range sharding, you only need to add nodes to expand the capacity.

    If hash is used, historical data is read first, and then data is written to each sharding node according to the specified sharding rules.

Data migration

This section describes two data migration schemes.

One of the lowest scheme, is the system downtime for a period of time, with the realization of written derivative data tools run a single table of data out of the single, write sub-table inside.

The second solution sounds more plausible, the double-write migration solution. In the online system, where all the data has been written before, add, delete and change operations, except for the old library add, delete and change, plus the new library add, delete and change, this is called double write. Then after the system deployment, the derivative data tool in the scheme 1 run, read the old library to write the new library. When writing, use gmT_modified to determine when the data was last modified, unless it is not read in the new library, or is newer than the new library data. Simply put, it is not allowed to overwrite new data with old data.

After writing a round, it is possible that there are still inconsistencies, so the program will automatically check a new round, compare each table of the old and new library data, and then if there are differences, for those differences, read data from the old library and write again. Repeat the loop until the data are identical.

The middleware

The more common middleware of sub-library sub-table are:

  • Cobar: Alibaba B2B team developed and open source, is a proxy layer solution, between application server and database server. The application accesses the Cobar cluster through the JDBC driver. Cobar decompresses the SQL according to the SQL and repository rules, and then distributes the SQL to different database instances in the MySQL cluster for execution. Operations such as read/write separation, stored procedures, cross-library joins, and paging are not supported. It hasn’t been updated in recent years, and nobody uses it anymore.
  • TDDL: Developed by Taobao team and belongs to client layer scheme. Supports basic CRUD syntax and read-write separation, but does not support join, multi-table query, and other syntax. At present, it is not used much, because it still relies on Taobao’s Diamond configuration management system.
  • Atlas: 360 open source, belongs to the proxy layer scheme. Also did not maintain for several years, now use the company basic also very few.
  • Sharding- JDBC: Dangdang open source, belongs to the client layer scheme, has been renamed ShardingSphere. SQL syntax support is also more, without too many restrictions, support sub-database sub-table, read and write separation, distributed ID generation, flexible transactions (maximum effort service type transactions, TCC transactions). And use more companies, active community.
  • Mycat: A proxy layer scheme based on Cobar transformation. The functionality supported is very complete. It’s a little younger than Sharding-JDBC.

In summary, sharing-JDBC and Mycat are now considered.

Sharding-jdbc client layer scheme has the advantages of no deployment, low operation and maintenance cost, no need for secondary forwarding of the proxy layer, and high performance. The disadvantage is coupling.

The disadvantage of proxy layer scheme like Mycat is that it needs to deploy and operate and maintain a set of middleware by itself, so the operation and maintenance cost is high. However, the advantage is that it is transparent to the project.

MySQL partition (not recommended)

Partitioning is introduced here to prevent confusion with concepts such as shards, libraries and tables.

MySQL supports partition from version 5.1. Partitioning refers to the breaking up of a table by a database into smaller, more manageable parts according to certain rules. In the case of an application accessing a database, there is logically only one table or index, but in reality this table may consist of multiple physical partitions, that is, transparent to the application.

MySQL partitioning introduces the concept of partition keys, which take a divide-and-conquer method to facilitate the management of very large tables. Partitioning keys are used to perform aggregation of data based on an interval value, a specific list of values, or a HASH function, allowing data to be distributed in different partitions according to rules. The following six partition types are available in MySQL 5.7:

  • RANGE partition: Divide data into different partitions based on a given contiguous RANGE.
  • LIST partition: similar to RANGE partition, except that LIST partition is based on an enumerated LIST of values, while RANGE partition is based on a given continuous RANGE.
  • COLUMNS: Similar to RANGE and LIST, except that the partitioning key can be multiple COLUMNS or non-integers.
  • HASH partition: Modulates data to different partitions based on the given number of partitions.
  • KEY partition: Similar to HASH partition, but using the HASH function provided by MySQL.
  • Subpartition: Also known as composite partition or composite partition, that is, create a layer of partitions under the primary partition to split the data again.

Here’s an example of a LIST partition:

CREATE TABLE orders_list (
  id INT AUTO_INCREMENT,
  customer_surname VARCHAR(30),
  store_id INT,
  salesperson_id INT,
  order_date DATE,
  note VARCHAR(500),
  INDEX idx (id))ENGINE = INNODB
  PARTITION BY LIST(store_id) (
  PARTITION p1
  VALUES IN (1.3.4.17)
  INDEX DIRECTORY = '/var/orders/district1'
  DATA DIRECTORY = '/var/orders/district1'.PARTITION p2
  VALUES IN (2.12.14)
  INDEX DIRECTORY = '/var/orders/district2'
  DATA DIRECTORY = '/var/orders/district2'.PARTITION p3
  VALUES IN (6.8.20)
  INDEX DIRECTORY = '/var/orders/district3'
  DATA DIRECTORY = '/var/orders/district3'.PARTITION p4
  VALUES IN (5.7.9.11.16)
  INDEX DIRECTORY = '/var/orders/district4'
  DATA DIRECTORY = '/var/orders/district4'.PARTITION p5
  VALUES IN (10.13.15.18)
  INDEX DIRECTORY = '/var/orders/district5'
  DATA DIRECTORY = '/var/orders/district5'
);
Copy the code

Advantages of partitioning:

  • Expand the storage capacity.
  • Optimize the query. When partitioning conditions are included in the WHERE clause, only necessary partitions can be scanned to improve query efficiency. Colleagues can process queries involving aggregate functions such as SUM() and COUNT() in parallel on each partition.
  • For data partitions that have expired or do not need to be saved, you can delete partitions to quickly delete data.
  • Spread the query data across multiple disks for greater query throughput.

Partitioning practices: juejin.cn/post/684490…