Basic concepts

1. A single database is just one database.

2. Sharding solves the problem of scalability. It introduces the concepts of data routing and sharding keys. Splitting tables addresses large data volume, and splitting databases addresses database performance bottlenecks.

3. Grouping solves availability problems. Groups are usually implemented through replication. (The schemes for the various availability levels are presented separately.)

4. At large data volumes, the database architecture that Internet companies actually use is sharding plus grouping (as shown below).

Introduction to data sharding and its issues

Data sharding is the process of dividing the data stored in a single database across multiple databases or tables along some dimension, in order to relieve performance bottlenecks and improve availability. It comes in two forms: vertical sharding and horizontal sharding.

Vertical sharding

Vertical sharding splits data by service. The core idea is a dedicated database per service: tables are classified by the service they belong to, and tables that used to share one database are stored in different databases, which reduces the access pressure on any single database.

Advantages: It relieves the pressure of data volume and traffic to a certain extent.

Problems: Vertical sharding often requires architectural and design adjustments, which makes it hard to respond to business requirements in a timely manner. It also does not completely resolve the single-point problem caused by data hotspots.
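
As a rough illustration of "different tables go to different databases according to service", the routing can be sketched as a lookup table. The table and database names below are hypothetical and chosen only for the example.

```python
# Hypothetical mapping from table name to the database that owns it.
# Each service (user, order, product) gets its own dedicated database.
TABLE_TO_DATABASE = {
    "user": "user_db",
    "user_address": "user_db",
    "order": "order_db",
    "order_item": "order_db",
    "product": "product_db",
}


def database_for(table: str) -> str:
    """Return the name of the database that stores the given table."""
    try:
        return TABLE_TO_DATABASE[table]
    except KeyError:
        raise ValueError(f"no database configured for table {table!r}") from None
```

Each table still lives in exactly one database, which is why a hot table can remain a bottleneck after a vertical split.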

Horizontal sharding

Horizontal sharding splits the data of a single table. The split is no longer based on business logic. Instead, a rule applied to one or more fields distributes the rows across different tables or databases, and each shard stores only part of the data. Sharding by primary key is common. For example, rows with an odd key go to database 0 and rows with an even key go to database 1.
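
The odd/even rule can be sketched in a few lines. The function names and the `id` field are assumptions made for this example; a real system would apply the rule in its data access layer or middleware.

```python
def shard_for(primary_key: int) -> int:
    """Apply the rule from the text: odd keys -> database 0, even keys -> database 1."""
    return 0 if primary_key % 2 == 1 else 1


def split_rows(rows):
    """Group rows (dicts with an integer "id") by the database they belong to."""
    shards = {0: [], 1: []}
    for row in rows:
        shards[shard_for(row["id"])].append(row)
    return shards


rows = [{"id": i} for i in range(1, 5)]
shards = split_rows(rows)
# shards[0] holds ids 1 and 3; shards[1] holds ids 2 and 4
```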

Advantages: In theory, horizontal sharding breaks through the data-processing bottleneck of a single machine and can scale out almost without limit. It is the ultimate solution to limits on data volume and access load.

Disadvantages:

Database operations become heavy and locating data becomes cumbersome.

Operations that span shards (grouping, aggregation, paging, sorting, and so on) require special handling.

Table names. Because sharding partitions the whole data set into multiple pieces, the concepts of logical tables and real tables arise. A real table is a table that actually stores data. A logical table is the original table in the single database, that is, the sum of its real tables. For example, if the real tables are order_0 and order_1, the logical table is the order table, which represents the order table as it was before sharding.

Distributed transactions and cross-database joins (queries).

Splitting tables. If tables need to be added dynamically and a hash rule is used, rehashing forces data migration.
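
To make the logical/real table mapping and the rehash problem concrete, here is a minimal sketch that assumes a plain modulo rule (`key % table_count`). The rule and the function names are assumptions for illustration, not the behavior of any particular middleware.

```python
def real_table(logical_table: str, key: int, table_count: int) -> str:
    """Map a logical table and a sharding key to the real table that stores the row."""
    return f"{logical_table}_{key % table_count}"


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose real table changes when the table count changes."""
    return [k for k in keys if k % old_count != k % new_count]


# With two real tables, key 7 of the logical table "order" lives in order_1.
table = real_table("order", 7, 2)

# Growing from 2 to 3 tables: 8 of the 12 keys 0..11 must be migrated.
moved = moved_keys(range(12), 2, 3)
```

Under this rule most rows change tables when a third table is added, which is the data migration that the text attributes to rehashing.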

The relationship between sharding and splitting databases and tables

Vertical and horizontal sharding are distributed database architecture schemes. Splitting databases and splitting tables are the ways to implement them. A database can be split vertically or horizontally, and a table can likewise be split vertically or horizontally.

Read/write separation

A master/slave (primary/secondary) setup is a simple solution to the database's single-point problem: requests are split by whether they read or write, which reduces the access pressure on any single database.

Application scenario: For a system with many reads and few writes, update operations and query operations are sent to different databases. This keeps queries from being blocked by the row locks that updates take and improves the query performance of the whole system.

One master, multiple slaves: Distributes query requests across the slaves to further improve query performance. Problem: the master is still a single point.

Multiple masters, multiple slaves: Improves system throughput and availability. Problem: data inconsistency between the databases. Primary key consistency is one example. If AUTO_INCREMENT is used as the table's primary key and databases A and B synchronize in both directions, each can assign the same key to a different row, and a primary key conflict may occur when one database is synchronized to the other.
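
The routing behind read/write separation can be sketched as follows. This is a simplified illustration: the class name is hypothetical, data sources are represented by plain strings, and a statement is treated as a read only if it starts with SELECT.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves in turn."""

    def __init__(self, master: str, slaves: list):
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin

    def route(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("master", ["slave1", "slave2"])
first = router.route("SELECT * FROM t_order")        # slave1
second = router.route("SELECT * FROM t_order")       # slave2
write = router.route("UPDATE t_order SET status=1")  # master
```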

Data consistency between primary and secondary databases

Principle: The master database handles writes and updates; the slave database handles reads. When synchronization between the master and the slave is delayed, consistency between the two becomes a problem.

Replication path: SQL write on the master -> binlog -> I/O thread -> slave.

Cause of the delay: The slave replays the SQL it reads from the master's binlog serially, while the master executes SQL in parallel.

Solutions:

1. Reduce the read pressure on the slave. Methods: split the database, upgrade the slave machine's configuration, add more slave machines (one master, many slaves), or add a cache at the service layer.

2. Connect directly to the master. Taken to the extreme, this makes the master/slave setup meaningless. ShardingSphere reads from the master directly when a write has been performed on the same thread and database connection, to avoid data inconsistency.
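
The idea of reading from the master once a write has happened on the same connection can be sketched like this. It is an illustration of the idea only, not ShardingSphere's code or API; the class name and the string data sources are hypothetical.

```python
class RoutedConnection:
    """Route reads to the slave until this connection performs a write.

    After the first write, every later read on the same connection goes to
    the master, so the connection always sees its own changes.
    """

    def __init__(self, master: str, slave: str):
        self.master = master
        self.slave = slave
        self._has_written = False

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read:
            self._has_written = True
            return self.master
        return self.master if self._has_written else self.slave


conn = RoutedConnection("master", "slave")
before = conn.route("SELECT * FROM t_order WHERE id = 1")  # slave
conn.route("UPDATE t_order SET status = 2 WHERE id = 1")   # master
after = conn.route("SELECT * FROM t_order WHERE id = 1")   # master
```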

Data delay under read/write separation

The main processing methods are as follows:

Ignore it: For applications that write little and read a lot, check whether the business tolerates some degree of data inconsistency. An industry such as bidding, for example, does not seem to need strong consistency. In practice, technology selection always trades accuracy against convenience.

Force reads from the master: When the business requires highly real-time data, the fundamental fix is to read from the master, because the master's data is never delayed. Other data is read from the slave.

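Selectively read from the master: Use a cache to record the database, table, and primary key of each write, with an expiry equal to the master/slave synchronization delay. A read whose key is still in the cache goes to the master; any other read goes to the slave.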
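
A minimal sketch of selective master reads, assuming an in-process dictionary stands in for the cache and that the synchronization delay is known in advance. The class and method names are hypothetical; a real deployment would use a cache shared by all application nodes.

```python
import time


class RecentWrites:
    """Remember recently written rows for the length of the sync delay."""

    def __init__(self, sync_delay_seconds: float, clock=time.monotonic):
        self._delay = sync_delay_seconds
        self._clock = clock
        self._expires = {}  # (database, table, primary_key) -> expiry time

    def record_write(self, database: str, table: str, primary_key) -> None:
        key = (database, table, primary_key)
        self._expires[key] = self._clock() + self._delay

    def read_target(self, database: str, table: str, primary_key) -> str:
        """Return "master" if the row was written within the delay, else "slave"."""
        key = (database, table, primary_key)
        expires = self._expires.get(key)
        if expires is not None and self._clock() < expires:
            return "master"
        self._expires.pop(key, None)  # drop the expired record, if any
        return "slave"
```

Only rows written within the last delay window are read from the master, so the slaves still serve most reads.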

Add a centralized cache: Before data is stored in the database, it is first written to the cache. For example, when the PC client saves an order, the mobile client can get the corresponding order data from the cache when it logs in. This approach has hot-data issues.
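
The centralized-cache approach can be sketched with dictionaries standing in for the cache, the master, and a slave that has not yet caught up. All names are hypothetical and replication is not modeled.

```python
class OrderStore:
    """Write to the cache first, then to the master; read from the cache first."""

    def __init__(self):
        self.cache = {}   # stands in for the centralized cache
        self.master = {}  # stands in for the master database
        self.slave = {}   # stands in for a slave that lags behind

    def save(self, order_id, order) -> None:
        self.cache[order_id] = order   # cache is written before the database
        self.master[order_id] = order  # the slave receives the row only later

    def load(self, order_id):
        if order_id in self.cache:
            return self.cache[order_id]
        return self.slave.get(order_id)


store = OrderStore()
store.save(1001, {"status": "paid"})  # e.g. saved from the PC client
order = store.load(1001)              # e.g. read from the mobile client
```

Here `load` returns the order even though the slave does not have it yet, because the read is served from the cache.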