Database bottleneck

Either an IO bottleneck or a CPU bottleneck eventually drives up the number of active connections to the database, approaching or even reaching the maximum number of active connections the database can support. From the point of view of the business services, few or no database connections are then available, and the consequences follow predictably: concurrency drops, throughput drops, and services crash.

IO bottleneck

  • The first type: disk read IO bottleneck. There is too much hotspot data for the database cache to hold, so every query generates a large number of disk reads and query speed drops -> split databases horizontally and split tables vertically
  • The second type: network IO bottleneck. Requests return too much data and the network bandwidth is insufficient -> split databases

CPU bottleneck

  • The first kind: SQL problems. Queries that contain JOIN, GROUP BY, ORDER BY, or conditions on non-indexed columns increase the CPU work the database must do -> optimize the SQL, build appropriate indexes, and move business calculations into the service layer (a sketch follows this list).
  • The second kind: a single table holds too much data, so queries scan too many rows and run inefficiently, again increasing CPU load -> split the table horizontally.
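
As a sketch of "moving business calculation into the service layer": fetch the raw rows with a plain indexed query and do the GROUP BY work in application code instead of on the database's CPU. OrderRow and the column names here are illustrative, not from the original article.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public final class ServiceSideAggregation {

    public record OrderRow(long userId, long amount) {}

    // replaces: SELECT user_id, SUM(amount) FROM t_order WHERE ... GROUP BY user_id
    public static Map<Long, Long> totalPerUser(List<OrderRow> rows) {
        return rows.stream().collect(
                Collectors.groupingBy(OrderRow::userId,
                        Collectors.summingLong(OrderRow::amount)));
    }
}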

Splitting databases and tables

Horizontal database splitting


  • The structure of each database is the same
  • The data in each database is different, with no intersection
  • Scenario: the system's absolute concurrency has risen, splitting tables alone cannot fundamentally solve the problem, and there is no obvious business boundary along which to split the databases vertically
  • Analysis: with more databases, IO and CPU pressure are naturally relieved by a corresponding multiple (a minimal routing sketch follows this list)
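
A minimal routing sketch for horizontal database splitting, assuming user_id is the sharding key and two physical databases db_0 and db_1 with identical schemas. All names are illustrative.

public final class DatabaseRouter {

    private static final int DB_COUNT = 2;

    /** Maps a sharding key to its database, e.g. userId 7 -> "db_1". */
    public static String route(long userId) {
        return "db_" + (userId % DB_COUNT);
    }
}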

Horizontal table splitting


  • Each table has the same structure
  • The data in each table is different, with no intersection; the union of all tables is the full data set
  • Scenario: the system's absolute concurrency has not risen, but a single table holds so much data that SQL efficiency suffers and CPU load becomes the bottleneck; consider horizontal table splitting
  • Analysis: with less data per table, each SQL statement executes more efficiently, which naturally reduces the CPU burden (a routing sketch follows this list)
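
A minimal sketch of horizontal table splitting by ID range, assuming each sub-table t_order_N holds 10 million rows; hash-modulo routing (as in the database sketch above) works here too. All names and sizes are illustrative.

public final class TableRouter {

    private static final long ROWS_PER_TABLE = 10_000_000L;

    /** orderId 0..9_999_999 -> "t_order_0", 10_000_000.. -> "t_order_1", ... */
    public static String route(long orderId) {
        return "t_order_" + (orderId / ROWS_PER_TABLE);
    }
}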

Vertical database splitting


  • Each database has a different structure
  • The data in each database is also different, with no intersection
  • The union of all databases is the full data set
  • Scenario: the system's absolute concurrency has risen, and independent business modules can be abstracted out
  • Analysis: at this point the system can essentially be split into services. For example, as the business develops, common configuration and dictionary tables multiply; they can be moved into a database of their own, or even behind a dedicated service. Likewise, when the business incubates a new business model, its related tables can be moved into a separate database or service (a sketch of the result follows this list).
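
A sketch of what the result of a vertical database split looks like from the application side: each business module, and the shared configuration/dictionary tables, gets its own database and connection. Hosts, schemas, and credentials are illustrative.

import java.sql.Connection;
import java.sql.DriverManager;

public final class ModuleDataSources {
    public static void main(String[] args) throws Exception {
        try (Connection userDb   = DriverManager.getConnection("jdbc:mysql://host1:3306/user_db", "app", "secret");
             Connection orderDb  = DriverManager.getConnection("jdbc:mysql://host2:3306/order_db", "app", "secret");
             Connection configDb = DriverManager.getConnection("jdbc:mysql://host3:3306/config_db", "app", "secret")) {
            // each module's DAOs use exactly one of these connections;
            // the config database can later be promoted to a standalone service
        }
    }
}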

Vertical table splitting


  • Each table has a different structure.
  • The data in each table is also different; in general the tables share at least one column, usually the primary key, which is used to associate the rows.
  • The union of all tables is the full data set.
  • Scenario: the absolute concurrency is not the problem and the tables do not have many rows, but they have many columns, hot data sits next to cold data, and each row needs a lot of storage. The database cache can then hold fewer rows, queries fall back to disk and generate a large amount of random read IO, and IO becomes the bottleneck.
  • Analysis: think of a list page versus a detail page. The principle of a vertical table split is to put the hot columns (those likely to be queried frequently) together in a primary table and the cold columns together in an extension table. More hot rows then fit in the cache and random read IO drops. After the split, fetching a complete record requires data from both tables, but do not use a JOIN: a join adds CPU overhead and couples the two tables to a single database instance. Instead, associate the data in the service layer: query the primary and extension tables separately and stitch the results together on the shared key (a sketch follows this list).
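
A minimal sketch of the service-layer assembly just described, assuming a product table split into product_main (hot columns) and product_ext (large, rarely-read columns) sharing the same primary key. All table and column names are illustrative.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public final class ProductService {

    private final Connection conn;

    public ProductService(Connection conn) { this.conn = conn; }

    public Product loadProduct(long id) throws Exception {
        Product p = new Product();
        p.id = id;
        // 1) hot data from the primary table (no JOIN)
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT name, price FROM product_main WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) { p.name = rs.getString(1); p.price = rs.getLong(2); }
            }
        }
        // 2) cold data from the extension table, associated by the shared primary key
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT description FROM product_ext WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) { p.description = rs.getString(1); }
            }
        }
        return p;
    }

    public static final class Product { long id; String name; long price; String description; }
}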

Database and table splitting tools

  • Sharding-JDBC (Dangdang)
  • TSharding (Mogujie)
  • Atlas (Qihoo 360)
  • Cobar (Alibaba)
  • MyCAT (based on Cobar)
  • Oceanus (58.com)
  • Vitess (Google)

Evaluate the pros and cons of each tool yourself.

Problems introduced by splitting databases and tables

Splitting databases and tables effectively relieves the performance bottlenecks and pressure of a single machine and a single table, and breaks through the limits on network IO, hardware resources, and connection count. At the same time, it introduces new problems. These problems and their solutions are described below.

Transaction consistency problem

Distributed transaction

When a single update touches data in different databases, it inevitably raises the problem of cross-database transactions. Cross-shard transactions are distributed transactions, and there is no simple solution; they are generally handled with the XA protocol and two-phase commit (the flow is sketched below). Distributed transactions preserve the atomicity of database operations as far as possible, but committing requires coordinating multiple nodes, which pushes back the commit point and lengthens transaction execution time. That in turn raises the probability of conflicts and deadlocks when transactions access shared resources. The trend worsens as database nodes are added, and eventually becomes the shackle that limits horizontal scaling at the database level.
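
A conceptual sketch of the two-phase commit flow behind XA (not a production coordinator, which would also log its decisions for crash recovery): every participant must vote yes in the prepare phase before any participant is told to commit, and locks are held for the whole exchange, which is exactly where the extra latency and deadlock risk come from.

import java.util.List;

public final class TwoPhaseCommit {

    public interface Participant {
        boolean prepare();  // phase 1: persist the intent to commit, then vote
        void commit();      // phase 2a: make the change durable
        void rollback();    // phase 2b: undo the prepared work
    }

    public static boolean execute(List<Participant> participants) {
        for (Participant p : participants) {
            if (!p.prepare()) {                 // one "no" vote aborts everyone
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        participants.forEach(Participant::commit);
        return true;
    }
}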

Final consistency

For systems with high performance requirements but looser consistency requirements, real-time consistency is often unnecessary: as long as consistency is restored within an acceptable window, transaction compensation can be used instead. Unlike rolling a transaction back immediately when it fails, compensation is an after-the-fact check and repair. Common implementations include data reconciliation, log-based comparison, and periodic synchronization against a standard data source (a reconciliation sketch follows).
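
A sketch of after-the-fact compensation via reconciliation: compare the two sides of a business flow and issue a compensating write when they disagree. OrderStore, LedgerStore, and their methods are illustrative, not from the original article.

public final class Reconciler {

    public interface OrderStore  { long amount(long orderId); }
    public interface LedgerStore { long amount(long orderId); void correct(long orderId, long delta); }

    public static void reconcile(long orderId, OrderStore orders, LedgerStore ledger) {
        long billed = orders.amount(orderId);
        long booked = ledger.amount(orderId);
        if (billed != booked) {
            // a compensating write instead of a rollback: converge to the source of truth
            ledger.correct(orderId, billed - booked);
        }
    }
}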

Cross-node JOIN queries

Before splitting, many list and detail queries in a system can be completed with a single JOIN. After splitting, the data may be distributed across different nodes, and JOINs become quite troublesome. Some solutions:

Global table

Global tables, also called “data dictionary tables”, are tables that every module in the system may depend on. To avoid cross-database JOIN queries, keep a copy of these tables in every database. Since such data is rarely modified, consistency is seldom a concern.

Field redundancy

This is a typical denormalized design: trade space for time and avoid JOIN queries for the sake of performance. For example, when the order table stores the userId, also store a redundant copy of the userName; then a query for order details can read the user name directly from the order table, without querying the buyer table. The method only fits limited scenarios, however: it works best for a small number of dependent fields, and the consistency of redundant fields is hard to guarantee.

Data assembly

Split the query in two at the business service layer. The first query's result set yields the IDs of the associated records; a second request then fetches the associated records by those IDs, and the fields are assembled in memory. This is a common method (a sketch follows).
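
A sketch of the two-step assembly pattern: the first query returns orders, the second batch-loads the users they reference (e.g. with a WHERE id IN (...) query), and the two result sets are joined in memory. OrderDao and UserDao are illustrative interfaces.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public final class OrderAssembler {

    public List<OrderView> listOrders(OrderDao orders, UserDao users) {
        List<Order> page = orders.latest(20);                       // query 1
        Set<Long> userIds = page.stream().map(o -> o.userId).collect(Collectors.toSet());
        Map<Long, String> names = users.namesByIds(userIds);        // query 2: WHERE id IN (...)
        return page.stream()
                .map(o -> new OrderView(o.id, names.get(o.userId))) // in-memory join
                .collect(Collectors.toList());
    }

    public interface OrderDao { List<Order> latest(int n); }
    public interface UserDao  { Map<Long, String> namesByIds(Set<Long> ids); }
    public static final class Order { long id; long userId; }
    public static final class OrderView {
        final long orderId; final String userName;
        OrderView(long orderId, String userName) { this.orderId = orderId; this.userName = userName; }
    }
}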

ER shard

In a relational database, if the relationships between tables (such as the order table and the order-detail table) are known in advance, records that reference each other can be stored on the same shard, so cross-shard joins are largely avoided and the join can run within a single shard. For 1:1 or 1:n relationships, this is usually done by sharding on the primary key of the parent table (a sketch follows).
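
A minimal sketch of ER sharding: the child table reuses the parent's sharding key, so an order and its detail rows always land on the same shard and can be joined locally. Shard count and names are illustrative.

public final class ErSharding {

    private static final int SHARDS = 4;

    static int shardOf(long orderId) { return (int) (orderId % SHARDS); }

    // both tables are routed by order_id, never by their own primary keys
    static String orderTable(long orderId)       { return "db_" + shardOf(orderId) + ".t_order"; }
    static String orderDetailTable(long orderId) { return "db_" + shardOf(orderId) + ".t_order_detail"; }
}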

Cross-node paging, sorting, and function problems

Queries that span multiple databases run into problems with LIMIT paging and ORDER BY sorting. Paging has to sort by the specified field: when the sort field is the sharding field, the sharding rule makes it easy to locate the target shard; when it is not, things get more complicated. Each shard node must first sort and return its own data, then the result sets returned by the different shards are merged, sorted again, and finally returned to the user (the merge step is sketched below).
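
A sketch of merging a cross-shard "ORDER BY sort_key LIMIT offset, size": each shard has already returned its own first offset+size rows sorted by sort_key; here they are re-sorted globally and the requested page is cut out. Note why deep pages get expensive: every shard must return offset+size rows. Row is an illustrative type.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public final class ShardedPager {

    public static final class Row {
        final long sortKey;
        Row(long sortKey) { this.sortKey = sortKey; }
    }

    public static List<Row> page(List<List<Row>> perShardResults, int offset, int size) {
        PriorityQueue<Row> heap =
                new PriorityQueue<Row>(Comparator.comparingLong((Row r) -> r.sortKey));
        perShardResults.forEach(heap::addAll);          // global re-sort of all candidates
        List<Row> page = new ArrayList<>(size);
        for (int i = 0; i < offset + size && !heap.isEmpty(); i++) {
            Row r = heap.poll();
            if (i >= offset) page.add(r);               // skip the global offset, keep the page
        }
        return page;
    }
}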


Global primary key duplication problem

In a split database and table environment, the data of one logical table lives in several databases at once, so the usual auto-increment primary key is useless: an ID generated by one shard's database cannot be guaranteed globally unique. A global primary key therefore needs to be designed separately to avoid duplicate keys across databases. Some strategies:

UUID

A standard UUID contains 32 hexadecimal digits in five groups of the form 8-4-4-4-12, 36 characters in all. UUIDs are the simplest solution: they are generated locally, perform well, and cost no network round-trip. But the disadvantages are just as obvious: they take up a lot of storage, and as primary keys they create performance problems for index creation and index-based queries, especially under the InnoDB engine.
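
For illustration, a locally generated UUID: no coordination, no network round-trip, but the value is a 36-character string with no ordering, which is what makes it a poor InnoDB primary key (inserts land at random positions in the B+ tree instead of appending to its right edge).

import java.util.UUID;

public final class UuidDemo {
    public static void main(String[] args) {
        System.out.println(UUID.randomUUID()); // e.g. 550e8400-e29b-41d4-a716-446655440000
    }
}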

Maintain a primary-key ID table in the database

Create a sequence table in the database:

CREATE TABLE `sequence` (  
  `id` bigint(20) unsigned NOT NULL auto_increment,  
  `stub` char(1) NOT NULL default ' ',  
  PRIMARY KEY  (`id`),  
  UNIQUE KEY `stub` (`stub`)  
) ENGINE=MyISAM;

The stub column carries a unique index, so a given stub value has only one row in the sequence table; one sequence table can issue global IDs for several business tables at the same time. The MyISAM engine is used instead of InnoDB for higher performance: MyISAM takes table-level locks, so reads and writes against the table are serialized, and there is no danger of two concurrent requests reading the same ID. When a globally unique ID is needed, execute:

REPLACE INTO sequence (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

This scheme is fairly simple, but its drawbacks are obvious: it is a single point with a hard dependency on the database, and when the database fails, the whole system cannot generate IDs. Configuring a primary and a standby improves availability, but the performance ceiling remains the read/write throughput of a single MySQL server.

A related strategy resolves both the single point and the performance bottleneck. The idea is to run two or more ID-generation servers, each hosting a single database with its own sequence table. The sequence in each table grows by a step equal to the number of servers, with staggered starting values, so the generated IDs interleave across the databases; with two servers, for example, one issues 1, 3, 5, … and the other 2, 4, 6, … (see the sketch below).
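
A sketch of configuring one of two such ID servers. MySQL's standard auto_increment_increment and auto_increment_offset variables implement the "step = number of servers, staggered start" rule; they would normally be set in my.cnf, but are set per-session here for brevity. The URL and credentials are illustrative.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public final class IdServerSetup {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection("jdbc:mysql://id-server-1:3306/ids", "app", "secret");
             Statement st = c.createStatement()) {
            st.execute("SET @@auto_increment_increment = 2"); // step = number of ID servers
            st.execute("SET @@auto_increment_offset = 1");    // server 1 -> 1, 3, 5, ...; server 2 uses offset 2
        }
    }
}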

Snowflake distributed ID algorithm

  • The first bit is unused (the sign bit)
  • The next 41 bits hold a millisecond timestamp; 41 bits can represent about 69 years
  • Then 5 bits of datacenterId and 5 bits of workerId; these 10 bits support up to 1024 nodes
  • The last 12 bits are a per-millisecond sequence number; 12 bits support 4096 ID sequences per node per millisecond (a minimal implementation follows this list)
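
A minimal sketch of a generator following this bit layout. The custom epoch, the spin-wait when a millisecond's sequence is exhausted, and the exception on clock rollback are common conventions, not part of any single canonical implementation.

public class SnowflakeIdGenerator {

    private static final long EPOCH = 1288834974657L; // custom epoch in ms; illustrative
    private static final long DATACENTER_BITS = 5L;
    private static final long WORKER_BITS = 5L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long datacenterId; // 0..31
    private final long workerId;     // 0..31
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    public SnowflakeIdGenerator(long datacenterId, long workerId) {
        this.datacenterId = datacenterId;
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards; refusing to generate IDs");
        }
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;   // up to 4096 IDs per ms per node
            if (sequence == 0) {                        // exhausted: spin until the next ms
                while ((ts = System.currentTimeMillis()) <= lastTimestamp) { /* busy wait */ }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = ts;
        return ((ts - EPOCH) << (DATACENTER_BITS + WORKER_BITS + SEQUENCE_BITS)) // 41-bit timestamp
                | (datacenterId << (WORKER_BITS + SEQUENCE_BITS))                // 5 bits
                | (workerId << SEQUENCE_BITS)                                    // 5 bits
                | sequence;                                                      // 12 bits
    }
}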

Data migration and capacity expansion problems

When the business develops quickly and runs into performance and storage bottlenecks, sharding comes onto the agenda, and the migration of historical data must be considered. The general approach is to read the historical data and then write it to each shard node according to the chosen sharding rules. In addition, capacity should be planned from the current data volume and QPS together with the expected pace of business growth, to estimate the approximate number of shards needed (a common recommendation is that a single table on a single shard stay under 10 million rows).
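A worked example with illustrative numbers: if a table is expected to reach 500 million rows within the planning horizon and a single sub-table should stay under 10 million rows, then at least 500,000,000 / 10,000,000 = 50 sub-tables are needed, typically rounded up to the next power of two (64) to keep hash routing simple.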

When to consider splitting databases and tables

Avoid splitting if at all possible

Not every table needs to be split; it depends on the growth rate of its data. Splitting increases the complexity of the business to some degree, so do not split databases and tables until it is absolutely necessary, and avoid “over-design” and “premature optimization”. Before splitting, exhaust the cheaper optimizations first: upgrade the hardware, upgrade the network, separate reads from writes, optimize indexes, and so on. Only when the data volume reaches the single-table bottleneck should table splitting be considered.

Routine maintenance on very large tables affects service access

Maintenance here refers to:

  • Database backups: if a single table is too large, a backup consumes a large amount of disk IO and network IO
  • DDL on a large table: MySQL locks the whole table for a long time, during which services cannot access the table
  • Large tables that are accessed and updated frequently are more likely to suffer lock waits

As the business evolves, some fields need to be vertically split

No example is needed here. In real business scenarios, fields that are rarely accessed or rarely updated should be split out of the larger table.

The amount of data is growing rapidly

As the business develops quickly, the amount of data in a single table keeps growing. When performance reaches a bottleneck, consider horizontal sharding and split the databases and tables.

