Background

When data volume grows to the point where it hurts database read and write performance, the usual options are sharding (splitting databases and tables) or a NewSQL solution such as TiDB. So why use TiDB? When should TiDB be used? What problems does it solve compared with traditional sharding? This article also covers some key points and pitfalls, and uses a fairly colloquial explanation as an overview of TiDB.

More Internet scenario and architecture design write-ups are shared on my Juejin profile: Underground Collection Kelvin juejin.cn/user/104639…

The current pain points

At present, whether you use a hand-rolled JDBC + ThreadLocal scheme, a middleware proxy, or an SDK embedded in application code (Sharding-JDBC, Zdal, MyCAT, and so on), sharding databases and tables brings the following problems.

  1. Choosing a sharding algorithm and scheme.
  2. Ongoing maintenance after the databases and tables are split: every time a node is added, disks and machines have to be requisitioned.
  3. Adding a new node requires stopping the service and then migrating data, which directly affects online users' reads and writes. A failed migration means rolling back the code, and before migrating you have to wait until MySQL produces no more binlog.
  4. Cross-database consistency after sharding: eventual consistency is usually used, and the code is tedious to maintain.
  5. Storage pressure and data volume are still concentrated on individual nodes.
  6. Index query efficiency: even after splitting, an index query still has to hit multiple databases and aggregate the results.
  7. Cross-database LEFT JOIN with other tables is not supported.
  8. Unique indexes cannot be guaranteed unique across databases and tables, for example for payment sequence numbers; at present the table's unique key has to be enforced by application-layer code.
  9. Adding a column is troublesome: the change has to be applied to every table in every database.
  10. Feeding Elasticsearch requires every database to push data to the Elasticsearch node.
  11. Paginated queries across databases and tables cannot really be done.

Purpose

To analyze how TiDB addresses these pain points.

Overall architecture of TiDB

TiDB is a distributed database. In form it is somewhat like Hadoop: data is distributed across different machines, with replicas, and there are machines that do computation and machines that do storage.

The entry layer is tidb-server ("TiDB" in the diagram), which clients connect to; it handles request processing. This layer needs little storage but a lot of CPU, because it does the computation, and it keeps track of which key range each Region is responsible for. The second layer is PD, which is responsible for scheduling, somewhat like ZooKeeper: it drives data migration and election scheduling. The third layer is TiKV (the store), which is responsible for actually storing the data. Each TiKV holds one or more Regions; a Region is the smallest storage unit, much like a region in the JVM G1 algorithm. Regions are scattered and distributed across the TiKV nodes.

Data storage model

  1. Row data (the actual rows of a table). A table's rows are stored in one or more Regions, and different tables live in different Regions, rather than every database holding the same tables as in traditional sharding. How is it decided which Region stores a given row of a table? First, a Region contains a Map. The Map key is built from the table_id (table ID) and the row_id (primary key). For example:

t[table_id]_r[row_id]

The value of the map is the actual data for each row in the table.

  2. Index data. Index entries are stored in other Regions, and each index has Regions corresponding to it. The Map key is encoded from the table_id, index_id, and the value of the indexed column. For example:

t[table_id]_i[index_id][index_value]

The value is the row_id, so the row_id can then be used to locate the row data described above. In other words, a query that hits a secondary index first resolves to the primary key (row_id) and then fetches the whole row by primary key.
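
To make this concrete, here is a minimal SQL sketch. The table, the table_id of 10, and the index_id of 1 are all invented for illustration; TiDB assigns the real IDs internally, and the keys in the comments are only a conceptual rendering of the t[table_id]_r[row_id] and t[table_id]_i[index_id][index_value] formats above.

CREATE TABLE users (
    id   BIGINT PRIMARY KEY,   -- used as the row_id
    name VARCHAR(64),
    code VARCHAR(32),
    KEY idx_code (code)        -- assume this index gets index_id = 1
);

INSERT INTO users VALUES (1, 'Tom', 'AAA');

-- Row data, stored in a row-data Region:
--   Key:   t10_r1            i.e. t[table_id]_r[row_id]
--   Value: ('Tom', 'AAA')    the encoded non-key columns
-- Index entry, stored in an index Region:
--   Key:   t10_i1_'AAA'      i.e. t[table_id]_i[index_id][index_value]
--   Value: 1                 the row_id, used to look the row back up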

Data segmentation

Given a Key, TiDB can work out which Region the data lives in. This is different from consistent hashing on a chosen field: the Regions that hold the actual row data are divided by primary key ranges, and if there is an index, the Regions that hold that index are divided by ranges of the index field. The Key maps to a value, and the key space is divided into multiple ranges; each range is managed by one Region.

For example, a table's primary key (row_id) values might fall into three Regions by range: [0, 1000), [1000, 2000), [2000, 3000).

These range rules need to be determined before data is loaded into the table.
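
As a sketch of setting such ranges up front, TiDB (since the 3.0 series; check your version) offers a SPLIT TABLE statement to pre-split Regions before loading data. The users table, the idx_code index, and the bounds below are just the illustrative values used earlier.

-- Pre-split the row-data Regions of the hypothetical users table into three ranges:
SPLIT TABLE users BETWEEN (0) AND (3000) REGIONS 3;

-- The secondary index can be pre-split the same way:
SPLIT TABLE users INDEX idx_code BETWEEN ("AAA") AND ("ZZZ") REGIONS 3;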

Since Regions are distributed across all TiKV nodes, multiple servers share the data, so the CPUs and disks of many machines are used. This addresses the storage pressure of pain point 5 and the sharding algorithm question of pain point 1: only the primary key ranges need to be decided.

We’ll talk about how to expand later.

Improve indexing efficiency

Existing problem: with traditional sharding, the indexes live inside each MySQL instance, following the tables. Sharding rules are usually based on a userID field or another frequently queried field, so rows with the same userID land in the same database or table.

In that case, if the table also has an index on a code column, rows with code = "AAA" may sit in different databases because their userIDs differ, so the query has to hit every database and table and then re-aggregate the results, which adds CPU query cost as well as TCP connection handshake cost.

TiDB's solution: the Key of its data structure is built from the table ID + index ID + index value, and the Value is the row_id (the primary key of the data row). Because a Region manages a contiguous range of Keys, all entries with the same index value end up in the same Region. That makes it easy to quickly locate the Region holding a given index value, obtain the corresponding row_ids, and then fetch the real rows from the Regions that store the table data. This mirrors MySQL's index lookup mechanism of finding the index value first and then going back to the clustered primary key to return the whole row, and it improves performance.
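
As a small illustration of the difference, reusing the hypothetical users table and IDs from the earlier sketch:

-- Under userID-based sharding, this query fans out to every shard (code is not the
-- sharding key) and the application has to merge the result sets.
-- In TiDB the same statement is a two-step KV lookup:
--   1. find the index Region owning key t10_i1_'AAA' and read the row_id (1)
--   2. find the row-data Region owning key t10_r1 and read the full row
SELECT * FROM users WHERE code = 'AAA';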

This is a bit like Elasticsearch's inverted index, where data is located by value first. This addresses pain point 6 by reducing index query pressure.

TiDB features

1. Optimistic and pessimistic transaction models

Before version 3.0.8 there was only the optimistic transaction model, committed with two-phase commit (2PC). With the pessimistic transaction model enabled, the behavior is more like Sharding-JDBC's flexible transactions: there is retry capability, but a transaction that still fails after many retries (256) is lost.
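
A minimal sketch of choosing the transaction model, assuming a version where pessimistic transactions are available (3.0.8 or later); verify the variable name and defaults on your own version.

-- Make pessimistic locking the default for new transactions:
SET GLOBAL tidb_txn_mode = 'pessimistic';

-- Or choose the model per transaction:
BEGIN PESSIMISTIC;  -- conflicts are detected while each statement runs, like MySQL/InnoDB
-- ... DML ...
COMMIT;

BEGIN OPTIMISTIC;   -- conflicts are detected at commit time via two-phase commit
-- ... DML ...
COMMIT;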

1.1 Analysis of advantages and disadvantages

TiDB transactions have the following advantages:

  • The implementation principle is simple and easy to understand.
  • Cross-node transactions are built on top of single-instance transactions.
  • Lock management is decentralized.

However, TiDB transactions also have the following disadvantages:

  • Two-phase commit increases the number of network round trips.
  • A centralized version management service is required.
  • A transaction that writes a large amount of data can cause memory to balloon.

1.2 Transaction retry

With the optimistic transaction model, commits easily fail in scenarios with high conflict rates. MySQL, by contrast, uses a pessimistic model internally and detects conflicts while SQL statements execute, so commit failures are rare. To accommodate MySQL's pessimistic behavior, TiDB provides a retry mechanism; the strengthened version of this is the pessimistic transaction model.
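
The automatic retry of optimistic transactions is controlled by a couple of system variables; the names below exist on the 3.0/4.0 versions discussed here, but treat the exact defaults as something to verify.

SET GLOBAL tidb_disable_txn_auto_retry = 0;  -- allow TiDB to retry conflicting optimistic commits
SET GLOBAL tidb_retry_limit = 10;            -- how many times to retry before the commit fails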

This solves pain point 4 above: we no longer have to maintain eventual-consistency code for transactions that span databases, such as a transfer from user A to user B, or a trade where the buyer pays and the merchant's income increases. A transaction can still fail after multiple retries, but that part is handled by TiDB. Previously, cross-database transactions needed a framework, for example retries driven by the Sharding-JDBC SDK inside our own program, and if our program crashed the retry was lost; with TiDB that dependency goes away.
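
For example, a transfer that previously spanned two shard databases is now just one ordinary transaction (the accounts table here is hypothetical); TiDB's two-phase commit makes it atomic even when the two rows live in Regions on different TiKV nodes.

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE user_id = 'A';  -- user A pays 100
UPDATE accounts SET balance = balance + 100 WHERE user_id = 'B';  -- user B receives 100
COMMIT;  -- either both updates are persisted or neither is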

2. Automatic capacity expansion

2.1 Region division

A Region is the smallest storage unit. When a certain amount of data has been written into a Region, the Region is split (by default the threshold is 1/16 of the space covered by the existing Region). Note that the Key ranges of a data table need to be set in advance; beyond that, TiKV splits Regions dynamically according to their size.

This addresses pain point 2 (having to requisition resources every time, and operations having to migrate data before going live) and pain point 3 (having to stop the service during migration, which affects users).

Since TiDB is middleware and carries no business attributes, it cannot use a field such as userID as the key or apply a custom sharding algorithm; the primary key is the usual choice. (Personally, I think it would be great if TiDB could support that.)

2.2 Adding a Storage Node

  1. When a node is added or a Region is split, TiDB migrates Regions automatically. There is no need to touch application code, use sharding middleware, or run data-migration and go-live drills; the whole process is handed to operations (insert smirk here).
  2. There is no service downtime, and no need to wait until no new SQL is executing before migrating; data is migrated live, at runtime.

This solves pain point 3 (downtime and data migration) and pain point 5 (storage pressure).

3. Replica disaster recovery

Each Region maintains a continuous segment of the cluster's data (about 96 MiB on average in the default configuration). Each Region keeps multiple replicas of its data (three by default) on different stores; each replica is called a Peer. The Peers of a Region synchronize data using the Raft protocol, so Peer also refers to a member of a Raft group.

So 100 million rows of data become 300 million rows on disk, which costs disk space but improves reliability.

TiDB cost

  1. The official recommendation is to deploy at least 3 TiKV, 3 PD, and 2 TiDB nodes.
  2. tidb-server needs to run a large number of threads, PD needs a good CPU, and TiKV needs SSDs and a good CPU.
  3. From the forums, people commonly use machines with 100 GB of memory and 2 TB SSDs. Since every row is kept in three replicas, disk consumption is heavy, so running a TiDB cluster for one system already costs a lot.
  4. And that is just for one system; when a project contains multiple systems, more resources are consumed, and resource needs keep growing with the data.

Usage scenarios

  1. Use it when the data volume reaches the point where query pressure must be reduced, connection pools are running out, or similar. Because the official recommendation is at least 2 tidb-server, 3 PD, and 3 TiKV nodes, with TiKV on SSDs, not every project will take on that cost, and the company may not be willing to pay for it. With small data volumes, MySQL is recommended; once the volume grows, synchronize the data into TiDB.
  2. To move existing data into TiDB, use TiDB Data Migration.
  3. Adopt it first on the company architecture group's own projects, then on non-core business projects, and finally roll it out to core projects.
  4. It is expensive to get started and expensive to experiment with, because the officially recommended deployment requires several good machines.

Precautions and pitfalls

  1. Versions 3.0.4, 3.0.8, or 4.0.0 are recommended (as of April 2020). Version 2.0 is not recommended; otherwise you will have to deal with incompatibility issues when upgrading.
  2. While a node is being added for capacity expansion or a Region is being split, concurrent INSERTs or UPDATEs on a Region that is being migrated may fail with an error such as "not leader"; however, there is a retry, and the data is eventually committed.
  3. Set the Region routing split-range rules in advance; otherwise all data lands on the same node. If your primary keys come from a snowflake algorithm, you need to work out the minimum and maximum values yourself and set the range rules manually (as in the SPLIT TABLE sketch earlier).
  4. TiDB does not support SELECT ... LOCK IN SHARE MODE. The statement executes as if there were no lock and does not block other transactions' reads and writes (see the sketch after this list).
  5. Do not use Syncer to synchronize or migrate data into TiDB when you already have sharded databases and tables; according to one forum post it causes problems in that scenario. TiDB Data Migration is recommended instead.
  6. TiDB cannot modify column (field) types (see the sketch after this list).
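
A short sketch of items 4 and 6, reusing the hypothetical accounts table from earlier; the exact error messages vary by version.

-- Item 4: this parses, but TiDB takes no shared lock; it behaves like a plain SELECT
-- and does not block other transactions.
SELECT balance FROM accounts WHERE user_id = 'A' LOCK IN SHARE MODE;

-- Item 6: changing an existing column's type is rejected on the versions discussed here;
-- a statement like the following fails with an unsupported-DDL error.
ALTER TABLE accounts MODIFY COLUMN balance VARCHAR(32);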

Conclusion

With the features above, you can cut the development and operations cost of sharding databases and tables. With ordinary sharding, once data approaches the planned capacity you have to keep monitoring whether a migration is due, rehearse the migration, update code or configuration for it, and stop the business, which is the biggest impact. Even after sharding is done and seems to solve some problems, plenty of follow-up work remains. TiDB solves these problems for us, so we can focus more on the business itself.


Welcome to follow me so you get new articles faster.

My WeChat official account: Tibet thinking

Juejin: Hide Kelvin

Jianshu: Hide Kelvin

My Gitee: Underground Collection Kelvin gitee.com/kelvin-cai