background
TiDB is a term many developers have heard of, but quite a few only know that it is a distributed database; their business runs on MySQL, so they have never actually used it. Recently one of my projects needed TiDB, so I studied its basic principles, and I found that many of them are not limited to distributed databases at all; a lot of the techniques are universal. This article shares what I learned about TiDB.
Generally speaking, our business uses MySQL, but a single instance has limited capacity and concurrency. For large-capacity or high-concurrency scenarios we usually turn to Sharding-JDBC. Sharding-JDBC does solve the problem, but it raises development cost: every table needs a sharding key, and every query must supply that key's value, which restricts how we can query. A query without the key value has to hit every shard, which is very inefficient, so we ended up replicating the data into ES to serve the other query conditions. Is there a better way? Recently my company has been promoting TiDB internally. From what I checked, TiDB is largely compatible with MySQL, its storage scales out without limit, development cost is low, and performance is good, so I decided to use TiDB.
History of Database development
Relational stand-alone database
Relational databases began with Edgar F. Codd's relational model, proposed in 1970. In the early days of database development there were many excellent commercial products such as Oracle and DB2, and after 1990 the open source databases MySQL and PostgreSQL appeared. These databases kept improving single-instance performance, and together with the hardware gains of Moore's Law they supported business growth well.
Distributed database
As Moore's Law runs out, a single-instance database struggles to cope with higher-level challenges, which is why distributed databases emerged. A distributed database can handle massive concurrency and massive storage, so it can take on harder problems.
- NoSQL: HBase is a typical example. HBase is an important product in the Hadoop ecosystem and an open source implementation of Google's BigTable. Redis, which we are all familiar with, also belongs here. NoSQL systems target special scenarios and come with trade-offs: Redis keeps its data in memory, and NoSQL abandons the relational model.
- Sharding: We can still build a distributed database on top of stand-alone databases by using a component that routes SQL to the right shards. Well-known open source options are Sharding-JDBC and MyCAT, and Alibaba Cloud offers DRDS. Sharding is hard to operate and maintain: scaling out requires manual data migration, and every query must carry a sharding key.
- NewSQL: NewSQL guarantees ACID transactions, keeps the relational model, and supports SQL. Well-known examples are Google's F1 and Spanner, Alibaba's OceanBase, and PingCAP's TiDB.
Questions before learning
When we learn something new, it helps to start with questions; purposeful learning gets you up to speed faster. For TiDB, or any distributed database, these are the questions I had:
- How does it scale without limit? Sharding-JDBC relies on a sharding key, which makes unlimited expansion painful, so my first question was how TiDB achieves it.
- How are IDs kept unique? A distributed database shards its data, so the auto-increment ID of a stand-alone database no longer works. How does TiDB guarantee uniqueness?
- How are transactions guaranteed? We said NewSQL must support ACID transactions, so how does TiDB deliver that?
- How are queries served by indexes? A stand-alone database speeds up queries with indexes; how does TiDB do the same?
TiDB
architecture
Before answering the questions above, let's look at the overall architecture of TiDB.
TiDB is a typical compute-storage separation architecture. If you are unfamiliar with this architecture, check out my previous article on separating compute and storage.
- TiDB Server: the compute layer and the endpoint that exposes the MySQL protocol. It manages client connections, parses and optimizes SQL, and generates distributed execution plans. Because the compute layer is stateless, it can be scaled out without limit.
- PD Server: the brain of the whole cluster. PD stores the real-time data distribution of every TiKV node and the overall topology of the cluster, provides the TiDB Dashboard management interface, and is itself deployed for high availability.
- TiKV: the key-value storage engine. Inside TiKV, the basic unit of data storage is the Region.
- TiFlash: a columnar storage engine.
- TiSpark: TiDB supports Spark, which makes TiDB an HTAP database.
How to scale infinitely?
Let's start with the first question: how does TiDB scale without limit?
First the compute layer, TiDB Server. As noted above, it is stateless and can therefore be scaled out without limit. If your scenario has high concurrency or many database connections, consider deploying more TiDB Server instances.
Now the storage layer. One kind of cloud database is often mistaken for a distributed database: AWS Aurora and Alibaba Cloud PolarDB. Both adopt compute-storage separation and can scale the compute layer without limit, but the storage layer shares a single copy of data. This is known as the shared-storage architecture; these databases rely on large disks to support higher data volumes.
TiDB, by contrast, uses a shared-nothing architecture: the storage layer is sharded as well:
Each TiKV node is divided into several Regions, the basic unit of storage. Looking at this diagram, does the architecture seem familiar?
A Region corresponds to a partition in Kafka: partitions spread the load of a topic across brokers, and Regions in TiDB do the same for data.
Before we dig into Regions: why is the storage engine called TiKV? Because it is simply a key-value store, which you can think of as a Java HashMap. TiKV does not implement its own way of persisting this map to disk; instead it builds on an excellent KV storage engine, RocksDB, Facebook's open source high-performance stand-alone KV database. Many companies have built excellent storage products on top of RocksDB. I will write a detailed article about RocksDB later.
RocksDB is a stand-alone engine, so we still need to make sure data is not lost in a distributed environment. Kafka does this with follower replicas that continuously pull from the leader partition, maintained through the ISR mechanism. TiKV instead replicates data with the Raft protocol: every change is written as a Raft log, and Raft's log replication safely and reliably synchronizes it to every node in the replication group. In practice, per the Raft protocol, a write only needs to be replicated to a majority of nodes to be safely considered written.
As you can see, a write actually goes through the Raft interface first and is then applied to RocksDB.
Regions differ from Kafka partitions in one respect: a partition count, once set, stays fixed during business operation, while Regions grow and split automatically. The default Region size is 96 MB; the number of Regions increases with the amount of data, and Regions are automatically merged if the data shrinks.
How do we decide which Region a piece of data belongs to? The usual options are hash(key) and range(key). TiKV chooses range(key) because it makes Region splits cheap; each Region is described by a [StartKey, EndKey) pair:
When a Region splits, we only need to create a new Region and move part of the old Region's data into it, for example [a, b) -> [a, ab) + [ab, b). With hash(key), a split would force rehashing all of the Region's data, which is why TiKV uses range(key); merging works the same way.
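The split logic above can be sketched in a few lines of Python. This is a toy model, not TiKV's implementation: `Region`, `RangeStore`, and the string keys are all invented for illustration.

```python
# Toy sketch of range-based Regions: each Region owns the half-open key
# range [start, end), lookup walks the sorted Region list, and a split
# only touches the one Region being split.
import bisect

class Region:
    def __init__(self, start, end):
        self.start, self.end = start, end   # covers [start, end)
        self.data = {}                      # keys that fall in this range

class RangeStore:
    def __init__(self):
        # start with one Region covering everything ("" .. unbounded)
        self.regions = [Region("", None)]

    def _locate(self, key):
        # find the Region whose range contains key
        starts = [r.start for r in self.regions]
        return self.regions[bisect.bisect_right(starts, key) - 1]

    def put(self, key, value):
        self._locate(key).data[key] = value

    def get(self, key):
        return self._locate(key).data.get(key)

    def split(self, split_key):
        # split [start, end) into [start, split_key) + [split_key, end);
        # only keys >= split_key move to the new Region, nothing is rehashed
        old = self._locate(split_key)
        new = Region(split_key, old.end)
        old.end = split_key
        for k in [k for k in old.data if k >= split_key]:
            new.data[k] = old.data.pop(k)
        self.regions.insert(self.regions.index(old) + 1, new)
```

A split at key `"ab"` turns one Region into `["", "ab")` and `["ab", None)`, and lookups keep working unchanged.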
So with TiDB, both the storage layer and the compute layer can be scaled out without limit.
How do I make my ID unique
In MySQL, we usually declare the primary key as AUTO_INCREMENT and let the database assign IDs. How the auto-increment counter is maintained depends on the version:
- MySQL 5.7 and earlier: the InnoDB auto-increment counter is kept only in memory and not persisted. After a restart, the first time the table is opened InnoDB runs a max(id) lookup and uses max(id) + step as the table's current counter.
- MySQL 8.0: changes to the counter are recorded in the redo log, and the redo log restores the value on restart.
All of this works on a single machine, but in a distributed database it cannot guarantee unique IDs. I have written about this before in "If anyone asks you about distributed IDs again, throw this article at them". With Sharding-JDBC, we used Leaf, the ID-generation middleware introduced in that article, to generate IDs.
TiDB also supports AUTO_INCREMENT, implemented on the same principle as Leaf's segment mode. It cannot guarantee strictly consecutive IDs, only a monotonic trend. The principle: for each auto-increment column, a globally visible key-value pair records the maximum ID allocated so far. Because communication between nodes has a cost, each TiDB node requests a whole segment of IDs and caches it, fetching the next segment only when the current one is used up rather than going to the storage nodes for every allocation. This avoids amplifying write requests.
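Here is a minimal sketch of that segment idea in Python. `GlobalCounter` and `NodeAllocator` are hypothetical names invented for this illustration; in real TiDB the counter lives in the storage layer as the globally visible key-value pair described above.

```python
# Toy sketch of segment-based ID allocation (the idea behind Leaf's
# segment mode and TiDB's AUTO_INCREMENT cache).
import threading

class GlobalCounter:
    """Stand-in for the storage node: hands out [start, start+size) segments."""
    def __init__(self):
        self._next = 1
        self._lock = threading.Lock()

    def next_segment(self, size):
        with self._lock:
            start = self._next
            self._next += size
            return start, start + size

class NodeAllocator:
    """Per-node cache: one round trip per segment, not per ID."""
    def __init__(self, counter, segment_size=30000):
        self.counter, self.size = counter, segment_size
        self.cur = self.end = 0

    def next_id(self):
        if self.cur >= self.end:          # segment exhausted, fetch another
            self.cur, self.end = self.counter.next_segment(self.size)
        self.cur += 1
        return self.cur - 1
```

Two nodes sharing one counter produce globally unique but only trend-increasing IDs: node A may be handing out 1..30000 while node B hands out 30001..60000, so IDs interleave out of order across nodes.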
TiDB also supports AUTO_RANDOM, which solves the hotspot problem that integer auto-increment primary keys cause during bulk writes to TiDB. Because Regions are ordered by key, a burst of sequential IDs all lands in the same Region for a while; AUTO_RANDOM scatters the primary key values so writes spread across Regions.
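The general idea can be illustrated like this. This is a sketch only; TiDB's actual AUTO_RANDOM bit layout and the way it chooses shard bits differ in detail.

```python
# Sketch of the idea behind AUTO_RANDOM (not TiDB's exact bit layout):
# put a few random shard bits in the high positions of a 64-bit ID and a
# sequential counter in the low bits. IDs stay unique (the low bits still
# come from an allocator), but consecutive inserts scatter across the key
# space instead of piling into one Region.
import random

SHARD_BITS = 5                 # TiDB's default shard bit count
SEQ_BITS = 63 - SHARD_BITS     # leave the sign bit alone

def auto_random_id(seq):
    shard = random.getrandbits(SHARD_BITS)
    return (shard << SEQ_BITS) | seq

# consecutive sequence numbers map to up to 2**SHARD_BITS key ranges
ids = [auto_random_id(seq) for seq in range(10)]
```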
How to guarantee transactions
ACID is the set of four transaction properties; here is how MySQL's InnoDB provides them:
- A: Atomicity means that the operations in a transaction either all complete or none do; the database never ends up in an intermediate state. MySQL relies on the redo log and undo log for atomicity.
- C: Consistency means the integrity of the database is not compromised before and after a transaction. Consistency is built on the other three properties.
- I: Isolation is the ability of the database to let multiple concurrent transactions read, write, and modify its data simultaneously. It prevents the inconsistency that interleaved execution could cause and mainly matters in concurrent scenarios. MySQL implements isolation with locks and MVCC; it has many kinds of locks and supports several isolation levels.
- D: Durability means that once a transaction commits, its changes are permanent and survive system failure. Durability relies on the redo log and MySQL's flush mechanism.
How does TiDB provide ACID?
- A: TiDB uses the atomicity of the Region that holds the Primary Key to guarantee the atomicity of distributed transactions.
- C: TiDB checks data consistency before writing; data is written to memory only if the check succeeds.
- I: Isolation is likewise implemented with locks and MVCC, but TiDB supports the RR (repeatable read) level; since 4.0 the RC isolation level is also supported in pessimistic mode.
- D: Once a transaction commits successfully, all its data is persisted in TiKV, and the multi-replica mechanism means data is not lost even if a node goes down.
MySQL's transaction model is pessimistic; TiDB provides both optimistic and pessimistic models.
- Pessimistic model: as the name suggests, the transaction executes on the assumption that every row it modifies is likely to be modified by another transaction as well. In MySQL, changing a row inside a transaction locks it, and any other transaction that tries to change the row blocks and waits. In short: detect conflicts while executing.
- Optimistic model: we assume the data we modify is unlikely to conflict with other transactions, so we skip conflict detection during execution and check only at commit time. If conflicts are rare, this gives better performance.
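The difference can be illustrated with a toy in-memory row. This is not TiDB's implementation; `Row` and the version check are invented for illustration.

```python
# Toy contrast of the two models: the pessimistic writer locks the row up
# front, while the optimistic writer remembers the version it read and
# validates it only at commit time, retrying on a mismatch.
import threading

class Row:
    def __init__(self, value):
        self.value, self.version = value, 0
        self.lock = threading.Lock()

def pessimistic_update(row, new_value):
    with row.lock:                     # conflict detection while executing
        row.value = new_value
        row.version += 1

def optimistic_update(row, new_value):
    read_version = row.version         # no lock held during "execution"
    # ... compute new_value based on what was read ...
    with row.lock:                     # validate only at commit
        if row.version != read_version:
            return False               # conflict: caller must retry
        row.value = new_value
        row.version += 1
        return True
```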
How does TiDB implement these two modes? Since TiDB is a distributed database, two-phase commit is the standard solution for distributed transactions. I have written a lot about distributed transactions before, so check that out if you need background.
Optimistic mode
TiDB uses two-phase commit to guarantee the atomicity of distributed transactions, split into a Prewrite phase and a Commit phase:
- Prewrite: for every Key the transaction changes, detect conflicts and write a lock to block other transactions from changing it. TiDB picks one of the Keys involved as the transaction's Primary Key. The Primary Key must be handled before the transaction commits or rolls back, and whether it commits serves as the marker for the outcome of the entire transaction.
- Commit: after Prewrite succeeds, the Primary Key is committed first; once that succeeds, the transaction has committed, and the remaining Secondary Keys are committed asynchronously.
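The Prewrite/Commit flow can be sketched as follows. This is a heavily simplified toy modeled on the Percolator scheme that TiDB's optimistic transactions derive from; real TiKV adds PD timestamps, MVCC versions, and asynchronous lock cleanup, and the class and method names here are invented.

```python
# Toy Percolator-style two-phase commit. `store` stands in for the KV
# data and `locks` for the lock column; a lock records which Primary Key
# decides its transaction's fate.

class ConflictError(Exception):
    pass

class TwoPhaseCommit:
    def __init__(self, store, locks):
        self.store, self.locks = store, locks   # shared KV + lock table

    def commit(self, writes):                   # writes: {key: value}
        keys = list(writes)
        primary = keys[0]                       # one key acts as Primary
        # Phase 1: Prewrite - lock every key, fail on any conflict
        locked = []
        for key in keys:
            if key in self.locks:               # someone else holds a lock
                for k in locked:                # roll back our own locks
                    del self.locks[k]
                raise ConflictError(key)
            self.locks[key] = primary           # lock points at primary
            locked.append(key)
        # Phase 2: Commit - committing the primary decides the outcome
        self.store[primary] = writes[primary]
        del self.locks[primary]
        for key in keys[1:]:                    # secondaries could be async
            self.store[key] = writes[key]
            del self.locks[key]
```

Once the primary's lock is cleared, the transaction is durably decided even if the secondary commits are delayed, which is exactly why the primary's state can stand in for the whole transaction.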
The overall flow of a transaction is:
- Step 1: the client starts a transaction, similar to what we do in MySQL:
begin transaction;
- Step 2: TiDB obtains a global timestamp from PD, fixing this transaction's position in the global order for later MVCC processing.
- Step 3: execute DML such as UPDATE xxx. No conflict detection happens here; the changes are kept only in TiDB's memory.
- Step 4: commit the transaction, like COMMIT in MySQL. In the commit phase TiDB performs the two-phase commit: Prewrite with lock-conflict detection first, then Commit if nothing conflicts.
Here’s an example:
```sql
begin;          -- step 1
insert into xx; -- step 3
update xx;      -- step 3
update xx;      -- step 3
commit;         -- step 4
```
In the example above, pessimistic mode performs lock detection at step 3, while optimistic mode defers all of that work to the COMMIT. That means a commit can fail with a conflict, so using optimistic mode requires handling errors in the commit phase, which differs from the way we usually program. Still, when data contention is light, optimistic mode yields better performance.
Pessimistic mode
Pessimistic locking takes effect earlier: a pessimistic lock is acquired for each DML statement, and the lock is written into TiKV and replicated through Raft. Acquiring the lock also checks constraints such as write conflicts and key uniqueness.
A pessimistic transaction that passes these checks will almost certainly commit successfully, and this mode matches our usual programming model, so pessimistic mode is TiDB's default.
How to do index queries
Why did index queries come to mind? When I learned that RocksDB is TiDB's underlying storage, I thought of InnoDB's B+ tree indexes: if TiDB's indexes were B+ trees, how could they possibly be built on RocksDB?
In fact, indexes in TiDB are also stored as key-value pairs. First, let's see how each row is stored:
- To keep the data of one table together for easy lookup, TiDB assigns each table a TableID, an integer that is unique across the whole cluster.
- TiDB assigns each row in the table a RowID, an integer unique within the table. As a small optimization, if the table has an integer primary key, TiDB uses the primary key value as the RowID.
Each row is encoded as a (Key, Value) pair according to the following rules:
```
Key:   tablePrefix{TableID}_recordPrefixSep{RowID}
Value: [col1, col2, col3, col4]
```
Suppose tablePrefix is the constant character t, recordPrefixSep is the constant character r, the tableID is 1, and the RowID (here the primary key) is 100. A row of a user table would then be stored as:
```
Key:   t1_r100
Value: [100, "zhangsan"]
```
If the primary key is an integer, the encoding above already serves as the primary key index. If the primary key is not an integer, or for any unique index, the encoding is:
```
Key:   tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue
Value: RowID
```
IndexID is an ID that TiDB assigns to each index. In this encoding each indexedColumnsValue can map to only one row, which is what makes the index unique. For a non-unique index we use:
```
Key:   tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue_rowID
Value: null
```
Here one indexedColumnsValue can map to multiple rows. Note that index data is not stored alongside the row data in the same Region; it has Region shards of its own. Likewise, at query time TiDB Server works out which index to use, first looks up the RowID from the index entries, and then fetches the corresponding row by RowID.
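The encoding rules above can be sketched in Python. Keep in mind that real TiDB uses a memcomparable binary encoding; the readable strings and `_` separators here are purely illustrative.

```python
# Sketch of TiDB's row and index key encodings, using readable strings
# in place of the real binary format.

def row_key(table_id, row_id):
    # tablePrefix{TableID}_recordPrefixSep{RowID} -> row data
    return f"t{table_id}_r{row_id}"

def unique_index_key(table_id, index_id, value):
    # the indexed value appears once per key, so uniqueness is enforced
    return f"t{table_id}_i{index_id}_{value}"

def secondary_index_key(table_id, index_id, value, row_id):
    # row_id inside the key lets one value map to many rows
    return f"t{table_id}_i{index_id}_{value}_{row_id}"

# Point lookup via a non-unique index on name:
# 1) scan index keys with the prefix for the value to collect RowIDs
# 2) fetch each row by its row key
kv = {
    row_key(1, 100): [100, "zhangsan"],
    secondary_index_key(1, 1, "zhangsan", 100): None,
}
prefix = "t1_i1_zhangsan_"
row_ids = [int(k.rsplit("_", 1)[1]) for k in kv if k.startswith(prefix)]
rows = [kv[row_key(1, rid)] for rid in row_ids]
```

Because the index entries and the rows are just keys in the same ordered KV space, both steps are plain RocksDB-style lookups and range scans, with no B+ tree needed.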
conclusion
Whether it is TiDB or distributed databases in general, there is a lot to learn. The above is only a rough analysis of TiDB; if you want to go deeper, take a look at these resources:
- PingCAP documentation: docs.pingcap.com. The PingCAP docs are among the best I have ever seen; they read less like a manual and more like a knowledge base of articles.
- Geek Time's "Distributed Database": a Geek Time course that is not limited to TiDB. It mainly explains distributed database fundamentals and compares the distributed databases on the market.
- "Database Internals": rated 8.5 on Douban, this book explains a lot of database theory that applies to both distributed and stand-alone databases. It is somewhat difficult, but very rewarding.
If you found this article helpful, your follow and shares are the biggest support I could ask for.