This article introduces TiDB, a new-generation NewSQL database, covering its background, overall architecture, capabilities, MySQL compatibility and monitoring.



Today there is a wide variety of databases. Relational databases (RDBMS), NoSQL (“Not Only SQL”) databases and NewSQL databases each have their own strengths and each holds a place in the database field; it is a true case of “a hundred schools of thought contending”. First, take a look at the DB-Engines database ranking for August 2018:

The ranking shows that competition for market share among databases is still fierce. This article introduces TiDB, a NewSQL database and a rising star in the field. Because TiDB is updated rapidly, some details in this article may differ from the latest version.

I. Background of TiDB

Representative RDBMSs include Oracle, MySQL and PostgreSQL. Traditional relational databases have a long history, enjoy considerable “seniority” in the database field, and are widely used across all industries. However, they have some problems. One is limited capacity: most RDBMSs use local storage or shared storage, so as business volume keeps growing, capacity gradually becomes a bottleneck. DBAs then relieve the capacity problem by splitting databases and tables, often several times over. Large numbers of sub-databases and sub-tables not only cost a lot of manpower but also complicate the routing logic the application needs in order to access the database. In addition, RDBMS scalability is poor: expanding or shrinking a cluster is usually expensive, and they do not meet the requirements of distributed transactions.
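The routing burden is easy to underestimate. The following minimal Python sketch shows the kind of logic an application must carry once one logical table has been split across several databases. The scheme (4 databases with 8 tables each, keyed by `user_id` modulo 32) and every name in it are hypothetical, chosen only for illustration.

```python
# Hypothetical sharding scheme for illustration only:
# one logical "orders" table split into 4 databases x 8 tables = 32 shards.
DB_COUNT = 4
TABLES_PER_DB = 8


def route(user_id: int) -> tuple[str, str]:
    """Return the (database, table) pair that holds rows for user_id."""
    slot = user_id % (DB_COUNT * TABLES_PER_DB)  # global shard slot, 0..31
    return f"order_db_{slot // TABLES_PER_DB}", f"orders_{slot % TABLES_PER_DB}"


def query_for_user(user_id: int) -> str:
    """Build the SQL for a single-user lookup (string formatting is for
    illustration only; real code should use parameterized queries)."""
    db, table = route(user_id)
    return f"SELECT * FROM {db}.{table} WHERE user_id = {user_id}"


def fan_out_tables() -> list[str]:
    """A query that does not filter on user_id must visit every shard."""
    return [f"order_db_{d}.orders_{t}"
            for d in range(DB_COUNT) for t in range(TABLES_PER_DB)]


print(route(37))              # ('order_db_0', 'orders_5')
print(len(fan_out_tables()))  # 32
```

Changing `DB_COUNT` or `TABLES_PER_DB` to add capacity remaps a large fraction of user IDs to different shards, which is why each round of resharding means migrating data and updating this routing code.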

Representative NoSQL databases include HBase, Redis, MongoDB and Cassandra. They solve the poor scalability of RDBMSs and make cluster expansion much more convenient. However, because they mostly use key-value (KV) style storage, SQL compatibility is greatly reduced, and NoSQL databases support distributed transactions only partially.

Representative NewSQL systems are Google’s Spanner and F1, which claim global cross-data-center disaster recovery and full ACID guarantees for distributed transactions, but they can only be used on Google Cloud.

TiDB was born against this background, and it also fills a gap in the NewSQL field in China. More than three years have passed since its first line of code was written in May 2015, and dozens of versions have been released in rapid iteration. At the time of writing, the latest version is 2.0.6, and the project has received more than 14,000 stars on GitHub.

II. TiDB architecture and features

1. Overall architecture of TiDB

The following figure shows the basic architecture of TiDB. A TiDB cluster consists of three types of nodes: PD Server, TiDB Server and TiKV Server.

PD (Placement Driver) Server is responsible for storing the cluster metadata, assigning a global transaction ID to each transaction, and scheduling and load-balancing data across the TiKV cluster.

TiDB Server is responsible for receiving user requests, parsing them into execution plans, locating data through PD Server, and then interacting with TiKV Server nodes to execute the query.

TiKV Server is responsible for storing cluster data.

When a client submits a request, the load-balancing (LB) layer forwards it to the TiDB Server cluster. PD Server assigns a global transaction ID to each transaction. TiDB Server then parses the request into a concrete execution plan, obtains the data’s storage location from the PD cluster, and queries the data by interacting with the TiKV Server nodes.
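The request path just described can be modeled as a toy, in-process Python sketch. It is not TiDB’s code: the class names, the round-robin load balancer, the integer transaction IDs and the two-Region layout (keys before "m" on one store, the rest on another) are all illustrative assumptions. It only shows the order of the steps: the LB picks a TiDB Server, PD supplies a transaction ID and the key’s location, and TiKV returns the data.

```python
import bisect
import itertools


class PlacementDriver:
    """Toy PD: issues increasing transaction IDs and maps keys to stores."""

    def __init__(self, region_start_keys, region_stores):
        self._next_id = itertools.count(1)
        self.region_start_keys = region_start_keys  # sorted start key of each Region
        self.region_stores = region_stores          # store serving each Region

    def new_txn_id(self):
        return next(self._next_id)

    def locate(self, key):
        # The Region with the largest start key <= key covers the key.
        idx = bisect.bisect_right(self.region_start_keys, key) - 1
        return self.region_stores[idx]


class TiKVStore:
    def __init__(self, data):
        self.data = data  # plain dict standing in for the KV engine


class TiDBServer:
    """Stateless SQL layer: asks PD, then reads from the right TiKV node."""

    def __init__(self, name, pd, stores):
        self.name, self.pd, self.stores = name, pd, stores

    def point_get(self, key):
        txn_id = self.pd.new_txn_id()             # 1. transaction ID from PD
        store = self.pd.locate(key)               # 2. where does the key live?
        value = self.stores[store].data.get(key)  # 3. read from that TiKV node
        return {"server": self.name, "txn": txn_id, "store": store, "value": value}


class LoadBalancer:
    """Round-robin LB in front of the TiDB Server cluster."""

    def __init__(self, servers):
        self._servers = itertools.cycle(servers)

    def submit(self, key):
        return next(self._servers).point_get(key)


pd = PlacementDriver(["", "m"], ["tikv-1", "tikv-2"])
stores = {"tikv-1": TiKVStore({"apple": 1}), "tikv-2": TiKVStore({"pear": 2})}
lb = LoadBalancer([TiDBServer("tidb-1", pd, stores), TiDBServer("tidb-2", pd, stores)])
print(lb.submit("apple"))  # {'server': 'tidb-1', 'txn': 1, 'store': 'tikv-1', 'value': 1}
print(lb.submit("pear"))   # {'server': 'tidb-2', 'txn': 2, 'store': 'tikv-2', 'value': 2}
```

`TiDBServer` keeps nothing between calls; everything it needs comes from PD and TiKV, which is what lets any TiDB Server answer any request.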



2. TiDB capabilities

Computing capacity: TiDB Server itself is stateless, so when computing capacity becomes a bottleneck, more machines can simply be added, transparently to users. In theory, there is no upper limit to the number of TiDB Servers.
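Because a TiDB Server keeps no state between requests, any server can handle any request, so adding compute is just adding a member to the pool behind the load balancer. A minimal sketch, assuming a simple round-robin balancer; the class and server names are hypothetical.

```python
from collections import Counter


class StatelessPool:
    """Round-robin pool of interchangeable (stateless) SQL servers.

    No server holds session data, so scaling out is just appending a name.
    """

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0

    def add_server(self, name):
        self.servers.append(name)  # no data has to be migrated

    def route(self):
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server


def spread(pool, requests):
    """Count how many of `requests` requests each server receives."""
    return Counter(pool.route() for _ in range(requests))


pool = StatelessPool(["tidb-1", "tidb-2"])
print(dict(spread(pool, 6)))  # {'tidb-1': 3, 'tidb-2': 3}
pool.add_server("tidb-3")     # scale out when compute becomes the bottleneck
print(dict(spread(pool, 9)))  # {'tidb-1': 3, 'tidb-2': 3, 'tidb-3': 3}
```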

Storage capacity: a cluster usually runs three or more TiKV Servers. By default TiDB keeps three replicas of each piece of data, similar to HDFS, but the replicas are kept consistent by the Raft protocol. Data on TiKV Servers is managed in units of Regions, which are scheduled centrally by the PD Server cluster, much like Region scheduling in HBase.
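PD’s real scheduler is considerably more elaborate; the greedy “least-loaded stores first” rule below is a simplified assumption used only to illustrate the properties described here: each Region gets three replicas on distinct TiKV Servers, and replicas end up spread evenly across the cluster. The function and store names are hypothetical.

```python
import heapq


def place_replicas(store_load, replicas=3):
    """Choose `replicas` distinct stores holding the fewest replicas so far.

    store_load maps a TiKV store name to its current replica count and is
    updated in place. Ties are broken by store name for determinism.
    """
    if replicas > len(store_load):
        raise ValueError("need at least as many stores as replicas")
    chosen = [name for _, name in heapq.nsmallest(
        replicas, ((load, name) for name, load in store_load.items()))]
    for name in chosen:
        store_load[name] += 1
    return chosen


load = {"tikv-1": 0, "tikv-2": 0, "tikv-3": 0, "tikv-4": 0}
for region_id in range(1, 5):
    print(region_id, place_replicas(load))
# 1 ['tikv-1', 'tikv-2', 'tikv-3']
# 2 ['tikv-4', 'tikv-1', 'tikv-2']
# 3 ['tikv-3', 'tikv-4', 'tikv-1']
# 4 ['tikv-2', 'tikv-3', 'tikv-4']
print(load)  # every store ends up holding 3 replicas
```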

3. High availability of TiDB

Every role in TiDB is highly available, and the failure of a single node does not affect the cluster as a whole. There are multiple TiDB Servers; because TiDB Server is stateless, if one fails, the application simply retries and connects to another node. PD Servers are usually deployed in odd numbers (2N+1) and elect a Leader through the Raft protocol; if the Leader goes down, a Follower is elected as the new Leader and carries on the work. Data on each TiKV node is stored as key-value pairs and partitioned into Regions by key range. Replicas of the same Region are distributed across different nodes, so no two copies of a Region share a node.
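The arithmetic behind the 2N+1 rule, and the client-side retry described above, can be sketched as follows. This is an illustrative model assuming a standard Raft majority quorum; the function names and the health-check callback are hypothetical and not part of any TiDB client API.

```python
def quorum(n: int) -> int:
    """Votes a Raft group of n members needs to elect a Leader or commit."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return n - quorum(n)


def connect_with_retry(endpoints, is_alive):
    """Return the first reachable TiDB endpoint, trying each in turn.

    `is_alive` is a caller-supplied health check (a hypothetical stand-in
    for actually opening a connection).
    """
    for endpoint in endpoints:
        if is_alive(endpoint):
            return endpoint
    raise ConnectionError("no TiDB Server reachable")


for n in (3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
print(connect_with_retry(["tidb-1", "tidb-2"], lambda e: e != "tidb-1"))  # tidb-2
```

A group of 2N+1 members tolerates N failures; growing from 3 to 4 members raises the quorum to 3 without tolerating any additional failure, which is why odd sizes are preferred.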

4. Compatible with MySQL

TiDB is largely compatible with MySQL, so users can switch from MySQL to TiDB almost transparently. The difference is that the backend storage of this “new MySQL” is effectively “unlimited” and no longer constrained by local disk capacity. TiDB can also be attached to an existing MySQL master-slave architecture as a slave, which simplifies operations and maintenance.

5. Efficient storage solution

As mentioned above, data in the TiKV cluster is stored as key-value (KV) pairs. TiKV does not write data to HDD/SSD directly; instead, it uses RocksDB to provide TB-scale local storage. The architecture of RocksDB is not covered here; interested readers can consult the relevant documentation. Like HBase, RocksDB uses an LSM tree (log-structured merge-tree) as its storage structure, which avoids the large volume of random reads and writes that B+ trees incur as their leaf nodes grow and split, and thereby improves overall throughput.
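To show why an LSM tree turns random writes into sequential ones, here is a deliberately tiny in-memory sketch. It is an assumption-laden toy, not RocksDB’s design: it omits the write-ahead log, levels, Bloom filters and deletes, and all names are hypothetical.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a mutable in-memory table plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # list of (sorted_keys, values); newest run last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch memory; nothing is updated in place on disk.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Dump the memtable as one sorted, immutable run (a sequential write).
        keys = sorted(self.memtable)
        self.runs.append((keys, [self.memtable[k] for k in keys]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.runs):  # newest data wins
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self):
        """Merge all runs into one, keeping the newest value of each key."""
        merged = {}
        for keys, values in self.runs:  # oldest first; later runs overwrite
            merged.update(zip(keys, values))
        keys = sorted(merged)
        self.runs = [(keys, [merged[k] for k in keys])] if keys else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)  # memtable full, first run flushed
db.put("a", 3)
db.put("c", 4)  # second run flushed; it shadows the old "a"
print(db.get("a"), db.get("b"), len(db.runs))  # 3 2 2
db.compact()
print(len(db.runs))  # 1
```

The trade-off is on the read side: a lookup may have to check several runs, which is why real engines compact runs in the background.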

6. TiDB monitoring

TiDB uses the open-source Prometheus to monitor the whole cluster. Each role on each node collects its own metrics and pushes them to Pushgateway, which receives the data pushed by all clients. Prometheus Server periodically pulls the data from Pushgateway, and Grafana is used on top of it for visualization and monitoring queries.
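The push-then-pull flow can be modeled in a few lines. This in-process sketch is an illustrative assumption: the class names, job/instance labels and metric names are hypothetical, and the real components exchange data over HTTP in Prometheus’s text format rather than through Python calls. Grafana, which would query the stored history, is omitted.

```python
import time


class PushGateway:
    """Keeps the most recent samples pushed by each (job, instance)."""

    def __init__(self):
        self.groups = {}  # (job, instance) -> {metric_name: value}

    def push(self, job, instance, samples):
        # Assumption: a push replaces everything this job/instance sent before.
        self.groups[(job, instance)] = dict(samples)

    def scrape(self):
        """Flatten all groups into one snapshot, as a pulling server sees it."""
        return {
            (job, instance, name): value
            for (job, instance), samples in self.groups.items()
            for name, value in samples.items()
        }


class MonitoringServer:
    """Stands in for Prometheus Server: pulls periodically, keeps history."""

    def __init__(self, gateway):
        self.gateway = gateway
        self.history = []  # list of (timestamp, snapshot) pairs

    def pull_once(self, now=None):
        snapshot = self.gateway.scrape()
        self.history.append((time.time() if now is None else now, snapshot))
        return snapshot


gw = PushGateway()
gw.push("tidb", "tidb-1", {"queries_total": 120})
gw.push("tikv", "tikv-1", {"region_count": 42})
server = MonitoringServer(gw)
print(server.pull_once(now=0))
# {('tidb', 'tidb-1', 'queries_total'): 120, ('tikv', 'tikv-1', 'region_count'): 42}
```

Decoupling the two sides this way means nodes only need to reach the gateway, while the server controls how often data is collected.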

III. Summary

As a new-generation NewSQL database, TiDB has gradually established itself in the database field. It combines prominent characteristics of technologies such as etcd, MySQL, HDFS, HBase and Spark. As TiDB matures and is more widely adopted, it will gradually blur the boundary between OLTP and OLAP, simplify tedious ETL processes, and set off a new wave of technology. In short, TiDB has a promising future.


This article was first published on the public account “Mi Operation and Maintenance”.