TiDB is an open-source distributed relational database developed by PingCAP that combines the strengths of traditional RDBMSs and NoSQL systems. TiDB is compatible with MySQL and offers core features such as distributed strongly consistent transactions, online elastic horizontal scaling, high availability with automatic failure recovery, and multi-active deployment across data centers. It is an ideal database cluster and cloud database solution in the era of big data.

In August of this year, UCloud brought TiDB to its public cloud and launched UCloud TiDB Service, currently based on TiDB 3.0.5. UCloud TiDB Service shows no performance loss compared with bare-metal deployment. It provides high availability across availability zones and enhances monitoring and Binlog, giving users one-click creation, pay-as-you-go billing, and flexible scaling of the TiDB Service.

UCloud TiDB Service

Why call it UCloud TiDB Service? "Service" is emphasized here because, from the perspective of public cloud users, TiDB runs on the public cloud platform and is presented as a service rather than as physical resources. UCloud TiDB Service is a serverless distributed database service that supports the native MySQL protocol and offers high performance, high availability across availability zones, and high scalability.

Compatible with native MySQL protocol

In most cases, applications can migrate from MySQL to TiDB without code changes, and an existing MySQL cluster can be migrated in real time using TiDB's migration tools.
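As a minimal illustration of this compatibility, the sketch below connects to a TiDB instance with the standard Go MySQL driver, exactly as one would for MySQL; the host, port, database, and credentials are hypothetical placeholders rather than values from this article.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // the ordinary MySQL driver also works against TiDB
)

func main() {
	// Hypothetical DSN: TiDB speaks the MySQL protocol (default port 4000).
	dsn := "app_user:app_password@tcp(tidb.example.internal:4000)/app_db?charset=utf8mb4"
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Any existing MySQL client code runs unchanged.
	var version string
	if err := db.QueryRow("SELECT VERSION()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected to:", version)
}
```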

High availability across availability zones

TiDB itself provides a degree of high availability, but most users lack the infrastructure to deploy across availability zones. All components of UCloud TiDB Service are deployed across availability zones: the multi-instance deployment capability of every TiDB module, combined with UCloud's cross-zone infrastructure, makes UCloud TiDB Service resilient to availability-zone-level failures.

Dynamic scaling

TiDB supports horizontal scaling of both compute nodes and storage nodes. By simply adding new nodes, compute or storage capacity can be expanded on demand, easily handling high-concurrency and massive-data scenarios.

Serverless

The serverless product form makes TiDB easier and faster to use: users do not need to care about the underlying physical resources or the details of how the cluster is distributed and deployed.

Pay as you go, low cost of entry

You do not need to specify resources such as CPU, memory, or disk size; you pay only for the storage capacity actually used, saving upfront hardware costs.

Performance comparison

We ran a benchmark on identical hardware (Intel Xeon E5-2620 v4, 12 x 16 GB DDR4-2400, 2 x 3.2 TB U.2 NVMe) with the same software topology (3 x TiDB, 3 x TiKV, 3 x PD), using sysbench with 512 threads, 32 tables, and 10 million rows. The following table compares TiDB deployed directly on bare machines with UCloud TiDB Service:

The results show that all metrics are essentially equivalent: UCloud TiDB Service introduces no performance loss compared with bare-machine deployment, and some metrics are even slightly better. What does the UCloud public cloud backend do behind the scenes to achieve this?

Building a distributed database PaaS platform

UCloud has built a distributed database PaaS platform (as shown in the figure above). Its management functions fall into three parts. The first part, on the left, is physical machine resource management, including resource allocation when instances are created and resource reclamation after instances are deleted. The second part is cluster deployment: during creation, the platform selects suitable physical machines, checks whether their resources meet the requirements, allocates those resources, and then performs the actual creation work, building the TiDB cluster along with its monitoring and load-balancing layers; since the deployment on the public cloud runs inside the user's VPC, the VPC network also needs to be initialized. The third part is cluster maintenance, for example migrating all services to other nodes when a physical machine fails; this mainly involves migration, scaling out, and scaling in.

On the right is monitoring and alerting, which promptly reports abnormal situations so they can be handled. There is also operational analysis, which supports the management of UCloud database operations. Backup management handles database backup and recovery; users can set detailed backup policies, such as when and how databases are backed up.

The native protocol path carries the original MySQL data flow, and we add a load-balancing layer on top of it for two main purposes: first, to unify access behind a single IP address so that users never need to manage IP switching; second, to apply some control over traffic passing through the public cloud service, mainly account and system-level control.

Deployment across availability zones for high availability

A TiDB cluster is composed of a distributed SQL layer (TiDB), a distributed key-value storage engine (TiKV), and the PD module that manages the whole cluster. As shown in the figure, we deploy all TiDB components across availability zones and expose a single highly available access address. One benefit of a single address is that users no longer need to track multiple addresses or switch between them; another is that the entire disaster-recovery process is completely transparent to the business, for example adding or removing a TiDB node, or moving one to another machine. With a unified virtual IP address, services do not need to care about addresses at all, and all such operations are completely transparent to users.

Improvements to monitoring

TiDB itself uses Prometheus to collect monitoring and performance metrics, Grafana as the visualization component, and Alertmanager to implement alerting, but all of them are deployed as single points and have no disaster recovery capability. We made all three modules highly available. Grafana could not originally use TiDB to store its metadata, so we modified the Grafana source code, rewrote a number of multi-schema statements, and removed a field-resizing operation so that Grafana can store its metadata in TiDB.

The figure shows the monitoring system for a user's service, with a load balancer and two Grafana nodes on the left; we connect to Prometheus through the load balancer to provide high availability for remote access.
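To illustrate how a client behind such a setup might query metrics through the load-balanced Prometheus endpoint, here is a small sketch using the official Prometheus Go client; the address is a hypothetical placeholder, and the query uses a standard TiDB server metric.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Hypothetical load-balanced address fronting the highly available Prometheus pair.
	client, err := api.NewClient(api.Config{Address: "http://prometheus-lb.example.internal:9090"})
	if err != nil {
		log.Fatal(err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Query the cluster-wide query rate exposed by TiDB's Prometheus metrics.
	result, warnings, err := promAPI.Query(ctx, "sum(rate(tidb_server_query_total[1m]))", time.Now())
	if err != nil {
		log.Fatal(err)
	}
	if len(warnings) > 0 {
		log.Println("warnings:", warnings)
	}
	fmt.Println("current QPS:", result)
}
```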

Modification of Binlog

In one user scenario, TiDB data is imported into an existing big data cluster for analysis, and the logs written to Kafka need to be in JSON format so that Flink can consume and parse them. However, Binlog uses a Protocol Buffers format, and the driver currently provided only supports text and MySQL outputs.

After our modification to the Binlog driver, Binlog can output JSON and write JSON-formatted logs to Kafka.
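The sketch below shows how a downstream consumer might decode one such JSON binlog message in Go; the struct fields and sample payload are hypothetical, standing in for whatever schema the modified driver actually emits, and in practice the payload would be read from a Kafka topic (for example by Flink).

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// BinlogEvent is a hypothetical, simplified shape of one JSON binlog message;
// the real field layout depends on the modified driver's output.
type BinlogEvent struct {
	Schema   string            `json:"schema"`
	Table    string            `json:"table"`
	Type     string            `json:"type"`      // e.g. insert / update / delete
	CommitTS int64             `json:"commit_ts"` // TiDB commit timestamp
	Row      map[string]string `json:"row"`
}

func main() {
	// Example payload; a real consumer would receive this from Kafka.
	payload := []byte(`{"schema":"app_db","table":"orders","type":"insert","commit_ts":412345678901234,"row":{"id":"1","amount":"9.99"}}`)

	var ev BinlogEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s.%s %s at ts=%d: %v\n", ev.Schema, ev.Table, ev.Type, ev.CommitTS, ev.Row)
}
```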

Quality improvements and bug fixes

While building TiDB Service, we also discovered and resolved a number of minor problems in upstream TiDB, improving product quality in the details. Many of them have since been fixed in official new versions, for example:

Drainer printed db.table statements (fixed in 3.0). The time zone changed after TiDB was upgraded to 2.1. Syncer did not handle SIGTERM during the retry phase (fixed). Syncer could not decode the SET data type (fixed). Drainer wrote to only one partition, causing data skew; as a workaround we start multiple Drainers, each writing to one database. Raft Store had a single-thread bottleneck (fixed in 3.0). Binlog applied DDL slowly, taking up to 10 minutes (fixed in 2.1.14).

There are also some behaviors that differ from native MySQL and can cause confusion, such as auto-increment IDs being allocated in segments, GC time causing connection interruption, transaction size limits (a single KV entry must not exceed 6 MB, the total number of entries must not exceed 300,000, and the total size must not exceed 100 MB), and automatic retry of failed transactions. These issues have been polished and worked through at UCloud over a long period, and the service has reached a relatively mature and stable form.
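As an illustration of working within these transaction limits, the sketch below splits a large bulk insert into smaller transactions in Go; the table name, columns, batch size, and DSN are hypothetical, and in practice the batch size would be tuned so each transaction stays safely under the entry-count and size caps.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

// insertInBatches commits rows in chunks so that each transaction stays well
// below TiDB's limits (single KV entry <= 6 MB, <= 300,000 entries, <= 100 MB total).
func insertInBatches(db *sql.DB, rows [][2]interface{}, batchSize int) error {
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		tx, err := db.Begin()
		if err != nil {
			return err
		}
		for _, r := range rows[start:end] {
			if _, err := tx.Exec("INSERT INTO orders (id, amount) VALUES (?, ?)", r[0], r[1]); err != nil {
				tx.Rollback()
				return err
			}
		}
		if err := tx.Commit(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	db, err := sql.Open("mysql", "app_user:app_password@tcp(tidb.example.internal:4000)/app_db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows := [][2]interface{}{{1, 9.99}, {2, 19.99}, {3, 29.99}}
	if err := insertInBatches(db, rows, 2); err != nil {
		log.Fatal(err)
	}
	fmt.Println("all batches committed")
}
```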

TiDB management module

The product console exposes TiDB's management module to users, divided into four parts: backup management, recovery tasks, user management, and Binlog synchronization. Details are as follows:

Backup management: when creating a TiDB instance, you can choose whether to enable an automatic backup policy, which specifies the backup time, the number of automatic backup files to retain, and the automatic backup period. In addition to automatic backups, TiDB also offers manual backups.

Recovery tasks: a TiDB backup file can be restored to a new TiDB instance. You need to prepare the new instance in advance, and the data is then restored into it.

User management: TiDB provides user privilege management, including adding users and initializing their privileges, adjusting privileges, and deleting non-root users (see the sketch after this list).

Binlog synchronization: synchronizes TiDB's incremental data to other storage in real time. Currently MySQL and TiDB are supported as target storage.
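Because TiDB accepts standard MySQL account statements, user management can also be scripted directly against the database. The following is a minimal sketch under that assumption; the connection string, user name, password, and database are hypothetical.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Hypothetical administrative connection to the TiDB instance.
	db, err := sql.Open("mysql", "root:admin_password@tcp(tidb.example.internal:4000)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	stmts := []string{
		// Add a user and initialize its privileges.
		"CREATE USER IF NOT EXISTS 'report_user'@'%' IDENTIFIED BY 'report_password'",
		"GRANT SELECT ON app_db.* TO 'report_user'@'%'",
		// Adjust privileges later as needed.
		"REVOKE SELECT ON app_db.* FROM 'report_user'@'%'",
		// Delete a non-root user.
		"DROP USER IF EXISTS 'report_user'@'%'",
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			log.Fatalf("failed to run %q: %v", s, err)
		}
	}
}
```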

Conclusion

It is fair to say that TiDB is a database born for the cloud. UCloud TiDB Service delivers TiDB to users as a service while ensuring no loss of TiDB performance, lowering the barrier to entry, simplifying management, and improving disaster recovery capability. Going forward, UCloud will continue to work closely with PingCAP to create more possibilities for databases on the cloud.