Author: Zhang Liang, HEAD of DBA of Xiaomi; Youfei Pan, XIAOMI DBA; Wang Biwen is a development engineer at Xiaomi.

I. Introduction to application scenarios

MIUI is a third-party mobile operating system developed by Xiaomi based on Android system, which is also the first product of Xiaomi. Based on the Android system, MIUI is deeply customized for Chinese users, on which a series of applications have been bred, such as theme store, Mi Music, App Store, Mi Reading and so on.

Currently, TiDB is mainly applied in:

  • Mi mobile phone desktop negative one screen express business
  • Commercial advertising trading platform material sampling platform

The daily read/write volume of these two service scenarios reaches hundreds of millions. After they are online, the whole service runs stably. Next, we plan to launch more business scenarios gradually. At present, Miui is actively testing the migration of the order system.

Ii. Characteristics of TiDB

TiDB combines the best features of traditional RDBMS and NoSQL, is compatible with MySQL protocol, supports unlimited horizontal scaling, and has strong consistency and high availability.

It has the following characteristics:

  • Highly compatible with MySQL, it is easy to migrate from MySQL to TiDB without changing the code in most cases, even if the MySQL cluster has been divided into libraries and tables, it can be migrated in real time with the migration tool provided by TiDB.

  • Horizontal elastic expansion: TiDB can be expanded horizontally by simply adding new nodes. It can handle processing or storage as required, and easily cope with scenarios of high concurrency and massive data.

  • Distributed transactions, TiDB supports 100% standard ACID transactions.

  • True finance-level high availability, Raft based majority election protocol provides 100% finance-level data consistency compared to traditional m-S replication solutions and auto-failover without the need for manual intervention without the loss of most copies.

The architecture and principles of TiDB are described in detail in the official website, which will not be repeated here.

Three, background

Like most Internet companies, Xiaomi’s preferred relational storage database is MySQL, with 2.6T disks on a single machine. Due to the rapid increase of millet mobile phone sales and negative one screen at a beautiful MIUI users rapidly increasing, lead to a negative one screen at the sheer volume of data express business is growing so fast, * * every day, speaking, reading and writing level have reached hundreds of millions of level, the rapid growth of data in single machine bottlenecks, such as performance significantly decreased, the decrease of available storage space, big table DDL cannot execute, etc., Have to face the problem of database expansion. ** For example, we have a business scenario (intelligent terminal), which needs to regularly write various monitoring and collection data to the database from tens of millions of intelligent terminals. The single-thread replication mode of MySQL based on Binlog is easy to cause the delay from the library, and the accumulation is becoming more and more serious.

** For MySQL, the most direct solution is to use the horizontal expansion mode of sub-library and sub-table. Generally speaking, it is not the optimal solution. For example, for business, it is more intrusive to business code; For DBAs, it is necessary to continuously split and expand the management cost. Even with middleware, there are certain limitations. ** also applies to the above intelligent terminal business scenario. From the perspective of business requirements, the query needs to be performed from multiple business dimensions, and the business dimensions may be expanded at any time. The scheme of table division cannot meet the business requirements.

After understanding the characteristics of TiDB, DBA communicated with business development to confirm the current MySQL usage mode, and made a detailed comparison with the compatibility of TiDB. After the service pressure test, according to the results of the pressure test, decided to try to migrate data storage from MySQL to TiDB. After several months of online testing, TiDB performed as expected.

Fourth, compatibility comparison

TiDB supports most of the MySQL syntax, including cross-row transactions, joins, and subqueries. You can connect directly to the MySQL client. For services that already use MySQL, you can switch to TiDB almost seamlessly.

A simple comparison between the two is as follows:

  • Function support

    • TiDB does not yet support the following:
      • Add and delete primary keys
      • Non-utf8 character set
      • Views (soon to be supported), stored procedures, triggers, some built-in functions
      • Event
      • Full-text index, spatial index
  • The default Settings

    • The default values of character set, sorting rule, SQL_mode, and lower_case_table_names are different.
  • The transaction

    • TiDB uses an optimistic transaction model and checks the return value after committing.
    • TiDB limits the size of a single transaction, keeping transactions as small as possible.
  • TiDB supports most Online DDLS.

  • In addition, some MySQL syntax can be parsed in TiDB without any effect. For example, the engine and partition options in the CREATE table statement are ignored after parsing.

  • More information can be found at pingcap.com/docs-cn/sql… .

Five, the pressure test

5.1 purpose

The OLTP performance of the TiDB is tested by pressure test to see if it meets the business requirements.

5.2 Machine Configuration

component Number of instances The CPU model memory disk version The operating system
TiDB 3 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz 128G SSD Raid 5 The 2.0.3 CentOS Linux release 7.3.1611
PD 3 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz 128G SSD Raid 5 The 2.0.3 CentOS Linux release 7.3.1611
TiKV 4 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz 128G SSD Raid 5 The 2.0.3 CentOS Linux release 7.3.1611

5.3 Contents and results of pressure measurement

5.3.1 Standard Select pressure test

Threads QPS Latency (avg / .95 / max)
8 12650.81 0.63/0.90/15.62
16 21956.21 0.73/1.50/15.71
32 31534.8 1.01/2.61/25.16
64 38217 1.67/5.37/49.80
128 39943.05 3.20/8.43/58.60
256 40920.64 6.25/13.70/95.13

5.3.2 Standard OLTP pressure test

Threads TPS QPS Latency (avg / .95 / max)
8 428.9 8578.09 18.65/21.89/116.06
16 731.67 14633.35 21.86/25.28/120.59
32 1006.43 20128.59 31.79/38.25/334.92
64 1155.44 23108.9 55.38/71.83/367.53
128 1121.55 22431 114.12/161.51/459.03
256 941.26 18825.1 271.94/369.77/572.88

5.3.3 Standard Insert pressure test

Threads QPS Latency (avg / .95 / max)
8 3625.75 2.20/2.71/337.94
16 6527.24 2.45/3.55/160.84
32 10307.66 3.10/4.91/332.41
64 13662.83 4.68/7.84/467.56
128 15100.44 8.47/16.41/278.23
256 17286.86 14.81/25.74/3146.52

According to the pressure test, the stability of TiDB is slightly different from that expected, but the Load of pressure test will be significantly higher than the Load of business in production. Referring to the performance of TiDB at low Threads, it can basically meet the performance requirements of BUSINESS on DB. Determine the grayscale part of the MySQL read traffic from the library to experience the actual effect.

6. Migration process

The migration is divided into two parts: data migration and traffic migration.

6.1 Data Migration

Data migration consists of incremental data and stock data.

  • For the stock data, logical backup and import can be used. In addition to the traditional logical import, TiDB Lightning is also officially provided as a tool for physical import.

  • For incremental backups, TiDB provides Syncer (renamed dM-Data Migration) to ensure Data synchronization.

The Syncer structure is shown in Figure 6, which mainly relies on various rules to achieve different filtering and merging effects. One Syncer source corresponds to one Syncer process, while multiple Syncer processes are required to synchronize Sharding data.

When using Syncer, note the following:

  • Check whether server-id, log_bin, binlog_format is ROW, binlog_ROW_image is FULL, and synchronize related user permissions and Binlog information.

  • Using strict data checking mode, data violations will be stopped. Check data and table structures before data migration.

  • Good monitoring, TiDB provides ready-made monitoring solutions.

  • Pre-check that shard tables are synchronized to the same TiDB cluster. Check whether the synchronization scenario can be expressed using Route-rules, and check whether the unique key and primary key of the sub-table conflict after data merging.

6.2 Traffic Migration

Traffic switching to TiDB is divided into two parts: read and write traffic migration. Ensure the grayscale process of each switch, observe the period of 1 to 2 weeks, do a good job of rollback measures.

  • The read traffic is switched to TiDB. In this process, the rollback is relatively simple. If there is no problem with the gray scale, the full switchover is performed.

  • To switch the write to TiDB, you need to consider data rollback or double-write (Syncer needs to be disabled).

7. Cluster status

7.1 configuration

The cluster configuration adopts the official recommended seven-node configuration, including three TiDB nodes, three PD nodes, and four TiKV nodes. Each TiDB and PD are in a group and share a physical machine. As services grow or new services are added, TiKV nodes can be added as required.

7.2 monitor

The monitoring scheme provided by TiDB is adopted, and the company’s open source Falcon is also used for monitoring. At present, the whole cluster runs stably, as shown in Figure 7.

Viii. Problems encountered, causes and solutions

The problem Reasons and solutions
You cannot operate on multiple columns or indexes in a DDL. Currently, the ADD/DROP INDEX/COLUMN operation cannot create or delete multiple indexes or columns at the same time. You need to perform the operation separately.
Some operators are not well supported by the query optimizer. For example, if the OR operator uses TableScan, rewriting it to Union All can be avoided. The official statement is that the current use of the OR operator may be inaccurate in the execution plan. It is already in the improvement plan and will be optimized in the subsequent 3.0 version.
When a PD node is restarted, the service detects that PD is unavailable and reports a PD Server timeout. Because the Leader node is restarted, you need to manually switch over the Leader node and then restart the node. The official suggestion is that the Leader migration can be done before the restart to alleviate the problem. In addition, TiDB will comb and optimize the parameters related to network communication later.
The execution speed of table building sentences is slower than that of MySQL When there are multiple TIdbAs, the Owner and the TiDB Server that receives the CREATE Table statement may be slower than MySQL if the Owner and the TiDB Server that receives the CREATE Table statement are not on the same Server. Each operation takes about 0.5 seconds. This operation will be improved in the later versions.
The parsed parameters of the pd-ctl command line are strict. An extra space will prompt a syntax error. The official states that this issue may occur in earlier versions, but has been improved in 2.0.8 and above.
The tikv-ctl command failed to manually compact region. Procedure In earlier versions, tiKV-ctl is usually caused by the inconsistency between tiKV-CTL and the cluster version. Tikv-ctl of the same version needs to be replaced, which has been repaired in 2.1.
Services are affected when large tables are indexed The official recommendation is to operate during the off-peak period. In version 2.1, operation priority and concurrent read control have been added to improve the situation.
Storage space enlargement problem The problem belongs to RocksDB. The optimal spatial amplification coefficient of RocksDB is 1.111. It is recommended that dynamic level-bytes of RocksDB be enabled by TiKV in some scenarios to reduce spatial amplification.

Ix. Follow-up and Outlook

At present, TiDB mainly provides OLTP service in Xiaomi. The negative screen express service of Xiaomi mobile phone made a good start for TiDB, and then commercial advertising was also connected. Both services have been online for several months, and TiDB has withstood the test of stability and brought great experience.

  • There are a large number of businesses with similar scenarios in MIUI ecosystem business. In the future, we will actively communicate with business development to migrate from MySQL to TiDB.

  • For some business scenarios, aiming at the rational utilization of resources, archive cluster is introduced, and Syncer is used to realize the function of data archiving.

  • Data analysis, combined with TiDB tools, will support offline, real-time data analysis support.

  • The monitoring of TiDB is integrated into Xiaomi’s open source monitoring system Falcon.

Ten, thank you

Thank you very much for TiDB official support during the migration and business launch. I would like to commend every TiDB staff for their professional spirit and timely and responsible response.

More TiDB user practices: www.pingcap.com/cases-cn/