On April 27, 2018, TiDB released the 2.0 GA version. Compared with version 1.0, this release brings many improvements in MySQL compatibility, system stability, the optimizer, and the execution engine.

TiDB

  • SQL optimizer

    • Simplify statistics data structure to reduce memory usage

    • Speed up the loading of statistics at process startup

    • Support dynamic update of statistics [experimental]

    • Optimize the cost model and estimate the cost more accurately

    • Use Count-Min Sketch to estimate the cost of point queries more accurately

    • Support analyzing more complex conditions to make full use of indexes

    • Support manually specifying the join order through the STRAIGHT_JOIN syntax (see the example after this list)

    • If the GROUP BY clause is empty, the Stream Aggregation operator is used to improve performance

    • Max/Min functions can be calculated using indexes

    • Optimize the correlated subquery processing algorithm to decorrelate more types of correlated subqueries and transform them into Left Outer Joins

    • Extend the use of IndexLookupJoin to index prefix matching scenarios
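
To illustrate the STRAIGHT_JOIN item above, here is a minimal sketch; the tables and columns are hypothetical:

```sql
-- STRAIGHT_JOIN forces the optimizer to join the tables in the order
-- they are written (orders first, then customers), instead of letting
-- the optimizer pick the join order by cost.
SELECT o.id, c.name
FROM orders o
STRAIGHT_JOIN customers c ON o.customer_id = c.id
WHERE o.amount > 100;
```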

  • SQL execution engine

    • Use the Chunk structure to refactor all executor operators, improving the performance of analytical statements, reducing the memory footprint, and significantly improving TPC-H benchmark results

    • Supports Streaming Aggregation operator push down

    • Optimize Insert Into Ignore statement performance by more than 10 times

    • Optimize Insert On Duplicate Key Update statement performance by more than 10 times

    • Push down more data types and functions to TiKV calculations

    • Optimized Load Data performance and increased it by more than 10 times

    • Support tracking the memory usage of physical operators, and specifying through the configuration file and system variables the behavior when memory usage exceeds the threshold

    • Support limiting the memory used by a single SQL statement, to reduce the risk of OOM (see the example after this list)

    • Support for implicit row ids in CRUD operations

    • Improve point-lookup performance
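
A minimal sketch of the per-statement memory limit mentioned above, assuming the tidb_mem_quota_query session variable (value in bytes):

```sql
-- Cap the memory a single SQL statement may use in this session.
-- The variable name tidb_mem_quota_query is an assumption here;
-- the behavior when the quota is exceeded is configured separately.
SET tidb_mem_quota_query = 1073741824;  -- 1 GiB

-- Queries in this session are now subject to the quota.
SELECT COUNT(*) FROM big_table;  -- big_table is hypothetical
```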

  • Server

    • Support the Proxy Protocol

    • Add a large number of monitoring metrics and refine the logs

    • Check the validity of configuration files

    • Support HTTP API to obtain TiDB parameter information

    • Batch Resolve Lock is used to improve the garbage collection speed

    • Support multithreaded garbage collection

    • Support TLS

  • Compatibility

    • Support for more MySQL syntax

    • Support setting the lower_case_table_names system variable in the configuration file, to support the OGG data replication tool

    • Improved compatibility with Navicat

    • Support displaying the table creation time in Information_Schema

    • Fix the issue that the return types of some functions/expressions differ from MySQL

    • Improved JDBC compatibility

    • Support more SQL modes (see the example after this list)
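
Two of the compatibility items above, sketched in SQL; the schema name is hypothetical:

```sql
-- Inspect and set the SQL mode, following MySQL conventions.
SELECT @@sql_mode;
SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION';

-- The table creation time is exposed through Information_Schema.
SELECT table_name, create_time
FROM information_schema.tables
WHERE table_schema = 'test';
```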

  • DDL

    • Optimized the execution speed of Add Index, greatly increasing the speed in some scenarios

    • The Add Index operation has a lower priority, reducing the impact on online services

    • Admin Show DDL Jobs displays more detailed DDL job status information

    • Admin Show DDL Job Queries JobID queries the original statement of an ongoing DDL job

    • The Admin Recover Index command recovers index data in disaster recovery scenarios

    • Table Options can be modified using the Alter statement (see the example after this list)
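
A sketch of the DDL administration statements above; the job ID, table name, and index name are hypothetical:

```sql
-- List DDL jobs together with their detailed status.
ADMIN SHOW DDL JOBS;

-- Show the original statement of an ongoing DDL job;
-- 51 is a hypothetical job ID taken from the output above.
ADMIN SHOW DDL JOB QUERIES 51;

-- Recover the data of index idx_c1 on table t in a disaster
-- recovery scenario.
ADMIN RECOVER INDEX t idx_c1;

-- Modify a Table Option with an Alter statement.
ALTER TABLE t COMMENT = 'migrated to TiDB 2.0';
```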

PD

  • Add support for Region Merge, to merge empty Regions left after data deletion [experimental]

  • Add Raft Learner support [experimental]

  • Scheduler optimization

    • Make the scheduler adapt to different Region sizes

    • Improved the priority and speed of data recovery when TiKV is down

    • Improve the speed of migrating data off TiKV nodes that are being taken offline

    • Optimize the scheduling policy when TiKV node space is insufficient, to prevent disks from being filled up as much as possible

    • Improve the scheduling efficiency of the balance-leader scheduler

    • Reduce the scheduling overhead of the balance-region scheduler

    • Optimize the execution efficiency of the hot-region scheduler

  • O&M interface and configuration

    • Add TLS support

    • The PD Leader priority can be set

    • Support configuring scheduling policies based on labels

    • Support configuring nodes with a specific label so that Region leaders are not scheduled on them

    • Support manually splitting a Region to handle a hotspot in a single Region

    • Support scattering a specified Region, to manually adjust hotspot Region distribution in some cases

    • Add rules for checking configuration parameters to improve the validity of configuration items

  • Debugging interface

    • Added a Drop Region debugging interface

    • Added interfaces to enumerate PD health status

  • Statistics

    • Add statistics about abnormal Regions

    • Add statistics about the Region isolation level

    • Add scheduling metrics

  • Performance optimization

    • Keep the PD leader and the etcd leader on the same node to improve write performance

    • Optimize Region heartbeat performance to support more than 1 million Regions

TiKV

  • Features

    • Protect key configurations from incorrect modification

    • Support Region Merge [experimental]

    • Add the Raw DeleteRange API

    • Add GetMetric API

    • Add Raw Batch Put, Raw Batch Get, Raw Batch Delete, and Raw Batch Scan

    • Add a Column Family parameter to the Raw KV API to operate on a specific Column Family

    • Coprocessor supports streaming mode and streaming aggregation

    • Supports setting the timeout period for Coprocessor requests

    • The heartbeat packet carries a timestamp

    • You can modify some parameters of RocksDB online, such as block-cache-size

    • Support for configuring the behavior of Coprocessor when it encounters certain errors

    • Support starting in data import mode to reduce write amplification during data import

    • Support manually splitting a Region

    • Improve the data recovery tool tikv-ctl

    • Coprocessor returns more statistics to guide TiDB’s behavior

    • Support the ImportSST API for importing SST files [experimental]

    • New TiKV Importer binary, integrated with TiDB Lightning for fast data import [experimental]

  • Performance

    • Use ReadPool to optimize read performance and increase raw_get, get, and batch_get by 30%

    • Improve metrics performance

    • Notify PD immediately once Raft snapshot processing completes, to speed up scheduling

    • Resolve performance jitter caused by RocksDB disk flushing

    • Improved space reclamation after data deletion

    • Accelerate garbage cleanup during startup

    • Use DeleteFilesInRanges to reduce I/O overhead during copy migration

  • Stability

    • Fix the issue that a gRPC call does not return when the PD leader switches

    • Fix the issue that taking nodes offline is slow due to snapshots

    • Limit the temporary space occupied by migrating replicas

    • Report Regions that have had no Leader for a long time

    • Update Region size statistics promptly based on compaction events

    • Limit the amount of scanned data in a single Scan lock request to prevent timeout

    • Limit memory usage during snapshot receiving to prevent OOM

    • Improve the speed of CI test

    • Fix OOM problems caused by too many snapshots

    • Configure keepalive parameters for gRPC

    • Fix the OOM problem caused by an excessive number of Regions

TiSpark

TiSpark uses a separate version number; the current release is 1.0 GA. TiSpark 1.0 enables distributed computing on data stored in TiDB using Apache Spark.

  • Provides a gRPC communication framework for reading data from TiKV

  • Provides encoding and decoding of TiKV data and the communication protocol

  • Provides computation pushdown, including:

    • Aggregation pushdown

    • Predicate pushdown

    • TopN pushdown

    • Limit pushdown

  • Provides index-related support:

    • Transforms predicates into clustered index ranges

    • Transforms predicates into secondary index ranges

    • Index Only Query optimization

    • Runtime downgrading from index scan to table scan optimization

  • Provides cost-based optimization:

    • Statistics support

    • Index selection

    • Broadcast table cost estimation

  • Supports multiple Spark interfaces (see the example after this list):

    • Spark Shell support

    • ThriftServer/JDBC support

    • Spark-SQL interaction support

    • PySpark Shell support

    • SparkR support
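
A minimal sketch of querying TiDB data through TiSpark's SQL interfaces (for example, the Spark-SQL shell or ThriftServer/JDBC); the table and columns are hypothetical:

```sql
-- Once TiSpark maps TiDB tables into Spark, standard SQL works, and
-- the aggregation, predicate, TopN, and Limit parts of this query are
-- candidates for pushdown to TiKV.
SELECT customer_id, COUNT(*) AS cnt
FROM orders
WHERE order_date >= '2018-01-01'
GROUP BY customer_id
ORDER BY cnt DESC
LIMIT 10;
```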

Today, thanks to the efforts of the community and the PingCAP technical team, TiDB 2.0 has been released. Thanks to the community for their participation and contributions.

As a world-class open source distributed relational database, TiDB is inspired by Google Spanner/F1 and offers core features such as distributed strongly consistent transactions, online elastic horizontal scaling, high availability with automatic fault recovery, and multi-active deployment across data centers. TiDB was launched on GitHub in May 2015, followed by the Alpha release in December 2015, Beta in June 2016, RC1 in December 2016, RC2 in March 2017, RC3 in June 2017, RC4 in August 2017, TiDB 1.0 in October 2017, and 2.0 RC1 in March 2018.