Introduce: Li Haixiang, net name “na Hai Lanlan”, tecent finance cloud database technology expert. Master of Engineering, School of Information, Renmin University of China. He is the author of the art of Database Transaction Processing: Transaction Management and Concurrent access Control and the Art of the Database Query Optimizer: Principle Parsing and SQL Performance Optimization.

On May 11, 2018, Tencent TDSQL team brought Tencent’s latest database core technology to China Database Technology Conference DTCC: TDSQL original concept of full-state data and visibility judgment algorithm based on historical state data.

Li Haixiang, an expert engineer of Tencent, shared the technical content on DTCC with the theme of “Enabling data — Cutting-edge technology of Tencent TDSQL distributed financial database”. In this sharing, based on the core technology of database transaction processing, concurrent access control technology, TDSQL originally proposed the concept of all-state data and visibility judgment algorithm based on historical state data, and based on this to achieve all-temporal database.

The following is the main content of this share. It mainly includes: TDSQL background introduction, T-TDSQL original technology starting point, T-TDSQL core technology points, T-TDSQL typical application, T-TDSQL core concept, project thanks six parts.

First, TDSQL overview

TDSQL is a distributed database that has been running stably for ten years. It not only supports the billing business of Tencent, but also runs stably and efficiently in the core business system of webank and other financial units for four years. In recent years, TDSQL has made continuous progress in the technical level and developed many new features, such as multi-level partitioning, hotspot update, hidden primary key, distributed transaction, etc. It not only supports transactional database applications, but also moves forward to Spanner architecture in the architecture, which is a worthy NewSQL system.


TDSQL distributed transaction processing technology has made great progress, not only in the realization of 2PC based on XA to support atomic submission of distributed transactions, but also on the basis of MVCC technology, made innovation, making TDSQL transaction processing technology based on original technology and continuous development.

The starting point of the original technology

TDSQL’s original technology is not technology for technology’s sake, but innovation based on business needs to solve business problems.

Tencent’s billing system is the world’s leading financial cloud billing system. This system includes SAAS, PAAS and IAAS. At the SAAS level, including Mi Master, cloud store, TDSQL and other systems.

TDSQL has a custody account of nearly 28 billion yuan. Mi Maestro relies on TDSQL for financial transactions. Tencent recharge and its related partners have a daily flow of more than 15 billion and a daily transaction volume of more than 10 billion. Financial data are used in TDSQL database for settlement, reconciliation, audit, risk control data analysis, user portrait construction and other services. Such as king of Glory game voucher reconciliation business, user account consumption change audit and risk control business.

To perform activities such as reconciliation, auditing, etc., there are two sources of data. Part of the data is derived from log data from different systems (relational databases or NoSQL systems), called pipelining logs. However, the daily log flow data of one such system is nearly 100 GIGABytes and the incremental data increases rapidly from the trend. In addition, some data are tabulated by time in TDSQL, and it is necessary to use flow log for reconciliation calculation of data in tabulated by time after a period of time.

Reconciliation is mainly to solve several abnormal situations:

1. The system has bugs or does not perform as expected when a fault occurs. This may lead to the successful delivery of the deduction is not successful, or the success of the deduction is not shipped. Such as Tencent video VIP management system recharge, delivery.

2. Avoid hacker/internal risks. For example, illegal personnel bypass the business system to recharge their own fraudulent behavior.

There are many kinds of such reconciliation services, and the format of the log flow is not exactly the same for different applications. TDSQL managed accounts need to regularly perform data consistency reconciliation checks for thousands of multi-level businesses and accounts.

From a technical point of view, there are four problems:

1. Complex application development: To use service logs, the service system needs to continuously generate log information, and then consume computing resources to parse different log formats and store the log information to the analysis system. This brings about the burden of development and the waste of resources.

2. Data logic fragmentation: the table is divided by time in TDSQL, which can only be settled in a certain time period, rather than flexible and convenient calculation. For example, when calculating data in any time period, the table by time period physically disconnects the logical continuity of data by time, and requires several specific tables for calculation.

3. Loss of real-time features: The above two problems implicitly mean that the calculated data needs to be imported into a new analysis system for calculation. The process of exporting/importing data also consumes resources and time, making it difficult for the analysis system to have real-time computing features.

4. Complex data management: In addition, logs and other information are historical data and need to be saved for a long time. Tencent generates, stores, analyzes and manages more than 15 billion stream logs in different formats every day, which is a huge challenge.

Modern database systems only retain the current value of the data, and the historical data is discarded due to storage costs and other reasons. As an important asset, data, whether current data or data that existed in history, are of great value. Therefore, historical data storage, to be analyzed, to be mined, to be used repeatedly, is the demand of the current Internet and other enterprises. Especially financial historical state data, because of security, need to be calculated for many times, in Tencent’s billing business, with temporal attributes of the data to be managed increasingly strong demand.

Based on the above reasons, Tencent has developed the temporal database T-TDSQL based on TDSQL relational database, which can uniformly manage massive temporal data and current data by the database system, thus solving the problems in the above four businesses.


The solution of business pain points is based on the characteristics of the database and business scenarios for in-depth analysis and thinking to solve.


Because data is valuable, the TDSQL team believes that historical data is valuable. This is the core value of TDSQL temporal database T-TDSQL. Therefore, we give TDSQL a new understanding of data.

TDSQL think:

State attributes of data that identify the life cycle trajectory of data. The life cycle of data is divided into three stages, and each stage depicts different state attributes of data to identify the state in the life cycle trajectory of data.

1. Current State: indicates the latest version of the data item and the data in the Current phase. The state of the data in the current phase is called the current state.

2. Historical state: A Historical state of a data item. Its value is the old value, not the current value. The state of data in the historical phase is called the historical state. There can be more than one historical state of a data item, which reflects the process of data state change. Data in the historical state can only be read and cannot be modified or deleted.

Transitional State: it is neither the latest version nor the historical version of the data item, and is in the process of changing from the current State to the historical State. Data in the transition state is called half – decline data.

These three states, covering the life cycle of a data item, are collectively known as full-state data, or full-state data. Under THE MVCC mechanism, all three states of data exist. Under the non-MVCC mechanism, data exists only in historical and current states.

1. Current state: Under MVCC or blocking concurrent access control mechanism, the new value of the data after the transaction is committed is in the current state.

2. Historical state: Under the MVCC mechanism, the data generated before the smallest transaction in the current active transaction list is in the historical state. In the blocking concurrent access control mechanism, after a transaction is committed, the value of the data before the transaction is changed to the value of the historical state, that is, the old value of the data item is in the historical state.

Transition state: MVCC mechanism, be read version still active transaction (the latest related transaction) is in use, because of the latest related affairs changed the value of the data item, its latest values are already in a current state, be read to the value of the relative current state is in a state history, so its data state between the current and historical states, so called transition state.

The bi-temporal attributes of data are valid time attribute and transaction time attribute respectively.

The valid time attribute represents the time attribute of the object represented by the data. For example, the start and end time of Kate’s middle school is 2000-09-01 to 2003-07-30, and the start and end time of university is 2003-09-01 to 2007-07-30, the time here is the valid time.

The transaction time attribute represents the time at which a certain state of the data occurred. Data has temporal properties, that is, when and what the database system did. An operation is encapsulated within the database system as a transaction, and transactions have atomicity. Therefore, we use transaction flags to identify the transactional temporal properties of a data.

Formally, the valid time attributes and transaction time attributes are represented by ordinary user-defined fields in the data model, but are described by specific keywords for constraint checking and assignment by the database engine.

The TDSQL team wanted to build a database system that would address these issues by providing the following features:


Therefore, T-TDSQL temporal database based on TDSQL has the following features, which can cover the four aspects of bi-temporal data application, data security, data analysis and simplified application development:


Three, t-TDSQL core technology

One of the core technologies of T-TDSQL is the definition of data model. The combination of full-state data model and bi-temporal data model creates T-TDSQL.

1. Data model is the core technology of T-TDSQL

In this model, all-state data is represented in the historical version of the data item; Temporal data has not only transactional tense but also valid time tense. The historical state data of full-state data can not only trace the Operation time of the database system, but also trace the Operation type. In the “Operation” column in the figure below, we can know whether the DML Operation on the data item has been UPDATE, INSERT or DELETE. This is a very cool feature, which allows users to realize their dream of “everything back and traceable” in the T-TDSQL system.


2. T-tdsql core technology ii, historical data dump timing

The storage time of historical data is another core technology of T-TDSQL.

T-tdsql uses the concept of full-state data, cleverly utilizing MySQL’s rollback segment and Purge mechanism to implement historical state data dump. A schematic diagram is as follows:


3. T-tdsql core technology ii, consistency snapshot point

In PostgreSQL, if you implement this technology, you can consider combining the characteristics of multiple versions of storage, to achieve the current state data and historical state, transition state storage separation, this needs to modify the existing data visibility judgment algorithm, page storage format, data merge timing, buffer read and write, heap construction, etc. It is more important to implement new data consistency snapshot points.

T-tdsql implements the construction of a new data consistency snapshot point based on MySQL, so it can obtain data in any state at any time period (including the time when it happened in history).



As an original technology, the core technology and thinking of T-TDSQL, The paper has been published in the World Wide Web Journal in efficientTime-Interval Data Extraction in MVCC-based RDBMS. For details, see:

https://link.springer.com/article/10.1007/s11280-018-0552-7

Four, typical application of T-TDSQL

Temporal information processing has become the key technology of many new generation databases and information systems, especially in the financial field, e-commerce, data warehouse, cadastral management system, land use planning system, geographic information system plays an increasingly important role.

In e-commerce and financial business systems, there are a large number of data such as income, expenditure and balance, and with the development of business, new data are constantly generated, which will play an important role in reconciliation, audit, user portrait and other businesses. By implementing the transaction temporal function, T-TDSQL can obtain these data quickly, finely, and in real time.

In Internet financial services, account checking is a classic business.

T-tdsql provides a perfect solution for Tencent’s billing reconciliation business.

1. Account checking

The Internet financial industry has a high requirement on the accuracy of data, and in the Internet environment, data inconsistency or data errors occur from time to time. Therefore, it is very important to reduce the risk caused by data errors such as account balance through reconciliation.

In Tencent billing business, the account balance table (User) and account statement table (water) are compared on an hourly/daily basis to find the inconsistency between account balance and transaction statement, so as to correct the wrong transaction in time.

Traditional reconciliation is performed in a fixed time period (e.g., minute/hour/day). For the reconciliation of transactions on April 11, 2018, the initial account balance statement and ending account balance statement of April 11, as well as the transaction statement of that day, shall be obtained first. Then group the account table by user ID and calculate the ending balance of each user minus the beginning balance, denoted as result A; group the flow water table by user ID and sum the transaction amount by group, denoted as result B; Finally, the result A of each user is compared with result B. If A=B, there is no problem with the transaction; otherwise, there is an error in the transaction of the user on that day.

There are three main problems in checking accounts according to a fixed period:

1. Poor timeliness: For wrong transactions, they cannot be immediately found and reported back. Errors can only be found after a fixed period of time.

2. Inaccurate account checking: the transaction is complicated due to the wrong positioning. For example, if a user within a day of deals, including a appeared a mistake, not directly positioning by means of reconciliation by day to the concrete which trade there is an error, but only to the user level, namely still need artificial participation, the day will be the wrong user transactions are confirmed again, to find the specific error transactions.

3. Inflexible reconciliation: the reconciliation is carried out in a fixed time period. If the unit is days, the reconciliation can only be carried out after the incremental data of this day has settled.

2. Reconciliation optimization

Based on the data model and incremental calculation method proposed in this paper, the problem of daily account reconciliation can be solved well. Combined with the example in 3.1.2, we give the practical application of incremental calculation in the reconciliation business of Internet finance.

T-tdsql can accurately compare account balance table (USER) and account flow table (water) based on the function of incremental calculation, and carry out fine-grained reconciliation of flow level, so as to find transaction errors immediately, and can immediately locate the wrong transaction, eliminating the complex process of wrong transaction positioning.

The core idea of the optimized reconciliation is: general ledger summary, fine ledger.

The effect of reconciliation after optimization is: fast reconciliation of general ledger, accurate sub-ledger, no time limit, arbitrary reconciliation [1].

Reconciliation Step 1 — General Ledger reconciliation: Firstly, all the data blocks of the account table within the reconciliating period [s_start,s_stop] are read, and the data in each data block is confirmed by a formula similar to the traditional reconciliating method, that is, “Total ending balance – Total beginning balance = total transaction changes” trial calculation [2], and the total beginning balance represents the total balance at s_start. The total ending balance represents the total balance when S_stop, and the total transaction change represents the flow generated by each account. If there is an unbalance in the general ledger within the data block, it means that there is a breakdown error, so accurate reconciliation as described in steps 2 and 3 should be carried out.

Reconciliation Step 2 – Accurate reconciliation – Reconciliation process: Execute the following SQL to “snapshot differential connection” between the account balance block and the corresponding account flow block, and each record in the returned result set will contain {pre-transaction balance, post-transaction balance, transaction change}. The corresponding execution renderings are shown below:



Accurate reconciliation diagram

Reconciliation Step 3 — Accurate reconciliation — Accurate meaning: conduct a trial calculation [3] (after-before =Change) of “post-transaction balance – pre-transaction balance = transaction Change” for each returned record in the result of Step 2 to confirm whether the transaction is wrong. If there is a situation that does not satisfy this equation, it is a wrong transaction.

Error transaction is mainly divided into account table error and flow table error. For example, in the figure above, the second tuple in the result set does not satisfy the trial calculation formula, indicating that the transaction with flow ID 2 has a wrong account balance update or the transaction change value in the flow record is wrong. In the fourth tuple of the result set, the value of the Change field is NULL, indicating that the stream is missing for this transaction. Through the table in the figure above, we summarize various error situations, which need to be reported in the reconciliation process.

3. Temporal class application of valid time

Based on t-TDSQL all-temporal core technology, this share also introduces typical applications from the perspective of bi-temporal. As shown in the figure below.



4. Data security applications

Based on the feature of historical status query, T-TDSQL system provides flexible and powerful data security functions in data correction, history tracking and other aspects, which can greatly simplify and speed up audit, reconciliation and other services.

Query the data inserted within a period of time for data statistics and tracking, such as when new accounts are opened and exception records are added.

Querying deleted data plays a significant role in security and statistics, such as recovering deleted data and counting the number of users.

You can query the updated data within a time range to trace the occurrence time and data before the occurrence of data exceptions and rectify data exceptions.

The historical data of all states can be comprehensively queried. In terms of data replay, it can assist post-disaster recovery or be used for offline drills. In terms of data statistics, it is of great benefit to the business of checking accounts because it supports the data calculation of any spatio-temporal nodes; In terms of security, it simplifies tracking and recovery of wrong data and deleted data.

Here are some security examples:


In addition, based on the full temporal data, data replay, more valuable data analysis and mining, automatic tuning of the system using AI technology is possible.

Five, the core concept of T-TDSQL, enabling data

Why is T-TDSQL implementing a temporal database?

What was behind the original technology that drove the T-TDSQL team to create such a full-temporal database system?

These questions, in fact, are more important. Exploring the causes of these problems and striving for originality are the value factors that the TDSQL team values more when it comes to technology sharing.

In the opinion of the TDSQL team, “Data is valuable, historical data is valuable”. In business, mining the value of data is very important, which is also what many people are thinking, and it makes sense to think that any data has value.

Therefore, after thinking about the T-TDSQL project, the view is that “Historical data is valuable. Business is a sword, Technology is only ashield.” So, what is a shield? What is a sword? What is the relationship between shield and sword?

In TDSQL’s view, technology is just a defensive tool to turn dreams into reality. The dream is the vision of technologists who want to solve real problems with a variety of advanced and sophisticated technologies, such as distributed, consistency, snapshot, RDMA, NVM, AI, full data mining, and so on. Business is just an offensive tool for discovering dreams. TDSQL does not advocate the view that business is king, but left hand shield and right hand sword, both hands should be hard. But it is only the left hand shield and the right hand sword, walking in the river’s lake of technology, which can only achieve the dream of the technical people. Behind it, there is no pillar of soul.


And historical data is valuable in (finance/Tencent/Internet/everything…) Business, mining the value of data, is meaningful.

But one step further.

The creation of data is determined by the user and their business, and they are the party that creates the data. Database carries the responsibility of data management, can the database system also participate in the creation of data?

From the TDSQL team’s point of view, the concept of all-temporal is the perfect opportunity for a database system to participate in data creation. The database system assigned transaction tenses to data, event sources in DML operation, and even the association relationship between data (5W and Lineage in the following figure), which made the database system become the creator of data.

This is what drives us, the TDSQL team, both technologically and in the business: the idea of “empowering data”.


Supported by the idea of “enabling data”, T-TDSQL based on TDSQL was born. Empower the data, so that the data can have more value, let the database become the producer of data, participate in the creation of data. As shown in the figure below, to enable data, t-TDSQL from the 5W perspective, data has time (bi-temporal, WHEN), place (stored history table, WHERE), person (user ID, WHO), object (full-state data, WHAT), reason (DML and other operations, WHY) and other elements. Data is no longer just created by the user using the CREATETABLE statement, but contains all kinds of data created by the database system, with the data history in the data life cycle to give it depth and value.


With these, database systems can more actively trace the history of data, deduce the changes of data, and predict the future of data (the world).

Six, thanks

The project was established in Tencent, and the research content and implementation process were supported and participated by The Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education of Renmin University of China and Tencent. We would like to express our thanks to the project participants and supporters.

Hats off to those who have contributed to the project from many perspectives, such as thinking, concept and technology.

[1] No time limit, arbitrary reconciliation: the snapshot difference can be specified in the SQL statement of reconciliation, and the table name involved in the FROM clause does not change, that is, the data source used for reconciliation is not changed, so the reconciliation process will not be affected.

[2] Are one of the procedures in an accounting program, which is simply defined as checking the debit and credit totals of all transaction entries in the diary for errors. However, when doing a trial check, this action should be done immediately after each transaction is recorded.

[3] Are one of the procedures in an accounting program, which is simply defined as checking the debit and credit totals of all transaction entries in the diary for errors. However, when doing a trial check, this action should be done immediately after each transaction is recorded.