Welcome toTencent Cloud + community, get more Tencent mass technology practice dry goods oh ~
This article is published in Tencent Cloud + community by Tencent Technology Engineering official number
A classic case
Delta extraction, delta calculation, and so on are classic examples of T-TDSQL. The following takes incremental calculation as an example to analyze the typical application of T-TDSQL in Tencent financial business.
The incremental calculation
Based on the characteristics of T-TDSQL all-temporal data storage, we can conveniently conduct incremental data query, extraction and calculation.
For data incremental extraction/calculation of a single table [1], T-TDSQL first obtains the incremental data set corresponding to the given snapshot range through the snapshot differential read method, and then calls the built-in aggregation functions of the system in combination according to user-defined calculation rules, such as SUM, AVG, GROUP BY, etc., to achieve the function of incremental calculation. Data from any time period in history can be “incrementally extracted” using incremental computing techniques.
For multi-table incremental calculation, T-TDSQL supports incremental calculation scenarios through “snapshot differential join”. That is, two snapshot difference sets R and S are obtained first, and then the two tables are merged through join operation, and then the aggregation function is used to complete the calculation.
This section introduces the principle and practical application of incremental calculation through the account reconciliation service commonly used in Internet finance.
Check for business
The Internet financial industry has a high requirement on the accuracy of data, and in the Internet environment, data inconsistency or data errors occur from time to time. Therefore, it is very important to reduce the risk caused by data errors such as account balance through reconciliation.
In Tencent billing business, the account balance table (User) and account statement table (water) are compared on an hourly/daily basis to find the inconsistency between account balance and transaction statement, so as to correct the wrong transaction in time.
Traditional reconciliation is performed in a fixed time period (e.g., minute/hour/day). For the reconciliation of transactions on April 11, 2018, the initial account balance statement and ending account balance statement of April 11, as well as the transaction statement of that day, shall be obtained first. Then group the account table by user ID and calculate the ending balance of each user minus the beginning balance, denoted as result A; group the flow water table by user ID and sum the transaction amount by group, denoted as result B; Finally, the result A of each user is compared with result B. If A=B, there is no problem with the transaction; otherwise, there is an error in the transaction of the user on that day.
There are three main problems in checking accounts according to a fixed period:
-
Poor timeliness: Error transactions cannot be immediately detected and reported, and errors can be discovered after a fixed period of time.
-
Inaccurate reconciliation: positioning errors transaction is more complex. For example, if a user within a day of deals, including a appeared a mistake, not directly positioning by means of reconciliation by day to the concrete which trade there is an error, but only to the user level, namely still need artificial participation, the day will be the wrong user transactions are confirmed again, to find the specific error transactions.
-
Inflexible reconciliation: the reconciliation is conducted in a fixed period of time. If the unit is days, the reconciliation can only be carried out after the incremental data of the day is settled. If there is a demand for cross-day reconciliation (such as yesterday afternoon to this morning), the data used in the reconciliation can only be carried out across multiple tables, which may change the process of the reconciliation business.
Check for optimization
Based on the data model and incremental calculation method proposed in this paper, the problem of daily account reconciliation can be solved well. Combined with the example in 3.1.2, we give the practical application of incremental calculation in the reconciliation business of Internet finance.
T-tdsql can accurately compare account balance table (USER) and account flow table (water) based on the function of incremental calculation, and carry out fine-grained reconciliation of flow level, so as to find transaction errors immediately, and can immediately locate the wrong transaction, eliminating the complex process of wrong transaction positioning.
The core idea of the optimized reconciliation is: general ledger summary, fine ledger.
The effect of reconciliation after optimization is: fast reconciliation of general ledger, accurate sub-ledger, no time limit, arbitrary reconciliation [1].
Reconciliation Step 1 — General Ledger reconciliation: Firstly, all the data blocks of the account table within the reconciliating period [s_start,s_stop] are read, and the data in each data block is confirmed by a formula similar to the traditional reconciliating method, that is, “Total ending balance – Total beginning balance = total transaction changes” trial calculation [2], and the total beginning balance represents the total balance at s_start. The total ending balance represents the total balance when S_stop, and the total transaction change represents the flow generated by each account. If there is an unbalance in the general ledger within the data block, it means that there is a breakdown error, so accurate reconciliation as described in steps 2 and 3 should be carried out.
Reconciliation Step 2 – Accurate reconciliation – Reconciliation process: Execute the following SQL to “snapshot differential connection” between the account balance block and the corresponding account flow block, and each record in the returned result set will contain {pre-transaction balance, post-transaction balance, transaction change}.
The corresponding execution effect is shown in Figure 13:
SELECT * FROM
(
User READVIEW START s_start TOs_stop as A ORDER BY User_id, Init_trx_id DESC
FULL OUTER JOIN
User READVIEW STARTs_start TO s_stop as B ORDER BY User_id, Init_trx_id DESC
ON A.trx_id= B.init_trx_id
)
FULL OUTER JOIN
Water READVIEW START s_start TO s_stop as C ORDER BYUser_id, Trx_id DESC
ON C.trx_id = A.trx_id
Copy the code
FIG. 13 Schematic diagram of accurate account reconciliation
Reconciliation Step 3 — Accurate reconciliation — Accurate meaning: conduct a trial calculation [3] (after-before =Change) of “post-transaction balance – pre-transaction balance = transaction Change” for each returned record in the result of Step 2 to confirm whether the transaction is wrong. If there is a situation that does not satisfy this equation, it is a wrong transaction.
Error transaction is mainly divided into account table error and flow table error. For example, in FIG. 13, the second tuple in the result set does not satisfy the trial calculation formula, indicating that the transaction with flow ID 2 has a wrong account balance update or the transaction change value in the flow record is wrong. In the fourth tuple of the result set, the value of the Change field is NULL, indicating that the stream is missing for this transaction. Through the following table, we summarize various errors, which need to be reported to the police in the reconciliation process.
Table 2 Comparison table of accurate reconciliation errors
Before | After | Change | Check for the results |
---|---|---|---|
M1 | M2 | M2-M1 | correct |
M1 | M2 | NULL | Lack of water |
M1 | M2 | (M2 – M1) ‘ | Flow record error |
NULL | NULL | M3 | Water mistakenly increase |
M1 | M2 ‘ | M2-M1 | Account table update error |
M1 | NULL | M2-M1 | The account table is not updated |
NULL | M2 | NULL | Tuples are mistakenly added to the account table |
security
There is a logical structure “UNDO SEGMENT” in T-TDSQL, which is used to UNDO data, i.e. store the information required by the result of reversed DML statement. As long as a transaction changes the data, the original data before the update will be written into an UNDO SEGMENT.
T-tdsql implements all-temporal data management, based on historical state and transitional state data stored in “UNDO SEGMENT”, realizing the data flashback function at any time point in history.
Online flashback
T-tdsql provides online data flashback, you can query the database status of a certain period of time in the past.
Reading the data state of the database at a point in time in the past (the historical state is stored rather than cleaned up) is based on the three snapshot reads mentioned in Section 4.1.1. This is how flashbacks are implemented.
Based on this principle, a variety of online flashback functions are realized, including: flashback query, flashback delete, flashback file.
-
Flash query: You can query the database status at a time point in the past and roll back a table to a time point in the past.
-
Flash Drop: A flash Drop can restore a table that has been dropped. The corresponding index will also be restored (index restoration is done through reconstruction).
-
Flashback: Flashback data archiving allows the table to have a rollback to any point in the past.
Business analysis
The dual-temporal, all-state, LineAge characteristics of temporal data, and the feature of identifying operations on data items give data items 5W potential.
5 w is to point to:
-
Why: The goal of data mining and analysis.
-
Object: What operations were performed on the data item, and What was the cause of the data change.
-
Location: The location Where the data item is stored.
-
Time: A bi-temporal property.
-
Person (Who) : The user is associated with the data item, and the transaction attribute item is associated with the user UID identifier.
With the potential of this 5W, data can be analyzed with unlimited imagination based on data items and their history, using AI technology and full data mining. This is a new world of data analysis.
Fine-grained AI tuning of systems with peak (predicted peak time pressure) and resource (automatic resource rationing) features, based on historical data loads, etc.
Multidimensional analysis of the data mining technology, which can be organized in the form of multidimensional data roll-up and drill-down, slice and dice, rotation and other analysis of the operation, in order to analyze the data, analysts and policymakers can from multiple angles, multiple profile observation data in the database, so as to understand the information contained in the data and the connotation, the perspective of the future. For example, rewinding operations are performed to evaluate the overall situation, and drilling operations are performed to trace operational details, laying the foundation for system monitoring and tuning.
The data again
T-tdsql provides data replay capability.
Historical state data storage, the whole time data with transaction time, based on these two points, can analyze the data evolution and transaction at a certain point in time, so as to deduce the evolution of the database on the time axis.
The significance of data replay lies in inferring application load and data evolution process from data. T-tdsql can effectively support data replay.
The data analysis
Multi-dimensional analysis of user portrait based on the historical changes of user data can facilitate in-depth analysis of users with the support of historical state data. With the support of bi-temporal, it is convenient to analyze data according to the changes of time and the execution of transactions.
Acknowledgments
This project was initiated in Tencent TEG billing Platform Department. The research content and implementation process were supported and participated by Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education of Renmin University of China and Tencent. We would like to express our thanks to the project participants and supporters.
References
[1] Haixiang Li et al. “EfficientTime-interval Data Extraction based on MVCC-based RDBMS”. World Wide Web Journal. 2018, Pp. 922-933.
[2] Jiang Xiaoyi, Jiang Xuezhong, ZHOU Yunxuan, Research Progress of Temporal Database, Computer Engineering and Application 2005
[3] Dharavath Ramesh, Chiranjeev Kumar: A scalablegeneric transaction model scenario for distributed NoSQL databases. Journal ofSystems and Software 101: 43-58 (2015).
[4] Tang Yong, Introduction to Temporal Database 2004
[5] Haixiang Li, Yi Feng, PengchengFan. The Art of Database Transaction Processiong: Transaction Management andConcurrency Control. First edition. Beijing. China Machine Press. 2017-10-01
[6] David B. Lomet, Roger S. Barga, Mohamed F.Mokbel, German Shegalov, Rui Wang, Yunyue Zhu: Transaction Time Support Insidea Database Engine. ICDE 2006: 35
[1] No time limit, arbitrary reconciliation: the snapshot difference can be specified in the SQL statement of reconciliation, and the table name involved in the FROM clause does not change, that is, the data source used for reconciliation is not changed, so the reconciliation process will not be affected.
[2] Are one of the procedures in an accounting program, which is simply defined as checking the debit and credit totals of all transaction entries in the diary for errors. However, when doing a trial check, this action should be done immediately after each transaction is recorded.
Question and answer
MySQL connection mode of the cloud database
reading
TDSQL all – time database system — core technology
10 common MySQL high availability solution selection interpretation
Use Zipkin to track the Mysql database call chain
Has been authorized by the author tencent cloud + community release, the original link: https://cloud.tencent.com/developer/article/1151509?fromSource=waitui
Welcome toTencent Cloud + communityOr pay attention to the wechat public account (QcloudCommunity), the first time to get more massive technical practice dry goods oh ~