MVCC Time Machine: Time Travel in TiDB! An Interview with the Dodo Revival Team

TiDB is well prepared for all kinds of disasters and failures, but as the saying goes, “natural disasters are easier to avoid than man-made ones.” There are few safeguards against misoperations, bugs that write incorrect data, or even someone deleting the database and running off. Our project was designed to deal with exactly these “unexpected” incidents. It was originally called TiDB Flashback, but we later renamed it “MVCC Time Machine” because the old name felt too plain to reflect what the project does.

— Dodo Revival

At TiDB Hackathon 2021, the Dodo Revival team's project “MVCC Time Machine” made full use of TiDB's MVCC features to strengthen the querying, organizing, and recovery of MVCC data, making problems faster to resolve, and won third prize in the competition.

A very smart and lightweight implementation. Very nice; I'm looking forward to seeing it released soon.

— Judge Feng Guangpu

At certain moments, this project could be a lifesaver for operations engineers. It solves a real operations problem neatly through SQL. Even better, the project introduces a multi-SafePoint mechanism, which allows periodic global snapshots of a TiDB cluster to serve as fast, lightweight logical backups.

— Judge Tang Liu

About the project

Natural disasters are easier to avoid than man-made ones

For a distributed database, comprehensive high-availability and disaster recovery mechanisms are arguably among TiDB's core features. In production, however, disaster recovery in theory and in practice can be very different. Unpredictable hardware failures and power outages caused by natural disasters are bad enough, but accidents such as mistaking a production environment for a test environment, hacker exploits, and business bugs that write bad data may happen even more often.

One team member, @Disking, used to work at a game company and knows this kind of human error well: a bug in code written by business colleagues was exploited by players with ulterior motives, who gained large improper profits within a few hours of a release. At that point, the game data had to be rolled back to a specific point in time; in other words, what was needed was a “time machine.” Rolling data back by hand, and the data lost in the process, can cause a team a great deal of trouble.

For physical failures, TiDB already provides solutions such as the BR tool and multi-datacenter deployments, but the risk of incorrect data being written by an rm -rf /* or an “intern misoperation” is far harder to control, and this is a big missing piece of the puzzle for TiDB.

Make MVCC Great Again!

Because TiDB transactions are based on MVCC, old versions of data are retained for a while, so in theory the man-made disasters described above can be recovered from manually. However, the existing and planned features are relatively weak:

- When data is corrupted, it has to be investigated, but currently you can only specify a TS to read data as of a single point in time, so viewing the change history of a record is cumbersome.
- RECOVER TABLE only recovers from DDL operations such as DROP and TRUNCATE, not from DML operations.
- Data older than the GC safe point cannot be recovered, and keeping data for a long time takes too much space.
- Restoring data requires dumping it first and then writing it back, which is too slow.

Making full use of MVCC features and strengthening the ability to query, organize, and recover MVCC data can therefore make problems much faster to solve. MVCC can serve not only for transaction isolation over a short window but also as a cold backup, which saves space and recovers data faster than an external backup.

@Disking actually had this idea back at the second Hackathon, inspired by Oracle's Flashback feature. He posted it in a developer discussion group but got no response, and he didn't have the energy to build a team himself. It may sound like a cliché, but an idea this interesting and useful deserved to be realized someday, and this Hackathon was a good opportunity to do it.

Thus began the Dodo Revival.

The team used 20 dodo conversations for the demo

About the game

How do you make a time machine?

Who controls the past controls the future: who controls the present controls the past.

— George Orwell, 1984

In general, the project does some (at first glance) fancy things around TiDB's MVCC mechanism, such as querying the MVCC history of table records, tampering with MVCC records, and performing Flashback operations based on MVCC records.

The core, and most practical, part of the whole project is the combination of GC Savepoint and Flashback. By taking periodic snapshots and using TiDB's MVCC mechanism to keep a “cold backup” inside TiDB's own storage, it can be a lifesaver at critical moments. After all, a single SQL statement is enough to flash back an entire table's data.

The whole project is divided into three relatively independent modules:

1. MVCC Query in SQL -> Manipulate the past

- Following the implementation of _tidb_rowid, we added two virtual columns: _tidb_mvcc_ts and _tidb_mvcc_op.
- When the virtual columns are queried, TiDB sends its request to TiKV with a flag indicating that the MVCC virtual columns are needed.
- We modified TiKV's MVCC read logic so that when the virtual columns are requested, it scans all versions instead of only the latest one. It then fills in the virtual columns for each row: _tidb_mvcc_ts is the commit_ts of the transaction, and _tidb_mvcc_op is the transaction's operation type, which can be PUT or DELETE.
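To make the virtual-column idea concrete, here is a minimal Python sketch under simplifying assumptions: an in-memory dict stands in for TiKV's MVCC storage, each key holds a list of versions with a commit timestamp and an operation type, and Percolator details (locks, start_ts, rollback records) are ignored. The names ToyMVCCStore, scan, and with_mvcc_columns are hypothetical and are not TiDB or TiKV APIs. A normal read returns only the newest visible version, while a read with the flag set returns every visible version annotated with _tidb_mvcc_ts and _tidb_mvcc_op.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

PUT, DELETE = "PUT", "DELETE"


@dataclass(frozen=True)
class Version:
    commit_ts: int
    op: str               # PUT or DELETE
    value: Optional[str]  # None for DELETE


class ToyMVCCStore:
    """Hypothetical in-memory stand-in for TiKV's MVCC storage."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Version]] = {}

    def write(self, key: str, commit_ts: int, op: str, value: Optional[str] = None) -> None:
        versions = self.versions.setdefault(key, [])
        versions.append(Version(commit_ts, op, value))
        # Keep versions newest first, so the first visible one is the latest.
        versions.sort(key=lambda v: v.commit_ts, reverse=True)

    def scan(self, read_ts: int, with_mvcc_columns: bool = False) -> List[dict]:
        rows = []
        for key in sorted(self.versions):
            visible = [v for v in self.versions[key] if v.commit_ts <= read_ts]
            if not with_mvcc_columns:
                # Normal snapshot read: only the newest visible version counts,
                # and a DELETE hides the row entirely.
                if visible and visible[0].op == PUT:
                    rows.append({"key": key, "value": visible[0].value})
            else:
                # Virtual-column read: one row per version, annotated with
                # that version's commit_ts and operation type.
                for v in visible:
                    rows.append({
                        "key": key,
                        "value": v.value,
                        "_tidb_mvcc_ts": v.commit_ts,
                        "_tidb_mvcc_op": v.op,
                    })
        return rows


store = ToyMVCCStore()
store.write("dodo", 10, PUT, "time Dodo")
store.write("dodo", 20, PUT, "changing Dodo")
print(store.scan(read_ts=30))
print(store.scan(read_ts=30, with_mvcc_columns=True))
```

The only difference between the two reads is whether the scan stops at the newest visible version, which mirrors the "scan all versions when the flag is set" change described above.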

Here is a demonstration of querying MVCC records in all sorts of ways!

You can even tamper with an MVCC record.

2. GC Savepoint -> Master the present

- We added a gc_savepoint system table, which can be managed (inserted, deleted, updated, and queried) through SQL.
- During GC, the data in the gc_savepoint table is stored in PD together with the original gc_safepoint.
- We modified the GC logic to take gc_savepoint into account when reclaiming data. There are two kinds of GC, traditional GC and compaction GC, and only one of them was adapted. After setting gc_save_point_interval = "5m", MVCC records older than gc_safe_point, which would otherwise have been reclaimed, retain one version every 5 minutes.
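The savepoint rule can be sketched per key as follows. This is a hypothetical Python illustration, not the team's TiKV code: timestamps are plain integers (think minutes) instead of real TSOs, savepoints are passed in as an explicit list, savepoints_below is an illustrative helper, and only per-key traditional GC is modeled. For the safe point and for each savepoint, the version visible at that timestamp is kept; everything else at or below the safe point is dropped, and trailing DELETE markers that no longer hide anything are trimmed.

```python
from typing import List, Optional, Tuple

PUT, DELETE = "PUT", "DELETE"
VersionRec = Tuple[int, str]  # (commit_ts, op)


def visible_at(newest_first: List[VersionRec], ts: int) -> Optional[VersionRec]:
    """Return the newest version with commit_ts <= ts, or None."""
    return next((v for v in newest_first if v[0] <= ts), None)


def gc_key(versions: List[VersionRec], safe_point: int,
           savepoints: List[int]) -> List[VersionRec]:
    """Return the versions of one key that survive GC, newest first."""
    newest_first = sorted(versions, reverse=True)
    # Versions newer than the safe point are never touched by GC.
    keep = {v for v in newest_first if v[0] > safe_point}
    # Keep the snapshot visible at the safe point and at every savepoint.
    for ts in [safe_point, *savepoints]:
        v = visible_at(newest_first, ts)
        if v is not None:
            keep.add(v)
    kept = sorted(keep, reverse=True)
    # A DELETE with no older surviving version hides nothing, so drop it.
    while kept and kept[-1][1] == DELETE and kept[-1][0] <= safe_point:
        kept.pop()
    return kept


def savepoints_below(safe_point: int, interval: int) -> List[int]:
    """Hypothetical helper: one savepoint every `interval` units below the safe point."""
    return list(range(interval, safe_point, interval))


history = [(5, PUT), (12, PUT), (14, PUT), (21, PUT), (33, PUT)]
print(gc_key(history, safe_point=30, savepoints=[]))
print(gc_key(history, safe_point=30, savepoints=savepoints_below(30, 10)))
```

Keeping the version *visible at* each savepoint, rather than only versions written exactly at it, is what turns every savepoint into a consistent snapshot that a later read at that timestamp can still see. The DELETE markers are kept when an older PUT survives, so a read at the safe point still sees the row as deleted.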

3. Subsecond Flashback -> Look to the future

- We added a flashback table ts SQL statement that specifies the table whose data should be restored. The table is restored to the latest version whose timestamp does not exceed ts.
- The time range is written into the table schema and a DDL operation is triggered. The operation succeeds once DDL synchronization completes.
- When TiDB sends requests to TiKV, it includes the TS ranges that should be ignored.
- TiKV's MVCC read logic is modified to skip versions that fall within the specified ranges.
- When a TS range falls outside the range retained by GC, it needs to be cleaned up.

Combined with the MVCC query above, we can see that the “changing Dodo” in the table data was still the “time Dodo” before a certain point in time. With a Flashback operation, we successfully changed the data back to its previous state and brought the “time Dodo” back.
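The Flashback steps above can be read as a pure metadata change: instead of rewriting data, the statement records a commit_ts range that reads must skip. The following Python sketch assumes that reading; TableSchema, ignored_ranges, flashback_table, and read are hypothetical names, timestamps are plain integers, and the GC cleanup of expired ranges is not modeled. A flashback to target_ts issued at current_ts hides every version committed in (target_ts, current_ts], so reads fall through to the older version, while writes committed after the flashback stay visible.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

VersionRec = Tuple[int, str, Optional[str]]  # (commit_ts, op, value)


@dataclass
class TableSchema:
    name: str
    # Hypothetical: half-open (lo, hi] commit_ts ranges that reads must skip.
    ignored_ranges: List[Tuple[int, int]] = field(default_factory=list)


def flashback_table(schema: TableSchema, target_ts: int, current_ts: int) -> None:
    """Hide every version committed in (target_ts, current_ts].

    Only metadata changes; no row data is rewritten.
    """
    schema.ignored_ranges.append((target_ts, current_ts))


def is_ignored(commit_ts: int, ranges: List[Tuple[int, int]]) -> bool:
    return any(lo < commit_ts <= hi for lo, hi in ranges)


def read(versions: List[VersionRec], read_ts: int, schema: TableSchema) -> Optional[str]:
    """Snapshot read that skips versions inside any ignored range."""
    for commit_ts, op, value in sorted(versions, key=lambda v: v[0], reverse=True):
        if commit_ts > read_ts or is_ignored(commit_ts, schema.ignored_ranges):
            continue
        return value if op == "PUT" else None
    return None


rows = [(10, "PUT", "time Dodo"), (20, "PUT", "changing Dodo")]
schema = TableSchema("dodo")
print(read(rows, 30, schema))
flashback_table(schema, target_ts=15, current_ts=30)
print(read(rows, 35, schema))
```

Because only the schema changes, the cost in this model does not grow with table size, which fits the "subsecond" framing. Undoing a Flashback here would simply mean removing its range.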

In a real disaster recovery scenario, if we accidentally modify a few rows in a table, or even wipe out an entire table by mistake, we can use Flashback SQL to restore the data to any MVCC version that is still retained.

Future

Currently, only TableScan has been implemented in the demo; the adaptation work for IndexScan and point queries has not been done yet. Some of TiDB's ecosystem tools query data while bypassing the SQL layer, so compatibility with them is another consideration for the next step.

In addition, by combining Flashback operations more deeply with MVCC Query, much more becomes possible, such as viewing Flashback records, undoing Flashback operations, and modifying Flashback records.

About the players

How did you find your way into the world of TiDB?

@JMPotato recently finished his studies and is now an R&D engineer at PingCAP. @RinChanNOW, his undergraduate roommate, also joined the project. Another of their roommates worked as an intern in ByteDance's distributed systems development department.

— A whole dorm of infrastructure software engineers!

For many students, becoming an app developer or an algorithm engineer, or entering the AI industry, is the more mainstream choice. So why did all of them go into infrastructure software?

@JMPotato said he was the one who started it. In late 2019, while listening to a podcast, he heard Dongxu talk about distributed databases; at the time he didn't know about PingCAP. Later, while learning about Raft, he came across TiKV and TiDB and gradually realized that they were PingCAP products. PingCAP happened to be recruiting interns then, and he was fascinated. After a lot of preparation, he interviewed and began an internship at the company, where he has been ever since.

@RinChanNOW also said that @JMPotato's internship opened up a whole new world for the entire dorm: “I still remember when we implemented a simple Raft protocol together. From then on, I could feel the wonder of distributed systems. It wasn't about passively following the rat race, but a genuine love and yearning from the heart.”

How do you quickly join a Hackathon and get to work without knowing TiDB?

Although @RinChanNOW had studied distributed systems before, he had no hands-on experience with TiDB from school or work. “As an outside developer, how do I get involved in a Hackathon about TiDB?” This question is an important reason many students hold back from Hackathons, but @RinChanNOW had no such worries. On the one hand, TiDB has rich documentation, which makes systematic learning easy; on the other hand, TiDB has a very active community. Whether on AskTUG, TiDB Internals, or GitHub, you can meet many like-minded people who are willing to help new TiDBers quickly integrate into the community.

RinChanNOW also shared some specific learning experiences:

Besides setting up the development environments for TiDB and TiKV, part of the preparation was understanding the code structure and data flow of TiDB and TiKV, that is, getting a general picture of their source code. This was also the most time-consuming part: I didn't write much code, but it took a long time to write it. I studied the source code with the help of the PingCAP blog:

The Select process:
- TiDB Source Code Reading Series (3): The Life of a SQL Statement
- TiDB Source Code Reading Series (6): Select Statement Overview
How a query is pushed down to TiKV and executed:
- TiKV Source Code Analysis Series (14): Coprocessor Overview
- TiKV Source Code Analysis Series (16): TiKV Coprocessor Executor Source Code Analysis
The Insert process:
- TiDB Source Code Reading Series (4): Insert Statement Overview
- TiDB Source Code Reading Series (16): INSERT Statement Details
The execution flow of a SQL statement:
- Prepare/Execute request processing
TiKV MVCC read and write flow:
- TiKV Source Code Analysis Series (13): MVCC Data Reading

Play for Real: What’s different about TiDB Hackathon?

Besides TiDB Hackathon, @JMPotato and @RinChanNOW have taken part in other programming competitions to varying degrees. They agree that many programming contests are student-oriented, with assignments that have clear goals and even standard answers; they are more like tests of programming ability, perhaps to see who can produce the most elegant and efficient implementation. TiDB Hackathon is a completely different experience. It is more practical: there is no set topic, and it is an adventure into the unknown. Creativity and thinking matter more than the code itself.

However, because of the pandemic, TiDB Hackathon over the past two years, though lively, lost some of its atmosphere. If there is a chance, we look forward to next year's Hackathon, with all the contestants gathered in the same space to talk in person and take on a real 48-hour sprint of intensive development.