Tool and Resource Center

Help developers work more efficiently and provide tools and resources around the developer lifecycle

Developer.aliyun.com/tool?spm=a1…

First, the origin of the story

In the IT circle, “delete library and run” has become a joke programmers often mentioned. It’s a joke, but it shows how important the data in the database is to the business. In 2020, the “Weimeng Event” directly caused the market value of The Hong Kong main board listed weimeng Group to lose more than 1 billion yuan in one day, directly affecting millions of users.

In xiaobian database industry experience for many years, deleting database running events are not often, but due to carelessness caused by the error of deleting data is often seen. Either by mistake, or there are bugs in the released code, resulting in data being mistakenly deleted, although it is careless, but the damage is not small.

On average, every two months, there will be a user like the above, to our duty students to seek help to recover the mistakenly deleted data.

How does PolarDB-X help these careless people recover their lost data quickly and accurately, saving their precarious jobs?

First, we classify the cases of mistakenly deleted data according to the operation type:

  • Row level error deletion, common index: 5 stars

The delete/update statement mistakenly deleted/changed multiple rows of data

  • Table level error deletion, common index: 3 stars

Use drop TABLE to drop a data table

The truncate TABLE statement was used to empty the data table

  • Database level error deletion, common index: 1 star

The drop DATABASE statement was used to delete the database

Polardb-x has created a number of data recovery capabilities to help users quickly recover data for the above different types of data deletion scenarios:

This article, the first in the data recovery series, focuses on PolarDB-X’s SQL flashback feature for row-level error deletion scenarios. Other capabilities will be covered in more detail in subsequent articles.

The scene of the accident

First, let’s start with an actual accidental deletion of data.

Let’s go over the timeline of the accident:

  • T1: DBA Xiaoming maintains an employee table, which records the employee information of the company.
  • T2: Mary resigned for personal reasons, and Xiaoming needed to DELETE Mary’s records, so he executed a DELETE statement into the database. The intention was to DELETE user Mary’s records, but an and statement was omitted due to his poor hands, which resulted in the unexpected deletion of the data of employee Ralph.
  • T3: Business continues, John is deleted, and Tim and Jose are inserted into the table. At this time careless Xiaoming found that the data was deleted by mistake, eager to restore data.

Next, around this data error accident, see how polarDB-X saved the careless Xiao Ming?

3. Existing programs

Before introducing SQL flashback, let’s take a quick look at how mainstream databases deal with row-level data deletion. According to the recovery mode, the recovery can be divided into the following two types:

  • Time between data restoration and error deletion
  • To roll back an incorrect deletion operation

(1) The time between data restoration and error deletion

1. Based on the PITR

Point-in-time Recovery(PITR): As the name implies, using backup files to restore the database to any Point in time in the past. This capability is currently supported by mainstream databases. Although different database PITR implementations are different, the overall idea is the same, as shown in the figure below:

First, the full backup set of the database is relied on to restore the data to the point in time of the previous backup (usually every few days), and then the incremental data change record is relied on to restore the data to the required point in time. Polardb-x’s PITR implementation is in the same direction, but because of distributed transactions, PolarDB-X has done more to ensure data consistency across shards, which will be covered in a future article.

With the ability of PITR, once the data is mistakenly deleted, the most direct idea is to restore the database to the time point before the data is mistakenly deleted by PITR. The advantage of this scheme is that the database can be restored to any time point required by the user, but there are some problems:

  • Long recovery time: The entire database needs to be recovered, which takes a long time. Even if only 100 pieces of data are deleted by mistake, the entire database (or the entire table) needs to be restored in this way.
  • Additional storage space: For data security purposes, PITR usually restores data to a new database rather than overwriting the data in the original database, which requires additional storage space to store the new database data, which can be significant in large data volume scenarios.
  • Some service data is lost: As shown in the preceding example, services continue to read and write data from the database after data is mistakenly deleted. If the database is restored to the time before the deletion, normal service data will be lost after the deletion.

The following figure shows an example of PITR based recovery to the time before data was mistakenly deleted for our accident site. As can be seen from the figure, at the time of recovery at T2, although the mistakenly deleted data was retrieved, the normal business changes in T2 ~ T3 were also lost.

2.Flashback Query

To solve the problem of PITR recovery time, there are many optimization strategies, among which Oracle and PolarDB-X’s Flashback Query function is a representative one.

Oracle’s Flashback Query builds a historical snapshot based on undo information, directly reading the data from the previous mirror, and restoring the data to the point in time before the error deletion. Polardb-x’s Flashback Query implementation is similar in that it uses undo table information to read data from a historical point in time. However, compared to a stand-alone database, we need to consider data consistency between different data slices when we restore to a past point in time. I will not elaborate here.

The following uses Oracle as an example to illustrate the above scenario. Suppose T2 corresponds to: 2021-04-06 19:23:24, then in Oracle, through Flashback Query function, I only need to execute the following SQL to Query the Employee table data at 2021-04-06 19:23:23:

select * from employee 
as of timestamp 
to_timestamp('2021-04-06 19:23:23','YYYY-MM-DD hh24:mi:ss')
Copy the code

Based on the queried data before error deletion, the user can quickly recover the data.

Flashback Query, which is based on Undo information recovery, greatly improves the speed of data recovery and does not require additional storage space. Compared with PITR, the recovery efficiency is higher. But there are two problems with this approach:

  • Some service data is lost: Essentially, the data is restored to the point in time before incorrect deletion. This problem still exists.
  • Timeliness problem: The Flashback uses undo data in the undo tablespace. Once the undo data is cleared due to space pressure, the Flashback cannot be used. Therefore, this mode can only support data rollback in a short period of time.

(2) Roll back incorrect deletion operations

Since the database records data changes through the incremental log, is it possible to simply roll back and forth through the incremental log? The answer is yes, and one example is the MySQL Binlog Flashback tool.

When the MySQL binlog format is set to Row mode, the binlog will record the changes in the data rows.

For the Employee table in the figure above, when I execute the following delete statement to delete two rows of data, the corresponding binlog is as follows:

As you can see from the figure above, binLog records every row that is deleted by the DELETE statement, as does update.

Based on this information, the Binlog Flashback tool reverts the data changes in the Binlog according to the operation time range and operation type, generates the corresponding rollback SQL, and performs data recovery.

For example, for the delete operation above, the execution time is 22:21:00, so we only need to find the DELETE operation between 22:20:59 and 22:21:01 in binlog, and convert it into the corresponding INSERT statement, as shown below, to recover the lost data.

insert into test.employee values('2', 'Eric Zhang');
insert into test.employee values('3', 'Leo Li');
Copy the code

This recovery method based on the incremental log rollback operation has a faster recovery speed, and because the incremental log is stored for a long time, the data recovery time is longer than the Oracle Flashback Query method. But there are some problems with this approach:

  • Large rollback scope: Existing Binlog Flashback tools can only filter data in binlog and roll back based on limited conditions such as SQL execution time and SQL type. If there are normal business operations within the filtering time range, that will also be rolled back. The following figure shows the problems in our accident site after using this recovery method:

As shown in the figure above, when we use the Flashback tool to roll back all DELETE operations in the range T2 to T3, there is 1 row more data than the actual need to restore. If the data needs to be recovered, manual comparison is required to remove the data that does not need to be recovered. And this part of the manual work is often time-consuming and error prone.

4. Main character – SQL flashback

Polardb-x SQL flashback function is used to roll back misoperations. However, compared to existing solutions, it provides sqL-level rollback capability and easy-to-use operation interface.

(1) SQL level rollback capability

What is the sqL-level rollback capability? That is, only data rows affected by misoperations are rolled back without affecting normal services.

As an example, let’s take a look at how polardb-x SQL flashback can roll back an incorrect deletion operation.

First, each SQL executed in Polardb-x is assigned a unique id number (TraceID), which ensures that all changes are traceable.

When we find the mistakenly deleted data, we can accurately find the mistakenly deleted SQL data by using the SQL flashback based on the TraceID of the mistakenly deleted SQL and then roll back the data. As shown in the figure above, the TraceID of the misoperated SQL is ABcm321. Based on the “ID number”, the SQL flashback can accurately locate the data deleted by the misoperated SQL and generate the corresponding rollback SQL.

(2) Get started quickly

Having said that, how exactly does SQL flashback work in Polardb-X? SQL flashback provides a very convenient operation mode, only three steps to complete the data recovery, fully take care of Xiao Ming’s anxious mood.

1. Find the “ID number” of the misoperated SQL in the SQL audit function

2. On the SQL flashback page, enter the approximate time range and TraceID for executing the misoperated SQL, and submit the SQL flashback task.

3. After the flashback task is complete, download the recovery file and restore data.

Five, the summary

This article mainly focuses on the row level error deletion of data, introduced polarDB-X SQL flashback is how to help users recover data. Compared with existing data recovery solutions, SQL flashback provides SQL-level rollback capabilities and easy-to-use operation interfaces to help users recover misdeleted data more accurately and quickly.

Of course, data security is an eternal topic. Polardb-x has built a number of tools to protect users’ data in different data deletion scenarios: PITR, Recycle Bin, Flashback Query, and so on. Stay tuned for future articles that will cover each of these capabilities