After two days of relentless effort, we finally recovered the production server data that had been wiped out by a mistaken command.

I am recording the accident and how we solved it here, both as a warning to myself and as a reminder to others not to make the same mistake.

I also hope that anyone facing a similar problem can find some inspiration here.

01 Accident Background

I asked a young colleague to install Oracle on a production server. She read up on it and installed Oracle, but felt it had not been installed correctly, so she decided to uninstall and reinstall it.

To delete the Oracle installation directory, she ran the following command:

rm -rf $ORACLE_BASE/*

If ORACLE_BASE is not set, it expands to an empty string and the command becomes:

rm -rf /*
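A minimal guard against exactly this failure, sketched in POSIX shell: the `${var:?message}` expansion makes a non-interactive shell print the message and exit instead of substituting an empty string, so the path can never collapse to `/*`. Note that GNU `rm`'s default `--preserve-root` would not have helped, because the shell expands `/*` into `/bin`, `/etc` and so on before `rm` ever sees it. The helper name `safe_wipe_dir` is hypothetical, and `--one-file-system` assumes GNU `rm`.

```sh
#!/bin/sh
# Hypothetical helper: delete the *contents* of a directory, refusing to run
# when the path is empty/unset, is "/", or is not a directory.
safe_wipe_dir() {
  # ${1:?...} makes a non-interactive shell print the message and exit
  # if $1 is unset or empty, so "" can never turn into "rm -rf /*".
  local dir="${1:?target directory is empty or unset}"
  if [ "$dir" = "/" ]; then
    echo "refusing to wipe /" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # --one-file-system (GNU rm) skips directories that live on a different
  # mounted filesystem instead of descending into them.
  rm -rf --one-file-system -- "$dir"/*
}
```

Usage would be `safe_wipe_dir "$ORACLE_BASE"`. Running scripts with `set -u` adds a global safety net by turning any reference to an unset variable into an error, though it does not catch a variable that is set but empty.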

On top of that, she was logged in as root. So every file on the disk was deleted, including Tomcat, the MySQL database, and so on…

Wasn't MySQL still running? Can Linux delete files that are in use? Apparently it can: everything was wiped out, and in the end only one Tomcat log file was left. My guess is that the file was so large that its deletion had not finished yet.
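To answer the question above: yes. On Linux, `rm` only removes the directory entry. The file's data stays on disk until the last process holding it open closes it, and while that process is alive the contents can still be read and copied out through `/proc/<pid>/fd/<n>`. This only helps as long as the process is still running. The Linux-only sketch below finds such files; the function name is hypothetical.

```sh
#!/bin/sh
# Linux-only sketch: list files that have been unlinked but are still held
# open by a running process. The kernel shows such fd symlinks with a
# " (deleted)" suffix, and the data stays readable through the fd path.
list_deleted_open_files() {
  for fd in /proc/[0-9]*/fd/*; do
    # Entries we cannot read (other users' processes, exited PIDs) are skipped.
    target=$(readlink -- "$fd" 2>/dev/null) || continue
    case "$target" in
      *" (deleted)") printf '%s -> %s\n' "$fd" "$target" ;;
    esac
  done
}
```

Each line shows a path such as `/proc/1234/fd/7` (PID and fd number illustrative). Running `cp /proc/1234/fd/7 /some/safe/place/` would copy that file's contents back out.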

Seeing the remorse in her eyes, I knew the blame was mine: I had assigned her the task without stressing how serious it was and without giving her any training. Someone had to take responsibility, and how could I let her carry it?

I called the machine room and had the disk mounted on another server. Checking over SSH, I found the files were all gone. This server was running a customer's production system that had been live for half a year, so we had to recover as soon as possible.

The backup file was only 1 KB and contained nothing but a few lines of the familiar mysqldump header comments. The most recent usable backup dated from December 2013.

**I recalled a story a leader once told us:** when a production system went down, they discovered that every backup was bad. The burned CDs were scratched and the tape drive was broken (he is an industry veteran; back then backups were presumably burned to disc). I never expected that story to come true for me. What now?

Once the department head learned what had happened, he prepared a worst-case Plan B. He would personally lead a team, together with AA from the product side, to the customer's city on Sunday and talk to the customer's leadership on Monday. Meanwhile, BB and CC would work through the account manager to try to win the customer over…

02 Lifesaver: ext3grep

I hurried online to look for ways to recover deleted data and found ext3grep, which can recover files deleted with rm -rf. Our disk was formatted as ext3, and there were many success stories online.

That lit a glimmer of hope. I immediately unmounted the disk so the sectors of the deleted files would not be overwritten, then downloaded and installed ext3grep.

Scan for the names of deleted files:

ext3grep /dev/vgdata/LogVol00 --dump-names

It printed out all the deleted files and their paths. I was overjoyed: the files were all there, and Plan B would not be needed.

ext3grep cannot restore files by directory; it can only restore everything at once:

ext3grep /dev/vgdata/LogVol00 --restore-all

But there was not enough free disk space for a full restore, so I could only restore files one at a time. I tried a few, and unexpectedly some succeeded while others failed:

ext3grep /dev/vgdata/LogVol00 --restore-file var/lib/mysql/aqsh/tb_b_attench.MYD

My heart sank. Had the deleted files' sectors already been overwritten? The odds of recovery were not good, but we would recover whatever we could. With luck, the important data would be in .MYD files that could still be restored.

Redirect all file names into a single file:

ext3grep /dev/vgdata/LogVol00 --dump-names >/usr/allnames.txt

Then filter the MySQL table files out of that list into mysqltbname.txt.
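One way to do this filtering, as a sketch: keep only the MyISAM table files under `var/lib/mysql/<database>/`. This assumes the `--dump-names` output lists one path per line, relative to the filesystem root, in the same form later passed to `--restore-file` (as in `var/lib/mysql/aqsh/tb_b_attench.MYD`). The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical filter: keep only MyISAM table files (.frm, .MYD, .MYI) that
# sit directly under var/lib/mysql/<database>/, de-duplicated and sorted
# byte-wise (LC_ALL=C) so the order does not depend on the locale.
filter_mysql_files() {
  grep -E '^var/lib/mysql/[^/]+/[^/]+\.(frm|MYD|MYI)$' "$1" | LC_ALL=C sort -u
}
```

Usage: `filter_mysql_files /usr/allnames.txt > mysqltbname.txt`.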

Then write a script to restore the files one by one:

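while read -r LINE; do
    echo "begin to restore file $LINE"
    # Restore one file at a time; log failures and keep going.
    # </dev/null stops the command from consuming the file list on stdin.
    if ! ext3grep /dev/vgdata/LogVol00 --restore-file "$LINE" </dev/null; then
        echo "restore failed: $LINE" >&2
    fi
done < ./mysqltbname.txt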

It ran for about 20 minutes and recovered just over 40 files. That was nowhere near enough: we had nearly 100 tables, each with three files (.frm, .MYD, .MYI), so more than 300 files in all!
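Each MyISAM table is stored as three files, so a quick check is to list which tables came back incomplete. The sketch below is meant to be pointed at a directory holding one database's restored files (the path is up to you). The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical check: for every table name found in a directory of restored
# MyISAM files, print which of its .frm/.MYD/.MYI files are missing.
report_incomplete_tables() {
  local dir="$1"
  # Collect distinct table names (file name minus its extension).
  for f in "$dir"/*.frm "$dir"/*.MYD "$dir"/*.MYI; do
    [ -e "$f" ] || continue          # unmatched glob stays literal; skip it
    name=${f##*/}
    printf '%s\n' "${name%.*}"
  done | LC_ALL=C sort -u | while IFS= read -r table; do
    missing=""
    for ext in frm MYD MYI; do
      [ -e "$dir/$table.$ext" ] || missing="$missing $ext"
    done
    if [ -n "$missing" ]; then
      printf '%s missing:%s\n' "$table" "$missing"
    fi
  done
}
```

For example, a directory holding only `tb_a.frm` reports `tb_a missing: MYD MYI`.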

I copied the recovered files into the existing database, set their permissions to 777, and restarted MySQL. That brought back part of the data. But the customer's most important data had not been recovered: the attendance records and the mobile report data, which the customer reportedly uses for employee performance reviews.

What now? Along the way I also tried another tool, extundelete. Its syntax is basically the same as ext3grep's, and it presumably works on the same principle, but reportedly it can restore by directory.

So I gave it a try:

extundelete /dev/vgdata/LogVol00 --restore-directory var/lib/mysql/aqsh

As expected, nothing could be recovered! Those files had been destroyed. I told my boss to go ahead with Plan B and went home after work, feeling helpless. (It was the weekend; time to go home, rest, and think it over.)

03 Brainwave: Binlog

The next morning I woke up early (it was weighing on my mind), grabbed my computer, and went to the office. (This weekend was a write-off anyway. As long as I wasn't criticized, publicly reprimanded, fined, or fired, I would be fine; forget about having a weekend.)

I ran ext3grep and extundelete again, but they recovered nothing more. So I set the system up on a test server to see whether the data could be patched together.

The mysqldump backups were no help either, as we already knew.

Wait, wait: what about the binlog? Our service requires the binlog to be enabled. Maybe we could recover the data from it.

Searching the dumped file names for binlog files turned up:

  • mysql-bin.000001
  • mysql-bin.000009
  • mysql-bin.000010
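Finding these can be scripted against the same `allnames.txt` listing. This sketch assumes the `mysql-bin` basename seen here and MySQL's six-digit, zero-padded binlog suffix, so a byte-wise sort is also a numeric sort. It deliberately excludes `mysql-bin.index`. The function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: list binlog files found in a --dump-names listing,
# in sequence order (six-digit zero-padded suffix => byte-wise sort works).
find_binlogs() {
  grep -E '(^|/)mysql-bin\.[0-9]{6}$' "$1" | LC_ALL=C sort -u
}
```

Usage: `find_binlogs /usr/allnames.txt`.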

Restore mysql-bin.000001:

ext3grep /dev/vgdata/LogVol00 --restore-file var/lib/mysql/mysql-bin.000001

It failed… So I tried restoring mysql-bin.000010 the same way, and this time it succeeded!

I copied it to the test server with scp and replayed the binlog:

mysqlbinlog /usr/mysql-bin.000010 | mysql -uroot -p

I entered the password and it seemed to hang (a good sign). After a long wait it finally finished. I opened the app and, oh, thank CCTV, thank MTV, the data was back!

04 Postscript

**I want to remember this incident so I never make the same mistake again.** My reflections on the accident:

  • When I assigned her to maintain the server, I did not explain in advance how serious the situation was, and she did not treat it with enough care. Both management and process were chaotic. On an online production system, every change must be preceded by a backup.
  • The automatic backup was broken and nobody checked it. The offline backup job downloaded a 1 KB file from the server every time, and nobody ever noticed. Everyone needs to be clear about their responsibilities.
  • After the accident, the problem was not detected in time, so more data was written to the disk and some of it became unrecoverable. We need an application monitoring program that sends an SMS alert to the person responsible as soon as the service behaves abnormally.
  • One more, added based on readers' comments: never do this kind of work as root. Set up users with different privilege levels on the server.
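A check like the following would have caught our 1 KB backup on day one. It relies on the fact that a mysqldump run with comments enabled (the default) ends with a `-- Dump completed` line, while our broken file held only the header comments. This is a sketch: the function name and the 100 KB default threshold are illustrative assumptions, and compressed backups would need to be decompressed first.

```sh
#!/bin/sh
# Hypothetical sanity check for a plain-text mysqldump backup file.
# Fails if the file is missing/empty, smaller than min_bytes, or lacks the
# "-- Dump completed" trailer mysqldump writes when comments are enabled.
check_dump() {
  local file="$1" min_bytes="${2:-102400}"   # threshold is illustrative
  if [ ! -s "$file" ]; then
    echo "FAIL: $file is missing or empty" >&2
    return 1
  fi
  local size
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -lt "$min_bytes" ]; then
    echo "FAIL: $file is only $size bytes" >&2
    return 1
  fi
  # Look for the trailer near the end of the file.
  if ! tail -n 5 "$file" | grep -q '^-- Dump completed'; then
    echo "FAIL: $file has no '-- Dump completed' trailer" >&2
    return 1
  fi
  echo "OK: $file ($size bytes)"
}
```

Run from cron after each backup, a non-zero exit status could trigger the SMS alert mentioned in the list.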

Here are links to the tools used in this article:

  • code.google.com/p/ext…
  • extundelete.sourceforg…

extundelete works much like ext3grep, and presumably on a similar principle. Building it requires quite a few dependency packages; search online for installation instructions.

Finally, I hope everyone remembers this incident. Happy coding, and may you never make this mistake~