
This is part of my MySQL learning series. I don't expect many people to read it; it is mainly a record for myself, in the hope that it helps those who need it.

Thanks to its logs, a database no longer needs to fear a server crash.

So we need to understand: why is this possible, and how is it done?

That is, why can the database be restored to any point in time, and how is that recovery performed?

Why do I need redo logs when I have a binlog?

How is a transaction committed? Which is written first, the binlog or the redo log? And how is consistency between the two logs guaranteed?

Last time we looked at the entire execution of a SELECT statement, so today we’ll start with an UPDATE statement.

mysql> update T set c=c+1 where ID=2;
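For context, a minimal table definition that fits this statement might look as follows (an assumption for illustration; the article never shows the schema):

mysql> create table T (ID int primary key, c int) engine=InnoDB;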

The execution path is largely the same as for a query; the difference is that, at the end, the executor finds the matching row and updates it.

In addition, the update flow involves two important log modules: the redo log and the binlog.

Personally, I had only ever heard of the binlog.

1 redo log

Like most relational databases, InnoDB records physical changes to data files and guarantees that they are always logged first.

This is known as Write-Ahead Logging (WAL): the redo log must reach disk before the corresponding data pages are persisted.

Not every MySQL update goes straight to disk. Instead, the InnoDB engine first writes the change to the redo log and updates the page in memory; the record is written back to disk later, at an appropriate time.
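You can watch WAL at work on a running instance. The LOG section of the InnoDB status output shows how far the redo log and the data pages have progressed (field names as in MySQL 5.6/5.7):

mysql> SHOW ENGINE INNODB STATUS\G
-- In the LOG section, "Log sequence number" is the redo generated so far,
-- "Log flushed up to" is the redo already on disk, and "Last checkpoint at"
-- marks how far data pages are guaranteed flushed: the log runs ahead of the pages.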

Since I have mentioned two important logs, it is worth showing the InnoDB storage structure diagram here for a general overview:

InnoDB physical storage structure

If the various spaces mentioned below feel overwhelming, I suggest you come back and look at this diagram again.

1.1 What is a redo log?

When the database makes changes to the data, it needs to read the data pages from disk into the buffer pool, and then modify the data in the buffer pool.

In this case, the data pages in the buffer pool will be inconsistent with the data pages on disk. We call the data pages in the buffer pool dirty pages.

It feels like making a copy of the data, making changes to the copy, and then overwriting the original data.

dirty page

You can also see that all updates are first applied to the dirty pages in the buffer pool.

If the DB service restarts abnormally at this point, that data exists only in memory and has not yet been synchronized to the disk files (note that syncing data pages to disk is random I/O), so it would be lost.

Therefore, as changes are made to dirty pages in the buffer pool, the corresponding modifications are also logged to a file (note that this logging is sequential I/O).

If the DB service crashes, recovery can replay the contents of this file against the disk files to bring the data back to a consistent state.

This file is called the redo log, and is used to log changes to data, in sequence.

In other words, the redo log is a physical log of the changes that produced the dirty pages, not a store of the pages themselves.

1.2 When is it produced & released?

Redo records start to be generated once a transaction begins, and they are written into the redo log file progressively during the transaction, not only when the transaction commits.

Once the transaction's dirty pages have been written to disk, the redo log has served its purpose, and the space it occupied can be reused (overwritten).

1.3 How is it written?

The redo log files are named ib_logfile[number] and are written sequentially, one file after another. When the last file is full, writing wraps around to the first file again.

Circular writing

As shown in the figure:

  • Write pos is the current write position; it moves forward as records are written, wrapping back to the beginning of file 0 after reaching the end of the last file.

  • Checkpoint is the current erase position; it also moves forward and wraps around. Before a record is erased, its changes must be applied to the data files.

The space between write pos and checkpoint is the free part of the log, available for recording new operations.

If write pos catches up with checkpoint, the log is full and no new updates can proceed; MySQL must pause to flush some records and push checkpoint forward.

Redo log files are written in a loop, always ensuring that dirty pages are flushed to disk before overwriting.
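The total circular capacity is innodb_log_file_size × innodb_log_files_in_group, and you can compute it directly (a one-line sketch using the standard variables):

mysql> SELECT @@innodb_log_file_size * @@innodb_log_files_in_group AS redo_capacity_bytes;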

Under very heavy load, redo records can be generated so fast that the checkpoint must advance frequently, forcing frequent flushing of dirty pages, which degrades performance.

If such a scenario is expected, we recommend increasing the redo log file size: do a clean shutdown, modify the redo log configuration, and restart the instance.

Reference: http://mysql.taobao.org/monthly/2015/05/01/

1.4 Related Configurations

By default, the corresponding physical files are ib_logfile0 and ib_logfile1 in the database's data directory.

  • innodb_log_group_home_dir: the path where the log file group resides; by default, the database's data directory.

  • innodb_log_files_in_group: the number of files in the redo log file group; default 2. Together with the file size, this determines total capacity.

  • innodb_log_file_size: the size of each redo log file.

  • innodb_mirrored_log_groups: the number of mirrored log groups; default 1.
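Put together, a my.cnf sketch using these parameters might look like this (the values are illustrative assumptions, not recommendations; resize the log files only after a clean shutdown):

[mysqld]
innodb_log_group_home_dir = /var/lib/mysql   # default: the data directory
innodb_log_files_in_group = 2                # default number of files
innodb_log_file_size      = 512M             # larger files mean less frequent checkpoint pressure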

1.5 Other

The redo log also has a buffer, innodb_log_buffer, which defaults to 8 MB. The InnoDB storage engine first writes redo records into this buffer.

Redo log writing

The contents of the log buffer are then flushed to the log files on disk in one of three ways:

1. The master thread flushes innodb_log_buffer to the redo log files once per second;

2. the redo log buffer is flushed to the redo log file at every transaction commit;

3. the redo log buffer is flushed to the redo log file when less than half of its space remains available.
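Case 2 is governed by the standard parameter innodb_flush_log_at_trx_commit; a quick sketch of checking the buffer size and setting the flush policy:

mysql> SHOW VARIABLES LIKE 'innodb_log_buffer_size';    -- the 8 MB buffer mentioned above
mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 1;   -- 1: fsync at every commit (default, safest)
-- 0: write and fsync about once per second; 2: write at commit, fsync about once per second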

With the redo log, InnoDB guarantees that committed transactions are not lost when the database restarts unexpectedly. This capability is called crash-safe.

Crash safety ensures that after the MySQL server restarts from a crash:

  • Data for all committed transactions still exists

  • Data for all uncommitted transactions is automatically rolled back


2 binlog

As mentioned above, MySQL as a whole can be divided into the Server layer and the engine layer.

The redo log is unique to InnoDB, at the engine layer. The Server layer has its own log, called the binlog.

2.1 What is a binlog?

The binlog is a log in logical format; it can loosely be thought of as the SQL statements of executed transactions.

It is not quite as simple as the SQL statements themselves, though: it also contains the inverse of each executed statement (for inserts, deletes, and updates).

That is, a delete is recorded along with its inverse insert; an update records the row both before and after the change; an insert is recorded along with its inverse delete.
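You can see this for yourself with standard commands (the file name here is an assumption; yours depends on the server's log_bin setting):

mysql> SHOW VARIABLES LIKE 'binlog_format';               -- ROW format records full before/after row images
mysql> SHOW BINARY LOGS;                                  -- list the binlog files
mysql> SHOW BINLOG EVENTS IN 'mysql-bin.000001' LIMIT 10; -- peek at the first events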

2.2 When is it produced & released?

When a transaction commits, the SQL statements in it are recorded into the binlog in a particular format, all at once. Committing a large transaction can therefore be slow.

The binlog's retention period is set by expire_logs_days: inactive log files are deleted automatically once their age exceeds the configured number of days.
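For example (expire_logs_days is the classic pre-8.0 parameter; the 7-day retention is just an illustration):

mysql> SET GLOBAL expire_logs_days = 7;                 -- auto-delete binlogs older than 7 days
mysql> PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY; -- or clean up manually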

2.3 Differences between redo log and binlog

1. The redo log is specific to the InnoDB engine; the binlog is implemented by the MySQL Server layer, so all engines can use it.

2. The redo log is a physical log, recording changes made to data pages; it covers DML and DDL that are still in progress.

The binlog, by contrast, is a logical log, recording the original logic of committed DML and DDL statements, for example "add 1 to the c field in the row where ID=2".

3. The redo log is written circularly within a fixed amount of space. The binlog is appended to: once a file reaches a certain size, writing continues in the next file, and earlier files are never overwritten.

4. They serve different purposes: the redo log ensures transaction durability, restoring data after a crash or media failure; the binlog is used for master-slave replication and point-in-time recovery.

5. In addition, when the logs are generated, when they can be released, and how they are cleaned up once releasable are completely different.

Reference: http://www.importnew.com/28039.html


3 Data update transaction process

With a conceptual understanding of these two logs, let’s look at the internal flow of the executor and InnoDB engine as they execute this simple UPDATE statement.

1. The executor first asks the engine to fetch the row where ID=2. ID is the primary key, so the engine finds the row directly via a tree search. If the page holding the row is already in memory, it is returned to the executor immediately; otherwise, it must first be read from disk into memory and then returned.

This corresponds to loading the data page into the buffer pool, where the change will turn it into a dirty page.

2. The executor takes the row handed over by the engine and adds 1 to the value of c: if it was N, it is now N+1, yielding a new row.

3. The engine writes the new row into memory and records the update in the redo log, which now sits in the prepare state. It then informs the executor that execution is complete and the transaction can be committed at any time.

4. The executor generates a binlog of this operation and writes the binlog to disk.

5. The executor calls the engine's transaction-commit interface, and the engine changes the redo log record it just wrote to the commit state.

Transaction process

Two-phase commit

Does the interplay between the redo log and the binlog seem confusing?

This is the two-phase commit: the commit is automatically split into a prepare phase and a commit phase.

Two-phase commit

MySQL generates an XID during the prepare phase and writes it into the binlog during the commit phase. On recovery, the binlog decides whether a transaction is committed or rolled back.

As the flow above shows, two-phase commit ensures that in every case a transaction ends up recorded in both the redo log and the binlog, or in neither.

This keeps the order of transactions consistent between the binlog and the redo log. And once the binlog has been persisted in phase 2, the transaction is guaranteed to commit.

Note also that an fsync is required at each phase to keep the upper (Server) and lower (engine) layers consistent.

PS: the binlog is written only after the InnoDB engine has finished the prepare step, i.e. after the redo log has been written to disk. This ordering is important.

Note the timing difference as well: the redo log is flushed continuously (the master thread fsyncs it every second), whereas the binlog is flushed only at transaction commit. If a transaction's binlog is large, its commit will be slow.
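The strictest durability setup, the "double 1" mentioned in a comment below, fsyncs both logs at every commit (a sketch; both parameters are standard):

mysql> SET GLOBAL sync_binlog = 1;                      -- fsync the binlog at every commit
mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 1;   -- fsync the redo log at every commit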

Reference: http://www.ywnds.com/?p=7892

For example

The binlog records complete logical operations, and every logical change can be found in it. A restore driven by the binlog therefore replays those operations and produces data identical to the original database.

Suppose, for example, that the redo log has applied update t set status = 1, so the original database now has status = 1, but writing the statement to the binlog failed. A later backup-and-restore based on the binlog would then end up with status = 0.

The redo log ensures that committed changes reach disk even across an abnormal restart; the binlog records the changes used for backup and recovery. The two must therefore stay in step.

A complete transaction, by analogy:

You sell a bottle of coke: first you note the sale in the redo log, then you take the money and record it in the binlog, then you go back and write a commit in the redo log, hand over the coke, and the deal is done.

If you are interrupted while collecting the money, you later go back over the books, find an entry for which no money was collected, conclude the deal failed, and delete the record from the books (rollback).

If you are interrupted after the money has been received and recorded in the binlog, the deal counts as valid, and on recovery the commit is simply completed.

4 How to restore data to any point in time?

When you need to restore to a specified second, for example because a table was dropped by mistake and the data must be retrieved, you can do the following:

1. First, find the most recent full backup; if you are lucky, that is last night's 11.22 backup. Restore it into a temporary instance.

2. Then, starting from the backup's point in time, take the archived binlogs and replay them one by one, up to the moment just before the table was mistakenly dropped at noon.

Finally, you can extract the table's data from the temporary instance and restore it into the production instance as needed.
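Step 2 can be done with the standard mysqlbinlog tool; a sketch in which the file names, the stop time, and the target host are assumptions for illustration:

$ mysqlbinlog --stop-datetime="2019-08-01 11:55:00" \
      mysql-bin.000042 mysql-bin.000043 | mysql -h temp-host -u root -p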

When a crash occurs, the recovery process is also very simple:

1. During recovery, the last binlog file is scanned and the XIDs in it are extracted;

2. The redo log after the checkpoint is replayed, the list of transactions still in the prepare state is collected, and each such transaction's XID is compared against the XIDs from the binlog.

If a prepared transaction's XID appears in the binlog, the transaction was fully recorded there even though its commit never reached the redo log, so it is committed; otherwise, it is rolled back.

Reading those two procedures together makes the picture much clearer.

To sum up, there are basically the following situations:

  • If, at recovery, the transaction was never written to the binlog and the redo log holds no commit for it, the transaction is rolled back.

  • If the crash happens during the binlog phase, before the log is successfully written to disk, the transaction is likewise rolled back at restart.

  • If the binlog was fully written but InnoDB never received the commit, MySQL reads the XIDs from the binlog during recovery and tells InnoDB to commit those transactions. After committing them, InnoDB rolls back all the others, so the redo log and the binlog always end up consistent.

Let me add my own understanding.

1. Prepare phase; 2. Write binlog; 3. Commit phase.

If the crash happens before step 2:

Restart recovery: no matching binlog record is found, so the transaction is rolled back.

Backup and restore: the binlog has no record of it either. Consistent.

If the crash happens before step 3:

Restart recovery: although the commit was never written, the prepare record and the complete binlog both exist, so the transaction is automatically committed after restart.

Backup and restore: the binlog record exists, so the restored data includes the transaction. Consistent.

In short, once a transaction's prepare record has reached disk and its binlog has been successfully written, MySQL Server will make sure the transaction commits.

Conclusion

This section described two important MySQL logs: the physical redo log and the logical binlog.

It is worth understanding what each log is for, and being able to walk through the commit process yourself.

Questions & comments section highlights

The comment section this time is particularly good!

Q

Questions after class

Earlier, I said that periodic full backups are taken "either daily or weekly, depending on how important the system is." In what situations is a daily backup more advantageous than a weekly one? Put differently, which metrics of the database system does the choice affect?

A 1

The length of the backup period matters in two respects.

First, there is the time needed to recover lost data. With a daily backup, you restore the most recent day's backup and then apply that day's binlog up to the target time. With a weekly backup taken on, say, Monday, restoring to some moment on Sunday requires Monday's full backup plus the binlog for the entire week up to that moment, which takes far longer. It comes down to how much recovery time the business can tolerate.

Second, with a weekly backup you must keep the whole week's binlog intact, or the database cannot be restored; with a daily backup, only that day's binlog needs to be intact. This can of course be addressed with verification, redundancy, and so on, but it matters all the more.

A 2

The backup period directly affects restoration speed. With a daily backup, only the most recent day's changes need to be replayed; a week's worth is far more work. So for systems sensitive to recovery speed, it is best to back up daily, or even hourly.

A 3

I see backup as both a life-saving medicine and a regret medicine: when disaster strikes, a backup saves your life; when a mistake is made, it lets you take it back. Every coin has two sides: neither option is absolutely better, it depends entirely on the business situation and needs. Compared with a weekly backup, a daily backup gives shorter recovery time, fewer binlogs to replay, and quicker rescue, but a shorter window for regret. My own strategy is a replica with a 16-hour replication delay, which serves as a quick-recovery safety net, plus a full backup and two extra days of binlogs as the last-resort life-saver. That strikes a reasonable balance.

A 4

1. If there is no primary/secondary server or Binlog Server, back up at least once a day, and shorten the period further where possible, for example to once an hour.

2. If there are primary/secondary servers and a Binlog Server, backing up at least once a week is acceptable; if the database is small and concurrency is low, you can shorten the period, for example to once a day.

The question does not say whether primary/secondary servers or a Binlog Server exist. For critical systems, at least primary-secondary replication should be set up, and a Binlog Server as well if possible (especially in financial environments).

Q

1. The client sends an SQL statement to the SQL interface on the server layer through TCP/IP

2. After receiving the request, the SQL interface checks whether the permissions match

3. After the verification is passed, the analyzer analyzes the statement to see if there are syntax errors

4. Next, the optimizer generates the corresponding execution plan and selects the best execution plan

5. Then the executor executes the statement according to the execution plan. This step opens the table, waiting if there is an MDL lock on it.

If there is none, a temporary MDL(S) lock is added to the table.

(If Opened_tables is too large, table_open_cache is too small and .frm files must be opened over and over.)

6. Enter the engine layer, and first consult the data dictionary (metadata) in innodb_buffer_pool to obtain the table information

7. Using the metadata, check lock info for existing locks, and write the lock information this update statement needs into lock info.

8. The old data involved is then stored, as a snapshot, in undo pages in innodb_buffer_pool, and the changes the undo log makes to those pages are themselves recorded in the redo log

(If the relevant page is already in memory, the undo page can be filled directly; otherwise, the corresponding page data must first be fetched from disk and loaded.)

9. Update the data pages in innodb_buffer_pool, and record the physical data-page changes in the redo log buffer

Since the update transaction changes several pages, the redo log buffer records several page changes.

Also, because of group commit, the redo generated by this transaction may be flushed and synced to disk together with that of other transactions.

10. The modifications are recorded, in event format, into binlog_cache. (Note that binlog_cache_size is a transaction-level parameter, not a session-level one.)

After commit, the dump thread actively sends the events to the slave's I/O thread.

11. Changes to secondary indexes are merged in the next time another SQL query needs to read the secondary index

(Random I/O becomes sequential I/O; though with today's SSDs, the addressing cost of random versus sequential I/O differs little.)

12. The UPDATE statement is now complete and must be committed or rolled back. Here we discuss the commit case, with "double 1" configured

13. The commit operation: since internal XA is used between the storage engine layer and the Server layer (to keep the two sides of the transaction consistent, chiefly guaranteeing the atomicity of the redo log and the binlog),

the commit is divided into a prepare phase and a commit phase

14. In the prepare phase, the transaction's XID is written, and the binlog_cache is flushed and synced to disk

15. Commit phase: since the redo log generated by this transaction has already been synced to disk, this step simply marks the commit in the redo log

16. Afterwards, dirty pages are written into the shared tablespace via the doublewrite buffer, and the page cleaner thread flushes them to disk.

Teacher, do you see any problems with my steps? I feel something is off in step 6: since step 5 has already opened the table, does step 6 still need to look up metadata in the buffer? And is that metadata for the system or for all tables? Thank you for your correction

A

In fact, the implementation of step 5 calls into step 6, so they are the same thing. Both the MySQL Server layer and the InnoDB layer keep the table structure, which is why the text describes them separately.

Your description is very detailed, and it also touches on internals we will cover later

Q

Hello, I don't quite understand the term "data page". Is it a storage format?

A

MySQL accesses records in "pages", 16 KB by default. That is, when you access a record on disk, you don't read just that record; the whole 16 KB page around it is read into memory along with it
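You can check the page size on your own instance (innodb_page_size is a standard read-only variable):

mysql> SHOW VARIABLES LIKE 'innodb_page_size';   -- 16384 by default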

Q

If you use the redo log to restore data, is the binlog written as well? And the other way around?

A

Crash recovery does not write the binlog; restoring an instance from the binlog (or building a standby) does write the redo log


Your attention is my motivation to keep writing!