The LOG data +

Database data consists of two parts, one is data, one is LOG. Innodb data includes data in memory (Innodb buffer pool) and data in hard disk. Data changes are first applied to cached data in memory. Innodb then flushes the latest data to hard disk according to Flush policy. Therefore, data is not flushed to disk in real time, and if the process or system crashes, data that is not flushed to disk will be lost. Innodb uses WAL to ensure data persistence and atomicity through Redo and Undo logs.

Hi-tech technology

In computer science, write-ahead logging (WAL) is a set of techniques used in relational database systems to provide atomicity and persistence (two of the ACID properties). On systems that use WAL, all changes are written to a log file before committing.

Undo and Redo

Why Redo and Undo are used?

MySQL Database InnoDB Storage engine Log roaming (1)

I/O Performance Undo + Redo is designed to improve I/O performance. Although by caching the data, it reduces the IO required to write the data.

But a new IO was introduced, the Redo Log IO. If Redo Log IO performance is not good, it will not improve performance. InnoDB Redo logs have the following features to ensure good I/O performance:

A. Keep the Redo Log stored in A contiguous space. Therefore, the space of the log file is fully allocated when the system is first booted. Log Redo logs sequentially, using sequential IO to improve performance. B. Write logs in batches. Logs are not written directly to the file, but to the redo log buffer first. When logs need to be flushed to disk (such as transaction commits), many logs are written to disk together. C. Concurrent transactions share Redo Log storage space. Their Redo logs are recorded alternately in order of execution to reduce Log space. For example, a Log in the Redo Log might look like this: Log 1: <trx1, insert... > <trx2, update... > <trx1, delete... > <trx3, update... <trx2, insert... > D. Because of C, when a transaction writes Redo logs to disk, other uncommitted transactions are also written to disk. E. Only sequential appending operations are performed in the Redo Log. When a transaction needs to be rolled back, its Redo Log records are not deleted from the Redo Log.Copy the code

Why Redo uses physical page+ logic

Zhuanlan.zhihu.com/p/109417488 » MySQL database InnoDB storage engine Log roaming (2) this is also why we need to double the write buffer

Ensure that logs are flushed

As mentioned above, Innodb guarantees data persistence and atomicity through Redo and Undo logging. Therefore, the flushing strategy of the log is critical. Only when the log is flushed in time, the persistence and atomicity can be achieved. There are several parameters involved, including innodb_flush_method and innodb_flush_log_at_commit.

innodb_flush_method

Defines the method used to flush data to InnoDB data files and log filesInnodb_flush_method does not correspond to the open file mode. Because Innodb_flush_method specifies both data files and log files flushes, the two flushes can be inconsistent. For example, innodb_flush_method is O_DIRECT. In O_DIRECT mode, data is written directly to disk, bypassing the cache. Log files are still cached by the operating system.If innodb_flush_method is O_DIRECT, use O_DIRECT to open data files. If innodb_flush_method is O_DIRECT, use fsync to open data files.

Log files are still cached by the file system, so fsync is also required.

innodb_flush_log_at_commit

This controls the flush behavior of redo logs at commit time. As mentioned above, because the logs still need to be cached by the operating system, the data security requirement is set to 1. Fsync the redo logs to hard disk at each commit.

The optional values of innodb_flush_LOG_AT_COMMIT are 0,1, and 2.

When the value is 0, data in the LOG buffer is written to the OS cache and fsynced every 1s. When the value is 2, redo logs are written to the OS cache every 1s and fsynced to disk every 1s.

Data integrity

Redo log and Undo log ensure data persistence and atomicity. This is Innodb level. Because MySQL is the Server engine architecture, the Server layer needs to use binlog. www.infoq.cn/article/M6g… There are three aspects to consider when considering Redo and binlog logs:

  • Data content consistency
  • Data sequence consistency
  • The efficiency of

Data content consistency

Why is that?

How?

  • Each transaction is automatically assigned a unique ID (XID).
  • COMMIT is automatically divided into Prepare and COMMIT phases.
  • A Binlog is used as a Transaction Coordinator, and a Binlog Event is used as a Coordinator log.
  • When the instance recovers from the crash, the ACTIVE transaction needs to be extracted from the undo undo. For the ACTIVE transaction, if the binlog corresponding to the prepared transaction has been recorded, the transaction is committed. Otherwise, the transaction is rolled back.

The specific implementation

As you can see from the above picture, the transaction commit is divided into two main steps:

  1. Storage Engine (InnoDB) Transaction Prepare Phase

At this point, the SQL has successfully executed, and xID information and memory logs for redo and undo have been generated. The prepare method is then called to complete the first phase. The Papare method does nothing, sets the transaction state to TRX_PREPARED, and brushes the redo log to disk.

  1. Storage Engine (InnoDB) Commit Phase

    2.1 Record coordinator logs, that is, Binlog logs. If prepare is executed successfully for all storage engines involved in the transaction, the TC_LOG_BINLOG::log_xid method is called to write the SQL statement to binlog (write(). The binary log data is written to the file system cache. Fsync () writes the binary log file system cache log data to disk permanently. At this point, the transaction is bound to commit. Otherwise, the ha_rollback_trans method is called to rollback the transaction, and the SQL statement will not actually be written to binlog. 2.2 Tell the engine to commit.Copy the code

Finally, the engine’s COMMIT is called to complete the transaction. It clears undo information, brushes redo logs, and sets the transaction to TRX_NOT_STARTED.

PS: The Binlog is recorded after the InnoDB engine prepares for writing the Redo Log to disk. This is important.

As you can see from the two-phase commit flow above, once the actions in Step 2.1 are complete, the transaction is guaranteed to commit, even if the database sends down during step 2.2. In addition, it is important to note that each step requires a fsync operation to ensure data consistency between the upper and lower layers. The fsync parameter in step 2 is controlled by sync_binlog=1, and the fsync parameter in step 3 is controlled by innodb_flush_LOG_at_trx_COMMIT =1. This ensures CrashSafe.

Data sequence consistency

Why is that?

Since mysql is multithreaded, if there is no mechanism to ensure that multiple transactions are executed concurrently, writing binlog first does not necessarily mean committing innoDB first. In the following figure, the sequence of Binlog and Innodb commit is inconsistent.As shown in the figure above, the transaction starts in T1, T2, T3 order, writes binary logs (in T1, T2, T3 order) to the log file system buffer, and calls fsync() for a group commit to permanently write the log files to disk, but the storage engine commits them in T2, T3, and T1 order. Xtrabackup backs up only Redo logs and does not backup binlogs. Therefore, the binlog point of the online physical backup record is the point where T1, T2, and T3 have been committed. However, the data of T1 will be lost because T1 is not committed in Innodb layer during recovery.

How to do

Older version (poor performance)

Prepare_commit_mutex in each xa transaction, in the prepare phase transactions to a global prepare_commit_mutex first, and then execute that the persistence of redo log and binlog fsync (), Prepare_commit_mutex is then released after fsync, which is equivalent to serialization. This ensures sequential consistency between binlog and redo log, but results in an fsync operation per transaction.

Group to submit

Binary Log Group Commit The goal of Group Commit is to reduce the number of fsync attempts. The basic idea behind Binlog group commits is to introduce a queue mechanism to ensure that Innodb commits in the same order as Binlog drops.

The premise

RedoLog itself is a group submission. In the prepare phase of 2PC, a disk flush is performed for redo file (innodb_flush_log_at_trx_commit=1). In this case, the redo group commit process is as follows:

  1. Get log_mutex
  2. If flushed_to_disk_lsn>= LSN, the disk is flushed. Go to 5
  3. If current_flush_Lsn >= LSN, it indicates that the log is flushing. After 5 flush_lsn jumps, the log enters the waiting state
  4. Flush and sync logs smaller than LSN
  5. Exit log_mutex

This process is according to the order of the LSN to merge, that is to say a redo group commit process may speak other uncommitted transaction of LSN also brush plate segmentfault.com/a/119000001…

Binary Log group commit implementation
The group submits the overall process
  1. binlog prepare
  2. InnoDB prepare
  3. Binlog commit(ordered commit) –3.1 Stage #1: Flushing transactions to binary log –3.2 Stage #2: Syncing binary log file to disk –3.3 Stage #3: Commit all transactions in order.
  4. InnoDB commit
5.6

To improve concurrency performance, lock granularity must be refined. MySQL 5.6 introduces the group commit function of binlog. The prepare stage is unchanged, and only the commit stage is used.

Flush stage: Multiple threads write binlogs from the cache in the order they were entered. Sync stage: Perform fsync on the binlog file (merge the binlogs of multiple threads at once); Commit stage: InnoDB commit is performed by each thread in sequence. Each phase is protected by a LOCK, so the order of transaction writes is guaranteed.

This is done by setting up a queue at each stage, where the first thread to enter the queue becomes the leader and subsequent threads block until the commit is complete. The leader thread will lead all threads in the queue to execute the tasks of this stage and lead all followers to the next stage for execution. When the next stage is not empty, the leader thread will register into this queue as followers.

The advantage of this group commit is that the granularity of the lock is reduced and all three phases can be executed concurrently, increasing efficiency.

5.7 optimization

Delay redo writing to the Group commit phase

In MySQL 5.6 group commit logic, each transaction prepares and writes a redolog. The group commit is not performed until the commit stage. Therefore, the redolog sync operation for each transaction becomes a performance bottleneck.

In version 5.7, we changed the flush stage of a group commit to the following logic:

Collect the group commit queue, get the leader thread, the rest of the follower threads into the block; The leader calls ha_flush_logs to do a redo write/sync, that is, flush redolog for all threads at once. Write all binlog caches in queue THD to binlog file. This optimization delays redolog’s flush stage to the Binlog Group commit before sync binlog. A group write is done to Redolog by delaying writing redolog so that both binlog and Redolog are optimized.

Why does Innobackupex not need to backup Binlog?

When an instance recovers from a crash, the ACTIVE transaction is automatically rolled back. For a prepared transaction, if the binlog corresponding to the transaction has been fully recorded, the transaction is committed. Otherwise, the transaction is rolled back.

Therefore, XA Recover only affects the Prepare state transaction.

How to do

FLUSH TABLE WITH READ LOCK

Innobackupex has a FLUSH TABLE WITH READ LOCK step. FTWRL consists of three main steps:

  1. Global read lock on (lock_global_read_lock)
  2. Clearing the table cache (close_cached_tables)
  3. On the global COMMIT lock (make_global_read_lock_block_commit).

The role of FTWRL in backup => blocks the start of new transactions and active transaction commits in preparation for obtaining consistency points. At this point Redo logs and binlogs are already consistent.

FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS

Brush Redo from Log buffer onto hard disk and Innobackupex copy it.

Segmentfault.com/a/119000001… Blog.csdn.net/zbszhangbos… Mysql.taobao.org/monthly/202… Mysql.taobao.org/monthly/201…