preface
From today, we will continue to update MySQL related knowledge. Java multithreading is slowly being updated, but I want to spend most of my time on frequently asked questions because the interviews I’m going to record will occasionally ask questions.
MySQL doesn’t want to talk about indexes at the beginning, we’ll talk about indexes in the future, but we want to talk about interesting things, and we’ll be updating MySQL logs. Today I’m going to focus on one of the redo logs.
background
Before I do redo log, let me give you a little background on why it exists. MySQL data is stored in disk, but if every time data is read through disk I/O, then this is very expensive and inefficient. Innodb provides a buffer that contains a map of some of the data pages on disk as a buffer to access the database. When the database reads the data, it takes it from the buffer first. If it’s not in the buffer, it’s fetched from disk and then put into the buffer buffer. When data is written to the database, data is first written to the buffer, and the data in the buffer is periodically flushed to disk for persistence. The problem is that while read and write efficiency improves, it also increases the risk of data loss. If the buffer is not flushed to disk and MySQL is down, the buffer data will be lost, resulting in data loss. When the data is lost, the persistence of the transaction is not guaranteed. It is in this context that redo logs are introduced.
Principles of Persistence – Redo log
Redo logs. The redo log consists of two parts: an in-memory redo log buffer, which is volatile; The second is a persistent redo log file on disk. As long as a transaction is committed, its changes to the data are permanent.
After the redo log is added, the process of flushing data to disk is improved: when data is added or modified in the database, the entire data is recorded in the redo log, in addition to modifying the data in the buffer. Even if MySQL is down, it still has the redo log to recover data.
Conceptually, InnoDB implements transaction persistence through the force log at commit mechanism, that is, before a transaction commits, all transaction logs must be written to the redo log file and undo log file on disk for persistence.
To ensure that each log is written to the transaction log file, the operating system’s fsync operation (the fsync() system call) is called each time the log from the log buffer is written to the log file. Because MySQL works in user space, MySQL’s log buffer is in user space memory. To write to a logfile on disk (redo:ib_logfileN,undo:share tablespace or. Ibd), the OS buffer must be stored in the kernel space. The purpose of the fsync() call is to flush the logs from the OS buffer to the log file on disk.
The following is a redo log file from the redo log buffer:
Redolog is write-ahead logging, that is, writing to the log before writing to the disk. Since the redo log also commits logs before writing to disk, why is it faster than writing the buffer directly to disk? The first reason is that the persistence of buffer is random I/O, and every time the data position is changed, it is random. But the redo log is appending. It is appended at the end of the file. It is a sequential IO operation. Random IO is slower than sequential IO, especially on traditional mechanical hard drives. The default size of the MySQL database page is 16K. The redo log only writes the redo log to the entire page. Greatly reduces invalid IO.
The timing of the redo log flush
- The log buffer space is insufficient
The size of the log buffer is finite (as specified by the system variable innodb_log_buffer_size), and if you keep stuffing the finite size of the log buffer with logs, it will soon fill up. The designers think that the amount of logs written to the log buffer is about half of the total size of the log buffer, so they need to be flushed to disk.
- When a transaction commits
MySQL supports user customization of how logs from the log buffer are flushed to the log file during commit. This control is determined by the value of the variable innodb_flush_LOG_at_trx_COMMIT. This variable has three values: 0, 1, and 2. The default is 1. Note, however, that this variable only controls whether the COMMIT action flusher the log buffer to disk.
When set to 1, a transaction writes logs from the log buffer to the OS buffer and calls fsync() to the log file on disk for each commit. This way, no data is lost even if the system crashes, but IO performance is poor because every commit is written to disk.
When set to 0, instead of writing logs from the log buffer to the OS buffer during transaction commits, the OS buffer is written every second and fsync() is called to log file on disk. This means that 0 is written to disk (approximately) every second, and when the system crashes, 1 second of data is lost.
When set to 2, each commit is written only to the OS buffer, followed by a call to fsync() every second to write the logs from the OS buffer to the log file on disk.
- Background threads keep brushing
- Shut down the server properly
- When we do something called checkpoint
- Other cases…
MySQL supports user customization of how logs from the log buffer are flushed to the log file during commit. This control is determined by the value of the variable innodb_flush_LOG_at_trx_COMMIT. This variable has three values: 0, 1, and 2. The default is 1. Note, however, that this variable only controls whether the COMMIT action flusher the log buffer to disk.
- When set to 1, each transaction commit writes the log buffer to the OS buffer and calls fsync() to flush the log file to disk. This way, no data is lost even if the system crashes, but IO performance is poor because every commit is written to disk.
- When set to 0, instead of writing logs from the log buffer to the OS buffer during transaction commits, the OS buffer is written every second and fsync() is called to log file on disk. This means that 0 is written to disk (approximately) every second, and when the system crashes, 1 second of data is lost.
- When set to 2, each commit is written only to the OS buffer, followed by a call to fsync() every second to write the logs from the OS buffer to the log file on disk.
Redo log Log file format
By default, the MySQL data directory has two files named ib_logfile0 and ib_logfile1. By default, logs in the log buffer are flushed to these two disk files.
When writing redolog to the logfile group, start with ib_logfile0, then ib_logFILe1 if ib_logfile0 is full, and so on… What if I write to the last file? So LET’s go back to ib_logfile0 and keep writing.In this diagram, checkpoint is moving backwards and iterating, updating records to data files before erasing them. The empty space between write POS and checkpoint can be used to record new operations. If write POS catches up to checkpoint, it overwrites data that has not been updated to disk.
With redo log, InnoDB guarantees that all previously committed records will not be lost in the event of an unexpected database restart. This capability is called crash-safe.
conclusion
Today I went over the redo log, the underlying principle of persistence in the redo log feature. Redo log: When data is added or modified to a database, the entire buffer is logged to the redo log. Even if MySQL is down, it still has the redo log to recover data. The timing of redo log flusher to disk during commit is controlled by innodb_flush_log_at_trx_commit, which can be 0, 1, and 2 respectively. Finally, we also learned about the file format of the log and how to store it.
I feel like I don’t have to worry about redo logs after I’ve mastered today’s lesson. Next time we’ll talk about atomicity undo log.
Standing on the shoulders of giants
reference
-
MySQL transaction log www.cnblogs.com/f-ck-need-u…
-
Geek Time. MySQL 45 Lecture
-
How MySQL Works: Understanding MySQL from the Root