🎓 Do your best and obey the destiny. I am a postgraduate student in Southeast University and a summer intern in Java background development in Ctrip. I love fitness and basketball, and I am willing to share what I have seen and gained related to technology. I follow the public account @flying Veal and get the update of the article as soon as possible
🎁 This article has been included in the “CS-Wiki” Gitee official recommended project, has accumulated 1.8K + STAR, is committed to creating a perfect back-end knowledge system, in the road of technology to avoid detours, welcome friends to come to exchange and study
🍉 If you do not have a good project, you can refer to a project I wrote “Open source community system Echo” Gitee official recommended project, so far has accumulated 900+ star, SpringBoot + MyBatis + Redis + Kafka + Elasticsearch + Spring Security +… And provide detailed development documents and supporting tutorials. Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo
I was preparing for my summer internship, and this was the first question I asked in my interview with Ctrip. I apologized and said I didn’t know about it. The interviewer said redo log, and I thought, I know, what is WAL? Dumbstruck the interviewer (funny) and apologized for my ignorance. Later, I went back to Baidu to know that I saw WAL in Dinqi’s “MySQL Combat 45” recently, so I wrote an article to summarize.
InnoDB architecture
Before talking about WAL, it is important to briefly describe the architecture of the InnoDB storage engine, and redo logs are unique to the InnoDB storage engine.
As shown below, the InnoDB storage engine consists of a memory pool and some background threads:
Memory pool
Let’s explain the memory pool.
First, we need to know that the InnoDB storage engine is disk-based storage and manages the records in it on a page basis. It can therefore be thought of as a disk-based Database system, in which buffer pool techniques are commonly used to improve the overall performance of the Database due to a well-known mismatch between CPU speed and Disk speed.
So the memory pool is also called the buffer pool.
Specifically, a buffer pool is an area of memory that is accessed between the CPU and the disk to compensate for the impact of slow disk speeds on database performance.
Once you have the buffer pool, the steps of the “read page” operation look like this:
- The pages read from disk are first stored in the buffer pool
- The next time you read the same page, first determine whether the page is in the buffer pool. If the page is in the buffer pool, the page is said to be hit in the buffer pool, and the page is read directly. Otherwise, the page on disk is read.
The specific steps of the “Modify page” operation look like this:
- First modify the page in the buffer pool; It is then flushed to disk at a regular rate.
Dirty pages occur in this modification operation. If a page in the buffer pool has been modified but not flushed to disk, the page in the buffer pool is said to be a dirty page, that is, the version of the page in the buffer pool is newer than the version of the page on disk.
So far, we can conclude that the size of the buffer pool directly affects the overall performance of the database.
A background thread
The main purpose of background threads is to store pages read from disk in the buffer pool and flush data from the buffer pool to disk at a certain frequency, among other things. InnoDB Storage Engine – Edition 2
Background threads refresh data in the memory pool to ensure that the latest data is cached in the memory pool. In addition, the modified data files are flushed to disk files and InnoDB can be restored to normal operation in case of database exceptions.
In addition, the InnoDB storage engine is a multi-threaded model, which means it has multiple background threads that handle different tasks. Here are a few different background threads:
- Master Thread: asynchronously refreshes data from the buffer pool to disks to ensure data consistency
- IO Threads: The InnoDB storage engine uses a lot of AIO (Async IO) to handle write IO requests, which can greatly improve database performance. The IO Thread is responsible for the call back processing of these I/O requests
- Purge Thread: Purge the undo pages that have been used and allocated
- Page Cleaner Thread: The cleaning of dirty pages in previous versions is done in a separate Thread. The purpose of InnoDB is to improve the performance of InnoDB storage engine by reducing the work of original Master threads and blocking user query threads
Redo Log and WAL policies
As mentioned earlier, when data on a page in the buffer pool is modified, the page is marked as a “dirty page” and the data on the dirty page is periodically flushed to disk.
If every time a page changes, the version of the new page is flushed to disk, this is very expensive. Also, if the hot data is concentrated in a few pages, the performance of the database will be very poor. Also, if an outage occurs while flushing a new version of a page from the buffer pool to disk, this data cannot be recovered.
WAL (Write Ahead Log) is a common policy in today’s transaction database systems (not unique to MySQL) to avoid data loss. When a transaction commits, the redo Log is written first and then the page is modified (the buffer pool is modified before flushing to disk). When data is lost due to an outage, use the redo log to recover data. This is also a D (Durability) requirement in transaction ACID.
With redo log, InnoDB guarantees that all previously committed records will not be lost in the event of an unexpected database restart. This capability is called crash-safe.
To take a simple example, let’s say you’re very enthusiastic and rich lent a lot of money, but you’re very old school, don’t know how to use electronic devices and have a bad memory, so you write down the names of all the people who owe you money and the exact amount in a small notebook. That way, when someone pays you back, you look in your little book, page by page, find his name and deduct the amount.
However, you are usually too busy to turn to your notebook to keep records. So you come up with an idea: Whenever someone returns money to you, you write it down on a piece of white paper.
That’s WAL. White paper is the redo log, and the redo log is the disk.
Of course, the redo log is not a blank log. When you run out of redo logs, you change them to a new one.
Each InnoDB storage engine has at least one redo log group. Each InnoDB storage engine has at least two redo log files. The default is one redo log group. There are two redo log files: ib_logfile0 and ib_logfile1.
To improve the reliability of the redo log, you can set mirrored log groups to have different file groups on different disks to improve the availability of the redo log. Each redo log file in the log group is of the same size and runs in a circular write mode.
For example, a redo log group contains three redo log files:
InnoDB storage engine writes redo log file 0. When file 0 is full, it switches to redo log file 1. When file 1 is full, it switches to redo log file 2. When file 2 is full, file 0 is switched.
InnoDB storage engine performance is significantly affected by the size of redo log files:
- Do not set the redo log file too large. If you do, it may take a long time to recover
- Do not set the redo log file too small, which may cause a transaction log to switch redo log files multiple times
CheckPoint technology
Do you feel safe with redo Logs? Obviously, it’s not that simple, and we still have three problems:
1) Buffer pools are not infinite, which means we can’t store our data forever waiting to be flushed to disk together
2) Redo logs are circular rather than infinite (maybe, but it’s expensive and difficult to maintain), so what if all the redo log files are full?
3) When a database has been running for months or even years, the redo log takes a very long time to reapply and recovery costs are very high.
So Checkpoint is designed to solve these problems:
- Flush dirty pages to disk when the buffer pool is insufficient
- Flush dirty pages to disk when the redo log is not available
- Shorten the database recovery time
A CheckPoint is simply a point in the redo log file at which all previous pages are flushed to disk.
Let’s explain the above three points in turn:
1) Shorten the database recovery time: When a database outage occurs, the database does not need to redo all the logs, because the previous Checkpoint pages have been flushed back to disk. Therefore, the database only needs to restore the redo log after Checkpoint. This obviously greatly reduces the recovery time.
2) Flush dirty pages to disk when the buffer pool is insufficient: What does InnoDB do when the buffer pool is insufficient to hold newly read pages? LRU algorithm. InnoDB storage engine optimizes the traditional LRU algorithm to manage buffer pool space.
The general idea is still the traditional LRU set, the specific optimization details will not be described here: that is, the most frequently used pages in the LRU List (LRU List) in the front, the least used pages in the LRU List at the end; When the buffer pool has no space to hold newly read pages, the pages at the end of the LRU list are first released. The freed (spilled) page, if dirty, needs to be CheckPoint flushed to disk.
3) Flush dirty pages to disk when redo log is not available:
Redo log unavailable means that all redo log files are full. But the fact is, in fact, the data in the redo log is every moment of the day is not useful to those who have no longer need to part is called “can be reused part”, namely when the database is down, database recovery operations do not need this part of the redo log, so this part can be overwritten reuse (or be erased).
To illustrate this, you have 4GB of redo log file space for a set of 4 1GB files. Write pos is the current location of the redo log. As the redo log is written to disk, write pos is moved back to the end of file 3 and back to the beginning of File 0. CheckPoint is the current point to erase (to flush previous pages back to disk), but also backwards and cyclic:
The space between write pos and CheckPoint is the empty portion of the redo log file that records new operations. If write POS has overtaken CheckPoint, the redo log file is full, and no new updates can be performed. We must stop and overwrite (erase) some redo logs to CheckPoint.
In summary, Checkpoint does nothing more than flush dirty pages from the buffer pool to disk. The differences are how many pages are flushed to disk each time, where the dirty pages are fetched each time, and when Checkpoint is triggered. Inside InnoDB storage engine, there are two Checkpoint types:
- Sharp Checkpoint: Flush all dirty pages back to disk when the database is shut down. This is the default way to work
innodb_fast_shutdown=1
- Fuzzy Checkpoin: The InnoDB storage engine uses this pattern internally, flushing only some dirty pages instead of flushing all dirty pages back to disk. The details of Fuzzy CheckPoint are not needed here.
Why do we need redo logs when we have bin logs?
As mentioned earlier, the MySQL architecture can be divided into two layers. The first layer is the Server layer, which is mainly responsible for the functions of MySQL. The other layer is the storage engine, which is responsible for storing and extracting details.
Redo logs are InnoDB engine specific logs. The Server layer has its own logs, including error logs, binary logs, slow query logs, and query logs.
The binary log records all changes made to the MySQL database, except SELECT and SHOW operations, which do not modify the data itself. In other words, a binlog is a logical log that records the original logic of the statement, such as “add 1 to field A on the line ID=1”.
MySQL can only use a binary log for archiving, so it is also called an archive log. Thanks to MySQL’s pluggable storage engine architecture, InnoDB developed another logging system, the redo log, to implement crash-safe capabilities.
That’s why we have bin logs and why we need redo logs.
Reviewing the redo log stores, you can see that the redo log is a physical log of what was changed on a data page.
Another difference is that the binlog file is appending, which means that the binlog file will be switched to the next one after being written to a certain size and will not overwrite the previous log file. Redo logs are written in a loop.
| flying veal 🎉 pay close attention to the public, get updates immediately
- I am a postgraduate student in Southeast University and a summer intern in Java background development of Ctrip. I run a public account “Flying Veal” in my spare time, which was opened on 2020/12/29. Focus on sharing computer fundamentals (data structure + algorithm + computer network + database + operating system + Linux), Java technology stack and other related original technology good articles. The purpose of this public account is to let you can quickly grasp the key knowledge, targeted. Pay attention to the public number for the first time to get the article update, we progress together on the way to growth
- And recommend personal maintenance of open source tutorial project: CS-Wiki (Gitee recommended project, has accumulated 1.8K + STAR), is committed to creating a perfect back-end knowledge system, in the road of technology to avoid detours, welcome friends to come to exchange learning ~ 😊
- If you don’t have any outstanding projects, you can refer to the Gitee official recommended project “Open Source Community System Echo” written by me, which has accumulated 900+ star so far. SpringBoot + MyBatis + Redis + Kafka + Elasticsearch + Spring Security +… And provide detailed development documents and supporting tutorials. Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo