0. What is the process of writing a statement? What steps does it go through?
1. Why `change buffer` and `redo log`?

Because the first and second time I read about this, I confused the change buffer with the redo log. The binlog you already know: it is for data synchronization. But what does the redo log do? Is it memory space or disk space? Why do I need the change buffer, and who is it for? These things are easy to get dizzy over on first contact.
2. Glossary 1.0
2.1. What is two-phase commit?

Let's look at the details of how a statement executes.

Two-phase commit is best understood literally: the data is committed and written in two phases. Write the first time, confirm the second time.
2.2. What is the `redo log`?
2.2.1. Definition

This was covered earlier, in the discussion of how MVCC guarantees a consistent view.
2.2.2. Features

Redo logs are fixed in size, configurable, and written sequentially. There are several key configuration items:

- `innodb_log_buffer_size`: the log buffer size
- `innodb_log_files_in_group`: the number of files in the log group
- `innodb_log_file_size`: the size of each log file
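These can be inspected from any client session. A minimal sketch, assuming a MySQL 5.7-era server (the variable names are from the InnoDB documentation; the values in the comments are only the common defaults, not guaranteed):

```sql
-- Inspect the redo log configuration
SHOW VARIABLES LIKE 'innodb_log%';
-- e.g. innodb_log_buffer_size    16777216  (16 MB log buffer)
--      innodb_log_file_size      50331648  (48 MB per file)
--      innodb_log_files_in_group 2         (ib_logfile0, ib_logfile1)
```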
Another feature of redo log writing is circular overwriting. Redo logs are always limited in size. What if they are full? Keep reading.
2.3. What is the `redo log buffer`?
The redo log buffer is an area of memory that sits in front of the redo log. The engine does not write redo records directly to disk; it writes them to the redo log buffer first, which is then flushed to disk. The redo log buffer is a user-space buffer, so its contents pass through a kernel buffer (the OS buffer) before reaching disk. The write path looks like this:
Why do it this way? Again in pursuit of: 1. write speed; 2. turning synchronous writes into asynchronous ones. Flushing the buffer to disk supports several policies (configured with the `innodb_flush_log_at_trx_commit` parameter):

- 0: delayed write, delayed flush. About once per second, write the `redo log buffer` to the `os buffer` and fsync it to disk
- 1: real-time write, real-time flush. On every commit, write the `redo log buffer` to the `os buffer` and fsync it to disk
- 2: real-time write, delayed flush. On every commit, write to the `os buffer`; fsync to disk about once per second
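As a sketch, the policy can be checked and changed at runtime; 1 is the documented default and the only fully crash-safe setting:

```sql
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 0: write + fsync about once per second
-- 1: write + fsync at every transaction commit (default)
-- 2: write at every commit, fsync about once per second
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```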
3. Why two-phase commit?
3.1. What problems does two-phase commit solve?
Crash safety. Two-phase (and three-phase) commit exist to solve the data-consistency problem of distributed transactions. If it is distributed, multiple parties must be involved; with a single participant, you do not need this at all. Say I run a single MySQL instance and do not need the binlog: a write goes into the redo log once, and after a database crash and restart, the redo log records alone are enough to restore the database.
The purpose of two-phase commit is to keep the redo log consistent with the binlog (not byte-for-byte identical; there is a logical decision to roll forward or roll back), preventing data inconsistency during crash recovery. The redo log is used on the master; the binlog is consumed by the slaves.
3.2. How does the `redo log` guarantee crash safety?

3.2.1. How the `redo log` works
To view redo log information, run `SHOW ENGINE INNODB STATUS`.
- `Log sequence number`: the latest LSN of the current redo log
- `Log flushed up to`: the LSN up to which the redo log has been flushed to disk
- `Last checkpoint at`: the LSN of the last checkpoint
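The LOG section of the status output shows these fields together; the numbers below are purely illustrative, not real output from any particular server:

```sql
SHOW ENGINE INNODB STATUS\G
-- ---
-- LOG
-- ---
-- Log sequence number 8621374
-- Log flushed up to   8621374
-- Last checkpoint at  8621365
```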
Two redo log files are used by default. InnoDB uses the Log Sequence Number (LSN) to number log records. The LSN is ubiquitous; it lets you locate a position in the log files.
There are two pointers into the redo log. Because the redo log is fixed in size, it will always fill up eventually. What then? You can't just throw data away, so the files form a ring: writing continues through the last file and then wraps around to overwrite `ib_logfile0`, whose old contents must first be made safe to discard. That is what the checkpoint does. The checkpoint is a position before which the data no longer matters (it has already been flushed to the data files on disk) and can be overwritten freely. With these two pointers the redo log can persist indefinitely, but there is a problem:
- If data from before the `checkpoint` is still sitting in the `change buffer` or the `buffer pool` and has not been written to disk yet, overwriting it would make that data unrecoverable after a crash. So when the write position catches up with the `checkpoint`, it stops and pushes the `checkpoint` forward.
- While it is caught up, no new data may be written.
Another function of the LSN is to repair data during crash recovery, by comparing the LSN in the redo log with the LSN of each database page.
3.2.2. Several cases, based on the two diagrams below:
- Crash at `step 1`: the `redo log` is not committed and the `binlog` has not been written. The transaction rolls back; no impact on data.
- Crash at `step 2`. There are two cases:
  - The `binlog` was written to the `binlog cache` but not yet flushed to disk; the `redo log` is not committed. On crash recovery, InnoDB sees the `redo log`'s `prepare` flag, looks up the `binlog` by `xid`, finds no complete record, and rolls the transaction back.
  - The `binlog` was flushed to disk; the `redo log` is not committed. On crash recovery, InnoDB sees the `redo log`'s `prepare` flag, looks up the `binlog` by `xid`, finds the record, and commits the transaction (`redo log` `commit`).
- Crash at `step 3`. Again two cases:
  - The `binlog` was flushed to disk; the `redo log` `commit` was written to the `redo log buffer` but not yet flushed. On recovery, InnoDB sees the `prepare` flag, finds the `binlog` record by `xid`, and commits the transaction (`redo log` `commit`).
  - The `binlog` was flushed to disk, and the `redo log` `commit` was flushed too. InnoDB sees the `commit` flag and commits the transaction directly.
In these cases, no matter where the crash happens, the redo log can guarantee master/slave data consistency and the consistency the client perceives. But:
What if data is written to the buffer pool and the server crashes before it is flushed to disk?
3.3. Persistence
4. Glossary 2.0
4.1. What is the `change buffer`?
The change buffer is a buffer that stores the latest data changes. It mainly solves the problem of the heavy I/O cost of random disk reads. Why random reads?
Look at the second diagram in the article. Without a change buffer, when an update statement modifies some data, the data must first be fetched from the buffer pool, and if it is not there, from disk. Reading data pages from disk into the buffer pool involves random I/O, one of the most expensive operations in a database. With the change buffer, some write or modify operations can be cached directly in it. On the next query, the original data is read from disk, merged with the changes in the change buffer, and returned. What is saved is the disk I/O that the write operation would otherwise have required.
The change buffer can itself be persisted; by default it is persisted in the `ibdata1` shared tablespace (see the file diagram). To ensure data consistency, writes to the `change buffer` must also be recorded in the `redo log`. So the `redo log` records not only changes to regular data pages but also changes to the `change buffer`.
Next, the buffer pool. As the name suggests, it is a pool of buffers. When we do project engineering we usually put a layer of Redis in front of the DB to absorb a wave of high concurrency and keep a flood of requests from taking the DB down directly; Redis acts as the buffer there. In the same way, a query request from the server layer goes to the buffer pool first; if the buffer pool has no matching data, the disk is queried, otherwise the data in the buffer pool is returned.
What is the relationship between the change buffer and the buffer pool?

As shown above, it is a containment relationship: the change buffer is an area inside the buffer pool. `innodb_change_buffer_max_size` sets the maximum percentage of the buffer pool the change buffer may occupy; the default is 25%.
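A minimal sketch of checking and adjusting that cap (the variable name is from the InnoDB documentation; 25 is the documented default, and 30 below is just an example value):

```sql
SHOW VARIABLES LIKE 'innodb_change_buffer_max_size';
-- Raise the cap to 30% of the buffer pool, for example
SET GLOBAL innodb_change_buffer_max_size = 30;
```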
Let's look at the details of executing a statement again, this time from the buffer side. Here are a few scenarios:
- A simple update statement: `update ... where name = '...'`
  - Check whether the data exists in the `buffer pool`
  - If it exists, update the `buffer pool` directly
  - Otherwise, record the change in the `change buffer`
  - Write the `redo log`
- An update based on a unique key: `update ... where uniqId = 7`
  - Check whether the data exists in the buffer pool
  - If it does, update the buffer pool directly
  - Otherwise, check the disk: if the row exists, load it into the buffer pool; otherwise return an error
  - Update the buffer pool
  - Write the redo log
- A simple query statement: `select ... where name = '...'`
  - Check whether the data exists in the buffer pool
  - If not, read it from disk and load it into the buffer pool
  - Check whether the change buffer holds changes for this data; if so, merge them into the buffer pool copy
  - Return the merged data
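The scenarios above only pay off on ordinary (non-unique) secondary indexes. A minimal sketch, with an assumed table name `t`:

```sql
CREATE TABLE t (
  id   INT PRIMARY KEY,
  name VARCHAR(32),
  KEY idx_name (name)   -- ordinary secondary index: changes can be buffered
) ENGINE = InnoDB;

-- If the idx_name page is not in the buffer pool, this insert's index
-- change is recorded in the change buffer instead of forcing a random read
INSERT INTO t (id, name) VALUES (1, 'alice');
```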
5. Why the `change buffer`?
5.1. What problem does the `change buffer` solve?
Reading data pages from disk into memory involves random I/O access, which is one of the most expensive operations in a database. Using the Change buffer can reduce I/O operations and improve database performance.
Then why can't unique indexes use the `change buffer`?

In the process above, when a unique index performs an insert or update, the uniqueness of the index record must be checked, so the latest record has to be fetched first: the data page is loaded from disk into memory and then checked. Since the page is already in memory anyway, InnoDB can simply operate on memory directly; there is no need to go through the change buffer and merge later.
5.2. Can the `change buffer` be turned off?

The change buffer itself is optional. The `innodb_change_buffering` parameter controls which operations are buffered; the default is `all`.

The `innodb_change_buffering` parameter has these options:
- `all`: the default. Buffer inserts, delete-marking, and purges
- `none`: disable the change buffer
- `inserts`: buffer insert operations only
- `deletes`: buffer delete-marking operations only
- `changes`: buffer both inserts and delete-marking
- `purges`: buffer the physical deletions that happen in the background
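For instance, switching it off and confirming (the default being `all`):

```sql
SET GLOBAL innodb_change_buffering = 'none';
SHOW VARIABLES LIKE 'innodb_change_buffering';
```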
The effect of turning it off is going back to random reads: if the data is not in the buffer pool, it is read from disk into the buffer pool before the operation proceeds, every single time. Logic and data accuracy are unaffected; only performance is.
5.3. Persistence
5.3.1. `change buffer` persistence

The `change buffer` is used when updating secondary-index B+ trees that are not primary-key or unique indexes. The `redo log` is what guarantees crash safety.
Why does the change buffer need to be persisted?

Let's see what would happen without persistence. A write to the change buffer also writes a redo log record. On a crash, the in-memory change buffer is lost, but the redo log has recorded every change made to it, so the redo log can be used to rebuild the change buffer. It feels as if persistence is unnecessary.
My own take: the `change buffer` has a capacity limit, so flushing it to disk frees room for new operations when memory runs out. The `redo log` guarantees the accuracy of the original data; the `change buffer` guarantees the accuracy of the index pages. Flushing it to disk is for data consistency.
5.3.2. Flushing dirty pages

What is a dirty page?
A piece of data exists in memory. An update statement comes in, modifies the in-memory copy, and returns. Now the data in memory differs from the data on disk, so the in-memory data is dirty. Because data is recorded in pages, these are called dirty pages.
The process of periodically applying the dirty data in the buffer pool back onto the original data pages, producing the latest result on disk, is called flushing dirty pages. A change-buffer merge reads the data from disk into memory, merges the change buffer into it, and then writes it back to disk; afterwards the data on disk and in memory are both the new, correct data.
What scenarios trigger a dirty-page flush?

- A query reads data that exists on disk and has pending entries in the `change buffer`: merging them into the `buffer pool` produces a dirty page
- The background thread flushes on a schedule
- The database shuts down
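As a rough way to watch this in practice (both names below are documented InnoDB status/system variables):

```sql
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';  -- current dirty-page count
SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';          -- flushing ramps up above this %
```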
6. References
- Database kernel monthly report: www.modb.pro/db/62466