- Getting started with MySQL (1) : Internal implementation of queries and updates
- MySQL introduction (2) : indexes
- MySQL Introduction (3) : Transaction isolation
- MySQL Introduction (4) : Locks
- MySQL Introduction (5) : Replication
Abstract
Simple CRUD is easy to write in MySQL.
But understanding what actually happens behind those CRUD statements is much harder.
In this article, I will briefly introduce what the architecture of MySQL is and what functions it has. Then I’ll briefly describe what happens behind the scenes when we perform simple queries and updates.
1 MySQL structure
In this section, I will briefly introduce the functions of each part. It will be described in detail in sections 2 and 3.
Let’s start with a picture of the overall architecture, to give you an idea:
1.1 the connector
The connector establishes connections with clients, checks permissions, and maintains and manages those connections.
After the client supplies its account and password, the connector authenticates it and, if authentication succeeds, keeps the TCP connection open. The connector will automatically disconnect this connection after a long period without requests (8 hours by default).
In addition, if the administrator changes the permissions of the account after the connection is established, it will not affect the current connection. The current connection has the same permissions as before.
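If you want to see this timeout on your own server, something like the following should work (the 8-hour figure is simply the default value of wait_timeout; the variable names are standard MySQL ones, but the values on your server may differ):

```sql
-- Idle-connection timeouts; 28800 seconds = 8 hours by default
SHOW VARIABLES LIKE 'wait_timeout';        -- timeout for non-interactive clients
SHOW VARIABLES LIKE 'interactive_timeout'; -- timeout for interactive clients

-- Lists current connections and how long each has been idle
SHOW PROCESSLIST;
```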
1.2 analyzer
The analyzer has two functions: lexical analysis and syntax analysis.
For an SQL statement, the analyzer first performs lexical analysis: it splits the statement into tokens and identifies what each one means.
Then the parser checks, based on the defined grammar rules, whether the statement conforms to MySQL syntax.
The familiar ‘You have an error in your SQL syntax’ message comes from this syntax-analysis step.
1.3 the cache
The cache here stores the previous SQL query statements and results. You can think of this as a map: key is the SQL statement of the query and value is the result of the query.
And, in the official handbook, it says:
Queries must be exactly the same (byte for byte) to be seen as identical.
That is, the query must be exactly the same as before, with every byte the same, case sensitive, and not even an extra space.
However, the query cache is invalidated very easily. To make sure a cached result is never stale, whenever a table is updated, all cached queries on that table are invalidated.
As a result, the hit ratio of the query cache is very low for databases under heavy update pressure. It only really pays off for tables that are read frequently and rarely updated.
In any case, the query cache was removed entirely in MySQL 8.0.
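To make the “byte for byte” rule above concrete, here is a small sketch for servers older than 8.0 (table T and the create_time column are just placeholders for illustration):

```sql
-- Check whether the query cache is even enabled (pre-8.0 only)
SHOW VARIABLES LIKE 'query_cache%';

-- These statements return the same rows, but each has a different byte
-- sequence, so each would get its own entry in the query cache:
SELECT * FROM T WHERE ID = 1;
select * from T where ID = 1;    -- different case: different cache entry
SELECT * FROM T WHERE ID = 1 ;   -- extra space: different cache entry

-- Statements containing non-deterministic data are never cached at all:
SELECT * FROM T WHERE create_time > NOW();
```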
1.4 the optimizer
The query optimizer’s job is to discover the best way to execute SQL queries. Most query optimizers, including MySQL’s, search more or less for the best of all possible query evaluation options.
Simply put, the optimizer is looking for a strategy that is the quickest way to find the data.
1.5 the executor
After going through the above process, the Server layer has resolved what data needs to be processed and what to do about it.
Permissions are then checked, and if the current connection has the required privileges on the target table, the executor calls the interface exposed by the storage engine to process the data.
This concludes the basic architecture of MySQL. But because I omitted most of the details, and covered only a small part of it, your questions may have increased.
But that’s okay, let’s move on and try to explain each part of this with actual examples, maybe it’ll make a little bit more sense.
2 the query
Let’s start with this SQL:
select * from T where ID = 1;
2.1 Cache Lookup
First, the analyzer is invoked for lexical analysis.
Lexical analysis finds that the SQL statement starts with SELECT and contains no non-deterministic data, so the cache is searched to see whether the result of this statement has been saved.
But part of the above is speculation on my part: I could not find in the official documentation exactly when MySQL looks up the cache, i.e. whether it happens before or after the parser.
However, the book “High Performance MySQL” mentions “checking whether the SQL statement starts with SELECT”, so I assume that the cache lookup requires a simple lexical scan first.
Only after lexical analysis can MySQL know whether the statement is a SELECT and whether it contains non-deterministic data (such as the current time).
2.2 Cache miss
At this point, if the cache misses, parsing continues with the parser, which builds a syntax tree and checks whether the SQL statement conforms to MySQL’s grammar.
Note that if you are interested in lexical analysis and syntax analysis, you can take a look at compiler theory.
Then comes the optimizer. When there are multiple ways to execute a query, the optimizer chooses a better one.
For example, if the table has several usable indexes, the optimizer selects an appropriate one; for a join query, it decides the join order.
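To get a feel for this, you can ask the optimizer to show its choice with EXPLAIN. The table and index names below are made up purely for illustration:

```sql
-- A hypothetical table with two candidate indexes
CREATE TABLE orders (
  id         INT PRIMARY KEY,
  user_id    INT,
  created_at DATETIME,
  KEY idx_user (user_id),
  KEY idx_created (created_at)
) ENGINE = InnoDB;

-- possible_keys lists the indexes the optimizer considered,
-- key shows the one it actually chose
EXPLAIN SELECT * FROM orders
WHERE user_id = 42 AND created_at > '2020-01-01';
```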
This is something I want to cover in a future article, but I want to focus on the following, about the structure of data in MySQL.
2.3 Data structure
When we use the executor of the last step to read and write data, we actually call the storage engine in MySQL to read and write data.
Going back to our example, we’re looking for the data in table T with ID 1. However, the storage engine does not return such a specific piece of data, it returns the data page containing the data.
Let me add a little bit of knowledge:
The database manages storage in pages, just as our operating systems do. Our machines follow the von Neumann architecture, in which program instructions and data share the same memory.
Access in this kind of structure exhibits what is known as the principle of locality:
- Temporal locality: if an item of information is accessed, it is likely to be accessed again in the near future. Program loops, stacks, and so on are typical causes of temporal locality.
- Spatial locality: information to be used in the near future is likely to sit at addresses adjacent to the information being used now.
- Sequential locality: in a typical program, most instructions execute sequentially, except for branch instructions; the ratio of sequential to non-sequential execution is roughly 5:1. In addition, access to large arrays is sequential. Sequential execution of instructions and sequential storage of arrays are the causes of sequential locality.
The simple explanation is that if a line of data is read or an instruction is executed, then there is a high probability that the CPU will continue to read or execute data or instructions at that address or after that address.
The same is true in MySQL. If a page is read once, it is likely that subsequent reads and writes will also hit that page, which reduces disk I/O.
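As a small aside, you can check the page size InnoDB uses on your server; 16KB is the default:

```sql
SHOW VARIABLES LIKE 'innodb_page_size';  -- usually 16384 bytes
```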
Going back to what we just said, how the engine finds this page will be explained in more detail in a later article on indexing. For now, simply understand that the engine can quickly locate the row of data on a page, and it then returns that page to the executor.
At this point, the page of data is still stored in memory. When the data is needed later, it will be processed directly in memory, and MySQL’s memory space can hold many pages of such data. This means that both the lookup and the modification can be done in memory instead of disk I/O each time.
Finally, the page of data is written back to disk at an appropriate time. As for when and how it is written back, we will look at that next.
3 update
Now that we’ve talked about how to find data, we know how a row of data is stored in memory as a page. The question we are going to solve now is:
- How is the update statement executed?
- How is the updated data persisted to disk?
This is an interesting question. Let’s assume two scenarios:
Let’s say MySQL just updates the data in memory after the update and then returns, and then at some point does IO to persist the data page. So all operations are in memory, and you can imagine MySQL performance is very high at this point. However, if MySQL goes down after the memory is updated and not persisted, then our data is lost.
Let’s look at another situation: each time MySQL updates the page in memory, it does IO immediately, and only returns when the data falls to disk. At this point we can make sure that the data is correct. However, each operation requires IO, and the efficiency of MySQL becomes very low.
So let’s take a look at how MySQL can achieve performance without losing data.
Now back to this statement:
update T set a = a + 1 where ID = 0;
Assume that this SQL statement is valid: table T has columns named ID and a, and there is a row with ID 0.
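For reference, a minimal definition of table T that is consistent with the examples in this article might look like this (the real table could of course be defined differently):

```sql
CREATE TABLE T (
  ID INT PRIMARY KEY,
  a  INT
) ENGINE = InnoDB;

INSERT INTO T (ID, a) VALUES (0, 0);  -- the row with ID = 0 that we will update
```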
At this point, the statement passes through the connector and the analyzer; the analyzer discovers that this is an UPDATE statement, so it continues through the parser, the optimizer, and the executor. The executor checks that it has the required permission, opens the table, and the engine finds the data page containing the row with ID 0 and loads that page into memory.
As you can see, the UPDATE statement also goes through the same process.
Then, we will introduce how MySQL ensures data consistency.
3.1 Redo logs
There is a very important log module called the redo log.
Note that redo logging is unique to the InnoDB engine.
When data is updated, the redo log records which data pages were changed; as long as the change is successfully recorded in the redo log, the update is considered complete, even though the modified pages in memory have not yet been written back to disk.
This technique is called WAL (Write-Ahead Logging): write the log first, write the data pages to disk later. As long as the log is saved, even if MySQL crashes before a data page is written back to disk, the page can be recovered from the log afterwards.
However, InnoDB’s redo log is fixed in size. For example, it can be configured as a group of 4 files with a size of 1GB each. One of the problems with fixed size is that redo logs tend to fill up.
So InnoDB writes the redo log in a circular fashion. Notice that there are two pointers here: write_pos marks the current write position and advances every time a record is written; check_point marks how far the log has been flushed, and it advances when InnoDB writes the corresponding data pages back to disk.
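The “4 files of 1GB each” example above corresponds roughly to the settings below (these two variables are the classic way to size the redo log; recent versions have started replacing them with a single innodb_redo_log_capacity setting):

```sql
-- Matches innodb_log_file_size and innodb_log_files_in_group
SHOW VARIABLES LIKE 'innodb_log_file%';

-- In my.cnf, the example configuration would be:
--   innodb_log_files_in_group = 4
--   innodb_log_file_size      = 1G
```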
Take update T set a = a + 1 where ID = 0; as an example: we update the row in the in-memory page, write the change to the redo log, and return success. The change is then represented in the redo log as an update to that memory page.
Note that on disk, a is unchanged, but in memory, a has already become a+1. The redo log records the update of the memory page, and write_pos advances.
To advance check_point, the change recorded there must first be persisted to disk, that is, a+1 must be written back to disk, so that a equals a+1 both on disk and in memory. Only after this write succeeds can check_point advance. This design is what lets the fixed-size redo log be reused indefinitely.
Now that we know write_pos advances after data is updated, when does check_point advance?
This involves the parameter innodb_io_capacity. It tells InnoDB roughly how fast your disk can read and write, and InnoDB uses it to control how quickly check_point advances. How to tune it is a topic for a future article; in this article, just understand that check_point keeps advancing at a certain rate.
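If you are curious, you can look at (and, on a fast disk, raise) this parameter yourself; 200 is the fairly conservative default, and the value below is only an example:

```sql
SHOW VARIABLES LIKE 'innodb_io_capacity%';   -- also shows innodb_io_capacity_max

-- e.g. on an SSD you might allow more IOPS for background flushing
SET GLOBAL innodb_io_capacity = 2000;
```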
If the database has a large number of updates while check_point only advances at a fixed rate, write_pos keeps pushing forward and the log eventually fills up. This is something InnoDB tries hard to avoid, because when it happens the system can no longer accept updates and all updates block. Seen from monitoring, the update throughput drops to zero. How to avoid this belongs to tuning; for now we just need to know it can happen.
There’s another situation that I want to talk about, which is also a lot of updates. As we mentioned earlier, all operations are done in memory, which means that if the data I’m trying to manipulate is distributed across different pages, then there are a lot of pages in memory. At this point, memory may be low.
We add a concept here: clean pages and dirty pages. A clean page is a data page that has been read from disk into memory and not modified since, so you can think of it as a page that has only been queried, never updated. A dirty page is a page whose contents in memory differ from what is on disk because it has been modified. Now, when a flood of queries or updates needs more memory, and memory is already full of data pages, what should we do?
- If there is free space, the required data pages are directly read and stored in the space.
- If there is no free space, the least recently used clean page is eliminated, that is, the clean page space is used.
- If there are no clean pages either, the least recently used dirty page has to be evicted. How do you evict a dirty page? Write it back to disk first, so that the on-disk copy matches and the page becomes clean, then reuse its space.
But here is a question: suppose a dirty page is written back to disk because memory was short, while the change to that page is still recorded somewhere in the redo log. What happens when the redo log reaches that record again? Do we need to delete the corresponding entry from the redo log, or what?
This is a question I want you to think about; if it occurred to you, I think you have understood the relationship between the redo log and dirty pages described above. The answer is that nothing in the redo log needs to be modified when a dirty page is flushed. When check_point advances, the redo log simply skips pages that have already been written back to disk.
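For completeness: the in-memory pages we keep talking about live in the InnoDB buffer pool. A couple of variables worth knowing (the names are standard; interpreting and tuning them is a topic for another day):

```sql
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';     -- how much memory holds data pages
SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';  -- target upper bound on the share of dirty pages

-- SHOW ENGINE INNODB STATUS also reports the current dirty-page count
-- ("Modified db pages") in its BUFFER POOL AND MEMORY section.
```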
3.2 Archive Logs
With all this talk about the redo log, let’s talk about the archive log (binlog).
Why do we need another log? There are a couple of reasons. First, the redo log is written in a circle, which means new records eventually overwrite old ones; it cannot serve as a complete history over a long period of time.
The second reason is that the redo log is unique to the InnoDB engine; other engines do not have it.
That is why MySQL also has the binlog (archive log), which is implemented at the Server layer and is therefore always available, no matter which storage engine is used.
The binlog is append-only: when one file is full, writing moves on to the next file. It records the change made by each statement.
In other words, during fault recovery, you can use binlog to recover records one by one.
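A few binlog-related commands, in case you want to poke around on your own server (nothing here is required for the rest of the article):

```sql
SHOW VARIABLES LIKE 'log_bin';        -- whether the binlog is enabled at all
SHOW VARIABLES LIKE 'binlog_format';  -- STATEMENT, ROW or MIXED
SHOW BINARY LOGS;                     -- the binlog files written so far
```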
So how do we make sure the binlog and the redo log stay consistent with each other? Let’s talk about MySQL’s two-phase commit.
Again take update T set a = a + 1 where ID = 0; as an example:
Everything up to updating the data page in memory works as described above. The page update is then written to the redo log.
Note that at this point the record is not written to the redo log file on disk, but to an in-memory buffer called the redo log buffer. No disk IO happens yet, so performance is not affected.
Then the transaction enters a stage called prepare.
Next, the binlog is written. Note that the binlog, too, is written to a buffer first and is not yet persisted.
Only when both have been written successfully will the transaction be committed.
Then, two parameters become important: sync_binlog and innodb_flush_log_at_trx_commit. They determine whether MySQL waits for the redo log and the binlog to be persisted before returning.
Let’s look at innodb_flush_log_at_trx_commit first:
- When the parameter is set to 1 (the default is 1), it means that the fsync operation must be invoked once when the transaction is committed. The safest configuration ensures persistence.
- When the parameter is set to 2, only write is performed during transaction submission and only the redo log buffer is written to the system page cache. Fsync is not performed. Therefore, transactions are not lost if the MySQL database is down, but transactions may be lost if the operating system is down.
- When the parameter is set to 0, nothing is flushed to the redo log file at commit time; InnoDB’s master thread writes and fsyncs the redo log once per second, so at most 1 second of transactions can be lost. (The master thread is a background thread responsible for asynchronously flushing data from the buffer pool to disk to keep data consistent.)
That is, if we set it to 1, fsync will be called and wait for the redo log to persist before returning.
Now sync_binlog:
- When sync_binlog=0, each commit only does write, not fsync.
- When sync_binlog=1, fsync is performed for every committed transaction.
- When sync_binlog=N (N>1), each commit does write, and fsync is performed once every N transactions; if the host goes down, the binlog of the last N transactions may be lost.
That is to say, if we set it to 1, the final commit will wait for the fsync system call mentioned above to complete.
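Putting the two together, the common “double 1” configuration, where both logs are fsync’d on every commit, looks like this (shown with SET GLOBAL for illustration; in practice you would normally put these in my.cnf):

```sql
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- fsync the redo log at every commit
SET GLOBAL sync_binlog = 1;                     -- fsync the binlog at every commit

-- And to check the current values:
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'sync_binlog';
```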
So why do we need two-phase commit to ensure consistent data?
If the database goes down during the prepare phase, the transaction is simply rolled back after restart, so data is not affected.
During crash recovery, MySQL checks whether the redo log record carries a commit flag. If it does, the transaction is committed. Otherwise, MySQL checks whether the corresponding binlog is complete: if it is, the transaction is committed; if not, it is rolled back.
So what happens if we don’t commit phases?
Suppose instead we simply wrote the redo log first and then the binlog, with no two-phase commit and no cross-checks. If MySQL goes down after the redo log is written but before the binlog is, we can still recover the new value from the redo log after restart. However, because the change never made it into the binlog, using the binlog later for recovery, or for synchronizing a slave library, will produce inconsistent data. (Master-slave synchronization is explained in a future article.)
Now suppose the order is reversed: the binlog is written first, then the redo log. If MySQL crashes in between, crash recovery will not apply the change, because the redo log knows nothing about it, yet the binlog already contains it, so using the binlog later again leads to inconsistent data.
So, the two-phase commit is to ensure that the two logs are consistent.
In closing
First of all, thank you for being here.
I hope this article helps you understand MySQL better. Of course, the length of the article is limited, and so is the author’s level, so many details are not expanded here; many of these points will be covered in future articles. Also, if you spot any mistakes, please do not hesitate to point them out, thank you!
Secondly, I want to thank my friend, who gave me a lot of help! In addition, special thanks to teacher Ding Qi; I used his “MySQL Actual Combat 45 Lectures” as the main thread of my study.
PS: If you have other questions, you can also reach the author on the official account, where all articles are published first. Feel free to come and chat~