Preface

When we use MySQL, we often treat it as a black box and pay little attention to its internal mechanisms. In this article, we will take MySQL apart to see what its “parts” are. Hopefully this will give you a deeper understanding of MySQL, so that when you run into an exception or a problem, you can get to the root of it and locate and solve it more quickly.

MySQL infrastructure & how a query is executed

In general, MySQL is divided into two parts: the server layer and the storage engine layer.

The Server layer includes the connector, query cache, analyzer, optimizer, executor, and so on. It covers most of MySQL's core service functionality, as well as all the built-in functions (such as date, time, math, and encryption functions). All cross-storage-engine features are implemented in this layer, such as stored procedures, triggers, and views.

The storage engine layer is responsible for storing and retrieving data. Its architecture is plugin-based, supporting InnoDB, MyISAM, Memory, and other storage engines. The most commonly used storage engine is InnoDB, which has been the default storage engine since MySQL 5.5.5.

Let’s look at the role of each component.

The connector

As the name implies, the connector is responsible for establishing, maintaining, and managing connections with clients, and for obtaining the account’s permissions.

When we use the mysql client, we connect with the following command:

mysql -h$ip -P$port -u$user -p
  • If you enter an incorrect user name or password, the error “Access denied for user” is returned and the client stops executing.
  • If the user name and password pass authentication, the connector looks up the account’s permissions in the permission table. All subsequent permission checks for this connection are based on the permissions read at this point. (This means that changing the permissions of an account that is already logged in does not take effect immediately for that connection.)
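
If you want to see this in practice, the short sketch below uses two standard mysql client commands (nothing specific to this article’s setup) to list the current connections and the privileges the current session was granted when it connected:

mysql> show processlist;                  -- current connections, with their user, host and state
mysql> show grants for current_user();    -- the privileges this session obtained at connect time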

In addition, if a connection stays idle for a certain amount of time, the server actively disconnects it. This is controlled by the wait_timeout parameter:

mysql> show variables like 'wait_timeout';
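
If the default idle timeout does not suit your workload, it can be adjusted; a minimal sketch (the values are illustrative):

mysql> set global wait_timeout = 28800;   -- applies to new connections; 28800 seconds (8 hours) is the usual default
mysql> set session wait_timeout = 600;    -- applies only to the current session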

The query cache

MySQL 8.0 has removed the query cache completely.

The query cache simply records the result of every query; when the same query comes in again, the saved result is returned directly and the rest of the process is skipped. This is very efficient for repeated requests. In practice, however, the query cache often does more harm than good, because it is invalidated very frequently: as soon as a table is updated, all cached queries on that table are cleared. So it is quite likely that a result you worked hard to cache is wiped out by an update before it is ever used.

Of course, this approach does have advantages for tables that are read frequently and written rarely, such as configuration tables.

The best practice is to turn off the query cache by default and, for the queries that are known to benefit, explicitly request the query cache:

mysql> select SQL_CACHE * from T where ID=10;
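
On versions that still have the query cache (MySQL 5.7 and earlier), the related settings can be inspected as below; query_cache_type = 2 (DEMAND) is the setting that caches only statements that explicitly ask for SQL_CACHE. This is a sketch, not a tuning recommendation:

mysql> show variables like 'query_cache%';   -- query_cache_type, query_cache_size, etc.
-- in my.cnf: query_cache_type = 2 (DEMAND) means only SELECT SQL_CACHE ... statements are cached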

The analyzer

The main job of the analyzer is to parse the SQL: through lexical and syntax analysis it works out what the statement you typed means (i.e. what you are trying to do) and checks whether it conforms to MySQL’s syntax. If the statement is not valid, MySQL throws the error “You have an error in your SQL syntax”.
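
For example, a deliberately misspelled keyword triggers exactly this error (output abbreviated; the exact wording varies slightly between versions):

mysql> elect * from T where ID=10;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'elect * from T where ID=10' at line 1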

The optimizer

The optimizer decides which index to use when a table has multiple indexes, or the order in which tables are joined when a statement joins multiple tables.

For example, consider the following statement:

mysql> select * from t1 join t2 using(ID)  where t1.c=10 and t2.d=20;
  • It can first fetch the ID values of the rows in t1 where c=10, join to t2 on those ID values, and then check whether d in t2 equals 20.
  • Or it can first fetch the ID values of the rows in t2 where d=20, join to t1 on those ID values, and then check whether c in t1 equals 10.

The optimizer is responsible for deciding which of the two plans is better.
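
You can ask the optimizer which plan it actually chose with EXPLAIN: the rows in the output are listed in join order, and the key column shows the chosen index (assuming tables t1 and t2 exist with indexes on c and d):

mysql> explain select * from t1 join t2 using(ID) where t1.c=10 and t2.d=20;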

The executor

Take a query statement as an example. Before executing it, the executor checks permissions, that is, whether the account is allowed to operate on the table. If it is, the executor opens the table and continues:

mysql> select * from T where ID=10;

When the table is opened, the executor uses the interfaces provided by the table’s engine, according to the table’s definition. For example, if the ID column of table T in our example has no index, the executor’s flow looks like this:

  1. Call the InnoDB engine interface to fetch the first row of the table and check whether its ID value is 10; if not, skip it, and if so, save the row in the result set.
  2. Call the engine interface to fetch the “next row” and repeat the same check, until the last row of the table has been read.
  3. The executor returns the set of all rows that met the condition during this traversal to the client as the result.

At this point, the statement is complete.

The logic is similar for indexed tables: the first call is to the “fetch the first row that meets the condition” interface, and then the “fetch the next row that meets the condition” interface is called in a loop; these interfaces are all defined inside the engine.
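
A minimal sketch of the difference between the two flows, using a hypothetical definition of table T (the exact plan the optimizer picks may vary with the data):

mysql> create table T (ID int, c int) engine=InnoDB;
mysql> insert into T values (10, 1), (20, 2);
mysql> explain select * from T where ID=10;    -- type: ALL, the executor itself checks every row
mysql> alter table T add index idx_id (ID);
mysql> explain select * from T where ID=10;    -- typically type: ref, the engine locates matching rows via the index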

How MySQL’s logging system executes an update statement

In this chapter, we will look at how an update statement is executed. Here is the most basic kind of update statement:

mysql> update T set c=c+1 where ID=2;

First of all, we can be sure that an update statement goes through the same set of steps as a query statement:

  • You connect to the database through a connector before you execute a statement;
  • When an update occurs on a table, the query cache associated with that table is invalidated, so this statement clears all cached results on table T. This is why we generally don’t recommend using query caching.
  • Next, the parser knows that this is an update statement through lexical and syntactic parsing.
  • The optimizer decides to use the index on ID.
  • The executor is then responsible for execution: it finds the row and updates it.

Where update statements differ from query statements is the logging system: an update involves two log modules, the redo log and the binlog.

redo log

The redo log is specific to the InnoDB engine, and it is one of the main reasons InnoDB became MySQL’s default engine.

Specifically, when a record needs to be updated, the InnoDB engine first writes the change to the redo log and updates it in memory; at that point the update is considered complete. Later, at an appropriate time, InnoDB writes the change to the data files on disk. If this approach were not used, every update would have to be written to disk immediately and the I/O cost would be high; writing the log first effectively improves execution efficiency.

This technique is called WAL, short for Write-Ahead Logging: the log is written first, and the data is written to disk later.

It is also important to note that the redo log can be viewed as a fixed-size circular buffer. If you configure a group of four 1 GB files, the total capacity is 4 GB; when the last file is full, writing wraps around to the beginning of the first file.

Two pointers mark positions in this buffer: write_pos marks the current write position and advances each time a record is written; checkpoint marks the position up to which records can be erased, and it advances after the corresponding data has been flushed to disk.

The space between write_pos and checkpoint is still free and can be used to record new operations. If write_pos catches up with checkpoint, the redo log is full and no new updates can be performed; MySQL must stop and flush some records to push the checkpoint forward. With the redo log, InnoDB guarantees that previously committed records will not be lost even if the database restarts unexpectedly. This capability is called crash-safe.
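
To relate this to a running server, the size and number of redo log files, and how far the checkpoint lags behind the write position, can be inspected as follows (these are the classic InnoDB settings; newer 8.0 releases also offer innodb_redo_log_capacity):

mysql> show variables like 'innodb_log_file%';   -- innodb_log_file_size and innodb_log_files_in_group
mysql> show engine innodb status\G               -- compare "Log sequence number" with "Last checkpoint at"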

binlog

The redo log we talked about above is specific to the InnoDB engine; the Server layer also has its own log, called the binlog.

Why are there two logs?

Because MySQL did not have the InnoDB engine at first. MySQL’s own engine, MyISAM, has no crash-safe capability, and the binlog can only be used for archiving. Since the binlog alone cannot provide crash-safe guarantees, InnoDB uses the redo log to implement them.

There are three differences between the two types of logs.

  1. The redo log is specific to InnoDB; the binlog is implemented in MySQL’s Server layer and is available to all engines.
  2. The redo log is a physical log that records what was changed on which data page; the binlog is a logical log that records the original logic of the statement, for example “add 1 to the c column of the row with ID=2”.
  3. The redo log is written in a fixed-size circular fashion; the binlog is append-only: when a binlog file reaches a certain size, writing switches to the next file, and earlier logs are never overwritten.
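
A few commands for looking at the binlog side (the file name is illustrative; yours will differ):

mysql> show variables like 'log_bin';                    -- whether binary logging is enabled
mysql> show binary logs;                                 -- list the binlog files and their sizes
mysql> show binlog events in 'binlog.000001' limit 5;    -- peek at the logical events in one file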

Two-phase commit

With the redo log and binlog in mind, let’s look at how the executor and the InnoDB engine work together with these two logs to execute an update statement:

  1. The executor asks the engine for the row with ID=2. Since ID is the primary key, the engine finds the row directly with a tree search. If the page containing this row is already in memory, it is returned to the executor directly; otherwise it is first read into memory from disk and then returned.
  2. The executor takes the row returned by the engine, adds 1 to the value of c (for example, from N to N+1) to get a new row, and then calls the engine interface to write the new row.
  3. The engine updates the row in memory and records the change in the redo log, which is now in the prepare state. It then tells the executor that its part is done and the transaction can be committed at any time.
  4. The executor generates the binlog for this operation and writes the binlog to disk.
  5. The executor calls the engine’s commit-transaction interface, and the engine changes the redo log record it just wrote to the commit state.

In the figure, the light-colored parts are executed inside InnoDB, and the dark-colored parts are executed in the executor.

The last three steps form a “two-phase commit”: the write of the redo log is split into two steps, prepare and commit.
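
Whether the two logs actually reach disk at commit time is controlled by two parameters; the common “double 1” configuration below is what makes the two-phase commit crash-safe in practice (shown as a sketch; on many builds these are already the defaults):

mysql> set global innodb_flush_log_at_trx_commit = 1;    -- fsync the redo log at every transaction commit
mysql> set global sync_binlog = 1;                        -- fsync the binlog at every transaction commit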

So why two-phase commit? During data recovery, operations (O&M) staff use the binlog to replay data, so the data recorded in the binlog must be consistent with the data actually stored on disk. Consider data consistency in the following abnormal cases.

First, a simplified view of the three steps:

  1. prepare stage
  2. write the binlog
  3. commit
  • If it crashes before step 2:
    • Restart recovery: there is no commit and no binlog record, and the data was never actually written to disk, so the transaction is rolled back.
    • Backup recovery: there is no binlog record either, so restoring data from the binlog gives the expected result.
  • If it crashes before step 3:
    • Restart recovery: although commit was not executed, the prepare record and the binlog are both complete, so the transaction is automatically committed after the restart.
    • Backup recovery: the binlog record exists, so restoring data from the binlog gives the expected result.

If two-phase commit were not used, the server could crash after the redo log is written to disk but before the binlog is written. The update would then exist in the database but be missing from the binlog, so data restored later from the binlog would be inconsistent with the actual data.
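
For completeness, this is roughly what “replaying the binlog” looks like during a point-in-time recovery; the file name and stop time are purely illustrative:

mysqlbinlog --stop-datetime="2021-01-01 12:00:00" binlog.000001 | mysql -u root -p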

Some common Q&A about the MySQL logging system

Q: Since the redo log is also written to disk, what is the difference from writing the data directly to disk?

Writing data pages to disk is random I/O and requires locating the right position first, so it is expensive. The redo log is written sequentially and needs no seeking, which is much cheaper. In addition, with group commit, the redo log can flush the changes of multiple transactions to disk together, which improves throughput.
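
Group commit behaviour can be observed and tuned through these server variables (MySQL 5.7 and later; shown only as a pointer, not a tuning recommendation):

mysql> show variables like 'binlog_group_commit_sync%';   -- binlog_group_commit_sync_delay and ..._no_delay_count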

Q: If a transaction has completed and the redo log is written, but the data has not yet been flushed to disk, will a select fail to see the update?

No. Because of MVCC and the buffer pool, a select does not go directly to disk for data; it reads from memory first. After the update, the modified data is either still in memory or has already been flushed to disk, so the data read is the updated data. There is no case where an update succeeds but the new value cannot be read.

Q: The redo log is in the prepare state and the binlog is complete when MySQL crashes. What happens to this transaction?

Normally, a redo log entry is considered effective only once it reaches the commit state. During crash recovery, however, a transaction whose redo log is in the prepare state and whose binlog is complete is automatically committed, so the corresponding records in the binlog remain valid.