How do SQL statements execute?

Mind mapping

preface

After collating MySQL notes, I made the above diagram. However, after reading Lin Xiaobin’s column, I am not going to get into these details directly in this article. Instead, I will start from the various common SQL, and break down the process that they execute. In the select and update section, I’ll add a few more details about order BY and group BY.

How is SELECT performed

Select * from MySQL; select * from MySQL;

Mysql > select * from T where ID=10;Copy the code

Mysql will then return the result from table T with ID 10. Let’s take a look at how this works.

Logical architecture of MySQL

In the figure above, you can visually see what components do with an SQL after it is submitted.

Generally speaking, MySQL can be divided into Server layer and storage engine layer two parts.

The Server layer consists of:

The connector
The query cache
analyzer
The optimizer
actuator

Covers most of MySQL’s core service functions, as well as all built-in functions (such as date, time, math and encryption functions, etc.). All cross-storage engine functions are implemented in this layer, such as stored procedures, triggers, views, etc.

The storage engine layer is responsible for data storage and extraction. Its architecture mode is plug-in, supporting InnoDB, MyISAM, Memory and other storage engines. The most commonly used storage engine is InnoDB, which has been the default storage engine since MySQL 5.5.5.

InnoDB supports transactions,MyISAM does not. You can think of it this way, business needs concurrency, choose InnoDB, don’t need concurrency, choose MyISAM. Now, in Web business, concurrency is a must, so the industry is pretty much using InnoDB. In my mysql articles, I only discuss InnoDB by default.

The connector

When we send a SQL to the mysql server, we need to establish a connection through the connector.

The role of connectors includes managing connections and authentication.

Once authenticated, the connection remains “authenticated” until the connection is disconnected, even if the user password is changed.

If the client is inactive for too long, the connector automatically disconnects it. This time is controlled by the wait_timeout parameter, and the default is 8 hours.

The query cache

After the connection is established, you can execute the SELECT statement.

Mysql also checks to see if the statement has a cache before executing it, and returns the cached result.

As mentioned in the previous article,mysql data is stored on disk. Mysql does not want to read disk every time it executes a large number of select statements, so it saves them for reuse.

In practice, however, it turns out that query cache invalidation is so frequent that if a table updates something, all query caches on that table will be cleared.

The result of painstaking efforts to save, and then removed before it could be used. MySQL removes the query cache directly after 8.0.

But even in previous versions, query caching was not recommended except for configuration tables that were rarely updated.

analyzer

Does the query cache end and the SQL statement is executed?

No, mysql also needs to check your SQL through the profiler before doing so.

The main work is:

Lexical analysis: You are entering an SQL statement consisting of multiple strings and Spaces. MySQL needs to figure out what the strings are and what they represent. For example, to identify the keyword “select”, this is a query statement. Recognize “T” as table name T, and “ID” as column ID.
Parsing: Based on the results of lexical analysis, the parser determines whether the SQL statement you entered meets the MySQL syntax based on the syntax rules. For example, does select spell selct

The optimizer

With the parser done, is it time to execute?

I’m sorry, but I can’t. Even if your statement doesn’t have a problem, the optimizer will have to optimize it for efficient execution.

The optimizer determines which index to use when there are multiple indexes in the table. Or determine the order in which tables are joined when a statement has multiple joins.

Maybe you’ve come across the concept of executive plan Explain, but it doesn’t matter if you haven’t, and we’ll cover that later. The optimizer’s job is to develop an efficient execution plan before actually starting work.

actuator

All the above steps are performed in MySQL Server, and MySQL data is stored in the storage engine, so the executor has to use the interface provided by the storage engine to obtain our data.

Call InnoDB engine interface to fetch the first row of the table, check whether the ID value is 10, if not skip, if yes, save the row in the result set;
Call the engine interface to fetch “next row” and repeat the same logic until the last row of the table is fetched.
The executor returns the recordset of all rows that meet the criteria in the above traversal as a result set to the client.

At this point, the statement is complete.

How is the UPDATE performed

Update requires the same steps, but the execution is different.

Select has a cache, but update also has a cache. If mysql receives a large number of write requests at any one time, and mysql has to write to disk once for each request, the time spent on disk I/O will seriously affect mysql’s performance and cause QPS to fail.

Mysql’s policy is to write to the log first (in memory cache) and then to disk when it is less busy.

This technique is called WAL, write-Ahead Logging. This is the opposite of Redis, which saves the data first and then writes aOF.

Redo logs. With redo log, InnoDB guarantees that all previously committed records will not be lost in the event of an unexpected database restart. This capability is called crash-safe.

InnoDB’s redo log is fixed in size. For example, it can be configured as a group of 4 1GB files, so the redo log records 4GB of operations. Start at the beginning, then go back to the beginning at the end, as shown in the figure below.

Write_pos is a record point that moves back with each piece of data written.

Check_point is the erase point, which moves back each time a piece of data is written back to disk.

Write_pos writes back to disk and check_point writes back to disk. When check_point catches up with write_pos, all data is written back to disk. When write_pos catches up with check_point, the cache is full and cannot process new write requests.

In addition to the redo log, you also need to write a binlog to update data. The difference is:

Redo logs are unique to InnoDB; Binlog is implemented in the Server layer of MySQL and is available to all engines.
A redo log is a physical log of what was changed on a data page. The binlog is the logical log that records the original logic of the statement, such as “add 1 to the C field on the line ID=2”.
Redo logs are written in a redo log cycle. Binlog can be appended. “Appending” means that the binlog file will be switched to the next one after being written to a certain size without overwriting the previous log.

So redo logs are used to ensure persistence during database downtime, and binlogs are used to restore data versions. The redo log is the primary log and the binlog is the secondary log.

Operations on redo logs and binlogs need to be atomic.

If a redo log is written to a transaction and the database restarts after the redo log is lost, the transaction will be lost after the database restarts.
If a transaction writes to a binlog and the redo log is lost after the database restarts, the transaction is lost after the redo log is written back to disk, but the transaction is lost after the binlog recovery.

Mysql uses two-phase commits to ensure that redo log and binlog operations are atomic, either saved or lost.

Here is their two-phase submission process:

How is ORDER BY implemented

In the SELECT phase, if there is a WHERE condition, the logic of using the index is the same as if there is no order BY.

If there is no WHERE condition, the order by field is overwritten by the index, and the SELECT field happens to be on the index (no need to return to the table), then the index will be used. Otherwise, full table scan is performed.

After the select, if the data is sorted, there is no need to sort. Or else.

Mysql creates a space in memory for sorting, called inner sort.

If the amount of data is too large to fit in the buffer, you have to use the hard disk to help sort, this is called external sort.

How is GROUP BY implemented

Select * from group BY; select * from group BY; select * from group BY; select * from group BY; select * from group BY; select * from group BY; select * from group BY; select * from group BY; select * from group BY; select * from GROUP BY; select * from GROUP BY; select * from GROUP BY Group the columns in the index of BY and fetch the data.

If there is no where condition, check whether group by can be grouped by the corresponding index.

It is important to note, however, that the group by field must be fully included in the index, not just satisfy the left-most prefix rule. If the group by column is (x) or (x,y), the group by column (x,y,z) will invalidate the index.

The group by index logic does not depend on whether the table is returned or not, so the select field is arbitrary.

If the index is not available,mysql must assign a temporary table for grouping

Group by also covers concepts such as loose indexes and compact indexes, but these will be covered in more detail in the article on index optimization.