preface

Recently, I began to learn mysql related knowledge. Based on the knowledge LEARNED, I sorted out and shared it according to my own understanding. This article will analyze the execution process of the next SQL statement in mysql, including how the SQL query will flow in mysql and how the UPDATE of SQL statement is completed. I’ll take you through the infrastructure of MySQL before I analyze it. Knowing what components MySQL is made of and what they do will help us understand and solve these problems.

First, mysql architecture analysis

Here is a brief architecture diagram of mysql:

Mysql is divided into Server layer and storage engine layer

Server layer: mainly includes connectors, query cache, analyzers, optimizers, actuators, etc. All cross-storage engine functions are implemented in this layer, such as stored procedures, triggers, views, functions, etc. There is also a general log module binglog log module.

Storage engine: mainly responsible for data storage and reading, using a plug-in architecture that can be replaced, support InnoDB, MyISAM, Memory and other storage engines, among which InnoDB engine has its own log module Redolog module.

InnoDB version 5.5.5 is the default engine.

The connector

Is mainly responsible for database user login, user authentication, including checking account password, permissions and other operations, if the user account password has passed, the connector will query the user to access list all permissions, in this connection the permissions after logic are will depend on the read permissions on the data at this time, that is to say, later as long as this connection is open, Even if the administrator changes the rights of the user, the user is not affected.

The query cache

After the connection is established, the Mysql database will check whether the SQL statement has been executed in the cache. The SQL statement is cached in the memory in the form of key-value, where Key is the query expectation and Value is the result set. If the cache key is matched, it is directly returned to the client. If the cache key is not matched, subsequent operations are performed and the result is cached for the convenience of the next call. Of course, the actual execution of the cache query will verify the user’s permission, whether there is a query condition of the table.

Mysql query does not recommend the use of the cache, because for frequently updated data, cache the valid time is too short, usually leads to effect is not good, for not frequently updated data, use the cache or can delete the cache function after Mysql version 8.0, the official also thinks that the function in the actual application scenario is less, So I just deleted it.

analyzer

If mysql does not hit the cache, then it will enter the parser. The parser is mainly used to analyze the SQL statement for what purpose.

The first step, lexical analysis, a SQL statement has multiple strings, the first to extract keywords, such as SELECT, put forward the query table, put forward the field name, put forward the query conditions and so on. Once you’ve done this, you’ll move on to step 2.

The second step, syntax analysis, mainly is to determine whether you input SQL is correct, whether the syntax of mysql.

With these two steps done, mysql is ready to execute, but how and what is the best result? This is where the optimizer comes in.

The optimizer

The optimizer’s job is to execute what it thinks is the optimal (though sometimes not optimal) execution, such as how to select an index for multiple indexes, how to select an association order for multiple table queries, and so on.

actuator

If the user does not have the permission, an error message is displayed. If the user does have the permission, the interface of the engine is invoked to return the result of the interface execution.

Second, statement analysis

2.1 Query Statement

With all that said, how exactly does an SQL statement execute? In fact, our SQL can be divided into two kinds, one is query, one is update (add, update, delete). Let’s first analyze the query statement, which is as follows:

select * from tb_student  A where A.age='18' and A.name='Joe';

Copy the code

With the above description, let’s analyze the execution flow of this statement:

  • Check whether the statement has permission, if not, directly return an error message, if yes, before mysql8.0, the first query cache, use this SQL statement as the key to query whether there is a result in memory, if there is a direct cache, if not, go to the next step.

  • Select * from tb_student where id=’1′; select * from tb_student where id=’1′; Then determine whether the SQL statement has syntax errors, such as whether the keyword is correct, etc., if the check is ok, proceed to the next step.

  • Next, the optimizer determines the execution plan. The above SQL statement can have two execution plans:

    A. Select student whose name is "Zhang SAN" from student table, then check whether age is 18. Select * from student where age is 18 and then select * from student whose name is "John".Copy the code

    The optimizer then chooses the solution that performs the best according to its optimization algorithm (sometimes not necessarily the best, according to the optimizer). Then confirm the execution plan and be ready to execute.

  • For permission verification, if there is no permission will return an error message, if there is permission will call the database engine interface, return engine execution results.

2.2 Update Statement

The above is a query SQL execution flow, so let’s see how an update statement is executed? The SQL statement is as follows:

update tb_student A set A.age='the' where A.name='Joe';
Copy the code

Let’s change the age of Zhang SAN. The age field will not be set in the actual database, otherwise it will be hit by the technical person in charge. Mysql has a binary log module that can be used by all storage engines. Mysql has a binary log module that can be used by all storage engines. Our common InnoDB engine also has a log module called redo log. Let’s explore the execution process of this statement in InnoDB mode. The process is as follows:

  • The first query to zhang SAN this piece of data, if there is a cache, also will use cache.
  • InnoDB takes the query, changes the age to 19, calls the engine API, and writes this line of data. InnoDB stores the data in memory and logs the redo log. The redo log is prepared and tells the executor that it is ready to commit.
  • After receiving the notification, the executor logs the binlog and calls the engine interface to submit the redo log as the commit state.
  • Update complete.

I’m sure some of you will ask, why two log modules? Why not one log module? MyISAM doesn’t have a redo log, so we know it doesn’t support transactions. It’s not that we can’t use a single log module, but InnoDB uses redo log to support transactions. So, again, I use two log modules, but don’t make it so complicated. Why did redo log introduce prepare commit state? So let’s use contradiction here to explain why we’re doing this, right?

  • If the machine hangs up and the redo log is not written to the binlog, the machine recovers data from the redo log after the restart. However, bingog does not record the data. This data will be lost during the subsequent machine backup and the master/slave synchronization.
  • Write binlog first, and then write the redo log, assuming that finished binlog, exception to restart the machine, since there is no redo log, this machine is unable to restore the record, but binlog and recorded, so for the same reason as above, can produce inconsistent data.

If the redo log is committed in two stages, the binglog is written and the redo log is committed. This prevents the above problems and ensures data consistency. So the question is, is there an extreme case? Suppose the redo log is pre-committed and the binglog has been written. What happens if an abnormal restart occurs? This is dependent on the mysql processing mechanism, mysql processing as follows:

  • Determine if the redo log is complete, and if so, commit it immediately.
  • If the redo log is pre-committed but not in the commit state, the binlog is determined to be complete. If the redo log is complete, the transaction is committed. If the redo log is incomplete, the transaction is rolled back.

This solves the problem of data consistency.

Third, summary

  • Mysql is mainly divided into Server layer and engine layer. The Server layer mainly includes connector, query cache, analyzer, optimizer, executor, and a log module (binlog), which can be shared by all execution engines.
  • The engine layer is plug-in type, including MyISAM,InnoDB,Memory and so on.
  • The execution process of a query statement is as follows: Permission check (if hit cache) — query cache — Analyzer — Optimizer — Permission check — Executor — engine
  • The update statement execution process is as follows: Analyzer —- Permission verification —- Executor engine — Redo log(prepare state — Binlog — Redo log(commit state)

Four, reference

  • Mysql Lecture 45
  • MySQL 5.6 Reference Manual