This article has participated in the third “topic writing” track of the Denver Creators Training Camp. For details, check out: Digg Project | Creators Training Camp third is ongoing, “write” to make a personal impact.

Logical architecture of MySQL

From the perspective of logical architecture, MySQL is divided into two parts: server layer and storage engine layer. The server layer and the engine layer are usually referred to.

So, what are the roles of the server layer and the engine layer? Or what function are they each responsible for?

  • The Server layer can be divided into core service functions such as connectors, query caches, analyzers, optimizers, actuators, and all built-in functions such as date, time, math, and encryption functions, stored procedures, triggers, views, and so on.

All cross-storage engine functions are implemented in this layer, except for the storage and retrieval of data in the database.

  • The storage engine layer is responsible for storing and extracting data and managing physical files. A storage engine is a file access mechanism (called a storage engine) that interacts directly with files through the abstract interface of the file access layer provided by MySQL AB.

The storage engine is designed as a plug-in, so you can choose the right storage engine based on your needs. Common storage engine MyISAM, transaction support InnoDB, Memory type Memory, Archive type Archive, column storage Infobright, RocksDB based on MyRocks and RocksDB, organized storage by fractal tree index TokuDB, etc.

Different storage engines share a Server layer.

Before MySQL 5.5.5, the default storage engine was MyISAM. From MySQL 5.5.5, the default storage engine was InnoDB.

The following is a brief description of the logical structure of MySQL, and the functions of each component, which explains the execution process of a (query) statement:

When creating a table, InnoDB is used by default if the engine type is not specified.

See the role of the Server layer and engine layer from a SQL statement query

Create a simple table T for the storage engine that is explicitly specified as memory type in the table statement:

 create table T(
    Id int(11) NOT NULL AUTO_INCREMENT
 ) ENGINE=memory;
Copy the code

Take the following example, the simplest SQL query records by Id, to see the EXECUTION of SQL.

mysql> select * from T where ID=10;Copy the code

Note: the following execution stages are not strictly implemented in accordance with 1, 2, 3 and 4 step by step. The query of their respective function calls will be interlaced with each other in actual implementation, especially in the part of permissions.

The following is only a logical division of the understanding of aspects.

1. The connection

First, connect to the database. Connectors are used to establish connections with clients, obtain permissions, and maintain and manage connections

Client connection process

Here is a join command from the command line (the first step in executing SQL) :

mysql -h$ip -P$port -u$user -p
Copy the code

After press Enter, you need to enter your password in the interactive dialog. (It is not recommended to write passwords in plain text after -p.)

Mysql is the client tool used to establish a connection with the server. After completing the TCP handshake connection, the user name and password entered are authenticated.

  • If the username or password is incorrect, the client will receive an error saying “Access denied for user” and the program will stop executing.
  • If the user name and password are authenticated, the connector checks the permissions in the permission table. After that, the permission determination logic in the connection will depend on the permissions read at that time.

After a user is successfully connected, the permission of the user is changed by the administrator account, but the permission of the existing connection is not affected.

That is, after the change of permission, you need to log in to the connection again, will use the new permission.

After the connection is established, the SQL interface in the Server layer receives the sent SQL statements.

Connection status

Once the connection is complete, if no action is taken, the connection is idle.

The show processList command shows the connected threads. The Command column showing “Sleep” indicates that the connection is currently idle.

The link display varies slightly from command line tool to command line tool.

If the client does not perform any operation for a long time, the connector automatically disconnects it. This time is controlled by the wait_timeout parameter, which defaults to 8 hours.

Lost Connection to MySQL Server during Query: Lost Connection to MySQL Server during Query: Lost connection to MySQL Server during Query

There is also a similar Connect_timeout parameter, and the same interactive_timeout.

  • Connect_timeout refers to the wait time “while connecting.”

  • Wait_timeout refers to the amount of time that no action is taken “while the connection is complete”.

  • Wait_timeout is the non-interactive connection time for the client.

  • Interactive_timeout Indicates the interactive connection between the software program and the MySQL Server. Interactive_timeout and wait_timeout should be the same as possible.

Long connection may result in OOM

Database:

A long connection means that the same connection is used all the time if the client continues to receive requests after a successful connection.

A short connection is one that is disconnected after a few queries are executed and then re-established the next time.

Long connections are usually recommended to reduce the cost and complexity of re-establishing connections.

Long connection problems:

If all connections use long connections, the memory usage of MySQL may increase too fast. The reason is that the memory temporarily used by MySQL during execution is managed in connection objects, and these resources will be released when the connection is disconnected.

If long connections continue to accumulate, the memory usage will be too high and the system will forcibly kill the MySQL server (OOM).

OOM, the full name Of OOM is Out Of Memory.

There are two solutions:

  1. Disconnect long connections periodically. Use for a period of time, or in the program to determine the execution of a large memory – consuming query, disconnect, then query and reconnect.

  2. For MySQL 5.7 and above, you can reinitialize the connection resource by executing mysql_reset_connection() after each large operation. This process does not require reconnection and re-authentication of permissions, but restores the connection to its original state.

Mysql_reset_connection () is the C API of MySQL. Cannot be used directly on the command line.

Mysql_reset_connection () returns a value of 0 for success and a non-0 for error, which affects the following information related to session status:

  • Roll back active transactions and reset automatic commit mode
  • Release all table locks
  • Close or delete all TEMPORARY tables
  • Reinitializes the system variable value of the session
  • Lost user-defined variable Settings
  • Release Prepared statement
  • Close the HANDLER variable
  • Set the value of last_insert_id() to 0
  • Release the lock obtained by get_lock()
  • Clears the current query properties defined by the mysql_bind_param() call.

2. Query the cache

The query cache

The execution logic leads to the second step: query cache.

After the query request is clicked, MySQL checks whether the query cache exists.

The query cache is where previously executed statements and their results are directly cached in memory as key-value pairs. Key is the query statement and value is the query result.

If the query cache is hit, the result is directly returned to the client, which is very efficient.

If the query cache is not matched (the query statement is not in the query cache), the subsequent operations are performed. The results of the execution are stored in the query cache (if query_cache is enabled)

Query caching is not recommended

Query caching tends to do more harm than good. The invalidation of the query cache is so frequent that whenever a table is updated, all the query cache on that table is cleared.

Also, any character differences, such as Spaces, comments, etc., will result in a cache miss.

For databases under pressure to update, the hit ratio of the query cache can be very low.

Unless your business has a static table, it only updates once in a while. For example, a system configuration table, the query on that table is suitable for query caching.

MySQL can set query_cache_type to DEMAND so that query caching is not used for default SQL statements.

Statements that determine to use the query cache can be explicitly specified using SQL_CACHE, such as:

mysql> select SQL_CACHE * from T where ID=10;Copy the code

Query caching is turned off by default. Restart MySQL services by adding query_cache_type=2 to the MySQL server configuration file.

MySQL 8.0 removes the entire query cache function from MySQL 8.0.

Several parameters related to query cache:

  1. Query_cache_type =0 (or OFF) means closed; 1 (or ON) indicates open; 2 (or DEMAND) indicates that SQL_CACHE is cached only if SQL_CACHE is specified explicitly in select.

  2. To query query_cache_type variables: show variables like ‘query_cache_type’; .

  3. The have_query_cache configuration parameter indicates whether this mysql version supports query caching. MySQL8.0 show variables like ‘have_query_cache’; Return to the NO.

  4. Query_cache_limit indicates the maximum allowed cache for a single result set.

  5. Query_cache_min_res_unit Minimum memory for each cached result set.

  6. Query_cache_size Is used to query the memory size of the cache. Query_cache_size =0 indicates that no cache is used. You are advised to disable query_cache_type=0.

  7. Qcache_free_memory Queries the remaining space of the cache.

  8. Qcache_hits Queries the number of cache hits.

  9. Qcache_inserts Queries the number of cache inserts.

3. The analyzer

If the query cache is not hit, or if the query cache is not enabled, the SQL statement is parsed by the analyzer.

  1. The parser first does a “lexical analysis”. An SQL statement consists of multiple strings and Spaces. MySQL needs to identify what the strings are and what they represent.

For example, the input “select” keyword is identified as a query statement. At the same time, the string “T” is recognized as “table name T”, the string “ID” is recognized as “column ID”, and so on.

  1. Then do “grammar analysis”. Based on the results of the lexical analysis, the parser will determine whether the SQL statement you entered meets the MySQL syntax based on the syntax rules.

If the statement has syntax errors, such as this oneelect * from t where ID=1;.selectMissing the beginnings“You have an error in your SQL syntax.

Syntax errors will indicate the first error location, the “use near” position of the error message.

  1. Semantic analysis (semantic analysis), the above analysis results (that is, the generated parsing tree), further analysis of the string identifier semantics, check whether the table, column, alias is ambiguous, and so on, after passing the generation of a new parsing tree.

The parser part can be divided into parsers and preprocessors.

  • The parser processes the syntax and parses the query, generating the corresponding “parse tree” of the lesson. The parser breaks the query into identifiers (lexical analysis) and then ensures that the identifiers in the query are valid, checking for grammatical errors (parsing), such as whether the Identifier is valid, whether the statement is closed, and so on.

  • The preprocessor further checks the validity of the parse tree to resolve semantics that the parser cannot resolve. For example, check whether tables and columns exist, and check that names and aliases are unambiguous. If it passes, a new parse tree is generated and submitted to the optimizer.

That is, the “semantic analysis” of the preprocessor is needed before the optimizer.

The core of a Parser is divided into lexical analysis, syntax analysis and semantic analysis

Select * from T where k=1; select * from T where k=1; Unknown column ‘k’ in ‘where clause’ error: Unknown column ‘k’ in ‘where clause’

4. The optimizer

The optimizer is specifically designed to optimize queries (including all types of SQL statements: DDL and DML).

The optimizer generates an execution plan based on the parse tree and decides which index to use if there are multiple indexes in the table; When a statement has multiple table joins, the order in which the tables are joined is determined.

After the optimizer is executed, the execution plan of a statement is determined.

5. Executor (Query execution engine)

MySQL knows what SQL statements do from the parser and how to do it from the optimizer. All that remains is to enter the actuator (or execution phase) and follow the “execution plan.”

If you do not have permission to perform a query on T, it will return no permission.

SELECT command denied to user ‘xx’@’localhost’ for table ‘xx’

Permission validation is not only done in the executor.

After the parser, after knowing what the statement “does,” a permission validation is also done. This is called precheck (this is done in engineering implementations, which also verify permissions when a query cache hit returns).

Precheck cannot verify permissions on tables that are involved at runtime, such as when triggers are used. Therefore, there is also a run-time permission validation in the executor.

In the connect phase, you just “get permission information”; Only when you actually start querying actions do you decide whether you have permission to manipulate the table.

The permission validation in the executor phase is mainly to determine some associated operations, such as when updating a row, the trigger will update other tables.

The actual execution and invocation paths, as well as the execution stages, are called in the mysql_parse function.

If the permission is OK, the table is opened and execution continues.

When a table is opened, the executor uses the interface provided by the table engine based on its definition.

For example, if there is no index for the ID field of the T table, the execution flow would look like this:

  1. Call InnoDB engine interface to fetch the first row of the table, check whether the ID value is 10, if not skip, if yes, save the row in the result set;
  2. Call the engine interface to fetch “next row” and repeat the same logic until the last row of the table is fetched.
  3. The executor returns the recordset of all rows that meet the criteria in the above traversal as a result set to the client.

This completes the entire execution logic of a SELECT statement.

For indexed tables, the same logic is performed (if the index can be hit), and index push-down is enabled by default. The logic is as follows: the first call is “fetch the first row that meets the condition” interface, then the loop is “fetch the next row that meets the condition” interface, and the engine provides these interfaces.

Without indexes, the executor loops through the engine interface, calling “line 1” and filtering data, then calling “line next” and filtering data until the last line. Filtering is done at the Server layer.

With indexes, the executor invokes the engine interface for the first row that meets the condition, and then the interface for the next row that meets the condition. Data filtering is done at the engine layer when indexing is available.

Index condition pushdown (ICP) is to transfer data filtering to the storage engine layer to reduce the burden on the Server layer and improve the query efficiency.

Index Condition Pushdown ICP is a new feature in Mysql5.6. The main focus is to put the filtering process in the storage engine layer instead of the Server layer as before.

In the case of using ICP, if there is some indexed column judgment conditions, MySQL server pass this part of the judgment condition to the storage engine, and then by the storage engine through judging index meets the requirements for the MySQL server transfer only when the index accords with a condition the data retrieved is returned to the MySQL server.

Push-down index condition optimization is a very good query optimization strategy. By pushing down index condition checking originally done by the Server layer to the storage engine layer, the storage engine can reduce The Times of querying basic tables and returning tables, and the MySQL Server can also reduce The Times of receiving data from the storage engine and reduce the filtering process of the Server layer.

The ROws_EXAMINED field in the MySQL slow query log indicates the number of times that the storage engine fetched data rows during the execution of this statement.

Rows_examined is considered broadly as how many rows are scanned during statement execution. Sometimes, however, multiple rows may be scanned inside the engine at a time when the executor is invoked, so the number of rows scanned by the engine is not exactly the same as rows_EXAMINED. But it can be seen as: the number of rows of data obtained by the actuator.

The role of storage engines

The storage engine organizes and manages physical files, including binary logs (binlogs are managed by the Server but also provide interfaces for the storage engine), data files, error logs, slow query logs, full logs, and redo/undo logs.

In addition, there are instances of engine programs running, including functional structures, in-memory data management, etc. (this is the implementation of the specific engine itself).

Different storage engines have different file organization and management methods.

At the same time, the storage engine provides an interface for the Server layer to access and access data, so that the execution engine of the Server layer can obtain or store data.

reference

Main reference from MySQL combat 45, “high performance MySQL”, as well as some online information!