Do your best and obey the destiny. I am a postgraduate student in Southeast University and a summer intern in Java background development in Ctrip. I love fitness and basketball, and I am willing to share what I have seen and gained related to technology. I follow the public account @flying Veal and get the update of the article as soon as possible
This article has been included in the “CS-Wiki” Gitee official recommended project, has accumulated 1.7K + STAR, is committed to creating a perfect back-end knowledge system, in the road of technology to avoid detours, welcome friends to come to exchange and study
If you do not have a good project, you can refer to a project I wrote “Open source community system Echo” Gitee official recommended project, so far has accumulated 800+ star, SpringBoot + MyBatis + Redis + Kafka + Elasticsearch + Spring Security +… And provide detailed development documents and supporting tutorials. Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo
MySQL has been widely used since the release of version 3.23 in 2001. With continuous upgrades and iterations, MySQL has passed 20 years.
In order to give full play to the performance of MySQL and use it smoothly, it is necessary to understand its design idea correctly. Therefore, it is necessary to understand the logical architecture of MySQL. This article will detail the various components of the MySQL architecture through the execution of an SQL query statement.
Overview of MySQL logical architecture
The most important and distinctive feature of MySQL is its pluggable Storage Engine Architecture, which is designed to separate query processing from other system tasks and data storage/extraction. Take a look at the official website to explain:
The MySQL pluggable storage engine architecture enables a database professional to select a specialized storage engine for a particular application need while being completely shielded from the need to manage any specific application coding requirements.
Basically, the MySQL pluggable storage engine architecture enables developers to select a specific storage engine for specific application requirements without managing any specific application coding requirements at all. That is, although different storage engines have different capabilities, applications are not affected by these differences.
If application changes introduce a need to change the underlying storage engine, or if one or more storage engines need to be added to support the new requirements, no significant coding or process changes can be made to keep things going. The MySQL server architecture insulates applications from the underlying complexity of the storage engine by providing a consistent and easy-to-use API that works across storage engines.
The logical architecture of MySQL is shown as follows:
We can roughly divide the logical architecture of MySQL into Server layer and storage engine layer:
1) Most of MySQL’s core service functions are in the Server layer, including joins, query resolution, analysis, optimization, caching and all built-in functions (e.g., date, time, math and encryption functions). All cross-storage engine functions are implemented in this layer: stored procedures, triggers, views, etc.
It is worth mentioning that the connector, the top service of the Server, has the function of managing MySQL connection and permission verification. This is obviously not unique to MySQL; most web-based client/server tools or services have a similar architecture.
2) The second layer is storage engine (support InnoDB, MyISAM, Memory and other storage engines). The storage engine is responsible for storing and extracting data in MySQL and responding to requests from the upper-layer server. Naturally, each storage engine has its advantages and disadvantages. Different storage engines cannot communicate with each other, so we need to choose the appropriate storage engine according to different scenarios.
The server communicates with the storage engine through apis. These interfaces mask the differences between storage engines and make these differences transparent to the upper-layer query process. The storage engine API contains dozens of low-level functions that perform operations such as “start a transaction” or “extract a row based on a primary key.”
Note that in MySQL 5.1 and earlier, MyISAM was the default storage engine, while InnoDB became the default storage engine after MySQL 5.5.5.
Connector
The official documentation for MySQL 5.7 describes the connector as follows:
MySQL Connectors provide connectivity to the MySQL server for client programs.
The MySQL connector provides the client program with a connection to the MySQL server. To be more specific, the connector actually does two things, one is to manage MySQL connections and one is to authenticate permissions. Let’s explain it one by one.
First, to connect to the MySQL server, we usually need to provide a MySQL user name and password, and specify a host name such as host if the server is running on a machine other than the one we are logging in to. So the join command looks like this:
shell> mysql -h host -u user -p
Enter password: ********
Copy the code
Of course, if you are logging in on the same machine as MySQL, you can omit the host name by using the following:
shell> mysql -u user -p
Copy the code
The command above should be familiar to you.
OK, after establishing a connection with the classic TCP three-way handshake, the connector will authenticate you based on the username and password you entered:
1) If the username or password is incorrect, you will receive an error saying “Access denied for user” and the client will stop executing.
2) If the user name and password are authenticated, you will see the following:
Mysql > tell you that mysql is ready to start typing SQL statements.
Of course, the connector does more than just compare the username and password; it also verifies that the user has permission to perform a particular query (for example, whether the user is allowed to execute a SELECT statement against the Country table in the World database). After that, all permission determination logic within the connection will depend on the permissions read at that point.
This means that when a user successfully establishes a connection, even if you change the permissions of the user on another terminal using the administrator account, the permissions of the existing connection will not be affected.
That is, when user permissions are changed, only newly created connections will use the new permissions.
When a connection is established, it is idle (Sleep) if you have no subsequent action.
In fact, for a MySQL connection (or thread), there is a state at any given moment that represents what MySQL is currently doing. There are several ways to view the current state, the simplest being to use the SHOW FULL PROCESSLIST Command (the Command column in the result of the Command return indicates the current state).
Over the life of a query, the state changes many times. The Sleep state in the figure above means that the connection is waiting for a new request from the client, and the Query state means that the connection is executing a Query or sending the result to the client.
In MyQL’s default Settings, if a connection is in Sleep state for 8 hours, the server will disconnect the connection and all subsequent operations on the connection will fail. This time is controlled by the wait_timeout parameter:
Query Cache
OK, after the connection is established, we can enter the SELECT statement to query. The execution logic comes to the second step: query the cache.
The official documentation explains Query Cache as follows:
The query cache stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.
This means that the query cache stores the text of the SELECT statement and the corresponding result of the response to the client. This way, if the server later receives the same SELECT statement, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared between sessions, so a result set generated by one client can be sent in response to the same query issued by another client.
If the current query happens to hit the query cache, MySQL checks user permissions before returning the query result. There is still no need to parse the query SQL statement, because the query cache already holds the table information that the current query needs to access.
So since the cache is involved, the cache consistency problem is inevitable. Thankfully, no additional action is required and the query cache does not return stale data!
The query cache does not return stale data. When tables are modified, any relevant entries in the query cache are flushed.
When this flushed version is modified, any associated entries in the query cache will be flushed. Note that this flushed version translates as “flushed,” not flushed.
Doesn’t look bad? The invalidation cache can be emptied automatically without us having to do anything manually.
Unfortunately, however, as of MySQL 5.7.20, the use of query caching is no longer officially recommended and has been removed in MySQL 8.0!
The query cache is deprecated as of MySQL 5.7.20, and is removed in MySQL 8.0.
Actually it is easy to understand, for example, for a great BBS program flow, query the posts table needs every time exist, posts on the rise, and almost every moment that as long as the update table, the table all the query cache will be cleared, the pressure for the MySQL database, you can imagine. Take pains to save the query results, but before you can use them, they are all cleared by an update.
For versions prior to MySQL 8.0, you can set the query_cache_type parameter to DEMAND so that all SQL statements no longer use query caching. For statements where you are sure to use the query cache, you can specify it explicitly with SQL_CACHE, as in the following statement:
mysql> select SQL_CACHE * from t1 where id = 1;
Copy the code
Parser
If there is no hit or query caching is not enabled, the next thing the MySQL server needs to do is convert an SQL statement into an execution plan and then interact with the storage engine according to this execution plan. This includes several sub-stages: parsing the SQL, preprocessing, and optimizing the SQL execution plan. Any errors in this process, such as syntax errors, can terminate the query.
Parsing SQL and preprocessing is what the parser does, and optimizing the SQL execution plan is what the optimizer does. Let’s start with the parser.
The parser is used to parse SQL, and the preprocessor is used for preprocessing. I’ll call them both parsers for now
By parsing SQL, MySQL parses SQL statements using keywords and generates a “parse tree” to verify statements based on syntax rules. For example, it verifies that the wrong keywords are used, or that the keywords are used in the right order, or that the quotes match correctly.
Preprocessing further checks that the parse tree is valid, for example, to see if tables and columns exist, and to see if table and field names are correct.
Optimizer
Now that parsing the tree is legal, MySQL already knows what you have to do. However, a query can have many execution plans that all return the same result, so which execution plan to choose?
Here’s a simple example:
mysql> select * from t1 where id = 10 and name = "good";
Copy the code
For the above statement, name = good and then id = 10, or id = 10 and then name = good, the time cost of the two execution plans may be different.
The role of the optimizer is to find the best execution plan. It is important to note that the execution plan here is a data structure, not the bytecode that many relational databases generate.
In addition, the optimizer doesn’t care what storage engine the table uses, but the storage engine does have an impact on optimizing the query. The optimizer asks the storage engine for information on capacity or the cost of a specific operation, as well as statistics on table data.
When the optimizer phase is complete, the execution plan for the statement is determined and the executor phase can be entered.
actuator
As with the hit query cache, before executing the SQL statement, the executor determines whether the current user has permission to execute the query on the table. If not, it returns an error with no permission.
After permission authentication is complete, MySQL will execute the commands according to the execution plan step by step. In the process of step-by-step execution according to the execution plan, a large number of operations need to be done by calling the interfaces implemented by the storage engine, which we call the “Handler API.”
Each table in the query is represented by an instance of a handler. In fact, MySQL creates a handler instance for each table during the optimization phase. The optimizer can obtain information about the table based on the interface of these instances, including all column names of the table, index statistics, and so on.
Here’s an example:
mysql> select * from t1 where id = 10;
Copy the code
Assuming we use the default InnoDB engine, the execution flow of the executor looks like this (note that if the id is not an index, the whole table will be scanned, row by row, if it is an index, it will be queried in the index organization table, which is responsible. Here is a non-index example) :
1) Call InnoDB engine interface to get the first row of the table, determine whether the ID value is 10, if so, store the row in a collection; If not, proceed to the judgment of the next row until the last row of the table is fetched
2) The executor returns the recordset composed of all the rows that meet the conditions in the above traversal process to the client as the result
summary
To summarize the next query, use the following image from high Performance MySQL version 3:
-
The MySQL client establishes a connection with the server, and the client sends a query to the server.
-
The server checks the query cache first and returns the results stored in the cache immediately if a hit is made. Otherwise move on to the next stage;
-
SQL parsing and preprocessing are performed on the server to generate a legal parsing tree.
-
Then the optimizer generates the corresponding execution plan;
-
According to the execution plan generated by the optimizer, MySQL invokes the API of the corresponding storage engine to execute it, and returns the execution result to the client.
| flying veal pay close attention to the public, get updates immediately
- I am a postgraduate student in Southeast University and a summer intern in Java background development of Ctrip. I run a public account “Flying Veal” in my spare time, which was opened on 2020/12/29. Focus on sharing computer fundamentals (data structure + algorithm + computer network + database + operating system + Linux), Java technology stack and other related original technology good articles. The purpose of this public account is to let you can quickly grasp the key knowledge, targeted. Pay attention to the public number for the first time to get the article update, we progress together on the way to growth
- And recommend personal maintenance of open source tutorial project: CS-Wiki (Gitee recommended project, has accumulated 1.7K + STAR), committed to creating a perfect back-end knowledge system, in the road of technology to avoid detours, welcome friends to come to exchange learning ~
- If you don’t have any outstanding projects, you can refer to a project I wrote “Open source community System Echo” Gitee official recommended project, which has accumulated 800+ star so far. SpringBoot + MyBatis + Redis + Kafka + Elasticsearch + Spring Security +… And provide detailed development documents and supporting tutorials. Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo Echo