MySQL Basic Architecture
Take a look at the basic architecture of MySQL. You can clearly see the execution process of SQL statements in each functional module of MySQL.
Generally speaking, MySQL can be divided into Server layer and storage engine layer two parts.
The Server layer includes connectors, query caches, analyzers, optimizers, actuators, etc. It covers most of the core service functions of MySQL, as well as all the built-in functions (such as date, time, math and encryption functions, etc.). All cross-storage engine functions are implemented in this layer, such as stored procedures, triggers, views, etc.
The storage engine layer is responsible for data storage and extraction. Its architecture mode is plug-in, supporting InnoDB, MyISAM, Memory and other storage engines. The most commonly used storage engine is InnoDB, which has been the default storage engine since MySQL 5.5.5.
In other words, if you do not specify the engine type when executing create Table, InnoDB is used by default. However, you can also select another engine by specifying the type of storage engine, such as using engine=memory in the CREATE Table statement to specify the use of memory engines to create tables. Different storage engines store table data in different ways and support different functions, and we will discuss engine choices in a later article.
As you can see from the figure, the different storage engines share a Server layer, from the connector to the actuator. To give you an idea of the name of each component, I’m going to walk you through the execution process and see what each component does in turn.
The connector
As a first step, you will connect to the database, and your connector will receive you. Connectors establish connections to clients, obtain permissions, and maintain and manage connections. The join command is written like this:
mysql -h$ip -P$port -u$user -p
Copy the code
After typing the command, you need to enter the password in the interactive dialogue. Although the password can also be written directly after -p on the command line, this can cause your password to be compromised. If you are connected to a production server, you are strongly advised not to do this.
Mysql is the client tool used to establish a connection with the server. After the classic TCP handshake, the connector authenticates your identity using the username and password you enter.
- If the username or password is wrong, you’ll get an error saying “Access denied for user” and the client program will stop executing.
- If the user name and password are authenticated, the connector will go to the permission table to find out which permissions you have. After that, the permission determination logic in the connection will depend on the permissions read at that time.
This means that once a user successfully establishes a connection, even if you change the user’s permissions using the administrator account, the existing connection permissions will not be affected. After the modification is complete, only newly created connections will use the new permission Settings.
Once the connection is complete, if you have no subsequent action, the connection is idle, as you can see in the show ProcessList command. The graph in the text is the result of show ProcessList, where the Command column is displayed as “Sleep”, indicating that there is now a free connection in the system.
If the client is inactive for too long, the connector automatically disconnects it. This time is controlled by the wait_timeout parameter, and the default is 8 hours.
If the client sends a request again after the connection has been disconnected, it will receive an error message: Lost Connection to MySQL Server during Query. At this point, if you want to continue, you need to reconnect and then execute the request.
In the database, a persistent connection means that the same connection is used all the time if the client continues to receive requests after a successful connection. A short connection is one that is disconnected after a few queries are executed and then re-established the next time.
The process of establishing a connection is usually complicated, so I recommend that you minimize the action of establishing a connection by using long connections as much as possible.
However, with full use of long connections, you may find that sometimes MySQL memory usage increases very quickly, because the memory temporarily used by MySQL during execution is managed in connection objects. These resources are released when the connection is disconnected. If long connections are accumulated, the system may use too much memory and kill it forcibly (OOM). In this case, MySQL restarts abnormally.
How to solve this problem? There are two options you can consider.
- Disconnect long connections periodically. Use for a period of time, or in the program to determine the execution of a large memory – consuming query, disconnect, then query and reconnect.
- If you are using MySQL 5.7 or later, you can re-initialize the connection resource by executing mysql_reset_connection after each large operation. This process does not require reconnection and re-authentication of permissions, but restores the connection to the state it was in when it was created.
The query cache
After the connection is established, you can execute the SELECT statement. The execution logic leads to the second step: query cache.
When MySQL gets a query request, it goes to the query cache to see if the query has been executed before. Previously executed statements and their results may be cached directly in memory as key-value pairs. Key is the query statement and value is the query result. If your query can find the key directly in the cache, the value will be returned directly to the client.
If the statement is not in the query cache, subsequent execution phases continue. After the execution is complete, the results are stored in the query cache. As you can see, if the query hits the cache, MySQL can return the result directly without having to perform further complicated operations, which can be very efficient.
But in most cases I would advise you not to use query caching, why? Because query caching often does more harm than good.
The query cache invalidates so frequently that whenever a table is updated, all the query cache on that table is cleared. So it’s very likely that you struggled to save the results, and an update wiped them out before you even used them. For databases under pressure to update, the hit ratio of the query cache can be very low. Unless your business has a static table, it only updates once in a while. For example, a system configuration table, the query on that table is suitable for query caching.
Fortunately, MySQL also provides this “on demand” approach. You can set the query_cache_type parameter to DEMAND so that query caching is not used for default SQL statements. For statements where you are sure to use the query cache, you can specify it explicitly with SQL_CACHE, as in the following statement:
mysql> select SQL_CACHE * from T where ID=10;Copy the code
Note that MySQL 8.0 directly removed the entire query cache function, meaning that this function has been completely removed from MySQL 8.0.
analyzer
If the query cache is not hit, it is time to actually execute the statement. First, MySQL needs to know what you are going to do, so it needs to parse SQL statements.
The parser first does a “lexical analysis”. You are entering an SQL statement consisting of multiple strings and Spaces. MySQL needs to figure out what the strings are and what they represent.
MySQL knows from the keyword “select” that you typed in. This is a query. It also recognizes the string “T” as “table name T” and the string “ID” as “column ID.”
Once these identifications are done, “parsing” is done. Based on the results of the lexical analysis, the parser will determine whether the SQL statement you entered meets the MySQL syntax based on the syntax rules.
If your syntax is not correct, You will receive an error warning “You have an error in your SQL syntax”, such as the following statement select missing the starting letter “s”.
mysql> elect * from t where ID=1;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'elect * from t where ID=1' at line 1
Copy the code
A common grammar error will indicate where the first error occurred, so focus on the words immediately following “use near”.
The optimizer
After a profiler, MySQL knows what you need to do. Before execution can begin, it needs to be processed by the optimizer.
The optimizer determines which index to use when there are multiple indexes in the table. Or determine the order in which tables are joined when a statement has multiple joins. For example, if you execute a statement like the following, this statement performs a join on two tables:
mysql> select * from t1 join t2 using(ID) where t1.c=10 and t2.d=20;
Copy the code
- It can first fetch the ID value of the record c=10 from table T1, then associate it with table T2 according to the ID value, and then judge whether the value of D in T2 is equal to 20.
- It can also take the ID value of the record d=20 from table T2, and then associate it with T1 according to the ID value, and then judge whether the value of C in T1 is equal to 10.
The logical result of the two execution methods is the same, but the efficiency of the execution will be different, and the role of the optimizer is to decide which one to use.
After the optimizer phase is complete, the execution plan of the statement is determined, and then the executor phase is entered. If you have any questions about how the optimizer chose the index, or whether it might have chosen the index incorrectly, I’ll explain the optimizer separately in a later article.
actuator
MySQL knows what you want to do from the parser, it knows how to do it from the optimizer, so it goes to the executor phase and starts executing statements.
SQL > select * from table T; select * from table T; select * from table T; select * from table T; The query also calls Precheck to verify permissions before the optimizer).
mysql> select * from T where ID=10;
ERROR 1142 (42000) :SELECT command denied to user 'b'@'localhost' for table 'T'
Copy the code
If you have permission, open the table and continue. When a table is opened, the executor uses the interface provided by the table engine according to its definition.
For example, in table T of our example, the ID field has no index, then the execution flow of the executor is like this:
- Call InnoDB engine interface to fetch the first row of the table, check whether the ID value is 10, if not skip, if yes, save the row in the result set;
- Call the engine interface to fetch “next row” and repeat the same logic until the last row of the table is fetched.
- The executor returns the recordset of all rows that meet the criteria in the above traversal as a result set to the client.
At this point, the statement is complete.
The same logic is performed for indexed tables. The first call is the “fetch the first row that meets the condition” interface, and then the “next row that meets the condition” interface is iterated, all of which are already defined in the engine.
You’ll see a ROWS_EXAMINED field in the database’s slow query log that shows how many rows were scanned during this statement execution. This value is accumulated each time the executor calls the engine to retrieve the row.
In some scenarios, the executor is called once and multiple rows are scanned inside the engine, so the number of rows scanned by the engine is not exactly the same as rows_EXAMINED. The internal mechanics of the storage engine will be explained in detail in a later article.
Questions after class
Select * from T where k=1; error: “Unknown column ‘k’ in ‘where clause'” At what stage do you think this error was reported?