Series: MySQL series columns

An overview of the

MySQL series will take an in-depth look at some of the underlying principles and design of MySQL to better understand and use MySQL. This series will focus on the following aspects: MySQL architecture, Innodb storage engine, indexing and SQL optimization, locking, transactions, high availability and performance optimization, logging, etc. Some basic table design, add, delete, change query SQL, database paradigm and so on are not involved.

In this article, we first understand the architecture design of MySQL as a whole, and have an overall understanding of MySQL.

MySQL 5.7.29 is used locally

Some of the main references in this series are:

  • InnoDB Storage Engine Edition 2
  • How MySQL Works: Understanding MySQL from the Root
  • Geek Time — MySQL Tutorial 45
  • MySql 5.7 中文 版

MySQL Architecture

Logical architecture

MySQL architecture:

MySQL consists of the following parts:

  • Connection pool component
  • Manage service and tool components
  • SQL Interface Component
  • Query analyzer component
  • Optimizer component
  • Buffer components
  • Plug-in storage engine
  • Physical file

From the larger level, it can be divided into four levels:

  • Layer 1 Connection layer: mainly responsible for connection processing, authentication, security, etc., common C/S architecture has this layer.

  • Second core service layer: there are query caches, analyzers, optimizers, actuators, etc., and all built-in functions (such as date, time, math and encryption functions, etc.). All cross-storage engine functions are implemented in this layer, such as stored procedures, triggers, views, etc.

  • Layer 3 Storage engine layer: This layer is the implementation part of the underlying data access operation, which is composed of multiple storage engines. The server communicates with the storage engine through an API, which avoids the differences between storage engines and prevents them from communicating with each other.

  • Layer 4 Data storage layer: The file system stores data, data files, and log files, and interacts with the storage engine.

Client connection

Our application to access the MySQL database, must rely on a database driver, the MySQL driver, it will establish a network connection with the database at the bottom, there is a network connection, to send requests to the database server, to execute a variety of SQL statements based on this connection.

For Java, PHP, Perl,.NET, Python, Ruby and other common programming languages, MySQL will provide the corresponding language MySQL driver, so that the system written in various languages can access the database through the MySQL driver.

For example, in Java, we need to introduce mysql-connector-Java driver dependencies in pom.xml:

<dependency>
    <artifactId>mysql-connector-java</artifactId>
    <groupId>mysql</groupId>
    <version>5.1.47</version>
</dependency>
Copy the code

How is an SQL query executed

Let’s look at what each of these components does with the execution of an SQL query.

SQL query execution path

The following SQL query execution path diagram, in general, an SQL query statement will go through the following process to obtain the query result:

    1. The client sends an SQL query to the server.
    1. The server first checks the query cache, and if a hit is made, it immediately returns the result stored in the cache. Otherwise move on to the next stage.
    1. SQL is parsed and preprocessed on the server, and then the optimizer generates the corresponding execution plan.
    1. MySQL invokes the storage engine API to execute the query based on the execution plan generated by the optimizer.
    1. Returns the result to the client.

Client connection pool

The first thing to remember is that our application deployed in the container has multiple worker threads to handle user requests, and each thread will acquire a database connection to access the database. It is not efficient to create a new database connection every time, consume it, and then destroy it, so the client needs a database connection pool that takes one connection at a time from the pool to process SQL requests and then puts it back into the pool to avoid creating and destroying database connections too often.

Common database connection pools include DBCP, C3P0, Druid, etc.

MySQL connector

When a client requests a connection to the database, the connector is responsible for establishing a connection with the client, obtaining permissions, and maintaining and managing the connection. MySQL server also has a connection pool, because it is common for multiple systems to establish multiple connections to MySQL. MySQL maintains database connections to clients through this connection pool.

In addition, the connector also performs security authentication and library table permission authentication based on the requested account and password. If the username or password is incorrect, you will receive an error saying “Access denied for user”. If the user name and password are authenticated, the connector will go to the permission table to find out which permissions you have. After that, the permission determination logic in the connection will depend on the permissions read at that time. This also means that once a user successfully establishes a connection, even if you change the user’s permissions using the administrator account, the existing connection permissions will not be affected. After the modification is complete, only newly created connections will use the new permission Settings.

After a connection is completed, if there is no subsequent action, the connection is idle. If the client is inactive for too long, the connector automatically disconnects it. This time is controlled by the wait_timeout parameter, and the default is 8 hours.

For a MySQL connection, at any time there is a state that represents what MySQL is currently doing. You can use the SHOW FULL PROCESSLIST Command to view the status of the current connection (the Command column). Over the life of a query, the state changes many times.

The MySQL official manual provides the following states:

  • Sleep: Is idle. The thread is waiting for the client to send a new request.
  • Query: The thread is executing a query or sending results to the client.
  • Locked: At the MySQL server layer, the thread is waiting for a table lock. Locks implemented at the storage engine level, such as InnoDB’s row locks, are not reflected in thread state. This is a typical state for MyISAM.
  • Analyzing and statistics: The thread is collecting storage engine statistics and generating an execution plan for the query.
  • Copying to tmp table [on disk]: The thread is executing a query and copying its result set into a temporary table, typically doing either GROUP BY, file sort, or UNION. If this state is followed by an “on Disk” flag, it indicates that MySQL is placing a temporary memory table on disk.
  • Sorting result: The thread is sorting the result set.
  • Sending dataA thread may be passing data between states, generating a result set, or returning data to a client.

Client/server communication

The communication protocol between the MySQL client and server is half-duplex, which means that at any given moment, either the server is sending data to the client or the client is sending data to the server, and the two actions cannot occur simultaneously.

This protocol makes MySQL communication easy and fast, but it also limits MySQL in many ways. One obvious limitation is that there is no flow control, and once one end starts sending a message, the other end cannot respond to it until it receives the entire message.

The client sends the query to the server in a single packet. If the query is too large, the server rejects more data and throws an error (the size of the request can be controlled by the max_allowed_packet parameter). Once the client sends the request, all it can do is wait for the result.

In contrast, the server typically responds to the client with a large number of data packets. When the server starts responding to client requests, the client must receive the entire returned result in its entirety, not simply take the first few results and tell the server to stop sending data. So be sure to LIMIT the number of returns in your query if necessary.

When the client fetch data from the server, it looks like a process of pulling data, but it is actually a process of MySQL pushing data to the client. The client is constantly receiving data pushed from the server, and the client cannot stop the server.

SQL interface

A connection in the database connection pool receives a network connection. For example, when a client sends an SQL query, a MySQL worker thread listens for the request, reads the request data from the connection, parses the SQL statement sent by the client, and then forwards the connection to the SQL interface for processing.

We can easily understand the intention of a SQL statement, but the database to perform a variety of add, delete, change and query SQL is very complex, so MySQL internal first provides a SQL interface component, which is a set of SQL statement execution interface, specifically to execute the client sent to MySQL to add, delete, change and query SQL statements.

The query cache

Before parsing a query statement, if the query cache is open, MySQL checks first to see if the query matches the data in the query cache. Previously executed statements and their results may be cached directly in memory as key-value pairs. Key is the query statement and value is the query result. Even if the query differs from the cached query by only one byte, it will not match the cached result, in which case the query will proceed to the next stage of processing.

If the current query happens to hit the query cache, MySQL checks user permissions before returning the query result. There is still no need to parse the query SQL statement, because the query cache already holds the table information that the current query needs to access. If the permissions are ok, MySQL skips all other stages and gets the results directly from the cache and returns them to the client. In this case, the query is not parsed, no execution plan is generated, and no execution is performed, which can be very efficient.

However, it is recommended not to use query caching in most cases because it often does more harm than good. The query cache invalidates so frequently that whenever a table is updated, all the query cache on that table is cleared. So it is possible that cached results will be wiped out by an update before they can be used. For databases under pressure to update, the hit ratio of the query cache can be very low.

Note that MySQL 8.0 directly removed the entire query cache function, meaning that this function has been completely removed from MySQL 8.0.

Query parsers and preprocessors

Next, how does the SQL interface execute SQL? At this point, a parser is required to disassemble the SQL, generate a corresponding “parse tree”, and turn it into something that MySQL can understand.

SQL statements are composed of multiple strings and Spaces. MySQL needs to identify what the strings are and what they represent. Then the syntax is analyzed. Based on the results of the lexical analysis, the parser will determine whether the SQL statement you entered meets the MySQL syntax according to the syntax rules.

For example, it verifies that the wrong keywords are used, that the keywords are used in the right order, that the quotes match correctly, and so on. If your statement is incorrect, You will receive an error message saying “You have an error in your SQL syntax”.

The preprocessor further checks that the parse tree is valid based on some MySQL rules, checking for tables and columns, resolving names and aliases to see if they are ambiguous. Next, the preprocessor verifies that it has table and field permissions.

Query optimizer

A query can be executed in many ways, all returning the same result. The optimizer’s job is to find the best execution plan. MySQL uses a cost-based optimizer, which will try to predict the cost of a query using a certain execution plan and select the one with the least cost.

However, it is important to know that the optimal execution plan selected by the query optimizer may not be the best. There are many reasons why the MySQL optimizer may choose the wrong execution plan, such as the following:

  • Statistics are not accurate, MySQL relies on statistics provided by storage engines to estimate costs, but some storage engines provide accurate information, and others can be very wide. For example, InnoDB cannot maintain accurate statistics on the number of rows in a table because of its MVCC architecture.

  • The estimated cost of the execution plan is not the same as the actual cost of the execution. For example, the MySQL level does not know which pages are in memory and which are on disk, so it is impossible to know how many physical I/O will be required during the actual execution.

  • MySQL simply chooses the optimal execution plan based on its cost model, and sometimes this is not the fastest way to execute.

  • MySQL never considers other queries that are executed concurrently, which can affect the speed of the current query.

  • The optimizer is sometimes unable to estimate all possible execution plans, so it may miss the actual optimal execution plan.

The MySQL query optimizer is a very complex component that uses many optimization strategies to generate an optimal execution plan. Here are some of the types of optimizations that MySQL can handle:

  • Redefine the order of associated tables: Data tables are not always associated in the order specified in the query, and the order of associated tables is an important optimization.

  • Convert outer join to inner join: Outer join may be converted to inner join, and then adjust the table association order.

  • Use equivalence transformation rules: MySQL can use equivalence transformation to simplify and normalize expressions.

  • Optimize COUNT(), MIN(), and MAX() : MySQL takes advantage of the storage engine or some index features to optimize such expressions.

  • Estimate and convert to a constant expression: When MySQL detects that an expression can be converted to a constant, it always optimizes the expression as a constant.

  • Overwrite index scan: When the columns in the index contain all the columns in the query, MySQL can use the index to return the required data without querying the corresponding rows.

  • Subquery optimization: In some cases, subqueries can be converted to a more efficient form to reduce multiple queries accessing data multiple times.

  • Early termination of the query: The ability to terminate the query as soon as it is found that the query requirement has been satisfied. For example, when the LIMIT clause is used.

  • Equivalent propagation: If the values of two columns are related by an equality, MySQL can pass the WHERE condition of one column to the other column.

  • Comparison of IN() : MySQL sorts the data IN the IN() list and then uses binary lookup to determine whether the values IN the list meet the criteria.

We can add Explain Extended before a query SQL and show Warnings at the end to see the refactored query statement.

EXPLAIN EXTENDED 

select f from xxxx;
	
SHOW WARNINGS;
Copy the code

Actuators and storage engines

Now that the query optimizer has selected the optimal query path, it is up to the underlying storage engine to actually execute it. At this time, it is necessary to execute the engine according to the execution plan, in accordance with a certain sequence and steps, constantly call the storage engine’s various interfaces to complete the SQL statement execution plan, roughly is constantly update or extract some data out.

At the start of execution, check whether the table has permission to execute the query, if not, return no permission error. If you have permission, open the table and continue. When a table is opened, the executor uses the interface provided by the table engine according to its definition.

Storage engine is actually the execution of SQL statements, he will follow certain steps to query memory cache data, update disk data, query disk data and so on. MySQL supports a variety of storage engines, such as InnoDB, MyISAM, Memory, etc.

The storage engine

One of the most important features that distinguishes MySQL database from other databases is its plug-in table storage engine. The storage engine is the implementation of the underlying physical structure and is responsible for data storage and extraction. Each storage engine has its own characteristics. You can create different storage engine tables based on specific applications.

Common storage engines include InnoDB, MyISAM, Memory, etc. The most common storage engine is InnoDB, which has become the default storage engine since MySQL 5.5.8.

Build table

InnoDB is used by default if the engine type is not specified when executing create Table. You can also specify the engine to use with engine= XXX when creating the table.

When creating a table, regardless of the storage engine, MySQL creates a.frm file in the database subdirectory with the same name as the table to hold the table definition. For example, if you create a table named MyTable, MySQL will store the definition of the table in mytable. FRM.

You can run the show table status like

command to display information about tables or query tables in information_schema.

Some key information such as:

  • Name: indicates the table Name.
  • Engine: Storage Engine type of a table.
  • Row_format: indicates the format of a row. Dynamic has a variable length and generally contains variable-length fields such as VARCHAR or BLOB.
  • Rows: The number of Rows in a table. For InnoDB, this value is an estimate, and for some other storage engines, this value is accurate.
  • Avg_row_length: The average number of bytes per row.
  • Data_length: size of table data, in bytes.
  • Index_length: indicates the size of the index in bytes.
  • Auto_increment: indicates the value of the next Auto_increment.
  • Collation: The default character set and character column Collation for a table.

InnoDB storage engine

InnoDB storage engine supports transactions and is designed primarily for online transaction processing (OLTP) applications. It features a row lock design, support for foreign keys, and support for non-locked reads similar to Oracle, meaning that read operations do not generate locks by default.

The InnoDB storage engine puts data into a logical table space, a black box managed by InnoDB that consists of a series of data files. InnoDB can store data and indexes for each table in a separate file.

The InnoDB storage engine clustered tables so that each table is stored in primary key order. If the primary key is not explicitly specified at table definition, the InnoDB storage engine generates a 6-byte ROWID for each row and uses it as the primary key.

InnoDB uses multi-version Concurrency Control (MVCC) to support high concurrency and implements four isolation levels of the SQL standard. The default level is REPEATABLE READ. Meanwhile, the RR level prevents phantom by next-keylocking.

In addition, InnoDB storage engine also provides high performance and high availability functions such as Insert Buffer, double write, adaptive HashIndex and Read Ahead.

MyISAM storage engine

MyISAM offers a number of features, including full-text indexing, compression, spatial functions (GIS), and so on. However, MyISAM does not support transaction and row-level locking, and has a defect that it cannot safely recover from crashes. MyISAM is targeted at some OLAP database applications. In addition, another unique aspect of MyISAM storage engine is that its buffer pool only caches index files, not data files.

MyISAM stores tables in two files: data files and index files, respectively. MYD and.myi are extensions. The number of rows that a MyISAM table can store is generally limited by the available disk space or the maximum size of a single file on the operating system. Starting with MySQL version 5.0, MyISAM supports 256TB of single-table data by default, which is sufficient for general application requirements.

Memory engine

Memory tables can be useful if you need to access data quickly and if it can’t be modified or lost after a restart. Memory tables are at least an order of magnitude faster than MyISAM tables because all data is kept in Memory and no disk I/O is required. If the database restarts or crashes, the structure of the Memory table remains, but the data is lost.

Memory tables support Hash indexes, so lookups are very fast. Memroy tables are table-level locks, so concurrent write performance is low. It does not support BLOB or TEXT columns, and the length of each row is fixed, so even if a VARCHAR column is specified, the actual storage is converted to CHAR, which can result in a partial memory waste.

If MySQL needs to use temporary tables to hold intermediate results during query execution, the temporary tables used internally are Memory tables. If the intermediate results are too large for the Memory table, or contain BLOB or TEXT fields, the temporary table is converted to a MyISAM table.

Note the difference between a Memory TABLE and a TEMPORARY TABLE. A TEMPORARY TABLE is a TABLE created using the CREATE TEMPORARY TABLE statement. It can use any storage engine and is therefore not the same as a Memory TABLE. Temporary tables are visible only in a single connection, and when the connection is disconnected, the temporary table ceases to exist.

Storage Engine Comparison

MySQL storage engine there are many kinds, but generally we choose InnoDB storage engine, if we must use some InnoDB does not have the characteristics of the engine, we can also use other solutions to replace, such as cache Redis, message middleware and so on.

You can use the SHOW ENGINES command to view the storage ENGINES currently supported by MySQL:

Here are some comparisons of common storage engine functions :(refer to official documentation)

System file storage layer

The system file storage layer is mainly responsible for storing database data and logs in the system files and interacting with the storage engine. It is the physical storage layer of files.

The main files are data files, log files, and configuration files.

The data file

You can use SHOW VARIABLES LIKE ‘%datadir%’; Command to view the data file directory, in the data file directory we can see some of the following files.

  • db.opt: Records the default character set and validation rules used by this database.
  • Ib_logfile0, ib_logfile1: Redo log Log file.
  • .frm: Stores metadata information related to tables, including the definition of table structures. Each table has an FRM file corresponding to it.

InnoDB data files:

  • .ibd: An independent tablespace is used to store table data and index information. Each table corresponds to one IBD file
  • .ibdata: A shared table space is used to store table data and index information. All tables share one or more IBData files

MyISAM data file:

  • .MYD: Stores table data
  • .MYI: data tree used to store task indexes in table data files

The log file

Common logs include error logs, binary logs, query logs, slow query logs, transaction redo logs, relay logs, and so on.

SHOW VARIABLES LIKE ‘%\_log_%’; Example Query the current MySQL log usage. Specific log files will be covered in more detail in a later article.

The configuration file

This file is used to store all configuration information of MySQL, such as my. CNF and my.ini.

. You can use the mysql — help | grep my CNF to view the location of the mysql configuration file:

CNF -> /etc/mysql.my.cnf -> /usr/etc/my.cnf -> ~/.my.cnf. If the configuration is the same, the parameter in the last configuration file is used.