1.1 Logical Architecture of MySQL

The topmost layer contains services that are not unique to MySQL: most network-based client/server tools need connection handling, authentication, security, and so on.

The second layer contains the core of MySQL: query parsing, analysis, optimization, caching, and all built-in functions. Any functionality provided across storage engines also lives at this level: stored procedures, triggers, views, and so on.

The third layer contains the storage engines, which store and retrieve data for MySQL; they do not parse SQL.

MySQL handles a request in two stages:

  • Connection management and security: Each client connection gets its own thread within the server process, and the server authenticates the client when it connects.
  • Optimization and execution: Before parsing a query, MySQL first checks the query cache; on a miss, it parses the query, builds internal data structures (the parse tree), and then applies various optimizations: rewriting the query, deciding the order in which tables are read, choosing appropriate indexes, and so on. The user can influence the optimizer's decisions through special hint keywords.
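
As a sketch of the last point, a query can hint the optimizer with special keywords; the table and index names below are illustrative:

```sql
-- Suggest a particular index instead of leaving the choice to the optimizer.
SELECT * FROM orders USE INDEX (idx_customer)
WHERE customer_id = 42;

-- EXPLAIN shows the plan the optimizer settled on: read order, chosen index, etc.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```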

1.2 Concurrency Control

1.2.1 Read-Write Locks

When concurrent reads and writes touch the same data, the problem is solved with a lock system consisting of two types of locks: shared (read) locks and exclusive (write) locks.

Implicit locking

  • InnoDB acquires locks automatically when needed, according to the isolation level
  • Locks are released only on COMMIT or ROLLBACK, and all locks are released at the same time

Explicit locking

  • Shared lock: SELECT … LOCK IN SHARE MODE
  • Exclusive lock: SELECT … FOR UPDATE
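
A minimal sketch of how these statements are used inside a transaction (the `accounts` table is illustrative):

```sql
BEGIN;
-- Shared lock: other transactions can still read these rows, but not modify them.
SELECT balance FROM accounts WHERE id = 1 LOCK IN SHARE MODE;
-- Exclusive lock: other transactions can neither lock nor modify these rows.
SELECT balance FROM accounts WHERE id = 2 FOR UPDATE;
COMMIT;  -- both locks are released here, at the same time
```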

1.2.2 Lock Granularity

Improving concurrency for shared resources requires a balance between the cost of locking and data security. Each storage engine can implement its own locking policy and lock granularity.

Table locks

Table locking is the least expensive strategy: it locks the entire table. The server uses table locks for statements such as ALTER TABLE, ignoring the storage engine's locking mechanism. An intention lock is a special table-level lock: before acquiring a shared or exclusive lock on individual rows, a transaction must first acquire the corresponding intention shared or intention exclusive lock on the table, which makes it cheap to detect conflicts between table locks and row locks.
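
Table locks can also be taken explicitly at the server level; a sketch (table name illustrative):

```sql
LOCK TABLES orders READ;   -- this and other sessions may read; all writes block
-- ... run read-only queries against orders ...
UNLOCK TABLES;             -- release the table lock
```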

Row-level locks

Row-level locking offers the greatest support for concurrent processing, but also carries the greatest overhead.

1.3 Transactions

A transaction is an atomic group of SQL statements: either all of them execute successfully, or all of them fail. Transactions have four key properties, often abbreviated ACID:

  • Atomicity: All operations in a transaction either all commit successfully or are all rolled back on failure; it is impossible for only some of them to take effect.
  • Consistency: The database always transitions from one consistent state to another consistent state.
  • Isolation: Changes made by one transaction are not visible to other transactions until the transaction is committed.
  • Durability: Once a transaction is committed, its modifications are stored permanently in the database. (In practice, no durability strategy can offer a 100% guarantee.)

1.3.1 Isolation Level

  • READ UNCOMMITTED: A transaction can read uncommitted data, which results in dirty reads. Rarely used in practice.
  • READ COMMITTED: The default isolation level for most database systems. A transaction sees only changes made by transactions that have already committed, and its own changes remain invisible to others until it commits. This level still allows nonrepeatable reads: if transaction A reads a row twice while transaction B updates and commits that row in between, A's two reads return different results.
  • REPEATABLE READ: Guarantees that reading the same records multiple times within one transaction yields the same results, but still allows phantom reads: when one transaction reads records in some range while another transaction inserts or deletes records within that range, re-reading the range produces phantom rows. REPEATABLE READ is MySQL's default isolation level.
  • SERIALIZABLE: The highest isolation level. It forces transactions to execute serially and places a lock on every row read, which can lead to heavy lock contention and timeouts.
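
The isolation level can be inspected and changed per session; for example:

```sql
-- Current level (MySQL 8.0; older versions expose @@tx_isolation instead).
SELECT @@transaction_isolation;

-- Applies to subsequent transactions in this session only.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```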

1.3.2 Deadlocks

A deadlock occurs when two or more transactions hold locks on the same resources and each requests a lock held by the other, producing a cycle of waiting. To address deadlocks, database systems implement various deadlock detection and deadlock timeout mechanisms. InnoDB currently handles a deadlock by rolling back the transaction that holds the fewest row-level exclusive locks. The behavior and order of locking are storage engine dependent.
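
A classic two-session sketch of such a cycle (table `t` is illustrative; the two comment columns represent two concurrent connections):

```sql
-- Session A                           -- Session B
BEGIN;                                 -- BEGIN;
UPDATE t SET v = 1 WHERE id = 1;       -- UPDATE t SET v = 2 WHERE id = 2;
-- Each now requests the row the other holds:
UPDATE t SET v = 1 WHERE id = 2;       -- UPDATE t SET v = 2 WHERE id = 1;
-- InnoDB detects the cycle and rolls one transaction back with
-- ERROR 1213 (40001): Deadlock found when trying to get lock
```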

1.3.3 Transaction logging

With transaction logging, the storage engine only needs to modify the in-memory copy of a table's data and append the change to the transaction log, which persists on disk, rather than flushing the modified data itself to disk on every change. Writes to the transaction log are append-only, sequential I/O.

1.3.4 Transactions in MySQL

MySQL provides two transactional storage engines: InnoDB and NDB Cluster.

  • AUTOCOMMIT: MySQL runs in autocommit mode by default; unless a transaction is explicitly started, every statement is treated as its own transaction and committed immediately.
  • Mixing storage engines in transactions: Transactions are implemented by the underlying storage engines, so it is not reliable to use multiple storage engines in the same transaction.
  • Implicit and explicit locking: InnoDB uses a two-phase locking protocol: locks can be acquired at any time during a transaction but are released only on COMMIT or ROLLBACK, all at the same time. InnoDB acquires locks implicitly according to the isolation level, and also supports explicit locking; see 1.2.1.
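
A sketch of the AUTOCOMMIT behavior described above (table `t` illustrative):

```sql
SELECT @@autocommit;   -- 1 by default: each statement commits immediately
SET autocommit = 0;    -- changes now accumulate in an open transaction
UPDATE t SET v = v + 1 WHERE id = 1;
ROLLBACK;              -- undoes the UPDATE; with autocommit = 1 it would already be committed
SET autocommit = 1;
```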

1.4 Multi-version Concurrency Control (MVCC)

Most of MySQL's transactional storage engines do not use simple row-level locking alone; instead, they generally combine it with multi-version concurrency control (MVCC) to improve concurrency. You can think of MVCC as a variant of row-level locking: it avoids locking for most read operations and locks only the necessary rows for writes.

MVCC is implemented by keeping a snapshot of the data at a point in time. Depending on when the transaction starts, each transaction may see different data for the same table at the same time.

Different storage engines implement MVCC differently, typically using either optimistic or pessimistic concurrency control. InnoDB's MVCC is implemented by storing two hidden columns at the end of each row record: one holds the row's creation time and the other its expiration (deletion) time, though each actually stores a system version number rather than a timestamp. The system version number is incremented each time a new transaction begins.

MVCC works only at the REPEATABLE READ and READ COMMITTED isolation levels: READ UNCOMMITTED always reads the latest version of a row rather than the version matching the transaction, and SERIALIZABLE places a lock on every row it reads.

Let’s look at how MVCC operates at the repeatable read isolation level.

SELECT

InnoDB checks each row against two criteria:

a. InnoDB looks only for rows whose version is no later than the current transaction's (i.e., whose row version number is less than or equal to the transaction's system version number). This ensures that the rows a transaction reads either existed before the transaction started or were inserted or modified by the transaction itself.

b. The row's deletion version is either undefined or greater than the current transaction's version number. This ensures that the rows a transaction reads were not deleted before the transaction began.

Only records that meet both conditions are returned as query results.
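
If the two hidden columns were visible, say as `creation_version` and `deletion_version` (hypothetical names; InnoDB does not expose them), a SELECT run by a transaction with system version @trx_version would behave roughly like:

```sql
SELECT * FROM t
WHERE creation_version <= @trx_version          -- condition (a): row existed for us
  AND (deletion_version IS NULL                 -- condition (b): not deleted, or
       OR deletion_version > @trx_version);     -- deleted only after we started
```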

INSERT

InnoDB stores the current system version number as the row version number for each newly inserted row.

DELETE

InnoDB stores the current system version number as a row deletion identifier for each row that is deleted.

UPDATE

InnoDB inserts a new record with the current system version number as its row version number, and writes the current system version number into the original row as its deletion mark. Keeping these two extra version numbers lets most read operations avoid locking, which improves performance; the cost is extra storage for each row, more checking work per row, and some additional maintenance overhead.
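
Using the same hypothetical `creation_version`/`deletion_version` column names (InnoDB does not actually expose its hidden columns), an UPDATE at system version @v conceptually amounts to:

```sql
-- Mark the current row version as expired at @v...
UPDATE t SET deletion_version = @v
WHERE id = 1 AND deletion_version IS NULL;

-- ...and insert the new row version, created at @v.
INSERT INTO t (id, val, creation_version, deletion_version)
VALUES (1, 'new value', @v, NULL);
```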