High performance MySQL-MySQL architecture and history

1.1. MySQL logical architecture

Connection/thread handling, this layer is part of the client interaction, and typically, authentication, security, etc.
Query cache, parser, optimizer, belong to the service layer. Most of the core functions of MySQL are here, including query parsing, analysis, optimization, caching, and all the built-in functions. All cross-storage engines are implemented in this step.
The storage engine implements the actual operation of storing and extracting data. The service layer communicates with storage engines through apis. Different storage engines cannot communicate with each other.

1.1.1. Connection Management and Security

For each client that communicates with the server, the server creates a thread for it to handle client requests (just like ServerSocket and Socket), but for performance purposes, the thread is cached so that it can be used directly when the same client reconnects.

Once connected, authentication is performed to determine whether the client can manipulate a library.

1.1.2. Optimization and execution

MySQL parses query requests and makes various optimizations to create parse trees. For select statements, the query cache is checked for this item before the query is parsed, and if so, it is returned, otherwise it continues.

1.2. Concurrency control

Only two layers of concurrency control are discussed here: the service layer and the storage engine layer.

1.2.1. Read/write locks

In general, there is a good solution to concurrency control: read/write locks. Read locks (shared locks) and write locks (exclusive locks).

Most of the time, the internal management of MySQL locks is transparent.

1.2.2. Lock granularity

To increase the concurrency of shared resources, you need to reduce the scope of locks. In general, the smaller the lock granularity, the higher the concurrency. However, locking and unlocking can also consume resources, and too fine a lock granularity will aggravate this burden. Therefore, a good locking strategy is to strike a good balance between lock overhead and system concurrency.

MySQL has no restrictions on how the storage engine implements locking, so let’s look at the two most important locking strategies in MySQL.

Table locks. Locks the entire table, which is the most basic locking strategy. The write lock has a higher priority than the read lock. In a waiting queue, the write lock may jump to the front of the read lock queue. Although storage engines can have their own implementations, MySQL ignores storage engine locks for certain behaviors and uses its own mechanism instead, such as table locks when alter table is enabled.
Row locks. Row locking maximizes concurrency and also increases lock overhead. Row locking is only implemented by the storage engine, not the MySQL service layer.

1.3. The transaction

A transaction is a set of operations that contain multiple SQL statements. Once a transaction is started, from beginning to end, either all operations succeed and data is updated; Or fail, the data stays the same. There is no possibility that part of the transaction will succeed and part will fail.

Transactions have ACID properties.

A transaction is the smallest indivisible unit of work. It will either all succeed or all fail.
Consistency. The database always moves from one state to another without partial state updates.
Isolation, in general, changes made by one transaction are not visible to other transactions until the final commit occurs.
Durability, once a transaction is committed, its modifications are persisted regardless of database crashes.

Transactions can do so much security based on the assumption that a certain amount of system resources are occupied, so non-transactional databases can be used if transactions are not required.

1.3.1. Isolation level

Isolation is more complex than it is. In the SQL standard, four isolation levels are defined, with lower levels supporting higher concurrency and lower system overhead.

Read Uncommitted, a level at which changes made by a transaction are visible to other transactions even if they are not committed. Transactions that read uncommitted data are called dirty reads.
Read Committed, a level at which changes to a transaction are visible only when Committed. However, if the transaction is read before and after the commit, the read may be inconsistent, so this level is also called “non-repeatable read”.
Repeatable Read solves the dirty Read problem, but also introduces the phantom Read problem, that is, the data is changed when the transaction reads a certain range of data. This can be resolved by multi-version concurrency control.
Serializable, which enforces the serial execution of transactions, locking each row of data, which can result in a large number of locks.

Isolation level	Dirty read probability	Non-repeatable read possibility	Illusory possibility	Read lock
READ UNCOMMITED	YES	YES	YES	NO
READ COMMITED	NO	YES	YES	NO
REPEATABLE READ	NO	NO	YES	NO
SERIALIZABLE	NO	NO	NO	YES

For illusory and unrepeatable reading, see this one.

What’s the difference between unrepeatable and illusory?

A non-repeatable read is a multiple read with updates in between, reading data that has been changed by other transactions, for update operations

Solution: Use A row-level lock that locks the row and releases the lock only after transaction A has read multiple times, allowing other transactions to change the previous data.

A phantom read is a sequential read with an addition or deletion operation, reading data added or deleted by other transactions, for INSERT and DELETE operations

Solution: Use A table-level lock, lock the entire table, transaction A read the total number of times before releasing the lock, then allow other transactions to add data.

It’s much easier to understand the transaction isolation level.

Take a look:

1.3.2. The deadlock

When a deadlock occurs, the transaction holding the least row-level exclusive lock is rolled back.

1.3.3. Transactions in MySQL

MySQL handles transactions automatically by default. That is, each query triggers a transaction commit.

InnoDB uses a two-phase locking protocol. Locks can be performed at any time during a transaction. Locks are released only at COMMIT or ROLLBACK, and all locks are released at the same time. InnoDB automatically locks when needed, depending on the isolation level. These are all unlocked implicitly. InnoDB also supports display locking, which is implemented in SQL statements.

1.4. Multi-version concurrency control

Most of MySQL’s transactional storage engines do not simply implement row-level locking, but instead use a solution called MVCC(Multi-version Concurrency Control) for performance reasons.

You can think of MVCC as a variant of row-level locking that avoids locking operations in many cases, but it also allows for non-blocking reads and blocking writes that lock only the necessary rows.

The implementation of MVCC technology is essentially achieved by saving snapshots. MVCC implementations vary between storage engines, but there are two main types: optimistic concurrency control and pessimistic concurrency control. Let’s look at a simple implementation of InnoDB.

InnoDB’s MVCC is implemented by saving two hidden columns after each row. One is when the row was created, and the other is when the row expired. Here you can use the system version number (not the version of the OS, but the database system) for recording. The system version number is automatically incremented for each transaction started, with the system version number at the transaction start time as the transaction version number.

Select: Only looks for rows with a create version number earlier than the transaction version number and deletes a version that is not defined or later than the transaction version. (Create version <= transaction version < delete version)
Insert: use the current system version number as the row creation version number.
Delete: deletes the current system version as the version number of the row.
Update: Inserts a new row (not a direct update), using the current system version as the create version number and the current system version as the delete version number of the original row.

MVCC only works in repeatable reads and commits reads (second and third).