MySQL architecture

[TOC]

1. Logical architecture

MySQL logical architecture diagram

  • The first layer is the client connection authentication, C/S has this architecture
  • The second layer is the server layer, which contains most of the core service functions of MySQL
  • The third layer contains the storage engine, which the server communicates with via an API that circumvents the differences between storage engines, which do not communicate with each other, and which does not parse SQL(InnoDB is an exception, it will parse foreign key definitions, since the server itself does not implement this function).

1.1 Connection Management and Security

  • Each client has one thread in the server process
  • The server is responsible for caching threads and does not need to create or destroy threads for each new connection. (After 5.5, thread pools are available to service a large number of connections using a small number of threads.)

  • The server authenticates the client based on the user name, original host information, and password. After the connection is successful, the server verifies the permission of a specific operation.

1.2 Optimization and execution

  • MySQL parses queries, creates internal data structures (parse trees), and optimizes them in various ways (rewriting queries, determining the order in which tables are read, selecting appropriate indexes)
  • The user can influence the MySQL decision making process through the special hint optimizer. The optimizer can also be asked to explain elements of the optimization process so that users can refactor queries and schemas and modify configurations
  • The optimizer does not care about the storage engine used by the table, but the storage engine has an impact on optimizing the query. The optimizer will ask the storage engine for capacity or overhead information for a specific operation, statistics on existing table data, and so on.
  • For SELECT statements, the server checks the Query Cache before parsing the Query.

2. Concurrency control

Concurrency control at two levels: the server layer and the storage engine layer

2.1 read-write lock

  • Shared lock, read lock: shared, do not block each other
  • Exclusive lock, write lock: Block other read and write locks

2.2 the lock granularity

  • One provides concurrency for shared resources by making locking objects more selective. Try to lock only part of the data that needs to be modified. The less the amount of data locked, the higher the degree of concurrency.
  • The various operations on locks (obtaining locks, checking to see if they are unlocked, releasing locks) consume resources.
  • Lock strategy:
    • Table lock
      • The most basic and least expensive strategy. Server layer implementation
      • Lock the entire table. Write locks block other locks, but read locks do not block each other.
      • Table locks can also have good performance in certain scenarios. For example, the READ LOCAL table lock supports some types of concurrent writes. Write locks have a higher priority than read locks and may be inserted at the front of the read lock queue.
      • The storage engine can manage its own locks, but the server layer still uses various valid table locks for different purposes. For example, the server uses TABLE locks for statements such as ALTER TABLE, ignoring the storage engine’s locking mechanism.
    • Row lock:
      • Maximum support for concurrent processing and maximum lock overhead.
      • Only implemented in storage engine (InnnoDB and XtraDB etc.)

3. The transaction

  • A transaction is a set of atomic SQL queries, or a single unit of work.
  • An ACID transaction:
    • Atomicty: An indivisible, minimal unit of work.
    • Consistency: A database always transitions from one consistent state to another consistent state.
    • Isolation: Changes made by one transaction are not visible to other transactions until they are finally committed.
    • Durability: After transactions are submitted, their modifications are permanently saved to the database. (There is no 100% persistence persistence guarantee policy)

3.1 Isolation level of the isolation Line

There are four isolation levels in the SQL standard:

  • READ UNCOMMITED:
    • Changes in a transaction are visible to other transactions even if they are not committed.
    • Transactions that READ uncommitted data are called Dirty reads.
    • Performance is not much better than other levels, but it does have many fewer benefits and is not recommended.
  • READ COMMITED, nonrepeatable READ:
    • The default level for most databases, but not MySQL.
    • A transaction can only read the changes made by committed transactions; in other words, any changes made by a transaction from the beginning until the commit are not visible to other transactions.
    • Virtual reads may occur. For example, transaction A reads data twice, and transaction B modifies the data between the two reads. These two reads will have different results
  • REPEATABLE READ:
    • Default MySQL level.
    • Solve dirty read problems to ensure that the results of multiple reads of the same record in the same transaction are consistent.
    • This may result in A Phantom Read, where transaction A reads and modifies the data, and transaction B modifies the data (insert or delete) in the range. Transaction A reads the data in the range again, finds the Phantom Row, and reads the additions and deletions of other transactions
  • SERIALIZABLE:
    • The highest isolation level that forces transactions to be executed sequentially.
    • Locks are placed on every row of data that is read, resulting in numerous timeouts and lock contention issues.
Isolation level Dirty read probability Non-repeatable read possibility Illusory possibility Read lock
Uncommitted read Yes Yes Yes No
Submit to read No Yes Yes No
Repeatable read No No Yes No
serializable No No No Yes

3.2 a deadlock

  • A deadlock is a vicious cycle in which two or more transactions occupy the same resource and request to lock the resource occupied by the other.
  • Causes:
    • Real data conflicts
    • Storage engine implementation
  • Solution (partially or completely rolling back one of the transactions) :
    • Deadlock detection. InnoDB handles deadlocks by rolling back transactions that hold the least row-level exclusive lock (simpler rollback algorithm)
    • Deadlock timeout mechanism whereby the request for a lock is abandoned after a timeout

3.3 Transaction Logging (write-ahead Logging)

  • With transaction logging, the storage engine modifies table data simply by modifying its in-memory copy and logging the behavior to a transaction log that persists on hard disk.
  • In most implementations of storage engines, data changes are written to disk twice (the first time for logging and the second time for data)
  • Advantages:
    • Improve transaction efficiency
    • Speed is fast. In the appending mode, log writing operations are sequential I/OS of a small disk area rather than random I/OS of multiple areas.

3.4 Transactions in MySQL

Transactional storage engines: MySQL like InnoDB and NDB Cluster, third-party like XtraDB and PBXT.

  • AUTOCOMMIT:

    • MySQL default mode. If a transaction is not explicitly started, each query is committed as a transaction.

    • The AUTOCOMMIT variable can be set to enable or disable automatic commit mode on the current connection. When disabled, all queries are in one transaction until an explicit COMMIT or ROLLBACK is performed, after which a new transaction is started.

      mysql> SHOW VARIABLES LIKE 'AUTOCOMMIT';

      mysql> SET AUTOCOMMIT = 1;Copy the code
    • There is no impact on non-transactional surfaces such as MyISAM or memory tables. Because this type of table does not have the concepts of COMMIT or ROLLBACK, it is always in AUTOCOMMIT mode.

    • In data Definition Language (DDL), operations such as ALTER TABLE, LOCK TABLES, and others that result in large data changes force COMMIT to COMMIT the current transaction before they can be executed.

      SQL type

    • MySQL can set the isolation level during the current session, and the new isolation level takes effect after the next transaction starts.

      mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITED;Copy the code
    • MySQL recognizes all four ANSI isolation levels and InnoDB engine supports all isolation levels.

  • Mix storage engines in transactions

    • The MySQL server layer does not manage transactions; transactions are implemented by the underlying storage engine. Therefore, it is not reliable to use multiple storage engines in the same transaction.
    • If you mix storage engines such as InnoDB and MyISAM in a transaction, there is no problem under normal circumstances, but when rollback is required, changes on a non-transactional table cannot be undone, resulting in an inconsistent state of the database, and MySQL does not prompt for non-transactional table operations.
    • Summary: It is best not to mix storage engines in one transaction
  • Implicit and explicit locking

    • Implicit locking:

      • InnoDB uses the two-phase locking protocol. Locks can be performed at any time during transaction execution, locks are only released at COMMIT or ROLLBACK, and all locks are released at the same time. InnoDB automatically locks when needed, depending on the isolation level.
    • Explicit locking:

      • MySQL can be explicitly locked by specific statements that are not part of the SQL specification and should be avoided

        SELECT ... LOCK IN SHARE MODE

        SELECT ... FOR UPDATECopy the code
    • Other locking

      • MySQL supports LOCK TABLES and UNLOCK TABLES statements, which are implemented at the server layer and are independent of the storage engine. It is not a substitute for transaction processing. When used in InnoDB, performance is severely affected and not as good as its row-level locking
      • Suggestion: Do not perform any use explicitly, regardless of the storage engine, except that AUTOCOMMIT is disabled in transactions and LOCK TABLES can be used.

4. Multi-version Concurrency Control (MVVC)

Most of MySQL’s storage engines, Oracle and PostgreSQL, implement MVVC, but the implementation mechanism is different, typically optimistic concurrency control and pessimistic concurrency control.

  • Objective:

    • Simple row-level locking is deprecated and can be regarded as a variant of row-level locking to improve concurrency.
    • In many cases, locking is avoided and the overhead is lower.
    • Non-blocking reads are implemented, and writes lock only the necessary rows
  • Implementation principle:

    • To save a snapshot of the data at a point in time.
    • Each transaction sees the same data regardless of how long it is executed.
    • Depending on the transaction start time, each transaction may see different data for the same table at the same time.
  • InnoDB simplified VERSION of MVVC works:

    By keeping two hidden columns after each row record, one holds when the row was created and one holds when the row expired. Instead of storing the actual time value, you store the system version number. The system version number is automatically incremented for each new transaction. The system version number at the start of the transaction is used as the version number of the transaction and is compared with the version number of each row of records queried.

    Work only at REPEATABLE READ and READ COMMITED isolation levels.

    Advantages: Most of the read operations do not need to lock, the read operation is simple, the performance is good, and only the rows that meet the standard can be read.

    Cons: Each row requires additional storage, more row checking, and additional maintenance work.

    At the REPETABLE READ isolation level, MVVC works:

    • SELECT:

      Check each record according to the following two conditions, and return the query result only when the records meet the conditions

      • Only rows with a version number less than or equal to the current transaction version are found. Ensure that the rows already exist before the transaction begins, or are inserted or modified by the transaction itself.
      • The deleted version number of the row is either undefined or greater than the current transaction version number. Ensure that the data row is not deleted before the transaction reads.
    • INSERT:

      Save the current system version number as the row version number for each newly inserted data row

    • DELETE

      Save the current system version number for each row that you delete as a row deletion identifier.

    • UPDATE:

      Insert a new record, save the current system version number as the row version number, save the current system version number to the original row as the row deletion mark.

MySQL storage engine

In the file system, MySQL saves each database (schemaa) as a subdirectory under the data directory, and creates a. FRM file with the same name for each table in the database in this subdirectory to save the table definition.

-- Table path: /MySQL_data/MyDB/ mytalbe.frm

mysql> SHOW TABLE STATUS LIKE 'MyTABLE'\G;Copy the code

5.1 InnoDB Storage Engine

MySQL is the default transactional storage engine with good performance and automatic crash recovery features.

  • Designed to handle a large number of short-lived transactions (short-term transactions are mostly committed normally and are rarely rolled back)
  • The issue:
    • Data is stored in a tablespace, a black box managed by InnoDB, consisting of a series of data files.
    • MVVC supports high concurrency and achieves four standard isolation levels. The default is REPEATABLE READ, and the next-key locking policy makes InnoDB lock the rows involved in the query, and also locks the gaps in the index to prevent phantom reads.
    • Based on cluster index, it has high performance for primary key query. However, secondary indexes (non-primary indexes) must contain primary key columns. If the primary key index is too large, all other indexes will be large.
    • Predictive prefetch for reading data from disk, adaptive hash index that automatically creates hash indexes in memory to speed up read operations, and Insert buffer that accelerates insert operations
    • There are mechanisms and tools to support true hot backup
    • Suggestion: Read the section “InnoDB Transaction Model and locks “in the official manual sometime

5.2 MyISAM Storage Engine

Default storage engine for MySQL5.1 and previous versions. Supports full-text indexing, compression, spatial functions (GIS), does not support transaction and row-level locking, and cannot be safely recovered after a crash. Consider MyISAM for read-only data or tables that are small enough to tolerate repair operations.

  • Storage: table with.myd and. MYI’s data files and index files are stored on the file system.
  • Features:
    • Locking and concurrency: Locks the entire table instead of specific rows. A shared lock is added to read tables while an exclusive lock is added to write tables. Support for CONCURRENT inserts, which allow new data to be inserted while reading queries.
    • Repair: Different from transaction recovery and crash recovery. If the speed is slow, data may be lost. throughCHECK TABLE mytableCheck the table for errors,REPAIR TABLE mytableMake repairs.
    • Indexing features: Support for full-text indexing, which is an index created based on segmentation. Even BOLB and TEXT fields can be indexed based on the first 500 characters.
    • Delayed Key Write: If DELAY_KEY_WRITE is specified, the modified index data is not written to the disk but to the in-memory key buffer when the key cache is cleared or the table is closed. Can greatly improve write performance, but can be repaired in the event of a database or host crash resulting in index corruption.
  • Compressed table: The compressed table can be used for read operations only, greatly reducing disk space and I/O and improving query performance.
  • Performance: Simple design, data stored in tight formats, performance is good in some scenarios. The most typical performance problem is table locking.

5.3 Other Storage engines built into MySQL

  • Archive engine: non-transactional engine optimized for tell insertion and compression. Supports INSERT, SELECT, and index. Each SELECT requires a full table scan and prevents other SELECT execution to achieve consistent read. Supports row-level locks and dedicated buffers for high concurrent inserts. Suitable for log and data collection applications.
  • Blackhole engine: Does not implement any storage mechanism because it discards all inserted data and does not save it, but the server logs Blackhole tables and can be used to copy data to a standby repository or simply to log. Suitable for specific replication architectures and log auditing, but not recommended.
  • CSV engine: CSV files are treated as MySQL tables, but indexes are not supported. Suitable as a mechanism for data exchange.
  • Federated engine: Agents that access other MySQL servers, create a remote MySQL client connection, pass queries to the remote server for execution, and extract or send the required data.
  • The Memory engine:
    • All data is kept in memory, requires no disk IO and is an order of magnitude faster than MyISAM.
    • Hash indexes are supported, but table level locks are used, so concurrent write performance is low
    • BOLB and TEXT columns are not supported, and the length of each row is fixed.
    • It is good for fast access to data that is not modified and is not lost after a restart (the data is lost and the table structure remains).
    • Application Scenarios:
      • Used to lookup or map a table
      • Used to cache the results of periodically aggregated data
      • Used to store intermediate data generated during data analysis
    • If the data is too large or contains BLOB or TEXT fields, the MyISAM table is used.
    • Memory tables are notCREATE TEMPORARY mytableTable that can be created using any storage engine.
  • Merge engine: deprecated
  • NDB cluster engine: participates in MySQL cluster

5.4 Selecting an Appropriate Storage engine

  • Unless you need to use some features that InnoDB does not have and there is no other way to replace them, the InnoDB engine should be preferred.
  • Do not mix storage engines.
  • Factors to consider:
    • The transaction
    • The backup
    • Crash recovery
    • Unique characteristics
  • Application Scenarios:
    • Logging applications:
      • Use MyISAM or Archive storage engines
      • The active/standby architecture is used for read/write separation
      • The log table obviously contains date information

5.5 Transform table engine

The transformation engine loses features associated with the original engine.

  • ALTER TABLE

    mysql> ALTER TABLE mytable ENGINE = InnoDB;

    It takes a long time for the database to copy data from the original table to a new table, consuming all the IO capacity of the system.

  • Export and import

    Use mysqldump to modify the CREATE TABLE statement.

  • CREATE and query (CREATE and SELECT)

    mysql> CREATE TABLE innodb_table LIKE myisam_table;

    mysql> ALTER TABLE innodb_table ENGINE=InnoDB;

    mysql> INSERT INTO innodb_table SELECT * FROM myisam_table;

    If the amount of data is too large, it can be processed in batches.

    mysql> START TRANSACTION;

    mysql> INSERT INTO innodb_table SELECT * FROM myisam_table where id BETWEEN x AND y;

    mysql> COMMIT;

    -- If necessary, lock the original table to ensure data consistency.Copy the code