1, MySQL basic architecture

  • Server layer: mainly includes connector, query cache, analyzer, optimizer, executor, etc., as well as the general binlog module.
    • Connector: manages connections, authenticates identity and permissions.
    • Query cache: Cache the query result set. Key is the SQL statement and value is the result set. This feature has been removed since version 8.0.
    • Analyzer: analyzes the use of SQL statements and whether the syntax is correct, including lexical analysis, syntax analysis. No cache hit before entering profiler.
    • Optimizer: Select an execution plan that MySQL considers optimal.
    • Executor: Executes a statement, calls the storage engine, and returns execution data.
  • Storage engine layer: Responsible for storing and reading data. Plug-in architecture. Supports storage engines such as MyISAM and InnoDB. InnoDB has its own redo log module.

2. SQL statement execution process

  • Query statement: Permission verification -> Query cache -> Analyzer -> Optimizer -> Permission verification -> Executor -> Storage engine.
  • Update statement (add, delete, modify) : Analyzer -> Permission verification -> Executor -> Storage engine -> Read data from disk/memory, modify data, update memory, asynchronously flush disk -> Write redo log -> Write binlog. Redo log Flush (commit state).

MySQL storage engine

  • MyISAM: the default storage engine before version 5.5, only table level locks, does not support transactions, foreign keys, MVCC, crash cannot be safely recovered.
  • InnoDB: the default storage engine after version 5.5, supports row-level locking (default), table-level locking, supports transactions, foreign keys, MVCC, crash recovery.

4. Four characteristics of transaction (ACID)

  • Atomicity: Transactions are not split and a set of operations will either all succeed or all fail.
  • Consistency: Data is consistent before and after a transaction. Multiple transactions read the same data with the same result.
  • Isolation: When accessing the database concurrently, a transaction is not disturbed by other transactions.
  • Durability: After a transaction is submitted, its changes to data are persistent.

5. How to ensure ACID

  • Atomicity (A) : Guaranteed by the undo log, which records the log information to be rolled back.
  • Consistency (C) : consistency at the database level is guaranteed by A, I, and D, where C is the goal and A, I, and D are the means to ensure C. Consistency at the application level is controlled by code, which determines whether data is valid and then commits or rolls back.
  • Isolation (I) : Guaranteed by locks and MVCC.
  • Persistence (D) : The redo log ensures that when data is modified, the redo log records the modification. When a transaction is committed, data in the redo log is flushed. During an outage, data in the redo log is restored to the database.

6. What problems can concurrent transactions cause

  • Dirty read: a transaction reads uncommitted updates from another transaction.
  • Lost update: Updates from one transaction are overwritten by updates from another transaction.
  • Unrepeatable read: a transaction reads the update or delete committed by another transaction. That is, during multiple reads within a transaction, the last read data changes (changes or disappears).
  • Phantom Read: when a transaction reads an insert from another transaction, a number of reads within a transaction present data that was not present before.

7. Four isolation levels

  • READ-UNCOMMITTED: UNCOMMITTED data is allowed to be READ, the lowest isolation level.
  • READ-COMMITTED: Allows reading of data COMMITTED by concurrent transactions.
  • REPEATABLE READ: Multiple reads within a transaction return the same result, unless the transaction itself is modified.
  • SERIALIZABLE: All transactions are executed one by one, with the highest isolation level and complete compliance with ACID.
Isolation level Dirty read Unrepeatable read Phantom read
READ-UNCOMMITTED Square root Square root Square root
READ-COMMITTED x Square root Square root
REPEATABLE-READ x x Square root
SERIALIZABLE x x x

8 MVCC.

Multiversion Concurrency Control is a Concurrency Control approach that maintains multiple versions of a piece of data. Work only at repeatable-Read (RR) and Read-committed (RC) isolation levels.

  • MVCC is a lock-free concurrency control used to resolve Read/write conflicts. Its implementation relies on version chains and Read views.
  • Versioning: Each time a transaction changes a record, a copy of the previous record is saved in the Undo log. When multiple transactions modify the same record, multiple versions of the record exist in the Undo log.
    • The record contains the transaction ID and the rollback pointer. The transaction ID is the transaction ID that modifies the record, which is incremented continuously. The rollback pointer points to the previous version of the record.
    • Multiple versions recorded in undo log are concatenated into a version chain by a rollback pointer.
  • Read view: When a transaction performs a snapshot read, a snapshot of the data at the current time is generated and the active transaction ID (uncommitted transaction ID) is recorded. According to the visibility algorithm, the read view determines which version is visible to the current transaction along the version chain until it finds one.
    • If the transaction ID of a version is equal to the current transaction ID, it indicates that the version was created by the current transaction and therefore is visible to the current transaction.
    • If the transaction ID of a version is less than the minimum ID in the list of active transaction ids, the transaction that generated the version was committed before the read view was generated, and therefore the version is visible to the current transaction.
    • If the transaction ID of a version is not in the list of active transaction ids and is smaller than the maximum ID in the list of active transaction ids, the transaction that generated that version was committed before the read view was generated, and therefore the version is visible to the current transaction.
  • RR and RC generate read views at different times.
    • RR: a read view is generated only in the first snapshot of the same transaction and is used thereafter.
    • RC: Each snapshot read of the same transaction generates a new read view.

Current read, snapshot read

  • Current read: The latest version of the record is read, while other concurrent transactions cannot modify the current record. Records read are locked.
  • Snapshot reads: Read the visible (and possibly historical) version of the record. Records are not locked. The serial level degrades to the current read. The implementation of snapshot reading is based on MVCC.

10, index,

  • Hash index: The underlying hash table is suitable for single record query.
  • BTree index: ordered, suitable for multiple records query (range query), MySQL uses B+Tree.

MyISAM, InnoDB to B+Tree index implementation

  • MyISAM: non-clustered index, index file and data file separation.
    • Primary index: the data field of the leaf node stores the address of the data record, and the key must be unique. During index retrieval, the address value of data field is taken out according to key, and corresponding data record is read according to address value.
    • Secondary index: Structurally the same as the primary index, the key can be repeated.
  • InnoDB: Clustered index, data files themselves are index files.
    • Primary index: The table data file itself is an index structure organized according to B+Tree. The data field of the leaf node stores complete data records. Key is the primary key of the table.
    • Secondary index: the data field of the leaf node stores the primary key. During retrieval, the primary key needs to be obtained first, and then the primary key is used to retrieve records from the primary index (back to the table).

12. Overwrite indexes

  • In a query, an index (typically a federated index) overwrites (contains) all the fields that need to be queried.
  • Query results can be obtained directly through the index, without the need to query back to the table.

13. Lock type

  • Shared lock (read lock) : Other transactions can read, but cannot write.
  • Exclusive lock (write lock) : Other transactions can neither read nor write.
    • Table lock: Locks the entire table. Small overhead, fast lock, will not appear deadlock; Large lock granularity, high conflict probability, low concurrency.
    • Row lock: Locks a row of records. Overhead is big, add lock slowly, can appear deadlock; Lock granularity is small, conflict probability is low, and concurrency is high. It can be classified into optimistic lock and pessimistic lock. Optimistic lock can be implemented by version number.

14. Three lock algorithms for InnoDB

  • Record Lock: A lock on a single row Record.
  • Gap Lock: A Gap lock that locks a range, excluding the record itself.
  • Next-key lock: Record + Gap locks a range, including the Record itself.

MySQL > alter database lock

  • The isolation level InnoDB supports by default is REPEATABLE-READ. At this isolation level, for snapshot reads, MySQL uses MVCC to avoid phantom reads. For the current read, MVCC will still have phantom read, need to use lock read to avoid phantom. The Next-key lock algorithm is used to lock read.
  • Next-key lock is degraded to Record lock when the query index contains unique attributes.

16, large table optimization

  • Limited data range: Query with conditions that limit the data range.
  • Read/write separation: the master library is responsible for writing and the slave library is responsible for reading.
  • Vertical splitting: column splitting. A table with many columns is split into multiple tables.
  • Horizontal split: line split, the data of a table into multiple tables to store, the best library.

17. How to ensure the uniqueness of ID after dividing database and table

  • UUID: the UUID is not recommended because the query efficiency is low.
  • Self-increment ID set step size: Make ids fall on different tables.
  • Redis ID generation: the incr command of Redis is used to realize the atomic increment of ID.
  • Distributed ID algorithms: such as the SnowFlake algorithm.

Communication area



Wechat official account: Surprised but a glance

Personal Blog:Jing is a mesh

Reference

[1] snailclimb. Gitee. IO/javaguide – I… [2] www.cnblogs.com/wyq178/p/11… [3] blog.csdn.net/SnailMann/a… [4] zhuanlan.zhihu.com/p/343134817 [5] the high-performance MySQL