MySQL is the core relational database management system in use today; there is no way to master it other than more thinking and more hands-on practice.


Do not let pride grow, do not indulge desire, do not carry pleasure to excess, do not let ambition be fully satisfied.


MySQL > select * from `MySQL`

  • By index storage structure:
    • B-Tree index, Hash index, FULLTEXT full-text index, and R-Tree index
  • By application level:
    • Normal index, unique index, primary key index, composite index
  • By index key-value type:
    • Primary key index, secondary index
  • By how data storage and index key values are organized:
    • Clustered index, non-clustered index

Part 1 – Index type

“Plain index”

CREATE INDEX <index_name> ON table_name (field_name);
ALTER TABLE table_name ADD INDEX [index_name] (field_name);
CREATE TABLE table_name ( [...] , INDEX [index_name] (field_name) );

Unique index

The difference from a normal index is that the indexed field's values must be unique, though NULL values are allowed. (When a unique constraint is added while creating or altering a table, the corresponding unique index is created automatically.)

CREATE UNIQUE INDEX <index_name> ON table_name (field_name);
ALTER TABLE table_name ADD UNIQUE INDEX [index_name] (field_name);
CREATE TABLE table_name ( [...] , UNIQUE [index_name] (field_name) );

Primary key index

A special unique index that does not allow NULL values. (A table can have only one primary key, added when the table is created or altered.)

CREATE TABLE table_name ( [...] , PRIMARY KEY (field_name) );
ALTER TABLE table_name ADD PRIMARY KEY (field_name);

“Composite index”

Unlike a single-column index, a composite index is built on multiple columns.

  • Narrow index: an index on 1 or 2 columns
  • Wide index: an index on more than 2 columns
CREATE INDEX <index_name> ON table_name (field_name1, field_name2, ...);
ALTER TABLE table_name ADD INDEX [index_name] (field_name1, field_name2, ...);
CREATE TABLE table_name ( [...] , INDEX [index_name] (field_name1, field_name2, ...) );

Full Text Index

LIKE fuzzy queries are inefficient on large data sets, which is where a full-text index is needed.

CREATE FULLTEXT INDEX <index_name> ON table_name (field_name);
ALTER TABLE table_name ADD FULLTEXT [index_name] (field_name);
CREATE TABLE table_name ( [...] , FULLTEXT KEY [index_name] (field_name) );

Full-text index query (MATCH ... AGAINST)

SELECT * FROM user WHERE match(name) against('aaa');

[Precautions] :

  1. Full-text indexes can only be built on string columns (CHAR, VARCHAR, TEXT).
  2. A token is indexed only if its length falls between the engine's minimum and maximum word lengths (InnoDB: 3-84, MyISAM: 4-84).
  3. Full-text index values are tokenized on separator characters; for example, "b+aaa" is split into "b" and "aaa".
  4. Full-text index matching uses whole-token matching by default: "a" matches "a" but not "ab" or "ac". To match prefixes, search for a* in Boolean mode:

    SELECT * FROM user WHERE match(name) against('a*' in boolean mode);

Part 2 – Indexing principles

Index definition: a data structure the storage engine uses to find records quickly. It costs extra space and maintenance work.

  • Indexes are stored physically in data pages, inside the storage engine's data files (for InnoDB, the .ibd files).
  • Indexes speed up lookups, but they slow down inserts, deletes, and updates because the index itself must be maintained.

Binary search

(abbreviated)

“Hash structure”

In MySQL, hash structures are mainly used by the Memory engine's native hash indexes and by InnoDB's adaptive hash index.

——–InnoDB adaptive hash index ———

The InnoDB storage engine monitors queries against each index page of a table. When InnoDB notices that certain index values are accessed frequently, it builds a hash index for them in memory on top of the B+Tree index. The in-memory B+Tree index thereby gains hash-index behavior: frequently accessed index values can be located quickly.

Users can only turn the InnoDB adaptive hash index feature on or off; there is no further manual intervention:

show variables like '%innodb_adaptive%';

“B + Tree”

MySQL database indexes use the B+Tree structure, an optimization of the B-Tree structure.

——–B Tree———

  • Index values and data are distributed throughout the tree structure
  • Each node can store multiple index values and their corresponding data
  • Multiple index values in a tree node are arranged in ascending order from left to right

Starting from the root node, a binary search is performed over the index values within a node; if a value matches, the search ends. On a miss, the search descends into the corresponding child node and repeats, until the child pointer is empty or a leaf node is reached.

——–B+ Tree———

  • Non-leaf nodes do not store data, only index values, so each node can hold more index values
  • The leaf nodes contain all the index values and the data
  • Leaf nodes are linked with pointers, which improves range access performance

Compared with a B-Tree, a B+Tree range search only needs to locate the two boundary index values and then traverse the linked leaf nodes, whereas a B-Tree has to traverse all nodes and data within the range. B+Tree is clearly more efficient.

“Clustered Index & Secondary Index”

Clustered index: the leaf nodes of the B+Tree store the primary key index and the row records together. Non-clustered index: the leaf nodes of the B+Tree store the primary key index and the row records separately.

Primary key index: the leaf nodes of the B+Tree are keyed by primary key values. Secondary index: the leaf nodes of the B+Tree are keyed by non-primary-key field values.

——– Clustered index ———

A clustered index is how InnoDB stores data. InnoDB's clustered index is a B+Tree organized in primary key order, and the leaf nodes of that B+Tree are the row records themselves, stored tightly together with the primary key values. This also means the primary key index of an InnoDB table is the table itself: it stores the whole table's data in primary key order, and the space it occupies is the size of the entire table. The primary key index is the clustered index.

Every InnoDB table must have exactly one clustered index!

  1. If there is a primary key, the primary key index is the cluster index.
  2. If there is no primary key, the first unique index on NOT NULL columns is used as the clustered index.
  3. If neither exists, InnoDB creates a hidden row ID as the clustered index.

——– Secondary index ———

InnoDB can create only one clustered index per table, but can create multiple secondary indexes.
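As a concrete illustration (table and index names here are hypothetical), one table yields one clustered B+Tree plus one B+Tree per secondary index:

CREATE TABLE user (
  id   INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(50) NOT NULL,
  age  INT NOT NULL,
  PRIMARY KEY (id),              -- clustered index: leaf nodes hold the full row
  KEY idx_name (name),           -- secondary index: leaf nodes hold name + the primary key id
  KEY idx_age_name (age, name)   -- another secondary index
) ENGINE=InnoDB;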

Part 3 – Index analysis and optimization

“EXPLAIN”

MySQL provides an EXPLAIN command that analyzes SELECT statements and outputs detailed information about SELECT execution for developers to optimize.

EXPLAIN SELECT * FROM user WHERE id = 2;

  • select_type
    • SIMPLE: the query contains no subqueries or UNIONs
    • PRIMARY: the outermost query
    • UNION: the second or a later query in a UNION
    • DEPENDENT UNION: a UNION query that depends on the result of the outer query
    • UNION RESULT: the result of the UNION
    • SUBQUERY: a SELECT subquery result
    • DEPENDENT SUBQUERY: a SELECT subquery that depends on the result of the outer query
  • type: how the storage engine accesses the data. An important property for judging whether a query is a full table scan or an index-based scan
    • ALL: full table scan (worst performance)
    • index: full index scan, where the entire index tree is scanned
    • range: index range scan, e.g. >, >=, <, <=, IN
    • ref: single-value lookup using a non-unique index
    • eq_ref: in a multi-table JOIN, each row from the preceding table matches at most one row of the following table
    • const: equality lookup on a primary key or unique index, treated as a constant
    • NULL: no table or index access is needed at all (fastest)
  • possible_keys: the indexes that could be used for the query; note they are not necessarily used. Index names are shown.
  • key: the index actually used by the query; the index name is shown.
  • rows: the number of rows the MySQL query optimizer estimates, from its statistics, that the SQL has to scan to produce the result. In principle, the fewer rows, the more efficient; this gives an intuitive sense of the SQL's efficiency.
  • key_len: the number of index bytes used; from it you can tell whether all columns of a composite index were used (see the worked example after this list).

    The calculation rules are as follows:
    • String types (the per-character byte length depends on the character set: latin1 = 1, gbk = 2, utf8 = 3, utf8mb4 = 4):
      • CHAR(n): n × character-set byte length
      • VARCHAR(n): n × character-set byte length + 2 bytes (length prefix)
    • Numeric types:
      • TINYINT: 1 byte
      • SMALLINT: 2 bytes
      • MEDIUMINT: 3 bytes
      • INT, FLOAT: 4 bytes
      • BIGINT, DOUBLE: 8 bytes
    • Date/time types:
      • DATE: 3 bytes
      • TIMESTAMP: 4 bytes
      • DATETIME: 8 bytes
    • Field attributes: a nullable column adds 1 byte; a NOT NULL column does not add this byte.
  • Extra: additional information; many kinds of operations leave hints in Extra
    • Using where: the query has to go back to the table through the index to fetch the required data
    • Using index: the index alone provides all the data the query needs (covering index)
    • Using filesort: the query result has to be sorted; small result sets are sorted in memory, large ones on disk, so optimization is recommended whenever Using filesort appears
    • Using temporary: a temporary table is used, usually for operations such as deduplication and grouping
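A worked example of the key_len rules above, using a hypothetical utf8mb4 table (the values are what EXPLAIN would be expected to report):

CREATE TABLE user (
  id   INT NOT NULL PRIMARY KEY,
  age  INT NOT NULL,
  name VARCHAR(50) DEFAULT NULL,
  KEY idx_age_name (age, name)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

EXPLAIN SELECT id FROM user WHERE age = 18 AND name = 'tom';
-- key_len = 4 (INT age, NOT NULL) + 50*4 + 2 (VARCHAR length bytes) + 1 (name is nullable) = 207
-- => both columns of the composite index are used
EXPLAIN SELECT id FROM user WHERE age = 18;
-- key_len = 4  => only the age column of the index is used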

“Back-to-table Query”

InnoDB indexes come in two kinds, clustered and secondary. The leaf nodes of the clustered index store the row records, and an InnoDB table has exactly one clustered index. The leaf nodes of a secondary index store the index field values plus the primary key values, so a row cannot be located from a secondary index alone. Typically, two index trees have to be scanned: first locate the primary key value via the secondary index, then locate the row record via the clustered index. This is called a back-to-table query, and it performs worse than scanning a single index tree.

“Covering index”

A covering index is in play when the Extra field of the EXPLAIN output shows Using index.

If all the column data a query needs can be obtained from one index tree without going back to the table, that is called a covering index.

The most common way to achieve index coverage is to put the queried fields into a composite index.
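A minimal sketch of triggering a covering index, assuming a hypothetical user table with an (age, name) composite index: only indexed columns (plus the primary key) are selected, so Extra shows Using index and no back-to-table lookup happens.

ALTER TABLE user ADD INDEX idx_age_name (age, name);

EXPLAIN SELECT id, age, name FROM user WHERE age = 18;   -- Extra: Using index, no back-to-table lookup
EXPLAIN SELECT * FROM user WHERE age = 18;               -- needs columns outside the index: back-to-table lookup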

“Left-most Prefix Rule”

A query on a composite index uses the index only if it filters on the leftmost column(s); if the search starts from the second column of the index, the index is not used.
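For example, with a hypothetical composite index on (age, name), the left-most prefix rule plays out like this:

-- assumed index: idx_age_name (age, name)
explain select * from user where age = 18;                   -- uses the index (leftmost column)
explain select * from user where age = 18 and name = 'tom';  -- uses both columns of the index
explain select * from user where name = 'tom';               -- skips the leftmost column: index not used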

“LIKE query”

Q: Can an index be used with a LIKE fuzzy query in MySQL? A: Yes, but only when the % wildcard comes after the search prefix (the pattern must not start with %).

select * from user where name like '%o%';   -- index not used
select * from user where name like 'o%';    -- index used
select * from user where name like '%o';    -- index not used

“NULL Query”

If a column in a MySQL table contains NULL values, is an index containing that column still valid? Answer: it works, but it is not recommended!

For MySQL, NULL is a special value. Conceptually, NULL means “an unknown value” and is handled a little differently than other values.

  • Operators such as =, <, > cannot be used to test for NULL (IS NULL / IS NOT NULL must be used instead)
  • The result of any arithmetic operation involving NULL is always NULL
  • COUNT(column) does not count NULL rows
  • NULL requires more storage space than an empty string

MySQL can use an index on a column that contains NULL, but NULL behaves differently from other values, so allowing NULL on a column is not recommended. It is best to declare the column NOT NULL and give it a default value, such as 0 or an empty string; for a datetime column, the current system time or a fixed special value such as '1970-01-01 00:00:00' can be used.
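A small sketch of that recommendation (column names are hypothetical):

-- prefer NOT NULL with an explicit default instead of allowing NULL
ALTER TABLE user
  MODIFY name VARCHAR(50) NOT NULL DEFAULT '',
  MODIFY created_at DATETIME NOT NULL DEFAULT '1970-01-01 00:00:00';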

“Indexing and Sorting”

Two sorts in a query

  1. filesort: fetch the matching rows first, then sort them in the sort buffer or on disk; inefficient
    1. Two-pass sorting: the data is obtained after two disk scans.

      The first pass reads out the sort fields and sorts them;

      the second pass reads the remaining field data.
    2. Single-pass sorting: all required columns are read from disk at once and sorted in memory, then the result is returned.

      If the data to be sorted exceeds the sort buffer (sort_buffer), multiple disk reads and temporary tables are needed, causing extra I/O and load.

      So: use select * sparingly, and increase sort_buffer_size and max_length_for_sort_data appropriately.

    The following cases result in filesort:

    • The index columns are sorted with a mix of ASC and DESC:
    explain select id from user order by age asc, name desc;      -- with an (age,name) index
    • The WHERE and ORDER BY clauses satisfy the leftmost prefix, but the WHERE clause uses a range query (e.g. >, <, in):
    explain select id from user where age > 10 order by name;     -- with an (age,name) index
    • The ORDER BY, or WHERE + ORDER BY, columns do not satisfy the leftmost prefix of the index:
    explain select id from user order by name;                    -- with an (age,name) index
    • MySQL uses only one index per query, and the ORDER BY involves two different indexes:
    explain select id from user order by name, age;               -- with separate (name) and (age) indexes
    • The WHERE clause uses a different index than the ORDER BY clause:
    explain select id from user where name = 'tom' order by age;  -- with separate (name) and (age) indexes
    • An indexed column in the WHERE or ORDER BY clause is wrapped in an expression, including function calls:
    explain select id from user order by abs(age);                -- with an (age) index
  2. index: the index itself yields the required order, so no extra sorting pass is needed; efficient

    The following cases use index-ordered retrieval:
    • The ORDER BY column combination satisfies the leftmost prefix of the index:
    explain select id from user order by id;                      -- the (id) and (id,name) indexes both work
    • The WHERE clause plus ORDER BY column combination satisfies the leftmost prefix of the index:
    explain select id from user where age = 18 order by name;     -- with an (age,name) index

If EXPLAIN shows Using filesort in the Extra column, the filesort method is being used and should be optimized. If Extra shows Using index, a covering index is at work and everything is done inside the index; use covering indexes whenever possible.
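The sorting examples above assume a schema roughly like the following (hypothetical; the index names match the comments in the examples):

CREATE TABLE user (
  id   INT NOT NULL PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  age  INT NOT NULL,
  KEY idx_age_name (age, name),   -- the (age,name) composite index
  KEY idx_id_name (id, name),     -- the (id,name) composite index
  KEY idx_name (name),
  KEY idx_age (age)
) ENGINE=InnoDB;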

Part 4 – Query optimization

Slow Query location

—- Enable slow log query —-

Check whether slow query logging is enabled and where the slow query log file is stored:

SHOW VARIABLES LIKE 'slow_query_log%';

Enable slow log query

SET global slow_query_log = ON; 
SET global slow_query_log_file = 'OAK-slow.log'; 
SET global log_queries_not_using_indexes = ON;   -- also log SQL that does not use an index; only effective when slow_query_log is ON
SET long_query_time = 10;                        -- slow query threshold in seconds; SQL running longer than this is written to the log

—- View slow query logs —-

  1. Open the slow.log file directly in an editor (find the log path with show variables like '%slow_query_log_file%';):
    • Time: indicates the time when logs are recorded
    • User@Host: indicates the user and host to be executed
    • Query_time: execution time
    • Lock_time: indicates the time when a table is locked
    • Rows_sent: Number of records sent to the requester
    • Rows_examined: Indicates the number of scanned records
    • SET timestamp: The time at which the statement was executed
    • select…. : Indicates the specific SQL statement to execute
  2. Use mysqlDumpslow to view:

    Run perl mysqldumpslow --help in the MySQL bin directory to see the usage format.
    • Tips for viewing logging:
    # Get the 10 SQL statements that return the largest record sets
    mysqldumpslow -s r -t 10 /database/mysql/Mysql06_slow.log
    # Get the 10 most frequently executed SQL statements
    mysqldumpslow -s c -t 10 /database/mysql/Mysql06_slow.log
    # Get the first 10 statements containing a left join, sorted by time
    mysqldumpslow -s t -t 10 -g "left join" /database/mysql/Mysql06_slow.log
    # It is recommended to pipe the output through | more, otherwise the screen may be flooded
    mysqldumpslow -s r -t 20 /mysqldata/mysql/mysql06-slow.log | more

“Slow Query Optimization”

How to determine whether a query is slow: SQL whose running time exceeds long_query_time is recorded in the slow query log; the default long_query_time is 10 seconds. How to determine whether an index is used: run EXPLAIN on the SQL statement and check whether the key column in the result is NULL.

  • Whether an index is used only describes the execution plan of an SQL statement; whether it is a slow query is determined by its execution time. The two are related but not equivalent.
  • When using an index, focus not only on whether it is used but on whether it reduces the number of rows scanned; fewer scanned rows means higher efficiency. For a large table, do not just create an index: also consider the index's selectivity (filtering power), which is what makes it fast.

Summary of causes for slow query

  1. Full table scan: EXPLAIN shows type = ALL
  2. Full index scan: EXPLAIN shows type = index
  3. Poor index selectivity: depends on the indexed field's selectivity, the data volume and distribution, and the table design
  4. Frequent back-to-table overhead: minimize SELECT * and use covering indexes

“Paging Query Optimization”

Question Point 1:

select * from user limit 10000,1;
select * from user limit 10000,10;
select * from user limit 10000,100;
select * from user limit 10000,1000;
select * from user limit 10000,10000;

While fewer than about 100 rows are returned, query time stays roughly constant; as the number of returned rows grows, the query takes more and more time!

Question Point 2:

select * from user limit 1,100;
select * from user limit 10,100;
select * from user limit 100,100;
select * from user limit 1000,100;
select * from user limit 10000,100;

Once the offset exceeds about 100, query time rises sharply as the offset grows. (This paging mechanism always scans from the first record, so the further back the page, the slower the query.)

Optimization scheme:

  1. Optimize with a covering index:
    • select * from user limit 10000,100;    ✗
    • select id from user limit 10000,100;   ✓
  2. Optimize with a subquery:
    • select * from user limit 10000,100;    ✗
    • select * from user where id >= (select id from user limit 10000,1) limit 100;   ✓

MySQL transactions and locks

Part 1 – ACID properties

Atomicity, Consistency, Isolation and Durability

Three of the four features are related to WAL and are guaranteed through Redo and Undo logs

* WAL (Write-Ahead Logging): write the log first, then write the data pages to disk.

“A atomicity”

All or none of the changes to the data are executed

  • If the dirty pages in the Buffer Pool have not been flushed yet, how is the committed change guaranteed to take effect? → Redo
  • If the transaction has not committed, but its dirty pages in the Buffer Pool have already been flushed, how is the data that should not exist erased? → Undo

Every write transaction that modifies the Buffer Pool generates Redo/Undo log records, and this log is written to disk before the modified pages are flushed. If the database restarts after a crash, the Redo log is replayed so that committed changes whose dirty pages were never flushed are not lost; conversely, if dirty pages were flushed but the transaction had not committed when the database went down, the Undo log is used to roll the changes back.

“C consistency”

Database integrity limits are not broken before and after a transaction.

  1. Constraint consistency: foreign key, Check, unique index, and other constraints specified when creating a table structure.
  2. Data consistency: is a comprehensive requirement because it is guaranteed by atomicity, persistence, and isolation rather than relying on a single technology.

Consistency can also be understood as data integrity. Data integrity is guaranteed through atomicity, isolation, and persistence, which in turn are guaranteed through Redo/Undo. Logical consistency, including unique indexes, foreign key constraints, and check constraints, falls under business logic.

“I isolation”

The execution of a transaction cannot be disturbed by other transactions, that is, the operations and data used within a transaction are isolated from other concurrent transactions

Isolation levels from low to high: read uncommitted -> read committed -> repeatable read -> serializable. Locks and multi-version concurrency control (MVCC) are used to guarantee isolation (more on this later).

“D Durability”

Once a transaction is committed, its changes to the data in the database should be permanent and should not be affected by subsequent operations or failures

A commit triggers a series of operations: binlog flush to disk, sending the binlog to slaves, storage engine commit, flush_logs, checkpoint, writing the transaction commit flag, and so on. These are the means by which the database guarantees the durability and integrity of its data.

Part 2 – Evolution of transaction control

“Problems with transaction concurrency”

  • Lost update: when two or more transactions update the same row, an update can be lost. This splits into rollback override and commit override.
    • Rollback override: a transaction's rollback overwrites data already committed by another transaction.
    • Commit override: a transaction's commit overwrites data already committed by another transaction.
  • Dirty read: a transaction reads data that has been modified but not committed by another transaction.
  • Non-repeatable read: within one transaction, reading the same row multiple times returns different data.
  • Phantom read: within one transaction, running the same query multiple times returns a different set of rows — the later result has rows added or missing compared with the earlier one.

“Queuing”

All transactions' database operations are executed completely serially, without any locking; the database processes only one transaction at a time.

“Exclusive locks (mutex)”

Once a record is exclusively held by the first transaction, other transactions block until that transaction releases the lock.

Read/write lock

Refine the locking by operation type: reads can run in parallel with reads, but read-write, write-read, and write-write combinations still require exclusive locking.

“MVCC”

MVCC (Multi-Version Concurrency Control) follows the copy-on-write idea. It allows read-read, read-write, and write-read to run in parallel; only write-write cannot run in parallel, which preserves consistency.

When transaction 1 starts a write operation, a copy of the record is made; read operations from other transactions read that copy, so the write does not block other transactions from reading the record.

—- MVCC definition —-

To achieve highly concurrent data access, the database keeps multiple versions of data and uses transaction visibility rules to ensure that each transaction sees the version it should see. Multi-version control cleverly turns exclusive occupation of a scarce resource into concurrent access, greatly improving the database's throughput and read/write performance.

Before each modification, the pre-modification data state and the transaction number are recorded in the Undo log; this backup record can serve other transactions' reads, or be used to roll back data when necessary.

If you want to further resolve writing conflicts, you can use optimistic/pessimistic locking.

Part 3 – Transaction isolation level

  • Read Uncommitted: prevents rollback-override lost updates, but dirty reads can occur: data modified but not yet committed by another session can be read.
  • Read Committed (the default for Oracle and SQL Server): only data committed by other sessions can be read, which prevents dirty reads. However, non-repeatable reads can occur: two identical queries inside one transaction may return different results.
  • Repeatable Read (MySQL's default): solves non-repeatable reads by ensuring that, within one transaction, concurrent reads of the same rows see the same data. In theory, phantom reads can still occur: while a user is reading rows in a range, another transaction inserts new rows into that range, and re-reading the range reveals "phantom" rows.
  • Serializable: all reads and writes are executed serially. It solves phantom reads by forcing transactions to be ordered so they cannot conflict. This level causes heavy timeouts and lock contention and is inefficient.
  • The higher the isolation level, the fewer the concurrency problems, but the worse the concurrency (the cost).
  • If the chosen isolation level does not solve a concurrency problem or requirement, the developer must set locks manually in the application.

View the current transaction isolation level:

show variables like 'tx_isolation';
-- or
select @@tx_isolation;

Set transaction isolation level:

set tx_isolation='READ-UNCOMMITTED'; 
set tx_isolation='READ-COMMITTED'; 
set tx_isolation='REPEATABLE-READ'; 
set tx_isolation='SERIALIZABLE';
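A minimal two-session sketch, assuming a hypothetical account table, showing the non-repeatable read that READ-COMMITTED allows and REPEATABLE-READ prevents:

-- Session 1:
set tx_isolation='READ-COMMITTED';
begin;
select balance from account where id=1;   -- returns 100
-- Session 2:
update account set balance=200 where id=1;
commit;
-- Session 1 again:
select balance from account where id=1;   -- now returns 200: a non-repeatable read
commit;
-- Under REPEATABLE-READ, the second select in session 1 would still return 100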

Part 4 – Locking mechanism and combat

“Lock classification”

  • By lock granularity:
    • Table-level lock: each operation locks the whole table. Largest granularity, highest probability of lock conflict, lowest concurrency. Used by MyISAM, InnoDB, BDB and other storage engines.
    • Row-level lock: each operation locks one row of data. Smallest granularity, lowest probability of lock conflict, highest concurrency. Used by the InnoDB storage engine.
    • Page-level lock: each operation locks a group of adjacent records. Granularity, overhead and locking time sit between table locks and row locks; concurrency is middling. Used by the BDB storage engine.
  • By operation type:
    • Read lock (S lock): shared lock. Multiple reads of the same data can proceed simultaneously without affecting each other.

      If transaction A holds an S lock on a record, it can read but not modify the record; other transactions can also take S locks on it but cannot take an X lock. An X lock can only be granted after all S locks are released.
    • Write lock (X lock): exclusive lock. It blocks other write locks and read locks until the current write finishes.

      If transaction A holds an X lock on a record, it can read and modify the record; other transactions can neither read nor modify it.
    • Intention shared lock (IS) and intention exclusive lock (IX): table-level locks. An intention lock is placed on the table before the corresponding row-level S/X lock is taken.
  • By locking strategy:
    • Optimistic locking: typically implemented by comparing record versions; conflict detection happens only when the update is submitted, and if a conflict is found an error is returned.
    • Pessimistic locking: to avoid concurrent modification, lock first, then modify.

      (Shared and exclusive locks are different forms of pessimistic locking; both belong to it.)

“Row Locking Principle”

  • Record Lock: locks a single index record. (Record lock; supported at the RC and RR isolation levels.)
  • Gap Lock: locks the gap between index records, guaranteeing that the gap stays unchanged. (Range lock; supported at the RR isolation level.)
  • Next-Key Lock: a combination of record lock and gap lock; it locks the record itself plus the range before and after it. (Record lock + range lock; supported at the RR isolation level.)

At the RR isolation level, InnoDB uses Next-Key Lock for record locking by default, but when the SQL statement operates through a unique index, InnoDB optimizes the Next-Key Lock and downgrades it to a Record Lock: only the index record itself is locked, not a range.

  1. select ... from statements: InnoDB uses the MVCC mechanism for non-blocking reads, so ordinary SELECT statements take no locks
  2. select ... from lock in share mode statements: a shared lock is taken; InnoDB uses a Next-Key Lock, downgraded to a Record Lock when the scan hits a unique index
  3. select ... from for update statements: an exclusive lock is taken; InnoDB uses a Next-Key Lock, downgraded to a Record Lock when the scan hits a unique index
  4. update ... where statements: InnoDB uses a Next-Key Lock, downgraded to a Record Lock when the scan hits a unique index
  5. delete ... from statements: InnoDB uses a Next-Key Lock, downgraded to a Record Lock when the scan hits a unique index
  6. insert statements: InnoDB sets an exclusive Record Lock on the row being inserted
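A hedged sketch of the downgrade described above, assuming a hypothetical user table at the RR isolation level:

-- assume: id is the primary key (unique), age has an ordinary (non-unique) index
begin;
select * from user where id = 10 for update;
-- equality match on a unique index: the next-key lock is downgraded to a record lock on id = 10
rollback;

begin;
select * from user where age = 30 for update;
-- non-unique index: a next-key lock is kept, locking the matching records
-- plus the index gaps around age = 30 (inserts into that range block)
rollback;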

“Pessimistic lock”

Definition: data is locked while it is being processed. Broadly speaking, the row locks, table locks, read locks, write locks, shared locks and exclusive locks mentioned earlier all fall under pessimistic locking.

—- table level (pessimistic) locking —-

Manually add table locks

lock table table_name1 read|write, table_name2 read|write, ...;

View the locks added to the table

show open tables;

Delete table locks

unlock tables;

A table-level read lock blocks writes but not reads; a write lock blocks both reads and writes.

—- Shared row-level (pessimistic) locks —-

A shared lock is also called a read lock, or S lock for short. Multiple transactions can share a lock on the same data: all of them can read it, but none can modify it. A shared lock is taken with select ... lock in share mode, and applies only to query statements.

While a transaction holds a shared (read) lock, the data can only be read, not modified; modification attempts are blocked.

—- Exclusive row level (pessimistic) lock —-

An exclusive lock is also called a write lock, or X lock for short, and cannot coexist with other locks. If a transaction acquires an exclusive lock on a row, other transactions can neither operate on that row nor acquire a lock on it. An exclusive lock is taken by appending FOR UPDATE to the end of the SELECT statement. Row-level locking relies on the corresponding index, so if the query condition cannot use an index, all of the table's records end up locked.

A transaction that holds an exclusive (write) lock can read and modify the record; other transactions cannot modify it and cannot acquire the record lock themselves (select ... for update). If the query does not use an index, all records of the table are locked.
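Put together, the two row-level pessimistic locks correspond to the following statements (table and column names are hypothetical):

-- shared (S) lock: other transactions can still read or S-lock the row, but cannot modify it
select * from user where id = 1 lock in share mode;

-- exclusive (X) lock: other transactions can neither lock nor modify the row until this transaction ends
select * from user where id = 1 for update;
-- reminder: if the WHERE condition cannot use an index, InnoDB ends up locking every row in the table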

“Optimistic lock”

Optimistic locking, as opposed to pessimistic locking, is not a feature provided by the database; developers implement it themselves. The assumption is optimistic: the operation is unlikely to conflict, so nothing special is done (no locking) during the database operation, and conflicts are checked only when the transaction is committed.

The key point of optimistic lock implementation is conflict detection

  • Common optimistic lock implementations:
    • Add a version field to the table and increment it by 1 on every update. Before writing, the version read earlier is compared against the current one to detect whether the record was modified by another transaction during processing.
    • Use a timestamp, similar to the version field: add a timestamp column to the table, and on update compare the current timestamp in the database with the one read before the update. If they match, commit the update; otherwise it is a version conflict and the operation is cancelled.

↓ Manually implement optimistic lock using SQL ↓ :

  1. Query the product information:
select quantity, version from products where id=1;
  2. Generate the order based on the product information:
insert into orders ...
insert into items ...
  3. Modify the product inventory:
update products set quantity=quantity-1, version=version+1 where id=1 and version=#{version};

In addition to implementing optimistic locking manually, many database access frameworks also package optimistic locking implementations such as the MyBatis framework, which can be extended using the OptimisticLocker plugin.

“Deadlocks and Solutions”

——– table level deadlock ——–

  • Possible cause: user X locks table A (table lock) and then requests table B (table lock), while user Y locks table B (table lock) and then requests table A (table lock).
  • Solution: this kind of deadlock is common and is caused by a bug in the application logic; the only fix is to adjust the program logic.

——– row-level deadlock ——–

  • Possible cause 1: a query without an index condition is executed inside a transaction, triggering a full table scan and escalating the row-level lock into a lock on all of the table's records (equivalent to a table-level lock). When several such transactions run, deadlocks and blocking appear easily, the application system slows down, and blocking or deadlocks follow.
  • Solution 1: do not use overly complex multi-table join queries in SQL statements; use EXPLAIN to analyze the execution plan, and for SQL that scans or locks the whole table, build the appropriate indexes.
  • Possible cause 2: two transactions each wait for the lock held by the other, producing a deadlock.
  • Solution 2:
    • In one transaction, try to lock all the resources it needs at once
    • Sort the resources (for example by id) and process them in that order (see the sketch below)
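A minimal sketch of that ordering advice, assuming a hypothetical account table: both transactions touch the same rows, but always in ascending id order, so neither can hold a lock the other needs first.

-- both transaction A and transaction B follow the same id order (1, then 2)
begin;
update account set balance = balance - 10 where id = 1;
update account set balance = balance + 10 where id = 2;
commit;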

——– The shared lock is converted to the exclusive lock ——–

  • Cause: transaction A queries a record (taking a shared lock) and then wants to update it. Meanwhile transaction B also tries to update the record; B's exclusive lock request has to wait for A to release its shared lock. When A then issues its update, it needs to upgrade to an exclusive lock, but that request cannot be granted because B's exclusive lock request is already queued ahead of it, waiting for A's shared lock. A deadlock results.
Transaction A: select * from dept where deptno=1 lock in share mode;  -- shared lock, step 1
Transaction A: update dept set dname='java' where deptno=1;           -- needs an exclusive lock, step 3
-------------------------------------------------------------
Transaction B: update dept set dname='java' where deptno=1;           -- step 2: blocked, cannot get the exclusive lock because of the shared lock from step 1
  • The solution:
    • Disable buttons and similar controls immediately after they are clicked, so the user cannot click repeatedly and trigger multiple operations on the same record
    • Use optimistic locks for control. Optimistic locking mechanism avoids the overhead of database locking in long transactions and greatly improves the system performance under large concurrency.

——– Deadlock check ——–

MySQL provides several lock-related parameters and commands to help optimize lock operations and reduce deadlocks.

  1. View deadlock logs:
    • show engine innodb status — shows the most recent deadlock log
  2. View lock status variables:
    • show status like 'innodb_row_lock%' — shows status variables for analyzing row-lock contention in the system
      • Innodb_row_lock_current_waits: number of lock waits currently in progress
      • Innodb_row_lock_time: total time spent waiting for row locks since startup
      • Innodb_row_lock_time_avg: average time spent waiting for a lock
      • Innodb_row_lock_time_max: the longest single wait for a lock since startup
      • Innodb_row_lock_waits: total number of lock waits since startup

MySQL cluster architecture

Part 1 – Master/slave mode

MySQL master/slave mode means that data can be replicated from a MySQL master node to one or more slave nodes. MySQL uses asynchronous replication by default, so the slaves do not have to stay permanently connected to the master in order to update themselves. A slave can replicate all databases on the master, or specific databases, or specific tables.

  • Primary/secondary replication purposes:
    • Real-time DISASTER Recovery for failover (High Availability)
    • Read/write separation, providing query service (read extension)
    • Data backup to avoid service impact (high availability)
  • Prerequisites for master/slave deployment:
    • The slave library server can communicate with the master library
    • Enable binlog logging for primary library (set log-bin parameter)
    • The primary and secondary server ids are different
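Once the prerequisites are met, wiring a slave to the master looks roughly like this (host, user, password, log file, and position are hypothetical placeholders that must come from the master's SHOW MASTER STATUS):

-- run on the slave
CHANGE MASTER TO
  MASTER_HOST='192.168.1.10',
  MASTER_USER='repl',
  MASTER_PASSWORD='repl_password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=154;
START SLAVE;
SHOW SLAVE STATUS\G   -- Slave_IO_Running and Slave_SQL_Running should both be Yes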

“Implementation Principle”

  1. The master records database changes into the binlog file
    • After a write request, the master's Binlog Dump Thread reads the binlog and pushes it to the slave's I/O Thread.
  2. The slave reads the master's binlog and writes it into its own relay log
    • The slave's I/O Thread writes the binlog information it received into the local Relay Log.
  3. The slave reads the relay log and replays it, updating its own data
    • The slave's SQL Thread detects new content in the Relay Log, parses it, and executes it on the slave.

Master/Slave Replication Problem

  • Data may be lost after the primary library goes down
  • The secondary database has only one SQL Thread. The primary database is under heavy write pressure, and the replication is likely to be delayed

[Solution] :

  • Semi-synchronous replication – Solves the problem of data loss
  • Parallel replication — solves the slave replication delay problem

Semi-synchronous Replication

To improve data safety, MySQL makes the master wait, at a certain point in time, for an acknowledgement (ACK) from a slave node, and commit the transaction only after the ACK is received. This is the basis of semi-synchronous replication, introduced in MySQL 5.5 to reduce the probability of losing data.
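Semi-synchronous replication ships as a plugin; a rough sketch of enabling it follows (plugin file names are the defaults on Linux builds of MySQL 5.5-5.7):

-- on the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- ms to wait for a slave ACK before falling back to async

-- on each slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;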

Parallel Replication

MySQL 5.6+ implements MTS (enhanced multi-threaded slave). The slave originally has one I/O Thread and one SQL Thread, both working in single-threaded mode; when this causes replication delay, the multi-threaded slave mechanism can be used to reduce the slave's replication latency.

——MySQL 5.6 Parallel Replication ——

Only database-level (schema-level) parallel replication (pseudo parallel replication).

——MySQL 5.7 Parallel Replication ——

Parallel replication based on group commit. (Start true parallel replication)

Transactions are grouped, and when they commit the group is written to the binary log in one operation. If several transactions can commit together, it means they do not conflict and can therefore be executed in parallel on the slave; to support this, group commit information is added to the binary log on the master.

To stay compatible with MySQL 5.6's database-level parallel replication, 5.7 introduces a new variable slave_parallel_type, which can be set to: DATABASE (the default, database-level parallel replication) or LOGICAL_CLOCK (group-commit-based parallel replication).

——MySQL 8.0 parallel replication ——

Parallel replication based on write-set.

MySQL keeps a set variable that stores hashes of the rows (primary keys) changed by committed transactions. Each new transaction's changed-row hashes are checked against that set to determine whether its row changes conflict with earlier transactions and to work out dependencies. The granularity thus reaches the row level, so parallelism is finer-grained and replication is faster.

—— Parallel replication configuration and tuning ——

  • binlog_transaction_dependency_history_size: controls the size of the set variable described above.
  • binlog_transaction_dependency_tracking: controls how dependencies between transactions are recorded in the binlog, i.e. how the last_committed value is derived.
    • COMMIT_ORDER: based on the group commit mechanism
    • WRITESET: based on the writeset mechanism
    • WRITESET_SESSION: based on the writeset, with the last_committed values of transactions in the same session incremented in order
  • transaction_write_set_extraction: controls the hashing algorithm used for transaction conflict detection. The value can be OFF, XXHASH64, or MURMUR32.
  • master_info_repository: after enabling MTS, set master_info_repository to TABLE for a 50% to 80% performance improvement, because with parallel replication enabled the master.info metadata is updated far more often and contention for it grows.
  • slave_parallel_workers
    • If slave_parallel_workers is set to 0, MySQL 5.7 degenerates to the original single-threaded replication
    • If slave_parallel_workers is set to 1, the SQL thread acts as a coordinator thread, but there is still only one worker thread doing the replay, so it is also single-threaded replication; in fact 1 is worse than 0 because of the extra coordinator forwarding.
  • slave_preserve_commit_order: MTS in MySQL 5.7+ can parallelize at a finer granularity, which requires slave_parallel_type to be set to LOGICAL_CLOCK. LOGICAL_CLOCK alone is problematic, though, because transactions may be applied on the slave in an order different from the one recorded in the relay log, so data consistency cannot be guaranteed. To ensure transactions are replayed in relay-log order, enable slave_preserve_commit_order.

Enable enhanced multi-threaded Slave:

slave-parallel-type=LOGICAL_CLOCK 
slave-parallel-workers=16 
slave_pending_jobs_size_max = 2147483648 
slave_preserve_commit_order=1 
master_info_repository=TABLE 
relay_log_info_repository=TABLE 
relay_log_recovery=ON

—— Parallel replication monitoring ——

mysql> show tables like 'replication%'; 

You can check the worker threads' status through the replication_applier_status_by_worker table: select * from replication_applier_status_by_worker;

“Read-write separation”

The database is divided into master and slave libraries. One master library is used to write data, and multiple slave libraries are used to read data. Data is synchronized between the master and slave libraries through the master and slave replication mechanism

  • In the application, you can add multiple indexes to the secondary database to optimize the query. The indexes in the primary database can be omitted to improve the write efficiency.
  • Read/write separation architecture can also eliminate read/write lock conflicts to improve database read/write performance. Considerations for using the read-write separation architecture include: master-slave synchronization delays and read-write allocation mechanisms

Part 2 – Dual master mode

As the business grows, the database architecture can evolve from master/slave mode to dual-master mode. In dual-master mode the two servers are each other's master and slave: a data change on either server is replicated to the other server's database.

It is recommended to run dual-master with single-write, and to introduce high-availability components such as Keepalived or MMM to achieve automatic failover when the writing master fails.

“MMM Architecture”

MMM (Master-Master Replication Manager for MySQL) is a third-party tool set, written in Perl, that manages and monitors dual-master replication and supports failover between the two masters. Although the architecture has two masters, MMM allows only one node to accept writes at any time. It can be used to build a dual-master high-availability architecture.

  • MMM failure handling mechanism: MMM defines writer and reader roles, corresponding to the write node and read nodes respectively.

    In addition to managing the two master nodes, MMM also manages the Slave nodes. In case of downtime, replication delay, or replication errors, MMM removes the VIPs from the node until the node is restored.
    • When a Writer node becomes faulty, the program automatically removes the VIP from the node
    • Switch to Master2 and set Master2 as Writer
    • All Slave nodes will point to Master2
  • MMM monitoring mechanism: MMM consists of a monitor program and agent programs, with the following roles:
    • Monitor: monitors the state of the databases in the cluster and issues switchover commands when an anomaly occurs; it is deployed separately from the databases.
    • Agent: an agent process running on each MySQL server; it executes the monitor's commands and performs the concrete monitoring probes and service settings, such as configuring the VIP (virtual IP) and pointing replication at the new synchronization node.

“MHA Architecture”

MHA (Master High Availability) is a mature MySQL high-availability solution and an excellent piece of failover and master-promotion software. During a MySQL failover, MHA can complete the database failover automatically within 30 seconds, and it preserves data consistency as far as possible during the switch, achieving true high availability. MHA also supports switching the master to another host online, typically within 0.5 to 2 seconds.

! To set up MHA, the replication cluster must contain at least three database servers.

  • MHA composition: MHA Manager (management Node) and MHA Node (data Node).
    1. The MHA Manager can be deployed on an independent machine to manage multiple master-Slave clusters or on a slave node. Responsible for checking whether master is down, controlling failover, and checking MySQL replication status.
    2. The MHA Node runs on each MySQL server. Both the Master and Slave roles are called nodes and are the objects being monitored and managed. It saves and copies the master's binary logs, identifies and applies the differential relay log events to the other slaves, and purges relay logs.
    • The MHA Manager periodically probes the master nodes in the cluster. When the master fails, it can automatically promote the latest slave to the new master and then repoint all other slaves to the new master. The whole failover process is completely transparent to the application.
  • MHA failover procedure:
    • Save the crashed master's binlog
    • Identify the slave with the most recent relay log (binlog position)
    • Apply the differential relay log from that latest slave to the other slaves
    • Apply the saved binlog on the latest slave
    • Promote the latest slave to be the new master
    • Repoint the other slaves to the newly promoted master and re-enable master/slave replication
  • MHA advantages:
    • Automatic failover is fast
    • There are no data consistency issues with primary library crashes
    • Provides excellent performance and supports semi-synchronous and asynchronous replication
    • A Manager monitoring node can monitor multiple clusters