MySQL FAQ and Answers


All kinds of trees

  • A B-tree (a multi-way search tree, not a binary tree) can match keys on both leaf and non-leaf nodes
  • A B+ tree (the structure InnoDB uses) matches keys only in leaf nodes; all data is stored in leaf nodes
  • A B* tree differs from a B+ tree by adding linked-list pointers between non-leaf nodes
  • In a search tree (binary or multi-way), values in a node's left subtree are less than the node's value, and values in the right subtree are greater
  • In a balanced tree (binary or multi-way), the height difference between the left and right subtrees of any node is at most 1
  • A balanced binary search tree combines the properties of a search tree and a balanced tree
  • Red-black tree

All kinds of keys:

  • Super key: a set of attributes that uniquely identifies a tuple in a relation is called a super key of the relation

    Candidate key: a super key with no redundant attributes is called a candidate key. If you delete any attribute from a candidate key, it is no longer a key!

    Primary key: the candidate key chosen by the designer to identify tuples

    Foreign key: if attribute K in relational schema R is the primary key of another schema, then K is called a foreign key of schema R.

Log:

1. Redo log

Purpose: to guarantee transaction durability when a crash occurs before dirty pages are written to disk. On restart, the database replays the redo log to restore committed changes and reach a consistent state.

2. Undo log (rollback log)

Purpose: to guarantee atomicity by recording the pre-transaction version of the data, used for rollback.

InnoDB implements the REPEATABLE READ and READ COMMITTED isolation levels through MVCC plus the undo log.

3. Error log

Purpose: records errors that occur while MySQL is starting, stopping, or running.

4. Slow query log

Purpose: records SQL statements whose execution time exceeds a configurable threshold; only such statements are recorded.
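For reference, a minimal sketch of turning it on at runtime (these are standard MySQL system variables; the 1-second threshold is an assumed example value):

-- enable the slow query log and record statements slower than 1 second
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;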

5. Binlog (binary log)

Purpose: used for master/slave replication, to keep the slave in sync with the master

6. Relay log

Purpose: used for master/slave synchronization; the binlog sent by the master is saved locally in the relay log and then replayed by the slave

7. General log

Purpose: records the details of database operations. Disabled by default, because enabling it reduces database performance

Some query tips:

  • For high-frequency queries, build a composite index so the query is satisfied by a covering index, avoiding a lookup back to the table (see the sketch after this list).
  • For lower-frequency queries, reuse existing composite indexes via the leftmost-prefix principle.
  • Index condition pushdown, introduced in MySQL 5.6, reduces the number of back-to-table lookups.
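A minimal covering-index sketch (the orders table and its columns are assumed for illustration):

-- the composite index holds both queried columns, so InnoDB can answer
-- from the index alone; EXPLAIN shows "Using index" and no back-to-table read
ALTER TABLE orders ADD INDEX idx_user_status (user_id, status);
SELECT user_id, status FROM orders WHERE user_id = 42;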

Deadlock detection and handling:

MySQL handles deadlocks in two ways:

1. Wait until a timeout occurs and the transaction is automatically rolled back.

2. Initiate deadlock detection to roll back one transaction and allow other transactions to execute.

Deadlock detection builds a directed graph with transactions as nodes and lock waits as edges, then checks whether the graph contains a cycle.

Drop vs. truncate vs. delete:

​ truncate = drop + create

DROP removes the table entirely, releasing the space it occupied

TRUNCATE deletes all data in the table; the auto-increment ID restarts from 1 when new rows are inserted (the space occupied by the data is reclaimed), and the table definition is kept

DELETE removes rows from the table; each deletion is written to the log so it can be rolled back. Space is not released and the table definition is kept

Deletion speed: DROP > TRUNCATE > DELETE
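Side by side (t is a hypothetical table name):

DELETE FROM t WHERE id > 100;  -- row-by-row, logged, can be rolled back
TRUNCATE TABLE t;              -- removes all rows, resets AUTO_INCREMENT, keeps the definition
DROP TABLE t;                  -- removes the data and the table definition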

If the indexed value is NULL, is the index unusable?

Whether a query on NULL index values uses the index depends on the optimizer's cost estimate.


Common MySQL interview questions:

1. How is an SQL statement executed in MySQL?

MySQL is mainly divided into a Server layer and a storage engine layer:

  • Server layer: includes the connector, query cache, analyzer, optimizer, executor, etc. All cross-storage-engine functionality is implemented in this layer, such as stored procedures, triggers, views, and functions, as well as the general log module, the binlog
  • Storage engine layer: responsible for storing and reading data. It uses a pluggable architecture and supports multiple storage engines such as InnoDB, MyISAM, and Memory. InnoDB has its own log module, the redo log, and has been the default storage engine since MySQL 5.5.5.

Description of each component:

  • Connector: identity authentication (login) and permission verification
  • Query cache: a query first checks the cache (removed in MySQL 8.0 because the feature proved impractical)
  • Parser: on a cache miss, the SQL statement goes to the parser for lexical analysis (breaking the statement into tokens) and syntax analysis (checking that the syntax is correct)
  • Optimizer: produces the execution plan MySQL considers optimal (which may not actually be optimal)
  • Executor: executes the statement and returns data from the storage engine

Conclusion:

  • Query execution: permission check -> query cache -> parser -> optimizer -> permission check -> executor -> engine
  • Update execution (two-phase commit): redo log prepare -> write binlog -> redo log commit

The execution of an SQL statement

2. Database normal forms

Normal forms:

1NF: every attribute of the relation is atomic and cannot be split further; each column holds a single value.

2NF: if relational schema R is in 1NF and every non-prime attribute is fully (not partially) dependent on the candidate keys, then R is in second normal form.

3NF: if relational schema R is in 2NF and no non-prime attribute of R(U, F) is transitively dependent on any candidate key, then R is in third normal form.

BCNF: satisfies 3NF, and no prime attribute depends on another prime attribute (every determinant is a candidate key)

3. What are the four properties (ACID) of a transaction?

1. A, Atomicity: a transaction is the smallest unit of execution and cannot be split. Atomicity ensures the actions either all complete or none take effect;

2. C, Consistency: data remains consistent before and after a transaction executes; multiple transactions reading the same data get the same result;

3. I, Isolation: when the database is accessed concurrently, one user's transaction is not disturbed by other transactions; concurrent transactions are independent of each other;

4. D, Durability: once a transaction is committed, its changes to the database are permanent and are not affected by database failures.

4. Transaction isolation levels. What concurrency problems do the four isolation levels have?

Isolation level:

  • READ-UNCOMMITTED: the lowest isolation level; allows reading uncommitted data changes, which may lead to dirty reads, non-repeatable reads, and phantom reads

  • READ-COMMITTED: allows reading only data committed by concurrent transactions; prevents dirty reads, but non-repeatable reads and phantom reads can still occur

  • REPEATABLE-READ: multiple reads of the same field within a transaction are consistent, unless the transaction itself modifies the data. Prevents dirty reads and non-repeatable reads, but phantom reads are still possible.

  • SERIALIZABLE: the highest isolation level, fully meeting the ACID isolation requirements. All transactions execute one by one, so interference between transactions is impossible; this level prevents dirty reads, non-repeatable reads, and phantom reads.
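To inspect or change the level (real MySQL syntax; the variable is transaction_isolation in MySQL 8.0, tx_isolation in older versions):

SELECT @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;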


    Problems at each level:

  • Dirty read: while one transaction is accessing data and making changes that have not yet been committed, another transaction also accesses that data and uses it. Because the data has not been committed, the second transaction has read "dirty data" and may do the wrong thing based on it.

  • Lost modification: one transaction reads data while another transaction also accesses it; after the first transaction modifies the data, the second transaction modifies it too, so the first transaction's modification is lost. Hence the name "lost modification".

  • Non-repeatable read: between two reads inside one transaction, a second transaction modifies and commits the data, so the two reads within the same transaction return different results. Hence "non-repeatable read".

  • Phantom read: similar to a non-repeatable read. It occurs when one transaction (T1) reads several rows of data, and another concurrent transaction (T2) then inserts some rows. In subsequent queries, T1 finds records that did not exist before, as if an illusion had occurred. Hence "phantom read".

    The difference between non-repeatable reads and phantom reads: non-repeatable reads focus on modification, while phantom reads focus on insertion or deletion

5. What is MySQL's default isolation level, and how does it ensure concurrency safety?

MySQL InnoDB uses REPEATABLE READ by default and uses the next-key lock algorithm to avoid phantom reads. This differs from other database systems (such as SQL Server): the isolation requirements of transactions are fully met, effectively reaching the SQL standard's SERIALIZABLE level. Under distributed transactions, however, the SERIALIZABLE isolation level is generally used.

Three concurrency control mechanisms:

1. Pessimistic concurrency control: the most common concurrency control mechanism, namely locking

2. Optimistic concurrency control: also known by another name, optimistic locking

3. Multi-version concurrency control: MVCC can be combined with either of the previous two mechanisms to improve database read performance

6. Describe the InnoDB lock mechanisms: row locks, table locks, record locks, gap locks, and intention locks

Optimistic locking: assumes by default that no other transaction will modify the data while it is being used, so it reads the data first and only at update time checks whether another transaction modified it in the meantime. It is not built into the database; we implement it ourselves, generally with a version number.

In the common table-based scheme, a "version" field is added to the table. The version number is read together with the data, and it is incremented on each update. When submitting an update, the version number that was read is compared with the table's current version: the update is applied only if they still match; otherwise the submitted data is considered stale.
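A minimal sketch of version-number optimistic locking (the account table, columns, and values are assumed for illustration):

-- read the row together with its version
SELECT id, balance, version FROM account WHERE id = 1;   -- suppose version = 3
-- ... compute the new balance in application code ...
-- update only if no one else changed the row in the meantime
UPDATE account
SET balance = 900, version = version + 1
WHERE id = 1 AND version = 3;
-- 0 affected rows means another transaction got there first: retry or abort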

Pessimistic locking:

By lock granularity, database locks are divided into table-level locks and row-level locks.

  • Table-level lock: locks the entire table being operated on. Simple implementation, fast locking, no deadlocks, but low concurrency.
  • Row-level lock: locks only the current row. Row-level locking greatly reduces conflicts in database operations. It has the smallest granularity and the highest concurrency, but also the highest locking overhead; locking is slower, and deadlocks can occur.

Record lock: locks a single row. Record locks are placed on the index: if the table has an index, the record lock locks the index entry; if the table has no index, InnoDB creates a hidden clustered index and locks that.

Gap lock: a lock on the gap between index records, or on the gap before the first or after the last index record. A gap may span a single row, multiple rows, or no rows at all. It locks the range between index entries, excluding the entries themselves, so other transactions cannot insert data within the locked range. This prevents other transactions from adding phantom rows.

Next-key lock: the combination of a record lock and a gap lock; it locks the index entry itself plus the gap before it, i.e., both the row and the range. It solves the phantom read problem.

By exclusivity, locks can also be divided into shared locks and exclusive locks.

Shared lock (S): also known as a read lock. Other users can read the data concurrently, but no transaction can acquire an exclusive lock on the data until all shared locks on it are released.

Exclusive lock (X): also known as a write lock. If transaction T locks data A with an X lock, only T is allowed to read and modify A; no other transaction can lock A in any mode until T releases the lock.

InnoDB supports both row locks and table locks, and the two can conflict. For example, if user A holds a shared row lock and user B requests an exclusive table lock, B would have to check not only for other table locks but also, row by row, for existing row locks, which is inefficient. So intention locks were introduced. An intention lock is a table-level lock that indicates what type of lock (shared or exclusive) a transaction is about to acquire on rows of the table. Intention locks come in two kinds, intention shared (IS) and intention exclusive (IX).

Intention shared lock (IS): a transaction intending to place shared locks on rows must first acquire an IS lock on the table.

Intention exclusive lock (IX): a transaction intending to place exclusive locks on rows must first acquire an IX lock on the table.

Intention locks are table-level locks that indicate which type of lock (shared or exclusive) a transaction will later request on rows of the table.

With intention locks in place, transaction A first acquires the intention lock and then the row lock. When transaction B requests a table-level write lock, it only needs to see the intention shared lock on the table to know that some rows are held by shared row locks, so its write-lock request blocks without scanning every row. Intention locks are acquired automatically by the database; we never request them manually.
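The row-lock statements that implicitly take these intention locks (standard MySQL syntax; the account table is assumed):

-- shared (S) row lock; the table automatically gets an IS lock first
SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;
-- exclusive (X) row lock; the table automatically gets an IX lock first
SELECT * FROM account WHERE id = 1 FOR UPDATE;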

7. Introduce MVCC

There is no fixed specification for the implementation of MVCC, and each database will have a different implementation

MVCC (Multiversion Concurrency Control): a technique for improving concurrency. In the earliest database systems, only read-read could run concurrently; read-write, write-read, and write-write all blocked each other. With multiple versions, only write-write blocks; the other three combinations can run in parallel, greatly increasing InnoDB's concurrency.

Each write operation creates a new version of the data, and a read operation selects the most appropriate result from the finite set of versions. Conflicts between read and write operations are then no longer a concern; managing versions and quickly picking the right one becomes MVCC's main problem.

The implementation of MVCC in each database is not uniform. MVCC only works at READ COMMITTED and REPEATABLE READ isolation levels.

For tables using the InnoDB storage engine, each clustered index record contains two required hidden columns (trx_id, the transaction ID, and roll_pointer, a pointer to the previous version; there is also a hidden row_id column, not relevant here);

Each time the record is changed, the changing transaction's ID is written to the trx_id hidden column, and the old version is written to the undo log.

So under concurrency a record may have multiple versions, linked into a version chain by roll_pointer. The core task of MVCC is to decide which version in the chain is visible to the current transaction. This is where the ReadView comes in: a ReadView records the read/write transactions currently active in the system, putting their transaction IDs in a list we will call m_ids. Comparing the version chain's transaction IDs against the ReadView's list of active transaction IDs determines the newest version visible to the current transaction:

1. If the version's trx_id is less than the minimum transaction ID in m_ids, the transaction that created this version committed before the ReadView was generated, so the version is visible to the current transaction.

2. If the version's trx_id is greater than the maximum transaction ID in m_ids, the transaction that created this version started after the ReadView was generated, so the version is not visible to the current transaction.

3. If the version's trx_id lies between the minimum and maximum transaction IDs in m_ids, check whether trx_id appears in the m_ids list. If it does, the transaction that created this version was still active when the ReadView was generated, so the version is not visible; if it does not, that transaction had already committed, so the version is visible.

MVCC operates only under the READ COMMITTED and REPEATABLE READ isolation levels. The difference between the two implementations is that READ COMMITTED generates a new ReadView for every read, while REPEATABLE READ generates a ReadView on the first read and reuses it for all subsequent queries.

Conclusion:

Multi-version concurrency control means that transactions at the READ COMMITTED and REPEATABLE READ isolation levels access the record's version chain when executing ordinary SELECT operations. This lets read-write and read-read operations of different transactions run concurrently, improving system performance.

One big difference between the READ COMMITTED and REPEATABLE READ isolation levels is when the ReadView is generated: READ COMMITTED generates a ReadView before every ordinary SELECT, while REPEATABLE READ generates one only before the first ordinary SELECT and reuses that ReadView for all subsequent queries.
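A two-session sketch of the difference (the account table and values are assumed; run the sessions side by side):

-- session A
START TRANSACTION;
SELECT balance FROM account WHERE id = 1;   -- reads 100

-- session B
UPDATE account SET balance = 200 WHERE id = 1;
COMMIT;

-- session A again
SELECT balance FROM account WHERE id = 1;
-- READ COMMITTED: sees 200 (a fresh ReadView is generated for this select)
-- REPEATABLE READ: still sees 100 (the ReadView from the first select is reused)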

At the RR level:

Snapshot reads are implemented with MVCC and the undo log.

Current reads are implemented with next-key locks (record locks plus gap locks).

8. The difference between snapshot read and current read

The "read" in MySQL is not the same concept as the "read" in the transaction isolation levels.

At the RR level, the MVCC mechanism makes reads repeatable, but the data we read may be historical, not the database's current data! This can be a problem for business logic that is particularly sensitive to data freshness.

This method of reading historical data is called snapshot read, and the method of reading the current version of the database is called current read. Obviously, in MVCC:

  • Snapshot read:
    • select * from table
  • Current read: special read operations plus insert, update, and delete; these operate on the current data and must take locks
    • select * from table where ? lock in share mode
    • select * from table where ? for update
    • insert
    • update
    • delete

The isolation level of a transaction actually defines the level of the current read. To reduce lock-holding time (including waits for other locks) and improve concurrency, MySQL introduced the concept of snapshot reads, so that a plain select needs no locks. Update and insert are handled as current reads by separate modules.

MVCC alone does not solve the phantom read problem; next-key locks are designed to solve it. At the REPEATABLE READ isolation level, phantom reads are prevented using MVCC plus next-key locks

9. Usage scenarios of the InnoDB and MyISAM storage engines

1. MyISAM does not support transactions; InnoDB does.

2. MyISAM supports only table-level locking; InnoDB supports both row-level and table-level locking (row-level by default).

3. Foreign key support: MyISAM tables do not support foreign keys; InnoDB tables do.

4. MyISAM caches the table's total row count; InnoDB does not.

Respective usage scenarios:

MyISAM suits: (1) lots of count calculations; (2) read-intensive workloads; (3) no transactions needed.

InnoDB suits: (1) transactions required; (2) write-intensive workloads; (3) high concurrency.

10. Differences between clustered index and non-clustered index

Clustered index: the index order is the physical storage order of the data;

Non-clustered index: the index order is unrelated to the physical order of the data;

Differences:

  • Clustered index (in InnoDB):

    • The leaf nodes of the primary key index store entire rows of data. In InnoDB, the primary key index is also called the clustered index.
    • The leaf nodes of non-primary-key indexes store the primary key value. In InnoDB, non-primary-key indexes are also called secondary indexes.
  • Non-clustered index: the content of the leaf node is the address of the data

11. Rules for creating indexes, notes on using indexes, and avoiding full table scans

When to create indexes:

1. The primary key and foreign keys of a table should have indexes;

2. Tables with more than 300 rows of data should have indexes;

3. Create indexes on the join columns (foreign keys) of tables that are frequently joined with other tables;

4. Build indexes on columns with high selectivity;

5. Do not create too many indexes on tables with frequent data changes;

6. Index columns that frequently appear as query conditions in WHERE or ORDER BY clauses;

7. Create indexes on sorted columns (a bare ORDER BY may not use an index, but combined with a WHERE it can);

8. Prefer composite indexes under high concurrency;

9. Columns used in aggregate functions can be indexed; for example, index column_1 when using MAX(column_1) or COUNT(column_1);

10. Use short indexes: when indexing strings, specify a prefix length where possible (see the selectivity formula in High Performance MySQL and the sketch below).
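A sketch of the prefix-index idea (the users table and the 10-character prefix are assumed; pick the shortest prefix whose selectivity approaches the full column's):

-- compare prefix selectivity against full-column selectivity
SELECT COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS prefix_selectivity,
       COUNT(DISTINCT email) / COUNT(*)           AS full_selectivity
FROM users;
-- if the two are close, index only the prefix to keep the index short
ALTER TABLE users ADD INDEX idx_email_prefix (email(10));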

When not to use indexes?

1. Do not create indexes on columns that are frequently inserted, deleted, or modified;

2. Do not create indexes on columns with many duplicate values;

3. Do not create indexes on tables with too few records;

4. Build indexes on small fields; do not index large text fields or very long fields.

Index usage notes:

1. Avoid applying a function to a column in the WHERE clause; doing so makes the index unusable.

2. With InnoDB, use a business-independent auto-increment column as the primary key, i.e., a logical primary key rather than a business key.

3. Define indexed columns as NOT NULL, because NULL requires more storage space than an empty string.

In MySQL 5.7 you can query the schema_unused_indexes view in the sys schema to find indexes that are never used.
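The query itself (this view ships with the sys schema in MySQL 5.7+):

SELECT * FROM sys.schema_unused_indexes;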

Index optimization:

1. Following the leftmost-prefix principle of composite indexes, place the columns most frequently used for sorting and grouping leftmost.

2. LIKE queries with a leading % cannot use a normal index; only a full-text index can optimize them.

3. Use short indexes: when indexing strings, specify a prefix length where possible.

4. When a query mixes inequality (range) and equality conditions, put the equality-condition columns first when building the index. A range column can use the index (the composite index must follow the leftmost prefix), but columns after the range column cannot use the index. At most one range column can use the index; if the query condition contains two range columns, the index cannot serve both.

Avoid full table scan:

1. Do not connect conditions with OR; use UNION ALL instead;

2. Prefer BETWEEN ... AND for continuous ranges;

3. Avoid NULL values in filtered columns;

4. Avoid the != and <> operators in the WHERE clause;

5. Do not apply expressions or functions to columns in the WHERE clause;

6. Do not use LIKE queries that begin with a wildcard;

7. Do not use SELECT * anywhere;

8. Use numeric fields whenever possible;

9. Make composite indexes satisfy the leftmost-prefix principle as far as possible;

10. Use LIMIT 1 when fetching a single row. Also note that type mismatches invalidate indexes: for example, not quoting a string causes the index to be skipped (see the sketch below).
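A few index-invalidation patterns in one place (the users table and its indexed VARCHAR phone column are assumed):

SELECT * FROM users WHERE phone = 13800000000;     -- implicit cast: index skipped
SELECT * FROM users WHERE phone = '13800000000';   -- index used
SELECT * FROM users WHERE LEFT(phone, 3) = '138';  -- function on column: index skipped
SELECT * FROM users WHERE phone LIKE '138%';       -- leading prefix: index can be used
SELECT * FROM users WHERE phone LIKE '%138';       -- leading wildcard: index skipped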

Recommended specifications for Mysql high-performance optimization

30 Tips for writing high-quality SQL

12. Index advantages and disadvantages, index types, index data structures, hash index implementation

Index advantages and disadvantages:

(1) Advantages:

(a) Greatly speeds up data retrieval (the main reason)

(b) Unique indexes guarantee the uniqueness of each row's data, etc.

(2) Disadvantages:

(a) The index must be maintained dynamically whenever data is added, deleted, or modified

(b) Indexes take up space

(c) Creating and maintaining indexes takes time

Index types: primary key index, unique index, full-text index, normal index, composite index.

What are the two main data structures used by MySQL indexes?

  • Hash index (adaptive hash): the underlying data structure is a hash table. When most queries look up single records, a hash index gives the fastest query performance. In most other scenarios, the BTree index is recommended.
  • BTree index: MySQL's BTree index is actually implemented as a B+Tree, but the two main storage engines (MyISAM and InnoDB) implement it differently.

A hash index applies a hash function to the key value; a single hash computation immediately locates the corresponding position, which is very fast.

Disadvantages of hash indexes:

1. Cannot be used for range queries;

2. Cannot use the index data to avoid sort operations;

3. Does not support leftmost-prefix matching for multi-column composite indexes;

4. Cannot always avoid table scans;

5. Suffers from hash collisions.

13. Why use a B+ tree index? What advantages does it have over a B-tree? Why not a red-black tree? (Mention disk prefetching)

B-tree: keys can be matched on both leaf and non-leaf nodes

B+ tree: only leaf nodes store data; the other nodes serve purely as an index. The tree is balanced and performance is stable: every query descends the height of the tree. Non-leaf nodes store only key values; data records are stored in leaf nodes, and all leaf nodes are connected by chain pointers.

An index is a data structure. Indexes themselves are too large to keep in memory, so they are stored on disk in index files. Disk I/O therefore costs several orders of magnitude more than memory access, and the most important metric for an index structure is the asymptotic number of disk I/O operations during a lookup. In other words, the index should be structured to minimize the disk I/O needed per lookup.

Advantages of B+ trees:

1. The internal nodes of a B+ tree store no data, so a disk page of the same size can hold more keys, meaning fewer I/O operations.

2. A B+ tree query always descends to a leaf node, whereas a B-tree query can stop at any matching element, so B+ tree query performance is stable.

3. Range queries are convenient. A B-tree must rely on tedious in-order traversal, whereas a B+ tree only needs to walk the leaf-node linked list.

A full table scan is O(n). A balanced binary tree or a red-black tree brings search down to O(log n), but they are still poor index structures. An index is large and lives on disk; it cannot all be loaded into memory at once, and only one page can be read from disk at a time. In a binary tree, logically adjacent nodes may be physically far apart on disk, so the number of disk I/Os can be large and the structure cannot exploit disk prefetching. Disks do not read strictly on demand: even if only one byte is needed, the disk reads a fixed-length run of data starting at that position into memory. The theory behind this is the famous principle of locality in computer science. In a red-black tree the height h is much larger, and because logically close nodes (parent and child) may be physically distant, locality cannot be exploited; the I/O complexity is O(h), so a red-black tree is significantly less efficient than a B-tree.

Each node of a B-tree can store multiple keys. Setting the node size to the disk page size makes full use of disk prefetching: each disk page read brings in an entire node. And because each node stores many keys, the depth of the tree is very small, so there are fewer disk reads and more of the lookup happens in memory.

In a B+ tree, the non-leaf nodes store only keys and serve purely as an index; each leaf node carries a pointer to the next leaf node. This optimization improves range-access performance, and it is this property that makes B+ trees especially suitable for storing external (disk) data.

14. Back-to-table queries and covering indexes

Back-to-table query: B+ tree indexes come in two kinds, primary key indexes and secondary indexes.

Primary key index: a B+ tree built in primary key order whose leaf nodes store the row records. Each table can have only one primary key index.

Secondary index: the leaf nodes do not store row records, only the primary key value. The secondary index yields the primary key, and the row record is then fetched from the clustered index using that primary key value.

Covering index: if an index contains (covers) the values of all the fields a query needs, it is called a "covering index". In the InnoDB storage engine, the leaf node of a non-primary-key index stores the primary key plus the indexed column values, so normally you must "go back to the table", i.e., do a second lookup by primary key, which is slower. With a covering index, the queried columns correspond exactly to the index, so no back-to-table operation is needed! InnoDB supports covering indexes: the queried records can be retrieved from the secondary index alone, without fetching records from the clustered index. The SQL can return all the data the query needs straight from the index, instead of looking up the primary key via the secondary index and then querying the data.

In other words: the selected columns come directly from the index.

An index is an efficient way to find rows. When the desired data can be retrieved from the index itself, there is no need to read the rows from the table. An index that contains (covers) the fields and conditions of the query statement is called a covering index.

A third way to put it: a covering index is a non-clustered composite index that contains all the columns used by the SELECT, JOIN, and WHERE clauses of the query (i.e., the indexed fields cover the fields used by the query and its conditions).
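A quick way to confirm a covering index is in play (table t and its columns are assumed):

ALTER TABLE t ADD INDEX idx_a_b (a, b);
EXPLAIN SELECT a, b FROM t WHERE a = 1;
-- the Extra column shows "Using index": answered from the index, no back-to-table lookup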

15. The difference between in and exists


Usage: the subquery after IN can return only one column; EXISTS has no such restriction.

For "A exists B": EXISTS loops over outer table A and checks, for each row, whether a match exists in B. IN expands result set B and connects its values with OR, which amounts to performing multiple lookups.

EXISTS acts as a per-row filter, while IN acts as multiple queries

1. If the two tables are about the same size, there is little difference between IN and EXISTS.

2. If one table is large and the other small, IN suits a large outer table with a small subquery table.

3. If one table is large and the other small, EXISTS suits a large subquery table with a small outer table.

Note: IN may not use an index lookup and can degrade to a full table scan.
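The two shapes side by side (the orders and users tables are assumed; here users is small, so IN's materialized list stays cheap):

-- EXISTS: loop over the outer table, probe the subquery per row
SELECT * FROM orders o
WHERE EXISTS (SELECT 1 FROM users u WHERE u.id = o.user_id);

-- IN: evaluate the subquery once, then match against the value list
SELECT * FROM orders o
WHERE o.user_id IN (SELECT id FROM users);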

16. Slow query causes and solutions

There are two types of slow query:

1. In most cases the statement is normal; it just occasionally runs very slowly.

2. The SQL statement has always run slowly, even though the data volume has not changed.

The first kind: occasionally slow

1. The database is flushing dirty pages

When we insert, update, or modify data, the database first updates the corresponding fields in memory, but those updates are not immediately persisted to disk. Instead they are recorded in the redo log; when necessary (the system is idle, or memory runs short), the latest records are flushed to disk via the redo log.

When the contents of a page in memory and on disk differ, the memory page is called a "dirty page". After the memory data is written to disk so the two copies match, the page is called a "clean page".

Dirty pages are flushed in the following four cases:

1. The redo log is full: the redo log has limited capacity. If the database is very busy and updates frequently, the redo log fills up quickly; MySQL then cannot wait for idle time and must pause to synchronize the changes to disk. While that synchronization runs, ordinary SQL statements suddenly execute slowly.

2. Memory is insufficient: if a query needs data pages that are not in memory while memory is already full, some memory pages must be evicted. Clean pages can simply be freed, but a dirty page must first be flushed to disk.

3. When MySQL considers the system "idle": flushing happens when the system is under no load.

4. When MySQL shuts down normally: it flushes all dirty pages to disk so that the next startup can read data directly from disk.

2. Waiting for locks

The statement we want to execute touches a table that someone else is using and has locked; we cannot get the lock and can only wait for it to be released. Or the table itself is not locked, but a row we need is locked.

Second: always slow

1. No usable index (an SQL writing problem)

For example, the column has no index, or the index cannot be used because of operations or function calls on the column.

  • 1. The queried column has no index

    Without an index on the queried column, only a full table scan is possible

  • 2. The queried column has an index, but it is not used

    Check the SQL syntax and whether it follows the basic rules for index use (e.g., no operations on the indexed column)

  • 3. The index is not used because of a function operation on the column

2. The database chose the wrong index (an optimizer problem)

The optimizer judges by index selectivity: the more distinct values an index has, the fewer duplicates, and the higher its selectivity. This measure is also called the cardinality: the higher the selectivity, the larger the cardinality.

Of course, the system does not scan all the data to obtain an index's cardinality; that would be too expensive. Instead it estimates the cardinality by sampling part of the data.

The predicted number of scanned rows is only one factor in whether the optimizer picks an index; whether the query needs temporary tables and whether it sorts also affect the choice.

You can force a query to use a specific index:

select * from t force index(a) where c>100 and c<10000

Check whether the index's recorded cardinality matches the actual data:

show index from t

To recalculate the index cardinality, use this command:

analyze table t

Geek Time, MySQL 45 Lectures, Lecture 19

What are the causes of slow execution of an SQL statement

17. Explain the meaning of each field in the EXPLAIN output

The fields of EXPLAIN:

1. id: the sequence number of the SELECT;

2. select_type: the type of each SELECT clause in the query:

1. SIMPLE (a simple SELECT, using no UNION or subqueries)

2. PRIMARY (the outermost SELECT when the query contains subqueries)

3. UNION (the second or later SELECT statement in a UNION)

4. DEPENDENT UNION (the second or later SELECT in a UNION, dependent on the outer query)

5. UNION RESULT (the result of the UNION)

6. SUBQUERY (the first SELECT in a subquery whose result does not depend on the outer query)

7. DEPENDENT SUBQUERY (the first SELECT in a subquery, dependent on the outer query)

8. DERIVED (a SELECT for a derived table, i.e., a subquery in the FROM clause)

9. UNCACHEABLE SUBQUERY (a subquery whose result cannot be cached and must be re-evaluated for each row of the outer query)

3. table: the name of the table accessed in this step

4. type: the table access method, indicating how MySQL finds the required rows in the table:

ALL, index, range, ref, eq_ref, const, system, NULL (from left to right, worst to best performance)

1. ALL: full table scan.

2. index: full index scan.

3. range: retrieves only rows in a given range, using an index to select the rows.

4. ref: the join match condition for the table, i.e., which columns or constants are used to look up values on the index.

5. eq_ref: like ref, except the index used is unique; for each index key value, only one record in the table matches.

6. const, system: used when MySQL can optimize part of the query into a constant, e.g., when the primary key appears in the WHERE list. system is a special case of const, used when the queried table has only one row.

7. NULL: MySQL resolves the statement during optimization without even accessing a table or index, e.g., selecting the minimum value from an index column via a single index lookup.

5. possible_keys: the indexes MySQL could use to find records in this table. If an index exists on a column involved in the query, it is listed, but it is not necessarily used.

6. key: the index MySQL actually decided to use; if set, it must appear in possible_keys.

7. key_len: the number of bytes of the index that are used; this can be used to compute how much of a composite index the query uses.

8. ref: the columns or constants compared against the index, i.e., the join match condition for the table.

9. rows: the estimated number of rows MySQL must read to find the required records, based on table statistics and index selection.

10. Extra: additional execution information:

1. Using where: the server applies the WHERE filter to rows after the storage engine returns them

2. Using temporary: MySQL needs a temporary table to hold the result set; common for queries that sort and group (GROUP BY, ORDER BY)

3. Using filesort: the ORDER BY cannot be satisfied by an index, so MySQL must sort the rows itself

4. Using join buffer: no index is used for the join condition, so the join buffer stores intermediate results. If this value appears, consider adding indexes to improve performance.

5. Impossible WHERE: the WHERE clause can never be true, so no rows can qualify (determined from collected statistics)

6. Select tables optimized away: the optimizer can return a single row from an aggregate function result using the index alone

7. No tables used: the query uses FROM DUAL or contains no FROM clause
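A small EXPLAIN run to tie the fields together (the users/orders tables and the idx_user_id index are assumed):

EXPLAIN SELECT u.name
FROM users u JOIN orders o ON o.user_id = u.id
WHERE u.id = 10;
-- typical output: users accessed with type=const via the PRIMARY key;
-- orders accessed with type=ref via idx_user_id, with rows showing the estimate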

18. The leftmost prefix: how is a composite index laid out in a B+ tree? How does a composite index match when > appears in the WHERE clause, e.g. where a > 10 and b = "111"?

If the query condition matches one or more consecutive columns starting from the left of the index, the query can use the index. With a composite index on (a, b, c): (a, b) can hit the index, while (a, c) can use only the a part. A composite index is not all-or-nothing.

MySQL's query optimizer automatically reorders the conditions in the WHERE clause to use a suitable index, but it is still good practice to keep the order of fields after WHERE consistent with the composite index.

A composite index (on two or more columns) also builds a single B+ tree: entries are sorted by the first column, and entries with equal first-column values are sorted by the next column, and so on. A search must therefore constrain the first column before later columns can be used for index lookup. With where a > 10 and b = "111", the range condition on a uses the index, but b cannot be used for the index lookup and is checked afterwards.

Benefit of composite indexes: they enable covering indexes, avoiding back-to-table operations.

19. LEFT JOIN, RIGHT JOIN, INNER JOIN, OUTER JOIN

Inner join: returns only the rows that match in both tables

Left join: returns all rows from the left table; right-table columns are filled where a match exists, otherwise left blank (NULL)

Right join: returns all rows from the right table; left-table columns are filled where a match exists, otherwise left blank (NULL)

Outer join (full outer join): returns all rows from both tables, merging the rows the two tables have in common

[Specific cases](github.com/0voice/inte… Type of connection.md)

20. The MySQL master/slave replication process, binlog formats, and asynchronous/semi-synchronous/full-synchronous replication modes

The master receives an update request from the client, executes the internal transaction update logic, and writes the binlog.

Standby database B maintains a long connection with primary database A. Primary library A has an internal thread dedicated to serving this long connection for standby library B. The complete process of synchronizing a transaction log looks like this (a sketch of the commands appears after the list):

  1. Run the change master command on standby library B to set primary library A's IP, port, username, and password, plus the position from which to request the binlog. This position contains a file name and a log offset.
  2. Run the start slave command on standby library B. The standby then starts two threads, io_thread and sql_thread; io_thread is responsible for establishing the connection with the primary library.
  3. After verifying the username and password, primary library A reads the binlog from the position requested by standby library B and sends the log to B.
  4. Standby library B receives the binlog and writes it to a local file, called the relay log.
  5. sql_thread (possibly multi-threaded) reads the relay log, parses the commands in it, and executes them.
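The corresponding statements on the standby (real MySQL commands; host, user, and binlog coordinates below are placeholder values):

CHANGE MASTER TO
  MASTER_HOST = '10.0.0.1', MASTER_PORT = 3306,
  MASTER_USER = 'repl', MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000001', MASTER_LOG_POS = 4;
START SLAVE;
SHOW SLAVE STATUS\G  -- check Slave_IO_Running / Slave_SQL_Running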

In other words: the I/O thread on the slave connects to the master and requests log content from the specified position in the specified log file. When the master receives the request, its replication I/O thread reads the log from the requested position onward and returns it to the slave; besides the log content, the response includes the bin-log file name and bin-log position of the data returned. The slave's I/O thread writes the received log content to the local relay log and saves the binary log file name and position in master-info. When the slave's SQL thread detects new content in the relay log, it parses it into the actual operations to perform on the slave node and executes them against the local database.

MySQL primary/secondary replication is asynchronous by default:

1. Asynchronous replication: after executing a transaction committed by the client, the master immediately returns the result to the client, regardless of whether the slave has received and processed it. The master does not actively push the binlog to the slave;

2. Semi-synchronous mode: the master commits only after receiving an acknowledgment from at least one slave; if no acknowledgment arrives before the timeout, it falls back to asynchronous mode and then commits. The goal is to reduce master/slave data lag and improve data safety. Semi-synchronous mode is not built into MySQL; a plugin must be installed to enable it;

3. Full synchronous mode: the master returns success to the client only after the commit has been executed and confirmed on both the primary and the secondary nodes.

Binlog formats:

1. Statement-based replication: the SQL statements that modify data are recorded in the binlog. This reduces binlog volume, saves I/O, and improves performance, but in certain cases the data on master and slave may become inconsistent.

2. Row-based replication: the SQL statement is split into row-level changes recorded in the binlog, which records only which rows were modified and how.

Advantage: solves the cases where stored procedures, functions, or triggers cannot be replicated correctly. Disadvantage: the log volume is too large.

3. Mixed mode: uses the statement format where statements replicate safely, and the row format where they cannot.

21. Data inconsistency with master/slave replication and read/write splitting

Problems with read/write splitting

Reading a stale state of the system from the slave is termed a "stale read".

Forcing reads to the master: classify query requests and force those that must have the latest result to go to the master.

Sleep scheme: after updating the master, sleep for a while before reading the slave; roughly like executing a select sleep(1) command.

Problems with master/slave replication itself:

Data may be lost if the primary library crashes

The secondary library has only one SQL thread; if the primary is under heavy write pressure, replication is likely to lag

Solutions:

Semi-synchronous replication: solves the data-loss problem

Parallel replication: solves the replication-lag problem on the secondary (parallel means the secondary applies the binlog with multiple threads, parallelized at the database level; changes to the same database remain serial. MySQL 5.7 supports parallel replication based on transaction group commit)

22. Distributed ID generation

1. UUID: unsuitable as a primary key because it is too long, unordered, and unreadable, and query efficiency is low. Better used for generating unique names, such as file names.

2. Database auto-increment IDs: configure the auto-increment offsets and steps of two databases so that the IDs they generate never collide, achieving high availability (see the sketch below). The IDs are ordered, but this requires dedicated database instances, costs more, and has a performance bottleneck.
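The offset/step idea with real MySQL variables (the two-instance values are an assumed example):

-- on instance 1: generate 1, 3, 5, ...
SET GLOBAL auto_increment_offset = 1;
SET GLOBAL auto_increment_increment = 2;
-- on instance 2: generate 2, 4, 6, ...
SET GLOBAL auto_increment_offset = 2;
SET GLOBAL auto_increment_increment = 2;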

3. Redis-generated IDs: good performance, flexible and convenient, independent of the database. But introducing a new component makes the system more complex, lowers availability, complicates coding, and raises system cost.

4. Twitter's Snowflake algorithm: see GitHub.

5. Meituan's Leaf distributed ID generation system: Leaf is Meituan's open-source distributed ID generator, guaranteeing global uniqueness, increasing trend, monotonic increase, and information security. The project also compares several distributed ID schemes, but it depends on middleware such as a relational database and ZooKeeper. (Meituan's distributed ID technology article)

WeChat - IT Pasture - distributed ID generation supplement and summary

Learning links

Zhihu article about indexes, a personal write-up (heavily used), about Alibaba interviews

The relationship between transaction isolation levels and locks

B-, B+, and B* trees

How MySQL avoids phantom reads at the RR level

That’s how easy it is to index a database

MySQL locks: seven soul-searching questions

Mysql-Innodb-MVCC

Mysql large table optimization