MySQL frequent interview questions, the higher the number of stars after the question, the more frequent the interview

Recommended reading:

  • Computer Network High-frequency interview questions (latest version)
  • Java Set interview Questions (latest version)
  • Java Basics Interview questions (latest version)

Search the public account zhang on wechat, reply to the interview manual, and get the PDF version of more frequent interview questions and more interview materials.

Directory:

What is MySQL? *

MySQL is an open source relational database management system (RDBMS), which uses the most commonly used database management language — structured Query Language (SQL) for database management. MySQL is open source, so anyone can download it under the General Public License and modify it for their own needs.

What are the storage engines commonly used by MySQL? What’s the difference? * * *

  • InnoDB

    InnoDB is the default storage engine for MySQL and supports transactions, row locks and foreign keys.

  • MyISAM

    MyISAM is the default storage engine before MySQL5.1. MyISAM has poor concurrency and does not support transactions and foreign keys. The default lock granularity is table-level lock.

InnoDB MyISAM
A foreign key support Does not support
The transaction support Does not support
The lock Table and row locks are supported Support table locks
recoverability Recover from the transaction log None Transaction Log
Table structure Data and indexes are stored centrally,.ibD and.frm Data and indexes are stored separately, data.MYDIndex,.MYI
Query performance The general condition is worse than MyISAM The general case is worse than InnoDB
The index Clustering index Non-clustered index

Three paradigms for databases * *

  • First normal form: Ensure that each column remains atomic and that all field values in the data table are non-decomposable atomic values.
  • Second normal form: Make sure that every column in the table is related to the primary key
  • Third normal Form: Make sure that each column is directly related to the primary key column, not indirectly

What are the data types of MySQL

  • The integer

    TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT occupy 8, 16, 24, 32, and 64 bit storage space, respectively. It is worth noting that 10 in INT(10) is just the number of characters to display and has no practical meaning. For example, if the data type is INT(3), the property is UNSIGNED ZEROFILL, and if the inserted data is 3, the actual stored data is 003.

  • Floating point Numbers

    FLOAT, DOUBLE, and DECIMAL are floating-point types, whereas DECIMAL is processed with strings and can store exact decimals. DECIMAL is less efficient than FLOAT and DOUBLE. FLOAT, DOUBLE, and DECIMAL can all specify column widths, such as FLOAT(5,2) for a total of five digits, with two for DECIMAL and three for integer.

  • string

    The main strings used are CHAR and VARCHAR. VARCHAR is mainly used to store variable-length strings, which saves more space than fixed-length CHAR. CHAR is fixed length and allocates space based on the defined length of the string.

    Usage Scenario: It is better to use CHAR for frequently changed data because it is less likely to generate fragments. It is also better to use CHAR for very short columns, which is more efficient than VARCHAR. The TEXT/BLOB type is generally avoided because temporary tables are used for query, resulting in significant performance overhead.

  • The date of

    Year, time, date, Datetime, timestamp, etc. Datetime saves the time from 1000 to 9999 years, precision bit second, uses 8 bytes of storage space, has nothing to do with the time zone. Timestamp is the same as UNIX timestamps, holding the time from midnight on January 1, 1970 to 2038 with accuracy to the second, using four bytes of storage space, and time zone dependent.

    Application scenario: Use timestamp as much as possible. Compared with datetime, it has higher spatial efficiency.

Index * * *

What is an index?

An index is a structure that sorts the values of one or more columns of a database table. Using an index, you can quickly access specific information in the data table.

Pros and cons of indexes?

Advantages:

  • Greatly speed up data retrieval.
  • Turn random I/O into sequential I/O(because the leaves of a B+ tree are joined together)
  • Accelerometers join to each other

Disadvantages:

  • From a spatial perspective, building an index requires physical space
  • In terms of time, it takes time to create and maintain indexes. For example, indexes need to be maintained when data is added, deleted, or modified.

Index data structure?

Index data structure mainly includes B+ tree index and hash table, corresponding indexes are B+ tree index and hash index respectively. InnoDB engine index types are B+ tree index and hash index. The default index type is B+ tree index.

  • B + tree index

    Those familiar with data structure know that B+ tree, balanced binary tree and red-black tree are all classical data structures. In a B+ tree, all record nodes are placed on leaf nodes in order of key-value size, as shown in the figure below.

As you can see from the figure above, because B+ trees are ordered and all the data is stored in leaf nodes, lookup is very efficient and supports sorting and range lookup.

The index of B+ tree can be divided into primary index and secondary index. The primary index is clustered index, and the secondary index is non-clustered index. Cluster index is a B+ tree index composed of the primary key as the key value of the B+ tree index. The leaf nodes of the cluster index store complete data records. A non-clustered index is a B+ tree index consisting of columns with non-primary keys as the key values of the B+ tree index. The leaf nodes of the non-clustered index store primary key values. Therefore, when using a non-clustered index to perform a query, the primary key value will be found first, and then the data field corresponding to the primary key will be found according to the clustered index. In the figure above, the leaf node stores data records and is the structure diagram of clustered index. The structure diagram of non-clustered index is as follows:

The letters in the figure above are the column values of the non-primary key of the data. If you want to query the information of the column value B, you need to find primary key 7 and query the data field corresponding to primary key 7 in the cluster index.

  • The hash index

    Hash index is based on the hash table, for each row of data, the storage engine will be on the indexed column by hash algorithm calculated hash hash code, and the hash algorithm to keep different column values computed hash code value is different, the hash code value as the key value of the hash table, pointer will point to the data line as the value of a hash table values. The time complexity of finding a data in this way is O (1), which is usually used for accurate searching.

What’s the difference between a Hash index and a B+ tree?

Because of their differences in data structure, hash indexes are generally used for accurate equivalent lookup, while B+ indexes are mostly used for other searches besides accurate equivalent lookup. In most cases, you will choose to use B+ tree indexes.

  • Hash indexes do not support sorting because hash tables are unordered.
  • Hash indexes do not support range lookups.
  • Hash indexes do not support fuzzy queries and left-most prefix matching of multi-column indexes.
  • Since hash conflicts occur in hash tables, the performance of hash indexes is unstable, whereas the performance of B+ tree indexes is relatively stable, with each query going from root to leaf nodes

What are the types of indexes?

The main index types of MySQL are FULLTEXT, HASH, BTREE, and RTREE.

  • FULLTEXT

    FULLTEXT is a full-text index. MyISAM storage engine and InnoDB storage engine support full-text index in MySQL5.6.4 and above. It is generally used to find keywords in text, rather than directly compare whether they are equal or not. Full-text index is mainly used to solve the problem of low efficiency of fuzzy query for text such as WHERE name LIKE “%zhang%”.

  • HASH

    HASH is a HASH index. HASH indexes are mostly used for equivalent queries with complex time of O (1) and high efficiency. However, they do not support sorting, range query, and fuzzy query.

  • BTREE

    BTREE is a B+ tree index, INnoDB storage engine default index, support sorting, grouping, range query, fuzzy query, and so on, and stable performance.

  • RTREE

    RTREE is a spatial data index, which is mainly used to store geographical data. Compared with other indexes, the spatial data index has the advantage of range search

What are the types of indexes?

  • Primary key index: Data columns cannot be duplicate or NULL, and a table can have only one primary key index
  • Composite index: An index consisting of multiple column values.
  • Unique index: Data columns cannot be duplicated and can be NULL. The value of the index column must be unique. If it is a combined index, the combination of column values must be unique.
  • Full-text indexing: Searches for the content of text.
  • Normal index: Basic index type, which can be NULL

What’s the difference between a B tree and a B+ tree?

There are two main differences between B trees and B+ trees:

  • The inner node and leaf node of B tree store keys and values, while the inner node of B+ tree only has keys and no values. The leaf node stores all keys and values.

  • The leaf nodes of the B+ tree are linked together to facilitate sequential retrieval.

    The structure of both is shown below.

Why does the database use B+ trees instead of B trees?

  • B trees are good for random retrieval, while B+ trees are good for both random and sequential retrieval
  • The space utilization of the B+ tree is higher because each node of the B+ tree stores keys and values, while the internal nodes of the B+ tree only store keys. In this way, one node of the B+ tree can store more indexes, which lowers the height of the tree, reduces I/O times, and speeds up data retrieval.
  • The leaf nodes of a B+ tree are all connected together, so the range search and sequence search are more convenient
  • The performance of B+ trees is more stable because in B+ trees, each query goes from the root node to the leaf node, whereas in B trees, the value to be queried may not be at the leaf node but already found at the internal node.

When is it appropriate to use a B-tree? Because the internal nodes of a B-tree can also store values, you can put some frequently accessed values closer to the root node, which can improve query efficiency. In summary, the performance of B+ trees is more suitable as database indexes.

What is a clustered index and what is a non-clustered index?

The main difference between clustered indexes and non-clustered indexes is whether data and indexes are stored separately.

  • Clustered indexes: Data and indexes are stored together, and the leaves of the index structure retain rows of data.
  • Non-clustered indexes: Data forward and index are stored separately. Index leaf nodes store addresses pointing to data rows.

In InnoDB storage engine, the default index is B+ tree index. Indexes created using primary keys are primary indexes and clustered indexes. Indexes created on primary indexes are secondary indexes and non-clustered indexes. The secondary index is created on top of the primary index because the leaf node in the secondary index stores the primary key.

In MyISAM storage engine, the default index is also B+ tree index, but the primary and secondary indexes are non-clustered indexes, that is, the leaf node of the index structure stores an address to the data row. And use secondary indexes to retrieve indexes that do not require access to the primary key.

To see the difference, here are two classic images (courtesy of the Internet) :

Does a non-clustered index have to do a table-back query?

Is said above the leaf node of the cluster index storage is a primary key, that is to find the primary key, through the clustering index through clustering index data of the primary key, the latter to find primary key corresponding data through clustering index of query process is back to the table, then the clustering index table query will certainly to back?

The answer is not necessarily, there is an index coverage problem, if the query data is fully available on the secondary index, there is no need to query back to the table. For example, a table stores personal information including id, name, age, and other fields. Select ID,name from user where name = ‘zhangsan’; select ID,name from user where name = ‘zhangsan’; This query does not need to be queried back to the table because all the data is already retrieved through the non-clustered index, which is what index coverage is. Select id,name,age from user where name = ‘zhangsan’; A back table query is required because the value of age cannot be retrieved through a non-clustered index. So how do you solve that? Select id,name,age from user where name = ‘zhangsan’; select id,name,age from user where name = ‘zhangsan’; Query.

So index coverage can solve the problem of non-clustered index query back to the table.

What are the use scenarios for indexes?

  • Indexing is very effective for medium and large tables, but for very small tables, full table scans are generally faster.
  • For very large tables, where the cost of creating and maintaining indexes can become high, consider partitioning.
  • If the number of changes to the table is very large and the number of queries is very small, there is no need to create an index, because maintaining the index is also costly.
  • Fields that do not normally appear in a WHERE condition do not need to be indexed.
  • Consider a federated index if multiple fields are frequently queried.
  • Consider unique indexes when there are many fields and their values are not duplicated.
  • Consider normal indexes when there are multiple and duplicate fields.

Index design principles?

  • The best columns for an index are those that appear after WHERE or specified in a join sentence, not in a selection list that appears after the SELECT keyword.
  • The larger the cardinality of the index column is, the better the index effect is. In other words, the higher the differentiation of the index column, the better the index effect is. For example, using a poorly differentiated column, such as gender, as an index, would be bad, because the column has a maximum of three cardinals, most of which are either male or female.
  • Use shorter indexes whenever possible. You should specify a shorter prefix length when indexing longer strings because smaller indexes involve less disk I/O and because blocks in the index cache can hold more key values, making queries faster.
  • Use left-most prefixes whenever possible.
  • Don’t over-index, each index requires extra physical space and takes time to maintain, so more indexes is not always better.

How can indexes be optimized?

The key to index optimization is to conform to the index design principles and application scenarios. The indexes that do not meet the requirements are optimized to conform to the index design principles and application scenarios.

In addition to the design principles and application scenarios of the index, it can also be considered from the following two aspects.

  • Indexed columns cannot be part of an expression or arguments to a function when querying, because indexes cannot be used. For example,select * from table_name where a + 1 = 2
  • Place the most differentiated index first
  • Use select* as little as possible

The usage scenarios of indexes, the design principles of indexes and how to optimize indexes can be regarded as a problem.

How do I create/drop an index?

Create index:

  • Use the CREATE INDEX statement

    CREATE INDEX index_name ON table_name (column_list);

  • Created when CREATE TABLE is created

    	CREATE TABLE user(
    	id INT PRIMARY KEY,
    	information text,
    	FULLTEXT KEY (information)
    );
    Copy the code
  • Create index with ALTER TABLE

    ALTER TABLE table_name ADD INDEX index_name (column_list);

Delete index:

  • Delete the primary key index

    Alter table TABLE name drop primary key

  • Delete other indexes

    Alter TABLE TABLE name DROP Key index name

Does performance improve when using indexed queries?

Not necessarily. How to use indexes properly has been mentioned in index usage scenarios and index design principles. Because creating and maintaining indexes costs space and time, improper use of indexes will degrade query performance.

What is a prefix index?

Prefix index is used to index the first few characters of text or string. In this way, the index length is shorter and the query speed is faster.

Application scenario: Prefixes are highly distinguishable.

The way to build a prefix index

ALTER TABLE table_name ADD KEY(column_name(prefix_length));

There’s a prefix_length argument that’s hard to determine, and that’s what prefix length means. This can usually be determined using the following method, where the distinction of the entire column is calculated first

SELECT COUNT(DISTINCT column_name) / COUNT(*) FROM table_name;

And then the distinction between the prefix length and the entire column is the most similar.

SELECT COUNT(DISTINCT LEFT(column_name, prefix_length)) / COUNT(*) FROM table_name;

Constantly adjust the value of prefix_length until it is close to the distinction calculated for the entire column.

What is the leftmost matching principle?

Left-most matching rule: Continuous matching starts from the left-most and stops when a range query (<, >, between, or like) occurs.

For example, to create an index (a,b,c), you can guess whether an index is used in the following cases.

  • The first kind of

    select * from table_name where a = 1 and b = 2 and c = 3 
    select * from table_name where b = 2 and a = 1 and c = 3
    Copy the code

    SQL > select * from ‘where’; SQL > select * from ‘where’; SQL > select * from ‘;

  • The second,

    select * from table_name where a = 1
    select * from table_name where a = 1 and b = 2  
    select * from table_name where a = 1 and b = 2 and c = 3
    Copy the code

    The answer is that all three queries use indexes, because all three matches from the far left.

  • The third kind of

    select * from table_name where  b = 1 
    select * from table_name where  b = 1 and c = 2 
    Copy the code

    The answer is that neither query uses the index because the match is not from the left

  • A fourth

    select * from table_name where a = 1 and c = 2 
    Copy the code

    This query only uses the index for column A, but does not use the index for column C because column B is skipped.

  • The fifth

    select * from table_name where  a = 1 and b < 3 and c < 1
    Copy the code

    In this query, only columns A and B use indexes, and column C does not because a range query is stopped according to the left-most matching query principle.

  • 6 kinds of

    select * from table_name where a like 'ab%'; 
    select * from table_name where  a like '%ab'
    select * from table_name where  a like '%ab%'
    Copy the code

    In the case of column strings, only prefix matches can use indexes, and infix matches and suffix matches can only perform full table scans.

Under what circumstances does an index fail?

In addition to the several cases that failed to comply with the leftmost matching principle described above, the following cases can also cause index invalidation.

  • The condition has or, for exampleselect * from table_name where a = 1 or b = 3
  • Evaluation on an index will invalidate the index, for exampleselect * from table_name where a + 1 = 2
  • An implicit conversion of the data type to the type of the index will invalidate the index. For example, the string must be quoted, assumingselect * from table_name where a = '1' I’m going to use the index, if I writeselect * from table_name where a = 1 The index is invalidated.
  • Using a function in an index will invalidate the index, for exampleselect * from table_name where abs(a) = 1
  • Starting with % in a like query invalidates the index
  • Used on index! , =, <> Will result in index invalidation, for exampleselect * from table_name where a ! = 1
  • If is NULL/IS not NULL is used on an index field, the index is invalid, for exampleselect * from table_name where a is null

Database transactions * * *

What are database transactions?

A database transaction is a sequence of database operations that access and may operate on various data items. These operations are either all executed or none executed. It is an indivisible unit of work. A transaction consists of all database operations performed between the beginning of a transaction and the end of a transaction.

What are the four characteristics of transactions?

  • Atomicity: Atomicity means that operations containing transactions either all execute successfully or all fail and are rolled back.
  • Consistency: Consistency means that the state of a transaction is consistent before and after execution.
  • Isolation: Changes made by one transaction are not visible to other transactions until the final commit.
  • Persistence: Once data is committed, changes made to it are permanently stored in the database.

Database concurrency consistency issues

When multiple transactions are executed concurrently, the following problems may occur:

  • Dirty read: Transaction A updates data, but has not committed yet. Then transaction B reads the updated data from transaction A, and then transaction A rolls back. The read data from transaction B becomes dirty.
  • Non-repeatable read: When transaction A reads data for multiple times, transaction B performs an update operation and commits the data. As A result, data read by transaction A for multiple times is inconsistent.
  • Phantom read: After transaction A reads data, transaction B inserts several pieces of data into the data read by transaction A. When transaction A reads data again, transaction A finds several more pieces of data, which are inconsistent with the data read before.
  • Lost modification: Transaction A and transaction B both modify the same data. Transaction A modifies first, then transaction B modifies later, and transaction B’s modification overwrites transaction A’s modification.

Inrepeatability and phantom reads look similar, but the main difference is that inconsistencies are found in unrepeatable reads because the data has been updated. In magic reading, data inconsistencies are found mainly because the data has increased or decreased.

What are the isolation levels for the database?

  • Uncommitted read: Changes made to a transaction are visible to other transactions before the transaction is committed.
  • Commit read: After a transaction commits, its changes can be seen by other transactions.
  • Repeatable read: Data read repeatedly in the same transaction is consistent.
  • Serialization: A locking implementation is required to force the transaction to be executed serially.

The isolation level of a database can solve problems such as dirty reads, unrepeatable reads, and phantom reads.

Isolation level Dirty read Unrepeatable read Phantom read
Uncommitted read allow allow allow
Submit to read Don’t allow allow allow
Repeatable read Don’t allow Don’t allow allow
serialization Don’t allow Don’t allow Don’t allow

MySQL’s default isolation level is repeatable reads.

How are isolation levels achieved?

The isolation mechanism of transactions is mainly realized by locking mechanism and MVCC(multi-version concurrency control), commit read and repeatable read can be realized by MVCC, serialization can be realized by locking mechanism.

What is MVCC?

Multiple Version Concurrent Control (MVCC) is a method to control concurrency, which is mainly used to improve the concurrency performance of databases.

You should first understand current and snapshot reads before learning about MVCC.

  • Current read: Reads the latest version of the database and locks the read record to ensure that other transactions do not repair the current record.
  • Snapshot read: Reads without locks are snapshot reads. MVCC is used to read data in snapshots to avoid performance loss caused by locks.

It can be seen that the role of MVCC is to solve the problem of database read and write conflicts without locking, and solve the problems of dirty read, unreal read, and unrepeatable read, but can not solve the problem of lost modification.

The realization principle of MVCC:

  • The version number

    System version number: is an increment ID. The system version number increases with each transaction started.

    Transaction version number: The transaction version number is the system version number at the beginning of the transaction. You can determine the chronological order of the transaction by the transaction version number.

  • Row records hidden columns

    DB_ROW_ID: space required 6 bytes, an implicit increment ID used to create a cluster index. InnoDB uses this hidden ID to create a cluster index if the table does not specify a cluster index.

    DB_TRX_ID: required space 6 bytes, specifies the ID of the most recently modified transaction, which records the ID of the transaction in which the record was created or the record was last modified.

    DB_ROLL_PTR: Requires 7 bytes to roll back a pointer to the previous version of this record.

    They look something like this, omitting the values of specific fields. ,

  • The undo log

    The snapshot used by MVCC is stored in the Undo log, which connects all the snapshots, one row at a time, through the rollback pointer. They look something like this.

So let’s do a simple example, let’s say that the first record looks something like this

Now a transaction comes along and modifies his age field to look like this

Now there’s another transaction 2 that changes its gender, and it looks like this again

From the above analysis, it can be seen that when a transaction changes the same record, each record will be joined into a linear table in the Undo log, and at the head of the table is the latest old record.

InnoDB’s workflow at the isolation level of repeat reads:

  • SELECT

    As a result of the query, two conditions must be met:

    1. The snapshot version of the row to be queried by the current transaction must be smaller than the version number of the current transaction. This ensures that the snapshot of the row read by the current transaction either existed before the current transaction started, or was inserted or modified by the current transaction itself.
    2. The deleted version number of the row snapshot to be read by the current transaction must be greater than the version number of the current transaction. If the deleted version number is less than or equal to that of the current transaction, the row snapshot has been deleted and cannot be read.
  • INSERT

    Use the current system version number as the creation version number of the data row snapshot.

  • DELETE

    Use the current system version number as the deleted version number of the data row snapshot.

  • UPDATE

    Save the current system version as the version of the data row snapshot created before the update, and save the current system version as the deleted version of the data row snapshot after the update. In fact, delete the data row snapshot before the update.

To sum up, MVCC is designed to solve the problem of concurrent read and write conflicts without locking. It can achieve two isolation levels of commit read and repeatability.

Database locks * * *

What are database locks?

When a database has concurrent transactions, the mechanism to ensure the order of data access is called the locking mechanism.

How does the lock and isolation level relate to the database?

Isolation level implementation
Uncommitted read Always read the latest data without locking
Submit to read The shared lock is added when data is read and released after data is read
Repeatable read The shared lock is added when the data is read and released when the transaction ends
serialization Locks the entire range of keys and holds the lock until the end of the transaction

What are the types of database locks?

MySQL locks can be divided into three types according to the lock granularity:

MySQL lock category Resource overhead Locking speed Whether a deadlock occurs The granularity of the lock Concurrent degree
Table level lock small fast Don’t big low
Row-level locks big slow will small high
Lock the page general general Don’t general general

MyISAM uses table locking by default and InnoDB uses row locking by default.

The classification of lock can be divided into shared lock and exclusive lock

  • Shared lock: A shared lock, also called read lock, is abbreviated as S lock. When a transaction adds S lock to a data object, the data object can be read but cannot be updated. In addition, other transactions can only lock the data object S, not X during the lock.
  • Exclusive lock: Also known as write lock, or X lock for short, a transaction locks a data object X, and the data object can be read and updated. During the lock, other transactions cannot lock the data object X or S.

Their compatibility is as follows (not good with Excel, too ugly diagrams) :

MySQL InnoDB engine row lock mode and how to achieve?

Row lock mode

In the presence of row locks and table locks, a transaction to add a table X lock, need to check whether there is other issues on the table added a lock or a row of the table to add the lock, and each row of the table is for testing once it is very inefficient, in order to solve this problem, realize the multi-granularity locking mechanism, InnoDB has two kind of internal use intent locks, Both types of intent locks are table locks.

  • Intentional shared lock: IS lock for short. A transaction must obtain an IS lock on a table before it intends to share a lock on a row.
  • Intentional exclusive lock: referred to as IX lock, a transaction must acquire an IX lock on the table before it intends to lock rows exclusively.

With intent locks, a transaction that wants to lock X on a table simply checks to see if any other transactions have X/IX/S/IS locks on the table.

The compatibility of locks is as follows:

INnoDB locks rows by locking index entries on indexes. If there is no index, INnoDB locks records by hiding clustered indexes.

There are three types of InnoDB row locks:

  • Record Lock: Locks an index entry
  • Grap Lock: Locks “gaps” between indexes, before the first entry, or after the last entry.
  • Next-key lock: combination of the first two to lock the record and the gap in front of it.

InnoDB row locks: If data is not retrieved by index criteria, InnoDB will lock all records in the table, which has the same effect as table locks.

MVCC does not solve the phantom problem; at repeatable read isolation levels, the phantom problem is solved using MVCC+ next-key Locks.

What are optimistic and pessimistic database locks and how to implement them?

Optimistic locking: The system assumes that data updates will not cause conflicts most of the time, so the database detects conflicts only when the update operation is committed. If a conflict exists, the update fails.

Optimistic lock is generally implemented by version number and CAS algorithm.

Pessimistic locking: Shielding all operations that might violate data integrity, assuming concurrency conflicts. Generally speaking, every time you go to retrieve data, you assume that someone else will modify it, so every time you retrieve data, you lock it.

Pessimistic lock is implemented by the locking mechanism of the database, adding for updata to the query statement.

What is a deadlock? How to avoid it?

A deadlock is a blockage caused by two or more processes competing for resources or communicating with each other during execution. In MySQL, MyISAM acquires all the locks it needs at once, either satisfy them all or wait, so no deadlocks occur. In The InnoDB storage engine, except for transactions consisting of a single SQL, locks are acquired progressively, so there are deadlocks.

MySQL > prevent deadlock or lock conflicts

  • If multiple tables are accessed concurrently by different programs, try to access the tables in the same order.

  • When the program processes data in batches, if the data is sorted, try to ensure that each thread processes records in the same order.

  • In a transaction, if you need to update records, you should directly apply for an exclusive lock of sufficient level, rather than apply for a shared lock first. When you apply for an exclusive lock, other transactions may have already obtained the shared lock of the same record, resulting in lock conflicts or deadlocks.

  • Try to use a low isolation level

  • Try to use indexes to access data to make locking more accurate and thus reduce the chance of lock conflicts

  • The probability of lock conflict is lower for small transactions when the transaction size is properly selected

  • Try to access data with equal conditions to avoid the impact of next-key locks on concurrent inserts.

  • Do not apply for more than the actual need of the lock level, try not to display the lock query

  • For certain transactions, table locks can be used to speed up processing or reduce the probability of deadlocks.

SQL statement basic knowledge and optimization

What are the main categories of SQL statements? *

  • Data Definition Language (DDL) : CREATE, DROP, and ALTER operate logical structures, including table structures, views, and indexes.
  • Data Query Language (DQL) : The database Query Language is MAINLY SELECT
  • Data Manipulation Language (DML) : Includes INSERT, UPDATE, and DELETE
  • Data Control Language (DCL) : allows permission Control operations, such as GRANT, REVOKE, COMMIT, and ROLLBACK.

What are the SQL constraints? * *

  • Primary key constraint: A primary key is a combination of one or more columns in a table that uniquely identifies each row in the table. A table has only one primary key, and primary key constrained columns cannot be empty.
  • Foreign key constraint: A foreign key constraint is used to establish a relationship between two tables by specifying which column of the main table is referenced. Only the primary key of the primary table can be used as a foreign key by the secondary table. The columns of the constrained secondary table may not be primary keys. Therefore, to create a foreign key constraint, you need to define the primary key of the primary table and then the foreign key of the secondary table.
  • Unique constraints: A table can define multiple unique constraints to ensure that a column of data in the table does not have the same value.
  • Default constraint: When inserting new data, the system assigns a default value to the row if no data is specified, or NULL if no default value is set.
  • Check constraint: Check uses logical expressions to determine the validity of data and limits the range of values that can be entered in one or more columns. When a column updates data, the input must satisfy the Check constraint.

What is a subquery? * *

Subquery: Use the results of one query in another query

Subqueries can be divided into the following categories:

  • Standard quantum query: refers to a subquery that returns a value. You can use the operators =,>,<,>=,<=,<> to compare the scalar results of the subquery. Generally, the subquery is placed on the right side of the comparison formula.

    SELECT * FROM user WHERE age = (SELECT max(age) from user) // Query oldest personCopy the code
  • Column subquery: the result of a subquery is one column with n rows. It is usually used to query a field in a table. You can use operators such as IN, ANY, SOME, and ALL, but not directly

    SELECT num1 FROM table1 WHERE num1 > ANY (SELECT num2 FROM table2)
    Copy the code
  • Row subquery: indicates the row n columns of the result returned by the subquery

    SELECT * FROM user WHERE (age,sex) = (SELECT age,sex FROM user WHERE name="zhangsan")
    Copy the code
  • Table subquery: a table whose subquery is n rows and n columns

    SELECT * FROM student WHERE (name,age,sex) IN (SELECT name,age,sex FROMClass1) // Find the class in the student table1The students of classCopy the code

Do you know some of the connection queries in MySQL? * * *

MySQl connection query can be divided into external join, internal join, cross join

  • Outer join

    External joins are mainly divided into LEFT JOIN, RIGHT JOIN and full JOIN.

    Left outer join: Displays all the data in the left table and the eligible data in the right table. The unqualified data in the right table is null.

Right outer join: Displays all data in the left table and the eligible data in the right table. The unqualified data in the right table is null.

MySQL does not support full external connections.

  • Inner join: Only the data that meets the conditions is displayed

  • Cross join: A join that uses cartesian products.

    The Cartesian product of two sets X and Y is expressed as X × Y, where the first object is a member of X and the second object is a member of all possible ordered pairs of Y. For example: A = {A, b}, b = {0}, A * b = {(A, 0), (A, 1), (A, 2), (b, 0), (b, 1), (b, 2)}

For example: There are two tables, L and R.

L table

A B
a1 b1
a2 b2
a3 b3

R table

B C
b1 c1
b2 c2
b4 c3
  • Select L. ‘*’,R. ‘*’ from L left join R on L.b=R.b

    A B B C
    a1 b1 b1 c1
    a2 b2 b2 c2
    a3 b3 null null
  • Select L. ‘*’,R. ‘*’ from L right join R on L.b=R.b

    B C A B
    b1 c1 a1 b1
    b2 c2 a2 b2
    b4 c3 null null
  • Select L. ‘*’,R. ‘*’ from L inner join R on L.b=R.b

    A B B C
    a1 b1 b1 c1
    a2 b2 b2 c2
  • Select L. ‘*’,R. ‘*’ from L,R

    A B B C
    a1 b1 b1 c1
    a1 b1 b2 c2
    a1 b1 b4 c3
    a2 b2 b1 c1
    a2 b2 b2 c2
    a2 b2 b4 c3
    a3 b3 b1 c1
    a3 b3 b2 c2
    a3 b3 b4 c3

What is the difference between in and exists in mysql? * *

In and EXISTS are generally used for subqueries.

  • If exists is used, the external query will be performed first, and each row of data queried will be entered into the inner table query to see whether the conditions are met. In typically retrieves the result set from the inline table query, and then returns the data from the outer query matching the result set.
  • An index is used in either an inner table query or an outer table query.
  • Exists only uses indexes for inner table queries
  • Generally speaking, exist is more efficient when the result set of sub-query is large and the appearance is small. When the subquery finds a small result set and a large appearance, in is more efficient.
  • For not IN and NOT EXISTS, the efficiency of not EXISTS is higher than that of NOT in. It has nothing to do with the result set of the subquery, because NOT IN scans all tables on both inner and outer surfaces without using indexes. Indexes on tables can be used in a not EXISTS subquery.

What is the difference between varchar and char? * * *

  • Varchar means changed length, char means fixed length. When inserted characters exceed their length, insertions are rejected in strict mode with an error message, and insertions are truncated in normal mode. For example, char(5) is 5 regardless of the length of the inserted character. If the inserted character is less than 5, it is supplemented by a space. For vARCHar (5), if the inserted character length is less than 5, the stored character length is the inserted character length and will not be filled.
  • The storage capacity varies. For a CHAR, a maximum of 255 characters can be stored. For vARCHAR, the maximum number of characters can be 65532.
  • A char has a fixed length and is stored faster than a vARCHar. However, it takes up more space than a vARCHar. Varchar has higher space utilization but slower storage speed, which is a time-for-space strategy.

Int (10); char(10); varchar(10); * * *

The 10 in int(10) represents the length of the displayed data, while char(10) and vARCHar (10) represent the size of the stored data.

What is the difference between DROP, DELETE and TRUNCate? * *

drop delete truncate
speed fast Delete line by line, slow faster
type DDL DML DDL
The rollback Do not roll back Can be rolled back Do not roll back
Delete the content If you drop the entire table, all rows and indexes are deleted Delete some or all of the table data while the table structure is still in place Delete all data from table

In general, use DROP to delete an entire table, use DELETE to delete part of the table, and use TRUNCate to delete all data of the table while retaining the table structure.

The difference between a UNION and a UNION ALL? * *

Both union and union all are used to join two sets of results together.

  • The union will rehash and sort the result. The union all will return the merged result directly without rehash or sorting.
  • The performance of union All is better than that of Union.

What is a temporary table, when is it used, and when is it dropped? *

MySQL creates temporary tables that store intermediate result sets during SQL statement execution. These tables are called temporary tables. Temporary tables are visible only to the current connection.

Temporary tables are classified into memory temporary tables and disk temporary tables. MEMORY temporary tables use the MEMORY storage engine and disk temporary tables use the MyISAM storage engine.

Temporary tables are commonly used in the following situations:

  • Subquery in FROM
  • DISTINCT query with ORDER BY
  • Temporary tables are created when the ORDER BY and GROUP BY clauses are different
  • Using UNION queries produces temporary tables

How to optimize large table data query? * * *

  • The index optimization
  • SQL statement optimization
  • Horizontal split
  • The vertical resolution
  • Create intermediate tables
  • Use caching technology
  • Fixed-length tables are faster to access
  • Smaller columns access faster

Do you understand slow log queries? Statistics too slow query? How to optimize slow queries? * * *

Slow queries are used to log SQL statements whose execution time exceeds a certain threshold.

Related parameters:

  • Slow_query_log: indicates whether slow log query is enabled. 1 indicates that slow log query is enabled, and 0 indicates that slow log query is disabled.
  • Slow_query_log_file: path for storing slow query logs of the MySQL database.
  • Long_query_time: slow query threshold. If the QUERY time of an SQL statement exceeds the threshold, it will be recorded in a log.
  • Log_queries_not_using_indexes: Queries that do not use indexes are recorded in the slow query log.
  • Log_output: indicates the log storage mode. FILE: saves logs to a FILE. TABLE: saves logs to the database.

How to optimize slow queries?

  • Analyze the execution plan of the SQL statement to check whether the index of the SQL statement matches
  • Optimize the database structure by splitting tables with many fields into multiple tables, or consider creating intermediate tables.
  • Optimize LIMIT paging.

Why a primary key? * *

A primary key is a unique identifier that uniquely distinguishes each row in a table. Without a primary key, updating or deleting a particular row in a table can be difficult because a row cannot be uniquely and accurately identified.

Is the primary key an autoincrement ID or a UUID? * *

Benefits of using an incremented ID:

  • The field length is much smaller than the UUID.
  • Database automatic numbering, according to the order of storage, easy to search
  • You don’t have to worry about primary key duplication

Disadvantages of using an incremented ID:

  • Because the service volume increases automatically, it is easy to be queried by others in some service scenarios.
  • When data migration occurs, or table merges can be cumbersome
  • In high concurrency scenarios, competitive auto-lock will reduce the throughput of the database

UUID: a universal unique identifier (UUID) calculated based on data such as the current time, counter, and hardware id.

Advantages of using UUID:

  • Unique identifier, does not consider the problem of duplication, in the data split, merge can also achieve global uniqueness.
  • It can be generated in the application layer to improve the throughput of the database.
  • There is no need to worry about leakage of business volume.

Disadvantages of using UUID:

  • Because the UUID is generated randomly, random I/OS occur, affecting the insertion speed and causing low disk usage.
  • UUID occupies a large space. The more indexes you create, the greater the impact.
  • The comparison between UUids is much slower than the self-added ID, which affects the query speed.

In general, MySQL recommends using an incremented ID. Because InnoDB storage engines in MySQL, the primary key index is a kind of clustering index, the primary key index of B + tree leaf nodes stored in order according to the primary key and the data, if the primary key index is the ID, you just need to back are arranged in sequence, if the UUID, the ID is randomly generated, in inserting data will cause a lot of data movement, A large amount of memory fragmentation is generated, resulting in a decrease in insert performance.

Why is the field set to not NULL? * *

First, NULL is NOT the same as NULL. NULL does NOT take up space, whereas NULL does. Therefore, NULL values can still be inserted after being set to NOT NULL.

The field is set to not NULL for several reasons:

  • A NULL value will affect the statistics of some functions, such as count. If a NULL value is encountered, the record will not be counted.

  • The B tree does not store NULL, so the index does not use NULL.

  • The NOT IN subquery returns NULL values IN the case of NULL values.

    For example, the user table is as follows

    id username
    0 zhangsan
    1 lisi
    2 null

    select * from `user` where username NOT IN (select username from `user` where id ! = 0), this query should find zhangsan, but the result is null.

  • When MySQL performs comparison, NULL will participate in field comparison. Because NULL is a special data type, the database needs special data processing during processing, which increases the complexity of database record processing.

How to optimize data access during query? * * *

In terms of reducing data access:

  • Use indexes correctly and try to achieve index coverage
  • Optimize the SQL execution plan

In terms of returning less data:

  • Data paging
  • Return only the required fields

In terms of reducing server CPU overhead:

  • Fair use sort
  • Reduce comparison operations
  • Complex operations are handled on the client side

In terms of increasing resources:

  • Client multi-process parallel access
  • Database parallel processing

How to optimize long and difficult query statements? * *

  • Decompose a large query into several smaller queries
  • Decompose associated query to make cache more efficient

How do I optimize LIMIT paging? * *

  • When the LIMIT offset is large, the query efficiency decreases. You can record the maximum ID retrieved each time and use the ID for the next query

  • Create composite indexes

How to optimize UNION query * *

If the result set does not need to be de-duplicated or sorted, it is better to use UNION ALL.

How to optimize the WHERE clause * * *

  • Do not use it in a WHERE clause! = and <> are not equal to determine, which will cause the index to drop for full table scan.
  • Do not use null or null values in the WHERE clause. Try to set the field to not NULL.
  • Try to use union all instead of or
  • Index the columns involved in WHERE and Order by
  • Minimize the use of in or not in. Full table scan is performed
  • Using parameters in the WHERE clause results in a full table scan
  • Avoid expression or function operations on fields in the WHERE clause that cause the storage engine to drop the index and perform a full table scan

What causes slow EXECUTION of SQL statements? * * *

  • If the SQL statement is executed slowly only occasionally, it may be because of a lock, or because the redo log is full, and data in the redo log must be synchronized to disk.
  • If the SQL statement is always slow, it may be that the field has no index or the field has an index but no index.

What is the execution order of SQL statements? *

SELECT DISTINCT 
	select_list 
FROM 
	left_table 
LEFT JOIN 
	right_table ON join_condition 
WHERE 
	where_condition 
GROUP BY 
	group_by_list 
HAVING 
	having_condition 
ORDER BY 
	order_by_condition
Copy the code

The execution sequence is as follows:

  • FROM: When querying an SQL statement, join the tables on either side of the keyword as a Cartesian product and create a virtual table V1. Virtual tables are views, and the data comes from the execution results of multiple tables.

  • ON: Filters the results of the FROM connection ON and creates virtual table V2

  • JOIN: Add the left table filtered by ON and create a new virtual table V3

  • WHERE: Filters virtual table V3 WHERE and creates virtual table V4

  • GROUP BY: Creates virtual table V5 BY grouping the records in V4

  • HAVING: Filter V5 to create virtual table V6

  • SELECT: Create virtual table V7 by filtering the results in V6 according to SELECT

  • DISTINCT: Create virtual table V8 BY using the GROUP BY clause. You do not need to use DISTINCT because the unique values in a column are grouped into a GROUP, and each GROUP returns only one row of records. Therefore, h is different from h.

  • ORDER BY: Sorts the results in the V8 table.

Database optimization

How to optimize large tables? * * *

  • Scoping of data: Avoid queries that do not have any scoping conditions.
  • Read/write separation: the master library is responsible for writing and the slave library is responsible for reading.
  • Vertical split table: A table is divided into multiple tables by field, and each table stores a portion of its fields.
  • Horizontal partition table: In the same database, the data of one table is divided into multiple tables according to certain rules.
  • Single table optimization: Optimize the fields, indexes, and query SQL in the table.
  • Add the cache

What is vertical sub – table, vertical sub – database, horizontal sub – table, horizontal sub – database? * * *

Vertical split table: A table is divided into multiple tables by field, and each table stores a portion of its fields. Commonly used fields are placed in one table and less commonly used fields in another.

Advantages of vertical table:

  • Avoiding IO contention reduces the probability of locking tables. Because large fields are less efficient, the first data volume is large, and the reading time is long. Second, large fields take up more space and store fewer rows on a page, resulting in more I/O operations.

  • It can improve the query efficiency of popular data.

Vertical database: Tables are classified by service and deployed to different databases. Different databases can be placed on different servers.

Advantages of vertical repository:

  • This reduces service coupling and facilitates hierarchical management of different services.
  • It can increase the number of I/O and database connections and solve the bottleneck of hardware resources on a single machine.

Disadvantages of vertical split (library, table) :

  • The primary key is redundant and redundant columns need to be managed
  • Transaction processing becomes complex
  • There is still the problem of too much data in a single table

Horizontal partition table: In the same database, the data of the same table is divided into multiple tables according to certain rules.

Advantages of horizontal table:

  • The problem of large data volume in a single table is solved
  • Avoid IO contention and reduce the probability of locking tables

Horizontal database: Split the data of the same table into different databases according to certain rules. Different databases can be placed on different servers.

Advantages of horizontal repository:

  • The bottleneck problem of large data volume in single database is solved
  • IO conflicts are reduced, lock contention is reduced, and the failure of one database does not affect other databases (availability), improving system stability and availability

Disadvantages of horizontal split (table, library) :

  • Shard transaction consistency is difficult to resolve
  • Cross-node JOIN performance is poor and the logic becomes complicated
  • Data expansion is difficult and difficult to maintain

In system design should be according to the business when coupled to determine the scheme of vertical depots and vertical table, the data access should be considered when the pressure is not particularly great caching, reading and writing methods of separation, if the amount of data is very big, or sustained growth can consider depots table, horizontal split the logic involved is more complex, common solutions are client architecture and evil agent architecture.

How to deal with the ID key after dividing the database into different tables? * * *

After a database is divided into different tables, the ids of each table cannot start from 1. Therefore, a global ID is required. You can set the global ID using the following methods:

  • UUID: Advantages: Locally generated ID, no need to remote call; Globally unique is not repeated. Disadvantages: Large footprint, not suitable for indexing.

  • Database increment ID: when the database increment ID is used after the sub-database and sub-table table, a special database is needed to generate the primary key. Each time the service receives a request, it first inserts a meaningless data into the database, obtains a self-increment ID of the database, and uses this ID to write data into the sub-database and sub-table. Advantages: Simple and easy to implement. Disadvantages: Bottlenecks in high concurrency. The system structure is shown below (the picture is from the network)

  • Redis generates ID: Advantages: Independent of database, good performance. Disadvantages: Introducing new components can increase the complexity of the system

  • Twitter’s Snowflake algorithm is a 64-bit LONG ID with 1 unused bit, 41bit for the number of milliseconds, 10bit for the work machine ID, and 12bit for the serial number.

    1bit: The first bit is 0 by default, because the first bit in binary is negative if it is 1, but the ID cannot be negative.

    41bit: Indicates the time stamp, in milliseconds.

    10bit: records the working machine ID. Five bits indicate the equipment room ID and five bits indicate the machine ID.

    12bit: records different ids generated in the same millisecond.

  • Meituan Leaf distributed ID generation system, Meituan Dianping distributed ID generation system

MySQL replication principle and process? How to implement master slave replication? * * *

MySQL replication: To ensure data consistency between the primary server and secondary server, after data is inserted to the primary server, the secondary server automatically synchronizes the modified data from the primary server.

Principle of master-slave replication:

There are three main threads for master-slave replication: binlog thread, I/O thread, and SQL thread.

  • Binlog thread: Responsible for writing data changes on the primary server to the Binary log.
  • I/O thread: Reads Binary logs from the primary server and writes them to the Relay log of the secondary server.
  • SQL thread: Is responsible for reading the relay log, parsing out the data changes that have been made in the master server and replaying them in the slave server

The copying process is as follows (picture from network) :

  1. The Master writes the record of the operation to the binlog before each transaction updates the data.
  2. The Slave Slave library connects to the Master library and creates as many binlog dump threads as there are Master slaves. When the binlog of the Master node changes, binlog dump notifies all slaves and sends the corresponding binlog to them.
  3. After receiving the binlog, the I/O thread writes it to the Relay log.
  4. The SQL thread reads the relay log and replays it from the slave server.

Here is an easy-to-understand diagram.

The role of master-slave replication:

  • High availability and failover
  • Load balancing
  • The data backup
  • Upgrading test

Do you know about read-write separation? * * *

Read/write separation mainly depends on primary/secondary replication, which serves as read/write separation service.

Advantages of read-write separation:

  • The master server writes and the slave server reads, alleviating lock contention
  • Secondary servers can use MyISAM to improve query performance and save system overhead
  • Increase redundancy and improve availability