FaQs

Database transactions (ACID)

** atomicity: ** transactions are the smallest unit of execution and cannot be split. The atomicity of the transaction ensures that the action either completes completely or does not work at all;

Consistency: Data is consistent before and after a transaction is executed. Multiple transactions read the same data with the same result.

Isolation: when accessing the database concurrently, a user’s transaction is not disturbed by other transactions, and the database is independent between the concurrent transactions.

Persistence: After a transaction is committed. Its changes to the data in the database are persistent and should not be affected if the database fails.

What is dirty reading? Phantom read? Unrepeatable?

Drity Read: a transaction has updated a copy of data, and another transaction has Read the same copy of data. For some reason, the first transaction has rolled back, and the data Read by the second transaction is incorrect.

Non-repeatable read: Data inconsistency between two queries of a transaction. This may be because the original data updated by a transaction was inserted between the two queries.

Phantom Read: a transaction in which data is inconsistent between two queries. For example, one transaction queries for rows and another transaction inserts new columns. The previous transaction will find columns that it did not have before in subsequent queries.

What is the isolation level of a transaction? What is the default isolation level for MySQL?

Read UNcommitted, Read committed, Repeatable Read, and Serializable can solve dirty Read, unrepeatable Read, and phantom Read problems one by one.

The SQL standard defines four isolation levels:

· Read-uncommitted: ** The lowest isolation level that allows UNCOMMITTED data changes to be READ, potentially resulting in dirty, illusory, or unrepeatable reads.

· Read-committed: ** Allows concurrent transactions to READ COMMITTED data, preventing dirty reads, but phantom or unrepeatable reads can still occur.

· REPEATABLE-READ(** REPEATABLE READ) : ** Multiple reads of the same field are consistent, unless the data is modified by the transaction itself, which can prevent dirty reads and unrepeatable reads, but phantom reads are still possible.

· SERIALIZABLE(** SERIALIZABLE) : ** The highest isolation level, fully subject to ACID isolation level. All transactions are executed one by one so that interference between transactions is completely impossible. That is, this level prevents dirty reads, unrepeatable reads, and phantom reads.

REPEATABLE_READ isolation level used by Mysql by default

READ_COMMITTED Isolation level used by Oracle

Why does the database use B+ trees instead of B trees

· B tree is only suitable for random retrieval, while B+ tree supports both random and sequential retrieval;

· Higher utilization of B+ tree space can reduce I/O times and lower disk read and write costs. Generally, indexes themselves are too large to be stored in memory, so indexes are often stored on disk as index files. In this case, disk I/O consumption is incurred during index lookups. The internal nodes of the B+ tree do not have Pointers to the specific information about keywords, but are used as indexes. The internal nodes of the B+ tree are smaller than those of the B tree. The number of keywords in the nodes that can be contained in the disk block is larger, and the number of keywords that can be searched in the memory at a time is larger. IO read and write times are the biggest factor affecting index retrieval efficiency.

· B+ tree has a more stable query efficiency. B-tree search may end at non-leaf nodes, and the closer it is to the root node, the shorter the record search time is. As long as the key word is found, the existence of the record can be determined, and its performance is equivalent to a binary search in the whole set of keywords. However, in B+ tree, sequential retrieval is more obvious. In random retrieval, any keyword must be searched from the root node to the leaf node. All keyword search paths have the same length, resulting in the same query efficiency of each keyword.

· B+ tree improves disk IO performance without solving the problem of inefficiency of element traversal. The leaf nodes of a B+ tree are connected together sequentially using Pointers, and the entire tree can be traversed simply by traversing the leaf nodes. Moreover, range-based queries are very frequent in the database, and B trees do not support such operations.

· More efficient in adding and deleting files (nodes). Because the leaf node of B+ tree contains all keywords and is stored in an ordered linked list structure, the efficiency of addition and deletion can be greatly improved.

B+ trees do not need to query data back to the table when meeting the requirements of clustered index and overwritten index.

In the index of a B+ tree, the leaf node may store the current key value, or it may store the current key value as well as the entire row of data. This is the clustered index and the non-clustered index. In InnoDB, only primary key indexes are clustered indexes. If there is no primary key, a unique key is selected to create a clustered index. If there is no unique key, a key is implicitly generated to build the cluster index.

When a query uses a clustered index, the entire row of data can be retrieved at the corresponding leaf node, so there is no need to run a query back to the table.

What is a cluster index? When to use clustered and non-clustered indexes

· Clustered index: store data together with the index, find the index and find the data

· Non-clustering index: Myisam uses key_buffer to cache the index in memory. When it needs to access data (through the index), myISam directly searches the index in memory, and then finds the corresponding data on disk through the index. This is why indexes are slow when they are not hit by the key buffer