Hello, today I’m going to share some MySQL interview questions with you. Take out your notebook and write them down.
The three normal forms of database design
First Normal Form (1NF): no repeating columns
First normal form (1NF) means that every column of a database table is an indivisible atomic data item; a single column cannot hold multiple values, i.e. an attribute of an entity cannot have multiple values or repeat. If repeating attributes appear, you may need to define a new entity made up of those attributes, standing in a one-to-many relationship with the original entity. Under 1NF, each row of a table contains information about exactly one instance. In short, first normal form means no repeating columns.
Second Normal Form (2NF): every non-key attribute fully depends on the primary key [eliminates partial functional dependencies]
Suppose a course-selection table SelectCourse(student ID, name, age, course name, grade, credit) whose key is the composite key (student ID, course name), because the following functional dependency holds:
(Student ID, course name) → (Name, age, grade, credit)
This table does not satisfy second normal form, because the following partial dependencies also hold:
(Course Name) → (Credits)
(Student ID) → (Name, age)
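As a sketch, the table can be decomposed so that every non-key attribute depends on a whole key (the column types below are illustrative):

```sql
-- Decomposition of SelectCourse: each non-key attribute now
-- depends on the entire key of its own table.
CREATE TABLE Student (
  student_id INT PRIMARY KEY,
  name       VARCHAR(50),
  age        INT
);

CREATE TABLE Course (
  course_name VARCHAR(50) PRIMARY KEY,
  credit      DECIMAL(3,1)
);

CREATE TABLE SelectCourse (
  student_id  INT,
  course_name VARCHAR(50),
  grade       DECIMAL(5,2),
  PRIMARY KEY (student_id, course_name)
);
```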
Third Normal Form (3NF): non-key attributes do not depend on other non-key attributes [eliminates transitive dependencies]
To satisfy third normal form (3NF), a table must first satisfy second normal form (2NF). 3NF further requires that a database table does not contain non-key information that is already held in other tables.
What is a transaction? What are its properties?
A transaction is a logical unit of work made up of a set of SQL statements; it is a group of operations satisfying the ACID properties. A transaction can be committed with COMMIT or rolled back with ROLLBACK.
- A – Atomicity: a transaction is an atomic unit of work; either all of its modifications to the data are performed, or none are.
- C – Consistency: data must be in a consistent state both when a transaction starts and when it ends (consistency here means the system moves from one correct state to another correct state).
- I – Isolation: the database system provides an isolation mechanism guaranteeing that a transaction executes in an “independent” environment unaffected by concurrent operations. When multiple users access the database concurrently, e.g. operating on the same table, each user’s transaction should not be disturbed by the other transactions; concurrent transactions must be isolated from one another.
- D – Durability: once a transaction finishes, its changes to the data are permanent and survive even system failures.
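A minimal sketch of a transaction, assuming a hypothetical account table; the transfer either fully happens or not at all:

```sql
-- A transfer as one atomic unit of work.
BEGIN;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;      -- make both changes permanent
-- ROLLBACK; -- or undo both, e.g. if a balance check failed
```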
Supplement:
These four properties are not on the same level
- The result of a transaction is correct only if consistency is satisfied
- Without concurrency, transactions execute serially and isolation holds automatically; in that case atomicity alone is enough to guarantee consistency
- With concurrency, where multiple transactions execute at once, a transaction must satisfy not only atomicity but also isolation in order to guarantee consistency
- Durability means the transaction survives a database crash (via the logging system)
What are the concurrency consistency issues?
Lost update
Transactions T1 and T2 both modify the same data; T1 writes first, T2 writes later, and T2’s write overwrites T1’s
Solution: pessimistic locking (for low concurrency) or optimistic locking (for high concurrency)
Dirty read
Transaction B reads data modified by uncommitted transaction A and acts on it. If transaction A rolls back, the data B read becomes invalid, which violates consistency
Non-repeatable read
Within one transaction, the same data is read multiple times. Before that transaction ends, another transaction modifies the same data. Because of that modification, reads inside the first transaction may return different values; that is, the same data read twice within a single transaction comes out different
T2 reads the data, T1 modifies it; if T2 reads the data again, it gets a different result from its first read
If the data can only be read after the modifying transaction has fully committed, the problem is avoided; this is done by raising the database isolation level to REPEATABLE_READ
Phantom read
Transaction A reads new rows committed by transaction B, which violates isolation
Workaround: if no other transaction may insert new rows until the current transaction has finished processing the data, the problem is avoided; this is done by raising the transaction isolation level to SERIALIZABLE
What are the transaction isolation levels?
- Read Uncommitted
- Read Committed
- Repeatable Read
- Serializable
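For reference, this is how to check and change the isolation level (the variable is transaction_isolation in MySQL 8.0, tx_isolation in earlier versions):

```sql
SELECT @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```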
How is transaction durability achieved?
redo log and binlog
The redo log is specific to the InnoDB engine; the binlog is implemented by the MySQL Server layer and is available to all engines
The redo log is a physical log: “such-and-such a change was made on such-and-such a page”. The binlog is a logical log, e.g. “add 1 to the c field of the row with id = 2”
The redo log has a fixed size, so it can run out of space; when it does, part of it must be flushed to disk before writing can continue. The binlog is append-only, so it never runs out of space; you just keep writing
Don’t mix up the redo log and the binlog
What locks does MySQL have? What lock granularities exist? How do isolation levels relate to locks?
Lock types
1. Shared lock
S lock, read lock; acquired with LOCK IN SHARE MODE
Data under a shared lock can be read by other transactions, but cannot be modified or deleted
2. Exclusive lock
X lock, write lock
Data under an exclusive lock cannot be read or modified by other transactions
If the locked rows have no index on them, the lock becomes a table lock
InnoDB row locks are implemented by locking index entries
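A quick sketch of both lock types, again using a hypothetical account table:

```sql
-- Shared (S) lock: other transactions can still read the row, but not modify it.
SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;  -- spelled FOR SHARE in MySQL 8.0

-- Exclusive (X) lock: blocks other transactions' locking reads and writes.
SELECT * FROM account WHERE id = 1 FOR UPDATE;
```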
3. Intention shared lock
IS lock: when a shared lock is placed on a row, the database automatically places an intention shared lock on the table
4. Intention exclusive lock
IX lock: when an exclusive lock is placed on a row, the database automatically places an intention exclusive lock on the table
Intention locks are managed automatically by the system. When another transaction attempts a full-table operation (table lock), it can first check the intention locks, which indicate whether some rows in the table are held by row locks, thereby avoiding a full-table scan
5. Auto-increment lock (AUTO-INC lock)
A special table-level lock used for INSERT operations on auto-increment columns
About next-key locks, gap locks, and record locks
Gap locks and next-key locks exist only at the RR (REPEATABLE READ) isolation level
1) Record lock: MySQL’s default row lock is the next-key lock. When an equality query on a unique index matches a record, it degrades to a record lock.
2) Gap lock: when an index query matches no records, the next-key lock degrades to a gap lock. E.g. with existing rows at id = 1 and id = 4, select * from tb_temp where id = 3 for update; or select * from tb_temp where id > 1 and id < 4 for update; locks the gap (1, 4)
3) Next-key lock: locks the record itself as well as the gap before it.
When no records are matched, it degrades to a gap lock.
4) Range query: when the range hits some records, next-key locks are used. E.g. select * from tb_temp where id > 2 and id <= 7 for update; locks (2, 7] and (7, +∞)
Gap lock examples
1) Gap lock (on a secondary index): using (secondary index value, primary key value) pairs as gap endpoints, the data range between two endpoints is locked.
Next-key lock: a combination of a record lock and a gap lock, i.e. it locks a range and the record itself. InnoDB uses next-key locking by default.
2) InnoDB row locks are implemented by locking index entries. InnoDB uses row locks only when the data is retrieved through an index condition; otherwise it uses table locks.
Any lock on a secondary index, or on a non-indexed column, eventually traces back to the primary key, and a lock is placed on the primary key index as well.
3) The purpose of the gap lock is to prevent phantom reads: no new data can be inserted into the gap, and no existing data can be updated into a value inside the gap.
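A small demonstration, assuming tb_temp holds rows with id = 1, 3, 7 and an index on id:

```sql
-- Under REPEATABLE READ this range query takes next-key locks, so a
-- concurrent insert into the locked gap blocks until the transaction ends.
BEGIN;
SELECT * FROM tb_temp WHERE id > 2 AND id <= 7 FOR UPDATE;
-- in another session: INSERT INTO tb_temp (id) VALUES (5);  -- blocks here
COMMIT;
```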
6. Next-key lock (covered above)
7. Gap lock (covered above)
Granularity
Locks can be divided into row locks, table locks, record locks, page locks, and database locks (rare)
Read/write locks without an index ——> table lock
Auto-increment locks, intention locks ——> table lock
Read/write locks with an index ——> row lock
Record lock: a type of row lock that locks a single record in a table; the transaction locks only that one record
It applies when the condition hits exactly and the hit condition is a unique index
Page lock: a page-level lock in MySQL, with granularity between row-level and table-level locks. Table-level locks are fast but conflict often; row-level locks conflict rarely but are slower
Characteristics: locking and unlocking overhead and time sit between table locks and row locks; deadlocks can occur; lock granularity is between table and row locks; concurrency is moderate
Gap lock: range lock
Read Uncommitted
Dirty reads, non-repeatable reads, and phantom reads all occur
Read Committed
Dirty reads are solved with write locks
While data is being written, the record is blocked for other transactions and can be read only after the write has been committed
Non-repeatable reads and phantom reads can still occur
Repeatable Read
Non-repeatable reads are solved with long-held read locks and long-held write locks
Read lock: while I am reading the data, you cannot modify it, which solves the non-repeatable read
Phantom reads are solved with next-key locks
Next-key locks mainly target INSERT operations
Serializable
All transactions execute as if serially, so dirty reads, non-repeatable reads, and phantom reads are all eliminated
What guarantees each ACID property?
A – Atomicity is guaranteed by the undo log, which records the information needed for rollback; when a transaction rolls back, the SQL statements that already executed successfully are undone
C – Consistency is guaranteed by the other three properties, and application code must ensure business-level consistency
I – Isolation is guaranteed by MVCC
D – Durability: MySQL records changes both in memory and in the redo log; after a crash, MySQL can recover the data from the redo log
When an InnoDB transaction commits, it first writes the redo log and enters the prepare state; once prepare succeeds, the binlog is written to disk, and a commit record is then persisted to the log (the two-phase commit between redo log and binlog)
Do you understand MVCC?
Multi-Version Concurrency Control: when data is read, it is served from something like a snapshot, so read locks and write locks do not conflict, and each transaction session sees its own specific version through the version chain
Committing a change to the database never directly overwrites the old data; instead it produces a new version that sits alongside the old one, which makes it possible to read without taking any locks, so MVCC mainly solves the performance problem of concurrent reads
MVCC works only at the READ COMMITTED and REPEATABLE READ isolation levels. The other two levels are incompatible with MVCC: READ UNCOMMITTED always reads the latest row rather than the row version matching the transaction, while SERIALIZABLE locks every row it reads
Every clustered index record carries two essential hidden columns
trx_id: stores the transaction ID each time the clustered index record is modified
roll_pointer: each time the clustered index record is changed, the old version is written to the undo log; roll_pointer stores a pointer to the previous version so it can be retrieved (the very first version has no such pointer, since there is no older version)
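The difference shows up in how you read. A sketch, with a hypothetical account table:

```sql
-- Snapshot read: served from the MVCC version chain, takes no row locks.
SELECT * FROM account WHERE id = 1;

-- Current read: reads the latest committed version and takes locks.
SELECT * FROM account WHERE id = 1 FOR UPDATE;
```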
What is the difference between DELETE, TRUNCATE, and DROP?
DROP: removes the entire table structure; if you need the table again, you must recreate it
TRUNCATE: clears the table data and releases the space; the table structure remains; indexes are reset; it cannot be rolled back
DELETE: deletes specified rows from the table without releasing space or clearing indexes
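Side by side, on a hypothetical user table:

```sql
DELETE FROM user WHERE id > 100;  -- DML: deletes rows one by one, can be rolled back
TRUNCATE TABLE user;              -- DDL: empties the table and resets AUTO_INCREMENT, cannot be rolled back
DROP TABLE user;                  -- DDL: removes the data and the table structure itself
```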
How do I locate inefficient SQL?
Slow query log
In a business system, every query other than primary-key lookups should be tested on the test database first; statistics on slow queries are mainly collected by operations staff
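The slow query log can be switched on at runtime; the threshold below is just an example:

```sql
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;              -- log statements slower than 1 second
SHOW VARIABLES LIKE 'slow_query_log_file';   -- where the log is written
```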
To optimize a slow query, first understand what makes it slow: does the query condition fail to hit an index? Are unneeded columns loaded? Or is there simply too much data?
First analyze the statement to see whether it loads extra data, perhaps by querying extra rows and discarding them, or by loading many columns the result does not need; then analyze and rewrite the statement
Analyze the statement’s execution plan to see how it uses indexes, and then modify the statement or the index so that the statement matches an index as closely as possible
If the statement itself cannot be optimized further, consider whether the table simply holds too much data; if so, split the table horizontally or vertically
Horizontal and vertical table splitting
Vertical splitting
Manually split content that could otherwise live in one table into multiple tables
Take a blog system: article title, author, category, creation time, and so on change slowly, are queried often, and benefit from good real-time reads; we call this cold data. View counts, comment counts, and other frequently changing values we call active data. Schema design should therefore consider splitting the table, starting with a vertical split.
Horizontal splitting
Split the user table into user1 and user2, routing rows by special handling of the id (see the sketch below)
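A minimal sketch of that split; the routing rule and table names are illustrative, and the routing itself lives in application code:

```sql
CREATE TABLE user1 LIKE user;   -- e.g. even ids
CREATE TABLE user2 LIKE user;   -- e.g. odd ids
-- application-side routing: even id -> user1, odd id -> user2
```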
Differences between InnoDB and MyISAM
- InnoDB supports transactions, MyISAM does not. InnoDB wraps every SQL statement in its own auto-committed transaction, which hurts speed, so it is best to group multiple statements between BEGIN and COMMIT as a single transaction;
- InnoDB supports foreign keys, MyISAM does not. Converting an InnoDB table that has foreign keys to MyISAM will fail;
- InnoDB uses a clustered index: the data file is bound to the index, a primary key is mandatory, and lookups by primary key are very efficient. Secondary index lookups, however, need two searches: first find the primary key, then fetch the row through it. The primary key therefore should not be too large, because a large primary key makes every secondary index large as well. MyISAM uses non-clustered indexes: the data file is separate, and the indexes hold pointers to it; primary and secondary indexes are independent of each other.
- InnoDB does not store the exact row count of a table, so select count(*) from table requires a full table scan. MyISAM keeps the row count of the whole table in a variable; executing the same statement only reads that variable, which is fast.
- InnoDB does not support full-text indexing, while MyISAM does; MyISAM has the higher query efficiency there.
The fundamentals of indexing
An index is a data structure used by a storage engine to quickly find records.
Indexes are used to quickly find records with particular values; without an index, the query generally traverses the entire table
The principle of indexing is to turn unordered data into ordered, queryable data:
- Sort the contents of the indexed columns
- Generate an inverted list from the sorted results
- Attach the data address chain to the inverted list entries
- When querying, first fetch the inverted list entry, then follow the data address chain to reach the actual data
Why isn’t hash used for MySQL indexes?
From the memory perspective, database indexes are usually stored on disk. Using a hash index means building a hash table in memory, but when a table holds a lot of data it may be impossible to load all of the indexed column data into memory at once. A B+ tree node, by contrast, can be set to the size of one data page, so each query loads only the few data pages it needs rather than the full index.
From the business perspective, if you only ever look up a single row by an exact condition, hash really is faster. But real workloads more often query a range of data under some condition. Because a B+ tree index is ordered and its leaf nodes are linked in a list, you can find the first match and then walk down the list to collect everything that satisfies the query. A hash index is unordered and cannot perform such range queries; it only supports exact-match lookups.
Why not use a red-black tree for MySQL indexes?
Search time in a tree is related to the tree’s height. A B+ tree is a multi-way search tree, which reduces the height and improves search efficiency. Moreover, the operating system’s minimum unit of disk I/O is a block, typically 4KB, so at least 4KB is read at a time. A red-black tree is a binary tree with at most two children per node; loading its nodes requires many random disk I/O operations, which is very inefficient.
MySQL clustered and non-clustered indexes
Both use the B+ tree data structure
Clustered index: data and index are stored together and organized in order, so finding the index means finding the data. The physical storage order of the data matches the index order; as long as index entries are adjacent, the corresponding rows are stored adjacently on disk.
Non-clustered index: leaf nodes do not store the data itself but the address of the data row, i.e. the index is used to locate the row and the data is then read from disk, much like a directory tree
Advantages
Queries can fetch data directly through the clustered index, which is more efficient than a non-clustered index that needs a second lookup (when the index does not cover the query)
Clustered indexes are efficient for range queries, because the data is sorted by key
Clustered indexes are suitable for sorting; non-clustered indexes are not
Disadvantages
Index maintenance can be expensive, especially when inserting new rows or updating primary keys causes page splits; after a large number of inserts, run OPTIMIZE TABLE during low load to reorganize the table
If a UUID (random ID) is used as the primary key, data storage becomes sparse, and the clustered index may end up slower than a full table scan; an INT AUTO_INCREMENT primary key is recommended instead
If the primary key is large, secondary indexes grow too, because secondary index leaves store the primary key value; an overly long primary key also makes non-leaf nodes occupy more physical space
InnoDB must have a primary key, and the primary key is the clustered index; if no unique index is available, InnoDB uses a hidden internal row ID as the primary key. An index built on top of the clustered index is called a secondary index, and accessing data through a secondary index always requires a second lookup. Non-clustered indexes are all secondary indexes, such as composite indexes, prefix indexes, and unique indexes.
MyISAM uses non-clustered indexes; there is no clustered index at all. Its two B+ trees look no different: the node structures are identical, only the stored content differs. The primary key index B+ tree stores primary key values, the auxiliary index B+ tree stores auxiliary key values, and the table data is stored in a separate place. The leaf nodes of both trees use an address to point to the actual row, and as far as the table data is concerned there is no difference between the two keys; since the index trees are independent, searching the auxiliary tree does not require going through the primary key.
When it comes to sorting large volumes of data, full table scans, and count operations, MyISAM still has the upper hand, because its indexes occupy less space and these operations need to be done in memory.
MySQL index data structure, pros and cons
B+ tree: a balanced multi-way search tree
Principles of index design
Make queries faster while taking up less space
What about the MySQL execution plan? (EXPLAIN)
explain select * from A where X=? and Y=?
id: the serial number of the SELECT; a set of numbers indicating the order in which SELECT clauses or table operations execute in the query
If the ids are the same, execution proceeds from top to bottom
If the ids differ, as with subqueries, the id sequence increases; a larger id means higher priority and earlier execution
Identical and different ids can coexist
Each id number marks an independent query; the fewer separate queries an SQL statement needs, the better
select_type: the type of the query, used to distinguish ordinary queries, union queries, and subqueries within complex statements
SIMPLE: A SIMPLE select query that does not contain subqueries or unions
PRIMARY: The outermost query is marked as PRIMARY if it contains any complex subparts
DERIVED: Subqueries contained in the FROM list are marked as DERIVED
MySQL performs these subqueries recursively, putting the results in temporary tables.
SUBQUERY: a subquery in the SELECT or WHERE list
DEPENDENT SUBQUERY: a subquery in the SELECT or WHERE list that depends on the outer query
UNCACHEABLE SUBQUERY: a subquery whose result cannot be cached
UNION: If the second SELECT appears after the UNION, it is marked as UNION;
If UNION is included in the subquery of the FROM clause, the outer SELECT will be marked: DERIVED
UNION RESULT: the SELECT that retrieves the result from the UNION’s temporary table
Table: Shows which table this row is about
Partitions: represents hits in partitioned tables, null for non-partitioned tables
Type: an important field for SQL optimization. It is also an important indicator for us to judge SQL performance and optimization degree
Possible_keys: Displays one or more possible indexes that can be applied to the table.
If an index exists on a field involved in the query, the index is listed but not necessarily used by the query
Key: indicates the actual index. If NULL, no index is used
Key_len: Indicates the number of bytes used in the index. This column is used to calculate the length of the index used in the query. The key_len field helps you check to see if you are making full use of the index
Ref: shows which column of the index is used, ideally a constant; i.e. which columns or constants are used to look up values on the index columns
Rows: The rows column shows the number of rows that MySQL thinks it must check when performing a query.
Filtered: the percentage of the rows examined by the storage engine that survive filtering at the server layer and match the query. Note that it is a percentage, not an absolute record count
Extra: Contains important additional information that is not suitable for display in other columns
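A quick illustration, assuming a user table with an index idx_name on (name):

```sql
EXPLAIN SELECT * FROM user WHERE name = 'Tom';
-- expected highlights: type = ref, key = idx_name, rows = the optimizer's estimate
```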
Index failure scenarios
- An index can be invalidated when a WHERE clause contains OR. (OR does not always invalidate the index; it depends on whether the columns on both sides of the OR hit the same index.)
- A negative query on an index column in the WHERE clause may invalidate the index. Negative queries include NOT, !=, <>, !<, !>, NOT IN, NOT LIKE, etc.
- If the index column is nullable, using IS NULL or IS NOT NULL may invalidate the index
- Using built-in functions on an index column is certain to invalidate the index
- Arithmetic on an index column is bound to invalidate the index
- A LIKE wildcard may invalidate the index (see the examples below)
- In a composite index, WHERE columns that violate the leftmost-prefix rule invalidate the index
- The MySQL optimizer may, as its final choice, decide not to use the index
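A few of these scenarios in SQL, assuming an index idx_name on user(name):

```sql
SELECT * FROM user WHERE LEFT(name, 3) = 'Tom';  -- function on the column: index not used
SELECT * FROM user WHERE name LIKE '%Tom';       -- leading wildcard: index not used
SELECT * FROM user WHERE name LIKE 'Tom%';       -- prefix match: the index can be used
```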
Principle of MySQL master-slave synchronization
The MySQL master/slave synchronization process
MySQL replication involves one thread on the master (the binlog dump thread) and two threads on the slave (the I/O thread and the SQL thread).
The master node’s binlog is the basis of master/slave replication: the master records all changes to the database in the binlog, a file that stores every change to database structure or content from the moment the database server starts.
When the binlog changes, the master’s binlog dump thread reads its contents and sends them to the slave nodes.
The slave node’s I/O thread receives the binlog content and writes it to the relay log.
The slave node’s SQL thread reads the relay log file and replays the data changes, ensuring consistency between the master and slave databases.
Note: the master and slave locate the synchronization position using the binlog file name + position offset. The slave saves the offset it has received; if the slave goes down and restarts, it automatically resumes synchronization from that position.
Since MySQL’s default replication mode is asynchronous, the master sends logs to the slave without caring whether the slave has processed them. This causes a problem: if the master dies while the slave has not finished processing its logs, those logs are lost when the slave is promoted to master.
Full synchronous replication
The master writes the binlog and forces the log to be synchronized to the slaves; it returns to the client only after all slaves have executed it, which obviously hurts performance
Semi-synchronous replication
In contrast to full synchronization, the logic of semi-synchronous replication is: the slave sends an ACK confirming the write back to the master, and the master considers the write complete once it receives at least one acknowledgement from a slave
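On MySQL 5.7-style setups, semi-synchronous replication ships as a plugin; a sketch of enabling it:

```sql
-- on the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- on the slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```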
MySQL index types and their impact on database performance
Normal index: the indexed column may contain duplicate values
Unique index: ensures the uniqueness of data records
Primary key: a special kind of unique index; a table can define only one primary key, which uniquely identifies a record
Joint index: a single index covering multiple data columns, e.g. Index(columnA, columnB)
Full-text index: built on an inverted index, it greatly improves retrieval efficiency and solves the problem of testing whether a field contains a value; it is a key search-engine technique
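How each type is created, on a hypothetical user table:

```sql
CREATE INDEX idx_age ON user(age);               -- normal index
CREATE UNIQUE INDEX uk_email ON user(email);     -- unique index
ALTER TABLE user ADD PRIMARY KEY (id);           -- primary key
CREATE INDEX idx_a_b ON user(columnA, columnB);  -- joint (composite) index
CREATE FULLTEXT INDEX ft_bio ON user(bio);       -- full-text index
```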
Indexes can greatly improve query speed.
By using indexes, the query optimizer can be leveraged during queries, improving system performance.
However, indexes slow down inserts, deletes, and updates, because the index files must be maintained while those writes are performed.
Indexes occupy physical space: beyond the data table itself, each index takes a certain amount of space, and clustered indexes take even more.
Well, that’s the end of today’s article. I hope it helps those of you sitting confused in front of the screen.