There is a lot of ground to cover in MySQL, and interviewers tend to probe for depth, typically how to tune it, rather than just MySQL basics and the like.

MySQL Interview FAQ

The difference between InnoDB and MyISAM

InnoDB

InnoDB is MySQL’s default “transactional” storage engine; other engines are worth considering only when you need a feature InnoDB does not support.

It implements the four standard isolation levels, with the default being “REPEATABLE READ”. At the repeatable read isolation level, “phantom reads are prevented by multi-version concurrency control (MVCC) plus next-key locking”.

The primary index is a “clustered index”: the row data is stored in the index’s leaf nodes, so a primary-key lookup needs no extra read from disk, which greatly improves query performance.

Internal optimizations include “read-ahead” for prefetching data from disk, an automatically created “adaptive hash index” that speeds up read operations, and an insert buffer that speeds up insert operations.

Supports true “online hot backup”. Other storage engines do not; with them, obtaining a consistent view requires stopping writes to all tables, and in a mixed read/write workload stopping writes may also mean stopping reads.

MyISAM

The design is simple, and data is stored in a “tight format”. It is still usable for read-only data, or for tables that are small enough that repair operations are tolerable.

Provides a number of features, including “compressed tables”, “spatial data indexes”, and more.

Transactions are not supported.

Row-level locking is not supported; only the entire table can be locked. A shared lock is taken on every table being read, and an exclusive lock on the table being written. However, new records can still be inserted into a table while it is being read, which is called CONCURRENT INSERT.

Check and repair operations can be performed manually or automatically; unlike transaction recovery and crash recovery, some data may be lost, and repair is very slow.

If the DELAY_KEY_WRITE option is specified, modified index data is not written to disk immediately after a modification completes; instead it is buffered in the in-memory key buffer, and the corresponding index blocks are written to disk only when the key buffer is flushed or the table is closed. This greatly improves write performance, but if the database or host crashes, the index is corrupted and must be repaired.
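A minimal sketch of enabling this option; the table and its columns are made up for illustration:

```sql
-- MyISAM-only table option: delay writing modified index blocks to disk
CREATE TABLE app_log (
    id  INT NOT NULL,
    msg VARCHAR(255),
    KEY idx_id (id)
) ENGINE=MyISAM DELAY_KEY_WRITE=1;
```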

Comparison

  • “Transactions”: InnoDB is transactional and supports COMMIT and ROLLBACK statements.
  • “Concurrency”: MyISAM only supports table-level locking, while InnoDB also supports row-level locking.
  • “Foreign keys”: InnoDB supports foreign keys.
  • “Backup”: InnoDB supports online hot backup.
  • “Crash recovery”: after a crash, MyISAM is corrupted far more often than InnoDB, and its recovery is slower.
  • “Other features”: MyISAM supports compressed tables and spatial data indexes.

How does MySQL execute a SQL statement

The entire process of SQL execution

MySQL can be divided into a server layer and a storage engine layer:

  1. “The server layer includes the connector, query cache, analyzer, optimizer, executor, and so on.” It covers most of MySQL’s core service functionality and all built-in functions (date, time, math, encryption functions, etc.); all cross-engine features, such as stored procedures, triggers, and views, are implemented in this layer.
  2. “The storage engine layer is responsible for storing and retrieving data.” Its architecture is pluggable, supporting InnoDB, MyISAM, Memory, and other storage engines. The most commonly used is InnoDB, which has been the default storage engine since MySQL 5.5.5.

The Server layer executes SQL in sequence as follows:

Client request -> connector (verifies user identity and grants privileges) -> query cache (if a cached result exists, it is returned directly) -> analyzer (lexical and syntax analysis of the SQL) -> optimizer (optimizes the SQL execution and selects the best execution plan) -> executor (first checks whether the user has execution privileges, then calls the engine layer to retrieve and return the data; if the query cache is enabled, the result is cached).

“In a nutshell”:

  • Connector: manages connections and authenticates permissions.
  • Query cache: If the cache is matched, the result is returned directly.
  • Analyzer: performs lexical and syntax analysis on the SQL. (This step also checks whether the fields referenced by the SQL exist.)
  • Optimizer: generates execution plans and selects indexes;
  • Executor: Operates the engine and returns results.
  • Storage engine: stores data and provides read/write interfaces.
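If you want to peek at what the optimizer decided for a concrete statement, EXPLAIN shows it; the users table below is hypothetical:

```sql
-- Shows the access type, the candidate indexes, and the index actually chosen
EXPLAIN SELECT * FROM users WHERE id = 10;
```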

MySQL ACID principles

ACID: Atomicity, Consistency, Isolation, Durability!

Let’s take the example of transferring 50 yuan from account A to account B to illustrate the four characteristics of ACID.
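In SQL, the transfer this section keeps referring to looks roughly like this (the account table and its columns are assumptions for illustration):

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 50 WHERE name = 'A';
UPDATE account SET balance = balance + 50 WHERE name = 'B';
COMMIT;  -- either both updates take effect, or neither does
```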

Atomicity

By definition, atomicity means that a transaction is an indivisible unit of work in which all of the operations are done or none are. That is, either the transfer succeeds or it fails; there is no state in between!

What happens if you can’t guarantee atomicity?

You get “inconsistent data”: 50 yuan is deducted from account A, but the operation crediting 50 yuan to account B fails. The system loses 50 yuan for no reason.

Isolation

By definition, isolation means that when multiple transactions are executed concurrently, “the operations within the transaction are isolated from other transactions” and the concurrently executed transactions cannot interfere with each other.

What if isolation is not guaranteed?

Suppose account A has 200 yuan and account B has 0 yuan. Account A transfers 50 yuan to account B twice, executed as two transactions. If isolation is not guaranteed, the following situation can occur:

(Figure: transaction isolation)

As shown in the figure, if isolation is not guaranteed, A is debited twice while B is credited only once; 50 yuan disappears into thin air, and “data inconsistency” appears again!

Durability

By definition, “durability means that once a transaction is committed, its changes to the database are permanent”. Subsequent operations or failures should not affect them in any way.

What if durability is not guaranteed?

In MySQL, to bridge the speed gap between the CPU and disk, data is loaded from disk into memory, operated on in memory, and then written back to disk. Now suppose there is a power failure: all the data modified in memory is lost, and durability is not guaranteed.

Imagine the system tells you the transfer succeeded, yet you later find the balance has not changed at all. The data is in an invalid state, which is a form of “data inconsistency”.

Consistency

By definition, consistency means the data is in a legitimate state, semantically rather than syntactically, both before and after a transaction executes. So what is a legitimate data state? A state that satisfies the predefined constraints is a legal state; more loosely, it is up to you to define. “If that state is satisfied the data is consistent; if not, the data is inconsistent”!

What if consistency is not guaranteed?

  • Example 1: Account A has 200 yuan and transfers 300 yuan out, leaving a balance of -100 yuan. You naturally see the data as inconsistent here. Why? Because you defined a constraint: the balance column must be greater than 0.
  • Example 2: Account A transfers 50 yuan to account B; 50 yuan is deducted from A, but due to some failure B’s balance does not increase. You also see the data as inconsistent here. Why? Because you defined a constraint: the sum of A’s and B’s balances must stay constant.

How does MySQL guarantee consistency?

There are two levels to this question. “At the database level”, the database ensures consistency through atomicity, isolation, and durability. In other words, among the four ACID properties, C (consistency) is the goal, while A (atomicity), I (isolation), and D (durability) are the means the database provides to ensure it. The database must implement A, I, and D to achieve consistency; if atomicity is not guaranteed, for example, consistency obviously cannot be guaranteed either.

However, if you deliberately write code in a transaction that violates your constraints, consistency is still not guaranteed. In the transfer case, if your code deliberately never credits account B, the data is still inconsistent. Therefore, the application layer must also be considered.

“At the application level,” the code checks whether the database data is valid and then decides whether to roll back or commit!

How does MySQL guarantee atomicity

By using InnoDB’s undo log. The undo log, also called the rollback log, is the key to achieving atomicity: when a transaction is rolled back, it can “undo all SQL statements that have already executed successfully”. To do this, it must record the information needed for the rollback. For example:

  • (1) When you delete a row, the undo log records that row, so a rollback re-inserts the old row
  • (2) When you update a row, the undo log records the old values, so a rollback updates the row back to those values
  • (3) When you insert a row, the undo log records its primary key, so a rollback deletes the row by that primary key

The undo log records the information needed for rollback. When a transaction fails, or a rollback is requested, this information is used to roll the data back to its original state.
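A minimal sketch of atomicity in action, continuing the hypothetical transfer example:

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 50 WHERE name = 'A';
-- suppose crediting B fails, so the application aborts:
ROLLBACK;  -- InnoDB uses the undo log to restore A's balance
```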

How does MySQL guarantee durability

By using InnoDB’s redo log. As mentioned earlier, MySQL loads data from disk into memory, modifies it in memory, and then flushes it back to disk. If there is a sudden power failure, the modified data in memory is lost. How do we solve this? Simple: just write the data to disk before the transaction commits. What’s wrong with that?

  • Flushing an entire page to disk when only a single byte changed wastes resources. After all, writing 16KB to disk makes little sense when you changed only a few bytes of a 16KB page.
  • The SQL in a single transaction may touch multiple data pages that are not contiguous on disk, which means random I/O, and random I/O is obviously slow.

Hence the redo log. When data is modified, the change is not only applied in memory but also recorded in the redo log. When a transaction commits, the redo log is flushed to disk (part of the redo log lives in an in-memory buffer, part on disk). When the database restarts after a crash, the redo log is replayed to restore the data, and the undo log and binlog are consulted to decide whether to roll back or commit.

What are the benefits of redo logs?

The benefit is that flushing the redo log is more efficient than flushing data pages, specifically:

  • The redo log is small: it records only which page was modified and how, so it flushes quickly.
  • The redo log is appended at the end, i.e. sequential I/O, which is obviously faster than random I/O.
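How aggressively the redo log is flushed at commit is controlled by a real InnoDB system variable, shown below:

```sql
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 1 (default): write and flush the redo log to disk at every commit (full durability)
-- 0 or 2: defer the flush to roughly once per second, trading durability for speed
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```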

How does MySQL guarantee isolation

By using locks and the MVCC mechanism.

Problems with concurrent transactions

Dirty read

T1 modifies a piece of data, and T2 reads it before T1 commits. If T1 then rolls back the modification, the data T2 read is invalid “dirty” data.

Lost update

Transactions T1 and T2 both modify the same data: “T1 modifies it first, T2 modifies it afterwards, and T2’s modification overwrites T1’s”. For example, transaction 1 reads A = 20 from a table, and transaction 2 also reads A = 20; transaction 1 sets A = A - 1, and transaction 2 also sets A = A - 1. The final result is A = 19, and transaction 1’s modification is lost.


Non-repeatable read

“T2 reads a piece of data, then T1 modifies it. If T2 reads the same data again, it gets a result different from the first read.”


Phantom read

T1 reads data in a certain range, T2 inserts new rows into that range, and when T1 reads the range again, the result differs from the first read.


The difference between non-repeatable reads and phantom reads

“Non-repeatable reads are about modification, while phantom reads are about insertion or deletion.”

Example 1: Mr. A in transaction 1 reads his salary as 1000, but before his transaction finishes, Mr. B in transaction 2 changes A’s salary to 2000. A then reads his salary again and gets 2000; this is a non-repeatable read.

Example 2 (the number of records read the first and second time differs under the same condition): a payroll table has 4 people with a salary above 3000. Transaction 1 reads everyone with a salary above 3000 and finds 4 records. Meanwhile, transaction 2 inserts another record with a salary above 3000, so when transaction 1 reads again it finds 5 records; this is a phantom read.

Database isolation levels

  1. Read uncommitted: changes are visible to other transactions even before they are committed. For example, I change A from 50 to 100 without committing, and another transaction already sees 100; if my transaction then rolls back, A is still 50, but the other transaction saw 100. “May cause dirty reads, phantom reads, and non-repeatable reads.”
  2. Read committed: any change a transaction makes is invisible to other transactions until it commits. For example, A is originally 50; a transaction reads it, changes it to 100, and commits. Another transaction that read A as 50 before the commit suddenly reads 100 afterwards. “Prevents dirty reads, but phantom reads and non-repeatable reads may still occur.”
  3. Repeatable read: reading the same record multiple times within a transaction always yields the same result; for example, A reads as the same value no matter how many times it is read. “Prevents dirty reads and non-repeatable reads, but phantom reads can still occur.”
  4. Serializable: under concurrency, the results are the same as if the transactions were executed serially. “This level prevents dirty reads, non-repeatable reads, and phantom reads.”
| Isolation level  | Dirty read | Non-repeatable read | Phantom read |
| ---------------- | ---------- | ------------------- | ------------ |
| READ-UNCOMMITTED | ✓          | ✓                   | ✓            |
| READ-COMMITTED   | ✗          | ✓                   | ✓            |
| REPEATABLE-READ  | ✗          | ✗                   | ✓            |
| SERIALIZABLE     | ✗          | ✗                   | ✗            |

(✓ = can occur, ✗ = prevented)

The default isolation level supported by MySQL InnoDB storage engine is REPEATABLE-READ.

“A note of caution here”: unlike the SQL standard, the InnoDB storage engine uses the “next-key lock” algorithm at the REPEATABLE READ isolation level, thereby avoiding phantom reads. This differs from other database systems such as SQL Server. So InnoDB’s default REPEATABLE-READ level already fully guarantees transaction isolation, i.e. it meets the SQL standard’s SERIALIZABLE isolation requirement.

Most database systems default to READ-COMMITTED, because a lower isolation level means fewer and shorter-lived locks. But remember: the InnoDB storage engine uses “REPEATABLE READ” by default without any significant performance penalty.

The InnoDB storage engine typically uses the SERIALIZABLE isolation level for distributed transactions.
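To inspect or change the level in practice (the variable name below is for MySQL 8.0; MySQL 5.7 and earlier call it tx_isolation):

```sql
SELECT @@transaction_isolation;   -- e.g. REPEATABLE-READ
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```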

Why use indexes

  • Creating a unique index guarantees the uniqueness of each row of data in a database table.
  • It can greatly “speed up data retrieval”, which is the primary reason for creating an index.
  • It helps the server “avoid sorting and temporary tables”.
  • It can turn random I/O into sequential I/O.
  • It can speed up “joins between tables”, which is especially meaningful for enforcing referential integrity.

With all the advantages of indexes, why not create an index on every column of the table

  • When data in a table is added, deleted, or modified, “the indexes must be maintained dynamically”, which slows down data maintenance.
  • “Indexes occupy physical space”: beyond the space taken by the data table itself, each index takes a certain amount of physical space, and a clustered index takes even more.
  • “Creating and maintaining indexes takes time”, and that time grows with the volume of data.

How does an index speed up queries

It turns unordered data into (relatively) ordered data, so a lookup can be targeted instead of scanning everything.

Considerations for using indexes

  • Create indexes on frequently searched columns to speed up searches;
  • Create indexes on columns that often appear in the WHERE clause to speed up condition evaluation;
  • Create indexes on columns that often need to be sorted; because an index is already sorted, queries can use it to shorten sort time;
  • Indexes are very effective on medium and large tables, but on extremely large tables the maintenance cost may be too high to justify them;
  • Create indexes on frequently joined columns, mainly foreign keys, to speed up joins;
  • Avoid applying a function to a field in the WHERE clause, as that prevents the index from being used (see the sketch after this list);
  • With InnoDB, use a business-independent auto-increment column as the primary key, i.e. a surrogate key rather than a business key;
  • Declare indexed columns NOT NULL, otherwise the engine may abandon the index and do a full table scan;
  • Drop indexes that have not been used for a long time; unused indexes cause unnecessary performance loss;
  • When paging with LIMIT ... OFFSET, indexes can be used to improve performance.
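A sketch of the function-on-column pitfall mentioned above, using a hypothetical orders table:

```sql
CREATE INDEX idx_created_at ON orders (created_at);

-- Can use idx_created_at: the range condition is on the bare column
SELECT * FROM orders
WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';

-- Cannot use it: the function hides the column, forcing a full scan
SELECT * FROM orders WHERE YEAR(created_at) = 2023;
```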

The two main data structures used by MySQL indexes

  • “Hash index”: the underlying data structure is a hash table. When the vast majority of queries are single-record lookups, a hash index gives the fastest query performance. In most other scenarios, the BTree index is the recommended choice.
  • “BTree index”: MySQL’s BTree index actually uses a B+Tree. However, the two main storage engines, MyISAM and InnoDB, implement it differently.

How MyISAM and InnoDB implement the BTree index differently

  • In MyISAM, “the data field of a B+Tree leaf node stores the address of the data record”. During an index lookup, the index is searched with the B+Tree search algorithm first; if the key exists, the value of its data field is taken as an address and used to read the corresponding data record. This is called a “non-clustered index”.
  • In InnoDB, “the table data file is itself an index file”. Unlike MyISAM, where index files and data files are separate, “the InnoDB table data file is itself a B+Tree-organized index structure, and the data field of the tree’s leaf nodes stores complete data records”. The key of this index is the table’s primary key, so the InnoDB data file itself is the primary index. The data field of a secondary index stores the corresponding primary key value rather than an address, which also differs from MyISAM. A lookup via the primary index finds the node holding the key and retrieves the data directly; a lookup via a secondary index first fetches the primary key value and then walks the primary index. Therefore, when designing tables, using long fields or non-monotonic fields as the primary key is not recommended, as this causes frequent splits of the primary index (see the sketch below).
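A minimal sketch of that advice: a short, monotonic surrogate key as the clustered primary key (the table and columns are hypothetical):

```sql
CREATE TABLE users (
    id    BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,  -- clustered index key
    email VARCHAR(100) NOT NULL,
    UNIQUE KEY uk_email (email)  -- secondary index: leaves store (email, id)
) ENGINE=InnoDB;
```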

Database structure optimization

  • “Normalization”: for example, eliminating redundancy (saving space);
  • “Denormalization”: for example, adding appropriate redundancy (reducing joins);
  • “Limiting the data range”: queries without any condition limiting the range of data must be forbidden. For example, when users query order history, we can restrict it to one month;
  • “Read/write splitting”: the classic database scaling scheme; the master handles writes, the replicas handle reads;
  • “Partitioned tables”: partitioning physically separates data, and different partitions can be stored in data files on different disks. A query on the table then only needs to scan the relevant partitions rather than the full table, significantly shortening query time; spreading partitions across disks also spreads the table’s I/O across those disks, so a carefully designed partitioning scheme can distribute disk I/O evenly. This approach suits time-based tables with large data volumes; partitions can be created automatically by month (see the sketch below).
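A minimal sketch of monthly partitioning; the table is made up for illustration (note MySQL requires the partitioning column to be part of every unique key, hence the composite primary key):

```sql
CREATE TABLE user_actions (
    id          BIGINT NOT NULL,
    action_time DATETIME NOT NULL,
    payload     VARCHAR(255),
    PRIMARY KEY (id, action_time)
)
PARTITION BY RANGE (TO_DAYS(action_time)) (
    PARTITION p202301 VALUES LESS THAN (TO_DAYS('2023-02-01')),
    PARTITION p202302 VALUES LESS THAN (TO_DAYS('2023-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```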

Splitting comes in two flavors, vertical and horizontal:

  • Case: a quickly built, simple shopping system involves the following tables:
  • 1. Product table (10W rows, stable)
  • 2. Order table (200W rows, growing)
  • 3. User table (100W rows, growing)
  • Taking MySQL as the example for describing horizontal and vertical splits: MySQL can tolerate single tables on the order of millions of rows, up to tens of millions for static data.
  • Vertical split:
    • Problem solved: I/O contention between tables
    • Problem not solved: the pressure from a single table’s growing data volume
    • Scheme: put the product table and user table on one server, and the order table on its own server
  • Horizontal split:
    • Problem solved: the pressure from a single table’s growing data volume
    • Problem not solved: I/O contention between tables
  • Scheme: split the “user table” by gender into a male-user table and a female-user table; split the “order table” by status into completed orders and uncompleted orders. Then place the uncompleted-orders table on one server, the completed-orders table together with the male-user table on another server, and the female-user table on a third server.

What are superkeys, candidate keys, primary keys, and foreign keys

  • “Superkey”: a set of attributes that uniquely identifies a tuple in a relation (relational schema) is called a superkey.

  • “Candidate key”: a superkey with no “redundant attributes” is called a candidate key. That is, if you delete any attribute from a candidate key, it is no longer a key!

  • “Primary key”: “a candidate key chosen by the designer as the tuple identifier”.

  • “Foreign key”: if attribute K of relational schema R is the primary key of another schema, then “K is called a foreign key in schema R”.

Examples:

| Student ID | Name        | Gender | Age | Department | Major                |
| ---------- | ----------- | ------ | --- | ---------- | -------------------- |
| 20020612   | Li Hui      | Male   | 20  | Computer   | Software Development |
| 20060613   | Zhang Ming  | Male   | 18  | Computer   | Software Development |
| 20060614   | Wang Xiaoyu | Female | 19  | Physics    | Mechanics            |
| 20060615   | Li Shuhua   | Female | 17  | Biology    | Zoology              |
| 20060616   | Jing        | Male   | 21  | Chemistry  | Food Chemistry       |
| 20060617   | Jing        | Female | 20  | Biology    | Botany               |
  1. Superkey: from the example we can see that the student ID uniquely identifies a student entity, so the student ID is a superkey; so is any combination of the student ID with other attributes, such as (Student ID, Gender) or (Student ID, Age).
  2. Candidate key: the student ID alone uniquely identifies a tuple, so the student ID is a candidate key. In fact, a candidate key is a minimal superkey: (Student ID, Age), for example, is a superkey but not a candidate key, because it contains a redundant attribute.
  3. Primary key: simply put, the candidate key here is the student ID; if we select it as the unique identifier of the tuple, then the student ID is the primary key.
  4. A foreign key is defined relative to a primary key. For example, in the student table the primary key is the student ID; the transcript table also has a student ID field, so the student ID is a foreign key of the transcript table and the primary key of the student table.

Primary keys are a subset of candidate keys, candidate keys are a subset of superkeys, and foreign keys are determined relative to primary keys.
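A sketch of the student/transcript relationship in SQL (the table layout is illustrative):

```sql
CREATE TABLE student (
    student_id CHAR(8) PRIMARY KEY,   -- primary key of student
    name       VARCHAR(50)
);

CREATE TABLE transcript (
    student_id CHAR(8),               -- foreign key referencing student
    course     VARCHAR(50),
    score      INT,
    PRIMARY KEY (student_id, course),
    FOREIGN KEY (student_id) REFERENCES student(student_id)
);
```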

Differences between DROP, DELETE, and TRUNCATE

  • DROP drops the table;
  • TRUNCATE deletes all data in the table; when data is inserted afterwards, the auto-increment ID restarts from 1;
  • DELETE deletes data from the table, optionally restricted by a WHERE clause.
  1. The DELETE statement deletes one row at a time and records each row deletion in the transaction log so it can be rolled back. TRUNCATE TABLE deletes all data from the table at once without logging individual row deletions; deleted rows cannot be restored, no DELETE triggers on the table are activated during the process, and execution is fast.
  2. Space occupied by the table and its indexes: after TRUNCATE, the space occupied by the table and indexes is restored to its initial size; DELETE does not reduce the space occupied by the table or indexes; DROP frees all the space the table occupied.
  3. In general, the speed ranking is DROP > TRUNCATE > DELETE.
  4. Scope of application: TRUNCATE can only operate on a TABLE; DELETE can operate on a table or a view.
  5. TRUNCATE and DELETE delete only data, while DROP deletes the entire table (structure and data).
  6. TRUNCATE and DELETE without WHERE delete only data, not the table structure. DROP also removes dependent constraints, triggers, and indexes; stored procedures/functions that depend on the table are retained but become INVALID.
  7. The DELETE statement is DML (Data Manipulation Language): the operation is written to the rollback segment and takes effect only when the transaction commits. If there is a corresponding trigger, it fires at execution time.
  8. TRUNCATE and DROP are DDL (Data Definition Language): the operation takes effect immediately, the original data is not written to the rollback segment, and it cannot be rolled back.
  9. Without a backup, use DROP and TRUNCATE with extreme caution. To delete some rows, use DELETE, taking care to constrain the scope with WHERE, and make sure the rollback segment is large enough. To remove a table, use DROP. To keep the table but delete its data, use TRUNCATE if the data is unrelated to any transaction; if it is transaction-related, or you want triggers to fire, use DELETE.
  10. TRUNCATE TABLE is fast and efficient. It is functionally identical to a DELETE statement without a WHERE clause: both remove all rows. But TRUNCATE TABLE is faster and uses fewer system and transaction log resources: DELETE removes one row at a time and writes a transaction log entry for each deleted row, whereas TRUNCATE TABLE deletes data by deallocating the data pages that store the table’s data, logging only the page deallocations.
  11. TRUNCATE TABLE deletes all rows, but the table structure and its columns, constraints, and indexes remain unchanged. The counter used for new rows’ identity values is reset to the column’s seed. To preserve the identity counter, use DELETE instead. To remove the table definition along with its data, use DROP TABLE.
  12. For tables referenced by a FOREIGN KEY constraint, use DELETE without a WHERE clause instead of TRUNCATE TABLE. Because TRUNCATE TABLE is not logged row by row, it cannot activate triggers.
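Side by side, on a hypothetical table t:

```sql
DELETE FROM t WHERE id > 100;  -- DML: row by row, logged, rollback-able, WHERE allowed
TRUNCATE TABLE t;              -- DDL: deallocates pages, resets AUTO_INCREMENT, no WHERE
DROP TABLE t;                  -- removes the table definition and data entirely
```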

What does a view do, and can a view be updated

A view is a virtual table. Unlike a table that contains data, a view contains only a query that retrieves data dynamically when the view is used; it holds no columns or data itself. Views simplify complex SQL operations, hide details, and protect data. Once created, a view can be used in much the same way as a table.

A view cannot be indexed and cannot have associated triggers or default values; and if the view itself contains an ORDER BY, it is overridden by an ORDER BY in the query against the view.

Create a view: CREATE VIEW ... AS SELECT ...

Some views are updatable, provided they do not use constructs such as DISTINCT, UNION, joins, subqueries, or grouping; updates to such a view update the underlying base table. But views are primarily used to simplify retrieval and protect data, not for updates, and most views are not updatable.
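A small sketch with hypothetical tables:

```sql
-- Hides the join behind a simple "table"
CREATE VIEW order_summary AS
SELECT o.id, u.name AS user_name, o.total
FROM orders o
JOIN users u ON u.id = o.user_id;

SELECT * FROM order_summary WHERE total > 100;
-- Because this view contains a join, it is not updatable.
```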

Database normal forms

First normal form

In any relational database, first normal form (1NF) is the basic requirement for a relational schema; a database that does not satisfy 1NF is not a relational database. First normal form means that every column of a database table is an indivisible basic data item: the same column cannot hold multiple values, i.e. an attribute of an entity cannot have multiple values or repeating attributes. If a repeating attribute appears, you may need to define a new entity composed of those repeating attributes, in a one-to-many relationship with the original entity. In 1NF, each row of a table contains information about only one instance. In short, “first normal form means no repeating columns”.

Second normal form

Second normal form (2NF) builds on first normal form: to satisfy 2NF, a table must first satisfy 1NF. 2NF requires that every instance or row in a database table be uniquely distinguishable. That usually means adding a column to store the unique identifier of each instance; this unique attribute column is called the primary key (primary code). 2NF further requires that the non-key attributes depend fully on the primary key. Full dependence means there can be no attribute that depends on only part of the primary key; if there is, that attribute and that part of the key should be split out to form a new entity, in a one-to-many relationship with the original entity. In short, “second normal form means no non-key attribute partially depends on the primary key”.

Third normal form

To satisfy third normal form (3NF), a table must first satisfy second normal form (2NF). In short, 3NF requires that a table not contain non-key information that is already contained in another table. For example, suppose there is a department table in which each department has a department id (dept_id), a department name, and a department introduction. Once the employee table lists the department id, the department name, introduction, and other department-related fields must not be added to the employee table. If the department table did not exist, it should also be built according to 3NF, otherwise there would be a great deal of data redundancy. In a nutshell, third normal form means attributes do not depend on other non-key attributes. (My understanding: it eliminates redundancy; see the sketch below.)
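A sketch of that example as a schema (the column types are assumptions):

```sql
CREATE TABLE department (
    dept_id    INT PRIMARY KEY,
    dept_name  VARCHAR(50),
    dept_intro VARCHAR(255)
);

-- The employee row stores only dept_id; the name/intro stay in department
CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(50),
    dept_id  INT,
    FOREIGN KEY (dept_id) REFERENCES department(dept_id)
);
```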

What is a covering index

If an index contains (covers) the values of all the fields a query needs, it is called a “covering index”. As we saw, in InnoDB the leaf nodes of a non-primary-key index store the indexed column values plus the primary key, so a query normally has to “go back to the table”, i.e. do a second lookup by primary key, which is slower. With a covering index, the queried columns correspond exactly to the index, so no back-to-table operation is needed!
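A sketch on a hypothetical orders table; EXPLAIN showing “Using index” in the Extra column is MySQL’s signal that a covering index was used:

```sql
CREATE INDEX idx_user_status ON orders (user_id, status);

-- Needs columns outside the index, so it must go back to the table:
SELECT * FROM orders WHERE user_id = 42;

-- Covered: every selected column lives in the index itself
SELECT user_id, status FROM orders WHERE user_id = 42;
```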

All kinds of trees

No detailed introduction here; you can read about all kinds of trees here.

Writing this was not easy; if you found it helpful, please give it a little star. GitHub address 😁 😁 😁
