Q: What is the MySQL architecture?

A: It can be summarized as three layers: the Client Connectors layer, the MySQL Server layer, and the storage engine layer.

  • Client Connectors layer: processes connection requests from clients and establishes connections with them.
  • MySQL Server layer:
    • Connector: manages connections and verifies permissions;
    • Parser: lexical analysis and syntax analysis;
    • Optimizer: execution plan generation and index selection;
    • Executor: operates the storage engine and returns the execution result.
  • Storage engine layer: stores data and provides read/write interfaces.
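
The storage engine layer is pluggable. As a quick sketch using standard MySQL commands (nothing here is specific to this article), you can list the engines a server offers and see which one is the default:

-- List available storage engines and which is the default
SHOW ENGINES;
SHOW VARIABLES LIKE 'default_storage_engine';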

Transaction-related questions

Q: What are the isolation levels of MySQL transactions? How are these isolation levels implemented?

A:

  • Read uncommitted (RU): a transaction's changes can be seen by other transactions before it commits, so dirty reads occur. Implementation: each read simply returns the latest version of the row; no ReadView is used.
  • Read committed (RC): a transaction's changes become visible to other transactions only after it commits. Non-repeatable reads and phantom reads can still occur. Implementation: MVCC, generating a new ReadView before every SELECT operation.
  • Repeatable read (RR): the data seen during a transaction's execution is always the same as the data seen when the transaction started; naturally, uncommitted changes by other transactions are also invisible. Implementation: MVCC, generating a ReadView only before the first SELECT and reusing that same ReadView for all subsequent queries.
  • Serializable (S): for the same row, both reads and writes take locks. When a read-write lock conflict occurs, the later transaction must wait for the earlier one to finish. Implementation: locking on every access.
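
For reference, a minimal sketch of inspecting and changing the isolation level (the variable is named transaction_isolation in MySQL 8.0; older versions use tx_isolation):

-- Check the isolation level of the current session (MySQL 8.0 name)
SELECT @@transaction_isolation;

-- Switch the current session to READ COMMITTED
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;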

Q: How does the repeatable read isolation level behave?

A: At the repeatable read isolation level, a read-view is created when transaction T starts. During T's execution, even if other transactions modify the data, T still sees the data exactly as it was at the start. In other words, a transaction running at the repeatable read level appears oblivious to the outside world.

Q: What is a typical application scenario for repeatable read?

A: Suppose you are managing a personal bank account system with two tables: one holds the month-end balance, the other holds the statement details. When doing data reconciliation, i.e. checking whether the difference between last month's balance and the current balance matches this month's statement details, you want the check to produce a consistent result even if a user makes a new transaction while it runs. Repeatable read gives exactly that: the whole reconciliation sees one consistent snapshot.
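
A minimal two-session sketch of that reconciliation, assuming a hypothetical statement_detail table; the point is that session A's repeated sums agree even though session B commits in between:

-- Session A: start the proofreading transaction at repeatable read
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT SUM(amount) FROM statement_detail WHERE month = '2024-01';  -- hypothetical table/columns

-- Session B (another connection) meanwhile commits a new transaction:
--   INSERT INTO statement_detail (month, amount) VALUES ('2024-01', 100.00);
--   COMMIT;

-- Session A: the same query still returns the original sum, so the
-- reconciliation stays internally consistent
SELECT SUM(amount) FROM statement_detail WHERE month = '2024-01';
COMMIT;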

Question: What is a phantom read? How are phantom reads solved?

A: A phantom read occurs when a transaction queries the same range twice and the second query sees rows that the first query did not.

  • At the repeatable read isolation level, ordinary queries are snapshot reads and do not see data inserted by other transactions; phantom reads therefore appear only under "current reads" (such as select ... for update).
  • A phantom read refers only to newly inserted rows; modifications to existing rows seen by a current read do not count as phantom reads.

There are two ways to solve phantom reads (see the sketch below):

  • Use the MVCC mechanism, so that ordinary queries are snapshot reads.
  • Use gap locks, which (combined with row locks as next-key locks) block inserts into the scanned range.
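
A sketch of the gap-lock approach, assuming a table t with an indexed column c at the repeatable read level; the current read takes next-key locks over the scanned range, so inserts into the gap block:

-- Session A: a current read locks the records in the range and the gaps between them
BEGIN;
SELECT * FROM t WHERE c BETWEEN 10 AND 20 FOR UPDATE;

-- Session B (another connection): this insert falls into the locked gap
-- and blocks until session A commits or rolls back
--   INSERT INTO t (id, c) VALUES (15, 15);

COMMIT;  -- session A releases the locks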

Question: Why can InnoDB support non-locking reads?

A: InnoDB supports non-locking reads through MVCC, which relies on the undo log. The undo log stores multiple versions of each record; it supports both transaction rollback and consistent reads.

Question: What does MVCC do?

A: MVCC (Multi-Version Concurrency Control) means that under the READ COMMITTED and REPEATABLE READ isolation levels, ordinary SELECT operations access the version chain of each record. This lets read-write and read-read operations of different transactions run concurrently, improving system performance. A key difference between READ COMMITTED and REPEATABLE READ is when the ReadView is generated. A ReadView stores information about the transactions currently executing and is used to judge version visibility. Under READ COMMITTED, a new ReadView is generated before every ordinary SELECT; under REPEATABLE READ, a ReadView is generated only before the first ordinary SELECT and is reused for all subsequent queries.

To sum up:

  • Two premises

    1. Each complete record is in the clustered index leaf node.
    2. Each record has multiple versions stored in the undo log. Each version carries the transaction ID (trx_id) of the transaction that changed it and a pointer to the previous version (roll_pointer). An undo log entry is written for every change to a record, and each entry has a roll_pointer attribute (the undo log for an INSERT does not, because the record has no older version). Following roll_pointer chains these undo log entries together into a linked list: the version chain.

    (Figures: one shows how a row of data is stored in a primary key index leaf node and in the undo log; the other shows the version chain of multiple versions in the undo log.)

  • With the above two premises, we can analyze how the READ COMMITTED and REPEATABLE READ isolation levels are implemented under MVCC. The core issue for transactions at these levels is determining which version in the version chain is visible to the current transaction. To this end, the InnoDB designers proposed the concept of a ReadView. A ReadView contains:

    • m_ids: the IDs of the currently active transactions (started but not yet committed).
    • min_trx_id: the smallest active transaction ID, i.e. the minimum value in m_ids.
    • max_trx_id: the ID that the system will assign to the next transaction at the moment the ReadView is generated.
    • creator_trx_id: the ID of the transaction that created the ReadView.

    So how is a version in the version chain judged visible to the current transaction? By comparing the version's trx_id with the ReadView:

    • trx_id = creator_trx_id in the ReadView: the current transaction is accessing a record it modified itself; visible.
    • trx_id of the accessed version < min_trx_id in the ReadView: the transaction that produced this version committed before the ReadView was generated; visible.
    • trx_id >= max_trx_id in the ReadView: the transaction that produced this version started after the ReadView was generated; invisible.
    • min_trx_id <= trx_id < max_trx_id: check whether trx_id is in the ReadView's m_ids. If yes, the transaction is still active and the version is invisible; if no, the transaction has committed and the version is visible.

    If a version is not visible to the current transaction, follow the version chain to the next (older) version and repeat the check, and so on until the last version in the chain. If even the last version is invisible, the record is not visible to the transaction at all and is excluded from the query result.

Question: How does MySQL solve dirty reads, non-repeatable reads, and phantom reads?

A: There are two solutions:

  • Scheme 1: read operations use multi-version concurrency control (MVCC); write operations take locks. Under MVCC, a ReadView is generated and used to find the matching version of each record (historical versions are rebuilt from the undo log). A query can only see changes committed before the ReadView was generated; changes by transactions that were uncommitted at that point, or that started later, are invisible. Writes always operate on the latest version of a record. Reading a historical version does not conflict with changing the latest version, so under MVCC reads and writes do not conflict. The ReadView itself guarantees that a transaction cannot read changes made by uncommitted transactions, avoiding dirty reads. At the REPEATABLE READ level, only the first SELECT in a transaction generates a ReadView, which is reused by all subsequent SELECTs, avoiding non-repeatable reads and phantom reads. Reads using MVCC are often called consistent reads, or consistent non-locking reads.

  • Scheme 2: both read and write operations take locks. Dirty reads occur because the current transaction reads a record modified by another, uncommitted transaction. If the other transaction locks the record while writing it, the current transaction cannot read it until the lock is released, so dirty reads cannot happen.

    Non-repeatable reads occur because the current transaction reads a record, another transaction then modifies the record and commits, and the current transaction reads it again and gets a different value. If the current transaction locks the record while reading it, the other transaction cannot modify it, so non-repeatable reads naturally cannot happen.

    Phantom reads occur because the current transaction reads a range of records, another transaction inserts new records into that range, and the current transaction reads the range again and finds the new rows. These newly inserted records are called phantom records. The difficulty with locking is that the phantom records do not exist at the time of the transaction's first read, so the records themselves cannot be locked; this is why gap locks exist. Reads that take locks are called locking reads.

Q: What is the difference between begin/start transaction and start transaction with consistent snapshot?

A: The begin/start transaction commands are not the actual starting point of a transaction; the transaction really starts at the first statement that operates on an InnoDB table after them. If you want the transaction, and its consistent snapshot, to start immediately, use the start transaction with consistent snapshot command.
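
A small sketch of the difference:

-- With begin, the snapshot is taken only at the first InnoDB statement:
BEGIN;
-- ...any amount of time may pass; the transaction has not really started...
SELECT * FROM T;   -- the consistent snapshot is created here

-- With consistent snapshot, the snapshot is taken immediately:
START TRANSACTION WITH CONSISTENT SNAPSHOT;
-- the view is already fixed, before any table is touched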

Index-related questions

Question: What are the common models for indexing?

A:

  • Hash table: this structure suits only equality queries and is unfriendly to range queries.
  • Ordered array: excellent for both equality and range queries; judged by query efficiency alone, it is the best data structure. But updates are costly: inserting a record in the middle means moving every record behind it. Ordered-array indexes therefore suit only static storage engines.
  • Search tree (e.g. the B+ tree used by InnoDB).

Question: Why use N-ary trees instead of binary trees for indexes?

A: A binary tree has only two children per node, whereas an N-ary tree can have many more. For the same number of nodes, the binary tree is therefore much taller, so a lookup must visit more nodes, which means more data blocks and more disk accesses. (Quoted from a teacher's article.)

Question: What is the difference between a primary key index and a normal index?

A: The leaf nodes of a primary key index store entire rows of data; in InnoDB, the primary key index is also called the clustered index. The leaf nodes of a non-primary-key index store the primary key value; in InnoDB, non-primary-key indexes are also called secondary indexes. Queries on a non-primary-key index may need to go back to the table, and those back-to-table lookups are random I/O.

Q: What is the difference between a B+ Tree index and a Hash index?

Answer: There are mainly the following differences:

  • Hash indexes are good for equality queries but not for range queries.
  • A hash index cannot be used for index-based sorting: it is a (K,V) structure with no ordering across entries, so it naturally cannot sort.
  • Hash indexes do not support left-most prefix matching for multi-column joint indexes.
  • Hash indexes are inefficient when there are many duplicate keys, because of hash collisions.
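
InnoDB does not let you create hash indexes yourself (it only has an internal adaptive hash index), but the MEMORY engine does, which makes the limitations easy to see in a sketch:

-- The MEMORY engine supports explicit hash indexes
CREATE TABLE hash_demo (
  k INT NOT NULL,
  v VARCHAR(16),
  INDEX USING HASH (k)
) ENGINE = MEMORY;

SELECT v FROM hash_demo WHERE k = 42;               -- equality: can use the hash index
SELECT v FROM hash_demo WHERE k BETWEEN 10 AND 42;  -- range: cannot use it
SELECT v FROM hash_demo ORDER BY k;                 -- sort: cannot use it either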

Q: What MySQL index related optimizations have been made?

A:

  • Use primary key queries where possible: all the data lives in the clustered index, which avoids the back-to-table cost of a normal index query.
  • MySQL 5.6 introduced index condition pushdown; with well-designed joint indexes it reduces the number of back-to-table lookups.
  • If a column is queried frequently, consider a covering index to avoid back-to-table lookups entirely (so be careful with select *, which defeats covering indexes).
  • Choose the field order of a joint index for reusability and space. For reusability, by the left-most prefix principle, put the high-frequency field left-most. For space, if two fields need both separate and joint queries, create the joint index with the larger field first and a single-field index on the smaller field.

Question: What are the benefits of auto-increment primary keys?

A:

  • In terms of performance, an auto-increment primary key fits the append-only insert pattern: each insert is appended in order, which is efficient, does not move other records, and does not trigger page splits. A business-logic primary key is often hard to insert in order, making writes comparatively expensive.
  • In terms of space, suppose the table does have a unique field, such as a string ID-card number, and you use it as the primary key. Every leaf node of every secondary index stores the primary key value, so each secondary index entry would need about 20 bytes, versus 4 bytes for an int primary key or 8 bytes for a bigint. The smaller the primary key, the smaller the secondary index leaf nodes and the less space the secondary indexes occupy.

Question: Are there any scenarios where you can use business fields directly as primary keys?

A: Yes. For example, some business scenarios require something like this:

  1. There is only one index;
  2. The index must be unique.

This is a typical KV scenario (e.g. a dictionary table). Since there are no other indexes, there is no need to worry about the leaf-node size of other indexes. In this case the priority is "use the primary key index where possible": make this field the primary key, so each query searches only one tree instead of two.

Problem: In table T below, if I execute select * from T where k between 3 and 5, how many tree searches will be performed, and how many rows will be scanned?

A: First, the initialization script:

mysql> create table T (
  ID int primary key,
  k int NOT NULL DEFAULT 0,
  s varchar(16) NOT NULL DEFAULT '',
  index k(k)
) engine=InnoDB;

insert into T values (100,1,'aa'), (200,2,'bb'), (300,3,'cc'), (500,5,'ee'), (600,6,'ff'), (700,7,'gg');

Now let’s take a look at the execution flow of this SQL query:

  1. Find the record k=3 in the k index tree and obtain ID=300;
  2. Go to the primary key index tree and fetch the row with ID=300 (R3);
  3. Take the next value in the k index tree, k=5, and obtain ID=500;
  4. Go back to the primary key index tree and fetch the row with ID=500 (R4);
  5. Take the next value in the k index tree, k=6, which does not satisfy the condition; the loop ends.

In this process, the step of going back to the primary key index tree to search is called "back to table" (a table lookup). This query reads three records from the k index tree (steps 1, 3, and 5) and returns to the table twice (steps 2 and 4).

Question: What is the difference between select k from T where k between 3 and 5 and select * from T where k between 3 and 5? (Table T is the same as above.)

A: In the k index tree, the leaf nodes store the value of k and the primary key ID, so select k can be answered entirely from the k index: a covering index, with no back-to-table lookups. select * also needs column s, which is not in the k index, so it must return to the table.
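
You can verify this with EXPLAIN: for the covering query, the Extra column should show "Using index", while select * will not (assuming table T as defined above):

EXPLAIN SELECT k FROM T WHERE k BETWEEN 3 AND 5;  -- Extra: Using index (covering)
EXPLAIN SELECT * FROM T WHERE k BETWEEN 3 AND 5;  -- needs back-to-table lookups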

Question: is it necessary to have a joint index of id number and name on a citizen information sheet? Table deconstruction is as follows:

CREATE TABLE `tuser` (
  `id` int(11) NOT NULL,
  `id_card` varchar(32) DEFAULT NULL,
  `name` varchar(32) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `ismale` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `id_card` (`id_card`),
  KEY `name_age` (`name`,`age`)
) ENGINE=InnoDB

A: The ID number is the unique identifier of a citizen, so for queries that look up citizen information by ID number alone, an index on the id_card field is enough. But if there is now a high-frequency request to look up a citizen's name by ID number, the joint index (id_card, name) makes sense: it covers this request, eliminating the back-to-table lookup for the whole row and reducing statement execution time. Of course, index maintenance has a cost, so creating redundant indexes to support covering queries is a trade-off.

Question: when creating a federated index, how to arrange the field order in the index?

A: The evaluation criteria are

  • Index reusability: since the left-most prefix is supported, when a joint index (a,b) already exists there is usually no need for a separate index on a. So the first rule of thumb: if adjusting the order lets you maintain one less index, that order is usually preferred. Also, because MySQL follows left-most prefix matching (retrieval starts from the left-most column of the joint index), put the column most frequently used in WHERE clauses on the left.
  • Space usage: if fields a and b each need both separate and joint queries, you will need the joint index (a,b) plus a single-field index on b. The principle is to put the single-field index on the smaller field. For example, if the name field is larger than the age field, create the joint index (name, age) and a single-field index (age).

Question: What is index condition pushdown? (Analyzed using the citizen table above.)

Answer: Take the joint index (name, age) of the citizens table above. Suppose there is a requirement: retrieve all the boys in the table whose name begins with "zhang" and whose age is 10. The SQL statement looks like this:

mysql> select * from tuser where name like 'zhang%' and age=10 and ismale=1;

Can this statement only use "zhang" to search the index tree, and then go back to the table by primary key for every record whose name starts with "zhang"? Before MySQL 5.6, yes: starting from the first matching entry, every matching row had to be fetched from the primary key index so that the remaining conditions could be checked against the full row.

MySQL 5.6 introduced index condition pushdown (ICP): during the index traversal, conditions on fields that are contained in the index are evaluated first, filtering out non-matching records and reducing the number of back-to-table lookups. This is index pushdown.

Therefore, index pushdown mainly reduces the number of back-to-table lookups; it is an optimization introduced in MySQL 5.6.
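
With ICP in effect, EXPLAIN should show "Using index condition" in the Extra column; the optimizer switch lets you compare behavior with it off:

EXPLAIN SELECT * FROM tuser
WHERE name LIKE 'zhang%' AND age = 10 AND ismale = 1;
-- Extra: Using index condition (age=10 is filtered inside the (name, age) index)

-- Turn ICP off for comparison (session scope)
SET optimizer_switch = 'index_condition_pushdown=off';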

Question: How do I index a string field?

A: Suppose there is an email-based login system with this user table:

mysql> create table SUser (
  ID bigint unsigned primary key,
  email varchar(64),
  ...
) engine=innodb;

A login statement would look like: mysql> select f1, f2 from SUser where email='xxx'; For the email field, you can create an index in one of two ways:

mysql> alter table SUser add index index1(email);     -- full-column index
mysql> alter table SUser add index index2(email(6));  -- prefix index

(Figures: one shows the index1 structure, where each entry stores the whole email string; the other shows the index2 structure, where each entry stores only the first 6 bytes.)

As you can see, the email(6) index uses only the first 6 bytes of each email field (i.e. "zhangs"), so it takes up less space. This is the advantage of a prefix index.

At the same time, however, a prefix index may cause extra record scans, because it is less discriminating. For the statement select id,name,email from SUser where email='[email protected]';, the full index finds the row in a single lookup, whereas with the prefix index email(6) every entry whose prefix is "zhangs" matches, so (in this example) four back-to-table comparisons are needed.

That is, a prefix index with a well-chosen length can save space without adding too much query cost.

In fact, we focus on differentiation when building indexes, and the higher the differentiation, the better. Because higher differentiation means fewer duplicate keys.
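
A common way to pick the prefix length, following this reasoning, is to count distinct prefixes of several candidate lengths and choose the shortest one whose count stays close to the full column's:

SELECT
  COUNT(DISTINCT email)          AS full_col,
  COUNT(DISTINCT LEFT(email, 4)) AS len4,
  COUNT(DISTINCT LEFT(email, 5)) AS len5,
  COUNT(DISTINCT LEFT(email, 6)) AS len6,
  COUNT(DISTINCT LEFT(email, 7)) AS len7
FROM SUser;
-- Pick the smallest length whose count is, say, within 5% of full_col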

A prefix index not only affects the number of back-to-table lookups; it also rules out covering indexes. Because the secondary index stores only part of the value, MySQL must always return to the table to fetch the full data, so a covering index cannot be used.

What if the prefix is not discriminating? Take China's ID numbers, for example: 18 digits, of which the first 6 are the address code, so the first 6 digits of ID numbers in the same county are usually identical. A longer prefix index would work, but the longer the index, the more disk space it occupies, the fewer index values fit on a data page, and the lower the search efficiency. In this case we can store the value in reverse order and build a prefix index on that, which makes the prefix highly discriminating (as sketched below).
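
A sketch of the reversed-storage approach; the table and the ID value are hypothetical, and the id_card column itself is assumed to hold the reversed number:

-- The id_card column stores the reversed number, so the discriminating
-- tail of the ID becomes the prefix; queries reverse the input the same way
SELECT name FROM citizens WHERE id_card = REVERSE('110101198001010011');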

You can also use a hash field: add a second, integer field to the table that holds a checksum of the ID number, and create an index on that field (a sketch follows below).
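
A sketch of the hash-field approach, again with hypothetical names; the exact id_card comparison guards against crc32 collisions:

ALTER TABLE citizens
  ADD id_card_crc INT UNSIGNED NOT NULL,
  ADD INDEX (id_card_crc);

-- Writes keep id_card_crc = CRC32(id_card); queries use both conditions
SELECT name FROM citizens
WHERE id_card_crc = CRC32('110101198001010011')
  AND id_card = '110101198001010011';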

Conclusion:

  • Create a full-column index directly: simple, but may take more space.
  • Create a prefix index: saves space, but increases the number of query scans and cannot serve as a covering index.
  • Store the value reversed, then create a prefix index: works around strings whose prefixes are not discriminating enough; does not support range scans.
  • Create a hash field index: stable query performance at extra storage and computation cost; like the third method, it does not support range scans.

Question: If the code already controls the uniqueness of the data, for performance reasons, would you prefer a unique index or a plain index?

A: Analyze the choice from the query perspective and the update perspective:

  • The query process, e.g. select id from T where k=5:

    • For a normal index, after finding the first record that meets the condition (5,500), the next record is searched until the first record that does not meet the condition k=5 is encountered.
    • For unique indexes, since the index defines uniqueness, the search stops after the first record that meets the condition is found.

    Does that make unique indexes noticeably faster? Not really. InnoDB reads data page by page (a data page is 16KB by default), so when the record with k=5 is found, its whole page is already in memory. For a normal index, the extra "find and check the next record" costs only one pointer move and one comparison. Only if k=5 happens to be the last record on the page is there slightly more work, and that case is very unlikely. Overall, the difference in query performance is negligible.

  • The update process

    • For normal indexes: if the data page is in memory, it is updated directly; if not, the change is written to the change buffer (which is also persisted) and later applied to the original page when that page is read. Applying the buffered changes is called merge.

    • For unique indexes: every update must first check for conflicts (uniqueness), which requires the data page. If the page is in memory, check and update; if not, the change buffer cannot be used, and the page must be read from disk before checking and updating. Reading a page from disk into memory involves random I/O, one of the most expensive operations in a database. The change buffer improves update performance precisely because it avoids these random disk reads.

    Back to the question: how to choose between a normal index and a unique index? There is no difference in query capability; the main consideration is the impact on update performance. So I recommend choosing the normal index whenever possible.

Question: What are the usage scenarios for the change buffer?

A: The change buffer accelerates updates, and it applies only to normal indexes, not unique indexes. But it does not pay off for every normal index. Its purpose is to cache pending changes, so the more changes accumulate for a page before that page is merged (that is, the more times the page is updated before being read), the bigger the benefit. For write-heavy, read-light workloads, where a page is unlikely to be read right after it is written, the change buffer works best; billing and logging systems are typical examples. Conversely, if the workload reads each row right after writing it, the change is recorded in the change buffer but the merge is triggered immediately by the read, so random I/O is not reduced and the change buffer only adds maintenance cost.
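
The change buffer is tunable: innodb_change_buffer_max_size is the percentage of the buffer pool it may occupy (default 25, maximum 50), and innodb_change_buffering selects which operations are buffered:

SHOW VARIABLES LIKE 'innodb_change_buffer%';

-- Buffer all operation types (other values: none, inserts, deletes, changes, purges)
SET GLOBAL innodb_change_buffering = 'all';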

Q: The change buffer is written to memory first. If the machine loses power and restarts at that moment, is the change buffer lost? Losing it would be no small matter: when the data pages are later read from disk there would be no merge to apply, which would amount to losing the updates. Can that happen?

A: It is not lost. Although only memory is updated at first, the change buffer operation is recorded in the redo log when the transaction commits, so it can be recovered during crash recovery.

Question: When DBA Xiao Lu joined the new company, he found that there was a table in the database he took over to maintain. The table structure was defined as follows:

CREATE TABLE `geek` (
  `a` int(11) NOT NULL,
  `b` int(11) NOT NULL,
  `c` int(11) NOT NULL,
  `d` int(11) NOT NULL,
  PRIMARY KEY (`a`,`b`),
  KEY `c` (`c`),
  KEY `ca` (`c`,`a`),
  KEY `cb` (`c`,`b`)
) ENGINE=InnoDB;

His colleague told him that, for historical reasons, this table needs (a, b) as the joint primary key, which Xiao Lu understood. But since the primary key contains the fields a and b, a separate index on c already implicitly contains all three fields, so why create the "ca" and "cb" indexes as well? His colleague said it was because the business has these two statements:

select * from geek where c=N order by a limit 1;
select * from geek where c=N order by b limit 1;
Copy the code

My question to you is: is this colleague's explanation correct? Are both indexes necessary for these two query patterns? Why?

A: The "ca" index can be dropped, and the "cb" index should be kept. In InnoDB, a secondary index entry is followed by the primary key columns (a, b), so index "c" is effectively ordered by (c, a, b). For where c=N order by a, index "c" already returns rows in order of a, which is exactly what "ca" would provide, so "ca" is redundant. For where c=N order by b, index "c" orders by a before b, so "cb" is kept to speed up the sort.

Question: In which cases do indexes need to be created?

A:

  • The primary key automatically creates a unique index.
  • Fields frequently used as query conditions.
  • Fields used to join other tables in queries (foreign key relationships).
  • For the single-column vs. composite index choice, high-concurrency workloads tend to favor composite indexes.
  • Fields used for sorting in queries: sorting through an index is significantly faster.
  • Fields used for statistics or grouping in queries.

Question: In which cases should indexes not be created?

Lock-related questions

Question: What are the types of locks?

A: According to the scope of locking, MySQL locks can be roughly divided into global locks, table locks and row locks.

  • Global locks. A global lock locks the entire database instance. MySQL provides a command to add a global read lock: Flush tables with read lock (FTWRL). The typical scenario for a global lock is a full logical backup. If you do not use a global lock, you can instead start a transaction at the repeatable read isolation level, whose consistent view also yields a consistent backup. You may wonder why FTWRL is still needed given that capability: the consistent read works only if the engine supports that isolation level. For a non-transactional engine such as MyISAM, updates during the backup would always be picked up as the latest data, breaking the backup's consistency; in that case the FTWRL command is required.

    You may ask: why not use set global readonly=true if you want the whole instance to be read-only? Two reasons:

    • First, on some systems the readonly value is used for other logic, such as determining whether an instance is a primary or a replica. Modifying the global variable therefore has wider impact and is not recommended.

    • Second, the exception-handling mechanisms differ. If the client disconnects abnormally after executing FTWRL, MySQL automatically releases the global lock and the whole instance returns to a normally updatable state. If the client fails while readonly is set, the database stays read-only, which can leave the entire instance unwritable for a long time; that risk is higher.

  • Table-level locks come in two kinds: table locks and metadata locks (MDL).

    • Table locks. The syntax is lock tables … read/write. The locks can be released with unlock tables, or automatically when the client disconnects. Note that lock tables not only restricts reads and writes by other threads, it also restricts what the locking thread itself may do next.

    • Metadata locks (MDL). MDL does not need to be taken explicitly; it is acquired automatically when a table is accessed. Its job is to guarantee the correctness of reads and writes, and it comes in read and write variants. Without it, if one thread were iterating over a table while another thread changed the table structure mid-query (say, dropped a column), the query thread would get results inconsistent with the table structure.

  • Row locks. MySQL row locks are implemented by each engine at the engine layer, and not all engines support them; the MyISAM engine, for example, does not. A row lock locks a single row in a table: if transaction A updates a row and transaction B wants to update the same row, B must wait until A finishes. In InnoDB, row locks are acquired when they are first needed, but they are not released as soon as they are no longer needed; they are held until the end of the transaction. This is the two-phase locking protocol. A practical consequence: if a transaction needs to lock several rows, place the locks most likely to cause conflicts and hurt concurrency as late as possible, so the most contended row locks are held for the shortest time.

Problem: Your MySQL instance appears stuck. When you log on to the server, CPU consumption is close to 100%, yet the whole database is executing fewer than 100 transactions per second. What could be the reason?

Question: How do I safely add fields to a small table?

A: First deal with long transactions, because a transaction holds its MDL lock until it commits. In the innodb_trx table of MySQL's information_schema database you can look up the currently executing transactions (a sketch follows below). If the table the DDL targets happens to have a long transaction running against it, consider postponing the DDL or killing the long transaction. However, if the table is hot (small, but with a high request rate), kill may not help, because new requests arrive immediately. An ideal mechanism is to set a wait time on the alter table statement: if the MDL write lock can be acquired within that time, proceed; otherwise give up without blocking subsequent business statements, and have the developer or DBA retry the command later.
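
A query along the lines described above, finding transactions that have been running for more than 60 seconds:

SELECT * FROM information_schema.innodb_trx
WHERE TIME_TO_SEC(TIMEDIFF(NOW(), trx_started)) > 60;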

Question: If you want to delete the first 10,000 rows of a table, there are three ways to do it:

The first: execute delete from T limit 10000 as a single statement; the second: loop delete from T limit 500 twenty times in one connection; the third: execute delete from T limit 500 on 20 connections simultaneously. Which approach would you choose, and why?

A: The second way is relatively best (a sketch follows below). With delete from T limit 10000, a single statement runs long and holds locks long, and the large transaction can also cause master-slave replication delay. The third option, delete from T limit 500 on 20 connections at once, creates artificial lock conflicts.
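
A sketch of the second approach; with autocommit, each statement is its own short transaction, so locks are held briefly and the undo volume per transaction stays small:

-- Run repeatedly in one connection; stop when no rows are affected
DELETE FROM T LIMIT 500;
SELECT ROW_COUNT();  -- 0 means there is nothing left to delete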

Other questions

Question: Why is select count(*) slow?

A: "select count(*) is slow" actually refers to the InnoDB storage engine, which must read rows out of the engine one by one and accumulate the count. In MyISAM, count(*) is fast because the total is kept as a stored statistic that can simply be returned. Why can't InnoDB keep such a statistic? Because of multi-version concurrency control (MVCC): even for queries running at the same moment, "how many rows this table should return" is not a single well-defined number for an InnoDB table.

Question: Given that select count(*) is slow, what can we do about it?

A: We can keep the count in a separate count table C in the same database. We cannot put it in Redis: a system composed of two different stores does not support distributed transactions, so an exactly consistent view of the Redis counter and the MySQL table cannot be obtained. Keeping the count in MySQL solves the consistent-view problem: because of MVCC, each transaction sees the total consistent with its own snapshot, which a counter stored outside MySQL cannot provide. (A sketch follows below.)
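
A sketch of the counter-table approach with hypothetical table names; because the insert and the counter update commit atomically, any transaction sees the count and the rows move together:

BEGIN;
INSERT INTO orders (user_id, amount) VALUES (42, 9.99);  -- hypothetical business row
UPDATE order_count SET cnt = cnt + 1 WHERE id = 1;       -- hypothetical counter table
COMMIT;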

Question: Which should I use: count(*), count(primary key id), count(field), or count(1)?

Answer: First, the semantics of count(). count() is an aggregate function that evaluates the returned result set row by row: if the argument of the count function is not NULL, the running total is incremented, otherwise it is not, and the accumulated value is returned at the end. So count(*), count(primary key id), and count(1) all return the total number of rows satisfying the conditions, while count(field) returns the number of rows in which "field" is not NULL.

  • For count(primary key id), the InnoDB engine traverses the whole table, takes the id value of every row, and returns it to the server layer. The server layer takes the id, judges that it cannot be NULL, and accumulates row by row.

  • For count(1), the InnoDB engine traverses the whole table without taking any values. The server layer puts the number 1 into each returned row, judges that it cannot be NULL, and accumulates. Comparing just these two, count(1) is faster than count(primary key id), because returning the id from the engine involves parsing rows and copying field values.

  • For count(field) :

    • If "field" is defined as NOT NULL, read the column value from each record row by row and accumulate.

    • If "field" is nullable, take the value and additionally check whether it is NULL; accumulate only when it is not.

  • count(*), however, is an exception: it is specifically optimized. It does not take any values at all; since * is by definition never NULL, the server layer simply accumulates one per returned row.

So the conclusion, ordered by efficiency: count(field) < count(primary key id) < count(1) ≈ count(*). My recommendation: use count(*) whenever possible.

Question: If only one line of statements is found to be slow, what is the reason? How to check?

A: If the MySQL server itself is under heavy pressure, with high CPU usage or high ioutil (I/O utilization), then every statement may execute slowly; rule that out first. Otherwise, consider the following directions.

  • Waiting for an MDL lock. Most likely the table is locked. Use show processlist to see what state the statement is in; "Waiting for table metadata lock" indicates an MDL (metadata lock) wait. This kind of problem is solved by finding out who holds the MDL write lock and killing it: through the sys.schema_table_lock_waits view you can read the blocking process id directly and disconnect that connection with the kill command.

  • Waiting for a row lock. If a current read such as select * from t where id=1 lock in share mode hangs, another transaction probably holds a write lock on the row. You can find out who holds it with: select * from sys.innodb_lock_waits where locked_table = '`test`.`t`'\G

  • Slow queries. If none of the above applies, enable slow query analysis and compare the following two statements:

    select * from t where id=1;
    select * from t where id=1 lock in share mode;

    If the lock in share mode statement returns quickly while the plain select is slow, the likely cause is a long version chain: select * from t where id=1 is a consistent read, and if another long-running transaction has generated many newer versions of the row, the consistent read must apply undo logs one by one to rebuild the version visible to it. The lock in share mode statement is a current read and returns the latest version directly.

Question: Suppose the table below has 1 million rows, and in 100,000 of them the value of b is '1234567890'.

mysql> CREATE TABLE `table_a` (
  `id` int(11) NOT NULL,
  `b` varchar(10) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `b` (`b`)
) ENGINE=InnoDB;

If this query is executed: mysql> select * from table_a where b='1234567890abcd'; how does MySQL execute it?

A: Ideally, MySQL would notice that field b is defined as varchar(10) and return an empty result at once, since a 14-character value cannot be stored in it. Unfortunately, MySQL does not do this. Alternatively, it could try to match '1234567890abcd' in index b, fail to find it in the index tree, and quickly return empty. MySQL does not do that either. The statement actually executes slowly, with this flow: when the condition is passed down for engine execution, the value is truncated, because the column is defined with length 10, so only the first 10 characters, '1234567890', are used to search the index. That matches 100,000 rows; because the query is select *, each of those 100,000 rows is fetched back from the table; the server layer then compares every row against b='1234567890abcd', finds none equal, and returns an empty result.