Preface

This article is aimed mainly at developers, so it does not cover MySQL service deployment or other operations work. It is also long, so come prepared with patience, snacks, and water.

A while ago, I systematically studied MySQL and had some practical operation experience. When I happened to see an interview article related to MySQL, I found that I couldn’t answer some questions well. Although I knew most of the knowledge points, I couldn’t connect them together.

So I decided to put together a "MySQL Soul 100" set of questions and answer them myself, as a way to understand each topic a little more deeply.

Index-related questions

This section covers how MySQL indexes work and how to optimize them.

1. What is an index?

An index is a data structure that helps us quickly find data.

2. What data structure is an index?

The data structure behind an index depends on the storage engine's implementation. Indexes used in MySQL include hash indexes, B+ tree indexes, and others. The default index of the InnoDB storage engine, which we use most often, is the B+ tree index.

3. What's the difference between a hash index and a B+ tree index?

To answer this, start with how each is implemented. A hash index is backed by a hash table: a lookup calls the hash function once to get the corresponding key value, and then goes back to the table to fetch the actual row. A B+ tree is a multi-way balanced search tree: every lookup starts at the root node and reaches the key value at a leaf node, and then, depending on the query, decides whether it needs to go back to the table for the row data. From this you can see the following differences:

  • Hash indexes are (in general) faster for equality queries, but they cannot serve range queries.

Because a hash function is applied when the index is built, the order of the index entries no longer matches the order of the original values, so range queries cannot be supported. All nodes of a B+ tree follow an ordering rule (the left child is smaller than the parent and the right child is larger; the same holds for multi-way trees), which naturally supports ranges.

  • Hash indexes cannot be used for sorting.

  • Hash indexes do not support fuzzy (LIKE) queries or leftmost-prefix matching on multi-column indexes. The reason is again that hash values are unpredictable: the hashes of AAAA and AAAAB have no correlation.

  • A hash index can never avoid going back to the table for the row, whereas a B+ tree index can answer a query from the index alone when certain conditions are met (clustered index, covering index, and so on).

  • Hash indexes are fast for equality queries, but their performance is neither stable nor predictable. When a key value has many duplicates, hash collisions occur and efficiency can be very poor. B+ tree query efficiency is comparatively stable: every query goes from the root node to a leaf node, and the tree is short.

Therefore, in most cases, choosing a B+ tree index over a hash index gives stable and good query speed.
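The equality-versus-range difference described above can be sketched in a few lines of Python. This is only an illustration, not MySQL's implementation: a dict stands in for the hash index, a sorted list searched with bisect stands in for the ordered leaf level of a B+ tree, and the keys and row ids are made up.

```python
import bisect

# Hypothetical data: (key, row id) pairs. Neither structure is MySQL's real one.
rows = [(17, "r1"), (42, "r2"), (8, "r3"), (23, "r4"), (35, "r5")]

# "Hash index": a dict gives one-step equality lookup but keeps no key order.
hash_index = {key: rid for key, rid in rows}

# "Ordered index": sorted keys stand in for the ordered leaf level of a B+ tree.
ordered_index = sorted(rows)
ordered_keys = [key for key, _ in ordered_index]


def hash_equal(key):
    """Equality lookup through the hash table."""
    return hash_index.get(key)


def ordered_equal(key):
    """Equality lookup by binary search over the sorted keys."""
    i = bisect.bisect_left(ordered_keys, key)
    if i < len(ordered_keys) and ordered_keys[i] == key:
        return ordered_index[i][1]
    return None


def ordered_range(low, high):
    """Row ids with low <= key <= high, located by two binary searches."""
    start = bisect.bisect_left(ordered_keys, low)
    end = bisect.bisect_right(ordered_keys, high)
    return [rid for _, rid in ordered_index[start:end]]


def hash_range(low, high):
    """A hash table has no order, so a range query must examine every key."""
    return sorted(rid for key, rid in hash_index.items() if low <= key <= high)
```

Both structures answer the equality lookup, but only the ordered one can jump to the start of a range and read forward; the hash version has to visit every entry.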

4. As mentioned above, a B+ tree does not need to go back to the table when the query hits a clustered index or a covering index. What is a clustered index?

In a B+ tree index, a leaf node may store the key value together with the entire row of data, or only the key value. The former is a clustered index and the latter a non-clustered index. In InnoDB, only the primary key index is a clustered index. If there is no primary key, the first unique index whose columns are all NOT NULL is used as the clustered index. If there is no such index, a hidden key is generated implicitly to build the clustered index. When a query uses the clustered index, the entire row can be read from the matching leaf node, so there is no need to go back to the table.

5. Does a query on a non-clustered index always go back to the table?

Not necessarily. It depends on whether every field the query needs is contained in the index. If so, there is no need to go back to the table. A simple example: if the employee table has an index on age, the query select age from employee where age < 20 can be answered from the index alone, because the index leaf nodes already contain the age values.
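As a rough illustration of that example, the sketch below models a clustered index as a dict keyed by primary key and a secondary index on age as a sorted list of (age, primary key) pairs. The rows, names, and the lookup counter are all hypothetical; the point is only that the first query never touches the row store, while the second needs one lookup per match.

```python
import bisect

# Hypothetical employee rows keyed by primary key (the "clustered index").
clustered = {
    1: {"id": 1, "name": "Ann", "age": 19},
    2: {"id": 2, "name": "Bob", "age": 31},
    3: {"id": 3, "name": "Cid", "age": 18},
    4: {"id": 4, "name": "Dee", "age": 25},
}

# Secondary index on age: sorted (age, primary key) entries, no other columns.
age_index = sorted((row["age"], pk) for pk, row in clustered.items())


def select_age_lt(limit):
    """select age ... where age < limit: answered from the index alone."""
    end = bisect.bisect_left(age_index, (limit,))
    return [age for age, _ in age_index[:end]], 0  # 0 lookups back to the table


def select_name_lt(limit):
    """select name ... where age < limit: name is not stored in the index."""
    end = bisect.bisect_left(age_index, (limit,))
    names, lookups = [], 0
    for _, pk in age_index[:end]:
        names.append(clustered[pk]["name"])  # one trip back to the table per match
        lookups += 1
    return names, lookups
```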

6. What should you consider when building an index?

When building an index, consider how often a field is used: fields that frequently appear in query conditions are the better candidates. If you need a composite (multi-column) index, you also need to consider the column order within it. There are other considerations as well, such as not creating too many indexes, since every index adds pressure on the table. All of this depends on the actual table structure and on how the table is queried.

7. What is a composite index? Why does the column order in a composite index matter?

MySQL can build a single index over several fields at once; this is called a composite index. For a query to hit a composite index, its conditions must match the index columns one by one, in the order in which they were declared when the index was created; otherwise the index cannot be matched.

The reason is that MySQL relies on the index being ordered. Suppose a composite index is created on (name, age, school). Its entries are sorted by name first; entries with the same name are sorted by age; and entries with the same age are sorted by school. At query time the index is strictly ordered only by name, so the query must first use an equality condition on name. Among the rows that match, the entries are then strictly ordered by age, so age can be used for the index search, and so on.

Therefore, pay attention to the column order when creating a composite index. In general, put columns that are queried frequently or that are highly selective first. Beyond that, adjust case by case according to the specific queries or table structure.
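The ordering argument can be tried out with plain tuples. In this sketch (sample data and function names are hypothetical, and a sorted Python list only approximates a B+ tree), entries of a (name, age, school) index are kept in sorted order, so a search on a leading prefix can use binary search, while a search on school alone has to scan everything.

```python
import bisect

# Hypothetical composite index on (name, age, school); entries are kept sorted
# by name, then age, then school.
index = sorted([
    ("alice", 20, "MIT"),
    ("alice", 22, "CMU"),
    ("bob", 20, "CMU"),
    ("bob", 21, "MIT"),
    ("carol", 22, "MIT"),
])


def prefix_search(*prefix):
    """Binary search for entries whose leading columns equal `prefix`."""
    start = bisect.bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break  # sorted order guarantees there is no later match
        matches.append(entry)
    return matches


def scan_by_school(school):
    """school is the third column: entries are not ordered by it, so scan all."""
    return [entry for entry in index if entry[2] == school]
```

prefix_search("bob") and prefix_search("alice", 22) stop as soon as the prefix no longer matches, whereas scan_by_school("MIT") has no choice but to look at every entry.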

8. How do I know whether an index I created is actually used? Or why a statement runs slowly?

MySQL provides the EXPLAIN command to view a statement's execution plan. Before executing a statement, MySQL passes it through the query optimizer, which produces an analysis of the statement, the execution plan, containing a lot of information. The possible_keys, key, and key_len fields show, respectively, the indexes that might be used, the index actually used, and the length of the index used, so they tell you whether an index was hit.

9. In what cases is an index on a column not used by a query?

  • The condition is a not-equal comparison.

  • The column takes part in an arithmetic expression or a function call.

  • A LIKE pattern starts with a wildcard, for example '%aaa'.

  • MySQL estimates that a full table scan is faster than using the index.

  • With a composite index, the condition on the first column is a range query; the condition on the second column then cannot use the index, even though the query follows the leftmost-prefix rule.

In these cases, MySQL may not be able to use the index.

Transaction-related questions

1. What is a transaction?

The classic example for understanding transactions is a bank transfer. I trust everyone knows it, so I won't repeat it here. A transaction is a series of operations that satisfies the ACID properties. The most common understanding is that the operations in a transaction either all succeed or all fail, but that alone is not enough.

2. What is ACID? Can you explain it in detail?

A=Atomicity

Atomicity, as I said above, is all or nothing. It is impossible to perform only part of the operation.

C=Consistency

The system (database) always moves from one consistent state to another; there is no intermediate state.

I=Isolation

Isolation: in general, a transaction's changes are not visible to other transactions until it has fully committed. Note the qualifier "in general" at the front: it means there are exceptions.

D=Durability

Durability: once a transaction is committed, its changes are permanent. Even a system crash does not affect the outcome of the transaction.

3. What happens when multiple transactions run at the same time?

The concurrent execution of multiple transactions generally causes the following problems:

  • Dirty read: transaction A reads uncommitted content from transaction B, and transaction B rolls back later.

  • Non-repeatable read: even when transaction A may only read what transaction B has committed, two identical queries inside transaction A can return different results, because transaction B committed in between.

  • Phantom read: transaction A reads a range of rows while transaction B inserts a new row into that range in the meantime, so a later read in A sees a row that seems to appear out of nowhere, like a "hallucination".

4. How are these problems solved? What are MySQL's transaction isolation levels?

MySQL has four isolation levels:

  • READ UNCOMMITTED

This is the exception to isolation mentioned above: at this level, other transactions can see changes that a transaction has not yet committed. This causes dirty reads (reading another transaction's uncommitted changes, which are later rolled back). Its performance advantage is not large enough to justify its many problems, so it is rarely used.

  • READ COMMITTED

Other transactions can only read the committed part of the transaction. This isolation level has the problem of non-repeatable reads, where two reads in the same transaction get different results because another transaction has modified the data.

  • REPEATABLE READ

The repeatable read level solves the non-repeatable read problem above (as the name suggests), but a new problem remains: phantom reads. When you read the rows with id > 10, read locks are placed on all the rows involved. If another transaction then inserts a new row with id = 11, the insert is not blocked by those locks, because the row is new. The next query in the first transaction then finds a row with id = 11 that the previous query did not return, and an attempt to insert that id fails with a primary key conflict.

  • SERIALIZABLE

This is the highest isolation level and solves all of the problems mentioned above, because it forces all operations to be executed serially, which leads to a very rapid decline in concurrency performance and is therefore not very common.
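To make the difference between READ COMMITTED and REPEATABLE READ concrete, here is a toy model. It is a sketch under strong assumptions: the Store and Transaction classes are invented for illustration, "snapshot at first read" is a simplification, and none of this reflects how InnoDB actually implements MVCC or locking.

```python
class Store:
    """Toy committed state; not how InnoDB stores row versions."""

    def __init__(self, data):
        self.committed = dict(data)

    def commit(self, key, value):
        self.committed[key] = value


class Transaction:
    def __init__(self, store, level):
        self.store = store
        self.level = level  # "READ COMMITTED" or "REPEATABLE READ"
        self.snapshot = None

    def read(self, key):
        if self.level == "READ COMMITTED":
            # Every read sees the latest committed value.
            return self.store.committed[key]
        # REPEATABLE READ: freeze a snapshot at the first read and reuse it.
        if self.snapshot is None:
            self.snapshot = dict(self.store.committed)
        return self.snapshot[key]


def read_twice(level):
    """Transaction A reads, transaction B commits a change, A reads again."""
    store = Store({"balance": 100})
    txn_a = Transaction(store, level)
    first = txn_a.read("balance")
    store.commit("balance", 50)  # transaction B commits in between
    second = txn_a.read("balance")
    return first, second
```

Under READ COMMITTED the two reads differ (a non-repeatable read); under REPEATABLE READ the second read returns the same value as the first.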

5. What isolation level does InnoDB use?

InnoDB uses repeatable read isolation by default.

6. Do you know about MySQL locks?

When a database has concurrent transactions, data inconsistencies may occur, and some mechanism is needed to keep access orderly. Locking is such a mechanism. Think of hotel rooms: if people could go in and out at will, many people would fight over the same room. So each room gets a lock, and only the person who has the key can enter and lock the door; others can use the room only after that person has finished.

7. What kinds of locks does MySQL have? Doesn't locking like this hurt concurrency?

By type, there are shared locks and exclusive locks.

Shared lock: also known as a read lock. When a user wants to read data, a shared lock is placed on the data. Multiple shared locks can be held at the same time.

Exclusive lock: also known as a write lock. When a user wants to write data, an exclusive lock is placed on the data. Only one exclusive lock can be held, and it is mutually exclusive with every other exclusive lock and shared lock.

In the hotel example there are two kinds of user behavior. One is viewing the room, and it is fine for several people to view it together. The other is actually staying the night, during which neither people who want to stay nor people who want to view the room are allowed in.

Lock granularity depends on the storage engine. Row-level, page-level, and table-level locking all exist among MySQL's engines; InnoDB provides row-level and table-level locks. In that order, locking overhead goes from high to low, and concurrency also goes from high to low.
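The shared/exclusive rules above amount to a small compatibility table. The sketch below writes it out; the can_grant helper is hypothetical and ignores everything a real lock manager does (queues, deadlocks, granularity).

```python
# S = shared (read) lock, X = exclusive (write) lock.
# Keys are (lock already held, lock being requested).
COMPATIBLE = {
    ("S", "S"): True,   # several people may view the room together
    ("S", "X"): False,  # a writer must wait for the readers
    ("X", "S"): False,  # readers must wait for the writer
    ("X", "X"): False,  # only one writer at a time
}


def can_grant(held_locks, requested):
    """Return True if `requested` is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)
```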

Table structure design

1. Why should a table have a primary key?

A primary key guarantees the uniqueness of each row in the table. Even if the business logic gives the table no natural primary key, it is advisable to add an auto-increment ID column as the primary key. With a primary key in place, later deletes, updates, and lookups can be faster, and the range of rows an operation touches is safer to control.

2. Should the primary key be an auto-increment ID or a UUID?

An auto-increment ID is recommended over a UUID. The reason is that the InnoDB storage engine uses the primary key index as the clustered index; in other words, the primary key and the full row data are stored (in key order) in the leaf nodes of the B+ tree. If the primary key is an auto-increment ID, new rows are simply appended at the end in order. If it is a UUID, the position of each new key relative to the existing ones is unpredictable, so inserts land in the middle and force a lot of data movement, which causes a lot of fragmentation and degrades insert performance. In general, with large data volumes, an auto-increment primary key performs better. In the benchmark figure that accompanied this answer (not reproduced here), the tables with the _uuid suffix use a UUID primary key, and the test measured inserting 1 million and 3 million rows.

As for the primary key being the clustered index: if there is no primary key, InnoDB selects a unique key as the clustered index, and if there is no unique key either, it generates an implicit primary key.

If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.
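The auto-increment versus UUID argument can be approximated with a sorted list. In this sketch, which is an assumption-laden stand-in rather than a model of InnoDB pages, every insert that does not land at the end of the list counts as one that would force existing data to move. Random 128-bit integers from a seeded generator play the role of UUIDs so the run is repeatable.

```python
import bisect
import random


def count_mid_inserts(keys):
    """Insert keys into a sorted list; count inserts that do not land at the end.

    Each such insert stands in for rows being shifted or a page being split.
    """
    ordered = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos != len(ordered):
            mid_inserts += 1
        ordered.insert(pos, key)
    return mid_inserts


rng = random.Random(0)  # fixed seed so the sketch is repeatable
auto_increment_keys = list(range(1, 1001))
uuid_like_keys = [rng.getrandbits(128) for _ in range(1000)]  # stand-ins for UUIDs

sequential_mid = count_mid_inserts(auto_increment_keys)
random_mid = count_mid_inserts(uuid_like_keys)
```

With sequential keys every insert is an append, so sequential_mid is 0, while nearly all of the random keys land somewhere in the middle.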

3. Why should fields be declared NOT NULL?

The MySQL documentation says:

NULL columns require additional space in the row to record whether their values are NULL. For MyISAM tables, each NULL column takes one bit extra, rounded up to the nearest byte.

NULL values take up extra space and cause many results in your program that do not match expectations.

4. What fields should be used to store user password hashes?

Fixed-length strings such as password hashes, salts, and user ID numbers should be stored as char rather than varchar, which saves space and improves retrieval efficiency.

Storage engine questions

1. Which storage engines does MySQL support?

MySQL supports multiple storage engines, such as InnoDB, MyISAM, Memory, and Archive. In most cases InnoDB is the best choice, and it is also MySQL's default storage engine.

2. What is the difference between InnoDB and MyISAM?

  • InnoDB supports transactions; MyISAM does not.

  • InnoDB supports row-level locking, while MyISAM supports only table-level locking.

  • InnoDB supports MVCC; MyISAM does not.

  • InnoDB supports foreign keys; MyISAM does not.

  • InnoDB did not support full-text indexes before MySQL 5.6, while MyISAM does; since 5.6, InnoDB supports them as well.

Miscellaneous questions

1. What is the difference between varchar and char in MySQL?

char is a fixed-length type: if you declare char(10), each value takes up 10 characters of storage no matter how much content is actually stored. varchar is variable-length: the declared length is only the maximum, and the space used is the actual length plus 1, where the extra byte records how much space the value uses. In terms of retrieval efficiency, char > varchar, so use char when the length of a field's values is fixed, and varchar otherwise. For example, char suits the MD5 hash of a user's password.

2. What do varchar(10) and int(10) represent?

The 10 in varchar(10) is the declared length, which is also the maximum length of data that can be stored. The 10 in int(10) is only the display width: values shorter than 10 digits are padded with zeros (when the column is declared ZEROFILL). That is, int(1) and int(10) can store the same range of numbers and take the same amount of space; they differ only in how values are displayed.

3. How many formats does the MySQL binlog have? What are the differences?

There are three formats: statement, row, and mixed.

  • In statement mode, the unit of recording is the statement; that is, every SQL statement that changes data is recorded. Because SQL execution depends on context, the relevant context has to be saved along with it, and some statements that use functions and the like cannot be recorded and replicated correctly.

  • In row mode, the unit of recording is the change to each row. Essentially all changes can be recorded, but many operations change a large number of rows (ALTER TABLE, for example), so the log can become very large.

  • Mixed is a compromise: ordinary operations are recorded in statement form, and row form is used when statement form cannot be used.

In addition, newer versions of MySQL have optimized row mode: when the table structure changes, the statement is recorded instead of every row.

4. How do you handle very deep pagination?

Very deep pagination is generally handled from two directions.

  • At the database level, which is where we focus most of our attention (although the gain there is not that large)

    select * from table where age > 20 limit 1000000,10

    This statement loads 1,000,000 rows and then throws essentially all of them away, keeping only 10. Of course it is slow. It can be rewritten as:

    select t.* from table t join (select id from table where age > 20 limit 1000000,10) x on t.id = x.id

    This still walks past a million entries, but it is fast because the inner query is served by a covering index: the only field it selects, id, is already in the index. (The inner query is joined as a derived table because MySQL does not accept LIMIT directly inside an IN subquery.)

    If the IDs are contiguous, we can also use select * from table where id > 1000000 limit 10, which is efficient as well.

    There are many ways to optimize, but the core idea is the same: reduce the amount of data loaded.

  • At the requirements level: reduce such requests. The main thing is to not build features like this (jumping directly to a specific page millions of pages in); allow only page-by-page browsing or navigation along a fixed path, which is predictable and cacheable. This also prevents ID leakage and sustained malicious attacks.

In practice, solving deep pagination relies mainly on caching: we can predict which content will be requested, query it ahead of time, cache it in Redis or another key-value store, and return it directly. The Alibaba Java Development Manual's solution to deep pagination is similar to the first approach above.
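The "seek past the last ID" idea from the database-level approach can be tried locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it says nothing about MySQL's actual execution plans or timings; the person table, its columns, and both helper functions are hypothetical. It only shows that the two queries return the same page while the second one never has to skip over earlier rows.

```python
import sqlite3

# SQLite stands in for MySQL here; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table person (id integer primary key, age integer)")
conn.executemany(
    "insert into person (id, age) values (?, ?)",
    [(i, 21 + i % 40) for i in range(1, 1001)],
)


def page_by_offset(offset, size):
    """Offset paging: the database steps over `offset` rows before returning any."""
    sql = "select id from person where age > 20 order by id limit ? offset ?"
    return [row[0] for row in conn.execute(sql, (size, offset))]


def page_after(last_id, size):
    """Seek paging: start right after the last id already shown."""
    sql = "select id from person where age > 20 and id > ? order by id limit ?"
    return [row[0] for row in conn.execute(sql, (last_id, size))]
```

The seek form requires the caller to remember the last ID of the previous page, which fits the page-by-page browsing restriction described above.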

5. Do you care about SQL execution time in your business systems? Do you collect statistics on slow queries? How do you optimize slow queries?

In business systems, apart from queries by primary key, I test the execution time of every query on the test database. Collecting slow-query statistics is mainly done by the operations team, who regularly report slow queries in the business back to us. To optimize a slow query, first work out why it is slow. Does the query condition fail to hit an index? Does it load columns that are not needed? Or is there simply too much data? Optimization therefore goes in these three directions:

  • First analyze the statement to see whether it loads extra data: it may fetch extra rows and then discard them, or load many columns that the result does not need. Analyze the statement and rewrite it.

  • Analyze the statement's execution plan to see how it uses indexes, then modify the statement or the index so that the statement hits an index as much as possible.

  • If the statement cannot be optimized any further, consider whether the table simply holds too much data, and if so, split the table horizontally or vertically.

6. Can you give examples where horizontal and vertical table splitting, as mentioned above, are appropriate?

Horizontal splitting divides a table by row. Suppose we have a user table whose primary key is the user's ID. The table is large, over 100 million rows, so queries against a single table perform poorly. We can split the table by primary key ID, either by the trailing digits or by ID range. If we split into 100 tables by the trailing digits 0-99, each table holds only about 1 million rows, and query efficiency then easily meets requirements.

Vertical splitting divides a table by column. Suppose we have an article table with the fields id, summary, and content. The system displays a refreshable list that contains only the title and summary; the body content is needed only when the user clicks into an article's details. With a large amount of data, keeping the large, infrequently used content column in the same table slows down queries on it. We can split the table into two: id and summary in one, id and content in the other. When the user opens the details, the content is fetched by primary key. The only added storage is the small primary key field, so the cost is very small.

Of course, table splitting is closely tied to the business, so do your research and benchmark before splitting. Don't act blindly on your own assumptions.
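The two horizontal splitting rules mentioned above (by trailing digits or by ID range) each reduce to a one-line routing function. The sketch below is illustrative only: the user_NN table naming, the function names, and the assumption that IDs start at 1 are all made up.

```python
def shard_by_tail(user_id, shards=100):
    """Route by the last two digits of the id (0-99)."""
    return f"user_{user_id % shards:02d}"


def shard_by_range(user_id, rows_per_table=1_000_000):
    """Route by id interval: ids 1..1,000,000 go to table 00, and so on.

    Assumes ids start at 1.
    """
    return f"user_{(user_id - 1) // rows_per_table:02d}"
```

Routing by trailing digits spreads new rows evenly across all tables, while routing by range keeps neighboring IDs together and sends all new rows to the newest table; which one fits depends on the queries, which is why the text recommends benchmarking first.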

7. What are stored procedures? What are their pros and cons?

A stored procedure is a set of precompiled SQL statements.

First, the more direct way to understand it: a stored procedure is a block of SQL statements that implements some function, like a method (for example, inserting, updating, or deleting rows in one or more tables). You give this block of code a name and call it by name whenever you need that function.

Second, a stored procedure is a precompiled block of code, so it executes efficiently. One stored procedure can replace a large number of SQL statements, which reduces network traffic and speeds up communication, and it can improve data security to a certain extent.

However, stored procedures are not recommended in Internet projects. Famously, the Alibaba Java Development Manual prohibits them. My personal understanding is that Internet projects iterate quickly, have shorter life cycles, and see more staff turnover than traditional projects. Under those conditions stored procedures are awkward to manage, and they are less reusable than logic written in the service layer.

8. Explain the three normal forms

First normal form: every column is atomic and cannot be split further. Second normal form: non-primary-key columns depend on the whole primary key, not on just part of it. Third normal form: non-primary-key columns depend only on the primary key, not on other non-primary-key columns. When designing a database schema, try to follow the three normal forms, and if you deviate from them, have a good reason. In practice, we often compromise on schema design for the sake of performance.

9. # and $ in MyBatis

A slightly out-of-place question has slipped in here. I want to record it separately because it comes up so often. With #, the incoming value is treated as a string and bound as a parameter; with $, the incoming value is concatenated directly into the SQL statement. So # can protect against SQL injection attacks to a certain extent.
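The same contrast between binding a value and pasting it into the SQL text can be shown without MyBatis. The sketch below uses Python's sqlite3 module purely as an analogy: string concatenation plays the role of $, and a ? placeholder plays the role of #. The users table, the sample input, and the function names are hypothetical.

```python
import sqlite3

# SQLite stands in for the database; the users table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table users (name text, secret text)")
conn.executemany(
    "insert into users values (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

malicious = "nobody' or '1'='1"


def find_concatenated(name):
    """Analogous to $: the value is pasted into the SQL text."""
    sql = "select name from users where name = '" + name + "'"
    return [row[0] for row in conn.execute(sql)]


def find_bound(name):
    """Analogous to #: the value is sent as a bound parameter."""
    sql = "select name from users where name = ?"
    return [row[0] for row in conn.execute(sql, (name,))]
```

With the malicious input, the concatenated query's WHERE clause becomes always true and every user is returned, while the bound query looks for a user literally named with that string and finds none.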