My own summary of Java knowledge points and interview questions, already open source and continuously improved. Advice and guidance are welcome, and Stars are appreciated: github.com/Snailclimb/…

Books recommended

High Performance MySQL, 3rd Edition

Text tutorial recommendation

MySQL Tutorial

MySQL Tutorial (in Chinese)

Video Tutorials

Basics: Close contact with MySQL – MOOCs

MySQL development skills: MySQL Development Skills (I), MySQL Development Skills (II), MySQL Development Skills (III)

MySQL 5.7 new features and related optimization tips: Performance optimization with the new features of MySQL 5.7

Introduction to MySQL Cluster (PXC); introduction to MyCAT

FAQ Summary

  • ① Storage Engine

    The two common MySQL storage engines: the love-hate relationship between MyISAM and InnoDB

  • ② Character sets and collation rules

    A character set is a mapping from binary encodings to a set of character symbols. A collation is a set of sorting rules for a given character set; every character set in MySQL has a corresponding set of collations.

    MySQL uses an inheritance-like approach to assign default character sets: each database and each table has its own default, inherited layer by layer. For example, the default character set for all tables in a database is the one specified for that database (a table falls back to this default only when no character set is specified for it).
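
    A minimal SQL sketch of this inheritance (the database and table names are hypothetical):

    CREATE DATABASE demo_db CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
    USE demo_db;
    -- no character set specified: inherits utf8mb4 from demo_db
    CREATE TABLE inherits_default (name VARCHAR(50));
    -- an explicit character set overrides the database default
    CREATE TABLE overrides_default (name VARCHAR(50))
      CHARACTER SET latin1 COLLATE latin1_swedish_ci;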

    For details, see: MySQL character set and collation rules

  • ③ Indexes (a key aspect of database use; reasonable and correct use of indexes can greatly improve query performance)

    MySQL indexes use two data structures: the BTree index and the hash index. The underlying data structure of a hash index is a hash table, so when the vast majority of queries look up single records, a hash index offers the fastest queries; in most other scenarios, the BTree index is the better choice.

    MySQL's BTree index is implemented as a B+Tree, but the two main storage engines implement it differently.

    MyISAM: the data field of a B+Tree leaf node stores the address of the data record. During index retrieval, the index is searched with the B+Tree search algorithm; if the specified key exists, the value of its data field is read, and the data record is then fetched using that value as an address. This is called a “non-clustered index”.

    InnoDB: its data files are themselves index files. Unlike MyISAM, where the index file and the data file are separate, the InnoDB table data file is itself an index structure organized as a B+Tree, and the data field of each leaf node holds a complete data record. The key of this index is the table's primary key, so the InnoDB data file itself is the primary index. This is called a clustered index, and the remaining indexes are secondary indexes. The data field of a secondary index stores the primary key value of the corresponding record rather than an address, which is another difference from MyISAM. A search on the primary index retrieves the data directly from the node where the key is found; a search on a secondary index first fetches the primary key value and then walks the primary index. Therefore, when designing tables, it is not recommended to use overly long fields or non-monotonic fields as the primary key, as this causes frequent splits of the primary index. PS: Collated from “The Way to Train Java Engineers”.
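
    A minimal DDL sketch of this advice (the table and column names are hypothetical): a short, monotonically increasing primary key keeps the clustered index compact and append-only, while long or random values live in a secondary index that stores only the primary key.

    CREATE TABLE user_msg (
      id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- short, monotonic primary key
      uuid CHAR(36) NOT NULL,                      -- long/random value: keep it out of the PK
      body VARCHAR(255),
      PRIMARY KEY (id),
      UNIQUE KEY idx_uuid (uuid)                   -- secondary index; its leaves store id, not an address
    ) ENGINE = InnoDB;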

    For details, please refer to:

    Dry stuff: Data structures for mysql indexes

    MySQL Optimization series 3 — Use, Principles, and design optimization of indexes

  • ④ Query cache usage

    Add the following configuration to my.cnf and restart MySQL to enable the query cache:

    query_cache_type=1
    query_cache_size=600000

    MySQL can also enable the query cache at runtime with the following commands:

    set global  query_cache_type=1;
    set global  query_cache_size=600000;

    As noted above, with the query cache enabled, identical queries against unchanged data are answered directly from the cache. The cache key includes the query text itself, the current database, and the client protocol version, among other factors that may affect the result, so two queries that differ in even a single character are cache misses. In addition, if a query contains any user-defined function, stored function, user variable, temporary table, or system table from the mysql database, its result will not be cached.

    Once a result is cached, MySQL's query cache tracks every table involved in the query; if any of those tables changes (data or structure), all cached results associated with that table are invalidated.

    Although the cache can improve query performance, it also adds overhead: every query must perform a cache lookup and store, and invalidation work is needed after writes. Therefore, be careful about enabling the query cache, especially for write-intensive applications. If it is enabled, keep the cache space reasonably small; a few tens of MB is generally appropriate. SQL_CACHE and SQL_NO_CACHE can also control whether an individual query is cached:

    select sql_no_cache count(*) from usr;
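
    To check whether the cache is effective, the server variables and status counters can be inspected (these commands apply to MySQL 5.7 and earlier; the query cache was removed entirely in MySQL 8.0):

    SHOW VARIABLES LIKE 'query_cache%';  -- current cache configuration
    SHOW STATUS LIKE 'Qcache%';          -- hits, inserts, free memory, prunes
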
  • ⑤ Transaction mechanism

    Transactions in relational databases must satisfy the ACID properties:

  1. Atomicity: a transaction is the smallest unit of execution and cannot be split. Atomicity ensures that its actions either complete entirely or have no effect at all (see the transfer sketch after this list);
  2. Consistency: the data remains consistent before and after a transaction executes;
  3. Isolation: under concurrent access, one user's transaction is not disturbed by other transactions; concurrent transactions are independent of each other within the database;
  4. Durability: once a transaction is committed, its changes to the data are permanent and must survive a database failure.
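
A minimal transfer sketch illustrating atomicity (the accounts table is hypothetical):

    START TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;  -- on COMMIT both updates apply; a ROLLBACK (or crash) applies neither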

To achieve the above transaction features, the database defines several different transaction isolation levels:

  • READ_UNCOMMITTED: the lowest isolation level; allows reading uncommitted data changes, which may lead to dirty reads, phantom reads, or non-repeatable reads

  • READ_COMMITTED: allows concurrent transactions to read data that has already been committed. This prevents dirty reads, but phantom reads and non-repeatable reads can still occur

  • REPEATABLE_READ: multiple reads of the same field return the same result, unless the data is modified by the transaction itself. This prevents dirty reads and non-repeatable reads, but phantom reads are still possible.

  • SERIALIZABLE: the highest isolation level, fully compliant with ACID. All transactions are executed one after another, so interference between transactions is impossible: this level prevents dirty reads, non-repeatable reads, and phantom reads. However, it severely hurts performance and is not typically used.

    MySQL uses the REPEATABLE_READ isolation level by default; Oracle uses READ_COMMITTED by default.
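
    The current level can be inspected and changed at runtime (the variable is tx_isolation up to MySQL 5.7 and transaction_isolation from MySQL 8.0):

    SELECT @@global.tx_isolation, @@session.tx_isolation;
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;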

    The transaction isolation mechanism is implemented on top of locking and concurrency scheduling. Concurrency scheduling uses MVCC (Multi-Version Concurrency Control), which supports consistent concurrent reads and rollback by keeping old versions of modified rows.

    See: Probably the most beautiful Spring transaction management tutorial yet

  • ⑥ Lock mechanism and InnoDB lock algorithms

    Locking in the MyISAM and InnoDB storage engines:

    • MyISAM uses table-level locking.
    • InnoDB supports both row-level and table-level locking; row-level locking is the default.

    Table-level locking vs. row-level locking

    • Table-level lock: the coarsest-grained lock in MySQL; it locks the entire table being operated on. It is simple to implement, consumes few resources, and locks quickly, avoiding deadlocks. Because its granularity is the largest, the probability of lock conflicts is highest and concurrency is lowest. Both MyISAM and InnoDB support table-level locking.
    • Row-level lock: the finest-grained lock in MySQL; only the current row is locked. Row-level locking greatly reduces conflicts in database operations. Its granularity is the smallest and concurrency is highest, but its locking overhead is also the largest, locking is slower, and deadlocks can occur.
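
    A minimal sketch of both lock types (the orders table is hypothetical):

    -- table-level lock (works with both MyISAM and InnoDB)
    LOCK TABLES orders WRITE;   -- exclusive access to the whole table
    UNLOCK TABLES;

    -- row-level lock (InnoDB): only matching rows are locked, until COMMIT
    START TRANSACTION;
    SELECT * FROM orders WHERE id = 42 FOR UPDATE;
    COMMIT;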

    For details, see: Mysql lock mechanism

    The InnoDB storage engine has three lock algorithms:

    • Record lock: a lock on a single row record
    • Gap lock: locks a gap, i.e. a range excluding the records themselves
    • Next-key lock: record lock + gap lock; locks a range including the record itself

    Related knowledge:

    1. InnoDB uses next-key locks for row queries (see the sketch after this list)
    2. Next-key locking solves the Phantom Problem
    3. When the queried index has a unique attribute, InnoDB downgrades the next-key lock to a record lock
    4. Gap locks exist to prevent multiple transactions from inserting records into the same range, which would cause phantom problems
    5. There are two ways to explicitly disable gap locking (then only record locks are used, except for foreign-key constraint checks and duplicate-key checks): A. set the transaction isolation level to READ COMMITTED; B. set innodb_locks_unsafe_for_binlog to 1
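
    A minimal sketch of a next-key lock under REPEATABLE READ (assume a hypothetical table t(id INT PRIMARY KEY, val INT, KEY idx_val(val)) containing val values 10, 20, 30):

    START TRANSACTION;
    SELECT * FROM t WHERE val = 20 FOR UPDATE;
    -- InnoDB takes a next-key lock on idx_val: the record val = 20 plus the
    -- adjacent gaps, so another transaction inserting val = 15 or val = 25
    -- blocks until COMMIT, preventing phantom rows
    COMMIT;
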
  • ⑦ Large table optimization

    When the number of records in a single MySQL table grows too large, the database's CRUD performance drops significantly. Some common optimization measures:

  1. Limit the scope of data: prohibit queries that carry no condition limiting the data range. For example, when users query order history, restrict it to one month;

  2. Read/write separation: the classic database splitting scheme; the master handles writes and the replicas handle reads;

  3. Caching: use MySQL's own cache, and consider application-level caching for heavily read, rarely updated data;

  4. Vertical partition:

    Split according to the correlation of the tables in the database. For example, if the user table contains both the user login information and the user’s basic information, you can split the user table into two separate tables, or even put them into separate libraries.

    To put it simply, vertical splitting is the splitting of a table's columns: a table with many columns is split into multiple tables (see the SQL sketch after this list). This should make it a little easier to understand.

    Advantages of vertical splitting: smaller rows, fewer blocks read per query, and fewer I/Os. Vertical partitioning also simplifies table structure and makes maintenance easier.

    Disadvantages of vertical splitting: the primary key is duplicated, redundant columns must be managed, and Join operations may be needed (these can instead be done at the application layer). Vertical partitioning also makes transactions more complex;

  5. Horizontal splitting:

    Keep the table structure unchanged and shard the rows according to some strategy. Each shard goes to a different table or database, achieving distribution. Horizontal splitting can support very large amounts of data.

    Horizontal splitting means splitting a table's rows. When a table exceeds about 2 million rows, it slows down, and its data can then be split across multiple tables. For example, a user information table can be split into several user information tables to avoid the performance impact of too much data in a single table.

    Horizontal splitting can support very large amounts of data. Note that splitting tables alone only solves the problem of a single table being too large; since the shards still live on the same machine, it does little for MySQL's concurrency, so it is best to also split across databases.

    Horizontal sharding can support very large data volumes with little application-side change, but cross-shard transactions are hard to solve, cross-node Join performance is poor, and the logic is complicated. The author of “The Way to Train Java Engineers” recommends avoiding data sharding where possible because of the complexity it adds to logic, deployment, and operations. An ordinary table can support under 10 million rows with proper optimization. If sharding is necessary, prefer a client-side sharding architecture, which saves a network round trip to middleware. (See the SQL sketch after this paragraph.)
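
    A minimal SQL sketch of both ideas (all table and column names are hypothetical). The first two tables show a vertical split of a wide user table; the last statement uses MySQL's native HASH partitioning to approximate horizontal splitting on a single instance (sharding across machines needs the middleware discussed below):

    -- vertical split: login data and profile data share the same primary key
    CREATE TABLE users_login (
      user_id  BIGINT UNSIGNED PRIMARY KEY,
      username VARCHAR(64) NOT NULL,
      password_hash CHAR(60) NOT NULL
    );
    CREATE TABLE users_profile (
      user_id  BIGINT UNSIGNED PRIMARY KEY,  -- join at the application layer when both are needed
      nickname VARCHAR(64),
      bio      VARCHAR(500)
    );

    -- horizontal split: rows are spread over 4 physical partitions by user_id
    CREATE TABLE user_info (
      user_id BIGINT UNSIGNED NOT NULL,
      nickname VARCHAR(64),
      PRIMARY KEY (user_id)
    ) PARTITION BY HASH (user_id) PARTITIONS 4;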

    Here are two common solutions for database sharding:

    • Client proxy: the sharding logic lives on the application side, packaged in jar files and implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Ali's TDDL are two commonly used implementations.
    • Middleware proxy: a proxy layer is added between the application and the data, and the sharding logic is maintained centrally in the middleware service. Mycat, 360's Atlas, NetEase's DDB, and so on are implementations of this architecture.

For details, see MySQL Large table Optimization

Welcome to follow my WeChat public account: “Java Interview Customs Manual” (a warm public account with no ads, just simple technology sharing; I look forward to making progress together with you ~~~ I stick to original content and share good articles and all kinds of Java learning resources. Follow if you like; the account is simply the place where I record words and life, with no concern for profit.)