Why store data in a database? Storing data in memory: Advantages: fast access speed. Disadvantages: data cannot be stored permanently. Storing data in files: Advantages: data is stored permanently. Disadvantages: 1) slower than memory operations, with frequent I/O; 2) querying the data is inconvenient. Storing data in a database: 1) data is stored permanently; 2) SQL statements make querying convenient and efficient; 3) the data is convenient to manage.

What is SQL? Structured Query Language (SQL) is a database query language, used to access, query, update, and manage relational database systems.

What is MySQL? MySQL is a relational database management system developed by the Swedish company MySQL AB and now a product of Oracle. MySQL is one of the most popular relational database management systems and one of the best RDBMS applications for the web. It is commonly used in Java enterprise development because MySQL is open source, free, and easy to scale.

What are the three normal forms of a database? First normal form: every column is atomic and cannot be split further.

Second normal form: on the basis of first normal form, every non-primary-key column fully depends on the whole primary key, not just a part of it.

Third normal form: on the basis of second normal form, non-primary-key columns depend only on the primary key and not on other non-primary-key columns.

When designing a database schema, try to follow the three normal forms; if you deviate, there should be a good reason for it, such as performance. In practice, we often compromise on normalization for performance.
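As a concrete illustration, here is a hypothetical order-items table that violates second normal form (product_name depends only on product_id, which is just part of the composite key) and a decomposition that fixes it; all table and column names are made up for this sketch:

-- Violates 2NF: product_name depends on product_id alone,
-- not on the whole (order_id, product_id) primary key.
CREATE TABLE order_items_bad (
  order_id     INT,
  product_id   INT,
  product_name VARCHAR(100),
  quantity     INT,
  PRIMARY KEY (order_id, product_id)
);

-- Decomposition that satisfies 2NF:
CREATE TABLE products (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100)
);
CREATE TABLE order_items (
  order_id   INT,
  product_id INT,
  quantity   INT,
  PRIMARY KEY (order_id, product_id)
);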

MySQL servers use permission tables to control user access to the database. The permission tables are stored in the mysql database and initialized by the mysql_install_db script. These permission tables are user, db, tables_priv, columns_priv, and host. Their structure and contents are as follows:

- user permission table: records the user accounts that are allowed to connect to the server. The permissions in this table are global.
- db permission table: records each account's operation permissions on each database.
- tables_priv permission table: records table-level operation permissions.
- columns_priv permission table: records column-level operation permissions.
- host permission table: works together with the db permission table to control database-level operation permissions on a given host. This table is not affected by GRANT and REVOKE statements.

How many binlog formats are available in MySQL? What is the difference? There are three formats: statement, row, and mixed.

- In statement mode, every SQL statement that modifies data is recorded in the binlog. There is no need to record the change of every row, which reduces binlog volume, saves I/O, and improves performance. However, because SQL execution is contextual, the relevant context must be saved along with the statement, and some statements (those using certain functions and the like) cannot be recorded and replicated correctly.
- At the row level, the SQL statement's context is not recorded; only the modified records are stored. The unit of recording is the change of each row, so basically all changes can be recorded correctly. But many operations cause a large number of rows to change (such as ALTER TABLE), so this mode saves too much information and the logs grow too large.
- Mixed is a compromise: statement format is used for ordinary operations, and row format for operations that statement cannot replicate safely. In addition, newer versions of MySQL optimize the row level to record the statement, rather than row-by-row changes, when the table structure changes.
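For reference, a quick sketch of how one might inspect and switch the binlog format on a running server (requires the appropriate privileges; the variable name is the standard MySQL one):

SHOW VARIABLES LIKE 'binlog_format';   -- current format: STATEMENT, ROW, or MIXED
SET SESSION binlog_format = 'ROW';     -- switch for the current session only
SET GLOBAL binlog_format = 'MIXED';    -- switch for new connections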

Data types: What data types does MySQL have?

1. Integer types, including TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT, which occupy 1, 2, 3, 4, and 8 bytes respectively. Any integer type can have the UNSIGNED attribute to indicate that the value is unsigned, that is, a non-negative integer. Length: an integer type can specify a length, for example INT(11) is an INT with display length 11. The length is meaningless in most scenarios: it does not limit the legal range of values and only affects the number of characters displayed; it is only meaningful in combination with the UNSIGNED ZEROFILL attribute. For example, if the type is INT(5) with UNSIGNED ZEROFILL and the user inserts 12, the database actually displays 00012.

2. Real number types, including FLOAT, DOUBLE, and DECIMAL. DECIMAL can store integers larger than BIGINT and can store exact decimals. FLOAT and DOUBLE have limited ranges and support approximate calculation using standard floating-point arithmetic. FLOAT and DOUBLE are much more efficient than DECIMAL in computation; DECIMAL can be understood as being processed like a string.

3. String types, including VARCHAR, CHAR, TEXT, and BLOB. VARCHAR stores variable-length strings and saves more space than fixed-length types. VARCHAR uses an extra 1 or 2 bytes to store the string length: 1 byte if the column length is at most 255 bytes, otherwise 2 bytes. If the content stored in a VARCHAR exceeds the set length, the content is truncated. CHAR is fixed-length: sufficient space is allocated according to the defined string length, and CHAR values are padded with spaces as needed for comparison purposes. CHAR is good for storing very short strings, or values that are all close to the same length. CHAR also truncates content that exceeds the set length. Usage strategy: for frequently changing data, CHAR is better than VARCHAR because CHAR is less prone to fragmentation. For very short columns, CHAR is more storage-efficient than VARCHAR. Be careful to allocate only as much space as you need: sorting longer columns consumes more memory. Avoid the TEXT/BLOB types: queries on them use temporary tables, which incurs significant performance overhead.

4. The enumeration type (ENUM) stores non-repeating values as a predefined set. Sometimes you can use ENUM instead of a common string type. ENUM storage is very compact, condensing the list of values to one or two bytes; internally, ENUM is stored as an integer. Avoid using numbers as ENUM constants because they are confusing. Sorting is based on the internally stored integer.

5. Date and time types: TIMESTAMP is more space-efficient than DATETIME. If you need to store microseconds, you can use BIGINT.

Storage engines: a storage engine determines how data, indexes, and other objects in MySQL are stored; it is a file-system-level implementation. The common storage engines are as follows: InnoDB engine: supports ACID transactions, row-level locking, and foreign key constraints. It is designed to handle database systems with large data volumes. MyISAM engine (the default engine before MySQL 5.5): does not support transactions, row-level locking, or foreign keys. MEMORY engine: all data is stored in memory; data processing is fast, but the data is not safe.
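To make the trade-offs concrete, here is a small illustrative table definition exercising these types (the table and its columns are hypothetical):

CREATE TABLE product (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- 4-byte non-negative integer
  code       CHAR(8)      NOT NULL,                 -- fixed length, always 8 characters
  name       VARCHAR(100) NOT NULL,                 -- variable length, plus 1 length byte
  price      DECIMAL(10,2),                         -- exact decimal, good for money
  weight     FLOAT,                                 -- approximate, cheaper to compute
  status     ENUM('draft','active','retired'),      -- stored internally as an integer
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,   -- more compact than DATETIME
  PRIMARY KEY (id)
) ENGINE = InnoDB;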

How is MyISAM different from InnoDB?

What is the difference between a MyISAM index and an InnoDB index?

- An InnoDB index is a clustered index; a MyISAM index is a non-clustered index.
- The leaf nodes of InnoDB's primary key index store the row data, so the primary key index is very efficient.
- The leaf nodes of a MyISAM index store the address of the row data, so an extra lookup is needed to get the data.
- The leaf nodes of InnoDB's non-primary-key indexes store the primary key and the other indexed columns, so covering indexes are very efficient.

Key features of the InnoDB engine:

- Insert buffer
- Double write
- Adaptive hash index
- Read ahead

Storage engine choice: if there are no special requirements, use the default InnoDB. MyISAM suits read- and insert-heavy applications, such as blogging systems and news portals. InnoDB suits applications that update and delete frequently, need guaranteed data integrity, or need high concurrency with transactions and foreign keys, such as an OA office automation system.

Indexes: What is an index? Indexes are special files (on InnoDB tables, indexes are part of the tablespace) that contain pointers to all the records in the table.

An index is a data structure. A database index is a sorted data structure in a database management system that helps quickly query and update data in a database table. Indexes are usually implemented using B-trees and their variant, B+ trees.

More generally, an index is like the table of contents of a book: to make it easy to find the book's contents, the contents are organized into a catalog. An index is a file and occupies physical space.

What are the advantages and disadvantages of indexes? Advantages:

- They can greatly speed up data retrieval, which is the main reason for creating indexes.
- Using indexes, the query optimizer can improve system performance during queries.

Disadvantages:

- Time: it takes time to create and maintain indexes. Indexes must be maintained dynamically when data in a table is added, deleted, or modified, which reduces the efficiency of those operations.
- Space: indexes occupy physical space.

Index usage scenarios (important): WHERE

Consider querying a record by id. If the id field has only the primary key index, then this SQL can only choose the primary key index for execution. If there are several candidate indexes, the optimizer will eventually select the better one as the basis for retrieval.

alter table innodb1 add sex char(1);
EXPLAIN SELECT * from innodb1 where sex='male'; -- sex has no index, so possible_keys is NULL

ORDER BY: index the fields used for sorting: alter table table_name add index(column_name);

JOIN: index the fields involved in the join condition (ON) of the JOIN statement to improve efficiency.

Index covering

If the fields to be queried are all covered by an index, the engine answers the query directly from the index table without accessing the original data (whereas if any queried field is not in the index, the full row must be fetched). This is called index covering. Therefore, write only the necessary fields after SELECT, to increase the chance of index coverage. Note that you should not index every field, because the advantage of indexes comes from their small size.

What are the types of indexes?

- Primary key index: column values cannot repeat and cannot be NULL; a table can have only one primary key.
- Unique index: column values cannot repeat, but NULL values are allowed; a table may create unique indexes on multiple columns. You can create one with ALTER TABLE table_name ADD UNIQUE (column); or a unique composite index with ALTER TABLE table_name ADD UNIQUE (column1,column2);
- Plain index: the basic index type, with no uniqueness restriction; NULL values are allowed. Create one with ALTER TABLE table_name ADD INDEX index_name (column); or a composite index with ALTER TABLE table_name ADD INDEX index_name(column1, column2, column3);
- Full-text index: a key technology used by search engines. Create one with ALTER TABLE table_name ADD FULLTEXT (column);

The data structure of an index depends on the storage engine's implementation. MySQL indexes include hash indexes, B+ tree indexes, and so on; the default index of the commonly used InnoDB storage engine is the B+ tree index. For a hash index, the underlying data structure is a hash table; when the vast majority of queries are single-record lookups, you can choose a hash index for the fastest query performance. In most other scenarios, the BTree index is recommended. MySQL fetches data through the storage engine, and almost 90% of installations use InnoDB. By implementation, there are only two common index types: BTREE and HASH. The B-tree index is the most frequently used index type in MySQL, and almost all storage engines support it.

Query process: the primary key index stores the associated row data with each key, so querying by primary key retrieves data directly; a plain (secondary) index stores the associated primary key value, which is then used to look up the row through the primary key index.

1) B+ tree: 1. A node with n subtrees contains n keys and does not store data. 2. All leaf nodes contain all the keys and pointers to the records containing those keys, and the leaf nodes themselves are linked together in key order. 3. All non-terminal nodes can be regarded as the index part, containing only the largest (or smallest) key of their subtrees. 4. In a B+ tree, data objects are inserted and deleted only on leaf nodes. 5. A B+ tree has two head pointers: one to the root node of the tree, and one to the leaf node with the smallest key.

2) Hash index: briefly, it is similar to the hash table data structure. When MySQL uses a hash index, a hash algorithm (common ones include direct addressing, mid-square, folding, division-modulo, and random number methods) converts the column value into a fixed-length hash value, which is stored at the corresponding position of the hash table together with a row pointer to the data. If a hash collision occurs (two different keys produce the same hash value), the entries are stored in a linked list under the corresponding hash slot. Of course, this is only a rough description.

Fundamentals of indexing: indexes are used to quickly find records with specific values. Without an index, a query generally traverses the entire table. The principle of indexing is simple: turn unordered data into an ordered query. 1. Sort the contents of the indexed column. 2. Generate an inverted list from the sort result. 3. Attach the data address chain to the inverted list. 4. During a query, first get the inverted list content, then take out the data address chain, and finally fetch the actual data.

What index algorithms are there? Index algorithms include the BTree algorithm and the Hash algorithm.

BTree algorithm: BTree is the most commonly used index algorithm in MySQL and also the default. It can be used not only with the =, >, >=, <, <= and BETWEEN comparison operators, but also with the LIKE operator, as long as the pattern is a constant that does not begin with a wildcard. For example, select * from user where name like 'jack%'; can use the index, while select * from user where name like '%jack'; cannot.

Hash algorithm: a hash index can only be used for equality comparison, for example with the = and <=> (null-safe equals) operators. Because it locates the data in one step, unlike a BTree index, which needs several I/O accesses from the root node through the branch nodes to the leaf pages, its retrieval efficiency is much higher than a BTree index for such lookups.

Principles of index design? 1. Suitable columns for indexing are those that appear in the WHERE clause or are specified in a join condition. 2. Use a short index: if you index a long string column, specify a prefix length, which saves a lot of index space. 3. Don't over-index: indexes require additional disk space and reduce write performance. When table contents are modified, the indexes are updated or even rebuilt, and the more indexed columns there are, the longer this takes. Keep only the indexes your queries need. 4. For composite indexes, mind the leftmost prefix: for a condition like a = 1 and b = 2 and c > 3 and d = 4, matching proceeds from the left until a range condition is hit. 5. Columns whose values cannot be effectively distinguished are unsuitable for indexing (such as gender: male, female, unknown — at most three values, too low a distinction). 6. Extend existing indexes as far as possible instead of creating new ones: for example, if a table already has an index on a and you want (a,b), just modify the original index. 7. Foreign key columns must be indexed. 8. Do not create indexes on columns rarely involved in queries, or on columns with many duplicate values. 9. Do not index columns of types such as text, image, and bit.

There are three ways to create an index. The first way: at table creation time:

CREATE TABLE user_index2 (
  id INT auto_increment PRIMARY KEY,
  first_name VARCHAR(16),
  last_name VARCHAR(16),
  id_card VARCHAR(18),
  information text,
  KEY name (first_name, last_name),
  FULLTEXT KEY (information),
  UNIQUE KEY (id_card)
);

The second way: with ALTER TABLE:

ALTER TABLE table_name ADD INDEX index_name (column_list);

ALTER TABLE can create a common index, a UNIQUE index, or a PRIMARY KEY index. table_name is the table to which the index is added, and column_list indicates the indexed columns. If there are multiple columns, they are separated by commas. The index name index_name is chosen by you; by default, MySQL assigns a name based on the first index column. ALTER TABLE also allows multiple changes in a single statement, so multiple indexes can be created at the same time.

The third way: with the CREATE INDEX command:

CREATE INDEX index_name ON table_name (column_list);

CREATE INDEX can add a normal or UNIQUE index to a table.

Deleting an index:

alter table user_index drop KEY name;
alter table user_index drop KEY id_card;
alter table user_index drop KEY information;

Deleting a primary key: alter table table_name drop primary key; note that this cannot be done directly if the primary key is auto-increment (auto-increment depends on the primary key index):

alter table user_index
-- redefine the field so it is no longer auto-increment
MODIFY id int,
drop PRIMARY KEY;

But usually you do not drop the primary key, because primary key design should be unrelated to the business logic anyway.

What should I pay attention to when creating an index?

- Non-null fields: declare columns NOT NULL unless you really intend to store NULL. Columns with null values are difficult for MySQL to optimize, because they complicate indexes, index statistics, and comparison operations. Replace null with 0, a special value, or an empty string.
- Place the columns with more distinct values earlier in a composite index. You can use count(distinct col)/count(*) to check a column's selectivity: the larger the value, the more unique values the column has and the higher its dispersion.
- The smaller the indexed field, the better: the database stores more entries per page, so each I/O operation fetches more data and is more efficient.

Does using indexed queries necessarily improve query performance? Querying data through an index is usually faster than a full table scan, but we must also be aware of the costs.

- Indexes need space to store and need regular maintenance. The index itself is modified whenever records are added to or deleted from the table, or indexed columns are changed. This means each INSERT, DELETE, and UPDATE can cost 4 or 5 extra disk I/Os. Because indexes require extra storage and processing, unnecessary indexes slow down query response times.
- Using index queries may not improve performance. Index range scans apply in two situations: range-based retrieval, where the query generally returns a result set smaller than 30% of the rows in the table; and retrieval based on non-unique indexes.

Because index files incur extra maintenance costs, adding, modifying, and deleting data generates additional operations on the index files, consuming extra I/O and reducing the efficiency of insert/update/delete. So when deleting millions of rows of data (the MySQL manual notes that deletion speed is proportional to the number of indexes): 1. First drop the indexes (this takes about three minutes). 2. Then delete the unneeded data (this takes less than two minutes). 3. After the deletion completes, re-create the indexes (there is less data now, so this is fast). 4. This is much faster than deleting directly; moreover, if a direct delete is interrupted, everything is rolled back, which is even worse.

Prefix index syntax: index(field(10)) uses the first 10 characters of the field value to build the index; by default, the entire content of the field is used. Prerequisite: the prefix must be highly selective. Passwords, for example, suit prefix indexing because they are almost always different. The practical difficulty is choosing the prefix length. You can evaluate it with select count(*)/count(distinct left(password, prefixLen)); by adjusting the prefixLen value (incrementing from 1), you see the average number of rows matched per prefix of that length; a value close to 1 means prefixLen characters of a password almost determine a single record.
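A minimal sketch of picking and creating a prefix index, following the password example above (the chosen length 6 and the index name are purely illustrative):

-- average number of rows per 6-character prefix; a result near 1 is good
select count(*)/count(distinct left(password, 6)) from user;
-- create the prefix index once a good length is found
alter table user add index idx_password (password(6));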
What is the leftmost prefix rule? The leftmost matching rule: as the name implies, leftmost first. When you create a multi-column index, place the column most frequently used in WHERE clauses on the left.

- MySQL keeps matching the index columns to the right until it encounters a range query (>, <, between, like). For example, for a = 1 and b = 2 and c > 3 and d = 4, with an index built in the order (a,b,c,d), d cannot use the index; with an index built as (a,b,d,c), all four can be used, and the order of a, b, d in the query can be arbitrary.
- = and in conditions can appear in any order: an (a,b,c) index can be built in any order, and the MySQL query optimizer will rearrange the conditions for you.

The difference between B trees and B+ trees:

- In a B tree, keys and values can be stored in both internal nodes and leaf nodes; in a B+ tree, internal nodes hold only keys, while leaf nodes hold both keys and values.
- The leaf nodes of a B+ tree are connected by a chain, while the leaf nodes of a B tree are independent.

Benefits of using a B tree: a B tree can store keys and values in internal nodes, so frequently accessed data can sit near the root node, which greatly improves the efficiency of hot-data queries. This makes B trees more efficient in scenarios where particular records are queried repeatedly.

Advantages of using a B+ tree: because the internal nodes of a B+ tree store only keys, not values, a single read of a memory page obtains more keys, which helps narrow down the search faster. The leaf nodes of a B+ tree are chained together, so when a full data traversal is needed, the B+ tree only needs O(log N) time to find the smallest node and then O(N) sequential traversal along the chain. A B tree must instead traverse every level of the tree, which requires more memory page swaps and therefore more time.

The basic implementations of hash indexes and B+ tree indexes: a hash index is a hash table; a lookup calls the hash function once to obtain the corresponding slot, then reads the table to get the actual data. The underlying implementation of a B+ tree index is a multi-way balanced search tree; each query starts from the root node, obtains the key on reaching a leaf node, and then, depending on the query, decides whether it needs to go back to the table for the row data. The differences are:

- Hash indexes are faster for equality queries (in general), but cannot do range queries. After the hash function is applied, the index order no longer matches the original order, so range queries cannot be supported. All nodes of a B+ tree follow ordering rules (left children smaller than the parent, right children larger, and likewise for multi-way trees), which naturally supports ranges.
- Hash indexes do not support sorting by index.
- Hash indexes do not support fuzzy queries or leftmost-prefix matching on multi-column indexes, for the same reason: hash values are unpredictable, so the hashes of AAAA and AAAAB have no correlation.
- Hash indexes always require going back to the table for the data, whereas a B+ tree can answer from the index alone under certain conditions (clustered indexes, covering indexes, etc.).
- Hash indexes are faster for equality queries but unstable: performance is unpredictable, and when there are many duplicate key values, hash collisions make efficiency very poor. The query efficiency of a B+ tree is stable: every query goes from the root node to a leaf node, and the tree height is low. Therefore, in most cases, choosing a B+ tree index gives stable and good query speed, rather than using a hash index.

Why does the database use B+ trees instead of B trees?

- A B tree is only suitable for random retrieval, while a B+ tree supports both random and sequential retrieval.
- B+ tree space utilization is higher, reducing I/O counts and disk read/write costs. Indexes are generally too large to stay in memory, so they are stored on disk as index files, and lookups then incur disk I/O. The internal nodes of a B+ tree carry no pointers to the record details of keys; they serve purely as an index, so they are smaller than the internal nodes of a B tree. A disk block can therefore hold more keys per node, and each read into memory covers more keys; the number of I/O reads and writes is the biggest factor in index retrieval efficiency.
- A B+ tree provides stable query efficiency. A B-tree search may end at a non-leaf node: the closer a record is to the root, the shorter its search time; once the key is found, the record's existence is confirmed, with performance like a binary search over the whole key set. In a B+ tree, however, every key lookup must go from the root to a leaf, so all search paths have the same length and every key is found with the same efficiency, and sequential retrieval is much more natural.
- A B tree improves disk I/O performance but does not fix inefficient element traversal. The leaf nodes of a B+ tree are connected in order by pointers, so traversing the whole tree only requires traversing the leaf nodes. Moreover, range-based queries are very frequent in databases, and B trees do not support such operations well.
- Inserting and deleting nodes is more efficient, because the leaf nodes of a B+ tree contain all the keys in an ordered linked-list structure.

A B+ tree's leaf node may store just the key, or the key together with the entire row of data. This is the difference between a clustered index and a non-clustered index. In InnoDB, only the primary key index is a clustered index; if there is no primary key, a unique key is chosen to build the clustered index, and if there is no unique key either, a key is generated implicitly to build it. When a query uses the clustered index, the entire row can be read at the corresponding leaf node, so no back-to-table query is needed.

What is a clustered index? (clustered vs. non-clustered)

- Clustered index: the data and the index are stored together; finding the index means finding the data.
- Non-clustered index: the data and the index are stored separately. MyISAM uses key_buffer to cache indexes in memory; when it needs to access data (through the index), it searches the index in memory and then finds the corresponding data on disk through the index. This is why indexes are slow when they miss the key buffer.

In InnoDB, indexes created on top of the clustered index are called secondary indexes, and accessing data through a secondary index always requires a second lookup. A non-clustered index is a secondary index, whether it is a composite index, a prefix index, or a unique index; the leaf nodes of a secondary index no longer store a physical location, but the primary key value.
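A small sketch of the two lookup paths (the table is hypothetical; InnoDB engine):

CREATE TABLE employee (
  id   INT PRIMARY KEY,   -- clustered index: leaf nodes hold the full row
  name VARCHAR(32),
  age  INT,
  KEY idx_age (age)       -- secondary index: leaf nodes hold age + the primary key id
) ENGINE = InnoDB;

-- served by the clustered index directly, no second lookup:
SELECT * FROM employee WHERE id = 42;
-- uses idx_age first, then looks the row up again by primary key (back to the table):
SELECT * FROM employee WHERE age = 30;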

Must a non-clustered index involve a query back to the table? Not necessarily. It depends on whether all the fields required by the query are covered by the index. If they all are, there is no need to go back to the table. For example: select age from employee where age < 20; if age is indexed, the age values are already in the index, so the query is answered from the index alone.

What is a composite (joint) index? Why does the column order in a composite index matter? MySQL can create an index on multiple fields at once, called a composite index. To hit a composite index, the query must match the index columns one by one in the order in which the index was created; otherwise, the index cannot be used. For example, for a composite index (name, age, school): the entries are strictly ordered by name first, so the name field must be used with an equality condition first; then, among equal names, entries are strictly ordered by age, so the age field can be used next for index lookup, and so on. Therefore, when building a composite index, pay attention to the column order: in general, put the columns with frequent query demand or high selectivity first. Additional adjustments can be made depending on the specific queries or table structure.

Transactions: What is a database transaction? A transaction is an indivisible sequence of database operations and the basic unit of database concurrency control. The result of its execution must take the database from one consistent state to another. A transaction is a logical set of operations that either all execute or none execute.

The most classic and often cited example of a transaction is the transfer of money.

If Xiao Ming wants to transfer 1000 yuan to Xiao Hong, the transfer involves two key operations: reducing Xiao Ming's balance by 1000 yuan and increasing Xiao Hong's balance by 1000 yuan. If something goes wrong between these two operations, like the banking system crashing, so that Xiao Ming's balance goes down while Xiao Hong's does not go up, that would be wrong. A transaction guarantees that these two critical operations either both succeed or both fail.

What are the four properties of ACID? Relational databases must follow the ACID rule, which reads as follows:

1. Atomicity: a transaction is the smallest unit of execution and cannot be split. Atomicity ensures that the actions either complete entirely or have no effect at all. 2. Consistency: data is consistent before and after a transaction executes; multiple transactions reading the same data get the same result. 3. Isolation: under concurrent access to the database, one user's transaction is not disturbed by other transactions; concurrent transactions are independent of one another. 4. Durability: after a transaction is committed, its changes to the data in the database are persistent and should survive database failures.

What are dirty reads, phantom reads, and non-repeatable reads?

- Dirty read: one transaction updates a piece of data, and another transaction reads the same piece of data. If the first transaction then rolls back for some reason, the data read by the second transaction is incorrect.
- Non-repeatable read: the data read by two queries within one transaction is inconsistent. This may be because another transaction's update was committed between the two queries.
- Phantom read: the number of rows is inconsistent between two queries within one transaction. For example, one transaction queries a set of rows while another transaction inserts new rows; in its next query, the first transaction finds rows that were not there before.

What is the isolation level of a transaction? What is the default isolation level for MySQL? In order to achieve the four characteristics of transaction, the database defines four different transaction isolation levels, which are Read uncommitted, Read committed, Repeatable Read, Serializable. The four levels solve the problems of dirty reads, unrepeatable reads, and phantom reads one by one.

The SQL standard defines four isolation levels:

- READ-UNCOMMITTED: the lowest isolation level; uncommitted data changes are allowed to be read, which may cause dirty reads, phantom reads, and non-repeatable reads.
- READ-COMMITTED: concurrent transactions may read data that has been committed. This prevents dirty reads, but phantom reads and non-repeatable reads may still occur.
- REPEATABLE-READ: multiple reads of the same field are consistent, unless the data is modified by the transaction itself. This prevents dirty reads and non-repeatable reads, but phantom reads are still possible.
- SERIALIZABLE: the highest isolation level, fully compliant with ACID. All transactions execute one by one, so interference between transactions is completely impossible. This level prevents dirty reads, non-repeatable reads, and phantom reads.
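The level can be inspected and changed per session; a quick sketch using standard MySQL syntax (the system variable is transaction_isolation in MySQL 8.0, tx_isolation in older versions):

SELECT @@transaction_isolation;                          -- current level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- this session only
SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- new sessions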

MySQL uses the REPEATABLE-READ isolation level by default; Oracle uses the READ-COMMITTED isolation level by default.

The implementation of transaction isolation is based on locking and concurrency scheduling. Concurrency scheduling uses MVCC (Multi-Version Concurrency Control), which supports consistent concurrent reads and rollback by keeping the old versions of modified data.

Most database systems default to READ-COMMITTED, because the lower the isolation level, the fewer locks are requested. But remember that the InnoDB storage engine uses REPEATABLE READ by default without any performance penalty.

The InnoDB storage engine typically uses the SERIALIZABLE isolation level for distributed transactions.

Locks: What is a database lock? When a database has concurrent transactions, data inconsistencies may occur, and some mechanism is needed to ensure the order of access. The locking mechanism is such a mechanism.

Just like a hotel room: if people could go in and out at random, many people would fight over the same room. So a lock is installed on the room; only the person who obtains the key can enter and lock it, and others can use it only after the occupant is done.

Relationship between isolation levels and locks: at the Read Uncommitted level, read operations do not acquire shared locks, so they do not conflict with exclusive locks on data being modified.

At the Read Committed level, read operations acquire shared locks, but release them as soon as the statement finishes.

At the Repeatable Read level, read operations acquire shared locks that are not released before the transaction commits; that is, the shared lock is released only after the transaction completes.

SERIALIZABLE is the most restrictive isolation level because it locks the entire range of keys and holds the lock until the transaction completes.

What kinds of database locks are there by granularity? In relational databases, locks can be divided by granularity into row-level locks (InnoDB engine), table-level locks (MyISAM engine), and page-level locks (BDB engine).

- Locking in the MyISAM and InnoDB storage engines: MyISAM uses table-level locking; InnoDB supports both row-level and table-level locking, with row-level locking as the default.
- Row-level locking is the finest-grained locking MySQL provides: it greatly reduces conflicts between database operations. Its granularity is the smallest, but its locking cost is the largest. Row-level locks are divided into shared locks and exclusive locks.

Features: high overhead, slow locking; deadlocks can occur; the lock granularity is the smallest, the probability of lock conflict is the lowest, and the concurrency is the highest.

Table-level locks are the coarsest-grained locks in MySQL: they lock the entire table for the current operation. They are simple to implement, consume few resources, and are supported by most MySQL engines; the most commonly used engines, MyISAM and InnoDB, both support table-level locking. Table-level locks come as table shared read locks (shared locks) and table exclusive write locks (exclusive locks).

Features: low overhead, fast lock; No deadlocks occur; The lock granularity is large, and the probability of lock conflict is high and the concurrency is low.

Page-level locks are a type of lock whose granularity falls between row-level and table-level locks in MySQL. Table-level locking is fast but conflict-prone, while row-level locking conflicts less but is slower; page-level locking is a compromise, locking an adjacent set of records at a time.

Features: overhead and locking time are between those of table locks and row locks; deadlocks can occur; the locking granularity is between table locks and row locks, and the concurrency is moderate.

What lock types does MySQL have? Locking everything indiscriminately would hurt concurrency. In terms of lock types, there are shared locks and exclusive locks.

Shared lock: also known as a read lock. When a user reads data, a shared lock is placed on it. Multiple shared locks can be held at the same time.

Exclusive lock: also known as a write lock. An exclusive lock is placed on data when a user writes to it. Only one exclusive lock can be held at a time, and it is mutually exclusive with all other exclusive locks and shared locks.

In the hotel example above, there are two kinds of user behavior. One is viewing the room: it is fine for several people to view it together. The other is actually staying the night: during the stay, neither those who want to stay nor those who want to view the room are allowed in.
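In SQL, the two modes look like this (a sketch; LOCK IN SHARE MODE is the pre-8.0 spelling of FOR SHARE, and tab_with_index is the table used in the row-lock example below):

BEGIN;
SELECT * FROM tab_with_index WHERE id = 1 LOCK IN SHARE MODE; -- shared (read) lock
SELECT * FROM tab_with_index WHERE id = 1 FOR UPDATE;         -- exclusive (write) lock
COMMIT;                                                       -- locks released here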

The granularity of locking depends on the specific storage engine: InnoDB implements row-level and table-level locking, while page-level locking is found in engines such as BDB.

From row-level through page-level to table-level locks, the locking overhead goes from high to low, and so does the concurrency. How does the MySQL InnoDB engine implement row locks? Answer: InnoDB does row locking based on indexes:

Select * from tab_with_index where id = 1 for update;

With FOR UPDATE, InnoDB can lock rows based on the condition, and here id is an indexed column. If id were not an index key, InnoDB would lock the whole table instead, and concurrency would be lost.

Gap lock: locks a range, but not the record itself. Next-key lock: record lock + gap lock, locking a range including the record. Notes: 1. InnoDB uses next-key locking for row locks by default. 2. Next-key locking solves the phantom read problem. 3. When the queried index has a uniqueness attribute, the next-key lock is demoted to a record lock. 4. Gap locks exist to prevent multiple transactions from inserting records into the same range, which would cause phantom reads. 5. There are two ways to explicitly turn off gap locks (then only record locks are used, except for foreign-key constraint checks and uniqueness checks): A. set the transaction isolation level to READ COMMITTED; B. set innodb_locks_unsafe_for_binlog to 1.

What is a deadlock? How do you solve it? A deadlock is a vicious cycle in which two or more transactions occupy each other's resources and request locks on resources the others hold.

Common solutions to deadlocks

1. If different programs concurrently access multiple tables, try to agree to access the tables in the same order, which can greatly reduce the chance of deadlocks.

2. Within the same transaction, try to lock all the resources needed at once, to reduce the probability of deadlock.

3. For business operations that are prone to deadlocks, you can upgrade the locking granularity and use table-level locking to reduce the probability of deadlocks.

If none of that fits the business, you can use distributed transaction locks or optimistic locking.

What are optimistic and pessimistic locks in databases? How are they implemented? The task of concurrency control in a database management system (DBMS) is to ensure that transaction isolation and consistency, and the integrity of the database, are not broken when multiple transactions access the same data simultaneously. Optimistic concurrency control (optimistic locking) and pessimistic concurrency control (pessimistic locking) are the main techniques used for concurrency control.

Pessimistic locking: assumes concurrency conflicts will happen and shields every operation that might violate data integrity. The data is locked when queried in the transaction and stays locked until the transaction commits. Implementation: uses the locking mechanism in the database.

Optimistic locking: assumes no concurrency conflicts will occur and checks for data integrity violations only at commit time. The conflict check happens when the data is modified, by way of a version field. Implementation: optimistic locks are generally implemented with a version-number mechanism or the CAS algorithm.
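A minimal sketch of the version-number approach (table and columns are hypothetical): read the row and its version, then make the update conditional on the version being unchanged; zero affected rows means another transaction got there first and the caller should retry or abort.

-- read the current state
SELECT balance, version FROM account WHERE id = 1;
-- attempt the update; it succeeds only if nobody changed the row meanwhile
UPDATE account
SET balance = balance - 100, version = version + 1
WHERE id = 1 AND version = 7;   -- 7 = the version read above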

Two types of lock usage scenarios

From the introduction above, the two kinds of lock each have advantages and disadvantages, and neither can be considered better than the other in all cases. For example, optimistic locking suits scenarios with few writes (read-heavy scenarios), where conflicts rarely occur; this saves locking overhead and increases the overall throughput of the system.

However, in write-heavy scenarios, conflicts arise frequently, causing the upper application to retry repeatedly, which actually lowers performance. Pessimistic locking suits write-heavy scenarios.

Views: Why use views? What is a view? To improve the reusability of complex SQL statements and the security of table operations, MySQL provides the view feature. A view is essentially a virtual table that does not physically exist; it contains a series of named columns and rows like a real table, but views do not exist in the database as stored data values. The row and column data come from the base tables referenced by the query that defines the view, generated dynamically whenever the view is referenced.

Views improve the security of data in the database by allowing developers to focus only on the specific data they are interested in and the specific tasks they are responsible for: they see only the data defined in the view rather than the data in the tables the view references. What are the characteristics of views? Views have the following characteristics:

- The columns of a view can come from different tables; a view is an abstraction over tables and a new relation in the logical sense.
- A view is a (virtual) table generated from base (real) tables.
- Creating and deleting a view does not affect the base tables.
- Updating a view's content (insert, delete, update) directly affects the base tables.
- If a view comes from multiple base tables, data cannot be inserted or deleted through it.

Operations on a view include creating, viewing, deleting, and modifying it. What are the usage scenarios for views? The basic purpose of views is to simplify SQL queries and improve development efficiency; another use is compatibility with old table structures.

The following are common usage scenarios for views:

- Reuse SQL statements;
- Simplify complex SQL operations: after you write a query as a view, you can reuse it easily without knowing its underlying query details;
- Expose parts of a table instead of the entire table;
- Protect data: users can be granted access to a specific part of a table rather than the entire table;
- Change the data format and presentation: views can return data presented and formatted differently from the underlying tables.
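For instance, a sketch of a view exposing only part of a table (the view name and status column are hypothetical), which covers the reuse, partial-access, and data-protection points at once:

CREATE VIEW active_employee AS
SELECT id, name, dept_id        -- salary is deliberately not exposed
FROM employee
WHERE status = 'active';

-- callers then query the view like an ordinary table:
SELECT * FROM active_employee WHERE dept_id = 3;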

Advantages of views: 1. Simple queries: views simplify the user's operations. 2. Data security: views let users see the same data from different perspectives and provide a security layer for confidential data. 3. Logical data independence: views provide a degree of logical independence when refactoring the database.

Disadvantages of views: 1. Performance: if a view is defined by a complex multi-table query, then even a simple query against the view takes the database some time to turn into that complex combination.

2. Modification restrictions: when the user tries to modify some rows of a view, the database must translate this into changes to rows of the base tables. The same applies to inserting into or deleting from a view. This works for simple views, but more complex views may not be modifiable.

Views with the following characteristics cannot be updated: 1. views containing set operators such as UNION; 2. views with a GROUP BY clause; 3. views with aggregate functions such as AVG, SUM, or MAX; 4. views using the DISTINCT keyword; 5. join-table views (with some exceptions).

What is a cursor? A cursor is a data buffer created by the system for the user to store the execution results of SQL statements. Each cursor area has a name. Through a cursor, the user can fetch records one by one and assign them to host variables for further processing by the host language.

Stored procedures and functions: What is a stored procedure? What are the pros and cons? A stored procedure is a set of precompiled SQL statements. Its advantage is that it allows modular design: it is created once and can be called many times afterwards. If an operation requires executing multiple SQL statements, using a stored procedure is faster than executing the SQL statements one by one.

Advantages:

1) Stored procedures are precompiled and run efficiently.

2) Stored procedure code is stored directly in the database and invoked directly by the procedure name, reducing network communication.

3) High security. Users with certain permissions are required to execute stored procedures.

4) Stored procedures can be reused to reduce the workload of database developers.
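Before the drawbacks, a minimal sketch of defining and calling one (the procedure, table, and parameter names are all hypothetical):

DELIMITER //
CREATE PROCEDURE raise_salary(IN p_dept_id INT, IN p_amount DECIMAL(10,2))
BEGIN
  UPDATE employee
  SET salary = salary + p_amount
  WHERE dept_id = p_dept_id;
END //
DELIMITER ;

CALL raise_salary(3, 500.00);   -- reusable anywhere, one network round trip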

Disadvantages:

1) Debugging is troublesome (though debugging with PL/SQL Developer is quite convenient, which makes up for this shortcoming).

2) Migration problems: database-side code is naturally tied to the database. But for in-house engineering projects, there are basically no migration problems.

3) Recompilation problems: because the database-side code is compiled before running, if an object it references changes, the affected stored procedures and packages need to be recompiled (though this can also be set to happen automatically at run time).

4) If a system uses a large number of stored procedures, then as user requirements grow after delivery, the data structures will change, and the related stored procedures must follow; maintaining such a system becomes very difficult, and the cost is considerable.

Triggers: What is a trigger? What are the use scenarios for triggers? A trigger is a special, event-driven stored procedure defined by the user on a relational table: a piece of code that executes automatically when an event fires. Usage scenarios:

- Changes can be cascaded through related tables in the database.
- Monitor changes to a field of a table in real time and take corresponding action.
- For example, generating certain business numbers.
- Be careful not to abuse triggers, or they will make the database and applications hard to maintain.
- Keep the basics firmly in mind; the point is to understand concepts such as the difference between the CHAR and VARCHAR data types, and between the InnoDB and MyISAM storage engines.

What triggers are available in MySQL? There are six types of triggers in MySQL: Before Insert, After Insert, Before Update, After Update, Before Delete, After Delete.
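A minimal sketch of an After Insert trigger cascading a change (the trigger name and the headcount column are hypothetical):

DELIMITER //
CREATE TRIGGER after_employee_insert
AFTER INSERT ON employee
FOR EACH ROW
BEGIN
  -- cascade the change: keep a per-department headcount up to date
  UPDATE dept SET headcount = headcount + 1 WHERE id = NEW.dept_id;
END //
DELIMITER ;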

Common SQL statements: What types do SQL statements fall into?

Data Definition Language (DDL): CREATE, DROP, ALTER. Mainly the operations above, which act on logical structures, including table structures, views, and indexes.

Data Query Language (DQL) SELECT

This is easy to understand: query operations, with the select keyword. All simple queries and join queries belong to DQL.

Data Manipulation Language (DML) INSERT, UPDATE, and DELETE

DQL and DML together make up the insert, delete, update, and query operations most commonly used by programmers; the query is the special one and is classified as DQL.

Data Control Language (DCL) GRANT, REVOKE, COMMIT, ROLLBACK

Mainly the operations above, concerning database security, integrity, and the like; it can be simply understood as permission control.

What are superkeys, candidate keys, primary keys, and foreign keys?

- Superkey: a set of attributes that uniquely identifies a tuple in a relation is called a superkey of the relation. A single attribute can serve as a superkey, and so can a combination of attributes. Superkeys include candidate keys and primary keys.
- Candidate key: a minimal superkey, that is, a superkey with no redundant attributes.
- Primary key: a column or combination of columns in a database table that uniquely and completely identifies a stored data object. A table can have only one primary key, and primary key values cannot be missing, that is, cannot be NULL.
- Foreign key: a column in one table that refers to the primary key of another table is called a foreign key of that table.

What kinds of SQL constraints are there?

- NOT NULL: the field's content must not be null.
- UNIQUE: the field's content must not repeat; a table may have multiple UNIQUE constraints.
- PRIMARY KEY: also enforces uniqueness on a field, but a table allows only one.
- FOREIGN KEY: prevents actions that would break the links between tables, and also prevents illegal data from being inserted into the foreign key column, because the value has to be one of the values in the table it points to.
- CHECK: controls the range of values of the field.

What are the join queries? Cross join, inner join, left/right join, union queries (UNION and UNION ALL), and full join.

- Cross join: SELECT * FROM A,B(,C) or SELECT * FROM A CROSS JOIN B (CROSS JOIN C).
- Inner join: SELECT * FROM A,B WHERE A.id=B.id or SELECT * FROM A INNER JOIN B ON A.id=B.id; INNER JOIN can be shortened to JOIN.

Inner joins come in three types: equi-join: ON A.id = B.id; non-equi-join: ON A.id > B.id; self-join: SELECT * FROM A T1 INNER JOIN A T2 ON T1.id = T2.pid.

- LEFT JOIN / RIGHT JOIN: a LEFT (OUTER) JOIN takes the left table as the base: the left table is queried first, and the right table is matched according to the association condition after ON. A RIGHT (OUTER) JOIN is the reverse, with the right table as the base; RIGHT OUTER JOIN can be shortened to RIGHT JOIN.
- Union queries (UNION and UNION ALL): SELECT * FROM A UNION SELECT * FROM B UNION ...; they aggregate multiple result sets, with the first result as the benchmark. Note that the number of columns in the united queries must be the same. UNION ALL is more efficient than UNION, because UNION deduplicates rows.
- FULL JOIN: MySQL does not support FULL JOIN, but you can emulate it with LEFT JOIN, UNION, and RIGHT JOIN: SELECT * FROM A LEFT JOIN B ON A.id = B.id UNION SELECT * FROM A RIGHT JOIN B ON A.id = B.id;

Table join interview question: there are two tables, R and S. R has three columns A, B, C; S has two columns C, D; each table has three records.

R table

S table

select r.*, s.* from r, s;

select r.*, s.* from r inner join s on r.c = s.c;

select r.*, s.* from r left join s on r.c = s.c;

select r.*, s.* from r right join s on r.c = s.c;

select r.*, s.* from r full join s on r.c = s.c;

What is a subquery? 1. Condition: the query result of one SQL statement serves as the condition or result of another query statement. 2. Nesting: SQL statements are nested inside each other; the inner query statement is called the subquery.

Three cases of subqueries: 1. The subquery's result set is a single value; the parent query uses the =, <, or > operator:

-- Who is the highest-paid employee?
select * from employee where salary=(select max(salary) from employee);

2. The subquery's result set is multi-row, single-column, similar to an array; the parent query uses the in operator:

-- Employees whose department id appears in the dept table:
select * from employee where dept_id in (select id from dept);

3. The subquery's result set is multi-row and multi-column; the subquery appears after FROM as a derived table:

-- Query employees who joined after 2011-1-1, together with their department information:
select * from dept d, (select * from employee where join_date > '2011-1-1') e where e.dept_id = d.id;
-- Using a table join instead:
select d.*, e.* from dept d inner join employee e on d.id = e.dept_id where e.join_date > '2011-1-1';

What is the difference between in and exists in MySQL? The in statement in MySQL hashes the outer table against the inner table, while the exists statement loops over the outer table and queries the inner table for each row. It has long been thought that exists is more efficient than in, but this is not accurate; it depends on the situation. 1. If the two tables are about the same size, there is little difference between in and exists. 2. If one table is smaller and the other larger, use exists when the subquery table is the larger one, and in when the subquery table is the smaller one. 3. not in vs. not exists: if not in is used, both the inner and outer tables are scanned in full without using indexes, whereas the not exists subquery can still use indexes on the table. So not exists is faster than not in regardless of table size.

Differences between VARCHAR and CHAR. CHAR's characteristics:

- CHAR is a fixed-length string.
- If the inserted data is shorter than the fixed CHAR length, it is padded with spaces.
- Because the length is fixed, CHAR's access speed is much faster than VARCHAR's, even by 50%, but being fixed-length, it can occupy extra space.
- For CHAR, the maximum number of characters that can be stored is 255, regardless of encoding.

VARCHAR's characteristics:

- VARCHAR stores variable-length data: it stores exactly as much as you insert.
- Its access is slower than CHAR's because the length is not fixed, but for the same reason VARCHAR does not occupy extra space; it trades time for space.
- For VARCHAR, the maximum that can be stored is 65532 bytes.

In short, weighing performance (CHAR is faster) against disk space savings (VARCHAR is smaller) in your specific situation is a good way to design the database.

The meaning of 50 in varchar(50): it can store at most 50 characters. varchar(50) and varchar(200) occupy the same space when storing 'hello', but the latter consumes more memory when sorting, because order by col uses fixed-length memory for the column (the MEMORY engine does the same). In early versions of MySQL, 50 meant the number of bytes; now it means the number of characters.

The meaning of 20 in int(20): it is the display width. 20 means the maximum display width is 20, but the column still occupies 4 bytes of storage, and the storage range is unchanged. It does not affect internal storage; it only affects how many zeros are padded in front of the number for an int defined with zerofill, which makes report display easier. For most applications, this design has no meaning: it only specifies a display width for some tools; int(1) and int(20) store and compute identically.

The difference between int(10), char(10), and varchar(10): the 10 in int(10) indicates the display length of the data, not the size of the stored data; the 10 in char(10) and varchar(10) indicates the stored size, that is, how many characters are stored.

- char(10) stores 10 fixed-length characters, padded with spaces as needed; the padding spaces count as content.
- varchar(10) stores up to 10 variable-length characters; unlike char(10), trailing spaces used as placeholders do not count as content.

What is the difference between FLOAT and DOUBLE? FLOAT can store at most 8 decimal digits of precision and occupies 4 bytes of memory; DOUBLE can store at most 18 decimal digits of precision and occupies 8 bytes of memory.

Therefore, when a table is no longer needed, use DROP; when you want to delete some rows, use DELETE; and use TRUNCATE when deleting all data while keeping the table. The difference between UNION and UNION ALL?
 UNION ALL does not merge duplicate rows; UNION removes duplicates.
 UNION ALL is more efficient than UNION.
How do I locate and optimize performance problems in SQL statements? Is the index being used? How do I know why a statement runs slowly? The most important and effective way to locate a low-performance SQL statement is the execution plan. MySQL provides the EXPLAIN command to view the execution plan of a statement. No matter what database or database engine, many optimizations are applied when a SQL statement is executed; for query statements, the most important optimization method is the use of indexes. The execution plan shows in detail how the database engine executes the SQL statement, including whether an index is used, which index is used, and information about its use.
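For instance, a minimal sketch of using EXPLAIN (the table and index names here are hypothetical):

-- assume a table t_user(id primary key, name with a secondary index idx_name)
EXPLAIN SELECT * FROM t_user WHERE name = 'alice';
-- the output columns (id, select_type, table, partitions, type, possible_keys,
-- key, key_len, ref, rows, Extra) are explained one by one below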

The id column of the execution plan is a set of numbers representing the execution order of the subqueries in a query.
 Equal ids execute in order from top to bottom.
 Different ids: the larger the id, the higher the priority, and the earlier it runs.
 A NULL id indicates a result set that does not need to be used in the query; it commonly appears in statements containing UNION.
select_type indicates the query type of each subquery; common types include SIMPLE, PRIMARY, SUBQUERY, DERIVED, UNION, and UNION RESULT.

table indicates the table the data is queried from. When data is queried from a derived table, <derivedx> is shown, where x is the id of the execution plan the table derives from. partitions indicates the partitions involved; a table can be partitioned by a specified column when it is created. Here's an example:

create table tmp (
  id int unsigned not null AUTO_INCREMENT,
  name varchar(255),
  PRIMARY KEY (id)
) engine = innodb
partition by key (id) partitions 5;

type (very important) indicates how MySQL accesses the data:
 ALL: scans the entire table.
 index: scans the entire index.
 range: index range scan.
 index_subquery: the subquery uses ref (a non-unique index).
 unique_subquery: the subquery uses eq_ref (a unique index).
 ref_or_null: like ref, but also looks up rows containing NULL on the indexed column.
 fulltext: uses a full-text index.
 ref: uses a non-unique index to look up data.
 eq_ref: uses a PRIMARY KEY or UNIQUE NOT NULL index for the association in a join query.
possible_keys lists the indexes that might be used; note that they may not actually be used. If an index exists on a field involved in the query, it is listed here. When this column is NULL, it is time to consider whether the current SQL needs optimizing.

key displays the index actually used by MySQL in the query. If no index is used, the value is NULL.

TIP: if a covering index is used in the query, it appears only in the key column.

key_len indicates the length of the index used.

ref indicates the join matching conditions of the table above, that is, which columns or constants are used to look up values on the indexed column.

rows is the estimated number of rows to be examined; it is not an exact value.

The Extra column is very informative; common values include:

1. Using index: a covering index is used and no table lookup is needed.
2. Using where: a WHERE clause is used to filter the result set.
3. Using filesort: an external (file) sort is used, typically when sorting on non-indexed columns; this is costly and should be optimized away.
4. Using temporary: a temporary table is used, common with GROUP BY and ORDER BY.
The goals of SQL optimization can be found in the Alibaba Java Development Manual:

SQL performance optimization goal: reach at least range level; the requirement is ref level; consts is best. Notes: 1) consts: there is at most one matching row in a single table (primary key or unique index), and the data can be read during the optimization phase. 2) ref: a normal index is used. 3) range: a range scan on the index is performed. Counterexample: an explain result with type=index means a full scan of the index file, which is very slow; this level is even lower than range, though still better than a full table scan.
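As a rough sketch of what these levels look like in practice (the table t and index idx_age are hypothetical; the comments note the type one would typically expect):

-- assume t(id primary key, age with secondary index idx_age)
EXPLAIN SELECT * FROM t WHERE id = 1;       -- type = const (the "consts" level)
EXPLAIN SELECT * FROM t WHERE age = 20;     -- type = ref (normal index equality)
EXPLAIN SELECT * FROM t WHERE age > 20;     -- type = range (index range scan)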

SQL life cycle? 1. The application server establishes a connection with the database server. 2. The database process receives the SQL request. 3. The database parses the SQL and generates an execution plan, then executes it. 4. Data is read into memory and processed logically. 5. The result is sent to the client through the connection established in step 1. 6. The connection is closed and resources are released.

The general approach to optimizing a database: 1. Optimize the schema, SQL statements, and indexes. 2. Second, add a cache such as memcached or redis. 3. Set up master/slave replication and read/write separation. 4. Vertical split: according to the coupling of your modules, divide a large system into many small systems, that is, a distributed system. 5. Horizontal split: for tables with a large data volume, this step is the most troublesome and most tests technical skill. You must choose a reasonable sharding key, and for good query efficiency the table structure must also change, introducing some redundancy; the application must change too, so that queries locate data by the sharding key instead of scanning all tables.

How to handle large pagination?
 Large pagination is generally handled from two directions. At the database level, which is what we mostly focus on (though the payoff is smaller), there is room for optimization. Take select * from table where age > 20 limit 1000000,10: this statement loads 1,000,000 rows, throws almost all of them away, and returns only 10, so of course it is slow. select * from table where id in (select id from table where age > 20 limit 1000000,10) also scans a million rows, but it is fast because of the covering index: all the fields the subquery needs are in the index. There are many optimization options, but the main point is to reduce the load.
 Reduce requests from the requirements point of view. The main thing is not to implement requirements like jumping directly to a specific page millions of pages in; allow only page-by-page viewing or following a predictable, cacheable path, and prevent ID leaks and continuous malicious crawling.

In fact, to solve the problem of large pagination, we mainly rely on caching: predictably compute the content in advance, cache it in a K-V store such as redis, and return it directly. In the Alibaba Java Development Manual, the solution to large pagination is similar to the first approach mentioned above:

[Recommendation] Use deferred association or a subquery to optimize scenarios with very many pages. Note: MySQL does not skip offset rows; it fetches offset+N rows, then drops the first offset rows and returns N rows. When offset is very large, this is very inefficient. Either control the maximum number of pages returned, or rewrite the SQL once the page number exceeds a threshold. Example: quickly locate the id segment to fetch, then join:
SELECT a.* FROM table1 a, (SELECT id FROM table1 WHERE ... LIMIT 100000,20) b WHERE a.id = b.id
MySQL pagination: the LIMIT clause can be used to force a SELECT statement to return a specified number of records. LIMIT accepts one or two numeric arguments, which must be integer constants. Given two arguments, the first specifies the offset of the first row to return, and the second specifies the maximum number of rows to return.
 The offset of the initial row is 0 (not 1).
 mysql> SELECT * FROM table LIMIT 5,10; — returns rows 6 through 15.
 mysql> SELECT * FROM table LIMIT 95,-1; — to retrieve all rows from an offset to the end of the record set, set the second argument to -1.
 mysql> SELECT * FROM table LIMIT 5; — with a single argument, LIMIT n is equivalent to LIMIT 0,n.
Slow query logs record SQL statements whose execution time exceeds a threshold; they are used to locate slow SQL statements and provide a reference for optimization. Enabling the slow query log:

Configuration item: slow_query_log

Use show variables like 'slow_query_log' to check whether the slow query log is enabled. If the status is OFF, run set global slow_query_log = on to enable it. MySQL will then generate an xxx-slow.log file under the datadir.

Set critical time

Configuration item: long_query_time

Check: show VARIABLES like ‘long_query_time’, in seconds

Set: set long_query_time=0.5

In practice, the threshold should be tuned from long to short: optimize away the slowest SQL first, then lower the threshold.
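Putting the two configuration items together, a minimal sketch of a tuning session (the 0.5-second threshold is just an example):

show variables like 'slow_query_log';    -- OFF by default
set global slow_query_log = on;          -- enable logging to xxx-slow.log
show variables like 'long_query_time';   -- current threshold, in seconds
set long_query_time = 0.5;               -- session scope; use SET GLOBAL for new connections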

Check the log: once a SQL statement exceeds the critical time we set, it is recorded in xxx-slow.log.
How do you find slow queries in practice? How are slow queries optimized? In the business system, other than primary-key queries, I test the duration of every query on the test database. Statistics on slow queries are mainly collected by operations, who regularly feed the slow queries found in the business back to us. To optimize a slow query, first understand what makes it slow: does the query condition miss the index? Does it load columns that are not needed? Or is there simply too much data? The optimization proceeds from these three points:
 First analyze the statement to see whether extra data is loaded. It might query redundant rows that are then discarded, or load columns not needed in the results; modify the statement accordingly.
 Analyze the statement's execution plan to get its index usage, then modify the statement or the index so that the statement hits the index as much as possible.
 If the statement cannot be optimized further, consider whether the amount of data in the table is too large; if so, split the table horizontally or vertically.
Why should every table have a primary key? A primary key guarantees the uniqueness of data rows in the whole table. Even if the table has no natural primary key, it is advisable to add an auto-increment ID column as the primary key. With a primary key set, subsequent deletions can be faster, and the range of data touched by an operation is safely bounded.
Should the primary key be an auto-increment ID or a UUID? An auto-increment ID is recommended rather than a UUID.

In the InnoDB storage engine, the primary key index is the clustered index: the leaf nodes of the primary key index's B+ tree store the primary key values together with all the row data, in primary key order. If the primary key is an auto-increment ID, new rows are simply appended at the end; if it is a UUID, the arriving IDs are essentially unordered, which causes many page splits on insertion, a lot of data movement, and then a lot of memory fragmentation, which in turn degrades insert performance.

In general, in the case of large data volumes, the performance is better with auto-increment primary keys.
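A minimal sketch of the recommended pattern (the table and column names are illustrative):

CREATE TABLE t_user (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(64) NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;
-- inserts append to the right-most leaf of the clustered index, in order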

As for the primary key being the clustered index: if there is no primary key, InnoDB selects a unique key as the clustered index, and if there is no unique key either, it generates an implicit primary key.
Why are fields required to be NOT NULL? NULL values take up more bytes and cause many mismatch problems in your program.
If you want to store user password hashes, what field type should be used? Fixed-length strings such as password hashes, salts, and user ID numbers should be stored in CHAR rather than VARCHAR, to save space and improve retrieval efficiency.
Data access during the query process. Accessing too much data decreases query performance:
 Determine whether the application retrieves more data than it needs, possibly too many rows or columns.
 Determine whether the MySQL server analyzes an unnecessarily large number of rows.
Avoid the following SQL mistakes:
 Querying unneeded data. Solution: use LIMIT.
 Multi-table joins returning all columns. Solution: specify the column names.
 Always returning all columns. Solution: avoid SELECT *.
 Repeatedly querying the same data. Solution: cache the data and read the cache the next time.
Is MySQL scanning extra records? Solution: use EXPLAIN to analyze. If you find the query scans a lot of data but returns only a few rows, optimize with the following techniques:
 Use an index-covered scan: put all needed columns in the index so the storage engine does not have to go back to the table to fetch rows.
 Change the database and table structure, adjusting the tables' normal form.
 Rewrite the SQL statement so the optimizer can execute the query in a better way.
Optimizing long, hard queries: one complex query or several simple queries?
 MySQL can internally scan millions of rows of in-memory data per second; by comparison, returning data to the client is much slower.
 Using as small a query as possible is good, but it is sometimes necessary to break a big query into several smaller queries.
Splitting queries: divide a large query into multiple identical smaller ones.
 Deleting 10 million rows at once costs the server more than deleting 10,000 at a time with a pause in between.
Decomposing join queries has advantages:
 It makes caching more efficient.
 Executing a single query at a time reduces lock contention.
 Doing the join at the application layer makes it easier to split the database.
 Query efficiency itself can improve significantly.
 It reduces queries for redundant records.
Optimizing COUNT():
 COUNT(*) ignores column values and simply counts all rows; do not use COUNT(column_name).
 In MyISAM, COUNT(*) without any WHERE condition is very fast; with a WHERE condition, MyISAM's count is not necessarily faster than other engines.
 You can use EXPLAIN's row estimate as an approximation instead of COUNT(*).
 Add summary (aggregate) tables.
 Use caching.
Optimizing join queries:
 Check whether there is an index on the columns in the ON or USING clauses.
 Make sure GROUP BY and ORDER BY refer to columns of only one table, so MySQL can use an index.
 Optimize subqueries: replace them with join queries where possible.
Optimizing GROUP BY and DISTINCT:
 Both kinds of query can be optimized with indexes, which is the most effective optimization.
 In join queries, grouping by an identifier column is more efficient.
 If you do not need ordering, add ORDER BY NULL to a GROUP BY query so MySQL skips the filesort.
 WITH ROLLUP super-aggregation can be moved to the application layer.
Optimizing LIMIT pagination:
 When the LIMIT offset is large, query efficiency is low.
 You can record the maximum ID returned by the last query and continue from there, as sketched below.
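A minimal keyset-pagination sketch of that last idea (hypothetical table t with primary key id; 1000 is the last ID seen on the previous page):

SELECT * FROM t
WHERE id > 1000    -- max id from the previous page
ORDER BY id
LIMIT 20;          -- next page, no large offset to scan and discard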
UNION query UNION ALL The efficiency is Higher than UNION WHERE. For this type of test, explain how to locate inefficient SQL statements and locate the cause of inefficient SQL statements. Start with the index. Consider the above aspects, data access questions, long difficult query sentences, or some specific type of optimization questions, and answer them one by one. Some methods of SQL statement optimization? 1. To optimize queries, avoid full table scans and start by building indexes on where and ORDER BY columns. 2. Avoid null values in the WHERE clause. Otherwise, the engine will abandon the index and conduct full table scan.   SELECT ID from t where num is null– You can set the default value of num to 0 to ensure that the num column does not have a null value.  SELECT ID from t where num= 3 Use in where clauses should be avoided! = or <> otherwise the engine will abandon the index and perform a full table scan.   avoid using OR in the WHERE clause.    select ID from t where num=10 or num=20. Select ID from t where num=10 union all select ID from t where num=20 5. Be cautious about using in and not in. Otherwise, you will cause full table scanning.   select id from t where num in(1,2,3) — for consecutive numbers, do not use in: Select ID from t where num between 1 and 3 6. The following query will also cause a full table scan: select ID from t where name like ‘% li %’ 7. Using parameters in the WHERE clause also causes a full table scan. Because SQL resolves local variables only at run time, the optimizer cannot defer the choice of an access plan until run time; It must be selected at compile time. However, if an access plan is established at compile time, the value of the variable is unknown and therefore cannot be used as an input for index selection.   SELECT ID from t where num=@num– You can force the query to use the index: select ID from t with(index(index name)) where num=@num 8 Expression operations on fields in the WHERE clause should be avoided as much as possible, which can cause the engine to abandon indexes for a full table scan.   select ID from t where num/2=100. Try to avoid functional manipulation of fields in the WHERE clause, which will cause the engine to abandon indexes for full table scans. Such as:   select id from t where substring(name,1,3)= ‘ABC’ — name  id from t where name like ‘ABC %’ Do not perform functions, arithmetic operations, or other expression operations to the left of the “=” in the WHERE clause, or the system may not use the index properly.  The throughput bottleneck is usually caused by the database access speed.  As the application runs, more and more data is stored in the database, and the processing time slows down.  The data is stored on disk, and the read and write speed cannot be compared with the memory speed. Reduce system bottlenecks, reduce resource occupancy, and increase system response speed. Database structure optimization A good database design scheme for the performance of the database will often get twice the result with half the effort.

You need to consider data redundancy, speed of query and update, and whether the data type of the field is reasonable.

Split a table with many fields into multiple tables

For a table with many fields, if some fields are used infrequently, you can separate these fields to form a new table.

Because when a table holds a large amount of data, queries are slowed down by the presence of infrequently used fields.

Add intermediate tables

For tables that require frequent joint queries, you can create intermediate tables to improve query efficiency.

By creating an intermediate table, you insert the data needed by the frequent join query into the intermediate table, and then change the original join query into a query against the intermediate table.
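A hypothetical sketch of the idea (orders and t_user are made-up tables):

-- build the intermediate table once from the frequent join
CREATE TABLE order_report AS
SELECT o.id, o.amount, u.name AS user_name
FROM orders o JOIN t_user u ON o.user_id = u.id;

-- later reads hit the intermediate table instead of re-joining
SELECT id, amount, user_name FROM order_report WHERE amount > 100;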

Add redundant fields

The design of data tables should follow the rules of normalization theory as far as possible and reduce redundant fields, making the database design concise and elegant. However, adding redundant fields judiciously can improve query speed, as sketched below.
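A hypothetical sketch (orders and t_user are made-up tables): user_name is duplicated into orders so that reads can skip the join, at the cost of keeping the copy in sync.

-- add the redundant column and backfill it
ALTER TABLE orders ADD COLUMN user_name VARCHAR(64);
UPDATE orders o JOIN t_user u ON o.user_id = u.id SET o.user_name = u.name;

-- queries no longer need to join t_user
SELECT id, amount, user_name FROM orders WHERE id = 42;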

The more normalized a table is, the more relationships there are between tables, the more queries need to be joined, and the worse the performance.

Note:

If the value of a redundant field is changed in one table, you have to find a way to update it in the other tables as well, or you will have data inconsistency problems.
What do you do when the MySQL database's CPU surges to 500%? When the CPU rises to 500%, first run the operating system's top command to check whether mysqld is the process occupying it. If not, find the processes with high CPU usage and handle them.

If mysqld is the cause, run show processlist to see which sessions are running. Find the SQL with high consumption and check whether its execution plan is accurate, whether an index is missing, or whether there is simply too much data.

In general, it is important to kill these threads (and see if CPU usage drops), and then re-run the SQL after making appropriate adjustments (such as adding indexes, changing SQL, changing memory parameters).
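A sketch of the diagnostic step just described (the thread id 1234 is hypothetical):

SHOW FULL PROCESSLIST;   -- find long-running sessions and the SQL they run
KILL 1234;               -- kill the offending thread by its Id, then fix and re-run the SQL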

It is also possible that each SQL statement consumes only modest resources, but a large number of sessions suddenly connect, causing a CPU spike. In this case, work with the application side to analyze why the number of connections surged, and then make corresponding adjustments, such as limiting the number of connections.
How do you optimize a large table? A table has nearly ten million rows and CRUD is slow; how do you optimize it? How is sharding (splitting databases and tables) done? What problems does sharding bring? Is middleware useful? Do you know how it works? When a single MySQL table has too many records, the CRUD performance of the database degrades significantly. Some common optimization measures are as follows:

 Limit the data range: prohibit queries that do not contain any condition limiting the data range. For example, when users query order history, we can restrict it to one month.
 Read/write separation: the classic database splitting scheme; the master handles writes and the slaves handle reads.
 Cache: use the MySQL query cache, or an application-level cache for heavily read, rarely updated data.
There is also optimization by splitting tables, mainly vertical splitting and horizontal splitting.

Vertical partition:

Split according to the correlation of the tables in the database. For example, if the user table contains both the user login information and the user’s basic information, you can split the user table into two separate tables, or even put them into separate libraries. To put it simply, vertical splitting is the splitting of data table columns. A table with many columns is split into multiple tables. This should make it a little bit easier to understand.

Advantages of vertical split: smaller row data, fewer blocks to read during queries, and fewer I/Os. In addition, vertical partitioning simplifies the table structure and is easier to maintain.

Disadvantages of vertical split: redundant primary keys, the need to manage redundant columns, and possible JOIN operations, which can be solved by joining at the application layer. In addition, vertical partitioning makes transactions more complex.

Vertical split tables put the primary key and some columns in one table, and then put the primary key and other columns in another table
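A hypothetical sketch of such a split (a made-up wide user table divided into login and profile parts that share the primary key):

CREATE TABLE user_login (
  id BIGINT UNSIGNED NOT NULL,
  username VARCHAR(64) NOT NULL,
  password_hash CHAR(64) NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;

CREATE TABLE user_profile (
  id BIGINT UNSIGNED NOT NULL,   -- same primary key as user_login
  nickname VARCHAR(64),
  bio VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE = InnoDB;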

Application scenarios: 1. Some columns of a table are frequently used while others are not. 2. It can make data rows smaller, so a data page holds more rows, reducing the number of I/Os during queries.
Disadvantages: 1. Some splitting strategies are based on logic in the application layer; once that logic changes, the whole splitting logic changes, so scalability is poor. 2. For the application layer, the splitting logic increases development cost. 3. Redundant columns must be managed, and querying all the data requires a JOIN operation.
Horizontal partition:

Keep the data table structure unchanged and store data shards according to some policy. Each shard of data is dispersed to a different table or database, achieving distribution. Horizontal splitting can support very large amounts of data. Horizontal splitting is the splitting of data table rows: when the number of rows in a table exceeds about 2 million, queries slow down, and at that point the data of one table can be split into multiple tables. For example, we can split a user information table into several user information tables (sketched below) to avoid the performance impact of a single table holding too much data.
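A hypothetical sketch of one common horizontal-split policy, routing rows by id modulo the number of tables (the names and the shard count of 4 are made up):

-- route each row to one of 4 identical tables: t_user_0 .. t_user_3
-- shard index = id % 4, computed in the application before the statement is issued
INSERT INTO t_user_2 (id, name) VALUES (6, 'alice');  -- 6 % 4 = 2

-- reads apply the same rule, so the sharding key must appear in the query
SELECT * FROM t_user_2 WHERE id = 6;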

Horizontal splitting can support very large amounts of data. Note that merely splitting tables only solves the problem of a single table's data being too large; since the split tables still live on the same machine, it does little for MySQL concurrency, so horizontal splitting is best combined with splitting across databases (sharding).

Horizontal splitting can support very large data volumes with little change on the application side, but it is hard to solve cross-shard transactions, JOIN performance across nodes is poor, and the logic is complicated.

The author of "The Way to Train Java Engineers" recommends avoiding data sharding as much as possible because of its complexity in logic, deployment, and operations. A typical data table, properly optimized, can support data volumes below ten million rows. If sharding is necessary, choose a client-side sharding architecture, which saves one network I/O hop through middleware.

Horizontal split tables: dividing a large table reduces the number of data and index pages that must be read during a query and reduces the number of index levels, improving query speed.

Application scenarios: 1. The data in the table is naturally independent; for example, the table records data from different regions or different periods. 2. The data needs to be stored on multiple media.
Disadvantages of horizontal splitting: 1. It adds complexity to the application: queries usually need multiple table names, and querying all the data requires UNION operations. 2. In many database applications, this complexity outweighs the advantages it brings, since queries then incur the disk reads of an extra index level.
After sharding the database, there are two common solutions:

1. Client proxy: the sharding logic is encapsulated in jar packages on the application side and can be implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Ali's TDDL are two commonly used implementations. 2. Middleware proxy: a proxy layer is added between the application and the data, and the sharding logic is maintained uniformly in the middleware service. Mycat, 360's Atlas, NetEase's DDB, and so on are implementations of this architecture.

Problems with splitting databases and tables:
 Transaction support: after splitting, transactions become distributed transactions. If you rely on the database's own distributed transaction management, you pay a high performance price; if the application assists with control, forming logical transactions in program code, it creates a programming burden.
 Cross-database JOINs: as long as there is sharding, cross-node joins are unavoidable, but good design and sharding can reduce them. A common way to solve this is to use two queries: the first query's result set yields the IDs of the associated data, and a second request is made with those IDs to fetch the associated data.
 Cross-node COUNT, ORDER BY, GROUP BY, and aggregate functions: these are a class of problems, because they all need to be computed over the entire data set. Most proxies do not automatically handle the merging. Solution: similar to the cross-node join problem, fetch results separately on each node and merge them on the application side. Unlike a JOIN, the per-node queries can run in parallel, so this is often much faster than a single large table. However, if the result set is large, application memory consumption becomes an issue.
 Data migration, capacity planning, and expansion: a scheme from the Taobao integrated business platform team uses the forward-compatible property of remainders modulo multiples of 2 (for example, a number with remainder 1 mod 4 also has remainder 1 mod 2) to allocate data. This avoids row-level data migration, but table-level migration is still needed, and both the expansion scale and the number of split tables are limited. Generally speaking, none of these schemes is ideal; all have drawbacks to some degree, which reflects, from one angle, how hard capacity expansion is under sharding.
 ID problem: once the database is sharded across multiple physical nodes, you can no longer rely on the database's own primary key generation mechanism. On the one hand, an ID generated by one partition is not guaranteed to be globally unique; on the other hand, the application needs to obtain the ID before inserting data, for SQL routing. Some common primary key generation strategies: UUID — using a UUID as the primary key is the simplest scheme, but the disadvantages are obvious: a UUID is very long, and besides occupying a lot of storage, the main problem is the index: both building indexes and querying through them perform poorly. Snowflake — in distributed systems there are many occasions where a globally unique ID is needed; Snowflake solves this need and is simple to implement apart from the configuration: its core is a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit per-millisecond sequence.
 Sorting and paging across shards: generally, pages must be sorted by a specified field. When the sort field is the shard field, the sharding rule easily locates the shard; when the sort field is not the shard field, things get more complicated. To guarantee a correct final result, each shard node must sort and return its data, then the result sets from the different shards are summarized and sorted again before being returned to the user.

Master/slave replication: DDL and DML operations on the master are transferred to the slave through the binary log and re-executed there, keeping the slave's data consistent with the master's.
Benefits of master/slave replication: 1. If the master fails, you can switch to the slave. 2. Read/write separation can be implemented at the database level. 3. Daily backups can be performed on the slave.
 Data distribution: start or stop replication at will and distribute data backups across different locations.
 Load balancing: reduce the pressure on a single server.
 High availability and failover: help the application avoid a single point of failure.
 Upgrade testing: you can run a higher MySQL version on the slave.
How MySQL master/slave replication works:
1. The master records data changes in its binary log.
2. The slave copies the master's binary log into its own relay log.
3. The slave reads events from the relay log and replays them against its own data.
The basic flow involves three threads and the relationships between them. Master: the binlog thread records all statements that change database data and puts them into the binlog on the master. Slave: the I/O thread, after START SLAVE, is responsible for pulling binlog content from the master and putting it into the slave's relay log. Slave: the SQL thread executes the statements in the relay log.
Replication process:

Step 1: before each transaction completes, the master serially writes the operation records to the binary log file. Step 2: the slave starts an I/O thread, which opens an ordinary connection to the master. If reading has caught up with the master, the thread goes to sleep and waits for the master to generate new events. The ultimate goal of the I/O thread is to write these events into the relay log. Step 3: the SQL thread reads the relay log and executes the SQL events in order, keeping the data consistent with the master database.
What are the solutions for read/write separation? Read/write separation depends on master/slave replication, and master/slave replication in turn serves read/write separation. Master/slave replication requires that the slave only reads and does not write (if a write is performed on the slave, show slave status will report Slave_SQL_Running=NO, and the slave must then be resynchronized manually as mentioned above).
Scheme 1: use mysql-proxy. Advantages: read/write separation and load balancing without modifying code, and the master and slave use the same account. Disadvantages: the MySQL team does not recommend it for real production; it reduces performance and does not support transactions.
Scheme 2: use AbstractRoutingDataSource + AOP + annotations to decide the data source at the DAO layer. If MyBatis is used, read/write separation can be placed in the ORM layer: a MyBatis plugin can intercept SQL statements, routing all insert/update/delete to the master and all select to the slave, transparently to the DAO layer. The plugin can choose master or slave either by annotation or by analyzing whether the statement reads or writes. One remaining problem is transactions: DataSourceTransactionManager must be rewritten so that read-only transactions go to the read library and transactions containing writes go to the write library.
Scheme 3: use AbstractRoutingDataSource + AOP + annotations to decide the data source at the service layer; this can support transactions. Disadvantage: AOP does not intercept internal method calls of the this.xx() form, which require special handling.
Backup plans. (1) The backup plan depends on the size of the database. Generally, for a database under 100GB you can base the backup plan on mysqldump, because mysqldump is lighter and more flexible; schedule backups outside peak hours, with a full backup every day (compressed mysqldump backups are fairly small). For libraries over 100GB, use XtraBackup, which backs up faster than mysqldump: a full backup once a week and incremental backups on the other days, during off-peak periods.
(2) Backup and recovery time: physical backup restores quickly, logical backup restores slowly. The speed also depends on the machine, especially the disk. Reference figures: 20GB in 2 minutes (mysqldump); 80GB in 30 minutes (mysqldump); 111GB in 30 minutes (mysqldump); 288GB in 3 hours (XtraBackup); 3TB in 4 hours (XtraBackup). A logical import takes more than 5 times the backup time.
(3) How do you handle backup and recovery failures? Above all, prepare thoroughly before recovery to avoid errors during recovery, for example validity checks, permission checks, and space checks after backup. If an error occurs, adjust according to the error message.
How mysqldump and XtraBackup work: mysqldump is a logical backup. Add the --single-transaction option for a consistent backup.
The background process sets the session's transaction isolation level to RR (SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ) and then explicitly starts a transaction (START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */), which guarantees that all data read within the transaction comes from the transaction's snapshot. Then it reads the data out of the tables. With --master-data=1, it first executes FLUSH TABLES WITH READ LOCK, records the binlog position with SHOW MASTER STATUS, unlocks immediately, and then reads the tables.
XtraBackup is a physical backup. It copies the tablespace files while scanning the redo logs; at the end of the InnoDB backup it performs a flush engine logs operation to ensure all redo logs have been flushed to disk. (Because XtraBackup does not copy the binlog, you must ensure that all redo logs reach disk, otherwise the data of the last group of committed transactions may be lost.) This point in time is when InnoDB finishes its part of the backup: the data files are not internally consistent, but the redo logs from this period make them consistent on recovery. For MyISAM and other engines, it then executes FLUSH TABLES WITH READ LOCK and copies their files. This yields a near-perfect hot backup.
What are the repair methods for corrupted data tables?
 Use myisamchk. 1) Stop the mysql service first. 2) Open a command line and go to MySQL's /bin directory. 3) Run myisamchk --recover followed by the path to the table's .MYI file.
 Use REPAIR TABLE or OPTIMIZE TABLE. REPAIR TABLE table_name repairs a damaged table. OPTIMIZE TABLE table_name reclaims unused database space: when table rows are deleted, disk space is not reclaimed immediately, and the OPTIMIZE TABLE command rearranges the rows and reclaims the space.
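A minimal sketch of the second repair method (the table name is hypothetical; REPAIR TABLE applies to MyISAM-family tables):

CHECK TABLE t_user;     -- report whether the table is corrupted
REPAIR TABLE t_user;    -- repair a damaged MyISAM table
OPTIMIZE TABLE t_user;  -- rearrange rows and reclaim unused space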