Preface
The full question bank is also on GitHub: Github.com/whx123/Java…
Public account: a boy picking up snails
Database
1. What are the precautions for using MySQL index?
This question can be answered along three dimensions: cases where an index fails, scenarios where an index is not appropriate, and some index rules worth knowing.
Cases where an index fails
- If the query condition contains OR, the index may become invalid.
- If the field type is a string, the value in the where clause must be quoted, otherwise the index becomes invalid.
- A like wildcard (especially a leading %) may invalidate the index.
- For a composite index, if the query condition does not include the first column of the index, the index is not used.
- Using MySQL built-in functions on an index column invalidates the index.
- Performing operations on an index column (e.g., +, -, *, /) invalidates the index.
- Using !=, <> or not in on an index column may cause the index to be skipped.
- Using is null / is not null on an index column may cause the index to be skipped.
- In a left join or right join, if the joined columns use different character encodings, the index may not be used.
- If MySQL estimates that a full table scan is faster than using the index, the index is not used.
See also my article: Essential for backend programmers — ten common causes of index failure
Which scenarios are not appropriate for indexes
- Small amounts of data are not suitable for indexing
- Frequent updates are not suitable for indexing
- Poorly differentiated fields are not suitable for indexing (e.g., gender)
Some unspoken rules for indexing
- Covering index
- Back-to-table lookup
- Index data structure (B+ tree)
- Left-most prefix rule
- Index condition pushdown
2. Have you encountered deadlock problems in MySQL? How did you solve them?
Here’s my general procedure for checking deadlocks:
- Check the deadlock log: show engine innodb status (a sketch follows this list);
- Find the SQL statements involved in the deadlock
- Analyze what locks those SQL statements acquire
- Simulate a deadlock
- Analyze deadlock logs
- Analyze deadlock results
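A minimal sketch of the first step (these are standard MySQL commands; the deadlock details appear in the LATEST DETECTED DEADLOCK section of the output):
-- print InnoDB status, including the most recent deadlock information
show engine innodb status;
-- optionally, look at which sessions are currently running or blocked
show full processlist;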
You can read my two articles:
- Mysql deadlocks
- MySQL insert on duplicate key deadlock analysis
3. How do you optimize SQL in your daily work?
The question can be answered in several dimensions:
- Add appropriate indexes
- Avoid returning unnecessary data
- Process data in batches where appropriate
- Optimize the SQL structure
- Sharding (splitting databases/tables)
- Read/write separation
Check out my article: Backend Programmers must Have: 30 Tips for Writing High-quality SQL
4. Talk about the design of sub-database and sub-table
Cover three aspects: the sharding schemes, common sharding middleware, and the problems sharding may introduce.
Sharding schemes:
- Horizontal database splitting: split the data of one database into multiple databases by a field, according to some policy (hash, range, etc.).
- Horizontal table splitting: split the data of one table into multiple tables by a field, according to some policy (hash, range, etc.).
- Vertical database splitting: move different tables into different databases according to the business they belong to.
- Vertical table splitting: split the columns of a table into different tables (main table and extension table) according to how frequently the columns are used.
Common sharding middleware:
- Sharding-JDBC (Dangdang)
- Mycat
- TDDL (Taobao)
- 58.com database middleware
- Vitess (database middleware developed by Google)
- Atlas (Qihoo 360)
Problems sharding may introduce:
- Transaction problem: distributed transactions are needed
- Cross-node join problem: can be solved by querying twice and joining the results in the application
- Cross-node count, order by, group by and aggregate function problems: get results on each node and merge them on the application side
- Data migration, capacity planning, and capacity expansion
- ID problem: after sharding, you can no longer rely on the database's own primary key generation; the simplest alternative is UUID
- Cross-shard sorting and paging
I personally think these two articles are quite good; interested readers can take a look:
- MySQL database in the Internet common sub-database sub-table scheme
- The problems and schemes that need to be considered in sub-database sub-table
5. Differences between InnoDB and MyISAM
- InnoDB supports transactions, MyISAM does not
- InnoDB supports foreign keys, MyISAM does not
- InnoDB supports MVCC(multi-version concurrency control), MyISAM does not
- select count(*) from table is faster on MyISAM, because it keeps a variable holding the total row count and can read it directly, whereas InnoDB needs a full table scan.
- MyISAM supports full-text indexing; InnoDB did not originally, but supports it since MySQL 5.6.
- InnoDB supports table and row locking, while MyISAM supports table locking.
- InnoDB tables must have primary keys, while MyISAM can have no primary keys
- Innodb tables require more memory and storage, while MyISAM can be compressed and has less storage space.
- InnoDB stores records clustered in primary-key order; MyISAM stores records in insertion order.
- InnoDB provides transaction safety with commit, rollback, and crash-recovery capabilities; its writes are less efficient than MyISAM's, and it uses more disk space to keep data and indexes.
6. The principle of database index, why use B+ tree, why not binary tree?
You can answer from several dimensions: whether queries are fast enough, whether performance is stable, how much data a node can hold, and how many disk reads a lookup takes; then explain why it is not a binary tree, not a balanced binary tree, not a B tree, but a B+ tree.
Why not just a binary tree?
A plain binary search tree can degenerate into a linked list, which is equivalent to a full table scan. Compared with an ordinary binary search tree, a balanced binary search tree gives more stable and generally faster lookups.
Why not a balanced binary tree?
We know that reading data from memory is much faster than reading from disk. When a tree structure is used as an index, each node visited means one disk block read. A balanced binary tree stores only one key per node, so the tree is tall and many disk reads are needed. A B tree stores multiple keys per node, so the tree is shorter, fewer disk reads are needed, and queries are faster.
So why not a B tree but a B+ tree?
1) Non-leaf nodes of a B+ tree store only keys, not data, whereas B tree nodes store both keys and data. InnoDB's default page size is 16KB; if a node stores no data it can hold more keys, the fan-out (number of children per node) is larger, and the tree is shorter and fatter, which further reduces the number of disk I/Os needed to find data and makes queries more efficient.
2) All data of a B+ tree index is stored in the leaf nodes, and the leaves are kept in order and linked together in a list, so range lookups, sorted lookups, grouping, and deduplication become very easy.
See also: If someone asks you why MySQL uses a B+ tree for indexes, send them this article.
7. Differences between clustered and non-clustered indexes
- A table can have only one clustered index, but it can have multiple non-clustered indexes.
- With a clustered index, the logical order of the key values determines the physical order of the corresponding rows in the table; with a non-clustered index, the logical order of the index differs from the physical order of the rows on disk.
- Indexes are described by tree data structures. A clustered index can be thought of like this: its leaf nodes are the data nodes themselves. The leaf nodes of a non-clustered index are still index nodes, holding pointers to the corresponding data blocks.
- Clustered index: physical storage is sorted by the index; non-clustered index: physical storage is not sorted by the index.
When to use clustered or non-clustered indexes?
8. A query with limit 1000000 loads slowly; how do you handle it?
Option 1: if the id is roughly continuous, use the largest id from the previous query as the lower bound of the next one:
select id, name from employee where id > 1000000 limit 10;
Option 2: Limit the number of pages as business allows:
Discuss with the business whether such deep pages are really needed; most users never scroll that far back.
Option 3: order by + index (id is the indexed column):
select id, name from employee order by id limit 1000000, 10;
Option 4: use a deferred join or a sub-query to optimize deep pagination (first locate the id range quickly, then join back):
SELECT a.* FROM employee a, (SELECT id FROM employee ORDER BY id LIMIT 1000000, 10) b WHERE a.id = b.id;
9. How to choose an appropriate distributed primary key scheme?
- The database's own auto-increment sequence or field
- UUID.
- Redis generated ID
- Twitter’s Snowflake algorithm
- Generate unique ids using ZooKeeper
- MongoDB's ObjectId
10. What are the isolation levels for transactions? What is the default isolation level for MySQL?
- Read Uncommitted
- Read Committed
- Repeatable Read
- Serializable
Repeatable Read is the default transaction isolation level of Mysql.
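A quick way to check or change the isolation level, as a sketch (the variable is named transaction_isolation in MySQL 8.0; older versions use tx_isolation):
select @@transaction_isolation;
set session transaction isolation level read committed;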
Read my article on thoroughly understanding the four isolation levels of MySQL transactions
11. What are phantom reads, dirty reads, and non-repeatable reads?
- Transactions A and B execute alternately, and transaction A reads data that transaction B has not yet committed; transaction A is thus disturbed by B. This is a dirty read.
- In the scope of a transaction, two identical queries that read the same record but return different data are called non-repeatable reads.
- Transaction A queries the result set of A range, another concurrent transaction B inserts/deletes data into the range and commits it silently, then transaction A queries the same range again and gets A different result set from the two reads. This is A phantom read.
Read my article on thoroughly understanding the four isolation levels of MySQL transactions
12. How to safely modify the same row of data under high concurrency?
To safely modify the same row, you must ensure that while one thread is modifying it, other threads cannot update it at the same time. There are generally two approaches: pessimistic locking and optimistic locking.
Using pessimistic Locks
The idea of pessimistic locking is that when the current thread wants to modify the data, it shuts all other threads out, for example by using select ... for update ~
select * from User where name = 'jay' for update;
This SQL statement locks all records in the User table that match the condition (name = 'jay'); no other thread can modify these records until the transaction commits.
Use optimistic Locks
The idea of optimistic locking is that the current thread goes ahead and applies its modification first; if no other thread has modified the data in the meantime, the update succeeds, otherwise it fails or is retried. Implementation: optimistic locking is usually implemented with a version-number mechanism or a CAS algorithm.
You can have a look at my article on using CAS-style optimistic locking to solve a concurrency problem in practice; the main thing is the idea ha ~
13. Optimistic and pessimistic locks in databases
Pessimistic locks:
A pessimistic lock is single-minded and insecure: its mind is only on the current transaction, always worrying that its beloved data may be modified by other transactions. So once a transaction acquires a pessimistic lock, no other transaction can modify the data; they can only wait for the lock to be released.
Optimistic locking:
The "optimism" of an optimistic lock is that it assumes the data will not change very often, so it allows multiple transactions to make changes to the data concurrently. Implementation: optimistic locking is usually implemented with a version-number mechanism or a CAS algorithm.
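A minimal sketch of the version-number approach (the table, the version column, and the values are made up for illustration):
-- read the row first and remember its version, e.g. version = 5
update account set balance = balance - 10, version = version + 1
where id = 666 and version = 5;
-- if 0 rows were affected, another transaction changed the row first: fail or retry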
I reposted this article before; the author explains it in quite some detail ~
Take you to understand the pessimistic lock and optimistic lock
14. What are the general steps of SQL optimization, how to look at the execution plan (explain), how to understand the meaning of each field.
- The show status command shows the execution frequency of various SQL statements
- Use slow query logs to locate SQL statements that execute inefficiently
- Explain execution plans for inefficient SQL (this is very important, as it can be used to analyze SQL in daily development and greatly reduce SQL online accidents)
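A small sketch of the last two steps (slow_query_log and long_query_time are standard MySQL variables; the employee query is just an example):
set global slow_query_log = 1;
set global long_query_time = 1;   -- log statements that take longer than 1 second
show variables like 'slow_query_log_file';
explain select id, name from employee where name = 'jay';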
General steps for optimizing SQL statements
15. What is the meaning of select for UPDATE?
The meaning of select for update
select ... for update acquires a lock; whether it is a row lock or a table lock depends on whether the query condition uses an index/primary key.
Without an index/primary key it takes a table lock; otherwise it takes a row lock.
Verifying the locks taken by select for update
Table structure:
-- id is the primary key; name has a secondary index idx_name
CREATE TABLE `account` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`balance` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_name` (`name`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1570068 DEFAULT CHARSET=utf8
First lock a record using the indexed column as the query condition, for example select * from account where name = 'jay' for update. Another transaction can still modify rows that do not match the condition, as shown in the figure below:
A row lock is added.
Next, lock using the ordinary, un-indexed column balance as the query condition; this time other transactions are blocked even on unrelated rows. Therefore, without an index/primary key, select for update adds a table lock.
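A sketch of the verification above, assuming the account table defined earlier and two separate sessions (the row values are made up):
-- session 1: lock by the indexed column name (expect a row lock)
begin;
select * from account where name = 'jay' for update;
-- session 2: updating a different row is not blocked
update account set balance = 100 where name = 'eason';
-- session 1: commit, then repeat the test with the un-indexed column balance (expect a table lock)
commit;
begin;
select * from account where balance = 666 for update;
-- session 2: now even updates to other rows block until session 1 commits
update account set balance = 200 where name = 'eason';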
16. MySQL transaction has four major features and implementation principles
- Atomicity: The transaction is executed as a whole, and all or none of the operations on the database contained within it are executed.
- Consistency: data will not be destroyed before and after the transaction. If account A transfers 10 yuan to account B, the total amount of account A and B will remain the same regardless of success or failure.
- Isolation: when multiple transactions access data concurrently, they are isolated from each other; one transaction should not affect the others. In short, transactions do not interfere with each other.
- Persistence: Indicates that after a transaction is complete, operational changes made by the transaction to the database will be persisted in the database.
The idea of implementing the transaction ACID feature
- Atomicity: implemented with the undo log. If an error occurs during execution or the user issues a rollback, the undo log is used to roll the data back to its state at the start of the transaction.
- Persistence: Use the redo log to restore data in the event of a system crash, as long as the redo log is persisted.
- Isolation: Transactions are isolated from each other through locks and MVCC.
- Consistency: Consistency is achieved through rollback, recovery, and isolation in concurrent cases.
17. How to optimize a table with nearly ten million data and SLOW CRUD?
Sharding (splitting databases and tables)
When a table approaches ten million rows, you can consider optimizing the table structure and splitting the table (horizontal splitting, vertical splitting). Of course, if you answer this way, be prepared for the interviewer's follow-up questions about sharding, such as:
- Split-table scheme (horizontal split-table, vertical split-table, split-rule hash, etc.)
- Sharding middleware (Mycat, Sharding-JDBC, etc.)
- Problems sharding introduces (transaction problems? cross-node join problems?)
- Solutions (distributed transactions, etc.)
Index optimization
Besides splitting tables and optimizing the table structure, there are of course also index optimization and other approaches ~
If you are interested, check out this article: 30 Tips for Writing High quality SQL for Backend Programmers
18. How to write SQL to effectively use composite indexes.
A composite index, also called a compound index, is an index created on multiple columns.
When we create a composite index such as (k1,k2,k3), it is equivalent to creating the indexes (k1), (k1,k2) and (k1,k2,k3); this is the left-most matching principle.
select * from table where k1=A AND k2=B AND k3=D
With respect to composite indexes, we need to focus on the order in which the Sql conditions are queried to ensure that the leftmost matching principle is valid, and to remove unnecessary redundant indexes.
19. Difference between in and exists in mysql
This is easier to see with a demo, haha.
Suppose table A is an employee table and table B is a department table, and we want to query all employees of all departments. With in, the SQL looks like this:
select * from A where deptId in (select deptId from B);
This is equivalent to:
First query the department table B:
select deptId from B
Then use each deptId to query the employees in A:
select * from A where A.deptId = B.deptId
This can be abstracted into a loop like this:
List<> resultSet;
for (int i = 0; i < B.length; i++) {
    for (int j = 0; j < A.length; j++) {
        if (A[j].deptId == B[i].deptId) {
            resultSet.add(A[j]);
            break;
        }
    }
}
Obviously, in addition to using in, we can also implement the same query function in exists as follows:
select * from A where exists (select 1 from B where A.deptId = B.deptId);
The way to understand an exists query is that the outer query is executed first; each row obtained is then checked against the sub-query condition, and the result of that check (true or false) determines whether the row is kept in the final result.
So, writing it this way is equivalent to:
First run the outer query: select * from A
Then, for each row of A, check whether a match exists: select * from B where B.deptId = A.deptId
Similarly, this can be abstracted as a loop:
List<> resultSet;
for (int i = 0; i < A.length; i++) {
    for (int j = 0; j < B.length; j++) {
        if (A[i].deptId == B[j].deptId) {
            resultSet.add(A[i]);
            break;
        }
    }
}
The most expensive part for the database is establishing and releasing connections with the program. Suppose you connect twice, each time running a query over a million-row data set and then leaving: only two connections are made. If instead you establish a million connections, repeatedly requesting and releasing them, the system cannot cope. Hence the MySQL optimization principle: let the small table drive the large table; a small data set driving a large data set gives better performance.
Therefore, we want the outermost loop to be the smaller set: if B's data set is smaller than A's, in is suitable; if B's data set is larger than A's, exists is suitable. That is the difference between in and exists.
20. What problems may occur when the database increments its primary key?
- When auto-increment primary keys are used across sharded databases/tables, duplicate primary keys may occur. A simple solution is to use UUIDs instead.
- Auto-increment primary keys rely on a table-level auto-increment lock, which can cause contention problems.
- Auto-increment primary keys may run out.
21. Are you familiar with MVCC, its underlying principles?
MVCC, multi-version concurrency control, is a mechanism to improve concurrency performance by reading historical version data to reduce concurrent transaction conflicts.
MVCC needs to pay attention to these points:
- Transaction version number
- The hidden column of the table
- undo log
- read view
Read my article on thoroughly understanding the four isolation levels of MySQL transactions
22. Do you know database middleware such as Sharding-JDBC and MyCAT?
- Sharding-jdbc is currently based on JDBC drivers and does not require additional proxies, so there is no need to focus on the high availability of the proxy itself.
- Mycat is based on Proxy, which copies the MySQL protocol and disguises Mycat Server as a MySQL database, while Sharding-JDBC is based on JDBC interface extension and provides lightweight services in the form of JAR packages.
Some netizens recommend this article:
Sharding-jdbc: Be the lightest database middle tier
23. How do you solve MySQL master-slave replication delay?
Hee hee, first review the master slave replication principle, as shown in the figure:
- Step 1: Update events (UPDATE, INSERT, DELETE) of the master library are written to the binlog
- Step 2: Initiate a connection from the library to connect to the master library.
- Step 3: The master library creates a binlog dump thread and sends the contents of the binlog to the slave library.
- Step 4: After starting from the slave library, create an I/O thread to read the binlog content from the master library and write it to the relay log
- Step 5: An SQL thread is also created to read from the relay log, execute the read update event from Exec_Master_Log_Pos, and write the update to the SLAVE DB
If you are interested, you can also check out this article: Essential for back-end Programmers: mysql database related flow charts/schematics
The cause of the master/slave synchronization delay
The master opens N connections for clients, so it can take a large number of concurrent update operations. The slave, however, has only one SQL thread applying the binlog, so when some SQL statement takes a long time to execute on the slave, or a statement locks a table, SQL from the master backs up and is not applied in time. This causes master-slave inconsistency, i.e. master-slave delay.
Solution to master/slave synchronization delay
- The master is responsible for update operations and has higher data-safety requirements than the slave, so parameters such as sync_binlog=1 and innodb_flush_log_at_trx_commit=1 can be adjusted accordingly.
- Choose a better hardware device as slave.
- If a slave is used purely as a backup and does not serve queries, its load stays low, and executing the SQL in the relay log is naturally more efficient.
- Add slave server, this purpose is also to spread the read stress, thus reducing server load.
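Before applying these measures it helps to measure the lag; a common check on the replica (the field names come from MySQL's standard replication status output):
show slave status\G
-- key fields: Seconds_Behind_Master (estimated lag in seconds),
-- Slave_IO_Running / Slave_SQL_Running (replication thread health),
-- Exec_Master_Log_Pos (how far the SQL thread has executed)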
MySQL primary/secondary synchronization delay causes and solutions
24. Let’s talk about the optimization of a large table query
- Optimize the schema, the SQL statements, and the indexes;
- Consider caching, memcached, Redis, or JVM native caching;
- Master/slave replication, read/write separation;
- Sub-database sub-table;
25. What is a database connection pool? Why do you need a database connection pool?
Basic principle of a connection pool: maintain a certain number of database connections in an internal object pool, and expose methods for borrowing and returning database connections.
The process of establishing a connection between an application and a database:
- Establish a connection to the database server through TCP three-way handshake
- Send the database user account password and wait for the database to authenticate the user
- After authentication is complete, the system can submit SQL statements to the database for execution
- Close the connection; the TCP connection is torn down with a four-way handshake.
Database connection pooling benefits:
- Resource reuse (connection reuse)
- Faster system response time
- A new means of resource allocation (for example, limiting the maximum number of connections an application can use)
- Unified connection management to avoid database connection leakage
If you are interested, you can check out my article on database connection pool memory leak analysis and solution
26. How does a SQL statement execute in MySQL?
Mysql logical architecture
Query statement:
- First check whether the statement has permissions
- If you do not have permission, an error message is displayed
- If you do have permission, then before MySQL 8.0 the query cache is checked first (the query cache was removed in MySQL 8.0).
- If there is no cache, the parser performs lexical analysis to extract key elements of SQL statements such as SELECT. Then determine whether the SQL statement has syntax errors, such as whether the keyword is correct, and so on.
- The optimizer determines the execution plan
- Permission is verified again; if there is no permission an error message is returned, otherwise the storage engine interface is called and the execution result is returned.
How does an SQL statement execute in MySQL
27. InnoDB engine index policy, understand?
- Covering index
- Left-most prefix rule
- Index condition pushdown
Index condition pushdown was introduced in MySQL 5.6. During an index scan, conditions on columns contained in the index are evaluated first, and non-matching records are filtered out directly, which reduces the number of back-to-table lookups.
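A small sketch of index condition pushdown, assuming a hypothetical employee table with a composite index idx_name_age(name, age):
-- with ICP (MySQL 5.6+), the age >= 18 condition is evaluated inside the index
-- before fetching rows, so fewer back-to-table lookups are needed;
-- Extra in the explain output shows "Using index condition"
explain select * from employee where name like 'jay%' and age >= 18;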
Take a look at this article: talk about these indexing strategies in the InnoDB engine
28. How do you consider time zone transitions when storing date formats in databases?
- The datetime type is suitable for recording the original creation time of the data. If you change the values of other fields in the record, the value of the Datetime field will not change unless you change it manually.
- The TIMESTAMP type is suitable for recording the last modification time of data. As long as the values of other fields in the record are changed, the values of the TIMESTAMP field are automatically updated.
Take a look at this article: How to consider time zone conversion when storing dates in a database?
29. How do you optimize a SQL that takes too long to execute?
- Check whether multiple tables and sub-queries are involved, and optimize the Sql structure, such as removing redundant fields and detachable tables
- Optimize the index structure to see if indexes can be added appropriately
- For tables with very large data volumes, consider splitting databases/tables (e.g. a transaction flow table)
- Set up master-slave replication for the database and separate reads from writes
- Explain analyze SQL statements, view execution plans, and optimize SQL
- Check the mysql execution logs to see if there are other problems
30. Commands for analyzing MySQL server performance
- Show status, some variables worth monitoring:
- Bytes_received and Bytes_sent: traffic to and from the server.
- Com_*: the commands the server is executing.
- Created_*: temporary tables and files created during query execution.
- Handler_*: storage engine operations.
- Select_*: different types of join execution plans.
- Sort_*: various kinds of sort information.
- Show Profiles is used by MySql to analyze resource consumption of SQL statement execution in the current session
31. What's the difference between Blob and Text?
- Blob is used to store binary data, while Text is used to store large strings.
- Blob values are treated as binary strings (byte strings), they have no character set, and they sort and compare values based on bytes in column values.
- Text values are treated as non-binary strings (character strings). They have a character set and sort and compare values according to the collation rules of the character set.
32. What is the best type of field in mysql to record currency?
- In MySQL, money is commonly stored with the Decimal and Numeric types, which MySQL implements as the same type; they are designed to hold exact monetary data.
- For salary DECIMAL(9,2), 9 (precision) is the total number of digits stored for the value, and 2 (scale) is the number of digits after the decimal point, so the salary column can store values from -9999999.99 to 9999999.99.
- DECIMAL and NUMERIC values are stored as strings rather than as binary floating-point numbers to preserve the DECIMAL precision of those values.
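A minimal sketch of a money column (the table and column names are made up):
create table payment (
  id     int not null auto_increment,
  amount decimal(9,2) not null default 0.00,  -- exact values from -9999999.99 to 9999999.99
  primary key (id)
) engine=InnoDB;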
33. What kinds of locks does MySQL have, by granularity?
There are three types of lock granularity:
- Table lock: low overhead and fast to acquire; coarse granularity, so the probability of lock conflicts is highest and concurrency is lowest; no deadlocks.
- Row lock: high overhead and slow to acquire; deadlocks can occur; fine granularity, so the probability of lock conflicts is lowest and concurrency is highest.
- Page lock: overhead and locking speed are between table and row locks; deadlocks can occur; granularity is between table and row locks, and concurrency is average.
Interested partners can see my article, there is a variety of lock ha:
Back-end programmer required: mysql database related flow chart/schematic diagram
34. What is the difference between a Hash index and a B+ tree? How did you choose to design the index?
- B+ trees can perform range query, but Hash indexes cannot.
- B+ trees support the leftmost principle of joint indexes, but Hash indexes do not.
- B+ trees support order by, but Hash indexes do not.
- Hash indexes are more efficient than B+ trees for equivalent queries.
- With like, a B+ tree index can still be used for prefix matches (wildcard at the end, e.g. 'abc%'), while a Hash index cannot support fuzzy queries at all.
35. What are the differences between inner join, left join, and right join in MySQL?
- Inner join: when two tables are joined, only the result sets in the two tables that match exactly are retained
- Left JOIN when a join query is performed between two tables, all rows from the left table are returned, even if there are no matching records in the right table.
- When a right Join query is performed between two tables, all rows from the right table are returned, even if there are no matching records in the left table.
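Small sketches, reusing the hypothetical employee table A and department table B from question 19:
select * from A inner join B on A.deptId = B.deptId;  -- only rows matched on both sides
select * from A left  join B on A.deptId = B.deptId;  -- every row of A; unmatched B columns are NULL
select * from A right join B on A.deptId = B.deptId;  -- every row of B; unmatched A columns are NULL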
36. Talk about the MySQL infrastructure diagram
- The first layer is responsible for connection handling, authorization, security, and so on
- The second layer compiles and optimizes SQL
- The third layer is the storage engine.
37. What is inner join, outer join, cross join, cartesian product?
- Inner join: Obtain the records in two tables that satisfy the connection matching relation.
- Outer join: Obtain the records in two tables that meet the join matching relation, and the records in one table (or two tables) that do not meet the match relation.
- Cross join: displays the one-to-one correspondence of all records in two tables without matching relation for filtering, also known as Cartesian product.
38. Describe the three paradigms of databases
- First normal form: Every column (every field) in a data table cannot be split again.
- Second normal form: on top of the first normal form, every non-key column must depend on the whole primary key, not on just part of it.
- Third normal form: Non-primary keys in a table depend only on the primary key and not on other non-primary keys if the second normal form is satisfied.
39. Which tables store MySQL's permissions?
The MySQL server controls user access to databases through permission tables, which live in the mysql database and are initialized by the mysql_install_db script. These permission tables are user, db, tables_priv, columns_priv, and host.
- User permission table: records the information about the user accounts that are allowed to connect to the server. The permissions in the table are global.
- Db rights table: records the operation rights of each account on each database.
- tables_priv permission table: records table-level operation permissions.
- Columns_priv permission table: records operation permissions at the data column level.
- Host permission table: Works with db permission table to control database-level operation permissions on a given host. This permission list is not affected by GRANT and REVOKE statements.
40. How many binlog formats does MySQL have? What are the differences?
There are three formats, statement, row and mixed.
- Statement: every SQL statement that modifies data is recorded in the binlog. There is no need to record each row change, so the binlog is smaller, I/O is saved, and performance improves. But because SQL execution depends on context, the relevant context must also be saved, and some statements (for example ones that use certain functions) cannot be recorded and replicated correctly.
- Row: records only which rows were modified and how, not the SQL context. The recording unit is the change of each row, so almost every change can be recorded correctly, but some operations cause a huge number of row changes (e.g. alter table), making the log very large.
- Mixed, a compromise, uses statement records for normal operations and row records for situations where statement is not available.
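To see or change the format in use (binlog_format is the standard variable; changing it globally needs the appropriate privilege and only affects new sessions):
show variables like 'binlog_format';
set global binlog_format = 'ROW';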
41. Have you heard about the four features of InnoDB engine
- Insert buffer
- Double write
- Adaptive Hash index (AHI)
- Pre-reading (read ahead)
42. What are the advantages and disadvantages of indexes?
Advantages:
- A unique index ensures the uniqueness of data for each row in a database table
- Indexes can speed up data query and reduce query time
Disadvantages:
- Creating and maintaining indexes takes time
- Indexes take up physical space. In addition to data tables, each index takes up a certain amount of physical space
- Indexes are maintained dynamically when adding, deleting, or modifying data from a table.
43. What are the types of indexes?
- Primary key index: Data columns cannot duplicate or be NULL, and a table can have only one primary key.
- Unique index: Data columns are not allowed to duplicate and NULL values are allowed. A table allows multiple columns to create unique indexes.
- Plain index: A basic index type that has no restrictions on uniqueness and allows NULL values.
- Full-text indexing: it is a key technology used by search engines at present, which is used for word segmentation and search of text content.
- Covering index: all columns needed by the query are contained in the index, so the data rows do not need to be read
- Composite index: an index built on multiple columns used together for combined searches; more efficient than merging several single-column indexes
44. What are the rules for creating indexes?
- Left-most prefix matching rule
- Create indexes for fields that are frequently used as query criteria
- Frequently updated fields are not suitable for creating indexes
- Indexed columns should not take part in calculations or be wrapped in function calls
- Prioritize extending indexes over creating new ones to avoid unnecessary indexes
- In the order by or group by clause, creating an index requires a careful order
- Poorly differentiated data columns are not suitable for indexing (e.g., gender)
- Data columns that define foreign keys must be indexed.
- Do not index columns defined as text or image data types.
- Delete indexes that are no longer used or rarely used
45. Three ways to create an index
- The index is created when the CREATE TABLE is executed
CREATE TABLE `employee` (
`id` int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
`date` datetime DEFAULT NULL,
`sex` int(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_name` (`name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- Add indexes using the ALTER TABLE command
ALTER TABLE table_name ADD INDEX index_name (column);
- Run the CREATE INDEX command to CREATE the INDEX
CREATE INDEX index_name ON table_name (column);
46. How do you delete millions or more of data?
- When we want to delete millions of data, we can delete the index first
- Then delete unnecessary data in batches
- Create the index again after the deletion is complete.
47. What is the left-most prefix rule? What is the leftmost matching principle?
- When creating a multi-column index, the most frequently used column in the WHERE clause should be placed on the leftmost according to business requirements.
- When we create a composite index, such as (k1,k2,k3), we create (k1), (k1,k2) and (k1,k2,k3), which is the left-most matching principle.
48. The difference between B tree and B+ tree. Why does the database use B+ tree instead of B tree?
- In a B-tree, keys and values are stored in both internal and leaf nodes. In a B+ tree, internal nodes only hold keys, while leaf nodes hold both keys and values.
- The leaves of a B+ tree are connected by a chain, whereas the leaves of a B tree are independent.
- All data of a B+ tree index is stored in the leaf nodes, and the leaves are kept in order and linked together in a list, so range lookups, sorted lookups, grouping, and deduplication become very easy.
- Non-leaf nodes of a B+ tree store only keys, not data, while B tree nodes store both keys and data. InnoDB's default page size is 16KB; if non-leaf nodes store no data they can hold more keys, the fan-out (number of children per node) is larger, and the tree is shorter and fatter, which reduces the number of disk I/Os needed to find data and makes queries more efficient.
49. Covering index, back-to-table lookup, etc.
- Covering index: all the columns the query needs are contained in the index that is used, so the query can be answered from the index alone without reading the data rows.
- Back-to-table lookup: a secondary index does not contain all column data, so after locating a row via the secondary index, the engine must look the row up again through the clustered index; this second lookup is called going back to the table.
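A small sketch, assuming an employee table (id primary key, name, age) with a secondary index idx_name(name):
select id, name from employee where name = 'jay';      -- covered: the secondary index stores name plus the primary key id
select id, name, age from employee where name = 'jay'; -- age is not in the index, so a back-to-table lookup is needed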
See: MySQL covering indexes and back-to-table lookups
50. Why is no back-to-table query needed when a B+ tree index is a clustered index or a covering index?
- In a B+ tree index, a leaf node may store only the key value, or the key value together with the whole row of data; this is the difference between a non-clustered and a clustered index. In InnoDB only the primary key index is a clustered index; if there is no primary key, a unique key is chosen to build the clustered index, and if there is no unique key either, a hidden key is generated for it.
- When a query uses a clustered index, the entire row of data can be retrieved at the corresponding leaf node, so there is no need to run a query back to the table.
51. When to use clustered and non-clustered indexes
52. Does a non-clustered index query necessarily involve a back-to-table lookup?
Not necessarily. If all the columns the query needs are already contained in the index, there is no need to go back to the table (haha, that is exactly what a covering index does).
For example, with a non-clustered index on age in a student table, select age from student where age < 20 can be answered entirely from the index, whereas select * from student where age < 20 still has to go back to the table for the other columns.
53. What is a composite index? Why care about order in composite indexes?
A composite index is an index created on multiple columns.
Because the InnoDB engine uses the left-most prefix matching strategy, the order of the columns in a composite index matters.
54. What are database transactions?
Database transaction (referred to as: transaction), is a logical unit in the process of database management system execution. It consists of a limited sequence of database operations, which are either all executed or not executed. It is an indivisible work unit.
55. Relationship between isolation levels and locks
To answer this question, you can explain the four isolation levels and then explain how they are implemented. Isolation levels depend on locks and MVCC.
Read my article on thoroughly understanding the four isolation levels of MySQL transactions
56. By lock granularity, what are the database locks? Locking mechanism and InnoDB locking algorithm
- According to the lock granularity: table lock, page lock, row lock
- By locking mechanism points: optimistic lock, pessimistic lock
57. What locks does MySQL have?
By lock type, there are shared locks and exclusive locks.
- Shared lock: also known as read lock. When the user wants to read the data, a shared lock is placed on the data. Multiple shared locks can be added simultaneously.
- Exclusive lock: also known as write lock. An exclusive lock is placed on the data when the user writes to it. Only one exclusive lock can be added, and other exclusive locks and shared locks are mutually exclusive.
Lock compatibility: a shared lock is compatible with other shared locks; an exclusive lock conflicts with both shared locks and other exclusive locks.
58. How to implement row lock in MySQL InnoDB engine?
A row lock based on an index.
select * from t where id = 666 for update;
for update adds a row lock based on the condition when id is an indexed or primary key column; if id is not indexed, InnoDB falls back to a table lock.
59. What is a deadlock? How to solve it?
A deadlock is a vicious cycle in which two or more transactions occupy each other’s resources and request to lock each other’s resources. Take a look at the picture, as follows:
The general idea for resolving deadlocks is to break the cycle and avoid forming circular waits in the first place:
- If different programs concurrently access multiple tables, try to agree to access the tables in the same order to greatly reduce the chance of deadlocks.
- In the same transaction, all resources needed should be locked as much as possible to reduce the probability of deadlock.
- For the part of the business that is very prone to deadlock, you can try to use the granularity of upgrade lock to reduce the probability of deadlock through table-level locking.
- If the business logic cannot be arranged well, distributed locks or optimistic locks can be used instead.
- Deadlocks and indexes are inseparable; solving deadlock problems often requires optimizing your indexes.
If you are interested in Mysql deadlock analysis, you can read this article
60. Why use views? What is a view?
Why use views?
To improve the reusability of complex SQL statements and the security of table operations, the MySQL database management system provides the view feature.
What is a view?
A view is a virtual table. It is a filtered display of data in a table. A view consists of a predefined query SELECT statement.
61. What are the characteristics of views? What are the usage scenarios?
View features:
- The columns of a view can come from different tables, which are abstractions of tables and new relationships established in a logical sense.
- A view is a table (virtual table) generated by a base table (real table).
- View creation and deletion do not affect the base table.
- Updates (additions, deletions, and modifications) to view content directly affect the base table.
- When the view comes from more than one base table, data cannot be added and deleted.
View Purpose: Simplifies SQL queries, improves development efficiency, and is compatible with old table structures.
Common usage scenarios for views:
- Reuse SQL statements;
- Simplify complex SQL operations.
- Use parts of a table rather than the entire table;
- Protect the data
- Change the data format and presentation. Views can return data that is different from the presentation and format of the underlying table.
62. What are the advantages and disadvantages of views?
- Simplify queries. Views simplify user operations
- Data security. Views enable users to view the same data from multiple perspectives and secure confidential data
- Logical data independence. Views provide a degree of logical independence for refactoring the database
63. What is the difference between count(1), count(*) and count(column name)?
- Count (*) includes all columns and is equivalent to the number of rows. NULL columns are not ignored when the result is counted
- count(1) also counts every row, using the constant 1 instead of a column; NULL values are not ignored when counting.
- Count (column name) includes only the column name. When the result is counted, the count of the column value that is null is ignored. That is, if the value of a column is null, the count is not counted.
64. What is a cursor?
Cursors provide a flexible means of manipulating data retrieved from tables, essentially a mechanism for extracting one record at a time from a result set containing multiple data records.
65. What is a stored procedure? What are the pros and cons?
Stored procedures are compiled SQL statements that act as a method to perform some function (add, delete, change, or query a single table or multiple tables) and then give the code block a name that can be called when the function is used.
Advantages:
- A stored procedure is a precompiled block of code that executes efficiently
- Stored procedures run on the server side, reducing the stress on the client side
- Allows modular programming, where a procedure is created once and can be called any number of times later in the program, similar to the reuse of methods
- Replacing a large number of SQL statements sent over the network with a single stored procedure call reduces network traffic and increases communication speed
- Data security can be ensured to a certain extent
Disadvantages:
- Debugging trouble
- Portability is not flexible
- Recompile problem
66. What is a trigger? What are the use scenarios for triggers?
Trigger: a piece of code that is automatically executed when an event is triggered.
Usage Scenarios:
- Changes can be cascaded through related tables in the database.
- Monitor changes to a field in a table in real time and need to be processed accordingly.
- For example, some service numbers can be generated.
- Do not abuse it; otherwise, database and application maintenance will be difficult.
67. What triggers do MySQL have?
MySQL database has six types of triggers:
- Before Insert
- After Insert
- Before Update
- After Update
- Before Delete
- After Delete
68. What are superkeys, candidate keys, primary keys, and foreign keys?
- Superkeys: In relational mode, a set of attributes that uniquely identify a tuple is called a superkey.
- Candidate key: is the minimum superkey, that is, a superkey with no redundant elements.
- Primary key: A combination of data columns or attributes in a database table that uniquely and completely identify a stored data object. A data column can have only one primary key, and the value of the primary key cannot be missing, that is, cannot be Null.
- Foreign key: a column in one table that holds the primary key of another table is called a foreign key of that table.
69. What are the types of SQL constraints?
- NOT NULL: The content of a constraint field must NOT be NULL.
- UNIQUE: Constraint field uniqueness, a table can have more than one UNIQUE constraint.
- PRIMARY KEY: the constraint field is unique and cannot be repeated. Only one constraint field can exist in a table.
- FOREIGN KEY: used to prevent actions that would break the links between tables, and to prevent invalid data from being inserted into the foreign key column.
- CHECK: Used to control the value range of the field.
70. Talk about six kinds of associated query and application scenarios.
- Cross connection
- Inner join
- Outer join
- Union query
- Full join
- Cross join
71. The meaning of 50 in VARCHar (50)
- The field holds a maximum of 50 characters
- varchar(50) and varchar(200) take the same amount of space to store the string "jay", but the larger definition consumes more memory when sorting
72. What do int(20), char(20) and varchar(20) mean?
- Int (20) indicates that the field is of type int and the display length is 20
- Char (20) indicates that the field is a fixed-length string of 20 characters
- Varchar (20) indicates that the field is a variable length string of 20 characters
73. Differences between DROP, DELETE and TRUNCate
| | delete | truncate | drop |
|---|---|---|---|
| Type | DML | DDL | DDL |
| Rollback | Can be rolled back | Cannot be rolled back | Cannot be rolled back |
| What is deleted | Removes all or some rows; the table structure remains | Removes all rows; the table structure remains | Removes the table from the database: all rows, indexes, and permissions are deleted |
| Speed | Slow, deletes row by row | Fast | Fastest |
74. The difference between a UNION and a UNION ALL?
- Union: combines the two result sets, removes duplicate rows, and sorts by the default rules;
- Union All: combines the two result sets, keeps duplicate rows, and does not sort;
- UNION ALL is therefore more efficient than UNION.
75. SQL life cycle?
- The application server establishes a connection to the database server
- The database process gets the requested SQL
- Parse and generate an execution plan, execute
- Read data into memory and process it logically
- Send the result to the client through the connection in step 1
- Close the connection and release resources
76. What is the execution order of an SQL query?
Logically the clauses are executed roughly in this order: from → on → join → where → group by → having → select → distinct → order by → limit.
77. Does a query use an index when the column value is NULL?
Columns with NULL values can also be indexed
When you plan to index columns, you should avoid setting it to nullable because it makes it difficult for MySQL to optimize queries that reference nullable columns and adds complexity to the engine
78. Care about the SQL time in the business system? Statistics too slow query? How are slow queries optimized?
- When we write Sql, we should develop the habit of using Explain analysis.
- Operation and maintenance will give us the statistics of slow query regularly
Optimizing slow queries:
- Analyze statements to see if unnecessary fields/data are loaded.
- Analyze the SQL execution with explain, check index hits, and so on
- If the SQL is complex, optimize the SQL structure
- If the amount of data in the table is too large, consider splitting the table
Check out my article: Backend Programmers must Have: 30 Tips for Writing High-quality SQL
79. Does the primary key use an incremented ID or a UUID? Why?
In a single machine, select auto-increment ID. For distributed systems, UUID is preferred, but it is best to have a distributed unique ID production solution in your company.
- Auto-increment ID: small storage space and efficient queries; but if the data volume is too large it may exceed the range of the type, and merging multiple databases can cause primary-key conflicts.
- Uuid: It is suitable for inserting and updating a large amount of data, but it is disorderly, slow to insert data, and occupies a large space.
80. What happens when an auto-increment primary key runs out?
Auto-increment primary keys usually use the int type; in practice the maximum value is rarely reached, and if you expect it to be, you can plan sharding in advance.
81. Why is the field required to be not NULL?
Null values take up more bytes, and null has a lot of pits.
82. If you want to store user password hashes, what fields should be used for storage?
Fixed-length strings such as password hashes, salts, and ID-card numbers should be stored in char rather than varchar, which saves space and improves retrieval efficiency.
83. What is the MySQL driver?
Mysql-connector-java-5.1.18.jar
The Mysql driver helps the programming language communicate with the Mysql server, such as connecting, transferring data, closing, etc.
84. How to optimize long and difficult query statements? Have you ever been in action?
- Divide a large query into multiple small identical queries
- Reduce the search for redundant records.
- A complex query can be considered split into multiple simple queries
- Decompose associated queries to make caching more efficient.
85. Optimize specific types of query statements
Accumulation at ordinary times:
- For example, use select specific fields instead of select *
- Use count(*) instead of count(column name)
- Use caching where business is not affected
- Explain the analysis of your SQL
Check out my article: Backend Programmers must Have: 30 Tips for Writing High-quality SQL
86. The MySQL server's CPU surges; how do you handle it?
Investigation process:
- Use the top command to observe and determine if mysqld is causing this or some other cause.
- If mysqld is the cause, run show processlist to check the session states and see whether any resource-intensive SQL is running.
- Identify high-consumption SQL to see if the execution plan is accurate, if the index is missing, and if the amount of data is too large.
Processing:
- Kill these threads (and see if CPU usage drops),
- Make adjustments accordingly (such as adding indexes, changing SQL, changing memory parameters)
- Rerun the SQL.
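A sketch of locating and killing the offending sessions (the thread id is made up):
show full processlist;   -- find long-running or resource-heavy sessions and note their Id
kill 12345;              -- terminate a runaway session by its Id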
Other information:
It is also possible that no single SQL statement consumes many resources, but a large number of sessions connect all at once and the CPU spikes. In that case, work with the application side to analyze why the number of connections surged, and then adjust accordingly, for example by limiting the number of connections.
87. Common schemes for read/write separation?
- Based on the service logic, write operation commands such as add, delete, or change are sent to the primary database, and query commands are sent to the secondary database.
- Middleware is used as a proxy, responsible for identifying read or write requests from the database and distributing them to different databases. (e.g. Amoeba, mysql-proxy)
88. MySQL replication principle and process
The master-slave replication principle, in short, consists of three steps, as follows:
- The main database has a bin-log binary file that records all Sql statements. (Binlog thread)
- Copy the SQL statements from the bin-log file of the primary database from the secondary database. (IO thread)
- The slave re-executes these SQL statements from its relay log. (SQL execution thread)
As shown below:
The master-slave replication above is performed in five steps:
Step 1: Update events (UPDATE, INSERT, DELETE) of the master library are written to the binlog
Step 2: Initiate a connection from the library to connect to the master library.
Step 3: The master library creates a binlog dump thread and sends the contents of the binlog to the slave library.
Step 4: After starting from the slave library, create an I/O thread to read the binlog content from the master library and write it to the relay log
Step 5: An SQL thread is also created to read from the relay log, execute the read update event from Exec_Master_Log_Pos, and write the update to the SLAVE DB
89. What is the difference between DATETIME and TIMESTAMP in MySQL?
Similarity: both store time with a precision of one second.
The difference between:
- DATETIME can represent years from 1001 to 9999, while TIMESTAMP is limited to 1970 through 2038
- DATETIME Storage time is independent of time zone; TIMESTAMP stores time depending on the time zone, and the value displayed depends on the time zone
- DATETIME is stored in 8 bytes; The storage space of TIMESTAMP is 4 bytes
- DATETIME defaults to null; The TIMESTAMP field is not null by default, the default value is the current time (CURRENT_TIMESTAMP)
90. How does InnoDB implement transactions?
- Atomicity: implemented with the undo log. If an error occurs during execution or the user issues a rollback, the undo log is used to roll the data back to its state at the start of the transaction.
- Persistence: Use the redo log to restore data in the event of a system crash, as long as the redo log is persisted.
- Isolation: Transactions are isolated from each other through locks and MVCC.
- Consistency: Consistency is achieved through rollback, recovery, and isolation in concurrent cases.
91. Talk about MySQL Explain
The explain execution plan contains the following fields: id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, and Extra.
We'll focus on type, whose values are ordered from best to worst as follows:
system > const > eq_ref > ref > ref_or_null >
index_merge > unique_subquery > index_subquery >
range > index > ALL
Interviewer: Will not see Explain execution plan, resume dare to write SQL optimization?
92. How are InnoDB transactions and logs implemented?
How many kinds of logs are there
InnoDB has two kinds of logs: the redo log and the undo log.
Log storage mode
- Redo: Page changes are written to the redo log buffer, then to the redo log file system cache (fwrite), and then to the disk file (fsync).
- Undo: before MySQL 5.5, undo logs could only be stored in the ibdata file; since 5.6, innodb_undo_tablespaces allows undo logs to be stored outside ibdata.
How are transactions implemented through logging
- When a transaction modifies a page, it first records the undo; before recording the undo it records the redo of that undo, then it modifies the data page and records the redo of the page modification. The redo (which includes the undo's changes) must be persisted to disk before the data page.
- When a transaction needs to be rolled back, because of Undo, the data page can be rolled back to the state of the previous mirror. On crash recovery, if the redo log does not have a commit record for the transaction, undo needs to be rolled back to the time before the transaction started.
- If there is a commit record, roll forward through redo until the transaction is complete and commit.
93. What are the maximum lengths of the TEXT data types in MySQL?
- TINYTEXT: 256 bytes
- TEXT: 65535 bytes (64 KB)
- MEDIUMTEXT: 16777215 bytes (16 MB)
- LONGTEXT:4,294,967,295 bytes(4GB)
94. You have 500 DB servers and need to restart them all as quickly as possible. How?
- You can use the batch SSH tool PSSH to run restart commands on the machines that need to be restarted.
- You can also use a multithreaded tool such as salt (provided the client has salt installed) or Ansible (ansible requires SSH to be accessible) to operate multiple services at the same time
95. How do you monitor your database? How do you search your slow logs?
There are many monitoring tools, such as Zabbix and LEPUS, which I’m using here
96. Have you done a master-slave consistency check? If so, how? If not, what are you going to do?
There are many tools for checking primary-secondary consistency, such as checksum, mysqlDIff, and pt-table-checksum
97. Does your database support emoji storage? If not, how to operate?
Change the character set from utf8 to utf8mb4.
98. How does MySQL get the current date?
SELECT CURRENT_DATE();
99. With table A of 600 million and table B of 300 million, how can you query the 200 data records from 50000 to 50200 in the fastest way through the external TID association?
1. If table A's TID is auto-increment and continuous, and table B's id is indexed:
select * from a, b where a.tid = b.id and a.tid > 50000 limit 200;
2. If table A's TID is not continuous, a covering index is needed: TID should be either the primary key or a secondary index, and table B's id must also be indexed. Then:
select * from b, (select tid from a limit 50000, 200) a where b.id = a.tid;
100. How do you analyze the locks added by a single SQL statement?
The locking behaviour of a single SQL statement can be divided into nine cases:
- Combination 1: the id column is the primary key, RC isolation level
- Combination 2: the id column is a secondary unique index, RC isolation level
- Combination 3: the id column is a secondary non-unique index, RC isolation level
- Combination 4: no index on the id column, RC isolation level
- Combination 5: the id column is the primary key, RR isolation level
- Combination 6: the id column is a secondary unique index, RR isolation level
- Combination 7: the id column is a secondary non-unique index, RR isolation level
- Combination 8: no index on the id column, RR isolation level
- Combination 9: Serializable isolation level
Public account
- Welcome to follow my personal public account; let's make friends and learn together ha ~
- If any answer is wrong, feel free to point it out, thank you ~