MySQL: Classic Interview Questions

1. MySQL architecture

  1. Connection layer: at the top are the clients and connection-handling services.
  2. Service layer: performs most of the core service functions, including query parsing, analysis, optimization, caching, and all built-in functions. All cross-storage-engine functionality is implemented in this layer, including triggers, stored procedures, views, and so on.
  3. Engine layer: the storage engine is actually responsible for storing and retrieving data in MySQL. The server communicates with the storage engine through an API. Different storage engines have different capabilities, so you can choose one according to your actual needs.
  4. Storage layer: stores the data on the file system of the device and interacts with the storage engine.

Q: What is the query process of MySQL? How does an SQL statement execute in MySQL?

Client request -> connector (verifies user identity and grants permissions) -> query cache (if a cached result exists, return it directly; otherwise continue) -> analyzer (performs lexical and syntactic analysis of the SQL) -> optimizer (selects the best execution plan for the SQL) -> executor (first checks whether the user has execution permission, then calls into the engine layer to retrieve the data; if the query cache is enabled, the result is cached on the way back).

2. Storage engines

You can choose which engine to use flexibly. Multiple tables in a database can use different engines to meet various performance and practical requirements. Using an appropriate storage engine will improve the performance of the entire database.

Common storage engines are InnoDB, MyISAM, Memory, and NDB.

InnoDB is now MySQL’s default storage engine, with support for transactions, row-level locking, and foreign keys.

Q: What storage engines does MySQL have? What are the differences?

  1. InnoDB supports transactions; MyISAM does not.
  2. InnoDB supports foreign keys; MyISAM does not.
  3. InnoDB uses a clustered index; MyISAM uses a non-clustered index. In a clustered index the data rows are stored in the leaf nodes of the primary key index, so an InnoDB table must have a primary key, and lookups through it are very efficient.
  4. InnoDB’s minimum lock granularity is the row lock; MyISAM’s is the table lock. In MyISAM, an update statement locks the entire table, blocking all other queries and updates.
  5. InnoDB does not store the exact row count of a table, while MyISAM does, so MyISAM executes `SELECT COUNT(*)` (without a WHERE clause) faster.
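A minimal sketch of choosing an engine per table (table and column names are invented for illustration):

```sql
-- Transactional order data: InnoDB for transactions, row locks, foreign keys
CREATE TABLE orders (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    amount  DECIMAL(10, 2)
) ENGINE = InnoDB;

-- Read-mostly log data: MyISAM (no transactions, table-level locks)
CREATE TABLE access_log (
    id  INT AUTO_INCREMENT PRIMARY KEY,
    msg VARCHAR(255)
) ENGINE = MyISAM;

-- Check which engine a table uses
SHOW TABLE STATUS LIKE 'orders';
```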

3. Data types

There are five main categories

  1. Integer types: BIT, BOOL, INT, BIGINT
  2. Floating-point types: FLOAT, DOUBLE, DECIMAL
  3. String types: CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT
  4. Date types: DATE, DATETIME, TIMESTAMP, TIME, YEAR
  5. Others: BINARY, VARBINARY, ENUM, SET

Interview question: What is the difference between CHAR and VARCHAR?

CHAR is fixed-length; VARCHAR is variable-length. CHAR allocates the full storage space regardless of the actual data length, while VARCHAR allocates space based on the data actually stored. For very short columns, CHAR can also be more storage-efficient than VARCHAR.

Similarities:

  1. In CHAR(n) and VARCHAR(n), n represents the number of characters
  2. The string is truncated when it exceeds the maximum length n of the CHAR or VARCHAR

Differences:

  1. CHAR occupies n characters regardless of how many characters are actually stored, whereas VARCHAR occupies only the space the actual characters need, plus one or two bytes to record the length
  2. The maximum length of a CHAR column is 255 characters
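A small sketch of the behavioral difference (the table name t_str is made up):

```sql
CREATE TABLE t_str (
    c CHAR(10),
    v VARCHAR(10)
) ENGINE = InnoDB;

INSERT INTO t_str VALUES ('abc', 'abc');

-- CHAR pads with trailing spaces on storage and strips them on retrieval,
-- so the retrieved lengths look equal here,
SELECT CHAR_LENGTH(c), CHAR_LENGTH(v) FROM t_str;
-- but c always occupies 10 characters of storage, while v occupies
-- only the 3 characters actually stored plus a length byte.
```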

4. Indexes (key points)

**Interview question:** What do you understand about MySQL indexes?

An Index is a data structure that helps MySQL retrieve data efficiently. It can be simply described as a “fast ordered lookup data structure”. These data structures refer to (point to) the data in a way that makes it possible to implement advanced lookup algorithms on these data structures. It exists to improve query efficiency.

Indexes themselves are often too large to be held entirely in memory, so they are usually stored on disk in the form of index files.

Advantages

  • Improve data retrieval efficiency and reduce database IO costs
  • Reduce the cost of data sorting and reduce CPU consumption

Disadvantages

  • An index is itself a table that holds the primary key and the indexed fields and points to the records of the actual table, so it also consumes storage space
  • While indexes greatly speed up queries, they slow down table updates such as INSERT, UPDATE, and DELETE, because MySQL must update the index file every time an indexed column changes

When you should create an index:

  1. The primary key automatically creates a unique index
  2. Fields frequently used as query criteria
  3. Fields joined to other tables through foreign-key relationships in queries
  4. For the single-column vs. composite index choice, composite indexes tend to be preferred under high concurrency
  5. Fields used for sorting in queries, which are significantly faster when accessed through an index

When indexes should not be created:

  1. Indexes should not be created for columns that are rarely used or referenced in queries
  2. Indexes should also not be added to columns with very few distinct values
  3. Indexes should not be created when modifications far outnumber retrievals

Index classification

Data structure perspective
  • B+ tree index: data is ordered, so range queries are possible
  • Hash index: converts column data to a fixed-length hash value using a hash algorithm. Lookups are efficient, but the values cannot be sorted, so hash indexes are unsuitable for range queries; hash collisions are also possible
  • Full-text index: a special type of index in MyISAM
  • R-tree index
Physical storage perspective
  • Clustered index
  • Non-clustered (secondary) index

  Both clustered and non-clustered indexes are B+ tree structures.

    Differences between the two:

    • A table can have only one clustered index, but may have multiple non-clustered indexes
    • A clustered index stores records physically contiguously, whereas a non-clustered index is logically contiguous but not physically contiguous
    • The leaf nodes of a clustered index are the data nodes; the leaf nodes of a non-clustered index are still index nodes, each holding a pointer to the corresponding data block
Logical perspective
  • Primary key index: a special unique index that does not allow NULL values
  • Plain (single-column) index: each index contains a single column; a table can have multiple single-column indexes
  • Multi-column index (composite index, combined index): an index created on multiple fields. It is used only when the first of those fields appears in the query condition. Composite indexes follow the leftmost-prefix rule, that is, the leftmost index column takes precedence
  • Unique index / non-unique index
  • Spatial index: an index on a field of a spatial data type
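As a sketch of the logical categories above (table, column, and index names are invented for illustration):

```sql
-- Primary key index: created implicitly with the table
CREATE TABLE user (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    name  VARCHAR(50),
    email VARCHAR(100),
    city  VARCHAR(50),
    age   INT
) ENGINE = InnoDB;

-- Plain single-column index
CREATE INDEX idx_name ON user (name);

-- Unique index
CREATE UNIQUE INDEX idx_email ON user (email);

-- Composite index: usable by queries filtering on city, or on (city, age),
-- but not on age alone (leftmost-prefix rule)
CREATE INDEX idx_city_age ON user (city, age);
```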

Index structure

The first thing to understand is that indexes are implemented at the storage engine level, not the server level, and not all engines support all index types.

The MyISAM and InnoDB storage engines both use the B+Tree data structure. Compared with a B-tree, all data is stored in the leaf nodes, and the leaf nodes are connected by pointers into a linked list, which speeds up retrieval of adjacent data.

The first thing to understand is the difference between a B tree and a B+ tree

B-tree

A B-tree is a balanced search tree designed for external storage devices such as disks.

When the system reads data from disk into memory, the basic unit is the disk block: all data in the same disk block is read at once, not just the bytes needed. The InnoDB storage engine has the concept of a page, its smallest unit of disk management; the default page size in InnoDB is 16KB.

The storage space of a single disk block is usually not that large, so InnoDB uses several contiguous disk blocks to reach the 16KB page size each time it requests disk space. InnoDB reads data from disk into memory one page at a time. If every piece of data on a page helps locate data records, disk I/O is reduced and query efficiency improves.

The B-tree structure enables the system to efficiently locate the disk block where the data resides, because each B-tree node can contain a large number of keys and branches.

In the example that follows, each node occupies one disk block. A node holds two keys in ascending order and three pointers to child nodes; the pointers store the addresses of the disk blocks where the children reside.

The two keys divide the data into three ranges, corresponding to the data ranges of the subtrees the three pointers point to. For the root node, the keys are 17 and 35: the subtree pointed to by P1 holds data less than 17, the subtree pointed to by P2 holds data between 17 and 35, and the subtree pointed to by P3 holds data greater than 35.

Simulating the lookup of key 29:

  1. Locate disk block 1 from the root node and read it into memory. [Disk I/O 1]
  2. Compare key 29 against the interval (17, 35) and follow pointer P2 of disk block 1.
  3. Locate disk block 3 via pointer P2 and read it into memory. [Disk I/O 2]
  4. Compare key 29 against the interval (26, 30) and follow pointer P2 of disk block 3.
  5. Locate disk block 8 via pointer P2 and read it into memory. [Disk I/O 3]
  6. Find key 29 in the key list of disk block 8.

Analyzing the procedure above, we find that three disk I/O operations and three in-memory lookups were required. Since the keys within a node form an ordered list, binary search can speed up the in-memory lookups; it is the three disk I/O operations that dominate the cost of the B-tree search. Compared with an AVL tree, a B-tree reduces the number of nodes (and hence the tree height), so each disk I/O brings more useful data into memory, improving query efficiency.

B+ tree

B+Tree is an optimization based on B-Tree, making it more suitable for implementing external storage index structure.

In the B-tree structure above, each node contains not only the key but also the data value. Since the storage space of each page is limited, large data values reduce the number of keys that can be stored per node (that is, per page); with large values the depth of the B-tree grows, which increases disk I/O and hurts query efficiency.

In a B+Tree, all data records are stored in the leaf nodes at the bottom layer, ordered by key value; non-leaf nodes store only key information. This greatly increases the number of keys each node can hold and reduces the height of the B+Tree.

So the differences from a B-tree are:

  1. Non-leaf nodes store only key information;
  2. All leaf nodes are connected by chain pointers;
  3. Data records are stored in the leaf nodes.

A B+Tree usually has two head pointers, one to the root node and one to the leaf node with the smallest key, and all leaf nodes (that is, data nodes) are linked together in a chain. Therefore there are two kinds of lookups on a B+Tree: a range or paging scan along the leaf chain (for example on the primary key), and a random lookup starting from the root node.

More on the B+ tree

  1. The number of disk I/Os depends on the height h of the B+ tree. Suppose the table holds N records and each disk block holds m index entries; then h = log(m+1) N. With N fixed, the larger m is, the smaller h is. A disk block's size is one data page and is fixed, so the smaller each index entry is, the more entries fit per block and the lower the tree.
  2. When the index entries of a B+ tree are composite, such as (name, age, sex), the tree is ordered by comparing the fields from left to right. For example, when looking up (Tom, 20, F), the B+ tree first compares name to decide which branch to follow; if names are equal, age and then sex are compared in turn. But for a query like (20, F), with no name, the B+ tree does not know which node to visit next; this is the leftmost-prefix principle mentioned earlier.
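A sketch of the leftmost-prefix behavior described above (the table and index names are hypothetical):

```sql
CREATE INDEX idx_person ON person (name, age, sex);

-- Can use the index: the leftmost column `name` is constrained
SELECT * FROM person WHERE name = 'Tom' AND age = 20 AND sex = 'F';
SELECT * FROM person WHERE name = 'Tom';

-- Cannot use the index effectively: `name` is missing,
-- so the tree has no ordered prefix to descend by
SELECT * FROM person WHERE age = 20 AND sex = 'F';
```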

InnoDB primary key index and secondary index structure

In the InnoDB index structure, the data fields of the leaf nodes store the actual data records. In other words, the InnoDB data file is itself the primary key index file; this is known as a “clustered index”, and a table can have only one.

Secondary indexes are different. For example, a conditional search for the name column requires two steps:

① name is looked up in the secondary index, and the corresponding primary key value is found in its leaf node; ② the primary key value is then looked up in the primary index to fetch the row. This is called a “back-to-table query”.
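One common way to avoid the back-to-table query is a covering index: if the secondary index already contains every column the query needs, the primary index is never consulted. A sketch, assuming a `user` table with an index on name:

```sql
-- Secondary index on (name); the primary key id is stored in its leaf nodes
CREATE INDEX idx_name ON user (name);

-- Back-to-table: fetches id from idx_name, then the full row from the primary index
SELECT * FROM user WHERE name = 'Alice';

-- Covered: id and name are both present in idx_name, so no second lookup is needed
SELECT id, name FROM user WHERE name = 'Alice';
```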

MyISAM primary key index and secondary index structure

MyISAM's index files and data files are separate. The data fields of the leaf nodes in a MyISAM index do not store the actual data records but the addresses of the records. Because the index file is separated from the data file, such an index is called a “non-clustered index”. In MyISAM the primary index differs little from a secondary index, except that the primary key index cannot contain duplicate keys.

Interview question: why does MySQL use B+ trees rather than B trees?

Every node in a B-tree stores data, whereas only the leaf nodes of a B+ tree store data. Therefore, for the same amount of data, a B-tree is taller and requires more I/O per lookup. Database indexes are stored on disk; when the data volume is large, the entire index cannot be loaded into memory, so it is loaded one disk page (one index-tree node) at a time. MySQL further optimizes the B+ tree: the leaf nodes form a doubly linked list, and the head and tail of the list also point to each other circularly.

Interview question: Why not Hash?

The underlying structure of a hash index is a hash table, a key-value storage structure, so multiple records have no ordering relationship at all. Range queries therefore cannot go through the index and require a full table scan, which makes hash indexes suitable only for equality lookups. A B+ tree is a multi-way balanced search tree whose keys are naturally kept in order, so range queries need no full table scan. Moreover, hash indexes do not support the leftmost-prefix matching rule of multi-column composite indexes, and with many duplicate keys they become inefficient because of hash collisions.

5. Queries

The difference between COUNT(*), COUNT(1), and COUNT(column name)

Execution results

  1. COUNT(*) counts all rows, i.e. the number of rows; NULL values do not cause a row to be skipped
  2. COUNT(1) also counts all rows, with 1 standing in for each row; NULL values are not skipped either
  3. COUNT(column name) counts only that column, ignoring rows where the column value is NULL. That is, if a column value is NULL, the row is not counted
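A small illustration of the NULL-handling difference (hypothetical table):

```sql
CREATE TABLE t (a INT, b VARCHAR(10));
INSERT INTO t VALUES (1, 'x'), (2, NULL), (NULL, 'y');

SELECT COUNT(*) FROM t;   -- 3: all rows counted
SELECT COUNT(1) FROM t;   -- 3: all rows counted
SELECT COUNT(b) FROM t;   -- 2: the row where b IS NULL is ignored
SELECT COUNT(a) FROM t;   -- 2: the row where a IS NULL is ignored
```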

Execution efficiency

  1. If the column is the primary key, COUNT(column name) is faster than COUNT(1)
  2. If the column is not the primary key, COUNT(1) is faster than COUNT(column name)
  3. If the table has multiple columns and no primary key, COUNT(1) performs better than COUNT(*)
  4. If there is a primary key, SELECT COUNT(primary key) is optimal
  5. If the table has only one field, SELECT COUNT(*) is optimal

Q: What is the difference between IN and EXISTS in MySQL?

Reference: blog.csdn.net/jinjiniao1/…

SELECT * FROM A WHERE A.id IN (SELECT id FROM B);
-- roughly equivalent to: SELECT * FROM A WHERE A.id = 1 OR A.id = 2 OR A.id = 3;
-- uses A's index; has little to do with B

SELECT * FROM A WHERE EXISTS (SELECT * FROM B WHERE B.id = A.id);
-- uses B's index; has little to do with A
  • EXISTS: the outer query is evaluated row by row; for each row, the condition inside EXISTS is checked. If the subquery returns at least one row, the condition is true and the current outer row is kept; otherwise it is discarded. EXISTS acts like a boolean condition: true if the subquery returns a result set, false if it does not

    Summary: if table A has n records, the EXISTS query examines the n records one by one, evaluating the EXISTS condition n times

  • IN: an IN query is equivalent to the union of multiple OR conditions: it first evaluates the subquery, producing a result set of, say, m records, then decomposes the condition into m values and effectively performs m lookups against the outer query

Therefore, there is little difference between IN and EXISTS if the two tables are of similar size.

If one table is smaller and the other larger, use EXISTS when the subquery table is the larger one, and IN when the subquery table is the smaller one.

What is the difference between a UNION and a UNION ALL?

Both UNION and UNION ALL combine two result sets into one. The number of SQL statement fields to be combined must be the same and the field types must be the same.

  • A UNION filters out duplicate records after combining the results (less efficient), while UNION ALL does not remove duplicates.
  • A UNION sorts the result by field order (a side effect of deduplication), whereas UNION ALL simply concatenates the two result sets and returns them.
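For example (hypothetical tables t1 and t2, each with a single column v, holding {1, 2} and {2, 3}):

```sql
SELECT v FROM t1 UNION SELECT v FROM t2;
-- -> 1, 2, 3      (duplicates removed)

SELECT v FROM t1 UNION ALL SELECT v FROM t2;
-- -> 1, 2, 2, 3   (simple concatenation, duplicates kept)
```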

SQL Execution sequence

  • Handwritten order
SELECT DISTINCT <select_list>
FROM  <left_table> <join_type>
JOIN  <right_table> ON <join_condition>
WHERE  <where_condition>
GROUP BY  <group_by_list>
HAVING <having_condition>
ORDER BY <order_by_condition>
LIMIT <limit_number>
  • Machine (logical execution) order
FROM  <left_table>
ON <join_condition>
<join_type> JOIN  <right_table> 
WHERE  <where_condition>
GROUP BY  <group_by_list>
HAVING <having_condition>
SELECT
DISTINCT <select_list>
ORDER BY <order_by_condition>
LIMIT <limit_number>

Join operation

6. Transactions

A transaction is a logical set of operations that either all execute or none execute. Transactions are further divided into distributed transactions and database transactions.

Transaction logging is how InnoDB reduces the overhead of committing transactions.

ACID implementation principle

Atomicity — all operations in the transaction either complete or none do. Guaranteed by the undo log (rollback log).

Consistency — database integrity constraints are not broken before or after a transaction. Guaranteed by the other three properties.

Isolation — the execution of one transaction cannot be interfered with by other transactions. Guaranteed by locks and MVCC.

Durability — after a transaction completes, its changes to the database persist and are not rolled back. Guaranteed by the redo log.
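A minimal sketch of atomicity from SQL's point of view (the account table is made up): both updates succeed together or are undone together via the undo log.

```sql
START TRANSACTION;

UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;

-- If anything went wrong, undo both updates (undo log):
ROLLBACK;
-- Otherwise make them durable (redo log is flushed on commit):
-- COMMIT;
```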

What are the problems associated with concurrent transactions?

  • Dirty read: transaction A reads data updated by transaction B, and then B rolls back; the data A read is dirty
  • Lost update: transactions A and B select the same row and then update it based on the originally selected value; because neither transaction is aware of the other, one update silently overwrites the other
  • Non-repeatable read: transaction A reads the same data several times while transaction B updates and commits that data in between, so A's repeated reads return inconsistent results
  • Phantom read: similar to a non-repeatable read. It occurs when one transaction (T1) reads several rows and a concurrent transaction (T2) then inserts some rows; in subsequent queries, T1 finds records that did not exist before, as if an illusion occurred, hence “phantom read”.

The difference between unrepeatable and phantom reads:

  • The key point of a non-repeatable read is modification: within the same transaction, the same query returns different data the second time. (Because another transaction committed an update in between.)
  • The key point of a phantom read is insertion or deletion: within the same transaction, the same query returns a different set of rows the second time. (Because another transaction committed an insert or delete in between.)

Transaction isolation level

  • READ UNCOMMITTED: the lowest isolation level; uncommitted data changes may be read, potentially resulting in dirty reads, phantom reads, or non-repeatable reads.
  • READ COMMITTED: a transaction can read only data already committed by other transactions; this prevents dirty reads, but phantom reads and non-repeatable reads can still occur.
  • REPEATABLE READ: multiple reads of the same field within a transaction are consistent unless the transaction itself modifies the data; this prevents dirty reads and non-repeatable reads, but phantom reads are still possible.
  • SERIALIZABLE: the highest isolation level, fully ACID-compliant; all transactions execute one after another, so interference between transactions is impossible. This level prevents dirty reads, non-repeatable reads, and phantom reads.

MySQL's InnoDB storage engine uses the REPEATABLE READ isolation level by default.
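The level can be inspected and changed per session. A sketch (the variable name `transaction_isolation` applies to MySQL 8.0; older versions use `tx_isolation`):

```sql
-- Check the current isolation level (MySQL 8.0)
SELECT @@transaction_isolation;   -- REPEATABLE-READ by default

-- Change it for the current session only
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```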

It is important to emphasize that the transaction isolation level and the concurrency of data access are antithetical, and the higher the transaction isolation level, the worse the concurrency. There is no one-size-fits-all rule for determining the appropriate transaction isolation level for your application.

MVCC multi-version concurrency control

Most of MySQL’s transactional storage engine implementations are not simple row-level locking. In order to improve concurrency, multi-version concurrency control (MVCC) is generally implemented at the same time. MVCC can be considered as a variant of row-level locking, but it avoids locking operations in many cases, so the cost is lower.

The typical MVCC implementation is divided into optimistic concurrency control and pessimistic concurrency control

MVCC works only at the READ COMMITTED and REPEATABLE READ isolation levels.

7. MySQL locks

MySQL does not provide a mechanism for deadlock resolution.

MySQL lock classification

Lock level classification

How does the MySQL InnoDB engine implement row locks?

InnoDB implements two types of row locking:

  • Shared lock (S lock, read lock): other transactions can read the locked rows but cannot write them.
  • Exclusive lock (X lock, write lock): other transactions can neither read nor write the locked rows.
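In SQL, the two row-lock types are requested explicitly like this (LOCK IN SHARE MODE is the pre-8.0 spelling of FOR SHARE; the account table is a placeholder):

```sql
-- Shared (S) lock on the matching rows
SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;

-- Exclusive (X) lock on the matching rows
SELECT * FROM account WHERE id = 1 FOR UPDATE;
```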
Lock granularity classification

What is lock granularity: to maximize concurrency, the range of data locked at one time should be as small as possible. However, managing locks consumes resources, so the database system must balance concurrency against locking overhead; this trade-off gives rise to the notion of lock granularity.

  1. Row-level lock: the smallest lock granularity in MySQL, locking only the current row. Row-level locking greatly reduces conflicts between database operations; its granularity is the smallest, but its overhead is the largest. Row-level locks are divided into shared locks and exclusive locks.

    High overhead, slow locking; deadlocks can occur. The probability of lock conflicts is the lowest and concurrency is the highest.

    InnoDB implements row locks by locking index entries: a row lock is used only when the data is retrieved through an index condition; otherwise InnoDB falls back to a table lock.

  2. Table lock: low overhead, fast locking; no deadlocks occur; the largest lock granularity, with the highest probability of lock conflicts and the lowest concurrency.

  3. Page-level lock: a lock whose granularity sits between row-level and table-level locks. Deadlocks can occur; concurrency is middling.

Interview question: What about optimistic and pessimistic locks for databases?

Optimistic locking and pessimistic locking are two concurrency control ideas that can be used to solve the lost update problem

Optimistic locking “optimistically” assumes that concurrent update conflicts are rare: data is read and processed without locks, and only at update time is a version number or timestamp checked to detect a conflict. If a conflict is found, the transaction fails (and is typically retried); if not, the update succeeds. The most common implementation records a version number per row.

Pessimistic locking “pessimistically” assumes a high probability of concurrent update conflicts: an exclusive lock is applied before data is accessed and processed, held during the entire processing, and released only when the transaction commits or rolls back. Unlike optimistic locking, pessimistic locking is implemented by the database itself; to use it, we directly call the relevant database statements.
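A common sketch of the version-number implementation of optimistic locking (table and column names are illustrative):

```sql
-- Read the row together with its current version
SELECT balance, version FROM account WHERE id = 1;

-- ... compute the new balance in the application ...

-- Update only if nobody changed the row in the meantime
UPDATE account
SET    balance = 900, version = version + 1
WHERE  id = 1 AND version = 41;   -- 41 = the version we read

-- If the affected-row count is 0, another transaction won; retry or fail
```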

Lock modes

  • Record locks: locks on a single index record. The index entry is locked in order to lock the matching row; other transactions cannot modify or delete the locked entry;

  • Gap locks: when we retrieve data using a range condition rather than an equality condition and request a shared or exclusive lock, InnoDB locks the index entries of existing records that match the condition; ranges where key values fall within the condition but no record exists are called “gaps”, and InnoDB locks those gaps too. This mechanism is called gap locking. Because the whole range is locked, other transactions cannot insert data inside it, which prevents phantom reads.

    Note: Using a gap lock locks an interval, not just each piece of data in the interval.

SELECT * FROM t WHERE id BETWEEN 1 AND 10 FOR UPDATE;

That is, the interval (1, 10) is locked: inserts of rows with id 2, 3, 4, 5, 6, 7, 8, or 9 will be blocked, while rows 1 and 10 themselves are not covered by the gap.

  • Next-key locks: a combination of a record lock and a gap lock; it covers both the index record and the gap before it. The main purpose of next-key locking is also to avoid phantom reads.

Deadlocks

Deadlock generation: A deadlock occurs when two or more transactions occupy the same resource and request to lock the resource occupied by the other transaction. (Both transactions hold the lock that the other needs and are waiting for the other to release it, and neither transaction will release its own lock).

Whether deadlocks occur is related to the transaction execution order and the storage engine.

Deadlock solutions

  1. Wait until the lock request times out.
  2. Initiate deadlock detection, actively roll back one transaction (the one holding the fewest row-level exclusive locks), and let the other transactions continue.

InnoDB avoids deadlocks:

  • At the start of a transaction, acquire locks on every row you expect to modify up front with a SELECT ... FOR UPDATE statement
  • In a transaction, if you intend to update records, request a sufficient lock level directly, that is, an exclusive lock, rather than first requesting a shared lock
  • If a transaction needs to modify or lock more than one table, issue the lock statements in the same order in every transaction. In an application, if different programs access multiple tables concurrently, try to agree on accessing the tables in the same order; this greatly reduces the chance of deadlock

MyISAM avoids deadlocks:

When automatic locking is used, MyISAM always obtains all the locks an SQL statement requires at once, so MyISAM tables never deadlock.

8. MySQL optimization

How do you optimize SQL in your daily work? What are the general steps of SQL optimization? How do you write SQL that makes effective use of a composite index? How do you optimize an SQL statement that takes too long to execute? What are the leftmost-prefix and leftmost-match principles?

MySQL common performance analysis methods

Common analysis methods include the slow query log, EXPLAIN query analysis, profiling, and SHOW commands for querying system status and system variables.

What is the EXPLAIN execution plan?

The EXPLAIN keyword simulates the optimizer's execution of an SQL query, letting you see how MySQL processes your statement. It can show:

  • The read order of the table
  • Operation type of data read operation
  • Which indexes are available
  • Which indexes are actually used
  • References between tables
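Usage is just a prefix on the query. A sketch, assuming a `user` table with an index idx_name on (name); the important output columns include type, possible_keys, key, rows, and Extra:

```sql
EXPLAIN SELECT * FROM user WHERE name = 'Alice';
-- type: ref                -- index lookup (ALL would mean a full table scan)
-- possible_keys: idx_name  -- indexes that could be used
-- key: idx_name            -- the index actually chosen
-- rows: 1                  -- estimated rows to examine
```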

The index optimization

  • For single-column indexes, try to select the index with the best filtering power for the current query

  • When choosing a composite index, the more selective a filtering field in the current query is, the earlier it should appear in the index field order

  • When choosing a composite index, try to pick an index that covers as many fields from the current query's WHERE clause as possible

In which cases does a query not use an index?

  1. With the LIKE keyword, if the match string begins with '%', the index is not used; if '%' is not in the first position, the index is used.
  2. Performing any operation on an indexed column (calculation, function call, or automatic/manual type conversion) defeats the index and degrades the query to a full table scan.
  3. A multi-column index is created on several fields of a table; it is used only when the first of those fields appears in the query condition.
  4. With the OR keyword, indexes are used only if the columns on both sides of OR are indexed; if the condition column on either side is not indexed, no index is used for the query.
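Sketches of the four cases, assuming a `user` table with an index idx_name on (name) and a composite index idx_city_age on (city, age):

```sql
-- 1. Leading wildcard defeats the index; trailing wildcard does not
SELECT * FROM user WHERE name LIKE '%son';      -- full table scan
SELECT * FROM user WHERE name LIKE 'John%';     -- can use idx_name

-- 2. Any operation on the indexed column defeats the index
SELECT * FROM user WHERE UPPER(name) = 'JOHN';  -- full table scan
SELECT * FROM user WHERE name = 'John';         -- can use idx_name

-- 3. A composite index needs its first column in the condition
SELECT * FROM user WHERE age = 20;                  -- idx_city_age unusable
SELECT * FROM user WHERE city = 'NY' AND age = 20;  -- usable

-- 4. OR uses indexes only if both sides are indexed
SELECT * FROM user WHERE name = 'John' OR age = 20; -- full scan if age is unindexed
```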

Query optimization

Let the small table drive the big table

ORDER BY optimization

In an ORDER BY clause, try to sort using an index: do the sorting on indexed columns as much as possible, following the best (leftmost) prefix of the index.

GROUP BY optimization

GROUP BY essentially sorts first and then groups, and likewise follows the best leftmost prefix of the index.

9. Other questions

Database and table sharding (overview)

Why split databases: in a clustered environment with many slaves, read operations are largely taken care of; but write-heavy workloads or large volumes of frequently written data put great pressure on the master. A single database cannot absorb large-scale concurrent writes, so splitting into multiple databases is considered.

So we split the tables originally stored in one database across multiple databases, usually dividing them by functional module and degree of relationship and deploying them to different databases. Advantages:

  • Writes of incremental data are less affected by lock contention during queries
  • With fewer rows in each table, common queries scan fewer records, need fewer rows per query, incur less disk I/O, and have lower latency

However, it cannot solve the problem of too much data in a single table -> split table

  • Vertical splitting: split a table by how frequently its fields are used by the business, keeping the main, frequently used fields together in the primary table, then grouping the rarely used fields by business attribute into different secondary tables.
  • Horizontal splitting (data sharding): when a single table exceeds roughly 5 million rows, horizontal splitting is recommended. The table is copied into several tables with the same structure, and rows are distributed among them according to some rule, keeping each table from growing too large and improving performance.

Problems caused by separate database and table:

  • Data integrity and consistency issues
  • Data manipulation dimensions
  • Federated query problems across libraries

Interview question: how do you delete millions of rows or more?

  1. When we want to delete millions of rows, we can drop the indexes first
  2. Then delete the useless data
  3. Re-creating the indexes after the deletion (when there is less data) is also very fast
  4. This is much faster than deleting directly, not to mention that if a direct delete is interrupted, the whole delete is rolled back.
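The steps above can be sketched as follows (the table, index, and condition are placeholders):

```sql
-- 1. Drop the secondary indexes first, so the delete does not maintain them
ALTER TABLE big_table DROP INDEX idx_created_at;

-- 2. Delete the useless data, optionally in batches to keep transactions small
DELETE FROM big_table WHERE created_at < '2020-01-01' LIMIT 10000;
-- ...repeat until no rows are affected...

-- 3. Re-create the index on the now-smaller table
ALTER TABLE big_table ADD INDEX idx_created_at (created_at);
```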

Reference and thanks:

zhuanlan.zhihu.com/p/164519371

zhuanlan.zhihu.com/p/29150809

juejin.im/post/5e3eb6…