Why use an auto-increment column as the primary key
1. If we define a PRIMARY KEY, InnoDB uses it as the clustered index. If there is no explicit PRIMARY KEY, InnoDB uses the first unique index whose columns are all NOT NULL as the clustered index. If there is no such index either, InnoDB uses a built-in 6-byte ROWID as a hidden clustered index (this ROWID increases as rows are written; unlike Oracle's ROWID it cannot be referenced, it is implicit).
2. The data records themselves are stored in the leaf nodes of the primary index (a B+Tree). Records within one leaf node (the size of one memory or disk page) must be kept in primary key order, so each time a new record arrives, MySQL inserts it into the appropriate node and position according to its primary key. If the page reaches its load factor (15/16 by default in InnoDB), a new page (node) is allocated.
3. If the table uses an auto-increment primary key, each new record is appended after the last one in the current index node, and when a page is full a new page is allocated automatically.
4. If a non-auto-increment primary key is used (such as an ID card number or student number), the key values arrive in nearly random order, so each new record has to be inserted somewhere in the middle of an existing index page. MySQL then has to move data to make room for the record, and the target page may even have been flushed to disk and evicted from the cache, in which case it must first be read back from disk. This adds a lot of overhead, and the frequent moving and page splitting cause heavy fragmentation, leaving an index structure that is not compact; OPTIMIZE TABLE then has to be run to rebuild the table and optimize page fill.
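The effect described in points 3 and 4 can be illustrated with a toy simulation. This is a minimal sketch in Python, not InnoDB's actual algorithm: pages are plain sorted lists, `PAGE_CAPACITY` and `insert_all` are hypothetical names, and the 15/16 load factor is ignored. It gives a rough count of the records that have to be shifted or copied when keys arrive in order versus in random order.

```python
import bisect
import random

PAGE_CAPACITY = 16  # hypothetical records per page; real pages hold far more


def insert_all(keys, capacity=PAGE_CAPACITY):
    """Insert keys into sorted pages; return (pages, roughly how many records moved)."""
    pages = [[]]
    moved = 0
    for key in keys:
        # Find the last page whose first key is <= key (page 0 otherwise).
        firsts = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(firsts, key)
        page = pages[idx]
        pos = bisect.bisect_left(page, key)
        if len(page) < capacity:
            moved += len(page) - pos      # records shifted to make room
            page.insert(pos, key)
        elif idx == len(pages) - 1 and pos == len(page):
            pages.append([key])           # append past the end: open a new page
        else:
            page.insert(pos, key)         # full page, key belongs inside it
            half = len(page) // 2
            right = page[half:]
            del page[half:]
            pages.insert(idx + 1, right)  # split: copy the upper half out
            moved += len(right)
    return pages, moved


sequential_keys = list(range(1, 1601))
random_keys = sequential_keys[:]
random.Random(42).shuffle(random_keys)

seq_pages, seq_moved = insert_all(sequential_keys)
rand_pages, rand_moved = insert_all(random_keys)

print("sequential:", len(seq_pages), "pages,", seq_moved, "records moved")
print("random:    ", len(rand_pages), "pages,", rand_moved, "records moved")
```

In this model the sequential run never moves a record and leaves every page full, while the random run splits pages and leaves them partly empty, which is the fragmentation the text refers to.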
Why does data indexing improve efficiency
1. Index data is stored in order.
2. Because the data is ordered, a lookup through the index does not need to scan every index record.
3. In the best case, an index lookup is as efficient as a binary search, which takes close to log2(N) steps.
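To make the log2(N) figure concrete, here is a small Python sketch that counts the comparisons a binary search needs on one million ordered keys. The function name `binary_search_steps` and the sample keys are made up for illustration; a real index walks B+ tree pages rather than a flat array, so this only shows the order of magnitude.

```python
import math


def binary_search_steps(sorted_keys, target):
    """Return (position, comparisons); position is -1 when target is absent."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


keys = list(range(0, 2_000_000, 2))          # 1,000,000 ordered keys
position, steps = binary_search_steps(keys, 1_234_566)
worst_case = math.floor(math.log2(len(keys))) + 1

print("found at", position, "after", steps, "comparisons")
print("upper bound for this N:", worst_case)
```

A linear scan over the same keys would need up to one million comparisons; the ordered structure brings the bound down to 20.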
Differences between B+ tree indexes and hash indexes
A B+ tree is a balanced multi-way tree: all leaf nodes are at the same depth, and the leaf nodes are linked by pointers, so the keys are ordered.
A hash index applies a hash algorithm to convert the key into a hash value. A lookup does not walk level by level from root to leaf as in a B+ tree; it needs only one hash computation. The entries are unordered.
Advantages of hash indexes:
Equality queries. Here hash indexes have an absolute advantage, provided that there are not many duplicate keys.
With many duplicate keys, hash indexes become inefficient because of hash collisions.
Scenarios where hash indexes are not applicable:
1. Range queries are not supported.
2. Sorting through the index is not supported.
3. The leftmost prefix matching rule of composite indexes is not supported.
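The contrast between the two structures can be sketched in a few lines of Python. This is an analogy under stated assumptions, not how MySQL stores anything: a `dict` stands in for the hash index, a sorted list searched with `bisect` stands in for the ordered B+ tree leaves, and the rows and function names are hypothetical.

```python
import bisect

# Hypothetical rows keyed by an integer column.
rows = {17: "row-17", 3: "row-3", 42: "row-42", 8: "row-8", 25: "row-25"}

hash_index = dict(rows)       # key -> row; entries are not ordered by key
sorted_keys = sorted(rows)    # stands in for the ordered B+ tree leaf level


def hash_equal(key):
    """Equality lookup: one hash computation."""
    return hash_index.get(key)


def hash_range(lo, hi):
    """Range lookup on a hash index: every entry has to be examined."""
    return sorted(k for k in hash_index if lo <= k <= hi)


def tree_range(lo, hi):
    """Range lookup on ordered keys: find both ends, then read the slice."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]


print(hash_equal(42))
print(tree_range(5, 30))
print(hash_range(5, 30))
```

Both range functions return the same keys, but only the ordered structure can do so without touching every entry, and only it returns them already sorted.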
In general, the B+ tree index structure is suitable for most scenarios, but hash indexes are more advantageous in scenarios like the following:
In MEMORY (HEAP) tables, hash indexes are especially suitable when the stored data has few duplicates (that is, high cardinality) and the column is mainly queried by equality, with no range queries and no sorting, as in SQL like this:
SELECT id, name FROM `table` WHERE name = 'Li Ming'; -- equality query only
The commonly used InnoDB engine uses B+ tree indexes by default. It monitors index usage on the table in real time, and if it judges that a hash index would speed up queries, it automatically builds one in an in-memory "adaptive hash index" buffer (the adaptive hash index is enabled by default in InnoDB). By observing the search pattern, MySQL builds the hash index from a prefix of the index key. If most of a table is in the buffer pool, the hash index can speed up equality queries.
Note: Under some workloads, the performance gain from hash index lookups far outweighs the extra cost of monitoring index searches and maintaining the hash table structure. In other cases, however, the read/write lock protecting the adaptive hash index can become a point of contention under heavy load, for example with many concurrent joins. LIKE operators and % wildcards do not benefit from the adaptive hash index either, so for such workloads it may be better to turn it off.
B tree and B+ tree
1. B tree: every node stores both keys and data, all nodes together make up the tree, and the pointers in the leaf nodes are null; the leaf nodes carry no key information.
2. B+ tree: the leaf nodes contain all the keys together with pointers to the records holding those keys, and the leaf nodes are linked in ascending key order. All non-leaf nodes can be regarded as the index part; each holds only the largest (or smallest) key of each of its subtrees. (The non-leaf nodes of a B tree, by contrast, also contain the data being looked up.)
Why is the B+ tree more suitable than the B tree for file indexes and database indexes in practice?
1. The B+ tree has a lower disk read/write cost. Its internal nodes hold no pointers to the record data for each key, so they are smaller than the internal nodes of a B tree.
If all the keys of one internal node are stored in the same disk block, the block can hold more keys, so more of the keys needed are brought into memory by a single read, and the number of I/O operations drops accordingly.
2. The query efficiency of the B+ tree is more stable. The non-leaf nodes do not point to the file content itself; they are only an index to the keys in the leaf nodes. Every key lookup must therefore follow a path from the root to a leaf, all lookup paths have the same length, and every record costs the same to find.
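A back-of-the-envelope calculation shows why smaller internal nodes matter. The sketch below is in Python and rests on assumptions that are not in the text: an 8-byte key, a 6-byte child pointer, 1 KB rows, and no page header overhead. Only the 16 KB page size is InnoDB's actual default. The B tree figure is a crude model in which every internal entry also carries the row.

```python
PAGE_BYTES = 16 * 1024   # InnoDB's default page size
KEY_BYTES = 8            # assumed: BIGINT key
POINTER_BYTES = 6        # assumed: child page pointer
ROW_BYTES = 1024         # assumed: average row size

# B+ tree internal entry: key + child pointer only.
bplus_fanout = PAGE_BYTES // (KEY_BYTES + POINTER_BYTES)
# Crude B tree model: each internal entry also stores the row data.
btree_fanout = PAGE_BYTES // (KEY_BYTES + POINTER_BYTES + ROW_BYTES)
rows_per_leaf = PAGE_BYTES // ROW_BYTES


def bplus_capacity(height):
    """Rows reachable by a B+ tree with `height` levels (leaf level included)."""
    return bplus_fanout ** (height - 1) * rows_per_leaf


print("B+ tree fan-out per internal page:", bplus_fanout)
print("B tree fan-out per internal page: ", btree_fanout)
print("rows in a 3-level B+ tree:        ", bplus_capacity(3))
```

Under these assumptions a three-level B+ tree already addresses about 21.9 million rows, so a lookup costs roughly three page reads, whereas the low fan-out of the B tree model would need many more levels for the same row count.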
MySQL composite indexes
1. A composite index is an index on two or more columns. MySQL uses the columns of a composite index from left to right. A query can use only part of the index, but it must be the leftmost part. For example, with the index key index (A, B, C), lookups on (A), (A, B), and (A, B, C) can use the index, but a lookup on (B, C) cannot. The index is very effective when the leftmost columns are compared with constants.
2. Additional columns in the index let you narrow the search, but one index on two columns is not the same as two separate indexes. A composite index is structured like a phone book, where a person's name consists of a last name and a first name: the phone book is sorted by last name first, and people with the same last name are sorted by first name. If you know the last name, the phone book is very useful; if you know both the last name and the first name, it is even more useful; but if you know only the first name, it is useless.
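The phone book analogy translates directly into code. In this Python sketch (hypothetical names and data, not MySQL internals) the composite index is a list of tuples sorted by last name and then first name. Searches that start with the leftmost column can use binary search; a search on the first name alone has no ordering to exploit and must scan everything.

```python
import bisect

# Hypothetical rows: (last_name, first_name, row_id)
people = [
    ("Li", "Ming", 1),
    ("Li", "Hua", 2),
    ("Wang", "Ming", 3),
    ("Zhang", "Wei", 4),
    ("Wang", "Fang", 5),
]

index = sorted(people)                        # ordered by last_name, then first_name
last_names = [entry[0] for entry in index]    # leftmost column only
full_names = [entry[:2] for entry in index]   # both indexed columns


def find_by_last(last):
    """Leftmost column known: binary search works."""
    lo = bisect.bisect_left(last_names, last)
    hi = bisect.bisect_right(last_names, last)
    return [entry[2] for entry in index[lo:hi]]


def find_by_last_and_first(last, first):
    """Both columns known: binary search on the full key."""
    lo = bisect.bisect_left(full_names, (last, first))
    hi = bisect.bisect_right(full_names, (last, first))
    return [entry[2] for entry in index[lo:hi]]


def find_by_first(first):
    """Only the second column known: the ordering does not help, scan all."""
    return [entry[2] for entry in index if entry[1] == first]


print(find_by_last("Li"))
print(find_by_last_and_first("Wang", "Ming"))
print(find_by_first("Ming"))
```

Entries with the first name "Ming" are scattered across the sorted list, which is why the last function cannot narrow its search.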
When should few or no indexes be built
1. Tables with too few records.
2. Tables that are frequently inserted into, deleted from, or updated.
3. Columns whose values are heavily duplicated and evenly distributed. If a table has 100,000 rows and column A has only the values T and F, each with about 50% probability, then indexing column A generally does not speed up queries.
4. Columns that are usually queried together with a main column whose index is already highly selective.
What is a table partition?
Table partitioning refers to dividing a table in a database into smaller, manageable parts according to certain rules. Logically, there is only one table, but the underlying table is composed of multiple physical partitions.
The difference between table partitioning and table splitting
Table splitting: one table is broken into several separate tables according to certain rules. For example, user orders are recorded in multiple tables based on time.
The difference between the two is that with partitioning there is logically still only one table, whereas table splitting turns one table into several tables.
What are the benefits of table partitioning?
1. The data of a partitioned table can be distributed across different physical devices, making efficient use of multiple hardware devices.
2. More data can be stored than fits on a single disk or file system.
3. Queries can be optimized. When the WHERE clause includes the partitioning condition, only the relevant partitions need to be scanned, which improves query efficiency. Queries involving SUM() and COUNT() can in principle also be processed per partition in parallel, with the partial results then combined.
4. Partitioned tables are easier to maintain. For example, to delete a large amount of data in one batch, you can drop an entire partition.
5. Partitioned tables can help avoid specific bottlenecks, such as mutually exclusive access to a single InnoDB index or inode lock contention on ext3.
Constraints on partitioned tables
1. A table can have at most 1024 partitions (8192 as of MySQL 5.6.7).
2. In MySQL 5.1, a partitioning expression must be an integer or an expression that returns an integer. MySQL 5.5 added support for partitioning on non-integer columns.
3. If the table has a primary key or unique indexes, every column used in the partitioning expression must be part of each of them. In other words, the partitioning columns must appear in the primary key and in every unique index, unless the table has neither.
4. Foreign key constraints cannot be used in partitioned tables.
5. MySQL partitioning applies to all the data and all the indexes of a table. You cannot partition the data but not the indexes, or the indexes but not the data, and you cannot partition only part of the table's data.
How to determine whether the current MySQL database supports partitioning?
Run the following statement:
mysql> show variables like '%partition%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| have_partitioning | YES   |
+-------------------+-------+
1 row in set (0.00 sec)
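If the value of have_partitioning is YES, partitioning is supported. This variable was removed in MySQL 5.6; on MySQL 5.6 and 5.7, check the output of SHOW PLUGINS for the partition plugin instead.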
What are the partition types supported by MySQL?
1. RANGE partitioning: data is divided into ranges. For example, a table can be split into several partitions by year.
2. LIST partitioning: data is split according to predefined lists of values. The difference from RANGE is that RANGE partitions cover continuous intervals of values, while LIST partitions match discrete sets of values.
3. HASH partitioning: a hash is computed over one or more columns of the table, and the resulting value determines which partition a row goes to. For example, you can create a table that is partitioned on its primary key.
4. KEY partitioning: an extension of HASH partitioning in which the hashing function is supplied by the MySQL server.
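How a row is routed to a partition under the first three schemes can be sketched as follows. This is an illustrative Python model, not MySQL code: the bounds, region IDs, and function names are hypothetical. The RANGE function follows the VALUES LESS THAN semantics, and the HASH function uses the modulo of the value by the partition count, which matches MySQL's HASH partitioning for non-negative integers. KEY partitioning is left out because its hashing function is internal to the server.

```python
import bisect

# RANGE: partition i holds values LESS THAN RANGE_BOUNDS[i] (hypothetical bounds).
RANGE_BOUNDS = [2019, 2020, 2021]

# LIST: each partition holds an explicit set of values (hypothetical region IDs).
LIST_VALUES = {"p_north": {1, 2, 3}, "p_south": {4, 5}}


def range_partition(year):
    """Return the index of the first partition whose upper bound exceeds year."""
    idx = bisect.bisect_right(RANGE_BOUNDS, year)
    if idx == len(RANGE_BOUNDS):
        raise ValueError(f"no partition for value {year}")
    return idx


def list_partition(region_id):
    """Return the name of the partition whose value list contains region_id."""
    for name, values in LIST_VALUES.items():
        if region_id in values:
            return name
    raise ValueError(f"no partition for value {region_id}")


def hash_partition(value, partitions):
    """Return the partition number for a non-negative integer value."""
    return value % partitions


print(range_partition(2018), range_partition(2019), range_partition(2020))
print(list_partition(4))
print(hash_partition(10, 4))
```

A value outside every RANGE bound or LIST set has nowhere to go, which the sketch reports as an error.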
The four transaction isolation levels
1. Serializable: prevents dirty reads, non-repeatable reads, and phantom reads.
2. Repeatable read: prevents dirty reads and non-repeatable reads.
3. Read committed: prevents dirty reads.
4. Read uncommitted: the lowest level; none of the above is guaranteed.
About MVCC
The MySQL InnoDB storage engine implements a Multi-Version Concurrency Control (MVCC) protocol. The counterpart of MVCC is Lock-Based Concurrency Control.
The biggest benefit of MVCC: reads take no locks, so reads and writes do not conflict. In OLTP applications with many reads and few writes, avoiding read-write conflicts matters a great deal and greatly increases the concurrency of the system. At present, almost all RDBMSs support MVCC.
Purely lock-based concurrency control allows little concurrency; MVCC improves on it, mainly by raising the concurrency of read operations.
In MVCC concurrency control, read operations can be divided into two categories:
1. Snapshot read: reads a visible version of the record (possibly a historical version) without locking (no shared (S) read lock is taken, so other transactions are not blocked).
2. Current read: reads the latest version of the record, and the records returned are locked so that other transactions cannot modify them concurrently.
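The two kinds of read can be illustrated with a toy versioned store. This Python sketch is heavily simplified and every name in it is hypothetical: versions are tagged with a logical commit timestamp, whereas InnoDB works with transaction IDs, undo logs, and read views, and the locking done by a current read is omitted entirely.

```python
class VersionedStore:
    """Toy multi-version store: each key keeps every committed version."""

    def __init__(self):
        self._versions = {}   # key -> list of (commit_ts, value), oldest first
        self._clock = 0       # logical commit timestamp

    def write(self, key, value):
        """Commit a new version of key and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def begin_snapshot(self):
        """Remember the point in time a reading transaction started at."""
        return self._clock

    def snapshot_read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def current_read(self, key):
        """Return the latest committed version (locking is not modeled)."""
        versions = self._versions.get(key)
        return versions[-1][1] if versions else None


store = VersionedStore()
store.write("balance", 100)
snapshot = store.begin_snapshot()   # a reader starts here
store.write("balance", 80)          # a later writer commits a new version

print(store.snapshot_read("balance", snapshot))
print(store.current_read("balance"))
```

The reader that took its snapshot before the second write still sees the older version without blocking the writer, while the current read sees the newest one.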
Advantages of row-level locking:
1. Few lock conflicts when many threads access different rows.
2. Fewer changes to undo on rollback.
3. A single row can be locked for a long time.
Disadvantages of row-level locking:
1. It takes more memory than page-level or table-level locking.
2. It is slower than page-level or table-level locking when used on a large part of a table, because many more locks must be acquired.
3. It is noticeably slower than other locks if you frequently run GROUP BY over most of the data or frequently scan the entire table.
4. With higher-level locks, it is also easier to tune an application by supporting different lock types, because their locking overhead is lower than that of row-level locks.
MySQL optimization
1. Enable the query cache to optimize queries (note that the query cache was removed in MySQL 8.0).
2. EXPLAIN your SELECT queries. This helps you analyze performance bottlenecks in a query or table structure. The EXPLAIN output also shows how indexes and primary keys are used and how tables are searched and sorted.
3. Use LIMIT 1 when only one row is needed. The MySQL engine then stops searching as soon as it finds one matching row instead of continuing to look for further matches.
4. Index the columns you search on.
5. Use ENUM instead of VARCHAR. If a field such as "sex", "country", "ethnicity", "status", or "department" has a finite, fixed set of values, use ENUM rather than VARCHAR.
6. Use prepared statements. A prepared statement is much like a stored procedure: a set of SQL statements that runs on the server. Prepared statements bring benefits in both performance and security, because the values you bind are checked, which protects your program from SQL injection attacks.
7. Split tables vertically.
8. Choose the right storage engine.
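The security benefit mentioned in point 6 can be demonstrated with bound parameters. The sketch below uses Python's built-in `sqlite3` module as a stand-in so that it runs without a server; that substitution, the `users` table, and its contents are all assumptions made for illustration. MySQL client libraries offer the same idea with their own placeholder syntax.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and changes the query's logic.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a value and is never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print("string concatenation returned", len(unsafe_rows), "rows")
print("bound parameter returned", len(safe_rows), "rows")
```

With string concatenation the injected condition matches every row in the table; with a bound parameter the same input is treated as an ordinary name that matches nothing.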
Difference between key and index
1. A key is part of the physical structure of the database and has two roles: it is a constraint (enforcing the structural integrity of the database) and it is an index (assisting queries). Keys include the primary key, unique keys, and foreign keys.
2. An index is part of the physical structure of the database and only assists queries. Indexes can be classified into prefix indexes, full-text indexes, and so on.
What are the differences between MyISAM and InnoDB in MySQL?
Differences:
1. InnoDB supports transactions; MyISAM does not. By default InnoDB wraps every SQL statement in its own transaction and commits it automatically, which slows things down, so it is best to put several statements between BEGIN and COMMIT to form one transaction.
2. InnoDB supports foreign keys; MyISAM does not. Converting an InnoDB table that has foreign keys to MyISAM will fail.
3. InnoDB uses a clustered index: the data file is bound to the index, and a primary key is required. A secondary index needs two lookups, first to find the primary key and then to find the row through the primary key. The primary key should therefore not be too large, because a large primary key makes every other index large as well. MyISAM uses non-clustered indexes: the data file is separate, and the indexes store pointers into the data file. Primary and secondary indexes are independent of each other.
4. InnoDB does not store the row count of a table, so SELECT COUNT(*) FROM table requires a scan. MyISAM keeps the row count of the whole table in a variable, so the same statement only has to read that variable, which is fast.
5. InnoDB did not support full-text indexes before MySQL 5.6, whereas MyISAM does, which gave MyISAM the edge for full-text queries.
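The two-step lookup described in point 3 can be modeled with two dictionaries. This is a minimal Python sketch with hypothetical data and names: one dictionary plays the clustered index (primary key to full row) and the other plays a secondary index that stores only the primary key, which is also why a large primary key inflates every secondary index.

```python
# Clustered index: primary key -> full row (hypothetical data).
clustered_index = {
    1: {"id": 1, "name": "alice", "city": "Paris"},
    2: {"id": 2, "name": "bob", "city": "Lima"},
    3: {"id": 3, "name": "carol", "city": "Oslo"},
}

# Secondary index on name: stores the primary key, not the row itself.
name_index = {row["name"]: pk for pk, row in clustered_index.items()}


def find_by_name(name):
    """Return (row, number_of_index_lookups) for a search on the name column."""
    lookups = 1
    pk = name_index.get(name)       # step 1: secondary index yields the primary key
    if pk is None:
        return None, lookups
    lookups += 1
    row = clustered_index[pk]       # step 2: clustered index yields the row
    return row, lookups


row, lookups = find_by_name("bob")
print(row, "found with", lookups, "index lookups")
```

A search by primary key would need only the second step, which is the cost difference the text describes.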
How to choose:
1. Do you need transactions? If yes, choose InnoDB; if not, MyISAM can be considered.
2. If most of the queries on your tables are reads, consider MyISAM. If both reads and writes are frequent, use InnoDB.
3. MyISAM is harder to recover after a system crash. Is that acceptable?
4. InnoDB has been the default MySQL engine since MySQL 5.5 (previously it was MyISAM), which shows how clear its advantages are. If you do not know which to use, use InnoDB; it will at least not be a bad choice.
Database table creation considerations
1. Reasonable field names and field configuration
- Remove fields that are not closely related to the table;
- Field names should follow rules and carry meaning (do not mix English and pinyin, and avoid names such as A, B, C whose meaning is unknown);
- Avoid abbreviations in field names (most abbreviations do not make the meaning of the field clear);
- Do not mix upper and lower case in field names (for readability, join multiple English words with underscores);
- Do not use reserved words or keywords as field names;
- Keep field names and types consistent;
- Choose numeric types carefully;
- Leave enough headroom in text fields;
2. Handling of special system fields and recommendations once the table is built
- Add deletion markers (such as operator and deletion time);
- Build a versioning mechanism;
3. Reasonable table structure
- Multi-type fields: check whether any field in the table can be broken down into smaller independent parts (for example, people can be divided into men and women);
- Multi-valued fields: split them across three tables, which makes retrieval and sorting more orderly and preserves data integrity;
4. Other suggestions
- Store large data fields (for example, profile text) in a separate table so that they do not affect performance;
- Use the VARCHAR type instead of CHAR, because VARCHAR allocates length dynamically while CHAR has a fixed length;
- Create a primary key for every table; a table without one is affected to some degree in queries and index definition;
- Avoid nullable fields and set default values instead (for example, default 0 for an int field); this noticeably helps index queries;
- Create indexes, preferably on unique and non-null fields. Too many indexes slow down later inserts and updates, so create them according to actual needs.