A summary of Java interview topics, covering core Java knowledge as well as common open-source frameworks; you are welcome to read it. There may be mistakes, since my own knowledge is limited, and corrections from readers are very welcome! The article is continuously being updated…

| ID | Title | Address |
| --- | --- | --- |
| 1 | Design Pattern Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 2 | Java Basics Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 3 | Java Collections Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 4 | Java IO, BIO, NIO, AIO, Netty Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 5 | Java Concurrent Programming Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 6 | Java Exception Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 7 | Java Virtual Machine (JVM) Interview Questions | juejin.cn/post/684490… |
| 8 | Spring Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 9 | Spring MVC Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 10 | Spring Boot Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 11 | Spring Cloud Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 12 | Redis Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 13 | MyBatis Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 14 | MySQL Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 15 | TCP, UDP, Socket, HTTP Interview Questions | juejin.cn/post/684490… |
| 16 | Nginx Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 17 | ElasticSearch Interview Questions | |
| 18 | Kafka Interview Questions | |
| 19 | RabbitMQ Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 20 | Dubbo Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 21 | ZooKeeper Interview Questions | juejin.cn/post/684490… |
| 22 | Netty Interview Questions (the most comprehensive summary) | |
| 23 | Tomcat Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 24 | Linux Interview Questions (the most comprehensive summary) | juejin.cn/post/684490… |
| 25 | Internet-related Interview Questions (the most comprehensive summary) | |
| 26 | Internet Security Interview Questions (the most comprehensive summary) | |

Database basics

Why use a database

  • Data is stored in memory

    • Advantages: Fast access speed
    • Disadvantages: Data cannot be stored permanently
  • Data is saved in files

    • Advantages: Data is saved permanently
    • Disadvantages: 1. Slower than in-memory access, with frequent I/O operations. 2. Inconvenient to query data
  • The data is stored in the database

    • Data permanence
    • Using SQL statements, the query is convenient and efficient.
    • Convenient data management

What is SQL?

  • Structured Query Language (SQL) is a language for querying databases.

Purpose: to access data and to query, update, and manage relational database systems.
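
A few basic statements as a quick illustration (the student table and its columns are made up for the example):

    -- query
    SELECT name, age FROM student WHERE age > 18;
    -- insert
    INSERT INTO student (name, age) VALUES ('Tom', 19);
    -- update
    UPDATE student SET age = 20 WHERE id = 1;
    -- delete
    DELETE FROM student WHERE id = 1;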

What is MySQL?

  • MySQL is a relational database management system originally developed by the Swedish company MySQL AB and now an Oracle product. MySQL is one of the most popular relational database management systems, and among the best RDBMS (Relational Database Management System) software for web applications. It is commonly used in Java enterprise development because MySQL is open source, free, and easy to extend.

Differences between MySQL, Oracle, and SQL Server

  1. SQL Server runs only on Windows, while MySQL and Oracle also run on other systems and can be ported between different database systems
  2. MySQL is open source and free; SQL Server and Oracle cost money.
  3. MySQL is the smallest, SQL Server is in the middle, and Oracle is the largest
  4. Oracle handles high concurrency and heavy traffic best, SQL Server copes reasonably, while MySQL stands up to less pressure, so MySQL is now best deployed with clustering or caching
  5. Oracle supports multiple users with different permissions, while a MySQL account can operate on all databases once it has login permission
  6. The installation footprint also differs greatly: MySQL takes only a few hundred MB after installation while Oracle takes several GB, and Oracle consumes a particularly large amount of memory and other machine resources.
  7. For pagination, MySQL uses LIMIT, SQL Server uses TOP, and Oracle uses ROWNUM (see the sketch after this list)
  8. Oracle has no auto-increment type, whereas MySQL and SQL Server generally use auto-increment types
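
A rough sketch of each dialect's pagination syntax, fetching rows 11 to 20 of a hypothetical t_user table (SQL Server's classic form uses TOP; OFFSET…FETCH is available from SQL Server 2012):

    -- MySQL: LIMIT offset, count
    SELECT * FROM t_user ORDER BY id LIMIT 10, 10;

    -- SQL Server 2012+: OFFSET ... FETCH
    SELECT * FROM t_user ORDER BY id OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY;

    -- Oracle: filter on the ROWNUM pseudo-column of an inner query
    SELECT * FROM (
        SELECT t.*, ROWNUM rn FROM t_user t WHERE ROWNUM <= 20
    ) WHERE rn > 10;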

What are the three paradigms of database

  • First normal form: No column can be split again.

  • Second normal form: On a first normal form basis, non-primary key columns are completely dependent on the primary key, not part of it.

  • Third normal form: On a second normal form basis, non-primary key columns depend only on the primary key and not on other non-primary keys.

When designing a database structure, try to follow the three normal forms; if you deviate, there must be a good reason for it, such as performance. In practice we often compromise normalization for performance.

Which tables store MySQL privileges?

The MySQL server controls user access to the database through permission tables, which are stored in the mysql database and initialized by the mysql_install_db script. These permission tables are user, db, tables_priv, columns_priv, and host. Their structure and contents are described below:

  • user permission table: records the accounts that are allowed to connect to the server. The permissions in this table are global.
  • db permission table: records each account's operation permissions on each database.
  • tables_priv permission table: records table-level operation permissions.
  • columns_priv permission table: records column-level operation permissions.
  • host permission table: works with the db permission table to control database-level operation permissions on a given host. This table is not affected by GRANT and REVOKE statements. A query sketch against these tables follows.
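
These tables live in the mysql system database and, given sufficient privileges, can be inspected directly:

    -- Which accounts may connect, and from which hosts (global privileges)
    SELECT user, host FROM mysql.user;

    -- Per-database privileges for each account
    SELECT user, host, db, select_priv, insert_priv FROM mysql.db;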

How many binlog formats does MySQL have? What are the differences?

There are three formats: STATEMENT, ROW, and MIXED.

  • In STATEMENT mode, every SQL statement that modifies data is recorded in the binlog. There is no need to record each row change, which reduces binlog volume, saves I/O, and improves performance. But since SQL execution is contextual, the relevant context must be saved too, and some statements, such as those using certain functions, cannot be recorded and replicated correctly.
  • At the ROW level, the context of the SQL statement is not recorded; only the modified records are saved. The recording unit is each row's change, so essentially all changes can be captured, but many operations cause a large number of row changes (ALTER TABLE, for example), so files in this mode hold much more information and the log gets very large.
  • MIXED is a compromise: ordinary operations use statement records, and row records are used where statement format cannot be applied.

In addition, the row level has been optimized in the new version of MySQL to record statements instead of row by row when table structure changes.
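
The format can be checked and switched at runtime (changing it affects replication, so treat it with care):

    -- Check the current binlog format
    SHOW VARIABLES LIKE 'binlog_format';

    -- Switch to row-based logging (requires the SUPER privilege)
    SET GLOBAL binlog_format = 'ROW';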

Commonly used database functions

  • COUNT(*/column): returns the number of rows

  • SUM(column): returns the sum of the values in the specified column

  • MAX(column): returns the maximum value in the specified column or expression

  • MIN(column): returns the minimum value in the specified column or expression

  • AVG(column): returns the average value of the specified column or expression

  • DATE(expression): returns the date represented by the specified expression

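A small sketch combining these functions, assuming a hypothetical orders table with amount and created_at columns:

    SELECT COUNT(*)    AS order_count,    -- number of rows
           SUM(amount) AS total_amount,   -- sum of the column
           MAX(amount) AS biggest_order,  -- largest value
           MIN(amount) AS smallest_order, -- smallest value
           AVG(amount) AS average_order   -- average value
    FROM orders
    WHERE DATE(created_at) = '2024-01-01';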

Data types

What data types does MySQL support?

| Category | Type | Description |
| --- | --- | --- |
| Integer | TINYINT | Very small integer (8 bits) |
| Integer | SMALLINT | Small integer (16 bits) |
| Integer | MEDIUMINT | Medium-sized integer (24 bits) |
| Integer | INT (INTEGER) | Normal-sized integer (32 bits) |
| Decimal | FLOAT | Single-precision floating-point number |
| Decimal | DOUBLE | Double-precision floating-point number |
| Decimal | DECIMAL(M,D) | Packed exact fixed-point number |
| Date | YEAR | YYYY, 1901~2155 |
| Date | TIME | HH:MM:SS, -838:59:59~838:59:59 |
| Date | DATE | YYYY-MM-DD, 1000-01-01~9999-12-31 |
| Date | DATETIME | YYYY-MM-DD HH:MM:SS, 1000-01-01 00:00:00~9999-12-31 23:59:59 |
| Date | TIMESTAMP | YYYY-MM-DD HH:MM:SS, 1970-01-01 00:00:01 UTC~2038-01-19 03:14:07 UTC |
| Text, binary | CHAR(M) | M is an integer from 0 to 255 |
| Text, binary | VARCHAR(M) | M is an integer from 0 to 65535 |
| Text, binary | TINYBLOB | 0 to 255 bytes |
| Text, binary | BLOB | 0 to 65,535 bytes |
| Text, binary | MEDIUMBLOB | 0 to 16,777,215 bytes |
| Text, binary | LONGBLOB | 0 to 4,294,967,295 bytes |
| Text, binary | TINYTEXT | 0 to 255 bytes |
| Text, binary | TEXT | 0 to 65,535 bytes |
| Text, binary | MEDIUMTEXT | 0 to 16,777,215 bytes |
| Text, binary | LONGTEXT | 0 to 4,294,967,295 bytes |
| Text, binary | VARBINARY(M) | Variable-length byte string of 0 to M bytes |
| Text, binary | BINARY(M) | Fixed-length byte string of 0 to M bytes |
  • 1. Integer types include TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT, which are 1-byte, 2-byte, 3-byte, 4-byte, and 8-byte integers respectively. Any integer type can have the UNSIGNED attribute, meaning the data is unsigned, i.e. a non-negative integer. Length: an integer type can specify a display length, e.g. INT(11) is an INT with length 11. In most scenarios the length is meaningless: it does not limit the legal range of values, only affects the number of characters displayed, and is only meaningful together with the UNSIGNED ZEROFILL attribute. For example, with type INT(5) and attribute UNSIGNED ZEROFILL, if the user inserts 12 the database displays it as 00012.

  • 2. Real number types include FLOAT, DOUBLE, and DECIMAL. DECIMAL can store integers larger than BIGINT and can store exact decimals. FLOAT and DOUBLE have value ranges and support approximate calculation with standard floating point. FLOAT and DOUBLE are much more efficient at computation than DECIMAL, which you can think of as being stored like a string.

  • 3. String types include VARCHAR, CHAR, TEXT, and BLOB. VARCHAR stores variable-length strings and saves more space than fixed-length types. VARCHAR spends an extra 1 or 2 bytes to record the string length: 1 byte if the column length is at most 255 bytes, otherwise 2 bytes. If a VARCHAR stores content longer than the set length, the content is truncated. CHAR is fixed-length and allocates enough space according to the defined string length; it is padded with spaces as needed, which helps comparisons. CHAR suits very short strings, or values that are all close to the same length. CHAR also truncates content exceeding the set length.

    Usage strategy: for frequently changing data, CHAR is better than VARCHAR because CHAR is less prone to fragmentation. For very short columns, CHAR is more storage-efficient than VARCHAR. Be careful to allocate only as much space as you need; sorting longer columns consumes more memory. Avoid the TEXT/BLOB types: queries on them may use temporary tables, which incurs significant performance overhead.

  • 4. The enumeration type (ENUM) stores non-repeating data as a predefined set. Sometimes ENUM can replace an ordinary string type. ENUM storage is very compact, condensing the value list into one or two bytes; internally ENUM is stored as an integer. Avoid using numbers as ENUM constants, because they are confusing. Sorting follows the internally stored integers.

  • 5. For date and time types, prefer TIMESTAMP, whose space efficiency is higher than DATETIME's. Storing a timestamp in a plain integer column is usually inconvenient to handle. If you need to store microseconds, you can use a BIGINT column. A sketch of these choices follows.
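
One hypothetical table definition pulling the guidelines together:

    CREATE TABLE t_order (
        id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, -- non-negative integer key
        amount     DECIMAL(10, 2) NOT NULL,                 -- exact value; avoid FLOAT for money
        status     ENUM('paid', 'shipped', 'done'),         -- compact predefined set
        note       VARCHAR(255),                            -- variable-length string
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP      -- more compact than DATETIME
    );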

Storage engines

Differences between the MySQL storage engines MyISAM and InnoDB

  • Storage Engine: How data, indexes, and other objects are stored in MySQL is an implementation of a file system.

  • Common storage engines are as follows:

    • Innodb engine: The Innodb engine provides support for DATABASE ACID transactions. Row-level locking and foreign key constraints are also provided. It is designed to handle database systems with large data volumes.
    • MyISAM engine (originally MySQL's default engine): does not support transactions, row-level locks, or foreign keys.
    • MEMORY engine: All data is stored in MEMORY, data processing speed is fast, but not high security.

MyISAM vs. InnoDB

| Comparison | MyISAM | InnoDB |
| --- | --- | --- |
| Storage structure | Each table is stored in three files: .frm (table definition), .MYD (MYData, data file), .MYI (MYIndex, index file) | All tables are stored in the same data file (or multiple files, or separate tablespace files); the size of an InnoDB table is limited only by the operating system's file size, generally 2GB |
| Storage space | Can be compressed; uses less storage space | Requires more memory and storage; it builds its own buffer pool in main memory to cache data and indexes |
| Portability, backup, and recovery | Data is stored as files, so cross-platform transfer is convenient; single tables can be handled during backup and restore | Free options are copying data files, backing up the binlog, or using mysqldump, which gets painful at tens of gigabytes |
| File format | Data and indexes are stored separately: .MYD and .MYI | Data and indexes are stored together: .ibd |
| Record storage order | In insertion order | Ordered by primary key |
| Foreign keys | Not supported | Supported |
| Transactions | Not supported | Supported |
| Lock support (locks avoid resource contention; MySQL locks are almost transparent to users) | Table-level locks | Row-level and table-level locks; smaller locking granularity, higher concurrency |
| SELECT | MyISAM is better | |
| INSERT, UPDATE, DELETE | | InnoDB is better |
| select count(*) | MyISAM is faster because it maintains a row counter that can be read directly (for counts without a WHERE clause) | |
| Index implementation | B+ tree index; MyISAM is a heap table | B+ tree index; InnoDB is an index-organized table |
| Hash index | Not supported | Supported |
| Full-text index | Supported | Not supported |

What is the difference between MyISAM index and InnoDB index?

  • An InnoDB index is a clustered index; a MyISAM index is a non-clustered index.
  • InnoDB's primary key index is very efficient because its leaf nodes store the row data itself.
  • The leaf node of a MyISAM index stores the address of the row data, so another lookup is needed to fetch the data.
  • The leaf nodes of InnoDB's non-primary-key indexes store the primary key plus the other indexed column data, so covering indexes can make queries very efficient.

Four features of the InnoDB engine

  • Insert buffer

  • Double write

  • Adaptive Hash index (AHI)

  • Pre-reading (read ahead)

Storage Engine Selection

  • If there are no special requirements, use the default InnoDB.

  • MyISAM: read-heavy and insert-oriented applications, such as blog systems or news portals.

  • InnoDB: frequent updates and deletes, or where data integrity must be guaranteed; high concurrency, with support for transactions and foreign keys, for example an OA office automation system. A sketch of choosing an engine follows.
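
Table names are made up:

    -- Choose the engine at creation time
    CREATE TABLE article (
        id    INT AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(200)
    ) ENGINE = InnoDB;

    -- Convert an existing table (this rebuilds the table, so beware on large data sets)
    ALTER TABLE article ENGINE = MyISAM;

    -- Check which engine a table currently uses
    SHOW TABLE STATUS LIKE 'article';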

Indexes

What is an index?

  • Indexes are special files (on InnoDB tables they are part of the tablespace) that contain pointers to all the records in a table.

  • An index is a data structure. A database index is a sorted data structure in the database management system that helps query and update data in database tables quickly. Indexes are usually implemented with B-trees and their variant, B+ trees.

  • Put more plainly, an index is like a book's table of contents: to find content quickly, you consult the catalog built from that content instead of scanning every page. An index is a file, and it occupies physical space.

What are the advantages and disadvantages of indexes?

Advantages of indexes

  • It can greatly speed up data retrieval, which is the main reason for creating indexes.
  • Indexes let the query optimizer improve system performance during query processing.

Disadvantages of indexes

  • Time: It takes time to create and maintain indexes. To be specific, indexes need to be maintained dynamically when data in a table is added, deleted, or modified, which reduces the efficiency of adding, changing, or deleting.
  • Spatial: Indexes need to occupy physical space.

How are indexes created? What are the benefits? What are the categories?

  1. CREATE INDEX DEPE_unique_ide ON DEPE (dept_no) TABLESPACE idx_
  2. Creating an index can speed up queries, a unique index can guarantee the uniqueness of a column's values, and indexes can help determine the relationships between tables
  3. Index classification. Logical: single-column index, composite index, unique index, non-unique index, function-based index. Physical: B-tree index, reverse-key index, bitmap index

Briefly describe index types and their functions

Function of indexes: Indexes can greatly improve the speed of database retrieval and improve database performance

  1. Unique index: No two rows are allowed to have the same value
  2. Primary key indexes: To maintain relationships between tables in a database
  3. Clustered index: The physical order of rows in a table is the same as the logical (index) order of key values.
  4. Non-clustered indexes: The fundamental difference between a clustered index and a non-clustered index is whether the table records are sorted in the same order as the index
  5. Compound index: When creating an index, you can create an index for more than one column, just like a primary key
  6. Full-text indexes: Full-text indexes provide effective support for complex word searches in string data

Index Usage Scenarios

  1. Use a normal index when there is a lot of data and the field contains duplicate values.

  2. Use a unique index when the field's values are never duplicated.

  3. Use a composite index when several fields are frequently queried together.

  4. Both normal and unique indexes allow NULL values (a MySQL unique index even permits multiple NULLs).

  5. However, if the table sees many inserts, deletes, and updates but few queries, do not create indexes, because every write to an indexed column must also maintain that column's index.

  6. On insert, an entry for the newly filled value is added to the column's index.

  7. On update, the old index entry is removed and an entry for the new value is added.

  8. On delete, the corresponding index entry is removed.

  9. So indexes slow down inserts, deletes, and updates.

  10. Therefore, if a table is modified a lot but queried little, do not create indexes.

  11. Fields that are updated too frequently are not suitable for indexing.

  12. Fields that never appear in a WHERE condition should not be indexed.

The difference between a primary key and a unique index

  1. A primary key is a constraint, a unique index is an index, and the two are fundamentally different.

  2. A primary key must contain a unique index. A unique index does not have to be a primary key.

  3. Unique index columns allow null values, while primary key columns do not.

  4. A primary key column is, at creation, automatically NOT NULL plus a unique index.

  5. A table can create at most one primary key, but can create multiple unique indexes.

  6. Primary keys are better for unique identifiers that are not easily changed, such as auto-increment columns, ID numbers, and so on.

  7. A primary key can be referenced as a foreign key by other tables (most databases also allow referencing a unique index, but the primary key is the usual choice).

What are the types of indexes?

Primary key index: Data columns cannot duplicate or be NULL, and a table can have only one primary key.

Unique index: Data columns are not allowed to duplicate and NULL values are allowed. A table allows multiple columns to create unique indexes.

  • ALTER TABLE table_name ADD UNIQUE (column); Create a unique index

  • ALTER TABLE table_name ADD UNIQUE (column1,column2); Create a unique composite index

Plain index: A basic index type that has no restrictions on uniqueness and allows NULL values.

  • ALTER TABLE table_name ADD INDEX index_name (column); Creating a normal index

  • ALTER TABLE table_name ADD INDEX index_name(column1, column2, column3); Create composite indexes

Full-text index: currently a key technology used by search engines.

  • It can be created with ALTER TABLE table_name ADD FULLTEXT (column); a query sketch follows.
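
Assuming a hypothetical article table with a content column (full-text on InnoDB requires MySQL 5.6+):

    -- Create the full-text index
    ALTER TABLE article ADD FULLTEXT ft_content (content);

    -- Full-text search uses MATCH ... AGAINST rather than LIKE
    SELECT * FROM article WHERE MATCH (content) AGAINST ('mysql index');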

Index data structure (B-tree, hash)

  • The data structure of an index depends on the storage engine's implementation. Indexes used in MySQL include hash indexes, B+ tree indexes, and others; the default index of the commonly used InnoDB storage engine is the B+ tree index. The underlying structure of a hash index is a hash table, so when the vast majority of queries are single-record lookups, a hash index gives the fastest query performance; in most other scenarios, the BTree index is the recommended choice.

1. B-tree index

  • MySQL fetches data through its storage engine, and almost 90% of people use InnoDB. By implementation, InnoDB has only two index types: the BTREE index and the HASH index. The B-tree index is the most frequently used index type in a MySQL database, and almost all storage engines support BTree indexes.

  • Query modes:

    • Primary key index area: PI (holds the address of the stored data); query by primary key.
    • Ordinary index area: SI (holds the associated primary key ID, which then leads to the address above). Hence querying by primary key is the fastest.
  • B+ tree properties:

    1. A node with n subtrees contains n keywords; it stores not the data but indexes pointing to the data.
    2. All leaf nodes contain the full set of keywords, plus pointers to the records containing those keywords, and the leaf nodes themselves are linked together in order of keyword size.
    3. All non-terminal nodes can be regarded as the index part; each contains only the largest (or smallest) keyword of its subtrees.
    4. In a B+ tree, data objects are inserted and deleted only at leaf nodes.
    5. A B+ tree has two head pointers: one to the root node of the tree and one to the leaf node holding the smallest keyword.

2, hash index

  • Briefly, a hash index is implemented much like a hash table data structure. When MySQL uses a hash index, it mainly applies a hash algorithm (common ones are direct addressing, mid-square, folding, modulo, and random-number methods) to turn the database field into a hash value, and the row pointer for that data is stored in the hash table; if a hash collision occurs (two different keys get the same hash value), the entries are kept in a linked list under the corresponding hash key. Of course, this is only a rough simulation.

The fundamentals of indexing

  • Indexes are used to quickly find records that have specific values. If there is no index, the query will generally traverse the entire table.

  • The principle of indexing is simple: it turns unordered data into ordered queries

    1. Sort the contents of the indexed columns
    2. Generate an inverted list from the sorted result
    3. Attach the data address chain to the inverted list entries
    4. When querying, first fetch the inverted list entry, then take the data address chain from it and retrieve the actual data

What are the indexing algorithms?

  • Index algorithms include BTree algorithm and Hash algorithm

1. BTree algorithm

  • BTree is the most common MySQL database indexing algorithm and MySQL's default. It can be used not only with the =, >, >=, <, <=, and BETWEEN comparison operators, but also with the LIKE operator, as long as the query condition is a constant that does not begin with a wildcard, for example:

    -- Can use the index: constant pattern with no leading wildcard
    SELECT * FROM user WHERE name LIKE 'jack%';
    -- Cannot use the index: pattern begins with a wildcard
    SELECT * FROM user WHERE name LIKE '%jack';

2. Hash algorithm

  • A hash index can only be used for equality comparison, i.e. the = and <=> (equivalent to =) operators. Because it locates data in a single step, rather than walking from root node through branch nodes to leaf pages with multiple I/O accesses like a BTree index, its retrieval efficiency is much higher. A sketch of where an explicit HASH index applies follows.
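
In MySQL an explicit HASH index is effectively honored by the MEMORY engine (InnoDB only builds an adaptive hash index internally); a made-up table:

    CREATE TABLE session_cache (
        session_id CHAR(32) NOT NULL,
        user_id    INT,
        INDEX USING HASH (session_id)  -- equality lookups only; no ranges, no ORDER BY
    ) ENGINE = MEMORY;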

Principles of index design?

  1. Columns suited to indexing are those that appear in WHERE clauses or are specified in join clauses
  2. Columns with low cardinality index poorly; there is no need to index such a column
  3. Use short indexes. If you index long string columns, specify a prefix length to save a lot of index space
  4. Don't over-index. Indexes take extra disk space and reduce write performance. Whenever table contents are modified, the index must be updated or even rebuilt, and the more indexed columns there are, the longer this takes. So keep only the indexes that actually help queries.

Principles for index creation

  • Indexing is good, but it is not unlimited. It is best to comply with the following principles
  1. Follow the leftmost-prefix matching principle: MySQL keeps matching to the right until it hits a range query (>, <, BETWEEN, LIKE). For a = 1 AND b = 2 AND c > 3 AND d = 4, an index built in the order (a, b, c, d) cannot use d, because matching stops at the range condition on c; an index built as (a, b, d, c) can be used in full, and the order of a, b, d in the query can be arbitrary

  2. Create indexes for fields that are used as query criteria more frequently

  3. Frequently updated fields are not suitable for creating indexes

  4. Columns that cannot distinguish the data effectively are not suitable for indexing (such as gender: male, female, and unknown are at most three values, far too low a distinction).

  5. Expand indexes as much as possible, do not create new ones. For example, if you want to add (a,b) to a table that already has an index of A, you only need to modify the original index.

  6. Data columns that define foreign keys must be indexed.

  7. For columns that are rarely involved in a query, do not index columns with a high number of duplicate values.

  8. Do not index columns of data types defined as text, image, and bit.

There are three ways to create indexes

  • The first method: CREATE an index when executing CREATE TABLE

    CREATE TABLE user_index2 (
      id INT auto_increment PRIMARY KEY,
      first_name VARCHAR (16),
      last_name VARCHAR (16),
      id_card VARCHAR (18),
      information text,
      KEY name (first_name, last_name),
      FULLTEXT KEY (information),
      UNIQUE KEY (id_card)
    );

  • Second: use the ALTER TABLE command to add an index

    ALTER TABLE table_name ADD INDEX index_name (column_list);
    • ALTER TABLE creates a normal, UNIQUE, or PRIMARY KEY index.
    • Table_name indicates the name of the table to which the index is to be added. Column_list indicates the column to which the index is to be added. If there are multiple columns, the columns are separated by commas.
    • The index name index_name can be chosen freely; by default, MySQL assigns a name based on the first index column. In addition, ALTER TABLE allows multiple tables to be changed in a single statement, so multiple indexes can be created at the same time.
  • Third method: Run the CREATE INDEX command to CREATE the INDEX

    CREATE INDEX index_name ON table_name (column_list);
    • CREATE INDEX Can add a normal or UNIQUE INDEX to a table. (However, you cannot create a PRIMARY KEY index.)

How to Drop an index

  • Drop a normal, unique, or full-text index by its name: ALTER TABLE table_name DROP KEY index_name

    alter table user_index drop KEY name;
    alter table user_index drop KEY id_card;
    alter table user_index drop KEY information;
  • Drop a primary key: ALTER TABLE table_name DROP PRIMARY KEY. Note that this cannot be done directly if the primary key is auto-increment (auto-increment depends on the primary key index):

  • You need to remove the auto-increment first and then drop the key:

    -- Remove auto_increment from the key column first (column name assumed to be id), then drop the primary key
    ALTER TABLE user_index MODIFY id INT, DROP PRIMARY KEY;
  • However, primary keys are usually not removed because design primary keys must be independent of business logic.

What should I pay attention to when creating an index?

  • Non-empty fields: declare columns NOT NULL unless you really want to store NULL. Columns with NULL values are difficult for MySQL to optimize, because they complicate indexes, index statistics, and comparison operations. Replace null values with 0, a special value, or an empty string;
  • Put the columns with high value dispersion (many distinct values) at the front of a composite index. You can gauge a field's dispersion with count(distinct col): the larger the result, the more unique values the field has and the higher its dispersion;
  • The smaller the indexed field, the better: database data is stored in pages, so more data fits on a page, more data is fetched per I/O operation, and lookups become more efficient.

Does using an indexed query necessarily improve query performance? Why?

In general, querying data through an index is faster than a full table scan. But we must also be aware of the costs.

  • Indexes need space to store and need regular maintenance: whenever a record is added to or removed from the table, or an index column is modified, the index itself must be updated. This means every INSERT, DELETE, and UPDATE spends 4 or 5 extra disk I/Os. Because indexes need extra storage and processing, unnecessary indexes slow query response times down. So using an index does not necessarily improve query performance. An INDEX RANGE SCAN is suitable in two situations:
  • Range-based retrieval, where a typical query returns a result set smaller than 30% of the table's rows
  • Retrieval based on non-unique indexes

How do I delete data at the million level or above

  • About indexes: an index needs extra maintenance cost, and the index file is a separate file, so adding, modifying, or deleting data incurs extra operations on the index file. These operations consume extra I/O and lower the efficiency of inserts, updates, and deletes. Checking the MySQL official manual confirms that the speed of deleting data is proportional to the number of indexes.
  1. So when we want to delete millions of rows, we can drop the indexes first.
  2. Then delete the useless data (this step takes less than two minutes)
  3. Re-create the indexes after the deletion (when there is less data), which is also very fast, around 10 minutes.
  4. This is definitely much faster than deleting directly; moreover, if a direct delete is interrupted, everything is rolled back, which is even worse.

The prefix index

  • Syntax: index(field(10)) : uses the first 10 characters of a field value to create an index. The default value is to use the entire content of a field to create an index.

  • Prerequisite: the prefix must be highly selective. Passwords, for example, suit prefix indexing because they are almost always different.

  • Practical difficulty: choosing the length of the prefix to cut.

  • Method: select count(*)/count(distinct left(password, prefixLen)); by adjusting the prefixLen value (incrementing from 1), check when the average match for a prefix length is near 1 (meaning prefixLen characters of a password almost determine a single record), as in the sketch below
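
Against a hypothetical user table:

    -- Average number of rows sharing one prefix, for a few candidate lengths
    SELECT count(*) / count(DISTINCT left(password, 4))  AS avg_match_4,
           count(*) / count(DISTINCT left(password, 8))  AS avg_match_8,
           count(*) / count(DISTINCT left(password, 11)) AS avg_match_11
    FROM user;

    -- Once a length's average match is close to 1, build the prefix index
    ALTER TABLE user ADD INDEX idx_password (password(11));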

What is the leftmost prefix principle? What is the leftmost matching principle

  • As the name implies, leftmost first. When creating a multi-column index, put the column most frequently used in WHERE clauses leftmost, according to business requirements.
  • The leftmost matching principle: MySQL keeps matching to the right until it encounters a range query (>, <, BETWEEN, LIKE). For a = 1 AND b = 2 AND c > 3 AND d = 4, an index built in the order (a, b, c, d) cannot use d, since matching stops at the range condition on c, whereas an index built as (a, b, d, c) can be used in full, and the order of a, b, d in the query can be arbitrary;
  • = and IN can be in any order: a = 1 AND b = 2 AND c = 3 can use an (a, b, c) index created in any order, because MySQL's query optimizer rewrites the condition into a form the index can recognize. A sketch with EXPLAIN follows.
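
Hypothetical table t with a composite index on (a, b, c):

    ALTER TABLE t ADD INDEX idx_abc (a, b, c);

    EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;           -- uses idx_abc
    EXPLAIN SELECT * FROM t WHERE b = 2 AND c = 3;           -- no leading column a: index unusable
    EXPLAIN SELECT * FROM t WHERE c = 3 AND b = 2 AND a = 1; -- uses idx_abc; the optimizer reorders the ANDs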

B tree and B+ tree

  • In a B-tree, keys and values can be stored in both internal nodes and leaf nodes; in a B+ tree, internal nodes hold only keys, no values, while leaf nodes hold both keys and values.

  • The leaves of a B+ tree are connected by a linked list, whereas the leaves of a B-tree are independent.

Benefits of using B trees

  • B-trees can store both keys and values in internal nodes, so frequently accessed data can sit near the root node, which greatly improves the efficiency of hot-data queries. This property makes B-trees more efficient in scenarios where particular data is queried repeatedly.

Benefits of using B+ trees

  • Since the internal nodes of the B+ tree only store keys, not values, a single read can fetch more keys in the memory page, which helps to narrow down the search more quickly. The leaf nodes of B+ tree are connected by a chain. Therefore, when a full data traversal is needed, B+ tree only needs O(logN) time to find the smallest node, and then O(N) sequential traversal through the chain is enough. B trees, on the other hand, need to traverse each level of the tree, which requires more memory replacement times and therefore more time

What’s the difference between a Hash index and a B+ tree?

  • First, understand the underlying implementation of Hash indexes and B+ tree indexes:

  • The underlying structure of a hash index is a hash table: a lookup calls the hash function once to locate the corresponding key value, then queries back to the table if the actual row data is needed. The underlying structure of a B+ tree index is a multi-way balanced search tree: each lookup starts from the root node and obtains the key value on reaching a leaf node, then decides from the query whether it needs to go back to the table for the data.

So you can see that they have the following differences:

  • Hash indexes are faster for equivalent queries (in general), but not for range queries.
  • After the hash function is used to create indexes in the hash index, the index order cannot be the same as the original order, and range query cannot be supported. All nodes of a B+ tree follow the rules (the left node is smaller than the parent node, the right node is larger than the parent node, and the same is true for multi-fork trees), which naturally supports the range.
  • Hash indexes do not support sorting by indexes.
  • A hash index does not support fuzzy queries or leftmost-prefix matching of multi-column indexes, again because hash values are unpredictable: the index entries for 'AAAA' and 'AAAAB' have no correlation.
  • A hash index always has to go back to the table to fetch data, whereas a B+ tree can answer a query from the index alone when certain conditions are met (clustered index, covering index, and so on).
  • Hash indexes, while fast for equivalent queries, are not stable. Performance is unpredictable. When there are a large number of duplicate key values, hash collisions occur, and the efficiency may be very poor. The query efficiency of B+ tree is relatively stable. All queries are from the root node to the leaf node, and the height of the tree is relatively low.
  • Therefore, in most cases, choosing B+ tree indexes directly can achieve stable and good query speed. Instead of using hash indexes.

Why does the database use B+ trees instead of B trees

  • B tree is only suitable for random retrieval, while B+ tree supports both random and sequential retrieval.
  • A B+ tree has higher space utilization and fewer I/Os, lowering disk read/write cost. Generally an index itself is too large to reside entirely in memory, so indexes are often stored on disk as index files, and disk I/O is consumed during index lookups. The internal nodes of a B+ tree hold no pointers to the specific record data; they serve purely as indexes, so they are smaller than a B-tree's internal nodes, a disk block can hold more keywords, and more keywords can be searched in memory at once. The number of I/O reads and writes is the biggest factor in index retrieval efficiency.
  • The query efficiency of B+ tree is more stable. B-tree search may end at non-leaf nodes, and the closer it is to the root node, the shorter the record search time is. As long as the key word is found, the existence of the record can be determined, and its performance is equivalent to a binary search in the whole set of keywords. However, in B+ tree, sequential retrieval is more obvious. In random retrieval, any keyword must be searched from the root node to the leaf node. All keyword search paths have the same length, resulting in the same query efficiency of each keyword.
  • B-tree improves disk IO performance but does not solve the problem of inefficiency of element traversal. The leaf nodes of a B+ tree are connected together sequentially using Pointers, and the entire tree can be traversed simply by traversing the leaf nodes. Moreover, range-based queries are very frequent in the database, and B trees do not support such operations.
  • Adding and deleting files (nodes) is more efficient. Because the leaf node of B+ tree contains all keywords and is stored in an ordered linked list structure, the efficiency of addition and deletion can be greatly improved.

B+ trees do not need to query data back to the table when meeting the requirements of clustered index and overwritten index.

  • In a B+ tree index, a leaf node may store only the key value, or the key value together with the entire row of data: these are the non-clustered index and the clustered index, respectively. In InnoDB, only the primary key index is a clustered index; if there is no primary key, a unique key is chosen to build the clustered index, and if there is no unique key either, a hidden key is implicitly generated to build it.

When a query uses a clustered index, the entire row of data can be retrieved at the corresponding leaf node, so there is no need to run a query back to the table.

What is a clustered index? When to use clustered and non-clustered indexes

  • Clustered index: the data is stored together with the index, so finding the index means finding the data
  • Non-clustered index: the data is stored separately from the index. MyISAM caches the index in memory via key_buffer; when data must be accessed (through the index), MyISAM first looks the index up in memory and then finds the corresponding data on disk through it. This is why indexes are slow when they miss the key buffer

To clarify a concept: in InnoDB, an index built on top of the clustered index is called a secondary (auxiliary) index, and accessing data through a secondary index always needs a second lookup. Non-clustered indexes are all secondary indexes, for example composite indexes, prefix indexes, and unique indexes; a secondary index's leaf node no longer stores the physical location of the row, but the primary key value


Must a non-clustered index be queried back into the table?

  • Not necessarily. This involves whether all the fields required by the query match the index. If all the fields match the index, then there is no need to perform the query back to the table.

  • A simple example: suppose there is an index on the age column of an employee table; then for the query select age from employee where age < 20, the index's leaf nodes already contain the age information, so no query back to the table is performed (a covering index). A sketch with EXPLAIN follows.
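
Assuming a hypothetical employee table:

    ALTER TABLE employee ADD INDEX idx_age (age);

    -- The Extra column shows "Using index": answered from the index alone, no back-to-table lookup
    EXPLAIN SELECT age FROM employee WHERE age < 20;

    -- Selecting a column not in the index forces the lookup back to the clustered index
    EXPLAIN SELECT name, age FROM employee WHERE age < 20;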

What is a federated index? Why do I care about the order in a federated index?

  • MySQL can use multiple fields to create an index at the same time, called a federated index. If you want to match an index in a joint index, you need to match the index one by one in the order of the fields when the index is created. Otherwise, the index cannot be matched.

Specific reasons are as follows:

  • MySQL needs the index to be ordered in order to use it. Suppose a composite index (name, age, school) is created: the index is sorted first by name; where name is equal, by age; and where age is also equal, by school.

  • During a query the whole index is strictly ordered only by name, so the name field must be used first with an equality comparison; the rows matched on name are then strictly ordered by age, so the age field can be used for index lookup next, and so on. Therefore, when building a composite index, pay attention to the order of the index columns: in general, put the columns that are queried most frequently or that have the highest selectivity first; beyond that, adjust individually according to the specific queries or table structure. A sketch of the ordering effect follows.
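
Assuming a hypothetical student table:

    ALTER TABLE student ADD INDEX idx_nas (name, age, school);

    EXPLAIN SELECT * FROM student WHERE name = 'Tom' AND age = 18; -- leftmost columns match: index used
    EXPLAIN SELECT * FROM student WHERE age = 18;                  -- no leading name: index unusable

    -- Rows matched on name are already ordered by age, so this can avoid a filesort
    EXPLAIN SELECT * FROM student WHERE name = 'Tom' ORDER BY age;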

The transaction

What are database transactions?

  • Transaction is an indivisible sequence of database operations and the basic unit of database concurrency control. The result of its execution must make the database change from one consistency state to another. A transaction is a logical set of operations that either all or none of them execute.

  • The most classic and often cited example of a transaction is the transfer of money.

  • Suppose Xiao Ming wants to transfer 1,000 yuan to Xiao Hong. The transfer involves two key operations: decrease Xiao Ming's balance by 1,000 yuan, and increase Xiao Hong's balance by 1,000 yuan. If something goes wrong between the two operations, say the banking system crashes, so that Xiao Ming's balance has decreased but Xiao Hong's has not increased, that would be wrong. A transaction guarantees that these two critical operations either both succeed or both fail, as in the sketch below.
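
Sketched against a hypothetical account table:

    START TRANSACTION;
    UPDATE account SET balance = balance - 1000 WHERE name = 'XiaoMing';
    UPDATE account SET balance = balance + 1000 WHERE name = 'XiaoHong';
    COMMIT;  -- if either update fails, issue ROLLBACK instead and neither takes effect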

What are the four properties of ACID?

  • Relational databases must follow the ACID rule, which reads as follows:

  1. Atomicity: a transaction is the smallest unit of execution and cannot be split. Atomicity ensures that the actions either complete entirely or have no effect at all;
  2. Consistency: the data is consistent before and after a transaction executes; multiple transactions reading the same data see the same result;
  3. Isolation: when the database is accessed concurrently, one user's transaction is not disturbed by other transactions; concurrent transactions are independent of one another;
  4. Durability: once a transaction has been committed, its changes to the data in the database are permanent and are not lost even if the database fails.

What is dirty reading? Phantom read? Unrepeatable?

  • Dirty read: one transaction has updated a piece of data, and another transaction then reads that same data; if for some reason the first transaction rolls back, the data read by the second transaction is incorrect.
  • Non-repeatable read: the data returned by the same query run twice within one transaction is inconsistent. This can happen when data updated by another transaction is committed between the two queries.
  • Phantom read: the number of rows returned by the same query run twice within one transaction differs. For example, one transaction queries some rows while another transaction inserts new rows; on a later query, the first transaction finds rows that were not there before.

What is the isolation level of a transaction? What is the default isolation level for MySQL?

In order to achieve the four characteristics of transaction, the database defines four different transaction isolation levels, which are Read uncommitted, Read committed, Repeatable Read, Serializable. The four levels solve the problems of dirty reads, unrepeatable reads, and phantom reads one by one.

| Isolation level | Dirty read | Non-repeatable read | Phantom read |
| --- | --- | --- | --- |
| READ-UNCOMMITTED | ✓ | ✓ | ✓ |
| READ-COMMITTED | ✗ | ✓ | ✓ |
| REPEATABLE-READ | ✗ | ✗ | ✓ |
| SERIALIZABLE | ✗ | ✗ | ✗ |

The SQL standard defines four isolation levels:

  • READ-UNCOMMITTED: the lowest isolation level; allows reading uncommitted data changes and may produce dirty reads, phantom reads, or non-repeatable reads.
  • READ-COMMITTED: allows reading data that concurrent transactions have already committed; prevents dirty reads, but phantom reads and non-repeatable reads can still occur.
  • REPEATABLE-READ: multiple reads of the same field within a transaction are consistent unless the transaction itself modifies the data; prevents dirty reads and non-repeatable reads, but phantom reads are still possible.
  • SERIALIZABLE: the highest isolation level, fully compliant with the ACID isolation requirement. All transactions execute one after another, so interference between transactions is impossible; this level prevents dirty reads, non-repeatable reads, and phantom reads.

Note:

  • MySQL uses the REPEATABLE-READ isolation level by default; Oracle uses READ-COMMITTED.

  • The transaction isolation mechanism is implemented through locking and concurrent scheduling. Concurrent scheduling uses MVCC (Multi-Version Concurrency Control), which supports consistent concurrent reads and rollback by keeping old versions of modified data.

  • Most database systems default to READ-COMMITTED isolation, since the lower the isolation level, the fewer locks a transaction requests; but note that the InnoDB storage engine uses REPEATABLE READ by default without any notable performance penalty.

  • The InnoDB storage engine typically uses the SERIALIZABLE isolation level for distributed transactions. The level can be inspected and changed as sketched below.
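
Assuming MySQL 8.0 (the variable was tx_isolation before 8.0):

    -- Check the current isolation level
    SELECT @@transaction_isolation;

    -- Change it for the current session only
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;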

The lock

Do you know about MySQL locks

  • When a database has concurrent transactions, data inconsistencies may occur, and some mechanism is needed to ensure the order of access. The locking mechanism is such a mechanism.

  • It is like a hotel room: if people could come and go at will, many would fight over the same room; with a lock on the door, only the person holding the key can enter and lock it, and others can use the room only after that person is done.

What locks does MySQL have?

By lock type, there are shared locks and exclusive locks.

  • Shared lock: also known as a read lock. When a user reads data, a shared lock is placed on it. A shared lock can be acquired by multiple threads on the same data simultaneously.

  • Exclusive lock: also known as a write lock (or X lock). When a user writes data, an exclusive lock is placed on it. An exclusive lock can be held by only one thread at a time; other threads must wait for it to be released before they can acquire it. Only one exclusive lock can exist on the data, and it is mutually exclusive with both other exclusive locks and shared locks. Both kinds can be requested explicitly, as in the sketch below.
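
On a hypothetical account table:

    -- Shared (read) lock: other transactions can still read-lock these rows
    SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;  -- MySQL 8.0 spelling: FOR SHARE

    -- Exclusive (write) lock: blocks other locking reads and writes until commit
    SELECT * FROM account WHERE id = 1 FOR UPDATE;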

The relationship between isolation levels and locks

  • At the Read Uncommitted level, reads take no shared locks, so they do not conflict with exclusive locks on data being modified

  • At the Read Committed level, reads take shared locks but release them as soon as the statement finishes.

  • At the Repeatable Read level, reads take shared locks and do not release them until the transaction commits; that is, the shared locks are released only once the transaction completes.

  • SERIALIZABLE is the most restrictive isolation level because it locks the entire range of keys and holds the lock until the transaction completes.

What are the database locks by lock granularity? Locking mechanism and InnoDB locking algorithm

  • In relational databases, database locks can be divided into row-level locks (INNODB engine), table-level locks (MYISAM engine), and page-level locks (BDB engine) according to the granularity of locks.

  • MyISAM and InnoDB storage engines use locks:

    • MyISAM uses table-level locking.
    • InnoDB supports row-level locking and table-level locking. The default row-level locking is performed

Row-level locking, table-level locking, and page-level locking compared

  • Row-level locks: the finest-grained lock in MySQL, locking only the current row. Row-level locking greatly reduces conflicts in database operations. Its granularity is the smallest, but its locking overhead is the largest. Row-level locks are divided into shared locks and exclusive locks.

    • Characteristics: high overhead, slow locking; deadlocks can occur; smallest locking granularity, lowest probability of lock conflicts, highest concurrency.
  • Table-level locks: the coarsest-grained lock in MySQL, locking the entire table for the current operation. It is simple to implement, consumes few resources, and is supported by most MySQL engines; both MYISAM and INNODB support table-level locking. Table-level locks are divided into shared table read locks (shared locks) and exclusive table write locks (exclusive locks).

    • Characteristics: low overhead, fast locking; no deadlocks; large locking granularity, highest probability of lock conflicts, lowest concurrency.
  • Page-level locks: a lock whose granularity sits between row-level and table-level locks in MySQL. Table-level locking is fast but conflict-prone, and row-level locking conflicts rarely but is slow, so page-level locking is a compromise, locking an adjacent group of records at a time.

    • Characteristics: overhead and locking time between table locks and row locks; deadlocks can occur; locking granularity between table locks and row locks; average concurrency

How does the MySQL InnoDB engine implement row locks?

  • InnoDB implements row locking based on indexes:

    select * from tab_with_index where id = 1 for update;

  • FOR UPDATE takes row locks according to the condition only when id is an indexed column. If id is not an index key, InnoDB falls back to a table lock and concurrency is lost

The InnoDB storage engine has three locking algorithms

  • Record Lock: A lock on a single row Record
  • Gap Lock: A Gap lock that locks a range, excluding the record itself
  • Next-key lock: Record +gap locks a range, including the record itself

Related knowledge:

  1. InnoDB uses next-key locks for row queries
  2. Next-key locking solves the phantom problem
  3. The next-key lock is demoted to a record lock when the queried index has a unique attribute
  4. Gap locks exist to prevent multiple transactions from inserting records into the same range, which would cause phantom reads
  5. There are two ways to explicitly disable gap locks (only record locks remain, except for foreign-key constraint and uniqueness checks): A. set the transaction isolation level to RC; B. set innodb_locks_unsafe_for_binlog to 1

What is a deadlock? How to solve it?

  • A deadlock is a vicious cycle in which two or more transactions each occupy resources the others need while requesting locks on the resources the others hold.

  • Common solutions to deadlocks

    • 1. If different programs access multiple tables concurrently, try to agree on accessing the tables in the same order, which greatly reduces the chance of deadlock.
    • 2. Within one transaction, try to lock all the resources needed at once, reducing the probability of deadlock;
    • 3. For business scenarios prone to deadlocks, consider upgrading the locking granularity to table-level locks to reduce the probability of deadlock.

If the business is difficult to handle, you can use distributed transaction locks or use optimistic locks

What are optimistic and pessimistic locks for databases? How do you do that?

  • The task of concurrency control in a database management system (DBMS) is to ensure that, when multiple transactions access the same data in the database simultaneously, transaction isolation and consistency and the consistency of the database itself are not broken. Optimistic concurrency control (optimistic locking) and pessimistic concurrency control (pessimistic locking) are the main techniques used for concurrency control.

  • Pessimistic locking: assume concurrency conflicts will happen and block every operation that might violate data integrity. The data is locked as soon as the transaction queries it, and the lock is held until the transaction commits. Implementation: use the locking mechanism in the database

    -- The core SQL; the key part is FOR UPDATE
    select status from t_goods where id=1 for update;

  • Optimistic locking: assume no concurrency conflicts will occur and check for data-integrity violations only at commit time. The data is locked only while it is being modified, by checking a version. Implementation: optimistic locking is generally implemented with a version-number mechanism or a CAS algorithm.

    -- Optimistic locking via a version number
    update table set x=x+1, version=version+1 where id=#{id} and version=#{version};

Two types of lock usage scenarios

  • From the descriptions of the two kinds of lock, we know each has its own strengths and weaknesses; neither is simply better than the other. For example, optimistic locking suits read-heavy scenarios where writes are rare, i.e. where conflicts genuinely seldom happen; this saves the locking overhead and increases the system's overall throughput.

  • In write-heavy scenarios, however, conflicts arise frequently, causing the upper application layer to retry continually, which actually reduces performance. Pessimistic locking is the better fit for write-heavy scenarios.

Views

Why use views? What is a view?

  • To improve the reusability of complex SQL statements and the security of table operations, the MySQL database management system provides the view feature. A view is essentially a virtual table that does not physically exist and contains a list of named columns and rows similar to a real table. However, views do not exist in the database as stored data values. The row and column data comes from the base table referenced by the query that defines the view and is generated dynamically when the view is specifically referenced.

  • Views improve the security of data in the database by allowing developers to focus only on specific data they are interested in and specific tasks they are responsible for, and only see the data defined in the view rather than the data in the tables referenced by the view.

What are the characteristics of views?

Views have the following characteristics:

  • The columns of a view can come from different tables, which are abstractions of tables and new relationships established in a logical sense.

  • A view is a table (virtual table) generated by a base table (real table).

  • View creation and deletion do not affect the base table.

  • Updates (additions, deletions, and modifications) to view content directly affect the base table.

  • When the view comes from more than one base table, data cannot be added and deleted.

Operations on a view include creating a view, viewing a view, deleting a view, and modifying a view.

What are the usage scenarios for views?

View basic purpose: Simplify SQL queries and improve development efficiency. If there is another use, it is to be compatible with older table structures.

The following are common usage scenarios for views:

  • Reuse SQL statements;

  • Simplify complex SQL operations. After you write a query, you can easily reuse it without knowing its basic query details;

  • Use parts of a table rather than the entire table;

  • Protect data. Users can be granted access to specific parts of a table rather than the entire table;

  • Change the data format and presentation. Views can return data that is different from the presentation and format of the underlying table.
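
A minimal sketch of these uses, assuming a hypothetical employee table; only non-sensitive columns are exposed:

    CREATE VIEW v_employee_public AS
        SELECT id, name, department FROM employee;  -- the salary column stays hidden

    -- Callers query the view like an ordinary table
    SELECT * FROM v_employee_public WHERE department = 'IT';

    DROP VIEW v_employee_public;  -- dropping the view does not affect the base table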

Advantages of views

  1. Simplify queries. Views simplify user operations
  2. Data security. Views enable users to view the same data from multiple perspectives and secure confidential data
  3. Logical data independence. Views provide a degree of logical independence for refactoring the database

Disadvantages of Views

  1. Performance. If a view is defined by a complex multi-table query, even a simple query against the view takes the database time to expand into that complex underlying query.

  2. Modification restrictions. When the user tries to modify rows of a view, the database must translate this into modifications of rows in the base tables; the same holds for inserts and deletes. This is straightforward for simple views, but more complex views may not be updatable at all.

    Such non-updatable views typically have one of the following characteristics: 1. Views with set operators such as UNION. 2. Views with a GROUP BY clause. 3. Views with aggregate functions such as AVG, SUM, or MAX. 4. Views using the DISTINCT keyword. 5. Views joining multiple tables (with some exceptions).

What is a cursor?

  • A cursor is a data buffer created by the system for users to store the execution results of SQL statements. Each cursor area has a name. The user can retrieve records one by one through a cursor and assign them to the main variable for further processing by the main language.
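
In MySQL, cursors can only be used inside stored programs. A minimal sketch, assuming a hypothetical employee table:

    DELIMITER //
    CREATE PROCEDURE walk_employees()
    BEGIN
        DECLARE done INT DEFAULT 0;
        DECLARE v_name VARCHAR(255);
        -- The cursor buffers the result of this query
        DECLARE cur CURSOR FOR SELECT name FROM employee;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

        OPEN cur;
        read_loop: LOOP
            FETCH cur INTO v_name;   -- retrieve one record into the variable
            IF done = 1 THEN
                LEAVE read_loop;
            END IF;
            -- further processing of v_name goes here
        END LOOP;
        CLOSE cur;
    END //
    DELIMITER ;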

Stored procedures and functions

What is a stored procedure? What are the pros and cons?

  • A stored procedure is a set of precompiled SQL statements. Its advantage is that it permits modular design: it is created once and can be called many times afterwards from the program. If an operation requires executing several SQL statements, a stored procedure is faster than issuing the SQL statements one by one.

advantages

  1. Stored procedures are precompiled and execute efficiently.
  2. Stored procedure code is directly stored in the database and can be invoked by the stored procedure name to reduce network communication.
  3. High security. Users with permissions are required to execute stored procedures.
  4. Stored procedures can be reused, reducing the workload of database developers.

disadvantages

  1. Debugging is cumbersome, although tools such as PL/SQL Developer make it much more convenient and largely offset this shortcoming.
  2. Portability: stored-procedure code is inevitably database-specific. For in-house engineering projects, however, migration is rarely an issue.
  3. Recompilation: because the server-side code is compiled before it runs, if a referenced object changes, the affected stored procedures and packages must be recompiled (though this can also be set to happen automatically at run time).
  4. If a system uses stored procedures heavily, growing user requirements after delivery will force changes to the data structures, and the stored procedures must change along with them. Maintaining such a system becomes very difficult and very costly.
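
A minimal sketch, assuming a hypothetical orders table; the procedure is created once and can then be called by name any number of times:

    DELIMITER //
    CREATE PROCEDURE get_order_count(IN p_customer_id INT, OUT p_count INT)
    BEGIN
        SELECT COUNT(*) INTO p_count
        FROM orders
        WHERE customer_id = p_customer_id;
    END //
    DELIMITER ;

    -- Call by name; the body was compiled when the procedure was created
    CALL get_order_count(42, @cnt);
    SELECT @cnt;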

Triggers

What is a trigger? What are the use scenarios for triggers?

  • Triggers are special event-driven stored procedures defined by users on relational tables. A trigger is a piece of code that is automatically executed when an event is triggered.

Usage scenarios

  • Changes can be cascaded through related tables in the database.
  • Monitor changes to a field in a table in real time and react to them accordingly.
  • For example, automatically generating certain business serial numbers.
  • Do not abuse triggers; otherwise database and application maintenance becomes difficult.
  • Keep the basics in mind; beyond that, understand the difference between the CHAR and VARCHAR data types and between the InnoDB and MyISAM storage engines.

What triggers are available in MySQL?

There are six types of triggers in MySQL database:

  • Before Insert
  • After Insert
  • Before Update
  • After Update
  • Before Delete
  • After Delete
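
A minimal After Insert sketch, assuming hypothetical orders and order_log tables; the cascading write happens automatically on every insert:

    DELIMITER //
    CREATE TRIGGER trg_order_after_insert
    AFTER INSERT ON orders
    FOR EACH ROW
    BEGIN
        -- Cascade the change into an audit table
        INSERT INTO order_log (order_id, action, logged_at)
        VALUES (NEW.id, 'INSERT', NOW());
    END //
    DELIMITER ;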

Common SQL statements

What are the main categories of SQL statements

  • Data Definition Language (DDL): CREATE, DROP, ALTER

    These operate on the database's logical structures, including table structures, views, and indexes.

  • Data Query Language (DQL) SELECT

    This is better understood as a query operation with the select keyword. All simple queries and connection queries belong to DQL.

  • Data Manipulation Language (DML) INSERT, UPDATE, and DELETE

    DQL and DML together make up the add, delete, update, and query (CRUD) operations that most junior programmers use daily.

  • Data Control Language (DCL) GRANT, REVOKE, COMMIT, ROLLBACK

    These handle database security, integrity, and related concerns; they can be loosely understood as permission control.

SQL statement syntax order:

  1. SELECT
  2. FROM
  3. JOIN
  4. ON
  5. WHERE
  6. GROUP BY
  7. HAVING
  8. UNION
  9. ORDER BY
  10. LIMIT
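
A query touching these clauses in exactly this syntax order (UNION omitted), assuming hypothetical employee and dept tables:

    SELECT d.name AS dept, COUNT(*) AS cnt
    FROM employee e
    JOIN dept d ON e.dept_id = d.id
    WHERE e.salary > 3000
    GROUP BY d.name
    HAVING COUNT(*) > 5
    ORDER BY cnt DESC
    LIMIT 10;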

What are superkeys, candidate keys, primary keys, and foreign keys?

  • Superkey: a set of attributes that uniquely identifies a tuple in a relation is called a superkey of the relational schema. A single attribute can be a superkey, and so can a combination of attributes. Superkeys include candidate keys and primary keys.
  • Candidate key: a minimal superkey, i.e. a superkey with no redundant attributes.
  • Primary key: the column or combination of columns chosen to uniquely and completely identify each stored row. A table can have only one primary key, and primary key values cannot be missing, i.e. cannot be NULL.
  • Foreign key: a column in one table that holds the primary key of another table is called a foreign key of that table.

What kinds of SQL constraints are there?


  • NOT NULL: The contents of the control field must NOT be NULL.
  • UNIQUE: The content of the control field cannot be repeated. A table can have multiple UNIQUE constraints.
  • PRIMARY KEY: also ensures the column's contents are not repeated, but only one is allowed per table.
  • FOREIGN KEY: preserves links between tables and prevents illegal data from being inserted into the foreign key column, since its value must be one of the values in the referenced table.
  • CHECK: Used to control the value range of the field.
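
The five constraints in one table definition, a minimal sketch assuming a pre-existing dept table (note that MySQL only enforces CHECK from version 8.0.16; earlier versions parse and ignore it):

    CREATE TABLE employee (
        id      INT          NOT NULL AUTO_INCREMENT,   -- NOT NULL
        email   VARCHAR(100) UNIQUE,                    -- UNIQUE
        dept_id INT,
        age     INT          CHECK (age >= 18),         -- CHECK
        PRIMARY KEY (id),                               -- PRIMARY KEY
        FOREIGN KEY (dept_id) REFERENCES dept(id)       -- FOREIGN KEY
    );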

Six types of join queries

  • CROSS JOIN

  • INNER JOIN

  • LEFT JOIN/RIGHT JOIN

  • UNION query (UNION and UNION ALL)

  • FULL JOIN

  • CROSS JOIN

    -- Cross join (Cartesian product)
    SELECT * FROM A, B(, C);
    -- or
    SELECT * FROM A CROSS JOIN B (CROSS JOIN C);
    -- Implicit inner join
    SELECT * FROM A, B WHERE A.id = B.id;
    -- Explicit inner join
    SELECT * FROM A INNER JOIN B ON A.id = B.id;

Inner joins fall into three categories

  • Equi-join: ON A.id = B.id
  • Non-equi join: ON A.id > B.id
  • Self-join: SELECT * FROM A T1 INNER JOIN A T2 ON T1.id = T2.pid

LEFT JOIN/RIGHT JOIN

  • LEFT OUTER JOIN: returns every row of the left table plus the matching rows of the right table; where there is no match, the right-side columns are NULL.
  • RIGHT OUTER JOIN: returns every row of the right table plus the matching rows of the left table; where there is no match, the left-side columns are NULL.

UNION query (UNION and UNION ALL)

SELECT * FROM A UNION SELECT * FROM B UNION ...
  • The result takes the column layout of the query before the UNION as its baseline. Note that the joined queries must have the same number of columns, and identical rows are merged
  • With UNION ALL, duplicate rows are not merged
  • UNION ALL is more efficient than UNION, because it skips the deduplication step

FULL JOIN

    SELECT * FROM A LEFT JOIN B ON A.id = B.id
    UNION
    SELECT * FROM A RIGHT JOIN B ON A.id = B.id
  • MySQL does not support FULL JOIN directly
  • You can emulate it by combining LEFT JOIN, UNION, and RIGHT JOIN

Table join questions

Given two tables:
  • R and S: R has three columns A, B, C; S has two columns C, D; each table has three records

R table

A B C
a1 b1 c1
a2 b2 c2
a3 b3 c3

S table

C D
c1 d1
c2 d2
c4 d3
1. Cross join (Cartesian product)
  • SQL
select r.*,s.* from r,s
  • Result
A B C C D
a1 b1 c1 c1 d1
a2 b2 c2 c1 d1
a3 b3 c3 c1 d1
a1 b1 c1 c2 d2
a2 b2 c2 c2 d2
a3 b3 c3 c2 d2
a1 b1 c1 c4 d3
a2 b2 c2 c4 d3
a3 b3 c3 c4 d3
2. Inner join result
  • SQL
select r.*,s.* from r inner join s on r.c=s.c
  • Result
A B C C D
a1 b1 c1 c1 d1
a2 b2 c2 c2 d2
3. Left join result
  • SQL
select r.*,s.* from r left join s on r.c=s.c
  • Result
A B C C D
a1 b1 c1 c1 d1
a2 b2 c2 c2 d2
a3 b3 c3 NULL NULL
4. Right join result
  • SQL
select r.*,s.* from r right join s on r.c=s.c
  • Result
A B C C D
a1 b1 c1 c1 d1
a2 b2 c2 c2 d2
NULL NULL c4 d3
5. Full join result (MySQL does not support FULL JOIN; simulate it with LEFT JOIN UNION RIGHT JOIN)
  • SQL
select r.*,s.* from r full join s on r.c=s.c
  • Result
A B C C D
a1 b1 c1 c1 d1
a2 b2 c2 c2 d2
a3 b3 c3 NULL NULL
NULL NULL c4 d3

What is a subquery

  1. Condition: The query result of one SQL statement is the condition or result of another query statement

  2. Nested: Multiple SQL statements are nested. The internal SQL query statements are called subqueries.
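
A minimal sketch of both points in one statement, assuming hypothetical employee and dept tables; the inner query's result becomes the outer query's condition:

    SELECT name
    FROM employee
    WHERE dept_id IN (SELECT id FROM dept WHERE location = 'Beijing');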

In and exists are different in mysql

  • In MySQL, an in statement performs a hash join between the outer and inner tables, while exists loops over the outer table, querying the inner table once per row. The long-held belief that exists is always more efficient than in is not accurate; it depends on the situation (see the sketch after this list).
    1. If the two tables queried are about the same size, there is little difference between in and exists.
    2. If one table is smaller and the other larger, use exists when the subquery table is the larger one, and in when the subquery table is the smaller one.
    3. not in vs. not exists: not in triggers a full table scan on both the inner and outer tables with no index use, while a not exists subquery can still use indexes on the table. So regardless of table size, not exists is faster than not in.
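
The two forms side by side, assuming hypothetical tables A and B:

    -- in: hash the subquery result, then probe it for each outer row
    SELECT * FROM A WHERE A.id IN (SELECT id FROM B);

    -- exists: loop over the outer table, probing the inner table once per row
    SELECT * FROM A WHERE EXISTS (SELECT 1 FROM B WHERE B.id = A.id);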

Varchar differs from char

Characteristics of char

  • Char indicates a string of fixed length.

  • If the length of the inserted data is less than the fixed char length, it is padded with spaces;

  • Because the length is fixed, access is much faster than varchar, by as much as 50%, but the fixed length also wastes extra space: a classic space-for-time trade-off;

  • For char, the maximum number of characters that can be stored is 255, regardless of encoding

Characteristics of varchar

  • Varchar stands for variable length string, the length is variable;

  • It stores only as much data as is inserted;

  • Varchar is the opposite of char: access is slower because the length is not fixed, but for the same reason it does not waste extra space;

  • For varchar, the maximum number of characters that can be stored is 65532

In short, weigh performance (char is faster) against disk-space savings (varchar is smaller) and choose per the concrete situation; that is the sensible way to design a table.

Meaning of 50 in VARCHar (50)

  • Varchar(50) and varchar(200) take the same space to store "hello", but the latter consumes more memory when sorting, because ORDER BY col allocates fixed-length buffers (the MEMORY engine behaves the same way). In earlier versions of MySQL, the 50 stood for the number of bytes; now it stands for the number of characters.

Meaning of 20 in int(20)

  • The length of the display character. 20 indicates that the maximum display width is 20, but it still occupies 4 bytes of storage, and the storage range remains unchanged.

  • Does not affect internal storage, only affects the number of zeros before the int with zerofill definition, easy report display

Why is MySQL designed this way?

  • It is meaningless to most applications; it merely tells some tools how many digits to display. Int(1) and int(20) are identical in storage and computation;

Differences between int(10), char(10), and varchar(10)

  • The 10 in int(10) is the display width, not the size of the stored data; the 10 in char(10) and varchar(10) limits the stored data, i.e. how many characters can be stored.

  • Char(10) stores 10 characters at a fixed length; if fewer than 10 are supplied, spaces pad the rest.

  • Varchar(10) stores up to 10 variable-length characters, only as many as are inserted. A space counts as one stored character, unlike the padding spaces in char(10), which are placeholders and do not count.

What’s the difference between FLOAT and DOUBLE?

  • FLOAT data can store up to 8 decimal digits and account for 4 bytes in memory.
  • Data of type DOUBLE can store up to 18 decimal digits and occupy 8 bytes of memory.

Difference between DROP, DELETE, and TRUNCATE

  • All three delete data, but they differ:

Comparison Delete Truncate Drop
Type DML DDL DDL
Rollback Can be rolled back Cannot be rolled back Cannot be rolled back
Scope Deletes all or some rows; table structure remains Deletes all rows; table structure remains Removes the table from the database: rows, structure, indexes, and privileges all go
Speed Slow, deletes row by row Fast Fastest
  • Therefore: when a table is no longer needed, use drop; to delete some rows, use delete; to empty a table while keeping it, use truncate.
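
The three statements side by side, assuming a hypothetical table t:

    DELETE FROM t WHERE created_at < '2020-01-01';  -- DML: row by row, can be rolled back
    TRUNCATE TABLE t;                               -- DDL: empties the table, keeps its structure
    DROP TABLE t;                                   -- DDL: removes rows, structure, indexes, privileges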

The difference between a UNION and a UNION ALL?

  • UNION removes duplicate rows from the combined result; UNION ALL keeps all rows, duplicates included
  • UNION ALL is more efficient than UNION, because it skips the deduplication step

SQL optimization

What is your database optimization experience?

  1. Foreign key constraints affect insert, delete, and update performance; if the application can guarantee database integrity itself, remove the foreign keys
  2. Write SQL statements in uppercase, especially column names: the database server converts SQL to uppercase before compiling it, so sending uppercase SQL saves that conversion step
  3. If the application can guarantee database integrity, the schema does not have to follow the three normal forms strictly
  4. Do not create too many indexes: indexes speed up queries but consume disk space
  5. With JDBC, create SQL with PreparedStatement rather than Statement: PreparedStatement is faster because its SQL is precompiled inside the object, and a PreparedStatement object can be executed efficiently many times

How to optimize SQL query statement

  1. Queries should be optimized to avoid full table scans, and indexes should be considered on where and order by columns first
  2. Use indexes to improve queries
  3. Avoid * in the SELECT clause and use all uppercase SQL
  4. Avoid NULL-value tests (IS NULL / IS NOT NULL) on fields in the WHERE clause; they cause the engine to abandon the index and perform a full table scan
  5. The use of OR in the WHERE clause to join conditions also causes the engine to abandon the index for a full table scan
  6. In and not in should also be used with caution, otherwise full table scanning will occur

How do you know if SQL statement performance is high or low

  1. Check the execution time of the SQL
  2. Use the Explain keyword to simulate the optimizer’s execution of SQL queries to see how MYSQL processes your SQL statements. Analyze performance bottlenecks in your query or table structure.

The execution order of SQL

  1. FROM: Loads data FROM the hard disk to the data buffer for easy operation.
  2. WHERE: Selects the tuple that meets the condition from the base table or view. (Cannot use aggregate functions)
  3. JOIN: for example, with a right join, read a tuple from the right table and find the matching tuple or set of tuples in the left table.
  4. ON: the join condition for multi-table queries; prefer joins over subqueries for multi-table queries.
  5. GROUP BY: a grouping, usually used with aggregate functions.
  6. HAVING: Selects eligible tuples based on tuples. (Usually used with GROUP BY)
  7. SELECT: query the columns that need to be listed for all tuples.
  8. DISTINCT: Indicates a deduplication function.
  9. UNION: Combines multiple query results (duplicate records are removed by default).
  10. ORDER BY: Performs the corresponding sort.
  11. LIMIT 1: Display and output a data record (tuple)

How to locate and optimize performance problems of SQL statements? Is the index being used? Or how do I know why this statement is running slowly?

  • The most important and effective way to locate low performance SQL statements is to use an execution plan. MySQL provides the Explain command to view the execution plan of a statement. As we know, no matter what kind of database or database engine, there are many related optimizations in the execution of a SQL statement. For query statements, the most important optimization method is the use of indexes. And the execution plan, is to show the database engine for SQL statement execution details, including whether to use the index, what index to use, the use of the index information.

  • The execution plan contains an id column: a set of numbers indicating the order in which the subqueries of a statement execute.
    • Equal ids execute in order from top to bottom.
    • Different ids: the larger the id, the higher the priority, so it executes earlier.
    • A NULL id marks a result set that needs no query of its own; it commonly appears in statements containing a union.

Select_type: the query type of each subquery. Some common query types:

id select_type description
1 SIMPLE Does not contain any subqueries or queries such as unions
2 PRIMARY The outermost query containing the subquery is displayed as PRIMARY
3 SUBQUERY A query contained in a SELECT or WHERE sentence
4 DERIVED The query contained in the from sentence
5 UNION Appears in the query statement after the union
6 UNION RESULT The result set obtained from a UNION
  • Table: the table this row of the plan queries. When data comes from a derived table, <derivedN> appears, where N is the id of the corresponding execution plan. Partitions: the partitions hit; a table can be partitioned by a chosen column at creation time. Here's an example:
    create table tmp (
        id int unsigned not null AUTO_INCREMENT,
        name varchar(255),
        PRIMARY KEY (id)
    ) engine = innodb
    partition by key (id) partitions 5;
  • Type (very important: it shows whether an index is used) — the access type

    • ALL Scans ALL table data
    • Index Traverses the index
    • Range Indicates the search range of the index
    • Index_subquery uses ref in subqueries
    • Unique_subquery uses eq_ref in subqueries
    • Ref_or_null: like ref, but additionally searches for NULL values
    • Fulltext uses full-text indexes
    • Ref uses a non-unique index to find data
    • Eq_ref: uses a PRIMARY KEY or UNIQUE NOT NULL index for the association in a join query.
  • Possible_keys Possible index. Note that possible_keys may not be used. If there is an index on the field involved in the query, the index will be listed. When this column is NULL, it is time to consider whether the current SQL needs to be optimized.

  • Key Displays the actual index used by MySQL in the query. If no index is used, the value is NULL.

  • TIPS: if a covering index is used by the query, it appears only in the key column

  • Key_length Indicates the length of the index

  • Ref indicates the join match criteria for the above table, that is, which columns or constants are used to find values on indexed columns

  • Rows returns the estimated number of result sets, which is not an exact value.

  • The information of EXTRA is very rich, common ones are:

    1. Using index: a covering index is used
    2. Using where: the WHERE clause filters the result set
    3. Using filesort: sorting on a non-indexed column requires an external (file) sort.
    4. The goals of SQL optimization can be found in the Alibaba development manual:

    SQL performance optimization goal: reach at least the range level; the requirement is the ref level; consts is best.
    1) consts: a single table with at most one matching row (primary key or unique index); the data can be read during the optimization phase.
    2) ref: a normal (non-unique) index is used.
    3) range: a range scan over an index.
    Counterexample: an explain result with type=index means a full scan of the index file, which is very slow; this level is even below range and little better than a full table scan.
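
A minimal sketch of reading a plan, assuming a hypothetical employee table with an index on dept_id:

    EXPLAIN SELECT * FROM employee WHERE dept_id = 3;
    -- Things to check in the output:
    --   type  : should be at least range (ref or const is better)
    --   key   : the index actually chosen (NULL means none)
    --   rows  : estimated number of rows to examine
    --   Extra : "Using filesort" / "Using temporary" are warning signs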

SQL life cycle?

  1. The application server establishes a connection with the database server

  2. The database process gets the requested SQL

  3. Parse and generate an execution plan, execute

  4. Read data into memory and process it logically

  5. Send the result to the client through the connection in step 1

  6. Close the connection and release resources

Large table data query, how to optimize

  1. Optimize the schema, the SQL statements, and the indexes;
  2. Add a cache layer: memcached, redis;
  3. Master-slave replication with read/write separation;
  4. Vertical splitting: according to how loosely coupled your modules are, split one large system into several smaller ones, i.e. a distributed system;
  5. Horizontal splitting: for tables with huge data volumes; this step is the most troublesome and really tests technical skill. Choose a reasonable sharding key, adapt the table structure with some deliberate redundancy for query efficiency, and change the application so that every query carries the sharding key, routing each query to a specific table instead of scanning all tables;

How to handle large pages?

Large paging is generally handled in two directions.

  • At the database level, which is what we mainly focus on here, a statement like select * from table where age > 20 limit 1000000,10 leaves room for optimization: it loads 1,000,010 rows and throws almost all of them away to return 10, so of course it is slow. We can rewrite it as select * from table where id in (select id from table where age > 20 limit 1000000,10): the million rows are then walked through a covering index, where every field the inner query needs is in the index itself, which is very fast (see the sketch after this section). And if the ids are continuous, select * from table where id > 1000000 limit 10 also performs well. There are many optimization variants, but the core idea is the same: load less data.
  • Reduce such requests from the requirements angle: mainly, do not build features that jump straight to a specific page millions of rows deep; allow only page-by-page viewing or navigation along a fixed, predictable, cacheable path, and guard against id leaks and sustained malicious scraping.

In practice, solving large paging relies mainly on caching: predictably precompute the content, cache it in a K-V store such as redis, and return it directly.
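
A sketch of the covering-index rewrite above, assuming a hypothetical table t; note MySQL rejects LIMIT directly inside an IN subquery, so a derived table wraps it:

    -- Slow: loads 1,000,010 rows and discards the first 1,000,000
    SELECT * FROM t WHERE age > 20 LIMIT 1000000, 10;

    -- Faster: the inner query walks a covering index (id only); the outer
    -- query then fetches just the 10 full rows
    SELECT * FROM t
    WHERE id IN (SELECT id
                 FROM (SELECT id FROM t WHERE age > 20 LIMIT 1000000, 10) AS tmp);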

Mysql paging

  • The LIMIT clause can be used to force a SELECT statement to return a specified number of records. LIMIT accepts one or two numeric parameters. The argument must be an integer constant. If you are given two arguments, the first parameter specifies the offset of the first row to return, and the second parameter specifies the maximum number of rows to return. Initial row offset is 0(not 1)

    SELECT * FROM table LIMIT 5,10; // Retrieve record rows 6-15

  • To retrieve all rows from an offset to the end of the result set, the often-cited trick is a second parameter of -1, but MySQL does not actually accept -1 here; use some very large number instead:

    SELECT * FROM table LIMIT 95, 18446744073709551615; // Retrieve rows 96 through the last

  • If only one argument is given, it indicates the maximum number of rows returned:

    SELECT * FROM table LIMIT 5; // Retrieve the first five rows

  • In other words, LIMIT n is equivalent to LIMIT 0,n.

Slow Query logs

This log records SQL statements whose execution time exceeds a threshold; it is used to quickly locate slow queries as a reference for optimization.

  • Enable slow log query

  • Configuration item: slow_query_log

  • Use show variables like 'slow_query_log' to check whether it is enabled. If the status is OFF, run set global slow_query_log = on to enable it; MySQL then creates an xxx-slow.log file under datadir.

  • Set critical time

  • Configuration item: long_query_time

  • Check: show VARIABLES like ‘long_query_time’, in seconds

  • Set: set long_query_time=0.5

  • In practice, tighten the threshold gradually from long to short: optimize away the slowest SQL first, then lower the threshold again

  • View the log, which is logged to xxx-slow.log whenever the SQL exceeds the critical time we set
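
The runtime session for the settings above (values take effect for new connections):

    -- Check and enable the slow query log
    SHOW VARIABLES LIKE 'slow_query_log';
    SET GLOBAL slow_query_log = ON;

    -- Log anything slower than 0.5 seconds
    SET GLOBAL long_query_time = 0.5;
    SHOW VARIABLES LIKE 'long_query_time';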

Care about the SQL time in the business system? Statistics too slow query? How are slow queries optimized?

  • In our business systems, except for queries on primary keys, I test query times on the test database. Slow-query statistics are mainly handled by operations, who regularly feed the slow business queries back to us.

  • Slow query optimization first to understand what is the cause of slow? Does the query condition not match the index? Load unwanted columns? Or too much data?

So optimization is going in those three directions,

  • First analyze the statement to see whether it loads unnecessary data, perhaps querying extra rows and discarding them, or loading many columns the result does not need. Analyze and rewrite the statement.
  • Analyze a statement’s execution plan to see how it uses the index, and then modify the statement or index so that the statement matches the index as closely as possible.
  • If statement optimization is no longer possible, consider whether the amount of data in the table is too large, and if so, split the table horizontally or vertically.

Why try to have a primary key?

  • Primary keys ensure the uniqueness of data rows in the entire table. You are advised to add a self-growing ID column as the primary key even if the table does not have a primary key. After setting the primary key, it is possible to make subsequent deletions faster and ensure the safety of the operation data range.

Does the primary key use an autoincrement ID or a UUID?

  • It is recommended to use the autoincrement ID instead of the UUID.

  • In the InnoDB storage engine, the primary key index is the clustered index: the leaf nodes of the primary key's B+ tree store the full row data, kept in primary key order. If the primary key is an auto-increment ID, new rows are simply appended at the end. With a UUID, the insert position is unpredictable, so inserts cause a lot of data movement and page splits, leaving memory fragmentation and degrading insert performance.

In general, in the case of large data volumes, the performance is better with auto-increment primary keys.

As for the primary key being a clustered index, InnoDB selects a unique key as the clustered index if there is no primary key, and generates an implicit primary key if there is no unique key.

Why is the field required to be not NULL?

  • NULL values take extra bytes to store and cause many special cases and comparison pitfalls in your program.

If you want to store user password hashes, what fields should be used for storage?

  • Fixed-length strings such as password hashes, salts, and ID-card numbers should be stored in char rather than varchar, saving space and improving retrieval efficiency.

How to optimize data access during query

  • Too much data is accessed and query performance deteriorates
  • Determine if the application is retrieving more data than it needs, perhaps too many rows or columns
  • Verify that the MySQL server is not parsing a large number of unnecessary rows
  • Avoid the following SQL statement errors
  • Avoid querying for data you don’t need. Solution: Use limit to resolve
  • Multi-table association returns all columns. Solution: Specify column names
  • Always return all columns. Workaround: Avoid using SELECT *
  • Query the same data repeatedly. Workaround: You can cache the data and read the cache directly next time
  • Use Explain for analysis, if you find that the query needs to scan a large amount of data, but only returns a small number of rows, you can use the following techniques to optimize:
  • Use a covering index scan: put all needed columns into the index so the storage engine can return results without going back to the table for the full rows.
  • Change the database and table structure and modify the data table paradigm
  • Rewrite the SQL statement so that the optimizer can execute the query in a better way.

How to optimize long and difficult query statements

  • Analyze whether one complex query or multiple simple queries are faster
  • MySQL internally scans millions of rows of data in memory per second, and responding to data to clients is much slower
  • It is good to use as small a query as possible, but sometimes it is necessary to decompose a large query into several smaller ones.
  • Divide a large query into multiple small identical queries
  • Deleting 10 million rows in one statement costs more than deleting them in batches of 10,000 with a short pause between batches.
  • Decompose associated queries to make caching more efficient.
  • Performing a single query can reduce lock contention.
  • The association at the application layer makes it easier to split the database.
  • Query efficiency will be greatly improved.
  • Fewer queries for redundant records.

Optimize specific types of query statements

  • count(*) ignores the individual columns and simply counts rows; to count rows, use count(*) rather than count(column_name)
  • In MyISAM, count(*) without any WHERE conditions is very fast.
  • When there are WHERE conditions, MyISAM’s count count is not necessarily faster than other engines.
  • You can use explain to query for approximations and replace count(*) with approximations
  • Add summary tables
  • Use the cache

Optimizing associated query

  • Determines whether there is an index in the ON or USING clause.
  • Ensure that GROUP BY and ORDER BY are only columns in one table so that MySQL can use indexes.

Optimized subquery

  • Use associative queries instead
  • Optimize GROUP BY and DISTINCT
  • These two types of query data can be optimized using indexes, which are the most effective optimization methods
  • In a join query, grouping by the identity column is more efficient
  • If the implicit sort of GROUP BY is not needed, disable it with ORDER BY NULL (see the sketch below)
  • WITH ROLLUP produces super-aggregate rows; this work can be moved into the application
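
A minimal sketch of the ORDER BY NULL trick, assuming a hypothetical employee table; note that MySQL 5.x sorts GROUP BY results implicitly, while 8.0 removed that implicit sort:

    SELECT dept_id, COUNT(*) AS cnt
    FROM employee
    GROUP BY dept_id
    ORDER BY NULL;   -- suppress the implicit sort when the order does not matter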

Optimize LIMIT paging

  • If the LIMIT offset is large, the query efficiency is low
  • You can record the maximum ID of the last query. The next query is performed based on this ID
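
A sketch of keyset paging versus offset paging, assuming a hypothetical table t with an auto-increment id:

    -- Offset paging: degrades as the offset grows
    SELECT * FROM t ORDER BY id LIMIT 1000000, 10;

    -- Keyset paging: remember the last id returned (assumed 1000000 here)
    SELECT * FROM t WHERE id > 1000000 ORDER BY id LIMIT 10;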

Optimizing UNION queries

  • The efficiency of UNION ALL is higher than that of UNION

Optimize the WHERE clause

  • Most databases process conditions from left to right; put the conditions that filter out the most data first, and the weaker filters later

Some methods of SQL statement optimization

  • 1. To optimize the query, avoid full table scan as far as possible, and first consider creating indexes on the columns involved in WHERE and ORDER by.

  • 2. Avoid null values in the WHERE clause. Otherwise, the engine will abandon the use of index and perform full table scan.

    -- Bad: the NULL test abandons the index
    select id from t where num is null
    -- Better: give num a default of 0, ensure the column never holds NULL, then:
    select id from t where num = 0
  • 3. Avoid using != or <> in the where clause; otherwise the engine will abandon the index and perform a full table scan.

  • 4. Avoid the use of OR in the WHERE clause to join conditions, otherwise the engine will abandon the use of index and perform full table scan, such as:

    -- Bad: OR abandons the index
    select id from t where num = 10 or num = 20
    -- Better:
    select id from t where num = 10 union all select id from t where num = 20
  • 5. Use in and not in with caution, otherwise it will cause a full table scan.

    -- Bad:
    select id from t where num in (1, 2, 3)
    -- Better, for continuous values, use between instead of in:
    select id from t where num between 1 and 3
  • 6. A leading wildcard defeats the index and forces a full table scan: select id from t where name like '%li%'. To use the index, the wildcard must not lead: name like 'li%'.

  • 7. If you use parameters in the WHERE clause, it will also cause a full table scan. Because SQL resolves local variables only at run time, the optimizer cannot defer the choice of an access plan until run time; It must be selected at compile time. However, if an access plan is established at compile time, the value of the variable is unknown and therefore cannot be used as an input for index selection. The following statement will perform a full table scan:

    select id from t where num = @num
    -- You can force the query to use an index instead:
    select id from t with(index(index_name)) where num = @num
  • 8. Expression operations on fields in the WHERE clause should be avoided as much as possible. This will cause the engine to abandon the use of indexes and perform a full table scan. Such as:

    -- Bad: an expression on the column abandons the index
    select id from t where num / 2 = 100
    -- Better: move the arithmetic to the constant side
    select id from t where num = 100 * 2
  • 9. Avoid functional manipulation of fields in the WHERE clause, which will cause the engine to abandon indexes and perform a full table scan. Such as:

    -- Bad: a function on the column abandons the index
    select id from t where substring(name, 1, 3) = 'abc'
    -- Better:
    select id from t where name like 'abc%'
  • 10. Do not perform functions, arithmetic operations, or other expression operations to the left of the “=” in the WHERE clause, or the system may not use the index properly.

Database optimization

Why optimize

  • The throughput bottleneck of the system often appears in the database access speed
  • As the application runs, more and more data is stored in the database, and processing times are correspondingly slower
  • Data is stored on disk and read and write speeds are not comparable to memory

Optimization principle: Reduce system bottlenecks, reduce resource occupation, and increase system response speed.

Database structure optimization

  • A good database design scheme for the performance of the database often get twice the result with half the effort.

  • You need to consider data redundancy, speed of query and update, and whether the data type of the field is reasonable.

Split a table with many fields into multiple tables

  • For a table with many fields, if some fields are used infrequently, you can separate these fields to form a new table.

  • Because when a table has a large amount of data, it is slowed down by the presence of infrequently used fields.

Add intermediate tables

  • For tables that require frequent joint queries, you can create intermediate tables to improve query efficiency.

  • By creating an intermediate table, you insert the data that needs to be queried through the federated query into the intermediate table, and then change the original federated query to a query against the intermediate table.

Add redundant fields

  • The design of data tables should follow the rules of the paradigm theory as far as possible, reduce the redundant fields as far as possible, and make the database design look delicate and elegant. However, reasonable addition of redundant fields can improve the query speed.

  • The more normalized a table is, the more relationships there are between tables, the more queries need to be joined, and the worse the performance.

Note:

If the value of a redundant field is changed in one table, you have to find a way to update it in another table, otherwise you will have data inconsistency problems.

MySQL database CPU up to 500%

  • When the CPU increases to 500%, run the top command of the operating system to check whether mysqld is occupied. If not, find out the processes with high CPU usage and handle the problem.

  • If mysqld is the cause, show processList to see if there is a session running in it. Find the high SQL consumption to see if the execution plan is accurate, if the index is missing, or if there is simply too much data.

  • In general, it is important to kill these threads (and see if CPU usage drops), and then re-run the SQL after making appropriate adjustments (such as adding indexes, changing SQL, changing memory parameters).

  • It is also possible that each SQL server does not consume a lot of resources, but all of a sudden, a large number of sessions are connected and the CPU spikes. In this case, you need to work with the application to analyze why the number of connections increases and adjust accordingly, such as limiting the number of connections

How to optimize the large table? How is cent library cent table done? What problem does cent table cent library have? Does middleware work? Do you know how they work?

When the number of MySQL single table records is too large, the CRUD performance of the database will be significantly reduced. Some common optimization measures are as follows:

  1. Limit the scope of data: prohibit queries that carry no data-limiting condition. For example, when users query order history, restrict it to one month by default;
  2. Read/write separation: the classical database split scheme, the master library is responsible for writing, the slave library is responsible for reading;
  3. Caching: Use MySQL’s cache. For heavy, less-updated data, you can consider using application-level caching.

There is also through the way of sub-database sub-table optimization, mainly vertical partition, vertical sub-table and horizontal partition, horizontal sub-table

1. Vertical partitioning

  • Split according to the correlation of the tables in the database. For example, if the user table contains both the user login information and the user’s basic information, you can split the user table into two separate tables, or even put them into separate libraries.

  • To put it simply, vertical splitting is the splitting of data table columns. A table with many columns is split into multiple tables. This should make it a little bit easier to understand.

  • Advantages of vertical splitting: smaller rows, fewer blocks read per query, fewer I/Os. In addition, vertical splitting simplifies the table structure and makes maintenance easier.

  • Disadvantages of vertical split: Redundant primary keys, need to manage redundant columns, and may cause Join operations, which can be solved by joining at the application layer. In addition, vertical partitioning makes transactions more complex;

2. Vertical table splitting

  • Put the primary key and some columns in one table, and the primary key and other columns in another table

Applicable scenario

  • 1. If some columns in a table are commonly used and others are not
  • 2, can make the data row smaller, a data page can store more data, reduce the number of I/O queries

disadvantages

  • Some sub-table strategies are based on the logical algorithm of the application layer. Once the logical algorithm changes, the whole sub-table logic will change, resulting in poor scalability
  • For the application layer, logical algorithms increase development costs
  • To manage redundant columns, the join operation is required to query all data

3. Horizontal partition

  • Keep the data table structure unchanged and store the data shards with some policy. In this way, each piece of data is dispersed to different tables or libraries, achieving the purpose of distribution. Horizontal splitting can support very large amounts of data.

  • Horizontal splitting refers to splitting table rows: when the number of rows in a table exceeds roughly 2 million, it slows down, and the table's data can then be split across multiple tables. For example, the user information table can be split into several user information tables, avoiding the performance impact of too much data in a single table.

  • Horizontal splitting can support very large data volumes. Note, however, that splitting tables alone only solves the problem of one table being too large; since the data still lives on the same machine it does little for MySQL concurrency, so it is best to also split across databases.

  • Horizontal splitting supports very large data volumes with relatively little application-side change, but sharded transactions are hard to solve, cross-node Join performance is poor, and the logic becomes complicated.

The author of "The Way to Train Java Engineers" recommends avoiding data sharding as much as possible because of the complexity of logic, deployment, and operation and maintenance. A typical data table can support less than 10 million data volumes with proper optimization. If sharding is necessary, choose client sharding architecture to reduce network I/O with middleware.

4. Horizontal table splitting:

  • If a table is very large, splitting reduces the number of data and index pages read per query and the number of index levels, improving query performance

Applicable scenario

  • 1, the data in the table itself is independent, for example, the table records the data of various regions or data of different periods, especially some data are commonly used, some are not commonly used.
  • 2. Data needs to be stored on multiple media.

Disadvantages of horizontal segmentation

  • 1. It adds complexity to the application: queries usually need multiple table names, and getting all the data requires a UNION
  • 2. In many database applications, this complexity outweighs the benefits it brings, since splitting saves at most one index level of disk reads per query

There are two common scenarios for database sharding:

  • Client proxy: the sharding logic lives on the application side, packaged in a jar, implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Ali's TDDL are two commonly used implementations.
  • Middleware proxy: a proxy layer is added between the application and the data, and the sharding logic is maintained centrally in the middleware service. Mycat, 360's Atlas, and NetEase's DDB are implementations of this architecture.

Problems faced after sub-database sub-table

  • Transactions spanning split databases and tables become distributed transactions. Relying on the database's own distributed transaction management is expensive in performance; having the application coordinate them into logical transactions instead imposes a programming burden.

  • Cross-database joins

    Once data is sharded, cross-node Join problems are unavoidable, though good design and sharding can reduce them. The common workaround is to query twice: find the ids of the associated rows in the first query's result set, then issue a second request with those ids to fetch the associated data.

  • count, order by, group by, and aggregate functions across nodes are one class of problem, because they must all be computed over the whole data set, and most proxies do not merge results automatically. Solution: as with cross-node joins, fetch results on each node separately and merge them on the application side. Unlike a join, each node's query can run in parallel, so this is often much faster than one huge table; but if the result set is large, application memory consumption becomes an issue.

  • Data migration, capacity planning, and expansion: the Taobao integrated business platform team exploits the forward-compatible property of taking remainders by multiples of 2 (for example, a number that leaves remainder 1 mod 4 also leaves remainder 1 mod 2) to allocate data, avoiding row-level data migration; but table-level migration is still needed, and both the number of expansions and the total table count are constrained. In general, none of these schemes is ideal; each has drawbacks, which reflects from one side how hard capacity expansion under Sharding really is.

  • ID problem

  • Once the database is sharded across multiple physical nodes, we can no longer rely on the database's own primary key generation. On one hand, an ID auto-generated by one shard is not guaranteed to be globally unique; on the other, the application needs the ID before inserting, for SQL routing. Some common primary key generation strategies:

    • UUID: using a UUID as the primary key is the simplest option, but the drawbacks are obvious: UUIDs are very long, so besides the storage cost, the main problem is the index — both building the index and querying through it perform poorly. In distributed systems there are many occasions that need a globally unique ID; Twitter's Snowflake meets this need, and the implementation is simple apart from configuration: the core is a 64-bit ID built from a millisecond timestamp (41 bits), a machine ID (10 bits), and a per-millisecond sequence number (12 bits).
  • Sorting paging issues across shards

    Paging generally sorts by a specified field. When the sort field is the shard field, the sharding rule easily locates the target shard; when it is not, things get complicated: to guarantee a correct final result, each shard node must sort and return its own data, and the application then merges and re-sorts the result sets from the different shards before returning them to the user.

MySQL replication principle and process

  • Master-slave replication: Transfer DDL and DML operations from the master database to the slave database using binary logs, and then re-execute (redo) the logs. This keeps the data from the slave database consistent with the master database.

The role of master-slave replication

  1. If the primary database has a problem, you can switch to the secondary database.
  2. Read/write separation can be performed at the database level.
  3. Daily backups can be made on a slave database.

MySQL master-slave replication solves a problem

  • Data distribution: Start or stop replication at will and distribute data backups across geographic locations
  • Load balancing: Reduce the stress on a single server
  • High availability and failover: Helps applications avoid single points of failure
  • Upgrade testing: you can use a higher version of MySQL as the slave library

MySQL master-slave replication works

  • The master records data changes in its binary log
  • The slave copies the master's binary log into its own relay log
  • The slave reads events from the relay log and replays them against its own data

The rationale for the flow, the three threads and the associations between them

  • Master: binlog thread — records every statement that changes data into the master's binlog;

  • Slave: I/O thread — after START SLAVE is issued, it pulls binlog content from the master and writes it into the slave's relay log;

  • Slave: SQL thread — executes the statements in the relay log;

Replication process

  • Binary log: Binary log of the primary database

  • Relay log: indicates the Relay log of the secondary server

  1. The master writes the operation record serially to a binlog file before each transaction updates the data.
  2. The slave starts an I/O thread, which opens an ordinary connection to the master; its main job is the binlog dump. If reading has caught up with the master, it sleeps and waits for the master to generate new events. The I/O thread's ultimate goal is to write these events to the relay log.
  3. The SQL Thread reads the relay log and executes the SQL events in the log in order to be consistent with the data in the primary database.
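
A minimal sketch of wiring up a slave with the classic (pre-8.0.23) commands; host, credentials, and binlog coordinates are placeholders:

    CHANGE MASTER TO
        MASTER_HOST = '192.168.0.10',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'repl_password',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS = 154;
    START SLAVE;

    -- Both Slave_IO_Running and Slave_SQL_Running should report Yes
    SHOW SLAVE STATUS\G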

What are the solutions for read/write separation?

  • Read/write separation depends on master-slave replication, and master-slave replication in turn serves read/write separation. A slave must be read-only, not written to (if you write to a slave, show slave status will report Slave_SQL_Running=NO, and you then have to resynchronize the slave manually as mentioned earlier).

Scheme 1

  • Use the mysql-proxy proxy

  • Advantages: implements read/write separation and load balancing directly, without code changes; the master and slave use the same account, which is not recommended in real production

  • Disadvantages: Reduced performance, no transaction support

Scheme 2

  • Use AbstractRoutingDataSource + AOP + annotations to decide the data source at the DAO layer.
  • With MyBatis, read/write separation can be put in the ORM layer: a mybatis plugin can intercept SQL statements, routing all insert/update/delete to the master and all select to the slave, transparently to the DAO layer. The plugin can pick the master or slave via annotations or by analyzing whether the statement reads or writes. One problem remains: transactions are not supported, so DataSourceTransactionManager must be overridden to send read-only transactions to the read library and any transaction containing writes to the write library.

Scheme 3

  • Use AbstractRoutingDataSource + AOP + annotations to decide the data source at the service layer; this can support transactions.

  • Disadvantage: AOP does not intercept internal calls between methods of the same class made via this.xx(), which needs special handling.

Backup plan, and how mysqldump and Xtrabackup work

  • (1) Backup plan

    • The backup plan depends on library size. For smaller libraries, mysqldump can take a full backup every day, because it is lightweight and flexible and its backup files are small once compressed.

    • For libraries over 100 GB or so, xtrabackup is much faster than mysqldump: take a full backup once a week and an incremental backup every day during off-peak hours.

  • (2) Backup and restoration time

    • Physical backup is fast, but logical backup is slow

    • These numbers depend on the machine, especially disk speed; the following are for reference only

    • 20 GB in 2 minutes (mysqldump)

    • 80 GB in 30 minutes (mysqldump)

    • 111 GB in 30 minutes (mysqldump)

    • 288 GB in 3 hours (xtrabackup)

    • 3 TB in 4 hours (xtrabackup)

    • The logical import time is usually five times longer than the backup time

  • (3) How to handle the backup and restoration failure

    • First of all, we should make full preparations before recovery to avoid mistakes during recovery. For example, the validity check, permission check, space check after backup. If any error occurs, adjust accordingly according to the error prompt.

(4) MysqlDump and Xtrabackup implementation principle

  • mysqldump

    Mysqldump is a logical backup. Adding the --single-transaction option gives a consistent backup: the background session sets the transaction isolation level to REPEATABLE READ (SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ) and then explicitly starts a transaction (START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */), which guarantees that all data read inside the transaction comes from the same snapshot. It then reads the data out of the tables. With --master-data=1, it first executes FLUSH TABLES WITH READ LOCK, records the binlog position with SHOW MASTER STATUS, releases the lock immediately, and then reads the tables. When all data has been dumped, the transaction ends.

  • Xtrabackup:

    Xtrabackup is a physical backup: it copies the tablespace files while scanning the redo logs. When the InnoDB part of the backup finishes, it performs a flush engine logs operation to ensure all redo logs have been flushed to disk (this involves the two-phase commit),

  • because Xtrabackup does not copy binlogs: if the redo did not reach disk, the last batch of committed transactions could be lost. That point in time is when InnoDB completes its backup; the data files are not consistent by themselves, but replaying the redo collected during the copy window makes them consistent. Finally, a flush tables with read lock is still taken for MyISAM and other engines, which yields a near-perfect hot backup.

What are the repair methods for data table corruption?

Repair with myisamchk:

  • 1. Stop the mysql service before repairing.
  • 2. Open a command line and change to mysql's /bin directory.
  • 3. Run myisamchk --recover <database path>/*.MYI

Alternatively, repair with the REPAIR TABLE or OPTIMIZE TABLE commands. REPAIR TABLE table_name repairs a damaged table. OPTIMIZE TABLE table_name reclaims unused space: disk space is not reclaimed immediately when rows are deleted, and OPTIMIZE TABLE rearranges the remaining rows and reclaims it (see the sketch below).
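
A minimal sketch from inside MySQL, assuming a hypothetical MyISAM table t:

    CHECK TABLE t;      -- diagnose corruption
    REPAIR TABLE t;     -- repair the damaged table
    OPTIMIZE TABLE t;   -- rearrange rows and reclaim unused space after large deletes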