This is the 25th day of my participation in the Gwen Challenge

Database knowledge carries real weight in interviews, especially for back-end development. Every serious interviewer will ask about indexes and transactions, so you must be fluent in them and not stop at basic CRUD.

1. The three normal forms of database design

First normal form (1NF): every column must be the smallest unit of data, i.e. atomic and indivisible.

For example, in a student table (student ID, name, gender, date of birth), the date of birth can still be split into (birth year, birth month, birth day), so by this example's standard the table does not conform to first normal form.

Second normal form (2NF): on top of 1NF, every non-key column must depend on the entire primary key.

For example, in a student table (student ID, name, course ID, credits), name depends only on student ID and credits depend only on course ID. Since 2NF requires every non-key field to depend on the whole primary key (here the composite key student ID + course ID), this table does not conform to second normal form.

Possible problems: (1) Data redundancy: the same information is repeated across records, for example when many students take the same course. (2) Deletion anomaly: deleting the students also deletes the corresponding course information. (3) Insertion anomaly: a student who has not chosen a course cannot be inserted. (4) Update anomaly: adjusting a course's credits may require updating every row that mentions it.

The correct design: a student table (student ID, name), a course table (course ID, credits), and an enrollment table (student ID, course ID, grade).
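The normalized design can be sketched with three tables and a join. This uses Python's sqlite3 for portability (MySQL DDL differs slightly), and every table and column name here is illustrative:

```python
import sqlite3

# Normalized schema: credits live only in the course table, so updating
# a course's credits touches exactly one row.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, credits INTEGER);
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  INTEGER REFERENCES course(course_id),
    grade      INTEGER,
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Alice');
INSERT INTO course  VALUES (10, 3);
INSERT INTO enrollment VALUES (1, 10, 90);
""")
# Join the three tables back together to answer a query.
cur.execute("""
SELECT s.name, c.credits, e.grade
FROM enrollment e
JOIN student s ON s.student_id = e.student_id
JOIN course  c ON c.course_id  = e.course_id
""")
row = cur.fetchone()
print(row)  # ('Alice', 3, 90)
```

This also shows the trade-off from question 2 below: the data is clean, but reading it back costs a multi-table join.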

Third normal form (3NF): on top of 2NF, every column must depend on the primary key directly, not transitively.

For example, a student table (student ID, name, age, college name, college phone number).

Here there is a transitive dependency: student ID -> college name -> college phone number. Third normal form requires that no field be derivable from another non-key field, i.e. no redundancy through transitive dependencies.

Possible problems: (1) Data redundancy: the college information is stored repeatedly. (2) Update anomaly: changing the college information may require updating many records, and missing one leaves the data inconsistent.

The correct design: a student table (student ID, name, age, college name) plus a college table (college name, phone number).

2. Describe the advantages and disadvantages of normalization and denormalization

Normalization

Advantages: (1) Less data redundancy. (2) With little repeated data in a table, updates are relatively fast. (3) Normalized tables are usually smaller than denormalized ones.

Disadvantages: (1) Queries often need joins across multiple tables, which lowers query efficiency. (2) Index optimization becomes harder.

Denormalization

Advantages: (1) Fewer table joins. (2) Easier index optimization.

Disadvantages: (1) Data redundancy. (2) Modifying data costs more, since the redundant copies must be kept in sync.

3. What is the difference between a B tree and a B+ tree?

(1) Non-leaf nodes of a B+ tree store only keys and pointers to child nodes, while the nodes of a B tree also store data; at the same node size, a B+ tree can therefore hold more keys and index more data. (2) The leaf nodes of a B+ tree hold all the keys and data and are chained together in a linked list, which makes range lookups fast. (3) Because B+ tree non-leaf nodes store no data, every lookup descends to a leaf and the query cost is a stable O(log N); in a B tree the cost varies, with a best case of O(1) when the key sits near the root.
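Point (2), the leaf-level linked list, is what makes B+ tree range scans cheap. A toy sketch of just that chained-leaves idea (not a real B+ tree, which would also have the interior key/pointer levels):

```python
# Simplified sketch: B+ tree leaves hold sorted (key, value) pairs and are
# chained, so a range scan walks the leaf list instead of re-descending
# the tree for every key.
class Leaf:
    def __init__(self, entries):
        self.entries = entries  # sorted (key, value) pairs
        self.next = None        # link to the next leaf

def range_scan(first_leaf, lo, hi):
    """Collect keys in [lo, hi] by following leaf links."""
    out, leaf = [], first_leaf
    while leaf:
        for key, _ in leaf.entries:
            if lo <= key <= hi:
                out.append(key)
            elif key > hi:      # past the range: stop early
                return out
        leaf = leaf.next
    return out

l1 = Leaf([(1, 'a'), (3, 'b')])
l2 = Leaf([(5, 'c'), (7, 'd')])
l1.next = l2
print(range_scan(l1, 3, 5))  # [3, 5]
```

In a B tree, the same range query would have to revisit interior nodes between keys, since data can live at any level.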

4. Why is a B+ tree better suited than a B tree for database indexes, and where else are B trees used besides database indexes (e.g. operating system file indexes)?

Because both the leaf and non-leaf nodes of a B tree store data, each node can hold fewer keys and pointers. Storing a large amount of data therefore forces the tree to grow taller, which means more disk I/O per lookup and lower query performance.

Besides database indexes, operating system file indexes also use B trees.

5. Clustered index and non-clustered index

(1) A clustered index, also called the primary key index: each table has only one, and its leaf nodes store the primary key value together with the row data. (2) A non-clustered index, also called a secondary index: its leaf nodes store the indexed column's value plus the primary key value.

6. Prefix indexes and covering indexes

(1) For columns with long values, such as TEXT, BLOB, or long VARCHAR, you can create a prefix index, i.e. index only the first part of the value. This saves space and speeds up lookups. However, a prefix index cannot be used for ORDER BY or GROUP BY, nor can it serve a covering scan.

(2) A covering index lets the columns a query selects be read straight from the index, with no need to go back to the table. An index that contains (covers) all the fields a query selects and filters on is called a covering index.

When a query is covered by an index, you will see Using index in the Extra column of EXPLAIN.
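A minimal way to see a covering index in action. This sketch uses SQLite via Python for portability; SQLite's query plan reports "COVERING INDEX" where MySQL's EXPLAIN would show "Using index" in Extra, and the index name idx_ab is illustrative:

```python
import sqlite3

# Both the filter column (a) and the selected column (b) are inside the
# index, so the query never has to touch the table itself.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
cur.execute("CREATE INDEX idx_ab ON t (a, b)")
plan = cur.execute("EXPLAIN QUERY PLAN SELECT b FROM t WHERE a = 1").fetchall()
print(plan[0][-1])  # e.g. "SEARCH t USING COVERING INDEX idx_ab (a=?)"
```

In MySQL the equivalent check is `EXPLAIN SELECT b FROM t WHERE a = 1;` and looking for Using index in the Extra column.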

7. Introduce database transactions

A transaction is a sequence of operations that either all execute or none execute.

Transactions have four properties: A (atomicity), C (consistency), I (isolation), and D (durability).

8. What isolation levels does MySQL have, and what problems do dirty reads, non-repeatable reads, and phantom reads cause? How are they solved?

MySQL has four isolation levels: read uncommitted, read committed, repeatable read (InnoDB's default), and serializable.

Dirty read: with two transactions A and B, A reads a field that B has modified but not yet committed. If B rolls back, the value A read is invalid. Raising the isolation level to read committed solves dirty reads.

Non-repeatable read: A reads a field value, B updates it and commits, and when A reads the field again it gets a different value than before. Raising the isolation level to repeatable read solves this.

Phantom read: A reads the records in a range, B inserts a new record into that range and commits, and when A reads the range again, extra "phantom" rows appear. Raising the isolation level to serializable solves it, as does InnoDB's MVCC + next-key locking.

9. Under the repeatable read isolation level, will MySQL produce phantom reads? Why?

Generally no: MySQL's default isolation level is RR (repeatable read), and InnoDB uses MVCC + next-key locks to prevent phantom reads.

PS: Strictly speaking, phantom reads can still happen. Suppose A starts a transaction and runs a query; B then starts a transaction, inserts a new row, and commits. If A queries again, it sees no phantom. But if A now updates that row, the update stamps the record B created with A's transaction version number. Since the visibility rule is "the delete version is empty or greater than the current transaction version, and the create version is less than or equal to the current transaction version", the row A just updated clearly satisfies the query conditions, so A's next query will return it.

Reference on the MVCC version number theory: www.cnblogs.com/shujiying/p… A detailed hands-on test: blog.csdn.net/w139074301/…

10. How are MySQL transactions implemented?
  • Atomicity: implemented with the undo log. Every data change also writes an undo log record; on a system error or an explicit rollback, the undo log is replayed in reverse.
  • Durability: implemented with the redo log, which records data modifications. Data lives in the buffer pool and is synchronized to disk periodically; if the system crashes, the redo log is read on restart to recover the records.
  • Isolation: implemented with MVCC plus the next-key locking mechanism.
  • Consistency: guaranteed by the three properties above.
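The undo-log idea behind atomicity can be sketched in a few lines: record the inverse of each change, and replay the records backwards on rollback. This is a toy model of the principle only, not InnoDB's actual log format (the dict "table" and names are illustrative):

```python
# Toy undo log: before each change, remember the old value (the inverse
# operation); rollback replays the log in reverse order.
table = {"balance": 100}
undo_log = []

def update(key, value):
    undo_log.append((key, table[key]))  # record how to undo this change
    table[key] = value

update("balance", 50)
update("balance", 20)

# Rollback: apply undo records newest-first, restoring the original state.
for key, old in reversed(undo_log):
    table[key] = old
print(table["balance"])  # 100
```

The redo log is the mirror image: it records the *new* values so committed changes can be replayed forward after a crash.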
11. What is the difference between the binlog and the redo log?

The binlog is a binary log that records all changes to the database, excluding SELECT and SHOW operations, which do not modify data. A statement that operates on data without actually changing it is still recorded in the binlog. It is commonly used for data recovery and backup.

The redo log is a log file that records the changes made by transactions, whether or not they have committed. On instance or media failure, the InnoDB storage engine replays the redo log to restore the previous state and guarantee data integrity.

For a deeper comparison of the three MySQL logs (binlog, undo log, redo log): segmentfault.com/a/119000002…

12. Talk about MVCC (multi-version concurrency control)

MVCC is implemented by adding two hidden columns to every row: one holds the version at which the row was created, the other the version at which it was deleted. What is stored is not an actual time but a system version number, which increments every time a new transaction starts. The version number at the start of a transaction serves as that transaction's version and is compared against the version numbers of each row the transaction reads.

  • select

InnoDB checks each row against two criteria: (1) InnoDB only returns rows whose create version number is less than or equal to the current transaction's version number. This ensures the rows the transaction reads either existed before the transaction started or were inserted or modified by the transaction itself. (2) The row's delete version number is either undefined or greater than the current transaction's version number. This ensures the rows the transaction reads were not deleted before the transaction began.

Only the records that meet the preceding conditions are returned as the query results.
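The two SELECT visibility rules can be written as a single predicate. A toy model (the field names and sample rows are illustrative, not InnoDB's internal layout):

```python
# MVCC visibility sketch: a row is visible to a transaction if it was
# created at or before the transaction's version, and not deleted at or
# before it.
def visible(row, txn_version):
    created_ok = row["created"] <= txn_version
    not_deleted = row["deleted"] is None or row["deleted"] > txn_version
    return created_ok and not_deleted

rows = [
    {"id": 1, "created": 5,  "deleted": None},  # existed before txn 10
    {"id": 2, "created": 12, "deleted": None},  # inserted by a later txn
    {"id": 3, "created": 3,  "deleted": 8},     # deleted before txn 10
    {"id": 4, "created": 3,  "deleted": 15},    # deleted by a later txn
]
seen = [r["id"] for r in rows if visible(r, 10)]
print(seen)  # [1, 4]
```

The insert/delete/update rules below are exactly the operations that maintain these two version columns.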

  • insert

InnoDB stores the current system version number as the row version number for each row inserted.

  • delete

InnoDB stores the current system version number as a row deletion identifier for each row that is deleted.

  • update

InnoDB inserts a new record, saves the current system version number as the row version number, and saves the current system version number to the original row as the row deletion mark.

13. What is the difference between Innodb and MyISAM

(1) InnoDB supports transactions; MyISAM does not. (2) InnoDB supports foreign keys; MyISAM does not. (3) InnoDB's primary key index stores the row data in its leaf nodes, and its secondary indexes store the primary key value; MyISAM's primary and secondary indexes both store pointers to the data file in their leaf nodes. (4) InnoDB does not keep the table's row count, so SELECT COUNT(*) FROM tb requires a full scan; MyISAM keeps the row count of the whole table in a variable and simply reads it, which is fast. (5) An InnoDB table is stored on disk in one file; a MyISAM table uses three files. (6) InnoDB needs more memory and storage; MyISAM tables can be compressed and take less space. (7) Migrating InnoDB means copying files, backing up the binlog, or using mysqldump, which is harder; MyISAM data is stored as plain files, so individual tables can be handled separately during backup and restore. (8) InnoDB supports both row locks and table locks; MyISAM supports only table locks. (9) InnoDB did not support full-text indexes until version 5.7; MyISAM supports full-text indexes.

14. What is InnoDB's default lock granularity, and how is it implemented?

InnoDB's default locking is row-level, implemented by locking index entries on the indexes.

15. How to deal with DDL efficiently

DDL statements are data definition statements, i.e. creating tables, views, and indexes. For example, when defining data fields, follow the smallest-sufficient-type rule, and design tables to reduce coupling.

16. How do you rebuild indexes and maintain tables in MySQL?

Options include: ALTER TABLE xxx ENGINE = InnoDB (rebuilds the table and its indexes in place); REPAIR TABLE xxx (for MyISAM tables); OPTIMIZE TABLE xxx (reclaims unused space and updates index statistics).

17. For a multi-column index, when can the index be used, and when can it not?

The index is used when the query follows the leftmost-prefix rule, i.e. its conditions start from the first column of the composite index without skipping columns. Common cases where the index is not used: a VARCHAR column compared to an unquoted number (implicit type conversion); a function or arithmetic applied to the indexed column; a LIKE pattern with a leading wildcard; and conditions using NOT, <>, or !=, which invalidate the index.

18. Why is using a database index more efficient, and under what circumstances is it not?

By default a SQL statement performs a full table scan, adding each row that matches the condition to the result set. With an index, the engine can locate the rows holding a specific value directly, reducing the number of rows it must examine. An index turns unordered data into a relatively ordered data structure.
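The intuition can be shown with sorted data: a full scan pays a cost proportional to the position of the match, while an ordered "index" answers with binary search in about log2(N) probes. A sketch using Python's bisect (the data and target are illustrative):

```python
import bisect

# One million sorted keys, as an index would keep them.
data = list(range(1_000_000))
target = 987_654

# Full scan: the match is found only after examining ~target rows.
scan_steps = data.index(target) + 1

# "Index" lookup: binary search needs only ~20 probes for a million keys.
i = bisect.bisect_left(data, target)
assert data[i] == target
print(scan_steps)  # 987655 comparisons for the scan
```

Real B+ tree indexes fan out much wider than a binary search, so the number of disk pages touched is even smaller, typically 3 to 4 for millions of rows.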

The cases where an index does not help are the ones listed in question 17 above.

19. Usage scenarios of shared locks and exclusive locks

UPDATE, INSERT, and DELETE take an exclusive lock by default; a plain SELECT takes no lock.

Shared lock: use the syntax SELECT * FROM tb LOCK IN SHARE MODE. The locking transaction can read the rows, and other transactions can also read them (or add their own shared locks), but no other transaction can modify them.

Exclusive lock: use the syntax SELECT * FROM tb FOR UPDATE. The locking transaction can insert, delete, update, and read the rows; other transactions cannot acquire any lock on them.
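The shared-versus-exclusive semantics can be sketched with a toy reader-writer lock in Python. This illustrates only the compatibility rules (many shared holders, one exclusive holder), not how InnoDB actually implements its locks:

```python
import threading

# Toy reader-writer lock: any number of readers may hold the shared lock
# at once, but exclusive access requires that nobody else holds anything.
class RWLock:
    def __init__(self):
        self._readers = 0
        self._lock = threading.Lock()        # guards the reader count
        self._write_lock = threading.Lock()  # held while reads or a write are in progress

    def acquire_shared(self):
        with self._lock:
            self._readers += 1
            if self._readers == 1:           # first reader blocks writers
                self._write_lock.acquire()

    def release_shared(self):
        with self._lock:
            self._readers -= 1
            if self._readers == 0:           # last reader lets writers in
                self._write_lock.release()

    def acquire_exclusive(self):
        self._write_lock.acquire()

    def release_exclusive(self):
        self._write_lock.release()

rw = RWLock()
rw.acquire_shared()
rw.acquire_shared()  # a second "reader transaction" is allowed immediately
got_write = rw._write_lock.acquire(blocking=False)
print(got_write)     # False: exclusive access is blocked while readers hold it
rw.release_shared()
rw.release_shared()
```

This mirrors the table above: shared locks coexist with other shared locks, while an exclusive lock excludes everyone.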

20. Advantages and disadvantages of relational and non-relational databases

(1) Relational database

  • Advantages:

Two-dimensional tables are easy to understand, easy to operate on, and easy to maintain. SQL is supported.

  • Disadvantages:

Read/write performance is relatively poor. The fixed table structure is not flexible. Under high concurrency, disk I/O becomes a bottleneck. Read/write performance on massive data is poor.

(2) non-relational database

  • Advantages:

No SQL parsing is required, so read/write performance is high. Either memory or disk can serve as the storage medium, enabling high speed. Data is stored as key-value pairs with no coupling, making it easy to scale out. Deployment is simple.

  • Disadvantages:

SQL is not supported, which raises the learning cost. Transactions are not supported.

21. What causes slow queries in MySQL?

When a query's response time exceeds the threshold, it is recorded in the slow query log. The following conditions generally cause slow queries.

(1) No index exists, or the query does not use one. (2) I/O throughput is too low. (3) Insufficient memory. (4) Network latency. (5) The query touches too much data. (6) Lock contention or deadlock. (7) Unnecessary rows or columns are returned. (8) The query statement itself is poorly written and needs optimizing.

Slow query logs are disabled by default. Do not enable slow query logs if it is not necessary. Otherwise, performance will be affected.

Use SHOW VARIABLES LIKE 'slow_query%'; to inspect the settings:

slow_query_log: whether the slow query log is enabled.

slow_query_log_file: the location of the slow query log; open that file with a text editor to read the logged queries.

22. How do you generally deal with slow queries?

(1) Put data, logs, and indexes on different I/O devices to increase read speed. (2) Split tables vertically and horizontally to keep them small. (3) Upgrade the hardware. (4) Build indexes to match the query conditions and optimize existing ones. (5) Improve the network. (6) Expand server memory. (7) Shard across databases and tables.

23. What is the difference between VARCHAR and CHAR?

VARCHAR is variable-length, sized by what is stored. CHAR is fixed-length, padded with spaces when the value is shorter.

24. The advantages and disadvantages of database foreign keys

Advantages: (1) Data consistency and integrity are guaranteed to the greatest extent. (2) The ER diagram becomes more readable.

Disadvantages: (1) They slow down data operations. (2) They raise development difficulty and lead to more tables.

25. Have you used database views?

Create a view with CREATE VIEW view_name AS SELECT * FROM tb. Modify one with CREATE OR REPLACE VIEW view_name AS SELECT * FROM tb. Query an ordinary view with SELECT * FROM view_name. Drop one with DROP VIEW view_name.
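The same lifecycle can be run end to end in SQLite via Python (SQLite lacks MySQL's CREATE OR REPLACE VIEW, so modification is DROP + CREATE; the table, data, and view name are illustrative):

```python
import sqlite3

# Create a table, define a view over it, query the view, then drop it.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE tb (id INTEGER, name TEXT);
INSERT INTO tb VALUES (1, 'a'), (2, 'b');
CREATE VIEW view_name AS SELECT * FROM tb WHERE id > 1;
""")
rows = cur.execute("SELECT * FROM view_name").fetchall()
print(rows)  # [(2, 'b')]
cur.execute("DROP VIEW view_name")
```

A view stores only the SELECT definition, not data, so it always reflects the current contents of tb.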

Basic concepts of database views and how they work: blog.csdn.net/buhuikanjia…

26. How should primary key IDs be generated when inserting data into MySQL?

(1) Single instance or single node group: auto-increment IDs are recommended; they perform better. (2) Distributed scenarios: in a small deployment of up to about 20 nodes, UUIDs are recommended; in a medium deployment of 20 to 200 nodes, auto-increment IDs with a per-node step offset are recommended; beyond 200 nodes, globally unique auto-increment IDs from the snowflake algorithm are recommended.

27. What data types does MySQL have?

(1) Integer types: BIT, BOOL, TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT. (2) Floating-point types: FLOAT, DOUBLE, DECIMAL. (3) String types: CHAR, VARCHAR, TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB. (4) Date types: DATE, DATETIME, TIMESTAMP, TIME, YEAR.

When choosing a type, prefer the smallest one that fits.

(1) When using CHAR or VARCHAR, note that CHAR strips trailing spaces on retrieval. (2) When using TEXT or BLOB, periodically reclaim fragmented space with OPTIMIZE TABLE. (3) Floating-point numbers lose precision; prefer DECIMAL where possible.
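Point (3) is easy to demonstrate: binary floats cannot represent 0.1 exactly, while DECIMAL-style fixed-point arithmetic (Python's decimal module here, standing in for MySQL's DECIMAL) can:

```python
from decimal import Decimal

# FLOAT/DOUBLE are binary and cannot store 0.1 exactly; fixed-point
# decimal arithmetic keeps the value exact.
float_ok = (0.1 + 0.2 == 0.3)
decimal_ok = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
print(float_ok, decimal_ok)  # False True
```

This is why money columns are declared DECIMAL rather than FLOAT or DOUBLE.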

28. What MySQL clustering schemes are there, and what scenarios does each suit?

(1) Schemes: LVS + Keepalived + MySQL; DRBD + Heartbeat + MySQL; MySQL + Proxy; MySQL Cluster; MySQL + MHA; MySQL + MMM.

(2) Scenarios: when no data sharding is needed, use MHA, Keepalived, or Heartbeat; when data sharding is needed, use Cobar; when sharding plus read/write splitting is needed, use Amoeba.

29. How do you ensure strong consistency between the primary and the replica in MySQL?

Master/slave replication principle: the master executes writes and records them in its log; the slave replays the operations from that log to reproduce the master's data.

This process can leave the master and slave inconsistent when: (1) the master writes the data but fails to write the log; (2) the slave fails to replay the data from the log.

Solutions: (1) Modify the configuration on the master:

innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

The two options above ensure that every transaction is flushed to disk in real time when it commits, and in particular that each transaction's binlog is flushed to disk in time. (2) Modify the configuration on the slave:

master_info_repository = "TABLE"
relay_log_info_repository = "TABLE"
relay_log_recovery = 1

The first two options make the slave's replication metadata tables use the InnoDB engine, so they are protected by InnoDB transaction safety. The last option enables automatic relay log recovery: after a crash, the slave determines which relay logs must be re-fetched from the master and replayed, avoiding the loss of partially applied data.

30. How does a MySQL cluster ensure primary/replica availability?

Use an HA tool, deployed on a third server, that connects to both the primary and the replica and checks whether each is alive. If the primary goes down, the replica is promoted to primary in time, and the original primary is demoted to a replica.

31. What solutions are there for MySQL read/write splitting?

(1) Configure multiple data sources. (2) Use middleware proxy.