1. Primary key, superkey, candidate key, and foreign key

Primary key:

A combination of data columns or attributes in a database table that uniquely and completely identifies a stored data object. A table can have only one primary key, and the value of the primary key cannot be missing, that is, cannot be NULL.

Superkey:

A set of attributes that uniquely identifies a tuple in a relation is called a superkey of the relational schema. A single attribute can serve as a superkey, as can a combination of attributes. Superkeys include candidate keys and primary keys.

A candidate key:

Is a minimal superkey, that is, a superkey with no redundant attributes.

Foreign keys:

A column in one table that is the primary key of another table is called a foreign key of the first table.

2. Four characteristics of database transactions and their meanings

ACID names the four properties required for the correct execution of a database transaction: Atomicity, Consistency, Isolation, Durability.

Atomicity: all operations in the transaction either complete or none do; the transaction cannot stop at some intermediate stage. If a transaction fails during execution, it is rolled back to the state before it began, as if it had never been executed.

Consistency: the database's integrity constraints hold both before and after the transaction.

Isolation: a transaction executes as if it were the only operation the system is performing at that time. If two transactions run concurrently and perform the same function, isolation ensures that each transaction sees itself as the sole user of the system. This property is sometimes called serializability: to prevent transaction operations from interfering with each other, requests against the same data must be serialized so that only one operates on that data at a time.

Durability: once a transaction completes, the changes it made to the database persist and are not rolled back.
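Atomicity can be seen directly in code. The sketch below uses Python's sqlite3 module; the `accounts` table and the simulated failure are illustrative, not from the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 100)")
conn.commit()

try:
    with conn:  # opens a transaction; rolls back automatically on exception
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("simulated failure mid-transaction")
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")  # never reached
except RuntimeError:
    pass

# Both balances are unchanged: the partial UPDATE was rolled back.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 100}
```

Because the transaction failed midway, neither account changed: the database returned to the state before the transaction began, exactly as the atomicity property describes.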

3. What does a view do? Can a view be changed?

A view is a virtual table. Unlike a table, it contains no data of its own, only a query that retrieves data dynamically when the view is used. Views simplify complex SQL operations, hide details, and protect data. Once created, a view can be used in much the same way as a table. A view cannot be indexed or have associated triggers or default values, and an ORDER BY inside the view is overridden if the query against the view has its own ORDER BY. A view is created with: CREATE VIEW view_name AS <query>. Some views can be updated, provided they do not use joins, subqueries, grouping, DISTINCT, or unions; an update to such a view updates the base table. But views are used mainly to simplify retrieval and protect data, not for updates, and most views are not updatable.
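A minimal sketch of the "stored query, not stored data" behavior, using sqlite3 (the `orders` table and `big_orders` view names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 'lee', 90), (2, 'zhang', 100), (3, 'wang', 30)])

# The view hides the filtering detail; it holds no rows of its own.
conn.execute("CREATE VIEW big_orders AS "
             "SELECT customer, amount FROM orders WHERE amount >= 90")

customers = conn.execute(
    "SELECT customer FROM big_orders ORDER BY customer").fetchall()
print(customers)  # [('lee',), ('zhang',)]

# The view re-runs its query each time it is used,
# so rows newly inserted into the base table appear in it.
conn.execute("INSERT INTO orders VALUES (4, 'chen', 200)")
count_after = conn.execute("SELECT COUNT(*) FROM big_orders").fetchone()[0]
print(count_after)  # 3
```

Note that SQLite views are read-only, which is consistent with the point above that most views are not updatable.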

4. Differences between DROP, DELETE, and TRUNCATE

TRUNCATE deletes all data from a table and resets any auto-increment counter, so IDs start again from 1. DELETE also deletes data from a table.

(1) The DELETE statement deletes one row at a time and logs each row deletion as part of a transaction, so the operation can be rolled back. TRUNCATE TABLE deletes all data from the table at once; individual row deletions are not logged, and the deleted rows cannot be restored. No delete triggers related to the table are activated during the process, and execution is fast.

(2) Space occupied by tables and indexes. After a table is truncated, the space occupied by the table and its indexes returns to the initial size. DELETE does not reduce the space occupied by the table or its indexes. The DROP statement frees all space occupied by the table.

(3) In terms of speed, generally: DROP > TRUNCATE > DELETE.

(4) Scope of application. TRUNCATE applies only to tables; DELETE can target a table or a view.

(5) TRUNCATE and DELETE only DELETE data, while DROP deletes the entire table (structure and data).

(6) TRUNCATE and DELETE without WHERE delete only data, not the table structure. DROP also removes dependent objects such as triggers and indexes; stored procedures and functions that depend on the table are retained but marked INVALID.

(7) The DELETE statement is data manipulation language (DML). The operation is placed in the rollback segment and takes effect after the transaction commits; if a corresponding trigger exists, it fires at execution time.

(8) TRUNCATE and DROP are data definition language (DDL). They take effect immediately, and the original data is not placed in the rollback segment, so it cannot be rolled back.

(9) Without a backup, use DROP and TRUNCATE with caution. To delete some rows, use DELETE, taking care to constrain the scope of influence with a WHERE clause, and make sure the rollback segment is large enough. To remove a table, use DROP. To keep a table but delete its data, use TRUNCATE if the deletion is not tied to a transaction; if it is, or if you want triggers to fire, use DELETE.

(10) TRUNCATE TABLE is fast and efficient. TRUNCATE TABLE is functionally the same as a DELETE statement without a WHERE clause: both delete all rows in the table. However, TRUNCATE TABLE is faster and uses fewer system and transaction-log resources. DELETE removes one row at a time and records an entry in the transaction log for each deleted row; TRUNCATE TABLE deletes the data by deallocating the data pages used to store it, and only the page deallocations are recorded in the transaction log.

(11) TRUNCATE TABLE deletes all rows in the table, but the table structure, columns, constraints, indexes, and so on remain unchanged. The counter used for new identity values is reset to the column's seed; to preserve the identity counter, use DELETE instead. To remove the table definition along with its data, use the DROP TABLE statement.

(12) TRUNCATE TABLE cannot be used on a table referenced by a FOREIGN KEY constraint; use a DELETE statement without a WHERE clause instead. Because TRUNCATE TABLE is not logged row by row, it does not activate triggers.
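The data-versus-structure distinction can be sketched with sqlite3. Note that SQLite has no TRUNCATE statement; a DELETE without WHERE plays that role there (SQLite applies an internal "truncate optimization" to it), so this sketch contrasts DELETE (data gone, structure kept) with DROP (table gone):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [("a",), ("b",), ("c",)])

conn.execute("DELETE FROM t")  # removes every row...
rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(rows)  # 0 -- ...but the table itself still exists and can be queried

conn.execute("DROP TABLE t")   # removes structure and data together
table_gone = False
try:
    conn.execute("SELECT * FROM t")
except sqlite3.OperationalError:
    table_gone = True
print(table_gone)  # True -- the table no longer exists
```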

5. How indexes work and their types

A database index is a sorted data structure in a database management system that helps quickly query and update data in a database table. Indexes are usually implemented using B-trees and their variant, B+ trees.

In addition to the data, the database system maintains data structures that satisfy specific lookup algorithms, and that reference (point to) the data in a way that makes it possible to implement advanced lookup algorithms on these data structures. This data structure is called an index.

Indexing tables comes at a cost in terms of increased database storage and the time it takes to insert and modify data (because indexes change accordingly).

![A data table and a binary search tree index built over Col2](p3-juejin.byteimg.com/tos-cn-i-k3…)

The figure shows one possible way of indexing. On the left is the data table, with two columns and seven records; at the far left is the physical address of each record (note that logically adjacent records are not physically adjacent on disk). To speed up lookups on Col2, a binary search tree can be maintained as shown on the right. Each node contains an index key value and a pointer to the physical address of the corresponding data record, so binary search can retrieve the corresponding data within O(log2 n) complexity.

Creating indexes can greatly improve system performance.

First, by creating unique indexes, you ensure that each row of data in a database table is unique.

Second, it can greatly speed up the retrieval of data, which is the main reason for creating indexes.

Third, it can speed up joins between tables, especially in terms of achieving referential integrity of data.

Fourth, when using grouping and sorting clauses for data retrieval, you can also significantly reduce the grouping and sorting time in queries.

Fifth, with indexes available, the query optimizer can use them during query processing to improve system performance.

One might ask: with all the advantages of adding indexes, why not create an index for every column in the table? There are many downsides to increasing indexes.

First, creating and maintaining indexes takes time, which increases with the volume of data.

Second, an index occupies physical space: besides the data space taken by the table itself, each index takes a certain amount of physical space, and if a clustered index is built, the space required is larger still.

Third, indexes are maintained dynamically when adding, deleting, or modifying data in a table, which slows down data maintenance.

An index is built over certain columns in a database table. When creating an index, you should consider which columns are good candidates and which are not. **In general, indexes should be created on these columns:**

- Columns that are frequently searched, to speed up searches;
- Primary key columns, to enforce uniqueness and organize the arrangement of data in the table;
- Columns frequently used in joins, mainly foreign keys, to speed up joins;
- Columns that often need to be searched by range, because an index is sorted and a range maps to a contiguous span of it;
- Columns that often need to be sorted, because an index is already sorted, so queries can use it to reduce sort time;
- Columns frequently used in WHERE clauses, to speed up the evaluation of conditions.
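The effect of indexing a frequently searched column can be observed through SQLite's EXPLAIN QUERY PLAN, which reports the access path the optimizer chooses. The `emp` table, its columns, and the index name below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, lname TEXT, dept INTEGER)")
conn.executemany("INSERT INTO emp (lname, dept) VALUES (?, ?)",
                 [(f"name{i}", i % 10) for i in range(1000)])

# Without an index on lname, the query must scan the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE lname = 'name500'").fetchall()
print(plan_before[0][3])  # a full scan of emp

conn.execute("CREATE INDEX idx_emp_lname ON emp (lname)")

# With the index, the optimizer searches the index instead of scanning.
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE lname = 'name500'").fetchall()
print(plan_after[0][3])   # a search using idx_emp_lname
```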

Also, indexes should not be created for some columns. In general, columns that should not be indexed have the following characteristics:

First, you should not create indexes on columns that are rarely used or referenced in queries. Since these columns are seldom used, an index does not improve query speed; on the contrary, the added index slows system maintenance and increases space requirements.

Second, you should not add indexes to columns that have very few distinct values. Because such columns, such as the gender column of a personnel table, match a large proportion of the table's rows in any query result, adding an index does not significantly speed up retrieval.

Third, you should not add indexes to columns defined as text, image, or bit data types, because these columns are either very large or have very few distinct values.

Fourth, indexes should not be created when modification performance matters far more than retrieval performance, because the two are at odds: adding indexes improves retrieval but degrades modification, and removing indexes does the reverse.

Depending on the capabilities of the database, you can create three types of indexes in the database designer: unique indexes, primary key indexes, and clustered indexes.

Unique indexes

A unique index is one in which no two rows are allowed to have the same index value.

When duplicate key values already exist in the data, most databases do not allow a new unique index to be created on the table, and the database may also prevent the addition of new rows that would create duplicate key values. For example, if a unique index is created on the employee's last name (lname) in the Employee table, no two employees can have the same last name.

Primary key indexes

A database table often has a column or combination of columns whose value uniquely identifies each row in the table. This column is called the primary key of the table. Defining a primary key for a table in a database diagram automatically creates a primary key index, a specific type of unique index. This index requires that each value in the primary key be unique. It also allows fast access to data when the primary key is used in a query.

Clustered indexes

In a clustered index, the physical order of the rows in the table is the same as the logical (index) order of the key values. A table can contain only one clustered index.

If an index is not a clustered index, the physical order of the rows in the table does not match the logical order of the key values. Clustered indexes generally provide faster data access than non-clustered indexes.
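The uniqueness enforcement described above can be sketched with sqlite3 (the `employee` table and index name are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, lname TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_employee_lname ON employee (lname)")
conn.execute("INSERT INTO employee (lname) VALUES ('smith')")

# A second row with the same indexed value is rejected.
try:
    conn.execute("INSERT INTO employee (lname) VALUES ('smith')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)  # False -- no two rows may share an index value
```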

Locality principle and disk prefetch

Due to the characteristics of the storage medium, disk access is much slower than main memory access; combined with the cost of mechanical movement, disk access speed is often only a few hundredths of that of main memory. So to improve efficiency, disk I/O must be minimized. To achieve this, the disk does not read data strictly on demand; instead, it prereads on every access. Even if only one byte is required, the disk reads a certain length of data forward from that position into memory. The theory behind this is the famous principle of locality in computer science: when a piece of data is used, nearby data is usually used soon after, and the data a running program needs is usually concentrated.

Because sequential disk reads are efficient (no seek time and very little rotational delay), prereading can improve I/O efficiency for programs with good locality.

The prefetch length is an integer multiple of a page. A page is a logical block of computer-managed memory: hardware and operating systems often divide main memory and disk storage areas into contiguous, equal-sized blocks, each called a page (in many operating systems, the page size is usually 4 KB). When the data a program tries to read is not in main memory, a page fault is triggered; the system then sends a read signal to the disk, which finds the starting location of the data and loads one or more pages back into memory in sequence.
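The page size of the machine running a program can be checked directly; a one-line sketch (the common value is 4096 bytes, i.e. 4 KB, though architectures differ):

```python
import mmap

# mmap.PAGESIZE reports the OS memory page size on this machine.
page = mmap.PAGESIZE
print(page)
assert page > 0 and page & (page - 1) == 0  # page size is a power of two
```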

B-/+Tree index performance analysis


As mentioned above, index structures are generally evaluated by the number of disk I/Os. From the definition of a B-tree, a search visits at most h nodes. Database system designers exploit disk prefetching by setting the size of a node equal to one page, so each node can be fully loaded with a single I/O. To achieve this, the following technique is also needed when implementing a B-tree in practice:

Each time a new node is created, it requests space for a page directly. This ensures that a node is physically stored on a page, and computer storage allocation is page-aligned, so that a node needs only one I/O.

**A search in a B-tree requires at most h−1 I/Os (the root node is resident in memory), and the asymptotic complexity is O(h) = O(log_d N).** In practice, the out-degree d is a very large number, usually more than 100, so h is very small (usually no more than 3).

In a red-black tree, h is significantly larger. Because logically adjacent nodes (parent and child) may be physically far apart, locality cannot be exploited, so although the asymptotic I/O complexity of a red-black tree is also O(h), its efficiency is significantly lower than that of a B-tree.

In summary, using B-tree as index structure is very efficient.

6. Types of joins

First create two sample tables and insert test rows:

create table table1(id int, name varchar(10));
create table table2(id int, score int);
insert into table1 select 1,'lee';
insert into table1 select 2,'zhang';
insert into table1 select 4,'wang';
insert into table2 select 1,90;
insert into table2 select 2,100;
insert into table2 select 3,70;

table1            table2
id  name          id  score
1   lee           1   90
2   zhang         2   100
4   wang          3   70

The following queries are run in the query analyzer.

Outer joins

1. Concept: outer joins include left outer joins, right outer joins, and full outer joins.

2. Left outer join: LEFT JOIN or LEFT OUTER JOIN. The result set includes all rows of the left table named in the LEFT JOIN clause, not just the rows matched by the join column. If a row in the left table has no matching row in the right table, all select-list columns from the right table are NULL in that result row.

select * from table1 left join table2 on table1.id=table2.id;

Result:
id  name   id    score
1   lee    1     90
2   zhang  2     100
4   wang   NULL  NULL

Note: all rows of table1 are included; the matching table2 columns are returned according to the join condition, and NULL is shown where there is no match.

3. Right outer join: RIGHT JOIN or RIGHT OUTER JOIN. All rows of the right table are returned. If a row in the right table has no matching row in the left table, NULL is returned for the left table's columns.

select * from table1 right join table2 on table1.id=table2.id;

Result:
id    name   id  score
1     lee    1   90
2     zhang  2   100
NULL  NULL   3   70

Note: all rows of table2 are included; the matching table1 columns are returned according to the join condition, and NULL is shown where there is no match.

4. Full outer join: FULL JOIN or FULL OUTER JOIN. Returns all rows from both tables. When a row has no matching row in the other table, the other table's select-list columns contain NULL; when there is a match, the result row contains the data values from both base tables.

select * from table1 full join table2 on table1.id=table2.id;

Result:
id    name   id    score
1     lee    1     90
2     zhang  2     100
4     wang   NULL  NULL
NULL  NULL   3     70

Note: returns the union of the left and right join results (see the left and right joins above).

Inner joins

1. Concept: an inner join uses a comparison operator to compare the values of the columns being joined.

2. Inner join: join or inner join

3. The SQL statement:

select * from table1 join table2 on table1.id=table2.id;

Result:
id  name   id  score
1   lee    1   90
2   zhang  2   100

Note: returns only the rows of table1 and table2 that satisfy the condition.

Equivalent forms:

A: select a.*, b.* from table1 a, table2 b where a.id=b.id;
B: select * from table1 cross join table2 where table1.id=table2.id; (with CROSS JOIN, the join condition goes in WHERE rather than ON)

Cross joins

1. Concept: A cross join without a WHERE clause produces a Cartesian product of the tables involved in the join. The number of rows in the first table times the number of rows in the second table is equal to the size of the Cartesian product result set. (table1 and table2 cross join to produce 3*3=9 records)

2. Cross join: CROSS JOIN (without a WHERE condition)

3. The SQL statement:

select * from table1 cross join table2;

Result:
id  name   id  score
1   lee    1   90
2   zhang  1   90
4   wang   1   90
1   lee    2   100
2   zhang  2   100
4   wang   2   100
1   lee    3   70
2   zhang  3   70
4   wang   3   70

Note: returns 3*3=9 records, the Cartesian product.

Equivalent form: select * from table1, table2;
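The joins above can be reproduced with sqlite3 using the same sample data (SQLite supports INNER, LEFT, and CROSS JOIN out of the box; RIGHT and FULL JOIN require SQLite 3.39+, so this sketch shows only the first three):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, 'lee'), (2, 'zhang'), (4, 'wang')])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(1, 90), (2, 100), (3, 70)])

# Left join: every table1 row, with NULL (None) where table2 has no match.
left = conn.execute("SELECT * FROM table1 LEFT JOIN table2 "
                    "ON table1.id = table2.id ORDER BY table1.id").fetchall()
print(left)   # [(1, 'lee', 1, 90), (2, 'zhang', 2, 100), (4, 'wang', None, None)]

# Inner join: only the matching rows.
inner = conn.execute("SELECT * FROM table1 JOIN table2 "
                     "ON table1.id = table2.id ORDER BY table1.id").fetchall()
print(inner)  # [(1, 'lee', 1, 90), (2, 'zhang', 2, 100)]

# Cross join: the Cartesian product, 3 * 3 = 9 rows.
cross = conn.execute("SELECT * FROM table1 CROSS JOIN table2").fetchall()
print(len(cross))  # 9
```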

7. Database paradigm

1 First Normal Form (1NF)

In any relational database, first normal form (1NF) is the basic requirement for a relational schema; a database that does not meet 1NF is not a relational database. First normal form means that each column of a database table is an indivisible atomic data item: the same column cannot hold multiple values, that is, an attribute of an entity cannot have multiple values or repeating attributes. If repeating attributes occur, a new entity may need to be defined from them, in a one-to-many relationship with the original entity. In 1NF, each row of a table contains information about only one instance. In short, first normal form means no repeating columns.

2. Second Normal Form (2NF)

Second normal form (2NF) builds on first normal form (1NF): to satisfy 2NF, a schema must first satisfy 1NF. 2NF requires that every instance or row in a database table be uniquely distinguishable, which typically means adding a column to store a unique identifier for each instance; this column is called the primary key. 2NF further requires that the non-key attributes depend entirely on the primary key. Complete dependence means no attribute may depend on only part of a composite primary key; if one does, that attribute and that part of the key should be separated out into a new entity, in a one-to-many relationship with the original entity. In short, second normal form means no non-key attribute is partially dependent on the primary key.

3. Third Normal Form (3NF)

To satisfy third normal form (3NF), a schema must first satisfy second normal form (2NF). In short, 3NF requires that a database table not contain non-key information that is already held in another table. For example, suppose there is a department table holding each department's id (dept_id), name, and introduction. Once the department id is listed in the employee table, the department name, introduction, and other department-related information must not be added to the employee table as well. If the department table did not exist, it should be built according to 3NF anyway; otherwise there would be a great deal of data redundancy. In short, third normal form means attributes do not depend on other non-key attributes. (My understanding: eliminate redundancy.)
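The department/employee example above can be sketched as a schema (column names are illustrative): department details live only in the department table, and the employee table stores just the dept_id foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT,
    dept_intro TEXT)""")
conn.execute("""CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT,
    dept_id  INTEGER REFERENCES department(dept_id))""")

conn.execute("INSERT INTO department VALUES (10, 'R&D', 'builds the product')")
conn.execute("INSERT INTO employee VALUES (1, 'lee', 10)")

# Department info is reached through the foreign key, never duplicated
# per employee row -- this is what 3NF buys us.
row = conn.execute("""SELECT e.emp_name, d.dept_name
                      FROM employee e
                      JOIN department d ON e.dept_id = d.dept_id""").fetchone()
print(row)  # ('lee', 'R&D')
```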

8. Database optimization ideas

1.SQL statement optimization

1) Avoid using != or <> in a WHERE clause; otherwise the engine abandons the index and performs a full table scan.

2) Avoid testing for NULL in a WHERE clause; otherwise the engine abandons the index and performs a full table scan. For example, instead of select id from t where num is null, give num a default value of 0 and use select id from t where num=0.

3) It is often a good choice to use EXISTS instead of IN.

4) Replace a HAVING clause with a WHERE clause where possible, because HAVING filters the result set only after all records have been retrieved.
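Point 3) can be illustrated by writing the same filter both ways; both return identical rows, and EXISTS often lets the engine stop probing the subquery as soon as one matching row is found. The `t` and `allowed` tables below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, num INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 5), (2, 7), (3, 5)])
conn.execute("CREATE TABLE allowed (num INTEGER)")
conn.execute("INSERT INTO allowed VALUES (5)")

# The IN form materializes the subquery's value list.
with_in = conn.execute(
    "SELECT id FROM t WHERE num IN (SELECT num FROM allowed) "
    "ORDER BY id").fetchall()

# The EXISTS form probes for a single matching row per outer row.
with_exists = conn.execute(
    "SELECT id FROM t WHERE EXISTS "
    "(SELECT 1 FROM allowed WHERE allowed.num = t.num) "
    "ORDER BY id").fetchall()

print(with_in, with_exists)  # [(1,), (3,)] [(1,), (3,)]
```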

2. Index optimization

See above index

3. Database structure optimization

1) Normal form optimization: for example, eliminating redundancy (saving space).

2) Table partitioning: partitioning physically separates data, and data in different partitions can be stored in data files on different disks. A query on the table then only needs to scan the relevant partition rather than the full table, significantly shortening query time; placing partitions on different disks also spreads the table's data-transfer I/O across disks, so a carefully configured set of partitions can even out disk I/O. This approach suits time-based tables with large data volumes; partitions can be created automatically on a monthly basis.

3) Table splitting, which divides into vertical splitting and horizontal splitting. Case: a simple shopping system with the following tables: 1. product table (data volume 10W, stable); 2. order table (data volume 200W, and growing); 3. user table (also growing). MySQL handles data on the order of millions comfortably, and static data can reach tens of millions.

**Vertical splitting** places different tables on different databases or disks. It solves: the I/O contention between tables. It does not solve: the pressure caused by data growth within a single table.

**Horizontal splitting** splits the rows of one table across several tables. It solves: the pressure caused by data growth within a single table. It does not solve: I/O contention between tables. Examples: the user table split by gender into a male user table and a female user table; the order table split into a completed order table and an uncompleted order table.

4. Optimize server hardware

That costs a lot!

9. The difference between stored procedures and triggers

Triggers are very similar to stored procedures in that they are also sets of SQL statements. The only difference is that triggers cannot be called with an EXECUTE statement, but instead trigger (activate) execution automatically when the user executes a Transact-SQL statement. A trigger is a stored procedure that is executed when data in a specified table is modified. Triggers are typically created to enforce referential integrity and consistency of logically related data in different tables. Since users cannot bypass triggers, they can be used to enforce complex business rules to ensure data integrity. Triggers differ from stored procedures in that they are executed primarily through event execution, whereas stored procedures can be invoked directly through stored procedure names. When operations such as UPDATE, INSERT, or DELETE are performed on a table, SQL Server automatically executes SQL statements defined by triggers to ensure that data is processed in accordance with the rules defined by these SQL statements.
