Database Concepts SQL basics

This chapter covers the basics of the SQL language — understanding the common data types, aggregates, subqueries, views, constraints, indexes, transactions, and so on that allow you to translate relational algebra into SQL logic. Indexes and transactions will be covered in detail in a future article.

The data type

The SQL standard defines a variety of data types that are common across databases.

Basic data types

Char (n), a fixed string of n lengths.
Varchar (n), a string of up to n lengths.
Int, an integer.
Smallint specifies a small integer.
Numeric (n,m), a user-specified precision floating point number. In this n digit, the sign digit occupies 1 digit, and there are m digits of n digits to the right of the decimal point.
Real & Double: floating point and double precision floating point numbers.
Float (n), a floating point number with at least n bits of precision.

Date and time in SQL

Date: records the year, month, and day.
Time: Records the hours and seconds of a day.
Timestamp: Combines date and time at the same time.

In MySQL, there are three functions that correspond to how to get them:

-- Gets the current date
SELECT curdate(a);Get the current time
SELECT curtime(a);-- Gets the current timestamp
SELECT now(a);Copy the code

In addition, you can use the extract(f from d) function to extract the corresponding field from the value d representing time. F may include year, month, day, hour, minute, and second, depending on the time type of D.

-- Extracts the year from the date
SELECT EXTRACT(year FROM CURDATE())

-- Extract hours from time
SELECT EXTRACT(hour FROM CURTIME())
Copy the code

Stores large byte/character data

The database itself provides the storage of large character data (such as an article), or large binary data (such as images, videos, etc.), collectively known as large-object date types. There are cloBs and bloBs.

The following describes the two data types from the perspective of the MySQL database.

Clobs are used to store large amounts of text, including:

type	The largest size
TinyText	255 bytes
Text	65535 bytes (about 65K)
MediumText	16 777 215 bytes (about 16 M)
LongText	4 294 967 295 (approx. 4 G)

Blobs are used to store binary files and contain:

type	The largest size
TinyBlob	255 bytes
Blob	65535 bytes (about 65K)
MediumBlob	16 777 215 bytes (about 16 M)
LongBlob	4 294 967 295 (approx. 4 G)

Create tables and delete

SQL > create a table;

CREATE TABLE <`table_name`> (<`attr1`> <`type1`> [` constraint `]."`attr2`> <`tupe2`> [` constraint `]."Primary key constraint>,Foreign key constraint],
	[` ` check clause];)Copy the code

The CREATE clause must have a primary code constraint. Examples such as non-empty constraints, unique constraints, and the short check constraint (mostly for aesthetic reasons) can be placed directly after property declarations. The last line of the declaration statement is used; A semicolon. If the primary key is a numeric key, set AUTO_INCREMENT to increment if the primary key is a numeric key. Thus, we should avoid artificially storing the primary key when we insert new tuples later.

Conversely, to drop a relationship from the database, use the drop command:

DROP TABLE <`table_name`>
Copy the code

Alternatively, you can add attributes to an existing relationship using the ALTER add clause:

ALTER ADD <`table_name`> <`attr`> <`type`>
Copy the code

The constraint

Primary key(

), foreign key(

) references
They are all used in DDL SQL. In addition, SQL also has the following constraints:
)>

Uniqueness constraint

Like non-empty constraints, add the unique keyword after attributes that require unique constraints.

CREATE TABLE <`table_name`> (<`attr1`> <`type1`> UNIQUE. PRIMARYKEY <attrn>
)
Copy the code

If no non-null constraint is imposed on this attribute, the only value that can be repeated for this column is NULL.

Check clause

The check clause provides a flexible conditional that can be modified with any predicate. For example, the constraint gender column should be either “female” or “male”.

Gender is a property.
CHECK(gender IN ('Female'.'Male'))
Copy the code

The simple check clause can modify properties directly, and the complex check clause can also be placed after the Create table. It is worth noting that some database vendors may not provide predicates containing subqueries.

Add, delete and modify operations

A delete request is expressed very similarly to a query.

DELETE FROM r 
WHERE p
Copy the code

Where R represents the relational name (table name) and P represents the conditional predicate. If the WHERE clause is omitted, the DBMS removes all tuples within the relationship. and

Insert data uses the INSERT into clause. For example, insert a new teacher information into the instructor relationship:

In simple form, this requires the values of all attributes of a given tuple.
INSERT INTO course VALUES('CS-437'.'DataBase Systems'.'Comp.Sci'.4);
If you can declare the order of attributes, the values given should also be in that order.
This is used when inserting some attributes of a tuple.
INSERT INTO course(title,course_id,credits,dept_name) VALUES ('DataBase Systems'.'CS-437'.4.'Comp.Sci')
Copy the code

Update changes the values of some attributes of an existing tuple.

UPDATE `instructor` SET salary = 70000
WHERE name = 'Wu'
Copy the code

The sorting

A relation is itself an unordered set. Therefore, if you want to order the query results, you can use the Order by clause. For example, rank the staff in descending order of salary and ascending order of name:

SELECT * FROM `instructor`
ORDER BY salary DESC.name ASC
Copy the code

Desc indicates descending order, and ASC indicates ascending order.

Aggregation function

An aggregate function is a specification function that compresses a set into a value return. The SQL standard specifies five basic protocol functions: AVg, min, Max, sum, and count. For example, query the number of teachers whose salaries are greater than 50K.

SELECT COUNT(*) FROM instructor
WHERE salary > 50000
Copy the code

The following is a slightly complicated query: “Query the number of people in each department whose salary is greater than 50K”, then we need to do a group aggregation by first, at this time, the return is the set of the calculation results of the aggregation function of each group.

SELECT COUNT(*) FROM instructor
WHERE salary > 50000
GROUP BY dept_name
Copy the code

Attributes that are not directly selected by group BY should not be directly projected by Select and can only appear in the parameters of the aggregate function. For example, in the above query, salary cannot appear directly in the SELECT clause because group by is aggregated only by dept_name:

Wrong example. This salary doesn't mean anything.
SELECT COUNT(*),salary FROM instructor
WHERE salary > 50000
GROUP BY dept_name

But salary can be used in aggregate functions. So let's say I'm looking for the average salary of a faculty member in each department greater than 50K.
-- DEPt_name can appear directly in the SELECT statement.
SELECT AVG(salary),dept_name FROM instructor
WHERE salary > 50000
GROUP BY dept_name
Copy the code

A more complex query: “Query for departments with ‘number of people earning more than 50K’ > 1”. The problem we have now is that we need to filter each department after screening. If the number of people is less than or equal to 1, then the sub-group will be discarded. A new keyword was introduced — having.

SELECT COUNT(*) AS nums,dept_name FROM instructor
WHERE salary > 50000
GROUP BY dept_name
HAVING nums > 1
Copy the code

Similarly, having clauses should not directly appear properties that are not selected by Group by (but they can still exist as arguments to aggregate functions). Another important point to emphasize is to distinguish it from the WHERE keyword: The WHERE condition is used to filter tuples, whereas having is used to filter aggregated groups.

The subquery

Subqueries can be classified as subqueries that return a relationship or scalar queries that return only a single value.

For the first seed query, it can appear in any from, in, or not in clause, as demonstrated earlier. Here’s a simpler example:

SELECT `name`
Mysql requires that subqueries following the FROM clause have a nickname.
FROM (
	SELECT * FROM instructor 
	WHERE salary > 50000
) AS t1
Copy the code

Additionally, in and not in can also be used for enumerated collections:

SELECT * FROM instructor
WHERE dept_name IN ("History"."Finance")
Copy the code

If you ensure that a subquery returns only one value, it can appear in select, WHERE, having, and so on.

The with clause

If a subquery following a FROM clause is too verbose, it can be extracted to the front and declared using the with keyword. The format is WITH < temporary table name > AS (subquery).

-- These two paragraphs are a whole.
WITH more_than_50000 AS (
	SELECT * FROM instructor 
	WHERE salary > 50000
) 
SELECT `name`
FROM more_than_50000
Copy the code

If more than one subquery needs to be named, just use the with keyword, followed by a comma, separated by:

WITH history_instructors AS (
	SELECT * FROM instructor
	WHERE dept_name = "history"
), music_instructors AS (
	SELECT * FROM instructor
	WHERE dept_name = "music"
)
SELECT * FROM history_instructors 
UNION 
SELECT * FROM music_instructors
Copy the code

Some / All

Here are two clauses: some and all. The former means “at least”, the latter means “all”. Long ago, SQL used any to mean “at least”. But ‘any’ can lead to confusion in English between ‘any’ and ‘all’, so ‘some’ was introduced later. For example: query “who earns more than at least one history faculty member” :

SELECT * FROM instructor
WHERE salary > SOME (
	SELECT salary FROM instructor 
	WHERE dept_name = "history"
)
Copy the code

For example, query “teacher who earns more than all history faculty” :

SELECT * FROM instructor
WHERE salary > ALL (
	SELECT salary FROM instructor 
	WHERE dept_name = "history"
)
Copy the code

Note that there can only be one column in the query result because we only need one value to compare with the outer salary. Also, =some is equivalent to in, but <>some is not equivalent to not in; Similarly, <>any is equivalent to not in, but =all is not equivalent to in.

Case branch

Occasionally, we need to return different results in a query based on the value of an attribute. In SQL, this branch statement can be written in two ways:

SELECT 
ID, (-- Similar to switch
	CASE grade
		WHEN "A+" THEN "great"
		WHEN "A" THEN "good"
		ELSE "ok"
	END
) AS `rank`
FROM takes
Copy the code

The first is to follow a case clause with an attribute and then retrieve it like the switch clause, with each when clause followed by a single value. Another way is:

SELECT 
ID, (-- similar to if
	CASE 
		WHEN grade IN ("A+"."A") THEN "great"
		WHEN grade IN ("B"."C") THEN "good"
		ELSE "ok"
	END
) AS `rank`
FROM takes
Copy the code

If case is not followed by an attribute, the expression can be followed by when. Either way, you must end with the end keyword. A case branch is a subquery that returns a single value, so it can theoretically appear in select, WHERE, having, and so on.

view

For some commonly used subqueries, we can declare a view to save rather than declare a large with section before each query. Grammar:

create view

(
,
,…) As (select clause). For example, create a view of the history faculty from the Instructor table:

CREATE VIEW history_instructor(name,salary) AS (
	SELECT `name`,salary FROM `instructor`
)
Copy the code

All views are based on the original database schema and can refer to each other.

Different vendors implement views differently. For some vendors, views are lazily loaded: the DBMS only records the lookup of the view, and only executes the process and returns the contents of the view when it is actually called. But for other vendor implementations, the view is real, the data stored in the database, which is called the Materialized view. However, materialized views must be updated when the tables on which they depend change.

The process of keeping the materialized view up to date is called materialized view maintenance. The most diligent way is to update the view every time a referenced table update is detected. The “lazy” approach is to set a timer to recalculate data periodically.

If you can save a lot of complex, time-consuming aggregation results as views ahead of time, you can greatly speed up the next query and avoid reading large underlying relationships. But here’s the problem: the user’s acceptance of the view’s data may be “lagging.” By stereotype, we tend to think of views as read-only. Can you affect the underlying relationships by inserting, changing, and deleting views?

INSERT INTO history_instructor(name,salary) VALUES ('Li'.50000)
Copy the code

Suppose we choose to insert a record in the history_INSTRUCTOR view, obviously this record will eventually be inserted into the INSTRUCTOR table. What about ID and dept_name? Therefore, influencing the underlying relational tables with views is a complex problem. The SQL:1999 standard sets very strict and complex criteria for when views can be inserted, changed, and deleted.

By default, databases such as MySQL do not prevent this high-risk behavior. However, the statement still fails because the ID and DEPT_name information are missing.

The index

When the amount of data in the database is large enough, traversal search becomes inefficient. For example, search hundreds of thousands of data for “teachers in physics departments earning more than 50,000”. If we can establish an appropriate index for the instructor table according to dept_name, the search range can be reduced from tens of thousands to thousands, thus improving the query efficiency.

An index is a redundant structure that is not necessary for the correctness of query results. Create an index as follows:

CREATE INDEX <`index_name`> ON <`table`> (<`attribute`>)
Copy the code

So if we wanted to index the DEPt_name of the instructor table, the SQL statement would look like this:

CREATE INDEX dept_index ON instructor(dept_name)
Copy the code

To drop an index, use the following SQL statement:

DROP INDEX <`index_name`>
Copy the code

There are two ways to index – B+ tree index and hash index.

The transaction

In simple terms, a transaction consists of a series of independent queries/statements that are “do all or do none” to ensure consistency. Here is how to use a simple transaction in MySQL:

BEGIN;
	
	UPDATE bank SET balance = balance - 1000.00 WHERE `name` = 'Li Ping';
	UPDATE bank SET balance = balance + 1000.00 WHERE `name` = 'Wang Fang';

COMMIT;
Copy the code

If you execute Begin; . Commit; There are no errors in the block process, so we actively Commit a Commit after the block ends to permanently save the changes made in the transaction to the database. On the other hand, if you get some “unexpected errors”, you need to commit a ROLLBACK after the transaction to bring the database back to BEGIN. The state before we started.

BEGIN;
	
	UPDATE bank SET balance = balance - 1000.00 WHERE `name` = 'Li Ping';
	UPDATE bank SET balance = balance + 1000.00 WHERE `name` = 'Wang Fang';
	
All of the above changes are invalidated and we call the transaction rolled back.
ROLLBACK;
Copy the code

Rollback can undo Delete, Update, and Insert operations, but DDL operations such as Create and Drop cannot be rolled back. It is also important to note that the database itself does not determine whether a transaction is committed or rolled back based on the success of the transaction block execution. For example, if a statement a “unfortunately” contains a syntax error in a transaction block, the transaction will interrupt abnormally, and the results of the previous statement A will be preserved, but subsequent queries will not be executed, which obviously violates the consistency requirement of the transaction.

In addition to these simple errors, determining whether a transaction should be committed or rolled back requires us to set up our own controls — which will be covered in the next SQL topic.