In addition to basic CRUD, some complex queries must be done with more sophisticated tools. This chapter mainly introduces the following contents from the perspective of MySQL:

  1. Functions, stored procedures, triggers.
  2. Logical control with cursors (equivalent to “T-SQL”).
  3. Based on theWITH RECURSIVEThe idea of recursive query.
  4. Windowing calculation and rotation.
  5. High dimensional aggregation: ROLLUP and CUBE.

MySQL does not provide support for table functions, rotations, cubes, etc., but it is important to understand these concepts and implement them using equivalent SQL statements.

function

No matter what application language you’ve learned: C, Java, or Python, you only need three core parts to write a function — the function name, argument list, and return value. The same is true for SQL. With a slight difference, SQL can return not only a value, but also a table.

Note that although SQL defines a standard syntax format, most DBMS vendors provide a syntax that differs slightly from the standard format. In this article, we’ll exercise with the most commonly used MySQL database today. To do this, we need to set the MySQL parameters:

SET GLOBAL log_bin_trust_function_creators = 1;
Copy the code

This setting causes MySQL to trust the creator of the function, otherwise most of the subsequent function creation process will not continue, and an error code-named 1418 will be thrown. Error Code: 1418. The CSDN blog first discusses the return value of the function. For example, the dept_count function is responsible for querying the number of people by department name:

CREATE FUNCTION dept_count (
	dname VARCHAR ( 20 )) RETURNS INTEGER BEGIN
	DECLARE i INTEGER;
	The variable itself can also be assigned using the SET keyword:
	-- SET i = 10;
		SELECT	COUNT(*) INTO i FROM instructor WHERE instructor.dept_name = dname GROUP BY dept_name;
	RETURN i;
END
Copy the code

There are several points to note:

  1. Distinguish betweenreturnsreturnKeyword, one for declaration and one for function body.
  2. The body of the function toBeginAt the beginning,EndAt the end.
  3. Each line of statements is followed strictly by;At the end.
  4. The name of a function parameter should not be the same as the name of any of the internal table attributes, which can cause trouble.

A function passes data to the outside world by:

  1. First of all, throughDeclareThe keyword defines a variablei
  2. Use the results of the table queryIntoThe keyword is assigned to it;
  3. returni. Its data types and function declarationsReturnsAre of the same data type.

This function retrieves the number of departments using the aggregate function and externally specified department name, and then returns it. The function is called using the SELECT clause:

SELECT dept_count('Music')
Copy the code

When you declare a function, you can add its Characteristic. This part of the content is stated in the stored procedure, the characteristics of the two content are the same, not introduced here.

The stored procedure

SQL even allows the return value to be a table. Such functions are called table functions — however, in MySQL, table functions can be replaced by stored procedures (more officially known as Persistent Storage modules, or PSM).

CREATE PROCEDURE check_dept (IN dname VARCHAR(20),OUT result INTEGER) 
READS SQL DATA
BEGIN
	SELECT COUNT(*) INTO result FROM instructor WHERE instructor.dept_name = dname GROUP BY dept_name;
END
Copy the code

One big difference between a stored procedure and a function is that it supports input and output of multiple parameters. Each parameter is marked IN for input parameter, OUT for output parameter, and INOUT for both input parameter and output parameter. If no declaration is made, the parameter defaults to IN.

READS SQL DATA indicates that the stored procedure only READS DATA internally and does not use DML to modify the original table DATA (including, update, insert, delete, etc.). This section is the characteristic part of the stored procedure. This section has rich semantics to specify or restrict the execution permissions of stored procedures (or functions).

Function/stored procedure characteristics

[NOT] DETERMINISTIC: Indicates whether the execution of a stored procedure (or function) is DETERMINISTIC. DETERMINISTIC means the result is DETERMINISTIC. The same input produces the same output each time the stored procedure is executed. NOT DETERMINISTIC means that the result is indeterministic, the same input may get different outputs. By default, the result is nondeterministic.

[CONTAINS SQL SQL | | NO READS SQL DATA | MODIFIES SQL DATA] : indicates the limitation of subroutines using SQL statements:

  1. CONTAINS SQLRepresents a subroutine that contains SQL statements, but not statements that read or write data;
  2. NO SQLIndicates that the subroutine does not contain SQL statements.
  3. READS SQL DATAA statement that represents a subroutine containing read data;
  4. MODIFIES SQL DATARepresents a statement in a subroutine that contains write data.
  5. By default, it is specified asCONTAINS SQL.

SQL SECURITY [DEFINER | INVOKER] : indicate who has the authority to perform. DEFINER means that only the DEFINER can do it; INVOKER means that the caller can execute. By default, the system-specified permission is DEFINER.

[COMMENT ‘string’] : Comments information.

Values are obtained from stored procedures

Externally, stored procedures are called using the CALL keyword. In addition, if the stored procedure has an output parameter identified as OUT, it needs to be retrieved using a temporary variable starting with @ (also known as a user variable, whose lifetime is only within the session) before it can be output to the console or outside the program by SELECT.

CALL check_dept('Music',@i);
SELECT @i 
Copy the code

View stored procedures or functions

To view the STATUS of stored procedures/functions, run the SHOW STATUS command:

SHOW [PROCEDURE | FUNCTION] STATUS LIKE 'pattern'
Copy the code

For example, check the check_DEPT stored procedure we just created:

SHOW PROCEDURE STATUS LIKE 'check_dept'
Copy the code

Depending on the content of the re match, this command returns stored procedure/function information in multiple DBMSS. To view the internal logic of a function/stored procedure:

SHOW CREATE [PROCEDURE | FUNCTION] <`sp_name`>
Copy the code

Here, sp_name needs to specify the name of a specific function or stored procedure.

Logic control

Logical control is an essential part of an application, and it also exists in SQL. SQL Server calls these control flows T-SQL (enhanced VERSION of SQL). In general-purpose programming languages, we use {… } to represent a block of code, however, in SQL it can be a bit more cumbersome: here, < keyword >… END < keyword > package control logic.

Logic control to a certain extent gives DBMS the ability to deal with complex logic business by itself, without recourse to external general programming language. The following describes common logical control structures in SQL.

IF statement

The IF statement is a basic conditional branch, but unlike other languages, else IF… The continuous sentence pattern of. Beginners may not be used to the fact that conditional statements must END with END IF.

CREATE PROCEDURE `test_if`(OUT ans VARCHAR(4))
BEGIN
	DECLARE i INT;
	SET i = 5;
	IF i % 2 = 0 THEN SET ans = 'even number'; ELSE SET ans = 'odd';
	END IF;	
END
Copy the code

MySQL also has a library function called IF(), so don’t confuse the two.

CASE process control

CASE flow control here is used in stored procedures: the CASE ends in an END CASE, and each branch needs to be completed with a semicolon.

Similarly, CASE flow control here has two forms, one for value judgments and one for integrating multiple conditional judgments.

CREATE PROCEDURE test_case(OUT answer VARCHAR(4))
BEGIN
	DECLARE i INT;
	SET i = 5;
	CASE (i % 2 )
		WHEN 0 THEN SELECT 'even number' INTO answer;
		WHEN 1 THEN SELECT 'odd' INTO answer;
	END CASE;
END
Copy the code

The above logic could also be written as:

CREATE PROCEDURE test_case(OUT answer VARCHAR(4))
BEGIN
	DECLARE i INT;
	SET i = 5;
	CASE 
		WHEN (i % 2) = 0 THEN SELECT 'even number' INTO answer;
		WHEN (i % 2) = 1 THEN SELECT 'odd' INTO answer;
		ELSE SELECT 'unknown' INTO answer;
	END CASE;
END
Copy the code

Another way to write an assignment to a parameter is: SET answer = ‘even’.

WHILE statement

A WHILE block is no different from a WHILE in any other programming language:

CREATE PROCEDURE test_while()
BEGIN
	declare i INTEGER;
	SET i = 0;
  	WHILE i < 10 DO
		SET i = i + 1;
	END WHILE;
END
Copy the code

Sometimes you can do this by giving the WHILE branch a nickname:

CREATE PROCEDURE test_while()
BEGIN
	declare i INTEGER;
	SET i = 0;
	
  	loop_i: 
  	WHILE i < 10 DO
		SET i = i + 1;
	END WHILE loop_i;
END
Copy the code

This makes it easier to break out or repeat the loop with LEAVE (the break of other programming languages) or ITERATE (the continue of other languages). The same is true for the REPEAT and LOOP loops below.

The LOOP cycle

A LOOP is equivalent to an infinite while that executes unconditionally. This requires us to set up checks internally to break out of the loop when we find the desired results.

CREATE PROCEDURE test_loop(OUT answer VARCHAR(4))
BEGIN
	DECLARE i INT;
	DECLARE sum INT;
	SET i = 1;
	SET sum = 0;
	sum_loop: LOOP
		IF 
			i = 11 THEN LEAVE sum_loop;
		END IF;
		
		IF
			i % 2 = 0 THEN 
				SET i = i + 1;
				ITERATE sum_loop;
		END IF;
		
		SET i = i + 1;
		SET sum = sum + i;
	END LOOP sum_loop;
	SET answer = sum;
END
Copy the code

REPEAT loop

The loop body of REPEAT can be executed at least once and exited when the UNTIL loop condition is satisfied. (Note that this loop controls structurally the same as other languages’ do… While similar, but the semantics are different, pay attention to the discrimination.

DROP PROCEDURE test_repeat;
CREATE PROCEDURE test_repeat(OUT ans INT)
BEGIN
 DECLARE i INTEGER;
 SET i = 1;
 REPEAT
	SET i = i + 5;
 UNTIL i > 2 END REPEAT;
 SET ans = i;
END
Copy the code

The result set is returned using a temporary table implementation

MySQL does not support returning a table, so if a stored procedure wants to display a collection of results externally, it has to use other methods. In general, there are three steps:

  1. Create a stored procedure that does not require active declarationOUTCan be set as requiredINParameters.
  2. Returns multiple results that you want to returnINSERTInto this temporary table.
  3. Ensure the last of the stored proceduresSELECTStatement is to select the temporary table.

When the stored procedure is finished running, you can choose to actively DROP all temporary tables. Call the stored procedure directly in an external query to see what is returned.

Note that temporary tables with the same name do not affect each other between sessions. However, you cannot create more than one table with the same name in a session. Creating a TEMPORARY table in MySQL is very easy and only requires an additional TEMPORARY keyword. Since TEMPORARY tables are only used to load query results and are reclaimed after the session ends, we generally do not need to set additional constraints on this table.

CREATE TEMPORARY TABLE(
	Nothing else is different from creating a table.
)
Copy the code

At the end of the session, the temporary table is deleted. In the following example, data is first obtained from the original table instructor,

CREATE PROCEDURE name_start_with_C()
BEGIN
	CREATE TEMPORARY TABLE tmp(`name` VARCHAR(20),dept_name VARCHAR(20));
	CREATE TEMPORARY TABLE tmp2(`name` VARCHAR(20),dept_name VARCHAR(20));

	-- Demonstrates inserting the results of a query into a temporary table
	INSERT INTO tmp SELECT `name`,dept_name FROM instructor;
	INSERT INTO tmp2 SELECT `name`,dept_name FROM tmp WHERE name LIKE 'C%';
	The result can be viewed externally
	SELECT * FROM tmp2;
	Drop temporary table TMP,tmp2
	DROP TEMPORARY TABLE tmp,tmp2;
END

drop PROCEDURE check_test
call check_test()
Copy the code

The cursor

Cursors have long been controversial in terms of performance. For the most part, we don’t need cursors to handle tuples — except in cases where we need to manipulate tuples line by line. MySQL cursors can only be declared in stored procedures and operated with loop control.

Each cursor has four “life cycles” — declare, open, use, and close. In particular, the logic for opening and closing cursors needs to be written on our own initiative. SQL statement to declare cursor:

DECLARE <`cursor_name`> CURSOR FOR <`select_statement`>
Copy the code

The cursor is bound to a SELECT query statement. This is equivalent to unpacking declarative SQL into imperative logic. Cursors are equivalent to iterators. We insert some processing logic inside the loop to examine and process tuples line by line.

After the cursor is declared, it must be opened to retrieve values from it, and closed when it is finished using it. These two statements correspond to:

OPEN <`cursor_name`>
CLOSE <`cursor_name`>
Copy the code

If a cursor is not explicitly declared closed, it will close itself after the stored procedure runs. It is not enough to declare the cursor. In the process of using the cursor, we also need a semaphore to know if the cursor has been traversed. The following stored procedure lists the pre and post tasks using cursors:

CREATE PROCEDURE test_cursor()
BEGIN
	DECLARE done bool DEFAULT FALSE;
	-- Declare cursor;
	DECLARE row_cursor CURSOR FOR SELECT. ;This line declaration follows the cursor declaration, and the variable done is bound to the cursor state.
	DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
	
	OPEN row_cursor;
	-- LOOP FOR USE CURSOR
	-- LOOP FOR USE CURSOR
	-- LOOP FOR USE CURSOR
	-- LOOP FOR USE CURSOR
	-- LOOP FOR USE CURSOR
	-- LOOP FOR USE CURSOR
	CLOSE row_cursor;
END
Copy the code

Use the cursor as follows:

FETCH cursor_name INTO <`col1`> [`col2`] ... 
Copy the code

Let’s do a simple experiment. By cursor, it is realized by imperative logic: Count the number of faculty members in the history department whose DEPt_name is in the instructor relationship. Additional considerations for using cursors are attached to the comments section of the code block.

CREATE PROCEDURE test_cursor(OUT nums INT)
BEGIN

	DECLARE cur_dept VARCHAR(20);
	DECLARE sum INT DEFAULT 0;
	DECLARE done bool DEFAULT FALSE;
	
	Note that in all declarations, the cursor needs to come last.
	The cursor extracts only one dept_name attribute at a time.
	DECLARE row_cursor CURSOR FOR SELECT dept_name FROM instructor;
	Bind the traversal state of the cursor to the done variable
	DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
	
	OPEN row_cursor;
	use_cursor:REPEAT
		The number of INTO variables depends on how many columns the cursor bound SQL statement selected.
		FETCH row_cursor INTO cur_dept;
		IF cur_dept = 'history' THEN 
			SET sum = sum + 1;
		END IF;
	UNTIL done END REPEAT;
	CLOSE row_cursor;
	SET nums = sum;
END

Call the storage function.
CALL test_cursor(@nums);
SELECT @nums;
Copy the code

The trigger

A trigger is a special stored procedure. In the previous example, we needed to actively invoke the stored procedure via call, but the trigger is a callback function — the database creates a child process to run it when certain conditions are met. In addition to naming, there are three main factors needed to create a trigger:

  1. Trigger time:BEFORE(previously triggered) orAFTER(Triggered later).
  2. Trigger event: IncludesINSERT.UPDATE.DELETE.
  3. Specifies the table to which the trigger is bound.

To sum up, creating triggers requires syntax like this:

CREATE TRIGGER <`trigger_name`> <`trigger_time`> <`trigger_event`> 
ON <`trigger_name`> FOR EACH ROW 
BEGIN
	#... stmts
END
Copy the code

For example, let’s create a trigger that records how many people have performed a DELETE operation on the table:

Create a table on which triggers count.
CREATE TABLE bank_log(
	item  VARCHAR(20) PRIMARY KEY.count INT
);
-- Insert initial values
INSERT INTO bank_log(item,count) VALUES ('inserts'.0)

DROP TRIGGER test.count_insert
Create trigger
CREATE TRIGGER count_insert AFTER INSERT
ON bank FOR EACH ROW 
BEGIN
	UPDATE bank_log SET count = count + 1 WHERE item = 'inserts';
END

Try triggering count_insert
INSERT INTO bank(`name`,balance) VALUES ('Lin Qing'.4000)

-- Check whether the trigger is in effect.
SELECT count from bank_log WHERE item = 'inserts'
Copy the code

The target tuple in an INSERT or UPDATE statement can be captured in a trigger. We can intercept “newly sent tuple data”, referred to as NEW; You can also get “OLD tuple data”, which refers to OLD. Both are collectively referred to as the transition variables of the trigger. With these two variables, we can get the “NEW” or “OLD” value of a property by either new. XXX or old. XXX.

The use of transition variables is constrained by the trigger event and the trigger timing:

  1. forINSERTType of trigger that can only be retrievedNEW
  2. forDELETEType of trigger that can only be retrievedOLD.
  3. forUPDATEType of trigger,NEWOLDAll available.
  4. forBEFOREType of trigger,INSERTUPDATETriggers can be set directlyNEWModify.
  5. forAFTERType of trigger,NEWIt’s read-only.
  6. OLDIt’s read-only anyway.

With triggers, we can try to avoid errors by performing illegal operations and correct the error tuple before it is inserted into the database. In addition, when a database schema has foreign key constraints, using triggers to safely cascade deletes is also a good option.

Here’s a simple example: Before inserting a NEW student score (this tuple contains name and score, denoted by NEW), the trigger can first check before recording to the database: If the student’s grade is less than 60, null is used to indicate “fail”.

-- If the score is below 60, it will be recorded as a failure and the score will be cancelled.
CREATE TRIGGER check_score BEFORE INSERT
ON student_grades FOR EACH ROW
BEGIN
	-- Modify variables directly through SET.
	IF NEW.score < 60 THEN SET NEW.score = null;
	END IF;
END
Copy the code

Triggers are open by default after creation and can be actively closed with the ALTER statement:

ALTER TRIGGER <`trigger_name`> disable
Copy the code

In particular, when doing a master/slave backup, you should keep the trigger off because filtered data should not be fed into the trigger twice. For the backup replication system that is to take over the primary system, disable the trigger first and then enable it when the backup system starts services.

A trigger may fire another trigger during execution. At worst, this could lead to an infinite chain of triggers. Therefore, you should be careful when writing and using triggers. Triggers are not required if an AOP operation can be implemented through stored procedures.

[mysql trigger Err] 1362-updating of NEW row is not allowed in after trigger. (faceghost.com)

Mysql triggers new and old – JimmyShan – blogblogs.com

Recursive query

Make sure MySQL version 8.0+ is available before using recursive queries.

Assume that there is a table of PREReq for each course and its precursors:

course_id prereq_id
BIO-301 BIO-101
BIO-399 BIO-101
CS-190 CS-101
CS-315 CS-190
CS-319 CS-101
CS-319 CS-315
CS-347 CS-319

Suppose there is now a requirement: for each course: identify all of its direct, or indirect, prerequisites. For example, the prerequisite course of CS-315 is CS-190, and the prerequisite course of CS-190 is CS-101. Therefore, CS-101 is also the prerequisite course of CS-315.

In discrete mathematics terms, we are calculating the transitive closure of prereq. There are many applications that need to compute similar transitive closures on hierarchies. First, the pseudo-code of iterative form is given:

1. Prepare a temporary table rec_PREReq. 2. Add all pre-taken courses from target course CID to REC_PREReq. 3.LOOP: IF TMP set changes THEN BEGIN Add rec_PREReq to rec_prereq END the ELSE doneCopy the code

To summarize this iterative process and give a recursive definition of all prerequisites for a course:

  1. Prerequisite courses that can be searched directly on prereq (step 1,2 of the iteration);
  2. All prerequisites (direct or indirect) for the course (an iterative cycle).

This logic is implemented using SQL statements, where the RECURSIVE keyword is introduced (not all databases use this keyword).

-- RECURSIVE is not available until mysql 8.0
WITH RECURSIVE rec_prereq(course_id,prereq_id) AS (
    -- equivalent to definition 1.
	SELECT course_id,prereq_id 
	FROM prereq WHERE
	UNION
    -- equivalent to definition 2.
	The following SQL statement can be analogous to a compound operation in discrete mathematics:
	- R {< r.c ourse_id, r.p rereq_id >} "S {< s.c ourse_id, supachai panitchpakdi rereq_id >} = > {< r.c ourse_id, supachai panitchpakdi rereq_id >}
	-- where r. plyq_id and S. court _id are the same class attributes.
	SELECT rec_prereq.course_id,prereq.prereq_id
	FROM prereq JOIN rec_prereq ON rec_prereq.prereq_id = prereq.course_id
)
SELECT * FROM rec_prereq
Copy the code

In this RECURSIVE view, implemented by the WITH RECURSIVE keyword, we do not create temporary tables as we do iteratively, but instead add all the new relationships queried directly to the current view. The standard form of a recursive view is written as a union of two subqueries:

  1. Base query: definition 1 in this example.
  2. Use recursive Queries of the recursive view itself, as defined 2 in this example.

Using the analogy of a recursive function implemented in a general-purpose programming language, the most intuitive way to understand a recursive query is to pass in a set T’ with an initial value of ∅. The function constantly finds the implicit set of transfer relations S from the base query R, and makes T’ := T’ ∪ S. The change in T’ may introduce a new transitive relationship, so recursively check T’ again and try to add a new set S.

“T’ does not change” is set as a critical condition, and when the recursion meets this condition, it quits. At this time, T’ is called a fixed point. In recursive queries, this logic is judged by the database, and we don’t have to explicitly write the IF branch. The final result is a set that satisfies definitions 1 and 2, that is, T = T’ ∪ R, which is represented in an SQL statement as two subqueries of a UNION join.

Clearly, rec_PREReq is a superset of primitive base query PREReq: that is, rec_PREReq contains more information because it also contains those implicit transitive relationships. Our generalization leads to the conclusion that recursive queries must return at least the same result set as before, with the possibility of returning additional tuples. To put it more succinctly: recursive queries are definitely monotonic.

Correlation subquery

Suppose you have a score table student_score (id, name, grade) and now print the rank of each student. And: When more than one student has the same score, the ranking should be the same. For example, if the top three students have the same score, all three are ranked 1, and the next ranking starts at 4.

Consider the following with the aggregation function we have learned so far: we know that if there are n-1 scores higher in the class for a student, his rank will naturally be N. Select COUNT(*) from COUNT(*);

SELECT ( 1 + (SELECT COUNT(*) FROM student_grades AS b WHERE b.score > a.score)) AS `rank`.`name`,score 
FROM student_grades AS a
Copy the code

This is a correlation subquery. The internal query (student_score, alias B) uses the values of the external query (student_score, alias A) because when we calculate the rank value for each tuple in A, Table B is scanned, and the WHERE clause is associated with both tables A and B.

To generalize to a more general case, if the inner table has M records and the outer table has N records, the time complexity of the query will be O (Mn). For this example, the time complexity is approximately O (n2). Later, we’ll use the faster and more efficient RANK() function.

On the other hand, if there is no clause inside the subquery associated with the outer surface, then it is an irrelevant subquery. In this case, the query time complexity is O (m+ N).

Other options for implementing rankings

Here we introduce the RANK() function to implement the problem left over from the “relevant subquery” section. It is written as follows:

SELECT RANK(a)OVER (ORDER BY score DESC) as `rank`.name,score 
FROM student_grades
Copy the code

Where ORDER BY is moved to the front to act as a predicate for RANK(). The RANK() function helps us to RANK each tuple, but it does not guarantee that the output will be ranked. This can be supplemented with an ORDER BY to ensure the output result:

SELECT RANK(a)OVER (ORDER BY score DESC) as `rank`.`name`,score
FROM student_grades 
ORDER BY `rank`
Copy the code

If you want to ensure that people with the same score have the same rank, but you also want no gaps between the ranks (for example, the top three students have the same score and are ranked 1, but the fourth student is ranked 2 instead of 4), you can use the DENSE_RANK() function instead.

SELECT DENSE_RANK(a)OVER (ORDER BY score DESC) as `rank`.`name`,score
FROM student_grades 
ORDER BY `rank`
Copy the code

If we use this query result as a derived table, it becomes fairly easy to implement requirements such as “query the NTH name”, “query the n-m name”. Note that MySQL strictly requires that derived tables have an alias:

SELECT `name` FROM (SELECT DENSE_RANK(a)OVER (ORDER BY score DESC) as `rank`.`name`,score FROM student_grades) as rank_table
WHERE `rank` = 2
Copy the code

As a bonus, the ranking starts with a 1, not a 0.

If a NULL value is encountered during the ranking process, such as a student missing an exam, we need to specify that the NULL tuple is placed last or first (depending on the semantics). Oracle database provides the NULLS FIRST or NULLS LAST keyword, but in MySQL we need to use functions to implement the equivalent. Note that MySQL is NULLS FIRST by default.

NULLS FIRST is NOT ISNULL(

). ORDER BY clause:

- NULL priority
SELECT *
FROM student_grades
ORDER BY NOT ISNULL(score)
Copy the code

If NULLS LAST is implemented, ISNULL(

) can be used instead in the ORDER BY clause:

SELECT *
FROM student_grades
ORDER BY ISNULL(score)
Copy the code

Reference: [mysql sort null record last (implementation of oracle crosstab first | last]) _ playing undead jack Bauer lee’s blog – CSDN

Other forms of RANK()

The Percent_rank() function shows the percentage-p of the current tuple in the whole. Obviously, the first place’s P = 0.00 and the last place’s P = 1.00. In the case of “beat xx.xx% users” application requirements, simply calculate the value of 1-p.

SELECT PERCENT_RANK(a)OVER (ORDER BY score DESC) as `rank`.`name`,score 
FROM student_grades as rank_table
Copy the code

Row_number() is a Rank() that does not take into account parallelism, thus ensuring that the ordinal number of each tuple is unique. When it encounters tuples of the same rank, the database directly sorts them in an unstable fashion.

SELECT ROW_NUMBER() OVER (ORDER BY score DESC) as `rank`.`name`,score 
FROM student_grades as rank_table
Copy the code

Ntile (n) averages sorted tuples into buckets (partitions). Buckets contain an interval of rank, and for individual tuples, they are identified only by bucket numbers.

SELECT NTILE(4) OVER (ORDER BY score DESC) as `rank`.`name`,score 
FROM student_grades as rank_table
Copy the code

When buckets cannot be evenly divided, the number of tuples between each bucket differs by at most one. A tuple at the “two bucket boundary” may be assigned to two different groups even if they are equally placed. This function is particularly useful for constructing percentages based histograms.

Dept_name = dept_name = dept_name = dept_name = dept_name = dept_name = dept_name = dept_name = dept_name = dept_name

SELECT RANK(a)OVER (PARTITION BY `dept_name` ORDER BY score DESC) AS `rank`.`dept_name`.`name`.`score`
FROM student_grades
Copy the code

Of course, if we are only interested in a particular professional ranking, there is no need to use the PARTITION BY clause because it can be replaced BY the following query:

SELECT RANK(a)OVER (ORDER BY score DESC) AS `rank`.`dept_name`.`name`.`score`
FROM student_grades
WHERE `dept_name` = 'history'
Copy the code

LIMIT

Let’s go back to the more general sorting problem. When our concern is not “where does each student rank?”, requirements such as “top n” and “query n-m” can be addressed using the LIMIT keyword.

-- Take the top 3 with the highest score
SELECT * FROM student_grades ORDER BY score DESC
LIMIT 3

-- Take the highest score of 3-5
SELECT * FROM student_grades ORDER BY score DESC
LIMIT 3.5
Copy the code

The LIMIT keyword is often used for general paging queries. However, when the specified offset is very large, LIMIT based data query becomes very inefficient. See the solution: actual combat! How to solve deep pagination problems in MySQL

Points window

Suppose we have a table like this: Stock (price,year) records the price of a stock from 2010 to 2020, with only one data per year.

CREATE TABLE stock(
	price DOUBLE.`year` CHAR(4) PRIMARY KEY
)
Copy the code

Now you are required to average the stock prices for each year and the previous two years and return a new table. For example, take the average of years 13 and 12,11, 14 and 13, 12, and so on. In particular, for the earliest year, 2010, since there is no data for 2009 and 2008, the average is the price for that year. For 2011, there is no data for 2009, so the average only takes into account 2010. The following GIF illustrates the need for computation:

If you’re looking for a fixed mean over three years, you can easily do this using the basic AVG function, but now you’re looking at a mean every three years, and things get a little trickier. From the action diagram, the calculation process appears to be pushing a “sliding window”. For such calculations, SQL provides window functions. Window calculations are common in trend analysis of the stock market, and various “moving averages” can be found on business and investment websites.

In the PRECEDING row, 'each and the two PRECEDING ROWS'
-- ORDER BY 'year' ensures that data is ordered BY year, default is ASC, so no extra keyword is required.
SELECT `year`.AVG(price) OVER (ORDER BY `year`ROWS 3 PRECEDING) AS `avg` 
FROM stock
Copy the code

There are many forms of window splitting. For example, take an average of each year and the next two years:

-- ROWS 3 FOLLOWING = 'each row and two ROWS'
SELECT `year`.AVG(price) OVER (ORDER BY `year`ROWS 3 FOLLOWING) AS `avg` 
FROM stock
Copy the code

The first 3 lines and the last 3 lines (counting the “current line”, up to 7 line tuples will be counted) are computed as one window:

-- BETWEEN N PRECEDING AND M FOLLOWING, the PRECEDING AND M FOLLOWING sequence can contain at most M + N + 1.
SELECT `year`.AVG(price) OVER (ORDER BY `year`ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS `avg` 
FROM stock
Copy the code

If you want one side of the window to be UNBOUNDED, use UNBOUNDED to replace specific numbers. Here are the different scenarios:

Ignore the rest of the SQL statement.
-- Evaluates the current row and all previous rows
ROWS UNBOUNDED PRECEDING
​
-- Evaluates the current row and all subsequent rows
ROWS UNBOUNDED FOLLOWING
​
Count all the preceding lines, the current line, and the next three lines.
ROWS BETWEEN UNBOUNDED AND 3 FOLLOWING 
​
Count the first three lines, the current line, and all subsequent lines.
ROWS BETWEEN 3 AND UNBOUNDED FOLLOWING
Copy the code

The average of the stock for the last two years and this year.

-- They mean the same thing.
ROWS BETWEEN 3 AND CURRENT ROW
ROWS 3 PRECEDING
Copy the code

The windowing function can be windowed not only BY rows, but also BY the range of the ORDER BY attribute values. Such as:

-- If the price of the current row is c, then the window selects all rows whose price is in the range [c-0.5,c]. SELECT AVG(price) OVER (ORDER BY price RANGE BETWEEN 0.5 PRECEDING AND CURRENT ROW) AS AVG, 'year' FROM stock; SELECT AVG(price) OVER (ORDER BY price RANGE 0.5 PRECEDING) AS AVG, 'year' FROM stock; -- If the price of the current row is c, then the window selects all rows whose price is in the range [c-0.5,c+0.6]. SELECT AVG(price) OVER (ORDER BY price RANGE BETWEEN 0.5 AND 0.6 FOLLOWING) AS AVG, 'year' FROM stock;Copy the code

As an additional note, if RANGE is used for windowings, the ORDER BY attribute must be numeric, which is computable, or a differential interval between two times.

What are SQL window functions? What a revelation! – Java Technology Stack – Blogpark (CNblogs.com)

Slide Windows in mysql window functionsWeixin_46338676 blog -CSDN blogMysql slide window

Equivalent implementation of rotation PIOVT in MySQL

The concept of “rotation” was imported from Oracle, SQL Server database. MySQL does not provide pivot keyword processing directly, but we can still implement the concept using basic SQL statements.

Suppose you have a table sales(Item_name, color, Clothes_size, quantity), some of whose attributes are already constrained with CHECK to indicate that their range is a finite set. The quantity property represents the inventory of this item.

CREATE TABLE sales(
    item_name VARCHAR(20),
    color VARCHAR(10),
    clothes_size CHAR(1),
    quantity INT(5),
    PRIMARY KEY(item_name,color,clothes_size),
    CHECK(item_name IN ('dress'.'pants'.'shirt'.'skirt')),
    CHECK(color IN ('black'.'white'.'pastel')),
    CHECK(clothes_size IN ('S'.'M'.'L')))Copy the code

The instance data of table SALES is shown in the figure below:

Now, we want to treat (Item_name, Clothes_size) as a specific combination and calculate the total quantity of inventory in different color dimensions for each combination. The expected statistics are shown in the figure below:

The color range (‘dark’, ‘pastel’, ‘white’) is rotated as a property in this table. The table above is another representation of the original SALES table, also known as a crosstab, or PivotTable. This abstract query can be implemented directly in a database such as Oracle using the PIOVT clause:

SELECT * FROM sales
PIVOT(
    SUM(quantity)
    FOR color IN ('dark'.'pastel'.'white'))Copy the code

Since MySQL does not provide this keyword, the equivalent implementation is given by combining basic functions. Item_name and Clothes_size are grouped together using the GROUP BY clause, meaning “M size skirt”, “S size short sleeve”… And so on. Dark, white and other attributes that are expanded and rotated all represent “inventory of this group of goods in various color dimensions”. Then, according to the color of each item in the grouping, the quantity of the item is summarized to the color column, while the value of 0 is contributed to the remaining two unrelated columns. This logic is implemented using the IF() function.

CREATE PROCEDURE pivot(a)READS SQL DATA
BEGINSELECT item_name,clothes_size,
    SUM(IF(color = 'dark',quantity,0)) as dark,
    SUM(IF(color = 'pastel',quantity,0)) as pastel,
    SUM(IF(color = 'white',quantity,0)) as white
    FROM sales
    GROUP BY item_name,clothes_size;
​
END
Copy the code

GROUP BY

Sometimes, data analysts want to be able to aggregate data in multiple forms. The following two concepts of high latitude grouping are introduced: ROLLUP and CUBE.

ROLLUP

Continue with the example of table SALES: first, group the goods by subclass (Item_name, Clothes_size) and count the inventory. Then, group the goods by subclass according to item_name and make a subtotal. Finally, do a total of the inventory of all goods.

If only basic aggregation functions were used, the above requirements would be implemented using three SQL statements:

-- Statistics subclass
SELECT item_name,color,SUM(quantity)
FROM sales
GROUP BY item_name,color;
​
-- Statistical categories
SELECT item_name,SUM(quantity)
FROM sales
GROUP BY item_name;
​
-- Do a total for all inventory
SELECT SUM(quantity) as `total`
FROM sales;
Copy the code

SQL is equivalent to doing all three calculations at once:

I use ROW_NUMBER() here to print the number of rows for illustration purposes. The ROLLUP syntax varies from DBMS to DBMS, and only mysql is shown here. SELECT ROW_NUMBER() OVER () line,item_name,color,SUM(quantity) FROM sales GROUP BY item_name,color with ROLLUPCopy the code

The query result is as follows, in which the rows not marked in color are the inventory statistics of small classes. Lines 3, 7, and 10 are the inventory subtotals for each item_NAME category, while line 11 is the total inventory for all items. In order to consolidate statistics from different dimensions into one table, the database sets some attribute values to NULL.

When comparing two SQL codes, the table is grouped BY three groups at once:

1. GROUP BY(item_name,color)
2. GROUP BY(item_name)
3. GROUP BY()
Copy the code

Where, GROUP BY() means no grouping, which means a total of all tuples in the table. These groups are actually a collection of all the “prefixes” for the set of properties that ROLLUP selected. For example, if we were to scroll up an attribute set (Item_name, color, clothes_size), there would be the following groups:

1. GROUP BY(item_name,color,clothes_size)
2. GROUP BY(item_name,color)
3. GROUP BY(item_name)
4. GROUP BY()
Copy the code

Obviously, if n properties are selected in the ROLLUP clause, n+1 groups will be divided. Note that the order in which attributes are selected in ROLLUP affects the result of the calculation. ROLLUP(item_name, color) is not equal to ROLLUP(color, item_name). If the SQL looks like this:

SELECT item_name,color,SUM(quantity)
FROM sales
GROUP BY color,item_name with ROLLUP
Copy the code

Then the grouping of ROLLUP is:

1. GROUP BY(color,item_name)
2. GROUP BY(color)
3. GROUP BY()
Copy the code

Note in particular that if the grouped columns contain NULL, the result of the roll-up may be incorrect. NULL has special meaning when grouping statistics. Therefore, NULL can be converted to a value that cannot exist, or a value that has no special meaning, such as IFNULL(XXX,0). Reference: MySQL ROLLUP and CUBE problems _ITPUB blog

CUBE

MySQL seems to have completely removed the CUBE keyword after version 8.0.1 (and CUBE support is not actually available in older versions of MySQL). MySQL :: MySQL 8.0 Release Notes :: Changes in MySQL 8.0.1 (2017-04-10, Development Milestone) : The unimplemented and nonstandard WITH CUBE clause for GROUP BY is no longer supported.

The CUBE clause is a higher-dimensional partition that generates more groupings made up of all subsets of the attributes listed in the CUBE clause. Such as:

SELECT item_name,color,SUM(quantity)
FROM sales
GROUP BY (item_name,color,clothes_size) WITH CUBE
Copy the code

This is equivalent to dividing the following groups BY:

1. GROUP BY(item_name,color,clothes_size)
2. GROUP BY(item_name,color)
3. GROUP BY(item_name,clothes_size)
4. GROUP BY(color,clothes_size)
5. GROUP BY(item_name)
6. GROUP BY(color)
7. GROUP BY(clothes_size)
8. GROUP BY()
Copy the code

Obviously, if n properties are selected in the CUBE clause, 2N groups are divided. Since MySQL no longer supports the CUBE keyword, executing the above SQL statement directly will report a syntax error. However, a CUBE can be viewed as a union of multiple rollups. So we can still implement CUBE equivalently by combining the forms of ROLLUP.

1.ROLLUP(item_name,color,clothes_size)
    - GROUP BY(item_name,color,clothes_size)
    - GROUP BY(item_name,color)
    - GROUP BY(item_name)
    - GROUP BY()
​
UNION
​
2.ROLLUP(color,clothes_size)
    - GROUP BY(color,clothes_size)
    - GROUP BY(color)
    - GROUP BY()
    
UNION
​
3.ROLLUP(clothes_size)
    - GROUP BY(clothes_size)
    - GROUP BY()
Copy the code

We can see from this that CUBE is a higher dimension grouping method than ROLLUP.

SELECT item_name,color,clothes_size,SUM(quantity) FROM sales
GROUP BY item_name,color,clothes_size WITH ROLLUPUNIONTo ensure that the intersection of three tables is output in a unified table, NULL is used for missing columns.
SELECT item_name,color,NULL AS clothes_size,SUM(quantity) FROM sales
GROUP BY item_name,color WITH ROLLUPUNIONSELECT item_name,NULL AS color,NULL AS clothes_size,SUM(quantity) FROM sales
GROUP BY item_name WITH ROLLUP
Copy the code