MySQL aggregate functions and paging queries

MySQL database (MySQL installation/Basic/advanced/optimized)

We covered SQL single-line functions in the previous chapter. In fact, there is another class of SQL functions, called aggregate (or aggregate, group) functions, which are functions that aggregate a set of data, input a set of data, and output a single value.

1. Introduction to aggregation functions

What is an aggregation function

Aggregate functions operate on a set of data and return a value for a set of data.

Aggregate function type

  • AVG()
  • SUM()
  • MAX()
  • MIN()
  • COUNT()

Aggregate function syntax

Aggregate functions cannot be nested

For example, a call of the form “AVG(SUM(field name))” cannot occur.

1.1 AVG and SUM functions

You can use AVG and SUM functions for numeric data.

SELECT AVG(salary), MAX(salary),MIN(salary), SUM(salary)
FROM   employees
WHERE  job_id LIKE '%REP%';
Copy the code

1.2 MIN and MAX functions

You can use the MIN and MAX functions for data of any data type.

SELECT MIN(hire_date), MAX(hire_date)
FROM employees;
Copy the code

1.3 the COUNT function

  • COUNT(*)Returns the total number of records in the table, applicable toArbitrary data type.
SELECT COUNT(*)
FROM   employees
WHERE  department_id = 50;
Copy the code
  • COUNT(expr)returnExpr isn’t emptyTotal number of records.
SELECT COUNT(commission_pct)
FROM   employees
WHERE  department_id = 50; //Ignore theNullvalueCopy the code

Calculate how many records there are in the table

  • Method 1:count(*)
  • Method 2:count(1)
  • Method 3:Count (a specific field), but not necessarily because null values are ignored

Question: Use count(*), count(1), count(column name) which is better?

In fact, there is no difference for MyISAM engine tables. This engine has an internal counter that maintains the number of rows, but COUNT(*) is a little more efficient

Innodb engine tables use count(*), count(1) to read rows directly, complexity is O(n), because Innodb really counts. But it’s better than the actual count(column name).

Question: Can I replace count(*) with count(column name)?

Do not use count(column name) instead of count(*). Count (*) is a standard row count syntax defined in SQL92, independent of database, NULL and non-NULL.

Note: Count (*) counts rows with a NULL value, while count(column name) does not count rows with a NULL value.

Note:

  1. The above grouping functions ignore null values
  2. It can be used with distinct to implement a deduplication operation
  3. A separate introduction to the count function, which generally uses count(*) to count the number of rows
  4. The fields to be queried with the grouping function are the fields after group by

2. GROUP BY

2.1 Basic Usage

The GROUP BY clause can be used to divide the data in a table into groups as follows:

SELECT column, group_function(column)
FROM table
[WHERE	condition]
[GROUP BY  group_by_expression]
[ORDER BY  column];
Copy the code

Specify: WHERE must be placed after FROM

All columns in the SELECT list that are not included in the GROUP function should be included in the GROUP BY clause

SELECT   department_id, AVG(salary)
FROM     employees
GROUP BY department_id ;
Copy the code

2. Columns contained in the GROUP BY clause need not be included in the SELECT list

SELECT   AVG(salary)
FROM     employees
GROUP BY department_id ;
Copy the code

2.2 Grouping multiple columns

SELECT   department_id AS dept_id, job_id, SUM(salary)
FROM     employees
GROUP BY department_id, job_id ;
Copy the code

2.3 GROUP BY uses WITH ROLLUP

After the WITH ROLLUP keyword is used, a record is added after all queried group records, which calculates the sum of all queried records, that is, the number of statistics records.

SELECT department_id,AVG(salary)
FROM employees
WHERE department_id > 80
GROUP BY department_id WITH ROLLUP;
Copy the code

Note: when using ROLLUP, the ORDER BY clause cannot be used simultaneously to sort the results, i.e. ROLLUP and ORDER BY are mutually exclusive, which of course only exists in 5.7

3. HAVING(filtering data)

3.1 Basic Usage

Filter groups: HAVING clauses

  1. Rows have been grouped.
  2. Aggregate functions are used.
  3. Groups that meet the criteria in the HAVING clause will be displayed.
  4. HAVING cannot be used alone, it must be used with GROUP BY.

SELECT   department_id, MAX(salary)
FROM     employees
GROUP BY department_id
HAVING   MAX(salary)>10000 ;
Copy the code

Illegal use of aggregate functions: You cannot use aggregate functions in place of filtering conditions in the WHERE clause. As follows:

SELECT   department_id, AVG(salary)
FROM     employees
WHERE    AVG(salary) > 8000
GROUP BY department_id;
Copy the code

Exercise: Query information about the departments whose highest salary is higher than 10000 in the departments whose ids are 10,20,30, and 40

# way1: Recommended, execution efficiency over method2.
SELECT department_id,MAX(salary)
FROM employees
WHERE department_id IN (10.20.30.40)
GROUP BY department_id
HAVING MAX(salary) > 10000; # way2:SELECT department_id,MAX(salary)
FROM employees
GROUP BY department_id
HAVING MAX(salary) > 10000 AND department_id IN (10.20.30.40);
Copy the code

Conclusion:

  1. When a filter has an aggregate function, the filter must be declared in HAVING.
  2. When there is no aggregate function in the filter criterion, the filter criterion declaration can be either in WHERE or IN HAVING. However, it is recommended that you declare it in WHERE

3.2 WHERE and HAVING

1.In terms of scope of application,HAVINGIs applicable to a wider range of.2.If there is no aggregate function in the filter condition: in this case,WHEREThe execution efficiency is higher thanHAVING
Copy the code

Difference 1: WHERE can use fields in the table directly as filter criteria, but cannot use calculation functions in the group as filter criteria; HAVING must be used in conjunction with GROUP BY, which can be filtered BY GROUP computed functions and GROUP fields.

This allows HAVING to do things WHERE can’t when it comes to grouping statistics. This is because, in the query syntax structure, WHERE comes before GROUP BY, so you cannot filter the grouped results. HAVING after GROUP BY allows you to filter the result set of a GROUP using the GROUP fields and the calculation functions in the GROUP, something WHERE cannot do. In addition, records excluded by WHERE are no longer included in the grouping.

Difference 2: If you need to fetch data from an associated table by joining, WHERE is filtered and joined, while HAVING is joined and filtered. This makes WHERE more efficient than HAVING in a relational query. Because WHERE can be filtered first, it uses a smaller filtered set of data to join the associated table, which consumes less resources and is more efficient to execute. In HAVING, the result set needs to be prepared first, that is, the unfiltered data set is associated, and then the large data set is filtered. In this way, more resources are occupied and the execution efficiency is low.

The summary is as follows:

advantages disadvantages
WHERE(Filter before grouping) Filtering data before association, high execution efficiency Filtering cannot be done using computed functions in groups
HAVING(Filter after grouping) You can use calculation functions in groups Filtering in the final result set is inefficient

Options in development:

WHERE and HAVING are not mutually exclusive. We can use both WHERE and HAVING in one query. The condition that contains grouped statistics functions is HAVING, and the general condition is WHERE. In this way, we take advantage of the speed and efficiency of the WHERE condition while taking advantage of HAVING to use query conditions that contain grouped statistics functions. When the volume of data is very large, the performance can make a big difference. Generally speaking, if you can use pre-group screening, try to use pre-group screening to improve efficiency

4. Review: paging query ★

Application scenario: If the data to be displayed is incomplete on one page, you need to submit SQL requests on separate pages

Grammar:

  selectQuery listfromThe table"join type join2
  onJoin conditionwherefiltergroup byThe grouping fieldhavingFiltering after groupingorder bySelect * from 'limit';offset】 the size;offsetTo display the entry's starting index (starting index from0Start) size Number of entries to displayCopy the code

Features:

  • The limit statement is placed at the end of the query

  • The formula

    selectQuery listfromThe table limit (page- 1)*size,size;
    Copy the code

Assume size=10, that is, 10 records are displayed per page, and page starts at 1, the first page

  • If page=1, the starting index of entries is 0, and 0-10 entries are displayed on the page
  • Page =2, the starting index of the items displayed is 10, and the page displays 11-20 items
  • If page=3, the starting index of items displayed is 20, and the page displays 21-30 items

Case 1: Query information about the first five employees

SELECT * FROM employees LIMIT 0.5;
SELECT * FROM employees LIMIT 5;
Copy the code

Case 2: Query articles 11-25

SELECT * FROM employees LIMIT 10.15;
Copy the code

Case 3: Information on employees with bonuses, and the top 10 with higher salaries are displayed

SELECT *
FROM employees 
WHERE commission_pct IS NOT NULL 
ORDER BY salary DESC
LIMIT 10 ;
Copy the code

5. SELECT execution process

5.1 Complete structure of SELECT statement

Sql92 syntax:
SELECT. . .FROM. . .WHEREJoin conditions for multiple tablesANDDoes not contain filter conditions for group functionsGROUP BY. .HAVINGContains filter conditions for group functionsORDER BY.ASC/DESC
LIMIT. .Sql99 syntax
SELECT. . .FROM.JOIN.ONJoin conditions for multiple tablesJOIN.ON.WHEREDoes not contain filter conditions for group functionsAND/ORDoes not contain filter conditions for group functionsGROUP BY. .HAVINGContains filter conditions for group functionsORDER BY.ASC/DESC
LIMIT. .# :
# (1) from
# (2) on: Remove cartesian product when associated with multiple table queries
# (3) WHERE: Filter criteria from table
# (4) Group by
# (5) Having: Filter again in statistics
# (6) Order by
# (7) limit: paging
Copy the code

5.2 SELECT Execution order

You need to remember the two order in which a SELECT query occurs:

1. The order of keywords cannot be reversed:

SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT...
Copy the code

MySQL > SELECT (); SELECT ();

FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> DISTINCT -> ORDER BY -> LIMITCopy the code

For example, if you write an SQL statement, the order of its keywords and execution is as follows:

SELECT DISTINCT player_id, player_name, count(*) as num Order # 5
FROM player JOIN team ON player.team_id = team.team_id Order # 1
WHERE height > 1.80 Order # 2
GROUP BY player.team_id Order # 3
HAVING num > 2 Order # 4
ORDER BY num DESC Order # 6
LIMIT 2 Order # 7
Copy the code

As the SELECT statement executes these steps, each step produces a virtual table, which is then passed in as input to the next step. It is important to note that these steps are implicit in SQL execution and are not visible to us.

Since where is filtered first, the group by statement is grouped in advance, with less data participating in the grouping and therefore more efficient execution

5.3 SQL Execution Principles

SELECT starts with the FROM step. At this stage, if multiple tables are checked, the following steps will also be experienced:

  1. First of all, the Cartesian product is calculated by CROSS JOIN, which is equivalent to obtaining vt (Virtual table) 1-1.
  2. ON is used to filter and virtual table VT1-2 is obtained based ON virtual table VT1-1.
  3. Add external rows. If we use left join, right join, or full join, we will involve external rows, that is, add external rows on the basis of virtual table VT1-2 to obtain virtual table VT1-3.

Of course, if we are working with more than two tables, we repeat the above steps until all tables are processed. And this is our raw data.

Once we have the raw data for the query table, which is the final virtual table VT1, we can proceed to the WHERE stage. In this stage, virtual table VT2 is obtained by filtering the results of vt1 table.

Then move on to step 3 and 4, the GROUP and HAVING stages. In this phase, virtual tables VT3 and VT4 are grouped and filtered based on virtual table VT2.

When we are done with the conditional filtering section, we are ready to filter the fields extracted from the table, which is the SELECT and DISTINCT phase.

In the SELECT phase, the desired fields are extracted, and then the duplicate rows are filtered out in the DISTINCT phase to obtain the intermediate virtual tables VT5-1 and VT5-2, respectively.

After we extract the desired field data, we can sort BY the specified field, namely the ORDER BY stage, to obtain the virtual table VT6.

Finally, on the basis of VT6, extract the specified row records, that is, the LIMIT phase, to obtain the final result, corresponding to the virtual table VT7.

Of course, when we write a SELECT statement, we may not have all the keywords, the corresponding stage will be omitted.

At the same time because SQL is a similar to English structured query language, so we write SELECT statements, but also pay attention to the corresponding keyword order, ** the so-called underlying operation principle, is we just talked about the execution order. ** For more details, refer to the high-level architecture that follows

6. Practice after class

Integrated Exercise 1

1. Can the WHERE clause be filtered using group functions? No

2. Query the maximum, minimum, average, and total salaries of employees

SELECT MAX(salary), MIN(salary), AVG(salary), SUM(salary)
FROM employees;
Copy the code

3. Query the maximum, minimum, average, and total salaries of each job_id employee

SELECT job_id, MAX(salary), MIN(salary), AVG(salary), SUM(salary)
FROM employees
GROUP BY job_id;
Copy the code

4. Select the number of employees with each job_id

SELECT job_id, COUNT(*)
FROM employees
GROUP BY job_id;                                              
Copy the code

5. Query the DIFFERENCE between employees’ highest salary and lowest salary (DIFFERENCE)

SELECT MAX(salary), MIN(salary), MAX(salary) - MIN(salary) DIFFERENCE
FROM employees;
Copy the code

6. Query the minimum salary of employees under each manager. The minimum salary cannot be less than 6000

SELECT manager_id, MIN(salary)
FROM employees
WHERE manager_id IS NOT NULL
GROUP BY manager_id
HAVING MIN(salary) > 6000;
Copy the code

7. Query all departments by name, location_id, number of employees, and average salary, in descending order by average salary

SELECT department_name, location_id, COUNT(employee_id), AVG(salary) avg_sal
FROM employees e RIGHT JOIN departments d
ON e.`department_id` = d.`department_id`
GROUP BY department_name, location_id
ORDER BY avg_sal DESC;
Copy the code

8. Query the department name, job name, and minimum wage for each job and department

SELECT department_name,job_id,MIN(salary)
FROM departments d LEFT JOIN employees e
ON e.`department_id` = d.`department_id`
GROUP BY department_name,job_id
Copy the code

Integrated Exercise 2

1. Simple groups

Case 1: Query the average salary of employees in each job category

SELECT AVG(salary),job_id
FROM employees
GROUP BY job_id;
Copy the code

Case 2: Query the number of departments in each location

SELECT COUNT(*),location_id
FROM departments
GROUP BY location_id;
Copy the code

2. Filtering before grouping can be implemented

Example 1: Query the highest salary for each department whose mailbox contains a character

SELECT MAX(salary),department_id
FROM employees
WHERE email LIKE '%a%'
GROUP BY department_id;
Copy the code

Case 2: Query the average salary of each boss who has a bonus

SELECT AVG(salary),manager_id
FROM employees
WHERE commission_pct IS NOT NULL
GROUP BY manager_id;
Copy the code

3. Group and filter

Case 1: Query the number of employees in the department >5

# 1 Query the number of employees in each departmentSELECT COUNT(*),department_id
FROM employees
GROUP BYdepartment_id; Filter the previous ① resultSELECT COUNT(*),department_id
FROM employees
GROUP BY department_id
HAVING COUNT(*)>5;
Copy the code

Case 2: The maximum salary of the employee with bonus for each job > the job number and maximum salary of 12000

SELECT job_id,MAX(salary)
FROM employees
WHERE commission_pct IS NOT NULL
GROUP BY job_id
HAVING MAX(salary)>12000;
Copy the code

Case 3: For each leader with leader number >102, the minimum wage is greater than the leader number and minimum wage of 5000

SELECT manager_id,MIN(salary)
FROM employees
GROUP BY manager_id
Where manager_id>102
HAVING MIN(salary)>5000;
Copy the code

4. Add a sort

Case: The job number and the highest salary of the employee with bonus >6000 for each job, in ascending order of the highest salary

SELECT job_id,MAX(salary) m
FROM employees
WHERE commission_pct IS NOT NULL
GROUP BY job_id
HAVING m>6000
ORDER BY m ;
Copy the code

5. Group by multiple fields

Example: Query the minimum wage for each job type and department, and descending by the minimum wage

SELECT MIN(salary),job_id,department_id
FROM employees
GROUP BY department_id,job_id
ORDER BY MIN(salary) DESC;                   
Copy the code