preface
The offset function +over() is used as an example.
First, the concept of migration analysis function
-
Lag (Col,n,default) : statistics on the NTH row in a group.
- The first parameter is the column name
- The second argument is the NTH line up (optional, default is 1).
- The third parameter is the default value (if the NTH value above is NULL, or NULL if not specified).
-
Lead (COL, N,default) : Indicates the value of the NTH row in the packet, contrary to LAG.
- The first parameter is the column name
- The second argument is line n down (optional, default is 1)
- The third argument is the default value (if the behavior below n is NULL, NULL if not specified).
-
First_value (col) : used to fetch the value of the first col up to the current row after sorting within the group.
-
Last_value (col) : Indicates the value of the last col in the row after sorting.
Two, the example explanation
2.1 case
Select * from user [yantian]; select * from user [lisi];
Analysis: Implemented by the lag() window function
SELECT
user_name,
pay_time,
lag( pay_time, 1, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lag1,
lag( pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lag2,
lag( pay_time, 2, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lag3,
lag( pay_time, 2 ) over ( PARTITION BY user_name ORDER BY pay_time ) lag4
FROM
user_order
WHERE
user_name IN ( 'lisi'.'yantian' )
Copy the code
The results
2.2 case
Select * from user [yantian]; select * from user [lisi];
Analysis: Implemented through the Lead () window function
SELECT
user_name,
pay_time,
lead( pay_time, 1, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lead1,
lead( pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lead2,
lead( pay_time, 2, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lead3,
lead( pay_time, 2 ) over ( PARTITION BY user_name ORDER BY pay_time ) lead4
FROM
user_order
WHERE
user_name IN ( 'lisi'.'yantian' );
Copy the code
The results
2.3 case
Query the number of users whose payment interval exceeds 10 days
Analysis:
- For the same user, the order time of adjacent orders is compared. If the time interval between adjacent orders is more than 10 days, the user needs to be counted
- In this process, the payment time of adjacent orders needs to be subtracted
Implementation steps
(1) Group the payment time according to the lead window function, and move the payment time of the next line to the previous line. (2) Make the difference of the payment time through the previous step, and then delete the user, and then carry out the count operationCopy the code
implementation
Step 1: Group the payment time according to the payment time through the Lead window function and move the payment time of the next line to the previous line
SELECT
user_name,
pay_time,
lead( pay_time, 1 ) over ( PARTITION BY user_name ORDER BY pay_time ) lead_time
FROM
user_order;
Copy the code
Step 2: Make the difference of the payment time through the previous step, and then remove the user, and then count operation
SELECT
count( DISTINCT user_name )
FROM
( SELECT user_name, pay_time, lead( pay_time, 1 ) over ( PARTITION BY user_name ORDER BY pay_time ) lead_time FROM user_order ) a
WHERE
datediff( a.lead_time, a.pay_time )> 10;
Copy the code
2.4 case
Query the user with the longest payment interval per year
Implementation steps
(1) Group by user and year of payment, and merge the payment time of the previous row into the next row through the lag() window function. (2) Calculate the order time interval and rank the order time interval within a year. (3) Use the query result above as a sub-query table to obtain a ranking value of 1Copy the code
implementation
Step 1: Group by user and payment year, and merge the payment time of the previous row into the next row through the lag() window function
SELECT YEAR
( pay_time ) year_time,
user_name,
pay_time,
lag( pay_time ) over ( PARTITION BY user_name, YEAR ( pay_time ) ORDER BY pay_time ASC ) lag_time
FROM
user_order;
Copy the code
Step 2: Calculate the order interval and rank them according to their order interval within the year
SELECT
a.year_time,
a.user_name,
datediff( a.pay_time, a.lag_time ) interval_days,
rank(a)over ( PARTITION BY a.year_time ORDER BY datediff( a.pay_time, a.lag_time ) DESC ) rank1
FROM
(
SELECT YEAR
( pay_time ) year_time,
user_name,
pay_time,
lag( pay_time ) over ( PARTITION BY user_name, YEAR ( pay_time ) ORDER BY pay_time ASC ) lag_time
FROM
user_order
) a;
Copy the code
Step 3: Use the above query result as a sub-query table to obtain the ranking result of 1
SELECT
b.year_time,
b.user_name,
b.interval_days
FROM
(
SELECT
a.year_time,
a.user_name,
datediff( a.pay_time, a.lag_time ) interval_days,
rank(a)over ( PARTITION BY a.year_time ORDER BY datediff( a.pay_time, a.lag_time ) DESC ) rank1
FROM
(
SELECT YEAR
( pay_time ) year_time,
user_name,
pay_time,
lag( pay_time ) over ( PARTITION BY user_name, YEAR ( pay_time ) ORDER BY pay_time ASC ) lag_time
FROM
user_order
) a
) b
where b.rank1 = 1;
Copy the code
The results