Solving continuous interval is a common SQL topic in data analysis and data warehouse written interview. Today, I would like to share the written interview questions with you. I look forward to seeing you get your desired offer or gain something!
01 Consecutive digits
Write an SQL query to find all numbers that appear at least three times in a row.
+----+-----+ | Id | Num | +----+-----+ | 1 | 1 | | 2 | 1 | | 3 | 1 | | 4 | 2 | | 5 | 1 | | 6 | 2 | | 7 | 2 | +----+-----+ For example, given the Logs table above, 1 is the only number that occurs at least three times in a row. +-----------------+ | ConsecutiveNums | +-----------------+ | 1 | +-----------------+Copy the code
Solution 1: self-join three times, filter l1, L2,l3 rows whose ids are continuous and Num is equal
Note: If a number occurs more than three times in a row,Num will have a duplicate value, so use distinct to de-duplicate
This method is simple to implement, but time complexity is large, and if it is changed to 4,5.. Ten times, the code is very redundant and not extensible.
select distinct l1.Num as ConsecutiveNums from Logs l1,Logs l2,Logs l3 where l2.Id = l1.Id + 1 and l3.Id = l2.Id + 1 and l1.Num = l2.Num and l2.Num = l3.NumCopy the code
Solution 2: Use custom variables for conditional judgment
The speed is faster than the first method, and it is suitable for any continuous number of times
select distinct Num ConsecutiveNums from ( select Num, Elseif @prev := Num and @count := 1 elseIf @prev := Num and @count := 1 elseif @prev := Num and @count := 1 elseif @prev := Num and @count := 1 elseif @prev := Num and @count := 1 Log (select @prev := 0, @count := 0, @prev := 0, @prev := 0, @prev := 0, @prev := 0, @prev := 0, @prev := 0, @prev := 0, @prev := 0, @prev := 0, @prev := 0Copy the code
Solution three: window function
select distinct Num as ConsecutiveNums
from
(
select Num,Id,
lag(Id,2)over(partition by Num order by Id) as prev
from Logs
)t
where t.Id = t.prev + 2
Copy the code
02 Find the beginning and end numbers of the contiguous interval
Table: Logs
+---------------+---------+ | Column Name | Type | +---------------+---------+ | log_id | int | +---------------+---------+ ID is the primary key of the table above. Each row in the above table contains an ID from the log table. Some ids were later removed from the Logs table. Write an SQL query to get the start and end numbers of the contiguous intervals in the Logs table. Sort the query table by start_id.Copy the code
The format of the query result is as follows: Logs table
+------------+
| log_id |
+------------+
| 1 |
| 2 |
| 3 |
| 7 |
| 8 |
| 10 |
+------------+
Copy the code
Results the table:
+------------+--------------+
| start_id | end_id |
+------------+--------------+
| 1 | 3 |
| 7 | 8 |
| 10 | 10 |
+------------+--------------+
Copy the code
The result table should contain all intervals in the Logs table. From 1 to 3 in the table. From 4 to 6 is not in the table. From 7 to 8 in the table. 9 is not in the table. 10 is in the table.
Answer:
(1) Use two variables, one @id to record the logid, and compare the difference between the current log_id and the previous log_id to determine whether it is consecutive. An @num is used to store the continuous state
select log_id,
case when @id = log_id - 1 then @num := @num
else @num := @num + 1
end num,
@id := log_id
from Logs,(select @num := 0,@id := null)t
Copy the code
The output is as follows
Input: {" headers ": {" Logs" : [" log_id "]}, "rows" : {" Logs ": [[1], [2], [3], [7], [8], [10]]}} output: {" headers" : ["log_id", "num", "@prev_id := log_id"], "values": [[1, "1", 1], [2, "1", 2], [3, "1", 3], [7, "2", 7], [8, "2", 8], [10, "3", 10]]}Copy the code
(2) After the above results are obtained, group them into num fields. The minimum log_id is start_id, and the maximum log_id is end_id.
select min(log_id) start_id,
max(log_id) end_id
from
(
select log_id,
case when @id = log_id - 1 then @num := @num
else @num := @num + 1
end num,
@id := log_id
from Logs,(select @num := 0,@id := null)t
)t2
group by num
Copy the code
03 Consecutive days for reporting system status
Table: Failed
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| fail_date | date |
+--------------+---------+
Copy the code
The primary key of this table is fail_date. This table contains the days of failed tasks.
Table: Succeeded
+--------------+---------+ | Column Name | Type | +--------------+---------+ | success_date | date | +--------------+---------+ The primary key of this table is SUCCESS_date. This table contains the days of successful tasks. The system runs a task every day. Each task is independent of the previous task. The status of the task can be failure or success. Query the start and end date (start_date and end_date) of the continuous same-state period_state between 2019-01-01 and 2019-12-31. That is, if the task fails, it is the start and end dates of the failed state, and if the task succeeds, it is the start and end dates of the successful state. The final results are sorted by start date start_dateCopy the code
The following is an example of query results: Failed table:
+-------------------+
| fail_date |
+-------------------+
| 2018-12-28 |
| 2018-12-29 |
| 2019-01-04 |
| 2019-01-05 |
+-------------------+
Copy the code
Succeeded table:
+-------------------+ | success_date | +-------------------+ | 2018-12-30 | | 2018-12-31 | | 2019-01-01 | | 2019-01-02 | | 2019-01-03 | | 2019-01-06 | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- +Copy the code
Result table:
+--------------+--------------+--------------+ | period_state | start_date | end_date | +--------------+--------------+--------------+ | succeeded | 2019-01-01 | 2019-01-03 | | failed | 2019-01-04 | The 2019-01-05 | | succeeded | 2019-01-06 | 2019-01-06 | + -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + results had ignored a 2018 record, The system status is "Succeeded" because we only care about records from 2019-01-01 to 2019-31. All tasks from 2019-01-04 to 2019-01-05 failed, and the system status is Failed. All tasks from 2019-01-06 to 2019-01-06 succeeded, and the system status is Succeeded.Copy the code
The core method is the same as above
(1) join two tables with union all
(2) In order to maintain a @state, num can be kept constant only when the date is continuously increased and @sate remains unchanged
select period_state, min(task_date) as start_date, max(task_date) as end_date from ( select period_state,task_date, case when datediff(task_date,@date) = 1 and @state = period_state then @num := @num else @num := @num + 1 end num, @date := task_date, @state := period_state from ( select 'succeeded' as period_state,success_date as task_date from Succeeded where success_date between '2019-01-01' and '2019-12-31' union all select 'failed' as period_state,fail_date as task_date from Failed where fail_date between '2019-01-01' and '2019-12-31' order by task_date ) t1,(select @date := null,@num := 0,@state := null) t2 )t3 group by period_state,num order by start_dateCopy the code
04 Flow rate of gymnasium
A new stadium has been built in X City, and the daily visitor flow information is recorded in the three columns: id, visit_date and people.
Please write a query statement to find the peak traffic. At peak times, there should be at least three consecutive rows of at least 100 people.
For example, table stadium:
+------+------------+-----------+ | id | visit_date | people | +------+------------+-----------+ | 1 | 2017-01-01 | 10 | | | 2017-01-02 | 109 | 2 | 3 | 2017-01-03 | | 150 | | 2017-01-04 | 99 | | | 2017-01-05 | 145 | | 2017-01-06 | | 6 7 | 1455 | | 199 | 2017-01-07 | | | | 188 | 2017-01-08 8 + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - +Copy the code
For the sample data above, the output is:
+------+------------+-----------+ | id | visit_date | people | +------+------------+-----------+ | 5 | 2017-01-05 | 145 6 | | | 2017-01-06 | | 1455 | | 2017-01-07 | 199 | | | | 188 | 2017-01-08 8 + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - +Copy the code
Tip: There is only one record per day, and the date increases as the ID increases.
Solve using custom variables
1. Define a variable CNT, increment 1 if people >= 100, otherwise 0, so that if three consecutive rows are greater than or equal to 100, the sequence will increase continuously
2. Use id to increase in order to find the peak of the flow of people
select distinct s.* from stadium s,( select id,visit_date,people, case when people >= 100 then @cnt := @cnt + 1 else @cnt := 0 end as cnt from stadium,(select @cnt:=0)a )b where b.cnt >= 3 and s.id between b.id-b.cnt+1 and b.idCopy the code