- The exercise questions are from: leetcode-cn.com/
Delete duplicate email addresses
Write an SQL query to delete all duplicate email mailboxes from the Person table, leaving only the one with the smallest Id.
+----+------------------+ | Id | Email | +----+------------------+ | 1 | [email protected] | | 2 | [email protected] | | 3 | [email protected] | + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + Id is the primary key of the table.Copy the code
For example, after running your query, the Person table above should return the following lines:
+----+------------------+
| Id | Email |
+----+------------------+
| 1 | [email protected] |
| 2 | [email protected] |
+----+------------------+
Copy the code
Tip:
- After executing the SQL, the output is the entire
Person
Table. - use
delete
Statements.
-
Find all duplicate email addresses and delete the ones with large ids
-
SQL1:
-
delete p1 from Person p1, Person p2 where p1.Email = p2.Email and p1.Id > p2.Id Copy the code
-
-
SQL2:
-
delete p1 from Person p1 left join Person p2 on p1.Email = p2.Email where p1.Id > p2.Id Copy the code
-
-
SQL3:
-
delete from Person where Id not in (select p.mId from (select min(Id) mId from Person group by Email) p) Copy the code
-
Rising temperature
Watch the Weather
+---------------+---------+ | Column Name | Type | +---------------+---------+ | id | int | | recordDate | date | | Temperature | int | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- + id is the primary key of the table This table contains the temperature of the specific date informationCopy the code
Write an SQL query to find the ids of all the dates that were hotter than the previous (yesterday) date.
Return results in no order. The query result format is as follows:
Weather +----+------------+-------------+ | id | recordDate | Temperature | +----+------------+-------------+ | 1 | The 2015-01-01 10 | | | | 2015-01-02 | | 25 20 | | 3 | | 2015-01-03 | | 2015-01-04 | | 30 4 +----+------------+-------------+ Result table: + - + | | id + - + 2 | | | | + - + 4 2015-01-02 temperature is higher than the day before (10 - > 25) the temperature is higher than the day before 2015-01-04 (20-30) >Copy the code
-
Id is not positively correlated with recordDate, so you must start with recordDate.
-
Related date functions:
- TO_DAYS(date)
- Calculate the number of days from date to January 1, 0000.
- DATE_ADD()
- The function adds the specified interval to a date.
- DATE_ADD(date,INTERVAL expr type)
- The date argument is a valid date expression. The expr parameter is the time interval you want to add.
- DATEDIFF()
- The function returns the number of days between two dates.
- DATEDIFF(date1, date2)
- The date1 and date2 arguments are valid date or date/time expressions, date1-date2
- TO_DAYS(date)
-
SQL1:
-
select w1.id from Weather w1, Weather w2 where w1.Temperature > w2.Temperature and TO_DAYS(w1.recordDate) = TO_DAYS(w2.RecordDate) + 1 Copy the code
-
-
SQL2:
-
select w1.id from Weather w1, Weather w2 where w1.Temperature > w2.Temperature and DATE_ADD(w2.recordDate, INTERVAL 1 DAY) = w1.recordDate Copy the code
-
-
SQL3:
-
select w1.id from Weather w1, Weather w2 where w1.Temperature > w2.Temperature and DATEDIFF(w1.recordDate, w2.recordDate) = 1 Copy the code
-
Itinerary and Users
Table: Trips
+-------------+----------+ | Column Name | Type | +-------------+----------+ | Id | int | | Client_Id | int | | Driver_Id | int | | City_Id | int | | Status | enum | | Request_at | date | + -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- + Id is this table's primary key. This is a list of all the taxi trips. Each trip has a unique Id, where Client_Id and Driver_Id are the foreign keys of Users_Id in the Users table. Status is an enumeration type that represents the Status of a trip. The enumerators are (' completed ', 'cancelled_by_driver', 'cancelled_by_client').Copy the code
Table: the Users
+-------------+----------+ | Column Name | Type | +-------------+----------+ | Users_Id | int | | Banned | enum | | Role | enum | + -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- + is this table for Users_Id is the primary key. This table holds all users, each user has a unique Users_Id, and Role is an enumeration type representing the identity of the user (' client ', 'driver', 'Partner'). Banned is an enumeration type that indicates whether a user is Banned or not, and its members are (' Yes', 'No').Copy the code
Write an SQL statement to find the cancellation rate of non-banned users (both passenger and driver must not be banned) between “2013-10-01” and “2013-10-03”. Non-prohibited users are those who are Banned as No, and prohibited users are those who are Banned as Yes. The cancellation rate is calculated as follows :(number of non-prohibited user-generated orders cancelled by drivers or passengers)/(total number of non-prohibited user-generated orders). The data in the return result table can be organized in any order. Where the Cancellation Rate needs to be rounded to keep two decimal places.
The query result format is as follows:
Trips table: +----+-----------+-----------+---------+---------------------+------------+ | Id | Client_Id | Driver_Id | City_Id | Status | Request_at | +----+-----------+-----------+---------+---------------------+------------+ | 1 | 1 | 10 | 1 | completed | 2013-10-01 | | 2 | 2 | 11 | 1 | cancelled_by_driver | 2013-10-01 | | 3 | 3 | 12 | 6 | completed | 2013-10-01 | | 4 | 4 | 13 | 6 | cancelled_by_client | 2013-10-01 | | 5 | 1 | 10 | 1 | completed | 2013-10-02 | | 6 | 2 | 11 | 6 | completed | 2013-10-02 | | 7 | 3 | 12 | 6 | completed | 2013-10-02 | | 8 | 2 | 12 | 12 | completed | 2013-10-03 | | 9 | 3 | 10 | 12 | completed | 2013-10-03 | | 10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-03 | + - + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- + the Users table: +----------+--------+--------+ | Users_Id | Banned | Role | +----------+--------+--------+ | 1 | No | client | | 2 | Yes | client | | 3 | No | client | | 4 | No | client | | 10 | No | driver | | 11 | No | driver | | 12 | No | driver | | 13 | No | driver | + -- -- -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + Result table: + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | Day | Cancellation Rate | + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | | 0.33 2013-10-01 0.50 0.00 | | 2013-10-02 | | 2013-10-03 | | + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 2013-10-01: - a total of four requests, including 2 cancelled. - However, the request with Id=2 was made by the forbidden user (User_Id=2), so it should be ignored in the calculation. - Therefore, there are 3 non-prohibited requests to participate in the calculation, 1 of which is cancelled. - The cancellation rate is (1/3) = 0.33 2013-10-02: - There are three requests, of which 0 is cancelled. - However, the request with Id=6 is made by a prohibited user, so it should be ignored in the calculation. - Therefore, there are a total of 2 non-prohibited requests to participate in the calculation, and 0 of them are cancelled. - The cancellation rate is (0/2) = 0.00 2013-10-03: - There are three requests and one of them is cancelled. - However, the request with Id=8 is issued by a prohibited user, so it should be ignored in the calculation. - Therefore, there are a total of 2 non-prohibited requests to participate in the calculation, 1 of which is cancelled. - Cancellation rate is (1/2) = 0.50Copy the code
- There are several methods to calculate the Cancellation Rate:
round(sum(if (Status = 'completed', 0, 1)) / count(Status), 2)
round(avg(Status! ='completed'), 2)
round (cast(sum(case when Status = 'completed' then 0 else 1 end) as float) / count(t.Status),2)
- SQL:
-
select Request_at Day, round(avg(Status! ='completed'), 2) 'Cancellation Rate' from Trips t join Users u1 on (t.Client_Id = u1.Users_Id and u1.Banned = 'No') join Users u2 on (t.Driver_Id = u2.Users_Id and u2.Banned = 'No') where Request_at between '2013-10-01' and '2013-10-03' group by Request_atCopy the code
-
Gameplay Analysis series
Activity table:
+--------------+---------+ | Column Name | Type | +--------------+---------+ | player_id | int | | device_id | int | | Event_date | date | | games_played | int | + -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- + table primary key is (player_id, event_date). This chart shows the behavior of some game players on the game platform. Each row records the number of games opened by a player logging on to the platform using the same device that day (possibly 0) before exiting the platform.Copy the code
1. Write an SQL query to get the date of each player’s first login. The query result is in the following format:
The Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 1 | 2 | | 2016-03-01 5 | | 1 | 2 | | 2016-05-02 | 6 | 2 | 3 | The 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | | 0 5 | | 2018-07-03 | | 3 | 4 + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + Result table: +-----------+-------------+ | player_id | first_login | +-----------+-------------+ | 1 | 2016-03-01 | | 2 | 2017-06-25 | | 3 | 2016-03-02 | + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- +Copy the code
- SQL:
-
select player_id, min(event_date) first_login from Activity group by player_id Copy the code
-
Write an SQL query that describes the name of the device on which each player first logged in. The query result is in the following format:
Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 1 | 2 | | 2016-03-01 5 | | 1 | 2 | | 2016-05-02 | 6 | 2 | 3 | The 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | | 0 5 | | 2018-07-03 | | 3 | 4 +-----------+-----------+------------+--------------+ Result table: +-----------+-----------+ | player_id | device_id | +-----------+-----------+ | 1 | 2 | | 2 | 3 | | 3 | 1 | +-----------+-----------+Copy the code
-
Where is followed by two column names, and only both columns are printed in the first query
-
SQL:
-
select player_id, device_id from Activity where (player_id, event_date) in (select player_id, min(event_date) first_login from Activity group by player_id) Copy the code
-
Write an SQL query that also reports each group of players and the date, and how many games they have played so far. That is, the total number of games played before this date. See the example for details. The query result format is as follows:
Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 1 | 2 | | 2016-03-01 5 | | 1 | 2 | | 2016-05-02 | 6 | 1 | 3 | The 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | | 0 5 | | 2018-07-03 | | 3 | 4 +-----------+-----------+------------+--------------+ Result table: +-----------+------------+---------------------+ | player_id | event_date | games_played_so_far | + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 1 5 | | 2016-03-01 | | 1 11 | | 2016-05-02 | | 1 | | | 12 2017-06-25 | 3 | | 0 2016-03-02 | | 3 | 2018-07-03 | | 5 + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + for players with ID 1, 2016-05-02 play 5+6=11 games, 2017-06-25 play 5+6+1=12 games. For the player with ID 3, 0+5=5 games are played in total on July 03, 2018. Please note that for each player, we only care about the player's login date.Copy the code
- SQL1:
- Window function, sorted by group of players by time.
-
select player_id, event_date, sum(games_played) over(partition by player_id order by event_date) games_played_so_far from Activity Copy the code
- SQL2:
-
select t1.player_id, t1.event_date, sum(t2.games_played) games_played_so_far from Activity t1, Activity t2 where t1.player_id = t2.player_id and t1.event_date >= t2.event_date group by t1.player_id, t1.event_date order by player_id, event_date Copy the code
-
Write an SQL query that reports the percentage of players who log in again the day after their first login, rounded to two decimal places. In other words, you need to count the number of players who have logged in for at least two consecutive days since the first login date, and then divide by the total number of players. The query result format is as follows:
Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 1 | 2 | | 2016-03-01 5 | | 1 | 2 | | 2016-03-02 | 6 | 2 | 3 | The 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | | 0 5 | | 2018-07-03 | | 3 | 4 +-----------+-----------+------------+--------------+ Result table: + -- -- -- -- -- -- -- -- -- -- -- + | fraction | + -- -- -- -- -- -- -- -- -- -- - + 0.33 | | + -- -- -- -- -- -- -- -- -- -- - + only with ID 1 player in the first day after login to log back in, so the answer is a third = 0.33Copy the code
-
It asks you to count the number of players who have logged in for at least two consecutive days since the first login date, and then divide by the total number of players. Find the first login date of the player and calculate the difference between the first login date and the login date. If it is 1, it represents two consecutive login days. Find the number of players whose business trip value is 1.
-
Log in again the day after your first login:
sum(if(DATEDIFF(a.event_date, b.first_date) = 1, 1, 0))
sum(case when DATEDIFF(a.event_date, b.first_date) = 1) then 1 else 0 end)
-
SQL:
-
select round(sum(if(DATEDIFF(t1.event_date, t2.first_date) = 1, 1, 0)) / (select count(distinct player_id) from Activity), 2) fraction from Activity t1 left join (select player_id, min(event_date) first_date from Activity group by player_id) t2 on t1.player_id = t2.player_id Copy the code
-
5. The player’s install date is defined as the player’s first login date. A player’s first-day retention rate is defined as: assuming the number of players with install date X is N, and the number of players re-logging in on a day after X is M, M/N is the first-day retention rate, rounded to two decimal places. Write an SQL query that reports all install dates, the number of players who installed the game that day, and the player’s first day retention. The query result format is as follows:
The Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 1 | 2 | | 2016-03-01 5 | | 1 | 2 | | 2016-03-02 | 6 | 2 | 3 | The 2017-06-25 | 1 | | 3 | 1 | 2016-03-01 | | 0 5 | | 2016-07-03 | | 3 | 4 + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- + Result table: +------------+----------+----------------+ | install_dt | installs | Day1_retention | + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 2 | 0.50 2016-03-01 | | 2017-06-25 | 1 | | 0.00 + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + 1 and 3 in the 2016-03-01 installed the game, but only 1 player in the log back in 2016-03-02, Player 2 installed the game on June 25, 2017, but did not re-log in on June 26, 2017, so the retention rate on June 25, 2017 was 0/1=0.00Copy the code
- SQL1:
- Next-day retention = Number of users added on day 1 Number of active users on day 2 / Number of users added on day 1
- In the output, install_dt indicates the time of the first installation, that is, the time of the first login. Find the minimum time and calculate it
- Assume the number of players for the day. Use distinct to avoid duplication
- Day1_retention is the retention rate of the next day. The time difference is used to calculate the number of people who log in for two consecutive days, divided by the total number of people who install the game on the first day, and the result is kept by two decimal places
- Note here that you need to group by login time
-
select t2.install_date install_dt, count(distinct t1.player_id) installs, round(count(distinct case when DATEDIFF(event_date,install_date) = 1 then t1.player_id else NULL end) / count(distinct t1.player_id), 2) Day1_retention from Activity t1 left join (select player_id, min(event_date) install_date from Activity group by player_id) t2 on t1.player_id = t2.player_id group by install_date Copy the code
Median employee salary
The Employee table contains all employees. The Employee table has three columns: Employee Id, company name, and salary.
+-----+------------+--------+ |Id | Company | Salary | +-----+------------+--------+ |1 | A | 2341 | |2 | A | 341 | |3 | A | 15 | |4 | A | 15314 | |5 | A | 451 | |6 | A | 513 | |7 | B | 15 | |8 | B | 13 | |9 | B | 1154 | |10 | B | 1345 | |11 | B | 1221 | |12 | B | 234 | |13 | C | 2345 | |14 | C | 2645 | |15 | C | 2645 | |16 | C | 2652 | |17 | C | 65 | +-----+------------+--------+Copy the code
Write an SQL query to find the median salary for each company. Challenge: Can you solve this problem without using any of the built-in SQL functions?
+-----+------------+--------+ |Id | Company | Salary | +-----+------------+--------+ |5 | A | 451 | |6 | A | 513 | |12 | B | 234 | |9 | B | 1154 | |14 | C | 2645 | +-----+------------+--------+Copy the code
-
The median:
- Sorted by company group and salary, total number C is odd, serial number Rk =(C+1)/2, total number C is even, serial number RK is C/2 and C/2+1, and C/2 < (C+1)/2 < C/2+1 can be simplified in one
rk between (cnt/2) and (cnt/2 + 1)
(cnt%2 = 1 and rk = floor(cnt/2) + 1) or (cnt%2 = 0 and (rk = floor(cnt/2) or rk = floor(cnt/2) + 1))
rk in (floor((cnt + 1)/2), floor((cnt + 2)/2))
abs(rn - (cnt + 1)/2) < 1
-
SQL1:
-
select Id, Company, Salary from (select *, row_number() over(partition by Company order by Salary) rk, count(*) over (partition by Company) cnt from Employee) t where (cnt%2 = 1 and rk = floor(cnt/2) + 1) or (cnt%2 = 0 and (rk = floor(cnt/2) or rk = floor(cnt/2) + 1)) Copy the code
-
-
SQL2:
-
select Id, Company, Salary from (select *, row_number() over(partition by Company order by Salary) rk, count(*) over (partition by company) cnt from Employee) t where rk between (cnt/2) and (cnt/2 + 1) Copy the code
-
-
SQL3:
- The median must occur with a frequency greater than or equal to the absolute value of the difference between the numbers greater than and less than:
-
select t1.Id, t1.Company, t1.Salary from Employee t1 left join Employee t2 on t1.Company = t2.Company group by t1.Company, t1.Salary having sum(case when t1.Salary = t2.Salary then 1 else 0 end) >= abs(sum(sign(t1.Salary - t2.Salary))) order by t1.Id Copy the code
At least 5 direct report managers
The Employee table contains all employees and their managers. Each employee has an Id, and there is a column for the manager’s Id.
+------+----------+-----------+----------+ |Id |Name |Department |ManagerId | +------+----------+-----------+----------+ |101 |John |A |null | |102 |Dan |A |101 | |103 |James |A |101 | |104 |Amy |A |101 | |105 |Anne |A |101 | |106 |Ron |B 101 | | + -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- +Copy the code
Given the Employee table, write an SQL query to find managers with at least five direct reports. For the above table, your SQL query should return:
+-------+
| Name |
+-------+
| John |
+-------+
Copy the code
Note: No one is subordinate to you.
- SQL:
-
select Name from Employee where Id in (select ManagerId from Employee group by ManagerId having count(ManagerId) >= 5) Copy the code
-
The median frequency query for a given number
The Numbers table holds the values of Numbers and their frequencies.
+----------+-------------+
| Number | Frequency |
+----------+-------------|
| 0 | 7 |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |
+----------+-------------+
Copy the code
In this table, the numbers are 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3, so the median is (0 + 0) / 2 = 0.
+ -- -- -- -- -- -- -- -- + | median | + -- -- -- -- -- -- -- - | | | + 0.0000 -- -- -- -- -- -- -- -- +Copy the code
Write a query to find the median of all numbers and name the result median.
-
SQL1:
- If Number is the median, the cumulative Number before Number should be greater than or equal to /2, and the cumulative Number before Number should be less than or equal to /2:
-
select avg(cast(num as float)) median from (select num, Frequency, sum(Frequency) over(order by num) - Frequency prev_sum, sum(Frequency) over(order by num) curr_sum from Numbers) t1, (select sum(Frequency) total_sum from Numbers) t2 where t1.prev_sum <= (cast(t2.total_sum as float) / 2) and t1.curr_sum >= (cast(t2.total_sum as float) / 2)Copy the code
-
SQL2:
-
select avg(num) median from (select n1.num from Numbers n1 join Numbers n2 on n1.num >= n2.num group by n1.num having sum(n2.Frequency) >= (select sum(Frequency) from Numbers)/2 and sum(n2.Frequency) - avg(n1.Frequency) <= (select sum(Frequency) from Numbers) / 2 )s Copy the code
-
-
SQL3:
- The accumulative positive and reverse order of a certain number is greater than or equal to half of the number of numbers in the whole sequence, and the last selected one or two medians can be averaged:
-
select avg(num) median from ((select num, sum(frequency) over(order by num) sum1, sum(frequency) over(order by num desc) sum2 from Numbers) t1, (select sum(frequency) sum3 from Numbers) t2) where sum1 >= sum3 / 2 and sum2 >= sum3 / 2 Copy the code