Official account: Java Xiaokaxiu, website: Javaxks.com
Author: Dimitris Poulopoulos, link: towardsdatascience.com/ten-advance…
As the volume of data continues to grow, so will the demand for qualified data professionals. Specifically, there is a growing demand for professionals fluent in SQL, and not just at the entry level.
So Nathan Rosidi, founder of StrataScratch, and I put together what we think are the 10 most important and relevant intermediate-to-advanced SQL concepts.
With that said, let's get started!
1. Common table expressions (CTEs)
If you have ever wanted to query the result of another query, that is where CTEs come in. A CTE essentially creates a temporary, named result set.
Using common table expressions (CTEs) is a good way to modularize and decompose your code, in the same way that you would break an essay into several paragraphs.
Consider the following query, which uses subqueries in the WHERE clause.
SELECT
name,
salary
FROM
People
WHERE
NAME IN ( SELECT DISTINCT NAME FROM population WHERE country = "Canada" AND city = "Toronto" )
AND salary >= (
SELECT
AVG( salary )
FROM
salaries
WHERE
gender = "Female")
This may not seem too hard to understand, but what if the query contained many subqueries, or subqueries within subqueries? This is where CTEs come in.
with toronto_ppl as (
SELECT DISTINCT name
FROM population
WHERE country = "Canada"
AND city = "Toronto"
)
, avg_female_salary as (
SELECT AVG(salary) as avgSalary
FROM salaries
WHERE gender = "Female"
)
SELECT name
, salary
FROM People
WHERE name in (SELECT DISTINCT name FROM toronto_ppl)
AND salary >= (SELECT avgSalary FROM avg_female_salary)
It is now clear that the WHERE clause filters for names in Toronto. As you may have noticed, CTEs are useful because they let you break the code into smaller chunks, and also because they let you assign a name to each CTE (here, toronto_ppl and avg_female_salary).
CTEs also allow more advanced techniques, such as creating recursive tables.
2. Recursive CTEs
A recursive CTE is a CTE that references itself, just like a recursive function in Python. Recursive CTEs are especially useful for querying hierarchical data such as organizational charts, file systems, graphs of links between web pages, and so on.
A recursive CTE has three parts:
Anchor member: an initial query that returns the base result of the CTE
Recursive member: a recursive query that references the CTE. It is combined with the anchor member using UNION ALL
Termination condition: a condition that stops the recursive member
Here is an example of a recursive CTE that gets the manager ID for each employee ID:
with org_structure as (
SELECT id
, manager_id
FROM staff_members
WHERE manager_id IS NULL
UNION ALL
SELECT sm.id
, sm.manager_id
FROM staff_members sm
INNER JOIN org_structure os
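ON os.id = sm.manager_id
)
SELECT id
, manager_id
FROM org_structure
-- Note: some engines (e.g. PostgreSQL, MySQL) require WITH RECURSIVE on the first line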
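To see the three parts in action, here is a minimal runnable sketch that wraps a recursive CTE in Python's standard-library sqlite3 module. The rows in staff_members are made-up sample data, the depth column is an illustrative addition that is not part of the query above, and the WITH RECURSIVE spelling is the one SQLite accepts; other engines may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_members (id INTEGER PRIMARY KEY, manager_id INTEGER);
    -- Hypothetical hierarchy: 1 is the top manager, 2 and 3 report to 1, 4 reports to 2.
    INSERT INTO staff_members VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

query = """
WITH RECURSIVE org_structure AS (
    -- Anchor member: staff with no manager (the top of the hierarchy).
    SELECT id, manager_id, 0 AS depth
    FROM staff_members
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: staff whose manager is already in org_structure.
    SELECT sm.id, sm.manager_id, os.depth + 1
    FROM staff_members sm
    INNER JOIN org_structure os ON os.id = sm.manager_id
)
SELECT id, manager_id, depth
FROM org_structure
ORDER BY id
"""

org_rows = conn.execute(query).fetchall()
for row in org_rows:
    print(row)
```

The recursion terminates on its own once the recursive member returns no new rows, which here happens after the employee at depth 2.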
3. Temporary functions
Knowing how to write temporary functions is important for several reasons:
- It allows you to break code into smaller chunks.
- It is useful for writing clean code.
- It prevents repetition and lets you reuse code, much like functions in Python.
Consider the following example:
SELECT name
, CASE WHEN tenure < 1 THEN "analyst"
WHEN tenure BETWEEN 1 and 3 THEN "associate"
WHEN tenure BETWEEN 3 and 5 THEN "senior"
WHEN tenure > 5 THEN "vp"
ELSE "n/a"
END AS seniority
FROM employees
Instead, you can use a temporary function to capture the CASE clause (the syntax here is BigQuery's):
CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
CASE WHEN tenure < 1 THEN "analyst"
WHEN tenure BETWEEN 1 and 3 THEN "associate"
WHEN tenure BETWEEN 3 and 5 THEN "senior"
WHEN tenure > 5 THEN "vp"
ELSE "n/a"
END
);
SELECT name
, get_seniority(tenure) as seniority
FROM employees
The query itself is simpler and more readable with temporary functions, and you can reuse the seniority function!
4. Pivot data using CASE WHEN
You will most likely see many questions that require CASE WHEN statements, simply because the construct is so versatile. It lets you write complex conditional logic when you want to assign a value or class based on other variables.
Less well known is that it also allows you to pivot data. For example, if you have a month column and want a separate column for each month, you can use CASE WHEN statements to pivot the data.
Example problem: Write an SQL query to reformat the table so that there is one revenue column per month.
Initial table:
id | revenue | month |
---|---|---|
1 | 8000 | Jan |
2 | 9000 | Jan |
3 | 10000 | Feb |
1 | 7000 | Feb |
1 | 6000 | Mar |
Result table:
id | Jan_Revenue | Feb_Revenue | Mar_Revenue | . | Dec_Revenue |
---|---|---|---|---|---|
1 | 8000 | 7000 | 6000 | . | null |
2 | 9000 | null | null | . | null |
3 | null | 10000 | null | . | null |
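One possible answer, sketched with Python's standard-library sqlite3 module so it can be run as is. The table name revenue_table is an assumption (the problem does not name the table), and only the first three months are written out; the remaining months would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- revenue_table is a hypothetical name for the initial table.
    CREATE TABLE revenue_table (id INTEGER, revenue INTEGER, month TEXT);
    INSERT INTO revenue_table VALUES
        (1, 8000, 'Jan'), (2, 9000, 'Jan'), (3, 10000, 'Feb'),
        (1, 7000, 'Feb'), (1, 6000, 'Mar');
""")

# One SUM(CASE WHEN ...) per month. A CASE without ELSE yields NULL,
# so an id with no rows for a month gets NULL in that column.
pivot_query = """
SELECT id,
       SUM(CASE WHEN month = 'Jan' THEN revenue END) AS Jan_Revenue,
       SUM(CASE WHEN month = 'Feb' THEN revenue END) AS Feb_Revenue,
       SUM(CASE WHEN month = 'Mar' THEN revenue END) AS Mar_Revenue
FROM revenue_table
GROUP BY id
ORDER BY id
"""

pivot_rows = conn.execute(pivot_query).fetchall()
for row in pivot_rows:
    print(row)
```

Grouping by id collapses each id's rows into one, and each CASE WHEN picks out the revenue that belongs in its month's column.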
5. EXCEPT vs NOT IN
EXCEPT and NOT IN operate almost identically. Both are used to compare rows between two queries/tables. That said, there are subtle differences between the two.
First, EXCEPT removes duplicates and returns distinct rows, whereas NOT IN does not.
Second, EXCEPT expects the same number of columns in both queries/tables, whereas NOT IN compares a single column from each query/table.
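The duplicate-handling difference is easy to see on a tiny example. The sketch below uses Python's standard-library sqlite3 module; the tables a and b and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: 'Ann' appears twice in a, 'Bob' appears in both.
    CREATE TABLE a (name TEXT);
    CREATE TABLE b (name TEXT);
    INSERT INTO a VALUES ('Ann'), ('Ann'), ('Bob'), ('Cat');
    INSERT INTO b VALUES ('Bob');
""")

# EXCEPT is a set operation: the result contains distinct rows only.
except_rows = conn.execute(
    "SELECT name FROM a EXCEPT SELECT name FROM b ORDER BY name"
).fetchall()

# NOT IN is a row filter: every qualifying row of a is kept, duplicates included.
not_in_rows = conn.execute(
    "SELECT name FROM a WHERE name NOT IN (SELECT name FROM b) ORDER BY name"
).fetchall()

print(except_rows)
print(not_in_rows)
```

Here EXCEPT returns Ann once, while NOT IN returns her twice.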
6. Self joins
A self join is when a SQL table is joined with itself. You might think that serves no purpose, but you would be surprised how common it is. In many real-world settings, data is stored in one large table rather than many smaller tables. In such cases, a self join may be required to solve certain problems.
Let’s look at an example.
Example problem: Given the following Employee table, write an SQL query to find the employees who earn more than their managers. For the table below, Joe is the only employee who earns more than his manager.
Id | Name | Salary | ManagerId |
---|---|---|---|
1 | Joe | 70000 | 3 |
2 | Henry | 80000 | 4 |
3 | Sam | 60000 | NULL |
4 | Max | 90000 | NULL |
Answer:
SELECT
a.Name as Employee
FROM
Employee as a
JOIN Employee as b on a.ManagerID = b.Id
WHERE a.Salary > b.Salary
7. Rank vs Dense Rank vs Row Number
Ranking rows and values is a very common application. Here are some examples of how companies often use rankings:
- Ranking top customers by number of purchases, profit, etc.
- Ranking top products by number of units sold
- Ranking top countries by sales
- Ranking top videos by minutes watched, number of distinct viewers, and so on
In SQL, you can assign a "rank" to rows in several ways, which we'll explore through an example. Consider the following query:
SELECT Name
, GPA
, ROW_NUMBER() OVER (ORDER BY GPA desc)
, RANK() OVER (ORDER BY GPA desc)
, DENSE_RANK() OVER (ORDER BY GPA desc)
FROM student_grades
ROW_NUMBER() returns a unique number for each row, starting at 1. When there are ties (for example, Bob and Carrie having the same GPA), ROW_NUMBER() assigns the numbers arbitrarily if no second ordering criterion is defined.
RANK() also numbers rows starting at 1, except that tied rows are assigned the same number. A gap then follows the duplicated rank.
DENSE_RANK() is similar to RANK(), except that there is no gap after a duplicated rank. For example, a student such as Daniel, whom RANK() places 4th because of a tie above him, is ranked 3rd by DENSE_RANK().
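To make the three behaviours concrete, here is a small runnable sketch using Python's standard-library sqlite3 module (window functions need SQLite 3.25 or newer). The GPA values and the student Alice are made up for illustration, and Name is added as a second ordering criterion for ROW_NUMBER() so that the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical grades: Bob and Carrie are tied.
    CREATE TABLE student_grades (Name TEXT, GPA REAL);
    INSERT INTO student_grades VALUES
        ('Alice', 3.9), ('Bob', 3.7), ('Carrie', 3.7), ('Daniel', 3.5);
""")

rank_query = """
SELECT Name,
       GPA,
       ROW_NUMBER() OVER (ORDER BY GPA DESC, Name) AS row_num,
       RANK()       OVER (ORDER BY GPA DESC)       AS rnk,
       DENSE_RANK() OVER (ORDER BY GPA DESC)       AS dense_rnk
FROM student_grades
ORDER BY row_num
"""

rank_rows = conn.execute(rank_query).fetchall()
for row in rank_rows:
    print(row)
```

With this data, Bob and Carrie share rank 2 under both RANK() and DENSE_RANK(), while Daniel is 4th under RANK() and 3rd under DENSE_RANK().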
8. Calculating Delta Values
Another common application is to compare values across time periods. For example, what is the delta between this month's sales and last month's? Or between this month's sales and the same month last year?
When comparing values across periods to calculate deltas, LEAD() and LAG() come into play.
Here are some examples:
-- Comparing each month's sales to last month
SELECT month
, sales
, sales - LAG(sales, 1) OVER (ORDER BY month)
FROM monthly_sales;
-- Comparing each month's sales to the same month last year
SELECT month
, sales
, sales - LAG(sales, 12) OVER (ORDER BY month)
FROM monthly_sales;
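Here is a minimal runnable sketch of the month-over-month comparison, using Python's standard-library sqlite3 module (SQLite 3.25 or newer). The sales figures are made up, and the month is assumed to be stored as 'YYYY-MM' text so that ORDER BY month sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data; month is 'YYYY-MM' text so it sorts chronologically.
    CREATE TABLE monthly_sales (month TEXT, sales INTEGER);
    INSERT INTO monthly_sales VALUES
        ('2021-01', 100), ('2021-02', 130), ('2021-03', 120);
""")

delta_rows = conn.execute("""
    SELECT month,
           sales,
           sales - LAG(sales, 1) OVER (ORDER BY month) AS delta
    FROM monthly_sales
    ORDER BY month
""").fetchall()

for row in delta_rows:
    print(row)
```

The first month has no previous row, so LAG() returns NULL and the delta is NULL as well.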
9. Calculating Running Totals
If you know about ROW_NUMBER() and LAG()/LEAD(), this may not come as a surprise to you. But if you don't, this is probably one of the most useful window functions, especially when you want to visualize growth!
Using a window function with SUM(), we can calculate a running total. See the following example:
SELECT Month
, Revenue
, SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
FROM monthly_revenue
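The same query can be tried end to end with Python's standard-library sqlite3 module (SQLite 3.25 or newer). The revenue figures below are made up, and Month is assumed to be stored as 'YYYY-MM' text so that it sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data for illustration.
    CREATE TABLE monthly_revenue (Month TEXT, Revenue INTEGER);
    INSERT INTO monthly_revenue VALUES
        ('2021-01', 100), ('2021-02', 150), ('2021-03', 50);
""")

running_rows = conn.execute("""
    SELECT Month,
           Revenue,
           SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
    FROM monthly_revenue
    ORDER BY Month
""").fetchall()

for row in running_rows:
    print(row)
```

Because the window has an ORDER BY but no explicit frame, each row's sum covers every row from the start of the table up to and including the current month.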
10. Date and time manipulation
You should definitely expect some SQL questions involving date-time data. For example, you might need to group data by month or convert a value from DD-MM-YYYY format to just the month.
Some functions you should be aware of are:
- EXTRACT
- DATEDIFF
- DATE_ADD, DATE_SUB
- DATE_TRUNC
Example problem: Given a Weather table, write an SQL query to find the Ids of all dates with a higher temperature than the previous day (yesterday).
Id(INT) | RecordDate(DATE) | Temperature(INT) |
---|---|---|
1 | 2015-01-01 | 10 |
2 | 2015-01-02 | 25 |
3 | 2015-01-03 | 20 |
4 | 2015-01-04 | 30 |
Answer:
SELECT
a.Id
FROM
Weather a,
Weather b
WHERE
a.Temperature > b.Temperature
AND DATEDIFF(a.RecordDate, b.RecordDate) = 1
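The answer above uses the two-argument DATEDIFF found in MySQL. As a runnable sketch of the same idea, the version below uses Python's standard-library sqlite3 module; SQLite has no DATEDIFF, so the one-day gap is computed with julianday() instead, which is a substitution made for this example rather than part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Weather (Id INTEGER, RecordDate TEXT, Temperature INTEGER);
    INSERT INTO Weather VALUES
        (1, '2015-01-01', 10), (2, '2015-01-02', 25),
        (3, '2015-01-03', 20), (4, '2015-01-04', 30);
""")

# Self join: pair each day (a) with the day exactly one day earlier (b),
# then keep the pairs where a is warmer than b.
warmer_ids = [row[0] for row in conn.execute("""
    SELECT a.Id
    FROM Weather a
    JOIN Weather b
      ON julianday(a.RecordDate) - julianday(b.RecordDate) = 1
    WHERE a.Temperature > b.Temperature
    ORDER BY a.Id
""")]

print(warmer_ids)
```

For the sample table this returns Ids 2 and 4, the two days that were warmer than the day before.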
Thanks for reading!
That's it! I hope this helps with your interview preparation. I'm sure that if you know these 10 concepts inside out, you'll do well on most SQL questions.
As always, I wish you the best in your studies!