Official account: Java Xiaokaxiu, website: Javaxks.com

Author: Dimitris Poulopoulos, link: towardsdatascience.com/ten-advance…

As the volume of data continues to grow, so will the demand for qualified data professionals. Specifically, there is a growing demand for professionals fluent in SQL, and not just at the entry level.

So StrataScratch founder Nathan Rosidi and I put together what we think are the 10 most important and relevant intermediate-to-advanced SQL concepts.

With that said, here we go!

1. Common table expressions (CTEs)

If you have ever wanted to query a subquery, that is where CTEs come in: a CTE essentially creates a temporary, named result set that the rest of the query can refer to.

Using common table expressions (CTEs) is a good way to modularize and break down your code, in the same way that you would break an essay into several paragraphs.

Consider the following query, which uses subqueries in the WHERE clause.

SELECT name,
       salary
FROM People
WHERE name IN (
          SELECT DISTINCT name
          FROM population
          WHERE country = "Canada"
                AND city = "Toronto"
      )
      AND salary >= (
          SELECT AVG(salary)
          FROM salaries
          WHERE gender = "Female"
      )

This may not seem too hard to follow, but what if there were many subqueries, or subqueries nested inside subqueries? This is where CTEs come in.

with toronto_ppl as (
   SELECT DISTINCT name
   FROM population
   WHERE country = "Canada"
         AND city = "Toronto"
)
, avg_female_salary as (
   SELECT AVG(salary) as avgSalary
   FROM salaries
   WHERE gender = "Female"
)
SELECT name
       , salary
FROM People
WHERE name in (SELECT name FROM toronto_ppl)
      AND salary >= (SELECT avgSalary FROM avg_female_salary)

Now it is clear that the WHERE clause filters on the names of people in Toronto. CTEs are useful not only because they let you break the code into smaller chunks, but also because they let you give each chunk a descriptive name (here, toronto_ppl and avg_female_salary).

CTEs also enable more advanced techniques, such as recursive queries.

2. Recursive CTEs

A recursive CTE is a CTE that references itself, just like a recursive function in Python. Recursive CTEs are especially useful for querying hierarchical data such as organizational charts, file systems, graphs of links between web pages, and so on.

A recursive CTE has three parts:

  • Anchor member: an initial query that returns the base result of the CTE

  • Recursive member: a recursive query that references the CTE. It is combined with the anchor member using UNION ALL

  • A termination condition that stops the recursive member

Here is an example of a recursive CTE that gets the manager ID for each employee ID:

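-- Note: PostgreSQL, MySQL 8+ and BigQuery need WITH RECURSIVE here; SQL Server uses plain WITH.
with org_structure as (
   -- Anchor member: employees with no manager
   SELECT id
          , manager_id
   FROM staff_members
   WHERE manager_id IS NULL
   UNION ALL
   -- Recursive member: employees who report to a row already found
   SELECT sm.id
          , sm.manager_id
   FROM staff_members sm
   INNER JOIN org_structure os
      ON os.id = sm.manager_id
)
SELECT id
       , manager_id
FROM org_structure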
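To see the three parts working together, here is a minimal sketch that runs a recursive CTE against an in-memory SQLite database from Python. The four rows and the extra depth column are assumptions made up for illustration; only the staff_members table and its id and manager_id columns come from the example above. SQLite spells the keyword WITH RECURSIVE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_members (id INTEGER PRIMARY KEY, manager_id INTEGER);
    -- Hypothetical rows: employee 1 has no manager and sits at the top
    INSERT INTO staff_members VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

rows = conn.execute("""
    WITH RECURSIVE org_structure AS (
        -- Anchor member: employees with no manager
        SELECT id, manager_id, 0 AS depth
        FROM staff_members
        WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: direct reports of rows already found
        SELECT sm.id, sm.manager_id, os.depth + 1
        FROM staff_members sm
        INNER JOIN org_structure os ON os.id = sm.manager_id
    )
    SELECT id, manager_id, depth
    FROM org_structure
    ORDER BY id
""").fetchall()

print(rows)  # [(1, None, 0), (2, 1, 1), (3, 1, 1), (4, 2, 2)]
```

The recursion stops on its own once the recursive member returns no new rows, which acts as the termination condition in this sketch.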

3. Temporary functions

Knowing how to write temporary functions is important for several reasons:

  • It lets you break blocks of code into smaller blocks.

  • It is useful for writing clean code.

  • It prevents repetition and lets you reuse code, much like functions in Python.

Consider the following example:

SELECT name
       , CASE WHEN tenure < 1 THEN "analyst"
              WHEN tenure BETWEEN 1 and 3 THEN "associate"
              WHEN tenure BETWEEN 3 and 5 THEN "senior"
              WHEN tenure > 5 THEN "vp"
              ELSE "n/a"
         END AS seniority 
FROM employees

Instead, you can use a temporary function to capture the CASE expression.

CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
   CASE WHEN tenure < 1 THEN "analyst"
        WHEN tenure BETWEEN 1 and 3 THEN "associate"
        WHEN tenure BETWEEN 3 and 5 THEN "senior"
        WHEN tenure > 5 THEN "vp"
        ELSE "n/a"
   END
);
SELECT name
       , get_seniority(tenure) as seniority
FROM employees

The query itself is simpler and more readable with temporary functions, and you can reuse the seniority function!
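CREATE TEMPORARY FUNCTION as written above is BigQuery-style syntax and is not available in every engine. As a rough analogue, Python's sqlite3 module lets you register a function that lives only as long as the connection. The sketch below assumes a hypothetical employees table with made-up rows and mirrors the CASE logic in plain Python; it is not the same feature, just the same idea of giving reusable logic a name.

```python
import sqlite3

def get_seniority(tenure):
    """Mirror the CASE expression: the first matching branch wins."""
    if tenure is None:
        return "n/a"
    if tenure < 1:
        return "analyst"
    if tenure <= 3:
        return "associate"
    if tenure <= 5:
        return "senior"
    return "vp"

conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL name; it takes one argument
conn.create_function("get_seniority", 1, get_seniority)

conn.executescript("""
    CREATE TABLE employees (name TEXT, tenure INTEGER);
    -- Hypothetical rows for illustration
    INSERT INTO employees VALUES ('Ana', 0), ('Ben', 2), ('Cara', 4), ('Dev', 7);
""")

rows = conn.execute(
    "SELECT name, get_seniority(tenure) AS seniority FROM employees ORDER BY name"
).fetchall()

print(rows)
# [('Ana', 'analyst'), ('Ben', 'associate'), ('Cara', 'senior'), ('Dev', 'vp')]
```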

4. Pivot data using CASE WHEN

You will most likely see many questions that require CASE WHEN statements, simply because it is such a versatile concept. It lets you write complex conditional logic when you want to assign a value or class based on other variables.

Less well known, it also lets you pivot data. For example, if you have a month column and want a separate column for each month, you can use CASE WHEN to pivot the data.

Example problem: Write an SQL query to reformat the table so that there is one revenue column per month.

Initial table:

id  revenue  month
1   8000     Jan
2   9000     Jan
3   10000    Feb
1   7000     Feb
1   6000     Mar

Result table:

id  Jan_Revenue  Feb_Revenue  Mar_Revenue  ...  Dec_Revenue
1   8000         7000         6000         ...  null
2   9000         null         null         ...  null
3   null         10000        null         ...  null
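One way to get from the initial table to the result table is to wrap a CASE WHEN for each month in an aggregate and group by id. The sketch below runs that query against an in-memory SQLite database from Python. The table name revenue_by_month is an assumption (the problem statement does not name the table), and only the first three month columns are written out; the remaining months would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table name; the rows are the ones from the initial table
    CREATE TABLE revenue_by_month (id INTEGER, revenue INTEGER, month TEXT);
    INSERT INTO revenue_by_month VALUES
        (1, 8000, 'Jan'), (2, 9000, 'Jan'), (3, 10000, 'Feb'),
        (1, 7000, 'Feb'), (1, 6000, 'Mar');
""")

rows = conn.execute("""
    SELECT id
         -- Each CASE keeps revenue only for its own month; other rows become NULL
         , SUM(CASE WHEN month = 'Jan' THEN revenue END) AS Jan_Revenue
         , SUM(CASE WHEN month = 'Feb' THEN revenue END) AS Feb_Revenue
         , SUM(CASE WHEN month = 'Mar' THEN revenue END) AS Mar_Revenue
    FROM revenue_by_month
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# (1, 8000, 7000, 6000)
# (2, 9000, None, None)
# (3, None, 10000, None)
```

SUM ignores NULLs, so a month with no rows for an id stays NULL (None in Python) rather than becoming 0.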

5. EXCEPT vs NOT IN

EXCEPT and NOT IN operate almost identically. Both are used to compare the rows between two queries/tables. That said, there are subtle differences between the two that you should know.

First, EXCEPT filters out duplicates and returns distinct rows, unlike NOT IN.

Second, EXCEPT expects the same number of columns in both queries/tables, whereas NOT IN compares a single column from each query/table.
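The difference in duplicate handling is easy to see on a tiny example. The sketch below uses two hypothetical one-column tables, a and b, in an in-memory SQLite database; the table names and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: a contains the value 1 twice
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (1), (2), (3);
    INSERT INTO b VALUES (3);
""")

# EXCEPT is a set operation: the result is de-duplicated
except_rows = conn.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x"
).fetchall()

# NOT IN is a row filter: every qualifying row of a is kept
not_in_rows = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT x FROM b) ORDER BY x"
).fetchall()

print(except_rows)  # [(1,), (2,)]
print(not_in_rows)  # [(1,), (1,), (2,)]
```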

6. Self joins

A self join is when an SQL table is joined with itself. You might think that serves no purpose, but you would be surprised at how common it is. In many real-life settings, data is stored in one large table rather than many smaller tables. In such cases, self joins may be needed to solve particular problems.

Let’s look at an example.

Example problem: Given the following table of employees, write an SQL query to find the employees who earn more than their managers. For the table below, Joe is the only employee who earns more than his manager.

Id  Name   Salary  ManagerId
1   Joe    70000   3
2   Henry  80000   4
3   Sam    60000   NULL
4   Max    90000   NULL

Answer:

SELECT  
    a.Name as Employee  
FROM  
    Employee as a  
    JOIN Employee as b on a.ManagerID = b.Id  
WHERE a.Salary > b.Salary

7. Rank vs Dense Rank vs Row Number

Ranking rows and values is a very common application. Here are some examples of how companies often use rankings:

  • Ranking top customers by number of purchases, profits, etc.

  • Ranking the top products by number sold

  • Ranking the top countries by sales

  • Ranking the top videos by minutes watched, number of distinct viewers, etc.

In SQL, there are several ways to assign a "rank" to a row, which we will explore with an example. Consider the following query and results:

SELECT Name  
 , GPA  
 , ROW_NUMBER() OVER (ORDER BY GPA desc)  
 , RANK() OVER (ORDER BY GPA desc)  
 , DENSE_RANK() OVER (ORDER BY GPA desc)  
FROM student_grades

ROW_NUMBER() returns a unique number for each row, starting at 1. When there are ties (for example, Bob vs Carrie), ROW_NUMBER() assigns the numbers arbitrarily if no second criterion is defined.

RANK() returns a number for each row starting at 1, except that tied rows are assigned the same number. A gap then follows the duplicated rank.

DENSE_RANK() is similar to RANK(), except that there are no gaps after a duplicated rank. Note that with DENSE_RANK(), Daniel is ranked 3rd, as opposed to 4th with RANK().
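The three functions can be compared side by side with a small runnable sketch. The GPA values and the student Alice are assumptions chosen only so that Bob and Carrie tie and Daniel comes next; they are not the original result set. A tiebreaker on Name is added to ROW_NUMBER() so its output is deterministic. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_grades (Name TEXT, GPA REAL);
    -- Hypothetical rows: Bob and Carrie are tied
    INSERT INTO student_grades VALUES
        ('Alice', 3.9), ('Bob', 3.7), ('Carrie', 3.7), ('Daniel', 3.5);
""")

rows = conn.execute("""
    SELECT Name
         , ROW_NUMBER() OVER (ORDER BY GPA DESC, Name) AS row_num
         , RANK()       OVER (ORDER BY GPA DESC)       AS rnk
         , DENSE_RANK() OVER (ORDER BY GPA DESC)       AS dense_rnk
    FROM student_grades
    ORDER BY GPA DESC, Name
""").fetchall()

for row in rows:
    print(row)
# ('Alice', 1, 1, 1)
# ('Bob', 2, 2, 2)
# ('Carrie', 3, 2, 2)
# ('Daniel', 4, 4, 3)
```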

8. Calculating delta values

Another common application is to compare values across different periods. For example, what is the delta between this month's sales and last month's? Or between this month's sales and the same month last year?

LEAD() and LAG() come into play when you compare values across periods like this to calculate deltas.

Here are some examples:

-- Comparing each month's sales to last month
SELECT month
       , sales
       , sales - LAG(sales, 1) OVER (ORDER BY month)
FROM monthly_sales;

-- Comparing each month's sales to the same month last year
SELECT month
       , sales
       , sales - LAG(sales, 12) OVER (ORDER BY month)
FROM monthly_sales;
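Here is a minimal runnable sketch of the month-over-month case using in-memory SQLite from Python. The three rows are made up, and month is assumed to be stored as sortable 'YYYY-MM' text; month names such as Jan and Feb would sort alphabetically and give the wrong order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (month TEXT, sales INTEGER);
    -- Hypothetical rows; 'YYYY-MM' text sorts chronologically
    INSERT INTO monthly_sales VALUES
        ('2021-01', 100), ('2021-02', 120), ('2021-03', 90);
""")

rows = conn.execute("""
    SELECT month
         , sales
         , sales - LAG(sales, 1) OVER (ORDER BY month) AS delta
    FROM monthly_sales
    ORDER BY month
""").fetchall()

for row in rows:
    print(row)
# ('2021-01', 100, None)
# ('2021-02', 120, 20)
# ('2021-03', 90, -30)
```

The first row has no previous month, so LAG() returns NULL and the delta is NULL as well.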

9. Calculating running totals

If you know about ROW_NUMBER() and LAG()/LEAD(), this probably won't come as a surprise. But if you don't, it is probably one of the most useful window functions, especially when you want to visualize growth!

Using a window function with SUM(), we can calculate a running total. See the following example:

SELECT Month  
       , Revenue  
       , SUM(Revenue) OVER (ORDER BY Month) AS Cumulative  
FROM monthly_revenue
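The same query can be tried end to end with a small sketch in Python and in-memory SQLite. The revenue figures are made up for illustration, and Month is assumed to be stored as sortable 'YYYY-MM' text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_revenue (Month TEXT, Revenue INTEGER);
    -- Hypothetical rows for illustration
    INSERT INTO monthly_revenue VALUES
        ('2021-01', 100), ('2021-02', 150), ('2021-03', 50);
""")

rows = conn.execute("""
    SELECT Month
         , Revenue
         , SUM(Revenue) OVER (ORDER BY Month) AS Cumulative
    FROM monthly_revenue
    ORDER BY Month
""").fetchall()

for row in rows:
    print(row)
# ('2021-01', 100, 100)
# ('2021-02', 150, 250)
# ('2021-03', 50, 300)
```

With an ORDER BY inside OVER and no explicit frame, the window runs from the first row up to the current row, which is what produces the cumulative sum.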

10. Date and time manipulation

You should definitely expect some sort of SQL question that involves date-time data. For example, you might need to group data or convert a value from DD-MM-YYYY format to just the month.

Some functions you should know are:

  • EXTRACT

  • DATEDIFF

  • DATE_ADD, DATE_SUB

  • DATE_TRUNC

Example problem: Given a weather table, write an SQL query to find the IDs of all days with a higher temperature than the previous day (yesterday).

Id(INT)  RecordDate(DATE)  Temperature(INT)
1        2015-01-01        10
2        2015-01-02        25
3        2015-01-03        20
4        2015-01-04        30

Answer:

SELECT  
    a.Id  
FROM  
    Weather a,  
    Weather b  
WHERE  
    a.Temperature > b.Temperature  
    AND DATEDIFF(a.RecordDate, b.RecordDate) = 1
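The two-argument DATEDIFF in the answer is MySQL-style and is not available in every engine. As a sketch of the same self join on a different engine, the version below runs on in-memory SQLite from Python and swaps DATEDIFF for SQLite's date() function with a '+1 day' modifier. The rows are the ones from the Weather table above, with RecordDate stored as ISO text, which is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Weather (Id INTEGER, RecordDate TEXT, Temperature INTEGER);
    INSERT INTO Weather VALUES
        (1, '2015-01-01', 10), (2, '2015-01-02', 25),
        (3, '2015-01-03', 20), (4, '2015-01-04', 30);
""")

rows = conn.execute("""
    SELECT a.Id
    FROM Weather a
    JOIN Weather b
      -- b is the record for the day before a
      ON a.RecordDate = date(b.RecordDate, '+1 day')
    WHERE a.Temperature > b.Temperature
    ORDER BY a.Id
""").fetchall()

ids = [r[0] for r in rows]
print(ids)  # [2, 4]
```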

Thanks for reading!

And that's it! I hope this helps you in your interview preparation. I'm sure that if you know these 10 concepts inside and out, you'll do well on most SQL questions.

As always, I wish you the best in your learning endeavors!