SQL is one of the most important programming languages for data analysis and data processing, so jobs related to data science (such as data analysts, data scientists, and data engineers) often ask questions about SQL during job interviews.

SQL interview questions are designed to assess the candidate’s skills and problem-solving skills. The key, therefore, is for candidates not only to write the right queries based on sample data, but also to consider scenarios and edge cases as if they were real data sets.

​​

​​

In this article, I’ll introduce common patterns in SQL interview questions and offer some tips for manipulating them in SQL queries.

1. Ask questions

To nail a SQL interview, the most important thing is to ask as many questions as possible to get all the details about the given task and data sample. With a good understanding of the requirements, you can then save a lot of time iterating through problems and handle edge cases well.

I’ve noticed that many candidates often start writing solutions without fully understanding SQL problems or data sets. Then, after I pointed out the problem with their solution, they had to revise the query repeatedly. In the end, they waste a lot of interview time in iterations and may not even end up with the right solution.

I recommend that you approach your SQL interview as if you were working with a business partner. So before you provide a solution, you should understand all the requirements for the data request.

For example:

Find the top 3 highest-paid employees.

​​

​​

Sample employee_salary table

Here you should ask the interviewer to clarify exactly what “top three” means. Should I include 3 employees in the results? How do you want me to handle relationships? Also, check the sample employee data carefully. What is the data type of the salary field? Do you need to clear data before computing?

2, Select which JOIN

​​

​​

In SQL, joins are typically used to merge information from multiple tables.

There are four different types of joins, but for the most part, we’ll just use INNER, LEFT, and FULLJOIN, because RIGHTJOIN is not very intuitive and can be easily rewritten using LEFTJOIN. In SQL interviews, you need to choose the correct JOIN you want to use based on the specific requirements of a given question.

For example:

Find the total number of courses attended by each student. (Provide student ID, name and number of courses selected.)

​​

​​

Sample student and class_history tables

You may have noticed that not all students who appear in the class_history table appear in the student table, probably because they have already graduated. (This is actually quite typical in transactional databases, because records that are no longer active are often deleted.)

Depending on whether the interviewer wants to include graduates in the results, we need to combine the two tables using LEFT JOIN or INNER JOIN:

WITH class_count AS (

SELECTstudent_id, COUNT(*) ASnum_of_class

FROMclass_history

GROUPBYstudent_id

)

SELECT

c.student_id,

s.student_name,

c.num_of_class

FROMclass_count c

– CASE 1: include only active students

JOINstudent s ONc.student_id = s.student_id

– CASE 2: include all students

— LEFT JOIN student s ON c. tudent_id = s.tudent_id

3、 GROUP BY

GROUP BY is the most important feature in SQL because it is widely used for data aggregation. If you see keywords such as sum, average, minimum, or maximum in an SQL question, this is a good indication that you should probably use GROUP BY in your query.

A common pitfall is confusing WHERE and HAVING when filtering data in GROUP BY — a mistake I’ve seen a lot of people make.

For example:

Calculate the average required course GPA for each student for each academic year and find students who qualify for Dean’s List (GPA≥3.5) for each semester.

​​

​​

Sample gpa_history table

Since we only consider required courses in our GPA calculation, we need to exclude electives using WHERE is_required=TRUE.

We need the average GPA of each student in each academic year, so we will GROUP BY student_id and school_year at the same time, and take the average of the GPA column. Finally, we only retain rows where the student’s average GPA is higher than 3.5, which can be achieved using HAVING. It all adds up to the following:

SELECT

student_id,

school_year,

AVG(gpa) ASavg_gpa

FROMgpa_history

WHEREis_required = TRUE

GROUPBYstudent_id, school_year

HAVINGAVG (gpa) > = 3.5

Note: Whenever GROUP BY is used in a query, only group-by and aggregate columns can be selected because row-level information in the other columns has been discarded.

4, SQL query execution sequence

Most people write SQL queries from top to bottom, starting with SELECT.

But did you know that the SQL engine does not perform SELECT until later when it executes a function? Here is the order in which SQL queries are executed:

FROM, JOIN

WHERE

GROUP BY

HAVING

SELECT

DISTINCT

ORDER BY

LIMIT, OFFSET

Consider the previous example again:

Because we want to filter out electives before calculating average GPA, I use WHERE is_required=TRUE instead of HAVING, because WHERE is executed before GROUP BY and HAVING. The reason I couldn’t write HAVING AVg_gpa >= 3.5 is that AVg_gpa is defined as part of the SELECT and therefore cannot be referenced in the steps performed prior to the SELECT.

I recommend following the engine’s order of execution when writing queries, which can be useful when writing complex queries.

5. Window function

The Window function is also frequently used in SQL interviews. There are five common Window functions:

RANK/DENSE_RANK/ROW_NUMBER: They assign a RANK to each row by sorting specific columns. If any partition columns are given, the row is ranked in the partition group to which it belongs.

LAG/LEAD: It retrieves column values from the previous row or the next row in the specified order and partition group.

In SQL interviews, it is important to understand the differences between ranking functions and know when to use LAG/LEAD.

For example:

Find the top 3 highest-paid employees in each department.

​​

​​

Another example is the Employee_SALARY table

When an SQL question asks to calculate “TOP N”, we can use ORDER BY or ranking functions to answer the question.

But in this example, it requires calculating “TOP N X in each Y,” which strongly implies that we should use a ranking function because we need to rank the rows in each partition group.

The following query will find exactly three of the highest paid employees, regardless of their relationship, as follows:

WITH T AS (

SELECT

*,

ROW_NUMBER OVER( PARTITIONBYdepartment_id ORDERBYemployee_salary DESC) ASrank_in_dep

FROMemployee_salary)

SELECT* FROMT

WHERErank_in_dep <= 3

– Note: When using ROW_NUMBER, each row will have a unique rank number and ranks for tied records are assigned randomly. For exmaple, Rimsha and Tiah may be rank 2 or 3 in different query runs.

In addition, we can choose other ranking functions depending on how the relationship is handled. Again, the details are important!

​​

​​

ROW_NUMBER,RANK,DENSE_RANK Result comparison

6. Repeat items

Another common pitfall in SQL interviews is to ignore data duplication.

Although some columns in the sample data seem to have different values, interviewers expect candidates to consider all possibilities as if they were working with a real data set.

Such as:

In the previous example of the Employee_Salary table, you could have employees share the same name.

A simple way to avoid potential problems caused by duplicate entries is to always use the ID column to uniquely identify different records.

For example:

Use the Employee_Salary table to find the total salaries of all employees in each department.

The correct solution is GROUP BY Employee_id, and then use SUM(employee_salary) to calculate the total salary. If you need employee names, join with the EMPLOYEE table at the end to retrieve employee name information.

The wrong way is to use GROUP BY employee_name.

7、 NULL

In SQL, any predicate can produce one of three values true, false, and NULL, the latter being the reserved key for unknown or missing data values. Working with NULL data sets can be unexpectedly tricky.

In SQL interviews, interviewers may pay particular attention to whether the solution handles NULL values. Sometimes, it is obvious that one column cannot be nullabL, but for most of the other columns, there is a good chance that there will be NULL values.

Suggestion: Check whether key columns in the sample data are Nullable.

If you can, use functions such as IS(NOT)NULL, IFNULL, and COALESCE to override these edge cases.

8, communication

Last but not least: always communicate with the interviewer during the SQL interview.

Many of the candidates I’ve interviewed are reticent and only speak up when in doubt. Of course, if they come up with the perfect solution, that’s not a problem.

But maintaining communication during technical interviews can often be valuable.

For example, you can talk about your understanding of the problem and data, explain how you plan to solve the problem, why you are using certain functions rather than other options, and what extreme scenarios are being considered.

9,

(1) Ask questions first and collect the details you need

(2) Choose carefully between INNER, LEFT, and FULL JOIN

(3) Use GROUP BY to aggregate data and use WHERE and HAVING correctly

(4) Understand the differences between the three ranking functions

(5) Know when to use the LAG/LEAD window function

(6) If you have difficulty creating complex queries, try to follow the SQL execution order

(7) Consider potential data problems, such as duplicates and NULL values

8) Communicate your ideas to the interviewer

For programmers who are learning to program or want to be promoted at work, if you want to improve your programming skills, help you improve! The author here may be able to help you ~

— — —

Wechat official account: C language programming Learning base

Share (source code, project actual combat video, project notes, basic introduction tutorial)

This article is reproduced from the Internet using the “CC by-SA 4.0CN” protocol, only for learning and exchange, the copyright of the content belongs to the original author