All say “gold nine silver ten”, soon October is coming to an end, I believe there are quite a lot of small partners did not find the right job. In the written test process, there will always be one or two “interesting” SQL questions, to test a logical thinking candidates, which is very unfriendly to the “white” entry-level job. Don’t worry, this blog has compiled some of the most common “SQL” pen questions in job interviews to help you get ahead in the next interview.

        

1. Query users who have logged in continuously for more than 3 days

This is a very classic problem, and here is one way to think about it.

The table information is as follows:


        

Step1: The user login date is deleted

Since a user may log in multiple times on the same day, we first need to undo the user login date.

select DISTINCT date(date) as "Date".id from demo01;
Copy the code

Query result:


Step2: Count with row_number() over()

With the results of the first step, we can open a window, group them by ID, sort them by date in ascending order, and get the ranking of each date.

select *,row_number() over(PARTITION by id order by Date of ` `) as cum from (select DISTINCT date(date) as Date of ` `.id from demo01)a;
Copy the code

Query result:Why do we need to sort the time in this step?

As you can see, the order after opening the window with row_number is continuous, so if the date is also continuous, their difference is a fixed value.

Step3: date minus count is worth the result

The date_sub function is used to obtain the date difference because the HQL is used here.

select *,date_sub(Date of ` `,cum) as Results ` ` from (select *,row_number() over(PARTITION by id order by Date of ` `) as cum from (select DISTINCT date(date) as Date of ` `.id from demo01)a)b;
Copy the code

Query result:

Step4: group according to id and result and calculate count

In the last step, we can calculate count by grouping according to id and difference directly according to the difference obtained in step3. If you need to log in for more than three consecutive days, check whether the count value is greater than or equal to 3.

select id.count(*) from (select *,date_sub(Date of ` `,cum) as Results ` ` from (select *,row_number() over(PARTITION by id order by Date of ` `) as cum from (select DISTINCT date(date) as Date of ` `.id from demo01)a)b)c GROUP BY id.Results ` ` having count(*) > =3;
Copy the code

Running results:The answer is already there. Users with ids 1 and 3 have logged in continuously for at least 3 days or more, and their consecutive logins are 3 and 4 days, respectively.

2. Count the total number of access times of each user

This is a question that often appears in the written test. We can review it according to the author’s ideas.

The table information is as follows:

SQL is required to calculate the cumulative number of access times for each user, as shown in the following table:

The user id in subtotal The cumulative
u01 The 2017-01 11 11
u01 The 2017-02 12 23
u02 The 2017-01 12 12
u03 The 2017-01 8 8
u04 The 2017-01 3 3

Step1: modify the data format

Backward from the results, we need to query the data grouped by year-month, so we will first process the original data in this step.

select
     userId,
     date_format(regexp_replace(visitDate,'/'.The '-'),'yyyy-MM') mn,
     visitCount
from
 action; t1Copy the code

Treatment results:

Step2: calculate the monthly traffic volume of each person

To make the subquery more aesthetically pleasing, we will replace the result of the previous step with t1. With this step, we can capture the number of visits per user, per month.

select
    userId,
    mn,
    sum(visitCount) mn_count
from
 t1 group byuserId,mn; t2Copy the code

Query result:         

Step3: Calculate the monthly traffic volume

We express the result of the second step in terms of the variable T2. At this point, we use a sum window function to group the userids and sort the mn time.

select
    userId,
    mn,
    mn_count,
    sum(mn_count) over(partition by userId order by mn) mn_all
from t2; Copy the code

Final result:


A complete SQL

Tips: the above steps are not complete SQL, each step using variables to replace the previous step of THE SQL statement is just for the convenience of everyone to show, in fact, the result of running is the author put the complete SQL into the run ha ~

select
    userId,
    mn,
    mn_count,
    sum(mn_count) over(partition by userId order by mn) mn_all
from ( select  userId,  mn,  sum(visitCount) mn_count  from  (select  userId,  date_format(regexp_replace(visitDate,'/'.The '-'),'yyyy-MM') mn,  visitCount  from  action)t1 group by userId,mn)t2; Copy the code

3. Group TopN

There are 50W stores. When a customer visits any product in any store, an access log is generated. The log table is called Visit, and the user id is user_id.

         Demand: top3 visitor information of each store. Output store name, visitor ID and number of visits.

Step1: Query the number of visits of each store by each user

Because we ultimately need to obtain the information of the top3 users of each store, in this step, we first calculate the number of visits of each user of each store.

select shop,user_id,count(*) ct
from visit
group byshop,user_id; t1Copy the code

Calculation results:


Step2: Calculate the ranking of the visits of each store by users

With the first step, we want to get the top three visits of the users under each store. There is no doubt that we need to use the windowing function rank.

Why not use row_number?

Row_number does not rank the same for the same data. If we select Topic3, we must keep the same number of accesses

select shop,user_id,ct,rank(a)over(partition by shop order by ct) rk
fromt1; t2Copy the code

Calculation results:


Step3: Take the top 3 data of each store

With the results of step2, it is not easy for us to get the data of the top three stores

select shop,user_id,ct
from t2
where rk<=3;
Copy the code

Calculation results:


A complete SQL

Now that the query results are in place, put the SQL from the previous step together

select 
shop,
user_id,
ct
from
(select shop, user_id, ct, rank(a)over(partition by shop order by ct) rk from (select shop, user_id, count(*) ct from visit group by shop, user_id)t1 )t2 where rk<=3; Copy the code

conclusion

Whether we read a book or brush the question, does not depend on the number, but must require “fine”. This requires us to learn to think, learn to draw inferences. The person who really has the ability to solve problems, I believe that must not spend time in a large number of brush problems, but know from different types of exercises, timely summary review of the people.

Above 3 SQL “small dishes” is not to meet everyone, later have the opportunity to sum up some other topics for you, this article is over here. Curious about the technological universe, if you like this article, you can scan the code to pay attention to the author’s original public account [Simian bacteria], and we will see you next time!


Pay attention to get high-quality mind maps, Internet front-line factory face classics, big data collection boutique books… Looking forward to your attention!