@[TOC]
A Hive, SQL
1.1. Window functions
1.1.1. Calculate the cumulative access times of each user
There are the following users accessing data (T1)SQL is required to calculate the cumulative number of access times for each user, as shown in the following table: answer1. Create table
create table action (
userId string,
visitDate string,
visitCount int
) row format delimited fields terminated by "\t";
Copy the code
2. Insert data
insert into table action values('u01'.'2021/1/21'.'5');
insert into table action values('u02'.'2021/1/23'.'6');
insert into table action values('u03'.'2021/1/22'.'8');
insert into table action values('u04'.'2021/1/20'.'3');
insert into table action values('u01'.'2021/1/23'.'6');
insert into table action values('u01'.'2021/2/21'.'8');
insert into table action values('U02'.'2021/1/23'.'6');
insert into table action values('U01'.'2021/2/22'.'4');
Copy the code
3. Convert the time format
select from_unixtime(unix_timestamp(visitDate,'yyyy/mm/dd'),'yyyy-mm') from action;
Copy the code
4. Count the access counts of each user in a month
select
lower(userId) userId, -- Change case
from_unixtime(unix_timestamp(visitDate,'yyyy/mm/dd'),'yyyy-mm') visitMonth,
sum(visitCount) visit_month_count
from action
group by lower(userId),from_unixtime(unix_timestamp(visitDate,'yyyy/mm/dd'),'yyyy-mm')
Copy the code
5. Count the total number of visits
select
t1.userId userId,
t1.visitMonth visitMonth,
t1.visit_month_count visit_month_count,
sum(t1.visit_month_count) over (partition by userId order by visitMonth rows between UNBOUNDED PRECEDING AND CURRENT ROW) sum_month
from
(
select
lower(userId) userId,
from_unixtime(unix_timestamp(visitDate,'yyyy/mm/dd'),'yyyy-mm') visitMonth,
sum(visitCount) visit_month_count
from action
group by lower(userId),from_unixtime(unix_timestamp(visitDate,'yyyy/mm/dd'),'yyyy-mm')
) t1
Copy the code
Field to explain
Following the first row of the partition, we have an unbounded class. N is the offset relative to the current row following: preceding, and unbounded by the end of the partition. N: indicates the offset from the current row. Current Row: indicates the current row. The offset is 0Copy the code