Hive configuration and internal functions

Configuration

  • Global: hive-site.xml
  • Command line: hive --hiveconf xxx=yyy (in effect for that launch only)
  • Session settings (in effect only for the current session), e.g. 0: jdbc:hive2://node3:10000> set mapreduce.job.reduces=3;

Conditional functions

  • if(condition, value_if_true, value_if_false): returns the second argument when the condition is true, otherwise the third
  • case

    • case a when b then c [when d then e]* [else f] end

      • If a=b, return c; if a=d, return e; otherwise return f
    • case when a then b [when c then d]* [else e] end

      • If condition a is true, return b; if condition c is true, return d; if no condition is true, return e
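A minimal sketch of the three conditional forms against the emp table defined later in this document:

```sql
-- if(condition, value_if_true, value_if_false)
select ename, if(deptno = 10, 'dept-10', 'other') as tag from emp;

-- case a when b then c ... end: compares deptno against each listed value
select ename,
       case deptno when 10 then 'ten'
                   when 20 then 'twenty'
                   else 'other' end as dept_name
from emp;

-- case when a then b ... end: each branch is an independent boolean condition
select ename,
       case when deptno = 10 then 'ten'
            when deptno = 20 then 'twenty'
            else 'other' end as dept_name
from emp;
```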

Type conversion functions

  • Cast function

    • When a SMALLINT is compared with an INT, Hive implicitly converts the SMALLINT to INT; the wider INT cannot be implicitly converted down to SMALLINT unless cast is used: select cast(100 as SMALLINT);
    • Any integer type can be implicitly converted to a type with a wider range. TINYINT, SMALLINT, INT, BIGINT, FLOAT, and STRING can all be implicitly converted to DOUBLE; note that even STRING converts implicitly to DOUBLE. BOOLEAN, however, cannot be converted to any other data type.
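These conversion rules can be checked directly in a Hive session; a short sketch:

```sql
-- explicit narrowing: INT -> SMALLINT requires cast
select cast(100 as smallint);

-- STRING is implicitly convertible to DOUBLE, so string arithmetic works
select '3.14' + 1;             -- evaluates as a DOUBLE

-- explicit cast of STRING to a numeric type
select cast('100' as int) + 1;
```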

order by, sort by, distribute by, cluster by

  • With multiple reducers, distribute by cannot be combined with order by; use sort by instead, or replace the pair with cluster by when distributing and sorting in ascending order on the same column
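A sketch of the variants, assuming multiple reducers and the emp table from the examples below:

```sql
set mapreduce.job.reduces = 3;

-- global sort: always forced into a single reducer
select * from emp order by deptno;

-- distribute rows by deptno, then sort within each reducer by ename
select * from emp distribute by deptno sort by ename;

-- shorthand when distributing and sorting ascending on the same column
select * from emp cluster by deptno;
```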

Row-column conversion

  • function

    • concat concatenates values of simple types into one string
    • concat_ws(separator, complex-type or string inputs) joins the inputs with the separator
    • collect_set | collect_list

      • The set operation removes duplicate data during the collection process; the list does not
  • Requirement

Raw data (tab-separated):

Smith 30
Allen 30
Ward 20
Jones 30
Martin 30
Blake 10
Clark 20
Scott 10
King 30
Turner 20
Adams 30
James 20
Ford 10
MILLER

create table emp(
  deptno int,
  ename string
) row format delimited fields terminated by '\t';

select deptno, concat_ws("|", collect_set(ename)) from emp group by deptno;

select collect_set(ename) from emp group by deptno;
+-----------------------------------------------------+--+
| ["CLARK","KING","MILLER"]                           |
| ["SMITH","JONES","SCOTT","ADAMS","FORD"]            |
| ["ALLEN","WARD","MARTIN","BLAKE","TURNER","JAMES"]  |
+-----------------------------------------------------+--+
  • Columns to rows

    • explode is a UDTF (table-generating function)
select deptno, explode(name) from emp2;  -- fails, because:
1. When a UDTF appears after select, no other fields are allowed.
2. UDTFs cannot be nested, although they may wrap other functions.

Lateral View syntax: LATERAL VIEW udtf(expression) tableAlias AS columnAlias

select deptno, name from emp2 LATERAL VIEW explode(names) tmp as name;

0: jdbc:hive2://node3:10000> select * from emp2 LATERAL VIEW explode(name) tmp as name1;
+--------------+-----------------------------------------------------+------------+--+
| emp2.deptno  | emp2.name                                           | tmp.name1  |
+--------------+-----------------------------------------------------+------------+--+
| 10           | ["CLARK","KING","MILLER"]                           | CLARK      |
| 10           | ["CLARK","KING","MILLER"]                           | KING       |
| 10           | ["CLARK","KING","MILLER"]                           | MILLER     |
| 20           | ["SMITH","JONES","SCOTT","ADAMS","FORD"]            | SMITH      |
| 20           | ["SMITH","JONES","SCOTT","ADAMS","FORD"]            | JONES      |
| 20           | ["SMITH","JONES","SCOTT","ADAMS","FORD"]            | SCOTT      |
| 20           | ["SMITH","JONES","SCOTT","ADAMS","FORD"]            | ADAMS      |
| 20           | ["SMITH","JONES","SCOTT","ADAMS","FORD"]            | FORD       |
| 30           | ["ALLEN","WARD","MARTIN","BLAKE","TURNER","JAMES"]  | ALLEN      |
| 30           | ["ALLEN","WARD","MARTIN","BLAKE","TURNER","JAMES"]  | WARD       |
| 30           | ["ALLEN","WARD","MARTIN","BLAKE","TURNER","JAMES"]  | MARTIN     |
  • Concatenating a complex type into a string
select deptno, concat_ws("|", collect_set(ename)) as c1 from emp group by deptno;
+---------+---------------------------------------+--+
| deptno  |                  c1                   |
+---------+---------------------------------------+--+
| 10      | CLARK|KING|MILLER                     |
| 20      | SMITH|JONES|SCOTT|ADAMS|FORD          |
| 30      | ALLEN|WARD|MARTIN|BLAKE|TURNER|JAMES  |
+---------+---------------------------------------+--+
  • Restore the above to the original data
select explode(split(c1,"\\|"))
from
(select deptno, concat_ws("|", collect_list(ename)) as c1 from emp group by deptno) as tmp;

Reflect function

select reflect('java.lang.Math', 'abs', -10);  -- calls Math.abs via reflection
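reflect can invoke any static Java method on the classpath; two calls whose results follow directly from the Math API:

```sql
select reflect('java.lang.Math', 'abs', -10);    -- 10
select reflect('java.lang.Math', 'max', 11, 22); -- 22
```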

Hive analysis function

Analysis function one

  • Format: analytic_function over (partition by xxx order by xxx [asc|desc])
  • Place: after SELECT, before FROM
  • Analysis function:

    • Grouped top-N

      • row_number() ignores ties: numbering is always consecutive
      • rank() gives tied rows the same rank, then skips ahead by the number of ties
      • dense_rank() gives tied rows the same rank with no gap; normal ranking continues after
      • ntile(n) over(order by xxx) splits rows into n buckets; filter where the bucket number = 1 to take the top 1/n
+--------------+----------------+--------+--+
| t2.cookieid  | t2.createtime  | t2.pv  |
+--------------+----------------+--------+--+
| cookie1      | 2018-04-10     | 1      |
| cookie1      | 2018-04-11     | 5      |
| cookie1      | 2018-04-12     | 7      |
| cookie1      | 2018-04-13     | 3      |
| cookie1      | 2018-04-14     | 2      |
| cookie1      | 2018-04-15     | 4      |
| cookie1      | 2018-04-16     | 4      |
| cookie2      | 2018-04-10     | 2      |
| cookie2      | 2018-04-11     | 3      |
| cookie2      | 2018-04-12     | 5      |
| cookie2      | 2018-04-13     | 6      |
| cookie2      | 2018-04-14     | 3      |
| cookie2      | 2018-04-15     | 9      |
select cookieid,rank() over(partition by cookieid order by pv desc) as rank2 from t2;
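To compare the three ranking functions side by side on t2 (a sketch; output omitted):

```sql
select cookieid, createtime, pv,
       row_number() over (partition by cookieid order by pv desc) as rn,
       rank()       over (partition by cookieid order by pv desc) as rank1,
       dense_rank() over (partition by cookieid order by pv desc) as d_rank
from t2;
```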

Analysis function two

-- a. create the table and load the data
create table itcast_t1(
  cookieid string,
  createtime string,  -- day
  pv int
) row format delimited fields terminated by ',';

load data local inpath '/root/hivedata/itcast_t1.dat' into table itcast_t1;

-- itcast_t1.dat
cookie1,2018-04-10,1
cookie1,2018-04-11,5
cookie1,2018-04-12,7
cookie1,2018-04-13,3
cookie1,2018-04-14,2
cookie1,2018-04-15,4
cookie1,2018-04-16,4
  • sum,avg,max,count
  • SUM: running (cumulative) totals

    • Typical scenarios: sum the current row with all preceding rows; the current row with the previous three rows; the previous three rows through the next row; the current row with all following rows

      • sum(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
      • sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as pv3
      • sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv4
      • sum(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as pv5
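The window frames above can be combined into one query over itcast_t1 (a sketch; when order by is present but no frame is given, Hive defaults to range between unbounded preceding and current row):

```sql
select cookieid, createtime, pv,
       sum(pv) over (partition by cookieid order by createtime
                     rows between unbounded preceding and current row) as pv2,
       sum(pv) over (partition by cookieid order by createtime
                     rows between 3 preceding and current row) as pv3,
       sum(pv) over (partition by cookieid order by createtime
                     rows between 3 preceding and 1 following) as pv4,
       sum(pv) over (partition by cookieid order by createtime
                     rows between current row and unbounded following) as pv5
from itcast_t1;
```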

Analysis function three

  • LAG,LEAD,FIRST_VALUE,LAST_VALUE

    • Application scenario: comparing the current row with nearby preceding/following rows
+--------------+----------------------+---------+--+
| t4.cookieid  | t4.createtime        | t4.url  |
+--------------+----------------------+---------+--+
| cookie1      | 2018-04-10 10:00:02  | url2    |
| cookie1      | 2018-04-10 10:00:00  | url1    |
| cookie1      | 2018-04-10 10:03:04  | 1url3   |
| cookie1      | 2018-04-10 10:50:05  | url6    |
| cookie1      | 2018-04-10 11:00:00  | url7    |
| cookie1      | 2018-04-10 10:10:00  | url4    |
| cookie1      | 2018-04-10 10:50:01  | url5    |
| cookie2      | 2018-04-10 10:00:02  | url22   |
| cookie2      | 2018-04-10 10:00:00  | url11   |
| cookie2      | 2018-04-10 10:03:04  | 1url33  |

-- The second parameter of lag/lead is how many rows before/after the current row to look.
SELECT cookieid, createtime, url,
       ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
       LAG(createtime,1,'1970-01-01 00:00:00') OVER(PARTITION BY cookieid ORDER BY createtime) AS last_1_time,
       LAG(createtime,2) OVER(PARTITION BY cookieid ORDER BY createtime) AS last_2_time
FROM t4;
+-----------+----------------------+---------+-----+----------------------+--+
| cookieid  | createtime           | url     | rn  | last_1_time          |
+-----------+----------------------+---------+-----+----------------------+--+
| cookie1   | 2018-04-10 10:00:00  | url1    | 1   | 1970-01-01 00:00:00  |
| cookie1   | 2018-04-10 10:00:02  | url2    | 2   | 2018-04-10 10:00:00  |
| cookie1   | 2018-04-10 10:03:04  | 1url3   | 3   | 2018-04-10 10:00:02  |
| cookie1   | 2018-04-10 10:10:00  | url4    | 4   | 2018-04-10 10:03:04  |
| cookie1   | 2018-04-10 10:50:01  | url5    | 5   | 2018-04-10 10:10:00  |
| cookie1   | 2018-04-10 10:50:05  | url6    | 6   | 2018-04-10 10:50:01  |
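lead and first_value/last_value follow the same over() pattern; a sketch on t4 (output omitted):

```sql
select cookieid, createtime, url,
       -- next row's createtime, with a default for the last row in the partition
       lead(createtime, 1, '1970-01-01 00:00:00')
         over (partition by cookieid order by createtime) as next_1_time,
       -- first url in the partition, ordered by time
       first_value(url) over (partition by cookieid order by createtime) as first_url,
       -- last_value needs an explicit frame to see the whole partition
       last_value(url) over (partition by cookieid order by createtime
                             rows between unbounded preceding and unbounded following) as last_url
from t4;
```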