I. Functions
1. Built-in system functions
1.1. View the system's built-in functions

hive (default)> show functions;
1.2. Show the usage of a built-in function

hive (default)> desc function upper;
1.3. Show detailed usage of a built-in function

hive (default)> desc function extended upper;
2. Commonly used built-in functions
2.1. NULL value replacement (NVL)
- Function description
NVL assigns a default to data whose value is NULL. Its format is NVL(value, default_value): it returns default_value if value is NULL, returns value otherwise, and returns NULL if both arguments are NULL.
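The three cases can be checked quickly from the Hive CLI with literal values (expected results shown as comments):

hive (default)> select nvl(1, 2);       -- 1: value is not NULL, so value is returned
hive (default)> select nvl(null, 2);    -- 2: value is NULL, so default_value is returned
hive (default)> select nvl(null, null); -- NULL: both arguments are NULL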
- Data preparation: use the employee table emp
- Query: if comm is NULL, substitute -1

hive (default)> select comm, nvl(comm,-1) from emp;
OK
comm    _c1
NULL    -1.0
300.0   300.0
500.0   500.0
NULL    -1.0
1400.0  1400.0
NULL    -1.0
NULL    -1.0
NULL    -1.0
NULL    -1.0
0.0     0.0
NULL    -1.0
NULL    -1.0
NULL    -1.0
NULL    -1.0
- Query: if an employee's comm is NULL, substitute the manager's id (mgr)

hive (default)> select comm, nvl(comm,mgr) from emp;
OK
comm    _c1
NULL    7902.0
300.0   300.0
500.0   500.0
NULL    7839.0
1400.0  1400.0
NULL    7839.0
NULL    7839.0
NULL    7566.0
NULL    NULL
0.0     0.0
NULL    7788.0
NULL    7698.0
NULL    7566.0
NULL    7782.0
2.2. CASE WHEN ... THEN ... ELSE ... END
- Data preparation
- Requirement: count how many men and women there are in each department.
- Create emp_sex.txt and load the data

[moe@hadoop102 datas]$ vi emp_sex.txt
Wukong     A   male
Dahai      A   male
Song Song  B   male
Feng Jie   A   female
Ting Jie   B   female
Tingting   B   female
- Create the Hive table and load the data

hive (default)> create table emp_sex
              > (name string, dept_id string, sex string)
              > row format delimited fields terminated by '\t';
hive (default)> load data local inpath '/opt/module/hive-3.1.2/datas/emp_sex.txt' into table emp_sex;
- Query the data as required

hive (default)> select dept_id,
              >        sum(case sex when 'male' then 1 else 0 end) male_count,
              >        sum(case sex when 'female' then 1 else 0 end) female_count
              > from emp_sex
              > group by dept_id;
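Given the six rows loaded above, the query should produce:

dept_id  male_count  female_count
A        2           1
B        1           2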
2.3. Rows to columns
- Related function descriptions
CONCAT(string A/col, string B/col, ...): returns the string formed by concatenating the inputs; it accepts any number of input strings or columns.
CONCAT_WS(separator, str1, str2, ...): a special form of CONCAT(). The first argument is the separator placed between the remaining arguments. The separator can be any string, as can the remaining arguments. If the separator is NULL, the return value is also NULL. The function skips any NULL values after the separator argument; the separator is inserted between the strings being concatenated.
Note: the arguments of CONCAT_WS must be string or array<string>.
COLLECT_SET(col): accepts only basic data types; it aggregates the values of a field, removing duplicates, into an array.
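A quick sketch of the three functions before the full example (expected output in comments; the last line reuses the emp_sex table from section 2.2, and the element order of collect_set is not guaranteed):

hive (default)> select concat('Aries', ',', 'A');      -- Aries,A
hive (default)> select concat_ws('|', 'a', null, 'b'); -- a|b (the NULL is skipped)
hive (default)> select collect_set(sex) from emp_sex;  -- ["male","female"]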
- Data preparation
- Requirement: group together the names of people who have the same constellation and the same blood type.
- Create a local person_info.txt and load the data

[moe@hadoop102 datas]$ vim person_info.txt
Sun Wukong  Aries        A
Dahai       Sagittarius  A
Song Song   Aries        B
Zhu Bajie   Aries        A
Feng Jie    Sagittarius  A
Lao Shi     Aries        B
- Create the Hive table and load the data

hive (default)> create table person_info
              > (name string, constellation string, blood_type string)
              > row format delimited fields terminated by '\t';
hive (default)> load data local inpath '/opt/module/hive-3.1.2/datas/person_info.txt' into table person_info;
- Query the data as required

hive (default)> SELECT
              >     t1.c_b,
              >     CONCAT_WS("|", collect_set(t1.name))
              > FROM (
              >     SELECT
              >         name,
              >         CONCAT_WS(',', constellation, blood_type) c_b
              >     FROM person_info
              > ) t1
              > GROUP BY t1.c_b;
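With the six rows above, the expected result is (the order of names within a group may vary):

Aries,A        Sun Wukong|Zhu Bajie
Aries,B        Song Song|Lao Shi
Sagittarius,A  Dahai|Feng Jie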
2.4. Columns to rows
- Function descriptions
EXPLODE(col): splits a complex Array or Map stored in one Hive column into multiple rows.
LATERAL VIEW
Usage: LATERAL VIEW udtf(expression) tableAlias AS columnAlias
Used together with UDTFs such as split and explode: the UDTF splits one column of data into multiple rows, and LATERAL VIEW joins the split rows back to the source row so they can be aggregated.
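explode can be tried directly on a literal value, without creating any table:

hive (default)> select explode(split('a,b,c', ','));
OK
col
a
b
c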
- Data preparation
- Requirement: expand the comma-separated categories of each movie onto separate rows.
- Create a local movie_info.txt and load the data

[moe@hadoop102 datas]$ vi movie_info.txt
Person of Interest  Suspense,Action,Science Fiction,Plot
Lie to Me           Suspense,Police,Psychology,Plot
Wolf Warrior 2      War,Action,Disaster
- Create the Hive table and load the data

hive (default)> create table movie_info
              > (movie string, category string)
              > row format delimited fields terminated by '\t';
hive (default)> load data local inpath '/opt/module/hive-3.1.2/datas/movie_info.txt' into table movie_info;
- Query the data as required

hive (default)> SELECT
              >     movie,
              >     category_name
              > FROM movie_info
              > LATERAL VIEW
              >     explode(split(category, ",")) movie_info_tmp AS category_name;
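With the three movies above, the query expands into eleven rows:

movie               category_name
Person of Interest  Suspense
Person of Interest  Action
Person of Interest  Science Fiction
Person of Interest  Plot
Lie to Me           Suspense
Lie to Me           Police
Lie to Me           Psychology
Lie to Me           Plot
Wolf Warrior 2      War
Wolf Warrior 2      Action
Wolf Warrior 2      Disaster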
2.5. Window functions (windowing)
- Related function descriptions
OVER(): specifies the window of data over which an analytic function works; the window can change from row to row.
CURRENT ROW: the current row
n PRECEDING: the n rows before the current row
n FOLLOWING: the n rows after the current row
UNBOUNDED: the partition boundary. UNBOUNDED PRECEDING starts the window at the first row of the partition; UNBOUNDED FOLLOWING ends it at the last row.
LAG(col, n, default_val): the value of col from the nth row before the current row
LEAD(col, n, default_val): the value of col from the nth row after the current row
NTILE(n): distributes the rows of an ordered window into n numbered groups, starting from 1. For each row, NTILE returns the number of the group the row belongs to. Note: n must be of type int.
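LAG and NTILE are exercised by the requirements below, but LEAD is not. A minimal sketch against the business table created in this section (the '9999-99-99' default is an arbitrary sentinel meaning "no later purchase"):

select name, orderdate,
       lead(orderdate, 1, '9999-99-99') over(partition by name order by orderdate) next_orderdate
from business;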
- Data preparation: name, orderdate, cost

Jack,2017-01-01,10
Tony,2017-01-02,15
Jack,2017-02-03,23
Tony,2017-01-04,29
Jack,2017-01-05,46
Jack,2017-04-06,42
Tony,2017-01-07,50
Jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
Neil,2017-05-10,12
mart,2017-04-11,75
Neil,2017-06-12,80
mart,2017-04-13,94
- Requirements
- Query the customers who purchased in April 2017, and the total number of such customers
- Query customers' purchase details along with the total purchase amount per month
- In the scenario above, accumulate each customer's cost by date
- Query each customer's previous purchase date
- Query the order information of the first 20% of orders by time
- Create a local business.txt (the 14 rows shown above) and load the data

[moe@hadoop102 datas]$ vi business.txt
- Create the Hive table and load the data

hive (default)> create table business(
              >     name string,
              >     orderdate string,
              >     cost int)
              > row format delimited fields terminated by ',';
hive (default)> load data local inpath '/opt/module/hive-3.1.2/datas/business.txt' into table business;
- Query the data as required
- Query the customers who purchased in April 2017, and the total number of such customers
Without the over() function:

hive (default)> select name, count(*)
              > from business
              > where substring(orderdate,1,7) = '2017-04'
              > group by name;
With the over() function:

hive (default)> select
              >     name, count(*) over()
              > from business
              > where substring(orderdate,1,7) = '2017-04'
              > group by name;
- Query customers' purchase details and the total purchase amount per month

hive (default)> select
              >     name, orderdate, cost, sum(cost) over(partition by month(orderdate))
              > from business;
- Accumulate each customer's cost by date

select name, orderdate, cost,
       sum(cost) over() as sample1,  -- sum over all rows
       sum(cost) over(partition by name) as sample2,  -- sum within each name
       sum(cost) over(partition by name order by orderdate) as sample3,  -- running total within each name, by date
       sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row) as sample4,  -- same as sample3
       sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5,  -- previous row plus the current row
       sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and 1 FOLLOWING) as sample6,  -- previous row, current row, and next row
       sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING) as sample7  -- current row and all following rows
from business;

Note: the ROWS clause must be used together with ORDER BY; it restricts the window to a fixed number of rows within the partition.
- Query each customer's previous purchase date

select name, orderdate, cost,
       lag(orderdate, 1, '1900-01-01') over(partition by name order by orderdate) as time1,
       lag(orderdate, 2) over(partition by name order by orderdate) as time2
from business;
- Query the order information of the first 20% of orders by time

select * from (
    select name, orderdate, cost,
           ntile(5) over(order by orderdate) sorted
    from business
) t
where sorted = 1;
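Why ntile(5) works here: business has 14 rows, so ntile(5) places 3 rows in each of the first four groups and 2 in the last; sorted = 1 therefore selects the 3 earliest orders, roughly the first 20% by time.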
2.6. Rank
- Function descriptions
RANK(): rows with equal values receive the same rank, and the following ranks are skipped, leaving gaps in the numbering.
DENSE_RANK(): rows with equal values receive the same rank, but no ranks are skipped, so the total number of distinct ranks shrinks.
ROW_NUMBER(): numbers the rows consecutively in sort order, ignoring ties.
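For example, ranking the English scores from the data below in descending order (84, 84, 78, 68, where the two 84s tie): RANK() gives 1, 1, 3, 4; DENSE_RANK() gives 1, 1, 2, 3; ROW_NUMBER() gives 1, 2, 3, 4.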
- Data preparation
- Requirement: rank the scores within each subject.
- Create a local score.txt file and load the data

[moe@hadoop102 datas]$ vi score.txt
Sun Wukong  Chinese  87
Sun Wukong  Math     95
Sun Wukong  English  68
Dahai       Chinese  94
Dahai       Math     56
Dahai       English  84
Song Song   Chinese  64
Song Song   Math     86
Song Song   English  84
Tingting    Chinese  65
Tingting    Math     85
Tingting    English  78
- Create the Hive table and load the data

hive (default)> create table score(
              >     name string,
              >     subject string,
              >     score int)
              > row format delimited fields terminated by '\t';
hive (default)> load data local inpath '/opt/module/hive-3.1.2/datas/score.txt' into table score;
- Query the data as required

select name, subject, score,
       rank() over(partition by subject order by score desc) rp,
       dense_rank() over(partition by subject order by score desc) drp,
       row_number() over(partition by subject order by score desc) rmp
from score;
3. Custom functions
- Hive ships with a number of built-in functions, such as max/min, but they are limited. You can extend Hive conveniently by writing user-defined functions (UDFs).
- When the built-in functions provided by Hive cannot meet your business processing needs, you can turn to user-defined functions (UDFs).
- User-defined functions fall into the following three categories
- UDF (User-Defined Function)
One row in, one row out
- UDAF (User-Defined Aggregation Function)
Many rows in, one row out
Similar to: count/max/min
- UDTF (User-Defined Table-Generating Function)
One row in, multiple rows out
e.g. lateral view explode()
- Official documentation address
cwiki.apache.org/confluence/…
- Programming steps
- Inherit the class provided by Hive
org.apache.hadoop.hive.ql.udf.generic.GenericUDF
org.apache.hadoop.hive.ql.udf.generic.GenericUDTF
- Implement the abstract methods of the class
- Create the function in the Hive command-line window
Add the jar:

add jar linux_jar_path

Create the function:

create [temporary] function [dbname.]function_name AS class_name;
- Delete the function in the Hive command-line window

drop [temporary] function [if exists] [dbname.]function_name;
4. Custom UDF function
- Requirement: write a custom UDF that computes the length of a given string, for example:

hive (default)> select my_len("abcd");
4
- Create a Maven project named hive
- Import the dependency

<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
    </dependency>
</dependencies>
- Create the class

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MyUDF extends GenericUDF {

    // Validate the number of arguments
    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentException("The number of arguments is not one.");
        }
        // The function returns an int: the length of the input string
        return PrimitiveObjectInspectorFactory.javaIntObjectInspector;
    }

    // Process one row of data
    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        // 1. Take the input value; check for NULL before calling toString()
        Object input = arguments[0].get();
        if (input == null) {
            return 0;
        }
        // 2. Return the length of the input string
        return input.toString().length();
    }

    @Override
    public String getDisplayString(String[] children) {
        return "";
    }
}
- Upload the jar package to the server: /opt/module/hive-3.1.2/datas/myudf.jar
- Add the jar package to Hive's classpath

hive (default)> add jar /opt/module/hive-3.1.2/datas/myudf.jar;
- Create a temporary function associated with the developed Java class

hive (default)> create temporary function my_len as "com.moe.hive.udf.MyUDF";
- The custom function can now be used in HQL

hive (default)> select my_len('moe');
OK
_c0
3
hive (default)> select length('moe');
OK
_c0
3
5. Custom UDTF function
- Requirement: write a custom UDTF that splits a string on an arbitrary separator into individual words, for example:

hive (default)> select myudtf("hello,world,hadoop,hive", ",");
hello
world
hadoop
hive
- Code implementation

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MyUDTF extends GenericUDTF {

    // Reusable collection that holds one row of output data
    private ArrayList<String> outPutList = new ArrayList<>();

    @Override
    public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException {
        // Default column name of the output; it can be overridden with an alias
        List<String> fieldNames = new ArrayList<>();
        fieldNames.add("word");

        // Type of the output column
        List<ObjectInspector> fieldOIs = new ArrayList<>();
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        // The final return value describes the structure of the output rows
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    // Process the input data, e.g. ("hello,moe,hive", ",")
    @Override
    public void process(Object[] args) throws HiveException {
        // 1. Take the input string and the separator
        String input = args[0].toString();
        String splitKey = args[1].toString();

        // 2. Split the string on the separator
        String[] words = input.split(splitKey);

        // 3. Emit each word as a separate output row
        for (String word : words) {
            outPutList.clear();
            outPutList.add(word);
            forward(outPutList);
        }
    }

    // Nothing to clean up when the operator closes
    @Override
    public void close() throws HiveException {
    }
}
- Upload the jar package to the server: /opt/module/hive-3.1.2/datas/myudtf.jar
- Add the jar package to Hive's classpath

hive (default)> add jar /opt/module/hive-3.1.2/datas/myudtf.jar;
- Create a temporary function associated with the developed Java class

hive (default)> create temporary function myudtf as "com.moe.hive.udtf.MyUDTF";
- Use the custom function

hive (default)> select myudtf('hello,moe,hive', ',');
OK
word
hello
moe
hive
II. Related Links
Big data Hive learning journey 4
Big data Hive learning journey 3
Big data Hive learning journey 2
Big data Hive learning journey 1