“This is the 21st day of my participation in the Gwen Challenge in November. See details of the event: The Last Gwen Challenge 2021”.

Hive data types

1. Basic data types

The Hive STRING type is equivalent to the VARCHAR type in relational databases. It is a variable-length string, but unlike VARCHAR it cannot declare how many characters it may hold; in theory it can store up to 2 GB of characters.

2. Collection data types

Hive supports three complex data types: ARRAY, MAP, and STRUCT. ARRAY and MAP are similar to ARRAY and MAP in Java, while STRUCT is similar to STRUCT in C in that it encapsulates a collection of named fields and allows arbitrary levels of nesting of complex data types.

Practical case:

1) Suppose a table contains the following row; represented as JSON, its data structure looks like this:

{
  "name": "songsong",
  "friends": ["bingbing", "lili"],
  "children": {
    "xiao song": 18,
    "xiaoxiao song": 19
  },
  "address": {
    "street": "hui long guan",
    "city": "beijing"
  }
}

2) Based on the above data structure, we created corresponding tables in Hive and imported data.

Create a local test file named test.txt

songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing

Note that the separators between elements in a MAP, STRUCT, or ARRAY can all be the same character; here it is "_".
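As a purely illustrative aside (plain Python, not Hive; the variable names are made up for this sketch), the three delimiter levels can be applied by hand to one line of test.txt:

```python
# Hypothetical sketch: decompose one line of test.txt with the same
# three separators that the Hive table definition declares.
line = "songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing"

# Level 1: fields terminated by ','
name, friends_raw, children_raw, address_raw = line.split(",")

# Level 2: collection items terminated by '_'
friends = friends_raw.split("_")        # -> ARRAY<string>
address = address_raw.split("_")        # -> STRUCT fields, in declared order

# Level 3: map keys terminated by ':'
children = dict(kv.split(":") for kv in children_raw.split("_"))  # -> MAP

print(name)      # songsong
print(friends)   # ['bingbing', 'lili']
print(children)  # {'xiao song': '18', 'xiaoxiao song': '19'}
print(address)   # ['hui long guan', 'beijing']
```

One character can serve as the separator for ARRAY items, STRUCT fields, and MAP entries at the same time because each collection lives in a different column, so the parser never has to disambiguate them within a single field.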

3) Create the test table on Hive

create table test(
	name string,
	friends array<string>,
	children map<string, int>,
	address struct<street:string, city:string> 
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
Field description:
row format delimited fields terminated by ','  -- column separator
collection items terminated by '_'  -- separator between elements of a MAP, STRUCT, or ARRAY
map keys terminated by ':'  -- separator between key and value in a MAP
lines terminated by '\n';  -- line separator

4) Import text data into the test table

load data local inpath '/opt/module/hive-3.1.2/datas/test.txt' into table test;

5) Query data in the three collection columns. The following show the access syntax for ARRAY, MAP, and STRUCT, respectively

hive (default)> select friends[1], children['xiao song'], address.city from test where name="songsong";
OK
_c0	_c1	city
lili	18	beijing
Time taken: 0.076 seconds, Fetched: 1 row(s)

3. Type conversion

The atomic data types of Hive can be converted implicitly, similar to Java type conversion. For example, if an expression expects the INT type, a TINYINT is automatically converted to INT. However, Hive does not perform the reverse conversion: an INT is not automatically converted to TINYINT, and attempting it returns an error unless CAST is used.

1) Implicit type conversion rules are as follows

(1) Any integer type can be implicitly converted to a wider range of types, such as TINYINT to INT and INT to BIGINT.

(2) All integer types, floats, and strings can be implicitly converted to DOUBLE.

(3) TINYINT, SMALLINT, INT can be converted to FLOAT.

(4) BOOLEAN types cannot be converted to any other type.

2) Explicit type conversion can be performed with the CAST operation

For example, CAST('1' AS INT) converts the string '1' to the integer 1; if a CAST fails, as with CAST('X' AS INT), the expression returns NULL.
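As a hypothetical analogy in Python (cast_to_int is an invented helper, not a Hive API), the failure semantics of CAST can be sketched like this: a convertible string yields an integer, anything else yields NULL (None here) instead of an error.

```python
# Hypothetical analogy for Hive's CAST(expr AS INT): a failed conversion
# yields NULL (modeled as None) rather than raising an error.
def cast_to_int(value):
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

print(cast_to_int("1"))  # 1, like CAST('1' AS INT)
print(cast_to_int("X"))  # None, like CAST('X' AS INT) returning NULL
```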

DDL data definition

1. Create a database

CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
  • Create a database. The default storage path of the database in HDFS is /user/hive/warehouse/*.db.

    hive (default)> create database db_hive;
  • Add if not exists to avoid an error when the target database already exists. (Standard practice)

    hive (default)> create database db_hive;
    FAILED: Execution Error, return code 1 from 
    org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists
    hive (default)> create database if not exists db_hive;
  • Create a database and specify the location where the database is stored in the HDFS

    hive (default)> create database db_hive2 location '/db_hive2.db';

2. Query databases

2.1. Show databases

  • Display database

    hive (default)> show databases;
  • Filter the databases shown by the query

    hive> show databases like 'db_hive*';

2.2. View database details

  • Display database information

    hive (default)> desc database db_hive;
    db_name	comment	location	owner_name	owner_type	parameters
    db_hive		hdfs://hadoop102:8020/user/hive/warehouse/db_hive.db	moe	USER
  • Display extended database details

    hive (default)> desc database extended db_hive;

2.3. Switch the current database

hive (default)> use db_hive;

3. Modify the database

You can use the ALTER DATABASE command to set key-value pair attribute values for the DBPROPERTIES of a DATABASE to describe the attribute information for that DATABASE.

hive (default)> alter database db_hive set dbproperties('createtime'='20211120');

View the result in Hive

hive (default)> desc database extended db_hive;
db_name	comment	location	owner_name	owner_type	parameters
db_hive		hdfs://hadoop102:8020/user/hive/warehouse/db_hive.db	moe	USER	{createtime=20211120}

4. Delete the database

  • Delete an empty database

    hive (default)> drop database db_hive;
  • If the database to be deleted might not exist, it is best to use if exists to check first

    hive> drop database db_hive;
    FAILED: SemanticException [Error 10072]: Database does not exist: db_hive
    hive> drop database if exists db_hive2;
  • If the database is not empty, append the cascade keyword to force deletion

    hive> drop database db_hive;
    FAILED: Execution Error, return code 1 from 
    org.apache.hadoop.hive.ql.exec.DDLTask. 
    InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
    hive> drop database db_hive cascade;

5. Create tables

Table creation syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]

Field Description

(1) CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; the user can ignore the exception with the IF NOT EXISTS option.
(2) The EXTERNAL keyword lets users create an external table. While creating the table, a LOCATION pointing to the actual data can be specified. When a table is dropped, an internal table's metadata and data are deleted together, while for an external table only the metadata is deleted, not the data.
(3) COMMENT: adds a comment to a table or a column.
(4) PARTITIONED BY creates a partitioned table.
(5) CLUSTERED BY creates a bucketed table.
(6) SORTED BY sorts one or more columns separately within each bucket (rarely used).
(7) ROW FORMAT DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, ...)]. Users can specify a custom SerDe or use one of Hive's built-in SerDes when creating a table. If ROW FORMAT or ROW FORMAT DELIMITED is not specified, the built-in SerDe is used. When a table is created with a custom SerDe, Hive uses that SerDe to determine the data of the table's columns. SerDe stands for Serialize/Deserialize; Hive uses a SerDe to serialize and deserialize row objects.
(8) STORED AS specifies the storage file type. Common types include SEQUENCEFILE, TEXTFILE, and RCFILE. If the file data is plain text, use STORED AS TEXTFILE; if the data needs to be compressed, use STORED AS SEQUENCEFILE.
(9) LOCATION: specifies the table's storage location in HDFS.
(10) AS: followed by a query statement, creates the table from the query result.
(11) LIKE allows users to copy an existing table structure, but not its data.

5.1. Managed tables

  • Theory

    Tables created by default are so-called managed tables, sometimes called internal tables. For these tables, Hive controls (more or less) the lifecycle of the data. By default, Hive stores their data in a subdirectory of the directory defined by the configuration property hive.metastore.warehouse.dir (by default /user/hive/warehouse). When we drop a managed table, Hive deletes the table's data as well. Managed tables are therefore not suitable for sharing data with other tools.

  • Practical case

    • The original data

      1001	ss1
      1002	ss2
      1003	ss3
      1004	ss4
      1005	ss5
      1006	ss6
      1007	ss7
      1008	ss8
      1009	ss9
      1010	ss10
      1011	ss11
      1012	ss12
      1013	ss13
      1014	ss14
      1015	ss15
      1016	ss16
    • Ordinary table creation

      create table if not exists student(
      id int, name string
      )
      row format delimited fields terminated by '\t'
      stored as textfile
      location '/user/hive/warehouse/student';
    • Create tables based on query results (the results of the query are added to the newly created table)

      create table if not exists student2 as select id, name from student;
    • Create tables based on existing table structures

      create table if not exists student3 like student;
    • Query the table type

      hive (default)> desc formatted student2;
      Table Type: MANAGED_TABLE

5.2. External tables

  • Theory

    Because the table is an external table, Hive does not assume it owns the data entirely. Deleting the table does not delete this data, but the metadata that describes the table is deleted.

  • Usage scenarios for managed and external tables

    Every day, collected website logs are regularly uploaded to HDFS as text files. Extensive statistical analysis is done on top of the external table (the raw log table); the intermediate tables and result tables it uses are stored as internal tables, into which the data flows via SELECT + INSERT.

  • Practical case

    Create separate department and employee external tables and import data into the tables.

    • The original data

      dept:

      10	ACCOUNTING	1700
      20	RESEARCH	1800
      30	SALES	1900
      40	OPERATIONS	1700

      emp:

      7369	SMITH	CLERK	7902	1980-12-17	800.00	20
      7499	ALLEN	SALESMAN	7698	1981-2-20	1600.00	300.00	30
      7521	WARD	SALESMAN	7698	1981-2-22	1250.00	500.00	30
      7566	JONES	MANAGER	7839	1981-4-2	2975.00	20
      7654	MARTIN	SALESMAN	7698	1981-9-28	1250.00	1400.00	30
      7698	BLAKE	MANAGER	7839	1981-5-1	2850.00	30
      7782	CLARK	MANAGER	7839	1981-6-9	2450.00	10
      7788	SCOTT	ANALYST	7566	1987-4-19	3000.00	20
      7839	KING	PRESIDENT	1981-11-17	5000.00	10
      7844	TURNER	SALESMAN	7698	1981-9-8	1500.00	0.00	30
      7876	ADAMS	CLERK	7788	1987-5-23	1100.00	20
      7900	JAMES	CLERK	7698	1981-12-3	950.00	30
      7902	FORD	ANALYST	7566	1981-12-3	3000.00	20
      7934	MILLER	CLERK	7782	1982-1-23	1300.00	10
    • Upload data to HDFS

    • Table creation statements: create the external tables

      Create a department table

      create external table if not exists dept(
      deptno int,
      dname string,
      loc int
      )
      row format delimited fields terminated by '\t';

      Create employee table

      create external table if not exists emp(
      empno int,
      ename string,
      job string,
      mgr int,
      hiredate string,
      sal double,
      comm double,
      deptno int)
      row format delimited fields terminated by '\t';
    • View the created table

      hive (default)>show tables;
    • View table formatting data

      hive (default)> desc formatted dept;
      Table Type: EXTERNAL_TABLE
    • Delete external table

      hive (default)> drop table dept;

      After the external table is dropped, the data is still in HDFS, but the metadata of dept has been deleted

5.3. Converting between managed and external tables

  • Query the table type

    hive (default)> desc formatted student2;
    Table Type: MANAGED_TABLE
  • Change the internal table student2 into an external table

    alter table student2 set tblproperties('EXTERNAL'='TRUE');
  • Query the table type

    hive (default)> desc formatted student2;
    Table Type: EXTERNAL_TABLE
  • Change the external table student2 back into an internal table

    alter table student2 set tblproperties('EXTERNAL'='FALSE');
  • Query the table type

    hive (default)> desc formatted student2;
    Table Type: MANAGED_TABLE
    Copy the code

Note :(‘EXTERNAL’=’TRUE’) and (‘EXTERNAL’=’FALSE’) are case sensitive.

6. Modify the table

6.1. Rename tables

  • Syntax

    ALTER TABLE table_name RENAME TO new_table_name
  • Practical case

    hive (default)> alter table dept1 rename to dept2;

6.2. Add, modify, and replace column information

  • Syntax

    • Update a column

      ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name
      column_type [COMMENT col_comment] [FIRST|AFTER column_name]
    • Add and replace columns

      ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT
      col_comment], ...)

ADD adds a new column, placed after all existing columns (but before any partition columns).

      REPLACE replaces all fields in the table.

  • Practical case

    • Query the table structure

      hive> desc dept;
    • Add columns

      hive (default)> alter table dept add columns(deptdesc string);
    • Query the table structure

      hive> desc dept;
    • Update a column

      hive (default)> alter table dept change column deptdesc desc string;
    • Query the table structure

      hive> desc dept;
    • Replace the column

      hive (default)> alter table dept replace columns(deptno string, dname string, loc string);
    • Query the table structure

      hive> desc dept;

7. Drop table

hive (default)> drop table dept;

Related links

Big data Hive learning journey 1