“This is the 21st day of my participation in the Gwen Challenge in November. See details of the event: The Last Gwen Challenge 2021”.
One, Hive data types
1. Basic data types
The Hive STRING type is equivalent to the database VARCHAR type: a variable-length string, except that it does not declare a maximum number of characters. It can theoretically store up to 2 GB of characters.
2. Set data types
Hive supports three complex data types: ARRAY, MAP, and STRUCT. ARRAY and MAP are similar to ARRAY and MAP in Java, while STRUCT is similar to a C struct in that it encapsulates a collection of named fields. Complex data types allow arbitrary levels of nesting.
Practical case:
1) Suppose a table has the following row; we use JSON to represent its data structure. In Hive, the format is:
{
    "name": "songsong",
    "friends": ["bingbing", "lili"],
    "children": {
        "xiao song": 18,
        "xiaoxiao song": 19
    },
    "address": {
        "street": "hui long guan",
        "city": "beijing"
    }
}
2) Based on the above data structure, we create the corresponding table in Hive and import the data.
Create a local test file named test.txt
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
Note that the delimiters between elements in MAP, STRUCT, and ARRAY can all be the same character; here it is "_".
3) Create the test table on Hive
create table test(
name string,
friends array<string>,
children map<string, int>,
address struct<street:string, city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
Field description:
row format delimited fields terminated by ',' -- column delimiter
collection items terminated by '_' -- delimiter for elements inside MAP, STRUCT, and ARRAY
map keys terminated by ':' -- delimiter between key and value in a MAP
lines terminated by '\n'; -- line delimiter
4) Import text data into the test table
hive (default)> load data local inpath '/opt/module/hive-3.1.2/datas/test.txt' into table test;
5) Access the data in the three collection columns. The following shows the access syntax for ARRAY, MAP, and STRUCT respectively.
hive (default)> select friends[1], children['xiao song'], address.city from test where name="songsong";
OK
_c0	_c1	city
lili	18	beijing
Time taken: 0.076 seconds, Fetched: 1 row(s)
3. Type conversion
The atomic data types of Hive can be converted implicitly, similar to Java type conversion. For example, if an expression expects the INT type, a TINYINT is automatically converted to INT. Hive does not perform the reverse conversion: an INT is not automatically converted to TINYINT, and such an expression raises an error unless CAST is used.
1) Implicit type conversion rules are as follows
(1) Any integer type can be implicitly converted to a wider range of types, such as TINYINT to INT and INT to BIGINT.
(2) All integer types, floats, and strings can be implicitly converted to DOUBLE.
(3) TINYINT, SMALLINT, INT can be converted to FLOAT.
(4) BOOLEAN types cannot be converted to any other type.
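As a quick illustration of rule (2), a string and an integer meeting in an arithmetic expression are both widened to DOUBLE (a sketch of standard Hive behavior; run it in your own CLI to confirm):

```sql
-- '1' (STRING) and 2 (INT) are both implicitly widened to DOUBLE,
-- so the result should be 3.0, not the integer 3
select '1' + 2;
```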
2) Data type conversion can also be performed explicitly with the CAST operator
For example, CAST('1' AS INT) converts the string '1' to the integer 1; if a CAST fails, such as CAST('X' AS INT), the expression returns NULL.
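Both behaviors from the paragraph above can be checked directly in the Hive CLI:

```sql
-- a successful cast: the string '1' becomes the integer 1
select cast('1' as int) + 2;   -- 3

-- a failed cast returns NULL instead of raising an error
select cast('X' as int);       -- NULL
```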
Two, DDL data definition
1. Create a database
CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
- Create a database. By default, the database is stored in HDFS at /user/hive/warehouse/*.db.

hive (default)> create database db_hive;
- Add if not exists to avoid an error when the database already exists. (Recommended form)

hive (default)> create database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists
hive (default)> create database if not exists db_hive;
- Create a database and specify its storage location in HDFS

hive (default)> create database db_hive2 location '/db_hive2.db';
2. Query databases
2.1. Show databases
- Show all databases

hive (default)> show databases;
- Filter the displayed databases with a pattern

hive> show databases like 'db_hive*';
2.2. View database details
- Show database information

hive (default)> desc database db_hive;
db_name comment location owner_name owner_type parameters
db_hive hdfs://hadoop102:8020/user/hive/warehouse/db_hive.db moe USER
- Show extended database details

hive (default)> desc database extended db_hive;
2.3. Switch the current database
hive (default)> use db_hive;
3. Modify the database
You can use the ALTER DATABASE command to set key-value pairs in a database's DBPROPERTIES, describing the attribute information of that database.
hive (default)> alter database db_hive set dbproperties('createtime'='20211120');
View the result in Hive
hive (default)> desc database extended db_hive;
db_name comment location owner_name owner_type parameters
db_hive hdfs://hadoop102:8020/user/hive/warehouse/db_hive.db moe USER {createtime=20211120}
4. Delete the database
- Delete an empty database

hive (default)> drop database db_hive;
- If the database to be deleted may not exist, use if exists to avoid an error

hive> drop database db_hive;
FAILED: SemanticException [Error 10072]: Database does not exist: db_hive
hive> drop database if exists db_hive2;
- If the database is not empty, append cascade to force deletion

hive> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
hive> drop database db_hive cascade;
5. Create tables
Table creation syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]
Field description
(1) CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses it.
(2) The EXTERNAL keyword creates an external table. While creating the table, a LOCATION pointing to the actual data can be specified. When an internal (managed) table is dropped, its metadata and data are deleted together; when an external table is dropped, only the metadata is deleted.
(3) COMMENT adds a comment to the table or a column.
(4) PARTITIONED BY creates a partitioned table.
(5) CLUSTERED BY creates a bucketed table.
(6) SORTED BY is rarely used; it sorts one or more columns within each bucket.
(7) ROW FORMAT DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]. Users can customize a SerDe or use the built-in SerDe when creating a table; if neither ROW FORMAT nor ROW FORMAT DELIMITED is specified, the built-in SerDe is used. SerDe stands for Serialize/Deserialize; Hive uses a SerDe to serialize and deserialize row objects and to interpret a table's column data.
(8) STORED AS specifies the storage file type. Common types include SEQUENCEFILE, TEXTFILE, and RCFILE. If the file data is plain text, use STORED AS TEXTFILE; if the data needs to be compressed, use STORED AS SEQUENCEFILE.
(9) LOCATION specifies the storage location of the table in HDFS.
(10) AS is followed by a query statement; the table is created from the query result.
(11) LIKE copies an existing table's structure without copying its data.
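A small table that exercises several of these clauses at once (everything here, from the table name to the HDFS path, is illustrative, not from the original):

```sql
create external table if not exists page_view(
  view_time int comment 'epoch seconds',
  page_url string
)
comment 'raw page views'                        -- (3) table comment
partitioned by (dt string)                      -- (4) partitioned table
row format delimited fields terminated by '\t'  -- (7) built-in delimited SerDe
stored as textfile                              -- (8) plain-text storage
location '/warehouse/page_view'                 -- (9) explicit HDFS location
tblproperties ('creator'='moe');
```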
5.1. Managed tables
- Theory

Tables created by default are so-called managed tables, sometimes called internal tables. For these tables, Hive controls (more or less) the lifecycle of the data. By default, Hive stores the data of these tables in a subdirectory of the directory defined by hive.metastore.warehouse.dir (default /user/hive/warehouse). When we drop a managed table, Hive also deletes the table's data. Managed tables are not suitable for sharing data with other tools.
- Hands-on case
- Raw data

1001	ss1
1002	ss2
1003	ss3
1004	ss4
1005	ss5
1006	ss6
1007	ss7
1008	ss8
1009	ss9
1010	ss10
1011	ss11
1012	ss12
1013	ss13
1014	ss14
1015	ss15
1016	ss16
- Plain table creation

create table if not exists student(
  id int,
  name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/student';
- Create a table from a query result (the query result is inserted into the newly created table)

create table if not exists student2 as select id, name from student;
- Create a table based on an existing table's structure

create table if not exists student3 like student;
- Query the table type

hive (default)> desc formatted student2;
Table Type: MANAGED_TABLE
5.2. External tables
- Theory

Because a table is external, Hive does not assume it owns the data entirely. Dropping the table does not delete the data, although the metadata describing the table is deleted.
- Usage scenarios for managed and external tables

Collected website logs are uploaded to HDFS as text files every day. Statistical analysis is done on top of the external table (the raw log table); intermediate tables and result tables are stored as internal tables, and data enters them via SELECT + INSERT.
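A minimal sketch of that pipeline; all table names, columns, and the HDFS path below are hypothetical:

```sql
-- external table over the raw daily logs (schema is illustrative)
create external table if not exists weblog_ext(
  ip string,
  url string,
  access_time string
)
row format delimited fields terminated by '\t'
location '/warehouse/weblog/raw';

-- managed (internal) result table
create table if not exists pv_result(
  url string,
  pv bigint
);

-- populate the internal table via SELECT + INSERT
insert into table pv_result
select url, count(*) from weblog_ext group by url;
```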
- Hands-on case

Create separate department and employee external tables and import data into them.
- Raw data

dept:
10	ACCOUNTING	1700
20	RESEARCH	1800
30	SALES	1900
40	OPERATIONS	1700
emp:
7369	SMITH	CLERK	7902	1980-12-17	800.00		20
7499	ALLEN	SALESMAN	7698	1981-2-20	1600.00	300.00	30
7521	WARD	SALESMAN	7698	1981-2-22	1250.00	500.00	30
7566	JONES	MANAGER	7839	1981-4-2	2975.00		20
7654	MARTIN	SALESMAN	7698	1981-9-28	1250.00	1400.00	30
7698	BLAKE	MANAGER	7839	1981-5-1	2850.00		30
7782	CLARK	MANAGER	7839	1981-6-9	2450.00		10
7788	SCOTT	ANALYST	7566	1987-4-19	3000.00		20
7839	KING	PRESIDENT		1981-11-17	5000.00		10
7844	TURNER	SALESMAN	7698	1981-9-8	1500.00	0.00	30
7876	ADAMS	CLERK	7788	1987-5-23	1100.00		20
7900	JAMES	CLERK	7698	1981-12-3	950.00		30
7902	FORD	ANALYST	7566	1981-12-3	3000.00		20
7934	MILLER	CLERK	7782	1982-1-23	1300.00		10
- Upload data to HDFS
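The upload commands themselves are not shown in the original; a typical sequence (the local and HDFS paths here are assumptions) runs the HDFS shell from inside the Hive CLI:

```sql
hive (default)> dfs -mkdir -p /company;
hive (default)> dfs -put /opt/module/datas/dept.txt /company;
hive (default)> dfs -put /opt/module/datas/emp.txt /company;
```

If the data is staged in a directory like /company, the create external table statements below would also need a matching location clause; alternatively, load the files with load data after creating the tables.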
- Create table statements: create the external tables

Create the department table:

create external table if not exists dept(
  deptno int,
  dname string,
  loc int
)
row format delimited fields terminated by '\t';
Create the employee table:

create external table if not exists emp(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
row format delimited fields terminated by '\t';
- View the created tables

hive (default)> show tables;
- View the table's formatted information

hive (default)> desc formatted dept;
Table Type: EXTERNAL_TABLE
- Delete the external table

hive (default)> drop table dept;
After the external table is dropped, the data is still in HDFS, but the metadata of dept has been deleted.
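One way to verify this (assuming the default hive.metastore.warehouse.dir, so the table data sits under /user/hive/warehouse/dept):

```sql
-- the table definition is gone...
hive (default)> desc dept;          -- fails: table not found
-- ...but the data files remain in HDFS
hive (default)> dfs -ls /user/hive/warehouse/dept;
```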
5.3. Converting between managed and external tables
- Query the table type

hive (default)> desc formatted student2;
Table Type: MANAGED_TABLE
- Change the internal table student2 into an external table

alter table student2 set tblproperties('EXTERNAL'='TRUE');
- Query the table type

hive (default)> desc formatted student2;
Table Type: EXTERNAL_TABLE
- Change the external table student2 back into an internal table

alter table student2 set tblproperties('EXTERNAL'='FALSE');
- Query the table type

hive (default)> desc formatted student2;
Table Type: MANAGED_TABLE
Note: ('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed, case-sensitive values.
6. Modify the table
6.1. Rename tables
- Syntax

ALTER TABLE table_name RENAME TO new_table_name
- Hands-on case

hive (default)> alter table dept1 rename to dept2;
6.2. Add, modify, and replace column information
- Syntax

Update a column:

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
Add or replace columns:

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
Note: ADD adds new fields after all existing columns (but before any partition columns);
REPLACE replaces all fields in the table.
- Hands-on case
- Query the table structure

hive> desc dept;

- Add a column

hive (default)> alter table dept add columns(deptdesc string);

- Query the table structure

hive> desc dept;

- Update a column

hive (default)> alter table dept change column deptdesc desc string;

- Query the table structure

hive> desc dept;

- Replace the columns

hive (default)> alter table dept replace columns(deptno string, dname string, loc string);

- Query the table structure

hive> desc dept;
7. Delete the table
hive (default)> drop table dept;
Three, related links
Big data Hive learning journey 1