“This is the 21st day of my participation in the Gwen Challenge in November. See details of the event: The Last Gwen Challenge 2021”.
One, Hive data types
1. Basic data types
The Hive STRING type is equivalent to the database VARCHAR type: a variable-length string, except that it does not declare a maximum number of characters. It can theoretically store up to 2 GB of characters.
2. Set data types
Hive supports three complex data types: ARRAY, MAP, and STRUCT. ARRAY and MAP are similar to ARRAY and MAP in Java, while STRUCT is similar to a C struct in that it encapsulates a collection of named fields. Complex data types allow arbitrary levels of nesting.
Practical case:
1) Suppose a table has the following row; we use JSON to represent its data structure. In Hive, the format is:
{
    "name": "songsong",
    "friends": ["bingbing", "lili"],
    "children": {
        "xiao song": 18,
        "xiaoxiao song": 19
    },
    "address": {
        "street": "hui long guan",
        "city": "beijing"
    }
}
2) Based on the above data structure, we create the corresponding table in Hive and import the data.
Create a local test file named test.txt
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
Note that the delimiters between elements in MAP, STRUCT, and ARRAY can all be the same character; here it is "_".
3) Create the test table on Hive
create table test(
name string,
friends array<string>,
children map<string, int>,
address struct<street:string, city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
Field description:
row format delimited fields terminated by ',' -- column delimiter
collection items terminated by '_' -- delimiter for elements inside MAP, STRUCT, and ARRAY
map keys terminated by ':' -- delimiter between key and value in a MAP
lines terminated by '\n'; -- line delimiter
4) Import text data into the test table
hive (default)> load data local inpath '/opt/module/hive-3.1.2/datas/test.txt' into table test;
5) Access the data in the three collection columns. The following shows the access syntax for ARRAY, MAP, and STRUCT respectively.
hive (default)> select friends[1], children['xiao song'], address.city from test where name="songsong";
OK
_c0	_c1	city
lili	18	beijing
Time taken: 0.076 seconds, Fetched: 1 row(s)
3. Type conversion
The atomic data types of Hive can be converted implicitly, similar to Java type conversion. For example, if an expression expects the INT type, a TINYINT is automatically converted to INT. Hive does not perform the reverse conversion: an INT is not automatically converted to TINYINT, and such an expression raises an error unless CAST is used.
1) Implicit type conversion rules are as follows
(1) Any integer type can be implicitly converted to a wider range of types, such as TINYINT to INT and INT to BIGINT.
(2) All integer types, floats, and strings can be implicitly converted to DOUBLE.
(3) TINYINT, SMALLINT, INT can be converted to FLOAT.
(4) BOOLEAN types cannot be converted to any other type.
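As a quick illustration of rule (2), a string and an integer meeting in an arithmetic expression are both widened to DOUBLE (a sketch of standard Hive behavior; run it in your own CLI to confirm):

```sql
-- '1' (STRING) and 2 (INT) are both implicitly widened to DOUBLE,
-- so the result should be 3.0, not the integer 3
select '1' + 2;
```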
2) Data type conversion can also be performed explicitly with the CAST operator
For example, CAST('1' AS INT) converts the string '1' to the integer 1; if a CAST fails, such as CAST('X' AS INT), the expression returns NULL.
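Both behaviors from the paragraph above can be checked directly in the Hive CLI:

```sql
-- a successful cast: the string '1' becomes the integer 1
select cast('1' as int) + 2;   -- 3

-- a failed cast returns NULL instead of raising an error
select cast('X' as int);       -- NULL
```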
Two, DDL data definition
1. Create a database
CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
- Create a database. By default, the database is stored in HDFS at /user/hive/warehouse/*.db.

hive (default)> create database db_hive;
- Add if not exists to avoid an error when the database already exists. (Recommended form)

hive (default)> create database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists
hive (default)> create database if not exists db_hive;
- Create a database and specify its storage location in HDFS

hive (default)> create database db_hive2 location '/db_hive2.db';
2. Query databases
2.1. Show databases
- Show all databases

hive (default)> show databases;
- Filter the displayed databases with a pattern

hive> show databases like 'db_hive*';
2.2. View database details
- Show database information

hive (default)> desc database db_hive;
db_name comment location owner_name owner_type parameters
db_hive hdfs://hadoop102:8020/user/hive/warehouse/db_hive.db moe USER
- Show extended database details

hive (default)> desc database extended db_hive;
2.3. Switch the current database
hive (default)> use db_hive;
3. Modify the database
You can use the ALTER DATABASE command to set key-value pairs in a database's DBPROPERTIES, describing the attribute information of that database.
hive (default)> alter database db_hive set dbproperties('createtime'='20211120');
View the result in Hive
hive (default)> desc database extended db_hive;
db_name comment location owner_name owner_type parameters
db_hive hdfs://hadoop102:8020/user/hive/warehouse/db_hive.db moe USER {createtime=20211120}
4. Delete the database
- Delete an empty database

hive (default)> drop database db_hive;
- If the database to be deleted may not exist, use if exists to avoid an error

hive> drop database db_hive;
FAILED: SemanticException [Error 10072]: Database does not exist: db_hive
hive> drop database if exists db_hive2;
- If the database is not empty, append cascade to force deletion

hive> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
hive> drop database db_hive cascade;
5. Create tables
Table creation syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]
Field description
(1) CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses it.
(2) The EXTERNAL keyword creates an external table. While creating the table, a LOCATION pointing to the actual data can be specified. When an internal (managed) table is dropped, its metadata and data are deleted together; when an external table is dropped, only the metadata is deleted.
(3) COMMENT adds a comment to the table or a column.
(4) PARTITIONED BY creates a partitioned table.
(5) CLUSTERED BY creates a bucketed table.
(6) SORTED BY is rarely used; it sorts one or more columns within each bucket.
(7) ROW FORMAT DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]. Users can customize a SerDe or use the built-in SerDe when creating a table; if neither ROW FORMAT nor ROW FORMAT DELIMITED is specified, the built-in SerDe is used. SerDe stands for Serialize/Deserialize; Hive uses a SerDe to serialize and deserialize row objects and to interpret a table's column data.
(8) STORED AS specifies the storage file type. Common types include SEQUENCEFILE, TEXTFILE, and RCFILE. If the file data is plain text, use STORED AS TEXTFILE; if the data needs to be compressed, use STORED AS SEQUENCEFILE.
(9) LOCATION specifies the storage location of the table in HDFS.
(10) AS is followed by a query statement; the table is created from the query result.
(11) LIKE copies an existing table's structure without copying its data.
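A small table that exercises several of these clauses at once (everything here, from the table name to the HDFS path, is illustrative, not from the original):

```sql
create external table if not exists page_view(
  view_time int comment 'epoch seconds',
  page_url string
)
comment 'raw page views'                        -- (3) table comment
partitioned by (dt string)                      -- (4) partitioned table
row format delimited fields terminated by '\t'  -- (7) built-in delimited SerDe
stored as textfile                              -- (8) plain-text storage
location '/warehouse/page_view'                 -- (9) explicit HDFS location
tblproperties ('creator'='moe');
```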
5.1. Managed tables
- Theory

Tables created by default are so-called managed tables, sometimes called internal tables. For these tables, Hive controls (more or less) the lifecycle of the data. By default, Hive stores the data of these tables in a subdirectory of the directory defined by hive.metastore.warehouse.dir (default /user/hive/warehouse). When we drop a managed table, Hive also deletes the table's data. Managed tables are not suitable for sharing data with other tools.
- Hands-on case
- Raw data

1001	ss1
1002	ss2
1003	ss3
1004	ss4
1005	ss5
1006	ss6
1007	ss7
1008	ss8
1009	ss9
1010	ss10
1011	ss11
1012	ss12
1013	ss13
1014	ss14
1015	ss15
1016	ss16
- Plain table creation

create table if not exists student(
  id int,
  name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/student';
- Create a table from a query result (the query result is inserted into the newly created table)

create table if not exists student2 as select id, name from student;
- Create a table based on an existing table's structure

create table if not exists student3 like student;
- Query the table type

hive (default)> desc formatted student2;
Table Type: MANAGED_TABLE
5.2. External tables
- Theory

Because a table is external, Hive does not assume it owns the data entirely. Dropping the table does not delete the data, although the metadata describing the table is deleted.
- Usage scenarios for managed and external tables

Collected website logs are uploaded to HDFS as text files every day. Statistical analysis is done on top of the external table (the raw log table); intermediate tables and result tables are stored as internal tables, and data enters them via SELECT + INSERT.
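A minimal sketch of that pipeline; all table names, columns, and the HDFS path below are hypothetical:

```sql
-- external table over the raw daily logs (schema is illustrative)
create external table if not exists weblog_ext(
  ip string,
  url string,
  access_time string
)
row format delimited fields terminated by '\t'
location '/warehouse/weblog/raw';

-- managed (internal) result table
create table if not exists pv_result(
  url string,
  pv bigint
);

-- populate the internal table via SELECT + INSERT
insert into table pv_result
select url, count(*) from weblog_ext group by url;
```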
- Hands-on case

Create separate department and employee external tables and import data into them.
- Raw data

dept:
10	ACCOUNTING	1700
20	RESEARCH	1800
30	SALES	1900
40	OPERATIONS	1700
emp:
7369	SMITH	CLERK	7902	1980-12-17	800.00		20
7499	ALLEN	SALESMAN	7698	1981-2-20	1600.00	300.00	30
7521	WARD	SALESMAN	7698	1981-2-22	1250.00	500.00	30
7566	JONES	MANAGER	7839	1981-4-2	2975.00		20
7654	MARTIN	SALESMAN	7698	1981-9-28	1250.00	1400.00	30
7698	BLAKE	MANAGER	7839	1981-5-1	2850.00		30
7782	CLARK	MANAGER	7839	1981-6-9	2450.00		10
7788	SCOTT	ANALYST	7566	1987-4-19	3000.00		20
7839	KING	PRESIDENT		1981-11-17	5000.00		10
7844	TURNER	SALESMAN	7698	1981-9-8	1500.00	0.00	30
7876	ADAMS	CLERK	7788	1987-5-23	1100.00		20
7900	JAMES	CLERK	7698	1981-12-3	950.00		30
7902	FORD	ANALYST	7566	1981-12-3	3000.00		20
7934	MILLER	CLERK	7782	1982-1-23	1300.00		10
- Upload data to HDFS
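The upload commands themselves are not shown in the original; a typical sequence (the local and HDFS paths here are assumptions) runs the HDFS shell from inside the Hive CLI:

```sql
hive (default)> dfs -mkdir -p /company;
hive (default)> dfs -put /opt/module/datas/dept.txt /company;
hive (default)> dfs -put /opt/module/datas/emp.txt /company;
```

If the data is staged in a directory like /company, the create external table statements below would also need a matching location clause; alternatively, load the files with load data after creating the tables.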
- Create table statements: create the external tables

Create the department table:

create external table if not exists dept(
  deptno int,
  dname string,
  loc int
)
row format delimited fields terminated by '\t';
Create the employee table:

create external table if not exists emp(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
row format delimited fields terminated by '\t';
- View the created tables

hive (default)> show tables;
- View the table's formatted information

hive (default)> desc formatted dept;
Table Type: EXTERNAL_TABLE
- Delete the external table

hive (default)> drop table dept;
After the external table is dropped, the data is still in HDFS, but the metadata of dept has been deleted.
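One way to verify this (assuming the default hive.metastore.warehouse.dir, so the table data sits under /user/hive/warehouse/dept):

```sql
-- the table definition is gone...
hive (default)> desc dept;          -- fails: table not found
-- ...but the data files remain in HDFS
hive (default)> dfs -ls /user/hive/warehouse/dept;
```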
5.3. Converting between managed and external tables
- Query the table type

hive (default)> desc formatted student2;
Table Type: MANAGED_TABLE
- Change the internal table student2 into an external table

alter table student2 set tblproperties('EXTERNAL'='TRUE');
- Query the table type

hive (default)> desc formatted student2;
Table Type: EXTERNAL_TABLE
- Change the external table student2 back into an internal table

alter table student2 set tblproperties('EXTERNAL'='FALSE');
- Query the table type

hive (default)> desc formatted student2;
Table Type: MANAGED_TABLE
Note: ('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed, case-sensitive values.
6. Modify the table
6.1. Rename tables
- Syntax

ALTER TABLE table_name RENAME TO new_table_name
- Hands-on case

hive (default)> alter table dept1 rename to dept2;
6.2. Add, modify, and replace column information
- Syntax

Update a column:

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
Add or replace columns:

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
Note: ADD adds new fields after all existing columns (but before any partition columns);
REPLACE replaces all fields in the table.
- Hands-on case
- Query the table structure

hive> desc dept;

- Add a column

hive (default)> alter table dept add columns(deptdesc string);

- Query the table structure

hive> desc dept;

- Update a column

hive (default)> alter table dept change column deptdesc desc string;

- Query the table structure

hive> desc dept;

- Replace the columns

hive (default)> alter table dept replace columns(deptno string, dname string, loc string);

- Query the table structure

hive> desc dept;
7. Delete the table
hive (default)> drop table dept;
Three, related links
Big data Hive learning journey 1