Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

Hive is a data warehouse tool based on Hadoop. Hive maps structured data files into a table and provides SQL-like query functions

Hive document address: cwiki.apache.org/confluence/…

table

Hive tables have the following types of tables

  • MANAGED_TABLE internal table
  • EXTERNAL_TABLE external tables

Table application scenarios

MANAGED_TABLE

  • Also known as internal tables tables that are managed by Hive. Data and metadata are all managed by Hive
  • When a table is dropped, both the table structure and data are deleted

EXTERNAL_TABLE

  • External table data is also shared to external hive applications, such as Spark and Flink
  • Data protection: Only the table structure is deleted when a table is deleted, and data is stored in the HDFS

Table creation

Create a table, and specify the row separator as comma, if not delimited, default to a SOH, and then all data together, there is no way to further analysis

create table  test7(id int,name string) row format delimited fields terminated by ', ';

insert into test7 values(10001."zhangsan")
Copy the code

View table structure

desc formatted test7;
Copy the code

Modify the name of the table

An error is reported if the new table name exists

alter table test1 rename to test2;
Copy the code

Modifying column information

Change the column name. The column name type cannot be forcibly converted and can only be upgraded. For example, a string cannot be changed to an int

Alter table test2 change id new_id int; Alter table test2 add columns(email string); Alter table test2 replace columns(ID string,gender string) alter table test2 replace columns(ID string,gender string)Copy the code

Table type change

The keyword is case sensitive and must be uppercase

Change the internal table to the external table

alter table table_name set tblproperties('EXTERNAL'='TRUE')
Copy the code

Change the external table to the internal table

alter table table_name set tblproperties('EXTERNAL'='FALSE')
Copy the code

Data manipulation

Load the data

Load the data when the table is created, or import the data after the table is built

The data content needs to conform to the table structure

Load Data Loads data
Load data inpath from HDFS to hive. Load data inpath from HDFS to hive'/test/dmp.txt'into table test1; Load data local inpath load data local inpath'/root/dmp.txt' into table test1;
Copy the code
Import data from other tables
insert into table test7 select * from test1;
Copy the code
Specify load data when creating the table
create external table if not exists test10(id int,name string)
for format delimited fields terminated by '\t'
location '/test/data10.txt'
Copy the code
Load data from another table when creating a table
create table test11 as select id,name from test12;
Copy the code

Export data

Exporting data requires few operations but loading more data

To export generally according to the query conditions, write to the local, because the export is all the export

Export the query results to the local PC
insert overwirte local directory '/tmp/test2.txt' select * from test2;
Copy the code
The query results are formatted and exported to the local PC, separated by Spaces
insert overwirte local directory '/tmp/test2.txt' 
FOR FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
select * from test2;
Copy the code
Send the query results to HDFS and separate them with commas
insert overwirte  directory '/tmp/test2.txt' 
FOR FORMAT DELIMITED FIELDS TERMINATED BY ', ' 
select * from test2;
Copy the code
Import and export, data migration
export table default.test7 to '/user/hive/warehouse/export/test7'
import table default.test7 from '/user/hive/warehouse/export/test7'
Copy the code

Clear data and drop tables

# display desc formatted table_name; Truncate table table_name; # remove external table data hadoop fs - rm -r -f/user/hive/warehouse/test. The db/talbe_name # delete table (internal table will delete all data and table structure, Alter table table_name drop partition(load_date= load_date'2019-01-01')
Copy the code

The partition table

A partition table corresponds to an independent folder on an HDFS file system. Hive partitions are subdirectories that divide a large amount of data into small folders and store them in a directory

Dynamic partitioning

The advantage of dynamic partitioning is that it is not necessary to add a partition each time data is written, but it is handled automatically. If you partition by day, and write data every day, you have to partition beforehand, but the number of tables is too large, this is really inefficient

Cwiki.apache.org/confluence/…

Create a partitioned table

create table if not exists test5(name string) partitioned by(age int) row format delimited fields terminated by ' ' lines terminated by '\n';
Copy the code

Data import partition table must first execute the following statements in Hive.

set hive.exec.dynamic.partition=true; 
set hive.exec.dynamic.partition.mode=nonstrict;
Copy the code

Data can be imported into a partitioned table in static or dynamic partitioning modes

Insert into table test5 partition(age= 125) select name fromtest6; Insert overwrite table test5 partition select name,agefrom test6;
Copy the code

Partition by day example (static partition)

---------txt110 ACCOUNTING 1700
20 RESEARCH   1800
---------txt2
30 SALES 1900
40 OPERATIONS 1700
---------txt3
50 TEST 2000
60 DEV  1900
Copy the code
Create table dept_PAR (deptno int,dname string, loc string) partitioned by (day string) row format delimited fields terminated by'\t'; Load data local inpath'/opt/dept1.txt' into table dept_par partition(day='the 20221-09-01'); Select * from single partitionfrom dept_par where day='2021-09-01'; # select * from 'select *'from dept_par where day='2021-09-01'
union
select * from dept_par where day='2021-09-02'
union
select * from dept_par where day='2021-09-03';
----
select * from dept_par where day='2021-09-01' or day='2021-09-02' or day='2021-09-03'; Alter table dept_par add partition(day='2021-09-04'); Alter table dept_PAR DROP partition(day='2021-09-04'Show partitions dept_PAR desc fromatted dept_PAR desc fromatted dept_PAR Create table dept_par2(deptno int,dname string, loc string) partitioned by (day string,hour string) row format delimited fields terminated by'\t';
Copy the code