Welcome to my GitHub

Github.com/zq2599/blog…

Content: all original article classification summary and supporting source code, involving Java, Docker, Kubernetes, DevOPS, etc.;

Hive Learning Notes series navigation

  1. Basic data types
  2. Complex data types
  3. Internal and external tables
  4. The partition table
  5. Points barrels
  6. HiveQL basis
  7. Built-in function
  8. Sqoop
  9. Based the UDF
  10. User-defined aggregation functions (UDAF)
  11. UDTF

This paper gives an overview of

  • In this article, you will learn about various types of tables and their features.
  1. To build libraries
  2. Internal tables (also called administrative tables or temporary tables)
  3. External tables
  4. The operation of the table

Next, start with the most basic library building

To build libraries

  1. Create a database named test (created only if it does not exist) and add remarks to test database:
create database if not exists test 
comment 'this is a database for test';
Copy the code
  1. View database list (name fuzzy matching) :
hive> show databases like 't*';
OK
test
test001
Time taken: 0.016 seconds, Fetched: 2 row(s)
Copy the code
  1. Describe database command to view this database information:
hive> describe database test;
OK
test	this is a database for test	hdfs://node0:8020/user/hive/warehouse/test.db	hadoop	USER	
Time taken: 0.035 seconds, Fetched: 1 row(s)
Copy the code
  1. The above command, the test database in the storage location on the HDFS is HDFS: / / node0:8020 / user/hive/warehouse/test. The db, open the hadoop web page, view the HDFS directory, as the chart, the path of the folder is created, And it ends in.db:

5. The new database folder is under /user/hive/warehouse, which is configured in the following figure:

If exists, do not return an Error if a database does not exist:

hive> drop database if exists test;
OK
Time taken: 0.193 seconds
Copy the code

Above is the common library related operations, next practice table related operations;

The inner table

  1. According to the life cycle of table data, tables can be divided into internal tables and external tables.
  2. Internal tables are also called management tables or temporary tables. The life cycle of this type of table is controlled by hive. By default, data is stored in /user/hive/warehouse/.
  3. Data is deleted when a table is dropped.
  4. The tables created in the previous two articles were all internal tables:
create table t6(id int, name string)
row format delimited 
fields terminated by ', ';
Copy the code
  1. Add a new record to table t6:
insert into t6 values (101.'a101');
Copy the code
  1. Run the hadoop command to check HDFS. It can be seen that table T6 has a corresponding folder in which the file stores the table data:
[hadoop@node0 bin]$ ./hadoop fs -ls /user/hive/warehouse/t6
Found 1 items
-rwxr-xr-x   3 hadoop supergroup          9 2020-10-31 11:14 /user/hive/warehouse/t6/000000_0
Copy the code
  1. 000000_0 = 000000_0 = 000000_0
[hadoop@node0 bin]$ ./hadoop fs -cat /user/hive/warehouse/t6/000000_0
101	a101
Copy the code
  1. Run drop table t6. Alter table t6 alter table t6 alter table t6 alter table t6 alter table t6
[hadoop@node0 bin]$ ./hadoop fs -ls /user/hive/warehouse/
Found 5 items
drwxr-xr-x   - hadoop supergroup          0 2020-10-27 20:42 /user/hive/warehouse/t1
drwxr-xr-x   - hadoop supergroup          0 2020-10-29 00:13 /user/hive/warehouse/t2
drwxr-xr-x   - hadoop supergroup          0 2020-10-29 00:14 /user/hive/warehouse/t3
drwxr-xr-x   - hadoop supergroup          0 2020-10-29 13:04 /user/hive/warehouse/t4
drwxr-xr-x   - hadoop supergroup          0 2020-10-29 16:47 /user/hive/warehouse/t5
Copy the code

External tables

  1. SQL > create table external;
  2. The data life cycle of external tables is not controlled by Hive.
  3. [Fixed] External table drop does not delete data
  4. External table data can be shared as a data source for multiple external tables at the same time.
  5. Let’s get started. Here are the building sentences:
create external table t7(id int, name string)
row format delimited 
fields terminated by ', '
location '/data/external_t7';
Copy the code
  1. Check the HDFS file. The directory /data/external_t7/ has been created:
[hadoop@node0 bin]$ ./hadoop fs -ls /data/
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2020-10-31 12:02 /data/external_t7
Copy the code
  1. Add a new record:
insert into t7 values (107.'a107');
Copy the code
  1. Check the data file corresponding to table T7 in HDFS, you can see the new content:
[hadoop@node0 bin]$ ./hadoop fs -ls /data/external_t7
Found 1 items
-rwxr-xr-x   3 hadoop supergroup          9 2020-10-31 12:06 /data/external_t7/000000_0
[hadoop@node0 bin]$ ./hadoop fs -cat /data/external_t7/000000_0
107,a107
Copy the code
  1. Alter table t8 create table t8; alter table t8 create table t8;
create external table t8(id_t8 int, name_t8 string)
row format delimited 
fields terminated by ', '
location '/data/external_t7';
Copy the code
  1. Mysql > alter TABLE T8 create table t7;
hive> select * from t8;
OK
107	a107
Time taken: 0.068 seconds, Fetched: 1 row(s)
hive> select * from t7;
OK
107	a107
Time taken: 0.074 seconds, Fetched: 1 row(s)
Copy the code
  1. Alter table t7 alter table T8 alter table T7 alter table T8
hive> drop table t7;
OK
Time taken: 1.053 seconds
hive> select * from t8;
OK
107	a107
Time taken: 0.073 seconds, Fetched: 1 row(s)
Copy the code
  1. Delete table t8 as well, and then look at the datafile as follows:
[hadoop@node0 bin]$ ./hadoop fs -cat /data/external_t7/000000_0
107,a107
Copy the code
  1. It can be seen that the data of the external table will not be deleted when the table is deleted. Therefore, in the development of the actual production business system, the external table is our main application table type.

The operation of the table

  1. Create table T8 again:
create table t8(id int, name string)
row format delimited 
fields terminated by ', ';
Copy the code
  1. Alter table name:
alter table t8 rename to t8_1;
Copy the code
  1. Alter table name change effect:
hive> alter table t8 rename to t8_1; OK Time taken: 0.473 seconds hive> show tables; OK allType t1 t2 t3 t4 t5 t6 t8_1 values__tmp__table__1 values__tmp__table__2 Time taken: 0.029 seconds, touch_type: 10 row(s)Copy the code
  1. Add field:
alter table t8_1 add columns(remark string);
Copy the code

SQL > select * from table_name;

hive> desc t8_1;
OK
id                  	int                 	                    
name                	string              	                    
remark              	string              	                    
Time taken: 0.217 seconds, Fetched: 3 row(s)
Copy the code

Now that we have a basic understanding of internal and external tables, the next article will look at another common table class: partitioned tables;

You are not alone, Xinchen original accompany all the way

  1. Java series
  2. Spring series
  3. The Docker series
  4. Kubernetes series
  5. Database + middleware series
  6. The conversation series

Welcome to pay attention to the public number: programmer Xin Chen

Wechat search “programmer Xin Chen”, I am Xin Chen, looking forward to enjoying the Java world with you…

Github.com/zq2599/blog…