1. Data types
1.1 Basic data types
type | HIVE | MySQL | JAVA | The length of the | example |
---|---|---|---|---|---|
The integer | TINYINT | TINYINT | byte | 1-byte integer | 2 |
SMALINT | SMALINT | short | 2-byte integer | 20 | |
INT/INTEGER | INT | int | 4-byte integer | 20 | |
BIGINT | BIGINT | long | 8-byte integer | 20 | |
String type | STRING | VARCHAR | string | String type | “Hello world” |
VARCHAR | VARCHAR | string | String type | ||
CHAR | VARCHAR | string | String type | ||
floating-point | FLOAT | FLOAT | float | Single-precision floating point number | 3.14159 |
DOUBLE | DOUBLE | double | A double – precision floating – point number | 3.14159 | |
Time to type | TIMESTAMP | TIMESTAMP | Time to type | ||
DATE | DATE | DATE | Time to type | ||
Other types of | BINARY | BINARY | binary | ||
BOOLEAN | There is no | boolean | Boolean type | TRUE/FALSE |
1.2 Collection data types
The data type | describe | Syntax examples |
---|---|---|
MAP | MAP<primitive_type, data_type> | The map (” key1 “, 1) |
ARRAY | ARRAY<data_type> | Array (” aaa “, “BBB”, “CCC”) |
STRUCT | STRUCT<col_name : data_type1, data_type2,data_type3> | Struct (” a “, 1, 2, 3) |
Hive arrays and maps are similar to Arrays and maps in Java, and structs are similar to STRUCts in C. They encapsulate a collection of named fields. Complex data types can be nested at any level.
Declarations of complex data types must use < Angle brackets > to indicate the type of the data fields in them:
create table t_eg(
col1 ARRAY<INT>,
col2 MAP<STRING,INT>,
col3 STRUCT<a:STRING,b:INT,c:DOUBLE>);
Copy the code
We use JSON format to represent the data structure:
{" name ":" jack ", "friends" : [" Sam ", "van"], / / list Array, "children" : {/ / key/value Map, "chale" : 18, "davil" : 19} "address" : Struct, "street": "Beijing road ", "phone": 13631230887, "money": 246.4}}Copy the code
1.3 Type Conversion
Hive atomic data types can be converted implicitly. For example, if an expression uses an INT, TINYINT is automatically converted to an INT.
However, Hive does not perform the reverse conversion unless the CAST operation is used.
1.3.1 Implicit Type Conversion rules
- Any integer type can be implicitly converted to a wider range of integer types, such as TINYINT — >INT, INT — >BIGINT.
- All integer types, floats, and strings can be implicitly converted to DOUBLE.
- TINYINT, SMALLINT, INT can all be converted to FLOAT.
- BOOLEAN types cannot be converted to any other type.
1.3.2 Use CAST to display conversion data
CAST(‘1’ AS INT), will convert the string ‘1’ to the integer 1
2. Common DDL operations
2.1 the Database
2.1.1 Viewing the Data List
show databases;
Copy the code
2.1.2 Using the Database
USE database_name;
Copy the code
2.1.3 Creating a Database
Grammar:
The CREATE DATABASE | SCHEMA [IF NOT EXISTS] database_name [COMMENT database_comment] - DATABASE annotation [LOCATION hdfs_path] - stored in the HDFS [WITH DBPROPERTIES (property_name=property_value,...)] ; -- Specify additional attributesCopy the code
Example:
CREATE DATABASE IF NOT EXISTS hive_test COMMENT 'Hive DATABASE for test' WITH DBPROPERTIES (' CREATE '=' DBPROPERTIES ');Copy the code
2.1.4 Viewing database Information
Grammar:
DESC DATABASE [EXTENDED] database_name; EXTENDED indicates whether additional attributes are displayedCopy the code
Example:
DESC DATABASE EXTENDED hive_test;
Copy the code
2.1.5 Deleting a Database
Grammar:
DROP DATABASE|SCHEMA [IF EXISTS] database_name RESTRICT|CASCADE; The default behavior is RESTRICT. If a table exists in the database, the drop fails. To drop a library and its tables, use CASCADE.Copy the code
Example:
DROP DATABASE IF EXISTS hive_test CASCADE;
Copy the code
2.2 create a table
2.2.1 internal table
CREATE TABLE emp(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL(7,2), deptno INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";Copy the code
2.2.2 the external table
CREATE EXTERNAL TABLE emp_external(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL(7,2), deptno INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" LOCATION '/hive/emp_external';Copy the code
2.2.3 the partition table
CREATE EXTERNAL TABLE emp_partition( empno INT, ename STRING, job STRING, mgr INT, hiredate TIMESTAMP, Sal DECIMAL (7, 2), Comm DECIMAL(7,2)) -- PARTITIONED BY (deptno INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" LOCATION '/hive/emp_partition';Copy the code
2.2.4 barrel table
CREATE EXTERNAL TABLE emp_bucket(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL (7, 2), Deptno INT) - Hash the employee numbers INTO four BUCKETS CLUSTERED BY(empno) SORTED BY(empno ASC) INTO 4 BUCKETS ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" LOCATION '/hive/emp_bucket';Copy the code
2.2.5 tilt table
By specifying one or more frequently skewed values for columns, Hive automatically splits the data involving those values into separate files.
When querying, if skew values are involved, it gets the data directly from individual files, rather than scanning all files, which improves performance.
But do not specify the number of skewed tables. But do not specify the number of skewed tables. Comm DECIMAL(7,2)) -- specify the SKEWED value 66,88,100 but BY (empno) ON (66,88,100) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" LOCATION '/hive/emp_skewed';Copy the code
2.2.6 temporary table
The temporary table is visible only to the current session. The data in the temporary table is stored in the user’s temporary directory and deleted after the session ends.
If the temporary table has the same name as the permanent table, any reference to the table name resolves to a temporary table, not a permanent table.
Temporary tables also have the following two limitations:
- Partitioned columns are not supported;
- Index creation is not supported.
CREATE TEMPORARY TABLE emp_temp(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL(7,2)) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";Copy the code
2.2.7 CTAS Creating a table
Create a table from the result of the query:
CREATE TABLE emp_copy AS SELECT * FROM emp WHERE deptno='20';
Copy the code
2.2.8 Copying the Table Structure
Grammar:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] target_table_name -- CREATE the TABLE name LIKE src_table_name -- The TABLE name of the replicated TABLE [LOCATION hdfs_path]; -- Storage locationCopy the code
Example:
CREATE TEMPORARY EXTERNAL TABLE IF NOT EXISTS emp_co LIKE emp
Copy the code
2.2.9 Loading Data to a table
Load local data into a table:
load data local inpath "D:/project/emp.txt" into table emp;
Copy the code
The contents of emp.txt:
7369 SMITH CLERK 7902 1980-12-17 00:00:00 800.00 20
7499 ALLEN SALESMAN 7698 1981-02-20 00:00:00 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-02-22 00:00:00 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-04-02 00:00:00 2975.00 20
7654 MARTIN SALESMAN 7698 1981-09-28 00:00:00 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-05-01 00:00:00 2850.00 30
7782 CLARK MANAGER 7839 1981-06-09 00:00:00 2450.00 10
7788 SCOTT ANALYST 7566 1987-04-19 00:00:00 1500.00 20
7839 KING PRESIDENT 1981-11-17 00:00:00 5000.00 10
7844 TURNER SALESMAN 7698 1981-09-08 00:00:00 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-05-23 00:00:00 1100.00 20
7900 JAMES CLERK 7698 1981-12-03 00:00:00 950.00 30
7902 FORD ANALYST 7566 1981-12-03 00:00:00 3000.00 20
7934 MILLER CLERK 7782 1982-01-23 00:00:00 1300.00 10
Copy the code
2.3 modify the table
2.3.1 Renaming a Table
Grammar:
ALTER TABLE table_name RENAME TO new_table_name;
Copy the code
Example:
ALTER TABLE emp_temp RENAME TO new_emp; Rename the emp_temp table to new_empCopy the code
2.3.2 modify column
Grammar:
ALTER TABLE table_name CHANGE
[COLUMN]col_old_name [COLUMN]col_new_name column_type [COMMENT]col_comment;
Copy the code
Example:
ALTER TABLE emp_temp CHANGE EMPno EMPno_new INT; ALTER TABLE emp_temp CHANGE sal sal_new decimal(7,2) AFTER ename; ALTER TABLE emp_temp CHANGE MGR mgr_new INT COMMENT 'this is column MGR ';Copy the code
2.3.3 the new column
Example:
ALTER TABLE emp_temp ADD COLUMNS (address STRING COMMENT 'home address');
Copy the code
2.4 Clearing or deleting a table
Against 2.4.1 empty table
Grammar:
TRUNCATE TABLE table_name [PARTITION (PARTItion_column = PARTItion_col_value, ...)] ;Copy the code
Example:
TRUNCATE TABLE emp_mgt_ptn PARTITION (deptno=20);
Copy the code
Delete table 2.4.2
Grammar:
DROP TABLE [IF EXISTS] table_name [PURGE];
Copy the code
- Internal table: the metadata of the table and data in the HDFS are deleted.
- External table: only metadata of the table is deleted, but data in the HDFS is not deleted.
2.5 Other Commands
2.5.1 the Describe
View database:
Desc DATABASE [EXTENDED] db_name; -- Whether EXTENDED shows additional attributesCopy the code
See table:
Desc [EXTENDED|FORMATTED] table_name; Display the table details in a friendly mannerCopy the code
2.5.2 Show
Viewing the database list
Grammar:
SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];
Copy the code
Example:
SHOW DATABASES like 'hive*';
Copy the code
View the list of tables
Grammar:
SHOW TABLES [IN database_name] ['identifier_with_wildcards'];
Copy the code
Example:
SHOW TABLES IN default;
Copy the code
View the partition list for the table
SHOW PARTITIONS table_name;
Copy the code
View the table/view creation statement
SHOW CREATE TABLE (table_name|view_name);
Copy the code
3. Common DML operations
3.1 Loading file data into tables
Grammar:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
Copy the code
-
The LOCAL keyword indicates that the file is loaded from the LOCAL file system. If omitted, the file is loaded from HDFS.
-
Filepath can be either a filepath (in which case Hive moves files to a table) or a directory path (in which case Hive moves all files in that directory to a table). When loading a file from a local file system, filepath can be an absolute path or a relative path (absolute path is recommended). When loading a file from HDFS, filepath is the complete URL of the file: Such as HDFS: / / the namenode: port/user/hive/project/data1
-
If the OVERWRITE keyword is used, the contents of the target table (or partition) are deleted and populated with new data; If this keyword is not used, it is appended.
-
The loading target can be a table or partition. If it is a partitioned table, you must specify the partition to load the data.
-
The format of the loading file must be the same AS that used in STORED AS when the table was built.
Example:
CREATE TABLE emp_ptn(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL(7,2) deptno INT) PARTITIONED BY (deptno INT) -- partition BY department number ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";Copy the code
Load data from HDFS to partition table:
LOAD DATA INPATH "hdfs://hadoop001:8020/mydir/emp.txt" OVERWRITE INTO TABLE emp_ptn PARTITION (deptno=20);
Copy the code
Select * from deptno; select * from deptno;
3.2 Inserting query results into a table
Presentation data preparation:
CREATE TABLE emp(empno INT, ename STRING, job STRING, MGR INT, hiredate TIMESTAMP, sal DECIMAL(7,2), Comm DECIMAL(7,2), deptno INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"; Load data local inpath "/usr/file/emp. TXT "into table emp;Copy the code
3.2.1 Static Partition
Grammar:
INSERT OVERWRITE/INTO TABLE table_name PARTITION (partcol1[=val1], partcol2[=val2] ...) select statement FROM from_statement;
Copy the code
Partition column names are required when inserting data into partitioned tables. Having partitioned column values is called static partitioning.
Example:
INSERT OVERWRITE TABLE emp_ptn PARTITION (deptno=20) SELECT empno,ename,job,mgr,hiredate,sal,comm FROM emp WHERE deptno=20;
Copy the code
3.2.2 Dynamic Partitioning
Grammar:
INSERT OVERWRITE/INTO TABLE table_name PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
Copy the code
Partition column names are required when inserting data into partitioned tables. The absence of partitioned column values is called dynamic partitioning. Dynamically partitioned columns must be specified last in the column of the SELECT statement and in the same order as they appear in the PARTITION() clause.
Example:
INSERT OVERWRITE TABLE emp_ptn PARTITION (deptno) SELECT empno,ename,job,mgr,hiredate,sal,comm,deptno FROM emp WHERE deptno=30;
Copy the code
3.3 Inserting Data using SQL Statements
Grammar:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES ( value... ] )Copy the code
3.4 Updating and Deleting Data
Grammar:
UPDATE tablename SET column = value [column = value...] [WHERE expression] -- DELETE DELETE FROM tablename [WHERE expression]Copy the code
The update and delete syntax is simple and consistent with a relational database, but it is important to note that both operations can only be performed on tables that support ACID, namely transaction tables.
3.5 Examples of Inserting, updating, and Deleting Data
Modify hive-site. XML and add the following configuration to enable transaction support. After the configuration, restart the Hive service.
<property> <name>hive.support.concurrency</name> <value>true</value> </property> <property> <name>hive.enforce.bucketing</name> <value>true</value> </property> <property> <name>hive.exec.dynamic.partition.mode</name> <value>nonstrict</value> </property> <property> <name>hive.txn.manager</name> <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value> </property> <property> <name>hive.compactor.initiator.on</name> <value>true</value> </property> <property> <name>hive.in.test</name> <value>true</value> </property> Copies the codeCopy the code
The transactional table is then created for testing using the transactional = true attribute. Note that transaction tables in Hive have the following restrictions according to the official documentation:
- They must be buckets Table.
- Only ORC file format is supported.
- LOAD DATA is not supported… Statements.
CREATE TABLE emp_ts(
empno int,
ename String
)
CLUSTERED BY (empno) INTO 2 BUCKETS STORED AS ORC
TBLPROPERTIES ("transactional"="true");
Copy the code
INSERT INTO TABLE emp_ts VALUES (1," Ming "),(2,"hong");Copy the code
The MapReduce job is used to insert data. After the job is successfully executed, the data is as follows:
UPDATE emp_ts SET ename = "LAN" WHERE empno=1; DELETE FROM emp_ts WHERE empno=2;Copy the code
The MapReduce job is also used to update and delete data. After the MapReduce job is successfully executed, the data is as follows:
3.6 Writing Query Results to the File System
Grammar:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 [ROW FORMAT row_format] [STORED AS file_format]
Copy the code
- OVERWRITE indicates that if the output file exists, delete it first and then write it again.
- As with the Load statement, it is recommended that both the local path and the URL address be used fully;
- Data written to the file system is serialized as text, with columns separated by ^A and rows separated by newline characters by default.
- If the column is not a basic type, serialize it to JSON format.
- Row delimiters are not allowed to be customized, but column delimiters are.
Example:
Here we export the emp_pTN table created above to the local file system with the following statement:
INSERT OVERWRITE LOCAL DIRECTORY '/usr/file-ouput' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' -- define column separators as '\t' SELECT * FROM emp_ptn;Copy the code