Hive is different from Mysql

Hive does not support row-level insert, update, and delete operations.

Hive does not support transactions.

2. Hive database

The Hive database concept is essentially just a directory or namespace for a table.

/ / 1, the DATABASE directory for: hive. Metastore. Warehouse. Dir / / 2, CREATE a DATABASE: CREATE DATABASE financials. // CREATE DATABASE IF NOT EXISTS financials; // 4, check database: SHOW DATABASES; SHOW DATABASES LIKE 'f.*'; // CREATE DATABASE financials LOCATION '/my/preferred/directory'; // 6, switch work database: USE financials; (Hive v0.8.0, can modify the current working database as the default database, set Hive. The cli. Print the current. The db = true) // DROP DATABASE IF EXISTS financials;Copy the code

// DROP DATABASE IF EXISTS financials CASCADE; ALTER DATABASE set key-value pair attribute values for DBPROPERTIES of a DATABASE. ALTER DATABASES financials SET DBPROPERTIES('edited-by' = 'Joe Dba') // 10, ALTER DATABASES financials SET DBPROPERTIES('edited-by' = 'Joe Dba') ALTER TABLE log_messages RENAME TO logmsgs; // change the name of a field to a new one. ALTER TABLE log_messages CHANGE COLUMN HMS hours_minutes_seconds INT COMMENT 'The hours, minutes, and seconds part of the timestamp' AFTER severity; ALTER TABLE log_messages ADD COLUMNS(app_name STRING COMMENT 'Application name', session_id LONG COMMENT 'The current session id' ); ALTER TABLE log_messages REPLACE COLUMNS(hours_mins_secs INT COMMENT 'hour, minute, seconds from timestamp', severity STRING COMMENT 'The message severity' message STRING COMMENT 'The rest of the message' ); ALTER TABLE log_messages SET TBLPROPERTIES(' Notes' = 'The process ID is no longer captured; this column is always NULL' ); ALTER TABLE log_messages PARTITION(year = 2012, month =1, day =1) SET FILEFORMAT SEQUENCEFILE; ALTER TABLE log_messages PARTITION(year = 2012, month =1, day =1) SET FILEFORMAT SEQUENCEFILE;Copy the code

Third, partition table, management table

Data partitioning: Partitioning is commonly used to distribute stress horizontally, physically move data closer to the most frequent users, and for other purposes.

First by country, then by state

CREATE TABLE employees(
	name	STRING,
	salary	FLOAT,
	subordinates	ARRAY<STRING>,
	deductions	MAP<STRING, FLOAT>,
	adress	STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING, state STRING)

Copy the code

Partition tables change the way Hive organizes data stores.

Contrast:

(1) If we created the table in myDB, there would be only one employees directory for the table:

hdfs://master_server/user/hive/warehouse/mydb.db/employees

Copy the code

(2) However, Hive will now create subdirectories that reflect the partition structure. Such as:

. . /employees/country=CA/state=AB ... /employees/country=CA/state=BC ... . /employees/country=US/state=AL ... /employees/country=US/state=AK ...Copy the code

When we asked all employees in the U.S. state of Illinois:

SELECT * FROM employees WHERE country  = 'US' AND state = 'IL';

Copy the code

Faster, so partitioning significantly improves query performance.

However, if the full query data is very large, a huge MapReduce task will be performed.

It is recommended that Hive be set to “strict” mode. If there is no WHERE filtering, this task is forbidden:

Set hive.mapred.mode=strict // SHOW PARTITIONS to check all existing PARTITIONS in the table: SHOW PARTITION employees; SHOW PARTITIONS employees PARTITION(country='US') SHOW PARTITIONS employees PARTITION(country='US', state='AK')Copy the code
ALTER TABLE log_messages ADD PARTITION(year = 2012,month = 1,day = 2) LOCATION 'hdfs://master_server/data/log_message/2012/01/02';Copy the code