Hive Data storage

1. Underlying data

  • The underlying data is stored in distributed storage (HDFS).

2. Metadata

  • Metadata information is maintained in MetaStore.
  • Hive uses Derby as the default metadata storage database. Derby cannot be accessed by multiple clients at the same time. Therefore, you need to replace Derby with MySQL.

Hive related metadata table

1. Hive version table

(1) the version table

field meaning
VER_ID Id primary key
SCHEMA_VERSION Version of the Hive
VERSION_COMMENT Release notes

2. Database table

(1) the DBS table

field meaning
DB_ID The database ID
DESC Database Description
DB_LOCATION_URI HDFS database path
NAME Hive Database Name
OWNER_NAME Hive database owner username
OWNER_TYPE Hive Owner Role

3. Tables and visual charts

(1) the TBLS table

field meaning
TBL_ID Table ID
CREATE_TIME Creation time
DB_ID The database ID
LAST_ACCESS_TIME Last visit time
OWNER The owner of the
RETENTION Keep field
SD_ID Serialization configuration information
TBL_NAME The name of the table
TBL_TYPE Table type
VIEW_EXPANDED_TEXT View HQL details
VIEW_ORIGINAL_TEXT View’s original HQL

(2) TTABLE_PARAMS table

field meaning
TBL_ID Table ID
PARAM_KEY Table attribute name
PARAM_VALUE The table attribute value

4. File storage table

(1) the SDS table

field meaning
SD_ID Storage Information ID
CD_ID Field Information ID
INPUT_FORMAT File input format
IS_COMPRESSED Whether the compression
IS_STOREDASSUBDIRECTORIES Whether to store in a subdirectory
LOCATION HDFS path
NUM_BUCKETS Points barrels
OUTPUT_FORMAT File output format
SERDE_ID Serialized class ID

(2) the SERDES table

field meaning
SERDE_ID Serialized class configuration ID
NAME Serialized category name
SLIB Serialization class

(3) SERDE_PARAMS table

field meaning
SERDE_ID Serialized class configuration ID
PARAM_KEY The property name
PARAM_VALUE Attribute values

5. Field table

(1) COLUMNS_V2 table

field meaning
CD_ID Field Information ID
COMMENT Annotation fields
COLUMN_NAME The field name
TYPE_NAME The field type
INTEGER_IDX The field order

6, partition table

(1) table PARTITIONS

field meaning
PART_ID The partition ID
CREATE_TIME Partition creation time
LAST_ACCESS_TIME Time of last visit
PART_NAME The partition name
SD_ID Partition storage ID
TBL_ID Table ID

(2) PARTITION_KEYS table

field meaning
TBL_ID Table ID
PKEY_COMMENT Partition field name description
PKEY_NAME Partition field name
PKEY_TYPE Partition field type
INTEGER_IDX Partition field order

(3) PARTITION_KEY_VALS table

field meaning
PART_ID The partition ID
PART_KEY_VAL Partition field value
INTEGER_IDX Partition field order

(4) PARTITION_PARAMS table

field meaning
PART_ID The partition ID
PARAM_KEY Partition attribute name
PARAM_VALUE Partition attribute value

Three, the application

1. Delete all metadata information based on the tablename passed in

  • (1) Stored procedure:

  • The stored procedure name is T1, and the input is tbl_id

  •   DELIMITER // 
      drop procedure if exists t1;
      create  procedure t1 ( in tbinput int) 
      begin
          declare v_sd_id int ;
          declare v_part_id int ;
          declare v_cd_id int ;
          declare v_serde_id int ;
    
          select tbinput;
          select SD_ID into v_sd_id from tbls where TBL_ID = tbinput;
          select part_id  into v_part_id from partitions where tbl_id = tbinput;
          select cd_id , serde_id into v_cd_id,v_serde_id from sds where sd_id = v_sd_id;
          select v_sd_id,v_part_id,v_cd_id,v_serde_id;
    
          if v_part_id is not  null then 
              delete from partition_params where part_id = v_part_id;
              delete from partition_key_vals where part_id = v_part_id;
          end if;
    
          delete from serdes where serde_id = v_serde_id;
          delete from serde_params where serde_id = v_serde_id;
          delete from columns_v2 where cd_id = v_cd_id;
          delete from sds where sd_id = v_sd_id;
          delete from partitions where tbl_id = tbinput;
          delete from partition_keys where tbl_id = tbinput;
          delete from table_params where tbl_id = tbinput;
          delete from tbls where tbl_id = tbinput;
      end ;
      //
      delimiter ; 
    
      mysql> call t1(tbl_id);
    Copy the code
  • (2) Shell encapsulation:

  • hive_mysql_delete.sh #! /bin/bash source /usr/local/mysql.bash_profile mysql -uroot -p password Hive database -e "SET FOREIGN_KEY_CHECKS = 0; call t1(tbl_id); SET FOREIGN_KEY_CHECKS = 1; quit" >> /tmp/mysql_delete.logCopy the code