Metadata management is at the core of a data warehouse. Metadata not only defines what the data warehouse contains, but also records the content and location of data, describes the rules for data extraction and transformation, and stores business information related to data warehouse subjects. This article introduces Hive Hooks and Metastore Listeners, which can be used for automated metadata management. From this article you will learn:
- Metadata management
- Hive Hooks and Metastore Listeners
- Basic use of Hive Hooks
- Basic use of Metastore Listeners
Metadata management
Metadata definition
Traditionally, metadata is "data about data". Metadata connects source data, the data warehouse, and data applications, and records the whole lifecycle of data from generation to consumption. It mainly records the definitions of models in the data warehouse, the mapping relationships between different levels, the state of the warehouse data, and the running state of ETL tasks. In a data warehouse system, metadata helps administrators and developers conveniently find the data they care about, guiding their data management and development work and improving efficiency. By purpose, metadata falls into two categories: technical metadata and business metadata. Technical metadata stores the technical details of the data warehouse system and is used to develop and manage the data warehouse.
Metadata classification
Technical metadata
- Storage metadata of distributed computing systems
For example, Hive tables, columns, and partitions: the table name, partition information, owner, file size, table type, as well as each column's name, type, and comment, and whether the column is a partition column.
- Runtime metadata of distributed computing systems
For example, Hive job logs, including the job type, instance name, inputs and outputs, SQL, running parameters, and execution time.
- Task scheduling metadata
The dependency types and dependency relationships of tasks, and the run logs of different types of scheduled tasks.
Business metadata
Business metadata describes the data in a data warehouse from a business perspective. It provides a semantic layer between users and the actual system, enabling business people who do not understand computer technology to "read" the data in the warehouse. Common business metadata includes standardized definitions of dimensions and attributes, business processes, and indicators, which support better management and use of data, as well as data application metadata, such as the configuration and operation metadata of data reports and data products.
Metadata application
The real value of data lies in data-driven decisions that guide operations. With a data-driven approach, we can identify trends and take effective action to spot problems and drive innovation or solutions. Similarly, metadata can guide data practitioners in their daily work and enable metadata-driven "operations". For example, data users can quickly find the data they need through metadata. ETL engineers can use metadata to guide daily ETL work such as model design, task optimization, and decommissioning of obsolete tasks. O&M engineers can use metadata to guide cluster-wide storage, computing, and system optimization tasks.
Hive Hooks and Metastore Listeners
Hive Hooks
For data governance and metadata management, there are many open source frameworks in the industry, such as Apache Atlas, which can meet the requirements of metadata management in complex scenarios. Apache Atlas uses Hive Hooks to collect metadata and requires the following configuration:
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
The hook listens for Hive events, such as creating or altering a table, pushes the collected metadata to Kafka in a specific format, and a consumer finally stores the metadata.
Hive Hooks classification
So what are Hooks?
Hooks are an event-and-message mechanism that binds custom logic into Hive's internal execution flow without recompiling Hive. They provide a way to extend Hive with external components. Depending on its type, a hook runs at a different stage of query processing. The hook types are as follows:
- hive.exec.pre.hooks
As the name indicates, called before the query is executed by the execution engine, after Hive has optimized the query plan. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and configure hive-site.xml as follows:
<property>
<name>hive.exec.pre.hooks</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.exec.post.hooks
Called at the end of the execution plan, before the results are returned to the user. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and configure hive-site.xml as follows:
<property>
<name>hive.exec.post.hooks</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.exec.failure.hooks
Called after the execution plan fails. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and configure hive-site.xml as follows:
<property>
<name>hive.exec.failure.hooks</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.metastore.init.hooks
Called when HMSHandler is initialized. To use it, implement the interface org.apache.hadoop.hive.metastore.MetaStoreInitListener and configure hive-site.xml as follows:
<property>
<name>hive.metastore.init.hooks</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.exec.driver.run.hooks
Called at the start and end of Driver.run. To use it, implement the interface org.apache.hadoop.hive.ql.HiveDriverRunHook and configure hive-site.xml as follows:
<property>
<name>hive.exec.driver.run.hooks</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.semantic.analyzer.hook
Called when Hive performs semantic analysis on the query statement. To use it, extend the abstract class org.apache.hadoop.hive.ql.parse.AbstractSemanticAnalyzerHook and configure hive-site.xml as follows:
<property>
<name>hive.semantic.analyzer.hook</name>
<value>Fully qualified name of the implementation class</value>
</property>
Advantages and disadvantages of Hive Hooks
- Advantages
- You can easily embed or run custom code at various query phases
- Can be used to update metadata
- Disadvantages
- The metadata retrieved in hooks often requires further parsing and can be difficult to understand
- Hooks run in the query path, so they affect the query process
For Hive Hooks, this article gives a usage example of hive.exec.post.hooks, which runs after the query executes and before the results are returned.
Metastore Listeners
Metastore Listeners are listeners on the Hive Metastore; users can write custom code to listen for metadata events.
Looking at the source of the HiveMetaStore class, we can see that the init() method of HMSHandler creates three kinds of listeners: MetaStorePreEventListener, MetaStoreEventListener, and MetaStoreEndFunctionListener, which listen for events at each step.
public class HiveMetaStore extends ThriftHiveMetastore {
    // ... code omitted
    public static class HMSHandler extends FacebookBase implements IHMSHandler {
        // ... code omitted
        public void init() throws MetaException {
            // ... code omitted
            // Get MetaStorePreEventListener
            preListeners = MetaStoreUtils.getMetaStoreListeners(MetaStorePreEventListener.class,
                hiveConf,
                hiveConf.getVar(HiveConf.ConfVars.METASTORE_PRE_EVENT_LISTENERS));
            // Get MetaStoreEventListener
            listeners = MetaStoreUtils.getMetaStoreListeners(MetaStoreEventListener.class,
                hiveConf,
                hiveConf.getVar(HiveConf.ConfVars.METASTORE_EVENT_LISTENERS));
            listeners.add(new SessionPropertiesListener(hiveConf));
            // Get MetaStoreEndFunctionListener
            endFunctionListeners = MetaStoreUtils.getMetaStoreListeners(
                MetaStoreEndFunctionListener.class,
                hiveConf,
                hiveConf.getVar(HiveConf.ConfVars.METASTORE_END_FUNCTION_LISTENERS));
            // ... code omitted
        }
    }
}
Metastore Listeners classification
- hive.metastore.pre.event.listeners
This abstract class provides the actions to be performed before a particular event occurs on the Metastore; its methods are called before the event happens. To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStorePreEventListener and configure hive-site.xml as follows:
<property>
<name>hive.metastore.pre.event.listeners</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.metastore.event.listeners
This abstract class provides the actions to be performed when a specific event occurs on the Metastore; its methods are called whenever such an event happens. To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreEventListener and configure hive-site.xml as follows:
<property>
<name>hive.metastore.event.listeners</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.metastore.end.function.listeners
Its methods are called whenever a Metastore function ends. To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreEndFunctionListener and configure hive-site.xml as follows:
<property>
<name>hive.metastore.end.function.listeners</name>
<value>Fully qualified name of the implementation class</value>
</property>
Advantages and disadvantages of Metastore Listeners
- Advantages
- The metadata is already parsed and easy to understand
- Does not affect the query process; access is read-only
- Disadvantages
- Not flexible; only the objects belonging to the current event can be accessed
For Metastore Listeners, this article gives a usage example of MetaStoreEventListener, implementing two methods: onCreateTable and onAlterTable.
Basic use of Hive Hooks
Code
The specific implementation code is as follows:
public class CustomPostHook implements ExecuteWithHookContext {
    private static final Logger LOGGER = LoggerFactory.getLogger(CustomPostHook.class);
    // Stores the Hive SQL operation types to monitor
    private static final HashSet<String> OPERATION_NAMES = new HashSet<>();

    // HiveOperation is an enumeration class that encapsulates Hive SQL operation types
    // Monitored SQL operation types
    static {
        // Create table
        OPERATION_NAMES.add(HiveOperation.CREATETABLE.getOperationName());
        // Alter database properties
        OPERATION_NAMES.add(HiveOperation.ALTERDATABASE.getOperationName());
        // Change the database owner
        OPERATION_NAMES.add(HiveOperation.ALTERDATABASE_OWNER.getOperationName());
        // Alter table properties: add columns
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_ADDCOLS.getOperationName());
        // Alter table properties: change the table storage path
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_LOCATION.getOperationName());
        // Alter table properties
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_PROPERTIES.getOperationName());
        // Rename the table
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAME.getOperationName());
        // Rename a column
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAMECOL.getOperationName());
        // Replace columns (delete the current columns, then add the new ones)
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_REPLACECOLS.getOperationName());
        // Create database
        OPERATION_NAMES.add(HiveOperation.CREATEDATABASE.getOperationName());
        // Drop database
        OPERATION_NAMES.add(HiveOperation.DROPDATABASE.getOperationName());
        // Drop table
        OPERATION_NAMES.add(HiveOperation.DROPTABLE.getOperationName());
    }

    @Override
    public void run(HookContext hookContext) throws Exception {
        assert (hookContext.getHookType() == HookType.POST_EXEC_HOOK);
        // The query plan
        QueryPlan plan = hookContext.getQueryPlan();
        // The operation name
        String operationName = plan.getOperationName();
        logWithHeader("SQL statement executed: " + plan.getQueryString());
        logWithHeader("Operation name: " + operationName);
        if (OPERATION_NAMES.contains(operationName) && !plan.isExplain()) {
            logWithHeader("Monitored SQL operation");
            Set<ReadEntity> inputs = hookContext.getInputs();
            Set<WriteEntity> outputs = hookContext.getOutputs();
            for (Entity entity : inputs) {
                logWithHeader("Hook metadata input value: " + toJson(entity));
            }
            for (Entity entity : outputs) {
                logWithHeader("Hook metadata output value: " + toJson(entity));
            }
        } else {
            logWithHeader("Not monitored, ignoring the hook!");
        }
    }

    private static String toJson(Entity entity) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Entity types include:
        // DATABASE, TABLE, PARTITION, DUMMYPARTITION, DFS_DIR, LOCAL_DIR, FUNCTION
        switch (entity.getType()) {
            case DATABASE:
                Database db = entity.getDatabase();
                return mapper.writeValueAsString(db);
            case TABLE:
                return mapper.writeValueAsString(entity.getTable().getTTable());
        }
        return null;
    }

    /**
     * Log with a uniform header
     *
     * @param obj the object to log
     */
    private void logWithHeader(Object obj) {
        LOGGER.info("[CustomPostHook][Thread: " + Thread.currentThread().getName() + "] | " + obj);
    }
}
Usage steps
Compile the above code into a JAR package and place it in the $HIVE_HOME/lib directory, or add the JAR with the Hive client:
0: jdbc:hive2://localhost:10000> add jar /opt/softwares/com.jmx.hive-1.0-SNAPSHOT.jar;
Configure the hive-site.xml file or, for convenience, set it directly with a client command:
0: jdbc:hive2://localhost:10000> set hive.exec.post.hooks=com.jmx.hooks.CustomPostHook;
View table operation
In the code above we monitor certain operations; when one is detected, it triggers our custom code (in this case, logging). Type the following command in Hive's Beeline client:
0: jdbc:hive2://localhost:10000> show tables;
In the $HIVE_HOME/logs/hive.log file you can see:
[CustomPostHook][Thread: cab9a763-f25c-63e4-9f9a-affacb3cecdb main] | SQL statement executed: show tables
[CustomPostHook][Thread: cab9a763-f25c-63e4-9f9a-affacb3cecdb main] | Operation name: SHOWTABLES
[CustomPostHook][Thread: cab9a763-f25c-63e4-9f9a-affacb3cecdb main] | Not monitored, ignoring the hook!
The show tables operation is not monitored, so there is no corresponding metadata log.
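Since every hook record goes through logWithHeader, a downstream collector can start by parsing these log lines. Below is a minimal sketch in Python (the parse_hook_line helper and the exact spacing of the log format are assumptions based on the code above), splitting each line into the thread name and the message:

```python
import re

# Matches the log format produced by logWithHeader in CustomPostHook:
# [CustomPostHook][Thread: <thread>] | <message>
LINE_RE = re.compile(r"\[CustomPostHook\]\[Thread: (?P<thread>[^\]]+)\] \|\s*(?P<message>.*)")

def parse_hook_line(line):
    """Return (thread, message) for a hook log line, or None for other lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("thread"), match.group("message")

print(parse_hook_line("[CustomPostHook][Thread: main] | Operation name: SHOWTABLES"))
# ('main', 'Operation name: SHOWTABLES')
```

From the parsed messages, a collector could then filter for the "Hook metadata input value" / "Hook metadata output value" lines and feed the JSON payloads into a metadata store.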
Create table operation
When we create a table in Hive's Beeline client:
CREATE TABLE testposthook(
id int COMMENT "id",
name string COMMENT "name")
COMMENT "Create table _ test Hive Hooks"
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/';
View hive.log:
The Hook metadata output above has two values: the first is the metadata of the database, and the second is the metadata of the table.
- Database metadata
{
  "name":"default", "description":"Default Hive database",
  "locationUri":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
  "parameters":{},
  "privileges":null, "ownerName":"public", "ownerType":"ROLE",
  "setParameters":true, "parametersSize":0, "setOwnerName":true, "setOwnerType":true,
  "setPrivileges":false, "setName":true, "setDescription":true, "setLocationUri":true
}
- Table metadata
{
  "tableName":"testposthook", "dbName":"default", "owner":"anonymous",
  "createTime":1597985444, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[], "location":null,
    "inputFormat":"org.apache.hadoop.mapred.SequenceFileInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe",
      "parameters":{"serialization.format":"1"},
      "setSerializationLib":true, "setParameters":true, "parametersSize":1, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "skewedColNamesIterator":[], "skewedColValuesSize":0, "skewedColValuesIterator":[],
      "skewedColValueLocationMapsSize":0, "setSkewedColNames":true, "setSkewedColValues":true,
      "setSkewedColValueLocationMaps":true, "skewedColNamesSize":0
    },
    "storedAsSubDirectories":false, "colsSize":0, "setParameters":true, "parametersSize":0,
    "setOutputFormat":true, "setSerdeInfo":true, "setBucketCols":true, "setSortCols":true,
    "setSkewedInfo":true, "colsIterator":[], "setCompressed":false, "setNumBuckets":true,
    "bucketColsSize":0, "bucketColsIterator":[], "sortColsSize":0, "sortColsIterator":[],
    "setStoredAsSubDirectories":false, "setCols":true, "setLocation":false, "setInputFormat":true
  },
  "partitionKeys":[], "parameters":{},
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":null, "temporary":false, "rewriteEnabled":false, "partitionKeysSize":0,
  "setDbName":true, "setSd":true, "setParameters":true, "setCreateTime":true,
  "setLastAccessTime":false, "parametersSize":0, "setTableName":true, "setPrivileges":false,
  "setOwner":true, "setPartitionKeys":true, "setViewOriginalText":false, "setViewExpandedText":false,
  "setTableType":true, "setRetention":false, "partitionKeysIterator":[], "setTemporary":false,
  "setRewriteEnabled":false
}
The **cols** array above is empty, that is, the metadata captured at table creation contains no information about the id and name fields. To obtain this information, run the following command:
ALTER TABLE testposthook
ADD COLUMNS (age int COMMENT 'age');
Observe the log information again:
In the log above, the Hook metadata has only one input and one output, and both represent the metadata of the table.
- Input
{
  "tableName":"testposthook", "dbName":"default", "owner":"anonymous",
  "createTime":1597985445, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[
      {"name":"id", "type":"int", "comment":"id", "setName":true, "setType":true, "setComment":true},
      {"name":"name", "type":"string", "comment":"Name", "setName":true, "setType":true, "setComment":true}
    ],
    "location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters":{"serialization.format":"", "field.delim":""},
      "setSerializationLib":true, "setParameters":true, "parametersSize":2, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "skewedColNamesIterator":[], "skewedColValuesSize":0, "skewedColValuesIterator":[],
      "skewedColValueLocationMapsSize":0, "setSkewedColNames":true, "setSkewedColValues":true,
      "setSkewedColValueLocationMaps":true, "skewedColNamesSize":0
    },
    "storedAsSubDirectories":false, "colsSize":2, "setParameters":true, "parametersSize":0,
    "setOutputFormat":true, "setSerdeInfo":true, "setBucketCols":true, "setSortCols":true,
    "setSkewedInfo":true,
    "colsIterator":[
      {"name":"id", "type":"int", "comment":"id", "setName":true, "setType":true, "setComment":true},
      {"name":"name", "type":"string", "comment":"Name", "setName":true, "setType":true, "setComment":true}
    ],
    "setCompressed":true, "setNumBuckets":true, "bucketColsSize":0, "bucketColsIterator":[],
    "sortColsSize":0, "sortColsIterator":[], "setStoredAsSubDirectories":true,
    "setCols":true, "setLocation":true, "setInputFormat":true
  },
  "partitionKeys":[],
  "parameters":{
    "transient_lastDdlTime":"1597985445", "comment":"Create table _ test Hive Hooks",
    "totalSize":"0", "numFiles":"0"
  },
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":null, "temporary":false, "rewriteEnabled":false, "partitionKeysSize":0,
  "setDbName":true, "setSd":true, "setParameters":true, "setCreateTime":true,
  "setLastAccessTime":true, "parametersSize":4, "setTableName":true, "setPrivileges":false,
  "setOwner":true, "setPartitionKeys":true, "setViewOriginalText":false, "setViewExpandedText":false,
  "setTableType":true, "setRetention":true, "partitionKeysIterator":[], "setTemporary":false,
  "setRewriteEnabled":true
}
The **"cols"** array now contains the field metadata. Let's look at the output JSON:
- Output
{
  "tableName":"testposthook", "dbName":"default", "owner":"anonymous",
  "createTime":1597985445, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[
      {"name":"id", "type":"int", "comment":"id", "setName":true, "setType":true, "setComment":true},
      {"name":"name", "type":"string", "comment":"Name", "setName":true, "setType":true, "setComment":true}
    ],
    "location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters":{"serialization.format":"", "field.delim":""},
      "setSerializationLib":true, "setParameters":true, "parametersSize":2, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "skewedColNamesIterator":[], "skewedColValuesSize":0, "skewedColValuesIterator":[],
      "skewedColValueLocationMapsSize":0, "setSkewedColNames":true, "setSkewedColValues":true,
      "setSkewedColValueLocationMaps":true, "skewedColNamesSize":0
    },
    "storedAsSubDirectories":false, "colsSize":2, "setParameters":true, "parametersSize":0,
    "setOutputFormat":true, "setSerdeInfo":true, "setBucketCols":true, "setSortCols":true,
    "setSkewedInfo":true,
    "colsIterator":[
      {"name":"id", "type":"int", "comment":"id", "setName":true, "setType":true, "setComment":true},
      {"name":"name", "type":"string", "comment":"Name", "setName":true, "setType":true, "setComment":true}
    ],
    "setCompressed":true, "setNumBuckets":true, "bucketColsSize":0, "bucketColsIterator":[],
    "sortColsSize":0, "sortColsIterator":[], "setStoredAsSubDirectories":true,
    "setCols":true, "setLocation":true, "setInputFormat":true
  },
  "partitionKeys":[],
  "parameters":{
    "transient_lastDdlTime":"1597985445", "comment":"Create table _ test Hive Hooks",
    "totalSize":"0", "numFiles":"0"
  },
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":null, "temporary":false, "rewriteEnabled":false, "partitionKeysSize":0,
  "setDbName":true, "setSd":true, "setParameters":true, "setCreateTime":true,
  "setLastAccessTime":true, "parametersSize":4, "setTableName":true, "setPrivileges":false,
  "setOwner":true, "setPartitionKeys":true, "setViewOriginalText":false, "setViewExpandedText":false,
  "setTableType":true, "setRetention":true, "partitionKeysIterator":[], "setTemporary":false,
  "setRewriteEnabled":true
}
The output object does not contain the new column age; it represents the table metadata before the modification.
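Because the input carries the pre-change metadata and the output the post-change metadata, comparing their cols arrays yields the schema change itself. A minimal Python sketch of this idea (added_columns is a hypothetical helper; only the sd.cols layout shown above is assumed):

```python
import json

def added_columns(old_meta, new_meta):
    """Columns present in the new table metadata but absent from the old one."""
    old_names = {c["name"] for c in json.loads(old_meta)["sd"]["cols"]}
    return [c for c in json.loads(new_meta)["sd"]["cols"] if c["name"] not in old_names]

# Trimmed-down payloads in the shape of the hook's table metadata JSON
old = '{"sd": {"cols": [{"name": "id", "type": "int"}, {"name": "name", "type": "string"}]}}'
new = ('{"sd": {"cols": [{"name": "id", "type": "int"}, {"name": "name", "type": "string"},'
       ' {"name": "age", "type": "int"}]}}')
print(added_columns(old, new))  # [{'name': 'age', 'type': 'int'}]
```

The same diff could be extended to detect renamed or dropped columns by also walking the old list against the new one.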
Basic use of Metastore Listeners
Code
The specific implementation code is as follows:
public class CustomListener extends MetaStoreEventListener {
    private static final Logger LOGGER = LoggerFactory.getLogger(CustomListener.class);
    private static final ObjectMapper objMapper = new ObjectMapper();

    public CustomListener(Configuration config) {
        super(config);
        logWithHeader(" created ");
    }

    // Listen for create-table operations
    @Override
    public void onCreateTable(CreateTableEvent event) {
        logWithHeader(event.getTable());
    }

    // Listen for alter-table operations
    @Override
    public void onAlterTable(AlterTableEvent event) {
        logWithHeader(event.getOldTable());
        logWithHeader(event.getNewTable());
    }

    private void logWithHeader(Object obj) {
        LOGGER.info("[CustomListener][Thread: " + Thread.currentThread().getName() + "] | " + objToStr(obj));
    }

    private String objToStr(Object obj) {
        try {
            return objMapper.writeValueAsString(obj);
        } catch (IOException e) {
            LOGGER.error("Error on conversion", e);
        }
        return null;
    }
}
Usage steps
Hive Hooks interact with HiveServer2, whereas listeners interact with the Metastore, that is, they run in the Metastore process. Use them as follows:
Place the JAR package in $HIVE_HOME/lib, then configure hive-site.xml:
<property>
<name>hive.metastore.event.listeners</name>
<value>com.jmx.hooks.CustomListener</value>
<description/>
</property>
After the configuration, restart the metadata service:
bin/hive --service metastore &
Create table operation
CREATE TABLE testlistener(
id int COMMENT "id",
name string COMMENT "name")
COMMENT "Create table _ test Hive Listener"
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/';
View hive.log:
{
  "tableName":"testlistener", "dbName":"default", "owner":"anonymous",
  "createTime":1597989316, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true}
    ],
    "location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters":{"serialization.format":"", "field.delim":""},
      "setSerializationLib":true, "setParameters":true, "parametersSize":2, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "setSkewedColNames":true, "setSkewedColValues":true, "setSkewedColValueLocationMaps":true,
      "skewedColNamesSize":0, "skewedColNamesIterator":[], "skewedColValuesSize":0,
      "skewedColValuesIterator":[], "skewedColValueLocationMapsSize":0
    },
    "storedAsSubDirectories":false, "setCols":true, "setOutputFormat":true, "setSerdeInfo":true,
    "setBucketCols":true, "setSortCols":true, "colsSize":2,
    "colsIterator":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true}
    ],
    "setCompressed":true, "setNumBuckets":true, "bucketColsSize":0, "bucketColsIterator":[],
    "sortColsSize":0, "sortColsIterator":[], "setStoredAsSubDirectories":true,
    "setParameters":true, "setLocation":true, "setInputFormat":true, "parametersSize":0,
    "setSkewedInfo":true
  },
  "partitionKeys":[],
  "parameters":{
    "transient_lastDdlTime":"1597989316", "comment":"Create table _ test Hive Listener",
    "totalSize":"0", "numFiles":"0"
  },
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":{
    "userPrivileges":{
      "anonymous":[
        {"privilege":"INSERT", "createTime":-1, "grantor":"anonymous", "grantorType":"USER",
         "grantOption":true, "setGrantOption":true, "setCreateTime":true, "setGrantor":true,
         "setGrantorType":true, "setPrivilege":true},
        {"privilege":"SELECT", "createTime":-1, "grantor":"anonymous", "grantorType":"USER",
         "grantOption":true, "setGrantOption":true, "setCreateTime":true, "setGrantor":true,
         "setGrantorType":true, "setPrivilege":true},
        {"privilege":"UPDATE", "createTime":-1, "grantor":"anonymous", "grantorType":"USER",
         "grantOption":true, "setGrantOption":true, "setCreateTime":true, "setGrantor":true,
         "setGrantorType":true, "setPrivilege":true},
        {"privilege":"DELETE", "createTime":-1, "grantor":"anonymous", "grantorType":"USER",
         "grantOption":true, "setGrantOption":true, "setCreateTime":true, "setGrantor":true,
         "setGrantorType":true, "setPrivilege":true}
      ]
    },
    "groupPrivileges":null, "rolePrivileges":null, "setUserPrivileges":true,
    "setGroupPrivileges":false, "setRolePrivileges":false, "userPrivilegesSize":1,
    "groupPrivilegesSize":0, "rolePrivilegesSize":0
  },
  "temporary":false, "rewriteEnabled":false, "setParameters":true, "setPartitionKeys":true,
  "partitionKeysSize":0, "setSd":true, "setLastAccessTime":true, "setRetention":true,
  "partitionKeysIterator":[], "parametersSize":4, "setTemporary":true, "setRewriteEnabled":false,
  "setTableName":true, "setDbName":true, "setOwner":true, "setViewOriginalText":false,
  "setViewExpandedText":false, "setTableType":true, "setPrivileges":true, "setCreateTime":true
}
When we perform the alter table operation again:
ALTER TABLE testlistener
ADD COLUMNS (age int COMMENT 'age');
Observe the log again:
The first record is the metadata of the old table, and the second is the metadata of the modified table.
- Old table
{
  "tableName":"testlistener", "dbName":"default", "owner":"anonymous",
  "createTime":1597989316, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true}
    ],
    "location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters":{"serialization.format":"", "field.delim":""},
      "setSerializationLib":true, "setParameters":true, "parametersSize":2, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "setSkewedColNames":true, "setSkewedColValues":true, "setSkewedColValueLocationMaps":true,
      "skewedColNamesSize":0, "skewedColNamesIterator":[], "skewedColValuesSize":0,
      "skewedColValuesIterator":[], "skewedColValueLocationMapsSize":0
    },
    "storedAsSubDirectories":false, "setCols":true, "setOutputFormat":true, "setSerdeInfo":true,
    "setBucketCols":true, "setSortCols":true, "colsSize":2,
    "colsIterator":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true}
    ],
    "setCompressed":true, "setNumBuckets":true, "bucketColsSize":0, "bucketColsIterator":[],
    "sortColsSize":0, "sortColsIterator":[], "setStoredAsSubDirectories":true,
    "setParameters":true, "setLocation":true, "setInputFormat":true, "parametersSize":0,
    "setSkewedInfo":true
  },
  "partitionKeys":[],
  "parameters":{
    "totalSize":"0", "numFiles":"0", "transient_lastDdlTime":"1597989316",
    "comment":"Create table _ test Hive Listener"
  },
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":null, "temporary":false, "rewriteEnabled":false, "setParameters":true,
  "setPartitionKeys":true, "partitionKeysSize":0, "setSd":true, "setLastAccessTime":true,
  "setRetention":true, "partitionKeysIterator":[], "parametersSize":4, "setTemporary":false,
  "setRewriteEnabled":true, "setTableName":true, "setDbName":true, "setOwner":true,
  "setViewOriginalText":false, "setViewExpandedText":false, "setTableType":true,
  "setPrivileges":false, "setCreateTime":true
}
- New table
{
  "tableName":"testlistener", "dbName":"default", "owner":"anonymous",
  "createTime":1597989316, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true},
      {"name":"age", "type":"int", "comment":"Age", "setComment":true, "setType":true, "setName":true}
    ],
    "location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters":{"serialization.format":"", "field.delim":""},
      "setSerializationLib":true, "setParameters":true, "parametersSize":2, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "setSkewedColNames":true, "setSkewedColValues":true, "setSkewedColValueLocationMaps":true,
      "skewedColNamesSize":0, "skewedColNamesIterator":[], "skewedColValuesSize":0,
      "skewedColValuesIterator":[], "skewedColValueLocationMapsSize":0
    },
    "storedAsSubDirectories":false, "setCols":true, "setOutputFormat":true, "setSerdeInfo":true,
    "setBucketCols":true, "setSortCols":true, "colsSize":3,
    "colsIterator":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true},
      {"name":"age", "type":"int", "comment":"Age", "setComment":true, "setType":true, "setName":true}
    ],
    "setCompressed":true, "setNumBuckets":true, "bucketColsSize":0, "bucketColsIterator":[],
    "sortColsSize":0, "sortColsIterator":[], "setStoredAsSubDirectories":true,
    "setParameters":true, "setLocation":true, "setInputFormat":true, "parametersSize":0,
    "setSkewedInfo":true
  },
  "partitionKeys":[],
  "parameters":{
    "totalSize":"0", "last_modified_time":"1597989660", "numFiles":"0",
    "transient_lastDdlTime":"1597989660", "comment":"Create table _ test Hive Listener",
    "last_modified_by":"anonymous"
  },
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":null, "temporary":false, "rewriteEnabled":false, "setParameters":true,
  "setPartitionKeys":true, "partitionKeysSize":0, "setSd":true, "setLastAccessTime":true,
  "setRetention":true, "partitionKeysIterator":[], "parametersSize":6, "setTemporary":false,
  "setRewriteEnabled":true, "setTableName":true, "setDbName":true, "setOwner":true,
  "setViewOriginalText":false, "setViewExpandedText":false, "setTableType":true,
  "setPrivileges":false, "setCreateTime":true
}
As you can see, the metadata of the modified table contains the newly added column age.
Conclusion
In this article, we showed how to obtain metadata from Hive to automate metadata management, demonstrating the basic use of Hive Hooks and Metastore Listeners. With either approach, you can push the collected metadata to Kafka to build your own metadata management system.
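As one possible next step, each hook or listener event can be wrapped in a message envelope before being produced to Kafka. The sketch below is illustrative only: the envelope fields, the "hive-metadata" topic name, and the "hive-hook" source label are invented for this example, and the actual producer call is left as a comment:

```python
import json
import time

def build_metadata_message(event_type, entity_json):
    """Wrap a hook/listener metadata payload in a Kafka-ready envelope."""
    envelope = {
        "eventType": event_type,        # e.g. CREATETABLE, ALTERTABLE_ADDCOLS
        "timestamp": int(time.time()),  # collection time in seconds
        "source": "hive-hook",          # producer identity (illustrative)
        "payload": json.loads(entity_json),
    }
    return json.dumps(envelope).encode("utf-8")

msg = build_metadata_message("CREATETABLE", '{"tableName": "testposthook", "dbName": "default"}')
# A real producer (e.g. kafka-python) would then send the bytes:
# KafkaProducer(bootstrap_servers="...").send("hive-metadata", msg)
print(json.loads(msg)["payload"]["tableName"])  # testposthook
```

A consumer on the other side can then deserialize the envelope and write the payload into the metadata store of your choice.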