Metadata management is at the core of a data warehouse. Metadata not only defines what the data warehouse contains, but also records the content and location of data, describes the rules for data extraction and transformation, and stores business information related to data warehouse subjects. This article introduces Hive Hooks and Metastore Listeners, which can be used for automated metadata management. From this article you will learn:
- Metadata management
- Hive Hooks and Metastore Listeners
- Basic use of Hive Hooks
- Basic use of Metastore Listeners
Metadata management
Metadata definition
Traditionally, metadata is "data about data". Metadata connects source data, the data warehouse, and data applications, and records the whole lifecycle of data from generation to consumption. It mainly records the definitions of models in the data warehouse, the mapping relationships between different levels, the state of the warehouse data, and the running state of ETL tasks. In a data warehouse system, metadata helps administrators and developers conveniently find the data they care about, guiding their data management and development work and improving efficiency. By purpose, metadata falls into two categories: technical metadata and business metadata. Technical metadata stores the technical details of the data warehouse system and is used to develop and manage the data warehouse.
Metadata classification
Technical metadata
- Storage metadata of distributed computing systems
For example, Hive tables, columns, and partitions: the table name, partition information, owner, file size, table type, as well as each column's name, type, and comment, and whether the column is a partition column.
- Runtime metadata of distributed computing systems
For example, Hive job logs, including the job type, instance name, inputs and outputs, SQL, running parameters, and execution time.
- Task scheduling metadata
The dependency types and dependency relationships of tasks, and the run logs of different types of scheduled tasks.
Business metadata
Business metadata describes the data in a data warehouse from a business perspective. It provides a semantic layer between users and the actual system, enabling business people who do not understand computer technology to "read" the data in the warehouse. Common business metadata includes standardized definitions of dimensions and attributes, business processes, and indicators, which support better management and use of data, as well as data application metadata, such as the configuration and operation metadata of data reports and data products.
Metadata application
The real value of data lies in data-driven decisions that guide operations. With a data-driven approach, we can identify trends and take effective action to spot problems and drive innovation or solutions. Similarly, metadata can guide data practitioners in their daily work and enable metadata-driven "operations". For example, data users can quickly find the data they need through metadata. ETL engineers can use metadata to guide daily ETL work such as model design, task optimization, and decommissioning of obsolete tasks. O&M engineers can use metadata to guide cluster-wide storage, computing, and system optimization tasks.
Hive Hooks and Metastore Listeners
Hive Hooks
For data governance and metadata management, there are many open source frameworks in the industry, such as Apache Atlas, which can meet the requirements of metadata management in complex scenarios. Apache Atlas uses Hive Hooks to collect metadata and requires the following configuration:
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
The hook listens for Hive events, such as creating or altering a table, pushes the collected metadata to Kafka in a specific format, and a consumer finally stores the metadata.
Hive Hooks classification
So what are Hooks?
Hooks are an event-and-message mechanism that binds custom logic into Hive's internal execution flow without recompiling Hive. They provide a way to extend Hive with external components. Depending on its type, a hook runs at a different stage of query processing. The hook types are as follows:
- hive.exec.pre.hooks
As the name indicates, called before the query is executed by the execution engine, after Hive has optimized the query plan. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and configure hive-site.xml as follows:
<property>
<name>hive.exec.pre.hooks</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.exec.post.hooks
Called at the end of the execution plan, before the results are returned to the user. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and configure hive-site.xml as follows:
<property>
<name>hive.exec.post.hooks</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.exec.failure.hooks
Called after the execution plan fails. To use it, implement the interface org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and configure hive-site.xml as follows:
<property>
<name>hive.exec.failure.hooks</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.metastore.init.hooks
Called when HMSHandler is initialized. To use it, implement the interface org.apache.hadoop.hive.metastore.MetaStoreInitListener and configure hive-site.xml as follows:
<property>
<name>hive.metastore.init.hooks</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.exec.driver.run.hooks
Called at the start and end of Driver.run. To use it, implement the interface org.apache.hadoop.hive.ql.HiveDriverRunHook and configure hive-site.xml as follows:
<property>
<name>hive.exec.driver.run.hooks</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.semantic.analyzer.hook
Called when Hive performs semantic analysis on the query statement. To use it, extend the abstract class org.apache.hadoop.hive.ql.parse.AbstractSemanticAnalyzerHook and configure hive-site.xml as follows:
<property>
<name>hive.semantic.analyzer.hook</name>
<value>Fully qualified name of the implementation class</value>
</property>
Advantages and disadvantages of Hive Hooks
- Advantages
- You can easily embed or run custom code at various query phases
- Can be used to update metadata
- Disadvantages
- The metadata retrieved in hooks often requires further parsing and can be difficult to understand
- Hooks run in the query path, so they affect the query process
For Hive Hooks, this article gives a usage example of hive.exec.post.hooks, which runs after the query executes and before the results are returned.
Metastore Listeners
Metastore Listeners are listeners on the Hive Metastore; users can write custom code to listen for metadata events.
Looking at the source of the HiveMetaStore class, we can see that the init() method of HMSHandler creates three kinds of listeners: MetaStorePreEventListener, MetaStoreEventListener, and MetaStoreEndFunctionListener, which listen for events at each step.
public class HiveMetaStore extends ThriftHiveMetastore {
    // ... code omitted
    public static class HMSHandler extends FacebookBase implements IHMSHandler {
        // ... code omitted
        public void init() throws MetaException {
            // ... code omitted
            // Get MetaStorePreEventListener
            preListeners = MetaStoreUtils.getMetaStoreListeners(MetaStorePreEventListener.class,
                hiveConf,
                hiveConf.getVar(HiveConf.ConfVars.METASTORE_PRE_EVENT_LISTENERS));
            // Get MetaStoreEventListener
            listeners = MetaStoreUtils.getMetaStoreListeners(MetaStoreEventListener.class,
                hiveConf,
                hiveConf.getVar(HiveConf.ConfVars.METASTORE_EVENT_LISTENERS));
            listeners.add(new SessionPropertiesListener(hiveConf));
            // Get MetaStoreEndFunctionListener
            endFunctionListeners = MetaStoreUtils.getMetaStoreListeners(
                MetaStoreEndFunctionListener.class,
                hiveConf,
                hiveConf.getVar(HiveConf.ConfVars.METASTORE_END_FUNCTION_LISTENERS));
            // ... code omitted
        }
    }
}
Metastore Listeners classification
- hive.metastore.pre.event.listeners
This abstract class provides the actions to be performed before a particular event occurs on the Metastore; its methods are called before the event happens. To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStorePreEventListener and configure hive-site.xml as follows:
<property>
<name>hive.metastore.pre.event.listeners</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.metastore.event.listeners
This abstract class provides the actions to be performed when a specific event occurs on the Metastore; its methods are called whenever such an event happens. To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreEventListener and configure hive-site.xml as follows:
<property>
<name>hive.metastore.event.listeners</name>
<value>Fully qualified name of the implementation class</value>
</property>
- hive.metastore.end.function.listeners
Its methods are called whenever a Metastore function ends. To use it, extend the abstract class org.apache.hadoop.hive.metastore.MetaStoreEndFunctionListener and configure hive-site.xml as follows:
<property>
<name>hive.metastore.end.function.listeners</name>
<value>Fully qualified name of the implementation class</value>
</property>
Advantages and disadvantages of Metastore Listeners
- Advantages
- The metadata is already parsed and easy to understand
- Does not affect the query process; access is read-only
- Disadvantages
- Not flexible; only the objects belonging to the current event can be accessed
For Metastore Listeners, this article gives a usage example of MetaStoreEventListener, implementing two methods: onCreateTable and onAlterTable.
Basic use of Hive Hooks
Code
The specific implementation code is as follows:
public class CustomPostHook implements ExecuteWithHookContext {
    private static final Logger LOGGER = LoggerFactory.getLogger(CustomPostHook.class);
    // Stores the Hive SQL operation types to monitor
    private static final HashSet<String> OPERATION_NAMES = new HashSet<>();

    // HiveOperation is an enumeration class that encapsulates Hive SQL operation types
    // Monitored SQL operation types
    static {
        // Create table
        OPERATION_NAMES.add(HiveOperation.CREATETABLE.getOperationName());
        // Alter database properties
        OPERATION_NAMES.add(HiveOperation.ALTERDATABASE.getOperationName());
        // Change the database owner
        OPERATION_NAMES.add(HiveOperation.ALTERDATABASE_OWNER.getOperationName());
        // Alter table properties: add columns
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_ADDCOLS.getOperationName());
        // Alter table properties: change the table storage path
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_LOCATION.getOperationName());
        // Alter table properties
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_PROPERTIES.getOperationName());
        // Rename the table
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAME.getOperationName());
        // Rename a column
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_RENAMECOL.getOperationName());
        // Replace columns (delete the current columns, then add the new ones)
        OPERATION_NAMES.add(HiveOperation.ALTERTABLE_REPLACECOLS.getOperationName());
        // Create database
        OPERATION_NAMES.add(HiveOperation.CREATEDATABASE.getOperationName());
        // Drop database
        OPERATION_NAMES.add(HiveOperation.DROPDATABASE.getOperationName());
        // Drop table
        OPERATION_NAMES.add(HiveOperation.DROPTABLE.getOperationName());
    }

    @Override
    public void run(HookContext hookContext) throws Exception {
        assert (hookContext.getHookType() == HookType.POST_EXEC_HOOK);
        // The query plan
        QueryPlan plan = hookContext.getQueryPlan();
        // The operation name
        String operationName = plan.getOperationName();
        logWithHeader("SQL statement executed: " + plan.getQueryString());
        logWithHeader("Operation name: " + operationName);
        if (OPERATION_NAMES.contains(operationName) && !plan.isExplain()) {
            logWithHeader("Monitored SQL operation");
            Set<ReadEntity> inputs = hookContext.getInputs();
            Set<WriteEntity> outputs = hookContext.getOutputs();
            for (Entity entity : inputs) {
                logWithHeader("Hook metadata input value: " + toJson(entity));
            }
            for (Entity entity : outputs) {
                logWithHeader("Hook metadata output value: " + toJson(entity));
            }
        } else {
            logWithHeader("Not monitored, ignoring the hook!");
        }
    }

    private static String toJson(Entity entity) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Entity types include:
        // DATABASE, TABLE, PARTITION, DUMMYPARTITION, DFS_DIR, LOCAL_DIR, FUNCTION
        switch (entity.getType()) {
            case DATABASE:
                Database db = entity.getDatabase();
                return mapper.writeValueAsString(db);
            case TABLE:
                return mapper.writeValueAsString(entity.getTable().getTTable());
        }
        return null;
    }

    /**
     * Log with a uniform header
     *
     * @param obj the object to log
     */
    private void logWithHeader(Object obj) {
        LOGGER.info("[CustomPostHook][Thread: " + Thread.currentThread().getName() + "] | " + obj);
    }
}
Usage steps
Compile the above code into a JAR package and place it in the $HIVE_HOME/lib directory, or add the JAR with the Hive client:
0: jdbc:hive2://localhost:10000> add jar /opt/softwares/com.jmx.hive-1.0-SNAPSHOT.jar;
Configure the hive-site.xml file or, for convenience, set it directly with a client command:
0: jdbc:hive2://localhost:10000> set hive.exec.post.hooks=com.jmx.hooks.CustomPostHook;
View table operation
In the code above we monitor certain operations; when one is detected, it triggers our custom code (in this case, logging). Type the following command in Hive's Beeline client:
0: jdbc:hive2://localhost:10000> show tables;
In the $HIVE_HOME/logs/hive.log file you can see:
[CustomPostHook][Thread: cab9a763-f25c-63e4-9f9a-affacb3cecdb main] | SQL statement executed: show tables
[CustomPostHook][Thread: cab9a763-f25c-63e4-9f9a-affacb3cecdb main] | Operation name: SHOWTABLES
[CustomPostHook][Thread: cab9a763-f25c-63e4-9f9a-affacb3cecdb main] | Not monitored, ignoring the hook!
The show tables operation is not monitored, so there is no corresponding metadata log.
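Since every hook record goes through logWithHeader, a downstream collector can start by parsing these log lines. Below is a minimal sketch in Python (the parse_hook_line helper and the exact spacing of the log format are assumptions based on the code above), splitting each line into the thread name and the message:

```python
import re

# Matches the log format produced by logWithHeader in CustomPostHook:
# [CustomPostHook][Thread: <thread>] | <message>
LINE_RE = re.compile(r"\[CustomPostHook\]\[Thread: (?P<thread>[^\]]+)\] \|\s*(?P<message>.*)")

def parse_hook_line(line):
    """Return (thread, message) for a hook log line, or None for other lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group("thread"), match.group("message")

print(parse_hook_line("[CustomPostHook][Thread: main] | Operation name: SHOWTABLES"))
# ('main', 'Operation name: SHOWTABLES')
```

From the parsed messages, a collector could then filter for the "Hook metadata input value" / "Hook metadata output value" lines and feed the JSON payloads into a metadata store.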
Create table operation
When we create a table in Hive's Beeline client:
CREATE TABLE testposthook(
id int COMMENT "id",
name string COMMENT "name")
COMMENT "Create table _ test Hive Hooks"
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/';
View hive.log:
The Hook metadata output above has two values: the first is the metadata of the database, and the second is the metadata of the table.
- Database metadata
{
  "name":"default", "description":"Default Hive database",
  "locationUri":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
  "parameters":{},
  "privileges":null, "ownerName":"public", "ownerType":"ROLE",
  "setParameters":true, "parametersSize":0, "setOwnerName":true, "setOwnerType":true,
  "setPrivileges":false, "setName":true, "setDescription":true, "setLocationUri":true
}
- Table metadata
{
  "tableName":"testposthook", "dbName":"default", "owner":"anonymous",
  "createTime":1597985444, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[], "location":null,
    "inputFormat":"org.apache.hadoop.mapred.SequenceFileInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe",
      "parameters":{"serialization.format":"1"},
      "setSerializationLib":true, "setParameters":true, "parametersSize":1, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "skewedColNamesIterator":[], "skewedColValuesSize":0, "skewedColValuesIterator":[],
      "skewedColValueLocationMapsSize":0, "setSkewedColNames":true, "setSkewedColValues":true,
      "setSkewedColValueLocationMaps":true, "skewedColNamesSize":0
    },
    "storedAsSubDirectories":false, "colsSize":0, "setParameters":true, "parametersSize":0,
    "setOutputFormat":true, "setSerdeInfo":true, "setBucketCols":true, "setSortCols":true,
    "setSkewedInfo":true, "colsIterator":[], "setCompressed":false, "setNumBuckets":true,
    "bucketColsSize":0, "bucketColsIterator":[], "sortColsSize":0, "sortColsIterator":[],
    "setStoredAsSubDirectories":false, "setCols":true, "setLocation":false, "setInputFormat":true
  },
  "partitionKeys":[], "parameters":{},
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":null, "temporary":false, "rewriteEnabled":false, "partitionKeysSize":0,
  "setDbName":true, "setSd":true, "setParameters":true, "setCreateTime":true,
  "setLastAccessTime":false, "parametersSize":0, "setTableName":true, "setPrivileges":false,
  "setOwner":true, "setPartitionKeys":true, "setViewOriginalText":false, "setViewExpandedText":false,
  "setTableType":true, "setRetention":false, "partitionKeysIterator":[], "setTemporary":false,
  "setRewriteEnabled":false
}
The **cols** array above is empty, that is, the metadata captured at table creation contains no information about the id and name fields. To obtain this information, run the following command:
ALTER TABLE testposthook
ADD COLUMNS (age int COMMENT 'age');
Observe the log information again:
In the log above, the Hook metadata has only one input and one output, and both represent the metadata of the table.
- Input
{
  "tableName":"testposthook", "dbName":"default", "owner":"anonymous",
  "createTime":1597985445, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[
      {"name":"id", "type":"int", "comment":"id", "setName":true, "setType":true, "setComment":true},
      {"name":"name", "type":"string", "comment":"Name", "setName":true, "setType":true, "setComment":true}
    ],
    "location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters":{"serialization.format":"", "field.delim":""},
      "setSerializationLib":true, "setParameters":true, "parametersSize":2, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "skewedColNamesIterator":[], "skewedColValuesSize":0, "skewedColValuesIterator":[],
      "skewedColValueLocationMapsSize":0, "setSkewedColNames":true, "setSkewedColValues":true,
      "setSkewedColValueLocationMaps":true, "skewedColNamesSize":0
    },
    "storedAsSubDirectories":false, "colsSize":2, "setParameters":true, "parametersSize":0,
    "setOutputFormat":true, "setSerdeInfo":true, "setBucketCols":true, "setSortCols":true,
    "setSkewedInfo":true,
    "colsIterator":[
      {"name":"id", "type":"int", "comment":"id", "setName":true, "setType":true, "setComment":true},
      {"name":"name", "type":"string", "comment":"Name", "setName":true, "setType":true, "setComment":true}
    ],
    "setCompressed":true, "setNumBuckets":true, "bucketColsSize":0, "bucketColsIterator":[],
    "sortColsSize":0, "sortColsIterator":[], "setStoredAsSubDirectories":true,
    "setCols":true, "setLocation":true, "setInputFormat":true
  },
  "partitionKeys":[],
  "parameters":{
    "transient_lastDdlTime":"1597985445", "comment":"Create table _ test Hive Hooks",
    "totalSize":"0", "numFiles":"0"
  },
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":null, "temporary":false, "rewriteEnabled":false, "partitionKeysSize":0,
  "setDbName":true, "setSd":true, "setParameters":true, "setCreateTime":true,
  "setLastAccessTime":true, "parametersSize":4, "setTableName":true, "setPrivileges":false,
  "setOwner":true, "setPartitionKeys":true, "setViewOriginalText":false, "setViewExpandedText":false,
  "setTableType":true, "setRetention":true, "partitionKeysIterator":[], "setTemporary":false,
  "setRewriteEnabled":true
}
The **"cols"** array now contains the field metadata. Let's look at the output JSON:
- Output
{
  "tableName":"testposthook", "dbName":"default", "owner":"anonymous",
  "createTime":1597985445, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[
      {"name":"id", "type":"int", "comment":"id", "setName":true, "setType":true, "setComment":true},
      {"name":"name", "type":"string", "comment":"Name", "setName":true, "setType":true, "setComment":true}
    ],
    "location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters":{"serialization.format":"", "field.delim":""},
      "setSerializationLib":true, "setParameters":true, "parametersSize":2, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "skewedColNamesIterator":[], "skewedColValuesSize":0, "skewedColValuesIterator":[],
      "skewedColValueLocationMapsSize":0, "setSkewedColNames":true, "setSkewedColValues":true,
      "setSkewedColValueLocationMaps":true, "skewedColNamesSize":0
    },
    "storedAsSubDirectories":false, "colsSize":2, "setParameters":true, "parametersSize":0,
    "setOutputFormat":true, "setSerdeInfo":true, "setBucketCols":true, "setSortCols":true,
    "setSkewedInfo":true,
    "colsIterator":[
      {"name":"id", "type":"int", "comment":"id", "setName":true, "setType":true, "setComment":true},
      {"name":"name", "type":"string", "comment":"Name", "setName":true, "setType":true, "setComment":true}
    ],
    "setCompressed":true, "setNumBuckets":true, "bucketColsSize":0, "bucketColsIterator":[],
    "sortColsSize":0, "sortColsIterator":[], "setStoredAsSubDirectories":true,
    "setCols":true, "setLocation":true, "setInputFormat":true
  },
  "partitionKeys":[],
  "parameters":{
    "transient_lastDdlTime":"1597985445", "comment":"Create table _ test Hive Hooks",
    "totalSize":"0", "numFiles":"0"
  },
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":null, "temporary":false, "rewriteEnabled":false, "partitionKeysSize":0,
  "setDbName":true, "setSd":true, "setParameters":true, "setCreateTime":true,
  "setLastAccessTime":true, "parametersSize":4, "setTableName":true, "setPrivileges":false,
  "setOwner":true, "setPartitionKeys":true, "setViewOriginalText":false, "setViewExpandedText":false,
  "setTableType":true, "setRetention":true, "partitionKeysIterator":[], "setTemporary":false,
  "setRewriteEnabled":true
}
The output object does not contain the new column age; it represents the table metadata before the modification.
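Because the input carries the pre-change metadata and the output the post-change metadata, comparing their cols arrays yields the schema change itself. A minimal Python sketch of this idea (added_columns is a hypothetical helper; only the sd.cols layout shown above is assumed):

```python
import json

def added_columns(old_meta, new_meta):
    """Columns present in the new table metadata but absent from the old one."""
    old_names = {c["name"] for c in json.loads(old_meta)["sd"]["cols"]}
    return [c for c in json.loads(new_meta)["sd"]["cols"] if c["name"] not in old_names]

# Trimmed-down payloads in the shape of the hook's table metadata JSON
old = '{"sd": {"cols": [{"name": "id", "type": "int"}, {"name": "name", "type": "string"}]}}'
new = ('{"sd": {"cols": [{"name": "id", "type": "int"}, {"name": "name", "type": "string"},'
       ' {"name": "age", "type": "int"}]}}')
print(added_columns(old, new))  # [{'name': 'age', 'type': 'int'}]
```

The same diff could be extended to detect renamed or dropped columns by also walking the old list against the new one.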
Basic use of Metastore Listeners
Code
The specific implementation code is as follows:
public class CustomListener extends MetaStoreEventListener {
    private static final Logger LOGGER = LoggerFactory.getLogger(CustomListener.class);
    private static final ObjectMapper objMapper = new ObjectMapper();

    public CustomListener(Configuration config) {
        super(config);
        logWithHeader(" created ");
    }

    // Listen for create-table operations
    @Override
    public void onCreateTable(CreateTableEvent event) {
        logWithHeader(event.getTable());
    }

    // Listen for alter-table operations
    @Override
    public void onAlterTable(AlterTableEvent event) {
        logWithHeader(event.getOldTable());
        logWithHeader(event.getNewTable());
    }

    private void logWithHeader(Object obj) {
        LOGGER.info("[CustomListener][Thread: " + Thread.currentThread().getName() + "] | " + objToStr(obj));
    }

    private String objToStr(Object obj) {
        try {
            return objMapper.writeValueAsString(obj);
        } catch (IOException e) {
            LOGGER.error("Error on conversion", e);
        }
        return null;
    }
}
Usage steps
Hive Hooks interact with HiveServer2, whereas listeners interact with the Metastore, that is, they run in the Metastore process. Use them as follows:
Place the JAR package in $HIVE_HOME/lib, then configure hive-site.xml:
<property>
<name>hive.metastore.event.listeners</name>
<value>com.jmx.hooks.CustomListener</value>
<description/>
</property>
After the configuration, restart the metadata service:
bin/hive --service metastore &
Create table operation
CREATE TABLE testlistener(
id int COMMENT "id",
name string COMMENT "name")
COMMENT "Create table _ test Hive Listener"
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/';
View hive.log:
{
  "tableName":"testlistener", "dbName":"default", "owner":"anonymous",
  "createTime":1597989316, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true}
    ],
    "location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters":{"serialization.format":"", "field.delim":""},
      "setSerializationLib":true, "setParameters":true, "parametersSize":2, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "setSkewedColNames":true, "setSkewedColValues":true, "setSkewedColValueLocationMaps":true,
      "skewedColNamesSize":0, "skewedColNamesIterator":[], "skewedColValuesSize":0,
      "skewedColValuesIterator":[], "skewedColValueLocationMapsSize":0
    },
    "storedAsSubDirectories":false, "setCols":true, "setOutputFormat":true, "setSerdeInfo":true,
    "setBucketCols":true, "setSortCols":true, "colsSize":2,
    "colsIterator":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true}
    ],
    "setCompressed":true, "setNumBuckets":true, "bucketColsSize":0, "bucketColsIterator":[],
    "sortColsSize":0, "sortColsIterator":[], "setStoredAsSubDirectories":true,
    "setParameters":true, "setLocation":true, "setInputFormat":true, "parametersSize":0,
    "setSkewedInfo":true
  },
  "partitionKeys":[],
  "parameters":{
    "transient_lastDdlTime":"1597989316", "comment":"Create table _ test Hive Listener",
    "totalSize":"0", "numFiles":"0"
  },
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":{
    "userPrivileges":{
      "anonymous":[
        {"privilege":"INSERT", "createTime":-1, "grantor":"anonymous", "grantorType":"USER",
         "grantOption":true, "setGrantOption":true, "setCreateTime":true, "setGrantor":true,
         "setGrantorType":true, "setPrivilege":true},
        {"privilege":"SELECT", "createTime":-1, "grantor":"anonymous", "grantorType":"USER",
         "grantOption":true, "setGrantOption":true, "setCreateTime":true, "setGrantor":true,
         "setGrantorType":true, "setPrivilege":true},
        {"privilege":"UPDATE", "createTime":-1, "grantor":"anonymous", "grantorType":"USER",
         "grantOption":true, "setGrantOption":true, "setCreateTime":true, "setGrantor":true,
         "setGrantorType":true, "setPrivilege":true},
        {"privilege":"DELETE", "createTime":-1, "grantor":"anonymous", "grantorType":"USER",
         "grantOption":true, "setGrantOption":true, "setCreateTime":true, "setGrantor":true,
         "setGrantorType":true, "setPrivilege":true}
      ]
    },
    "groupPrivileges":null, "rolePrivileges":null, "setUserPrivileges":true,
    "setGroupPrivileges":false, "setRolePrivileges":false, "userPrivilegesSize":1,
    "groupPrivilegesSize":0, "rolePrivilegesSize":0
  },
  "temporary":false, "rewriteEnabled":false, "setParameters":true, "setPartitionKeys":true,
  "partitionKeysSize":0, "setSd":true, "setLastAccessTime":true, "setRetention":true,
  "partitionKeysIterator":[], "parametersSize":4, "setTemporary":true, "setRewriteEnabled":false,
  "setTableName":true, "setDbName":true, "setOwner":true, "setViewOriginalText":false,
  "setViewExpandedText":false, "setTableType":true, "setPrivileges":true, "setCreateTime":true
}
When we perform the alter table operation again:
ALTER TABLE testlistener
ADD COLUMNS (age int COMMENT 'age');
Observe the log again:
The first record is the metadata of the old table, and the second is the metadata of the modified table.
- Old table
{
  "tableName":"testlistener", "dbName":"default", "owner":"anonymous",
  "createTime":1597989316, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true}
    ],
    "location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters":{"serialization.format":"", "field.delim":""},
      "setSerializationLib":true, "setParameters":true, "parametersSize":2, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "setSkewedColNames":true, "setSkewedColValues":true, "setSkewedColValueLocationMaps":true,
      "skewedColNamesSize":0, "skewedColNamesIterator":[], "skewedColValuesSize":0,
      "skewedColValuesIterator":[], "skewedColValueLocationMapsSize":0
    },
    "storedAsSubDirectories":false, "setCols":true, "setOutputFormat":true, "setSerdeInfo":true,
    "setBucketCols":true, "setSortCols":true, "colsSize":2,
    "colsIterator":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true}
    ],
    "setCompressed":true, "setNumBuckets":true, "bucketColsSize":0, "bucketColsIterator":[],
    "sortColsSize":0, "sortColsIterator":[], "setStoredAsSubDirectories":true,
    "setParameters":true, "setLocation":true, "setInputFormat":true, "parametersSize":0,
    "setSkewedInfo":true
  },
  "partitionKeys":[],
  "parameters":{
    "totalSize":"0", "numFiles":"0", "transient_lastDdlTime":"1597989316",
    "comment":"Create table _ test Hive Listener"
  },
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":null, "temporary":false, "rewriteEnabled":false, "setParameters":true,
  "setPartitionKeys":true, "partitionKeysSize":0, "setSd":true, "setLastAccessTime":true,
  "setRetention":true, "partitionKeysIterator":[], "parametersSize":4, "setTemporary":false,
  "setRewriteEnabled":true, "setTableName":true, "setDbName":true, "setOwner":true,
  "setViewOriginalText":false, "setViewExpandedText":false, "setTableType":true,
  "setPrivileges":false, "setCreateTime":true
}
- New table
{
  "tableName":"testlistener", "dbName":"default", "owner":"anonymous",
  "createTime":1597989316, "lastAccessTime":0, "retention":0,
  "sd":{
    "cols":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true},
      {"name":"age", "type":"int", "comment":"Age", "setComment":true, "setType":true, "setName":true}
    ],
    "location":"hdfs://kms-1.apache.com:8020/user/hive/warehouse",
    "inputFormat":"org.apache.hadoop.mapred.TextInputFormat",
    "outputFormat":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    "compressed":false, "numBuckets":-1,
    "serdeInfo":{
      "name":null,
      "serializationLib":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
      "parameters":{"serialization.format":"", "field.delim":""},
      "setSerializationLib":true, "setParameters":true, "parametersSize":2, "setName":false
    },
    "bucketCols":[], "sortCols":[], "parameters":{},
    "skewedInfo":{
      "skewedColNames":[], "skewedColValues":[], "skewedColValueLocationMaps":{},
      "setSkewedColNames":true, "setSkewedColValues":true, "setSkewedColValueLocationMaps":true,
      "skewedColNamesSize":0, "skewedColNamesIterator":[], "skewedColValuesSize":0,
      "skewedColValuesIterator":[], "skewedColValueLocationMapsSize":0
    },
    "storedAsSubDirectories":false, "setCols":true, "setOutputFormat":true, "setSerdeInfo":true,
    "setBucketCols":true, "setSortCols":true, "colsSize":3,
    "colsIterator":[
      {"name":"id", "type":"int", "comment":"id", "setComment":true, "setType":true, "setName":true},
      {"name":"name", "type":"string", "comment":"Name", "setComment":true, "setType":true, "setName":true},
      {"name":"age", "type":"int", "comment":"Age", "setComment":true, "setType":true, "setName":true}
    ],
    "setCompressed":true, "setNumBuckets":true, "bucketColsSize":0, "bucketColsIterator":[],
    "sortColsSize":0, "sortColsIterator":[], "setStoredAsSubDirectories":true,
    "setParameters":true, "setLocation":true, "setInputFormat":true, "parametersSize":0,
    "setSkewedInfo":true
  },
  "partitionKeys":[],
  "parameters":{
    "totalSize":"0", "last_modified_time":"1597989660", "numFiles":"0",
    "transient_lastDdlTime":"1597989660", "comment":"Create table _ test Hive Listener",
    "last_modified_by":"anonymous"
  },
  "viewOriginalText":null, "viewExpandedText":null, "tableType":"MANAGED_TABLE",
  "privileges":null, "temporary":false, "rewriteEnabled":false, "setParameters":true,
  "setPartitionKeys":true, "partitionKeysSize":0, "setSd":true, "setLastAccessTime":true,
  "setRetention":true, "partitionKeysIterator":[], "parametersSize":6, "setTemporary":false,
  "setRewriteEnabled":true, "setTableName":true, "setDbName":true, "setOwner":true,
  "setViewOriginalText":false, "setViewExpandedText":false, "setTableType":true,
  "setPrivileges":false, "setCreateTime":true
}
As you can see, the metadata of the modified table contains the newly added column age.
Conclusion
In this article, we showed how to obtain metadata from Hive to automate metadata management, demonstrating the basic use of Hive Hooks and Metastore Listeners. With either approach, you can push the collected metadata to Kafka to build your own metadata management system.
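As one possible next step, each hook or listener event can be wrapped in a message envelope before being produced to Kafka. The sketch below is illustrative only: the envelope fields, the "hive-metadata" topic name, and the "hive-hook" source label are invented for this example, and the actual producer call is left as a comment:

```python
import json
import time

def build_metadata_message(event_type, entity_json):
    """Wrap a hook/listener metadata payload in a Kafka-ready envelope."""
    envelope = {
        "eventType": event_type,        # e.g. CREATETABLE, ALTERTABLE_ADDCOLS
        "timestamp": int(time.time()),  # collection time in seconds
        "source": "hive-hook",          # producer identity (illustrative)
        "payload": json.loads(entity_json),
    }
    return json.dumps(envelope).encode("utf-8")

msg = build_metadata_message("CREATETABLE", '{"tableName": "testposthook", "dbName": "default"}')
# A real producer (e.g. kafka-python) would then send the bytes:
# KafkaProducer(bootstrap_servers="...").send("hive-metadata", msg)
print(json.loads(msg)["payload"]["tableName"])  # testposthook
```

A consumer on the other side can then deserialize the envelope and write the payload into the metadata store of your choice.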