If the Metastore or HiveServer2 goes down, Hive stays unavailable until the failed service is brought back up, which can take a considerable amount of time. To avoid such interruptions, deploy both Hive Metastore HA and HiveServer2 HA in production environments.
The following sections describe how to implement high availability for HiveServer2 and for the Metastore.
HiveServer2 high availability
Since version 0.14, Hive has used ZooKeeper to implement HiveServer2 high availability (ZooKeeper Service Discovery). Clients connect to HiveServer2 by specifying a ZooKeeper namespace instead of a fixed host and port. This section walks through the high-availability configuration of HiveServer2.
Suppose you want to run two HiveServer2 instances, one on node01 and one on node03, and complete the HA configuration using ZooKeeper.
Modify the hive-site.xml configuration on node01
<property>
<name>hive.server2.support.dynamic.service.discovery</name>
<value>true</value>
</property>
<property>
<name>hive.server2.zookeeper.namespace</name>
<value>hiveserver2_zk</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>node01:2181,node02:2181,node03:2181</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
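Before restarting anything, it can be worth confirming that the service-discovery properties really made it into hive-site.xml. Below is a minimal sketch in Python, assuming the file is wrapped in the usual <configuration> root element. The helper names (load_properties, missing_discovery_settings) and the inline SAMPLE are illustrative only and are not part of Hive.

```python
import xml.etree.ElementTree as ET

# Properties this article sets for ZooKeeper service discovery.
REQUIRED = (
    "hive.server2.support.dynamic.service.discovery",
    "hive.server2.zookeeper.namespace",
    "hive.zookeeper.quorum",
)


def load_properties(xml_text):
    """Return {name: value} for every <property> in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # Strip whitespace so a value with a stray leading space still compares cleanly.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_discovery_settings(props):
    """List required properties that are absent or empty, plus the flag if it is not 'true'."""
    problems = [name for name in REQUIRED if not props.get(name)]
    flag = "hive.server2.support.dynamic.service.discovery"
    if props.get(flag) and props[flag].lower() != "true":
        problems.append(flag)
    return problems


# Hypothetical sample mirroring the settings shown above.
SAMPLE = """<configuration>
  <property><name>hive.server2.support.dynamic.service.discovery</name><value>true</value></property>
  <property><name>hive.server2.zookeeper.namespace</name><value>hiveserver2_zk</value></property>
  <property><name>hive.zookeeper.quorum</name><value> node01:2181,node02:2181,node03:2181</value></property>
</configuration>"""

if __name__ == "__main__":
    # An empty list means all three settings are present.
    print(missing_discovery_settings(load_properties(SAMPLE)))
```

To check a real file, read it with open(path).read() and pass the text to load_properties; the path depends on your installation.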
Synchronize and modify the configuration on node03
Copy the hive folder to node03, then modify hive-site.xml on node03 as follows:
<property>
<name>hive.server2.thrift.bind.host</name>
<value>node03</value>
</property>
Restart the service
Restart the HiveServer2 and Metastore services on node01 and node03:
nohup hive --service hiveserver2 >> /opt/module/apache-hive-2.1.1-bin/hiveserver.log 2>&1 &
nohup hive --service metastore >> /opt/module/apache-hive-2.1.1-bin/metastore.log 2>&1 &
Check the configuration in ZooKeeper
After modifying the configuration, open the ZooKeeper command-line client (zookeeper-client on CDH, zkCli.sh on Apache ZooKeeper) and check whether HiveServer2 has registered itself under the configured namespace:
[zk: localhost:2181(CONNECTED) 1] ls /hiveserver2_zk
[serverUri=0.0.0.0:10001;version=2.1.1-cdh6.3.2;sequence=0000000000]
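Each child znode under the namespace is named as a semicolon-separated list of key=value pairs. As a rough illustration of how a client could read such an entry, here is a small Python sketch that splits the name into fields and extracts the host and port. The function names are hypothetical, and the parsing is based only on the entry format shown in the listing above, not on Hive's own client code.

```python
def parse_znode(name):
    """Split a 'key=value;key=value' znode name into a dict."""
    fields = {}
    for part in name.strip("[] ").split(";"):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments that carry no '='
            fields[key.strip()] = value.strip()
    return fields


def server_address(name):
    """Return (host, port) taken from the serverUri field of a znode name."""
    fields = parse_znode(name)
    host, _, port = fields["serverUri"].rpartition(":")
    return host, int(port)


if __name__ == "__main__":
    entry = "serverUri=0.0.0.0:10001;version=2.1.1-cdh6.3.2;sequence=0000000000"
    print(parse_znode(entry))
    print(server_address(entry))
```

Note that a host of 0.0.0.0 in serverUri is a wildcard bind address rather than a routable hostname, so it is worth checking that each instance registers an address that remote clients can actually reach.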
Beeline connection test
beeline> !connect jdbc:hive2://node01:2181,node02:2181,node03:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk
The URL format and parameter meanings of the JDBC connection are as follows:
jdbc:hive2://<zookeeper quorum>/<dbName>;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Parameter meanings:
<zookeeper quorum>: the ZooKeeper cluster connection string, such as node01:2181,node02:2181,node03:2181
<dbName>: the Hive database to connect to; the default is default
serviceDiscoveryMode=zooKeeper: specifies that the ZooKeeper service discovery mode is used
zooKeeperNamespace=hiveserver2: specifies the namespace in ZooKeeper, as defined by the hive.server2.zookeeper.namespace parameter (hiveserver2_zk in this article)
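Since the URL is easy to mistype (stray spaces after the semicolons will break it), it can help to assemble it from its parts. The following Python sketch builds the connection string in the format described above; build_hs2_zk_url is a hypothetical helper written for this article, not a Hive API.

```python
def build_hs2_zk_url(quorum, namespace, db_name=""):
    """Build a HiveServer2 JDBC URL that uses ZooKeeper service discovery.

    quorum    -- iterable of (host, port) pairs for the ZooKeeper ensemble
    namespace -- value of hive.server2.zookeeper.namespace
    db_name   -- Hive database; empty means the server-side default
    """
    hosts = ",".join(f"{host}:{port}" for host, port in quorum)
    return (
        f"jdbc:hive2://{hosts}/{db_name}"
        f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


if __name__ == "__main__":
    zk = [("node01", 2181), ("node02", 2181), ("node03", 2181)]
    print(build_hs2_zk_url(zk, "hiveserver2_zk"))
```

With the three nodes and the hiveserver2_zk namespace used here, the result is the same string passed to !connect in the Beeline test.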
Metastore high availability
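How it works
General connection principle: clients (including HiveServer2) connect to a single Metastore service through the address given in hive.metastore.uris.
High availability principle: several Metastore instances run at the same time, all of them are listed in hive.metastore.uris, and a client that cannot reach one instance connects to another.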
Modifying Node Configurations
Modify the Hive configuration file hive-site.xml on node01 and node03:
<property>
<name>hive.metastore.uris</name>
<value>thrift://node01:9083,thrift://node03:9083</value>
</property>
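To make the idea behind a comma-separated URI list concrete, here is a small Python sketch that parses the hive.metastore.uris value and returns the first endpoint that accepts a TCP connection. It only illustrates the failover idea under simple assumptions (try the endpoints in listed order, treat an open port as "alive"); it is not how the Hive client is implemented, and all function names are hypothetical.

```python
import socket
from urllib.parse import urlparse


def parse_metastore_uris(value):
    """Turn 'thrift://h1:9083,thrift://h2:9083' into [(host, port), ...]."""
    endpoints = []
    for uri in value.split(","):
        parsed = urlparse(uri.strip())
        if parsed.scheme != "thrift" or parsed.hostname is None or parsed.port is None:
            raise ValueError(f"not a thrift://host:port URI: {uri!r}")
        endpoints.append((parsed.hostname, parsed.port))
    return endpoints


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(endpoints, probe=tcp_probe):
    """Return the first endpoint for which probe() succeeds, or None."""
    for host, port in endpoints:
        if probe(host, port):
            return host, port
    return None


if __name__ == "__main__":
    uris = "thrift://node01:9083,thrift://node03:9083"
    print(first_reachable(parse_metastore_uris(uris)))
```

The probe argument is injectable so the selection logic can be exercised without a running cluster, for example by passing a function that pretends node01 is down.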
Restart the service
Restart the HiveServer2 and Metastore services on node01 and node03:
nohup hive --service hiveserver2 >> /opt/module/apache-hive-2.1.1-bin/hiveserver.log 2>&1 &
nohup hive --service metastore >> /opt/module/apache-hive-2.1.1-bin/metastore.log 2>&1 &
Validation test
Verify that HiveServer2 is highly available
On node03, find the HiveServer2 process listening on port 10000 and kill it:
[root@node03 logs]$ netstat -ntpl |grep 10000
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp6 0 0 :::10000 :::* LISTEN 87776/java
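If you script this failover test, the PID can be pulled out of the netstat output instead of being read by eye. The sketch below is a hedged Python example that assumes the column layout of netstat -ntpl shown above (protocol, queues, local address, foreign address, state, PID/program); pid_listening_on is a hypothetical helper name.

```python
def pid_listening_on(netstat_output, port):
    """Return the PID of the process listening on `port`, or None if not found."""
    for line in netstat_output.splitlines():
        cols = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(cols) < 7 or cols[5] != "LISTEN":
            continue
        if cols[3].rsplit(":", 1)[-1] != str(port):
            continue
        pid, _, _program = cols[6].partition("/")
        if pid.isdigit():  # netstat prints '-' when the PID is not visible
            return int(pid)
    return None


if __name__ == "__main__":
    sample = (
        "(Not all processes could be identified, non-owned process info\n"
        " will not be shown, you would have to be root to see it all.)\n"
        "tcp6 0 0 :::10000 :::* LISTEN 87776/java\n"
    )
    print(pid_listening_on(sample, 10000))
```

The returned PID is the value you would then pass to kill on that node.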
Check the configuration in ZooKeeper
[zk: localhost:2181(CONNECTED) 1] ls /hiveserver2_zk
[serverUri=0.0.0.0:10001;version=2.1.1-cdh6.3.2;sequence=0000000000]
Beeline tests the connection
[root@node01 ~]# beeline
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.1.1-cdh6.3.2 by Apache Hive
beeline> !connect jdbc:hive2://node01:2181,node02:2181,node03:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk
Connecting to jdbc:hive2://node01:2181,node02:2181,node03:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk
Enter username for jdbc:hive2://node01:2181,node02:2181,node03:2181/: hive
Enter password for jdbc:hive2://node01:2181,node02:2181,node03:2181/:
21/07/01 09:31:18 [main]: INFO jdbc.HiveConnection: Connected to 0.0.0.0:10001
Connected to: Apache Hive (version 2.1.1-cdh6.3.2)
Driver: Hive JDBC (version 2.1.1-cdh6.3.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://node01:2181,node02:2181,node0> show tables;
INFO  : Compiling command(queryId=hive_20210701093131_c1958b66-3e2a-443d-8562-22f00f4bb463): show tables
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210701093131_c1958b66-3e2a-443d-8562-22f00f4bb463); Time taken: 1.164 seconds
INFO  : Executing command(queryId=hive_20210701093131_c1958b66-3e2a-443d-8562-22f00f4bb463): show tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20210701093131_c1958b66-3e2a-443d-8562-22f00f4bb463); Time taken: 0.046 seconds
INFO  : OK
+-----------+
| tab_name  |
+-----------+
| score4    |
| stu       |
+-----------+
2 rows selected (1.728 seconds)
Verify that Metastore is highly available
Kill the Metastore process on one of the two Metastore nodes
[root@node01 ~]# ps -ef | grep metastore
hive 19802 19786 2 09:23 ? 00:00:33 /usr/java/jdk1.8.0_181-cloudera/bin/java -Dproc_jar -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Xms576716800 -Xmx576716800 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hive_hive-HIVEMETASTORE-2c60fdac6f3da589eafb946bedf8838a_pid19802.hprof -XX:OnOutOfMemoryError=/opt/cloudera/cm-agent/service/common/killparent.sh -Dlog4j.configurationFile=hive-log4j2.properties -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.file=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive/bin/../conf/parquet-logging.properties -Dyarn.log.dir=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/logs -Dyarn.log.file=hadoop.log -Dyarn.home.dir=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/libexec/../../hadoop-yarn -Dyarn.root.logger=INFO,console -Djava.library.path=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive/lib/hive-service-2.1.1-cdh6.3.2.jar org.apache.hadoop.hive.metastore.HiveMetaStore -p 9083
root 27765 27722 0 09:49 pts/2 00:00:00 grep --color=auto metastore
kill -9 19802
Execute a query in the existing Beeline session to confirm that Hive still works:
0: jdbc:hive2://node01:2181,node02:2181,node0> select * from stu limit 1;
INFO  : Compiling command(queryId=hive_20210701093806_b5920638-19be-42fb-921f-b81206a1f35f): select * from stu limit 1
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:stu.id, type:int, comment:null), FieldSchema(name:stu.name, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210701093806_b5920638-19be-42fb-921f-b81206a1f35f); Time taken: 0.366 seconds
INFO  : Executing command(queryId=hive_20210701093806_b5920638-19be-42fb-921f-b81206a1f35f): select * from stu limit 1
INFO  : Completed executing command(queryId=hive_20210701093806_b5920638-19be-42fb-921f-b81206a1f35f); Time taken: 0.001 seconds
INFO  : OK
+---------+-----------+
| stu.id  | stu.name  |
+---------+-----------+
| 1       | zhangsan  |
+---------+-----------+
1 row selected (0.546 seconds)
The HA configuration of HiveServer2 and Metastore is now complete. It helps address several production problems, such as concurrency, load balancing, single points of failure, and security. It is therefore strongly recommended to provide Hive services in this mode in production environments.