First, a few terms:
- Metadata: Hive's metadata, including table names, column names, types, partitions, and users defined in Hive. It is generally stored in a relational database such as MySQL; during testing you can also use Hive's built-in Derby database.
- Metastore: the Hive metadata service. It provides HiveServer2 and other clients with access to the metadata stored in the database above.
- Hiveserver2: the Hive server, which exposes Hive as a service. Clients can connect to it through Beeline or JDBC (that is, from Java code).
- Beeline: a command-line client for connecting to Hive, analogous to the mysql client or a GUI tool such as Navicat.
Other languages access Hive mainly through HiveServer2. HiveServer2 (HS2) is a service that enables clients to execute queries against Hive. It supports embedded and remote modes, multi-client concurrency, and authentication, and is designed to provide better support for open-API clients such as JDBC and ODBC.
The default HiveServer2 port is 10000, and you can connect to Hive through Beeline, JDBC, or ODBC. When HiveServer2 starts, it checks whether hive.metastore.uris is configured. If it is not, HiveServer2 starts an embedded Metastore service and then starts itself; if it is configured, HiveServer2 connects to the remote Metastore service. The remote mode is the most commonly used. The deployment is shown below:
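As a sketch of the remote deployment (the host name here is a placeholder, not from the original setup), hive-site.xml on the HiveServer2 node would point at the remote Metastore service, which listens on port 9083 by default:

```xml
<!-- hive-site.xml on the HiveServer2 node; host name is a placeholder -->
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop-master:9083</value>
</property>
```

With this set, HiveServer2 connects to the existing Metastore instead of starting an embedded one; you can then verify connectivity with `beeline -u jdbc:hive2://hadoop-master:10000 -n hadoop`.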
Connecting to Hive from Python
To access Hive from Python 3, install the following dependencies:
- pip3 install thrift
- pip3 install PyHive
- pip3 install sasl
- pip3 install thrift_sasl
Here is a Python utility class for accessing Hive:
# -*- coding: utf-8 -*-
from pyhive import hive


class HiveClient(object):
    """docstring for HiveClient"""

    def __init__(self, host='hadoop-master', port=10000, username='hadoop',
                 password='hadoop', database='hadoop', auth='LDAP'):
        """
        create connection to hive server2
        """
        self.conn = hive.Connection(host=host,
                                    port=port,
                                    username=username,
                                    password=password,
                                    database=database,
                                    auth=auth)

    def query(self, sql):
        """
        query
        """
        with self.conn.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetchall()

    def insert(self, sql):
        """
        insert action
        """
        with self.conn.cursor() as cursor:
            cursor.execute(sql)
            # self.conn.commit()
            # self.conn.rollback()

    def close(self):
        """
        close connection
        """
        self.conn.close()
To use it, simply import the class, create an instance, pass in your SQL, and call the query method.
# Get a connection
hclient = HiveClient()
# execute a query
results = hclient.query('show tables')
# close the connection
hclient.close()
Note: in the insert method, I commented out self.conn.commit() and self.conn.rollback(); these are transactional operations for traditional relational databases and are not supported in Hive.
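pyhive follows the Python DB-API 2.0 (PEP 249) connection/cursor interface, so the pattern used in HiveClient can be illustrated with the standard-library sqlite3 module, which implements the same interface and runs without a Hive server (the table and data here are made up for illustration):

```python
# Illustration of the DB-API 2.0 cursor pattern that pyhive follows,
# using sqlite3 so it runs without a Hive server.
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE t (name TEXT)")
cursor.execute("INSERT INTO t VALUES ('hadoop')")
cursor.execute("SELECT name FROM t")
rows = cursor.fetchall()   # same call used in HiveClient.query
print(rows)                # [('hadoop',)]
cursor.close()
conn.close()
```

The connect/cursor/execute/fetchall sequence is identical in pyhive; only the connection parameters differ.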
Connecting to Hive from Java
Java is the foundational language of the big-data ecosystem, and connecting to Hive from Java is well supported. This section describes how to connect to Hive using JDBC and MyBatis.
1. JDBC connection
Java connects to HiveServer2 through JDBC, just as traditional JDBC code connects to MySQL.
The hive-jdbc dependency is required:
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
</dependency>
The code follows the usual MySQL routine, using DriverManager.getConnection(url, username, password):
@NoArgsConstructor
@AllArgsConstructor
@Data
@ToString
public class HiveConfigModel {
private String url = "jdbc:hive2://localhost:10000";
private String username = "hadoop";
private String password = "hadoop";
}
@Test
public void test() {
    // Initialize the configuration
    HiveConfigModel hiveConfigModel = ConfigureContext.getInstance("hive-config.properties")
            .addClass(HiveConfigModel.class)
            .getModelProperties(HiveConfigModel.class);

    try {
        Connection conn = DriverManager.getConnection(hiveConfigModel.getUrl(),
                hiveConfigModel.getUsername(), hiveConfigModel.getPassword());
        String sql = "show tables";
        PreparedStatement preparedStatement = conn.prepareStatement(sql);
        ResultSet rs = preparedStatement.executeQuery();
        List<String> tables = new ArrayList<>();
        while (rs.next()) {
            tables.add(rs.getString(1));
        }
        System.out.println(tables);
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
The driver is registered through the SPI file META-INF/services/java.sql.Driver inside the hive-jdbc jar, whose content is:
org.apache.hive.jdbc.HiveDriver
java.sql.DriverManager uses SPI to separate the service interface from the service implementation in order to achieve decoupling; here org.apache.hive.jdbc.HiveDriver provides the standardized implementation logic defined by java.sql.Driver. Clients using JDBC do not need to change their code; they simply bring in a different SPI service implementation:
DriverManager.getConnection(url, username, password)
This obtains the connection, provided the implementation complies with the corresponding SPI specification.
2. Integrating MyBatis
MyBatis is usually used for DAO-layer database access, and accessing Hive through it is similar.
Configuration file sqlconfig.xml:
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN"
"http://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration>
<environments default="production">
<environment id="production">
<transactionManager type="JDBC"/>
<dataSource type="POOLED">
<property name="driver" value="org.apache.hive.jdbc.HiveDriver"/>
<property name="url" value="jdbc:hive2://master:10000/default"/>
<property name="username" value="hadoop"/>
<property name="password" value="hadoop"/>
</dataSource>
</environment>
</environments>
<mappers>
<mapper resource="mapper/hive/test/test.xml"/>
</mappers>
</configuration>
The mapper interface code is omitted; the implementation class:
public class TestMapperImpl implements TestMapper {

    private static SqlSessionFactory sqlSessionFactory =
            HiveSqlSessionFactory.getInstance().getSqlSessionFactory();

    @Override
    public int getTestCount(String dateTime) {
        SqlSession sqlSession = sqlSessionFactory.openSession();
        TestMapper testMapper = sqlSession.getMapper(TestMapper.class);
        int count = testMapper.getTestCount(dateTime);
        sqlSession.close();
        return count;
    }
}
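For completeness, the omitted mapper XML (mapper/hive/test/test.xml, referenced in sqlconfig.xml) might look like the following. This is a hypothetical sketch: the namespace, table name, and column name are all assumptions, not taken from the original project.

```xml
<!-- Hypothetical mapper/hive/test/test.xml; namespace, table and column names are assumptions -->
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN"
        "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.example.dao.TestMapper">
    <select id="getTestCount" parameterType="string" resultType="int">
        SELECT COUNT(*) FROM test WHERE date_time = #{dateTime}
    </select>
</mapper>
```

The statement id must match the method name in the TestMapper interface, and the namespace must match the interface's fully qualified name.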