Preface

Big Data Platform Building | Hadoop Cluster Setup (Part 1)

1. Introduction

  • Based on Hive 3.1.2
  • Hive download address
  • Hive runs on Hadoop 3.x
    • Depends on a JDK 1.8 environment

2. Architecture

  • Hive is essentially a layer of mapping relationships (metadata) between files stored on HDFS and tables/databases. On top of that mapping it lets you access the file data with SQL, as if it were structured table data: it translates the SQL and hands it to a computation engine to compute the query results

  • Metadata (MetaStore): the mapping between HDFS files, tables, and databases. By default it is stored in the built-in Derby database; in practice it is usually configured to be stored in MySQL
  • Driver (an EXPLAIN sketch after this list shows the plan it produces):
    • SQL Parser: converts the SQL string into an abstract syntax tree (AST) and parses the AST
    • Compiler (Physical Plan): compiles the AST into a logical execution plan
    • Query Optimizer: optimizes the logical execution plan
    • Execution: converts the logical execution plan into an executable physical plan (e.g. MapReduce or Spark)
  • Client: Hive offers various ways to access it, such as the Hive shell (CLI), JDBC/ODBC (Java access to Hive), and Beeline
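
To see the plan the Driver compiles for a statement, you can run EXPLAIN from any Hive client. A minimal sketch, assuming the student table that appears later in this post already exists:

# Show the execution plan Hive's Driver produces for a query
hive -e "EXPLAIN SELECT count(*) FROM student;"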

3. Server planning

  • Any server that wants to use Hive can be deployed as either a server or a client. A client does not start the MetaStore (or HiveServer2) service itself; it only connects to the services running on other servers. For example, hadoop300 starts the MetaStore service, while hadoop301 and hadoop302 only need to configure the MetaStore service address (e.g. thrift://hadoop300:9083) to access Hive; a minimal client-side configuration sketch follows the table below
  • Tip: if multiple Hive servers run as servers, you can configure high availability for the metadata service
          Hadoop300   Hadoop301   Hadoop302
hive          √
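
For a pure client such as hadoop301 or hadoop302, the only required setting is the MetaStore address. A minimal sketch of writing that client-side hive-site.xml (hostnames and paths follow this post's layout; adjust to your own):

# On a pure Hive client (e.g. hadoop301): point it at the MetaStore on hadoop300
cat > $HIVE_HOME/conf/hive-site.xml <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop300:9083</value>
  </property>
</configuration>
EOF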

4. Hive access modes

  • The essence of accessing Hive is accessing the metadata stored in MySQL

# Three access flows
1. Hive client ----> MySQL (metadata)
2. Hive client ----> MetaStore service ----> MySQL (metadata)
3. Hive client ----> HiveServer2 service ----> MetaStore service ----> MySQL (metadata)

1. Directly connect to MySQL

  • A Hive client can access Hive metadata directly by configuring the connection information of the MySQL instance where the metadata resides
  • If neither the MetaStore service nor the HiveServer2 service is configured, this direct-connection mode is used by default: the Hive shell client can access the metadata without starting either service
  • This mode suits local access where exposing the MySQL connection details is acceptable, since no extra services need to be started; a minimal configuration sketch follows
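
A minimal sketch of a direct-connection hive-site.xml: only the MySQL JDBC properties, no hive.metastore.uris (the URL and credentials are this post's example values; treat them as placeholders):

# Direct-connection mode: the client talks to MySQL itself
cat > $HIVE_HOME/conf/hive-site.xml <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://www.burukeyou.com:3306/Hadoops_Hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
EOF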

2. MetaStore metadata service mode

  • MetaStore is a Thrift service; it must be started manually, and clients connect to it to access Hive
  • It sits in front of MySQL (the metadata store) to hide the MySQL connection details: a client first connects to the MetaStore service, which in turn connects to MySQL to fetch the metadata
  • This mode is used when the hive.metastore.uris parameter is configured
  • It is mainly responsible for metadata access, i.e. table structures and database information

3. HiveServer2 service mode

  • HiveServer2 is also a Thrift service that must be started manually and connected to in order to access Hive
  • It is a service layered on top of the MetaStore service
  • It serves remote access to Hive data, e.g. from Python or Java programs; the Beeline client also goes through HiveServer2 (a non-interactive sketch follows)
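
Programmatic access goes through HiveServer2's JDBC endpoint. As a quick sketch of non-interactive access (host, port, and the student table follow this post's setup):

# Run a single query through HiveServer2 (JDBC) without an interactive session
beeline -u jdbc:hive2://hadoop300:10000 -n hadoop -e "SELECT * FROM student LIMIT 10;"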

5. Installation

Download and decompress

[hadoop@hadoop300 app]$ pwd
/home/hadoop/app
drwxrwxr-x. 12 hadoop hadoop 166 Feb 22 00:08 manager
lrwxrwxrwx   1 hadoop hadoop  47 Feb 21 12:33 hadoop -> /home/hadoop/app/manager/hadoop_mg/hadoop-3.1.3
lrwxrwxrwx   1 hadoop hadoop  54 Feb 22 00:04 hive -> /home/hadoop/app/manager/hive_mg/apache-hive-3.1.2-bin

Add Hive environment variables

  • Edit the ~/.bash_profile file: vim ~/.bash_profile
# ================== Hive ==============
export HIVE_HOME=/home/hadoop/app/hive
export PATH=$PATH:$HIVE_HOME/bin
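
Then reload the profile and verify that the hive command resolves (it should report version 3.1.2):

source ~/.bash_profile
hive --version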

Hive configuration

1. Configure ${HIVE_HOME}/conf/hive-site.xml

  • If the file does not exist, create it by copying ${HIVE_HOME}/conf/hive-default.xml.template
  • You need to configure how Hive metadata is stored. The default is the built-in Derby database; here it is saved to MySQL instead, so the MySQL connection properties must be configured, along with the MetaStore and HiveServer2 services

<configuration>
    <!-- MetaStore service address (multiple addresses may be configured, comma-separated) -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop300:9083</value>
    </property>

    <!-- JDBC URL of the MySQL instance storing the metadata; the Hadoops_Hive database is created if it does not exist -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://www.burukeyou.com:3306/Hadoops_Hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
    </property>
    <!-- JDBC driver class -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- JDBC connection username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <!-- JDBC connection password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>

    <!-- HDFS location of the Hive warehouse (default is /user/hive/warehouse) -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/warehouse</value>
    </property>

    <!-- HiveServer2 port -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <!-- HiveServer2 bind host -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop300</value>
    </property>

    <!-- Do not verify the Hive metastore schema version -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <!-- Print column headers in query results -->
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>

    <!-- Show the current database in the Hive CLI prompt -->
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>

</configuration>


2. After modifying the configuration, copy the MySQL driver JAR into the lib directory of ${HIVE_HOME}

[hadoop@hadoop300 ~]$ cp mysql-connector-java-5.1.46.jar /home/hadoop/app/hive/lib/

3. Initialize the metadata tables and data in the database

[hadoop@hadoop300 ~]$ schematool -initSchema -dbType mysql

4. You can now see the generated metadata tables in the Hadoops_Hive database, as shown below
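
A quick way to confirm, using the MySQL credentials from hive-site.xml (the listed tables, such as DBS, TBLS, and VERSION, belong to the standard Hive metastore schema):

# List the metastore tables that schematool created
mysql -u root -p123456 -e "USE Hadoops_Hive; SHOW TABLES;"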

6. Use the Hive client

6.1 Hive CLI (interactive client)

  • Because the MetaStore service is configured here (via the hive.metastore.uris parameter), it must be started first so that the Hive client can access the metadata through it. If hive.metastore.uris is not configured, no MetaStore service needs to be started
[hadoop@hadoop300 ~]$ hive --service metastore
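
Note that hive --service metastore runs in the foreground; in practice you would typically background it, for example (the log path is arbitrary):

# Keep the MetaStore service running after the shell exits
nohup hive --service metastore > ~/metastore.log 2>&1 &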
  • Run the hive command to enter the interactive command line
[hadoop@hadoop300 conf]$ hive
hive (default)> show tables;
OK
tab_name
student
user

6.2 Beeline

  • Beeline accesses Hive through HiveServer2, so start HiveServer2 first
[hadoop@hadoop300 shell]$ hive --service hiveserver2

After HiveServer2 starts, you can access its web UI at http://hadoop300:10002

Start the Beeline client and connect to HiveServer2

  • beeline -u jdbc:hive2://hadoop300:10000 -n hadoop (-u gives the JDBC URL, -n the username to connect as)
[hadoop@hadoop300 shell]$ beeline -u jdbc:hive2://hadoop300:10000 -n hadoop
Connecting to jdbc:hive2://hadoop300:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive

# check all tables
0: jdbc:hive2://hadoop300:10000> show tables;
INFO  : Compiling command(queryId=hadoop_20210227161152_17b6a6dd-bcd2-4ab0-8bd4-be600ae07069): show tables
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hadoop_20210227161152_17b6a6dd-bcd2-4ab0-8bd4-be600ae07069); Time taken: 1.044 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hadoop_20210227161152_17b6a6dd-bcd2-4ab0-8bd4-be600ae07069): show tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hadoop_20210227161152_17b6a6dd-bcd2-4ab0-8bd4-be600ae07069); Time taken: 0.06 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+-----------+
| tab_name  |
+-----------+
| student   |
| user      |
+-----------+
3 rows selected (1.554 seconds)


7. Test

1. Create an external partitioned teacher table

create external table if not exists teacher (
  `id` int,
  `name` string,
  `age` int COMMENT 'age'
) COMMENT 'Teacher table'
partitioned by (`date` string COMMENT 'Partition date')
row format delimited fields terminated by '\t'
stored as parquet
location '/warehouse/demo/teacher'
tblproperties ("parquet.compression"="SNAPPY");

2. Insert data

insert overwrite table teacher partition(`date`='2021-02-29')
select 3, "jayChou", 49;
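
To verify the write, read the partition back (a quick check; the partition value matches the insert above):

# Query the newly written partition (the backticks around date are escaped for the shell)
hive -e "SELECT * FROM teacher WHERE \`date\`='2021-02-29';"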

3. Check HDFS
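
The table's files live under the location given in the create statement, with one subdirectory per partition. A quick check (paths come from the DDL above):

# Partition directories appear under the table location, e.g. date=2021-02-29
hadoop fs -ls /warehouse/demo/teacher
hadoop fs -ls /warehouse/demo/teacher/date=2021-02-29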