Preface

Big Data Platform Building | Hadoop Cluster Setup (Part 1)

1. Introduction
- Based on Hive 3.1.2
- Hive download address
- Hive runs on Hadoop 3.x
- Depends on the JDK 1.8 environment
2. Architecture
- Hive essentially stores the mapping relationships (metadata) between HDFS files and tables/databases, and then lets you access file data through SQL as if it were structured table data. It does this by translating the SQL and handing it to a calculation engine to compute the query results.
Metadata (MetaStore):
- The mapping between HDFS files, tables, and databases. By default it is stored in the built-in Derby database; in practice it is usually configured to be stored in MySQL.

Driver:
- SQL Parser: converts the SQL string into an abstract syntax tree (AST) and parses the AST.
- Physical Plan Compiler: compiles the AST to produce a logical execution plan.
- Query Optimizer: optimizes the logical execution plan.
- Executor: converts the logical execution plan into an executable physical plan (such as a MapReduce or Spark job).
Client:
- Hive provides various access methods, such as the Hive shell (CLI), JDBC/ODBC (Java access to Hive), and Beeline.
3. Server planning
- Any server that wants to use Hive can be deployed as a server or as a client. A client does not start the MetaStore service (or the HiveServer2 service) itself; it only connects to the services on other servers. For example, hadoop300 starts the MetaStore service, while hadoop301 and hadoop302 only need to configure the MetaStore service address (e.g. thrift://hadoop300:9083) to access Hive.

Tip: if two Hive servers both act as servers, you can configure high availability for the metadata service.
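As a sketch of such a highly-available setup (hypothetical here, assuming a second MetaStore service also runs on hadoop301), clients can list multiple MetaStore addresses in hive.metastore.uris, separated by commas, so a client can fall back to another instance:

```xml
<!-- Hypothetical HA configuration: MetaStore services on both hadoop300 and hadoop301 -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hadoop300:9083,thrift://hadoop301:9083</value>
</property>
```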
| | hadoop300 | hadoop301 | hadoop302 |
| --- | --- | --- | --- |
| hive | √ | | |
4. Hive access mode
- The essence of accessing Hive is accessing the metadata stored in MySQL.

The 3 access flows:

1. Hive client ----> MySQL (metadata)
2. Hive client ----> MetaStore service ----> MySQL (metadata)
3. Hive client ----> HiveServer2 service ----> MetaStore service ----> MySQL (metadata)
1. Direct connection to MySQL

- A Hive client can access the Hive metadata directly by configuring the connection information of the MySQL database where the metadata resides.
- If neither the MetaStore service nor the HiveServer2 service is configured, this direct connection mode is used by default; the Hive shell client can then access the metadata without starting either service.
- This mode is suitable for local access and requires no additional services, but every client has to hold the MySQL connection information.
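A minimal hive-site.xml for this direct-connection mode might contain only the JDBC properties and no hive.metastore.uris (a sketch, reusing the hypothetical www.burukeyou.com / Hadoops_Hive names that appear in the full configuration later in this article):

```xml
<!-- Sketch: direct-connection mode, no MetaStore service configured -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://www.burukeyou.com:3306/Hadoops_Hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
```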
2. MetaStore metadata service mode

- The MetaStore is a Thrift service that must be started manually; clients connect to it to access Hive.
- A MetaStore service is started in front of MySQL (the metadata store) to shield the MySQL connection details: a client first connects to the MetaStore service, which then connects to MySQL to obtain the metadata.
- This mode is used whenever the hive.metastore.uris parameter is configured.
- The MetaStore is mainly responsible for metadata access, i.e. table structures and database information.
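In this mode, a pure client machine (hadoop301 or hadoop302 in the planning above) only needs the service address, not the MySQL credentials; a sketch of the client-side configuration:

```xml
<!-- Sketch: client-side configuration when the MetaStore service runs on hadoop300 -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hadoop300:9083</value>
</property>
```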
3. HiveServer2 service mode

- HiveServer2 is also a Thrift service that must be started manually; clients connect to it to access Hive.
- It is a service started on top of the MetaStore service, mainly used for remote access to Hive data, for example from Python or Java programs. The Beeline client also accesses Hive data through HiveServer2.
5. Installation
Download and decompress
[hadoop@hadoop300 app]$ pwd
/home/hadoop/app
drwxrwxr-x. 12 hadoop hadoop 166 Feb 22 00:08 manager
lrwxrwxrwx   1 hadoop hadoop  47 Feb 21 12:33 hadoop -> /home/hadoop/app/manager/hadoop_mg/hadoop-3.1.3
lrwxrwxrwx   1 hadoop hadoop  54 Feb 22 00:04 hive -> /home/hadoop/app/manager/hive_mg/apache-hive-3.1.2-bin
Add Hive environment variables

- Edit the ~/.bash_profile file (vim ~/.bash_profile) and append:
# ================== Hive ==============
export HIVE_HOME=/home/hadoop/app/hive
export PATH=$PATH:$HIVE_HOME/bin
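After saving, reload the profile (source ~/.bash_profile) so the variables take effect; the same two lines are shown here applied directly to the current shell as a quick sanity check:

```shell
# The two lines appended to ~/.bash_profile, applied to the current shell
export HIVE_HOME=/home/hadoop/app/hive
export PATH=$PATH:$HIVE_HOME/bin

# Verify the result
echo "$HIVE_HOME"    # prints /home/hadoop/app/hive
```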
Hive configuration

Edit ${HIVE_HOME}/conf/hive-site.xml

- If the file does not exist, create it by copying the ${HIVE_HOME}/conf/hive-default.xml.template file.
- You need to configure how and where the Hive metadata is stored. The default is the built-in Derby database; here it is saved to MySQL, so the MySQL connection properties must be configured, together with the MetaStore and HiveServer2 services.
<configuration>
  <!-- MetaStore service startup address (multiple addresses can be configured, separated by commas) -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop300:9083</value>
  </property>

  <!-- JDBC connection URL of the MySQL database Hadoops_Hive that stores the metadata (created automatically if it does not exist) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://www.burukeyou.com:3306/Hadoops_Hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
  </property>

  <!-- JDBC driver class -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>

  <!-- JDBC connection username -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>

  <!-- JDBC connection password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>

  <!-- HDFS directory for Hive table data (default is /user/hive/warehouse) -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/warehouse</value>
  </property>

  <!-- HiveServer2 startup port -->
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>

  <!-- HiveServer2 startup address -->
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop300</value>
  </property>

  <!-- Do not verify the Hive metadata store version -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

  <!-- Show column headers in query results on the Hive command line -->
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>

  <!-- Show the current database name on the Hive command line -->
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
</configuration>
After modifying the configuration, copy the MySQL driver package into the lib directory of ${HIVE_HOME}

[hadoop@hadoop300 ~]$ cp mysql-connector-java-5.1.46.jar /home/hadoop/app/hive/lib/
3. Initialize the metadata table structures and data in the database

[hadoop@hadoop300 ~]$ schematool -initSchema -dbType mysql
4. You can then see the generated metadata tables in the Hadoops_Hive database
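For example, a quick look inside MySQL (a hypothetical check; DBS and TBLS are among the tables the schematool creates in the metastore schema):

```sql
-- Inspect the metastore schema created by schematool (run in MySQL)
use Hadoops_Hive;
show tables;                            -- lists metastore tables such as DBS, TBLS, COLUMNS_V2
select NAME, DB_LOCATION_URI from DBS;  -- registered Hive databases and their HDFS locations
```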
6. Use the Hive client
6.1 Hive CLI (interactive client)
- Because the hive.metastore.uris parameter is configured here, the MetaStore service must be started first so that Hive clients can access the metadata through it. If the parameter is not configured, the service does not need to be started.
[hadoop@hadoop300 ~]$ hive --service metastore
- Run the hive command to enter the interactive command line
[hadoop@hadoop300 conf]$ hive
hive (default)> show tables;
OK
tab_name
student
user
6.2 Beeline

- Beeline accesses Hive through HiveServer2, so start HiveServer2 first

[hadoop@hadoop300 shell]$ hive --service hiveserver2
After HiveServer2 starts, you can access its WebUI at http://hadoop300:10002
Start the Beeline client and connect to hiveServer2
beeline -u jdbc:hive2://hadoop300:10000 -n hadoop
[hadoop@hadoop300 shell]$ beeline -u jdbc:hive2://hadoop300:10000 -n hadoop
Connecting to jdbc:hive2://hadoop300:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
# check all tables
0: jdbc:hive2://hadoop300:10000> show tables;
INFO : Compiling command(queryId=hadoop_20210227161152_17b6a6dd-bcd2-4ab0-8bd4-be600ae07069): show tables
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name.type:string.comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hadoop_20210227161152_17b6a6dd-bcd2-4ab0-8bd4-be600ae07069); Time taken: 1.044 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hadoop_20210227161152_17b6a6dd-bcd2-4ab0-8bd4-be600ae07069): show tables
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hadoop_20210227161152_17b6a6dd-bcd2-4ab0-8bd4-be600ae07069); Time taken: 0.06 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+-----------+
| tab_name |
+-----------+
| student |
| user |
+-----------+
3 rows selected (1.554 seconds)
Test

1. Create an external partitioned teacher table
create external table if not exists teacher (
`id` int,
`name` string,
`age` int COMMENT 'age'
) COMMENT 'Teacher table'
partitioned by (`date` string COMMENT 'Partition date')
row format delimited fields terminated by '\t'
stored as parquet
location '/warehouse/demo/teacher'
tblproperties ("parquet.compress"="SNAPPY");
2. Insert data
insert overwrite table teacher partition(`date`='2021-02-29')
select 3, "jayChou",49;
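Since hive.cli.print.header is enabled in the configuration above, querying the new partition should echo the column names as well; a small check (assuming the teacher table and partition created above):

```sql
-- List partitions and query the one that was just written
show partitions teacher;
select id, name, age from teacher where `date` = '2021-02-29';
```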
3. Check HDFS