Preface

Recently I have been reading “Massive Data Processing and Big Data Technology Combat” by Ice River. The book covers hands-on material for a number of Hadoop-based big data frameworks and balances theory with practice, which is rare among technical books on the market. In this article, I share 7 tips for learning the Hive command line. If you find it useful, remember to like, favorite, and share ⭐

Hive Command Description

Among all the connection modes provided by Hive, the CLI is the most commonly used. With the Hive CLI, users can perform various operations on Hive databases, tables, and data.

1. Hive command options

After Hadoop is started on the server, run the hive command to enter the Hive CLI. You can also run the following command to view the Hive command options:

        hive --help

The Hive command options are displayed, showing that you can pass --service serviceName to start a particular service. Some important services are described as follows:

(1) cli: the command line interface (the default service)

(2) hiveserver2: starts the HiveServer2 service, which listens for connections from other processes

(3) jar: an extension of the hadoop jar command, used to run applications that need the Hive environment

(4) metastore: starts the Hive metastore service

On the command line of the CentOS 6.8 server, run the following command to view the options of the Hive CLI:

        hive --help --service cli

The options are described as follows:

(1) -d, --define <key=value>: defines a variable for use in Hive commands, for example -d A=B or --define A=B

(2) --database <databasename>: specifies the database to use

(3) -e <quoted-query-string>: executes the SQL statement given on the command line

(4) -f <filename>: executes the SQL statements in a file

(5) -H, --help: displays the help information

(6) --hiveconf <property=value>: sets a Hive property value, which overrides the value configured in hive-site.xml

(7) --hivevar <key=value>: defines a variable that is substituted into Hive commands

(8) -i <filename>: runs an initialization SQL file before anything else

(9) -S, --silent: enables silent mode in the interactive shell

(10) -v, --verbose: verbose mode; echoes the executed SQL to the console
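As a hedged illustration of how several of these options combine, the sketch below passes a variable with --hivevar and runs one statement with -e in silent mode. The variable name tbl is hypothetical and chosen only for this example; testdb.student is the table queried later in this article. The hive call is skipped when the binary is not on the PATH.

```shell
#!/bin/sh
# Single quotes keep the shell from expanding ${hivevar:tbl};
# Hive substitutes it with the value given to --hivevar.
QUERY='select count(*) from ${hivevar:tbl}'
TABLE='testdb.student'   # table used elsewhere in this article

if command -v hive >/dev/null 2>&1; then
    # -S: silent mode, --hivevar: variable substitution, -e: run the quoted statement
    hive -S --hivevar tbl="$TABLE" -e "$QUERY"
else
    echo "hive not found on PATH; would run: hive -S --hivevar tbl=$TABLE -e '$QUERY'"
fi
```

Keeping the table name outside the query text makes it easy to reuse the same statement against another table.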

2. Use the Hive command

Run the hive command to enter the Hive CLI, then write a query:

hive (default)> select * from testdb.student;
student.s_id  student.s_name  student.s_birth  student.s_sex
01  Yongchang  1990-01-01  Male
02  Hongzhe    1990-12-21  Male
03  Wenjing    1990-05-20  Male
04  Li Yun     1990-08-06  Male
05  Miao Zhi   1991-12-01  Female
06  Xue Hui    1992-03-01  Female
07  Qiu Xiang  1989-07-01  Female
08  Wang Li    1990-01-20  Female
Time taken: 1.197 seconds, Fetched: 8 row(s)

Often you do not need to open the interactive CLI just to run a query. Use the hive -e command instead, as follows:

[root@node01 hive-1.1.0-cdh5.14.0]# hive -e "select count(*) from testdb.student"
Logging initialized using configuration in jar:file:/export/servers/hive-1.1.0-cdh5.14.0/lib/hive-common-1.1.0-cdh5.14.0.jar!/hive-log4j.properties
Query ID = root_20201108231818_becc7952-05a5-49fc-915d-b6648d429f08
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1604845856822_0001, Tracking URL = http://node01:8088/proxy/application_1604845856822_0001/
Kill Command = /export/servers/hadoop-server-cdh5.14.0/bin/hadoop job -kill job_1604845856822_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-11-08 23:18:36,501 Stage-1 map = 0%,  reduce = 0%
2020-11-08 23:18:37,649 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.88 sec
2020-11-08 23:18:38,739 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.18 sec
MapReduce Total cumulative CPU time: 2 seconds 180 msec
Ended Job = job_1604845856822_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.18 sec   HDFS Read: 11544 HDFS Write: 580032 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 180 msec
OK
_c0
8

If you do not want so much log output, add the -S option to hive, as shown below:

[root@node01 hive-1.1.0-cdh5.14.0]# hive -S -e "select count(*) from testdb.student"
_c0
8
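Because silent mode leaves only the result on standard output, it is convenient for capturing a value in a shell script. The following is a minimal sketch, assuming column headers are enabled as in the output above, so the value is the last line. The helper name last_line is hypothetical. When hive is not installed, the sketch replays the two output lines shown above instead of querying.

```shell
#!/bin/sh
# Keep only the last line of the output: with headers enabled, the first
# line is the column name (_c0) and the last line is the value.
last_line() {
    tail -n 1
}

if command -v hive >/dev/null 2>&1; then
    ROW_COUNT=$(hive -S -e "select count(*) from testdb.student" | last_line)
else
    # No Hive available: replay the output lines from the example above.
    ROW_COUNT=$(printf '_c0\n8\n' | last_line)
fi

echo "testdb.student rows: $ROW_COUNT"
```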

To execute multiple statements at once, save them to a script file. The .hql extension is conventional, although this example names the file test.sql:

vim test.sql
select count(*) from testdb.student;
select * from testdb.student;

Run the hive -f command to execute the statements in the file, as follows:

[root@node01 tmpfile]# hive -f test.sql
_c0
8
Time taken: 11.551 seconds, Fetched: 1 row(s)
OK
student.s_id  student.s_name  student.s_birth  student.s_sex
... 1992-05-20  Male
04  Li Yun     1990-08-06  Male
05  Miao Zhi   1991-12-01  Female
06  Xue Hui    1992-03-01  Female
07  Qiu Xiang  1989-07-01  Female
08  Wang Li    1990-01-20  Female
Time taken: 0.071 seconds, Fetched: 8 row(s)

You can add comments with '--', as follows:

hive (default)> select * from testdb.student;
student.s_id  student.s_name  student.s_birth  student.s_sex
01  Yongchang  1990-01-01  Male
02  Hongzhe    1990-12-21  Male
03  Wenjing    1990-05-20  Male
04  Li Yun     1990-08-06  Male
05  Miao Zhi   1991-12-01  Female
06  Xue Hui    1992-03-01  Female
07  Qiu Xiang  1989-07-01  Female
08  Wang Li    1990-01-20  Female
Time taken: 0.073 seconds, Fetched: 8 row(s)
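To make the comment syntax concrete, here is a small sketch that writes a script with '--' comment lines and runs it with hive -f. The temporary file name and the comment wording are hypothetical; the two statements are the ones from test.sql above. The hive call is skipped when the binary is not on the PATH.

```shell
#!/bin/sh
# Hypothetical temporary script file for this example.
SCRIPT=$(mktemp "${TMPDIR:-/tmp}/hive_demo.XXXXXX")

# The quoted 'EOF' stops the shell from touching the HQL text.
cat > "$SCRIPT" <<'EOF'
-- count the rows in the student table
select count(*) from testdb.student;
-- list every row
select * from testdb.student;
EOF

if command -v hive >/dev/null 2>&1; then
    hive -f "$SCRIPT"
else
    echo "hive not found on PATH; script written to $SCRIPT"
fi
```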

3. The .hiverc file

The .hiverc file is located in the ${HIVE_HOME}/bin directory. Hive loads this file when the CLI starts, so we can put commonly used settings in it, as follows:

cd /export/servers/hive-1.1.0-cdh5.14.0/bin
vim .hiverc

-- Query that runs automatically each time the CLI starts
select * from testdb.student;
-- Show the current database in the prompt
set hive.cli.print.current.db=true;
-- Show column names in query results
set hive.cli.print.header=true;
-- Enforce bucketing when writing to bucketed tables
set hive.enforce.bucketing=true;
-- Compress Hive intermediate results
set hive.exec.compress.intermediate=true;
-- Use the BZip2 codec for map-side output
set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
-- Compress Hive output
set hive.exec.compress.output=true;
-- Use the BZip2 codec for MapReduce output in Hive
set mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
-- Let Hive switch to local mode automatically for small jobs
set hive.exec.mode.local.auto=true;

For example, if we add an HQL query statement to the .hiverc file, the statement is executed automatically every time we start the Hive command line.

Entering the hive command to start the CLI shows that the statements in the .hiverc file are executed automatically.

When we need certain commands frequently, we can save them in the .hiverc file.
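Besides the automatically loaded .hiverc, the -i option listed earlier loads an initialization file explicitly, which is handy for trying settings without editing the shared file. The sketch below is only an illustration: the temporary file path is hypothetical, and it reuses two of the settings shown above.

```shell
#!/bin/sh
# Hypothetical temporary init file for this example.
INIT_FILE=$(mktemp "${TMPDIR:-/tmp}/hiverc_demo.XXXXXX")

cat > "$INIT_FILE" <<'EOF'
-- show the current database in the prompt
set hive.cli.print.current.db=true;
-- print column names with query results
set hive.cli.print.header=true;
EOF

if command -v hive >/dev/null 2>&1; then
    # -i runs the init file before the statement given to -e
    hive -i "$INIT_FILE" -e "select * from testdb.student"
else
    echo "hive not found on PATH; init file written to $INIT_FILE"
fi
```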

4. Hive operation command history

Hive records the most recent 10000 commands in the .hivehistory file in the current user's home directory. You can run the following command to view the file (the current user here is root):

        cat /root/.hivehistory
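Since the history file can grow long, it is often enough to look at the most recent entries. A minimal sketch, assuming the file sits at the default location in the current user's home directory; the function name recent_hive_commands is hypothetical.

```shell
#!/bin/sh
# Print the last N lines (default 10) of a Hive history file.
recent_hive_commands() {
    history_file=$1
    count=${2:-10}
    if [ -f "$history_file" ]; then
        tail -n "$count" "$history_file"
    else
        echo "no history file at $history_file" >&2
        return 1
    fi
}

# "|| true" keeps the script going when no history exists yet.
recent_hive_commands "${HOME}/.hivehistory" 5 || true
```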

5. Run system commands on the Hive CLI

It is easy to run operating system commands on the Hive CLI. Prefix the system command with "!" and end it with ";", as follows:

hive (default)> !echo "hello world";
"hello world"
hive (default)> !jps;
11985 ResourceManager
18308 Jps
12420 RunJar
12085 NodeManager
12519 RunJar
11545 NameNode
18138 RunJar
11837 SecondaryNameNode
11646 DataNode

6. Run the Hadoop command on the Hive CLI

To run Hadoop commands on the Hive CLI, drop the leading hdfs from the command and end it with ";".

For example, run the following command on the operating system command line:

[root@node01 ~]# hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - root supergroup          0 2020-01-03 02:28 /aa
drwxr-xr-x   - PC   supergroup          0 2020-03-30 10:33 /aaa
drwxr-xr-x   - root supergroup          0 2019-12-27 05:42 /abc

Run the Hadoop command on the Hive cli as follows:

hive (default)> dfs -ls /;
Found 3 items
drwxr-xr-x   - root supergroup          0 2020-01-03 02:28 /aa
drwxr-xr-x   - PC   supergroup          0 2020-03-30 10:33 /aaa
drwxr-xr-x   - root supergroup          0 2019-12-27 05:42 /abc

As you can see, the results are consistent.
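The two prefixes from tips 5 and 6 can also be placed in a script file. The following is a sketch only, assuming that "!" and dfs commands behave in a script run with hive -f the same way they do at the interactive prompt; the temporary file name is hypothetical.

```shell
#!/bin/sh
# Hypothetical temporary script file for this example.
SCRIPT=$(mktemp "${TMPDIR:-/tmp}/hive_shell_demo.XXXXXX")

cat > "$SCRIPT" <<'EOF'
-- operating system command: prefix with ! and end with ;
!echo "hello world";
-- HDFS command: drop the leading "hdfs" and end with ;
dfs -ls /;
EOF

if command -v hive >/dev/null 2>&1; then
    hive -S -f "$SCRIPT"
else
    echo "hive not found on PATH; script written to $SCRIPT"
fi
```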

7. Display the query field name on the Hive CLI

To show field names in query results on the Hive CLI, set the hive.cli.print.header property to true (the default value is false), as shown below:

hive (default)> set hive.cli.print.header=true;
hive (default)> select * from testdb.student;
student.s_id  student.s_name  student.s_birth  student.s_sex
01  Yongchang  1990-01-01  Male
02  Hongzhe    1990-12-21  Male
03  Wenjing    1990-05-20  Male
04  Li Yun     1990-08-06  Male
05  Miao Zhi   1991-12-01  Female
06  Xue Hui    1992-03-01  Female
07  Qiu Xiang  1989-07-01  Female
08  Wang Li    1990-01-20  Female
Time taken: 0.056 seconds, Fetched: 8 row(s)
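The same property can be set for a single non-interactive run with the --hiveconf option listed earlier, without touching .hiverc. A minimal sketch, with the hive call skipped when the binary is not on the PATH:

```shell
#!/bin/sh
# Property from this tip, passed for one invocation only.
HEADER_CONF='hive.cli.print.header=true'

if command -v hive >/dev/null 2>&1; then
    # --hiveconf overrides the configured value for this run
    hive -S --hiveconf "$HEADER_CONF" -e "select * from testdb.student"
else
    echo "hive not found on PATH; would run: hive -S --hiveconf $HEADER_CONF -e \"select * from testdb.student\""
fi
```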

Summary

The more you know about Hive, the more you realize you don't know. I'm Alice, and I'll see you next time!

Like, favorite, and share: make it a habit ~

This article series is updated continuously. Search for “ape man bacteria” on WeChat to read new posts first and to find mind maps, big data books, high-frequency big data interview questions, and many interview write-ups from top-tier companies… Looking forward to your follow!