Hive command line

$HIVE_HOME/bin/hive is the interactive terminal that ships with Hive. It is in fact a shell script that launches the CLI for interactive queries.

However, Hive 0.11 introduced HiveServer2 and, with it, another terminal: Beeline, a JDBC-based SQL client. The Hive command terminal has been gradually deprecated in its favor.

Hive CLI is a legacy tool with two main usage scenarios. The first is as a thick client for SQL on Hadoop; the second is as a command-line tool for HiveServer (now known as "HiveServer1"). As of Hive 1.0, HiveServer has been deprecated, removed from the code base, and replaced by HiveServer2, so the second scenario no longer applies. For the first scenario, Beeline provides, or should provide, the same functionality, though it is implemented differently from Hive CLI.

Hive CLI was originally slated for deprecation, but because it is so widely used, the community instead built a new Beeline-based client that behaves like Hive CLI, requiring little or no change to existing user scripts. The old implementation is still used by default in some environments, so you can switch to the new one with the following configuration:

export USE_DEPRECATED_CLI=false

Every time you start a Hive client, you can see a corresponding process running on the machine
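You can verify this from another shell. A minimal check, assuming a Linux machine (the CLI is a Java process whose entry point is org.apache.hadoop.hive.cli.CliDriver):

# in a second terminal, while the Hive CLI is running
ps -ef | grep CliDriver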

-e command-line mode

With the -e parameter, we can query data with the hive command directly, without entering the Hive CLI; the SQL is supplied on the command line.

hive -e 'select a.foo from pokes a'

You can also specify other Hive parameters on the command line

hive -e 'select a.foo from pokes a' --hiveconf hive.exec.scratchdir=/opt/my/hive_scratch --hiveconf mapred.reduce.tasks=1

More often, though, we use redirection in this case, sending the execution results to a file; this is especially useful when there is a lot of output.

 hive -e "select * from ods.u_data_new limit 10" > out.txt
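When redirecting, the -S (silent) flag is also worth knowing: it suppresses informational log messages so the file contains only query results. A sketch using the same query:

# -S keeps progress and log output out of out.txt
hive -S -e "select * from ods.u_data_new limit 10" > out.txt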

-f file mode

Hive can run one or more statements stored in a file via the -f parameter. Files that store Hive queries are conventionally named with a .q or .hql suffix, but this is not required.

hive -f /home/my/hive-script.sql

This approach is common in enterprise settings: we save our ETL SQL as a file and hand it to the scheduling system.
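A scheduler usually needs to pass parameters such as the run date into the script. Hive supports this through variable substitution; the sketch below assumes a hypothetical etl.sql and a table partitioned by a dt column:

# the scheduler injects the run date; inside etl.sql it is referenced as ${hivevar:dt}
hive --hivevar dt=2021-01-01 -f /home/my/etl.sql
# e.g. a line inside etl.sql:
#   SELECT count(*) FROM dws.some_table WHERE dt = '${hivevar:dt}';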

source mode: executing an external SQL file

With source we can execute a SQL file after entering Hive's command line. To be honest, although it looks fairly elegant, it is rarely used.

hive> source /home/wyp/Documents/test.sql;

Set mode

This command allows us to set a lot of parameters after entering the command line to adjust the running environment of our SQL

set: display all configurations

When we use this command alone, with no other parameters, it lists all parameter settings at their current values, that is, the most recent values, which may be values we have used to override the defaults.

set key=value

We can define a property with set key=value, for example set userName=kingcall;. If the key already exists it is overwritten, and if it does not it is created; the key can be any string.

After the setting succeeds, we can use set to check the property's value.
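A short interactive sketch (userName is the arbitrary key from above):

hive> set userName=kingcall;
hive> set userName;
userName=kingcall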

Two settings worth knowing: set hive.cli.print.header=true makes the CLI print column headers in query results, and set mapred.reduce.tasks=32; fixes the number of reduce tasks at 32. A sketch of the header setting follows.
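Using the u_data table created later in this article (output abridged and illustrative):

hive> set hive.cli.print.header=true;
hive> select userid, movieid from ods.u_data limit 1;
userid  movieid
196     242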

reset

Resets the configuration to its default values.
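A quick sketch of its effect (assuming mapred.reduce.tasks has the stock default of -1, meaning the reducer count is determined automatically):

hive> set mapred.reduce.tasks=32;
hive> reset;
hive> set mapred.reduce.tasks;
mapred.reduce.tasks=-1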

set -v

Lists only Hadoop and Hive property values. Since we know set can define arbitrary keys, a command restricted to Hadoop and Hive properties is genuinely useful.

Resource management

add

Adds files, jar packages, or archives to the distributed cache. If you have ever written a UDF, you will know this command.

add FILE[S] <filepath|ivyurl> <filepath|ivyurl>*
add JAR[S] <filepath|ivyurl> <filepath|ivyurl>*
add ARCHIVE[S] <filepath|ivyurl> <filepath|ivyurl>*

Create a new table and prepare the data

CREATE TABLE ods.u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
unzip ml-100k.zip
LOAD DATA LOCAL INPATH '/Users/liuwenqiang/ml-100k/u.data' OVERWRITE INTO TABLE ods.u_data;

Create a script, weekday_mapper.py

# weekday_mapper.py: read tab-separated rows from stdin and
# replace the unixtime column with the day of the week
import sys
import datetime

for line in sys.stdin:
  line = line.strip()
  userid, movieid, rating, unixtime = line.split('\t')
  weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
  print('\t'.join([userid, movieid, rating, str(weekday)]))

Create a new table, add the script as a resource, and use it in SQL

CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

add file /Users/liuwenqiang/weekday_mapper.py;

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;
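To verify the result, aggregate on the new column (this check also appears in the official Hive tutorial):

SELECT weekday, COUNT(*)
FROM u_data_new
GROUP BY weekday;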

list

Lists the resources that have been added to the distributed cache

list FILE[S]
list JAR[S]
list ARCHIVE[S]

delete

Deletes resources added to the distributed cache
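The syntax mirrors add:

delete FILE[S] <filepath>*
delete JAR[S] <filepath>*
delete ARCHIVE[S] <filepath>*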

quit/exit

Exit the terminal

! Execution mode

This is useful because it allows you to execute shell commands from the Hive command line.

Executing shell commands

For example, when I executed test.sql earlier, I did not use the full path, so how did I know it was in the current path (the directory where you run hive)?

That's because I ran ls from inside the CLI and confirmed it was in the current directory.
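The session looked roughly like this (output abridged; it assumes hive was started from the directory containing test.sql):

hive> !ls;
test.sql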

dfs

When executing Hadoop-related commands, dfs is noticeably quicker than running the same command from a shell, because it runs inside the current Hive process rather than starting a new one. Below, the same command is executed in two ways: the first is the shell execution mode introduced above, and the second is the dfs mode introduced here.
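For example, listing the same HDFS directory both ways (/user/hive is just an arbitrary path; the first form requires hadoop to be on the PATH):

hive> !hadoop fs -ls /user/hive;
hive> dfs -ls /user/hive;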

compile
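Compiles an inline code snippet (Groovy) so it can be used like a UDF; available since Hive 0.13.0. A minimal sketch adapted from the Pyth example in the Hive dynamic compiler documentation (semicolons inside the snippet must be escaped as \;):

compile `import org.apache.hadoop.hive.ql.exec.UDF \;
public class Pyth extends UDF {
  public double evaluate(double a, double b){
    return Math.sqrt((a*a) + (b*b)) \;
  }
} ` AS GROOVY NAMED Pyth.groovy;

CREATE TEMPORARY FUNCTION Pyth AS 'Pyth';
SELECT Pyth(3, 4) FROM u_data LIMIT 1;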

conclusion

Command : Annotation

source FILE <filepath> : Executes a script file inside the CLI.
set <key>=<value> : Sets the value of a particular configuration variable (key). Note: if you misspell the variable name, the CLI will not show an error.
set -v : Prints all Hadoop and Hive configuration variables.
set : Prints the list of configuration variables that have been overridden by the user or Hive.
reset : Resets the configuration to the default values (as of Hive 0.10; see HIVE-3202). Any parameter set with the set command or with -hiveconf reverts to its default value. Note: for historical reasons this does not apply to keys that carry the hiveconf: prefix in the key name.
quit, exit : Exits the interactive shell.
list FILE[S] <filepath>*, list JAR[S] <filepath>*, list ARCHIVE[S] <filepath>* : Checks whether the given resources have been added to the distributed cache. See Hive Resources for more information.
list FILE[S], list JAR[S], list ARCHIVE[S] : Lists the resources already added to the distributed cache. See Hive Resources for more information.
dfs <dfs command> : Executes a dfs command from the Hive shell.
delete FILE[S] <ivyurl>*, delete JAR[S] <ivyurl>*, delete ARCHIVE[S] <ivyurl>* : As of Hive 1.2.0, removes resources that were added with an Ivy URL from the distributed cache. See Hive Resources for more information.
delete FILE[S] <filepath>*, delete JAR[S] <filepath>*, delete ARCHIVE[S] <filepath>* : Removes the given resources from the distributed cache.
compile `<groovy string>` AS GROOVY NAMED <name> : Compiles inline code (Groovy) for use as a UDF (as of Hive 0.13.0). See the Hive dynamic compiler documentation for more information.
add FILE[S] <ivyurl> <ivyurl>*, add JAR[S] <ivyurl> <ivyurl>*, add ARCHIVE[S] <ivyurl> <ivyurl>* : As of Hive 1.2.0, adds one or more files, jars, or archives to the distributed cache using an Ivy URL of the form ivy://group:module:version?query_string. See Hive Resources for more information.
add FILE[S] <filepath> <filepath>*, add JAR[S] <filepath> <filepath>*, add ARCHIVE[S] <filepath> <filepath>* : Adds one or more files, jars, or archives to the list of resources in the distributed cache. See Hive Resources for more information.
<query string> : Executes a Hive query and prints results to standard output.
! <command> : Executes a shell command from the Hive shell.