Local mode
Since version 0.7, Hive can run jobs in local mode. Most Hadoop jobs need the full scalability that Hadoop provides in order to process large data sets.
Many of the SQL statements run in Hive, however, are small, with little data and little computation. Running them as distributed jobs is not worth the cost: the SQL itself may take only 10 seconds, while generating and scheduling the distributed tasks may take a minute. Such small jobs are better suited to local MapReduce (local MR), which pulls the input data back to the client and executes the job there.
How to enable it
- set hive.exec.mode.local.auto=true; (Default false)
Local mode is used only when a job meets all of the following conditions:
- The job's total input size must be smaller than hive.exec.mode.local.auto.inputbytes.max (default 128 MB)
- The job's number of map tasks must be smaller than hive.exec.mode.local.auto.tasks.max (default 4); from Hive 0.9.0 on, this limit is set with hive.exec.mode.local.auto.input.files.max instead
- The job's number of reduce tasks must be 0 or 1
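The conditions above map onto session settings. The following is a minimal sketch rather than a recommended configuration: the values are simply the defaults quoted in the list, and the file-count property name assumes a Hive release of 0.9.0 or later.

```sql
-- Let Hive decide per job whether it is small enough to run locally.
set hive.exec.mode.local.auto=true;

-- Upper bound on total input size, in bytes (134217728 = 128 MB, the default quoted above).
set hive.exec.mode.local.auto.inputbytes.max=134217728;

-- Upper bound on the number of input files / map tasks (default 4).
-- Assumption: Hive 0.9.0 or later; earlier releases use hive.exec.mode.local.auto.tasks.max.
set hive.exec.mode.local.auto.input.files.max=4;

-- Naming a property without a value prints its current setting.
set hive.exec.mode.local.auto;
```

Raising the thresholds makes more jobs eligible for local mode, at the cost of running heavier work on the client machine.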
Example
I have a table from the Hive Streaming demonstration in the previous tutorial. First, I run a query against it without enabling local mode:
select weekday,count(1) from ods.u_data_new group by weekday;
Below is the execution log. The query took about 15.762 seconds:
Starting Job = job_1608438780277_0029, Tracking URL = http://localhost:8088/proxy/application_1608438780277_0029/
2020-12-27 12:51:01,739 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.Task (SessionState.java:printInfo(1227)) - Starting Job = job_1608438780277_0029, Tracking URL = http://localhost:8088/proxy/application_1608438780277_0029/
Kill Command = /usr/local/Cellar/hadoop/3.2.1/libexec/bin/mapred job -kill job_1608438780277_0029
2020-12-27 12:51:01,739 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.Task (SessionState.java:printInfo(1227)) - Kill Command = /usr/local/Cellar/hadoop/3.2.1/libexec/bin/mapred job -kill job_1608438780277_0029
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-12-27 12:51:06,967 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.Task (SessionState.java:printInfo(1227)) - Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-12-27 12:51:06,984 WARN [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] mapreduce.Counters (AbstractCounters.java:getGroup(235)) - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2020-12-27 12:51:06,984 Stage-1 map = 0%, reduce = 0%
2020-12-27 12:51:06,984 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.Task (SessionState.java:printInfo(1227)) - 2020-12-27 12:51:06,984 Stage-1 map = 0%, reduce = 0%
2020-12-27 12:51:11,068 Stage-1 map = 100%, reduce = 0%
2020-12-27 12:51:11,068 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.Task (SessionState.java:printInfo(1227)) - 2020-12-27 12:51:11,068 Stage-1 map = 100%, reduce = 0%
2020-12-27 12:51:16,171 Stage-1 map = 100%, reduce = 100%
2020-12-27 12:51:16,172 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.Task (SessionState.java:printInfo(1227)) - 2020-12-27 12:51:16,171 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1608438780277_0029
2020-12-27 12:51:17,198 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.Task (SessionState.java:printInfo(1227)) - Ended Job = job_1608438780277_0029
MapReduce Jobs Launched:
2020-12-27 12:51:17,206 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (SessionState.java:printInfo(1227)) - MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 1192486 HDFS Write: 227 SUCCESS
2020-12-27 12:51:17,206 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (SessionState.java:printInfo(1227)) - Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 1192486 HDFS Write: 227 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
2020-12-27 12:51:17,206 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (SessionState.java:printInfo(1227)) - Total MapReduce CPU Time Spent: 0 msec
2020-12-27 12:51:17,206 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (Driver.java:execute(2531)) - Completed executing command(queryId=liuwenqiang_20201227125101_07ce9455-4e6d-40f5-9e33-e9691a0a459d); Time taken: 15.655 seconds
OK
2020-12-27 12:51:17,206 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (SessionState.java:printInfo(1227)) - OK
2020-12-27 12:51:17,206 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (Driver.java:checkConcurrency(285)) - Concurrency mode is disabled, not creating a lock manager
2020-12-27 12:51:17,209 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] mapred.FileInputFormat (FileInputFormat.java:listStatus(259)) - Total input files to process : 1
2020-12-27 12:51:17,210 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] sasl.SaslDataTransferClient (SaslDataTransferClient.java:checkTrustAndSend(239)) - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-12-27 12:51:17,211 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.ListSinkOperator (Operator.java:logStats(1038)) - RECORDS_OUT_OPERATOR_LIST_SINK_10:7, RECORDS_OUT_INTERMEDIATE:0,
1 12254
2 13579
3 14430
4 15114
5 14743
6 18229
7 11651
Time taken: 15.762 seconds, Fetched: 7 row(s)
Now let’s turn on the local execution mode and do it again
set hive.exec.mode.local.auto=true;
select weekday,count(1) from ods.u_data_new group by weekday;
As the following log output shows, the job ran under LocalJobRunner, which means local mode was used. It took only 1.507 seconds, roughly ten times faster than the 15.762 seconds of the distributed run. This is useful for testing and for any query over a small amount of data.
2020-12-27 12:55:05,589 INFO [pool-18-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(634)) - reduce > reduce
2020-12-27 12:55:05,589 INFO [pool-18-thread-1] mapred.Task (Task.java:sendDone(1380)) - Task 'attempt_local636843578_0002_r_000000_0' done.
2020-12-27 12:55:05,589 INFO [pool-18-thread-1] mapred.Task (Task.java:done(1276)) - Final Counters for attempt_local636843578_0002_r_000000_0: Counters: 35
    File System Counters
        FILE: Number of bytes read=162523387
        FILE: Number of bytes written=83727359
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2359027
        HDFS: Number of bytes written=81939664
        HDFS: Number of read operations=123
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=87
        HDFS: Number of bytes read erasure-coded=0
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Reduce input groups=7
        Reduce shuffle bytes=146
        Reduce input records=7
        Reduce output records=0
        Spilled Records=7
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=252706816
    HIVE
        CREATED_FILES=1
        RECORDS_OUT_0=7
        RECORDS_OUT_INTERMEDIATE=0
        RECORDS_OUT_OPERATOR_FS_6=7
        RECORDS_OUT_OPERATOR_GBY_4=7
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Output Format Counters
        Bytes Written=0
2020-12-27 12:55:05,589 INFO [pool-18-thread-1] mapred.LocalJobRunner (LocalJobRunner.java:run(353)) - Finishing task: attempt_local636843578_0002_r_000000_0
2020-12-27 12:55:05,589 INFO [Thread-129] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(486)) - reduce task executor complete.
2020-12-27 12:55:06,471 Stage-1 map = 100%, reduce = 100%
2020-12-27 12:55:06,471 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.Task (SessionState.java:printInfo(1227)) - 2020-12-27 12:55:06,471 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local636843578_0002
2020-12-27 12:55:06,472 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.Task (SessionState.java:printInfo(1227)) - Ended Job = job_local636843578_0002
MapReduce Jobs Launched:
2020-12-27 12:55:06,477 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (SessionState.java:printInfo(1227)) - MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 4718054 HDFS Write: 163879101 SUCCESS
2020-12-27 12:55:06,477 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (SessionState.java:printInfo(1227)) - Stage-Stage-1: HDFS Read: 4718054 HDFS Write: 163879101 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
2020-12-27 12:55:06,478 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (SessionState.java:printInfo(1227)) - Total MapReduce CPU Time Spent: 0 msec
2020-12-27 12:55:06,478 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (Driver.java:execute(2531)) - Completed executing command(queryId=liuwenqiang_20201227125504_616a97cf-1b85-4a7f-aaf4-f565a7d3843b); Time taken: 1.346 seconds
OK
2020-12-27 12:55:06,478 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (SessionState.java:printInfo(1227)) - OK
2020-12-27 12:55:06,478 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] ql.Driver (Driver.java:checkConcurrency(285)) - Concurrency mode is disabled, not creating a lock manager
2020-12-27 12:55:06,480 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] mapred.FileInputFormat (FileInputFormat.java:listStatus(259)) - Total input files to process : 1
2020-12-27 12:55:06,483 INFO [680439f2-d65f-4c1f-8eaf-c5aee654b618 main] exec.ListSinkOperator (Operator.java:logStats(1038)) - RECORDS_OUT_OPERATOR_LIST_SINK_10:7, RECORDS_OUT_INTERMEDIATE:0,
1 12254
2 13579
3 14430
4 15114
5 14743
6 18229
7 11651
Time taken: 1.507 seconds, Fetched: 7 row(s)
Strict mode
Hive provides a strict mode that prevents users from executing queries that might have unintended consequences. In plain terms, this mode blocks certain risky queries from running.
How to enable it
Run set hive.mapred.mode=strict; to enable strict mode. The following queries are not allowed in strict mode:
- A query on a partitioned table that does not specify a partition filter
- An ORDER BY without a LIMIT
- A Cartesian product: a JOIN without an ON clause
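The ORDER BY and JOIN restrictions can be illustrated with a short sketch. It assumes the partitioned table ods.u_data used in this section's example, and ods.u_item (with movieid and title columns) is a hypothetical second table introduced only for the join. The rejected forms are left as comments so the script runs end to end.

```sql
set hive.mapred.mode=strict;

-- Rejected in strict mode: ORDER BY without LIMIT sorts the whole result in a single reducer.
-- select userid, rating from ods.u_data where day='2020-12-21' order by rating desc;

-- Accepted: the same query, bounded by LIMIT.
select userid, rating
from ods.u_data
where day='2020-12-21'
order by rating desc
limit 10;

-- Rejected in strict mode: JOIN without ON produces a Cartesian product.
-- select d.userid, i.title from ods.u_data d join ods.u_item i;

-- Accepted: the join condition is stated in an ON clause.
-- ods.u_item is hypothetical; substitute a real table that shares the movieid key.
select d.userid, i.title
from ods.u_data d
join ods.u_item i on d.movieid = i.movieid
where d.day='2020-12-21';
```

Both accepted queries also carry a partition filter on day, so they satisfy the first restriction as well.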
Example
First we create a partitioned table and then load data into two of its partitions:
CREATE TABLE ods.u_data (
userid INT,
movieid INT,
rating INT,
unixtime STRING)
partitioned by(year string,month string ,day string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/Users/liuwenqiang/ml-100k/u.data' OVERWRITE INTO TABLE ods.u_data partition(year='2020',month='2020-12',day='2020-12-21');
LOAD DATA LOCAL INPATH '/Users/liuwenqiang/ml-100k/u.data' OVERWRITE INTO TABLE ods.u_data partition(year='2020',month='2020-12',day='2020-12-22');
Now try to run a query without a partition filter:
select * from ods.u_data;
Hive rejects the query, because queries against partitioned tables without a partition filter are disabled for safety reasons:
[Error 10056]: Queries against partitioned tables without a partition filter are disabled for safety reasons. If you know what you are doing, please set hive.strict.checks.no.partition.filter to false and make sure that hive.mapred.mode is not set to 'strict' to proceed. Note that you may get errors or incorrect results if you make a mistake while using some of the unsafe features. No partition predicate for Alias "u_data" Table "u_data"
org.apache.hadoop.hive.ql.parse.SemanticException: Queries against partitioned tables without a partition filter are disabled for safety reasons. If you know what you are doing, please set hive.strict.checks.no.partition.filter to false and make sure that hive.mapred.mode is not set to 'strict' to proceed. Note that you may get errors or incorrect results if you make a mistake while using some of the unsafe features. No partition predicate for Alias "u_data" Table "u_data"
    at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:192)
    at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:147)
    at org.apache.hadoop.hive.ql.parse.ParseContext.getPrunedPartitions(ParseContext.java:532)
    at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.checkTree(SimpleFetchOptimizer.java:211)
    at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.optimize(SimpleFetchOptimizer.java:144)
    at org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:114)
    at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:250)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12295)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:330)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:659)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773)
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
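There are two ways past this error, sketched below under the assumption that the two partitions loaded above exist. The first adds a partition filter and keeps strict mode on. The second follows the error message literally and disables the check, which is only sensible when a full-table scan is really intended.

```sql
-- Option 1: keep strict mode and filter on a partition column.
-- The partition pruner then reads only the day='2020-12-21' partition.
set hive.mapred.mode=strict;
select * from ods.u_data where day='2020-12-21' limit 10;

-- Option 2: as the error message suggests, leave strict mode and
-- switch off the partition-filter check for this session.
set hive.mapred.mode=nonstrict;
set hive.strict.checks.no.partition.filter=false;
-- This now scans every partition of the table.
select * from ods.u_data limit 10;
```

Option 1 is usually the safer choice, since it preserves the protection that strict mode is meant to provide.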