This article is original. If your version numbers match mine, you can download my pre-compiled package directly from my WeChat public account: follow "later X big data" and reply "tez". Discussion is welcome, and so is following the account.

Installing Tez is a bit tricky, although compiling it is relatively simple. Let's go through it step by step. Here are my version numbers:

| Framework | Version |
| --- | --- |
| Hadoop | 3.1.3 |
| Hive | 3.1.2 |
| Tez | 0.10.1 |

If you are reading this article, you presumably already know what Tez does, so I won't introduce it here and will go straight to the installation.

For Hadoop 3.x you need a Tez engine compiled against Hadoop 3.x. (For Tez 0.8.3 and higher, Tez requires Apache Hadoop 2.6.0 or later; for Tez 0.9.0 and higher, Tez requires Apache Hadoop 2.7.0 or later.)

If you run into problems while compiling, you can also contact me directly (l970306love); the problem you hit may be one I happened to encounter as well, which can save a lot of unnecessary time, haha.

1. Compiling Tez 0.10.1: follow the process on the official website (I use Tez 0.9.2 as the example here; if your version numbers are the same as mine, you can download my compiled package directly)

To get my compiled package, follow the public account "later X big data" and reply "tez".

  1. Download the src tar.gz source package from the official download page (tez.apache.org/releases/in…
  2. After downloading, upload it to your Linux machine and extract it. I use 0.9.2 as the example; the newer Tez 0.10.1 needs to be downloaded from GitHub (github.com/apache/tez)
  3. Change the value of the hadoop.version property in pom.xml to match the Hadoop version you are using; here that is Apache Hadoop 3.1.3
  4. Hadoop 3.1.3 ships guava 27.0-jre, while Tez defaults to 11.0.2, so this must be changed as well, otherwise Tez will not work after installation (a sketch of both pom.xml edits follows the listing below):
```bash
[later@bigdata101 lib]# cd /opt/module/hadoop-3.1.3/share/hadoop/common/lib/
[later@bigdata101 lib]# ls | grep guava
guava-27.0-jre.jar
listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
```
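A minimal sketch of making both pom.xml changes from the shell, assuming the properties are named hadoop.version and guava.version in the Tez root pom.xml and that the defaults are as described above; check the file first and edit it by hand if the patterns do not match:

```bash
cd apache-tez-0.10.1-src    # hypothetical source directory name; use your own
# point Tez at the Hadoop version in use
sed -i 's|<hadoop.version>.*</hadoop.version>|<hadoop.version>3.1.3</hadoop.version>|' pom.xml
# bump guava to the version that Hadoop 3.1.3 ships
sed -i 's|<guava.version>11.0.2</guava.version>|<guava.version>27.0-jre</guava.version>|' pom.xml
```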

  5. Also, when compiling Tez, the tez-ui module is time-consuming and not needed here, so we can skip it (see the build sketch after the compile step below)
  6. Before compiling you need a build environment: install Maven and Git (see my earlier article: blog.csdn.net/weixin_3858… ), and finally install the build tools
```bash
# Install the build tools
yum -y install autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++
```
  7. Then install protobuf (the official site specifies 2.5.0), download link (github.com/protocolbuf… ). Once the source package is in place, unzip it and compile protobuf 2.5.0:
```bash
./configure
make
make install
```
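Two quick checks that are not part of the original steps but that I would do before moving on; the ldconfig call is only needed if protoc complains about a missing shared library (protobuf installs under /usr/local by default):

```bash
protoc --version    # expect: libprotoc 2.5.0
# if protoc cannot find libprotoc.so, refresh the shared-library cache
sudo ldconfig
```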
  8. Start compiling Tez (how long this takes varies):
```bash
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
```
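Putting the notes above together, this is the build invocation I would expect to use for Hadoop 3.1.3. The -Dhadoop.version override relies on standard Maven property overriding and is my assumption; if it does not take effect, edit hadoop.version in pom.xml directly (as in the sed sketch earlier). Skipping tez-ui is usually done by commenting out its module entry in the root pom.xml (and any reference to it in tez-dist/pom.xml) before building; that is also an assumption, so check your source tree:

```bash
# build against Hadoop 3.1.3, skipping tests and javadoc
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Dhadoop.version=3.1.3
```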
  9. After a successful build, the packages are under apache-tez-0.9.2-src/tez-dist/target/. We only need two of them (see the listing sketched below); I built 0.10.1, but apart from the version number everything is the same.
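For reference, roughly what I would expect to find in the target directory after a successful 0.10.1 build (the SNAPSHOT file names are the ones used later in this post; the directory contains other artifacts as well):

```bash
ls tez-dist/target/
# tez-0.10.1-SNAPSHOT.tar.gz           <- full package, goes to HDFS later
# tez-0.10.1-SNAPSHOT-minimal.tar.gz   <- minimal package, extracted locally
```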

2. Configure Tez for Hive

  1. Copy the Tez packages to the cluster and extract the tar package; note that the one you extract locally is the minimal package:
```bash
mkdir /opt/module/tez
tar -zxvf /opt/software/tez-0.10.1-SNAPSHOT-minimal.tar.gz -C /opt/module/tez
```
  2. Upload the Tez dependencies to HDFS (upload the package without "minimal"):
```bash
hadoop fs -mkdir /tez
hadoop fs -put /opt/software/tez-0.10.1-SNAPSHOT.tar.gz /tez
```
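A quick check (not a required step) that the tarball is sitting where tez.lib.uris will point to:

```bash
hadoop fs -ls /tez
```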
  3. Create tez-site.xml in $HADOOP_HOME/etc/hadoop/ and synchronize it to the other machines in the cluster (a sync sketch follows the file):
```xml
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/tez/tez-0.10.1-SNAPSHOT.tar.gz</value>
    </property>
    <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.am.resource.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>tez.am.resource.cpu.vcores</name>
        <value>1</value>
    </property>
    <property>
        <name>tez.container.max.java.heap.fraction</name>
        <value>0.4</value>
    </property>
    <property>
        <name>tez.task.resource.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>tez.task.resource.cpu.vcores</name>
        <value>1</value>
    </property>
</configuration>
```
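One way to push the file to the rest of the cluster; the hostnames are placeholders for my other nodes, and if you already have a sync script such as xsync, use that instead:

```bash
for host in bigdata102 bigdata103; do
    scp $HADOOP_HOME/etc/hadoop/tez-site.xml $host:$HADOOP_HOME/etc/hadoop/
done
```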
  4. Modify the Hadoop environment variables by adding the following (see the note after the snippet on where this typically goes):
```bash
hadoop_add_profile tez
function _tez_hadoop_classpath
{
    hadoop_add_classpath "$HADOOP_HOME/etc/hadoop" after
    hadoop_add_classpath "/opt/module/tez/*" after
    hadoop_add_classpath "/opt/module/tez/lib/*" after
}
```
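As for where this snippet goes: the original steps only say to modify the Hadoop environment variables, so the exact location below is my assumption. On Hadoop 3.x I would put it in its own file under $HADOOP_CONF_DIR/shellprofile.d, which the Hadoop scripts source automatically, and then distribute that file to every node:

```bash
mkdir -p $HADOOP_HOME/etc/hadoop/shellprofile.d
vim $HADOOP_HOME/etc/hadoop/shellprofile.d/tez.sh    # paste the snippet above here
```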


  5. Change the Hive execution engine: vim $HIVE_HOME/conf/hive-site.xml and add the following:
```xml
<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>
<property>
    <name>hive.tez.container.size</name>
    <value>1024</value>
</property>
```
  6. Add the Tez paths in hive-env.sh (a quick sanity check follows the snippet):
```bash
export TEZ_HOME=/opt/module/tez    # the directory you extracted Tez into
export TEZ_JARS=""
for jar in `ls $TEZ_HOME | grep jar`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

export HIVE_AUX_JARS_PATH=/opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar$TEZ_JARS
```
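A quick sanity check I like to run afterwards, assuming $HIVE_HOME is set in your shell; it just prints the jar list the loops collected:

```bash
source $HIVE_HOME/conf/hive-env.sh
echo $TEZ_JARS | tr ':' '\n' | head
```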

Note that your jar path may not be the same as mine; you need to compile LZO yourself. See my other blog post on compiling and installing LZO (blog.csdn.net/weixin_3858… ).

  7. Resolve the logging jar conflict:
```bash
rm /opt/module/tez/lib/slf4j-log4j12-1.7.10.jar
```
  8. With that, the Hive configuration is done. Let's first run the official test case. You need to create a file locally and then upload it to HDFS; I wrote a word.txt and uploaded it:
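For reference, the file I used was nothing special; something like this is enough (the words are just my example):

```bash
echo "hello tez hello hive hello hadoop" > word.txt
```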
```bash
# Upload this file to HDFS
hadoop fs -put word.txt /tez/
```

```bash
# Start testing wordcount; pay attention to your own jar path
/opt/module/hadoop-3.1.3/bin/yarn jar /opt/module/tez/tez-examples-0.10.1-SNAPSHOT.jar orderedwordcount /tez/word.txt /tez/output/
```
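To look at the result without opening the ResourceManager web UI, you can dump the output directory used in the command above:

```bash
hadoop fs -cat /tez/output/*
```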


  9. Now the Tez engine can be used in Hive:
```sql
hive (default)> create table student(id int, name string);
hive (default)> insert into student values(1, "zhangsan");
hive (default)> select * from student;
1       zhangsan
```
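A quick way to double-check that the session is really running on Tez rather than MapReduce; it just prints the current value of the setting:

```bash
hive -e "set hive.execution.engine;"
# expected output: hive.execution.engine=tez
```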

**Set mapreduce.framework.name = yarn;**

Pitfall 1

```
Application application_1589119407952_0003 failed 2 times due to AM Container for appattempt_1589119407952_0003_000002 exited with exitCode: -103
Failing this attempt. Diagnostics: [2020-05-10 22:06:09.140]
Container [pid=57149,containerID=container_e14_1589119407952_0003_02_000001] is running 759679488B beyond the 'VIRTUAL' memory limit. Current usage: 140.9 MB of 1 GB physical memory used; 2.8 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_e14_1589119407952_0003_02_000001 :
	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
	|- 57149 57147 57149 57149 (bash) 0 2 118079488 368 /bin/bash -c /opt/module/jdk1.8.0_211/bin/java -Djava.io.tmpdir=/opt/module/hadoop-3.1.3/tmp/nm-local-dir/usercache/later/appcache/application_1589119407952_0003/container_e14_1589119407952_0003_02_000001/tmp -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/opt/module/hadoop-3.1.3/logs/userlogs/application_1589119407952_0003/container_e14_1589119407952_0003_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster --session 1>/opt/module/hadoop-3.1.3/logs/userlogs/application_1589119407952_0003/container_e14_1589119407952_0003_02_000001/stdout 2>/opt/module/hadoop-3.1.3/logs/userlogs/application_1589119407952_0003/container_e14_1589119407952_0003_02_000001/stderr
	|- 57201 57149 57149 57149 (java) 871 158 2896457728 35706 /opt/module/jdk1.8.0_211/bin/java -Djava.io.tmpdir=/opt/module/hadoop-3.1.3/tmp/nm-local-dir/usercache/later/appcache/application_1589119407952_0003/container_e14_1589119407952_0003_02_000001/tmp -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/opt/module/hadoop-3.1.3/logs/userlogs/application_1589119407952_0003/container_e14_1589119407952_0003_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel= org.apache.tez.dag.app.DAGAppMaster --session

[2020-05-10 22:06:09.222] Exit code is 143
[2020-05-10 22:06:09.229] Container exited with a non-zero exit code 143.
For more detailed output, check the application tracking page: http://bigdata101:8088/cluster/app/application_1589119407952_0003 Then click on links to logs of each attempt.
Failing the application.
```

This error message is fairly detailed: YARN's memory check found that the Tez container exceeded its virtual memory limit. There are two ways to solve it:

  • Disable the virtual memory check: set yarn.nodemanager.vmem-check-enabled to false in yarn-site.xml. After the change you must distribute the file to all nodes and restart the Hadoop cluster.

  • Increase the memory, for example hive.tez.container.size in hive-site.xml (configured above).

Pitfall 2

Running a Tez task fails with:

```
java.lang.NoClassDefFoundError: org/apache/tez/dag/api/TezConfiguration
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
	at java.lang.Class.getMethod0(Class.java:2813)
	at java.lang.Class.getMethod(Class.java:1663)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.<init>(ProgramDriver.java:59)
	at org.apache.hadoop.util.ProgramDriver.addClass(ProgramDriver.java:103)
	at org.apache.tez.examples.ExampleDriver.main(ExampleDriver.java:47)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.TezConfiguration
```

This seems to be telling me it cannot find the Tez configuration, which would mean something is wrong with my Tez config. So I moved tez-site.xml into $HADOOP_HOME/etc/hadoop; keeping it in the hive/conf/ directory, as I had before, did not take effect.

Pitfall 3

```
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1589119407952_0008_1_00, diagnostics=[Vertex vertex_1589119407952_0008_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: _dummy_table initializer failed, vertex=vertex_1589119407952_0008_1_00 [Map 1], org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/tmp/later/5649ae4c-2ca8-4c7a-82fe-c3003a953682/hive_2020-05-10_22-19-58_227_6887259009104125247-1/dummy_path
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:332)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:274)
	at org.apache.hadoop.hive.shims.Hadoop23Shims$1.listStatus(Hadoop23Shims.java:134)
	at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
	at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:76)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:321)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:444)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:564)
	at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:489)
	at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:338)
	at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:280)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:271)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:271)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:255)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
]
```

The cause is that mapreduce.framework.name was set to local (the default). Setting mapreduce.framework.name = yarn resolves the problem.

Pitfall 4

```
Application application_1589119407952_0002 failed 2 times due to AM Container for appattempt_1589119407952_0002_000002 exited with exitCode: 1
Failing this attempt. Diagnostics: [2020-05-10 22:05:49.552] Exception from container-launch.
Container id: container_e14_1589119407952_0002_02_000001
Exit code: 1
[2020-05-10 22:05:49.652] Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/tmp/nm-local-dir/filecache/10/tez-0.10.1-SNAPSHOT.tar.gz/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2020-05-10 22:05:49.652] Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/tmp/nm-local-dir/filecache/10/tez-0.10.1-SNAPSHOT.tar.gz/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
For more detailed output, check the application tracking page: http://bigdata101:8088/cluster/app/application_1589119407952_0002 Then click on links to logs of each attempt.
Failing the application.
```

This one gave me the biggest headache of the lot. I could not see any concrete bug; all I knew was that the Tez session could not reach the AM. In a situation like this, start Hive with debug logging on the console: hive --hiveconf hive.root.logger=DEBUG,console. Even though the extra output is only INFO/DEBUG logging, the place where the output hangs for a while at the end is where the clue is.
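For reference, the debug invocation mentioned above as a single command; it only raises the logging verbosity, nothing else:

```bash
hive --hiveconf hive.root.logger=DEBUG,console
```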

Once more: if you run into problems during compilation, you can also contact me directly (l970306love); the problem you hit may well be one I happened to encounter, which can save a lot of unnecessary time, haha.

Approach the compilation calmly and follow the steps in order, otherwise the final installation turns into a mess. Good luck.

To download my compiled Tez jar package, follow my public account "later X big data" and reply "tez".