Spark Installation Tutorial (YARN mode)

Configure the environment

Hadoop version: Hadoop-2.6.0-CDH5.15.1
Spark version: Spark-2.4.8-bin-hadoop2.6
Java version: jdk1.8.0_65
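
Before starting, you can quickly confirm the versions in place; a minimal check, assuming java and hadoop are already on the PATH:

    java -version      # should report 1.8.0_65
    hadoop version     # should report 2.6.0-cdh5.15.1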


The configuration process

In YARN mode, Spark only needs to be installed on one machine. In this tutorial, Spark is installed on the machine where the NameNode runs.
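
The installation itself is just unpacking the pre-built package. A minimal sketch, assuming the package is fetched from the Apache archive and unpacked under /home/hadoop/app (the download URL is an assumption; any mirror carrying the 2.4.8 build for Hadoop 2.6 works):

    cd /home/hadoop/app
    # URL is an assumption; use whichever download source you prefer
    wget https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.6.tgz
    tar -zxvf spark-2.4.8-bin-hadoop2.6.tgz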

  1. YARN configuration

    Modify /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop/yarn-site.xml and add the following content.

    Because the virtual machines in the test environment have little memory, add the following configuration to prevent tasks from being killed unexpectedly during execution. Before this configuration was added, the error below occurred; after adding it, the job ran without problems.

    org.apache.spark.shuffle.FetchFailedException: Failed to connect to s201/192.168.231.201:39838
    org.apache.spark.shuffle.FetchFailedException: Failed to connect to s202/192.168.231.202:45209
    <!-- Whether to start a thread that checks how much physical memory each task is using;
         if a task exceeds the allocated value, it will be killed. Default is true -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    
    <!-- Whether to start a thread that checks how much virtual memory each task is using;
         if a task exceeds the allocated value, it will be killed. Default is true -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
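After editing yarn-site.xml, the same file has to be present on every NodeManager host, and YARN has to be restarted before the change takes effect. A minimal sketch, assuming the worker hosts are s201 and s202 (taken from the error message above) and the same install path and hadoop user on each machine:

    cd /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop
    # s201/s202 and the hadoop user are assumptions based on the hosts in the error above
    scp yarn-site.xml hadoop@s201:/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop/
    scp yarn-site.xml hadoop@s202:/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop/

    # restart YARN so the NodeManagers pick up the new settings
    cd /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/sbin
    ./stop-yarn.sh
    ./start-yarn.sh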

Spark configuration

  1. Modify /home/hadoop/app/spark-2.4.8-bin-hadoop2.6/conf/spark-env.sh to add YARN_CONF_DIR

       mv spark-env.sh.template spark-env.sh
       vi spark-env.sh
       # Add to spark-env.sh
       YARN_CONF_DIR=/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop
  2. Start the HDFS and YARN cluster

    cd /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/sbin
    ./start-all.sh
  3. Submit a task. If a java.lang.ClassNotFoundException occurs, check that the path ./examples/jars/spark-examples_2.11-2.4.8.jar is correct and that no extra spaces have crept into the command.

    cd /home/hadoop/app/spark-2.4.8-bin-hadoop2.6
    bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn \
      ./examples/jars/spark-examples_2.11-2.4.8.jar \
      10
  4. View the YARN webUI at http://192.168.231.200:8088/ and click the History link of the finished application to view its history.
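
Besides the webUI, the submitted application can also be checked from the command line. A minimal sketch (the application ID below is a placeholder; use the one printed by spark-submit or shown by the first command):

    # list applications known to the ResourceManager
    yarn application -list -appStates ALL

    # fetch the aggregated logs of a finished application (placeholder ID;
    # requires yarn.log-aggregation-enable to be true)
    yarn logs -applicationId application_1234567890123_0001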

Configure the Spark history service

  1. Rename spark-defaults.conf.template, then add SPARK_HISTORY_OPTS to spark-env.sh

    cd /home/hadoop/app/spark-2.4.8-bin-hadoop2.6/conf
    mv spark-defaults.conf.template spark-defaults.conf
    vi spark-env.sh

    # Add to spark-env.sh:
    #   spark.history.ui.port             - the history server webUI port (18080)
    #   spark.history.fs.logDirectory     - where the history server reads logs; the HDFS address must be
    #                                       consistent with fs.defaultFS in /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop/core-site.xml
    #   spark.history.retainedApplications - number of applications kept in memory; when exceeded, the oldest
    #                                       application info is dropped (this is not the number shown on the page)
    export SPARK_HISTORY_OPTS="
    -Dspark.history.ui.port=18080
    -Dspark.history.fs.logDirectory=hdfs://s200:9000/spark_directory
    -Dspark.history.retainedApplications=20"
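The HDFS directory referenced by spark.history.fs.logDirectory must exist before the history server (and the event logging configured in the next section) can use it. A minimal sketch, assuming HDFS is already running:

    hdfs dfs -mkdir -p /spark_directory
    hdfs dfs -ls /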

Configure YARN to view Spark history logs

Clicking History on the YARN webUI should jump to the Spark task management page, so the Spark history server address needs to be configured.

  1. Modify the spark-defaults.conf configuration file

    cd /home/hadoop/app/spark-2.4.8-bin-hadoop2.6/conf
    vi spark-defaults.conf    # renamed from spark-defaults.conf.template in the previous section
    # Add the following configuration
    spark.eventLog.enabled          true
    spark.eventLog.dir              hdfs://s200:9000/spark_directory
    spark.yarn.historyServer.address s200:18080
    spark.history.ui.port 18080
  2. Restart the Spark history service

# run from the Spark installation directory, /home/hadoop/app/spark-2.4.8-bin-hadoop2.6
sbin/stop-history-server.sh
sbin/start-history-server.sh
  3. Open the YARN webUI at http://192.168.231.200:8088/ and click History; it now jumps to Spark's UI.
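
To confirm the history server is really up and receiving logs, a quick check from the shell (using the paths configured above; run another spark-submit first so there is at least one event log):

    jps | grep HistoryServer        # the Spark HistoryServer process should be listed
    hdfs dfs -ls /spark_directory   # event log files appear here after a job finishes

The history UI itself is served at http://s200:18080.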

HistoryServer configuration for YARN

Clicking Logs on the YARN web page shows java.lang.Exception: Unknown container. Container either has not started... To monitor YARN logs, the HistoryServer service needs to be enabled, as follows.

  1. You need to enable the HistoryServer service
cd /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop
vim yarn-site.xml

<!-- Add the following inside the existing <configuration> element -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>s200:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>s200:19888</value>
</property>
  2. Start the HistoryServer service
cd /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/sbin
./mr-jobhistory-daemon.sh start historyserver
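
To verify the service, a quick check (the port matches mapreduce.jobhistory.webapp.address above):

    jps | grep JobHistoryServer     # the JobHistoryServer process should be listed

The job history web page is then reachable at http://s200:19888.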

Reference

blog.csdn.net/dwt14154033…