Spark Installation Tutorial (YARN mode)

Configure the environment

Hadoop version: Hadoop-2.6.0-CDH5.15.1
Spark version: Spark-2.4.8-bin-hadoop2.6
Java version: jdk1.8.0_65
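
Before starting, you can quickly confirm the versions in place; a minimal check, assuming java and hadoop are already on the PATH:

    java -version      # should report 1.8.0_65
    hadoop version     # should report 2.6.0-cdh5.15.1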


The configuration process

In YARN mode, Spark only needs to be installed on one machine. In this tutorial, Spark is installed on the machine where the NameNode runs.
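
The installation itself is just unpacking the pre-built package. A minimal sketch, assuming the package is fetched from the Apache archive and unpacked under /home/hadoop/app (the download URL is an assumption; any mirror carrying the 2.4.8 build for Hadoop 2.6 works):

    cd /home/hadoop/app
    # URL is an assumption; use whichever download source you prefer
    wget https://archive.apache.org/dist/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.6.tgz
    tar -zxvf spark-2.4.8-bin-hadoop2.6.tgz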

  1. YARN configuration

    Modify /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop/yarn-site.xml and add the following content.

    Because the virtual machines in the test environment have little memory, add the following configuration to prevent tasks from being killed unexpectedly during execution. Before this configuration was added, the error below occurred; after adding it, the job ran without problems.

    org.apache.spark.shuffle.FetchFailedException: Failed to connect to s201/192.168.231.201:39838
    org.apache.spark.shuffle.FetchFailedException: Failed to connect to s202/192.168.231.202:45209
    <!-- Whether to start a thread that checks how much physical memory each task is using;
         if a task exceeds the allocated value, it will be killed. Default is true -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    
    <!-- Whether to start a thread that checks how much virtual memory each task is using;
         if a task exceeds the allocated value, it will be killed. Default is true -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
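After editing yarn-site.xml, the same file has to be present on every NodeManager host, and YARN has to be restarted before the change takes effect. A minimal sketch, assuming the worker hosts are s201 and s202 (taken from the error message above) and the same install path and hadoop user on each machine:

    cd /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop
    # s201/s202 and the hadoop user are assumptions based on the hosts in the error above
    scp yarn-site.xml hadoop@s201:/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop/
    scp yarn-site.xml hadoop@s202:/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop/

    # restart YARN so the NodeManagers pick up the new settings
    cd /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/sbin
    ./stop-yarn.sh
    ./start-yarn.sh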

Spark configuration

  1. Modify /home/hadoop/app/spark-2.4.8-bin-hadoop2.6/conf/spark-env.sh to add YARN_CONF_DIR

       mv spark-env.sh.template spark-env.sh
       vi spark-env.sh
       # Add to spark-env.sh
       YARN_CONF_DIR=/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop
  2. Start the HDFS and YARN cluster

    cd /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/sbin
    ./start-all.sh
  3. Submit a task. If a java.lang.ClassNotFoundException occurs, check that the path ./examples/jars/spark-examples_2.11-2.4.8.jar is correct and that no extra spaces have crept into the command.

    cd /home/hadoop/app/spark-2.4.8-bin-hadoop2.6
    bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master yarn \
      ./examples/jars/spark-examples_2.11-2.4.8.jar \
      10
  4. View the YARN webUI at http://192.168.231.200:8088/ and click the History link of the finished application to view its history.
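
Besides the webUI, the submitted application can also be checked from the command line. A minimal sketch (the application ID below is a placeholder; use the one printed by spark-submit or shown by the first command):

    # list applications known to the ResourceManager
    yarn application -list -appStates ALL

    # fetch the aggregated logs of a finished application (placeholder ID;
    # requires yarn.log-aggregation-enable to be true)
    yarn logs -applicationId application_1234567890123_0001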

Configure the Spark history service

  1. Rename spark-defaults.conf.template, then add SPARK_HISTORY_OPTS to spark-env.sh

    cd /home/hadoop/app/spark-2.4.8-bin-hadoop2.6/conf
    mv spark-defaults.conf.template spark-defaults.conf
    vi spark-env.sh

    # Add to spark-env.sh:
    #   spark.history.ui.port             - the history server webUI port (18080)
    #   spark.history.fs.logDirectory     - where the history server reads logs; the HDFS address must be
    #                                       consistent with fs.defaultFS in /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop/core-site.xml
    #   spark.history.retainedApplications - number of applications kept in memory; when exceeded, the oldest
    #                                       application info is dropped (this is not the number shown on the page)
    export SPARK_HISTORY_OPTS="
    -Dspark.history.ui.port=18080
    -Dspark.history.fs.logDirectory=hdfs://s200:9000/spark_directory
    -Dspark.history.retainedApplications=20"
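The HDFS directory referenced by spark.history.fs.logDirectory must exist before the history server (and the event logging configured in the next section) can use it. A minimal sketch, assuming HDFS is already running:

    hdfs dfs -mkdir -p /spark_directory
    hdfs dfs -ls /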

Configure YARN to view Spark history logs

Clicking History on the YARN webUI should jump to the Spark task management page, so the Spark history server address needs to be configured.

  1. Modify the spark-defaults.conf configuration file

    cd /home/hadoop/app/spark-2.4.8-bin-hadoop2.6/conf
    vi spark-defaults.conf    # renamed from spark-defaults.conf.template in the previous section
    # Add the following configuration
    spark.eventLog.enabled          true
    spark.eventLog.dir              hdfs://s200:9000/spark_directory
    spark.yarn.historyServer.address s200:18080
    spark.history.ui.port 18080
  2. Restart the Spark history service

# run from the Spark installation directory, /home/hadoop/app/spark-2.4.8-bin-hadoop2.6
sbin/stop-history-server.sh
sbin/start-history-server.sh
  3. Open the YARN webUI at http://192.168.231.200:8088/ and click History; it now jumps to Spark's UI.
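
To confirm the history server is really up and receiving logs, a quick check from the shell (using the paths configured above; run another spark-submit first so there is at least one event log):

    jps | grep HistoryServer        # the Spark HistoryServer process should be listed
    hdfs dfs -ls /spark_directory   # event log files appear here after a job finishes

The history UI itself is served at http://s200:18080.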

HistoryServer configuration for YARN

Clicking Logs on the YARN web page shows java.lang.Exception: Unknown container. Container either has not started... To monitor YARN logs, the HistoryServer service needs to be enabled, as follows.

  1. You need to enable the HistoryServer service
cd /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop
vim yarn-site.xml

<!-- Add the following inside the existing <configuration> element -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>s200:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>s200:19888</value>
</property>
  2. Start the HistoryServer service
cd /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/sbin
./mr-jobhistory-daemon.sh start historyserver
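
To verify the service, a quick check (the port matches mapreduce.jobhistory.webapp.address above):

    jps | grep JobHistoryServer     # the JobHistoryServer process should be listed

The job history web page is then reachable at http://s200:19888.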

Reference

blog.csdn.net/dwt14154033…