Spark Standalone mode and Yarn mode

Many tutorials teach you how to deploy either Standalone mode or Yarn mode, and each deployment mode gets its own copy of the Spark folder (e.g. spark-local, spark-YARN), which wastes space. This tutorial configures both modes in a single installation and is intended for readers with some experience in big data deployment. Caution: 1. workers, spark-env.sh, and spark-defaults.conf are all in the conf folder of Spark. 2. Tasks run in either Standalone mode or Yarn mode both show up on the history server at http://hd1:18080.

Profile configuration

/etc/profile

# Yarn mode related, originally configured in spark-env.sh
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

workers

# Standalone mode configuration. Standalone mode needs these three machines for computing and storage; Yarn is deployed separately.
hd1
hd2
hd3

spark-env.sh

# Common configuration
export JAVA_HOME=/opt/soft/java
# Standalone mode (non-HA): keep these two lines commented out when using Standalone HA mode
#SPARK_MASTER_HOST=hd1
#SPARK_MASTER_PORT=7077
# Standalone HA mode
SPARK_MASTER_WEBUI_PORT=8082
export SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=hd1,hd2,hd3
-Dspark.deploy.zookeeper.dir=/spark"
# Common history server configuration
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://hd1:8020/spark/history
-Dspark.history.retainedApplications=30"


Create the corresponding directory on HDFS for the history server (if it does not exist yet)

hdfs dfs -mkdir -p /spark/history
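
The "if not available" condition can be made explicit. The following is a minimal sketch, assuming the HDFS client is on the PATH of the node where it runs; ensure_history_dir is a hypothetical helper name introduced here, not part of Hadoop or Spark.

```shell
#!/usr/bin/env bash
# ensure_history_dir is a hypothetical helper, not a Hadoop or Spark command.
# It creates the event log directory only when it is missing.
ensure_history_dir() {
  local dir="$1"
  # `hdfs dfs -test -d` exits with 0 when the path exists and is a directory.
  if hdfs dfs -test -d "$dir"; then
    echo "exists: $dir"
  else
    hdfs dfs -mkdir -p "$dir" && echo "created: $dir"
  fi
}

# Usage on a node with the HDFS client configured:
#   ensure_history_dir /spark/history
```

Since `-mkdir -p` already succeeds when the directory exists, the explicit test mainly serves to report which case occurred.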


spark-defaults.conf

# Common history server configuration
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://hd1:8020/spark/history
# Add the two settings below so that the History link of an application in the Yarn UI
# (http://hd2:8088/cluster/apps) jumps to that application on the Spark history server (port 18080)
spark.yarn.historyServer.address=hd1:18080
spark.history.ui.port=18080

Validation

Start the related components and run the following commands:

# Standalone (HA) mode
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hd1:7077,hd2:7077 \
./examples/jars/spark-examples_2.12-3.1.1.jar \
10

# Yarn mode
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
./examples/jars/spark-examples_2.12-3.1.1.jar \
10
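
In Standalone (HA) mode the --master value lists every master, separated by commas, so that spark-submit can find whichever one is currently active. As a small sketch, the URL can be assembled from a host list. build_master_url is a hypothetical helper name introduced here for illustration, and port 7077 is taken from the spark-env.sh shown earlier.

```shell
#!/usr/bin/env bash
# build_master_url is a hypothetical helper, not part of Spark.
# It joins the given hosts into a spark://host:port,host:port URL.
build_master_url() {
  local port="$1"
  shift
  local url=""
  local host
  for host in "$@"; do
    # Prepend a comma only when url already holds an earlier entry.
    url="${url:+$url,}${host}:${port}"
  done
  echo "spark://${url}"
}

# Usage (assumes hd1 and hd2 both run a master):
#   bin/spark-submit --master "$(build_master_url 7077 hd1 hd2)" ...
```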

View the effect: for Standalone (HA) mode, check whether http://hd1:8082 (Spark UI) and http://hd1:18080/ (history server) show the corresponding application. For Yarn mode, check whether http://hd2:8088/cluster/apps (Yarn UI) and http://hd1:18080/ (history server) show the corresponding application, and whether clicking History in the Yarn UI jumps to the history server.
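
The history server check can also be done from the command line. The sketch below assumes curl is installed and relies on the history server's REST endpoint /api/v1/applications; history_api_url and list_history_apps are hypothetical helper names introduced here.

```shell
#!/usr/bin/env bash
# Both functions are hypothetical helpers, not part of Spark.

# Build the REST URL that lists applications known to the history server.
history_api_url() {
  echo "http://$1:$2/api/v1/applications"
}

# Fetch the application list as JSON (requires curl and a running history server).
list_history_apps() {
  curl -s "$(history_api_url "$1" "$2")"
}

# Usage (host and port taken from the configuration above):
#   list_history_apps hd1 18080
```

Applications submitted in either mode should appear in the returned JSON once their event logs have been written to hdfs://hd1:8020/spark/history.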