Environment to prepare
- Prepare three Linux (CentOS 7) VMs: `10.58.12.170`, `10.58.12.171`, `10.58.10.129` (user `tdops`)
- Software versions
  - JDK 1.8.0_60
  - Scala 2.11.12
  - Hadoop 3.1.3
  - Spark 2.4.6
  - Livy 0.7.0
Configure hosts
- `sudo vim /etc/hosts`

  ```
  # Add the following host entries
  10.58.12.171 ailoan-vip-d-012171.hz.td
  10.58.12.170 ailoan-vip-d-012170.hz.td
  10.58.10.129 ailoan-vip-d-010129.hz.td
  ```
Configure passwordless SSH login between the three machines
- Install the OpenSSH server

  ```shell
  sudo yum install openssh-server
  ```
- Generate a key pair

  ```shell
  ssh-keygen -t rsa
  ```
- Append each machine's public key to `~/.ssh/authorized_keys` on the other machines
- Verify the passwordless login

  ```shell
  # From 170
  ssh ailoan-vip-d-012171.hz.td
  ```
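The key steps above can be sketched end to end. This is a minimal sketch assuming the default RSA key type and the `tdops` user; a scratch directory stands in for `~/.ssh` so the commands are safe to replay without touching real keys:

```shell
# Generate a passphrase-less key pair and append the public key,
# as the steps above do on each machine.
KEYDIR=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$KEYDIR/id_rsa"          # no passphrase
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"   # the "append" step
chmod 600 "$KEYDIR/authorized_keys"
# On the real machines, ssh-copy-id performs the append remotely, e.g.:
#   ssh-copy-id tdops@ailoan-vip-d-012171.hz.td
```

In practice, run the generation on each of the three hosts and append every host's public key to every other host's `authorized_keys`.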
Install the JDK
Install Scala
- Download the installation package from downloads.lightbend.com/scala/2.11….
- Copy the installation package to the target machine

  ```shell
  scp "{username}@localip:/Users/{username}/Downloads/bigdata software/scala-2.11.12.tgz" /usr/install/bigdata
  ```
- Extract it to the target directory

  ```shell
  sudo tar -zxvf scala-2.11.12.tgz
  ```
- Configure environment variables

  ```shell
  # Edit the environment variables
  sudo vim /etc/profile
  # Add the following configuration
  export SCALA_HOME=/usr/install/bigdata/scala-2.11.12
  export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
  # Reload the profile
  source /etc/profile
  ```
- Verify the installation by running `scala`

  ```
  Welcome to Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60).
  ```
Install Hadoop
- Download the package

  ```shell
  cd /usr/install/bigdata
  wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.1.3/hadoop-3.1.3.tar.gz
  ```
- Extract it to the target directory

  ```shell
  sudo tar -zxvf hadoop-3.1.3.tar.gz
  ```
- Give the `tdops` user ownership of the Hadoop directory

  ```shell
  sudo chown -R tdops:users hadoop-3.1.3
  ```
- Configure environment variables and application configuration
  - Configure hadoop-env.sh

    ```shell
    # Go to the Hadoop configuration directory
    cd /usr/install/bigdata/hadoop-3.1.3/etc/hadoop
    vim hadoop-env.sh
    # Add the JDK home and log directories
    export JAVA_HOME=/usr/install/jdk1.8.0_60
    export HADOOP_LOG_DIR=/home/tdops/spark/hadoop-3.1.3/logs
    ```
  - Configure yarn-env.sh

    ```shell
    vim yarn-env.sh
    # Add the JDK home directory
    export JAVA_HOME=/usr/install/jdk1.8.0_60
    ```
  - Configure core-site.xml

    ```xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <!-- hostname of the master -->
            <value>hdfs://ailoan-vip-d-012170.hz.td:9000/</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/tdops/spark/hadoop-3.1.3/tmp</value>
        </property>
    </configuration>
    ```
  - Configure hdfs-site.xml

    ```xml
    <configuration>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>ailoan-vip-d-012170.hz.td:50090</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/tdops/spark/hadoop-3.1.3/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/tdops/spark/hadoop-3.1.3/dfs/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
    </configuration>
    ```
  - Configure mapred-site.xml

    ```xml
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    ```
  - Configure yarn-site.xml

    ```xml
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <!-- Temporary workaround for abnormal exits in YARN mode:
             https://stackoverflow.com/questions/41468833/why-does-spark-exit-with-exitcode-16 -->
        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>ailoan-vip-d-012170.hz.td:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>ailoan-vip-d-012170.hz.td:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>ailoan-vip-d-012170.hz.td:8031</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>ailoan-vip-d-012170.hz.td:8033</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>ailoan-vip-d-012170.hz.td:8090</value>
        </property>
    </configuration>
    ```
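Note that the configuration above does not start anything: before the web UIs below will respond, HDFS has to be formatted once and the daemons started. A minimal sketch, assuming the install path and ownership used in this post (the exact commands on your cluster may differ):

```shell
# First start of HDFS and YARN on the master (guarded so this is
# harmless on a machine without Hadoop installed).
HADOOP_HOME=/usr/install/bigdata/hadoop-3.1.3   # path assumed from the steps above
if [ -x "$HADOOP_HOME/bin/hdfs" ]; then
    "$HADOOP_HOME/bin/hdfs" namenode -format    # first start only: initialise the NameNode
    "$HADOOP_HOME/sbin/start-dfs.sh"            # NameNode, DataNodes, SecondaryNameNode
    "$HADOOP_HOME/sbin/start-yarn.sh"           # ResourceManager, NodeManagers
else
    echo "hadoop not found at $HADOOP_HOME"
fi
```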
- Verify the installation
  - Access the YARN web UI at ailoan-vip-d-012170.hz.td:8090
  - Access the Hadoop HDFS web UI at ailoan-vip-d-012170.hz.td:9870/
Install Spark
- Download the Spark package
  - Download it from the official site: spark.apache.org/downloads.h…
  - Upload it to the prepared servers

    ```shell
    scp "{username}@localip:/Users/{username}/Downloads/bigdata software/spark-2.4.6-bin-hadoop2.7.tgz" /usr/install/bigdata
    ```
- Extract it to the target directory

  ```shell
  sudo tar -zxvf spark-2.4.6-bin-hadoop2.7.tgz
  ```
- Change the owning user and group

  ```shell
  sudo chown -R tdops:users spark-2.4.6-bin-hadoop2.7
  ```
- Configure environment variables and properties
  - Edit spark-env.sh under $SPARK_HOME/conf

    ```shell
    export SCALA_HOME=/usr/install/bigdata/scala-2.11.12
    export JAVA_HOME=/usr/install/jdk1.8.0_60
    export HADOOP_HOME=/usr/install/bigdata/hadoop-3.1.3
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export SPARK_LOG_DIR=/home/tdops/spark/spark-2.4.6/logs
    SPARK_MASTER_IP=ailoan-vip-d-012170.hz.td
    SPARK_LOCAL_DIRS=/usr/install/bigdata/spark-2.4.6-bin-hadoop2.7
    SPARK_DRIVER_MEMORY=512m
    ```
  - Configure the slaves file under $SPARK_HOME/conf

    ```
    # Set the other two hosts as slaves
    ailoan-vip-d-012171.hz.td
    ailoan-vip-d-010129.hz.td
    ```
- Start Spark

  ```shell
  cd $SPARK_HOME/sbin
  ./start-all.sh
  ```
- Verify the installation
  - Open ailoan-vip-d-012170.hz.td:8080/
  - Run `jps`: the master should show a Master process and each slave a Worker process
  - Run the official demo program

    ```shell
    ./spark-submit --master spark://ailoan-vip-d-012170.hz.td:7077 \
      --class org.apache.spark.examples.SparkPi \
      --deploy-mode cluster \
      file:/tmp/spark-examples_2.11-2.4.6.jar
    ```
  - The execution result can then be seen on the Spark web UI
Install Livy
- Download the package

  ```shell
  wget https://www.apache.org/dyn/closer.lua/incubator/livy/0.7.0-incubating/apache-livy-0.7.0-incubating-bin.zip
  ```
- Unzip it to the target directory

  ```shell
  sudo yum install unzip
  unzip apache-livy-0.7.0-incubating-bin.zip
  ```
- Modify the configuration files
  - Create and configure livy-env.sh

    ```
    cp livy-env.sh.template livy-env.sh
    sudo vim livy-env.sh
    # Add the following
    JAVA_HOME=/usr/install/jdk1.8.0_60
    HADOOP_CONF_DIR=/usr/install/bigdata/hadoop-3.1.3/etc/hadoop
    SPARK_HOME=/usr/install/bigdata/spark-2.4.6-bin-hadoop2.7
    LIVY_LOG_DIR=/home/tdops/spark/livy/logs
    ```
  - Create and configure livy.conf

    ```
    cp livy.conf.template livy.conf
    # Edit the configuration file
    livy.spark.deploy-mode = cluster
    livy.spark.master = spark://ailoan-vip-d-012170.hz.td:7077
    livy.file.local-dir-whitelist = /tmp
    ```
- Start Livy

  ```shell
  cd $LIVY_HOME/bin
  ./livy-server start
  ```
- View and verify Livy
  - Open ailoan-vip-d-012170.hz.td:8998
  - Submit a demo using Postman or Java code
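For a quick check without writing any client code, the same `POST /batches` submission that the Java client below performs can be made directly against Livy's REST API. A minimal sketch with curl, assuming the SparkPi jar has already been uploaded to the HDFS path used in the Java demo:

```shell
LIVY=http://ailoan-vip-d-012170.hz.td:8998
# Request body for POST /batches (the same fields the Java client sets)
cat > /tmp/livy-batch.json <<'EOF'
{
  "file": "hdfs://ailoan-vip-d-012170.hz.td:9000/example/spark-examples_2.11-2.4.6.jar",
  "className": "org.apache.spark.examples.SparkPi",
  "executorCores": 1,
  "driverCores": 1,
  "executorMemory": "512M",
  "driverMemory": "512M"
}
EOF
# Submit, then poll the returned id until the state reaches "success":
#   curl -s -H 'Content-Type: application/json' -d @/tmp/livy-batch.json "$LIVY/batches"
#   curl -s "$LIVY/batches/<id>"
```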
```java
package cn.xxx.yuntu.common.util.livy.core;

import cn.xxx.yuntu.common.util.dto.ApiResult;
import cn.xxx.yuntu.common.util.livy.vo.LivyArg;
import cn.xxx.yuntu.common.util.livy.vo.LivyStatus;
import cn.xxx.yuntu.common.util.livy.vo.LivyResult;
import cn.xxx.yuntu.common.util.util.HttpUtil;
import cn.xxx.yuntu.common.util.util.LogUtil;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.commons.lang3.StringUtils;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * @author li.minqiang
 * @date 2019/12/5
 */
public class LivyClient {
    public static final String DELETED = "deleted";
    public static final String LIVY_BATCH_URI = "%s/batches/%s";
    public static final String MSG = "msg";
    public static final String NOT_FOUND = "not found";
    public static final String SESSION = "Session";
    private final static String STARTING = "starting";
    private LivyArg livyArg;

    private LivyClient() {
    }

    public static LivyClient getInstance(LivyArg livyArg) {
        LivyClient livyClient = new LivyClient();
        livyClient.setLivyArg(livyArg);
        return livyClient;
    }

    public ApiResult submitSparkJar() {
        // Build the POST /batches request body from the task arguments
        JSONObject data = new JSONObject();
        data.put("file", livyArg.getJarPath());
        data.put("className", livyArg.getClassName());
        data.put("name", "testLivy" + System.currentTimeMillis());
        data.put("executorCores", livyArg.getExecutorCores());
        data.put("executorMemory", livyArg.getExecutorMemory());
        data.put("driverCores", livyArg.getDriverCores());
        data.put("driverMemory", livyArg.getDriverMemory());
        data.put("numExecutors", livyArg.getNumExecutors());
        data.put("conf", livyArg.getConf());
        data.put("args", livyArg.getArgs());
        ApiResult apiResult = HttpUtil.postJson(String.format("%s/batches", livyArg.getLivyServer()), data);
        JSONObject obj = (JSONObject) apiResult.getResult();
        if (!apiResult.isSuccess() || StringUtils.isEmpty(obj.getString("state"))) {
            LogUtil.error("make livy request error:{}", JSON.toJSONString(apiResult));
            return ApiResult.failure("Failed to submit livy task");
        }
        LogUtil.info("livy submit result:{}", JSON.toJSONString(apiResult, true));
        return ApiResult.successWithResult(obj.getString("id"));
    }

    private String getLivyUrl() {
        return String.format("%s/ui/batch/%s/log", livyArg.getLivyServer(), livyArg.getTaskId());
    }

    private List<String> makeListLogs(JSONArray logs) {
        List<String> mlogs = new ArrayList<String>();
        if (logs == null) {
            return mlogs;
        }
        for (int i = 0; i < logs.size(); i++) {
            mlogs.add(logs.getString(i));
        }
        return mlogs;
    }

    private List<String> getLivyServerLogs() {
        String url = String.format("%s/batches/%s/log?size=-1", livyArg.getLivyServer(), livyArg.getTaskId());
        ApiResult apiResult = HttpUtil.get(url);
        if (apiResult.isSuccess()) {
            JSONObject r = (JSONObject) apiResult.getResult();
            return makeListLogs(r.getJSONArray("log"));
        }
        return new ArrayList<String>();
    }

    public LivyArg getLivyArg() {
        return livyArg;
    }

    public void setLivyArg(LivyArg livyArg) {
        this.livyArg = livyArg;
    }

    public static void main(String[] args) {
        // JavaWordCount();
        SparkPi();
    }

    public static void SparkPi() {
        LivyClient livyClient = new LivyClient();
        LivyArg livyArg = new LivyArg();
        livyArg.setLivyServer("http://ailoan-vip-d-012170.hz.td:8998");
        livyArg.setJarPath("hdfs://ailoan-vip-d-012170.hz.td:9000/example/spark-examples_2.11-2.4.6.jar");
        livyArg.setClassName("org.apache.spark.examples.SparkPi");
        livyArg.setExecutorCores(1);
        livyArg.setDriverCores(1);
        livyArg.setExecutorMemory("512M");
        livyArg.setDriverMemory("512M");
        livyClient.setLivyArg(livyArg);
        ApiResult apiResult = livyClient.submitSparkJar();
        System.out.println(apiResult.getCode() + "-" + apiResult.getReason());
        System.out.println(apiResult);
    }

    public static void JavaWordCount() {
        LivyClient livyClient = new LivyClient();
        LivyArg livyArg = new LivyArg();
        livyArg.setLivyServer("http://ailoan-vip-d-012170.hz.td:8998");
        livyArg.setJarPath("hdfs://ailoan-vip-d-012170.hz.td:9000/example/spark-examples_2.11-2.4.6.jar");
        livyArg.setClassName("org.apache.spark.examples.JavaWordCount");
        livyArg.setExecutorCores(1);
        livyArg.setDriverCores(1);
        livyArg.setExecutorMemory("512M");
        livyArg.setDriverMemory("512M");
        List<String> args = new ArrayList<>();
        args.add("hdfs://ailoan-vip-d-012170.hz.td:9000/example/wordCount.txt");
        livyArg.setArgs(args);
        livyClient.setLivyArg(livyArg);
        ApiResult apiResult = livyClient.submitSparkJar();
        System.out.println(apiResult.getCode() + "-" + apiResult.getReason());
        System.out.println(apiResult);
    }
}
```
- As before, the submission records and results can be viewed on the Spark web UI