This article describes how to set up a Spark cluster.

To set up a Spark cluster, you need to:

  1. Configure passwordless SSH login for the cluster
  2. Java JDK 1.8
  3. Scala 2.11.12
  4. spark-2.4.0-bin-hadoop2.7
  5. Hadoop 2.7.6

All of the above files are installed in the /home/zhuyb/opt folder.
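
As a sanity check, the layout under /home/zhuyb/opt should end up roughly like this (directory names are assumed from the versions listed above):

ls ~/opt
# hadoop-2.7.6  jdk1.8.0_201  scala-2.11.12  spark-2.4.0-bin-hadoop2.7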

The servers

The servers are laboratory machines: one master and three slaves. Each IP address and hostname is mapped in the hosts file so that every machine can be reached directly by hostname.

IP address       hostname
219.216.64.144   master
219.216.64.200   hadoop0
219.216.65.202   hadoop1
219.216.65.243   hadoop2
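
One way to apply this mapping is to append the entries above to /etc/hosts on every machine (root privileges assumed):

sudo tee -a /etc/hosts <<'EOF'
219.216.64.144 master
219.216.64.200 hadoop0
219.216.65.202 hadoop1
219.216.65.243 hadoop2
EOF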

Configure passwordless SSH login

For details, see the cluster SSH passwordless login setup guide. Passwordless SSH login is actually very easy to configure. We have four machines; first, generate a new public/private key pair on master.

ssh-keygen -t rsa                     # generate id_rsa and id_rsa.pub under ~/.ssh
cat id_rsa.pub >> authorized_keys     # copy the public key into the authorized_keys file

Then generate public and private keys on the three slave machines in the same way, and append each slave's public key to master's authorized_keys file.

ssh-copy-id -i master                 # append this machine's public key to master's authorized_keys

Finally, copy master's authorized_keys file back to the three slave machines. At this point, all four machines can log in to each other over SSH without a password.
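
A minimal sketch of that last step, assuming the zhuyb user and default ~/.ssh paths on every machine:

scp ~/.ssh/authorized_keys zhuyb@hadoop0:~/.ssh/
scp ~/.ssh/authorized_keys zhuyb@hadoop1:~/.ssh/
scp ~/.ssh/authorized_keys zhuyb@hadoop2:~/.ssh/

# Each of the following should now log in without a password prompt
ssh hadoop0 hostname
ssh hadoop1 hostname
ssh hadoop2 hostname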

Install the JDK and Scala

The JDK version is 1.8 and the Scala version is 2.11. Scala 2.12 is somewhat incompatible with Spark 2.4 and causes issues during development, so 2.11 is used here. After decompressing the JDK and Scala archives, configure the environment variables in ~/.bashrc.

export JAVA_HOME=/home/zhuyb/opt/jdk1.8.0_201
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}

export SCALA_HOME=/home/zhuyb/opt/scala-2.11.12
export PATH=${SCALA_HOME}/bin:$PATH

Then run source ~/.bashrc to apply the changes. All of these steps need to be performed on each machine.
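
A quick way to confirm the variables took effect on a machine (expected versions taken from the list above):

java -version      # should report 1.8.0_201
scala -version     # should report 2.11.12
echo $JAVA_HOME $SCALA_HOME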

Configure Hadoop

  1. Unzip the Hadoop archive into the ~/opt/ folder:
tar -zxvf hadoop-2.7.6.tar.gz
mv hadoop-2.7.6 ~/opt
  2. Configure the environment variables in ~/.bashrc and run source ~/.bashrc for them to take effect:
export HADOOP_HOME=/home/zhuyb/opt/hadoop-2.7.6
export PATH=.:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export CLASSPATH=.:$HADOOP_HOME/lib:$CLASSPATH
export HADOOP_PREFIX=/home/zhuyb/opt/hadoop-2.7.6
  3. Modify the corresponding configuration files:

    A. Modify $HADOOP_HOME/etc/hadoop/slaves: delete the original localhost and replace it with the following content:

        hadoop0
        hadoop1
        hadoop2

    B. Modify $HADOOP_HOME/etc/hadoop/core-site.xml:

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/home/zhuyb/opt/hadoop-2.7.6/tmp</value>
        </property>
    
    </configuration>
    

    C. Modify $HADOOP_HOME/etc/hadoop/hdfs-site.xml:

    <configuration>
        <property>
            <name>dfs.datanode.address</name>
            <value>0.0.0.0:50010</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>master:50090</value>
        </property>
        <!-- Number of backups: the default is 3 -->
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <!-- namenode -->
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/home/zhuyb/tmp/dfs/name</value>
        </property>
        <!-- datanode -->
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/home/zhuyb/tmp/dfs/data</value>
        </property>
    </configuration>

    D. Copy the template to create the XML file (cp mapred-site.xml.template mapred-site.xml), then modify $HADOOP_HOME/etc/hadoop/mapred-site.xml:

    <configuration>
        <!-- The MapReduce job execution framework is YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <!-- MapReduce job history access addresses -->
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>master:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>master:19888</value>
        </property>
    </configuration>

    E. Modify $HADOOP_HOME/etc/hadoop/yarn-site.xml:

    <configuration>
        <!-- Site specific YARN configuration properties -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>master</value>
        </property>
    </configuration>

    F. Modify $HADOOP_HOME/etc/hadoop/hadoop-env.sh and set JAVA_HOME:

    export JAVA_HOME=/home/zhuyb/opt/jdk1.8.0_201
  4. Copy master's Hadoop folder to hadoop0, hadoop1, and hadoop2:

scp -r ~/opt/hadoop-2.7.6 zhuyb@hadoop0:~/opt
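# The same copy is repeated for the other two slaves (user and paths assumed
# from the layout above):
scp -r ~/opt/hadoop-2.7.6 zhuyb@hadoop1:~/opt
scp -r ~/opt/hadoop-2.7.6 zhuyb@hadoop2:~/opt

# A minimal sketch of bringing the cluster up from master with the standard
# Hadoop 2.x scripts (on PATH via the sbin entry added above): format the
# NameNode once, then start HDFS and YARN.
hdfs namenode -format
start-dfs.sh
start-yarn.sh

# jps on master should show NameNode, SecondaryNameNode and ResourceManager;
# jps on each slave should show DataNode and NodeManager.
jps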

Configure Spark

  1. Decompress the Spark archive into ~/opt, then configure the environment variables in ~/.bashrc and run source ~/.bashrc:
export SPARK_HOME=/home/zhuyb/opt/spark-2.4.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME
  2. Copy the template to spark-env.sh (cp spark-env.sh.template spark-env.sh), then modify $SPARK_HOME/conf/spark-env.sh and add the following:
export JAVA_HOME=/home/zhuyb/opt/jdk1.8.0_201
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
  3. Copy the template to slaves (cp slaves.template slaves), then modify $SPARK_HOME/conf/slaves and add the following:
hadoop0
hadoop1
hadoop2
  4. On hadoop0, hadoop1, and hadoop2, modify $SPARK_HOME/conf/spark-env.sh and change export SPARK_LOCAL_IP to the IP address of that particular machine.
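
With the configuration distributed, a minimal sketch of starting and smoke-testing the standalone cluster (script and example jar names assumed from the spark-2.4.0-bin-hadoop2.7 layout):

# On master: start the Spark master and all workers listed in conf/slaves
$SPARK_HOME/sbin/start-all.sh

# The web UI at http://master:8080 should list the three workers

# Optional smoke test with the bundled SparkPi example
$SPARK_HOME/bin/spark-submit \
  --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 100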
