This Spark deployment process follows an earlier guide on building a Spark distributed cluster environment. Because software versions differ, many problems come up during deployment, so this section describes how to build a cluster with Hadoop 3.2 and Spark 3.1.2, adapting the original guide.

I. Environment

Virtualization: VMware Workstation Pro 64-bit
Operating system: Ubuntu 16.04 64-bit
Java: jdk-8u301-linux-x64.tar.gz
Scala: scala-2.12.15.tgz
Hadoop: hadoop-3.2.2.tar.gz
Spark: spark-3.1.2-bin-hadoop3.2.tgz

II. Construction process

1. Create a VM. On the VM, you can create a user dedicated to Spark. Run the following commands in sequence on the command line:

sudo useradd -m hadoop -s /bin/bash // Create a hadoop user

sudo passwd hadoop // Set the hadoop password

sudo adduser hadoop sudo // Add the hadoop user to the sudo group

2. The finished cluster will have one master and two workers (worker1 and worker2). Set the first VM as the master by changing its hostname to master:

sudo vi /etc/hostname // Set the hostname

reboot // Make the setting take effect



3. Change the master's IP address to a fixed value (a sample configuration is shown below).

sudo vi /etc/network/interfaces // Change the IP address
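A minimal static configuration sketch for /etc/network/interfaces. The interface name ens33, the master address 192.168.127.100, and the gateway 192.168.127.2 are assumptions chosen to match the 192.168.127.x worker addresses used later; adjust them to your own network:

auto ens33
iface ens33 inet static
address 192.168.127.100
netmask 255.255.255.0
gateway 192.168.127.2
dns-nameservers 192.168.127.2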



ifconfig // Display the current IP address



4. Modify the hosts file to add entries for master, worker1, and worker2. Here the IP addresses of worker1 and worker2 are set to 192.168.127.200 and 192.168.127.210 respectively (see the example below).

sudo vi /etc/hosts // Modify hosts
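The resulting /etc/hosts should contain one line per node, for example (the master address 192.168.127.100 is the assumed value from the previous step; the worker addresses are the ones given above):

192.168.127.100 master
192.168.127.200 worker1
192.168.127.210 worker2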



5. Install the JDK

You are advised to download the installation packages on the host and move them to the VM. Create a spark folder under /home/hadoop, and under it create one subfolder for each piece of software. For example, my structure looks like this:
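A possible layout, inferred from the paths used later in this section (the java and scala subfolder names are assumptions):

/home/hadoop/spark/java/jdk1.8.0_301/
/home/hadoop/spark/scala/scala-2.12.15/
/home/hadoop/spark/hadoop/hadoop-3.2.2/
/home/hadoop/spark/spark-3.1.2-bin-hadoop3.2/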



tar -zxvf jdk-8u301-linux-x64.tar.gz // Decompress the installation package

Then add the package location to the environment variables: sudo vi /etc/profile
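For example, assuming the JDK was unpacked to /home/hadoop/spark/java/jdk1.8.0_301 (the exact path depends on where you extracted it), append lines such as:

export JAVA_HOME=/home/hadoop/spark/java/jdk1.8.0_301
export PATH=$PATH:$JAVA_HOME/bin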



source /etc/profile // Make the settings take effect

java -version // Check whether the installation succeeded

6. Install Scala

tar -zxvf scala-2.12.15.tgz // Decompress the installation package

Add the package location to the environment variables: sudo vi /etc/profile
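Assuming Scala was unpacked to /home/hadoop/spark/scala/scala-2.12.15 (adjust to your own path), append lines such as:

export SCALA_HOME=/home/hadoop/spark/scala/scala-2.12.15
export PATH=$PATH:$SCALA_HOME/bin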



source /etc/profile // Make the settings take effect

scala -version // Check whether the installation succeeded

7. Install the SSH service

sudo apt-get install openssh-server // Install SSH

8. Clone the host

Based on the common configuration so far, clone two VMs as worker1 and worker2. You need to change the hostname (worker1, worker2) and IP address (192.168.127.200, 192.168.127.210) of the cloned hosts. After the modification, run the ping command to check whether the three machines can reach each other.

9. Generate a public and private key pair on each host.

ssh-keygen -t rsa

Then send id_rsa.pub from worker1 and worker2 to the master:

On worker1: scp ~/.ssh/id_rsa.pub hadoop@master:~/.ssh/id_rsa.pub.worker1

On worker2: scp ~/.ssh/id_rsa.pub hadoop@master:~/.ssh/id_rsa.pub.worker2

On the master, load all public keys into authorized_keys, the public key file used for authentication:

cat ~/.ssh/id_rsa.pub* >> ~/.ssh/authorized_keys

Then distribute the public key file authorized_keys from the master to worker1 and worker2:

scp ~/.ssh/authorized_keys hadoop@worker1:~/.ssh/

scp ~/.ssh/authorized_keys hadoop@worker2:~/.ssh/

Finally, check whether you can log in to the other hosts over SSH without a password:

ssh worker1

10. Install Hadoop

tar -zxvf hadoop-3.2.2.tar.gz // Decompress the installation package

After decompressing, go to the Hadoop configuration directory and modify the configuration files:

cd ~/spark/hadoop/hadoop-3.2.2/etc/hadoop // Go to the configuration directory

vi hadoop-env.sh // Modify the configuration file

The modified configuration files look like this:

hadoop-env.sh:
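At a minimum, point JAVA_HOME to the JDK installed earlier (the path shown is the assumed one from step 5):

export JAVA_HOME=/home/hadoop/spark/java/jdk1.8.0_301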



yarn-env.sh:
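Likewise, set JAVA_HOME here (same assumed path):

export JAVA_HOME=/home/hadoop/spark/java/jdk1.8.0_301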



workers:
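List the hostnames of the worker nodes, one per line:

worker1
worker2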



core-site.xml:
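A minimal sketch: fs.defaultFS points to the NameNode on master, and hadoop.tmp.dir (an assumed location under the install directory) holds temporary data:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/spark/hadoop/hadoop-3.2.2/tmp</value>
  </property>
</configuration>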



hdfs-site.xml:
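A minimal sketch: with two workers, a replication factor of 2 is a reasonable choice (the storage directories are assumptions):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/spark/hadoop/hadoop-3.2.2/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/spark/hadoop/hadoop-3.2.2/hdfs/data</value>
  </property>
</configuration>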



mapred-site.xml:
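A minimal sketch that runs MapReduce on YARN:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>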



yarn-site.xml:
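A minimal sketch: the ResourceManager runs on master, and the MapReduce shuffle service is enabled on the NodeManagers:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>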

11. After configuring Hadoop on the master, distribute it to the two workers:

scp -r ~/spark/hadoop/hadoop-3.2.2 hadoop@worker1:~/spark/hadoop/hadoop-3.2.2

scp -r ~/spark/hadoop/hadoop-3.2.2 hadoop@worker2:~/spark/hadoop/hadoop-3.2.2

12. Format the NameNode

cd ~/spark/hadoop/hadoop-3.2.2

bin/hadoop namenode -format

13. Start the Hadoop cluster and verify it

cd ~/spark/hadoop/hadoop-3.2.2

sbin/start-dfs.sh

sbin/start-yarn.sh

Then run the jps command separately on the three machines. The master should show the NameNode, SecondaryNameNode, and ResourceManager processes, while the two workers should show DataNode and NodeManager.

14. Install and configure Spark

tar -zxvf spark-3.1.2-bin-hadoop3.2.tgz // Decompress the installation package

Go to spark-3.1.2-bin-hadoop3.2/conf, copy the template files, remove the .template suffix from the copies, and then edit them:

vi spark-env.sh

spark-env.sh:
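A minimal sketch, assuming the install paths used in the earlier steps (adjust to your own layout):

export JAVA_HOME=/home/hadoop/spark/java/jdk1.8.0_301
export SCALA_HOME=/home/hadoop/spark/scala/scala-2.12.15
export HADOOP_HOME=/home/hadoop/spark/hadoop/hadoop-3.2.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_HOST=master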



workers:
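As with Hadoop, list the worker hostnames one per line:

worker1
worker2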



Distribute Spark from the master to the two workers:

scp -r ~/spark/spark-3.1.2-bin-hadoop3.2 hadoop@worker1:~/spark/spark-3.1.2-bin-hadoop3.2

scp -r ~/spark/spark-3.1.2-bin-hadoop3.2 hadoop@worker2:~/spark/spark-3.1.2-bin-hadoop3.2

15. Start the Spark cluster. First, start the Hadoop cluster as in step 13.

cd ~/spark/spark-3.1.2-bin-hadoop3.2

sbin/start-all.sh // Start Spark

You can verify the cluster with the jps command or by visiting master:8080 in a browser. You can also run ./bin/spark-shell in the Spark directory to open the Spark shell.