Deploying Spark means setting up a distributed cluster environment, and many problems can arise during deployment because of mismatched software versions. This section describes how to build a cluster with Hadoop 3.2 and Spark 3.1.2.
Virtualization: VMware Workstation Pro 64-bit; Operating system: Ubuntu 16.04 64-bit; Java: jdk-8u301-linux-x64.tar.gz; Scala: scala-2.12.15.tgz; Hadoop: hadoop-3.2.2.tar.gz; Spark: spark-3.1.2-bin-hadoop3.2.tgz
II. Setup process
1. Create a VM. You can create a user dedicated to Spark on the VM by running the following commands in sequence:
sudo useradd -m hadoop -s /bin/bash // Create the hadoop user
sudo passwd hadoop // Set the hadoop password
sudo adduser hadoop sudo // Add the hadoop user to the sudo group
2. The finished cluster will have one master and two workers (worker1 and worker2). Start by setting the hostname of the first VM to master:
sudo vi /etc/hostname // Set the hostname
reboot // Make the setting take effect
3. Give the VM a fixed IP address:
sudo vi /etc/network/interfaces // Change the IP address
ifconfig // Display the current IP address
4. Modify the hosts file to add entries for master, worker1, and worker2. The IP addresses of worker1 and worker2 will later be set to 192.168.127.200 and 192.168.127.210 respectively.
sudo vi /etc/hosts // Modify hosts
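The resulting /etc/hosts might look like the sketch below. The master address 192.168.127.100 is an assumption (the original does not state it) — use whatever fixed IP you assigned in step 3; the worker addresses are the ones used throughout this guide.

```
127.0.0.1        localhost
192.168.127.100  master     # assumed master IP, substitute your own
192.168.127.200  worker1
192.168.127.210  worker2
```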
5. Install the JDK
Download the installation packages and copy them to the VM. Create a spark folder under /home/hadoop, with one subfolder per piece of software. For example, my structure looks like this:
tar -zxvf jdk-8u301-linux-x64.tar.gz // Decompress the installation package
Then add the package location to the environment variables: sudo vi /etc/profile
source /etc/profile // Make the settings take effect
java -version // Check whether the installation succeeded
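The lines appended to /etc/profile are not shown in the original. Assuming the JDK was unpacked under ~/spark/java (the path is an assumption — adjust it to your folder layout), they would look roughly like this:

```
# appended to /etc/profile (install path is an assumption)
export JAVA_HOME=/home/hadoop/spark/java/jdk1.8.0_301
export PATH=$JAVA_HOME/bin:$PATH
```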
6. Install Scala
tar -zxvf scala-2.12.15.tgz // Decompress the installation package
Add the package location to the environment variables: sudo vi /etc/profile
source /etc/profile // Make the settings take effect
scala -version // Check whether the installation succeeded
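As with the JDK, the profile additions are not shown. Assuming Scala was unpacked under ~/spark/scala (an assumed path), something like:

```
# appended to /etc/profile (install path is an assumption)
export SCALA_HOME=/home/hadoop/spark/scala/scala-2.12.15
export PATH=$SCALA_HOME/bin:$PATH
```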
7. Install the SSH service
sudo apt-get install openssh-server // Install SSH
8. Clone the host
Based on this common configuration, clone two VMs as worker1 and worker2. On each clone, change the hostname (worker1, worker2) and IP address (192.168.127.200, 192.168.127.210). After the changes, run ping to check that the three machines can reach each other.
9. Generate a public and private key pair on each host.
ssh-keygen -t rsa
Then send the public keys from worker1 and worker2 to master:
On worker1: scp ~/.ssh/id_rsa.pub hadoop@master:~/.ssh/id_rsa.pub.worker1
On worker2: scp ~/.ssh/id_rsa.pub hadoop@master:~/.ssh/id_rsa.pub.worker2
On master, concatenate all public keys into authorized_keys, the public key file used for authentication: cat ~/.ssh/id_rsa.pub* >> ~/.ssh/authorized_keys
Then distribute authorized_keys from master to worker1 and worker2:
scp ~/.ssh/authorized_keys hadoop@worker1:~/.ssh/
scp ~/.ssh/authorized_keys hadoop@worker2:~/.ssh/
Finally, check that you can log in to the other hosts over SSH without a password: ssh worker1
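The aggregation in step 9 relies on the shell glob ~/.ssh/id_rsa.pub* matching the master's own key plus the two copied worker keys. A local simulation with dummy files (the file contents are placeholders, not real keys):

```shell
# Simulate the key aggregation from step 9 using dummy files in /tmp.
demo=/tmp/ssh-agg-demo
mkdir -p "$demo"
echo "ssh-rsa AAAA...m  hadoop@master"  > "$demo/id_rsa.pub"
echo "ssh-rsa AAAA...w1 hadoop@worker1" > "$demo/id_rsa.pub.worker1"
echo "ssh-rsa AAAA...w2 hadoop@worker2" > "$demo/id_rsa.pub.worker2"

# The glob id_rsa.pub* picks up all three files at once.
cat "$demo"/id_rsa.pub* > "$demo/authorized_keys"
wc -l < "$demo/authorized_keys"   # one line per host key: 3
```

Because the master's own key is included, the master can also SSH to itself without a password, which the Hadoop start scripts require.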
10. Install Hadoop
tar -zxvf hadoop-3.2.2.tar.gz // Decompress the installation package
After decompressing, go to the Hadoop configuration directory and modify the configuration files:
cd ~/spark/hadoop/hadoop-3.2.2/etc/hadoop // Go to the configuration directory
vi hadoop-env.sh // Modify the configuration file
The modified configuration files look like this:
hadoop-env.sh:
yarn-env.sh:
workers:
core-site.xml:
hdfs-site.xml:
mapred-site.xml:
yarn-site.xml:
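The original screenshots of these files are not reproduced here. As a rough sketch, a minimal configuration for this master/worker1/worker2 layout could look like the following — all directory paths and the JDK location are assumptions, so adjust them to your own layout:

```
# hadoop-env.sh and yarn-env.sh: point Hadoop at the JDK (path is an assumption)
export JAVA_HOME=/home/hadoop/spark/java/jdk1.8.0_301

# workers: one worker hostname per line
worker1
worker2
```

```
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/spark/hadoop/tmp</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```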
11. After configuring Hadoop on master, distribute Hadoop to the two workers.
scp -r ~/spark/hadoop/hadoop-3.2.2 hadoop@worker1:~/spark/hadoop/
scp -r ~/spark/hadoop/hadoop-3.2.2 hadoop@worker2:~/spark/hadoop/
12. Format the NameNode
cd ~/spark/hadoop/hadoop-3.2.2
bin/hdfs namenode -format // (bin/hadoop namenode -format also works but is deprecated in Hadoop 3)
13. Start the Hadoop cluster and verify it
cd ~/spark/hadoop/hadoop-3.2.2
sbin/start-dfs.sh
sbin/start-yarn.sh
Then run jps on each of the three machines: with the configuration above you should see NameNode, SecondaryNameNode, and ResourceManager on the master, and DataNode and NodeManager on each of the two workers.
14. Install and configure Spark
tar -zxvf spark-3.1.2-bin-hadoop3.2.tgz // Decompress the installation package
Go to spark-3.1.2-bin-hadoop3.2/conf, copy each *.template file, remove the .template suffix from the copies, and then edit them: vi spark-env.sh
spark-env.sh:
workers:
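The screenshots for these two files are also omitted. A minimal sketch for this cluster follows; all paths are assumptions, carried over from the earlier steps:

```
# spark-env.sh (install paths are assumptions)
export JAVA_HOME=/home/hadoop/spark/java/jdk1.8.0_301
export SCALA_HOME=/home/hadoop/spark/scala/scala-2.12.15
export HADOOP_HOME=/home/hadoop/spark/hadoop/hadoop-3.2.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_HOST=master

# workers: one worker hostname per line
worker1
worker2
```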
Distribute Spark from the master:
scp -r ~/spark/spark-3.1.2-bin-hadoop3.2 hadoop@worker1:~/spark/
scp -r ~/spark/spark-3.1.2-bin-hadoop3.2 hadoop@worker2:~/spark/
15. Start the Spark cluster. First start the Hadoop cluster as in step 13, then run:
cd ~/spark/spark-3.1.2-bin-hadoop3.2
sbin/start-all.sh
to start Spark.
You can verify with the jps command, or by visiting master:8080 in a browser. You can also run ./bin/spark-shell in the Spark directory to open the Spark shell.