1/ Preparation
Prepare several servers, at least three. Decide which is the master node and which are the slave nodes (worker nodes), then set up passwordless SSH login from the master to the other worker nodes. This requires the same user account on all machines. How is passwordless SSH login implemented? Append the master node's ~/.ssh/id_dsa.pub file to the ~/.ssh/authorized_keys file of each worker node.
<1> On the master node:
$ ssh-keygen -t dsa
Enter file in which to save the key (/home/you/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again: [empty]
<2> On the worker node:
# copy the ~/.ssh/id_dsa.pub file of the master node to the worker node, then run:
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
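The manual copy-and-append step above can also be done with ssh-copy-id, which ships with OpenSSH and appends the key and fixes file permissions in one command. A minimal sketch, assuming the worker host names are node2 and node3 and the shared account is named user (both are placeholders for your own values):

```shell
# Run on the master node after ssh-keygen has generated the key pair.
# ssh-copy-id appends ~/.ssh/id_dsa.pub to the remote authorized_keys
# and sets the correct permissions on ~/.ssh.
ssh-copy-id -i ~/.ssh/id_dsa.pub user@node2
ssh-copy-id -i ~/.ssh/id_dsa.pub user@node3

# Verify: this should print the worker's host name without asking for a password.
ssh user@node2 hostname
```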
2/ Log in to the master node server (all operations are performed on the master node)
<1> Download the Spark installation package (determine the operating system and version number) from the official website, and upload the package to the master server
<2> Extract the package
tar -zxvf spark-2.3.1-bin-hadoop2.6.tgz
<3> Rename the directory (renaming is optional)
mv spark-2.3.1-bin-hadoop2.6 spark-2.3.1
<4> cd into the conf directory of the installation and rename the slaves.template file
cd spark-2.3.1/conf
mv slaves.template slaves
<5> Open the slaves file renamed in <4> and add the host names of the nodes in the cluster
vi slaves
# then add the host names, one per line (this file also contains the host name of the master node, since it runs a worker too):
node1
node2
node3
<6> Change the name of the spark-env.sh.template file to spark-env.sh, and then edit the spark-env.sh file
mv spark-env.sh.template spark-env.sh
Set the following parameters:
JAVA_HOME: the path to the JDK
SPARK_MASTER_HOST: the IP address (or host name) of the master
SPARK_WORKER_CORES: the number of cores each worker may use on its node
SPARK_WORKER_MEMORY: the amount of memory each worker may use on its node
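A concrete spark-env.sh might look like the fragment below. The JDK path, host name, and resource sizes are illustrative values, not part of the original instructions; SPARK_MASTER_PORT is optional and defaults to 7077 if unset.

```shell
# Example spark-env.sh settings (adjust all values to your own machines).
export JAVA_HOME=/usr/java/jdk1.8.0_171   # path to your JDK install
export SPARK_MASTER_HOST=node1            # host name of the master node
export SPARK_MASTER_PORT=7077             # optional; 7077 is the default
export SPARK_WORKER_CORES=2               # cores each worker may use
export SPARK_WORKER_MEMORY=2g             # memory each worker may use
```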
<7> Synchronize the Spark installation package on the master node to other worker nodes
scp -r spark-2.3.1 node2:`pwd`
scp -r spark-2.3.1 node3:`pwd`
Note: the path for storing the Spark installation package on the worker nodes must be the same as on the master node.
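With more than two workers, the repeated scp commands above can be written as a small loop. A sketch, assuming the worker host names are node2 and node3 and the command is run from the directory holding the unpacked package:

```shell
# Copy the Spark directory to each worker, into the same absolute path
# as on the master (required by this setup).
for host in node2 node3; do
  scp -r spark-2.3.1 "$host":"$(pwd)"
done
```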
<8> Go to the sbin directory and run ./start-all.sh
Note: run sbin/start-all.sh on the master node (it must be run on the master node, not a worker node) to start the cluster. To stop the cluster, run sbin/stop-all.sh on the master node.
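After starting the cluster, it is worth verifying that the daemons actually came up. A quick check, assuming the JDK's jps tool is on the PATH and the master web UI is on its default port 8080 (both assumptions, not stated in the original):

```shell
# On the master node: jps should list a Master process.
jps

# On a worker (or via ssh from the master): jps should list a Worker process.
ssh node2 jps

# The standalone master web UI (default port 8080) should respond and
# show the registered workers.
curl -s http://node1:8080 | head
```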