Author | Date | Weather |
---|---|---|
Yuan Gongzi | 2020-01-27 (Monday) | Strong wind in Dongguan after the rain |
The more you know, the more you realize you don't know.
Without your likes, friends, there is no leveling up.
1. Environment preparation
- The example uses the CentOS 7 64-bit operating system
- Java 1.8 or above
- Hadoop has been installed
- Python has been installed
- The Scala environment has been installed (a quick check of these prerequisites is sketched below)
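Before going further, it is worth confirming that each prerequisite is actually available on the PATH. A minimal sanity check, assuming the tools were installed as listed above:

java -version        # should report 1.8 or above
hadoop version       # confirms the Hadoop client is on the PATH
python --version     # the Python that PySpark will use
scala -version       # confirms the Scala environment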
2. Download the installation package
Official address: go to the download page
Download the latest software version: spark-2.4.4-bin-without-hadoop.tgz
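If you prefer to fetch it from the command line, one option is to pull the tarball from the Apache release archive (the URL below is an assumption; use whichever mirror the download page points you to):

[root@hadoop-master /soft]# wget https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-without-hadoop.tgz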
3. Start installation
Unpack into the installation directory
[root@hadoop-master /soft]# tar -xvzf spark-2.4.4-bin-without-hadoop.tgz
[root@hadoop-master /soft]# chown -R hadoop:hadoop spark-2.4.4-bin-without-hadoop
[root@hadoop-master /soft]# ln -s spark-2.4.4-bin-without-hadoop spark
Set the environment variables. The PYSPARK_DRIVER_PYTHON variable selects the Python environment used by the PySpark driver; for details, see Common Environments.
[root@hadoop-master /soft]# vi /etc/profile
export SPARK_HOME=/soft/spark
export SPARK_CONF_DIR=/home/hadoop/spark/conf
export PATH=$PATH:$SPARK_HOME/bin
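# ANACONDA_HOME is assumed to be exported elsewhere in this profile and to point at your Anaconda installation (e.g. /soft/anaconda3 -- adjust to your own path)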
export PYSPARK_DRIVER_PYTHON=$ANACONDA_HOME/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_HOME/bin/python
[root@hadoop-master /soft]# source /etc/profile
Create the configuration directory and copy in the default configuration files
[root@hadoop-master /soft]# su - hadoop
[hadoop@hadoop-master /home/hadoop]$ mkdir -p /home/hadoop/spark/conf
[hadoop@hadoop-master /home/hadoop]$ cp -fr /soft/spark/conf/* /home/hadoop/spark/conf/
Modify the configuration file
[hadoop@hadoop-master /home/hadoop]$ cp /home/hadoop/spark/conf/spark-env.sh.template /home/hadoop/spark/conf/spark-env.sh
[hadoop@hadoop-master /home/hadoop]$ cp /home/hadoop/spark/conf/slaves.template /home/hadoop/spark/conf/slaves
[hadoop@hadoop-master /home/hadoop]$ vi /home/hadoop/spark/conf/spark-env.sh
export JAVA_HOME=/soft/jdk
export SCALA_HOME=/soft/scala
export SPARK_HOME=/soft/spark
export SPARK_CONF_DIR=/home/hadoop/spark/conf
export SPARK_LOG_DIR=/home/hadoop/spark/log
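# Note: SPARK_MASTER_IP is deprecated in Spark 2.x in favour of SPARK_MASTER_HOST, but the old name is still honoured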
export SPARK_MASTER_IP=hadoop-master
export SPARK_WORKER_MEMORY=512m
export HADOOP_CONF_DIR=/soft/hadoop/etc/hadoop
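# The "without-hadoop" build ships no Hadoop jars, so expose the installed Hadoop classpath to Spark: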
export SPARK_DIST_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)
[hadoop@hadoop-master /home/hadoop]$ vi /home/hadoop/spark/conf/slaves
hadoop-dn1
hadoop-dn2
hadoop-dn3
Handle the possible exceptions described in section 5 (Pitfalls encountered) before synchronizing the installation files to the worker nodes
# As the hadoop user
[hadoop@hadoop-master /soft]$ xrsync.sh /soft/spark
================ dn1 ==================
================ dn2 ==================
================ dn3 ==================
[hadoop@hadoop-master /soft]$ xrsync.sh /soft/spark-2.4.4-bin-without-hadoop
[hadoop@hadoop-master /soft]$ xrsync.sh /home/hadoop/spark
# As the root user
[hadoop@hadoop-master /soft]$ su - root
[root@hadoop-master /root]# xrsync.sh /etc/profile
[root@hadoop-master /root]# xcall.sh source /etc/profile
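xrsync.sh and xcall.sh are the author's own cluster helper scripts and are not listed in this article. For readers who do not have them, a minimal sketch of what such an rsync wrapper might look like, assuming passwordless SSH to the worker hostnames used above (an illustration, not the author's actual script):

#!/bin/bash
# xrsync.sh (hypothetical sketch): push a file or directory to every worker node
# Usage: xrsync.sh /path/to/file_or_dir
for host in hadoop-dn1 hadoop-dn2 hadoop-dn3; do
  echo "================ $host =================="
  rsync -av "$1" "$host:$(dirname "$1")/"
done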
Start and verify
[hadoop@hadoop-master /home/hadoop]$ run-example SparkPi 10
[hadoop@hadoop-master /home/hadoop]$ spark-shell --master local[2]
scala> :quit
[hadoop@hadoop-master /home/hadoop]$ pyspark --master local[2]
Using Python version 3.6.5 (default, Apr 29 2018 16:14:56)
SparkSession available as 'spark'.
In [1]: exit
[hadoop@hadoop-master /home/hadoop]$ /soft/spark/sbin/start-all.sh
org.apache.spark.deploy.master.Master running as process 36750. Stop it first.
hadoop-dn3: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/log/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop-dn3.out
hadoop-dn2: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/log/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop-dn2.out
hadoop-dn1: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/log/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop-dn1.out
[hadoop@hadoop-master /home/hadoop]$ jps
36750 Master
[hadoop@hadoop-dn1 /home/hadoop]$ jps
5653 Worker
# Supplementary: start components separately
# /soft/spark/sbin/start-master.sh    # start the master
# /soft/spark/sbin/start-slaves.sh    # start the Worker on every slave node
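With the standalone cluster up, jobs can also be submitted to it instead of running in local mode. A small example using the bundled SparkPi (the examples jar name assumes the stock Spark 2.4.4 / Scala 2.11 layout; adjust the path to match your distribution):

[hadoop@hadoop-master /home/hadoop]$ spark-submit \
    --master spark://hadoop-master:7077 \
    --class org.apache.spark.examples.SparkPi \
    /soft/spark/examples/jars/spark-examples_2.11-2.4.4.jar 10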
Spark Web UI
http://hadoop-master:8080
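By default the standalone master UI listens on port 8080, each worker exposes its own UI on port 8081, and a running application serves a job UI on port 4040 of the driver node.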
4. Service startup
Master node
[hadoop@hadoop-master /home/hadoop]$ su - root
[root@hadoop-master /root]# vi /etc/systemd/system/spark-master.service
[Unit]
Description=spark-master
After=syslog.target network.target

[Service]
Type=forking
User=hadoop
Group=hadoop
ExecStart=/soft/spark/sbin/start-master.sh
ExecStop=/soft/spark/sbin/stop-master.sh

[Install]
WantedBy=multi-user.target
# Save and exit: Esc, then :wq
[root@hadoop-master /root]# chmod 755 /etc/systemd/system/spark-master.service
[root@hadoop-master /root]# systemctl enable spark-master
[root@hadoop-master /root]# service spark-master start
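If systemd does not pick up a freshly created or edited unit file, reload the unit definitions before enabling the service:

[root@hadoop-master /root]# systemctl daemon-reload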
Slave node
[hadoop@hadoop-dn1 /home/hadoop]$ su - root
[root@hadoop-dn1 /root]# vi /etc/systemd/system/spark-slave.service
[Unit]
Description=spark-slave
After=syslog.target network.target

[Service]
Type=forking
User=hadoop
Group=hadoop
ExecStart=/soft/spark/sbin/start-slave.sh spark://hadoop-master:7077
ExecStop=/soft/spark/sbin/stop-slave.sh

[Install]
WantedBy=multi-user.target
# Save and exit: Esc, then :wq
[root@hadoop-dn1 /root]# chmod 755 /etc/systemd/system/spark-slave.service
[root@hadoop-dn1 /root]# systemctl enable spark-slave
[root@hadoop-dn1 /root]# service spark-slave start
notebook
[hadoop@hadoop-master /home/hadoop]$ su - root
[root@hadoop-master /root]# vi /etc/init.d/notebook
#!/bin/sh
# chkconfig: 345 85 15
# description: service for notebook
# processname: notebook
case "$1" in
  start)
    echo "Starting ipython_notebook"
    su - hadoop -c 'export PYSPARK_DRIVER_PYTHON_OPTS="notebook --config=/home/hadoop/.ipython/profile_myserver/ipython_notebook_config.py"; nohup pyspark >/dev/null 2>&1 &'
    echo "ipython_notebook started"
    ;;
  stop)
    echo "Stopping ipython_notebook"
    PID_COUNT=`ps aux | grep ipython_notebook | grep -v grep | wc -l`
    PID=`ps aux | grep ipython_notebook | grep -v grep | awk '{print $2}'`
    if [ $PID_COUNT -gt 0 ]; then
      echo "Try stop ipython_notebook"
      kill -9 $PID
      echo "Kill ipython_notebook SUCCESS!"
    else
      echo "There is no ipython_notebook!"
    fi
    ;;
  restart)
    echo "Restarting ipython_notebook"
    $0 stop
    $0 start
    ;;
  status)
    PID_COUNT=`ps aux | grep ipython_notebook | grep -v grep | wc -l`
    if [ $PID_COUNT -gt 0 ]; then
      echo "ipython_notebook is running"
    else
      echo "ipython_notebook is stopped"
    fi
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
esac
# Save and exit: Esc, then :wq
[root@hadoop-master /root]# chmod 755 /etc/init.d/notebook
[root@hadoop-master /root]# chkconfig --add notebook
[root@hadoop-master /root]# chkconfig notebook on
[root@hadoop-master /root]# service notebook start
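The # chkconfig: 345 85 15 header is what chkconfig reads: enable the service in runlevels 3, 4 and 5, with start priority 85 and stop priority 15.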
5. Pitfalls encountered
- Exception in thread “main” java.lang.NoClassDefFoundError: org/slf4j/Logger
Add the missing jars under /soft/spark/jars: log4j-1.2.17.jar, slf4j-api-1.7.30.jar, and slf4j-log4j12-1.7.25.jar (newer versions also work).
[hadoop@hadoop-master /home/hadoop]$ ll /soft/spark/jars/log*
log4j-1.2.17.jar  logging-interceptor-3.12.0.jar
[hadoop@hadoop-master /home/hadoop]$ ll /soft/spark/jars/slf4j-*
slf4j-api-1.7.30.jar  slf4j-log4j12-1.7.25.jar
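One way to obtain the missing jars without downloading anything, assuming a standard Hadoop layout under /soft/hadoop, is to copy them from the Hadoop installation (the paths are an assumption; check your own tree first):

[hadoop@hadoop-master /home/hadoop]$ cp /soft/hadoop/share/hadoop/common/lib/slf4j-*.jar /soft/spark/jars/
[hadoop@hadoop-master /home/hadoop]$ cp /soft/hadoop/share/hadoop/common/lib/log4j-*.jar /soft/spark/jars/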
- JAVA_HOME is not set
[hadoop@hadoop-master /home/hadoop]$ vi /soft/spark/sbin/spark-config.sh
# Append the following after the line: export PYSPARK_PYTHONPATH_SET=1
export JAVA_HOME=/soft/jdk
export SPARK_HOME=/soft/spark
export HADOOP_HOME=/soft/hadoop
export HADOOP_CONF_DIR=/soft/hadoop/etc/hadoop
export SPARK_CONF_DIR=/home/hadoop/spark/conf
export SPARK_LOG_DIR=/home/hadoop/spark/log
6. Supplementary content
Use IPython Notebook remotely
[root@hadoop-master /root]# pip install ipython
[root@hadoop-master /root]# su - hadoop
[hadoop@hadoop-master /home/hadoop]$ ipython
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from IPython.lib import passwd
In [2]: passwd()
Enter password:
Verify password:
Out[2]: 'sha1:9435b2964949:cdcf603ca1cf095c5141270b66e9848db30d09f9'
# The password used here: 123456
[hadoop@hadoop-master /home/hadoop]$ ipython profile create myserver
[hadoop@hadoop-master /home/hadoop]$ vi /home/hadoop/.ipython/profile_myserver/ipython_notebook_config.py
c = get_config()
c.IPKernelApp.pylab='inline'
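# Listen on all interfaces and skip opening a local browser so the notebook can be reached remotely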
c.NotebookApp.ip='*'
c.NotebookApp.open_browser=False
c.NotebookApp.password=u'sha1:9435b2964949:cdcf603ca1cf095c5141270b66e9848db30d09f9'
c.NotebookApp.port=8888
[hadoop@hadoop-master /home/hadoop]$ PYSPARK_DRIVER_PYTHON_OPTS="notebook --config=/home/hadoop/.ipython/profile_myserver/ipython_notebook_config.py" pyspark
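Once pyspark starts with these options, the notebook can be opened from a browser at http://hadoop-master:8888 (the port configured above), logging in with the password whose hash was generated earlier.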
Appendix:

- Official documentation: spark.apache.org/docs/latest…