1/ Download
Download the package from Apache Spark's official website: https://spark.apache.org/downloads.html, or from a mirror such as the Tsinghua University mirror: https://mirrors.tuna.tsinghua.edu.cn/
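If the server has direct internet access, the package can also be fetched on the server itself. A minimal sketch with wget; the exact mirror path is an assumption and depends on the mirror's layout and on which releases it still hosts:
wget https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz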
2/ Upload the file to the Linux server from the local PC
Run rz and select spark-3.1.1-bin-hadoop3.2.tgz to upload it
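If rz is not available (it comes from the lrzsz package and needs a ZMODEM-capable terminal client), scp from the local PC works as well; a sketch with an assumed user, host, and target directory:
scp spark-3.1.1-bin-hadoop3.2.tgz hadoop@your-server:/home/hadoop/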
3/ Decompress
tar -zxvf spark-3.1.1-bin-hadoop3.2.tgz
This generates a spark-3.1.1-bin-hadoop3.2 directory.
4/ Set environment variables
In the .bashrc file, add the following (adjust the paths to your own installation; note that the py4j version in the zip filename must match the file actually shipped under $SPARK_HOME/python/lib for your Spark release):
export SPARK_HOME=/home/hadoop/spark-3.1.1-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/python:$PATH
5/ Make environment variables take effect immediately
source .bashrc
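To confirm the variables took effect, a quick sanity check (assuming the paths above):
echo $SPARK_HOME
spark-submit --version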
6/ Start pyspark
Go to the bin directory of the installation, spark-3.1.1-bin-hadoop3.2/bin/, and run ./pyspark
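Once the interactive shell is up, a minimal smoke test (a sketch; sc is the SparkContext that the pyspark shell creates automatically):
>>> sc.version                       # prints the Spark version, e.g. '3.1.1'
>>> rdd = sc.parallelize(range(100)) # distribute a small dataset
>>> rdd.sum()                        # 4950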
#####################################################
1. Install Spark
1/ Download
The official download address is https://spark.apache.org/downloads.html; select the Spark version and Hadoop version, then download.
2/ Decompress the installation package:
# tar -zxvf spark-2.2.3-bin-hadoop2.6.tgz
3/ Configure environment variables
# vim /etc/profile
export SPARK_HOME=/home/hadoop/spark-2.2.3-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:$PATH
# source /etc/profile
2/ Start Spark
Local mode is the simplest running mode: it runs single-node with multiple threads, needs no deployment, and works out of the box, which makes it suitable for daily testing and development.
spark-shell --master local[2]
local: start only one worker thread;
local[k]: start k worker threads;
local[*]: start as many worker threads as there are CPU cores.
After Spark starts successfully, the shell banner shows the Spark version. spark-shell uses Scala as its default language, but you can also start Spark with Python by running pyspark instead.
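The same --master values work for pyspark. A minimal sketch of checking the effective master from inside the shell (sc is the SparkContext the shell creates):
# start the shell in local mode with 2 worker threads:
# ./pyspark --master local[2]
>>> sc.master              # 'local[2]'
>>> sc.defaultParallelism  # 2 when started with local[2]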