Contents
- Introduction
- Features
- Spark Operating Modes
- Mac Local Installation
This article is based on Spark 2.4.1, and the code is available on my GitHub.
Introduction
Spark is a distributed cluster computing system that provides powerful distributed computing capabilities similar to Hadoop's. Compared with traditional batch processing systems, Spark can process much larger volumes of data. Spark provides Java, Python, Scala, and R interfaces. Besides the familiar MapReduce-style computation, it also supports graph processing, machine learning, and SQL queries through Spark SQL.
Features
- Speed: because much of the data is kept in memory, Spark is more efficient than Hadoop MapReduce.
- Ease of use: Spark provides more than 80 high-level operators; a small sketch follows this list.
- Generality: Spark ships with a rich set of libraries, including SQL and DataFrames, MLlib, GraphX, and Spark Streaming.
- Runs everywhere: because Spark is JVM-based, it is compatible with many different operating systems.
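As a minimal illustration of those operators, here is the classic word count written with a few RDD transformations. This sketch is not from the original article, and the input path is a placeholder:

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run in local mode with all available cores (see the operating modes below).
    val spark = SparkSession.builder().appName("WordCount").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val counts = sc.textFile("input.txt")    // hypothetical input file
      .flatMap(_.split("\\s+"))              // split lines into words
      .map(word => (word, 1))                // pair each word with a count of 1
      .reduceByKey(_ + _)                    // sum the counts per word

    counts.take(10).foreach(println)
    spark.stop()
  }
}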
Spark Operating Modes
- Local: used for developing and debugging Spark applications; the master URL passed to Spark selects the mode, as the sketch after this list shows.
- Standalone: uses Spark's built-in resource manager and scheduler to run the cluster in a Master/Slave structure; ZooKeeper can provide High Availability (HA) to remove the Master as a single point of failure.
- Apache Mesos: runs on the well-known Mesos resource management framework. In this cluster mode, resource management is left to Mesos, and Spark is responsible only for task scheduling and computation.
- Hadoop YARN: the cluster runs on the YARN resource manager. Resource management is delegated to YARN, and Spark only schedules and computes tasks.
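In code, the mode is selected by the master URL given to SparkConf (or to spark-submit). A short sketch; the Mesos host name is a placeholder:

import org.apache.spark.SparkConf

val conf = new SparkConf().setAppName("ModeDemo")
conf.setMaster("local[4]")                   // Local: 4 worker threads in one JVM
// conf.setMaster("spark://localhost:7077")  // Standalone: Spark's own Master
// conf.setMaster("mesos://mesos-host:5050") // Mesos (hypothetical host name)
// conf.setMaster("yarn")                    // YARN: cluster found via HADOOP_CONF_DIR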
Mac Local Installation
Download the appropriate version from the official Spark website and decompress it into your installation directory. This article uses 2.4.1.
Configure the environment variables in ~/.bash_profile:
export SPARK_HOME=/Users/shiqiang/Projects/tools/spark-2.4.1-bin-hadoop2.7
export PATH=${PATH}:${SPARK_HOME}/bin
The installation directory on the local machine is ~/Projects/tools.
Enable Remote Login in macOS System Preferences (Sharing) so that the installing user can log in over SSH; the standalone start scripts use SSH to launch the Worker processes.
Start the cluster:
$ ./sbin/start-all.sh
$ jps
21731 Jps
21717 Worker
21515 Master
The jps output shows that the Master and the Worker have started. You can also start them separately: the Master with ./sbin/start-master.sh, and a Worker with ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077.
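To check that the cluster actually accepts work, you can attach a Spark shell to the Master and run a small job. This is a quick sanity check, not a step from the original article:

// Launch the Scala REPL against the standalone Master:
//   $ ./bin/spark-shell --master spark://localhost:7077
// sc is preconfigured inside spark-shell:
val total = sc.parallelize(1 to 1000).sum() // distributed sum over the range
println(total)                              // expected: 500500.0

By default the Master also serves a web UI at http://localhost:8080, which should list the Worker and the running application.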
Stopping the services is just as simple:
$ ./sbin/stop-all.sh