
3. Introduction to the server

1. Hardware and deployment suggestions

  1. Deploy on the same machines as HDFS or, failing that, on the same LAN.
  2. Deploy separately from low-latency stores such as HBase.
  3. Use four to eight data disks per node, without RAID.
  4. On Linux, mount the disks with the noatime option to avoid unnecessary access-time writes.
  5. In the Spark configuration, set spark.local.dir to a comma-separated list of local disks; these disks may be shared with HDFS (see the sketch after this list).
  6. Provision 8 GiB to 200 GiB of memory per machine and allocate at most 75% of it to Spark.
  7. Use a 10-gigabit or faster network card.
  8. Provide at least 8-16 CPU cores per machine.
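To make suggestions 4 and 5 concrete, here is a minimal sketch; the data disk /dev/sdb1 and the mount points /mnt/disk1 and /mnt/disk2 are hypothetical placeholders, not recommendations:

# Mount a data disk with noatime so reads do not trigger access-time writes
sudo mount -o defaults,noatime /dev/sdb1 /mnt/disk1

# Point Spark's scratch space at a comma-separated list of local disks
# (appended to conf/spark-defaults.conf)
echo "spark.local.dir  /mnt/disk1/spark,/mnt/disk2/spark" >> "${SPARK_HOME}/conf/spark-defaults.conf"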

2. Environment variables

Environment variable    Meaning
JAVA_HOME    Location of the Java installation (if it is not on your default PATH).
PYSPARK_PYTHON    Python binary executable to use for PySpark in both the driver and the workers (default: python2.7 if available, otherwise python). The spark.pyspark.python property takes precedence if it is set.
PYSPARK_DRIVER_PYTHON    Python binary executable to use for PySpark in the driver only (default: PYSPARK_PYTHON). The spark.pyspark.driver.python property takes precedence if it is set.
SPARKR_DRIVER_R    R binary executable to use for the SparkR shell (default: R). The spark.r.shell.command property takes precedence if it is set.
SPARK_LOCAL_IP    IP address of the machine to bind to.
SPARK_PUBLIC_DNS    Hostname your Spark program advertises to other machines.
SPARK_MASTER_HOST    Bind the master to a specific hostname or IP address, for example a public one.
SPARK_MASTER_PORT    Start the master on a different port (default: 7077).
SPARK_MASTER_WEBUI_PORT    Port for the master web UI (default: 8080).
SPARK_MASTER_OPTS    Configuration properties that apply only to the master, in the form "-Dx=y" (default: none). See the list of possible options below.
SPARK_LOCAL_DIRS    Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. It should be on a fast, local disk in your system; it can also be a comma-separated list of directories on different disks.
SPARK_WORKER_CORES    Total number of cores that Spark applications may use on the machine (default: all available cores).
SPARK_WORKER_MEMORY    Total amount of memory that Spark applications may use on the machine, e.g. 1000m or 2g (default: total memory minus 1 GiB); note that each application's individual memory is configured with its spark.executor.memory property.
SPARK_WORKER_PORT    Start the Spark worker on a specific port (default: random).
SPARK_WORKER_WEBUI_PORT    Port for the worker web UI (default: 8081).
SPARK_WORKER_DIR    Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).
SPARK_WORKER_OPTS    Configuration properties that apply only to the worker, in the form "-Dx=y" (default: none). See the list of possible options below.
SPARK_DAEMON_MEMORY    Memory to allocate to the Spark master and worker daemons themselves (default: 1 GiB).
SPARK_DAEMON_JAVA_OPTS    JVM options for the Spark master and worker daemons themselves, in the form "-Dx=y" (default: none).
SPARK_DAEMON_CLASSPATH    Classpath of the Spark master and worker daemons themselves (default: none).
SPARK_PUBLIC_DNS    Public DNS name of the Spark master and workers (default: none).
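These variables are usually set per node in conf/spark-env.sh, which the launch scripts source at startup. The snippet below is a minimal sketch; the hostname, core, memory, and directory values are illustrative placeholders, not tuned recommendations:

# conf/spark-env.sh
export SPARK_MASTER_HOST=master.example.com                 # hypothetical master hostname
export SPARK_WORKER_CORES=16                                # cores the worker may offer to applications
export SPARK_WORKER_MEMORY=24g                              # memory the worker may offer to applications
export SPARK_LOCAL_DIRS=/mnt/disk1/spark,/mnt/disk2/spark   # scratch space on fast local disks
export SPARK_WORKER_DIR=/var/lib/spark/work                 # application logs and work directories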

4. Introduction to the client

1. The key configuration

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster(...)               // cluster URL to connect to
  .setAppName(...)              // application name shown in the web UI
  .set("spark.cores.max", "10") // cap the total cores this application may use
val sc = new SparkContext(conf)
  1. spark.executor.cores sets the number of CPU cores each executor uses.
  2. spark.cores.max caps the total number of CPU cores the application may use across the cluster (see the sketch below).
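Both properties can also be supplied at submission time instead of being hard-coded in SparkConf. A minimal sketch, where the application class com.example.MyApp and the jar path are hypothetical placeholders:

# Give each executor 2 cores and cap the whole application at 10 cores
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://207.184.161.138:7077 \
  --conf spark.executor.cores=2 \
  --conf spark.cores.max=10 \
  /path/to/my-app.jar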

2. Submit a job

${SPARK_HOME}/bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \  # can be client for client mode
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

# Run on a Kubernetes cluster in cluster deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://xx.yy.zz.ww:443 \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  http://path/to/examples.jar \
  1000