【 K8S series 1】 Comparison between Spark on K8S and Spark on K8S Operator

Spark applications currently run on K8s in one of two modes:

Spark on K8S, supported natively by Spark
Spark on K8S Operator, built on the K8S operator pattern

The former is a K8S client implementation introduced by the Spark community so that Spark can use K8S as a resource management framework. The latter is an operator developed on the K8S side to support Spark.

Difference	spark on k8s	spark on k8s operator
Community support	The Spark community	GoogleCloudPlatform (not officially supported)
Versions	Spark >= 2.3, Kubernetes >= 1.6	Spark > 2.3, Kubernetes >= 1.13
Installation	Install by following the official website. Requires create, list, edit and delete permissions on K8S pods. The image has to be built by compiling the source code, and the build process is tedious.	A K8s admin must install incubator/sparkoperator. Requires create, list, edit and delete permissions on pods.
Usage	Submit as in Code 1. Supports client and cluster modes. See the spark on k8s documentation.	Submit through a YAML configuration file, as in Code 2. Supports client and cluster modes. For specific parameters, see the spark operator configuration reference.
Advantages	Tasks are submitted the usual Spark way, which is more convenient for users who are used to Spark.	Tasks are submitted through K8s configuration files, which are highly reusable.
Disadvantages	Driver resources are not automatically released after the driver finishes.	Driver resources are not automatically released after the driver finishes.

Implementation, spark on k8s: On the submission side, both client and cluster submissions go through an implementation of SparkApplication. In client mode the implementation is JavaMainApplication, which runs the main class by reflection. For K8S the cluster manager is KubernetesClusterManager, and this mode works the same way as submitting tasks to YARN. In cluster mode the program entry for K8S is KubernetesClientApplication. The client creates a service with clusterIP set to None, and executors interact with the driver through this service over RPC, for example for task submission. The client also creates a ConfigMap with the suffix driver-conf-map, which is mounted into the Spark driver pod as a volume. The contents of that file are finally passed through --properties-file when the driver submits the task, which is how configuration items such as spark.driver.host reach the driver. A ConfigMap with the suffix -hadoop-config is created at the same time. How, then, does a single K8S image distinguish between an executor and a driver? Everything is in the Dockerfile (its configuration at build time differs with the Hadoop and Kerberos environment) and in the entrypoint shell script, which is where the driver and executor are distinguished.

Implementation, spark on k8s operator: The operator uses the K8S CRD controller mechanism. It defines a custom CRD and, following the operator SDK, listens for create, delete, update and query events on it. When a creation event is detected, it creates a pod and submits the Spark task according to the configuration items in the corresponding YAML file. For details, see the spark on k8s operator design document. Cluster and client modes work on the same principle as in Spark on K8S, because the image reuses the official Spark image.
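The entrypoint dispatch described above can be sketched in a few lines of shell. This is a simplified illustration loosely modelled on the idea of a role argument passed to the container, not the actual script shipped in the Spark image. The helper name build_command, the service host driver-svc and the driver URL are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one image, two roles, chosen by the first argument.
# build_command is an illustrative helper, not part of Spark.
build_command() {
  local role="$1"
  shift
  case "$role" in
    driver)
      # Inside the driver pod the job is launched in client mode.
      echo "spark-submit --deploy-mode client $*"
      ;;
    executor)
      # Executor pods start the executor backend, which connects back to the driver.
      echo "java org.apache.spark.executor.CoarseGrainedExecutorBackend $*"
      ;;
    *)
      echo "unknown role: $role" >&2
      return 1
      ;;
  esac
}

# Print the command each role would run (driver-svc is a made-up host name).
build_command driver --class org.apache.spark.examples.SparkPi
build_command executor --driver-url "spark://CoarseGrainedScheduler@driver-svc:7078"
```

Because the role is only an argument, the same image can back both the driver pod and the executor pods, and only the pod spec decides which branch runs.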

Code 1

bin/spark-submit \
    --master k8s://https://192.168.202.231:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=2 \
    --conf "spark.kubernetes.namespace=dev" \
    --conf "spark.kubernetes.authenticate.driver.serviceAccountName=lijiahong" \
    --conf "spark.kubernetes.container.image=harbor.k8s-test.uc.host.…/dev/spark-py:cdh-server-5.13.1" \
    --conf "spark.kubernetes.container.image.pullSecrets=regsecret" \
    --conf "spark.kubernetes.file.upload.path=hdfs:///tmp" \
    --conf "spark.kubernetes.container.image.pullPolicy=Always" \
    hdfs:///tmp/spark-examples_2.12-3.0.0.jar

Code 2

---
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: dev
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.0.0"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar"
  sparkVersion: "3.0.0"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.0.0
    serviceAccount: lijiahong
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.0.0
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
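Submitting Code 2 comes down to a few kubectl calls. The sketch below assumes the manifest was saved as spark-pi.yaml (a made-up file name), that the operator is already installed, and that the driver pod is named after the application with a -driver suffix. The function name submit_and_inspect and the KUBECTL variable are illustrative helpers, not part of the operator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply a SparkApplication manifest, then inspect it.
# KUBECTL can be overridden, for example with "echo" for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

submit_and_inspect() {
  local manifest="$1" namespace="$2" app="$3"
  # Create or update the SparkApplication custom resource.
  "$KUBECTL" apply -f "$manifest" &&
  # Show the state the operator reports for the application.
  "$KUBECTL" get sparkapplications "$app" -n "$namespace" &&
  # Read the driver log; the "<app>-driver" pod name is an assumption.
  "$KUBECTL" logs "${app}-driver" -n "$namespace"
}

# Example call against a cluster (file name is an assumption):
# submit_and_inspect spark-pi.yaml dev spark-pi
```

Keeping the manifest in a file is what makes this mode reusable: the same YAML can be applied again or adapted for another job without rebuilding a long spark-submit command line.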

This article is published by OpenWrite, a blogging tool platform
