Abstract: This article will introduce the development history and working principle of Spark on Kubernetes, as well as Spark with Volcano, and how Volcano can help Spark run more efficiently.

The first part of this article introduces the development history and working principle of Spark on Kubernetes; the second part gives an overview of Spark with Volcano and how Volcano can help Spark run more efficiently.

Spark on Kubernetes

Let’s look at the background of Spark on Kubernetes. Spark has supported running natively on Kubernetes since version 2.3, which allows Spark users to run their jobs on Kubernetes and let Kubernetes manage the resource layer. Support for client mode and Python was added in version 2.4. Spark 3.0, released in 2020, also added a number of important Spark on Kubernetes features, including dynamic resource allocation, remote shuffle service, and Kerberos support.

Advantages of Spark on Kubernetes:

1) Elastic expansion and contraction capacity

2) Resource utilization

3) Unify the technology stack

4) Fine-grained resource allocation

5) Logging and monitoring

How Spark Submit works

Spark’s support for Kubernetes was initially provided through the official spark-submit method: the client submits the job with spark-submit, the Spark driver then calls the Kubernetes API server to request the creation of executors, and once the executors are up they carry out the actual computation and log collection.
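A minimal submission along these lines might look as follows. This is an illustrative sketch: the API server address, container image name, and jar path are placeholders, while the flags themselves are standard Spark on Kubernetes options.

```shell
# Submit a job to Kubernetes in cluster mode (placeholders in <>).
bin/spark-submit \
  --master k8s://https://<apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=4 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar
```

Here the driver pod is created first, and it then asks the API server to create the four executor pods.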

One advantage of this method is that the user experience of traditional Spark users changes very little after switching to it. However, it lacks job life-cycle management.

Working principle of spark-operator

The second way to use Spark on Kubernetes is the operator. The operator is the more Kubernetes-native approach: the job is submitted as a YAML file through kubectl, using the operator’s own CRD, the SparkApplication object. Once the SparkApplication is created, the controller can watch the creation of these resources. The subsequent flow actually reuses the first mode, but through this mode it is made somewhat more complete.

Compared with the first method, the controller here can maintain the object’s life cycle, watch the status of the Spark driver, and update the application’s status, making it a more complete solution.
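A SparkApplication manifest submitted with `kubectl apply -f` might look roughly like this. The image name and file paths are placeholders; the field layout follows the spark-operator’s `v1beta2` CRD.

```yaml
# Illustrative SparkApplication object for the operator mode.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: <your-spark-image>
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar
  driver:
    cores: 1
    memory: 512m
  executor:
    instances: 4
    cores: 1
    memory: 512m
```

The controller watches objects of this kind and drives the same submission flow as the first mode, while also tracking the application’s status.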

These two ways of using Spark on Kubernetes each have their own advantages, and many companies use both. Both are also described on the official website.

Spark with Volcano

Volcano has integrated and supports both of the modes described above. The following link is the Spark open-source repository we maintain:

Github.com/huawei-clou…

The job is submitted via spark-submit. When the job is submitted, a PodGroup is created that contains the scheduling information configured by the user. As you can see in the YAML file on the right side of the slide, driver and executor roles are added.
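For illustration, a Volcano PodGroup for one such job might look like the sketch below. The name, queue, and resource amounts are placeholders; the `scheduling.volcano.sh/v1beta1` fields shown (`minMember`, `queue`, `minResources`) are part of Volcano’s PodGroup API, and the job’s pods are associated with the group via an annotation.

```yaml
# Illustrative PodGroup: the driver and its 4 executors are
# scheduled together only when all 5 pods can fit.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-pi-podgroup
  namespace: default
spec:
  minMember: 5          # 1 driver + 4 executors
  queue: default
  minResources:
    cpu: "5"
    memory: 5Gi
```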

The Volcano queue

Queues are something we already talked about in lectures 1 and 2. Because Kubernetes has no queue support, it cannot share resources among multiple users or departments using the same cluster. However, sharing resources through queues is a basic requirement in both HPC and big data.

When we share resources through queues, we provide several mechanisms. In the figure at the top, we create two queues through which the resources of the whole cluster are shared: one queue is given 40% of the resources and the other 60%, so the two queues can be mapped to two different departments or projects, each using its own queue. When resources in one queue are idle, they can be used by jobs in the other queue. Volcano also balances resources between different namespaces. In Kubernetes, when users of two different application systems submit jobs, the user who submits more jobs gets more of the cluster’s resources. Therefore, we perform fair scheduling based on namespaces to ensure that namespaces share cluster resources according to their weights.
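The 40/60 split described above could be expressed with two Volcano Queue objects along these lines. The queue names are placeholders; `weight` is the Queue field that controls each queue’s proportional share of the cluster.

```yaml
# Illustrative queues: dept-a gets 40% and dept-b 60% of cluster
# resources by weight; idle capacity can be borrowed across queues.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: dept-a
spec:
  weight: 40
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: dept-b
spec:
  weight: 60
```

Jobs are then directed to a queue via their PodGroup’s `queue` field.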

Volcano: Pod delay creation

When I introduced this scene before, some students didn’t quite understand it, so I added several pages of PPT to expand it.

For example, when we were doing performance testing, we submitted 16 concurrent jobs, each sized 1 driver + 4 executors, to a cluster of 4 machines and 16 cores.

When 16 Spark jobs are submitted at the same time, there is a time lag between the creation of the driver pods and the creation of the executor pods. Because of this lag, the 16 drivers can occupy the entire cluster before any executors are created, so when a large number of concurrent jobs are submitted at once, the cluster freezes.

To solve this situation, we did something like this.

Dedicate one node to running the driver pods and the other three nodes to running the executor pods. This prevents the driver pods from taking up all the resources and solves the stuck-cluster problem.
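One way to sketch such a static split (node and file names here are placeholders, not from the original setup) is to label the driver node and point Spark 3.0’s driver pod template at it:

```shell
# Illustrative: reserve one labeled node for driver pods.
kubectl label node node-1 dedicated=driver

cat > driver-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    dedicated: driver
EOF

# Then pass the template at submit time:
#   --conf spark.kubernetes.driver.podTemplateFile=driver-template.yaml
```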

But there is a downside: the nodes are statically split 1:3. In real scenarios, the sizes of users’ jobs are dynamic, while this allocation is static and cannot match the dynamic ratios of real business workloads, so there will always be some resource fragmentation and waste.

Therefore, we added the delayed pod creation feature. With this feature there is no need to statically partition the nodes; all 4 nodes remain general-purpose, and each of the 16 jobs gets its own PodGroup. Volcano’s scheduler plans resources based on the PodGroups of the submitted jobs.

This prevents the pods of too many jobs from being created at once. Not only can all the resources of the four nodes be used, but the pace of pod creation can also be controlled in high-concurrency scenarios without any waste. It is also very simple to use and can be configured according to your needs, solving the problem of jobs getting stuck or running inefficiently in high-concurrency scenarios.
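The deadlock arithmetic above can be sketched in a few lines. This is an illustrative simulation, not Volcano’s actual scheduling algorithm: it assumes 16 schedulable cores (matching the 4-machine, 16-core cluster above), one core per pod, and jobs of 1 driver + 4 executors.

```python
# Why creating all driver pods first deadlocks a full cluster, and how
# gang scheduling (admit a job only when its whole PodGroup fits) avoids it.
CLUSTER_CORES = 16
JOBS = 16
PODS_PER_JOB = 5  # 1 driver + 4 executors, 1 core each

def naive_schedule():
    """Drivers are created (and bound) before any executor pod exists."""
    free = CLUSTER_CORES
    drivers_running = 0
    for _ in range(JOBS):          # all 16 drivers arrive first
        if free >= 1:
            free -= 1
            drivers_running += 1
    executors_running = 0
    while free >= 1 and executors_running < drivers_running * 4:
        free -= 1
        executors_running += 1
    # A job can finish only if its driver has all 4 executors.
    finished = min(drivers_running, executors_running // 4)
    return drivers_running, executors_running, finished

def gang_schedule():
    """Admit a job only if its whole PodGroup (5 pods) fits at once."""
    free = CLUSTER_CORES
    waiting, finished = JOBS, 0
    while waiting > 0:
        admitted = []
        while waiting > 0 and free >= PODS_PER_JOB:
            free -= PODS_PER_JOB
            waiting -= 1
            admitted.append(PODS_PER_JOB)
        if not admitted:
            break                  # nothing fits and nothing runs: stuck
        for pods in admitted:      # admitted jobs run to completion
            free += pods
            finished += 1
    return finished

print(naive_schedule())  # drivers fill the cluster; no job ever finishes
print(gang_schedule())   # jobs are admitted in waves; all 16 finish
```

With naive pod-by-pod scheduling, the 16 drivers consume all 16 cores and no executor can ever start; with PodGroup-based admission, at most three 5-pod jobs run at a time and all 16 jobs eventually complete.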

Volcano: Spark external shuffle service

We know that upstream Spark is quite polished and has a lot of great features, and Volcano ensures that no major features are missing after migrating to Kubernetes:

1) ESS is deployed on each node as a DaemonSet

2) Shuffle data is written to local disk and can be read both locally and remotely

3) Dynamic resource allocation is supported
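To use an external shuffle service with dynamic allocation, the job is typically submitted with flags along these lines. This is a sketch: the port and executor bounds are placeholder values, while the configuration keys are standard Spark properties.

```shell
# Illustrative spark-submit flags for ESS + dynamic allocation.
--conf spark.shuffle.service.enabled=true \
--conf spark.shuffle.service.port=7337 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.minExecutors=1 \
--conf spark.dynamicAllocation.maxExecutors=10
```

Because the per-node ESS daemonset keeps serving shuffle files, executors can be released and re-acquired without losing shuffle data.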
