This article is also available on the Nebula Graph website: Nebula Operator. It introduces the cluster management tool that automates NebulaGraph deployment on the cloud.

Before introducing Nebula Operator, let’s take a look at what an Operator is.

An Operator is a way to package, deploy, and manage a Kubernetes application by extending the Kubernetes API, handling the creation, configuration, and management of complex application instances on behalf of users. It is built on the concepts of Custom Resource Definitions (CRDs) and controllers, and it encodes domain- or application-specific knowledge to automate the entire lifecycle of the software it manages.

In Kubernetes, the controllers of the control plane run a control loop that repeatedly compares the desired state of the cluster with its actual state. If the actual state does not match the desired state, the controller keeps reconciling according to its internal business logic until the application is brought to the desired state.

Nebula Operator abstracts NebulaGraph deployment management into a CRD and composes several built-in API objects, such as StatefulSet, Service, and ConfigMap. The routine work of managing and maintaining NebulaGraph is programmed into a control loop: once a CR instance is submitted, Nebula Operator follows that loop to drive the database cluster toward its final state.

Features of Nebula Operator

CRD definition

Let’s walk through the core capabilities of Nebula Operator alongside the CR file used to deploy a NebulaCluster.

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
  namespace: default
spec:
  graphd:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
    replicas: 1
    image: vesoft/nebula-graphd
    version: v2.0.0
    storageClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
  metad:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
    replicas: 1
    image: vesoft/nebula-metad
    version: v2.0.0
    storageClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
  storaged:
    resources:
      requests:
        cpu: "500m"
        memory: "500Mi"
    replicas: 3
    image: vesoft/nebula-storaged
    version: v2.0.0
    storageClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: gp2
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  imagePullPolicy: IfNotPresent

Three descriptions in the spec deserve attention: graphd, metad, and storaged represent the Graph Service, Meta Service, and Storage Service respectively. In its reconcile loop, the controller checks whether the built-in API objects StatefulSet, Service, and ConfigMap are ready. If a dependent API object has not been created successfully, or a NebulaGraph component service fails to reconcile, the controller returns, waits for the next reconcile round, and repeats the process.

The configuration parameters currently exposed in the CRD are not yet rich: they cover only the essential parameters for running Nebula Operator, such as resources, replicas, image, and schedulerName. More configuration parameters will be added in the future to cover more scenarios.

Scaling out and in

Storage scale-out is divided into two phases. In the first phase, the controller waits for all newly added Pods to reach the Ready state. In the second phase, it performs the BALANCE DATA operation. Decoupling the scaling of controller replicas from the data balancing process makes it possible to customize the balancing tasks, for example scheduling them during off-peak hours to minimize the impact of data migration on online services. This matches NebulaGraph’s own design: fully automatic balancing is deliberately not adopted, and the timing of balancing is left to the user.
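
For example, scaling the Storage Service from three to five replicas would typically only require updating the replicas field of storaged in the CR above and re-applying it. A minimal sketch; only the changed field is shown, and the rest of the CR is assumed unchanged:

spec:
  storaged:
    replicas: 5    # changed from 3; the Operator adds the new Pods first, then data balancing can run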

Before scaling in, the BALANCE DATA REMOVE $host_list command is used to safely migrate data off the nodes that are about to be removed. Only after the removal completes can the Storage Pods be scaled in.
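
As a sketch, removing a single storaged host before scaling in might look like the following; the address and port are placeholders for an actual storaged instance:

BALANCE DATA REMOVE 192.168.0.10:9779;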

Note: The figure accompanying the original article is for illustration only and does not show an actual configuration. In high-availability scenarios, ensure that three replica instances remain online.

Balanced scheduling

For scheduling, Nebula Operator offers a choice between the default scheduler and a scheduler based on the scheduler extender interface.

With the default scheduler, topology spread constraints control how Pods are spread evenly across the cluster’s topology domains. Nebula Operator currently spreads Pods evenly over the built-in kubernetes.io/hostname node label, with support for custom node label configuration planned. Affinity-based scheduling was not chosen because affinity essentially controls how Pods are stacked together or spread apart: PodAffinity schedules multiple Pods into one specific topology domain, while PodAntiAffinity ensures at most one Pod per topology domain, which is spread-apart scheduling. Neither policy handles the “distribute as evenly as possible” scenario, which matters especially for distributed applications, where high availability requires instances to be spread across multiple topology domains.
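
For reference, a topology spread constraint in a Pod spec looks roughly like the following. This is a sketch of the underlying Kubernetes feature rather than Operator-generated output, and the app: nebula-storaged label is an assumption for illustration:

      topologySpreadConstraints:
      - maxSkew: 1                           # Pod count may differ by at most 1 across domains
        topologyKey: kubernetes.io/hostname  # each node is its own topology domain
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nebula-storaged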

Of course, if you are running an older version of Kubernetes and cannot use topology spread constraints, there is also the Nebula Scheduler. Its core logic is to ensure that each component’s Pods are distributed evenly across the specified topology domains. It is selected through the schedulerName field in the CR, as shown below.
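
A minimal sketch, assuming the custom scheduler is deployed under the name nebula-scheduler (the name is an assumption; use the name your deployment actually registers):

spec:
  schedulerName: nebula-scheduler    # instead of default-scheduler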

Workload controller

Nebula Operator supports multiple workload controllers; the reference configuration item selects which one to use. Nebula Operator additionally supports the AdvancedStatefulSet from the open-source OpenKruise community project, so advanced features such as in-place upgrades and taking specified nodes offline can be used in Nebula Operator to meet business needs. Of course, this requires configuration within the Operator itself; currently only the in-place upgrade parameter is supported.

  reference:
    name: statefulsets.apps
    version: v1
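
To use OpenKruise’s AdvancedStatefulSet instead, the reference would point at the Kruise resource. A sketch, assuming OpenKruise is installed in the cluster and that its statefulsets.apps.kruise.io resource serves version v1beta1:

  reference:
    name: statefulsets.apps.kruise.io    # OpenKruise Advanced StatefulSet CRD
    version: v1beta1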

Other features

Other features, such as configuration validation in high-availability mode and custom configuration updates, are provided through webhooks, making it safer and more convenient to manage a NebulaGraph cluster with Nebula Operator. You can read the documentation on GitHub for the details; they are not covered here.

FAQ

Is Nebula Operator available outside of Kubernetes?

No. Nebula Operator runs on Kubernetes: it is an extension of the Kubernetes API and a tool within the Kubernetes ecosystem.

How do I ensure stability during upgrades, scale-out, and scale-in?

You are advised to back up data in advance to guard against rollback failure. Nebula Operator does not yet support backing up data before an operation; this will be added in future iterations.

Is NebulaGraph v1.x supported?

No. NebulaGraph v1.x does not support internal domain name resolution, which Nebula Operator requires, so the two are not compatible.

Is the cluster stable using local storage?

Stability is not guaranteed. Using local storage means a Pod is bound to a specific node, and Nebula Operator currently cannot fail over when a node that uses local storage fails.

When will the upgrade feature be available?

It depends on NebulaGraph itself: Nebula Operator will support upgrades once the database supports rolling upgrades.

Nebula Operator is now open source on GitHub: github.com/vesoft-inc/… Give it a try!

Come to the Nebula Operator call-for-requirements event and help it grow into what you want it to be: Nebula Operator is at your discretion.