This content is based on the official Longhorn v1.1.2 English documentation.

A series of articles:

  • What is Longhorn?
  • Longhorn Enterprise Cloud-Native Container Distributed Storage – Design, Architecture, and Concepts
  • Longhorn Enterprise Cloud-Native Container Distributed Storage – Deployment
  • Longhorn Enterprise Cloud-Native Container Distributed Storage – Volumes and Nodes
  • Longhorn Enterprise Cloud-Native Container Distributed Storage – K8S Resource Configuration Examples
  • Longhorn Enterprise Cloud-Native Container Distributed Storage – Monitoring (Prometheus + AlertManager + Grafana)
  • Longhorn Enterprise Cloud-Native Container Distributed Storage – Backup and Recovery

Contents

  • Data locality
    • Data locality settings
    • How to set data locality for a volume
      • Change the default global setting
      • Use the Longhorn UI to change the data locality of a single volume
      • Use a StorageClass to set data locality for a single volume
  • Recovering a volume after unexpected detachment
  • Handling node failures with Longhorn
    • What happens when a Kubernetes node fails
    • Longhorn pod deletion policy when a node is down
      • Volume attachment recovery policies
        • Volume attachment recovery policy: never (Kubernetes default)
        • Volume attachment recovery policy: wait (Longhorn default)
        • Volume attachment recovery policy: immediate
    • What happens when a failed Kubernetes node recovers

Data locality

The data locality setting is intended for cases where, whenever possible, at least one replica of a Longhorn volume should be scheduled on the same node as the pod that uses the volume. We refer to this feature of having a local replica as data locality.

For example, when the cluster has a poor network, data locality can be useful because having a local copy increases volume availability.

Data locality is also useful for distributed applications, such as databases, where high availability is achieved at the application level rather than at the volume level. In this case, each pod needs only one volume, and each volume should be scheduled on the same node as the pod that uses it. In addition, Longhorn's default volume-scheduling behavior can cause problems for distributed applications: if a workload has two pod replicas, each with its own volume, Longhorn does not know that these volumes hold the same data and should not have their replicas scheduled on the same node. Longhorn can therefore place replicas of both volumes on the same node, preventing them from providing high availability for the workload.
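For illustration only, the distributed-database scenario above can be sketched as a StatefulSet in which volumeClaimTemplates gives each pod replica its own volume; the workload name, image, and StorageClass name below are assumptions, not taken from this document:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db                # hypothetical workload name
spec:
  serviceName: example-db
  replicas: 2                     # two pod replicas, each with its own volume
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
        - name: db
          image: mysql:5.7        # placeholder image for a distributed database
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:           # one PVC per pod replica, so replicas never share a volume
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn-best-effort   # assumed StorageClass with dataLocality: best-effort
        resources:
          requests:
            storage: 10Gi
```

With best-effort data locality on the assumed StorageClass, Longhorn tries to keep a replica of each of these volumes on the node running the corresponding pod.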

When data locality is disabled, a Longhorn volume can be backed by replicas on any nodes in the cluster and accessed by a pod running on any node in the cluster.

Data locality settings

Longhorn currently supports two data locality settings:

  • Disabled. This is the default option. There may or may not be a replica on the same node as the attached volume (workload).

  • Best-effort. This option instructs Longhorn to try to keep a replica on the same node as the attached volume (workload). Longhorn will not stop the volume even if it cannot keep a replica local to the attached volume (workload) because of environmental constraints, such as insufficient disk space or incompatible disk tags.

How to set data locality for a volume

There are three ways to set data locality for a Longhorn volume:

Change the default global setting

You can change the global default setting for data locality in the Longhorn UI settings. The global setting is used only as a default value, similar to the replica count; it does not change the settings of any existing volumes. When data locality is not specified at volume creation, Longhorn uses the global default to determine the volume's data locality.
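As an alternative to the UI, Longhorn also represents its global settings as Setting custom resources in the longhorn-system namespace, so the default could be changed declaratively. This is a sketch only: the apiVersion and the setting name default-data-locality are assumptions based on the Longhorn settings reference and should be verified against your Longhorn version:

```yaml
apiVersion: longhorn.io/v1beta1
kind: Setting
metadata:
  name: default-data-locality      # assumed setting name; see the Longhorn settings reference
  namespace: longhorn-system
value: "best-effort"               # default for newly created volumes; existing volumes are unchanged
```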

Use the Longhorn UI to change the data locality of a single volume

You can set data locality when creating a volume through the Longhorn UI. You can also change the data locality setting after volume creation on the volume detail page.

Use a StorageClass to set data locality for a single volume

Longhorn also exposes the data locality setting as a parameter in a StorageClass. You can create a StorageClass with the desired data locality setting, and then create PVCs using that StorageClass. For example, the following YAML file defines a StorageClass that tells the Longhorn CSI driver to set data locality to best-effort:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: hyper-converged
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"
  dataLocality: "best-effort"
  staleReplicaTimeout: "2880" # 48 hours in minutes
  fromBackup: ""
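A PVC can then request a volume from this StorageClass; the claim name and size below are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hyper-converged-pvc           # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: hyper-converged   # the StorageClass defined above
  resources:
    requests:
      storage: 5Gi
```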

Recovering a volume after unexpected detachment

Unexpected detachment can occur during a Kubernetes upgrade, a Docker restart, or a network disconnection. If the pod is managed by a controller (e.g. Deployment, StatefulSet, DaemonSet), Longhorn automatically deletes the workload pod; the controller then restarts the pod, and Kubernetes handles volume reattachment and remounting.

If you do not want Longhorn to automatically delete workload pods, you can disable the Automatically Delete Workload Pod when the Volume Is Detached Unexpectedly setting in the Longhorn UI.
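The UI setting above corresponds to a Longhorn global setting. As a sketch, it could be disabled declaratively like this; the setting name auto-delete-pod-when-volume-detached-unexpectedly and the Setting resource schema are assumptions to be verified against the Longhorn settings reference:

```yaml
apiVersion: longhorn.io/v1beta1
kind: Setting
metadata:
  name: auto-delete-pod-when-volume-detached-unexpectedly   # assumed setting name
  namespace: longhorn-system
value: "false"   # disable automatic deletion of controller-managed workload pods
```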

For pods without a controller, Longhorn does not delete them, because nothing would restart them afterward. To recover an unexpectedly detached volume, you must manually delete and recreate pods that have no controller.

Handling node failures with Longhorn

What happens when a Kubernetes node fails

This section describes what happens during a node failure and during the subsequent recovery.

About one minute after a node fails, kubectl get nodes reports NotReady for the failed node.

About five minutes later, the status of all pods on the NotReady node changes to Unknown or NodeLost.

StatefulSet pods have a stable identity, so Kubernetes does not force-delete them on the user's behalf. See the official Kubernetes documentation on force-deleting StatefulSet pods.

Deployment pods have no stable identity, but for read-write-once (RWO) storage, which cannot be attached to two nodes at the same time, the new pod created by Kubernetes cannot start: the RWO volume is still attached to the old pod on the lost node.

In both cases, Kubernetes automatically evicts the pods on the lost node (setting a deletion timestamp on them), and then tries to recreate new pods using the old volumes. Because the evicted pods are stuck in the Terminating state and their attached volumes cannot be released, without intervention from an administrator or the storage software the new pods will be stuck in the ContainerCreating state.

Longhorn pod deletion policy when a node is down

Longhorn provides an option to automatically force-delete the terminating StatefulSet/Deployment pods on a node that is down. After the forced deletion, Kubernetes detaches the Longhorn volume and starts the replacement pod on a new node.

You can configure the Pod Deletion Policy When Node is Down in the Settings tab of the Longhorn UI; see the Longhorn settings reference for details on the available options.
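As a sketch, this policy is also exposed as a global setting; the name node-down-pod-deletion-policy and the values in the comment are based on the Longhorn settings reference and should be verified against your version:

```yaml
apiVersion: longhorn.io/v1beta1
kind: Setting
metadata:
  name: node-down-pod-deletion-policy   # assumed setting name
  namespace: longhorn-system
# one of: do-nothing (default) | delete-statefulset-pod |
#         delete-deployment-pod | delete-both-statefulset-and-deployment-pod
value: "delete-both-statefulset-and-deployment-pod"
```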

Volume attachment recovery policies

If you decide to force-delete the pod (manually or with Longhorn's help), Kubernetes takes about 6 minutes to delete the VolumeAttachment object associated with the pod, then finally detaches the volume from the lost node and allows it to be used by the new pod.

This 6-minute period is hard-coded in Kubernetes: if the pods on the lost node are force-deleted, the related volumes are not properly unmounted. Kubernetes therefore waits for this fixed timeout before cleaning up the VolumeAttachment object directly.

To solve this problem, Longhorn provides three different volume attachment recovery policies.

Volume attachment recovery policy: never (Kubernetes default)

With never, Longhorn does not recover the volume attachment from a failed node, which matches Kubernetes' default behavior. The user must force-delete the terminating pods, at which point Longhorn recovers the volume attachment from the failed node. The pending replacement pod is then allowed to start once the requested volume is available.

Volume attachment recovery policy: wait (Longhorn default)

With wait, Longhorn waits to recover the volume attachment until the deletion grace period of all terminating pods has passed. Since the node's kubelet is responsible for deleting the pods at that point, and the pods still exist, we can conclude that the kubelet on the failed node is unable to delete them. Longhorn then recovers the volume attachment from the failed node, and the pending replacement pod is allowed to start once the requested volume is available.

Volume attachment recovery policy: immediate

With immediate, Longhorn recovers the volume attachment from the failed node as soon as a pending replacement pod exists. The replacement pod is then allowed to start once the requested volume is available.
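The three policies above map to a single global setting; as a sketch, assuming the setting name volume-attachment-recovery-policy from the Longhorn settings reference (verify against your version):

```yaml
apiVersion: longhorn.io/v1beta1
kind: Setting
metadata:
  name: volume-attachment-recovery-policy   # assumed setting name
  namespace: longhorn-system
value: "wait"   # one of: never | wait (Longhorn default) | immediate
```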

What happens when a failed Kubernetes node recovers

If the node comes back online within 5 to 6 minutes of the failure, Kubernetes restarts the pods and remounts the volumes directly, without re-attaching them or cleaning up VolumeAttachment objects.

However, because the volume engine shuts down when the node goes down, this direct remount does not work: the block device no longer exists on the node.

In this case, Longhorn first detaches and re-attaches the volume to restart the volume engine, so that the pod can safely remount and reuse the volume.

If the node does not come back online within 5 to 6 minutes of the failure, Kubernetes tries to delete all unreachable pods based on the pod eviction timeout, and these pods enter the Terminating state. For more information, see Pod Eviction Timeout.

If the failed node recovers later, Kubernetes restarts the terminated pods, detaches the volumes, waits for the old VolumeAttachment objects to be cleaned up, and then re-attaches and re-mounts the volumes. These steps typically take 1 to 7 minutes.

In this case, the detach and re-attach operations are already part of the Kubernetes recovery process, so no additional action is required; the Longhorn volumes become available after the steps above.

In all of the recovery scenarios above, Longhorn handles these steps automatically in coordination with Kubernetes.