This article is part of a series introducing the basic concepts of Kubernetes. In the first article, we briefly introduced Persistent Volumes. In this article, we’ll learn how to set up data persistence and write Kubernetes scripts to connect our Pod to a persistent volume. In this example, Azure File Storage will be used to store the data of our MongoDB database, but you can use any type of volume to achieve the same result (for example, Azure Disk, GCE Persistent Disk, AWS Elastic Block Store, etc.).

If you want to fully understand the other concepts of K8S, you can check out the previously published articles.

Please note: The scripts provided in this article are not platform-specific, so you can follow this tutorial with other cloud providers or with a local cluster running K3s. This article recommends K3s because it is very lightweight and all of its dependencies are packaged in a single binary smaller than 100 MB. It is also a highly available, certified Kubernetes distribution designed for production workloads in resource-constrained environments. For more information, see the official documentation:

https://docs.rancher.cn/k3s/

Preparation

Before starting this tutorial, make sure you have Docker installed. You will also need kubectl (if you don’t have it, install it from the following link):

https://kubernetes.io/docs/tasks/tools/#install-kubectl-on-windows

The kubectl commands used throughout this tutorial can be found in the kubectl Cheat Sheet:

https://kubernetes.io/docs/reference/kubectl/cheatsheet/

In this tutorial, we will use Visual Studio Code, but you can use other editors as well.

What problems can Kubernetes persistent volumes solve?

Remember, we have a node (a hardware device or a virtual machine), inside the node we have a Pod (or multiple Pods), and inside the Pod we have containers. Pods are transient by nature: they are frequently deleted, rescheduled, and so on. If you want to keep the data stored in a Pod after the Pod is deleted, you need to move the data outside the Pod, so that it can exist independently of any Pod. This external location is called a volume, and it is an abstraction over a storage system. Using volumes, you can maintain a persistent state across multiple Pods.

When to use persistent volumes

When containers came into widespread use, they were designed to support stateless workloads with persistent data stored elsewhere. Since then, many efforts have been made to support stateful applications in container ecosystems.

Every project requires some kind of data persistence, so you usually need a database to store the data. But in a clean design, you don’t want to depend on concrete implementations; you want to write an application that is as reusable and platform-independent as possible.

There has always been a need to hide the details of the storage implementation from the application. But now, in the era of cloud-native applications, cloud providers create environments in which applications or users that want to access data need to integrate with a specific storage system. For example, many applications directly use specific storage systems such as Amazon S3, AzureFile, or block storage, which creates unhealthy dependencies. Kubernetes is trying to change this by creating an abstraction called a persistent volume, which allows cloud-native applications to connect to a wide variety of cloud storage systems without having to establish an explicit dependency on those systems. This makes the consumption of cloud storage much more seamless and eliminates integration costs. It also makes it easier to migrate between clouds and adopt a multi-cloud strategy.

Even though there are times when, due to objective constraints such as money, time, or manpower, you have to compromise and couple your application directly to a specific platform or provider, you should try to avoid as many direct dependencies as possible. One way to decouple the application from the actual database implementation (there are other solutions, but those are more complex) is to use containers (and persistent volumes to prevent data loss). This way, your application relies on an abstraction rather than on a specific implementation.

The real question now is: should we always use containerized databases with persistent volumes, or are there types of storage systems that should not be used in containers?

There is no golden rule for when to use persistent volumes, but as a starting point, you should consider scalability and how the storage system handles the loss of a node in the cluster.

Depending on scalability, we can have two types of storage systems:

  • Vertical scaling – this includes traditional RDBMS solutions such as MySQL, PostgreSQL, and SQL Server
  • Horizontal scaling – this includes “NoSQL” solutions such as Elasticsearch or Hadoop-based solutions

Vertically scaling solutions like MySQL, Postgres, Microsoft SQL, and others should not go into containers. These database platforms require high I/O, shared disks, block storage, etc., and do not gracefully handle the loss of a node in a cluster, which frequently happens in a container-based ecosystem.

For horizontally scalable applications (Elastic, Cassandra, Kafka, and so on), you should use containers, because they can withstand the loss of a node in the database cluster and the database application can rebalance itself independently.

In general, you can and should containerize distributed databases that use redundant storage technologies that can withstand node loss in a database cluster (Elasticsearch is a good example).

Types of Kubernetes volumes

We can classify Kubernetes volumes based on their life cycle and on the way they are provisioned.

Considering the life cycle of a volume, we can divide it into:

  • Ephemeral volumes, which are tightly coupled to the lifetime of the node (for example emptyDir or hostPath); their data is lost if the node goes down (a minimal example follows this list).
  • Persistent volumes, which are meant for long-term storage and are independent of the Pod or node life cycle. These can be cloud volumes (such as gcePersistentDisk, awsElasticBlockStore, azureFile, or azureDisk), NFS (Network File System), or Persistent Volume Claims (a series of abstractions to connect to the underlying cloud storage volumes).
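
To make the first category concrete, here is a minimal sketch of a Pod using an emptyDir volume; the names and image are only illustrative. Anything written to /cache disappears as soon as the Pod is deleted or rescheduled:

apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-volume-demo      # illustrative name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /cache
  volumes:
    - name: cache
      emptyDir: {}                  # lives and dies with the Pod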

Based on how they are provisioned, we can divide them into:

  1. Direct access
  2. Static provisioning
  3. Dynamic provisioning

Direct access to persistent volumes

In this case, the Pod is directly coupled to the volume, so it knows about the storage system (for example, the Pod is coupled to an Azure storage account). This solution is not cloud-agnostic; it depends on an implementation rather than an abstraction, so avoid it if possible. Its only advantage is that it is fast: you create a Secret and specify, in the Pod spec, the Secret that should be used and the exact storage type.

The script for creating the Secret is as follows:

apiVersion: v1  
kind: Secret  
metadata:  
  name: static-persistence-secret  
type: Opaque  
data:  
  azurestorageaccountname: "base64StorageAccountName"  
  azurestorageaccountkey: "base64StorageAccountKey"

In any Kubernetes script, on line 2 we specify the type of the resource; in this case, it is a Secret. On line 4, we give it a name (we call it static because it is created manually by the administrator rather than generated automatically). From Kubernetes’s point of view, the Opaque type means that the content (data) of the Secret is unstructured (it can contain arbitrary key-value pairs). To learn more about Kubernetes Secrets, see the Secrets design proposal and Configure Kubernetes Secrets:

https://github.com/kubernetes/community/blob/master/contributors/design-proposals/auth/secrets.md

https://kubernetes.io/docs/concepts/configuration/secret/

In the data section, we must specify the account name (in Azure, this is the name of the storage account) and the access key (in Azure, select “Settings” under the storage account, then “Access keys”). Don’t forget that both values must be Base64-encoded.
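
As a side note, if you prefer not to encode the values by hand, the same Secret can be written with the stringData field instead of data; Kubernetes then performs the encoding for you. The placeholder values below are assumptions to be replaced with your own:

apiVersion: v1
kind: Secret
metadata:
  name: static-persistence-secret
type: Opaque
stringData:
  azurestorageaccountname: mystorageaccount   # plain-text placeholder
  azurestorageaccountkey: myaccountkey        # plain-text placeholder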

The next step is to modify our Deployment script to use the volume (in this case, Azure File Storage).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-db-deployment
spec:
  selector:
    matchLabels:
      app: user-db-app
  replicas: 1
  template:
    metadata:
      labels:
        app: user-db-app
    spec:
      containers:
        - name: mongo
          image: mongo:3.6.4
          command:
            - mongod
            - "--bind_ip_all"
            - "--directoryperdb"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: data
              mountPath: /data/db
          resources:
            limits:
              memory: "256Mi"
              cpu: "500m"
      volumes:
        - name: data
          azureFile:
            secretName: static-persistence-secret
            shareName: user-mongo-db
            readOnly: false

The only difference is that, starting from line 32, we specify the volume to be used: we give it a name and specify the exact details of the underlying storage system. secretName must be the name of the Secret created earlier.

Kubernetes storage class

To understand static and dynamic provisioning, we must first understand the Kubernetes storage class.

With StorageClass, administrators can describe the “classes” of storage they offer. Different classes may map to different quality-of-service levels, backup policies, or any other policies determined by the cluster administrators.

For example, you can have a profile that stores data on an HDD and name it slow storage, or a profile that stores data on an SSD and name it fast storage. The kind of storage is determined by the provisioner. For Azure, there are two kinds of provisioners: AzureFile and AzureDisk (the difference is that AzureFile can be used with the ReadWriteMany access mode, while AzureDisk supports only ReadWriteOnce, which can be a disadvantage when you want to use multiple Pods at the same time). You can learn more about the different types of storage classes here:

https://kubernetes.io/docs/concepts/storage/storage-classes/
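
As an illustration of a “fast” class, a sketch of an SSD-backed AzureDisk storage class could look like the one below. The class name is made up, and the parameters follow the in-tree kubernetes.io/azure-disk provisioner, so double-check them against the storage classes documentation linked above:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast-ssd-storage            # illustrative name
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Premium_LRS   # premium SSD-backed managed disks
  kind: Managed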

Here’s the Storage Class script:

kind: StorageClass  
apiVersion: storage.k8s.io/v1  
metadata:  
  name: azurefilestorage  
provisioner: kubernetes.io/azure-file  
parameters:  
  storageAccount: storageaccountname  
reclaimPolicy: Retain  
allowVolumeExpansion: true

Kubernetes predefines the possible values of the provisioner attribute (see the Kubernetes storage classes documentation). The Retain reclaim policy means that after we delete the PVC and the PV, the actual storage medium is not purged. We can also set the policy to Delete; with that setting, once the PVC is deleted, the corresponding PV and the actual storage (in this case the Azure file share) are deleted as well.
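
For comparison, a sketch of the same class with the Delete policy differs only in the reclaimPolicy field (the name is illustrative):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefilestorage-delete     # illustrative variant name
provisioner: kubernetes.io/azure-file
parameters:
  storageAccount: storageaccountname
reclaimPolicy: Delete               # deleting the PVC also removes the PV and the file share
allowVolumeExpansion: true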

Persistent Volume Claim

Kubernetes has a matching primitive for each traditional storage operational activity (provisioning/configuring/attaching): a Persistent Volume corresponds to provisioning, a Storage Class to configuring, and a Persistent Volume Claim to attaching.

From the official documentation:

*A Persistent Volume (PV) is storage in the cluster that has been provisioned by an administrator or dynamically provisioned using storage classes. A Persistent Volume Claim (PVC) is a request for storage by a user. It is similar to a Pod: Pods consume node resources, and PVCs consume PV resources. Pods can request specific resource levels (CPU and memory); Claims can request specific sizes and access modes (for example, they can be mounted once read/write or many times read-only). This means the administrator creates persistent volumes that specify the storage size, access mode, and storage type that Pods can use, and the developer creates a Persistent Volume Claim that requests a volume, access rights, and storage type. This creates a clear distinction between the “development” side and the “operations” side: the developer is responsible for requesting the necessary volumes (PVCs), and the operations staff is responsible for preparing and provisioning the required volumes (PVs). The difference between static and dynamic provisioning is that if the persistent volume and the Secret are not created manually by an administrator, Kubernetes will try to create these resources automatically.*

Dynamic provisioning

In this case, there is no manually created persistent volume or Secret, so Kubernetes will try to generate them. A Storage Class is mandatory, and we will use the one created earlier.

The PersistentVolumeClaim script is as follows:

apiVersion: v1  
kind: PersistentVolumeClaim
metadata:  
  name: persistent-volume-claim-mongo  
spec:  
  accessModes:  
    - ReadWriteMany  
  resources:  
    requests:  
      storage: 1Gi  
  storageClassName: azurefilestorage

And our updated Deployment script:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-db-deployment
spec:
  selector:
    matchLabels:
      app: user-db-app
  replicas: 1
  template:
    metadata:
      labels:
        app: user-db-app
    spec:
      containers:
        - name: mongo
          image: mongo:3.6.4
          command:
            - mongod
            - "--bind_ip_all"
            - "--directoryperdb"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: data
              mountPath: /data/db
          resources:
            limits:
              memory: "256Mi"
              cpu: "500m"
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: persistent-volume-claim-mongo

As you can see, on line 34 we refer to the previously created PVC by name. In this case, we did not manually create a persistent volume or a Secret for it, so they will be created automatically.

The most important advantage of this approach is that you don’t have to create the PV and the Secret manually, and the Deployment is cloud-agnostic: the underlying details of the storage are not present in the Pod’s spec. But there are also some drawbacks: you cannot configure the storage account or the file share, because they are auto-generated, and you cannot reuse the PV or the Secret; they will be regenerated for each new Claim.

Static provisioning

The only difference between static and dynamic provisioning is that in static provisioning we create the persistent volume and the Secret manually. This gives us full control over the resources created in the cluster.

The script for the persistent volume is as follows:

apiVersion: v1  
kind: PersistentVolume  
metadata:  
  name: static-persistent-volume-mongo  
  labels:  
    storage: azurefile  
spec:  
  capacity:  
    storage: 1Gi  
  accessModes:  
    - ReadWriteMany  
  storageClassName: azurefilestorage  
  azureFile:  
    secretName: static-persistence-secret  
    shareName: user-mongo-db  
    readOnly: false

The important thing is that, on line 12, we refer to the Storage Class by name, and on line 14 we reference the Secret used to access the underlying storage system.
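
The Deployment still consumes this manually created volume through a Persistent Volume Claim. A minimal sketch of such a claim is shown below; the claim name is made up, and the selector on the storage: azurefile label is only one possible way to target the PV defined above:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-persistent-volume-claim-mongo   # illustrative name
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: azurefilestorage
  selector:
    matchLabels:
      storage: azurefile                        # matches the label on the PV above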

Even though it requires more work, this is the solution preferred in this article, because it is cloud-agnostic, it lets you apply separation of concerns regarding roles (cluster administrator and developers), and it gives you control over naming and resource creation.

Conclusion

In this article, we learned how to persist data and state using volumes. We presented three different ways to set up the system: direct access, dynamic provisioning, and static provisioning, and we discussed the advantages and disadvantages of each.

About the author

Czako Zoltan is an experienced full-stack developer whose expertise spans multiple areas, including front end, back end, DevOps, IoT, and artificial intelligence.