Preface

Fluid, a Kubernetes-based framework for accelerating data orchestration and scheduling in cloud storage-compute separation scenarios, recently completed its v0.6.0 release. Tencent Cloud's TKE container team has been deeply involved in building the Fluid community, and contributed the following two features to the latest version: a high-availability cache engine runtime, and a new cache engine implementation, GooseFSRuntime.

What is the separation of storage and computing? Why is data orchestration needed in a cloud-native context? What are Fluid and GooseFS? Don't worry! We will explore each of these questions together. This article first introduces the background of the Fluid technology and its relationship to GooseFS. Next, we try out the two new features of Fluid v0.6.0 hands-on in a TKE cluster. Finally, we discuss the future of the Fluid community. We hope this article gives you a better understanding of data orchestration in cloud-native application scenarios, and we look forward to your participation in building the Fluid community.

Status Quo and Challenges

What is the storage-compute separation architecture?

"Separation of storage and computing" is a product of the current development of network technology and of social and economic progress, and it is the architecture best suited to the present era. In a public cloud environment, block storage, file storage, and object storage replace local storage to give users on-demand provisioning and virtually unlimited capacity. For example, when creating a TKE cluster, you can mount high-performance cloud disks, SSDs, or enhanced SSDs based on the throughput and IOPS required of a single disk. These storage devices of different specifications are essentially network-attached cloud disks and consume network bandwidth. As cloud vendors keep pushing the technology forward, and users pursue the best possible cost, scalability, and performance, the separation of computing and storage has become the trend in cloud-native architecture.

CNCF's "China Cloud Native Report 2020" points out that container applications have grown by a staggering 240% compared to two years ago, and the share of the container orchestration standard Kubernetes in production has also increased from 72% to 82%. Kubernetes, the foundation of the cloud-native era, has become the first choice for public, private, and hybrid clouds thanks to its portability, rich extensibility, and automated scheduling. Many AI and big data workloads are also actively moving toward Kubernetes: the open-source machine learning platform Kubeflow, for example, and the big data computing framework Spark, which introduced spark-operator to support building big data platforms on Kubernetes. There is a growing trend for cloud-native applications to evolve toward a storage-compute separated architecture.

What challenges does the storage-compute separation architecture bring?

From Adrian Cockcroft's introduction of Netflix's successful cloud-native practice on AWS in 2013, to Pivotal's Matt Stine defining cloud-native architecture and the founding of the CNCF (Cloud Native Computing Foundation) in 2015, cloud-native values have been widely accepted by enterprise users. While cloud native is accelerating its penetration into vertical industries, the public cloud scenario with separated computing and storage still presents a number of challenges for cloud-native business development:

  • The storage-compute separation architecture of cloud platforms results in high data access latency. Even with high-speed network equipment deployed at scale, all data must travel over network I/O to compute nodes for computation and aggregation; for data-intensive applications in particular, the network is highly likely to become a bottleneck (there is no silver bullet). I/O bottlenecks ultimately lead to under-utilization of computing and storage resources, defeating the original purpose of using the cloud to reduce costs and increase efficiency.
  • In hybrid cloud scenarios, joint analysis across storage systems is difficult. Most companies' lines of business are split across teams that use different computing frameworks for different workloads, each with the storage that framework favors: HDFS in the big data domain, Lustre in the supercomputing domain, and so on. When data must be combined for comprehensive analysis, the growing number of data copies and the cost of data transformation inevitably drive up resource (that is, manpower) costs and slow down business iteration.
  • Data security governance and multi-dimensional management in the cloud are increasingly complex. Data is the lifeblood of many companies; data leaks, misoperations, and life-cycle mismanagement can cause huge losses. Ensuring data isolation in a cloud-native environment and protecting the life cycle of users' data are real challenges.

What can Fluid do?

Fluid resembles a "logistics management system" in the cloud-native world: the goods to be delivered are data from sources such as COS, HDFS, Ceph, and Lustre; the logistics warehouses, capable of storing different kinds of goods (that is, aggregating different data sources), are systems such as GooseFS; and the delivery address is the compute node where the user expects the data to be consumed.

Fluid is designed to deliver the goods (data) efficiently and accurately into the user's hands. In real life, goods are often delivered to a parcel locker: once the goods arrive at the designated locker, the user is expected to pick them up actively. This avoids a backlog of parcels and lets users plan pickup times flexibly. The corresponding design concept in cloud computing is similar to operator pushdown, which pushes more computation down to the storage layer and reduces the amount of data that must be transferred. The hope is that in the last mile we "move computation to storage" rather than "move storage to computation".

Exploring GooseFS & Fluid

Cloud Native Data Lake accelerator GooseFS

Data Lake Accelerator Goose FileSystem (GooseFS) is a highly reliable, highly available, and elastic data lake acceleration service launched by Tencent Cloud. Relying on the cost advantage of Cloud Object Storage (COS) as the data lake storage base, it provides a unified data lake entrance for computing applications in the data lake ecosystem, and accelerates storage performance for massive-data analysis, machine learning, artificial intelligence, and other services. It adopts a distributed cluster architecture with elasticity, high reliability, and high availability, and provides a unified namespace and access protocol for upper-layer computing applications, making it easy to manage and move data across different storage systems.

Distributed data orchestration and acceleration framework Fluid

Fluid is an open-source distributed data orchestration and acceleration framework hosted in the CNCF Sandbox. It is an open-source project that combines original academic research (Nanjing University, among others) with industrial practice. Driven by the trend toward separated computing and storage, Fluid's goal is to provide an efficient and convenient data abstraction layer for AI and big data cloud-native applications, abstracting data away from storage in order to:

  • Use data-affinity scheduling and distributed cache engine acceleration to bring data and computation together, accelerating computing access to data;
  • Manage data independently of storage and isolate resources through Kubernetes namespaces, achieving data security isolation;
  • Combine data from different storage systems for computation, creating an opportunity to break the silo effect of disparate storage (a sketch of a multi-mount Dataset follows this list).
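To make the last point concrete, a Fluid Dataset can declare several mounts under a single namespace. The following is a minimal sketch; the mountPoint values are hypothetical placeholders, and credential options are omitted:

$ cat >> combined-dataset.yaml <<EOF
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: combined
spec:
  mounts:
    # a public Web data source (hypothetical path)
    - mountPoint: https://mirrors.tuna.tsinghua.edu.cn/apache/spark/
      name: spark
    # an object storage path served alongside it (hypothetical bucket)
    - mountPoint: cosn://example-bucket-1250000000/warehouse/
      name: warehouse
EOF

Each mount then appears under its own subpath of the dataset volume, so a single task can read both sources through one persistent volume.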

From the user's point of view, once the user declares the source of the data, Fluid automatically schedules the data to the most appropriate nodes and exposes Kubernetes-native persistent data volumes. User applications, whether big data applications such as Hadoop and Spark or AI applications such as PyTorch and TensorFlow, simply mount these data volumes; Fluid schedules applications by affinity to accelerate data access and provide unified access. The Fluid project has developed rapidly in less than half a year since it was open-sourced, attracting the attention and contributions of many experts and engineers from large companies. Project adopters include Tencent, Weibo, Qihoo 360, China Telecom, BOSS Zhipin, 4Paradigm, and many other well-known IT and Internet enterprises.

Untangle the relationships between TKE, Fluid, and GooseFS

The fusion architecture of Fluid and Tencent Cloud TKE is shown below. Viewed by role, it splits into a computing and scheduling layer, a storage layer, and a task layer. Below we untangle the architecture and quickly clarify the relationship between Fluid, TKE, and GooseFS.

  1. Computing and scheduling layer: TKE provides a container application deployment platform based on Kubernetes. The Fluid GooseFS controller creates the Master Pods, Worker Pods, and Fuse Pods of the GooseFS instance on the most appropriate TKE worker nodes.
  2. Storage layer: the controller caches data from underlying storage such as COS and HDFS into the Worker Pods according to the data source specified by the user.
  3. Task layer: task Pods specify persistent volumes, and the controller's webhook injects affinity information, so that tasks using cached data are preferentially scheduled to nodes holding the cache while non-cache tasks go to nodes without it (see the sketch after this list).
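To illustrate the task layer, here is a hedged sketch of the kind of preferred node affinity that ends up on a task Pod. The label key fluid.io/s-default-hbase is an assumption for a dataset named hbase in the default namespace; check your nodes' labels (kubectl get node --show-labels) for the exact keys Fluid applies:

apiVersion: v1
kind: Pod
metadata:
  name: demo-task
spec:
  affinity:
    nodeAffinity:
      # prefer, but do not require, nodes that already hold the hbase cache
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: fluid.io/s-default-hbase   # assumed cache-node label
                operator: In
                values: ["true"]
  containers:
    - name: demo
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: hbase-vol
  volumes:
    - name: hbase-vol
      persistentVolumeClaim:
        claimName: hbase

Because the affinity is preferred rather than required, tasks still run when no cache node has free capacity; they simply lose the locality benefit.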

In general, by applying a cloud-native architecture and the concept of "moving computation to storage" in the last mile of the data path, Fluid solves many of the pain points of AI/big data workloads in storage-compute separation scenarios.

Fluid v0.6.0 Feature Walkthrough

The following features were designed and contributed by Tencent Cloud's TKE team.

Cache Engine High Availability Runtime

In the GooseFS distributed cache file system, high availability has two layers: the availability of the file system as a whole, and the integrity and consistency of the data. The Master, as the global metadata management component, ensures file system availability through Master high availability, while the Raft algorithm guarantees the integrity and consistency of logs and metadata. In real business scenarios, the failure of a single Master directly disrupts normal operation of the service, so Fluid needs to support multiple Masters in the cache engine to provide fault tolerance.
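In Fluid v0.6.0 this fault tolerance is switched on through a single field on the runtime resource. A minimal sketch (the full demo below uses the same field):

apiVersion: data.fluid.io/v1alpha1
kind: GooseFSRuntime
metadata:
  name: hbase
spec:
  master:
    replicas: 3   # an odd number >= 3 enables Raft-based Master HA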

New cache engine implementation: GooseFSRuntime

To meet the caching needs of computing tasks on Tencent Cloud TKE, the new release adds an execution engine implementation that supports data management and caching for Fluid Datasets. Through GooseFSRuntime, users can access and cache Tencent Cloud COS files in Fluid using GooseFS's caching capability. GooseFSRuntime is simple to use and deploy in Fluid, compatible with native Kubernetes environments, works out of the box, and integrates particularly well with Tencent Cloud TKE.

Feature Demo

This section briefly demonstrates the features above.

Prerequisites

Before running the sample, complete the installation by referring to the installation documentation, and check that the Fluid components are running properly:

$ kubectl get pod -n fluid-system
NAME                                        READY   STATUS    RESTARTS   AGE
goosefsruntime-controller-5b64fdbbb-84pc6   1/1     Running   0          8h
csi-nodeplugin-fluid-fwgjh                  2/2     Running   0          8h
csi-nodeplugin-fluid-ll8bq                  2/2     Running   0          8h
csi-nodeplugin-fluid-dhz7d                  2/2     Running   0          8h
dataset-controller-5b7848dbbb-n44dj         1/1     Running   0          8h

Typically, you will see a Pod named dataset-controller, a Pod named goosefsruntime-controller, and multiple Pods named csi-nodeplugin running. The number of csi-nodeplugin Pods depends on the number of nodes in your Kubernetes cluster.

Create a Working Environment

$ mkdir <any-path>/demo
$ cd <any-path>/demo

Viewing All Nodes

$ kubectl get nodes
NAME            STATUS   ROLES    AGE     VERSION
192.168.1.145   Ready    <none>   7d14h   v1.18.4-tke.13
192.168.1.146   Ready    <none>   7d14h   v1.18.4-tke.13
192.168.1.147   Ready    <none>   7d14h   v1.18.4-tke.13

Creating a Dataset resource

$ cat >> dataset.yaml <<EOF
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hbase
spec:
  mounts:
    - mountPoint: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/stable/
      name: hbase
EOF
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/hbase created

A Web UFS is used as the mountPoint here for the convenience of the experiment; to use COS as the UFS, see the documentation on accelerating COS.
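For reference, a COS-backed Dataset might look like the following hedged sketch. The bucket path is a placeholder, and the option and secret names follow common COSN-style configuration; take the exact keys and secret handling from the GooseFS/COS acceleration documentation:

$ cat >> cos-dataset.yaml <<EOF
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: cos-data
spec:
  mounts:
    - mountPoint: cosn://example-bucket-1250000000/data/   # hypothetical bucket path
      name: cos
      options:
        fs.cosn.bucket.region: ap-guangzhou                # assumed option key
      encryptOptions:
        # credentials pulled from a pre-created Kubernetes Secret named cos-secret
        - name: fs.cosn.userinfo.secretId
          valueFrom:
            secretKeyRef:
              name: cos-secret
              key: secretId
        - name: fs.cosn.userinfo.secretKey
          valueFrom:
            secretKeyRef:
              name: cos-secret
              key: secretKey
EOF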

Create and view the GooseFSRuntime resource

$ cat >> runtime.yaml <<EOF
apiVersion: data.fluid.io/v1alpha1
kind: GooseFSRuntime
metadata:
  name: hbase
spec:
  replicas: 3
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 2G
        high: "0.8"
        low: "0.7"
  master:
    replicas: 3
EOF
$ kubectl create -f runtime.yaml
goosefsruntime.data.fluid.io/hbase created
$ kubectl get pod
NAME                 READY   STATUS    RESTARTS   AGE
hbase-fuse-4v9mq     1/1     Running   0          84s
hbase-fuse-5kjbj     1/1     Running   0          84s
hbase-fuse-tp2q2     1/1     Running   0          84s
hbase-master-0       1/1     Running   0          104s
hbase-master-1       1/1     Running   0          102s
hbase-master-2       1/1     Running   0          100s
hbase-worker-cx8x7   1/1     Running   0          84s
hbase-worker-fjsr6   1/1     Running   0          84s
hbase-worker-fvpgc   1/1     Running   0          84s
$ kubectl get pvc
NAME    STATUS   VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hbase   Bound    default-hbase   100Gi      ROX            fluid          12h
$ kubectl get goosefsruntime
NAME    MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
hbase   Ready          Ready          Ready        15m
$ kubectl exec -ti hbase-master-0 bash
root@hbase-master-0:/# goosefs fs masterInfo
current leader master: hbase-master-0:26000
All masters: [hbase-master-0:26000, hbase-master-1:26000, hbase-master-2:26000]

Here we focus on three points:

  1. At this point we have created a distributed cache engine, GooseFS, that computing tasks can access: a task Pod only needs to specify persistentVolumeClaim.name as hbase to obtain cache acceleration for the hbase dataset.
  2. To enable Master HA mode directly, you only need to set spec.master.replicas=n, where n is an odd number greater than or equal to 3.
  3. By simply setting spec.replicas=n, the controller creates n Worker Pods and n Fuse Pods for the GooseFS cache system (three of each in this example).

Data warm-up and acceleration

$ cat >> nginx.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: hbase-vol
  volumes:
    - name: hbase-vol
      persistentVolumeClaim:
        claimName: hbase
EOF
$ kubectl create -f nginx.yaml
$ kubectl exec -it nginx /bin/bash
root@nginx:/# time cp -r /data/hbase ./
real    1m9.031s
user    0m0.000s
sys     0m2.101s
$ kubectl delete -f nginx.yaml
$ cat >> dataload.yaml <<EOF
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: hbase-dataload
spec:
  dataset:
    name: hbase
    namespace: default
  target:
    - path: /
      replicas: 1
EOF
$ kubectl create -f dataload.yaml
$ kubectl get dataset hbase --watch
NAME    UFS TOTAL SIZE   CACHED      CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
hbase   545.32MiB        545.32MiB   5.59GiB          100.0%              Bound   16m
$ kubectl create -f nginx.yaml
$ kubectl exec -it nginx /bin/bash
root@nginx:/# time cp -r /data/hbase ./
real    0m0.278s
user    0m0.000s
sys     0m0.273s

This dramatic speedup (1m9s -> 0.278s, roughly a 248x acceleration) comes from the powerful caching capability provided by GooseFS. In summary, GooseFSRuntime manages user-defined data sources and accelerates application access through caching.

This demonstrates the two major features of v0.6.0: the high-availability cache engine runtime and the new cache engine implementation GooseFSRuntime. Fluid's other features are not covered here; see the usage documentation for details.

Fluid Roadmap

The figure above shows the current roadmap of the Fluid community, divided into six areas: automated operations and observability, multi-runtime support, elastic data scaling, scheduling optimization, Fluid Agent, and access modes. Automated operations, multi-runtime support, and access modes are largely in place; going forward, the community will focus on the following three areas:

  1. Elastic scaling: cache workers currently support Horizontal Pod Autoscaler (HPA) based on custom metrics, as well as CronHPA for business peaks and troughs (see the HPA sketch after this list). However, because the cache engine lacks a re-balance function, scaling in to reduce costs and increase efficiency is not yet practical; this is a feature the community will focus on in the future.
  2. Fluid Agent mode: key operations indicators, such as whether there is residual state to clean up and whether a node holds cache, are reported through the Agent in push mode. At the same time, per-node system information such as CPU/memory usage, disk usage, and page cache usage can guide the Fluid scheduler toward optimal placement of datasets.
  3. Scheduling policy, which currently covers three aspects:
    1. Dataset scheduling: Kubernetes scheduling features such as tolerations, node selectors, and preferred scheduling have been adapted. Going forward, we hope to achieve optimal dataset scheduling through Filter, Score, Bind, and other operations in the manner of the Scheduling Framework.
    2. Task scheduling: a webhook can automatically add affinity and anti-affinity labels to workloads in a specified namespace for task scheduling.
    3. Co-scheduling of tasks and data preheating: data used by jobs in the job queue can be preloaded based on scheduling information, achieving pipeline-style optimization.
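As a sketch of the first point, worker scaling today rides on standard HPA machinery. The following assumes the runtime exposes a scale subresource and that a custom metric named capacity_used_rate is served through a Prometheus adapter, as in the community's elastic-scaling examples; the names and threshold are illustrative, not definitive:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hbase-runtime-hpa
spec:
  scaleTargetRef:                      # scale the cache workers of the runtime
    apiVersion: data.fluid.io/v1alpha1
    kind: GooseFSRuntime
    name: hbase
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Object
      object:
        metric:
          name: capacity_used_rate     # assumed custom metric
        describedObject:
          apiVersion: data.fluid.io/v1alpha1
          kind: GooseFSRuntime
          name: hbase
        target:
          type: Value
          value: "90"                  # scale out when cache usage exceeds 90%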

Summary and Outlook

This article first introduced the background of the Fluid technology and how its data orchestration meets the "move computation to storage" needs of cloud-native services in storage-compute separation scenarios. Next, a brief analysis of Fluid's architecture clarified the relationship between Fluid, GooseFS, and TKE, and a simple demo showed the two basic new features of v0.6.0. Finally, the Fluid roadmap summarized what the community has done and its future development plans.

In general, achieving the ultimate elasticity of computing and storage in the public cloud is the premise of greater efficiency and lower cost. Only by enabling businesses to make better use of elasticity, and thereby capture the full dividend of cloud native and cloud computing, can applications be born in the cloud and grow with the cloud.