Authors | Che Yang, Fluid community committer; Xie Yuandong, Fluid community committer | Source: Alibaba Cloud Native official account
Background
As more and more data-intensive applications such as big data analytics and AI are deployed and run in Kubernetes environments, the divergence between the design assumptions of data-intensive computing frameworks and cloud-native elastic application orchestration has led to data access and computation bottlenecks. Fluid, a cloud-native data orchestration engine, accelerates data access for applications by abstracting datasets and combining distributed caching technology with the scheduler.
Elastic scaling is one of Kubernetes' core capabilities, but it has traditionally revolved around stateless application workloads. Fluid brings elastic scalability to the distributed cache, so that the data cache can flexibly expand and contract. Based on the Runtime, it exposes performance metrics such as cache capacity and the current cache usage ratio, and, combined with the Runtime's ability to scale its own resources, provides on-demand scaling of the data cache.
This capability is very important for big data applications in Internet scenarios, because most big data applications are implemented as end-to-end pipelines. Such a pipeline consists of the following steps:
- Data extraction: use big data technologies such as Spark and MapReduce to preprocess the raw data.
- Model training: use the feature data generated in the first stage to train the machine learning model and produce the corresponding model.
- Model evaluation: evaluate and test the model produced in the second stage against test or validation sets.
- Model inference: finally push the model validated in the third stage online to serve inference requests for the business.
As you can see, an end-to-end pipeline contains many different types of computing tasks, and in practice each task has a fitting specialized system (TensorFlow, PyTorch, Spark, Presto). These systems are independent of one another, however, and usually rely on an external file system to pass data from one stage to the next. Frequently exchanging data through a file system incurs a large amount of I/O overhead and often becomes the bottleneck of the entire workflow.
Fluid is very well suited to this scenario. Users can create a Dataset object as the data exchange medium; it has the ability to distribute and cache data on Kubernetes compute nodes. This avoids writing and reading data remotely and improves the efficiency of data use. The remaining problem is resource estimation and reservation for the temporary data cache: before the data is produced and consumed, it is hard to estimate the data volume accurately. Estimating too high wastes reserved resources, while estimating too low raises the possibility of data write failures. Scaling on demand is far more user-friendly. We want an effect similar to the page cache: the layer is transparent to the end user, but the cache acceleration it brings is real.
We introduced cache elastic scaling to Fluid through a custom HPA mechanism: when the cached data reaches a configured proportion of the cache capacity, scale-out is triggered to expand the cache space. For example, with a trigger threshold of 75% and a total cache capacity of 10 GiB, scale-out is triggered once the cached data exceeds 7.5 GiB.
Here’s an example to help you experience Fluid’s ability to expand and shrink automatically.
Prerequisites
Kubernetes 1.18 or above is recommended, because before 1.18 the HPA could not customize its scaling policy; the policy was hard-coded. From 1.18 on, users can customize scale-up and scale-down policies, for example defining a stabilization window after a scaling event.
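For illustration, a minimal scale-down policy of the kind 1.18 enables might look like the sketch below (the field names follow the autoscaling/v2beta2 API; the values are invented for this example):
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # wait 5 minutes before acting on scale-down recommendations
    policies:
    - type: Percent
      value: 50           # remove at most 50% of the current replicas...
      periodSeconds: 60   # ...within each 60-second window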
Specific steps
1. Install the jq tool to parse JSON.
In this example we use CentOS, where jq can be installed through yum.
yum install -y jq
2. Download and install the latest version of Fluid.
git clone https://github.com/fluid-cloudnative/fluid.git
cd fluid/charts
kubectl create ns fluid-system
helm install fluid fluid
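Fluid's components land in the fluid-system namespace created above; as a quick sanity check (plain kubectl, nothing Fluid-specific), you can confirm its pods are running:
kubectl get pods -n fluid-system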
3. Deploy or configure Prometheus.
Prometheus is used here to collect the metrics exposed by the AlluxioRuntime's cache engine. If there is no Prometheus in the cluster, deploy the bundled one:
$ cd fluid
$ kubectl apply -f integration/prometheus/prometheus.yaml
If you have Prometheus in your cluster, you can write the following configuration to the Prometheus configuration file:
scrape_configs:
  - job_name: 'alluxio runtime'
    metrics_path: /metrics/prometheus
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_monitor]
        regex: alluxio_runtime_metrics
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: web
        action: keep
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_label_release]
        target_label: fluid_runtime
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_name]
        target_label: pod
        replacement: $1
        action: replace
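This scrape job keeps only endpoints of services labeled monitor: alluxio_runtime_metrics whose port is named web, and copies the Kubernetes service metadata onto each sample as namespace, fluid_runtime, and pod labels; the fluid_runtime label is what later allows metrics to be attributed to a specific Dataset.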
4. Verify that Prometheus is installed successfully.
$ kubectl get ep -n kube-system prometheus-svc
NAME             ENDPOINTS        AGE
prometheus-svc   10.76.0.2:9090   6m49s
$ kubectl get svc -n kube-system prometheus-svc
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus-svc   NodePort   172.16.135.24   <none>        9090:32114/TCP   2m7s
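Since the service is of type NodePort, the Prometheus UI in this example is also reachable from outside the cluster at <node-ip>:32114.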
If you want to visualize monitoring metrics, you can install Grafana to validate monitoring data, as described in the documentation.
5. Deploy the metrics server.
Check whether a metrics server is included in the cluster. If it is correctly configured, running kubectl top node displays memory and CPU usage:
$ kubectl top node
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
192.168.1.204   93m          2%     1455Mi          10%
192.168.1.205   125m         3%     1925Mi          13%
192.168.1.206   96m          2%     1689Mi          11%
Otherwise, run the following command:
kubectl create -f integration/metrics-server
6. Deploy the custom-metrics-API component.
To scale based on custom metrics, you need two components:
- The first component collects metrics from the application and stores them in the Prometheus time series database.
- The second component extends the Kubernetes custom metrics API with the collected metrics, namely k8s-prometheus-adapter.
The first component was deployed in step 3; now deploy the second.
If the custom-metrics-API has already been configured, add the Dataset-related rules to the adapter's ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"Cluster_(CapacityTotal|CapacityUsed)",fluid_runtime!="",instance!="",job="alluxio runtime",namespace!="",pod!=""}'
      seriesFilters:
      - is: ^Cluster_(CapacityTotal|CapacityUsed)$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pods
          fluid_runtime:
            resource: datasets
      name:
        matches: "^(.*)"
        as: "capacity_used_rate"
      metricsQuery: ceil(Cluster_CapacityUsed{<<.LabelMatchers>>}*100/(Cluster_CapacityTotal{<<.LabelMatchers>>}))
Otherwise, run the following command:
kubectl create -f integration/custom-metrics-api/namespace.yaml
kubectl create -f integration/custom-metrics-api
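The rule above derives a new metric named capacity_used_rate from Alluxio's Cluster_CapacityUsed and Cluster_CapacityTotal series, computed as ceil(Cluster_CapacityUsed × 100 / Cluster_CapacityTotal), i.e., the percentage of cache capacity currently in use. The resources overrides map the fluid_runtime label onto the datasets resource, which is what makes the metric queryable per Dataset.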
Note: Because the custom-metrics-API needs to connect to the cluster's Prometheus, replace the Prometheus URL in the manifests with the address actually used in your cluster.
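For reference, prometheus-adapter takes this address through its --prometheus-url flag; the fragment below is illustrative only, with a placeholder URL matching the Prometheus service deployed earlier in this walkthrough:
# Container args of the adapter Deployment (illustrative placeholder URL):
args:
- --prometheus-url=http://prometheus-svc.kube-system.svc:9090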
Check the custom metrics:
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
"kind": "APIResourceList",
"apiVersion": "v1",
"groupVersion": "custom.metrics.k8s.io/v1beta1",
"resources": [
{
"name": "pods/capacity_used_rate",
"singularName": "",
"namespaced": true,
"kind": "MetricValueList",
"verbs": [
"get"
]
},
{
"name": "datasets.data.fluid.io/capacity_used_rate",
"singularName": "",
"namespaced": true,
"kind": "MetricValueList",
"verbs": [
"get"
]
},
{
"name": "namespaces/capacity_used_rate",
"singularName": "",
"namespaced": false,
"kind": "MetricValueList",
"verbs": [
"get"
]
}
]
}
7. Submit the Dataset used for the test.
$ cat<<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: spark
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 1Gi
        high: "0.99"
        low: "0.7"
  properties:
    alluxio.user.streaming.data.timeout: 300sec
EOF
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/spark created
alluxioruntime.data.fluid.io/spark created
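A note on the AlluxioRuntime spec above: quota is the cache capacity contributed by each replica (1GiB of memory here), so the total cache capacity grows by 1GiB per added replica, while high and low are the tier's high and low watermark ratios that govern when cached data begins to be evicted.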
8. Check whether the Dataset is in the available state.
As you can see, the total size of the data in this dataset is 2.71GiB. Fluid currently provides one cache node with a maximum cache capacity of 1GiB, so the cache capacity cannot hold the full dataset.
$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          0.00B    1.00GiB          0.0%                Bound   7m38s
9. After the Dataset is available, check whether the monitoring metrics can be obtained from custom-metrics-API.
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/*/capacity_used_rate" | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/%2A/capacity_used_rate"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Dataset",
        "namespace": "default",
        "name": "spark",
        "apiVersion": "data.fluid.io/v1alpha1"
      },
      "metricName": "capacity_used_rate",
      "timestamp": "2021-04-04T07:24:52Z",
      "value": "0"
    }
  ]
}
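The reported value of 0 is consistent with the Dataset status above: nothing has been cached yet.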
10. Create an HPA task.
$ cat<<EOF > hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: spark
spec:
  scaleTargetRef:
    apiVersion: data.fluid.io/v1alpha1
    kind: AlluxioRuntime
    name: spark
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Object
    object:
      metric:
        name: capacity_used_rate
      describedObject:
        apiVersion: data.fluid.io/v1alpha1
        kind: Dataset
        name: spark
      target:
        type: Value
        value: "90"
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 2
        periodSeconds: 600
    scaleDown:
      selectPolicy: Disabled
EOF
$ kubectl create -f hpa.yaml
First, let's interpret the configuration in this example. It has two main parts: the scaling rule and the scaling sensitivity:
- Rule: capacity expansion is triggered when the cached data of the Dataset object reaches 90% of the total cache capacity. The scaled object is the AlluxioRuntime, with a minimum of 1 replica and a maximum of 4. The Dataset and AlluxioRuntime objects must be in the same namespace.
- Policy: requires Kubernetes 1.18 or later, where the stabilization window and scaling step can be set separately for scale-up and scale-down. Here, within one scale-up period (periodSeconds, 600 seconds), at most 2 new replicas are added, and the maxReplicas limit can never be exceeded; the scale-down policy is disabled outright.
11. Check the HPA configuration. The current cache-usage ratio is 0, far below the threshold that triggers scale-out.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
spark AlluxioRuntime/spark 0/90 1 4 1 33s
$ kubectl describe hpa
Name: spark
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Wed, 07 Apr 2021 17:36:39 +0800
Reference: AlluxioRuntime/spark
Metrics: ( current / target )
"capacity_used_rate" on Dataset/spark (target value): 0 / 90
Min replicas: 1
Max replicas: 4
Behavior:
Scale Up:
Stabilization Window: 0 seconds
Select Policy: Max
Policies:
- Type: Pods Value: 2 Period: 600 seconds
Scale Down:
Select Policy: Disabled
Policies:
- Type: Percent Value: 100 Period: 15 seconds
AlluxioRuntime pods: 1 current / 1 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ScaleDownStabilized recent recommendations were higher than current one, applying the highest recent recommendation
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events: <none>
12. Create a data preheating task.
$ cat<<EOF > dataload.yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: spark
spec:
  dataset:
    name: spark
    namespace: default
EOF
$ kubectl create -f dataload.yaml
$ kubectl get dataload
NAME DATASET PHASE AGE DURATION
spark spark Executing 15s Unfinished
13. At this point, you can see that the amount of cached data is close to the cache capacity Fluid provides (1GiB), and the elastic scaling condition is triggered.
$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED       CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          1020.92MiB   1.00GiB          36.8%               Bound   5m15s
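Note that CACHED PERCENTAGE in this output is cached data relative to the dataset's UFS total (1020.92MiB of 2.71GiB ≈ 36.8%), whereas the HPA metric capacity_used_rate is usage relative to cache capacity: ceil(1020.92MiB × 100 / 1GiB) = 100, which exceeds the target of 90 and therefore triggers scale-out.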
From the HPA status, you can see that scale-out of the AlluxioRuntime has started, with a scale-up step of 2 replicas per period.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
spark AlluxioRuntime/spark 100/90 1 4 2 4m20s
$ kubectl describe hpa
Name: spark
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Wed, 07 Apr 2021 17:56:31 +0800
Reference: AlluxioRuntime/spark
Metrics: ( current / target )
"capacity_used_rate" on Dataset/spark (target value): 100 / 90
Min replicas: 1
Max replicas: 4
Behavior:
Scale Up:
Stabilization Window: 0 seconds
Select Policy: Max
Policies:
- Type: Pods Value: 2 Period: 600 seconds
Scale Down:
Select Policy: Disabled
Policies:
- Type: Percent Value: 100 Period: 15 seconds
AlluxioRuntime pods: 2 current / 3 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 3
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 21s horizontal-pod-autoscaler New size: 2; reason: Dataset metric capacity_used_rate above target
Normal SuccessfulRescale 6s horizontal-pod-autoscaler New size: 3; reason: Dataset metric capacity_used_rate above target
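These replica counts follow the standard HPA formula desiredReplicas = ceil(currentReplicas × currentMetricValue / targetValue): with the metric at 100 against a target of 90, one replica becomes ceil(1 × 100/90) = 2, and while the metric stays above target, two become ceil(2 × 100/90) = 3, matching the two rescale events above and staying within the policy limit of at most 2 new pods per 600-second period.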
14. After some time, the cache capacity of the dataset has grown from 1GiB to 3GiB, and data caching is nearly complete.
$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          2.59GiB   3.00GiB          95.6%               Bound   12m
According to the HPA status, the Runtime corresponding to the Dataset now has 3 replicas, and the cache usage ratio capacity_used_rate is 85, below the target of 90, so no further scale-out is triggered.
$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
spark AlluxioRuntime/spark 85/90 1 4 3 11m
15. Clean up your environment.
kubectl delete hpa spark
kubectl delete dataset spark
Conclusion
By combining Prometheus, the Kubernetes HPA, and the custom metrics capability, Fluid triggers automatic elastic scaling based on the proportion of cache space in use, providing cache capacity on demand. This lets users work with the distributed cache more flexibly to accelerate data access. In the future, we will provide scheduled scale-out and scale-in to bring stronger determinism to cache scaling.
Fluid's repository: https://github.com/fluid-cloudnative/fluid. Welcome to follow the project, contribute code, and star it.