Background
Our Flink 1.13.3 jobs run in native Kubernetes deployment mode. HDFS was originally used to store state snapshots (checkpoints and savepoints). However, HDFS served only this one purpose, and the Hadoop stack is heavyweight and consumes a lot of resources in the Kubernetes cluster, so we are replacing it with a lighter distributed file system: NFS.
State backend settings
Since Flink 1.13, there are two state backends: HashMapStateBackend and EmbeddedRocksDBStateBackend. If no state backend is explicitly specified, Flink uses HashMapStateBackend.
State backend | Where working state is stored | Incremental checkpoints supported |
---|---|---|
HashMapStateBackend | JVM heap | No |
EmbeddedRocksDBStateBackend | RocksDB (off-heap managed memory plus local disk) | Yes |
See the official Flink documentation for the use cases, advantages, and disadvantages of each.
This article uses EmbeddedRocksDBStateBackend together with FileSystemCheckpointStorage. Operator state is kept in RocksDB, while checkpoints and savepoints are written to directories backed by an NFS volume mounted into the JobManager and TaskManager pods. Set the parameters as follows:
```yaml
state.backend: rocksdb
state.checkpoint-storage: filesystem
state.checkpoints.dir: /opt/flink/checkpoint
state.savepoints.dir: /opt/flink/Savepoint
kubernetes.pod-template-file: /opt/flink/conf/pod-template.yaml
```
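These options choose where state and snapshots are stored, but they do not by themselves turn on periodic checkpointing. A possible set of additions to the same configuration is sketched below. The option names exist in Flink 1.13; the values (a 60-second interval, three retained checkpoints) are illustrative assumptions, not the settings used in this article, and the interval can equally be set in the job code instead.

```yaml
# Optional additions (illustrative values, not from the original setup)
# Periodic checkpoints only run when an interval is configured here or in the job.
execution.checkpointing.interval: 60s
# RocksDB backend only: each checkpoint uploads just the new SST files.
state.backend.incremental: true
# Number of completed checkpoints kept under state.checkpoints.dir.
state.checkpoints.num-retained: 3
```

Incremental checkpoints keep each checkpoint small, which matters when every write goes over NFS.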
Pod template
Note that the checkpoint and savepoint directories must be accessible to the JobManager and all TaskManagers. In this article, the NFS share is mounted through a PersistentVolume (PV) and a PersistentVolumeClaim (PVC). The kubernetes.pod-template-file parameter specifies the location of pod-template.yaml, and that file mounts the claims at the checkpoint and savepoint directories. pod-template.yaml is as follows:
```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    # Do not change the main container name
    - name: flink-main-container
      volumeMounts:
        # Must match state.checkpoints.dir (paths are case-sensitive)
        - mountPath: /opt/flink/checkpoint
          name: checkpoint
        # Must match state.savepoints.dir
        - mountPath: /opt/flink/Savepoint
          name: savepoint
  volumes:
    # Volume names must be lowercase DNS labels
    - name: checkpoint
      persistentVolumeClaim:
        claimName: flink-checkpoint-pvc
    - name: savepoint
      persistentVolumeClaim:
        claimName: flink-savepoint-pvc
```
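The pod template only references the claims flink-checkpoint-pvc and flink-savepoint-pvc. A minimal sketch of a statically provisioned NFS PersistentVolume plus the matching checkpoint claim might look like the following. The NFS server address, export path, size, and the PV name flink-checkpoint-pv are placeholders, not the author's actual deployment files. The access mode is ReadWriteMany because the JobManager and every TaskManager mount the share at the same time, and the claim has to be created in the same namespace as the Flink cluster.

```yaml
# Hypothetical static NFS volume for checkpoints; server, path and size are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: flink-checkpoint-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany                       # shared by the JobManager and all TaskManagers
  persistentVolumeReclaimPolicy: Retain   # keep checkpoint data if the claim is deleted
  storageClassName: ""                    # static binding, no dynamic provisioning
  nfs:
    server: 192.0.2.10                    # placeholder NFS server address
    path: /exports/flink/checkpoint       # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flink-checkpoint-pvc              # must match claimName in pod-template.yaml
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: flink-checkpoint-pv         # bind explicitly to the PV above
  resources:
    requests:
      storage: 10Gi
```

flink-savepoint-pvc would follow the same pattern with its own export path. If the official Flink image is used, its processes run as the flink user (UID 9999), so the NFS exports need to be writable by that UID.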
Besides volumes, the pod template can set other fields of the JobManager and TaskManager pods. Each field is handled according to one of three precedence rules:
- Defined by Flink: the value is set by Flink and cannot be configured by the user.
- Defined by the user: the user can set the value freely, and the Flink framework does not set it. An explicitly configured Flink option takes precedence, followed by the value in pod-template.yaml; if neither is specified, the default value is used.
- Merged with Flink: Flink merges its own values with the user-defined ones. For entries with the same name, the Flink value is used.
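To make these rules concrete, the sketch below pairs a flink-conf.yaml excerpt with a pod-template.yaml excerpt. It assumes the field classification in the Flink 1.13 pod template documentation, where the container image is a "defined by the user" field backed by kubernetes.container.image and env is "merged with Flink"; the image names, the TZ values, and the APP_PROFILE variable are hypothetical.

```yaml
# --- flink-conf.yaml (excerpt, hypothetical values) ---
kubernetes.container.image: registry.example.com/flink:1.13.3-nfs
containerized.master.env.TZ: Asia/Shanghai
containerized.taskmanager.env.TZ: Asia/Shanghai

# --- pod-template.yaml (excerpt; apiVersion and kind omitted) ---
spec:
  containers:
    - name: flink-main-container   # defined by Flink: the main container must use this name
      # Defined by the user: kubernetes.container.image is set explicitly above,
      # so it takes precedence and this value is not used.
      image: registry.example.com/flink:1.13.3
      env:
        # Merged with Flink: same name as a Flink-set variable, so Flink's value wins.
        - name: TZ
          value: UTC
        # No name clash: added to the container as-is.
        - name: APP_PROFILE
          value: prod
```

If kubernetes.container.image were removed from flink-conf.yaml, the image from the pod template would be used instead of Flink's default image.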
The deployment files for the PV, PVC, and StorageClass can be obtained by replying “pod-template” to the WeChat official account “HEY DATA”.