Background
As the business grows, more and more Kafka clusters are required, which poses great challenges for deployment and management. We want to take advantage of K8s's excellent scaling and rapid-deployment capabilities to reduce the burden of daily operations, so we investigated the feasibility of running Kafka on K8s.
A Kafka cluster involves many components and is stateful, so the industry typically uses a customized operator solution. There are several related repositories on GitHub at present; weighing community activity and adoption, Strimzi (GitHub address) was chosen this time.
Kafka component interaction diagram
Plan
- Deploy Strimzi on an Alibaba Cloud K8s cluster
- Since the Kafka used within the group is a customized fork of the open-source version, a custom Strimzi Kafka image needs to be maintained
- Strimzi manages the Kafka cluster, which includes Kafka, ZooKeeper, and the Kafka Exporter
- Use zoo-entrance (GitHub address) to proxy the ZK cluster
- Deploy Prometheus to collect metrics from Kafka and ZK
- Expose service ports so that Kafka and ZK can be used from outside the K8s cluster
Hands-on walkthrough
Build a custom Kafka image
- Pull the latest strimzi-kafka-operator code from the company Git (it is slightly modified from the open-source version; you can use the open-source version directly for experiments)
- In the docker-images folder there is a Makefile whose docker_build target runs the build.sh script. This step pulls the Kafka installation package from the official website; we need to change the download address at this step to point to our internal installation package
- The built image only exists locally, so we need to push it to the company's internal Harbor registry (a sketch of the flow follows)
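A minimal sketch of the build-and-push flow, assuming the Harbor repository path and tag used later in this article; the local tag produced by the build depends on the Kafka version, so check docker images first:

# Build the Strimzi images; docker_build runs build.sh, which fetches the Kafka package
cd strimzi-kafka-operator/docker-images
make docker_build
# Re-tag the locally built Kafka image for the internal Harbor registry
# (the source tag and the repository path below are illustrative)
docker tag strimzi/kafka:latest-kafka-2.8.1 repository.poizon.com/kafka-operator/poizon/kafka:2.8.4
docker push repository.poizon.com/kafka-operator/poizon/kafka:2.8.4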
Deploy the operator
Only one operator needs to be deployed in each K8s cluster.
- Prerequisite: a healthy K8s cluster
- Create the namespace: kubectl create namespace kafka (the namespace kafka is used by default)
- Pull the latest code from the company Git (address as above)
- By default, the files watch the namespace named kafka. To change it, run sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml (replace kafka in the command with your namespace)
- Run kubectl apply -f install/cluster-operator/ -n kafka to apply all the files
- Run kubectl get pods -n kafka to check the pods
- Check the creation status of these resources and the running status of the operator from the Alibaba Cloud K8s console
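Putting those steps together, a minimal sketch, assuming the operator should watch the kafka namespace:

# Create the namespace the operator will watch
kubectl create namespace kafka
# Point the RBAC files at that namespace
sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml
# Install the cluster operator and verify that its pod comes up
kubectl apply -f install/cluster-operator/ -n kafka
kubectl get pods -n kafka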
Deploy the Kafka cluster
Make sure the operator has been deployed successfully, and deploy Kafka into a namespace watched by the operator above.
- Again, go to the latest code directory; the examples/kafka directory contains the files required for this deployment
- Deploy Kafka and ZK
- kafka-persistent.yaml is the core file; it runs Kafka together with ZK and the Kafka Exporter
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.8.1
    replicas: 3
    resources:
      requests:
        memory: 16Gi
        cpu: 4000m
      limits:
        memory: 16Gi
        cpu: 4000m
    image: repository.poizon.com/kafka-operator/poizon/kafka:2.8.4
    jvmOptions:
      -Xms: 3072m
      -Xmx: 3072m
    listeners:
      - name: external
        port: 9092
        type: nodeport
        tls: false
      - name: plain
        port: 9093
        type: internal
        tls: false
    config:
      offsets.topic.replication.factor: 2
      transaction.state.log.replication.factor: 2
      transaction.state.log.min.isr: 1
      default.replication.factor: 2
      ***
    template:
      pod:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: strimzi.io/name
                      operator: In
                      values:
                        - my-cluster-kafka
                topologyKey: "kubernetes.io/hostname"
    storage:
      type: persistent-claim
      size: 100Gi
      class: rocketmq-storage
      deleteClaim: false
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: kafka-metrics-config.yml
  zookeeper:
    replicas: 3
    resources:
      requests:
        memory: 3Gi
        cpu: 1000m
      limits:
        memory: 3Gi
        cpu: 1000m
    jvmOptions:
      -Xms: 2048m
      -Xmx: 2048m
    jmxOptions: {}
    template:
      pod:
        affinity:
          podAntiAffinity:
            ***
    storage:
      type: persistent-claim
      size: 50Gi
      class: rocketmq-storage
      deleteClaim: false
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics
          key: zookeeper-metrics-config.yml
  ***
***
- You can change the name of the Kafka cluster; the name attribute on line 4 of the file is my-cluster by default
- You can change the number of Kafka pods (nodes); the default is 3
- You can modify the pod memory and CPU configuration
- You can modify the Kafka JVM startup heap size
- You can modify the Kafka configuration in the config section (line 36 of the full file)
- The disk type and size can be changed; the storage class (line 50 of the full file) can be changed to another storage type. Currently efficient cloud disk, SSD, and ESSD are available
- The ZK modifications are similar to Kafka's, and the modifiable settings are in the same file
- At the bottom of the file are the metrics Kafka and ZK need to expose, which can be added or removed as required
- Run kubectl apply -f kafka-persistent.yaml -n kafka to complete the deployment; a quick check is sketched below
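After the apply, the operator creates the pods itself; a quick sanity check, assuming the default cluster name my-cluster (the exact pod list depends on which components the elided sections enable):

kubectl get pods -n kafka
# Expect pods named after the cluster, e.g.
#   my-cluster-zookeeper-0 ... my-cluster-zookeeper-2
#   my-cluster-kafka-0 ... my-cluster-kafka-2
#   plus a my-cluster-kafka-exporter pod if the exporter is enabled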
- Deploy the ZK proxy (zoo-entrance)
- Since direct access to ZK by external components is not officially supported, proxy access is used
- For security reasons, the project deliberately does not support external programs accessing ZK: https://github.com/strimzi/strimzi-kafka-operator/issues/1337
- Solution: github.com/scholzj/zoo…
- After deploying the ZK proxy, we need to create a LoadBalancer service on the K8s console to expose the proxy to applications outside the cluster: K8s console –> Network –> Services –> Create (select LoadBalancer, then select zoo-entrance). The same service in YAML is sketched below
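A minimal sketch of that LoadBalancer service, assuming the zoo-entrance pods carry the label app: zoo-entrance and listen on the standard client port 2181 (both are assumptions; check your zoo-entrance deployment):

apiVersion: v1
kind: Service
metadata:
  name: zoo-entrance-lb
spec:
  type: LoadBalancer
  ports:
    - name: zk-client
      port: 2181
      protocol: TCP
      targetPort: 2181
  selector:
    app: zoo-entrance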
- Deploy the zk-exporter
- There is no official zk-exporter operator; we use github.com/dabealu/zoo…
- In the zk-exporter.yaml file in the folder, we only need to modify the ZK address being monitored (spec.container.args)
- Run kubectl apply -f zk-exporter.yaml to complete the deployment
- Deploy kafka-jmx
- Because ingress does not support TCP connections and LoadBalancer is expensive, Kafka's JMX is exposed via NodePort
- NodePort services can be created on the Alibaba Cloud console, or with the kafka-jmx.yaml file
apiVersion: v1
kind: Service
metadata:
  labels:
    strimzi.io/cluster: my-cluster
    strimzi.io/name: my-cluster-kafka-jmx
  name: my-cluster-kafka-jmx-0
spec:
  ports:
    - name: kafka-jmx-nodeport
      port: 9999
      protocol: TCP
      targetPort: 9999
  selector:
    statefulset.kubernetes.io/pod-name: my-cluster-kafka-0
    strimzi.io/cluster: my-cluster
    strimzi.io/kind: Kafka
    strimzi.io/name: my-cluster-kafka
  type: NodePort
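Note that the selector pins a single pod (my-cluster-kafka-0), so every broker needs its own service. A sketch of the same service for broker 1, with the names adjusted accordingly:

apiVersion: v1
kind: Service
metadata:
  labels:
    strimzi.io/cluster: my-cluster
    strimzi.io/name: my-cluster-kafka-jmx
  name: my-cluster-kafka-jmx-1
spec:
  ports:
    - name: kafka-jmx-nodeport
      port: 9999
      protocol: TCP
      targetPort: 9999
  selector:
    statefulset.kubernetes.io/pod-name: my-cluster-kafka-1
    strimzi.io/cluster: my-cluster
    strimzi.io/kind: Kafka
    strimzi.io/name: my-cluster-kafka
  type: NodePort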
- Deploy the kafka-exporter-service
- The Kafka Exporter is enabled in our Kafka configuration, but after startup a service is not generated for it automatically, so we deploy a service ourselves to make the Prometheus connection easier
- See the kafka-exporter-service.yaml file in the folder
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kafka-export-service
  name: my-cluster-kafka-exporter-service
spec:
  ports:
    - port: 9404
      protocol: TCP
      targetPort: 9404
  selector:
    strimzi.io/cluster: my-cluster
    strimzi.io/kind: Kafka
    strimzi.io/name: my-cluster-kafka-exporter
  type: ClusterIP
- Run kubectl apply -f kafka-exporter-service.yaml to complete the deployment; a sample Prometheus scrape config follows
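With this ClusterIP service in place, Prometheus can scrape the exporter through a static target. A minimal sketch of the scrape config (the job name is an assumption):

scrape_configs:
  - job_name: kafka-exporter
    static_configs:
      - targets:
          - my-cluster-kafka-exporter-service:9404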
- Deploy kafka-prometheus
- If Prometheus were deployed outside the K8s cluster, data collection would be cumbersome, so we deploy Prometheus directly inside the cluster
- In the kafka-prometheus.yaml file, you can modify the Prometheus configuration as needed, such as the required memory and CPU, the monitoring-data retention time, the external cloud-disk size, and the Kafka and ZK addresses to scrape
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka-prometheus
  labels:
    app: kafka-prometheus
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kafka-prometheus
  serviceName: kafka-prometheus
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: kafka-prometheus
    spec:
      containers:
        - args:
            - '--query.max-concurrency=800'
            - '--query.max-samples=800000000'
            ***
          command:
            - /bin/prometheus
          image: 'repository.poizon.com/prometheus/prometheus:v2.28.1'
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 10
            httpGet:
              path: /status
              port: web
              scheme: HTTP
            initialDelaySeconds: 300
            periodSeconds: 5
            successThreshold: 1
            timeoutSeconds: 3
          name: kafka-prometheus
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
            requests:
              cpu: 200m
              memory: 128Mi
          volumeMounts:
            - mountPath: /etc/localtime
              name: volume-localtime
            - mountPath: /data/prometheus/
              name: kafka-prometheus-config
            - mountPath: /data/database/prometheus
              name: kafka-prometheus-db
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      terminationGracePeriodSeconds: 30
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 0
      volumes:
        - hostPath:
            path: /etc/localtime
            type: ''
          name: volume-localtime
        - configMap:
            defaultMode: 420
            name: kafka-prometheus-config
          name: kafka-prometheus-config
  volumeClaimTemplates:
    - apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: kafka-prometheus-db
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        storageClassName: rocketmq-storage
        volumeMode: Filesystem
      status:
        phase: Pending
- Run kubectl apply -f kafka-prometheus.yaml to complete the deployment
- After deployment, expose Prometheus to the monitoring team's Grafana. Connect to the pod IP first to verify, then in the K8s console select Network –> Route –> Create, create an ingress that points at the Prometheus service (a sketch follows), and apply to the operations team for a domain name. Done.
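A sketch of such an ingress, assuming a ClusterIP service named kafka-prometheus exposing the default Prometheus port 9090 and a placeholder host (the real domain comes from the operations team):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kafka-prometheus-ingress
spec:
  rules:
    - host: kafka-prometheus.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kafka-prometheus
                port:
                  number: 9090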
Conclusion
- Advantages
  - Rapid cluster deployment (minutes), rapid scale-out (seconds), and rapid disaster recovery (seconds)
  - Support for rolling updates, and for backup and restore
- Disadvantages
  - Introducing more components increases complexity
  - Access from outside the K8s cluster is not very convenient
Text / ZUOQI
Follow us and stay on top of the trendiest technology!