Hello everyone, I’m Zhang Jintao.

A friend in the group asked me the question above: how do you measure how long a rolling upgrade takes?

This question can be abstracted into a common requirement that shows up in several scenarios:

  • If you are a Kubernetes cluster administrator, you may want to measure how long such a process takes so you can find points to optimize;
  • If you are doing CI/CD, you may want to measure how long each stage of the pipeline takes.

Existing solutions

Kubernetes already provides a convenient way to approach this, which I mentioned in my reply: events.

For example, let's create a Deployment in Kubernetes and look at the events generated along the way:

➜  ~ kubectl create ns moelove
namespace/moelove created
➜  ~ kubectl -n moelove create deployment redis --image=ghcr.io/moelove/redis:alpine
deployment.apps/redis created
➜  ~ kubectl -n moelove get deploy
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
redis   1/1     1            1           16s
➜  ~ kubectl -n moelove get events
LAST SEEN   TYPE     REASON              OBJECT                        MESSAGE
27s         Normal   Scheduled           pod/redis-687967dbc5-gsz5n    Successfully assigned moelove/redis-687967dbc5-gsz5n to kind-control-plane
27s         Normal   Pulled              pod/redis-687967dbc5-gsz5n    Container image "ghcr.io/moelove/redis:alpine" already present on machine
27s         Normal   Created             pod/redis-687967dbc5-gsz5n    Created container redis
27s         Normal   Started             pod/redis-687967dbc5-gsz5n    Started container redis
27s         Normal   SuccessfulCreate    replicaset/redis-687967dbc5   Created pod: redis-687967dbc5-gsz5n
27s         Normal   ScalingReplicaSet   deployment/redis              Scaled up replica set redis-687967dbc5 to 1
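If you only need rough timings from the command line, the event timestamps can also be pulled out in a more machine-friendly form. A small sketch, using the standard core/v1 Event fields (adjust the columns to your needs):

# Sort events by time and keep only the fields useful for timing.
kubectl -n moelove get events --sort-by='.lastTimestamp' \
  -o custom-columns=TIME:.lastTimestamp,REASON:.reason,OBJECT:.involvedObject.name,MESSAGE:.message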

You can see that the events we care about are already recorded. But running kubectl and reading through the output by hand every time is tedious and time-consuming.

One approach I used before was to write a program that continuously watches and collects events in the Kubernetes cluster and writes them into a system I built for storage and visualization.
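Even without writing a full collector, the basic idea of "watch events and persist them somewhere" can be sketched from the command line. This is only an illustration, appending raw JSON to a local file instead of a real storage backend:

# Continuously watch events in all namespaces and append them as JSON to a file.
# A stand-in for a proper collector + storage pipeline.
kubectl get events -A --watch -o json >> events.json

This kind of approach works, but it requires extra development (or ad-hoc scripting) and is not very general. Here is a better solution.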

A more elegant solution

Each of these events in Kubernetes corresponds to one of our operations. For example, above we created a Deployment, which generated several events: Scheduled, Pulled, Created, Started, and so on. If we abstract this a little, doesn't it look a lot like tracing?

Here we will use Jaeger, a CNCF graduated project that I have introduced many times in previous issues of the "K8S Ecology Weekly". Jaeger is an open source, end-to-end distributed tracing system. This article is not about Jaeger itself, so just follow its documentation to quickly deploy an instance. We will also use OpenTelemetry, a CNCF sandbox project that provides an observability framework for cloud-native software, together with Jaeger. Since neither project is the focus of this article, I won't go into them here.
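If you just want something to test against, one quick way to get a Jaeger instance is the all-in-one image. This is only a sketch for a test cluster; the setup used later in this article also runs an OpenTelemetry Collector and agent, as the kubectl get all output below shows:

# Run Jaeger all-in-one (collector, storage and query UI in a single pod) for testing.
kubectl create deployment jaeger --image=jaegertracing/all-in-one:latest
# Forward the query UI locally to browse traces at http://localhost:16686
kubectl port-forward deploy/jaeger 16686:16686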

Next comes the main project used in this article: kspan, an open source project from Weaveworks. Its core idea is to turn Kubernetes events into spans in a tracing system.

Deploy kspan

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kspan
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  creationTimestamp: null
  name: kspan-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kspan
  namespace: default
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: kspan
  name: kspan
spec:
  containers:
  - image: docker.io/weaveworks/kspan:v0.0
    name: kspan
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  serviceAccountName: kspan


You can apply this YAML directly for testing, but note that it is not meant for production use: at minimum, the RBAC permissions should be narrowed down (it binds the cluster-admin role here for simplicity).
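Assuming you save the manifest above as kspan.yaml (the file name is just for illustration), deploying and checking it looks roughly like this:

# Apply the kspan manifest (ServiceAccount, ClusterRoleBinding and Pod) into the default namespace.
kubectl apply -f kspan.yaml
# Confirm the pod is running and follow its logs to see spans being exported.
kubectl get pod kspan
kubectl logs -f kspan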

By default, kspan sends spans to otlp-collector.default:55680, so you need to make sure that a Service with this name exists and points at your collector.
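If your OpenTelemetry Collector runs under a different name, one way to satisfy this requirement, assuming a Deployment named otel-collector in the default namespace that already listens on 55680 (adjust names and ports to your setup), is to expose it under the expected Service name:

# Create a Service called otlp-collector in front of the otel-collector Deployment,
# so that kspan can reach it at otlp-collector.default:55680.
kubectl expose deployment otel-collector --name otlp-collector --port 55680

Once all of the above is deployed, the cluster looks roughly like this: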

➜  ~ kubectl get all
NAME                                  READY   STATUS    RESTARTS   AGE
pod/jaeger-76c84457fb-89s5v           1/1     Running   0          64m
pod/kspan                             1/1     Running   0          35m
pod/otel-agent-sqlk6                  1/1     Running   0          59m
pod/otel-collector-69985cc444-bjb92   1/1     Running   0          56m

NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                          AGE
service/jaeger-collector   ClusterIP   10.96.47.12    <none>        14250/TCP                                        60m
service/kubernetes         ClusterIP   10.96.0.1      <none>        443/TCP                                          39h
service/otel-collector     ClusterIP   10.96.231.43   <none>        4317/TCP,14250/TCP,14268/TCP,9411/TCP,8888/TCP   59m
service/otlp-collector     ClusterIP   10.96.79.181   <none>        55680/TCP                                        52m

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/otel-agent   1         1         1       1            1           <none>          59m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jaeger           1/1     1            1           73m
deployment.apps/otel-collector   1/1     1            1           59m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/jaeger-6f77c67c44           0         0         0       73m
replicaset.apps/jaeger-76c84457fb           1         1         1       64m
replicaset.apps/otel-collector-69985cc444   1         1         1       59m

Hands-on practice

Here we create a namespace to test:

➜  ~ kubectl create ns moelove
namespace/moelove created

Then create a Deployment:

➜  ~ kubectl -n moelove create deployment redis --image=ghcr.io/moelove/redis:alpine
deployment.apps/redis created
➜  ~ kubectl -n moelove get pods
NAME                     READY   STATUS    RESTARTS   AGE
redis-687967dbc5-xj2zs   1/1     Running   0          10s

Check it out on Jaeger:

[Screenshot: the trace for this Deployment as shown in the Jaeger UI]

As you can see, the events related to creating this Deployment are grouped into a single trace, and the timeline shows details such as how long each step took.
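Coming back to the original question, a rolling upgrade can be measured the same way: trigger a rollout, and the events it produces should show up in Jaeger as a trace too. For example, a rolling restart (purely as an illustration; updating the image with kubectl set image works just as well):

# Trigger a rolling restart of the Deployment and wait until it completes;
# the events generated along the way are collected by kspan as spans.
kubectl -n moelove rollout restart deployment/redis
kubectl -n moelove rollout status deployment/redis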

Conclusion

This article showed how to collect Kubernetes events and organize them as traces in Jaeger, making it much easier to see where time is spent in the cluster, find directions for optimization, and measure the results.


Please feel free to subscribe to my official account [MoeLove]