I recently ran into an interesting scenario: the business side had a container that only ran asynchronous tasks. Its logic was simple: fetch content from upstream services and process the data. The application itself, however, provided no way to determine its current state. So how do we manage probes in Kubernetes when such a service hangs? Many of us have used liveness probes, readiness probes, and startup probes, but they all share one prerequisite: the application itself must expose an HTTP/TCP endpoint or a command that can be run to judge whether the service is currently healthy. Without any of those, the only way left to judge the container's health is to capture its console log output.

Don't ask me why the business application blocks; it's a long story. And don't ask me whether relying on a container's printed logs is unreliable. What development team that cared about stability wouldn't have written a proper probe in the first place?

As the platform team, we are the ones left holding the bag: whenever the application blocked, most of the time we just killed the problematic Pod in K8S so it would restart. After accumulating plenty of experience deleting containers by hand, we decided to leave this chore to K8S itself.

First, how do you capture your own console logs from inside the container

When a K8S cluster is deployed, a default Service named kubernetes exists in the default namespace. Its main role is to give in-cluster containers a stable address for the K8S API, so from inside a container we can reach the API at https://kubernetes.default.svc.cluster.local. To retrieve a container's console logs, we use the following endpoint:

https://kubernetes.default.svc/api/v1/namespaces/$NAMESPACE/pods/$HOSTNAME/log

Here we need to pass the Pod's namespace into the container as an environment variable via the downward API (the Pod name is already available in the container as $HOSTNAME):

containers:
- env:
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.namespace

Let's try the request and see what we get:
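A first attempt from inside the container, with no credentials, might look something like this (a sketch; -k skips TLS verification since the cluster CA is not in the container's default trust store):

curl -s -k "https://kubernetes.default.svc/api/v1/namespaces/${NAMESPACE}/pods/${HOSTNAME}/log"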

forbidden: User system:anonymous

It seems RBAC-related permissions have to be added. We can create a dedicated ServiceAccount for our application as follows:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: log-capture
  labels:
    app.kubernetes.io/name: app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app.kubernetes.io/name: app
  name: log-capture
rules:
- apiGroups: [""]
  resources: ["pods","pods/log"]
  verbs: ["get","list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    app.kubernetes.io/name: app
  name: log-capture
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: log-capture
subjects:
- kind: ServiceAccount
  name: log-capture
  namespace: default # required for ServiceAccount subjects; adjust to your namespace
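Before wiring this into the workload, we can sanity-check the binding with kubectl impersonation (assuming everything was applied to the default namespace):

kubectl auth can-i get pods/log \
    --as=system:serviceaccount:default:log-capture -n default
# "yes" means the Role and RoleBinding took effect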

Add the SA to the workload, for example:

spec:
  containers: [] # your containers go here
  serviceAccount: log-capture     # deprecated alias, kept in sync with the field below
  serviceAccountName: log-capture

If you are the unrestrained type, you can also just bind the Role to the default ServiceAccount instead.

In addition, the request needs to carry authentication. By default, the container's ServiceAccount token is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. Let's try the request again:
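Something along these lines, reading the mounted token and passing it as a bearer token (same sketch caveats as before):

TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -s -k -H "Authorization: Bearer ${TOKEN}" \
    "https://kubernetes.default.svc/api/v1/namespaces/${NAMESPACE}/pods/${HOSTNAME}/log"

This time the log lines come back instead of a forbidden error.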

Ok, now we can capture our logs inside the container.

Second, tying the K8S probe to the console log

You may have already spotted the problem: simply calling the log endpoint cannot tell us whether the application is blocked, because the container's console log is persisted on the node, so a check of "does the log have any output" will always return true.

So we add the sinceSeconds= parameter to the log endpoint, which limits the response to logs from the recent past.
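For example, with a hypothetical 60-second window:

curl -s -k -H "Authorization: Bearer ${TOKEN}" \
    "https://kubernetes.default.svc/api/v1/namespaces/${NAMESPACE}/pods/${HOSTNAME}/log?sinceSeconds=60"

An empty response now means the application printed nothing in the last minute.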

All we need to do is implement this logic in a shell script, bake it into the image or mount it from a ConfigMap, and run it as a liveness probe. Here Xiaobai also added a probe-failure counter to the script, so each consecutive failure widens the log-capture window; you can refer to the following:

#!/bin/bash

# Track consecutive probe failures in a state file; each failure widens
# the look-back window for the next log request.
if [[ -f /tmp/ProbeFailedTimes ]]; then
    COUNT=$(cat /tmp/ProbeFailedTimes)
    # Window grows as a triangular series: (1+COUNT)*COUNT/2 minutes
    Probe_Seconds=$(( (1 + COUNT) * COUNT / 2 * 60 ))
else
    COUNT="0"
    Probe_Seconds="60"
fi

# Fetch only the logs emitted in the last ${Probe_Seconds} seconds.
STDOUT=$(timeout 10 curl -s -k \
    -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
    "https://kubernetes.default.svc/api/v1/namespaces/${NAMESPACE}/pods/${HOSTNAME}/log?sinceSeconds=${Probe_Seconds}")

if [[ "$STDOUT" == "" ]]; then
    # No output in the window: record the failure and fail the probe.
    COUNT=$(( COUNT + 1 ))
    echo "$COUNT" > /tmp/ProbeFailedTimes
    echo "Now $(cat /tmp/ProbeFailedTimes)"
    exit 1
else
    # Fresh output seen: reset the counter and pass the probe.
    rm -f /tmp/ProbeFailedTimes
    echo "Now 0"
    exit 0
fi
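If you would rather not bake probe.sh into the image, mounting it from a ConfigMap works too. A minimal sketch (resource names here are illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: probe-script
data:
  probe.sh: |
    #!/bin/bash
    # ... the probe script above ...

Then mount it into the Pod with execute permission:

volumes:
- name: probe-script
  configMap:
    name: probe-script
    defaultMode: 0755
containers:
- volumeMounts:
  - name: probe-script
    mountPath: /probe.sh
    subPath: probe.sh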

Then we add the liveness probe to the workload:

containers:
- livenessProbe:
    exec:
      command:
      - /bin/bash
      - -c
      - /probe.sh
    failureThreshold: 15
    initialDelaySeconds: 120
    periodSeconds: 60
    successThreshold: 1
    timeoutSeconds: 10

This way, the liveness probe checks the container log every 60 seconds. On a failed check the look-back window starts at 60 seconds and widens with each consecutive failure (60s, then 180s, 360s, and so on), reaching 105 minutes by the 15th check. If the application still has not printed anything within that 105-minute window, the probe finally exceeds its failure threshold and K8S restarts the Pod. The moment a log line shows up during any check, the counter resets.
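To sanity-check the window arithmetic from the probe script (COUNT=14 corresponds to the 15th check):

for COUNT in 1 2 3 14; do
    echo "COUNT=${COUNT} -> $(( (1 + COUNT) * COUNT / 2 * 60 ))s"
done
# COUNT=1 -> 60s, COUNT=2 -> 180s, COUNT=3 -> 360s, COUNT=14 -> 6300s (105 minutes)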

Why use a counter? It lets us flexibly widen the time range of the log request, so a probe that happens to land in the gap between two widely spaced log lines does not fail the container.

Finally, don't ask me what happens if your app doesn't even print a container log. To that I can only say: good luck 😂


Follow the public account "Cloud Native Xiaobai" for more content.