We recently ran into an interesting scenario: a business team had a container that only ran asynchronous tasks. Its logic was simple — fetch content from upstream services and process the data — but the application itself offered no way to determine its current state. So how do we implement probe management in Kubernetes when such a service blocks? Many of you have used liveness, readiness, and startup probes, but they all share one prerequisite: the application must expose an HTTP/TCP endpoint or a command that can be used to judge whether the service is healthy. Without any of those, the only signal left for judging the container's health is its console log output.
Don’t ask me why the business application blocks; it’s a long story. And don’t ask me whether relying on a container’s printed logs is unreliable. If the developers could write a stable application, wouldn’t they have written a probe too?
As the platform team, we end up holding the bag: when the application blocked, most of the time we fixed it by killing the problem pod in K8s so it would restart. Over time, our rich experience deleting containers convinced us to leave this chore to K8s itself.
First, how to capture your own console logs from inside the container
When a K8s cluster is deployed, there is a default Service named kubernetes in the default namespace. Its main role is to give in-cluster containers an address for reaching the K8s API. From inside a container we can access the API at https://kubernetes.default.svc.cluster.local. To retrieve a container's console logs, use the following interface:
https://kubernetes.default.svc/api/v1/namespaces/$NAMESPACE/pods/$HOSTNAME/log
Here we need to inject the container's namespace metadata as an environment variable, like this:
containers:
- env:
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.namespace
Let’s see what happens when we call it:
forbidden: User system:anonymous
It seems the relevant RBAC permissions still have to be added. We can create a dedicated ServiceAccount for our application, as follows:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: log-capture
  labels:
    app.kubernetes.io/name: app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app.kubernetes.io/name: app
  name: log-capture
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    app.kubernetes.io/name: app
  name: log-capture
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: log-capture
subjects:
- kind: ServiceAccount
  name: log-capture
Then add the ServiceAccount to the workload, for example:
spec:
  containers: {}
  serviceAccount: log-capture
  serviceAccountName: log-capture
If you are feeling reckless, you can also just bind the Role to the default ServiceAccount.
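As a sketch, binding to the default ServiceAccount only changes the subjects of the RoleBinding above:

```yaml
# Variant: bind the same Role to the namespace's default ServiceAccount
# instead of the dedicated log-capture one (spares you the SA fields in
# the workload spec, at the cost of granting the permission more broadly).
subjects:
- kind: ServiceAccount
  name: default
```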
In addition, the request needs to carry our own authentication information. By default, the container's ServiceAccount token is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. Let's try the request again.
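As a sketch, the authenticated request can be composed like this — the helper function name is mine, not from the original, and the NAMESPACE variable comes from the Downward API snippet above:

```shell
#!/bin/bash
# Hypothetical helper: compose the pod-log API URL for the in-cluster request.
build_log_url() {
    local ns="$1" pod="$2"
    echo "https://kubernetes.default.svc/api/v1/namespaces/${ns}/pods/${pod}/log"
}

# Inside the container we would then call (requires the RBAC above;
# -k skips TLS verification, as in the probe script later on):
#   curl -sk \
#     -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
#     "$(build_log_url "$NAMESPACE" "$HOSTNAME")"
```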
Ok, now we can capture our logs inside the container.
Secondly, wiring the K8s probe to the console log
You may have spotted the problem: simply calling the log interface cannot tell us whether the application is blocked, because the container's console log is persisted on the node — checking whether the log has any output at all will always return true.
The fix is to add the sinceSeconds= parameter to the log interface, so that we only capture logs from the recent past.
All we need to do is implement this logic in a shell script, ship it in the image or via a ConfigMap, and use it as a liveness probe. Here Xiaobai also added a probe-failure counter to the script, which progressively widens the log-capture window on repeated failures. You can refer to the following:
#!/bin/bash
# Liveness probe: fail if the container produced no console output recently.
# On each consecutive failure, widen the log window (triangular backoff).
if [[ -f /tmp/ProbeFailedTimes ]]; then
    COUNT=$(cat /tmp/ProbeFailedTimes)
    let Probe_Seconds=(1+${COUNT})*${COUNT}/2*60
else
    COUNT="0"
    Probe_Seconds="60"
fi
# Ask the K8s API for this pod's logs within the window, authenticated
# with the mounted ServiceAccount token.
STDOUT=$(timeout 10 curl -s -k \
    -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
    "https://kubernetes.default.svc/api/v1/namespaces/${NAMESPACE}/pods/${HOSTNAME}/log?sinceSeconds=${Probe_Seconds}")
if [[ "$STDOUT" == "" ]]; then
    # No output in the window: bump the failure counter and fail the probe.
    let COUNT=${COUNT}+1
    echo "$COUNT" > /tmp/ProbeFailedTimes
    echo "Now $(cat /tmp/ProbeFailedTimes)"
    exit 1
else
    # Logs were printed: reset the counter and pass.
    rm -f /tmp/ProbeFailedTimes
    echo "Now 0"
    exit 0
fi
Then we add the liveness probe to the workload:
containers:
- livenessProbe:
    exec:
      command:
      - /bin/bash
      - -c
      - /probe.sh
    failureThreshold: 15
    initialDelaySeconds: 120
    periodSeconds: 60
    successThreshold: 1
    timeoutSeconds: 10
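If you go the ConfigMap route mentioned earlier, one way to get the script to /probe.sh is a sketch like this — the ConfigMap and volume names are illustrative, not from the original:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: probe-script
data:
  probe.sh: |
    #!/bin/bash
    # ... the probe script shown above ...
---
# Then in the workload's pod spec:
spec:
  volumes:
  - name: probe-script
    configMap:
      name: probe-script
      defaultMode: 0755
  containers:
  - volumeMounts:
    - name: probe-script
      mountPath: /probe.sh
      subPath: probe.sh
```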
In this way, the liveness probe checks the container logs every 60 seconds. On each consecutive failure the counter widens the query window: 60s, then 60s, 180s, 360s, and so on, until the 15th check, which looks back 105 minutes. If there is still no log output in that window, the probe fails for the 15th time and K8s restarts the pod. As soon as a log line shows up during a check, the counter is reset.
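The window arithmetic above can be sanity-checked with a small sketch of the formula from probe.sh (the function name is mine):

```shell
#!/bin/bash
# window_seconds N: log window used on the probe attempt after N consecutive
# failures, mirroring probe.sh: (1+N)*N/2 * 60 seconds, with a 60s floor.
window_seconds() {
    local count="$1"
    if [[ "$count" -eq 0 ]]; then
        echo 60
    else
        echo $(( (1 + count) * count / 2 * 60 ))
    fi
}

# After 14 failures the 15th check looks back (1+14)*14/2 * 60 = 6300s,
# i.e. 105 minutes — matching the failureThreshold: 15 in the probe spec.
```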
Why the counter? It flexibly adjusts the time range of the log query, so the probe doesn't fail simply because it fired in the gap between two log prints.
Finally, don't ask me what happens if the application doesn't even print a container log. All I can say is: well, good luck 😂
Follow the WeChat public account "Cloud Native Xiaobai" for more content.