Preface

Recently we ran into timeout exceptions caused by CPU saturation during peak service hours. Monitoring showed that some Pods on a Node were suddenly consuming a large amount of CPU.

Q: Isn't there a CPU limit in place? Wouldn't limiting CPU usage solve this? A: In fact this problem cannot be solved at the root. The container engine we use is Docker, which is built on cgroups, and cgroup CPU isolation has a long-standing weakness: when the problem occurs there is no way to stop the offending process instantly, so a short spike still gets through. The symptom looks like this: the CPU limit is 2 cores, but a burst may briefly hit 4, 5, or 6 cores; the container is then killed, and K8S tries to rebuild it.

So how can it be solved?

  1. Use a container engine with better isolation, such as Kata Containers (VM-level isolation).
  2. Optimize the offending applications.

Plan 1

We know that plan 1 is relatively thorough and only needs to be applied once, globally. However, the technology is fairly new and we are not sure whether it will introduce other problems, so we plan to trial Kata Containers on a few nodes later.

Plan 2

Plan 2 demands a lot from application developers and needs targeted intervention by the developers concerned, but its short-term payoff is high, so we implemented it first.

How to implement it?

We know that while a program is running, unless there is a very serious bug, a CPU spike is usually very short. By the time a human gets in to capture a thread dump it is basically too late, and doing it by hand is expensive anyway. What we wanted was a program that automatically grabs the thread stacks once CPU reaches a certain threshold, so that we can optimize afterwards. In addition, only one run should be allowed within a given period, to keep repeated captures from making the application unavailable.
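
To make that "one run per period" rule concrete, here is a minimal sketch of a cooldown guard in Go. It only illustrates the idea and is not this project's actual code; the cooldown type and allow method are names invented for the example.

    package limiter

    import (
        "sync"
        "time"
    )

    // cooldown remembers when each target was last captured and refuses
    // to run again before the configured interval has elapsed.
    type cooldown struct {
        mu       sync.Mutex
        interval time.Duration
        last     map[string]time.Time
    }

    func newCooldown(interval time.Duration) *cooldown {
        return &cooldown{interval: interval, last: make(map[string]time.Time)}
    }

    // allow reports whether a capture for the given target (e.g. a Pod
    // name) may run now, recording the attempt when it may.
    func (c *cooldown) allow(target string) bool {
        c.mu.Lock()
        defer c.mu.Unlock()
        if last, ok := c.last[target]; ok && time.Since(last) < c.interval {
            return false
        }
        c.last[target] = time.Now()
        return true
    }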

In the end we realized this maps almost exactly onto the alerting mechanism of Grafana and Prometheus: all we had to do was receive the alert webhook and fetch the thread stack from the corresponding container.

So we took Grafana and wrote a program to do exactly that.
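
For a sense of what such a program has to do, here is a minimal sketch of receiving the alert webhook, based on the payload shape of Grafana's legacy alerting (title, state, evalMatches). The struct layout and the :8080 listen address are assumptions for the example, not the project's actual code.

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    // evalMatch mirrors one entry of the "evalMatches" array in the
    // webhook payload sent by Grafana's legacy alerting.
    type evalMatch struct {
        Value  float64           `json:"value"`
        Metric string            `json:"metric"`
        Tags   map[string]string `json:"tags"`
    }

    type alertPayload struct {
        Title       string      `json:"title"`
        State       string      `json:"state"` // "alerting", "ok", ...
        EvalMatches []evalMatch `json:"evalMatches"`
    }

    func hooks(w http.ResponseWriter, r *http.Request) {
        var p alertPayload
        if err := json.NewDecoder(r.Body).Decode(&p); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        if p.State != "alerting" {
            return // only firing alerts should trigger a capture
        }
        for _, m := range p.EvalMatches {
            // With the legend "{{node}} - {{namespace}} - {{pod}} - {{container}}",
            // m.Metric tells us exactly which container to grab a stack from.
            log.Printf("high CPU on %s (value=%.2f)", m.Metric, m.Value)
        }
    }

    func main() {
        http.HandleFunc("/hooks", hooks)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }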

Project information

Development languages: Go, Shell
Project address: https://github.com/majian159/k8s-java-debug-daemon

k8s-java-debug-daemon

It combines Grafana's alerting mechanism with Alibaba's Arthas to capture the stacks of threads with high CPU usage. The overall process is as follows:

  1. Add a webhook-type alert notification channel in Grafana, pointing its URL at this program (the default hooks path is /hooks).
  2. Configure a Grafana graph panel and set its alert threshold.
  3. When the webhook is triggered, the program automatically copies the craw.sh script into the container of the corresponding Pod and executes it.
  4. The program saves the script's stdout to a local file (steps 3 and 4 are sketched below).
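
Steps 3 and 4 amount to an exec against the target container with stdout redirected to a local file. A minimal sketch using client-go's remotecommand package follows; the grabStack function and the /tmp/craw.sh path are illustrative assumptions, not the repository's actual code.

    package capture

    import (
        "os"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/kubernetes/scheme"
        "k8s.io/client-go/rest"
        "k8s.io/client-go/tools/remotecommand"
    )

    // grabStack runs the capture script inside the target container and
    // streams whatever it prints on stdout into a local file.
    func grabStack(cfg *rest.Config, cs *kubernetes.Clientset,
        namespace, pod, container, outFile string) error {

        req := cs.CoreV1().RESTClient().Post().
            Resource("pods").Name(pod).Namespace(namespace).
            SubResource("exec").
            VersionedParams(&corev1.PodExecOptions{
                Container: container,
                // Assumes craw.sh was already copied into the container.
                Command: []string{"sh", "/tmp/craw.sh"},
                Stdout:  true,
                Stderr:  true,
            }, scheme.ParameterCodec)

        exec, err := remotecommand.NewSPDYExecutor(cfg, "POST", req.URL())
        if err != nil {
            return err
        }

        f, err := os.Create(outFile)
        if err != nil {
            return err
        }
        defer f.Close()

        // The script's stdout goes straight into the local stack file.
        return exec.Stream(remotecommand.StreamOptions{
            Stdout: f,
            Stderr: os.Stderr,
        })
    }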

Result preview

Default behavior

  • The number of simultaneous executions per node is 10 (a limiter of this kind is sketched after this list). This can be changed in ./internal/defaultvalue.go:

        var defaultNodeLockManager = nodelock.NewLockManager(10)

  • The in-cluster Master configuration is used by default. This can be changed in ./internal/defaultvalue.go:

        func DefaultKubernetesClient() {}

        // default
        func getConfigByInCluster() {}

        func getConfigByOutOfCluster() {}

  • A local file-based stack store is used by default, writing to the stacks directory under the working path. This can be changed in ./internal/defaultvalue.go:

        func GetDefaultNodeLockManager() {}

  • By default, the stack information of the 50 busiest threads is captured (can be changed in craw.sh).
  • The sample collection time is 2 seconds (can be changed in craw.sh).
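
The nodelock package itself is not shown in the post; a counting-semaphore shape like the following would satisfy the description above. This is a guess for illustration, not the repository's implementation (only nodelock.NewLockManager appears in the source; TryLock and Unlock are assumed names).

    package nodelock

    import "sync"

    // LockManager caps how many captures may run concurrently on each node.
    type LockManager struct {
        mu    sync.Mutex
        limit int
        inUse map[string]int // node name -> captures currently running
    }

    func NewLockManager(limit int) *LockManager {
        return &LockManager{limit: limit, inUse: make(map[string]int)}
    }

    // TryLock reserves a slot on the node, reporting false when the node
    // is already at its limit.
    func (m *LockManager) TryLock(node string) bool {
        m.mu.Lock()
        defer m.mu.Unlock()
        if m.inUse[node] >= m.limit {
            return false
        }
        m.inUse[node]++
        return true
    }

    // Unlock releases a slot previously reserved with TryLock.
    func (m *LockManager) Unlock(node string) {
        m.mu.Lock()
        defer m.mu.Unlock()
        if m.inUse[node] > 0 {
            m.inUse[node]--
        }
    }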

How to use

Docker Image

majian159/java-debug-daemon

Create a new notification channel for Grafana


Points to note

  1. Send reminders needs to be enabled; otherwise Grafana by default notifies only once, when the alert triggers, and stays silent while it remains unresolved.
  2. Reminder every controls how often the notification recurs, and therefore how often a new capture can be triggered.

Create a new alert graph in Grafana

If you like, you can import the following panel configuration directly and adapt it yourself:

    {
      "datasource": "prometheus",
      "alert": {
        "alertRuleTags": {},
        "conditions": [
          {
            "evaluator": { "params": [1], "type": "gt" },
            "operator": { "type": "and" },
            "query": { "params": ["A", "5m", "now"] },
            "reducer": { "params": [], "type": "last" },
            "type": "query"
          }
        ],
        "executionErrorState": "keep_state",
        "for": "10s",
        "frequency": "30s",
        "handler": 1,
        "name": "Pod high CPU stack grab",
        "noDataState": "no_data",
        "notifications": [{ "uid": "AGOJRCqWz" }]
      },
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "fill": 1,
      "fillGradient": 0,
      "gridPos": { "h": 9, "w": 24, "x": 0, "y": 2 },
      "hiddenSeries": false,
      "id": 14,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": false,
        "rightSide": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": { "dataLinks": [] },
      "percentage": false,
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", image!=\"\", container!=\"POD\"} * on (namespace, pod) group_left(node) max by(namespace, pod, node, container) (kube_pod_info)",
          "legendFormat": "{{node}} - {{namespace}} - {{pod}} - {{container}}",
          "refId": "A"
        }
      ],
      "thresholds": [
        { "colorMode": "critical", "fill": true, "line": true, "op": "gt", "value": 1 }
      ],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Pod CPU",
      "tooltip": { "shared": true, "sort": 0, "value_type": "individual" },
      "type": "graph",
      "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] },
      "yaxes": [
        { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true },
        { "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }
      ],
      "yaxis": { "align": false, "alignLevel": null }
    }

Queries configuration

Fill in the following as the Metrics expression (it joins the cAdvisor metric with kube_pod_info so that the node name is available to the legend):

    container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", image!="", container!="POD"} * on (namespace, pod) group_left(node) max by(namespace, pod, node, container) (kube_pod_info)

Fill in the following as the Legend:

    {{node}} - {{namespace}} - {{pod}} - {{container}}


Alert configuration

IS ABOVE is the CPU usage threshold; as configured here, the alert fires when CPU usage exceeds 1 core. Adjust Evaluate every and the For (pending) time to suit your needs.


Build

Binary

    # Build for the current system platform
    make

    # GOOS options: linux darwin windows freebsd
    make GOOS=linux

Docker image

    make docker

    # Customize the image tag
    make docker IMAGE=test