Background

Profiling applications to diagnose performance problems is routine work for many backend programmers once a service is running in production, and a handy tool can make the job twice as effective.

In this regard, Go has a natural advantage: following in the footsteps of Google’s pprof C++ profiler, the go tool pprof command has been part of the toolchain since the language’s early days. The standard library also provides the runtime/pprof and net/http/pprof packages to make profiling programmable.

In non-container environments, our developers like to import net/http/pprof to expose HTTP endpoints that go tool pprof can consume for profiling:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
    // ...
    // Expose the pprof endpoints on a separate port alongside the application.
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    // ...
}

Get CPU profile data:

go tool pprof http://localhost:6060/debug/pprof/profile

However, as the architecture evolved toward microservices deployed with container technologies such as Kubernetes (K8s), this approach ran into more and more problems:

  1. Our production environment uses K8s for container orchestration and deployment, and services are exposed as type NodePort, so requests are load-balanced across pods and we cannot target a specific pod of a service for profiling. Our previous workarounds were:
    1. If the problem to diagnose affects the service as a whole, profiling through the NodePort is straightforward.
    2. If the problem is confined to a particular pod of the service, an operations engineer has to log in to that pod’s host to run profiling, which is time-consuming and inefficient.
  2. As the architecture was split into microservices, the number of services grew by an order of magnitude. The old approach of diagnosing on-site after a problem is reported increasingly fails to keep up with the growing frequency and volume of profiling needs (in many cases, the problem has already passed by the time we are ready to profile…). We urgently need a way to profile a program automatically when it misbehaves, so that as much on-site data as possible is captured.
  3. At the same time, we want this automatic profiling mechanism to have as little impact on application performance as possible, and to integrate with our existing alerting system so that diagnostic results are pushed directly to the application’s owner.

Solution design and implementation

  • We use Heapster to monitor the K8s container cluster, and the monitored time-series data is written to InfluxDB for persistence.
  • gopprof is the core service that performs performance diagnostics for the other Go services in our container environment:
    • It automatically profiles abnormal pods by analyzing the monitoring data in InfluxDB. The current policy triggers profiling when a pod’s resource utilization exceeds the threshold of 0.8 in two consecutive 1-minute analysis windows (a minimal sketch of this check appears after this list).
    • Deploying gopprof as a service inside the K8s cluster mainly lets it reach a pod’s HTTP profiling endpoint directly via the pod’s intranet IP, which makes profiling a specific pod possible:
    go tool pprof http://POD_LAN_IP:NodePort/debug/pprof/profile
    • After profiling completes, gopprof automatically generates SVG call graphs from the profiles, uploads the profile data and call graphs to cloud storage, and pushes the diagnostic results to the service owner (see the second sketch after this list for how the call graphs can be rendered):

    • Because gopprof depends on the go tool pprof and graphviz tools, the gopprof base image needs both pre-installed. Reference Dockerfile:
    # Base image contains golang env and graphviz
    FROM ubuntu:16.04
    MAINTAINER Daniel [email protected]
    RUN apt-get update
    RUN apt-get install wget -y
    RUN wget -O go.tar.gz https://dl.google.com/go/go1.9.2.linux-amd64.tar.gz && \
        tar -C /usr/local -xzf go.tar.gz && \
        rm go.tar.gz
    ENV PATH=$PATH:/usr/local/go/bin
    RUN go version
    RUN apt-get install graphviz -y
    • gopprof also provides interfaces that let developers manually profile a specific pod or a group of pods. This not only frees up our operations engineers, but also makes it much more likely that developers can capture the program state when a hard-to-reproduce problem occurs (a hypothetical sketch of such an interface follows this list).
    • In terms of high availability, only a single gopprof pod is deployed for now, and service availability relies on K8s’ automatic restarts. If the need arises later, it may be extended to rely on etcd so that multiple gopprof pods can be deployed.
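
To make the automatic trigger policy concrete, here is a minimal sketch of the threshold check, assuming the InfluxDB query layer has already produced the pod’s per-window resource utilization values; the function and variable names are our own illustration, not the actual gopprof code:

package main

import "fmt"

// utilizationThreshold mirrors the 0.8 threshold described above.
const utilizationThreshold = 0.8

// shouldProfile reports whether the last two 1-minute analysis windows
// both exceed the threshold, which is the condition that makes gopprof
// start an automatic profiling run for a pod.
func shouldProfile(windowUtilizations []float64) bool {
	if len(windowUtilizations) < 2 {
		return false
	}
	last := windowUtilizations[len(windowUtilizations)-2:]
	return last[0] > utilizationThreshold && last[1] > utilizationThreshold
}

func main() {
	// The last two windows are above 0.8, so profiling would be triggered.
	fmt.Println(shouldProfile([]float64{0.75, 0.85, 0.90}))
}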
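
For the call-graph step, here is a sketch of shelling out to go tool pprof to fetch a pod’s CPU profile and render it as SVG (graphviz must be installed, as the Dockerfile above ensures). The function name, parameters, and output path are assumptions for illustration; the upload to cloud storage and the owner notification are only marked by comments:

package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// renderCPUProfileSVG fetches a CPU profile from the pod's pprof endpoint
// and writes the rendered call graph as an SVG file under outDir.
func renderCPUProfileSVG(podURL, outDir string) (string, error) {
	svgPath := filepath.Join(outDir, "cpu.svg")
	// `go tool pprof -svg` needs graphviz to render the call graph.
	cmd := exec.Command("go", "tool", "pprof",
		"-svg", "-output", svgPath,
		podURL+"/debug/pprof/profile")
	if out, err := cmd.CombinedOutput(); err != nil {
		return "", fmt.Errorf("go tool pprof failed: %v: %s", err, out)
	}
	// Uploading svgPath and the raw profile to cloud storage, and notifying
	// the service owner, would happen here.
	return svgPath, nil
}

func main() {
	if path, err := renderCPUProfileSVG("http://10.0.0.1:6060", "/tmp"); err == nil {
		fmt.Println("call graph written to", path)
	}
}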
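
Finally, a hypothetical sketch of what the manual-trigger interface could look like; gopprof’s real endpoints and parameters are internal, so the route, query parameter, and handler below are purely illustrative assumptions:

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// e.g. POST /profile?pod=10.2.3.4:6060 asks gopprof to profile one pod;
	// a service-level parameter could fan out to every pod of a service.
	http.HandleFunc("/profile", func(w http.ResponseWriter, r *http.Request) {
		pod := r.URL.Query().Get("pod")
		if pod == "" {
			http.Error(w, "missing pod parameter", http.StatusBadRequest)
			return
		}
		// Kick off the run asynchronously; fetching the profile, rendering
		// the SVG, uploading it, and notifying the owner are omitted here.
		go func() { log.Printf("profiling %s", pod) }()
		fmt.Fprintf(w, "profiling of pod %s started\n", pod)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}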

Summary

The gopprof service has been running in production for some time now. It has met our design expectations and helped us find and fix several performance issues we were not even aware of. Because of some internal code dependencies, it is not open source for the time being, but the components the whole solution relies on are common, so it is easy to implement yourself. If you are interested in any details of our implementation, feel free to leave a comment.