Author: Wu Peng | Source: Alibaba Cloud Native official account

OpenTelemetry is a CNCF observability project. It aims to provide a standardized solution for the observability field: it standardizes the data model, collection, processing, and export of observation data, and it does so independently of any third-party vendor.

On February 10, 2021, the OpenTelemetry Tracing Spec reached version 1.0 (link). Prompted by this milestone, the author explored OpenTelemetry to assess its value and prospects in the observability field.

The author’s understanding of OpenTelemetry is given below. That understanding is limited, so please point out anything that is wrong.

What is OpenTelemetry?

The official "What is OpenTelemetry?" page states:

OpenTelemetry is a set of APIs, SDKs, tooling and integrations that are designed for the creation and management of telemetry data such as traces, metrics, and logs.

The project provides a vendor-agnostic implementation that can be configured to send telemetry data to the backend(s) of your choice. It supports a variety of popular open-source projects including Jaeger and Prometheus.

In other words, OpenTelemetry is a collection of standards and tools for managing observation data such as traces, metrics, and logs (new types of observation data may appear in the future).

OpenTelemetry provides a vendor-neutral implementation that exports observation data to whichever backends the user needs, such as the open source Prometheus and Jaeger projects or cloud vendor services.

So what is OpenTelemetry not? The official description says:

OpenTelemetry is not an observability back-end like Jaeger or Prometheus. Instead, it supports exporting data to a variety of open-source and commercial back-ends. It provides a pluggable architecture so additional technology protocols and formats can be easily added.

OpenTelemetry does not provide observability back-end services, which typically handle storage, querying, visualization, and so on.

You can understand the scope of OpenTelemetry by looking at the following abstract diagram:

What problem domain does OpenTelemetry address?

Definition of Observability:

In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.

Consider a physical system modeled in state-space representation. A system is said to be observable if, for any possible evolution of state and control vectors, the current state can be estimated using only the information from outputs (physically, this generally corresponds to information obtained by sensors). In other words, one can determine the behavior of the entire system from the system’s outputs. On the other hand, if the system is not observable, there are state trajectories that are not distinguishable by only measuring the outputs.

In simple terms, observability is the ability to infer the internal state of a system from its external outputs.

The following diagram simplifies system composition and interaction between systems:

The interaction diagram above shows that a system’s behavior takes the following forms:

  • Within a system

    • A component works in a closed loop, without interacting with other components or systems
    • Components interact with each other
  • Between systems

    • Systems interact with each other

Thus, to understand the state of the system from its external output, two forms of information are required:

  • Closed-loop information within a component
  • Information that flows between components or systems

The first kind is usually captured by logs or metrics. The second kind requires traces, which attach markers to information as it flows between components or systems.

The difference between logs and metrics can be understood in terms of how they operate.
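One common reading of that difference, offered here as an assumption rather than the author’s exact meaning, is that a metric keeps only an aggregate while a log keeps every individual event. The Go sketch below illustrates this with two hypothetical types, Counter and EventLog, built on the standard library only.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical metric: it stores only the running total.
type Counter struct {
	mu sync.Mutex
	n  int64
}

// Add increases the total by delta.
func (c *Counter) Add(delta int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n += delta
}

// Value returns the current total.
func (c *Counter) Value() int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// EventLog is a hypothetical log sink: it stores every event it is given.
type EventLog struct {
	mu    sync.Mutex
	lines []string
}

// Printf formats one event and appends it to the log.
func (l *EventLog) Printf(format string, args ...interface{}) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.lines = append(l.lines, fmt.Sprintf(format, args...))
}

// Lines returns a copy of all recorded events.
func (l *EventLog) Lines() []string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return append([]string(nil), l.lines...)
}

// handleAll simulates a component handling requests in a closed loop,
// updating the metric and writing a log line for each one.
func handleAll(paths []string, c *Counter, l *EventLog) {
	for _, p := range paths {
		c.Add(1)
		l.Printf("handled request path=%s", p)
	}
}

func main() {
	var requests Counter
	var events EventLog
	handleAll([]string{"/a", "/b", "/a"}, &requests, &events)
	fmt.Println("metric requests_total =", requests.Value())
	for _, line := range events.Lines() {
		fmt.Println("log:", line)
	}
}
```

The counter answers "how many" cheaply but cannot say which paths were hit; the log can, at the cost of storing one record per event.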

Abstracting further, observability involves the following concerns:

  • The data model of observation data
  • Collection of observation data
  • Processing of observation data
  • Export of observation data
  • Use of observation data
  • etc.

These make up the problem domain that OpenTelemetry works in. The specific problems it takes on are limited to:

  • The data model of observation data
  • Collection of observation data
  • Processing of observation data
  • Export of observation data

What is OpenTelemetry’s solution?

Through the Spec, OpenTelemetry defines the data model of observation data and how that data is collected, processed, and exported. The Spec covers trace, metrics, and logs, and new types may be added in the future. See OpenTelemetry – Specification.

For ease of use, a protobuf description is also available; see opentelemetry-proto.

Based on the Spec, OpenTelemetry does the following to support generating and processing observation data:

  • For developers, it provides language-specific SDKs, such as opentelemetry-go, opentelemetry-java, and opentelemetry-js. For all supported languages, see the official documentation
  • For collecting, processing, and exporting observation data, it provides a Collector service that is managed through configuration: opentelemetry-collector integrates with open source projects, and opentelemetry-collector-contrib integrates with third-party vendors

The following figure provides an intuitive understanding of OpenTelemetry’s components and workflow:

What is the history of OpenTelemetry?

OpenTelemetry is (so far) a merger of two open source projects:

  • OpenCensus

    • Standardizes the data models for traces and metrics, and provides tools for collection, processing, and export
  • OpenTracing

    • Standardizes the data model for traces, and provides tools for collection, processing, and export

In May 2019, the two projects merged, and the open source OpenTelemetry project was officially announced.

In February 2021, the Tracing Spec reached version 1.0. According to the official maturity model (link), the trace spec has reached the stable level, metrics has reached beta, and logs is still at alpha:

What are the prospects for OpenTelemetry?

Since OpenTelemetry launched, more and more vendors have started to pay attention and contribute.

As opentelemetry-collector-contrib shows, vendors currently focus on exporting observation data to their own services. The repository already includes support for Alibaba Cloud’s SLS product:

Vendors will probably invest more in receivers and processors over time, for example:

  • Combining a receiver with the vendor’s own exporter to form a processing workflow for observation data
  • Using processors to normalize observation data before it is stored

In multi-cloud scenarios, OpenTelemetry defines the observation data model and the standards for collection, processing, and export. This lets users connect to multiple cloud vendors through one set of observability standards and avoid vendor lock-in.

Even within a single cloud (for example, a service inside a cloud vendor), future open-sourcing and collaboration with external parties will eventually have to be considered, and using the community’s observability standards lowers the cost of open-sourcing. The concepts, standards, and technology of observability are also still evolving; following the community makes it easier to benefit from its technical progress and influence.

Therefore, adopting industry-standard observability makes sense both in multi-cloud scenarios and within a single cloud vendor.

How does OpenTelemetry work?

Core concepts

OpenTelemetry has many concepts. The common ones are listed here to make them easier to understand:

  • Related to observation data

    • Signal
      • A type of observation data, such as trace, metrics, or logs
    • Instrument
      • Can be thought of as an instance of a Signal
  • Related to the OpenTelemetry project itself

    • API
      • A formal description of the OpenTelemetry Spec, for example opentelemetry-proto
    • SDK
      • Implementations of the API for different development languages
    • Contrib Packages
      • Implementations tied to a specific open source project or vendor product
  • Related to the components in use

    • Components
      • Receivers
        • Components that receive observation data
      • Processors
        • Components that process the observation data received
      • Exporters
        • Components that export observation data, for example to the open source Prometheus project or to cloud vendor services
      • Extensions
        • Do not take part in processing observation data; they assist the processing components, for example with health checks or service discovery
      • Services
        • Declares which configured components should run, such as receivers/processors/exporters/extensions
      • Collector
        • The combination of receivers/processors/exporters/extensions/services
      • Controller
        • The receivers/processors/exporters that run inside the developer’s application
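To make the relationship between receivers, processors, exporters, and services more concrete, here is a minimal Go sketch of the data flow. It is a conceptual model only: Record, Processor, Exporter, Pipeline, and the helper functions are hypothetical names invented for this illustration and do not correspond to the real Collector code or its APIs.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical stand-in for one piece of observation data.
type Record struct {
	Signal string // "trace", "metrics", or "logs"
	Name   string
	Attrs  map[string]string
}

// Processor transforms a batch and returns the batch to pass on.
type Processor func([]Record) []Record

// Exporter sends a batch to some backend.
type Exporter func([]Record) error

// Pipeline plays the role of a "service" entry: it declares which
// processors and exporters run, and in what order.
type Pipeline struct {
	Processors []Processor
	Exporters  []Exporter
}

// Receive plays the receiver role: it accepts a batch, runs every
// processor in order, then hands the result to every exporter.
func (p Pipeline) Receive(batch []Record) error {
	for _, proc := range p.Processors {
		batch = proc(batch)
	}
	for _, exp := range p.Exporters {
		if err := exp(batch); err != nil {
			return err
		}
	}
	return nil
}

// normalizeNames lower-cases record names so they are stored consistently.
func normalizeNames(in []Record) []Record {
	out := make([]Record, 0, len(in))
	for _, r := range in {
		r.Name = strings.ToLower(r.Name)
		out = append(out, r)
	}
	return out
}

// addAttr returns a processor that sets one attribute on every record.
func addAttr(key, value string) Processor {
	return func(in []Record) []Record {
		out := make([]Record, 0, len(in))
		for _, r := range in {
			attrs := make(map[string]string, len(r.Attrs)+1)
			for k, v := range r.Attrs {
				attrs[k] = v
			}
			attrs[key] = value
			r.Attrs = attrs
			out = append(out, r)
		}
		return out
	}
}

// stdoutExporter prints each record, standing in for a real backend.
func stdoutExporter(batch []Record) error {
	for _, r := range batch {
		fmt.Printf("signal=%s name=%q attrs=%v\n", r.Signal, r.Name, r.Attrs)
	}
	return nil
}

func main() {
	p := Pipeline{
		Processors: []Processor{normalizeNames, addAttr("env", "demo")},
		Exporters:  []Exporter{stdoutExporter},
	}
	if err := p.Receive([]Record{{Signal: "trace", Name: "HTTP GET"}}); err != nil {
		fmt.Println("export failed:", err)
	}
}
```

In this model, a Collector corresponds to a Pipeline running as its own process, while a Controller corresponds to the same wiring running inside the application.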

Golang demo

I wrote a Golang demo that shows:

  • How to generate trace/metrics data in the app
  • How to use the stdout controller in the app to collect, process, and print trace/metrics data
  • How to use the OTLP controller in the app to collect trace/metrics data and export it to an externally running collector
  • How to run a standalone collector service locally that receives the trace/metrics data pushed by the OTLP controller and exports it to local files and Alibaba Cloud SLS

Demo: github.com/flyer103/ot…

See the demo’s README.md for usage details. The idea is briefly described below.

cmd/app/server.go shows how OpenTelemetry is used, in two parts:

  1. Initialize and run the global controller, which either receives/processes/exports observation data inside the app or pushes observation data out of the app
  2. Generate metrics and traces inside the app according to business requirements

pkg/ encapsulates the controller and the signals (trace/metrics) separately.

yaml/ provides an example configuration for exporting observation data to SLS. It includes receivers that receive observation data (clients push data to them through a gRPC client), processors that transform the data, exporters that export it, and the services section that starts these components:

Thoughts

The analysis above should give you an intuitive feel for OpenTelemetry’s concepts, problem domain, solution, and usage, and the Golang demo above is a quick way to get started.

For developers, OpenTelemetry offers one standard way to generate and export trace/metrics/logs. This lowers the cost of working with different types of observation data during development, and the cost of connecting to different back-end services such as the open source Prometheus project or third-party cloud vendor services.

For SREs, OpenTelemetry provides a standard process for collecting, processing, and exporting observation data. Observation data can be normalized along the way to fit the team’s requirements, which makes it easier for downstream consumers such as monitoring and alerting services to rely on standardized data.

At the same time, both developers and SREs can draw on the community to keep refining their understanding of the observability problem domain, benefit from the community’s technical progress, and contribute the best practices they develop in production back to the community, further advancing the observability field.

References

  • Metrics, tracing, and logging
  • Merging OpenTracing and OpenCensus: A Roadmap to Convergence
  • Introduction to OpenTelemetry (Overview Part 1/2)
  • OpenTelemetry Best Practices (Overview Part 2/2)
  • OpenTelemetry: Future-Proofing Your Instrumentation
  • Getting started with OpenTelemetry on Kubernetes
  • OpenTelemetry website
  • wikipedia: Observability

Feel free to leave a comment to discuss the stability problems you run into when using Kubernetes, and the stability tools or services you would like to see. You can also contact the author by email: [email protected].