Author | Ginger famous

Source | Erda WeChat official account

To give you a better understanding of the design and implementation of the APM system in MSP, we decided to write a series of articles called "Microservice Observability in Detail", which dives into the product, the architecture design, and the underlying technology of the APM system. This is the third article in the series. It mainly covers how Telegraf's data processing pipeline works and how its plug-ins are implemented.

"Microservice Observability in Detail" series:

  • From Monitoring to Observability: Where Are We Headed?
  • Once You've Learned It, This Dashboard System Is Really Fun to Use!
  • Understanding the Metrics Collection Tool Telegraf (this article)

Telegraf is a popular open-source metrics collection agent developed by InfluxData, with over ten thousand stars on GitHub. With the help of its community, it offers more than 200 input (collection) plug-ins and more than 40 output (export) plug-ins, covering almost every kind of monitoring target, from machines and services to hardware.

Architecture design

Pipeline concurrency pattern

The pipeline pattern is a common concurrency pattern in Go. Put simply, a pipeline is a series of stages connected by channels, where each stage is a group of goroutines running the same function.

In each stage, the goroutines:

  1. Receive data produced by the upstream stage through inbound channels.
  2. Process the data, for example by converting formats, filtering, or aggregating.
  3. Send the processed data to the downstream stage through outbound channels.

Every stage has one or more inbound and outbound channels, except the first stage, which has only outbound channels, and the last stage, which has only inbound channels.
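To make these three responsibilities concrete, here is a minimal three-stage pipeline in Go, modeled on the Go blog article "Go Concurrency Patterns: Pipelines and Cancellation". The stage names gen, sq, and collect are illustrative and are not Telegraf code. Each stage owns its outbound channel and closes it once its input is exhausted, which is what lets the next stage simply range until the end.

```go
package main

import "fmt"

// gen is the first stage: it has only an outbound channel.
// It emits each number and closes the channel when done.
func gen(nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			out <- n
		}
	}()
	return out
}

// sq is a middle stage: it receives from its inbound channel,
// squares each value, and sends the result downstream.
func sq(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			out <- n * n
		}
	}()
	return out
}

// collect is the last stage: it has only an inbound channel and
// drains it until the upstream stage closes it.
func collect(in <-chan int) []int {
	var res []int
	for n := range in {
		res = append(res, n)
	}
	return res
}

func main() {
	fmt.Println(collect(sq(gen(1, 2, 3)))) // prints [1 4 9]
}
```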

Implementation in Telegraf

Telegraf adopts this model with four stages: Inputs, Processors, Aggregators, and Outputs.

  • Inputs: collect raw metrics, either actively (polling a source) or passively (receiving data pushed to them).
  • Processors: process the data collected by Inputs, for example deduplication, renaming, and format conversion.
  • Aggregators: aggregate the data emitted by Processors and compute aggregate values from it.
  • Outputs: receive data from Processors or Aggregators and export it to other destinations, such as files and databases.
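The division of labor between Processors and Aggregators can be sketched as follows. The Processor and Aggregator interfaces below are simplified stand-ins modeled on Telegraf's own (Apply(in ...Metric) []Metric for processors; Add, Push, and Reset for aggregators, where the real Push writes to an accumulator rather than returning a slice). The Metric type and the rename and minMax plug-ins are hypothetical.

```go
package main

import "fmt"

// Metric is a hypothetical, heavily simplified stand-in for a Telegraf
// metric: just a name and a single float value.
type Metric struct {
	Name  string
	Value float64
}

// Processor is modeled on Telegraf's Processor: it transforms metrics in flight.
type Processor interface {
	Apply(in ...Metric) []Metric
}

// Aggregator is modeled on Telegraf's Aggregator, except that Push returns
// its results instead of writing them to an accumulator (simplification).
type Aggregator interface {
	Add(m Metric)
	Push() []Metric
	Reset()
}

// rename is a hypothetical processor that renames matching metrics.
type rename struct{ from, to string }

func (r rename) Apply(in ...Metric) []Metric {
	out := make([]Metric, 0, len(in))
	for _, m := range in {
		if m.Name == r.from {
			m.Name = r.to
		}
		out = append(out, m)
	}
	return out
}

// minMax is a hypothetical aggregator that tracks the min and max value
// seen during the current period.
type minMax struct {
	seen     bool
	min, max float64
}

func (a *minMax) Add(m Metric) {
	if !a.seen || m.Value < a.min {
		a.min = m.Value
	}
	if !a.seen || m.Value > a.max {
		a.max = m.Value
	}
	a.seen = true
}

// Push emits the aggregate metrics for the period (nothing if no data arrived).
func (a *minMax) Push() []Metric {
	if !a.seen {
		return nil
	}
	return []Metric{{Name: "min", Value: a.min}, {Name: "max", Value: a.max}}
}

// Reset clears the state so the next period starts fresh.
func (a *minMax) Reset() { *a = minMax{} }

func main() {
	var p Processor = rename{from: "cpu_usage", to: "cpu"}
	var a Aggregator = &minMax{}
	for _, m := range p.Apply(Metric{"cpu_usage", 20}, Metric{"cpu_usage", 80}, Metric{"cpu_usage", 50}) {
		a.Add(m)
	}
	fmt.Println(a.Push()) // prints [{min 20} {max 80}]
	a.Reset()
}
```

A processor transforms each metric as it flows past and keeps no state, while an aggregator accumulates state across a period, emits new metrics when pushed, and is then reset for the next period.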

The stages are connected to each other by channels. The architecture diagram is as follows:

As the diagram shows, the whole agent follows the pipeline model. Here is a brief overview of how it runs:

  • The first stage is Inputs. Each input runs in its own goroutine, collects data independently, and fans in to a shared channel.
  • The second stage is Processors. Each processor runs in its own goroutine, and the processors are chained to each other by channels in order.
  • The third stage is Aggregators. Each aggregator runs in its own goroutine, and the data produced by Processors is fanned out to every aggregator.
  • The last stage is Outputs. Each output runs in its own goroutine, and the data produced by Processors or Aggregators is fanned out to every output.

Fan-in: multiple functions write data to one channel, and a single function reads from it until it is closed. Fan-out: multiple functions read from the same channel until it is closed.

Plug-in design

With so many input, output, and processor plug-ins, how does Telegraf manage them efficiently? And how should a plug-in architecture be designed to keep up with an ever-growing need for extensions? Don't worry, let me explain. Strictly speaking, these are not plug-ins in the usual sense (that is, dynamic link libraries loaded and bound at runtime) but a variant of the factory pattern. First, let's look at Telegraf's plug-in directory structure:

plugins
├── aggregators
│   ├── all
│   ├── basicstats
│   ├── registry.go
│   └── ...
├── inputs
│   ├── all
│   ├── cpu
│   ├── registry.go
│   └── ...
├── outputs
│   ├── all
│   ├── registry.go
│   └── ...
└── processors
    ├── all
    ├── clone
    ├── registry.go
    └── ...

As shown above, the directory structure follows a regular pattern. The following uses Inputs as an example; the other modules are implemented the same way.

  • plugins/inputs: one package directory per input plug-in.
  • plugins/inputs/all: imports every input plug-in package (mainly to avoid circular imports).
  • plugins/inputs/registry.go: holds the registry and its related functions.

Interface declaration

Telegraf declares the Input interface as follows:
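A self-contained sketch of that declaration follows. Its three methods follow Telegraf's Input interface as it looked in the 1.x releases of that period (later releases have reorganized the descriptive methods). Accumulator is a trimmed-down, hypothetical version of telegraf.Accumulator, and recorder, gatherAll, and hello exist only for the demo.

```go
package main

import "fmt"

// Accumulator is a hypothetical, trimmed-down stand-in for telegraf.Accumulator:
// plug-ins call AddFields to hand collected metrics to the agent.
type Accumulator interface {
	AddFields(measurement string, fields map[string]interface{}, tags map[string]string)
}

// Input follows the shape of Telegraf's Input interface at the time of writing.
type Input interface {
	// SampleConfig returns the default configuration of the Input.
	SampleConfig() string
	// Description returns a one-sentence description of the Input.
	Description() string
	// Gather adds the metrics the Input collects to the accumulator;
	// the agent calls it once per collection interval.
	Gather(acc Accumulator) error
}

// recorder is a demo Accumulator that remembers the measurement names it got.
type recorder struct{ measurements []string }

func (r *recorder) AddFields(measurement string, fields map[string]interface{}, tags map[string]string) {
	r.measurements = append(r.measurements, measurement)
}

// gatherAll is a hypothetical agent tick: it talks to plug-ins only
// through the Input interface.
func gatherAll(inputs []Input, acc Accumulator) error {
	for _, in := range inputs {
		if err := in.Gather(acc); err != nil {
			return fmt.Errorf("%s: %w", in.Description(), err)
		}
	}
	return nil
}

// hello is a hypothetical input used only for this demo.
type hello struct{}

func (hello) SampleConfig() string { return "" }
func (hello) Description() string  { return "emits a constant metric" }
func (hello) Gather(acc Accumulator) error {
	acc.AddFields("hello", map[string]interface{}{"value": 1}, nil)
	return nil
}

func main() {
	acc := &recorder{}
	if err := gatherAll([]Input{hello{}}, acc); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(acc.measurements) // prints [hello]
}
```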

Interface implementation

Create a plug-in, such as cpu, under the plugins/inputs/ directory and implement the Input interface:

Registering the plug-in

Finally, all we need to do is register the plug-in's factory function in the global registry:

This keeps a large number of plug-ins organized, and it makes extension easy: simply implement the Input interface and register a factory function.
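The registry is essentially a map from plug-in names to factory functions. The sketch below mirrors the Creator type, the Inputs map, and the Add function found in Telegraf's plugins/inputs/registry.go, while the cpu and mem plug-ins and the build helper (standing in for the agent's config loader) are hypothetical, and the Input interface is trimmed to one method.

```go
package main

import (
	"fmt"
	"sort"
)

// Input is trimmed to a single method for this sketch.
type Input interface {
	Description() string
}

// Creator is a factory that builds a fresh plug-in instance.
type Creator func() Input

// Inputs is the global registry, keyed by the plug-in name used in config.
var Inputs = map[string]Creator{}

// Add registers a factory function under a plug-in name.
func Add(name string, creator Creator) {
	Inputs[name] = creator
}

// cpu and mem are hypothetical plug-ins. In Telegraf each would live in its
// own package and register itself from that package's init function.
type cpu struct{ PerCPU bool }

func (c *cpu) Description() string { return "cpu" }

type mem struct{}

func (m *mem) Description() string { return "mem" }

func init() {
	Add("cpu", func() Input { return &cpu{PerCPU: true} })
	Add("mem", func() Input { return &mem{} })
}

// build is a hypothetical stand-in for the config loader: it turns the
// plug-in names found in a config file into instances via the registry.
func build(names []string) ([]Input, error) {
	var out []Input
	for _, name := range names {
		creator, ok := Inputs[name]
		if !ok {
			return nil, fmt.Errorf("unknown input %q", name)
		}
		out = append(out, creator())
	}
	return out, nil
}

func main() {
	names := make([]string, 0, len(Inputs))
	for name := range Inputs {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println("registered:", names) // prints registered: [cpu mem]

	// Simulate a config file that enables only the cpu input.
	ins, err := build([]string{"cpu"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("built:", ins[0].Description()) // prints built: cpu
}
```

In Telegraf's real layout, plugins/inputs/all blank-imports every plug-in package so that each package's init function runs and registers its factory. Because the agent only ever calls a Creator, every configured instance gets a fresh value, which is the factory-pattern variant described earlier.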

Application in Erda

In Erda, we use Telegraf as the platform's metrics collection service, deployed as a daemon on every physical machine. It is already widely used in production, running stably on thousands of machines and collecting and reporting a large volume of metrics that SREs and operations staff use to analyze and troubleshoot problems. Because of some special requirements, we had to do secondary development on top of Telegraf to better fit our business needs. Even so, thanks to Telegraf's powerful plug-in system, we usually only need to add new plug-ins: for example, an output plug-in that reports to our own collector, or an input plug-in that checks the health of Erda's own components. In the future, we will gradually phase out our custom modifications and embrace open source, staying as consistent as possible with the official Telegraf release and giving back to the community.


If you have any questions, add our assistant on WeChat (Erda202106) to join the discussion group and take part in the conversation!

  • Erda GitHub: github.com/erda-projec…
  • Erda Cloud website: www.erda.cloud/