Search for the official account “Programmer Bai Ze” on WeChat to join Bai Ze’s programming knowledge sharing planet.
Recently I have been working on distributed link tracing, so I am writing this article to organize my thoughts; it may help students who want to get started. Below I will walk through everything from the underlying principles to a demo, and you are welcome to discuss in the comments.
1. Why distributed link tracing emerged
This part explains why distributed link tracing emerged and analyzes the implementation approach of the Dapper distributed tracing system as described in the Dapper paper.
1.1 The need for distributed link tracing: the Dapper paper (2010)
Dapper paper translation: bigbully.github.io/Dapper-tran…
Internet applications are built from many different software modules, which may be developed by different teams, implemented in different programming languages, and spread across thousands of servers in different data centers. Therefore, we need tools that help us understand system behavior, analyze performance problems, and track how a request is passed along.
Dapper is the distributed tracing system used in Google's production environment. After two years of operation, Google published a paper about Dapper that focuses on its design ideas and provides the theoretical foundation for the later generation of distributed link tracing and analysis tools.
Next, I will introduce how distributed link tracing is implemented in the Dapper paper and distill its core concept: the span.
1.2 Distributed tracing in Dapper
The figure on the left shows a single service spanning five servers: a front end (A), two middle tiers (B and C), and two back ends (D and E). When a user (the initiator of this use case) makes a request, it first reaches the front end, which then sends two RPCs to servers B and C. B responds immediately, but C needs to interact with back ends D and E before returning to A, which finally answers the initial request. A simple and practical way to implement distributed tracing for such a request is to collect trace message identifiers and timestamped events for every send and receive action on each server.
In order to associate all log entries with a given initiator (for example, RequestX in the figure), Dapper explicitly marks a global ID in the application or middleware that connects each record to the initiating request. The main drawback of this scheme is that it obviously requires instrumenting the code. However, the Dapper developers believed that the instrumentation could be restricted to a small set of common component libraries, which makes the monitoring system effectively transparent to application developers.
Formally, Dapper's tracing model uses a tree structure, and Dapper calls every node in the trace tree a span; a span represents a node in distributed link tracing.
1.3 Trace tree and Span
In Dapper's trace tree structure, the tree nodes are the basic units of the whole architecture, and each node is a reference to a span. The edges between nodes represent the direct relationship between a span and its parent span.
The figure on the left illustrates what spans look like in a larger trace. Dapper records the name of each span, as well as its span ID and parent ID, in order to reconstruct the relationships between the different spans of a single trace. A span without a parent ID is called a root span. All spans belonging to a particular trace also share a common trace ID (not shown in the figure). In a typical Dapper trace, each RPC corresponds to a single span, and each additional layer of components corresponds to a level in the trace tree.
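To make these relationships concrete, here is a minimal sketch in Go (not Dapper's actual data model, just an illustration with made-up field and span names) of the fields a span needs so that a collector can rebuild the trace tree from independently reported spans: the shared trace ID groups spans into one trace, and the parent ID links each span to its parent, with the root span having no parent.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a simplified illustration of the fields described above.
// A real implementation also records annotations, RPC timing, host info, etc.
type Span struct {
	TraceID  string    // shared by every span in the same trace
	SpanID   string    // unique within the trace
	ParentID string    // empty for the root span
	Name     string    // e.g. the RPC or method name
	Start    time.Time // span start timestamp
	End      time.Time // span end timestamp
}

func main() {
	// Two spans reported independently but tied together by TraceID / ParentID.
	spans := []Span{
		{TraceID: "t1", SpanID: "a", ParentID: "", Name: "Frontend.Request"},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Name: "Backend.Call"},
	}

	// Rebuilding the tree is just grouping children under their parent span ID.
	children := map[string][]Span{}
	for _, s := range spans {
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	for _, root := range children[""] { // root spans have no parent ID
		fmt.Printf("root %q has %d direct child span(s)\n", root.Name, len(children[root.SpanID]))
	}
}
```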
The left image gives a more detailed view of the record points of a typical Dapper span. This span represents the two sides of a “Helper.Call” RPC (server side and client side). The start and end times of the span, as well as the RPC timing information, are recorded by Dapper's instrumentation inside the RPC component library. If the application developer chooses to add their own annotations (business data, such as “foo” in the figure) to the trace, this information is logged just like any other span information.
Annotations: the instrumentation points above are enough to deduce the tracing details of complex distributed systems, so Dapper's core functionality is available without changing Google's applications. However, Dapper also allows application developers to add extra information to a trace in order to monitor higher-level system behavior or to help debug problems. Users can define timestamped annotations, which can carry arbitrary content, through a simple API.
1.4 Trace collection
Dapper's trace logging and collection pipeline works in three stages (see the figure on the left). First, span data is written to (1) local log files. Dapper's daemons and collection components then pull the data from the production hosts (2) and finally write it to (3) Dapper's Bigtable repository. A trace is stored as a single row in Bigtable, and each column corresponds to a span. Bigtable's support for sparse table layouts fits this situation perfectly, because a trace can contain an arbitrary number of spans.
2. Zipkin and Jaeger
2.1 Zipkin structure
Zipkin is a tracing system component open-sourced by Twitter. Comparing the Zipkin structure inside the dotted box of the first two figures with the Dapper structure in the third figure, it is clear that Zipkin's implementation follows Google's Dapper. As shown in Figure 1, each instrumented node sends its link-tracing information to Zipkin's collector; Zipkin stores the data and provides a UI to display the link monitoring.
Note: each node that needs to be traced produces a span, and each span's information is sent to the collector independently. Because every span carries the unified global traceId and a parent spanId, Zipkin can rebuild the trace tree after all the spans belonging to a trace have been reported, and the Zipkin component then provides the UI display of that tree structure.
2.2 Sending and displaying trace information
2.3 Jaeger structure and monitoring display
From the official Jaeger documentation: Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing system released as open source by Uber Technologies.
Both Jaeger and Zipkin can be used as distributed link tracing components; Jaeger appeared later and is written in Go. The choice between the two depends on the specific needs of the project and is not discussed here. What matters is that both are distributed link tracing components based on Dapper, so keep the Dapper structure diagram (lower left) in mind.
3. Introduction to OpenCensus
3.1 Introducing OpenCensus
Clearly, Dapper, Zipkin, and Jaeger all follow the same pattern: span data is sent to a collector listening on a specified port, which then builds the trace tree and displays it. This means the monitored service must actively send span data, and sending span data to the chosen tracing component requires a corresponding API.
OpenCensus currently provides libraries for several languages that let you capture, manipulate, and export metrics and distributed traces to a backend of your choice. So the key questions are: how do you build a span, and how do you send a span to a given backend?
3.2 OpenCensus -> Span construction
The OpenCensus API provides two functions to create a span. Both take a context.Context, an interface type that carries the trace data and passes it along in memory within a process. The second function adds a parameter of type SpanContext named parent, which creates a span under a given parent span (for example, one coming from an external request).
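A minimal sketch of these two entry points in the OpenCensus Go API (package go.opencensus.io/trace); the span names used here are placeholders:

```go
package main

import (
	"context"

	"go.opencensus.io/trace"
)

func spanExamples(remoteParent trace.SpanContext) {
	// StartSpan: the parent, if any, is taken from the context.
	// If the context carries no span, a brand-new trace (random traceId) is started.
	ctx, span := trace.StartSpan(context.Background(), "parent-operation")
	defer span.End()

	// The returned ctx carries the span, so spans created from it become its children.
	_, child := trace.StartSpan(ctx, "child-operation")
	defer child.End()

	// StartSpanWithRemoteParent: the parent is passed explicitly as a trace.SpanContext,
	// for example one extracted from an incoming external request.
	_, remoteChild := trace.StartSpanWithRemoteParent(context.Background(), "handle-request", remoteParent)
	defer remoteChild.End()
}

func main() {
	spanExamples(trace.SpanContext{}) // zero-value parent just to make the sketch runnable
}
```

Without a registered Exporter and sampler these spans go nowhere; Exporter registration is covered in section 3.6.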
3.3 OpenCensus -> Span construction -> Digging into the source code of the two StartSpan functions to explore their usage scenarios
3.4 OpenCensus -> Span construction -> Core source code of startSpanInternal
If hasParent is false when startSpanInternal() is called, it automatically generates a random traceId, which is not what we want when a trace already exists. StartSpanWithRemoteParent(), on the other hand, takes the parent from its parameter. So we use the second function to create the first span of the whole trace link once the traceId has been obtained from the front end; the span information is then carried in the ctx in memory, and every subsequent span is created by calling StartSpan().
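A hedged sketch of that flow, assuming the front end propagates the trace context in B3 HTTP headers (X-B3-TraceId, X-B3-SpanId), which the helper in go.opencensus.io/plugin/ochttp/propagation/b3 can parse; your front end may use a different header format, and the handler and span names here are placeholders:

```go
package main

import (
	"net/http"

	"go.opencensus.io/plugin/ochttp/propagation/b3"
	"go.opencensus.io/trace"
)

var format = &b3.HTTPFormat{} // parses the B3 trace headers of the incoming request

func handler(w http.ResponseWriter, r *http.Request) {
	// First span of this node: attach it to the caller's trace if headers are present.
	if parent, ok := format.SpanContextFromRequest(r); ok {
		ctx, span := trace.StartSpanWithRemoteParent(r.Context(), "http-server-handler", parent)
		defer span.End()
		r = r.WithContext(ctx) // keep the span in the request context
	}
	// Every subsequent span is created from the context and inherits the traceId.
	_, child := trace.StartSpan(r.Context(), "business-logic")
	defer child.End()

	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```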
3.5 OpenCensus -> Span propagation
Now that I have shown how a span is created and how it is stored in Go inside the type described by the context.Context interface, how is span propagation implemented within a microservice node and between microservice nodes? Here is an excerpt from an article by the ByteDance team:
In other words, the span is the smallest unit of the trace tree. Within a single microservice node, multiple spans can be created along the request-handling path (using the context to pass span information in memory), while context propagation across microservice nodes is handled by the microservice framework. For example, the call methods in the pb files generated by gRPC take a context parameter, so all you need to do is pass the context along between microservices.
Note that the initial traceId is passed to the backend HTTP server through an HTTP header of the incoming request; after that, the microservice framework automatically propagates the span (context) information.
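A hedged sketch of the cross-node part using the gRPC plugin that ships with OpenCensus (go.opencensus.io/plugin/ocgrpc). The health-check client is used only because it ships with grpc-go and keeps the sketch compilable; in the demo it would be the client generated from your own .proto file, and the address is an assumption:

```go
package main

import (
	"context"
	"log"

	"go.opencensus.io/plugin/ocgrpc"
	"google.golang.org/grpc"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// callGrpcServer sketches the cross-node part: the ctx already carries a span,
// and the ocgrpc handlers move the trace context over the wire automatically.
func callGrpcServer(ctx context.Context) {
	conn, err := grpc.Dial("localhost:50051",
		grpc.WithInsecure(),
		grpc.WithStatsHandler(&ocgrpc.ClientHandler{}), // injects trace context into gRPC metadata
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := healthpb.NewHealthClient(conn)
	// Just pass the context along; no manual header handling is needed.
	if _, err := client.Check(ctx, &healthpb.HealthCheckRequest{}); err != nil {
		log.Println("rpc failed:", err)
	}
}

func main() {
	callGrpcServer(context.Background())
}
```

On the server side, the matching piece is grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{})), which extracts the trace context from the gRPC metadata and puts it back into the handler's ctx.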
3.6 OpenCensus -> Exporter registration
An Exporter sends trace information (spans) to any backend that can consume it. The Exporter itself can be swapped without changing your client code, which is why OpenCensus is truly vendor-agnostic: you collect trace (span) information once and can export it simultaneously to several different backends.
Overall usage flow: 1. register an Exporter (this effectively declares which backend, such as Zipkin or Jaeger, the spans go to); 2. build spans using the OpenCensus API; 3. send the spans (the sending process iterates over all registered Exporters and hands each built span to the backend that the Exporter points to).
The code for steps 2 and 3 never needs to change; whenever you want to add or replace a target backend, you only modify the Exporter registration.
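A hedged sketch of step 1 that registers both a Zipkin and a Jaeger Exporter at once (the service name, host:port, and collector addresses are assumptions; the import paths are the commonly used OpenCensus contrib exporter packages):

```go
package main

import (
	"log"

	"contrib.go.opencensus.io/exporter/jaeger"
	"contrib.go.opencensus.io/exporter/zipkin"
	openzipkin "github.com/openzipkin/zipkin-go"
	zipkinhttp "github.com/openzipkin/zipkin-go/reporter/http"
	"go.opencensus.io/trace"
)

func registerExporters() {
	// Zipkin: spans are POSTed to the collector's v2 API.
	endpoint, err := openzipkin.NewEndpoint("http_server", "localhost:8080")
	if err != nil {
		log.Fatal(err)
	}
	reporter := zipkinhttp.NewReporter("http://localhost:9411/api/v2/spans")
	trace.RegisterExporter(zipkin.NewExporter(reporter, endpoint))

	// Jaeger: spans are sent to the Jaeger collector endpoint.
	je, err := jaeger.NewExporter(jaeger.Options{
		CollectorEndpoint: "http://localhost:14268/api/traces",
		Process:           jaeger.Process{ServiceName: "http_server"},
	})
	if err != nil {
		log.Fatal(err)
	}
	trace.RegisterExporter(je)

	// Sample every request for the demo; production code would sample probabilistically.
	trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})
}

func main() {
	registerExporters()
	// ... build and send spans with the OpenCensus API as in the previous sections ...
}
```

Because both Exporters are registered, every span built afterwards is delivered to both backends without touching the span-building code.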
4. Demo structure and demonstration
The project structure
The directory structure
4.1 Code of http_server.go (a simplified sketch of these pieces is given after the list below)
- The main function
- sendHttp
- callGrpcServer
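The original code is shown as screenshots, so here is a simplified, hedged sketch of how those three pieces could fit together (the handler path, ports, names, and the reduced gRPC call are assumptions, not the author's exact code; the real callGrpcServer would invoke the generated gRPC client as in section 3.5):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"

	"contrib.go.opencensus.io/exporter/zipkin"
	openzipkin "github.com/openzipkin/zipkin-go"
	zipkinhttp "github.com/openzipkin/zipkin-go/reporter/http"
	"go.opencensus.io/plugin/ochttp/propagation/b3"
	"go.opencensus.io/trace"
)

// callGrpcServer stands in for the real gRPC call of the demo; here it only
// creates a child span so the file stays self-contained and runnable.
func callGrpcServer(ctx context.Context) {
	_, span := trace.StartSpan(ctx, "callGrpcServer")
	defer span.End()
	// ... the real demo would pass ctx into the generated gRPC client call here ...
}

// sendHttp joins the caller's trace if B3 headers are present, otherwise starts a new one.
func sendHttp(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	var span *trace.Span
	if parent, ok := (&b3.HTTPFormat{}).SpanContextFromRequest(r); ok {
		ctx, span = trace.StartSpanWithRemoteParent(ctx, "sendHttp", parent)
	} else {
		ctx, span = trace.StartSpan(ctx, "sendHttp")
	}
	defer span.End()

	callGrpcServer(ctx)
	fmt.Fprintln(w, "ok")
}

func main() {
	// Step 1: register the Exporter (Zipkin here; Jaeger would be registered the same way).
	endpoint, err := openzipkin.NewEndpoint("http_server", "localhost:8080")
	if err != nil {
		log.Fatal(err)
	}
	reporter := zipkinhttp.NewReporter("http://localhost:9411/api/v2/spans")
	trace.RegisterExporter(zipkin.NewExporter(reporter, endpoint))
	trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})

	// Steps 2 and 3 happen inside the handler: spans are built and exported per request.
	http.HandleFunc("/trace", sendHttp)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```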
4.2 Using Zipkin and Jaeger to display the monitoring results
- Zipkin
- Jaeger
I have just set up a small WeChat group; everyone is welcome to join to exchange ideas and study together in preparation for autumn recruitment.
Bai Ze's WeChat (if the group QR code has expired, you can add me on WeChat and I will pull you into the group).
Official account [Programmer Bai Ze]: I will try to keep updates synchronized on both platforms for easier reading.