Looking at history with questions
When it comes to link tracing, most people think of mature open source software such as Zipkin, Jaeger, and SkyWalking, or of open standards such as OpenTelemetry, OpenTracing, and OpenCensus. Although their implementations differ, these link tracing systems, built on various combinations of software and standards, have a great deal in common.
For example, they all need to propagate metadata along the call link, and they define that metadata in much the same way: a Trace ID that uniquely identifies the trace, a parent ID that associates a span with its parent, and a span ID that identifies the span itself. They all report the collected tracing information asynchronously and in a decentralized way, then aggregate the trace offline. They all support trace sampling, and so on.
The architectures and models of these link tracing systems look so similar that I could not help but wonder: do developers all think the same way when designing link tracing? Why is metadata propagated along the call link? Is all of that metadata necessary? Can an application integrate with a link tracing system without intrusive code changes? Why report asynchronously and in a decentralized way, then aggregate offline? What is the point of trace sampling?
With these questions in mind, I found the Google Dapper paper that inspired many of these link tracing systems, and read it along with the papers it cites. These papers gradually cleared up my doubts.
Exploring the black-box approach
In the early academic exploration of tracing call links in distributed systems, some researchers held that every application or middleware component in a distributed system should be treated as a black box, and that link tracing should not intrude into the application. At the time, Spring had not yet appeared, and inversion of control and aspect-oriented programming were not yet popular. If link tracing required hacking into and modifying application code, the barrier to adoption was too high for engineers to build such a tool.
If you are not allowed to touch the application or change its code, you can only observe it from the outside to obtain and record link information. But because of the black-box constraint, the link information is scattered and cannot be connected. How to stitch these links together becomes the problem to solve.
Performance Debugging for Distributed Systems of Black Boxes
This paper, published in 2003, explores call chain monitoring in black-box mode and proposes two algorithms for discovering link information.
The first algorithm is called the "nesting algorithm". It first associates a request (call) message with its corresponding return message through a generated unique ID, forming a link pair. Then, using time ordering, it associates different round-trip link pairs with one another horizontally or up and down the call chain (see Figure 1).
Figure 1
If the application is single-threaded, this algorithm works fine, but production applications are usually multi-threaded, which makes it hard to find the correct correspondence between links with this method. Although the paper proposes a scoreboard penalty mechanism to down-weight incorrectly associated link relationships, the approach still runs into problems for services based on asynchronous RPC calls.
The other algorithm, called the "convolution algorithm", treats the round-trip links as independent links, treats each independent link as a time signal, and uses signal-processing techniques to find correlations between the signals. The advantage of this algorithm is that it can handle services based on asynchronous RPC calls. However, if the actual call link contains a loop, the convolution algorithm produces not only the actual call link but also spurious ones. For example, given the call link A -> B -> C -> B -> A, the convolution algorithm yields not only that link but also A -> B -> A. If a node appears more than once on a link, the algorithm is likely to derive a large number of spurious call links.
In black-box mode, link relationships are inferred from probability and statistics, and probabilities are only ever probabilities: the associations between links cannot be determined exactly.
Another way to think about it
How can we accurately figure out the relationship between call links? The following paper gives some ideas and practices.
Pinpoint: Problem Determination in Large, Dynamic Internet Services
(Note: the Pinpoint in this paper is not the Pinpoint APM that Naver later open-sourced.)
The research object of this paper is mainly a single application composed of different components, though the approach can also be extended to distributed clusters. The Pinpoint architecture is divided into three parts; see Figure 2. The first part, Client Request Trace, consists of Tracing and Trace Log and is mainly used to collect link logs. The second part, comprising Internal F/D, External F/D, and Fault Log, is used to collect fault logs. The third part, Data Clustering Analysis, is the Statistical Analysis component, which analyzes the collected log data to produce fault detection results.
Figure 2
The Pinpoint architecture organizes its data so that it can be analyzed effectively with data mining methods. As shown in Figure 3, each invocation link is one data sample, marked with a unique request ID. The attributes of the sample record the program components the invocation link passes through and its failure status.
Figure 3
To associate the Trace Logs and Fault Logs of each invocation, the paper uses a Java application as an example to describe how to implement this association in code. Here is a summary of the key points from the Pinpoint practice chapter:
- Generate a component id for each component.
- Generate a unique request id for each HTTP request, and pass it along through a thread-local variable.
- For any new thread created within a request, modify the thread creation class so that the request id is passed along.
- For RPC calls made within a request, modify the caller's code to put the request id into the header; the receiver parses the header and injects the id into its own thread-local variable.
- Each time a component is called, record a Trace Log entry with (request id, component id).
For Java applications, these points are simple to implement and highly practical, and they provide the basic idea by which link tracing systems stitch links together and propagate context.
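To make these points concrete, here is a minimal Java sketch of the idea, assuming a hypothetical RequestIdContext helper and an X-Request-Id header name; it is not Pinpoint's actual code, only an illustration of keeping a per-request id in a thread-local variable and copying it into RPC headers.

```java
// A minimal sketch (not Pinpoint's actual code) of the paper's idea:
// keep a per-request id in a ThreadLocal and copy it into outgoing RPC/HTTP headers.
public final class RequestIdContext {
    private static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    public static void set(String id) { REQUEST_ID.set(id); }
    public static String get()        { return REQUEST_ID.get(); }
    public static void clear()        { REQUEST_ID.remove(); }
}

// At the entry point of each HTTP request (hypothetical servlet-style handler):
//   RequestIdContext.set(java.util.UUID.randomUUID().toString());
// When issuing an RPC call, copy the id into a header, e.g. for HttpURLConnection:
//   connection.setRequestProperty("X-Request-Id", RequestIdContext.get());
// The receiver parses the header and calls RequestIdContext.set(...) on its own thread.
// Each component then records a Trace Log entry as (request id, component id).
```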
The paper was published in 2002, when Java 1.4 was current and ThreadLocal variables were already available, making it easy to carry information along within a thread. But because aspect-oriented programming was not yet popular (Spring appeared in 2003, and the Java agent mechanism arrived with Java 5 in 2004), this approach was not widely adopted. Conversely, the emergence of requirements like these may have been one of the driving forces behind the advances in Java aspect-oriented programming.
Rebuilding the call link
X-Trace: A Pervasive Network Tracing Framework
The main research object of this paper is the network links in a distributed cluster. The X-Trace paper continues and extends the line of thought in the Pinpoint paper, proposing a framework and model that can rebuild the complete call link. To achieve this goal, the paper defines three design principles:
- Carry metadata within the call link (data passed along the call link is also called in-band data).
- Reported link information is not stored in the call link; the mechanism for collecting link information must be orthogonal to the application itself. (Note: link data that is not carried in the call link is also called out-of-band data.)
- The entity that injects metadata should be decoupled from the entity that collects the reports.
Principles 1 and 2 are still the design principles in use today. Principle 1 extends the Pinpoint idea by adding more elements beyond the original request ID: TaskID, ParentID, and OpID are the predecessors of today's trace ID, parent ID, and span ID. The word "span" also appears in the abstract of the X-Trace paper, so Dapper's use of the term may be a tribute to the X-Trace authors.
Here is how X-Trace defines its metadata:
- Flags: a bit array that marks whether TreeInfo, Destination, and Options are in use
- TaskID: a globally unique ID identifying a single call chain
- TreeInfo:
  - ParentID: ID of the parent node, unique within the call chain
  - OpID: ID of the current operation, unique within the call chain
  - EdgeType: NEXT indicates a sibling relationship, DOWN indicates a parent-child relationship
- Destination: specifies the address to which link data is reported
- Options: a reserved field for extension
In addition to the metadata definition, the paper also defines two link propagation operations, pushDown() and pushNext(). pushDown() copies the metadata down to the next layer, and pushNext() propagates the metadata from the current node to the next node.
Figure 4. Pseudocode for pushDown() and pushNext()
Figure 5. Position of pushDown() and pushNext() in the call link
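As a rough illustration of these two operations, the following Java snippet models the metadata with assumed field names and a random OpID generator; it is not the paper's pseudocode, only a sketch of how both operations derive a new OpID from the current one while recording different edge types.

```java
// A rough sketch with assumed field names; not the X-Trace paper's actual pseudocode.
import java.util.UUID;

class XTraceMetadata {
    String taskId;    // globally unique per call chain
    String parentId;  // OpID of the operation that created this one
    String opId;      // ID of the current operation
    String edgeType;  // "DOWN" for parent-child, "NEXT" for sibling
    // Flags, Destination, and Options are omitted for brevity.

    private XTraceMetadata derive(String edge) {
        XTraceMetadata next = new XTraceMetadata();
        next.taskId = this.taskId;                // stay on the same call chain
        next.parentId = this.opId;                // the current op becomes the parent
        next.opId = UUID.randomUUID().toString(); // new operation ID
        next.edgeType = edge;
        return next;
    }

    // pushDown(): copy metadata to the next layer down (parent-child relationship).
    XTraceMetadata pushDown() { return derive("DOWN"); }

    // pushNext(): propagate metadata to the next node on the same layer (sibling relationship).
    XTraceMetadata pushNext() { return derive("NEXT"); }
}
```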
The second design principle is reflected in the structure of X-Trace's link-data reporting. As Figure 6 shows, X-Trace provides the application with a lightweight client library that forwards link data to a local daemon. The local daemon opens a UDP port, receives the data from the client library, and places it in a queue. On the other side of the queue, the data is sent to a destination determined by the configuration carried with the link data: perhaps a database, a data-forwarding service, a data-collection service, or a data-aggregation service.
Figure 6.
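To illustrate the reporting path, here is a minimal Java sketch assuming a hypothetical local daemon listening on UDP port 7831 and an already serialized span payload; it only shows the fire-and-forget, out-of-band style of reporting, not X-Trace's actual wire format.

```java
// A minimal, illustrative reporter: serialize link data and send it over UDP to a
// local daemon so reporting stays out-of-band and never blocks the request path.
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpSpanReporter {
    private static final int DAEMON_PORT = 7831; // assumed port of the local daemon

    public static void report(String serializedSpan) {
        try (DatagramSocket socket = new DatagramSocket()) {
            byte[] payload = serializedSpan.getBytes(StandardCharsets.UTF_8);
            DatagramPacket packet = new DatagramPacket(
                    payload, payload.length,
                    InetAddress.getLoopbackAddress(), DAEMON_PORT);
            socket.send(packet); // fire-and-forget; the daemon queues and forwards the data
        } catch (Exception e) {
            // Reporting must never break the application; drop the span on failure.
        }
    }
}
```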
The architecture X-Trace uses to report link data has had a great influence on the link tracing implementations on the market today. Its three design principles, its definitions of in-band and out-of-band data, its metadata propagation operations, and its reporting architecture are all referenced by today's link tracing systems; looking at Zipkin's Collector or Jaeger's jaeger-agent, you can see the shadow of X-Trace's reporting architecture.
Large-scale commercial practice — Dapper
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
Dapper is Google's internal system for giving developers information about the behavior of complex distributed systems. The Dapper paper describes Google's experience designing and operating this distributed tracing infrastructure. When the paper was published in 2010, Dapper had already been running inside Google for two years.
The main purpose of Dapper is to provide developers with information about the behavior of complex distributed systems. The paper analyzes what problems such a system needs to solve and, based on them, puts forward two basic design requirements: large-scale deployment and continuous monitoring. From these two requirements, three concrete design goals follow:
- Low overhead: the tracing system must have a negligible performance impact on online services. Even small monitoring costs are noticeable for highly optimized services, and can force deployment teams to switch the tracing system off.
- Application-level transparency: developers should not need to be aware of the tracing facilities. A tracing system that depends on application developers' cooperation to work is extremely fragile and often breaks because of instrumentation bugs or omissions, which violates the requirement of large-scale deployment.
- Scalability: the tracing system needs to keep up with the scale of Google's services and clusters for years to come.
Although Dapper shares many design ideas with Pinpoint, Magpie, and X-Trace, it also has some unique designs of its own. One of them is that, to meet the low-overhead goal, Dapper samples the request links it collects. According to Dapper's experience at Google, for many common scenarios, sampling even 1 out of every 1,000 requests yields enough information.
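A minimal sketch of such head-based probabilistic sampling might look like the following; the class name and rate are illustrative, not Dapper's implementation. The decision is made once at the root of a trace and then carried along with the trace metadata so every downstream service makes the same keep-or-drop choice.

```java
// An illustrative probabilistic sampler; not Dapper's actual implementation.
import java.util.concurrent.ThreadLocalRandom;

public class ProbabilisticSampler {
    private final double rate; // e.g. 0.001 keeps roughly 1 in 1,000 traces

    public ProbabilisticSampler(double rate) {
        this.rate = rate;
    }

    // Decide once at the root span; the result is propagated with the trace metadata.
    public boolean shouldSample() {
        return ThreadLocalRandom.current().nextDouble() < rate;
    }
}
```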
Another distinctive point is the very high degree of application transparency Dapper achieves. This is made possible by the homogeneity of Google's application deployments, which lets the tracing code be confined to the underlying shared software without any extra annotations in the applications. For example, if all cluster applications use the same HTTP library, message notification library, thread pool factory, and RPC library, the tracing facilities can be confined to those modules.
How to define link information?
The paper first gives a simple example of a call chain, as shown in Figure 7. The authors believe that distributed tracing of a request requires collecting an identifier for each message, along with the corresponding events and timestamps. In the RPC-only case, the call link can be understood as a nested tree of RPCs; of course, Google's internal data model is not limited to RPC calls.
Figure 7.
Figure 8 illustrates the structure of a Dapper trace tree. The nodes of the tree are the basic units, called spans, and the edges indicate the parent-child relationships between spans. A span contains, at minimum, start and end timestamps, RPC timing, and zero or more application-specific annotations. To rebuild the Dapper trace tree, a span also needs to contain the following information:
- span name: a human-readable name, such as Frontend.Request in Figure 8
- span id: a 64-bit unique identifier
- parent id: the span id of the parent span
Figure 8.
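A simplified span record in Java might look like the following; the field names are assumptions for illustration, not Dapper's actual data model, but they carry the information listed above that is needed to rebuild the trace tree offline.

```java
// A simplified, illustrative span record; not Dapper's actual data model.
import java.util.ArrayList;
import java.util.List;

public class Span {
    String traceId;            // identifies the trace this span belongs to
    long spanId;               // 64-bit unique identifier of this span
    long parentId;             // span id of the parent span; 0 for the root span
    String spanName;           // human-readable name, e.g. "Frontend.Request"
    long startTimestampMicros; // recorded on the host where the span starts
    long endTimestampMicros;
    final List<String> annotations = new ArrayList<>(); // client/server events, app-specific notes
}
```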
Figure 9 shows the details of an RPC span. It is worth noting that a single span may contain information from multiple hosts: each RPC span carries annotations from both the client and the server side of the call. Because the client and server timestamps come from different hosts, special attention must be paid to anomalies in these timestamps caused by clock skew.
Figure 9.
How to achieve application-level transparency?
Dapper adds instrumentation points to a small number of common libraries to achieve distributed tracing with zero interference for application developers. The main practices are as follows:
- When a thread is working on a traced request path, Dapper associates the trace context with thread-local storage. The trace context is a small, easily copyable structure containing the span information.
- When a computation is deferred or made asynchronous, most Google developers use a common control-flow library to construct callbacks and schedule them with thread pools or other executors. Dapper therefore ensures that every callback stores the trace context when it is created, and that the trace context is associated with the correct thread when the callback is executed.
- Almost all of Google's inter-process communication is built on a single RPC framework, with both C++ and Java implementations. Instrumentation was added to the framework to define spans for all RPC calls, and the span and trace IDs are passed from client to server for traced RPCs. RPCs based on this framework are an essential instrumentation point at Google.
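A minimal sketch of the callback idea above, assuming a hypothetical TraceContext holder rather than Google's actual control-flow library: the trace context is captured when the callback is created and restored on whichever thread eventually runs it.

```java
// An illustrative context holder; not Google's control-flow library.
public final class TraceContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    public static void set(String ctx) { CURRENT.set(ctx); }
    public static String get()         { return CURRENT.get(); }

    // Wrap a callback so it carries the creator's trace context to the executing thread.
    public static Runnable wrap(Runnable task) {
        final String captured = get();       // captured when the callback is constructed
        return () -> {
            String previous = get();
            set(captured);                   // associate the context with the executing thread
            try {
                task.run();
            } finally {
                set(previous);               // restore whatever the pool thread had before
            }
        };
    }
}
```

Submitting work as executor.submit(TraceContext.wrap(task)) then lets the task inherit the trace context of the thread that scheduled it.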
In closing
The Dapper paper offers a data model that is easy to read and useful for locating problems, application-level transparent instrumentation practices, and a low-overhead design, clearing many obstacles to the industrial use of link tracing and inspiring many developers. Since the Google Dapper paper came out, a variety of link tracing systems have been built under its influence: Twitter open-sourced Zipkin in 2012, Naver open-sourced Pinpoint, Wu Sheng open-sourced SkyWalking in 2015, Uber open-sourced Jaeger, and so on. Link tracing has since entered an era of a hundred schools of thought contending.