Distributed tracing based on OpenTelemetry
The history of distributed tracing
Distributed tracing (also known as distributed request tracing) is a method for analyzing and monitoring applications, especially those built using microservices architectures. Distributed tracing helps pinpoint where faults occur and the cause of poor performance.
Origins
The term distributed tracing first appeared in Google's paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure". The paper still deeply influences how tracing is implemented today, and it shaped the design of Jaeger, Zipkin, and other open source distributed tracing projects that emerged later.
A microservices architecture is a distributed architecture made up of many different services that call one another. When a request passes through N services and something goes wrong, the fault can be anywhere along the way. As the business grows and the number of calls between services increases, troubleshooting without a tool that tracks the call chain is as hopeless as a cat trying to untangle a ball of wool. So you need a tool that shows exactly which services a request passes through, and in what order, so that problems are easy to locate.
A hundred flowers bloom
After Google published Dapper, more and more distributed tracing tools appeared. Some commonly used tracing systems are listed below:
- SkyWalking
- Alibaba EagleEye
- Dianping CAT
- Twitter Zipkin
- Naver Pinpoint
- Uber Jaeger
Tit for tat?
As the number of tracing tools grew, the open source world split into two main camps. One followed the OpenTracing specification, championed mainly by members of the CNCF technical committee; Jaeger and Zipkin, for example, both follow the OpenTracing specification. The other was OpenCensus, initiated by Google, which was itself the first company to propose the concept of distributed tracing. Microsoft later joined OpenCensus as well.
The birth of OpenTelemetry
OpenTelemetry is a set of APIs, SDKs, modules, and integrations designed for creating and managing telemetry data such as traces, metrics, and logs.
Microsoft joining OpenCensus upset the previous balance and indirectly led to the birth of OpenTelemetry. Google and Microsoft decided to end the fragmentation. The first problem was how to merge the existing projects of the two communities. OpenTelemetry is compatible with both OpenCensus and OpenTracing, so users can adopt it with little or no change.
Tracing in practice with Kratos
Kratos is a lightweight Go microservices framework that includes a number of microservice-related components and tools.
The tracing middleware
The Kratos framework ships with a middleware named tracing, which implements the framework's distributed tracing on top of OpenTelemetry. The middleware code can be found in middleware/tracing.
How it works
The Kratos tracing middleware consists of three files: carrier.go, tracer.go, and tracing.go. The client and server implementations work in essentially the same way, so this article analyzes the server side.
- First, the tracing middleware is created by calling Server, which begins by calling the NewTracer function in tracer.go
// Server returns a new server middleware for OpenTelemetry.
func Server(opts ...Option) middleware.Middleware {
	// Call NewTracer in tracer.go, passing SpanKindServer and the configuration options
	tracer := NewTracer(trace.SpanKindServer, opts...)
	// ... code omitted
}
- The NewTracer function in tracer.go returns a Tracer, as follows
func NewTracer(kind trace.SpanKind, opts ...Option) *Tracer {
	options := options{}
	for _, o := range opts {
		o(&options)
	}
	// If an OTel tracer provider was configured, register it globally
	if options.TracerProvider != nil {
		otel.SetTracerProvider(options.TracerProvider)
	}
	// If propagators were configured, register them; otherwise register a default TextMapPropagator
	if options.Propagators != nil {
		otel.SetTextMapPropagator(options.Propagators)
	} else {
		otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.Baggage{}, propagation.TraceContext{}))
	}
	var name string
	// Determine whether this middleware is a server or a client
	if kind == trace.SpanKindServer {
		name = "server"
	} else if kind == trace.SpanKindClient {
		name = "client"
	} else {
		panic(fmt.Sprintf("unsupported span kind: %v", kind))
	}
	// Call otel.Tracer with the name to obtain a tracer instance
	tracer := otel.Tracer(name)
	return &Tracer{tracer: tracer, kind: kind}
}
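The default propagator registered above combines Baggage with TraceContext. TraceContext carries the trace identity in a traceparent header of the form version-traceid-spanid-flags, as defined by the W3C Trace Context specification. The following is a minimal sketch of that header format using only the standard library. The names spanContext, formatTraceparent, and parseTraceparent are hypothetical and are not part of OpenTelemetry or Kratos, and a real propagator performs more validation than this.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// spanContext is a hypothetical, simplified stand-in for the identifiers
// that a trace context propagator carries between services.
type spanContext struct {
	TraceID [16]byte // shared by every span in one trace
	SpanID  [8]byte  // identifies the calling span (the parent)
	Sampled bool     // lowest bit of the trace-flags field
}

// formatTraceparent renders sc as a version "00" traceparent value:
// 00-<32 hex trace id>-<16 hex span id>-<2 hex flags>.
func formatTraceparent(sc spanContext) string {
	flags := byte(0)
	if sc.Sampled {
		flags = 1
	}
	return fmt.Sprintf("00-%s-%s-%02x",
		hex.EncodeToString(sc.TraceID[:]),
		hex.EncodeToString(sc.SpanID[:]),
		flags)
}

// parseTraceparent is the inverse of formatTraceparent. This sketch accepts
// only version "00" and rejects all-zero IDs, which the specification
// treats as invalid.
func parseTraceparent(h string) (spanContext, error) {
	var sc spanContext
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return sc, errors.New("unsupported traceparent format")
	}
	if len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return sc, errors.New("wrong field length")
	}
	if _, err := hex.Decode(sc.TraceID[:], []byte(parts[1])); err != nil {
		return sc, err
	}
	if _, err := hex.Decode(sc.SpanID[:], []byte(parts[2])); err != nil {
		return sc, err
	}
	flags, err := hex.DecodeString(parts[3])
	if err != nil {
		return sc, err
	}
	if sc.TraceID == ([16]byte{}) || sc.SpanID == ([8]byte{}) {
		return sc, errors.New("all-zero trace or span id")
	}
	sc.Sampled = flags[0]&1 == 1
	return sc, nil
}
```

Parsing a header and formatting the result again yields the same string, which is what lets every service along the call chain attach its spans to the same trace.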
- Determine the current request type, process the data to be collected, and call the Start method in tracer.go
	// Excerpt from inside the server middleware's handler; the enclosing function is omitted
	var (
		component string
		operation string
		carrier   propagation.TextMapCarrier
	)
	// Determine the request type
	if info, ok := http.FromServerContext(ctx); ok {
		// HTTP
		component = "HTTP"
		// Use the request URI as the operation name
		operation = info.Request.RequestURI
		// propagation.HeaderCarrier adapts http.Header so that it satisfies the TextMapCarrier interface.
		// A TextMapCarrier is a text map used to carry propagated information
		carrier = propagation.HeaderCarrier(info.Request.Header)
		// otel.GetTextMapPropagator().Extract() reads the values in the carrier into the context
		ctx = otel.GetTextMapPropagator().Extract(ctx, propagation.HeaderCarrier(info.Request.Header))
	} else if info, ok := grpc.FromServerContext(ctx); ok {
		// gRPC
		component = "gRPC"
		operation = info.FullMethod
		// metadata.FromIncomingContext(ctx) from the grpc/metadata package returns the incoming gRPC metadata stored in ctx
		if md, ok := metadata.FromIncomingContext(ctx); ok {
			// MetadataCarrier in carrier.go converts the metadata into a TextMapCarrier
			carrier = MetadataCarrier(md)
		}
	}
	// Call tracer.Start
	ctx, span := tracer.Start(ctx, component, operation, carrier)
	// ... code omitted
}
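The gRPC branch above relies on a carrier type that lets a propagator read and write gRPC metadata as if it were a plain text map. The following is a minimal sketch of that adapter idea using only the standard library. The textMapCarrier interface is a local copy of the three methods of OpenTelemetry's propagation.TextMapCarrier (Get, Set, Keys), and metadataCarrier is a hypothetical stand-in over map[string][]string; the actual carrier.go in Kratos may differ in detail.

```go
package main

import "strings"

// textMapCarrier mirrors the three methods of OpenTelemetry's
// propagation.TextMapCarrier so this sketch needs no external modules.
type textMapCarrier interface {
	Get(key string) string
	Set(key, value string)
	Keys() []string
}

// metadataCarrier adapts a multi-valued string map (the shape of gRPC
// metadata) to textMapCarrier. It is a hypothetical stand-in for the
// MetadataCarrier type mentioned in the excerpt above.
type metadataCarrier map[string][]string

// Get returns the first value stored under key, or "" if there is none.
// Keys are lower-cased, matching how gRPC metadata normalizes its keys.
func (mc metadataCarrier) Get(key string) string {
	values := mc[strings.ToLower(key)]
	if len(values) == 0 {
		return ""
	}
	return values[0]
}

// Set replaces any existing values under key with the single given value.
func (mc metadataCarrier) Set(key, value string) {
	mc[strings.ToLower(key)] = []string{value}
}

// Keys lists every key currently stored in the carrier.
func (mc metadataCarrier) Keys() []string {
	keys := make([]string, 0, len(mc))
	for k := range mc {
		keys = append(keys, k)
	}
	return keys
}

// Compile-time check that metadataCarrier satisfies the interface.
var _ textMapCarrier = metadataCarrier{}
```

Because the propagator only depends on the interface, the same Extract and Inject calls work for HTTP headers and gRPC metadata alike.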
- Call the Start method in tracer.go
func (t *Tracer) Start(ctx context.Context, component string, operation string, carrier propagation.TextMapCarrier) (context.Context, trace.Span) {
	// On the server side, extract the trace context from the carrier into ctx
	if t.kind == trace.SpanKindServer {
		ctx = otel.GetTextMapPropagator().Extract(ctx, carrier)
	}
	// Call Start on the OTel tracer to create a span
	ctx, span := t.tracer.Start(ctx,
		// The operation set in tracing.go (the request route) is used as the span name
		operation,
		// Set a span attribute named component, whose value is the request type
		trace.WithAttributes(attribute.String("component", component)),
		// Set the span kind
		trace.WithSpanKind(t.kind),
	)
	// On the client side, inject the trace context from ctx into the carrier
	if t.kind == trace.SpanKindClient {
		otel.GetTextMapPropagator().Inject(ctx, carrier)
	}
	return ctx, span
}
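Start extracts on the server and injects on the client, so the two calls are mirror images across one network hop. The following is a minimal sketch of that symmetry using only the standard library. The carrier type, the x-trace-id key, and the inject, extract, and roundTrip functions are all hypothetical simplifications; a real propagator such as OpenTelemetry's TraceContext carries a full span context rather than a bare string.

```go
package main

import "context"

// carrier is a hypothetical text map that travels with the request,
// for example as HTTP headers or gRPC metadata.
type carrier map[string]string

// traceIDKey is the private context key under which the trace ID is stored.
type traceIDKey struct{}

// inject copies the trace ID held in ctx into the carrier (client side).
func inject(ctx context.Context, c carrier) {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok {
		c["x-trace-id"] = id
	}
}

// extract returns a context holding the trace ID found in the carrier
// (server side). If the carrier has none, ctx is returned unchanged.
func extract(ctx context.Context, c carrier) context.Context {
	if id, ok := c["x-trace-id"]; ok {
		return context.WithValue(ctx, traceIDKey{}, id)
	}
	return ctx
}

// roundTrip simulates one hop: the client injects, the carrier crosses the
// network, and the server extracts into a fresh context.
func roundTrip(traceID string) string {
	clientCtx := context.WithValue(context.Background(), traceIDKey{}, traceID)
	headers := carrier{}
	inject(clientCtx, headers) // client middleware, before sending

	serverCtx := extract(context.Background(), headers) // server middleware, on receipt
	id, _ := serverCtx.Value(traceIDKey{}).(string)
	return id
}
```

The server ends up with the same trace ID the client started with, which is how spans created in different processes join a single trace.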
- A closure is registered with defer
// A closure is needed here: the arguments of a deferred call are evaluated when the defer statement executes, so without the closure err would still be nil even if the handler later fails
// https://github.com/go-kratos/kratos/issues/927
defer func() { tracer.End(ctx, span, err) }()
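The difference between deferring a call directly and deferring a closure is easy to miss, so here is a minimal sketch that shows both side by side. The record function is a hypothetical stand-in for tracer.End, and the recorded slice exists only to make the observed values visible.

```go
package main

import "errors"

// recorded collects what each deferred call observed, in call order.
var recorded []string

// record is a hypothetical stand-in for tracer.End: it notes whether it
// was handed an error.
func record(label string, err error) {
	if err != nil {
		recorded = append(recorded, label+": "+err.Error())
	} else {
		recorded = append(recorded, label+": nil")
	}
}

// withoutClosure defers record directly, so err is evaluated at the defer
// statement, while it is still nil.
func withoutClosure() (err error) {
	defer record("direct", err)
	err = errors.New("handler failed")
	return err
}

// withClosure defers a closure, so err is read only when the closure runs,
// after the handler has assigned it.
func withClosure() (err error) {
	defer func() { record("closure", err) }()
	err = errors.New("handler failed")
	return err
}
```

Calling withoutClosure and then withClosure leaves recorded as "direct: nil" followed by "closure: handler failed". Only the closure form sees the error, which is why the middleware wraps tracer.End in a closure.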
- The middleware continues by calling the next handler
// tracing.go, line 69
reply, err = handler(ctx, req)
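Here handler is the next link in the middleware chain. The following is a minimal sketch of how such a chain nests, using only the standard library. The handler and middleware types are defined locally and modeled on the handler signature seen in the excerpt above; chain, traced, and callOrder are hypothetical names, and the real Kratos implementation may differ.

```go
package main

import "context"

// handler and middleware are local types for this sketch; they are not
// imported from Kratos.
type handler func(ctx context.Context, req interface{}) (interface{}, error)
type middleware func(handler) handler

// chain composes middlewares so that the first one listed is outermost:
// it runs first on the way in and last on the way out.
func chain(ms ...middleware) middleware {
	return func(next handler) handler {
		for i := len(ms) - 1; i >= 0; i-- {
			next = ms[i](next)
		}
		return next
	}
}

// traced returns a middleware that appends start and end marks to steps,
// mimicking how a tracing middleware starts a span, calls the next handler,
// and ends the span in a deferred closure.
func traced(name string, steps *[]string) middleware {
	return func(next handler) handler {
		return func(ctx context.Context, req interface{}) (interface{}, error) {
			*steps = append(*steps, "start "+name)
			defer func() { *steps = append(*steps, "end "+name) }()
			return next(ctx, req)
		}
	}
}

// callOrder sends one request through two nested middlewares and returns
// the order in which the steps executed.
func callOrder() []string {
	var steps []string
	final := func(ctx context.Context, req interface{}) (interface{}, error) {
		steps = append(steps, "handler")
		return req, nil
	}
	h := chain(traced("outer", &steps), traced("inner", &steps))(final)
	_, _ = h(context.Background(), "ping")
	return steps
}
```

callOrder returns start outer, start inner, handler, end inner, end outer. The span opened by the outermost middleware therefore covers everything that runs inside it.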
- When the deferred closure runs, it calls the End method in tracer.go
func (t *Tracer) End(ctx context.Context, span trace.Span, err error) {
	// If an error occurred, record the error details on the span
	if err != nil {
		// Record the error
		span.RecordError(err)
		// Set span attributes
		span.SetAttributes(
			// Mark the event as an error
			attribute.String("event", "error"),
			// Set message to err.Error()
			attribute.String("message", err.Error()),
		)
		// Set the span status
		span.SetStatus(codes.Error, err.Error())
	} else {
		// If no error occurred, the span status is OK
		span.SetStatus(codes.Ok, "OK")
	}
	// End the span
	span.End()
}
How to use
Examples of the tracing middleware can be found in kratos/examples/traces, which implements simple cross-service tracing. The code snippets below are taken from those examples.
// https://github.com/go-kratos/kratos/blob/7f835db398c9d0332e69b81bad4c652b4b45ae2e/examples/traces/app/message/main.go#L38
// First, use the OpenTelemetry SDK to build a TracerProvider
func tracerProvider(url string) (*tracesdk.TracerProvider, error) {
	// examples/traces uses Jaeger; see the OpenTelemetry documentation for other exporters
	exp, err := jaeger.NewRawExporter(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(url)))
	if err != nil {
		return nil, err
	}
	tp := tracesdk.NewTracerProvider(
		tracesdk.WithSampler(tracesdk.AlwaysSample()),
		// Set up the batcher and register the Jaeger exporter
		tracesdk.WithBatcher(exp),
		// Record some default resource attributes
		tracesdk.WithResource(resource.NewWithAttributes(
			semconv.ServiceNameKey.String(pb.User_ServiceDesc.ServiceName),
			attribute.String("environment", "development"),
			attribute.Int64("ID", 1),
		)),
	)
	return tp, nil
}
Using it in a gRPC server
// https://github.com/go-kratos/kratos/blob/main/examples/traces/app/message/main.go
grpcSrv := grpc.NewServer(
	grpc.Address(":9000"),
	grpc.Middleware(
		middleware.Chain(
			recovery.Recovery(),
			// Configure the tracing middleware
			tracing.Server(
				tracing.WithTracerProvider(tp),
			),
			logging.Server(logger),
		),
	))
Using it in a gRPC client
// https://github.com/go-kratos/kratos/blob/149fc0195eb62ee1fbc2728adb92e1bcd1a12c4e/examples/traces/app/user/main.go#L63
conn, err := grpc.DialInsecure(ctx,
	grpc.WithEndpoint("127.0.0.1:9000"),
	grpc.WithMiddleware(middleware.Chain(
		tracing.Client(
			tracing.WithTracerProvider(s.tracer),
			tracing.WithPropagators(
				propagation.NewCompositeTextMapPropagator(propagation.Baggage{}, propagation.TraceContext{}),
			),
		),
		recovery.Recovery())),
	grpc.WithTimeout(2*time.Second),
)
Using it in an HTTP server
// https://github.com/go-kratos/kratos/blob/main/examples/traces/app/user/main.go
httpSrv := http.NewServer(http.Address(":8000"))
httpSrv.HandlePrefix("/", pb.NewUserHandler(s,
	http.Middleware(
		middleware.Chain(
			recovery.Recovery(),
			// Configure the tracing middleware
			tracing.Server(
				tracing.WithTracerProvider(tp),
				tracing.WithPropagators(
					propagation.NewCompositeTextMapPropagator(propagation.Baggage{}, propagation.TraceContext{}),
				),
			),
			logging.Server(logger),
		),
	)),
)
Using it in an HTTP client
http.NewClient(ctx, http.WithMiddleware(
	tracing.Client(
		tracing.WithTracerProvider(s.tracer),
	),
))
How to implement tracing for other scenarios
We can use the Kratos tracing middleware code as a reference to trace other components, such as a database. For example, the author followed the same approach to trace MongoDB operations performed through the qmgo library.
func mongoTracer(ctx context.Context, tp trace.TracerProvider, command interface{}) {
	otel.SetTracerProvider(tp)
	var (
		commandName string
		failure     string
		nanos       int64
		reply       bson.Raw
		queryId     int64
		eventName   string
	)
	reply = bson.Raw{}
	// Collect the fields of interest from whichever command monitoring event was received
	switch value := command.(type) {
	case *event.CommandStartedEvent:
		commandName = value.CommandName
		reply = value.Command
		queryId = value.RequestID
		eventName = "CommandStartedEvent"
	case *event.CommandSucceededEvent:
		commandName = value.CommandName
		nanos = value.DurationNanos
		queryId = value.RequestID
		eventName = "CommandSucceededEvent"
	case *event.CommandFailedEvent:
		commandName = value.CommandName
		failure = value.Failure
		nanos = value.DurationNanos
		queryId = value.RequestID
		eventName = "CommandFailedEvent"
	}
	// DurationNanos is already in nanoseconds, so it converts directly to a time.Duration
	duration := time.Duration(nanos)
	tracer := otel.Tracer("mongodb")
	// SpanKindServer is what the author chose; OpenTelemetry's database conventions normally use SpanKindClient for outgoing database calls
	kind := trace.SpanKindServer
	_, span := tracer.Start(ctx,
		commandName,
		trace.WithAttributes(
			attribute.String("event", eventName),
			attribute.String("command", commandName),
			attribute.String("query", reply.String()),
			attribute.Int64("queryId", queryId),
			attribute.String("ms", duration.String()),
		),
		trace.WithSpanKind(kind),
	)
	// Record the failure message, if any, as an error on the span
	if failure != "" {
		span.RecordError(errors.New(failure))
	}
	span.End()
}
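The function above opens and immediately ends a separate span for every event, and records the command duration as an attribute. One possible alternative, offered here only as a sketch and not as the author's method, is to open a span on the started event and close it on the matching succeeded or failed event, using the request ID to pair them. The span and commandSpans types below are hypothetical stand-ins built on the standard library; with the real driver and OpenTelemetry you would store trace.Span values instead.

```go
package main

import (
	"sync"
	"time"
)

// span is a hypothetical, minimal stand-in for a tracing span.
type span struct {
	Name  string
	Start time.Time
	End   time.Time
	Err   string // empty when the command succeeded
}

// commandSpans pairs start and finish events by request ID so that each
// database command produces exactly one span covering its whole duration.
type commandSpans struct {
	mu       sync.Mutex
	inFlight map[int64]*span
	done     []*span
}

// newCommandSpans returns an empty registry ready for use.
func newCommandSpans() *commandSpans {
	return &commandSpans{inFlight: make(map[int64]*span)}
}

// started opens a span when a command-started event arrives.
func (c *commandSpans) started(requestID int64, command string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.inFlight[requestID] = &span{Name: command, Start: time.Now()}
}

// finish closes the span opened for requestID when a succeeded or failed
// event arrives. It reports false if no matching start event was seen.
func (c *commandSpans) finish(requestID int64, failure string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.inFlight[requestID]
	if !ok {
		return false
	}
	delete(c.inFlight, requestID)
	s.End = time.Now()
	s.Err = failure
	c.done = append(c.done, s)
	return true
}
```

The mutex matters because a driver may deliver events for different commands from different goroutines. Removing the entry in finish also keeps the in-flight map from growing without bound.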
References
- Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
- OpenTelemetry website
- KubeCon 2019 OpenTelemetry talk
- Kratos framework
- Kratos traces example