I have been using SkyWalking recently, and the experience has been very smooth, which I really appreciate. So this article looks specifically at SkyWalking's architecture and design.

I’ve had some related experience before:

  • Wrote a SkyWalking plugin for RPC services
  • Read through many Byte Buddy use cases, such as Mockito
  • Used distributed tracing services such as Jaeger

But I never did a comprehensive review. This article covers SkyWalking's architecture and design as thoroughly as possible.

For those of you who have never used SkyWalking before, check out the user guide.

SkyWalking's goal

SkyWalking aims to be a full-link distributed tracing solution.

Concepts and principles of SkyWalking

SkyWalking is also based on the OpenTracing specification.

The OpenTracing specification

OpenTracing is a distributed tracing specification. Once a system is split into services and becomes distributed, tracking down a single problem may mean logging in to several machines. OpenTracing provides platform-neutral, vendor-neutral APIs, which makes it easy for developers to add (or swap) a tracing implementation.

However, OpenTracing is not an official standard, and it still requires developers to add instrumentation to their application code.

Span

One concept that is essential to understanding the design of distributed call chains is the span. A span is a unit of work with an operation name and a duration.

  • Each span can have its own tags
  • Each span can have its own logs
  • Spans can have parent-child relationships

In-process

Within a single process, spans look like this:

A parent span's duration contains the durations of its child spans. The tags and logs of each span, however, are independent of each other.
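To make the span concept concrete, here is a minimal sketch of a span as a data structure. The class SimpleSpan and all of its fields and methods are my own hypothetical names, not SkyWalking's or OpenTracing's real classes; timestamps are passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified span model; not a real SkyWalking or OpenTracing class. */
final class SimpleSpan {
    final String operationName;
    final SimpleSpan parent;      // null for the root span
    final long startMillis;
    long endMillis = -1;          // -1 while the span is still open
    final Map<String, String> tags = new LinkedHashMap<>();
    final List<String> logs = new ArrayList<>();

    SimpleSpan(String operationName, SimpleSpan parent, long startMillis) {
        this.operationName = operationName;
        this.parent = parent;
        this.startMillis = startMillis;
    }

    /** Tags belong to this span only; the parent's tags are untouched. */
    SimpleSpan tag(String key, String value) {
        tags.put(key, value);
        return this;
    }

    SimpleSpan log(String message) {
        logs.add(message);
        return this;
    }

    void finish(long endMillis) {
        this.endMillis = endMillis;
    }

    long durationMillis() {
        if (endMillis < 0) {
            throw new IllegalStateException("span not finished");
        }
        return endMillis - startMillis;
    }

    /** True when this span's time window lies inside its parent's window. */
    boolean isWithinParent() {
        return parent != null
                && parent.endMillis >= 0 && endMillis >= 0
                && startMillis >= parent.startMillis
                && endMillis <= parent.endMillis;
    }
}
```

For example, a parent span running from 0 to 50 ms with a child running from 10 to 40 ms gives a child duration of 30 ms that sits inside the parent's 50 ms, while a tag set on the child does not appear on the parent.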

Cross-process

Within a process the SpanContext does not need to be passed explicitly, but across processes it does. There are many ways to make a cross-process call, such as HTTP, gRPC, or Kafka.

To unify these behind one abstraction, OpenTracing defines the Tracer API (io.opentracing.Tracer), which reads and writes the SpanContext through a carrier.
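The carrier idea can be sketched with a plain map standing in for the transport's headers. This is only an illustration under my own assumptions: SimpleSpanContext, TextMapPropagator, and the two header names are hypothetical, and the code is not the real io.opentracing API, although it mirrors the inject/extract pair that the Tracer exposes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified context; a real SpanContext carries more fields. */
final class SimpleSpanContext {
    final String traceId;
    final String spanId;

    SimpleSpanContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }
}

/** Hypothetical propagator that uses a string map as the carrier. */
final class TextMapPropagator {
    // Made-up header names, chosen only for this sketch.
    static final String TRACE_ID_KEY = "x-demo-trace-id";
    static final String SPAN_ID_KEY = "x-demo-span-id";

    /** Caller side: write the context into the carrier before sending. */
    static void inject(SimpleSpanContext context, Map<String, String> carrier) {
        carrier.put(TRACE_ID_KEY, context.traceId);
        carrier.put(SPAN_ID_KEY, context.spanId);
    }

    /** Callee side: rebuild the context, or return null if there is no upstream. */
    static SimpleSpanContext extract(Map<String, String> carrier) {
        String traceId = carrier.get(TRACE_ID_KEY);
        String spanId = carrier.get(SPAN_ID_KEY);
        if (traceId == null || spanId == null) {
            return null;
        }
        return new SimpleSpanContext(traceId, spanId);
    }
}
```

Because the carrier is just key/value pairs, the same two methods work whether the pairs end up in HTTP headers, gRPC metadata, or Kafka message headers; only the code that copies the map into the transport changes.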

Carrier handling in SkyWalking

  • The caller puts the ContextCarrier into the request
final ContextCarrier contextCarrier = new ContextCarrier();
String remotePeer = meta.getCallInfo().getCallee();
// Creating the exit span also injects the current trace context into contextCarrier.
AbstractSpan span = ContextManager.createExitSpan(invocation.getFunc(), contextCarrier, remotePeer);
// ...
// Copy every carrier item into the request attachments.
CarrierItem next = contextCarrier.items();
while (next.hasNext()) {
    next = next.next();
    request.getAttachments().put(next.getHeadKey(), next.getHeadValue());
}


This code attaches the ContextCarrier's items to the request as attachments, so they travel with the request to the callee.

  • The callee reads the request's attachments back into the CarrierItem
CarrierItem next = contextCarrier.items();
while (next.hasNext()) {
    next = next.next();
    String headKey = next.getHeadKey();
    next.setHeadValue(new String((byte[]) request.getAttachment(headKey)));
}

This code copies the attachments back into the ContextCarrier. As a result, the caller and the callee share the same carrier and are linked together, so the UI shows them in a single trace. In the diagram:

  • span3 calls the service on the server side
  • The blue part is the server-side processing, where span4 becomes a child of span3.

Several different types of spans

Keep these concepts in mind when programming.

  • ExitSpan: generally used on the client side, for outgoing calls
  • EntrySpan: generally used as the entry point of a service
  • LocalSpan: used for local calls within a process

Key tool: non-intrusive bytecode enhancement

SkyWalking is not alone here: many other frameworks also use non-intrusive bytecode enhancement, which offers the simplest user experience.

Principles of bytecode enhancement

  • javaagent and Instrumentation

JVMTI (the JVM Tool Interface) lets third-party tools connect to and access the JVM as an agent, and its rich programming interface supports many JVM-related functions. At JVM startup, the agent JAR is passed in with the JVM parameter -javaagent and the instrument agent is loaded; alternatively, the agent can be attached after the JVM has started.

SkyWalking works by using the JVM parameter -javaagent at JVM startup.

public static void premain(String agentArgs, Instrumentation inst) {
    // SkyWalking's transformer is highly extensible and supports a plugin mechanism.
    inst.addTransformer(new MyClassFileTransformer());
}
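For readers who have not written an agent, here is a minimal sketch of what a transformer registered in premain might look like, using only the JDK's java.lang.instrument API. LoggingTransformer and DemoAgent are hypothetical names, and this transformer only records class names instead of rewriting bytecode, so it is far simpler than what SkyWalking actually does.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical transformer that only observes classes; it rewrites nothing. */
final class LoggingTransformer implements ClassFileTransformer {
    private final String packagePrefix;   // internal form, e.g. "com/example/"
    final List<String> seen = new CopyOnWriteArrayList<>();

    LoggingTransformer(String packagePrefix) {
        this.packagePrefix = packagePrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className != null && className.startsWith(packagePrefix)) {
            seen.add(className);
            // A real agent would return rewritten bytecode for this class here.
        }
        return null; // null tells the JVM to keep the original bytes
    }
}

/**
 * Hypothetical agent entry point. In a real agent this class would be public
 * and named in the agent jar manifest's Premain-Class entry; it is
 * package-private here only so the sketch fits in one file.
 */
final class DemoAgent {
    // Invoked by the JVM before main() when started with -javaagent:<agent jar>
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer("com/example/"));
    }
}
```

The JVM calls transform for each class as it is loaded, with the name in internal form (slashes instead of dots), which is why the prefix is written as "com/example/".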

Bytecode frameworks

  • ASM is a low-level bytecode framework that is quite hard to use directly
  • Byte Buddy is also a bytecode framework, and it is easy to use

A bytecode framework modifies the bytecode, and JVMTI together with the javaagent mechanism lets the modified class replace the original.

Tracing implementation principles

The SkyWalking agent uses a plugin mechanism, which keeps it extensible and lets more developers contribute. All plugins are loaded at agent startup and applied through bytecode enhancement.

A plugin has two core problems to solve:

(1) Creating the spans that make up the trace call chain

(2) Deciding how to propagate the trace context. For Kafka, for example, it has to go into the Kafka message headers; for HTTP, into the HTTP headers.
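The first problem, wrapping a call so that a span starts before it and stops after it, can be sketched without any bytecode library. The example below swaps in a JDK dynamic proxy (java.lang.reflect.Proxy) as a stand-in for bytecode enhancement, so it is not how the agent really hooks methods; Greeter, TracingHandler, and the event strings are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical business interface used only for this sketch. */
interface Greeter {
    String greet(String name);
}

/** Hypothetical interceptor: records where a plugin would start and stop a span. */
final class TracingHandler implements InvocationHandler {
    private final Object target;
    final List<String> events = new ArrayList<>(); // stands in for a span reporter

    TracingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        events.add("start " + method.getName());       // "before" hook: create the span
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            events.add("error " + method.getName());    // "exception" hook: mark the span
            throw e.getCause();
        } finally {
            events.add("stop " + method.getName());     // "after" hook: stop the span
        }
    }

    /** Wraps the handler's target in a proxy that implements the given interface. */
    static <T> T wrap(Class<T> iface, TracingHandler handler) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }
}
```

Calling greet on the wrapped object returns the original result and leaves "start greet" followed by "stop greet" in events. The business code never mentions tracing, which is the same property that bytecode enhancement gives a plugin, just achieved here with a much more limited mechanism.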

SkyWalking’s overall architecture

Official website source: github.com/apache/skyw…

SkyWalking currently aims to be an all-in-one solution for tracing, metrics, and logging.

Functional decomposition

As the overall architecture diagram shows, SkyWalking's functions are clearly separated and strongly decoupled.

  • Data collection: tracing relies on the probe (agent), metrics rely on Prometheus or the newer OpenTelemetry, and logs go through ES or Fluentd.
  • Data transmission: data is sent to the SkyWalking receiver via Kafka, gRPC, or HTTP.
  • Data parsing and analysis: the OAP system parses and analyzes the data.
  • Data storage: the back-end storage interface supports multiple implementations, such as ES.
  • UI module: data is queried through GraphQL and displayed by a front end built with Vue.
  • Alarm: multiple alarm channels are supported. The latest version supports DingTalk.

To enable monitoring and tracing for a microservice, you only need to add the corresponding agent.

Other

Sampling rate

The SkyWalking agent can lower the sampling rate to reduce the amount of data uploaded.

Set the number of traces sampled every 3 seconds with agent.sample_n_per_3_secs. A value between 500 and 2000 is usually appropriate. The default is -1, which means full sampling. Once an agent-side sampling rate is set, if the upstream of a call chain has already sampled a trace, the downstream ignores its own sampling rate and samples it as well, which keeps the trace call chain complete.
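The sampling rule described above can be sketched as a small counter that resets every 3 seconds. This is a hypothetical illustration, not SkyWalking's actual sampling code: WindowSampler and its method names are mine, the clock is passed in so the logic is easy to test, and I assume that forced samples do not count against the local quota.

```java
/** Hypothetical "N per 3 seconds" sampler; not SkyWalking's real implementation. */
final class WindowSampler {
    private static final long WINDOW_MILLIS = 3_000;

    private final int maxPerWindow;   // <= 0 means "sample everything"
    private long windowStart;
    private int takenInWindow;

    WindowSampler(int maxPerWindow, long nowMillis) {
        this.maxPerWindow = maxPerWindow;
        this.windowStart = nowMillis;
    }

    /**
     * Decides whether to sample a trace.
     *
     * @param nowMillis       current time, e.g. System.currentTimeMillis()
     * @param upstreamSampled true when the incoming request already carries a sampled trace
     */
    synchronized boolean trySample(long nowMillis, boolean upstreamSampled) {
        if (upstreamSampled || maxPerWindow <= 0) {
            return true;              // forced or full sampling keeps the chain complete
        }
        if (nowMillis - windowStart >= WINDOW_MILLIS) {
            windowStart = nowMillis;  // start a new 3-second window
            takenInWindow = 0;
        }
        if (takenInWindow < maxPerWindow) {
            takenInWindow++;
            return true;
        }
        return false;                 // quota for this window is used up
    }
}
```

With a limit of 2, the first two local traces in a window are sampled and the third is dropped, but a request whose upstream already sampled it still goes through, and the quota is available again once 3 seconds have passed.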

Performance

Controlling performance overhead: because the agent runs in a production environment and must not seriously affect existing code, its performance overhead has to be kept under control.

  • SkyWalking’s performance profiling requires no manual instrumentation points, so compared with intrusively adding log statements it incurs no extra log-printing overhead.
  • Send logs asynchronously in batches.
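The asynchronous batching idea can be sketched as a bounded buffer that business threads write to without blocking and that a background thread drains in batches. BatchBuffer is a hypothetical class of my own, and dropping items when the buffer is full is an assumption I chose for the sketch, not a statement about SkyWalking's behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical bounded buffer that hands data over in batches. */
final class BatchBuffer<T> {
    private final BlockingQueue<T> queue;
    private final int batchSize;

    BatchBuffer(int capacity, int batchSize) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.batchSize = batchSize;
    }

    /** Business thread: never blocks; returns false (item dropped) when full. */
    boolean offer(T item) {
        return queue.offer(item);
    }

    /** Background thread: removes and returns up to batchSize items at once. */
    List<T> drainBatch() {
        List<T> batch = new ArrayList<>(batchSize);
        queue.drainTo(batch, batchSize);
        return batch;
    }
}
```

In practice a single background thread (for example one scheduled with a ScheduledExecutorService) would call drainBatch periodically and send each non-empty batch in one network call, so the request path only pays for a non-blocking offer.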

Conclusion

This article gave an overview of SkyWalking's architecture and design. Overall, SkyWalking has a simple architecture, is easy to adopt thanks to non-intrusive bytecode enhancement, and is highly extensible, which makes it a very convenient distributed monitoring solution for Java programs.