I have used SkyWalking, the experience was very smooth, and I really appreciate it. So I decided to write specifically about SkyWalking's architecture and design.
I have some related experience:
- Wrote a SkyWalking plugin for an RPC service
- Read many Byte Buddy usage examples, such as Mockito
- Used distributed tracing systems such as Jaeger
But I never did a comprehensive review. This article covers SkyWalking's architecture and design as thoroughly as possible.
If you have never used SkyWalking before, check out the user guide first.
SkyWalking's goal
SkyWalking aims to be a full-link monitoring solution for distributed systems.
Concepts and principles of SkyWalking
SkyWalking's tracing model is also based on the concepts of the OpenTracing specification.
The OpenTracing specification
OpenTracing is a distributed tracing specification. Once we split a system into distributed services, troubleshooting a single problem may require logging in to multiple machines. By providing platform-neutral, vendor-neutral APIs, OpenTracing makes it easy for developers to add (or replace) a tracing system implementation.
However, OpenTracing is not a standard, and it still requires developers to add instrumentation to their application code.
Span
One concept that is essential to understanding the design of distributed call chains is the Span. A Span is a unit of work with an operation name and a duration.
- Each span can have its own tags
- Each span can have its own logs
- Spans can have parent-child relationships
Within a process
Within the same process, spans form a tree: a parent span's duration covers the duration of its child spans, but the tags and logs of each span are independent of each other.
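To make the in-process relationship concrete, here is a minimal Java sketch. `SimpleSpan` and `InProcessSpans` are hypothetical classes, not SkyWalking's `AbstractSpan`, and timestamps are passed explicitly (an assumption made for determinism) instead of being read from a clock. It shows that the parent's duration covers the child's, while each span keeps its own tags and logs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified span model; not SkyWalking's AbstractSpan.
class SimpleSpan {
    final String operationName;
    final SimpleSpan parent;                            // null for the root span
    final List<SimpleSpan> children = new ArrayList<>();
    final Map<String, String> tags = new HashMap<>();   // tags belong to this span only
    final List<String> logs = new ArrayList<>();        // logs belong to this span only
    final long startMillis;
    long endMillis;

    SimpleSpan(String operationName, SimpleSpan parent, long startMillis) {
        this.operationName = operationName;
        this.parent = parent;
        this.startMillis = startMillis;
        if (parent != null) {
            parent.children.add(this);                  // build the in-process span tree
        }
    }

    void finish(long endMillis) { this.endMillis = endMillis; }

    long duration() { return endMillis - startMillis; }
}

public class InProcessSpans {
    public static void main(String[] args) {
        SimpleSpan parent = new SimpleSpan("/order/create", null, 0);
        parent.tags.put("http.method", "POST");

        SimpleSpan child = new SimpleSpan("OrderDao.insert", parent, 10);
        child.tags.put("db.type", "mysql");
        child.logs.add("insert ok");
        child.finish(40);                               // child runs from 10 to 40

        parent.finish(50);                              // parent runs from 0 to 50, covering the child

        System.out.println(parent.duration() + " " + child.duration()); // 50 30
        System.out.println(parent.tags.containsKey("db.type"));        // false
    }
}
```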
Across processes
When a trace crosses process boundaries, the context has to be passed along as a SpanContext. However, there are many ways to make cross-process calls, such as HTTP, gRPC, and Kafka.
To unify this, OpenTracing defines the Tracer API (io.opentracing.Tracer), which injects the SpanContext into a carrier and extracts it from the carrier on the other side.
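The carrier idea can be sketched as follows, assuming a simplified design: `Carrier`, `SimpleTracer`, and the `trace-id`/`span-id` header names are hypothetical and only mimic the inject/extract split of `io.opentracing.Tracer`, not its real signatures. The same tracer code works whether the transport stores header values as strings (HTTP-style) or as byte arrays (Kafka-style). The `record` requires Java 16 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical carrier abstraction: the tracer only needs put/get of string values.
interface Carrier {
    void put(String key, String value);
    String get(String key);
}

// HTTP-style carrier: header values are plain strings.
class HttpHeadersCarrier implements Carrier {
    final Map<String, String> headers = new HashMap<>();
    public void put(String key, String value) { headers.put(key, value); }
    public String get(String key) { return headers.get(key); }
}

// Kafka-style carrier: header values are byte arrays.
class ByteHeadersCarrier implements Carrier {
    final Map<String, byte[]> headers = new HashMap<>();
    public void put(String key, String value) {
        headers.put(key, value.getBytes(StandardCharsets.UTF_8));
    }
    public String get(String key) {
        byte[] v = headers.get(key);
        return v == null ? null : new String(v, StandardCharsets.UTF_8);
    }
}

// Minimal span context: only the IDs needed to link spans across processes.
record SpanContext(String traceId, String spanId) {}

class SimpleTracer {
    static final String TRACE_ID = "trace-id";   // hypothetical header names
    static final String SPAN_ID = "span-id";

    // Caller side: write the context into whatever carrier the transport uses.
    void inject(SpanContext ctx, Carrier carrier) {
        carrier.put(TRACE_ID, ctx.traceId());
        carrier.put(SPAN_ID, ctx.spanId());
    }

    // Callee side: rebuild the context, or return null if the caller sent none.
    SpanContext extract(Carrier carrier) {
        String traceId = carrier.get(TRACE_ID);
        return traceId == null ? null : new SpanContext(traceId, carrier.get(SPAN_ID));
    }
}

public class CarrierDemo {
    public static void main(String[] args) {
        SimpleTracer tracer = new SimpleTracer();
        SpanContext ctx = new SpanContext("t-1", "s-3");
        for (Carrier c : new Carrier[] {new HttpHeadersCarrier(), new ByteHeadersCarrier()}) {
            tracer.inject(ctx, c);
            System.out.println(tracer.extract(c)); // SpanContext[traceId=t-1, spanId=s-3]
        }
    }
}
```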
Carrier handling in SkyWalking
- On the caller side, put the ContextCarrier into the request
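```java
import org.apache.skywalking.apm.agent.core.context.CarrierItem;
import org.apache.skywalking.apm.agent.core.context.ContextCarrier;
import org.apache.skywalking.apm.agent.core.context.ContextManager;
import org.apache.skywalking.apm.agent.core.context.trace.AbstractSpan;

// meta, invocation and request belong to the instrumented RPC framework.
final ContextCarrier contextCarrier = new ContextCarrier();
String remotePeer = meta.getCallInfo().getCallee();
// Create an exit span for the outgoing call; this also fills the carrier.
AbstractSpan span = ContextManager.createExitSpan(invocation.getFunc(), contextCarrier, remotePeer);
CarrierItem next = contextCarrier.items();
while (next.hasNext()) {
    next = next.next();
    // Copy each carrier item (header key/value) into the request attachments.
    request.getAttachments().put(next.getHeadKey(), next.getHeadValue());
}
```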
This code passes the ContextCarrier to the callee by adding its items to the request's attachments.
- On the callee side, copy the request's attachments back into the CarrierItems
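```java
ContextCarrier contextCarrier = new ContextCarrier();
CarrierItem next = contextCarrier.items();
while (next.hasNext()) {
    next = next.next();
    Object value = request.getAttachment(next.getHeadKey());
    if (value != null) { // the header is absent if the caller was not traced
        next.setHeadValue(new String((byte[]) value, java.nio.charset.StandardCharsets.UTF_8));
    }
}
// Create the entry span from the filled carrier; operationName is a placeholder.
AbstractSpan span = ContextManager.createEntrySpan(operationName, contextCarrier);
```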
This code copies the attachments back into the ContextCarrier, so the caller and the callee share the same carrier data and their spans are linked into one trace. In the UI, they are displayed together:
- Span3 calls the service on the server side.
- Span4, the server-side processing, becomes a child of Span3.
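The effect of this hand-off can be sketched without the agent. In this hypothetical model, `TracedSpan`, `clientCall`, `serverHandle`, and the `sw-trace-id`/`sw-parent-span-id` keys are illustrative only; the real SkyWalking carrier uses its own header format (for example `sw8` in SkyWalking 8.x) and links trace segments rather than bare span IDs. The client creates an exit span (Span3) and writes its context into a map that stands in for the RPC attachments; the server reads it and creates an entry span (Span4) whose parent is Span3.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical span model used only to show cross-process linking.
class TracedSpan {
    final String traceId;
    final String spanId;
    final String parentSpanId;   // null when the span has no remote parent
    final String kind;           // "Entry", "Exit" or "Local"

    TracedSpan(String traceId, String spanId, String parentSpanId, String kind) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
        this.kind = kind;
    }
}

public class CrossProcessDemo {
    // Caller: create an exit span and write its context into the attachments.
    static TracedSpan clientCall(String traceId, Map<String, String> attachments) {
        TracedSpan exit = new TracedSpan(traceId, "span3", null, "Exit");
        attachments.put("sw-trace-id", exit.traceId);
        attachments.put("sw-parent-span-id", exit.spanId);
        return exit;
    }

    // Callee: read the attachments and create an entry span linked to the caller.
    static TracedSpan serverHandle(Map<String, String> attachments) {
        return new TracedSpan(attachments.get("sw-trace-id"), "span4",
                attachments.get("sw-parent-span-id"), "Entry");
    }

    public static void main(String[] args) {
        Map<String, String> attachments = new HashMap<>(); // stands in for RPC attachments
        TracedSpan span3 = clientCall("trace-42", attachments);
        TracedSpan span4 = serverHandle(attachments);
        System.out.println(span4.traceId + " " + span4.parentSpanId); // trace-42 span3
    }
}
```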
Span types
Keep these concepts in mind when writing plugins.
- ExitSpan: generally used on the client side, for outgoing calls
- EntrySpan: generally used as the entry point of a service
- LocalSpan: used for local (in-process) method calls
The secret weapon: non-intrusive bytecode enhancement
Not only SkyWalking but many other frameworks use non-intrusive bytecode enhancement, because it offers the simplest user experience: application code does not need to change.
Principles of bytecode enhancement
- Java agent and Instrumentation
JVMTI (JVM Tool Interface) lets third-party tools connect to and access the JVM as agents, and its rich programming interface can be used to implement many JVM-related features. The agent JAR is passed in at JVM startup with the -javaagent JVM parameter, which loads the instrumentation agent; alternatively, an agent can be attached to a JVM that is already running.
SkyWalking loads its agent at JVM startup through the -javaagent JVM parameter.
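```java
import java.lang.instrument.Instrumentation;

public class MyAgent { // placeholder name; declare it as Premain-Class in the agent JAR manifest
    public static void premain(String agentArgs, Instrumentation inst) {
        // SkyWalking's transformer is highly extensible and supports a plugin mechanism.
        inst.addTransformer(new MyClassFileTransformer());
    }
}
```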
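The premain example uses `MyClassFileTransformer` without defining it. A minimal, hypothetical version might look like the following: it implements the standard `java.lang.instrument.ClassFileTransformer` interface, records class names under an assumed package prefix (`com/example/`), and returns `null` so the JVM keeps the original bytecode. A real agent such as SkyWalking's would return rewritten bytes instead.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical transformer: observes classes, does not modify them.
public class MyClassFileTransformer implements ClassFileTransformer {
    private final String packagePrefix;                 // JVM internal name form, e.g. "com/example/"
    final List<String> seen = new CopyOnWriteArrayList<>(); // classes may load on many threads

    public MyClassFileTransformer() {
        this("com/example/");                           // assumed default prefix
    }

    public MyClassFileTransformer(String packagePrefix) {
        this.packagePrefix = packagePrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (className != null && className.startsWith(packagePrefix)) {
            seen.add(className);
            // A real agent would rewrite classfileBuffer here (e.g. with ASM or Byte Buddy)
            // and return the modified bytes.
        }
        // Returning null tells the JVM to keep the original class bytes.
        return null;
    }
}
```

To load it, package the class containing `premain` together with the transformer into a JAR whose manifest names that class in the `Premain-Class` attribute, then start the application with `-javaagent:path/to/agent.jar`.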
Bytecode frameworks
- ASM is a powerful but low-level bytecode framework that is very hard to use directly
- Byte Buddy is a higher-level bytecode framework that is easy to use
A bytecode framework is used to modify the bytecode, and the class replacement itself is done through JVMTI and the javaagent mechanism.
How tracing is implemented
The SkyWalking agent uses a plugin mechanism so that more developers can contribute and the agent stays extensible. All plugins are loaded at agent startup and used for bytecode enhancement.
Every plugin has to solve two core problems:
(1) Creating spans, so that the trace call chain can be displayed.
(2) Propagating the context across processes. For example, a Kafka plugin has to add it to the Kafka message headers, and an HTTP plugin to the HTTP headers.
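Both problems can be illustrated with a small sketch. SkyWalking's agent enhances classes with Byte Buddy; to keep this example dependency-free, it swaps in a JDK dynamic proxy (`java.lang.reflect.Proxy`), which wraps an interface at runtime rather than rewriting bytecode. `RpcClient`, `AroundInterceptor`, `TracingInterceptor`, and the `trace-id` header are hypothetical names. The interceptor (1) records a span start and end around the call and (2) writes a trace header into the call's header map.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical RPC client; headers stand in for the transport's metadata.
interface RpcClient {
    String call(String func, Map<String, String> headers);
}

// Hypothetical around-interceptor contract.
interface AroundInterceptor {
    void before(Method method, Object[] args);
    void after(Method method, Object result);
}

class TracingInterceptor implements AroundInterceptor {
    final List<String> events = new ArrayList<>();

    @SuppressWarnings("unchecked")
    public void before(Method method, Object[] args) {
        events.add("start span " + args[0]);                       // (1) create a span
        ((Map<String, String>) args[1]).put("trace-id", "t-1");    // (2) propagate context
    }

    public void after(Method method, Object result) {
        events.add("stop span");
    }
}

public class InterceptDemo {
    @SuppressWarnings("unchecked")
    static <T> T enhance(T target, Class<T> type, AroundInterceptor interceptor) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(target, args);                // skip toString/equals/hashCode
            }
            interceptor.before(method, args);
            try {
                Object result = method.invoke(target, args);
                interceptor.after(method, result);
                return result;
            } catch (InvocationTargetException e) {
                throw e.getCause();                                // rethrow the original exception
            }
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        RpcClient raw = (func, headers) -> func + " handled, trace-id=" + headers.get("trace-id");
        TracingInterceptor tracing = new TracingInterceptor();
        RpcClient client = enhance(raw, RpcClient.class, tracing);
        System.out.println(client.call("getUser", new HashMap<>())); // getUser handled, trace-id=t-1
        System.out.println(tracing.events);                         // [start span getUser, stop span]
    }
}
```

Unlike this proxy, bytecode enhancement also works on concrete classes and needs no change at the call site, which is why agents prefer it.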
SkyWalking's overall architecture
Source: official repository, github.com/apache/skyw…
SkyWalking currently aims to become an all-in-one solution for tracing, metrics, and logging.
Functional decomposition
As the overall architecture diagram shows, SkyWalking's functions are clearly separated and strongly decoupled.
- Data collection: tracing relies on probes (agents), metrics rely on Prometheus or the newer OpenTelemetry, and logs go through ES or Fluentd.
- Data transmission: data is sent to the SkyWalking Receiver via Kafka, gRPC, and HTTP.
- Data analysis: the OAP (Observability Analysis Platform) performs the analysis.
- Data storage: the backend storage interface supports multiple implementations, such as ES.
- UI: data is queried through GraphQL and displayed by a frontend built with Vue.
- Alarms: multiple alarm channels are supported; the latest version supports DingTalk.
For a microservice to get monitoring and tracing, it only needs to add the corresponding agent.
Other
Sampling rate
The SkyWalking agent can lower the sampling rate to reduce the amount of data uploaded.
Set the number of samples per 3 seconds with agent.sample_n_per_3_secs; a value between 500 and 2000 is generally appropriate. The default is -1, which means full sampling. Once sampling is configured, if a trace was sampled upstream, downstream services ignore their own sampling rate and force sampling, which keeps the trace call chain complete.
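A fixed-window sampler in the spirit of this setting can be sketched as follows. `WindowSampler` is hypothetical and is not SkyWalking's sampling service; the injectable clock is an assumption added for testability. A value of zero or less samples everything, at most N new traces are sampled per 3-second window, and a trace that was already sampled upstream is always kept.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical "N samples per 3 seconds" sampler.
public class WindowSampler {
    private static final long WINDOW_MILLIS = 3_000;

    private final int samplesPerWindow;               // <= 0 means sample every trace
    private final LongSupplier clock;                 // current time in milliseconds
    private final AtomicInteger counter = new AtomicInteger();
    private volatile long windowStart;

    public WindowSampler(int samplesPerWindow, LongSupplier clock) {
        this.samplesPerWindow = samplesPerWindow;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    // Decide whether a trace that starts in this service should be sampled.
    public boolean trySample() {
        if (samplesPerWindow <= 0) {
            return true;                               // full sampling (like the default -1)
        }
        long now = clock.getAsLong();
        if (now - windowStart >= WINDOW_MILLIS) {
            synchronized (this) {
                if (now - windowStart >= WINDOW_MILLIS) {
                    windowStart = now;                 // start a new 3-second window
                    counter.set(0);
                }
            }
        }
        return counter.incrementAndGet() <= samplesPerWindow;
    }

    // A trace already sampled upstream is forced through to keep the chain complete.
    public boolean shouldSample(boolean sampledUpstream) {
        return sampledUpstream || trySample();
    }
}
```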
Performance
Controlling performance overhead: because the agent runs in production and must not seriously affect existing code, its performance overhead has to be kept under control.
- Compared with intrusively adding log statements, SkyWalking's performance profiling needs no manual instrumentation points, and therefore adds no extra log-printing overhead.
- Collected data is sent asynchronously in batches.
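The asynchronous batching idea can be sketched with a bounded queue. `BatchReporter` is a hypothetical class; SkyWalking's agent uses its own buffering module (DataCarrier) and gRPC reporting. In this sketch the application thread only calls the non-blocking `report`, which drops data when the buffer is full (a choice made here so tracing never blocks a request), and a background thread sends up to `batchSize` items per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical asynchronous, batched reporter.
public class BatchReporter<T> {
    private final BlockingQueue<T> buffer;
    private final int batchSize;
    private final Consumer<List<T>> sender;            // e.g. one network call per batch

    public BatchReporter(int capacity, int batchSize, Consumer<List<T>> sender) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.batchSize = batchSize;
        this.sender = sender;
    }

    // Application thread: never blocks; returns false (drops the item) when full.
    public boolean report(T item) {
        return buffer.offer(item);
    }

    // Drain up to batchSize items and send them as one batch; returns the batch size.
    public int drainOnce() {
        List<T> batch = new ArrayList<>(batchSize);
        buffer.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            sender.accept(batch);
        }
        return batch.size();
    }

    // Background consumer thread that keeps draining the buffer until interrupted.
    public Thread startConsumer(long idleSleepMillis) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                if (drainOnce() == 0) {
                    try {
                        Thread.sleep(idleSleepMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // ends the loop
                    }
                }
            }
        }, "batch-reporter");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Dropping on overflow trades completeness for predictable latency; a blocking `put` would make the opposite trade.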
Conclusion
This article gave a brief overview of SkyWalking's architecture and design. Overall, SkyWalking has a simple architecture, is easy to use thanks to non-intrusive bytecode enhancement, and is highly extensible, which makes it a very convenient distributed monitoring solution for Java programs.