In this article, I will introduce distributed link tracing, which is essential for most companies that use microservices, whether traditional microservices or the new generation of Service Mesh. This article is not purely about theory; it moves from theory to practice and walks you through the actual operations, because only by doing so can we really understand the technology in depth.

Overview of distributed link tracing

Before introducing a distributed link tracing system, we need to understand what link tracing is. As the introduction to monitoring systems in this column shows, the observation data of a monitoring system mainly comes from three sources: metrics, logs, and link traces. These data can be divided into two types: request level and aggregate level.

Request-level data mainly comes from real requests, such as an HTTP call or an RPC call; link tracing is of this type and is what this article describes. Aggregate-level data is a measurement over many requests or an aggregation of other parameters, such as QPS or CPU utilization. Log and metric data can be either request-level or aggregate-level, since they may come from actual requests or from information recorded during the system's own diagnostics.

The main logic of link tracing is to record the complete behavior of a request along its call chain, so that functions such as trace query, performance analysis, dependency analysis, and topology visualization can be realized. As shown below:

In the figure above, suppose an interface invocation in the microservice system involves three microservices, with the call relationship A->B->C. Service B also calls a third-party service, Redis, and service C needs to call a MySQL database. So what link tracing actually does is record detailed call information, such as interface response results and time consumed, along the complete chain A->B (B->Redis)->C (C->MySQL).

So how exactly is data recorded along this call chain? Continuing with the call chain above as an example, let's analyze the composition and transmission of link tracing information, so as to further understand the principles and concepts of a distributed link tracing system. The logic is shown in the following diagram:

As shown in the figure above, the object monitored by distributed link tracing is the chain produced by one call after another. The figure shows a complete link (Trace), which the system identifies with a unique TraceId. Each dependent call in the chain generates its own piece of call trace information (a Span). The first Span generated is called the Root Span, and each subsequently generated Span uses the identifier (Sid) of its calling Span as its parent ID (Pid).

In this way, Span information is propagated as context within and across processes as the request executes. Through this chain of Span data, the trace information generated by each call in the link can be connected in series, and the Annotations attached to each Span are the data source for monitoring and analyzing the call chain. This is the basic principle of distributed link tracing.
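To make the Trace/Span relationship concrete, here is a minimal, purely illustrative sketch of the data produced by the A->B->C call above. The field names and values are simplified for explanation and are not SkyWalking's actual storage format:

traceId: 0a3f9b2c            # shared by every Span generated by this one request
spans:
  - sid: 1                   # Root Span, created when the request enters service A
    pid: -1                  # no parent
    service: A
    operation: HTTP GET /order
  - sid: 2
    pid: 1                   # B's Span; the parent ID is carried across the process boundary
    service: B
    operation: RPC createOrder
  - sid: 3
    pid: 2                   # B's call to Redis
    service: B
    operation: Redis SET
  - sid: 4
    pid: 2                   # C's Span, called by B
    service: C
    operation: RPC saveOrder

Each Span would also carry timing and result annotations (start time, duration, status), which is exactly the data used for the time-consumption and error analysis mentioned above.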

At this point, you might wonder: isn't it too resource-intensive to record such a large amount of data? That is true, which is why most link tracing systems provide a setting called the sampling rate, which controls the proportion of traces the system actually collects in order to protect performance. In many cases a large share of the trace data looks the same, so we should focus on the chains that are slow or error-prone; it is not necessary to collect 100% of them.
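As an example of what such a setting looks like, the SkyWalking Java agent introduced below exposes a sampling option in its agent configuration file. The property name below is taken from recent agent versions and should be treated as an assumption to verify against the version you deploy:

# config/agent.config of the SkyWalking Java agent (property name assumed; verify for your version)
# collect at most N traces per 3-second window; a non-positive value disables sampling (everything is collected)
agent.sample_n_per_3_secs=5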

An overview of SkyWalking

Now that we’ve explained what link tracing is from a fundamental point of view, let’s take a look at SkyWalking, one of the most popular distributed link tracing systems.

SkyWalking is an excellent open source APM (Application Performance Management) system. It not only provides distributed tracing functions such as link tracking and link analysis, but also supports a series of application performance monitoring features, such as performance metric analysis, application and service dependency analysis, service topology analysis, and alerting, helping us locate problems effectively.

In terms of data collection, SkyWalking supports a variety of data sources and formats, including non-invasive agent probes for Java, .NET Core, NodeJS, PHP, Python, and other languages, as well as the Service Mesh architecture. Its structure is shown in the figure below:

As shown in the figure above, SkyWalking’s core consists of a Receiver Cluster and an Aggregator Cluster. The Receiver Cluster is the access point of the whole back end and is dedicated to collecting the various metrics and link information reported by services.

The Aggregator Cluster aggregates the collected data and stores the result in a database; ElasticSearch, MySQL, TiDB, and others can be used as required. The aggregated data can then be used to trigger alarms, or be accessed over HTTP by visualization systems such as a GUI or CLI.

In addition, from the perspective of data collection, SkyWalking supports multiple language probes and protocols, covering most of today's mainstream distributed technology stacks. They fall mainly into the following three types:

  • Metrics System: metrics collection. SkyWalking can pull metrics data directly from Prometheus and can receive metrics pushed from Micrometer;

  • Agents: service probes. A probe is integrated into each service to collect link data, i.e. to perform link tracing. SkyWalking provides probes for Java, Go, .NET, PHP, NodeJS, Python, Nginx LUA, and more, and supports transferring data over gRPC or HTTP (see the agent sketch after this list);

  • Service Mesh: SkyWalking also supports monitoring the new-generation microservice architecture, Service Mesh. It can collect data from the data plane and the control plane through the corresponding Service Mesh protocols, enabling observation of the service mesh's link data.
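To make the Agents item above concrete, here is a minimal sketch of attaching the SkyWalking Java agent to a service at startup. The agent path, service name, and app.jar are placeholders, and the collector address assumes the OAP service deployed later in this article (gRPC port 11800); treat the exact property names as assumptions to verify against your agent version:

$ java -javaagent:/path/to/skywalking-agent/skywalking-agent.jar \
       -Dskywalking.agent.service_name=demo-service \
       -Dskywalking.collector.backend_service=oap.skywalking:11800 \
       -jar app.jar

With the agent attached, the service reports its Spans to the OAP over gRPC without any code changes, which is why this approach is described as non-invasive.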

The above is a brief overview of SkyWalking's basics and a brief analysis of its system architecture. SkyWalking has developed rapidly over the past two years, has an active community, and is being used more and more widely for microservice link tracing and application performance monitoring. Due to limited space, we cannot go into more depth here.

SkyWalking installation and deployment

The previous sections introduced the fundamentals of distributed link tracing, with particular emphasis on SkyWalking. Obviously, this article would be of little value if it ended here, because so far it is just theory that is easy to read and forget. That is not my style of sharing, so let's now try SkyWalking out from an experimental point of view.

The following content requires hands-on operation. If that is not convenient right now, for example because you are reading on the subway, bookmark this first and run through the experiment when you have time.

The deployment of SkyWalking mainly involves the back-end OAP Server and the front-end UI, which can be deployed on physical machines, virtual machines, or a Kubernetes cluster according to actual needs. To keep the demonstration environment consistent, we choose to deploy both SkyWalking's back-end service and its UI into a Kubernetes cluster.

SkyWalking can be installed with Helm using the official Kubernetes deployment files, or with manually written Kubernetes manifests. To make the steps easier to learn from, we use the latter method. The specific steps are as follows:

1) Create a dedicated Namespace in the Kubernetes cluster to run the SkyWalking containers. The command is as follows:

$ kubectl create ns skywalking

After the command is executed, you can run the following command to check whether the Namespace is successfully created:

$ kubectl get ns
NAME                   STATUS   AGE
default                Active   10d
kube-node-lease        Active   10d
kube-public            Active   10d
kube-system            Active   10d
kubernetes-dashboard   Active   10d
skywalking             Active   46s

You can see that the skywalking Namespace has been created successfully!

2) Write the Kubernetes deployment files for the SkyWalking UI and OAP Server services

When preparing the Kubernetes deployment files, we need to specify the container images for the SkyWalking UI and OAP Server. These can be built manually from source, or the officially published images can be used directly. For convenience, we use the images already published in the official Docker registry. As shown in the figure:

As shown in the two figures above, we find the official container image versions of the SkyWalking UI and OAP Server in the Docker Hub image repository, and then write the specific deployment files.

Write the SkyWalking server Kubernetes deployment file (skywalking-oap.yml). The details are as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oap
  namespace: skywalking
spec:
  replicas: 1
  selector:
    matchLabels:
      app: oap
      release: skywalking
  template:
    metadata:
      labels:
        app: oap
        release: skywalking
    spec:
      containers:
        - name: oap
          image: apache/skywalking-oap-server:8.3.0-es7
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 11800
              name: grpc
            - containerPort: 12800
              name: rest
---
apiVersion: v1
kind: Service
metadata:
  name: oap
  namespace: skywalking
  labels:
    service: oap
spec:
  ports:
    # RESTful port
    - port: 12800
      name: rest
    # gRPC port
    - port: 11800
      name: grpc
    - port: 1234
      name: page
  selector:
    app: oap

The above is a standard Kubernetes deployment file. For details about the fields in the file, please refer to the Kubernetes documentation.
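Note that the manifest above does not configure a storage back end, so the OAP server falls back to the image's default storage, which is fine for a demo but not for production. If you want the data written to an ElasticSearch cluster, the official OAP image is usually configured through environment variables; the snippet below is a sketch to add under the container spec, and the variable names and the address es-host:9200 are assumptions you should verify against your image version:

          env:
            - name: SW_STORAGE
              value: elasticsearch7
            - name: SW_STORAGE_ES_CLUSTER_NODES
              value: es-host:9200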

Write the SkyWalking UI deployment file (skywalking-ui.yml) as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ui-deployment
  namespace: skywalking
  labels:
    app: ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ui
  template:
    metadata:
      labels:
        app: ui
    spec:
      containers:
        - name: ui
          image: apache/skywalking-ui:8.3.0
          ports:
            - containerPort: 8080
              name: page
          env:
            - name: SW_OAP_ADDRESS
              value: oap:12800
---
apiVersion: v1
kind: Service
metadata:
  name: ui
  namespace: skywalking
  labels:
    service: ui
spec:
  ports:
    - port: 8080
      name: page
      nodePort: 31234
  type: NodePort
  selector:
    app: ui

3) Apply the prepared deployment files with the Kubernetes deployment command

Based on the deployment files written in the previous step, we now apply them directly, as follows:

# Go to the directory where the deployment files are stored, then apply them
$ kubectl apply -f .
deployment.apps/oap created
service/oap created
deployment.apps/ui-deployment created
service/ui created

Run the following command to view the deployment status:

$ kubectl get all -n skywalking
NAME                                 READY   STATUS    RESTARTS   AGE
pod/oap-5f6d6bc4f6-k4mvv             1/1     Running   0          36h
pod/ui-deployment-868c66449d-fffrt   1/1     Running   0          36h

NAME          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                        AGE
service/oap   ClusterIP   10.110.112.244   <none>        12800/TCP,11800/TCP,1234/TCP   36h
service/ui    NodePort    10.100.154.93    <none>        8080:31234/TCP                 36h

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/oap             1/1     1            1           36h
deployment.apps/ui-deployment   1/1     1            1           36h

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/oap-5f6d6bc4f6             1         1         1       36h
replicaset.apps/ui-deployment-868c66449d   1         1         1       36h

You can see that the deployed SkyWalking services are up and running! If this is the first deployment, pulling the images may be slow. If there are problems during deployment, you can also view the logs of the Pod objects, for example:

$ kubectl logs pod/oap-5f6d6bc4f6-k4mvv -n skywalking
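If a Pod stays in Pending or ImagePullBackOff rather than Running, kubectl describe is also helpful for checking its events. The Pod name below is the one from this deployment; yours will differ:

$ kubectl describe pod oap-5f6d6bc4f6-k4mvv -n skywalking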

4) Check the SkyWalking UI's web access address

After the above steps, we have successfully run the SkyWalking UI and OAP Server in the Kubernetes cluster. We can now access the Web UI through the mapped port of the UI service (defined as NodePort 31234 in the Kubernetes deployment file), i.e. **http://NodeIP:31234**, for example:

# NodeIP is the externally reachable IP of a Kubernetes cluster node
http://10.211.55.12:31234/

If you do not know the Kubernetes cluster node IP address, run the following command to query the IP address:

$ kubectl describe node kubernetes
Name:               kubernetes
Roles:              master
...
Addresses:
  InternalIP:  10.211.55.12
  Hostname:    kubernetes
...
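If NodePort access is inconvenient in your environment (for example, the node IPs are not directly reachable from your machine), kubectl port-forward is a simple alternative; the local port 8080 below is an arbitrary choice:

$ kubectl port-forward svc/ui 8080:8080 -n skywalking
# then open http://localhost:8080 in a browser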

The interface displayed after access looks as follows:

As shown in the figure above, SkyWalking is running successfully. Since no service has been connected yet, there is no monitoring data for the time being!

Afterword

As described above, we have successfully deployed a distributed link tracing system in a Kubernetes environment. Because no service has been connected yet, we cannot see any link tracing data for the time being. Due to limited space, this article will not go on to cover how to connect Java microservices to SkyWalking, but that process is very interesting, because it is the key for us as developers to further understand how microservices integrate and interact with a distributed link tracing system. I will share that part as a sequel in the next article; it won't be long, so stay tuned!

In closing

Welcome to follow my WeChat official account [calm as code], where a large number of Java-related articles and learning materials will be continuously updated and collected.

If you think this was well written, give it a like and a follow! Follow so you don't get lost; updates will keep coming!