Introduction: I have recently been investigating and selecting distributed call-chain monitoring components, and picked three mainstream APM components to try out and compare. I originally intended to write a single article, but it grew too long, so I have split it into two. This article introduces the basic concepts of tracing and my hands-on practice with several APM components. The practice sections do not give step-by-step instructions, because those are not the focus here. The second part will compare the APM components and present performance tests.
1. Background
In a microservice architecture, services are split along different dimensions, and a single request often involves multiple services. Internet applications are built from different sets of software modules, which may be developed by different teams, implemented in different programming languages, and deployed on thousands of servers across multiple data centers. We therefore need tools that help us understand system behavior and analyze performance problems, so that faults can be located and fixed quickly.
Distributed call-chain monitoring components emerged in this environment. The best known is Dapper, described in a paper published by Google. Dapper was developed to collect more information about the behavior of complex distributed systems and present it to Google's developers. Such distributed systems are of particular interest because large fleets of low-end servers are an especially cost-effective platform for hosting Internet services. To understand the behavior of a distributed system in this context, you need to monitor related actions across different applications and different servers.
Most APM (Application Performance Management) products on the market base their theoretical model on Google's Dapper paper. This article focuses on the following APM components:
- Zipkin: an open source distributed tracing system that collects timing data from services to troubleshoot latency problems in microservice architectures. It covers data collection, storage, lookup, and presentation.
- Pinpoint: an APM tool, written in Java, for large-scale distributed systems; a distributed tracing component open sourced by a Korean team.
- Skywalking: an excellent APM component developed in China; a system for tracing, alerting on, and analyzing the business performance of Java distributed application clusters.
Other similar components include CAT from Meituan-Dianping and EagleEye from Taobao.
Given the background above, what should we require of a link monitoring component? Dapper discusses this as well; I summarize the requirements as follows:
- Probe performance cost. The impact of the APM component on the services it monitors should be minimal. In some highly optimized services, even a small overhead is easily noticed and may force the team running the online service to turn the tracing system off.
- Code intrusiveness. Application programmers should not need to be aware of the tracing system. A tracing system that only works with the active cooperation of application developers is too fragile: bugs or omissions in the tracing code embedded in the application often break it, so it cannot meet the requirement of "ubiquitous deployment".
- Extensibility. The more components it supports out of the box, the better. Failing that, it should provide a convenient plug-in API so that application developers can add support for components that are not yet monitored.
- Data analysis. Analysis should be fast and cover as many dimensions as possible. A tracing system that provides feedback quickly enough lets you react quickly to abnormal conditions in production, and comprehensive analysis avoids the need for secondary development.
2. Basic concepts
Of the components listed above, Zipkin was implemented strictly in accordance with the Google Dapper paper. Here are the basic concepts involved.
- Span: the basic unit of work. Each call in a link (RPC, DB access, and so on; there is no specific restriction) creates a span, identified by a 64-bit ID (a UUID is a convenient choice). A span also carries other data such as a description, timestamps, key-value annotation (tag) information, and a parent ID. The parent ID indicates where in the call link the span came from.
- Trace: a tree-structured collection of spans that represents one call link and has a unique identifier. For example, if you run a distributed big-data store, one of your requests makes up one trace.
- Annotation: records information related to a particular event (such as its time). A span usually contains four annotations:
(1) CS: Client Send, the client initiates the request
(2) SR: Server Receive, the server receives the request
(3) SS: Server Send, the server finishes processing and sends the result to the client
(4) CR: Client Receive, the client receives the response from the server
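To make the four annotations concrete, here is a minimal Java sketch of how timings can be derived from them. The class name SpanTimingSketch, its methods, and the sample timestamps are hypothetical and are not part of Zipkin's API; only the annotation names cs, sr, ss, and cr come from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: derive timings from the four core annotations of one RPC span. */
class SpanTimingSketch {

    /** Annotation name (cs, sr, ss, cr) mapped to a timestamp in milliseconds. */
    private final Map<String, Long> annotations = new LinkedHashMap<>();

    void annotate(String name, long timestampMillis) {
        annotations.put(name, timestampMillis);
    }

    /** Total time observed by the client: cr - cs. */
    long clientDuration() {
        return annotations.get("cr") - annotations.get("cs");
    }

    /** Time the server spent processing the request: ss - sr. */
    long serverDuration() {
        return annotations.get("ss") - annotations.get("sr");
    }

    /** Time spent outside the server (network, queuing): client time minus server time. */
    long networkOverhead() {
        return clientDuration() - serverDuration();
    }

    public static void main(String[] args) {
        SpanTimingSketch span = new SpanTimingSketch();
        span.annotate("cs", 1000L); // client sends the request
        span.annotate("sr", 1010L); // server receives it
        span.annotate("ss", 1090L); // server sends the response
        span.annotate("cr", 1105L); // client receives the response
        System.out.println("client=" + span.clientDuration() + "ms");   // client=105ms
        System.out.println("server=" + span.serverDuration() + "ms");   // server=80ms
        System.out.println("network=" + span.networkOverhead() + "ms"); // network=25ms
    }
}
```

Each subtraction uses two timestamps taken on the same machine, so the results do not depend on the client and server clocks being synchronized.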
2.1 Trace
Let's look at what a trace looks like in the system.
In the figure, each color marks one span. A call link is uniquely identified by its TraceId, and each span identifies one request within it. The tree nodes are the basic building blocks of the whole structure, and each node refers to a span. A line between two nodes represents the direct relationship between a span and its parent span. Although in the log files a span simply records its start and end times, spans are relatively independent units within the overall tree.
2.2 Span
The figure above illustrates what span looks like during a large trace. Dapper records the span names, as well as the ID and parent ID of each span, to reconstruct the relationship between the different spans in a single trace. A span without a parent ID is called a root span. All spans hang on a specific trace and also share a trace ID.
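The reconstruction described above can be sketched in a few lines of Java. This is an illustration under assumptions, not Dapper's or Zipkin's code: the Span class, buildTree, render, and the sample span names are all hypothetical. It takes spans that share one trace ID, in arbitrary order, and links them by parent ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: rebuild a trace tree from a flat list of spans. */
class SpanTreeSketch {

    static final class Span {
        final long id;
        final Long parentId; // null marks the root span
        final String name;
        final List<Span> children = new ArrayList<>();

        Span(long id, Long parentId, String name) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
        }
    }

    /** Links every span to its parent and returns the root span. */
    static Span buildTree(List<Span> spans) {
        Map<Long, Span> byId = new LinkedHashMap<>();
        for (Span span : spans) {
            byId.put(span.id, span);
        }
        Span root = null;
        for (Span span : spans) {
            if (span.parentId == null) {
                root = span;
            } else {
                Span parent = byId.get(span.parentId);
                if (parent != null) {
                    parent.children.add(span);
                }
            }
        }
        if (root == null) {
            throw new IllegalArgumentException("trace has no root span");
        }
        return root;
    }

    /** Renders the tree with two spaces of indentation per level. */
    static String render(Span span, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(span.name).append('\n');
        for (Span child : span.children) {
            sb.append(render(child, depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Spans may arrive at the collector in any order.
        List<Span> spans = Arrays.asList(
                new Span(3L, 2L, "db-query"),
                new Span(1L, null, "frontend"),
                new Span(2L, 1L, "backend"),
                new Span(4L, 1L, "auth"));
        System.out.print(render(buildTree(spans), 0));
    }
}
```

With the sample data, frontend is the root, backend and auth are its children, and db-query hangs under backend.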
2.3 Annotation
Automatic probes require no changes to application source code. They trace the distributed control path at nearly zero intrusion cost to application developers, relying almost entirely on instrumenting a small number of common component libraries. Dapper also lets application developers attach additional information to a trace, to monitor higher-level system behavior or to help debug problems.
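As a rough illustration of developer-supplied annotations, the sketch below lets application code attach key-value tags to the span that is active on the current thread. Everything here is hypothetical (SpanTagSketch, tag, finishSpan, and the tag keys); it is not Dapper's API, only one simple way such a facility could be shaped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: application code adds key-value tags to the current span. */
class SpanTagSketch {

    // Tags of the span active on this thread; a real tracer would hold a full span here.
    private static final ThreadLocal<Map<String, String>> CURRENT_TAGS =
            ThreadLocal.withInitial(() -> new LinkedHashMap<String, String>());

    /** Called from business code to record extra information on the current span. */
    static void tag(String key, String value) {
        CURRENT_TAGS.get().put(key, value);
    }

    /** Ends the current span and hands back the tags collected for it. */
    static Map<String, String> finishSpan() {
        Map<String, String> tags = CURRENT_TAGS.get();
        CURRENT_TAGS.remove();
        return tags;
    }

    public static void main(String[] args) {
        tag("user.id", "42");
        tag("cache.hit", "false");
        System.out.println(finishSpan()); // {user.id=42, cache.hit=false}
    }
}
```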
The following sections will introduce the use and practice of the three APM components.
3. Zipkin
Zipkin consists of the following components: the Collector, which receives data from the agents; Storage; the Web UI; and Search, which queries the data held in storage. Zipkin also provides a simple JSON API for retrieving data.
Our project builds its microservices on the Spring Cloud framework. Spring Cloud provides spring-cloud-sleuth to make integration with Zipkin easy, so I tried spring-cloud-sleuth-zipkin in the project.
Three services are created: zipkin-server, zipkin-client-backend, and zipkin-client. The server collects and displays trace information. zipkin-client-backend calls zipkin-client, which generates the call-link information.
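For a call from one service to another to end up in the same trace, the caller has to pass its trace identifiers along with the request. The following Java sketch shows the idea using the header names of Zipkin's B3 propagation convention (X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId). The article's project does not show these headers, so treat their use here as an assumption; the class, method names, and ID values are hypothetical, and a plain map stands in for the HTTP headers.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of carrying trace identifiers from one service to the next. */
class TracePropagationSketch {

    // Header names from Zipkin's B3 propagation convention.
    static final String TRACE_ID = "X-B3-TraceId";
    static final String SPAN_ID = "X-B3-SpanId";
    static final String PARENT_SPAN_ID = "X-B3-ParentSpanId";

    static final class SpanContext {
        final String traceId;
        final String spanId;
        final String parentSpanId; // null for a root span

        SpanContext(String traceId, String spanId, String parentSpanId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
        }

        /** A child span keeps the trace ID and records this span as its parent. */
        SpanContext newChild(String childSpanId) {
            return new SpanContext(traceId, childSpanId, spanId);
        }
    }

    /** Caller side: copy the identifiers into the outgoing request headers. */
    static Map<String, String> inject(SpanContext context) {
        Map<String, String> headers = new HashMap<>();
        headers.put(TRACE_ID, context.traceId);
        headers.put(SPAN_ID, context.spanId);
        if (context.parentSpanId != null) {
            headers.put(PARENT_SPAN_ID, context.parentSpanId);
        }
        return headers;
    }

    /** Callee side: rebuild the context from the incoming request headers. */
    static SpanContext extract(Map<String, String> headers) {
        return new SpanContext(
                headers.get(TRACE_ID), headers.get(SPAN_ID), headers.get(PARENT_SPAN_ID));
    }

    public static void main(String[] args) {
        // Made-up IDs; a real tracer generates them randomly.
        SpanContext root = new SpanContext("463ac35c9f6413ad", "463ac35c9f6413ad", null);
        SpanContext outgoingCall = root.newChild("a2fb4a1d1a96d312");
        SpanContext received = extract(inject(outgoingCall));
        System.out.println("trace=" + received.traceId
                + " span=" + received.spanId
                + " parent=" + received.parentSpanId);
    }
}
```

Because the callee sees the same trace ID and knows its parent span, the collector can later place both services' spans in one tree.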
3.1 zipkin-server implementation
Two points matter when implementing zipkin-server. One is where the collected data is stored: in memory, a database, ES (Elasticsearch), and so on. The other is the transport: HTTP or asynchronous messaging over an MQ. Reporting over HTTP can affect normal request handling, so asynchronous reporting over an MQ is recommended.
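The benefit of asynchronous reporting can be illustrated with a small Java sketch. It is not how Sleuth or spring-cloud-stream is implemented: an in-memory queue stands in for the MQ, a list stands in for the broker, and all names (AsyncSpanReporterSketch, report, flush) are hypothetical. The point is that the request thread only pays for a non-blocking enqueue, while delivery happens on another thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: report spans asynchronously through a bounded buffer. */
class AsyncSpanReporterSketch {

    private final BlockingQueue<String> buffer;
    private final List<String> delivered = new ArrayList<>(); // stand-in for the broker

    AsyncSpanReporterSketch(int capacity) {
        this.buffer = new LinkedBlockingQueue<>(capacity);
    }

    /** Called on the request thread: never blocks, drops the span if the buffer is full. */
    boolean report(String span) {
        return buffer.offer(span);
    }

    /** Called on a background thread: moves all buffered spans to the "broker". */
    int flush() {
        List<String> batch = new ArrayList<>();
        buffer.drainTo(batch);
        delivered.addAll(batch);
        return batch.size();
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncSpanReporterSketch reporter = new AsyncSpanReporterSketch(2);
        System.out.println(reporter.report("span-a")); // true
        System.out.println(reporter.report("span-b")); // true
        System.out.println(reporter.report("span-c")); // false: buffer full, span dropped

        Thread sender = new Thread(reporter::flush);
        sender.start();
        sender.join(); // wait only so the demo output is deterministic
        System.out.println(reporter.delivered()); // [span-a, span-b]
    }
}
```

Dropping spans when the buffer is full is a deliberate choice in this sketch: losing some trace data is usually preferable to slowing down the business request.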
This article uses MySQL for storage and MQ-based communication, built on spring-cloud-stream. The specific implementation of zipkin-server is not the focus here; readers can consult the official website for the other options.
(1) The POM needs the following dependencies:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
</dependency>
<!-- Zipkin dependencies -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin-stream</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-stream-rabbit</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.java</groupId>
    <artifactId>zipkin-autoconfigure-ui</artifactId>
    <scope>runtime</scope>
</dependency>
<!-- Saving to a database requires the following dependencies -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
(2) Startup class:
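import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.sleuth.zipkin.stream.EnableZipkinStreamServer;

// Start the Zipkin server in stream (MQ) mode
@EnableZipkinStreamServer
@SpringBootApplication
public class ZipkinStreamServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(ZipkinStreamServerApplication.class, args);
    }
}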
The @EnableZipkinStreamServer annotation pulls in @EnableZipkinServer and also creates a SleuthSink message listener that consumes from RabbitMQ.
(3) Configuration file
server:
  port: 9411
spring:
  datasource:
    username: root
    password: root123
    schema[0]: classpath:/zipkin.sql
zipkin:
  storage:
    type: mysql
---
spring:
  application:
    name: microservice-zipkin-stream-server
  rabbitmq:
    host: ${RABBIT_ADDR:localhost}
    port: ${RABBIT_PORT:5672}
    username: guest
    password: guest
  sleuth:
    enabled: false
  profiles: default
  datasource:
    url: jdbc:mysql://localhost:3307/zipkin?autoReconnect=true&useSSL=false
zipkin.sql can be obtained from the official website. The zipkin-server listens on port 9411.
3.2 zipkin-client
The two Zipkin clients are configured identically, so they are described together.
(1) POM dependencies
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin-stream</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-stream-rabbit</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
(2) Configuration file
spring:
  rabbitmq:
    host: 127.0.0.1
    port: 5672
    username: guest
    password: guest
3.3 Results
The invocation relationships between the services are as follows:
The figure shows that client requests pass through the gateway and then invoke various internal services, including the notice service, so you can clearly see which services a request goes through. Next, look at the HTTP paths in the demo2-default service instance:
In the figure above, the HTTP paths of the demo2-default service are sorted by duration, showing each trace's duration and number of spans. Click into one to see the details:
The figure lists the time taken by each span, starting from the parent span. This trace involves two services, demo1 and demo2. demo2 calls demo1; the call to demo1 starts at 597 ms, and the whole request takes 1265 ms to complete.
4. Pinpoint
Pinpoint is non-intrusive: it uses Java agent bytecode enhancement, so you only need to add startup parameters. Pinpoint's components are similar to Zipkin's. The architecture diagram is as follows:
pinpoint-collector collects the performance data; pinpoint-agent is the probe attached to the application you run; pinpoint-web displays the collected data as web pages; and HBase stores the collected data.
4.1 Pinpoint installation
Installation mainly involves the following software:
- JDK 1.8. A Java environment is required; this needs no further explanation.
- HBase. Pinpoint stores the collected data mainly in HBase, so it can gather a large volume of data and support more detailed analysis. After installing HBase, initialize Pinpoint's HBase tables using the initialization script that Pinpoint provides. ZooKeeper is bundled with HBase.
- pinpoint-collector. The collector receives data from the agents and saves it to the HBase cluster. It exposes TCP and UDP listening ports 9994, 9995, and 9996.
- pinpoint-web. The web front end; set HBASE_HOST, HBASE_PORT, and the other environment variables in its configuration.
- pinpoint-agent. Download pinpoint-agent-x-SNAPSHOT.tar.gz and configure the collector information in the agent's pinpoint.config file.
Installation is fairly tedious, and this article is already long, so the specific steps will be covered in a separate article.
4.2 Running pinpoint-agent
I use a Spring Boot project, so it is enough to add the -javaagent parameter to the command that starts the JAR, giving the absolute path of the pinpoint-bootstrap package. For example:
java -javaagent:/aoho/auth_compose/pinpoint-bootstrap-1.6.0.jar -Dpinpoint.agentId=aoho-consumer -Dpinpoint.applicationName=aoho-consumer -jar id_generator/snowflake-id-generate-1.0-SNAPSHOT.jar
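The -javaagent mechanism works because the JVM calls an agent's premain method before the application's own main, giving the agent a chance to register a transformer that sees every class as it loads. The sketch below shows only that general shape using the standard java.lang.instrument API. It is not Pinpoint's code: AgentSketch, ObservingTransformer, and the com/example/ prefix are hypothetical, and the transformer merely counts matching classes instead of rewriting bytecode.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the entry point and transformer of a Java agent. */
class AgentSketch {

    /** Observes classes under a package prefix as they load, leaving bytecode unchanged. */
    static final class ObservingTransformer implements ClassFileTransformer {
        private final String internalPrefix; // JVM internal names use '/', e.g. "com/example/"
        final AtomicInteger seen = new AtomicInteger();

        ObservingTransformer(String internalPrefix) {
            this.internalPrefix = internalPrefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null && className.startsWith(internalPrefix)) {
                // A real APM agent would return rewritten bytecode here,
                // for example with timing code wrapped around each method.
                seen.incrementAndGet();
            }
            return null; // null tells the JVM that the class was not modified
        }
    }

    /** Called by the JVM before the application's main() when started with -javaagent. */
    public static void premain(String agentArgs, Instrumentation instrumentation) {
        String prefix = (agentArgs == null || agentArgs.isEmpty()) ? "com/example/" : agentArgs;
        instrumentation.addTransformer(new ObservingTransformer(prefix));
    }

    public static void main(String[] args) {
        // Exercise the transformer directly, without packaging an agent JAR.
        ObservingTransformer transformer = new ObservingTransformer("com/example/");
        transformer.transform(null, "com/example/OrderService", null, null, new byte[0]);
        transformer.transform(null, "java/lang/String", null, null, new byte[0]);
        System.out.println("matching classes seen: " + transformer.seen.get());
    }
}
```

To be loaded with -javaagent, such a class would also have to be packaged in a JAR whose manifest names it in the Premain-Class attribute. This is why the application itself needs no code changes: all instrumentation happens at class-load time.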
The ID generator service is relatively simple and uses no storage such as a database. The service is registered in Consul, and a local client requests id-server to obtain an ID. The call chain is as follows:
Pinpoint offers richer functionality. The following shows the details of a call to the /api/id interface.
Pinpoint records the client's request time, IP address, and so on. The call tree is listed in detail below that, along with the time spent in each method.
The ServerMap view also shows the server's heap, permanent generation, CPU, and other information, which is very powerful.
5. Skywalking
Skywalking is an open source APM monitoring component from China. According to the OpenSkywalking website, it focuses on performance and real-time behavior. Below is an architecture diagram of Skywalking found online.
Skywalking also consists of four parts: collector, agent, web, and storage. It supports cluster deployment, with gRPC used for communication between cluster nodes. For storage it supports the built-in H2 database and Elasticsearch.
5.1 Installation
See the official website for detailed installation instructions.
- Installing the collector. I use the standalone collector, downloaded as a compressed package from the release page. The standalone collector uses the H2 database by default, so after decompressing it you can run bin/startup.sh without modifying the configuration. The directory structure is as shown in the figure; the logs folder contains the startup logs, where you can check the startup status.
- Installing the web UI. Decompress skywalking-ui and edit config/collector_config.properties, the log4j2 configuration, and the server's listening port.
- Installing the agent. Copy the skywalking-agent directory to the desired location (the probe needs the entire directory), and set the collector information in /config/agent.config.
5.2 Running the agent
For a Spring Boot project, the agent is started the same way as the Pinpoint agent above: add the JVM startup parameter -javaagent:/path/to/skywalking-agent/skywalking-agent.jar.
This time I start the user service, which involves MySQL, Redis, Consul, and other components. Its call link diagram is as follows:
Accessing the /api/external/register-code and /api/external/validate-code interfaces produces the call chain shown in the figure above.
The trace in the figure above has TraceId 2.59.15103021777910001 and corresponds to a request to /api/external/register-code. The figure shows the time taken by each span in this trace.
The figure above shows the Entry Service List statistics for the userService interfaces, including the number of calls, the success rate, and other information. (Because the built-in H2 database is used, the UI takes a long time to respond here.)
There are also per-instance statistics covering the JVM. The API responds very slowly, possibly because of the storage I use, so I have not included screenshots.
6. Summary
This article covers hands-on practice with link monitoring components. It first introduces the background in which these components arose and the requirements for selecting one, then the basic concepts involved in OpenTracing, and finally the installation and use of three APM components, with UI screenshots of each. This article is relatively basic; the next one will compare the APM components and present performance tests.
Source code of zipkin-server-stream
github: Github.com/keets2012/S…
oschina: Gitee.com/keets/sprin…
References
- OpenTracing Official Standard – Chinese version
- Skywalking
- Zipkin
- PinPoint
- Spring Cloud Sleuth
- Dapper
- Pinpoint installation and deployment
- Java open source APM profile
- Introduction to Zipkin distributed system monitoring system
- Follow the path of microservices – extend the distributed call chain yourself