Distributed link tracking technology Sleuth + Zipkin

Application scenarios of distributed link tracing technology

  • Scenario description

To support growing business volume, we design our systems with a microservices architecture: cluster deployment absorbs traffic spikes, and services can be scaled flexibly as the business requires. As a result, in a microservice architecture, completing a single request takes at least three or four service invocations, and at most it may span dozens or even hundreds of service nodes. This raises some questions:

1) How do we dynamically display the service invocation chain? (for example, which other services does service A call, i.e., its dependencies)

2) How do we analyze and tune bottleneck nodes in the invocation chain? (for example, in A -> B -> C, service C takes a very long time)

3) How do we quickly discover faults on the service chain?

  • Distributed link tracking technology

If we can log every node on the chain while a request is being processed, and eventually visualize those logs in one place, we can monitor various indicators of the invocation chain: which service instance did the request reach? What was the processing status? How long did processing take? All of this can be analyzed.

Monitoring technology based on this idea in a distributed environment is called distributed link tracing (also known as full-link tracing).

  • Distributed link tracking solutions on the market

Distributed link tracing is a mature technology, and there are many products both at home and abroad, such as:

  • Spring Cloud Sleuth + Twitter Zipkin
  • Alibaba’s “Eagle Eye”
  • Dianping’s “CAT”
  • Meituan’s “Mtrace”
  • Jingdong’s “Hydra”
  • Sina’s “Watchman”
  • Apache SkyWalking, which has been getting a lot of attention lately

Core idea of distributed link tracking technology

Essence: logging. As a complete technology, distributed link tracing also has its own theory and concepts. In a microservice architecture, the invocation chain for handling a request can be represented as a tree, as shown in the figure, which depicts a common invocation scenario:

  • A request is routed through the gateway service to the downstream microservice-1
  • Microservice-1 then calls microservice-2
  • After getting the result, microservice-1 calls microservice-3
  • Finally, the results from microservice-2 and microservice-3 are merged
  • The merged result is returned to the user through the gateway

To trace the entire call chain, logging is clearly required; logging is the foundation, and on top of it sit some theoretical concepts. Mainstream distributed link tracing technologies/systems today are all based on the concepts of Google’s paper “Dapper, a Large-Scale Distributed Systems Tracing Infrastructure”. What core concepts does this paper involve? Let’s look back at the service invocation above.

The preceding figure shows a request chain. A chain is uniquely identified by a Trace ID, and a Span identifies each piece of request information within it.

Trace: the unit of service tracing. A trace covers the process from the moment a client’s request arrives at the boundary of the traced system until the traced system returns a response to the client.

Trace ID: to trace a request, when it is sent to the entry point of the distributed system, the tracing framework creates a unique Trace ID for it, and the framework carries this Trace ID along as the request flows through the distributed system until the response is returned to the requester.

A Trace is composed of one or more Spans. Each Span has its own Span ID and records the Trace ID it belongs to. A Span also carries a Parent ID pointing to the Span ID of another Span, indicating a parent-child relationship, which essentially expresses a dependency relationship.
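The parent-child linkage above can be sketched in a few lines of Java. The `Span` record and the `children` helper here are hypothetical illustrations (not Sleuth’s actual classes), showing how spans sharing one Trace ID form a tree through their Parent IDs:

```java
import java.util.*;

// Hypothetical minimal span record -- illustrates how the ParentId links
// Spans into a tree under one TraceId (not Sleuth's real API).
public class TraceTree {
    record Span(String traceId, String spanId, String parentId) {}

    // Group each span's children under its parent's span id
    static Map<String, List<String>> children(List<Span> spans) {
        Map<String, List<String>> tree = new HashMap<>();
        for (Span s : spans) {
            if (s.parentId() != null) {
                tree.computeIfAbsent(s.parentId(), k -> new ArrayList<>()).add(s.spanId());
            }
        }
        return tree;
    }

    public static void main(String[] args) {
        // One trace: gateway -> microservice-1 -> (microservice-2, microservice-3),
        // mirroring the invocation scenario described earlier
        List<Span> spans = List.of(
                new Span("t1", "gateway", null),
                new Span("t1", "ms1", "gateway"),
                new Span("t1", "ms2", "ms1"),
                new Span("t1", "ms3", "ms1"));
        System.out.println(children(spans));
    }
}
```

Walking this map from the root span reconstructs the dependency tree that tracing UIs such as Zipkin display.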

Span ID: to measure the latency of each processing unit, when a request arrives at each service component it is also tagged with a unique Span ID that marks its start, progress, and end. Every Span has exactly two nodes, a start and an end; by recording the timestamps of both, the latency of that Span can be calculated. Besides timestamps, a Span can also carry other metadata, such as the event name and request information.

Each Span has a unique identifier, its Span ID, and several ordered Spans make up a Trace.

A Span can be regarded as a log data structure that records information at certain special points in time, such as a timestamp, Span ID, Trace ID, Parent ID, and so on. From the Span another concept is abstracted, called an event; the core events are as follows:

  • CS (Client Send) / start: the client (consumer) sends a request, marking the start of a Span
  • SR (Server Receive): the server (provider) receives the request; sr - cs is the network delay of the request
  • SS (Server Send) / finish: the server sends the response after processing; ss - sr is the time the server spent processing
  • CR (Client Receive) / finished: the client receives the response, marking the end of the Span; cr - ss is the network delay of the response
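The four events above are enough to derive the full latency breakdown of a Span. A minimal sketch (the `SpanTiming` class is a made-up illustration, not Sleuth’s API) that records the cs/sr/ss/cr timestamps and computes the delays:

```java
// Hypothetical holder for the four core event timestamps of one Span
// (milliseconds); illustrates the cs/sr/ss/cr arithmetic, not a real API.
public class SpanTiming {
    final long cs; // Client Send: client starts the request (span start)
    final long sr; // Server Receive: server gets the request
    final long ss; // Server Send: server finishes processing and responds
    final long cr; // Client Receive: client gets the response (span end)

    SpanTiming(long cs, long sr, long ss, long cr) {
        this.cs = cs; this.sr = sr; this.ss = ss; this.cr = cr;
    }

    long requestNetworkDelay()  { return sr - cs; } // network delay of the request
    long serverProcessingTime() { return ss - sr; } // time consumed by the server
    long responseNetworkDelay() { return cr - ss; } // network delay of the response
    long totalSpanTime()        { return cr - cs; } // full round trip for this Span

    public static void main(String[] args) {
        SpanTiming t = new SpanTiming(100, 110, 150, 165);
        System.out.println(t.requestNetworkDelay());  // 10
        System.out.println(t.serverProcessingTime()); // 40
        System.out.println(t.responseNetworkDelay()); // 15
        System.out.println(t.totalSpanTime());        // 65
    }
}
```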

Spring Cloud Sleuth (a tracing service framework) can trace invocations between services. Sleuth records which services a request passes through, how long each service takes to process it, and so on. From this we can work out the invocation relationships between microservices and track down problems.

  • Latency analysis: use Sleuth to understand how long a sampled request took and analyze service performance problems (which service calls are slow)
  • Link optimization: Sleuth records trace data through logs, so frequently invoked services can be discovered and optimized in a targeted way

Note: Spring Cloud Sleuth and Zipkin are usually used together: Sleuth sends its trace data to Zipkin for aggregation, and Zipkin stores and displays the data.

Sleuth + Zipkin

1) Introduce the dependency coordinates into every microservice project that needs to be traced

<!-- Link tracing -->
<dependency>
 <groupId>org.springframework.cloud</groupId>
 <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

2) In each microservice, modify the application.yml configuration file to add the log levels

# Distributed link tracing
logging:
  level:
    org.springframework.web.servlet.DispatcherServlet: debug
    org.springframework.cloud.sleuth: debug

At this point Sleuth is configured. When a request comes in, we can observe Sleuth’s output (global Trace ID, Span ID, etc.) in the console logs. Such logs are not easy to read or observe at first, and they are scattered across the various microservice servers. Next we use Zipkin to aggregate the trace logs in one place, store them, and display them.
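Sleuth appends a bracketed section of the form `[application-name,traceId,spanId,exportable]` to each log line. An illustrative line (the service name, IDs, and timestamp here are made up) might look like:

```
2021-03-01 12:00:00.123 DEBUG [service-autodeliver,8f6a4c0b3e2d1a47,9d2e5b1c7f3a8e06,true] 12345 --- [nio-9002-exec-1] o.s.web.servlet.DispatcherServlet : GET "/autodeliver/checkState/1545133"
```

The first Span of a request has a Span ID equal to the Trace ID; every downstream call logs the same Trace ID with a new Span ID, which is what lets the chain be stitched together later.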

3) Display tracking data with Zipkin

Zipkin consists of a Zipkin Server and Zipkin Clients; the Zipkin Server is an independent service, while a Zipkin Client is any specific microservice

1. Create Zipkin Server

pom.xml

<!-- zipkin-server dependency -->
<dependency>
    <groupId>io.zipkin.java</groupId>
    <artifactId>zipkin-server</artifactId>
    <version>2.12.3</version>
    <exclusions>
        <!-- Exclude the log4j2 starter to avoid a logging conflict -->
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-log4j2</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<!-- zipkin-server UI dependency coordinates -->
<dependency>
    <groupId>io.zipkin.java</groupId>
    <artifactId>zipkin-autoconfigure-ui</artifactId>
    <version>2.12.3</version>
</dependency>

Entry start class

@SpringBootApplication
@EnableZipkinServer
public class ZipkinServerApplication9411 {
    public static void main(String[] args) {
        SpringApplication.run(ZipkinServerApplication9411.class, args);
    }

    // Inject the transaction manager (used once trace data is persisted to MySQL)
    @Bean
    public PlatformTransactionManager transactionManager(DataSource dataSource) {
        return new DataSourceTransactionManager(dataSource);
    }
}

application.yml

server:
  port: 9411
management:
  metrics:
    web:
      server:
        auto-time-requests: false

2. Build the Zipkin Client (changes are made in the specific microservices)

Add the Zipkin dependency to the POM

<dependency>
 <groupId>org.springframework.cloud</groupId>
 <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

Add the Zipkin Server address to application.yml
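For example, the following application.yml fragment points the client at the Zipkin Server started above (a sketch; the sampler value of 1 is an assumption for demo purposes, since Sleuth samples only a fraction of requests by default):

```yaml
spring:
  zipkin:
    base-url: http://localhost:9411  # address of the Zipkin Server
  sleuth:
    sampler:
      probability: 1  # report 100% of requests while testing; lower this in production
```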

3. Testing the Zipkin service

Use Postman to call the request http://localhost:9002/autodeliver/checkState/1545133

Then open Zipkin at http://localhost:9411/zipkin/ to view the related data

Click the details of any request to open its detail page, which contains the complete invocation chain

4. Persisting Zipkin trace data to MySQL

In our current Zipkin service, trace data is stored in memory, so it is lost whenever the service restarts. Therefore, the data needs to be persisted.

1) Prepare database scripts

Create the zipkin database and its tables in MySQL using the table-creation script that Zipkin officially provides (mysql.sql in the Zipkin source repository)

2) Introduce POM files

<!-- MySQL persistence dependency -->
<dependency>
    <groupId>io.zipkin.java</groupId>
    <artifactId>zipkin-autoconfigure-storage-mysql</artifactId>
    <version>2.12.3</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid-spring-boot-starter</artifactId>
    <version>1.1.10</version>
</dependency>
<!-- Transaction control required to operate the database -->
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-tx</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-jdbc</artifactId>
</dependency>

3) Modify the configuration file and add database information

server:
  port: 9411
management:
  metrics:
    web:
      server:
        auto-time-requests: false # Disable automatic detection
spring:
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://localhost:3306/zipkin?useUnicode=true&characterEncoding=utf-8&useSSL=false&allowMultiQueries=true&serverTimezone=GMT%2B8
    username: root
    password: root
    druid:
      initialSize: 10
      minIdle: 10
      maxActive: 30
      maxWait: 50000
# specify mysql as the zipkin persistence medium
zipkin:
  storage:
    type: mysql

4) Inject the transaction manager in the startup class

// Inject the transaction controller
@Bean
public PlatformTransactionManager transactionManager(DataSource dataSource) {
    return new DataSourceTransactionManager(dataSource);
}

At this point, the data in Zipkin is persisted to the database.