What is Zipkin? Zipkin is a distributed tracing system. It helps collect the timing data needed to troubleshoot latency problems in a microservice architecture, and it manages both the collection and the lookup of that data. Zipkin's design is based on Google's Dapper paper. Each application reports timing data to Zipkin, and the Zipkin UI presents a dependency diagram showing how many traced requests went through each application. If you want to track down latency, you can filter or sort all traced requests and see what percentage of the total trace time each part of a request takes.

As a business grows more complex, its systems get split apart. Especially with the rise of microservice architectures and container technology, a seemingly simple application may be backed by dozens or even hundreds of services, and a single front-end request may require multiple service calls to complete. When such a request becomes slow or fails, it is hard to tell which backend service is responsible, so you need a way to quickly locate the point of failure. A distributed tracing system like Zipkin solves exactly this kind of problem.

The official site is zipkin.io. There are three official ways to start Zipkin; here is the second one:

curl -sSL https://zipkin.io/quickstart.sh | bash -s
java -jar zipkin.jar

You can then open a browser and visit http://ip:9411 to reach the Zipkin UI.
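For reference, the first official way runs Zipkin as a Docker image. A minimal sketch, assuming Docker is installed locally (the openzipkin/zipkin image is the official one):

docker run -d -p 9411:9411 openzipkin/zipkin

This exposes the same UI and API on port 9411 as the jar-based launch above.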

Zipkin architecture

The architecture diagram is as follows:

[Architecture diagram: an instrumented client and server report trace data through a transport to Zipkin's Collector, which writes to Storage; the API serves queries for the Web UI]
As shown in the figure above, when the business systems call each other, each one sends its tracing data to Zipkin, which aggregates, processes, stores, and displays it. Through the Web UI, users can conveniently see network latency, call chains, system dependencies, and so on.

Reporter: the component inside an application that sends data to Zipkin; it is responsible for collecting the trace data. Instrumented Client and Instrumented Server are two applications in the distributed architecture that use a tracing library: the Client invokes services provided by the Server, and both report trace information to Zipkin. The trace information they report travels over a Transport, is received by Zipkin's Collector module, and is written to a storage backend by Zipkin's Storage module. Zipkin then provides an API for the UI to query the trace information. A Non-Instrumented Server is a server that does not use a tracing library and therefore reports no trace information.

Transport: the way trace data is transmitted, HTTP by default; under high concurrency it can be replaced with Kafka or another message queue.
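A minimal sketch of swapping the transport, assuming a Kafka broker at localhost:9092 and the io.zipkin.reporter2:zipkin-sender-kafka11 dependency on the classpath; only the Sender changes, the rest of the tracing setup stays the same as in the full example further below:

import zipkin2.Span;
import zipkin2.codec.SpanBytesEncoder;
import zipkin2.reporter.AsyncReporter;
import zipkin2.reporter.Sender;
import zipkin2.reporter.kafka11.KafkaSender;

public class KafkaTransportDemo {
    public static void main(String[] args) {
        // Spans are written to the "zipkin" Kafka topic instead of being POSTed over HTTP
        Sender sender = KafkaSender.create("localhost:9092");
        AsyncReporter<Span> reporter = AsyncReporter.builder(sender)
                .build(SpanBytesEncoder.JSON_V2);
        // pass this reporter to Tracing.newBuilder().spanReporter(...) as in the demo below
        reporter.close();
    }
}

On the server side, Zipkin consumes that topic when started with the KAFKA_BOOTSTRAP_SERVERS=localhost:9092 environment variable.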

Zipkin consists of four modules

  • Collector: receives or collects the trace data reported by each application.
  • Storage: stores the received data in Memory, MySQL, Cassandra, Elasticsearch, etc. By default, data is stored in Memory (see the launch example after this list).
  • API (Query): queries the data held in Storage and exposes a simple JSON API to retrieve it; it is consumed mainly by the Web UI.
  • Web UI: the interface for viewing traces and dependencies, served on port 9411.
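The storage backend is selected with environment variables when the server starts. A sketch assuming an Elasticsearch node at localhost:9200; for MySQL you would instead set STORAGE_TYPE=mysql plus MYSQL_HOST, MYSQL_USER, and MYSQL_PASS:

STORAGE_TYPE=elasticsearch ES_HOSTS=http://localhost:9200 java -jar zipkin.jar

Without these variables, STORAGE_TYPE defaults to mem, i.e. the in-memory store, and all traces are lost on restart.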

The basic concepts of Zipkin

  • Span: the basic unit of work. One link call (RPC, DB access, etc.; there is no strict restriction) creates one span, identified by a 64-bit ID. A span also carries other data such as a description, timestamps, key-value tags, and a parent-id; the parent-id indicates where in the chain the span's call originated. Roughly speaking, a span is one request.
  • Trace: a collection of spans with a tree-like structure, representing one call chain and carrying a unique identifier, the traceId. Zipkin uses the trace structure to express the tracing of one request: a request may be handled by several backend services, each service's handling is recorded as a span, and the spans depend on one another.
  • Annotation: used to record information about a particular event in the request (such as its time). A span usually contains the four annotations listed below.

  • cs (Client Send): the Client initiates the request.
  • sr (Server Receive): the Server receives the request.
  • ss (Server Send): the Server finishes processing and sends the result back to the Client.
  • cr (Client Receive): the Client receives the Server's response, marking the end of the span.

Once the client gets back the record of the link call, the delays can be derived from these annotations: sr - cs gives the request's network (send) delay, ss - sr gives the server's processing delay, and cr - cs gives the completion delay of the whole call. For example, if cs = 0 ms, sr = 30 ms, ss = 130 ms, and cr = 160 ms, the request spent 30 ms on the network, 100 ms in the server, and 160 ms end to end.

BinaryAnnotation: provides additional information, usually as key-value pairs.

A simple example (Gradle dependencies followed by the code):

compile group: 'io.zipkin.brave', name: 'brave', version: '5.6.0'
compile group: 'io.zipkin.reporter2', name: 'zipkin-sender-okhttp3', version: '2.7.13'
compile group: 'io.zipkin.brave', name: 'brave-context-log4j2', version: '5.6.0'

import brave.Span;
import brave.Tracer;
import brave.Tracing;
import brave.context.log4j2.ThreadContextScopeDecorator;
import brave.propagation.B3Propagation;
import brave.propagation.ExtraFieldPropagation;
import brave.propagation.ThreadLocalCurrentTraceContext;
import zipkin2.codec.SpanBytesEncoder;
import zipkin2.reporter.AsyncReporter;
import zipkin2.reporter.Sender;
import zipkin2.reporter.okhttp3.OkHttpSender;

import java.util.concurrent.TimeUnit;

public class TraceDemo {

    public static void main(String[] args) {
        // Report spans to the local Zipkin collector over HTTP
        Sender sender = OkHttpSender.create("http://localhost:9411/api/v2/spans");
        AsyncReporter asyncReporter = AsyncReporter.builder(sender)
                .closeTimeout(1000, TimeUnit.MILLISECONDS)
                .build(SpanBytesEncoder.JSON_V2);

        Tracing tracing = Tracing.newBuilder()
                .localServiceName("tracer-demo")
                .spanReporter(asyncReporter)
                // .propagationFactory(ExtraFieldPropagation.newFactory(B3Propagation.FACTORY, "user-name"))
                .currentTraceContext(ThreadLocalCurrentTraceContext.newBuilder()
                        .addScopeDecorator(ThreadContextScopeDecorator.create()) // puts trace IDs into logs
                        .build())
                .build();

        Tracer tracer = tracing.tracer();

        // A single-span trace
        Span span = tracer.newTrace().name("encode").start();
        try {
            doSomethingExpensive();
        } finally {
            span.finish();
        }

        // A trace with two child spans under one parent span
        Span twoPhase = tracer.newTrace().name("twoPhase").start();
        try {
            Span prepare = tracer.newChild(twoPhase.context()).name("prepare").start();
            try {
                prepare();
            } finally {
                prepare.finish();
            }
            Span commit = tracer.newChild(twoPhase.context()).name("commit").start();
            try {
                commit();
            } finally {
                commit.finish();
            }
        } finally {
            twoPhase.finish();
        }

        // Give the async reporter time to flush before the JVM exits
        sleep(1000);
    }

    private static void doSomethingExpensive() {
        sleep(500);
    }

    private static void commit() {
        sleep(500);
    }

    private static void prepare() {
        sleep(500);
    }

    private static void sleep(long milliseconds) {
        try {
            TimeUnit.MILLISECONDS.sleep(milliseconds);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
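After running the demo, you can inspect the result in the UI at http://localhost:9411, or query the JSON API directly. A sketch, assuming the demo above has reported its spans to a local server:

curl http://localhost:9411/api/v2/services
curl 'http://localhost:9411/api/v2/traces?serviceName=tracer-demo&limit=10'

The first call lists the service names known to Zipkin (it should include tracer-demo); the second returns the recent traces for that service, including the encode span and the twoPhase trace with its prepare and commit children.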