The background,
The most common method for troubleshooting system faults is to view system logs. In distributed systems, traceId is used as the link log tracing.
<property name="pattern" value="%d{yyyy-MM-dd HH:mm:ss.SSS} [${applicationName},%X{X-B3-TraceId:-}] [%thread] %-5level [%uid] [%C:%L] %msg%n"/>
Copy the code
The values of traceId and X-B3-TracEID are the same in the MDC. However, because MDC uses ThreadLocal internally, only the local thread is valid, and values in child threads and downstream service MDC are lost. Therefore, the main difficulty of the scheme is to solve the problem of value transfer, which mainly includes the following parts:
- How do asynchronous cases (thread pools) pass traceId to child threads
- How is an asynchronous service call (RPC) delivered
This paper will focus on the above two problems, to provide solutions.
Second, solutions
1. Extend thread pools
1.1 toThreadLocalThreadPoolExecutor
For extension, the implementation of the parent threadMDC
Content is copied to child threads.
public class LogThreadLocalThreadPoolExecutor extends ThreadLocalThreadPoolExecutor {...@Override
public void execute(Runnable command) {
super.execute(new MDCRunnable() {
@Override
protected void innerRun(a) { command.run(); }}); }public static abstract class MDCRunnable implements Runnable {
private Map<String, String> copyOfContextMap = MDC.getCopyOfContextMap();
@Override
public void run(a) {
if(copyOfContextMap ! =null) {
MDC.setContextMap(copyOfContextMap);
}
try {
innerRun();
} finally {
if(MapUtils.isNotEmpty(copyOfContextMap)) { MDC.clear(); }}}protected abstract void innerRun(a); }}Copy the code
1.2 Alibaba TTL Transmission traceId Problem
In this project, we use TtlRunnable to package the first layer. Because the MDC implementation of Logback uses ThreadLocal to transmit child threads, we need to rewrite ali’s TransmittableThreadLocal. Note: While InheritableThreadLocal can be passed to child threads, the TransmittableThreadLocal provides a solution for thread reuse in the thread pool, passing ThreadLocal from the current thread to other threads in the thread pool.
The requirement scenario for ThreadLocal is the latent requirement scenario for TTL, and the TTL target scenario is if your business needs to “pass ThreadLocal using components that cache threads such as thread pools”
public class ThreadLocalThreadPoolExecutor extends ThreadPoolExecutor {...@Override
public void execute(Runnable command) {
super.execute(TtlRunnable.get(command)); }}Copy the code
If the LogbackMDCAdapter has any issues, replace ThreadLocalMap with TransmittableThreadLocal. Note: The package must be org.slf4j
Initialize the TtlMDCAdapter instance and replace the Adapter object in MDC. Auto-inject configuration spring.factories
org.springframework.context.ApplicationContextInitializer=\
com.moka.hcm.common.log.config.TtlMDCAdapterInitializer
Copy the code
public class TtlMDCAdapterInitializer implements ApplicationContextInitializer<ConfigurableApplicationContext> {
@Override
public void initialize(ConfigurableApplicationContext applicationContext) {
// Load the TtlMDCAdapter instanceTtlMDCAdapter.getInstance(); }}Copy the code
2. Asynchronous Feign calls
When other services are invoked through feign in the thread pool, hystrix sends requests through the thread pool, but ThreadLocal in MDC cannot pass child thread variables, causing inconsistent TraceIds.
2021-12-21 19:20:58.385 [50c8a6e474e8d858] [http-nio-8080-exec-5] INFO
2021-12-21 19:20:58.388 [50c8a6e474e8d858] [http-nio-8080-exec-5] INFO
2021-12-21 19:21:05.195 [50c8a6e474e8d858] [pool-15-thread-1] INFO
2021-12-21 19:21:05.219 [f85da8ad351a0ff3] [hystrix-9] INFO
2021-12-21 19:21:05.319 [50c8a6e474e8d858] [http-nio-8080-exec-5] INFO
Copy the code
Solutions:
public class ThreadLocalHystrixConcurrencyStrategy extends HystrixConcurrencyStrategy {
/** * Hystrix thread pool passes the MDC context */
@Override
public <T> Callable<T> wrapCallable(Callable<T> callable) {
Map<String, String> map = MDC.getCopyOfContextMap();
return() - > {try {
MDC.setContextMap(map);
return callable.call();
} finally{ MDC.clear(); }}; }}Copy the code
Take a look at the modified effect:
2021-12-23 14:52:32.714 [205932514ea19D87] [HTTP-NIO-8080-exec-1] INFO 2021-12-23 14:52:32.850 [205932514ea19d87] INFO 2021-12-23 14:52:32.850 [205932514ea19d87] [hystrix-2] INFO 2021-12-23 14:52:32.890 [205932514ea19d87] [HTTP-NIO-8080-exec-1] INFO 2021-12-23 14:52:32.892 [205932514ea19d87] [http-nio-8080-exec-1] INFO 2021-12-23 14:52:32.937 [205932514ea19d87] [HTTP-nio-8080-exec-1] INFO 2021-12-23 14:52:32.987 [205932514ea19D87] [pool-15-thread-2] INFO 2021-12-23 14:52:33.036 [205932514ea19d87] INFO 2021-12-23 14:52:33.036 [205932514ea19d87] [hystrix-3] INFO 2021-12-23 14:52:33.153 [205932514ea19d87] [HTTP-NIO-8080-exec-1] INFOCopy the code
3,Slf4jCurrentTraceContext
useInheritableThreadLocal
The problem
[InheritableThreadLocal] [InheritableThreadLocal] [InheritableThreadLocal] [InheritableThreadLocal] [InheritableThreadLocal] [InheritableThreadLocal] [InheritableThreadLocal] [InheritableThreadLocal]
In the thread pool state, when the thread is created for the first time, the inheritableThreadLocals data will be copied from the parent thread. Therefore, the variables set by the parent thread will be retrieved when the task starts. It does not trigger data operations on the inheritableThreadLocals in the parent thread, so even if a new traceId is set in the main thread, it does not take effect. The asynchronous thread traceId uses the old traceId.
So how to solve this problem? Solutions are as follows:
Create a custom DefaultTraceContext with three types of thread copies (ThreadLocal, InheritableThreadLocal, TransmittableThreadLocal). This project uses the TransmittableThreadLocal
/**
* DefaultTraceContext
*
* @author weiyilin
* @dateThe 2022-01-29 16:12:43 * /
public class DefaultTraceContext extends CurrentTraceContext {
static final ThreadLocal<TraceContext> DEFAULT = new ThreadLocal<>();
static final InheritableThreadLocal<TraceContext> INHERITABLE = new InheritableThreadLocal<>();
static final TransmittableThreadLocal<TraceContext> TRANSMITTABLE = new TransmittableThreadLocal<>();
final ThreadLocal<TraceContext> local;
/** * Uses a non-inheritable static thread local */
public static CurrentTraceContext create(a) {
return new DefaultTraceContext(DEFAULT);
}
/**
* Uses an inheritable static thread local which allows arbitrary calls to {@link
* Thread#start()} to automatically inherit this context. This feature is available as it is was
* the default in Brave 3, because some users couldn't control threads in their applications.
*
* <p>This can be a problem in scenarios such as thread pool expansion, leading to data being
* recorded in the wrong span, or spans with the wrong parent. If you are impacted by this,
* switch to {@link #create()}.
*/
public static CurrentTraceContext inheritable(a) {
return new DefaultTraceContext(INHERITABLE);
}
public static CurrentTraceContext transmittable(a) {
return new DefaultTraceContext(TRANSMITTABLE);
}
DefaultTraceContext(ThreadLocal<TraceContext> local) {
if (local == null) {
throw new NullPointerException("local == null");
}
this.local = local;
}
@Override
public TraceContext get(a) {
return local.get();
}
@Override
public CurrentTraceContext.Scope newScope(@Nullable TraceContext currentSpan) {
final TraceContext previous = local.get();
local.set(currentSpan);
class DefaultCurrentTraceContextScope implements CurrentTraceContext.Scope {
@Override
public void close(a) { local.set(previous); }}return newDefaultCurrentTraceContextScope(); }}Copy the code
Inject bean objects
/**
* TraceConfiguration
*
* @author weiyilin
* @dateThe 2022-01-29 16:10:16 * /
@Configuration
public class TraceConfiguration {
@Bean
public CurrentTraceContext slf4jSpanLogger(a) {
returnSlf4jCurrentTraceContext.create(DefaultTraceContext.transmittable()); }}Copy the code
Reference documentation
- transmittable-thread-local