Monitoring Log4j2 asynchronous logging for write bottlenecks
In a previous article, we looked in detail at the classic Log4j2 asynchronous log blocking problem, which is caused by slow log file writes, analyzed in depth how Log4j2 asynchronous logging works, and finally offered some solutions.
New question – how best to deal with this situation?
The proposed solutions only optimize the problem we had already identified. As the business grows, the volume of logs will keep increasing, and the sheer number of logs may make log writing a new performance bottleneck. For that, we need monitoring.
The first thing that comes to mind is collecting system metrics from outside the process. Services today are pushed toward the cloud and cloud-native deployments, and cloud services most likely store logs on a network file system (NFS), such as AWS EFS. These systems can scale their maximum IO throughput dynamically, but business growth is hard to estimate perfectly, and traffic spikes under high concurrency arrive in an instant. Periodic IO sampling alone cannot reveal the real traffic peak: if, say, IO metrics are sampled every 5 seconds but a burst of traffic arrives within one second, a large number of threads in the process block, and the IO metrics collected afterwards do not look high at all. And because those threads are blocked, CPU usage may not look high either. So with periodic, external metric collection it is difficult to pinpoint problems caused by log traffic.
Next, we considered having the process monitor itself and expose an interface that external monitoring can poll periodically, such as a Kubernetes pod health check. When a process's log load is too high, add a new instance; once the new instance has started, mark the overloaded instance as temporarily offline in the service registry (for example, OUT_OF_SERVICE in Eureka or PAUSED in Nacos) so traffic is routed to the other instances. When the log pressure drops, change the status back to UP so it resumes serving. A sketch of this idea follows.
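As an illustration only, here is a minimal sketch that uses Spring Boot 2.3+ availability states as a stand-in for the registry status change (with Eureka or Nacos you would call the respective client API to set OUT_OF_SERVICE / PAUSED instead). The class name, the check interval, and the logPressureIsHigh() helper are hypothetical placeholders, not part of the original setup:

import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Requires @EnableScheduling on some configuration class for @Scheduled to fire
@Component
public class LogPressureReadinessSwitch {

    private final ApplicationEventPublisher publisher;

    public LogPressureReadinessSwitch(ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }

    // Check the log pressure periodically, e.g. every 5 seconds
    @Scheduled(fixedDelay = 5000)
    public void check() {
        if (logPressureIsHigh()) {
            // Comparable in spirit to OUT_OF_SERVICE (Eureka) / PAUSED (Nacos):
            // the readiness probe fails and the instance stops receiving new traffic
            AvailabilityChangeEvent.publish(publisher, this, ReadinessState.REFUSING_TRAFFIC);
        } else {
            // Comparable to setting the status back to UP
            AvailabilityChangeEvent.publish(publisher, this, ReadinessState.ACCEPTING_TRAFFIC);
        }
    }

    private boolean logPressureIsHigh() {
        // Hypothetical placeholder: e.g. compare the RingBuffer remaining capacity
        // (how to read it is shown later in this article) against a threshold
        return false;
    }
}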
So how does this monitoring work?
The core of monitoring Log4j2 asynchronous logging: monitoring the RingBuffer
From the earlier analysis of how Log4j2 asynchronous logging works, we know that its core is the RingBuffer data structure used as a buffer. We can monitor changes in its remaining capacity to judge the current log pressure. So how do we obtain it?
The relationship between Log4j2 asynchronous loggers and RingBuffers
Log4j2 creates a separate RingBuffer for each AsyncLogger configuration. Take this Log4j2 configuration as an example:
<loggers>
    <!-- default logger -->
    <AsyncRoot level="info" includeLocation="true">
        <appender-ref ref="console"/>
    </AsyncRoot>
    <AsyncLogger name="RocketmqClient" level="error" additivity="false" includeLocation="true">
        <appender-ref ref="console"/>
    </AsyncLogger>
    <AsyncLogger name="com.alibaba.druid.pool.DruidDataSourceStatLoggerImpl" level="error" additivity="false" includeLocation="true">
        <appender-ref ref="console"/>
    </AsyncLogger>
    <AsyncLogger name="org.mybatis" level="error" additivity="false" includeLocation="true">
        <appender-ref ref="console"/>
    </AsyncLogger>
</loggers>
This configuration contains four async loggers (one AsyncRoot and three AsyncLoggers), and Log4j2 creates a RingBuffer for each of them. Log4j2 also supports monitoring its AsyncLoggers, and exposes that monitoring information as MBeans (JMX managed beans).
The relevant source code is as follows:
Server.java
private static void registerLoggerConfigs(final LoggerContext ctx, final MBeanServer mbs, final Executor executor)
        throws InstanceAlreadyExistsException, MBeanRegistrationException, NotCompliantMBeanException {
    // Get all the Logger configurations under the <loggers> tag in log4j2.xml
    final Map<String, LoggerConfig> map = ctx.getConfiguration().getLoggers();
    for (final String name : map.keySet()) {
        final LoggerConfig cfg = map.get(name);
        final LoggerConfigAdmin mbean = new LoggerConfigAdmin(ctx, cfg);
        // Register a LoggerConfigAdmin for each Logger
        register(mbs, mbean, mbean.getObjectName());
        // If asynchronous logging is configured, also register a RingBufferAdmin for its RingBuffer
        if (cfg instanceof AsyncLoggerConfig) {
            final AsyncLoggerConfig async = (AsyncLoggerConfig) cfg;
            final RingBufferAdmin rbmbean = async.createRingBufferAdmin(ctx.getName());
            register(mbs, rbmbean, rbmbean.getObjectName());
        }
    }
}
The source of the MBean class being created, RingBufferAdmin.java:
public class RingBufferAdmin implements RingBufferAdminMBean {
    private final RingBuffer<?> ringBuffer;
    private final ObjectName objectName;
    // ... other code that we do not care about is omitted

    public static final String DOMAIN = "org.apache.logging.log4j2";
    String PATTERN_ASYNC_LOGGER_CONFIG = DOMAIN + ":type=%s,component=Loggers,name=%s,subtype=RingBuffer";

    public static RingBufferAdmin forAsyncLoggerConfig(final RingBuffer<?> ringBuffer,
            final String contextName, final String configName) {
        final String ctxName = Server.escape(contextName);
        // For the RootLogger, configName is an empty string
        final String cfgName = Server.escape(configName);
        final String name = String.format(PATTERN_ASYNC_LOGGER_CONFIG, ctxName, cfgName);
        return new RingBufferAdmin(ringBuffer, name);
    }

    @Override
    public long getBufferSize() {
        return ringBuffer == null ? 0 : ringBuffer.getBufferSize();
    }

    @Override
    public long getRemainingCapacity() {
        return ringBuffer == null ? 0 : ringBuffer.remainingCapacity();
    }

    public ObjectName getObjectName() {
        return objectName;
    }
}
We can view the corresponding MBean via JConsole:
Here, 2f0E140b is the name of the LoggerContext.
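The same information can also be read programmatically over JMX instead of through JConsole. The following is a minimal sketch, assuming it runs inside the same JVM as the application (for example from a test or a management endpoint); the ObjectName follows the naming pattern from the Log4j2 source shown above, with the LoggerContext name 2f0E140b taken from this example (it differs per process) and the empty name= referring to the root logger configuration:

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RingBufferJmxProbe {
    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();

        // List every MBean that Log4j2 registered under its JMX domain
        for (ObjectName name : mbs.queryNames(new ObjectName("org.apache.logging.log4j2:*"), null)) {
            System.out.println(name);
        }

        // Read the remaining capacity of the root logger's RingBuffer
        ObjectName rootRingBuffer = new ObjectName(
                "org.apache.logging.log4j2:type=2f0E140b,component=Loggers,name=,subtype=RingBuffer");
        Number remaining = (Number) mbs.getAttribute(rootRingBuffer, "RemainingCapacity");
        System.out.println("root remaining capacity: " + remaining);
    }
}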
Monitoring the Log4j2 RingBuffer size with Spring Boot + Prometheus
Our microservice project uses Spring Boot with Prometheus integration. We can expose the Log4j2 RingBuffer size to Prometheus as a metric with the following code:
import io.micrometer.core.instrument.Gauge;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import lombok.extern.log4j.Log4j2;
import org.apache.commons.lang3.StringUtils;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.LoggerContext;
import org.apache.logging.log4j.core.jmx.RingBufferAdminMBean;
import org.springframework.beans.factory.ObjectProvider;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.actuate.autoconfigure.metrics.export.ConditionalOnEnabledMetricsExport;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.event.ContextRefreshedEvent;
import org.springframework.context.event.EventListener;

import javax.annotation.PostConstruct;
import javax.management.ObjectName;
import java.lang.management.ManagementFactory;

@Log4j2
@Configuration(proxyBeanMethods = false)
// Load only when Prometheus is on the classpath and the Prometheus metrics export is enabled
@ConditionalOnEnabledMetricsExport("prometheus")
public class Log4j2Configuration {
    @Autowired
    private ObjectProvider<PrometheusMeterRegistry> meterRegistry;

    // Initialize only once
    private volatile boolean isInitialized = false;

    // The log configuration is initialized before the ApplicationContext is loaded,
    // but the Prometheus-related beans load rather late.
    // The ApplicationContext may be refreshed multiple times, for example by calling the actuator /refresh endpoint.
    // For simplicity, use a simple isInitialized flag to check whether this is the first initialization,
    // ensuring it only happens once.
    @EventListener(ContextRefreshedEvent.class)
    public synchronized void init() {
        if (!isInitialized) {
            // Get the LoggerContext from LogManager; it contains all the Log4j2 configuration
            LoggerContext loggerContext = (LoggerContext) LogManager.getContext(false);
            org.apache.logging.log4j.core.config.Configuration configuration = loggerContext.getConfiguration();
            // Get the name of the LoggerContext, because the MBean name contains it
            String ctxName = loggerContext.getName();
            configuration.getLoggers().keySet().forEach(k -> {
                try {
                    // For the RootLogger, its cfgName is an empty string; in Prometheus we name it "root"
                    String cfgName = StringUtils.isBlank(k) ? "" : k;
                    String gaugeName = StringUtils.isBlank(k) ? "root" : k;
                    Gauge.builder(gaugeName + "_logger_ring_buffer_remaining_capacity", () -> {
                        try {
                            return (Number) ManagementFactory.getPlatformMBeanServer().getAttribute(
                                    // Assemble the MBean name following the naming in the Log4j2 source
                                    new ObjectName(String.format(RingBufferAdminMBean.PATTERN_ASYNC_LOGGER_CONFIG, ctxName, cfgName)),
                                    // Read the remaining capacity; note that the attribute name is strictly case sensitive
                                    "RemainingCapacity");
                        } catch (Exception e) {
                            log.error("get {} ring buffer remaining size error", k, e);
                        }
                        return -1;
                    }).register(meterRegistry.getIfAvailable());
                } catch (Exception e) {
                    log.error("Log4j2Configuration-init error: {}", e.getMessage(), e);
                }
            });
            isInitialized = true;
        }
    }
}
Add this code, and after requesting /actuator/prometheus you can see the corresponding output:
# HELP root_logger_ring_buffer_remaining_capacity
# TYPE root_logger_ring_buffer_remaining_capacity gauge
root_logger_ring_buffer_remaining_capacity 262144.0
# HELP org_mybatis_logger_ring_buffer_remaining_capacity
# TYPE org_mybatis_logger_ring_buffer_remaining_capacity gauge
org_mybatis_logger_ring_buffer_remaining_capacity 262144.0
# HELP com_alibaba_druid_pool_DruidDataSourceStatLoggerImpl_logger_ring_buffer_remaining_capacity
# TYPE com_alibaba_druid_pool_DruidDataSourceStatLoggerImpl_logger_ring_buffer_remaining_capacity gauge
com_alibaba_druid_pool_DruidDataSourceStatLoggerImpl_logger_ring_buffer_remaining_capacity 262144.0
# HELP RocketmqClient_logger_ring_buffer_remaining_capacity
# TYPE RocketmqClient_logger_ring_buffer_remaining_capacity gauge
RocketmqClient_logger_ring_buffer_remaining_capacity 262144.0
Thus, when this value stays at 0 for a period of time (meaning the RingBuffer is full and logs are being produced faster than the appenders can consume them), we consider the application's log load to be too high.
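One possible way to close the loop with the health-check idea from the beginning of this article is to turn the same reading into a Spring Boot Actuator health indicator, so that a Kubernetes readiness probe or a registry status controller can react to it. The following is only a sketch under the assumption that Spring Boot Actuator is present; the class name and threshold are arbitrary examples, and only the root logger's RingBuffer is checked here:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.core.LoggerContext;
import org.apache.logging.log4j.core.jmx.RingBufferAdminMBean;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

import javax.management.ObjectName;
import java.lang.management.ManagementFactory;

@Component
public class Log4j2RingBufferHealthIndicator implements HealthIndicator {

    // Hypothetical threshold: with the default 256 * 1024 slots, report DOWN below roughly 10% remaining
    private static final long MIN_REMAINING = 26_000;

    @Override
    public Health health() {
        try {
            // Read the same MBean attribute as the Prometheus gauge above; the empty config name is the root logger
            LoggerContext loggerContext = (LoggerContext) LogManager.getContext(false);
            ObjectName rootRingBuffer = new ObjectName(String.format(
                    RingBufferAdminMBean.PATTERN_ASYNC_LOGGER_CONFIG, loggerContext.getName(), ""));
            Number remaining = (Number) ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(rootRingBuffer, "RemainingCapacity");
            return (remaining.longValue() < MIN_REMAINING ? Health.down() : Health.up())
                    .withDetail("remainingCapacity", remaining)
                    .build();
        } catch (Exception e) {
            // Do not fail the probe just because the monitoring itself broke
            return Health.unknown().withException(e).build();
        }
    }
}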