Monitor Spring Boot Microservices

- By Tanmay Ambre
- The Nuggets Translation Project
- Permanent link to this article: github.com/xitu/gold-m…
- Translator: YueYongDev
- Proofreaders: Liang2028, Niayyy

Use Micrometer, Prometheus, and Grafana to build comprehensive monitoring capabilities for Spring Boot microservices.
Introduction
Observability (monitoring, logging, tracing, and alerting) is an important architectural concern when using microservices and event-driven architecture (EDA), primarily because:
- Large-scale deployments require centralized and automated monitoring and observability
- The asynchronous and distributed nature of the architecture makes it difficult to correlate metrics generated by multiple components
Resolving this architectural issue simplifies architecture management and speeds up turnaround time to resolve runtime problems. It can also provide insights that help make smart architecture, design, deployment, and infrastructure improvements to the non-functional features of the platform. In addition, the output, collection, and visualization of custom metrics can bring other useful information to a business or operation.
However, in practice this concern is often overlooked. This tutorial serves as a best-practice guide for building observability into Java and Spring Boot microservices using open source tools such as Micrometer, Prometheus, and Grafana.
Prerequisites
Before you start this tutorial, you need to set up the following environment:
- Docker environment with Docker Compose tool
- A Java IDE for cloning and editing code in git repo
Estimated time
This tutorial will take approximately 2 hours to complete.
Monitoring overview
The main objectives of monitoring tools are:
- Monitor application performance
- Self-service for stakeholders (development teams, infrastructure teams, operations users, maintenance teams, and business users).
- Assist in rapid root cause analysis (RCA)
- Establish a performance baseline for your application
- If cloud services are used, provide the ability to monitor the cost of cloud service usage and monitor different cloud services in an integrated manner
Monitoring comprises four main activities:
- Application instrumentation – Instrumenting an application to emit metrics is important for application monitoring and maintenance teams as well as business users. There are many non-invasive ways to instrument an application, the most popular being bytecode instrumentation, aspect-oriented programming, and JMX.
- Metrics collection – Metrics are collected from the application and persisted in a suitable repository. The repository then needs to provide a way to query and aggregate the data so it can be visualized. Popular collectors are Prometheus, StatsD, and DataDog. Most metrics collectors are time-series databases and provide advanced query capabilities.
- Metrics visualization – Visualization tools query the metrics repository and build views and dashboards for end users. They provide rich user interfaces to perform various operations on metrics, such as aggregation, drill-down, and so on.
- Alerts and notifications – When metrics breach defined thresholds (for example, CPU utilization above 80 percent for 10 minutes), manual intervention may be required. Alerts and notifications are therefore important, and most visualization tools provide alerting and notification capabilities.
Many open source and commercial products are available for monitoring. Some notable commercial products are AppDynamics, Dynatrace, DataDog, LogDNA, and Sysdig. Open source tools are often used in combination; some very popular combinations are Prometheus + Grafana, Elasticsearch-Logstash-Kibana (ELK), and StatsD + Graphite.
Microservices Monitoring Guide
We encourage consistency in the types of metrics collected across all microservices. This helps improve the reusability of monitoring dashboards and simplifies the aggregation and drill-down of metrics to visualize them at different levels.
What to monitor
Microservices expose APIs and/or consume events and messages. During processing, they may invoke their own business components, connect to databases, call technical services (caching, auditing, and so on), call other microservices, and/or produce events and messages. Instrumenting these different processing stages is beneficial because it provides a consolidated view of performance and anomalies, which in turn enables rapid problem analysis.
Common metrics related to event-driven Architecture (EDA) and microservices include:
- Resource utilization metrics
  - Resource utilization: CPU, memory, disk, and network utilization, etc.
  - JVM heap and GC metrics: GC overhead, GC time, utilization of the heap (and its different regions)
  - JVM thread utilization: blocked, runnable, and waiting threads
- Application metrics
  - Availability, latency, throughput, status, and exceptions of the different architectural layers of the microservice, for example:
    - Controller layer: HTTP/REST method calls
    - Service layer: method calls
    - Data access layer: method calls
    - Integration layer: RPC calls, HTTP/REST/API calls, message publishing, message consumption
- Technical service utilization metrics (specific to the corresponding technical service)
  - Cache: hit rate, miss rate, put rate, eviction rate, get rate
  - Logging: number of log events per log level
  - Connection pools: pool utilization, connection wait time, connection creation time, number of idle connections
- Middleware metrics
  - Event broker metrics: availability, message throughput, byte throughput, consumer lag, (de)serialization exceptions, cluster state
  - Database metrics
For application metrics, ideally the entry and exit points of each architectural layer of the microservice should be instrumented.
Key metric characteristics for microservices
The following three characteristics of metrics are important when monitoring microservices:
- Dimensionality
- Time series / rate aggregation
- Metric perspectives
Dimensionality
Dimensionality controls how a metric is aggregated and how deeply a particular metric can be drilled into. This is done by adding tags to a metric. A tag is a key-value pair (name and value). Tags qualify the metrics so that they can be retrieved or aggregated through queries to the monitoring system. Dimensionality is an important feature for monitoring microservices because of the large number of deployments: multiple microservices (or even different components of one microservice) emit metrics with the same name, and to distinguish between them you need to qualify the metrics with dimensions.
For example, consider the http_server_requests_seconds_count metric. If there are multiple API endpoints (as is the case in a microservice ecosystem), without dimensionality you can only view this metric aggregated at the platform level; the distribution of the metric across the different API endpoints is not available. By adding a uri tag when emitting the metric, you can get that distribution. The following example illustrates this feature.
If http_server_requests_seconds_count produces metric data with the following tags:
```
http_server_requests_seconds_count{appName="samplemicrosvc",env="local",exception="None",instanceId="1",method="GET",outcome="SUCCESS",status="200",uri="/addressDetails/{addressId}"} 67.0
http_server_requests_seconds_count{appName="samplemicrosvc",env="local",exception="InternalServerError",instanceId="1",method="GET",outcome="SERVER_ERROR",status="500",uri="/userInfo/{username}"} 39.0
http_server_requests_seconds_count{appName="samplemicrosvc",env="local",exception="None",instanceId="1",method="GET",outcome="SUCCESS",status="200",uri="/userInfo/{username}"} 67.0
http_server_requests_seconds_count{appName="samplemicrosvc",env="local",exception="IllegalArgumentException",instanceId="1",method="GET",outcome="SERVER_ERROR",status="500",uri="/addressDetails/{addressId}"} 13.0
http_server_requests_seconds_count{appName="samplemicrosvc",env="local",exception="IllegalStateException",instanceId="1",method="GET",outcome="SERVER_ERROR",status="500",uri="/addressDetails/{addressId}"} 26.0
```
then the http_server_requests_seconds_count metric can be aggregated at the appName and instanceId level, or by HTTP response status or outcome. The queries look like this:
```
# Count distribution by status for a given environment
sum by (status) (http_server_requests_seconds_count{env="$env"})

# Count distribution by uri and status for a given environment
sum by (uri, status) (http_server_requests_seconds_count{env="$env"})

# Count distribution by uri, status and appName for a given environment
sum by (uri, status, appName) (http_server_requests_seconds_count{env="$env"})
```
Tags can also be used as query criteria. Notice the use of the env tag, where $env is the variable the Grafana dashboard uses for the user to enter “environment.”
Time series/rate aggregation
The ability to aggregate metrics over time is important for performance analysis of applications, such as associating performance with load patterns, building daily/weekly/monthly performance profiles, and creating performance baselines for applications.
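In Prometheus, for example, this kind of time-based aggregation is typically done with range vectors and functions such as rate() and increase(). The queries below are illustrative sketches (not taken from the tutorial's dashboards), written against the http_server_requests_seconds metrics shown later in this tutorial:

```
# Per-second request rate over the last 5 minutes, per application
sum by (appName) (rate(http_server_requests_seconds_count{env="$env"}[5m]))

# Average request latency over the last 5 minutes, per application
  sum by (appName) (rate(http_server_requests_seconds_sum{env="$env"}[5m]))
/ sum by (appName) (rate(http_server_requests_seconds_count{env="$env"}[5m]))

# Total number of requests over the last 24 hours
sum(increase(http_server_requests_seconds_count{env="$env"}[24h]))
```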
Metric perspectives
This is a derived characteristic; it provides the ability to group metrics together for easy visualization and consumption. For example:
- A dashboard that describes the availability status of all microservices on the platform
- A drill-down (detailed) view of each microservice to see its detailed metrics
- Cluster views and detailed views of middleware components, such as Event Broker
Instrumenting the Spring Boot microservices
This section covers instrumenting microservices and their REST controllers, service beans, component beans, and data access objects. It also covers some components related to EDA and integration, such as Kafka producers and consumers, spring-cloud-stream, and Apache Camel routes.
To help monitor and manage microservices, Spring Boot Actuator is used here. It exposes a number of HTTP and JMX endpoints to monitor the application out of the box, enabling basic monitoring of a microservice's health, beans, application information, and environment information.
To enable this feature, add spring-boot-starter-actuator as a dependency of the application:
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```
Out-of-the-box metrics
With Spring Boot Actuator added to a microservice, the following metrics are available out of the box:
- JVM metrics (related to GC and thread utilization)
- Resource utilization metrics (CPU, threads, file descriptors, JVM heap, and garbage collection metrics)
- Kafka Consumer Metrics
- Logging metrics (Log4j2 and Logback)
- Uptime metrics (process uptime)
- Cache metrics (Caffeine, EhCache2, Hazelcast or any JSR-107 compliant cache)
- Tomcat metrics
- Spring Integration Metrics
The metrics endpoint
The Actuator also exposes an endpoint for metrics, by default at /actuator/metrics. The endpoints need to be exposed through Spring configuration. Here is an example configuration:
```yaml
management:
  endpoints:
    web:
      exposure:
        include:
          [
            "health",
            "info",
            "metrics",
            "prometheus",
            "bindings",
            "beans",
            "env",
            "loggers",
            "streamsbindings",
          ]
```
Micrometer
To integrate with monitoring systems, Spring Boot Actuator provides auto-configuration for Micrometer. Micrometer provides a facade over a number of monitoring systems, including Prometheus. This tutorial assumes some familiarity with Micrometer concepts. Micrometer provides three mechanisms for collecting metrics (a short sketch follows the list):
- Counter – usually used to count occurrences, method executions, exceptions, and so on
- Timer – measures both duration and frequency of occurrence; usually used to measure latency
- Gauge – a point-in-time measurement; for example, the number of active threads
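As a minimal sketch (the class, meter names, and tags below are illustrative and not taken from the sample repository), the three meter types can be used directly through an injected MeterRegistry:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

// Hypothetical component illustrating Counter, Timer, and Gauge usage
@Component
public class OrderMetrics {

    private final Counter ordersProcessed;
    private final Timer orderLatency;
    private final AtomicInteger ordersInFlight = new AtomicInteger();

    public OrderMetrics(MeterRegistry registry) {
        // Counter: counts occurrences
        this.ordersProcessed = registry.counter("orders_processed_total", "componentType", "service");
        // Timer: measures duration and frequency of occurrence
        this.orderLatency = registry.timer("order_processing_timer", "componentType", "service");
        // Gauge: point-in-time measurement backed by a number that the registry samples
        registry.gauge("orders_in_flight", ordersInFlight);
    }

    public void processOrder(Runnable work) {
        ordersInFlight.incrementAndGet();
        long start = System.currentTimeMillis();
        try {
            work.run();
        } finally {
            orderLatency.record(System.currentTimeMillis() - start, TimeUnit.MILLISECONDS);
            ordersProcessed.increment();
            ordersInFlight.decrementAndGet();
        }
    }
}
```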
Integration with Prometheus
Because Prometheus uses polling to collect metrics, integrating Prometheus and Micrometer is a relatively simple two-step process.
- Add the micrometer-registry-prometheus registry dependency.
- Declare a bean of type MeterRegistryCustomizer<PrometheusMeterRegistry>.
The second step is optional, but I recommend it because it provides a way to customize the MeterRegistry. This is useful for declaring **common tags (dimensions)** on the metrics collected by Micrometer. It is especially helpful when there are many microservices or multiple instances of each microservice; common tags include applicationName, instanceName, and environment. They allow you to build visualizations of data aggregated across applications and instances, and to drill down to a specific instance, application, or environment.
After the configuration is complete, the Actuator exposes an endpoint at /actuator/prometheus, which should be enabled in the Spring configuration (see the example above). We then need to configure a job in Prometheus to scrape the data produced by this endpoint at a specified interval.
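For reference, a Prometheus scrape job for this endpoint might look like the following sketch; the job name, target host/port, and interval are assumptions, and the actual configuration used by this tutorial's Docker Compose setup is in the sample repository:

```yaml
scrape_configs:
  - job_name: 'samplemicrosvc'           # assumed job name
    metrics_path: '/actuator/prometheus'  # endpoint exposed by the Actuator
    scrape_interval: 15s
    static_configs:
      - targets: ['samplemicrosvc:8080']  # assumed host:port of the microservice
```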
Add a Prometheus dependency to the POM
```xml
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```
Customize the MeterRegistry
The configuration class declaring the MeterRegistryCustomizer can be made part of a shared framework so that all microservice implementations can reuse it. System or application properties can be used as tag values.
```java
@Configuration
public class MicroSvcMeterRegistryConfig {

    @Value("${spring.application.name}")
    String appName;

    @Value("${env}")
    String environment;

    @Value("${instanceId}")
    String instanceId;

    @Bean
    MeterRegistryCustomizer<PrometheusMeterRegistry> configureMetricsRegistry() {
        return registry -> registry.config().commonTags("appName", appName, "env", environment, "instanceId", instanceId);
    }
}
```
Instrumenting application-level metrics
Some application-level metrics are available out of the box, while in other cases custom instrumentation is needed. The following table summarizes the options:
| Metric | Controller | Service layer components | Data access objects | Business components | Technical components | Kafka consumers | Kafka producers | Spring Integration components | HTTP clients | Camel routes |
|---|---|---|---|---|---|---|---|---|---|---|
| Resource utilization (CPU, threads, file descriptors, heap, GC) | Out of the box at the microservice instance level | | | | | | | | | |
| Availability | Out of the box at the microservice instance level | | | | | | | | | |
| Latency | Out of the box via the @Timed annotation | Custom reusable aspects with Spring AOP | Custom reusable aspects with Spring AOP | Custom reusable aspects with Spring AOP | Out of the box for logging, caching, and JDBC connection pools | Out of the box if spring-cloud-stream is used | Custom MeterBinder beans | Out of the box | Out of the box | Partial support; custom instrumentation of routes is required |
| Throughput | Out of the box via the @Timed annotation | Custom reusable aspects with Spring AOP | Custom reusable aspects with Spring AOP | Custom reusable aspects with Spring AOP | Out of the box for logging, caching, and JDBC connection pools | Out of the box if spring-cloud-stream is used | Custom MeterBinder beans | Out of the box | Out of the box | Partial support; custom instrumentation of routes is required |
| Exceptions | Out of the box via the @Timed annotation | Custom reusable aspects with Spring AOP | Custom reusable aspects with Spring AOP | Custom reusable aspects with Spring AOP | Out of the box for logging, caching, and JDBC connection pools | Out of the box if spring-cloud-stream is used | Custom MeterBinder beans | Out of the box | Out of the box | Partial support; custom instrumentation of routes is required |
Instrumenting REST controllers
The fastest and easiest way to instrument a REST controller is to use the @Timed annotation on the controller or on its individual methods. The @Timed annotation automatically adds the exception, method, outcome, status, and uri tags to the timer, and it can also add extra tags, as in the sketch below.
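As a minimal sketch (the controller class and handler below are illustrative, not taken from the sample repository; only the /addressDetails URI mirrors the earlier metric samples), a controller method could be annotated like this:

```java
import io.micrometer.core.annotation.Timed;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical controller showing @Timed with extra tags
@RestController
public class AddressDetailsController {

    // Augments the timer for this handler; extraTags contributes custom dimensions
    @Timed(extraTags = {"layer", "controller"})
    @GetMapping("/addressDetails/{addressId}")
    public String getAddressDetails(@PathVariable String addressId) {
        // ... look up and return the address details
        return "details-for-" + addressId;
    }
}
```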
Instrumenting the different architectural layers of a microservice
Microservices typically have a controller layer, a service layer, a data access layer, and an integration layer. The controller layer, with the @Timed annotation, typically does not require any additional instrumentation. For the service layer, data access layer, and integration layer, developers typically create custom beans using the @Service or @Component annotations. Metrics related to latency, throughput, and exceptions of these beans provide important information for analysis, and they can easily be collected using Micrometer's Timer and Counter. However, code needs to be instrumented to capture these metrics, and this is where Spring AOP comes in: you can create reusable aspects that instrument services and components and use them across all microservices. With the @Around and @AfterThrowing advice, metrics can be generated without adding any code to the service/component classes and methods. The following is a reference guide:
- Create reusable annotations to apply to the different types of components/services, such as @MonitoredService, @MonitoredDAO, and @MonitoredIntegrationComponent, which are added to services, data access objects, and integration components respectively.
- Define pointcuts that apply the advice to the different types of components carrying these annotations.
- Apply appropriate tags to the metrics so that they can be analyzed and drilled into, for example componentClass, componentType, methodName, and exceptionClass. Using these custom tags along with the common tags, metrics are produced as follows:
```
component_invocation_timer_count{env="local", instanceId="1", appName="samplemicrosvc", componentClass="SampleService", componentType="service", methodName="getUserInformation"} 26.0
```
See the following sample annotation:
```java
@Target({ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
public @interface MonitoredService {
}
```
The following sample code shows a simple reusable aspect for instrumenting service classes:
```java
@Configuration
@EnableAspectJAutoProxy
@Aspect
public class MonitoringAOPConfig {

    @Autowired
    MeterRegistry registry;

    @Pointcut("@target(com.ibm.dip.microsvcengineering.framework.monitoring.MonitoredService) && within(com.ibm.dip..*)")
    public void servicePointcut() {}

    @Around("servicePointcut()")
    public Object serviceResponseTimeAdvice(ProceedingJoinPoint pjp) throws Throwable {
        return monitorResponseTime(pjp, TAG_VALUE_SERVICE_TYPE);
    }

    @AfterThrowing(pointcut = "servicePointcut()", throwing = "ex")
    public void serviceExceptionMonitoringAdvice(JoinPoint joinPoint, Exception ex) {
        monitorException(joinPoint, ex, TAG_VALUE_SERVICE_TYPE);
    }

    private Object monitorResponseTime(ProceedingJoinPoint pjp, String type) throws Throwable {
        long start = System.currentTimeMillis();
        Object obj = pjp.proceed();
        pjp.getStaticPart();
        long end = System.currentTimeMillis();
        String serviceClass = getClassName(pjp.getThis().getClass().getName());
        String methodName = pjp.getSignature().getName();
        Timer timer = registry.timer(METER_COMPONENT_TIMER,
                TAG_COMPONENTCLASS, serviceClass, TAG_METHODNAME, methodName, TAG_OUTCOME, SUCCESS, TAG_TYPE, type);
        timer.record((end - start), TimeUnit.MILLISECONDS);
        Counter successCounter = registry.counter(METER_COMPONENT_COUNTER,
                TAG_COMPONENTCLASS, serviceClass, TAG_METHODNAME, methodName, TAG_OUTCOME, SUCCESS, TAG_TYPE, type);
        successCounter.increment();
        return obj;
    }

    private void monitorException(JoinPoint joinPoint, Exception ex, String type) {
        String serviceClass = getClassName(joinPoint.getThis().getClass().getName());
        String methodName = joinPoint.getSignature().getName();
        Counter failureCounter = registry.counter(METER_COMPONENT_EXCEPTION_COUNTER,
                TAG_EXCEPTIONCLASS, ex.getClass().getName(), TAG_COMPONENTCLASS, serviceClass,
                TAG_METHODNAME, methodName, TAG_OUTCOME, ERROR, TAG_TYPE, type);
        failureCounter.increment();
    }
}
```
This abstracts all of the instrumentation logic of a microservice into a reusable set of aspects and annotations; microservice developers simply add annotations to their classes.
The annotation is used as follows. By placing it on the SampleService class, all methods of the class automatically become candidates for serviceResponseTimeAdvice and serviceExceptionMonitoringAdvice.
```java
@Service
@MonitoredService
public class SampleService {
    ...
}
```
Instrumenting outbound HTTP/REST calls
Outbound HTTP/REST calls are instrumented out of the box by Spring Boot Actuator. However, for this to work, the RestTemplate should be obtained from a RestTemplateBuilder bean. In addition, if a custom bean of type RestTemplateExchangeTagsProvider is provided, custom tags can be added to the metrics.
The following configuration classes illustrate this:
```java
@Bean
public RestTemplate restTemplate(RestTemplateBuilder templateBuilder) {
    templateBuilder = templateBuilder.messageConverters(new MappingJackson2HttpMessageConverter())
            .requestFactory(this::getClientHttpRequestFactory);
    return templateBuilder.build();
}

@Bean
public RestTemplateExchangeTagsProvider restTemplateExchangeTagsProvider() {
    return new RestTemplateExchangeTagsProvider() {
        @Override
        public Iterable<Tag> getTags(String urlTemplate, HttpRequest request, ClientHttpResponse response) {
            Tag uriTag = (StringUtils.hasText(urlTemplate) ? RestTemplateExchangeTags.uri(urlTemplate)
                    : RestTemplateExchangeTags.uri(request));
            return Arrays.asList(RestTemplateExchangeTags.method(request), uriTag,
                    RestTemplateExchangeTags.status(response), RestTemplateExchangeTags.clientName(request),
                    Tag.of("componentClass", "httpClient"),
                    Tag.of("componentType", "integration"),
                    Tag.of("methodName", uriTag.getValue()));
        }
    };
}
```
Instrumenting Kafka consumers
Kafka consumers are instrumented by default by the Actuator. The Actuator and Micrometer collect more than 30 metrics related to Kafka consumers, and the common tags are applied to them as well. Some notable metrics are kafka_consumer_records_consumed_total_records_total, kafka_consumer_bytes_consumed_total_bytes_total, and kafka_consumer_records_lag_avg_records. These can be further grouped by Kafka topic, partition, and so on.
Instrumenting Kafka producers
By default, the Actuator does not instrument Kafka producers. The Kafka producer has its own metrics implementation. To register these metrics with Micrometer, you need to define a bean of type MeterBinder for each KafkaProducer. The MeterBinder creates and registers gauges through the Micrometer registry (a registration sketch follows the code below). Using this method, more than 50 Kafka producer metrics can be collected. Common tags and additional tags (added while building the meters) provide multiple dimensions for these metrics.
The following code shows what a common MeterBinder implementation looks like:
```java
public class KafkaProducerMonitor implements MeterBinder {

    // Filter out metrics that don't produce a double
    private Set<String> filterOutMetrics;

    // Need to store the reference of the metric - else it might get garbage collected.
    // KafkaMetric is a custom implementation that holds references to the MetricName and KafkaProducer.
    private Set<KafkaMetric> bindedMetrics;

    private KafkaProducer<?, ?> kafkaProducer;

    private Iterable<Tag> tags;

    public KafkaProducerMonitor(KafkaProducer kafkaProducer, MeterRegistry registry, Iterable<Tag> tags) {
        ...
    }

    @Override
    public void bindTo(MeterRegistry registry) {
        Map<MetricName, ? extends Metric> metrics = kafkaProducer.metrics();
        if (MapUtils.isNotEmpty(metrics)) {
            metrics.keySet().stream().filter(metricName -> !filterOutMetrics.contains(metricName.name()))
                .forEach(metricName -> {
                    logger.debug("Registering Kafka Producer Metric: {}", metricName);
                    KafkaMetric metric = new KafkaMetric(metricName, kafkaProducer);
                    bindedMetrics.add(metric);
                    Gauge.builder("kafka-producer-" + metricName.name(), metric, KafkaMetric::getMetricValue)
                        .tags(tags)
                        .register(registry);
                });
        }
    }
}
```
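A minimal registration sketch follows, assuming a KafkaProducer bean exists; the configuration class, bean names, and tag values are illustrative and not taken from the sample repository. Spring Boot binds MeterBinder beans to the configured meter registry automatically:

```java
import java.util.Arrays;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.binder.MeterBinder;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical wiring for the KafkaProducerMonitor shown above
@Configuration
public class KafkaProducerMetricsConfig {

    @Bean
    public MeterBinder kafkaProducerMonitor(KafkaProducer<String, String> kafkaProducer, MeterRegistry registry) {
        Iterable<Tag> tags = Arrays.asList(
                Tag.of("componentClass", "kafkaProducer"),
                Tag.of("componentType", "integration"));
        return new KafkaProducerMonitor(kafkaProducer, registry, tags);
    }
}
```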
Note: There are other third-party components that generate metrics but are not integrated with Micrometer. In such cases, the pattern above can be used; one example is Apache Ignite.
Integrating Camel
If your application integrates Apache Camel, it defines and processes routes, so it also makes sense to capture metrics at the route level. Camel exposes endpoints for Micrometer through its camel-micrometer component. Adding the camel-micrometer dependency to the application's POM enables the micrometer endpoint, which can be used to start and stop timers and to increment counters, and therefore to collect route-level metrics. For other specific Camel beans, such as those of type org.apache.camel.Processor, you can use the AOP approach described above.
To enable the Micrometer endpoint, add the camel-micrometer dependency as follows:
```xml
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-micrometer</artifactId>
</dependency>
```
To publish route metrics, the RouteBuilder should send messages to the micrometer endpoints, as in the following code:
```java
@Override
public void configure() throws Exception {
    from(inputEndpoints)
        .routeId(routeId)
        .to("micrometer:timer:route_timer" + "?" + "action=start" + "&" + "routeName=<routeId>")
        .to("micrometer:counter:route_counter" + "?" + "routeName=<routeId>")
        // ... other route components
        // and finally
        .to("micrometer:timer:route_timer" + "?" + "action=stop" + "&" + "routeName=<routeId>");
}
```
Instrumentation summary
As you can see, a large number of metrics can be collected and made available to Prometheus using the following approaches:
- Out-of-the-box Actuator metrics.
- Custom instrumentation using AOP and MeterBinder. All of this custom instrumentation code is reusable and can be packaged as a library for all microservice implementations.
Both approaches provide a consistent and minimally intrusive way to gather metrics across multiple microservices and their many instances.
Integrating Prometheus with other third-party systems
Prometheus has a healthy ecosystem. There are many libraries and servers that can be used to export metrics from third-party systems to Prometheus; these are known as Prometheus exporters. For example, mongodb_exporter can be used to export MongoDB metrics to Prometheus.
Apache Kafka exposes its metrics over JMX, and they can be exported to Prometheus as covered in the next section.
Integrating Kafka with Prometheus
If you use Kafka as the message/event broker, integrating Kafka metrics with Prometheus is not out of the box; it requires the jmx_exporter component. jmx_exporter needs to be configured on Kafka's brokers, which then expose the metrics over HTTP, and it requires a configuration file (.yml). A sample jmx_exporter configuration is provided in the examples folder of the sample repository.
In this tutorial, we build custom Kafka images for demonstration purposes only. Instructions for building the custom Kafka images are provided in the readme.md of the jmx_exporter code repository.
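As an illustrative sketch (the jar path, port, and config file name are assumptions, not taken from the tutorial's images), the exporter is typically attached to a broker as a Java agent, for example through KAFKA_OPTS:

```
KAFKA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent.jar=7071:/opt/kafka-jmx-exporter.yml"
```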
Build dashboards in Grafana
Once metrics are registered in the Prometheus meter registry and Prometheus is up and running, it starts collecting metrics. These metrics can now be used to build monitoring dashboards in Grafana. Multiple dashboards are needed for the different perspectives. You are advised to create the following dashboards:
- **Platform overview dashboard** – provides the availability status of each microservice and of the platform's other software components, such as Kafka. This dashboard can also report platform-level aggregated metrics such as **request rates** (HTTP request rate, Kafka consumption rate, and so on) and **exception counts**.
- **Microservice drill-down dashboard** – provides detailed metrics for a microservice instance. The **variables** declared in Grafana are very important here; they correspond to the different tags used in the metrics, for example appName, env, and instanceId.
- **Middleware dashboards** – provide a detailed drill-down view of the middleware components. These are specific to the middleware (for example, a Kafka dashboard). Here too, **variable** declarations are important in order to observe metrics at the **cluster** level and the **instance** level.
Use dimensions to drill down and aggregate
When metrics are reported, tags are added to them. These tags can be used in Prometheus queries for aggregation or to gain insight into the metrics. For example, at the platform level you may want to see the total number of exceptions across the platform. This can easily be done with the following query:
```
sum(component_invocation_exception_counter_total{env="$env"})
```
The result is:
Now, to drill into the same metric at the method and exception-type level, the Prometheus query looks like this:
```
sum by (appName, instanceId, componentClass, methodName, exceptionClass) (component_invocation_exception_counter_total{env="$env", appName="$application", instance="$instance"})
```
Details are as follows:
Note the $ variables. These are defined as variables in the dashboard, and Grafana populates them from the metrics available in Prometheus. Dashboard users can select values for them, which dynamically changes the metric visualizations without having to create new visualizations in Grafana.
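As an illustrative sketch (the variable-to-query mapping is an assumption, not taken from the tutorial's dashboards), such dashboard variables are typically backed by label_values() queries against the Prometheus data source in Grafana:

```
# $env
label_values(http_server_requests_seconds_count, env)

# $application
label_values(http_server_requests_seconds_count{env="$env"}, appName)

# $instance
label_values(http_server_requests_seconds_count{appName="$application"}, instance)
```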
As another example, the following Prometheus query can be used to visualize the throughput of a service bean in a particular microservice instance.
```
rate(component_invocation_timer_seconds_count{instance="$instance", appName="$application", componentType="service"}[1m])
```
Dashboard Example
The following dashboard visualizes metrics at the platform level:
The dashboard offers:
- HTTP request rates for all REST controller methods and consumption rates for Kafka consumers.
- Availability status of all microservice instances and Kafka clusters. Note that each visualization here is a hyperlink to a detailed dashboard for that specific microservice instance, so you can navigate down to it.
- Failed HTTP requests and service errors for all microservice instances.
- Exception breakdown for all microservice instances.
Example microservice drill-down dashboard
The dashboard is divided into sections, called "rows" in Grafana, and provides all the metrics for a specific instance of a microservice. Note that it is a single dashboard with user inputs for environment, microservice, instanceId, and so on. By changing the values of these user inputs, you can view metrics for any microservice on the platform.
Note: There are multiple screenshots because many metrics have been visualized for demonstration purposes.
The different metric sections
Microservice instance-level metrics
HTTP controller metrics
Service metrics
HTTP client metrics
Kafka producer metrics
JDBC connection pool metrics
Example Kafka dashboard
Kafka broker metrics
Kafka message statistics
Conclusion
Monitoring Spring Boot microservices is easy with spring-boot-actuator, Micrometer, and Spring AOP. With these powerful frameworks, you can build comprehensive monitoring capabilities for your microservices.
One focus of monitoring is consistency of metrics across multiple microservices and their multiple instances, which makes monitoring and troubleshooting easy and intuitive, even if there are hundreds of microservices.
Another focus of monitoring is perspectives. These can be achieved by using the dimensionality and rate-aggregation characteristics of metrics. Tools such as Prometheus and Grafana support this out of the box; developers just need to make sure that the metrics they produce are properly tagged (again, this can easily be done with common tags, reusable aspects, and Spring configuration).
By applying this guide, all microservices can be monitored consistently and comprehensively with minimal, non-invasive glue code.
The sample code
The code samples provided in this tutorial are available on GitHub.