In today’s article, we’ll cover one of Elastic’s most important applications: Application Performance Monitoring/Management, or APM. So what exactly is APM? If you already send logs and system metrics to Elasticsearch, Elastic APM lets you extend that to application metrics. You can see exactly where your application is spending its time, so you can quickly fix problems and feel confident about the code you’re pushing.
Elastic APM hands-on
Modern application services architecture
As technology evolves, our IT architecture is becoming more and more complex, for example:
The challenge of operations monitoring today – complex infrastructure
- The infrastructure level is complex
- Multiple servers
- Multiple network devices
- Multiple security devices
- Multiple storage devices
We have more and more servers in our systems, and more and more of them are deployed in the cloud. A complex system may even consist of thousands of microservices, so a single business request may need anywhere from one to hundreds of microservices working together to complete.
As business complexity increases, the calls between microservices become more complex too. So the question is: if our requests are getting slow, how do we find out where the problem is?
An experienced programmer or system designer might find the answer in some logs:
A log is a chronological record of events. We can pick up clues from the thousands of log entries here, such as the response code returned for a request. Logs are useful in many cases, but they have several disadvantages:
- If you don’t print log information in your application, you don’t have log content to look at
- Logs will tell you when an error occurred, but sometimes they cannot tell you why. For example, the return codes may all be correct, yet the logs cannot explain why the response is so slow
Metrics are also useful for app developers. Metrics are periodic measurements of numerical KPIs:
As shown above, a metric lets us measure something like CPU load every x minutes and annotate it with metadata. When a request fails or the response time is too long, metrics can be used to analyze information such as CPU load or disk utilization:
This is the output of the iostat command on a Mac. Metrics are useful whenever we want to show trends or historical data, and when we want to create simple, predictable, and reliable rules to catch events and exceptions. One problem with metrics is that they tend to monitor the infrastructure layer, capturing data at the level of component instances (such as hosts, containers, and networks) rather than at the level of the custom application.
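To make "a periodic measurement annotated with metadata" concrete, here is a minimal sketch in Java. The class name, the field names (`@timestamp`, `host`, `metric`, `value`) and the one-second interval are assumptions chosen for illustration, not a format prescribed by Elastic.

```java
import java.lang.management.ManagementFactory;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

final class CpuLoadSample {
    // One metric sample: a numeric value plus the metadata that describes it.
    // The field names are illustrative assumptions.
    static Map<String, Object> sample() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("@timestamp", Instant.now().toString());
        doc.put("host", System.getenv().getOrDefault("HOSTNAME", "unknown"));
        doc.put("metric", "system.load.1m");
        // The JDK reports a negative value where the load average is unavailable.
        doc.put("value", ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
        return doc;
    }

    public static void main(String[] args) throws InterruptedException {
        // Take three samples, one second apart (the interval is arbitrary).
        for (int i = 0; i < 3; i++) {
            System.out.println(sample());
            Thread.sleep(1000);
        }
    }
}
```

A sample like this says how loaded the host was at a given moment, but nothing about which request or which line of code caused the load.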
But the question now is: what happened during and between each of these events? For example, of the three log events above, the first occurs at 16:10:02, the second at 16:11:58, and the third at 16:20:55. The interval between the second and the third event is about 9 minutes. So what happened in between?
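The gap between two log events is simple arithmetic on their timestamps. As a small sketch, the class below (a hypothetical helper, not part of any Elastic tool) computes the gap between the first and the third timestamps quoted above:

```java
import java.time.Duration;
import java.time.LocalTime;

final class LogGaps {
    // Seconds elapsed between two HH:mm:ss timestamps on the same day.
    static long gapSeconds(String earlier, String later) {
        return Duration.between(LocalTime.parse(earlier), LocalTime.parse(later)).getSeconds();
    }

    public static void main(String[] args) {
        long gap = gapSeconds("16:10:02", "16:20:55");
        System.out.println("gap: " + gap / 60 + " min " + gap % 60 + " s");
    }
}
```

The logs can tell us that this gap exists, but not how the time inside it was spent.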
Let’s first look at the logs and metrics at the time of the second event:
Log:
Metrics:
But when our logs get really big, and we have more and more interfaces, inspecting these logs and metrics by hand is no longer practical.
When designing a page or request, we often encounter the kind of waiting shown above. Individual tools may effectively solve part of the problem, but how do we locate and analyze a problem across the whole system? Elastic’s APM solution is designed to solve exactly these problems, giving system designers and programmers a quick way to pinpoint them.
With Elastic APM, it’s much easier to figure out why a request takes so long to complete and where the time is being spent.
Elastic Observability
Elastic integrates logs, metrics, and APM to give unified visibility into the entire ecosystem. You can consolidate your logs, metrics, and APM traces at scale into one stack so that you can monitor and react to events occurring in your environment. According to the article “Metrics, Tracing, and Logging”, building comprehensive observability requires the following three elements:
In the Elastic Stack, we have a module that handles each of them:
- Logging: Events generated during the execution of a program, which can explain its running state in detail.
- Metrics: A set of aggregated measurements, used mostly to monitor infrastructure (machines, containers, networks, etc.), but also to monitor applications at the business level. For example, the open source search system Elasticsearch has application-level metrics on query/write volume, time taken, rejection rate, etc.
- Application Performance Monitoring (APM): Tracing (or monitoring) down to the code level, including internal application execution and calls between services, which makes it easy to find the cause of a “slow” application. APM is most commonly used to trace how a web server processes a request, including internal execution logic, calls to external services, and the time each of them takes.
The Elastic Stack provides fast, reliable, and relevant search across all operational data, so you can ask the questions you want and get the answers you need, regardless of the data type. APM adds a new dimension to system observability by extending logging and server-level monitoring. With Real User Monitoring (RUM), it can also be extended to the end-user experience. With the Elastic Stack, you can create one-stop, full-stack monitoring:
What exactly is APM?
Simply put: APM is about monitoring and managing the performance and availability of software applications. Elastic APM is an application performance monitoring system built on Elastic Stack. It allows you to monitor software services and applications in real time – gathering detailed performance information about response times for incoming requests, database queries, calls to the cache, external HTTP requests, and so on. This makes it easy to quickly identify and resolve performance problems.
Elastic APM also automatically collects unhandled errors and exceptions. Errors are grouped primarily by stack traces, so you can identify new errors as they occur and keep an eye on how many times a particular error occurs.
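To illustrate the idea of grouping errors by stack trace, here is a minimal sketch. The grouping key used here (exception class plus the topmost stack frame) is an assumption made for the example; it is not Elastic APM’s actual grouping algorithm.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ErrorGroups {
    // Illustrative grouping key: exception class plus the topmost stack frame.
    static String groupKey(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        String top = frames.length > 0 ? frames[0].toString() : "<no stack>";
        return t.getClass().getName() + " at " + top;
    }

    // Counts how many times each group of errors occurred.
    static Map<String, Integer> count(Iterable<? extends Throwable> errors) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Throwable t : errors) {
            counts.merge(groupKey(t), 1, Integer::sum);
        }
        return counts;
    }

    // Every exception created here shares the same top frame, so they all land in one group.
    static RuntimeException boom(String message) {
        return new IllegalStateException(message);
    }

    public static void main(String[] args) {
        Map<String, Integer> groups = count(java.util.Arrays.asList(boom("first"), boom("second")));
        System.out.println(groups);
    }
}
```

Two errors with different messages but the same origin end up in one group with a count of 2, which is what makes a brand-new kind of error stand out.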
Metrics are another important source of information when debugging a production system. The Elastic APM agents automatically collect basic host-level metrics and agent-specific metrics, such as JVM metrics in the Java agent and Go runtime metrics in the Go agent.
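For a feel of what JVM metrics look like, the sketch below reads two of them through the JDK’s standard management beans. This only shows the kind of values involved; it makes no claim about how the Java agent gathers them, and the class name is made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class JvmSnapshot {
    // Heap memory currently in use by this JVM, in bytes.
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Number of live threads in this JVM.
    static int threadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("heap used: " + heapUsedBytes() / (1024 * 1024) + " MiB");
        System.out.println("threads:   " + threadCount());
    }
}
```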
Let’s take a look at the following figure:
As shown in the figure above, requests made at different times behave differently. Why does one request at 17:36:38 take nearly 8 seconds, and why does another at 17:36:31 return an error code?
Elastic APM is an open source APM solution:
- APM keeps track of database queries, external HTTP requests, and other slow operations that occur during requests to applications
  - It’s easy for programmers to see how much time is spent on various parts of the application at runtime
- It collects unhandled errors and exceptions
  - Make it easy for programmers to debug errors
- Locate performance bottlenecks and errors before customers face them
- Increase the productivity of the development team
Where APM fits in the Elastic Stack
How does APM store data in Elasticsearch and provide analysis? Let’s take a look at the following architecture diagram:
As shown in the figure above, we see a typical APM architecture diagram:
- We need to set up a dedicated APM server, although it can also run on the same machine as other components of the Elastic Stack
- APM agents collect data and send it to the APM server. APM agents here include:
- The APM server processes the data and sends it to Elasticsearch, where it is stored and analyzed
- Kibana lets us visualize the data and show it in a dashboard
In general, APM data is just another Elasticsearch index. There is already a ready-made APM application in Kibana that we can use. We can also build our own custom dashboards on demand. APM data also combines well with machine learning and alerting.
APM terminology
- Service: Set in the APM agent configuration to identify a specific group of APM agents as a single service, which is a way to logically identify a set of transactions
- Transaction: A request and response handled by a service, such as a login API call; each transaction consists of one or more spans
- Span: A single event within a transaction, such as a method call, database query, or cache insert or retrieval, that is, any event that takes time to complete
- Errors: Groups of exceptions with matching exception or log messages
- Trace: Represents the entire processing of a request, across all the services it touches
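The terms above nest inside one another, which a few plain Java classes can make concrete. This is only a sketch of the vocabulary: the class layout, the trace ID, and the span durations are invented for illustration and do not mirror how Elastic APM stores its documents.

```java
import java.util.ArrayList;
import java.util.List;

final class ApmModel {
    /** A single timed operation inside a transaction, e.g. one database query. */
    static final class Span {
        final String name;
        final long durationMs;
        Span(String name, long durationMs) { this.name = name; this.durationMs = durationMs; }
    }

    /** One request handled by one service; it owns the spans recorded while it ran. */
    static final class Transaction {
        final String service;
        final String name;
        final List<Span> spans = new ArrayList<>();
        Transaction(String service, String name) { this.service = service; this.name = name; }
        Transaction span(String spanName, long durationMs) {
            spans.add(new Span(spanName, durationMs));
            return this;
        }
    }

    /** The whole journey of a request: all of its transactions share one trace ID. */
    static final class Trace {
        final String traceId;
        final List<Transaction> transactions = new ArrayList<>();
        Trace(String traceId) { this.traceId = traceId; }
    }

    // Total number of spans recorded across every transaction of a trace.
    static int spanCount(Trace trace) {
        int n = 0;
        for (Transaction tx : trace.transactions) n += tx.spans.size();
        return n;
    }

    public static void main(String[] args) {
        Trace trace = new Trace("trace-1"); // hypothetical ID
        trace.transactions.add(new Transaction("sample_apm", "addNewUser")
                .span("SELECT", 3)   // made-up durations in milliseconds
                .span("INSERT", 5));
        System.out.println(trace.transactions.size() + " transaction(s), " + spanCount(trace) + " span(s)");
    }
}
```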
The relationship between them can be expressed as follows:
Distributed tracing:
As a request flows from one microservice to another, the tracer adds logic to pass along a unique trace ID together with the span ID of the calling operation, so that the work done in each service can be linked into a single trace
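A rough sketch of that propagation, modelled on the W3C Trace Context `traceparent` header format (`version-traceid-parentid-flags`). The class and method names are made up for the example, and which header name a particular agent version actually sends is outside the scope of this sketch.

```java
import java.security.SecureRandom;

final class TraceContext {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Random identifier rendered as lowercase hex (16 bytes for a trace ID, 8 for a span ID).
    static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        StringBuilder sb = new StringBuilder(bytes * 2);
        for (byte b : buf) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Header value for an outgoing call: version, trace ID, ID of the calling span, sampled flag.
    static String traceparent(String traceId, String spanId) {
        return "00-" + traceId + "-" + spanId + "-01";
    }

    // A downstream service keeps the incoming trace ID and creates a fresh span ID for its own work.
    static String childHeader(String incoming) {
        String[] parts = incoming.split("-");
        return traceparent(parts[1], randomHex(8));
    }

    public static void main(String[] args) {
        String header = traceparent(randomHex(16), randomHex(8));
        System.out.println("service A sends:    " + header);
        System.out.println("service B forwards: " + childHeader(header));
    }
}
```

Both headers carry the same trace ID, which is what lets the spans produced by different services be stitched together afterwards.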
Hands-on practice
In today’s exercise, we’ll use Java Spring Boot as an example to show how to use Elastic APM.
Download the Spring Boot code
First, enter the following command in terminal:
git clone https://github.com/liu-xiao-guo/elastic-apm-demo
The example above is a simple Spring Boot application. It has the following characteristics:
- It can access the MySQL database through the REST interface to add data and request data
- It can access the Baidu Weather interface through the REST interface to obtain weather data
Here is part of the code:
@PostMapping(path="/add") // Map ONLY POST Requests
public @ResponseBody String addNewUser (@RequestParam String name
        , @RequestParam String email) {
    // @ResponseBody means the returned String is the response, not a view name
    // @RequestParam means it is a parameter from the GET or POST request
    User n = new User();
    n.setName(name);
    n.setEmail(email);
    userRepository.save(n);
    return "Saved";
}

@GetMapping(path="/all")
public @ResponseBody Iterable<User> getAllUsers() {
    // This returns a JSON or XML with the users
    return userRepository.findAll();
}

@GetMapping(path="/weather")
public @ResponseBody String getBaiduWeather() throws InterruptedException {
    // Add some random delays before getting the info
    double delay = Math.random() * 10;
    System.out.println("delay: " + delay);
    TimeUnit.SECONDS.sleep((long)delay);
    String weather = getWeatherInform("北京");
    return weather;
}
In the weather endpoint, I deliberately added a random delay so that each request takes a different amount of time.
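Without an APM agent, finding out where such a request spends its time means timing each step by hand. The sketch below shows what that looks like; the step names and the sleep durations are stand-ins invented for the example, not taken from the demo application.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

final class ManualTiming {
    // Runs one step and returns how long it took, in milliseconds.
    static long timeMs(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    static void sleepMs(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Name of the step with the largest recorded duration.
    static String slowest(Map<String, Long> timings) {
        String name = null;
        long max = -1;
        for (Map.Entry<String, Long> e : timings.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                name = e.getKey();
            }
        }
        return name;
    }

    public static void main(String[] args) {
        Map<String, Long> timings = new LinkedHashMap<>();
        timings.put("artificial delay", timeMs(() -> sleepMs(300))); // stands in for the random sleep
        timings.put("weather lookup", timeMs(() -> sleepMs(20)));    // stands in for the external call
        System.out.println(timings + " -> slowest: " + slowest(timings));
    }
}
```

Every step has to be wrapped explicitly, which is the bookkeeping an APM agent takes over by instrumenting the application automatically.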
We can type the following command in the root directory of the application:
./mvnw clean package
This will produce a file called accessing-data-mysql-0.0.1-SNAPSHOT.jar in the target subdirectory of the current directory.
$ ls ./target/accessing-data-mysql-0.0.1-SNAPSHOT.jar
./target/accessing-data-mysql-0.0.1-SNAPSHOT.jar
If you want to test the app, you can type the following command in the app root directory:
java -jar ./target/accessing-data-mysql-0.0.1-SNAPSHOT.jar
You can open localhost:8080 to see if there is output from the page.
We can copy this file to any directory we want. In my case, I copied it to the data/apm directory in my home directory.
$ pwd
/Users/liuxg/data/apm
$ ls accessing-data-mysql-0.0.1-SNAPSHOT.jar
accessing-data-mysql-0.0.1-SNAPSHOT.jar
MySQL installation
We can install MySQL by following its documentation. Once it is installed, we enter the following command in a terminal:
mysql -uroot -p
We enter the password of the root user to log in to MySQL. To create a database, we type the following commands at the MySQL prompt:
mysql> create database db_example; -- Creates the new database
mysql> create user 'springuser'@'%' identified by 'ThePassword'; -- Creates the user
mysql> grant all on db_example.* to 'springuser'@'%'; -- Gives all privileges to the new user on the newly created database
The above commands create a database called db_example. They also create a user named springuser with the password ThePassword. Remember to use the same username and password as in the Java application above:
We can use the MySQL tool to check:
Notice that in the database above, the table has three fields: id, email, and name.
Run the Elastic Stack
There are two ways to set up the Elastic Stack. The easiest way is to use Docker to deploy Elasticsearch, Kibana, and the APM server in one step. For detailed installation procedures, please refer to github.com/elastic/apm… . In this article, however, we deploy the Elastic Stack manually.
You can install and run Elasticsearch and Kibana by following our article “Elastic: A Beginner’s Guide”.
We also have to install an APM server of the same version as Elasticsearch. We open the Kibana interface and click on the section at the top left:
Then we follow the steps shown above, one by one, to install it:
We have configured the address and username and password for Elasticsearch (if you have security enabled) as shown above:
The steps above are very detailed. If we want to use Real User Monitoring (RUM), we must also modify apm-server.yml. At the end of the apm-server.yml file we add:
apm-server.rum.enabled: true
For the APM agent, because ours is a Java application, we choose the Java agent. We download the corresponding agent JAR file and store it in the same folder as our Spring Boot JAR file above. In my case, that is data/apm in the home directory.
$ pwd
/Users/liuxg/data/apm
$ ls *.jar
accessing-data-mysql-0.0.1-SNAPSHOT.jar    elastic-apm-agent-1.10.0.jar
At this point, we can start running our Spring Java application. We can run it with the following command:
java -javaagent:./elastic-apm-agent-1.10.0.jar \
    -Delastic.apm.service_name=sample_apm \
    -Delastic.apm.server_url=http://localhost:8200 \
    -Delastic.apm.secret_token= \
    -Delastic.apm.application_packages=accessing-data-mysql \
    -jar accessing-data-mysql-0.0.1-SNAPSHOT.jar
Note: sample_apm here is a service name I chose. You can choose a unique name according to your needs. If you don’t want to type all of these options, you can create a file called elasticapm.properties in your current directory with the following content:
service_name=sample_apm
application_packages=accessing-data-mysql
server_url=http://localhost:8200
With this file in place, we can run the application with the following command:
java -javaagent:./elastic-apm-agent-1.10.0.jar \
    -Delastic.apm.secret_token= \
    -jar accessing-data-mysql-0.0.1-SNAPSHOT.jar
Another way to do this is by setting OS environment variables. These take precedence over any settings in elasticapm.properties, such as:
export ELASTIC_APM_SERVICE_NAME=my_fantastic_service
export ELASTIC_APM_APPLICATION_PACKAGES=org.example
export ELASTIC_APM_SERVER_URL=http://localhost:8200
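The three ways of configuring the agent shown above use related names for the same setting: `service_name` in the properties file, `elastic.apm.service_name` as a `-D` flag, and `ELASTIC_APM_SERVICE_NAME` as an environment variable. The sketch below models a lookup across the three sources. It is an illustration rather than the agent’s actual code, and the order used here (system property, then environment variable, then file) is an assumption that goes beyond what the text above states.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

final class AgentSetting {
    // Looks up one setting, e.g. "service_name", from the most specific source to the least.
    static String resolve(String key, Properties systemProps, Map<String, String> env, Properties file) {
        String fromSystemProp = systemProps.getProperty("elastic.apm." + key);
        if (fromSystemProp != null) return fromSystemProp;

        String fromEnv = env.get("ELASTIC_APM_" + key.toUpperCase(Locale.ROOT));
        if (fromEnv != null) return fromEnv;

        return file.getProperty(key); // null when the setting is not configured anywhere
    }

    public static void main(String[] args) {
        Properties file = new Properties();
        file.setProperty("service_name", "sample_apm");

        Map<String, String> env = new HashMap<>();
        env.put("ELASTIC_APM_SERVICE_NAME", "my_fantastic_service");

        // The environment variable wins over the properties file.
        System.out.println(resolve("service_name", new Properties(), env, file));
    }
}
```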
Once our Spring Boot application is fully up, we click the “Check Agent Status” button in Kibana. No data may be displayed at this point. We can open our browser and type the following address in the browser’s address bar:
We can see that we’ve got some weather data. At this time, we can see the information in Agent status:
Let’s click the Confirm Overwrite button above:
Start the APM application
If you’ve made it this far, you’ve basically got the whole environment up and running. We can enter the following command in a terminal:
curl localhost:8080/demo/add -d name=First -d email=someemail@someemailprovider.com
The above command writes a record to our database.
curl 'localhost:8080/demo/all'
Running the above command displays all the records that have been entered
curl 'localhost:8080/demo/weather'
Running the above command returns the weather information provided by the Baidu weather API.
The two GET requests above can also be entered in the address bar of the browser.
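To give the APM app more than a handful of transactions to show, the same requests can be sent in a loop. The following is a minimal sketch using only the JDK’s `HttpURLConnection`; the class name, the e-mail address, and the number of repetitions are placeholders, and it assumes the demo application is listening on localhost:8080.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class DemoClient {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds a form body such as name=First&email=..., the same format curl -d sends.
    static String formBody(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    static int post(String url, String body) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = con.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return con.getResponseCode();
    }

    static int get(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("GET");
        return con.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> user = new LinkedHashMap<>();
        user.put("name", "First");
        user.put("email", "first@example.com"); // placeholder address
        System.out.println("add     -> " + post("http://localhost:8080/demo/add", formBody(user)));
        for (int i = 0; i < 5; i++) {
            System.out.println("weather -> " + get("http://localhost:8080/demo/weather"));
        }
    }
}
```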
Click on the APM app icon in Kibana:
Let’s click on the sample_apm link:
Above we can see the statistics of the four interfaces of the application.
We can see all of our API calls on the dashboard of the APM application. Such as:
In my application, I deliberately added some delay, so the whole getBaiduWeather request took 9.157 seconds to complete, while the call to api.map.baidu.com took only 149 ms. If we click on the blue line above, we can see how the API is called:
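The two durations quoted above already show where the time went. As a small worked example, the sketch below subtracts the time spent in child spans from the total transaction time; it assumes the child spans do not overlap, and the helper name is made up for the example.

```java
final class SelfTime {
    // Time spent in the transaction's own code: total duration minus the time covered by child spans.
    // Assumes the child spans do not overlap one another.
    static long selfTimeMs(long transactionMs, long... childSpanMs) {
        long children = 0;
        for (long ms : childSpanMs) children += ms;
        return transactionMs - children;
    }

    public static void main(String[] args) {
        // 9.157 s for getBaiduWeather, of which 149 ms is the call to api.map.baidu.com.
        long self = selfTimeMs(9157, 149);
        System.out.println("self time: " + self + " ms");
    }
}
```

Almost all of the request time is spent inside the application itself, which matches the artificial delay, rather than in the external weather service.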
Let’s click on the “addNewUser” link:
We can see the following screen:
We can see how long it takes addNewUser to call several commands in MySQL.
We can also click Errors to see all error messages:
We click on the link above to see the stack trace of the error:
We can click on the above JVM to see the current JVM usage:
And that’s the end of my presentation. The rest is left for you to explore!
Read more
- Solutions: How can I use Elastic APM to test multilingual microservices applications
- Solutions: How do I APM Python Flask applications
Reference:
[1] Accessing data with MySQL