Description: Performance Testing Service (PTS) is a SaaS platform with powerful distributed stress testing capabilities. It simulates real business scenarios with massive numbers of users and comprehensively verifies the performance, capacity, and stability of business sites.

Author: Zhi Yun

Why do we need problem diagnosis during stress testing?

Performance Testing Service (PTS) is a powerful distributed SaaS stress testing platform that simulates real business scenarios with massive numbers of users and comprehensively verifies the performance, capacity, and stability of business sites.

As the load on the server under test is raised step by step, aggregate indicators such as QPS, RT, and TPS are visible in the stress test view or the stress test report. However, these indicators alone do not allow a specific server-side problem to be located quickly. For example, the scenario-wide error information shows the error code and the response body of the failing interface, but the report does not show which downstream link the error occurred in or what the error stack is, and these are exactly the details users care about.
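To make the limitation concrete, here is a minimal sketch of how aggregate indicators might be computed from raw request samples. The `Sample` record and the `summarize` helper are hypothetical names used only for illustration and are not part of PTS.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One request recorded by a load generator (hypothetical shape)."""
    timestamp_s: float  # when the request finished, in seconds
    rt_ms: float        # response time in milliseconds
    status: int         # HTTP status code


def summarize(samples, window_s):
    """Aggregate the samples observed in a window of window_s seconds."""
    total = len(samples)
    errors = sum(1 for s in samples if s.status >= 400)
    return {
        "qps": total / window_s,
        "avg_rt_ms": sum(s.rt_ms for s in samples) / total if total else 0.0,
        "error_rate": errors / total if total else 0.0,
    }


samples = [
    Sample(0.1, 20.0, 200),
    Sample(0.4, 35.0, 200),
    Sample(0.7, 900.0, 500),  # one slow, failing request
    Sample(0.9, 25.0, 200),
]
summary = summarize(samples, window_s=1.0)
print(summary)
```

The summary reveals that a quarter of the requests failed and that the average RT is inflated, but nothing in it identifies which downstream call failed or why. That gap is what the call chain fills.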

With problem diagnosis, we can see the upstream and downstream calls of the interface under test. From the link view, we can also see the message components (Kafka, RocketMQ, etc.), caches (Redis, MongoDB, etc.), databases (MySQL, Oracle, etc.), and RPC calls (Feign, Dubbo, HttpClient, etc.) along the whole link. For example, when an interface returns an error or another abnormal status code, the call chain shows whether the problem lies in an RPC call or in a database read or write, together with the corresponding error stack. With this information, it becomes much clearer where to look for the problem.

Basic introduction and core advantages of problem diagnosis

Basic introduction

When it comes to problem diagnosis, users mainly want to know whether enabling it requires changes to application code or complex configuration. The problem diagnosis provided by PTS is based on JavaAgent and requires no changes to the user's business code. For Tomcat-based deployments, users only need to add the necessary parameters to the startup script. For Kubernetes deployments, users only need to add the necessary annotations to the YAML configuration file. PTS provides default link collection rules, which users can modify to suit their own requirements.

During a stress test, the load engine generates a TraceId for each request. The TraceId associates all upstream and downstream links involved in the request, so users can see the complete call chain from the entry of the request to its end. Problem diagnosis also generates an application topology view from the call chains, so users can clearly see the call relationships between applications.
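The idea of associating links through a TraceId can be sketched as follows. This is only an illustration: the span fields (`trace_id`, `span_id`, `parent_id`, `app`, `op`) and the calls between applications are assumptions, not the data model PTS actually uses.

```python
from collections import defaultdict

# Hypothetical span records reported by probes; field names are assumptions.
spans = [
    {"trace_id": "trace-1", "span_id": "a", "parent_id": None,
     "app": "petStore-Web", "op": "GET /products"},
    {"trace_id": "trace-1", "span_id": "b", "parent_id": "a",
     "app": "PetStore-Catalog", "op": "listProducts"},
    {"trace_id": "trace-1", "span_id": "c", "parent_id": "b",
     "app": "PetStore-Catalog", "op": "MySQL SELECT"},
    {"trace_id": "trace-2", "span_id": "a", "parent_id": None,
     "app": "petStore-Web", "op": "GET /cart"},
]


def build_call_chains(spans):
    """Group spans by trace ID, then index each trace's spans by parent span."""
    chains = defaultdict(lambda: defaultdict(list))
    for span in spans:
        chains[span["trace_id"]][span["parent_id"]].append(span)
    return chains


def render(chain, parent_id=None, depth=0):
    """Return the call chain as indented lines, entry span first."""
    lines = []
    for span in chain.get(parent_id, []):
        lines.append("  " * depth + f'{span["app"]}: {span["op"]}')
        lines.extend(render(chain, span["span_id"], depth + 1))
    return lines


chains = build_call_chains(spans)
lines = render(chains["trace-1"])
print("\n".join(lines))
```

Because every span carries the TraceId of the request that caused it, spans reported independently by different applications can be reassembled into one chain per request.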

For an abnormal interface, the call chain shows the corresponding error cause, and users can troubleshoot and optimize the server side based on the specific error stack. Users can view the call chain of a specified request in real time during the stress test, and can trace problems back from the stress test report after the test.

Core advantages

**1. Zero code intrusion:** For Java services, users can connect the problem diagnosis probe without modifying business code.

**2. High integration:** Stress testing, monitoring, and problem diagnosis are integrated in the same console, which keeps the learning and operating costs relatively low.

**3. Monitoring indicators:** During a stress test, in addition to the basic monitoring indicators, interface-level, machine-level, and application-level monitoring is provided for each service.

**4. Low threshold:** Only simple parameters need to be configured to connect the problem diagnosis probe, and the probe also supports multi-protocol mocking, full-link stress testing, and other functions.

Getting started with problem diagnosis

The basic flow for enabling problem diagnosis is as follows:

Connect the probe and check whether the probe is connected successfully

First, we identify the applications involved in the scenario under test and connect all of them to the problem diagnosis probe by following the steps in the [Problem Diagnosis] -> [Probe Access **[1]**] document. Whether an application's probe is connected successfully can be checked under application configuration, application monitoring, interface monitoring, or machine monitoring in the PTS console. The stress test scenario in this demonstration involves five applications: petStore-Web, PetStore-User, PetStore-Order, PetStore-Catalog, and PetStore-Cart. Application monitoring is used here as the example. Click [Problem Diagnosis] -> [Application Monitoring **[2]**] and select the Region and Namespace that were configured. If all applications involved in the stress test scenario are displayed on this page, the applications are connected successfully.

Turn on the problem diagnosis switch in the stress test scenario

Then, we create the stress test scenario under [Pressure Center] -> [Create Scenario **[3]**] in the PTS console, where either a PTS scenario or a JMeter scenario can be selected. A PTS scenario is used as the example here. Because this demonstration is mainly intended to verify the problem diagnosis capability, enable the problem diagnosis switch under Advanced Settings in the scenario configuration. PTS pushes a default configuration of the monitoring and collection switches to users, with the sampling rate set to 1/1000. Users can customize the monitoring and collection rules based on their own requirements.
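To give a feel for what a 1/1000 sampling rate means, the sketch below keeps roughly one trace in a thousand by hashing the trace ID. This is an assumption made for illustration: the article does not describe how PTS actually selects sampled requests, and `is_sampled` is a hypothetical helper.

```python
import hashlib


def is_sampled(trace_id: str, rate_denominator: int = 1000) -> bool:
    """Keep about 1 in rate_denominator traces, deterministically per trace ID.

    Hash-based selection is an illustrative assumption, not the PTS algorithm.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % rate_denominator
    return bucket == 0


# Out of 100,000 requests, about 100 call chains would be collected.
kept = sum(is_sampled(f"trace-{i}") for i in range(100_000))
print(f"kept {kept} of 100000 traces")
```

Deciding from the trace ID rather than from a random draw means every application on the link makes the same decision for the same request, so a sampled call chain is collected in full. A lower denominator collects more chains at the cost of more probe overhead.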

Start the stress test and check the application monitoring

After completing the above steps, our stress test scenario has problem diagnosis enabled. Once the stress test is started, we can select the service we care about under application monitoring, interface monitoring, or machine monitoring to check the corresponding metrics. Application monitoring **[2]** is used here as the example, and the steps for the other types of monitoring are similar. We select petStore-user to check its application monitoring, as shown below:

After the stress test is complete, view the scenario-wide error information

After the stress test, we need to trace server-side problems back from the stress test report. Open the report for the corresponding scenario: PTS console -> [Pressure Center] -> [Report List **[4]**], then select the corresponding stress test report. The overview page shows the scenario-wide error information, as shown in the figure below:

Select probe sampling to see the specific call chain

Click "View Sampling Log" and select "Probe Sampling" as the sampling type to filter for the call chains collected by the problem diagnosis probe, as shown below:

View the call chain error stack information to locate server faults

After filtering for the call chains collected by the probe, we can analyze the call chain of the problematic interface. For example, the product list interface returns status code 500. Click to view the details and see the specific cause, as shown below:
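The reasoning behind call chain analysis can be sketched in a few lines: when an entry interface fails, the deepest span that reported an error is usually the best starting point. The span fields and the `find_root_cause` helper below are hypothetical, and the error text is invented for the example.

```python
# Hypothetical spans of one sampled call chain; field names are assumptions.
chain = [
    {"span_id": "a", "parent_id": None, "app": "petStore-Web",
     "op": "GET /products", "error": "HTTP 500"},
    {"span_id": "b", "parent_id": "a", "app": "PetStore-Catalog",
     "op": "listProducts", "error": "java.lang.RuntimeException"},
    {"span_id": "c", "parent_id": "b", "app": "PetStore-Catalog",
     "op": "SQL query", "error": "java.lang.NullPointerException"},
    {"span_id": "d", "parent_id": "a", "app": "PetStore-User",
     "op": "getUser", "error": None},
]


def find_root_cause(spans):
    """Return the deepest span that reported an error, or None if all succeeded."""
    by_id = {s["span_id"]: s for s in spans}

    def depth(span):
        # Walk up the parent links until the entry span is reached.
        d = 0
        while span["parent_id"] is not None:
            span = by_id[span["parent_id"]]
            d += 1
        return d

    failed = [s for s in spans if s["error"]]
    return max(failed, key=depth, default=None)


root = find_root_cause(chain)
print(root["app"], root["op"], root["error"])
```

Errors higher up the chain are often just the failure propagating back to the caller, which is why the 500 at the entry interface says little on its own.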

The call stack shows the exact cause of the error, which can then be used to optimize and fix the server-side code. In addition, the application topology view and the database view show the calls between services and the database usage. The application topology view is used as an example, as shown in the following figure:
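An application topology can be derived from call chains by counting parent-to-child calls that cross application boundaries. The sketch below assumes hypothetical span fields and invented calls between the demo applications, and is not how PTS builds its view.

```python
from collections import Counter

# Hypothetical spans from two sampled requests; field names are assumptions.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "app": "petStore-Web"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "app": "PetStore-Catalog"},
    {"trace_id": "t1", "span_id": "c", "parent_id": "a", "app": "PetStore-User"},
    {"trace_id": "t2", "span_id": "a", "parent_id": None, "app": "petStore-Web"},
    {"trace_id": "t2", "span_id": "b", "parent_id": "a", "app": "PetStore-Catalog"},
    {"trace_id": "t2", "span_id": "c", "parent_id": "b", "app": "PetStore-Catalog"},
]


def build_topology(spans):
    """Count caller -> callee edges between different applications."""
    # Span IDs are only unique within a trace, so key on (trace_id, span_id).
    by_id = {(s["trace_id"], s["span_id"]): s for s in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get((span["trace_id"], span["parent_id"]))
        if parent is not None and parent["app"] != span["app"]:
            edges[(parent["app"], span["app"])] += 1
    return edges


topology = build_topology(spans)
for (caller, callee), calls in sorted(topology.items()):
    print(f"{caller} -> {callee}: {calls} call(s)")
```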

Summary of common error codes in stress test reports

Problem diagnosis error codes

Common error codes seen in problem diagnosis call chains are summarized as follows:

  • java.lang.NullPointerException: a null pointer on the server side. Use the error stack in the call chain to locate and fix the server-side code.
  • com.microsoft.sqlserver.jdbc.SQLServerException: a SQL error on the server side. Use the stack information collected in the call chain to check the server-side SQL syntax and related issues.
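When an error stack is copied out of a call chain, the innermost "Caused by" entry usually names the exception to look up in the list above. The following sketch extracts it from stack trace text. The sample trace, including its class names under `com.example` and its message, is invented for illustration.

```python
import re

# Matches "pkg.SomeException: message" with an optional "Caused by: " prefix.
EXCEPTION_LINE = re.compile(
    r"^(?:Caused by: )?([\w.$]+(?:Exception|Error))(?::\s*(.*))?$"
)


def root_exception(stack_trace):
    """Return (class_name, message) of the last exception line, or None."""
    found = None
    for line in stack_trace.splitlines():
        match = EXCEPTION_LINE.match(line.strip())
        if match:
            found = (match.group(1), match.group(2) or "")
    return found


# Invented sample stack trace for illustration only.
sample = """java.lang.RuntimeException: query failed
    at com.example.CatalogService.list(CatalogService.java:42)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near 'FORM'.
    at com.example.CatalogDao.query(CatalogDao.java:17)
"""

cause = root_exception(sample)
print(cause)
```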

Stress test report error codes

This section lists common errors in the stress test report. The related error messages appear in the scenario-wide error information, as follows:

  • class java.net.SocketTimeoutException:null indicates that the request timed out while waiting for a response, or that the read idle timeout was reached. Check whether the server is healthy and whether the timeout settings of the PTS stress test API are reasonable. The server's processing capacity may also be a bottleneck.

  • class java.net.ConnectException:null indicates that the request failed to establish a TCP connection with the tested (remote) end, or was rejected by the remote end. Check the health of the server and whether there is a bottleneck at the network connection layer.

  • class java.util.concurrent.TimeoutException:null indicates that the request failed to establish a TCP connection with the tested (remote) end, or was rejected by the remote end. Check the health of the server and whether there is a bottleneck at the network connection layer.

  • class org.apache.http.ConnectionClosedException:Connection closed indicates that the connection was closed abnormally because the server actively closed it.

  • class java.io.IOException:Connection reset by peer indicates that the connection was reset. If an SLB is used, check whether the SLB configuration is correct.

  • class org.apache.http.ConnectionClosedException:Connection closed unexpectedly indicates that the connection was closed before all data was received. The server may not have responded in time, or the debugging session or stress test may have been terminated prematurely.

  • class java.lang.RuntimeException:java.net.UnknownHostException indicates that the domain name cannot be resolved. Check whether the domain name has been registered and can be resolved and, if the domain name is unregistered, whether it has been bound.

  • class org.apache.hc.core5.http.ProtocolException:Header 'key: Value' is illegal for HTTP/2 messages indicates that the server preferentially uses the HTTP/2 protocol. If a header that HTTP/2 does not support is configured, remove the header and retry. Common headers not supported by HTTP/2 include Connection, Keep-Alive, Proxy-Connection, Transfer-Encoding, Host, and Upgrade.
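The list above can be turned into a small triage helper, sketched below. The hint texts are condensed from this section, while the function names `triage` and `strip_http2_unsupported` are hypothetical and not part of PTS.

```python
# Hints condensed from the error list above, keyed by exception class name.
HINTS = {
    "java.net.SocketTimeoutException":
        "Response or read idle timeout: check server health and API timeout settings.",
    "java.net.ConnectException":
        "TCP connection failed or was rejected: check server health and the network layer.",
    "java.util.concurrent.TimeoutException":
        "TCP connection failed or was rejected: check server health and the network layer.",
    "org.apache.http.ConnectionClosedException":
        "Connection closed by the server or before all data arrived.",
    "java.io.IOException:Connection reset by peer":
        "Connection reset: if an SLB is used, check its configuration.",
    "java.net.UnknownHostException":
        "Domain name cannot be resolved: check registration, resolution, and binding.",
    "org.apache.hc.core5.http.ProtocolException":
        "Header is illegal for HTTP/2: remove the unsupported header and retry.",
}

# Headers this section lists as not supported by HTTP/2 (compared in lowercase).
HTTP2_UNSUPPORTED = {
    "connection", "keep-alive", "proxy-connection",
    "transfer-encoding", "host", "upgrade",
}


def triage(error_message):
    """Return the first matching hint for a report error message."""
    for pattern, hint in HINTS.items():
        if pattern in error_message:
            return hint
    return "No known hint: inspect the sampled call chain."


def strip_http2_unsupported(headers):
    """Return a copy of headers without those HTTP/2 does not support."""
    return {k: v for k, v in headers.items() if k.lower() not in HTTP2_UNSUPPORTED}


print(triage("class java.lang.RuntimeException:java.net.UnknownHostException"))
cleaned = strip_http2_unsupported(
    {"Connection": "keep-alive", "Accept": "application/json", "Upgrade": "h2c"}
)
print(cleaned)
```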


This article is the original content of Aliyun and shall not be reproduced without permission.