Server-side performance testing, especially service performance testing, is used to evaluate performance capacity, diagnose performance bottlenecks and application errors, and verify high availability, with the goal of reducing costs and improving user experience. However, when deeper root-cause analysis is required, a stress test on its own often falls short. This article introduces the classic combination of PTS and ARMS and shows how it is applied in performance capacity evaluation, performance bottleneck diagnosis, and application error diagnosis.

PTS (Performance Testing Service)

Built on Alibaba's full-link stress testing platform, PTS is a simple, cloud-based performance testing tool. It helps users simulate real business scenarios with massive traffic, provisions the required load-generation resources on demand, and spares them the cost of building and maintaining their own test infrastructure.

Application Real-Time Monitoring Service (ARMS)

ARMS is an APM (application performance monitoring) product built on tracing. Using its front-end, application, and custom monitoring, users can quickly build real-time monitoring for their services.

Industry Applications

The classic PTS + ARMS stress-testing and monitoring solution is used across a wide range of industries. Beyond internal Alibaba Group users such as Taobao and Tmall, it serves e-commerce, Internet finance, gaming, new media, government, and large state-owned enterprises, where it supports performance baseline testing before new versions go live, stress testing for large sales promotions, and large-scale capacity planning.

Typical Application Scenarios

  • Evaluating performance capacity

Take a well-known online education platform as an example: the users want to handle the regular course-selection peak at minimum cost, without degrading the user experience under peak traffic.

According to the bucket principle (a bucket holds only as much water as its shortest stave allows), the subsystem with the smallest capacity determines the capacity of the whole site. What users need to do is identify this short board and even out the water level across systems by adjusting the ratio of machines between the long and short boards. The same number of machines can then deliver higher service throughput without any additional scale-out.
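
The bucket principle can be made concrete with a little arithmetic. The sketch below is an illustration only: the tier names, per-machine QPS figures, and the greedy rebalancing rule are hypothetical assumptions, not PTS output or an Alibaba algorithm. Site capacity is the minimum over tiers of (per-machine QPS × machines), and a fixed machine budget is redistributed by repeatedly giving the next machine to whichever tier is currently the short board.

```python
from typing import Dict


def site_capacity(per_machine_qps: Dict[str, float], machines: Dict[str, int]) -> float:
    """The site can only serve as much as its weakest tier (the short board)."""
    return min(per_machine_qps[tier] * machines[tier] for tier in per_machine_qps)


def rebalance(per_machine_qps: Dict[str, float], total_machines: int) -> Dict[str, int]:
    """Redistribute a fixed machine budget so the tiers' water levels are as even as possible.

    Assumes total_machines >= number of tiers.
    """
    # Every tier needs at least one machine to serve any traffic.
    machines = {tier: 1 for tier in per_machine_qps}
    for _ in range(total_machines - len(machines)):
        # Give the next machine to whichever tier is currently the short board.
        shortest = min(machines, key=lambda t: per_machine_qps[t] * machines[t])
        machines[shortest] += 1
    return machines


# Hypothetical per-machine capacities, for illustration only.
qps = {"gateway": 1000, "course-service": 250, "order-service": 500}
before = {"gateway": 4, "course-service": 4, "order-service": 4}
after = rebalance(qps, total_machines=sum(before.values()))
print(site_capacity(qps, before), after, site_capacity(qps, after))
```

With these made-up numbers, the same 12 machines are redistributed from 4/4/4 to 2/7/3, and site capacity rises from 1,000 to 1,500 QPS without adding hardware. In practice the per-machine figures would come from stress-test measurements rather than being assumed.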

By integrating CloudMonitor and ARMS monitoring, PTS can not only accurately determine the system's performance capacity bottleneck under a given configuration, but also locate the weak points behind the performance baseline, such as system resources, database bottlenecks, and code problems. Performance capacity evaluation consists of the following three steps:

  1. Use PTS to quickly build a high-fidelity stress test of the service.
  2. In the PTS console, observe end-to-end monitoring of both the load-generating side (client side) and the service side (CloudMonitor) to understand business performance and how each core system behaves under high load.
  3. Through the ARMS integration, quickly find application bottlenecks, list interface snapshots, and diagnose the specific causes of slow snapshots, such as system resource limits, slow SQL, or other code problems.
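
One common way to turn the step-by-step measurements from such a test into a capacity figure is to raise the load in steps and take the highest step that still meets a service-level objective (SLO). The sketch below assumes a hypothetical `StepResult` record and SLO thresholds chosen by the tester; it is not the PTS report format or PTS's own capacity algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepResult:
    concurrency: int   # concurrent virtual users at this load step
    tps: float         # transactions per second seen by the load generator
    p99_ms: float      # 99th-percentile response time in milliseconds
    error_rate: float  # failed requests / total requests, between 0.0 and 1.0


def capacity_under_slo(steps: List[StepResult], max_p99_ms: float,
                       max_error_rate: float) -> Optional[StepResult]:
    """Return the highest load step that still meets the SLO, or None if even the first step fails."""
    best: Optional[StepResult] = None
    for step in sorted(steps, key=lambda s: s.concurrency):
        if step.p99_ms > max_p99_ms or step.error_rate > max_error_rate:
            break  # first violation: load beyond this point exceeds capacity
        best = step
    return best
```

Stopping at the first violation, rather than scanning for any passing step, reflects the assumption that once a system is overloaded, a later step that happens to pass is noise rather than real headroom.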

  • Diagnosing performance bottlenecks

Beyond adjusting capacity ratios, PTS + ARMS can raise the overall capacity level for the same number of machines by detecting and identifying system bottlenecks and thereby improving site performance.

Take logging in to the online education platform through a browser as an example. The most common flow is: log in -> list the available courses based on the user's attributes -> the user runs a precise query or filters by conditions -> select and submit the final course.

This flow is a single transaction with a strict order of steps; in PTS it is modeled as a serial (tandem) link. Because PTS supports cookies, the whole link can be driven with the corresponding stress-test traffic from PTS and monitored by ARMS, so that performance bottlenecks can be observed and analyzed.
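
To illustrate what a serial link with cookie support means, the sketch below runs steps strictly in order and forwards cookies set by earlier responses (for example a session cookie from login) into later requests, stopping the transaction at the first failed step. The `SendFn` transport, the step names, and the `SESSIONID` cookie are hypothetical; this is not how PTS is configured, only a model of the behavior it provides.

```python
from http.cookies import SimpleCookie
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: a step receives request headers and returns
# (status_code, list of response header (name, value) pairs). In a real test
# this callable would wrap an HTTP client request.
SendFn = Callable[[Dict[str, str]], Tuple[int, List[Tuple[str, str]]]]


def run_chain(steps: List[Tuple[str, SendFn]]) -> List[Tuple[str, int]]:
    """Run the steps strictly in order, carrying cookies from each response into the next request."""
    jar = SimpleCookie()
    results: List[Tuple[str, int]] = []
    for name, send in steps:
        headers: Dict[str, str] = {}
        if jar:
            headers["Cookie"] = "; ".join(f"{key}={morsel.value}" for key, morsel in jar.items())
        status, response_headers = send(headers)
        results.append((name, status))
        for header, value in response_headers:
            if header.lower() == "set-cookie":
                jar.load(value)  # parse "name=value; Path=/; ..." into the jar
        if status >= 400:
            break  # a failed step breaks the transaction; later steps would be meaningless
    return results
```

For the course-selection flow, the steps would be login, list courses, query, and submit; without cookie propagation every step after login would fail authentication, and the test would measure error pages instead of real business latency.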

In addition, users can find the performance inflection points of interface calls in PTS, jump to ARMS with one click, and locate the bottleneck within specific code stacks through the corresponding thread mapping, which provides code-stack-level evidence for optimizing code performance.
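
A performance inflection point is the load level after which adding more load no longer adds proportional throughput. The heuristic below is an assumption for illustration (it is not PTS's detection method): it compares relative TPS growth with relative load growth between consecutive steps and reports the last level at which that scaling efficiency stayed above a chosen threshold.

```python
from typing import List, Optional, Tuple


def find_inflection(points: List[Tuple[int, float]], min_efficiency: float = 0.5) -> Optional[int]:
    """Return the concurrency after which throughput stops scaling with load.

    points: (concurrency, tps) pairs with strictly increasing concurrency, all values > 0.
    Scaling efficiency = relative TPS growth / relative load growth; 1.0 means linear scaling.
    """
    for (c0, t0), (c1, t1) in zip(points, points[1:]):
        load_growth = (c1 - c0) / c0
        tps_growth = (t1 - t0) / t0
        if tps_growth / load_growth < min_efficiency:
            return c0  # last load level that still scaled well
    return None  # no inflection point within the tested range
```

The 0.5 threshold is arbitrary; the point returned is where it is worth switching to ARMS and inspecting which interfaces and threads slow down first.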

Performance bottleneck diagnosis consists of the following three steps:

  1. Use PTS to quickly build a stress test, observe the interface call latency of the related applications in one place, and find the performance bottleneck.
  2. Use ARMS to observe the interface latency of the corresponding application and find the threads that call the slow interface for analysis.
  3. Use thread mapping to find the thread snapshot of the corresponding interface and analyze the bottleneck.
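
A single thread snapshot shows where one thread was at one moment; a bottleneck becomes visible when many sampled stacks are aggregated. The sketch below shows that generic aggregation. The frame strings and the list-of-frames format are hypothetical and do not reflect the ARMS snapshot format.

```python
from collections import Counter
from typing import List, Tuple

Frame = str  # e.g. "CourseService.query"; format is hypothetical


def summarize_samples(samples: List[List[Frame]], top: int = 3) -> Tuple[List[Tuple[Frame, int]], List[Tuple[Frame, int]]]:
    """Aggregate sampled thread stacks (outermost frame first).

    self counts: how often a method was the frame actually executing (top of stack).
    total counts: how often a method was anywhere on the stack, counted once per sample.
    """
    self_counts: Counter = Counter()
    total_counts: Counter = Counter()
    for stack in samples:
        if not stack:
            continue
        self_counts[stack[-1]] += 1
        total_counts.update(set(stack))  # set() so recursive frames are counted once per sample
    return self_counts.most_common(top), total_counts.most_common(top)
```

A method with a high self count is where the time is actually spent (for example, blocked on a socket read to the database), while a high total count only says the method is on the path; both views are useful when reading the stacks of a slow interface.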

  • Diagnosing application errors

Application error diagnosis is another major scenario before an enterprise Internet application officially goes live. These errors usually do not directly increase call latency or create performance bottlenecks, but the resulting business errors can still cause a poor user experience.

Besides high latency, an application running at its performance baseline may also return various invocation errors, typically:

  1. Timeout errors: the client times out when the back-end service cannot respond in time.
  2. Circuit-breaker (fuse) errors: triggered by circuit-breaking components (such as Sentinel) to protect back-end application performance.
  3. Errors from other system components, such as IOExceptions caused by performance overload.
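
To show why a circuit breaker produces its own class of error under load, here is a minimal count-based breaker. It is a generic sketch, not Sentinel's API or its rule model: after a number of consecutive failures it "opens" and rejects calls immediately with a hypothetical `CircuitOpenError`, then lets a trial call through once a cool-down has passed.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling the backend while the breaker is open."""


class CircuitBreaker:
    """Count-based circuit breaker: open after N consecutive failures, retry after a cool-down."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                # Open: reject immediately; under load this shows up as a "fuse" error.
                raise CircuitOpenError("circuit open, request rejected")
            # Half-open: allow a trial call; one more failure reopens the breaker.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # success closes the breaker and resets the count
        return result
```

In a stress test this means timeout errors (type 1) typically appear first and are then replaced by fast-failing breaker rejections (type 2), which is why the two must be told apart when reading error statistics.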

With the classic PTS + ARMS combination, these errors can be found effectively as the load increases.

In addition, PTS's load-side and multi-dimensional monitoring can, through the ARMS integration, be correlated with the details of specific errors and exceptions to locate the exact code that throws them, which greatly improves the efficiency of diagnosing interface errors in stress-test scenarios. Faulty-interface diagnosis can be divided into the following three steps:

  1. Use PTS to observe whether service exceptions or errors occur as the stress-test load increases.
  2. Use ARMS to observe the overall error calls of the corresponding application and pinpoint the problem.
  3. View the error details and determine the root cause based on the error snapshot details.
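
When a stress test produces hundreds of error snapshots, grouping them by interface, exception type, and throwing location usually reduces them to a handful of distinct problems. The sketch below assumes a hypothetical `ErrorSnapshot` record; the field names and sample values are illustrative and do not reflect the ARMS data model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ErrorSnapshot:
    interface: str   # called interface, e.g. "/course/submit" (hypothetical)
    exception: str   # exception class name
    thrown_at: str   # application frame where the exception was raised


def group_errors(snapshots: List[ErrorSnapshot]) -> List[Tuple[Tuple[str, str, str], int]]:
    """Group identical (interface, exception, location) triples, largest group first."""
    counts = Counter((s.interface, s.exception, s.thrown_at) for s in snapshots)
    return counts.most_common()
```

The largest group is the natural first candidate for root-cause analysis; opening one representative snapshot from it is usually enough to see the failing code path.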

In short, by integrating the monitoring capabilities of ARMS, PTS brings key server-side performance indicators into the overall monitoring of a stress test, so problems can be identified faster and more easily and the operations burden on users is reduced.

This article is original content from the Alibaba Cloud Yunqi Community and may not be reproduced without permission.