Introduction: With the spread of wireless devices and the rapid rollout of 5G, more and more online systems and mini programs have become indispensable tools in everyday life. These tools all face the same question: how many simultaneous users can the system withstand, and can it keep running stably, without failure, when traffic suddenly peaks? This article answers that question.

Author: Brush clothes, clouds

Why you need a stress test

To answer this question, run several rounds of stress testing before the system goes live. These rounds simulate complex, realistic online traffic in advance to verify the high availability of the overall system, and they are also a key step in implementing a high-availability plan. Stress tests at different stages also support capacity planning and bottleneck detection, and they serve as the acceptance check of the system's overall capacity. This confirms that the system can withstand real online load before a sudden traffic flood arrives.

In a sense, stress testing is the verification of system stability.

How to run an accurate performance stress test

Prepare the stress test environment

Where to run a stress test is a long-standing question. Running the test directly in the production environment causes two problems:

  1. Online services, and the users accessing the system, are affected
  2. Stress test data gets written to the online database, contaminating production data

In order to solve these two problems, the following solutions are generally adopted in the industry:

Each of the above schemes has its own advantages and disadvantages, and applies to different scenarios. You can choose the scheme flexibly according to the stage of your project.

Build the stress test script

Stress testing tools commonly used in the industry include JMeter, Gatling, Locust, K6, Tsung, and Alibaba Cloud PTS. Without exception, these tools require the APIs under test to be orchestrated into a stress test script.

The key in this step is to make sure no API is missed and that the APIs are ordered to match the user's real operation logic. For example, when stress testing a health code service, if the script omits the login/authentication API, the subsequent APIs, such as updating the health code and viewing the nucleic acid test report, fail at the permission check. The normal business logic never executes, and the real business scenario cannot be simulated.
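To make the ordering point concrete, here is a minimal Python sketch. The `FakeHealthCodeService` class, its token scheme, and the step names are hypothetical stand-ins, not any real tool's API. The sketch only shows that when a script skips the login step, every later call reaches the permission check without a valid token.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeHealthCodeService:
    """Hypothetical in-memory stand-in for the health code service."""

    sessions: set[str] = field(default_factory=set)

    def login(self, user_id: str) -> str:
        # Issue a session token, as a real login/authentication API would.
        token = f"token-{user_id}"
        self.sessions.add(token)
        return token

    def call(self, api: str, token: str | None) -> int:
        # Every API except login runs the permission check first.
        return 200 if token in self.sessions else 401


def run_script(service: FakeHealthCodeService, steps: list[str], user_id: str = "u1") -> dict[str, int]:
    """Execute the script steps in order, carrying the login token forward."""
    token: str | None = None
    results: dict[str, int] = {}
    for step in steps:
        if step == "login":
            token = service.login(user_id)
            results[step] = 200
        else:
            results[step] = service.call(step, token)
    return results
```

Removing "login" from the step list makes both downstream results 401. This is the failure described above: the script runs, but none of the real business logic is exercised.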

Each of the tools above supports two ways of building a script:

  1. Writing the script manually. This requires the script author to be familiar with the business and to make sure no API is missed.
  2. Recording the script automatically. All the open-source tools provide a proxy for recording requests. Once the proxy is enabled and configured, you simulate user operations and clicks on the page, and the tool records the requests and generates a stress test script. PTS also provides a Chrome recording plugin **[1]** that needs no proxy configuration and generates JMeter and PTS scripts with one click. This makes scripting more efficient and ensures that no API is missed.

To avoid the risk of missing APIs in complex scripts, we recommend generating scripts with the recording function.

Confirm the load model

This step configures the simulated peak load, the load distribution ratio across different APIs, and the model by which load increases during the test. "Load" here means the number of simulated concurrent users, or the number of requests sent per second.

Load mode

Before configuring anything else, decide on the load mode. The industry uses two modes:

  1. Virtual user (VU) mode, also called concurrency mode. Each thread simulates one real user and keeps looping for the whole test, so each simulated user sends requests continuously.

  2. Throughput mode. The load is set as requests per second (QPS), which directly measures server throughput.

During the project acceptance phase, a key metric is system throughput, that is, the QPS the system can sustain. For this scenario we recommend throughput mode. It shows directly how many requests the load generators send per second, and that number maps directly to server throughput.
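The interactive response time law from queueing theory, X = N / (R + Z), helps explain why throughput mode is more intuitive. N closed-loop virtual users, each waiting for a response time R plus an optional think time Z, can only produce X requests per second. The sketch below assumes this idealized steady state. Real runs also include queueing and warm-up effects.

```python
def vu_mode_qps(virtual_users: int, response_time_s: float, think_time_s: float = 0.0) -> float:
    """Steady-state QPS produced by closed-loop virtual users: X = N / (R + Z).

    Each VU waits for the response (R) and an optional think time (Z) before
    sending its next request, so the offered load depends on server speed.
    """
    return virtual_users / (response_time_s + think_time_s)


def throughput_mode_requests(target_qps: float, duration_s: float) -> float:
    """Requests an open-loop (throughput-mode) generator sends: fixed by the target rate."""
    return target_qps * duration_s
```

With 100 VUs, `vu_mode_qps(100, 0.25)` gives 400.0 QPS. If the server slows to 0.5 s, the same 100 VUs send only 200.0 QPS. In VU mode the offered load drops exactly when the server struggles, whereas a throughput-mode generator keeps sending the configured rate.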

Load distribution ratio of each API

After choosing the load mode, configure the load distribution ratio for the different APIs. For example, in the health code service, 100% of users call the APIs for logging in to the app and obtaining the health code, but not every user calls the APIs for querying the nucleic acid report or viewing push messages. An accurate load distribution ratio for each API is therefore indispensable for a successful stress test.
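Turning the ratios into per-API targets is simple arithmetic: each API's target is the entry rate multiplied by the share of users who call it. In this sketch the API names and percentages are hypothetical examples, not measured data.

```python
from __future__ import annotations


def per_api_qps(entry_qps: int, call_percent: dict[str, int]) -> dict[str, int]:
    """Split an entry-level user rate into per-API QPS targets.

    call_percent maps each API to the percentage of users who call it.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return {api: entry_qps * pct // 100 for api, pct in call_percent.items()}
```

Suppose the entry rate is 1000 users per second, with login and get-health-code at 100%, nucleic acid report queries at 30%, and push messages at 50%. The targets are then 1000, 1000, 300, and 500 QPS, so the server sees 2800 QPS in total rather than 1000.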

Load ramp-up model

Common models are the pulse model, the stepwise increase, and the uniform increase.

The pulse model simulates a sudden spike in traffic and is often used for flash-sale and panic-buying scenarios.

The incremental models simulate a growing number of users over a period of time and are often used for business scenarios that need a warm-up.
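The three ramp-up shapes can be written as functions of elapsed seconds that return the target load (VUs or QPS, depending on the load mode). This is only a sketch. The parameter names are hypothetical, and real tools expose these shapes through their own configuration.

```python
def pulse(t: int, base: int, peak: int, start: int, width: int) -> int:
    """Pulse: jump to `peak` for `width` seconds starting at `start`, otherwise stay at `base`."""
    return peak if start <= t < start + width else base


def step(t: int, start: int, increment: int, step_seconds: int, cap: int) -> int:
    """Stepwise increase: add `increment` every `step_seconds`, never exceeding `cap`."""
    return min(start + (t // step_seconds) * increment, cap)


def linear(t: int, start: int, target: int, ramp_seconds: int) -> int:
    """Uniform increase: rise evenly from `start` to `target` over `ramp_seconds`, then hold."""
    if t >= ramp_seconds:
        return target
    return start + (target - start) * t // ramp_seconds
```

For example, `step(t, start=100, increment=100, step_seconds=60, cap=500)` holds 100 for the first minute, then 200, and levels off at 500.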

Besides these conventional models, the stress testing tool should ideally let you adjust the load manually during the test. First, this lets you simulate unconventional traffic growth patterns. Second, it lets you adjust the load repeatedly to reproduce and troubleshoot problems.

Regional distribution of load traffic

After settling the load value and the ramp-up model, also decide on the regional distribution of the load traffic. Match the real user distribution as closely as possible so that the test results are reliable.

For regional online services, it is reasonable to place the load generators in the same local data center. For a nationwide online business, deploy load generators across the country's regions according to the distribution of users.
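Splitting a fixed number of load generators across regions in proportion to user share requires a rounding rule that keeps the total intact. The sketch below uses the largest-remainder method. The region names and shares are hypothetical, and ties go to the region listed first.

```python
from __future__ import annotations


def allocate_generators(total: int, user_share: dict[str, int]) -> dict[str, int]:
    """Split `total` load generators across regions in proportion to user share.

    Uses the largest-remainder method so the counts always sum to `total`.
    """
    weight_sum = sum(user_share.values())
    # Integer quota for each region (rounded down).
    quotas = {region: total * w // weight_sum for region, w in user_share.items()}
    # Regions with the largest fractional remainders get the leftover generators.
    # sorted() is stable, so ties keep the original dict order.
    by_remainder = sorted(
        user_share, key=lambda r: (total * user_share[r]) % weight_sum, reverse=True
    )
    for region in by_remainder[: total - sum(quotas.values())]:
        quotas[region] += 1
    return quotas
```

`allocate_generators(10, {"east": 45, "north": 30, "south": 25})` yields 5, 3, and 2 generators.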

Run the stress test and observe the metrics

Core metrics in a stress test: request success rate, request response time (RT), and system throughput (QPS).

For the request success rate, look not only at the global success rate but also at the success rate of the core APIs. Otherwise the overall rate can meet the target while a core API falls short.
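The masking effect is easy to reproduce with made-up numbers. In this hypothetical run, the high-volume health code API dominates the global figure. The global success rate clears a 99% target while the login API sits at 95%.

```python
from __future__ import annotations


def success_rates(counts: dict[str, tuple[int, int]]) -> tuple[float, dict[str, float]]:
    """Return (global success rate, per-API success rates).

    counts maps each API name to (successful requests, total requests).
    """
    ok = sum(s for s, _ in counts.values())
    total = sum(t for _, t in counts.values())
    per_api = {api: s / t for api, (s, t) in counts.items()}
    return ok / total, per_api


def failing_core_apis(counts: dict[str, tuple[int, int]], core_apis: list[str], threshold: float) -> list[str]:
    """List the core APIs whose own success rate is below the threshold."""
    _, per_api = success_rates(counts)
    return [api for api in core_apis if per_api[api] < threshold]
```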

For response time, look at percentiles such as P99, P95, P90, and P80. The average response time is of limited value, because a stress test needs to guarantee the experience of most users. When the spread of the distribution is unknown, the average easily leads to misjudgment.
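A small sketch shows how the mean can hide a slow tail. It uses the nearest-rank percentile definition. Other tools interpolate, so their numbers can differ slightly. The sample data is invented for illustration.

```python
from __future__ import annotations

import statistics


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """Mean plus the percentiles commonly read in a stress test report."""
    summary = {"mean": statistics.mean(samples)}
    for p in (80, 90, 95, 99):
        summary[f"p{p}"] = percentile(samples, p)
    return summary
```

Take 90 responses at 50 ms and 10 at 2000 ms. The mean is 245 ms, and P80 and P90 are 50 ms, yet P95 and P99 are 2000 ms. One user in ten waits two seconds, and the mean does not reveal this.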

System throughput measures how much traffic the system can handle and is an indispensable criterion in stress testing.

When these three metrics reach an inflection point, the system can be considered to have hit a performance bottleneck. At that point, stop the test or adjust the load, and prepare to analyze and locate the performance problem.
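The inflection point can also be checked mechanically over the results of each load step. In the sketch below, the `StepResult` fields and the thresholds are illustrative assumptions, not PTS defaults. The thresholds are: throughput gain below 50% of the added load, P99 above 1000 ms, or success rate below 99%. Tune them to your own targets.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StepResult:
    offered_qps: int      # load the generators tried to send in this step
    achieved_qps: int     # successful responses per second actually observed
    p99_ms: int           # 99th percentile response time
    success_rate: float   # fraction of successful requests


def find_inflection(steps: list[StepResult], min_gain: float = 0.5,
                    max_p99_ms: int = 1000, min_success: float = 0.99) -> int | None:
    """Return the index of the first step where the system stops scaling, or None."""
    for i in range(1, len(steps)):
        prev, cur = steps[i - 1], steps[i]
        offered_delta = cur.offered_qps - prev.offered_qps
        achieved_delta = cur.achieved_qps - prev.achieved_qps
        # Fraction of the added load that turned into added throughput.
        gain = achieved_delta / offered_delta if offered_delta else 1.0
        if gain < min_gain or cur.p99_ms > max_p99_ms or cur.success_rate < min_success:
            return i
    return None
```

Consider hypothetical steps at 1000, 2000, 3000, and 4000 offered QPS, where achieved QPS goes 1000, 1990, 2400, 2450. The function returns 2: the third step added 1000 QPS of load but only 410 QPS of throughput.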

In addition to these three service metrics, also watch the system's application, middleware, and hardware monitoring metrics, including but not limited to:

Server:

  • Network throughput
  • CPU utilization
  • Memory usage
  • Disk throughput
  • …

Database:

  • Number of connections
  • SQL throughput
  • Number of slow SQL queries
  • Index hit ratio
  • Lock wait time
  • Lock wait count
  • …

Middleware:

  • JVM GC count
  • JVM GC time
  • In-heap and off-heap memory usage
  • Number of active threads in the Tomcat thread pool
  • …

For more metrics to watch during stress testing, see Stress Testing Metrics **[2]**.

Once the system meets expectations, you can keep raising the load in steps of 10-20% to “touch high” at the peak. This shows the system's limit, so you know how much headroom you have.
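The “touch high” schedule is plain compound growth. This sketch assumes integer load values and rounds down at each step. The starting value and step size are examples within the 10-20% range mentioned above.

```python
from __future__ import annotations


def probe_schedule(target_qps: int, step_percent: int, steps: int) -> list[int]:
    """Load levels for probing the limit: start at the target, raise by step_percent each round."""
    levels = [target_qps]
    for _ in range(steps):
        # Integer math, rounded down, so levels are whole requests per second.
        levels.append(levels[-1] * (100 + step_percent) // 100)
    return levels
```

`probe_schedule(1000, 15, 4)` gives 1000, 1150, 1322, 1520, 1748.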

Review and performance optimization

If the stress test results do not meet expectations, analyze the performance problems with the help of monitoring and scheduling information. After the optimization is complete, verify it in the next round of stress testing.

Methods for analyzing and tuning problems found in stress tests are not covered here; see the article Stress Test Problem Analysis and Tuning **[3]**.

If the system performs as expected, use the throughput figures from the stress test to configure flow control, degradation, system protection, or isolation rules that keep the system stable.
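A token-bucket rate limiter is one example of turning a measured throughput into a flow control rule. Token buckets are a common flow-control technique, but this is a generic sketch, not the rule engine the article refers to. The 80% safety margin in `limiter_from_stress_test` is a hypothetical choice. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Token-bucket rate limiter.

    rate: tokens refilled per second (the sustained QPS allowed through).
    capacity: maximum number of stored tokens, i.e. the largest allowed burst.
    """

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def limiter_from_stress_test(measured_qps: float, safety_percent: int = 80) -> TokenBucket:
    """Hypothetical helper: cap traffic at a fraction of the throughput measured in the stress test."""
    rate = measured_qps * safety_percent / 100
    return TokenBucket(rate=rate, capacity=rate)
```

Keeping the limit below the measured capacity leaves headroom, so a traffic spike is rejected at the edge rather than pushing the system past the inflection point found during the test.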

Alibaba Cloud PTS: a stress testing package that keeps your system worry-free

PTS (Performance Testing Service) is Alibaba Cloud's SaaS performance testing tool. It was created 10 years ago to accurately simulate the Double 11 (Singles' Day) traffic peak. Every year it supports tens of thousands of stress testing tasks across the Alibaba group, including those for Singles' Day, and it acts as the “validator in advance” of Alibaba's internal Singles' Day technical architecture.

Technical advantage 1: a self-developed PTS stress testing engine with an accurate load model and excellent performance

The self-developed PTS engine implements concurrency in a way that outperforms the traditional thread model. It also supports throughput configuration per API, which is finer-grained than open-source tools and can accurately simulate a traffic funnel model.

For example, in one realistic traffic model, 100% of users call the login API, 80% call the refresh-health-code API, and 20% call the nucleic acid API. Simulating this requires configuring throughput (QPS) on each API; with the concurrency model, this scenario cannot be simulated.

Example funnel model:

PTS also supports traffic recording from multiple clients, so stress test scripts can be built quickly through a fully visual interface. This greatly lowers the barrier to building stress test scripts.

Technical advantage 2: full JMeter compatibility and an online JMeter plugin

While remaining fully compatible with JMeter, PTS adds a number of optimizations for distributed JMeter stress testing:

Optimization 1: load generators distributed globally and ready to use on demand, supporting stress tests with millions of concurrent users and tens of millions of QPS;

Optimization 2: throughput mode is supported, so you can set a global target QPS and measure server performance more intuitively;

Optimization 3: speed adjustment during the test, so concurrency or QPS can be tuned flexibly to keep approaching the performance limit;

Optimization 4: browser plugin recording with one-click export of JMeter scripts and no proxy configuration, greatly reducing the effort of building scripts;

Optimization 5: for distributed stress tests, automatic splitting of data files and support for globally effective Timer and Controller components, so distributed stress testing works with no extra setup;

Optimization 6: a JMeter PTS plugin lets you launch distributed cloud stress tests from the JMeter GUI client, connecting script debugging and execution seamlessly (see JMeter Plugin Instructions **[4]**).
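The file-splitting and global-timer features in Optimization 5 address two classic pitfalls of distributed JMeter runs. Every engine runs the full test plan, so without splitting, all engines reuse the same data rows. A per-engine throughput timer also multiplies the intended QPS by the number of engines. The sketch below shows the underlying idea with hypothetical helpers (`split_rows`, `split_qps`); it is not how PTS implements the feature.

```python
from __future__ import annotations


def split_rows(rows: list[str], generators: int) -> list[list[str]]:
    """Give each load generator a contiguous, non-overlapping slice of the data file."""
    base, extra = divmod(len(rows), generators)
    slices, start = [], 0
    for i in range(generators):
        # The first `extra` generators take one extra row each.
        end = start + base + (1 if i < extra else 0)
        slices.append(rows[start:end])
        start = end
    return slices


def split_qps(global_qps: int, generators: int) -> list[int]:
    """Divide a global target QPS so the per-generator targets sum to it exactly."""
    base, extra = divmod(global_qps, generators)
    return [base + (1 if i < extra else 0) for i in range(generators)]
```

With 7 test accounts and 3 generators, the slices hold 3, 2, and 2 accounts with no overlap. A global target of 1000 QPS becomes 334, 333, and 333 QPS per generator, instead of 1000 on each machine.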

Technical advantage 3: stress testing inside a VPC

Before the full-scale formal stress test, key microservice applications need routine stress tests on each individual application to understand their local performance limits.

For services deployed on Alibaba Cloud, an individual microservice application does not expose access to the public network, so the stress testing tool must be able to reach the VPC internal network.

PTS supports stress testing inside a VPC. During the test, the network between the load generators and the user's VPC is connected quickly so the internal test network works normally. After the test, the network channel is closed immediately to ensure network security.

In the stress test configuration, you only need to select the VPC, security group, and vSwitch where the microservice application resides to start the test. Your service's performance can then be measured without exposing a public access point.

The following is an example:

Technical advantage 4: custom traffic regions

In most businesses, users are not evenly distributed geographically; the distribution tends to be very uneven. To simulate the real traffic distribution, load generators need to be deployed across many locations and allocated by region in proportion to traffic volume, with real-time unified scheduling during the test. If all the load generators sit in one region, or even one availability zone, requests from users around the world cannot be simulated.

When stress testing with Alibaba Cloud Performance Testing Service (PTS), enable the custom traffic region feature and simply select regions to specify where the load generators run. Currently, 22 regions around the world are available.

Technical advantage 5: problem diagnosis tools

The purpose of a stress test is to discover performance problems. In the stress test report, PTS aggregates abnormal response status codes and provides sampled request logs, so you can see every detail of requests and responses directly. For requests with long response times, the time spent in each phase of the request is also displayed.

For Java applications, PTS provides a problem diagnosis tool based on a Java agent. Simply attaching the probe to a Java application provides second-level monitoring across the application, API, and machine dimensions. For failed requests, it locates the erroring method stack directly on the call chain, saving a lot of troubleshooting time and making it a “sharp tool” for locating problems.

The following is an example of locating the erroring method stack:

Cost advantage 1: JMeter resource packs

PTS has launched dedicated JMeter resource packs, priced well below the PTS stress testing resource packs.

Cost advantage 2: lower prices for VPC stress testing

PTS has launched a VPC stress testing resource pack starting at only 29 yuan for a 20-minute test with 10,000 concurrent users, lowering the cost of routine internal-network stress tests.

Cost advantage 3: annual and monthly packages at a limited-time discount

Annual and monthly resource packs are available for a limited time at 75% of the list price, and usage within the subscription period is not deducted as VUM (virtual user minutes). This suits users who run stress tests frequently.

Cost advantage 4: custom resource pools

A custom resource pool is recommended for high-concurrency, long-running stress tests. With more than 20 load generators running continuously for 1 hour, billing is equivalent to 40% of the regular stress testing price, so users who run long, high-concurrency tests pay less.

Related links

[1] Chrome recording plugin: help.aliyun.com/document_d…

[2] Stress testing metrics: help.aliyun.com/document_d…

[3] Stress test problem analysis and tuning: help.aliyun.com/document_d…

[4] JMeter plugin: help.aliyun.com/document_d…

[5] PTS product purchase page: common-buy.aliyun.com/?commodityC…

This article is original content from Alibaba Cloud and may not be reproduced without permission.