Introduction: Since Kent Beck put forward the concept of TDD (Test-Driven Development) at the end of the last century, the boundary between development and testing has become increasingly blurred. What used to be a one-way, upstream-to-downstream dependency has gradually evolved into an interdependent relationship, and many companies have even created a new Quality Engineer (QE) position. Unlike traditional Quality Assurance (QA), a QE is mainly responsible for ensuring project quality through engineering means, including but not limited to writing unit tests and integration tests, building automated test pipelines, and designing performance tests. You could say that QE combines the quality mindset of QA with the engineering skills of a developer. This is the second article in a three-part series on the basic testing skills a QE needs, written from a development perspective.

Previously:

  • Development-Oriented Testing Techniques (I): Mocks

1 What is performance testing?

Let’s take a look at Wikipedia’s definition of performance testing:

In software engineering, performance testing is in general, a testing practice performed to determine how a system performs in terms of responsiveness and stability under a particular workload. – Wikipedia

Note that there are three key words in the above definition:

  • Responsiveness: how long it takes the server to return a result after a request is sent; the shorter the response time, the better the performance.
  • Stability: how much the response time varies when the same request is sent at different times; the smaller the variation, the more stable the performance.
  • Workload: the number of requests the server receives at the same time. A closely related metric is throughput, the number of requests successfully processed per unit of time; the higher the throughput, the better the performance.

Response time and throughput are the two most important indicators of application performance. For the vast majority of applications, response time is very short under light load; as the load increases, throughput rises quickly while response time grows only gradually. Once the load exceeds a certain point, response time suddenly increases exponentially and throughput starts to drop, causing a sharp decline in application performance. The whole process looks like this:

(Image credit: How should performance testing be done?)

2 Purpose of performance testing

Once we know the general pattern of how application performance changes, the purpose of performance testing practically answers itself: for a given application, find the quantitative relationship between response time and throughput, and locate the critical point at which its performance starts to degrade. What is the use of knowing this, you may ask? In my opinion, there are benefits on at least three levels:

First, it gives you a concrete target and improves resource utilization. Performance testing is essentially the process of quantifying performance. With that data in hand, you can analyze application performance quantitatively, find and fix potential performance problems, and make better use of resources.

Second, it enables scientific capacity planning. Finding the critical point at which performance changes makes it easy to determine the performance limit of a single node, which is an important input for capacity planning. For example, if the maximum throughput of an application on a single node is 2,000 QPS, then at least five nodes are needed to handle 10,000 QPS of traffic (a quick calculation is sketched below).

Third, it improves Quality of Service (QoS). In many cases resources are limited, and in the face of traffic that exceeds service capacity, trade-offs must be made to preserve QoS (for example rate limiting, graceful degradation, or switching to a fallback plan). Application performance data is an important basis for designing such QoS schemes.
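To make the capacity-planning arithmetic explicit, here is a minimal sketch in Java. The per-node limit, expected traffic, and headroom figures are illustrative assumptions, not results from a real test.

public class CapacityPlanning {

    // Nodes needed to serve expectedQps, given the measured single-node limit
    // and a safety headroom so nodes are not driven right up to that limit.
    static int requiredNodes(double expectedQps, double singleNodeQps, double headroom) {
        double usableQpsPerNode = singleNodeQps * (1 - headroom);
        return (int) Math.ceil(expectedQps / usableQpsPerNode);
    }

    public static void main(String[] args) {
        System.out.println(requiredNodes(10_000, 2_000, 0.0)); // ceil(10000 / 2000) = 5 nodes
        System.out.println(requiredNodes(10_000, 2_000, 0.2)); // with 20% headroom: ceil(10000 / 1600) = 7 nodes
    }
}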

3 Three common mistakes in performance testing

Mistake 1: Looking only at the average, without understanding TP95/TP99

Using averages to measure response time is one of the most common misconceptions in performance testing. As the illustration in section 1 shows, response times gradually get longer as throughput increases, and start to climb rapidly once maximum throughput is reached, especially for requests that arrive later and have to queue. At this point, if you only look at the average you will not notice a problem: the response time for most requests is still very short and slow requests are only a small percentage, so the average barely changes. In practice, however, more than 1%, or even 5%, of requests may already have exceeded their designed response time.

A more scientific and reasonable indicator is the TP95 or TP99 response time. TP, short for Top Percentile, is a statistical term used to describe the distribution of a set of values. Take TP95 as an example: given 100 values sorted from smallest to largest, the 95th value is the TP95 of the set, meaning that at least 95% of the values are less than or equal to it.

Taking a specific performance test as an example,

There were 1,000 requests; the average response time was 58.9 ms, TP95 was 123.85 ms (2.1 times the average) and TP99 was 997.99 ms (16.9 times the average). Assuming the application's designed maximum response time is 100 ms, the average alone looks perfectly acceptable, yet more than 50 requests have already exceeded the limit. Looking at TP95 or TP99 makes the problem obvious.
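To make TP95/TP99 concrete, here is a minimal sketch of how such percentiles can be computed from a list of recorded response times. It is illustrative only: the sample data is made up, and real tools such as JMeter and JMH report these percentiles for you.

import java.util.Arrays;

public class PercentileExample {

    // TPxx: the smallest sample such that at least `percentile` percent
    // of all samples are less than or equal to it (nearest-rank method).
    static double topPercentile(double[] responseTimesMs, double percentile) {
        double[] sorted = responseTimesMs.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(percentile / 100.0 * sorted.length) - 1;
        return sorted[Math.max(index, 0)];
    }

    public static void main(String[] args) {
        // Made-up response times in ms: mostly fast, with two slow outliers.
        double[] samples = {20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
                            30, 31, 32, 33, 34, 35, 36, 37, 500, 900};

        double avg = Arrays.stream(samples).average().orElse(0);
        System.out.printf("avg  = %.2f ms%n", avg);                        // ~95.7 ms, looks acceptable on its own
        System.out.printf("TP95 = %.2f ms%n", topPercentile(samples, 95)); // 500 ms
        System.out.printf("TP99 = %.2f ms%n", topPercentile(samples, 99)); // 900 ms
    }
}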

Mistake 2: Focusing on response time and throughput while ignoring the request success rate

While application performance is primarily measured by response time and throughput, the big assumption behind both metrics is that requests are actually handled successfully (if not all of them, then at least 99.9%) rather than returning a pile of error codes. If that is not guaranteed, then no matter how low the response time or how high the throughput, the numbers mean nothing.
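As a minimal illustration of that point, the sketch below treats latency and throughput figures as trustworthy only once the success rate clears a threshold. The 99.9% threshold and the request counts are assumptions made up for the example.

public class SuccessRateCheck {

    // Latency and throughput numbers are only meaningful if enough requests succeeded.
    static boolean resultIsTrustworthy(long totalRequests, long failedRequests,
                                       double requiredSuccessRate) {
        double successRate = (totalRequests - failedRequests) / (double) totalRequests;
        return successRate >= requiredSuccessRate;
    }

    public static void main(String[] args) {
        long total = 10_000;   // made-up run
        long failed = 300;     // requests that returned error codes
        if (!resultIsTrustworthy(total, failed, 0.999)) {
            // 97% success rate: discard this run's response-time and throughput figures.
            System.out.println("Success rate too low; the performance numbers prove nothing.");
        }
    }
}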

Mistake 3: Forgetting that the test side has performance limits too

The third mistake is to focus only on the server side and ignore the fact that the load-generating side has limits of its own. For example, if a test case is configured for 10,000 concurrent requests but the machine running it can only generate 5,000, then looking at the server-side data alone may lead you to the wrong conclusion that the server tops out at 5,000 concurrent requests. If this happens, either switch to a more powerful test machine or add more test machines.

4 How do I perform a performance test?

After introducing some concepts related to performance testing, let’s take a look at the tools available for performance testing.

4.1 JMeter

JMeter is probably the most commonly used performance testing tool. It supports both a graphical interface and a command line, falls into the black-box category, is friendly to non-developers, and is very easy to get started with. The graphical interface is generally used to write and debug test cases, while actual performance tests are best run from the command line.

(Screenshots: concurrency settings, request parameters, and the results report.)

Common commands on the CLI:

  • Set JVM parameters: JVM_ARGS="-Xms2g -Xmx2g"
  • Run a test: jmeter -n -t <jmx_file>
  • Run a test and generate an HTML report: jmeter -n -t <jmx_file> -l <log_file> -e -o <report_dir>

In addition to JMeter, other commonly used performance testing tools include ab, http_load, wrk, and the commercial LoadRunner.

4.2 JMH

If the test cases are complex, or if the person responsible for the performance test has some development skills, you can also consider writing a dedicated performance test program with a framework. For Java developers, JMH is a recommended choice. Similar to JUnit, JMH provides a set of annotations for writing test cases and an engine for running them; in fact, JMH is developed and maintained as part of the OpenJDK project.

Here’s an example from my sample project on GitHub,

@BenchmarkMode(Mode.Throughput)
@Fork(1)
@Threads(Threads.MAX)
@State(Scope.Benchmark)
@Warmup(iterations = 1, time = 3)
@Measurement(iterations = 3, time = 3)
public class VacationClientBenchmark {

    private VacationClient vacationClient;

    @Setup
    public void setUp() {
        VacationClientConfig clientConfig = new VacationClientConfig("http://localhost:3000");
        vacationClient = new VacationClient(clientConfig);
    }

    @Benchmark
    public void benchmarkIsWeekend() {
        VacationRequest request = new VacationRequest();
        request.setType(PERSONAL);
        OffsetDateTime lastSunday = OffsetDateTime.now().with(TemporalAdjusters.previous(SUNDAY));
        request.setStart(lastSunday);
        request.setEnd(lastSunday.plusDays(1));

        Asserts.isTrue(vacationClient.isWeekend(request).isSuccess());
    }

    // Run only in IDE
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(VacationClientBenchmark.class.getSimpleName())
                .build();

        new Runner(opt).run();
    }
}

In this example:

  • @BenchmarkMode: the benchmark mode, supporting Throughput, AverageTime, SingleShotTime, and other modes.
  • @Fork: the number of forked JVM processes used to run the benchmark; a value of 0 means the benchmark runs inside the JMH host process.
  • @Threads: the number of concurrent threads; Threads.MAX uses as many threads as there are available processor cores.
  • @Warmup and @Measurement: the number of warm-up and measurement iterations, the duration of each iteration, and so on.
  • @Setup and @Benchmark: roughly equivalent to @BeforeClass and @Test in JUnit.

From the command line, performance tests written with JMH are normally packaged and run as an executable JAR (the main class is fixed as org.openjdk.jmh.Main), so a separate project is typically maintained for each JMH program. For Maven projects you can bootstrap one with the jmh-java-benchmark-archetype; for Gradle projects there is the jmh-gradle-plugin.

5 Summary

Those are some of my thoughts on performance testing; you are welcome to share yours on my message board. Stay tuned for the next article, where I'll talk about automated testing for the Web.

6 References

  • A few points about performance testing
  • How should performance testing be done?
  • Ali double eleven promotion, technical preparation only do these two things?
  • Ding Yu, senior technical expert of Alibaba, talks about the evolution of high availability architecture on Double 11