An overview of Java performance optimization

Java performance optimization is, in my opinion, essential for progressing as a Java engineer. Many Java engineers have little understanding of what the underlying Java virtual machine does once their code executes. The biggest difference between Java and C/C++ is the absence of manual memory management: it lets engineers focus on application logic without having to manage memory themselves. But this is a double-edged sword, and Java performance optimization is all about getting the best performance out of the program despite that abstraction.

Performance tuning

Many factors affect Java performance, and the JVM itself is only a small part of the overall picture: database connections, network overhead, and so on all contribute. Here we focus on the application itself.

A better algorithm

Anyone who understands data structures and algorithms knows the difference between a good algorithm and a bad one, and the same is true of application code itself. Well-encapsulated methods usually save memory and time and make exception handling cleaner, whereas a newcomer to the language tends to aim only at getting a specific feature working. Treat application code as craftsmanship: preserve readability while pursuing better performance.
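
As a toy sketch (my illustration, not from the original text), the choice of data structure alone can dwarf most low-level tuning. A membership check against a List scans every element (O(n)), while a HashSet hashes the key (O(1) on average):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy example: the same membership check with two data structures.
public class MembershipCheck {
    public static void main(String[] args) {
        int n = 1_000_000;
        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
            set.add(i);
        }

        // O(n) per lookup: walks the list until a match is found.
        boolean inList = list.contains(n - 1);

        // O(1) per lookup on average: a single hash computation.
        boolean inSet = set.contains(n - 1);

        System.out.println(inList + " " + inSet);
    }
}
```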

Less code

As described in the previous section, for the same functionality a small program with less code will run faster than a large one:

  • The more code there is to compile, and the more classes that must be loaded from disk into the JVM, the longer the program takes to start.
  • The more objects created, destroyed, allocated, and held, the more garbage collection is needed and the longer each GC cycle becomes.
  • The more code executed, the less effective the machine’s hardware caches become, and the longer execution takes.

Each small feature degrades performance only slightly on its own, but as the application grows larger and larger, those small degradations add up to a significant decline in overall performance.

Premature optimization

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” Readable, well-structured code should come first when building an application’s functionality. Avoiding premature optimization does not, however, mean tolerating code structures that are already known to be bad for performance: if there are two equally simple, straightforward ways to write a line of code, choose the one with better performance.
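
A classic instance of such a known-bad construct, sketched here with java.util.logging (the expensiveState method is hypothetical):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Avoiding a known performance trap is not premature optimization.
public class LoggingGuard {
    private static final Logger log = Logger.getLogger("app");

    // Hypothetical stand-in for any costly computation performed
    // only to build the log message.
    static String expensiveState() {
        return String.valueOf(System.nanoTime());
    }

    public static void main(String[] args) {
        // Always pays for string concatenation and expensiveState(),
        // even when FINE logging is disabled:
        log.log(Level.FINE, "state is " + expensiveState());

        // Pays only when FINE is actually enabled:
        if (log.isLoggable(Level.FINE)) {
            log.log(Level.FINE, "state is " + expensiveState());
        }
    }
}
```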

Performance optimization principles

  • Optimize code guided by profiling, focusing on the operations the profile shows to be most time-consuming. Note, however, that this does not mean looking only at the leaf methods of the profile (see Chapter 3).
  • Diagnose performance problems with Occam’s Razor: the most likely cause should be the simplest to explain. New code is more likely to introduce a performance problem than the machine configuration, and the machine configuration is more likely to than a bug in the JVM or operating system.
  • Write simple algorithms for the most common operations in your application. In a program that estimates a mathematical formula, users may accept a maximum error of 10% or demand 1%. If a 10% error is acceptable to most users, optimize that common path; the code path that tightens the error to 1% can afford to be slower, because it runs far less often (see the sketch after this list).
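
A minimal sketch of the last principle, with hypothetical method names and placeholder formulas: the fast path serves the common case, and the slower high-accuracy path runs only when explicitly requested.

```java
// Sketch of "optimize for the common case" (all names hypothetical).
public class Estimator {
    // Fast approximation, accurate enough (say ~10%) for most users.
    static double fastEstimate(double x) {
        return x * x / 2;                       // placeholder formula
    }

    // Slow computation standing in for a ~1%-accuracy refinement.
    static double preciseEstimate(double x) {
        double result = 0;
        for (int i = 0; i < 1_000_000; i++) {
            result += x * x / 2_000_000.0;      // placeholder refinement loop
        }
        return result;
    }

    // The common case stays fast; accuracy costs extra only on request.
    static double estimate(double x, boolean highAccuracy) {
        return highAccuracy ? preciseEstimate(x) : fastEstimate(x);
    }

    public static void main(String[] args) {
        System.out.println(estimate(3.0, false) + " " + estimate(3.0, true));
    }
}
```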

Performance testing methods

Testing a real application

Microbenchmark

Microbenchmarks measure the performance of tiny units of code: how long it takes to call a synchronized method versus an unsynchronized one, the cost of creating a thread versus using a thread pool, the time one algorithm takes versus an alternative implementation, and so on. Writing a trustworthy microbenchmark requires attention to several pitfalls, all of which the sketch after the following list applies:

  • The results must be used. Modern compilers optimize intelligently and will eliminate a computation whose result is never read, so the elapsed time looks roughly the same no matter how large the input. To prevent this, store the result of the method under test in a field declared volatile rather than in a local variable.
  • Remove irrelevant operations. Work that is incidental to the method under test, such as generating random-number inputs, must be done before the timing starts.
  • Use reasonable parameters. Define a realistic range of input values for the method being tested.
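
Putting the three rules together, here is a minimal hand-rolled sketch (in practice a harness such as JMH handles these pitfalls for you):

```java
import java.util.Random;

public class FibMicrobenchmark {
    // Rule 1: write results to a volatile field so the JIT cannot
    // eliminate the computation as dead code.
    private static volatile long sink;

    private static long fib(int n) {
        return n <= 1 ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        // Rule 2: generate random inputs BEFORE timing, so the cost of
        // the random-number generator is not measured.
        // Rule 3: keep the inputs in a reasonable range for the method.
        Random r = new Random(42);
        int[] inputs = new int[10_000];
        for (int i = 0; i < inputs.length; i++) {
            inputs[i] = 15 + r.nextInt(10);     // n in [15, 24]
        }

        // Warm-up: give the JIT time to compile and optimize fib().
        for (int n : inputs) sink = fib(n);

        long start = System.nanoTime();
        for (int n : inputs) sink = fib(n);
        long elapsed = System.nanoTime() - start;
        System.out.printf("%.0f ns/op%n", (double) elapsed / inputs.length);
    }
}
```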

Macrobenchmark

Macrobenchmarks test the overall performance of the application: load balancing, the application body, database performance, network overhead, and so on. When testing overall performance, also measure each module separately. For example, when measuring the performance of the application body itself, mock the corresponding database connection operations so that database performance does not affect the result.

In a modular system, the rate at which data enters a subsystem depends on the output rate of the module before it. Suppose the database can load data at only 100 RPS: even if requests reach the database at 200 RPS, its output to the other modules is still 100 RPS, and even doubling the efficiency of the business-logic processing leaves the overall throughput of the system at 100 RPS. No amount of improvement in the business logic will pay off unless time is also spent making the other parts of the environment more efficient. So begin with an overall application test to establish the optimization priority of each module.
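
A minimal sketch of isolating a module this way (all names are hypothetical): the business logic depends on an interface, so the benchmark can substitute a stub that takes database latency out of the measurement.

```java
// The module under test depends on an interface, not on JDBC directly.
interface OrderStore {
    void save(String order);
}

// Stub used during the benchmark: returns immediately, so database
// latency does not contaminate the measurement of the business logic.
class StubOrderStore implements OrderStore {
    @Override
    public void save(String order) {
        // no-op
    }
}

class OrderService {
    private final OrderStore store;

    OrderService(OrderStore store) {
        this.store = store;
    }

    void process(String order) {
        // ... business logic under test ...
        store.save(order);
    }
}

public class IsolatedModuleBenchmark {
    public static void main(String[] args) {
        OrderService service = new OrderService(new StubOrderStore());
        long start = System.nanoTime();
        for (int i = 0; i < 100_000; i++) {
            service.process("order-" + i);
        }
        System.out.printf("%d ms%n", (System.nanoTime() - start) / 1_000_000);
    }
}
```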

Mesobenchmark

Mesobenchmarks test the performance of particular aspects of an application, such as socket management, reading requests, finding JSPs, writing responses, and so on, with no security management, no session management, and none of a number of other Java EE features. They have fewer pitfalls than microbenchmarks and are easier to run than macrobenchmarks, and they do not contain large amounts of dead code that the compiler can optimize away.

Test metrics

Batch elapsed time

This is the simplest way to test an application: how long does it take to complete the task? In Java, because of just-in-time (JIT) compilation, the JVM needs a few minutes (or more) before the code is fully optimized and running at peak performance. For this (and other) reasons, when tuning Java performance you must pay close attention to the warm-up phase of code optimization: performance should be measured only after the code has run long enough to be compiled and optimized.

Throughput

Throughput tests measure how much work can be done in a period of time. In a client-server test, the client reports the total number of operations it completed across all of its threads, and the result is expressed as operations per unit of time (typically per second) rather than as a raw total for the measurement period. The metric is called transactions per second (TPS), requests per second (RPS), or operations per second (OPS); a sketch of such a measurement follows the list below. Client-server testing carries certain risks:

  • The client machine’s CPU may be insufficient to drive the required number of client threads
  • The client may spend so much time processing each response that it delays sending the next request
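
A minimal sketch of such a measurement (doOperation is a placeholder for sending a request and reading the response):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Multiple client threads run operations for a fixed period; the total
// operation count divided by elapsed seconds gives OPS.
public class ThroughputTest {
    static void doOperation() {
        // placeholder: send a request and read the response
    }

    public static void main(String[] args) throws InterruptedException {
        int clients = 8;
        long durationMs = 10_000;
        LongAdder ops = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(clients);

        long deadline = System.currentTimeMillis() + durationMs;
        for (int i = 0; i < clients; i++) {
            pool.submit(() -> {
                while (System.currentTimeMillis() < deadline) {
                    doOperation();
                    ops.increment();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(durationMs + 1_000, TimeUnit.MILLISECONDS);
        System.out.printf("%.1f OPS%n", ops.sum() / (durationMs / 1000.0));
    }
}
```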

How should results be compared? If two systems achieve the same throughput, compare their response times; if throughput differs, the two metrics must be weighed together. A server sustaining 500 OPS with a 0.5-second response time is performing better than one with a 0.3-second response time that manages only 400 OPS. (Throughput tests, too, must be run after an appropriate warm-up period.)

Response time

Response time is the elapsed time between the client sending a request and receiving the response. Response-time tests try to simulate user behavior: the difference from throughput tests (assuming the latter are client-server based) is that client threads sleep for a period of time (think time) between operations. With a fixed think time and a fixed number of clients, the server’s throughput is essentially fixed (within small variation), so the corresponding response time measures the server’s efficiency.

When the client includes think time, throughput follows from it. For example, a think time of 30 seconds and a response time of 1 second mean the client sends a request every 31 seconds, for a throughput (calculated as requests/(response time + think time)) of 0.032 OPS. If the response time is 2 seconds, the client sends a request every 32 seconds, and the throughput is 0.031 OPS. With a fixed-cycle request instead (say, one every 30 seconds), each client generates a fixed throughput of 0.033 OPS regardless of response time (assuming the response time in this example is less than 30 seconds).
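
Spelling the arithmetic out:

$$\text{throughput per client} = \frac{1}{\text{response time} + \text{think time}}, \qquad \frac{1}{1+30} \approx 0.032,\quad \frac{1}{2+30} \approx 0.031,\quad \frac{1}{30} \approx 0.033\ \text{OPS}$$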

Response time can be measured in two ways:

  • Average response time: total request time divided by the number of requests. Averages are easily skewed by outliers, such as the large pauses introduced by GC (garbage collection), and can therefore misrepresent typical behavior.
  • Percentile response time, such as the 90th percentile: if 90% of requests respond in less than 1.5 seconds and 10% take at least 1.5 seconds, then 1.5 seconds is the 90th-percentile response time.

In practice, response time is best reported as an average combined with at least one percentile value.
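
A small sketch (the sample values are invented) showing how one GC-like outlier distorts the average while the 90th percentile stays stable:

```java
import java.util.Arrays;

public class ResponseTimeStats {
    public static void main(String[] args) {
        // Response times in seconds; the 6.0 simulates a GC pause.
        double[] times = {0.2, 0.3, 0.25, 0.4, 0.3, 0.2, 0.35, 0.3, 0.25, 6.0};

        double average = Arrays.stream(times).average().orElse(0);

        double[] sorted = times.clone();
        Arrays.sort(sorted);
        // Nearest-rank 90th percentile: the value below which 90% of
        // the samples fall.
        double p90 = sorted[(int) Math.ceil(0.9 * sorted.length) - 1];

        // The outlier drags the average to ~0.85s; p90 stays at 0.40s.
        System.out.printf("average = %.2fs, p90 = %.2fs%n", average, p90);
    }
}
```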

Load generator

Faban is an open source, Java-based load generator. It ships with a simple program, fhb, that can be used to test the performance of simple URLs, as described on the Faban website.

Resources

Java Performance: The Definitive Guide, by Scott Oaks