Original: Taste of Little Sister (WeChat official account ID: XjjDog). You are welcome to share; please retain the source.
I have been writing an ID generator recently and needed to compare its speed against UUID and the currently popular NanoID, and of course to test my own generator by the same rules.
Code like this is the most basic kind of API, and even a few nanoseconds per call add up. The question is: how do I evaluate the rate of ID generation?
1. How do I collect performance statistics?
A common way to do this is to write some statistical code. This code, interspersed in our logic, performs some simple timing operations. Like these lines:
long start = System.currentTimeMillis();
// ... the logic being measured ...
long cost = System.currentTimeMillis() - start;
System.out.println("Logic cost : " + cost);
In business code this is perfectly fine; APM systems do much the same thing.
Unfortunately, the statistics this code produces are not necessarily accurate. JVM execution involves JIT compilation and inlining of hot code blocks, so frequently executed logic needs to loop tens of thousands of times to warm up before it yields stable results. The performance difference before and after warm-up is huge.
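To see why, here is a minimal sketch (plain Java, no JMH; the class and method names are mine) that times the same loop before and after warm-up. On a typical JVM the two numbers usually differ dramatically:

public class WarmupEffectDemo {

    static long work() {
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += i % 7;
        }
        return acc;
    }

    public static void main(String[] args) {
        long sink = 0; // consume results so the JIT cannot discard the work

        // Cold measurement: interpreted code plus compilation overhead
        long start = System.nanoTime();
        sink += work();
        System.out.println("cold: " + (System.nanoTime() - start) + " ns");

        // Warm-up: let the JIT compile and optimize the hot method
        for (int i = 0; i < 10_000; i++) {
            sink += work();
        }

        // Warmed measurement: usually far faster than the cold one
        start = System.nanoTime();
        sink += work();
        System.out.println("warm: " + (System.nanoTime() - start) + " ns");
        System.out.println("checksum: " + sink);
    }
}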
In addition, evaluating performance involves many metrics, and computing them by hand every time would be tedious and inefficient.
JMH (the Java Microbenchmark Harness) is exactly such a benchmarking tool. Once you have used other tools to locate hot code, you can hand it to JMH to measure its performance and evaluate improvements. Its measurement precision is very high, down to the nanosecond level.
JMH has shipped with the JDK since JDK 12; on earlier versions you need to add the Maven dependencies yourself, as shown below.
<dependencies>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-core</artifactId>
        <version>1.23</version>
    </dependency>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-generator-annprocess</artifactId>
        <version>1.23</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
Below, we introduce the use of this tool.
2. Key annotations
JMH is a jar package that, much like the unit-testing framework JUnit, uses annotations for its basic configuration. Much of this configuration can also be set through the OptionsBuilder in the main method.
A typical JMH run works like this: it starts multiple processes and multiple threads, performs warm-up first, then runs the measurement iterations, and finally aggregates all the test data for analysis. Depending on the granularity, pre- and post-processing hooks can also run before and after execution.
A simple example looks like this:
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@Threads(2)
public class BenchmarkTest {

    @Benchmark
    public long shift() {
        long t = 455565655225562L;
        long a = 0;
        for (int i = 0; i < 1000; i++) {
            a = t >> 30;
        }
        return a;
    }

    @Benchmark
    public long div() {
        long t = 455565655225562L;
        long a = 0;
        for (int i = 0; i < 1000; i++) {
            a = t / 1024 / 1024 / 1024;
        }
        return a;
    }

    public static void main(String[] args) throws Exception {
        Options opts = new OptionsBuilder()
                .include(BenchmarkTest.class.getSimpleName())
                .resultFormat(ResultFormatType.JSON)
                .build();
        new Runner(opts).run();
    }
}
Let’s take a look at the key annotations and parameters one by one.
@Warmup
An example:
@Warmup(
iterations = 5,
time = 1,
timeUnit = TimeUnit.SECONDS)
We’ve mentioned warmup more than once. The warmup annotation can be used on a class or method for warmup configuration. As you can see, it has several configuration parameters.
- timeUnit: the time unit; the default is seconds.
- iterations: the number of iterations in the warm-up phase.
- time: the duration of each warm-up iteration.
- batchSize: batch size, i.e., how many times the method is called per operation.
The annotation above means the code is warmed up for a total of five seconds (five iterations of one second each). Data collected during warm-up is not recorded in the results.
Let’s see how it works:
# Warmup: 3 iterations, 1 s each
# Warmup Iteration   1: 0.281 ops/ns
# Warmup Iteration   2: 0.376 ops/ns
# Warmup Iteration   3: 0.483 ops/ns
In general, benchmarks target small, relatively fast blocks of code. Such code has a good chance of being compiled and inlined, so keep the methods concise when writing them; this also helps the JIT.
Speaking of warm-up, service warm-up in distributed environments deserves a mention. When service nodes are published, there is usually a warm-up phase that gradually routes traffic to the new nodes until they reach their optimal state. The load balancer is responsible for this ramp-up, usually on a percentage basis.
@Measurement
Here’s an example.
@Measurement( iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
Measurement takes exactly the same parameters as Warmup. The difference is that these are the real measured iterations: unlike warm-up, their data is recorded.
We can see this in the log:
# Measurement: 5 iterations, 1 s each
Iteration 1: 1646.000 ns/op
Iteration 2: 1243.000 ns/op
Iteration 3: 1273.000 ns/op
Iteration 4: 1395.000 ns/op
Iteration 5: 1423.000 ns/op
Although code performs at its best after warm-up, that is not always what you will see. If the test machine is busy with other work, or its resource utilization has already reached its limit, the results will be affected. So while testing, I usually give the machine plenty of spare resources to keep the environment stable. And when analyzing results, I focus more on the performance differences between implementations than on the raw numbers themselves.
@BenchmarkMode
This annotation specifies the benchmark type and corresponds to the Mode option; it can be applied to both classes and methods. Its value is an array, so several statistical dimensions can be configured at once. For example, @BenchmarkMode({Mode.Throughput, Mode.AverageTime}) collects both throughput and average execution time (a sketch follows the list below).
The so-called modes, in JMH, can be divided into the following types:
- Throughput: overall throughput, i.e., the number of operations per unit of time (think QPS).
- AverageTime: the average time per operation. If the value is too small to read, adjust the output time unit to a smaller one.
- SampleTime: random sampling of execution times.
- SingleShotTime: runs the method only once, useful for measuring something like first-time initialization cost; in this mode it is not much different from a traditional main-method test.
- All: computes all of the above. Set it once to see the full effect.
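As a sketch under the annotations introduced so far (the class and method names are mine), configuring two modes at once looks like this; JMH then reports one result line per mode for the same method:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

public class MultiModeBenchmark {

    // Collect both throughput and average-time statistics in a single run
    @Benchmark
    @BenchmarkMode({Mode.Throughput, Mode.AverageTime})
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public long shiftOnce() {
        long t = 455565655225562L;
        return t >> 30;
    }
}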
Let’s take the average time and look at a general execution result:
Result "com.github.xjjdog.tuning.BenchmarkTest.shift":
2.068Plus or minus (99.9%) 0.038 ns/op [Average]
(min, avg, max) = (2.059.2.068.2.083), stdev = 0.010
CI (99.9%) :2.030.2.106] (assumes normal distribution)
Copy the code
Since we specified nanoseconds as the time unit, the average execution time of this shift method is 2.068 nanoseconds.
We can also look at the final summary:
Benchmark Mode Cnt Score Error Units
BenchmarkTest.div avgt 5 2.072 ± 0.053 ns/op
BenchmarkTest.shift avgt 5 2.068 ± 0.038 ns/op
Since these are averages, the Error column represents the error range, i.e., the fluctuation.
As you can see, each of these metrics carries a time dimension, which is configured through the @OutputTimeUnit annotation.
@OutputTimeUnit
This one is simpler: it specifies the time unit of the benchmark results and can be used on a class or a method. It is usually seconds, milliseconds, or microseconds; nanoseconds suit very fast methods.
For example, @BenchmarkMode(Mode.Throughput) combined with @OutputTimeUnit(TimeUnit.MILLISECONDS) means throughput per millisecond.
The following results for throughput are measured in milliseconds.
Benchmark Mode Cnt Score Error Units
BenchmarkTest.div thrpt 5 482999.685 ± 6415.832 ops/ms
BenchmarkTest.shift thrpt 5 480599.263 ± 20752.609 ops/ms
By changing the time unit, the @OutputTimeUnit annotation, whether on a class or a method, can make results much more readable. A sketch of a method-level override follows.
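A minimal sketch (the class and method names are mine): the class sets a default unit, and a single very fast method overrides it:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS) // class-level default: ops/ms
public class TimeUnitBenchmark {

    @Benchmark
    public long loop() {
        long acc = 0;
        for (int i = 0; i < 1000; i++) {
            acc += i;
        }
        return acc;
    }

    // Method-level override: report this very fast method in ops/ns instead
    @Benchmark
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public long shift() {
        return 455565655225562L >> 30;
    }
}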
@Fork
The fork value is typically set to 1, meaning the test runs in a single forked process. A value greater than 1 forks that many fresh processes for testing. If set to 0, the benchmark still runs, but inside the caller's own JVM process; this is not recommended, as the warning below explains.
# Fork: N/A, test runs in the host VM
# *** WARNING: Non-forked runs may silently omit JVM options, mess up profilers, disable compiler hints, etc. ***
# *** WARNING: Use non-forked runs only for debugging purposes, not for actual performance runs. ***
Does fork isolate at the process level or the thread level? Tracing the JMH source shows that each fork runs in a separate Process, which gives complete environmental isolation and avoids cross-contamination between runs. The forked process's input and output streams are piped back to our terminal over a socket connection.
Here's a little tip: the @Fork annotation also has a parameter called jvmArgsAppend through which we can pass extra JVM arguments.
@Fork(value = 3, jvmArgsAppend = {"-Xmx2048m", "-server", "-XX:+AggressiveOpts"})
In ordinary tests, you can also increase the number of forks appropriately to reduce test errors.
@Threads
Fork is process-oriented while Threads is thread-oriented. When this annotation is specified, parallel testing is turned on.
If Threads.MAX is configured, JMH uses as many benchmark threads as the machine has cores, as sketched below.
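A minimal sketch (the class and method names are mine):

import org.openjdk.jmh.annotations.*;

public class MaxThreadsBenchmark {

    // Threads.MAX asks JMH to run with as many benchmark threads as there are cores
    @Benchmark
    @Threads(Threads.MAX)
    public long timestamp() {
        return System.nanoTime();
    }
}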
@Group
The @Group annotation can only be applied to methods and is used to group test methods together. Use it when a single test file contains a large number of methods, or when you need to categorize them.
The companion @GroupThreads annotation assigns additional thread counts within such a group; a sketch follows.
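A minimal sketch of grouping (the class, group, and method names are mine), pitting three reader threads against one writer over a shared counter. JMH then reports results for the group as a whole as well as for each method in it:

import java.util.concurrent.atomic.AtomicLong;

import org.openjdk.jmh.annotations.*;

@State(Scope.Group) // one counter instance shared by all threads of the same group
public class GroupBenchmark {

    private final AtomicLong counter = new AtomicLong();

    @Benchmark
    @Group("rw")
    @GroupThreads(3) // three reader threads in the "rw" group
    public long read() {
        return counter.get();
    }

    @Benchmark
    @Group("rw")
    @GroupThreads(1) // one writer thread in the same group
    public void write() {
        counter.incrementAndGet();
    }
}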
@State
@State specifies the sharing scope of the variables in a class. It declares the class as a "state" object, and its Scope argument indicates how that state is shared. Note that the annotation must be applied to a class, otherwise JMH will complain.
Scope has the following three values:
- Benchmark: the state is shared across all threads running the same benchmark.
- Thread: each thread gets its own copy. With the @Threads annotation configured, every thread has its own instance of the variables, and they do not affect one another.
- Group: recalling the @Group annotation above, all threads within the same group share the same state instance.
The official sample JMHSample_04_DefaultState demonstrates that the default scope of the variable x is Thread. The key code is as follows:
@State(Scope.Thread)
public class JMHSample_04_DefaultState {
    double x = Math.PI;

    @Benchmark
    public void measure() {
        x++;
    }
}
@Setup and @TearDown
As in the JUnit unit-testing framework, @Setup performs initialization before the benchmark, and @TearDown performs actions afterwards, handling any global setup and cleanup.
These two annotations also take a Level value indicating when the method runs; it has three values, and a sketch follows the list.
- Trial: the default level; runs once around the whole benchmark run.
- Iteration: runs around each iteration.
- Invocation: runs around every single method invocation; this is the finest granularity.
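A minimal sketch (the class and method names are mine) wiring both hooks at the default Trial level:

import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class LifecycleBenchmark {

    private List<Integer> data;

    @Setup(Level.Trial) // runs once before the whole benchmark run
    public void prepare() {
        data = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            data.add(i);
        }
    }

    @TearDown(Level.Trial) // runs once after the whole benchmark run
    public void cleanup() {
        data.clear();
    }

    @Benchmark
    public long sum() {
        long s = 0;
        for (int v : data) {
            s += v;
        }
        return s;
    }
}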
@Param
The @Param annotation can only be applied to fields and is used to test how different parameter values affect performance. Combined with @State, it specifies the scope in which those parameters apply.
Example code:
import java.math.BigInteger;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class JMHSample_27_Params {

    @Param({"1", "31", "65", "101", "103"})
    public int arg;

    @Param({"0", "1", "2", "4", "8", "16", "32"})
    public int certainty;

    @Benchmark
    public boolean bench() {
        return BigInteger.valueOf(arg).isProbablePrime(certainty);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_27_Params.class.getSimpleName())
                // .param("arg", "41", "42") // Use this to selectively constrain/override parameters
                .build();
        new Runner(opt).run();
    }
}
It is worth noting that with many parameter values the benchmark executes many times, which usually takes a long while: if one parameter has M values and another has N, there are M × N combinations to run. In the sample above, 5 values of arg times 7 values of certainty make 35 combinations.
@CompilerControl
This is a very useful feature.
Method calls in Java can carry considerable overhead, especially at very high call volumes. Take the simple getter/setter methods that abound in Java code: each call must create a stack frame, access the required field, then pop the frame and resume execution of the caller.
If the access and operations on those objects can be pulled into the calling method's scope, a method call is saved and things speed up; that is the idea of method inlining. Once the code is JIT-compiled with inlining, efficiency improves substantially.
This annotation can be used on a class or a method to control that method's compilation behavior. Three modes are common: force inlining (INLINE), forbid inlining (DONT_INLINE), and even exclude the method from compilation entirely (EXCLUDE). A sketch follows.
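A minimal sketch in the spirit of the official JMHSample_16_CompilerControl example (the class and method names are mine): annotate the callee to observe the cost of the call itself:

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class InliningBenchmark {

    private long value = 455565655225562L;

    @CompilerControl(CompilerControl.Mode.DONT_INLINE) // forbid inlining of this callee
    private long targetNoInline() {
        return value >> 30;
    }

    @CompilerControl(CompilerControl.Mode.INLINE) // force inlining of this callee
    private long targetInline() {
        return value >> 30;
    }

    @Benchmark
    public long callNotInlined() {
        return targetNoInline(); // pays the full method-call overhead
    }

    @Benchmark
    public long callInlined() {
        return targetInline(); // the call should disappear after JIT inlining
    }
}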
3. Visualize the results
JMH test results can be post-processed and displayed graphically; combined with charts, they are far more intuitive. By specifying an output format at run time, you get the results in the corresponding file format.
For example, this line of code specifies to output data in JSON format.
Options opt = new OptionsBuilder()
.resultFormat(ResultFormatType.JSON)
.build();
JMH supports results in the following five formats:
- TEXT: export a plain-text file.
- CSV: export a CSV file.
- SCSV: export a semicolon-separated SCSV file.
- JSON: export a JSON file.
- LATEX: export a LaTeX file (LaTeX is a typesetting system based on TeX).
Generally speaking, we export a CSV file, work on it directly in Excel, and generate the corresponding charts.
Here are some other tools that can be used to make diagrams:
JMH Visualizer: an open-source project (jmh.morethan.io) that produces simple statistics from an exported and uploaded JSON file. Personally, I don't find its presentation very good.
jmh-visual-chart: by comparison, this tool (deepoove.com/jmh-visual-…) is relatively intuitive.
meta-chart: a general-purpose online chart generator (www.meta-chart.com); export the CSV file, then…
Some continuous integration tools, such as Jenkins, also provide plug-ins to display these test results directly.
END
JMH is a very useful tool that backs our analysis with precise test data. Generally, once hot code has been located, it needs targeted optimization with a benchmarking tool until a significant performance improvement is achieved.
In our scenario, NanoID is actually much faster than UUID.
Xjjdog is a public account that keeps programmers from getting sidetracked. It focuses on infrastructure and Linux: ten years of architecture, tens of billions of daily requests, discussing the world of high concurrency with you and offering a different flavor. My personal WeChat is xjjdog0; feel free to add me as a friend for further discussion.