Preface

Code is at the heart of every project, and even a small piece of code can affect the experience of the whole. A project's journey from zero to one, and from growth to maturity, depends on careful polishing of its code. Details decide success or failure, and excellent open source projects are no exception. Using the performance improvements in ShardingSphere 5.1.0 as an example, this post walks through code-level details and shows how small changes add up to a significant gain.

Weijie Wu, SphereEx Infrastructure Development Engineer, Apache ShardingSphere Committer. Currently focused on Apache ShardingSphere and its sub-project ElasticJob.

Optimizations

Use Optional correctly

Java 8 introduced java.util.Optional to make code more elegant, for example by avoiding methods that return null directly. Optional has two commonly used methods:

    public T orElse(T other) {
        return value != null ? value : other;
    }

    public T orElseGet(Supplier<? extends T> other) {
        return value != null ? value : other.get();
    }

The ShardingSphere class org.apache.shardingsphere.infra.binder.segment.select.orderby.engine.OrderByContextEngine contains this use of Optional:

    Optional<OrderByContext> result = // ...
    return result.orElse(getDefaultOrderByContextWithoutOrderBy(groupByContext));

With orElse, the method passed as the argument is called even when result is not empty, because Java evaluates the argument before orElse itself runs. This is wasteful, and it is especially risky when that method performs modifications. When the fallback involves a method call, write it like this:

    Optional<OrderByContext> result = // ...
    return result.orElseGet(() -> getDefaultOrderByContextWithoutOrderBy(groupByContext));

Passing a lambda as the Supplier to orElseGet ensures that the method inside it is called only when result is empty.
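The difference is easy to observe with a counter. The sketch below is not ShardingSphere code: the class OrElseDemo and its fallback method are hypothetical names that stand in for an expensive or side-effecting call such as getDefaultOrderByContextWithoutOrderBy.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

final class OrElseDemo {

    // Counts how often the fallback is computed (hypothetical stand-in for an expensive call).
    static final AtomicInteger FALLBACK_CALLS = new AtomicInteger();

    static String fallback() {
        FALLBACK_CALLS.incrementAndGet();
        return "default";
    }

    static String eager(final Optional<String> value) {
        // The argument is evaluated before orElse runs, so fallback() is always called.
        return value.orElse(fallback());
    }

    static String lazy(final Optional<String> value) {
        // The Supplier is invoked only when the Optional is empty.
        return value.orElseGet(OrElseDemo::fallback);
    }

    public static void main(final String[] args) {
        Optional<String> present = Optional.of("explicit");
        eager(present);
        System.out.println("fallback calls after orElse: " + FALLBACK_CALLS.get());
        FALLBACK_CALLS.set(0);
        lazy(present);
        System.out.println("fallback calls after orElseGet: " + FALLBACK_CALLS.get());
    }
}
```

With a present value, the counter increases after orElse but stays at zero after orElseGet. When the fallback is a constant or an already computed value, orElse remains perfectly fine.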

Related PR: github.com/apache/shar…

Avoid frequent concurrent calls to computeIfAbsent for Java 8 ConcurrentHashMap

java.util.concurrent.ConcurrentHashMap is a Map we use frequently in concurrent scenarios. Compared with java.util.Hashtable, which guards every operation with synchronized, ConcurrentHashMap offers better performance while remaining thread-safe. In the Java 8 implementation, however, computeIfAbsent still enters a synchronized block to fetch the value even when the key already exists. Frequent computeIfAbsent calls on the same key therefore hurt concurrency performance considerably.

Reference: bugs.openjdk.java.net/browse/JDK-…

This problem was fixed in Java 9. To preserve concurrency performance on Java 8 as well, we adjusted the ShardingSphere code to work around it.

Take org.apache.shardingsphere.infra.executor.sql.prepare.driver.DriverExecutionPrepareEngine, a frequently invoked ShardingSphere class, as an example:

    // omit some code...
    private static final Map<String, SQLExecutionUnitBuilder> TYPE_TO_BUILDER_MAP = new ConcurrentHashMap<>(8, 1);
    // omit some code...
    public DriverExecutionPrepareEngine(final String type, final int maxConnectionsSizePerQuery, final ExecutorDriverManager<C, ?, ?> executorDriverManager,
                                        final StorageResourceOption option, final Collection<ShardingSphereRule> rules) {
        super(maxConnectionsSizePerQuery, rules);
        this.executorDriverManager = executorDriverManager;
        this.option = option;
        sqlExecutionUnitBuilder = TYPE_TO_BUILDER_MAP.computeIfAbsent(type, key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
    }

Only two distinct type values are ever passed to computeIfAbsent in the code above, and this code sits on the path of most SQL executions. In other words, computeIfAbsent is called on the same keys at high frequency, which limits concurrent performance. We work around the problem as follows:

    SQLExecutionUnitBuilder result;
    if (null == (result = TYPE_TO_BUILDER_MAP.get(type))) {
        result = TYPE_TO_BUILDER_MAP.computeIfAbsent(type, key -> TypedSPIRegistry.getRegisteredService(SQLExecutionUnitBuilder.class, key, new Properties()));
    }
    return result;
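The same get-then-computeIfAbsent pattern can be tried in isolation. The following is a minimal sketch rather than the project's code: GetFirstCache, load and the key strings are hypothetical, and load merely stands in for an expensive lookup like the SPI registry call above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class GetFirstCache {

    // Counts loader invocations so the effect of caching is observable.
    static final AtomicInteger LOADS = new AtomicInteger();

    private static final Map<String, String> TYPE_TO_VALUE_MAP = new ConcurrentHashMap<>(8, 1);

    // Hypothetical stand-in for an expensive lookup.
    static String load(final String type) {
        LOADS.incrementAndGet();
        return "builder-for-" + type;
    }

    static String getBuilder(final String type) {
        // A plain get() takes no lock, so the common hit path stays free of contention.
        String result = TYPE_TO_VALUE_MAP.get(type);
        if (null == result) {
            // Only a miss pays for computeIfAbsent, which applies the function at most once per key.
            result = TYPE_TO_VALUE_MAP.computeIfAbsent(type, GetFirstCache::load);
        }
        return result;
    }

    public static void main(final String[] args) {
        getBuilder("example-type");
        getBuilder("example-type");
        System.out.println("loader invocations: " + LOADS.get());
    }
}
```

The extra get() costs one additional lookup on a miss, which is negligible when the key set is tiny and hits dominate, as described above.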

Related PR: github.com/apache/shar…

Avoid frequent calls to java.util.Properties

java.util.Properties is one of the classes ShardingSphere uses most often for configuration. Properties extends java.util.Hashtable, whose methods are synchronized, so frequent calls to Properties methods should be avoided under concurrency.

We found that org.apache.shardingsphere.sharding.algorithm.sharding.inline.InlineShardingAlgorithm, a class related to ShardingSphere's data sharding algorithms, called getProperty at high frequency, which limited concurrent performance. We moved the logic that calls Properties methods into the init method of InlineShardingAlgorithm, so that the sharding calculation itself no longer pays for it on every call.
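As an illustration of the idea, and not the actual InlineShardingAlgorithm, the sketch below reads a property once in init and keeps the result in a field. The class name PrefixShardingAlgorithm, the "table-prefix" key and the modulo-2 routing are all assumptions made for the example.

```java
import java.util.Properties;

final class PrefixShardingAlgorithm {

    private final Properties props;

    // Cached in init() so the hot path never touches the synchronized Properties methods.
    private String tablePrefix;

    PrefixShardingAlgorithm(final Properties props) {
        this.props = props;
    }

    // Called once after configuration; the only place that reads from Properties.
    void init() {
        tablePrefix = props.getProperty("table-prefix", "t_order_");
    }

    // Hot path: uses only the cached field.
    String doSharding(final long shardingValue) {
        return tablePrefix + Math.floorMod(shardingValue, 2L);
    }

    public static void main(final String[] args) {
        Properties props = new Properties();
        props.setProperty("table-prefix", "t_order_");
        PrefixShardingAlgorithm algorithm = new PrefixShardingAlgorithm(props);
        algorithm.init();
        System.out.println(algorithm.doSharding(3L));
    }
}
```

The trade-off is that later changes to the Properties object are not picked up until init runs again, which is acceptable when configuration is fixed after startup.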

Related PR: github.com/apache/shar…

Avoid using Collections.synchronizedMap

While investigating threads in the Monitor Blocked state in ShardingSphere, we found that org.apache.shardingsphere.infra.metadata.schema.model.TableMetaData wrapped a frequently read Map with Collections.synchronizedMap, which hurt concurrency performance. Our analysis showed that the wrapped Map is modified only during initialization and is only read afterwards, so we simply removed the Collections.synchronizedMap wrapper.
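A map that is filled once and then only read needs no locking as long as it is safely published. The sketch below shows one way to express that; ReadOnlyColumns is a hypothetical class, not TableMetaData, and wrapping the map in Collections.unmodifiableMap is an extra safeguard added here rather than something the article describes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class ReadOnlyColumns {

    // Populated only in the constructor; the final field makes the fully built map visible to all threads.
    private final Map<String, String> columns;

    ReadOnlyColumns(final Map<String, String> source) {
        Map<String, String> copy = new LinkedHashMap<>(source.size(), 1);
        source.forEach((name, type) -> copy.put(name.toLowerCase(), type));
        // Unmodifiable view: later writes fail fast instead of silently racing with readers.
        columns = Collections.unmodifiableMap(copy);
    }

    // Lock-free read; safe because the map never changes after construction.
    String getType(final String columnName) {
        return columns.get(columnName.toLowerCase());
    }

    public static void main(final String[] args) {
        Map<String, String> source = new LinkedHashMap<>();
        source.put("ID", "INT");
        source.put("Name", "VARCHAR");
        System.out.println(new ReadOnlyColumns(source).getType("id"));
    }
}
```

If the map did need updates after initialization, this approach would no longer be safe and a concurrent map would be the better fit.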

Related PR: github.com/apache/shar…

Replace unnecessary String.format with string concatenation

The ShardingSphere class org.apache.shardingsphere.sql.parser.sql.common.constant.QuoteCharacter contains this logic:

    public String wrap(final String value) {
        return String.format("%s%s%s", startDelimiter, value, endDelimiter);
    }

This logic is plainly just string concatenation, but String.format is more expensive than concatenating the strings directly. We changed it as follows:

    public String wrap(final String value) {
        return startDelimiter + value + endDelimiter;
    }

We ran a simple JMH benchmark. The results:

    Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
    # Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
    # Warmup: 3 iterations, 5 s each
    # Measurement: 3 iterations, 5 s each
    # Timeout: 10 min per iteration
    # Threads: 16 threads, will synchronize iterations
    # Benchmark mode: Throughput, ops/time
    Benchmark                           Mode  Cnt          Score         Error  Units
    StringConcatBenchmark.benchFormat  thrpt    9   28490416.644 ± 1377409.528  ops/s
    StringConcatBenchmark.benchPlus    thrpt    9  163475708.153 ± 1748461.858  ops/s

As the results show, String.format is far more expensive than concatenation with +: roughly 28.5 million ops/s versus 163.5 million ops/s in this run. Direct string concatenation has also been further optimized since Java 9. Choosing the right concatenation method matters.

Related PR: github.com/apache/shar…

Use for-each instead of high-frequency streams

The ShardingSphere 5.x code makes heavy use of java.util.stream.Stream.

In a BenchmarkSQL (a Java implementation of the TPC-C test) performance test of ShardingSphere-JDBC + openGauss, we found that replacing all the high-frequency streams identified during the stress test with for-each loops significantly improved ShardingSphere-JDBC performance.

ShardingSphere-JDBC and openGauss ran on two separate 128-core aarch64 machines, and ShardingSphere-JDBC used BiSheng JDK 8.

The results above may also be related to the aarch64 platform and the JDK used. Still, streams carry some inherent overhead, and their performance varies widely across scenarios. For logic that is invoked at high frequency and where stream performance is uncertain, we prefer a for-each loop.
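To make the kind of rewrite concrete, here is the same small transformation written both ways. This is a generic sketch, not code taken from the related PR; LoopVsStream and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

final class LoopVsStream {

    // Stream version: concise, but builds a pipeline and a collector on every call.
    static List<String> upperNonEmptyStream(final Collection<String> names) {
        return names.stream().filter(each -> !each.isEmpty()).map(String::toUpperCase).collect(Collectors.toList());
    }

    // for-each version: the same result with a single pre-sized list and no pipeline objects.
    static List<String> upperNonEmptyLoop(final Collection<String> names) {
        List<String> result = new ArrayList<>(names.size());
        for (String each : names) {
            if (!each.isEmpty()) {
                result.add(each.toUpperCase());
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        List<String> names = Arrays.asList("t_order", "", "t_user");
        System.out.println(upperNonEmptyStream(names).equals(upperNonEmptyLoop(names)));
    }
}
```

Whether the loop is measurably faster depends on the JDK, platform and call frequency, so a change like this is best confirmed with a benchmark on the actual hot path.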

Related PR: github.com/apache/shar…

Avoid unnecessary (repeated) logic calls

There are many examples of avoiding unnecessary logic calls:

hashCode calculation

ShardingSphere has a class org.apache.shardingsphere.sharding.route.engine.condition.Column that implements equals and hashCode:

    @RequiredArgsConstructor
    @Getter
    @ToString
    public final class Column {

        private final String name;

        private final String tableName;

        @Override
        public boolean equals(final Object obj) {...}

        @Override
        public int hashCode() {
            return Objects.hashCode(name.toUpperCase(), tableName.toUpperCase());
        }
    }

The class above is clearly immutable, yet its hashCode method recomputes the hash, including two toUpperCase calls, every time it is invoked. If such objects are frequently stored in or looked up from a Map or Set, this adds a lot of unnecessary computation.

After the adjustment:

    @Getter
    @ToString
    public final class Column {

        private final String name;

        private final String tableName;

        private final int hashCode;

        public Column(final String name, final String tableName) {
            this.name = name;
            this.tableName = tableName;
            hashCode = Objects.hash(name.toUpperCase(), tableName.toUpperCase());
        }

        @Override
        public boolean equals(final Object obj) {...}

        @Override
        public int hashCode() {
            return hashCode;
        }
    }

Related PR: github.com/apache/shar…

Use lambdas instead of reflective method calls

The ShardingSphere source code needs to record method calls with their arguments, and replay them on a given object when needed, in scenarios such as:

  1. Sending statements such as BEGIN to ShardingSphere-Proxy.

  2. Setting parameters for placeholders at specified positions through ShardingSpherePreparedStatement.

Take the following code as an example. Before refactoring, reflection was used to record method calls and replay them. Reflective calls carry a certain performance overhead, and the code is not very readable:

    @Override
    public void begin() {
        recordMethodInvocation(Connection.class, "setAutoCommit", new Class[]{boolean.class}, new Object[]{false});
    }

After refactoring, the overhead of calling methods using reflection is avoided:

    @Override
    public void begin() {
        connection.getConnectionPostProcessors().add(target -> {
            try {
                target.setAutoCommit(false);
            } catch (final SQLException ex) {
                throw new RuntimeException(ex);
            }
        });
    }
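The record-and-replay idea behind this refactoring can be sketched without any JDBC or ShardingSphere types. Everything in the example below is hypothetical: PostProcessor, FakeConnection and Recorder only mimic the shape of the code above.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplayDemo {

    // A recorded action that can later be applied to a target object.
    @FunctionalInterface
    interface PostProcessor<T> {
        void process(T target);
    }

    // Hypothetical stand-in for a physical connection.
    static final class FakeConnection {

        boolean autoCommit = true;

        void setAutoCommit(final boolean autoCommit) {
            this.autoCommit = autoCommit;
        }
    }

    static final class Recorder {

        private final List<PostProcessor<FakeConnection>> postProcessors = new ArrayList<>();

        // Records the call as a lambda instead of a method name plus argument arrays.
        void begin() {
            postProcessors.add(target -> target.setAutoCommit(false));
        }

        // Replays every recorded call on the given target with ordinary, type-checked invocations.
        void replayOn(final FakeConnection target) {
            for (PostProcessor<FakeConnection> each : postProcessors) {
                each.process(target);
            }
        }
    }

    public static void main(final String[] args) {
        Recorder recorder = new Recorder();
        recorder.begin();
        FakeConnection connection = new FakeConnection();
        recorder.replayOn(connection);
        System.out.println("autoCommit after replay: " + connection.autoCommit);
    }
}
```

Besides avoiding reflection at replay time, the lambda form lets the compiler check the method name and argument types, which the string-based recording could not.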

Related PRs:

github.com/apache/shar…

github.com/apache/shar…

Netty Epoll support for AARCH64

Netty's Epoll implementation has supported Linux on the aarch64 architecture since 4.1.50.Final. On aarch64 Linux, using the Netty Epoll API can improve performance compared with the Netty NIO API.

Reference: stackoverflow.com/a/23465481/…

ShardingSphere-Proxy TPC-C performance comparison: 5.1.0 vs 5.0.0

We used TPC-C to benchmark ShardingSphere-Proxy and verify the results of the performance optimizations. Because earlier versions of ShardingSphere-Proxy had limited PostgreSQL support and could not complete TPC-C testing, we compare 5.0.0 with 5.1.0.

To highlight the performance overhead of ShardingSphere-Proxy itself, this test compares ShardingSphere-Proxy with data sharding configured for a single shard against PostgreSQL 14.2.

The test follows the official document “BenchmarkSQL performance test” (shardingsphere.apache.org/document/cu…), with the configuration reduced from four shards to one.

Test environment

Test parameters

BenchmarkSQL parameters:

  • warehouses=192 (data volume)
  • terminals=192 (number of concurrent requests)
  • terminalWarehouseFixed=false
  • Running time: 30 minutes

PostgreSQL JDBC parameters:

  • defaultRowFetchSize=50
  • reWriteBatchedInserts=true

ShardingSphere-Proxy JVM parameters:

  • -Xmx16g
  • -Xms16g
  • -Xmn12g
  • -XX:AutoBoxCacheMax=4096
  • -XX:+UseNUMA
  • -XX:+DisableExplicitGC
  • -XX:LargePageSizeInBytes=128m
  • -XX:+SegmentedCodeCache
  • -XX:+AggressiveHeap

Test results

In the environment and scenario described in this article, we reached the following conclusions:

  • Taking ShardingSphere-Proxy 5.0.0 + PostgreSQL as the baseline, ShardingSphere-Proxy 5.1.0 improves performance by about 26.8%.
  • Taking a direct connection to PostgreSQL as the baseline, the performance loss of ShardingSphere-Proxy dropped by about 15 percentage points between 5.0.0 and 5.1.0, from 42.7% to 27.4%.
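The two figures can be cross-checked against each other. The sketch below assumes that "loss" means the fraction of direct-PostgreSQL throughput that is lost and that both versions were measured against the same PostgreSQL baseline; neither assumption is spelled out in the article, and LossArithmetic is a hypothetical helper.

```java
final class LossArithmetic {

    // Throughput relative to the direct PostgreSQL baseline, given the fractional loss.
    static double relativeThroughput(final double loss) {
        return 1.0 - loss;
    }

    // Relative improvement of the new version over the old one, derived from both losses.
    static double improvement(final double oldLoss, final double newLoss) {
        return relativeThroughput(newLoss) / relativeThroughput(oldLoss) - 1.0;
    }

    public static void main(final String[] args) {
        // 5.0.0 loses 42.7% and 5.1.0 loses 27.4%, so the ratio is (1 - 0.274) / (1 - 0.427) = 0.726 / 0.573.
        System.out.printf("improvement: %.1f%%%n", improvement(0.427, 0.274) * 100);
    }
}
```

Under these assumptions the result is roughly 26.7%, which agrees with the reported 26.8% once rounding of the published loss percentages is taken into account.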

As the code details are optimized throughout all modules of ShardingSphere, the above test results do not cover all optimization points.

How to think about performance

From time to time, people ask, “How does ShardingSphere perform? How much overhead does it add?”

In my opinion, performance is good enough as long as it meets the demand. Performance is a complex matter affected by many factors. Depending on the environment and scenario, ShardingSphere's performance overhead may be less than 1% or as high as 50%, so we cannot give an answer in isolation from the environment and scenario. In addition, since ShardingSphere is infrastructure, its performance is one of the key considerations during development. Teams and individuals in the ShardingSphere community will keep up the spirit of craftsmanship and continue pushing ShardingSphere's performance to its limits.