1. Use StringBuilder
StingBuilder should be used by default in our Java code and the + operator should be avoided. You may disagree with syntax Sugar on StringBuilder, such as:
Copy the code
- String x = “a” + args.length + “b”;
Will compile to:
Copy the code
- 0 new java.lang.StringBuilder [16]
- 3 dup
- 4 ldc <String “a”> [18]
- 6 invokespecial java.lang.StringBuilder(java.lang.String) [20]
- 9 aload_0 [args]
- 10 arraylength
- 11 invokevirtual java.lang.StringBuilder.append(int) : java.lang.StringBuilder [23]
- 14 ldc <String “b”> [27]
- 16 invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [29]
- 19 invokevirtual java.lang.StringBuilder.toString() : java.lang.String [32]
- 22 astore_1 [x]
But what happened? Do you need to use the following sections to improve strings next?
Copy the code
- String x = “a” + args.length + “b”;
- if (args.length == 1)
- xx = x + args[0];
Now comes the second StringBuilder, which does not consume extra memory in the heap, but puts pressure on the GC.
Copy the code
- StringBuilder x = new StringBuilder(“a”);
- x.append(args.length);
- x.append(“b”);
- if (args.length == 1);
- x.append(args[0]);
summary
In the example above, if you rely on the Java compiler to implicitly generate the instance, the effect of the compilation is almost irrelevant to whether the StringBuilder instance is used. Remember: in the N.O.P.E branch, we are wasting N x O x P time every time the CPU loops to GC or allocating default space for StringBuilder.
In general, using StringBuilder is better than using the + operator. If possible, select StringBuilder if you need to pass references across multiple methods, because Strings consume additional resources. JOOQ uses this approach when generating complex SQL statements. Only one StringBuilder is used throughout the AST Abstract Syntax Tree SQL pass.
Even more tragically, if you’re still using StringBuffer, use StringBuilder instead of StringBuffer, since there aren’t many strings that need to be synchronized.
2. Avoid regular expressions (good technical article)
Regular expressions give the impression of being quick and easy. But using regular expressions in the N.O.P.E branch is the worst decision you can make. If you must use regular expressions in computationally intensive code, at least cache Pattern to avoid repeating Pattern compilations.
Copy the code
- static final Pattern HEAVY_REGEX =
- Pattern.compile(“(((X)*Y)*Z)*”);
If only a simple regular expression like this is used:
Copy the code
- String[] parts = ipAddress.split(“\.”);
This is best done using plain char[] arrays or index-based operations. The less readable code below, for example, does the same thing.
Copy the code
- int length = ipAddress.length();
- int offset = 0;
- int part = 0;
- for (int i = 0; i < length; i++) {
- if (i == length – 1 ||
- ipAddress.charAt(i + 1) == ‘.’) {
- parts[part] =
- ipAddress.substring(offset, i + 1);
- part++;
- offset = i + 2;
- }
- }
The above code also shows that premature optimization doesn’t make sense. This code is less maintainable than the split() method, though.
The challenge: Can a smart friend come up with a faster algorithm? Welcome to leave a message.
summary
Regular expressions are useful, but they come at a cost. Especially deep in the N.O.P.E branch, avoid regular expressions in your code at all costs. Also be wary of the various JDK String methods that use regular expressions, such as string.replaceall () or string.split (). You can choose to use a popular development library, such as Apache Commons Lang, for string manipulation.
Do not use the iterator() method
This advice does not apply to general situations, but only to scenes deep in the N.O.P.E branch. Still, it’s worth knowing. The Java 5 format for loop writing is so convenient that we can forget about the internal loop methods, such as:
Copy the code
- for (String value : strings) {
- // Do something useful here
- }
Each time the code runs through this loop, if the strings variable is an Iterable, the code will automatically create an instance of Iterator. If you use an ArrayList, the virtual machine automatically allocates 3 integer memory sizes on the heap for objects.
Copy the code
- private class Itr implements Iterator<E> {
- int cursor;
- int lastRet = -1;
- int expectedModCount = modCount;
- // …
It is also possible to replace the above for loop with the following equivalent loop, “wasting” just one integer on the stack, which is quite cost-effective.
Copy the code
- int size = strings.size();
- for (int i = 0; i < size; i++) {
- String value : strings.get(i);
- // Do something useful here
- }
If the value of the string in the loop does not change very much, you can also loop with an array.
Copy the code
- for (String value : stringArray) {
- // Do something useful here
- }
summary
Iterators, Iterable interfaces, and foreach loops are useful both from a readability perspective and from an API design perspective. The trade-off is that using them creates an additional object on the heap for each child of the loop. If the loop is executed many, many times, be careful to avoid generating meaningless instances. It is best to replace the iterators, Iterable interfaces, and foreach loops with basic pointer loops.
discuss
Some of the objections to the above (in particular the use of pointer operations instead of iterators) can be found in the discussion on Reddit.
4. Don’t call expensive methods
Some methods are expensive. In the case of the N.O.P.E branch, we did not mention the method for leaves, but this could be done. Suppose our JDBC driver needs to calculate the return value of the resultSet.wasNULL () method against all the difficulties. Our own implementation of the SQL framework might look like this:
Copy the code
- if (type == Integer.class) {
- result = (T) wasNull(rs,
- Integer.valueOf(rs.getInt(index)));
- }
- // And then…
- static final <T> T wasNull(ResultSet rs, T value)
- throws SQLException {
- return rs.wasNull() ? null : value;
- }
In the above logic, the resultset.wasnull () method is called each time an int value is fetched from the ResultSet, but the getInt() method is defined as:
Return type: variable value; If the SQL query result is NULL, 0 is returned.
So a simple and effective way to improve is as follows:
Copy the code
- static final <T extends Number> T wasNull(
- ResultSet rs, T value
- )
- throws SQLException {
- return (value == null ||
- (value.intValue() == 0 && rs.wasNull()))
- ? null : value;
- }
It’s a breeze.
summary
Caching method calls instead of expensive methods on leaf nodes, or avoiding expensive method calls if method conventions allow.
5. Use primitive types and stacks
The example from jOOQ described above uses a lot of generics, resulting in wrapper classes for Byte, short, int, and Long. But at least generics should not be a limitation of code until they are specialized in Java 10 or Valhalla projects. The substitution can be done as follows:
Copy the code
- // Stored on the heap
- Integer i = 817598;
- … If you write it like this:
- // Store on the stack
- int i = 817598;
Things can get even worse when using arrays:
Copy the code
- // Three objects are generated on the heap
- Integer[] i = { 1337, 424242 };
… If you write it like this:
Copy the code
- // Only one object was generated on the heap
- int[] i = { 1337, 424242 };
summary
When we are in the depths of the N.O.P.E. branch, wrapper classes should be avoided at all costs. The downside of this is that it puts a lot of pressure on the GC. The GC will have its hands full cleaning up the objects generated by the wrapper class.
So an effective optimization method is to use primitive data types, fixed-length arrays, and a series of split variables to identify the position of the object in the array.
Lgpl-compliant Trove4J is a Java collection class library that gives us better performance than the integer array int[].
exception
There is an exception to this rule: Boolean and byte are not sufficient for the JDK to provide caching methods for them. We can write this:
Copy the code
- Boolean a1 = true; // … syntax sugar for:
- Boolean a2 = Boolean.valueOf(true);
- Byte b1 = (byte) 123; // … syntax sugar for:
- Byte b2 = Byte.valueOf((byte) 123);
The same is true for other basic integer types, such as char, short, int, and long.
Do not automatically box these integral primitives or call theType.valueof () when you call the constructor.
Also do not call constructors on wrapped classes unless you want an instance that is not created on the heap.
6. Avoid recursion (Real technical text)
Functional programming languages like Scala now encourage recursion. Recursion usually means tail-recursing that can be decomposed into individual optimizations. It’s great if your programming language supports it. Even so, be aware that a slight tweak to the algorithm will turn tail recursion into normal recursion.
Hopefully, the compiler will automatically detect this, otherwise we would have wasted a lot of stack frames for something we could have done with just a few local variables.
summary
There’s not much to say in this section, except that in the N.O.P.E branch, try to use iteration instead of recursion.
7. Use entrySet()
When we want to iterate over a Map stored in key-value pairs, we have to find a good reason for this code:
Copy the code
- for (K key : map.keySet()) {
- V value : map.get(key);
- }
Not to mention the following:
Copy the code
- for (Entry<K, V> entry : map.entrySet()) {
- K key = entry.getKey();
- V value = entry.getValue();
- }
Map should be used sparingly when we use the N.O.P.E. branch. Because many seemingly O(1) access operations are actually composed of a series of operations. And the access itself is not free. At the very least, use entrySet() to iterate if you have to use a map! In this case, all we need to access is an instance of Map.Entry.
summary
Always use the entrySet() method when iterating over a key-value pair Map.
8. Use EnumSet or EnumMap
In some cases, such as when using a configuration map, we may be aware of the key values stored in the map in advance. If the key value is very small, we should consider using EnumSet or EnumMap instead of the usual HashSet or HashMap. The following code gives a clear explanation:
Copy the code
- private transient Object[] vals;
- public V put(K key, V value) {
- // …
- int index = key.ordinal();
- vals[index] = maskNull(value);
- // …
- }
The key implementation of the previous code is that we use arrays instead of hash tables. Especially when inserting new values into a map, all you need to do is get a constant serial number generated by the compiler for each enumerated type. If you have a global map configuration (such as one instance), EnumMap will perform better than HashMap under pressure to increase access speed. The reason is that EnumMap uses one less (bit) of heap memory than HashMap, and HashMap calls the hashCode() and equals() methods on each key value.
summary
Enum and EnumMap are close partners. When we use an enumer-like structure for key values, we should consider declaring those key values as enumeration types and using them as EnumMap keys.
9. Optimize custom hasCode() and equals() methods
At the very least, optimize the hashCode() and equals() methods when EnumMap is not available. A good hashCode() method is necessary because it prevents unnecessary calls to the expensive equals() method.
In the inheritance structure of each class, you need simple objects that are easy to accept. Let’s see how jOOQ org.jooq.Table is implemented.
The simplest and fastest way to implement hashCode() is as follows:
Copy the code
- Basic implementation of a general Table, AbstractTable:
- @Override
- public int hashCode() {
- // [#1938] This is a more efficient implementation of hashCode() than standard QueryParts
- return name.hashCode();
- }
Name is the table name. We don’t even need to worry about schema or other table attributes, because table names are usually unique in the database. And the variable name is a string, which itself has already cached a hashCode() value.
Comments are important in this code because AbstractTable, which derives from AbstractQueryPart, is the basic implementation of any abstract syntax tree element. Ordinary abstract syntax tree elements don’t have any attributes, so you can’t be under any illusions about optimizing the implementation of the hashCode() method. The overridden hashCode() method looks like this:
Copy the code
- AbstractQueryPart AbstractQueryPart AbstractQueryPart
- @Override
- public int hashCode() {
- // This is a working default implementation.
- // Implementation subclasses should override this method to improve performance.
- return create().renderInlined(this).hashCode();
- }
In other words, the entire SQL rendering workflow is triggered to calculate the hash code for a common abstract syntax tree element.
Equals () is even more interesting:
Basic implementation of this table: AbstractTable
Copy the code
- @Override
- public boolean equals(Object that) {
- if (this == that) {
- return true;
- }
- // [#2144] abstractQueryPart.equals ()
- // We can tell early if objects are unequal.
- if (that instanceof AbstractTable) {
- if (StringUtils.equals(name,
- (((AbstractTable
) that).name))) { - return super.equals(that);
- }
- return false;
- }
- return false;
- }
First, don’t prematurely use the equals() method (not just in N.O.P.E. If:
-
this == argument
-
This “is not compatible with: parameters
Note: If we use instanceof too early to verify compatibility types, the following condition actually includes argument == null.
At the end of our comparison of these cases, we should be able to draw some conclusions. For example, jOOQ’s table.equals () method is used to compare whether two tables are the same. Regardless of the specific implementation type, they must have the same field name. For example, the following two elements cannot be the same:
-
com.example.generated.Tables.MY_TABLE
-
DSL. TableByName (” MY_OTHER_TABLE “)
If we can easily determine whether the passed argument is equal to the instance itself (this), we can abandon the operation if the result is false. If true is returned, we can further evaluate the super implementation. In cases where most of the objects we compare are unequal, we can save CPU execution time by ending the method early.
Some objects are more similar than others.
In jOOQ, most of the table instances are generated by jOOQ’s code generator, and their equals() methods have been deeply optimized. There are dozens of other table types (Derived tables, table-valued functions, Array tables, joined tables, pivot tables) Tables, common Table Expressions, and so on, keep the basic implementation of equals().
10. Consider using sets instead of individual elements (Technical paper)
Finally, there is a case that applies to all languages, not just Java. In addition, the branch of N.O.P.E. that we studied before will also be useful in understanding the order from O(N3) to O(n log n).
Unfortunately, many programmers think in terms of simple, native algorithms. They are used to solving problems step by step. This is the yes/or functional programming style of imperative. This style of programming makes it easy to model “bigger picture” in the transition from pure imperative programming to surface object programming to functional programming, but it lacks something that only exists in SQL and R:
Declarative programming.
In SQL, we can declare the desired effect of the database regardless of the algorithm. The database can choose the best algorithm based on the data type, such as constraints, keys, indexes, and so on.
In theory, we initially had the basic idea after SQL and Relational Calculus. In practice, SQL vendors have implemented cost-based optimizers (CBOs) over the past several decades. Then with the 2010 release, we finally realized the full potential of SQL.
But we don’t need to implement SQL in set mode yet. All languages and libraries support Sets, collections, bags, lists. The main benefit of using a set is that it makes our code cleaner. For example:
Copy the code
- SomeSet INTERSECT SomeOtherSet
Rather than
Copy the code
- // The way Java 8 was written before
- Set result = new HashSet();
- for (Object candidate : someSet)
- if (someOtherSet.contains(candidate))
- result.add(candidate);
- // Even using Java 8 doesn’t help much
- someSet.stream()
- .filter(someOtherSet::contains)
- .collect(Collectors.toSet());
Some people may disagree about functional programming and how Java 8 can help us write simpler, more concise algorithms. But that is not necessarily true. We could convert an imperative Java 7 loop to a Java 8 Stream Collection, but we used the same algorithm. Sql-style expressions are different:
Copy the code
- SomeSet INTERSECT SomeOtherSet
The code above can have 1000 different implementations on different engines. What we’re looking at today is making it smarter to automatically convert two sets to Enumsets before calling the INTERSECT operation. We can even do the parallel INTERSECT operation without calling the underlying stream.parallel () method.
conclusion
In this article, we discuss optimization of the N.O.P.E. branch. Such as delving into highly complex algorithms. As jOOQ developers, we are happy to optimize our SQL generation.
-
Each query is generated using a unique StringBuilder.
-
The template engine actually processes characters, not regular expressions.
-
Choose to use arrays whenever possible, especially when iterating over listeners.
-
Stay away from the JDBC approach.
-
And so on.
JOOQ is “at the bottom of the food chain” because it is the last API called by our computer program when it leaves the JVM and enters the DBMS. Being at the bottom of the food chain means it takes N x O x P time for any line to be executed in jOOQ, so I want to optimize it early.
Our business logic may not be as complex as N.O.P.E. But the infrastructure can be quite complex (local SQL frameworks, local libraries, and so on). So you need to do a review with Java Mission Control or some other tool to see if there are any optimizations that need to be made, following the principles we discussed today.
The original post was published on October 31, 2017
Author: Anonymous
This article is from the cloud community partner “51CTO”, you can pay attention to the relevant information.