Performance tuning
The Java SE5 java.util.concurrent library contains a large number of classes aimed at improving performance. When reading through it, it can be hard to tell which classes are intended for ordinary applications and which exist purely to improve performance.
Comparing mutex technologies
Java includes the old synchronized keyword and the new Lock and Atomic classes in Java SE5, so it makes sense to compare these different approaches and understand more about their respective value and scope of use.
We did a simple test for each:
public abstract class Incrementable {
  protected long counter = 0;
  public abstract void increment();
}
Synchronized method:
public class SynchronizingTest extends Incrementable {
  @Override
  public synchronized void increment() {
    ++counter;
  }
}
The Lock version:
public class LockingTest extends Incrementable {
  private Lock lock = new ReentrantLock();
  @Override
  public void increment() {
    lock.lock();
    try {
      ++counter;
    } finally {
      lock.unlock();
    }
  }
}
Test two classes:
public class SimpleMicroBenchmark {
  static long test(Incrementable incrementable) {
    long start = System.nanoTime();
    for (long i = 0; i < 10000000L; i++)
      incrementable.increment();
    return System.nanoTime() - start;
  }
  public static void main(String[] args) {
    long synchTime = test(new SynchronizingTest());
    long lockTime = test(new LockingTest());
    System.out.printf("Synchronizing: %1$10d\n", synchTime);
    System.out.printf("lockTime: %1$10d\n", lockTime);
  }
}
Test results:
Synchronizing: 231087528
lockTime: 199864257
Lock is noticeably more efficient than synchronized, and the overhead of synchronized varies widely from run to run, whereas Lock is relatively consistent (run the code above several times to see this).
Does this mean you should never use the synchronized keyword? There are two factors to consider. First, in real development the body of a mutexed method can be quite large, so the time spent inside the body can be significantly greater than the cost of entering and exiting the mutex, drowning out any benefit from a faster mutex mechanism. When tuning performance, the only reliable approach is to try the different alternatives and measure their impact. Second, the synchronized keyword clearly produces less code than Lock and is much more readable. Code is read far more often than it is written, and communicating with other people matters more than communicating with the computer, so the readability of your code is critical.
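The Atomic classes mentioned at the start of this section can be benchmarked the same way; here is a minimal sketch (the class name and harness are illustrative, not part of the original test) in which an AtomicLong replaces both the lock and the long field:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical third variant for the benchmark above: the AtomicLong
// performs the increment as a single lock-free atomic operation.
public class AtomicTest {
    private AtomicLong counter = new AtomicLong(0);

    public void increment() {
        counter.incrementAndGet(); // atomic, no mutex acquired
    }

    public long value() {
        return counter.get();
    }

    public static void main(String[] args) {
        AtomicTest t = new AtomicTest();
        long start = System.nanoTime();
        for (int i = 0; i < 10_000_000; i++)
            t.increment();
        System.out.printf("Atomic: %1$10d%n", System.nanoTime() - start);
    }
}
```

On a typical JVM this variant is faster than both synchronized and Lock for a counter this simple, but as with the other two, measure it in your own environment.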
Lock-free containers
Containers are a fundamental tool for all programming, including concurrent programming. The early container classes Vector and Hashtable had many synchronized methods, which imposed unacceptable overhead when the containers were not being used in multithreaded programs. In Java 1.2 the new container library was unsynchronized, and the Collections class provided static synchronized decorators for the various container types. Although this was an improvement, because it let you choose whether to use synchronization in your containers, the overhead was still based on synchronized locking. Java SE5 adds new containers that improve thread-safe performance by using cleverer techniques to eliminate locking.
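The decorator approach can be sketched as follows; note that the javadoc for Collections.synchronizedList requires the client to synchronize manually on the wrapper around compound operations such as iteration (the list contents here are purely illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedDecorator {
    // Sum the list under the wrapper's own lock. Iteration is a
    // compound operation, so the wrapper's per-method locking is
    // not enough: the client must hold the lock for the whole loop.
    static int safeSum(List<Integer> list) {
        int sum = 0;
        synchronized (list) {
            for (int n : list)
                sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Each individual method call on the wrapper is synchronized:
        List<Integer> list =
            Collections.synchronizedList(new ArrayList<Integer>());
        for (int i = 0; i < 5; i++)
            list.add(i);
        System.out.println(safeSum(list)); // 0+1+2+3+4 = 10
    }
}
```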
The general strategy behind these lock-free containers is this: modifications to the container can happen at the same time as reads, as long as readers only ever see the results of completed modifications. A modification is performed on a separate copy of a portion of the data structure, and this copy is invisible while the modification is in progress. Only when the modification is complete is the modified structure atomically swapped with the main data structure, after which readers see the change.
In CopyOnWriteArrayList, a write causes a copy of the entire underlying array to be created. The old array is left in place so that reads can proceed safely while the copy is being modified. When the modification is complete, an atomic operation swaps in the new array so that new reads see the change. One of the advantages of CopyOnWriteArrayList is that multiple iterators can traverse and modify the list at the same time without throwing ConcurrentModificationException. CopyOnWriteArraySet uses a CopyOnWriteArrayList to achieve its lock-free behavior. ConcurrentHashMap and ConcurrentLinkedQueue use similar techniques to allow concurrent reads and writes, but they copy and modify only parts of the container rather than the whole thing. Readers still cannot see any modification before it is complete.
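The iterator advantage just described can be demonstrated directly; a minimal sketch (kept single-threaded for determinism) showing that a CopyOnWriteArrayList iterator walks a private snapshot while the list changes underneath it:

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class IteratorSnapshot {
    public static void main(String[] args) {
        List<Integer> cow = new CopyOnWriteArrayList<Integer>();
        for (int i = 0; i < 3; i++)
            cow.add(i);
        Iterator<Integer> it = cow.iterator(); // snapshot of [0, 1, 2]
        cow.add(99); // write copies the array; the iterator is unaffected
        int count = 0;
        while (it.hasNext()) {             // an ArrayList iterator would
            it.next();                     // throw ConcurrentModificationException here
            count++;
        }
        System.out.println(count);      // 3: the snapshot is unchanged
        System.out.println(cow.size()); // 4: new readers see the new element
    }
}
```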
Comparing container performance
As long as you’re primarily reading from a lock-free container, it’s much faster than its synchronized counterpart because the overhead of acquiring and releasing locks is eliminated. The same is true if you want to perform a small number of writes to a lock-free container. Below we will use a test example to show a general idea of how these containers differ in performance in various situations.
Start with a generic framework for performing tests on any type of container, including maps, where the generic parameter C represents the type of the container:
public abstract class Tester<C> {
  static int testReps = 10;
  static int testCycles = 1000;
  static int containerSize = 1000;
  abstract C containerInitializer();
  abstract void startReadersAndWriters();
  C testContainer;
  String testId;
  int nReaders;
  int nWriters;
  volatile long readResult = 0;
  volatile long readTime = 0;
  volatile long writeTime = 0;
  CountDownLatch endLatch;
  static ExecutorService exec =
    Executors.newCachedThreadPool();
  Integer[] writeData;
  Tester(String testId, int nReaders, int nWriters) {
    this.testId = testId + " " +
      nReaders + "r " + nWriters + "w";
    this.nReaders = nReaders;
    this.nWriters = nWriters;
    writeData = Generated.array(Integer.class,
      new RandomGenerator.Integer(), containerSize);
    for(int i = 0; i < testReps; i++) {
      runTest();
      readTime = 0;
      writeTime = 0;
    }
  }
  void runTest() {
    endLatch = new CountDownLatch(nReaders + nWriters);
    testContainer = containerInitializer();
    startReadersAndWriters();
    try {
      endLatch.await();
    } catch(InterruptedException ex) {
      System.out.println("endLatch interrupted");
    }
    System.out.printf("%-27s %14d %14d\n",
      testId, readTime, writeTime);
    if(readTime != 0 && writeTime != 0)
      System.out.printf("%-27s %14d\n",
        "readTime + writeTime =", readTime + writeTime);
  }
  abstract class TestTask implements Runnable {
    abstract void test();
    abstract void putResults();
    long duration;
    public void run() {
      long startTime = System.nanoTime();
      test();
      duration = System.nanoTime() - startTime;
      synchronized(Tester.this) {
        putResults();
      }
      endLatch.countDown();
    }
  }
  public static void initMain(String[] args) {
    if(args.length > 0)
      testReps = new Integer(args[0]);
    if(args.length > 1)
      testCycles = new Integer(args[1]);
    if(args.length > 2)
      containerSize = new Integer(args[2]);
    System.out.printf("%-27s %14s %14s\n",
      "Type", "Read time", "Write time");
  }
}
The abstract method containerInitializer() returns the initialized container to be tested, which is stored in the testContainer field. The other abstract method, startReadersAndWriters(), starts the reader and writer tasks that read and modify the container under test. Different tests run with varying numbers of readers and writers so that the effects of lock contention and of writes can be observed. The constructor is given various information about the test, then calls runTest() testReps times. runTest() creates a CountDownLatch, initializes the container, calls startReadersAndWriters(), and waits for them all to complete.
Each Reader and Writer class is based on TestTask, which measures the execution time of its abstract method test() and then calls putResults() in a synchronized block to store the measurement.
To use the framework, we must have the specific type of container we want to test inherit Tester and provide the appropriate Reader and Writer classes:
abstract class ListTest extends Tester<List<Integer>> {
  ListTest(String testId, int nReaders, int nWriters) {
    super(testId, nReaders, nWriters);
  }
  class Reader extends TestTask {
    long result = 0;
    void test() {
      for(long i = 0; i < testCycles; i++)
        for(int index = 0; index < containerSize; index++)
          result += testContainer.get(index);
    }
    void putResults() {
      readResult += result;
      readTime += duration;
    }
  }
  class Writer extends TestTask {
    void test() {
      for(long i = 0; i < testCycles; i++)
        for(int index = 0; index < containerSize; index++)
          testContainer.set(index, writeData[index]);
    }
    void putResults() { writeTime += duration; }
  }
  void startReadersAndWriters() {
    for(int i = 0; i < nReaders; i++)
      exec.execute(new Reader());
    for(int i = 0; i < nWriters; i++)
      exec.execute(new Writer());
  }
}

class SynchronizedArrayListTest extends ListTest {
  List<Integer> containerInitializer() {
    return Collections.synchronizedList(
      new ArrayList<Integer>(
        new CountingIntegerList(containerSize)));
  }
  SynchronizedArrayListTest(int nReaders, int nWriters) {
    super("Synched ArrayList", nReaders, nWriters);
  }
}

class CopyOnWriteArrayListTest extends ListTest {
  List<Integer> containerInitializer() {
    return new CopyOnWriteArrayList<Integer>(
      new CountingIntegerList(containerSize));
  }
  CopyOnWriteArrayListTest(int nReaders, int nWriters) {
    super("CopyOnWriteArrayList", nReaders, nWriters);
  }
}

public class ListComparisons {
  public static void main(String[] args) {
    Tester.initMain(args);
    new SynchronizedArrayListTest(10, 0);
    new SynchronizedArrayListTest(9, 1);
    new SynchronizedArrayListTest(5, 5);
    new CopyOnWriteArrayListTest(10, 0);
    new CopyOnWriteArrayListTest(9, 1);
    new CopyOnWriteArrayListTest(5, 5);
    Tester.exec.shutdown();
  }
}
Execution Result:
Type Read time Write time
Synched ArrayList 10r 0w 232158294700 0
Synched ArrayList 9r 1w 198947618203 24918613399
readTime + writeTime = 223866231602
Synched ArrayList 5r 5w 117367305062 132176613508
readTime + writeTime = 249543918570
CopyOnWriteArrayList 10r 0w 758386889 0
CopyOnWriteArrayList 9r 1w 741305671 136145237
readTime + writeTime = 877450908
CopyOnWriteArrayList 5r 5w 212763075 67967464300
readTime + writeTime = 68180227375
From the output, the synchronized ArrayList performs roughly the same regardless of the number of readers and writers: readers contend for the lock with other readers just as writers do. CopyOnWriteArrayList, however, is dramatically faster when there are no writers, and its reads remain fast even with multiple writers. It seems that you should prefer CopyOnWriteArrayList wherever possible; writing to it appears to cost no more than briefly synchronizing the entire list. Of course, results differ across environments, and you'll have to try both approaches in your specific application to see which is better.
Comparing Map implementations
Next we compare a synchronized HashMap and ConcurrentHashMap in terms of performance:
abstract class MapTest
extends Tester<Map<Integer, Integer>> {
  MapTest(String testId, int nReaders, int nWriters) {
    super(testId, nReaders, nWriters);
  }
  class Reader extends TestTask {
    long result = 0;
    void test() {
      for(long i = 0; i < testCycles; i++)
        for(int index = 0; index < containerSize; index++)
          result += testContainer.get(index);
    }
    void putResults() {
      readResult += result;
      readTime += duration;
    }
  }
  class Writer extends TestTask {
    void test() {
      for(long i = 0; i < testCycles; i++)
        for(int index = 0; index < containerSize; index++)
          testContainer.put(index, writeData[index]);
    }
    void putResults() { writeTime += duration; }
  }
  void startReadersAndWriters() {
    for(int i = 0; i < nReaders; i++)
      exec.execute(new Reader());
    for(int i = 0; i < nWriters; i++)
      exec.execute(new Writer());
  }
}

class SynchronizedHashMapTest extends MapTest {
  Map<Integer,Integer> containerInitializer() {
    return Collections.synchronizedMap(
      new HashMap<Integer,Integer>(
        MapData.map(
          new CountingGenerator.Integer(),
          new CountingGenerator.Integer(),
          containerSize)));
  }
  SynchronizedHashMapTest(int nReaders, int nWriters) {
    super("Synched HashMap", nReaders, nWriters);
  }
}

class ConcurrentHashMapTest extends MapTest {
  Map<Integer,Integer> containerInitializer() {
    return new ConcurrentHashMap<Integer,Integer>(
      MapData.map(
        new CountingGenerator.Integer(),
        new CountingGenerator.Integer(), containerSize));
  }
  ConcurrentHashMapTest(int nReaders, int nWriters) {
    super("ConcurrentHashMap", nReaders, nWriters);
  }
}

public class MapComparisons {
  public static void main(String[] args) {
    Tester.initMain(args);
    new SynchronizedHashMapTest(10, 0);
    new SynchronizedHashMapTest(9, 1);
    new SynchronizedHashMapTest(5, 5);
    new ConcurrentHashMapTest(10, 0);
    new ConcurrentHashMapTest(9, 1);
    new ConcurrentHashMapTest(5, 5);
    Tester.exec.shutdown();
  }
}
Test results:
Type Read time Write time
Synched HashMap 10r 0w 306052025049 0
Synched HashMap 9r 1w 428319156207 47697347568
readTime + writeTime = 476016503775
Synched HashMap 5r 5w 243956877760 244012003202
readTime + writeTime = 487968880962
ConcurrentHashMap 10r 0w 23352654318 0
ConcurrentHashMap 9r 1w 18833089400 1541853224
readTime + writeTime = 20374942624
ConcurrentHashMap 5r 5w 12037625732 11850489099
readTime + writeTime = 23888114831
The impact of adding writers is even less obvious for ConcurrentHashMap than it was for CopyOnWriteArrayList, because ConcurrentHashMap uses a different technique that clearly minimizes the impact of writes.
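Beyond raw read and write speed, ConcurrentHashMap (through the ConcurrentMap interface) also offers atomic compound operations that a synchronized HashMap wrapper cannot express in a single call; a small illustrative sketch (the key names are arbitrary):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicMapOps {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map =
            new ConcurrentHashMap<String, Integer>();
        // putIfAbsent() is a single atomic check-then-act; with a
        // synchronized wrapper, the equivalent containsKey() + put()
        // pair would need an explicit lock around both calls:
        map.putIfAbsent("hits", 1);
        map.putIfAbsent("hits", 42); // no effect: the key is already present
        System.out.println(map.get("hits")); // 1
        // Atomic conditional replace: succeeds only if the current
        // value matches the expected old value:
        map.replace("hits", 1, 2);
        System.out.println(map.get("hits")); // 2
    }
}
```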
Optimistic locking
Although the Atomic objects perform atomic operations such as decrementAndGet(), certain Atomic classes also let you perform so-called optimistic locking. This means you don't actually use a mutex while performing a calculation; instead, once the value has been computed and you're ready to update, you call a method named compareAndSet(). You hand it the expected old value and the new value, and if the current value no longer matches the expected one, the operation fails, which means some other task has modified the object in the meantime. We're being optimistic because we keep the data unlocked and hope that no other task comes along and modifies it. By using Atomic instead of synchronized or Lock, you may gain a performance benefit.
What happens if compareAndSet() fails? This is the tricky part, and a limitation of the technique: it can only be applied to problems where you can actually do something sensible on failure. When the operation fails, you must decide how to respond; if you can't perform some kind of recovery, you can't use this technique.
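When retrying is an acceptable recovery strategy, the standard idiom is a compareAndSet() loop; a minimal sketch (the class name and the particular computation are arbitrary illustrations):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasRetry {
    private final AtomicInteger value = new AtomicInteger(0);

    // Optimistically compute a new value; if another task slipped in
    // between get() and compareAndSet(), read the fresh value and retry:
    public int addAndDouble(int delta) {
        while (true) {
            int old = value.get();
            int updated = (old + delta) * 2;
            if (value.compareAndSet(old, updated))
                return updated; // success: nothing changed underneath us
            // failure: loop and recompute from the current value
        }
    }

    public static void main(String[] args) {
        CasRetry c = new CasRetry();
        System.out.println(c.addAndDouble(5)); // (0 + 5) * 2 = 10
        System.out.println(c.addAndDouble(1)); // (10 + 1) * 2 = 22
    }
}
```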
Here's an example; imagine that you would write code like this only after running your program, finding it too slow, and starting to apply performance-tuning techniques:
public class FastSimulation {
  static final int N_ELEMENTS = 100000;
  static final int N_GENES = 30;
  static final int N_EVOLVERS = 50;
  static final AtomicInteger[][] GRID =
    new AtomicInteger[N_ELEMENTS][N_GENES];
  static Random rand = new Random(47);
  static class Evolver implements Runnable {
    public void run() {
      while(!Thread.interrupted()) {
        // Randomly select an element to work on:
        int element = rand.nextInt(N_ELEMENTS);
        for(int i = 0; i < N_GENES; i++) {
          int previous = element - 1;
          if(previous < 0) previous = N_ELEMENTS - 1;
          int next = element + 1;
          if(next >= N_ELEMENTS) next = 0;
          int oldvalue = GRID[element][i].get();
          // Perform some kind of modeling calculation:
          int newvalue = oldvalue +
            GRID[previous][i].get() + GRID[next][i].get();
          newvalue /= 3; // Average the three values
          if(!GRID[element][i]
              .compareAndSet(oldvalue, newvalue)) {
            // Policy here to deal with failure. Here, we
            // just report it and ignore it; our model
            // will eventually deal with it.
            System.out.println("Old value changed from " + oldvalue);
          }
        }
      }
    }
  }
  public static void main(String[] args) throws Exception {
    ExecutorService exec = Executors.newCachedThreadPool();
    for(int i = 0; i < N_ELEMENTS; i++)
      for(int j = 0; j < N_GENES; j++)
        GRID[i][j] = new AtomicInteger(rand.nextInt(1000));
    for(int i = 0; i < N_EVOLVERS; i++)
      exec.execute(new Evolver());
    TimeUnit.SECONDS.sleep(5);
    exec.shutdownNow();
  }
}
All the elements are placed in arrays, which is assumed to aid performance. Each Evolver object averages its value with those of the elements before and after it, and when an update fails it simply prints the old value and continues. Notice that no mutexes appear anywhere in the program.
ReadWriteLock
ReadWriteLock is optimized for infrequent writes to a data structure, but multiple tasks frequently read the data structure. ReadWriteLock allows you to have multiple readers at once, as long as none of them attempt to write. If the write lock is already held by another task, no reader can access it until the write lock is released.
Does a ReadWriteLock actually improve your application's performance? There's no guarantee. It depends on how frequently the data is read versus modified, the duration of the read and write operations, how many threads are contending, whether you're running on multiple processors, and so on. Only experimentation can show whether it optimizes your program.
Here’s a basic example of ReadWriteLock:
public class ReaderWriterList<T> {
  private ArrayList<T> lockedList;
  // Make the ordering fair:
  private ReentrantReadWriteLock lock =
    new ReentrantReadWriteLock(true);
  public ReaderWriterList(int size, T initialValue) {
    lockedList = new ArrayList<T>(
      Collections.nCopies(size, initialValue));
  }
  public T set(int index, T element) {
    Lock wlock = lock.writeLock();
    wlock.lock();
    try {
      return lockedList.set(index, element);
    } finally {
      wlock.unlock();
    }
  }
  public T get(int index) {
    Lock rlock = lock.readLock();
    rlock.lock();
    try {
      // Show that multiple readers
      // may acquire the read lock:
      if(lock.getReadLockCount() > 1)
        System.out.println(lock.getReadLockCount());
      return lockedList.get(index);
    } finally {
      rlock.unlock();
    }
  }
  public static void main(String[] args) throws Exception {
    new ReaderWriterListTest(30, 1);
  }
}

class ReaderWriterListTest {
  ExecutorService exec = Executors.newCachedThreadPool();
  private final static int SIZE = 100;
  private static Random rand = new Random(47);
  private ReaderWriterList<Integer> list =
    new ReaderWriterList<Integer>(SIZE, 0);
  private class Writer implements Runnable {
    public void run() {
      try {
        for(int i = 0; i < 20; i++) { // 2 second test
          list.set(i, rand.nextInt());
          TimeUnit.MILLISECONDS.sleep(100);
        }
      } catch(InterruptedException e) {
        // Acceptable way to exit
      }
      System.out.println("Writer finished, shutting down");
      exec.shutdownNow();
    }
  }
  private class Reader implements Runnable {
    public void run() {
      try {
        while(!Thread.interrupted()) {
          for(int i = 0; i < SIZE; i++) {
            list.get(i);
            TimeUnit.MILLISECONDS.sleep(1);
          }
        }
      } catch(InterruptedException e) {
        // Acceptable way to exit
      }
    }
  }
  public ReaderWriterListTest(int readers, int writers) {
    for(int i = 0; i < readers; i++)
      exec.execute(new Reader());
    for(int i = 0; i < writers; i++)
      exec.execute(new Writer());
  }
}
A ReaderWriterList can hold a fixed number of objects of any type. You give the constructor the desired list size and the initial object used to fill the list. The set() method acquires the write lock before calling the underlying ArrayList.set(), and the get() method acquires the read lock before calling the underlying ArrayList.get(). In addition, get() checks whether more than one reader currently holds the read lock and, if so, prints the count to demonstrate that multiple readers can acquire the lock at once.
If you look at the JDK documentation for ReentrantReadWriteLock, you'll see a number of other available methods, involving issues of fairness and policy decisions. This is a fairly sophisticated tool, and one to reach for only when you're hunting for ways to improve performance. Your first solution should be the more straightforward synchronization.
Active objects
Multithreading can become very complex and hard to get right. Every detail matters, and we must attend to every one of them without any safety net in the form of compiler checks. Is there something wrong with the threading model itself? After all, it comes from the world of procedural programming and has changed little since.
An alternative approach is called active objects, or actors. These objects are called active because each one maintains its own worker thread and message queue; all requests to such an object are enqueued, and only one of them runs at any time. With active objects, then, we serialize messages rather than methods, which means there's no longer any need to guard against a task being interrupted in the middle of a loop. When you send a message to an active object, the message is converted into a task that is inserted into the object's queue to be run at some later point. Java SE5's Future comes in handy for implementing this pattern.
Here is an example of two methods that queue method calls:
public class ActiveObjectDemo {
  private ExecutorService ex =
    Executors.newSingleThreadExecutor();
  private Random rand = new Random(47);
  private void pause(int factor) {
    try {
      TimeUnit.MILLISECONDS.sleep(100 + rand.nextInt(factor));
    } catch(InterruptedException e) {
      System.out.println("sleep interrupted");
    }
  }
  public Future<Integer> calculateInt(final int x, final int y) {
    return ex.submit(new Callable<Integer>() {
      public Integer call() throws Exception {
        System.out.println("starting " + x + " + " + y);
        pause(500);
        return x + y;
      }
    });
  }
  public Future<Float> calculateFloat(final float x, final float y) {
    return ex.submit(new Callable<Float>() {
      public Float call() {
        System.out.println("starting " + x + " + " + y);
        pause(2000);
        return x + y;
      }
    });
  }
  public void shutdown() {
    ex.shutdown();
  }
  public static void main(String[] args) {
    ActiveObjectDemo d1 = new ActiveObjectDemo();
    // Prevents ConcurrentModificationException while
    // removing completed Futures during iteration:
    List<Future<?>> results =
      new CopyOnWriteArrayList<Future<?>>();
    for(float f = 0.0f; f < 1.0f; f += 0.2f)
      results.add(d1.calculateFloat(f, f));
    for(int i = 0; i < 5; i++)
      results.add(d1.calculateInt(i, i));
    System.out.println("All asynch calls made");
    while(results.size() > 0) {
      for(Future<?> future : results) {
        if(future.isDone()) {
          try {
            System.out.println(future.get());
          } catch(Exception e) {
            throw new RuntimeException(e);
          }
          results.remove(future);
        }
      }
    }
    d1.shutdown();
  }
}
Execution Result:
All asynch calls made
starting 0.0 + 0.0
0.0
starting 0.2 + 0.2
0.4
starting 0.4 + 0.4
0.8
starting 0.6 + 0.6
1.2
starting 0.8 + 0.8
1.6
starting 0 + 0
0
starting 1 + 1
2
starting 2 + 2
4
starting 3 + 3
6
starting 4 + 4
8
The single-threaded executor produced by the call to newSingleThreadExecutor() maintains its own unbounded blocking queue, with a single thread that takes tasks off the queue and runs each one to completion. All we need to do in calculateInt() and calculateFloat() is call submit() with a new Callable object corresponding to the method call; this is how method calls are turned into messages, with the method body contained inside the call() of the anonymous inner class. Notice that each active-object method returns a Future whose generic parameter is the actual return type of the method. This way the method call returns immediately, and the caller can use the Future to discover when the task completes and to collect the actual return value.
The main() method creates a List<Future<?>> to capture the Future objects returned by the calculateInt() and calculateFloat() messages sent to the active object. isDone() is polled for each Future; when one completes, its result is processed and it is removed from the List. Notice that a CopyOnWriteArrayList is used so that Futures can be removed during iteration without throwing ConcurrentModificationException and without having to copy the List.
To prevent coupling between threads without conscious effort, any arguments passed to an active-object method call must be read-only, other active objects, or disconnected objects, that is, objects connected to no other task.
With active objects:
- Each object has its own worker thread
- Each object maintains full control over its own fields
- All communication between active objects takes place as messages between those objects
- All messages sent to active objects are queued
Because a message from one active object to another can only be blocked by the delay of enqueuing it, and because that delay is always very short and independent of any other object, sending a message is effectively unblockable. Since an active-object system communicates only via messages, two objects cannot be blocked while contending to call a method on another object, which means no deadlocks can occur. That's a big step forward. And because the worker thread inside an active object executes only one message at a time, there is no resource contention and you don't have to worry about synchronizing methods. Synchronization still happens, but it is controlled at the message level, by queuing the method calls so that only one can run at any time.
Conclusion
This series has introduced the fundamentals of Java concurrent programming. You need to understand that:
- You can run multiple independent tasks
- You must consider all the problems that can arise when these tasks are shut down.
- Tasks may interfere with each other in sharing resources. Mutexes (locks) are a basic tool used to prevent such conflicts.
- If the task is not carefully designed, it can be deadlocked.
Knowing when to use concurrency and when to avoid it is critical. The main reasons to use it are:
- To manage a number of tasks whose intermingling lets the computer be used more effectively
- To enable better code organization
- To be more convenient for the user
The main drawbacks of multithreading are:
- Slowdowns while tasks wait for shared resources
- The additional CPU overhead of managing threads
- Unrewarded complexity arising from poor design decisions
- Opportunities for pathologies such as starvation, races, deadlock, and livelock (multiple tasks, each running on its own thread, arranged so that the whole can never complete)
- Inconsistent behavior across platforms
Because threads can share a resource, such as the memory in an object, and because you must ensure that multiple threads never read and change that resource at the same time, locking is the biggest problem with threads. It demands judicious use of the available locking mechanisms; they are only tools, and they can introduce potential deadlock conditions, so understand them thoroughly.