A couple of days ago, while browsing blogs, I came across a post claiming that ReentrantLock is slower than synchronized, which ran contrary to my understanding. After reviewing the post and its test code in detail, I found that the test was not rigorous, and I kindly pointed out the problems in the comments. In response, the author simply deleted the post…
Many programmers of the older generation have a stereotype that synchronized performs poorly, so they reach for the Lock classes in the java.util.concurrent package instead. But if you ask them how much worse synchronized actually is, few can answer. Since synchronized and ReentrantLock provide similar functionality and are used in similar ways, let's benchmark the performance difference between the two.
The measured results
Test platform: JDK11, MacBook Pro (13-inch, 2017), JMH test
import java.util.concurrent.locks.ReentrantLock;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
public class LockTest {
private static Object lock = new Object();
private static ReentrantLock reentrantLock = new ReentrantLock();
private static long cnt = 0;
@Benchmark
@Measurement(iterations = 2)
@Threads(10)
@Fork(0)
@Warmup(iterations = 5, time = 10)
public void testWithoutLock(){
doSomething();
}
@Benchmark
@Measurement(iterations = 2)
@Threads(10)
@Fork(0)
@Warmup(iterations = 5, time = 10)
public void testReentrantLock(){
reentrantLock.lock();
doSomething();
reentrantLock.unlock();
}
@Benchmark
@Measurement(iterations = 2)
@Threads(10)
@Fork(0)
@Warmup(iterations = 5, time = 10)
public void testSynchronized(){
synchronized (lock) {
doSomething();
}
}
private void doSomething() {
cnt += 1;
if (cnt >= (Long.MAX_VALUE >> 1)) {
cnt = 0;
}
}
public static void main(String[] args) {
Options options = new OptionsBuilder().include(LockTest.class.getSimpleName()).build();
try {
new Runner(options).run();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Benchmark                    Mode  Cnt          Score   Error  Units
LockTest.testReentrantLock  thrpt    2   32283819.289          ops/s
LockTest.testSynchronized   thrpt    2   25325244.320          ops/s
LockTest.testWithoutLock    thrpt    2  641215542.492          ops/s
It's true that synchronized is slower, but only by about 20%. I was surprised when I first ran the test: I knew synchronized would be slower, but the order-of-magnitude gap I expected never appeared. I increased the @Threads count to raise the likelihood of contention between threads and got the following result.
Benchmark                    Mode  Cnt          Score   Error  Units
LockTest.testReentrantLock  thrpt    2   29464798.051          ops/s
LockTest.testSynchronized   thrpt    2   22346035.066          ops/s
LockTest.testWithoutLock    thrpt    2  383047064.795          ops/s
The gap shifts slightly, but the two remain on the same order of magnitude.
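As an aside, testWithoutLock is only fast because it tolerates lost updates: `cnt += 1` is a read-modify-write and is not atomic. A minimal sketch contrasting the racy counter with a synchronized one (the class name RaceDemo is my own illustration, not part of the benchmark above):

```java
import java.util.concurrent.CountDownLatch;

public class RaceDemo {
    static long racy = 0;
    static long safe = 0;
    static final Object lock = new Object();

    public static void run(int threads, int perThread) {
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    racy += 1;                         // data race: updates can be lost
                    synchronized (lock) { safe += 1; } // mutual exclusion: never loses an update
                }
                done.countDown();
            }).start();
        }
        try {
            done.await(); // wait for all workers to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        run(10, 100_000);
        // safe is always exactly 1_000_000; racy is usually less
        System.out.println("racy=" + racy + " safe=" + safe);
    }
}
```

So an unlocked baseline is useful only as a ceiling for throughput comparisons, never as correct code.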
Conclusion
So synchronized really is some 20-30% slower than ReentrantLock. Should every synchronized in your code therefore be replaced with a Lock? No. Think about it: if ReentrantLock were strictly the better substitute in almost every scenario, why would the JDK stick with this keyword, with no intention whatsoever of deprecating it?
Hegel said that "what is real is rational." synchronized arose to meet the needs of multithreading, and its existence greatly simplifies multithreaded development in Java. Its advantage is ease of use: you do not need to explicitly acquire or release the lock. ReentrantLock, by comparison, is more cumbersome: after acquiring the lock you must make sure it is released on every code path, and a moment's carelessness buries a bug. But along with that complexity, ReentrantLock provides a richer API that can handle more complex requirements, as detailed in my previous blog post, ReentrantLock source code parsing.
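To make both points concrete, here is a minimal sketch of idiomatic ReentrantLock usage (the SafeCounter class is my own illustration, not from the JDK): the unlock always sits in a finally block so an exception cannot leak the lock, and tryLock with a timeout shows a capability synchronized simply cannot express:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class SafeCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long cnt = 0;

    // Plain locking: unlock in finally so no path can leak the lock
    public void increment() {
        lock.lock();
        try {
            cnt += 1;
        } finally {
            lock.unlock();
        }
    }

    // tryLock with a timeout: give up instead of blocking forever
    public boolean incrementIfAvailable(long timeoutMs) {
        try {
            if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // could not get the lock in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            cnt += 1;
            return true;
        } finally {
            lock.unlock();
        }
    }

    public long get() {
        lock.lock();
        try {
            return cnt;
        } finally {
            lock.unlock();
        }
    }
}
```

Compare this with the benchmark code above, where unlock() is called without a finally block: fine for a microbenchmark, but a bug waiting to happen in production code.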
These days, the performance difference between synchronized and ReentrantLock should no longer be the deciding factor; what matters is ease of use, functionality, and the maintainability of your code… A 30% performance difference rarely matters, and if you really want to optimize your code's performance, the lock implementation is not where you should start.
This article should have ended here, but I still wondered why synchronized left the older generation of Java programmers with an impression of poor performance. Unfortunately, material on JDK 1.5 and earlier is hard to find, but it is easy to see what JDK 1.6 did to improve the performance of synchronized.
What does the JDK optimize for synchronized?
When you use synchronized, the JVM compiles in monitorenter and monitorexit instructions before and after the protected code, as follows:
void onlyMe(Foo f) {
    synchronized(f) {
        doSomething();
    }
}
The compiled bytecode:
Method void onlyMe(Foo)
 0 aload_1             // Push f
 1 dup                 // Duplicate it on the stack
 2 astore_2            // Store duplicate in local variable 2
 3 monitorenter        // Enter the monitor associated with f
 4 aload_0             // Holding the monitor, pass this and...
 5 invokevirtual #5    // ...call Example.doSomething()V
 8 aload_2             // Push local variable 2 (f)
 9 monitorexit         // Exit the monitor associated with f
10 goto 18             // Complete the method normally
13 astore_3            // In case of any throw, end up here
14 aload_2             // Push local variable 2 (f)
15 monitorexit         // Be sure to exit the monitor!
16 aload_3             // Push thrown value...
17 athrow              // ...and rethrow value to the invoker
18 return              // Return in the normal case
Exception table:
From  To  Target  Type
   4  10      13  any
  13  16      13  any
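One detail the bytecode hints at: a monitor keeps an entry count for its owner, so the same thread can re-enter a monitor it already holds, and each nested exit simply decrements the count. A small sketch of this reentrancy (ReentrancyDemo is a hypothetical name of mine) using the real Thread.holdsLock API:

```java
public class ReentrancyDemo {
    static final Object lock = new Object();

    static int outer() {
        synchronized (lock) {              // first entry: count goes to 1
            assert Thread.holdsLock(lock); // this thread owns the monitor here
            return inner() + 1;
        }                                  // count back to 0 on return
    }

    static int inner() {
        synchronized (lock) {              // same thread re-enters: count goes to 2
            return 1;                      // no deadlock, unlike a non-reentrant lock
        }                                  // count back to 1
    }
}
```

ReentrantLock is reentrant in exactly the same way, which is where its name comes from.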
The cost of acquiring and releasing locks lies in monitorenter and monitorexit, and that is where the optimizations land. According to The Art of Concurrent Programming in Java, Java 6 introduced tiered locking to reduce the cost of acquiring and releasing locks. A lock now moves through four states: no lock, biased lock, lightweight lock, and heavyweight lock, in order of decreasing performance. Fortunately, thanks to locality, a biased or lightweight lock satisfies most concurrent situations; a lock is upgraded only under serious contention, so in most cases the performance of synchronized is not bad at all.
From this we can roughly infer that before JDK 1.6 there was no tiering: only heavyweight locks existed, and a thread blocked whenever it failed to acquire the lock, hence the poor performance.
For reference, here is the x86 template-interpreter implementation of monitorenter and monitorexit in jdk11u:
//-----------------------------------------------------------------------------
// Synchronization
//
// Note: monitorenter & exit are symmetric routines; which is reflected
// in the assembly code structure as well
//
// Stack layout:
//
// [expressions ] <--- rsp = expression stack top
// ..
// [expressions ]
// [monitor entry] <--- monitor block top = expression stack bot
// ..
// [monitor entry]
// [frame data ] <--- monitor block bot
// ...
// [saved rbp ] <--- rbp
void TemplateTable::monitorenter() {
transition(atos, vtos);
// check for NULL object
__ null_check(rax);
const Address monitor_block_top( rbp, frame::interpreter_frame_monitor_block_top_offset * wordSize);
const Address monitor_block_bot( rbp, frame::interpreter_frame_initial_sp_offset * wordSize);
const int entry_size = frame::interpreter_frame_monitor_size() * wordSize;
Label allocated;
Register rtop = LP64_ONLY(c_rarg3) NOT_LP64(rcx);
Register rbot = LP64_ONLY(c_rarg2) NOT_LP64(rbx);
Register rmon = LP64_ONLY(c_rarg1) NOT_LP64(rdx);
// initialize entry pointer
__ xorl(rmon, rmon); // points to free slot or NULL
// find a free slot in the monitor block (result in rmon)
{
Label entry, loop, exit;
__ movptr(rtop, monitor_block_top); // points to current entry,
// starting with top-most entry
__ lea(rbot, monitor_block_bot); // points to word before bottom
// of monitor block
__ jmpb(entry);
__ bind(loop);
// check if current entry is used
__ cmpptr(Address(rtop, BasicObjectLock::obj_offset_in_bytes()), (int32_t) NULL_WORD);
// if not used then remember entry in rmon
__ cmovptr(Assembler::equal, rmon, rtop); // cmov => cmovptr
// check if current entry is for same object
__ cmpptr(rax, Address(rtop, BasicObjectLock::obj_offset_in_bytes()));
// if same object then stop searching
__ jccb(Assembler::equal, exit);
// otherwise advance to next entry
__ addptr(rtop, entry_size);
__ bind(entry);
// check if bottom reached
__ cmpptr(rtop, rbot);
// if not at bottom then check this entry
__ jcc(Assembler::notEqual, loop);
__ bind(exit);
}
__ testptr(rmon, rmon); // check if a slot has been found
__ jcc(Assembler::notZero, allocated); // if found, continue with that one
// allocate one if there's no free slot
{
Label entry, loop;
// 1. compute new pointers // rsp: old expression stack top
__ movptr(rmon, monitor_block_bot); // rmon: old expression stack bottom
__ subptr(rsp, entry_size); // move expression stack top
__ subptr(rmon, entry_size); // move expression stack bottom
__ mov(rtop, rsp); // set start value for copy loop
__ movptr(monitor_block_bot, rmon); // set new monitor block bottom
__ jmp(entry);
// 2. move expression stack contents
__ bind(loop);
__ movptr(rbot, Address(rtop, entry_size)); // load expression stack
// word from old location
__ movptr(Address(rtop, 0), rbot); // and store it at new location
__ addptr(rtop, wordSize); // advance to next word
__ bind(entry);
__ cmpptr(rtop, rmon); // check if bottom reached
__ jcc(Assembler::notEqual, loop); // if not at bottom then
// copy next word
}
// call run-time routine
// rmon: points to monitor entry
__ bind(allocated);
// Increment bcp to point to the next bytecode, so exception
// handling for async. exceptions work correctly.
// The object has already been popped from the stack, so the
// expression stack looks correct.
__ increment(rbcp);
// store object
__ movptr(Address(rmon, BasicObjectLock::obj_offset_in_bytes()), rax);
__ lock_object(rmon);
// check to make sure this monitor doesn't cause stack overflow after locking
__ save_bcp(); // in case of exception
__ generate_stack_overflow_check(0);
// The bcp has already been incremented. Just need to dispatch to
// next instruction.
__ dispatch_next(vtos);
}
void TemplateTable::monitorexit() {
transition(atos, vtos);
// check for NULL object
__ null_check(rax);
const Address monitor_block_top( rbp, frame::interpreter_frame_monitor_block_top_offset * wordSize);
const Address monitor_block_bot( rbp, frame::interpreter_frame_initial_sp_offset * wordSize);
const int entry_size = frame::interpreter_frame_monitor_size() * wordSize;
Register rtop = LP64_ONLY(c_rarg1) NOT_LP64(rdx);
Register rbot = LP64_ONLY(c_rarg2) NOT_LP64(rbx);
Label found;
// find matching slot
{
Label entry, loop;
__ movptr(rtop, monitor_block_top); // points to current entry,
// starting with top-most entry
__ lea(rbot, monitor_block_bot); // points to word before bottom
// of monitor block
__ jmpb(entry);
__ bind(loop);
// check if current entry is for same object
__ cmpptr(rax, Address(rtop, BasicObjectLock::obj_offset_in_bytes()));
// if same object then stop searching
__ jcc(Assembler::equal, found);
// otherwise advance to next entry
__ addptr(rtop, entry_size);
__ bind(entry);
// check if bottom reached
__ cmpptr(rtop, rbot);
// if not at bottom then check this entry
__ jcc(Assembler::notEqual, loop);
}
// error handling: no matching slot found, so unlocking is not block-structured
__ call_VM(noreg, CAST_FROM_FN_PTR(address, InterpreterRuntime::throw_illegal_monitor_state_exception));
__ should_not_reach_here();
// call run-time routine
__ bind(found);
__ push_ptr(rax); // make sure object is on stack (contract with oopMaps)
__ unlock_object(rtop);
__ pop_ptr(rax); // discard object
}
References
- Java Virtual Machine Specification 3.14. Synchronization
- The Art of Concurrent Programming in Java 2.2 Principles and applications of Synchronized