JDK BUG
In this article, LET’s talk about a BUG I recently learned about JDK 8.
First of all, how did I find this BUG?
Everyone knows I have a certain attention to Dubbo, recently Dubbo 2.7.7 after I watched it update point, is the following url: https://github.com/apache/dubbo/releases/tag/dubbo-2.7.7
There is the section Bugfixes:
Each one I took a quick look at, and the other Bugfixes were more or less related to the Dubbo framework. But the red boxes above are JDK bugs entirely.
So we can isolate it.
I didn’t know about this Bug until I saw this place, but as I researched it, I realized that, how shall I put it: I suspect it’s not a Bug at all, it’s Doug Lea fishing.
Why do you say so, you will know after reading this article.
Bug steadily reappears
Click on the link in Dubbo, we can see that the specific description is a link:
Open this link:
https://bugs.openjdk.java.net/browse/JDK-8062841
We can see that the Bug is the computeIfAbsent method in the well-known Concurrent package.
This Bug was fixed in JDK 9 by Doug Lea.
ConcurrentHashMap, as we know, is Doug Lea’s masterpiece, “Who pollutes, who cleans.”
To understand how this Bug works, you must first understand what this method does:
java.util.concurrent.ConcurrentHashMap#computeIfAbsent
From the mappingFunction, the second entry to this method, we can see that this method is provided after JDK 8.
If the value corresponding to the key in the Map does not exist, the mappingFunction function is called and the result (not null) of the function is returned as the value of the key.
Like the following:
Initialize a ConcurrentHashMap and return null for the first time when the key is why.
The computeIfAbsent method is then called, and when null is obtained, the getValue method is called, associating the return value of the method with the current key.
So, on the second acquisition, I got the “Why technology.”
The return value on line 17 of the code above is the “Why technique”, but I called the map.get() method again for the code demonstration.
Now that I know what this method does, I’ll show you what the bugs are.
Let’s directly use the test case given in this question, address:
https://bugs.openjdk.java.net/secure/attachment/23985/Main.java
I just added the output statement at lines 11 and 21:
{AaAa=42, BBBB=42} {AaAa=42, BBBB=42}
But if you run the code locally (requiring a JDK 8 environment), you’ll see that the method never ends. Because it’s going in an endless loop.
That’s a Bug.
The Art of Asking questions
Knowing the Bug, it makes sense to start analyzing the source code to understand why the Bug appears.
But I’d like to start with a section on the art of asking questions. Because this Bug is a living example.
This link, which I encourage you to check out, also includes Doug Lea’s answer:
https://bugs.openjdk.java.net/browse/JDK-8062841
First let’s take a look at the person who asked the question’s description of the question (don’t bother to read it, it’s confusing anyway) :
Typically, people who are questioned fall into one of two categories:
1. People who have experienced and know the problem will understand what you are talking about.
2. Although I have never met this question before, I feel that I am familiar with the field and may know the answer, but I have no idea what you are talking about after reading your question description.
It’s a long description, and when I read it for the first time, I was confused. It was hard to understand what he was saying. I belong to the second group.
And in most questions, there are far more people in the second category than in the first.
But when I learned the ins and outs of the Bug and looked at the description, it was very clear and easy to understand. I’m in the first group.
But there’s a caveat to being in the first category, and the caveat is that I already know the Bug. Unfortunately, this is a question, and the person who is asked the question, is not particularly aware of the Bug.
Even though the person being questioned is Doug Lea.
Doug Lea responded to Martin’s question in less than an hour on November 04, 2014.
First, you said you found problems with ConcurrentHashMap, but I didn’t see the test case. So I’m going to make a guess that there are other threads that are getting stuck calculating values, but I can’t see that point from your description.
To put it simply: Talk is cheap. Show me the code.
Another buddy, Pardeep, submitted a test case a month later, the one we saw earlier:
Pardeep replied to Martin with the following:
He came straight to the point: I’ve been aware of this bug for a long time, and then I have a test case.
The real turning point, so to speak, was the emergence of this test case.
And then he offers his opinion, which is a short and powerful description of what the problem is (we’ll get to that later), and then he offers his opinion.
Less than an hour later, this post was answered by Doug Lea:
He said: “The young man’s suggestion is good, but now is not the time for us to solve this problem. We might improve the deadlock checking mechanism through code to help users debug their programs. But at present, even if this mechanism is made, the work efficiency is very low, such as in the current case. But now we at least know that there is no guarantee that such a mechanism will be implemented.
In a word: I know the problem, but SO far I haven’t come up with a good solution.
Nineteen days later, however, the Don was back on the subject:
In a reversal, he said, ‘Please ignore what I said.’ We found some possible improvements that could handle more user errors, including the test case presented in this report that addresses the issue of deadlock caused by recursive map updates in functions provided in computeIfAbsent. We will address this issue in JDK 9.
So, review the process by which the Bug was raised.
It was Martin who first raised the question and described it in detail. Unfortunately, his description is very professional, and it is described from the standpoint that you already know the Bug, which makes people look very confused.
So Doug Lea saw this and said, “I don’t understand.
Then Pardeep followed up, and the turning point was the test case he threw up. And I’m sure Martin, since he was able to describe the problem clearly, must have had a test case of his own, but he didn’t show it.
So, my friends, the importance of test cases is self-evident. When you ask a question, don’t just throw an exception. You should at least write the corresponding section of code, or log it, or write it out in a document.
The cause of the Bug
The cause of this Bug can be explained in one sentence. Pardeep also said:
The problem is that when we do a computeIfAbsent, there’s also a computeIfAbsent. The two computeIfAbsent keys correspond to the same hashCode.
Coincidence, you say.
When they have the same hashCode, it means they want to put something in the same slot.
When the second element comes in, it finds that the pit has been occupied by the previous element, which might look something like this:
Let’s examine the workflow of the computeIfAbsent method:
The first step is to calculate which slot the hashCode corresponding to the key should go into.
Then there’s the for loop that goes to line 1649, and the for loop is an infinite loop that looks inside the body of the loop and breaks the loop if the conditions are met.
First, let’s see what the h values of “AaAa” and “BBBB” are after the spread calculation (right shift 16-bit efficient calculation) :
Wow, what a coincidence. You can see from the two parts of the box, it’s 2031775.
Which means they’re gonna be in the same tank.
First “AaAa” enters the computeIfAbsent method:
InitTable on the first loop, nothing to say.
The second loop first evaluates the index of the array at line 1653 and retrieves the node for that index. The node was found empty. So we go into branching judgment:
To perform the CAS operation at the location marked ①, use R (or ReservationNode) to perform a placeholder operation.
Apply to mappingfunction. apply and calculate value. If it does not calculate to null, the value is assembled into the final node.
Replace the previously reserved ReservationNode with a assembled node labeled ② for the item labeled ③.
The problem arises at 2. You can see that the mappingFunction.apply operation is performed here, which in our case triggers another computeIfAbsent operation.
Now “AaAa” waits for the return value from the computeIfAbsent operation to proceed to the next operation, operation number ③.
Then “BBBB” came along.
The hashCode of “BBBB” is the same as that of “AaAa”. So it’s going to want to do something in that trough.
Too bad it’s too late.
Take a look at the corresponding code:
When the key is “BBBB”, the calculated h value is also 2031775.
It’s also going to go into this endless loop at line 1649. And then make all kinds of judgments.
My next argument is:
In this article’s example code, when the key is “BBBB”, it enters an endless loop at line 1649 and cannot be ejected. The program runs in a loop all the time.
At the point marked ①, since TAB is no longer null, it will not enter the branch.
In case (2), the AaAa has already thrown a ReservationNode into the ReservationNode, so it is not null. So, you’re not going to go into this branch.
I’m afraid you’re confused, so I give you a picture. It’s really warm male author stone hammer:
And then we go to ③, there’s a MOVED, what does this MOVED do?
Indicates whether the current ConcurrentHashMap is being expanded.
Clearly, it’s not time to expand:
F on line 1678 is the ReservationNode that “AaAa” threw into it earlier, and its hash is -3, not equal to MOVED (-1).
So, you don’t go into this branch judgment.
Next, can enter only mark for ④ place, so we just need to break this place, will thoroughly understand the Bug.
Start:
We know from our previous analysis that in our current case, we’re only going to go into line 1672.
And in this branch, there are four more judgments. We break them down one by one:
The object labeled “⑤” that the tabAt method pulls out is the ReservationNode that “AaAa” put in earlier, which is the f. So you can go into this branch.
For ⑥, fH >=0. Fh is the hash value of the current node. If the value is greater than 0, the data is stored in a linked list. As we analyzed earlier, the current hash value is -3. So, it doesn’t go into this branch.
At the point marked ⑦, determine whether node F is red-black tree storage. Of course not. So, it doesn’t go into this branch.
At ⑧, binCount means the number of nodes in the index. Apparently, there are none right now. So the current binCount is still 0. So, it doesn’t go into this branch.
Finished. That’s it.
The Bug is that after a for loop, there is no break. The trouble is that the for loop is still a dead loop.
For another god perspective, see what happens when key is “BBBB” :
Into an infinite loop:
① After “AaAa”, TAB is not null.
(2) The current slot is not null because a ReservationNode is already in the AaAa slot for ReservationNode.
3. The current map does not perform capacity expansion.
④ includes ⑤, ⑥, ⑦, ⑧.
⑤. The tabAt method takes out the object that is the ReservationNode that “AaAa” put into it before, so it meets the conditions to enter the branch.
⑥. Check whether the current is linked list storage, if the condition is not met, skip.
⑦. Check whether the current is red-black tree storage, if the condition is not met, skip.
(8) Check whether the current subscript contains node, does not meet the condition (” AaAa “only has a placeholder node, and has not completed the initial, so it has not been placed in the subscript), and enter the next loop.
And then it can’t get out of the loop!
I believe you now understand the origin of this Bug.
If you are running the test case in IDEA, you can also look at it visually:
Click on this camera icon:
From the thread snapshot inside can also see clues, we can go analysis.
Some people say that the thread safety caused by the loop, after analysis I think this view is wrong.
It has an infinite loop, not because of thread safety, but because it has entered an infinite loop of its own.
Or is it an Easter egg?
Or… Be confident. Say it’s a Bug, a stable recurrence.
So how do we avoid this “egg” if we use JDK 8?
See how it works in Dubbo:
The get method is called first, and if null is returned, the putIfAbsent method is called, which gives the same effect as before.
If you also use computeIfAbsent in your project, it is recommended to do the same.
The ConcurrentHashMap get method returns null. The ConcurrentHashMap get method returns null.
The answers are written in this article, if you are interested, you can read “THIS interview question I really don’t know what the interviewer wants to answer”.
Now that we have a thorough understanding of the Bug, let’s take a look at the JDK 9 solution and see the official source code comparison:
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/ConcurrentHashMap.java?r1=1.258&r2=1.2 59&sortby=date&diff_format=fCopy the code
I just add two lines of code to check if it’s a red-black tree node and then check if it’s a ReservationNode, which is a placeholder node. If so, an exception is thrown.
It’s that simple. There is no mystery.
So, if you execute a text test case in JDK 9, IllegalStateException will be thrown.
This is the solution Doug Lea mentioned earlier:
Knowing the ins and outs of the Bug, and especially seeing the solution, we can simply say:
That’s it? Never heard of it!
Also, when I looked at the JDK 9 fix, I fixed more than one issue:
http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/6dd59c01f011/src/java.base/share/classes/java/util/concurrent/ConcurrentHa shMap.javaCopy the code
You go and check it out. Discover, ah, it is knowledge point, learn not to move.
Fishing law enforcement
Why did I say at the beginning of the article that this is Doug Lea fishing?
Because at the beginning of the art of asking questions, I’m sure Doug Lea was feeling better after running the test case.
I kind of know what the problem is, and I have reason to believe from his answers and the documentation that he wrote, that he knew when he wrote the method that it might go wrong.
Also, Pardeep’s response mentions documentation, so let’s take a look at what the official documentation says about this method:
https://docs.oracle.com/javase/8/docs/api/
The documentation says that function methods should be short and simple. The mapping cannot be updated at the same time as the updated mapping. That means no nesting dolls.
Nesting dolls, programmatically termed recursive, are documented to throw an IllegalStateException if there is recursion.
And when you think about recursion, what do you think?
The first thing that comes to mind is the Fibonacci function. We use computeIfAbsent to implement a Fibo function as follows:
public class Test {
static Map<Integer, Integer> cache = new ConcurrentHashMap<>();
public static void main(String[] args) { System.out.println("f(" + 14 + ") =" + fibonacci(14)); } static int fibonacci(int i) { if (i == 0) return i; if (i == 1) return 1; return cache.computeIfAbsent(i, (key) -> { System.out.println("Slow calculation of " + key); return fibonacci(i - 2) + fibonacci(i - 1); }); } Copy the code
Copy the code
}
This is a recursive call, and I didn’t throw an IllegalStateException when I ran with JDK 1.8. I just faked death for the same reason we analyzed earlier. I understand that this place is inconsistent with the document.
So, I suspect Doug Lea is fishing in this area.
Is CHM thread safe?
Now that we’re talking about currentHashMap (CHM), let me make a related note.
Is CHM guaranteed to be thread safe in the first place?
Yes, CHM itself must be thread-safe. However, there is still the possibility of thread insecurity if you don’t use it properly.
Let’s take a look at the source code in Spring:
org.springframework.core.SimpleAliasRegistry
In this class, aliasMap is of type ConcurrentHashMap:
In both the registerAlias and getAliases methods, there is code to operate on aliasMap, but synchronized locks the aliasMap before the operation.
Why is that? Why do we lock ConcurrentHashMap?
This depends on the scenario, the alias manager, where the lock is supposed to prevent multiple threads from operating on ConcurrentHashMap.
Although ConcurrentHashMap is thread-safe, it is assumed that if one thread puts and one thread gets, it is not allowed in this code scenario.
Let me give you an example of Redis if it’s a little confusing.
Redis’ get and set methods are thread-safe. But if you get first and then set, you’re still going to have problems with multiple threads.
Because these two operations are not atomic. So incR was born.
I use this example to say that thread safety is not absolute, it depends on the situation. Given a thread-safe container, you can still have thread-safe problems if you don’t use it properly.
For example, is a HashMap necessarily thread-unsafe?
Say you can’t say it like that. It is a thread-unsafe container. But what if my usage scenario is read-only?
In this read-only scenario, it is thread-safe.
Anyway, look at the scene. Truth, is such a truth.
One last word (for attention)
So click a “like”, zhou is more tired, don’t whoring me, need some positive feedback.
If you find the wrong place, please leave a message to point out, I will modify it.
Thank you for reading, I insist on original, very welcome and thank you for your attention.
I am Why, a literary creator delayed by code. I am not a big shot, but I like sharing. I am a nice man in Sichuan who is warm and interesting.
Welcome to follow my wechat official account: WHY Technology. Here I’m going to share some knowledge about Java technology, being creative and responsible for every line of code. Occasionally also meeting shortage cavity go off board of chat about a life, write a book review, film review. Thank you for your attention, I wish you and I progress together.
This article is formatted using MDNICE