While reading the HashMap source code, I noticed that its capacity must be an integer power of 2. To guarantee this, the author provides a clever and efficient method, tableSizeFor. It is worth asking: if we had to solve this problem ourselves, how would we do it?

Given an integer n of type int, how do we find the smallest power of 2, call it m, that is not less than n? For example, m is 16 for n = 10 and 32 for n = 25.


The ordinary person's rough-and-ready approach

An ordinary person's idea is probably simple: take the logarithm of n to base 2 and call the result m, which is a double. If the fractional part of m is 0, then m is already the exponent we want; if it is not 0, round m up. Either way, the answer is 2 raised to that exponent.

The first problem is that the JDK does not provide a base-2 logarithm, only the natural logarithm: Math.log(double a).

Fortunately, we can use the change-of-base formula: log₂(n) = ln(n) / ln(2).

The sample code

    public static int fun(int n) {
        double m = Math.log(n) / Math.log(2); // log2(n) via the change-of-base formula
        int m2 = (int) Math.ceil(m);          // round the exponent up
        return (int) Math.pow(2, m2);         // 2 raised to that exponent
    }

Setting aside any loss of precision, the code above is very simple: just three steps, take the logarithm, round up, and raise 2 to the power.
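
To make the three steps concrete, here is a small sketch of my own (not part of the original article) that prints the intermediate values for a few inputs. Note that Math.log(n) / Math.log(2) is a double, which is exactly why the rounding step and the precision caveat matter.

    // Illustrative only: makes the three steps (logarithm, ceiling, power) visible.
    public class LogApproachDemo {
        public static void main(String[] args) {
            for (int n : new int[] {10, 16, 25}) {
                double m = Math.log(n) / Math.log(2);   // logarithm, base changed to 2
                int m2 = (int) Math.ceil(m);            // round the exponent up
                int result = (int) Math.pow(2, m2);     // 2 raised to that exponent
                System.out.println("n=" + n + "  log2=" + m + "  ceil=" + m2 + "  result=" + result);
            }
        }
    }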

The problem

Reviewing the requirement in HashMap, we know this method is very fundamental: it is executed heavily at initialization time and when elements are added, which demands that the method itself be efficient.

Although the code here looks simple, a closer look shows that it makes quite a few method calls, and the operations involved, such as logarithm, exponentiation, division, rounding, and casting, are relatively high-level: each must be carried out by a large number of simple low-level operations.

The time a program takes to run depends not only on the environment, such as the clock cycle length and the average number of clock cycles per instruction, but also on the number of instructions executed. Intuition also tells us that the code above does not compile down to a small number of final instructions.

We use this round-up-to-a-power-of-2 conversion precisely to reduce hash collisions and speed up access; if the conversion itself is a serious drag on efficiency, doesn't that defeat the purpose?

The great god's implementation

Let’s take a look at how the authors of HashMap implement this.

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        // Shift-and-OR steps: spread the highest 1 bit into every lower bit
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

The first line is easy (why it subtracts 1 we will explain at the end), and the last line is just two nested ternary operators, the interesting branch being n + 1. The key is the five shift-and-OR steps in the middle.
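
Before digging into why the shifts work, here is a small, self-contained usage sketch of my own (the wrapper class and the sample calls are mine, not from the JDK source); MAXIMUM_CAPACITY is 1 << 30, as defined in HashMap.

    // Illustrative wrapper around tableSizeFor, with MAXIMUM_CAPACITY = 1 << 30 as in HashMap.
    public class TableSizeForDemo {
        static final int MAXIMUM_CAPACITY = 1 << 30;

        static int tableSizeFor(int cap) {
            int n = cap - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
        }

        public static void main(String[] args) {
            System.out.println(tableSizeFor(10)); // 16
            System.out.println(tableSizeFor(25)); // 32
            System.out.println(tableSizeFor(16)); // 16, already a power of 2
            System.out.println(tableSizeFor(0));  // 1
        }
    }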

The idea behind the shifts

Here is my understanding of the author's thinking:

An integer power of 2 in binary has a 1 in exactly one position, its most significant bit, and 0 everywhere else, for example decimal 8 and 32, shown in the figure below using a single byte.

Rounding any number up to an integer power of 2 therefore produces a value whose single 1 bit sits one position above the original number's most significant bit, while that most significant bit and every bit after it become 0 (unless the number is already a power of 2, a case handled later by the initial subtraction of 1).

The idea is to first turn the most significant 1 bit and every bit after it into 1s, and then add 1; the carry turns all those 1s into 0s and pushes a single 1 one position higher, giving the next power of 2. For example, 25 is 11001 in binary; filling in 1s below its highest bit gives 11111, and 11111 + 1 = 100000 = 32. So the key is how to turn the most significant bit and everything after it into 1s.

Let me draw it out again. The figure below converts decimal 25 to 32.

The author's approach is to shift right first, then OR.

Shift right by one and OR, and two bits become 1;

shift right by two and OR, and four bits become 1.

The final 16-bit right shift guarantees that, even for a 32-bit int, every bit after the most significant 1 bit has been set to 1.
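
Since the figures are hard to reproduce here in text, below is a small trace sketch of my own that prints the value in binary after each shift-and-OR step, using the article's example of 25.

    // Illustrative trace of the five shift-and-OR steps for cap = 25.
    public class ShiftTrace {
        public static void main(String[] args) {
            int cap = 25;
            int n = cap - 1;                            // 24 = 11000 in binary
            System.out.println("start  : " + Integer.toBinaryString(n));
            for (int shift : new int[] {1, 2, 4, 8, 16}) {
                n |= n >>> shift;                       // spread the highest 1 downwards
                System.out.println(">>> " + shift + "  : " + Integer.toBinaryString(n));
            }
            System.out.println("n + 1  : " + Integer.toBinaryString(n + 1) + " = " + (n + 1));
        }
    }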


Whole process diagram

That may still feel like going around in circles (I suspect you feel the same reading this far), so here is a diagram of the whole process.

The initial value

Pick any number of int type. In the figure below, x represents a bit that may be either 0 or 1.

Our goal is to change all the x bits to 1, as shown below.

Then, finally, adding 1 rounds the value up to an integer power of 2.

All we have to do is keep applying right shift plus OR to get there.

Shift right by 1 + OR

You can see that after shifting right by one and ORing, two bits become 1.

Shift right by 2 + OR

After shifting right by two and ORing, four bits become 1.

Shift right by 4 + OR

After shifting right by four and ORing, eight bits become 1.

Shift right by 8 + OR

After shifting right by eight and ORing, sixteen bits become 1.

Shift right by 16 + OR

After shifting right by sixteen and ORing, note that not all 32 bits become 1: only the most significant 1 bit and everything after it become 1.

The result of +1

You can see that no matter what each x was, it ends up as 1. And since 1 + 2 + 4 + 8 + 16 = 31, the five rounds spread the highest bit across all 31 lower positions, so an int of any size is covered; if the number is small, the later shifts are simply a few redundant operations.
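
As a sanity check (my own addition, not from the article), the sketch below compares the shift-and-OR computation against a plain doubling loop over a range of inputs; the negative-input and MAXIMUM_CAPACITY guards from the real method are omitted to keep it short, so the scan stays well below 2^30.

    // Illustrative sanity check: shift-based rounding vs. a simple doubling loop.
    public class RoundUpCheck {
        static int shiftBased(int cap) {
            int n = cap - 1;
            n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
            return n + 1; // guards for n < 0 and MAXIMUM_CAPACITY omitted in this sketch
        }

        static int doublingLoop(int cap) {
            int p = 1;
            while (p < cap) p <<= 1; // smallest power of 2 not less than cap
            return p;
        }

        public static void main(String[] args) {
            for (int cap = 1; cap <= (1 << 20); cap++) {
                if (shiftBased(cap) != doublingLoop(cap)) {
                    throw new AssertionError("mismatch at cap=" + cap);
                }
            }
            System.out.println("all inputs up to 2^20 agree");
        }
    }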


Why the initial capacity minus 1

The capacity is decremented by 1 before the shifts to handle the case where the given capacity is already a power of 2, such as 8 or 16. Shifting directly, without subtracting 1 first, would produce a result one power too large: if you expect 16 to stay 16, shifting without the -1 gives 32. In the figure above, this is the case where all the x bits are already 0; without the subtraction the result would be doubled.
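
To see this concretely, here is a tiny sketch of my own comparing the real logic with a variant that skips the initial subtraction (the overflow and negative-input guards are omitted for brevity); with an input that is already a power of 2, such as 16, the variant lands one power too high.

    // Illustrative: the effect of skipping the initial "cap - 1".
    public class MinusOneDemo {
        static int withMinusOne(int cap) {
            int n = cap - 1;
            n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
            return n + 1;
        }

        static int withoutMinusOne(int cap) {
            int n = cap; // no subtraction
            n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
            return n + 1;
        }

        public static void main(String[] args) {
            System.out.println(withMinusOne(16));    // 16, as expected
            System.out.println(withoutMinusOne(16)); // 32, one power too large
        }
    }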


Conclusion

Going back to the original problem, this method is efficient because shifts and ORs are very low-level operations, and the code translates into only a small number of machine instructions, so the goal is achieved with a handful of simple operations.
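
To put a rough number on the difference, here is a very crude timing sketch of my own; JIT warm-up and other effects make this unreliable, and a serious comparison would use JMH, but it conveys the general shape.

    // Very crude timing sketch; a rigorous comparison would use JMH.
    public class RoughTiming {
        static final int MAXIMUM_CAPACITY = 1 << 30;

        static int logBased(int n) {
            double m = Math.log(n) / Math.log(2);
            return (int) Math.pow(2, (int) Math.ceil(m));
        }

        static int shiftBased(int cap) {
            int n = cap - 1;
            n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
            return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
        }

        public static void main(String[] args) {
            int rounds = 5_000_000;
            long sink = 0;
            long t0 = System.nanoTime();
            for (int i = 1; i <= rounds; i++) sink += logBased(i);
            long t1 = System.nanoTime();
            for (int i = 1; i <= rounds; i++) sink += shiftBased(i);
            long t2 = System.nanoTime();
            System.out.println("log-based  : " + (t1 - t0) / 1_000_000 + " ms");
            System.out.println("shift-based: " + (t2 - t1) / 1_000_000 + " ms");
            System.out.println("(sink=" + sink + ", printed only so the loops are not optimized away)");
        }
    }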

The reason I wrote a whole article about this one method is that reading it made me notice things I had never paid attention to. It shows why we need to learn computing fundamentals such as binary arithmetic and logical operations, and why seemingly advanced tricks get applied to simple problems. These basics may be boring to learn, but they are genuinely useful. You may not notice them day to day, but in certain key details you can see the difference between ordinary people and the great gods.

There’s still so much to learn!