Java bit operation

In the computer system, no matter the code or the number, will eventually be converted to the binary 0 or 1, the value will be calculated and stored with the complement code, because the use of complement can make addition and subtraction unified use of addition operation.

For machine numbers, source, inverse, and complement, see this article: What we Should Know about Java Bit Arithmetic

1. Move left (<<)

The meaning of m<<n: the binary number represented by the integer M is moved n bits to the left, the high value exceeding is discarded, and the low value is filled with 0. (there will be a possibility that the positive number becomes a negative number at this time).

2. Move right (>>)

The meaning of m>>n: the binary number represented by the integer M right n bits, m is a positive number, high all complement 0; M is negative, and all the higher ones are added

3. Unsigned right shift (>>>)

M >>> N: the binary representation of the integer m is shifted n bits to the right, regardless of positive or negative values, the high order is filled with 0

4. Bitwise and Operation (&)

1 in both cases, 0 in all other cases

  • 1 & 0 = 0
  • 0, 0 = 0
  • 1 and 1 = 1
  • 0 and 1 = 0

5. Bitwise or operation (|)

It’s a 1 if there’s one 1, and it’s a 0 if there’s two 0’s

  • 1 | 0 = 1
  • 0 | 0 = 0
  • 1 | = 1
  • 0 | = 1

6. Bitwise XOR operation (^)

The value of the same bit is 0; the value of the different bit is 1

  • 1 ^ 1 = 0
  • 1 ^ 0 = 1
  • 0 ^ 1 = 1
  • 0 ^ 0 = 0

7. Bitwise non-operation (~)

Including the sign bit, 1 becomes 0,0 becomes 1

  • Number of 16-bit binary machines with -5 transposed: 11111111 11111011
  • The negative result of ~ (-5) is 00000000 00000100
  • To decimal, the result is 4

Second, summarize the rules and try to apply them

We know that bit operations are much faster at the bottom of the CPU than in code, so how does bit operations work, let’s summarize the rules, when a binary number n does bit operations on 0 or 1

Enter n for bit calculation And operation (&) Or operation (|) Xor operation (^)
0 0 n n
1 n 1 N the not

From this we conclude that

  1. When we need to change a binary number n containing various zeros and ones to 0, we only need to perform (&) and on it. Example:
10100101 & 11110000=10100000, that is, the low zero, with 00001111 can be high zeroCopy the code
  1. Change the n part to 1 and leave the other bits unchanged, then perform (|) or on it. Example:
10100101 | 11110000 = 11110101, namely the high fill 1, with a low fill 1, 00001111Copy the code
  1. When a hashmap obtains a hash value, it performs an xor operation on the lower 16 bits of the original hash value of the key so that the higher 16 bits also participate in the index operation. This will be explained in the following application.

Three, the application of snowflake algorithm

Twitter’s snowflake algorithm SnowflakeIdWorker, as I know it, is a distributed ID solution that uses long’s 64-bit digits for bitarithmetic.

See this article: Snowflake algorithms for Distributed ID artifacts

A number of type long has 64 bits.

  1. The first digit is always 0, which is a positive number.

  2. Timestamps. Unlike Unix native timestamps, Java timestamps are accurate to the millisecond and take up 41 bits, which can hold about 69 years in total.

  3. The working machine ID occupies 10 bits. The high 5 bits are the ID of the data center, and the low 5 bits are the ID of the working node. A maximum of 1024 nodes can be accommodated.

  4. The serial number occupies 12 bits. Each node starts to accumulate every millisecond 0 and can accumulate up to 4095. A total of 4096 ids can be generated.

Long idwork = timestamp << 22 | this.datacenterId << 17 | this.workerId << 12 | this.sequence; Where timestamp is the timestamp, The current timestamp is 00000000 00000000 00000001 01111111 10000010 10100110 01001000 10001001 Move 22 bits to the left to get 01011111 11100000 10101001 10010010 00100010 01000000 00000000 00000000 datacenterId indicates the datacenterId. The datacenterId ranges from 0 to 31. 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00111110 00000000 00000000 00000000 workerId indicates the id of the number of nodes, which is also a 5-bit id from 0 to 31. 00000000 00000000 00000000 00000000 00000000 00000001 11110000 00000000 Sequence Is the self-incremented digit of the same node within 1 ms. It occupies 12 bits. Value range: 0-4095 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001111 11111111 Perform (|) or operation 01011111 11100000 10101001 10010010 00100010 01000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00111110 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001 11110000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001111 11111111 After summing up the above rules, we know that 0 is itself when it performs or operates on other numbers, that is, the above numbers are equivalent to accumulation after performing or operations. 01011111 11100000 10101001 10010010 00100010 01111111 111111 11111111 You can obtain the distributed ID.Copy the code

4. Application of HashMap

To summarize the result, the hashMap stored procedure, where hashMap is an array + linked list + red-black tree storage mode

(1) The capacity of the hashMap can be specified during initialization, which is raised to a power of 2 and initialized on the first put method

(2) If the capacity is not specified, the capacity is initialized to 16 during the first PUT

(3) The initial capacity is array capacity, namely Node<K,V>[] table, if there is a hash mod conflict, it will be hung under the linked list, namely table[I].next

(4) hashCode uses a perturbation algorithm to make both high and low bits participate in the Hash calculation to reduce Hash conflicts

(5) Use hashCode to model the array capacity (actually = budget), get the index subscript position, call equals() if the index conflict, overwrite if the index is equal, do not equal to the linked list

(6) If the length of the list exceeds 8, it is transformed into a red-black tree

(7) If the size of the storage element exceeds 0.75 times the size of the array table.length (the default value, which can be modified), the storage element is expanded

(8) Recalculate the index subscript value during expansion. The elements are either at the original location or at the original location + the old bucket capacity, where the linked list will be evenly dispersed

1. Initialize the capacity to the power of 2

When you new a HashMap object, you can specify the capacity of the collection, but is the capacity you specify the container initialization capacity? The answer is no.

Let’s take a look at the source code for initialization

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
Copy the code

InitialCapacity is the specified capacity, and loadFactor is the loadFactor. The default value is 0.75, that is, the capacity of the table is expanded every time the size reaches 0.75 times. We see that initialCapacity has undergone a tableSizeFor() conversion.

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Copy the code

We can see that this code is a shift and or operation on the initial capacity. The result of this code is the power of 2 on the input value. For example, input 10 returns 16, input 18 returns 32, and if it is itself a power of 2, it returns the original value.

Int = 32 bits; int = 32 bits; int = 32 bits; int = 32 bits; I.e., 00000000, 00000000, 00000000, 00010001 n | = n > > > 1; N > > > 1 00000000 00000000 00000000 00001001 for both or operation then assigned to the n - 00000000, 00000000, 00000000, 00011001 n | = n > > > 2. We get 00000000 00000000 0000000000011111 and we see that all the numbers are 1, and then after the or equals operation, the result is the same, After n + 1 is equal to 00000000, 00000000, 00000000, 00100000 or 2 5 power = 32 for int the most greatly, even the biggest 01000000 00000000 00000000 00000000 after n | = n > > > 1; // 01100000 00000000 00000000 00000000 n |= n >>> 2; // 01111000 00000000 00000000 00000000 n |= n >>> 4; // 01111111 10000000 00000000 00000000 n |= n >>> 8; // 01111111 11111111 10000000 00000000 n |= n >>> 16; // 01111111 11111111 11111111 11111111 11111111 MAXIMUM_CAPACITY = 1 << 30, i.e. 01000000 00000000 00000000 00000000 00000000, That is, the maximum capacity cannot exceed this valueCopy the code

We see that the shift operation hashmap ensures that the size of the array must be a power of 2, which is crucial in subsequent hashing and high-order operations during scaling.

2. The perturbation algorithm obtains the hashCode value

We know that hashMap keys must implement the equals() and hashCode() methods, but does hashMap use this hashCode directly? The answer is no

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code
For example, if hashCode is 11110111 00101100 10010100 10010010 unsigned 16 bits right to get 00000000 00000000 10010100 10010010 xor 0 xor any number is original, That is, the high 16 bits remain unchanged, and the high 16 bits and the low 16 bits are disturbed to get the low 16 bitsCopy the code

This ensures that both high and low bits are involved in the Hash calculation when the length of the array table is small, and reduces the probability of Hash collisions.

3. Use & (and) instead of modulo operation to obtain key index

When we want to get the index subscript of the elements in a hashMap, we first think of modulo the length of the array to ensure that the elements are evenly distributed to minimize conflicts.

Entering the putVal() method in put(), we can see a piece of code like this

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; if ((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; if ((p = tab[i = (n - 1) & hash]) == null) tab[i] = newNode(hash, key, value, null); ...Copy the code

Where I = (n-1) & hash is the index subscript of the element. Where n is the array capacity and hash is the hashCode value obtained by the above perturbation algorithm.

We know that n has to be a power of 2, so n minus 1 is

We know that the array capacity n must be a power of 2, for example, 16, which is 00000000 00000000 00000000 00010000, so n-1 is 00000000 00000000 00000000 00000000 00001111 and we know that 0 and any number is 0, If 1 is the original value of any number, you get the last four digits of the original number, which is the same as modulo 16Copy the code

Let me borrow meituan’s picture

So we know that the size of an array to the power of two is important here.

4. The high-order calculation is evenly distributed in the linked list during expansion

This segment method is used when copying the list in the resize() method

Node<K,V> loHead = null, loTail = null; Node<K,V> hiHead = null, hiTail = null; Node<K,V> next; do { next = e.next; if ((e.hash & oldCap) == 0) { if (loTail == null) loHead = e; else loTail.next = e; loTail = e; } else { if (hiTail == null) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e = next) ! = null); if (loTail ! = null) { loTail.next = null; newTab[j] = loHead; } if (hiTail ! = null) { hiTail.next = null; newTab[j + oldCap] = hiHead; }Copy the code

The function of the above code is to evenly split the linked list into two linked lists, and one is hung in the original index position, the other is hung in the original position + the old table capacity position. Where E. hash & oldCap is the high order operation, judging whether the high order is 0 or 1 dispersed linked list.

The following figure shows the meaning of this statement. N is the length of a table. Figure (a) shows the index location of key1 and KEY2 before expansion, and Figure (b) shows the index location of key1 and KEY2 after expansion

When we look at the index calculation of the element after the expansion, we just need to see whether the new bit is 1 or 0. 0 is the same index, and 1 is “old index +oldCap”.

This design not only saves the time of recalculating the hash value, but also can be considered as random whether the new high 1bit is 0 or 1, so the resize process evenly distributes the previously conflicting nodes into the new bucket.

Again, the importance of having an array size to a power of two.

conclusion

This article mainly explains the basic knowledge of Java bit operation, as well as the application scenarios of bit operation, the use of snowflake algorithm and HashMap, and there are many advanced frameworks using bit operation, here is just to introduce a brick to let students have a basic understanding.

Copyright notice: This article is the blogger’s original article, please attach the original source link

Original link: juejin.cn/post/707455…

Refer to the article: blog.csdn.net/javazejian/… Tech.meituan.com/2016/06/24/…