And check set (UnionFind) and optimization

Learn from the love of data Structure and Algorithm written by code guy, the picture comes from the video screenshot.

What is a concurrent search set? Let’s start with the following question.

To solve this problem, we need to design a new data structure that can quickly merge two sets of data into one. At the same time, it can quickly identify whether two numbers are a group.

The one that meets this condition is the union search set

The two core methods for searching sets are Union and Find.

The following is an example of an integer.

Quick Find

This is the first implementation, not the recommendation.

Ideas:

The index of the array represents an integer and the value represents its parent element.

If the parent element or the parent node is the same, we treat it as a set of elements.

So the initial parent of each element is itself, and each element is a separate group.

 
public class UnionFind {

    private int[] parents;

    public UnionFind(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must be >0");
        }
        parents = new int[capacity];

        for (int i = 0; i < parents.length; i++) {
            parents[i] = i;// The parent of each element is itself}}... }Copy the code

Union

How do I merge two sets of elements?

You just change the parent node of all the elements in the group to one.

For example, merge the values on the left to the right: union(v1,v2) is the parent node of the same node as the parent byte of v1 that changes its parent node to v2.

You can see that the height of the tree is at most 2.

public class UnionFind {.../** * add the group of elements v1 to the group of elements v2 **@paramV1 elements *@paramThe v2 elements * actually modify their parent element */
    public void union(int v1, int v2) {
        int parent1 = find(v1);
        int parent2 = find(v2);
        if (parent1 == parent2) return;

        for (int i = 0; i < parents.length; i++) {
            if(parents[i] == parent1) { parents[i] = parent2; }}}... }Copy the code

Notice that you need to change the parent nodes of all elements in the group v1 (parent node and all elements of v1).

Find

Fetching the location of the array directly is fetching the parent element, and the same parent element is a group.

So find’s time complexity is O(1)

public class UnionFind {.../** * Finds the parent element of the current element **@paramElement The current element *@returnThe parent element * /
    public int find(int element) {
        rangCheck(element);
        return parents[element];
    }

 /** * if the two elements are in the same group, the parent elements are considered to be in the same group@paramV1, 1 star@paramV2 entry 2 star@returnWhether it is a group of */
    public boolean isSame(int v1, int v2) {
        returnfind(v1) == find(v2); }... }Copy the code

There we go. It’s really easy. The time is order N.

Write the test code according to the following figure:

public class UninFindMain {

    public static void main(String[] args) {
        UnionFind uf=new UnionFind(12);

        uf.union(0.1);
        uf.union(0.3);
        uf.union(0.4);
        uf.union(2.3);
        uf.union(2.5);

        uf.union(6.7);

        uf.union(8.10);
        uf.union(9.10);
        uf.union(9.11);

        System.out.println(uf.isSame(0.6));
        System.out.println(uf.isSame(6.7));
        uf.union(1.6);
        System.out.println(uf.isSame(0.7)); }}Copy the code

The following output is displayed:

false
true
true

Process finished with exit code 0
Copy the code

As you can see above the Find step complexity is O(1), Find is fast so it is called QuickFind. But Union does have O(N).

Quick Union

QuickUnion differs from QuickFind in that both find and union have O(logN) time complexity.

Union

The difference with QuickFind is that you can find the root node of the current group and change the direction of the root node. You do not need to change the direction of all nodes.

Find

The difference with QuickFind is that you keep looking up until you find the root node and return. The same root nodes are considered a group.

The time complexity of Find is essentially the height of the tree, which is order logN.

The time of the Union is also O(logN), because you need to find the two root nodes and then switch points. O(logN)+O(logN)+O(1)=O(logN)

The code implementation is simple:

 @Override
    public int find(int element) {
        rangCheck(element);
        while(element ! = parents[element]) { element = parents[element]; }return element;
    }

    @Override
    public void union(int v1, int v2) {
        // Find the root node
        int p1 = find(v1);
        int p2 = find(v2);
        if (p1 == p2) return;
        // Change the element value of the root node to v2 root
        parents[p1] = p2;
    }
Copy the code

The Quick Union optimization

QuickUnion can cause tree imbalance when merging, or even tree degradation to linked lists. So the union is order N.

Size based optimization

In the previous case of union, we were merging the elements from the left to the right. If this is the case above you can merge the right side to the left side. So we’re back to the tree structure.

Merge groups with fewer elements into groups with more elements.

Open up an array size[] to store the size of each tree. The root node of the tree is used as index and value to store the size value.

So at the beginning, value is always 1.

/** * QuickUnion size optimization */
public class UnionFind_QU_S extends UnionFind_QU {

    private int sizes[];

    protected UnionFind_QU_S(int capacity) {
        super(capacity);
        sizes = new int[capacity];
        for (int i = 0; i < capacity; i++) {
            sizes[i] = 1; }}... }Copy the code

Public void Union (int v1, int v2) public void Union (int v2)

		/** * merge the root node of v1 to the root node of v2@paramV1 elements *@param* / v2 elements
    @Override
    public void union(int v1, int v2) {
        // Find the root node
        int p1 = find(v1);
        int p2 = find(v2);
        if (p1 == p2) return;

        if (sizes[p1] < sizes[p2]) {
            // the size of the root tree is smaller than that of the root tree
            parents[p1] = p2;
            // P2 size increases
            sizes[p2] += sizes[p1];
        } else {
            // The opposite of the aboveparents[p2] = p1; sizes[p1] += sizes[p2]; }}Copy the code

Rank based optimization

The short tree merges into the tall tree.

Obviously the tree is more balanced than size.

As above, use an array to store the rank of the height of the tree. The height of the starting tree is always 1:

/** * QuickUnion rank optimization */
public class UnionFind_QU_R extends UnionFind_QU {

    private int ranks[];

    protected UnionFind_QU_R(int capacity) {
        super(capacity);
        ranks = new int[capacity];
        for (int i = 0; i < capacity; i++) {
            ranks[i] = 1; }}... }Copy the code

Public void Union (int v1, int v2) public void Union (int v1, int v2)

Note that the height of a rank does not change when the lower rank is merged into the higher tree. As long as the height of the two trees is the same, the height of the new composite tree needs to increase.

/** * merge the root node of v1 to the root node of v2@paramV1 elements *@param* / v2 elements
    @Override
    public void union(int v1, int v2) {
        // Find the root node
        int p1 = find(v1);
        int p2 = find(v2);
        if (p1 == p2) return;

        if (ranks[p1] < ranks[p2]) {
            // The height of tree p1 as root is smaller than that of tree P2 as root
            parents[p1] = p2;
            // The height of short grafts to high ones should not be changed
        } else if (ranks[p1] > ranks[p2]) {
            // The opposite of the above
            parents[p2] = p1;
        } else {
            // The height of the two numbers is the same. The height of the tree is +1parents[p1] = p2; ranks[p2]++; }}Copy the code

PathCompression

Although rank based optimization, the tree will be relatively balanced.

But the more times the tree merges, the taller the tree still gets. As a result, the time complexity of each find increases.

Because when we perform find, we traverse a path in the tree. We can optimize at this point –> path compression

When find is performed, all nodes on the path point to the root node. So it lowers the height of the tree.

As a result, we simply need to rewrite the rank optimized find(Element) method:

/** ** QuickUnion rank Path Compression * 

* override find */

public class UnionFind_QU_R_PC extends UnionFind_QU_R { protected UnionFind_QU_R_PC(int capacity) { super(capacity); } @Override public int find(int v) { rangCheck(v); if(parents[v] ! = v) { parents[v] = find(parents[v]); }// Path compressed v and parents[v] are different returnparents[v]; }}Copy the code

Find 300,000 numbers at random. Compare the velocity.

【UnionFind_QU_S】 Start:18:41:28.591The end:18:41:30.448Time:1.856Seconds -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 【 UnionFind_QU_R 】 :18:41:30.478The end:18:41:32.371Time:1.893Seconds -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 【 UnionFind_QU_R_PC 】 :18:41:32.383The end:18:41:34.075Time:1.692Seconds -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- the Process finished with exit code0
Copy the code

You can see there’s not much improvement in path compression. The main reason is that the operations in find are time-consuming.

PathSpliting

The performance of the path compression method is affected because the tree is too balanced.

So we can change the way we compress paths.

Path sorting: Make each node on a path point to its grandfather node.

The diagram below:

Split the original path in two.

It’s a little different from path compression, which is the find method.

/** * Path Splitting of QuickUnion rank * 

* Rewrite find */

public class UnionFind_QU_R_PS extends UnionFind_QU_R { protected UnionFind_QU_R_PS(int capacity) { super(capacity); } @Override public int find(int v) { rangCheck(v); while(parents[v] ! = v) {// Find the root node // Keep the parent node first int p = parents[v]; //v points to its grandparent node parents[v] = parents[parents[v]]; v = p;2->4->6... } returnparents[v]; }}Copy the code

Path Halving

This is actually similar to path splitting, just another optimization of path compression.

Halve path: Make every other node on the path point to its grandfather.

For example, the name is reduced to half of the original path length.

/** * QuickUnion rank * 

* rewrite find */

public class UnionFind_QU_R_PH extends UnionFind_QU_R { protected UnionFind_QU_R_PH(int capacity) { super(capacity); } @Override public int find(int v) { rangCheck(v); while(parents[v] ! = v) {// Find the root node //v points to its grandparent node parents[v] = parents[parents[v]]; v = parents[v];/ / 1 - > 3 - > 5... } returnparents[v]; }}Copy the code

conclusion

3 million values were randomly generated to compare the optimized performance

[UnionFind_QU_S] start optimization based on Size:22:32:10.819The end:22:32:12.943Time:2.123Seconds -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- optimization based on Rank UnionFind_QU_R 】 【 begins:22:32:12.971The end:22:32:14.849Time:1.878Seconds -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + path optimization based on Rank UnionFind_QU_R_PC 】 【 compression begins:22:32:14.861The end:22:32:16.425Time:1.564Seconds -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- based on Rank optimization + UnionFind_QU_R_PS 】 【 path splitting begins:22:32:16.431The end:22:32:18.008Time:1.577Seconds -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- based on Rank optimization + UnionFind_QU_R_PH 】 【 path in half to start:22:32:18.024The end:22:32:19.568Time:1.544Seconds -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- the Process finished with exit code0

Copy the code

The test code is as follows:

public class UninFindMain {
    static final int count = 3000000;

    public static void main(String[] args) {
// testTime(new UnionFind_QF(count));
// testTime(new UnionFind_QU(count));
        testTime(new UnionFind_QU_S(count));
        testTime(new UnionFind_QU_R(count));
        testTime(new UnionFind_QU_R_PC(count));
        testTime(new UnionFind_QU_R_PS(count));
        testTime(new UnionFind_QU_R_PH(count));
    }

    static void testTime(UnionFind uf) {
        uf.union(0.1);
        uf.union(0.3);
        uf.union(0.4);
        uf.union(2.3);
        uf.union(2.5);

        uf.union(6.7);

        uf.union(8.10);
        uf.union(9.10);
        uf.union(9.11); Asserts.test(! uf.isSame(2.7));

        uf.union(4.6);

        Asserts.test(uf.isSame(2.7));

        Times.test(uf.getClass().getSimpleName(), () -> {
            for (int i = 0; i < count; i++) {
                uf.union((int) (Math.random() * count),
                        (int) (Math.random() * count));
            }

            for (int i = 0; i < count; i++) {
                uf.isSame((int) (Math.random() * count),
                        (int) (Math.random() * count)); }}); }}Copy the code

You are advised to use path splitting, path compression, path halving and Rank based optimization. The time complexity can reach O(α(n)), α(n)<5.