This article will analyze and answer the following three questions step by step: What is a set of concurrent searches? And what are the optimizations for looking up sets? What kind of problem can a parallel lookup set solve? Finally, the paper will give the analysis and summary of LeetCode related problems


What is a concurrent search set

If you’ve never heard of a data structure, what kind of data structure comes to mind when you first see the words “simultaneous collection”? You’ll get more out of thinking and deriving a problem from what we’ve already learned than from trying to understand it directly.

Let’s break it down word by word, and, search, set this three words, among which the first two words are verbs, the third word is a noun. Let’s look at the noun first, because only when we know what it is can we understand what it can do. A set is a set, and we learned this in high school, and a set is basically a bunch of elements that are placed in the same place out of order. Now that we know that a set is essentially a set, what can it do? The first two words are “union” and “search”. Some operations on sets, you must remember, such as intersection, union, etc. The “union” here actually refers to the union operation, two sets merged into a set, as follows:

{1,3,5,7} U {2,4,6,8} = {1,2,3,4,5,6,7,8}
Copy the code

What is “check”? Collection itself is only a container, we finally know what is inside of element, so the “check” here is for store element in the set, we should not only look up the elements in collection, we need to make sure the elements are in which collection, for the former one operation, common set can be done in Java, But for the latter operation, just using a set is more difficult. Well, now you know and look up what a set is and what it does, to sum it up:

  • And find the set can be set merge operation (and)
  • And look up the set to find which set the element is in (look up)
  • And looking up sets maintains a bunch of sets (sets)

With that in mind, the concept of looking up sets becomes clear.


And look up the implementation of the set

As a programmer, it is necessary to present what we know in the form of code. I believe that through the above statement, you already know that the collection maintenance is a collection rather than a collection. What data structure is used to express and look up the collection? The set? There are two things we must want to know, is the value of an element, a collection of symbols, an element may only exist in a collection at the same time, the element is a many-to-one relationship to collections, so we can use to represent a health value of the structure and set, the Map is certainly can, but if there is no specific requirements for the element itself, we can use the array, This is faster and easier to use, as shown in the following example:

{0}, {1}, {2}, {3}, {4}, {5} = >,1,2,3,4,5 [0] {0}, {three, four, five} = >,0,0,3,3,3 [0] or [1,1,1,4,4,4]...Copy the code

In interpreting the above array representation before, don’t know if you have found the fact is, “the value of the element itself is fixed, but a collection of elements belong to can be change”, so we can use an array index to represent the elements, the array index stored value indicates that the element’s collection, another problem is that, What is the set, a label? The most straightforward way to do this is to pick something from the set, and we just pick one element from the set to represent the set. I believe that at this point, you still have a pile of questions in mind, no hurry, we continue to see.

Had said the representation of a collection, let’s take a look at how to implement based on this representation “and” and “check”, namely mergers and elements of the set of search, the two operations are influence each other, so it is best to put together, merger is actually change the value stored in the array, this value represents the elements (index) collection, However, there is a problem that one set is merged into another set. Do we need to change the corresponding values of all elements in the set? In fact, it is not necessary.

{0}, {3, 4}, {5, 6} = >,1,1,3,3,6,6 [1] {3, 4} U = {5, 6} > {0}. ,1,1,6,3,6,6,4,5,6 {3} = > [1] {0} U,4,5,6 {3} = >,1,2,3,4,5,6 {0} = >,6,1,6,3,6,6 [1]Copy the code

If you look closely, you’ll see that each time we merge, we update only the set that represents the element, not the set that represents the element in the entire set. “On behalf of elements (the root element)”, that is, on behalf of the collection of elements, the element deposit location is its itself, two sets, and and, we only need to update the representative elements, because look for the element in which part of the collection process is to find the representative elements, the process of representative element changed, the rest of the element the corresponding collection also changed. Based on the merged example above, let’s see how to find:

[1, 1,1,1,3,3,6,6,6] select * from (select * from (select * from (select * from (select * from (select * from))); Element 4 in set 6 [1,6,1,6,3,6,6] find element 0 in set 0 -> 1 -> 6, element 0 in set 6Copy the code

We are looking at the implementation of the code, the first is to find:

public int find(int element) {
    if (roots[element] == element) {
        return element;
    }
    
    int father = element;
    while(roots[father] ! = father) { father = roots[father]; }return father;
}
Copy the code

To make the code more concise, we can also write it recursively:

public int find(int element) {
    if (roots[element] == element) {
        return element;
    }
    
    return find(roots[element]);
}
Copy the code

A lookup is the process of finding a representative element.

Another is merged, when two elements meet, we will merge is these two elements in a collection of a merger, so we still want to use the find find these two elements in the collection, if it is the same collection do not need to merge, different collection, will be one of the representative elements changes make it points to another representative elements:

public void union(int element1, int element2) {
    int root1 = find(element1);
    int root2 = find(element2);
    
    if (root1 != root2) {
        roots[root1] = root2;
    }
}
Copy the code

If you’ve ever wondered why the array is named roots, you’ll see that the set can be viewed as multiple trees, namely forests, but the edges of the trees are oriented from child to parent. One way to visualize it is a tree falling down.


Optimization – Path compression

We can look at the time complexity of the above code. Both operations are array-based, and the union operation is dependent on find, so the time complexity of the find operation is the same as the time complexity of the look up set operation. Let’s start with an example:

{0}, {1}, {2}, {3},... {n} => [0,1,2,3... n] {0} U {1} => {1,1, 3... ,n} {1} U {2} => {1,2,2,3... ,n} {2} U {3} => {1,2,3,3... ,n} ...Copy the code

The time complexity of find(1) is O(n), and the worst time of find operation is O(n). Is there any way to optimize it? Here’s an idea of path compression, again in the example above, where we end up with a long search chain:

1 -> 2 -> 3 ->... -> nCopy the code

And the idea of optimization is to make this chain shorter, so if we find(1), at the end we can find the representation element, but everything in between 1 and the representation element is in the same set, and from there we find the same representation element, Can we assign the representative elements found by find(1) to the intermediate elements? So that next time, you can find the representative element directly from these elements. That is, change the top chain to:

1 -> n
2 -> n
3 -> n
...
Copy the code

The find function is optimized to look like this:

public int find(int element) {
    if (root[element] == element) {
        return element;
    }

    // Find the representative element
    int father = element;
    while(root[father] ! = father) { father = root[father]; }// Point all elements in the path directly to the representative element
    int compressPointer = element;
    while(root[compressPointer] ! = father) {int tmp = root[compressPointer];
        root[compressPointer] = father;
        compressPointer = tmp;
    }

    return father;
}
Copy the code

It would be neat to write it recursively:

private int find(int element) {
    if (roots[element] == element) {
        return element;
    }
    
    return roots[element] = find(roots[element]);
}
Copy the code

Find is going to be O(1) after path compression, and maybe you don’t understand why it’s O(1), because there’s a while loop in the code. Dynamic arrays, like arraylists in Java or vectors in C++, are based on static arrays. They start small. If they fill up, it regenerates a static array twice the size. Then copy all the previous values into the new array. The add operation in this step is O(n), but the time is still O(1) if you amortized it to the add operation in the previous n-1 step. Find (1) after an O(n) operation, find(1), find(2), find(3)… It all becomes order one.

I don’t want to prove it mathematically, it’s very complicated, there are papers that say look up the time complexity of the set is at worst O(logn), and there are papers that say O(an), where a is a very, very, very small number, neither of which is wrong, But the probability of degenerate to O(logn) is very, very small, only very extreme cases can occur, and just remembering a time complexity doesn’t help us much in understanding the algorithm itself. For example, quicksort is one of the ten greatest algorithms of all time, we always say that quicksort is O(n LGN), have you ever seen that the worst time of this algorithm is O(n^2), and it’s unstable, merge sort is stable and it’s O(n LGN), but compared to quicksort, It would still be less impressive. It can be seen that the performance of an algorithm cannot be analyzed only by the worst-time complexity.

If we spliced together the previous things, we could combine them into a true union set structure:

private class UnionFind {
    private int[] roots;
    
    private UnionFind(int size) {
        this.roots = new int[size];
        
        for (int i = 0; i < size; ++i) { roots[i] = i; }}private int find(int element) {
        if (roots[element] == element) {
            return element;
        }
        
        return roots[element] = find(roots[element]);
    }
    
    private void union(int element1, int element2) {
        int root1 = find(element1);
        int root2 = find(element2);
        
        if(root1 ! = root2) { roots[root1] = root2; }}}Copy the code

And check set and a optimized is called the heuristic merger, is on the union operation optimization, said before and check the set can be thought of as a heap of backwards tree, this optimization is mainly consider the depth of the tree, merge the depth needs to be above the little tree to the depth of the big tree, because of the effect of the optimization of time and there is no path compression so big, so skip here, For general problems, path compression is sufficient.


What problem can a parallel lookup set solve

Syndication is often used to solve problems on graphs, and syndication has only two operations, “union” and “search,” but some other applications can be derived from these two operations:

  • Graph connectivity issues
  • Number of sets
  • The number of elements in a set

Connectivity is easy to understand, refers to a graph is connected, “if it is a connected graph, then starting from any node on the graph, we can traverse all the nodes on the drawing”, here we just need to put on the figure of the node in the same collection, then to see if all the nodes are set point to the same; The number of sets is to find the number of representative elements. The simplest way to find the number of elements in a set is to directly traverse the roots array and then find one by one. Another method is to save an additional array in the structure to record the number of elements in each set and change it according to specific operations.

And the flip side of that is, what is the problem that you can’t solve by looking up sets? Check and set the merge operation is not reversible, you can understand as close only, that is to say two set after the merge is not to separate again, and check the other set will only save and maintain collections and elements of the relationship, as for the relationship between the elements, such as the edge of a node to a node on the graph, this information is not maintain and check the set, If you get a problem that asks you to analyze something like that, then looking up sets is not a good starting point, and you may need to look at other algorithms.

Let’s analyze the actual LeetCode problems and see how to use and search sets to solve some algorithm problems.


And look up set related issues

LC 323. Number of Connected Components in an Undirected Graph

Analysis of the topic:

Given a graph, let you find the connected region on the graph, here we only need to use and look up the set merge operation to connect the points in the same set, and finally count the number of representative elements in the roots array, that is, the number of sets.

Reference code:

private class UnionFind {
    private int[] roots;
    
    private UnionFind(int size) {
        roots = new int[size];
        
        for (int i = 0; i < size; ++i) { roots[i] = i; }}private int find(int element) {
        if (roots[element] == element) {
            return element;
        }
        
        return roots[element] = find(roots[element]);
    }
    
    private void union(int element1, int element2) {
        int root1 = find(element1);
        int root2 = find(element2);
        
        if(root1 ! = root2) { roots[root1] = root2; }}}public int countComponents(int n, int[][] edges) {
    if (edges == null && edges.length == 0) {
        return n;
    }
    
    UnionFind unionFind = new UnionFind(n);
    
    for (int i = 0; i < edges.length; ++i) {
        unionFind.union(edges[i][0], edges[i][1]);
    }
    
    int[] roots = unionFind.roots;
    
    int count = 0;
    
    for (int i = 0; i < roots.length; ++i) {
        if(roots[i] == i) { count++; }}return count;
}
Copy the code


LC 547. Friend Circles

Analysis of the topic:

They want you to find out how many circles of friends are represented by the input matrix, A and B are friends, B and C are friends, A and C are indirectly friends, and then A, B, and C form A circle of friends. Join two people who are friends with each other and find out how many friends there are.

Reference code:

private class UnionFind {
    private int[] roots;
    
    private UnionFind(int size) {
        this.roots = new int[size];
    }

    private int find(int element) {
        while (element == roots[element]) {
            return element;
        }
        
        return roots[element] = find(roots[element]);
    }

    private void union(int element1, int element2) {
        int root1 = find(element1);
        int root2 = find(element2);
        
        if(root1 ! = root2) { roots[root1] = root2; }}}public int findCircleNum(int[][] M) {
    if (M.length == 0) {
        return 0;
    }
    
    int n = M.length;
    
    UnionFind unionFind = new UnionFind(n);
    
    for (int i = 0; i < n; ++i) {
        unionFind.roots[i] = i;
    }
    
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < i; ++j) {
            if (M[i][j] == 1) { unionFind.union(i, j); }}}int count = 0;
    for (int i = 0; i < n; ++i) {
        if(unionFind.roots[i] == i) { count++; }}return count;
}
Copy the code


LC 305. Number of Islands II

Analysis of the topic:

Number of Islands: There are no Islands in the map at any given time. Each time a grid is filled in to make it land. This is a problem that can be solved with traditional search, like depth-first search, and the question is what is the time complexity? The time complexity of finding the number of islands in each traverse matrix is O(m*n), where M and n respectively represent the length and width of the matrix. Assuming that there are k moments, the time complexity of the whole problem will become O(k*m*n).

Here we need to clarify the following points:

  • The number of islands increases by 1 for each land block
  • For every two continents merged, the number of islands decreased by 1
  • Each land mass can only merge with its upper, lower, left and right lands

I don’t know if you can see from the above analysis and look up the shadow of the set, you can think of the land as an element of the set, the island as a set, and the merge operation is to get two islands together, to merge two islands, and unlike the previous problem we also need a variable to represent the current number of islands.

Here is a trick for converting a two-dimensional matrix into a one-dimensional array for easy lookup. If any element in the matrix has a two-dimensional coordinate (I, j), then the corresponding index in the one-dimensional matrix will be index = I * n + j, where n is the number of elements in each row of the matrix.

At the beginning, initializing the roots array requires traversing all the cells, the time complexity is O(m*n), there are k operations, each operation is O(1), so the overall time complexity is O(k + m*n), as can be seen, Using a look-up set reduces the time complexity of previous searches by one dimension.

Reference code:

private class UnionFind {
    private int[] roots;
    private int count;
    
    private UnionFind(int size) {
        this.roots = new int[size];
        
        Arrays.fill(roots, -1);
        
        this.count = 0;
    }
    
    private int find(int element) {
        if (roots[element] == element) {
            return element;
        }
        
        return roots[element] = find(roots[element]);
    }
    
    private void union(int element1, int element2) {
        int root1 = find(element1);
        int root2 = find(element2);
        
        if(root1 ! = root2) { roots[root1] = root2;this.count--; }}private int getCount(a) {
        return this.count;
    }
    
    private void setFather(int element, int father) {
        this.roots[element] = father;
        this.count++; }}public List<Integer> numIslands2(int m, int n, int[][] positions) {
    List<Integer> results = new ArrayList<>();
    
    if (positions == null || positions.length == 0) {
        return results;
    }
    
    UnionFind unionFind = new UnionFind(m * n);
    
    int[] dirX = {0.0, -1.1};
    int[] dirY = {-1.1.0.0};
    
    for (int i = 0; i < positions.length; ++i) {
        int currentId = positions[i][0] * n + positions[i][1];
        
        if(unionFind.roots[currentId] ! = -1) {
            results.add(unionFind.count);
            continue;
        }
        
        unionFind.setFather(currentId, currentId);
        
        for (int j = 0; j < 4; ++j) {
            int neighbourX = positions[i][0] + dirX[j];
            int neighbourY = positions[i][1] + dirY[j];
            
            if (neighbourX < 0 || neighbourX >= m || neighbourY < 0 || neighbourY >= n) {
                continue;
            }
            
            int neighbourId = neighbourX * n + neighbourY;
            
            if(unionFind.roots[neighbourId] ! = -1) {
                unionFind.union(currentId, neighbourId);
            }
        }
        
        results.add(unionFind.getCount());
    }
    
    return results;
}
Copy the code


LC 130. Surrounded Regions

Analysis of the topic:

If the inner O is connected to the outer O, then the inner O will not be rewritten. More direct approach is first to the outermost O do pretreatment, here we use a similar “sentinel node” thoughts, we open and an additional check for several groups, with the last node to represent a collection of elements in this collection will not be rewritten, so we first put the outer ring O in this collection, We specify that the set with small label must be placed in the large set to ensure that the set represented by the sentinel node is always there. Finally, we can rewrite O that is not in the sentinel set into X.

Reference code:

private class UnionFind {
    private int[] roots;
    
    private UnionFind(int size) {
        this.roots = new int[size];
        
        for (int i = 0; i < size; ++i) {
            this.roots[i] = i; }}private int find(int element) {
        if (roots[element] == element) {
            return element;
        }
        
        return roots[element] = find(roots[element]);
    }
    
    private void union(int element1, int element2) {
        int root1 = find(element1);
        int root2 = find(element2);
        
        if(root1 ! = root2) {if (root2 > root1) {
                roots[root1] = root2;
            } else{ roots[root2] = root1; }}}}public void solve(char[][] board) {
    if (board == null || board.length == 0 || board[0] = =null || board[0].length == 0) {
        return;
    }
    
    int m = board.length, n = board[0].length;
    
    UnionFind unionFind = new UnionFind(m * n + 1);
    
    for (int i = 0; i < m; ++i) {
        if (board[i][0] = ='O') {
            unionFind.union(i * n, m * n);
        }
        
        if (board[i][n - 1] = ='O') {
            unionFind.union((i + 1) * n - 1, m * n); }}for (int i = 0; i < n; ++i) {
        if (board[0][i] == 'O') {
            unionFind.union(i, m * n);
        }
        
        if (board[m - 1][i] == 'O') {
            unionFind.union((m - 1) * n + i, m * n); }}int[] dirX = {0.0, -1.1};
    int[] dirY = {-1.1.0.0};
    
    for (int i = 1; i < m - 1; ++i) {
        for (int j = 1; j < n - 1; ++j) {
            if(board[i][j] ! ='O') {
                continue;
            }
            
            int currentId = i * n + j;
            
            for (int k = 0; k < 4; ++k) {
                int neighborX = i + dirX[k];
                int neighborY = j + dirY[k];
                
                if (board[neighborX][neighborY] == 'X') {
                    continue;
                }
                
                intneighborId = neighborX * n + neighborY; unionFind.union(neighborId, currentId); }}}for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            int currentId = i * n + j;

            if(unionFind.find(currentId) ! = m * n) { board[i][j] ='X'; }}}}Copy the code


conclusion

If the conditions involved in a problem can be abstracted into the concepts of set and element, then the problem can almost be solved by using set and search. If the meanings of find and union operations are clearly defined, the whole problem becomes very simple. The above is the sharing of this time, I hope to help you.