This article is a translation of Swift Algorithm Club. Swift Algorithm Club is an open source project produced by Raywenderlich.com to realize algorithms and data structures by Swift. Currently, there are 18000+⭐️ on GitHub. I have made a brief statistics, and there are about 100 algorithms and data structures. It’s basically a good resource for iOSer learning algorithms and data structures. 🐙andyRon/ swift-algorithm-club-CN is my project of learning and translating for Swift Algorithm Club. Due to the limited capacity, if you find any errors or improper translation, please correct them. Welcome to Pull Request. Also welcome interested, have time partners to participate in translation and learning 🤓. Of course, we welcome ⭐️, 🤩🤩 ️ ⭐ ⭐. A translation of this article and the code can be found at 🐙swift-algorithm-club-cn/ union-find


And check sets (the Union – Find)

A union lookup set is a data structure that tracks a set of elements spread over several disjoint (non-overlapping) subsets. It is also known as an unintersected data structure.

What does that mean? For example, a lookup set data structure can track the following collections:

[ a, b, f, k ]
[ e ]
[ g, d, c ]
[ i, j ]
Copy the code

These sets are disjoint because they have no common members.

A parallel lookup set supports three basic operations:

  1. Find(A) : Determines the subset of element A. For example, find(d) will return the subset [g, d, c].

  2. Union(A, B) : Merges two subsets containing A and B into one subset. For example, union(d, j) means to combine [g, d, c] and [I, j] into a larger set [g, d, c, I, j].

  3. AddSet(A) : Adds A new subset containing only element A. For example, addSet(h) adds a new collection [h].

The most common application of this data structure is to trace the connected components of an undirected graph. It is also used to implement a valid version of Kruskal’s algorithm to find the minimum spanning tree of a graph.

The implementation of

Pooling sets can be implemented in a number of ways, but we’ll look at an efficient and easy-to-understand implementation: Weighted Quick Union.

PS: and multiple implementations of the lookup set have been included in the playground.

public struct UnionFind<T: Hashable> {
  private var index = [T: Int] ()private var parent = [Int] ()private var size = [Int]()
}
Copy the code

Our pooled set data structure is actually a forest in which each subset is represented by a tree.

For our purposes, we only need to trace the parent of each tree node, not the child. To do this, we use the array parent, so parent[I] is the parent index of node I.

Example: If parent looks like this,

parent [ 1, 1, 1, 0, 2, 0, 6, 6, 6 ]
     i   0  1  2  3  4  5  6  7  8
Copy the code

Then the tree structure looks like this:

      1              6
    /   \           / \
  0       2        7   8
 / \     /
3   5   4
Copy the code

There are two trees in this forest, and each tree corresponds to a set of elements. (Note: Trees are shown here as binary trees due to ASCII limitations, but this is not necessarily the case.)

We provide each subset with a unique number to identify it. The number is the index of the root node of the subset tree. In the example, node 1 is the root of the first tree and node 6 is the root of the second tree.

So in this example we have two subsets, the first with tag 1 and the second with tag 6. The Find operation actually returns the label of the set, not its contents.

Note that the parent[] of the root node points to itself. So parent[1] = 1 and parent[6] = 6. That’s how we know which ones are the root nodes.

Add a collection

Let’s look at the implementation of these basic operations, starting with adding a new set.

public mutating func addSetWith(_ element: T) {
  index[element] = parent.count  / / 1
  parent.append(parent.count)    / / 2
  size.append(1)                 / / 3
}
Copy the code

When you add a new element, you actually add a new subset that contains only that element.

  1. We save the index of the new element in the index dictionary. This allows us to quickly look up elements later.

  2. We then add the index to the parent array to build a new tree for the collection. Here, parent[I] points to itself, because the tree representing the new collection contains only one node, which of course is the root of the tree.

  3. Size [I] is the number of nodes in the tree whose root is in index I. For the new set, this is 1, because it contains only one element. We will use the size array in the Union operation.

To find the

Usually we want to determine if we already have a set containing a given element. That’s what the Find operation does. In our UnionFind data structure, it is called setOf() :

public mutating func setOf(_ element: T) -> Int? {
  if let indexOfElement = index[element] {
    return setByIndex(indexOfElement)
  } else {
    return nil}}Copy the code

This looks up the index of the element in the index dictionary, and then uses helper methods to find the collection to which the element belongs:

private mutating func setByIndex(_ index: Int) -> Int {
  if parent[index] == index {  / / 1
    return index
  } else {
    parent[index] = setByIndex(parent[index])  / / 2
    return parent[index]       / / 3}}Copy the code

Because we’re dealing with a tree structure, we’re using a recursive approach here.

Recall that each collection is represented by a tree, and the index of the root node is used as a number to identify the collection. We will find the root node of the tree to which the element we are searching belongs and return its index.

  1. First, we check whether the given index represents the root node (that is, a node whose “parent” points to the node itself). If so, we’re done.

  2. Otherwise, we call this method recursively on the parent of the current node. Then we did something very important: we overwrote the parent node of the current node with the index of the root node, essentially reconnecting the node directly to the root node of the tree. The next time we call this method, it will execute faster because the root path of the tree is now much shorter. Without this optimization, the complexity of this approach is O(n), but now with size optimization (illustrated in the Union section) it’s almost O(1).

  3. We return the index of the root node as a result.

That’s what I’m saying. Now the tree looks like this:

We call setOf(4). To find the root node, we must first go to node 2 and then to node 7. (The index of the element is marked red.)

During the call to setOf(4), the tree is reorganized as follows:

Now if we need to call setOf(4) again, we no longer need to go through node 2 to the root node. Therefore, when you use the union-find data structure, it optimizes itself. Cool!

There is an auxiliary method to check if two elements are in the same collection:

public mutating func inSameSet(_ firstElement: T, and secondElement: T) -> Bool {
  if let firstSet = setOf(firstElement), let secondSet = setOf(secondElement) {
    return firstSet == secondSet
  } else {
    return false}}Copy the code

This calls setOf(), which also optimizes the tree.

Union (Weighted)

The final operation is Union, which combines the two sets into a larger set.

    public mutating func unionSetsContaining(_ firstElement: T, and secondElement: T) {
        if let firstSet = setOf(firstElement), let secondSet = setOf(secondElement) { / / 1
            iffirstSet ! = secondSet {/ / 2
                if size[firstSet] < size[secondSet] { / / 3
                    parent[firstSet] = secondSet      / / 4
                    size[secondSet] += size[firstSet] / / 5
                } else {
                    parent[secondSet] = firstSet
                    size[firstSet] += size[secondSet]
                }
            }
        }
    }
Copy the code

Here’s how it works:

  1. We find the set that each element belongs to. Remember, this gives us two integers: the index of the root node in the parent array.

  2. Check if these sets are equal; if they are, the merge makes no sense.

  3. This is where size optimization comes in (weighting). We want to keep the tree as shallow as possible, so we always attach smaller trees to the roots of larger trees. To determine which is the smaller tree, we compare trees by their size.

  4. Here we attach the smaller tree to the root of the larger tree.

  5. Update the larger tree size because it only adds a bunch of nodes.

Illustrations may help to understand this better. Suppose we have these two sets, each with its own tree:

Now we call unionSetsContaining(4, and:3). The smaller tree is connected to the larger tree:

Notice that because we call setOf() at the beginning of the method, the tree is also optimized in the process – node 3 now hangs directly above the root.

A Union with optimization takes almost O(1) time.

Path to the compression

private mutating func setByIndex(_ index: Int) -> Int {
    ifindex ! = parent[index] {// Updating parent index while looking up the index of parent.
        parent[index] = setByIndex(parent[index])
    }
    return parent[index]
}
Copy the code

Path compression helps keep the tree very flat, so a lookup operation might only need __O(1)__.

Complexity summary

Processing N objects
Data Structure Union Find
Quick Find N 1
Quick Union Tree height Tree height
Weighted Quick Union lgN lgN
Weighted Quick Union + Path Compression very close, but not O(1) very close, but not O(1)
Process M’s union command on N objects
Algorithm Worst-case time
Quick Find M N
Quick Union M N
Weighted Quick Union N + M lgN
Weighted Quick Union + Path Compression (M + N) lgN

Further reading

For more examples of how to use this convenient data structure, see Playground.

And look it up on Wikipedia

Translated by Andy Ron and proofread by Andy Ron