This article has been accepted by Github github.com/silently952…
IDEA plug-ins commonly used by programmers: github.com/silently952…
Fully open source Taoke project: github.com/silently952…
Wechat official account: Beta learning Java
preface
I visited my hometown during the Spring Festival holiday and stopped for many days. This is the first article I wrote after the Spring Festival holiday. First, I will talk about what happened during the Spring Festival holiday
“Out of the silt without staining, zhuo Qinglian without demon”
I didn’t expect to meet her when I got home this time. My body became fat and my image of the goddess was broken instantly, just like Da Vinci met Mona Lisa again
“Handan xiangpin Cuiye residue”
All right, back to the point.
Sometimes we need to determine whether two computers are connected on a large network and whether we need to establish a new connection to communicate. Or whether two people are friends on social networks (connected means friends). In this kind of application, we usually need to deal with millions of objects and hundreds of millions of connections. How can we quickly determine whether we are connected or not? So you need to use the union-find algorithm
concept
Connected to a
If you input a pair of integers, where each number represents some object (person, address, computer, etc.), the integers p and Q are understood to be “connected with P and Q”, which have the following properties:
- Reflexivity: P is related to P
- Symmetry: If P is connected to Q, then Q is connected to P
- Transitivity: If P is connected to Q and Q is connected to R, then P is connected to R
How do objects relate to numbers, and then we talk about an algorithmic symbol table
Equivalence class
If the connection is a kind equivalence relation, then the equivalence relation can divide objects into multiple equivalence classes. In this algorithm, two objects belong to the same equivalence class if and only if they are connected
contact
Certain objects in the entire network are called contacts
Connected components
Pairs of integers are called connections and equivalence classes are called connected components or simply components
Dynamic connectivity
The goal of the union-find algorithm is that when a program reads pairs of integers p and q from the input, if all known pairs of integers do not show that p and q are connected, the pair of integers is printed. Otherwise, the pair of integers is ignored. We need to design data structures to store the information of all the known integer pairs and determine whether the input integer pairs are connected or not. This problem is called dynamic connectivity problem.
The union-find algorithm API definition
public interface UF { void union(int p, int q); // add a link between p and q int find(int p); Boolean connected(int p, int q); Boolean connected(int p, int q); Int count(); // Count the connected components}Copy the code
If two contacts are in different components, the union operation merges the two components. We start with N components (each contact represents one component) and subtract one by merging the two components.
The abstract implementation is as follows:
public abstract class AbstractUF implements UF { protected int[] id; protected int count; public AbstractUF(int N) { count = N; id = new int[N]; for (int i = 0; i < N; i++) { id[i] = i; } } @Override public boolean connected(int p, int q) { return find(p) == find(q); } @Override public int count() { return count; }}Copy the code
So let’s focus on how do we implement the Union method and the find method
The quick – find algorithm
The realization idea of this algorithm is that the values of all contacts in ID [] in the same connected component are the same, and the method to judge connected is to judge whether ID [p] is equal to ID [q].
public class QuickFindImpl extends AbstractUF { public QuickFindImpl(int N) { super(N); } @Override public int find(int p) { return id[p]; } @Override public void union(int p, int q) { int pId = find(p); int qId = find(q); If (pId == qId) {// if (pId == qId) {return; } for (int i = 0; i < id.length; i++) { if (id[i] == pId) { id[i] = qId; QId}} count--; // The connected component is reduced by one}}Copy the code
- Algorithm analysis:
The find() operation is obviously very fast, O(1), but the union algorithm can’t handle large data, because it needs the entire array of variables each time, so the union method has O(n) time.
The quick – union algorithm
To speed up the Union method, we need to consider another algorithm; The same data structure is used, but the meaning of id[] is redefined. The value of ID [] for each contact is the name of another contact in the same component
After the array is initialized, each node’s link points to itself; The id[] array represents the forest in the form of a parent link, and each union operation finds the root node of each component for merging.
public class QuickUnionImpl extends AbstractUF { public QuickUnionImpl(int N) { super(N); } @override public int find(int p) { = id[p]) { p = id[p]; } return p; } @Override public void union(int p, int q) { int pRoot = find(p); Int qRoot = find(q); If (pRoot == qRoot) {return; } id[pRoot] = qRoot; // root count--; }}Copy the code
- Algorithm analysis:
It seems that Quick-union is faster than Quick-find because union does not need to traverse the entire array for each pair of inputs. Considering the optimal case, find only needs to access the array once to get the root contact, the time complexity of the union method is O(n). Consider the worst-case input case, as shown below:
The find method needs to access the array n-1 times, so the union method has O(n²) time.
Weighted Quick-Union algorithm
In order to ensure that the worst case of the Quick-Union algorithm does not occur, I need to record the size of each tree and always join the small trees to the large ones during the component merging operation. The tree height constructed by this algorithm will be much smaller than the tree height constructed by the unweighted version.
public class WeightedQuickUnionImpl extends AbstractUF { private int[] sz; public WeightedQuickUnionImpl(int N) { super(N); sz = new int[N]; for (int i = 0; i < N; i++) { sz[i] = 1; } } @Override public void union(int p, int q) { int pRoot = find(p); Int qRoot = find(q); If (pRoot == qRoot) {return; If (sz[pRoot] < sz[qRoot]) {sz[qRoot] += sz[pRoot]; id[pRoot] = qRoot; } else { sz[pRoot] += sz[qRoot]; id[qRoot] = pRoot; } count--; } @override public int find(int p) { = id[p]) { p = id[p]; } return p; }}Copy the code
- Algorithm analysis:
In the worst case, each union merges a tree of equal size. They all contain 2 to the n nodes with height n+1. From this, we can see that the time complexity of union method is O(lgN).
conclusion
The union-find algorithm can only judge whether two integers given are connected or not, and cannot give a specific path to reach them. Later we’ll talk about graph algorithms that give you specific paths
algorithm | union() | find() |
---|---|---|
The quick – find algorithm | N | 1 |
The quick – union algorithm | The height of the tree | The height of the tree |
Weighted Quick-Union algorithm | lgN | lgN |
Finally (pay attention, don’t get lost)
There may be more or less deficiencies and mistakes in the article, suggestions or opinions are welcome to comment and exchange.
Finally, writing is not easy, please do not white whoring I yo, I hope friends can point comment attention to three even, because these are all the power source I share 🙏
All source code has been put into github repository github.com/silently952…
Reference books: Algorithms fourth edition