Union find¶
The union-find, or disjoint-set, data structure is used to model a collection of non-overlapping sets of elements. The three supported operations are as follows.
make_set(x)
Adds a single element
x
that is a member of a new singleton set containingx
.find(x)
Finds the set that
x
is a part of. This function returns a representative element of that set.union(x, y)
Merges the sets containing
x
andy
into a single set.
Note that with our definition of find
we can determine if x
and y
belong to the same set by comparing their representative
elements, i.e. find(x) == find(y)
.
The union find data structure can be used to model the connected components of a graph. This is useful, for example, when implementing Kruskal’s algorithm for computing minimum spanning trees.
Linked list implementation¶
A straightforward implementation of the union-find data structure
represents each set with a linked list of elements. The representative
of a set is the head of the list. We could implement the operations as
follows. Here we assume all elements are pointers to a node that have
a parent
, child
, and head
pointers pointing to the
previous, next, and first elements of the list, if any.
MAKESET(x)
create a new linked list containing x
FIND(x)
if x has a head
return x.head
else
return x
UNION(x, y)
while x has child
x = x.child
rootY = FIND(Y)
rootY.parent = x
rootY.head = x.head
while rootY has child
rootY.child.head = x.head
rootY = rootY.child
Here we use the head element of a list as its representative, and we append linked lists together to merge them.
Note that this implementation has the following runtime characteristics, where throughout \(n\) is the total number of elements being tracked.
make_set(x)
is \(O(1)\) timefind(x)
is \(O(1)\) timeunion(x, y)
is \(O(n)\) time
If we always append the smaller list to the larger list, then
performing \(m\) MAKESET
, FIND
, or UNION
operations on
\(n\) elements takes \(O(m + n \log n)\) time. This helps
because we only need to update the head
pointer on the smaller of
the lists, and so we always have to update at most half of a set.
Disjoint forests structure¶
We can improve the runtime by using a more interesting data structure,
along with some optimizations. We now represent each set as a rooted
tree. The representative of a set is the root of the tree. MAKESET
is implemented by creating a tree with a single root node. FIND
returns the root of the tree, found by traversing the tree upwards
until a node is found with no parent. UNION
is implemented by
attaching the root of one tree as the child of the root of another.
Union by rank¶
The first optimization is similar to the one we employed with linked
lists, i.e. always appending to the larger list. Here we always merge
the smaller tree into the larger tree. Note that this increases the
depth of the tree being merged into only if the height of the two
trees were equal beforehand. Both UNION
and FIND
can be
implemented in \(O(\log n)\) time with this optimization.
Path compression¶
In path compression, whenever we traverse the tree for a FIND
operation we attach all nodes traversed in getting to the root
directly to the root; this maintains the validity of the structure,
since the root is the representative for all these nodes. The
intuition is that any subsequent calls to FIND
on the same node or
any recursive parent of the node will be faster.
With both union-by-rank and path compression, the amortized time per
FIND
or UNION
operation is \(O(\alpha(n))\), where
\(\alpha(n)\) is defined as the inverse of \(A(n, n)\), where
is the Ackermann function. It can be safely assumed that \(\alpha(n) \le 5\) for any real-world inputs, since the Ackermann function is increasing in each of its arguments and \(A(4, 4) = 2^{2^{2^{65536}}} - 3\).