Disjoint Set Data Structure

Jul 9, 2024

Introduction to Disjoint Set Data Structure

Importance

  • Commonly asked in interviews & online assessments
  • Master it by watching this video and solving the follow-up problems

Why Use Disjoint Set Data Structure?

  • To determine whether two nodes belong to the same component in nearly constant time
  • Useful in dynamic graphs, which change their configuration frequently

Example

  • Given graph components: [1, 2, 3, 4] and [5, 6, 7]
  • A brute-force check with DFS/BFS takes linear time, O(N + E), per query
  • The Disjoint Set Data Structure answers the same query in nearly constant time, as sketched below
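
For intuition, here is a minimal sketch of that query, assuming the DisjointSet class implemented later in these notes (the class and method names are illustrative):

ds = DisjointSet(8)                      # nodes 1..7 (index 0 unused)
for u, v in [(1, 2), (2, 3), (3, 4), (5, 6), (6, 7)]:
    ds.union_by_size(u, v)

print(ds.find_parent(2) == ds.find_parent(4))   # True  -> same component
print(ds.find_parent(3) == ds.find_parent(6))   # False -> different components

Each query runs in nearly constant amortized time, independent of the number of nodes and edges.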

Key Operations

  1. Find Parent: Determine the ultimate parent (the root representative) of a given node's component
  2. Union: Merge two components into one
    • Methods: By rank or by size

Implementation of Disjoint Set Data Structure

Initial Configuration

  • All nodes are their own parents
  • Rank/Size initialized (rank to 0 and size to 1)
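
A minimal sketch of this initial state, assuming nodes are numbered 0 to n-1 (the class and attribute names are illustrative):

class DisjointSet:
    def __init__(self, n):
        # Every node starts as its own parent, i.e. its own single-node component.
        self.parent = list(range(n))
        # Rank starts at 0 and size starts at 1 for each component.
        self.rank = [0] * n
        self.size = [1] * n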

Union by Rank

  • Compare the ranks of the two roots (ultimate parents)
  • Attach the root of the lower-rank tree under the root of the higher-rank tree
  • Equal ranks -> attach either one and increment the rank of the new root

Union by Size

  • Similar to union by rank, but compares the sizes (node counts) of the two components
  • Attach the root of the smaller tree under the root of the larger tree
  • In every case, add the size of the attached tree to the new root's size (on a tie, attach either one)

Path Compression

  • Used in Find Parent operation
  • Flattens the structure by making nodes point directly to the root
  • Reduces the amortized time per operation to nearly constant
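
A quick illustration of the flattening effect, using a hypothetical parent array and the same find logic shown in the pseudocode below:

parent = [0, 0, 1, 2, 3]      # chain: 4 -> 3 -> 2 -> 1 -> 0 (0 is the root)

def find_parent(node):
    if node == parent[node]:
        return node
    parent[node] = find_parent(parent[node])   # point node straight at the root
    return parent[node]

find_parent(4)
print(parent)                  # [0, 0, 0, 0, 0] -- the whole chain now points to the root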

Coding Implementation

Class Structure

  • Implemented in a reusable class format (a complete sketch follows the pseudocode below)
  • Constructor initializes the parent and rank/size arrays

Pseudocode

  • Find Parent with Path Compression:
def find_parent(node):
    if node == parent[node]:
        return node
    # Path compression: point node directly at its ultimate parent
    parent[node] = find_parent(parent[node])
    return parent[node]
  • Union by Rank:
def union_by_rank(u, v):
    pu, pv = find_parent(u), find_parent(v)   # ultimate parents of u and v
    if pu == pv:
        return                                # already in the same component
    if rank[pu] < rank[pv]:
        parent[pu] = pv
    elif rank[pu] > rank[pv]:
        parent[pv] = pu
    else:
        parent[pv] = pu
        rank[pu] += 1
  • Union by Size:
def union_by_size(u, v):
    pu, pv = find_parent(u), find_parent(v)   # ultimate parents of u and v
    if pu == pv:
        return                                # already in the same component
    if size[pu] < size[pv]:
        parent[pu] = pv
        size[pv] += size[pu]
    else:
        parent[pv] = pu
        size[pu] += size[pv]
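
Putting the constructor and the three routines above together, a minimal reusable class might look like the sketch below (class and method names are illustrative; this is the class assumed in the earlier connectivity example):

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))   # every node starts as its own parent
        self.rank = [0] * n            # rank starts at 0
        self.size = [1] * n            # size starts at 1

    def find_parent(self, node):
        # Find with path compression.
        if node == self.parent[node]:
            return node
        self.parent[node] = self.find_parent(self.parent[node])
        return self.parent[node]

    def union_by_rank(self, u, v):
        pu, pv = self.find_parent(u), self.find_parent(v)
        if pu == pv:
            return                     # already in the same component
        if self.rank[pu] < self.rank[pv]:
            self.parent[pu] = pv
        elif self.rank[pu] > self.rank[pv]:
            self.parent[pv] = pu
        else:
            self.parent[pv] = pu
            self.rank[pu] += 1

    def union_by_size(self, u, v):
        pu, pv = self.find_parent(u), self.find_parent(v)
        if pu == pv:
            return                     # already in the same component
        if self.size[pu] < self.size[pv]:
            self.parent[pu] = pv
            self.size[pv] += self.size[pu]
        else:
            self.parent[pv] = pu
            self.size[pu] += self.size[pv]

In practice a single problem normally uses only one of the two union strategies; both are included here because the notes cover both.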

Time Complexity

  • Both union strategies (by rank and by size), combined with path compression in find, achieve nearly constant amortized time per operation, O(α(N)), where α(N) is the inverse Ackermann function; it grows so slowly that it is at most 4 for any practical input size.