Skip to content

Union-Find (Disjoint Set Union)

Union-Find maintains a partition of n elements into disjoint sets and keeps the answer to "are u and v in the same group?" cheap even as those groups keep merging. It is the backbone of connectivity problems, Kruskal's MST, and any scenario that needs fast equivalence queries over a changing collection.

Difficulty: Intermediate | Prereqs: Arrays and Strings | Contest Frequency: Common—appears in most Div 2 C/D problems


Contents


Intuition

First, what is a connected component? Consider 5 nodes and 3 edges:

text
  Nodes: {0, 1, 2, 3, 4}
  Edges: (0,1), (1,2), (3,4)

  Visual:
    0 --- 1 --- 2       3 --- 4
    \___________/        \___/
     component A        component B

Nodes 0, 1, and 2 can reach each other by following edges—they form one connected component. Nodes 3 and 4 form another. Node 0 cannot reach node 3 by any path, so they are in different components.

Now suppose someone asks: "Are 0 and 2 connected?" Without a specialized data structure, you have to run BFS or DFS from node 0 and check whether node 2 appears, costing O(n+m) per query. Over q queries that becomes O(q(n+m)). If edges are being added between queries, it gets worse: the graph changes after each update, forcing you to start over. DSU avoids that repeated work by answering connectivity queries in near-O(1) while absorbing new edges incrementally.

Imagine 6 servers numbered 0–5, starting isolated. Requests come in two flavors:

  • connect(u, v)—link two servers into the same network.
  • connected?(u, v)—are these two servers in the same network?

Here is a concrete sequence:

text
connect(0, 1)
connect(2, 3)
connected?(1, 3)   --> NO
connect(1, 2)
connected?(0, 3)   --> YES
connect(4, 5)
connected?(3, 5)   --> NO

Naive approach—component label array. Maintain an array comp[] where comp[i] is the component ID of node i. Initially comp = [0, 1, 2, 3, 4, 5].

text
connect(0, 1):     comp[1] was 1, scan all 6 entries, change every 1 -> 0
                   comp = [0, 0, 2, 3, 4, 5]                    6 ops

connect(2, 3):     scan all 6 entries, change every 3 -> 2
                   comp = [0, 0, 2, 2, 4, 5]                    6 ops

connected?(1, 3):  comp[1]=0, comp[3]=2, NO                     1 op

connect(1, 2):     comp[1]=0, comp[2]=2, scan all 6, change every 2 -> 0
                   comp = [0, 0, 0, 0, 4, 5]                    6 ops

connected?(0, 3):  comp[0]=0, comp[3]=0, YES                    1 op

connect(4, 5):     scan all 6 entries, change every 5 -> 4
                   comp = [0, 0, 0, 0, 4, 4]                    6 ops

connected?(3, 5):  comp[3]=0, comp[5]=4, NO                     1 op

Total: 27 operations for just 7 queries on 6 nodes. Every connect scans the entire array—O(n) per merge. With m operations on n elements the naive approach costs O(nm). At n=m=105 that is 1010 operations—hopeless.

We're rewriting the same labels over and over. There has to be a better way.

Instead of labeling every element with a group ID, let each element point to a parent. The result is a forest of trees—one tree per group—with each tree's root serving as the group's representative.

Think of it as tribes merging. Each starts alone. When two groups merge, one leader becomes a child of the other. To check whether two elements belong together, each walks upward to its representative and the roots are compared.

The analogy has one important limit: in real life, leadership is meaningful. In DSU, the root is just an internal label. We can reshape the tree however we like as long as every node in the same set still reaches the same root. That freedom is precisely what makes path compression and union by rank possible.

Think of DSU as maintaining equivalence classes. Two elements are equivalent if they are connected, and equivalence is reflexive, symmetric, and transitive. Every union merges two classes; every find asks which class a node belongs to. DSU is the right tool whenever a problem defines some notion of sameness that grows over time—connected components, equal-value groups, same-color regions, and similar relations.

Walkthrough: Six Servers

Same 6 servers, same operations. Each node stores a parent pointer, and par[i] = i means i is a root.

Step 0: Initialization—every node is its own root.

text
  par: [0, 1, 2, 3, 4, 5]   rank: [0, 0, 0, 0, 0, 0]

  0   1   2   3   4   5       <-- 6 singleton trees

Step 1: connect(0, 1)—find(0)=0, find(1)=1, different roots. Attach 1 under 0 (equal rank, pick 0 as root, bump its rank).

text
  par: [0, 0, 2, 3, 4, 5]   rank: [1, 0, 0, 0, 0, 0]

  0       2   3   4   5
  |
  1
                                   Cost: 2 finds (1 step each) + 1 link = 3 ops

Step 2: connect(2, 3)—find(2)=2, find(3)=3, different roots. Attach 3 under 2.

text
  par: [0, 0, 2, 2, 4, 5]   rank: [1, 0, 1, 0, 0, 0]

  0       2       4   5
  |       |
  1       3
                                   Cost: 2 finds (1 step each) + 1 link = 3 ops

Step 3: connected?(1, 3)—find(1): walk 1→0 (root). find(3): walk 3→2 (root). Roots differ: 0 ≠ 2. Answer: NO.

text
  Trees unchanged.
                                   Cost: 2 finds (2 steps each) = 4 ops

Step 4: connect(1, 2)—find(1): walk 1→0 (root=0). find(2)=2 (root). Roots 0 and 2 have equal rank (both 1). Attach 2 under 0, bump rank of 0.

text
  par: [0, 0, 0, 2, 4, 5]   rank: [2, 0, 1, 0, 0, 0]

        0           4   5
       / \
      1   2
          |
          3
                                   Cost: 2 finds (2+1 steps) + 1 link = 4 ops

Step 5: connected?(0, 3)—find(0)=0 (root, 1 step). find(3): walk 3→2→0 (root=0). Same root! Answer: YES.

Now path compression kicks in: node 3 passed through node 2 on the way up. Repoint both directly to root 0:

text
  Before compression:           After compression:

        0                            0
       / \                        / | \
      1   2                      1  2  3
          |
          3

  par: [0, 0, 0, 0, 4, 5]   rank: [2, 0, 1, 0, 0, 0]

                                   Cost: 2 finds (1+3 steps) = 4 ops
                                   Future find(3) will be O(1)!

Step 6: connect(4, 5)—find(4)=4, find(5)=5. Attach 5 under 4.

text
  par: [0, 0, 0, 0, 4, 4]   rank: [2, 0, 1, 0, 1, 0]

        0           4
      / | \         |
     1  2  3        5
                                   Cost: 2 finds + 1 link = 3 ops

Step 7: connected?(3, 5)—find(3)=0 (direct! path was compressed in Step 5). find(5): walk 5→4 (root=4). Roots differ: 0 ≠ 4. Answer: NO.

text
  Trees unchanged.
                                   Cost: 2 finds (1+2 steps) = 3 ops

Operation count comparison:

text
  Naive (label array):  27 operations   (O(n) per connect)
  Union-Find:           24 operations   (and shrinking as trees flatten)

On this tiny example the savings are modest. But as n and m grow, the naive approach stays O(n) per merge while Union-Find drops to nearly O(1) per operation—the trees keep getting flatter with every query.

Path Compression in Detail—A 5-Node Chain

To see the payoff more clearly, flatten a chain of 5 nodes in a single find() call.

Suppose a sequence of unions without compression has built this tall chain:

text
  BEFORE find(4):

      0             par: [0, 0, 1, 2, 3]
      |
      1             find(4) walks:  4 -> 3 -> 2 -> 1 -> 0  (root!)
      |                             (4 hops to reach root)
      2
      |
      3
      |
      4

When find(4) is called, the recursive version unwinds and repoints every node on the path directly to root 0:

text
  AFTER find(4):

          0         par: [0, 0, 0, 0, 0]
        / | \ \
       1  2  3  4   Every node now points directly to root 0.
                    Future find() on ANY of these nodes: 1 hop.

One call to find(4) turns a depth-4 chain into a depth-1 star. That is path compression in a nutshell: pay O(k) once to traverse a chain of length k, and every subsequent find on any of those nodes costs O(1).

Step-by-step trace of the recursive find(4):

text
  Call stack (going down):            Rewiring (going back up):
  -------------------------           -------------------------
  find(4): p[4]=3, not root           p[4] = find(3) -> 0  ok
    find(3): p[3]=2, not root          p[3] = find(2) -> 0  ok
      find(2): p[2]=1, not root         p[2] = find(1) -> 0  ok
        find(1): p[1]=0, not root        p[1] = find(0) -> 0  ok
          find(0): p[0]=0, ROOT! -> return 0

Union by Rank—Step-by-Step Visual

Union by rank keeps trees shallow by always attaching the shorter tree under the taller one. To see how that plays out, trace 5 unions from scratch with rank tracking. The notation i(r) means node i with rank r.

text
Initial: 5 separate nodes, rank[i]=0 for all

  1(0)  2(0)  3(0)  4(0)  5(0)

union(1,2): ranks equal (both 0). Pick 1 as root, bump rank[1] to 1.

text
  1(1)   3(0)  4(0)  5(0)
  |
  2(0)

union(3,4): ranks equal (both 0). Pick 3 as root, bump rank[3] to 1.

text
  1(1)   3(1)   5(0)
  |      |
  2(0)   4(0)

union(1,3): ranks equal (both 1). Pick 1 as root, bump rank[1] to 2.

text
     1(2)      5(0)
    / \
   2(0) 3(1)
         |
        4(0)

union(1,5): rank[1]=2 > rank[5]=0, so attach 5 under 1. Rank does not increase—the taller tree absorbed a shorter one.

text
       1(2)
      /|\ 
     2  3  5
        |
        4

find(4): path is 4 -> 3 -> 1. Path compression repoints 4 directly to root 1:

text
       1(2)
      /|\ \
     2  3  5  4

Notice how rank guides every decision:

  • Equal rank: either root works; bump the winner's rank.
  • Unequal rank: shorter tree goes under taller tree, no rank change.
  • Result: a tree of rank k contains 2k nodes, so rank never exceeds log2n.

The Invariants

text
+------------------------------------------------------------------+
|  INVARIANT                                                        |
|                                                                   |
|  1. Each tree's root is the unique representative for its set.    |
|     Two elements are in the same set iff find() returns the       |
|     same root.                                                    |
|                                                                   |
|  2. Path compression re-parents nodes directly to the root        |
|     during find(). This changes tree shape but never changes      |
|     which root a node belongs to -- representatives are           |
|     preserved.                                                    |
|                                                                   |
|  3. Union by rank attaches the shorter tree under the taller      |
|     root. This prevents any single chain from growing tall.       |
|     A tree of rank k contains at least 2^k nodes.                 |
+------------------------------------------------------------------+

In Step 5, node 3 sat two levels below the root. Path compression moved it directly under root 0: the root did not change, node 3 stayed in the same set, and the next find(3) became a single hop. That is the compression payoff—you pay once, and every future query on that node is nearly free.

In Step 4, roots 0 and 2 had equal rank, so we attached 2 under 0 and bumped rank. Had we attached 0 under 2 the depth would have been the same. When ranks are unequal, always attaching the shorter tree under the taller one prevents any single chain from growing tall.

Why O(α(n))—Informal Amortized Proof

Each optimization alone is good but not great:

  • Union by rank alone guarantees tree height log2n (a rank-k root has 2k descendants, so rank never exceeds log2n). Every find costs O(logn).

  • Path compression alone can still build tall trees—imagine a long chain before the first find. But once you call find on a deep node, every node on that path gets repointed to the root, so future finds on those nodes cost O(1). Amortized over many operations this gives O(logn) via an iterated-logarithm argument.

Together they are far more powerful than either alone. The intuition:

  1. Union by rank ensures no tree ever grows taller than logn, so the worst case for a single find is O(logn)—bounded, not catastrophic.

  2. Path compression moves every node you touch during a find directly to the root. You pay for visiting those nodes once; every future find involving them skips straight there.

  3. Because trees are shallow (rank bound), the number of nodes that can be at each depth level is limited. Path compression keeps promoting nodes to the root level, so the pool of "deep" nodes drains rapidly. After enough operations, almost every node sits one hop from its root.

  4. Tarjan's 1975 analysis shows this draining is so aggressive that any sequence of m find/unite operations on n elements costs at most O(mα(n)) total, where α is the inverse Ackermann function. Since the Ackermann function grows faster than any primitive recursive function, its inverse satisfies α(n)4 for every n that could fit in a computer's memory (n<22265536).

For contest purposes: treat find and unite as O(1).

Why Both Optimizations Matter

The real insight is the "representative" concept, not the tree itself. You do not care which node is the root—you care that find(u) == find(v) if and only if u and v are in the same set. Path compression can completely restructure the internal tree as long as every node still reaches the correct root. That is why comparing p[u] == p[v] is wrong: immediate parents are not representatives.

The two optimizations do fundamentally different things. Rank alone caps tree height at O(logn), so every find stays logarithmic—but never improves. Compression alone can still start with tall chains; it flattens them over time, but the first find on a deep chain is expensive. Together they achieve O(α(n)) because compression keeps squeezing trees that rank has already kept shallow.

text
  WHY BOTH OPTIMIZATIONS ARE NEEDED
  ===================================

  Rank only:              Compression only:       Both:
  ----------              ------------------       ----
  Height <= log n         Height starts O(n)       Height <= log n
  Every find: O(log n)    After finds: flattens    Finds flatten shallow trees
  Never improves          Amortized O(log n)       Amortized O(alpha(n)) ~= O(1)

       O                       O                        O
      / \                      |                      / | \
     O   O               find= |  (first time)       O  O  O
    / \                   O(n) O              find ~= O(alpha(n)) ~= 1
   O   O                       |
                                O

DSU is append-only. You can merge sets but you cannot split them. If the problem involves edge deletions, either process queries in reverse—so deletions become additions—or use rollback DSU with a checkpoint stack. Spotting "this is a deletion problem" and flipping to reverse processing is a reliable contest reflex.

When to Reach for DSU

Once the mechanics are clear, the pattern is easy to recognize.

Trigger phrases—if you see these in a problem statement, think DSU:

  • "connected components" with edges added over time
  • "merge groups" / "equivalence classes" / "same set"
  • "online connectivity"—answer "is u reachable from v?" as edges arrive
  • "Kruskal" / "minimum spanning tree"—sort edges, greedily unite
  • "transitive relation"—if ab and bc then ac

Counter-examples—these look like DSU but aren't:

  • "Shortest path between u and v"—DSU answers only yes/no connectivity, not distances. Use BFS or Dijkstra.
  • "Delete edges and check connectivity"—vanilla DSU cannot undo unions. Process queries in reverse (deletions become additions) or use rollback DSU—see the Offline Queries and Rollback patterns below.
  • "Connected components in a static graph"—if no edges are being added, a simple BFS/DFS finds all components in O(n+m). DSU shines when edges arrive incrementally and you need answers between arrivals.

C++ STL Reference

There is no STL equivalent for Union-Find—you must implement it yourself. The structure is small enough to inline in any contest solution.

OperationSignatureWhat it doesAmortized Time
InitializeDSU(int n)Create n singleton sets {0},{1},,{n1}O(n)
Findint find(int x)Return root representative of x's set (with path compression)O(α(n))
Unionbool unite(int a, int b)Merge sets containing a and b; return false if already same setO(α(n))
Connectedbool connected(int a, int b)Check if a and b are in the same setO(α(n))
Set sizeint size(int x)Return number of elements in x's setO(α(n))
Component countint components()Return current number of disjoint setsO(1)

Implementation

Minimal Contest Template

cpp
#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> p, rank_;
    int comps;
    DSU(int n) : p(n), rank_(n, 0), comps(n) { iota(p.begin(), p.end(), 0); }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) swap(a, b);
        p[b] = a;
        if (rank_[a] == rank_[b]) rank_[a]++;
        comps--;
        return true;
    }
    bool connected(int a, int b) { return find(a) == find(b); }
};

int main() {
    int n, m;
    cin >> n >> m;
    DSU dsu(n);
    while (m--) {
        int t, u, v;
        cin >> t >> u >> v;
        if (t == 1) dsu.unite(u, v);
        else cout << (dsu.connected(u, v) ? "YES" : "NO") << "\n";
    }
}

Annotated Version (Union by Size)

cpp
#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> parent;
    vector<int> sz;    // sz[x] is meaningful only when x is a root
    int comps;         // number of connected components

    // Create n singleton sets: {0}, {1}, ..., {n-1}
    DSU(int n) : parent(n), sz(n, 1), comps(n) {
        // Every element is its own parent initially
        iota(parent.begin(), parent.end(), 0);
    }

    // Find the root representative of x.
    // Path compression: every node on the path from x to root
    // gets re-parented directly to root. This flattens the tree
    // so future finds on these nodes are O(1).
    int find(int x) {
        if (parent[x] != x)
            parent[x] = find(parent[x]); // recursive path compression
        return parent[x];
    }

    // Merge the sets containing a and b.
    // Union by size: attach smaller tree under larger tree's root.
    // Returns true if a merge actually happened (false if already same set).
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false; // already in the same set

        // Ensure a is the larger set (by element count)
        if (sz[a] < sz[b]) swap(a, b);

        parent[b] = a;     // b's root becomes child of a's root
        sz[a] += sz[b];    // absorb b's count into a
        comps--;            // one fewer component
        return true;
    }

    bool connected(int a, int b) {
        return find(a) == find(b);
    }

    // Number of elements in the set containing x
    int size(int x) {
        return sz[find(x)];
    }

    int components() const {
        return comps;
    }
};

int main() {
    int n, m;
    cin >> n >> m;
    DSU dsu(n);

    while (m--) {
        int type, u, v;
        cin >> type >> u >> v;
        if (type == 1) {
            dsu.unite(u, v);
        } else {
            cout << (dsu.connected(u, v) ? "YES" : "NO") << "\n";
        }
    }
    return 0;
}

Operations Reference

With both optimizations active, every DSU operation is effectively constant time.

OperationTime (amortized)Time (worst, no compression)Space
DSU(n) (init)O(n)O(n)O(n)
find(x)O(α(n))O(logn)--
unite(a, b)O(α(n))O(logn)--
connected(a, b)O(α(n))O(logn)--
size(x)O(α(n))O(logn)--
components()O(1)O(1)--
m operations totalO(mα(n))O(mlogn)O(n)

α(n) = inverse Ackermann function 4 for all practical n.

Only union by rank (no compression): O(logn) per find. Only path compression (no rank): amortized O(logn) per find (iterated-log bound). Both together: amortized O(α(n)).

Union by Rank vs Union by Size

Two variants of the balancing heuristic exist. Union by rank tracks tree height:

cpp
int par[MAXN], rnk[MAXN];
void init(int n) { iota(par, par + n, 0); fill(rnk, rnk + n, 0); }
void unite(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return;
    if (rnk[a] < rnk[b]) swap(a, b);
    par[b] = a;
    if (rnk[a] == rnk[b]) rnk[a]++;
}

Union by size tracks subtree size:

cpp
int par[MAXN], sz[MAXN];
void init(int n) { iota(par, par + n, 0); fill(sz, sz + n, 1); }
void unite(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return;
    if (sz[a] < sz[b]) swap(a, b);
    par[b] = a;
    sz[a] += sz[b];
}

Both give O(α(n)) with path compression. Union by size is usually more useful in contests because you can query component sizes directly. Union by rank uses slightly less space (rank values stay small). Default to size unless you have a specific reason to prefer rank.

Problem Patterns & Tricks

Basic Connectivity Queries

Given a graph where edges are added one at a time, answer "are u and v connected?" after each batch or inline. This is the core DSU application. See CF 1411C (~1300) and CSES—Road Construction.

Example problems:

Kruskal's MST

Sort edges by weight. Iterate in order and call unite(u, v). If the endpoints are already connected, skip (adding the edge would create a cycle). Otherwise add it to the MST. Stop when n1 edges have been added.

cpp
sort(edges.begin(), edges.end()); // edges = {w, u, v}
DSU dsu(n);
long long mst = 0;
for (auto& [w, u, v] : edges)
    if (dsu.unite(u, v)) mst += w;

Example problems:

Offline Queries—Reverse Processing

When queries involve deleting edges or removing elements, process them in reverse order—deletions become unions. This is far simpler than maintaining a deletable DSU.

Example problems:

DSU with Rollback

When you need to undo unions—e.g., divide-and-conquer on queries—use union by rank without path compression. Store each union on a stack and revert by restoring old parent/rank values. The trade-off is O(logn) per operation instead of O(α(n)).

cpp
struct RollbackDSU {
    vector<int> p, rank_;
    vector<pair<int,int>> history; // (node, old_parent)
    RollbackDSU(int n) : p(n), rank_(n, 0) { iota(p.begin(), p.end(), 0); }
    int find(int x) { while (p[x] != x) x = p[x]; return x; } // NO compression
    int save() { return history.size(); }
    void rollback(int checkpoint) {
        while ((int)history.size() > checkpoint) {
            auto [node, old_par] = history.back(); history.pop_back();
            if (node < 0) rank_[~node]--; // rank restore sentinel
            else p[node] = old_par;
        }
    }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) swap(a, b);
        history.push_back({b, p[b]});
        p[b] = a;
        if (rank_[a] == rank_[b]) { history.push_back({~a, 0}); rank_[a]++; }
        return true;
    }
};

Worked example—rollback: 4 nodes {0, 1, 2, 3}, all singletons. Perform two unions, then roll back one.

Step 0: Initial state

text
  p:    [0, 1, 2, 3]   rank: [0, 0, 0, 0]
  history: []

  0   1   2   3          (4 singletons)

  checkpoint = save() -> 0

Step 1: unite(0, 1)

find(0)=0, find(1)=1—different roots, equal rank. Push {1, 1} (node 1's old parent was 1) onto history. Attach 1 under 0. Equal rank → push rank sentinel {~0, 0}, bump rank[0].

text
  p:    [0, 0, 2, 3]   rank: [1, 0, 0, 0]
  history: [(1, 1), (~0, 0)]
                  ^       ^
              parent    rank
              restore   restore

  0       2   3
  |
  1

  checkpoint_after_first = save() -> 2

Step 2: unite(2, 3)

find(2)=2, find(3)=3—different roots, equal rank. Push {3, 3} onto history. Attach 3 under 2. Equal rank → push {~2, 0}, bump rank[2].

text
  p:    [0, 0, 2, 2]   rank: [1, 0, 1, 0]
  history: [(1, 1), (~0, 0), (3, 3), (~2, 0)]

  0       2
  |       |
  1       3

  save() -> 4

Step 3: rollback(checkpoint_after_first)—undo Step 2 only

Roll back to checkpoint 2 (history size was 2 after Step 1). Pop (~2, 0) → rank sentinel, so rank[2]-- → rank[2] = 0. Pop (3, 3) → normal parent restore, so p[3] = 3. History size is now 2, which equals the checkpoint. Stop.

text
  p:    [0, 0, 2, 3]   rank: [1, 0, 0, 0]
  history: [(1, 1), (~0, 0)]

  0       2   3          Step 2 is fully undone!
  |                      Step 1 (0-1 union) is still intact.
  1

Each unite pushes 1–2 entries (a parent change and an optional rank change). rollback pops them in reverse order, restoring exact prior state. Because there is no path compression, find() never mutates the parent array—rollback only ever needs to undo the effects of unite.

Example problems:

Weighted / Potential DSU

Each edge (u,v) carries a weight w. The DSU tracks relative potentials: pot[v]pot[root(v)]. Useful for problems of the form "the difference between a and b is d"—model the constraints as a potential graph.

cpp
struct WeightedDSU {
    vector<int> p;
    vector<long long> diff; // diff[x] = potential[x] - potential[parent[x]]
    vector<int> rank_;
    WeightedDSU(int n) : p(n), diff(n, 0), rank_(n, 0) { iota(p.begin(), p.end(), 0); }
    pair<int, long long> find(int x) {
        if (p[x] == x) return {x, 0};
        auto [root, d] = find(p[x]);
        p[x] = root;
        diff[x] += d;
        return {root, diff[x]};
    }
    // potential[b] - potential[a] = w
    bool unite(int a, int b, long long w) {
        auto [ra, da] = find(a);
        auto [rb, db] = find(b);
        if (ra == rb) return (db - da) == w; // consistency check
        w = w + da - db; // adjust for root potentials
        if (rank_[ra] < rank_[rb]) { swap(ra, rb); w = -w; }
        p[rb] = ra; diff[rb] = w;
        if (rank_[ra] == rank_[rb]) rank_[ra]++;
        return true;
    }
};

Worked example—relative positions on a number line: diff[x] tracks potential[x] - potential[parent[x]]. For a root r, diff[r] = 0. The potential of any node x relative to its root is the sum of diff values along the path from x to the root.

Step 0: Initial state

text
  p:    [0, 1, 2, 3, 4]   diff: [0, 0, 0, 0, 0]   rank: [0, 0, 0, 0, 0]

  Each person is their own root with potential 0.

Step 1: unite(0, 1, 3)—"pos[1] - pos[0] = 3" (person 1 is 3m right of 0)

find(0) = (0, 0), find(1) = (1, 0). Different roots. w=w+dadb=3+00=3. Equal rank → attach root 1 under root 0, diff[1] = 3, bump rank[0].

text
  p:    [0, 0, 2, 3, 4]   diff: [0, 3, 0, 0, 0]   rank: [1, 0, 0, 0, 0]

  0 ----(+3)---- 1       2   3   4

  Meaning: potential[1] - potential[0] = 3.
           If we say pos[0] = 0, then pos[1] = 3.

Step 2: unite(1, 2, 5)—"pos[2] - pos[1] = 5" (person 2 is 5m right of 1)

find(1) = (0, 3), find(2) = (2, 0). Different roots (0 != 2). w=w+dadb=5+30=8. rank[0]=1 > rank[2]=0 → attach root 2 under root 0, diff[2] = 8.

text
  p:    [0, 0, 0, 3, 4]   diff: [0, 3, 8, 0, 0]   rank: [1, 0, 0, 0, 0]

      0
     / \
  (+3) (+8)
   /     \
  1       2

  Verification: pos[2] - pos[1] = diff[2] - diff[1] = 8 - 3 = 5. (ok)
  Relative to root: pos[0]=0, pos[1]=3, pos[2]=8.

Step 3: unite(3, 4, 2)—"pos[4] - pos[3] = 2" (person 4 is 2m right of 3)

find(3) = (3, 0), find(4) = (4, 0). Different roots. w=2+00=2. Equal rank → attach 4 under 3, diff[4] = 2, bump rank[3].

text
  p:    [0, 0, 0, 3, 3]   diff: [0, 3, 8, 0, 2]   rank: [1, 0, 0, 1, 0]

      0             3
     / \            |
  (+3) (+8)       (+2)
   /     \          |
  1       2         4

  Two separate components: {0,1,2} and {3,4}.

Step 4: unite(2, 3, -1)—"pos[3] - pos[2] = -1" (person 3 is 1m left of 2)

find(2) = (0, 8), find(3) = (3, 0). Different roots (0 != 3). w=w+dadb=(1)+80=7. rank[0]=1, rank[3]=1, equal → attach root 3 under root 0, diff[3] = 7, bump rank[0].

text
  p:    [0, 0, 0, 0, 3]   diff: [0, 3, 8, 7, 2]   rank: [2, 0, 0, 1, 0]

            0
          / | \
       (+3)(+8)(+7)
        /   |    \
       1    2     3
                  |
                (+2)
                  |
                  4

  All 5 people in one component now.
  Relative positions (using root 0 as origin):
    pos[0] = 0
    pos[1] = 3
    pos[2] = 8
    pos[3] = 7      (8 + (-1) = 7, consistent!)
    pos[4] = 7 + 2 = 9

  Query: "What is pos[4] - pos[1]?"
    find(4): walk 4->3->0, accumulate diff: 2 + 7 = 9.
    find(1): walk 1->0, accumulate diff: 3.
    Answer: 9 - 3 = 6. Person 4 is 6m right of person 1.

  Consistency check: unite(1, 4, 6) -> find(1)=(0,3), find(4)=(0,9).
    Same root! Check: 9 - 3 = 6 = w. (ok) Consistent.

Example problems:

Counting Components / Largest Component

Track component count or maximum component size while merging. Common in problems asking "after all edges are added, how many components remain?" or "what is the size of the largest component at each step?"

Example problems:

Bipartiteness Checking with DSU

Maintain a DSU where each node has a "parity" relative to its root. When adding edge (u,v), check whether u and v should land in the same or different partition. If they are already in the same component and the parity conflicts, the graph is not bipartite. Combine with rollback for online edge insertion/deletion queries.

Example problems:


Contest Cheat Sheet

text
+--------------------------------------------------------------+
|  UNION-FIND / DSU  --  CHEAT SHEET                           |
+--------------------------------------------------------------+
|  WHEN TO USE:                                                 |
|    - "Are u and v connected?" with edge additions             |
|    - Kruskal's MST                                            |
|    - Grouping by equivalence relation                         |
|    - Offline: reverse deletions into unions                   |
+--------------------------------------------------------------+
|  TEMPLATE:                                                    |
|    vector<int> p(n), rk(n, 0);                                |
|    iota(p.begin(), p.end(), 0);                               |
|    function<int(int)> find = [&](int x) {                     |
|        return p[x] == x ? x : p[x] = find(p[x]);             |
|    };                                                         |
|    auto unite = [&](int a, int b) {                           |
|        a = find(a); b = find(b);                              |
|        if (a == b) return false;                              |
|        if (rk[a] < rk[b]) swap(a, b);                        |
|        p[b] = a;                                              |
|        if (rk[a] == rk[b]) rk[a]++;                          |
|        return true;                                           |
|    };                                                         |
+--------------------------------------------------------------+
|  COMPLEXITY:   O(alpha(n)) ~ O(1)  per find/unite            |
|  SPACE:        O(n)                                           |
+--------------------------------------------------------------+
|  WATCH OUT:    find(u)==find(v), NOT p[u]==p[v]               |
|                init: iota or par[i]=i, rank[i]=0              |
|                rollback => NO path compression                |
+--------------------------------------------------------------+

Common Mistakes & Debugging

Forgetting Path Compression

Writing find as a simple loop without reassigning parents gives O(logn) per call instead of O(α(n)). On 106 operations that is the difference between ~50ms and TLE. Always use p[x] = find(p[x]).

Comparing p[u] Instead of find(u)

p[u] is the immediate parent, not the root. After unions p[u] may be stale. Always compare find(u) == find(v).

Rank vs Size—Don't Mix

Pick one strategy and stick with it. If using rank, do not add element counts to it—rank measures tree height, not set size. If using size, merge by comparing sz[a] < sz[b] and accumulate sz[b] into sz[a] after each merge.

0-Indexed vs 1-Indexed

If the problem uses 1-indexed nodes, either allocate DSU(n+1) or subtract 1 from every input. Off-by-one here causes silent wrong answers or segfaults.

Path Compression + Rollback = Bug

Path compression mutates the tree structure irreversibly. If you need rollback, you must not use path compression. Use the iterative while (p[x] != x) x = p[x]; instead.

Forgetting to Initialize

par[i] must equal i for all i before any unions. A common mistake is relying on the default zero-initialization of vector<int> p(n), which silently makes every element appear to belong to set 0.

Calling find() in a Loop Without Caching

If you call find(x) repeatedly inside a loop body—say, for every neighbor—you redo the path walk each time. Cache the result before the loop with int rx = find(x); and use rx inside.

Debugging Tips

  • Print the parent array after each operation to trace tree structure.
  • Validate the invariant: find(find(x)) == find(x) must always hold.
  • If the answer is wrong, check that you call find() at the right places—before every comparison, and after every union you intend to query.

Mental Traps

"DSU can answer distance / shortest-path queries." DSU is a connectivity oracle: it answers "are u and v in the same component?" in near-O(1), and nothing more. It stores no edge weights, no path structure, no distances. The moment you need "how far is u from v?" reach for BFS, Dijkstra, or Bellman-Ford. Weighted DSU tracks relative potentials ("B is 5 units ahead of A"), but that is not the same as shortest paths in a general graph.

text
  DSU ANSWERS vs GRAPH ALGORITHM ANSWERS
  ========================================

  Question               Tool
  --------------------   -----------------------
  Same component?        DSU  (Y)   O(alpha(n))
  Shortest path?         BFS/Dijkstra  (Y)
  Component count?       DSU  (Y)   O(1)
  Delete an edge?        DSU  (N)   (append-only)
  Relative potential?    Weighted DSU  (Y)
  Actual path nodes?     BFS/DFS  (Y)

"I can undo unions with path compression enabled." Path compression permanently mutates the parent array—it moves every visited node directly under the root. Rollback DSU works by restoring exact prior state from a stack. If compression has silently rewritten parents that were never part of an explicit unite() call, your rollback corrupts the structure. Rule: if you need rollback, disable path compression and use the iterative while (p[x] != x) x = p[x]; find.


Self-Test

Q1: What is the amortized complexity of find() with both path compression and union by rank?

AnswerO(alpha(n)) ~= O(1), where alpha is the inverse Ackermann function. For all practical input sizes, alpha(n) <= 4, making it effectively constant time.

Q2: Why is checking p[u] == p[v] wrong for connectivity, while find(u) == find(v) is correct?

Answer`p[u]` is the direct parent, not necessarily the root. Without path compression having been applied on the exact path from u to root, p[u] might be an intermediate node. `find(u)` follows the chain to the actual root, so comparing roots correctly checks connectivity.

Q3: Why must path compression be disabled in rollback DSU?

AnswerPath compression modifies the parent pointers of all nodes on the find path, creating many small changes that are expensive to undo. Rollback DSU needs to reverse union operations by restoring parent/rank of exactly two nodes. With path compression, you'd need to track and reverse all compressed pointers—too expensive.

Q4: DSU supports connectivity queries. Name two things it cannot do efficiently that require different data structures.

Answer(1) Disconnection / edge deletion—standard DSU only supports union, not split. Handling deletions requires offline processing (reverse time) or link-cut trees. (2) Shortest path queries between nodes—DSU only knows if nodes are connected, not the distance between them.

Q5: You union n1 pairs to form a single component. If you always attach the smaller tree under the larger (union by size) but skip path compression, what is the worst-case find() complexity?

AnswerO(log n). Union by rank/size alone guarantees tree height <= log_2 n, since each merge at least doubles the smaller tree's size. Path compression improves this to O(alpha(n)), but even without it, O(log n) per find is guaranteed.

Q6: Write the minimal DSU (find with path compression, unite with rank) from memory. Can you do it in under 2 minutes?

Q7: Initialize a DSU for n elements using iota. What breaks if you skip the initialization?

AnswerEvery element's parent defaults to 0 (from vector<int> zero-init), so find(x) returns 0 for all x—every element appears to be in the same set from the start.

How to Practice This

The goal is to internalize DSU so that the three core functions—find, unite, connected—flow without thought. Speed-drill the minimal template:

AttemptTargetNotes
1st< 6 minFocus on correctness—path compression in find, rank comparison in unite.
2nd< 4 minEliminate pauses—the three functions should be automatic.
3rd+< 3 minCompetition-ready. DSU is a building block—it must be instant.

Once the template is automatic, ramp up over three weeks:

  1. Week 1: Drill basic DSU daily until you hit 3 minutes consistently.
  2. Week 2: Practice weighted DSU—annotate edges and maintain consistency in find.
  3. Week 3: Implement rollback DSU and solve an offline connectivity problem with it.
  4. Ongoing: When you see "merge groups" or "connected components changing over time," reach for DSU first.

Practice Problems

#ProblemSourceDifficultyKey Idea
1Road ConstructionCSESEasyBasic DSU + track component count and max size
2Road ReparationCSESEasyKruskal's MST with DSU
3The Third LetterCF 1850HEasy-MedWeighted DSU / potential consistency
4Mocha and Diana (Easy)CF 1559D1MediumDual-DSU: unite in both forests simultaneously
5GCD and MSTCF 1513DMediumKruskal-like greedy with GCD property
6Connected ComponentsCF 292DMediumOffline reverse-processing with DSU
7People on a LineAtCoder ABC087DMediumWeighted DSU—difference constraints
8Edges in MSTCF 160DHardClassify edges as "any MST" / "some MST" / "no MST"
9Bipartite CheckingCF 813FHardDSU with rollback + parity + segment tree divide-and-conquer
10Extending Set of PointsCF 1140FHardDSU rollback with segment tree on time

Further Reading

A note on "DSU on tree": The term "DSU on tree" (also called small-to-large merging or heavy path merging) refers to a technique for answering subtree queries on rooted trees in O(nlogn). Despite the name, it has nothing to do with the Union-Find data structure described in this article. The idea: when merging data from child subtrees into a parent, always merge the smaller set into the larger one. Each element is moved at most O(logn) times because every move at least doubles the size of the set it joins. Do not confuse the two: Union-Find/DSU is a data structure for dynamic connectivity; "DSU on tree" is an algorithmic technique for efficient subtree aggregation. It is covered separately—see 10-dsu-on-tree.md.

See also:

Built for the climb from Codeforces 1100 to 2100.