Appearance
Union-Find (Disjoint Set Union)
Union-Find maintains a partition of
Difficulty: Intermediate | Prereqs: Arrays and Strings | Contest Frequency: Common—appears in most Div 2 C/D problems
Contents
- Intuition
- Walkthrough: Six Servers
- The Invariants
- Why Both Optimizations Matter
- When to Reach for DSU
- C++ STL Reference
- Implementation
- Operations Reference
- Union by Rank vs Union by Size
- Problem Patterns & Tricks
- Contest Cheat Sheet
- Common Mistakes & Debugging
- Self-Test
- How to Practice This
- Practice Problems
- Further Reading
Intuition
First, what is a connected component? Consider 5 nodes and 3 edges:
text
Nodes: {0, 1, 2, 3, 4}
Edges: (0,1), (1,2), (3,4)
Visual:
0 --- 1 --- 2 3 --- 4
\___________/ \___/
component A component BNodes 0, 1, and 2 can reach each other by following edges—they form one connected component. Nodes 3 and 4 form another. Node 0 cannot reach node 3 by any path, so they are in different components.
Now suppose someone asks: "Are 0 and 2 connected?" Without a specialized data structure, you have to run BFS or DFS from node 0 and check whether node 2 appears, costing
Imagine 6 servers numbered 0–5, starting isolated. Requests come in two flavors:
- connect(u, v)—link two servers into the same network.
- connected?(u, v)—are these two servers in the same network?
Here is a concrete sequence:
text
connect(0, 1)
connect(2, 3)
connected?(1, 3) --> NO
connect(1, 2)
connected?(0, 3) --> YES
connect(4, 5)
connected?(3, 5) --> NONaive approach—component label array. Maintain an array comp[] where comp[i] is the component ID of node comp = [0, 1, 2, 3, 4, 5].
text
connect(0, 1): comp[1] was 1, scan all 6 entries, change every 1 -> 0
comp = [0, 0, 2, 3, 4, 5] 6 ops
connect(2, 3): scan all 6 entries, change every 3 -> 2
comp = [0, 0, 2, 2, 4, 5] 6 ops
connected?(1, 3): comp[1]=0, comp[3]=2, NO 1 op
connect(1, 2): comp[1]=0, comp[2]=2, scan all 6, change every 2 -> 0
comp = [0, 0, 0, 0, 4, 5] 6 ops
connected?(0, 3): comp[0]=0, comp[3]=0, YES 1 op
connect(4, 5): scan all 6 entries, change every 5 -> 4
comp = [0, 0, 0, 0, 4, 4] 6 ops
connected?(3, 5): comp[3]=0, comp[5]=4, NO 1 opTotal: 27 operations for just 7 queries on 6 nodes. Every connect scans the entire array—
We're rewriting the same labels over and over. There has to be a better way.
Instead of labeling every element with a group ID, let each element point to a parent. The result is a forest of trees—one tree per group—with each tree's root serving as the group's representative.
Think of it as tribes merging. Each starts alone. When two groups merge, one leader becomes a child of the other. To check whether two elements belong together, each walks upward to its representative and the roots are compared.
The analogy has one important limit: in real life, leadership is meaningful. In DSU, the root is just an internal label. We can reshape the tree however we like as long as every node in the same set still reaches the same root. That freedom is precisely what makes path compression and union by rank possible.
Think of DSU as maintaining equivalence classes. Two elements are equivalent if they are connected, and equivalence is reflexive, symmetric, and transitive. Every union merges two classes; every find asks which class a node belongs to. DSU is the right tool whenever a problem defines some notion of sameness that grows over time—connected components, equal-value groups, same-color regions, and similar relations.
Walkthrough: Six Servers
Same 6 servers, same operations. Each node stores a parent pointer, and par[i] = i means
Step 0: Initialization—every node is its own root.
text
par: [0, 1, 2, 3, 4, 5] rank: [0, 0, 0, 0, 0, 0]
0 1 2 3 4 5 <-- 6 singleton treesStep 1: connect(0, 1)—find(0)=0, find(1)=1, different roots. Attach 1 under 0 (equal rank, pick 0 as root, bump its rank).
text
par: [0, 0, 2, 3, 4, 5] rank: [1, 0, 0, 0, 0, 0]
0 2 3 4 5
|
1
Cost: 2 finds (1 step each) + 1 link = 3 opsStep 2: connect(2, 3)—find(2)=2, find(3)=3, different roots. Attach 3 under 2.
text
par: [0, 0, 2, 2, 4, 5] rank: [1, 0, 1, 0, 0, 0]
0 2 4 5
| |
1 3
Cost: 2 finds (1 step each) + 1 link = 3 opsStep 3: connected?(1, 3)—find(1): walk 1→0 (root). find(3): walk 3→2 (root). Roots differ: 0 ≠ 2. Answer: NO.
text
Trees unchanged.
Cost: 2 finds (2 steps each) = 4 opsStep 4: connect(1, 2)—find(1): walk 1→0 (root=0). find(2)=2 (root). Roots 0 and 2 have equal rank (both 1). Attach 2 under 0, bump rank of 0.
text
par: [0, 0, 0, 2, 4, 5] rank: [2, 0, 1, 0, 0, 0]
0 4 5
/ \
1 2
|
3
Cost: 2 finds (2+1 steps) + 1 link = 4 opsStep 5: connected?(0, 3)—find(0)=0 (root, 1 step). find(3): walk 3→2→0 (root=0). Same root! Answer: YES.
Now path compression kicks in: node 3 passed through node 2 on the way up. Repoint both directly to root 0:
text
Before compression: After compression:
0 0
/ \ / | \
1 2 1 2 3
|
3
par: [0, 0, 0, 0, 4, 5] rank: [2, 0, 1, 0, 0, 0]
Cost: 2 finds (1+3 steps) = 4 ops
Future find(3) will be O(1)!Step 6: connect(4, 5)—find(4)=4, find(5)=5. Attach 5 under 4.
text
par: [0, 0, 0, 0, 4, 4] rank: [2, 0, 1, 0, 1, 0]
0 4
/ | \ |
1 2 3 5
Cost: 2 finds + 1 link = 3 opsStep 7: connected?(3, 5)—find(3)=0 (direct! path was compressed in Step 5). find(5): walk 5→4 (root=4). Roots differ: 0 ≠ 4. Answer: NO.
text
Trees unchanged.
Cost: 2 finds (1+2 steps) = 3 opsOperation count comparison:
text
Naive (label array): 27 operations (O(n) per connect)
Union-Find: 24 operations (and shrinking as trees flatten)On this tiny example the savings are modest. But as
Path Compression in Detail—A 5-Node Chain
To see the payoff more clearly, flatten a chain of 5 nodes in a single find() call.
Suppose a sequence of unions without compression has built this tall chain:
text
BEFORE find(4):
0 par: [0, 0, 1, 2, 3]
|
1 find(4) walks: 4 -> 3 -> 2 -> 1 -> 0 (root!)
| (4 hops to reach root)
2
|
3
|
4When find(4) is called, the recursive version unwinds and repoints every node on the path directly to root 0:
text
AFTER find(4):
0 par: [0, 0, 0, 0, 0]
/ | \ \
1 2 3 4 Every node now points directly to root 0.
Future find() on ANY of these nodes: 1 hop.One call to find(4) turns a depth-4 chain into a depth-1 star. That is path compression in a nutshell: pay
Step-by-step trace of the recursive find(4):
text
Call stack (going down): Rewiring (going back up):
------------------------- -------------------------
find(4): p[4]=3, not root p[4] = find(3) -> 0 ok
find(3): p[3]=2, not root p[3] = find(2) -> 0 ok
find(2): p[2]=1, not root p[2] = find(1) -> 0 ok
find(1): p[1]=0, not root p[1] = find(0) -> 0 ok
find(0): p[0]=0, ROOT! -> return 0Union by Rank—Step-by-Step Visual
Union by rank keeps trees shallow by always attaching the shorter tree under the taller one. To see how that plays out, trace 5 unions from scratch with rank tracking. The notation i(r) means node
text
Initial: 5 separate nodes, rank[i]=0 for all
1(0) 2(0) 3(0) 4(0) 5(0)union(1,2): ranks equal (both 0). Pick 1 as root, bump rank[1] to 1.
text
1(1) 3(0) 4(0) 5(0)
|
2(0)union(3,4): ranks equal (both 0). Pick 3 as root, bump rank[3] to 1.
text
1(1) 3(1) 5(0)
| |
2(0) 4(0)union(1,3): ranks equal (both 1). Pick 1 as root, bump rank[1] to 2.
text
1(2) 5(0)
/ \
2(0) 3(1)
|
4(0)union(1,5): rank[1]=2 > rank[5]=0, so attach 5 under 1. Rank does not increase—the taller tree absorbed a shorter one.
text
1(2)
/|\
2 3 5
|
4find(4): path is 4 -> 3 -> 1. Path compression repoints 4 directly to root 1:
text
1(2)
/|\ \
2 3 5 4Notice how rank guides every decision:
- Equal rank: either root works; bump the winner's rank.
- Unequal rank: shorter tree goes under taller tree, no rank change.
- Result: a tree of rank
contains nodes, so rank never exceeds .
The Invariants
text
+------------------------------------------------------------------+
| INVARIANT |
| |
| 1. Each tree's root is the unique representative for its set. |
| Two elements are in the same set iff find() returns the |
| same root. |
| |
| 2. Path compression re-parents nodes directly to the root |
| during find(). This changes tree shape but never changes |
| which root a node belongs to -- representatives are |
| preserved. |
| |
| 3. Union by rank attaches the shorter tree under the taller |
| root. This prevents any single chain from growing tall. |
| A tree of rank k contains at least 2^k nodes. |
+------------------------------------------------------------------+In Step 5, node 3 sat two levels below the root. Path compression moved it directly under root 0: the root did not change, node 3 stayed in the same set, and the next find(3) became a single hop. That is the compression payoff—you pay once, and every future query on that node is nearly free.
In Step 4, roots 0 and 2 had equal rank, so we attached 2 under 0 and bumped rank. Had we attached 0 under 2 the depth would have been the same. When ranks are unequal, always attaching the shorter tree under the taller one prevents any single chain from growing tall.
Why —Informal Amortized Proof
Each optimization alone is good but not great:
Union by rank alone guarantees tree height
(a rank- root has descendants, so rank never exceeds ). Every find costs . Path compression alone can still build tall trees—imagine a long chain before the first find. But once you call find on a deep node, every node on that path gets repointed to the root, so future finds on those nodes cost
. Amortized over many operations this gives via an iterated-logarithm argument.
Together they are far more powerful than either alone. The intuition:
Union by rank ensures no tree ever grows taller than
, so the worst case for a single find is —bounded, not catastrophic. Path compression moves every node you touch during a find directly to the root. You pay for visiting those nodes once; every future find involving them skips straight there.
Because trees are shallow (rank bound), the number of nodes that can be at each depth level is limited. Path compression keeps promoting nodes to the root level, so the pool of "deep" nodes drains rapidly. After enough operations, almost every node sits one hop from its root.
Tarjan's 1975 analysis shows this draining is so aggressive that any sequence of
find/unite operations on elements costs at most total, where is the inverse Ackermann function. Since the Ackermann function grows faster than any primitive recursive function, its inverse satisfies for every that could fit in a computer's memory ( ).
For contest purposes: treat find and unite as
Why Both Optimizations Matter
The real insight is the "representative" concept, not the tree itself. You do not care which node is the root—you care that find(u) == find(v) if and only if p[u] == p[v] is wrong: immediate parents are not representatives.
The two optimizations do fundamentally different things. Rank alone caps tree height at
text
WHY BOTH OPTIMIZATIONS ARE NEEDED
===================================
Rank only: Compression only: Both:
---------- ------------------ ----
Height <= log n Height starts O(n) Height <= log n
Every find: O(log n) After finds: flattens Finds flatten shallow trees
Never improves Amortized O(log n) Amortized O(alpha(n)) ~= O(1)
O O O
/ \ | / | \
O O find= | (first time) O O O
/ \ O(n) O find ~= O(alpha(n)) ~= 1
O O |
ODSU is append-only. You can merge sets but you cannot split them. If the problem involves edge deletions, either process queries in reverse—so deletions become additions—or use rollback DSU with a checkpoint stack. Spotting "this is a deletion problem" and flipping to reverse processing is a reliable contest reflex.
When to Reach for DSU
Once the mechanics are clear, the pattern is easy to recognize.
Trigger phrases—if you see these in a problem statement, think DSU:
- "connected components" with edges added over time
- "merge groups" / "equivalence classes" / "same set"
- "online connectivity"—answer "is
reachable from ?" as edges arrive - "Kruskal" / "minimum spanning tree"—sort edges, greedily unite
- "transitive relation"—if
and then
Counter-examples—these look like DSU but aren't:
- "Shortest path between
and "—DSU answers only yes/no connectivity, not distances. Use BFS or Dijkstra. - "Delete edges and check connectivity"—vanilla DSU cannot undo unions. Process queries in reverse (deletions become additions) or use rollback DSU—see the Offline Queries and Rollback patterns below.
- "Connected components in a static graph"—if no edges are being added, a simple BFS/DFS finds all components in
. DSU shines when edges arrive incrementally and you need answers between arrivals.
C++ STL Reference
There is no STL equivalent for Union-Find—you must implement it yourself. The structure is small enough to inline in any contest solution.
| Operation | Signature | What it does | Amortized Time |
|---|---|---|---|
| Initialize | DSU(int n) | Create | |
| Find | int find(int x) | Return root representative of | |
| Union | bool unite(int a, int b) | Merge sets containing false if already same set | |
| Connected | bool connected(int a, int b) | Check if | |
| Set size | int size(int x) | Return number of elements in | |
| Component count | int components() | Return current number of disjoint sets |
Implementation
Minimal Contest Template
cpp
#include <bits/stdc++.h>
using namespace std;
struct DSU {
vector<int> p, rank_;
int comps;
DSU(int n) : p(n), rank_(n, 0), comps(n) { iota(p.begin(), p.end(), 0); }
int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
bool unite(int a, int b) {
a = find(a); b = find(b);
if (a == b) return false;
if (rank_[a] < rank_[b]) swap(a, b);
p[b] = a;
if (rank_[a] == rank_[b]) rank_[a]++;
comps--;
return true;
}
bool connected(int a, int b) { return find(a) == find(b); }
};
int main() {
int n, m;
cin >> n >> m;
DSU dsu(n);
while (m--) {
int t, u, v;
cin >> t >> u >> v;
if (t == 1) dsu.unite(u, v);
else cout << (dsu.connected(u, v) ? "YES" : "NO") << "\n";
}
}Annotated Version (Union by Size)
cpp
#include <bits/stdc++.h>
using namespace std;
struct DSU {
vector<int> parent;
vector<int> sz; // sz[x] is meaningful only when x is a root
int comps; // number of connected components
// Create n singleton sets: {0}, {1}, ..., {n-1}
DSU(int n) : parent(n), sz(n, 1), comps(n) {
// Every element is its own parent initially
iota(parent.begin(), parent.end(), 0);
}
// Find the root representative of x.
// Path compression: every node on the path from x to root
// gets re-parented directly to root. This flattens the tree
// so future finds on these nodes are O(1).
int find(int x) {
if (parent[x] != x)
parent[x] = find(parent[x]); // recursive path compression
return parent[x];
}
// Merge the sets containing a and b.
// Union by size: attach smaller tree under larger tree's root.
// Returns true if a merge actually happened (false if already same set).
bool unite(int a, int b) {
a = find(a);
b = find(b);
if (a == b) return false; // already in the same set
// Ensure a is the larger set (by element count)
if (sz[a] < sz[b]) swap(a, b);
parent[b] = a; // b's root becomes child of a's root
sz[a] += sz[b]; // absorb b's count into a
comps--; // one fewer component
return true;
}
bool connected(int a, int b) {
return find(a) == find(b);
}
// Number of elements in the set containing x
int size(int x) {
return sz[find(x)];
}
int components() const {
return comps;
}
};
int main() {
int n, m;
cin >> n >> m;
DSU dsu(n);
while (m--) {
int type, u, v;
cin >> type >> u >> v;
if (type == 1) {
dsu.unite(u, v);
} else {
cout << (dsu.connected(u, v) ? "YES" : "NO") << "\n";
}
}
return 0;
}Operations Reference
With both optimizations active, every DSU operation is effectively constant time.
| Operation | Time (amortized) | Time (worst, no compression) | Space |
|---|---|---|---|
DSU(n) (init) | |||
find(x) | -- | ||
unite(a, b) | -- | ||
connected(a, b) | -- | ||
size(x) | -- | ||
components() | -- | ||
Only union by rank (no compression):
Union by Rank vs Union by Size
Two variants of the balancing heuristic exist. Union by rank tracks tree height:
cpp
int par[MAXN], rnk[MAXN];
void init(int n) { iota(par, par + n, 0); fill(rnk, rnk + n, 0); }
void unite(int a, int b) {
a = find(a); b = find(b);
if (a == b) return;
if (rnk[a] < rnk[b]) swap(a, b);
par[b] = a;
if (rnk[a] == rnk[b]) rnk[a]++;
}Union by size tracks subtree size:
cpp
int par[MAXN], sz[MAXN];
void init(int n) { iota(par, par + n, 0); fill(sz, sz + n, 1); }
void unite(int a, int b) {
a = find(a); b = find(b);
if (a == b) return;
if (sz[a] < sz[b]) swap(a, b);
par[b] = a;
sz[a] += sz[b];
}Both give
Problem Patterns & Tricks
Basic Connectivity Queries
Given a graph where edges are added one at a time, answer "are
Example problems:
- CF 1411C -- Sri Lanka (~1300)
- CSES -- Road Construction (classic)
Kruskal's MST
Sort edges by weight. Iterate in order and call unite(u, v). If the endpoints are already connected, skip (adding the edge would create a cycle). Otherwise add it to the MST. Stop when
cpp
sort(edges.begin(), edges.end()); // edges = {w, u, v}
DSU dsu(n);
long long mst = 0;
for (auto& [w, u, v] : edges)
if (dsu.unite(u, v)) mst += w;Example problems:
- CF 160D -- Edges in MST (~2000)
- CF 1513D -- GCD and MST (~1800)
Offline Queries—Reverse Processing
When queries involve deleting edges or removing elements, process them in reverse order—deletions become unions. This is far simpler than maintaining a deletable DSU.
Example problems:
DSU with Rollback
When you need to undo unions—e.g., divide-and-conquer on queries—use union by rank without path compression. Store each union on a stack and revert by restoring old parent/rank values. The trade-off is
cpp
struct RollbackDSU {
vector<int> p, rank_;
vector<pair<int,int>> history; // (node, old_parent)
RollbackDSU(int n) : p(n), rank_(n, 0) { iota(p.begin(), p.end(), 0); }
int find(int x) { while (p[x] != x) x = p[x]; return x; } // NO compression
int save() { return history.size(); }
void rollback(int checkpoint) {
while ((int)history.size() > checkpoint) {
auto [node, old_par] = history.back(); history.pop_back();
if (node < 0) rank_[~node]--; // rank restore sentinel
else p[node] = old_par;
}
}
bool unite(int a, int b) {
a = find(a); b = find(b);
if (a == b) return false;
if (rank_[a] < rank_[b]) swap(a, b);
history.push_back({b, p[b]});
p[b] = a;
if (rank_[a] == rank_[b]) { history.push_back({~a, 0}); rank_[a]++; }
return true;
}
};Worked example—rollback: 4 nodes {0, 1, 2, 3}, all singletons. Perform two unions, then roll back one.
Step 0: Initial state
text
p: [0, 1, 2, 3] rank: [0, 0, 0, 0]
history: []
0 1 2 3 (4 singletons)
checkpoint = save() -> 0Step 1: unite(0, 1)
find(0)=0, find(1)=1—different roots, equal rank. Push {1, 1} (node 1's old parent was 1) onto history. Attach 1 under 0. Equal rank → push rank sentinel {~0, 0}, bump rank[0].
text
p: [0, 0, 2, 3] rank: [1, 0, 0, 0]
history: [(1, 1), (~0, 0)]
^ ^
parent rank
restore restore
0 2 3
|
1
checkpoint_after_first = save() -> 2Step 2: unite(2, 3)
find(2)=2, find(3)=3—different roots, equal rank. Push {3, 3} onto history. Attach 3 under 2. Equal rank → push {~2, 0}, bump rank[2].
text
p: [0, 0, 2, 2] rank: [1, 0, 1, 0]
history: [(1, 1), (~0, 0), (3, 3), (~2, 0)]
0 2
| |
1 3
save() -> 4Step 3: rollback(checkpoint_after_first)—undo Step 2 only
Roll back to checkpoint 2 (history size was 2 after Step 1). Pop (~2, 0) → rank sentinel, so rank[2]-- → rank[2] = 0. Pop (3, 3) → normal parent restore, so p[3] = 3. History size is now 2, which equals the checkpoint. Stop.
text
p: [0, 0, 2, 3] rank: [1, 0, 0, 0]
history: [(1, 1), (~0, 0)]
0 2 3 Step 2 is fully undone!
| Step 1 (0-1 union) is still intact.
1Each unite pushes 1–2 entries (a parent change and an optional rank change). rollback pops them in reverse order, restoring exact prior state. Because there is no path compression, find() never mutates the parent array—rollback only ever needs to undo the effects of unite.
Example problems:
- CF 813F -- Bipartite Checking (~2400)
Weighted / Potential DSU
Each edge
cpp
struct WeightedDSU {
vector<int> p;
vector<long long> diff; // diff[x] = potential[x] - potential[parent[x]]
vector<int> rank_;
WeightedDSU(int n) : p(n), diff(n, 0), rank_(n, 0) { iota(p.begin(), p.end(), 0); }
pair<int, long long> find(int x) {
if (p[x] == x) return {x, 0};
auto [root, d] = find(p[x]);
p[x] = root;
diff[x] += d;
return {root, diff[x]};
}
// potential[b] - potential[a] = w
bool unite(int a, int b, long long w) {
auto [ra, da] = find(a);
auto [rb, db] = find(b);
if (ra == rb) return (db - da) == w; // consistency check
w = w + da - db; // adjust for root potentials
if (rank_[ra] < rank_[rb]) { swap(ra, rb); w = -w; }
p[rb] = ra; diff[rb] = w;
if (rank_[ra] == rank_[rb]) rank_[ra]++;
return true;
}
};Worked example—relative positions on a number line: diff[x] tracks potential[x] - potential[parent[x]]. For a root diff[r] = 0. The potential of any node diff values along the path from
Step 0: Initial state
text
p: [0, 1, 2, 3, 4] diff: [0, 0, 0, 0, 0] rank: [0, 0, 0, 0, 0]
Each person is their own root with potential 0.Step 1: unite(0, 1, 3)—"pos[1] - pos[0] = 3" (person 1 is 3m right of 0)
find(0) = (0, 0), find(1) = (1, 0). Different roots.
text
p: [0, 0, 2, 3, 4] diff: [0, 3, 0, 0, 0] rank: [1, 0, 0, 0, 0]
0 ----(+3)---- 1 2 3 4
Meaning: potential[1] - potential[0] = 3.
If we say pos[0] = 0, then pos[1] = 3.Step 2: unite(1, 2, 5)—"pos[2] - pos[1] = 5" (person 2 is 5m right of 1)
find(1) = (0, 3), find(2) = (2, 0). Different roots (0 != 2).
text
p: [0, 0, 0, 3, 4] diff: [0, 3, 8, 0, 0] rank: [1, 0, 0, 0, 0]
0
/ \
(+3) (+8)
/ \
1 2
Verification: pos[2] - pos[1] = diff[2] - diff[1] = 8 - 3 = 5. (ok)
Relative to root: pos[0]=0, pos[1]=3, pos[2]=8.Step 3: unite(3, 4, 2)—"pos[4] - pos[3] = 2" (person 4 is 2m right of 3)
find(3) = (3, 0), find(4) = (4, 0). Different roots.
text
p: [0, 0, 0, 3, 3] diff: [0, 3, 8, 0, 2] rank: [1, 0, 0, 1, 0]
0 3
/ \ |
(+3) (+8) (+2)
/ \ |
1 2 4
Two separate components: {0,1,2} and {3,4}.Step 4: unite(2, 3, -1)—"pos[3] - pos[2] = -1" (person 3 is 1m left of 2)
find(2) = (0, 8), find(3) = (3, 0). Different roots (0 != 3).
text
p: [0, 0, 0, 0, 3] diff: [0, 3, 8, 7, 2] rank: [2, 0, 0, 1, 0]
0
/ | \
(+3)(+8)(+7)
/ | \
1 2 3
|
(+2)
|
4
All 5 people in one component now.
Relative positions (using root 0 as origin):
pos[0] = 0
pos[1] = 3
pos[2] = 8
pos[3] = 7 (8 + (-1) = 7, consistent!)
pos[4] = 7 + 2 = 9
Query: "What is pos[4] - pos[1]?"
find(4): walk 4->3->0, accumulate diff: 2 + 7 = 9.
find(1): walk 1->0, accumulate diff: 3.
Answer: 9 - 3 = 6. Person 4 is 6m right of person 1.
Consistency check: unite(1, 4, 6) -> find(1)=(0,3), find(4)=(0,9).
Same root! Check: 9 - 3 = 6 = w. (ok) Consistent.Example problems:
- CF 1850H -- The Third Letter (~1700)
- AtCoder ABC 087D -- People on a Line (~1500)
Counting Components / Largest Component
Track component count or maximum component size while merging. Common in problems asking "after all edges are added, how many components remain?" or "what is the size of the largest component at each step?"
Example problems:
Bipartiteness Checking with DSU
Maintain a DSU where each node has a "parity" relative to its root. When adding edge
Example problems:
- CF 813F -- Bipartite Checking (~2400)
Contest Cheat Sheet
text
+--------------------------------------------------------------+
| UNION-FIND / DSU -- CHEAT SHEET |
+--------------------------------------------------------------+
| WHEN TO USE: |
| - "Are u and v connected?" with edge additions |
| - Kruskal's MST |
| - Grouping by equivalence relation |
| - Offline: reverse deletions into unions |
+--------------------------------------------------------------+
| TEMPLATE: |
| vector<int> p(n), rk(n, 0); |
| iota(p.begin(), p.end(), 0); |
| function<int(int)> find = [&](int x) { |
| return p[x] == x ? x : p[x] = find(p[x]); |
| }; |
| auto unite = [&](int a, int b) { |
| a = find(a); b = find(b); |
| if (a == b) return false; |
| if (rk[a] < rk[b]) swap(a, b); |
| p[b] = a; |
| if (rk[a] == rk[b]) rk[a]++; |
| return true; |
| }; |
+--------------------------------------------------------------+
| COMPLEXITY: O(alpha(n)) ~ O(1) per find/unite |
| SPACE: O(n) |
+--------------------------------------------------------------+
| WATCH OUT: find(u)==find(v), NOT p[u]==p[v] |
| init: iota or par[i]=i, rank[i]=0 |
| rollback => NO path compression |
+--------------------------------------------------------------+Common Mistakes & Debugging
Forgetting Path Compression
Writing find as a simple loop without reassigning parents gives p[x] = find(p[x]).
Comparing p[u] Instead of find(u)
p[u] is the immediate parent, not the root. After unions p[u] may be stale. Always compare find(u) == find(v).
Rank vs Size—Don't Mix
Pick one strategy and stick with it. If using rank, do not add element counts to it—rank measures tree height, not set size. If using size, merge by comparing sz[a] < sz[b] and accumulate sz[b] into sz[a] after each merge.
0-Indexed vs 1-Indexed
If the problem uses 1-indexed nodes, either allocate DSU(n+1) or subtract 1 from every input. Off-by-one here causes silent wrong answers or segfaults.
Path Compression + Rollback = Bug
Path compression mutates the tree structure irreversibly. If you need rollback, you must not use path compression. Use the iterative while (p[x] != x) x = p[x]; instead.
Forgetting to Initialize
par[i] must equal i for all vector<int> p(n), which silently makes every element appear to belong to set 0.
Calling find() in a Loop Without Caching
If you call find(x) repeatedly inside a loop body—say, for every neighbor—you redo the path walk each time. Cache the result before the loop with int rx = find(x); and use rx inside.
Debugging Tips
- Print the
parentarray after each operation to trace tree structure. - Validate the invariant:
find(find(x)) == find(x)must always hold. - If the answer is wrong, check that you call
find()at the right places—before every comparison, and after every union you intend to query.
Mental Traps
"DSU can answer distance / shortest-path queries." DSU is a connectivity oracle: it answers "are
text
DSU ANSWERS vs GRAPH ALGORITHM ANSWERS
========================================
Question Tool
-------------------- -----------------------
Same component? DSU (Y) O(alpha(n))
Shortest path? BFS/Dijkstra (Y)
Component count? DSU (Y) O(1)
Delete an edge? DSU (N) (append-only)
Relative potential? Weighted DSU (Y)
Actual path nodes? BFS/DFS (Y)"I can undo unions with path compression enabled." Path compression permanently mutates the parent array—it moves every visited node directly under the root. Rollback DSU works by restoring exact prior state from a stack. If compression has silently rewritten parents that were never part of an explicit unite() call, your rollback corrupts the structure. Rule: if you need rollback, disable path compression and use the iterative while (p[x] != x) x = p[x]; find.
Self-Test
Q1: What is the amortized complexity of
find()with both path compression and union by rank?Answer
O(alpha(n)) ~= O(1), where alpha is the inverse Ackermann function. For all practical input sizes, alpha(n) <= 4, making it effectively constant time.
Q2: Why is checking
p[u] == p[v]wrong for connectivity, whilefind(u) == find(v)is correct?Answer
`p[u]` is the direct parent, not necessarily the root. Without path compression having been applied on the exact path from u to root, p[u] might be an intermediate node. `find(u)` follows the chain to the actual root, so comparing roots correctly checks connectivity.
Q3: Why must path compression be disabled in rollback DSU?
Answer
Path compression modifies the parent pointers of all nodes on the find path, creating many small changes that are expensive to undo. Rollback DSU needs to reverse union operations by restoring parent/rank of exactly two nodes. With path compression, you'd need to track and reverse all compressed pointers—too expensive.
Q4: DSU supports connectivity queries. Name two things it cannot do efficiently that require different data structures.
Answer
(1) Disconnection / edge deletion—standard DSU only supports union, not split. Handling deletions requires offline processing (reverse time) or link-cut trees. (2) Shortest path queries between nodes—DSU only knows if nodes are connected, not the distance between them.
Q5: You union
pairs to form a single component. If you always attach the smaller tree under the larger (union by size) but skip path compression, what is the worst-case find()complexity?Answer
O(log n). Union by rank/size alone guarantees tree height <= log_2 n, since each merge at least doubles the smaller tree's size. Path compression improves this to O(alpha(n)), but even without it, O(log n) per find is guaranteed.
Q6: Write the minimal DSU (find with path compression, unite with rank) from memory. Can you do it in under 2 minutes?
Q7: Initialize a DSU for
elements using iota. What breaks if you skip the initialization?Answer
Every element's parent defaults to 0 (from vector<int> zero-init), so find(x) returns 0 for all x—every element appears to be in the same set from the start.
How to Practice This
The goal is to internalize DSU so that the three core functions—find, unite, connected—flow without thought. Speed-drill the minimal template:
| Attempt | Target | Notes |
|---|---|---|
| 1st | < 6 min | Focus on correctness—path compression in find, rank comparison in unite. |
| 2nd | < 4 min | Eliminate pauses—the three functions should be automatic. |
| 3rd+ | < 3 min | Competition-ready. DSU is a building block—it must be instant. |
Once the template is automatic, ramp up over three weeks:
- Week 1: Drill basic DSU daily until you hit 3 minutes consistently.
- Week 2: Practice weighted DSU—annotate edges and maintain consistency in
find. - Week 3: Implement rollback DSU and solve an offline connectivity problem with it.
- Ongoing: When you see "merge groups" or "connected components changing over time," reach for DSU first.
Practice Problems
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | Road Construction | CSES | Easy | Basic DSU + track component count and max size |
| 2 | Road Reparation | CSES | Easy | Kruskal's MST with DSU |
| 3 | The Third Letter | CF 1850H | Easy-Med | Weighted DSU / potential consistency |
| 4 | Mocha and Diana (Easy) | CF 1559D1 | Medium | Dual-DSU: unite in both forests simultaneously |
| 5 | GCD and MST | CF 1513D | Medium | Kruskal-like greedy with GCD property |
| 6 | Connected Components | CF 292D | Medium | Offline reverse-processing with DSU |
| 7 | People on a Line | AtCoder ABC087D | Medium | Weighted DSU—difference constraints |
| 8 | Edges in MST | CF 160D | Hard | Classify edges as "any MST" / "some MST" / "no MST" |
| 9 | Bipartite Checking | CF 813F | Hard | DSU with rollback + parity + segment tree divide-and-conquer |
| 10 | Extending Set of Points | CF 1140F | Hard | DSU rollback with segment tree on time |
Further Reading
- cp-algorithms—Disjoint Set Union—comprehensive reference with all variants (weighted, rollback, compress-by-halving).
- CF Blog—DSU on trees (small-to-large)—a different technique also called "DSU on trees" (not the same as Union-Find; confusingly named).
- CF Blog—Arpa's trick for offline DSU—divide-and-conquer on queries with DSU rollback.
- Tarjan 1975—"Efficiency of a Good but Not Linear Set Union Algorithm"—the original analysis proving the inverse Ackermann bound.
A note on "DSU on tree": The term "DSU on tree" (also called small-to-large merging or heavy path merging) refers to a technique for answering subtree queries on rooted trees in 10-dsu-on-tree.md.
See also:
- MST—Kruskal—Kruskal's algorithm relies on DSU to detect cycles while building the MST
- Graph Representation—DSU provides an implicit graph representation for connectivity queries
- Persistent DSU—rollback-capable DSU for offline connectivity problems
- Tree Basics—DSU maintains a forest of trees internally; understanding tree structure helps
- Data Structure Selection Guide—when to pick DSU vs. other structures
- Practice Ladder—Data Structures