Skip to content

Small-to-Large Merging

Quick Revisit

  • USE WHEN: merging child data structures into parent during tree DFS (distinct colors, set union on tree)
  • INVARIANT: always merge smaller set into larger — each element moves O(log n) times total
  • TIME: O(n log n) total | SPACE: O(n)
  • CLASSIC BUG: copying the larger set instead of swapping references/pointers — kills the amortization
  • PRACTICE: 08-ladder-mixed

Always merge the smaller structure into the larger one. A single rule that turns O(n2) brute-force into O(nlogn) across trees, sets, and DSU alike. This file is the principle and proof; for the contest-tuned implementation of the heavy-child variant, see DSU on Tree in Chapter 6.

Difficulty: [Intermediate]  ·  CF rating: 1600–1900 Prerequisites: Union-Find DSU, DFS and Tree DFS


Why This Exists

Suppose you have a rooted tree on n nodes. Each node has a color. For every node v, you want to know how many distinct colors appear in the subtree rooted at v.

The naive plan: run a DFS, and at each node maintain a set<int> of colors seen in its subtree. When you return from a child, merge the child's set into the parent's set.

For a balanced tree this is fine -- each merge is small.
But what about a chain?
        1 (red)
        |
        2 (blue)
        |
        3 (red)
        |
        4 (green)
        |
        5 (blue)

Node 5's set has 1 element. Merge into node 4's set: 1 copy. Node 4's set now has 2 elements -- merge into node 3: 2 copies. Node 3 has 3 elements -- merge into node 2: 3 copies. And so on. Total copies: 1+2+3++(n1)=O(n2).

The fix is embarrassingly simple: always merge the smaller set into the larger one. If the child's set is bigger than the parent's set, swap them first, then pour the smaller into the bigger.

Why does this work? Every time an element gets moved, the set it lands in is at least twice the size of the set it came from (because we only move elements from the smaller side). So each element can be moved at most O(logn) times before it sits in a set of size n. With n elements each moving O(logn) times, the total work is O(nlogn).

This is the exact same argument behind union by size in DSU. When you do union(u, v), you attach the smaller tree under the root of the larger tree. Each node's depth increases by 1 only when its component at least doubles, giving O(logn) depth. Small-to-large merging IS the DSU size heuristic, generalized to arbitrary data structures.

Small-to-Large vs DSU-on-Tree

Both solve "aggregate data over subtrees" in O(nlogn). The difference:

  • Small-to-large (set merging): Explicitly maintain a container per node. Merge smaller into larger during DFS. Works for any mergeable structure (set, map, multiset, even segment trees). Straightforward to implement.

  • DSU-on-tree (Euler tour trick): Don't maintain per-node containers at all. Instead, process the "heavy child" without clearing, then process light children with add/remove. Uses the heavy-light structure implicitly. Constant factor is often better because you avoid pointer-heavy containers, but the logic is trickier.

Pick set merging when the container semantics matter (e.g., you need a sorted set or a frequency map). Pick DSU-on-tree when you just need aggregate counts and want speed.


The Doubling Argument, Visually

Consider this tree with 8 nodes, each with a color label:

              1(R)
             / \
           2(B)  3(G)
          / \      \
        4(R) 5(B)  6(R)
        |          |
       7(G)       8(B)

Colors: R=red, B=blue, G=green. We want distinct color count per subtree.

DFS processes leaves first. Here is the merge sequence with small-to-large:

Step 1: Leaf nodes get singleton sets.
  node 7: {G}        size 1
  node 5: {B}        size 1
  node 8: {B}        size 1

Step 2: Process node 4. Has child 7.
  node 4 set: {R}    size 1
  node 7 set: {G}    size 1
  Same size -- pick either direction. Merge 7 into 4.
  node 4: {R, G}     size 2     moves: 1

Step 3: Process node 6. Has child 8.
  node 6 set: {R}    size 1
  node 8 set: {B}    size 1
  Merge 8 into 6.
  node 6: {R, B}     size 2     moves: 1

Step 4: Process node 2. Children: 4, 5.
  node 2 set: {B}           size 1
  node 4 set: {R, G}        size 2
  node 5 set: {B}           size 1
  Child 4 is larger than node 2 -- SWAP. node 2 takes {R, G}.
  Merge old node 2 ({B}) into new node 2: {R, G, B}   moves: 1
  Merge node 5 ({B}) into node 2: B already present.   moves: 1
  node 2: {R, G, B}  size 3    total moves this step: 2

Step 5: Process node 3. Has child 6.
  node 3 set: {G}           size 1
  node 6 set: {R, B}        size 2
  Child 6 is larger -- SWAP. node 3 takes {R, B}.
  Merge old node 3 ({G}) into new: {R, B, G}   moves: 1
  node 3: {R, B, G}  size 3

Step 6: Process node 1. Children: 2, 3.
  node 1 set: {R}              size 1
  node 2 set: {R, G, B}        size 3
  node 3 set: {R, B, G}        size 3
  Child 2 is largest -- SWAP. node 1 takes {R, G, B}.
  Merge old node 1 ({R}): already present.   moves: 1
  Merge node 3 ({R, B, G}): all present.     moves: 3
  node 1: {R, G, B}  size 3    total moves: 4

Total element moves across all steps: 1+1+2+1+4=9.

Naive merge (always child into parent) on this tree would do similar work here, but on a worst-case chain of n nodes it does O(n2). Small-to-large guarantees O(nlogn) regardless of tree shape.

The Doubling Picture

Track how many times each element moves:

Element journey (tracking how many times each item is copied):

  G at node 7: created
    -> moved into node 4  (set size 1 -> 2)  .... move #1
    -> stays at node 2 via swap               .... free
    -> stays at node 1 via swap               .... free
  Total moves for this G: 1

  R at node 4: created
    -> stays at node 2 via swap               .... free
    -> stays at node 1 via swap               .... free
  Total moves for this R: 0

  B at node 5: created
    -> moved into node 2  (set size 1 -> 3)  .... move #1
    -> stays at node 1 via swap               .... free
  Total moves for this B: 1

  R at node 6: created
    -> stays at node 3 via swap               .... free
    -> moved into node 1  (set size 3 -> 3)  .... move #1
  Total moves for this R: 1  (duplicate, doesn't increase set)

  B at node 8: created
    -> moved into node 6  (set size 1 -> 2)  .... move #1
    -> stays at node 3 via swap               .... free
    -> moved into node 1  (set size 3 -> 3)  .... move #2
  Total moves for B(8): 2

The key: every time an element moves, the receiving set is at least as large as the source set. After the move, the element is in a set at least twice its previous size. So: max moves per element =log2n.

Total work=elementsO(logn)=O(nlogn)

C++ STL Tools for Merging

The main containers you'll merge small-to-large:

ContainerInsert one elementMerge k elements from smaller
set<T>O(logn)O(klogn)
map<K,V>O(logn)O(klogn)
unordered_set<T>O(1) amortizedO(k) amortized
multiset<T>O(logn)O(klogn)

Critical operations:

  • swap(a, b) -- O(1) for all STL containers. This is how you "take" the larger child's set without copying. Never skip this.
  • a.insert(b.begin(), b.end()) -- range insert. For sets/maps.
  • a.merge(b) -- C++17. Moves nodes from b into a without reallocation. Faster than insert for node-based containers (set, map). After the call, b contains only elements that had duplicates in a.
  • a.size() -- O(1). Use to decide which is smaller.

std::set::merge is preferred over manual iteration + insert when available:

cpp
set<int> a = {1, 3, 5};
set<int> b = {2, 3, 4};
a.merge(b);
// a = {1, 2, 3, 4, 5}, b = {3} (duplicate stayed in b)

For small-to-large, you almost always want the duplicate-free version, so merge works perfectly -- leftover duplicates in the smaller set don't matter since you discard it.

Data structure choice matters. set insertion is O(logn), giving O(nlog2n) total. unordered_set is O(1) average, giving true O(nlogn). For sorted access, use set; for pure membership, unordered_set is faster.


Implementation: Set Merging

The Problem

Given a rooted tree with n nodes (rooted at 1), each node has a color cv{1,,n}. For each node, output the number of distinct colors in its subtree.

Contest-Ready Version

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    scanf("%d", &n);
    vector<int> color(n + 1);
    for (int i = 1; i <= n; i++) scanf("%d", &color[i]);

    vector<vector<int>> adj(n + 1);
    for (int i = 0; i < n - 1; i++) {
        int u, v;
        scanf("%d%d", &u, &v);
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    vector<int> ans(n + 1);
    vector<set<int>*> node_set(n + 1);

    // Iterative DFS with post-order processing
    vector<int> order;
    vector<int> par(n + 1, 0);
    vector<bool> visited(n + 1, false);
    stack<int> stk;
    stk.push(1);
    visited[1] = true;
    while (!stk.empty()) {
        int v = stk.top(); stk.pop();
        order.push_back(v);
        for (int u : adj[v]) {
            if (!visited[u]) {
                visited[u] = true;
                par[u] = v;
                stk.push(u);
            }
        }
    }

    // Process in reverse BFS order (leaves first)
    reverse(order.begin(), order.end());
    for (int v : order) {
        node_set[v] = new set<int>();
        node_set[v]->insert(color[v]);
        for (int u : adj[v]) {
            if (u == par[v]) continue;
            // Merge smaller into larger
            if (node_set[u]->size() > node_set[v]->size())
                swap(node_set[v], node_set[u]);
            node_set[v]->merge(*node_set[u]);
            delete node_set[u];
            node_set[u] = nullptr;
        }
        ans[v] = node_set[v]->size();
    }

    for (int i = 1; i <= n; i++)
        printf("%d%c", ans[i], " \n"[i == n]);

    delete node_set[1];
    return 0;
}

Walkthrough

The core loop does three things per node:

  1. Create a set for the node, insert its own color.
  2. For each child, compare set sizes. If the child's set is larger, swap pointers (this is O(1) -- we swap set* pointers, not the sets themselves). Then merge the smaller set into the larger using set::merge.
  3. Record the answer as the set's size.

The swap-pointers trick is essential. Without it, you'd always copy into the parent's set regardless of size, losing the O(nlogn) guarantee.

Memory: we delete child sets after merging to avoid holding O(n) sets simultaneously.

Cleaner Version Using Recursive DFS

If stack depth isn't a concern (tree isn't a chain of 105+ nodes):

cpp
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 200005;
vector<int> adj[MAXN];
int color[MAXN], ans[MAXN];
set<int>* dfs_set[MAXN];

void dfs(int v, int p) {
    dfs_set[v] = new set<int>();
    dfs_set[v]->insert(color[v]);
    for (int u : adj[v]) {
        if (u == p) continue;
        dfs(u, v);
        if (dfs_set[u]->size() > dfs_set[v]->size())
            swap(dfs_set[v], dfs_set[u]);
        dfs_set[v]->merge(*dfs_set[u]);
        delete dfs_set[u];
    }
    ans[v] = dfs_set[v]->size();
}

int main() {
    int n;
    scanf("%d", &n);
    for (int i = 1; i <= n; i++) scanf("%d", &color[i]);
    for (int i = 0; i < n - 1; i++) {
        int u, v;
        scanf("%d%d", &u, &v);
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    dfs(1, 0);
    for (int i = 1; i <= n; i++)
        printf("%d%c", ans[i], " \n"[i == n]);
    delete dfs_set[1];
}

Implementation: DSU-on-Tree (Heavy-Child Variant)

DSU-on-tree (also called "small-to-large on Euler tour" or the "heavy child trick") drops explicit set containers and uses a global array of counters instead. The same amortization argument applies — each node is touched at most O(logn) times, because every node-to-root path has at most logn light edges.

The shape of the algorithm:

  1. Find the heavy child (largest subtree) of each node.
  2. DFS into all light children first; undo their contributions.
  3. DFS into the heavy child; keep its contributions.
  4. Add the current node + all light subtrees on top.
  5. Record the answer.
  6. If we ourselves are a light child of our parent, undo everything.

Why this works: the heavy child's data is reused for free — we never copy or clear it. Light children's contributions cost their subtree size, but a node is only ever inside a light subtree on O(logn) ancestors, so total add/remove work is O(nlogn).

For the full contest-ready implementation, complexity proof, and standard problem variants, see DSU on Tree (Ch 6). The two files split deliberately: this file owns the principle; that file owns the implementation.

Set Merging vs DSU-on-Tree at a glance

AspectSet mergingDSU-on-tree
Container per nodeYes (set<int>*)No (global cnt[])
MemoryO(nlogn) worst caseO(n)
Constant factorHigher (alloc, ptr-heavy)Lower (array ops)
FlexibilityAny mergeable structureNeeds add/remove ops
Difficulty to codeEasyMedium

Complexity Reference

OperationTimeNotes
Small-to-large set merge (total)O(nlog2n)O(nlogn) merges, each O(logn) insert
Small-to-large with unordered_setO(nlogn) expectedO(1) insert amortized
Small-to-large with set::mergeO(nlog2n) worstNode transfer avoids realloc
DSU-on-treeO(nlogn)Each node added/removed O(logn) times
DSU union by sizeO(nlogn) totalSame analysis
Naive merge (always small into big)O(n2) worst caseChain tree

The extra logn factor in set merging comes from the O(logn) cost per set::insert. If you replace set with a hash set, you get O(nlogn) expected. DSU-on-tree avoids this because add/remove are O(1) array operations.


Problem Patterns and Tricks

Pattern 1: Distinct Values in Subtree

The classic application. Maintain set<int> per node, merge small-to-large. Answer is set.size() after merging all children.

Variant: "number of distinct values occurring at least k times." Use map<int,int> (color -> count) instead. Merge by iterating the smaller map and updating counts in the larger.

Pattern 2: Frequency Maps for Subtree Aggregates

For "sum of colors that appear exactly once" or "most frequent color in subtree," maintain a map<int,int> mapping color to frequency, plus auxiliary variables tracking the answer.

cpp
// When merging entry (color c, count k) from smaller into larger:
// 1. Remove old contribution of c from answer
// 2. Update frequency: larger[c] += k
// 3. Add new contribution of c to answer

This pattern handles any query that decomposes over color frequencies.

Pattern 3: DSU-on-Tree for Counting Queries

DSU-on-tree shines when the "aggregate" is a simple counter or sum. Examples:

  • Count of distinct colors (maintain distinct counter)
  • Sum of values at nodes with a specific color
  • Maximum frequency color in subtree

You define add(v) and remove(v) to update global state when node v enters or leaves the current window. The framework handles the rest.

Pattern 4: Choosing Between Techniques

Need sorted order or set operations?         --> set merge
Need just counts / sums?                     --> DSU-on-tree
Need to merge segment trees?                 --> small-to-large on seg trees
Offline, can sort queries?                   --> consider Mo on trees
Need persistent results across subtrees?     --> set merge (keeps data)

Pattern 5: Merging Sorted Structures Beyond Sets

Small-to-large works for any data structure where "merge" means inserting elements one by one. Examples:

  • Vectors (sorted): Merge smaller sorted vector into larger using binary search insertion. O(nlog2n) total.
  • Segment trees: Merge two segment trees by walking both simultaneously. With dynamic nodes, merging a tree of size k into a larger one costs O(klogC) where C is the value range. Total: O(nlognlogC).
  • Fenwick trees / BITs: Less natural to merge -- consider DSU-on-tree with a global BIT instead.

Pattern 6: Weighted DSU (Edge-Weight Aggregation)

When edges have weights and you need subtree queries involving both structure and values, combine DSU union-by-size with weight tracking. The small-to-large principle ensures each element's weight is updated at most O(logn) times.


Gotchas and Debugging

Swap Pointers, Not Contents

The most common bug: copying the larger set into a new set instead of swapping.

cpp
// WRONG: O(n^2) -- copies the big set every time
for (int x : *child_set) parent_set->insert(x);

// RIGHT: O(1) swap, then merge small into big
if (child_set->size() > parent_set->size())
    swap(parent_set, child_set);
parent_set->merge(*child_set);

If you're using set<int> node_set[MAXN] (by value, not pointer), then swap does swap contents in O(1) for STL containers -- but only if you swap the actual container objects, not copies.

Forgetting to Merge Into the Correct Side

With pointer-based sets, after swap(node_set[v], node_set[u]), the parent v now holds what was u's set. Make sure subsequent children merge into node_set[v], not some stale reference.

Root Handling

The root's answer is just its final set size. But if you're tracking something beyond size (like sum of frequencies), make sure the root's own contribution is included before reading the answer. A common mistake: inserting the node's own color after merging children, then forgetting to do it for the root.

Fix: always insert the node's own color first, before processing children.

Off-by-One in Subtree Sizes (DSU-on-Tree)

In DSU-on-tree, the Euler tour range for node v's subtree is [tin[v],tout[v]) -- a half-open interval. If you use a closed interval, you'll either miss one node or process an extra node from a sibling's subtree.

cpp
// Correct: half-open [tin[v], tout[v])
for (int t = tin[u]; t < tout[u]; t++) add(euler[t]);

// WRONG: closed interval includes one node too many
for (int t = tin[u]; t <= tout[u]; t++) add(euler[t]);

Memory Leaks in Contest

Allocating new set<int> per node and forgetting to delete after merge won't cause WA, but can cause MLE on tight problems. Either delete immediately after merge or use an arena/pool allocator.

Confusing Heavy Child with Largest Set

In set merging, "heavy" means the child with the largest set after its subtree is processed -- not the child with the largest subtree size. These differ when colors repeat. For small-to-large set merge, compare set.size(), not subtree_size. For DSU-on-tree, compare subtree_size (since we don't have explicit sets).

Mental traps

"I'll merge in any order — it's the same idea." No. The direction is the entire basis of the complexity guarantee. Repeatedly merging a set of size n/2 into a set of size 1 gives O(n2).

"I don't need the proof, just the rule." Knowing the proof tells you exactly when the technique applies. The invariant is "after each move, the containing set is at least twice as large." If you break the rule, the doubling fails and the bound fails.

"This is only a tree-DFS technique." The DFS is the most common application, but the principle applies anywhere you repeatedly merge two groups: offline graph problems, set partitioning, weighted union-find with set contents.

"DSU-on-tree is the same as small-to-large." Related but not identical. DSU-on-tree inherits the heavy child's data without copying (an optimization); explicit small-to-large merges elements one by one. DSU-on-tree uses the heavy-light structure implicitly.

Destroy small buckets after merging

A subtle TLE pattern: in Lomsat gelral (CF 600E) with std::map<int,int>, after merging a child's map into the parent, forgetting to clear() the child's map leaves it alive. Later, when the parent is merged upward, the ancestor sees a "small" bucket that is actually still full. The amortization shatters because the size-comparison lies. Rule: pour the small bucket into the big one, then destroy the small bucket.

Debug Drills

Drill 1: Merging sets on a tree -- but the answer is wrong for the root.

cpp
void dfs(int v, int p) {
    sets[v].insert(color[v]);
    for (int u : adj[v]) {
        if (u == p) continue;
        dfs(u, v);
        if (sets[v].size() < sets[u].size())
            swap(sets[v], sets[u]);
        for (int x : sets[v])   // <-- look here
            sets[v].insert(x);
    }
    ans[v] = sets[v].size();
}
What's wrong?

The merge loop iterates over sets[v] (the larger set) and inserts into sets[v] -- it should iterate over sets[u] (the smaller set) and insert into sets[v]. Iterating and inserting into the same container is also undefined behavior for std::set. Fix: for (int x : sets[u]) sets[v].insert(x);

Drill 2: DSU union-by-size -- but find is slow on some inputs.

cpp
int par[MAXN], sz[MAXN];
int find(int x) {
    while (par[x] != x) x = par[x];
    return x;
}
void unite(int a, int b) {
    a = find(a); b = find(b);
    if (a == b) return;
    if (sz[a] < sz[b]) swap(a, b);
    par[b] = a;
    // sz[a] += sz[b];   <-- missing line
}
What's wrong?

After attaching b under a, the code never updates sz[a]. So sz[a] stays at 1, and future unions always think every component is size 1 -- effectively producing a random tree shape instead of a balanced one. Worst-case find becomes O(n). Fix: add sz[a] += sz[b]; after par[b] = a;.

Drill 3: Small-to-large merging of std::map for frequency counts -- correct logic but TLE.

cpp
void merge_into(map<int,long long>& big, map<int,long long>& small) {
    for (auto& [key, val] : small)
        big[key] += val;
    small.clear();
}

void dfs(int v, int p) {
    freq[v][color[v]] = 1;
    for (int u : adj[v]) {
        if (u == p) continue;
        dfs(u, v);
        merge_into(freq[v], freq[u]);  // always merge u into v
    }
}
What's wrong?

The code always merges the child into the parent without checking sizes. If the child's map is larger than the parent's, you should swap them first. Without the swap, a chain-shaped tree degrades to O(n²). Fix: before merge_into, add if (freq[v].size() < freq[u].size()) swap(freq[v], freq[u]); to ensure you always iterate over the smaller map.


Self-Test

Drawn from reflection notes

Without opening any reference:

  • [ ] Prove the O(nlogn) complexity of small-to-large merging in 3-4 sentences. If you can't, you don't own this technique yet.

  • [ ] Write the small-to-large DFS template from memory -- inherit the largest child's set, merge smaller children into it. Include the swap step.

  • [ ] Explain why std::swap on two std::set objects is O(1) -- what does swap actually do internally? Why is this critical for the technique?

  • [ ] Identify a problem that looks like small-to-large but isn't -- where the set operation is not a union, or where the "small into large" direction doesn't apply. What goes wrong?

  • [ ] State the time complexity for small-to-large using std::set (O(logn) insert) vs std::unordered_set (O(1) average insert). What is the total in each case? When would you prefer one over the other?


Practice Problems

ProblemSourceDifficultyTechniqueLink
Distinct ColorsCSES1600Set merge / DSU-on-treecses.fi/problemset/task/1139
Lomsat gelralCF 600E1900DSU-on-tree (most frequent color)codeforces.com/contest/600/problem/E
Arpa's letter-marked treeCF 741D2100DSU-on-tree + bitmaskcodeforces.com/contest/741/problem/D
Blood Cousins ReturnCF 246E1800Set merge for distinct names at depthcodeforces.com/contest/246/problem/E
Tree QueriesAtCoder ABC 202 E1700Euler tour + counting (DSU-on-tree variant)atcoder.jp/contests/abc202/tasks/abc202_e
Propagating treeCF 383C1700Euler tour with range updatescodeforces.com/contest/383/problem/C
Count on a tree IISPOJ COT22000Mo on trees (alternative technique to compare)spoj.com/problems/COT2

Start with CSES Distinct Colors (pure set merge). Then CF 600E for DSU-on-tree. CF 741D is a great "next step" that combines the technique with bitmask tricks.

Rating progression:

CF RatingWhat to Master
1200Understand why naïve merging of sets is O(n²). Implement basic DSU with union by size. Know that swap is O(1) for STL containers.
1500Implement DSU-on-tree (Euler tour + heavy child reuse). Solve "count distinct colors in subtree" type problems. Understand the log n amortized bound.
1800Combine small-to-large with additional data structures (merging std::map or std::set with order statistics). Handle queries that require rollback or persistent structures.
2100Apply to non-tree structures (virtual trees, auxiliary graphs). Merge segment trees or Fenwick trees with the small-to-large principle. Prove tight bounds for specific problem constraints.

Further Reading

Built for the climb from Codeforces 1100 to 2100.