DSU on Tree (Small-to-Large Merging)

Quick Revisit
USE WHEN: Offline subtree aggregate queries for every vertex (distinct count, mode, frequency) where N ≤ 10⁵
INVARIANT: Keep heavy child's data, merge light children into it; each element re-inserted O(log n) times since each merge at least doubles the receiving set
TIME: O(n log n) | SPACE: O(n)
CLASSIC BUG: Choosing heavy child by subtree depth instead of subtree size — destroys the O(log n) merge guarantee and degrades to O(n²)
PRACTICE: 05-ladder-graphs

Answer "for every vertex $v$ , compute some aggregate over the multiset of values in $v$ 's subtree" in $O (n \log n)$ total time -- by keeping the heavy child's data and merging all light children into it.

Difficulty: [Intermediate]ID: AL-23 Prerequisites: DFS & Tree DFS. Euler Tour & LCA is helpful for the offline traversal pattern. The name is misleading -- this technique does not use the DSU (union-find) data structure; it shares only the small-to-large idea. If you have not learned union-find, you can still learn DSU on tree. CF Rating: 1700-2000

Name disambiguation. "DSU on tree" is a confusing name. This is not the disjoint-set-union data structure with find and union operations. It is small-to-large merging on DFS subtrees: a recursive DFS that, at each node, keeps the largest child's data and re-inserts elements from the smaller siblings. The "DSU" label survives only because the small-to-large trick was first analysed in the context of weighted union-find. If the name is throwing you off, mentally replace "DSU on tree" with "small-to-large on subtrees" -- that is what the algorithm actually is.

If I Had 5 Minutes

What: Answer "for each vertex, compute X over its subtree" for ALL vertices in $O (n \log n)$ .
When: Offline subtree queries -- distinct count, mode, frequency map -- where $N \leq 10^{5}$ .
Core trick: Keep the heavy (largest) child's data, merge every light child into it.
Complexity: $O (n \log n)$ total -- each element re-inserted $O (\log n)$ times because each move at least doubles the receiving set's size.
Gotcha: Heavy child = largest subtree size, NOT deepest. Swapping merge direction -> $O (n^{2})$ .

Mnemonic Anchor

"Keep the King, Move the Pawns" -- always keep the larger set intact and re-insert elements from the smaller one. The King (heavy child's data) stays on the throne; the Pawns (light children's data) walk over to join it.

Intuition

You have a tree with 7 nodes, rooted at node 1. Each node has a color:

         1(R)
        / \
      2(B)  3(R)
      / \     \
   4(R) 5(B)  6(G)
   |
   7(G)

Task: for every vertex $v$ , count the number of distinct colors in $v$ 's subtree.

The brute-force approach runs a DFS from every vertex, collecting colors and counting distinct ones. For a leaf like node 7, you look at just one color. For the root, you walk all 7 nodes. In the worst case (a long chain), this is $O (n)$ per vertex, so $O (n^{2})$ total.

With $n = 2 \times 10^{5}$ that is $4 \times 10^{10}$ operations -- far too slow.

We need a way to reuse the color sets we already built for children instead of recomputing from scratch for each vertex.

Keep the answer for the "heavy" (largest-subtree) child, and merge all "light" children into it. Each element participates in $O (\log n)$ merge operations across the entire tree because every merge at least doubles the set size it enters.

Why does this work? When we merge a small set of size $s$ into a larger set, every element of the small set moves. But after the merge, the combined set has size $\geq 2 s$ . An element can only double its containing set $⌊ \log_{2} n ⌋$ times before the set has size $n$ . So each element is moved at most $O (\log n)$ times total -- across the entire tree, not per vertex.

This is the small-to-large (or "weighted union") principle: always merge the smaller set into the larger one.

Real-life analogy: Like moving into your spouse's bigger apartment instead of making them move into yours -- the total moving effort is minimized when the smaller household relocates.

Historical note: The technique is sometimes called "small-to-large merging" or "heavy-light merging on sets." It was popularized in competitive programming around 2015-2016, with the Codeforces blog post by Arpa being a key reference. The underlying principle (weighted union) dates back to classic union-find analysis.

💡 The Aha Moment: When you keep the heavy child's data and only re-insert light children, you're not doing extra work -- you're avoiding redundant work. The heavy child's set is already correct for its subtree. Re-building it from scratch would cost $O (sz [h e a v y])$ every time, but that's the largest subtree. By keeping it, you only pay for the light children, whose total sizes are bounded by the small-to-large argument. That single decision -- "don't touch the biggest piece" -- is what turns $O (n^{2})$ into $O (n \log n)$ .

Visual Walkthrough

Step 1 -- Compute subtree sizes. Numbers in parentheses are subtree sizes.

           1(7)
          / \
       2(4)  3(2)
       / \     \
    4(2) 5(1)  6(1)
    |
    7(1)

Step 2 -- Identify the heavy child at each internal node. Heavy child = child with the largest subtree.

    Heavy child selections:
      1 -> heavy: 2 (sz 4)   light: 3 (sz 2)
      2 -> heavy: 4 (sz 2)   light: 5 (sz 1)
      3 -> heavy: 6 (sz 1)   (only child)
      4 -> heavy: 7 (sz 1)   (only child)

    Edge classification:
           1
          //\
        2    3
       //\    \\
      4   5    6
     //
     7

    // = heavy edge      \ = light edge

Step 3 -- Process bottom-up. At each vertex, inherit the heavy child's color set, then insert light children's elements one by one.

    Process node 7:  set = {G}          distinct = 1
    Process node 4:  inherit {G} from 7
                     add color(4)=R     set = {G, R}      distinct = 2
    Process node 5:  set = {B}          distinct = 1
    Process node 2:  inherit {G, R} from heavy child 4
                     merge {B} from light child 5
                     add color(2)=B     set = {G, R, B}   distinct = 3
    Process node 6:  set = {G}          distinct = 1
    Process node 3:  inherit {G} from 6
                     add color(3)=R     set = {G, R}       distinct = 2
    Process node 1:  inherit {G, R, B} from heavy child 2
                     merge {G, R} from light child 3
                       G already in set (no new distinct)
                       R already in set (no new distinct)
                     add color(1)=R     set = {G, R, B}   distinct = 3

Light child 3's set had 2 elements; they merged into a set of size 3. Light child 5's set had 1 element; it merged into a set of size 2. Each element was moved at most twice (once at node 2, once at node 1) -- consistent with the $O (\log n)$ bound.

Detailed Step-by-Step: Node-by-Node Execution

Let us trace exactly which sets merge into which, showing the data structure state after each operation. We process nodes in post-order (leaves first, root last).

  Tree (colors in parentheses):        Subtree sizes:
         1(R)                                1(7)
        / \                                 / \
      2(B)  3(R)                         2(4)  3(2)
      / \     \                          / \     \
   4(R) 5(B)  6(G)                    4(2) 5(1)  6(1)
   |                                   |
   7(G)                                7(1)

Round 1 -- Node 7 (leaf)

  Current node: 7    Color: G    Children: none
  Action: Create new set {G}
  State: set(7) = {G}            answer(7) = 1 distinct

Round 2 -- Node 4 (heavy child = 7)

  Current node: 4    Color: R    Children: [7(heavy)]
  Action: Inherit set from heavy child 7: {G}
          Add self color R
  State: set(4) = {G, R}         answer(4) = 2 distinct
  
  Merges: 0 (only inherited)
  +---+---+
  | G | R |  <-- set(4), inherited from 7, plus 4's own color
  +---+---+

Round 3 -- Node 5 (leaf)

  Current node: 5    Color: B    Children: none
  Action: Create new set {B}
  State: set(5) = {B}            answer(5) = 1 distinct

Round 4 -- Node 2 (heavy child = 4, light child = 5)

  Current node: 2    Color: B    Children: [4(heavy), 5(light)]
  Action: Inherit set from heavy child 4: {G, R}
          Merge light child 5's set {B} into it
            Insert B --> new color, distinct++
          Add self color B (already present, no change to distinct)
  
  set(4)={G,R}    set(5)={B}
       |               |
       v               v
  +---+---+       +---+
  | G | R |  <--  | B |  (B moves to the larger set)
  +---+---+       +---+
       |
       v
  +---+---+---+
  | G | R | B |   <-- set(2) after merge
  +---+---+---+
  
  State: set(2) = {G, R, B}      answer(2) = 3 distinct
  Merges: 1 element moved (B from set(5))

Round 5 -- Node 6 (leaf)

  Current node: 6    Color: G    Children: none
  Action: Create new set {G}
  State: set(6) = {G}            answer(6) = 1 distinct

Round 6 -- Node 3 (heavy child = 6)

  Current node: 3    Color: R    Children: [6(heavy)]
  Action: Inherit set from heavy child 6: {G}
          Add self color R
  State: set(3) = {G, R}         answer(3) = 2 distinct

Round 7 -- Node 1 (root) (heavy child = 2, light child = 3)

  Current node: 1    Color: R    Children: [2(heavy), 3(light)]
  Action: Inherit set from heavy child 2: {G, R, B}
          Merge light child 3's set {G, R} into it
            Insert G --> already present, no change
            Insert R --> already present, no change
          Add self color R (already present, no change)
  
  set(2)={G,R,B}    set(3)={G,R}
       |                  |
       v                  v
  +---+---+---+      +---+---+
  | G | R | B |  <--  | G | R |  (G,R move but are duplicates)
  +---+---+---+      +---+---+
       |
       v
  +---+---+---+
  | G | R | B |   <-- set(1) unchanged (both were duplicates)
  +---+---+---+
  
  State: set(1) = {G, R, B}      answer(1) = 3 distinct
  Merges: 2 elements moved (G and R from set(3), both already present)

Summary of element movements across all rounds:

  Element 7(G): created at node 7, inherited to 4, inherited to 2, inherited to 1
                Never "moved" (always in the heavy path set)  -> 0 re-insertions
  
  Element 4(R): added at node 4, inherited to 2, inherited to 1
                Never "moved"                                 -> 0 re-insertions
  
  Element 5(B): created at node 5, MOVED into set(2)
                Then inherited to 1                           -> 1 re-insertion
  
  Element 6(G): created at node 6, inherited to 3, MOVED into set(1)
                                                              -> 1 re-insertion
  
  Element 3(R): added at node 3, MOVED into set(1)           -> 1 re-insertion
  
  Total re-insertions: 3  (for n=7, O(n log n) = ~19, well within bound)

The Invariant

After processing vertex $v$ , the data structure contains exactly the multiset of values from $v$ 's subtree. Each element participates in at most $O (\log n)$ merge operations across the entire tree.

This invariant guarantees correctness (the set is always accurate for the current subtree) and efficiency ( $O (n \log n)$ total insertions).

When to Reach for This

Trigger phrases in problem statements:

"For each vertex, count distinct ..."
"For each subtree, find the mode / most frequent element ..."
"Answer a query for every vertex" (offline, subtree-scoped)
"Merge information from children" where the information is a set or frequency map

If the query is about paths (not subtrees), DSU on tree does not directly apply -- consider HLD or centroid decomposition instead.

When NOT to Use This

Queries on paths, not subtrees. DSU on tree only sees subtree aggregates. For path queries ( $u \to v$ ), use Heavy-Light Decomposition or Centroid Decomposition.
Online updates between queries. DSU on tree is an offline technique -- it computes all answers in one pass. If the tree changes or values update between queries, use HLD + segment tree or link-cut trees.
Persistent results across modifications. If you need to roll back or maintain multiple versions of the subtree aggregate, consider Persistent Segment Tree or rollback DSU.
Aggregates that can't be maintained incrementally. DSU on tree inserts elements one by one. If the aggregate (e.g., median, or some non-decomposable function) can't be updated incrementally with $O (\log n)$ per insertion, the total complexity degrades. For median-type queries, consider merge sort trees or persistent order-statistics structures.
Queries on a single specific subtree. If you only need the answer for one vertex (not all vertices), a simple DFS in $O (n)$ suffices -- DSU on tree's machinery is overkill.

What the Code Won't Teach You

DSU on tree has nothing to do with the union-find (DSU) data structure. The name is misleading. It refers to the "small-to-large" merging paradigm -- always merge the smaller collection into the larger one. No path compression, no union by rank, no find() function. If you arrive expecting union-find operations, you're looking at the wrong algorithm.

The Euler tour is the hidden enabler. Subtree $(v)$ occupies a contiguous range $[tin [v], tout [v])$ in DFS order. This means "add all nodes in subtree $v$ " is just a range scan, and "remove all nodes" is the reverse scan. Without this contiguity, the add/remove operations would require explicit set membership tracking.

  Euler tour makes subtrees contiguous:

  Tree:           DFS order:    Euler tour ranges:
       1          1 2 4 7 5 3 6   tin  tout
      / \                          1:  0    7
     2   3        Subtree of 2:    2:  1    5
    / \   \       positions 1..4   3:  5    7
   4   5   6                       4:  2    4
   |              Subtree of 3:    5:  4    5
   7              positions 5..6   6:  6    7
                                   7:  3    4
  Adding subtree(2) = scanning positions [1..5) in DFS order
  No set lookup needed -- just iterate the range.

The heavy-light split here is NOT the same as HLD. The heavy child definition is identical (child with largest subtree), but the algorithms differ completely. HLD decomposes paths into chain segments for path queries. DSU on tree processes subtrees, inheriting from the heavy child to amortize re-computation. Same structural decomposition, entirely different purpose.

  Same decomposition, different use:

  HLD:                           DSU on tree:
  +---------------------------+  +---------------------------+
  | Decomposes PATHS into     |  | Processes SUBTREES by     |
  | O(log n) heavy chains     |  | inheriting heavy child's  |
  | for path queries/updates  |  | data structure, merging   |
  |                           |  | light children into it    |
  | Answers: path sum,        |  | Answers: distinct count,  |
  |   path max, path update   |  |   mode, frequency per     |
  |                           |  |   subtree (offline)       |
  +---------------------------+  +---------------------------+

Algorithm

High-Level Steps

Root the tree and compute subtree sizes via DFS.
Identify the heavy child of each node (child with largest subtree).
DFS post-order. For each vertex $v$ :
- Recurse into all light children first. After each light child finishes and its answer is recorded, discard (or "undo") the light child's contribution.
- Recurse into the heavy child last. Keep its data structure intact.
- Re-add all elements from light children's subtrees into the surviving data structure.
- Add $v$ itself.
- Record $v$ 's answer (distinct count, mode, etc.).
If $v$ is itself a light child of its parent, its data will be discarded after the parent records $v$ 's answer -- this is handled by the parent's DFS logic.

Why "Discard then Re-add"?

We cannot keep all children's data simultaneously -- that would give $O (n^{2})$ space in the worst case (imagine a star graph). Instead, we process light children, record their answers, clear the structure, then process the heavy child and keep it. Finally, we walk through each light subtree again, inserting elements into the heavy child's structure. The small-to-large guarantee ensures the total re-insertions are $O (n \log n)$ .

Alternative: Pointer / Reference Swapping

Instead of clearing and re-inserting, we can give each vertex its own map or set and swap the largest child's container into the parent via std::swap (constant time). Then iterate over each smaller child's container, inserting elements into the parent's (now-large) container. This avoids the "discard + re-walk" pattern and is often simpler to implement.

Implementation

The Three-Phase Order

For each vertex v with children c1, c2, ..., ck (heavy child = ch):

Process all LIGHT children (recurse, then CLEAR their contributions)
Process the HEAVY child (recurse, KEEP its contributions)
Re-add all light children's contributions to merge with heavy child's data

Why clear before heavy? If light children's data remains when processing the heavy child, you'd double-count elements. The heavy child's data is kept because it's the largest subtree -- re-adding it would be expensive (defeating the O(n log n) guarantee).

The amortization works because each vertex is added/removed O(log n) times total (it's "light" in at most log n ancestors).

Version 1: Clean Contest Template (Distinct Colors per Subtree)

Solves: "For each vertex, how many distinct colors appear in its subtree?"

cpp

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 200'005;
int n;
int color[MAXN];
vector<int> adj[MAXN];
int sz[MAXN];
int ans[MAXN];

// cnt[c] = number of nodes with color c currently in the global set
map<int,int> *ptr[MAXN];        // each node owns a map (pointer)
int distinct_count[MAXN];        // running distinct count per set

void dfs_sz(int v, int p) {
    sz[v] = 1;
    for (int u : adj[v]) if (u != p) {
        dfs_sz(u, v);
        sz[v] += sz[u];
    }
}

// Insert color c into the map *m, updating distinct counter at id.
void add(map<int,int> *m, int &distinct, int c) {
    if (++(*m)[c] == 1) ++distinct;
}

void dfs(int v, int p) {
    int hvy = -1, mx = 0;
    for (int u : adj[v]) if (u != p) {
        if (sz[u] > mx) { mx = sz[u]; hvy = u; }
    }

    // 1. Process light children, build their maps.
    for (int u : adj[v]) if (u != p && u != hvy)
        dfs(u, v);

    // 2. Process heavy child -- its map survives.
    if (hvy != -1) {
        dfs(hvy, v);
        ptr[v] = ptr[hvy];               // steal heavy child's map
        distinct_count[v] = distinct_count[hvy];
    } else {
        ptr[v] = new map<int,int>();      // leaf: fresh map
        distinct_count[v] = 0;
    }

    // 3. Add v's own color.
    add(ptr[v], distinct_count[v], color[v]);

    // 4. Merge each light child's map into v's map.
    for (int u : adj[v]) if (u != p && u != hvy) {
        for (auto &[c, cnt] : *ptr[u]) {
            if ((*ptr[v]).count(c) == 0)
                ++distinct_count[v];
            (*ptr[v])[c] += cnt;
        }
        delete ptr[u];                    // free light child's map
    }

    ans[v] = distinct_count[v];
}

int main() {
    scanf("%d", &n);
    for (int i = 1; i <= n; i++) scanf("%d", &color[i]);
    for (int i = 0; i < n - 1; i++) {
        int u, v; scanf("%d %d", &u, &v);
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    dfs_sz(1, 0);
    dfs(1, 0);
    for (int i = 1; i <= n; i++)
        printf("%d%c", ans[i], " \n"[i == n]);
}

Version 2: Swap-Based Merging with Frequency Tracking

Solves: "For each vertex, find the most frequent color in its subtree. If tied, report the smallest color."

cpp

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 200'005;
int n;
int color[MAXN];
vector<int> adj[MAXN];
int sz[MAXN];
int ans[MAXN];

// Per-node: frequency map  color -> count
//           and the current (mode_count, mode_color) pair.
struct Info {
    map<int,int> freq;
    int mode_cnt = 0;
    int mode_col = 0;

    void add(int c) {
        int f = ++freq[c];
        if (f > mode_cnt || (f == mode_cnt && c < mode_col)) {
            mode_cnt = f;
            mode_col = c;
        }
    }

    // Merge a smaller Info into this one.
    void merge(Info &o) {
        for (auto &[c, cnt] : o.freq) {
            int f = (freq[c] += cnt);
            if (f > mode_cnt || (f == mode_cnt && c < mode_col)) {
                mode_cnt = f;
                mode_col = c;
            }
        }
    }
};

Info* info[MAXN];

void dfs_sz(int v, int p) {
    sz[v] = 1;
    for (int u : adj[v]) if (u != p) {
        dfs_sz(u, v);
        sz[v] += sz[u];
    }
}

void dfs(int v, int p) {
    int hvy = -1, mx = 0;
    for (int u : adj[v]) if (u != p)
        if (sz[u] > mx) { mx = sz[u]; hvy = u; }

    for (int u : adj[v]) if (u != p && u != hvy)
        dfs(u, v);

    if (hvy != -1) {
        dfs(hvy, v);
        info[v] = info[hvy];             // steal heavy child
    } else {
        info[v] = new Info();
    }

    info[v]->add(color[v]);

    for (int u : adj[v]) if (u != p && u != hvy) {
        info[v]->merge(*info[u]);
        delete info[u];
    }

    ans[v] = info[v]->mode_col;
}

int main() {
    scanf("%d", &n);
    for (int i = 1; i <= n; i++) scanf("%d", &color[i]);
    for (int i = 0; i < n - 1; i++) {
        int u, v; scanf("%d %d", &u, &v);
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    dfs_sz(1, 0);
    dfs(1, 0);
    for (int i = 1; i <= n; i++)
        printf("%d%c", ans[i], " \n"[i == n]);
}

Version 3: Euler-Tour + Global Array (No Extra Allocation)

Classic "DSU on tree" style using the Euler tour. Instead of per-node maps, maintain a single global frequency array. Process the heavy child last (keep its contribution), and for light children, add then remove their subtree elements.

cpp

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 200'005;
int n;
int color[MAXN];
vector<int> adj[MAXN];
int sz[MAXN], par[MAXN];
int in_[MAXN], out_[MAXN], order[MAXN];
int timer_ = 0;
long long ans[MAXN];

// Global state
int cnt[MAXN];        // cnt[c] = frequency of color c in current window
long long cur_ans;    // current answer (sum of colors appearing exactly once, etc.)
int mx_freq;          // or: track max frequency, depending on problem

void dfs_sz(int v, int p) {
    par[v] = p; sz[v] = 1;
    // Move heavy child to front of adj list for easy access.
    int hvy_idx = -1, mx = 0;
    for (int i = 0; i < (int)adj[v].size(); i++) {
        int u = adj[v][i];
        if (u == p) continue;
        dfs_sz(u, v);
        sz[v] += sz[u];
        if (sz[u] > mx) { mx = sz[u]; hvy_idx = i; }
    }
    if (hvy_idx > 0) swap(adj[v][0], adj[v][hvy_idx]);
    // Handle case where adj[v][0] == p but heavy child is elsewhere.
    if (!adj[v].empty() && adj[v][0] == p && hvy_idx != -1)
        swap(adj[v][0], adj[v][hvy_idx]);
}

void dfs_order(int v, int p) {
    in_[v] = timer_;
    order[timer_++] = v;
    for (int u : adj[v]) if (u != p) dfs_order(u, v);
    out_[v] = timer_;
}

void add_node(int v) {
    int c = color[v];
    ++cnt[c];
    // Adapt this line to the specific problem.
    // Example: track max frequency.
    if (cnt[c] > mx_freq) mx_freq = cnt[c];
}

void remove_node(int v) {
    int c = color[v];
    --cnt[c];
    // Note: decreasing mx_freq lazily is tricky;
    // for problems needing exact max, use the map-based approach.
}

void add_subtree(int v) {
    for (int i = in_[v]; i < out_[v]; i++) add_node(order[i]);
}

void remove_subtree(int v) {
    for (int i = in_[v]; i < out_[v]; i++) remove_node(order[i]);
}

// keep = true means "this is the heavy child, don't undo afterwards"
void dfs(int v, int p, bool keep) {
    // Identify heavy child (first in adj list after dfs_sz rearrangement).
    int hvy = -1;
    for (int u : adj[v]) if (u != p) { hvy = u; break; }

    // 1. Process light children (undo their contributions).
    for (int u : adj[v]) if (u != p && u != hvy)
        dfs(u, v, false);

    // 2. Process heavy child (keep its contribution).
    if (hvy != -1)
        dfs(hvy, v, true);

    // 3. Add v itself and all light subtrees.
    add_node(v);
    for (int u : adj[v]) if (u != p && u != hvy)
        add_subtree(u);

    // 4. Record answer for v.
    ans[v] = mx_freq;  // adapt per problem

    // 5. If v is a light child, undo everything.
    if (!keep) {
        remove_node(v);
        for (int u : adj[v]) if (u != p && u != hvy)
            remove_subtree(u);
        // Reset mx_freq if needed (problem-dependent).
        // For exact tracking, recompute or use the map-based version.
    }
}

int main() {
    scanf("%d", &n);
    for (int i = 1; i <= n; i++) scanf("%d", &color[i]);
    for (int i = 0; i < n - 1; i++) {
        int u, v; scanf("%d %d", &u, &v);
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    dfs_sz(1, 0);
    timer_ = 0;
    dfs_order(1, 0);
    mx_freq = 0;
    dfs(1, 0, true);
    for (int i = 1; i <= n; i++)
        printf("%lld%c", ans[i], " \n"[i == n]);
}

Operations Reference

Operation	Time	Space	Notes
Build (subtree sizes + Euler tour)	$O (n)$	$O (n)$	Single DFS each
Process all vertices (total)	$O (n \log n)$	$O (n)$	Each element merged $O (\log n)$ times
Per-vertex answer extraction	$O (1)$	--	Read from current data structure state
Map-based merge (Version 1 & 2)	$O (n \log^{2} n)$	$O (n)$	Extra $\log n$ from map operations
Array-based (Version 3)	$O (n \log n)$	$O (n + C)$	$C$ = color range; $O (1)$ per insert
Total space	--	$O (n)$ or $O (n + C)$	Map-based vs array-based

Why $O (n \log n)$ ? Each node $v$ is in the light subtree of at most $O (\log n)$ ancestors. Each time $v$ is a light descendant, it gets re-inserted once. Total re-insertions across all nodes: $\sum_{v} O (\log n) = O (n \log n)$ .

Problem Patterns & Tricks

Pattern 1: Distinct Values per Subtree

The canonical application. For each vertex $v$ , count the number of distinct colors (or values) in $v$ 's subtree. Use a set or frequency map; merge small into large.

Problems: CF 600E -- Lomsat gelral

Pattern 2: Mode (Most Frequent Element) per Subtree

Track the frequency of every color and maintain the current maximum frequency (and the sum/min/count of colors achieving it). The merge step updates the mode as each element is inserted.

cpp

// While merging element with color c:
int f = ++freq[c];
if (f > best_freq) { best_freq = f; best_sum = c; }
else if (f == best_freq) { best_sum += c; }

Problems: CF 600E (sum of modal colors), CF 570D

Pattern 3: K-th Smallest / Order Statistics per Subtree

Maintain an order-statistics tree (policy-based __gnu_pbds::tree) or a sorted structure per subtree. Merge small into large. After merging, answer order-statistic queries in $O (\log n)$ .

cpp

#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;
typedef tree<int,null_type,less<int>,rb_tree_tag,
             tree_order_statistics_node_update> ordered_set;

Pattern 4: Path Queries via Virtual Vertex

DSU on tree handles subtree queries. To answer a path query $u \to v$ , create a virtual vertex at the LCA whose "subtree" captures the path. This is advanced and often replaced by HLD; use only when the aggregate naturally decomposes.

Pattern 5: XOR / Sum of Distinct Elements per Subtree

Instead of counting distinct elements, compute their XOR or sum. Maintain a set of present elements; when an element's frequency goes from 0 to 1, add it to the running XOR/sum. When it goes from 1 to 0 (during removal in the Euler-tour version), remove it.

Problems: CF 208E

Pattern 6: Offline Queries Mapped to Subtrees

Given queries "how many nodes in the subtree of $v$ have value in range $[l, r]$ ?", process all vertices with DSU on tree. At each vertex $v$ , the data structure holds exactly $v$ 's subtree. Answer all queries for $v$ at that moment using a BIT or merge-sort tree over values.

Problems: CF 375D -- Tree of Science

Pattern 7: Combine with Centroid Decomposition

For problems requiring both subtree aggregates and path-distance queries, compute subtree aggregates with DSU on tree and distance-related aggregates with centroid decomposition. The two techniques compose cleanly.

Contest Cheat Sheet

+-------------------------------------------------------------------+
| DSU ON TREE (SMALL-TO-LARGE MERGING) CHEAT SHEET                  |
+-------------------------------------------------------------------+
| WHEN TO USE:                                                      |
|   - "For each vertex, compute [aggregate] over its subtree"       |
|   - Distinct count, mode, frequency, k-th element per subtree     |
|   - Offline subtree queries where per-vertex answer is needed     |
+-------------------------------------------------------------------+
| ALGORITHM:                                                        |
|   1. Compute subtree sizes, identify heavy child (largest sub)    |
|   2. DFS post-order:                                              |
|      a. Recurse light children (discard or process separately)    |
|      b. Recurse heavy child (KEEP its data structure)             |
|      c. Re-add light subtree elements into heavy's structure      |
|      d. Add current vertex                                        |
|      e. Record answer for current vertex                          |
+-------------------------------------------------------------------+
| TWO IMPLEMENTATION STYLES:                                        |
|   Map/set per node:  steal heavy child's map via pointer swap,    |
|                      iterate + insert light children's maps.      |
|   Global array:      Euler-tour order, add/remove subtrees;       |
|                      heavy child's contribution persists.         |
+-------------------------------------------------------------------+
| COMPLEXITY:                                                       |
|   Time:  O(n log n) total  [O(n log^2 n) with map]               |
|   Space: O(n) or O(n + C)                                        |
+-------------------------------------------------------------------+
| KEY PROOF IDEA:                                                   |
|   Each element is in a "light subtree" of at most O(log n)        |
|   ancestors. Every light merge at least doubles the target set.   |
+-------------------------------------------------------------------+
| GOTCHA: Always merge SMALL into LARGE, never the reverse.         |
| GOTCHA: Heavy child = largest subtree, NOT deepest.               |
| GOTCHA: For global-array version, fully undo light contributions. |
+-------------------------------------------------------------------+

Gotchas & Debugging

1. Merging Large into Small (Reversed Direction)

The most catastrophic bug. If you accidentally iterate over the heavy child's map and insert into a light child's map, the $O (n \log n)$ guarantee is destroyed and you get $O (n^{2})$ . Always check: the outer loop iterates the smaller container.

cpp

// CORRECT: iterate small, insert into large
for (auto &[c, cnt] : *small_map)
    (*large_map)[c] += cnt;

// WRONG: iterate large, insert into small -- O(n^2)

2. Heavy Child = Largest Subtree, Not Deepest

The heavy child is the one with the largest subtree size, not the deepest descendant. Using depth instead of size breaks the logarithmic merge bound. Verify: hvy is the child $u$ maximizing sz[u].

3. Stack Overflow on Deep Trees

Recursive DFS on a line graph of $2 \times 10^{5}$ nodes overflows the default 8 MB stack. Mitigations:

Linux: ulimit -s unlimited
Contest judges: check stack size limits; some allow pragmas.
Best: convert to iterative DFS with an explicit stack.

4. Forgetting to Add the Current Vertex

After merging children's data, you must add vertex $v$ itself to the structure before recording its answer. A common off-by-one: the root's own color is missing from its answer.

5. Global State Not Fully Reset (Euler-Tour Version)

In the global-array version, when processing a light child with keep = false, you must undo all additions -- including the light child's own node and all its descendants. Missing even one node corrupts all subsequent heavy-path computations.

6. Mode Tracking During Removal

In the global-array version, decrementing a frequency can lower the global maximum frequency. Naively decrementing mx_freq is wrong because another color might still have that frequency. Solutions:

Use a second counter cnt_of_freq[f] = number of colors with frequency $f$ . When cnt_of_freq[mx_freq] drops to 0, decrement mx_freq.
Or use the map-based approach (Versions 1 & 2), which never removes elements.

7. Pointer Ownership / Memory Leaks

In the map-per-node approach, after merging a light child's map into the parent, delete the light child's map to avoid $O (n)$ leaked allocations. Alternatively, use a memory pool or unique_ptr.

8. Debugging Tips

Print the merge sizes at each vertex. Verify that every merge has |small| <= |large|. If not, the direction is wrong.
For the Euler-tour version, after processing the entire tree, verify cnt[c] == 0 for all colors -- the root's keep = true should leave everything added, or if you undo at the end, everything should be zero.
Test on a star graph ( $n - 1$ leaves): every leaf is a light child, so the root must merge $n - 1$ single-element sets. Total operations: $O (n)$ .
Test on a line graph (path): the heavy path is the entire line. Only one light child per vertex (none). Total operations: $O (n)$ .
Test on a balanced binary tree: each internal node has one heavy and one light child. Each element is re-inserted $O (\log n)$ times. Total: $O (n \log n)$ .

Mental Traps

"I process each node exactly once, so the total time is $O (n)$ ."

Each node is visited once as the current vertex, yes. But each node's data is re-inserted every time it is a light descendant being merged into a heavy sibling's structure. A node is light in at most $O (\log n)$ ancestors (because each light edge at least doubles the subtree size going up). Total re-insertions: $\sum_{v} O (\log n) = O (n \log n)$ .

  Re-insertion count for a node on a root-to-leaf path:

  Root ─── heavy ─── heavy ─── heavy ─── ... ─── leaf
                \         \
               light      light
  
  Node "leaf" is re-inserted once per LIGHT edge
  on the path from leaf to root.
  
  Max light edges on any root-to-leaf path: O(log n)
    (each light edge doubles subtree size)
  
  -> Each node re-inserted O(log n) times
  -> Total across all nodes: O(n log n)

"DSU on tree can answer path queries, not just subtree queries."

DSU on tree is fundamentally a subtree technique. At each vertex $v$ , the data structure holds exactly the multiset of values in $v$ 's subtree. If your query involves a path $u \to v$ (not a subtree), DSU on tree does not directly apply. For path queries, use HLD or centroid decomposition instead.

  What DSU on tree sees vs. what path queries need:

  Query: "distinct colors on path 4 -> 6"

       1(R)                DSU on tree at node 1:
      / \                    has ALL 7 nodes' colors
    2(B)  3(R)               -> answers subtree(1), not path(4,6)
    / \     \
  4(R) 5(B) 6(G)           Path 4->6 = {4, 2, 1, 3, 6}
  |                         != subtree of any single node
  7(G)
  
  For path queries -> use HLD or centroid decomposition

Self-Test

Before moving to practice problems, verify you can answer these without looking at the notes:

[ ] Explain the three-phase DFS order for processing a vertex $v$ : (1) recurse light children and clear, (2) recurse heavy child and keep, (3) re-add light children's data, add $v$ , record answer.
[ ] State the complexity argument: each node is re-inserted $O (\log n)$ times because each light edge at least doubles the subtree size, giving $O (n \log n)$ total work.
[ ] Given a tree with 7 nodes and colors, trace the DSU-on-tree execution showing which nodes' data is kept, cleared, and re-added at each step.
[ ] Explain why merging large into small (instead of small into large) breaks the $O (n \log n)$ guarantee and degrades to $O (n^{2})$ .
[ ] Describe when DSU on tree does NOT apply -- path queries, online queries, and non-subtree aggregates -- and name the alternative technique for each.

Practice Problems

#	Problem	Source	Difficulty	Key Idea
1	Lomsat gelral -- CF 600E	CF	Medium	Sum of most frequent colors per subtree; canonical DSU on tree
2	Tree and Queries -- CF 375D	CF	Medium	Count colors with freq $\geq k$ per subtree; BIT + DSU on tree
3	Arpa's letter-marked tree -- CF 741D	CF	Medium-Hard	Longest path in subtree with character constraint; small-to-large on sets
4	Blood Cousins -- CF 208E	CF	Medium	Count nodes at depth $d$ in subtree; Euler tour + binary search or DSU on tree
5	Distinct Colors -- CSES 1139	CSES	Medium	Distinct colors per subtree; textbook application
6	Tree Painting -- CF 1097D	CF	Medium	Re-rooting + subtree size; can be solved with small-to-large ideas
7	Propagating tree -- CF 383C	CF	Medium	Subtree updates and queries; Euler tour; extend with merging for variants
8	Nearest Marked -- CF 1009F	CF	Medium	Centroid decomposition or DSU on tree for per-subtree statistics
9	Vertical Paths -- CF 1324F	CF	Easy-Med	Decompose tree into vertical paths; warm-up for heavy path intuition
10	Johnny and Grandmaster -- CF 1361E	CF	Medium-Hard	Tree DP with merging; benefits from small-to-large set management

DSU on Tree (Small-to-Large Merging) ​

If I Had 5 Minutes ​

Mnemonic Anchor ​

Intuition ​

Visual Walkthrough ​

Detailed Step-by-Step: Node-by-Node Execution ​

The Invariant ​

When to Reach for This ​

When NOT to Use This ​

What the Code Won't Teach You ​

Algorithm ​

High-Level Steps ​

Why "Discard then Re-add"? ​

Alternative: Pointer / Reference Swapping ​

Implementation ​

The Three-Phase Order ​

Version 1: Clean Contest Template (Distinct Colors per Subtree) ​

Version 2: Swap-Based Merging with Frequency Tracking ​

Version 3: Euler-Tour + Global Array (No Extra Allocation) ​

Operations Reference ​

Problem Patterns & Tricks ​

Pattern 1: Distinct Values per Subtree ​

Pattern 2: Mode (Most Frequent Element) per Subtree ​

Pattern 3: K-th Smallest / Order Statistics per Subtree ​

Pattern 4: Path Queries via Virtual Vertex ​

Pattern 5: XOR / Sum of Distinct Elements per Subtree ​

Pattern 6: Offline Queries Mapped to Subtrees ​

Pattern 7: Combine with Centroid Decomposition ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

1. Merging Large into Small (Reversed Direction) ​

2. Heavy Child = Largest Subtree, Not Deepest ​

3. Stack Overflow on Deep Trees ​

4. Forgetting to Add the Current Vertex ​

5. Global State Not Fully Reset (Euler-Tour Version) ​

6. Mode Tracking During Removal ​

7. Pointer Ownership / Memory Leaks ​

8. Debugging Tips ​

Mental Traps ​

Self-Test ​

Practice Problems ​

Further Reading ​