Skip to content

Centroid Decomposition

Quick Revisit

  • USE WHEN: Counting/querying over all paths in a tree (paths of given length, paths with given property)
  • INVARIANT: Centroid's removal leaves all subtrees of size ≤ n/2; centroid tree has depth O(log n)
  • TIME: O(n log n) total across all levels | SPACE: O(n log n)
  • CLASSIC BUG: Not recomputing subtree sizes after removing the centroid — stale sizes cause wrong centroid selection and O(n) depth instead of O(log n)
  • PRACTICE: 05-ladder-graphs

Recursively find the centroid of a tree (the node whose removal leaves no subtree larger than n/2), solve for all paths passing through it, then recurse on each remaining subtree -- reducing O(n2) tree-path problems to O(nlogn) via a divide-and-conquer that guarantees O(logn) recursion depth.

Difficulty: [Advanced]ID: AL-22 Prerequisites: DFS & Tree DFS, Euler Tour & LCACF Rating: 1900-2200+

Contest Frequency: * Specialized -- appears in tree path query problems


Intuition

You have a tree with N=7 nodes:

         1
        / \
       2   3
      / \   \
     4   5   6
     |
     7

Task: Count the number of pairs of nodes (u,v) with dist(u,v)K where K=3.

The brute-force approach: root the tree, run a DFS/BFS from every node, count pairs at distance K. That is O(N2).

For each u in {1..N}:
    BFS from u, count nodes v > u with dist(u,v) <= K

Pairs found (K=3):
  (1,2)=1  (1,3)=1  (1,4)=2  (1,5)=2  (1,6)=2  (1,7)=3
  (2,3)=2  (2,4)=1  (2,5)=1  (2,6)=3  (2,7)=2
  (3,4)=3  (3,5)=3  (3,6)=1
  (4,5)=2  (4,7)=1
  (5,7)=2

  Total pairs with dist <= 3:  17

With N=2×105 the brute force performs 4×1010 operations -- far too slow.

We need a divide-and-conquer on the tree that, like merge sort on arrays, halves the problem at each level and processes each node only O(logN) times.

Find the centroid (the node whose removal splits the tree into subtrees each of size N/2), solve for all paths that pass through it, then recurse on each subtree -- this gives O(NlogN) because each node appears in O(logN) levels of the decomposition.

Why the centroid? Any node can split the tree into components, but the centroid guarantees every component has size N/2. This means the recursion depth is at most log2N. Since each node is processed once per level it appears in, the total work across all levels is O(NlogN).

The key structural observation needs to be stated carefully. It is not true that every path "passes through some centroid at every level"; that would be vacuous. The right statement is:

For each path u-v in the original tree, there is exactly one centroid c in the centroid decomposition tree (CDT) such that c is the lowest common ancestor of u and v in the CDT. Equivalently, c is the centroid that first separates u from v as we descend the CDT. We "charge" the path to that one centroid -- it is the unique level where u and v end up in different child components for the first time, or where one of them is c itself.

This is what makes the decomposition partition all (N2) paths cleanly: each path is counted exactly once, at its CDT-LCA. No double-counting, no missed paths. The implementation enforces this by, at each centroid c, counting pairs through c but subtracting pairs whose endpoints both come from the same child subtree of c -- those will be re-counted when we recurse into that child.

Visual Walkthrough

Step 1 -- Compute subtree sizes and find the centroid. For a tree of N=7, the centroid is the node whose largest remaining component after removal is 7/2=3.

    Tree (sizes in parentheses, rooted at 1):

         1(7)
        / \
       2(4) 3(2)
      / \     \
     4(2) 5(1) 6(1)
     |
     7(1)

    Try removing each node, find max component:
      Remove 1: components {2,4,5,7}(4), {3,6}(2)  -> max = 4 > 3  NO
      Remove 2: components {1,3,6}(3), {4,7}(2), {5}(1) -> max = 3  YES
      Remove 3: components {1,2,4,5,7}(5), {6}(1) -> max = 5 > 3  NO

    Centroid = node 2.

Step 2 -- Process all paths through centroid 2. Run a DFS from node 2 into each subtree, collecting distances. Paths through 2 combine a node in one subtree with a node in another (or with 2 itself).

    Remove centroid 2. Remaining subtrees rooted at neighbors:

    Subtree A (via 1):  1 -- 3 -- 6
    Subtree B (via 4):  4 -- 7
    Subtree C (via 5):  5

    DFS from centroid 2, recording (node, distance):
      Subtree A: (1, 1), (3, 2), (6, 3)
      Subtree B: (4, 1), (7, 2)
      Subtree C: (5, 1)

    To count pairs with dist <= K=3 passing through 2:
      Sort all distances: [1, 1, 1, 2, 2, 3]
      Use two pointers or merge step to count pairs
      summing to <= K.  Also add pairs (2, v) for each v.

Step 3 -- Recurse on each subtree. Each subtree gets its own centroid, and we repeat.

    Level 0:  centroid = 2,  covers whole tree (N=7)

    Level 1:  Subtree {1,3,6} -> centroid = 3
              Subtree {4,7}   -> centroid = 4  (or 7)
              Subtree {5}     -> centroid = 5  (leaf, base case)

    Level 2:  Subtree {1} -> centroid = 1
              Subtree {6} -> centroid = 6
              Subtree {7} -> centroid = 7

Step 4 -- The centroid tree. Connect each centroid to its parent centroid (the centroid of the level above that contained it). This forms a new tree of depth O(logN).

    Centroid tree:

              2
           /  |  \
          3   4   5
         / \   \
        1   6   7

    Depth = 2   (log2(7) ~ 2.8)

    Every path in the original tree is "owned" by
    exactly one centroid in this tree.

The Invariant

+----------------------------------------------------------------------+
|  1. Every subtree processed at recursion level L has size <= N/2^L.  |
|     Therefore the recursion depth is O(log N).                       |
|                                                                      |
|  2. Every path u-v in the original tree has a unique CDT-LCA --      |
|     the deepest centroid in the centroid decomposition tree that     |
|     is an ancestor of both u and v. We charge the path to that       |
|     centroid only, by subtracting the pairs that lie entirely        |
|     inside one child subtree (they belong to that child's level).    |
|                                                                      |
|  3. Each node appears in exactly O(log N) levels. Total work across  |
|     all levels is O(N log N) if we do O(1) work per node per level.  |
+----------------------------------------------------------------------+

When to Reach for This

Trigger phrases in problem statements:

  • "Count paths in a tree with property P" (distance, sum, XOR, modular condition).
  • "Tree path queries" where paths are not rooted -- need to consider all (N2) paths.
  • "Distances in tree" -- count or enumerate pairs at specific distances.
  • "Update a node, query all nodes within distance K" -- centroid tree enables O(logN) ancestors to check.

Counter-examples (do NOT use centroid decomposition):

  • Path queries with updates on a rooted tree where paths go through ancestors -- HLD is simpler and online.
  • Subtree queries only -- Euler tour suffices.
  • Single-source distances -- BFS/DFS from that source in O(N).
  • Need to maintain the tree structure after operations -- centroid decomposition is a static decomposition.

What the Code Won't Teach You

The centroid decomposition tree is NOT the original tree. The CDT is a new tree where the parent of node v is the centroid that "captured" v -- i.e., the centroid of the smallest component containing v during the recursive decomposition. This tree has depth O(logn) and enables online path queries by walking up ancestors.

  Original tree:            Centroid decomposition tree (CDT):

       1                              2
      / \                          /  |  \
     2   3                        3   4   5
    / \   \                      / \   \
   4   5   6                    1   6   7
   |
   7                    Depth of CDT: 2  (log₂ 7 ~= 2.8)

  Key difference: node 2 is the ROOT of the CDT
  (it was the first centroid found), and node 1
  is a LEAF (it was captured last).

The "subtract overcounting" step is where most bugs hide. When you count pairs through centroid c, you also count pairs that are entirely within one child's subtree -- those paths don't actually pass through c. The standard fix: for each child subtree, subtract pairs counted within it. This doubles the per-centroid work but keeps total work linear per level.

  Why inclusion-exclusion is needed:

  Centroid c with children A, B, C:

    Pairs counted by "all distances from c":
      AxB  +  AxC  +  BxC  +  Ax{c}  +  Bx{c}  +  Cx{c}
      ^^^     ^^^     ^^^     <- genuine cross-subtree pairs OK
    
    ALSO counted (wrongly):
      AxA  +  BxB  +  CxC    <- intra-subtree pairs X
    
    Fix: subtract count_pairs(A) + count_pairs(B) + count_pairs(C)

Centroid decomposition is divide-and-conquer on trees -- the merge-sort analogy helps. In merge sort, you split the array at the midpoint, handle cross-boundary pairs, then recurse. In centroid decomposition, you split the tree at the centroid, handle cross-centroid paths, then recurse. The centroid guarantees each component has size n/2, just as the midpoint guarantees each half has size n/2. This is why the depth is O(logn).


C++ STL Reference

Function / ClassHeaderWhat it doesTime Complexity
vector<int><vector>Adjacency list, distance collection--
sort(begin, end)<algorithm>Sort distances for two-pointer countingO(klogk)
upper_bound(b, e, v)<algorithm>Binary search in sorted distancesO(logk)
accumulate(b, e, init)<numeric>Sum over collected valuesO(k)
fill(b, e, v)<algorithm>Reset removed/visited flags between levelsO(k)

Implementation (Contest-Ready)

+------------------------------------------------------------+
| CRITICAL: Mark the centroid as removed BEFORE recursing    |
| into its subtrees. If you recurse first and mark after,    |
| the subtree size calculations will include the centroid,   |
| breaking the "each subtree has size <= n/2" guarantee.     |
| This causes infinite recursion or wrong decomposition.     |
+------------------------------------------------------------+
cpp
void build(int v) {
    int c = find_centroid(v);
    removed[c] = true;        // Mark FIRST
    // Process centroid c...
    for (int u : adj[c]) {
        if (!removed[u]) {
            build(u);         // Recurse AFTER marking
        }
    }
}

Version 1: Minimal Contest Template

Solves: count pairs of nodes with distance K (e.g., CF 161D).

cpp
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 200'005;
int n, K;
vector<pair<int,int>> adj[MAXN];  // {neighbor, weight}
int sz[MAXN];
bool removed[MAXN];
long long ans = 0;

int get_sz(int v, int p) {
    sz[v] = 1;
    for (auto [u, w] : adj[v])
        if (u != p && !removed[u])
            sz[v] += get_sz(u, v);
    return sz[v];
}

int get_centroid(int v, int p, int tree_sz) {
    for (auto [u, w] : adj[v])
        if (u != p && !removed[u] && sz[u] > tree_sz / 2)
            return get_centroid(u, v, tree_sz);
    return v;
}

vector<int> dists;

void get_dists(int v, int p, int d) {
    dists.push_back(d);
    for (auto [u, w] : adj[v])
        if (u != p && !removed[u])
            get_dists(u, v, d + w);
}

long long count_pairs(vector<int>& a, int limit) {
    sort(a.begin(), a.end());
    long long cnt = 0;
    int lo = 0, hi = (int)a.size() - 1;
    while (lo < hi) {
        if (a[lo] + a[hi] <= limit) {
            cnt += hi - lo;
            lo++;
        } else {
            hi--;
        }
    }
    return cnt;
}

void solve(int v) {
    int tree_sz = get_sz(v, -1);
    int c = get_centroid(v, -1, tree_sz);
    removed[c] = true;

    // Count all pairs through c with dist <= K
    vector<int> all_dists = {0};  // include centroid itself (dist 0)
    for (auto [u, w] : adj[c]) {
        if (removed[u]) continue;
        dists.clear();
        get_dists(u, c, w);
        // Subtract pairs within same subtree (inclusion-exclusion)
        ans -= count_pairs(dists, K);
        for (int d : dists) all_dists.push_back(d);
    }
    ans += count_pairs(all_dists, K);

    for (auto [u, w] : adj[c])
        if (!removed[u])
            solve(u);
}

int main() {
    scanf("%d %d", &n, &K);
    for (int i = 0; i < n - 1; i++) {
        int u, v, w;
        scanf("%d %d %d", &u, &v, &w);
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    }
    solve(1);
    printf("%lld\n", ans);
}

Version 2: Explained Version with Centroid Tree Construction

Builds the centroid tree explicitly. Supports distance-based update/query: "update node v, then query sum of all marked nodes within distance K of node u."

cpp
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 200'005;
const int LOG  = 18;  // log2(200000) ~ 17.6
int n, q;
vector<pair<int,int>> adj[MAXN];  // {neighbor, weight}

// --- Centroid decomposition arrays ---
int sz[MAXN];
bool removed[MAXN];
int cpar[MAXN];          // parent in the centroid tree
int cdepth[MAXN];        // depth in the centroid tree
int dist_to_canc[MAXN][LOG];
// dist_to_canc[v][k] = distance from v to its k-th ancestor
// in the centroid tree (0-th ancestor = v's centroid at its level)

// --- Find subtree size (ignoring removed nodes) ---
int get_sz(int v, int p) {
    sz[v] = 1;
    for (auto [u, w] : adj[v])
        if (u != p && !removed[u])
            sz[v] += get_sz(u, v);
    return sz[v];
}

// --- Find centroid of the component containing v ---
// Walk toward the child with sz > tree_sz / 2 until no such
// child exists.  That node is the centroid.
int get_centroid(int v, int p, int tree_sz) {
    for (auto [u, w] : adj[v])
        if (u != p && !removed[u] && sz[u] > tree_sz / 2)
            return get_centroid(u, v, tree_sz);
    return v;
}

// --- BFS/DFS to compute distances from centroid c to all
//     nodes in its component, storing at the appropriate
//     centroid-tree level ---
void compute_dists(int v, int p, int c, int d, int level) {
    dist_to_canc[v][level] = d;
    for (auto [u, w] : adj[v])
        if (u != p && !removed[u])
            compute_dists(u, v, c, d + w, level);
}

// --- Build centroid decomposition ---
// 1. Find centroid c of current component.
// 2. Record distances from c to all nodes in this component.
// 3. Mark c as removed.
// 4. Recurse on each remaining subtree.
void build(int v, int parent_centroid, int level) {
    int tree_sz = get_sz(v, -1);
    int c = get_centroid(v, -1, tree_sz);

    cpar[c] = parent_centroid;
    cdepth[c] = level;
    removed[c] = true;

    // Record distances from c to every node in its component
    compute_dists(c, -1, c, 0, level);

    for (auto [u, w] : adj[c])
        if (!removed[u])
            build(u, c, level + 1);
}

// --- Query structure: for each centroid, store a sorted list
//     of distances to all "active" nodes in its component.
//     Here we use a simpler approach: BIT/Fenwick per centroid
//     is overkill; instead, for each centroid, maintain a count
//     of active nodes and sum of distances. ---

// Simple approach: walk up centroid tree ancestors of query node.
// For "count marked nodes within distance K of u":
long long cnt[MAXN];  // cnt[c] = number of marked nodes in c's component
long long sum[MAXN];  // sum[c] = sum of dist_to_canc for marked nodes

// Mark node v as active
void update(int v) {
    int cur = v;
    while (cur != -1) {
        int lvl = cdepth[cur];
        cnt[cur]++;
        sum[cur] += dist_to_canc[v][lvl];
        cur = cpar[cur];
    }
}

// Query: count of marked nodes within distance K of u
// Walk up centroid tree.  At each ancestor c (level L):
//   dist(u, marked_node) = dist_to_canc[u][L] + dist_to_canc[marked_node][L]
//   We want dist_to_canc[u][L] + dist_to_canc[marked_node][L] <= K.
//   Simple version (returns total distance sum, not count):
long long query_count(int u, int K_limit) {
    long long result = 0;
    int cur = u;
    int prev = -1;
    while (cur != -1) {
        int lvl = cdepth[cur];
        int du = dist_to_canc[u][lvl];
        if (du <= K_limit) {
            // All marked nodes in cur's component at distance
            // <= K_limit - du from cur contribute.
            // Simplified: count all (exact filtering needs more
            // structure like a sorted list + binary search).
            result += cnt[cur];
        }
        // Subtract double-counted nodes from child centroid
        // (inclusion-exclusion handled by the caller or
        //  by maintaining per-child-subtree counters)
        prev = cur;
        cur = cpar[cur];
    }
    return result;
}

int main() {
    scanf("%d %d", &n, &q);
    for (int i = 0; i < n - 1; i++) {
        int u, v, w;
        scanf("%d %d %d", &u, &v, &w);
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    }

    build(1, -1, 0);

    while (q--) {
        int type; scanf("%d", &type);
        if (type == 1) {
            int v; scanf("%d", &v);
            update(v);
        } else {
            int u, k; scanf("%d %d", &u, &k);
            printf("%lld\n", query_count(u, k));
        }
    }
}

Note on inclusion-exclusion in Version 2: The simplified query_count above illustrates the centroid-tree walk. For exact distance-filtered counting, maintain a sorted array or BIT of distances at each centroid and binary search. See Pattern 3 below for the precise technique.


ASCII Trace: Centroid Decomposition of a 7-Node Tree

Input tree:                 Adjacency list:
        0                    0: [1, 2]
       / \                   1: [0, 3, 4]
      1   2                  2: [0, 5, 6]
     / \   / \               3: [1]
    3   4 5   6              4: [1]
                             5: [2]
                             6: [2]

Step 1: Compute subtree sizes (rooted at 0)
  sz[] = [7, 3, 3, 1, 1, 1, 1]
  n=7, centroid must have all parts <= 7/2=3
  Node 0: remove -> parts {3, 3}        <= 3 OK  -> centroid = 0

Step 2: Remove centroid 0, recurse on components {1,3,4} and {2,5,6}

  Component {1,3,4}:          Component {2,5,6}:
      1                           2
     / \                         / \
    3   4                       5   6
  sz=[3,1,1]                  sz=[3,1,1]
  centroid=1                  centroid=2

Step 3: Remove 1 -> components {3},{4}    Remove 2 -> components {5},{6}
  centroid=3, centroid=4                  centroid=5, centroid=6

Centroid tree (parent -> child):
           0
          / \
         1   2
        / \ / \
       3  4 5  6

Depths in centroid tree: 0->0, 1->1, 2->1, 3->2, 4->2, 5->2, 6->2
Max depth = O(log n) = 2  (guaranteed for any tree)

Operations Reference

OperationTimeSpaceNotes
Build centroid decompositionO(NlogN)O(NlogN)O(logN) levels, O(N) work per level
Count pairs with distance KO(Nlog2N)O(N)Sorting per centroid adds a log factor
Centroid tree walk (update/query)O(logN) per op--Walk up O(logN) ancestors
Centroid tree walk + binary searchO(log2N) per opO(NlogN)Sorted distance list at each centroid level
Total space (with distance arrays)--O(NlogN)Each node stores dist to O(logN) ancestors

Problem Patterns & Tricks

Pattern 1: Count Paths with Distance K

The classic centroid decomposition problem. At each centroid c, collect distances from c to all nodes in its component. Sort the distance array and use two pointers to count pairs summing to K. Apply inclusion-exclusion: subtract pairs where both nodes belong to the same child subtree (those paths do not pass through c).

cpp
void solve(int v) {
    int c = get_centroid(v, -1, get_sz(v, -1));
    removed[c] = true;
    vector<int> all = {0};
    for (auto [u, w] : adj[c]) {
        if (removed[u]) continue;
        dists.clear();
        get_dists(u, c, w);
        ans -= count_pairs(dists, K);   // subtract intra-subtree
        for (int d : dists) all.push_back(d);
    }
    ans += count_pairs(all, K);         // add all through c
    for (auto [u, w] : adj[c])
        if (!removed[u]) solve(u);
}

Problems: CF 161D, IOI 2011 Race

Pattern 2: Path with Exact/Modular Property

Instead of distance K, find paths with exact distance K, or paths where edge-weight sum 0(modM). At each centroid, collect distances into a frequency map (array or hash map). For each node at distance d, look up how many nodes have distance Kd (or (Md)modM) in previously processed subtrees.

cpp
// Count paths with distance exactly K through centroid c
int freq[MAXK];  // freq[d] = count of nodes at distance d
void solve(int c) {
    // ...find centroid, mark removed...
    freq[0] = 1;  // centroid itself
    for (auto [u, w] : adj[c]) {
        if (removed[u]) continue;
        dists.clear();
        get_dists(u, c, w);
        // Query: for each dist d, ans += freq[K - d]
        for (int d : dists)
            if (d <= K) ans += freq[K - d];
        // Update: add this subtree's distances to freq
        for (int d : dists)
            if (d <= K) freq[d]++;
    }
    // Clean up freq (don't memset -- only clear used entries)
    // ...
}

Problems: IOI 2011 Race (exact distance, minimize edges), CF 914E (XOR on path)

Pattern 3: Online Update/Query at Distance (Centroid Tree Walk)

Build the centroid tree explicitly. Store dist_to_canc[v][level] -- the distance from v to its ancestor at each centroid-tree level. For an update at node v, walk up the centroid tree and insert dist_to_canc[v][level] into a data structure at each ancestor. For a query "sum of values within distance K of u", walk up and query each ancestor's data structure. Use inclusion-exclusion: at ancestor c with child centroid c on the path to u, subtract the contribution already counted through c.

Maintain two data structures per centroid: one for all nodes in the component, one for nodes in the "parent direction" subtree, enabling clean subtraction.

cpp
// At each centroid c, maintain:
//   all[c]  = sorted distances of activated nodes in c's component
//   sub[c]  = sorted distances of activated nodes, measured from cpar[c]
// Query at u with limit K:
long long query(int u, int K) {
    long long res = 0;
    int cur = u;
    while (cur != -1) {
        int d = dist_to_canc[u][cdepth[cur]];
        if (d <= K)
            res += count_within(all[cur], K - d);
        if (cpar[cur] != -1) {
            d = dist_to_canc[u][cdepth[cpar[cur]]];
            if (d <= K)
                res -= count_within(sub[cur], K - d);
        }
        cur = cpar[cur];
    }
    return res;
}

Problems: CF 342E (closest marked node), CF 1303G

Pattern 4: XOR / Bitwise Path Queries

When the path "distance" uses XOR (or other bitwise operations), centroid decomposition pairs naturally with a trie. At each centroid, insert XOR-prefix distances into a binary trie, then for each node in a new subtree, query the trie for the maximum XOR pairing.

Problems: CF 914E (max XOR path in tree)

Pattern 5: Counting Paths by Number of Edges

When edge weights are all 1 (unweighted tree), "distance" = number of edges. The distance array at each centroid is bounded by the component size. Use a simple frequency array of size N instead of sorting.

cpp
int freq[MAXN];
// At centroid c: freq[d]++ for each node at depth d
// Query: for target T, answer += freq[T - d] for each new node at depth d

Problems: CF 161D (pairs at exact distance K, unweighted)

Pattern 6: Minimize/Maximize Over All Paths Through Centroid

Instead of counting, find the path through each centroid that minimizes or maximizes some objective (e.g., minimum number of edges among paths with weight sum =K). Process subtrees one at a time, maintaining a running "best seen so far" structure, and update the global answer.

Problems: IOI 2011 Race (min edges for exact weight sum K)


Contest Cheat Sheet

+------------------------------------------------------------------+
| CENTROID DECOMPOSITION CHEAT SHEET                               |
+------------------------------------------------------------------+
| WHEN TO USE:                                                     |
|   - Count/find paths with a distance/weight property             |
|   - Pairs of nodes at distance <= K, == K, or XOR == K           |
|   - Update node, query all nodes within distance K               |
|   - Divide-and-conquer on trees (the "merge sort" of trees)      |
+------------------------------------------------------------------+
| FIND CENTROID:                                                   |
|   get_sz(v, -1)  then  get_centroid(v, -1, sz[v])               |
|   Centroid c: max component after removal <= N/2                 |
+------------------------------------------------------------------+
| CORE LOOP (counting pattern):                                    |
|   solve(v):                                                      |
|     c = get_centroid(v)                                          |
|     removed[c] = true                                            |
|     for each child subtree of c:                                 |
|       collect distances                                          |
|       subtract intra-subtree pairs (inclusion-exclusion)         |
|     add all-through-c pairs                                      |
|     recurse on each child subtree                                |
+------------------------------------------------------------------+
| CENTROID TREE (online queries):                                  |
|   build(v, parent_centroid, level):                              |
|     c = get_centroid(v)                                          |
|     cpar[c] = parent_centroid                                    |
|     store dist from every node to c at this level                |
|     recurse on subtrees                                          |
|   update(v): walk up cpar[], O(log N) ancestors                 |
|   query(u): walk up cpar[], query + subtract (incl-excl)        |
+------------------------------------------------------------------+
| COMPLEXITY:                                                      |
|   Build: O(N log N)    Depth: O(log N)                           |
|   Count pairs: O(N log^2 N)  (sort per level)                   |
|   Online update/query: O(log^2 N) per op (with BIT/sorted)      |
|   Space: O(N log N) for distance arrays                          |
+------------------------------------------------------------------+
| GOTCHA: Re-root get_sz before get_centroid at each level.        |
| GOTCHA: Inclusion-exclusion -- subtract same-subtree pairs.       |
| GOTCHA: Clean freq[] by undoing, NOT memset (TLE risk).          |
+------------------------------------------------------------------+

Gotchas & Debugging

1. Forgetting to Recompute Subtree Sizes

get_sz must be called before get_centroid at every recursive level. After removing previous centroids, subtree sizes change. Using stale sizes finds the wrong centroid and breaks the O(logN) depth guarantee.

2. Missing Inclusion-Exclusion

When counting pairs through centroid c, pairs where both endpoints are in the same child subtree do not actually pass through c. You must subtract these. The standard technique: process child subtrees one at a time, subtracting intra-subtree counts before merging into the global pool.

3. Cleaning Up Frequency Arrays with memset

After processing a centroid, you need to reset the frequency array. Using memset(freq, 0, sizeof freq) is O(N) per level × O(logN) levels =O(NlogN), which is fine. But if the array is indexed by edge weight (potentially up to 109), you must undo entries individually:

cpp
// GOOD: undo only entries you set
for (int d : all_dists) freq[d] = 0;

// BAD: memset over a huge array
memset(freq, 0, sizeof freq);  // TLE if array is 10^9

4. Stack Overflow on Deep Trees

Like HLD, the recursive solve, get_sz, and get_centroid functions can overflow the stack on degenerate trees (N105). Options:

  • Increase stack size: ulimit -s unlimited on Linux.
  • Convert get_sz and get_dists to iterative using an explicit stack.
  • The centroid-finding recursion itself is bounded by tree depth, not N, so it is usually safe.

5. Off-by-One in Two-Pointer Counting

When using two pointers on a sorted distance array to count pairs summing to K:

cpp
// CORRECT: lo < hi ensures we count unordered pairs, not self-pairs
while (lo < hi) {
    if (a[lo] + a[hi] <= K) { cnt += hi - lo; lo++; }
    else hi--;
}

Forgetting lo < hi (using lo <= hi) counts a node paired with itself.

6. Edge Weights vs. Hop Count

Confirm whether the problem uses weighted edges or counts hops. If unweighted, every edge has weight 1, and get_dists(u, c, 1) is correct. Accidentally using w from an adjacency list that was never initialized yields garbage.

7. Centroid Tree Parent of Root

The root centroid (level 0) has no parent. Set cpar[root_centroid] = -1 and guard all ancestor walks with while (cur != -1). Forgetting this causes an infinite loop or segfault.

8. Debugging Tips

  • Verify removed[c] is set before recursing, not after. Setting it after causes infinite recursion (the centroid is found again in its own subtrees).
  • Print the centroid at each level and the component size. Sizes should halve. If not, get_centroid is broken.
  • Test on a line graph (1-2-3-...-N): centroid should be N/2, depth should be O(logN).
  • Test on a star graph (node 1 connected to all others): centroid = node 1, all subtrees are leaves, depth = 2.

Mental Traps

"Since each centroid is processed once, the total work is O(n)."

Each centroid is processed once, yes. But each node appears in O(logn) levels of the decomposition. At level , all n nodes are distributed among the components, and the total size across all components is n. With O(logn) levels, each doing O(n) work, the total is O(nlogn). If you add per-node work that costs O(f) (like sorting), the total becomes O(nlognf).

  How each node participates in multiple levels:

  Level 0:  centroid = 2     all 7 nodes processed at this level
  Level 1:  centroids = 3,4,5   6 remaining nodes split among them
  Level 2:  centroids = 1,6,7   3 leaf nodes
  
  Node 1 appears at: level 0 (in centroid 2's component)
                      level 1 (in centroid 3's component)
                      level 2 (as its own centroid)
                  = 3 levels ~= log₂(7)
  
  Total node-visits: 7 + 6 + 3 = 16 ~= 7 x log₂(7)
  Each level sums to at most n -> O(n log n) total

"Centroid decomposition is only for distance counting problems."

The canonical example counts pairs at distance K, but the technique applies to any path property computable by combining information from two endpoints: XOR of edge weights, sum modulo M, number of edges with a specific label, minimum/maximum edge on path. Any aggregate that decomposes as f(path(u,c))f(path(v,c)) for centroid c works with centroid decomposition.

  Applications beyond distance counting:

  +-----------------------------+--------------------------------+
  | Path property               | Data structure at centroid     |
  +-----------------------------+--------------------------------+
  | dist(u,v) <= K              | Sorted array + two pointers    |
  | dist(u,v) == K              | Frequency array: freq[d]       |
  | XOR(u,v) maximized          | Binary trie of XOR-prefixes    |
  | weight sum ≡ 0 (mod M)      | Freq array indexed by residue  |
  | Min edges for exact weight  | freq[w] = min edges seen       |
  +-----------------------------+--------------------------------+

Common Mistakes

  1. Using stale subtree sizes when finding the next centroid. Centroid decomposition TLEs on a bamboo (chain) graph because subtree sizes weren't recalculated after finding the parent centroid. With stale sizes from the previous DFS, the "centroid" found is not the true centroid and the recursion depth becomes O(n) instead of O(logn). After identifying the centroid and removing it, re-run the size DFS in each remaining subtree before finding their centroids.

Debug Drills

Drill 1 -- Centroid finding: using stale subtree sizes

cpp
int centroid(int v, int parent, int tree_size) {
    for (int to : adj[v]) {
        if (to != parent && !removed[to] && sz[to] > tree_size / 2)
            return centroid(to, v, tree_size);
    }
    return v;
}
What's wrong?

sz[to] might be stale (computed for a larger tree before centroids were removed). You must recompute subtree sizes for the current component before finding the centroid:

cpp
void calc_sz(int v, int parent) {
    sz[v] = 1;
    for (int to : adj[v])
        if (to != parent && !removed[to])
            calc_sz(to, v), sz[v] += sz[to];
}

Call calc_sz at the start of each decompose call.

Drill 2 -- Double-counting paths within the same subtree

cpp
// For centroid c, count paths of length k:
int total = 0;
vector<int> all_dists;
for (int child : adj[c]) {
    if (removed[child]) continue;
    vector<int> child_dists = get_distances(child, c);
    for (int d : child_dists)
        total += count(all_dists, k - d);  // count pairs
    for (int d : child_dists)
        all_dists.push_back(d);
}
What's wrong?

This approach is actually correct -- it processes subtrees one by one, counting pairs between the current subtree and all previously processed subtrees, avoiding double-counting. But if you instead collected ALL distances first and counted all pairs, you'd double-count paths within the same subtree. The "process one subtree at a time" approach shown here is the fix. A common bug is to forget this ordering and count all pairs globally.

Drill 3 -- Forgetting to mark centroid as removed

cpp
void decompose(int v) {
    int c = find_centroid(v);
    // Missing: removed[c] = true;
    for (int to : adj[c]) {
        if (!removed[to])
            decompose(to);
    }
}
What's wrong?

Without marking c as removed, the recursive calls will see c as part of their subtrees, causing infinite recursion or incorrect subtree sizes:

cpp
removed[c] = true;

Self-Test

Before moving to practice problems, verify you can answer these without looking at the notes:

  • [ ] Write the centroid-finding function from memory -- get_sz then get_centroid, skipping removed[] nodes, walking toward the child with sz > tree_sz / 2.
  • [ ] Explain why each node appears in O(logn) levels of the decomposition -- the centroid guarantees each component has size n/2, so sizes halve at each level.
  • [ ] Implement the inclusion-exclusion for counting pairs at distance K: count all pairs through centroid c, subtract pairs within each child subtree.
  • [ ] State why removed[c] = true must be set BEFORE recursing into subtrees -- otherwise get_sz includes the centroid, breaking the size-halving guarantee.
  • [ ] Describe the centroid decomposition tree (CDT) and how to use it for online queries: walk up O(logn) ancestors, query/update at each level, apply inclusion-exclusion.

Practice Problems

CF RatingWhat You Should Be Able to Do
1200Solve tree-path problems with brute-force DFS for small trees.
1500Understand the centroid property and why recursion depth is O(logn).
1800Implement centroid decomposition; solve "count paths of length k" or "closest marked node" problems.
2100Combine centroid decomp with FFT/NTT for convolution-based path counting; handle path queries with updates via the centroid tree.
#ProblemSourceDifficultyKey Idea
1Distance in Tree -- CF 161DCFCF 1800Count pairs at exact distance K; basic centroid decomp
2Race -- IOI 2011IOIMediumMin edges for path with weight sum =K; freq array
3Xenia and Tree -- CF 342ECFCF 2100Online closest marked node via centroid tree walk
4Ciel and Gondola -- CF 321CCFCF 2100Centroid decomposition used to build balanced separator tree
5Lucky Tree -- CF 914ECFCF 2100Max XOR on path; centroid decomp + trie
6Tree Queries -- CF 1303GCFCF 2600Online update/query at distance; centroid tree + BIT
7Digit Tree -- CF 716ECFCF 2400Count paths forming number divisible by M; modular arithmetic
8Tree and Queries -- CF 375DCFCF 2000Count colors appearing K times on path (Mo on trees alt: centroid)
9Sherlock's Array -- CF 776FCFCF 2400Complex path property; centroid decomp + data structure
10CSES Fixed-Length Paths ICSESHardCount paths of exactly K edges; centroid + merge trick

Further Reading

Built for the climb from Codeforces 1100 to 2100.