Heavy-Light Decomposition

Quick Revisit
USE WHEN: Path update/query on a tree (sum, max, etc.) that needs segment tree speed
INVARIANT: Any root-to-leaf path crosses at most O(log n) light edges (chain boundaries)
TIME: O(log²n) per path query/update (log n chains × log n segment tree) | SPACE: O(n)
CLASSIC BUG: Including the LCA node in both half-path queries — double-counts the LCA's value
PRACTICE: 05-ladder-graphs

Decompose a tree into $O (\log n)$ heavy chains so that any root-to-leaf path crosses at most $O (\log n)$ chains -- enabling $O (\log^{2} n)$ path queries and updates via a segment tree over the flattened chain array.

Difficulty: [Advanced]ID: AL-21 Prerequisites: Euler Tour and LCA, Segment TreeCF Rating: 1900-2200+

Intuition

You have a tree with 7 nodes, rooted at node 1:

You need to answer queries on paths -- "what is the sum on path $4 \to 6$ ?" or "update node 5, then query the max on path $7 \to 3$ " -- interleaved with point updates. Euler tour handles subtree queries beautifully (every subtree is a contiguous range), but an arbitrary path like $7 \to 4 \to 2 \to 1 \to 3 \to 6$ is not a contiguous range in the Euler-tour array. The naive approach walks the path node by node: $O (n)$ per query. With $n = 2 \times 10^{5}$ and $q = 2 \times 10^{5}$ , that is $4 \times 10^{10}$ operations -- hopelessly slow.

We need a way to chop every tree path into a small number of contiguous array segments, each answerable by a segment tree in $O (\log n)$ .

Decompose the tree into heavy chains (paths through the heaviest child at each node) -- any root-to-leaf path crosses at most $O (\log n)$ chains, and each chain is a contiguous segment in a flat array, queryable with a segment tree.

Why "heaviest child"? At each internal node, the child with the largest subtree gets the heavy edge; all other children get light edges. A maximal path of consecutive heavy edges forms a heavy chain. Every node belongs to exactly one chain.

The magic: every time you cross a light edge going upward, the subtree size at least doubles. Since subtree size is bounded by $n$ , you cross at most $⌊ \log_{2} n ⌋$ light edges on any root-to-leaf path -- and therefore traverse at most $O (\log n)$ chains.

Assign DFS positions by always visiting the heavy child first. This guarantees each chain occupies a contiguous segment of positions. A path query from $u$ to $v$ decomposes into $O (\log n)$ chain segments, each answered by the segment tree in $O (\log n)$ . Total: $O (\log^{2} n)$ .

Visual Walkthrough

Step 1 -- Compute subtree sizes. Root the tree at node 1. Numbers in parentheses are subtree sizes.

           1(7)
          / \
       2(4)  3(2)
       / \     \
    4(2) 5(1)  6(1)
    |
    7(1)

Step 2 -- Classify edges as heavy or light. At each node, the edge to the child with the largest subtree is heavy (==); the rest are light (--).

    Heavy child selections:
      1 -> 2  (sz 4 beats sz 2)
      2 -> 4  (sz 2 beats sz 1)
      3 -> 6  (only child)
      4 -> 7  (only child)

    Edge types:
      1==2  heavy      1--3  light
      2==4  heavy      2--5  light
      3==6  heavy
      4==7  heavy

           1
          //\
        2    3
       //\    \\
      4   5    6
     //
     7

    // = heavy edge      \ = light edge

Step 3 -- Form chains. Each maximal run of heavy edges is one chain. Light children start new chains (head = themselves).

    Chain A (head=1):  1 ==> 2 ==> 4 ==> 7
    Chain B (head=5):  5
    Chain C (head=3):  3 ==> 6

Step 4 -- Assign DFS positions (heavy child first). Visit the heavy child before light children. This makes each chain contiguous in the position array.

    DFS visit order:  1, 2, 4, 7, 5, 3, 6

    Position array:
    +-----+-----+-----+-----+-----+-----+-----+
    |  1  |  2  |  4  |  7  |  5  |  3  |  6  |
    +-----+-----+-----+-----+-----+-----+-----+
    pos: 0    1     2     3     4     5     6
         |<--- chain A -------->|  B  |<- C ->|

    Subtree ranges (also contiguous!):
      subtree(1) = [0, 6]   subtree(2) = [1, 4]
      subtree(4) = [2, 3]   subtree(3) = [5, 6]

Step 5 -- Decompose a path query. Query: path from $7$ to $6$ .

    Path in tree: 7 -- 4 -- 2 -- 1 -- 3 -- 6

    Iteration 1:  hd[7]=1, hd[6]=3.  Different chains.
                  dep[hd[6]]=1 > dep[hd[7]]=0  -->  process v=6
                  seg_query(pos[3]..pos[6]) = seg_query(5, 6)   [chain C]
                  v = parent[hd[6]] = parent[3] = 1

    Iteration 2:  hd[7]=1, hd[1]=1.  SAME chain!
                  dep[1]=0 < dep[7]=3
                  seg_query(pos[1]..pos[7]) = seg_query(0, 3)   [chain A]

    Result = combine( seg_query(5,6), seg_query(0,3) )

    Total: 2 segment tree queries.  For general n, at most O(log n) queries.

Why the loop terminates: Each iteration removes one chain from the path. Since any path crosses $O (\log n)$ chains, the loop runs $O (\log n)$ times.

  PATH QUERY LOOP -- peeling off chains:

  Path: 7 -> 4 -> 2 -> 1 -> 3 -> 6

  Iter 1:  v=6 is on chain C (head=3, deeper)
           query seg[5..6], jump v = par[3] = 1
           Remaining: 7 -> ... -> 1

  Iter 2:  u=7 and v=1 share chain A (head=1)
           query seg[0..3]
           DONE -- both on same chain.

  +-- chain A (head=1) --+  +- chain C -+
  | [1]  [2]  [4]  [7]  |  | [3]  [6]  |
  | pos:0  1    2    3   |  | pos:5   6 |
  +--- query [0..3] -----+  +- q[5..6] -+

The Invariant

+----------------------------------------------------------------------+
|  1. Each heavy chain occupies a CONTIGUOUS range in the position     |
|     array (guaranteed by visiting the heavy child first in DFS).     |
|                                                                      |
|  2. Any path u -> v crosses O(log n) chains (because each light     |
|     edge at least doubles the subtree size).                         |
|                                                                      |
|  3. A segment tree built on the position array answers each chain   |
|     segment query in O(log n).  Total per path: O(log^2 n).         |
+----------------------------------------------------------------------+

When to Reach for This

Trigger phrases in problem statements:

"path queries with updates" -- HLD is the default tool.
"path sum / path max on a tree" with point or range updates.
"tree + segment tree" -- the problem is begging for HLD.
"edge values change, query path aggregate" -- HLD with edge-to-child remap.

Counter-examples (do NOT use HLD):

Subtree-only queries with no path queries -- Euler tour + BIT/segment tree is simpler and faster.
Only need LCA, no aggregates -- binary lifting or sparse table suffices.
Static tree, offline distance queries -- centroid decomposition or sparse table on Euler tour.
Problem only asks "is node $u$ an ancestor of $v$ ?" -- DFS in/out times are enough.

Choose HLD when ...

+------------------------------------------------------------------+
| Reach for HLD only when ALL of the following hold:               |
|                                                                  |
|   * The queries are on PATHS (not just subtrees) in the tree.    |
|   * The structure is dynamic: values change, or queries are      |
|     online (so an offline LCA + prefix-sum approach is closed).  |
|   * The aggregate is associative and segment-tree-friendly       |
|     (sum, max, min, gcd, matrix product, etc.).                  |
|                                                                  |
| If any of these is false, prefer the simpler tool:               |
|   * Subtree only           -> Euler tour + BIT / segment tree    |
|   * Static path queries    -> LCA + prefix sums                  |
|   * Distance-based count   -> Centroid decomposition             |
|   * "k special nodes"      -> Virtual / auxiliary tree           |
+------------------------------------------------------------------+

HLD competes with two other techniques in the chapter -- Euler tour subtree queries (lesson 01) and LCA via binary lifting (Chapter 3). The discriminator is path + dynamic + aggregate; if any of those is missing, one of the simpler tools wins.

What the Code Won't Teach You

HLD is a structural theorem, not a coding trick. The code shows you two DFS passes and a while loop. What it doesn't show is why it works: every light edge you cross going upward at least doubles the subtree size, so you cross at most $O (\log n)$ light edges on any root-to-leaf path. This is a mathematical fact about trees, independent of any code. If you lose this insight, you'll cargo-cult the template without understanding when it applies.

  WHY O(log n) CHAINS -- the doubling argument:

  When you cross a LIGHT edge upward from child c to parent p:
    sz[c] <= sz[p] / 2    (c is not the heaviest child)

  So the subtree size at least DOUBLES each time
  you cross a light edge going from leaf to root.

  leaf           root
    |              |
    c --light--> p --light--> g --light--> ...
  sz=1         sz>=2         sz>=4         sz>=8

  After k light edges: sz >= 2^k
  Since sz <= n:  k <= log₂(n)

  ==> At most O(log n) chain switches per path.

The debugging invariant is more important than the algorithm. After running dfs_hld, check end_[v] - pos_[v] == sz[v] for EVERY node. If this fails, your decomposition is broken -- full stop. This single assertion catches 90% of HLD bugs: wrong child-visit order, premature end_ assignment, forgotten heavy child. Learn to add it, verify, then remove.

Edge values require one subtle adjustment that looks trivial and is the #1 source of WA. When values live on edges (mapped to child nodes), the final segment query on the shared chain must use pos_[u] + 1 to exclude the LCA. And when u == v, skip the query entirely. This two-line fix accounts for more WA verdicts than the entire rest of HLD combined.

C++ STL Reference

Function / Class	Header	What it does	Time Complexity
`vector<int>`	`<vector>`	Adjacency list storage, HLD arrays	--
`swap(a, b)`	`<utility>`	Swap two values in the path query loop	$O (1)$
`max(a, b)` / `min(a, b)`	`<algorithm>`	Merge segment tree nodes	$O (1)$
`LLONG_MIN`	`<climits>`	Identity element for max queries	--
`scanf` / `printf`	`<cstdio>`	Fast I/O (preferred over cin/cout for tight TLs)	--

Implementation (Contest-Ready)

Version 1: Minimal Contest Template

Solves: point update on nodes + path max query (e.g., CSES Path Queries II).

cpp

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 200'005;
int n, q;
vector<int> adj[MAXN];
int par[MAXN], dep[MAXN], sz[MAXN], hvy[MAXN];
int hd[MAXN], pos_[MAXN], end_[MAXN], timer;
long long a[MAXN];

// --- Iterative segment tree (max, 0-indexed) ---
long long seg[4 * MAXN];

void seg_build() {
    for (int i = 0; i < n; i++) seg[i + n] = a[i];
    for (int i = n - 1; i > 0; i--) seg[i] = max(seg[2*i], seg[2*i+1]);
}

void seg_upd(int p, long long v) {
    for (seg[p += n] = v; p >>= 1; )
        seg[p] = max(seg[2*p], seg[2*p+1]);
}

long long seg_qry(int l, int r) {
    long long res = LLONG_MIN;
    for (l += n, r += n + 1; l < r; l >>= 1, r >>= 1) {
        if (l & 1) res = max(res, seg[l++]);
        if (r & 1) res = max(res, seg[--r]);
    }
    return res;
}

// --- HLD ---
void dfs_sz(int v, int p) {
    par[v] = p; sz[v] = 1; hvy[v] = -1;
    int mx = 0;
    for (int u : adj[v]) if (u != p) {
        dep[u] = dep[v] + 1;
        dfs_sz(u, v);
        sz[v] += sz[u];
        if (sz[u] > mx) { mx = sz[u]; hvy[v] = u; }
    }
}

void dfs_hld(int v, int h) {
    hd[v] = h; pos_[v] = timer++;
    if (hvy[v] != -1) dfs_hld(hvy[v], h);
    for (int u : adj[v]) if (u != par[v] && u != hvy[v])
        dfs_hld(u, u);
    end_[v] = timer;
}

long long path_query(int u, int v) {
    long long res = LLONG_MIN;
    while (hd[u] != hd[v]) {
        if (dep[hd[u]] < dep[hd[v]]) swap(u, v);
        res = max(res, seg_qry(pos_[hd[u]], pos_[u]));
        u = par[hd[u]];
    }
    if (dep[u] > dep[v]) swap(u, v);
    res = max(res, seg_qry(pos_[u], pos_[v]));
    return res;
}

int main() {
    scanf("%d %d", &n, &q);
    vector<long long> val(n + 1);
    for (int i = 1; i <= n; i++) scanf("%lld", &val[i]);
    for (int i = 0; i < n - 1; i++) {
        int u, v; scanf("%d %d", &u, &v);
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    dfs_sz(1, 0); dfs_hld(1, 1);
    for (int i = 1; i <= n; i++) a[pos_[i]] = val[i];
    seg_build();
    while (q--) {
        int t; scanf("%d", &t);
        if (t == 1) {
            int s; long long x; scanf("%d %lld", &s, &x);
            seg_upd(pos_[s], x);
        } else {
            int u, v; scanf("%d %d", &u, &v);
            printf("%lld\n", path_query(u, v));
        }
    }
}

Version 2: Explained Version with Subtree Queries

Same core logic with detailed comments. Adds subtree query support.

cpp

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 200'005;
int n, q;
vector<int> adj[MAXN];

// --- HLD arrays ---
int par[MAXN];   // parent of v (par[root] = 0, a sentinel)
int dep[MAXN];   // depth of v (root = 0)
int sz[MAXN];    // subtree size of v
int hvy[MAXN];   // heavy child of v (-1 if leaf)
int hd[MAXN];    // chain head: topmost node in v's chain
int pos_[MAXN];  // position of v in the flattened DFS-order array
int end_[MAXN];  // first position AFTER subtree(v); subtree = [pos_[v], end_[v])
int timer = 0;

// --- Iterative segment tree (max, 0-indexed positions) ---
// Leaves at seg[n..2n-1], internal at seg[1..n-1].
long long seg[4 * MAXN];

void seg_build() {
    for (int i = n - 1; i > 0; i--)
        seg[i] = max(seg[2 * i], seg[2 * i + 1]);
}

void seg_upd(int p, long long v) {
    // Set position p to value v, propagate up to root.
    seg[p += n] = v;
    while (p >>= 1)
        seg[p] = max(seg[2 * p], seg[2 * p + 1]);
}

long long seg_qry(int l, int r) {
    // Query max over [l, r] (inclusive, 0-indexed).
    long long res = LLONG_MIN;
    for (l += n, r += n + 1; l < r; l >>= 1, r >>= 1) {
        if (l & 1) res = max(res, seg[l++]);
        if (r & 1) res = max(res, seg[--r]);
    }
    return res;
}

// --- DFS 1: compute parent, depth, subtree size, heavy child ---
// Heavy child = child with largest subtree. This greedy choice
// minimizes chain switches on any root-to-leaf path.
void dfs_sz(int v, int p) {
    par[v] = p;
    sz[v] = 1;
    hvy[v] = -1;
    int mx = 0;
    for (int u : adj[v]) {
        if (u == p) continue;
        dep[u] = dep[v] + 1;
        dfs_sz(u, v);
        sz[v] += sz[u];
        if (sz[u] > mx) {
            mx = sz[u];
            hvy[v] = u;
        }
    }
}

// --- DFS 2: assign chain heads and DFS-order positions ---
// Visit heavy child FIRST so each chain occupies contiguous
// positions. Light children start new chains (head = themselves).
void dfs_hld(int v, int h) {
    hd[v] = h;
    pos_[v] = timer++;
    if (hvy[v] != -1)
        dfs_hld(hvy[v], h);     // heavy child continues this chain
    for (int u : adj[v]) {
        if (u == par[v] || u == hvy[v]) continue;
        dfs_hld(u, u);          // light child starts a new chain
    }
    end_[v] = timer;            // subtree(v) = [pos_[v], end_[v])
}

// --- Path query: max over the path u <-> v ---
// Walk u and v upward chain by chain until they share a chain.
// Always process the deeper chain head first.
long long path_query(int u, int v) {
    long long res = LLONG_MIN;
    while (hd[u] != hd[v]) {
        if (dep[hd[u]] < dep[hd[v]]) swap(u, v);
        // Query from u's chain head down to u
        res = max(res, seg_qry(pos_[hd[u]], pos_[u]));
        u = par[hd[u]];         // jump above the chain
    }
    // u and v now share a chain; query between them
    if (dep[u] > dep[v]) swap(u, v);
    res = max(res, seg_qry(pos_[u], pos_[v]));
    return res;
}

// --- Subtree query: max over all nodes in subtree(v) ---
// subtree(v) maps to [pos_[v], end_[v] - 1] in the flat array.
long long subtree_query(int v) {
    return seg_qry(pos_[v], end_[v] - 1);
}

// --- Point update: set node v's value to x ---
void node_update(int v, long long x) {
    seg_upd(pos_[v], x);
}

// --- LCA via HLD (free byproduct) ---
int lca(int u, int v) {
    while (hd[u] != hd[v]) {
        if (dep[hd[u]] < dep[hd[v]]) swap(u, v);
        u = par[hd[u]];
    }
    return dep[u] < dep[v] ? u : v;
}

int main() {
    scanf("%d %d", &n, &q);
    vector<long long> val(n + 1);
    for (int i = 1; i <= n; i++) scanf("%lld", &val[i]);
    for (int i = 0; i < n - 1; i++) {
        int u, v; scanf("%d %d", &u, &v);
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    // Phase 1: subtree sizes and heavy children
    dep[1] = 0;
    dfs_sz(1, 0);

    // Phase 2: assign positions (heavy child first for contiguous chains)
    timer = 0;
    dfs_hld(1, 1);

    // Phase 3: place node values into flat array at HLD positions
    for (int i = 1; i <= n; i++) seg[pos_[i] + n] = val[i];
    seg_build();

    while (q--) {
        int t; scanf("%d", &t);
        if (t == 1) {
            int s; long long x; scanf("%d %lld", &s, &x);
            node_update(s, x);
        } else if (t == 2) {
            int u, v; scanf("%d %d", &u, &v);
            printf("%lld\n", path_query(u, v));
        } else {
            int v; scanf("%d", &v);
            printf("%lld\n", subtree_query(v));
        }
    }
}

Switching from max to sum: Replace LLONG_MIN with 0, replace every max(...) with +. The structure is identical.

Operations Reference

Operation	Time	Space	Notes
Build (2 DFS + seg tree)	$O (n)$	$O (n)$	Two DFS passes + segment tree build
Path query (max/sum/min)	$O (\log^{2} n)$	--	$O (\log n)$ chain segments $\times$ $O (\log n)$ per query
Path update (single node)	$O (\log n)$	--	Single segment tree point update
Path update (range, lazy)	$O (\log^{2} n)$	--	$O (\log n)$ chain segments, each a range update
Subtree query	$O (\log n)$	--	Single contiguous range in segment tree
Subtree update (range, lazy)	$O (\log n)$	--	Single contiguous range
LCA	$O (\log n)$	--	Same loop as path query, no seg tree calls
Total space	--	$O (n)$	HLD arrays + segment tree

  COMPLEXITY BREAKDOWN -- where the logs come from:

  Path query:
    O(log n) chain segments  x  O(log n) per seg tree query
    = O(log² n) total

  Subtree query:
    1 contiguous range  x  O(log n) per seg tree query
    = O(log n) total

  LCA (free byproduct):
    O(log n) chain jumps  x  O(1) per jump (no seg tree)
    = O(log n) total

Problem Patterns & Tricks

Pattern 1: Path Aggregate (Max / Sum / Min)

The bread-and-butter HLD problem. Given mutable node values, answer "what is the max/sum on path $u \to v$ ?" Decompose the path into $O (\log n)$ chain segments, query each on the segment tree.

Problems: CSES Path Queries II, CF 343D

Pattern 2: Edge Values (Remap to Child Node)

When values live on edges rather than nodes, assign each edge's value to its child endpoint (the node farther from root). The query loop is the same as Pattern 1, but the final segment query on the shared chain differs by exactly one position. Side by side:

                Node values                 Edge values (stored on child)
                -----------                 ------------------------------
chain segments  seg_qry(pos[hd[u]],         seg_qry(pos[hd[u]],
during loop:      pos[u])                     pos[u])     -- same

final query     seg_qry(pos[lca],           seg_qry(pos[lca] + 1,
(u shallower      pos[v])                     pos[v])
= LCA):
                INCLUDES the LCA node       EXCLUDES the LCA node
                                            (its slot stores the edge
                                            above LCA, not on the path)

u == v case:    seg_qry returns a[lca]      empty range -> identity
                                            (skip the call)

The reason the loop body is identical: while we are still on the deeper chain, the segment we read corresponds to a sequence of child slots, none of which is yet the LCA. Only the final shared-chain query touches the LCA, and that is the one place where the node/edge convention matters.

cpp

// Edge-values: final query on shared chain (u is shallower)
if (u != v)  // skip empty range when path endpoints coincide
    res = max(res, seg_qry(pos_[u] + 1, pos_[v]));

Problems: SPOJ QTREE, CF 593D

Pattern 3: Subtree Queries

HLD's DFS ordering places every subtree in a contiguous segment $[pos [v], end [v])$ . Subtree sum, subtree max, or "paint entire subtree" become single segment tree operations. Combine with path queries in the same problem.

Problems: CF 343D, CSES Subtree Queries

Pattern 4: LCA as Free Byproduct

The HLD path loop naturally finds the LCA: when both nodes reach the same chain, the shallower one is the LCA. No binary lifting or sparse table needed.

cpp

int lca(int u, int v) {
    while (hd[u] != hd[v]) {
        if (dep[hd[u]] < dep[hd[v]]) swap(u, v);
        u = par[hd[u]];
    }
    return dep[u] < dep[v] ? u : v;
}

Problems: CF 916E

Pattern 5: Path Update + Path Query (Lazy Propagation)

When updates affect entire paths (e.g., "add $x$ to all nodes on $u \to v$ "), combine HLD with a lazy segment tree. The decomposition into chain segments works identically; each segment gets a range update instead of a point query.

Problems: CF 343D (subtree fill + path clear)

Pattern 6: Non-Commutative Operations

For operations where order matters (e.g., matrix product along a path), the left-to-right direction of chain segments matters. When walking from $u$ to LCA, segments accumulate bottom-to-top; from LCA to $v$ , top-to-bottom. Maintain two accumulators and merge carefully.

cpp

// Collect left_acc  (u -> LCA direction, reversed segments)
// Collect right_acc (LCA -> v direction, in-order segments)
// answer = reverse(left_acc) * right_acc

Problems: CF 487E

Contest Cheat Sheet

+------------------------------------------------------------------+
| HEAVY-LIGHT DECOMPOSITION CHEAT SHEET                            |
+------------------------------------------------------------------+
| WHEN TO USE:                                                     |
|   - Path queries (max/sum/min) with updates on a tree            |
|   - Subtree + path queries in the same problem                   |
|   - Edge or node values that change dynamically                  |
|   - Need LCA + range queries in one structure                    |
+------------------------------------------------------------------+
| BUILD (2 DFS passes):                                            |
|   dfs_sz: par, dep, sz, hvy (heavy child = largest subtree)     |
|   dfs_hld: hd (chain head), pos_ (DFS pos), end_ (subtree end) |
|   Visit heavy child FIRST for contiguous chains.                 |
+------------------------------------------------------------------+
| PATH QUERY:                                                      |
|   while hd[u] != hd[v]:                                         |
|     climb deeper chain head, query its segment, jump to parent   |
|   query [pos[shallow], pos[deep]] on the shared chain            |
+------------------------------------------------------------------+
| SUBTREE:   seg_qry(pos_[v], end_[v] - 1)                        |
| EDGE VALS: skip LCA node: use pos_[u]+1 on last segment         |
| SUM<->MAX: swap LLONG_MIN/0, swap max/+                         |
+------------------------------------------------------------------+
| COMPLEXITY:                                                      |
|   Build: O(n)  Path query: O(log^2 n)  Subtree: O(log n)        |
|   Space: O(n)  LCA: O(log n)                                    |
+------------------------------------------------------------------+
| GOTCHA: heavy child visited first in dfs_hld, not random order.  |
| GOTCHA: end_[v] = timer set AFTER recursing all children.        |
+------------------------------------------------------------------+

Gotchas & Debugging

1. Stack Overflow on Deep Trees

Both dfs_sz and dfs_hld are recursive. For $n \geq 10^{5}$ with a degenerate (line) tree, the default 8 MB stack overflows. Fix options:

Linux: run with ulimit -s unlimited before executing.
Codeforces: add #pragma comment(linker, "/STACK:1000000000") (MSVC) or increase stack in submission settings.
Best: convert both DFS functions to iterative using an explicit stack.

2. Heavy Child Not Visited First

If you iterate adj[v] without special-casing hvy[v], chains will not be contiguous in the flat array. The heavy child must be the first recursive call in dfs_hld.

3. end_[] Set Too Early

end_[v] must be assigned after all children of $v$ are fully processed. Setting it before recursing into light children gives incorrect subtree ranges. Rule: end_[v] = timer is the last statement in dfs_hld.

4. Edge Values: Forgetting to Exclude LCA

When values are on edges (mapped to child nodes), the last chain segment query must skip the LCA node. Use pos_[u] + 1 as the left bound. When $u = v$ (LCA is both endpoints), skip the query entirely to avoid an invalid range.

5. Non-Commutative Merge Direction

For non-commutative operations, segments collected while climbing from $u$ are in reverse order relative to the path direction. Failing to reverse one side before combining is a common source of WA.

6. Confusing pos_[] with Node Index

pos_[v] is the segment tree position, not the node number. When placing input values into the flat array, write arr[pos_[v]] = val[v], not arr[v] = val[v].

7. Root Choice

Most problems root at node 1. If the root differs, update both calls: dfs_sz(root, 0) and dfs_hld(root, root). Using the wrong root silently corrupts all arrays.

8. Debugging Tips

Verify end_[v] - pos_[v] == sz[v] for every node. If not, the DFS order is wrong.
Print each chain segment the path query loop processes. Count them -- should be $\leq 2 \log_{2} n$ .
Test on a line graph (one long chain, worst case for depth) and a star graph (all light edges, worst case for chain count).
For WA, check: did you build the segment tree on a[pos_[v]] (HLD order), not a[v] (input order)?

Position Mapping Pitfalls

Values live at pos[v], NOT at v. When you HLD-decompose, each vertex v gets a position pos[v] in the flattened array. The segment tree is built over this flattened array. When updating vertex v's value, update position pos[v] in the segment tree.

Heavy child must be processed FIRST in the DFS that assigns positions. This ensures each heavy chain occupies a contiguous range in the flattened array.

end[v] = the last position in v's subtree. Subtree query on v = range query [pos[v], end[v]].

Mental Traps

Trap 1: "I need two DFS passes -- that must be where the bug is."

The two DFS passes (dfs_sz for sizes, dfs_hld for positions) are clean and rarely wrong in isolation. The real bug is almost always in the interface: placing values at pos_[v] instead of v when building the segment tree, or forgetting to visit the heavy child first in dfs_hld. When debugging, don't stare at the DFS code -- check the data flow between the two passes and the segment tree.

  WHERE HLD BUGS ACTUALLY HIDE:

  dfs_sz -----> dfs_hld -----> seg_build -----> path_query
    |              |               |                |
  rarely       heavy child     pos_[v] vs v     depth swap
  wrong        visited first?  in array init?   on shared chain?
               (common bug)    (common bug)     (common bug)
                   ^                ^                ^
                   |                |                |
                   +--- 90% of HLD bugs are HERE ---+

Trap 2: "HLD is overkill -- I'll just use Euler tour for path queries."

Euler tour gives you subtree ranges, not path ranges. A path like $7 \to 4 \to 2 \to 1 \to 3 \to 6$ is NOT a contiguous range in any Euler tour ordering. You need HLD (or LCA + difference tricks for simpler aggregates). The moment you see "path query with updates," Euler tour alone is insufficient.

  Euler tour array:  [1, 2, 4, 7, 5, 3, 6]
                     
  subtree(2) = [1..4] contiguous  OK
  path 7->6   = positions {3, 2, 1, 0, 5, 6}  NOT contiguous  X
                     
  HLD array:         [1, 2, 4, 7, 5, 3, 6]  (same array!)
                     
  path 7->6 = chain A segment [0..3] + chain C segment [5..6]
           = 2 contiguous segments  OK  (O(log n) segments max)

Common Mistakes

Forgetting to normalize the same-chain segment query. After the chain-climbing loop equalizes u and v onto the same chain, pos[u] can still be greater than pos[v]. HLD code passes samples but WAs on test 3 with a path-max query: the code queries [pos[u], pos[v]] assuming u is higher, but either pointer can end up deeper. Fix: after the while loop, if (pos[u] > pos[v]) swap(u, v); query(pos[u], pos[v]);

Debug Drills

Drill 1 -- HLD chain-climbing missing normalization

cpp

long long path_query(int u, int v) {
    long long res = 0;
    while (head[u] != head[v]) {
        if (depth[head[u]] < depth[head[v]]) swap(u, v);
        res += seg.query(pos[head[u]], pos[u]);
        u = parent[head[u]];
    }
    // BUG: missing normalization
    res += seg.query(pos[u], pos[v]);
    return res;
}

What's wrong?

After the while loop, u and v are on the same chain, but pos[u] might be greater than pos[v]. The segment tree query assumes l <= r.

Fix: Add if (depth[u] > depth[v]) swap(u, v); before the final query, then query (pos[u], pos[v]).

Drill 2 -- HLD DFS visits light child before heavy child

cpp

void dfs_hld(int u, int p) {
    pos[u] = timer++;
    for (int v : adj[u]) {
        if (v == p) continue;
        head[v] = (v == heavy[u]) ? head[u] : v;
        dfs_hld(v, u);  // visits children in adjacency list order
    }
}

What's wrong?

The heavy child is not visited first. In the adjacency list, the heavy child might appear at any position. If a light child is visited before the heavy child, the heavy chain won't be contiguous in the pos[] array.

Fix: Before the loop, explicitly visit the heavy child first:

cpp

if (heavy[u] != -1) {
    head[heavy[u]] = head[u];
    dfs_hld(heavy[u], u);
}
for (int v : adj[u]) {
    if (v == p || v == heavy[u]) continue;
    head[v] = v;
    dfs_hld(v, u);
}

Drill 3 -- Edge-weight path query includes the LCA vertex

cpp

long long path_query_edges(int u, int v) {
    long long res = 0;
    while (head[u] != head[v]) {
        if (depth[head[u]] < depth[head[v]]) swap(u, v);
        res += seg.query(pos[head[u]], pos[u]);
        u = parent[head[u]];
    }
    if (depth[u] > depth[v]) swap(u, v);
    res += seg.query(pos[u], pos[v]);  // includes LCA vertex
    return res;
}

What's wrong?

For edge weights (stored at the deeper node), the LCA vertex should NOT be included -- it represents the edge above the LCA. The final query should be seg.query(pos[u] + 1, pos[v]). Also need to handle the case where u == v (no edges on the path -> result is 0).

Fix:

cpp

if (u != v)
    res += seg.query(pos[u] + 1, pos[v]);

Self-Test

Before moving on, verify you can answer these without looking at the notes:

[ ] Write dfs_sz and dfs_hld from memory -- including heavy child identification, heavy-child-first visit order, and end_[v] = timer as the last statement.
[ ] Write the path_query while-loop from memory -- the chain-head depth comparison, the seg_qry bounds, the par[hd[u]] jump, and the final same-chain query with depth swap.
[ ] Explain why end_[v] - pos_[v] must equal sz[v] -- and how to use this as a runtime assertion to catch broken decompositions.
[ ] Handle edge values: explain the pos_[u] + 1 adjustment on the final chain segment and why you skip the query when u == v.
[ ] State the complexity: $O (n)$ build, $O (\log^{2} n)$ path query, $O (\log n)$ subtree query -- and explain where each $\log$ factor comes from.

Practice Problems

CF Rating	What You Should Be Able To Do
1200	Understand DFS ordering and parent arrays -- HLD isn't expected yet
1500	Know that HLD exists; solve subtree queries with Euler tour + BIT
1800	Implement HLD for path sum/max queries; handle edge vs. vertex variants
2100	Combine HLD with lazy propagation for path updates; handle edge cases like single-node paths; solve problems mixing path and subtree queries

#	Problem	Source	Difficulty	Key Idea
1	Path Queries	CSES	Easy	Path sum + node update; solvable with Euler tour or HLD
2	Path Queries II	CSES	Medium	Path max + point update; textbook HLD application
3	QTREE -- Query on a tree	SPOJ	Medium	Path max on edges; classic edge-to-child remap
4	Water Tree -- CF 343D	CF	Medium	Subtree fill + ancestor clear + point query
5	Milk Visits -- USACO Plat Dec 2019	USACO	Medium	Path queries with node types on a tree
6	Happy Tree Party -- CF 593D	CF	Medium-Hard	Path product on edges with division updates
7	Jamie and Tree -- CF 916E	CF	Hard	Virtual re-rooting + subtree/path HLD operations
8	Tourists -- CF 487E	CF	Very Hard	Block-cut tree + HLD; path min with updates

Heavy-Light Decomposition ​

Intuition ​

Visual Walkthrough ​

The Invariant ​

When to Reach for This ​

Choose HLD when ... ​

What the Code Won't Teach You ​

C++ STL Reference ​

Implementation (Contest-Ready) ​

Version 1: Minimal Contest Template ​

Version 2: Explained Version with Subtree Queries ​

Operations Reference ​

Problem Patterns & Tricks ​

Pattern 1: Path Aggregate (Max / Sum / Min) ​

Pattern 2: Edge Values (Remap to Child Node) ​

Pattern 3: Subtree Queries ​

Pattern 4: LCA as Free Byproduct ​

Pattern 5: Path Update + Path Query (Lazy Propagation) ​

Pattern 6: Non-Commutative Operations ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

1. Stack Overflow on Deep Trees ​

2. Heavy Child Not Visited First ​

3. end_[] Set Too Early ​

4. Edge Values: Forgetting to Exclude LCA ​

5. Non-Commutative Merge Direction ​

6. Confusing pos_[] with Node Index ​

7. Root Choice ​

8. Debugging Tips ​

Position Mapping Pitfalls ​

Mental Traps ​

Common Mistakes ​

Debug Drills ​

Self-Test ​

Practice Problems ​

Further Reading ​

Related Techniques ​

Heavy-Light Decomposition

Intuition

Visual Walkthrough

The Invariant

When to Reach for This

Choose HLD when ...

What the Code Won't Teach You

C++ STL Reference

Implementation (Contest-Ready)

Version 1: Minimal Contest Template

Version 2: Explained Version with Subtree Queries

Operations Reference

Problem Patterns & Tricks

Pattern 1: Path Aggregate (Max / Sum / Min)

Pattern 2: Edge Values (Remap to Child Node)

Pattern 3: Subtree Queries

Pattern 4: LCA as Free Byproduct

Pattern 5: Path Update + Path Query (Lazy Propagation)

Pattern 6: Non-Commutative Operations

Contest Cheat Sheet

Gotchas & Debugging

1. Stack Overflow on Deep Trees

2. Heavy Child Not Visited First

3. end_[] Set Too Early

4. Edge Values: Forgetting to Exclude LCA

5. Non-Commutative Merge Direction

6. Confusing pos_[] with Node Index

7. Root Choice

8. Debugging Tips

Position Mapping Pitfalls

Mental Traps

Common Mistakes

Debug Drills

Self-Test

Practice Problems

Further Reading

Related Techniques