Appearance
Persistent Segment Tree
Quick Revisit
- USE WHEN: Querying historical versions of an array, or finding k-th smallest element in any subarray
- INVARIANT: Each update allocates O(log n) NEW nodes along the affected root-to-leaf path; all other nodes are SHARED with the previous version via pointer (copy-on-write)
- TIME: O(log n) per update/query | SPACE: O(n log n) total across all versions
- CLASSIC BUG: Reusing node storage (array-based pool) without allocating new nodes on update — silently corrupts previous versions
- PRACTICE: 05-ladder-graphs
Maintain multiple versions of a segment tree where each update allocates
new nodes along one root-to-leaf path and shares every other subtree with the previous version (copy-on-write) -- answer historical queries and find the -th smallest in any subarray in .
Difficulty: [Advanced] -- CF 1900-2100
Prerequisites: Segment Tree, Coordinate Compression
ID: DS-14
Why is this in the graph chapter? Persistent segment trees are a data structure, not a graph algorithm, but they show up here because the most common contest applications layer them on top of tree decompositions: persistent segment trees keyed by Euler-tour position give online
-th-smallest-on-path queries, persistent BITs combined with HLD give versioned path aggregates, and persistent arrays are the standard tool for offline-to-online conversions on tree problems. The technique deserves its own lesson alongside the tree-query decompositions because almost every advanced application of it is on a tree.
Intuition
You have an array and receive queries of the form: "What is the
a = [3, 1, 4, 1, 5, 9] (indices 1..6)
Query: 2nd smallest in a[2..5] = {1, 4, 1, 5} --> answer is 1
Query: 3rd smallest in a[1..4] = {3, 1, 4, 1} --> answer is 3Naive approach: For each query, extract the subarray, sort it, return the
| Approach | Per query | |
|---|---|---|
| Sort subarray | ||
| Merge sort tree + binary search | ||
| Persistent segment tree |
The merge sort tree approach works but requires binary search over the answer plus a count query per candidate, giving an extra
We need a way to "subtract" the frequency information of prefix
"Build a segment tree over values for each prefix of the array; instead of storing
Think of it like version control for files. When you commit a change to one file in a repository, Git does not copy every file -- it reuses the unchanged blobs and only stores the diff. A persistent segment tree does the same: when one path from root to leaf changes, all other subtrees are shared via pointers.
The analogy is nearly exact: each "commit" (version) is a new root pointer, the "files" are leaf nodes, and "directories" are internal nodes. The only place it breaks down is that Git uses content-addressable hashing while we use direct pointers.
Visual Walkthrough
We build a segment tree over a value range a one at a time, creating a new version after each insertion.
Compressed values (for simplicity, values happen to already be in
a = [3, 1, 4, 1, 5, 9] --> compressed: [2, 1, 3, 1, 4, 5]
sorted distinct: {1, 3, 4, 5, 9} --> mapped to {1, 2, 3, 4, 5}
original: 1 -> 1, 3 -> 2, 4 -> 3, 5 -> 4, 9 -> 5Value range segment tree has nodes covering
Version 0 (empty tree):
root0
[1..5]=0
/ \
[1..3]=0 [4..5]=0
/ \ / \
[1..2]=0 [3]=0 [4]=0 [5]=0
/ \
[1]=0 [2]=0Step 1: Insert compressed value 2 (original 3). Create version 1.
Only the path containing value 2 changes: root -> [1..3] -> [1..2] -> [2]. We create 4 new nodes (marked *). All other subtrees are shared.
root0 root1*
[1..5]=0 [1..5]=1*
/ \ / \
[1..3]=0 [4..5]=0 [1..3]=1* [4..5]=0 <-- shared!
/ \ / \
[1..2]=0 [3]=0 [1..2]=1* [3]=0 <-- shared!
/ \ / \
[1]=0 [2]=0 [1]=0 [2]=1*
^-- shared! ^-- shared!Step 2: Insert compressed value 1 (original 1). Create version 2.
root1 root2*
[1..5]=1 [1..5]=2*
/ \ / \
[1..3]=1 [4..5]=0 [1..3]=2* [4..5]=0 <-- shared!
/ \ / \
[1..2]=1 [3]=0 [1..2]=2* [3]=0 <-- shared!
/ \ / \
[1]=0 [2]=1 [1]=1* [2]=1
^-- shared from v1!Memory growth after all 6 insertions:
Version: v0 v1 v2 v3 v4 v5 v6
New nodes: 8 4 4 4 4 4 4
Total: 8 12 16 20 24 28 32
+--+--+--+--+--+--+--+
| v0 base tree (8) | = O(m) initial nodes
+--+--+--+--+--+--+--+
|v1|v2|v3|v4|v5|v6| | = 6 x O(log m) new nodes
+--+--+--+--+--+--+ |
| shared subtrees | = most of the pool
+----------------------+
Total pool: O(m + n log m) ~= 8 + 6x4 = 32 nodes
(vs 8x7 = 56 if we copied whole tree each time)NODE POOL LAYOUT -- persistent segment tree memory:
Pool index: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ...
├──── v0 base tree ────┤├─ v1 ─┤├─ v2 ─┤├─ v3 ─┤
│ 8 nodes (full tree) │ 4 new │ 4 new │ 4 new │
└──────────────────────┘└───────┘└──────┘└──────┘
roots[0] = 1 (root of base tree)
roots[1] = 9 (root of version 1, shares nodes 2-8)
roots[2] = 13 (root of version 2, shares nodes from v0 and v1)
roots[3] = 17 (root of version 3)
Each new version adds O(log m) nodes to the pool.
Total after n updates: 8 + 4n nodes (for m=8 values).
┌──────────────────────────────────────────────────┐
│ NEVER modify pool[1..8] -- they belong to v0. │
│ NEVER modify pool[9..12] -- they belong to v1. │
│ Old versions are IMMUTABLE. │
└──────────────────────────────────────────────────┘After inserting all 6 elements we have versions
Answering: "2nd smallest in
Prefix
Walk both trees in parallel from root:
Step 1: At root [1..5]
left_count = v5.left.count - v1.left.count
= 3 - 1 = 2
k = 2. Since k <= left_count (2 <= 2), go LEFT.
Step 2: At [1..3]
left_count = v5.[1..2].count - v1.[1..2].count
= 2 - 1 = 1
k = 2. Since k > left_count (2 > 1), go RIGHT. k = k - 1 = 1.
Step 3: At [3..3] -- leaf!
Value 3 in compressed space = original value 4.
Answer: 4.Verify:
Wait -- let me recheck the compression.
Let me redo the walk carefully. Versions:
Step 1: root [1..5]
left_count = v5.[1..3].cnt - v1.[1..3].cnt = 3 - 1 = 2
k = 2 <= 2, go LEFT into [1..3].
Step 2: node [1..3]
left_count = v5.[1..2].cnt - v1.[1..2].cnt = 2 - 1 = 1
k = 2 > 1, go RIGHT into [3..3]. k = 2 - 1 = 1.
Step 3: leaf [3..3]
Compressed value 3 maps to original value 4.Sorted subarray
After inserting
After inserting
VERSION COMPARISON -- "subarray tree" via subtraction:
Array a = [3, 1, 4, 1, 5], values compressed to [1..5]
Version 2 (prefix [1..2]): Version 5 (prefix [1..5]):
[1..5]=2 [1..5]=5
/ \ / \
[1..3]=2 [4..5]=0 [1..3]=3 [4..5]=2
/ \ / \ / \
[1..2]=2 [3]=0 [1..2]=2 [3]=1 [4]=1 [5]=1
Difference (subarray [3..5] = {4, 1, 5}):
[1..5]: 5-2 = 3
/ \
[1..3]: 3-2=1 [4..5]: 2-0=2
/ \ / \
[1..2]:0 [3]:1 [4]:1 [5]:1
k=2 query on [3..5]: left_count=1, k=2>1 -> go right
at [4..5]: left_count([4])=1, k=1<=1 -> go left
leaf [4] -> original value 5. Sorted {1,4,5}, 2nd = 4...
(Verify by hand: subarray [3..5] = {4,1,5}, sorted = {1,4,5}, k=2 -> 4 OK)Brute force: sort subarray (
The Invariant
+---------------------------------------------------------------+
| INVARIANT: Version i of the persistent segment tree equals |
| the frequency-count tree built from {a[1], a[2], ..., a[i]}. |
| Versions i and i-1 share all nodes EXCEPT the O(log n) nodes |
| on the root-to-leaf path for the inserted value. |
| Therefore: cnt_in_range(l, r, val_range) = |
| version[r].query(val_range) - version[l-1].query(val_range) |
+---------------------------------------------------------------+This is the "prefix sum over segment trees" idea. Just as prefix[r] - prefix[l-1] gives the sum of a subarray, differencing two versions gives the frequency counts for any contiguous subarray. The k-th smallest walk exploits this by choosing left/right at each level based on the difference count.
Why is the invariant maintained? Each insertion adds exactly one occurrence of a value, updating only the root-to-leaf path for that value's position. All sibling subtrees are shared by pointer, so they carry identical counts. No node from a previous version is ever mutated.
When to Reach for This
Trigger phrases in problem statements:
- "
-th smallest in subarray " -> persistent segment tree - "answer queries about version
of the data structure" -> persistence - "undo the last update" or "time-travel queries" -> persistence
- "
, queries online" + "order statistics on subarrays" - "number of distinct values in range less than
" -> persistence on values
Counter-examples (looks like persistence but isn't):
- "k-th smallest in the whole array with updates" -- that is an order-statistics tree or a BIT with binary search. No need for persistence.
- "
-th smallest in subarrays, but queries are offline and " -- Mo's algorithm with an order-statistics structure may suffice and is simpler to implement. - "Range update + range query" -- that is a lazy segment tree, not persistence. Persistence and lazy propagation interact badly (see Gotchas).
What the Code Won't Teach You
Persistence is a mindset, not a data structure. The code creates new nodes and shares old subtrees -- that's the mechanism. The insight is deeper: any tree-shaped data structure can be made persistent by copying only the O(log n) nodes on the root-to-update path. This same principle applies to persistent tries, persistent balanced BSTs, and persistent heaps. Once you see the pattern -- "copy the path, share the rest" -- you'll recognize persistence opportunities everywhere.
THE "COPY THE PATH" PRINCIPLE:
Version i: Version i+1:
R R' <- new root
/ \ / \
A B A' B <- B shared!
/ \ / \
C D C D' <- C shared!
^
only D changed -> new D'
New nodes: R', A', D' (O(log n) = 3 nodes)
Shared: B, C (O(n) nodes reused)
Old version i still works through root RThe prefix-sum analogy is the entire algorithm. Just as prefix[r] - prefix[l-1] gives you a subarray sum, version[r] - version[l-1] gives you frequency counts for any subarray. If you understand prefix sums, you understand persistent segment trees -- the only new idea is that each "prefix" is an entire segment tree rather than a single number.
Memory is the real enemy, not time. Each update creates
Persistence on trees opens up path queries. When your data lives on a tree (not an array), build persistent segment trees along the DFS: the version at node
PERSISTENCE ON TREES -- path query structure:
Tree: Persistent seg trees:
1 v1 = insert(empty, val[1])
/ \ v2 = insert(v1, val[2])
2 3 v3 = insert(v1, val[3])
/ \ v4 = insert(v2, val[4])
4 5 v5 = insert(v2, val[5])
Query: k-th smallest on path 4->3
LCA(4,3) = 1, parent(LCA) = none (root)
Count in [val_range] on path 4->3:
= cnt(v4) + cnt(v3) - cnt(v1) - cnt(v_parent(1))
= cnt(v4) + cnt(v3) - cnt(v1) - 0
Same prefix-subtraction idea, but on a tree!C++ STL Reference
No STL container directly implements persistence. We build from scratch.
| Component | Usage | Notes |
|---|---|---|
new / memory pool | Allocate nodes dynamically | Pool allocation is 3-5x faster than new |
vector<int> | Store root pointers per version | roots[i] = root of version |
sort + unique + lower_bound | Coordinate compression | Map values to |
<algorithm> | lower_bound for decompression | Recover original value from compressed |
Implementation
Minimal Contest Template
cpp
#include <bits/stdc++.h>
using namespace std;
const int MAXN = 200'005;
const int MAXNODES = MAXN * 20; // n * ~20 nodes (n log n)
int lc[MAXNODES], rc[MAXNODES], cnt[MAXNODES];
int tot = 0;
int roots[MAXN];
int build(int l, int r) {
int p = ++tot;
cnt[p] = 0;
if (l == r) return p;
int m = (l + r) / 2;
lc[p] = build(l, m);
rc[p] = build(m + 1, r);
return p;
}
int update(int prev, int l, int r, int pos) {
int p = ++tot;
lc[p] = lc[prev]; rc[p] = rc[prev]; cnt[p] = cnt[prev] + 1;
if (l == r) return p;
int m = (l + r) / 2;
if (pos <= m) lc[p] = update(lc[prev], l, m, pos);
else rc[p] = update(rc[prev], m + 1, r, pos);
return p;
}
int query(int u, int v, int l, int r, int k) {
if (l == r) return l;
int m = (l + r) / 2;
int d = cnt[lc[v]] - cnt[lc[u]];
if (k <= d) return query(lc[u], lc[v], l, m, k);
else return query(rc[u], rc[v], m + 1, r, k - d);
}
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n, q;
cin >> n >> q;
vector<int> a(n), sorted_a;
for (int i = 0; i < n; i++) { cin >> a[i]; sorted_a.push_back(a[i]); }
sort(sorted_a.begin(), sorted_a.end());
sorted_a.erase(unique(sorted_a.begin(), sorted_a.end()), sorted_a.end());
int m = sorted_a.size();
auto compress = [&](int x) {
return int(lower_bound(sorted_a.begin(), sorted_a.end(), x) - sorted_a.begin());
};
roots[0] = build(0, m - 1);
for (int i = 0; i < n; i++)
roots[i + 1] = update(roots[i], 0, m - 1, compress(a[i]));
while (q--) {
int l, r, k;
cin >> l >> r >> k; // 1-indexed
int idx = query(roots[l - 1], roots[r], 0, m - 1, k);
cout << sorted_a[idx] << '\n';
}
return 0;
}Explained Version
cpp
#include <bits/stdc++.h>
using namespace std;
// --- Memory pool for persistent nodes ---
// Each version creates O(log m) new nodes. Total nodes: O(n log m).
// With n = 2*10^5 and m = 2*10^5, log m ~ 18, so ~3.6*10^6 nodes.
// We allocate 20 * MAXN to be safe.
const int MAXN = 200'005;
const int MAXNODES = MAXN * 20;
int lc[MAXNODES]; // left child index
int rc[MAXNODES]; // right child index
int cnt[MAXNODES]; // count of values in this subtree
int tot = 0; // next free node index
// roots[i] = root node index of version i.
// Version 0 = empty tree.
// Version i = tree after inserting a[0], a[1], ..., a[i-1].
int roots[MAXN];
// Build the initial empty tree over value range [l, r].
// Every node has count 0. Returns the root of this subtree.
int build(int l, int r) {
int p = ++tot;
cnt[p] = 0;
if (l == r) return p;
int m = (l + r) / 2;
lc[p] = build(l, m);
rc[p] = build(m + 1, r);
return p;
}
// Insert one occurrence of value at position `pos` in the value range.
// `prev` is the root of the previous version's subtree.
// Returns the root of the new version's subtree.
// Only O(log m) new nodes are created; all other children are shared.
int update(int prev, int l, int r, int pos) {
int p = ++tot; // allocate new node
// Start by copying everything from prev
lc[p] = lc[prev];
rc[p] = rc[prev];
cnt[p] = cnt[prev] + 1; // one more element in this range
if (l == r) return p; // leaf reached
int m = (l + r) / 2;
// Recurse only into the side containing `pos`.
// The other side is shared with `prev` (already copied above).
if (pos <= m)
lc[p] = update(lc[prev], l, m, pos);
else
rc[p] = update(rc[prev], m + 1, r, pos);
return p;
}
// Find the k-th smallest value in the subarray a[ql..qr] (1-indexed).
// We walk versions u = roots[ql-1] and v = roots[qr] in parallel.
// At each node, the "subarray count" in the left child is:
// cnt[lc[v]] - cnt[lc[u]]
// If k <= that count, the answer is in the left half; otherwise right.
int query(int u, int v, int l, int r, int k) {
if (l == r) return l; // found the value (compressed index)
int m = (l + r) / 2;
int left_count = cnt[lc[v]] - cnt[lc[u]];
if (k <= left_count)
return query(lc[u], lc[v], l, m, k);
else
return query(rc[u], rc[v], m + 1, r, k - left_count);
}
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n, q;
cin >> n >> q;
vector<int> a(n);
vector<int> sorted_a;
for (int i = 0; i < n; i++) {
cin >> a[i];
sorted_a.push_back(a[i]);
}
// --- Coordinate compression ---
// Map values to [0, m-1] so the segment tree has O(m) leaves.
sort(sorted_a.begin(), sorted_a.end());
sorted_a.erase(unique(sorted_a.begin(), sorted_a.end()), sorted_a.end());
int m = sorted_a.size();
auto compress = [&](int x) -> int {
return int(lower_bound(sorted_a.begin(), sorted_a.end(), x)
- sorted_a.begin());
};
// --- Build version 0 (empty) and versions 1..n ---
roots[0] = build(0, m - 1);
for (int i = 0; i < n; i++)
roots[i + 1] = update(roots[i], 0, m - 1, compress(a[i]));
// --- Answer queries ---
while (q--) {
int l, r, k;
cin >> l >> r >> k; // 1-indexed, find k-th smallest in a[l..r]
int compressed_idx = query(roots[l - 1], roots[r], 0, m - 1, k);
cout << sorted_a[compressed_idx] << '\n'; // decompress
}
return 0;
}Operations Reference
| Operation | Time | Space (new nodes) |
|---|---|---|
| Build empty tree (version 0) | ||
| Insert one value (new version) | ||
| Build all | ||
| Count of values |
Where
Total memory: MAXNODES
QUERY WALKTHROUGH -- k-th smallest in a[2..5]:
roots[1] (prefix [1..1]) roots[5] (prefix [1..5])
R₁ R₅
/ \ / \
[1..3]=1 [4..5]=0 [1..3]=3 [4..5]=2
/ \ / \
[1..2]=1 [3]=0 [1..2]=3 [3]=0
Difference tree (subarray [2..5]):
R₅ - R₁
/ \
[1..3]=2 [4..5]=2
/ \
[1..2]=2 [3]=0
k=2: left_count=2, k<=2 -> go LEFT
k=2: left_count([1])=2, k<=2 -> go LEFT
Leaf [1] -> compressed value 1 -> original value 1 OKProblem Patterns & Tricks
Pattern 1: K-th Smallest in Subarray
The canonical application. Build persistent tree on value-coordinate-compressed array. Answer each query by walking two versions.
cpp
// query(roots[l-1], roots[r], 0, m-1, k)
// returns compressed index -> decompress with sorted_a[idx]Problems: MKTHNUM (SPOJ), CF 1042D
Pattern 2: Count of Values in Range within Subarray
Instead of walking to find the
cpp
int count(int u, int v, int l, int r, int ql, int qr) {
if (ql > r || qr < l) return 0;
if (ql <= l && r <= qr) return cnt[v] - cnt[u];
int m = (l + r) / 2;
return count(lc[u], lc[v], l, m, ql, qr)
+ count(rc[u], rc[v], m + 1, r, ql, qr);
}Problems: CF 620E
Pattern 3: Online Queries (Forced Online)
Some problems force online processing by encrypting queries with the previous answer. Merge sort tree + binary search handles offline only. Persistent segment tree handles online naturally since each query is
Problems: CF 1535F
Pattern 4: Persistence on Trees (Path Queries)
Given a weighted tree, find the
query on path u..v:
answer = walk(roots[u], roots[v], roots[lca], roots[parent[lca]], ...)The count for path
Problems: COT (SPOJ), CF 893F
Pattern 5: Version Rollback / Undo
Maintain a stack of root pointers. "Undo" = pop the latest root. "Query version roots[t]. This is trivial with persistence since old roots are never invalidated.
cpp
// undo last update
version--;
// query version t
int ans = query(roots[t - 1], roots[t], 0, m - 1, k);Problems: CF 707D
Pattern 6: Persistent + Difference for Interval Frequency
Need the number of elements equal to
cpp
int freq = count(roots[l-1], roots[r], 0, m-1, cx, cx);
// where cx = compressed index of xThis generalizes to "how many elements in
Problems: CF 1422F
Contest Cheat Sheet
+---------------------------------------------------------------+
| PERSISTENT SEGMENT TREE CHEAT SHEET |
+---------------------------------------------------------------+
| When to use: |
| - k-th smallest in subarray [l,r], online |
| - queries on historical versions of a data structure |
| - prefix-subtraction trick on segment trees |
| - count values in [a,b] within subarray [l,r] |
+---------------------------------------------------------------+
| Memory: |
| const int MAXNODES = MAXN * 20; |
| int lc[MAXNODES], rc[MAXNODES], cnt[MAXNODES]; |
| int roots[MAXN]; |
+---------------------------------------------------------------+
| Core calls: |
| roots[0] = build(0, m - 1); |
| roots[i+1] = update(roots[i], 0, m-1, compressed_val); |
| kth = query(roots[l-1], roots[r], 0, m-1, k); |
+---------------------------------------------------------------+
| Complexity: |
| Build: O(n log m) time, O(n log m) space |
| Query: O(log m) time, O(1) space |
| m = distinct values (<= n after compression) |
+---------------------------------------------------------------+Gotchas & Debugging
Memory: The #1 Killer
Persistent trees use far more memory than regular segment trees. Each update creates
- Rule of thumb: allocate
nodes. For , that is nodes 12 bytes MB. - If the memory limit is 256 MB, you are fine. If it is 64 MB, consider using
shortor restructuring. - Never use
new/mallocper node. Use the flat array pool shown in the template. Dynamic allocation has massive overhead and will TLE or MLE.
Off-by-One in Versions
roots[0] = empty tree
roots[i] = tree after inserting a[0], a[1], ..., a[i-1]
Query a[l..r] (1-indexed):
use roots[l-1] and roots[r] <-- NOT roots[l] and roots[r]Getting this wrong gives WA on almost every test case. Double-check your indexing convention (0-indexed vs 1-indexed) before submitting.
Coordinate Compression Mistakes
- Forgetting to
sort+uniquebefore compressing. - Using
lower_boundon the unsorted array. - Off-by-one when the compressed range is
but you build the tree on . Pick one convention and stick to it.
Persistence + Lazy Propagation = Pain
Combining persistence with lazy propagation is possible but tricky. When you push down a lazy tag, you create new children -- but the old version's children must remain untouched. This means pushdown itself must create new nodes. Memory usage roughly doubles. Avoid this unless the problem specifically requires it.
Node Counter Overflow
The tot counter must not exceed MAXNODES. If it does, you get silent memory corruption. Add a debug assert:
cpp
int p = ++tot;
assert(tot < MAXNODES); // remove before submission or use #ifdefPersistence Invariant
NEVER modify nodes from old versions. When updating, always create NEW nodes. If you modify an existing node, you corrupt all versions that share it. This is the fundamental invariant of persistence.
Kth smallest with two versions: The answer uses query(roots[r]) - query(roots[l-1]). Specifically, at each node: left_count = versions[r].left.cnt - versions[l-1].left.cnt. If left_count >= k, recurse left; else recurse right with k -= left_count.
Common Debugging Checklist
- Print
totafter building all versions -- is it within bounds? - Verify coordinate compression: print compressed values and check they are in
. - For a single query, manually trace the walk and compare with brute force.
- Check that
buildinitializes all counts to 0 (not garbage). - If MLE: reduce
MAXNODES, useintnotlong longfor counts.
Mental Traps
Trap 1: "I need to copy the entire tree for each version."
This is the
WRONG: Copy whole tree RIGHT: Copy only the path
n=8, update value at pos 3: Same update:
Create 15 new nodes (all!) Create 4 new nodes (path only)
[new] [new]
/ \ / \
[new] [new] [new] [shared]
/ \ / \ / \
[new][new][new][new] [new][shared]
/ \ / \ / \ / \ / \
8 leaves, all new [shared][new]
Memory: O(n) per update Memory: O(log n) per update
Total: O(n²) -> MLE Total: O(n log n) -> OKTrap 2: "Persistence and lazy propagation work together naturally."
They don't. Lazy propagation pushes tags down to children -- but in a persistent tree, those children might be shared with other versions. Pushing to a shared child corrupts all versions sharing that child. To combine persistence with lazy propagation, every pushdown must create new child nodes (doubling memory usage). Avoid this unless the problem absolutely requires it.
THE LAZY + PERSISTENCE CONFLICT:
Version 1 and Version 2 share child node X:
v1_root ──-> ... ──-> [X] <-── ... <-── v2_root
Lazy pushdown on v2 modifies X in-place:
v1_root ──-> ... ──-> [X'] <-── ... <-── v2_root
^
CORRUPTED! v1 now sees v2's update.
Fix: pushdown creates NEW children
v1_root ──-> ... ──-> [X] [X'] <-── ... <-── v2_root
^ ^
untouched new copy with pushed tags
Cost: 2x memoryTrap 3: "I can use coordinate compression with 1-indexed AND 0-indexed interchangeably."
Pick one convention and stick with it. If your segment tree covers values build, update, and query produces subtle off-by-one errors that pass most test cases but fail on edge values (the minimum or maximum in the array).
COORDINATE COMPRESSION -- index convention trap:
Values: [3, 1, 4, 1, 5]
Sorted unique: [1, 3, 4, 5] -> m = 4
Convention A (0-indexed): Convention B (1-indexed):
1 -> 0 1 -> 1
3 -> 1 3 -> 2
4 -> 2 4 -> 3
5 -> 3 5 -> 4
build(0, m-1) = build(0,3) build(1, m) = build(1,4)
X MIXING: compress to 0-indexed, build(1, m)
-> leaf for value "1" is at position 0
-> but tree covers [1..4], so position 0 doesn't exist!
-> silent wrong answer on queries involving minimum valueTrap 4: "Persistent segment tree always beats merge sort tree."
Not always. Merge sort tree uses
PERSISTENT SEG TREE vs MERGE SORT TREE:
┌─────────────────────┬─────────────────┬──────────────────┐
│ │ Persistent Seg │ Merge Sort Tree │
├─────────────────────┼─────────────────┼──────────────────┤
│ Query time │ O(log n) │ O(log³ n) │
│ Space │ O(n log n) │ O(n log n) │
│ Online queries? │ OK YES │ X harder │
│ Implementation │ Complex (pools) │ Simpler │
│ Debugging │ Harder │ Easier │
│ Lazy propagation │ Painful + 2xmem │ N/A │
└─────────────────────┴─────────────────┴──────────────────┘
Rule of thumb:
Online + tight TL -> persistent seg tree
Offline + lenient TL -> merge sort tree may sufficeCommon Mistakes
- Underestimating the node pool size. Persistent segment tree RE on the judge even though small tests pass and the pool seems "large enough." The trap:
buildfor the initial empty tree allocatesnodes (a full empty tree), and then each of the insertions allocates more -- so a pool of 20 * nis not enough. Size the pool asMAXN * 2 + MAXN * 20(or just40 * MAXNto be safe), and always calculate the exact bound:.
Debug Drills
Drill 1 -- Persistent update forgets to refresh cnt
cpp
int update(int prev, int lo, int hi, int idx) {
int cur = new_node();
tree[cur] = tree[prev]; // copy previous node
if (lo == hi) {
tree[cur].cnt++;
return cur;
}
int mid = (lo + hi) / 2;
if (idx <= mid)
tree[cur].left = update(tree[cur].left, lo, mid, idx);
else
tree[cur].right = update(tree[cur].right, mid + 1, hi, idx);
// BUG: forgot to update cnt
return cur;
}What's wrong?
After recursively updating the left or right child, the cnt of the current node is not recalculated. It still holds the old count from prev.
Fix: Add before return cur;:
cpp
tree[cur].cnt = tree[tree[cur].left].cnt + tree[tree[cur].right].cnt;Drill 2 -- k-th smallest query with wrong left root
cpp
int kth(int l_root, int r_root, int lo, int hi, int k) {
if (lo == hi) return lo;
int mid = (lo + hi) / 2;
int left_count = tree[tree[r_root].left].cnt - tree[tree[l_root].left].cnt;
if (k <= left_count)
return kth(tree[l_root].left, tree[r_root].left, lo, mid, k);
else
return kth(tree[l_root].right, tree[r_root].right, mid + 1, hi, k - left_count);
}
// Called as: kth(root[l], root[r], 0, MAX_VAL, k)What's wrong?
The left root should be root[l-1], not root[l]. The persistent tree version root[i] contains elements root[r] - root[l-1].
Fix: Call as kth(root[l-1], root[r], 0, MAX_VAL, k).
Drill 3 -- Node pool sized too small
cpp
const int MAXN = 200005;
const int MAXLOG = 18;
struct Node { int left, right, cnt; } tree[MAXN * MAXLOG];
int node_cnt = 0;
int new_node() { return node_cnt++; }
// build empty tree, then n insertions
int root[MAXN];
root[0] = build(0, MAX_VAL); // allocates ~2*MAX_VAL nodes
for (int i = 1; i <= n; i++)
root[i] = update(root[i-1], 0, MAX_VAL, a[i]); // allocates ~MAXLOG nodes eachWhat's wrong?
The pool size MAXN * MAXLOG = build function alone allocates MAX_VAL is
Fix: Size the pool as 2 * MAX_VAL + MAXN * MAXLOG + extra, or skip building an explicit empty tree and let null pointer (index 0) represent empty nodes with cnt = 0.
Self-Test
Before moving on, verify you can answer these without looking at the notes:
- [ ] Explain "copy the path, share the rest" -- draw a persistent segment tree after one update, marking which nodes are new and which are shared from the previous version.
- [ ] Write the
updatefunction from memory -- allocating a new node, copying left/right children fromprev, incrementingcnt, and recursing into only the side containing the target position. - [ ] Solve "k-th smallest in subarray
" using two root pointers -- walk roots[l-1]androots[r]in parallel, usingcnt[lc[v]] - cnt[lc[u]]to decide left vs right at each level. - [ ] Calculate the memory budget: for
elements and updates, how many nodes total, how many bytes, and what MAXNODESshould be. - [ ] State when persistence is NOT needed -- offline queries (use Mo's algorithm), whole-array order statistics (use BIT + binary search), or range update + range query (use lazy segment tree).
- [ ] Given a persistent segment tree on the value range
with 5 versions, draw the node-sharing diagram after inserting values 3, 1, 4, 1, 5 -- mark shared vs. new nodes for version 3. - [ ] Compute the exact node pool size needed for
elements: initial tree has nodes (where = next power of 2 >= distinct values), plus new nodes from updates. Verify this fits in 256 MB. - [ ] Trace the k-th smallest query for
on subarray through versions and , showing the left_count comparison at each tree level. - [ ] Explain why persistence + lazy propagation doubles memory: each pushdown creates 2 new child nodes instead of modifying shared children in-place.
- [ ] Describe how to build persistent segment trees on a rooted tree for path queries -- what is the version ordering, and how does the LCA formula generalize the prefix subtraction?
Practice Problems
| CF Rating | What You Should Be Able To Do |
|---|---|
| 1200 | Understand segment trees; persistence is not expected |
| 1500 | Know that persistent data structures exist; solve k-th smallest offline with merge-sort tree |
| 1800 | Implement persistent seg tree for k-th smallest in subarray; understand path copying |
| 2100 | Combine persistence with LCA for tree path queries; handle coordinate compression; manage memory pools to avoid MLE |
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | Static Range Kth Smallest | Library Checker | Easy | Direct persistent segtree |
| 2 | MKTHNUM | SPOJ | Easy | k-th smallest in subarray |
| 3 | COT | SPOJ | Medium | k-th on tree path via persistence |
| 4 | D - Pairs | CF 1042D | Medium | Count pairs with sum in range |
| 5 | D - Copying Data | CF 292D | Medium | Array version control |
| 6 | D - Persistent Bookcase | CF 707D | Medium | Version rollback on DFS tree |
| 7 | E - The Classic Problem | CF 464E | Hard | Persistent segtree for big numbers |
| 8 | F - Strange Covering | CF 893F | Hard | Tree path + persistence |
| 9 | Range Queries and Copies | CSES | 1900 | Persistent seg tree: create copies, query/update any version |
| 10 | Salary Queries | CSES | 1700 | Dynamic order statistics (can solve with persistent or BIT+compression) |
Further Reading
- cp-algorithms: Persistent Segment Tree -- covers both the version-array and pointer-based approaches.
- CF Blog: Persistent Data Structures -- community discussion with implementation tricks.
- CF Blog: Merge Sort Tree vs Persistent Segtree -- comparison of approaches for k-th smallest.
- For lazy propagation with persistence, see "See:
persistent-16-segment-tree-lazy.md" (advanced, rarely needed). - For functional-style persistence (immutable nodes, path copying), see any reference on "fat node" vs "path copying" persistence -- we use path copying here as it is simpler and cache-friendlier.