Appearance
Making Standard Data Structures Solve Non-Standard Problems
Quick Revisit
- REACH FOR THIS WHEN: the problem asks for a query your standard data structure does not directly support
- KEY DECISION: which augmentation pattern fits -- custom merge, nested structure, persistence, lazy, or cascading?
- QUICK RULE: don't build from scratch; add a field, a pointer, or a deferred computation to what you already know
- SEE ALSO: Data Structure Selection Guide, DP State Design
You don't need a new data structure. You need to teach an old one a new trick.
Prerequisites: Segment Tree, Binary Indexed Tree
The Core Idea
You know segment trees, BSTs, and Fenwick trees. You know what operations they support. But the problem in front of you asks for an operation that doesn't quite fit.
Don't build something from scratch. Take the data structure you know, and augment it: store extra information in each node, combine it differently during queries, or layer one structure inside another.
Five Augmentation Patterns
1. Segment Tree with Custom Merge
The vanilla segment tree stores sums or maxima. But a node can store anything that merges associatively.
Example: Maximum subarray sum (Kadane's, but with range queries).
Each node stores four values: total sum, best prefix, best suffix, and best subarray. Merging two children:
cpp
struct Node {
long long total, prefix, suffix, best;
};
Node merge(Node L, Node R) {
return {
L.total + R.total,
max(L.prefix, L.total + R.prefix),
max(R.suffix, R.total + L.suffix),
max({L.best, R.best, L.suffix + R.prefix})
};
}Now you can query "maximum subarray sum in range merge function changed.
2. Merge Sort Tree -- A Segment Tree of Sorted Arrays
Each node of the segment tree stores the sorted array of all elements in its range.
What it buys you: For any range
cpp
// Build: each node stores sorted subarray
void build(int v, int lo, int hi) {
if (lo == hi) { tree[v] = {a[lo]}; return; }
int mid = (lo + hi) / 2;
build(2*v, lo, mid);
build(2*v+1, mid+1, hi);
merge(tree[2*v].begin(), tree[2*v].end(),
tree[2*v+1].begin(), tree[2*v+1].end(),
back_inserter(tree[v]));
}
// Query: count elements <= k in range [l, r]
int query(int v, int lo, int hi, int l, int r, int k) {
if (l > hi || r < lo) return 0;
if (l <= lo && hi <= r)
return upper_bound(tree[v].begin(), tree[v].end(), k)
- tree[v].begin();
int mid = (lo + hi) / 2;
return query(2*v, lo, mid, l, r, k)
+ query(2*v+1, mid+1, hi, l, r, k);
}Space:
3. Persistent Data Structures -- Keep the Past Alive Cheaply
A persistent segment tree keeps all previous versions alive. Instead of modifying a node, you create a new node pointing to the unchanged children.
What it buys you: Query any historical version. "What was the sum of range
Classic application: k-th smallest in a range. Build a persistent segment tree over values. Version
Each update creates
4. Lazy Propagation -- Batching Pending Work
You already know lazy propagation for range updates on segment trees. But the augmentation principle goes further: any operation that can be composed and applied later can be made lazy.
Beyond range-add: Range-set, range-multiply, range-affine (
Example: Range affine, point query. Tag:
So the composed tag is
Verify on paper before coding. Pick concrete tags, e.g.
Identity tag is
5. Fractional Cascading -- Accelerating Successive Binary Searches
You have
How. Augment each list with every other element from the next list, plus pointers showing where each element would land in the next list. After one binary search in the first list, follow pointers to find the position in each successive list in
Where it shows up. The merge sort tree above does
Augmentation Comparison
| Pattern | What you add | Query gain | Space cost | When to reach for it |
|---|---|---|---|---|
| Custom merge | Extra fields per node + new merge function | Arbitrary associative queries (e.g., max subarray sum) | O(n) | Standard merge (sum/max) is not enough |
| Merge sort tree | Sorted array per node | Count/rank in arbitrary range, O(log^2 n) | O(n log n) | "How many elements <= k in range [l,r]?" |
| Persistent | New root per update, path-copied nodes | Query any historical version | O(n log n) for n updates | "k-th smallest in range" or version queries |
| Lazy propagation | Pending-operation tags per node | Range updates in O(log n) | O(n) | Range update + range query |
| Fractional cascading | Interleaved pointers between sorted lists | Successive binary searches in O(1) each | O(n log n) | Merge sort tree too slow (log^2 -> log) |
Recognition Cues
- "Maximum subarray sum in range [l,r] with updates" --> custom merge (4-field node)
- "Count elements <= k in range [l,r]" --> merge sort tree (or persistent segment tree)
- "K-th smallest in range [l,r]" --> persistent segment tree over values
- "Range add + range query" --> lazy propagation
- "Range affine transformation" --> composable lazy tags: (a, b) means x -> ax + b
- "Query after the first t updates" --> persistent structure
- "Repeated binary searches on related sorted lists" --> fractional cascading
The Augmentation Mindset
When a problem asks for a query your data structure doesn't support:
- What extra information per node would answer the query? (Custom merge)
- Should each node store a sub-structure? (Merge sort tree, 2D segment tree)
- Do I need historical versions? (Persistence)
- Can I batch operations and defer work? (Lazy propagation)
- Am I repeating similar binary searches? (Fractional cascading)
You're not inventing new data structures. You're adding a field here, a pointer there, a deferred computation somewhere. The skeleton stays the same.
The Combination Game
The most powerful problems combine these ideas:
- Persistent + Lazy: Persistent segment tree with lazy propagation (tricky but possible — requires path-copying the lazy tags as well as the values).
- Merge Sort Tree + Binary Search on Answer: Count elements in range
that are , then binary search on to find the -th smallest. - Segment Tree over Time + Rollback DSU: When edges are added and later removed at known timestamps, build a segment tree whose leaves are time slots and whose nodes store the edges live during their interval. Sweep the tree with rollback DSU: on entering a node, union its edges; on leaving, undo. Different from offline-deletion-as-reverse-addition, and different again from small-to-large merging on trees — pick the one that matches the temporal structure.
Common Mistakes in Choosing
- Reaching for persistence when offline processing suffices. Persistent structures use O(n log n) space. If queries can be answered offline (sorted by time), a plain structure with rollback or sweep may be simpler and lighter.
- Forgetting that lazy tags must compose. If you can't represent g(f(x)) as a single tag, lazy propagation won't work. Test composition on paper before coding.
- Using merge sort tree when a persistent segment tree is cleaner. Both solve "count <= k in range," but persistent seg tree also handles k-th smallest directly. Choose based on which you can implement reliably under pressure.
- Over-augmenting. Adding fields that are never queried wastes time and introduces bugs. Only augment what the problem actually asks for.
- Ignoring the constant factor. Persistent segment trees have ~4-6x the constant of plain segment trees. At n = 10^6 with tight time limits, this matters.
Cross-References
- Segment Tree -- the base structure for most augmentations
- BIT / Fenwick -- simpler alternative when applicable
- Data Structure Selection Guide -- choosing the base structure
- Problem Transformations -- offline processing as an alternative to persistence
Practice Problems
| # | Problem | Source | Rating | Augmentation Used |
|---|---|---|---|---|
| 1 | Max Subarray Sum Queries | CSES | 1800 | Custom merge (4-field node) |
| 2 | MKTHNUM | SPOJ | 2000 | Persistent segment tree |
| 3 | Range Affine Point Get | Library Checker | 1800 | Composable lazy tags |
| 4 | KQUERY | SPOJ | 1800 | Merge sort tree |
| 5 | Array Queries | CF 797E | 2200 | Segment tree + sqrt decomposition |
Recap
- Augmentation, not invention: take a structure you know, add information to nodes or change the merge.
- Five patterns: custom merge, merge sort tree, persistence, lazy propagation, fractional cascading.
- Custom merge is the most common: just change what each node stores and how children combine.
- Persistence costs O(log n) new nodes per update -- budget the memory.
- Lazy tags must compose: if you can't write g(f(x)) as a single tag, the approach won't work.
- The combination game (persistent + lazy, seg tree + DSU) is where the hardest problems live.