Making Standard Data Structures Solve Non-Standard Problems

Quick Revisit
REACH FOR THIS WHEN: the problem asks for a query your standard data structure does not directly support
KEY DECISION: which augmentation pattern fits -- custom merge, nested structure, persistence, lazy, or cascading?
QUICK RULE: don't build from scratch; add a field, a pointer, or a deferred computation to what you already know
SEE ALSO: Data Structure Selection Guide, DP State Design

You don't need a new data structure. You need to teach an old one a new trick.
Prerequisites: Segment Tree, Binary Indexed Tree

The Core Idea

You know segment trees, BSTs, and Fenwick trees. You know what operations they support. But the problem in front of you asks for an operation that doesn't quite fit.

Don't build something from scratch. Take the data structure you know, and augment it: store extra information in each node, combine it differently during queries, or layer one structure inside another.

Five Augmentation Patterns

1. Segment Tree with Custom Merge

The vanilla segment tree stores sums or maxima. But a node can store anything that merges associatively.

Example: Maximum subarray sum (Kadane's, but with range queries).

Each node stores four values: total sum, best prefix, best suffix, and best subarray. Merging two children:

cpp

struct Node {
    long long total, prefix, suffix, best;
};

Node merge(Node L, Node R) {
    return {
        L.total + R.total,
        max(L.prefix, L.total + R.prefix),
        max(R.suffix, R.total + L.suffix),
        max({L.best, R.best, L.suffix + R.prefix})
    };
}

Now you can query "maximum subarray sum in range $[l, r]$ " in $O (\log n)$ . The data structure is the same segment tree you already know -- only the merge function changed.

2. Merge Sort Tree -- A Segment Tree of Sorted Arrays

Each node of the segment tree stores the sorted array of all elements in its range.

What it buys you: For any range $[l, r]$ , count how many elements are $\leq k$ (or find the $k$ -th smallest) by binary-searching in $O (\log^{2} n)$ nodes.

cpp

// Build: each node stores sorted subarray
void build(int v, int lo, int hi) {
    if (lo == hi) { tree[v] = {a[lo]}; return; }
    int mid = (lo + hi) / 2;
    build(2*v, lo, mid);
    build(2*v+1, mid+1, hi);
    merge(tree[2*v].begin(), tree[2*v].end(),
          tree[2*v+1].begin(), tree[2*v+1].end(),
          back_inserter(tree[v]));
}

// Query: count elements <= k in range [l, r]
int query(int v, int lo, int hi, int l, int r, int k) {
    if (l > hi || r < lo) return 0;
    if (l <= lo && hi <= r)
        return upper_bound(tree[v].begin(), tree[v].end(), k)
               - tree[v].begin();
    int mid = (lo + hi) / 2;
    return query(2*v, lo, mid, l, r, k)
         + query(2*v+1, mid+1, hi, l, r, k);
}

Space: $O (n \log n)$ . Query: $O (\log^{2} n)$ . It's a brute-force-feeling idea -- just store everything -- but the tree structure keeps it efficient.

3. Persistent Data Structures -- Keep the Past Alive Cheaply

A persistent segment tree keeps all previous versions alive. Instead of modifying a node, you create a new node pointing to the unchanged children.

What it buys you: Query any historical version. "What was the sum of range $[l, r]$ after the first $t$ updates?" becomes a simple query on version $t$ .

Classic application: k-th smallest in a range. Build a persistent segment tree over values. Version $i$ includes the first $i$ elements. To find the $k$ -th smallest in $a [l \dots r]$ , subtract version $l - 1$ from version $r$ and walk down the difference tree.

Each update creates $O (\log n)$ new nodes, so total space is $O (n \log n)$ for $n$ updates.

4. Lazy Propagation -- Batching Pending Work

You already know lazy propagation for range updates on segment trees. But the augmentation principle goes further: any operation that can be composed and applied later can be made lazy.

Beyond range-add: Range-set, range-multiply, range-affine ( $x \to a x + b$ ), range-min-with-value. The key requirement: pending operations must compose. If you apply operation $f$ , then $g$ , you need to represent $g \circ f$ as a single pending tag.

Example: Range affine, point query. Tag: $(a, b)$ meaning $f (x) = a x + b$ . Apply $(a_{1}, b_{1})$ first, then $(a_{2}, b_{2})$ :

(a_{2}, b_{2}) \circ (a_{1}, b_{1}) : x \mapsto a_{2} (a_{1} x + b_{1}) + b_{2} = (a_{2} a_{1}) x + (a_{2} b_{1} + b_{2}) .

So the composed tag is $(a_{2} a_{1}, a_{2} b_{1} + b_{2})$ . Push down: apply the composed operation to children.

Verify on paper before coding. Pick concrete tags, e.g. $f_{1} = 2 x + 3$ then $f_{2} = 5 x + 7$ . Hand: $f_{2} (f_{1} (1)) = f_{2} (5) = 32$ . Formula: $(5 \cdot 2, 5 \cdot 3 + 7) = (10, 22)$ , evaluated at $1$ is $32$ . Lazy bugs almost always come from composition-order mistakes — write the function composition out before you trust the formula, every time.

Identity tag is $(1, 0)$ . The order matters: $(a_{2}, b_{2}) \circ (a_{1}, b_{1}) \neq (a_{1}, b_{1}) \circ (a_{2}, b_{2})$ in general. Decide once whether your tag means "pending operation, applied next" or "operation already absorbed, child must catch up," and stick to it.

5. Fractional Cascading -- Accelerating Successive Binary Searches

You have $k$ sorted lists, and you need to binary-search the same value in all of them. Naively: $O (k \log n)$ . With fractional cascading: $O (k + \log n)$ .

How. Augment each list with every other element from the next list, plus pointers showing where each element would land in the next list. After one binary search in the first list, follow pointers to find the position in each successive list in $O (1)$ .

Where it shows up. The merge sort tree above does $O (\log^{2} n)$ queries -- one binary search per segment tree node. Fractional cascading shaves it to $O (\log n)$ . In practice, the $\log^{2}$ version is usually fast enough, but for tight time limits, this is the upgrade path.

Augmentation Comparison

Pattern	What you add	Query gain	Space cost	When to reach for it
Custom merge	Extra fields per node + new merge function	Arbitrary associative queries (e.g., max subarray sum)	O(n)	Standard merge (sum/max) is not enough
Merge sort tree	Sorted array per node	Count/rank in arbitrary range, O(log^2 n)	O(n log n)	"How many elements <= k in range [l,r]?"
Persistent	New root per update, path-copied nodes	Query any historical version	O(n log n) for n updates	"k-th smallest in range" or version queries
Lazy propagation	Pending-operation tags per node	Range updates in O(log n)	O(n)	Range update + range query
Fractional cascading	Interleaved pointers between sorted lists	Successive binary searches in O(1) each	O(n log n)	Merge sort tree too slow (log^2 -> log)

Recognition Cues

"Maximum subarray sum in range [l,r] with updates" --> custom merge (4-field node)
"Count elements <= k in range [l,r]" --> merge sort tree (or persistent segment tree)
"K-th smallest in range [l,r]" --> persistent segment tree over values
"Range add + range query" --> lazy propagation
"Range affine transformation" --> composable lazy tags: (a, b) means x -> ax + b
"Query after the first t updates" --> persistent structure
"Repeated binary searches on related sorted lists" --> fractional cascading

The Augmentation Mindset

When a problem asks for a query your data structure doesn't support:

What extra information per node would answer the query? (Custom merge)
Should each node store a sub-structure? (Merge sort tree, 2D segment tree)
Do I need historical versions? (Persistence)
Can I batch operations and defer work? (Lazy propagation)
Am I repeating similar binary searches? (Fractional cascading)

You're not inventing new data structures. You're adding a field here, a pointer there, a deferred computation somewhere. The skeleton stays the same.

The Combination Game

The most powerful problems combine these ideas:

Persistent + Lazy: Persistent segment tree with lazy propagation (tricky but possible — requires path-copying the lazy tags as well as the values).
Merge Sort Tree + Binary Search on Answer: Count elements in range $[l, r]$ that are $\leq X$ , then binary search on $X$ to find the $k$ -th smallest.
Segment Tree over Time + Rollback DSU: When edges are added and later removed at known timestamps, build a segment tree whose leaves are time slots and whose nodes store the edges live during their interval. Sweep the tree with rollback DSU: on entering a node, union its edges; on leaving, undo. Different from offline-deletion-as-reverse-addition, and different again from small-to-large merging on trees — pick the one that matches the temporal structure.

Common Mistakes in Choosing

Reaching for persistence when offline processing suffices. Persistent structures use O(n log n) space. If queries can be answered offline (sorted by time), a plain structure with rollback or sweep may be simpler and lighter.
Forgetting that lazy tags must compose. If you can't represent g(f(x)) as a single tag, lazy propagation won't work. Test composition on paper before coding.
Using merge sort tree when a persistent segment tree is cleaner. Both solve "count <= k in range," but persistent seg tree also handles k-th smallest directly. Choose based on which you can implement reliably under pressure.
Over-augmenting. Adding fields that are never queried wastes time and introduces bugs. Only augment what the problem actually asks for.
Ignoring the constant factor. Persistent segment trees have ~4-6x the constant of plain segment trees. At n = 10^6 with tight time limits, this matters.

Cross-References

Segment Tree -- the base structure for most augmentations
BIT / Fenwick -- simpler alternative when applicable
Data Structure Selection Guide -- choosing the base structure
Problem Transformations -- offline processing as an alternative to persistence

Practice Problems

#	Problem	Source	Rating	Augmentation Used
1	Max Subarray Sum Queries	CSES	1800	Custom merge (4-field node)
2	MKTHNUM	SPOJ	2000	Persistent segment tree
3	Range Affine Point Get	Library Checker	1800	Composable lazy tags
4	KQUERY	SPOJ	1800	Merge sort tree
5	Array Queries	CF 797E	2200	Segment tree + sqrt decomposition

Recap

Augmentation, not invention: take a structure you know, add information to nodes or change the merge.
Five patterns: custom merge, merge sort tree, persistence, lazy propagation, fractional cascading.
Custom merge is the most common: just change what each node stores and how children combine.
Persistence costs O(log n) new nodes per update -- budget the memory.
Lazy tags must compose: if you can't write g(f(x)) as a single tag, the approach won't work.
The combination game (persistent + lazy, seg tree + DSU) is where the hardest problems live.

Making Standard Data Structures Solve Non-Standard Problems ​

The Core Idea ​

Five Augmentation Patterns ​

1. Segment Tree with Custom Merge ​

2. Merge Sort Tree -- A Segment Tree of Sorted Arrays ​

3. Persistent Data Structures -- Keep the Past Alive Cheaply ​

4. Lazy Propagation -- Batching Pending Work ​

5. Fractional Cascading -- Accelerating Successive Binary Searches ​

Augmentation Comparison ​

Recognition Cues ​

The Augmentation Mindset ​

The Combination Game ​

Common Mistakes in Choosing ​

Cross-References ​

Practice Problems ​

Recap ​