Skip to content

Making Standard Data Structures Solve Non-Standard Problems

Quick Revisit

  • REACH FOR THIS WHEN: the problem asks for a query your standard data structure does not directly support
  • KEY DECISION: which augmentation pattern fits -- custom merge, nested structure, persistence, lazy, or cascading?
  • QUICK RULE: don't build from scratch; add a field, a pointer, or a deferred computation to what you already know
  • SEE ALSO: Data Structure Selection Guide, DP State Design

You don't need a new data structure. You need to teach an old one a new trick.

Prerequisites: Segment Tree, Binary Indexed Tree

The Core Idea

You know segment trees, BSTs, and Fenwick trees. You know what operations they support. But the problem in front of you asks for an operation that doesn't quite fit.

Don't build something from scratch. Take the data structure you know, and augment it: store extra information in each node, combine it differently during queries, or layer one structure inside another.

Five Augmentation Patterns

1. Segment Tree with Custom Merge

The vanilla segment tree stores sums or maxima. But a node can store anything that merges associatively.

Example: Maximum subarray sum (Kadane's, but with range queries).

Each node stores four values: total sum, best prefix, best suffix, and best subarray. Merging two children:

cpp
struct Node {
    long long total, prefix, suffix, best;
};

Node merge(Node L, Node R) {
    return {
        L.total + R.total,
        max(L.prefix, L.total + R.prefix),
        max(R.suffix, R.total + L.suffix),
        max({L.best, R.best, L.suffix + R.prefix})
    };
}

Now you can query "maximum subarray sum in range [l,r]" in O(logn). The data structure is the same segment tree you already know -- only the merge function changed.

2. Merge Sort Tree -- A Segment Tree of Sorted Arrays

Each node of the segment tree stores the sorted array of all elements in its range.

What it buys you: For any range [l,r], count how many elements are k (or find the k-th smallest) by binary-searching in O(log2n) nodes.

cpp
// Build: each node stores sorted subarray
void build(int v, int lo, int hi) {
    if (lo == hi) { tree[v] = {a[lo]}; return; }
    int mid = (lo + hi) / 2;
    build(2*v, lo, mid);
    build(2*v+1, mid+1, hi);
    merge(tree[2*v].begin(), tree[2*v].end(),
          tree[2*v+1].begin(), tree[2*v+1].end(),
          back_inserter(tree[v]));
}

// Query: count elements <= k in range [l, r]
int query(int v, int lo, int hi, int l, int r, int k) {
    if (l > hi || r < lo) return 0;
    if (l <= lo && hi <= r)
        return upper_bound(tree[v].begin(), tree[v].end(), k)
               - tree[v].begin();
    int mid = (lo + hi) / 2;
    return query(2*v, lo, mid, l, r, k)
         + query(2*v+1, mid+1, hi, l, r, k);
}

Space: O(nlogn). Query: O(log2n). It's a brute-force-feeling idea -- just store everything -- but the tree structure keeps it efficient.

3. Persistent Data Structures -- Keep the Past Alive Cheaply

A persistent segment tree keeps all previous versions alive. Instead of modifying a node, you create a new node pointing to the unchanged children.

What it buys you: Query any historical version. "What was the sum of range [l,r] after the first t updates?" becomes a simple query on version t.

Classic application: k-th smallest in a range. Build a persistent segment tree over values. Version i includes the first i elements. To find the k-th smallest in a[lr], subtract version l1 from version r and walk down the difference tree.

Each update creates O(logn) new nodes, so total space is O(nlogn) for n updates.

4. Lazy Propagation -- Batching Pending Work

You already know lazy propagation for range updates on segment trees. But the augmentation principle goes further: any operation that can be composed and applied later can be made lazy.

Beyond range-add: Range-set, range-multiply, range-affine (xax+b), range-min-with-value. The key requirement: pending operations must compose. If you apply operation f, then g, you need to represent gf as a single pending tag.

Example: Range affine, point query. Tag: (a,b) meaning f(x)=ax+b. Apply (a1,b1) first, then (a2,b2):

(a2,b2)(a1,b1):xa2(a1x+b1)+b2=(a2a1)x+(a2b1+b2).

So the composed tag is (a2a1,a2b1+b2). Push down: apply the composed operation to children.

Verify on paper before coding. Pick concrete tags, e.g. f1=2x+3 then f2=5x+7. Hand: f2(f1(1))=f2(5)=32. Formula: (52,53+7)=(10,22), evaluated at 1 is 32. Lazy bugs almost always come from composition-order mistakes — write the function composition out before you trust the formula, every time.

Identity tag is (1,0). The order matters: (a2,b2)(a1,b1)(a1,b1)(a2,b2) in general. Decide once whether your tag means "pending operation, applied next" or "operation already absorbed, child must catch up," and stick to it.

5. Fractional Cascading -- Accelerating Successive Binary Searches

You have k sorted lists, and you need to binary-search the same value in all of them. Naively: O(klogn). With fractional cascading: O(k+logn).

How. Augment each list with every other element from the next list, plus pointers showing where each element would land in the next list. After one binary search in the first list, follow pointers to find the position in each successive list in O(1).

Where it shows up. The merge sort tree above does O(log2n) queries -- one binary search per segment tree node. Fractional cascading shaves it to O(logn). In practice, the log2 version is usually fast enough, but for tight time limits, this is the upgrade path.

Augmentation Comparison

PatternWhat you addQuery gainSpace costWhen to reach for it
Custom mergeExtra fields per node + new merge functionArbitrary associative queries (e.g., max subarray sum)O(n)Standard merge (sum/max) is not enough
Merge sort treeSorted array per nodeCount/rank in arbitrary range, O(log^2 n)O(n log n)"How many elements <= k in range [l,r]?"
PersistentNew root per update, path-copied nodesQuery any historical versionO(n log n) for n updates"k-th smallest in range" or version queries
Lazy propagationPending-operation tags per nodeRange updates in O(log n)O(n)Range update + range query
Fractional cascadingInterleaved pointers between sorted listsSuccessive binary searches in O(1) eachO(n log n)Merge sort tree too slow (log^2 -> log)

Recognition Cues

  • "Maximum subarray sum in range [l,r] with updates" --> custom merge (4-field node)
  • "Count elements <= k in range [l,r]" --> merge sort tree (or persistent segment tree)
  • "K-th smallest in range [l,r]" --> persistent segment tree over values
  • "Range add + range query" --> lazy propagation
  • "Range affine transformation" --> composable lazy tags: (a, b) means x -> ax + b
  • "Query after the first t updates" --> persistent structure
  • "Repeated binary searches on related sorted lists" --> fractional cascading

The Augmentation Mindset

When a problem asks for a query your data structure doesn't support:

  1. What extra information per node would answer the query? (Custom merge)
  2. Should each node store a sub-structure? (Merge sort tree, 2D segment tree)
  3. Do I need historical versions? (Persistence)
  4. Can I batch operations and defer work? (Lazy propagation)
  5. Am I repeating similar binary searches? (Fractional cascading)

You're not inventing new data structures. You're adding a field here, a pointer there, a deferred computation somewhere. The skeleton stays the same.

The Combination Game

The most powerful problems combine these ideas:

  • Persistent + Lazy: Persistent segment tree with lazy propagation (tricky but possible — requires path-copying the lazy tags as well as the values).
  • Merge Sort Tree + Binary Search on Answer: Count elements in range [l,r] that are X, then binary search on X to find the k-th smallest.
  • Segment Tree over Time + Rollback DSU: When edges are added and later removed at known timestamps, build a segment tree whose leaves are time slots and whose nodes store the edges live during their interval. Sweep the tree with rollback DSU: on entering a node, union its edges; on leaving, undo. Different from offline-deletion-as-reverse-addition, and different again from small-to-large merging on trees — pick the one that matches the temporal structure.

Common Mistakes in Choosing

  • Reaching for persistence when offline processing suffices. Persistent structures use O(n log n) space. If queries can be answered offline (sorted by time), a plain structure with rollback or sweep may be simpler and lighter.
  • Forgetting that lazy tags must compose. If you can't represent g(f(x)) as a single tag, lazy propagation won't work. Test composition on paper before coding.
  • Using merge sort tree when a persistent segment tree is cleaner. Both solve "count <= k in range," but persistent seg tree also handles k-th smallest directly. Choose based on which you can implement reliably under pressure.
  • Over-augmenting. Adding fields that are never queried wastes time and introduces bugs. Only augment what the problem actually asks for.
  • Ignoring the constant factor. Persistent segment trees have ~4-6x the constant of plain segment trees. At n = 10^6 with tight time limits, this matters.

Cross-References

Practice Problems

#ProblemSourceRatingAugmentation Used
1Max Subarray Sum QueriesCSES1800Custom merge (4-field node)
2MKTHNUMSPOJ2000Persistent segment tree
3Range Affine Point GetLibrary Checker1800Composable lazy tags
4KQUERYSPOJ1800Merge sort tree
5Array QueriesCF 797E2200Segment tree + sqrt decomposition

Recap

  • Augmentation, not invention: take a structure you know, add information to nodes or change the merge.
  • Five patterns: custom merge, merge sort tree, persistence, lazy propagation, fractional cascading.
  • Custom merge is the most common: just change what each node stores and how children combine.
  • Persistence costs O(log n) new nodes per update -- budget the memory.
  • Lazy tags must compose: if you can't write g(f(x)) as a single tag, the approach won't work.
  • The combination game (persistent + lazy, seg tree + DSU) is where the hardest problems live.

Built for the climb from Codeforces 1100 to 2100.