Data Structure Selection Guide

Quick Revisit
Reach for this when: you know the operations you need but not which structure supports them.
Two rules govern everything below: (1) enumerate ALL required operations first, then (2) pick the weakest structure that handles them all. Prefix sums beat BIT, BIT beats segment tree, segment tree beats lazy segment tree — unless the operations force the heavier choice.
Sometimes the right "data structure" is no online structure at all. If queries can be processed in any order, consider sorting them and using a sweep line, Mo's algorithm, or coordinate compression with BIT. See Offline Queries.
See also: BFS vs DFS Guide, DP vs Greedy Guide.

You know you need "fast range queries" or "fast lookups" -- but which structure? This guide maps requirements to the right choice.

Quick-Select Table
The Range-Query Decision Matrix
BST vs Hash Table
When Simpler Beats Fancier
Complexity Cheat Sheet
What the Code Won't Teach You
Self-Test
Cross-References

Step 0: Enumerate ALL Operations First

Before choosing a data structure, write down every operation you need:

What are you querying? (point value, range sum, range min, kth element?)
What are you updating? (point update, range update, both?)
Are updates online or can you process offline?
Do you need to insert/delete elements, or is the array static?

The most common mistake is choosing a data structure based on the first operation you notice, then discovering it can't handle the second. Example: you pick a BIT for range sum queries, then realize you also need range minimum -- BIT can't do that.

Array Beats Map by 10-100x

For integer keys in range [0, N), use int arr[N] instead of map<int,int> or unordered_map<int,int>. Direct array access is O(1) with tiny constant; map is O(log n) with huge constant; unordered_map is O(1) average but with hash overhead + cache misses. In tight loops (Mo's, sweep line), this difference is the margin between AC and TLE.

Quick-Select Table

I need...	Best choice	Time	Notes
Key-value lookup	`unordered_map`	$O (1)$ avg	Use custom hash to avoid hack
Ordered iteration + lookup	`map` / `set`	$O (\log n)$	Balanced BST internally
Order-statistic (k-th element)	PBDS `order_of_key`	$O (\log n)$	`#include <ext/pb_ds/...>`
Prefix sums, no updates	Prefix array	$O (1)$ query	Build once in $O (n)$
Prefix sums + point updates	BIT (Fenwick)	$O (\log n)$	Short code, small constant
Range query + point update (any op)	Segment tree	$O (\log n)$	Handles non-invertible ops
Range query + range update	Segment tree + lazy	$O (\log n)$	More code, same asymptotic
Static RMQ (min/max, no updates)	Sparse table	$O (1)$ query	$O (n \log n)$ build, no updates
Predecessor / successor queries	`set`	$O (\log n)$	`lower_bound`, `upper_bound`
Priority access (min or max)	`priority_queue`	$O (\log n)$	No decrease-key; use lazy deletion
Union / connectivity queries	DSU	$O (α (n))$	Path compression + rank
String prefix matching	Trie	$O(	s
Sliding window min/max	Monotonic deque	$O (1)$ amortized	Perfect for fixed-size windows
Next greater / smaller element	Monotonic stack	$O (n)$ total	Single pass

Quick Decision Table

  DATA STRUCTURE SELECTION FLOWCHART:

  Do you have updates?
  |
  +-- NO --> Is the operation overlap-friendly (min, max, gcd)?
  |          +-- YES --> Sparse Table  (O(1) query, O(n log n) build)
  |          +-- NO  --> Is the operation invertible (sum, xor)?
  |                      +-- YES --> Prefix Sums  (O(1) query, O(n) build)
  |                      +-- NO  --> Segment Tree  (O(log n) query, O(n) build)
  |
  +-- YES --> Is the operation invertible (sum, xor)?
              |
              +-- YES --> BIT (Fenwick Tree)
              |           Simpler, 2-3x faster, half the memory of seg tree.
              |           Supports point updates + prefix queries in O(log n).
              |
              +-- NO  --> Segment Tree
                          Handles ANY associative merge (min, max, gcd, custom structs).
                          |
                          +-- Do you need range updates (not just point)?
                              +-- YES --> Segment Tree + Lazy Propagation  O(log n)
                              +-- NO  --> Plain Segment Tree  O(log n)

Scenario	Updates?	Invertible?	Best Choice	Query	Why not the others?
Static range sum	No	Yes	Prefix sums	O(1)	Seg tree is O(log n) -- overkill
Static range min	No	No	Sparse table	O(1)	Seg tree works but O(log n)
Point update + range sum	Yes	Yes	BIT	O(log n)	Faster & simpler than seg tree
Point update + range min	Yes	No	Segment tree	O(log n)	BIT can't handle non-invertible ops
Range update + range sum	Yes	Yes	Seg tree + lazy	O(log n)	BIT can do it with tricks, but fragile
Range update + range max	Yes	No	Seg tree + lazy	O(log n)	Only option for non-invertible + range update

The Range-Query Decision Matrix

This is the most common source of confusion. Use this matrix:

                      Point Update    Range Update    No Updates
                    +---------------+---------------+-------------+
  Invertible op     |               |               |             |
  (sum, xor)        |    BIT        | BIT + tricks  | Prefix sum  |
                    |  O(log n)     |  O(log n)     |   O(1)      |
                    +---------------+---------------+-------------+
  Non-invertible    |               |               |             |
  (min, max, gcd)   | Segment tree  |  Seg tree +   | Sparse table|
                    |  O(log n)     | lazy O(log n) |   O(1)      |
                    +---------------+---------------+-------------+
  Custom merge      |               |               |             |
  (struct, matrix)  | Segment tree  |  Seg tree +   | Seg tree or |
                    |  O(log n)     | lazy O(log n) | sparse table|
                    +---------------+---------------+-------------+

Decision rules:

No updates + invertible -> prefix sum. Don't overthink it.
No updates + non-invertible + overlap-friendly (min, max, gcd) -> sparse table for $O (1)$ .
Point updates + invertible -> BIT. It's 2-3x faster than segment tree with half the code.
Anything else -> segment tree. It handles every case; the question is whether something simpler suffices.

BST vs Hash Table

Aspect	`map` / `set` (BST)	`unordered_map` / `unordered_set` (Hash)
Lookup	$O (\log n)$	$O (1)$ average, $O (n)$ worst
Ordered iteration	Yes (sorted)	No
`lower_bound` / `upper_bound`	$O (\log n)$	Not available
Memory	~40-50 bytes/node	~50-60 bytes/bucket (with chaining)
Hackable?	No	Yes, unless you use custom hash
When to use BST	Need ordered traversal, predecessor/successor queries, range iteration
When to use hash	Pure lookup/insert/delete, no ordering needed, need $O (1)$

Rule of thumb: Start with unordered_map for speed. Switch to map if you need ordering. Always use a custom hash in contests to prevent adversarial inputs.

When Simpler Beats Fancier

Scenario	Don't use	Do use	Why
Static array, sum queries	Segment tree	Prefix sums	$O (1)$ vs $O (\log n)$ , trivial code
Fixed-size sliding window max	Segment tree	Monotonic deque	$O (1)$ amortized, less code
Count of elements $\leq x$ in sorted array	BST / segment tree	`upper_bound`	Already sorted -> binary search
Set membership check	`set`	`unordered_set` or `bitset`	$O (1)$ vs $O (\log n)$
Small universe ( $\leq 10^{7}$ ) frequency	`map`	Plain array	Array access is 10x faster

Complexity Cheat Sheet

Structure	Build	Query	Update	Space
Prefix sum	$O (n)$	$O (1)$	N/A	$O (n)$
BIT	$O (n)$	$O (\log n)$	$O (\log n)$	$O (n)$
Segment tree	$O (n)$	$O (\log n)$	$O (\log n)$	$O (4 n)$
Sparse table	$O (n \log n)$	$O (1)$	N/A	$O (n \log n)$
`set` / `map`	$O (n \log n)$	$O (\log n)$	$O (\log n)$	$O (n)$
`unordered_map`	$O (n)$ avg	$O (1)$ avg	$O (1)$ avg	$O (n)$
DSU	$O (n)$	$O (α (n))$	$O (α (n))$	$O (n)$
Monotonic stack/deque	$O (n)$	$O (1)$ amort	N/A	$O (n)$

What the Code Won't Teach You

Insights drawn from reflection notes

Think operations first, structure second. The first question after reading a problem should be "what operations do I need?" not "what algorithm is this?" Operations drive structure selection; algorithms use the selected structure.

Amortized vs worst-case matters in specific contexts. vector::push_back is $O (1)$ amortized but occasionally $O (n)$ . For most CP, amortized is fine. But in online settings where every query must respond in $O (\log n)$ worst-case, amortized guarantees can fail.

Memory estimation before coding saves time. A segment tree of $4 n$ nodes with 4 long long values each, at $n = 10^{6}$ , uses $\sim 128$ MB -- often exceeding the 256 MB limit. Count before you code.

  OPERATION-FIRST SELECTION FLOWCHART:

  What do I need?
       |
       +-- Point query only?
       |   +-- Array  O(1)
       |
       +-- Range query (static)?
       |   +-- Invertible (sum, xor)?  -> Prefix sums O(1)
       |   +-- Idempotent (min, max)?  -> Sparse table O(1)
       |
       +-- Range query + Point update?
       |   +-- Invertible?  -> BIT  O(log n), simpler
       |   +-- Otherwise?   -> Segment tree  O(log n)
       |
       +-- Range query + Range update?
       |   +-- Segment tree + lazy propagation
       |
       +-- Connectivity queries?
       |   +-- DSU  O(alpha(n)) ~= O(1)
       |
       +-- Ordered access + predecessor?
           +-- std::set / balanced BST  O(log n)

  MEMORY BUDGET CALCULATOR:

  Structure    Nodes    Values/Node   Bytes/Val   Total
  ----------   ------   -----------   ---------   ---------
  Seg tree     4n       1 int         4           16n bytes
  Lazy seg     4n       4 long long   8           128n bytes
  BIT          n        1 int         4           4n bytes
  Sparse tbl   n*logn   1 int         4           4n*logn

  For n=10^6:
    BIT:        ~4 MB     OK
    Seg tree:   ~16 MB    OK
    Lazy seg:   ~128 MB   borderline!
    Sparse tbl: ~80 MB    tight

Common Mistakes in Choosing

Integrated from reflection notes

Trap: "I should use the most powerful structure available."

The most powerful structure often has a larger constant and is harder to debug. Use the weakest structure that supports all required operations correctly.

Trap: "A segment tree can handle anything."

Almost. But it requires an associative merge function. For "k-th unseen element" or "dynamic order statistics," standard segment trees don't apply -- you need order-statistics trees or policy-based data structures.

Trap: "DSU is only for connectivity."

DSU can track component sizes, find cycle lengths, implement Kruskal's MST, support offline LCA, and enable small-to-large merging. The "only connectivity" view undersells it.

Trap: "BIT is always worse than segment tree."

For sum queries with point updates, BIT is simpler, faster (~2-3x), and uses half the memory. BIT is better when it applies. Segment tree's advantage is generality.

Trap: "I need an online data structure for these queries."

Often you do not. If all queries are known up front and the answer to one query does not depend on the answers to others, you can sort, batch, or sweep:

Range queries on a static array with offline access -> sweep line + BIT, often simpler than seg tree.
Distance / count queries on subarrays -> Mo's algorithm.
Threshold-style queries ("first time the answer becomes ≥ k") -> parallel binary search.
Large coordinate space -> coordinate compression + BIT, instead of a balanced BST.

Before reaching for a heavy online structure, ask: do the queries truly need to be answered online?

Self-Test

Drawn from reflection notes

[ ] List the required operations for: "Add $v$ to all elements in $[l, r]$ ; find the maximum in $[l, r]$ ." Select the appropriate data structure and justify.
[ ] Name three situations where BIT is better than segment tree, and three where segment tree is required over BIT.
[ ] Estimate the memory for a lazy segment tree storing 4 long long values per node, with $n = 3 \times 10^{5}$ . Does it fit in 256 MB?
[ ] Name one operation that neither BIT nor standard segment tree supports efficiently. What data structure handles it?
[ ] State the time complexity of lower_bound on a std::set, a sorted std::vector, and a BIT (for prefix-sum-based k-th element). When would you use each?

Cross-References

BIT / Fenwick
Segment Tree -- includes trade-off table vs BIT vs sparse table
Sparse Table
Hash Map -- custom hash techniques
Set and Multiset -- PBDS ordered set
DSU
Monotonic Stack
Monotonic Queue
Problem Pattern Recognition
Practice Ladders

What Would You Do If...?

You need range sum queries with point updates -- BIT or segment tree?
You need range min queries with no updates -- sparse table or segment tree?
You need to merge intervals dynamically -- what DS supports this?

The Mistake That Teaches

Problem E needs range max queries with range add updates. You code a segment tree with lazy propagation -- 80 lines, 2 bugs, 40 minutes. Your teammate solves it with sqrt decomposition in 25 lines, 15 minutes. Sometimes the 'worse' data structure is the better contest choice.

Recap

Operations first, structure second: list every query and update before choosing.
Invertible + point update = BIT; non-invertible or range update = segment tree.
No updates + overlap-friendly = sparse table; no updates + invertible = prefix sums.
Simpler beats fancier: use the weakest structure that covers all operations.
Online may not be required: if queries can be sorted/batched, an offline approach (sweep, Mo's, parallel binary search) often beats a heavier online structure.
Memory check before coding: estimate 4n, 16n, or 128n bytes and compare to the limit.

Data Structure Selection Guide ​

Contents ​

Step 0: Enumerate ALL Operations First ​

Array Beats Map by 10-100x ​

Quick-Select Table ​

Quick Decision Table ​

The Range-Query Decision Matrix ​

BST vs Hash Table ​

When Simpler Beats Fancier ​

Complexity Cheat Sheet ​

What the Code Won't Teach You ​

Common Mistakes in Choosing ​

Self-Test ​

Cross-References ​

What Would You Do If...? ​

The Mistake That Teaches ​

Recap ​