Appearance
Data Structure Selection Guide
Quick Revisit
- Reach for this when: you know the operations you need but not which structure supports them.
- Two rules govern everything below: (1) enumerate ALL required operations first, then (2) pick the weakest structure that handles them all. Prefix sums beat BIT, BIT beats segment tree, segment tree beats lazy segment tree — unless the operations force the heavier choice.
- Sometimes the right "data structure" is no online structure at all. If queries can be processed in any order, consider sorting them and using a sweep line, Mo's algorithm, or coordinate compression with BIT. See Offline Queries.
- See also: BFS vs DFS Guide, DP vs Greedy Guide.
You know you need "fast range queries" or "fast lookups" -- but which structure? This guide maps requirements to the right choice.
Contents
- Quick-Select Table
- The Range-Query Decision Matrix
- BST vs Hash Table
- When Simpler Beats Fancier
- Complexity Cheat Sheet
- What the Code Won't Teach You
- Self-Test
- Cross-References
Step 0: Enumerate ALL Operations First
Before choosing a data structure, write down every operation you need:
- What are you querying? (point value, range sum, range min, kth element?)
- What are you updating? (point update, range update, both?)
- Are updates online or can you process offline?
- Do you need to insert/delete elements, or is the array static?
The most common mistake is choosing a data structure based on the first operation you notice, then discovering it can't handle the second. Example: you pick a BIT for range sum queries, then realize you also need range minimum -- BIT can't do that.
Array Beats Map by 10-100x
For integer keys in range [0, N), use int arr[N] instead of map<int,int> or unordered_map<int,int>. Direct array access is O(1) with tiny constant; map is O(log n) with huge constant; unordered_map is O(1) average but with hash overhead + cache misses. In tight loops (Mo's, sweep line), this difference is the margin between AC and TLE.
Quick-Select Table
| I need... | Best choice | Time | Notes |
|---|---|---|---|
| Key-value lookup | unordered_map | Use custom hash to avoid hack | |
| Ordered iteration + lookup | map / set | Balanced BST internally | |
| Order-statistic (k-th element) | PBDS order_of_key | #include <ext/pb_ds/...> | |
| Prefix sums, no updates | Prefix array | Build once in | |
| Prefix sums + point updates | BIT (Fenwick) | Short code, small constant | |
| Range query + point update (any op) | Segment tree | Handles non-invertible ops | |
| Range query + range update | Segment tree + lazy | More code, same asymptotic | |
| Static RMQ (min/max, no updates) | Sparse table | ||
| Predecessor / successor queries | set | lower_bound, upper_bound | |
| Priority access (min or max) | priority_queue | No decrease-key; use lazy deletion | |
| Union / connectivity queries | DSU | Path compression + rank | |
| String prefix matching | Trie | $O( | s |
| Sliding window min/max | Monotonic deque | Perfect for fixed-size windows | |
| Next greater / smaller element | Monotonic stack | Single pass |
Quick Decision Table
DATA STRUCTURE SELECTION FLOWCHART:
Do you have updates?
|
+-- NO --> Is the operation overlap-friendly (min, max, gcd)?
| +-- YES --> Sparse Table (O(1) query, O(n log n) build)
| +-- NO --> Is the operation invertible (sum, xor)?
| +-- YES --> Prefix Sums (O(1) query, O(n) build)
| +-- NO --> Segment Tree (O(log n) query, O(n) build)
|
+-- YES --> Is the operation invertible (sum, xor)?
|
+-- YES --> BIT (Fenwick Tree)
| Simpler, 2-3x faster, half the memory of seg tree.
| Supports point updates + prefix queries in O(log n).
|
+-- NO --> Segment Tree
Handles ANY associative merge (min, max, gcd, custom structs).
|
+-- Do you need range updates (not just point)?
+-- YES --> Segment Tree + Lazy Propagation O(log n)
+-- NO --> Plain Segment Tree O(log n)| Scenario | Updates? | Invertible? | Best Choice | Query | Why not the others? |
|---|---|---|---|---|---|
| Static range sum | No | Yes | Prefix sums | O(1) | Seg tree is O(log n) -- overkill |
| Static range min | No | No | Sparse table | O(1) | Seg tree works but O(log n) |
| Point update + range sum | Yes | Yes | BIT | O(log n) | Faster & simpler than seg tree |
| Point update + range min | Yes | No | Segment tree | O(log n) | BIT can't handle non-invertible ops |
| Range update + range sum | Yes | Yes | Seg tree + lazy | O(log n) | BIT can do it with tricks, but fragile |
| Range update + range max | Yes | No | Seg tree + lazy | O(log n) | Only option for non-invertible + range update |
The Range-Query Decision Matrix
This is the most common source of confusion. Use this matrix:
Point Update Range Update No Updates
+---------------+---------------+-------------+
Invertible op | | | |
(sum, xor) | BIT | BIT + tricks | Prefix sum |
| O(log n) | O(log n) | O(1) |
+---------------+---------------+-------------+
Non-invertible | | | |
(min, max, gcd) | Segment tree | Seg tree + | Sparse table|
| O(log n) | lazy O(log n) | O(1) |
+---------------+---------------+-------------+
Custom merge | | | |
(struct, matrix) | Segment tree | Seg tree + | Seg tree or |
| O(log n) | lazy O(log n) | sparse table|
+---------------+---------------+-------------+Decision rules:
- No updates + invertible -> prefix sum. Don't overthink it.
- No updates + non-invertible + overlap-friendly (min, max, gcd) -> sparse table for
. - Point updates + invertible -> BIT. It's 2-3x faster than segment tree with half the code.
- Anything else -> segment tree. It handles every case; the question is whether something simpler suffices.
BST vs Hash Table
| Aspect | map / set (BST) | unordered_map / unordered_set (Hash) |
|---|---|---|
| Lookup | ||
| Ordered iteration | Yes (sorted) | No |
lower_bound / upper_bound | Not available | |
| Memory | ~40-50 bytes/node | ~50-60 bytes/bucket (with chaining) |
| Hackable? | No | Yes, unless you use custom hash |
| When to use BST | Need ordered traversal, predecessor/successor queries, range iteration | |
| When to use hash | Pure lookup/insert/delete, no ordering needed, need |
Rule of thumb: Start with unordered_map for speed. Switch to map if you need ordering. Always use a custom hash in contests to prevent adversarial inputs.
When Simpler Beats Fancier
| Scenario | Don't use | Do use | Why |
|---|---|---|---|
| Static array, sum queries | Segment tree | Prefix sums | |
| Fixed-size sliding window max | Segment tree | Monotonic deque | |
| Count of elements | BST / segment tree | upper_bound | Already sorted -> binary search |
| Set membership check | set | unordered_set or bitset | |
| Small universe ( | map | Plain array | Array access is 10x faster |
Complexity Cheat Sheet
| Structure | Build | Query | Update | Space |
|---|---|---|---|---|
| Prefix sum | N/A | |||
| BIT | ||||
| Segment tree | ||||
| Sparse table | N/A | |||
set / map | ||||
unordered_map | ||||
| DSU | ||||
| Monotonic stack/deque | N/A |
What the Code Won't Teach You
Insights drawn from reflection notes
Think operations first, structure second. The first question after reading a problem should be "what operations do I need?" not "what algorithm is this?" Operations drive structure selection; algorithms use the selected structure.
Amortized vs worst-case matters in specific contexts. vector::push_back is
Memory estimation before coding saves time. A segment tree of long long values each, at
OPERATION-FIRST SELECTION FLOWCHART:
What do I need?
|
+-- Point query only?
| +-- Array O(1)
|
+-- Range query (static)?
| +-- Invertible (sum, xor)? -> Prefix sums O(1)
| +-- Idempotent (min, max)? -> Sparse table O(1)
|
+-- Range query + Point update?
| +-- Invertible? -> BIT O(log n), simpler
| +-- Otherwise? -> Segment tree O(log n)
|
+-- Range query + Range update?
| +-- Segment tree + lazy propagation
|
+-- Connectivity queries?
| +-- DSU O(alpha(n)) ~= O(1)
|
+-- Ordered access + predecessor?
+-- std::set / balanced BST O(log n) MEMORY BUDGET CALCULATOR:
Structure Nodes Values/Node Bytes/Val Total
---------- ------ ----------- --------- ---------
Seg tree 4n 1 int 4 16n bytes
Lazy seg 4n 4 long long 8 128n bytes
BIT n 1 int 4 4n bytes
Sparse tbl n*logn 1 int 4 4n*logn
For n=10^6:
BIT: ~4 MB OK
Seg tree: ~16 MB OK
Lazy seg: ~128 MB borderline!
Sparse tbl: ~80 MB tightCommon Mistakes in Choosing
Integrated from reflection notes
Trap: "I should use the most powerful structure available."
The most powerful structure often has a larger constant and is harder to debug. Use the weakest structure that supports all required operations correctly.
Trap: "A segment tree can handle anything."
Almost. But it requires an associative merge function. For "k-th unseen element" or "dynamic order statistics," standard segment trees don't apply -- you need order-statistics trees or policy-based data structures.
Trap: "DSU is only for connectivity."
DSU can track component sizes, find cycle lengths, implement Kruskal's MST, support offline LCA, and enable small-to-large merging. The "only connectivity" view undersells it.
Trap: "BIT is always worse than segment tree."
For sum queries with point updates, BIT is simpler, faster (~2-3x), and uses half the memory. BIT is better when it applies. Segment tree's advantage is generality.
Trap: "I need an online data structure for these queries."
Often you do not. If all queries are known up front and the answer to one query does not depend on the answers to others, you can sort, batch, or sweep:
- Range queries on a static array with offline access -> sweep line + BIT, often simpler than seg tree.
- Distance / count queries on subarrays -> Mo's algorithm.
- Threshold-style queries ("first time the answer becomes ≥ k") -> parallel binary search.
- Large coordinate space -> coordinate compression + BIT, instead of a balanced BST.
Before reaching for a heavy online structure, ask: do the queries truly need to be answered online?
Self-Test
Drawn from reflection notes
[ ] List the required operations for: "Add
to all elements in ; find the maximum in ." Select the appropriate data structure and justify. [ ] Name three situations where BIT is better than segment tree, and three where segment tree is required over BIT.
[ ] Estimate the memory for a lazy segment tree storing 4
long longvalues per node, with. Does it fit in 256 MB? [ ] Name one operation that neither BIT nor standard segment tree supports efficiently. What data structure handles it?
[ ] State the time complexity of
lower_boundon astd::set, a sortedstd::vector, and a BIT (for prefix-sum-based k-th element). When would you use each?
Cross-References
- BIT / Fenwick
- Segment Tree -- includes trade-off table vs BIT vs sparse table
- Sparse Table
- Hash Map -- custom hash techniques
- Set and Multiset -- PBDS ordered set
- DSU
- Monotonic Stack
- Monotonic Queue
- Problem Pattern Recognition
- Practice Ladders
What Would You Do If...?
- You need range sum queries with point updates -- BIT or segment tree?
- You need range min queries with no updates -- sparse table or segment tree?
- You need to merge intervals dynamically -- what DS supports this?
The Mistake That Teaches
Problem E needs range max queries with range add updates. You code a segment tree with lazy propagation -- 80 lines, 2 bugs, 40 minutes. Your teammate solves it with sqrt decomposition in 25 lines, 15 minutes. Sometimes the 'worse' data structure is the better contest choice.
Recap
- Operations first, structure second: list every query and update before choosing.
- Invertible + point update = BIT; non-invertible or range update = segment tree.
- No updates + overlap-friendly = sparse table; no updates + invertible = prefix sums.
- Simpler beats fancier: use the weakest structure that covers all operations.
- Online may not be required: if queries can be sorted/batched, an offline approach (sweep, Mo's, parallel binary search) often beats a heavier online structure.
- Memory check before coding: estimate 4n, 16n, or 128n bytes and compare to the limit.