Skip to content

Data Structure Selection Guide

Quick Revisit

  • Reach for this when: you know the operations you need but not which structure supports them.
  • Two rules govern everything below: (1) enumerate ALL required operations first, then (2) pick the weakest structure that handles them all. Prefix sums beat BIT, BIT beats segment tree, segment tree beats lazy segment tree — unless the operations force the heavier choice.
  • Sometimes the right "data structure" is no online structure at all. If queries can be processed in any order, consider sorting them and using a sweep line, Mo's algorithm, or coordinate compression with BIT. See Offline Queries.
  • See also: BFS vs DFS Guide, DP vs Greedy Guide.

You know you need "fast range queries" or "fast lookups" -- but which structure? This guide maps requirements to the right choice.

Contents


Step 0: Enumerate ALL Operations First

Before choosing a data structure, write down every operation you need:

  • What are you querying? (point value, range sum, range min, kth element?)
  • What are you updating? (point update, range update, both?)
  • Are updates online or can you process offline?
  • Do you need to insert/delete elements, or is the array static?

The most common mistake is choosing a data structure based on the first operation you notice, then discovering it can't handle the second. Example: you pick a BIT for range sum queries, then realize you also need range minimum -- BIT can't do that.

Array Beats Map by 10-100x

For integer keys in range [0, N), use int arr[N] instead of map<int,int> or unordered_map<int,int>. Direct array access is O(1) with tiny constant; map is O(log n) with huge constant; unordered_map is O(1) average but with hash overhead + cache misses. In tight loops (Mo's, sweep line), this difference is the margin between AC and TLE.


Quick-Select Table

I need...Best choiceTimeNotes
Key-value lookupunordered_mapO(1) avgUse custom hash to avoid hack
Ordered iteration + lookupmap / setO(logn)Balanced BST internally
Order-statistic (k-th element)PBDS order_of_keyO(logn)#include <ext/pb_ds/...>
Prefix sums, no updatesPrefix arrayO(1) queryBuild once in O(n)
Prefix sums + point updatesBIT (Fenwick)O(logn)Short code, small constant
Range query + point update (any op)Segment treeO(logn)Handles non-invertible ops
Range query + range updateSegment tree + lazyO(logn)More code, same asymptotic
Static RMQ (min/max, no updates)Sparse tableO(1) queryO(nlogn) build, no updates
Predecessor / successor queriessetO(logn)lower_bound, upper_bound
Priority access (min or max)priority_queueO(logn)No decrease-key; use lazy deletion
Union / connectivity queriesDSUO(α(n))Path compression + rank
String prefix matchingTrie$O(s
Sliding window min/maxMonotonic dequeO(1) amortizedPerfect for fixed-size windows
Next greater / smaller elementMonotonic stackO(n) totalSingle pass

Quick Decision Table

  DATA STRUCTURE SELECTION FLOWCHART:

  Do you have updates?
  |
  +-- NO --> Is the operation overlap-friendly (min, max, gcd)?
  |          +-- YES --> Sparse Table  (O(1) query, O(n log n) build)
  |          +-- NO  --> Is the operation invertible (sum, xor)?
  |                      +-- YES --> Prefix Sums  (O(1) query, O(n) build)
  |                      +-- NO  --> Segment Tree  (O(log n) query, O(n) build)
  |
  +-- YES --> Is the operation invertible (sum, xor)?
              |
              +-- YES --> BIT (Fenwick Tree)
              |           Simpler, 2-3x faster, half the memory of seg tree.
              |           Supports point updates + prefix queries in O(log n).
              |
              +-- NO  --> Segment Tree
                          Handles ANY associative merge (min, max, gcd, custom structs).
                          |
                          +-- Do you need range updates (not just point)?
                              +-- YES --> Segment Tree + Lazy Propagation  O(log n)
                              +-- NO  --> Plain Segment Tree  O(log n)
ScenarioUpdates?Invertible?Best ChoiceQueryWhy not the others?
Static range sumNoYesPrefix sumsO(1)Seg tree is O(log n) -- overkill
Static range minNoNoSparse tableO(1)Seg tree works but O(log n)
Point update + range sumYesYesBITO(log n)Faster & simpler than seg tree
Point update + range minYesNoSegment treeO(log n)BIT can't handle non-invertible ops
Range update + range sumYesYesSeg tree + lazyO(log n)BIT can do it with tricks, but fragile
Range update + range maxYesNoSeg tree + lazyO(log n)Only option for non-invertible + range update

The Range-Query Decision Matrix

This is the most common source of confusion. Use this matrix:

                      Point Update    Range Update    No Updates
                    +---------------+---------------+-------------+
  Invertible op     |               |               |             |
  (sum, xor)        |    BIT        | BIT + tricks  | Prefix sum  |
                    |  O(log n)     |  O(log n)     |   O(1)      |
                    +---------------+---------------+-------------+
  Non-invertible    |               |               |             |
  (min, max, gcd)   | Segment tree  |  Seg tree +   | Sparse table|
                    |  O(log n)     | lazy O(log n) |   O(1)      |
                    +---------------+---------------+-------------+
  Custom merge      |               |               |             |
  (struct, matrix)  | Segment tree  |  Seg tree +   | Seg tree or |
                    |  O(log n)     | lazy O(log n) | sparse table|
                    +---------------+---------------+-------------+

Decision rules:

  1. No updates + invertible -> prefix sum. Don't overthink it.
  2. No updates + non-invertible + overlap-friendly (min, max, gcd) -> sparse table for O(1).
  3. Point updates + invertible -> BIT. It's 2-3x faster than segment tree with half the code.
  4. Anything else -> segment tree. It handles every case; the question is whether something simpler suffices.

BST vs Hash Table

Aspectmap / set (BST)unordered_map / unordered_set (Hash)
LookupO(logn)O(1) average, O(n) worst
Ordered iterationYes (sorted)No
lower_bound / upper_boundO(logn)Not available
Memory~40-50 bytes/node~50-60 bytes/bucket (with chaining)
Hackable?NoYes, unless you use custom hash
When to use BSTNeed ordered traversal, predecessor/successor queries, range iteration
When to use hashPure lookup/insert/delete, no ordering needed, need O(1)

Rule of thumb: Start with unordered_map for speed. Switch to map if you need ordering. Always use a custom hash in contests to prevent adversarial inputs.


When Simpler Beats Fancier

ScenarioDon't useDo useWhy
Static array, sum queriesSegment treePrefix sumsO(1) vs O(logn), trivial code
Fixed-size sliding window maxSegment treeMonotonic dequeO(1) amortized, less code
Count of elements x in sorted arrayBST / segment treeupper_boundAlready sorted -> binary search
Set membership checksetunordered_set or bitsetO(1) vs O(logn)
Small universe (107) frequencymapPlain arrayArray access is 10x faster

Complexity Cheat Sheet

StructureBuildQueryUpdateSpace
Prefix sumO(n)O(1)N/AO(n)
BITO(n)O(logn)O(logn)O(n)
Segment treeO(n)O(logn)O(logn)O(4n)
Sparse tableO(nlogn)O(1)N/AO(nlogn)
set / mapO(nlogn)O(logn)O(logn)O(n)
unordered_mapO(n) avgO(1) avgO(1) avgO(n)
DSUO(n)O(α(n))O(α(n))O(n)
Monotonic stack/dequeO(n)O(1) amortN/AO(n)

What the Code Won't Teach You

Insights drawn from reflection notes

Think operations first, structure second. The first question after reading a problem should be "what operations do I need?" not "what algorithm is this?" Operations drive structure selection; algorithms use the selected structure.

Amortized vs worst-case matters in specific contexts. vector::push_back is O(1) amortized but occasionally O(n). For most CP, amortized is fine. But in online settings where every query must respond in O(logn) worst-case, amortized guarantees can fail.

Memory estimation before coding saves time. A segment tree of 4n nodes with 4 long long values each, at n=106, uses 128 MB -- often exceeding the 256 MB limit. Count before you code.

  OPERATION-FIRST SELECTION FLOWCHART:

  What do I need?
       |
       +-- Point query only?
       |   +-- Array  O(1)
       |
       +-- Range query (static)?
       |   +-- Invertible (sum, xor)?  -> Prefix sums O(1)
       |   +-- Idempotent (min, max)?  -> Sparse table O(1)
       |
       +-- Range query + Point update?
       |   +-- Invertible?  -> BIT  O(log n), simpler
       |   +-- Otherwise?   -> Segment tree  O(log n)
       |
       +-- Range query + Range update?
       |   +-- Segment tree + lazy propagation
       |
       +-- Connectivity queries?
       |   +-- DSU  O(alpha(n)) ~= O(1)
       |
       +-- Ordered access + predecessor?
           +-- std::set / balanced BST  O(log n)
  MEMORY BUDGET CALCULATOR:

  Structure    Nodes    Values/Node   Bytes/Val   Total
  ----------   ------   -----------   ---------   ---------
  Seg tree     4n       1 int         4           16n bytes
  Lazy seg     4n       4 long long   8           128n bytes
  BIT          n        1 int         4           4n bytes
  Sparse tbl   n*logn   1 int         4           4n*logn

  For n=10^6:
    BIT:        ~4 MB     OK
    Seg tree:   ~16 MB    OK
    Lazy seg:   ~128 MB   borderline!
    Sparse tbl: ~80 MB    tight

Common Mistakes in Choosing

Integrated from reflection notes

Trap: "I should use the most powerful structure available."

The most powerful structure often has a larger constant and is harder to debug. Use the weakest structure that supports all required operations correctly.

Trap: "A segment tree can handle anything."

Almost. But it requires an associative merge function. For "k-th unseen element" or "dynamic order statistics," standard segment trees don't apply -- you need order-statistics trees or policy-based data structures.

Trap: "DSU is only for connectivity."

DSU can track component sizes, find cycle lengths, implement Kruskal's MST, support offline LCA, and enable small-to-large merging. The "only connectivity" view undersells it.

Trap: "BIT is always worse than segment tree."

For sum queries with point updates, BIT is simpler, faster (~2-3x), and uses half the memory. BIT is better when it applies. Segment tree's advantage is generality.

Trap: "I need an online data structure for these queries."

Often you do not. If all queries are known up front and the answer to one query does not depend on the answers to others, you can sort, batch, or sweep:

  • Range queries on a static array with offline access -> sweep line + BIT, often simpler than seg tree.
  • Distance / count queries on subarrays -> Mo's algorithm.
  • Threshold-style queries ("first time the answer becomes ≥ k") -> parallel binary search.
  • Large coordinate space -> coordinate compression + BIT, instead of a balanced BST.

Before reaching for a heavy online structure, ask: do the queries truly need to be answered online?


Self-Test

Drawn from reflection notes

  • [ ] List the required operations for: "Add v to all elements in [l,r]; find the maximum in [l,r]." Select the appropriate data structure and justify.

  • [ ] Name three situations where BIT is better than segment tree, and three where segment tree is required over BIT.

  • [ ] Estimate the memory for a lazy segment tree storing 4 long long values per node, with n=3×105. Does it fit in 256 MB?

  • [ ] Name one operation that neither BIT nor standard segment tree supports efficiently. What data structure handles it?

  • [ ] State the time complexity of lower_bound on a std::set, a sorted std::vector, and a BIT (for prefix-sum-based k-th element). When would you use each?


Cross-References


What Would You Do If...?

  1. You need range sum queries with point updates -- BIT or segment tree?
  2. You need range min queries with no updates -- sparse table or segment tree?
  3. You need to merge intervals dynamically -- what DS supports this?

The Mistake That Teaches

Problem E needs range max queries with range add updates. You code a segment tree with lazy propagation -- 80 lines, 2 bugs, 40 minutes. Your teammate solves it with sqrt decomposition in 25 lines, 15 minutes. Sometimes the 'worse' data structure is the better contest choice.


Recap

  • Operations first, structure second: list every query and update before choosing.
  • Invertible + point update = BIT; non-invertible or range update = segment tree.
  • No updates + overlap-friendly = sparse table; no updates + invertible = prefix sums.
  • Simpler beats fancier: use the weakest structure that covers all operations.
  • Online may not be required: if queries can be sorted/batched, an offline approach (sweep, Mo's, parallel binary search) often beats a heavier online structure.
  • Memory check before coding: estimate 4n, 16n, or 128n bytes and compare to the limit.

Built for the climb from Codeforces 1100 to 2100.