Offline Queries

Quick Revisit
USE WHEN: all queries known upfront — reorder them to share computation between overlapping ranges
INVARIANT: queries sorted by key (endpoint, time) so consecutive queries share most of their work
TIME: O((n+q) log n) with sorting + data structure | SPACE: O(n + q)
CLASSIC BUG: forgetting to store original query indices — answers must be output in input order
PRACTICE: 08-ladder-mixed

AL-38 | Read all queries before answering any of them, reorder for efficient processing. Turns $O (n \cdot q)$ brute force into $O ((n + q) \log n)$ with a sweep and a data structure.

Difficulty: [Intermediate]Prerequisites: Sorting and Searching, Fenwick Tree. DSU is a later application (offline connectivity), not a hard prerequisite for the first reading. CF Rating Range: 1400--1800

2. Intuition
3. C++ STL Reference
4. Implementation (Contest-Ready)
5. Operations Reference
6. Problem Patterns & Tricks
7. Contest Cheat Sheet
8. Gotchas & Debugging
9. Self-Test
10. Practice Problems
11. Further Reading

Intuition

You have an array of $n$ elements and $q$ range queries -- range sums, range minimums, count of distinct values, and so on. Answering them online (one at a time, in arrival order) means each query is processed independently: scan the range, compute the answer, move on.

Consider 5 queries on a 6-element array:

  a = [4, 2, 7, 1, 5, 3]
       1  2  3  4  5  6

  Q0: sum [1, 4]       Q1: sum [2, 6]       Q2: sum [1, 3]
  Q3: sum [3, 6]       Q4: sum [2, 5]

Q0 touches positions 1--4. Q4 touches 2--5. They overlap on 2--4, yet online processing scans those positions twice. Q1 and Q3 overlap on 3--6 -- scanned twice again. Every shared sub-range is redundant work that we pay for but gain nothing from. With $q$ queries each spanning up to $n$ elements, the worst case is $O (n \cdot q)$ .

If you know all queries upfront, you can reorder them to minimize redundant work -- sorting by endpoint, sweep line, or grouping lets you share computation across queries.

Think of a librarian who receives 50 book requests at once. Instead of walking to the shelves in request order -- romance, science, romance, history, science -- she batches by shelf location: all romance books in one trip, all science books in the next. The total number of books fetched is the same, but the walking (the redundant scanning) is eliminated.

Offline query processing does exactly this: read every query first, then choose a processing order that lets a single sweep over the data answer all of them.

  Online processing:              Offline processing:

  Q1 --> process --> answer       Q1 --+
  Q2 --> process --> answer       Q2 --+--> sort --> batch process --> answers
  Q3 --> process --> answer       Q3 --+
  ...                             ...
  Qn --> process --> answer       Qn --+

  Each query independent.         Queries reordered for efficiency.
  Total: O(n * q)                 Total: O((n + q) log n)

A Small Taxonomy

Not all offline methods reorder queries the same way. Three distinct flavors show up across this chapter:

Timeline-preserving (sweep + sort key). Queries are sorted by a monotone key (right endpoint, weight threshold, time) and processed in that order while a single data structure sweeps. Examples: sort-by-R + BIT, offline DSU. The relative order of "events" is fixed; only the order of queries changes.
Spatial / pointer-based (Mo's). Queries on a static array are reordered to minimize total pointer movement. The data structure is rebuilt incrementally as the window slides; there is no global timeline.
Answer-guessing / batched search (parallel binary search, CDQ). Queries do not get reordered up-front; instead, all queries are processed in lockstep against trial answers (PBS) or partitioned recursively across a midpoint (CDQ). The work per query stays the same; the saving comes from sharing the expensive "replay" or "merge" step across queries.

This file covers the first flavor. The rest of the chapter covers the other two.

Visual Walkthrough

Array a = [4, 2, 7, 1, 5, 3] (1-indexed). Five range-sum queries:

  Q0: [1, 4]   Q1: [2, 6]   Q2: [1, 3]   Q3: [3, 6]   Q4: [2, 5]

Step 1 -- Sort queries by right endpoint $r$ :

  Q2: [1, 3]  r=3
  Q0: [1, 4]  r=4
  Q4: [2, 5]  r=5
  Q1: [2, 6]  r=6
  Q3: [3, 6]  r=6

Step 2 -- Initialize a BIT (Fenwick tree) of size 6, sweep pointer $i = 1$ . Add $a [1] = 4$ to BIT at position 1.

  BIT state:  [4  .  .  .  .  .]
               ^
               i=1
  No query has r=1. Continue.

Step 3 -- Advance sweep: $i = 2$ , add $a [2] = 2$ . Then $i = 3$ , add $a [3] = 7$ . Now answer Q2 (r=3).

  BIT state:  [4  2  7  .  .  .]
               1  2  3
                     ^
                     i=3

  --> Q2: [1,3] = prefix(3) - prefix(0) = 13 - 0 = 13

Step 4 -- Advance sweep: $i = 4$ , add $a [4] = 1$ . Answer Q0 (r=4).

  BIT state:  [4  2  7  1  .  .]
                        ^
                        i=4

  --> Q0: [1,4] = prefix(4) - prefix(0) = 14 - 0 = 14

Step 5 -- Advance sweep: $i = 5$ , add $a [5] = 5$ . Answer Q4 (r=5).

  BIT state:  [4  2  7  1  5  .]
                           ^
                           i=5

  --> Q4: [2,5] = prefix(5) - prefix(1) = 15 - 4 = 11

Step 6 -- Advance sweep: $i = 6$ , add $a [6] = 3$ . Answer Q1 and Q3 (both have r=6).

  BIT state:  [4  2  7  1  5  3]
                              ^
                              i=6

  --> Q1: [2,6] = prefix(6) - prefix(1) = 22 - 4 = 18
  --> Q3: [3,6] = prefix(6) - prefix(2) = 22 - 6 = 16

Each element is added to the BIT exactly once. Each query is a single $O (\log n)$ BIT prefix query. Total: $O ((n + q) \log n)$ .

The Invariant

+-------------------------------------------------------------------+
| INVARIANT: After the sweep pointer has advanced to position r,    |
| the BIT reflects the full prefix a[1..r], and every query whose   |
| right endpoint <= r has been answered.                             |
+-------------------------------------------------------------------+

This guarantees correctness: when we evaluate a query $[l, r]$ , the BIT already contains every element in $a [1. . r]$ , so $prefix (r) - prefix (l - 1)$ is exact.

When to Reach for This

Trigger phrases in problem statements:

"All queries are given upfront" or "read all queries before answering."
"No updates between queries" -- the array (or graph) is static.
"Sort queries" -- editorials or hints mention reordering queries.
"For each query, find ..." with a static data set.

Counter-examples (this technique does NOT apply):

Online queries with updates -- if elements change between queries, you cannot reorder. Use a segment tree with lazy propagation instead.
Queries must be answered in arrival order -- some problems enforce an ordering (e.g., each query depends on the answer to the previous one). You cannot reorder; consider persistent data structures.
Forced online via XOR decryption -- some problems XOR the next query with the previous answer, making it impossible to know future queries. Mo's algorithm with modifications or persistent structures may help, but basic offline reordering cannot.

Decision tree for choosing an offline technique:

  Which offline technique to use?

           Queries on ranges?
          /                  \
        YES                  NO --> coordinate compression,
        /                          offline connectivity (DSU)
   Need updates
   between queries?
  /              \
 YES              NO
 |                 |
 Online         Can you maintain
 required         state incrementally?
               /          \
             YES           NO
              |             |
           Sort by R      Mo's algorithm
           + BIT/segtree  O((n+q)*sqrt(n))
           O((n+q)*log n)

What the Code Won't Teach You

The design question is always: "what order makes this cheap?" The topic file shows several orderings (by right endpoint, by weight). But in contests you encounter new situations and must reason fresh. Ask yourself: if I sort queries by X, what property does that give me about consecutive queries? Does that property make some operation (BIT update, pointer move) amortize to near-zero?
Offline + divide and conquer is a pattern. A family of offline techniques uses D&C: split the query range $[l, r]$ at the midpoint, count contributions where updates come from the left half and queries from the right half, recurse (CDQ D&C). The offline component is that you have all updates and queries upfront and can assign them to halves.
"Offline" is about information, not implementation. The real question is: does knowing future queries let you avoid redundant work? If yes, offline wins. If queries are inherently sequential (each depends on the last), offline buys you nothing.

C++ STL Reference

Function / Class	Header	What it does	Time Complexity
`sort(first, last)`	`<algorithm>`	Sort queries/edges by chosen key	$O (n \log n)$
`sort(first, last, cmp)`	`<algorithm>`	Sort with custom comparator (lambda)	$O (n \log n)$
`iota(first, last, val)`	`<numeric>`	Fill index array with `0, 1, 2, ...` for indirect sort	$O (n)$
`lower_bound(first, last, val)`	`<algorithm>`	Binary search for coordinate compression	$O (\log n)$
`unique(first, last)`	`<algorithm>`	Remove consecutive duplicates (after sort) for compression	$O (n)$
`vector<array<int,K>>`	`<array>`	Lightweight query struct with default `<` by first element	$O (1)$

Implementation (Contest-Ready)

Version 1 -- Minimal: Offline Distinct Values (Sort by R + BIT)

cpp

#include <bits/stdc++.h>
using namespace std;

struct BIT {
    int n;
    vector<int> t;
    BIT(int n) : n(n), t(n + 1) {}
    void update(int i, int v) { for (; i <= n; i += i & -i) t[i] += v; }
    int query(int i) { int s = 0; for (; i > 0; i -= i & -i) s += t[i]; return s; }
    int query(int l, int r) { return query(r) - query(l - 1); }
};

int main() {
    int n, q;
    scanf("%d%d", &n, &q);
    vector<int> a(n + 1);
    for (int i = 1; i <= n; i++) scanf("%d", &a[i]);

    vector<array<int, 3>> qr(q); // {l, r, original_index}
    for (int i = 0; i < q; i++) {
        scanf("%d%d", &qr[i][0], &qr[i][1]);
        qr[i][2] = i;
    }
    sort(qr.begin(), qr.end(), [](auto& a, auto& b) {
        return a[1] < b[1]; // sort by right endpoint
    });

    BIT bit(n);
    unordered_map<int, int> last;
    vector<int> ans(q);
    int ptr = 0;

    for (int i = 1; i <= n; i++) {
        if (last.count(a[i]))
            bit.update(last[a[i]], -1);
        bit.update(i, 1);
        last[a[i]] = i;
        while (ptr < q && qr[ptr][1] == i) {
            ans[qr[ptr][2]] = bit.query(qr[ptr][0], i);
            ptr++;
        }
    }

    for (int i = 0; i < q; i++)
        printf("%d\n", ans[i]);
}

Version 2 -- Explained: Offline Connectivity (Sort + DSU)

cpp

#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> parent, rank_;
    DSU(int n) : parent(n), rank_(n, 0) {
        iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        // Path compression: safe here because we never need to undo.
        while (parent[x] != x) x = parent[x] = parent[parent[x]];
        return x;
    }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        // Union by rank keeps tree depth O(log n).
        if (rank_[a] < rank_[b]) swap(a, b);
        parent[b] = a;
        if (rank_[a] == rank_[b]) rank_[a]++;
        return true;
    }
    bool connected(int a, int b) { return find(a) == find(b); }
};

int main() {
    int n, m, q;
    scanf("%d%d%d", &n, &m, &q);

    // Each edge has a weight. We sort edges by weight so we can add them
    // incrementally -- lightest first.
    struct Edge { int u, v, w; };
    vector<Edge> edges(m);
    for (auto& [u, v, w] : edges) scanf("%d%d%d", &u, &v, &w);

    sort(edges.begin(), edges.end(), [](auto& a, auto& b) {
        return a.w < b.w;
    });

    // Each query asks: "are u and v connected using only edges with
    // weight <= w?" We sort queries by w so they align with the edge
    // sweep -- once we reach query weight w, all relevant edges are
    // already in the DSU.
    struct Query { int u, v, w, idx; };
    vector<Query> queries(q);
    for (int i = 0; i < q; i++) {
        scanf("%d%d%d", &queries[i].u, &queries[i].v, &queries[i].w);
        queries[i].idx = i; // preserve input order for output
    }

    sort(queries.begin(), queries.end(), [](auto& a, auto& b) {
        return a.w < b.w;
    });

    DSU dsu(n);
    vector<int> ans(q);
    int ei = 0; // pointer into sorted edge array

    for (auto& [u, v, w, idx] : queries) {
        // Add every edge whose weight fits within this query's limit.
        // Because both arrays are sorted by weight, each edge is
        // processed at most once across all queries -- total O(m * a(n)).
        while (ei < m && edges[ei].w <= w) {
            dsu.unite(edges[ei].u, edges[ei].v);
            ei++;
        }
        ans[idx] = dsu.connected(u, v) ? 1 : 0;
    }

    for (int i = 0; i < q; i++)
        printf("%s\n", ans[i] ? "YES" : "NO");
}

Operations Reference

Operation	Time	Space
Sort queries	$O (q \log q)$	$O (q)$
Sort edges	$O (m \log m)$	$O (m)$
Sweep + BIT update/query (per element)	$O (\log n)$	--
Sweep + BIT total	$O ((n + q) \log n)$	$O (n)$
Sweep + DSU unite/find (per operation)	$O (α (n))$	--
Sweep + DSU total	$O ((m + q) \cdot α (n))$	$O (n)$
Overall (BIT variant)	$O ((n + q) \log n)$	$O (n + q)$
Overall (DSU variant)	$O (m \log m + q \log q + (m + q) \cdot α (n))$	$O (n + m + q)$

Problem Patterns & Tricks

Pattern 1: Sort Queries by Right Endpoint + BIT

Sort all $[l, r]$ queries by $r$ . Sweep $i = 1 \dots n$ , maintaining a BIT. At each position, update the BIT (e.g., mark last occurrence of $a [i]$ ). Answer all queries with right endpoint $= i$ .

  Queries sorted by right endpoint:

  Array indices:  [  0  1  2  3  4  5  6  7  8  9  ]

  Q3: |--L=1------R=3--|                            process first  (R=3)
  Q1: |-----L=2---------R=5--|                       process second (R=5)
  Q5: |--------L=3-----------R=7--|                  process third  (R=7)
  Q2: |-----------L=4-----------R=8--|               process fourth (R=8)
       <-- sweep line moves left to right -->
       adding elements one by one into BIT

Works for: distinct count, number of values in a range, count of elements satisfying a positional property over $[l, r]$ .

Examples: CSES Distinct Values Queries, CF 220B.

Pattern 2: Sort Queries by Right Endpoint + Segment Tree

Same sweep, but use a segment tree when you need range-min/max instead of prefix sums. Typical use: track the minimum distance between two equal elements, updated as the sweep advances.

cpp

// Sketch: when sweep hits position i with value a[i],
// if a[i] was last seen at position p, update seg_tree at p
// with distance (i - p). Query [l, r] for range minimum.

Example: CF 522D (Closest Equals).

Pattern 3: Sort Events and Queries Together (Sweep Line)

Treat array elements and queries as events on a shared axis. Sort everything by position (or coordinate). Process events as you sweep. Useful when queries and data points live in the same space.

cpp

// Merge points and queries into one event list
// Negative type = data point, non-negative = query index
vector<pair<int,int>> events;
sort(events.begin(), events.end());
for (auto [pos, type] : events) {
    if (type < 0) bit.update(/* ... */);
    else ans[type] = bit.query(/* ... */);
}

Example: CF 1311F (Moving Points -- sort by coordinate, BIT for count/sum).

Pattern 4: Offline Connectivity with DSU

Queries ask "connected?" under a weight or time threshold. Sort edges and queries by threshold. Sweep through, adding edges to a DSU, answering queries at the right moment. Each edge is added at most once.

Example: CF 1851G (Vlad and the Mountains).

Pattern 5: Coordinate Compression

The simplest offline trick: read all values (from both data and queries) upfront, then compress to rank space. Required when values reach $10^{9}$ but you need array-indexed structures.

cpp

vector<int> vals(a.begin(), a.end());
for (auto& [l, r, v] : queries) vals.push_back(v);
sort(vals.begin(), vals.end());
vals.erase(unique(vals.begin(), vals.end()), vals.end());
auto compress = [&](int x) {
    return (int)(lower_bound(vals.begin(), vals.end(), x) - vals.begin());
};

Example: Any BIT/segment tree problem with values up to $10^{9}$ .

Pattern 6: DSU with Rollback (Offline Dynamic Connectivity)

When you need to add and remove edges, use DSU with rollback. Union by rank only (no path compression -- must be reversible). Save a checkpoint before temporary unions and restore afterwards.

cpp

struct RollbackDSU {
    vector<int> par, rnk;
    vector<pair<int,int>> history;
    RollbackDSU(int n) : par(n), rnk(n, 0) { iota(par.begin(), par.end(), 0); }
    int find(int x) { while (par[x] != x) x = par[x]; return x; }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rnk[a] < rnk[b]) swap(a, b);
        history.push_back({b, par[b]});
        par[b] = a;
        if (rnk[a] == rnk[b]) { history.push_back({~a, rnk[a]}); rnk[a]++; }
        return true;
    }
    int save() { return (int)history.size(); }
    void rollback(int checkpoint) {
        while ((int)history.size() > checkpoint) {
            auto [node, val] = history.back(); history.pop_back();
            if (node < 0) rnk[~node] = val;
            else par[node] = val;
        }
    }
};

Used in divide-and-conquer on query time intervals. Find is $O (\log n)$ instead of $O (α (n))$ , but rollback works.

Example: CF 1140F (Extending Set of Points).

Contest Cheat Sheet

+------------------------------------------------------------------+
|                   OFFLINE QUERIES CHEAT SHEET                    |
+------------------------------------------------------------------+
| WHEN TO USE:                                                     |
|   - All queries given upfront (no forced online)                 |
|   - Range queries on a static array                              |
|   - "Connected using edges with weight <= w?" type queries       |
|   - Values up to 10^9 --> coordinate compression                 |
+------------------------------------------------------------------+
| KEY TEMPLATES:                                                   |
|   sort(queries.begin(), queries.end(), by_right_endpoint);       |
|   for (int i = 1; i <= n; i++) {                                 |
|       update_BIT(i);                                             |
|       while (ptr < q && queries[ptr].r == i)                     |
|           ans[queries[ptr].idx] = BIT.query(queries[ptr].l, i);  |
|   }                                                              |
+------------------------------------------------------------------+
| COMPLEXITY:                                                      |
|   Sweep + BIT:  O((n + q) log n)  time  |  O(n + q) space       |
|   Sweep + DSU:  O((m + q) a(n))   time  |  O(n + m + q) space   |
+------------------------------------------------------------------+
| REMEMBER:                                                        |
|   - Store original query index for output order                  |
|   - 1-indexed BIT needs 1-indexed queries                        |
|   - Sort edges AND queries by the SAME key                       |
|   - DSU rollback: NO path compression, union by rank only        |
+------------------------------------------------------------------+

Gotchas & Debugging

Forgot to store query index. You sort queries but need to output answers in input order. Always carry the original index. Use array<int,3> or a struct with an idx field.
0-indexed vs 1-indexed. BIT is naturally 1-indexed. If your array is 0-indexed, shift everything by 1 before touching the BIT.
Process order within the sweep. Update the data structure for position $i$ before answering queries with $r = i$ . If you answer first, you miss the contribution of $a [i]$ .
unordered_map TLE. Worst-case $O (n)$ per lookup due to hash collisions. For "last occurrence" tracking, prefer a plain vector<int> after coordinate compression, or use a custom hash.
DSU path compression prevents rollback. If you need rollback, use union by rank without path compression. find() becomes $O (\log n)$ instead of amortized $O (α (n))$ .
Integer overflow in BIT. If you sum values (not just count), use long long in the BIT array.
Queries with $l > r$ . Shouldn't happen per most problem constraints, but guard against it if you generate sub-queries internally.

Mental Traps

"If I can store all queries, I can solve the problem offline." Storing queries is necessary but not sufficient. The benefit of offline processing comes from reordering queries to amortize cost. If you store all queries and process them in input order, you've gained nothing over online processing.
"Offline techniques are a fallback when online is too slow." Online techniques are sometimes strictly better (e.g., segment trees with lazy propagation). Offline is better when: (1) queries can be reordered to enable cheaper processing, or (2) the problem's structure inherently allows precomputation. The choice depends on query structure, not just time complexity.
"Offline means I can use any order I want." You can reorder queries, but the correct order depends entirely on your algorithm. Mo's order (by block + $r$ ) is optimal for Mo's. Sort by right endpoint for BIT sweeps. Sort by query parameter for parallel binary search. The wrong sort order gives a correct-looking but slow or wrong algorithm.
"Offline always means Mo's algorithm." Mo's is one offline technique among many. Others include sort-by-R + BIT, CDQ divide and conquer, parallel binary search, and persistent structures. Jumping to Mo's on every offline problem misses cleaner, faster solutions.
The mistake that teaches (sort-by-R distinct count). I once sorted queries by left endpoint instead of right, while sweeping left-to-right. My BIT didn't contain all information for the range at query time, and 40% of answers were silently wrong. Debugging breakthrough: print the sweep pointer alongside each query's R. Always verify that your sweep direction and sort key are consistent.
The off-by-one on first occurrence. On CSES Distinct Values Queries, I wrote BIT[i] += 1 before BIT[last[a[i]]] -= 1. For the first occurrence (when last[a[i]] == 0), this caused a spurious -1 at position 0 of the BIT -- corrupting all prefix sums. Always check if (last[a[i]] != 0) or initialize last to -1. Off-by-one on sentinel values is the #1 offline query bug.

Debug Drills

Drill 1: DSU with rollback -- but rollback doesn't actually undo the union.

cpp

struct DSU {
    vector<int> par, rank_;
    vector<pair<int,int>> history; // (node, old_parent)
    DSU(int n) : par(n), rank_(n, 0) { iota(par.begin(), par.end(), 0); }
    int find(int x) { while (par[x] != x) x = par[x]; return x; }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) swap(a, b);
        history.push_back({b, par[b]});
        par[b] = a;
        if (rank_[a] == rank_[b]) rank_[a]++;
        return true;
    }
    void rollback(int saved) {
        while ((int)history.size() > saved) {
            auto [node, old_par] = history.back();
            history.pop_back();
            par[node] = old_par;
        }
    }
};

What's wrong?

When rank_[a] == rank_[b] and we increment rank_[a], the rollback only restores par[b] but never restores rank_[a]. After rollback, rank_[a] is permanently inflated, corrupting future union-by-rank decisions. Fix: also push the old rank onto the history stack and restore it during rollback.

Drill 2: Offline distinct-count queries with sort-by-R sweep.

cpp

vector<int> ans(Q);
sort(queries.begin(), queries.end(), [](auto& a, auto& b) {
    return a.r < b.r;
});
int ptr = 0;
for (auto& [l, r, idx] : queries) {
    while (ptr < r) {
        ptr++;
        if (last_pos[a[ptr]] != -1)
            bit.update(last_pos[a[ptr]], -1);
        bit.update(ptr, 1);
        last_pos[a[ptr]] = ptr;
    }
    ans[idx] = bit.query(r) - bit.query(l - 1);
}

What's wrong?

The pointer advances with ptr++ before processing, making it 1-indexed on entry -- but if the array a[] and BIT are 0-indexed, a[ptr] is off-by-one. More critically, the while condition ptr < r means when the loop exits, ptr == r, but the element at index r has already been processed (since ptr++ happens first). If r is 0-indexed, the element at index r is never processed. Fix: use while (ptr <= r) { /* process a[ptr] */ ptr++; } or adjust to while (ptr < r) { /* process a[ptr+1] first, then ptr++ */ } depending on your indexing convention.

Drill 3: Forgetting to map answers back to original indices.

cpp

sort(queries.begin(), queries.end(), [](auto& a, auto& b) {
    return a.r < b.r;
});
vector<int> ans(Q);
for (int i = 0; i < Q; i++) {
    // ... process query i ...
    ans[i] = result; // BUG
}
for (int i = 0; i < Q; i++)
    cout << ans[i] << "\n";

What's wrong?

After sorting, query i in the sorted order is not the i-th original query. Writing ans[i] = result stores the answer at the wrong index. Fix: use ans[queries[i].original_index] = result so the answer is placed at the position corresponding to the original input order.

Drill 4: Sorting queries but forgetting to store indices.

cpp

sort(queries.begin(), queries.end(), [](auto& a, auto& b) {
    return a.r < b.r;
});
for (int i = 0; i < q; i++) {
    // process queries[i]
    ans[i] = bit.query(queries[i].r) - bit.query(queries[i].l - 1);
}
// Output: for (int i = 0; i < q; i++) cout << ans[i] << "\n";

What's wrong?

After sorting, queries[i] is no longer the i-th original query. The answer ans[i] is stored at the sorted position, not the original position. Fix: store the original index in each query struct and use it:

cpp

ans[queries[i].idx] = bit.query(queries[i].r) - bit.query(queries[i].l - 1);

Without this, answers are printed in sorted-by-R order instead of input order.

Drill 5: Offline DSU -- processing order.

cpp

// Edges sorted by weight, queries sorted by weight threshold
int ei = 0;
for (auto& [threshold, u, v, idx] : queries) {
    while (ei < m && edges[ei].w < threshold)  // wrong comparator
        dsu.unite(edges[ei].u, edges[ei].v), ei++;
    ans[idx] = dsu.find(u) == dsu.find(v);
}

What's wrong?

The condition edges[ei].w < threshold uses strict less-than. If the query asks "are u and v connected using edges with weight <= threshold?", edges with weight equal to the threshold are excluded. Fix: use <=:

cpp

while (ei < m && edges[ei].w <= threshold)

This off-by-one on the comparator is especially insidious because it only fails when an edge weight exactly equals a query threshold -- which may not appear in small test cases.

Drill 6: Last occurrence tracking.

cpp

map<int, int> last_pos;
for (int i = 1; i <= n; i++) {
    if (last_pos.count(a[i]))
        bit.update(last_pos[a[i]], -1);
    bit.update(i, 1);
    last_pos[a[i]] = i;
    // answer all queries with r == i
    while (qi < q && queries[qi].r == i) {
        ans[queries[qi].idx] = bit.query(queries[qi].r) - bit.query(queries[qi].l);
        qi++;
    }
}

What's wrong?

The answer should be bit.query(queries[qi].r) - bit.query(queries[qi].l - 1), not bit.query(queries[qi].l). With a 1-indexed BIT, bit.query(x) returns the prefix sum from 1 to x. The count of distinct values in [L, R] is prefix(R) - prefix(L-1). Using prefix(L) excludes position L itself from the count, giving an answer that's too small by 1 whenever position L holds a "last occurrence."

Debug Checklist

Debugging tip. Print queries after sorting to verify the sweep order. Check that every query gets answered exactly once (counter should equal $q$ at the end).

Self-Test

Before moving on, check that you can do each of these without looking back at the notes:

[ ] Define "offline" vs. "online" query processing in one sentence each -- in terms of the constraint on query processing order, not in terms of technique.
[ ] Explain why sorting queries by right endpoint lets a BIT-based sweep answer range queries in $O ((n + q) \log n)$ total time.
[ ] Write the boilerplate for sorting queries with stored indices and outputting answers in original order (~10 lines of C++).
[ ] Name three different offline techniques (beyond Mo's) and describe the problem structure each is suited for.
[ ] Given a new problem statement, identify whether it permits offline processing or forces online -- look for XOR decryption, query-answer dependencies, or "all queries given upfront."
[ ] Explain when DSU with rollback is needed instead of DSU with path compression, and why path compression prevents rollback.
[ ] Describe how coordinate compression enables offline techniques when values reach $10^{9}$ but the number of distinct values is small.

Practice Problems

Rating Progression

CF Rating	What to Master
1200	Recognize when all queries are known upfront; sort queries by one endpoint and sweep with a pointer or prefix sum.
1500	Sort-by-R technique with a BIT to answer "distinct elements in range" or "last occurrence" style queries.
1800	DSU with rollback for connectivity queries on dynamic edge sets; understand when union-by-rank is mandatory (no path compression). Offline DSU (sort edges + queries by weight/time), divide-and-conquer on time.
2100	Combine offline reordering with segment trees or persistent structures; recognize when offline + sweep beats a heavy online DS. CDQ + BIT, parallel binary search + offline DSU, offline + persistent structures.

#	Problem	Source	Difficulty	Key Idea
1	Distinct Values Queries	CSES	Easy	Sort by R, BIT last-occurrence trick
2	CF 220B -- Little Elephant and Array	CF	Medium	Sort by R, BIT counts values appearing exactly $k$ times
3	CF 522D -- Closest Equals	CF	Medium	Sort by R, segment tree tracks min distance between equal values
4	CF 1311F -- Moving Points	CF	Medium	Sort points by coordinate, BIT for pairwise distance sum
5	CF 1851G -- Vlad and the Mountains	CF	Medium	Sort queries and edges by height, offline DSU
6	CF 1140F -- Extending Set of Points	CF	Hard	DSU with rollback, divide-and-conquer on time
7	CF 600E -- Lomsat gelral	CF	Hard	DSU on tree (small-to-large merge), offline subtree queries

Offline Queries ​

Contents ​

Intuition ​

A Small Taxonomy ​

Visual Walkthrough ​

The Invariant ​

When to Reach for This ​

What the Code Won't Teach You ​

C++ STL Reference ​

Implementation (Contest-Ready) ​

Version 1 -- Minimal: Offline Distinct Values (Sort by R + BIT) ​

Version 2 -- Explained: Offline Connectivity (Sort + DSU) ​

Operations Reference ​

Problem Patterns & Tricks ​

Pattern 1: Sort Queries by Right Endpoint + BIT ​

Pattern 2: Sort Queries by Right Endpoint + Segment Tree ​

Pattern 3: Sort Events and Queries Together (Sweep Line) ​

Pattern 4: Offline Connectivity with DSU ​

Pattern 5: Coordinate Compression ​

Pattern 6: DSU with Rollback (Offline Dynamic Connectivity) ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

Mental Traps ​

Debug Drills ​

Debug Checklist ​

Self-Test ​

Practice Problems ​

Further Reading ​

Offline Queries

Contents

Intuition

A Small Taxonomy

Visual Walkthrough

The Invariant

When to Reach for This

What the Code Won't Teach You

C++ STL Reference

Implementation (Contest-Ready)

Version 1 -- Minimal: Offline Distinct Values (Sort by R + BIT)

Version 2 -- Explained: Offline Connectivity (Sort + DSU)

Operations Reference

Problem Patterns & Tricks

Pattern 1: Sort Queries by Right Endpoint + BIT

Pattern 2: Sort Queries by Right Endpoint + Segment Tree

Pattern 3: Sort Events and Queries Together (Sweep Line)

Pattern 4: Offline Connectivity with DSU

Pattern 5: Coordinate Compression

Pattern 6: DSU with Rollback (Offline Dynamic Connectivity)

Contest Cheat Sheet

Gotchas & Debugging

Mental Traps

Debug Drills

Debug Checklist

Self-Test

Practice Problems

Further Reading