Skip to content

Mo's Algorithm

Quick Revisit

  • USE WHEN: offline range queries on a static array where add/remove one element is O(1)
  • INVARIANT: queries sorted by (⌊l/B⌋, r) with B = √n; total pointer movement O((n+q)√n)
  • TIME: O((n+q)√n) | SPACE: O(n + MAXVAL)
  • INVALID WHEN: array has updates (use 3D Mo); add/remove is not O(1) or not reversible; query answer cannot be maintained incrementally
  • TUNING: when q is much smaller than n, B = max(1, n/√q) is asymptotically tighter; see "block size tuning" below
  • PRACTICE: 08-ladder-mixed

Offline range-query technique that answers q queries on a static array in O((n+q)n) by sorting queries intelligently and maintaining a sliding window. A staple for Div 2 D/E on Codeforces when updates aren't required and online isn't forced.

ID: AL-40 Difficulty: [Advanced]Prerequisites: Sqrt Decomposition, Offline QueriesCF Rating: 1800-2100

Contest Frequency: * Specialized -- appears in offline range query problems


Contents


Intuition

2.1 The Pain

Consider a=[3,1,4,1,5,9,2,6] (n=8) and six queries asking "how many distinct values in [l,r]?":

  Q0: [0, 3]   Q1: [2, 7]   Q2: [1, 4]
  Q3: [0, 7]   Q4: [5, 7]   Q5: [3, 6]

Naive approach: for each query, scan the subarray and count distinct values. Worst case per query is O(n), total O(nq). For n,q=105 that is 1010 operations -- hopelessly slow.

2.2 The Key Insight

Sort queries by (l/n,r) so that consecutive queries have nearly overlapping ranges -- then maintain the answer by adding/removing one element at a time, with total pointer movement O((n+q)n).

Think of a lazy student who has to answer a list of homework questions, each about a different range of the textbook. Instead of answering them in the order given -- flipping wildly between chapters -- the student rearranges the questions so each one is close to the previous one, minimizing page-turning. That's Mo's algorithm: sort queries so that consecutive queries overlap as much as possible, making the total pointer movement minimal.

Concretely, think of two pointers, curL and curR, that define the "current window" over the array. To transition from one query to the next, the pointers walk left or right, adding or removing elements as they go. If queries were in random order the pointers would jump wildly. Mo's sorting guarantees:

  • Within one block of l-values, r is sorted, so the right pointer sweeps mostly forward -- O(n) per block, O(nn) total.
  • The left pointer stays inside a block of width n, moving at most n per transition -- O(qn) total.
  Block structure (block_size = sqrt(n)):         n = 16, block_size = 4
  
  +-------+-------+-------+-------+
  | blk 0 | blk 1 | blk 2 | blk 3 |
  +-------+-------+-------+-------+
  |0 1 2 3|4 5 6 7|8 9 A B|C D E F|
  +-------+-------+-------+-------+
       ^                       ^
       |                       |
    l/B = 0               l/B = 3
  
  Queries sorted by: (l / block_size, R)
  Even blocks: R ascending   -->-->-->
  Odd  blocks: R descending  <--<--<--
  
  This zigzag pattern minimizes total R-pointer movement.

2.3 Visual Walkthrough

Array: a=[3,1,4,1,5,9,2,6], block size B=8=3.

Step 0 -- Sort queries by (l/3,r):

  Original            Block of l    Sorted (by block, then r)
  Q0: [0,3]  blk 0    Q0: [0,3]  blk 0, r=3
  Q1: [2,7]  blk 0    Q2: [1,4]  blk 0, r=4
  Q2: [1,4]  blk 0    Q1: [2,7]  blk 0, r=7
  Q3: [0,7]  blk 0    Q3: [0,7]  blk 0, r=7
  Q4: [5,7]  blk 1    Q5: [3,6]  blk 1, r=6
  Q5: [3,6]  blk 1    Q4: [5,7]  blk 1, r=7

Step 1 -- Process Q0 [0, 3]. Start with empty window (curL=0,curR=1). Add indices 0..3:

  a:  [ 3  1  4  1  5  9  2  6 ]
       *  *  *  *  .  .  .  .
       ^        ^
      curL    curR
  cnt: {3:1, 1:2, 4:1}   distinct = 3

Step 2 -- Process Q2 [1, 4]. Move curL from 0 to 1 (remove a[0]=3), move curR from 3 to 4 (add a[4]=5):

  a:  [ 3  1  4  1  5  9  2  6 ]
       .  *  *  *  *  .  .  .
          ^        ^
        curL     curR
  cnt: {1:2, 4:1, 5:1}   distinct = 3
  Pointer moves: L +1, R +1  (total so far: 2)

Step 3 -- Process Q1 [2, 7]. Move curL from 1 to 2 (remove a[1]=1), move curR from 4 to 7 (add a[5]=9,a[6]=2,a[7]=6):

  a:  [ 3  1  4  1  5  9  2  6 ]
       .  .  *  *  *  *  *  *
             ^              ^
           curL           curR
  cnt: {4:1, 1:1, 5:1, 9:1, 2:1, 6:1}   distinct = 6
  Pointer moves: L +1, R +3  (total so far: 6)

Step 4 -- Process Q3 [0, 7]. Move curL from 2 to 0 (add a[1]=1,a[0]=3), curR stays:

  a:  [ 3  1  4  1  5  9  2  6 ]
       *  *  *  *  *  *  *  *
       ^                    ^
      curL                curR
  cnt: {3:1, 1:2, 4:1, 5:1, 9:1, 2:1, 6:1}   distinct = 7
  Pointer moves: L +2, R +0  (total so far: 8)

Remaining queries Q5 and Q4 (block 1) follow the same pattern. Total pointer movement stays O((n+q)n) -- here roughly 16 moves instead of the naive ~48.

  Mo's pointer movement on sorted queries:

  Array indices:  0   1   2   3   4   5   6   7   8   9
  Blocks:        |--blk 0--|--blk 1--|--blk 2--|--blk 3-|
                     B = 3

  Sorted query order (within block 0):
  Q0: L=0  R=3   L--*--*--R
  Q2: L=1  R=4   .L--*--*--R           R moves +1 -->
  Q1: L=2  R=7   ..L--*--*--*--*--R    R moves +3 --->
  Q3: L=0  R=7   L--*--*--*--*--*--R   R stays, L moves -2 <--
                  ^                ^
                  |                |
               L moves at        R sweeps
              most B=3 per      monotonically
               transition       within block

  Within same block: L moves O(sqrt(n)), R moves monotonically
  Across all blocks: total = O((n + q) * sqrt(n))

Worked Example: Mo's Algorithm -- 5 Queries on Array [2, 0, 1, 3, 2, 1, 4, 2]

Query: count of distinct elements. n = 8, block size B = ⌈√8⌉ = 3.

  Queries (0-indexed, inclusive):
    Q0:[1,5]  Q1:[0,3]  Q2:[4,7]  Q3:[2,6]  Q4:[0,7]

  Step 1 -- Sort by (⌊L/B⌋, R):
  | Query | L | R | Block | Sorted position |
  |-------|---|---|-------|-----------------|
  |  Q1   | 0 | 3 |   0   |       1st       |
  |  Q0   | 1 | 5 |   0   |       2nd       |
  |  Q3   | 2 | 6 |   0   |       3rd       |
  |  Q4   | 0 | 7 |   0   |       4th       |
  |  Q2   | 4 | 7 |   1   |       5th       |

  Step 2 -- Process in sorted order, tracking [curL, curR]:
  Array: [2, 0, 1, 3, 2, 1, 4, 2]

  | Order | Query | [curL,curR]  | L moves  | R moves | adds | removes | answer |
  |-------|-------|-------------|----------|---------|------|---------|--------|
  |   --   |  init | [0, -1]     |    --     |    --    |   --  |    --    |   --    |
  |   1   |  Q1   | [0, 3]      | 0->0 = 0 | -1->3=+4|   4  |    0    |   4    |
  |   2   |  Q0   | [1, 5]      | 0->1 = +1| 3->5 =+2|   2  |    1    |   4    |
  |   3   |  Q3   | [2, 6]      | 1->2 = +1| 5->6 =+1|   1  |    1    |   4    |
  |   4   |  Q4   | [0, 7]      | 2->0 = -2| 6->7 =+1|   3  |    0    |   5    |
  |   5   |  Q2   | [4, 7]      | 0->4 = +4| 7->7 = 0|   0  |    4    |   4    |

  Totals: 8 L-moves + 8 R-moves = 16 add/remove ops
  Naive (independent per query): 4+6+4+5+8 = 27 ops
  Mo's order saves ~40% here; gap widens with more queries.

2.4 The Invariant

+------------------------------------------------------------------+
| INVARIANT: Between queries the data structure reflects the exact  |
| answer for the range [curL, curR].                                |
|                                                                   |
| - Each add(val) / remove(val) runs in O(1)                       |
|   (or O(log n) with augmented structures).                        |
| - Total pointer movement across ALL queries: O((n+q) * sqrt(n)). |
+------------------------------------------------------------------+

Why O(1) per element? For distinct count, maintain cnt[val]. add: increment cnt[val]; if it goes 01, increment distinct. remove: decrement cnt[val]; if it goes 10, decrement distinct.

2.5 When to Reach for This

Trigger phrases in problem statements:

  • "offline range queries" -- queries given upfront, answers output at the end.
  • "no updates" / "static array" -- the array does not change.
  • "distinct count in range", "frequency of values in subarray".
  • "queries can be reordered" -- explicitly or implicitly.

Counter-examples (when NOT to use Mo's): these are recognition negatives -- if any one applies, Mo's is the wrong tool, and noticing the negative is as important as noticing the positive trigger phrases above.

SituationWhy not Mo'sUse instead
Array has point updates between queriesadd/remove invariant breaks; the array no longer reflects "the same window" across queriesMo with updates / 3D Mo (O(n5/3), more complex), or segment tree
Online required (next query depends on previous answer, XOR decryption, etc.)Cannot reorder queries; midpoints cannot be batchedPersistent segment tree, merge-sort tree
add/remove is not O(1) -- e.g., requires recomputing a non-invertible aggregateTotal becomes O((n+q)nf); often TLE for any non-trivial fSegment tree with merge, offline divide-and-conquer, or Mo with rollback if remove is the only expensive side
Query answer cannot be maintained incrementally (e.g., median of window)Even with O(1) data structure ops, recomputing the answer per pointer move is Ω(logn)Wavelet tree, persistent structures
n,q>2×105 and TL is tightn constant factor can be largeWavelet tree, offline D&C

What the Code Won't Teach You

Recognizing Mo-solvable problems. The key condition: the problem is offline, the array is static (no updates), and there exists an efficient (ideally O(1)) way to add or remove a single element from the "current answer." Distinct count, sum, XOR, frequency-based functions -- all satisfy this. Problems that don't: anything requiring you to "undo" a non-invertible operation (use Mo with rollback in that case).

The add/remove function is the design problem. The template gives add/remove for distinct count. In contests, you need to design add/remove for whatever the problem asks. This design question is often harder than Mo's itself. Examples:

  • "Sum of frequencies squared" -> maintain a counter-of-counters
  • "XOR of all elements" -> XOR is its own inverse, add = remove
  • "k-th most frequent" -> harder, might need a sorted structure or secondary data structure

The initialization. Mo's starts with an empty window (curL = 0, curR = -1). This means your first query expands from empty. Your add/remove functions must handle adding to an empty state correctly -- test that add works when cnt[] is all zeros and cur_ans is 0.

How This Evolved

The most straightforward answer to "how many distinct values in [l,r]?" is brute force: for each query, scan the subarray and count. That's O(n) per query, O(nq) total -- fine for small inputs, disastrous at scale. If the array is static and queries can be answered offline, can we do better by choosing a smarter order?

The critical observation is that if you already know the answer for [l,r], extending to [l,r+1] or [l1,r] costs just O(1) -- add one element, update the count. So the real cost of answering all queries is proportional to how far the two pointers travel in total. Random query order means wild pointer jumps. Mo's insight is to sort queries by (l/n,r), grouping them into n blocks by their left endpoint. Within each block, the left pointer moves at most O(n) per transition, and the right pointer sweeps monotonically across the array -- O(n) per block. Summing over all blocks gives O(nn) for the right pointer and O(qn) for the left, yielding O((n+q)n) total.

Later refinements sharpened this further. The zigzag optimization (alternating the sort direction of r in even/odd blocks) roughly halves the right-pointer movement in practice. Mo with rollback extends the technique to problems where the remove operation is expensive or impossible, by only adding elements and rolling back to a checkpoint. And Mo on trees adapts the block-decomposition idea to Euler-tour flattened trees, answering path queries offline. Each variant builds on the same core principle: control pointer movement by sorting queries intelligently.


C++ STL Reference

Function / ClassHeaderWhat it doesTime
sort(v.begin(), v.end(), cmp)<algorithm>Sort queries by Mo orderingO(qlogq)
vector<int> cnt(MAXVAL)<vector>Frequency array for add/removeO(1) per access
__builtin_clz(x)built-inBit tricks for block computationO(1)
swap(a, b)<utility>Swap elements in add/removeO(1)

Key point: Mo's algorithm uses almost no STL beyond sorting. The core is raw array manipulation with cnt[] for O(1) add/remove. Avoid map/set inside add/remove -- the O(logn) factor per operation turns the total complexity into O((n+q)nlogn), which often TLEs.


Implementation (Contest-Ready)

Problem: Given an array of n integers and q queries [l,r], for each query output the number of distinct values in a[l..r] (0-indexed).

Pointer Movement Order Matters

When moving from query [curL, curR] to [L, R], expand BEFORE shrinking:

  1. while (curR < R) add(a[++curR]); -- expand right
  2. while (curL > L) add(a[--curL]); -- expand left
  3. while (curR > R) remove(a[curR--]); -- shrink right
  4. while (curL < L) remove(a[curL++]); -- shrink left

Why this order? Expanding first ensures the window always covers a valid range. If you shrink first, you might temporarily have curL > curR (an invalid/empty window), which can cause index-out-of-bounds or wrong state in your add/remove functions.

Pre-increment vs post-increment: add(a[++curR]) adds element THEN moves pointer. add(a[curR++]) moves pointer THEN adds -- off-by-one. The ++curR / --curL pattern for expansion and curR-- / curL++ for shrinking is the safe convention.

  Extending / shrinking the current window [L..R]:

  Current:     [  L . . . . . . . R  ]
                  ^               ^
                curL            curR

  Extend R:    [  L . . . . . . . R--->R+1 ]   add(arr[++curR])
  Extend L:    [ L-1<---L . . . . . . . R  ]   add(arr[--curL])
  Shrink R:    [  L . . . . . . R-1 ]<--R       rem(arr[curR--])
  Shrink L:    L-->[ L+1 . . . . . . . R   ]   rem(arr[curL++])

  +----------------------------------------------------------+
  | ORDER MATTERS: always expand before shrink!              |
  |                                                          |
  |  SAFE:   expand R --> expand L --> shrink R --> shrink L  |
  |  UNSAFE: shrink first --> window may become empty/invalid |
  +----------------------------------------------------------+

Version 1 -- Minimal contest template

cpp
#include <bits/stdc++.h>
using namespace std;

int block;

struct Query {
    int l, r, idx;
    bool operator<(const Query& o) const {
        int b1 = l / block, b2 = o.l / block;
        if (b1 != b2) return b1 < b2;
        return (b1 & 1) ? (r > o.r) : (r < o.r); // odd block: r desc, even block: r asc
    }
};

int cnt[1000001];
int cur_ans;

void add(int val) { if (++cnt[val] == 1) cur_ans++; }
void rem(int val) { if (--cnt[val] == 0) cur_ans--; }

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q;
    cin >> n >> q;
    vector<int> a(n);
    for (auto& x : a) cin >> x;

    block = max(1, (int)sqrt(n));
    vector<Query> queries(q);
    for (int i = 0; i < q; i++) {
        cin >> queries[i].l >> queries[i].r;
        queries[i].idx = i;
    }
    sort(queries.begin(), queries.end());

    vector<int> ans(q);
    int curL = 0, curR = -1;
    cur_ans = 0;

    for (auto& [l, r, idx] : queries) {
        while (curR < r) add(a[++curR]);
        while (curL > l) add(a[--curL]);
        while (curR > r) rem(a[curR--]);
        while (curL < l) rem(a[curL++]);
        ans[idx] = cur_ans;
    }

    for (int i = 0; i < q; i++)
        cout << ans[i] << "\n";
}

Version 2 -- Explained version

cpp
#include <bits/stdc++.h>
using namespace std;

// Block size for Mo's ordering. Set to sqrt(n).
int block;

struct Query {
    int l, r, idx;
    bool operator<(const Query& o) const {
        int b1 = l / block, b2 = o.l / block;
        if (b1 != b2) return b1 < b2;
        // Odd-even trick: zigzag R between blocks to minimize pointer movement.
        //   even blocks (b1 & 1 == 0): sort r ascending  (r < o.r)
        //   odd  blocks (b1 & 1 == 1): sort r descending (r > o.r)
        // This reduces total right-pointer movement by ~30-40%.
        return (b1 & 1) ? (r > o.r) : (r < o.r);
    }
};

// cnt[v] = how many times value v appears in the current window [curL, curR].
// cur_ans = number of distinct values (cnt[v] > 0) in the window.
int cnt[1000001];
int cur_ans;

// Include value val into the window.
void add(int val) {
    cnt[val]++;
    if (cnt[val] == 1) cur_ans++; // first occurrence -> new distinct value
}

// Exclude value val from the window.
void rem(int val) {
    cnt[val]--;
    if (cnt[val] == 0) cur_ans--; // last occurrence removed -> lost a distinct value
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q;
    cin >> n >> q;
    vector<int> a(n);
    for (auto& x : a) cin >> x;

    // Block size = sqrt(n). The max(1, ...) guards against n=0.
    block = max(1, (int)sqrt(n));

    vector<Query> queries(q);
    for (int i = 0; i < q; i++) {
        cin >> queries[i].l >> queries[i].r;
        queries[i].idx = i; // remember original order for output
    }

    // Sort all queries by Mo's ordering.
    sort(queries.begin(), queries.end());

    vector<int> ans(q);
    // Current window is [curL, curR]. Start empty: curL=0, curR=-1.
    int curL = 0, curR = -1;
    cur_ans = 0;

    for (auto& [l, r, idx] : queries) {
        // Expand/shrink window to match [l, r].
        // ORDER MATTERS: expand before shrinking to avoid
        // the window becoming invalid (curL > curR+1 with wrong state).
        //
        // Safe order: expand R, expand L, shrink R, shrink L.
        // This ensures the window always contains a valid range.
        while (curR < r) add(a[++curR]);   // expand right
        while (curL > l) add(a[--curL]);   // expand left
        while (curR > r) rem(a[curR--]);   // shrink right
        while (curL < l) rem(a[curL++]);   // shrink left

        ans[idx] = cur_ans;
    }

    // Output answers in original query order.
    for (int i = 0; i < q; i++)
        cout << ans[i] << "\n";
}

Operations Reference

OperationTimeSpace
Sort queries (Mo order)O(qlogq)O(q)
Process all queries (total)O((n+q)n)O(n)
Single add/remove (distinct count)O(1)-
Single add/remove (with map/set)O(logn)-
Mo with updates (3D Mo)O(n5/3)O(n)
Mo on trees (Euler tour)O((n+q)n)O(n)

Notes:

  • The O((n+q)n) bound assumes B=n. When n and q differ significantly, optimal block size is B=n/q, giving O(nq).
  • With the odd-even optimization, the constant factor roughly halves.
  • If add/remove is O(logn), total becomes O((n+q)nlogn) -- often too slow for n,q=2×105 under 2s. Try to make add/remove O(1).
  • Space is dominated by the frequency array. If values are up to 106, that's 4 MB for int cnt[]. Coordinate-compress if values are up to 109.


🔍 Why This Works -- Block Size √n Minimizes Total Movement

Mo's algorithm answers offline range queries by sorting them cleverly and moving two pointers (left, right) to adjust the current range.

Why √n blocks? Split the array into blocks of size B. Sort queries by (block of left endpoint, then right endpoint).

  • Right pointer: Within a block of left endpoints, right moves monotonically. Across n/B blocks, right travels O(n) per block -> total O(nn/B).
  • Left pointer: Between consecutive queries in the same block, left moves O(B). Over q queries, left travels O(qB).
  • Total: O(n2/B+qB). Setting B=n balances both terms to O((n+q)n).

📐 Where Does the Complexity Come From? -- O((n+q)√n) Derivation

Setup: Array of size n, q queries, block size B.

Right pointer movement: Within each block of left endpoints, right moves monotonically (we sort by right within a block). Each block causes right to sweep at most O(n). There are n/B blocks -> right moves O(nn/B)=O(n2/B) total.

Left pointer movement: Between consecutive queries in the same block, left moves at most O(B) (both lefts are in the same block). Between blocks, left moves at most O(B) too. Over q queries -> left moves O(qB) total.

Total: O(n2/B+qB). Taking the derivative w.r.t. B and setting to zero: n2/B2+q=0B=n/q. For qn, this gives B=n and total O((n+q)n).

Problem Patterns & Tricks

Pattern 1 -- Distinct element count in range

The classic Mo application. Maintain cnt[val]; increment distinct when cnt goes 01, decrement when 10.

Recognize it: "How many distinct values in [l,r]?" on a static array with offline queries.

CF 86D  -- Powerful Array (sum of cnt[x]^2 * x)
SPOJ DQUERY -- D-query (exact distinct count)

Pattern 2 -- Frequency-based aggregate queries

Queries ask for f(cntx) over all values x in [l,r], where f is some function of frequency. Common variants: sum of squares of frequencies, count of values appearing exactly k times.

Trick: Maintain a secondary counter freq_cnt[k] = number of values with frequency k. Update both cnt[] and freq_cnt[] in add/remove.

cpp
void add(int val) {
    freq_cnt[cnt[val]]--;
    cnt[val]++;
    freq_cnt[cnt[val]]++;
    cur_ans += 2LL * cnt[val] - 1; // for sum-of-squares: delta is (c+1)^2 - c^2
}
CF 86D  -- Powerful Array (sum of cnt[x]^2 * x)
CF 375D -- Tree and Queries (count of colors with freq >= k)

Pattern 3 -- XOR / bitwise queries on ranges

Query: XOR of all elements in [l,r], or count of pairs with XOR property. XOR is its own inverse, so add and remove are the same operation.

cpp
void add(int val) { cur_xor ^= val; }
void rem(int val) { cur_xor ^= val; } // same as add!
CF 617E -- XOR and Favorite Number (count pairs with XOR = k)

Pattern 4 -- Mo with updates (Mo's algorithm with modifications)

Handle point updates by adding a time dimension. Each query becomes (l,r,t) where t is the number of updates before this query. Block size B=n2/3. Sort by (l/B,r/B,t). Three pointers: left, right, time. Total: O(n5/3).

cpp
struct Query {
    int l, r, t, idx;
    bool operator<(const Query& o) const {
        int b1 = l / block, b2 = o.l / block;
        if (b1 != b2) return b1 < b2;
        int c1 = r / block, c2 = o.r / block;
        if (c1 != c2) return c1 < c2;
        return t < o.t;
    }
};

Apply/undo updates as the time pointer moves forward/backward.

CF 940F  -- Machine Learning (Mo with updates, count of mex of frequencies)
CF 1476F -- Lanterns (offline with updates variant)

Pattern 5 -- Mo on trees

Flatten the tree using an Euler tour. A path query (u,v) maps to one or two subarray queries on the Euler tour array. Use in[v] and out[v] timestamps. If u is an ancestor of v, query [in[u],in[v]]. Otherwise, query [out[u],in[v]] and handle the LCA separately.

Key trick: Each node appears twice in the Euler tour (entry and exit). Toggle: first occurrence adds the node, second occurrence removes it.

SPOJ COT2  -- Count on a tree II (distinct colors on path)
CF 375D    -- Tree and Queries (colors with freq >= k on subtree)

Pattern 6 -- Mo with rollback (Undo Mo)

Some operations (like "merge two sets") don't have efficient remove. Mo with rollback handles this: at the start of each block of queries, save a checkpoint. Process queries within the block by only expanding right (add-only). For the left pointer, expand temporarily and rollback after answering.

When to use: DSU-based queries (e.g., number of connected components in a range of edges), where remove is hard but add + undo is possible.

CF 896C -- Willem, Chtholly and Seniorious (chtholly tree, but Mo+rollback also works)
CF 1039D -- You Are Given a Tree (sqrt decomposition + Mo-like approach)

Contest Cheat Sheet

+----------------------------------------------------------------+
|  MO'S ALGORITHM -- QUICK REFERENCE                             |
+----------------------------------------------------------------+
|  WHEN: Offline range queries, static array, O(1) add/remove    |
|  TIME: O((n+q)sqrt(n))    SPACE: O(n + MAXVAL)                 |
+----------------------------------------------------------------+
|  SETUP:                                                        |
|    block = max(1, (int)sqrt(n));                                |
|    sort queries by (l/block, r)  // odd-even: flip r in odd    |
+----------------------------------------------------------------+
|  CORE LOOP:                                                    |
|    int curL=0, curR=-1;                                        |
|    for(auto&[l,r,idx]:queries){                                |
|      while(curR<r) add(a[++curR]);                             |
|      while(curL>l) add(a[--curL]);                             |
|      while(curR>r) rem(a[curR--]);                             |
|      while(curL<l) rem(a[curL++]);                             |
|      ans[idx] = cur_ans;                                       |
|    }                                                           |
+----------------------------------------------------------------+
|  WATCH OUT:                                                    |
|  - Expand BEFORE shrink (avoid invalid empty states)           |
|  - Block size sqrt(n), NOT sqrt(q)                             |
|  - Coordinate compress if vals > 1e6                           |
|  - add/remove MUST be O(1) or you blow the time budget         |
|  - Odd-even sort: (b&1) ? r DESC : r ASC  (~30% faster)       |
|  - Store original query index for output ordering              |
+----------------------------------------------------------------+

Gotchas & Debugging

Block size tuning (advanced constant-factor note). The walkthrough above and the contest template both use B = sqrt(n). That is the right starting point: simple, correct, and asymptotically optimal when nq. The sharper bound is B = ceil(n / sqrt(q)):

  • When nq: B = sqrt(n) and B = n / sqrt(q) agree.
  • When qn: B = n / sqrt(q) reduces total pointer movement and can be 2-3x faster -- the difference between AC and TLE on problems with many fewer queries than array elements.
  • Empirically, try B = max(1, (int)ceil(n / sqrt(q))) if the default is borderline.

This is a tuning move, not the default. Do not strip the simple sqrt(n) template until you understand why the alternative is asymptotically tighter.

1. Block size matters more than you think. Use B=n as default. For problems where qn, use B=n/q for tighter bounds. Wrong block size is the #1 cause of TLE with Mo's algorithm. Experiment: try B = \max(1, (int)(n / \sqrt{q + 0.5))) if default TLEs.

2. Order of expand/shrink operations. The four while-loops must expand the window before shrinking it. If you shrink first, you can end up with curL>curR+1 in a state where add/remove assumptions break. Safe order: expand R, expand L, shrink R, shrink L.

3. Off-by-one in add/remove boundaries. The idiom add(a[++curR]) increments first, then accesses. rem(a[curR--]) accesses first, then decrements. Mixing these up shifts your window by one. Test on a single-element query [i,i] -- it should add exactly a[i].

4. Forgetting to coordinate-compress. If values go up to 109 but there are only n2×105 distinct values, you can't allocate cnt[1000000000]. Compress values to [0,n) first.

5. Not saving the query index. After sorting, queries are in Mo order, not input order. If you forget to store and use idx, your output is scrambled. Always assign queries[i].idx = i before sorting.

6. Incorrect odd-even comparator. The tiebreaker for r must flip direction for odd vs even blocks. If you accidentally use (b1 & 1) ? (r < o.r) : (r > o.r) (inverted), the optimization hurts instead of helping -- right pointer zigzags more.

7. Mo with updates: wrong block size. For 3D Mo (with updates), block size should be B=n2/3, not n. Using n gives O(n7/4) instead of O(n5/3).

8. Global state not reset between test cases. If the problem has multiple test cases, cnt[] and cur_ans must be reset. Prefer memset(cnt, 0, sizeof cnt) or clear only the used entries.

Mental Traps

Trap: "Mo's is slow -- it's O(nn), which is worse than a segment tree." This comparison is wrong because it compares apples to oranges. Mo's handles query types that segment trees can't handle efficiently at all -- like "number of distinct values in a range." The relevant comparison is: Mo's O(nn) vs. naïve O(nq). Mo's wins enormously.

Trap: "Mo's is just sorting queries smartly -- the rest is obvious." The sort is clever but the reason it works requires understanding the amortized analysis. If you just sort without understanding why this particular ordering minimizes pointer movement, you'll make mistakes when the analysis assumptions change (e.g., applying Mo's to a problem with updates, or with a non-O(1) add/remove).

Trap: "I can use Mo's for problems with updates." Standard Mo's requires a static array. For problems with updates, you need "Mo's with updates" (3D Mo), which has block size n2/3 and complexity O(n5/3). Using standard Mo's block size (n) for 3D Mo gives O(n7/4), which is significantly worse. These are different algorithms with different sort criteria.

Trap: "The odd-even optimization is just a constant factor." It roughly halves the right-pointer movement in practice. For problems near the time limit, this is the difference between AC and TLE. It's worth including as a default habit, not treating as optional.

The mistake that teaches

Sum-of-frequency-squares remove delta. I was solving a sum-of-squares-of-frequencies problem with add = answer += 2*cnt[a[pos]] + 1; cnt[a[pos]]++; and naively wrote remove = answer -= 2*cnt[a[pos]] + 1; cnt[a[pos]]--;. But in add, delta is computed before incrementing ((cnt+1)^2 - cnt^2 = 2*cnt + 1). In remove, we need delta before decrementing (cnt^2 - (cnt-1)^2 = 2*cnt - 1). My code subtracted 2*cnt + 1 instead of 2*cnt - 1 -- off by 2 per removal. Invisible on small tests; wrong on large. Lesson: when writing remove, re-derive the delta formula; don't just negate add.

CF 86D Powerful Array (variable shadowing + template dependence). I copy-pasted a Mo's template, stripped the parity optimization "to simplify," and introduced a second bug: I computed B = sqrt(n) but had overwritten n with the query count earlier. With n_array = 200000 and q = 200000 both assigned to n, it passed samples but WA on test 2. Lesson: don't reuse variable names (n_array vs q), and don't strip pieces from templates until you understand why they're there.

Debug Drills

Drill 1: Mo's algorithm with incorrect pointer movement order.

cpp
int curL = 0, curR = -1;
for (auto& [l, r, idx] : queries) {
    while (curR < r) add(++curR);
    while (curL < l) remove(curL++);
    while (curR > r) remove(curR--);
    while (curL > l) add(--curL);
    ans[idx] = answer;
}
What's wrong?

The order of the four while loops matters. Here, curR expands first, then curL shrinks, then curR shrinks, then curL expands. The problem is: if we first expand R and then shrink L, we might temporarily have a very large window, which is fine -- but if we first shrink R before expanding L, we could get curR < curL - 1 (empty window going negative). The canonical safe order is: expand both directions first, then shrink both: while (curL > l) add(--curL); while (curR < r) add(++curR); while (curL < l) remove(curL++); while (curR > r) remove(curR--);. The given order can cause remove to be called when the element isn't in the window, corrupting cnt[].

Drill 2: Wrong block size causes TLE.

cpp
int B = sqrt(n); // n = 100000, so B = 316
sort(queries.begin(), queries.end(), [&](auto& a, auto& b) {
    if (a.l / B != b.l / B) return a.l / B < b.l / B;
    return a.r < b.r;
});
What's wrong?

The code is correct but suboptimal, leading to TLE on tight time limits. The fix is the block-parity optimization: for even-numbered blocks, sort by R ascending; for odd-numbered blocks, sort by R descending. This reduces the total R-pointer movement by roughly half. Fix: return (a.l / B != b.l / B) ? (a.l / B < b.l / B) : ((a.l / B) & 1) ? (a.r > b.r) : (a.r < b.r);

Drill 3: add and remove with unbounded values.

cpp
int answer = 0;
int cnt[MAXVAL] = {};

void add(int pos) {
    if (cnt[a[pos]] == 0) answer++;
    cnt[a[pos]]++;
}

void remove(int pos) {
    cnt[a[pos]]--;
    if (cnt[a[pos]] == 0) answer--;
}
What's wrong?

In add, the check cnt[a[pos]] == 0 happens before incrementing, so it correctly detects a new distinct element. In remove, the decrement happens before the check, so cnt[a[pos]] == 0 correctly detects that the element is gone. This looks correct -- but the bug is subtle: if a[pos] can be negative or exceed MAXVAL, then cnt[a[pos]] is an out-of-bounds access. This commonly happens when the values aren't coordinate-compressed. Fix: compress values to [0, n) before running Mo's, or use an unordered_map instead of a fixed array.

Drill 4: Add/remove asymmetry with initial window state.

cpp
void add(int idx) {
    cnt[a[idx]]++;
    if (cnt[a[idx]] == 1) distinct++;
}
void remove(int idx) {
    if (cnt[a[idx]] == 1) distinct--;
    cnt[a[idx]]--;
}
// Process queries
int curL = 0, curR = -1;
for (auto& q : queries) {
    while (curR < q.r) add(++curR);
    while (curL > q.l) add(--curL);
    while (curR > q.r) remove(curR--);
    while (curL < q.l) remove(curL++);
    ans[q.idx] = distinct;
}
What's wrong?

The order of the four while-loops matters! The code first expands R, then expands L, then shrinks R, then shrinks L. This is correct. However, the bug is subtle: if curL starts at 0 and curR starts at -1, the first expansion adds elements correctly. But consider the remove function -- it checks cnt[a[idx]] == 1 before decrementing. If a[idx] has count 0 (never added), decrementing makes it -1. This happens if curL/curR initialization is wrong relative to the first query. The initialization curL = 0, curR = -1 means the window is empty, which is correct -- but if queries[0] has L > 0, the "shrink L" loop runs remove(0) on an element that was never added. Fix: Expand before shrinking. The given order does expand first, so the real issue is that curL = 0 means element 0 is "inside" the window when curR = -1 implies it's empty. Use curL = 1, curR = 0 or ensure the first operations are always expansions.

Drill 5: Block comparator doesn't handle ties correctly.

cpp
bool cmp(Query& a, Query& b) {
    return a.l / B < b.l / B || a.r < b.r;
}
What's wrong?

This comparator is not a valid strict weak ordering. When a.l / B == b.l / B, it falls through to a.r < b.r, which is correct. But when a.l / B < b.l / B, it returns true regardless of R -- that's fine. The problem is when a.l / B > b.l / B: the first condition is false, so it evaluates a.r < b.r, returning true if a.r < b.r even though a's block is after b's block. Fix:

cpp
if (a.l / B != b.l / B) return a.l / B < b.l / B;
return a.r < b.r;

The original can cause cmp(a,b) == true && cmp(b,a) == true simultaneously, violating strict weak ordering and causing undefined behavior in std::sort.

Drill 6: Mo's with wrong block size.

cpp
int n, q;
cin >> n >> q;
int B = sqrt(q);  // should be sqrt(n)
// ... read array and queries ...
sort(queries.begin(), queries.end(), [&](auto& a, auto& b) {
    if (a.l / B != b.l / B) return a.l / B < b.l / B;
    return a.r < b.r;
});
What's wrong?

Block size should be based on the array size n, not the number of queries q. With B = sqrt(q): if q is much smaller than n (say n = 200000, q = 100), then B ~= 10, creating 20000 blocks. The L pointer moves O(B) = O(10) per query, but R moves O(n) per block transition, totaling O(n * n/B) = O(n * 20000) which is far worse than intended. The standard formula B = sqrt(n) ensures total moves are O((n + q) * sqrt(n)). For the optimal block size when n and q differ significantly, use B = max(1, (int)(n / sqrt(q))).

Debugging checklist:

  • Print curL, curR, cur_ans after each query. Does the window match [l,r]?
  • Test on a single query [0,n1] -- should give the answer for the whole array.
  • Test on n=1,q=1.
  • Verify block size: cerr << "block=" << block << "\n";

Self-Test

Without opening the implementation above, verify you can reproduce these from memory:

  • [ ] Write the Mo's query comparator (the operator< in the Query struct) from memory, including the odd-even block optimization for the right-pointer tiebreak.
  • [ ] Write the main processing loop -- four while-loops in the correct expand-before-shrink order with correct pre/post increment pointer arithmetic (++curR, --curL, curR--, curL++).
  • [ ] Explain the amortized cost analysis in 3 sentences: why does the right pointer move O(nn) total? Why does the left pointer move O(qn) total?
  • [ ] Design add(val) and rem(val) for a new problem: "count the number of pairs (i,j) in the window with a[i]=a[j]." (Hint: this is the sum-of-squares-of-frequencies variant -- the delta when frequency goes from c to c+1 is +2c.)
  • [ ] State what changes for Mo's with updates (3D Mo): which new pointer is added, what block size to use (n2/3), and what the resulting complexity is (O(n5/3)).
  • [ ] Given a problem statement, identify whether Mo's is applicable: check for offline queries, static array, and O(1) add/remove. List three problem types where Mo's does NOT apply and name the alternative technique for each.
  • [ ] Explain why expanding the window before shrinking is necessary -- what invalid state can occur if you shrink first, and how does it break the add/remove invariant?

Practice Problems

Rating Progression

CF RatingWhat to Master
1200Understand the brute-force "two-pointer window" for range queries; recognize when all queries are offline and the answer depends on frequency counts.
1500Implement standard Mo's: sort by (l/B, r) with block size B = √n; maintain cnt[] array and a running answer; handle add/remove correctly. Output answers by original index.
1800Mo's with the parity optimization (even blocks sort R ascending, odd blocks sort R descending); Mo's on trees via Euler tour flattening. Mo's with rollback for non-invertible operations.
2100Mo's with updates (3D Mo's with timestamps); choosing optimal block size B ~= n^(2/3) for update variants; combining Mo's with bitset or other DS for complex queries; tuning block size for specific Q/N ratios.
#ProblemSourceDifficultyKey Idea
1D-querySPOJEasyTextbook distinct count with Mo's
2Powerful ArrayCF 86DEasy-MedSum of cnts2s; practice add/remove with square updates
3Tree and QueriesCF 375DMediumMo on subtrees, count colors with freq k
4XOR and Favorite NumberCF 617EMediumMo + prefix XOR, count pairs with XOR = k
5Lomsat gelralCF 600EMediumDSU on tree (alternative to Mo), dominant color per subtree
6Machine LearningCF 940FMedium-HardMo with updates (3D Mo), mex of frequency array
7Count on a Tree IISPOJMedium-HardMo on tree paths via Euler tour
8Arpa's letter-marked treeCF 741DHardDSU on tree / Mo on tree, bitmask of character parities
9Array QueriesCF 797EHardSqrt decomposition + Mo-like approach for jump queries
10DestinyCF 840DHardRandomized or Mo-based approach for frequent element in range
11Distinct Values QueriesCSES1700Count distinct values in range -- classic Mo's application

Further Reading

See also:

  • 02-sqrt-decomposition.md -- block decomposition fundamentals
  • 01-offline-queries.md -- general offline query techniques
  • 01-euler-tour-and-lca.md -- needed for Mo on trees
  • 04-cdq-divide-and-conquer.md -- offline divide and conquer for multi-dimensional problems
  • 05-parallel-binary-search.md -- batch multiple binary searches over the same timeline

Reconstruct It

Setup: You have an array of n elements and Q offline range queries, each asking for some aggregate (e.g., count of distinct values) over [li,ri]. A naive approach processes each query independently in O(n), giving O(NQ) total -- too slow.

What if you could reorder the queries to minimize total work?

Constraint hint: Imagine maintaining a sliding window [curL,curR] and adding/removing elements one at a time. If consecutive queries have similar l and r values, the window moves very little. The question is: what ordering minimizes total pointer movement?

Step 1 -- Block decomposition on l. Divide the array into blocks of size Bn. Group queries by which block their l falls into. Within each block, sort queries by r. This way, l moves at most O(B) between consecutive queries in the same block, and r moves monotonically within each block.

Step 2 -- Count the total movement. There are n/B blocks. Within each block, r sweeps from left to right: O(n) movement for r per block, so O(nn/B) total for r. For l: it moves at most O(B) per query, so O(QB) total. Setting B=n balances both terms to O((N+Q)N).

Step 3 -- Maintain the answer incrementally. You need add(x) and remove(x) functions that update the current answer when the window expands or shrinks by one element. For "count of distinct values": add increments a frequency counter and increases the distinct count if the frequency goes from 0 to 1; remove decrements and decreases if frequency goes from 1 to 0.

Full derivation

The algorithm (Mo's):

  1. Choose block size B=n.
  2. For each query (li,ri), compute block_i = l_i / B.
  3. Sort queries by (block_i, r_i) -- with the optimization that for odd-numbered blocks, sort r in decreasing order (reduces total r-movement by ~50%).
  4. Initialize curL = 0, curR = -1 (empty window).
  5. For each query in sorted order:
    • Expand/shrink the window to [li,ri] by calling add/remove one element at a time.
    • Store the current answer for this query.
  6. Output answers in original query order.

Why it works: By grouping l-values into n-sized blocks, we ensure l never jumps more than O(n) between queries. Within each block, r is sorted, so it moves at most O(n) total per block. Total pointer movement: O(Qn+nn)=O((N+Q)N). This is optimal for comparison-based offline range queries with O(1) add/remove.


Solve With Me -- CF 86D "Powerful Array"

Problem: array of n integers, t queries. Each query gives [l,r] and asks for scnts2s where cnts is how many times value s appears in a[l..r]. Constraints: n,t2×105, values up to 106.

Let me try the naive approach first: for each query, iterate over [l,r], count frequencies, compute the sum. That's O(n) per query, O(nt)4×1010. Way too slow.

Can I precompute something? Prefix sums of... what? The answer depends on frequencies in the range, which don't decompose nicely into prefix sums. A segment tree could work if I merged frequency maps, but merging two frequency maps is O(distinct values) -- that's worse than brute force.

Hmm. Let me think differently. What if I could incrementally maintain the answer -- adding or removing one element at a time? If I add element x to the current window, the count of x goes from c to c+1, and the contribution changes from c2x to (c+1)2x, a delta of (2c+1)x. Similarly for removal: delta is (2c1)x. Both O(1)!

So I can maintain the answer with O(1) add/remove. Now I just need an order to process the queries that minimizes total pointer movement. That's exactly Mo's algorithm: sort queries by (l/n,r), then sweep two pointers.

Block size: n450. Within each block, l moves at most O(n) per query, and r is sorted so it moves O(n) total per block. With O(n) blocks, total movement is O((n+t)n)2×1054509×107. That fits in 5 seconds.

One thing that tripped me up: the Mo's block size formula. I initially used sqrt(n) but read somewhere that sqrt(n * 2 / 3) can be better. Tried both, didn't matter much for this problem. I also forgot to allocate the cnt[] array large enough -- values go up to 106, not n. Segfault on test 4 until I fixed that.

Another gotcha: the comparison function for sorting queries. Within the same block, if the block index is even I sort r ascending, if odd I sort r descending (the "zig-zag" optimization). This avoids r jumping back to zero at block boundaries. Shaved about 30% off runtime.

Final solution: ~60 lines of C++. Mo's with O(1) transitions. AC in 3.8 seconds out of 5.

What to recognize next time: When you see offline range queries and the answer can be maintained with O(1) element add/remove, it's Mo's. The "powerful" contribution formula looks scary but the delta is just algebra -- always expand (c±1)2c2 before panicking. And size your frequency array to the value range, not n.

Built for the climb from Codeforces 1100 to 2100.