Mo's Algorithm

Quick Revisit
USE WHEN: offline range queries on a static array where add/remove one element is O(1)
INVARIANT: queries sorted by (⌊l/B⌋, r) with B = √n; total pointer movement O((n+q)√n)
TIME: O((n+q)√n) | SPACE: O(n + MAXVAL)
INVALID WHEN: array has updates (use 3D Mo); add/remove is not O(1) or not reversible; query answer cannot be maintained incrementally
TUNING: when q is much smaller than n, B = max(1, n/√q) is asymptotically tighter; see "block size tuning" below
PRACTICE: 08-ladder-mixed

Offline range-query technique that answers $q$ queries on a static array in $O ((n + q) \sqrt{n})$ by sorting queries intelligently and maintaining a sliding window. A staple for Div 2 D/E on Codeforces when updates aren't required and online isn't forced.

ID: AL-40 Difficulty: [Advanced]Prerequisites: Sqrt Decomposition, Offline QueriesCF Rating: 1800-2100

Contest Frequency: * Specialized -- appears in offline range query problems

2. Intuition
3. C++ STL Reference
4. Implementation (Contest-Ready)
5. Operations Reference
6. Problem Patterns & Tricks
7. Contest Cheat Sheet
8. Gotchas & Debugging
9. Self-Test
10. Practice Problems
11. Further Reading

Intuition

2.1 The Pain

Consider $a = [3, 1, 4, 1, 5, 9, 2, 6]$ ( $n = 8$ ) and six queries asking "how many distinct values in $[l, r]$ ?":

  Q0: [0, 3]   Q1: [2, 7]   Q2: [1, 4]
  Q3: [0, 7]   Q4: [5, 7]   Q5: [3, 6]

Naive approach: for each query, scan the subarray and count distinct values. Worst case per query is $O (n)$ , total $O (n \cdot q)$ . For $n, q = 10^{5}$ that is $10^{10}$ operations -- hopelessly slow.

2.2 The Key Insight

Sort queries by $(⌊ l / \sqrt{n} ⌋, r)$ so that consecutive queries have nearly overlapping ranges -- then maintain the answer by adding/removing one element at a time, with total pointer movement $O ((n + q) \sqrt{n})$ .

Think of a lazy student who has to answer a list of homework questions, each about a different range of the textbook. Instead of answering them in the order given -- flipping wildly between chapters -- the student rearranges the questions so each one is close to the previous one, minimizing page-turning. That's Mo's algorithm: sort queries so that consecutive queries overlap as much as possible, making the total pointer movement minimal.

Concretely, think of two pointers, $c u r L$ and $c u r R$ , that define the "current window" over the array. To transition from one query to the next, the pointers walk left or right, adding or removing elements as they go. If queries were in random order the pointers would jump wildly. Mo's sorting guarantees:

Within one block of $l$ -values, $r$ is sorted, so the right pointer sweeps mostly forward -- $O (n)$ per block, $O (n \sqrt{n})$ total.
The left pointer stays inside a block of width $\sqrt{n}$ , moving at most $\sqrt{n}$ per transition -- $O (q \sqrt{n})$ total.

  Block structure (block_size = sqrt(n)):         n = 16, block_size = 4
  
  +-------+-------+-------+-------+
  | blk 0 | blk 1 | blk 2 | blk 3 |
  +-------+-------+-------+-------+
  |0 1 2 3|4 5 6 7|8 9 A B|C D E F|
  +-------+-------+-------+-------+
       ^                       ^
       |                       |
    l/B = 0               l/B = 3
  
  Queries sorted by: (l / block_size, R)
  Even blocks: R ascending   -->-->-->
  Odd  blocks: R descending  <--<--<--
  
  This zigzag pattern minimizes total R-pointer movement.

2.3 Visual Walkthrough

Array: $a = [3, 1, 4, 1, 5, 9, 2, 6]$ , block size $B = ⌈ \sqrt{8} ⌉ = 3$ .

Step 0 -- Sort queries by $(⌊ l / 3 ⌋, r)$ :

  Original            Block of l    Sorted (by block, then r)
  Q0: [0,3]  blk 0    Q0: [0,3]  blk 0, r=3
  Q1: [2,7]  blk 0    Q2: [1,4]  blk 0, r=4
  Q2: [1,4]  blk 0    Q1: [2,7]  blk 0, r=7
  Q3: [0,7]  blk 0    Q3: [0,7]  blk 0, r=7
  Q4: [5,7]  blk 1    Q5: [3,6]  blk 1, r=6
  Q5: [3,6]  blk 1    Q4: [5,7]  blk 1, r=7

Step 1 -- Process Q0 [0, 3]. Start with empty window ( $c u r L = 0, c u r R = - 1$ ). Add indices 0..3:

  a:  [ 3  1  4  1  5  9  2  6 ]
       *  *  *  *  .  .  .  .
       ^        ^
      curL    curR
  cnt: {3:1, 1:2, 4:1}   distinct = 3

Step 2 -- Process Q2 [1, 4]. Move $c u r L$ from 0 to 1 (remove $a [0] = 3$ ), move $c u r R$ from 3 to 4 (add $a [4] = 5$ ):

  a:  [ 3  1  4  1  5  9  2  6 ]
       .  *  *  *  *  .  .  .
          ^        ^
        curL     curR
  cnt: {1:2, 4:1, 5:1}   distinct = 3
  Pointer moves: L +1, R +1  (total so far: 2)

Step 3 -- Process Q1 [2, 7]. Move $c u r L$ from 1 to 2 (remove $a [1] = 1$ ), move $c u r R$ from 4 to 7 (add $a [5] = 9, a [6] = 2, a [7] = 6$ ):

  a:  [ 3  1  4  1  5  9  2  6 ]
       .  .  *  *  *  *  *  *
             ^              ^
           curL           curR
  cnt: {4:1, 1:1, 5:1, 9:1, 2:1, 6:1}   distinct = 6
  Pointer moves: L +1, R +3  (total so far: 6)

Step 4 -- Process Q3 [0, 7]. Move $c u r L$ from 2 to 0 (add $a [1] = 1, a [0] = 3$ ), $c u r R$ stays:

  a:  [ 3  1  4  1  5  9  2  6 ]
       *  *  *  *  *  *  *  *
       ^                    ^
      curL                curR
  cnt: {3:1, 1:2, 4:1, 5:1, 9:1, 2:1, 6:1}   distinct = 7
  Pointer moves: L +2, R +0  (total so far: 8)

Remaining queries Q5 and Q4 (block 1) follow the same pattern. Total pointer movement stays $O ((n + q) \sqrt{n})$ -- here roughly 16 moves instead of the naive ~48.

  Mo's pointer movement on sorted queries:

  Array indices:  0   1   2   3   4   5   6   7   8   9
  Blocks:        |--blk 0--|--blk 1--|--blk 2--|--blk 3-|
                     B = 3

  Sorted query order (within block 0):
  Q0: L=0  R=3   L--*--*--R
  Q2: L=1  R=4   .L--*--*--R           R moves +1 -->
  Q1: L=2  R=7   ..L--*--*--*--*--R    R moves +3 --->
  Q3: L=0  R=7   L--*--*--*--*--*--R   R stays, L moves -2 <--
                  ^                ^
                  |                |
               L moves at        R sweeps
              most B=3 per      monotonically
               transition       within block

  Within same block: L moves O(sqrt(n)), R moves monotonically
  Across all blocks: total = O((n + q) * sqrt(n))

Worked Example: Mo's Algorithm -- 5 Queries on Array [2, 0, 1, 3, 2, 1, 4, 2]

Query: count of distinct elements. n = 8, block size B = ⌈√8⌉ = 3.

  Queries (0-indexed, inclusive):
    Q0:[1,5]  Q1:[0,3]  Q2:[4,7]  Q3:[2,6]  Q4:[0,7]

  Step 1 -- Sort by (⌊L/B⌋, R):
  | Query | L | R | Block | Sorted position |
  |-------|---|---|-------|-----------------|
  |  Q1   | 0 | 3 |   0   |       1st       |
  |  Q0   | 1 | 5 |   0   |       2nd       |
  |  Q3   | 2 | 6 |   0   |       3rd       |
  |  Q4   | 0 | 7 |   0   |       4th       |
  |  Q2   | 4 | 7 |   1   |       5th       |

  Step 2 -- Process in sorted order, tracking [curL, curR]:
  Array: [2, 0, 1, 3, 2, 1, 4, 2]

  | Order | Query | [curL,curR]  | L moves  | R moves | adds | removes | answer |
  |-------|-------|-------------|----------|---------|------|---------|--------|
  |   --   |  init | [0, -1]     |    --     |    --    |   --  |    --    |   --    |
  |   1   |  Q1   | [0, 3]      | 0->0 = 0 | -1->3=+4|   4  |    0    |   4    |
  |   2   |  Q0   | [1, 5]      | 0->1 = +1| 3->5 =+2|   2  |    1    |   4    |
  |   3   |  Q3   | [2, 6]      | 1->2 = +1| 5->6 =+1|   1  |    1    |   4    |
  |   4   |  Q4   | [0, 7]      | 2->0 = -2| 6->7 =+1|   3  |    0    |   5    |
  |   5   |  Q2   | [4, 7]      | 0->4 = +4| 7->7 = 0|   0  |    4    |   4    |

  Totals: 8 L-moves + 8 R-moves = 16 add/remove ops
  Naive (independent per query): 4+6+4+5+8 = 27 ops
  Mo's order saves ~40% here; gap widens with more queries.

2.4 The Invariant

+------------------------------------------------------------------+
| INVARIANT: Between queries the data structure reflects the exact  |
| answer for the range [curL, curR].                                |
|                                                                   |
| - Each add(val) / remove(val) runs in O(1)                       |
|   (or O(log n) with augmented structures).                        |
| - Total pointer movement across ALL queries: O((n+q) * sqrt(n)). |
+------------------------------------------------------------------+

Why $O (1)$ per element? For distinct count, maintain cnt[val]. add: increment cnt[val]; if it goes $0 \to 1$ , increment distinct. remove: decrement cnt[val]; if it goes $1 \to 0$ , decrement distinct.

2.5 When to Reach for This

Trigger phrases in problem statements:

"offline range queries" -- queries given upfront, answers output at the end.
"no updates" / "static array" -- the array does not change.
"distinct count in range", "frequency of values in subarray".
"queries can be reordered" -- explicitly or implicitly.

Counter-examples (when NOT to use Mo's): these are recognition negatives -- if any one applies, Mo's is the wrong tool, and noticing the negative is as important as noticing the positive trigger phrases above.

Situation	Why not Mo's	Use instead
Array has point updates between queries	add/remove invariant breaks; the array no longer reflects "the same window" across queries	Mo with updates / 3D Mo ( $O (n^{5 / 3})$ , more complex), or segment tree
Online required (next query depends on previous answer, XOR decryption, etc.)	Cannot reorder queries; midpoints cannot be batched	Persistent segment tree, merge-sort tree
add/remove is not $O (1)$ -- e.g., requires recomputing a non-invertible aggregate	Total becomes $O ((n + q) \sqrt{n} \cdot f)$ ; often TLE for any non-trivial $f$	Segment tree with merge, offline divide-and-conquer, or Mo with rollback if remove is the only expensive side
Query answer cannot be maintained incrementally (e.g., median of window)	Even with O(1) data structure ops, recomputing the answer per pointer move is $Ω (\log n)$	Wavelet tree, persistent structures
$n, q > 2 \times 10^{5}$ and TL is tight	$\sqrt{n}$ constant factor can be large	Wavelet tree, offline D&C

What the Code Won't Teach You

Recognizing Mo-solvable problems. The key condition: the problem is offline, the array is static (no updates), and there exists an efficient (ideally $O (1)$ ) way to add or remove a single element from the "current answer." Distinct count, sum, XOR, frequency-based functions -- all satisfy this. Problems that don't: anything requiring you to "undo" a non-invertible operation (use Mo with rollback in that case).

The add/remove function is the design problem. The template gives add/remove for distinct count. In contests, you need to design add/remove for whatever the problem asks. This design question is often harder than Mo's itself. Examples:

"Sum of frequencies squared" -> maintain a counter-of-counters
"XOR of all elements" -> XOR is its own inverse, add = remove
"k-th most frequent" -> harder, might need a sorted structure or secondary data structure

The initialization. Mo's starts with an empty window (curL = 0, curR = -1). This means your first query expands from empty. Your add/remove functions must handle adding to an empty state correctly -- test that add works when cnt[] is all zeros and cur_ans is 0.

How This Evolved

The most straightforward answer to "how many distinct values in $[l, r]$ ?" is brute force: for each query, scan the subarray and count. That's $O (n)$ per query, $O (n q)$ total -- fine for small inputs, disastrous at scale. If the array is static and queries can be answered offline, can we do better by choosing a smarter order?

The critical observation is that if you already know the answer for $[l, r]$ , extending to $[l, r + 1]$ or $[l - 1, r]$ costs just $O (1)$ -- add one element, update the count. So the real cost of answering all queries is proportional to how far the two pointers travel in total. Random query order means wild pointer jumps. Mo's insight is to sort queries by $(⌊ l / \sqrt{n} ⌋, r)$ , grouping them into $\sqrt{n}$ blocks by their left endpoint. Within each block, the left pointer moves at most $O (\sqrt{n})$ per transition, and the right pointer sweeps monotonically across the array -- $O (n)$ per block. Summing over all blocks gives $O (n \sqrt{n})$ for the right pointer and $O (q \sqrt{n})$ for the left, yielding $O ((n + q) \sqrt{n})$ total.

Later refinements sharpened this further. The zigzag optimization (alternating the sort direction of $r$ in even/odd blocks) roughly halves the right-pointer movement in practice. Mo with rollback extends the technique to problems where the remove operation is expensive or impossible, by only adding elements and rolling back to a checkpoint. And Mo on trees adapts the block-decomposition idea to Euler-tour flattened trees, answering path queries offline. Each variant builds on the same core principle: control pointer movement by sorting queries intelligently.

C++ STL Reference

Function / Class	Header	What it does	Time
`sort(v.begin(), v.end(), cmp)`	`<algorithm>`	Sort queries by Mo ordering	$O (q \log q)$
`vector<int> cnt(MAXVAL)`	`<vector>`	Frequency array for add/remove	$O (1)$ per access
`__builtin_clz(x)`	built-in	Bit tricks for block computation	$O (1)$
`swap(a, b)`	`<utility>`	Swap elements in add/remove	$O (1)$

Key point: Mo's algorithm uses almost no STL beyond sorting. The core is raw array manipulation with cnt[] for $O (1)$ add/remove. Avoid map/set inside add/remove -- the $O (\log n)$ factor per operation turns the total complexity into $O ((n + q) \sqrt{n} \log n)$ , which often TLEs.

Implementation (Contest-Ready)

Problem: Given an array of $n$ integers and $q$ queries $[l, r]$ , for each query output the number of distinct values in $a [l . . r]$ (0-indexed).

Pointer Movement Order Matters

When moving from query [curL, curR] to [L, R], expand BEFORE shrinking:

while (curR < R) add(a[++curR]); -- expand right
while (curL > L) add(a[--curL]); -- expand left
while (curR > R) remove(a[curR--]); -- shrink right
while (curL < L) remove(a[curL++]); -- shrink left

Why this order? Expanding first ensures the window always covers a valid range. If you shrink first, you might temporarily have curL > curR (an invalid/empty window), which can cause index-out-of-bounds or wrong state in your add/remove functions.

Pre-increment vs post-increment: add(a[++curR]) adds element THEN moves pointer. add(a[curR++]) moves pointer THEN adds -- off-by-one. The ++curR / --curL pattern for expansion and curR-- / curL++ for shrinking is the safe convention.

  Extending / shrinking the current window [L..R]:

  Current:     [  L . . . . . . . R  ]
                  ^               ^
                curL            curR

  Extend R:    [  L . . . . . . . R--->R+1 ]   add(arr[++curR])
  Extend L:    [ L-1<---L . . . . . . . R  ]   add(arr[--curL])
  Shrink R:    [  L . . . . . . R-1 ]<--R       rem(arr[curR--])
  Shrink L:    L-->[ L+1 . . . . . . . R   ]   rem(arr[curL++])

  +----------------------------------------------------------+
  | ORDER MATTERS: always expand before shrink!              |
  |                                                          |
  |  SAFE:   expand R --> expand L --> shrink R --> shrink L  |
  |  UNSAFE: shrink first --> window may become empty/invalid |
  +----------------------------------------------------------+

Version 1 -- Minimal contest template

cpp

#include <bits/stdc++.h>
using namespace std;

int block;

struct Query {
    int l, r, idx;
    bool operator<(const Query& o) const {
        int b1 = l / block, b2 = o.l / block;
        if (b1 != b2) return b1 < b2;
        return (b1 & 1) ? (r > o.r) : (r < o.r); // odd block: r desc, even block: r asc
    }
};

int cnt[1000001];
int cur_ans;

void add(int val) { if (++cnt[val] == 1) cur_ans++; }
void rem(int val) { if (--cnt[val] == 0) cur_ans--; }

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q;
    cin >> n >> q;
    vector<int> a(n);
    for (auto& x : a) cin >> x;

    block = max(1, (int)sqrt(n));
    vector<Query> queries(q);
    for (int i = 0; i < q; i++) {
        cin >> queries[i].l >> queries[i].r;
        queries[i].idx = i;
    }
    sort(queries.begin(), queries.end());

    vector<int> ans(q);
    int curL = 0, curR = -1;
    cur_ans = 0;

    for (auto& [l, r, idx] : queries) {
        while (curR < r) add(a[++curR]);
        while (curL > l) add(a[--curL]);
        while (curR > r) rem(a[curR--]);
        while (curL < l) rem(a[curL++]);
        ans[idx] = cur_ans;
    }

    for (int i = 0; i < q; i++)
        cout << ans[i] << "\n";
}

Version 2 -- Explained version

cpp

#include <bits/stdc++.h>
using namespace std;

// Block size for Mo's ordering. Set to sqrt(n).
int block;

struct Query {
    int l, r, idx;
    bool operator<(const Query& o) const {
        int b1 = l / block, b2 = o.l / block;
        if (b1 != b2) return b1 < b2;
        // Odd-even trick: zigzag R between blocks to minimize pointer movement.
        //   even blocks (b1 & 1 == 0): sort r ascending  (r < o.r)
        //   odd  blocks (b1 & 1 == 1): sort r descending (r > o.r)
        // This reduces total right-pointer movement by ~30-40%.
        return (b1 & 1) ? (r > o.r) : (r < o.r);
    }
};

// cnt[v] = how many times value v appears in the current window [curL, curR].
// cur_ans = number of distinct values (cnt[v] > 0) in the window.
int cnt[1000001];
int cur_ans;

// Include value val into the window.
void add(int val) {
    cnt[val]++;
    if (cnt[val] == 1) cur_ans++; // first occurrence -> new distinct value
}

// Exclude value val from the window.
void rem(int val) {
    cnt[val]--;
    if (cnt[val] == 0) cur_ans--; // last occurrence removed -> lost a distinct value
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, q;
    cin >> n >> q;
    vector<int> a(n);
    for (auto& x : a) cin >> x;

    // Block size = sqrt(n). The max(1, ...) guards against n=0.
    block = max(1, (int)sqrt(n));

    vector<Query> queries(q);
    for (int i = 0; i < q; i++) {
        cin >> queries[i].l >> queries[i].r;
        queries[i].idx = i; // remember original order for output
    }

    // Sort all queries by Mo's ordering.
    sort(queries.begin(), queries.end());

    vector<int> ans(q);
    // Current window is [curL, curR]. Start empty: curL=0, curR=-1.
    int curL = 0, curR = -1;
    cur_ans = 0;

    for (auto& [l, r, idx] : queries) {
        // Expand/shrink window to match [l, r].
        // ORDER MATTERS: expand before shrinking to avoid
        // the window becoming invalid (curL > curR+1 with wrong state).
        //
        // Safe order: expand R, expand L, shrink R, shrink L.
        // This ensures the window always contains a valid range.
        while (curR < r) add(a[++curR]);   // expand right
        while (curL > l) add(a[--curL]);   // expand left
        while (curR > r) rem(a[curR--]);   // shrink right
        while (curL < l) rem(a[curL++]);   // shrink left

        ans[idx] = cur_ans;
    }

    // Output answers in original query order.
    for (int i = 0; i < q; i++)
        cout << ans[i] << "\n";
}

Operations Reference

Operation	Time	Space
Sort queries (Mo order)	$O (q \log q)$	$O (q)$
Process all queries (total)	$O ((n + q) \sqrt{n})$	$O (n)$
Single add/remove (distinct count)	$O (1)$	-
Single add/remove (with `map`/`set`)	$O (\log n)$	-
Mo with updates (3D Mo)	$O (n^{5 / 3})$	$O (n)$
Mo on trees (Euler tour)	$O ((n + q) \sqrt{n})$	$O (n)$

Notes:

The $O ((n + q) \sqrt{n})$ bound assumes $B = \sqrt{n}$ . When $n$ and $q$ differ significantly, optimal block size is $B = n / \sqrt{q}$ , giving $O (n \sqrt{q})$ .
With the odd-even optimization, the constant factor roughly halves.
If add/remove is $O (\log n)$ , total becomes $O ((n + q) \sqrt{n} \log n)$ -- often too slow for $n, q = 2 \times 10^{5}$ under 2s. Try to make add/remove $O (1)$ .
Space is dominated by the frequency array. If values are up to $10^{6}$ , that's 4 MB for int cnt[]. Coordinate-compress if values are up to $10^{9}$ .

🔍 Why This Works -- Block Size √n Minimizes Total Movement

Mo's algorithm answers offline range queries by sorting them cleverly and moving two pointers (left, right) to adjust the current range.

Why √n blocks? Split the array into blocks of size $B$ . Sort queries by (block of left endpoint, then right endpoint).

Right pointer: Within a block of left endpoints, right moves monotonically. Across $n / B$ blocks, right travels $O (n)$ per block -> total $O (n \cdot n / B)$ .
Left pointer: Between consecutive queries in the same block, left moves $O (B)$ . Over $q$ queries, left travels $O (q \cdot B)$ .
Total: $O (n^{2} / B + q B)$ . Setting $B = \sqrt{n}$ balances both terms to $O ((n + q) \sqrt{n})$ .

📐 Where Does the Complexity Come From? -- O((n+q)√n) Derivation

Setup: Array of size $n$ , $q$ queries, block size $B$ .

Right pointer movement: Within each block of left endpoints, right moves monotonically (we sort by right within a block). Each block causes right to sweep at most $O (n)$ . There are $n / B$ blocks -> right moves $O (n \cdot n / B) = O (n^{2} / B)$ total.

Left pointer movement: Between consecutive queries in the same block, left moves at most $O (B)$ (both lefts are in the same block). Between blocks, left moves at most $O (B)$ too. Over $q$ queries -> left moves $O (q \cdot B)$ total.

Total: $O (n^{2} / B + q B)$ . Taking the derivative w.r.t. $B$ and setting to zero: $- n^{2} / B^{2} + q = 0 \Rightarrow B = n / \sqrt{q}$ . For $q \approx n$ , this gives $B = \sqrt{n}$ and total $O ((n + q) \sqrt{n})$ .

Problem Patterns & Tricks

Pattern 1 -- Distinct element count in range

The classic Mo application. Maintain cnt[val]; increment distinct when cnt goes $0 \to 1$ , decrement when $1 \to 0$ .

Recognize it: "How many distinct values in $[l, r]$ ?" on a static array with offline queries.

CF 86D  -- Powerful Array (sum of cnt[x]^2 * x)
SPOJ DQUERY -- D-query (exact distinct count)

Pattern 2 -- Frequency-based aggregate queries

Queries ask for $\sum f (c n t_{x})$ over all values $x$ in $[l, r]$ , where $f$ is some function of frequency. Common variants: sum of squares of frequencies, count of values appearing exactly $k$ times.

Trick: Maintain a secondary counter freq_cnt[k] = number of values with frequency $k$ . Update both cnt[] and freq_cnt[] in add/remove.

cpp

void add(int val) {
    freq_cnt[cnt[val]]--;
    cnt[val]++;
    freq_cnt[cnt[val]]++;
    cur_ans += 2LL * cnt[val] - 1; // for sum-of-squares: delta is (c+1)^2 - c^2
}

CF 86D  -- Powerful Array (sum of cnt[x]^2 * x)
CF 375D -- Tree and Queries (count of colors with freq >= k)

Pattern 3 -- XOR / bitwise queries on ranges

Query: XOR of all elements in $[l, r]$ , or count of pairs with XOR property. XOR is its own inverse, so add and remove are the same operation.

cpp

void add(int val) { cur_xor ^= val; }
void rem(int val) { cur_xor ^= val; } // same as add!

CF 617E -- XOR and Favorite Number (count pairs with XOR = k)

Pattern 4 -- Mo with updates (Mo's algorithm with modifications)

Handle point updates by adding a time dimension. Each query becomes $(l, r, t)$ where $t$ is the number of updates before this query. Block size $B = n^{2 / 3}$ . Sort by $(l / B, r / B, t)$ . Three pointers: left, right, time. Total: $O (n^{5 / 3})$ .

cpp

struct Query {
    int l, r, t, idx;
    bool operator<(const Query& o) const {
        int b1 = l / block, b2 = o.l / block;
        if (b1 != b2) return b1 < b2;
        int c1 = r / block, c2 = o.r / block;
        if (c1 != c2) return c1 < c2;
        return t < o.t;
    }
};

Apply/undo updates as the time pointer moves forward/backward.

CF 940F  -- Machine Learning (Mo with updates, count of mex of frequencies)
CF 1476F -- Lanterns (offline with updates variant)

Pattern 5 -- Mo on trees

Flatten the tree using an Euler tour. A path query $(u, v)$ maps to one or two subarray queries on the Euler tour array. Use in[v] and out[v] timestamps. If $u$ is an ancestor of $v$ , query $[i n [u], i n [v]]$ . Otherwise, query $[o u t [u], i n [v]]$ and handle the LCA separately.

Key trick: Each node appears twice in the Euler tour (entry and exit). Toggle: first occurrence adds the node, second occurrence removes it.

SPOJ COT2  -- Count on a tree II (distinct colors on path)
CF 375D    -- Tree and Queries (colors with freq >= k on subtree)

Pattern 6 -- Mo with rollback (Undo Mo)

Some operations (like "merge two sets") don't have efficient remove. Mo with rollback handles this: at the start of each block of queries, save a checkpoint. Process queries within the block by only expanding right (add-only). For the left pointer, expand temporarily and rollback after answering.

When to use: DSU-based queries (e.g., number of connected components in a range of edges), where remove is hard but add + undo is possible.

CF 896C -- Willem, Chtholly and Seniorious (chtholly tree, but Mo+rollback also works)
CF 1039D -- You Are Given a Tree (sqrt decomposition + Mo-like approach)

Contest Cheat Sheet

+----------------------------------------------------------------+
|  MO'S ALGORITHM -- QUICK REFERENCE                             |
+----------------------------------------------------------------+
|  WHEN: Offline range queries, static array, O(1) add/remove    |
|  TIME: O((n+q)sqrt(n))    SPACE: O(n + MAXVAL)                 |
+----------------------------------------------------------------+
|  SETUP:                                                        |
|    block = max(1, (int)sqrt(n));                                |
|    sort queries by (l/block, r)  // odd-even: flip r in odd    |
+----------------------------------------------------------------+
|  CORE LOOP:                                                    |
|    int curL=0, curR=-1;                                        |
|    for(auto&[l,r,idx]:queries){                                |
|      while(curR<r) add(a[++curR]);                             |
|      while(curL>l) add(a[--curL]);                             |
|      while(curR>r) rem(a[curR--]);                             |
|      while(curL<l) rem(a[curL++]);                             |
|      ans[idx] = cur_ans;                                       |
|    }                                                           |
+----------------------------------------------------------------+
|  WATCH OUT:                                                    |
|  - Expand BEFORE shrink (avoid invalid empty states)           |
|  - Block size sqrt(n), NOT sqrt(q)                             |
|  - Coordinate compress if vals > 1e6                           |
|  - add/remove MUST be O(1) or you blow the time budget         |
|  - Odd-even sort: (b&1) ? r DESC : r ASC  (~30% faster)       |
|  - Store original query index for output ordering              |
+----------------------------------------------------------------+

Gotchas & Debugging

Block size tuning (advanced constant-factor note). The walkthrough above and the contest template both use B = sqrt(n). That is the right starting point: simple, correct, and asymptotically optimal when $n \approx q$ . The sharper bound is B = ceil(n / sqrt(q)):

When $n \approx q$ : B = sqrt(n) and B = n / sqrt(q) agree.
When $q ≪ n$ : B = n / sqrt(q) reduces total pointer movement and can be 2-3x faster -- the difference between AC and TLE on problems with many fewer queries than array elements.
Empirically, try B = max(1, (int)ceil(n / sqrt(q))) if the default is borderline.

This is a tuning move, not the default. Do not strip the simple sqrt(n) template until you understand why the alternative is asymptotically tighter.

1. Block size matters more than you think. Use $B = \sqrt{n}$ as default. For problems where $q ≫ n$ , use $B = n / \sqrt{q}$ for tighter bounds. Wrong block size is the #1 cause of TLE with Mo's algorithm. Experiment: try $B = \max(1, (int)(n / \sqrt{q + 0.5)))$ if default TLEs.

2. Order of expand/shrink operations. The four while-loops must expand the window before shrinking it. If you shrink first, you can end up with $c u r L > c u r R + 1$ in a state where add/remove assumptions break. Safe order: expand R, expand L, shrink R, shrink L.

3. Off-by-one in add/remove boundaries. The idiom add(a[++curR]) increments first, then accesses. rem(a[curR--]) accesses first, then decrements. Mixing these up shifts your window by one. Test on a single-element query $[i, i]$ -- it should add exactly $a [i]$ .

4. Forgetting to coordinate-compress. If values go up to $10^{9}$ but there are only $n \leq 2 \times 10^{5}$ distinct values, you can't allocate cnt[1000000000]. Compress values to $[0, n)$ first.

5. Not saving the query index. After sorting, queries are in Mo order, not input order. If you forget to store and use idx, your output is scrambled. Always assign queries[i].idx = i before sorting.

6. Incorrect odd-even comparator. The tiebreaker for $r$ must flip direction for odd vs even blocks. If you accidentally use (b1 & 1) ? (r < o.r) : (r > o.r) (inverted), the optimization hurts instead of helping -- right pointer zigzags more.

7. Mo with updates: wrong block size. For 3D Mo (with updates), block size should be $B = n^{2 / 3}$ , not $\sqrt{n}$ . Using $\sqrt{n}$ gives $O (n^{7 / 4})$ instead of $O (n^{5 / 3})$ .

8. Global state not reset between test cases. If the problem has multiple test cases, cnt[] and cur_ans must be reset. Prefer memset(cnt, 0, sizeof cnt) or clear only the used entries.

Mental Traps

Trap: "Mo's is slow -- it's $O (n \sqrt{n})$ , which is worse than a segment tree." This comparison is wrong because it compares apples to oranges. Mo's handles query types that segment trees can't handle efficiently at all -- like "number of distinct values in a range." The relevant comparison is: Mo's $O (n \sqrt{n})$ vs. naïve $O (n q)$ . Mo's wins enormously.

Trap: "Mo's is just sorting queries smartly -- the rest is obvious." The sort is clever but the reason it works requires understanding the amortized analysis. If you just sort without understanding why this particular ordering minimizes pointer movement, you'll make mistakes when the analysis assumptions change (e.g., applying Mo's to a problem with updates, or with a non- $O (1)$ add/remove).

Trap: "I can use Mo's for problems with updates." Standard Mo's requires a static array. For problems with updates, you need "Mo's with updates" (3D Mo), which has block size $n^{2 / 3}$ and complexity $O (n^{5 / 3})$ . Using standard Mo's block size ( $\sqrt{n}$ ) for 3D Mo gives $O (n^{7 / 4})$ , which is significantly worse. These are different algorithms with different sort criteria.

Trap: "The odd-even optimization is just a constant factor." It roughly halves the right-pointer movement in practice. For problems near the time limit, this is the difference between AC and TLE. It's worth including as a default habit, not treating as optional.

The mistake that teaches

Sum-of-frequency-squares remove delta. I was solving a sum-of-squares-of-frequencies problem with add = answer += 2*cnt[a[pos]] + 1; cnt[a[pos]]++; and naively wrote remove = answer -= 2*cnt[a[pos]] + 1; cnt[a[pos]]--;. But in add, delta is computed before incrementing ((cnt+1)^2 - cnt^2 = 2*cnt + 1). In remove, we need delta before decrementing (cnt^2 - (cnt-1)^2 = 2*cnt - 1). My code subtracted 2*cnt + 1 instead of 2*cnt - 1 -- off by 2 per removal. Invisible on small tests; wrong on large. Lesson: when writing remove, re-derive the delta formula; don't just negate add.

CF 86D Powerful Array (variable shadowing + template dependence). I copy-pasted a Mo's template, stripped the parity optimization "to simplify," and introduced a second bug: I computed B = sqrt(n) but had overwritten n with the query count earlier. With n_array = 200000 and q = 200000 both assigned to n, it passed samples but WA on test 2. Lesson: don't reuse variable names (n_array vs q), and don't strip pieces from templates until you understand why they're there.

Debug Drills

Drill 1: Mo's algorithm with incorrect pointer movement order.

cpp

int curL = 0, curR = -1;
for (auto& [l, r, idx] : queries) {
    while (curR < r) add(++curR);
    while (curL < l) remove(curL++);
    while (curR > r) remove(curR--);
    while (curL > l) add(--curL);
    ans[idx] = answer;
}

What's wrong?

The order of the four while loops matters. Here, curR expands first, then curL shrinks, then curR shrinks, then curL expands. The problem is: if we first expand R and then shrink L, we might temporarily have a very large window, which is fine -- but if we first shrink R before expanding L, we could get curR < curL - 1 (empty window going negative). The canonical safe order is: expand both directions first, then shrink both: while (curL > l) add(--curL); while (curR < r) add(++curR); while (curL < l) remove(curL++); while (curR > r) remove(curR--);. The given order can cause remove to be called when the element isn't in the window, corrupting cnt[].

Drill 2: Wrong block size causes TLE.

cpp

int B = sqrt(n); // n = 100000, so B = 316
sort(queries.begin(), queries.end(), [&](auto& a, auto& b) {
    if (a.l / B != b.l / B) return a.l / B < b.l / B;
    return a.r < b.r;
});

What's wrong?

The code is correct but suboptimal, leading to TLE on tight time limits. The fix is the block-parity optimization: for even-numbered blocks, sort by R ascending; for odd-numbered blocks, sort by R descending. This reduces the total R-pointer movement by roughly half. Fix: return (a.l / B != b.l / B) ? (a.l / B < b.l / B) : ((a.l / B) & 1) ? (a.r > b.r) : (a.r < b.r);

Drill 3: add and remove with unbounded values.

cpp

int answer = 0;
int cnt[MAXVAL] = {};

void add(int pos) {
    if (cnt[a[pos]] == 0) answer++;
    cnt[a[pos]]++;
}

void remove(int pos) {
    cnt[a[pos]]--;
    if (cnt[a[pos]] == 0) answer--;
}

What's wrong?

In add, the check cnt[a[pos]] == 0 happens before incrementing, so it correctly detects a new distinct element. In remove, the decrement happens before the check, so cnt[a[pos]] == 0 correctly detects that the element is gone. This looks correct -- but the bug is subtle: if a[pos] can be negative or exceed MAXVAL, then cnt[a[pos]] is an out-of-bounds access. This commonly happens when the values aren't coordinate-compressed. Fix: compress values to [0, n) before running Mo's, or use an unordered_map instead of a fixed array.

Drill 4: Add/remove asymmetry with initial window state.

cpp

void add(int idx) {
    cnt[a[idx]]++;
    if (cnt[a[idx]] == 1) distinct++;
}
void remove(int idx) {
    if (cnt[a[idx]] == 1) distinct--;
    cnt[a[idx]]--;
}
// Process queries
int curL = 0, curR = -1;
for (auto& q : queries) {
    while (curR < q.r) add(++curR);
    while (curL > q.l) add(--curL);
    while (curR > q.r) remove(curR--);
    while (curL < q.l) remove(curL++);
    ans[q.idx] = distinct;
}

What's wrong?

The order of the four while-loops matters! The code first expands R, then expands L, then shrinks R, then shrinks L. This is correct. However, the bug is subtle: if curL starts at 0 and curR starts at -1, the first expansion adds elements correctly. But consider the remove function -- it checks cnt[a[idx]] == 1 before decrementing. If a[idx] has count 0 (never added), decrementing makes it -1. This happens if curL/curR initialization is wrong relative to the first query. The initialization curL = 0, curR = -1 means the window is empty, which is correct -- but if queries[0] has L > 0, the "shrink L" loop runs remove(0) on an element that was never added. Fix: Expand before shrinking. The given order does expand first, so the real issue is that curL = 0 means element 0 is "inside" the window when curR = -1 implies it's empty. Use curL = 1, curR = 0 or ensure the first operations are always expansions.

Drill 5: Block comparator doesn't handle ties correctly.

cpp

bool cmp(Query& a, Query& b) {
    return a.l / B < b.l / B || a.r < b.r;
}

What's wrong?

This comparator is not a valid strict weak ordering. When a.l / B == b.l / B, it falls through to a.r < b.r, which is correct. But when a.l / B < b.l / B, it returns true regardless of R -- that's fine. The problem is when a.l / B > b.l / B: the first condition is false, so it evaluates a.r < b.r, returning true if a.r < b.r even though a's block is after b's block. Fix:

cpp

if (a.l / B != b.l / B) return a.l / B < b.l / B;
return a.r < b.r;

The original can cause cmp(a,b) == true && cmp(b,a) == true simultaneously, violating strict weak ordering and causing undefined behavior in std::sort.

Drill 6: Mo's with wrong block size.

cpp

int n, q;
cin >> n >> q;
int B = sqrt(q);  // should be sqrt(n)
// ... read array and queries ...
sort(queries.begin(), queries.end(), [&](auto& a, auto& b) {
    if (a.l / B != b.l / B) return a.l / B < b.l / B;
    return a.r < b.r;
});

What's wrong?

Block size should be based on the array size n, not the number of queries q. With B = sqrt(q): if q is much smaller than n (say n = 200000, q = 100), then B ~= 10, creating 20000 blocks. The L pointer moves O(B) = O(10) per query, but R moves O(n) per block transition, totaling O(n * n/B) = O(n * 20000) which is far worse than intended. The standard formula B = sqrt(n) ensures total moves are O((n + q) * sqrt(n)). For the optimal block size when n and q differ significantly, use B = max(1, (int)(n / sqrt(q))).

Debugging checklist:

Print curL, curR, cur_ans after each query. Does the window match $[l, r]$ ?
Test on a single query $[0, n - 1]$ -- should give the answer for the whole array.
Test on $n = 1, q = 1$ .
Verify block size: cerr << "block=" << block << "\n";

Self-Test

Without opening the implementation above, verify you can reproduce these from memory:

[ ] Write the Mo's query comparator (the operator< in the Query struct) from memory, including the odd-even block optimization for the right-pointer tiebreak.
[ ] Write the main processing loop -- four while-loops in the correct expand-before-shrink order with correct pre/post increment pointer arithmetic (++curR, --curL, curR--, curL++).
[ ] Explain the amortized cost analysis in 3 sentences: why does the right pointer move $O (n \sqrt{n})$ total? Why does the left pointer move $O (q \sqrt{n})$ total?
[ ] Design add(val) and rem(val) for a new problem: "count the number of pairs $(i, j)$ in the window with $a [i] = a [j]$ ." (Hint: this is the sum-of-squares-of-frequencies variant -- the delta when frequency goes from $c$ to $c + 1$ is $+ 2 c$ .)
[ ] State what changes for Mo's with updates (3D Mo): which new pointer is added, what block size to use ( $n^{2 / 3}$ ), and what the resulting complexity is ( $O (n^{5 / 3})$ ).
[ ] Given a problem statement, identify whether Mo's is applicable: check for offline queries, static array, and $O (1)$ add/remove. List three problem types where Mo's does NOT apply and name the alternative technique for each.
[ ] Explain why expanding the window before shrinking is necessary -- what invalid state can occur if you shrink first, and how does it break the add/remove invariant?

Practice Problems

Rating Progression

CF Rating	What to Master
1200	Understand the brute-force "two-pointer window" for range queries; recognize when all queries are offline and the answer depends on frequency counts.
1500	Implement standard Mo's: sort by (l/B, r) with block size B = √n; maintain `cnt[]` array and a running answer; handle add/remove correctly. Output answers by original index.
1800	Mo's with the parity optimization (even blocks sort R ascending, odd blocks sort R descending); Mo's on trees via Euler tour flattening. Mo's with rollback for non-invertible operations.
2100	Mo's with updates (3D Mo's with timestamps); choosing optimal block size B ~= n^(2/3) for update variants; combining Mo's with bitset or other DS for complex queries; tuning block size for specific Q/N ratios.

#	Problem	Source	Difficulty	Key Idea
1	D-query	SPOJ	Easy	Textbook distinct count with Mo's
2	Powerful Array	CF 86D	Easy-Med	Sum of $c n t_{s}^{2} \cdot s$ ; practice add/remove with square updates
3	Tree and Queries	CF 375D	Medium	Mo on subtrees, count colors with freq $\geq k$
4	XOR and Favorite Number	CF 617E	Medium	Mo + prefix XOR, count pairs with XOR = $k$
5	Lomsat gelral	CF 600E	Medium	DSU on tree (alternative to Mo), dominant color per subtree
6	Machine Learning	CF 940F	Medium-Hard	Mo with updates (3D Mo), mex of frequency array
7	Count on a Tree II	SPOJ	Medium-Hard	Mo on tree paths via Euler tour
8	Arpa's letter-marked tree	CF 741D	Hard	DSU on tree / Mo on tree, bitmask of character parities
9	Array Queries	CF 797E	Hard	Sqrt decomposition + Mo-like approach for jump queries
10	Destiny	CF 840D	Hard	Randomized or Mo-based approach for frequent element in range
11	Distinct Values Queries	CSES	1700	Count distinct values in range -- classic Mo's application

What to recognize next time: When you see offline range queries and the answer can be maintained with $O (1)$ element add/remove, it's Mo's. The "powerful" contribution formula looks scary but the delta is just algebra -- always expand $(c \pm 1)^{2} - c^{2}$ before panicking. And size your frequency array to the value range, not $n$ .

Mo's Algorithm

Contents

Intuition

2.1 The Pain

2.2 The Key Insight

2.3 Visual Walkthrough

Worked Example: Mo's Algorithm -- 5 Queries on Array [2, 0, 1, 3, 2, 1, 4, 2]

2.4 The Invariant

2.5 When to Reach for This

What the Code Won't Teach You

How This Evolved

C++ STL Reference

Implementation (Contest-Ready)

Pointer Movement Order Matters

Version 1 -- Minimal contest template

Version 2 -- Explained version

Operations Reference

🔍 Why This Works -- Block Size √n Minimizes Total Movement

📐 Where Does the Complexity Come From? -- O((n+q)√n) Derivation

Problem Patterns & Tricks

Pattern 1 -- Distinct element count in range

Pattern 2 -- Frequency-based aggregate queries

Pattern 3 -- XOR / bitwise queries on ranges

Pattern 4 -- Mo with updates (Mo's algorithm with modifications)

Pattern 5 -- Mo on trees

Pattern 6 -- Mo with rollback (Undo Mo)

Contest Cheat Sheet

Gotchas & Debugging

Mental Traps

The mistake that teaches

Debug Drills

Self-Test

Practice Problems

Further Reading

Reconstruct It

Solve With Me -- CF 86D "Powerful Array"

Mo's Algorithm ​

Contents ​

Intuition ​

2.1 The Pain ​

2.2 The Key Insight ​

2.3 Visual Walkthrough ​

Worked Example: Mo's Algorithm -- 5 Queries on Array [2, 0, 1, 3, 2, 1, 4, 2] ​

2.4 The Invariant ​

2.5 When to Reach for This ​

What the Code Won't Teach You ​

How This Evolved ​

C++ STL Reference ​

Implementation (Contest-Ready) ​

Pointer Movement Order Matters ​

Version 1 -- Minimal contest template ​

Version 2 -- Explained version ​

Operations Reference ​

🔍 Why This Works -- Block Size √n Minimizes Total Movement ​

📐 Where Does the Complexity Come From? -- O((n+q)√n) Derivation ​

Problem Patterns & Tricks ​

Pattern 1 -- Distinct element count in range ​

Pattern 2 -- Frequency-based aggregate queries ​

Pattern 3 -- XOR / bitwise queries on ranges ​

Pattern 4 -- Mo with updates (Mo's algorithm with modifications) ​

Pattern 5 -- Mo on trees ​

Pattern 6 -- Mo with rollback (Undo Mo) ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

Mental Traps ​

The mistake that teaches ​

Debug Drills ​

Self-Test ​

Practice Problems ​

Further Reading ​

Reconstruct It ​

Solve With Me -- CF 86D "Powerful Array" ​

Mo's Algorithm

Contents

Intuition

2.1 The Pain

2.2 The Key Insight

2.3 Visual Walkthrough

Worked Example: Mo's Algorithm -- 5 Queries on Array [2, 0, 1, 3, 2, 1, 4, 2]

2.4 The Invariant

2.5 When to Reach for This

What the Code Won't Teach You

How This Evolved

C++ STL Reference

Implementation (Contest-Ready)

Pointer Movement Order Matters

Version 1 -- Minimal contest template

Version 2 -- Explained version

Operations Reference

🔍 Why This Works -- Block Size √n Minimizes Total Movement

📐 Where Does the Complexity Come From? -- O((n+q)√n) Derivation

Problem Patterns & Tricks

Pattern 1 -- Distinct element count in range

Pattern 2 -- Frequency-based aggregate queries

Pattern 3 -- XOR / bitwise queries on ranges

Pattern 4 -- Mo with updates (Mo's algorithm with modifications)

Pattern 5 -- Mo on trees

Pattern 6 -- Mo with rollback (Undo Mo)

Contest Cheat Sheet

Gotchas & Debugging

Mental Traps

The mistake that teaches

Debug Drills

Self-Test

Practice Problems

Further Reading

Reconstruct It

Solve With Me -- CF 86D "Powerful Array"