Appearance
Mo's Algorithm
Quick Revisit
- USE WHEN: offline range queries on a static array where add/remove one element is O(1)
- INVARIANT: queries sorted by (⌊l/B⌋, r) with B = √n; total pointer movement O((n+q)√n)
- TIME: O((n+q)√n) | SPACE: O(n + MAXVAL)
- INVALID WHEN: array has updates (use 3D Mo); add/remove is not O(1) or not reversible; query answer cannot be maintained incrementally
- TUNING: when q is much smaller than n, B = max(1, n/√q) is asymptotically tighter; see "block size tuning" below
- PRACTICE: 08-ladder-mixed
Offline range-query technique that answers
ID: AL-40 Difficulty: [Advanced]Prerequisites: Sqrt Decomposition, Offline QueriesCF Rating: 1800-2100
Contest Frequency: * Specialized -- appears in offline range query problems
Contents
- 2. Intuition
- 3. C++ STL Reference
- 4. Implementation (Contest-Ready)
- 5. Operations Reference
- 6. Problem Patterns & Tricks
- 7. Contest Cheat Sheet
- 8. Gotchas & Debugging
- 9. Self-Test
- 10. Practice Problems
- 11. Further Reading
Intuition
2.1 The Pain
Consider
Q0: [0, 3] Q1: [2, 7] Q2: [1, 4]
Q3: [0, 7] Q4: [5, 7] Q5: [3, 6]Naive approach: for each query, scan the subarray and count distinct values. Worst case per query is
2.2 The Key Insight
Sort queries by
so that consecutive queries have nearly overlapping ranges -- then maintain the answer by adding/removing one element at a time, with total pointer movement .
Think of a lazy student who has to answer a list of homework questions, each about a different range of the textbook. Instead of answering them in the order given -- flipping wildly between chapters -- the student rearranges the questions so each one is close to the previous one, minimizing page-turning. That's Mo's algorithm: sort queries so that consecutive queries overlap as much as possible, making the total pointer movement minimal.
Concretely, think of two pointers,
- Within one block of
-values, is sorted, so the right pointer sweeps mostly forward -- per block, total. - The left pointer stays inside a block of width
, moving at most per transition -- total.
Block structure (block_size = sqrt(n)): n = 16, block_size = 4
+-------+-------+-------+-------+
| blk 0 | blk 1 | blk 2 | blk 3 |
+-------+-------+-------+-------+
|0 1 2 3|4 5 6 7|8 9 A B|C D E F|
+-------+-------+-------+-------+
^ ^
| |
l/B = 0 l/B = 3
Queries sorted by: (l / block_size, R)
Even blocks: R ascending -->-->-->
Odd blocks: R descending <--<--<--
This zigzag pattern minimizes total R-pointer movement.2.3 Visual Walkthrough
Array:
Step 0 -- Sort queries by
Original Block of l Sorted (by block, then r)
Q0: [0,3] blk 0 Q0: [0,3] blk 0, r=3
Q1: [2,7] blk 0 Q2: [1,4] blk 0, r=4
Q2: [1,4] blk 0 Q1: [2,7] blk 0, r=7
Q3: [0,7] blk 0 Q3: [0,7] blk 0, r=7
Q4: [5,7] blk 1 Q5: [3,6] blk 1, r=6
Q5: [3,6] blk 1 Q4: [5,7] blk 1, r=7Step 1 -- Process Q0 [0, 3]. Start with empty window (
a: [ 3 1 4 1 5 9 2 6 ]
* * * * . . . .
^ ^
curL curR
cnt: {3:1, 1:2, 4:1} distinct = 3Step 2 -- Process Q2 [1, 4]. Move
a: [ 3 1 4 1 5 9 2 6 ]
. * * * * . . .
^ ^
curL curR
cnt: {1:2, 4:1, 5:1} distinct = 3
Pointer moves: L +1, R +1 (total so far: 2)Step 3 -- Process Q1 [2, 7]. Move
a: [ 3 1 4 1 5 9 2 6 ]
. . * * * * * *
^ ^
curL curR
cnt: {4:1, 1:1, 5:1, 9:1, 2:1, 6:1} distinct = 6
Pointer moves: L +1, R +3 (total so far: 6)Step 4 -- Process Q3 [0, 7]. Move
a: [ 3 1 4 1 5 9 2 6 ]
* * * * * * * *
^ ^
curL curR
cnt: {3:1, 1:2, 4:1, 5:1, 9:1, 2:1, 6:1} distinct = 7
Pointer moves: L +2, R +0 (total so far: 8)Remaining queries Q5 and Q4 (block 1) follow the same pattern. Total pointer movement stays
Mo's pointer movement on sorted queries:
Array indices: 0 1 2 3 4 5 6 7 8 9
Blocks: |--blk 0--|--blk 1--|--blk 2--|--blk 3-|
B = 3
Sorted query order (within block 0):
Q0: L=0 R=3 L--*--*--R
Q2: L=1 R=4 .L--*--*--R R moves +1 -->
Q1: L=2 R=7 ..L--*--*--*--*--R R moves +3 --->
Q3: L=0 R=7 L--*--*--*--*--*--R R stays, L moves -2 <--
^ ^
| |
L moves at R sweeps
most B=3 per monotonically
transition within block
Within same block: L moves O(sqrt(n)), R moves monotonically
Across all blocks: total = O((n + q) * sqrt(n))Worked Example: Mo's Algorithm -- 5 Queries on Array [2, 0, 1, 3, 2, 1, 4, 2]
Query: count of distinct elements. n = 8, block size B = ⌈√8⌉ = 3.
Queries (0-indexed, inclusive):
Q0:[1,5] Q1:[0,3] Q2:[4,7] Q3:[2,6] Q4:[0,7]
Step 1 -- Sort by (⌊L/B⌋, R):
| Query | L | R | Block | Sorted position |
|-------|---|---|-------|-----------------|
| Q1 | 0 | 3 | 0 | 1st |
| Q0 | 1 | 5 | 0 | 2nd |
| Q3 | 2 | 6 | 0 | 3rd |
| Q4 | 0 | 7 | 0 | 4th |
| Q2 | 4 | 7 | 1 | 5th |
Step 2 -- Process in sorted order, tracking [curL, curR]:
Array: [2, 0, 1, 3, 2, 1, 4, 2]
| Order | Query | [curL,curR] | L moves | R moves | adds | removes | answer |
|-------|-------|-------------|----------|---------|------|---------|--------|
| -- | init | [0, -1] | -- | -- | -- | -- | -- |
| 1 | Q1 | [0, 3] | 0->0 = 0 | -1->3=+4| 4 | 0 | 4 |
| 2 | Q0 | [1, 5] | 0->1 = +1| 3->5 =+2| 2 | 1 | 4 |
| 3 | Q3 | [2, 6] | 1->2 = +1| 5->6 =+1| 1 | 1 | 4 |
| 4 | Q4 | [0, 7] | 2->0 = -2| 6->7 =+1| 3 | 0 | 5 |
| 5 | Q2 | [4, 7] | 0->4 = +4| 7->7 = 0| 0 | 4 | 4 |
Totals: 8 L-moves + 8 R-moves = 16 add/remove ops
Naive (independent per query): 4+6+4+5+8 = 27 ops
Mo's order saves ~40% here; gap widens with more queries.2.4 The Invariant
+------------------------------------------------------------------+
| INVARIANT: Between queries the data structure reflects the exact |
| answer for the range [curL, curR]. |
| |
| - Each add(val) / remove(val) runs in O(1) |
| (or O(log n) with augmented structures). |
| - Total pointer movement across ALL queries: O((n+q) * sqrt(n)). |
+------------------------------------------------------------------+Why cnt[val]. add: increment cnt[val]; if it goes distinct. remove: decrement cnt[val]; if it goes distinct.
2.5 When to Reach for This
Trigger phrases in problem statements:
- "offline range queries" -- queries given upfront, answers output at the end.
- "no updates" / "static array" -- the array does not change.
- "distinct count in range", "frequency of values in subarray".
- "queries can be reordered" -- explicitly or implicitly.
Counter-examples (when NOT to use Mo's): these are recognition negatives -- if any one applies, Mo's is the wrong tool, and noticing the negative is as important as noticing the positive trigger phrases above.
| Situation | Why not Mo's | Use instead |
|---|---|---|
| Array has point updates between queries | add/remove invariant breaks; the array no longer reflects "the same window" across queries | Mo with updates / 3D Mo ( |
| Online required (next query depends on previous answer, XOR decryption, etc.) | Cannot reorder queries; midpoints cannot be batched | Persistent segment tree, merge-sort tree |
| add/remove is not | Total becomes | Segment tree with merge, offline divide-and-conquer, or Mo with rollback if remove is the only expensive side |
| Query answer cannot be maintained incrementally (e.g., median of window) | Even with O(1) data structure ops, recomputing the answer per pointer move is | Wavelet tree, persistent structures |
| Wavelet tree, offline D&C |
What the Code Won't Teach You
Recognizing Mo-solvable problems. The key condition: the problem is offline, the array is static (no updates), and there exists an efficient (ideally
The add/remove function is the design problem. The template gives add/remove for distinct count. In contests, you need to design add/remove for whatever the problem asks. This design question is often harder than Mo's itself. Examples:
- "Sum of frequencies squared" -> maintain a counter-of-counters
- "XOR of all elements" -> XOR is its own inverse, add = remove
- "k-th most frequent" -> harder, might need a sorted structure or secondary data structure
The initialization. Mo's starts with an empty window (curL = 0, curR = -1). This means your first query expands from empty. Your add/remove functions must handle adding to an empty state correctly -- test that add works when cnt[] is all zeros and cur_ans is 0.
How This Evolved
The most straightforward answer to "how many distinct values in
The critical observation is that if you already know the answer for
Later refinements sharpened this further. The zigzag optimization (alternating the sort direction of
C++ STL Reference
| Function / Class | Header | What it does | Time |
|---|---|---|---|
sort(v.begin(), v.end(), cmp) | <algorithm> | Sort queries by Mo ordering | |
vector<int> cnt(MAXVAL) | <vector> | Frequency array for add/remove | |
__builtin_clz(x) | built-in | Bit tricks for block computation | |
swap(a, b) | <utility> | Swap elements in add/remove |
Key point: Mo's algorithm uses almost no STL beyond sorting. The core is raw array manipulation with cnt[] for map/set inside add/remove -- the
Implementation (Contest-Ready)
Problem: Given an array of
Pointer Movement Order Matters
When moving from query [curL, curR] to [L, R], expand BEFORE shrinking:
while (curR < R) add(a[++curR]);-- expand rightwhile (curL > L) add(a[--curL]);-- expand leftwhile (curR > R) remove(a[curR--]);-- shrink rightwhile (curL < L) remove(a[curL++]);-- shrink left
Why this order? Expanding first ensures the window always covers a valid range. If you shrink first, you might temporarily have curL > curR (an invalid/empty window), which can cause index-out-of-bounds or wrong state in your add/remove functions.
Pre-increment vs post-increment: add(a[++curR]) adds element THEN moves pointer. add(a[curR++]) moves pointer THEN adds -- off-by-one. The ++curR / --curL pattern for expansion and curR-- / curL++ for shrinking is the safe convention.
Extending / shrinking the current window [L..R]:
Current: [ L . . . . . . . R ]
^ ^
curL curR
Extend R: [ L . . . . . . . R--->R+1 ] add(arr[++curR])
Extend L: [ L-1<---L . . . . . . . R ] add(arr[--curL])
Shrink R: [ L . . . . . . R-1 ]<--R rem(arr[curR--])
Shrink L: L-->[ L+1 . . . . . . . R ] rem(arr[curL++])
+----------------------------------------------------------+
| ORDER MATTERS: always expand before shrink! |
| |
| SAFE: expand R --> expand L --> shrink R --> shrink L |
| UNSAFE: shrink first --> window may become empty/invalid |
+----------------------------------------------------------+Version 1 -- Minimal contest template
cpp
#include <bits/stdc++.h>
using namespace std;
int block;
struct Query {
int l, r, idx;
bool operator<(const Query& o) const {
int b1 = l / block, b2 = o.l / block;
if (b1 != b2) return b1 < b2;
return (b1 & 1) ? (r > o.r) : (r < o.r); // odd block: r desc, even block: r asc
}
};
int cnt[1000001];
int cur_ans;
void add(int val) { if (++cnt[val] == 1) cur_ans++; }
void rem(int val) { if (--cnt[val] == 0) cur_ans--; }
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n, q;
cin >> n >> q;
vector<int> a(n);
for (auto& x : a) cin >> x;
block = max(1, (int)sqrt(n));
vector<Query> queries(q);
for (int i = 0; i < q; i++) {
cin >> queries[i].l >> queries[i].r;
queries[i].idx = i;
}
sort(queries.begin(), queries.end());
vector<int> ans(q);
int curL = 0, curR = -1;
cur_ans = 0;
for (auto& [l, r, idx] : queries) {
while (curR < r) add(a[++curR]);
while (curL > l) add(a[--curL]);
while (curR > r) rem(a[curR--]);
while (curL < l) rem(a[curL++]);
ans[idx] = cur_ans;
}
for (int i = 0; i < q; i++)
cout << ans[i] << "\n";
}Version 2 -- Explained version
cpp
#include <bits/stdc++.h>
using namespace std;
// Block size for Mo's ordering. Set to sqrt(n).
int block;
struct Query {
int l, r, idx;
bool operator<(const Query& o) const {
int b1 = l / block, b2 = o.l / block;
if (b1 != b2) return b1 < b2;
// Odd-even trick: zigzag R between blocks to minimize pointer movement.
// even blocks (b1 & 1 == 0): sort r ascending (r < o.r)
// odd blocks (b1 & 1 == 1): sort r descending (r > o.r)
// This reduces total right-pointer movement by ~30-40%.
return (b1 & 1) ? (r > o.r) : (r < o.r);
}
};
// cnt[v] = how many times value v appears in the current window [curL, curR].
// cur_ans = number of distinct values (cnt[v] > 0) in the window.
int cnt[1000001];
int cur_ans;
// Include value val into the window.
void add(int val) {
cnt[val]++;
if (cnt[val] == 1) cur_ans++; // first occurrence -> new distinct value
}
// Exclude value val from the window.
void rem(int val) {
cnt[val]--;
if (cnt[val] == 0) cur_ans--; // last occurrence removed -> lost a distinct value
}
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n, q;
cin >> n >> q;
vector<int> a(n);
for (auto& x : a) cin >> x;
// Block size = sqrt(n). The max(1, ...) guards against n=0.
block = max(1, (int)sqrt(n));
vector<Query> queries(q);
for (int i = 0; i < q; i++) {
cin >> queries[i].l >> queries[i].r;
queries[i].idx = i; // remember original order for output
}
// Sort all queries by Mo's ordering.
sort(queries.begin(), queries.end());
vector<int> ans(q);
// Current window is [curL, curR]. Start empty: curL=0, curR=-1.
int curL = 0, curR = -1;
cur_ans = 0;
for (auto& [l, r, idx] : queries) {
// Expand/shrink window to match [l, r].
// ORDER MATTERS: expand before shrinking to avoid
// the window becoming invalid (curL > curR+1 with wrong state).
//
// Safe order: expand R, expand L, shrink R, shrink L.
// This ensures the window always contains a valid range.
while (curR < r) add(a[++curR]); // expand right
while (curL > l) add(a[--curL]); // expand left
while (curR > r) rem(a[curR--]); // shrink right
while (curL < l) rem(a[curL++]); // shrink left
ans[idx] = cur_ans;
}
// Output answers in original query order.
for (int i = 0; i < q; i++)
cout << ans[i] << "\n";
}Operations Reference
| Operation | Time | Space |
|---|---|---|
| Sort queries (Mo order) | ||
| Process all queries (total) | ||
| Single add/remove (distinct count) | - | |
Single add/remove (with map/set) | - | |
| Mo with updates (3D Mo) | ||
| Mo on trees (Euler tour) |
Notes:
- The
bound assumes . When and differ significantly, optimal block size is , giving . - With the odd-even optimization, the constant factor roughly halves.
- If add/remove is
, total becomes -- often too slow for under 2s. Try to make add/remove . - Space is dominated by the frequency array. If values are up to
, that's 4 MB for int cnt[]. Coordinate-compress if values are up to.
🔍 Why This Works -- Block Size √n Minimizes Total Movement
Mo's algorithm answers offline range queries by sorting them cleverly and moving two pointers (left, right) to adjust the current range.
Why √n blocks? Split the array into blocks of size
- Right pointer: Within a block of left endpoints, right moves monotonically. Across
blocks, right travels per block -> total . - Left pointer: Between consecutive queries in the same block, left moves
. Over queries, left travels . - Total:
. Setting balances both terms to .
📐 Where Does the Complexity Come From? -- O((n+q)√n) Derivation
Setup: Array of size
Right pointer movement: Within each block of left endpoints, right moves monotonically (we sort by right within a block). Each block causes right to sweep at most
Left pointer movement: Between consecutive queries in the same block, left moves at most
Total:
Problem Patterns & Tricks
Pattern 1 -- Distinct element count in range
The classic Mo application. Maintain cnt[val]; increment distinct when cnt goes
Recognize it: "How many distinct values in
CF 86D -- Powerful Array (sum of cnt[x]^2 * x)
SPOJ DQUERY -- D-query (exact distinct count)Pattern 2 -- Frequency-based aggregate queries
Queries ask for
Trick: Maintain a secondary counter freq_cnt[k] = number of values with frequency cnt[] and freq_cnt[] in add/remove.
cpp
void add(int val) {
freq_cnt[cnt[val]]--;
cnt[val]++;
freq_cnt[cnt[val]]++;
cur_ans += 2LL * cnt[val] - 1; // for sum-of-squares: delta is (c+1)^2 - c^2
}CF 86D -- Powerful Array (sum of cnt[x]^2 * x)
CF 375D -- Tree and Queries (count of colors with freq >= k)Pattern 3 -- XOR / bitwise queries on ranges
Query: XOR of all elements in
cpp
void add(int val) { cur_xor ^= val; }
void rem(int val) { cur_xor ^= val; } // same as add!CF 617E -- XOR and Favorite Number (count pairs with XOR = k)Pattern 4 -- Mo with updates (Mo's algorithm with modifications)
Handle point updates by adding a time dimension. Each query becomes
cpp
struct Query {
int l, r, t, idx;
bool operator<(const Query& o) const {
int b1 = l / block, b2 = o.l / block;
if (b1 != b2) return b1 < b2;
int c1 = r / block, c2 = o.r / block;
if (c1 != c2) return c1 < c2;
return t < o.t;
}
};Apply/undo updates as the time pointer moves forward/backward.
CF 940F -- Machine Learning (Mo with updates, count of mex of frequencies)
CF 1476F -- Lanterns (offline with updates variant)Pattern 5 -- Mo on trees
Flatten the tree using an Euler tour. A path query in[v] and out[v] timestamps. If
Key trick: Each node appears twice in the Euler tour (entry and exit). Toggle: first occurrence adds the node, second occurrence removes it.
SPOJ COT2 -- Count on a tree II (distinct colors on path)
CF 375D -- Tree and Queries (colors with freq >= k on subtree)Pattern 6 -- Mo with rollback (Undo Mo)
Some operations (like "merge two sets") don't have efficient remove. Mo with rollback handles this: at the start of each block of queries, save a checkpoint. Process queries within the block by only expanding right (add-only). For the left pointer, expand temporarily and rollback after answering.
When to use: DSU-based queries (e.g., number of connected components in a range of edges), where remove is hard but add + undo is possible.
CF 896C -- Willem, Chtholly and Seniorious (chtholly tree, but Mo+rollback also works)
CF 1039D -- You Are Given a Tree (sqrt decomposition + Mo-like approach)Contest Cheat Sheet
+----------------------------------------------------------------+
| MO'S ALGORITHM -- QUICK REFERENCE |
+----------------------------------------------------------------+
| WHEN: Offline range queries, static array, O(1) add/remove |
| TIME: O((n+q)sqrt(n)) SPACE: O(n + MAXVAL) |
+----------------------------------------------------------------+
| SETUP: |
| block = max(1, (int)sqrt(n)); |
| sort queries by (l/block, r) // odd-even: flip r in odd |
+----------------------------------------------------------------+
| CORE LOOP: |
| int curL=0, curR=-1; |
| for(auto&[l,r,idx]:queries){ |
| while(curR<r) add(a[++curR]); |
| while(curL>l) add(a[--curL]); |
| while(curR>r) rem(a[curR--]); |
| while(curL<l) rem(a[curL++]); |
| ans[idx] = cur_ans; |
| } |
+----------------------------------------------------------------+
| WATCH OUT: |
| - Expand BEFORE shrink (avoid invalid empty states) |
| - Block size sqrt(n), NOT sqrt(q) |
| - Coordinate compress if vals > 1e6 |
| - add/remove MUST be O(1) or you blow the time budget |
| - Odd-even sort: (b&1) ? r DESC : r ASC (~30% faster) |
| - Store original query index for output ordering |
+----------------------------------------------------------------+Gotchas & Debugging
Block size tuning (advanced constant-factor note). The walkthrough above and the contest template both use B = sqrt(n). That is the right starting point: simple, correct, and asymptotically optimal when B = ceil(n / sqrt(q)):
- When
: B = sqrt(n)andB = n / sqrt(q)agree. - When
: B = n / sqrt(q)reduces total pointer movement and can be 2-3x faster -- the difference between AC and TLE on problems with many fewer queries than array elements. - Empirically, try
B = max(1, (int)ceil(n / sqrt(q)))if the default is borderline.
This is a tuning move, not the default. Do not strip the simple sqrt(n) template until you understand why the alternative is asymptotically tighter.
1. Block size matters more than you think. Use
2. Order of expand/shrink operations. The four while-loops must expand the window before shrinking it. If you shrink first, you can end up with
3. Off-by-one in add/remove boundaries. The idiom add(a[++curR]) increments first, then accesses. rem(a[curR--]) accesses first, then decrements. Mixing these up shifts your window by one. Test on a single-element query
4. Forgetting to coordinate-compress. If values go up to cnt[1000000000]. Compress values to
5. Not saving the query index. After sorting, queries are in Mo order, not input order. If you forget to store and use idx, your output is scrambled. Always assign queries[i].idx = i before sorting.
6. Incorrect odd-even comparator. The tiebreaker for (b1 & 1) ? (r < o.r) : (r > o.r) (inverted), the optimization hurts instead of helping -- right pointer zigzags more.
7. Mo with updates: wrong block size. For 3D Mo (with updates), block size should be
8. Global state not reset between test cases. If the problem has multiple test cases, cnt[] and cur_ans must be reset. Prefer memset(cnt, 0, sizeof cnt) or clear only the used entries.
Mental Traps
Trap: "Mo's is slow -- it's
Trap: "Mo's is just sorting queries smartly -- the rest is obvious." The sort is clever but the reason it works requires understanding the amortized analysis. If you just sort without understanding why this particular ordering minimizes pointer movement, you'll make mistakes when the analysis assumptions change (e.g., applying Mo's to a problem with updates, or with a non-
Trap: "I can use Mo's for problems with updates." Standard Mo's requires a static array. For problems with updates, you need "Mo's with updates" (3D Mo), which has block size
Trap: "The odd-even optimization is just a constant factor." It roughly halves the right-pointer movement in practice. For problems near the time limit, this is the difference between AC and TLE. It's worth including as a default habit, not treating as optional.
The mistake that teaches
Sum-of-frequency-squares remove delta. I was solving a sum-of-squares-of-frequencies problem with add = answer += 2*cnt[a[pos]] + 1; cnt[a[pos]]++; and naively wrote remove = answer -= 2*cnt[a[pos]] + 1; cnt[a[pos]]--;. But in add, delta is computed before incrementing ((cnt+1)^2 - cnt^2 = 2*cnt + 1). In remove, we need delta before decrementing (cnt^2 - (cnt-1)^2 = 2*cnt - 1). My code subtracted 2*cnt + 1 instead of 2*cnt - 1 -- off by 2 per removal. Invisible on small tests; wrong on large. Lesson: when writing remove, re-derive the delta formula; don't just negate add.
CF 86D Powerful Array (variable shadowing + template dependence). I copy-pasted a Mo's template, stripped the parity optimization "to simplify," and introduced a second bug: I computed B = sqrt(n) but had overwritten n with the query count earlier. With n_array = 200000 and q = 200000 both assigned to n, it passed samples but WA on test 2. Lesson: don't reuse variable names (n_array vs q), and don't strip pieces from templates until you understand why they're there.
Debug Drills
Drill 1: Mo's algorithm with incorrect pointer movement order.
cpp
int curL = 0, curR = -1;
for (auto& [l, r, idx] : queries) {
while (curR < r) add(++curR);
while (curL < l) remove(curL++);
while (curR > r) remove(curR--);
while (curL > l) add(--curL);
ans[idx] = answer;
}What's wrong?
The order of the four while loops matters. Here, curR expands first, then curL shrinks, then curR shrinks, then curL expands. The problem is: if we first expand R and then shrink L, we might temporarily have a very large window, which is fine -- but if we first shrink R before expanding L, we could get curR < curL - 1 (empty window going negative). The canonical safe order is: expand both directions first, then shrink both: while (curL > l) add(--curL); while (curR < r) add(++curR); while (curL < l) remove(curL++); while (curR > r) remove(curR--);. The given order can cause remove to be called when the element isn't in the window, corrupting cnt[].
Drill 2: Wrong block size causes TLE.
cpp
int B = sqrt(n); // n = 100000, so B = 316
sort(queries.begin(), queries.end(), [&](auto& a, auto& b) {
if (a.l / B != b.l / B) return a.l / B < b.l / B;
return a.r < b.r;
});What's wrong?
The code is correct but suboptimal, leading to TLE on tight time limits. The fix is the block-parity optimization: for even-numbered blocks, sort by R ascending; for odd-numbered blocks, sort by R descending. This reduces the total R-pointer movement by roughly half. Fix: return (a.l / B != b.l / B) ? (a.l / B < b.l / B) : ((a.l / B) & 1) ? (a.r > b.r) : (a.r < b.r);
Drill 3: add and remove with unbounded values.
cpp
int answer = 0;
int cnt[MAXVAL] = {};
void add(int pos) {
if (cnt[a[pos]] == 0) answer++;
cnt[a[pos]]++;
}
void remove(int pos) {
cnt[a[pos]]--;
if (cnt[a[pos]] == 0) answer--;
}What's wrong?
In add, the check cnt[a[pos]] == 0 happens before incrementing, so it correctly detects a new distinct element. In remove, the decrement happens before the check, so cnt[a[pos]] == 0 correctly detects that the element is gone. This looks correct -- but the bug is subtle: if a[pos] can be negative or exceed MAXVAL, then cnt[a[pos]] is an out-of-bounds access. This commonly happens when the values aren't coordinate-compressed. Fix: compress values to [0, n) before running Mo's, or use an unordered_map instead of a fixed array.
Drill 4: Add/remove asymmetry with initial window state.
cpp
void add(int idx) {
cnt[a[idx]]++;
if (cnt[a[idx]] == 1) distinct++;
}
void remove(int idx) {
if (cnt[a[idx]] == 1) distinct--;
cnt[a[idx]]--;
}
// Process queries
int curL = 0, curR = -1;
for (auto& q : queries) {
while (curR < q.r) add(++curR);
while (curL > q.l) add(--curL);
while (curR > q.r) remove(curR--);
while (curL < q.l) remove(curL++);
ans[q.idx] = distinct;
}What's wrong?
The order of the four while-loops matters! The code first expands R, then expands L, then shrinks R, then shrinks L. This is correct. However, the bug is subtle: if curL starts at 0 and curR starts at -1, the first expansion adds elements correctly. But consider the remove function -- it checks cnt[a[idx]] == 1 before decrementing. If a[idx] has count 0 (never added), decrementing makes it -1. This happens if curL/curR initialization is wrong relative to the first query. The initialization curL = 0, curR = -1 means the window is empty, which is correct -- but if queries[0] has L > 0, the "shrink L" loop runs remove(0) on an element that was never added. Fix: Expand before shrinking. The given order does expand first, so the real issue is that curL = 0 means element 0 is "inside" the window when curR = -1 implies it's empty. Use curL = 1, curR = 0 or ensure the first operations are always expansions.
Drill 5: Block comparator doesn't handle ties correctly.
cpp
bool cmp(Query& a, Query& b) {
return a.l / B < b.l / B || a.r < b.r;
}What's wrong?
This comparator is not a valid strict weak ordering. When a.l / B == b.l / B, it falls through to a.r < b.r, which is correct. But when a.l / B < b.l / B, it returns true regardless of R -- that's fine. The problem is when a.l / B > b.l / B: the first condition is false, so it evaluates a.r < b.r, returning true if a.r < b.r even though a's block is after b's block. Fix:
cpp
if (a.l / B != b.l / B) return a.l / B < b.l / B;
return a.r < b.r;The original can cause cmp(a,b) == true && cmp(b,a) == true simultaneously, violating strict weak ordering and causing undefined behavior in std::sort.
Drill 6: Mo's with wrong block size.
cpp
int n, q;
cin >> n >> q;
int B = sqrt(q); // should be sqrt(n)
// ... read array and queries ...
sort(queries.begin(), queries.end(), [&](auto& a, auto& b) {
if (a.l / B != b.l / B) return a.l / B < b.l / B;
return a.r < b.r;
});What's wrong?
Block size should be based on the array size n, not the number of queries q. With B = sqrt(q): if q is much smaller than n (say n = 200000, q = 100), then B ~= 10, creating 20000 blocks. The L pointer moves O(B) = O(10) per query, but R moves O(n) per block transition, totaling O(n * n/B) = O(n * 20000) which is far worse than intended. The standard formula B = sqrt(n) ensures total moves are O((n + q) * sqrt(n)). For the optimal block size when n and q differ significantly, use B = max(1, (int)(n / sqrt(q))).
Debugging checklist:
- Print
curL, curR, cur_ansafter each query. Does the window match? - Test on a single query
-- should give the answer for the whole array. - Test on
. - Verify block size:
cerr << "block=" << block << "\n";
Self-Test
Without opening the implementation above, verify you can reproduce these from memory:
- [ ] Write the Mo's query comparator (the
operator<in theQuerystruct) from memory, including the odd-even block optimization for the right-pointer tiebreak. - [ ] Write the main processing loop -- four
while-loops in the correct expand-before-shrink order with correct pre/post increment pointer arithmetic (++curR,--curL,curR--,curL++). - [ ] Explain the amortized cost analysis in 3 sentences: why does the right pointer move
total? Why does the left pointer move total? - [ ] Design
add(val)andrem(val)for a new problem: "count the number of pairsin the window with ." (Hint: this is the sum-of-squares-of-frequencies variant -- the delta when frequency goes from to is .) - [ ] State what changes for Mo's with updates (3D Mo): which new pointer is added, what block size to use (
), and what the resulting complexity is ( ). - [ ] Given a problem statement, identify whether Mo's is applicable: check for offline queries, static array, and
add/remove. List three problem types where Mo's does NOT apply and name the alternative technique for each. - [ ] Explain why expanding the window before shrinking is necessary -- what invalid state can occur if you shrink first, and how does it break the add/remove invariant?
Practice Problems
Rating Progression
| CF Rating | What to Master |
|---|---|
| 1200 | Understand the brute-force "two-pointer window" for range queries; recognize when all queries are offline and the answer depends on frequency counts. |
| 1500 | Implement standard Mo's: sort by (l/B, r) with block size B = √n; maintain cnt[] array and a running answer; handle add/remove correctly. Output answers by original index. |
| 1800 | Mo's with the parity optimization (even blocks sort R ascending, odd blocks sort R descending); Mo's on trees via Euler tour flattening. Mo's with rollback for non-invertible operations. |
| 2100 | Mo's with updates (3D Mo's with timestamps); choosing optimal block size B ~= n^(2/3) for update variants; combining Mo's with bitset or other DS for complex queries; tuning block size for specific Q/N ratios. |
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | D-query | SPOJ | Easy | Textbook distinct count with Mo's |
| 2 | Powerful Array | CF 86D | Easy-Med | Sum of |
| 3 | Tree and Queries | CF 375D | Medium | Mo on subtrees, count colors with freq |
| 4 | XOR and Favorite Number | CF 617E | Medium | Mo + prefix XOR, count pairs with XOR = |
| 5 | Lomsat gelral | CF 600E | Medium | DSU on tree (alternative to Mo), dominant color per subtree |
| 6 | Machine Learning | CF 940F | Medium-Hard | Mo with updates (3D Mo), mex of frequency array |
| 7 | Count on a Tree II | SPOJ | Medium-Hard | Mo on tree paths via Euler tour |
| 8 | Arpa's letter-marked tree | CF 741D | Hard | DSU on tree / Mo on tree, bitmask of character parities |
| 9 | Array Queries | CF 797E | Hard | Sqrt decomposition + Mo-like approach for jump queries |
| 10 | Destiny | CF 840D | Hard | Randomized or Mo-based approach for frequent element in range |
| 11 | Distinct Values Queries | CSES | 1700 | Count distinct values in range -- classic Mo's application |
Further Reading
- cp-algorithms: Mo's Algorithm -- covers basic Mo, Mo with updates, and Mo on trees.
- CF Blog: Mo's Algorithm (Mos Algorithm) -- the original popular CF tutorial.
- CF Blog: Mo's Algorithm on Trees -- Euler tour trick for path queries.
- CF Blog: Mo with Rollback -- handling non-invertible operations.
- CF Blog: Sqrt Decomposition Tricks -- advanced block decomposition techniques.
See also:
02-sqrt-decomposition.md-- block decomposition fundamentals01-offline-queries.md-- general offline query techniques01-euler-tour-and-lca.md-- needed for Mo on trees04-cdq-divide-and-conquer.md-- offline divide and conquer for multi-dimensional problems05-parallel-binary-search.md-- batch multiple binary searches over the same timeline
Reconstruct It
Setup: You have an array of
elements and offline range queries, each asking for some aggregate (e.g., count of distinct values) over . A naive approach processes each query independently in , giving total -- too slow. What if you could reorder the queries to minimize total work?
Constraint hint: Imagine maintaining a sliding window
Step 1 -- Block decomposition on
Step 2 -- Count the total movement. There are
Step 3 -- Maintain the answer incrementally. You need add(x) and remove(x) functions that update the current answer when the window expands or shrinks by one element. For "count of distinct values": add increments a frequency counter and increases the distinct count if the frequency goes from 0 to 1; remove decrements and decreases if frequency goes from 1 to 0.
Full derivation
The algorithm (Mo's):
- Choose block size
. - For each query
, compute block_i = l_i / B. - Sort queries by
(block_i, r_i)-- with the optimization that for odd-numbered blocks, sortin decreasing order (reduces total -movement by ~50%). - Initialize
curL = 0, curR = -1(empty window). - For each query in sorted order:
- Expand/shrink the window to
by calling add/removeone element at a time. - Store the current answer for this query.
- Expand/shrink the window to
- Output answers in original query order.
Why it works: By grouping
Solve With Me -- CF 86D "Powerful Array"
Problem: array of
Let me try the naive approach first: for each query, iterate over
Can I precompute something? Prefix sums of... what? The answer depends on frequencies in the range, which don't decompose nicely into prefix sums. A segment tree could work if I merged frequency maps, but merging two frequency maps is
Hmm. Let me think differently. What if I could incrementally maintain the answer -- adding or removing one element at a time? If I add element
So I can maintain the answer with
Block size:
One thing that tripped me up: the Mo's block size formula. I initially used sqrt(n) but read somewhere that sqrt(n * 2 / 3) can be better. Tried both, didn't matter much for this problem. I also forgot to allocate the cnt[] array large enough -- values go up to
Another gotcha: the comparison function for sorting queries. Within the same block, if the block index is even I sort
Final solution: ~60 lines of C++. Mo's with O(1) transitions. AC in 3.8 seconds out of 5.