Appearance
Sqrt Decomposition
Quick Revisit
- USE WHEN: range queries + point updates and segment tree feels like overkill or is hard to apply
- INVARIANT: array split into √n blocks; full blocks use precomputed answers, partial blocks are brute-forced
- TIME: O(√n) per query and per update | SPACE: O(n)
- CLASSIC BUG: last block is smaller than √n — forgetting to handle the partial tail block
- PRACTICE: 08-ladder-mixed
AL-39 | Split an array into
Why this lives in the offline-techniques chapter. Sqrt decomposition itself is not an offline method -- it supports online point updates and range queries, and could equally well sit in Chapter 2. It is grouped here because (a) it is the canonical block-based fallback when segment trees feel like overkill, and (b) the block-decomposition mindset is the prerequisite for Mo's algorithm. Treat the chapter placement as a roadmap waypoint, not a claim that sqrt decomposition is fundamentally offline.
Difficulty: [Intermediate]Prerequisites: Arrays and Strings, Prefix SumsCF Rating Range: 1600 -- 2000
Contents
- 2. Intuition
- 3. C++ STL Reference
- 4. Implementation (Contest-Ready)
- 5. Operations Reference
- 6. Problem Patterns & Tricks
- 7. Contest Cheat Sheet
- 8. Gotchas & Debugging
- 9. Self-Test
- 10. Practice Problems
- 11. Further Reading
Intuition
2.1 The Pain
Consider an array of 16 elements. You need to support two operations: range sum queries and point updates.
Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
a[]: | 3 | 1 | 4 | 1 | 5 | 9 | 2 | 6 | 5 | 3 | 5 | 8 | 9 | 7 | 9 | 3 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+The naive approach scans every element in the range:
2.2 The Key Insight
Split the array into blocks of size
Analogy. Imagine organizing books on a bookshelf. You group every
Sqrt block structure (n=16, block_size=4):
Array: [ 3 1 4 1 | 5 9 2 6 | 5 3 5 8 | 9 7 9 3 ]
Blocks: +-block 0--+ +-block 1--+ +-block 2--+ +-block 3--+
Sums: [ 9 | 22 | 21 | 28 ]
Query [2..13]:
partial |##| full block | full block | partial |##|
[ 3 1 |4 1 | 5 9 2 6 | 5 3 5 8 | 9 7| 9 3 ]
^ ^
L=2 R=13
= brute(2..3) + block_sum[1] + block_sum[2] + brute(12..13)For our 16-element array,
Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
a[]: | 3 | 1 | 4 | 1 | 5 | 9 | 2 | 6 | 5 | 3 | 5 | 8 | 9 | 7 | 9 | 3 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| block 0 | block 1 | block 2 | block 3 |
| sum = 9 | sum = 22 | sum = 21 | sum = 28 |
+---------------+---------------+---------------+---------------+2.3 Visual Walkthrough
Step 1 -- Build block sums ( )
Scan the array once, accumulating each element into its block:
block[0] = a[0]+a[1]+a[2]+a[3] = 3+1+4+1 = 9
block[1] = a[4]+a[5]+a[6]+a[7] = 5+9+2+6 = 22
block[2] = a[8]+a[9]+a[10]+a[11] = 5+3+5+8 = 21
block[3] = a[12]+a[13]+a[14]+a[15] = 9+7+9+3 = 28Step 2 -- Range query: sum( , )
Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
a[]: | 3 | 1 |*4 |*1 |*5 |*9 |*2 |*6 |*5 |*3 |*5 |*8 |*9 |*7 | 9 | 3 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| block 0 | block 1 | block 2 | block 3 |
^ ^
l=2 r=13
(a) Partial left block (block 0, indices 2..3):
Scan a[2] + a[3] = 4 + 1 = 5 cost: 2 elements
(b) Full middle blocks (blocks 1 and 2):
block[1] + block[2] = 22 + 21 = 43 cost: 2 block lookups
(c) Partial right block (block 3, indices 12..13):
Scan a[12] + a[13] = 9 + 7 = 16 cost: 2 elements
Total = 5 + 43 + 16 = 64
Verify: 4+1+5+9+2+6+5+3+5+8+9+7 = 64 <-- correctWorst-case cost:
Query complexity analysis:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| |##| full blocks (at most sqrt(n)) |##| | |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
^ ^
partial partial
(<= B elts) (<= B elts)
Total = O(B) + O(n/B) + O(B) = O(B + n/B)
Minimized when B = sqrt(n): O(sqrt(n))Step 3 -- Point update: set
Old a[5] = 9. New a[5] = 1. Difference = 1 - 9 = -8.
block_id = 5 / 4 = 1.
block[1] += -8 => 22 - 8 = 14.
a[5] = 1.
After update:
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
a[]: | 3 | 1 | 4 | 1 | 5 | 1 | 2 | 6 | 5 | 3 | 5 | 8 | 9 | 7 | 9 | 3 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| block 0 | block 1 | block 2 | block 3 |
| sum = 9 | sum = 14 | sum = 21 | sum = 28 |
^
updated
Cost: O(1)Step 4 -- Query after update: sum( , )
l and r are both in block 1. Same-block case: just scan.
a[4] + a[5] + a[6] + a[7] = 5 + 1 + 2 + 6 = 14
(Matches the updated block[1] = 14.)
Cost: O(B) = O(sqrt(n))2.4 The Invariant
+--------------------------------------------------------------+
| INVARIANT: block[b] always stores the aggregate (sum / min / |
| max) of the elements a[b*B .. min((b+1)*B-1, n-1)]. |
| |
| - Point update maintains this in O(1) for sum (O(B) for |
| min/max, since you must rescan the block). |
| - Range query decomposes [l,r] into at most O(sqrt(n)) |
| blocks plus two partial tails of O(sqrt(n)) elements each. |
+--------------------------------------------------------------+Every operation preserves the invariant. Build sets it. Point update restores it by adjusting a single block. Queries rely on it to skip full blocks.
2.5 When to Reach for This
Trigger phrases in problem statements:
- "range queries + updates" with
-- sqrt decomp is the simplest solution. - "
per query acceptable" -- you see the constraint math working out. - The aggregate operation is "too complex for segment tree" -- e.g., maintaining a sorted order per block, mode queries, or count-distinct queries.
Counter-examples (when sqrt decomp is NOT the right tool):
- You need
per operation -- use a segment tree or BIT instead. - All queries are offline and you want
with add/remove -- Mo's algorithm is the sqrt technique for that (see Mo's Algorithm). and tight time limit -- per query may be too slow; reach for structures.
What the Code Won't Teach You
When sqrt beats everything else. Several competition problems have no known
The block decomposition mindset. Sqrt isn't just a data structure -- it's a design pattern. You can apply it to decompose algorithms (Mo's algorithm), data (heavy-light decomposition), or complexity (sqrt trick for factorials). The unifying idea is: partition into
The "two levels of granularity" intuition. Every sqrt-decomposed structure maintains two views of the same data: the fine-grained element level and the coarse-grained block level. Updates touch one element and one block. Queries combine partial element-level scans with block-level summaries. Getting comfortable thinking at both levels simultaneously -- and knowing when to drop to element level vs. stay at block level -- is the core skill that the template code obscures.
C++ STL Reference
| Function / Class | Header | What it does | Time Complexity |
|---|---|---|---|
vector<T> | <vector> | Underlying array storage | |
fill(first, last, val) | <algorithm> | Initialize block aggregates | |
accumulate(first, last, init) | <numeric> | Sum a range (useful for building blocks) | |
min_element(first, last) | <algorithm> | Find min in a partial block | |
max_element(first, last) | <algorithm> | Find max in a partial block | |
sqrtl(n) / (int)sqrt((double)n) | <cmath> | Compute block size | |
__builtin_popcount(x) | built-in | Useful in bitwise sqrt-decomp variants |
Key detail: There is no STL "sqrt decomposition" container. You build it yourself with a vector for the array and a vector for block aggregates. That's the whole point -- it's simple enough to code from scratch in 2 minutes.
Implementation (Contest-Ready)
Same-Block Queries Need Special Handling
When query range [l, r] falls entirely within one block, the "process full blocks" loop runs zero times. You must handle this case separately:
Query [3, 6] with block_size = 4:
Block 0: [0,3], Block 1: [4,7], Block 2: [8,11]
l=3 is in block 0, r=6 is in block 1 -> different blocks -> normal path.
Query [5, 7] -- both in block 1:
DON'T process block 1 as a full block (it's partial on both sides).
Instead, iterate elements 5, 6, 7 directly.cpp
int lb = l / B, rb = r / B;
if (lb == rb) {
// Same block -- brute force elements [l, r]
for (int i = l; i <= r; i++) ans += a[i];
} else {
// Partial left block + full middle blocks + partial right block
...
}Block rebuild on point update: After updating a[i], you must recompute block_sum[i / B] (or whatever aggregate your blocks store). Forgetting this means queries return stale block values.
Point update with sqrt blocks:
Update arr[6] = 10 (was 2):
Before: [ 3 1 4 1 | 5 9 2 6 | 5 3 5 8 | 9 7 9 3 ]
Sums: [ 9 | 22 | 21 | 28 ]
^
block 1 contains index 6
After: [ 3 1 4 1 | 5 9 *10* 6 | 5 3 5 8 | 9 7 9 3 ]
Sums: [ 9 | 30 | 21 | 28 ]
+8 (diff applied to block sum)Version 1: Range Sum Queries + Point Update -- Minimal Template
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
int n, q;
cin >> n >> q;
vector<long long> a(n);
for (auto& x : a) cin >> x;
int B = max(1, (int)sqrt(n));
int nb = (n + B - 1) / B;
vector<long long> bs(nb, 0);
for (int i = 0; i < n; i++)
bs[i / B] += a[i];
// Point update: a[i] = val
auto update = [&](int i, long long val) {
bs[i / B] += val - a[i];
a[i] = val;
};
// Range sum query: sum of a[l..r]
auto query = [&](int l, int r) -> long long {
long long res = 0;
int bl = l / B, br = r / B;
if (bl == br) {
for (int i = l; i <= r; i++) res += a[i];
} else {
for (int i = l; i < (bl + 1) * B; i++) res += a[i];
for (int b = bl + 1; b < br; b++) res += bs[b];
for (int i = br * B; i <= r; i++) res += a[i];
}
return res;
};
while (q--) {
int t; cin >> t;
if (t == 1) {
int i; long long v; cin >> i >> v;
update(i, v);
} else {
int l, r; cin >> l >> r;
cout << query(l, r) << "\n";
}
}
}Version 2: Range Sum Queries + Point Update -- Explained
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
int n, q;
cin >> n >> q;
vector<long long> a(n);
for (auto& x : a) cin >> x;
// Block size: sqrt(n), but at least 1 to avoid division by zero.
int B = max(1, (int)sqrt(n));
// Number of blocks: ceil(n / B).
int nb = (n + B - 1) / B;
// bs[b] = sum of elements in block b.
vector<long long> bs(nb, 0);
for (int i = 0; i < n; i++)
bs[i / B] += a[i];
// Point update: set a[i] = val.
// Adjust the block sum by the difference (val - old value).
// Cost: O(1).
auto update = [&](int i, long long val) {
bs[i / B] += val - a[i];
a[i] = val;
};
// Range sum query: return a[l] + a[l+1] + ... + a[r].
// Three cases:
// 1) l and r are in the same block -> just scan.
// 2) Otherwise:
// a) Scan partial left block: l to end of its block.
// b) Sum full middle blocks using bs[].
// c) Scan partial right block: start of its block to r.
// Cost: O(sqrt(n)).
auto query = [&](int l, int r) -> long long {
long long res = 0;
int bl = l / B, br = r / B;
if (bl == br) {
// Same block: just iterate.
for (int i = l; i <= r; i++) res += a[i];
} else {
// Left partial block: l to end of block bl.
for (int i = l; i < (bl + 1) * B; i++) res += a[i];
// Full middle blocks.
for (int b = bl + 1; b < br; b++) res += bs[b];
// Right partial block: start of block br to r.
for (int i = br * B; i <= r; i++) res += a[i];
}
return res;
};
while (q--) {
int t; cin >> t;
if (t == 1) {
// Type 1: point update. Read index i and new value v.
int i; long long v; cin >> i >> v;
update(i, v);
} else {
// Type 2: range sum query. Read l and r (0-indexed, inclusive).
int l, r; cin >> l >> r;
cout << query(l, r) << "\n";
}
}
}Operations Reference
| Operation | Time | Space |
|---|---|---|
| Build (precompute block sums) | ||
Point update a[i] = val | ||
| Range sum / min / max query | ||
| Range update (add | ||
| Range update + range query (lazy blocks) | ||
| Total for |
When is this fast enough? For
Problem Patterns & Tricks
Pattern 1: Range Query + Point Update (Basic)
The bread-and-butter pattern. Maintain block aggregates (sum, min, max, gcd) and answer range queries in
Use this when the problem has
Examples: CF 13E -- Holes, CF 797F -- Mice and Holes
Pattern 2: Range Update + Range Query (Lazy Blocks)
Add a lazy[b] value per block. A range update "add lazy[b] for full blocks. Queries add lazy[b] to each block sum.
cpp
// Range add v to [l, r]:
// partial blocks: a[i] += v, rebuild block sum
// full blocks: lazy[b] += v
// Query a[l..r]:
// partial blocks: a[i] + lazy[block(i)]
// full blocks: bs[b] + lazy[b] * BExamples: CF 52C -- Circular RMQ, CF 242E -- XOR on Segment
Pattern 3: Mo's Algorithm (Offline Sqrt Decomposition on Queries)
Sort
This is the most important application of sqrt decomposition in competitive programming. See Mo's Algorithm for the full treatment.
Examples: CF 86D -- Powerful Array, CF 617E -- XOR and Favorite Number
Pattern 4: Threshold / Heavy-Light by Frequency
Partition values into "heavy" (frequency
This is a classic sqrt trick for count/frequency problems.
Examples: CF 375D -- Tree and Queries, CF 840D -- Destiny
Pattern 5: Sorted Blocks (Insert / Delete / k-th element)
Maintain each block as a sorted list. Insert/delete costs
cpp
// To find k-th element across sorted blocks:
for (int b = 0; b < nb; b++) {
if (k <= (int)block[b].size()) {
return block[b][k - 1]; // 1-indexed k
}
k -= block[b].size();
}Examples: CF 455D -- Serega and Fun
Pattern 6: Batch Processing (Sqrt Buffering)
Accumulate updates in a buffer. Every
This amortizes expensive rebuilds. Total cost for
Examples: CF 342E -- Xenia and Tree
Contest Cheat Sheet
+------------------------------------------------------------------+
| SQRT DECOMPOSITION CHEAT SHEET |
+------------------------------------------------------------------+
| WHEN TO USE: |
| - Range queries + point/range updates, n <= 2e5 |
| - Segment tree is overkill or hard to implement |
| - Mo's algorithm (offline queries) |
| - Threshold trick: split into heavy/light by sqrt(n) |
+------------------------------------------------------------------+
| TEMPLATE (range sum + point update): |
| |
| int B = max(1, (int)sqrt(n)); |
| vector<long long> bs((n+B-1)/B, 0); |
| for (int i=0;i<n;i++) bs[i/B] += a[i]; // build |
| |
| // update a[i] = v: |
| bs[i/B] += v - a[i]; a[i] = v; |
| |
| // query sum [l,r]: |
| scan partial left + sum full blocks + scan partial right |
+------------------------------------------------------------------+
| COMPLEXITY: O(sqrt(n)) per query/update |
| SPACE: O(n) array + O(sqrt(n)) block sums |
+------------------------------------------------------------------+
| REMEMBER: Last block may be smaller than B |
| Block of index i = i / B |
| Block b covers indices [b*B, min((b+1)*B-1, n-1)] |
+------------------------------------------------------------------+Gotchas & Debugging
Block size choice
Always use int B = max(1, (int)sqrt(n)). The max(1, ...) prevents division by zero when ceil(sqrt(n)) -- it gives slightly worse constant in some cases, and floor is the standard choice.
For some problems, tuning
Last incomplete block
The last block may contain fewer than
cpp
for (int i = br * B; i <= r; i++) ... // r < n, so this is safeBut if you ever iterate over "all elements in block
cpp
int lo = b * B;
int hi = min((b + 1) * B - 1, n - 1);
for (int i = lo; i <= hi; i++) ...Same-block edge case
When
Rebuilding block aggregate
After a point update, you must update the block aggregate. For sum, the bs[i/B] += new_val - old_val. For min/max, you may need to rescan the entire block in
Integer overflow
Block sums can be large. If each element is up to long long.
Off-by-one with 1-indexed input
Many CF problems use 1-indexed arrays. Convert to 0-indexed immediately, or adjust block formulas: block = (i - 1) / B for 1-indexed.
Mental Traps
Trap: "Segment tree is always better than sqrt decomposition."
Segment trees have
Trap: "The block size is always
The optimal
Trap: "Sqrt decomposition is trivial to implement."
The idea is trivial. The boundary cases (same block, left partial, right partial), lazy updates for range modifications, and block rebuilds on updates are all fertile ground for off-by-one errors. Most WAs in sqrt decomposition come from boundary handling, not algorithmic misunderstanding.
Trap: "I should always use sqrt decomposition for range queries."
For static arrays, prefix sums give
The mistake that teaches
RMQ with same-block queries. I implemented sqrt decomposition with l and r are in the same block, my code processed them as "partial left block + full middle blocks + partial right block" -- but there were no full middle blocks, and the two partial pieces overlapped, double-counting elements. Fix: if l/B == r/B, brute-force from l to r directly. The "left tail + full blocks + right tail" decomposition assumes at least one full block in between.
CF 13E Holes (hidden chain inside rebuild). For each cell, precompute where you exit the current block and how many jumps it takes. My rebuild had a nested loop -- for each cell, I followed the jump chain to the block boundary, which could be
Debug Drills
Drill 1: Off-by-one in block boundary computation.
cpp
int B = (int)sqrt(n);
vector<long long> block_sum(n / B + 1, 0);
for (int i = 0; i < n; i++)
block_sum[i / B] += a[i];
long long query(int l, int r) {
long long res = 0;
int bl = l / B, br = r / B;
if (bl == br) {
for (int i = l; i <= r; i++) res += a[i];
return res;
}
for (int i = l; i < (bl + 1) * B; i++) res += a[i]; // left tail
for (int b = bl + 1; b < br; b++) res += block_sum[b]; // full blocks
for (int i = br * B; i <= r; i++) res += a[i]; // right tail
return res;
}What's wrong?
When n is not divisible by B, the expression n / B + 1 may allocate one too few blocks. For example, if n = 10 and B = 3, the blocks are {0,1,2}, {3,4,5}, {6,7,8}, {9} -- that's 4 blocks, but 10/3 + 1 = 4 happens to work. However, if n = 9 and B = 3, you get 9/3 + 1 = 4 blocks but only need 3. The real danger: if B = (int)sqrt(n) rounds down to 0 when n = 0, you get division by zero. Fix: use B = max(1, (int)sqrt(n)) and allocate (n + B - 1) / B blocks.
Drill 2: Forgetting to apply lazy tag before reading individual elements.
cpp
void range_add(int l, int r, int val) {
int bl = l / B, br = r / B;
if (bl == br) {
for (int i = l; i <= r; i++) a[i] += val;
block_sum[bl] += (long long)val * (r - l + 1);
return;
}
for (int i = l; i < (bl + 1) * B; i++) a[i] += val;
block_sum[bl] += (long long)val * ((bl + 1) * B - l);
for (int b = bl + 1; b < br; b++) lazy[b] += val;
for (int i = br * B; i <= r; i++) a[i] += val;
block_sum[br] += (long long)val * (r - br * B + 1);
}
int point_query(int idx) {
return a[idx]; // BUG
}What's wrong?
point_query returns a[idx] without adding lazy[idx / B]. The lazy tag stores a pending addition that hasn't been applied to individual elements. Fix: return a[idx] + lazy[idx / B];. Alternatively, push the lazy tag to all elements in the block before accessing them.
Drill 3: Rebuilding sorted block incorrectly after update.
cpp
// Each block maintains a sorted copy for order-statistic queries
void update(int pos, int new_val) {
int b = pos / B;
a[pos] = new_val;
// Rebuild sorted block
sorted_block[b].clear();
for (int i = b * B; i < (b + 1) * B; i++)
sorted_block[b].push_back(a[i]);
sort(sorted_block[b].begin(), sorted_block[b].end());
}What's wrong?
The rebuild loop iterates from b*B to (b+1)*B, but for the last block, (b+1)*B may exceed n. This reads uninitialized or out-of-bounds memory. Fix: use min((b+1)*B, n) as the upper bound: for (int i = b*B; i < min((b+1)*B, n); i++).
Drill 4: Inclusive vs exclusive right boundary in partial block.
cpp
int B = sqrt(n);
for (int i = 0; i < n; i++)
block_id[i] = i / B;
// Query [l, r]:
if (block_id[l] == block_id[r]) {
// brute force
} else {
// partial left block: l to end of block
for (int i = l; i < (block_id[l] + 1) * B; i++) // ...
// full blocks
for (int b = block_id[l] + 1; b < block_id[r]; b++) // ...
// partial right block: start of block to r
for (int i = block_id[r] * B; i < r; i++) // ... BUG HERE
}What's wrong?
The right partial block loop uses i < r but should use i <= r. The range is [l, r] inclusive, so position r itself must be included. This off-by-one means the rightmost element of every query is silently excluded -- producing answers that are slightly wrong.
Drill 5: Block size when n is small.
cpp
int B = sqrt(n);
int num_blocks = (n + B - 1) / B;
vector<long long> block_sum(num_blocks, 0);
for (int i = 0; i < n; i++)
block_sum[i / B] += a[i];What's wrong?
When n = 0, B = sqrt(0) = 0, and i / B causes division by zero. Even for small n like 1-3, sqrt(n) might return 1, which works but leads to degenerate behavior. Always guard: int B = max(1, (int)sqrt(n));. Also, for n = 0, the loop doesn't execute, but num_blocks computes (0 + 0 - 1) / 0 -- another division by zero.
Drill 6: Lazy propagation push-down on last block.
cpp
void push(int b) {
if (lazy[b] != 0) {
for (int i = b * B; i < (b + 1) * B; i++)
a[i] += lazy[b];
lazy[b] = 0;
}
}
// Problem: the last block may have fewer than B elementsWhat's wrong?
The loop i < (b + 1) * B goes out of bounds for the last block when n is not a multiple of B. The last block has only n - b * B elements. Fix:
cpp
for (int i = b * B; i < min((b + 1) * B, n); i++)Without this, you write past the array end, causing undefined behavior -- which might silently corrupt other data rather than crashing.
Debug checklist
- Print block sums after building -- verify they match naive sums.
- Test query where
and are in the same block. - Test query where
and (full range). - Test with
(single element, single block). - After point update, verify the block sum changed correctly.
Self-Test
Without opening the implementation above, verify you can answer each item:
- [ ] Write the 3-part range query (left partial + full blocks + right partial) from memory, including the same-block special case.
- [ ] Given a point update
a[i] = v, explain how and why you update the block sum infor sum aggregates vs. for min/max aggregates. - [ ] Derive the optimal block size
when partial-block processing costs instead of . (Hint: minimize .) - [ ] Name a problem category where sqrt decomposition works but segment trees struggle (non-mergeable operations, arbitrary reordering, etc.).
- [ ] Explain the "same block" edge case: why does it need separate handling, and what goes wrong if you treat
and as belonging to two separate partial blocks? - [ ] Describe how lazy tags work for range updates on full blocks, and why applying element-by-element to full blocks ruins the complexity.
- [ ] Trace through a range query
on a 16-element array with , identifying which parts are partial and which are full blocks.
Practice Problems
Rating Progression
| CF Rating | What to Master |
|---|---|
| 1200 | Basic block decomposition: precompute block sums, answer range sum queries by combining full blocks + partial tails. Understand blocks: array of size n split into chunks of size ~√n. |
| 1500 | Lazy block updates: maintain a per-block "add" tag and apply it during queries on partial blocks. Implement range sum with point updates using sqrt blocks; handle partial blocks at boundaries. |
| 1800 | Sorted blocks for order-statistic queries; rebuilding a single block in O(√n) after an update; threshold tricks (brute-force small, DS for large). Lazy propagation within blocks, the threshold trick (values appearing > √n times), Mo's algorithm. |
| 2100 | Sqrt on queries (Mo's algorithm foundations); batched sqrt-decomposition on time for offline updates; combining sqrt blocks with other DS inside blocks. Sqrt of sqrt (block-of-blocks), sqrt decomposition on queries. |
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | Static Range Sum Queries | CSES | 1100 | Warmup -- prefix sums suffice, but practice block decomp as exercise |
| 2 | Dynamic Range Sum Queries | CSES | 1400 | Range sum + point update -- direct sqrt decomp application |
| 3 | Circular RMQ | CF 52C | 1600 | Range add + range min on circular array, lazy blocks |
| 4 | XOR on Segment | CF 242E | 1700 | Range XOR update + range sum query with sqrt or segment tree |
| 5 | Powerful Array | CF 86D | 1800 | Mo's algorithm -- the classic sqrt offline problem |
| 6 | Tree and Queries | CF 375D | 1800 | Mo's on trees with Euler tour + threshold trick |
| 7 | XOR and Favorite Number | CF 617E | 1900 | Mo's algorithm with XOR prefix sums |
| 8 | Holes | CF 13E | 2000 | Sqrt decomp with jump pointers per block |
| 9 | Serega and Fun | CF 455D | 2100 | Sqrt decomp with cyclic shifts inside blocks |
| 10 | Destiny | CF 840D | 2100 | Sqrt threshold trick -- heavy/light value decomposition |
Further Reading
- cp-algorithms: Sqrt Decomposition -- comprehensive reference with multiple variants.
- CF Blog: Sqrt decomposition tutorial -- beginner-friendly walkthrough with examples.
- CF Blog: An alternative sorting order for Mo's -- block-parity optimization that speeds up Mo's by ~1.5x.
- CF Blog: Sqrt-decomposition tricks -- advanced patterns and threshold techniques.
- Related: Mo's Algorithm -- the most important application of sqrt ideas.
- Related: Segment Tree -- the
alternative when sqrt is too slow. - Related: Prefix Sums -- static version; use when there are no updates.