Appearance
Coordinate Compression
Quick Revisit
- USE WHEN: Values are huge/sparse but only relative order matters—need to use them as array indices
- INVARIANT: sort + deduplicate + lower_bound maps n values to [0, n) preserving order
- TIME: O(n log n) to compress | SPACE: O(n)
- CLASSIC BUG: Forgetting to add query points to the compression set before compressing
- PRACTICE: Practice Ladder
Map
Difficulty: Intermediate | CF Rating: 1300–1700 | Prerequisites: Sorting and Searching, Hash Map and Unordered Map
Contents
- Intuition
- C++ STL Reference
- Implementation (Contest-Ready)
- Operations Reference
- Problem Patterns & Tricks
- Contest Cheat Sheet
- Gotchas & Debugging
- Self-Test
- Practice Problems
- Further Reading
- Rating Progression
- Before You Code Checklist
- What Would You Do If...?
- The Mistake That Teaches
- Debug This
- Historical Origin
- What to Solve Next
Intuition
Consider these 6 coordinates:
cpp
a[] = { 1000000000, 3, 500, 1000000000, 3, 750 }You need to use them as array indices—to update a Binary Indexed Tree or a counting array. The largest value is int array; the memory limit is 256 MB.
Look at the waste. Between index 4 and index 499—nothing. Between 501 and 749—nothing. Between 751 and 999999999—nothing. Out of
Only the relative order of values matters, not their magnitudes.
There are exactly 4 distinct values:
Analogy. A concert venue has seats numbered
Visual Walkthrough
text
STEP 0 -- Original array
+------------+---+-----+------------+---+-----+
| 1000000000 | 3 | 500 | 1000000000 | 3 | 750 |
+------------+---+-----+------------+---+-----+
[0] [1] [2] [3] [4] [5]
STEP 1 -- Copy values, sort them
sorted[]:
+---+---+-----+-----+------------+------------+
| 3 | 3 | 500 | 750 | 1000000000 | 1000000000 |
+---+---+-----+-----+------------+------------+
STEP 2 -- Remove consecutive duplicates (unique + erase)
dict[]:
+---+-----+-----+------------+
| 3 | 500 | 750 | 1000000000 |
+---+-----+-----+------------+
[0] [1] [2] [3]
STEP 3 -- Map each original value via lower_bound on dict[]
a[0] = 1000000000 --> lower_bound finds dict[3] --> 3
a[1] = 3 --> lower_bound finds dict[0] --> 0
a[2] = 500 --> lower_bound finds dict[1] --> 1
a[3] = 1000000000 --> lower_bound finds dict[3] --> 3
a[4] = 3 --> lower_bound finds dict[0] --> 0
a[5] = 750 --> lower_bound finds dict[2] --> 2
STEP 4 -- Result: compressed array
+---+---+---+---+---+---+
| 3 | 0 | 1 | 3 | 0 | 2 |
+---+---+---+---+---+---+
[0] [1] [2] [3] [4] [5]
BIT / segment tree size needed: 4 (not 1000000001)The Invariant
text
+------------------------------------------------------------------+
| compressed[i] < compressed[j] iff original[i] < original[j] |
+------------------------------------------------------------------+Coordinate compression is an order-preserving bijection from the set of distinct values to
What the Code Won't Teach You
The technique is about space, not correctness. Coordinate compression doesn't change what an algorithm does—it enables an algorithm that requires integer indices in a small range to operate on data with large values. The underlying algorithm (segment tree, BIT) is the actual tool; compression is the adapter that plugs oversized values into it.
text
Without compression: With compression:
BIT size = max_val = 10^9 BIT size = n = 200000
+---------------------------+ +-----------+
| 99.9999% empty slots | | every |
| wasting 4 GB of memory | | slot used |
+---------------------------+ +-----------+
Memory: 4 GB (MLE) Memory: 800 KB (OK)When you need val - 1 and val + 1 in the compression set. This arises when queries involve strict inequalities. If you query "how many values
text
Worked example: why query values need compression
data = [7, 2, 5, 5, 9, 2] query: "count values <= 6"
Compress data only: vals = [2, 5, 7, 9]
rank: 0 1 2 3
Query "count <= 6": rank of 6 = ??? (6 not in vals!)
lower_bound gives rank 2 (pointing at 7). WRONG.
Compress data + queries: vals = [2, 5, 6, 7, 9]
rank: 0 1 2 3 4
Query "count <= 6": rank of 6 = 2 --> BIT prefix [0..2]
Counts elements with rank 0,1,2 = values 2,5,6.
Only 2 and 5 exist in data --> answer = 4 (two 2s + two 5s).The lookup direction matters: value-to-rank vs. rank-to-value.lower_bound gives value-to-rank: "what rank does value vals[r] directly. Some problems need both directions—for instance, "output the
With the core mechanics covered, here is how to recognize when compression is the right tool.
When to Reach for This
Trigger phrases in problem statements:
- "coordinates up to
" (or ) with - "use values as array indices" but the value range far exceeds
- "segment tree on values" or "BIT on values"—not on positions
- "count elements less than
" across an array - "offline queries" where all query values are known in advance
Counter-examples (compression not needed):
- Values are already in
(e.g., a permutation). Use them directly. - You only need to compare values, never index into an array by value. A plain sort suffices.
- The value range fits in memory (e.g., values in
with a generous limit). Direct indexing is simpler.
C++ STL Reference
| Function / Class | Header | What it does | Time Complexity |
|---|---|---|---|
sort(first, last) | <algorithm> | Sorts range in-place (ascending by default) | |
sort(first, last, comp) | <algorithm> | Sorts with custom comparator | |
unique(first, last) | <algorithm> | Removes consecutive duplicates; returns iterator to new logical end | |
vec.erase(first, last) | <vector> | Erases elements in range [first, last) from vector | |
lower_bound(first, last, val) | <algorithm> | Iterator to first element | |
upper_bound(first, last, val) | <algorithm> | Iterator to first element | |
distance(first, last) | <iterator> | Number of increments from first to last | |
ranges::sort(range) | <algorithm> | C++20 ranges version of sort |
Key idiom: sort → unique → erase is the standard three-step for deduplicating a vector.
text
lower_bound vs upper_bound on sorted vals = [2, 5, 5, 7, 9]:
lower_bound(5): upper_bound(5):
+---+---+---+---+---+ +---+---+---+---+---+
| 2 | 5 | 5 | 7 | 9 | | 2 | 5 | 5 | 7 | 9 |
+---+---+---+---+---+ +---+---+---+---+---+
^ ^
first >= 5 first > 5
rank = 1 rank = 3
For compression (after unique): vals = [2, 5, 7, 9]
lower_bound(5) -> rank 1 (use for "rank of value")
upper_bound(5) -> rank 2 (use for "count of values <= x")cpp
sort(v.begin(), v.end());
v.erase(unique(v.begin(), v.end()), v.end());Implementation (Contest-Ready)
Version 1: Minimal Contest Template
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
int n;
cin >> n;
vector<int> a(n);
for (auto& x : a) cin >> x;
// Compress
vector<int> vals(a);
sort(vals.begin(), vals.end());
vals.erase(unique(vals.begin(), vals.end()), vals.end());
// Map each element to its compressed index
vector<int> c(n);
for (int i = 0; i < n; i++)
c[i] = lower_bound(vals.begin(), vals.end(), a[i]) - vals.begin();
// c[i] is in [0, vals.size())
// To recover original value: vals[c[i]]
// Example output
for (int i = 0; i < n; i++)
cout << c[i] << " \n"[i == n - 1];
return 0;
}Version 2: Explained Version with Reusable Function
cpp
#include <bits/stdc++.h>
using namespace std;
// Compresses values in-place. Returns the sorted unique values (the "dictionary").
// After this call, a[i] contains the compressed index in [0, dict.size()).
// To recover original value: dict[a[i]].
template <typename T>
vector<T> compress(vector<T>& a) {
// Step 1: Copy all values into a separate vector.
// We need the originals to stay intact during sorting.
vector<T> dict(a);
// Step 2: Sort + deduplicate.
// After sort, duplicates are adjacent, so unique() collapses them.
// erase() shrinks the vector to the new logical size.
sort(dict.begin(), dict.end());
dict.erase(unique(dict.begin(), dict.end()), dict.end());
// dict now holds each distinct value exactly once, in sorted order.
// dict.size() == number of distinct values == compressed range size.
// Step 3: Replace each original value with its index in dict[].
// lower_bound on a sorted array is O(log n), so this loop is O(n log n).
for (auto& x : a) {
// lower_bound returns iterator to the position of x in dict.
// Subtracting dict.begin() converts to a 0-based index.
x = lower_bound(dict.begin(), dict.end(), x) - dict.begin();
}
return dict;
}
// Compress multiple arrays that share the same value space.
// Example: data values + query values both need the same mapping.
template <typename T>
vector<T> compress_multi(vector<vector<T>*> arrays) {
vector<T> dict;
for (auto* arr : arrays)
for (auto& x : *arr)
dict.push_back(x);
sort(dict.begin(), dict.end());
dict.erase(unique(dict.begin(), dict.end()), dict.end());
for (auto* arr : arrays)
for (auto& x : *arr)
x = lower_bound(dict.begin(), dict.end(), x) - dict.begin();
return dict;
}
int main() {
int n;
cin >> n;
vector<int> a(n);
for (auto& x : a) cin >> x;
// Single-array compression
vector<int> dict = compress(a);
cout << "Compressed: ";
for (int i = 0; i < n; i++)
cout << a[i] << " \n"[i == n - 1];
cout << "Dictionary (" << dict.size() << " distinct values): ";
for (int i = 0; i < (int)dict.size(); i++)
cout << dict[i] << " \n"[i == (int)dict.size() - 1];
return 0;
}Operations Reference
| Operation | Time | Space |
|---|---|---|
| Build dictionary (sort + unique + erase) | ||
Compress one element (lower_bound) | ||
| Compress all | ||
Decompress one element (dict[c[i]]) | ||
| Compress |
Here
Problem Patterns & Tricks
Inversion Count via BIT
Compress values, then sweep left-to-right. For each element, query the BIT for how many already-inserted values are greater. The canonical application.
text
Inversion count worked example: a = [4, 1, 3, 2]
Step 1 -- Compress: vals = [1,2,3,4] ranks: 1->0, 2->1, 3->2, 4->3
compressed a = [3, 0, 2, 1]
Step 2 -- Sweep left-to-right with BIT:
BIT = [0, 0, 0, 0] (size = 4)
i=0: a[0]=3 query(3)=0 inv from above=0 update(3,+1)
BIT = [0, 0, 0, 1]
i=1: a[1]=0 query(3)-query(0)=1 inv=1 update(0,+1)
BIT = [1, 0, 0, 1] inversions: (4,1)
i=2: a[2]=2 query(3)-query(2)=1 inv=1 update(2,+1)
BIT = [1, 0, 1, 1] inversions: (4,3)
i=3: a[3]=1 query(3)-query(1)=2 inv=2 update(1,+1)
BIT = [1, 1, 1, 1] inversions: (4,2),(3,2)
Total inversions = 0 + 1 + 1 + 2 = 4
Verify: (4,1),(4,3),(4,2),(3,2) = 4 <-- correctcpp
// After compression, a[i] in [0, m)
long long inv = 0;
for (int i = 0; i < n; i++) {
inv += query(m - 1) - query(a[i]); // count of inserted vals > a[i]
update(a[i], 1);
}Range Update / Point Query on Sparse Values
Values up to
text
Multi-array compression -- data + queries share one dictionary:
data = [100, 3, 50] queries = [50, 200]
| |
v v
combined = [3, 50, 100, 200] (sort + unique)
rank: 0 1 2 3
compressed data = [2, 0, 1]
compressed queries = [1, 3]
BIT size = 4 (not 200!)"Count of Elements Less Than X" Offline
Sort queries and data together, compress all values into one dictionary, then sweep with a BIT or merge sort.
Coordinate Compression for 2D Sweep Line
Compress x-coordinates (or y-coordinates) before sweeping. Essential when coordinates reach
Problems: CF 1401E
Segment Tree with Lazy Propagation on Compressed Coordinates
When you need range operations on values rather than positions, compress the value space first, then build a segment tree of size
Problems: CF 915E
Longest Increasing Subsequence (LIS) via BIT
Compress values, then compute LIS in BIT[v] is the length of the longest increasing subsequence ending at compressed value
cpp
// After compression: a[i] in [0, m)
vector<int> dp(m, 0);
// BIT query max in [0, a[i]-1], then update a[i] with result+1Problems: CF 340D
Contest Cheat Sheet
text
+--------------------------------------------------------------+
| COORDINATE COMPRESSION CHEAT SHEET |
+--------------------------------------------------------------+
| WHEN TO USE: |
| - Values up to 10^9, need BIT/segtree/counting array |
| - n <= 2*10^5 but value range is huge |
| - "How many elements less/greater than X" type queries |
+--------------------------------------------------------------+
| TEMPLATE (copy-paste): |
| |
| vector<int> vals(a); |
| sort(vals.begin(), vals.end()); |
| vals.erase(unique(vals.begin(), vals.end()), vals.end()); |
| for (auto& x : a) |
| x = lower_bound(vals.begin(), vals.end(), x) |
| - vals.begin(); |
| |
+--------------------------------------------------------------+
| COMPLEXITY: |
| Time: O(n log n) Space: O(n) |
+--------------------------------------------------------------+
| REMEMBER: |
| - Compress ALL arrays sharing the value space |
| - Decompress: vals[compressed_index] |
| - lower_bound for >=, upper_bound for > |
| - BIT/segtree size = vals.size(), NOT original max |
+--------------------------------------------------------------+Gotchas & Debugging
Forgetting to merge value spaces. If your data has values in
a[]and your queries reference values too, compress both into the same dictionary. Mismatched dictionaries produce silently wrong indices.Off-by-one with
lower_boundvs.upper_bound.lower_bound→ first positionvalue → use for "rank of this value." upper_bound→ first positionvalue → use for "count of values x" (result = upper_bound - begin).- Mixing them up produces wrong answers with no compiler warning.
Sorting the original instead of a copy. You need the original order for the actual algorithm. Always copy before sorting.
Using 1-indexed BIT with 0-indexed compression. If your BIT is 1-indexed, shift every compressed value up:
c[i] = lb_index + 1.Integer overflow in the original values. Compressed indices are small, so arithmetic on them is safe. But the original values can overflow
intduring preprocessing—uselong longwhen values exceed. Duplicate handling.
lower_boundmaps all duplicates to the same index, which is usually what you want (rank-based equality). If you need unique indices per occurrence, you need tie-breaking by position (e.g., stable sort on(value, original_index)pairs).Empty input. If
, the dictionary is empty and lower_boundon an empty range is technically fine, but a BIT or segment tree of size 0 will segfault. Guard against it explicitly.
Debugging tip: Print the dictionary and compressed array; verify dict[c[i]] == original_a[i] for every
Mental Traps
Trap: "Coordinate compression is just sorting."
text
Compression is THREE steps, not one:
sort unique + erase lower_bound
+--------+ +--------+ +--------+
| 3 3 5 | -> | 3 5 7 | a[i] ->| rank |
| 5 7 7 | +--------+ +--------+
+--------+ deduplicate rank-lookup
Skip deduplication --> duplicate values get
different ranks --> wrong answers.Trap: "I'll compress and figure out the indexing later."
text
WRONG approach: RIGHT approach:
+---------------------------+ +---------------------------+
| 1. Compress data values | | 1. Decide: 0-based or |
| 2. Build BIT | | 1-based indexing? |
| 3. Wait, BIT is 1-indexed | | 2. Need val-1 or val+1 |
| but compression is | | in compression set? |
| 0-indexed... add 1? | | 3. Collect ALL values |
| Which values? Some? | | (data + queries + adj) |
| 4. Debug for 2 hours | | 4. Compress + build |
+---------------------------+ +---------------------------+
Design the indexing scheme BEFORE you compress.Design your indexing scheme—0-based vs. 1-based, which sentinel values to include—before you call sort. A two-minute plan saves two hours of debugging.
Trap: "Compression and normalization are the same thing."
text
Coordinate compression: Normalization:
+-------------------------+ +-------------------------+
| values -> integer ranks | | values -> [0.0, 1.0] |
| {3, 500, 750} -> {0,1,2}| | {3, 500, 750} -> |
| | | {0.0, 0.665, 1.0} |
| Preserves ORDER only | | Preserves RATIOS |
| For integer indexing | | For real-valued tasks |
+-------------------------+ +-------------------------+Self-Test
- [ ] Write the full coordinate compression pipeline (sort + unique + erase + rank lookup) from memory, with no bugs in the
unique+eraseidiom. - [ ] Explain why both data values and query values must be included in the compression set, with a concrete example where omitting query values produces wrong results.
- [ ] State when you need to include
val - 1orval + 1in the compression set. - [ ] Given a compressed rank
, state how to recover the original value. - [ ] Explain the purpose of coordinate compression in one sentence.
- [ ] Describe the difference between 0-indexed compression and 1-indexed BIT usage, and how to bridge the gap.
- [ ] Given
a = [100, 5, 100, 50], produce the compressed array by hand, showing each step.
Practice Problems
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | B - Sort the Array | CF 451B | 1300 | Sorting + checking, warm-up for compression thinking |
| 2 | D - Pair of Topics | CF 1324D | 1400 | Compress differences, count pairs with BIT/binary search |
| 3 | D - Number of Segments with Big Sum | CF 1042D | 1600 | Prefix sums + compression + BIT for counting |
| 4 | E - Construct the Binary Tree | CF 1311E | 1600 | Compress + BIT for inversions during construction |
| 5 | E - Inversions After Shuffle | CF 749E | 1500 | Expected inversions, BIT on compressed values |
| 6 | D - Serega and Fun | CF 459D | 1800 | Count inversions between two halves, compression required |
| 7 | D - Restore Permutation | CF 1208D | 1700 | Reconstruct permutation using BIT on compressed sums |
| 8 | E - Physical Education Lessons | CF 915E | 2000 | Interval operations on compressed coordinate space |
| 9 | Nested Ranges Check | CSES | 1600 | Compress endpoints, check containment with BIT |
| 10 | Nested Ranges Count | CSES | 1700 | Count containing/contained ranges after compression |
| 11 | Salary Queries | CSES | 1700 | Coordinate compress salaries for BIT point updates |
Further Reading
- cp-algorithms: Coordinate Compression—formal treatment with geometry applications.
- USACO Guide: Coordinate Compression—good introductory examples.
- CF Blog: Coordinate Compression tutorial—community explanations and examples.
- For BIT usage after compression, see: Binary Indexed Tree (Fenwick)
- For segment tree usage after compression, see: Segment Tree
- Difference Arrays—the dual preprocessing trick
- Practice Ladder—curated problems
The core insight: you don't need the actual values—only their relative order. Mapping
Rating Progression
- CF 1200: Compress values before counting with a frequency array; map large coordinates to dense indices for simple lookups.
- CF 1500: Compress before BIT/segment tree for inversion counting or range queries on values.
- CF 1800: Compress both endpoints of intervals for sweep-line algorithms; handle events on a compressed timeline.
- CF 2100: Multi-dimensional compression for 2D segment trees; compress on-the-fly with dynamic segment trees as an alternative.
Before You Code Checklist
- [ ] Collected all values that need compression (including query boundaries, not just array elements).
- [ ] Sorted and deduplicated with
sort+unique+erase(or used aset). - [ ] Used
lower_boundto map original values to compressed indices—and verified 0-indexed vs. 1-indexed for BIT compatibility. - [ ] Confirmed the compressed index is used everywhere the original value was used as an index.
- [ ] Checked whether the problem requires mapping back from compressed index to original value (keep the sorted unique array accessible).
What Would You Do If...?
- You need to compress coordinates but also need the gap sizes between consecutive compressed values—how would you preserve distance information?
- New values arrive online (not all known upfront)—can you still use static compression, or do you need a different approach?
- Two different arrays need compression—should you compress them independently or merge all values into one shared mapping?
The Mistake That Teaches
A contestant compressed array values for a BIT but forgot to compress the query boundaries too. The BIT was indexed
Debug This
Bug 1:
cpp
// Compress values in array a[]
vector<int> sorted_a = a;
sort(sorted_a.begin(), sorted_a.end());
sorted_a.erase(unique(sorted_a.begin(), sorted_a.end()), sorted_a.end());
for (int i = 0; i < n; i++)
a[i] = lower_bound(sorted_a.begin(), sorted_a.end(), a[i]) - sorted_a.begin();
// Now use a[i] as 1-indexed BIT position
bit.update(a[i], 1); // Bug hereAnswer
lower_bound returns a 0-based index, but BITs typically require 1-based indexing. Use a[i] + 1 or a[i] = ... - sorted_a.begin() + 1 during compression.
Bug 2:
cpp
// Compress, then count inversions
vector<int> vals(a.begin(), a.end());
sort(vals.begin(), vals.end());
// Forgot to deduplicate!
for (int i = 0; i < n; i++)
a[i] = lower_bound(vals.begin(), vals.end(), a[i]) - vals.begin();Answer
Without unique + erase, duplicate values get different compressed indices (e.g., [3,3,5] compressed as [0,1,2] instead of [0,0,1]). This breaks any logic that depends on equal values mapping to the same index.
Bug 3:
cpp
// Compress coordinates for sweep line
vector<int> xs;
for (auto& seg : segments) {
xs.push_back(seg.l);
xs.push_back(seg.r);
}
sort(xs.begin(), xs.end());
xs.erase(unique(xs.begin(), xs.end()), xs.end());
// Query: how many segments cover point q?
int cq = lower_bound(xs.begin(), xs.end(), q) - xs.begin();
// Bug: q might not be in xsAnswer
If q was not added to xs before compression, lower_bound returns the position where q would be inserted, not a valid compressed index. All query points must be added to the compression set alongside segment endpoints.
Historical Origin
Coordinate compression is a discretization technique rooted in the computational geometry of the 1970s–80s, where sweep-line algorithms by Bentley and Ottmann needed to handle real-valued coordinates on integer-indexed structures. It became a competitive programming staple through USACO and IOI problems in the 2000s. A useful mnemonic to carry into contests: "Shrink the map, keep the rank—values change, order doesn't."
What to Solve Next
- Greedy—the next core technique; coordinate compression often appears as a preprocessing step in greedy interval problems.
- Practice Ladder—drill compression as a subroutine in BIT/segment-tree problems.